Author manuscript; available in PMC: 2026 Feb 20.
Published in final edited form as: Stat Med. 2026 Jan;45(1-2):e70394. doi: 10.1002/sim.70394

Mendelian Randomization Methods for Causal Inference: Estimands, Identification and Inference

Minhao Yao 1, Anqi Wang 2, Xihao Li 3,4, Zhonghua Liu 5
PMCID: PMC12917735  NIHMSID: NIHMS2142469  PMID: 41569765

Abstract

Mendelian randomization (MR) has become an essential tool for causal inference in biomedical and public health research. By using genetic variants as instrumental variables, MR helps address unmeasured confounding and reverse causation, offering a quasi-experimental framework to evaluate causal effects of modifiable exposures on health outcomes. Despite its promise, MR faces substantial methodological challenges, including invalid instruments, weak instrument bias, and design complexities across different data structures. In this tutorial review, we aim to provide a systematic overview of MR methods for causal inference, emphasizing clarity of causal interpretation, study design comparisons, availability of software tools, and practical guidance for applied scientists. We organize the review around causal estimands, ensuring that analyses are anchored to well-defined causal questions. We discuss the problems of invalid and weak instruments, comparing available strategies for their detection and correction. We integrate discussions of population-based versus family-based MR designs, analyses based on individual-level versus summary-level data, and one-sample versus two-sample MR designs, highlighting their relative advantages and limitations. We also summarize recent methodological advances and software developments that extend MR to settings with many weak or invalid instruments and to modern high-dimensional omics data. Real-data applications, including UK Biobank and Alzheimer’s disease proteomics studies, illustrate the use of these methods in practice. This review aims to serve as a tutorial-style reference for both methodologists and applied scientists.

Keywords: causal genomics, causal inference, instrumental variables, Mendelian randomization, omics data, UK Biobank, unmeasured confounding

1 |. Introduction

1.1 |. Motivation for Causal Inference in Observational Studies

A central goal of causal inference is to assess the causal relationships between variables [1–4]. This involves determining whether changes in one variable (the treatment or exposure) directly influence changes in another variable (the outcome). Randomized experiments are generally regarded as the gold standard study design in statistical research and practice due to their ability to facilitate causal inference [5, 6]. In essence, these experiments employ random assignment to allocate participants to treatment and control groups, which ensures that the comparison groups are balanced regarding all (measured and unmeasured) covariates except for the treatment assignment itself [5–8]. This randomization minimizes bias and enhances the internal validity of the findings. In contrast, observational (non-randomized) studies are often employed when randomization is infeasible or unethical [9, 10]. Observational studies aim to draw causal conclusions from real-world data; however, such studies can be affected by confounding variables, that is, factors that influence both the treatment assignment and the outcome, which complicates establishing treatment-outcome causal relationships [1].

1.2 |. Mendelian Randomization as a Natural Experiment

An instrumental variable (IV) serves as a powerful tool in causal inference by leveraging a natural experiment, allowing researchers to uncover causal relationships even in the presence of unobserved confounding [11–15]. By exploiting an exogenous source of variation, such as genotype [16–18] or a draft lottery [19], the IV approach isolates the variation in the treatment variable that is as good as randomly assigned, much like a randomized controlled trial. This natural experiment framework helps address confounding concerns, providing more credible estimates of causal effects [20]. Embracing IV as a natural experiment not only strengthens empirical research but also brings us closer to the gold standard of causal inference using randomized experiments.

Mendelian randomization (MR) is a causal inference method that applies Mendel’s laws of inheritance, using genetic variants as instrumental variables to assess causal relationships between modifiable risk factors and health outcomes. By leveraging Gregor Mendel’s principles of random segregation and independent assortment of alleles, MR mimics a randomized experiment, reducing confounding biases inherent in observational studies [16, 21–24]. For illustrative purposes, we compare the designs of a randomized experiment and Mendelian randomization in Figure 1. Since genetic variants are randomly assigned at conception, much like the randomization in a clinical trial, they serve as ideal instruments to assess causal relationships between modifiable exposures (e.g., cholesterol levels) and health outcomes (e.g., heart disease) [25–27]. By leveraging the unconfounded nature of genetic inheritance, MR minimizes biases from reverse causation and unmeasured confounding, offering a robust framework for causal inference in biomedical research [23, 24, 28]. This approach has been transformative in public health and medicine, helping to validate drug targets, debunk spurious associations, and guide public health policies [29–32]. By employing genetic variants as IVs, MR leverages the natural randomization of alleles conferred by Mendelian inheritance, transforming observational data into a quasi-experimental framework that robustly infers causal relationships.

FIGURE 1 |. Comparison of a randomized controlled trial and Mendelian randomization.

1.3 |. From Causal Estimands to Statistical Inference

In this article, we adopt the estimand framework to elucidate key concepts, study designs, statistical inference methods, and causal interpretations in causal inference, aiming to clarify common misconceptions and provide practical guidance for MR analysis [33, 34]. By formally defining causal estimands, such as the average treatment effect or local average treatment effect, we align MR with the underlying causal questions of interest [34–36]. We then discuss how MR designs and analytical approaches target these causal estimands under certain assumptions [29, 37]. This estimand framework not only enhances the interpretability of MR results but also helps researchers navigate methodological challenges, such as pleiotropy and weak instrument bias, ensuring more reliable causal inference in practice [33, 38–42].

To formulate a coherent causal inference framework, it is essential to distinguish the following three key concepts: causal population, observed population, and sample, as illustrated in Figure 2. The causal population consists of all subjects in the study domain, where each subject is associated with multiple potential outcomes, one corresponding to each level of the treatment or exposure [6, 11, 36, 43, 44]. Causal estimands are precisely defined target quantities in the causal population that specify the causal effects of interest. The observed population consists of subjects for whom only one potential outcome is realized due to the actual treatment assignment. Statistical estimands are quantities that are defined in the observed population. Causal assumptions (e.g., consistency, unconfoundedness) are required to establish causal identification, linking a causal estimand to its corresponding statistical estimand defined in the observed population [6, 38, 43]. A sample consists of a subset drawn from the target observed population through either random or non-random selection procedures during data collection. Estimation and inference within this sample necessitate accounting for the sampling design to draw valid conclusions about the target observed population.

FIGURE 2 |. Conceptual flowchart bridging causal population, observed population, and sample in causal inference.

1.4 |. Outline of the Article

Although MR was originally proposed by Katan [17] as a hypothesis-testing strategy, in practice researchers are often interested in estimating causal effects. From a modern causal inference perspective, hypothesis testing typically relies on weaker assumptions and does not necessarily require the stronger assumptions needed for point identification of the causal estimand [18, 31]. Nevertheless, effect sizes are essential in modern epidemiology for risk quantification and public health interpretation, and reliance on p-values alone has been widely criticized [45]. Consequently, while hypothesis testing using MR remains important, many applied MR analyses seek causal effect estimation, motivating the estimand-oriented framework adopted in this article. In line with the estimands framework increasingly emphasized in clinical trials [33], clearly specifying the causal estimand helps ensure that the study design, assumptions, and analytic methods are aligned with the research question, and this clarity is equally crucial in Mendelian randomization, where different identification assumptions target fundamentally different causal effects in the underlying population. We also note that, given the vast and rapidly expanding MR literature, this article is intended as a tutorial rather than a comprehensive review.

The remainder of the article is organized as follows. Section 2 introduces how natural experiments based on genetic variation underpin MR analysis. Section 3 defines key causal estimands and the identification assumptions under the potential outcomes framework. Section 4 focuses on the causal estimand under the Additive LInear Constant Effects (ALICE) model framework. Section 5 discusses identification and inference when some genetic instruments are invalid. Section 6 introduces weak instrument bias and reviews recent methods aimed at mitigating this bias. Section 7 compares MR analyses based on population-based versus family-based study designs. Section 8 discusses the use of individual-level and summary-level data in MR, highlighting their respective advantages and limitations. Section 9 compares one-sample and two-sample MR designs, focusing on the assumptions and the behavior of weak instrument bias. Section 10 outlines the procedure and key considerations for selecting genetic instruments in MR analyses. Section 11 illustrates method comparisons using two real-data applications. Finally, Section 12 outlines future directions for MR, including binary and survival outcomes, longitudinal designs, and multivariable MR.

2 |. Using Genetic Variants as Instruments: Natural Experiments in Health Research

2.1 |. From Randomized Trials to Natural Experiments

Randomized controlled trials (RCTs) serve as the gold standard to establish the causal relationship between an exposure and an outcome [5, 46]. However, randomizing some exposures is unethical or infeasible [47, 48]. As an alternative to RCTs, natural experiments are observational (non-randomized) studies where subjects are assigned to the treatment or control groups based on events determined by factors beyond the control of researchers [20, 49, 50]. Natural experiments are common and have been used extensively in many fields, especially when the exposure of interest cannot be ethically or practically manipulated in experimental settings [49, 51, 52].

2.2 |. Genetic Inheritance as a Natural Experiment

Mendelian randomization (MR), named after Gregor Mendel (1822–1884), who established the laws of Mendelian inheritance [53–55], leverages the random assortment of genetic information during meiosis as a natural experiment to assess the causal relationship between a modifiable exposure and an outcome of interest from observational studies [16, 22, 23]. In biallelic single-nucleotide polymorphisms (SNPs), where two possible alleles exist at a specific locus, the predominant allele in the population is referred to as the wild-type or major allele, while the less common allele is referred to as the variant or minor allele [56, 57]. During meiosis, alleles for unlinked genes are inherited independently, a process governed by Mendel’s Law of Independent Assortment [54, 55, 58]. This process forms the basis for the natural experiment underpinning the MR framework [16, 23, 59].

2.3 |. Instrumental Variable Assumptions and Potential Violations in MR

Just as Archimedes famously claimed, “Give me a place to stand, and I will move the Earth” [60], IV methods echo: “Give me a valid instrument, and I will eliminate confounding.” For reliable causal findings, genetic instruments included in the conventional MR analysis are required to be valid IVs, that is, they should satisfy the following three core IV assumptions [23, 61]:

Assumption A1 (IV relevance). The genetic variant is associated with the exposure.

Assumption A2 (IV independence). The genetic variant is not associated with unmeasured confounders of the exposure-outcome relationship.

Assumption A3 (Exclusion restriction). The genetic variant affects the outcome only through the exposure.

However, all three core IV assumptions might be violated in large-scale genetics data, as shown in Figure 3b. Among the three core IV Assumptions A1–A3, only the IV relevance Assumption A1 is empirically testable by selecting genetic variants associated with the exposure, while the IV independence Assumption A2 and the exclusion restriction Assumption A3 cannot be empirically verified in general [16, 23, 24]. A near violation of the testable Assumption A1 may occur when genetic variants exhibit weak associations with the exposure, leading to potential weak IV bias [62–65]. A violation of Assumption A2 may arise due to population stratification, assortative mating, and dynastic effects [23, 66, 67]. A violation of Assumption A3 may occur due to widespread horizontal pleiotropy, where the genetic variant influences the outcome through other biological pathways that do not involve the exposure of interest [23, 68–70]. Recently, a number of MR methods have been proposed for the identification, estimation and inference of the causal effect of interest when one or more of the three core IV assumptions are potentially violated [32, 71–78]. For a more comprehensive review of identification and inference with invalid IVs, see Kang et al. [79].

FIGURE 3 |. Directed acyclic graphs (DAGs) that show the relationship among an instrumental variable $Z$, a treatment/exposure $D$, an outcome $Y$, and the unmeasured confounding $U$: (a) valid IV and (b) invalid IV. In the right DAG, the dashed red, solid green, and solid blue lines represent violations of the IV relevance (A1), IV independence (A2), and exclusion restriction (A3) assumptions, respectively.

3 |. Causal Estimands in the Potential Outcomes Framework

3.1 |. Definition of Individual Treatment Effect and Average Treatment Effect

Under the potential outcomes framework [6, 43, 44], let $D_i \in \{0,1\}$ denote the binary treatment status (also referred to as the exposure) for subject $i$, and let $Y_i(d)$ denote the potential outcome for subject $i$ if we set $D_i = d \in \{0,1\}$. The individual treatment effect (ITE) [6, 43] for subject $i$ is defined as

$$\mathrm{ITE}_i = Y_i(1) - Y_i(0),$$

which quantifies the difference in the outcome for subject $i$ under treatment versus no treatment. The ITE is generally not identifiable since we cannot observe both $Y_i(1)$ and $Y_i(0)$ for the same subject $i$ at one time [1, 2], a limitation known as the fundamental problem of causal inference.

The average treatment effect (ATE) [36, 43] is defined as

$$\mathrm{ATE} = E[Y_i(1) - Y_i(0)],$$

which measures the difference in mean outcomes had everyone been treated versus had everyone been untreated. For a continuous treatment, an analogous treatment effect can be defined for any two distinct levels $d$ and $d'$. Let $Y_i$ be the observed outcome for subject $i$. Under the following assumptions:

  1. the consistency assumption, that is, $Y_i = Y_i(d)$ whenever $D_i = d$, for $d \in \{0,1\}$, and

  2. the random assignment of treatment status, that is, $\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp D_i$,

the ATE can be identified as follows [36, 80]:

$$\mathrm{ATE} = E[Y_i(1)] - E[Y_i(0)] = E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0],$$

where the second equality holds because of the random assignment of treatment status, and the third equality holds because of the consistency assumption. However, when the treatment status is not randomized, the ATE cannot be identified using the above formula, because in general $E[Y_i(d)] \neq E[Y_i(d) \mid D_i = d']$ for $d, d' \in \{0,1\}$.
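The contrast between randomized and non-randomized assignment can be illustrated with a small simulation. The sketch below is not from the article: the data-generating process (a single unmeasured confounder `u`, a constant effect of 2.0, and the function name `simulate`) is a hypothetical choice made purely for illustration.

```python
import random
from statistics import mean

def simulate(randomized, n=20000, seed=0):
    """Return (true ATE, difference in observed group means) for one simulated cohort.

    Hypothetical data-generating process: u is an unmeasured confounder and the
    individual treatment effect is fixed at 2.0 for every subject.
    """
    rng = random.Random(seed)
    y1s, y0s, treated, control = [], [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                  # unmeasured confounder
        y0 = u + rng.gauss(0, 1)             # potential outcome under control
        y1 = y0 + 2.0                        # potential outcome under treatment
        if randomized:
            d = rng.random() < 0.5           # coin-flip assignment, independent of (y0, y1)
        else:
            d = u + rng.gauss(0, 1) > 0      # assignment depends on the confounder
        y = y1 if d else y0                  # consistency: observe Y(d) when D = d
        (treated if d else control).append(y)
        y1s.append(y1)
        y0s.append(y0)
    return mean(y1s) - mean(y0s), mean(treated) - mean(control)

if __name__ == "__main__":
    print("randomized:  ", simulate(True))
    print("confounded:  ", simulate(False))
```

Under random assignment the difference in observed means should land close to the true ATE, whereas under confounded assignment it is pushed away from it because treated subjects tend to have larger values of `u`.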

3.2 |. Constant Treatment Effect

Consider a binary instrumental variable (IV) $Z_i \in \{0,1\}$. Let $D_i(z)$ denote the binary treatment status for subject $i$ when the IV is set to $Z_i = z \in \{0,1\}$, and let $Y_i(z,d)$ denote the potential outcome for subject $i$ if we set $Z_i = z \in \{0,1\}$ and $D_i = d \in \{0,1\}$. Then, the core IV Assumptions A1’–A3’ can be stated as:

Assumption A1’ (IV relevance). $D_i \not\perp\!\!\!\perp Z_i$.

Assumption A2’ (IV independence). $\{Y_i(0, D_i(0)),\, Y_i(1, D_i(1)),\, D_i(0),\, D_i(1)\} \perp\!\!\!\perp Z_i$.

Assumption A3’ (Exclusion restriction). $Y_i(0,d) = Y_i(1,d) = Y_i(d)$ for $d \in \{0,1\}$.

However, the above Assumptions A1’–A3’ alone are insufficient for the point identification of the causal effect, and hence a fourth assumption is required [1]. One such assumption is the following constant treatment effect (CTE) assumption [1, 8184]:

Assumption A4.1 (Constant treatment effect). $Y_i(1) - Y_i(0) = \beta$ for all subjects $i$.

Let $Y_i(0) = y_0 + \varepsilon_i$, where $y_0 = E[Y_i(0)]$; then the observed outcome $Y_i$ can be written as the following model [82, 83, 85]:

$$Y_i = y_0 + \beta D_i + \varepsilon_i.$$

Since $D_i$ might be correlated with $\varepsilon_i$, regressing the outcome $Y_i$ on the treatment $D_i$ does not consistently estimate $\beta$. However, under the IV independence Assumption A2’, $\varepsilon_i$ is independent of the instrument $Z_i$, implying $E[\varepsilon_i \mid Z_i = 0] = E[\varepsilon_i \mid Z_i = 1]$. Substituting $\varepsilon_i = Y_i - y_0 - \beta D_i$ and solving for $\beta$, it can be shown that the CTE $\beta$ equals the following usual IV estimand $\beta_{\mathrm{IV}}$ [1, 86]:

$$\beta_{\mathrm{IV}} = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}. \tag{1}$$

As illustrated in Figure 4, the usual IV estimand $\beta_{\mathrm{IV}}$ is the slope of the line relating the expected outcome $E[Y_i \mid Z_i]$ to the expected treatment $E[D_i \mid Z_i]$ across the two levels of the IV, $Z_i = z \in \{0,1\}$.
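The substitution step that links the constant treatment effect to the IV estimand can be written out explicitly. The following is a sketch of that algebra under the model and assumptions stated in this subsection; it introduces no new notation.

```latex
\begin{align*}
E[\varepsilon_i \mid Z_i = z]
  &= E[Y_i \mid Z_i = z] - y_0 - \beta\, E[D_i \mid Z_i = z], \qquad z \in \{0,1\}, \\[4pt]
E[\varepsilon_i \mid Z_i = 1] = E[\varepsilon_i \mid Z_i = 0]
  &\;\Longrightarrow\;
  E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]
  = \beta \bigl( E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0] \bigr), \\[4pt]
  &\;\Longrightarrow\;
  \beta = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}
               {E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} .
\end{align*}
```

The last step divides by $E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]$, which is non-zero for a binary instrument and binary treatment whenever the IV relevance assumption holds.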

FIGURE 4 |. Graphical illustration of the usual IV estimand $\beta_{\mathrm{IV}}$, represented by the slope of the solid line.

Conclusion 1. Under Assumptions A1’–A3’ and A4.1, the usual IV estimand (1) identifies the constant treatment effect.
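A sample analogue of the IV estimand in Equation (1) replaces each conditional expectation with a group mean. The sketch below is a minimal illustration; the function name `wald_estimate` and the toy data are hypothetical, and no standard error is computed.

```python
from statistics import mean

def wald_estimate(z, d, y):
    """Sample analogue of the IV estimand (1) for a binary instrument.

    z, d, y are equal-length sequences: instrument (0/1), treatment, outcome.
    """
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)   # estimate of E[Y | Z = 1]
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)   # estimate of E[Y | Z = 0]
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)   # estimate of E[D | Z = 1]
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)   # estimate of E[D | Z = 0]
    if d1 == d0:
        raise ValueError("instrument is unrelated to treatment in this sample")
    return (y1 - y0) / (d1 - d0)

# Toy data (made up): the outcome is generated as y = 1 + 2 * d.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 1, 1, 3, 1, 3, 3, 3]

if __name__ == "__main__":
    print(wald_estimate(z, d, y))
```

The explicit check on the denominator mirrors the role of the IV relevance assumption: when the instrument does not shift the treatment, the ratio is undefined.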

3.3 |. Average Treatment Effect on the Treated

In this section, we consider the following additive homogeneity assumption [1, 84, 87], which is weaker than Assumption A4.1 and only requires that the average treatment effect is the same across different levels of $Z_i$ for both treated and untreated groups, that is,

Assumption A4.2 (Additive homogeneity). $E[Y_i(1) - Y_i(0) \mid D_i = d, Z_i = 1] = E[Y_i(1) - Y_i(0) \mid D_i = d, Z_i = 0]$ for $d \in \{0,1\}$.

For a binary treatment $D_i$ and a binary IV $Z_i$, we can express the average treatment effect among the treated across different levels of $Z_i$ using the following saturated additive structural mean model [1, 84, 87]:

$$E[Y_i(1) - Y_i(0) \mid D_i = 1, Z_i] = \beta_0 + \beta_1 Z_i.$$

With the consistency assumption, the above model can be rewritten as $E[Y_i - Y_i(0) \mid D_i, Z_i] = D_i(\beta_0 + \beta_1 Z_i)$ [1]. Here, $\beta_0$ represents the average treatment effect among the treated individuals with $Z_i = 0$, and $\beta_0 + \beta_1$ represents the average treatment effect among the treated individuals with $Z_i = 1$. The additive homogeneity Assumption A4.2 implies $\beta_1 = 0$, and then the parameter $\beta_0$ corresponds to the average treatment effect on the treated (ATT) [84, 87, 88]:

$$\mathrm{ATT} = E[Y_i(1) - Y_i(0) \mid D_i = 1].$$

Under Assumption A4.2, $E[Y_i(0) \mid Z_i = z] = E[Y_i - D_i\beta_0 \mid Z_i = z]$. Under the IV independence Assumption A2’, $E[Y_i(0) \mid Z_i = 0] = E[Y_i(0) \mid Z_i = 1]$. Solving the equation $E[Y_i - D_i\beta_0 \mid Z_i = 0] = E[Y_i - D_i\beta_0 \mid Z_i = 1]$ shows that the parameter $\beta_0$ equals the usual IV estimand $\beta_{\mathrm{IV}}$ defined in Equation (1). More recently, Liu et al. [74] and Liu et al. [89] have further investigated the identification of the ATT under potential violations of core IV assumptions.

Conclusion 2. Under Assumptions A1’–A3’ and A4.2, the usual IV estimand (1) identifies the average treatment effect on the treated.

3.4 |. Local Average Treatment Effect

An alternative fourth identification assumption is the following monotonicity assumption [11, 36]:

Assumption A4.3 (Monotonicity). $D_i(1) \geq D_i(0)$ for all subjects $i$.

Then, under Assumptions A1’–A3’ and A4.3, the usual IV estimand $\beta_{\mathrm{IV}}$ in Equation (1) identifies the local average treatment effect (LATE) in the subgroup of compliers (i.e., subjects with $D_i(0) = 0$ and $D_i(1) = 1$) [11, 36], which is defined as:

$$\mathrm{LATE} = E[Y_i(1) - Y_i(0) \mid \text{Compliers}].$$

For a binary IV $Z_i$ and binary treatment status $D_i$, the entire population is divided into four latent subgroups, known as compliance types [11, 36], as shown in Table 1:

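TABLE 1 |. Four compliance types based on the values of $D_i(z)$ for $z \in \{0,1\}$.

| Compliance type | $Z_i = 0$ | $Z_i = 1$ |
| --- | --- | --- |
| Complier | $D_i(0) = 0$ | $D_i(1) = 1$ |
| Always-taker | $D_i(0) = 1$ | $D_i(1) = 1$ |
| Never-taker | $D_i(0) = 0$ | $D_i(1) = 0$ |
| Defier | $D_i(0) = 1$ | $D_i(1) = 0$ |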

The compliance type of subject $i$ is generally latent since we cannot observe both $D_i(0)$ and $D_i(1)$ at one time. Under the monotonicity Assumption A4.3, there are no defiers in the population. Additionally, the IV $Z_i$ has no effect on $Y_i$ in the subgroups of always-takers or never-takers, as the treatment status $D_i$ is fixed across different levels of $Z_i$ in these two subgroups. Therefore, the IV $Z_i$ can only affect the outcome $Y_i$ in the subgroup of compliers, meaning that the usual IV estimand identifies the LATE in compliers.

Conclusion 3. Under Assumptions A1’–A3’ and A4.3, the usual IV estimand (1) identifies the local average treatment effect in compliers.
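The complier interpretation can be checked numerically. The sketch below simulates a population without defiers in which the three remaining compliance types have different treatment effects; the mixing proportions, effect sizes, and the name `simulate_wald` are hypothetical values chosen for illustration.

```python
import random
from statistics import mean

# (D(0), D(1), individual treatment effect); all values are illustrative.
TYPES = {
    "complier": (0, 1, 1.0),
    "always-taker": (1, 1, 3.0),
    "never-taker": (0, 0, -1.0),
}
WEIGHTS = [0.5, 0.2, 0.3]   # population shares, same order as TYPES

def simulate_wald(n=40000, seed=1):
    """Simulate a randomized binary instrument and return the Wald ratio."""
    rng = random.Random(seed)
    names = list(TYPES)
    arms = {0: {"d": [], "y": []}, 1: {"d": [], "y": []}}
    for _ in range(n):
        d0, d1, effect = TYPES[rng.choices(names, weights=WEIGHTS)[0]]
        z = rng.randint(0, 1)               # instrument assigned independently of type
        d = d1 if z == 1 else d0            # realized treatment D = D(Z)
        y = rng.gauss(0, 1) + effect * d    # Y = Y(0) + effect * D (exclusion restriction)
        arms[z]["d"].append(d)
        arms[z]["y"].append(y)
    numerator = mean(arms[1]["y"]) - mean(arms[0]["y"])
    denominator = mean(arms[1]["d"]) - mean(arms[0]["d"])
    return numerator / denominator

if __name__ == "__main__":
    population_ate = sum(w * TYPES[k][2] for w, k in zip(WEIGHTS, TYPES))
    print("Wald ratio:", simulate_wald(), "| complier effect: 1.0 | ATE:", population_ate)
```

In this design the always-takers and never-takers contribute equally to both instrument arms, so the ratio should track the complier effect rather than the population-wide average.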

3.5 |. Identification of Average Treatment Effect

Let $U_i$ denote the unmeasured confounders. Wang & Tchetgen Tchetgen [90] propose the following two no-interaction assumptions for the identification of the ATE:

Assumption A4.4 (No additive $U_i$-$Z_i$ interaction). There is no additive $U_i$-$Z_i$ interaction in $E[D_i \mid Z_i, U_i]$, that is, $E[D_i \mid Z_i = 1, U_i] - E[D_i \mid Z_i = 0, U_i] = E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]$.

Assumption A4.5 (No additive $U_i$-$d$ interaction). There is no additive $U_i$-$d$ interaction in $E[Y_i(d) \mid U_i]$, that is, $E[Y_i(1) - Y_i(0) \mid U_i] = E[Y_i(1) - Y_i(0)]$.

Intuitively, Assumption A4.4 rules out modification of the instrument-treatment association by $U_i$ on the additive scale, whereas Assumption A4.5 rules out modification of the effect of the treatment on the outcome by $U_i$ on the additive scale. In addition, Wang & Tchetgen Tchetgen [90] also impose the following assumption for confounding control:

Assumption A5 (Sufficiency of $U_i$ for confounding control). $Y_i(d) \perp\!\!\!\perp (D_i, Z_i) \mid U_i$.

Assumption A5 requires that, conditional on the unmeasured confounders $U_i$, the potential outcomes are independent of the treatment status and the instrument. This assumption was originally formulated by Richardson & Robins [91].

Conclusion 4. Under Assumptions A1’–A3’ and A5, together with either Assumption A4.4 or A4.5, the usual IV estimand (1) identifies the average treatment effect.

3.6 |. Comparisons of the Causal Estimands

We compare the four causal estimands (CTE, ATT, LATE, and ATE) in Table 2. The identification of all four causal estimands requires Assumptions A1’–A3’, which are therefore referred to as the core IV assumptions [11, 13]. However, these core IV Assumptions A1’–A3’ are insufficient for point identification, and the four causal estimands differ in their additional identification assumption [1]. The CTE relies on the strong constant treatment effect Assumption A4.1, which posits that the treatment effect is the same across all subjects. By contrast, the ATT relies on the additive homogeneity Assumption A4.2, a weaker identification assumption than A4.1, and corresponds to the average treatment effect among those who actually received the treatment. The LATE is identified by imposing the monotonicity Assumption A4.3 as the additional identification assumption, and is interpreted as the average treatment effect in the subgroup of compliers, that is, subjects who would receive the treatment if assigned to it and not receive it otherwise. Finally, the ATE, which captures the average treatment effect for the entire population, is identified under either the no-interaction Assumption A4.4 or A4.5, together with the confounding control Assumption A5.

TABLE 2 |. Comparison of identification assumptions and interpretations for CTE, ATE, ATT, and LATE.

| Causal estimand | Identification assumptions | Causal interpretation |
| --- | --- | --- |
| CTE | IV relevance A1’; IV independence A2’; Exclusion restriction A3’; Constant treatment effect A4.1. | Constant treatment effect of the treatment versus control across all subjects. |
| ATT | IV relevance A1’; IV independence A2’; Exclusion restriction A3’; Additive homogeneity A4.2. | Average treatment effect of the treatment versus control specifically for subjects that actually received the treatment. |
| LATE | IV relevance A1’; IV independence A2’; Exclusion restriction A3’; Monotonicity A4.3. | Average treatment effect of the treatment versus control specifically for the compliers. |
| ATE | IV relevance A1’; IV independence A2’; Exclusion restriction A3’; No-interaction A4.4 or A4.5; Confounding control A5. | Average treatment effect of the treatment versus control for the entire population. |

4 |. Causal Estimand Defined in the ALICE Model

4.1 |. Definition of the ALICE Model

Having introduced the key causal estimands (CTE, ATT, LATE, and ATE) and their identification under IV assumptions, we now turn to a specific causal model that has become the workhorse of Mendelian randomization studies. The Additive LInear Constant Effects (ALICE) model [92] formalizes the constant treatment effect Assumption A4.1 introduced in Section 3, providing a simple yet widely used framework for characterizing causal effects in MR. Let $D_i \in \mathbb{R}$ denote the exposure of subject $i$, and let $Z_i = (Z_{i1}, \ldots, Z_{ip})^\top \in \mathbb{R}^p$ denote the vector of $p$ genetic instruments of subject $i$. Let $Y_i(z,d) \in \mathbb{R}$ denote the continuous potential outcome if subject $i$ had $Z_i = z = (z_1, \ldots, z_p)^\top$ and $D_i = d$. Then, for two possible values of the instruments, $z'$ and $z$, and of the exposure, $d'$ and $d$, we assume the following model [2, 73, 93]:

$$Y_i(z', d') - Y_i(z, d) = \beta (d' - d) + \sum_{j=1}^{p} \psi_j (z'_j - z_j),$$
$$E[Y_i(0,0) \mid Z_i] = \sum_{j=1}^{p} \phi_j Z_{ij}, \tag{2}$$

where $\beta \in \mathbb{R}$ is the primary causal parameter of interest, representing the constant effect of a one-unit change in the exposure on the outcome across all subjects in the whole population. The parameter $\psi_j \in \mathbb{R}$ quantifies the degree of violation of the exclusion restriction Assumption A3 for the $j$th instrument, capturing the direct effect of the instrument on the potential outcome. The parameter $\phi_j \in \mathbb{R}$ quantifies the degree of violation of the IV independence Assumption A2 for the $j$th genetic instrument. Under the IV independence Assumption A2, the instruments $Z_i$ should be independent of the baseline potential outcome $Y_i(0,0)$ in the absence of confounding. However, in model (2), the relationship between $Z_i$ and $Y_i(0,0)$ is modeled through $\phi_1, \ldots, \phi_p$, allowing for potential violations of Assumption A2 [11, 73, 93]. Let $\pi_j = \psi_j + \phi_j$ for $j = 1, \ldots, p$, and $\varepsilon_i = Y_i(0,0) - E[Y_i(0,0) \mid Z_i]$; then, under the consistency assumption, we have the following observed outcome model [73, 93, 94]:

$$Y_i = \beta D_i + \sum_{j=1}^{p} \pi_j Z_{ij} + \varepsilon_i, \qquad E[\varepsilon_i \mid Z_i] = 0. \tag{3}$$

The causal effect $\beta$ in model (3) cannot be estimated by directly fitting a usual linear regression because the exposure $D_i$ might be correlated with the error term $\varepsilon_i$. Moreover, in model (3), the parameter $\pi_j \in \mathbb{R}$ encodes the degree of violation of Assumptions A2 and A3 for the $j$th genetic instrument. Specifically, if the $j$th genetic instrument satisfies both the exclusion restriction assumption and the IV independence assumption, then $\pi_j = 0$; otherwise, if $\pi_j \neq 0$, the $j$th genetic instrument violates at least one of the exclusion restriction assumption or the IV independence assumption [73, 75, 77, 79, 93, 95, 96]. Therefore, we say the $j$th genetic instrument is a valid IV if $\pi_j = 0$, and an invalid IV if $\pi_j \neq 0$. In Section 5, we will discuss the identification and inference in the presence of invalid IVs under the ALICE model framework.

Remark 1. Kang et al. [93] extend model (2) to incorporate heterogeneous causal effects as follows:

$$Y_i(z', d') - Y_i(z, d) = \beta_i (d' - d) + \sum_{j=1}^{p} \psi_j (z'_j - z_j),$$

where $\beta_i$ is the individual causal effect of subject $i$. Letting $\beta = E[\beta_i]$ be the average causal effect, the observed outcome model (3) becomes

$$Y_i = \beta D_i + \sum_{j=1}^{p} \pi_j Z_{ij} + (\beta_i - \beta) D_i + \varepsilon_i, \qquad E[\varepsilon_i \mid Z_i] = 0.$$

This model reduces to the constant causal effect model (3) if $(\beta_i - \beta)$ is independent of $D_i$ given $Z_i$ [93].

4.2 |. ALICE Model is Widely Used in MR Studies

To model the relationship between a continuous exposure and genetic instruments, we further consider a linear model between the exposure $D_i$ and the genetic instruments $Z_i$ [11, 73, 94]:

$$D_i = \sum_{j=1}^{p} \gamma_j Z_{ij} + \delta_i, \qquad E[\delta_i \mid Z_i] = 0, \tag{4}$$

where $\gamma_j$ represents the IV strength of the $j$th genetic instrument.

Note that the error term $\delta_i$ in the exposure model (4) might be correlated with the error term $\varepsilon_i$ in the outcome model (3) due to unmeasured confounders. By plugging the exposure model (4) into the outcome model (3), we obtain the reduced-form model for the outcome [73, 94]:

$$Y_i = \sum_{j=1}^{p} \Gamma_j Z_{ij} + e_i, \qquad E[e_i \mid Z_i] = 0, \tag{5}$$

where $\Gamma_j = \beta\gamma_j + \pi_j$, and $e_i = \beta\delta_i + \varepsilon_i$.
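The interplay of models (3), (4), and (5) can be illustrated by simulating a single instrument. The sketch below is hypothetical: the allele frequency of 0.3, the parameter values, the shared confounder `u` entering both error terms, and the function name `simulate_alice` are assumptions made for illustration only.

```python
import random

def cov(a, b):
    """Population-style covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate_alice(n=50000, beta=0.4, gamma=0.5, pi=0.0, seed=3):
    """Simulate models (3) and (4) with one instrument.

    Returns (OLS slope of Y on D, ratio Gamma_hat / gamma_hat).
    """
    rng = random.Random(seed)
    Z, D, Y = [], [], []
    for _ in range(n):
        z = sum(rng.random() < 0.3 for _ in range(2))   # allele count in {0, 1, 2}
        u = rng.gauss(0, 1)                              # unmeasured confounder
        d = gamma * z + u + rng.gauss(0, 1)              # model (4): delta_i = u + noise
        y = beta * d + pi * z + u + rng.gauss(0, 1)      # model (3): eps_i = u + noise
        Z.append(z)
        D.append(d)
        Y.append(y)
    ols_slope = cov(D, Y) / cov(D, D)        # biased: D is correlated with eps through u
    gamma_hat = cov(Z, D) / cov(Z, Z)        # IV-exposure association
    Gamma_hat = cov(Z, Y) / cov(Z, Z)        # IV-outcome association, reduced form (5)
    return ols_slope, Gamma_hat / gamma_hat

if __name__ == "__main__":
    print("valid IV (pi = 0):    ", simulate_alice())
    print("invalid IV (pi = 0.1):", simulate_alice(pi=0.1))
```

With `pi=0` the ratio should sit near `beta` while the regression slope does not; with a non-zero `pi` the ratio shifts by roughly `pi / gamma`, which is the relation $\Gamma_j = \beta\gamma_j + \pi_j$ divided through by $\gamma_j$.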

Most summary-level MR methods for continuous outcomes build upon the ALICE model. For a single genetic instrument $j$, according to the equation $\Gamma_j = \beta\gamma_j + \pi_j$, the ratio estimand is defined as follows [97, 98]:

$$\beta_j = \frac{\Gamma_j}{\gamma_j} = \beta + \frac{\pi_j}{\gamma_j}. \tag{6}$$

When the $j$th genetic instrument is a valid IV (i.e., $\pi_j = 0$), the ratio estimand $\beta_j$ equals the causal effect $\beta$ in the ALICE model. In summary-level MR analysis, the ratio estimate of the $j$th SNP is defined as $\hat{\beta}_j = \hat{\Gamma}_j / \hat{\gamma}_j$, where $\hat{\Gamma}_j$ and $\hat{\gamma}_j$ are marginal estimates of $\Gamma_j$ and $\gamma_j$ in genome-wide association studies (GWAS) summary statistics.
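Computing per-SNP ratio estimates from summary statistics is a one-line operation per variant. The sketch below also attaches a first-order delta-method standard error that ignores uncertainty in $\hat{\gamma}_j$; that approximation, the function name `ratio_estimates`, and the numbers are assumptions added here for illustration and are not taken from any GWAS.

```python
def ratio_estimates(gamma_hat, Gamma_hat, se_Gamma):
    """Per-SNP ratio estimates Gamma_hat_j / gamma_hat_j.

    Returns a list of (estimate, approximate standard error) pairs, where the
    standard error is the first-order approximation se(Gamma_hat_j) / |gamma_hat_j|
    (an assumption: it treats gamma_hat_j as fixed).
    """
    results = []
    for g, G, s in zip(gamma_hat, Gamma_hat, se_Gamma):
        if g == 0:
            raise ValueError("ratio estimate is undefined when gamma_hat_j is zero")
        results.append((G / g, s / abs(g)))
    return results

# Made-up summary statistics for three SNPs.
gamma_hat = [0.10, 0.20, 0.05]   # SNP-exposure associations
Gamma_hat = [0.05, 0.10, 0.04]   # SNP-outcome associations
se_Gamma = [0.01, 0.01, 0.01]    # standard errors of the SNP-outcome associations

if __name__ == "__main__":
    for est, se in ratio_estimates(gamma_hat, Gamma_hat, se_Gamma):
        print(round(est, 3), round(se, 3))
```

Note how the approximate standard error grows as $|\hat{\gamma}_j|$ shrinks, which is one way to see why weakly associated variants yield noisy ratio estimates.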

4.3 |. Practical Limitations and Interpretational Caveats of the ALICE Model

Although the ALICE model provides a useful framework for causal inference with potentially invalid instrumental variables, it is subject to several important limitations and interpretational caveats. First, the causal effect defined in the ALICE model should be interpreted as a constant treatment effect of a one-unit increase in the exposure on the outcome [2, 93, 94]. However, the constant treatment effect assumption may not hold in real-world settings where treatment effects may vary across subjects [99–101]. Second, the ALICE model assumes linearity not only in the causal effect but also in the violation of core IV assumptions. This linearity assumption might be violated when the underlying relationships are nonlinear, such as in complex genetic architectures [75, 102, 103]. Therefore, when applying the ALICE model, it is essential to carefully assess the plausibility of its assumptions within the context of the study and to interpret the resulting estimates with appropriate caution.

5 |. Identification and Inference in the Presence of Invalid IVs

5.1 |. Additional Identification Assumption Under the ALICE Model

When the $j$th genetic IV is a valid instrument (i.e., $\pi_j = 0$), the ALICE model in Section 4 enables causal identification through the ratio $\Gamma_j/\gamma_j$. However, when invalid instruments are present without prior knowledge of IV validity status, the causal effect $\beta$ in model (3) becomes non-identifiable. This is because the parameters in models (4) and (5) should satisfy the following equation system:

$$\Gamma_j = \beta\gamma_j + \pi_j, \qquad j = 1, \ldots, p, \tag{7}$$

where the IV-exposure associations $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_p)^\top$ and IV-outcome associations $\boldsymbol{\Gamma} = (\Gamma_1, \ldots, \Gamma_p)^\top$ can be identified using population ordinary least squares (OLS) through $\boldsymbol{\gamma} = E[Z_i Z_i^\top]^{-1} E[Z_i D_i]$ and $\boldsymbol{\Gamma} = E[Z_i Z_i^\top]^{-1} E[Z_i Y_i]$. Given $\boldsymbol{\gamma}$ and $\boldsymbol{\Gamma}$, there are $p$ equations with $p+1$ unknown parameters $(\beta, \pi_1, \ldots, \pi_p)$, resulting in an underdetermined equation system that precludes unique identification of $\beta, \pi_1, \ldots, \pi_p$ and renders models (3) and (4) under-identified. Consequently, additional assumptions regarding $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_p)^\top$ are required to address the identifiability issue. Below, we list three commonly adopted additional identification assumptions in the ALICE model.
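The under-identification can be made concrete: for any candidate value of $\beta$, setting $\pi_j = \Gamma_j - \beta\gamma_j$ reproduces the same $\boldsymbol{\gamma}$ and $\boldsymbol{\Gamma}$. The sketch below uses made-up association values and hypothetical helper names (`implied_pi`, `satisfies_system`).

```python
def implied_pi(beta, gamma, Gamma):
    """Solve system (7) for pi given a candidate beta: pi_j = Gamma_j - beta * gamma_j."""
    return [G - beta * g for g, G in zip(gamma, Gamma)]

def satisfies_system(beta, pi, gamma, Gamma, tol=1e-12):
    """Check Gamma_j = beta * gamma_j + pi_j for every instrument j."""
    return all(abs(beta * g + p - G) < tol for g, p, G in zip(gamma, pi, Gamma))

gamma = [0.2, 0.4, 0.5]     # hypothetical IV-exposure associations
Gamma = [0.1, 0.2, 0.35]    # hypothetical IV-outcome associations

if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.0):
        pi = implied_pi(beta, gamma, Gamma)
        print(beta, satisfies_system(beta, pi, gamma, Gamma), [round(p, 3) for p in pi])
```

Every candidate passes the check, so the data alone cannot distinguish among them; restrictions on $\boldsymbol{\pi}$ are what single out one value of $\beta$.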

Assumption A6 (Instrument strength independent of the direct effect (InSIDE)). The IV-exposure association $\gamma_j$ is asymptotically independent of the degree of IV invalidity $\pi_j$ as the number of genetic IVs $p$ goes to infinity.

From the equation system (7), we have

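$$\frac{\mathrm{Cov}(\boldsymbol{\Gamma}, \gamma)}{\mathrm{Var}(\gamma)} = \beta + \frac{\mathrm{Cov}(\pi, \gamma)}{\mathrm{Var}(\gamma)}.$$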

Under the InSIDE assumption, $\mathrm{Cov}(\pi, \gamma) \to 0$ as $p \to \infty$, yielding the identification of $\beta$ [71, 104]. The InSIDE assumption has been adopted in some summary-level MR methods, for example, MR-Egger [71], the random-effects inverse-variance weighted (IVW) method [72], and MR using the Robust Adjusted Profile Score (MR-RAPS) [78].
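The identification argument under InSIDE can be illustrated numerically. The following is a minimal sketch, not an implementation of MR-Egger or any other cited method: it draws population-level associations from equation (7) with the invalidity terms generated independently of the IV-exposure associations, and checks that the regression slope of the IV-outcome associations on the IV-exposure associations is close to the causal effect. The number of instruments, the effect size, and the normal distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, beta = 5000, 0.5                       # number of IVs and true causal effect (assumed values)

gamma = rng.normal(0.0, 1.0, size=p)      # IV-exposure associations
pi = rng.normal(0.0, 0.3, size=p)         # invalidity terms, drawn independently of gamma (InSIDE)
Gamma = beta * gamma + pi                 # IV-outcome associations from equation (7)

# Cov(Gamma, gamma) / Var(gamma) = beta + Cov(pi, gamma) / Var(gamma)
slope = np.cov(Gamma, gamma)[0, 1] / np.var(gamma, ddof=1)
inside_gap = np.cov(pi, gamma)[0, 1] / np.var(gamma, ddof=1)

print(f"slope = {slope:.3f}, Cov(pi, gamma)/Var(gamma) = {inside_gap:.3f}")
```

If the invalidity terms were instead generated as a function of the IV-exposure associations, the second term would not vanish and the slope would no longer recover the causal effect.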

Assumption A7 (Majority rule). The number of valid genetic IVs is more than half of the relevant genetic IVs.

The majority rule assumption is a sufficient condition for the identification of $\beta$ under the ALICE model framework [93, 105]. Formally, let $S = \{j : \gamma_j \neq 0\}$ denote the set of all relevant genetic IVs with non-zero IV-exposure associations, and $\mathcal{V} = \{j \in S : \gamma_j \neq 0 \text{ and } \pi_j = 0\}$ denote the set of all valid genetic IVs; then the majority rule assumption can be expressed as $|\mathcal{V}| > \frac{1}{2}|S|$. Under the majority rule assumption, more than half of the ratio estimands $\beta_j$ defined in Equation (6) equal the true causal effect $\beta$, since they arise from valid instruments with $\pi_j = 0$ [106]. A natural identification strategy is therefore to find the median of $\{\beta_j\}_{j \in S}$ [106]. Some MR methods based on the majority rule assumption include Some Invalid Some Valid IV Estimator (sisVIVE) [93], weighted median method [106], and MR Pleiotropy RESidual Sum and Outlier test (MR-PRESSO) [76].

Assumption A8 (Plurality rule). Valid genetic IVs form the largest group among relevant genetic IVs based on the ratio of IV-outcome association to IV-exposure association.

As shown in Guo et al. [73], the plurality rule assumption is weaker than the majority rule assumption, and is a sufficient condition for the identification of causal effect $\beta$ under the ALICE model framework. Formally, the plurality rule assumption can be expressed as $|\mathcal{V}| > \max_{c \neq 0} |\{j \in S : \pi_j/\gamma_j = c\}|$. Under this assumption, the true causal effect $\beta$ corresponds to the mode of the distribution of $\{\beta_j\}_{j \in S}$ [107]. Thus, the identification of the causal effect $\beta$ can be achieved by detecting the largest group of ratio estimands, either by direct mode estimation [107] or via voting-based procedures [32, 73]. The plurality rule assumption is also termed as the ZEro Modal Pleiotropy Assumption (ZEMPA) [107], and is adopted in MR methods including the mode-based estimation [107], Two-Stage Hard Thresholding (TSHT) [73], MRMix [108], the contamination mixture method [109], Confidence Interval method for Instrumental Variable (CIIV) [96], and MR with valid IV Selection and Post-selection Inference (MR-SPI) [32].
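The difference between the majority and plurality rules can be seen on noiseless ratio estimands. The sketch below is a population-level illustration only, with hypothetical values for the associations and a hypothetical helper `modal_value`; it does not implement the weighted median, mode-based, or voting estimators cited above, which must additionally handle sampling noise.

```python
import numpy as np
from collections import Counter

beta = 0.5                        # true causal effect (assumed value)
gamma = np.full(10, 0.2)          # IV-exposure associations; all 10 IVs are relevant

def ratio_estimands(pi):
    """Ratio estimands beta_j = Gamma_j / gamma_j, with Gamma_j taken from equation (7)."""
    Gamma = beta * gamma + pi
    return Gamma / gamma

def modal_value(values, decimals=6):
    """Most frequent value after rounding (hypothetical helper for noiseless estimands)."""
    return Counter(np.round(values, decimals).tolist()).most_common(1)[0][0]

# Scenario 1: 6 of 10 IVs are valid, so the majority rule holds.
pi_majority = np.array([0, 0, 0, 0, 0, 0, 0.08, 0.08, 0.14, 0.14])
# Scenario 2: 4 valid IVs versus two invalid groups of 3; only the plurality rule holds.
pi_plurality = np.array([0, 0, 0, 0, 0.08, 0.08, 0.08, 0.14, 0.14, 0.14])

median_majority = float(np.median(ratio_estimands(pi_majority)))
median_plurality = float(np.median(ratio_estimands(pi_plurality)))
mode_plurality = modal_value(ratio_estimands(pi_plurality))

print(median_majority, median_plurality, mode_plurality)
```

In the second scenario the median lands in an invalid group, whereas the mode still picks out the valid group, which is the sense in which the plurality rule is the weaker assumption.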

Remark 2. Equation (7) demonstrates that, in the presence of unknown instrument invalidity, the causal effect β in the ALICE model is generally not identifiable. Assumptions A6–A8 restore identification by imposing different constraints on π, which encodes the degree of violation of Assumptions A2 and A3. Because Assumptions A6–A8 cannot be empirically tested with data, a common practice is to employ multiple MR methods relying on different assumptions as sensitivity analyses to evaluate the robustness of MR findings.

5.2 |. Alternative Identification Strategies Beyond the ALICE Model Framework

Sun et al. [75] considers the following model under the potential outcomes framework:

$$Y_i(z, d) - Y_i(0, d') = \beta(d - d') + \psi(z),$$

where $\psi(\cdot)$ is an unknown function that satisfies $\psi(0) = 0$, which allows for arbitrary interactions among the direct effects of the instruments on the outcome. The ALICE model is a special case of the above model by specifying $\psi(z) = \sum_{j=1}^{p} \psi_j z_j$ and $E[Y_i(0,0) \mid Z_i] = \sum_{j=1}^{p} \phi_j Z_{ij}$ [75]. Under this model, the set of valid instruments is defined as the index set $\mathcal{V} \subseteq \{1, \ldots, p\}$ such that $\psi(Z_i) = \psi(Z_{i,-\mathcal{V}})$ and $E[Y_i(0,0) \mid Z_i] = E[Y_i(0,0) \mid Z_{i,-\mathcal{V}}]$ hold almost surely, where $Z_{i,-\mathcal{V}} = (Z_{ij} : j \notin \mathcal{V})$ [75]. When all $p$ instruments are mutually independent and there are at least $v$ valid instruments, Sun et al. [75] shows that the causal effect $\beta$ in the above model is the unique solution to the following equation:

$$E\left[h^{[v]}(Z_i)\,(Y_i - \beta D_i)\right] = 0,$$

where the function $h^{[v]}(Z_i) \in \mathbb{R}^m$ with $m = \sum_{j=0}^{v-1} \binom{p}{j}$ represents all demeaned interactions involving at least $p - v + 1$ instruments. For example, when there are $p = 2$ instruments and there is at least $v = 1$ valid instrument, then there is only one demeaned interaction $(Z_{i1} - \mu_1)(Z_{i2} - \mu_2)$ involving at least 2 instruments, where $\mu_1$ and $\mu_2$ are the expectations of $Z_{i1}$ and $Z_{i2}$, respectively. When $(Z_{i1} - \mu_1)(Z_{i2} - \mu_2)$ is associated with the exposure $D_i$, then $\beta$ is the unique solution to

$$E\left[(Z_{i1} - \mu_1)(Z_{i2} - \mu_2)(Y_i - \beta D_i)\right] = 0.$$
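The two-instrument example can be checked by simulation. The sketch below solves the sample analogue of the last estimating equation in closed form; it is an illustration under an assumed data-generating process (independent genotypes coded 0/1/2, a linear direct effect of the first instrument on the outcome, and an interaction effect on the exposure) and is not the MRSquare software of Sun et al. [75]. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 200_000, 0.5                         # sample size and true causal effect (assumed values)

Z1 = rng.binomial(2, 0.3, size=n).astype(float)
Z2 = rng.binomial(2, 0.3, size=n).astype(float)   # independent of Z1
U = rng.standard_normal(n)                     # unmeasured confounder

# The exposure depends on the interaction Z1 * Z2, so the interaction is relevant.
D = 0.5 * Z1 + 0.5 * Z2 + 0.8 * Z1 * Z2 + U + rng.standard_normal(n)
# Z1 is invalid (direct effect on Y); Z2 is valid, so at least v = 1 instrument is valid.
Y = beta * D + 1.0 * Z1 + U + rng.standard_normal(n)

# Demeaned interaction used as the constructed instrument.
H = (Z1 - Z1.mean()) * (Z2 - Z2.mean())

# Sample analogue of E[H (Y - beta D)] = 0, solved for beta.
beta_hat = float(np.sum(H * Y) / np.sum(H * D))
print(f"beta_hat = {beta_hat:.3f}")
```

The direct effect of the first instrument drops out because the demeaned second instrument has mean zero and is independent of the first.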

Remark 3. As discussed in Kang et al. [79], Sun et al. [75] uses higher order interactions to create “new” instruments from the p instruments, which can capture possible nonlinear effects of instruments on the exposure. In contrast, Guo et al. [110] applies machine learning algorithms to explore nonlinear effects of instruments on the exposure.

Tchetgen Tchetgen et al. [111] proposes the MR G-Estimation under No Interaction with Unmeasured Selection (MR-GENIUS) approach that leverages heteroscedasticity in the exposure to identify the causal effect. Specifically, Tchetgen Tchetgen et al. [111] considers the following model:

$$E[Y_i \mid D_i, Z_i, U_i] = \beta_y(U_i) D_i + \alpha_y(U_i, Z_i) + \eta_y(U_i),$$
$$E[D_i \mid Z_i, U_i] = \alpha_d(U_i, Z_i) + \eta_d(U_i),$$

where $\beta_y(\cdot)$ is an unspecified function of the unmeasured confounder $U_i$, which affects both the exposure $D_i$ and the outcome $Y_i$, and is independent of the instruments $Z_i$. The terms $\eta_y(\cdot)$ and $\eta_d(\cdot)$ are two unspecified functions of $U_i$, and $\alpha_y(\cdot)$ and $\alpha_d(\cdot)$ are two unspecified functions of $(U_i, Z_i)$ satisfying $\alpha_y(U, 0) = \alpha_d(U, 0) = 0$. When the exposure $D_i$ is heteroscedastic, that is, $\mathrm{Var}(D_i \mid Z_i)$ varies with the instruments $Z_i$, Tchetgen Tchetgen et al. [111] shows that the average causal effect $\beta = E[\beta_y(U_i)]$ in the above model is the unique solution to the following equation:

$$E\left[(Z_i - E[Z_i])\,(D_i - E[D_i \mid Z_i])\,(Y_i - \beta D_i)\right] = 0.$$

Remark 4. As discussed in Tchetgen Tchetgen et al. [111], MR-GENIUS might not perform well when $\mathrm{Var}(D_i \mid Z_i)$ is only weakly dependent on the instruments $Z_i$. Ye et al. [112] extends MR-GENIUS to allow for many weak invalid instruments.
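A small simulation can make the role of exposure heteroscedasticity concrete. The sketch below solves the sample analogue of the MR-GENIUS estimating equation for a single instrument, estimating the conditional mean of the exposure by genotype-specific means. It is a hedged illustration under an assumed data-generating process with a constant causal effect and additive direct effect, not the authors' software, and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 200_000, 0.5                       # sample size and true causal effect (assumed values)

Z = rng.binomial(2, 0.3, size=n)             # single genotype coded 0/1/2
U = rng.standard_normal(n)                   # unmeasured confounder, independent of Z

# Heteroscedastic exposure: the residual standard deviation depends on Z.
D = 0.3 * Z + U + (1.0 + 0.7 * Z) * rng.standard_normal(n)
# Invalid instrument: Z has a direct effect on the outcome.
Y = beta * D + 0.5 * Z + U + rng.standard_normal(n)

# Estimate E[D | Z] by the mean exposure within each genotype group.
group_mean_D = np.bincount(Z, weights=D) / np.bincount(Z)
D_resid = D - group_mean_D[Z]

# Sample analogue of E[(Z - E[Z]) (D - E[D|Z]) (Y - beta D)] = 0, solved for beta.
w = (Z - Z.mean()) * D_resid
beta_genius = float(np.sum(w * Y) / np.sum(w * D))

# Conventional ratio estimate for comparison; biased here by the direct effect of Z.
beta_ratio = float(np.cov(Z, Y)[0, 1] / np.cov(Z, D)[0, 1])
print(f"GENIUS-type estimate = {beta_genius:.3f}, ratio estimate = {beta_ratio:.3f}")
```

Setting the coefficient 0.7 to zero removes the heteroscedasticity, and the denominator of the estimate then fluctuates around zero, in line with Remark 4.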

By leveraging heteroscedasticity in the outcome, Liu et al. [74] proposes the Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR-MiSTERI) approach for the average treatment effect on the treated (ATT). In this section, we focus on the case where both the treatment and the possibly invalid genetic instrument are binary; see Liu et al. [74] for extensions. Specifically, MR-MiSTERI relies on the following three identification assumptions:

Assumption B1 (Homogeneous ATT). The ATT does not vary with the possibly invalid IV on the additive scale, that is, $E[Y_i(z, d = 1) - Y_i(z, d = 0) \mid D_i = 1, Z_i = z] = \beta$.

Assumption B2 (Homogeneous confounding bias on the odds ratio scale). $\mathrm{OR}(Y_i(0) = y_0, D_i = d \mid Z_i = z) = \exp(\xi d y_0)$, where $\xi$ quantifies the magnitude of confounding bias.

Assumption B3 (Outcome heteroscedasticity). Define $\varepsilon_i = Y_i - E[Y_i \mid D_i, Z_i]$ and suppose that $\varepsilon_i \mid D_i, Z_i \sim N(0, \sigma^2(Z_i))$, then $\sigma^2(Z_i)$ must vary with the genetic instrument $Z_i$.

Remark 5. Assumption B3 can be empirically tested through genome-wide variance quantitative trait loci (vQTL) analyses [113, 114]. As shown in Paré et al. [113], gene–gene (GxG) and/or gene–environment (GxE) interactions can result in genotype-dependent changes in trait variance, providing direct evidence for heteroscedasticity in quantitative traits.

Under Assumptions B1–B3, the confounding bias parameter ξ and the causal effect β are uniquely identified by

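$$\xi = \frac{\mathcal{D}(Z_i = 1) - \mathcal{D}(Z_i = 0)}{\sigma^2(Z_i = 1) - \sigma^2(Z_i = 0)},$$
$$\beta = \mathcal{D}(Z_i) - \frac{\mathcal{D}(Z_i = 1) - \mathcal{D}(Z_i = 0)}{\sigma^2(Z_i = 1) - \sigma^2(Z_i = 0)}\,\sigma^2(Z_i), \quad Z_i = 0, 1,$$

where, following Liu et al. [74], $\mathcal{D}(Z_i = z) = E[Y_i \mid D_i = 1, Z_i = z] - E[Y_i \mid D_i = 0, Z_i = z]$ denotes the crude mean difference in the outcome between exposed and unexposed subjects with $Z_i = z$. The estimates for $\xi$ and $\beta$ can be obtained by replacing the unknown quantities with the sample counterparts in observed data.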
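The plug-in step can be sketched for a binary exposure and a binary instrument. The code below assumes that the difference terms in the identification formulas are the crude exposed-versus-unexposed mean differences in the outcome within each instrument level, and it simulates data in which that crude difference equals the causal effect plus the confounding parameter times the outcome variance, which is the relationship the two formulas imply. The function name `mr_misteri_plugin` and all numerical values are hypothetical, and this is not the MR-MiSTERI software of Liu et al. [74].

```python
import numpy as np

def mr_misteri_plugin(Y, D, Z):
    """Plug-in estimates of (xi, beta) for binary D and binary Z (illustrative sketch)."""
    crude, var = {}, {}
    for z in (0, 1):
        in_z = Z == z
        mean_exposed = Y[in_z & (D == 1)].mean()
        mean_unexposed = Y[in_z & (D == 0)].mean()
        crude[z] = mean_exposed - mean_unexposed          # crude mean difference given Z = z
        resid = Y[in_z] - np.where(D[in_z] == 1, mean_exposed, mean_unexposed)
        var[z] = resid.var()                              # residual outcome variance given Z = z
    xi = (crude[1] - crude[0]) / (var[1] - var[0])
    beta = crude[0] - xi * var[0]                         # using Z = 1 gives the same value
    return float(xi), float(beta)

rng = np.random.default_rng(4)
n, beta_true, xi_true = 200_000, 1.0, 0.5                 # assumed values
Z = rng.binomial(1, 0.5, size=n)
D = rng.binomial(1, 0.3 + 0.3 * Z)
sigma = np.where(Z == 1, 2.0, 1.0)                        # outcome heteroscedasticity in Z
# 0.4 * Z is a direct (invalid) effect of the instrument on the outcome.
Y = 0.4 * Z + D * (beta_true + xi_true * sigma**2) + sigma * rng.standard_normal(n)

xi_hat, beta_hat = mr_misteri_plugin(Y, D, Z)
print(f"xi_hat = {xi_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

When the two residual variances are nearly equal, the denominator of the confounding-parameter estimate approaches zero, which is why outcome heteroscedasticity (Assumption B3) is needed.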

5.3 |. Inference for the Causal Effect

Following Kang et al. [79], inference for the causal effect in MR analysis can be broadly classified into two methodological paradigms: pointwise inference and uniformly valid inference. Pointwise inference constructs the confidence interval for the causal effect either by:

  1. calculating the standard error of the causal effect estimate directly from asymptotic distribution or resampling techniques (e.g., bootstrap) [71, 72, 78, 106–109]; or

  2. selecting valid instruments from candidate genetic variants (e.g., using voting procedure or outlier detection test) and subsequently constructing confidence intervals using the selected subset [32, 73, 76, 96]. The latter approach relies on the correct selection of valid IVs; when IV selection error occurs in finite samples, it might lead to poor coverage performance [95].

The second paradigm, uniformly valid inference, constructs CIs that remain robust to finite-sample instrument selection errors [32, 79, 95, 115]. Specifically, Kang et al. [115] proposes taking the union of confidence intervals constructed from subsets of instruments passing the J test [116]; however, this procedure is computationally costly when the number of candidate genetic IVs is large [79]. Alternatively, Guo [95] and Yao et al. [32] first construct “pseudo CIs” through grid-search using resampled IV-exposure and IV-outcome associations, and then construct the final robust CI by taking the union of these pseudo CIs across resamples. Uniformly valid inference generally constructs wider CIs than pointwise inference, which is a trade-off for the guaranteed finite-sample coverage level [79].

6 |. Weak Identification in the ALICE Model Framework

6.1 |. The Presence of Weak Identification Bias

In this section, we examine the bias introduced by weak instruments, that is, instruments that are only weakly associated with the exposure, in the ALICE model framework. We begin by assuming that all instruments satisfy Assumptions A2 and A3, that is, $\pi_j = 0$ for all $j \in \{1, \ldots, p\}$ in model (3). In this case, two-stage least squares (2SLS) is commonly employed to estimate the causal effect $\beta$ [12, 14]. Specifically, in the first stage, we fit an OLS regression of the exposure $D$ on the genetic instruments $Z$ to obtain the following fitted exposure values $\hat{D}$:

$$\hat{D} = Z(Z^\top Z)^{-1} Z^\top D.$$

In the second stage, the outcome $Y$ is regressed on these fitted exposures $\hat{D}$ to obtain the following 2SLS estimator of the causal effect:

$$\hat{\beta}_{2SLS} = (\hat{D}^\top \hat{D})^{-1} \hat{D}^\top Y. \tag{8}$$
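The two stages translate directly into a few lines of linear algebra. The following is a minimal sketch of the estimator in Equation (8) using NumPy, applied to simulated data with strong, valid instruments and an unmeasured confounder; the function name, the simulation settings, and the omission of an intercept (all variables are generated with mean zero) are assumptions made for illustration.

```python
import numpy as np

def two_stage_least_squares(Y, D, Z):
    """2SLS estimate of beta following Equation (8); assumes centered variables."""
    gamma_hat, *_ = np.linalg.lstsq(Z, D, rcond=None)    # first stage: OLS of D on Z
    D_hat = Z @ gamma_hat                                 # fitted exposure values
    return float(D_hat @ Y / (D_hat @ D_hat))             # second stage: OLS of Y on D_hat

rng = np.random.default_rng(5)
n, p, beta = 50_000, 5, 0.5                               # assumed values
Z = rng.standard_normal((n, p))
U = rng.standard_normal(n)                                # unmeasured confounder
D = Z @ np.full(p, 0.3) + U + rng.standard_normal(n)
Y = beta * D + 2.0 * U + rng.standard_normal(n)

beta_2sls = two_stage_least_squares(Y, D, Z)
beta_ols = float(D @ Y / (D @ D))                         # confounded OLS for comparison
print(f"2SLS = {beta_2sls:.3f}, OLS = {beta_ols:.3f}")
```

With strong instruments the 2SLS estimate is close to the true effect, while OLS absorbs the confounding through the shared dependence on the unmeasured confounder.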

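Assume that the error terms satisfy $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$ and $\delta_i \sim N(0, \sigma_\delta^2)$, and denote $\mathrm{Cov}(\delta_i, \varepsilon_i) = \sigma_{\delta,\varepsilon}$. Rothenberg [117] provides the following analytic expression for the bias of the 2SLS estimator:

$$\mu\left(\hat{\beta}_{2SLS} - \beta\right) = \frac{\sigma_\varepsilon}{\sigma_\delta} \cdot \frac{\eta_\varepsilon + \xi_{\varepsilon,\delta}/\mu}{1 + 2\eta_\delta/\mu + \xi_{\delta,\delta}/\mu^2}, \tag{9}$$

where $\mu^2 = \gamma^\top Z^\top Z \gamma / \sigma_\delta^2$ is the concentration parameter that measures the instrument strength [65, 118]. Here, $\eta_\varepsilon = \left(\sigma_\varepsilon \sqrt{\gamma^\top Z^\top Z \gamma}\right)^{-1} \gamma^\top Z^\top \varepsilon$ and $\eta_\delta = \left(\sigma_\delta \sqrt{\gamma^\top Z^\top Z \gamma}\right)^{-1} \gamma^\top Z^\top \delta$ are two standard normal random variables with correlation $\sigma_{\delta,\varepsilon}/(\sigma_\varepsilon \sigma_\delta)$, and $\xi_{\varepsilon,\delta} = (\sigma_\varepsilon \sigma_\delta)^{-1} \delta^\top Z (Z^\top Z)^{-1} Z^\top \varepsilon$ and $\xi_{\delta,\delta} = (\sigma_\delta^2)^{-1} \delta^\top Z (Z^\top Z)^{-1} Z^\top \delta$ are two quadratic forms of normal random variables that do not depend on the sample size $n$.

Remark 6. From Equation (9), as the concentration parameter $\mu^2$ goes to infinity, $\mu(\hat{\beta}_{2SLS} - \beta)$ has an asymptotic distribution of $N(0, \sigma_\varepsilon^2/\sigma_\delta^2)$ [117]. Therefore, the concentration parameter $\mu^2$ can be thought of as an effective sample size [65, 118]. When instruments are strong, the concentration parameter $\mu^2$ increases proportionally to the sample size $n$ [119].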

6.2 |. Measurement of Weak Identification

Staiger & Stock [64] proposes to assess the instrument strength using the following F-statistic:

$$\hat{F} = \frac{\hat{\gamma}^\top Z^\top Z \hat{\gamma}}{p\,\hat{\sigma}_\delta^2},$$

where $\hat{\gamma}$ denotes the coefficient vector obtained by fitting an OLS regression of the exposure on the instruments, and $\hat{\sigma}_\delta^2$ is the corresponding residual variance. This statistic provides a test of the joint null hypothesis $\gamma = 0$ in the first-stage regression of 2SLS, and is therefore commonly referred to as the “first-stage F-statistic” [64, 118]. Under the null hypothesis $\gamma = 0$ and within the weak instrument asymptotics framework (i.e., IV strengths $\{\gamma_j\}_{j=1}^{p}$ shrink at a $1/\sqrt{n}$ rate [64]), $p\hat{F}$ converges in distribution to a noncentral chi-squared random variable with $p$ degrees of freedom and noncentrality parameter $\mu^2$ [118]. As suggested by Staiger & Stock [64], $\hat{F} < 10$ is the rule-of-thumb threshold for weak instruments.
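The first-stage F-statistic is straightforward to compute from individual-level data. The sketch below follows the displayed formula, with the residual variance estimated using $n - p$ degrees of freedom and no intercept (both are assumptions of this illustration, as are the function name and the simulated effect sizes), and contrasts a strong-instrument setting with a weak-instrument one.

```python
import numpy as np

def first_stage_f(D, Z):
    """First-stage F-statistic: gamma_hat' Z'Z gamma_hat / (p * sigma_hat^2)."""
    n, p = Z.shape
    gamma_hat, *_ = np.linalg.lstsq(Z, D, rcond=None)
    fitted = Z @ gamma_hat
    resid = D - fitted
    sigma2_hat = resid @ resid / (n - p)       # residual variance (n - p is an assumed choice)
    return float(fitted @ fitted / (p * sigma2_hat))

rng = np.random.default_rng(6)
n, p = 2_000, 10                               # assumed values
Z = rng.standard_normal((n, p))
noise = rng.standard_normal(n)

D_strong = Z @ np.full(p, 0.20) + noise        # each IV explains a visible share of D
D_weak = Z @ np.full(p, 0.01) + noise          # each IV is almost unrelated to D

F_strong = first_stage_f(D_strong, Z)
F_weak = first_stage_f(D_weak, Z)
print(f"F (strong) = {F_strong:.1f}, F (weak) = {F_weak:.2f}")
```

Under these settings the weak-instrument design falls well below the rule-of-thumb threshold of 10, while the strong-instrument design exceeds it by a wide margin.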

6.3 |. Addressing Weak Identification Bias in MR Studies

Recent developments in the IV and MR literature have advanced methodologies to address weak instrument bias. For example, Ye et al. [120] proposes dIVW, a debiased version of the inverse-variance weighted (IVW) estimator, which is robust to many weak IVs using two-sample summary-level data. Xu et al. [121] further develops the penalized IVW (pIVW) estimator by using a penalization approach to prevent the denominator of the dIVW estimator from being too close to zero. Mikusheva & Sun [122] defines weak identification in the context of many instruments, where the number of instruments p grows with the sample size n, and introduces a jackknifed version of the Anderson-Rubin test statistic [123] that is robust to weak identification with many instruments and heteroscedasticity in both the exposure and the outcome. Ye et al. [112] proposes GENIUS-MAWII (G-Estimation under No Interaction with Unmeasured Selection leveraging MAny Weak Invalid IVs), which simultaneously addresses the challenges of many weak instruments and widespread horizontal pleiotropy in MR studies.

7 |. Population-Based Versus Family-Based Design

7.1 |. Population-Based MR Design

Population-based designs (e.g., cohort studies and case-control studies) include unrelated subjects from the target population [124–126]. A cohort study is an observational research method where a group of people with a shared characteristic, called a cohort, is followed over time to observe health outcomes or the development of a disease after a specific exposure [125, 126]. These studies identify groups based on factors like exposure to a risk factor and then compare the outcomes in exposed versus unexposed individuals to determine associations. There are two main types of cohort studies: prospective cohort studies [127], which follow the group into the future, and retrospective cohort studies [128], which look back at historical data. For example, UK Biobank (UKB) is a large-scale, prospective cohort study that includes over 500,000 participants aged 40–69 at recruitment across the United Kingdom [129, 130]. In contrast, a case-control study retrospectively compares subjects with a specific outcome (cases) to those without (controls) [125, 131, 132]. Since outcomes are often rare, cases are oversampled and controls are undersampled in case-control studies, resulting in a sample that may not reflect the target population [133]. Nevertheless, logistic regression can still provide valid association estimates on the odds ratio scale in case-control studies [134]. MR studies leveraging population-based designs benefit from large sample sizes and wide coverage. For example, UKB has genotyped over 500,000 individuals, providing high statistical power to detect modest associations between exposures and outcomes [129, 130]. However, population-based designs are susceptible to confounding by population stratification, assortative mating, dynastic effects, and selection bias [23, 24, 66]. To mitigate these biases, researchers often apply methods such as adjusting for principal components, matching, or the use of negative controls [67, 135–137].

7.2 |. Family-Based MR Design

Family-based designs in MR use data from related individuals, typically sibling pairs or parent-offspring trios, to draw causal conclusions within families [66, 138–141]. By comparing genetically and demographically similar relatives, these designs inherently control for many confounding factors [138, 140, 142]. For example, in a sibling-based MR, one sibling can serve as a control for shared family background [140]. This within-family comparison helps to eliminate bias due to population stratification, dynastic effects, and assortative mating, which may otherwise confound population-based MR findings [66, 140]. However, family-based study designs typically have smaller sample sizes, limiting statistical power to detect causal relationships [66, 140, 143]. Moreover, these designs require more complex modeling to properly account for family structures and relatedness among subjects [66, 141, 144]. Despite these challenges, family-based MR has proven valuable; for example, within-family analyses have shown that effects of height and BMI on educational attainment, observed in population-based MR, are substantially attenuated when shared familial factors are controlled [66].

7.3 |. Choosing Between Population and Family-Based MR Designs

In summary, population-based and family-based MR designs offer complementary advantages and trade-offs, as summarized in Table 3. Population-based MR relies on broad, large-scale samples, offering greater statistical power and generalizability but is more susceptible to bias from population stratification, assortative mating, dynastic effects, and selection bias. Family-based MR inherently accounts for many shared genetic and environmental factors, enhancing the reliability of causal conclusions, but typically involves smaller samples and more complex modeling. In practice, combining insights from both study designs can provide more robust causal findings.

TABLE 3 |.

Comparison of population-based and family-based MR designs.

Feature | Population-based design | Family-based design
Definition | Includes subjects sampled from the target population (e.g., UK Biobank). | Includes genetically related subjects (e.g., siblings or parent-offspring trios).
Study types | Includes designs such as cohort studies (prospective or retrospective) and case-control studies. | Includes designs such as sibling designs and parent-offspring trio designs.
Key strengths | Generally larger sample size; high statistical power and broader generalizability. | Inherent control for population stratification, dynastic effects, and shared environment.
Limitations | Susceptible to population stratification, assortative mating, dynastic bias, and selection bias. | Smaller sample sizes; requires more complex modeling to account for family structures and relatedness.
IV assumption violation risks | More susceptible to violations of the IV independence assumption. | Less susceptible to IV assumption violations due to within-family comparison.
Bias mitigation | Adjustment for principal components, matching, and negative control outcomes. | Natural control for genetic and environmental confounding within families.

8 |. Individual-Level Versus Summary-Level Data

8.1 |. MR Methods Using Individual-Level Data

Let $Y = (Y_1, \ldots, Y_n)^\top$ be the vector of continuous outcomes of $n$ subjects, $D = (D_1, \ldots, D_n)^\top$ be the vector of continuous exposures, and $Z = (Z_{ij})_{n \times p}$ be the matrix of genetic instruments, where $Z_{ij}$ is the genotype of the $j$th genetic instrument of subject $i$. Individual-level data MR utilizes the dataset $\{Y, D, Z\}$ containing individual-level measurements of outcomes, exposures, and genotypes. When individual-level data are available and all instruments are valid, the two-stage least squares (2SLS) estimator $\hat{\beta}_{2SLS}$ defined in Equation (8) is commonly employed to estimate the causal effect $\beta$ for continuous outcomes under the ALICE model framework [12, 14]. When all genetic instruments satisfy the three core IV Assumptions A1’–A3’, the 2SLS estimator $\hat{\beta}_{2SLS}$ is consistent for the causal effect $\beta$ [14]. When some genetic instruments violate one or more of the core IV assumptions, alternative individual-level data MR methods have been developed to estimate the causal effect $\beta$, for example, sisVIVE [93], TSHT [73], MR-GENIUS [111], MR-MiSTERI [74], MRSquare [75], GENIUS-MAWII [112], and MR-MAGIC [77]. See Sections 5 and 6, as well as Kang et al. [79], for details.

8.2 |. MR Methods Using Summary-Level Data

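In contrast, summary-level MR utilizes summary statistics of marginal IV-exposure and IV-outcome associations derived from individual-level data (often not directly accessible) to perform causal inference. Let $Z_{\cdot j} = (Z_{1j}, \ldots, Z_{nj})^\top$ denote the genotype vector of the $j$th genetic instrument, then the marginal estimates of $\gamma_j$ and $\Gamma_j$ are obtained through marginal regressions of $D$ and $Y$ on $Z_{\cdot j}$, respectively:

$$\hat{\gamma}_j = (Z_{\cdot j}^\top Z_{\cdot j})^{-1} Z_{\cdot j}^\top D,$$
$$\hat{\Gamma}_j = (Z_{\cdot j}^\top Z_{\cdot j})^{-1} Z_{\cdot j}^\top Y.$$

Then, the ratio estimator using the $j$th genetic instrument is

$$\hat{\beta}_j = \frac{\hat{\Gamma}_j}{\hat{\gamma}_j}.$$

When all $p$ genetic instruments are valid, the following inverse-variance weighted (IVW) estimator [97] combines ratio estimators from each genetic instrument:

$$\hat{\beta}_{IVW} = \frac{\sum_{j=1}^{p} \hat{\gamma}_j^2\, \hat{\sigma}_{\Gamma,j}^{-2}\, \hat{\beta}_j}{\sum_{j=1}^{p} \hat{\gamma}_j^2\, \hat{\sigma}_{\Gamma,j}^{-2}},$$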

where $\hat{\sigma}_{\Gamma,j}$ is the standard error of $\hat{\Gamma}_j$. The IVW estimator upweights genetic instruments with stronger IV-exposure associations (i.e., larger $\hat{\gamma}_j^2$) and more precise IV-outcome associations (i.e., smaller $\hat{\sigma}_{\Gamma,j}^2$). In addition, $\hat{\beta}_{IVW}$ consistently estimates $\beta$ when all instruments are valid and mutually independent. When some genetic instruments violate the core IV assumptions, the IVW estimator is no longer consistent. To address this, several summary-level data MR methods have been proposed to obtain robust causal effect estimates even when the core IV assumptions are violated [32, 71, 76, 106, 109, 121, 145]. We provide a list of commonly used software implementations with links in the Supporting Information.
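As a concrete illustration, the IVW estimator can be computed from three vectors of summary statistics. The sketch below is a direct transcription of the IVW formula as a weighted average of ratio estimators; the function name `ivw_estimate` and the two-instrument input values are hypothetical, and the sketch does not compute a standard error or handle correlated instruments.

```python
import numpy as np

def ivw_estimate(gamma_hat, Gamma_hat, se_Gamma):
    """IVW estimate: weighted average of ratio estimators with weights gamma_hat^2 / se_Gamma^2."""
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    Gamma_hat = np.asarray(Gamma_hat, dtype=float)
    se_Gamma = np.asarray(se_Gamma, dtype=float)
    ratio = Gamma_hat / gamma_hat                  # per-instrument ratio estimators
    weights = gamma_hat**2 / se_Gamma**2           # stronger and more precise IVs get more weight
    return float(np.sum(weights * ratio) / np.sum(weights))

# Toy summary statistics for two instruments (hypothetical values).
gamma_hat = [0.10, 0.20]       # IV-exposure associations
Gamma_hat = [0.05, 0.12]       # IV-outcome associations
se_Gamma = [0.01, 0.02]        # standard errors of the IV-outcome associations

beta_ivw = ivw_estimate(gamma_hat, Gamma_hat, se_Gamma)
print(f"IVW estimate = {beta_ivw:.3f}")
```

In this toy input both instruments receive the same weight, so the IVW estimate is the simple average of the two ratio estimators, 0.5 and 0.6.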

8.3 |. Comparison of Individual-Level and Summary-Level Data in MR

Compared to summary-level data MR methods, individual-level data MR offers distinct methodological advantages in modeling nonlinear biological relationships and addressing invalid instrument issues. First, individual-level data allow for the characterization of nonlinear relationships among genetic instruments, exposures and outcomes [75, 103, 110, 146–148]. For example, Guo et al. [110] explicitly models both nonlinear IV-exposure associations and nonlinear violations of assumptions (A2) and (A3), and proposes the Two-Stage Curvature Identification (TSCI) method to identify and estimate the causal effect of interest using individual-level data. In contrast, summary-level data are calculated using linear or generalized linear models, and thus MR based on summary-level data lacks the capacity to detect nonlinear associations [149]. Second, individual-level data enable more flexible approaches for handling invalid IVs. For example, Sun et al. [75] proposes a class of G-estimators for the causal effect in the presence of multiple potentially invalid IVs by leveraging gene-gene interactions, and Liu et al. [74] proposes novel identification assumptions for the average treatment effect on the treated (ATT) with a possibly invalid IV.

Conversely, summary-level data MR provides significant practical advantages in genomic data. First, publicly accessible genome-wide association studies (GWAS) summary statistics have become increasingly abundant [150–152], overcoming privacy concerns and logistic burdens that often limit access to individual-level genetic data [153–155]. Second, GWAS consortia routinely combine data from hundreds of thousands of participants, significantly enhancing statistical power to detect causal relationships [156]. In addition, platforms like MR-Base [157] further streamline analysis by enabling efficient harmonization of exposure and outcome summary statistics across multiple GWAS datasets.

9 |. One-Sample Versus Two-Sample Design

9.1 |. Overview and Conceptual Differences

To facilitate comparison between one-sample and two-sample MR designs, we adopt the ALICE framework for two independent datasets [145, 158, 159]. Let $s \in \{1, 2\}$ index two independent samples with sample sizes $n^{(1)}$ and $n^{(2)}$, respectively. Within each sample $s$, let $Y_i^{(s)}$ denote the outcome, $D_i^{(s)}$ denote the exposure, and $Z_i^{(s)} = (Z_{i1}^{(s)}, \ldots, Z_{ip}^{(s)})^\top \in \mathbb{R}^p$ denote the vector of $p$ genetic instruments for subject $i$. The data $\{Y_i^{(s)}, D_i^{(s)}, Z_i^{(s)}\}_{i=1}^{n^{(s)}}$ are generated according to:

$$D_i^{(s)} = \sum_{j=1}^{p} \gamma_j^{(s)} Z_{ij}^{(s)} + \delta_i^{(s)},$$
$$Y_i^{(s)} = \beta^{(s)} D_i^{(s)} + \sum_{j=1}^{p} \pi_j^{(s)} Z_{ij}^{(s)} + \varepsilon_i^{(s)},$$

with error terms satisfying $E[\delta_i^{(s)} \mid Z_i^{(s)}] = 0$ and $E[\varepsilon_i^{(s)} \mid Z_i^{(s)}] = 0$. We now state the study objectives in one-sample and two-sample MR designs as follows [145]:

  • Objective in one-sample MR design: Given either the individual-level data $\{Y_i^{(s)}, D_i^{(s)}, Z_i^{(s)}\}_{i=1}^{n^{(s)}}$, or the corresponding summary-level data of IV-exposure and IV-outcome associations from sample $s$, how to estimate the causal effect $\beta^{(s)}$?

  • Objective in two-sample MR design: Given the individual-level data $\{Y_i^{(1)}, Z_i^{(1)}\}_{i=1}^{n^{(1)}}$ from the first sample and $\{D_i^{(2)}, Z_i^{(2)}\}_{i=1}^{n^{(2)}}$ from the second sample (or their corresponding summary-level data), how to estimate the causal effects $\beta^{(1)}$ and/or $\beta^{(2)}$?

As noted in Zhao et al. [145], to enable the identification and estimation of the causal effect in two-sample designs, we further need to impose the following assumptions [145, 158, 159]:

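Assumption C1 (Homogeneity in parameters). $\beta^{(1)} = \beta^{(2)} = \beta$, $\gamma_j^{(1)} = \gamma_j^{(2)} = \gamma_j$, and $\pi_j^{(1)} = \pi_j^{(2)} = \pi_j$ for $j = 1, \ldots, p$.

Assumption C2 (Homogeneity in the distribution of error terms). $(\delta_i^{(1)}, \varepsilon_i^{(1)}) \overset{d}{=} (\delta_{i'}^{(2)}, \varepsilon_{i'}^{(2)})$ for $i \in \{1, \ldots, n^{(1)}\}$ and $i' \in \{1, \ldots, n^{(2)}\}$, where $\overset{d}{=}$ indicates that the random vectors have the same distribution.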

Remark 7. Under Assumptions C1 and C2, the only source of heterogeneity between the two samples arises from differences in the distribution of instruments [145]. In the context of MR, such heterogeneity may reflect differences in genetic ancestry, sampling design, or genotyping platforms, which can lead to differences in allele frequencies or linkage disequilibrium patterns.

9.2 |. Weak IV Biases in One-Sample and Two-Sample MR Estimations

In this section, we focus on the comparison between one-sample and two-sample MR designs, and for simplicity we assume all genetic instruments are valid IVs, that is, $\pi_j^{(1)} = \pi_j^{(2)} = 0$ for $j = 1, \ldots, p$. Under Assumptions C1 and C2 and by assuming all genetic instruments are valid IVs, the above ALICE model becomes

$$D_i^{(s)} = \sum_{j=1}^{p} \gamma_j Z_{ij}^{(s)} + \delta_i^{(s)},$$
$$Y_i^{(s)} = \beta D_i^{(s)} + \varepsilon_i^{(s)},$$

and the reduced-form outcome model becomes

$$Y_i^{(s)} = \sum_{j=1}^{p} \Gamma_j Z_{ij}^{(s)} + e_i^{(s)},$$

where $\Gamma_j = \beta\gamma_j$ and $e_i^{(s)} = \beta\delta_i^{(s)} + \varepsilon_i^{(s)}$. Within sample $s$, let $Y^{(s)} = (Y_1^{(s)}, \ldots, Y_{n^{(s)}}^{(s)})^\top$ be the vector of outcomes, $D^{(s)} = (D_1^{(s)}, \ldots, D_{n^{(s)}}^{(s)})^\top$ be the vector of exposures, and $Z^{(s)} = (Z_1^{(s)}, \ldots, Z_{n^{(s)}}^{(s)})^\top \in \mathbb{R}^{n^{(s)} \times p}$ be the matrix of genetic instruments. For simplicity, we further assume that genetic instruments are (1) standardized such that $E[Z_{ij}^{(s)}] = 0$ and $\mathrm{Var}(Z_{ij}^{(s)}) = 1$ for $j = 1, \ldots, p$ [160], and (2) mutually independent after LD clumping [161]. We now analyze the weak instrument biases of one-sample and two-sample 2SLS estimators under this setup.

Let $\hat{\beta}_{2SLS}^{(s)}$ denote the one-sample 2SLS estimator using individual-level data from sample $s$. According to Hahn & Hausman [162], the weak instrument bias of $\hat{\beta}_{2SLS}^{(s)}$ can be approximated as follows:

$$E\left[\hat{\beta}_{2SLS}^{(s)}\right] - \beta \approx \frac{\sigma_{\delta,\varepsilon}}{n^{(s)} \|\gamma\|^2 / p + \sigma_\delta^2},$$

where $\sigma_{\delta,\varepsilon}$ is the covariance between $\delta^{(s)}$ and $\varepsilon^{(s)}$, $\sigma_\delta^2$ is the variance of $\delta^{(s)}$, and $\|\gamma\|^2 = \sum_{j=1}^{p} \gamma_j^2$. We also provide the derivation of this approximate bias in the Supporting Information. On the other hand, the approximate bias of the ordinary least squares (OLS) estimator $\hat{\beta}_{OLS}^{(s)}$ using sample $s$ is

$$E\left[\hat{\beta}_{OLS}^{(s)}\right] - \beta \approx \frac{\sigma_{\delta,\varepsilon}}{\|\gamma\|^2 + \sigma_\delta^2}.$$

With weak instruments, both the one-sample 2SLS and OLS estimators are biased when the error terms in the exposure and outcome models are correlated, that is, $\sigma_{\delta,\varepsilon} \neq 0$. Importantly, the direction of the bias for $\hat{\beta}_{2SLS}^{(s)}$ is the same as that for $\hat{\beta}_{OLS}^{(s)}$, implying that the one-sample 2SLS estimator tends to be biased towards the OLS estimator with weak instruments.

In the two-sample design, the IV–exposure associations are first estimated in the first sample and then used to construct fitted exposures in the second sample. The causal effect is subsequently estimated by regressing the outcome on these fitted exposures in the second sample. This estimation strategy is also known as the Split-Sample Instrumental Variable (SSIV) estimation [159]. Under our setting, the two-sample 2SLS estimator, denoted as $\hat{\beta}_{SSIV}$, has the following approximation [159]:

$$E\left[\hat{\beta}_{SSIV}\right] \approx \beta \times \frac{\|\gamma\|^2}{\|\gamma\|^2 + p\,\sigma_\delta^2 / n^{(1)}}.$$

This expression shows that the two-sample 2SLS estimator is attenuated toward zero by a factor that depends on the first-stage sample size $n^{(1)}$, the number of IVs $p$, the IV strengths, and the variance of the error in the exposure model $\sigma_\delta^2$. Unlike in the one-sample setting, where weak instruments bias the 2SLS estimator toward the confounded OLS estimate, the two-sample 2SLS estimator is biased toward zero when instruments are weak.

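Then, we consider the case where only marginal association estimates and their standard errors are available from a single sample. Specifically, in sample $s$, the marginal estimates of IV-exposure and IV-outcome associations of the $j$th genetic instrument are $\hat{\gamma}_j^{(s)} = (Z_{\cdot j}^{(s)\top} Z_{\cdot j}^{(s)})^{-1} Z_{\cdot j}^{(s)\top} D^{(s)}$ and $\hat{\Gamma}_j^{(s)} = (Z_{\cdot j}^{(s)\top} Z_{\cdot j}^{(s)})^{-1} Z_{\cdot j}^{(s)\top} Y^{(s)}$, respectively. Let $\hat{\sigma}_{\Gamma,j}^{(s)}$ denote the standard error of $\hat{\Gamma}_j^{(s)}$. Then, the one-sample IVW estimator for the causal effect $\beta$ using summary statistics from sample $s$ is given by

$$\hat{\beta}_{IVW}^{(s)} = \frac{\sum_{j=1}^{p} \hat{\Gamma}_j^{(s)} \hat{\gamma}_j^{(s)} / \hat{\sigma}_{\Gamma,j}^{(s)2}}{\sum_{j=1}^{p} \hat{\gamma}_j^{(s)2} / \hat{\sigma}_{\Gamma,j}^{(s)2}},$$

and the bias of $\hat{\beta}_{IVW}^{(s)}$ can be approximated as

$$E\left[\hat{\beta}_{IVW}^{(s)}\right] - \beta \approx \frac{\sigma_{\delta,\varepsilon}}{n^{(s)}} \times \frac{\sum_{j=1}^{p} 1 / \hat{\sigma}_{\Gamma,j}^{(s)2}}{\sum_{j=1}^{p} \left(\gamma_j^2 + \sigma_{\gamma,j}^{(s)2}\right) / \hat{\sigma}_{\Gamma,j}^{(s)2}},$$

where $\sigma_{\gamma,j}^{(s)2}$ is the variance of $\hat{\gamma}_j^{(s)}$. As with the one-sample 2SLS estimator, this weak instrument bias arises due to the correlation between error terms in the exposure and outcome models, and tends toward the OLS estimator.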

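Finally, we consider the two-sample IVW estimator that combines the IV–exposure association estimates $\{\hat{\gamma}_j^{(1)}\}_{j=1}^{p}$ from the first sample and the IV–outcome association estimates $\{\hat{\Gamma}_j^{(2)}\}_{j=1}^{p}$ from the second sample [72, 97], which is given by

$$\hat{\beta}_{IVW}^{(1,2)} = \frac{\sum_{j=1}^{p} \hat{\Gamma}_j^{(2)} \hat{\gamma}_j^{(1)} / \hat{\sigma}_{\Gamma,j}^{(2)2}}{\sum_{j=1}^{p} \hat{\gamma}_j^{(1)2} / \hat{\sigma}_{\Gamma,j}^{(2)2}},$$

and the expectation of the two-sample IVW estimator can be approximated as follows [78, 120]:

$$E\left[\hat{\beta}_{IVW}^{(1,2)}\right] \approx \beta \times \frac{\sum_{j=1}^{p} \gamma_j^2 / \hat{\sigma}_{\Gamma,j}^{(2)2}}{\sum_{j=1}^{p} \left(\gamma_j^2 + \sigma_{\gamma,j}^{(1)2}\right) / \hat{\sigma}_{\Gamma,j}^{(2)2}}.$$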

This reveals that the two-sample IVW estimator is biased toward zero with weak instruments, similar to the two-sample 2SLS estimator.

Conclusion 5. In the presence of weak IVs, one-sample MR estimators tend to be biased towards the confounded OLS estimator, whereas two-sample MR estimators tend to be biased towards zero.
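The contrast in Conclusion 5 can be reproduced with a small Monte Carlo experiment. The sketch below compares the one-sample 2SLS estimator with the split-sample (two-sample) 2SLS estimator under many weak, valid, standardized, independent instruments and correlated error terms. The sample size, number of instruments, effect sizes, error covariance, and number of replications are all illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n, p, reps = 1.0, 500, 20, 400            # assumed values
gamma = np.full(p, 0.05)                         # weak IV-exposure associations
err_cov = np.array([[1.0, 0.8],                  # Var(delta), Cov(delta, eps)
                    [0.8, 1.0]])                 # Cov(delta, eps), Var(eps)

def draw_sample():
    """One sample (Y, D, Z) from the valid-IV model with correlated errors."""
    Z = rng.standard_normal((n, p))
    err = rng.multivariate_normal([0.0, 0.0], err_cov, size=n)
    D = Z @ gamma + err[:, 0]
    Y = beta * D + err[:, 1]
    return Y, D, Z

one_sample, two_sample = [], []
for _ in range(reps):
    Y1, D1, Z1 = draw_sample()
    Y2, D2, Z2 = draw_sample()
    gamma_hat = np.linalg.lstsq(Z1, D1, rcond=None)[0]   # first stage in sample 1

    D_hat_same = Z1 @ gamma_hat                  # fitted exposures in the same sample
    one_sample.append(D_hat_same @ Y1 / (D_hat_same @ D_hat_same))

    D_hat_split = Z2 @ gamma_hat                 # fitted exposures in the second sample
    two_sample.append(D_hat_split @ Y2 / (D_hat_split @ D_hat_split))

mean_one, mean_two = float(np.mean(one_sample)), float(np.mean(two_sample))
print(f"true beta = {beta}, one-sample mean = {mean_one:.3f}, two-sample mean = {mean_two:.3f}")
```

Because the error covariance is positive in this setup, the confounded OLS estimate lies above the true effect, so the one-sample average drifts upward while the split-sample average is attenuated toward zero.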

Remark 8. To handle the weak IV bias in two-sample summary-level data MR analysis, Xu et al. [121] proposes a novel penalized inverse-variance weighted (pIVW) estimator that adjusts the IVW estimator through a penalized likelihood approach.

9.3 |. Advantages, Limitations, and Recommendations for Practice

In summary, both the 2SLS estimator with individual-level data and the IVW estimator with summary-level data in one-sample study designs are biased toward the OLS estimator with weak instruments. In contrast, both 2SLS and IVW estimators in two-sample study designs tend to be biased toward zero. In addition, in the two-sample study design, the identification of the causal effect and interpretation of the estimate require Assumptions C1 and C2 on parameters and error terms in addition to the core IV assumptions. Neither study design is universally superior; the choice between one-sample and two-sample study design depends on data availability (single versus two independent datasets), interpretability (population-specific versus generalizable estimates), and bias trade-offs (weak IV bias toward confounded OLS estimates versus toward zero).

10 |. Selecting Genetic IVs for MR Analysis

A critical step in MR analysis is the selection of appropriate genetic variants to serve as instruments. In practice, variants are often chosen based on their strength of association with the exposure, typically using GWAS summary statistics. Thresholds such as $p < 5 \times 10^{-8}$ (genome-wide significance) or $p < 1 \times 10^{-6}$ are commonly applied, although the precise cut-off varies across studies [24, 156, 163, 164].

The more difficult challenge lies in distinguishing valid from invalid instruments when the core assumptions may be violated. Different MR methods address this issue by introducing additional identification assumptions. For example, MR-PRESSO [76] identifies valid IVs under the majority rule.

Motivated by the Anna Karenina Principle (AKP) [32], which states that “all happy families are alike, but every unhappy family is unhappy in its own way,” we view valid instruments as a coherent group that share the same properties, while each invalid instrument may fail validity in a distinct manner. Building on this intuition, MR-SPI adopts the plurality rule assumption A8, which requires only that the largest group of instruments corresponds to the valid set, even if valid instruments do not form a majority. We note that both MR-SPI and Two-Stage Hard Thresholding (TSHT) [73] select valid IVs using a voting procedure under the plurality rule assumption, but they differ in two respects. First, TSHT requires one-sample individual-level data as input, whereas MR-SPI uses more accessible summary-level data from two samples. Second, the criterion for selecting relevant instruments differs: MR-SPI employs a more stringent threshold that accounts for multiple testing in genome-wide association studies.

Several considerations are important when selecting genetic instruments. Ideally, the sample used for IV selection should be independent of the samples used for estimating the causal effect to minimize the winner’s curse [165, 166]. For example, Zhao et al. [167] propose a three-sample MR design to eliminate the bias due to the winner’s curse. It is also advisable to use external resources, such as PhenoScanner [168], to screen candidate genetic IVs for associations with potential confounders or secondary traits, thereby improving instrument validity. Careful attention to these issues enhances the reliability and reproducibility of MR findings.

11 |. Applications in Real Datasets

11.1 |. Application 1: Assessing the Causal Effect of Body Mass Index on Diastolic Blood Pressure Using One-Sample Individual-Level Data From UK Biobank

In this section, we apply several one-sample MR methods to assess the causal effect of body mass index (BMI) on diastolic blood pressure (DBP). This analysis utilizes data from the UK Biobank (UKB) cohort study, a biomedical database comprising genetic and phenotypic information from approximately 500 000 UK participants [129, 130]. Participants who reported using anti-hypertensive medication or had missing data were excluded, resulting in a final sample of 254 502 individuals. Following Sun et al. [75], we selected the top 10 independent single-nucleotide polymorphisms (SNPs) most strongly associated with BMI after applying linkage disequilibrium (LD) clumping with $r^2 < 0.01$. These SNPs are rs1558902, rs6567160, rs543874, rs13021737, rs10182181, rs2207139, rs11030104, rs10938397, rs13107325, and rs3810291.
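The idea of LD clumping followed by a top-k selection can be sketched as a greedy procedure: SNPs are visited from the smallest to the largest p-value, and a SNP is kept only if its $r^2$ with every SNP already kept is below the threshold. The sketch below is an illustration under simplifying assumptions, not the pipeline used in this application: the function `ld_clump`, the SNP labels, and the pairwise $r^2$ values are hypothetical, unlisted pairs are treated as having $r^2 = 0$, and the physical-distance window used by standard clumping tools is omitted.

```python
def ld_clump(p_values, r2, r2_threshold=0.01):
    """Greedy clumping: keep a SNP only if its r^2 with all kept SNPs is below the threshold."""
    kept = []
    # Visit SNPs in order of increasing p-value (strongest association first).
    for snp in sorted(p_values, key=p_values.get):
        if all(r2.get(frozenset((snp, k)), 0.0) < r2_threshold for k in kept):
            kept.append(snp)
    return kept


# Hypothetical SNP-exposure p-values and pairwise r^2 values.
toy_p = {"snp_a": 1e-30, "snp_b": 1e-20, "snp_c": 1e-15, "snp_d": 1e-12}
toy_r2 = {
    frozenset(("snp_a", "snp_b")): 0.80,   # snp_b is in strong LD with snp_a
    frozenset(("snp_c", "snp_d")): 0.005,  # nearly independent pair
}

independent = ld_clump(toy_p, toy_r2, r2_threshold=0.01)
top_two = independent[:2]  # analogous to keeping the top-k clumped SNPs
```

Because the kept list is already ordered by p-value, taking its first k elements gives the k most strongly associated, approximately independent SNPs.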

We compare the following methods for estimating the causal effect β:

  1. two-stage least squares (2SLS) [14];

  2. Two-Stage Hard Thresholding (TSHT) [73];

  3. Confidence Interval method for Instrumental Variable (CIIV) [96];

  4. Some Invalid Some Valid IV Estimator (sisVIVE) [93];

  5. MRSquare [75];

  6. MR Mixed-Scale Treatment Effect Robust Identification (MR-MiSTERI) [74]; and

  7. MR with MAny weak Genetic Interactions for Causality (MR-MAGIC) [77].

The MR methods discussed here are selected for tutorial demonstration and should not be viewed as a comprehensive coverage of all methods in the literature. For sisVIVE, we choose the tuning parameter via 10-fold cross-validation. For MRSquare, we set the minimum number of valid instruments to be 6. The code for this application is provided in “DataExamples.R” within the Supplementary Files. The results are summarized in Figure 5.

FIGURE 5 |.

Point estimates and 95% confidence intervals for the causal effect of body mass index (BMI) on diastolic blood pressure (DBP) in the UK Biobank data, obtained using different one-sample MR methods.

From Figure 5, all methods suggest a positive causal effect of BMI on DBP, with point estimates ranging from 0.1677 to 0.4037. The 2SLS method yields the smallest estimate ($\hat{\beta}_{\text{2SLS}} = 0.1677$; 95% CI: 0.0456–0.2898), likely due to the inclusion of invalid instruments in the analysis. Unlike 2SLS, which assumes all instruments are valid, the other methods account for potential invalid IVs in various ways. Notably, TSHT, CIIV, and sisVIVE implement procedures to select valid instruments from candidate ones. In this application, TSHT identifies two invalid IVs (rs10182181 and rs13107325), CIIV identifies three (rs10182181, rs13107325, and rs3810291), and sisVIVE identifies one (rs13107325). Instrument rs13107325 is consistently identified as an invalid IV by all three methods.

11.2 |. Application 2: Performing xMR Analysis to Identify Plasma Proteins Associated With the Risk of Alzheimer’s Disease Using Two-Sample Summary-Level Data

The increasingly available large-scale multi-omics data (e.g., epigenomics, transcriptomics, proteomics, and metabolomics data) enable us to perform omics MR (xMR), a methodology within the causal genomics field, to detect putative causal omics biomarkers for complex traits and diseases, thereby uncovering the underlying causal mechanisms. For a detailed, step-by-step tutorial on implementing commonly used xMR methods, refer to Yao & Liu [169].

In this section, we apply several two-sample MR methods to perform xMR analysis, aiming to identify putative causal plasma proteins associated with the risk of Alzheimer’s disease (AD). For the exposure, we use UK Biobank Pharma Proteomics Project (UKB-PPP) summary statistics on 1463 plasma proteins measured in 54 306 individuals [170]. For the outcome, we use summary statistics from a meta-analysis of GWASs for clinically diagnosed AD and AD by proxy, comprising 455 258 samples in total [171]. Genetic instruments for each protein are selected by applying a Bonferroni-corrected threshold of $p < 3.40 \times 10^{-11}$, followed by LD clumping at threshold $r^2 < 0.01$, as described in Sun et al. [170]. We compare the following two-sample MR methods in this application:

  1. IVW method [72];

  2. MR-Egger regression [71];

  3. MR-RAPS [78];

  4. MR-PRESSO [76];

  5. the weighted median method [106];

  6. the mode-based estimation [107];

  7. MRMix [108];

  8. the contamination mixture method [109]; and

  9. MR-SPI [32].

The code for this application can be found in “DataExamples.R” within the Supplementary Files.
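To make the input and output of a two-sample summary-data analysis concrete, the following sketch implements the standard fixed-effect form of the IVW estimator, in which per-SNP ratio information is combined with weights given by the inverse variance of the SNP-outcome estimates. It is not taken from “DataExamples.R”; the function name `ivw_estimate` and the toy numbers are hypothetical, and the standard error shown ignores uncertainty in the SNP-exposure estimates and any overdispersion due to pleiotropy.

```python
import math


def ivw_estimate(gamma_hat, Gamma_hat, se_Gamma):
    """Fixed-effect IVW estimate and standard error from summary statistics.

    gamma_hat: SNP-exposure effect estimates
    Gamma_hat: SNP-outcome effect estimates
    se_Gamma:  standard errors of the SNP-outcome estimates
    """
    num = sum(g * G / s**2 for g, G, s in zip(gamma_hat, Gamma_hat, se_Gamma))
    den = sum(g**2 / s**2 for g, s in zip(gamma_hat, se_Gamma))
    beta = num / den
    se = math.sqrt(1.0 / den)
    return beta, se


# Hypothetical summary statistics constructed so that Gamma_j = 0.5 * gamma_j exactly.
gamma_hat = [0.10, 0.20, 0.15]
Gamma_hat = [0.05, 0.10, 0.075]
se_Gamma = [0.01, 0.02, 0.01]

beta_ivw, se_ivw = ivw_estimate(gamma_hat, Gamma_hat, se_Gamma)
ci_95 = (beta_ivw - 1.96 * se_ivw, beta_ivw + 1.96 * se_ivw)
```

The robust methods listed above take the same three vectors (often together with the standard errors of the SNP-exposure estimates) as input and differ in how they down-weight or exclude invalid instruments.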

Proteins identified to be significantly associated with Alzheimer’s disease after Bonferroni correction [172] are summarized in Figure S1 in the Supporting Information. In Figure S2A, we report the number of significant plasma proteins detected by each method, which ranges from 0 (MR-PRESSO) to 14 (MRMix). MR-PRESSO detects no significant proteins, likely because the small number of candidate IVs per protein limits its power to perform the outlier test and detect invalid instruments. In Figure S2B, we list the 11 plasma proteins identified by at least two methods. Notably, the seven proteins identified by MR-SPI correspond to the top seven proteins ranked by the number of supporting methods.
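The bookkeeping behind such a comparison can be sketched as follows: apply a Bonferroni threshold to each method's p-values, then count how many methods support each protein. This sketch assumes that the correction is applied over the number of proteins tested; the protein labels, method labels, and p-values are hypothetical and do not reproduce the results in Figures S1 and S2.

```python
from collections import Counter


def bonferroni_hits(p_values, alpha=0.05):
    """Return the names whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name for name, p in p_values.items() if p < threshold}


def support_counts(hits_by_method):
    """Count, for each protein, the number of methods that declare it significant."""
    counts = Counter()
    for hits in hits_by_method.values():
        counts.update(hits)
    return counts


# Hypothetical p-values for four proteins under three methods (threshold 0.05 / 4).
p_by_method = {
    "method_a": {"P1": 0.001, "P2": 0.010, "P3": 0.200, "P4": 0.500},
    "method_b": {"P1": 0.0001, "P2": 0.030, "P3": 0.004, "P4": 0.900},
    "method_c": {"P1": 0.012, "P2": 0.600, "P3": 0.300, "P4": 0.700},
}

hits_by_method = {m: bonferroni_hits(p) for m, p in p_by_method.items()}
counts = support_counts(hits_by_method)
supported_by_two = sorted(name for name, c in counts.items() if c >= 2)
```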

12 |. Future Directions

12.1 |. Binary and Survival Outcomes

In epidemiological research, binary and survival outcomes are common, yet current MR methods predominantly focus on continuous outcomes. For one-sample MR with individual-level data, establishing identification conditions for MR analysis with binary or survival outcomes is essential, particularly in the presence of invalid instruments [89, 173–175]. For two-sample MR with summary-level data, the current standard practice is to directly apply existing two-sample MR methods, developed under the ALICE model framework for continuous outcomes, to summary statistics of binary or survival outcomes. However, the interpretation of the resulting causal effect estimate is unclear and requires further justification [145].

12.2 |. Longitudinal Studies

A longitudinal study is a research design involving repeated measures of the same variables over prolonged periods of time, widely used in epidemiology and social science to track trends and establish causal relationships. However, longitudinal studies aimed at estimating causal effects may be subject to bias arising from unmeasured confounding and/or time-varying confounding variables. MR analysis can mitigate such unmeasured and time-varying confounding bias by leveraging genetic variations as IVs, though it typically relies on data measured at a single time point. Developing MR methods tailored for longitudinal studies holds promise for estimating time-varying causal effects, thereby providing novel biological insights into lifetime health trajectories [176–178].

12.3 |. Multivariable MR

Multivariable MR extends the classical MR framework to estimate the causal effects of multiple exposures on an outcome simultaneously [179, 180]. This approach is particularly valuable when exposures are biologically correlated, and has been applied to disentangle complex causal relationships, for example, identifying metabolite biomarkers for age-related macular degeneration [181]. Recent studies have begun to address challenges arising from invalid instruments in the multivariable MR framework [182–184]. However, methods robust to the violation of core IV assumptions when handling high-dimensional exposures (e.g., omics biomarkers) are still lacking.

Supplementary Material

Additional supporting information can be found online in the Supporting Information section. Data S1: Supporting Information.

Acknowledgments

This research has been conducted using the UK Biobank Resource under application number 52008. The authors express sincere thanks to Dr. Paul S. Albert for initiating this tutorial and for his constructive feedback during its development, and Dr. Stephen Burgess for insightful comments that improved this manuscript. Dr. Zhonghua Liu is supported by the National Institutes of Health (NIH) grant R01AG086379. Dr. Xihao Li is supported by the research start-up funds from the Department of Biostatistics and the Department of Genetics at the University of North Carolina at Chapel Hill.

Funding

This work was supported by the National Institutes of Health (Grant No. R01AG086379) and the research start-up funds from the Department of Biostatistics and the Department of Genetics at the University of North Carolina at Chapel Hill.

Footnotes

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

  • 1.Hernán MA and Robins JM, Causal Inference: What if (CRC Press, 2020). [Google Scholar]
  • 2.Holland PW, “Statistics and Causal Inference,” Journal of the American Statistical Association 81, no. 396 (1986): 945–960. [Google Scholar]
  • 3.Imbens GW, “Causal Inference in the Social Sciences,” Annual Review of Statistics and Its Application 11 (2024): 123–152. [Google Scholar]
  • 4.Pearl J, “Causal Inference in Statistics: An Overview,” Statistics Surveys 3 (2009): 96–146. [Google Scholar]
  • 5.Fisher RA, The Design of Experiments (Oliver and Boyd, 1935). [Google Scholar]
  • 6.Neyman J, “Sur Les Applications de la théorie Des probabilités Aux Experiences Agricoles: Essai Des Principes,” Roczniki Nauk Rolniczych 10, no. 1 (1923): 1–51. [Google Scholar]
  • 7.Imbens GW and Rubin DB, Causal Inference in Statistics, Social, and Biomedical Sciences (Cambridge University Press, 2015). [Google Scholar]
  • 8.Rubin DB, “Assignment to Treatment Group on the Basis of a Covariate,” Journal of Educational Statistics 2, no. 1 (1977): 1–26. [Google Scholar]
  • 9.Rosenbaum PR, Design of Observational Studies, vol. 10 (Springer, 2010). [Google Scholar]
  • 10.Rubin DB, “The Design Versus the Analysis of Observational Studies for Causal Effects: Parallels With the Design of Randomized Trials,” Statistics in Medicine 26, no. 1 (2007): 20–36. [DOI] [PubMed] [Google Scholar]
  • 11.Angrist JD, Imbens GW, and Rubin DB, “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association 91, no. 434 (1996): 444–455. [Google Scholar]
  • 12.Angrist JD and Pischke J-S, Mostly Harmless Econometrics: An Empiricist’s Companion (Princeton University Press, 2009). [Google Scholar]
  • 13.Baiocchi M, Cheng J, and Small DS, “Instrumental Variable Methods for Causal Inference,” Statistics in Medicine 33, no. 13 (2014): 2297–2340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Wooldridge JM, Introductory Econometrics: A Modern Approach (South-Western Cengage Learning, 2016). [Google Scholar]
  • 15.Wright PG, The Tariff on Animal and Vegetable Oils (Macmillan, 1928). [Google Scholar]
  • 16.Davey Smith G and Ebrahim S, “‘Mendelian Randomization’: Can Genetic Epidemiology Contribute to Understanding Environmental Determinants of Disease?,” International Journal of Epidemiology 32, no. 1 (2003): 1–22. [DOI] [PubMed] [Google Scholar]
  • 17.Katan M, “Apolipoprotein E Isoforms, Serum Cholesterol, and Cancer,” Lancet 327, no. 8479 (1986): 507–508. [DOI] [PubMed] [Google Scholar]
  • 18.VanderWeele TJ, Tchetgen EJT, Cornelis M, and Kraft P, “Methodological Challenges in Mendelian Randomization,” Epidemiology 25, no. 3 (2014): 427–435. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Angrist JD, “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence From Social Security Administrative Records,” American Economic Review 80, no. 3 (1990): 313–336. [Google Scholar]
  • 20.Dunning T, Natural Experiments in the Social Sciences: A Design-Based Approach (Cambridge University Press, 2012). [Google Scholar]
  • 21.Burgess S and Thompson SG, Mendelian Randomization: Methods for Causal Inference Using Genetic Variants (CRC Press, 2021). [Google Scholar]
  • 22.Davey Smith G and Hemani G, “Mendelian Randomization: Genetic Anchors for Causal Inference in Epidemiological Studies,” Human Molecular Genetics 23, no. R1 (2014): R89–R98. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Lawlor DA, Harbord RM, Sterne JA, Timpson N, and Davey Smith G, “Mendelian Randomization: Using Genes as Instruments for Making Causal Inferences in Epidemiology,” Statistics in Medicine 27, no. 8 (2008): 1133–1163. [DOI] [PubMed] [Google Scholar]
  • 24.Sanderson E, Glymour MM, Holmes MV, et al., “Mendelian Randomization,” Nature Reviews Methods Primers 2, no. 1 (2022): 6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Emdin CA, Khera AV, and Kathiresan S, “Mendelian Randomization,” JAMA 318, no. 19 (2017): 1925–1926. [DOI] [PubMed] [Google Scholar]
  • 26.Palmer TM, Lawlor DA, Harbord RM, et al., “Using Multiple Genetic Variants as Instrumental Variables for Modifiable Risk Factors,” Statistical Methods in Medical Research 21, no. 3 (2012): 223–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Thanassoulis G and O’Donnell CJ, “Mendelian Randomization: Nature’s Randomized Trial in the Post–Genome Era,” JAMA 301, no. 22 (2009): 2386–2388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Burgess S, Swanson SA, and Labrecque JA, “Are Mendelian Randomization Investigations Immune From Bias due to Reverse Causation?,” European Journal of Epidemiology 36 (2021): 253–257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Haycock PC, Burgess S, Wade KH, Bowden J, Relton C, and Smith GD, “Best (But Oft-Forgotten) Practices: The Design, Analysis, and Interpretation of Mendelian Randomization Studies,” American Journal of Clinical Nutrition 103, no. 4 (2016): 965–978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Smith GD, Paternoster L, and Relton C, “When Will Mendelian Randomization Become Relevant for Clinical Practice and Public Health?,” JAMA 317, no. 6 (2017): 589–591. [DOI] [PubMed] [Google Scholar]
  • 31.Woolf B and Burgess S, “The Role of Estimation in Mendelian Randomization: Should Mendelian Randomization Investigations Provide Estimates?,” AJE Advances: Research in Epidemiology 1, no. 1 (2025): uuaf003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Yao M, Miller GW, Vardarajan BN, Baccarelli AA, Guo Z, and Liu Z, “Deciphering Proteins in Alzheimer’s Disease: A New Mendelian Randomization Method Integrated With AlphaFold3 for 3D Structure Prediction,” Cell Genomics 4, no. 12 (2024): 100700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Kahan BC, Hindley J, Edwards M, Cro S, and Morris TP, “The Estimands Framework: A Primer on the ICH E9 (R1) Addendum,” BMJ 384 (2024): e076316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Lundberg I, Johnson R, and Stewart BM, “What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory,” American Sociological Review 86, no. 3 (2021): 532–565. [Google Scholar]
  • 35.Imbens GW, “Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review,” Review of Economics and Statistics 86, no. 1 (2004): 4–29. [Google Scholar]
  • 36.Imbens GW and Angrist JD, “Identification and Estimation of Local Average Treatment Effects,” Econometrica 62, no. 2 (1994): 467–475. [Google Scholar]
  • 37.Ference BA, Holmes MV, and Smith GD, “Using Mendelian Randomization to Improve the Design of Randomized Trials,” Cold Spring Harbor Perspectives in Medicine 11, no. 7 (2021): a040980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Han S and Zhou X-H, “Defining Estimands in Clinical Trials: A Unified Procedure,” Statistics in Medicine 42, no. 12 (2023): 1869–1887. [DOI] [PubMed] [Google Scholar]
  • 39.Keene ON, Lynggaard H, Englert S, Lanius V, and Wright D, “Why Estimands Are Needed to Define Treatment Effects in Clinical Trials,” BMC Medicine 21, no. 1 (2023): 276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Lewis JA, “Statistical Principles for Clinical Trials (ICH E9): An Introductory Note on an International Guideline,” Statistics in Medicine 18, no. 15 (1999): 1903–1942. [DOI] [PubMed] [Google Scholar]
  • 41.Little RJ and Lewis RJ, “Estimands, Estimators, and Estimates,” JAMA 326, no. 10 (2021): 967–968. [DOI] [PubMed] [Google Scholar]
  • 42.VanderWeele TJ, “Commentary: On Causes, Causal Inference, and Potential Outcomes,” International Journal of Epidemiology 45, no. 6 (2016): 1809–1816. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Rubin DB, “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies,” Journal of Educational Psychology 66, no. 5 (1974): 688–701. [Google Scholar]
  • 44.Rubin DB, “Causal Inference Using Potential Outcomes: Design, Modeling, Decisions,” Journal of the American Statistical Association 100, no. 469 (2005): 322–331. [Google Scholar]
  • 45.Wasserstein RL and Lazar NA, “The ASA Statement on p-Values: Context, Process, and Purpose,” American Statistician 70, no. 2 (2016): 129–133. [Google Scholar]
  • 46.Stolberg HO, Norman G, and Trop I, “Randomized Controlled Trials,” American Journal of Roentgenology 183, no. 6 (2004): 1539–1544. [DOI] [PubMed] [Google Scholar]
  • 47.Goldstein CE, Weijer C, Brehaut JC, et al., “Ethical Issues in Pragmatic Randomized Controlled Trials: A Review of the Recent Literature Identifies Gaps in Ethical Argumentation,” BMC Medical Ethics 19 (2018): 1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Hellman S and Hellman DS, “Of Mice but Not Men: Problems of the Randomized Clinical Trial,” in Research Ethics (Routledge, 2017), 201–205. [Google Scholar]
  • 49.Craig P, Katikireddi SV, Leyland A, and Popham F, “Natural Experiments: An Overview of Methods, Approaches, and Contributions to Public Health Intervention Research,” Annual Review of Public Health 38 (2017): 39–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.DiNardo J, “Natural Experiments and Quasi-Natural Experiments,” in Microeconometrics (Springer, 2010), 139–153. [Google Scholar]
  • 51.Leatherdale ST, “Natural Experiment Methodology for Research: A Review of How Different Methods Can Support Real-World Research,” International Journal of Social Research Methodology 22, no. 1 (2019): 19–35. [Google Scholar]
  • 52.Sanson-Fisher RW, D’Este CA, Carey ML, Noble N, and Paul CL, “Evaluation of Systems-Oriented Public Health Interventions: Alternative Research Designs,” Annual Review of Public Health 35 (2014): 9–27. [DOI] [PubMed] [Google Scholar]
  • 53.Bateson W and Mendel G, Mendel’s Principles of Heredity (Courier Corporation, 2013). [Google Scholar]
  • 54.Biffen RH, “Mendel’s Laws of Inheritance and Wheat Breeding,” Journal of Agricultural Science 1, no. 1 (1905): 4–48. [Google Scholar]
  • 55.Castle WE, “Mendel’s Law of Heredity,” Science 18, no. 456 (1903): 396–406. [DOI] [PubMed] [Google Scholar]
  • 56.Chari S and Dworkin I, “The Conditional Nature of Genetic Interactions: The Consequences of Wild-Type Backgrounds on Mutational Interactions in a Genome-Wide Modifier Screen,” PLoS Genetics 9, no. 8 (2013): e1003661. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.International HapMap Consortium, “A Haplotype Map of the Human Genome,” Nature 437, no. 7063 (2005): 1299–1320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Kleckner N, “Meiosis: How Could It Work?,” Proceedings of the National Academy of Sciences 93, no. 16 (1996): 8167–8174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Davey Smith G, Holmes MV, Davies NM, and Ebrahim S, “Mendel’s Laws, Mendelian Randomization and Causal Inference in Observational Data: Substantive and Nomenclatural Issues,” European Journal of Epidemiology 35, no. 2 (2020): 99–111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Dijksterhuis EJ, Archimedes (Princeton University Press, 2014). [Google Scholar]
  • 61.Didelez V and Sheehan N, “Mendelian Randomization as an Instrumental Variable Approach to Causal Inference,” Statistical Methods in Medical Research 16, no. 4 (2007): 309–330. [DOI] [PubMed] [Google Scholar]
  • 62.Andrews I, Stock JH, and Sun L, “Weak Instruments in Instrumental Variables Regression: Theory and Practice,” Annual Review of Economics 11 (2019): 727–753. [Google Scholar]
  • 63.Burgess S, Thompson SG, and CRP CHD Genetics Collaboration, “Avoiding Bias From Weak Instruments in Mendelian Randomization Studies,” International Journal of Epidemiology 40, no. 3 (2011): 755–764. [DOI] [PubMed] [Google Scholar]
  • 64.Staiger D and Stock JH, “Instrumental Variables Regression With Weak Instruments,” Econometrica 65, no. 3 (1997): 557–586. [Google Scholar]
  • 65.Stock JH, Wright JH, and Yogo M, “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business & Economic Statistics 20, no. 4 (2002): 518–529. [Google Scholar]
  • 66.Brumpton B, Sanderson E, Heilbron K, et al., “Avoiding Dynastic, Assortative Mating, and Population Stratification Biases in Mendelian Randomization Through Within-Family Analyses,” Nature Communications 11, no. 1 (2020): 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Sanderson E, Richardson TG, Hemani G, and Davey Smith G, “The Use of Negative Control Outcomes in Mendelian Randomization to Detect Potential Population Stratification,” International Journal of Epidemiology 50, no. 4 (2021): 1350–1361. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Hemani G, Bowden J, and Davey Smith G, “Evaluating the Potential Role of Pleiotropy in Mendelian Randomization Studies,” Human Molecular Genetics 27, no. R2 (2018): R195–R208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Sivakumaran S, Agakov F, Theodoratou E, et al., “Abundant Pleiotropy in Human Complex Diseases and Traits,” American Journal of Human Genetics 89, no. 5 (2011): 607–618. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Solovieff N, Cotsapas C, Lee PH, Purcell SM, and Smoller JW, “Pleiotropy in Complex Traits: Challenges and Strategies,” Nature Reviews Genetics 14, no. 7 (2013): 483–495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Bowden J, Davey Smith G, and Burgess S, “Mendelian Randomization With Invalid Instruments: Effect Estimation and Bias Detection Through Egger Regression,” International Journal of Epidemiology 44, no. 2 (2015): 512–525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Bowden J, Del Greco MF, Minelli C, Davey Smith G, Sheehan N, and Thompson J, “A Framework for the Investigation of Pleiotropy in Two-Sample Summary Data Mendelian Randomization,” Statistics in Medicine 36, no. 11 (2017): 1783–1802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Guo Z, Kang H, Tony Cai T, and Small DS, “Confidence Intervals for Causal Effects With Invalid Instruments by Using Two-Stage Hard Thresholding With Voting,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 80, no. 4 (2018): 793–815. [Google Scholar]
  • 74.Liu Z, Ye T, Sun B, Schooling M, and Tchetgen Tchetgen E, “Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification and Estimation for Causal Inference,” Biometrics 79, no. 3 (2023): 2208–2219. [DOI] [PubMed] [Google Scholar]
  • 75.Sun B, Liu Z, and Tchetgen Tchetgen E, “Semiparametric Efficient G-Estimation With Invalid Instrumental Variables,” Biometrika 110, no. 4 (2023): 953–971. [Google Scholar]
  • 76.Verbanck M, Chen C-Y, Neale B, and Do R, “Detection of Widespread Horizontal Pleiotropy in Causal Relationships Inferred From Mendelian Randomization Between Complex Traits and Diseases,” Nature Genetics 50, no. 5 (2018): 693–698. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Zhang D, Yao M, Liu Z, and Sun B, “MR-MAGIC: Robust Causal Inference Using Many Weak Genetic Interactions,” 2025. arXiv Preprint arXiv:2504.13565. [Google Scholar]
  • 78.Zhao Q, Wang J, Hemani G, Bowden J, and Small DS, “Statistical Inference in Two-Sample Summary-Data Mendelian Randomization Using Robust Adjusted Profile Score,” Annals of Statistics 48, no. 3 (2020): 1742–1769. [Google Scholar]
  • 79.Kang H, Guo Z, Liu Z, and Small D, “Identification and Inference With Invalid Instruments,” Annual Review of Statistics and Its Application 12 (2024): 385–405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Hirano K, Imbens GW, and Ridder G, “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica 71, no. 4 (2003): 1161–1189. [Google Scholar]
  • 81.Christ CF, Econometric Models and Methods (John Wiley & Sons, Inc, 1966). [Google Scholar]
  • 82.Goldberger AS, “Structural Equation Methods in the Social Sciences,” Econometrica: Journal of the Econometric Society 40 (1972): 979–1001. [Google Scholar]
  • 83.Haavelmo T, “The Probability Approach in Econometrics,” Econometrica 12 (1944): iii–115. [Google Scholar]
  • 84.Hernán MA and Robins JM, “Instruments for Causal Inference: An Epidemiologist’s Dream?,” Epidemiology 17, no. 4 (2006): 360–372. [DOI] [PubMed] [Google Scholar]
  • 85.Wooldridge JM, Econometric Analysis of Cross Section and Panel Data (MIT press, 2010). [Google Scholar]
  • 86.Wald A, “The Fitting of Straight Lines if Both Variables Are Subject to Error,” Annals of Mathematical Statistics 11, no. 3 (1940): 284–300. [Google Scholar]
  • 87.Robins JM, “Correcting for Non-Compliance in Randomized Trials Using Structural Nested Mean Models,” Communications in Statistics Theory and Methods 23, no. 8 (1994): 2379–2412. [Google Scholar]
  • 88.Heckman JJ and Robb R Jr., “Alternative Methods for Evaluating the Impact of Interventions: An Overview,” Journal of Econometrics 30, no. 1–2 (1985): 239–267. [Google Scholar]
  • 89.Liu Z, Sun B, Ye T, Richardson D, and Tchetgen ET, “Quasi Instrumental Variable Methods for Stable Hidden Confounding and Binary Outcome,” 2025. arXiv Preprint arXiv:2508.16096. [Google Scholar]
  • 90.Wang L and Tchetgen Tchetgen E, “Bounded, Efficient and Multiply Robust Estimation of Average Treatment Effects Using Instrumental Variables,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 80, no. 3 (2018): 531–550. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Richardson TS and Robins JM, “ACE Bounds; SEMs With Equilibrium Conditions,” Statistical Science 29, no. 3 (2014): 363–366. [Google Scholar]
  • 92.Holland PW, “Causal Inference, Path Analysis and Recursive Structural Equations Models,” ETS Research Report Series 1988, no. 1 (1988): i–50. [Google Scholar]
  • 93.Kang H, Zhang A, Cai TT, and Small DS, “Instrumental Variables Estimation With Some Invalid Instruments and Its Application to Mendelian Randomization,” Journal of the American Statistical Association 111, no. 513 (2016): 132–144. [Google Scholar]
  • 94.Small DS, “Sensitivity Analysis for Instrumental Variables Regression With Overidentifying Restrictions,” Journal of the American Statistical Association 102, no. 479 (2007): 1049–1058. [Google Scholar]
  • 95.Guo Z, “Causal Inference With Invalid Instruments: Post-Selection Problems and a Solution Using Searching and Sampling,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 85, no. 3 (2023): 959–985. [Google Scholar]
  • 96.Windmeijer F, Liang X, Hartwig FP, and Bowden J, “The Confidence Interval Method for Selecting Valid Instrumental Variables,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 83, no. 4 (2021): 752–776. [Google Scholar]
  • 97.Burgess S, Butterworth A, and Thompson SG, “Mendelian Randomization Analysis With Multiple Genetic Variants Using Summarized Data,” Genetic Epidemiology 37, no. 7 (2013): 658–665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Slob EA and Burgess S, “A Comparison of Robust Mendelian Randomization Methods Using Summary Data,” Genetic Epidemiology 44, no. 4 (2020): 313–329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Angrist JD, “Treatment Effect Heterogeneity in Theory and Practice,” Economic Journal 114, no. 494 (2004): C52–C83. [Google Scholar]
  • 100.Künzel SR, Sekhon JS, Bickel PJ, and Yu B, “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning,” Proceedings of the National Academy of Sciences 116, no. 10 (2019): 4156–4165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Powers S, Qian J, Jung K, et al., “Some Methods for Heterogeneous Treatment Effect Estimation in High Dimensions,” Statistics in Medicine 37, no. 11 (2018): 1767–1787. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Guindo-Martínez M, Amela R, Bonàs-Guarch S, et al., “The Impact of Non-Additive Genetic Associations on Age-Related Complex Diseases,” Nature Communications 12, no. 1 (2021): 2436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Veitia RA, Bottani S, and Birchler JA, “Gene Dosage Effects: Nonlinearities, Genetic Interactions, and Dosage Compensation,” Trends in Genetics 29, no. 7 (2013): 385–393. [DOI] [PubMed] [Google Scholar]

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
