Demystifying inconsistent two-sample mendelian randomization estimations using selection diagram

Lei Hou; Yuanyuan Yu; Zhi Geng; Fuzhong Xue; Hongkai Li

doi:10.1186/s12874-025-02707-x

2025 Dec 12;26:22. doi: 10.1186/s12874-025-02707-x

Lei Hou ^1,^2,^#, Yuanyuan Yu ^1,^2,^#, Zhi Geng ³, Fuzhong Xue ^1,^2,^4,^✉,^#, Hongkai Li ^2,^5,^✉,^#

PMCID: PMC12859973 PMID: 41388250

Abstract

Two-Sample Mendelian Randomization (TSMR) analysis is a widely used method for inferring causal effects in the presence of unmeasured confounding. However, causal inferences may be biased if the distributions of key variables (e.g., exposures, outcomes, and confounders) differ across populations. Such discrepancies in the distributions of key variables between the two populations are referred to as different local mechanisms. This paper aims to clarify the impact of different local mechanisms on the estimation of the Local Average Treatment Effect (LATE) in TSMR analyses using selection diagrams. We first uncover and formally define the Complete and Partial Inconsistent TSMR Estimations (InTSMRE). Subsequently, we propose a criterion for No InTSMRE in the context of continuous and binary outcomes. Following this, we introduce the LATE Ratio to evaluate the deviation of the LATE estimate from the true causal effect. Finally, we demonstrate that a violation of the Monotonicity condition can give rise to the Complete InTSMRE; when the condition holds, only the Partial InTSMRE can occur. Additionally, through simulation studies, we illustrate the specific conditions under which these InTSMRE arise. We explore the LATEs of waist-to-hip ratio on type 2 diabetes in European and mixed populations, demonstrating the phenomenon of the InTSMRE.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-025-02707-x.

Keywords: Inconsistent Two-Sample Mendelian Randomization Estimations, Local Average Treatment Effect, Selection Diagram, Monotonicity Condition, Local mechanisms

Introduction

Two-Sample Mendelian Randomization (TSMR) is a pivotal method for estimating unbiased causal effects using genetic variants as instrumental variables (IVs), particularly in the presence of unmeasured confounding [1–3]. This approach has been successfully applied in various fields, including cardiovascular, metabolic and oncology research [4–6]. A fundamental assumption of TSMR is population homogeneity. In practice, however, researchers often regard participants from the same ethnic background as adequately homogeneous, without thoroughly evaluating whether the distributions of key variables (e.g., exposures, outcomes, and confounders) are consistent across populations. Such discrepancies in the distributions of key variables between the two populations are referred to as different local mechanisms. Different local mechanisms are common, especially in life-course MR studies [5, 6]. For instance, Zhang et al. explored the causal relationship between childhood body mass index (BMI) and amyotrophic lateral sclerosis (ALS) risk in a TSMR framework [6]. The analysis used two distinct samples, one comprising children and the other adults. The distributions of BMI and ALS risk are heterogeneous between these samples, violating the homogeneity assumption. This variation across samples can significantly impact causal effect estimates, potentially leading to inconsistent or even contradictory findings. For example, while a multivariable TSMR study found that alcohol consumption decreased LDL-C levels [7], Rosoff et al. suggested that alcohol consumption had a positive effect on LDL-C levels [8]. The inconsistency in findings may arise from differences in research designs and variable distributions across populations.

The estimand in TSMR, known as the Local Average Treatment Effect (LATE), was introduced by Imbens and Angrist [9, 10]. The LATE represents the Average Treatment Effect (ATE) in a subgroup called "compliers", in which individuals are influenced by the random variation of an IV to either receive or avoid the treatment. When data on IVs, exposure, and outcome are collected from the same population, conventional methods such as the Wald ratio and the two-stage least squares approach are applicable for estimating the LATE [11–14]. TSMR assumes that the two samples originate from similar populations. However, when the samples are drawn from populations with inherent heterogeneity, the foundational assumptions of TSMR may be compromised, leading to potential inaccuracies in LATE estimation [15, 16]. Zhao has explored scenarios where the distributional properties of the IV differ between populations [17], but did not extend the analysis to disparities in local mechanisms, including exposure, outcome, and confounding variables, between populations.

Cinelli and Pearl [18] introduced a selection diagram based on Structural Causal Models to illustrate differing local mechanisms across populations within causal diagrams. We use selection nodes S, represented by square nodes (∎), to indicate local mechanisms suspected to differ between two populations. Selection diagrams depict the variation in local mechanisms affecting exposure, outcome, and unmeasured confounders in two populations. Figure 1 demonstrates how the local mechanisms depend on the three variables in two populations. The diagram uses grey shading to visually indicate which variables are unobserved in each population.

Fig. 1 — Selection diagrams in two-sample Mendelian randomization. (A) the local mechanisms of X are different in two samples; (B) the local mechanisms of Y are different in two samples; (C) the local mechanisms of U are different in two samples

For TSMR analyses, we identify three types of LATE estimands [Eqs. (6–8)]. These estimands may all share the same sign, positive or negative, or may point in opposite directions. This inconsistency, referred to as the Inconsistent Two-Sample Mendelian Randomization Estimations (InTSMRE), arises from the violation of conditional mean exchangeability across populations [17]. The violation is attributed to distinct local mechanisms of the exposure, outcome, and unmeasured confounders across different populations. This divergence is potentially driven by the interaction effects between the IV and unmeasured confounders on the exposure, as well as between confounders and exposure on the outcome, which vary by population.

In this paper, we aim to clarify the impact of two heterogeneous populations on LATE estimations in TSMR analyses. Initially, we introduce the notation and define the InTSMRE in Sect. 2.1, followed by criteria for No InTSMRE in Sect. 2.2. In Sect. 2.3, we define a LATE Ratio to quantitatively evaluate the relationships among three types of LATE estimands. We then discuss how the violation of the Monotonicity condition can exacerbate the InTSMRE in Sect. 2.4 and extend the analysis to continuous outcomes in Sect. 2.5. The multiple IVs for the InTSMRE are explored in Sect. 2.6. Additionally, a simulation study demonstrating the phenomenon of InTSMRE induced by variable interactions is presented in Sect. 3. Finally, Sect. 4 investigates the LATEs of waist-to-hip ratio on Type 2 diabetes in European and mixed populations, showcasing the practical implications of the InTSMRE.

Methods

Definitions and notation

Let $Z$ denote the IV, $X$ the binary exposure or treatment ($X = 0$ or $X = 1$), $Y$ the outcome (binary or continuous), and $U$ the unmeasured confounders. Let $X(z)$ denote the potential outcome of $X$ when $Z = z$, $Y(z, x)$ the potential outcome of $Y$ when $Z = z$ and $X = x$, and $Y(x)$ the potential outcome of $Y$ when $X = x$.

Let $\mathrm{ATE}_{ZX}$ and $\mathrm{ATE}_{ZY}$ represent the ATE of $Z$ on $X$ and the ATE of $Z$ on $Y$, respectively. Then $\mathrm{ATE}_{ZX}$ can be defined as

$$\mathrm{ATE}_{ZX} = E[X(1) - X(0)].$$

$\mathrm{ATE}_{ZY}$ can be defined as

$$\mathrm{ATE}_{ZY} = E[Y(1, X(1)) - Y(0, X(0))].$$

The $\mathrm{LATE}$ of $X$ on $Y$ can be defined as

$$\mathrm{LATE} = E[Y(1) - Y(0) \mid X(1) - X(0) = 1],$$

where $\mathrm{LATE}$ represents the ATE among individuals who are induced to take the treatment by assignment to the treatment, that is, individuals who satisfy $X(1) - X(0) = 1$.

Assumption 1. Stable unit treatment value assumption (SUTVA)

Potential outcomes for each individual are assumed to be independent of the treatment status of other individuals.

Assumption 2. Exclusion restriction

$Y(1, x) = Y(0, x)$ for $x = 0, 1$.

Assumption 2 captures that any effect of $Z$ on $Y$ must be via an effect of $Z$ on $X$.

Assumption 3. Nonzero ATE of Z on X

$E[X(1) - X(0)] \neq 0$.

Assumption 4. Monotonicity assumption

$X_i(1) \geq X_i(0)$ for all individuals $i$.

Under Assumptions 1, 2 and 4, $\mathrm{ATE}_{ZY}$ can be expressed as

$$\mathrm{ATE}_{ZY} = \mathrm{LATE} \times \mathrm{ATE}_{ZX}.$$

The ATE of $Z$ on $Y$ can thus be expressed as the product of the LATE of $X$ on $Y$ and the ATE of $Z$ on $X$. The proof for Eq. (3) is detailed in Supplementary Methods and Materials S1.

Under Assumptions 1–4, the LATE of $X$ on $Y$ can be identified by

$$\mathrm{LATE} = \frac{\mathrm{ATE}_{ZY}}{\mathrm{ATE}_{ZX}}.$$

Assumption 5: Random assignment

The treatment assignment $Z$ is random.

Under Assumption 5, the ATE of $Z$ on $Y$ equals the difference in observed outcomes conditioned on $Z = 1$ and $Z = 0$, and the LATE can also be identified by

$$\mathrm{LATE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[X \mid Z = 1] - E[X \mid Z = 0]}.$$

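The numerator and denominator can be estimated either within the same sample or across two different samples from two heterogeneous populations (Population I ($S = 1$) and Population II ($S = 2$)). Therefore, we have three types of LATE, written here as $\mathrm{LATE}_{1}$, $\mathrm{LATE}_{2}$ and $\mathrm{LATE}_{12}$:

(1) when $\mathrm{ATE}_{ZY}$ and $\mathrm{ATE}_{ZX}$ are both estimated from Population I:

$$\mathrm{LATE}_{1} = \frac{E[Y \mid Z = 1, S = 1] - E[Y \mid Z = 0, S = 1]}{E[X \mid Z = 1, S = 1] - E[X \mid Z = 0, S = 1]}$$

(2) when $\mathrm{ATE}_{ZY}$ and $\mathrm{ATE}_{ZX}$ are both estimated from Population II:

$$\mathrm{LATE}_{2} = \frac{E[Y \mid Z = 1, S = 2] - E[Y \mid Z = 0, S = 2]}{E[X \mid Z = 1, S = 2] - E[X \mid Z = 0, S = 2]}$$

(3) when $\mathrm{ATE}_{ZY}$ is estimated from Population II while $\mathrm{ATE}_{ZX}$ is estimated from Population I:

$$\mathrm{LATE}_{12} = \frac{E[Y \mid Z = 1, S = 2] - E[Y \mid Z = 0, S = 2]}{E[X \mid Z = 1, S = 1] - E[X \mid Z = 0, S = 1]}$$

For $\mathrm{LATE}_{1}$ and $\mathrm{LATE}_{2}$, estimates are derived from samples within a single population. We are particularly interested in $\mathrm{LATE}_{12}$, where $\mathrm{ATE}_{ZY}$ and $\mathrm{ATE}_{ZX}$ originate from two different populations exhibiting distinct local mechanisms. When two populations have the same local mechanisms, $\mathrm{LATE}_{12} = \mathrm{LATE}_{1}$ and $\mathrm{LATE}_{12} = \mathrm{LATE}_{2}$. However, when two populations have different local mechanisms, $\mathrm{LATE}_{12}$ is not consistent with $\mathrm{LATE}_{1}$ and $\mathrm{LATE}_{2}$, and the InTSMRE occurs. We formally define the InTSMRE in Definition 1, where we use a triple with elements "+" or "−" to represent the positivity or negativity of the three LATE values ($\mathrm{LATE}_{12}$, $\mathrm{LATE}_{1}$, $\mathrm{LATE}_{2}$).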

Formulas (7) and (8) are structurally identical: each estimates the LATE within a single population. This framework is ideal for one-sample MR, where data on genotype, exposure, and outcome come from a single cohort, ensuring that associations are estimated under identical “local mechanisms” to minimize bias. Theoretically, it also applies to two-sample MR, provided the two separate samples are demonstrably drawn from a common underlying population that shares the same local mechanisms. Formula (9) is the theoretical formula behind the vast majority of modern two-sample MR studies. It explicitly indicates that the effect estimates are derived from two populations with different local mechanisms.
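The three estimands described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `wald_ratio`, `three_lates` and `ThreeLates`, as well as the numerical inputs, are hypothetical, and the inputs stand for already-estimated IV-exposure and IV-outcome associations in each population.

```python
from typing import NamedTuple


class ThreeLates(NamedTuple):
    population_1: float    # both associations from Population I (S = 1)
    population_2: float    # both associations from Population II (S = 2)
    two_population: float  # Z-Y from Population II, Z-X from Population I


def wald_ratio(ate_zy: float, ate_zx: float) -> float:
    """Ratio of the IV-outcome association to the IV-exposure association."""
    if ate_zx == 0:
        raise ValueError("Assumption 3 requires a nonzero effect of Z on X")
    return ate_zy / ate_zx


def three_lates(ate_zy_1: float, ate_zx_1: float,
                ate_zy_2: float, ate_zx_2: float) -> ThreeLates:
    """Combine per-population associations into the three LATE types."""
    return ThreeLates(
        population_1=wald_ratio(ate_zy_1, ate_zx_1),
        population_2=wald_ratio(ate_zy_2, ate_zx_2),
        two_population=wald_ratio(ate_zy_2, ate_zx_1),
    )


# Hypothetical associations: the outcome association changes sign in Population II.
lates = three_lates(ate_zy_1=0.10, ate_zx_1=0.50, ate_zy_2=-0.06, ate_zx_2=0.30)
```

With these made-up inputs the Population I estimate is positive while the other two are negative, which is the kind of sign disagreement that Definition 1 formalizes.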

Definition 1. InTSMRE

Complete InTSMRE: either (+, − , −) or (− , +, +);
Partial InTSMRE: either (+, − , +), (+, +, −), (− , +, −) or (− , − , +).

Definition 1 provides two types of InTSMRE. The first, the Complete InTSMRE, occurs when the direction of the two-population LATE differs from that of both single-population LATEs, while the two single-population LATEs are aligned. The second, the Partial InTSMRE, arises when the two-population LATE aligns with at least one of the single-population LATEs, but the directions of the two single-population LATEs are opposite. Examples of Complete and Partial InTSMREs are shown in the Supplementary Methods and Materials S2.

Criterion for No InTSMRE

In practice, it is desirable to avoid the InTSMRE. We begin by introducing the definition of No InTSMRE.

Definition 2. No InTSMRE

There is no InTSMRE if and only if either (+, +, +) or (− , − , −).

Definition 2 asserts that the direction of the two-population LATE aligns with both single-population LATEs. When the two-population LATE aligns with only one of the single-population LATEs, the Partial InTSMRE may occur. When only the two single-population LATEs align with each other, the Complete InTSMRE may occur. Note that this definition does not imply that the values of the three estimands are identical; rather, it ensures that their directions are consistent. In Sect. 2.3, we will provide the quantitative relationships of the three LATE estimands. An example of No InTSMRE is shown in the Supplementary Methods and Materials S2.

Subsequently, Theorem 1 provides the necessary condition for No InTSMRE when the outcome is binary.

Assumption 6. (Counterfactual independence)

(a) Inline graphic ;

(b) Inline graphic in compliers.

Assumption 6 means that the assignment of individuals to different samples should not relate to their underlying treatment compliance behavior or potential outcomes. This assumption is more plausible when samples come from harmonized or randomized sources with similar recruitment protocols and ancestry backgrounds, when treatment and outcome measurements are standardized across studies, and when populations are genetically and socioeconomically homogeneous. Conversely, it becomes difficult to justify when there is substantial heterogeneity between samples, differential measurement error in phenotyping, or unmeasured effect modifiers that vary systematically with sample source.

Theorem 1 (Criterion for No InTSMRE)

For TSMR, under the Assumptions 1–5,

(1) when the local mechanisms of X are suspected to differ between two populations (Fig. 1A) and Assumption 6(a) is satisfied, there is No InTSMRE;
(2) when the local mechanisms of Y are suspected to differ between two populations (Fig. 1B) and Assumption 6(b) is satisfied, Partial InTSMRE may occur. There is No InTSMRE if at least one of the following conditions is satisfied:

where Inline graphic and . When the treatment effect is monotonic, that is for all individuals, then , and only the condition (a) needs to be satisfied.

(3) when the local mechanisms of U are suspected to differ between two populations (Fig. 1C) and Assumption 6 is satisfied, Partial InTSMRE may occur. There is No InTSMRE if at least one of conditions (10) or (11) is satisfied.

The Assumptions 6 and Inline graphic are defined in Pearl and Tian [19, 20]. The emphasizes the relationship with the “probability of sufficiency” (PS). represents the probability that treatment alone is sufficient to cause death. denotes the probability that the treatment is sufficient to save a person who would otherwise die if treatment were withheld. PSP is the proportion of these two probabilities. As inferred from Figures S1−3 in the Supplementary Methods and Materials S3, Assumption 6 posits that these probabilities of causation among compliers remain constant across populations (S = 1 and S = 2), regardless of whether Inline graphic and are independent or not. In MR, can be estimated by and is the prediction of using IVs. Details of the proof are presented in Supplementary Methods and Materials S3. Theorem 1 further indicates that when the Assumptions 1–6 are satisfied, the direction of aligns consistently with Inline graphic , leading to either no InTSMRE or a Partial InTSMRE. The Complete InTSMRE does not occur. The Complete InTSMRE will occur when the Assumption 4 is violated and we will introduce this in the Sect. 2.4.

LATE ratio

Even there is No InTSMRE, the value of Inline graphic may not equal to or . For instance, when the local mechanisms of X differ between two populations, is not equivalent to . is quantitatively equivalent to multiply . We define a LATE Ratio, denoted as .

The LATE Ratio can be used to connect the three LATE estimands.

Corollary 1.1. (Quantitative relationships of TSMR estimands)

For TSMR, under the Assumptions 1–5,

(1) when the local mechanisms of $X$ are suspected to differ between two populations and Assumption 6(a) is satisfied,

Inline graphic ,

where Inline graphic ;

(2) when the local mechanisms of $Y$ are suspected to differ between two populations and Assumption 6(b) is satisfied,

Inline graphic ,

Inline graphic where in compliers;

(3) when the local mechanisms of Inline graphic are suspected to differ between two populations and Assumption 6 is satisfied, the inconsistence of in population I is

Inline graphic

Inline graphic ,

where Inline graphic .

Corollary 1.1 provides the quantitative relationships of the three estimands using the LATE Ratio. When the local mechanisms of X are suspected to differ between two populations, the directions of the three estimands are aligned, indicating No InTSMRE. Although the two single-population LATEs are equal, the two-population LATE differs from them, with the discrepancy given by the LATE Ratio. When the local mechanisms of Y differ, the two-population LATE equals the Population II LATE but not the Population I LATE, potentially leading to a Partial InTSMRE. If the local mechanisms of U differ across populations, all three estimands diverge. An InTSMRE implies a non-zero difference among the three estimands, but the converse does not hold: a difference among the estimands does not necessarily imply an InTSMRE. The proofs for Corollary 1.1 are shown in Supplementary Methods and Materials S4.

InTSMRE for continuous outcome

Theorem 2 (Criterion for No InTSMRE)

For TSMR, under the Assumptions 1–4,

(1) when the local mechanisms of X are suspected to differ between two populations and Assumption 6(a) is satisfied, there is No InTSMRE;
(2) when the local mechanisms of Y are suspected to differ between two populations and Assumption 6(b) is satisfied, Partial InTSMRE may occur. There is No InTSMRE among different studies if at least one of the following conditions is satisfied: for S = 1 and S = 2,

13

14

(3) when the local mechanisms of U are suspected to differ between two populations and Assumption 6 is satisfied, Partial InTSMRE may occur. There is No InTSMRE if, for S = 1 and S = 2, at least one of conditions (13) or (14) is satisfied.

Corollary 2.1 (Quantitative relationships of TSMR estimands)

For TSMR, under the Assumptions 1–4,

(1) when the local mechanisms of X are suspected to differ between two populations and Assumption 6(a) is satisfied,

Inline graphic ,where ;

(2) when the local mechanisms of Y are suspected to differ between two populations and Assumptions 6(b) and 6.1 are satisfied,

Inline graphic ,

Inline graphic where ;

(3) when the local mechanisms of U are suspected to differ between two populations and Assumptions 6 and 6.1 are satisfied,

Inline graphic

Inline graphic where .

The proofs of the Theorem 2 and Corollary 2.1 are shown in the Supplementary Methods and Materials S6.

Multiple IVs

When multiple IVs are involved, the InTSMRE becomes considerably more complex, particularly within TSMR. TSMR employs multiple single nucleotide polymorphisms (SNPs) as IVs to infer the causal effect of an exposure on an outcome. Common MR methods, including Inverse Variance Weighted (IVW), MR-Egger, weighted median and mode-based methods, infer the causal effect of the exposure on the outcome based on the LATE for each IV. Taking IVW as an example, it calculates a weighted average of these effect sizes, with weights determined by the inverse of their variances. When each IV exhibits a distinct form of the InTSMRE, such as one leading to a Complete InTSMRE and another to a Partial InTSMRE, assessing the Criterion for No InTSMRE for multiple IVs becomes complicated. Additionally, the variables influencing local mechanisms between two populations may vary across different IVs. For example, in the TSMR estimation, one IV may depend on certain local mechanisms, while another IV depends on different mechanisms. In this case, the Criterion for No InTSMRE is more complex. Practitioners should first evaluate each IV individually to ensure there is No InTSMRE for any single IV, thereby reducing the likelihood of an InTSMRE with multiple IVs. While satisfying the Criterion for No InTSMRE for each IV is a sufficient condition for avoiding the InTSMRE with multiple IVs, it is not a necessary condition.
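To make the IVW description concrete, the following minimal sketch computes an inverse-variance weighted average of per-IV ratio estimates from summary statistics. It assumes the common first-order approximation in which the standard error of each ratio is the outcome standard error divided by the absolute IV-exposure association; that approximation, the function name `ivw_estimate`, and the example numbers are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_zx: Sequence[float],
                 beta_zy: Sequence[float],
                 se_zy: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted average of per-IV ratio estimates.

    Assumes se(ratio_j) ~= se_zy_j / |beta_zx_j| (first-order approximation),
    so the weight of IV j is (beta_zx_j / se_zy_j) ** 2.
    """
    ratios = [by / bx for bx, by in zip(beta_zx, beta_zy)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_zx, se_zy)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Two hypothetical IVs whose ratio estimates point in opposite directions.
estimate, standard_error = ivw_estimate(
    beta_zx=[0.1, 0.1],
    beta_zy=[0.05, -0.05],
    se_zy=[0.01, 0.01],
)
```

In this made-up case the two per-IV estimates have equal weight and opposite sign, so the pooled estimate is zero even though each IV on its own suggests a nonzero effect. This illustrates why the text recommends screening each IV individually before pooling.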

A practical workflow for screening and diagnosing InTSMRE

While the preceding sections have theoretically defined the InTSMRE and elaborated on the conditions and criteria for its occurrence, translating this framework into practice is essential for applied research. This section, therefore, proposes a systematic workflow (Fig. 2) for testing and diagnosing InTSMRE. This protocol is designed to serve as a critical pre-analytical screening step before conducting a standard two-sample MR analysis. By evaluating each IV individually, this process is particularly crucial for studies involving multiple IVs, aiming to enhance the robustness and reliability of the final causal inference.

Fig. 2 — The LATE estimations in the scenario of one IV with varying interaction effects of Z and U on X in Population I. “Population I”, “Population II” and “Two Populations” in the legend denote the estimations of the three LATEs

Step 1: Basic data preparation

Clearly define Population I and Population II and prepare the GWAS summary data for both the exposure and the outcome in these two populations.

Step 2: Calculation of LATE estimands

Calculate the three types of LATE estimands: the Population I LATE, the Population II LATE, and the two-population LATE.

Step 3: Classification of InTSMRE and IV screening

Based on the signs of these three LATE estimands, classify the type of InTSMRE for each IV:

Complete InTSMRE: either (+, − , −) or (− , +, +);
Partial InTSMRE: either (+, − , +), (+, +, −), (− , +, −) or (− , − , +);
No InTSMRE: either (+, +, +) or (− , − , −).

Based on this classification, we recommend an IV screening procedure: IVs classified as “No InTSMRE” can be included in the subsequent multivariable MR analysis. For IVs exhibiting “Complete InTSMRE” or “Partial InTSMRE”, their exclusion should be considered to mitigate the risk of bias in the overall estimate.
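Steps 2 and 3 can be expressed as a short sign-based classifier. The sketch below assumes the triple is ordered as (two-population LATE, Population I LATE, Population II LATE); the function names and the treatment of an exactly zero estimate as "Undetermined" are illustrative choices that the text does not specify.

```python
from typing import Dict, List, Tuple


def sign(value: float) -> str:
    """Return '+', '-' or '0' for a LATE estimate."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"


def classify_intsmre(late_two: float, late_1: float, late_2: float) -> str:
    """Classify one IV from the signs of its three LATE estimates."""
    signs = (sign(late_two), sign(late_1), sign(late_2))
    if "0" in signs:
        return "Undetermined"  # assumption: a zero estimate has no direction
    if signs[0] == signs[1] == signs[2]:
        return "No InTSMRE"
    if signs[1] == signs[2]:
        # The single-population LATEs agree, the two-population LATE opposes both.
        return "Complete InTSMRE"
    return "Partial InTSMRE"


def screen_ivs(lates: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Keep only the IVs classified as 'No InTSMRE'."""
    return [iv for iv, triple in lates.items()
            if classify_intsmre(*triple) == "No InTSMRE"]


# Hypothetical IV labels and estimates.
example = {
    "iv_a": (0.30, 0.20, 0.10),
    "iv_b": (0.30, -0.20, -0.10),
    "iv_c": (0.30, -0.20, 0.10),
}
kept = screen_ivs(example)
```

In practice the signs of noisy estimates are themselves uncertain, so a sign-only rule of this kind may be best read alongside the confidence intervals of the three estimates.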

Step 4: Diagnostic analysis of inconsistency

When Partial or Complete InTSMRE is detected, researchers can perform further diagnostic analyses to investigate the potential underlying causes:

Assess the monotonicity assumption

This is a critical step for distinguishing the risks associated with different types of InTSMRE.

If the Monotonicity assumption holds, only Partial InTSMRE may occur. This could stem from different local mechanisms of the X, the Y, or U. If individual-level data are accessible, researchers can empirically test for differences in the distributions of the X or Y between the two populations to provide evidence for the source of heterogeneity.

If the Monotonicity assumption is violated, Complete InTSMRE may occur, particularly when the differences in local mechanisms are attributable to the X or U. If Partial InTSMRE occurs under this condition, the cause may be linked to different local mechanisms of the Y.

When a study involves more than two populations, we propose a more rigorous “all-pairs” screening strategy to ensure the robustness of the causal inference. This strategy requires applying the Practical Workflow for Screening and Diagnosing InTSMRE independently to each IV across all possible population pairings. Only those IVs that consistently exhibit "No InTSMRE" in every single pairwise comparison are selected for the final Mendelian randomization analysis. Although this stringent approach may reduce the number of available IVs, it substantially enhances the robustness and generalizability of the causal inference, ensuring that the final effect estimate is based on the most reliable genetic instruments that demonstrate consistent performance across diverse population backgrounds.
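The all-pairs strategy can be sketched as a loop over every pair of populations for each IV. The data layout (a nested dictionary mapping each IV to per-population `(ate_zx, ate_zy)` tuples), the function names, and the choice to take the exposure association from the first population of each pair are assumptions made for this illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Stats = Dict[str, Dict[str, Tuple[float, float]]]  # iv -> population -> (ate_zx, ate_zy)


def same_direction(values: Iterable[float]) -> bool:
    """True when all values are strictly positive or all strictly negative."""
    values = list(values)
    return all(v > 0 for v in values) or all(v < 0 for v in values)


def all_pairs_screen(stats: Stats) -> List[str]:
    """Keep IVs whose three LATEs agree in sign for every population pair."""
    kept = []
    for iv, by_population in stats.items():
        consistent = True
        for pop_a, pop_b in combinations(sorted(by_population), 2):
            zx_a, zy_a = by_population[pop_a]
            zx_b, zy_b = by_population[pop_b]
            if zx_a == 0 or zx_b == 0:
                consistent = False  # ratio undefined without an IV-exposure effect
                break
            # (two-population, first population, second population)
            lates = (zy_b / zx_a, zy_a / zx_a, zy_b / zx_b)
            if not same_direction(lates):
                consistent = False
                break
        if consistent:
            kept.append(iv)
    return kept


# Hypothetical summary statistics for two IVs in three populations.
stats = {
    "iv_a": {"P1": (0.5, 0.10), "P2": (0.4, 0.08), "P3": (0.3, 0.03)},
    "iv_b": {"P1": (0.5, 0.10), "P2": (0.4, -0.08), "P3": (0.3, 0.03)},
}
selected = all_pairs_screen(stats)
```

Because every pair must pass, the number of comparisons per IV grows quadratically with the number of populations, which is consistent with the remark that this stringent approach may reduce the number of available IVs.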

In summary, although this testing workflow involves multiple steps, its logic is clear, and each step has a distinct criterion. Performing this rapid screening and classification for each IV prior to conducting a two-sample MR analysis is not excessively time-consuming, yet it can significantly enhance the reliability of the final causal inference.

Simulation study

To demonstrate the performance of LATE estimations using TSMR methods when two populations have different local mechanisms, we conduct a simulation for both continuous and binary outcomes. The data generation process for the continuous outcome is as follows:

For each IV ( Inline graphic ),

where S = 1 or S = 2 represents the two populations. We generate data for two populations, incorporating interaction terms of $Z$ and $U$ on $X$ as well as $U$ and $X$ on $Y$. To demonstrate the different local mechanisms in the two populations, we set different parameters in each population. We compare the above three types of LATE estimations (Population I, Population II and Two Populations), corresponding to the blue, green and red lines in Figs. 3 and 4.
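A toy version of such a simulation can be written with NumPy. The sketch below is not the authors' data generation process (which is detailed in their supplement): the binary IV, the functional forms, every coefficient and the names `simulate_population`, `mean_difference` and `three_lates_from_samples` are illustrative assumptions. It only shows how interaction terms that differ by population can be encoded and how the three estimates are then formed.

```python
import numpy as np


def simulate_population(n, zu_interaction, xu_interaction, rng):
    """Draw (Z, X, Y) with a Z-U interaction on X and an X-U interaction on Y."""
    z = rng.binomial(1, 0.5, size=n)   # binary IV (assumption)
    u = rng.normal(size=n)             # unmeasured confounder
    latent = 0.5 * z + 0.5 * u + zu_interaction * z * u + rng.normal(size=n)
    x = (latent > 0).astype(float)     # binary exposure
    y = 1.0 * x + 0.5 * u + xu_interaction * x * u + rng.normal(size=n)
    return z, x, y


def mean_difference(z, values):
    """Difference in sample means between Z = 1 and Z = 0."""
    return values[z == 1].mean() - values[z == 0].mean()


def three_lates_from_samples(pop_1, pop_2):
    """Return (Population I, Population II, two-population) Wald ratios."""
    z1, x1, y1 = pop_1
    z2, x2, y2 = pop_2
    zx_1, zy_1 = mean_difference(z1, x1), mean_difference(z1, y1)
    zx_2, zy_2 = mean_difference(z2, x2), mean_difference(z2, y2)
    return zy_1 / zx_1, zy_2 / zx_2, zy_2 / zx_1


rng = np.random.default_rng(2025)
# Population I has no interactions; Population II has both. With a strongly
# negative Z-U interaction, Z lowers the latent exposure for large U.
pop_1 = simulate_population(50_000, zu_interaction=0.0, xu_interaction=0.0, rng=rng)
pop_2 = simulate_population(50_000, zu_interaction=-1.5, xu_interaction=2.0, rng=rng)
late_1, late_2, late_12 = three_lates_from_samples(pop_1, pop_2)
```

Varying `zu_interaction` and `xu_interaction` over a grid and recording the signs of the three estimates gives a rough analogue of the sign comparisons summarized in the simulation figures.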

Fig. 3 — The LATE estimations in the scenario of multiple IVs with varying interaction effects of Z and U on X in Population I. “Population I”, “Population II” and “Two Populations” in the legend denote the estimations of the three LATEs. Six TSMR methods, including Weighted Median, MR Robust, MR Egger, MR IVW, MR Lasso and MR contamination mixture, are used to estimate the LATE

Fig. 4 — The LATE estimations of Waist-to-hip ratio on Type 2 diabetes using six methods including Weighted Median, MR Robust, MR Egger, MR IVW, MR Lasso and MR contamination mixture

Figure 3 demonstrates the InTSMRE for one IV. Figure 3(A) shows the Complete InTSMRE, where the two-population LATE estimate is in the opposite direction to both single-population LATE estimates. Figure 3(B) presents the Partial InTSMRE, where the two-population LATE aligns with one single-population LATE but opposes the other. Figure 3(C) depicts No InTSMRE, with all three LATE estimates in the same direction. The simulation results for the binary outcome are similar to those in Fig. 3.

Figure 4 illustrates the InTSMRE for multiple IVs. Six TSMR methods, including Weighted Median, MR-Robust, MR-Egger, MR-IVW, MR-Lasso and MR contamination mixture, are used to estimate the LATE. The first column in Fig. 4 shows the Complete InTSMRE, where the two-population LATE has the opposite direction to both single-population LATEs. The second column demonstrates a Partial InTSMRE, where the two-population LATE aligns with one single-population LATE but opposes the other. The third column illustrates No InTSMRE, with all three LATE estimates in the same direction. The estimates corresponding to Figs. 3 and 4 are provided in Supplementary Tables 4 and 5, with details of the simulation study in Supplementary Methods and Materials S8.

Application

In this section, we explore the LATE of waist-to-hip ratio (WHR) on type 2 diabetes (T2D) in European and mixed populations using GWAS summary data from the MRbase platform (https://www.mrbase.org/). Details are provided in Table 1. The first single-population LATE is estimated using GWAS summary data from the mixed population for both WHR (id: ebi-a-GCST90095041) and T2D (id: ieu-a-24). The second single-population LATE is estimated using GWAS summary data from the European population for both WHR (id: ieu-a-73) and T2D (id: finn-b-E4_DM2OPTH). The two-population LATE is estimated using GWAS summary data from the mixed population for WHR (id: ebi-a-GCST90095041) and the European population for T2D (id: finn-b-E4_DM2OPTH). We extracted the SNPs that passed the significance threshold and clumped them according to the linkage disequilibrium threshold; the F-statistics of all SNPs are above 10.
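The instrument-strength check mentioned above can be sketched as follows. The per-SNP approximation F ≈ (beta / se)², the function names, the SNP labels and the numbers are assumptions for illustration; the text states only that all F-statistics exceed 10.

```python
from typing import Dict, List, Tuple


def f_statistic(beta_zx: float, se_zx: float) -> float:
    """Approximate per-SNP F-statistic from the SNP-exposure association."""
    return (beta_zx / se_zx) ** 2


def strong_instruments(snps: Dict[str, Tuple[float, float]],
                       threshold: float = 10.0) -> List[str]:
    """Return the SNPs whose approximate F-statistic exceeds the threshold."""
    return [snp for snp, (beta, se) in snps.items()
            if f_statistic(beta, se) > threshold]


# Hypothetical SNP-exposure summary statistics: (beta, standard error).
snps = {"snp_1": (0.05, 0.01), "snp_2": (0.02, 0.01)}
strong = strong_instruments(snps)
```

Here `snp_1` has an approximate F-statistic of about 25 and is retained, whereas `snp_2` (about 4) would be dropped as a weak instrument.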

Table 1. GWAS summary data in Application

ID	Trait	Year	Population	Sample size (cases/controls)	Number of SNPs
ebi-a-GCST90095041	Waist-to-hip ratio	2022	Mixed	77,482	2,083,711
ieu-a-73	Waist-to-hip ratio	2015	European	212,244	2,560,782
ieu-a-24	Type 2 diabetes	2012	Mixed	149,821 (34,840/114,981)	127,904
finn-b-E4_DM2OPTH	Type 2 diabetes	2021	European	185,304 (2,119/183,185)	16,380,340

Figure 5 reveals the estimations of the three LATEs using the six methods discussed in the simulation section. One single-population LATE suggests that increasing WHR is protective against T2D, whereas the other suggests that higher WHR increases the risk of T2D; the two-population LATE agrees in direction with only one of them. However, it is generally acknowledged that WHR is a risk factor for T2D. These findings highlight the Partial InTSMRE, likely driven by the heterogeneity between European and mixed populations. Differences in the distributions of WHR, T2D, or unmeasured confounders between these populations may distort causal effect estimations.

Fig. 5 — Guideline for practical application

Discussion

In this article, we define the InTSMRE using selection diagrams and provide the criterion for No InTSMRE when different local mechanisms exist between populations. Our analysis reveals that the Monotonicity condition rules out the Complete InTSMRE, although the Partial InTSMRE may still occur. In addition, we introduce the LATE Ratio to describe the invariance of the Monotonicity condition across populations and the quantitative relationships among the three TSMR estimators. We also illustrate the InTSMRE through simulations and examples to enhance understanding and offer strategies for addressing it.

The Monotonicity condition is a key assumption in TSMR. In MR studies, this condition is ensured by maintaining a positive association coefficient between the exposure and each IV, followed by harmonizing the reference and effect alleles between the exposure and outcome GWAS [21, 22]. However, the Monotonicity condition, while potentially satisfied in one population, may not hold in another due to divergent local mechanisms. We provide the condition of Monotonicity condition invariance using LATE Ratio to test whether this phenomenon occurs. Notably, many studies using summary statistics to infer causal relationships might unintentionally exacerbate the InTSMRE by neglecting this condition, particularly when local mechanisms vary across populations. Population stratification in MR studies represents that the genetic structures, which can induce a connection between Inline graphic and . If the genetic structures differ between populations, the InTSMRE may also occur. This suggests that local mechanisms between two populations depend on the . is randomized in all populations. It can be illustrated as the different local mechanisms between two populations depend on Inline graphic , or depend on or so induce the connection between and because acts as a collider. The InTSMRE becomes more complex when multiple IVs are involved. In this paper, we only provide the criterion of No InTSMRE when there is one IV. The various forms of the InTSMRE with multiple IVs complicate the assessment of the Criterion for No InTSMRE. We caution readers that when using multiple IVs, the possibility of the InTSMRE can only be minimized by reducing the likelihood of its occurrence in each individual IV.

We evaluate the plausibility of the assumptions (especially Assumption 6) in the setting of our two-sample MR analysis estimating the LATE of WHR on T2D. Assumption 6 requires that the sample source S is independent of both genetic predisposition to WHR (treatment) and potential T2D outcomes among compliers. The first part is reasonable since genetic instruments for WHR exhibit similar biological effects across European and mixed populations, especially given shared European ancestry and standardized GWAS protocols. The second part is also plausible, as both populations demonstrate consistent genetic architectures and phenotyping methods for WHR and T2D, reducing the risk of sample-driven bias in counterfactual outcomes. Although unmeasured environmental differences may exist, their influence remains limited in this context.

In conclusion, we establish the Criterion for No InTSMRE when different local mechanisms exist between populations, offering theoretical and practical guidance for applying TSMR in analysis.

Supplementary Information

Supplementary Material 1. Proof of Theorems and Corollaries and details of simulation (12874_2025_2707_MOESM1_ESM.docx, 5.6 MB).

Supplementary Material 2. Details of simulation results (12874_2025_2707_MOESM2_ESM.xlsx, 140.2 KB).

Acknowledgements

Not applicable.

Clinical trial number

Not applicable.

Authors’ contributions

XF, LH, YY and HL conceived the study. YY, LX and HL contributed to the data simulation and the application. YY, HL, and ZG wrote the manuscript with input from all other authors. All authors reviewed and approved the final manuscript.

Funding

National Key Research and Development Program of China (Grant No. 2022YFC3502100), Shandong Province Key R&D Program Project (Grant No. 2021SFGC0504). National Natural Science Foundation Project of China (Grant No. 82173625 and 82404377), National Natural Science Foundation Special Project of China (Grant No. T2341018), Shandong Provincial Natural Science Foundation Youth Fund (Grant No. ZR2023QH236) and 2021 Shandong Medical Association Clinical Research Fund -- Qilu Special Project (Grant No. YXH2022DZX02008).

Data availability

Code to implement the method and reproduce all simulations and analyses is available from the corresponding author upon reasonable request. Data used in the applied example can be obtained from MR-Base (https://www.mrbase.org).

Declarations

Ethics approval and consent to participate

Ethical approval was not sought, because this study involved analysis of publicly available data.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lei Hou and Yuanyuan Yu contributed equally to this work.

Fuzhong Xue and Hongkai Li contributed equally to this work.

Contributor Information

Fuzhong Xue, Email: xuefzh@sdu.edu.cn.

Hongkai Li, Email: lihongkaiyouxiang@163.com.



