MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation

Benjamin Woolf; Dipender Gill; Andrew J Grant; Stephen Burgess

doi:10.1371/journal.pone.0291183

PLoS One. 2024 May 7;19(5):e0291183.

Editor: Suyan Tian

PMCID: PMC11075861 PMID: 38713711

Abstract

Background

Mendelian randomisation (MR) is the use of genetic variants as instrumental variables. Mode-based estimators (MBE) are one of the most popular types of estimators used in univariable-MR studies and are often used as a sensitivity analysis for pleiotropy. However, because there are no plurality valid regression estimators, modal estimators for multivariable-MR have been under-explored.

Methods

We use the residual framework for multivariable-MR to introduce two multivariable modal estimators: multivariable-MBE, which uses IVW to create residuals fed into a traditional plurality valid estimator, and an estimator which instead has the residuals fed into the contamination mixture method (CM), multivariable-CM. We then use Monte-Carlo simulations to explore the performance of these estimators when compared to existing ones and re-analyse the data used by Grant and Burgess (2021) looking at the causal effect of intelligence, education, and household income on Alzheimer’s disease as an applied example.

Results

In our simulation, we found that multivariable-MBE was generally too variable to be of much use. Multivariable-CM, on the other hand, produced more precise estimates. Multivariable-CM performed better than MR-Egger in almost all settings, and better than Weighted Median under balanced pleiotropy. However, it underperformed Weighted Median when there was a moderate amount of directional pleiotropy. Our re-analysis supported the conclusion of Grant and Burgess (2021) that intelligence has a protective effect on Alzheimer’s disease, while education and household income do not have a causal effect.

Conclusions

Here we introduced two non-regression-based plurality valid estimators for multivariable MR. Of these, “multivariable-CM”, which uses IVW to create residuals fed into a contamination-mixture model, performed the best. This estimator relies on the assumption that a plurality of variants are valid instruments, and appears to provide precise and unbiased estimates in the presence of balanced pleiotropy and small amounts of directional pleiotropy.

Background

Mendelian randomisation (MR) is an increasingly popular method for causal inference in epidemiology which uses the random assignment of genetic variants at birth to justify the assumptions of an instrumental variables analysis [1, 2]. In a traditional MR study, genetic variants (typically single-nucleotide polymorphisms, SNPs) which robustly associate (typically at genome-wide significance) with an exposure of interest are selected as instruments [3]. Because of the easy accessibility of Genome-Wide Association Study (GWAS) summary statistics for many epidemiological traits, MR is often implemented using summary data, in a so-called ‘two-sample MR’ analysis [4]. In such a setting, the effect of the exposure on the outcome is estimated for each variant using a Wald ratio, i.e. the variant-outcome association divided by the variant-exposure association. When there are multiple variants, their Wald ratios are generally combined using an inverse variance weighted (IVW) meta-analysis.
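
As an illustration of the Wald ratio and IVW steps described above, the following is a minimal Python sketch rather than the implementation of any particular package. The function names and the three-variant summary statistics are hypothetical, and the standard errors use the usual first-order approximation, which treats the variant-exposure associations as if they were measured without error.

```python
import math

def wald_ratios(beta_x, beta_y, se_y):
    """Per-variant Wald ratios with first-order standard errors."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_x, se_y)]
    return ratios, ses

def ivw(ratios, ses):
    """Fixed-effect inverse-variance weighted average of the Wald ratios."""
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))

# Hypothetical summary statistics for three valid variants (illustrative only):
# every variant-outcome association equals 0.3 times the variant-exposure one.
beta_x = [0.05, 0.04, 0.06]
beta_y = [0.015, 0.012, 0.018]
se_y = [0.005, 0.005, 0.005]

ratios, ses = wald_ratios(beta_x, beta_y, se_y)
ivw_estimate, ivw_se = ivw(ratios, ses)
print(ivw_estimate, ivw_se)
```

Because every variant in this toy input implies the same ratio, the IVW estimate coincides with that common value; with real data the ratios differ and the weights determine how much each variant contributes.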

On top of requiring a robust genotype-exposure association, instrumental variables analysis requires that there are no variant-outcome confounders, and that the variant can only cause the outcome via the exposure. The first of these assumptions is justified by Mendel’s laws of independent and random segregation. However, the second assumption is less plausible due to pleiotropy (the association of most variants with multiple traits). Pleiotropy can occur for two reasons: Firstly, if the exposure causes many other traits, then the genetic variants which associate with it should also associate with these other traits. This type of pleiotropy (often called vertical pleiotropy) is required for MR to work. However, the second type of pleiotropy (horizontal pleiotropy) occurs when the genetic variants independently cause two phenotypes. A second advantage of two-sample MR is that it allows for the implementation of ‘pleiotropy robust’ estimators [5]. These methods generally allow for some variants to be pleiotropic by modifying the assumptions of the instrumental variables framework. One of the first methods proposed for doing this is MR-Egger. IVW can be conceptualised as a weighted intercept-free regression of the variant-outcome associations on the variant-exposure associations. MR-Egger fits the same model as IVW but with an intercept. This model is robust to pleiotropy if the instrument strength is independent of the strength of the direct, pleiotropic, effect (called the InSIDE assumption) [6].
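
The relationship between IVW and MR-Egger as weighted regressions can be sketched as follows. This is an illustrative Python example under stated assumptions: the function names and data are hypothetical, the weights are the inverse variances of the variant-outcome associations, and the variants are assumed to have already been oriented so that the variant-exposure associations are positive.

```python
def ivw_slope(beta_x, beta_y, se_y):
    """Weighted regression through the origin (weights 1 / se_y^2)."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y))
    den = sum(wi * x * x for wi, x in zip(w, beta_x))
    return num / den

def egger_fit(beta_x, beta_y, se_y):
    """The same weighted regression, but with a free intercept."""
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, beta_x)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, beta_y)) / sw
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, beta_x, beta_y))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, beta_x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: a causal effect of 0.3 plus a constant directional
# pleiotropic effect of 0.01 on every variant-outcome association.
beta_x = [0.02, 0.04, 0.06, 0.08]
beta_y = [0.3 * x + 0.01 for x in beta_x]
se_y = [0.005] * 4

ivw_estimate = ivw_slope(beta_x, beta_y, se_y)
egger_intercept, egger_slope = egger_fit(beta_x, beta_y, se_y)
print(ivw_estimate, egger_intercept, egger_slope)
```

In this contrived input the pleiotropic effect is identical for all variants, so it is absorbed by the Egger intercept, while the intercept-free IVW slope is pulled away from 0.3.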

A recent systematic review of two-sample MR studies found that the most frequently implemented pleiotropy robust estimators were MR-Egger, weighted median, and weighted mode [7]. Weighted Median will provide valid estimates if at least half the variants are valid instruments, and so is called a ‘majority valid’ estimator. Weighted mode makes the ZEro Modal Pleiotropy Assumption (ZEMPA), i.e. that there is zero pleiotropy in the modal estimand of the causal effect [8]. ZEMPA is plausible because we should expect the causal effects for variants which are valid instruments to be similar, but each invalid variant to have its own unique pleiotropic bias [9]. If the unique paths are independent of each other, then so too should the biases they exert on invalid variants. Thus, valid variants should have clustered effect estimates, while invalid variants should create heterogeneity. Hence, in settings where there are some valid instruments, we should expect the most common effect estimated to be the valid causal parameter. Herein, we call estimators of this type, which produce valid estimates when a plurality of SNPs are valid, ‘plurality valid’ estimators.

Estimating modes directly from observed data can be difficult because no two estimates are ever exactly equal. Therefore, the most common observation at a given level of precision may be very different from the true mode. Traditional MBEs avoid this dilemma by smoothing the observed distribution using a parametric kernel-density function. This converts the observed estimates into a probability density distribution, and the mode of this distribution is then selected. An alternative plurality valid estimator comes from the contamination mixture method [10].
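
The kernel-density smoothing step can be sketched as below. This is a simplified Python illustration, not the published MBE: it assumes a Gaussian kernel, a plain Silverman-type bandwidth (published implementations use a modified rule), a grid search for the maximum, and hypothetical ratio estimates. The optional `weights` argument is likewise an illustrative stand-in for inverse-variance weighting.

```python
import math
import statistics

def kde_mode(estimates, weights=None, bandwidth=None, grid_size=2001):
    """Mode of a Gaussian kernel-density smooth of the ratio estimates."""
    n = len(estimates)
    if weights is None:
        weights = [1.0] * n
    if bandwidth is None:
        # Simplified Silverman-type rule (an assumption of this sketch).
        bandwidth = 0.9 * statistics.stdev(estimates) / n ** 0.2
    lo, hi = min(estimates), max(estimates)
    best_x, best_density = lo, -1.0
    for k in range(grid_size):
        x = lo + (hi - lo) * k / (grid_size - 1)
        density = sum(w * math.exp(-0.5 * ((x - e) / bandwidth) ** 2)
                      for w, e in zip(weights, estimates))
        if density > best_density:
            best_x, best_density = x, density
    return best_x

# Hypothetical Wald ratios: four valid variants clustered near 0.3 and
# three invalid variants, each with its own pleiotropic bias.
ratio_estimates = [0.29, 0.30, 0.31, 0.30, 0.80, -0.40, 1.20]
mode_estimate = kde_mode(ratio_estimates)
print(mode_estimate)
```

No two of the inputs need to be exactly equal for a mode to be found: the smoothed density peaks where the estimates cluster, although the invalid variants can still pull the peak slightly away from the cluster centre.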

The contamination mixture method uses a maximum likelihood approach, assuming the variant specific Wald ratios are normally distributed [10]. It produces a consistent estimator of the causal effect under the plurality valid (ZEMPA) assumption. The advantages of the contamination mixture method are that it does not require the parametric assumptions of the kernel-density function, is more computationally efficient, and generally produces more precise estimates with potentially asymmetric confidence intervals [10].
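
To make the likelihood idea concrete, the following Python sketch shows one way a profile likelihood of this kind can be evaluated. It assumes, as in our reading of the method, that each Wald ratio is normally distributed about the causal effect if the variant is valid, and about zero with an inflated standard deviation if it is invalid, and that for each candidate causal effect every variant contributes whichever of the two densities is larger. The value of `psi` (the extra standard deviation of invalid variants), the grid, and the data are all hypothetical choices made for illustration.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return (-0.5 * ((x - mean) / sd) ** 2
            - math.log(sd) - 0.5 * math.log(2 * math.pi))

def profile_loglik(theta, ratios, ses, psi):
    """Each variant contributes the larger of its 'valid' and 'invalid' terms."""
    total = 0.0
    for r, s in zip(ratios, ses):
        valid = log_normal_pdf(r, theta, s)
        invalid = log_normal_pdf(r, 0.0, math.sqrt(psi ** 2 + s ** 2))
        total += max(valid, invalid)
    return total

def contamination_mixture(ratios, ses, psi, grid):
    """Grid search for the causal effect maximising the profile likelihood."""
    return max(grid, key=lambda t: profile_loglik(t, ratios, ses, psi))

# Hypothetical Wald ratios: four valid variants near 0.3, three invalid ones.
ratios = [0.29, 0.30, 0.31, 0.30, 0.80, -0.40, 1.20]
ses = [0.05] * 7
grid = [k / 1000 for k in range(-500, 1501)]
cm_estimate = contamination_mixture(ratios, ses, psi=0.5, grid=grid)
print(cm_estimate)
```

Confidence intervals can be read off the same profile likelihood by collecting every grid value whose log-likelihood lies within a fixed distance of the maximum, which is why the resulting intervals need not be symmetric.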

Multivariable MR (MVMR) is an extension of MR to allow for the simultaneous modelling of the effect of multiple exposures on an outcome [11]. The effects of each exposure in an MVMR model are the direct effects of the exposure on the outcome conditional on the other exposures. This has resulted in MVMR being applied as a method for mediation analyses [12], but it is also used to adjust for known biases in an MR model [13–15]. MVMR modifies the three instrumental variables assumptions so that the variant is a valid instrument if: 1) the variant is robustly associated with at least one exposure, 2) there are no variant-outcome confounders, 3) the variant can only cause the outcome via one or more of the exposures.

MVMR was originally introduced using a residual-based framework, in which the effect of a second exposure on the outcome was removed from the variant-outcome association, and the effect of the second exposure on the exposure of interest was removed from the variant-exposure association [14]. These modified associations were then used as the input to a traditional MR estimator. However, given the analogy between IVW and weighted regression, two-sample MVMR is typically implemented as a type of multiple regression, in which the variant-outcome associations for the variants which associate with either exposure of interest are regressed on the variant-exposure associations in an intercept-free linear regression, weighted by the inverse of the variance of the variant-outcome association. MR-Egger can also be implemented by allowing for a non-zero regression intercept, and weighted median can be implemented using weighted quantile regression [16].
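
For two exposures, the regression formulation of multivariable IVW reduces to solving a pair of weighted normal equations. The Python sketch below illustrates this under stated assumptions: the function name and data are hypothetical, and no standard errors are computed.

```python
def mv_ivw(beta_x1, beta_x2, beta_y, se_y):
    """Intercept-free weighted regression of beta_y on beta_x1 and beta_x2."""
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, beta_x1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_x1, beta_x2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, beta_x1, beta_y))
    s2y = sum(wi * b * y for wi, b, y in zip(w, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2  # zero if the two exposures are collinear
    theta1 = (s22 * s1y - s12 * s2y) / det
    theta2 = (s11 * s2y - s12 * s1y) / det
    return theta1, theta2

# Hypothetical associations generated with direct effects 0.3 and 0.4
# and no pleiotropy.
beta_x1 = [0.05, 0.02, 0.07, 0.04]
beta_x2 = [0.03, 0.06, 0.01, 0.05]
beta_y = [0.3 * a + 0.4 * b for a, b in zip(beta_x1, beta_x2)]
se_y = [0.005, 0.004, 0.006, 0.005]

theta1, theta2 = mv_ivw(beta_x1, beta_x2, beta_y, se_y)
print(theta1, theta2)
```

The determinant in the denominator shows why the variants must predict the exposures differently: if the two sets of variant-exposure associations are proportional, the direct effects cannot be separated.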

However, we are not aware of any existing estimators for mode-based regression, and hence MVMR estimators which make a plurality valid-type assumption like ZEMPA have been underexplored. The multivariable constrained maximum likelihood (MVMR-cML) method provides consistent estimates under a plurality-valid assumption by maximizing a constrained likelihood function subject to a maximum number of invalid instruments [17]. The MVMR-Horse method provides estimates under the same model as MVMR-cML in a Bayesian framework, using horseshoe priors for identification [18]. Finally, the Genome-wide mR Analysis under Pervasive PLEiotropy (GRAPPLE) method is a multivariable method that can provide robust estimates in the presence of invalid instruments using profile likelihood [19]. Here, we introduce and validate a further framework for implementing plurality valid estimators in two-sample MVMR.

Methods

Theoretical background

Notation and assumptions

We assume that a set of independently distributed genetic variants is being proposed as instruments in an MR analysis. We shall denote with subscript i the ith element of any vector, which relates to the ith genetic variant. Let β_y,i be the genetic variant-outcome association for the ith genetic variant and β_x,i be the genetic variant-exposure association for the ith variant. We represent the causal effect of the exposure on the outcome using the scalar θ. We also assume that the exposure-outcome relationship is linear and unaffected by effect modification. We let α_i represent pleiotropic effects of the ith variant on the outcome. Thus, when α_i = 0, the ith variant is a valid instrument.

Suppose the ith variant-exposure and variant-outcome associations are related according to the model proposed by Bowden et al. [20]:

β_{y,i} = θ β_{x,i} + α_{i}   (1)

Now suppose we have estimates for two exposures, denoted by x₁ and x₂. $β_{x_{1}, i}$ and $β_{x_{2}, i}$ are the ith variant’s associations with the first and second exposure, respectively. Likewise, θ₁ and θ₂ are the causal effects of the first and second exposure, respectively, on the outcome. We can now extend (1) as follows:

β_{y,i} = θ_{1} β_{x_{1},i} + θ_{2} β_{x_{2},i} + α_{i}^{'}   (2)

Where $α_{i}^{'}$ represents pleiotropic effects of the ith variant on the outcome which do not pass via x₁ or x₂.

Statistical framework

In practice, we do not observe β_y, $β_{x_{1}}$, or $β_{x_{2}}$. However, we may obtain estimates, for example from GWAS. We denote the vectors of association estimates by ${\hat{β}}_{y}$, ${\hat{β}}_{x_{1}}$, and ${\hat{β}}_{x_{2}}$. Thus, in traditional multivariable-IVW we can estimate θ₁ and θ₂ using the following linear model:

{\hat{β}}_{y} = θ_{1} {\hat{β}}_{x_{1}} + θ_{2} {\hat{β}}_{x_{2}} + ε_{1}, \quad ε_{1,i} \sim N(0, σ_{y,i}^{2})   (3)

Given the data structure in Eq (2), the model in Eq (3) will provide a consistent estimator when $α_{i}^{'} = 0$ for all i (i.e., all variants are valid instruments), or when $\sum_{i=1}^{n} α_{i}^{'} = 0$ and $α_{i}^{'}$ is independent of ${\hat{β}}_{x_{1}, i}$ and ${\hat{β}}_{x_{2}, i}$ for all i (i.e., pleiotropy is balanced and the InSIDE assumption is met). A plurality valid estimator, on the other hand, should be consistent provided that a plurality of the $α_{i}^{'}$ are zero, i.e. under the ZEMPA assumption.

Let ${\tilde{β}}_{y}$ be the residuals from regressing ${\hat{β}}_{y}$ on ${\hat{β}}_{x_{2}}$ (without an intercept), and let ${\tilde{β}}_{x_{1}}$ be the residuals from regressing ${\hat{β}}_{x_{1}}$ on ${\hat{β}}_{x_{2}}$ (without an intercept). We can now estimate θ₁ using the linear model:

{\tilde{β}}_{y} = θ_{1} {\tilde{β}}_{x_{1}} + ε_{2}, \quad ε_{2,i} \sim N(0, σ_{y}^{2})   (4)

Let $\tilde{α}$ be the residuals from regressing a vector of the pleiotropic effects on ${\hat{β}}_{x_{2}}$ (without an intercept). Because we have now reformulated the equation for the variant-outcome association so that it is in terms of a univariable regression model, ${\tilde{β}}_{y}$ and ${\tilde{β}}_{x_{1}}$ can be used as the inputs to a traditional univariable mode-based estimator. When more than one exposure is of interest, then this process can be iterated for each exposure. It follows that a plurality valid estimator for θ₁ using the residuals in this way will produce a valid estimate provided that a plurality of the ${\tilde{α}}_{i}$ values are zero. This seems likely to be the case if a plurality of the $α_{i}^{'}$ values are zero and the non-zero elements are distributed around zero (i.e., balanced pleiotropy).
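
The residual step can be sketched in Python as follows. The example assumes that the same inverse-variance weights are used in every regression, in which case the slope from regressing the outcome residuals on the exposure residuals reproduces the multivariable IVW estimate of θ₁ (the Frisch-Waugh-Lovell result). All function names and numbers are hypothetical.

```python
def wls_slope(x, y, w):
    """Intercept-free weighted least-squares slope of y on x."""
    return (sum(wi * a * b for wi, a, b in zip(w, x, y))
            / sum(wi * a * a for wi, a in zip(w, x)))

def residualise(target, beta_x2, w):
    """Residuals from an intercept-free weighted regression on beta_x2."""
    gamma = wls_slope(beta_x2, target, w)
    return [t - gamma * b for t, b in zip(target, beta_x2)]

def joint_theta1(bx1, bx2, by, w):
    """theta_1 from the two-exposure weighted normal equations."""
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    s2y = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# Hypothetical summary statistics for five variants.
beta_x1 = [0.05, 0.02, 0.07, 0.04, 0.06]
beta_x2 = [0.03, 0.06, 0.01, 0.05, 0.02]
beta_y = [0.028, 0.029, 0.026, 0.031, 0.027]
se_y = [0.004, 0.005, 0.006, 0.005, 0.004]
w = [1.0 / s ** 2 for s in se_y]

resid_y = residualise(beta_y, beta_x2, w)
resid_x1 = residualise(beta_x1, beta_x2, w)

theta1_resid = wls_slope(resid_x1, resid_y, w)
theta1_joint = joint_theta1(beta_x1, beta_x2, beta_y, w)

# Per-variant residual Wald ratios: the inputs to a univariable
# mode-based or contamination mixture estimator.
resid_ratios = [ry / rx for ry, rx in zip(resid_y, resid_x1)]
print(theta1_resid, theta1_joint)
```

The agreement between the two estimates is what allows the residuals to stand in for the original associations; the plurality valid estimators then replace the final weighted regression with a modal summary of the per-variant residual ratios.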

In settings with only two exposures, the residuals could be obtained through univariable MR of the outcome on the second exposure, and of the first exposure on the second exposure. Where there are more than two exposures, an existing multivariable MR method could be used instead to create residuals. This general framework could be implemented using a variety of estimators. Here we explore two types of plurality valid estimators. Firstly, we explore an estimator which uses a regression model to create the residuals fed into a traditional mode-based estimator (MBE) [8], which we dub ‘multivariable-MBE’. This regression model could be created using any of the existing MVMR-estimators. Here we model the residuals using IVW (i.e. intercept-free linear regression).

Although ultimately arbitrary, we focused on IVW, rather than another type of MR estimator, because it provides the most intuitive way to understand validity conditions: using IVW to create residuals means that pleiotropic effects in the residual creation step are passed forwards to the MR analysis. Hence, the estimator should produce valid estimates if a plurality of SNPs are valid instruments. On the other hand, if weighted median were used in the first step then this would require that at least 50% of these variants be valid. It is not obvious how the identification assumptions for the two steps would interact when defining which settings the estimator would be valid in. In addition, MBE are known to be much less precise than other estimators, and IVW is currently the most efficient multivariable estimator. Using other estimators to create residuals could exacerbate this issue.

Since the contamination mixture method has several advantages, discussed above, we also implemented this framework using the contamination mixture method. This ‘multivariable-CM’ estimator uses IVW to create residuals fed into a contamination mixture model.

Our estimators are therefore algorithmic rather than model-based, in the sense that we are not starting by precisely defining a statistical model and then deriving conclusions from the assumptions of the model. Instead, we use an algorithm (taking the mode of the distribution) to convert genetic data into MR estimates. The likely trade-off for the conceptual simplicity of this approach is that it will not optimise statistical efficiency.

Deriving a standard error for multivariable-MBE and multivariable-CM

Assuming we have strong instruments (i.e. the first MR assumption is valid) we can use the first order approximation for the standard error of the Wald ratio that is typically used in two-sample MR studies. In a traditional univariable model this is defined as:

SE_{wald,i} = SE_{y,i} / |β_{x,i}|

Where SE_y,i is the standard error of the ith variant-outcome association estimate.

In effect, this standard error is assuming that the variant-exposure association is measured with sufficient precision that we can assume that it contributes no error to the estimate of the causal effect. Under this assumption, the process of creating residuals will not increase the random error in the standard error of the Wald ratio. Hence, we model the standard error of the ith Wald ratio estimate as:

SE_{resid,i} = SE_{y,i} / |{\tilde{β}}_{x_{1},i}|
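
A short Python sketch of this standard error is given below, with hypothetical numbers. It also illustrates a practical consequence of the formula: when residualisation shrinks the magnitude of a variant-exposure association, the standard error of the corresponding ratio grows even though the variant-outcome standard error is unchanged.

```python
def wald_se(se_y, beta_x):
    """First-order standard error of each Wald ratio: se_y / |beta_x|."""
    return [s / abs(b) for s, b in zip(se_y, beta_x)]

# Hypothetical values: raw variant-exposure associations and their
# residuals after removing the part explained by a second exposure.
se_y = [0.005, 0.005]
beta_x1 = [0.05, 0.04]
resid_beta_x1 = [0.02, -0.01]

se_raw = wald_se(se_y, beta_x1)
se_resid = wald_se(se_y, resid_beta_x1)
print(se_raw, se_resid)
```

The absolute value matters because residuals can be negative even when the original associations were all oriented to be positive.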

Simulation study

We report our simulation study using the ADEMP (aims, data-generating mechanisms, estimands, methods, and performance measures) approach [21].

Aims

We ran a simple simulation study to assess the performance of our plurality valid estimators when compared to other MVMR estimators.

Data-generating mechanisms

We broadly simulate a setting in which there are two putative causal exposures for a single outcome. In the primary simulation we explore a setting in which the second exposure is pleiotropic (Fig 1), and where either both or neither of the exposures have a causal association with the outcome. We then explore how well the methods do under varying amounts of balanced and directional pleiotropy.

Fig 1. E and E2 are the first and second exposures respectively, GRS is the genetic liability to the exposures, O is the outcome, and C is a confounder.

More formally, we simulated 200 single nucleotide polymorphisms (SNPs, which are common genetic variants) as independent and identically distributed binomial variables with the following parameters:

SNP \sim B(1, 0.4) + B(1, 0.4)

We additionally simulated the SNP effects on the exposures as independent and identically distributed normal variables:

bSNP \sim N(0.05, 0.02^{2})

The beta values and allele frequencies here were chosen to be loosely based on the effect sizes for the genome-wide significant SNPs in the Wootton et al. UK Biobank GWAS of smoking [22].

For settings in which we simulated pleiotropy (Fig 1.2A and 1.2B), the pleiotropic SNP effects were simulated as:

pSNP \sim N(BETA, SE^{2})

Each simulation was repeated with BETA being set to either 0 or -0.03 to represent balanced and directional pleiotropy respectively. SE was always set to 0.1.

We then simulated a confounder as a normally distributed variable with the following parameters: C ~ N(0, 1²)

We then defined the first exposure as:

E_{1} = 0.3 * C + \sum_{1}^{200} [bSNP * SNP] + ε_{1}

where ε₁ is an error term such that ε₁ ~ N(0, 1²).

The second exposure was defined as:

E_{Pl} = 0.4 * C + \sum_{1}^{200} [bSNP * SNP] + ε_{3}

where ε₃ is an error term such that ε₃ ~ N(0, 1²).

When both exposures had null effects on the outcome (Fig 1.1B and 1.2B), the outcome was defined as:

O_{N;P} = C + \sum_{1}^{p} [pSNP * SNP] + ε_{4}

where ε₄ is an error term such that ε₄ ~ N(0, 1²). p could take the value of 0, 20, 40, or 80 to represent pleiotropic effects for 0%, 10%, 20% or 40% of SNPs.

When both exposures had non-null effects on the outcome (Fig 1.1A and 1.2A), the outcomes were defined as:

O_{E1,EPl;P} = C + 0.3 * E_{1} + 0.4 * E_{Pl} + \sum_{1}^{p} [pSNP * SNP] + ε_{4}

where p is defined as it was for O_{N;P}.

The phenotypic beta values were chosen arbitrarily. However, biases are often more visible with larger effect estimates. By choosing realistically large betas we hoped to clearly illustrate the possible strengths and limitations of the different methods. While the specific results of our simulation may not be applicable to any specific applied setting, more general trends should be.

GWAS summary statistics for each exposure variable were estimated from linear regression models. Each genetic association with each exposure, and the outcome, were estimated from a unique sample of 200,000 participants with no sample overlap with the other GWASs.
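
The data-generating mechanism above can be sketched in Python as follows. This is an illustrative reading rather than the simulation code itself: it assumes that the SNP effects on the two exposures are drawn separately, that the first p SNPs are the pleiotropic ones, and that one sample is generated at a time (the study drew a separate sample for each GWAS). The function name, the small default sample size, and the seed are hypothetical.

```python
import random

def simulate_sample(n, n_snps=200, n_pleio=20, pleio_mean=0.0,
                    causal=True, seed=1):
    """Simulate genotypes, two exposures and an outcome for n participants."""
    rng = random.Random(seed)
    # SNP effects on each exposure (assumed here to be drawn separately).
    b1 = [rng.gauss(0.05, 0.02) for _ in range(n_snps)]
    b2 = [rng.gauss(0.05, 0.02) for _ in range(n_snps)]
    # Direct (pleiotropic) effects on the outcome for the first n_pleio SNPs.
    p_eff = [rng.gauss(pleio_mean, 0.1) if j < n_pleio else 0.0
             for j in range(n_snps)]
    rows = []
    for _ in range(n):
        # Genotype: sum of two Bernoulli(0.4) alleles, coded 0, 1 or 2.
        snps = [(rng.random() < 0.4) + (rng.random() < 0.4)
                for _ in range(n_snps)]
        c = rng.gauss(0, 1)  # confounder
        e1 = 0.3 * c + sum(b * g for b, g in zip(b1, snps)) + rng.gauss(0, 1)
        e2 = 0.4 * c + sum(b * g for b, g in zip(b2, snps)) + rng.gauss(0, 1)
        outcome = c + sum(p * g for p, g in zip(p_eff, snps)) + rng.gauss(0, 1)
        if causal:
            outcome += 0.3 * e1 + 0.4 * e2
        rows.append((snps, e1, e2, outcome))
    return rows

# A deliberately small run; the study used 200 SNPs and 200,000 participants.
sample = simulate_sample(n=200, n_snps=20, n_pleio=2, pleio_mean=-0.03)
print(len(sample), sample[0][1:])
```

Setting `pleio_mean` to 0 or -0.03 corresponds to the balanced and directional pleiotropy scenarios, and `causal=False` gives the null settings. Summary statistics would then be obtained by regressing each trait on each SNP in turn.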

Estimands

The causal effects of each exposure on the outcome.

Methods

We compare five methods for estimating the causal effect of the exposure on the outcome: multivariable IVW (intercept-free multiple regression of the variant-outcome associations on the variant-exposure associations weighted by the inverse variance in the variant-outcome association), multivariable MR-Egger (multiple regression, with an intercept, of the variant-outcome associations on the variant-exposure associations weighted by the inverse variance in the variant-outcome association), multivariable Weighted Median (quantile regression of the variant-outcome associations on the variant-exposure associations weighted by the inverse variance in the variant-outcome association), multivariable-MBE (using IVW to create the residuals and an MBE to estimate the causal effect), and multivariable-CM (using IVW to create the residuals and the contamination mixture method to estimate the causal effect). IVW, MR-Egger, and weighted median were chosen because they appear to be some of the most widely used estimators which use different assumptions.

Performance measure

The primary performance measures were mean bias, 95% CI width, and the percentage of times that the confidence intervals include zero. When there is no causal effect, the latter represents one minus the type-1 error rate. When there is a causal effect, it measures the type-2 error rate. In additional analyses we also explore the standard deviation of the effect estimate (over all 1000 simulations), and coverage for the causal effect of the exposure on the outcome over the 1000 iterations. Bias was defined as the estimate minus the true causal effect. Thus, in the null settings, bias was the effect estimate. In the non-null settings, bias was the effect estimate of E₁ minus 0.3 and the estimate of E_Pl minus 0.4. Coverage was defined as the percentage of times that the 95% confidence interval included the causal effect (or zero). 95% CI width was operationalised as the difference between the upper 95% CI limit and the lower 95% CI limit.
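
These performance measures can be computed from the per-iteration estimates and confidence limits as in the Python sketch below. The function name and the four example iterations are hypothetical and serve only to show the arithmetic.

```python
import statistics

def performance(estimates, lowers, uppers, truth):
    """Summarise simulation output for one estimator and one exposure."""
    n = len(estimates)
    return {
        "mean_bias": sum(e - truth for e in estimates) / n,
        "mean_ci_width": sum(u - l for l, u in zip(lowers, uppers)) / n,
        "pct_ci_includes_zero":
            100.0 * sum(l <= 0.0 <= u for l, u in zip(lowers, uppers)) / n,
        "coverage_pct":
            100.0 * sum(l <= truth <= u for l, u in zip(lowers, uppers)) / n,
        "sd_of_estimates": statistics.stdev(estimates),
    }

# Four hypothetical iterations with a true causal effect of 0.3.
estimates = [0.28, 0.33, 0.30, 0.41]
lowers = [0.18, 0.23, 0.20, 0.31]
uppers = [0.38, 0.43, 0.40, 0.51]

summary = performance(estimates, lowers, uppers, truth=0.3)
print(summary)
```

In this toy input the fourth interval misses the true value, so coverage is 75%, while none of the intervals contains zero.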

Applied example

We re-analysed the applied example (on the effect of intelligence, education, and household income on Alzheimer’s disease) from Grant and Burgess’ (2021) paper on pleiotropy robust estimators for MVMR [23]. This had previously been studied by Davies et al. and Anderson et al. [24, 25]. Anderson et al., in particular, had shown that a multivariable model was important for accounting for the collinearity between intelligence and education. Grant and Burgess then added household income to explore how the models worked with an additional risk factor.

Here we re-analysed the data used by Grant and Burgess (2021). They used 213 genetic variants from Davies et al. as instruments. These instruments had been clumped to ensure independence from each other and all had F statistics greater than 10, although the mean conditional F statistics ranged between 1.5 and 2.5. They used the Hill et al. GWAS of intelligence (n = 199,242 male and female European ancestry individuals) [26], Okbay et al. GWAS of years of education (n = 293,723 male and female European ancestry individuals) [27], and the Neale Lab UK Biobank GWAS of household income (n = 337,199 male and female European ancestry individuals) as sources of exposure data [28]. Since household income is an ordinal categorical variable, the genetic variant associations represent the increase in log odds of being in a higher income category per extra effect allele. Grant and Burgess (2021) additionally used Lambert et al. as a source of Alzheimer’s data (n = 74,046 male and female European ancestry individuals) [29]. More information on the data sources can be found in the original publications.

We implemented our two novel estimators, as well as IVW, MR-Egger, and MR-Median. Since the genetic associations with education and intelligence were in the same direction, the MR-Egger estimates can be interpreted as being oriented in the direction of either of these exposures.

Results

Simulation

Table 1 presents the results for the primary performance measures (bias, 95% CI width, and the percentage of times the 95% CI includes zero) of the simulations from the settings in which both exposures cause the outcome, while Table 2 presents those from the settings in which neither exposure exerts a causal effect on the outcome. The mean conditional F statistic was around 197 for Exposure 1 and 186 for Exposure 2.

Table 1. Primary results for setting where both exposures cause the outcome, and exposure 2 is pleiotropic.

Measure	Estimator	Exposure	No bias	10% balanced pleiotropy	20% balanced pleiotropy	40% balanced pleiotropy	10% directional pleiotropy	20% directional pleiotropy	40% directional pleiotropy
Bias	IVW	Exposure 1	0.001	-0.003	0.003	0.001	-0.026	-0.048	-0.109
	IVW	Exposure 2	-0.006	-0.002	-0.008	-0.004	-0.034	-0.067	-0.121
	MR Egger	Exposure 1	0.06	0.056	0.061	0.059	0.053	0.059	0.045
	MR Egger	Exposure 2	-0.051	-0.047	-0.046	-0.048	-0.05	-0.056	-0.055
	Median	Exposure 1	0.006	0.007	0.006	0.009	0.003	0.001	-0.008
	Median	Exposure 2	-0.012	-0.012	-0.012	-0.014	-0.015	-0.02	-0.031
	multivariable-CM	Exposure 1	0.003	0.002	0.003	0.004	0.006	0.015	0.073
	multivariable-CM	Exposure 2	-0.005	-0.006	-0.007	-0.007	-0.005	0.001	0.054
	multivariable-MBE	Exposure 1	-0.001	0.001	-0.079	0.149	-0.055	0.154	0.253
	multivariable-MBE	Exposure 2	0.074	-0.049	0.179	0.598	-0.117	0.054	-0.113
95% CI width	IVW	Exposure 1	0.096	0.328	0.459	0.641	0.339	0.473	0.662
	IVW	Exposure 2	0.096	0.328	0.459	0.641	0.339	0.473	0.662
	MR Egger	Exposure 1	0.172	0.464	0.638	0.885	0.478	0.656	0.91
	MR Egger	Exposure 2	0.173	0.464	0.637	0.883	0.477	0.655	0.908
	Median	Exposure 1	0.14	0.147	0.156	0.18	0.147	0.155	0.178
	Median	Exposure 2	0.14	0.148	0.157	0.18	0.147	0.156	0.179
	multivariable-CM	Exposure 1	0.093	0.102	0.114	0.144	0.107	0.131	0.209
	multivariable-CM	Exposure 2	0.092	0.102	0.114	0.145	0.105	0.127	0.206
	multivariable-MBE	Exposure 1	1.09	1.752	2.38	4.061	2.516	2.732	3.961
	multivariable-MBE	Exposure 2	1.607	2.117	2.722	5.567	2.754	4.672	4.502
% of times the 95% CI includes 0	IVW	Exposure 1	0%	0%	0%	0%	0%	0%	0.5%
	IVW	Exposure 2	0%	0%	0%	0%	0%	0%	0.1%
	MR Egger	Exposure 1	0%	0%	0%	2.2%	0%	0%	2.6%
	MR Egger	Exposure 2	0%	0%	0%	0.9%	0%	0%	0.7%
	Median	Exposure 1	0%	0%	0%	0%	0%	0%	0%
	Median	Exposure 2	0%	0%	0%	0%	0%	0%	0%
	multivariable-CM	Exposure 1	0%	0%	0%	0%	0%	0%	0%
	multivariable-CM	Exposure 2	0%	0%	0%	0%	0%	0%	0%
	multivariable-MBE	Exposure 1	6.4%	13.3%	22%	31.9%	15%	24.6%	35.5%
	multivariable-MBE	Exposure 2	7.5%	14.5%	21.6%	33.8%	13.8%	20.5%	35%

Table 2. Primary results for setting where neither exposure causes the outcome, and exposure 2 is pleiotropic.

Measure	Estimator	Exposure	No bias	10% balanced pleiotropy	20% balanced pleiotropy	40% balanced pleiotropy	10% directional pleiotropy	20% directional pleiotropy	40% directional pleiotropy
Bias	IVW	Exposure 1	0	0.004	-0.002	0.003	-0.029	-0.056	-0.111
	IVW	Exposure 2	0	-0.004	0.002	-0.001	-0.03	-0.059	-0.116
	MR Egger	Exposure 1	0	0.002	0.001	0.006	-0.006	-0.015	-0.022
	MR Egger	Exposure 2	0	-0.005	0.003	0.002	0	-0.003	-0.007
	Median	Exposure 1	0	0	0	0.001	-0.001	-0.002	-0.007
	Median	Exposure 2	0	0	0	0	-0.001	-0.003	-0.007
	multivariable-CM	Exposure 1	0	0	0	-0.001	0.009	0.047	0.166
	multivariable-CM	Exposure 2	0	0	0	-0.001	0.009	0.046	0.167
	multivariable-MBE	Exposure 1	0.004	-0.062	-0.168	-0.276	0.097	-0.17	0.208
	multivariable-MBE	Exposure 2	0.008	0.02	0.12	-0.062	-0.117	0.189	0.903
95% CI width	IVW	Exposure 1	0.032	0.318	0.451	0.638	0.332	0.469	0.658
	IVW	Exposure 2	0.032	0.317	0.451	0.637	0.332	0.468	0.658
	MR Egger	Exposure 1	0.044	0.435	0.618	0.873	0.454	0.641	0.9
	MR Egger	Exposure 2	0.044	0.434	0.616	0.871	0.453	0.64	0.898
	Median	Exposure 1	0.046	0.051	0.057	0.071	0.052	0.057	0.073
	Median	Exposure 2	0.046	0.051	0.057	0.071	0.052	0.057	0.073
	multivariable-CM	Exposure 1	0.033	0.041	0.051	0.075	0.050	0.078	0.147
	multivariable-CM	Exposure 2	0.033	0.041	0.051	0.074	0.051	0.080	0.147
	multivariable-MBE	Exposure 1	0.324	1.146	1.743	2.888	1.326	2.083	3.375
	multivariable-MBE	Exposure 2	0.433	2.058	2.078	4.628	1.380	2.120	5.276
% of times the 95% CI includes 0	IVW	Exposure 1	95.7%	95.6%	94.7%	95.7%	93.5%	93.1%	90.2%
	IVW	Exposure 2	94.6%	96.2%	95.1%	94.5%	92.2%	92%	88.3%
	MR Egger	Exposure 1	94.8%	95.3%	94.5%	95.4%	94%	96.3%	95%
	MR Egger	Exposure 2	95.1%	94.9%	94.4%	94.7%	94.4%	95.8%	94.8%
	Median	Exposure 1	97.1%	98%	95.7%	91.9%	96.8%	96.5%	89.1%
	Median	Exposure 2	97.2%	96.3%	95.4%	91.1%	97.2%	95.9%	90.7%
	multivariable-CM	Exposure 1	95.4%	95.3%	92.1%	86.1%	89.7%	63.3%	23.5%
	multivariable-CM	Exposure 2	94.7%	95.1%	91.7%	85.4%	88.1%	65%	24.7%
	multivariable-MBE	Exposure 1	99.4%	99.2%	99.2%	99%	99.2%	98.7%	98.9%
	multivariable-MBE	Exposure 2	99.7%	99.5%	99%	98.9%	99.1%	99%	98%

Bias

In both Tables 1 and 2, all estimators performed well in the no-bias setting. The small amount of bias observed (0.1% - 0.5%) is explicable by weak instrument bias and the variability in the estimates (S1 and S2 Tables). When there was balanced pleiotropy, multivariable-MBE seemed to underperform the non-plurality valid estimators, while the multivariable-CM estimator appeared to do slightly better. Multivariable-CM was comparatively unbiased by even large amounts of balanced pleiotropy. However, moderate amounts of directional pleiotropy were sufficient to bias its estimates more than those of the Median estimator. For example, in the setting where both exposures are causal and there was 40% directional pleiotropy, the first and second exposure estimates were biased by -0.008 and -0.031 respectively for the Median estimator (Table 1), but by 0.073 and 0.054 for multivariable-CM. Multivariable-MBE was more biased than multivariable-CM in most settings. For example, using the same simulation as above, multivariable-MBE was biased by 0.253 and -0.113 in the estimates for exposure 1 and 2 respectively.

95% CI width

The multivariable-MBE had the widest 95% CIs of all the estimators. For example, in the no-bias simulation, its 95% CI widths were five to ten times larger than those of the other estimators. The non-plurality valid estimators generally had similarly wide 95% CIs. Multivariable-CM generally had tighter 95% CIs than the other estimators.

Coverage and power

Since it had wide 95% CIs, multivariable-MBE unsurprisingly had a low type-1 error rate (the 95% CI included the null at least 98% of the time in all settings where there was no association), but a high type-2 error rate (the 95% CI included the null up to 35% of the time in settings where there was a true association). Multivariable-CM conversely had a very low type-2 error rate (the 95% CI never included the null when there was a true association). Multivariable-CM had a type-1 error rate at the nominal level (5%) for the 0% and 10% balanced pleiotropy scenarios. In contrast, the Median estimator had type-1 error rates well below the nominal level in these scenarios. The type-1 error rates for multivariable-CM were above the nominal level from 20% balanced pleiotropy upwards, and for all levels of directional pleiotropy.

Additional outcomes

Standard deviation of the effect estimates across the 1000 simulations: The SDs of the effect estimates for the multivariable-CM estimator and the non-plurality valid estimators were similar in the no-bias setting and when there was balanced pleiotropy (S1 and S2 Tables). However, multivariable-MBE had a much larger SD, possibly because MBE produces less precise estimates than the contamination mixture method. In addition, all the plurality valid estimators had larger standard deviations when there was directional pleiotropy.

Coverage: Although all the estimators achieved 95% coverage when neither exposure was causal and there was no bias (S2 Table), surprisingly, except for Weighted Median and multivariable-MBE, most estimators did not achieve at least 95% coverage when both exposures were causal (S1 Table). This might be because Weighted Median and multivariable-MBE had the widest CIs (Tables 1 and 2) and all estimators were being affected by weak-instrument bias.

Applied example

As with Grant and Burgess (2021), the pleiotropy robust estimators provided consistent estimates of the effects of education, intelligence, and household income on Alzheimer’s disease (Table 3). All estimators concluded a null effect of education on Alzheimer’s, conditional on the other exposures. However, they all implied a negative effect of intelligence, although the 95% CIs for MR-Egger and multivariable-MBE included the null. All estimators estimated a log odds ratio for household income of around 0.3, but again with 95% CIs which included zero. As the original study concluded, “[t]he consistency of the findings give strength to the assertion that intelligence has a causally protective effect on Alzheimer’s disease, conditional on years of education and household income. However, there is no evidence of a direct effect of years of education or household income on Alzheimer’s disease.”

Table 3. Results of the applied example exploring the effect of education, intelligence, and household income on Alzheimer’s disease.

Method	Education (95% CI)	Intelligence (95% CI)	Household Income (95% CI)
IVW	-0.244 (-0.919 to 0.430)	-0.469 (-0.864 to -0.074)	0.416 (-0.250 to 1.082)
Egger	-0.035 (-0.761 to 0.691)	-0.073 (-0.724 to 0.578)	0.400 (-0.264 to 1.064)
Robust	-0.017 (-0.624 to 0.590)	-0.544 (-0.927 to -0.161)	0.263 (-0.404 to 0.931)
Median	-0.134 (-0.873 to 0.606)	-0.573 (-1.029 to -0.116)	0.368 (-0.378 to 1.114)
multivariable-CM	0.046 (-0.601 to 0.689)	-0.575 (-0.920 to -0.198)	0.303 (-0.288 to 0.893)
multivariable-MBE	0.648 (-1.048 to 2.344)	-0.733 (-1.684 to 0.219)	0.229 (-3.701 to 4.158)
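The text describes the entries of Table 3 as log odds ratios. As a small worked example, they can be exponentiated to give odds ratios, here for the multivariable-CM estimate for intelligence. The helper name is hypothetical, and the unit of the exposure is whatever the original analysis used.

```python
import math


def to_odds_ratio(log_or, lower, upper):
    """Exponentiate a log odds ratio and its confidence limits."""
    return math.exp(log_or), math.exp(lower), math.exp(upper)


# Multivariable-CM estimate for intelligence, copied from Table 3.
or_est, or_lower, or_upper = to_odds_ratio(-0.575, -0.920, -0.198)
print(f"OR {or_est:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
# prints "OR 0.56 (95% CI 0.40 to 0.82)"
```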


Discussion

Here we introduce two plurality valid estimators for multivariable Mendelian randomisation. Unlike most existing estimators, these use a residual framework rather than multivariable regression models to produce the final effect estimates. We then used simulations with varying amounts of directional and balanced pleiotropy, as well as a re-analysis of the effect of intelligence, years of education, and household income on Alzheimer’s disease, to compare the performance of our estimators with each other and with existing estimators for MVMR.
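The two-step logic of a residual framework can be illustrated with a short Python sketch. This is a simplified illustration and not the MVMRmode implementation: it assumes two exposures, assumes the residual is formed by subtracting the IVW-estimated contribution of the second exposure from each variant-outcome association, and replaces the MBE and contamination mixture steps with an unweighted kernel-density mode. All function names and the toy data are hypothetical.

```python
import math


def mv_ivw_two_exposures(bx1, bx2, by, se_y):
    """Intercept-free weighted least squares of by on (bx1, bx2), weights 1/se_y^2."""
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    s2y = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations in closed form.
    theta1 = (s22 * s1y - s12 * s2y) / det
    theta2 = (s11 * s2y - s12 * s1y) / det
    return theta1, theta2


def residual_ratios(bx1, bx2, by, theta2):
    """Assumed residual step: remove exposure 2's contribution, then form ratios."""
    return [(y - theta2 * b) / a for a, b, y in zip(bx1, bx2, by)]


def simple_mode(values, bandwidth, grid_step=0.001):
    """Unweighted Gaussian kernel-density mode found by grid search."""
    low, high = min(values), max(values)
    n_steps = int(round((high - low) / grid_step))
    best_x, best_density = low, -1.0
    for k in range(n_steps + 1):
        x = low + k * grid_step
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Hypothetical summary statistics: true effects 0.5 and 0.3, last two variants pleiotropic.
bx1 = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
bx2 = [0.05, -0.02, 0.07, 0.01, -0.04, 0.06]
pleiotropy = [0.0, 0.0, 0.0, 0.0, 0.05, -0.04]
by = [0.5 * a + 0.3 * b + p for a, b, p in zip(bx1, bx2, pleiotropy)]
se_y = [0.01] * len(by)

theta1_ivw, theta2_ivw = mv_ivw_two_exposures(bx1, bx2, by, se_y)
ratios = residual_ratios(bx1, bx2, by, theta2_ivw)
print("IVW estimates:", theta1_ivw, theta2_ivw)
print("Modal estimate for exposure 1:", simple_mode(ratios, bandwidth=0.05))
```

Because the first step uses IVW, any pleiotropy that biases the IVW estimate for the second exposure is carried into the residuals, so the second step has to be robust to it.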

As with previous analyses, our estimators implied that intelligence has a protective effect on Alzheimer’s disease, while years of education and household income do not. This has two important implications. Firstly, as the years of mandatory education increase, there should not be a corresponding increase in Alzheimer’s. Secondly, our results imply that public health interventions to boost intelligence, beyond additional years of education, may be useful in reducing the burden of Alzheimer’s, although further research would be needed to confirm this hypothesis.

Of the two plurality valid estimators considered here, multivariable-CM, which uses IVW to create the residuals fed into a contamination mixture model, performed the best overall. It generally performed at least as well as, if not better than, MR-Egger and IVW in terms of bias and precision in all settings. Indeed, when there was balanced pleiotropy, it was both more precise and less biased than IVW. However, in settings with moderate-to-high amounts of directional pleiotropy it was considerably more biased than Weighted Median. Indeed, the high precision of the CM estimates is probably detrimental in this setting, as it resulted in lower coverage than the other estimators. The divergence in performance between balanced and directional settings is probably, as discussed in the methods section, due to the multivariable-CM method assuming balanced pleiotropy. Hence, we would expect the estimator to perform better in situations where the distribution of Wald ratios with directional pleiotropy is similar to the assumed model with balanced pleiotropy, such as when the absolute amount of directional bias is small. The MR-Egger intercept and funnel plots have both been suggested as methods for exploring the presence of directional pleiotropy, and therefore may be useful additional analyses when employing the multivariable-CM estimator [30]. Thus, while we think it can help triangulate results between a univariable and multivariable setting by allowing the use of a plurality valid estimator in both analyses, or between multiple multivariable estimators, we cannot recommend using it alone unless there is a priori evidence that there should be no directional pleiotropy.
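As a reminder of what the MR-Egger intercept measures, the sketch below fits a weighted regression of variant-outcome on variant-exposure associations with an intercept, for a single exposure. An intercept away from zero suggests directional pleiotropy. This is a simplified illustration with hypothetical names and toy data: it omits the standard error and test of the intercept, and a real analysis would use an established MR package.

```python
def egger_regression(bx, by, se_y):
    """Weighted regression of by on bx with an intercept, weights 1/se_y^2."""
    # Orient every variant so that its exposure association is positive.
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_y]
    total = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / total
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / total
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical associations: after orientation, by = 0.02 + 0.5 * |bx|,
# i.e. a causal effect of 0.5 plus directional pleiotropy of 0.02.
bx = [0.10, -0.20, 0.30, 0.40]
by = [0.07, -0.12, 0.17, 0.22]
se_y = [0.02, 0.02, 0.03, 0.03]

intercept, slope = egger_regression(bx, by, se_y)
print(f"Egger intercept: {intercept:.3f}, slope: {slope:.3f}")
```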

Multivariable-MBE was sufficiently imprecise that it is likely to be uninformative in practice, and we would therefore suggest that, when needed, researchers use another robust multivariable method instead. The poorer performance of the multivariable-MBE estimator is probably due to the greater uncertainty in the estimates produced by the mode-based estimator [5]: in Tables 1 and 2, its bias remains meaningfully smaller than half of the 95% CI width, despite often being more than ten times greater than the bias of the other estimators.

Our simulations are not without limitations. Firstly, although pleiotropy can vary continuously between studies, we explored only discrete amounts of this bias. This could potentially mask non-linearities in the performance of pleiotropy robust estimators for MVMR in the presence of pleiotropy. In addition, all our simulations assume linearity and homogeneity (i.e. no effect modification or interaction) of the effects of the risk factors on the outcomes. A further limitation of this work is that we have only considered the scenario with two exposures in our simulation study. However, the framework we introduce in this paper does naturally extend to more than two exposures by using multivariable IVW in the first stage. Finally, although multivariable-CM and multivariable-MBE could be implemented using estimates other than IVW to create residuals, here we have implemented them explicitly using IVW because the interpretation of the validity assumption using the other estimators is unclear.

In summary, here we introduce a framework for implementing plurality valid estimators for multivariable Mendelian randomisation in the absence of modal regression. Of these, the multivariable-CM estimator, which uses IVW to create residuals then fed into a contamination mixture method, appeared to perform the best. Although it performed very well with large amounts of balanced pleiotropy, it underperformed estimators like Weighted median when there was directional pleiotropy. We hope these estimators (available from https://github.com/bar-woolf/MVMRmode/wiki) will further enable the future triangulation of univariable MR studies which have used plurality valid estimators with multivariable MR designs.

Supporting information

S1 Table. Results for additional outcomes when both exposures cause the outcome, and exposure 2 is pleiotropic.

(DOCX)

pone.0291183.s001.docx (21.6 KB, DOCX)

S2 Table. Results for additional outcomes when neither exposure causes the outcome, and exposure 2 is pleiotropic.

(DOCX)

pone.0291183.s002.docx (21.4 KB, DOCX)

Acknowledgments

This work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol - http://www.bris.ac.uk/acrc/.

Data Availability

All data produced in the present study are available from DOI 10.17605/OSF.IO/8DZKU.

Funding Statement

Benjamin Woolf is funded by an Economic and Social Research Council (ESRC) South West Doctoral Training Partnership (SWDTP) 1+3 PhD Studentship Award (ES/P000630/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. 2014;23: R89–R98. doi: 10.1093/hmg/ddu328
2. Davey Smith G, Holmes MV, Davies NM, Ebrahim S. Mendel’s laws, Mendelian randomization and causal inference in observational data: substantive and nomenclatural issues. Eur J Epidemiol. 2020;35: 99–111. doi: 10.1007/s10654-020-00622-7
3. Burgess S, Small DS, Thompson SG. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res. 2017;26: 2333–2355. doi: 10.1177/0962280215597579
4. Hartwig FP, Davies NM, Hemani G, Smith GD. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. International Journal of Epidemiology. 2016;45: 1717–1726. doi: 10.1093/ije/dyx028
5. Slob EAW, Burgess S. A comparison of robust Mendelian randomization methods using summary data. Genetic Epidemiology. 2020;44: 313–329. doi: 10.1002/gepi.22295
6. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology. 2015;44: 512–525. doi: 10.1093/ije/dyv080
7. Woolf B, Di Cara N, Moreno-Stokoe C, Skrivankova V, Drax K, Higgins JPT, et al. Investigating the transparency of reporting in two-sample summary data Mendelian randomization studies using the MR-Base platform. International Journal of Epidemiology. 2022; dyac074. doi: 10.1093/ije/dyac074
8. Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46: 1985–1998. doi: 10.1093/ije/dyx102
9. Bowden J, Hemani G, Davey Smith G. Invited Commentary: Detecting Individual and Global Horizontal Pleiotropy in Mendelian Randomization—A Job for the Humble Heterogeneity Statistic? Am J Epidemiol. 2018;187: 2681–2685. doi: 10.1093/aje/kwy185
10. Burgess S, Foley CN, Allara E, Staley JR, Howson JMM. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat Commun. 2020;11: 376. doi: 10.1038/s41467-019-14156-4
11. Sanderson E, Spiller W, Bowden J. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization. Statistics in Medicine. 2021;40: 5434–5452. doi: 10.1002/sim.9133
12. Carter AR, Sanderson E, Hammerton G, Richmond RC, Davey Smith G, Heron J, et al. Mendelian randomisation for mediation analysis: current methods and challenges for implementation. Eur J Epidemiol. 2021;36: 465–478. doi: 10.1007/s10654-021-00757-1
13. Schooling CM, Lopez PM, Yang Z, Zhao JV, Au Yeung SL, Huang JV. Use of Multivariable Mendelian Randomization to Address Biases Due to Competing Risk Before Recruitment. Frontiers in Genetics. 2021;11. Available: https://www.frontiersin.org/article/10.3389/fgene.2020.610852
14. Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181: 251–260. doi: 10.1093/aje/kwu283
15. Woolf B. mesrument error and MR. 2021. [cited 23 Apr 2022]. doi: 10.17605/OSF.IO/YXZWC
16. Rees JMB, Wood AM, Burgess S. Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Stat Med. 2017;36: 4705–4718. doi: 10.1002/sim.7492
17. Lin Z, Xue H, Pan W. Robust multivariable Mendelian randomization based on constrained maximum likelihood. The American Journal of Human Genetics. 2023;110: 592–605. doi: 10.1016/j.ajhg.2023.02.014
18. Grant AJ, Burgess S. A Bayesian approach to Mendelian randomization using summary statistics in the univariable and multivariable settings with correlated pleiotropy. bioRxiv; 2023. p. 2023.05.30.542988. doi: 10.1101/2023.05.30.542988
19. Wang J, Zhao Q, Bowden J, Hemani G, Smith GD, Small DS, et al. Causal inference for heritable phenotypic risk factors using heterogeneous genetic instruments. PLOS Genetics. 2021;17: e1009575. doi: 10.1371/journal.pgen.1009575
20. Bowden J, Del Greco M F, Minelli C, Davey Smith G, Sheehan N, Thompson J. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med. 2017;36: 1783–1802. doi: 10.1002/sim.7221
21. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Statistics in Medicine. 2019;38: 2074–2102. doi: 10.1002/sim.8086
22. Wootton RE, Richmond RC, Stuijfzand BG, Lawn RB, Sallis HM, Taylor GMJ, et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol Med. 2020;50: 2435–2443. doi: 10.1017/S0033291719002678
23. Grant AJ, Burgess S. Pleiotropy robust methods for multivariable Mendelian randomization. Stat Med. 2021;40: 5813–5830. doi: 10.1002/sim.9156
24. Anderson EL, Howe LD, Wade KH, Ben-Shlomo Y, Hill WD, Deary IJ, et al. Education, intelligence and Alzheimer’s disease: evidence from a multivariable two-sample Mendelian randomization study. International Journal of Epidemiology. 2020;49: 1163–1172. doi: 10.1093/ije/dyz280
25. Davies NM, Hill WD, Anderson EL, Sanderson E, Deary IJ, Davey Smith G. Multivariable two-sample Mendelian randomization estimates of the effects of intelligence and education on health. Teare MD, Franco E, Burgess S, editors. eLife. 2019;8: e43990. doi: 10.7554/eLife.43990
26. Hill WD, Marioni RE, Maghzian O, Ritchie SJ, Hagenaars SP, McIntosh AM, et al. A combined analysis of genetically correlated traits identifies 187 loci and a role for neurogenesis and myelination in intelligence. Mol Psychiatry. 2019;24: 169–181. doi: 10.1038/s41380-017-0001-5
27. Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533: 539–542. doi: 10.1038/nature17671
28. Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank. In: Neale lab [Internet]. [cited 18 Jul 2022]. Available: http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank
29. Lambert J-C, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet. 2013;45: 1452–1458. doi: 10.1038/ng.2802
30. Haycock PC, Burgess S, Wade KH, Bowden J, Relton C, Davey Smith G. Best (but oft-forgotten) practices: the design, analysis, and interpretation of Mendelian randomization studies. Am J Clin Nutr. 2016;103: 965–978. doi: 10.3945/ajcn.115.118216

PLoS One. doi: 10.1371/journal.pone.0291183.r001

Decision Letter 0

Suyan Tian

5 May 2023

PONE-D-23-08704

MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation

PLOS ONE

Dear Dr. Woolf,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please address the points raised by the reviewers one by one; in particular, provide more details about the simulations, and discuss the comparison results with other methods in detail. Please submit your revised manuscript by Jun 19 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Suyan Tian

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper describes the application of the Mode estimator in the multivariable MR setting. The design of the study includes a simulation study and calculation of effects and comparison with effects of a previously published study on Alzheimer’s disease. The conclusions of the paper are supported by the application of the two mode estimators (multivariable-MBE and multivariable-CM) in the two studies. These findings show that while multivariable-MBE is of no use, the multivariable-CM outperforms other multivariable MR pleiotropy-robust methods such as the Egger and the Weighted median when balanced pleiotropy is present, but not when directional pleiotropy is present. The paper is well-written and I have no major comments.

My minor comments are the following:

Abstract: the CM stands for contamination mixture, please define this abbreviation

There is a typo in the conclusions: “to provided” instead of “to provide”

In the introduction of the abstract, there could be a mention on the fact that the modal method was developed as a complementary method to address pleiotropy in MR.

Can the term casual association be defined? Do the authors just mean causal association?

Could the authors better explain, or rephrase the following sentence: “we focused on IVW, rather than another type of MR estimator, because it provides the most intuitive to understand validity conditions”

In the data generating mechanisms of the ADEMP, there is a typo (casual instead of causal association).

In the methods section of the ADEMP definition, the authors compare the multivariable-MBE and multivariable-CM to the IVW, Egger and weighted median, but they do not specify if the latter 3 methods are applied in a univariate or multivariable MR framework.

In the performance measure paragraph, there is another typo (casual instead of causal)

Results section:

Did the authors expect the significant differences in performance metrics between the multivariable-MBE and the multivariable-CM? Are these differences comparable to those seen when the two mode estimands are applied in a univariate MR setting?

In the discussion of the paper, the message is clear and offers a guidance to the reader in terms of which method is the best to use in a given setting (ie presence of balanced vs directional pleiotropy). Could the authors also remind the reader how to evaluate empirically the presence of balanced vs directional pleiotropy in order to make the best choice of method for the multivariable MR?

Reviewer #2: The paper introduced two multivariable modal estimators through the residual method for multivariable-MR, namely multivariable-MBE and multivariable-CM, and developed an R package for this method. It provided a relatively comprehensive motivation for its proposal as they focused on the mode-based regression and plurality valid estimator for MVMR. However, some questions and concerns are listed as follows.

1. The description of the proposed method and its performance could be improved. There are typos and ambiguous terms. For example:

i). model assumptions and notations might be trivial but still should be included in the ‘Methods’ section before introducing equation 1) for clearer explanation on the proposed operations on estimates beta, alpha, etc.

ii). In the ‘Simulation study’ section, capital P is used twice without being introduced. I think the first P is a typo, but the second P in `B(1, P)` in the formula of outcome with non-null effects might be P / 100?

2. More theoretical details are needed, or at least a more careful discussion of the development and performance of the proposed methods. It seems more like a framework than a rigorous method to me, and its performance is closely related to the approach used to create the residuals and the precision of estimates. It would be good to discuss a little why the IVW method is the focus, even though the choice rests more on the interpretation perspective.

3. For the simulation results,

i). it may be better if the discussion of the choice of magnitude of coefficients and noise level are included as it will be interesting to see how other terms can affect the performance of this framework as no theoretical details are provided.

ii). Measurement metrics should be described, e.g. how bias is calculated in the table?

iii). More discussion or exploration should be provided. Based on the simulation and real data results and authors’ conclusion, multivariable-MBE generally provides more biased estimates than other methods and much larger but why? Is it due to the way residuals are generated, the estimates, or the framework itself? This might also be related to (2), the theoretical details or a more comprehensive discussion of the method.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Despoina Manousaki

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 May 7;19(5):e0291183. doi: 10.1371/journal.pone.0291183.r002

Author response to Decision Letter 0

3 Jul 2023

Suyan Tian

Academic Editor

PLOS ONE

Dear Dr Tian,

We would like to thank you and the reviewers for taking the time to assess our article “MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation” and providing feedback. Please find below our point-by-point response to the comments. We hope we have now adequately addressed all issues.

Yours,

Benjamin Woolf

Reviewer #1:

“The paper is well-written and I have no major comments.”

Thank you very much!

My minor comments are the following:

“Abstract: the CM stands for contamination mixture, please define this abbreviation”

We have changed the sentence to read:

multivariable-MBE, which uses IVW to create residuals fed into a traditional plurality valid estimator, and a method which instead has the residuals fed into the contamination mixture method (CM), multivariable-CM.

“There is a typo in the conclusions: “to provided” instead of “to provide””

Thank you, we have made the suggested correction.

“In the introduction of the abstract, there could be a mention on the fact that the modal method was developed as a complementary method to address pleiotropy in MR.”

We have changed the second sentence of the abstract to read:

Mode-based estimators (MBE) are one of the most popular types of estimators used in univariable-MR studies and is often used as a sensitivity analysis for pleiotropy.

“Can the term casual association be defined? Do the authors just mean causal association?”

“In the data generating mechanisms of the ADEMP, there is a typo (casual instead of causal association).”

“In the performance measure paragraph, there is another typo (casual instead of causal)”

Apologies for this typo, we’ve changed all references of ‘casual’ in the text to ‘causal’

“Could the authors better explain, or rephrase the following sentence: “we focused on IVW, rather than another type of MR estimator, because it provides the most intuitive to understand validity conditions””

We have expanded the sentence to read:

Although ultimately arbitrary, we focused on IVW, rather than another type of MR estimator, because it provides the most intuitive to understand validity conditions: using IVW to create residuals means that pleiotropic effects in the residual creation step are passed forwards to the MR analysis. Hence, the method should produce valid estimates if a plurality of SNP effects are valid instruments. On the other hand, if weighted median was used in the first step then this would require that at least 50% of these variants would be valid. It is not obvious how the identification assumptions for the two steps would interact when defining which settings the method would be valid in. In addition, MBE are known to be much less precise than other estimators, and IVW is currently the most efficient multivariable estimator. Using other estimators to create residuals could exacerbate this issue.
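The difference between a plurality and a majority validity assumption can be seen in a small numerical illustration. In the Python sketch below, only 40% of the variant-specific estimates cluster around the true effect, but they form the largest single group. The data are hypothetical and the estimators are unweighted simplifications, not the MBE or weighted median as implemented in MR software.

```python
import math
import statistics


def simple_mode(values, bandwidth, grid_step=0.001):
    """Unweighted Gaussian kernel-density mode found by grid search."""
    low, high = min(values), max(values)
    n_steps = int(round((high - low) / grid_step))
    best_x, best_density = low, -1.0
    for k in range(n_steps + 1):
        x = low + k * grid_step
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Hypothetical variant-specific estimates; the true effect is 0.5.
valid = [0.48, 0.50, 0.51, 0.52]    # 40% valid: a plurality, not a majority
invalid_a = [0.95, 1.00, 1.05]      # 30% share one pleiotropic bias
invalid_b = [1.45, 1.50, 1.55]      # 30% share a different pleiotropic bias
ratios = valid + invalid_a + invalid_b

median_estimate = statistics.median(ratios)
mode_estimate = simple_mode(ratios, bandwidth=0.05)
print(f"median: {median_estimate:.3f}, mode: {mode_estimate:.3f}")
```

Here the median is pulled to the first invalid cluster because fewer than half of the estimates are valid, whereas the mode stays with the largest cluster.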

“In the methods section of the ADEMP definition, the authors compare the multivariable-MBE and multivariable-CM to the IVW, Egger and weighted median, but they do not specify if the latter 3 methods are applied in a univariate or multivariable MR framework.”

Sorry – these were all the multivariable implementation of the relevant methods. We have updated the text to read:

We compare five methods for estimating the causal effect of the exposure on the outcome: multivariable IVW (intercept free multiple regression of the variant-outcome associations on the variant-exposure associations weighted by the inverse variance in the variant-outcome association), multivariable MR-Egger (multiple regression of the variant-outcome associations on the variant-exposure associations weighted by the inverse variance in the variant-outcome association), multivariable Weighted Median (quantile regression of the variant-outcome associations on the variant-exposure associations weighted by the inverse variance in the variant-outcome association), multivariable-MBE (using IVW to create the residuals and an MBE to estimate the causal effect), and multivariable-CM (using IVW to create the residuals and the contamination mixture method to estimate the causal effect). IVW, MR-Egger, and weighted median were chosen because they appear to be some of the most widely used estimators which use different assumptions.
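As an illustration of the first of these methods, the following is a minimal sketch of multivariable IVW for the two-exposure case, written as an intercept-free weighted least squares fit solved through the 2x2 normal equations. The function name `mv_ivw`, its arguments, and the toy numbers are hypothetical and are not taken from the MVMRmode package or the manuscript's simulation code.

```python
def mv_ivw(beta_y, beta_x1, beta_x2, se_y):
    """Intercept-free weighted regression of beta_y on (beta_x1, beta_x2).

    Weights are the inverse variances of the variant-outcome associations.
    Returns the point estimates (theta1, theta2) only; no standard errors.
    """
    w = [1.0 / s ** 2 for s in se_y]
    # Weighted cross-products that make up the normal equations.
    s11 = sum(wi * a * a for wi, a in zip(w, beta_x1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_x1, beta_x2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, beta_x1, beta_y))
    s2y = sum(wi * b * y for wi, b, y in zip(w, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure associations are perfectly collinear")
    theta1 = (s22 * s1y - s12 * s2y) / det
    theta2 = (s11 * s2y - s12 * s1y) / det
    return theta1, theta2


# Toy summary statistics (illustrative values only).
bx1 = [0.10, 0.20, 0.05, 0.15, 0.12]
bx2 = [0.08, 0.03, 0.11, 0.07, 0.14]
se = [0.010, 0.012, 0.011, 0.015, 0.020]
by = [0.3 * a + 0.4 * b for a, b in zip(bx1, bx2)]  # no pleiotropy, no noise
print(mv_ivw(by, bx1, bx2, se))
```

With noise-free, pleiotropy-free inputs such as these, the fit should recover the coefficients used to build `by`; with real summary statistics the estimates would of course carry sampling error.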

“Did the authors expect the significant differences in performance metrics between the multivariable-MBE and the multivariable-CM? Are these differences comparable to those seen when the two mode estimand are applied in a univariate MR setting?”

We did not go into the study with a strong expectation of how the different methods would perform. As we note in the introduction, CM is more precise than most MBE estimators, so we did expect that CM would produce more precise estimates.

“Could the authors also remind the reader how to evaluate empirically the presence of balanced vs directional pleiotropy in order to make the best choice of method for the multivariable MR?”

We have added the following to the discussion:

The MR-Egger intercept and funnel plots have both been suggested as methods for exploring the presence of directional pleiotropy, and therefore may be useful additional analyses when employing the multivariable-CM estimator (27).

Reviewer #2:

“1. The description of the proposed method and its performance could be improved. There are typos and ambiguous terms. For example:

We have added a "Notation and assumptions" section at the start of the ‘Theoretical background’ section, which introduces the notation used in the rest of this section. We then kept the remaining material under a ‘Statistical framework’ sub-section. The updated text can be found at the end of this letter.

“ii). In ‘Simulation study’ section, capital P is used twice without introducing. I think the first P is a type but the second P in `B(1, P)` in the formula of outcome with non-null effects might be P / 100?”

Thank you for picking up on this. Yes, they are the same P and the second reference should have been P/100. We have added the following after the definition of O_(E1,EPl;P):

Where P takes the same definition as it had for O_(N;P).

We hope this is clearer.

“2. More theoretical details are needed or more careful discussion at least for the development and performance of the proposed methods. It seems more like a framework than a rigorous method to me and its performance closely related to the approach used to create the residuals and the precision of estimates.”

Some statistical methods are model-based, whereas others are more algorithmic in nature. For example, clustering methods include Gaussian mixture modelling, which fits a formal statistical model for the datapoints, and nearest neighbour clustering, which is more algorithmic in nature. The latter approach is still a statistical method, in that it takes data inputs and provides cluster membership as outputs. In our case, we appreciate that this is not a model-based method, but it still takes data inputs and provides Mendelian randomization estimates as outputs. We note that algorithmic methods are more common in robust statistics, as the analyst does not always want to define the precise model followed by the data, but rather provide an estimate that performs reasonably for a variety of models.

We have improved our exposition of the mode-based MVMR method. We have also clarified that this is an algorithmic method, rather than a model-based method, at the end of the ‘statistical framework’ subsection:

Finally, to avoid any contention, we have replaced most references to ‘method’ with ‘estimator’ or ‘framework’ where appropriate.

“It’s good to discuss a little about why IVW method is the focus one though it’s more based on the interpretation perspective.”

We have expanded the discussion in the methods so that it now reads:

“3. For the simulation results,

We have added the following two sentences to the simulation methods section which we hope explains the logic behind the parameter choices:

The beta values and allele frequencies here were chosen to be loosely based on the effect sizes for the genome wide significant SNPs in the Wootton et al. UK Biobank GWAS of smoking (64).

Finally, when running small (e.g. 100 iteration) versions of our simulation, we found that the results are reasonably robust to parameter choice, and certainly the trends observed here generalised to other parameterisations. However, we decided not to report this for two reasons: A) the computational time (about a week using an array job) needed to run the full simulation means that it is not feasible in practice to run it over many settings, and B) since we already have three pages of tables with only one setting, it is unclear how to present the results in a way which is interpretable rather than overwhelming.

“ii). Measurement metrics should be described, e.g. how bias is calculated in the table?”

We have added the following at the end of the performance measures subsection:

Bias was defined as the estimate minus the true causal effect. Thus, in the null settings, bias was the effect estimate. In the non-null settings, bias was the effect estimate of E1 minus 0.3 and the estimate of EPl minus 0.4. Coverage was defined as the percentage of times that the 95% confidence interval included the causal effect (or zero). 95% CI width was operationalised as the difference between the upper 95% CI limit and the lower 95% CI limit.
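These definitions can be sketched in a few lines of code. The example below assumes that the per-iteration quantities are averaged across simulation iterations to give the reported summary, which is an assumption on our part rather than something stated above; the function name `performance` and the toy inputs are hypothetical.

```python
def performance(estimates, lower, upper, true_effect):
    """Summarise simulation output for one estimator and one setting.

    estimates, lower, upper: per-iteration point estimates and 95% CI limits.
    true_effect: the causal effect used to generate the data (0 under the null).
    Averaging across iterations is an assumption made for this sketch.
    """
    n = len(estimates)
    # Bias: estimate minus the true causal effect, averaged over iterations.
    bias = sum(e - true_effect for e in estimates) / n
    # Coverage: percentage of intervals that contain the true effect.
    covered = sum(1 for lo, hi in zip(lower, upper) if lo <= true_effect <= hi)
    coverage = 100.0 * covered / n
    # Width: upper limit minus lower limit, averaged over iterations.
    width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return bias, coverage, width


# Four toy iterations for an exposure with a true effect of 0.3.
est = [0.25, 0.35, 0.40, 0.30]
lo95 = [0.15, 0.25, 0.32, 0.20]
hi95 = [0.35, 0.45, 0.48, 0.40]
print(performance(est, lo95, hi95, true_effect=0.3))
```

For the null settings the same function would be called with `true_effect=0`, so that bias reduces to the mean estimate and coverage to the percentage of intervals that include zero.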

We hope this is an adequate definition of the performance measures.

“iii). More discussion or exploration should be provided. Based on the simulation and real data results and authors’ conclusion, multivariable-MBE generally provides more biased estimates than other methods and much larger but why? Is it due to the way residuals generated or estimates or the framework itself? This might also relate to (2) the theoretical details or more comprehensive discussion of the method.”

We suspect that it is due to the way that the residuals are created. Since MBEs are much less precise than other estimators, we suspect that part of the error might be accounted for by the much greater uncertainty in the estimates produced by the MBE used here. For example, even though the bias for MV-MBE is tens or hundreds of times larger than that for the other estimators in Tables 1 and 2, it remains meaningfully smaller than half of the 95% CI width. We have added the following to the discussion:

The poorer performance of the MV-MBE estimator is probably due to the greater uncertainty in the estimates produced by the mode-based estimator: in Tables 1 and 2, the bias remains meaningfully smaller than half of the 95% CI width, despite often being more than ten times greater than the bias for the other estimators.

##########################################################################

Notation and assumptions

We assume that a set of independently distributed genetic variants is being proposed as instruments in an MR analysis. We shall denote with subscript $i$ the $i$th element of any vector, which relates to the $i$th genetic variant. Let $\beta_{y,i}$ be the genetic variant-outcome association for the $i$th genetic variant and $\beta_{x,i}$ be the genetic variant-exposure association for the $i$th variant. We represent the causal effect of the exposure on the outcome using the scalar $\theta$. We also assume that the exposure-outcome relationship is linear and unaffected by effect modification. We let $\alpha_i$ represent pleiotropic effects of the $i$th variant on the outcome. Thus, when $\alpha_i = 0$, the $i$th variant is a valid instrument.

Suppose the $i$th variant-exposure and variant-outcome associations are related according to the model proposed by Bowden et al. (16):

$$\beta_{y,i} = \theta \beta_{x,i} + \alpha_i \tag{1}$$

Now suppose we have estimates for two exposures, denoted by $x_1$ and $x_2$. $\beta_{x_1,i}$ and $\beta_{x_2,,i}$ are the $i$th variant’s associations with the first and second exposure, respectively. Likewise, $\theta_1$ and $\theta_2$ are the causal effects of the first and second exposure, respectively, on the outcome. We can now extend (1) as follows:

$$\beta_{y,i} = \theta_1 \beta_{x_1,i} + \theta_2 \beta_{x_2,i} + \alpha_i' \tag{2}$$

Where $\alpha_i'$ represents pleiotropic effects of the $i$th variant on the outcome which do not pass via $x_1$ or $x_2$.

Statistical framework

In practice, we do not observe $\beta_y$, $\beta_{x_1}$, or $\beta_{x_2}$. However, we may obtain estimates, for example from GWAS. We denote the vectors of association estimates by $\hat{\beta}_y$, $\hat{\beta}_{x_1}$, and $\hat{\beta}_{x_2}$. Thus, in traditional multivariable-IVW we can estimate $\theta_1$ and $\theta_2$ using the following linear model:

$$\hat{\beta}_y = \theta_1 \hat{\beta}_{x_1} + \theta_2 \hat{\beta}_{x_2} + \varepsilon_1, \quad \varepsilon_{1,i} \sim N(0, \sigma_{y,i}^2) \tag{3}$$

Given the data structure in equation (2), (3) will provide a consistent estimator when $\alpha_i' = 0$ for all $i$ (i.e., all variants are valid instruments), or when $\sum_{i=1}^{n} \alpha_i' = 0$ and $\alpha_i'$ is independent of $\hat{\beta}_{x_1,i}$ and $\hat{\beta}_{x_2,i}$ for all $i$ (i.e., pleiotropy is balanced and the InSIDE assumption is met). A plurality valid estimator, on the other hand, should be consistent provided that a plurality of the $\alpha_i'$ are zero, i.e. under the ZEMPA assumption.

Let $\tilde{\beta}_y$ be the residuals from regressing $\hat{\beta}_y$ on $\hat{\beta}_{x_2}$ (without an intercept), and let $\tilde{\beta}_{x_1}$ be the residuals from regressing $\hat{\beta}_{x_1}$ on $\hat{\beta}_{x_2}$ (without an intercept). We can now estimate $\theta_1$ using the linear model:

$$\tilde{\beta}_y = \theta_1 \tilde{\beta}_{x_1} + \varepsilon_2, \quad \varepsilon_{2,i} \sim N(0, \sigma_y^2) \tag{4}$$

Let $\tilde{\alpha}$ be the residuals from regressing a vector of the pleiotropic effects on $\hat{\beta}_{x_2}$ (without an intercept). Because we have now reformulated the equation for the variant-outcome association so that it is in terms of a univariable regression model, $\tilde{\beta}_y$ and $\tilde{\beta}_{x_1}$ can be used as the inputs to a traditional univariable mode-based estimator. When more than one exposure is of interest, then this process can be iterated for each exposure. It follows that a plurality valid estimator for $\theta_1$ using the residuals in this way will produce a valid estimate provided that a plurality of the $\tilde{\alpha}_i$ values are zero. This seems likely to be the case if a plurality of the $\alpha_i'$ values are zero and the non-zero elements are distributed around zero (i.e., balanced pleiotropy).
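The reformulation can be made explicit with a short worked derivation for the two-exposure case. This is only a sketch: it ignores the inverse-variance weights and estimation error for simplicity, and the residual-maker matrix $M_2$ is notation introduced here for illustration rather than taken from the manuscript.

```latex
\begin{align*}
% M_2 removes the component of any vector that lies along \beta_{x_2}
M_2 &= I - \beta_{x_2}\left(\beta_{x_2}^{\top}\beta_{x_2}\right)^{-1}\beta_{x_2}^{\top},
  \qquad M_2\,\beta_{x_2} = 0 \\
% Apply M_2 to both sides of equation (2), written in vector form
M_2\,\beta_y &= \theta_1 M_2\,\beta_{x_1} + \theta_2 M_2\,\beta_{x_2} + M_2\,\alpha' \\
% The second exposure drops out, leaving a univariable model in the residuals
\tilde{\beta}_y &= \theta_1 \tilde{\beta}_{x_1} + \tilde{\alpha}
\end{align*}
```

Under these simplifications each $\tilde{\alpha}_i$ equals $\alpha_i'$ minus a multiple of $\beta_{x_2,i}$, where the multiple is the fitted slope from regressing $\alpha'$ on $\beta_{x_2}$. A variant with $\alpha_i' = 0$ therefore keeps $\tilde{\alpha}_i = 0$ only when that slope is zero, which is one way to see why the text appeals to balanced pleiotropy.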

In settings with only two exposures, the residuals could be obtained through univariable MR of the outcome on the second exposure, and of the first exposure on the second exposure. Where there are more than two exposures, an existing multivariable MR method could be used instead to create residuals. This general framework could be implemented using a variety of estimators. Here we explore two types of plurality valid estimators. Firstly, we explore an estimator which uses a regression model to create the residuals fed into a traditional mode-based estimator (MBE) (8), which we dub ‘multivariable-MBE’. This regression model could be created using any of the existing MVMR-estimators. Here we model the residuals using IVW (i.e. intercept-free linear regression).

Although ultimately arbitrary, we focused on IVW, rather than another type of MR estimator, because it provides the most intuitive to understand validity conditions: using IVW to create residuals means that pleiotropic effects in the residual creation step are passed forwards to the MR analysis. Hence, the estimator should produce valid estimates if a plurality of SNP effects are valid instruments. On the other hand, if weighted median was used in the first step then this would require that at least 50% of these variants would be valid. It is not obvious how the identification assumptions for the two steps would interact when defining which settings the estimator would be valid in. In addition, MBE are known to be much less precise than other estimators, and IVW is currently the most efficient multivariable estimator. Using other estimators to create residuals could exacerbate this issue.
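The residual-creation step described above can be sketched as follows for two exposures. The code is a minimal illustration under stated assumptions: the function names are hypothetical, it is not the MVMRmode implementation, and the final step fits IVW to the residuals purely to show that the residuals behave as expected. In the proposed estimators, the residuals would instead be passed to a mode-based estimator or to the contamination mixture method.

```python
def no_intercept_slope(y, x, w):
    """Weighted least squares slope of y on x with no intercept."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def residualise(y, x, w):
    """Residuals from the intercept-free weighted regression of y on x."""
    slope = no_intercept_slope(y, x, w)
    return [yi - slope * xi for yi, xi in zip(y, x)]


def residual_step(beta_y, beta_x1, beta_x2, se_y):
    """Create residuals for exposure 1, adjusting for exposure 2.

    Returns (theta1_ivw, resid_y, resid_x1). theta1_ivw is an IVW fit to the
    residuals, included only as a check; a plurality valid estimator would
    take resid_y and resid_x1 as its inputs instead.
    """
    w = [1.0 / s ** 2 for s in se_y]
    resid_y = residualise(beta_y, beta_x2, w)
    resid_x1 = residualise(beta_x1, beta_x2, w)
    theta1_ivw = no_intercept_slope(resid_y, resid_x1, w)
    return theta1_ivw, resid_y, resid_x1


# Toy summary statistics (illustrative values only).
bx1 = [0.10, 0.20, 0.05, 0.15, 0.12]
bx2 = [0.08, 0.03, 0.11, 0.07, 0.14]
se = [0.010, 0.012, 0.011, 0.015, 0.020]
by = [0.3 * a + 0.4 * b for a, b in zip(bx1, bx2)]  # no pleiotropy, no noise
theta1, ry, rx1 = residual_step(by, bx1, bx2, se)
print(theta1)
```

When the same weights are used in both steps, the IVW fit to the residuals should coincide with the coefficient for the first exposure from a joint multivariable IVW fit, which is the standard partitioned-regression result that the residual framework relies on.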

PLoS One. doi: 10.1371/journal.pone.0291183.r003

Decision Letter 1

Suyan Tian

17 Jul 2023

PONE-D-23-08704R1

MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation

PLOS ONE

Dear Dr. Woolf,

Please submit your revised manuscript by Aug 31 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Suyan Tian

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: I thank the authors for addressing my previous comments. I have no further suggestions, other than a small typo (intuitive (adjective) should be replaced by "intuition" or "intuitive way")

Reviewer #2: Thank you for taking your time to carefully address all the questions! All in all, this paper introduced two algorithmic methods for multivariable-MR, namely the multivariable-MBE and multivariable-CM. They adopted Monte-Carlo simulations for the performance of the proposed estimators and adopted them to study the causal effect of intelligence, education and household income on Alzheimer’s disease for real data analysis, along with the existing methods. I think in this revised version, a much clearer explanation and description have been made for your proposed methods.

1. There are still some typos and unclear parts you might want to correct, for example:

(i) the 2,, in the sentence “Now suppose we have estimates for two exposures, denoted by 1 and 2. 1, and 2,,” right after equation (1).

(ii) In simulation part, it’s mentioned that 200 SNPs are generated and generation of E1, E_PI includes the first 50 and first 100 SNPs, respectively. Do you use all the 200 SNPs or only the first 100 ones? Also there is a small p in the definition of O_N;P, I think you mean p = P / 100 * 200, is that correct?

(iii) In the definition of O_E1, EPI;P, I think the 100 should be P or p?

(iv) ‘causals’ should be causal in the discussion of coverage as the additional outcomes in the Result section for simulation.

2. Will you conclude that multivariable-MBE is not that useful in practice? Or provide some suggestions for under what circumstance this method might work? Maybe an inclusion of the MV-MBE performance under different levels of pleiotropy can be used to better explain why it fail in this framework, or include the sd for the estimator to support the argument “probably due to the greater uncertainty in the estimates”.

3. Just for comprehensive analysis, a more detailed discussion for more than two exposures will be ideal I think, will complicate the analysis for sure though. Because I’m kind of curious how different methods in this framework affect the final results and this kind of analysis, e.g. some patterns shown in the simulation, might provide some insights for your framework as well.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Despoina Manousaki

Reviewer #2: No

**********

PLoS One. 2024 May 7;19(5):e0291183. doi: 10.1371/journal.pone.0291183.r004

Author response to Decision Letter 1

16 Aug 2023

Suyan Tian

Academic Editor

PLOS ONE

Dear Dr Tian,

We would like to thank you and the reviewer for again taking the time to assess our article “MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation”, and for providing such generous feedback. I apologise for most of these errors being typos; I’m badly dyslexic and struggle to pick up this type of mistake. Please find below our point-by-point response to the comments, where we have implemented all of the suggested changes.

Yours,

Benjamin Woolf

Journal Requirements:

“Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.”

To the best of our knowledge we have not cited any retracted papers. In the process of revising the manuscript, it has come to our attention that other plurality-valid estimators for multivariable Mendelian randomization (MVMR) have been published (either as a journal article or as a pre-print). We have amended the Introduction and Discussion to clarify that this is no longer the first plurality-valid MVMR method, and added reference to these methods.

Reviewers' comments

Reviewer #1:

“a small typo (intuitive (adjective) should be replaced by "intuition" or "intuitive way")”

Thank you for picking this up; we have changed “intuitive” to “intuitive way”.

Reviewer #2:

“There are still some typos and unclear parts you might want to correct, for example: (i) the 2,, in the sentence “Now suppose we have estimates for two exposures, denoted by 1 and 2. 1, and 2,,” right after equation (1).”

Thank you for picking this up; I have removed the second comma from β_(x_2,i).

“(ii) In simulation part, it’s mentioned that 200 SNPs are generated and generation of E1, E_PI includes the first 50 and first 100 SNPs, respectively. Do you use all the 200 SNPs or only the first 100 ones? Also there is a small p in the definition of O_N;P, I think you mean p = P / 100 * 200, is that correct?”

Thank you very much for picking this up. After double checking with the code, it should be 200 SNPs for both. I’m sorry for not having picked up this version control issue before. I have changed the ‘P’ to a ‘p’.

“(iii) In the definition of O_E1, EPI;P, I think the 100 should be P or p?”

Thank you again for picking this up; it should be ‘p’. I have changed the ‘100’ to ‘p’.

“(iv) ‘causals’ should be causal in the discussion of coverage as the additional outcomes in the Result section for simulation.”

Thank you for picking these up, we have made the suggested changes.

“Will you conclude that multivariable-MBE is not that useful in practice? Or provide some suggestions for under what circumstance this method might work? Maybe an inclusion of the MV-MBE performance under different levels of pleiotropy can be used to better explain why it fail in this framework, or include the sd for the estimator to support the argument “probably due to the greater uncertainty in the estimates”.

We agree with the reviewer, and have clarified this in the text:

The claim “probably due to the greater uncertainty in the estimates” relates to the univariable mode-based method, which has been shown to have variable estimates in previous published comparisons of univariable Mendelian randomization methods. We now provide a citation for this claim.

“Just for comprehensive analysis, a more detailed discussion for more than two exposures will be ideal I think, will complicate the analysis for sure though. Because I’m kind of curious how different methods in this framework affect the final results and this kind of analysis, e.g. some patterns shown in the simulation, might provide some insights for your framework as well.”

We agree with the reviewer that this is a limitation of the current manuscript. However, due to the limited performance of the proposed methods in the current simulation scenarios, and the recent publication of other plurality-robust methods, we did not think it would be interesting to readers to provide a comparison of these methods in a more complex scenario.

In the text we have added the following:

“A further limitation of this work is that we have only considered the scenario with two exposures in our simulation study. However, the framework we introduce in this paper does naturally extend to consider more than two exposures by using multivariable IVW in the first stage.”

Attachment

Submitted filename: r2 response letter v2.docx

pone.0291183.s003.docx (20.4KB, docx)

PLoS One. doi: 10.1371/journal.pone.0291183.r005

Decision Letter 2

Suyan Tian

23 Aug 2023

MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation

PONE-D-23-08704R2

Dear Dr. Woolf,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Suyan Tian

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

All points raised by the reviewers have been addressed appropriately.

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0291183.r006

Acceptance letter

Suyan Tian

29 Aug 2023

PONE-D-23-08704R2

MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation

Dear Dr. Woolf:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Suyan Tian

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Table. Results for additional outcomes when both exposures cause the outcome, and exposure 2 is pleiotropic.

(DOCX)

pone.0291183.s001.docx (21.6KB, docx)

S2 Table. Results for additional outcomes when neither exposure causes the outcome, and exposure 2 is pleiotropic.

(DOCX)

pone.0291183.s002.docx (21.4KB, docx)

Attachment

Submitted filename: r2 response letter v2.docx

pone.0291183.s003.docx (20.4KB, docx)

Data Availability Statement

All data produced in the present study are available from DOI 10.17605/OSF.IO/8DZKU.

1. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. 2014;23: R89–R98. doi: 10.1093/hmg/ddu328

2. Davey Smith G, Holmes MV, Davies NM, Ebrahim S. Mendel’s laws, Mendelian randomization and causal inference in observational data: substantive and nomenclatural issues. Eur J Epidemiol. 2020;35: 99–111. doi: 10.1007/s10654-020-00622-7

3. Burgess S, Small DS, Thompson SG. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res. 2017;26: 2333–2355. doi: 10.1177/0962280215597579

4. Hartwig FP, Davies NM, Hemani G, Smith GD. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. International Journal of Epidemiology. 2016;45: 1717–1726. doi: 10.1093/ije/dyx028

5. Slob EAW, Burgess S. A comparison of robust Mendelian randomization methods using summary data. Genetic Epidemiology. 2020;44: 313–329. doi: 10.1002/gepi.22295

6. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology. 2015;44: 512–525. doi: 10.1093/ije/dyv080

7. Woolf B, Di Cara N, Moreno-Stokoe C, Skrivankova V, Drax K, Higgins JPT, et al. Investigating the transparency of reporting in two-sample summary data Mendelian randomization studies using the MR-Base platform. International Journal of Epidemiology. 2022; dyac074. doi: 10.1093/ije/dyac074

8. Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46: 1985–1998. doi: 10.1093/ije/dyx102

9. Bowden J, Hemani G, Davey Smith G. Invited Commentary: Detecting Individual and Global Horizontal Pleiotropy in Mendelian Randomization—A Job for the Humble Heterogeneity Statistic? Am J Epidemiol. 2018;187: 2681–2685. doi: 10.1093/aje/kwy185

10. Burgess S, Foley CN, Allara E, Staley JR, Howson JMM. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat Commun. 2020;11: 376. doi: 10.1038/s41467-019-14156-4

11. Sanderson E, Spiller W, Bowden J. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization. Statistics in Medicine. 2021;40: 5434–5452. doi: 10.1002/sim.9133

12. Carter AR, Sanderson E, Hammerton G, Richmond RC, Davey Smith G, Heron J, et al. Mendelian randomisation for mediation analysis: current methods and challenges for implementation. Eur J Epidemiol. 2021;36: 465–478. doi: 10.1007/s10654-021-00757-1

13. Schooling CM, Lopez PM, Yang Z, Zhao JV, Au Yeung SL, Huang JV. Use of Multivariable Mendelian Randomization to Address Biases Due to Competing Risk Before Recruitment. Frontiers in Genetics. 2021;11. Available: https://www.frontiersin.org/article/10.3389/fgene.2020.610852

14. Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181: 251–260. doi: 10.1093/aje/kwu283

15. Woolf B. mesrument error and MR. 2021. [cited 23 Apr 2022]. doi: 10.17605/OSF.IO/YXZWC

16. Rees JMB, Wood AM, Burgess S. Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Stat Med. 2017;36: 4705–4718. doi: 10.1002/sim.7492

17. Lin Z, Xue H, Pan W. Robust multivariable Mendelian randomization based on constrained maximum likelihood. The American Journal of Human Genetics. 2023;110: 592–605. doi: 10.1016/j.ajhg.2023.02.014

18. Grant AJ, Burgess S. A Bayesian approach to Mendelian randomization using summary statistics in the univariable and multivariable settings with correlated pleiotropy. bioRxiv; 2023. p. 2023.05.30.542988. doi: 10.1101/2023.05.30.542988

19. Wang J, Zhao Q, Bowden J, Hemani G, Smith GD, Small DS, et al. Causal inference for heritable phenotypic risk factors using heterogeneous genetic instruments. PLOS Genetics. 2021;17: e1009575. doi: 10.1371/journal.pgen.1009575

20. Bowden J, Del Greco M F, Minelli C, Davey Smith G, Sheehan N, Thompson J. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med. 2017;36: 1783–1802. doi: 10.1002/sim.7221

21. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Statistics in Medicine. 2019;38: 2074–2102. doi: 10.1002/sim.8086

22. Wootton RE, Richmond RC, Stuijfzand BG, Lawn RB, Sallis HM, Taylor GMJ, et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol Med. 2020;50: 2435–2443. doi: 10.1017/S0033291719002678

23. Grant AJ, Burgess S. Pleiotropy robust methods for multivariable Mendelian randomization. Stat Med. 2021;40: 5813–5830. doi: 10.1002/sim.9156

24. Anderson EL, Howe LD, Wade KH, Ben-Shlomo Y, Hill WD, Deary IJ, et al. Education, intelligence and Alzheimer’s disease: evidence from a multivariable two-sample Mendelian randomization study. International Journal of Epidemiology. 2020;49: 1163–1172. doi: 10.1093/ije/dyz280

25. Davies NM, Hill WD, Anderson EL, Sanderson E, Deary IJ, Davey Smith G. Multivariable two-sample Mendelian randomization estimates of the effects of intelligence and education on health. Teare MD, Franco E, Burgess S, editors. eLife. 2019;8: e43990. doi: 10.7554/eLife.43990

[pone.0291183.ref026] 26.Hill WD, Marioni RE, Maghzian O, Ritchie SJ, Hagenaars SP, McIntosh AM, et al. A combined analysis of genetically correlated traits identifies 187 loci and a role for neurogenesis and myelination in intelligence. Mol Psychiatry. 2019;24: 169–181. doi: 10.1038/s41380-017-0001-5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0291183.ref027] 27.Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533: 539–542. doi: 10.1038/nature17671 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0291183.ref028] 28.Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank. In: Neale lab [Internet]. [cited 18 Jul 2022]. Available: http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank

[pone.0291183.ref029] 29.Lambert J-C, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet. 2013;45: 1452–1458. doi: 10.1038/ng.2802 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0291183.ref030] 30.Haycock PC, Burgess S, Wade KH, Bowden J, Relton C, Davey Smith G. Best (but oft-forgotten) practices: the design, analysis, and interpretation of Mendelian randomization studies. Am J Clin Nutr. 2016;103: 965–978. doi: 10.3945/ajcn.115.118216 [DOI] [PMC free article] [PubMed] [Google Scholar]
