Author manuscript; available in PMC: 2024 Jun 1.
Published in final edited form as: Ann Appl Stat. 2024 Apr 5;18(2):23-aoas1856. doi: 10.1214/23-AOAS1856

Selecting Invalid Instruments to Improve Mendelian Randomization with Two-Sample Summary Data

Ashish Patel 1, Francis J DiTraglia 2, Verena Zuber 3, Stephen Burgess 1,4
PMCID: PMC7615940  EMSID: EMS195813  PMID: 38737575

Abstract

Mendelian randomization (MR) is a widely-used method to estimate the causal relationship between a risk factor and disease. A fundamental part of any MR analysis is to choose appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may have greater confidence in the instrument validity of only a smaller subset of variants. Nevertheless, the use of additional instruments may be optimal from the perspective of mean squared error even if they are slightly invalid; a small bias in estimation may be a price worth paying for a larger reduction in variance. For this purpose, we consider a method for “focused” instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we propose a novel strategy to construct confidence intervals for post-selection focused estimators that guards against the worst case loss in asymptotic coverage. In empirical applications to: (i) validate lipid drug targets; and (ii) investigate vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments does not involve only a small number of biologically-justified instruments, but also many potentially invalid instruments.

Keywords and phrases: Mendelian randomization, focused information criterion, post-selection inference

1. Introduction

Mendelian randomization (MR) uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome in the presence of unobserved confounding. By Mendel’s second law, genetic variants sort independently of other traits. Thus, genetic variants, which are fixed at conception, can provide a source of exogenous variation in a risk factor of interest, allowing analyses that are less vulnerable to reverse causality and confounding (Davey Smith and Ebrahim, 2003).

Large-scale consortia genome-wide association studies (meta-GWASs) have identified large numbers of genetic variants that are robustly associated with a wide range of traits. Due in part to privacy issues, often only summary statistics of these genetic associations are made publicly available. Since these results are easily accessible, MR investigations increasingly rely on inferential methods that require only two-sample summary data (Burgess et al., 2015). In such applications, genetic variant associations with the risk factor are obtained from a representative but non-overlapping sample used to measure genetic variant associations with the outcome.

As in any instrumental variable analysis, identifying causal effects through MR requires some key assumptions. First, for a genetic variant to be a valid instrument, it must be associated with the risk factor; βXj ≠ 0 in Figure 1 (relevance). Second, the variant must be uncorrelated with unobserved confounders (independence). Third, any effect that the variant has on the outcome must be mediated by its effect on the risk factor; τj = 0 in Figure 1 (exclusion).

Fig 1. The effect of genetic variant Zj on the risk factor X and outcome Y, where U is an unobserved confounder.

Violations of the exclusion condition are common in MR studies (Lawlor et al., 2008), due in part to the widespread phenomenon of pleiotropy where a single genetic variant may influence several traits (Solovieff et al., 2013; Hemani, Bowden and Davey Smith, 2018). While the biological mechanism of certain genetic effects may be well understood (for example, when using protein risk factors for drug target validation; see Schmidt et al., 2020), it is generally difficult to rule out the possibility that many genetic variants have a direct effect on the outcome, and thus violate the exclusion restriction (Verbanck et al., 2018).

In some MR applications, investigators may have more confidence in the validity of a particular subset of all candidate instruments. In particular, the causal mechanisms linking specific genes to risk factors may be known, which may better justify the use of genetic variants from those genes as instruments. Examples of MR studies that have prioritised the use of variants from biologically-plausible genes include investigations into the effects of alcohol consumption (Millwood et al., 2019), C-reactive protein level (Swerdlow et al., 2016), smoking behaviour (Lassi et al., 2016), vitamin D supplementation (Mokry et al., 2015), and perturbing drug targets (Gill et al., 2021).

Along with biologically-justified instruments, investigators could consider using additional variants that are plausibly valid instruments in order to improve the precision of an analysis. Even if these additional instruments are slightly invalid, their use may provide slightly biased but more precise estimates that are optimal from the perspective of mean squared error.

Hence, to guide this instrument choice, we propose a method for “focused” instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. The strategy allows a tiering in the assumptions on instrument validity, and prioritises the evidence suggested by biologically-justified instruments.

We work with the popular two-sample summary data design, and in a setting of many weak and locally invalid instruments. In this local mis-specification setting, the collective direct instrument effects on the outcome are decreasing with the sample size at a rate that ensures a meaningful bias-variance trade off, and thus enables a mean squared error comparison in finite samples.

The theoretical contribution of our work has two elements. First, we extend the results of DiTraglia (2016) to allow for many weak instruments. We consider an asymptotic framework in which the number of instruments can grow at the same rate as the sample size, as long as their collective effects on the risk factor are bounded (Zhao et al., 2020). Second, we consider the problem of post-selection inference for focused estimators, which is particularly challenging because we do not have consistent model selection; the uncertainty in model selection directly impacts the asymptotic distribution of focused estimators.

Given that focused use of additional instruments may lead to improved estimation compared with using only a smaller set of biologically-justified instruments, it is natural to consider whether a similar advantage may hold for inference. One desirable property for confidence intervals is that they are uniformly valid; that is, they achieve nominal coverage asymptotically over the space of potential direct instrument effects on the outcome. Unfortunately, the impossibility results discussed by Leeb and Pötscher (2005) suggest we cannot construct uniformly valid confidence intervals for focused estimators which will exactly achieve nominal coverage, and therefore such intervals will generally be conservative. For example, in simulation we find that the “2-step” uniformly valid confidence intervals of DiTraglia (2016) can be over 30% longer than standard confidence intervals based on using only a core set of valid instruments.

In the same way that focused estimation is willing to trade a small bias in estimation for a larger reduction in variance, we develop a strategy for constructing “focused” confidence intervals that accepts a small loss in coverage probability over a subspace of direct instrument effects on the outcome in exchange for shorter confidence intervals. Our focused confidence intervals account for uncertainty in model selection, and they achieve nominal coverage for plausible values of direct instrument effects, while providing a statistical guarantee on the worst case size distortion away from those values.

Compared with using only a core set of biologically-justified instruments, our simulation evidence illustrates how focused estimators and confidence intervals may provide improved estimation and inference when additional instruments are valid or slightly invalid, while buying insurance against poor performance when additional instruments are very invalid.

The utility of our methods is demonstrated in two empirical applications. The first considers genetic validation of lipid drug targets when investigators are uncertain about defining the width of a cis-gene window from which to select instruments. The second application investigates the genetically predicted effect of vitamin D supplementation on a range of outcomes when investigators want to prioritise the use of genetic variants from specific genes through biological considerations.

We use the following notation and abbreviations: →P ‘converges in probability to’; →D ‘converges in distribution to’; ~a ‘is asymptotically distributed as’. For any sequences an and bn, if an = O(bn), then there exists a positive constant M and a positive integer N such that for all n ≥ N, bn > 0 and |an| ≤ Mbn. If an = o(bn), then |an|/bn → 0 as n → ∞. Also, if an = Θ(bn), then there exist positive constants M1 ≤ M2 < ∞ and a positive integer N such that M1bn ≤ an ≤ M2bn for all n ≥ N. The proofs of theoretical results are given in Supplementary Material (Patel et al., 2024), and R code to perform our empirical investigation is available on GitHub at github.com/ash-res/focused-MR/.

2. Model and assumptions

2.1. Two-sample summary data

We first outline our assumptions on genetic association summary data, which are motivated through a simple linear model with invalid instruments and homoscedastic errors.

Let Z = (Z1,…, Zp)′ denote a p-vector of uncorrelated genetic variants. The parameter of interest is the causal effect θ0 of the risk factor X on the outcome Y, which is described by

Y = ωY + Xθ0 + Z′τ + UY, (1)
X = ωX + Z′βX + UX, (2)

where E[UY|Z] = 0, E[UX|Z] = 0, E[UY²|Z] = σUY², E[UX²|Z] = σUX², βX = (βX1, …, βXp)′, and (ωX, ωY, βX, τ, θ0) are unknown parameters. If τ is non-zero, then at least one genetic variant fails the exclusion restriction and directly affects the outcome.

Substituting (2) into (1), we have Y = (ωY + ωXθ0) + Z′(βXθ0 + τ) + (UY + UXθ0). Thus, Cov(Z, Y) = Var(Z)(βXθ0 + τ), which leads to a model

βY=βXθ0+τ, (3)

where βY = (βY1,…, βYp)′ is the p-vector of coefficients from a population regression of Y on Z. We aim to estimate the model in (3) using two-sample summary data on genetic associations.

Assumption 1 (two-sample summary data)

For each variant j, we observe genetic associations β̂Xj and β̂Yj, which satisfy β̂Xj ~ N(βXj, σXj²) and β̂Yj ~ N(βYj, σYj²), where σXj² = Θ(1/nX) and σYj² = Θ(1/nY) are assumed to be known. Moreover, the set of 2p genetic associations {β̂Xj, β̂Yj}, j = 1, …, p, is mutually uncorrelated, and nX/nY → c as n := (nX, nY) → ∞ for some constant 0 < c < ∞.

Assumption 1 is taken from Zhao et al. (2020) and states a normal approximation for estimated genetic associations, which is typically justified by the large sample sizes of genetic association studies. Specifically, for each variant j, we have access to estimates and standard errors from univariable linear regressions of X on Zj in a sample of size nX, and, from a non-overlapping sample of size nY, we observe measured associations from univariable linear regressions of Y on Zj. Both random samples are drawn from the joint distribution of (Y, X, Z).

The assumption that the population standard deviations {σXj, σYj}, j = 1, …, p, are known is common in MR, and a formal justification for this is given by Ye, Shao and Kang (2021). To simplify notation, the dependence of the standard errors on the sample size is not made explicit, but they are assumed to decrease at the usual parametric rate.
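As a concrete illustration of this setup, the following sketch simulates two-sample summary data consistent with Assumption 1 and model (3). All specific numbers (p = 20 variants, the effect sizes, a 10-variant valid core set) are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n_X, n_Y = 20, 50_000, 50_000
theta0 = 0.3                                   # true causal effect of X on Y

beta_X = rng.normal(0.0, 0.05, size=p)         # variant-risk factor effects
tau = np.zeros(p)                              # direct effects on the outcome
tau[10:] = rng.normal(0.0, 0.01, size=p - 10)  # core set S0 = first 10 variants is valid
beta_Y = beta_X * theta0 + tau                 # model (3): beta_Y = beta_X*theta0 + tau

# Assumption 1: normally distributed association estimates with known
# standard errors of order Theta(1/sqrt(n)), mutually uncorrelated
se_X = np.full(p, 1.0 / np.sqrt(n_X))
se_Y = np.full(p, 1.0 / np.sqrt(n_Y))
beta_X_hat = beta_X + se_X * rng.standard_normal(p)
beta_Y_hat = beta_Y + se_Y * rng.standard_normal(p)
```

In a real analysis the four vectors of estimates and standard errors would instead be read from published GWAS summary statistics.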

2.2. Core instruments

From (3), the parameter of interest θ0 is not identified unless there are some restrictions on τ = (τ1,…, τp)′. When genetic variants from several gene regions are used to instrument the risk factor, a popular identification strategy is to assume that τ is a mean zero random effect; see, for example, Zhao et al. (2020). Another commonly-used assumption is that most genetic variants are valid instruments; τj = 0 for j ∈ SM, where SM is some unknown set of variants such that p⁻¹|SM| > 0.5. In this setting the median-based estimators of Bowden et al. (2016) are consistent. Based on a variation of these assumptions, many summary data MR methods have proposed ways to obtain unbiased estimates of θ0; a recent review is given in Sanderson et al. (2022).

In this work we do not focus on a novel identification strategy, but instead consider only the simple instrument selection problem facing investigators who believe that a core set of genetic variants S0 consists only of valid instruments, but who are less confident in the validity of any additional candidate instruments.

Assumption 2 (core instruments)

For some known set of instruments S0 such that 1 ≤ |S0| < p, and where |S0| grows proportionately with p, we have τj = 0 for all j ∈ S0.

Assumption 2 allows us to measure the bias which may result from the inclusion of additional instruments. In some MR studies, there is good reason for believing that a smaller subset of all available genetic variants are more likely to be valid instruments. We mention two examples that we explore in more detail in Section 6.

Example 1 (choosing cis windows in drug target MR)

In drug target MR studies, only those genetic variants from a single gene region that encodes the protein target of a drug are used to instrument the risk factor. Such studies have gained popularity for providing supporting genetic evidence to validate drug targets and to study side effects (Gill et al., 2021). A key decision in drug target MR is to choose the width of a “cis window” which dictates a gene region from which to select instruments. Tools such as GeneCards (Stelzer et al., 2016) offer practical guidance on defining appropriate gene regions, but since there are often relatively few uncorrelated genetic signals from narrow cis windows, researchers often resort to widening the cis window in order to boost power through the use of a larger number of instruments. This leaves the possibility of a type of publication bias where researchers may report findings only from a cis window which gives their preferred result. We may be less confident on the instrument validity of additional genetic variants that are included only from widening a cis window.

Example 2 (estimating vitamin D effects)

GWASs have identified strong genetic associations with vitamin D in biologically plausible genes: GC, DHCR7, CYP2R1, and CYP24A1. Each of these genes is known to influence vitamin D level through a different mechanism. In order to investigate the effect of vitamin D supplementation on a range of traits and diseases, MR studies have often used genetic variants located in neighborhoods of those genes to instrument vitamin D, as they are considered more likely to satisfy the exclusion restriction than other genome-wide significant variants (Mokry et al., 2015; Revez et al., 2020). Genetic variants from other gene regions may be strongly associated with vitamin D, but they may be more likely to have direct effects on the outcome through their effects on traits other than vitamin D.

2.3. Many weak variant associations

In typical applications, we may expect many genetic variants to have weak effects on the risk factor. This can cause difficulties for identifying and estimating the causal effect. We study a setting of many weak instruments where the number of genetic variants is permitted to grow at the same rate as the sample size, but their collective explanatory power is bounded.

Assumption 3 (many weak instruments)

Let βX,0 denote the |S0|-vector with elements given by βXj for j ∈ S0. Then, ‖βX‖₂ = O(1), ‖βX‖₃/‖βX,0‖₂ → 0, and p/(n‖βX,0‖₂²) = O(1) as n, p → ∞.

Given (3) and Assumptions 2 and 3, the causal effect is point identified. Assumption 3 is similar to Zhao et al. (2020), and it implies that all variants are relevant instruments, but the explanatory power of any individual variant is decreasing as p → ∞. Moreover, the skewness of βX is restricted, which rules out settings with very sparse variant effects. Finally, the rate restriction p/(n‖βX,0‖₂²) = O(1) as n, p → ∞ ensures asymptotic normality in estimation, and the condition is plausible given the large sample sizes of typical GWASs.

2.4. Additional locally invalid instruments

To meaningfully study a bias-variance trade off from including additional instruments we need to ensure that the bias and variance terms have the same order of magnitude. Following DiTraglia (2016)’s framework of locally invalid instruments, we work with local mis-specification in which the direct effects τ are collectively “local-to-zero”, and decrease at the same rate as the sampling errors with n. This is not a substantive biological assumption, but rather a technical device that allows us to study the instrument selection problem from a mean squared error perspective.

Assumption 4 (locally invalid instruments)

‖τ‖₂² = O(1/n) as n, p → ∞.

Under the rate restriction in Assumption 4, we can consistently estimate θ0 using any set of instruments, but making valid inferences using invalid instruments will require us to account for an asymptotic bias. The rate restriction is slightly different to DiTraglia (2016) in that we limit only the collective direct effects over all variants.

Assumption 4 is otherwise quite general. For example, the direct effects τ may be correlated with βX, and thus may violate the so-called InSIDE (Instrument Strength Independent of Direct Effect) assumption which some existing MR methods that make use of invalid instruments rely on (Bowden, Smith and Burgess, 2015).

3. Focused instrument selection and estimation

Using only the core set of instruments S0 should lead to asymptotically unbiased estimation of θ0, and this forms a basis from which to measure the potential bias from including additional instruments. For ease of exposition we focus on the simple choice between using either: (i) the “Core” estimator that chooses only the set of core instruments S0; or (ii) the “Full” estimator that uses the full set of p instruments. The inclusion of S, the set of the additional p – |S0| genetic variants, may result in improved estimation if there is a large reduction in variance, and at most only a small increase in bias. Thus we use a “focused” instrument selection strategy (DiTraglia, 2016) where the Full estimator is selected only if it has a lower estimated asymptotic mean squared error than the Core estimator.

The results presented here can be easily extended for more fine-tuned instrument selection at the expense of extra notation; instead of simply choosing between the Core and Full estimators, we could also select from subsets of the additional instruments. Such subsets could be data-driven, chosen for biological reasons, and have overlapping variants. For example, in our empirical application to study vitamin D effects, we chose from all possible subsets of 3 partitions of the additional instruments, where the partitions were formed by k-means clustering on the Wald ratio estimate of each variant, β̂Yj/β̂Xj, j ∈ S.

We consider limited information maximum likelihood (LIML) estimation of θ0. The Core and Full estimators, θ̂C and θ̂F, are given by

θ̂C = argminθ Σj∈S0 (β̂Yj − β̂Xjθ)²/(σYj² + σXj²θ²)  and  θ̂F = argminθ Σj∈S0∪S (β̂Yj − β̂Xjθ)²/(σYj² + σXj²θ²).

Under Assumptions 1-3, Theorem 3.1 of Zhao et al. (2020) shows that the asymptotic distribution of θ̂C is

ΔC^(−1/2)(θ̂C − θ0) →D N(0, 1),

as n, p → ∞, where ΔC = ηC⁻²(ηC + ζC), ηC = Σj∈S0 Ωj⁻¹βXj², ζC = Σj∈S0 Ωj⁻²σXj²σYj², and Ωj = σYj² + θ0²σXj². Using similar arguments, we can derive the asymptotic distribution of the Full estimator.
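The two objectives above differ only in their summation sets, so both estimators can be computed with a single routine. A minimal sketch, profiling the objective over a grid of candidate values of θ (the function name, grid bounds, and resolution are our own illustrative choices):

```python
import numpy as np

def liml_theta(by, bx, sy2, sx2, grid=None):
    """Minimise sum_j (by_j - bx_j*theta)^2 / (sy2_j + sx2_j*theta^2) over theta."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 40_001)  # step 1e-4; widen for larger effects
    resid2 = (by[:, None] - bx[:, None] * grid[None, :]) ** 2
    var = sy2[:, None] + sx2[:, None] * grid[None, :] ** 2
    return grid[np.argmin((resid2 / var).sum(axis=0))]

# Core vs Full: restrict the summary data to S0, or use all p variants
bx = np.full(20, 0.1); by = 0.3 * bx                 # noiseless toy data, theta0 = 0.3
sx2 = np.full(20, 1e-4); sy2 = np.full(20, 1e-4)
theta_C = liml_theta(by[:10], bx[:10], sy2[:10], sx2[:10])  # core set S0
theta_F = liml_theta(by, bx, sy2, sx2)                      # full set
```

In practice one would replace the grid search with a one-dimensional numerical optimiser; the grid version is kept here for transparency.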

Theorem 1 (Full estimator)

Under Assumptions 1-4, the Full estimator is consistent, and its asymptotic distribution is

ΔF^(−1/2)(θ̂F − θ0 − b) →D N(0, 1),

as n, p → ∞, where ΔF = (ηC + ηS)⁻²(ηC + ηS + ζC + ζS), b = (ηC + ηS)⁻¹bS, bS = Σj∈S Ωj⁻¹βXjτj, ηS = Σj∈S Ωj⁻¹βXj², and ζS = Σj∈S Ωj⁻²σXj²σYj².

The variance terms ηC and ηS are of order O(n‖βX‖₂²), and the terms ζC and ζS are of order O(p). Therefore, (ηC + ηS)⁻¹ would be the asymptotic variance of θ̂F in a fixed p setting with strong instruments, and (ηC + ηS)⁻²(ζC + ζS) is an additional variance component to account for the extra uncertainty due to many weak instruments (Zhao et al., 2020). This is particularly important for our instrument selection problem since under-estimated variances based on ‘fixed p’ asymptotics could cause a mean squared error-based selection criterion to falsely recommend the inclusion of additional instruments. These two variance components are of the same order of magnitude when p/(n‖βX‖₂²) = Θ(1).

As discussed by Newey and Windmeijer (2009), despite the knife-edge condition required to balance these variance components, it may be advisable to use the weak instrument variance correction (ηC + ηS)⁻²(ζC + ζS) even in scenarios where n is considerably larger than p and instruments are strong. For example, the simulation study of Davies et al. (2015, pp. 457-9), which mimicked an MR design, showed that standard errors that did not correct for many weak instrument effects led to inflated type I error rates even for the case where n = 3000 and p = 9.

We also note that compared with the model of Zhao et al. (2020), here we consider τ to be a fixed effect rather than a random variable; this direct variant effect on the outcome induces a bias in estimation rather than an increase in variance. The asymptotic variance of θ̂F is of order O(1/(n‖βX‖₂²)) + O(p/(n²‖βX‖₂⁴)), and its asymptotic bias is of order O(1/(√n‖βX‖₂)). Thus, there is a meaningful bias-variance trade off if p/(n‖βX‖₂²) = O(1), since the square of the asymptotic bias is then of the same order of magnitude as the asymptotic variance.

To carry out focused instrument selection, we need to estimate and compare the asymptotic mean squared error (AMSE) of θ̂C and θ̂F. The Core estimator is asymptotically unbiased, and it is straightforward to consistently estimate its asymptotic variance ΔC. Under local mis-specification, the asymptotic bias b of the Full estimator cannot be consistently estimated. However, we can use θ̂C to construct an asymptotically unbiased estimate of b.

Let η̂C = Σj∈S0 Ω̂j⁻¹(β̂Xj² − σXj²), η̂S = Σj∈S Ω̂j⁻¹(β̂Xj² − σXj²), and Ω̂j = σYj² + θ̂C²σXj². Then, our estimator of b is b̂ = (η̂C + η̂S)⁻¹b̂S, where b̂S = Σj∈S Ω̂j⁻¹β̂Xjβ̂Yj − θ̂C Σj∈S Ω̂j⁻¹(β̂Xj² − σXj²).

Theorem 2 (Asymptotic bias estimator)

Under Assumptions 1-4, the asymptotic distribution of b̂ is

ΔB^(−1/2)(b̂ − b) →D N(0, 1),

as n, p → ∞, where ΔB = (ηC + ηS)⁻²[ηS + ζS + ξS + ηC⁻²ηS²(ηC + ζC)] and ξS = 2θ0² Σj∈S Ωj⁻²σXj⁴.

The standard error of the estimate b̂ can be up to the same order of magnitude OP(1/(√n‖βX‖₂)) as the true bias b, and hence the asymptotic variance of b̂ does not decrease fast enough for consistency as n, p → ∞. However, Theorem 2 shows that b̂ is an asymptotically unbiased estimator. Similar to before, we can write the asymptotic variance ΔB as the sum of two terms: (ηC + ηS)⁻²ηS(1 + ηC⁻¹ηS) and (ηC + ηS)⁻²(ζS + ξS + ηC⁻²ηS²ζC). The first term is the asymptotic variance of b̂ in a fixed p setting with strong instruments, and therefore the second term represents the extra uncertainty in estimation due to many weak instruments.

In order to estimate the AMSE of θ̂C and θ̂F, we need to construct consistent estimators Δ̂C = η̂C⁻²(η̂C + ζ̂C) for ΔC, Δ̂F = (η̂C + η̂S)⁻²(η̂C + η̂S + ζ̂C + ζ̂S) for ΔF, and Δ̂B = (η̂C + η̂S)⁻²[η̂S + ζ̂S + ξ̂S + η̂C⁻²η̂S²(η̂C + ζ̂C)] for ΔB, where ζ̂C = Σj∈S0 Ω̂j⁻²σXj²σYj², ζ̂S = Σj∈S Ω̂j⁻²σXj²σYj², and ξ̂S = 2θ̂C² Σj∈S Ω̂j⁻²σXj⁴.

Since θ̂C is asymptotically unbiased, a consistent estimator for its AMSE is Δ̂C. Whereas for θ̂F, an asymptotically unbiased estimator of its AMSE is (b̂² − Δ̂B) + Δ̂F. Following DiTraglia (2016), since the square of the asymptotic bias cannot be negative, we use max(0, b̂² − Δ̂B) instead of b̂² − Δ̂B when estimating the AMSE of θ̂F.

Define Ŵ = max(b̂² − Δ̂B, 0) + Δ̂F − Δ̂C as the estimated AMSE of θ̂F minus the estimated AMSE of θ̂C. The event that we select the Full estimator is then given by {Ŵ ≤ 0}. The “Focused” estimator is given by

θ̂ = I{Ŵ ≤ 0}θ̂F + (1 − I{Ŵ ≤ 0})θ̂C.

Since both θ̂F and θ̂C are consistent estimators of θ0, so is the Focused estimator θ̂.
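Putting the estimators of this section together, the selection rule can be sketched as follows. This is our own illustrative rendering (function and variable names are ours); inputs are the summary statistics, the index set of core instruments, and pre-computed Core and Full estimates.

```python
import numpy as np

def focused_select(by, bx, sy2, sx2, core, theta_C, theta_F):
    """Return (Focused estimate, W_hat, b_hat) from two-sample summary data."""
    S0 = np.asarray(core)
    S = np.setdiff1d(np.arange(len(by)), S0)       # additional instruments
    Om = sy2 + theta_C ** 2 * sx2                  # Omega_hat_j
    eta_C = np.sum((bx[S0] ** 2 - sx2[S0]) / Om[S0])
    eta_S = np.sum((bx[S] ** 2 - sx2[S]) / Om[S])
    zeta_C = np.sum(sx2[S0] * sy2[S0] / Om[S0] ** 2)
    zeta_S = np.sum(sx2[S] * sy2[S] / Om[S] ** 2)
    xi_S = 2 * theta_C ** 2 * np.sum(sx2[S] ** 2 / Om[S] ** 2)
    # asymptotically unbiased estimate of the bias b of the Full estimator
    b_hat = (np.sum(bx[S] * by[S] / Om[S]) - theta_C * eta_S) / (eta_C + eta_S)
    D_C = (eta_C + zeta_C) / eta_C ** 2            # Delta_hat_C (AMSE of Core)
    D_F = (eta_C + eta_S + zeta_C + zeta_S) / (eta_C + eta_S) ** 2
    D_B = (eta_S + zeta_S + xi_S + eta_S ** 2 * (eta_C + zeta_C) / eta_C ** 2) \
          / (eta_C + eta_S) ** 2
    W = max(b_hat ** 2 - D_B, 0.0) + D_F - D_C     # estimated AMSE difference
    return (theta_F if W <= 0 else theta_C), W, b_hat
```

For instance, with strong, valid additional instruments the variance reduction dominates and the rule selects the Full estimator (Ŵ ≤ 0); with strongly invalid additional instruments b̂² grows and the rule reverts to the Core estimator.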

As noted by a reviewer, it is also possible to use Theorem 2 to construct a valid test of H0 : b = 0. Such tests could potentially be used to inform an alternative instrument selection strategy based on pre-testing for bias; see Supplementary Material for further discussion (Patel et al., 2024).

4. Post-selection inference

In this section we discuss the problem of constructing confidence intervals for the Focused estimator θ̂, which is non-standard because model selection is based on an AMSE criterion that is not consistently estimated.

We start by deriving the asymptotic distribution of θ^, and discuss why naive confidence intervals that ignore sampling uncertainty in instrument selection are likely to perform poorly. We then note the relative merits of two related inference procedures proposed in DiTraglia (2016), before introducing a new “Focused” approach that combines their strengths. Just as the Focused estimator aims to achieve a good balance between bias and variance, Focused intervals aim to achieve a good balance between the length of confidence intervals and the potential worst case asymptotic coverage over the likely space of direct instrument effects on the outcome.

4.1. Asymptotic distribution of the Focused estimator

A feature of the local mis-specification framework is that the sampling uncertainty from asymptotic bias estimation directly affects the asymptotic distribution of the post-selection Focused estimator θ̂. In particular, there is non-ignorable uncertainty in model selection: even if the use of only the core instruments is optimal from an AMSE perspective, the uncertainty in asymptotic bias estimation can cause the Focused estimator to erroneously select the full set of instruments.

In contrast, under consistent model selection, the Focused estimator would choose either θ^C or θ^F with probability approaching 1 as n, p → ∞. While this would seem to simplify the task of inference, it would still not be possible to consistently estimate the distribution of post-selection estimators uniformly over the space of direct effects τ (Leeb and Pötscher, 2005, Section 2.3, pp. 38–40). Moreover, consistent model selection does not suit our goal of improved estimation in terms of low risk, since the worst case risk of post-selection estimators would be unbounded (Leeb and Pötscher, 2008).

Theorem 3 (Focused estimator)

Under Assumptions 1-4, the Focused estimator is consistent, and is asymptotically distributed θ̂ ~a θ0 + Λ(b) as n, p → ∞, where Λ(b) = I{W(b) ≤ 0}(b + KF) + (1 − I{W(b) ≤ 0})KC, W(b) = max((b + Kb)² − ΔB, 0) + ΔF − ΔC, and

K = (KF, KC, Kb)′ ~ N(0, Δ),  Δ = (ΔF ΔE ΔD; ΔE ΔC ΔA; ΔD ΔA ΔB),

where ΔE = ηC⁻¹(ηC + ηS)⁻¹(ηC + ζC), ΔA = −ΔE ηC⁻¹ηS, and ΔD = (ηC + ηS)⁻²(ηS + ζS) − ΔE(ηC + ηS)⁻¹ηS.

Theorem 3 indicates that the asymptotic distribution of θ̂ is a weighted average of the asymptotic distributions of θ̂C and θ̂F, where the weights are random even as n, p → ∞. As a result, “naive” confidence intervals for θ̂ that ignore sampling uncertainty in instrument selection should not be reported. Such intervals can perform extremely poorly in practice, with coverage arbitrarily far below their nominal level.

Under the condition p/(n‖βX‖₂²) = O(1) as n, p → ∞ from Assumption 3, the variance components in Δ can be consistently estimated, and therefore to simplify notation we henceforth assume that Δ is known.

4.2. Focused confidence intervals with coverage constraints

There are two related problems that a valid inference procedure must solve. First, confidence intervals need to be widened to account for the model selection uncertainty introduced by focused instrument selection. Second, confidence intervals need to be re-centered if the focused estimator is asymptotically biased.

If the true asymptotic bias component b were known, inference would be straightforward: Theorem 3 could be used directly to simulate the distribution of Λ(b). However, b cannot be consistently estimated in our locally invalid instruments framework. For this setting, DiTraglia (2016) proposes two feasible inference procedures that consider the distribution of Λ(b) at values of b that are plausible given the observed data.

The “1-step” interval is based on the distribution of Λ(b) evaluated at b = b̂, and it effectively assumes that the true value of b is equal to its asymptotically unbiased estimator b̂. This is intuitive since b̂ is in some sense the most plausible value of b given the data. It follows that if b̂ is close to the true value of b, then the 1-step interval will have coverage that is close to its nominal level. Moreover, when the direct effects τ are small, simulation evidence from DiTraglia (2016) suggests that the 1-step interval has competitive coverage and can be shorter in length than the “Core” interval, which is the standard confidence interval of θ̂C that uses only the core instruments. More generally, however, the 1-step interval comes with no theoretical guarantees; it may substantially under-cover, although its performance is much better than that of a naive interval that ignores instrument selection uncertainty.

The reason why the 1-step interval may under-cover is that it fails to account for the uncertainty in the estimate b̂ of b, thus potentially leading to intervals that are too short and centered incorrectly. The “2-step” interval allows for such uncertainty by first constructing a confidence region φ for b. It then simulates the distribution of Λ(b′) at every value b′ in φ, constructing a collection of confidence intervals each based on the assumption that the true value of b is b′. To obtain a uniform coverage guarantee, the 2-step interval takes the outer envelope of all of the resulting intervals. This makes the 2-step interval extremely conservative: in general there is no value of b for which the actual coverage equals the nominal coverage, and hence it will always over-cover. Our simulation evidence suggests that this over-coverage problem makes the 2-step intervals too wide to be useful in practice.

We consider a way forward for improved inference with “Focused” intervals, which aim to combine the strengths of the 1-step and 2-step intervals while avoiding their drawbacks. Like the 1-step interval, the Focused interval is constructed by simulating Λ(b) at a single value of b rather than taking an outer envelope over many values. This means that it can yield shorter confidence intervals. Like the 2-step interval, however, it comes with theoretical guarantees. The key is to choose an appropriate value of b.

The Focused interval considers only values of b that are contained in a (1 – α1) × 100% confidence interval called φ. This is the same confidence interval for b used in the 2-step interval approach. For some values of b in φ the distribution of Λ(b) will be highly dispersed. Suppose that b′ is such a value, so that a (1 – α2) × 100% confidence interval CI(b′) for θ0 computed under the assumption that b = b′ will be relatively wide. By construction, CI(b′) will achieve nominal coverage probability 1 – α2 when b = b′. The key insight is as follows: if b″ is a value in φ for which Λ(b″) is relatively less dispersed, then CI(b′) may also be a nearly valid confidence interval for θ0 when b = b″. Using this idea, the construction of the Focused interval proceeds as follows, based on a user-specified tolerance γ and nominal coverage probability 1 – α.

Algorithm 1. (Focused interval with a minimum coverage constraint).

1. Construct a (1 – α1) × 100% confidence interval φ for b using Theorem 2.

2. For each b′ ∈ φ, calculate a (1 – α2) × 100% interval [al(b′), au(b′)] for Λ(b′) using Theorem 3, under the assumption that b = b′.

3. Calculate φ̄ = {b′(α, γ) ∈ φ : P(al(b′) ≤ Λ(b″) ≤ au(b′)) ≥ 1 – α2 – γ, for all b″ ∈ φ}.

4. Find the value of b′(α, γ) ∈ φ̄ that yields the shortest (1 – α2) × 100% confidence interval for Λ(b′). Call this value b*(α, γ). A confidence interval for θ0 is then [θ̂ – au(b*(α, γ)), θ̂ – al(b*(α, γ))].

5. Notice that b*(α, γ) depends on γ, α1, and α2. Repeat Steps 1–4 for a range of choices of α1 subject to the constraint α1 + α2 = α. Choose the value of α1 that yields the shortest interval for θ0.
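To make the selection mechanics of Steps 2–4 concrete, here is a minimal numerical sketch in Python. It is a toy stand-in, not the authors' procedure: we assume, purely for illustration, that Λ(b′) is normal with mean b′ and a hypothetical dispersion s(b′) = 1 + |b′|, whereas in the paper the distribution of Λ(b′) is simulated via Theorem 3.

```python
import numpy as np
from scipy.stats import norm

def focused_interval(phi_lo, phi_hi, alpha2=0.025, gamma=0.2, ngrid=201):
    """Toy version of Steps 2-4 of Algorithm 1, assuming (hypothetically)
    that Lambda(b') ~ N(b', s(b')^2) with dispersion s(b') = 1 + |b'|.
    [phi_lo, phi_hi] plays the role of the confidence region phi for b."""
    grid = np.linspace(phi_lo, phi_hi, ngrid)
    s = 1.0 + np.abs(grid)                 # hypothetical dispersion of Lambda(b')
    z = norm.ppf(1 - alpha2 / 2)
    a_l, a_u = grid - z * s, grid + z * s  # (1 - alpha2) interval for each b'

    # Step 3: keep b' whose interval covers Lambda(b'') with probability at
    # least 1 - alpha2 - gamma for every b'' in phi; under normality,
    # coverage = Phi((a_u - b'')/s(b'')) - Phi((a_l - b'')/s(b''))
    cov = norm.cdf((a_u[:, None] - grid) / s) - norm.cdf((a_l[:, None] - grid) / s)
    keep = cov.min(axis=1) >= 1 - alpha2 - gamma
    if not keep.any():
        return None                        # no feasible b' for this (alpha2, gamma)

    # Step 4: among feasible b', pick the one giving the shortest interval
    lengths = np.where(keep, a_u - a_l, np.inf)
    i = int(np.argmin(lengths))
    return grid[i], a_l[i], a_u[i]
```

In this toy, bias levels near zero produce short intervals that fail the worst case coverage constraint over φ, so the algorithm moves to the smallest |b′| whose interval is dispersed enough, mirroring the trade-off described above.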

Like the 1-step interval, the Focused interval is always shorter than the 2-step interval because it is one of the intervals contained in the outer envelope that forms the 2-step interval. Unlike the 1-step interval, its asymptotic coverage probability can never fall below 1 – α – γ.

To illustrate the differences between the 1-step, 2-step, and Focused intervals, Figure 2 plots the distributions of Λ(b̄) evaluated at different bias levels b̄. The true bias is b, an asymptotically unbiased estimator of the bias is b̂, while b*(α, γ) and b″ are other bias levels contained in a (1 – α1) × 100% confidence region φ for b. Since b is unknown, we cannot directly calculate a confidence interval based on the distribution of Λ(b), shown in green. The 1-step (1 – α) × 100% confidence interval is based on the distribution of Λ(b̂), shown in blue. The 2-step interval takes an outer envelope of (1 – α2) × 100% confidence intervals based on the distributions of Λ(b̂) in blue, Λ(b*(α, γ)) in yellow, Λ(b″) in red, and all other distributions of Λ(b̄) for every b̄ in φ.

Fig 2.

Fig 2

The distribution of Λ(b̄) evaluated at b̄ = b, b̄ = b̂, and b̄ = b*(α, γ), for the case where all instruments are equally strong and the additional instruments are invalid.

The Focused interval is a (1 – α2) × 100% confidence interval based on the yellow distribution of Λ(b*(α, γ)) in Figure 2; this interval also covers (1 – α2 – γ) × 100% of the distributions of Λ(b̂) and Λ(b″), and of all other distributions Λ(b̄) for every b̄ in φ. Note that the distribution of Λ(b*(α, γ)) on which the Focused interval is based changes with the choice of γ. For lower values of γ, Λ(b*(α, γ)) is relatively dispersed: a bias level b*(α, γ) corresponding to a more dispersed distribution is selected in order to satisfy the more stringent (1 – α2 – γ) × 100% coverage requirement for the distributions of Λ(b̄), for all b̄ ∈ φ.

Theorem 4 (Worst case asymptotic coverage of Focused intervals)

Under Assumptions 1–4, the Focused interval defined in Algorithm 1 has asymptotic coverage probability no less than 1 – α – γ as n, p → ∞.

The Focused interval is designed to achieve nominal coverage 1 – α at a plausible value of the asymptotic bias, while also controlling the worst case asymptotic coverage according to a maximum allowable size distortion γ. The choice of γ dictates the trade-off between the worst case coverage level and the length of the interval, with a lower tolerance γ more likely to yield conservative inference. The feasibility of constructing the Focused interval relies on the existence of a sufficiently dispersed distribution Λ(b′(α, γ)) at a plausible value b′(α, γ), which may not exist for extremely low levels of γ. However, selecting an extremely low level of γ would defeat the purpose of the Focused interval, which aims to be competitive in terms of both coverage and length.

Although Algorithm 1 is novel, the concept of an inference procedure that depends on a user-specified allowable size distortion has precursors in the econometrics literature. For example, Andrews (2018) proposes an inference strategy that controls a worst case coverage distortion under weak instruments, and where the decision to report a conventional or weak instrument robust confidence set depends on the level of under-coverage that an investigator is willing to accept.

5. Simulation study

In this section we illustrate how the finite sample performance of the Focused estimator and Focused confidence interval depends on the strength of instruments and the magnitude of direct variant effects on the outcome.

First, we consider estimation performance in terms of root-mean squared error (RMSE). Second, we show how the Focused interval may be able to achieve a favourable balance of length and coverage, and discuss where it may lead to improved inference compared with the Core interval, which is the conventional confidence interval based on the Core estimator θ̂C that uses only the core instruments. Third, we discuss the sensitivity of the Focused interval to the choice of γ, which controls the worst case coverage loss. Finally, we consider the performance of the Focused estimator when the core instruments S0 are in fact invalid.

5.1. Design

We simulated two-sample summary data on p = 110 variants according to Assumptions 1–4. The sample sizes were set at n = nX = nY = 1000. Of the 110 variants, 10 were set to be valid instruments, and they formed the core set S0. The remaining p – |S0| = 100 variants formed the set of additional instruments S.

We generated estimated associations β̂Xj ~ N(βXj, σXj²) and β̂Yj ~ N(βYj, σYj²), where the true genetic variant associations with the risk factor were set as βXj = β̄C/√|S0| for j ∈ S0 and βXj = β̄S/√(p – |S0|) for j ∈ S. Here β̄C and β̄S were chosen to maintain a particular level of the concentration parameters λC = Σj∈S0 βXj²/(|S0| σXj²) and λS = Σj∈S βXj²/(|S| σXj²), which are measures of the average instrument strength of S0 and S. The variances σXj² and σYj² were set equal to 1/n for all variants.

The true variant–outcome associations were set to be βYj = βXj θ0 for j ∈ S0 and βYj = βXj θ0 + τj for j ∈ S, where the true causal effect was θ0 = 0.2, and the direct effects were fixed effects generated as τj ~ U[0, τ̄/√(np)], for different values τ̄ ≥ 0.
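The data-generating process above can be sketched as follows. This is our reading of the design rather than the authors' simulation code; in particular the scaling τ̄/√(np) for the direct effects is an assumption recovered from the text, and β̄C, β̄S are solved from the target concentration parameters.

```python
import numpy as np

def simulate_summary_data(n=1000, p=110, n_core=10, lam_C=40.0, lam_S=40.0,
                          theta0=0.2, tau_bar=2.0, seed=1):
    """Sketch of the Section 5.1 design: two-sample summary statistics for
    n_core valid core instruments and p - n_core additional instruments
    with nonnegative direct effects tau_j on the outcome."""
    rng = np.random.default_rng(seed)
    n_add = p - n_core
    sigma2 = 1.0 / n                       # variance of each association estimate

    # choose beta_bar_C, beta_bar_S to hit the target concentration parameters:
    # lam = sum_j beta_Xj^2 / (|set| * sigma2)  =>  beta_bar = sqrt(lam * |set| / n)
    beta_X = np.empty(p)
    beta_X[:n_core] = np.sqrt(lam_C * n_core / n) / np.sqrt(n_core)
    beta_X[n_core:] = np.sqrt(lam_S * n_add / n) / np.sqrt(n_add)

    # direct effects: zero for core instruments, U[0, tau_bar/sqrt(n*p)] otherwise
    # (the sqrt(n*p) scaling is our assumption)
    tau = np.zeros(p)
    tau[n_core:] = rng.uniform(0.0, tau_bar / np.sqrt(n * p), size=n_add)

    beta_Y = beta_X * theta0 + tau
    beta_X_hat = rng.normal(beta_X, np.sqrt(sigma2))
    beta_Y_hat = rng.normal(beta_Y, np.sqrt(sigma2))
    return beta_X_hat, beta_Y_hat, beta_X, beta_Y
```

With these choices the implied concentration parameter of the core set equals λC exactly, which can be checked by plugging the generated βXj back into the definition above.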

For inference, we set the nominal coverage probability at 1 – α = 0.95, and unless otherwise stated, the allowable worst case size distortion for the Focused interval was set at γ = 0.2. Along with the Focused, 1-step, 2-step, and Core intervals described above, we also note the performance of the “Naive” confidence interval, which is the standard confidence interval for the selected estimator that ignores sampling uncertainty in model selection.

5.2. Estimation

Figure 3 highlights that when the direct variant effects on the outcome are sufficiently small, the RMSE of the Focused estimator is lower than that of the Core estimator. However, this improvement is not uniform: the performance of the Focused estimator worsens for intermediate values of τ̄. Then, as τ̄ becomes large, the Focused estimator should select the Core estimator with large probability, so that the RMSEs of the Core and Focused estimators should be quite similar.

Fig 3.

Fig 3

RMSE of the Focused estimator relative to the RMSE of the Core estimator, varying with the average instrument strength of S0 (λC) and S (λS), and the invalidity of S (τ̄).

The extent of the possible improvement in RMSE appears to depend on the strength of the instruments. For example, when τ̄ = 2, the Focused estimator offered a 32.8% reduction in RMSE when all instruments were relatively weak (λC = λS = 40), a 30.7% reduction when λC = λS = 120, and a 27.8% reduction when all instruments were relatively strong (λC = λS = 200).

The relative strengths of the core and additional instrument sets affect the values of τ̄ over which focused instrument selection is able to improve estimation. When the additional instruments were strong and the core instruments were relatively weak (λC = 40, λS = 200), the Focused estimator had a lower RMSE than the Core estimator over the range 0 ≤ τ̄ < 10. In contrast, when λC = 200 and λS = 40, the Focused estimator had a lower RMSE only over the range 0 ≤ τ̄ ≤ 2.

In summary, these estimation results intuitively suggest that focused instrument selection is more likely to improve estimation when the additional instruments are not too invalid, the core instruments are quite weak, and the additional instruments are strong. This is practically relevant for MR analyses of polygenic traits where several genes may be causally related.

5.3. Confidence intervals

For inference, Figure 4 shows that the coverage probability of the Naive interval dropped as low as 0.4 when all instruments were equally strong (τ̄ = 8), and below 0.2 when the additional instruments were much stronger than the core instruments (λC = 40, λS = 200, τ̄ = 10). This underscores the importance of accounting for sampling uncertainty in model selection when constructing confidence intervals based on θ̂. The 2-step intervals, on the other hand, were conservative: they exceeded the nominal coverage probability, and Figure 5 shows that they were generally over 30% longer than the Core intervals.

Fig 4. Coverage probabilities of confidence intervals (nominal coverage is 1 – α = 0.95).

Fig 4

Fig 5. Length of confidence intervals relative to the Core interval (nominal coverage is 1 – α = 0.95).

Fig 5

The 1-step intervals offer a useful compromise between the 2-step and Naive intervals. When the additional instruments were not too invalid (i.e. for small enough values of τ̄), the 1-step intervals were shorter than the Core intervals while also achieving nominal coverage probability. However, the performance of the 1-step interval appears quite sensitive to the relative strengths of the core and additional instruments. When the additional instruments were relatively strong (for example, λC = 40, λS ≥ 160), the 1-step intervals were shorter than the Core intervals, but they under-covered for large enough values of τ̄. Conversely, when the core instruments were strong enough, the 1-step intervals showed no real advantage over the Core interval.

For the Focused intervals, we set the allowable worst case size distortion to γ = 0.2, so that for a 1 – α = 0.95 level confidence interval, the coverage probability should not fall below 0.75. Figure 4 shows that the coverage of the Focused interval remained above 0.8 for all values of τ̄, whereas for some values the coverage probability of the 1-step interval dropped below 0.7. Moreover, the Focused interval's coverage was generally more competitive than the 1-step interval's, especially when the additional instruments were at least as strong as the core instruments.

When the core instruments were not much weaker than the additional instruments, the Focused intervals were shorter than the 1-step intervals. Interestingly, the Focused intervals were also up to 8% shorter than the Core intervals unless the additional instruments were very invalid (τ̄ ≥ 12). The Focused interval can therefore improve the power of an analysis by incorporating information from additional relevant instruments when they are not too invalid. At the same time, it retains good size control and insures against serious under-coverage when additional instruments are very invalid.

5.4. Sensitivity to the choice of γ

The Focused intervals require investigators to choose an acceptable level of the worst case size distortion γ. Figure 6 illustrates how the performance of the Focused interval varies according to the choice of γ for the case where all instruments are equally strong (λC = λS = 40). The results of other confidence intervals discussed in this paper are also shown for comparison, but of course their performance should not vary with γ.

Fig 6.

Fig 6

The dashed line in the first row is the worst case coverage bound 1 – α – γ (nominal coverage is 1 – α = 0.95). The second row plots the length of confidence intervals relative to the Core interval.

Figure 6 verifies that the Focused interval is able to control the worst case coverage for all values of τ̄. The cost of allowing only a small size distortion is a longer interval: the Focused intervals were over 15% longer than the Core intervals when γ = 0.05, although they were still much shorter than the 2-step intervals.

Conversely, for larger values of γ, the Focused intervals become shorter, but only to a certain degree: since the Focused intervals account for uncertainty in model selection, the lengths of the intervals are not as short as the Naive intervals. Accordingly, the coverage probability of the Focused intervals will also not change for large enough values of γ.

5.5. Estimation when S0 contains invalid instruments

Focused instrument selection aims to prioritise the evidence suggested by a core set of genetic variants S0 that are believed to be valid instruments. In practice, investigators may not always be correct in their belief, and S0 may actually consist of invalid instruments. We consider this setting in simulation by slightly altering the design in Section 5.1 so that the true variant–outcome associations are set to be βYj = βXj θ0 + τj, where τj ~ U[0, τ̄C/√(np)] for j ∈ S0. Figure 7 presents the estimation results for the case where all instruments are equally strong (λC = λS = 40).

Fig 7.

Fig 7

RMSE of the Focused estimator relative to the RMSE of the Core estimator, varying with the invalidity of S0 (τ̄C) and of S (τ̄).

Our results show that when S0 contained only slightly invalid instruments, the Focused estimator was still able to improve on the Core estimator in terms of RMSE. As the instruments in S0 become more invalid, but not as invalid as those in S, the performance of the Focused estimator worsens because the bias from including S is under-estimated. We note that the Focused estimator also performed relatively well when the instruments in S0 were at least as invalid as those in S; for example, the RMSE of the Focused estimator was lower than that of the Core estimator when τ̄C = τ̄ = 12.

6. Empirical Examples

In this section, we demonstrate how focused instrument selection can be applied in MR studies. First, we consider the problem of instrument selection in drug target MR, where variation in a gene that encodes the protein target of a drug is used to proxy drug target perturbation. Such MR investigations have the potential to provide genetic evidence on drug efficacy and to study potential side effects (Gill et al., 2021).

An important decision in drug target MR is to specify a “cis window” that defines a gene region from which instruments are selected. In practice, investigators often use cis windows that are wider than gene regions defined by tools such as GeneCards (Stelzer et al., 2016) in order to boost the power of an analysis through the use of multiple instruments. This leaves the possibility of a type of publication bias where researchers may report findings only from a cis window that gives their preferred result. In Section 6.1, we apply the focused instrument selection method to two lipid drug targets, where genetic variants that may be included only from widening a cis window are additional instruments.

Second, we investigate the effect of vitamin D supplementation on a range of outcomes. GWASs have identified strong genetic associations in biologically plausible genes known to have a functional role in the transport, synthesis, or metabolism of vitamin D. Some previous MR studies aiming to study vitamin D effects have used genetic variants located in neighborhoods of those genes to instrument vitamin D as they are considered more likely to satisfy the exclusion restriction than other genome-wide significant variants (Mokry et al., 2015; Revez et al., 2020). However, the role of many other genes which are robustly associated with vitamin D may not yet be fully understood; for example, Jiang, Ge and Chen (2021) selected genetic variants from 69 independent loci to instrument vitamin D. In Section 6.2, we use genetic variants from biologically plausible genes as the core set of instruments, and apply focused instrument selection to select from many additional genetic variants which may be considered more likely to have a direct effect on the outcome through their effects on traits other than vitamin D.

For Focused confidence intervals, we selected γ = 0.2 for the maximum allowable size distortion. The data used for our analyses are publicly available through the MR-Base platform (Hemani et al., 2018), and R code to perform our empirical investigation is available on GitHub at github.com/ash-res/focused-MR/.

6.1. CETP and PCSK9 inhibitors

Cholesteryl ester transfer protein (CETP) inhibitors are a class of drugs that raise high density lipoprotein cholesterol levels and lower low density lipoprotein cholesterol (LDL-C) levels. At least three CETP inhibitors have failed to demonstrate a protective effect against coronary heart disease (CHD) in clinical trials, but the successful trial of anacetrapib showed a modest benefit when used with statins (Bowman et al., 2017). A recent drug target MR analysis by Schmidt et al. (2021) offers genetic evidence that CETP inhibition may be an effective approach for preventing CHD. Here we investigate the robustness of a similar drug target MR study to the choice of a cis window used to select instruments.

We study the genetically predicted LDL-C lowering effect of CETP inhibition on a range of outcomes by using genetic variants located in a neighborhood of the CETP gene. We may consider instruments drawn from the “narrow” cis window Chr 16: bp: 56,985,862–57,027,757 (which is the region stated in GeneCards ±10,000 bp) to more accurately represent the genetically predicted effects of CETP inhibition. At the same time, it is also common for drug target MR studies to use a “wider” cis window for instrument selection. We may consider potential additional instruments from the wider window Chr 16: bp: 56,895,862–57,117,757 (which is the region stated in GeneCards ±100,000 bp). The use of these additional instruments may be considered more likely to lead to biased estimation compared with using only those variants from the narrow cis window.
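As a concrete illustration of this windowing rule, the sketch below classifies variants by base-pair position using the CETP windows quoted above. The function name and `positions` input are hypothetical; a real analysis would also check the chromosome and prune correlated variants.

```python
def split_by_cis_window(positions, narrow, wide):
    """Return indices of core variants (inside the narrow cis window) and
    additional variants (inside the wide window but outside the narrow one)."""
    core = [i for i, bp in enumerate(positions) if narrow[0] <= bp <= narrow[1]]
    extra = [i for i, bp in enumerate(positions)
             if wide[0] <= bp <= wide[1] and not (narrow[0] <= bp <= narrow[1])]
    return core, extra

# CETP windows from the text (chromosome 16, base-pair positions)
NARROW = (56_985_862, 57_027_757)   # GeneCards region +/- 10,000 bp
WIDE = (56_895_862, 57_117_757)     # GeneCards region +/- 100,000 bp
```

Variants outside the wide window are discarded entirely, so widening the cis window can only add to the set of candidate additional instruments.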

Therefore the Core estimator used only the 3 uncorrelated genetic variants located in the narrow window as instruments, while the number of additional variants that were available varied between 4 and 10 depending on the outcome of interest. The Focused estimator selected the full set of instruments for 12 out of the 16 outcomes we studied. The concentration parameter for the core instruments was 91.10, while the concentration parameter for the additional instruments was between 6.52 and 8.12 depending on how many additional instruments were available for the application.

From Figure 8, we find that the genetically predicted LDL-C lowering effect of CETP inhibition is associated with a lower risk of CHD. The results also suggest that genetically predicted CETP inhibition may have a protective effect on other cardiovascular disease outcomes; specifically atrial fibrillation and heart failure. We do not find evidence for a protective effect on stroke outcomes, nor do we find evidence for any adverse effects on various non-cardiovascular related outcomes.

Fig 8.

Fig 8

CETP gene analysis. Point estimates and 95% confidence intervals of the change in log odds ratio of various outcomes due to a 1 standard deviation increase in instrumented LDL-C.

Compared with CETP inhibitors, PCSK9 inhibitors (PCSK9i) are a more established class of drugs that lower LDL-C levels. A drug target MR analysis can genetically proxy the effect of taking PCSK9i by instrumenting LDL-C using genetic variants located in a neighborhood of the PCSK9 gene. Similar to the CETP gene analysis above, we consider variants located in a narrow cis window of PCSK9 (Chr 1: bp: 55,505,221–55,530,525; exactly equal to the window stated in GeneCards) as core instruments, while variants taken from a wider window (±100,000 bp) serve as additional instruments.

Using these criteria, there were 4 core instruments when studying the same outcomes as before, apart from lung cancer, for which there were 3. The number of additional instruments available ranged from 2 to 13. The Focused estimator again selected the full set of instruments for 12 out of 16 outcomes. The set of core instruments was very strong compared with the additional instruments; the concentration parameter for the core instruments was 618.71, compared with a range of 8.72 to 26.63 for the full set of additional variants available.

The results in Figure 9 suggest that the genetically predicted LDL-C lowering effect of PCSK9i is associated with a lower risk of coronary artery disease, heart failure, and stroke incidence; in addition, the Focused intervals suggest an association specifically with large artery stroke incidence. We also find genetic evidence that PCSK9 inhibition may adversely affect the risk of developing Alzheimer’s disease, which supports the findings of Williams et al. (2020) and Schmidt et al. (2021).

Fig 9.

Fig 9

PCSK9 gene analysis. Point estimates and 95% confidence intervals of the change in log odds ratio of various outcomes due to a 1 standard deviation increase in instrumented LDL-C.

6.2. Vitamin D supplementation

Finally, we apply focused instrument selection to estimate the genetically predicted effect of vitamin D supplementation on a range of outcomes. Previous MR studies instrumenting vitamin D have used variants from genes implicated in the modulation of 25OHD levels through known mechanisms. In particular, the GC, DHCR7, CYP2R1 and CYP24A1 genes have known functions in vitamin D transport, synthesis, or metabolism (Berry et al., 2012; Mokry et al., 2015). Therefore we take genetic variants from neighborhoods of these genes (±500,000 bp on regions stated in GeneCards) as our set of core instruments.

Moreover, GWASs also provide data on many other genetic variants robustly associated with vitamin D (Jiang, Ge and Chen, 2021). We used other variants passing a genome-wide significance threshold (p-value for association with vitamin D less than 5 × 10−8) to form additional instrument sets. Instead of choosing only between the Core and Full estimators, in our analysis of vitamin D effects we allowed the Focused estimator to also choose from subsets of the additional instruments. We partitioned the additional instruments into 3 groups by k-means clustering based on the ratio estimate of each variant (β̂Yj/β̂Xj, j ∈ S) and then considered all possible combinations of these 3 partitions, thus creating 7 candidate sets of additional instruments.
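The partitioning step can be sketched as follows. The tiny Lloyd's-algorithm k-means below is a stand-in for whichever k-means implementation was actually used, and the function names are ours; the point is that 3 clusters of ratio estimates yield 2³ − 1 = 7 nonempty candidate sets.

```python
import numpy as np
from itertools import combinations

def kmeans_1d(x, k=3, iters=50, seed=0):
    """Plain Lloyd's algorithm on scalar data (stand-in for the k-means step)."""
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(x, size=k, replace=False))
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels

def candidate_instrument_sets(bx_hat, by_hat, k=3):
    """Partition additional instruments by k-means on their ratio estimates
    beta_Y_hat / beta_X_hat, then form all 2^k - 1 nonempty unions of clusters."""
    ratios = by_hat / bx_hat
    labels = kmeans_1d(ratios, k=k)
    clusters = [np.flatnonzero(labels == j) for j in range(k)]
    sets = []
    for r in range(1, k + 1):
        for combo in combinations(range(k), r):
            sets.append(np.sort(np.concatenate([clusters[j] for j in combo])))
    return sets
```

Each of the 7 index sets can then be appended to the core instruments and passed to the focused selection criterion, which picks the combination minimising the estimated asymptotic MSE.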

For nearly all the outcomes we considered, there were 10 or 11 genome-wide significant variants from the GC, DHCR7, CYP2R1 and CYP24A1 gene regions to use as core instruments for vitamin D; for primary biliary cirrhosis and asthma there were only 4. Many additional instruments (≥ 50) were selected by the Focused estimator in all cases. The core instruments were again stronger than the additional instruments: the concentration parameter ranged from 625.52 to 723.21 for the core instruments and from 49.73 to 99.80 for the additional instruments.

Evidence from observational studies suggests that low serum vitamin D levels are associated with an increased risk of cardiovascular disease (Dobnig et al., 2008). These reported associations may be due to unmeasured confounding, as evidence from a meta-analysis of 21 randomized clinical trials suggests no causal link (Barbarawi et al., 2019). From Figure 10, we find no genetically predicted effect of vitamin D on cardiovascular outcomes.

Fig 10.

Fig 10

Vitamin D effects. Point estimates and 95% confidence intervals of the change in log odds ratio of various outcomes due to a 1 standard deviation increase in instrumented vitamin D levels.

Our analysis highlights that using the full set of available instruments can sometimes lead to very different estimates than prioritising the core instruments. In particular, the standard confidence intervals of the Full estimator suggest a non-null association of vitamin D with coronary artery disease, heart failure, eczema, primary biliary cirrhosis, and type 2 diabetes, but these results are not supported by the Focused intervals.

Our results suggest that higher vitamin D levels may have a protective effect on the incidence of multiple sclerosis, a finding which has previously been discussed in other MR studies (Mokry et al., 2015). Through the Core and Focused intervals, we also find that genetically predicted vitamin D levels may be associated with anorexia.

The 1-step and Focused intervals of the Focused estimator suggest that higher vitamin D level may have a protective effect on the risk of developing Alzheimer’s disease, which supports the findings from Jiang, Ge and Chen (2021), but interestingly this is not supported by the Core and Full intervals. The Core estimator used 11 instruments, the Full estimator used 91 additional instruments, and the Focused estimator selected a subset of 65 of the additional instruments. This illustrates the ability of the focused instrument selection method to carefully select additional instruments that can potentially improve the power of an analysis.

7. Conclusion

Publicly available GWAS summary data have revealed that hundreds of genetic variants are robustly associated with a wide range of traits and diseases. However, MR studies in practice do not always make use of all genetic variants that are strongly associated with risk factors of interest; in some applications, investigators may have greater confidence in the instrument validity of only a smaller subset of many genetic variants. For this setting, we propose a way forward for improved estimation through focused use of many weak and potentially invalid instruments.

Whether focused use of invalid instruments can in turn improve inference is an open question. While a uniform improvement is not possible, we propose a new strategy for post-selection inference through Focused intervals, which are shown to achieve a good balance between precision and coverage probability while also guarding against a user-specified worst case size distortion. Our empirical applications highlight the potential of focused instrument selection to uncover new causal relationships in MR studies.

For neither estimation nor inference do we obtain uniform improvements, but such results could be explored under different settings. For multivariable Mendelian randomization, where we are interested in estimating the effects θ0 = (θ1,…, θK)′ that multiple exposures may have on an outcome, it would be interesting to consider uniformly improved estimation in terms of an aggregate risk function, in a similar spirit to Rosenman et al. (2023). Compared with the usual frequentist notion of inference, uniform improvement in terms of both length and coverage is possible from the perspective of empirical Bayes (Casella and Hwang, 2012, p.56) or “average coverage” (Armstrong, Kolesár and Plagborg-Møller, 2022) criteria. Future work could explore empirical Bayes confidence intervals under quite general restrictions on the direct effects τj, j ∉ S0.

Supplementary Material

Supplementary Material A

This contains proofs of theoretical results, and further simulation results.

Supplementary Material B

This contains R code to apply the focused instrument selection method in two-sample Mendelian randomization studies.


Acknowledgments

We thank participants at the 2022 International Society for Clinical Biostatistics conference, participants at the 2023 Siena Workshop on Econometric Theory and Applications, and Dipender Gill for helpful discussions. We thank an anonymous referee for detailed comments.

Funding

SB was supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (204623/Z/16/Z). VZ was supported by the United Kingdom Research and Innovation Medical Research Council (MR/W029790/1). This research was funded by the United Kingdom Research and Innovation Medical Research Council (MC-UU-00002/7), and supported by the National Institute for Health Research Cambridge Biomedical Research Centre: BRC-1215-20014.

Contributor Information

Ashish Patel, Email: ashish.patel@mrc-bsu.cam.ac.uk.

Francis J. DiTraglia, Email: francis.ditraglia@economics.ox.ac.uk.

Verena Zuber, Email: verena.zuber@imperial.ac.uk.

Stephen Burgess, Email: sb452@medschl.cam.ac.uk.

References

  1. Andrews I. Valid two-step identification-robust confidence sets for GMM. Review of Economics and Statistics. 2018;100:337–348. [Google Scholar]
  2. Armstrong TB, Kolesár M, Plagborg-Møller M. Robust empirical Bayes confidence intervals. Econometrica. 2022;90:2567–2602. [Google Scholar]
  3. Barbarawi M, Kheiri B, Zayed Y, Barbarawi O, Dhillon H, Swaid B, Yelangi A, Sundus S, Bachuwa G, Alkotob ML, Manson JE. Vitamin D Supplementation and Cardiovascular Disease Risks in More Than 83 000 Individuals in 21 Randomized Clinical Trials: A Meta-analysis. JAMA Cardiology. 2019;4:765–776. doi: 10.1001/jamacardio.2019.1870. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Berry DJ, Vimaleswaran KS, Whittaker JC, Hingorani AD, Hyppönen E. Evaluation of genetic markers as instruments for Mendelian randomization studies on vitamin D. PLoS ONE. 2012;7:1–10. doi: 10.1371/journal.pone.0037465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bowden J, Smith GD, Burgess S. Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. International Journal of Epidemiology. 2015;44:512–525. doi: 10.1093/ije/dyv080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genetic Epidemiology. 2016;40:304–314. doi: 10.1002/gepi.21965. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bowman L, Hopewell JC, Chen F, Wallendszus K, Stevens W, Collins R, et al. HPS3 and TIMI55 REVEAL Collaborative Group, T Effects of anacetrapib in patients with atherosclerotic vascular disease. New England Journal of Medicine. 2017;377:1217–1227. doi: 10.1056/NEJMoa1706444. [DOI] [PubMed] [Google Scholar]
  8. Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG, Consortium E-I Using Published Data in Mendelian Randomization: A Blueprint for Efficient Identification of Causal Risk Factors. European Journal of Epidemiology. 2015;30:543–552. doi: 10.1007/s10654-015-0011-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Casella G, Hwang JTG. Shrinkage confidence procedures. Statistical Science. 2012;27:51–60. [Google Scholar]
  10. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. 2003;32:1–22. doi: 10.1093/ije/dyg070. [DOI] [PubMed] [Google Scholar]
  11. Davies NM, von Hinke Kessler Scholder S, Farbmacher H, Burgess S, Windmeijer F, Smith GD. The many weak instruments problem and mendelian randomization. Statistics in Medicine. 2015;34:454–468. doi: 10.1002/sim.6358. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. DiTraglia FJ. Using invalid instruments on purpose: Focused moment selection and averaging for GMM. Journal of Econometrics. 2016;195:187–208. [Google Scholar]
  13. Dobnig H, Pilz S, Scharnagl H, Renner W, Seelhorst U, Wellnitz B, Kinkeldei J, Boehm BO, Weihrauch G, Maerz W. Independent association of low serum 25-hydroxyvitamin d and 1,25-dihydroxyvitamin d levels with all-cause and cardiovascular mortality. JAMA - The Archives of Internal Medicine. 2008;168:1340–1349. doi: 10.1001/archinte.168.12.1340. [DOI] [PubMed] [Google Scholar]
  14. Gill D, Georgakis MK, Walker VM, Schmidt AF, Gkatzionis A, Davies NM, et al. Mendelian randomization for studying the effects of perturbing drug targets. Wellcome Open Research. 2021;6:1–19. doi: 10.12688/wellcomeopenres.16544.1.
  15. Hemani G, Bowden J, Davey Smith G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Human Molecular Genetics. 2018;27:R195–R208. doi: 10.1093/hmg/ddy163.
  16. Hemani G, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife. 2018;7:1–29. doi: 10.7554/eLife.34408.
  17. Jiang X, Ge T, Chen CY. The causal role of circulating vitamin D concentrations in human complex traits and diseases: a large-scale Mendelian randomization study. Scientific Reports. 2021;11:1–10. doi: 10.1038/s41598-020-80655-w.
  18. Lassi G, Taylor AE, Timpson NJ, Kenny PJ, Mather RJ, Eisen T, Munafò MR. The CHRNA5-A3-B4 gene cluster and smoking: from discovery to therapeutics. Trends in Neurosciences. 2016;39:851–861. doi: 10.1016/j.tins.2016.10.005.
  19. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Davey Smith G. Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine. 2008;27:1133–1163. doi: 10.1002/sim.3034.
  20. Leeb H, Pötscher BM. Model selection and inference: Facts and fiction. Econometric Theory. 2005;21:21–59.
  21. Leeb H, Pötscher BM. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics. 2008;142:201–211.
  22. Millwood IY, Walters RG, Mei XW, Guo Y, Yang L, Bian Z, Bennett DA, Chen Y, Dong C, Hu R, et al. Conventional and genetic evidence on alcohol and vascular disease aetiology: a prospective study of 500,000 men and women in China. The Lancet. 2019;393:1831–1842. doi: 10.1016/S0140-6736(18)31772-0.
  23. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G, Richards JB. Vitamin D and risk of multiple sclerosis: A Mendelian randomization study. PLoS Medicine. 2015;12:1–20. doi: 10.1371/journal.pmed.1001866.
  24. Newey WK, Windmeijer F. Generalized method of moments with many weak moment conditions. Econometrica. 2009;77:687–719.
  25. Patel A, DiTraglia FJ, Zuber V, Burgess S. Supplement to “Selecting invalid instruments to improve Mendelian randomization with two-sample summary data”. Annals of Applied Statistics. 2024 doi: 10.1214/23-AOAS1856.
  26. Revez JA, Lin T, Qiao Z, Xue A, Holtz Y, Zhu Z, Zeng J, Wang H, Sidorenko J, Kemper KE, Vinkhuyzen AAE, et al. Genome-wide association study identifies 143 loci associated with 25-hydroxyvitamin D concentration. Nature Communications. 2020;11:1–12. doi: 10.1038/s41467-020-15421-7.
  27. Rosenman ETR, Basse G, Owen AB, Baiocchi M. Combining observational and experimental datasets using shrinkage estimators. Biometrics. 2023:1–13. doi: 10.1111/biom.13827.
  28. Sanderson E, Glymour MM, Holmes MV, Kang H, Morrison J, Davey Smith G, et al. Mendelian randomization. Nature Reviews Methods Primers. 2022;2:1–6. doi: 10.1038/s43586-021-00092-5.
  29. Schmidt AF, Finan C, Gordillo-Maranon M, Asselbergs FW, Freitag DF, Patel RS, et al. Genetic drug target validation using Mendelian randomization. Nature Communications. 2020;11:1–12. doi: 10.1038/s41467-020-16969-0.
  30. Schmidt AF, Hunt NB, Gordillo-Maranon M, Charoen P, Drenos F, Finan C, et al. Cholesteryl ester transfer protein (CETP) as a drug target for cardiovascular disease. Nature Communications. 2021;12:1–10. doi: 10.1038/s41467-021-25703-3.
  31. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics. 2013;14:483–495. doi: 10.1038/nrg3461.
  32. Stelzer G, et al. The GeneCards suite: From gene data mining to disease genome sequence analyses. Current Protocols in Bioinformatics. 2016;54:1–30. doi: 10.1002/cpbi.5.
  33. Swerdlow DI, Kuchenbaecker KB, Shah S, Sofat R, Holmes MV, Hingorani AD, et al. Selecting instruments for Mendelian randomization in the wake of genome-wide association studies. International Journal of Epidemiology. 2016;45:1600–1616. doi: 10.1093/ije/dyw088.
  34. Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nature Genetics. 2018;50:693–698. doi: 10.1038/s41588-018-0099-7.
  35. Williams DM, Finan C, Schmidt AF, Burgess S, Hingorani AD. Lipid lowering and Alzheimer’s disease risk: a Mendelian randomization study. Annals of Neurology. 2020;87:30–39. doi: 10.1002/ana.25642.
  36. Ye T, Shao J, Kang H. Debiased inverse-variance weighted estimator in two-sample summary-data Mendelian randomization. Annals of Statistics. 2021;49:2079–2100.
  37. Zhao Q, Wang J, Hemani G, Bowden J, Small DS. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Annals of Statistics. 2020;48:1742–1769.
