Summary
With the increasing availability of large-scale GWAS summary data on various complex traits and diseases, there has been tremendous interest in applying Mendelian randomization (MR) to investigate causal relationships between pairs of traits using SNPs as instrumental variables (IVs) based on observational data. Despite the potential significance of such applications, the validity of their causal conclusions critically depends on some strong modeling assumptions required by MR, which may be violated due to widespread (horizontal) pleiotropy. Although many MR methods have been proposed recently to relax the assumptions, mainly by dealing with uncorrelated pleiotropy, only a few can handle correlated pleiotropy, in which some SNPs/IVs may be associated with hidden confounders, such as heritable factors shared by both traits. Here we propose a simple and effective approach based on constrained maximum likelihood and model averaging, called cML-MA, applicable to GWAS summary data. To deal with more challenging situations in which many invalid IVs have only weak pleiotropic effects, we modify and improve it with data perturbation. Extensive simulations demonstrated that the proposed methods could control the type I error rate better while achieving higher power than competing methods. Applications to 48 risk factor-disease pairs based on large-scale GWAS summary data of 3 cardio-metabolic diseases (coronary artery disease, stroke, and type 2 diabetes), asthma, and 12 risk factors confirmed its superior performance.
Keywords: causal inference, data perturbation, goodness-of-fit test, GWAS, instrumental variable
Introduction
Mendelian randomization (MR) has been widely applied to assess causal relationships between pairs of complex traits (called exposures and outcomes, respectively) using genetic variants as instrumental variables (IVs) for observational data. The practice is not only motivated by fundamental scientific questions on causal relationships, but also largely facilitated by recent advances in human genetics, by the increasing availability of large-scale GWAS summary data on various complex traits, and by the simplicity of such analyses. However, the validity of such an analysis critically depends on the IV assumptions, which are often violated due to widespread genetic (horizontal) pleiotropy, leading to biased inference and false conclusions.1,2 The three assumptions for a valid IV, as shown in Figure 1, are listed below:
- (A1) The IV is associated with the exposure X.
- (A2) The IV is not associated with the outcome Y conditional on the exposure X; i.e., αi = 0.
- (A3) The IV is not associated with unmeasured confounder U.
Among the three, the first assumption is more straightforward to handle by using highly significant SNPs as IVs. The violation of assumption A2 introduces so-called uncorrelated pleiotropic effects, for which some MR methods, such as MR Egger regression,3,4 have been proposed and applied. The most challenging is assumption A3, introducing so-called correlated pleiotropic effects. Under our general causal model as shown in Figure 1, the total effects of IV gi on the outcome Y can be decomposed into two parts: that mediated through exposure X, , and other direct effect (not mediated through X), . It is clear that, under the violation of assumption A3, we have , leading to the correlation of the mediated and direct effects and , and thus the violation of the instrument strength independent of direct effect (InSIDE) assumption required by MR-Egger regression and other methods (e.g., RAPS)5,6 that model the direct effects αi’s as independent random effects; in turn, these methods have to impose that the pleiotropic effect of any SNP must be uncorrelated with its SNP-exposure association.
Here we propose a simple MR method based on constrained maximum likelihood and model averaging, denoted cML-MA, that is robust to the violation of both assumptions A2 and A3; i.e., it is robust to invalid IVs with uncorrelated or correlated pleiotropic effects. Table 1 compares our proposed cML-MA with some of the most popular and recent methods, all of which are applicable to GWAS summary data. Our method depends only on the “plurality valid” assumption: in large samples, while (Wald) ratio estimates of the target causal effect from invalid IVs will take different values, ratio estimates from all valid IVs should approach the true causal effect, and thus the valid IVs form the largest group of SNPs among all the groups giving different ratio estimates.7,8 This assumption is weaker than the “majority valid” assumption, which states that more than 50% of the SNPs being used are valid IVs. We note that three new methods with this or other weak assumptions, MR-Mix, MR-ContMix, and CAUSE, impose a normal mixture model with more unknown parameters to estimate, while our proposed cML-MA does not impose such an assumption and estimates only a minimum number of necessary parameters. It is well known that mixture models are statistically difficult to estimate with small numbers of SNPs/IVs. This brings up an important point: in addition to its modeling assumptions, another key factor determining the performance of a method is how it is implemented. This point might explain why our cML-MA performed better than MR-Mix, MR-ContMix, and CAUSE, as will be shown. Another example is the MR-Weighted-Mode method: although it imposes modeling assumptions as minimal as those of cML-MA, it is difficult to estimate the mode of a distribution with small numbers of SNPs/IVs, which often leads to poor performance, as shown by others9,10 and confirmed later here.
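To make the difference between the “plurality valid” and “majority valid” assumptions concrete, the following minimal sketch uses made-up, noise-free effect sizes. All numbers and variable names (`theta_true`, `b_x`, `r`) are illustrative assumptions, not data or notation from this study. Only 3 of the 6 SNPs are valid, so the majority condition (more than 50% valid) fails, yet the valid IVs still form the largest group of equal Wald ratios.

```python
import numpy as np
from collections import Counter

# Hypothetical, noise-free toy values (not from the paper).
theta_true = 0.2                                        # causal effect of X on Y
b_x = np.array([0.10, 0.12, 0.08, 0.15, 0.11, 0.09])    # SNP-exposure effects
r = np.array([0.0, 0.0, 0.0, 0.03, -0.022, 0.018])      # pleiotropic effects; 0 means valid IV
b_y = theta_true * b_x + r                              # SNP-outcome effects

# Wald ratio per SNP, rounded so that equal ratios fall into the same group.
ratios = np.round(b_y / b_x, 6)
groups = Counter(ratios.tolist())

# The largest group of identical ratios identifies the valid IVs under plurality.
largest_value, largest_size = groups.most_common(1)[0]
n_valid = int(np.sum(r == 0))
```

Here two of the invalid IVs happen to share a ratio of 0.4, but that group (size 2) is still smaller than the group of valid IVs (size 3), which is all the plurality condition requires.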
A similar argument will be made in the discussion section regarding the advantage of our method over two other related ones, MR-Lasso and MR-PRESSO: all three share the basic idea of selecting/removing invalid IVs, but due to their different implementations, they perform quite differently. As shown by previous studies,9,10,11 existing MR methods may perform poorly, with inflated type I errors and/or biased estimates, especially with a high proportion of invalid IVs among a small number of SNPs, prompting an urgent need for more robust and efficient MR methods. Here we develop an efficient algorithm, an effective model selection criterion, a model averaging approach, and its variant based on data perturbation for such a purpose; our proposed cML-MA and its data perturbation-based variant are simple and easy to implement while imposing less stringent modeling assumptions and, as will be shown, perform consistently better than other methods across a wide range of scenarios.
Table 1.
Method | A2 | A3 | Key assumptions | Implementation challenges/performance |
---|---|---|---|---|
cML-MA | ✓ | ✓ | plurality valid | controlling type I errors with high power |
MR-Mix12 | ✓ | ✓ | plurality valid; a mixture of normals | biased to the null, thus conservative |
MR-ContMix9 | ✓ | ✓ | plurality valid; ; NOME | difficult to pre-choose a fixed value for tuning parameter ψ |
CAUSE13 | ✓ | ✓ | <50% IVs have correlated pleiotropy; ; ; or ; | difficult to estimate some parameters depending on the hidden confounder U; sensitive to assumption of |
MR-Lasso14 | ✓ | ✓ | plurality valid;7 some condition on the exposure-association strengths of invalid IVs relative to that of valid IVs to ensure consistency;15 NOME | depending on the heterogeneity criterion for choosing the tuning parameter for the Lasso penalty |
MR-Weighted-Mode16 | ✓ | ✓ | plurality valid | sensitive to the difficult bandwidth selection for mode estimation |
MR-Weighted-Median3 | ✓ | ✓ | majority valid | robust to outliers; low powered; sometimes biased |
MR-PRESSO1 | ✓ | x | majority valid; InSIDE; Good delete-1 causal estimates | inflated type I errors; unable to completely remove invalid IVs |
MR-Egger17 | ✓ | x | InSIDE: ; for a small m (but no normality needed for a large m); NOME | often biased and low powered |
MR-RAPS6 | ✓ | x | InSIDE: ; if overdispersion is specified | may be sensitive to directional pleiotropy; robust to outliers with Tukey’s loss |
MR-IVW (RE)18,19 | ✓ | x | balanced pleiotropy; NOME | sensitive to directional pleiotropy; low powered |
MR-IVW (FE)18,19 | x | x | all IVs are valid; NOME | efficient when all IVs are valid; sensitive to invalid IVs |
The notations are defined in Figure 1 and Equation 1, and q is the (unknown) proportion of invalid IVs while and are the Wald ratio estimate of θ based on SNP i and its standard error, respectively. NOME refers to no measurement error assumption: the variance of any IV-exposure association estimate is negligible.20
Material and methods
Overview
Suppose that we have m independent SNPs, , as IVs, X is the exposure, Y is the outcome, and U is the hidden confounder. Under the true causal model as shown in Figure 1, we obtain the total effects of gi on X and on Y as and , respectively:
(Equation 1) |
where represents the direct/pleiotropic effects of IV gi on outcome Y, not mediated by exposure X. If gi is a valid IV, IV assumptions A2 and A3 would imply αi = 0 and , respectively, leading to ri = 0. For an invalid IV with , its (Wald) ratio is biased for θ: . Our goal is for unbiased inference of the causal effect θ in the possible presence of some (unknown) invalid IVs with the corresponding (unknown) .
From two independent GWAS summary datasets for traits X and Y, respectively, we obtain the estimated marginal effect sizes of gi’s on X and Y (and their standard errors) as and , . Asymptotically (or approximately) we have and for . With Equation 1, we have the log-likelihood function (up to a constant) as
(Equation 2) |
which, for simplicity, may be written as . Throughout we use to represent a set of the parameters, and similarly for . Under the constraint that the number of the invalid IVs is K, a given integer, we can obtain the constrained maximum likelihood estimate (cMLE) and its standard error . We prove that with correctly selected valid IVs, the cMLE is consistent and asymptotically normal. Accordingly, we can construct normal-based confidence intervals (CIs) or conduct significance testing for θ. In practice, since K is unknown, we propose a Bayesian information criterion (BIC) to select K consistently before drawing inference on the true causal effect θ; this is our proposed cML-BIC. For finite (especially small) sample sizes, due to model selection errors, such a procedure might have slightly inflated type I error rates as shown in the supplemental material and methods. Instead, to account for model selection uncertainty, we propose a model averaging approach,21 called cML-MA-BIC, or cML-MA for short. We obtain multiple estimates , each based on a selected model with each value of 0 ≤ K < m − 1, then take their weighted average as our final estimate of the causal effect and draw inference accordingly; the weights are determined by the BIC values of the models, in which those more likely models (with lower BIC values) are given higher weights. More details are given below.
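As an illustration of the kind of log-likelihood described above, the sketch below assumes that each estimated SNP-exposure and SNP-outcome effect is normally distributed around its true value with known standard error, and that the true SNP-outcome effect equals the causal effect times the true SNP-exposure effect plus a pleiotropic term. The argument names (`b_x`, `r`, `beta_x_hat`, and so on) are our own labels for this sketch and may not match the notation of Equations 1 and 2.

```python
import numpy as np

def log_likelihood(theta, b_x, r, beta_x_hat, se_x, beta_y_hat, se_y):
    """Gaussian log-likelihood (up to a constant) for two-sample summary data.

    Assumed model (a sketch, with hypothetical names):
      beta_x_hat[i] ~ N(b_x[i], se_x[i]^2)
      beta_y_hat[i] ~ N(theta * b_x[i] + r[i], se_y[i]^2)
    """
    b_x, r = np.asarray(b_x, float), np.asarray(r, float)
    z_x = (np.asarray(beta_x_hat, float) - b_x) / np.asarray(se_x, float)
    z_y = (np.asarray(beta_y_hat, float) - theta * b_x - r) / np.asarray(se_y, float)
    return -0.5 * (np.sum(z_x**2) + np.sum(z_y**2))

# Toy usage with made-up numbers: a perfect fit attains the maximum value 0.
bx = np.array([0.10, 0.12, 0.08])
by = 0.2 * bx
se = np.full(3, 0.01)
ll_perfect = log_likelihood(0.2, bx, np.zeros(3), bx, se, by, se)
ll_wrong_theta = log_likelihood(0.0, bx, np.zeros(3), bx, se, by, se)
```

Under the constraint that exactly K of the `r[i]` may be non-zero, maximizing this function over `theta`, `b_x`, and `r` would give the constrained estimate for that K.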
Estimation and selection consistency with the cMLE
We develop some (asymptotic) theory to support our proposed method (for fixed m as n increases). Denote the set of truly invalid IVs, and its size . Suppose that n1 and n2 are the sample sizes of the two GWAS summary datasets for X and Y, respectively. With Equation 2, we obtain the cMLEs by solving
(Equation 3) |
Here is the indicator function and K is a tuning parameter representing the unknown number of invalid IVs. Denote the cMLEs from Equation 3 as , , and for , and the estimated set of invalid IVs. We propose a Bayesian information criterion (BIC) based on GWAS summary data to select the best K in a candidate set :
(Equation 4) |
Here n could be either n1 or n2; we recommend using n = min(n1, n2). We select , and estimate as the set of invalid IVs (and its complement as the set of valid IVs).
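The selection step can be sketched as follows, assuming the criterion takes the common form of minus twice the maximized constrained log-likelihood plus a penalty of log(n) per invalid IV. This form, the function names, and the toy log-likelihood values are assumptions for illustration; Equation 4 is the authoritative definition.

```python
import math

def bic(loglik, n, K):
    """Assumed BIC form: -2 * log-likelihood + log(n) * K."""
    return -2.0 * loglik + math.log(n) * K

def select_k(logliks, n):
    """Pick the K with the smallest BIC.

    logliks: dict mapping each candidate K to its maximized constrained
    log-likelihood (hypothetical inputs for this sketch).
    """
    bics = {K: bic(ll, n, K) for K, ll in logliks.items()}
    best_k = min(bics, key=bics.get)
    return best_k, bics

# Toy usage: the fit improves sharply up to K = 2 and barely afterwards.
n1, n2 = 50_000, 80_000
toy_logliks = {0: -40.0, 1: -12.0, 2: -3.0, 3: -2.8}
best_k, bics = select_k(toy_logliks, n=min(n1, n2))
```

In this toy case the extra invalid IV allowed at K = 3 buys almost no likelihood, so the log(n) penalty leads the criterion to prefer K = 2.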
Now we state two assumptions used to prove the selection consistency of our proposed BIC.
Assumption 1. (Plurality valid condition.) Suppose that B0 is the index set of the true invalid IVs with . For any and , if , then the (m − K0) ratios are not all equal.
Assumption 2. (Orders of the variances and sample sizes.) There exist positive constants and such that we have , , and for .
Assumption 2 says that the two sample sizes n1 and n2 are comparable and that variance or is of order 1/n1 or 1/n2, which is satisfied by the usual least-squares or maximum likelihood estimates obtained from GWAS summary data. We also note that the sample sizes for the SNPs/IVs being used may vary; as long as they are comparable (in the sense as defined in assumption 2), we can take their minimum in each GWAS dataset as the corresponding sample size n1 or n2. With assumptions 1 and 2 we prove (in the supplemental material and methods) that our proposed BIC consistently selects invalid IVs.
Theorem 1. With assumptions 1 and 2 satisfied, if , we have and as , .
As shown in the supplemental information, it was confirmed that in the simulations the proposed BIC selected increasing proportions of the correct models as the sample size increased.
After correctly selecting (and implicitly removing) invalid IVs, our proposed cMLE of θ is the same as the maximum profile likelihood estimate (MPLE) being applied to all valid IVs.6 Applying theorems 3.1 and 3.2 in Zhao et al.,6 coupled with the above selection consistency, we obtain both the estimation consistency and asymptotic normality of the cMLE . It is proven in the supplemental material and methods that the variance of the cMLE (based on the Fisher information matrix as shown in the computation section below) and that of the MPLE are asymptotically equal. As confirmed numerically in the supplemental material and methods, our cMLE and the MPLE were essentially the same in both the simulations and real data examples.
Model selection and model averaging approaches to inference of θ
After selecting , we can use the cMLE and its SE (see below for how to obtain it) to infer θ: based on the asymptotic normal distribution , we either construct a confidence interval (CI) or conduct a significance test. We call this method cML-BIC.
In spite of the selection consistency of our proposed BIC, to account for model selection uncertainties, especially with small sample sizes, we propose a model averaging (MA) approach. Following Buckland et al.,21 we first obtain the estimate of θ and its standard error for each candidate , then take their weighted average as the final estimate of θ with the weights determined by the BIC values of the corresponding candidate models.
Following Buckland et al.,21 for a set of K’s, we define the initial and standardized weights
The final weighted estimate and its standard error are
(Equation 5) |
With and SE, based on the asymptotic normal distribution, we draw inference on θ. We call this method cML-MA-BIC. In practice, we use the set of candidate K’s, . K = 0 means all IVs are valid; according to assumption 1, there should be at least two valid IVs. Other choices of candidate sets of K’s could also be applied, especially when we roughly know the proportion of invalid IVs.
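A minimal sketch of BIC-based model averaging in the style of Buckland et al. is given below. It assumes initial weights proportional to exp(−BIC/2) and a standard error that combines each model's own variance with its squared deviation from the averaged estimate; these specific forms and all names are assumptions for illustration, and Equation 5 remains the authoritative definition.

```python
import numpy as np

def model_average(theta_k, se_k, bic_k):
    """Weighted average of per-K estimates with BIC-based weights (sketch)."""
    theta_k, se_k, bic_k = (np.asarray(a, dtype=float) for a in (theta_k, se_k, bic_k))
    # Subtracting the minimum BIC avoids underflow and does not change the
    # standardized weights.
    w = np.exp(-0.5 * (bic_k - bic_k.min()))
    w /= w.sum()
    theta_ma = np.sum(w * theta_k)
    # Each model contributes its own SE inflated by its distance from the average.
    se_ma = np.sum(w * np.sqrt(se_k**2 + (theta_k - theta_ma) ** 2))
    return theta_ma, se_ma, w

# Toy usage: two equally supported models, then one clearly dominant model.
theta_eq, se_eq, w_eq = model_average([0.1, 0.3], [0.05, 0.05], [10.0, 10.0])
theta_dom, se_dom, w_dom = model_average([0.1, 0.3], [0.05, 0.05], [10.0, 110.0])
```

When one model dominates, the averaged estimate and its standard error reduce to those of that model; when models disagree but are similarly supported, the standard error grows to reflect the selection uncertainty.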
Instead of using BIC, we can also use the corresponding Akaike information criterion (AIC) to select K or weight its corresponding model, leading to cML-AIC and cML-MA-AIC for model selection- and model-averaging-based approaches, respectively. As shown in the supplemental material and methods, they did not perform as well as their BIC versions.
Computation
We propose a coordinate descent-like algorithm to iteratively solve Equation 3 to obtain cMLEs, , , and for . We start with the initial values and ’s, then update them iteratively as below until convergence: at the iteration,
Step 1: Given , update ri. Order decreasingly, as . Then for , let ; for , let .
Step 2: Given , ’s, update as
(Equation 6) |
Step 3: Given ’s, ’s, update θ as
(Equation 7) |
It is noted that at the convergence, by the expression of ri in step 1 and that for in step 3, if (i.e., for an invalid IV), SNP i and its data do not contribute to estimating θ.
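The three steps can be sketched for a fixed K as below. This is an illustrative implementation under stated assumptions: it maximizes a Gaussian summary-data likelihood by exact block-wise updates, starts from the causal effect and all SNP-exposure effects set to 0, and stops when the causal-effect estimate changes by less than a tolerance. The function name, argument names, starting values, and stopping rule are ours and are not taken from the authors' software.

```python
import numpy as np

def cml_fixed_k(bx_hat, se_x, by_hat, se_y, K, max_iter=200, tol=1e-10):
    """Coordinate descent-like fit with exactly K IVs allowed to be invalid (sketch)."""
    bx_hat, se_x, by_hat, se_y = (
        np.asarray(a, dtype=float) for a in (bx_hat, se_x, by_hat, se_y)
    )
    vx, vy = se_x**2, se_y**2
    theta = 0.0                      # assumed starting value
    b = np.zeros_like(bx_hat)        # assumed starting values for SNP-exposure effects
    r = np.zeros_like(bx_hat)
    top = np.array([], dtype=int)
    for _ in range(max_iter):
        theta_old = theta
        # Step 1: the K largest standardized residuals get a free pleiotropic
        # effect; all other r_i are set to 0.
        resid = by_hat - theta * b
        r = np.zeros_like(bx_hat)
        if K > 0:
            top = np.argsort(-(resid**2 / vy))[:K]
            r[top] = resid[top]
        # Step 2: precision-weighted update of the SNP-exposure effects.
        b = (bx_hat / vx + theta * (by_hat - r) / vy) / (1.0 / vx + theta**2 / vy)
        # Step 3: weighted least-squares update of the causal effect.
        theta = np.sum((by_hat - r) * b / vy) / np.sum(b**2 / vy)
        if abs(theta - theta_old) < tol:
            break
    return theta, b, r, sorted(top.tolist())

# Toy usage with noise-free, made-up data: SNPs 4 and 5 are invalid.
bx = np.array([0.10, 0.12, 0.08, 0.15, 0.11, 0.09])
pleio = np.array([0.0, 0.0, 0.0, 0.0, 0.05, -0.04])
by = 0.2 * bx + pleio
se = np.full(6, 0.01)
theta_fit, b_fit, r_fit, invalid_fit = cml_fixed_k(bx, se, by, se, K=2)
```

At convergence each selected invalid IV has its residual fully absorbed by its own pleiotropic term, so it no longer influences the causal-effect estimate, which matches the remark above. Like any such algorithm, this sketch can stop at a local optimum, which is one reason to try several starting values.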
By default, as in all our simulations, we set and ’s all at 0. More generally, as in our main real data examples, we can also use multiple random starts; in our real data examples, in addition to the above default starting values, we tried 100 random starts, each randomly generated as , and for 1 ≤ i ≤ m. Then we take the cMLE as the one from the initial values giving the maximum likelihood among those multiple starts. As shown in Tables S5 and S6, for our primary real data examples, among all 48 risk factor-disease pairs, for only 7 pairs did the 101 starts give slightly different results from using the default starting values; the differences in the numbers of detected invalid IVs were only 1 or 2, leading to almost the same final results. In our secondary real data analysis, for each of 63 null pairs we tried 10 random starts.
Next we estimate the standard error of for any given K. Denote the set of the indices of K non-zero ’s as , the (m − K + 1) by (m − K + 1) Fisher information matrix is
(Equation 8) |
where is a vector of elements with . Plugging , ’s into , we obtain the standard error of as . Details are shown in the supplemental material and methods.
Data perturbation
When the sample sizes of GWAS summary data are relatively small and there are many invalid IVs with weak pleiotropic effects, the (asymptotic) selection consistency of cML-BIC as described in theorem 1 may not be achieved, leading to missing some invalid IVs and ultimately biased inference, such as inflated type I errors. To alleviate this problem, we propose using data perturbation (DP).22 For , we generate independent perturbed samples and for . Then similar to Equation 3, we obtain the cMLEs with perturbed data by solving
(Equation 9) |
Denote the cMLEs from Equation 9 as , , and for , we get the maximized log-likelihood as
(Equation 10) |
Averaging over T perturbed estimates, we have
(Equation 11) |
and estimate the standard error of as the sample standard deviation of ’s,
(Equation 12) |
Then, as for cML-BIC and cML-MA-BIC, with the DP estimates in Equations 11 and 12, we obtain their DP versions called cML-BIC-DP and cML-MA-BIC-DP, respectively.
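The data perturbation loop can be sketched as follows, assuming each perturbed summary statistic is drawn from a normal distribution centered at the observed estimate with its reported standard error as the standard deviation. To keep the example self-contained, a fixed-effect IVW estimator is used as a stand-in for the cML fit; in practice the estimator passed in would be the constrained maximum likelihood procedure. All function and argument names are hypothetical.

```python
import numpy as np

def ivw_estimate(bx, se_x, by, se_y):
    """Stand-in estimator (fixed-effect IVW), used here only for illustration."""
    return np.sum(bx * by / se_y**2) / np.sum(bx**2 / se_y**2)

def dp_estimate(estimator, bx_hat, se_x, by_hat, se_y, T=200, seed=0):
    """Average an estimator over T perturbed copies of the summary data (sketch)."""
    rng = np.random.default_rng(seed)
    bx_hat, se_x, by_hat, se_y = (
        np.asarray(a, dtype=float) for a in (bx_hat, se_x, by_hat, se_y)
    )
    thetas = np.empty(T)
    for t in range(T):
        # Assumed perturbation scheme: add independent N(0, se^2) noise.
        bx_t = bx_hat + rng.normal(0.0, se_x)
        by_t = by_hat + rng.normal(0.0, se_y)
        thetas[t] = estimator(bx_t, se_x, by_t, se_y)
    # Point estimate: mean of the T estimates; SE: their sample standard deviation.
    return thetas.mean(), thetas.std(ddof=1), thetas

# Toy usage with made-up summary statistics.
bx = np.array([0.10, 0.12, 0.08, 0.15])
by = 0.2 * bx
se = np.full(4, 0.001)
theta_dp, se_dp, draws = dp_estimate(ivw_estimate, bx, se, by, se, T=200, seed=1)
```

Because model selection is repeated on every perturbed dataset, the spread of the T estimates reflects selection uncertainty in addition to sampling noise, which is the motivation given above for the DP variants.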
Goodness-of-fit tests for the variance estimates
In general, as will be shown numerically, cML-MA-BIC-DP is more conservative for inference and thus controls the type I errors better than cML-MA-BIC, but it may lose power while being computationally more demanding. To help decide which one to use for a given problem, we develop two goodness-of-fit tests, denoted GOF1 and GOF2, to check whether the (asymptotic) model-based and DP-based variance estimates converge to the same estimate; if so, then we recommend using cML-MA-BIC; otherwise, cML-MA-BIC-DP is preferred.
Suppose that cML-BIC selects a set of invalid IVs, with the estimate and its model-based variance calculated using the Fisher information matrix (Equation 8) as . If the BIC-based model selection is correct with only small model selection uncertainty, we would expect that would be close to the DP-based variance estimate, . Our proposed goodness-of-fit tests aim to test whether the two variance estimates converge to the same estimate (asymptotically).
First, based on each perturbed dataset, we obtain , from which we estimate the sample variance of ’s as . Second, as shown by Equation 12, is the sample variance of T i.i.d. random from some distribution ; theorem 2 in chapter 6 of Mood et al.23 shows that its variance is
(Equation 13) |
where and are the central fourth moment and variance of . We use the T samples to estimate them as
Plugging them into Equation 13, we obtain
(Equation 14) |
and the first GOF test statistic is
(Equation 15) |
Comparing with the standard normal random variate Z, we can calculate the p value as . This is the first goodness-of-fit test GOF1.
When T is small or only moderately large, the estimate could have a large variance. Furthermore, if ’s are normally distributed, we have , and the variance estimate in Equation 14 can be simplified to
(Equation 16) |
Replacing in ZGOF1 with , we obtain a new GOF test statistic, ZGOF2, by which and a standard normal as the null distribution, we can calculate a p value.
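The variance calculations behind the two tests can be sketched as below. The general formula for the variance of a sample variance and its simplification under normality are standard results. The moment estimators, the form of the test statistic (DP-based variance minus a model-based variance, divided by the estimated standard deviation of the sample variance), and the two-sided p value are assumptions made for this sketch and may differ from Equations 13 to 16.

```python
import math
import numpy as np

def var_of_sample_variance(mu4, sigma2, T):
    """Variance of the sample variance of T i.i.d. draws (general result)."""
    return (mu4 - (T - 3) / (T - 1) * sigma2**2) / T

def var_of_sample_variance_normal(sigma2, T):
    """Simplification when the draws are normal, so that mu4 = 3 * sigma2^2."""
    return 2.0 * sigma2**2 / (T - 1)

def gof_z(theta_dp, model_var, assume_normal=False):
    """Z statistic comparing the DP-based and model-based variances (sketch)."""
    x = np.asarray(theta_dp, dtype=float)
    T = x.size
    s2 = x.var(ddof=1)                       # DP-based variance estimate
    if assume_normal:                        # analogue of the second test
        v = var_of_sample_variance_normal(s2, T)
    else:                                    # analogue of the first test
        mu4 = np.mean((x - x.mean()) ** 4)   # plug-in central fourth moment
        v = var_of_sample_variance(mu4, s2, T)
    z = (s2 - model_var) / math.sqrt(v)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # assumed two-sided normal p value
    return z, p

# Toy usage: when the two variance estimates coincide, z = 0 and p = 1.
draws = np.linspace(0.1, 0.3, 50)
s2_draws = draws.var(ddof=1)
z1, p1 = gof_z(draws, s2_draws)
z2, p2 = gof_z(draws, s2_draws, assume_normal=True)
```

The normal-theory version avoids estimating the fourth moment, which is noisy for small T, at the cost of relying on normality of the perturbed estimates.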
Other methods
We compared cML-MA with other existing two-sample MR methods, including MR-ContMix, MR-Mix, MR-CAUSE, MR-Lasso, MR-PRESSO, MR-IVW (random-effect [RE] meta-analysis), MR-Egger regression, MR-Weighted-Median, MR-Weighted-Mode, and MR-RAPS (the Robust Adjusted Profile Score) methods. We applied MR-RAPS with four different combinations of its parameters: for MR-RAPS1 and MR-RAPS2, we set the over-dispersion parameter as TRUE, and used the L2 loss and the Tukey loss, respectively; for MR-RAPS3 and MR-RAPS4, we set the over-dispersion as FALSE and used the L2 and the Tukey loss, respectively; we present RAPS2 to represent RAPS in the main text. We also applied the Oracle MR-IVW to only valid IVs in the simulations, called MR-IVW-Oracle.
Each method takes GWAS summary data of as input and gives an estimate of θ, say , along with its standard error SE.
GWAS data
Primary real data examples
We applied various methods to some large-scale GWAS summary data. Following Morrison et al.,13 we studied possible causal effects of 12 risk factors on 4 complex diseases: coronary artery disease (CAD [MIM: 608320]), stroke (MIM: 601367), type 2 diabetes (T2D [MIM: 125853]), and asthma (MIM: 600807) (mostly as a negative control). These 12 cardio-metabolic risk factors are triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), drinks per week (alcohol), ever regular smoker (smoke), body fat percentage (BF), birth weight (BW), body mass index (BMI), height (MIM: 606255), fasting glucose (FG), systolic blood pressure (SBP), and diastolic blood pressure (DBP).
For each risk factor/exposure-disease/outcome pair, we used the set of LD-independent SNPs as IVs as described in Morrison et al.13 (in their Table S4) and applied all methods to the GWAS summary statistics of these SNPs.
Secondary real data examples
If two traits are not genetically correlated, it is unlikely that they are causally related. As suggested by a reviewer, from LD Hub,24 we collected 63 trait pairs without significant genetic correlations (i.e., p value > 0.05) as negative controls to study the type I error properties of the methods. These 63 pairs involve 13 traits in total: fasting proinsulin (FP), height, homeostasis model assessment of beta-cell function (HOMA), LDL, rheumatoid arthritis (RA [MIM: 180300]), schizophrenia (SCZ [MIM: 181510]), T2D, age at smoking (ASmk), anorexia nervosa (AN [MIM: 606788]), childhood IQ (CIQ), ever/never smoked (ESmk), former current smoker (FSmk), and infant head circumference (IHC). For each pair, we used R package TwoSampleMR to select LD-independent SNPs as IVs and extract their summary statistics following the standard procedures; sample R code is available in the supplemental material and methods. The GWAS summary data in LD Hub and TwoSampleMR are the same for 12 traits except for height. For height, LD Hub contains the GWAS data of the GIANT consortium from year 2010,25 while TwoSampleMR uses the GIANT data from year 2014.26 Details of the GWAS data used are in the supplemental material and methods.
Simulation set-ups
Main simulations
We compared different methods through extensive simulations. The simulation set-ups were similar to those in Burgess et al.9 We set the number of the SNPs/IVs m = 10, 20, or 100, and sample size n = 50,000, 100,000, or 200,000. For each SNP , we generated ’s from a uniform distribution on ; its MAF fi from a uniform distribution Unif(0.1,0.3), then its genotypes SNPij from a binomial Bin(2, fi) for . For each m, we tried different proportions q = 0%, 20%, 40%, 60% of invalid IVs: for each SNP , we generated its direct effect size αi from Uniform(0.2,0.3) and set (when the InSIDE assumption was satisfied) or generated from Uniform(−0.1,0.1) (when InSIDE was violated). We set , generated the random errors , , and independently from N(0,1). Then we generated U, X, and Y from the causal model (Equation 17):
(Equation 17) |
We generated two independent samples, each of size n1 = n2 = n, and used the first sample to fit marginal linear regressions of X on SNPs and the second sample to fit marginal linear regressions of Y on SNPs, thus obtaining the GWAS summary statistics. We tried different . When it was the null case; i.e., X had no causal effect on Y.
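The step from individual-level data to two-sample summary statistics can be sketched as follows. This is a deliberately simplified, hypothetical setup: all IVs are valid, there is no confounder, the SNP-exposure effect (0.1), the causal effect (0.2), and the small sample size are made-up values chosen for speed, and only the MAF and genotype distributions follow the description above. It does not reproduce Equation 17.

```python
import numpy as np

def marginal_summary_stats(G, trait):
    """Per-SNP simple linear regression (with intercept) of a trait on genotype."""
    Gc = G - G.mean(axis=0)
    yc = trait - trait.mean()
    sxx = np.sum(Gc**2, axis=0)
    beta = Gc.T @ yc / sxx
    resid_ss = np.sum(yc**2) - beta**2 * sxx      # residual sum of squares per SNP
    se = np.sqrt(resid_ss / (G.shape[0] - 2) / sxx)
    return beta, se

rng = np.random.default_rng(1)
m, n = 10, 2000                                   # toy sizes, far smaller than in the study
maf = rng.uniform(0.1, 0.3, size=m)               # MAF from Unif(0.1, 0.3)
gamma = np.full(m, 0.1)                           # hypothetical SNP-exposure effects
theta = 0.2                                       # hypothetical causal effect

def simulate_sample():
    G = rng.binomial(2, maf, size=(n, m)).astype(float)   # genotypes from Bin(2, f_i)
    X = G @ gamma + rng.normal(size=n)
    Y = theta * X + rng.normal(size=n)
    return G, X, Y

# Two independent samples: the first gives SNP-exposure statistics,
# the second gives SNP-outcome statistics.
G1, X1, _ = simulate_sample()
G2, _, Y2 = simulate_sample()
bx_hat, se_x = marginal_summary_stats(G1, X1)
by_hat, se_y = marginal_summary_stats(G2, Y2)
```

The resulting `bx_hat`, `se_x`, `by_hat`, and `se_y` arrays have the same form as the GWAS summary statistics that every method in the comparison takes as input.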
Secondary simulations
We did simulations to compare various methods with CAUSE. We generated the simulated data in the framework of CAUSE as described in the original CAUSE paper.13 We set the sample size n1 = n2 = n = 50,000 or 100,000. Denote the direct effects of SNP i on the exposure as and on the outcome Y as αi. For the expected number of SNPs with non-zero associations with the exposure and the outcome (i.e., and ), denoted by mX and , we set or 100. Denote the true causal effect size from the exposure to the outcome as θ. We set for the null case and for the non null case. In each simulation, we generated p = 100,000 independent SNPs with MAF fi independently drawn from Uniform(0.1,0.3). In CAUSE, it is assumed that the hidden confounder U is standardized with ; we set the effect size from U to Y at . We set the proportion of invalid IVs with correlated pleiotropic effects at q = 0.3. We set the heritabilities of X and Y at . Then we generated the standardized effect sizes and from a mixture of bivariate normal distribution:
(Equation 18) |
Here , and ; , and , . The non-standardized effect sizes were , and . We generated the indicator of an invalid IV, Zi, from Bernoulli(q), and . With the standard error , the GWAS summary statistics were generated as , .
For each combination of we did simulations 200 times (while the simulations in Morrison et al.13 were repeated only 100 times due to longer running time of CAUSE). For CAUSE, we used all p SNPs to estimate the parameters, and used its default p value threshold 0.001 to select the SNPs associated with the exposure. For other MR methods, we used the usual p value threshold 5 × 10−8 to select the exposure-associated SNPs as IVs. For comparison, we also applied the p value threshold 5 × 10−8 (instead of its default threshold 0.001) to select the SNPs for CAUSE.
Simulations with weak invalid IVs
As suggested by a reviewer, we did more simulations with many invalid IVs with weak effects (so-called “weak invalid IVs”), representing a scenario more challenging to identify invalid IVs with only weak effects. We set the number of IVs m = 50, sample size n = 20,000, and ’s from N(0,hx / m) for . We had the first 60% IVs as invalid IVs with uncorrelated pleiotropic effects αi’s from N(0,hy / m) and correlated pleiotropic effects ’s from N(0,hu / m) for , and set for . Then we set and generated , , where θ was the true causal effect. We set hx = 0.5, and tried different hy = 0.1,0.2,0.4,0.6, different hu = 0,0.1, and different . Here hx, hy, and hu could be viewed as the heritability of the exposure, outcome, and confounder due to direct effects of the IVs. Note that for hu = 0 there was no correlated pleiotropy, while for hu = 0.1 there was. The smaller hy, the weaker the direct/pleiotropic effects and thus more difficult to identify the invalid IVs.
Results
Simulations: Better type I error control and higher power of the new method than other MR methods
We compared our proposed method with ten of the most popular and recent MR methods, as shown in Table 1, through extensive simulations: MR-Mix, MR-ContMix, MR-CAUSE, MR-Lasso, MR-PRESSO, MR-Weighted-Median, MR-Weighted-Mode, MR-Egger, MR-RAPS, and MR-IVW (with a random-effect model throughout this paper). For evaluation, we also added MR-IVW-Oracle, an ideal but impractical method with all valid IVs known and being used, giving the best possible performance. Since CAUSE requires full GWAS summary data (with both trait-associated and non-associated SNPs) and has a much longer running time, we divided the simulations into two parts. For the main simulations, following Burgess et al.9 and Slob and Burgess,10 we only generated summary statistics for exposure-associated SNPs, and compared cML-MA with the other nine methods, excluding CAUSE. For the secondary simulations, we simulated both exposure-associated and non-associated SNPs as required by CAUSE13 and compared cML-MA with all 10 other methods.
Main simulations: Comparison with major MR methods
We did extensive simulations with the true causal model shown in Figure 1, including n = 50,000, 100,000, or 200,000 subjects in each GWAS dataset, using m = 10, 20, or 100 SNPs as IVs, among which 0% to 60% were invalid IVs with IV assumptions A2 or/and A3 being violated. For each setup we did 1,000 simulations to compare the proposed cML methods and other existing MR methods, and for the 10 setups shown in Figure 2, we also increased the number of simulations to 10,000 to better estimate the type I errors. For the 9 setups with 60% invalid IVs and with both IV assumptions A2 and A3 being violated, we also applied cML-BIC-DP and cML-MA-BIC-DP. We used the nominal significance level of 0.05. Here we only show some representative results while all others are available in the supplemental material and methods.
Figure 2 shows the empirical type I errors (at the nominal level 0.05). First, in the cases with all valid IVs, the methods generally performed well, though MR-Weighted-Mode, MR-Weighted-Median, and MR-Mix might be too conservative. On the other hand, MR-ContMix could have an inflated type I error rate, perhaps due to its inappropriately pre-selected tuning parameter value. Second, in the presence of 20% or 60% invalid IVs with IV assumption A2 violated but assumption A3 (thus the InSIDE assumption) holding, MR-PRESSO, MR-Lasso, and MR-IVW could have inflated type I error rates. It is noted that, though the InSIDE assumption held, MR-Egger could have a slightly inflated type I error rate for small m = 10 but not for large m = 100. MR-Weighted-Mode gave the most highly inflated type I error rate with the large proportion (60%) of invalid IVs and with the small number of SNPs (m = 10). Third, in the cases with both IV assumptions A2 and A3 violated, MR-IVW, MR-Egger, and MR-PRESSO all had inflated type I error rates, while MR-Weighted-Mode and MR-Weighted-Median had largely inflated type I error rates with 60% invalid IVs, and so did MR-Lasso with 60% invalid IVs and with only m = 10 SNPs. In summary, as expected, MR-IVW was problematic in the presence of invalid IVs, MR-Egger did not perform well if IV assumption A3 was violated, and MR-PRESSO often had inflated type I errors; on the other hand, in agreement with Slob and Burgess,10 MR-Lasso, MR-Weighted-Median, and MR-Weighted-Mode did not perform well with a small number of SNPs and with a high proportion of invalid IVs. We conclude that only MR-cML-BIC and MR-Mix could control the type I error rates across all the scenarios, though MR-Mix was often too conservative (with too small type I errors), especially for a large number of SNPs/IVs. The results for type I errors based on 10,000, instead of 1,000, simulations, as shown in Figure S6, were essentially the same.
Figure 3 shows the empirical type I error (for ) and power (for ) curves. It is confirmed that our proposed method cML-MA-BIC always yielded a power curve close to that of MR-IVW-Oracle, the ideal test based on using only valid IVs. In particular, cML-MA-BIC was more powerful than MR-Mix and other methods (when their type I errors were close to the nominal level).
Figures 4 and 5 show the distributions of the causal estimates by each method for the true causal effect sizes and 0.1, respectively. Again it is confirmed that the distribution of the θ estimates from our proposed cML-MA-BIC was almost the same as that from the ideal MR-IVW-Oracle. In particular, cML-MA-BIC, MR-ContMix, and MR-Lasso (and MR-IVW-Oracle) always yielded (almost) unbiased estimates with smaller variances, while other methods sometimes gave biased estimates (and/or with much larger variances). In particular, as shown in Figure 5, MR-Mix was slightly biased (toward 0) for m = 100.
For the 9 setups with 60% invalid IVs and with both IV assumptions A2 and A3 being violated, we applied cML-BIC-DP and cML-MA-BIC-DP with various numbers of perturbations, T = 100, 200, and 500. The complete results are shown in supplemental material and methods section S5.4. cML-BIC-DP and cML-MA-BIC-DP yielded results similar to those of cML-BIC and cML-MA-BIC, respectively, in terms of both point estimation and statistical inference, and both GOF tests performed similarly, rejecting the null hypothesis (of equal variance estimates from the model- and DP-based approaches) with low frequencies.
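The general idea of a data-perturbation (DP) variance estimate can be sketched as follows, under stated assumptions: each GWAS summary estimate is perturbed by independent normal noise with standard deviation equal to its reported standard error, the estimator is re-applied T times, and the spread of the T estimates is used as the DP-based standard error. A fixed-effect IVW estimator stands in for cML here, and the toy summary statistics and function names are hypothetical; the actual cML-DP procedure is specified in the material and methods section.

```python
import math
import random
import statistics


def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW estimate and its model-based standard error."""
    precision = sum((x / s) ** 2 for x, s in zip(bx, se_y))
    theta_hat = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y)) / precision
    return theta_hat, math.sqrt(1.0 / precision)


def dp_standard_error(bx, by, se_x, se_y, n_perturb=200, seed=0):
    """Mean and standard deviation of the estimates over perturbed data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_perturb):
        # Perturb every summary estimate with noise scaled by its own SE.
        bx_t = [b + rng.gauss(0.0, s) for b, s in zip(bx, se_x)]
        by_t = [b + rng.gauss(0.0, s) for b, s in zip(by, se_y)]
        estimates.append(ivw_estimate(bx_t, by_t, se_y)[0])
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical toy summary statistics for five SNPs (not from the study).
    bx = [0.10, 0.12, 0.08, 0.15, 0.11]
    by = [0.021, 0.025, 0.015, 0.031, 0.020]
    se_x = [0.01] * 5
    se_y = [0.01] * 5
    theta_hat, model_se = ivw_estimate(bx, by, se_y)
    dp_mean, dp_se = dp_standard_error(bx, by, se_x, se_y, n_perturb=200)
    print(f"estimate = {theta_hat:.4f}, model-based SE = {model_se:.4f}")
    print(f"DP mean  = {dp_mean:.4f}, DP-based SE    = {dp_se:.4f}")
```

When the model-based and DP-based standard errors agree closely, as the GOF tests are designed to check, the cheaper model-based inference can be used.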
In summary, among all the methods, cML-MA-BIC, MR-Mix, and MR-ContMix performed best across all the scenarios; among these three, cML-MA-BIC was the clear winner for its higher power while better controlling the type I error rate.
Secondary simulations: Comparison with CAUSE
We did simulations in the framework of MR-CAUSE as described in Morrison et al.13 Figure 6 shows the empirical type I error (for θ = 0) and power (for θ ≠ 0) for the methods with m = 10 or 100 exposure-associated SNPs and sample size n = 50,000 or 100,000. It is clear that our proposed method cML-MA-BIC could control the type I error well with high power. In contrast, CAUSE could have largely inflated type I error rates and much lower power than cML-MA-BIC. Here the results for CAUSE were based on using its default p value threshold of 0.001 to select exposure-associated SNPs; as shown in the supplemental material and methods, using the threshold 5 × 10⁻⁸ (as for other methods shown here) did not give better results for CAUSE. The poor performance of CAUSE here is in agreement with that shown in the original CAUSE paper (Morrison et al.,13 Figure SN1): when it was highly powered to detect SNP-exposure and SNP-outcome associations, CAUSE tended to give dramatically inflated false positive rates. In addition, compared to cML-MA-BIC, both MR-Mix and MR-ContMix had more inflated type I error rates for small m = 10; on the other hand, MR-Mix was too conservative, with too small type I error rates and lower power, for m = 100 with the small sample size n = 50,000.
Simulations with weak invalid IVs
We did 1,000 simulations for each setup with a small sample size and many invalid IVs having weak direct/pleiotropic effects. Figure 7A shows some representative results for the empirical type I error (for θ = 0) and power (for θ ≠ 0) curves for hy = 0.2 and hu = 0 (i.e., no correlated pleiotropy); the complete results are given in the supplemental material and methods. It is clear that under this challenging situation, in addition to the ideal MR-IVW-Oracle, only three methods could control the type I error satisfactorily (while all others could not): cML-MA-BIC-DP, MR-IVW, and MR-Egger, but cML-MA-BIC-DP was much more powerful than the other two. It is noted that here the (weak) direct effects were balanced (i.e., with mean 0) and drawn from a normal distribution, which explains the relatively good performance of MR-IVW and MR-Egger. Nevertheless, as shown in Tables S107–S110, as the pleiotropic effect sizes (i.e., hy) increased, the power of cML-MA-BIC-DP improved (by better identifying invalid IVs), but, perhaps surprisingly, both MR-IVW and MR-Egger became less powerful (because of the increasing error variances in their models by treating the pleiotropic effects as random).
In the presence of correlated pleiotropy with hu = 0.1, as shown in Tables S111–S114, only our proposed cML-MA-BIC-DP could satisfactorily control the type I error and was well powered, while all other methods, including MR-IVW, MR-Egger, and MR-RAPS, yielded inflated type I errors and possibly low power.
Figure 7B shows the relative frequencies with which the goodness-of-fit (GOF) tests rejected the null hypothesis that the model-based variance was equal to the DP-based variance for cML. The proposed GOF tests could detect with high power the problem with cML-MA-BIC; the two GOF tests performed similarly, though GOF2 was slightly more powerful (presumably due to its taking advantage of the causal estimates being nearly normally distributed). In addition, as hy decreased, it became harder to identify invalid IVs, leading to more inflated type I error rates by most methods, including cML-MA-BIC; accordingly, the two GOF tests rejected the null hypothesis more frequently, demonstrating their effectiveness.
Computational time
We did simulations to compare the running times of different methods as detailed in supplemental material and methods section S8. In summary, cML-MA-BIC runs reasonably fast: its computing time was comparable to that of MR-ContMix and MR-RAPS, while being faster than MR-Mix and MR-Weighted-Mode but slower than MR-IVW, MR-Egger, and MR-Weighted-Median. As expected, using more random starting points or data perturbation would take much more time. Nevertheless, on a MacBook Pro laptop, with 10 to 100 SNPs/IVs, it took from a few seconds to less than 10 minutes with cML-MA-BIC-DP with five random starts and T = 200 data perturbations; in contrast, cML-MA-BIC with five random starts ran from 0.3 to 4 seconds.
Identifying causal risk factors of complex diseases
We compare our proposed cML with other methods to identify possible causal effects of 12 risk factors on three cardio-metabolic diseases—coronary artery disease (CAD),27 stroke,28 and type 2 diabetes (T2D)29—plus asthma largely used as a negative control.30 These 12 risk factors (and their corresponding GWASs) are LDL cholesterol, HDL cholesterol, triglycerides (TG),31 drinks per week (alcohol), ever regular smoker (smoke),32 body fat percentage (BF),33 birth weight (BW),34 body mass index (BMI),35 height,26 fasting glucose (FG),36 systolic blood pressure (SBP), and diastolic blood pressure (DBP).37 As used and shown in Morrison et al.,13 the sample sizes of these GWASs ranged from 46,186 for FG and 69,033 for T2D, to 100,716 for BF, 142,486 for asthma, 188,577 for TG, HDL, and LDL, 253,288 for height, 322,154 for BMI, then 446,696 for stroke, 547,261 for CAD, 757,601 for DBP and SBP, finally to near and above a million for alcohol and smoke, respectively. For each risk factor/exposure-disease/outcome pair, we used the set of LD-independent SNPs as IVs as described in Morrison et al.13 (in their Table S4) and applied all methods except CAUSE to the GWAS summary statistics of these SNPs; for CAUSE, we extracted the results from the original paper.13
In Morrison et al.,13 the 48 exposure-outcome pairs were classified into 5 categories: considered causal (9 pairs), likely causal as supported by the literature (10 pairs), correlated but unknown to be causal or with conflicting evidence (17 pairs), unrelated (10 pairs), and considered non-causal (2 pairs); here we combined the first two categories into one to represent (likely) causal pairs. In Figure 8 we compare cML-MA-BIC with three representative methods for all 48 risk factor-disease pairs: CAUSE, a new method specifically proposed to deal with correlated pleiotropy, and two other MR methods, one robust and competitive (MR-Mix) and the other perhaps the most popular (MR-IVW). Figure 9 compares the numbers of pairs detected by these and other methods; other detailed results (the causal parameter estimates, SEs, and p values) for all methods are available in the supplemental material and methods. Here we discuss the results based on the Bonferroni-adjusted significance level of 0.05/48. For the 19 known or likely causal risk factor-disease pairs, cML-MA-BIC, CAUSE, MR-Mix, and MR-IVW identified 15, 7, 12, and 12 significant pairs, respectively; among the 17 correlated pairs, the four methods detected 6, 0, 4, and 1 pairs, respectively; among the 10 unrelated pairs, none of the methods identified any; and for the two non-causal pairs, all four methods flagged the same one (i.e., HDL-CAD, for which it is still under debate whether the relationship is truly causal). In addition, although none of the methods detected the causal pairs smoke-asthma, smoke-T2D, and BMI-stroke, our method was the only one among the four giving marginally significant p values. It is clear that our proposed cML-MA-BIC identified the largest number of the known or likely causal pairs, showcasing its highest power. On the other hand, it also detected more pairs from the category of “correlated” pairs.
It is possible that these pairs, such as BW-CAD, BW-T2D, and DBP/SBP-T2D, are false positives, but they may also be truly causal, which remains to be confirmed (or refuted) by further studies. As shown in Figure 10, based on the data alone, there seems to be evidence to support these causal relationships as detected by our cML-MA-BIC and many other tests (shown in the supplemental material and methods). It is noted that CAUSE also gave marginally significant p values < 0.05 for three of these six pairs.
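The multiple-testing arithmetic used in this comparison can be made explicit with a short sketch. It assumes a family-wise level of 0.05 divided equally over the 48 pairs, and it treats a p value below 0.05 but above the Bonferroni threshold as "marginally significant"; the function name `classify` and the example p values are hypothetical.

```python
# Category sizes of the 48 exposure-outcome pairs, as listed in the text.
CATEGORY_SIZES = {
    "considered causal": 9,
    "likely causal": 10,
    "correlated": 17,
    "unrelated": 10,
    "considered non-causal": 2,
}

N_PAIRS = sum(CATEGORY_SIZES.values())
FAMILY_ALPHA = 0.05  # assumed family-wise significance level
BONFERRONI_THRESHOLD = FAMILY_ALPHA / N_PAIRS


def classify(p_value, family_alpha=FAMILY_ALPHA, n_tests=N_PAIRS):
    """Label a p value relative to the Bonferroni and nominal levels."""
    if p_value < family_alpha / n_tests:
        return "significant"
    if p_value < family_alpha:
        return "marginally significant"
    return "not significant"


if __name__ == "__main__":
    print(f"{N_PAIRS} pairs, per-test threshold = {BONFERRONI_THRESHOLD:.2e}")
    for p in (0.0005, 0.01, 0.2):  # hypothetical p values
        print(p, "->", classify(p))
```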
Among the tens to about 1,000 SNPs used as IVs for the 48 risk factor-disease pairs, 0 to 96 SNPs, mostly <10, ranging from 0% to 30%, mostly <3%, were identified as invalid IVs by our method (Table S5). However, as for BW-CAD and BW-T2D in Figure 10, although only five (out of 65 and 54) SNPs were identified as invalid IVs based on the BIC-selected best models, the models containing up to K = 20 SNPs as invalid IVs were estimated to be better (with lower BIC values) than the model treating all SNPs as valid IVs (i.e., K = 0). In general there were fewer invalid IVs for asthma, stroke, and T2D, but more for CAD. It can be seen that the influence of invalid IVs is somewhat complex; the difference between the two causal estimates with and without invalid IVs may or may not simply depend on the presence or the number of invalid IVs, but more on the configuration of invalid IVs relative to that of valid ones. For example, with the same number of invalid IVs detected for the BW-CAD and BW-T2D pairs, the causal estimates with and without the detected invalid IVs were almost the same for the former pair, but more different for the latter (Figure 10).
We then applied cML-MA-BIC-DP with T = 200 perturbations and 10 additional random starts in each perturbation. Compared to those of cML-MA-BIC, the results remained the same for 46 pairs in terms of statistical significance, but changed from being significant to marginally significant for only two pairs: one known or likely causal pair, BF-CAD; and one correlated pair, BW-T2D.
In summary, it is encouraging that most methods detected similar numbers of significant pairs, though MR-Egger and CAUSE detected far fewer, as shown in Figure 9. In addition, there were some notable differences in the specific pairs detected across the methods. For example, while our methods detected the causal pairs BF-T2D and FG-T2D, MR-IVW missed both and MR-RAPS missed BF-T2D. We conclude that our proposed methods performed competitively.
Secondary real data analysis
We compared the type I errors of the cML methods and other existing MR methods with 63 pairs of traits that were not genetically correlated (with their p values greater than 0.05). For 10 of the 63 pairs with HOMA as the exposure, TwoSampleMR gave only 2 LD-independent SNPs as IVs, which were too few to apply MR-ContMix, MR-Lasso, MR-Egger, MR-Weighted-Median, MR-Weighted-Mode, and MR-PRESSO; although cML methods are applicable with only two IVs, they would require K = 0, i.e., no invalid IVs. Hence we applied all methods to the other 53 pairs (without HOMA as the exposure). Figure 11 shows the Q-Q plots of cML-MA-BIC, cML-MA-BIC-DP, MR-Mix, MR-ContMix, MR-IVW, and MR-RAPS for these 53 pairs; Figure S5 shows the results for all methods. While the methods based on selection of invalid IVs, i.e., cML-MA-BIC, MR-Mix, and MR-ContMix, all seemed to have inflated type I errors, the proposed cML-MA-BIC-DP with T = 200 perturbations and 10 additional random starts in each perturbation, along with MR-IVW and MR-RAPS, appeared to control the type I errors satisfactorily. The complete results are in the supplemental material and methods.
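For readers less familiar with Q-Q plots of p values, the sketch below shows one common way to compute the plotted coordinates: the observed p values are sorted and paired with the expected quantiles of a uniform distribution, both on the −log10 scale. The plotting position (i − 0.5)/n is an assumed convention, and the function name `qq_points` and the example p values are hypothetical; this is not the plotting code used for Figure 11.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p value pairs for a Q-Q plot."""
    ordered = sorted(p_values)  # smallest (most significant) p value first
    n = len(ordered)
    points = []
    for i, p in enumerate(ordered, start=1):
        expected = -math.log10((i - 0.5) / n)  # uniform quantile under H0
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


if __name__ == "__main__":
    # Hypothetical p values; under the null the points fall near the diagonal.
    for expected, observed in qq_points([0.5, 0.01, 0.2]):
        print(f"expected = {expected:.3f}, observed = {observed:.3f}")
```

Points lying systematically above the diagonal indicate p values smaller than expected under the null, i.e., inflated type I errors.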
Discussion
We have proposed several methods based on constrained maximum likelihood (cML) to consistently identify invalid IVs with either or both of correlated and uncorrelated pleiotropic effects, thus leading to consistent estimation and inference of the causal effect between an exposure and an outcome. For finite samples, the (asymptotic) selection consistency may not be achieved. To account for model selection uncertainty, we first propose a model-averaging approach, cML-MA-BIC, which performs better, especially in better controlling the type I error rate, than the selection-based version, cML-BIC. In addition, in more challenging situations with many invalid IVs with only weak pleiotropic effects, both model selection and model averaging may not perform well by failing to fully account for model selection uncertainty; accordingly, we propose a version based on data perturbation, cML-MA-BIC-DP, which could control the type I error rate satisfactorily across all simulated and real data examples. However, cML-MA-BIC-DP is computationally more demanding and may be conservative with some loss of power as compared to cML-MA-BIC. To help a user determine which one is preferred, we propose two GOF tests for the null hypothesis that the two approaches give the equivalent variance estimates (and thus inferential results); if the null hypothesis is not rejected, one can simply apply cML-MA-BIC; otherwise, cML-MA-BIC-DP is preferred. All the proposed methods are applicable to GWAS summary data.
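To make the model-averaging step concrete, the sketch below combines per-model estimates using BIC-based weights and the standard-error formula of Buckland et al.21 It assumes weights proportional to exp(−BIC/2) over the candidate models (here indexed by the number K of invalid IVs); the exact definition used by cML-MA-BIC is given in the material and methods section and may differ in detail. The function names and the numerical inputs are hypothetical.

```python
import math


def bic_weights(bics):
    """Normalized weights proportional to exp(-BIC / 2)."""
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, std_errors, bics):
    """Model-averaged estimate and its standard error (Buckland et al. style)."""
    weights = bic_weights(bics)
    theta_ma = sum(w * t for w, t in zip(weights, estimates))
    # Each model contributes its own variance plus its squared deviation
    # from the averaged estimate, which reflects model selection uncertainty.
    se_ma = sum(
        w * math.sqrt(s ** 2 + (t - theta_ma) ** 2)
        for w, t, s in zip(weights, estimates, std_errors)
    )
    return theta_ma, se_ma


if __name__ == "__main__":
    # Hypothetical candidate models, e.g., K = 0, 1, 2 invalid IVs.
    estimates = [0.20, 0.22, 0.35]
    std_errors = [0.03, 0.03, 0.05]
    bics = [100.0, 101.0, 110.0]
    theta_ma, se_ma = model_average(estimates, std_errors, bics)
    print(f"weights = {[round(w, 4) for w in bic_weights(bics)]}")
    print(f"model-averaged estimate = {theta_ma:.4f}, SE = {se_ma:.4f}")
```

Because the averaged standard error is never smaller than the smallest per-model standard error, inference based on it tends to be more conservative than inference from a single selected model.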
Three new competitors to our methods are CAUSE,13 MR-Mix,12 and MR-ContMix.9 As shown in our simulations and in agreement with the original study,13 CAUSE may have dramatically inflated type I error rates, presumably due to its complex modeling and estimation of the parameters related to hidden/unobserved confounding. Furthermore, CAUSE imposes the assumption that an IV can only have a direct effect on either the exposure or the hidden confounder, but not on both simultaneously, though it is as yet unknown and debatable whether this assumption is reasonable for real data. It seems advantageous that our proposed methods simply estimate only a small number of necessary parameters (without such an assumption). Both MR-Mix and MR-ContMix are based on multivariate normal mixture models on various effect sizes across the genome, which not only impose stronger modeling assumptions but also are computationally more demanding. In our experiments, overall, MR-Mix performed second only to our proposed methods, with mostly controlled type I error rates and high power, though it might still have either largely inflated or too conservative type I errors while giving biased estimates (e.g., Figures 5 and 6). On the other hand, it is challenging to pre-select a fixed tuning parameter in MR-ContMix, which may negatively influence its performance in some situations as shown in our simulations. At the same time, it is confirmed that the two most popular methods, MR-IVW and MR-Egger, do not perform well, with dramatically inflated type I error rates and low power in the presence of correlated pleiotropy. Most importantly, we conclude that our proposed cML-MA-BIC (or cML-MA-BIC-DP) was the overall winner based on our extensive numerical studies.
MR-PRESSO and MR-Lasso are the two existing methods most similar to our proposed methods. For our cML estimates, if an SNP has a nonzero estimated total direct effect, then it is an invalid IV and does not contribute to estimating the causal effect θ (as shown in the material and methods section); otherwise it is a valid IV and contributes to estimating θ. Hence, our methods work by selecting and (implicitly) removing invalid IVs, in which sense they are related to MR-PRESSO and MR-Lasso (and an improved variant of MR-PRESSO38). However, there are some important differences. First, we propose a BIC for consistent model selection (and weighting), while MR-PRESSO uses resampling-based significance testing and MR-Lasso is based on a heuristic heterogeneity criterion. We have a rigorous theory to support our proposed method. Second, both MR-PRESSO and MR-Lasso draw inference on the causal effect θ based on a single selected model; due to selection bias, they often have inflated type I error rates and in general biased estimates of θ. In contrast, by accounting for model selection uncertainties through model averaging, our method cML-MA-BIC performs much better, as shown in simulations. Third, since MR-PRESSO selects invalid IVs one by one while ours and MR-Lasso select multiple ones simultaneously, MR-PRESSO may miss some invalid IVs (e.g., it is well known in statistics that two invalid IVs/outliers may not appear so if checked one by one). In addition, MR-PRESSO fails to properly account for the variability of the delete-1 (IV) estimates of the causal effect while assuming that the delete-1 estimates are all accurate, which may be false, leading to both false positives and false negatives in selecting invalid IVs. On the other hand, MR-Lasso depends on a specified candidate set of the tuning parameter values for the Lasso penalty, which may be difficult to specify a priori.
There is also a lack of theoretical justification for its heterogeneity-based model selection criterion. Furthermore, in general, like most penalized methods, MR-Lasso yields biased estimates due to the shrinkage effects of the Lasso penalty. Finally, MR-Lasso, like MR-IVW and MR-Egger, imposes the NOME (no measurement errors) assumption by ignoring the variability in estimating each IV-exposure association. Presumably for these reasons, our proposed methods performed much better than the other two methods in our numerical examples.
Our proposed cML-MA-BIC not only performs extremely well in our numerical examples, but also is quite simple and intuitive with strong theoretical support. In fact, it may be surprising that such a method has not appeared in the MR literature. We note that our methods are based on classic statistical theory (for “large n, small m,” i.e., asymptotics for a large sample size and a fixed/small number of parameters), which is suitable for typical GWASs (with n in the tens to hundreds of thousands, while m is no more than a few hundred). It might be of interest to extend our methods to a high-dimensional “large n, large m” scenario with a much larger number of IVs: instead of using the full likelihood, we can use the profile likelihood6 as shown in the supplemental material and methods; we will also need to adopt or develop some new model selection criteria for high-dimensional data.39,40,41 There are other limitations of our proposed methods. First, we propose a fast algorithm to select valid (or equivalently, invalid) IVs to obtain cML estimates. Since it is a combinatorial and non-convex variable selection problem, the proposed algorithm cannot guarantee finding a global solution. Nevertheless, in our simulations it yielded good results with only one starting value (by setting all parameters at 0). A simple strategy is to use multiple random starting values, as was done in the real data examples, for which little difference was found. In the future, other more sophisticated algorithms42,43 may be adapted and applied. Second, we assume that the two GWAS (summary) datasets for the exposure and outcome are independent. As in CAUSE, we may estimate and model possible correlations between the two GWAS datasets due to overlapping subjects, and then modify the log-likelihood accordingly.
More generally, it is critical to adjust for possible sample structure, including population stratification and subject relatedness, present in some GWAS data.44,45 Our methods, cML-MA-BIC and especially cML-MA-BIC-DP, appeared to work in some preliminary simulations (not shown) with population stratification as the hidden confounding factor U. Third, as in typical MR applications, we used the same GWAS sample to select significant SNPs as IVs (to meet IV assumption A1) before using the same data for inference. It is known that this double use of the data could lead to biased inference due to selection bias. Alternatively, we may use an independent GWAS sample to select SNPs to avoid selection bias, as in the three-sample MR design,46 or account for selection explicitly.44,47 Fourth, we considered only independent SNPs as IVs; extending to using correlated SNPs as IVs may be useful in other applications, e.g., transcriptome-wide association studies,48–54 which equally face the analysis challenges with pleiotropic SNPs and thus invalid IVs.55 These are interesting topics for future investigation.
Declaration of interests
The authors declare no competing interests.
Acknowledgments
We thank the two reviewers for many helpful and insightful comments, leading to much improvement. This research was supported by NIH grants R01 AG069895, RF1 AG067924, R01 AG065636, R01 HL116720, R01 GM113250, and R01 GM126002, by NSF grant DMS 1711226, and by the Minnesota Supercomputing Institute at the University of Minnesota.
Published: July 1, 2021
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2021.05.014.
Data and code availability
This study used the GWAS summary datasets that are all publicly available as indicated in their corresponding references. The proposed methods are implemented in R package MRcML, which is publicly available to download on GitHub at https://github.com/xue-hr/MRcML. All other MR methods used for comparison are in publicly available R packages with links given in the web resources section.
Web resources
MR-ContMix, https://cran.r-project.org/web/packages/MendelianRandomization
MR-IVW, MR-Egger, MR-Weighted-Median, MR-Weighted-Mode, MR-RAPS, https://github.com/MRCIEU/TwoSampleMR
MR-Lasso, https://onlinelibrary.wiley.com/doi/full/10.1002/gepi.22295
MR-Mix, https://github.com/gqi/MRMix
MR-PRESSO, https://github.com/rondolab/MR-PRESSO
OMIM, https://www.omim.org/
Software/R package for MR-cML, https://github.com/xue-hr/MRcML
References
- 1. Verbanck M., Chen C.Y., Neale B., Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 2018;50:693–698. doi: 10.1038/s41588-018-0099-7.
- 2. Watanabe K., Stringer S., Frei O., Umićević Mirkov M., de Leeuw C., Polderman T.J.C., van der Sluis S., Andreassen O.A., Neale B.M., Posthuma D. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 2019;51:1339–1348. doi: 10.1038/s41588-019-0481-0.
- 3. Bowden J., Davey Smith G., Haycock P.C., Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 2016;40:304–314. doi: 10.1002/gepi.21965.
- 4. Bowden J., Del Greco M F., Minelli C., Davey Smith G., Sheehan N., Thompson J. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat. Med. 2017;36:1783–1802. doi: 10.1002/sim.7221.
- 5. Zhao J., Ming J., Hu X., Chen G., Liu J., Yang C. Bayesian weighted Mendelian randomization for causal inference based on summary statistics. Bioinformatics. 2020;36:1501–1508. doi: 10.1093/bioinformatics/btz749.
- 6. Zhao Q., Wang J., Hemani G., Bowden J., Small D.S. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Ann. Stat. 2020;48:1742–1769.
- 7. Kang H., Zhang A., Cai T.T., Small D.S. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J. Am. Stat. Assoc. 2016;111:132–144.
- 8. Guo Z., Kang H., Cai T.T., Small D.S. Confidence intervals for causal effects with invalid instruments by using two-stage hard thresholding with voting. J. R. Stat. Soc. Series B Stat. Methodol. 2018;80:793–815.
- 9. Burgess S., Foley C.N., Allara E., Staley J.R., Howson J.M.M. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat. Commun. 2020;11:376. doi: 10.1038/s41467-019-14156-4.
- 10. Slob E.A.W., Burgess S. A comparison of robust Mendelian randomization methods using summary data. Genet. Epidemiol. 2020;44:313–329. doi: 10.1002/gepi.22295.
- 11. Zhu X. Mendelian randomization and pleiotropy analysis. Quant. Biol. 2020;2020:1–11. doi: 10.1007/s40484-020-0216-3.
- 12. Qi G., Chatterjee N. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nat. Commun. 2019;10:1941. doi: 10.1038/s41467-019-09432-2.
- 13. Morrison J., Knoblauch N., Marcus J.H., Stephens M., He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat. Genet. 2020;52:740–747. doi: 10.1038/s41588-020-0631-4.
- 14. Burgess S., Bowden J., Dudbridge F., Thompson S.G. Robust instrumental variable methods using multiple candidate instruments with application to Mendelian randomization. arXiv. 2016. 1606.03279.
- 15. Windmeijer F., Farbmacher H., Davies N., Davey Smith G. On the use of the lasso for instrumental variables estimation with some invalid instruments. J. Am. Stat. Assoc. 2018;114:1339–1350. doi: 10.1080/01621459.2018.1498346.
- 16. Hartwig F.P., Davey Smith G., Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int. J. Epidemiol. 2017;46:1985–1998. doi: 10.1093/ije/dyx102.
- 17. Bowden J., Davey Smith G., Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 2015;44:512–525. doi: 10.1093/ije/dyv080.
- 18. Didelez V., Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat. Methods Med. Res. 2007;16:309–330. doi: 10.1177/0962280206077743.
- 19. Burgess S., Butterworth A., Thompson S.G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 2013;37:658–665. doi: 10.1002/gepi.21758.
- 20. Bowden J., Del Greco M F., Minelli C., Davey Smith G., Sheehan N.A., Thompson J.R. Assessing the suitability of summary data for mendelian randomization analyses using MR-Egger regression: the role of the I² statistic. Int. J. Epidemiol. 2016;45:1961–1974. doi: 10.1093/ije/dyw220.
- 21. Buckland S.T., Burnham K.P., Augustin N.H. Model selection: an integral part of inference. Biometrics. 1997;53:603–618.
- 22. Shen X., Ye J. Adaptive model selection. J. Am. Stat. Assoc. 2002;97:210–221.
- 23. Mood A.M., Graybill F.A., Boes D.C. Introduction to the Theory of Statistics. McGraw-Hill Kogakusha; 1974.
- 24. Zheng J., Erzurumluoglu A.M., Elsworth B.L., Kemp J.P., Howe L., Haycock P.C., Hemani G., Tansey K., Laurin C., Pourcain B.S., Early Genetics and Lifecourse Epidemiology (EAGLE) Eczema Consortium. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics. 2017;33:272–279. doi: 10.1093/bioinformatics/btw613.
- 25. Lango Allen H., Estrada K., Lettre G., Berndt S.I., Weedon M.N., Rivadeneira F., Willer C.J., Jackson A.U., Vedantam S., Raychaudhuri S. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. doi: 10.1038/nature09410.
- 26. Wood A.R., Esko T., Yang J., Vedantam S., Pers T.H., Gustafsson S., Chu A.Y., Estrada K., Luan J., Kutalik Z., Electronic Medical Records and Genomics (eMERGE) Consortium. MIGen Consortium. PAGE Consortium. LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097.
- 27. van der Harst P., Verweij N. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease. Circ. Res. 2018;122:433–443. doi: 10.1161/CIRCRESAHA.117.312086.
- 28. Malik R., Chauhan G., Traylor M., Sargurupremraj M., Okada Y., Mishra A., Rutten-Jacobs L., Giese A.K., van der Laan S.W., Gretarsdottir S., AFGen Consortium. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. International Genomics of Blood Pressure (iGEN-BP) Consortium. INVENT Consortium. STARNET. BioBank Japan Cooperative Hospital Group. COMPASS Consortium. EPIC-CVD Consortium. EPIC-InterAct Consortium. International Stroke Genetics Consortium (ISGC). METASTROKE Consortium. Neurology Working Group of the CHARGE Consortium. NINDS Stroke Genetics Network (SiGN). UK Young Lacunar DNA Study. MEGASTROKE Consortium. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat. Genet. 2018;50:524–537. doi: 10.1038/s41588-018-0058-3.
- 29.Morris A.P., Voight B.F., Teslovich T.M., Ferreira T., Segrè A.V., Steinthorsdottir V., Strawbridge R.J., Khan H., Grallert H., Mahajan A., Wellcome Trust Case Control Consortium. Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) Investigators. Genetic Investigation of ANthropometric Traits (GIANT) Consortium. Asian Genetic Epidemiology Network–Type 2 Diabetes (AGEN-T2D) Consortium. South Asian Type 2 Diabetes (SAT2D) Consortium. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 2012;44:981–990. doi: 10.1038/ng.2383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Demenais F., Margaritte-Jeannin P., Barnes K.C., Cookson W.O.C., Altmüller J., Ang W., Barr R.G., Beaty T.H., Becker A.B., Beilby J., Australian Asthma Genetics Consortium (AAGC) collaborators Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks. Nat. Genet. 2018;50:42–53. doi: 10.1038/s41588-017-0014-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Willer C.J., Schmidt E.M., Sengupta S., Peloso G.M., Gustafsson S., Kanoni S., Ganna A., Chen J., Buchkovich M.L., Mora S., Global Lipids Genetics Consortium Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Liu M., Jiang Y., Wedow R., Li Y., Brazel D.M., Chen F., Datta G., Davila-Velderrain J., McGuire D., Tian C., 23andMe Research Team. HUNT All-In Psychiatry Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 2019;51:237–244. doi: 10.1038/s41588-018-0307-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Lu Y., Day F.R., Gustafsson S., Buchkovich M.L., Na J., Bataille V., Cousminer D.L., Dastani Z., Drong A.W., Esko T. New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. Nat. Commun. 2016;7:10495. doi: 10.1038/ncomms10495. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Horikoshi M., Beaumont R.N., Day F.R., Warrington N.M., Kooijman M.N., Fernandez-Tajes J., Feenstra B., van Zuydam N.R., Gaulton K.J., Grarup N., CHARGE Consortium Hematology Working Group. Early Growth Genetics (EGG) Consortium. Genome-wide associations for birth weight and correlations with adult disease. Nature. 2016;538:248–252. doi: 10.1038/nature19806.
- 35.Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., LifeLines Cohort Study. ADIPOGen Consortium. AGEN-BMI Working Group. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GLGC. ICBP. MAGIC Investigators. MuTHER Consortium. MIGen Consortium. PAGE Consortium. ReproGen Consortium. GENIE Consortium. International Endogene Consortium. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
- 36.Dupuis J., Langenberg C., Prokopenko I., Saxena R., Soranzo N., Jackson A.U., Wheeler E., Glazer N.L., Bouatia-Naji N., Gloyn A.L., DIAGRAM Consortium. GIANT Consortium. Global BPgen Consortium. Anders Hamsten on behalf of Procardis Consortium. MAGIC investigators. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 2010;42:105–116. doi: 10.1038/ng.520.
- 37.Evangelou E., Warren H.R., Mosen-Ansorena D., Mifsud B., Pazoki R., Gao H., Ntritsos G., Dimou N., Cabrera C.P., Karaman I., Million Veteran Program. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 2018;50:1412–1425. doi: 10.1038/s41588-018-0205-x.
- 38.Zhu X., Li X., Xu R., Wang T. An iterative approach to detect pleiotropy and perform mendelian randomization analysis using GWAS summary statistics. Bioinformatics. 2020. doi: 10.1093/bioinformatics/btaa985.
- 39.Chen J., Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika. 2008;95:759–771.
- 40.Zhang Y., Shen X. Model selection procedure for high-dimensional data. Stat. Anal. Data Min. 2010;3:350–358. doi: 10.1002/sam.10088.
- 41.Wang L., Kim Y., Li R. Calibrating non-convex penalized regression in ultra-high dimension. Ann. Stat. 2013;41:2505–2536. doi: 10.1214/13-AOS1159.
- 42.Shen X., Pan W., Zhu Y., Zhou H. On constrained and regularized high-dimensional regression. Ann. Inst. Stat. Math. 2013;65:807–832. doi: 10.1007/s10463-012-0396-3.
- 43.Zhu J., Wen C., Zhu J., Zhang H., Wang X. A polynomial algorithm for best-subset selection problem. Proc. Natl. Acad. Sci. USA. 2020;117:33117–33123. doi: 10.1073/pnas.2014241117.
- 44.Hu X., Zhao J., Lin Z., Wang Y., Peng H., Zhao H., Wan X., Yang C. MR-APSS: a unified approach to Mendelian Randomization accounting for pleiotropy and sample structure using genome-wide summary statistics. bioRxiv. 2021. doi: 10.1101/2021.03.11.434915.
- 45.Sanderson E., Richardson T.G., Hemani G., Davey Smith G. The use of negative control outcomes in Mendelian randomization to detect potential population stratification. Int. J. Epidemiol. 2021. doi: 10.1093/ije/dyaa288.
- 46.Zhao Q., Chen Y., Wang J., Small D.S. Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization. Int. J. Epidemiol. 2019;48:1478–1492. doi: 10.1093/ije/dyz142.
- 47.Wang K., Han S. Effect of selection bias on two sample summary data based Mendelian randomization. Sci. Rep. 2021;11:7585. doi: 10.1038/s41598-021-87219-6.
- 48.Gamazon E.R., Wheeler H.E., Shah K.P., Mozaffari S.V., Aquino-Michaels K., Carroll R.J., Eyler A.E., Denny J.C., Nicolae D.L., Cox N.J., Im H.K., GTEx Consortium. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
- 49.Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W., Jansen R., de Geus E.J., Boomsma D.I., Wright F.A. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506.
- 50.Hu Y., Li M., Lu Q., Weng H., Wang J., Zekavat S.M., Yu Z., Li B., Gu J., Muchnik S., Alzheimer’s Disease Genetics Consortium. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 2019;51:568–576. doi: 10.1038/s41588-019-0345-7.
- 51.Xu Z., Wu C., Wei P., Pan W. A Powerful Framework for Integrating eQTL and GWAS Summary Data. Genetics. 2017;207:893–902. doi: 10.1534/genetics.117.300270.
- 52.Zhu Z., Zheng Z., Zhang F., Wu Y., Trzaskowski M., Maier R., Robinson M.R., McGrath J.J., Visscher P.M., Wray N.R., Yang J. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat. Commun. 2018;9:224. doi: 10.1038/s41467-017-02317-2.
- 53.Yang Y., Shi X., Jiao Y., Huang J., Chen M., Zhou X., Sun L., Lin X., Yang C., Liu J. CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies. Bioinformatics. 2020;36:2009–2016. doi: 10.1093/bioinformatics/btz880.
- 54.Yuan Z., Zhu H., Zeng P., Yang S., Sun S., Yang C., Liu J., Zhou X. Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat. Commun. 2020;11:3861. doi: 10.1038/s41467-020-17668-6.
- 55.Wainberg M., Sinnott-Armstrong N., Mancuso N., Barbeira A.N., Knowles D.A., Golan D., Ermel R., Ruusalepp A., Quertermous T., Hao K. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 2019;51:592–599. doi: 10.1038/s41588-019-0385-z.
Data Availability Statement
This study used GWAS summary datasets, all of which are publicly available as indicated in their corresponding references. The proposed methods are implemented in the R package MRcML, which is publicly available on GitHub at https://github.com/xue-hr/MRcML. All other MR methods used for comparison are available in public R packages, with links given in the Web Resources section.