Abstract
Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight on the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional F-statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension for two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the GLP1R gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the GLP1R gene region are associated with body mass index and type 2 diabetes. Multivariable Mendelian randomization analyses that were corrected for overdispersion heterogeneity suggest that the bodyweight lowering effects of GLP1R agonism, rather than its type 2 diabetes liability lowering effects, are more likely to contribute to reduced CAD risk. Tissue-specific analyses prioritised brain tissue as the most likely to be relevant for CAD risk, of the tissues considered. We hope the multivariable Mendelian randomization approach illustrated here is widely applicable to better understand mechanisms linking drug targets to disease outcomes, and hence to guide drug development efforts.
Keywords: drug target Mendelian randomization, phenotypic heterogeneity, overdispersion heterogeneity, GLP1R agonists
Introduction
By phenotypic heterogeneity, we refer to differences in the effects of distinct genetic variants in a single gene region on observable traits (Nussbaum et al., 2007). The increasing size and scope of genome-wide association studies has resulted in the discovery of multiple causal variants for many gene regions (Visscher et al., 2017; Pasaniuc and Price, 2017; Yang et al., 2012). In some cases, different causal variants in the same gene have distinct patterns of association with gene expression, molecular, and phenotypic traits. The principle of Mendelian randomization is that genetic variants can be used as unconfounded proxies to understand the consequences of intervening on the pathway or trait affected by the variants (Davey Smith and Hemani, 2014). The use of genetic associations of variants in a genetic region relating to a molecular target to make causal inferences is known as cis-Mendelian randomization (Schmidt et al., 2020).
Genes code for proteins, which make up the majority of drug targets, particularly for small molecule and biologic compounds. Previous work has demonstrated that cis-Mendelian randomization focused on genes coding for drug target proteins can be used to anticipate the effects of their pharmacological perturbation (Gill et al., 2021; Holmes et al., 2021; Daghlas and Gill, 2023). The human relevance of such insights offers considerable advantages over animal models, which can be difficult to translate to patient populations (Hingorani et al., 2019). Further, the random allocation of genetic variants at conception means that their associations with clinical traits are less vulnerable to the confounding from environmental factors and reverse causation bias that can hinder causal inference in traditional epidemiological study designs (Burgess et al., 2023). However, it is also important to appreciate that genetic variants mimicking drug class effects typically inform on small, lifelong perturbations of the corresponding target, which contrasts with the effects of discrete interventions of larger magnitude that are most commonly encountered in clinical practice (Gill et al., 2021). As such, it is not advisable to directly translate effect estimates from genetic analyses to those that might be expected in clinical practice.
Multivariable Mendelian randomization uses shared genetic predictors of related traits to distinguish between the causal effects of the traits (Sanderson et al., 2019). Such analyses have the potential to provide important mechanistic insights to identify causal risk factors in a way that univariable Mendelian randomization analyses do not: univariable analyses may falsely conclude that a non-causal trait is causal because it is genetically correlated with a causal trait. Multivariable cis-Mendelian randomization studies can therefore provide evidence on the causal mechanism linking the gene to the outcome that could guide the design of a pharmacological intervention trial.
There are, however, several methodological difficulties of multivariable cis-Mendelian randomization. First, in order to identify multivariable trait effects, we require genetic predictors of each trait that are not collinear. This is potentially problematic because it is often not possible to conduct such analyses using a set of variants pruned to minimal correlation since such variants are likely to be few in number and unlikely to provide enough information to reliably estimate several trait effects (Batool et al., 2022). Conversely, the challenge with using a larger number of correlated variants in a single gene region is that they are typically highly correlated. Hence, even if different causal variants have different mechanistic consequences, disentangling the effects of these mechanisms on a disease outcome is tricky.
Second, Mendelian randomization analyses must typically rely on summarised data, representing genetic associations (beta-coefficients and standard errors) from regression on the trait of each genetic variant in turn (Burgess et al., 2013). Modelling approaches using marginal genetic association estimates can suffer from numerical instability, as the matrix of correlations between genetic variants can be near singular (ill-conditioned) even if pairwise correlations between the genetic variants are pruned at a threshold level (Zou et al., 2022).
Previous investigations have tackled these two challenges by using dimension reduction techniques to improve the stability of summary data analyses with a large number of correlated variants, including principal component analysis (Burgess et al., 2017) and factor analysis (Patel et al., 2023). However, these methods may be vulnerable to an important practical problem of overdispersion heterogeneity in variant–outcome associations, whereby genetic variants have direct effects on the outcome, not via their effects on included traits, which manifests as heterogeneity in a random-effects model. The failure to account for overdispersion heterogeneity may lead to overly precise confidence intervals for estimated trait effects and thus misleading analyses, as previously shown for univariable Mendelian randomization (Zhao et al., 2020).
A lack of phenotypic heterogeneity means a lack of unique genetic predictors of each risk factor; this is called a weak instruments problem in Mendelian randomization, and it limits our ability to identify and reliably estimate genetically-predicted risk factor effects on the outcome. A set of genetic predictors are weak instruments for a risk factor if: (i) they are, collectively, only weakly associated with that risk factor; and/or (ii) their association with that risk factor is collinear with their associations with other risk factors in the model. Conditional F-statistics (Sanderson and Windmeijer, 2016) are a measure of instrument strength for each risk factor, with low values of conditional F-statistics intrinsically linked to larger biases of causal point estimates.
We emphasise that the two types of heterogeneity discussed in this paper have different consequences: phenotypic heterogeneity enables us to reliably estimate multivariable trait effects based on a single gene region, whereas overdispersion heterogeneity presents a challenge for inference using these estimated trait effects. Therefore, following previous terminology (Sanderson et al., 2019), we can think of phenotypic heterogeneity as a ‘good’ type of heterogeneity, and overdispersion heterogeneity as a ‘bad’ type for multivariable Mendelian randomization estimation.
In this work, we bridge this methodological gap by providing a way forward for summary data multivariable cis-Mendelian randomization analysis that is robust to overdispersion heterogeneity. In most Mendelian randomization analyses, genetic variants are used to instrument risk factors, whereas here we use the principal components of genetic variants as instruments to exploit the explanatory power of many correlated genetic variants in the gene region. Our estimation strategy is similar to Sanderson et al. (2021) in that it minimises a continuously-updating generalized method of moments criterion (Hansen et al., 1996), and similar to Batool et al. (2022) in using the principal components of genetic variants as instruments. However, it differs from both in how it uses summary data in variance–covariance estimation in order to provide more accurate inferences in a linear model. Further, we aim to combine the strengths of both approaches by allowing for variants that are in highly structured correlation as in Batool et al. (2022), and by providing inferences that are robust to overdispersion heterogeneity in genetic associations with the outcome as in Sanderson et al. (2021).
Our empirical focus concerns GLP1R agonists, which have proven effective for the treatment of type 2 diabetes and obesity (Aroda et al., 2019; Wilding et al., 2021), and for preventing cardiovascular events in people with type 2 diabetes (Marso et al., 2016; Gerstein et al., 2019; Hernandez et al., 2018), with a cardiovascular outcome clinical trial underway in people with obesity (Ryan et al., 2020). We perform multivariable cis-Mendelian randomization for variants in the GLP1R gene region: first, considering body mass index (BMI) and type 2 diabetes (T2D) as distinct risk factors, representing two outcomes affected by GLP1R agonism; and second, considering GLP1R gene expression in different tissues as risk factors. In each case, the outcome is coronary artery disease (CAD) risk. The aim of these analyses is to better understand the biological pathway by which GLP1R agonism affects CAD risk. These risk factors represent pathways by which the effect of GLP1R agonism may affect CAD risk, rather than the totality of the effect of the risk factor itself. The method proposed here can be implemented using the ‘mr_mvpcgmm’ function in the MendelianRandomization R package (v0.9.0 update; Patel et al., 2023).
Methods
We perform a simulation study and an empirical investigation. The empirical investigation has three connected elements. First, we assess whether genetic associations with BMI and T2D at the GLP1R gene region colocalize, and we compute conditional F-test statistics for dimension-reduced genetic associations. Second, we perform multivariable cis-Mendelian randomization analyses considering the effects of BMI and T2D liability on CAD risk based on variants in the GLP1R gene region. Third, we perform multivariable cis-Mendelian randomization analyses considering the effects of GLP1R gene expression on CAD risk based on variants in the GLP1R gene region.
The first analysis explores whether there is phenotypic heterogeneity for BMI and T2D at the GLP1R locus, to verify that their genetically-predicted effects on CAD risk can be reliably estimated. The second analysis explores whether any effect of perturbing GLP1R pathways on CAD risk is due to the effect of BMI or T2D liability. The third analysis explores the likely tissue at which the effect of GLP1R perturbation on CAD risk occurs. A schematic diagram illustrating these elements is presented as Figure 1.
Data sources
Genetic associations with BMI were obtained from a meta-analysis of 694,649 individuals of European ancestries from UK Biobank and the Genetic Investigation of ANthropometric Traits (GIANT) consortium (Pulit et al., 2019). Genetic associations with T2D risk were obtained from a meta-analysis of 80,154 cases and 853,816 controls of European ancestries from the DIAbetes Meta-ANalysis of Trans-Ethnic association studies (DIAMANTE) (Mahajan et al., 2022) consortium. Genetic associations with gene expression were estimated in 838 participants from the Genotype-Tissue Expression (GTEx) project version 8 (GTEx Consortium, 2020). Genetic associations with CAD were obtained from a GWAS of 1,165,690 individuals (181,522 cases) mostly of European ancestries from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis plus The Coronary Artery Disease Genetics (CARDIoGRAMplusC4D) consortium (Aragam et al., 2022). Correlation matrices for variants were estimated using 367,703 unrelated UK Biobank participants of European ancestries (Astle et al., 2016). We considered associations of genetic variants in a region ± 100kbp of the GLP1R gene (chr6:39,016,557-39,059,079 in GRCh37/hg19).
We perform two tissue-specific analyses: a limited analysis, comparing gene expression in three tissues; and an extended analysis, comparing gene expression in ten tissues. The three tissues in the limited analysis were chosen as biologically plausible tissues with high levels of GLP1R expression: brain, pancreas, and heart. Within each of these, we used the tissue subtype with the greatest GLP1R expression: brain-caudate, pancreas, and heart-atrial appendage. The ten tissues in the extended analysis were chosen based on GLP1R expression levels alone, and additionally include thyroid, testis, stomach, nerve, lung, heart-left ventricle, and brain-hypothalamus.
We pruned variants according to an R² ≤ 0.95 correlation threshold, and we removed variants that were not associated with at least one risk factor at a p-value below 0.05. This led to the use of 253 genetic variant associations for the primary analysis of the effects of BMI and T2D on CAD risk, 169 genetic variant associations for the tissue-specific analyses with 3 tissue types, and 288 genetic variant associations for the tissue-specific analyses with 10 tissue types.
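The two filtering steps can be sketched as follows. This is only an illustration: the text does not state how a representative was chosen among highly correlated variants, so the greedy rule used here (visit variants from smallest to largest p-value and keep a variant only if its R² with every variant already kept is at most the threshold) is an assumption, and `prune_variants` and its arguments are hypothetical names.

```python
import numpy as np

def prune_variants(corr, min_pvalues, r2_max=0.95, p_max=0.05):
    """Return sorted indices of variants kept after p-value filtering and R^2 pruning.

    corr        : (p, p) variant correlation matrix
    min_pvalues : (p,) smallest association p-value of each variant across risk factors
    """
    corr = np.asarray(corr, dtype=float)
    min_pvalues = np.asarray(min_pvalues, dtype=float)
    # Step 1: keep only variants associated with at least one risk factor at p < p_max,
    # ordered from the most to the least strongly associated.
    order = np.argsort(min_pvalues, kind="stable")
    candidates = [int(j) for j in order if min_pvalues[j] < p_max]
    # Step 2 (assumed greedy rule): keep a variant only if its squared correlation
    # with every variant already kept does not exceed r2_max.
    kept = []
    for j in candidates:
        if all(corr[j, k] ** 2 <= r2_max for k in kept):
            kept.append(j)
    return sorted(kept)

# Toy example with four hypothetical variants.
corr = np.array([
    [1.00, 0.99, 0.20, 0.10],
    [0.99, 1.00, 0.25, 0.10],
    [0.20, 0.25, 1.00, 0.30],
    [0.10, 0.10, 0.30, 1.00],
])
min_pvalues = np.array([1e-6, 1e-4, 0.03, 0.40])
kept = prune_variants(corr, min_pvalues)
print(kept)
```

In the toy example, the second variant is removed because its R² with the first is 0.98, and the fourth is removed because it is not associated with any risk factor at the 0.05 level.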
Statistical methods
Colocalization analyses may be used to assess whether two traits have distinct causal variants in a single genetic region, and thus can provide evidence for phenotypic heterogeneity. Colocalization was performed by the coloc method using the default prior settings (Giambartolomei et al., 2014; p1 = p2 = 10⁻⁴, p12 = 10⁻⁵), where pk is the prior probability that a genetic variant is associated with trait k (k = 1, 2). We also performed a sensitivity analysis varying the value of p12, which represents the prior probability of a variant being causal for both traits.
Conditional F-statistics are an alternative way to assess phenotypic heterogeneity, which is required to reliably estimate risk factor effects on the outcome using multivariable Mendelian randomization. More formally, conditional F-statistics (Sanderson and Windmeijer, 2016; Sanderson et al., 2021) can be used to test for evidence against a rank reduction of one for any given risk factor, where the genetically-predicted effects of the risk factor can be expressed as a linear combination of other genetically-predicted risk factor effects (top-left panel of Figure 1). Sanderson et al. (2021) derive summary data versions of conditional F-statistics for uncorrelated variants. Compared with Sanderson et al. (2021), here we consider a different approach to calculate conditional F-statistics that: (i) uses summary data differently such that the statistics more accurately follow a limiting χ² null distribution in a linear model; and (ii) is compatible with correlated variants, which is important for our cis-gene focus. We computed conditional F-statistics for BMI and T2D; a higher value of the conditional F-statistic suggests greater evidence of phenotypic heterogeneity for that risk factor.
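To give some intuition for what a conditional F-statistic measures, the sketch below computes an individual-level version in the spirit of Sanderson and Windmeijer (2016) on simulated data. It is not the summary data statistic developed in this paper: it assumes individual-level data, uncorrelated instruments, and mean-zero variables, and the degrees-of-freedom scaling, the function name `conditional_f`, and all simulation settings are assumptions made for illustration.

```python
import numpy as np

def conditional_f(x_target, x_others, z):
    """Individual-level conditional F-statistic for x_target given x_others.

    x_target : (n,) risk factor of interest
    x_others : (n, k-1) remaining risk factors
    z        : (n, kz) instruments
    """
    n, kz = z.shape
    kx = x_others.shape[1] + 1
    # Project each risk factor onto the instruments (first-stage fitted values).
    fitted_others = z @ np.linalg.lstsq(z, x_others, rcond=None)[0]
    fitted_target = z @ np.linalg.lstsq(z, x_target, rcond=None)[0]
    # Two-stage least squares regression of the target on the other risk factors.
    delta = np.linalg.lstsq(fitted_others, fitted_target, rcond=None)[0]
    resid = x_target - x_others @ delta
    # How much of the residual variation is still explained by the instruments?
    fitted_resid = z @ np.linalg.lstsq(z, resid, rcond=None)[0]
    explained = fitted_resid @ fitted_resid
    unexplained = (resid - fitted_resid) @ (resid - fitted_resid)
    return (explained / (kz - kx + 1)) / (unexplained / (n - kz))

rng = np.random.default_rng(0)
n, kz = 5000, 10
z = rng.standard_normal((n, kz))
pi1 = np.full(kz, 0.2)                       # instrument effects on risk factor 1
gamma = 0.2 * np.array([1, -1] * (kz // 2))  # effects unique to risk factor 2

def simulate_f(xi):
    """Conditional F for risk factor 1 when heterogeneity is scaled by xi."""
    v = rng.standard_normal((n, 2))
    x1 = z @ pi1 + v[:, 0]
    x2 = z @ (pi1 + xi * gamma) + v[:, 1]
    return conditional_f(x1, x2[:, None], z)

f_collinear = simulate_f(0.0)      # genetic predictions of x1 and x2 are collinear
f_heterogeneous = simulate_f(1.0)  # x2 has its own pattern of genetic effects
print(f_collinear, f_heterogeneous)
```

When the genetically-predicted components of the two risk factors are collinear, the statistic stays small however strongly the instruments predict each risk factor marginally; it becomes large only once the second risk factor has its own pattern of genetic effects.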
Multivariable cis-Mendelian randomization analyses were performed using the Principal Component analysis-based Generalised Method of Moments (PC-GMM) method. This method extends the previously published multivariable inverse-variance weighted principal component analysis method (Batool et al., 2022) in two directions: firstly, it uses the continuously updating generalised method of moments (GMM) estimator (Hansen et al., 1996), which is known to be less sensitive to weak instruments than other instrumental variable methods (Antoine and Renault, 2009; Chao and Swanson, 2005); and secondly, it can allow for heterogeneity in the model by incorporating an overdispersion parameter.
The intuition behind the approach is that the additional uncertainty due to overdispersion heterogeneity affects the variance of estimates rather than the bias. Hence, consistent estimation is possible without accounting for possible overdispersion, which allows us to propose an overdispersion parameter that can be used to correct standard errors. We refer to the method with an overdispersion parameter as “robust PC-GMM”.
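This intuition can be checked in a deliberately simplified setting. The simulation below is a sketch, not the PC-GMM estimator: it uses a single risk factor, uncorrelated variants, known variant associations with the risk factor, and equal standard errors, and every numerical value is hypothetical. Direct effects of the variants on the outcome are drawn with mean zero and standard deviation `kappa`.

```python
import numpy as np

rng = np.random.default_rng(1)
n_variants, n_reps = 50, 2000
theta, se_y, kappa = 0.3, 0.02, 0.05          # true effect, sampling SE, overdispersion SD
beta_x = np.linspace(0.1, 0.3, n_variants)    # variant associations with the risk factor

estimates = np.empty(n_reps)
for r in range(n_reps):
    alpha = rng.normal(0.0, kappa, n_variants)   # direct (overdispersion) effects
    noise = rng.normal(0.0, se_y, n_variants)    # sampling error in outcome associations
    beta_y = theta * beta_x + alpha + noise
    # Inverse-variance weighted estimate; with equal standard errors the weights cancel.
    estimates[r] = (beta_x @ beta_y) / (beta_x @ beta_x)

naive_se = se_y / np.sqrt(beta_x @ beta_x)                           # ignores overdispersion
corrected_se = np.sqrt(se_y**2 + kappa**2) / np.sqrt(beta_x @ beta_x)  # inflates the variance
mean_estimate = estimates.mean()
empirical_sd = estimates.std(ddof=1)
print(mean_estimate, empirical_sd, naive_se, corrected_se)
```

Across replications the estimates remain centred on the true effect, but their spread is much larger than the naive standard error and close to the standard error that adds the overdispersion variance to the sampling variance.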
For the unrobust version of the PC-GMM method that assumes there is no overdispersion heterogeneity, an overidentification test (Hansen, 1982) (henceforth, heterogeneity test) can be used to assess heterogeneity that is unexplained by the model; a rejection of this test indicates model misspecification, which is a sign of possible pleiotropy. Further technical details of conditional F-statistics and the robust PC-GMM method are found in Supplementary Material.
Comparison of models for the extended analysis was performed using the Mendelian Randomization Bayesian Model Averaging (MR-BMA) method (Zuber et al., 2020). Fitting a model with all ten tissues as risk factors can result in imprecise estimates that are difficult to interpret due to multicollinearity. Instead, the MR-BMA method fits models with each risk factor in turn, all pairs of risk factors, all triples of risk factors, and so on. Each model, representing a particular combination of risk factors, receives a posterior model probability based on its goodness-of-fit; the model that best explains the genetic associations with the outcome will receive the greatest posterior probability. Additionally, each risk factor is assigned a marginal inclusion probability, calculated as the sum of the posterior model probabilities for all models including that risk factor. The method is implemented using stochastic search, with a prior probability of p = 0.1 for each tissue, representing a prior expectation that one tissue is the true causal tissue.
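The bookkeeping behind marginal inclusion probabilities and the prior can be illustrated with a toy example. The model probabilities below are hypothetical and are not results from this study, and the calculation of the prior expected model size assumes that each tissue is included independently with probability p.

```python
# Hypothetical posterior model probabilities for a toy problem with three risk factors.
model_posteriors = {
    frozenset({"A"}): 0.50,
    frozenset({"B"}): 0.20,
    frozenset({"C"}): 0.10,
    frozenset({"A", "B"}): 0.15,
    frozenset({"A", "C"}): 0.05,
}

def marginal_inclusion(model_posteriors):
    """Sum posterior model probabilities over all models containing each risk factor."""
    inclusion = {}
    for model, prob in model_posteriors.items():
        for factor in model:
            inclusion[factor] = inclusion.get(factor, 0.0) + prob
    return inclusion

inclusion = marginal_inclusion(model_posteriors)

# Prior expected number of causal tissues, assuming independent inclusion.
n_tissues, prior_p = 10, 0.1
expected_model_size = n_tissues * prior_p
print(inclusion, expected_model_size)
```

With ten tissues and p = 0.1, the prior expected model size is one, matching the prior expectation of a single causal tissue described above.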
Similar to the PC-GMM method, we transformed univariable trait on genetic variant linear regression coefficients into multivariable trait on principal components linear regression coefficients (Equation 3 of Supplementary Material). The MR-BMA method was then applied to these PCA-transformed multivariable linear regression coefficients; we used 34 principal components that explained 99.9% of variation in a weighted genetic correlation matrix of the 288 genetic variants.
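Choosing the number of principal components by a variance threshold can be sketched as below. The weighting of the correlation matrix and the transformation in Equation 3 of the Supplementary Material are not reproduced here: the weights are random placeholders, the projection of associations onto the components is a simplification, and `choose_principal_components` is a hypothetical helper.

```python
import numpy as np

def choose_principal_components(weighted_corr, variance_threshold=0.999):
    """Return (n_components, loadings) for the leading eigenvectors of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(weighted_corr)
    order = np.argsort(eigenvalues)[::-1]            # sort from largest to smallest
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    explained = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    # Smallest number of components whose cumulative share reaches the threshold.
    n_components = int(np.searchsorted(explained, variance_threshold) + 1)
    return n_components, eigenvectors[:, :n_components]

rng = np.random.default_rng(2)
p = 30
idx = np.arange(p)
corr = 0.95 ** np.abs(idx[:, None] - idx[None, :])   # toy correlation matrix for 30 variants
weights = rng.uniform(0.5, 2.0, p)                   # placeholder weights (assumption)
weighted_corr = corr * np.outer(weights, weights)

n_components, loadings = choose_principal_components(weighted_corr)
beta = rng.normal(0.0, 0.05, p)                      # hypothetical variant associations
beta_pc = loadings.T @ beta                          # associations projected onto the components
print(n_components, beta_pc.shape)
```

Raising the threshold retains more components, which helps to estimate the overdispersion parameter but can add components that are only weakly associated with the risk factors.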
The PC-GMM method requires an input of a risk factor correlation matrix. For our analysis of tissue-specific effects, only the trait correlations between the two brain tissues and between the two heart tissues were set at 0.7. Otherwise, the correlation entries were set equal to 0. Sensitivity to these values was investigated. Unless otherwise stated, all Mendelian randomization estimates are expressed as log odds ratios per 1 standard deviation increase in genetically-predicted levels of the risk factor.
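The assumed risk factor correlation matrix for the ten-tissue analysis can be written out explicitly, as in the sketch below. The ordering of tissues is arbitrary and chosen only for illustration; the eigenvalue check confirms that the assumed matrix is a valid (positive definite) correlation matrix.

```python
import numpy as np

tissues = [
    "brain-caudate", "brain-hypothalamus",
    "heart-atrial appendage", "heart-left ventricle",
    "pancreas", "thyroid", "testis", "stomach", "nerve", "lung",
]
index = {name: i for i, name in enumerate(tissues)}

# Start from no correlation between tissues, then set the two within-organ pairs to 0.7.
risk_factor_corr = np.eye(len(tissues))
correlated_pairs = [
    ("brain-caudate", "brain-hypothalamus"),
    ("heart-atrial appendage", "heart-left ventricle"),
]
for a, b in correlated_pairs:
    risk_factor_corr[index[a], index[b]] = 0.7
    risk_factor_corr[index[b], index[a]] = 0.7

min_eigenvalue = np.linalg.eigvalsh(risk_factor_corr).min()
print(min_eigenvalue)
```

A sensitivity analysis of the kind mentioned above would repeat the estimation after replacing 0.7 with other values.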
Simulation study
Our simulation study comprises two parts. The first part investigates how the bias of Mendelian randomization estimates varies with conditional F-statistics as the degree of phenotypic heterogeneity, ξ, changes. The second part investigates the ability of the robust PC-GMM method to provide reliable inferences under overdispersion heterogeneity, governed by the parameter κ². An illustration of phenotypic and overdispersion heterogeneity in our simulation design is given in Figure 2.
We generated two-sample summary data from a linear instrumental variable model with 3 risk factors and 200 instruments. Out of the 200 instruments, 15 instruments that were mutually weakly correlated (R² ≤ 0.2) had non-zero fixed effects on at least one risk factor. The remaining 185 instruments had no effect on any risk factor. The instruments, Z = (Z1, …, Z200)′, were normally distributed with mean zero and were mutually correlated according to a measured genetic variant correlation matrix of the GLP1R gene region based on individuals of European ancestries (Supplementary Figure S1). The three risk factors X1, X2 and X3 were each generated as a linear function of the instruments, with coefficients depending on a parameter ξ, plus an error term, where the errors (V1, V2, V3) were normally distributed such that var(V1) = var(V2) = var(V3) = 1 and cor(V1, V2) = cor(V1, V3) = cor(V2, V3) = 0.3. Hence, the parameter ξ ∈ [0, 1] is a measure of phenotypic heterogeneity, with ξ = 0 implying no phenotypic heterogeneity for risk factors 2 and 3, and ξ = 1 implying phenotypic heterogeneity for all risk factors.
The sample sizes nX and nY used to compute two-sample summary data were varied from 500 to 20,000. Knowledge of the true instrument correlation matrix and a sample correlation matrix of risk factors was assumed. The impact of mis-specifying risk factor correlations is discussed in Supplementary Material. The number of principal components was chosen to explain 99.99% of variation of a sample weighted instrument correlation matrix. Each of the 200 instruments had a direct effect on the outcome which was normally distributed with mean zero and variance proportional to an overdispersion parameter κ² ranging between 0 and 1. The outcome was generated as Y = θ1X1 + θ2X2 + θ3X3 + Z′α + U, where α is the vector of these random direct effects, and U is a mean zero error term with variance 1 that is correlated with the errors in the exposure model such that cor(Vk, U) = 0.2 for k = 1, 2, 3. The causal effect of the 3 risk factors on the outcome was set equal to θ = (θ1, θ2, θ3)′ = (−1/3, 0, 1/3)′. Therefore, for risk factors 1 and 3 we tested the ability of the robust PC-GMM method to successfully reject the null hypothesis of no causal effect, whereas for risk factor 2 we tested the ability of our method to control the type I error rate.
Results
Simulation results
The results of robust PC-GMM estimation and inference for the case of no overdispersion heterogeneity (κ² = 0) and varying phenotypic heterogeneity (0 ≤ ξ ≤ 1) are displayed in Figure 3. In reporting the rejection rates of the conditional F-tests, the conditional F-statistics were compared to the 0.95 quantile of their limiting χ² distribution under the null hypothesis of no phenotypic heterogeneity. We note that these are not weak instrument-adjusted critical values that could be used to infer a particular level of weak instrument bias, as studied in Sanderson and Windmeijer (2016). The first row of Figure 3 shows that for large enough sample sizes the type I error rates of conditional F-tests were controlled at the nominal 5% level when ξ = 0, and the tests were able to detect phenotypic heterogeneity when ξ ≠ 0. The second and third rows of Figure 3 highlight that the estimates of θ2 and θ3 were heavily biased when the conditional F-statistics F2|−2 and F3|−3 were low; this estimation bias disappears when ξ is further away from 0 and when sample sizes are large (and hence the conditional F-statistics are larger). The final row of Figure 3 shows that the type I error rate for the test H0 : θ2 = 0 was not controlled at 5% unless there was sufficient phenotypic heterogeneity and a large enough sample size. Similarly, the power to detect the non-zero effect θ3 = 1/3 was higher for larger values of the conditional F-statistic, since the estimate of θ3 was otherwise biased toward θ2 = 0.
The results of PC-GMM estimation and inference for the case of phenotypic heterogeneity (ξ = 1) and varying overdispersion heterogeneity (0 ≤ κ² ≤ 1) are displayed in Figure 4. The first row displays the root-mean squared error (RMSE) performance of PC-GMM estimates, and the second row displays the type I error rates when testing the null effect θ2 = 0 as well as the power to detect the non-zero effects θ1 and θ3. When κ² = 0, there appears to be a small loss in power going from the unrobust to robust estimates, but for larger sample sizes the robust PC-GMM method was competitive in terms of power. Importantly, however, only the robust PC-GMM method was able to control the type I error rate near the nominal 5% level, unlike the unrobust version that had a type I error rate of nearly 40% when κ² = 1 even in very large samples (nX = nY = 20,000).
Interestingly, in smaller samples, the methods were unable to detect the negative effect θ1 = −1/3 quite as well as the positive effect θ3 = 1/3. This power asymmetry has recently been discussed in the context of two-stage least squares estimates (Keane and Neal, 2023), and our findings suggest a similar phenomenon may exist here for summary data PC-GMM estimates. The robust PC-GMM method also offered improved estimation in smaller samples in terms of RMSE compared with the unrobust version when κ² > 0. However, the RMSE of both methods falls with larger sample sizes. This is a direct consequence of our modelling assumptions, which imply that even the unrobust estimates should be consistent; the problem of overdispersion heterogeneity affects inference rather than estimation.
Finally, we note that the results when choosing the number of principal components to explain 99.9% of variation in the weighted genetic correlation matrix exhibited very similar patterns to those using the 99.99% threshold that are shown in Figures 3 and 4. In practice, we recommend using a fairly large number of principal components in order to reliably estimate an overdispersion parameter κ², and hence to obtain accurate inferences. However, we should also be wary that the inclusion of principal components that are only weakly associated with the risk factors would lower the values of conditional F-statistics.
Colocalization for BMI and T2D at the GLP1R gene region
Colocalization did not provide evidence supporting a shared causal variant for both BMI and T2D at the GLP1R locus. At the default prior values, the posterior probability of the shared causal variant hypothesis (H4) was 0.1%. The posterior probability that there is a causal variant for T2D but not for BMI (H2) was 73.3%, and the posterior probability that there are distinct causal variants for BMI and T2D (H3) was 24.4%. The posterior probability that there are distinct causal variants for BMI and T2D conditional on both traits having a causal variant (H3/(H3 + H4)) was 99.6%. This suggests the presence of phenotypic heterogeneity at this locus. A sensitivity analysis indicated little support for the shared causal variant hypothesis at any value of the p12 parameter (Supplementary Figure S2).
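The conditional probability quoted above follows directly from the reported posterior probabilities, as the short calculation below shows.

```python
# Posterior probabilities reported for the coloc hypotheses (as proportions).
posterior = {"H2": 0.733, "H3": 0.244, "H4": 0.001}

# Probability of distinct causal variants, given that both traits have a causal variant.
distinct_given_both = posterior["H3"] / (posterior["H3"] + posterior["H4"])
print(f"{100 * distinct_given_both:.1f}%")  # prints 99.6%
```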
The conditional F-statistic relating to BMI using 25 principal components of 253 genetic variants was FBMI|T2D = 3.516, while the corresponding conditional F-statistic relating to T2D risk was FT2D|BMI = 4.475. Conditional F-tests rejected the null hypothesis of no phenotypic heterogeneity (P < 0.001). In Supplementary Figure S3, we also report Mendelian randomization estimates when using fewer relevant principal components as instruments, which led to larger conditional F-statistics for both risk factors. For example, FBMI|T2D = 9.721 and FT2D|BMI = 11.725 for the case where only 4 principal components were used.
Mendelian randomization analyses for BMI and T2D on CAD risk
Results from the robust PC-GMM method for the analysis including BMI and T2D risk are presented in Table 1. When using 25 principal components that explained 99% of weighted variation in 253 genetic variants, genetically-predicted BMI was associated with CAD risk (log odds ratio estimate per 1 standard deviation increase in BMI 1.146, 95% confidence interval [CI] 0.292, 2.000), but genetically-predicted T2D liability was not (log odds ratio estimate per 1 unit increase in the log odds of T2D risk -0.012, 95% CI -0.241, 0.217).
Table 1. Robust PC-GMM results: Genetically-predicted multivariable BMI and T2D effects on CAD risk using 25 principal components that explain 99% of weighted genetic variation in GLP1R.
risk factor | estimate | 95% CI | p-value | # PCs | # genetic variants |
---|---|---|---|---|---|
BMI | 1.146 | 0.292, 2.000 | 0.009 | 25 | 253 |
T2D | -0.012 | -0.241, 0.217 | 0.917 | 25 | 253 |
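As a consistency check on Table 1, the standard error and p-value implied by each confidence interval can be recovered, assuming the intervals are symmetric Wald intervals based on a normal approximation (an assumption; the exact construction is given in the Supplementary Material). The helper `wald_summary` is a hypothetical name.

```python
import math

def wald_summary(lower, upper, z_crit=1.959964):
    """Recover the estimate, standard error and two-sided p-value from a 95% Wald interval."""
    estimate = (lower + upper) / 2
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return estimate, se, p_value

bmi = wald_summary(0.292, 2.000)
t2d = wald_summary(-0.241, 0.217)
print(bmi)
print(t2d)
```

Because the interval limits are rounded to three decimal places, the recovered p-values agree with the tabulated values only approximately.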
Estimates were similar when not accounting for overdispersion heterogeneity in the genetic associations (Supplementary Table S1), but the heterogeneity test rejected the null hypothesis of no unexplained heterogeneity (P < 0.001). The results were not very sensitive to the variance threshold used to select principal components; when using principal components explaining 95% and 99.9% of weighted genetic variation, estimates were similar in magnitude, but less precise when using the 95% threshold (Supplementary Table S1). Finally, the results were also similar when using different numbers of principal components (Supplementary Figure S3).
Mendelian randomization analyses for tissue-specific GLP1R expression on CAD risk
Results from the robust PC-GMM method for the analyses of gene expression in different tissues are presented in Figure 5 when using 11 principal components that explained 99% of weighted variation in 169 genetic variants for the limited analysis (Figure 5A), and when using 34 principal components that explained 99.9% of weighted variation in 288 genetic variants for the extended analysis (Figure 5B). In the limited analysis comparing gene expression in pancreas, brain-caudate, and heart-atrial appendage, genetically-predicted gene expression in brain-caudate was associated with CAD risk (estimate -0.076, 95% CI -0.119, -0.032), whereas gene expression in the other tissues was not (Figure 5A). The conditional F-statistics relating to the 3 tissues were between 2.280 and 5.474. Results were similar when using different numbers of principal components (Supplementary Figure S4). Similarly, in the extended analysis, only gene expression in brain-caudate and brain-hypothalamus was associated with CAD risk at a conventional level of statistical significance (Figure 5B; brain-caudate estimate -0.056, 95% CI -0.105, -0.007, and brain-hypothalamus estimate -0.066, 95% CI -0.110, -0.022). However, conditional F-statistics relating to 7 of the 10 tissues were less than 1, and significance for other tissues may have been masked by the complexity of the model.
The MR-BMA method, which compares evidence for models containing different combinations of risk factors, indicated a clear preference for the model containing brain-caudate and no other tissue (Table 2). The posterior probability for this model was 45.9%. The next highest ranking models contained stomach only (posterior probability 14.8%), nerve only (8.0%), testis only (6.6%), and then brain-hypothalamus only (5.4%). Overall, the marginal inclusion probability for brain-caudate was 47.7%, indicating that brain-caudate was selected as a causal risk factor in models with a total posterior probability of 47.7%. The next ranking tissues by marginal inclusion probability were stomach (15.5%) and nerve (8.6%).
Table 2. MR-BMA results: Top 5 ranked models and corresponding posterior probabilities.
model rank | tissues selected by MR-BMA | posterior probability (%) |
---|---|---|
1 | brain-caudate | 45.9 |
2 | stomach | 14.8 |
3 | nerve | 8.0 |
4 | testis | 6.6 |
5 | brain-hypothalamus | 5.4 |
Discussion
In this paper, we have developed a novel extension to multivariable cis-Mendelian randomization that offers accurate confidence intervals when overdispersion heterogeneity in dimension-reduced genetic associations is detected. Conventional standard errors that ignore the presence of overdispersion heterogeneity may lead to inflated type I error rates, as illustrated in our simulation study.
We demonstrated the use of this method to better understand the effect of GLP1R agonism on CAD risk. We found evidence for an association between genetically-predicted BMI and CAD risk in a multivariable model including both BMI and T2D, suggesting that the mechanistic pathway from GLP1R to CAD risk passes predominantly via BMI rather than T2D. We also demonstrated an association between genetically-predicted GLP1R gene expression in the brain and CAD risk in a multivariable model including GLP1R gene expression in specific areas of the brain and heart, and the pancreas. This was further validated in a multivariable model including gene expression in a wide range of tissues.
Our findings provide mechanistic insight into the effects of GLP1R agonists in preventing CAD. The results are consistent with the bodyweight lowering effects of GLP1R agonism being involved in mediating reduced CAD risk more than T2D liability reducing effects. Further, bodyweight reduction may in itself be reducing T2D liability (Gill et al., 2021). However, it is important to appreciate that GLP1R agonists also reduce inflammation, blood pressure, and triglycerides, and that these mechanisms may also contribute to reduced CAD risk (Rakipovski et al., 2018).
The results of this study are hypothesis-generating. It should, however, be noted that these data directly relate to risk of CAD incidence, rather than progression. Thus, our results support that GLP1R agonism is likely to be efficacious through these mechanisms for CAD prevention. Further work may investigate related cardiovascular outcomes, including stroke and heart failure.
We note that our method is widely applicable for robust inference in general multivariable cis-Mendelian randomization investigations. For many drug targets, the mechanism of action of the drug is unclear. Multivariable MR can provide insights into plausible risk factors, which represent causal agents or surrogate markers indicating the relevant causal pathway (Schmidt et al., 2021).
In the two-sample setting considered in the simulation, there was some type I error inflation at low levels of phenotypic heterogeneity, particularly with small sample sizes and low conditional F-statistics. In practice, we would advise comparing results with fewer genetic principal components (say, enough components to explain 95% or 99% of variation in the weighted genetic correlation matrix), which would typically result in lower power but less bias in estimates, and with more principal components (say, enough components to explain 99.9% of variation), which would typically result in greater power, but could lead to bias from weak instruments. In a one-sample setting, greater care to avoid using weak instruments would be advised. In our applied examples, results were broadly similar with different numbers of principal components, although the conditional F-statistics were quite low for our tissue-specific analyses.
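One way to set up this sensitivity comparison is to count how many principal components are needed to reach each variance threshold. The sketch below works from a list of eigenvalues. The function name `n_components_for` and the toy eigenvalues (a stand-in for those of the weighted genetic correlation matrix) are assumptions for illustration, and the weighting step itself is not reproduced here.

```python
from itertools import accumulate

def n_components_for(eigenvalues, threshold):
    """Smallest number of leading components explaining >= threshold of variation."""
    # Sort largest first and clip tiny negative values from numerical error.
    ordered = sorted((max(ev, 0.0) for ev in eigenvalues), reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues, for illustration only.
toy_eigenvalues = [6.0, 2.0, 1.2, 0.5, 0.22, 0.06, 0.015, 0.005]

for threshold in (0.95, 0.99, 0.999):
    k = n_components_for(toy_eigenvalues, threshold)
    print(f"{threshold:.1%} of variation: {k} components")
```

Running the analysis at each of the resulting component counts and comparing estimates gives the kind of robustness check described above.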
The key strength of our method is that it offers a way to disentangle and obtain robust inferences on effects of several correlated risk factors on an outcome, thereby illuminating causal mechanisms that could potentially inform the design of therapeutic interventions. The general advantages of many summary data Mendelian randomization investigations also apply; the analyses may be less vulnerable to biases due to reverse causality, and the increasing availability of large-scale genetic association data and genotyped biobank data makes the use of such approaches for gaining insight into drug target effects both time and cost efficient.
There are, however, some limitations of our investigation. In terms of methodology, while the overdispersion correction accounts for the extra uncertainty due to random heterogeneity in genetic associations, it does not account for more idiosyncratic heterogeneity, which could lead to biased estimation. In our application, we do not use cell-specific gene expression data, which could strengthen causal claims and identify cell-type specific effects (Haglund et al., 2022). Moreover, the tissues we considered for our analyses were selected based on measured GLP1R expression levels, but they may not necessarily be the tissues most biologically relevant for the study of GLP1R agonism. Specifically, for GLP1R expression in the brain it is difficult to map these tissues onto effects, as they may be contributing to weight loss, but could also be involved in glycemia, inflammation or lipid lowering. Another limitation of our analysis is that it does not consider time-varying exposure effects; previous studies have suggested that the effect of BMI on CAD risk may depend on BMI over the life course (Richardson et al., 2020). It may also be useful to consider additional relevant effects of GLP1R agonism, such as reduced inflammation, blood pressure, and triglyceride levels, in order to further understand the mechanisms by which GLP1R agonists exert their effects on reducing CAD risk.
As a final semantic point, we note that the term “phenotypic heterogeneity” is used inconsistently in the literature: in some cases, it refers to distinct effects of the same genetic polymorphism (Wolf, 1997; particularly for highly disruptive variants); whereas elsewhere, it refers to distinct effects of different polymorphisms in the same gene region (Nussbaum et al., 2007). We here use the latter definition, and encourage future researchers to clearly define the term in context to clarify whether heterogeneity is being explored at the variant or gene level.
In conclusion, we have presented evidence suggesting that the effect of GLP1R agonism on CAD risk operates predominantly via BMI reduction, as compared to reduced T2D liability. Separate tissue-specific analyses suggest that mechanisms in the brain may be more relevant for CAD risk than those in the other tissues considered. We hope our investigation serves as both a substantive analysis and a didactic example of how phenotypic heterogeneity at a gene corresponding to a druggable target can be used to investigate the mechanism and site-of-action of the causal effect of pharmacological intervention on that target on a disease outcome.
Supplementary Material
Funding acknowledgements
This research was funded by the United Kingdom Research and Innovation Medical Research Council (MC-UU-00002/7), and supported by the National Institute for Health Research Cambridge Biomedical Research Centre (BRC-1215-20014). S.B. was supported by the Wellcome Trust and the Royal Society (204623/Z/16/Z and 225790/Z/22/Z).
Footnotes
Conflict of interest disclosure
J.B. is a part time employee of Novo Nordisk. D.S. and L.B.K. are full time employees of Novo Nordisk.
Contributor Information
Dipender Gill, Email: dipender.gill@imperial.ac.uk.
Dmitry Shungin, Email: DMSN@novonordisk.com.
Christos S. Mantzoros, Email: cmantzor@bidmc.harvard.edu.
Lotte Bjerre Knudsen, Email: lbkn@novonordisk.com.
Jack Bowden, Email: J.Bowden2@exeter.ac.uk.
Stephen Burgess, Email: sb452@medschl.cam.ac.uk.
References
- Antoine B, Renault E. Efficient GMM with nearly-weak instruments. The Econometrics Journal. 2009;12(S1):S135–S171.
- Aragam KG, Jiang T, Goel A, Kanoni S, Wolford BN, Atri DS, Weeks EM, Wang M, Hindy G, Zhou W. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nature Genetics. 2022;54(12):1803–1815. doi: 10.1038/s41588-022-01233-6.
- Aroda VR, Ahmann A, Cariou B, Chow F, Davies MJ, Jódar E, Mehta R, Woo V, Lingvay I. Comparative efficacy, safety, and cardiovascular outcomes with once-weekly subcutaneous semaglutide in the treatment of type 2 diabetes: insights from the SUSTAIN 1-7 trials. Diabetes and Metabolism. 2019;45(5):409–418. doi: 10.1016/j.diabet.2018.12.001.
- Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167(5):1415–1429. doi: 10.1016/j.cell.2016.10.042.
- Batool F, Patel A, Gill D, Burgess S. Disentangling the effects of traits with shared clustered genetic predictors using multivariable Mendelian randomization. Genetic Epidemiology. 2022;46(7):415–429. doi: 10.1002/gepi.22462.
- Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology. 2013;37(7):658–665. doi: 10.1002/gepi.21758.
- Burgess S, Mason AM, Grant A, Slob EAW, Gkatzionis A, Gill D, et al. Using genetic association data to guide drug discovery and development: Review of methods and applications. American Journal of Human Genetics. 2023;110(2):195–214. doi: 10.1016/j.ajhg.2022.12.017.
- Burgess S, Zuber V, Valdes-Marquez E, Sun BB, Hopewell JC. Mendelian randomization with fine-mapped genetic data: choosing from large numbers of correlated instrumental variables. Genetic Epidemiology. 2017;41(8):714–725. doi: 10.1002/gepi.22077.
- Chao JC, Swanson NR. Consistent estimation with a large number of weak instruments. Econometrica. 2005;73(5):1673–1692.
- Daghlas I, Gill D. Mendelian randomization as a tool to inform drug development using human genetics. Cambridge Prisms: Precision Medicine. 2023;1:e16. doi: 10.1017/pcm.2023.5.
- Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. 2014;23(R1):R89–R98. doi: 10.1093/hmg/ddu328.
- Gerstein HC, Colhoun HM, Dagenais GR, Diaz R, Lakshmanan M, Pais P, Probstfield J, Riesmeyer JS, Riddle MC, Rydén L, REWIND Investigators. Dulaglutide and cardiovascular outcomes in type 2 diabetes (REWIND): a double-blind, randomised placebo-controlled trial. The Lancet. 2019;394(10193):121–130. doi: 10.1016/S0140-6736(19)31149-3.
- Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLOS Genetics. 2014;10(5):1–15. doi: 10.1371/journal.pgen.1004383.
- Gill D, Georgakis MK, Walker VM, Schmidt AF, Gkatzionis A, Freitag DF, Finan C, Hingorani AD, Howson JMM, Burgess S. Mendelian randomization for studying the effects of perturbing drug targets. Wellcome Open Research. 2021;6(16):1–19. doi: 10.12688/wellcomeopenres.16544.1.
- Gill D, Zuber V, Dawson J, Pearson-Stuttard J, Carter AR, Sanderson E, Karhunen V, Levin MG, Wootton RE, Klarin D, Tsao PS, et al. Risk factors mediating the effect of body mass index and waist-to-hip ratio on cardiovascular outcomes: Mendelian randomization analysis. International Journal of Obesity. 2021;45(7):1428–1438. doi: 10.1038/s41366-021-00807-4.
- GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–1330. doi: 10.1126/science.aaz1776.
- Haglund A, Zuber V, Yang Y, Abouzeid M, Feleke R, Ko JH, Nott A, Babtie AC, Bryois J, Bottolo L, Johnson MR, et al. Single-cell Mendelian randomisation identifies cell-type specific genetic effects on human brain disease and behaviour. bioRxiv. 2022:1–33. doi: 10.1101/2022.11.28.517913.
- Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50(4):1029–1054.
- Hansen LP, Heaton J, Yaron A. Finite-sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics. 1996;14(3):262–280.
- Hernandez AF, Green JB, Janmohamed S, D’Agostino RB, Granger CB, Jones NP, Leiter LA, Rosenberg AE, Sigmon KN, Somerville MC, Harmony Outcomes committees and investigators. Albiglutide and cardiovascular outcomes in patients with type 2 diabetes and cardiovascular disease (Harmony Outcomes): a double-blind, randomised placebo-controlled trial. The Lancet. 2018;392(10157):1519–1529. doi: 10.1016/S0140-6736(18)32261-X.
- Hingorani AD, Kuan V, Finan C, Kruger FA, Gaulton A, Chopade S, Sofat R, MacAllister RJ, Overington JP, Hemingway H. Improving the odds of drug development success through human genomics: modelling study. Nature Scientific Reports. 2019;9(18911):1–25. doi: 10.1038/s41598-019-54849-w.
- Holmes MV, Richardson TG, Ference BA, Davies NM, Davey Smith G. Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development. Nature Reviews Cardiology. 2021;18(6):435–453. doi: 10.1038/s41569-020-00493-1.
- Keane M, Neal T. Instrument strength in IV estimation and inference: A guide to theory and practice. Journal of Econometrics. 2023:1–29. (in print)
- Mahajan A, Spracklen CN, Zhang W, Ng MCY, Petty LE, Kitajima H, Yu GZ, Rueger S, Speidel L, Kim YJ. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nature Genetics. 2022;54(5):560–572. doi: 10.1038/s41588-022-01058-3.
- Marso SP, Daniels GH, Brown-Frandsen K, Kristensen P, Mann JFE, Nauck MA, Nissen SE, Pocock S, Poulter NR, Ravn LS, Buse JB. Liraglutide and cardiovascular outcomes in type 2 diabetes. New England Journal of Medicine. 2016;375(4):311–322. doi: 10.1056/NEJMoa1603827.
- Nussbaum RL, McInnes RR, Willard HF. Thompson and Thompson: Genetics in medicine. 7th ed. Saunders; 2007.
- Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics. 2017;18(2):117–127. doi: 10.1038/nrg.2016.142.
- Patel A, Gill D, Newcombe PJ, Burgess S. Conditional inference in cis-Mendelian randomization using weak genetic factors. Biometrics. 2023:1–14. doi: 10.1111/biom.13888.
- Patel A, Ye T, Xue H, Lin Z, Xu S, Woolf B, Mason AM, Burgess S. MendelianRandomization v0.9.0: updates to an R package for performing Mendelian randomization analyses using summarized data. Wellcome Open Research. 2023;8:449. doi: 10.12688/wellcomeopenres.19995.1.
- Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J, Yengo L, Ferreira T, Marouli E, Ji Y. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Human Molecular Genetics. 2019;28(1):166–174. doi: 10.1093/hmg/ddy327.
- Rakipovski G, Rolin B, Nøhr J, Klewe I, Frederiksen KS, Augustin R, Hecksher-Sørensen J, Ingvorsen C, Polex-Wolf J, Knudsen LB. The GLP-1 analogs liraglutide and semaglutide reduce atherosclerosis in ApoE-/- and LDLr-/- mice by a mechanism that includes inflammatory pathways. JACC: Basic to Translational Science. 2018;3(6):844–857. doi: 10.1016/j.jacbts.2018.09.004.
- Richardson TG, Sanderson E, Elsworth B, Tilling K, Smith GD. Use of genetic variation to separate the effects of early and later life adiposity on disease risk: Mendelian randomisation study. BMJ. 2020;369(m1203):1–12. doi: 10.1136/bmj.m1203.
- Ryan DH, Lingvay I, Colhoun HM, Deanfield J, Emerson SS, Kahn SE, Kushner RF, Marso S, Plutzky J, Brown-Frandsen K. Semaglutide effects on cardiovascular outcomes in people with overweight or obesity (SELECT) rationale and design. American Heart Journal. 2020;229:61–69. doi: 10.1016/j.ahj.2020.07.008.
- Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. International Journal of Epidemiology. 2019;48(3):713–727. doi: 10.1093/ije/dyy262.
- Sanderson E, Spiller W, Bowden J. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization. Statistics in Medicine. 2021;40:5434–5452. doi: 10.1002/sim.9133.
- Sanderson E, Windmeijer F. A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of Econometrics. 2016;190(2):212–221. doi: 10.1016/j.jeconom.2015.06.004.
- Schmidt AF, Finan C, Gordillo-Maranon M, Asselbergs FW, Freitag DF, Patel RS, et al. Genetic drug target validation using Mendelian randomization. Nature Communications. 2020;11(3255):1–12. doi: 10.1038/s41467-020-16969-0.
- Schmidt AF, Hunt NB, Gordillo-Maranon M, Charoen P, Drenos F, Finan C, et al. Cholesteryl Ester Transfer Protein (CETP) as a drug target for cardiovascular disease. Nature Communications. 2021;12(1):1–10. doi: 10.1038/s41467-021-25703-3.
- Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101(1):5–22. doi: 10.1016/j.ajhg.2017.06.005.
- Wilding JPH, Batterham RL, Calanna S, Davies M, Van Gaal LF, Lingvay I, McGowan BM, Rosenstock J, Tran MTD, Wadden TA. Once-weekly semaglutide in adults with overweight or obesity. New England Journal of Medicine. 2021;384(11):989–1002. doi: 10.1056/NEJMoa2032183.
- Wolf U. Identical mutations and phenotypic variation. Human Genetics. 1997;100(3-4):305–321. doi: 10.1007/s004390050509.
- Yang J, Ferreira T, Morris AP, Medland SE, GIANT Consortium, DIAGRAM Consortium, Madden PAF, Heath AC, Martin NG, Montgomery GW. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature Genetics. 2012;44(4):369–375. doi: 10.1038/ng.2213.
- Zhao Q, Wang J, Hemani G, Bowden J, Small DS. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Annals of Statistics. 2020;48(3):1742–1769.
- Zou Y, Carbonetto P, Wang G, Stephens M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genetics. 2022;18(7):1–24. doi: 10.1371/journal.pgen.1010299.
- Zuber V, Colijn JM, Klaver C, Burgess S. Selecting likely causal risk factors from high-throughput experiments using multivariable Mendelian randomization. Nature Communications. 2020;11(29):1–11. doi: 10.1038/s41467-019-13870-3.