American Journal of Human Genetics. 2024 Jul 24;111(8):1736–1749. doi: 10.1016/j.ajhg.2024.06.012

An integrative multi-context Mendelian randomization method for identifying risk genes across human tissues

Yihao Lu 1, Ke Xu 2, Nathaniel Maydanchik 1, Bowei Kang 1, Brandon L Pierce 1, Fan Yang 3,4,∗, Lin S Chen 1,∗∗
PMCID: PMC11339623  PMID: 39053459

Summary

Mendelian randomization (MR) provides valuable assessments of the causal effect of exposure on outcome, yet the application of conventional MR methods for mapping risk genes encounters new challenges. One of the issues is the limited availability of expression quantitative trait loci (eQTLs) as instrumental variables (IVs), hampering the estimation of sparse causal effects. Additionally, the often context- or tissue-specific eQTL effects challenge the MR assumption of consistent IV effects across eQTL and GWAS data. To address these challenges, we propose a multi-context multivariable integrative MR framework, mintMR, for mapping expression and molecular traits as joint exposures. It models the effects of molecular exposures across multiple tissues in each gene region, while simultaneously estimating across multiple gene regions. It uses eQTLs with consistent effects across more than one tissue type as IVs, improving IV consistency. A major innovation of mintMR involves employing multi-view learning methods to collectively model latent indicators of disease relevance across multiple tissues, molecular traits, and gene regions. The multi-view learning captures the major patterns of disease relevance and uses these patterns to update the estimated tissue relevance probabilities. The proposed mintMR iterates between performing a multi-tissue MR for each gene region and joint learning the disease-relevant tissue probabilities across gene regions, improving the estimation of sparse effects across genes. We apply mintMR to evaluate the causal effects of gene expression and DNA methylation for 35 complex traits using multi-tissue QTLs as IVs. The proposed mintMR controls genome-wide inflation and offers insights into disease mechanisms.

Keywords: Mendelian randomization, transcriptome-wide Mendelian randomization, multivariable, multi-view learning, instrumental variables, pleiotropy, tissue-specific effects


Mendelian randomization (MR) faces challenges in mapping risk genes due to limited and tissue-specific instrumental variables. We propose mintMR, a multi-context multivariable integrative MR framework, that models molecular exposure effects across multiple tissues for each gene region and estimates multiple gene regions simultaneously, improving causal gene identification.

Introduction

Mendelian randomization (MR) examines the causal relationships between risk exposures and complex disease outcomes, using genetic variants as instrumental variables (IVs).1,2,3,4 With the rapidly growing availability of summary statistics from genome-wide association studies (GWASs), two-sample MR, which takes two sets of GWAS summary statistics as input, has achieved many successes in assessing the causal effects of complex traits as exposures for diseases.5,6,7,8,9,10,11 More recently, transcriptome-wide MR (TWMR) has treated gene expression as the risk exposure and leveraged expression quantitative trait loci (eQTL) and GWAS summary statistics to map risk genes.12,13,14,15 Unlike transcriptome-wide association studies (TWASs),16,17 TWMR focuses on causal assessment. Compared with colocalization analysis,18,19,20,21,22 MR offers the flexibility to adjust for known confounders,23,24 consider joint exposures,25,26,27,28 and allow unmeasured confounders under appropriate assumptions.8,9,10,11

While MR offers valuable insights, the application of conventional MR methods in TWMR analysis for mapping risk genes comes with new challenges.12,29,30,31,32 A notable issue is the limited number of eQTLs as IVs,32,33 with cis-eQTLs being generally correlated.33 Furthermore, disease-associated eQTLs tend to have tissue-specific effects,34 while the disease-relevant tissue types are often unknown.35,36 This can lead to inconsistent IV effects across GWAS and eQTL samples, violating core IV assumptions.5,37,38 These issues motivate us to consider multiple tissues simultaneously. Nevertheless, in multi-tissue MR analysis, the causal effects of genes on diseases are often tissue specific and sparse,39,40,41 and thus the estimation of tissue-specific causal effects with a limited number of eQTLs/IVs is challenging.

Recognizing these challenges and opportunities, we propose a multi-context multivariable integrative Mendelian randomization method (mintMR), specifically designed for mapping gene expression and molecular traits as risk exposures. For each gene, we perform a multi-tissue MR analysis using eQTLs with non-zero and sign-consistent effects in more than one tissue as IVs, thereby improving the IV consistency. Our method improves the estimation of tissue-specific causal effects of all genes by simultaneously modeling the latent tissue indicators of disease relevance for multiple gene regions, jointly learning the major (low-rank) patterns of the latent indicators and their probabilities via multi-view learning techniques, and then using the major patterns to estimate and update the probability of non-zero effects. The rationale is that risk genes for a disease often show non-zero effects in similar or related tissues,34,36 and by jointly learning the major patterns across genes, one can gain improved estimation of tissue-relevance probabilities and further use them to estimate the tissue-specific causal effects for each gene. The joint learning of disease relevance of latent tissue indicators improves the estimation of sparse tissue-specific causal effects for all genes. Our algorithm iterates between estimating multi-tissue MR models for each gene and jointly learning the latent patterns and probabilities of non-zero causal effects for all genes until the maximum number of iterations is reached. Our MR framework considers cis gene expression and DNA methylation (DNAm) as joint exposures. Given the frequent co-occurrence of eQTLs and mQTLs,30 the joint consideration of DNAm with gene expression is crucial for accurately mapping causal genes. If a causal DNAm site is associated with gene expression and the cis-eQTLs selected as IVs are also associated with DNAm, then DNAm acts as a confounder that is associated with the IVs, and its omission could lead to biased causal inference.
By jointly assessing the causal effects of gene expression and DNAm, we demonstrate that the proposed method controls genome-wide inflation, improves the power, and offers valuable insights into disease-relevant tissues and mechanisms. Our mintMR approach uniquely tackles challenges in mapping molecular traits as risk exposures via MR, jointly learns the low-rank patterns in the probabilities of disease relevance across many genes, and thereby enhances the estimation of sparse tissue-specific causal effects.

Methods

A starting model for a single gene region

We start with a multi-tissue MR model for studying the gene expression of a single gene from multiple tissues as the exposure and a complex disease as the outcome. We consider an eQTL $i$ ($i = 1, \ldots, I_g$) as an IV for the expression of a gene indexed by $g$. Let $\gamma_{gik}$ ($k = 1, \ldots, K$) denote the true marginal effect of SNP $i$ on gene $g$ in tissue $k$. Let $\Gamma_i^g$ denote the true marginal association between SNP $i$ and the disease outcome of interest, where the superscript $g$ indicates that SNP $i$ is an IV for gene $g$. Denote $\{\hat\gamma_{gik}, \hat s_{\gamma_{gik}}\}$ as the estimated SNP-gene association and its standard error for SNP $i$ and gene $g$ in tissue $k$, and $\{\hat\Gamma_i^g, \hat s_{\Gamma_i^g}\}$ as the estimated effect of SNP $i$ on the outcome and its standard error. We have the model for SNP $i$:

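$$\begin{pmatrix} \hat\Gamma_i^g \\ \hat\gamma_{gi1} \\ \vdots \\ \hat\gamma_{giK} \end{pmatrix} \sim N\left( \begin{pmatrix} \Gamma_i^g \\ \gamma_{gi1} \\ \vdots \\ \gamma_{giK} \end{pmatrix},\ \hat S_{gi}\, C\, \hat S_{gi} \right),$$ (Equation 1)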

where $C$ is the tissue-tissue correlation matrix due to sample overlap, which is often estimated a priori,42 and $\hat S_{gi} = \mathrm{diag}(\hat s_{\Gamma_i^g}, \hat s_{\gamma_{gi1}}, \ldots, \hat s_{\gamma_{giK}})$ is the diagonal matrix of standard error estimates from GWASs and multi-tissue eQTL studies.

We further assume the true causal relationship between the GWAS and eQTL effects, the $\Gamma_i^g$'s and $\gamma_{gik}$'s, is linear and is given by

$$\Gamma_i^g = \alpha_i^g + \sum_{k=1}^{K} \eta_{gk} \cdot \beta_{gk}\, \gamma_{gik},$$ (Equation 2)

where $\beta_{gk}$ is the causal effect of interest for gene $g$ in tissue $k$. We introduce $\eta_{gk}$ as a latent indicator for the disease relevance of tissue $k$, with $\eta_{gk} = 1$ if $\beta_{gk} \neq 0$. We assume $\eta_{gk} \sim \mathrm{Bernoulli}(\pi_{gk})$. The effect of gene expression levels on the disease outcome is often sparse and varies across contexts, tissues, and cell types. The effect $\eta_{gk} \cdot \beta_{gk}$ is the direct effect of gene $g$ in tissue $k$ on the disease outcome not mediated via other exposures (including the gene expression in other tissues). When estimating the latent variables and the causal effects, the estimated probability that the latent indicator is non-zero can be viewed as a weight on the relevance of tissue types or the proportion of disease-relevant cell types in the current tissues. Without modeling latent disease-relevance tissue indicators, all tissues in the model are weighted equally. Here, the true IV-to-exposure effect follows $\gamma_{gik} \sim N(0, \sigma_{\gamma_g}^2)$, and $\alpha_i^g \sim N(0, \sigma_{\alpha_g}^2)$ is the uncorrelated horizontal pleiotropic effect (green arrow in Figure 1A), which arises when the IV affects the outcome through a path other than the exposure and is not associated with confounders.
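To make Equation 2 concrete, the following minimal sketch evaluates the SNP-to-outcome effect for one SNP across three tissues. The function names and all numeric values are hypothetical and chosen for illustration only; they are not taken from the paper or from the mintMR software.

```python
import random


def snp_outcome_effect(gamma, beta, eta, alpha):
    """Equation 2 for one SNP: alpha + sum over tissues of eta * beta * gamma."""
    return alpha + sum(e * b * g for e, b, g in zip(eta, beta, gamma))


def draw_indicators(pi, rng):
    """Draw eta_k ~ Bernoulli(pi_k) independently for each tissue."""
    return [1 if rng.random() < p else 0 for p in pi]


# Hypothetical toy values for one SNP and K = 3 tissues (not from the paper).
gamma = [0.30, 0.25, 0.10]   # SNP-to-expression effects per tissue
beta = [0.50, -0.20, 0.40]   # tissue-specific causal effects
alpha = 0.01                 # uncorrelated horizontal pleiotropic effect

# Only tissue 1 is disease relevant: tissues 2 and 3 contribute nothing.
only_tissue_1 = snp_outcome_effect(gamma, beta, [1, 0, 0], alpha)
# All tissues relevant: every tissue is weighted equally.
all_tissues = snp_outcome_effect(gamma, beta, [1, 1, 1], alpha)
# Random indicators drawn from assumed tissue-relevance probabilities.
eta = draw_indicators([0.15, 0.03, 0.03], random.Random(1))
```

Setting an indicator to zero removes that tissue's contribution entirely, which is how the latent indicators encode sparsity in this model.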

Figure 1.


Illustrations of the multi-context multivariable integrative Mendelian randomization method

(A) The causal diagram of the multivariable MR model. When assessing the effect of gene expression on outcome, if a correlated exposure (e.g., DNA methylation; shaded) is not considered in the model, it will serve as a confounder and bias the inference (orange line). The green line represents uncorrelated horizontal pleiotropic effect.

(B) An illustration of the mintMR framework for analyzing multiple gene-CpG pairs from G gene regions. mintMR takes as input G × L (L = 2 here) sets of IV-to-exposure effects and standard error matrices from multi-tissue eQTL and mQTL studies, respectively. It models the latent status for each causal effect. Via a logit function, mintMR links the latent status of the causal effects with a continuous modulation matrix. By performing multi-view learning on the modulation matrices, mintMR captures the low-rank data-shared and data-specific major patterns and uses them to estimate the disease-relevant probabilities. By iterating between performing MR for each gene region and estimating the disease-relevant probabilities for all genes, mintMR improves the estimation and inference on sparse causal effects for all genes.

Additionally, we consider a multivariable MR (MVMR) framework for a set of $L$ molecular traits ($l = 1, \ldots, L$) as exposures, each from $K_l$ contexts or tissues. For example, in our motivating application, we jointly consider the expression of a gene and a CpG site from multiple tissues as the exposures, so $L = 2$. Let SNP $i$ be a cis-molecular QTL (xQTL) for gene $g$, and let $\gamma_{gik,l} \sim N(0, \sigma_{\gamma_g,l}^2)$ ($k = 1, \ldots, K_l$) denote the marginal effect of SNP $i$ on the $l$th molecular exposure in tissue $k$. Extending Equation 2, we assume the following causal relationship holds between the marginal effect of SNP $i$ on the outcome, i.e., $\Gamma_i^g$, and the marginal effects of SNP $i$ on the exposures, i.e., the $\gamma_{gik,l}$'s:

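$$\Gamma_i^g = \alpha_i^g + \sum_{k=1}^{K_1} \eta_{gk,1} \cdot \beta_{gk,1}\, \gamma_{gik,1} + \cdots + \sum_{k=1}^{K_L} \eta_{gk,L} \cdot \beta_{gk,L}\, \gamma_{gik,L}.$$ (Equation 3)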

In Equation 3, $\eta_{gk,l} \cdot \beta_{gk,l}$ describes the direct effect of exposure $l$ in tissue $k$ on the outcome, operating neither through the exposure in other tissues nor through other exposures ($l' \neq l$). As in Equation 2, we assume $\eta_{gk,l} \sim \mathrm{Bernoulli}(\pi_{gk,l})$ and $\alpha_i^g \sim N(0, \sigma_{\alpha_g}^2)$. The MVMR model allows the joint modeling of correlated cis-molecular traits in the gene regions to identify the risk factors and elucidate the mechanisms. In practice, since there are often only a limited number of xQTLs as IVs, the causal effects (and the latent indicators) in the single-gene models in Equations 2 and 3 may not be statistically identifiable.

The proposed mintMR model for jointly learning the disease relevance of tissue indicators across G gene-CpG pairs

Common eQTLs are often weakly selected and disease-associated genetic variants typically influence downstream genes with effects being highly context specific.34 When multiple genes are causally affecting diseases in a pathway or gene set, they often have effects specific to certain disease-associated tissues and cell types. Furthermore, the enrichment of disease-associated gene expression has been successfully used to identify disease-relevant tissues and cell types.36 These observations motivate us to jointly learn the patterns of disease-relevance indicators/probabilities across many genes, especially considering the sparse nature of disease-relevant causal effects.

We propose a joint MVMR model across $G$ gene-CpG pairs to estimate the causal effects for each gene and CpG in each tissue and jointly learn the major patterns of latent disease-relevance tissue indicators, particularly in scenarios where these effects are sparse. As illustrated in Figure 1B, we consider multi-tissue expression and DNAm of the gene-CpG pairs from the $g$th gene region ($g = 1, \ldots, G$) and study their effects on the outcome. While the direct effects (the $\beta_{gk,l}$'s) may vary in magnitude and direction, there could still be concerted patterns among the true non-zero causal effects and the contexts/tissues in which they operate. The proposed mintMR model works by iteratively estimating the starting model in Equation 3 for each gene-CpG pair (one red box in Figure 1B) and collectively capturing the low-rank (major) patterns of non-zero causal effects across $G$ gene regions for updating the tissue-relevance probabilities/weights until the maximum number of iterations is reached. The resulting estimates provide not only the causal effect for each gene and CpG site in each tissue, but also the estimated probability of disease relevance for each gene-tissue pair or CpG-tissue pair accounting for shared patterns. A major innovation of our model is the use of multi-view learning methods to capture the low-rank patterns shared across gene regions and omics-data types. The details of the estimation are provided in Appendix A.

To learn the low-rank patterns of disease relevance (non-zero causal effects) across genes, molecular exposures, and tissue types, one may employ multi-view learning strategies such as co-training,43 multiple kernel learning,44 and canonical correlation analysis (CCA).45,46 For each gene-CpG pair, we have Equation 3. We model the latent disease-relevance tissue indicators for all $G$ gene-CpG-tissue trios, assuming the latent indicator $\eta_{gk,l} \sim \mathrm{Bernoulli}(\pi_{gk,l})$. As illustrated in Figure 1B, we form $L$ latent disease-relevance indicator matrices for the $L$ molecular exposures, $\eta_l = \{\eta_{gk,l}\} \in \mathbb{R}^{G \times K_l}$, one each for expression and DNAm ($L = 2$) in our motivating application. We introduce a continuous modulation matrix for each exposure $l$, $U_l = \{U_{gk,l}\} \in \mathbb{R}^{G \times K_l}$, and

$$U_{gk,l} = \log \frac{\Pr(\eta_{gk,l} = 1 \mid U_l, u_{0k,l})}{\Pr(\eta_{gk,l} = 0 \mid U_l, u_{0k,l})} - u_{0k,l}.$$ (Equation 4)

Here $U_l$ modulates the probability of the latent binary association status and $u_{0k,l}$ is the tissue-specific intercept, controlling the sparsity of non-zero effects in the $k$th tissue of the $l$th type of molecular exposure. For each gene ($g = 1, \ldots, G$), we estimate Equation 3 separately and then jointly model the $L$ modulation matrices. We approximate the modulation matrices (the $U_l$'s) with low-rank matrices (the $\tilde U_l$'s) capturing the major patterns of disease relevance across gene regions, molecular exposures, and tissue types. The mintMR model uses these approximated low-rank matrices to estimate the disease-relevance probability for each gene/CpG in each tissue without over-parameterization. If there is no pattern shared across gene regions/molecular exposures/tissues, $U_{gk,l} = 0$ for all $g, k, l$, and Equation 4 reduces to $\mathrm{logit}(\pi_{gk,l}) = u_{0k,l}$, i.e., only a tissue-specific prior is imposed on the indicators across exposures.
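The link in Equation 4 can be inverted to recover the disease-relevance probability from the modulation value and the tissue-specific intercept. The sketch below is illustrative only: `relevance_probability` is a hypothetical helper name, and the intercept and modulation values are assumed for the example.

```python
import math


def relevance_probability(u, u0):
    """Invert Equation 4: logit(pi) = U + u0, so pi = 1 / (1 + exp(-(U + u0)))."""
    return 1.0 / (1.0 + math.exp(-(u + u0)))


# Assumed intercept that corresponds to a 5% baseline probability in a tissue.
u0 = math.log(0.05 / 0.95)

# U = 0 (no shared pattern): only the tissue-specific prior applies.
baseline = relevance_probability(0.0, u0)
# A positive modulation value raises the probability for this gene-tissue pair.
boosted = relevance_probability(2.0, u0)
```

With these assumed values, the baseline equals the 5% prior, and a modulation of 2 on the logit scale raises the probability to roughly 0.28.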

More specifically, in this work, we capture the major patterns of disease relevance across all genes as the sum of major patterns shared across molecular exposures (expression and DNAm) and major tissue-sharing patterns specific to each data type. We have

$$U_{gk,l} \approx \tilde U_{gk,l} = U_{gk,l}^{C} + U_{gk,l}^{R}.$$ (Equation 5)

The matrices $U_{\cdot\cdot,l}^{C}$, $l = 1, \ldots, L$, represent the common major structures shared across the $L$ latent tissue-relevance indicator matrices. We estimate $U_{\cdot\cdot,l}^{C}$ by applying generalized CCA on the matrices $\{\mathrm{logit}(\pi_{gk,l}) - u_{0k,l}\}_{G \times K_l}$. Furthermore, the matrices $U_{\cdot\cdot,l}^{R}$, $l = 1, \ldots, L$, capture omics data-specific tissue-sharing patterns. We perform separate principal component analysis (PCA) on each residual matrix $\{\mathrm{logit}(\pi_{gk,l}) - U_{gk,l}^{C} - u_{0k,l}\}_{G \times K_l}$ to obtain the low-rank patterns in each omics exposure data type, $U_{\cdot\cdot,l}^{R}$. Alternative multi-view learning methods could be used to capture different types of desirable data patterns and obtain other approximated matrices.43,44,45,46 The proposed mintMR algorithm iterates between estimating the causal effects in the single-gene model in Equation 3 for each of the $G$ gene regions and jointly learning/estimating the latent disease-relevance indicators/probabilities via Gibbs sampling until the maximum number of iterations is reached (see Appendix A for details). mintMR outputs the estimated causal effect for each gene/CpG in each tissue type and the p value for each effect. The mintMR p value is calculated by comparing the posterior samples' Z score with the standard normal distribution quantile, and the confidence interval is constructed by the approximate normal distribution of the posterior samples' Z score.47 mintMR also outputs the estimated disease-relevance probability for each gene/CpG in each tissue. These probabilities can be thresholded or summarized to evaluate the overall enrichment of disease relevance of each tissue type.
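As a rough illustration of the p value and interval described above, the sketch below assumes that the Z score is the posterior mean divided by the posterior standard deviation of the sampled causal effect. That definition, the helper name `posterior_z_test`, and the toy posterior draws are assumptions made for this example; the exact procedure is specified in the cited reference and Appendix A.

```python
import math
import statistics


def posterior_z_test(samples, level=0.95):
    """Return (z, two-sided p value, normal-approximation interval)."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    z = mean / sd
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Interval half-width from the standard normal quantile.
    half = statistics.NormalDist().inv_cdf(0.5 + level / 2.0) * sd
    return z, p_value, (mean - half, mean + half)


# Hypothetical posterior draws of one causal effect (illustration only).
samples = [0.18, 0.22, 0.20, 0.25, 0.15, 0.21, 0.19, 0.23, 0.17, 0.20]
z, p_value, ci = posterior_z_test(samples)
```

In practice the samples would be the retained Gibbs draws for a single gene-tissue or CpG-tissue effect after burn-in.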

Accounting for LD

When studying gene expression and DNAm as joint molecular exposures, the number of e/mQTLs as IVs is generally limited. Applying a stringent LD clumping threshold would lose many IVs and hurt power. Instead of assuming independent IVs as in most existing multivariable MR methods,28 we allow IVs to be correlated. Assuming non-overlapping samples, we model the estimated effect sizes by accounting for the correlation among IVs $i = 1, \ldots, I_g$:

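$$\begin{pmatrix} \hat\Gamma_1^g \\ \vdots \\ \hat\Gamma_{I_g}^g \end{pmatrix} \sim N\left( \hat S_{\Gamma_g} \hat R_g \hat S_{\Gamma_g}^{-1} \Gamma^g,\ \hat S_{\Gamma_g} \hat R_g \hat S_{\Gamma_g} \right), \quad \text{and} \quad \begin{pmatrix} \hat\gamma_{g1k,l} \\ \vdots \\ \hat\gamma_{gI_gk,l} \end{pmatrix} \sim N\left( \hat S_{\gamma_{gk,l}} \hat R_g \hat S_{\gamma_{gk,l}}^{-1} \gamma_{gk,l},\ \hat S_{\gamma_{gk,l}} \hat R_g \hat S_{\gamma_{gk,l}} \right), \quad l = 1, \ldots, L,\ k = 1, \ldots, K_l,$$ (Equation 6)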

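where $\hat R_g$ is the correlation matrix of the $I_g$ IVs for the $g$th set of exposures, $\Gamma^g$ and $\gamma_{gk,l}$ are the corresponding vectors of true effects across the $I_g$ IVs, $\hat S_{\Gamma_g} = \mathrm{diag}(\hat s_{\Gamma_1^g}, \ldots, \hat s_{\Gamma_{I_g}^g})$, and $\hat S_{\gamma_{gk,l}} = \mathrm{diag}(\hat s_{\gamma_{g1k,l}}, \ldots, \hat s_{\gamma_{gI_gk,l}})$. In the supplemental methods, we provide details of the Gibbs sampling algorithm of mintMR accounting for both LD and sample overlap.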
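The mean and covariance in Equation 6 can be computed directly when the standard-error matrix is diagonal. The sketch below does this for a small set of IVs using plain lists; the function name `ld_adjusted_moments` and the two-SNP example values are hypothetical.

```python
def ld_adjusted_moments(effects, ses, ld):
    """Mean S R S^{-1} b and covariance S R S for diagonal S = diag(ses)."""
    n = len(effects)
    mean = [
        ses[i] * sum(ld[i][j] * effects[j] / ses[j] for j in range(n))
        for i in range(n)
    ]
    cov = [[ses[i] * ld[i][j] * ses[j] for j in range(n)] for i in range(n)]
    return mean, cov


# Hypothetical example: SNP 1 has a true effect, SNP 2 has none,
# and the two SNPs are in LD with correlation 0.5.
true_effects = [0.10, 0.00]
standard_errors = [0.02, 0.02]
ld_matrix = [[1.0, 0.5], [0.5, 1.0]]

mean, cov = ld_adjusted_moments(true_effects, standard_errors, ld_matrix)
```

In this toy case the expected marginal estimate for the second SNP is 0.05 even though its true effect is zero, which is why correlated IVs need the LD matrix in the likelihood rather than being treated as independent.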

Results

Simulations to evaluate the performance of mintMR and competing MR methods

We conducted simulation studies to evaluate the performance of mintMR in comparison with existing univariable MR (UVMR) and MVMR methods in various scenarios.

We simulated individual-level data for the GWAS of outcome and multi-tissue QTL studies for each exposure (details in supplemental methods). We simulated a genotype matrix for each gene-CpG pair $g$, with all generated SNPs having uncorrelated horizontal pleiotropy (UHP) effects on the simulated outcome not via exposures. We varied the proportion of the variance in the outcome that can be explained by these UHP effects. We then generated matrices of disease-relevance tissue indicators (the $\eta_l$'s), where $\eta_{gk,l} \sim \mathrm{Bernoulli}(\pi_{gk,l})$. Outcome variables in the GWAS were simulated according to the data generation models in Equation S2 in supplemental methods. QTL data were simulated based on Equation S3 in supplemental methods. With generated individual-level data, we calculated the marginal QTL and GWAS summary statistics as the input for MR analyses.

Most existing MR methods were developed to analyze complex traits as exposure. In TWMR, the number of cis-eQTLs as IVs for gene expression as exposure is generally much smaller than the number of IVs in conventional MR analyses. Our simulation studies show that the limited number of IVs poses a challenge for existing MR methods. We compared mintMR with existing multivariable methods, including MVMR-IVW,25 MVMR-Egger,26 MVMR-Lasso,27 MVMR-Median,27 MVMR-Robust,27 and MVcML.28 In addition, we included IVW with cross-tissue IVs and IV effects being estimated based on a meta-analysis of multiple tissue types (termed as “IVW+metaIV” below) and MR-Egger in the comparison. Among those competing methods, IVW and MVMR-IVW do not allow invalid IVs25,48; MR-Egger and MVMR-Egger require Instrument Strength Independent of Direct Effect (InSIDE) assumption6,26; MVMR-Median assumes the majority of IVs are valid27; MVMR-Lasso and MVMR-Robust are robust to outliers (few invalid IVs)27; and MVcML assumes that the valid IVs form the largest group to provide the causal parameter estimate, i.e., the plurality condition holds.28 All existing UVMR and MVMR methods are developed for using complex traits as exposures. Here we adapted them to TWMR with molecular traits as exposures for comparison purposes. Moreover, we compared the proposed mintMR with its two variations: mintMRoracle is a variation of mintMR where the true latent disease-relevance indicator is known, and it provides the optimal performance of mintMR, which in practice cannot be achieved without further information on disease-relevance indicators; and mintMRsingle-gene performs the starting model in Equation 3 for each single gene region separately without the joint learning of shared patterns, and its comparison to the proposed mintMR illustrates the improvement gained by jointly learning low-rank disease-relevance patterns across multiple gene regions, tissues, and molecular exposures. 
We applied competing MVMR methods with multiple tissues of both simulated expression and DNA methylation as exposures and applied MR-Egger with a single tissue of gene expression as exposure to evaluate their performance. We presented the comparison of type I error rates and powers of the proposed mintMR versus competing methods at the p value threshold of 0.05. To evaluate the estimation of effect sizes, we also compared the root-mean-square errors (RMSEs) of all methods (see supplemental methods).
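The three evaluation metrics can be computed from per-effect p values, estimates, and the simulated truth. The sketch below is a minimal version under stated assumptions: the RMSE here is taken over all effects, whereas the exact definition used in the paper is given in the supplemental methods, and the helper name `evaluate` and the toy inputs are hypothetical.

```python
import math


def evaluate(p_values, estimates, true_effects, threshold=0.05):
    """Return (power, type I error rate, RMSE) at a p value threshold."""
    nonzero = [i for i, b in enumerate(true_effects) if b != 0]
    null = [i for i, b in enumerate(true_effects) if b == 0]
    # Power: share of truly non-zero effects that are declared significant.
    power = sum(p_values[i] < threshold for i in nonzero) / len(nonzero)
    # Type I error rate: share of truly null effects declared significant.
    type1 = sum(p_values[i] < threshold for i in null) / len(null)
    # RMSE of the effect estimates (assumed here to span all effects).
    squared = [(e - b) ** 2 for e, b in zip(estimates, true_effects)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return power, type1, rmse


# Hypothetical toy inputs: two non-zero effects and four null effects.
true_effects = [0.10, -0.10, 0.0, 0.0, 0.0, 0.0]
estimates = [0.12, -0.05, 0.03, 0.0, -0.01, 0.0]
p_values = [0.001, 0.20, 0.03, 0.50, 0.70, 0.90]

power, type1, rmse = evaluate(p_values, estimates, true_effects)
```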

For each simulation, we generated $G = 50$ pairs of genes and CpGs ($L = 2$) from 5 tissues ($K_l = 5$, $l = 1, 2$), each with 500 samples. We generated 15 IVs for each gene-CpG pair and included IVs with p value < 0.01 in at least one tissue. We simulated two types of causal effects of genes on outcomes. In the first setting of Table 1, we simulated genes having effects on outcome in multiple tissues, with the effect indicators (the $\eta_{gk,l}$'s) having the same probability ($\pi_{gk,l} = 0.05$) across all tissues. In the second setting of Table 1, 15% of the genes have non-zero effects on outcome in one tissue ($\pi_{gk,l} = 0.15$). In each of the rest of the tissues, 3% of the genes have non-zero effects with $\pi_{gk,l} = 0.03$. We varied the proportion of the variation in the outcome explained by UHP effects of the $\sum_{g=1}^{G} I_g$ IVs of all $G$ genes from 0.05 to 0.15. As shown in Table 1, when the number of IVs was limited and all IVs had UHP,11 the proposed mintMR model could control type I error rate. Most competing methods, including mintMRsingle-gene, suffered from inflated type I error rates. Most of the competing methods showed increases in type I error rates when UHP effects increased. MVMR-Robust had reasonable control of type I error rates but suffered from low power. When the proportion of the variation in the outcome explained by UHP effects increased, the powers of all methods decreased. The proposed mintMR method had comparable power to the oracle method, mintMRoracle. These simulation results, in particular the comparisons of mintMR with mintMRoracle and mintMRsingle-gene, suggested that multi-view learning of shared patterns across multiple genes can effectively improve the estimation of latent disease-relevant probabilities, which leads to the improved estimation of the causal effects of interest. In Table S1, we showed that mintMR had the smallest RMSE among all the methods.
In both settings, the multi-view learning of low-rank patterns of causal effects improved the power and precision when the number of IVs was limited.
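The two settings for the disease-relevance indicators can be reproduced in a few lines. The sketch below draws one indicator matrix per setting for 50 genes and 5 tissues using the probabilities stated above; the helper name `simulate_indicators`, the choice of the first tissue as the enriched one, and the random seed are assumptions for illustration, and the full data-generation models are in the supplemental methods.

```python
import random


def simulate_indicators(n_genes, probs, rng):
    """Draw an n_genes x len(probs) matrix with eta[g][k] ~ Bernoulli(probs[k])."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_genes)]


rng = random.Random(0)  # fixed seed so the draw is reproducible

# Setting 1: equal probability (0.05) of a non-zero effect in every tissue.
setting1 = simulate_indicators(50, [0.05] * 5, rng)
# Setting 2: probability 0.15 in one tissue and 0.03 in each of the others.
setting2 = simulate_indicators(50, [0.15] + [0.03] * 4, rng)

# Observed fraction of non-zero indicators per tissue in setting 2.
fractions = [sum(row[k] for row in setting2) / len(setting2) for k in range(5)]
```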

Table 1.

Simulation results evaluating the performance of mintMR and competing methods when the number of IVs is limited

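Column groups: "Equal" denotes equal probability of non-zero effects across all tissues; "One tissue" denotes higher probability of non-zero effects in one tissue and lower in others. The number in each column header is the variance of outcome explained by UHP effects.

Power

| Method | Equal, 0.05 | Equal, 0.10 | Equal, 0.15 | One tissue, 0.05 | One tissue, 0.10 | One tissue, 0.15 |
|---|---|---|---|---|---|---|
| mintMR | 0.859 | 0.786 | 0.657 | 0.841 | 0.794 | 0.677 |
| mintMRoracle | 0.903 | 0.842 | 0.773 | 0.898 | 0.830 | 0.775 |
| mintMRsingle-gene | 0.718 | 0.629 | 0.567 | 0.691 | 0.610 | 0.582 |
| IVW+metaIV | 0.351 | 0.317 | 0.323 | 0.362 | 0.341 | 0.342 |
| Egger | 0.308 | 0.275 | 0.262 | 0.305 | 0.270 | 0.273 |
| MVMR-IVW | 0.663 | 0.534 | 0.473 | 0.652 | 0.523 | 0.461 |
| MVMR-Egger | 0.573 | 0.518 | 0.443 | 0.510 | 0.441 | 0.384 |
| MVMR-Lasso | 0.770 | 0.730 | 0.704 | 0.764 | 0.710 | 0.681 |
| MVMR-Median | 0.641 | 0.572 | 0.519 | 0.677 | 0.578 | 0.500 |
| MVMR-Robust | 0.455 | 0.374 | 0.315 | 0.444 | 0.365 | 0.295 |

Type I error rate

| Method | Equal, 0.05 | Equal, 0.10 | Equal, 0.15 | One tissue, 0.05 | One tissue, 0.10 | One tissue, 0.15 |
|---|---|---|---|---|---|---|
| mintMR | 0.050 | 0.049 | 0.048 | 0.051 | 0.050 | 0.046 |
| mintMRoracle | 0.048 | 0.050 | 0.048 | 0.050 | 0.048 | 0.048 |
| mintMRsingle-gene | 0.072 | 0.120 | 0.158 | 0.074 | 0.116 | 0.155 |
| IVW+metaIV | 0.149 | 0.160 | 0.162 | 0.146 | 0.157 | 0.158 |
| Egger | 0.131 | 0.138 | 0.141 | 0.131 | 0.135 | 0.139 |
| MVMR-IVW | 0.121 | 0.128 | 0.134 | 0.122 | 0.130 | 0.134 |
| MVMR-Egger | 0.133 | 0.138 | 0.138 | 0.121 | 0.136 | 0.141 |
| MVMR-Lasso | 0.159 | 0.214 | 0.259 | 0.158 | 0.210 | 0.257 |
| MVMR-Median | 0.089 | 0.117 | 0.132 | 0.087 | 0.115 | 0.127 |
| MVMR-Robust | 0.062 | 0.076 | 0.080 | 0.062 | 0.077 | 0.080 |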

Two types of causal effects of genes on outcomes are simulated. For the first type, genes affect outcomes in multiple tissues, with each gene having an equal probability (5%) of having non-zero effects in any tissue. For the second type, in one tissue, 15% of the genes have non-zero effects on outcome. In each of the rest of the tissues, 3% of the genes have non-zero effects. The proportion of variation in outcome explained by UHP effects varies from 0.05 to 0.15. The sample size is 50,000 for the outcome and 500 for the exposure. The number of IVs is 15. Two exposures are generated and each exposure has 5 tissues. The causal effects are generated from N(0,0.015). The type I error rate and power are calculated based on the p value cutoff of 0.05. Methods with type I error rates between 0.05 and 0.1 are considered to have borderline but tolerable control, and we still compare their power without highlighting their mildly inflated type I error rates. Methods with inflated type I error rates (≥0.1) are indicated with an asterisk (∗) to ensure a fair power comparison.

In Table 2, we compared these methods in different scenarios. First, we increased the number of IVs from 15, 25, to 100. When the number of IVs increased, all competing methods could better control the type I error rates. When the number of IVs was 100, all MVMR methods had reasonable control of the type I error rates. The power and RMSE (Table S2) of all competing MVMR methods were similar to the proposed mintMR and mintMR single-gene version. The univariable MR methods IVW and Egger still had slightly inflated type I error rates and low power due to the omission of correlated exposures. These competing methods were proposed for analyzing complex traits as exposures, and the number of IVs in conventional MR analyses is usually much larger than the number of cis-QTLs as IVs in TWMR analyses. In other words, while existing MR methods work effectively for complex trait exposures, they may not perform as well in TWMR analyses, and our proposed mintMR was tailored for analyzing molecular traits as exposures from multiple contexts or tissues. Second, we varied the probability of QTL effect sharing. When the probability decreased, eQTL/IV effects became more specific to context or tissue and the consistency of IV effects decreased. Table 2 showed that when the consistency of QTL effects across the QTL and GWAS sample decreased, power was reduced for all methods due to the inclusion of many inconsistent IVs. Conversely, the power improved when more QTLs with tissue-shared effects were selected as IVs. This simulation underscores the importance of considering multiple tissues and selecting QTLs with consistent effects across more than one tissue as IVs. Third, we varied the number of tissues for each exposure. When the number of tissues increased, mintMR showed improved power as more IVs were included.

Table 2.

Simulation results evaluating the performance of mintMR and competing methods in different scenarios

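Column groups: "IVs" is the number of IVs; "Consistency" is the probability of QTL effect being consistent across QTL and GWAS sample; "Tissues" is the number of tissues for each exposure.

Power

| Method | IVs: 15 | IVs: 25 | IVs: 100 | Consistency: 0.8 | Consistency: 0.5 | Consistency: 0.2 | Tissues: 5 | Tissues: 10 | Tissues: 15 |
|---|---|---|---|---|---|---|---|---|---|
| mintMR | 0.734 | 0.822 | 0.932 | 0.867 | 0.789 | 0.660 | 0.553 | 0.713 | 0.739 |
| mintMRoracle | 0.764 | 0.861 | 0.964 | 0.898 | 0.852 | 0.746 | 0.586 | 0.810 | 0.900 |
| mintMRsingle-gene | 0.547 | 0.789 | 0.963 | 0.751 | 0.712 | 0.527 | 0.497 | 0.538 | 0.583 |
| IVW+metaIV | 0.351 | 0.352 | 0.530 | 0.299 | 0.388 | 0.399 | 0.321 | 0.286 | 0.281 |
| Egger | 0.236 | 0.254 | 0.220 | 0.367 | 0.289 | 0.264 | 0.240 | 0.180 | 0.192 |
| MVMR-IVW | 0.444 | 0.774 | 0.969 | 0.682 | 0.572 | 0.436 | 0.316 | 0.302 | 0.487 |
| MVMR-Egger | 0.383 | 0.729 | 0.898 | 0.562 | 0.495 | 0.407 | 0.266 | 0.326 | 0.404 |
| MVMR-Lasso | 0.631 | 0.783 | 0.969 | 0.793 | 0.679 | 0.443 | 0.612 | 0.364 | 0.493 |
| MVMR-Median | 0.526 | 0.745 | 0.920 | 0.723 | 0.671 | 0.500 | 0.418 | 0.225 | 0.480 |
| MVMR-Robust | 0.276 | 0.730 | 0.961 | 0.513 | 0.432 | 0.324 | 0.175 | 0.190 | 0.407 |

Type I error rate

| Method | IVs: 15 | IVs: 25 | IVs: 100 | Consistency: 0.8 | Consistency: 0.5 | Consistency: 0.2 | Tissues: 5 | Tissues: 10 | Tissues: 15 |
|---|---|---|---|---|---|---|---|---|---|
| mintMR | 0.049 | 0.055 | 0.041 | 0.053 | 0.066 | 0.060 | 0.052 | 0.037 | 0.062 |
| mintMRoracle | 0.049 | 0.051 | 0.056 | 0.049 | 0.054 | 0.050 | 0.048 | 0.050 | 0.048 |
| mintMRsingle-gene | 0.115 | 0.109 | 0.059 | 0.236 | 0.152 | 0.194 | 0.174 | 0.235 | 0.126 |
| IVW+metaIV | 0.145 | 0.144 | 0.093 | 0.148 | 0.156 | 0.168 | 0.146 | 0.157 | 0.125 |
| Egger | 0.134 | 0.112 | 0.074 | 0.109 | 0.110 | 0.127 | 0.138 | 0.124 | 0.105 |
| MVMR-IVW | 0.122 | 0.064 | 0.055 | 0.122 | 0.106 | 0.067 | 0.126 | 0.122 | 0.081 |
| MVMR-Egger | 0.122 | 0.063 | 0.055 | 0.120 | 0.098 | 0.064 | 0.126 | 0.153 | 0.084 |
| MVMR-Lasso | 0.194 | 0.068 | 0.057 | 0.188 | 0.129 | 0.069 | 0.303 | 0.145 | 0.081 |
| MVMR-Median | 0.114 | 0.116 | 0.100 | 0.172 | 0.112 | 0.069 | 0.135 | 0.076 | 0.103 |
| MVMR-Robust | 0.066 | 0.047 | 0.051 | 0.067 | 0.052 | 0.036 | 0.072 | 0.062 | 0.060 |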

When varying the number of IVs, the proportion of variation in outcome explained by UHP effect is 0.1. The causal effects are generated from N(0,0.01). The probability of QTL effect being consistent across QTL and GWAS samples is 0.8. Five tissues are generated for each exposure. When decreasing the probability of QTL effect being consistent, the causal effects are generated from N(0,0.02). We simulated 15 IVs across 5 tissues for each exposure. When the number of tissues for each exposure increased from 5, 10, to 20, we simulated 15, 25, and 45 IVs, respectively. The probability of consistency is 0.8. Causal effects are generated from N(0,0.01). Results are asterisked (∗) for methods unable to control type I error rates (≥0.1).

We presented additional simulation results in the supplemental methods. In Table S3, we showed that mintMR had the smallest RMSEs when varying the consistency of QTL effect and the number of tissues. In Table S4, we increased the sample size from 500 to 10,000 for each tissue type in the presence of UHP. The larger tissue sample size improved the estimation of the IV-to-exposure effects, while also making the impacts of invalid IVs stronger. The performance of competing methods was similar for different sample sizes. In Table S5, we varied the causal effect size, and the proposed mintMR method controlled the type I error rate and showed improved power compared with other methods. mintMR had the smallest RMSEs on data with varied sample sizes and effect sizes (Table S6). In addition, we simulated correlated IVs with genetic correlation up to 0.5. When the IVs were correlated and the numbers of IVs were limited, the proposed mintMR could still control the type I error rate and showed reasonable power (Table S7). We performed additional simulations to evaluate the performance of mintMR when exposures have large differences in sample sizes for different tissues. In Table S8, we simulated data with exposures with sample sizes mimicking the real data sample sizes in our analysis. mintMR demonstrates higher power in tissues with larger sample sizes and controls type I error rates across tissues with varying sample sizes. In Table S9, we showed that mintMR is robust to IV selection threshold. mintMR is robust to the specification of hyperparameters within a reasonable range (Table S10). In Table S11, we showed that mintMR has a similar performance when selecting the dimension of the low-rank representation within a certain range. We also showed that mintMR is computationally efficient (Table S12).

Data analysis: Identifying trait and disease risk-associated genes via mintMR

We applied the proposed mintMR method to map risk genes for 35 complex traits and diseases, including 14 immunological traits, 6 metabolic traits, 2 neurological diseases, 2 cardiovascular traits, 7 psychiatric diseases and traits, and 4 other traits. We used GWAS statistics as the IV-to-outcome statistics. Details of the GWAS statistics can be found in Table S16. We used multi-tissue eQTL and mQTL summary statistics as the IV-to-exposure statistics. For eQTLs, we obtained the summary statistics for blood tissue from the eQTLGen consortium49 (N = 31,684), for muscle tissue (N = 706), lung tissue (N = 515), and brain cerebellum tissue (N = 209) from version 8 of the Genotype-Tissue Expression (GTEx) project,33 and for brain dorsolateral prefrontal cortex tissue from the Religious Orders Study and Memory and Aging Project (ROSMAP; N = 560).50 For mQTLs, we obtained the summary statistics for lung tissue from GTEx18 (N = 190), skeletal muscle tissue from FUSION51 (N = 265), and blood tissue (N = 1,366) from Brisbane Systems Genetics Study (BSGS)52,53 plus Lothian Birth Cohorts (LBC).54 For each gene, we selected the CpG in the promoter region that was most strongly correlated with the gene's expression in at least one tissue. Specifically, for each gene in each tissue, we performed an F-test on each of its proximal CpGs (within 100 kb of TSS) to test its association with expression. We then selected the CpG with the smallest combined p value across tissues based on Fisher’s method for each gene. For each gene-CpG pair, we selected the cis-eSNPs or mSNPs with non-zero and sign-consistent eQTL or mQTL effects in at least two tissues (p ≤ 0.005). We selected sign-consistent eQTLs (or mQTLs) as IVs to enhance the consistency of IV-to-exposure effects across GWAS and reference QTL samples, which is crucial given the context-specific nature of many QTLs. Moreover, requiring sign consistency of eQTLs (or mQTLs) reduces the chances of selecting invalid IVs. This is because a significant proportion of eQTLs exhibit sign-opposite effects in different tissues,55 often indicating multiple causal/lead eSNPs and pleiotropy. We performed LD clumping at the r² threshold of 0.01. We chose a stringent LD threshold here because some tissues (e.g., blood) have a much larger sample size than others. Using a relaxed LD clumping threshold may select many correlated IVs in that tissue. We restricted our analysis to genes with at least 10 IVs overall and at least one IV for each tissue.
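The IV selection rule described above can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the function names, the input layout (per-tissue effect estimates and p values for one SNP, with `None` for tissues in which the SNP was not tested), the reading of "sign-consistent" as "all tissues passing the p value cutoff share one sign," and the reading of "at least one IV for each tissue" as "every tissue has at least one selected IV that passes the cutoff in that tissue" are assumptions made for this example. LD clumping is not shown.

```python
P_THRESHOLD = 0.005     # p value cutoff quoted in the text
MIN_TISSUES = 2         # effect must be present in at least two tissues
MIN_IVS_PER_GENE = 10   # minimum number of IVs per gene quoted in the text


def significant_tissues(betas, pvals, p_threshold=P_THRESHOLD):
    """Indices of tissues in which the SNP has a non-zero effect at p <= cutoff."""
    return [
        t for t, (b, p) in enumerate(zip(betas, pvals))
        if b is not None and p is not None and b != 0 and p <= p_threshold
    ]


def is_candidate_iv(betas, pvals, p_threshold=P_THRESHOLD, min_tissues=MIN_TISSUES):
    """True if the SNP passes the cutoff in >= min_tissues tissues, all with one sign."""
    hits = significant_tissues(betas, pvals, p_threshold)
    if len(hits) < min_tissues:
        return False
    signs = {betas[t] > 0 for t in hits}
    return len(signs) == 1


def gene_passes_filter(snps, n_tissues, min_ivs=MIN_IVS_PER_GENE):
    """snps: list of (betas, pvals) pairs for one gene. Returns (keep, selected_ivs)."""
    ivs = [s for s in snps if is_candidate_iv(*s)]
    covered = set()
    for betas, pvals in ivs:
        covered.update(significant_tissues(betas, pvals))
    keep = len(ivs) >= min_ivs and len(covered) == n_tissues
    return keep, ivs


# Hypothetical SNPs measured in three tissues.
snp_a = ([0.30, 0.25, None], [1e-6, 0.002, None])   # same sign in two tissues
snp_b = ([0.30, -0.25, 0.01], [1e-6, 0.002, 0.40])  # opposite signs in two tissues
print(is_candidate_iv(*snp_a), is_candidate_iv(*snp_b))  # True False
```

Under the stricter reading used here, a SNP with a significant opposite-sign effect in any tissue is dropped even if two other tissues agree; a looser rule would keep it.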

We applied mintMR to each of the 35 complex traits and diseases, with an average of 3,440 genes examined for each trait or disease. At the false discovery rate (FDR) of 0.05, we identified the genes and CpG sites showing significant effects in at least two tissues for each examined trait/disease. See Table S13 for a list of examined traits/diseases, the number of genes studied, and the number of detected genes and CpG sites. As part of the output, mintMR also provides the estimated probability of disease relevance for each gene-tissue or CpG-tissue pair in each tissue type. These probabilities can be thresholded or summarized to evaluate the overall enrichment of disease relevance of each tissue type. Among the 72 risk genes identified for Alzheimer disease, 52 (72.2%) of them are significant in brain cerebellum or brain cortex.56,57,58 Among the 37 identified risk genes for depressive symptoms, 20 (54.1%) of them showed significant effects in brain cortex.59,60 Among the 110 identified risk genes for lymphocyte counts, 62 (56.4%) of them showed significant effects in blood. These results demonstrated the rationale of the proposed method: risk genes for a disease often show an enrichment of non-zero effects in disease-relevant tissues.
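The "significant in at least two tissues at FDR 0.05" criterion can be illustrated as follows. This section does not state which FDR procedure was used or how tests were grouped, so the sketch assumes the Benjamini-Hochberg adjustment applied within each tissue across genes; the function names and the toy p values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):            # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant_in_min_tissues(pvals_by_gene, fdr=0.05, min_tissues=2):
    """pvals_by_gene: dict gene -> list of per-tissue p values (same tissue order)."""
    genes = list(pvals_by_gene)
    if not genes:
        return []
    n_tissues = len(pvals_by_gene[genes[0]])
    counts = dict.fromkeys(genes, 0)
    for t in range(n_tissues):
        adjusted = bh_adjust([pvals_by_gene[g][t] for g in genes])
        for g, q in zip(genes, adjusted):
            if q <= fdr:
                counts[g] += 1
    return [g for g in genes if counts[g] >= min_tissues]


# Hypothetical p values for three genes in three tissues.
toy = {
    "geneA": [0.001, 0.002, 0.5],
    "geneB": [0.001, 0.6, 0.7],
    "geneC": [0.9, 0.8, 0.6],
}
print(significant_in_min_tissues(toy))  # ['geneA']
```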

In Table S14, we evaluated the genome-wide inflation factor61 with and without accounting for DNAm, based on the p value distributions of gene expression in each tissue. By accounting for the most correlated cis-CpG site, genome-wide inflation is substantially reduced for all examined traits and diseases. An important message from our analysis result is that in mapping the expression of risk genes, cis-DNAm can be a major confounder if not accounted for. Existing studies showed that cis-DNAm frequently correlates with cis expression and cis-eQTLs often co-occur with cis-mQTLs.30 If a cis-e/mQTL or a variant in LD with it is selected as an IV and cis-DNAm is not accounted for, the causal inference can be compromised due to the IVs being correlated with the confounder. In Table S15, we showed that mintMR had lower inflation factors than MVMR-Lasso, MVMR-Median, and MVMR-IVW. The inflation factors of mintMR and MVMR-Robust are comparable. Due to the prevalent pleiotropy in TWMR analysis, MVMR-Egger and MVMR-Robust are expected to have lower power than the other examined methods.27,28 Simulation showed that MVMR-Egger and MVMR-Robust have much lower power than mintMR when UHP is prevalent. We also note that there is remaining mild inflation in the p values. It suggests that there are additional factors and potential IV-associated confounders that have not been fully accounted for in the analyses. This could be at least partially due to, for example, secondary cis-CpG sites, and other correlated and co-expressed cis genes in the region. The proposed mintMR model is a multivariable MR framework and it can be applied to jointly consider one or more cis gene expression and multiple CpG sites.
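For readers unfamiliar with the genome-wide inflation factor, the sketch below uses the standard genomic-control definition: the median of the observed 1-df chi-square statistics divided by the median of the chi-square(1) distribution (about 0.455). It assumes two-sided p values strictly between 0 and 1; the function names are illustrative, and the paper's exact computation may differ in detail.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square(1) variable: the squared 75th-percentile z score (~0.4549).
CHI2_1_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p value to the matching 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)   # lower-tail quantile, stable for small p
    return z * z


def inflation_factor(pvals):
    """Median observed chi-square statistic over the null median."""
    return median([p_to_chi2(p) for p in pvals]) / CHI2_1_MEDIAN


# Evenly spaced p values mimic a well-calibrated null and give a factor near 1.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(round(inflation_factor(null_like), 3))
```

Values above 1 indicate that the p values are, in aggregate, smaller than expected under the null.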

In Figure 2A, we showed the quantile-quantile (QQ) plot of negative log base 10 of p values for gene expression effects on hypertension in the blood tissue. The genome-wide inflation factor decreased from 1.88 to 1.25 after accounting for DNAm. In the 5q31–32 region, we identified four genes (HSPA4 [MIM: 601113], HARS2 [MIM: 600783], KIAA0141 [MIM: 620664], and ARHGEF37 [MIM: 615741]) showing significant effects on hypertension (FDR < 0.05) without accounting for DNAm. After adjusting for the most correlated cis-CpG site, only the expression of HSPA4 still showed a significant effect (Figure 2B). HSPA4 is a member of the heat shock protein 70 family, which is known to be involved in the pathogenesis of hypertension.62,63 We further conducted a colocalization analysis, and only the gene HSPA4 showed a high probability of colocalization with hypertension (PP4 = 0.95) (Figure 2C). Additionally, we examined all the significant genes identified for hypertension at the FDR level of 0.05 in at least two tissues. Out of the 57 identified genes, 49 were analyzed in a TWAS.64 Among these, 15 genes (30.6%) were also significant in the TWAS analysis (p < 0.005), a proportion much higher than that observed among all genes examined (14.2%). Moreover, 6 out of these 49 genes (12.2%) were supported by colocalization analyses (PP4 > 0.7), a much higher proportion than all genes examined (2.3%).
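The coordinates of a QQ plot of negative log base 10 p values, as in Figure 2A, can be computed as sketched below. The plotting position (i − 0.5)/n for the expected uniform quantiles is a common convention assumed here, not something the text specifies, and `qq_points` is a hypothetical helper name.

```python
import math


def qq_points(pvals):
    """Return (expected, observed) -log10(p) pairs, least significant first."""
    n = len(pvals)
    observed = sorted(-math.log10(p) for p in pvals)
    # Expected quantiles of Uniform(0, 1) at plotting positions (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical p values; points above the diagonal indicate inflation or signal.
for exp_val, obs_val in qq_points([0.5, 0.05, 0.005]):
    print(f"{exp_val:.3f}\t{obs_val:.3f}")
```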

Figure 2.

Results of gene expression effects on hypertension

(A) A QQ plot of negative log base 10 of p values for gene expression effects on hypertension. Red points represent the p values of gene expression adjusting for DNAm. Blue points are the p values of gene expression without adjusting for DNAm. Genome-wide inflation factors (GIFs) for both analyses are shown.

(B) The causal effects of four genes on hypertension in the blood tissue in the 5q31–32 region, with and without adjusting for DNAm. Without adjusting for DNAm (blue points and error bars), the four gene expression levels show significant effects on hypertension (FDR < 0.05). After adjusting for DNAm (red points and error bars), only the expression of HSPA4 is significant.

(C) Genotype-phenotype association p values in the HSPA4 locus for hypertension GWAS (top) and eQTL in the blood (bottom). The colocalization probability (PP4) of eQTL with GWAS signal is shown. The diamond-shaped point represents the top significant eQTL variant (rs72801474). Linkage disequilibrium between SNPs is assessed by the squared Pearson correlation coefficient (r²).

We further conducted pathway analyses on the significant genes and proximal genes correlated with significant CpGs identified for each of the 35 traits and diseases, utilizing the Reactome65 and Gene Ontology66 databases. We detected the significantly enriched biological pathways for each trait and disease, as shown in Figure 3. Our results revealed many enriched pathways shared among related traits, suggesting shared mechanisms. Lipid-related pathways, including lipid localization and transport, are implicated in Alzheimer disease, monocyte count, lymphocyte count, and platelet count. As the basic component of cell membranes, lipids play an important role in brain function. Impaired homeostasis of lipids is known to be related to neurologic disorders.67,68,69 Monocytes, lymphocytes, and platelets are key components of the immune system,70,71,72 and the fact that these traits share common enriched pathways with Alzheimer disease suggests that inflammation and immune response play a significant role in Alzheimer disease.73,74

Figure 3.

The heatmaps of enriched pathways

Heatmaps are shown for (A) identified genes affecting complex traits/diseases and (B) proximal genes correlated with the identified significant CpG sites. The p values of pathway enrichment are calculated based on one-sided Fisher’s exact tests without multiple testing adjustments. Pathways with p values < 0.005 for at least two traits are presented.
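The one-sided Fisher's exact test used for pathway enrichment is equivalent to an upper-tail hypergeometric probability, which can be sketched as follows. The choice of background gene universe and all counts in the example are hypothetical; the function name is illustrative and does not correspond to any tool named in the paper.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement from a
    universe of n_universe genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the pathway, 3 selected, 2 of them in the pathway.
print(enrichment_p_value(10, 4, 3, 2))  # 40/120, about 0.333
```

Because the caption notes that no multiple testing adjustment was applied, the resulting p values are best read alongside the "at least two traits" display rule rather than as stand-alone evidence.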

Discussion

In this work, we propose an integrative multi-context Mendelian randomization method, mintMR, for addressing unique challenges in TWMR analysis. mintMR performs a multi-tissue MR analysis using QTLs as IVs for each gene region. It improves the estimation of tissue-specific causal effects of all genes by simultaneously modeling the latent disease-relevance context/tissue indicators for multiple gene regions, jointly learning the low-rank patterns of latent indicators/probabilities via multi-view learning techniques, and then using the major patterns to update the probability of non-zero effects. The joint learning of disease relevance of latent tissue indicators improves the estimation of sparse tissue-specific causal effects for all genes. By selecting cross-tissue QTLs as IVs and considering both gene expression and DNAm as joint exposures, mintMR improves IV consistency and reduces confounding due to correlated cis molecular traits when mapping causal genes. Simulations show that mintMR can control the type I error rates and has good power in various settings, even when only a limited number of QTLs are available as IVs and the causal effects are sparse.

We applied mintMR to map risk genes for 35 complex traits and diseases, leveraging QTL summary statistics from multiple tissues of different studies and GWAS summary statistics. Our results showed a reasonable control of genome-wide inflation for the examined traits and diseases, demonstrating the feasibility of leveraging multi-tissue QTLs and jointly learning disease-relevance probabilities across multiple gene regions in improving causal identification. Our results also suggested DNAm might be a major confounder in mapping risk genes. By accounting for cis DNAm, genome-wide inflation for TWMR analyses was substantially reduced. Our analysis and results demonstrated that mintMR could offer valuable insights into disease-relevant tissues and the underlying mechanisms.

There are several limitations of our work. First, mintMR does not allow IVs to be associated with unmeasured confounders. As a multivariable MR framework, mintMR allows the adjustment and joint modeling of correlated molecular traits as joint exposures. Simulation studies show that mintMR is robust to mild violations of the InSIDE assumption. In the TWMR analysis of 35 traits and diseases, we noted some remaining mild genome-wide inflation after modeling the most correlated cis-CpG sites. In future analyses, additional correlated cis molecular traits, such as secondary cis CpG sites or nearby co-expressed genes, could also be modeled to further reduce genome-wide inflation. Second, we assume linear effects of exposures on outcome. The current mintMR model is not flexible enough to model complex interactions among exposures and interactions with known covariates, such as sex-biased effects.

In future work, mintMR can be extended to allow for correlated horizontal pleiotropy by identifying IVs with such effects. Another area of future development is to improve the modeling of major patterns of disease relevance indicators by adopting other advanced multi-view learning techniques. In this work, we used CCA and PCA to capture omics-shared and tissue-shared patterns in mapping risk genes. Other deep learning and supervised multi-view learning methods could be implemented to promote other desirable patterns among examined genes.45,75,76 Moreover, the mintMR model could be further expanded to model interaction effects among joint exposures and covariates. These developments will be explored in future work.

Data and code availability

All the summary statistics used in this paper are publicly available. The GTEx study data (v8) are available through dbGaP under accession number phs000424.v8.p2. Summary statistics of mQTLs are available at the enhancing GTEx Portal (https://gtexportal.org/home/downloads/egtex/methylation). The eQTLGen data released by the eQTLGen Consortium are available at https://www.eqtlgen.org. The ROSMAP eQTL summary statistics are available from the eQTL Catalogue at https://www.ebi.ac.uk/eqtl/. The FUSION data are available through the FUSION Skeletal Muscle Study portal at https://www.ebi.ac.uk/birney-srv/FUSION/. The BSGS+LBC mQTL summary statistics are available at https://cnsgenomics.com/data/SMR/#mQTLsummarydata. Code for simulation and real data analysis for the mintMR paper is available at https://github.com/ylustat/MVMR-Analysis. The code for mintMR is available at https://github.com/ylustat/mintMR.

Appendix A. The Gibbs sampling algorithm for mintMR model

[The algorithm is provided as an image (fx1.jpg) in the original article.]

Acknowledgments

We thank the GTEx Consortium. The research of L.S.C. and Y.L. was supported by NIH 2R01GM108711, R35ES028379, and 1R01CA229618. Y.L. was also supported by Susan G. Komen TREND21675016.

Author contributions

L.S.C. conceived the project. L.S.C., F.Y., and Y.L. developed the methods and wrote the manuscript. K.X. assisted Y.L. with the development of the estimation algorithm. Y.L. conducted the simulations and analyzed the data. All authors provided valuable suggestions for the development of the methods and the data analyses. All authors reviewed and approved the final manuscript.

Declaration of interests

The authors declare no competing interests.

Published: July 24, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2024.06.012.

Contributor Information

Fan Yang, Email: yangfan1987@mail.tsinghua.edu.cn.

Lin S. Chen, Email: lchen4@bsd.uchicago.edu.

Supplemental information

Document S1. Tables S1–S15
mmc1.pdf (328.1KB, pdf)
Table S16. List of 35 analyzed GWAS traits
mmc2.xlsx (12.3KB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (3MB, pdf)

References

  • 1.Chen L.S., Emmert-Streib F., Storey J.D. Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol. 2007;8:219. doi: 10.1186/gb-2007-8-10-r219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Lawlor D.A., Harbord R.M., Sterne J.A., Timpson N., Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat. Med. 2008;27:1133–1163. doi: 10.1002/sim.3034. [DOI] [PubMed] [Google Scholar]
  • 3.Schadt E.E., Lamb J., Yang X., Zhu J., Edwards S., GuhaThakurta D., Sieberts S.K., Monks S., Reitman M., Zhang C., et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet. 2005;37:710–717. doi: 10.1038/ng1589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Davey Smith G., Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070. [DOI] [PubMed] [Google Scholar]
  • 5.Burgess S., Butterworth A., Thompson S.G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 2013;37:658–665. doi: 10.1002/gepi.21758. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bowden J., Davey Smith G., Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 2015;44:512–525. doi: 10.1093/ije/dyv080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Zhao Q., Wang J., Hemani G., Bowden J., Small D.S. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Ann. Stat. 2020;48:1742–1769. doi: 10.1214/19-AOS1866. [DOI] [Google Scholar]
  • 8.Cheng Q., Zhang X., Chen L.S., Liu J. Mendelian randomization accounting for complex correlated horizontal pleiotropy while elucidating shared genetic etiology. Nat. Commun. 2022;13:6490. doi: 10.1038/s41467-022-34164-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Wang J., Zhao Q., Bowden J., Hemani G., Davey Smith G., Small D.S., Zhang N.R. Causal inference for heritable phenotypic risk factors using heterogeneous genetic instruments. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009575. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Xue H., Shen X., Pan W. Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Am. J. Hum. Genet. 2021;108:1251–1269. doi: 10.1016/j.ajhg.2021.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Morrison J., Knoblauch N., Marcus J.H., Stephens M., He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat. Genet. 2020;52:740–747. doi: 10.1038/s41588-020-0631-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Gleason K.J., Yang F., Chen L.S. A robust two-sample transcriptome-wide Mendelian randomization method integrating GWAS with multi-tissue eQTL summary statistics. Genet. Epidemiol. 2021;45:353–371. doi: 10.1002/gepi.22380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Richardson T.G., Hemani G., Gaunt T.R., Relton C.L., Davey Smith G. A transcriptome-wide Mendelian randomization study to uncover tissue-dependent regulatory mechanisms across the human phenome. Nat. Commun. 2020;11:185. doi: 10.1038/s41467-019-13921-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Barfield R., Feng H., Gusev A., Wu L., Zheng W., Pasaniuc B., Kraft P. Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet. Epidemiol. 2018;42:418–433. doi: 10.1002/gepi.22131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhou D., Jiang Y., Zhong X., Cox N.J., Liu C., Gamazon E.R. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat. Genet. 2020;52:1239–1246. doi: 10.1038/s41588-020-0706-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Shi X., Chai X., Yang Y., Cheng Q., Jiao Y., Chen H., Huang J., Yang C., Liu J. A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies. Nucleic Acids Res. 2020;48:e109. doi: 10.1093/nar/gkaa767. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Hu Y., Li M., Lu Q., Weng H., Wang J., Zekavat S.M., Yu Z., Li B., Gu J., Muchnik S., et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 2019;51:568–576. doi: 10.1038/s41588-019-0345-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Oliva M., Demanelis K., Lu Y., Chernoff M., Jasmine F., Ahsan H., Kibriya M.G., Chen L.S., Pierce B.L. DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat. Genet. 2023;55:112–122. doi: 10.1038/s41588-022-01248-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Giambartolomei C., Zhenli Liu J., Zhang W., Hauberg M., Shi H., Boocock J., Pickrell J., Jaffe A.E., Consortium C., Pasaniuc B., Roussos P. A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics. 2018;34:2538–2545. doi: 10.1093/bioinformatics/bty147. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Guo H., Fortune M.D., Burren O.S., Schofield E., Todd J.A., Wallace C. Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum. Mol. Genet. 2015;24:3305–3313. doi: 10.1093/hmg/ddv077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Foley C.N., Staley J.R., Breen P.G., Sun B.B., Kirk P.D., Burgess S., Howson J.M. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat. Commun. 2021;12:764. doi: 10.1038/s41467-020-20885-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Wen X., Pique-Regi R., Luca F. Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization. PLoS Genet. 2017;13 doi: 10.1371/journal.pgen.1006646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Sanderson E., Davey Smith G., Windmeijer F., Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 2019;48:713–727. doi: 10.1093/ije/dyy262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Anderson E.L., Howe L.D., Wade K.H., Ben-Shlomo Y., Hill W.D., Deary I.J., Sanderson E.C., Zheng J., Korologou-Linden R., Stergiakouli E., et al. Education, intelligence and Alzheimer’s disease: evidence from a multivariable two-sample Mendelian randomization study. Int. J. Epidemiol. 2020;49:1163–1172. doi: 10.1093/ije/dyz280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Burgess S., Thompson S.G. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am. J. Epidemiol. 2015;181:251–260. doi: 10.1093/aje/kwu283. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Rees J.M., Wood A.M., Burgess S. Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Stat. Med. 2017;36:4705–4718. doi: 10.1002/sim.7492. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Grant A.J., Burgess S. Pleiotropy robust methods for multivariable Mendelian randomization. Stat. Med. 2021;40:5813–5830. doi: 10.1002/sim.9156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Lin Z., Xue H., Pan W. Robust multivariable Mendelian randomization based on constrained maximum likelihood. Am. J. Hum. Genet. 2023;110:592–605. doi: 10.1016/j.ajhg.2023.02.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Yang F., Wang J., Pierce B.L., Chen L.S., Aguet F., Ardlie K.G., Cummings B.B., Gelfand E.T., Getz G., Hadley K., et al. Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis. Genome Res. 2017;27:1859–1871. doi: 10.1101/gr.216754.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Pierce B.L., Tong L., Argos M., Demanelis K., Jasmine F., Rakibuz-Zaman M., Sarwar G., Islam M.T., Shahriar H., Islam T., et al. Co-occurring expression and methylation QTLs allow detection of common causal variants and shared biological mechanisms. Nat. Commun. 2018;9:804. doi: 10.1038/s41467-018-03209-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Verbanck M., Chen C.-Y., Neale B., Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 2018;50:693–698. doi: 10.1038/s41588-018-0099-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Gleason K.J., Yang F., Pierce B.L., He X., Chen L.S. Primo: integration of multiple GWAS and omics QTL summary statistics for elucidation of molecular mechanisms of trait-associated SNPs and detection of pleiotropy in complex traits. Genome Biol. 2020;21:236. doi: 10.1186/s13059-020-02125-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Consortium G. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Umans B.D., Battle A., Gilad Y. Where are the disease-associated eQTLs? Trends Genet. 2021;37:109–124. doi: 10.1016/j.tig.2020.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Shang L., Smith J.A., Zhou X. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet. 2020;16 doi: 10.1371/journal.pgen.1008734. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Finucane H.K., Reshef Y.A., Anttila V., Slowikowski K., Gusev A., Byrnes A., Gazal S., Loh P.-R., Lareau C., Shoresh N., et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Burgess S., Scott R.A., Timpson N.J., Davey Smith G., Thompson S.G., Consortium E.-I. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur. J. Epidemiol. 2015;30:543–552. doi: 10.1007/s10654-015-0011-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Pierce B.L., Burgess S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am. J. Epidemiol. 2013;178:1177–1184. doi: 10.1093/aje/kwt084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Hekselman I., Yeger-Lotem E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 2020;21:137–150. doi: 10.1038/s41576-019-0200-9. [DOI] [PubMed] [Google Scholar]
  • 40.Ongen H., Brown A.A., Delaneau O., Panousis N.I., Nica A.C., Consortium G., Dermitzakis E.T. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 2017;49:1676–1683. doi: 10.1038/ng.3981. [DOI] [PubMed] [Google Scholar]
  • 41.Feng H., Mancuso N., Gusev A., Majumdar A., Major M., Pasaniuc B., Kraft P. Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1008973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Urbut S.M., Wang G., Carbonetto P., Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 2019;51:187–195. doi: 10.1038/s41588-018-0268-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Ma F., Meng D., Dong X., Yang Y. Self-paced multi-view co-training. J. Mach. Learn. Res. 2020;21:1–38. https://jmlr.org/papers/v21/18-794.html [Google Scholar]
  • 44.Liu J., Liu X., Yang Y., Liao Q., Xia Y. Contrastive Multi-View Kernel Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2023;45:9552–9566. doi: 10.1109/TPAMI.2023.3253211. [DOI] [PubMed] [Google Scholar]
  • 45.Wang W., Arora R., Livescu K., Bilmes J. On deep multi-view representation learning. Proc. 32nd Int. Conf. Mach. Learn. 2015;37:1083–1092. https://proceedings.mlr.press/v37/wangb15.html [Google Scholar]
  • 46.Li G., Han D., Wang C., Hu W., Calhoun V.D., Wang Y.-P. Application of deep canonically correlated sparse autoencoder for the classification of schizophrenia. Comput. Methods Progr. Biomed. 2020;183 doi: 10.1016/j.cmpb.2019.105073. [DOI] [PubMed] [Google Scholar]
  • 47.Gelman A., Carlin J.B., Stern H.S., Dunson D.B., Vehtari A., Rubin D.B. Chapman and Hall/CRC; 2014. Bayesian Data Analysis. http://www.stat.columbia.edu/~gelman/book/ [Google Scholar]
  • 48.Slob E.A., Burgess S. A comparison of robust Mendelian randomization methods using summary data. Genet. Epidemiol. 2020;44:313–329. doi: 10.1002/gepi.22295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Võsa U., Claringbould A., Westra H.-J., Bonder M.J., Deelen P., Zeng B., Kirsten H., Saha A., Kreuzhuber R., Yazar S., et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 2021;53:1300–1310. doi: 10.1038/s41588-021-00913-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Bennett D.A., Buchman A.S., Boyle P.A., Barnes L.L., Wilson R.S., Schneider J.A. Religious orders study and rush memory and aging project. J. Alzheim. Dis. 2018;64:S161–S189. doi: 10.3233/JAD-179939. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Taylor D.L., Jackson A.U., Narisu N., Hemani G., Erdos M.R., Chines P.S., Swift A., Idol J., Didion J.P., Welch R.P., et al. Integrative analysis of gene expression, DNA methylation, physiological traits, and genetic variation in human skeletal muscle. Proc. Natl. Acad. Sci. USA. 2019;116:10883–10888. doi: 10.1073/pnas.1814263116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.McRae A.F., Powell J.E., Henders A.K., Bowdler L., Hemani G., Shah S., Painter J.N., Martin N.G., Visscher P.M., Montgomery G.W. Contribution of genetic variation to transgenerational inheritance of DNA methylation. Genome Biol. 2014;15 doi: 10.1186/gb-2014-15-5-r73. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Document S1. Tables S1–S15
Table S16. List of 35 analyzed GWAS traits
Document S2. Article plus supplemental information

Data Availability Statement

All summary statistics used in this paper are publicly available. The GTEx study data (v8) are available through dbGaP under accession number phs000424.v8.p2. Summary statistics of mQTLs are available at the Enhancing GTEx (eGTEx) section of the GTEx Portal (https://gtexportal.org/home/downloads/egtex/methylation). The eQTLGen data released by the eQTLGen Consortium are available at https://www.eqtlgen.org. The ROSMAP eQTL summary statistics are available at the eQTL Catalogue (https://www.ebi.ac.uk/eqtl/). The FUSION data are available through the FUSION Skeletal Muscle Study portal (https://www.ebi.ac.uk/birney-srv/FUSION/). The BSGS+LBC mQTL summary statistics are available at https://cnsgenomics.com/data/SMR/#mQTLsummarydata. Code for the simulations and real data analyses in this paper is available at https://github.com/ylustat/MVMR-Analysis. The code for mintMR is available at https://github.com/ylustat/mintMR.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
