American Journal of Human Genetics. 2020 Sep 21;107(4):714–726. doi: 10.1016/j.ajhg.2020.08.022

Bayesian Genome-wide TWAS Method to Leverage both cis- and trans-eQTL Information through Summary Statistics

Justin M Luningham 1,5, Junyu Chen 5, Shizhen Tang 2,5, Philip L De Jager 3, David A Bennett 4, Aron S Buchman 4, Jingjing Yang 5
PMCID: PMC7536614  PMID: 32961112

Summary

Transcriptome-wide association studies (TWASs) have been widely used to integrate gene expression and genetic data for studying complex traits. Due to the computational burden, existing TWAS methods do not assess distant trans-expression quantitative trait loci (eQTL) that are known to explain important expression variation for most genes. We propose a Bayesian genome-wide TWAS (BGW-TWAS) method that leverages both cis- and trans-eQTL information for a TWAS. Our BGW-TWAS method is based on Bayesian variable selection regression, which not only accounts for cis- and trans-eQTL of the target gene but also enables efficient computation by using summary statistics from standard eQTL analyses. Our simulation studies illustrated that BGW-TWAS achieved higher power compared to existing TWAS methods that do not assess trans-eQTL information. We further applied BGW-TWAS to individual-level GWAS data (N = ∼3.3K), which identified significant associations between the genetically regulated gene expression (GReX) of ZC3H12B and Alzheimer dementia (AD) (p value = 5.42 × 10−13), neurofibrillary tangle density (p value = 1.89 × 10−6), and a global measure of AD pathology (p value = 9.59 × 10−7). These associations for ZC3H12B were completely driven by trans-eQTL. Additionally, the GReX of KCTD12 was found to be significantly associated with β-amyloid (p value = 3.44 × 10−8), which was driven by both cis- and trans-eQTL. Four of the top driving trans-eQTL of ZC3H12B are located within APOC1, a known major risk gene of AD and blood lipids. Additionally, by applying BGW-TWAS with summary-level GWAS data of AD (N = ∼54K), we identified 13 significant genes including known GWAS risk genes HLA-DRB1 and APOC1, as well as ZC3H12B.

Keywords: transcriptome-wide association study, TWAS, Bayesian variable selection model, cis-eQTL, trans-eQTL, summary statistics, Alzheimer dementia

Introduction

Although genome-wide association studies (GWASs) have identified thousands of variants associated with complex traits over the past decades,1, 2, 3, 4, 5 most of these associations are located within noncoding regions and the underlying biological mechanisms by which these variants impact a phenotype are unknown.6,7 Recent studies have shown that GWAS associations were enriched for regulatory elements such as expression quantitative trait loci (eQTL),8, 9, 10 suggesting that integrating transcriptomic and genetic data could help identify key molecular mechanisms underlying complex traits.

One such integrative method is transcriptome-wide association study (TWAS),11, 12, 13 which takes advantage of a reference panel with profiled transcriptomic and genetic data from the same individuals. A TWAS first utilizes such reference data to fit an imputation regression model for the expression quantitative trait of a target gene with nearby genotypes (e.g., cis-SNPs within 1 MB region of transcription starting site) as predictors, and then examines the gene-based association between the imputed genetically regulated gene expression (GReX) and the phenotype of interest. With fitted gene expression imputation models from reference data, TWASs can be conducted with test samples that have either individual-level or summary-level GWAS data.12, 13, 14 The SNPs with non-zero effect sizes on reference transcriptome in the fitted imputation models are referred to as broad sense “eQTL” in TWASs. Examples of publicly available reference data include the Genotype-Tissue Expression (GTEx) project with transcriptomic data for 54 human tissues,8 Genetic European Variation in Health and Disease (GEUVADIS) for lymphoblastoid cell lines,15 and North American Brain Expression Consortium (NABEC) for cortex tissues.16

Essentially, a TWAS is equivalent to a burden type gene-based test taking “cis-eQTL effect sizes” that are non-zero coefficients of cis-SNPs from the fitted GReX imputation model as their corresponding burden weights.11, 12, 13 By weighting genetic variants using cis-eQTL effect sizes, a TWAS assumes the effects of risk genes on the phenotype of interest are potentially mediated through their transcriptome variations. Recent studies of a wide range of complex traits such as schizophrenia, breast cancer, and Alzheimer dementia (AD)17, 18, 19, 20, 21 using TWASs have identified additional risk genes besides known GWAS risk loci, demonstrating that additional significant associations can be identified by TWASs.

However, existing TWAS methods only use genetic data of cis-SNPs of the target gene as predictors to fit the GReX imputation model.11, 12, 13 As shown by recent studies, trans-SNPs (e.g., outside of the 1 MB region) of the target gene not only explain a significant amount of variation for most expression quantitative traits, but also often contain significant trans-eQTL that are likely to inform molecular mechanisms.22,23 Thus, using both cis- and trans-SNPs is likely to increase the imputation accuracy of GReX and the power of TWASs. Nonetheless, the enormous computational cost required to fit ∼20K GReX imputation models for genome-wide genes and ∼10M genotypes per tissue type makes the routine use of existing TWAS methods impractical.

We propose a Bayesian genome-wide TWAS (BGW-TWAS) method that accounts for both cis- and trans-SNPs based on a Bayesian variable selection regression (BVSR) model24 for imputing GReX. Our BGW-TWAS method circumvents the current computational burden impeding TWASs by enabling efficient computation via the scalable EM-MCMC algorithm25 and the summary statistics of standard eQTL analyses based on single variant tests. First, we demonstrate the feasibility of this Bayesian approach by simulation studies with varying proportions of true causal cis- and trans-eQTL for expression quantitative traits. We compared BGW-TWAS with several existing TWAS methods including PrediXcan11 and TIGAR13 that assess only cis-SNPs. Then we applied BGW-TWAS to clinical and postmortem data from older adults with individual-level GWAS data (N = ∼3.3K)26 to study several clinical and pathological AD-related phenotypes including clinical diagnosis of AD, neurofibrillary tangle density, β-amyloid load, and a global summary measure of AD pathology. Further, we compared BGW-TWAS with alternative methods by using GWAS summary statistics for AD available from the International Genomics of Alzheimer’s Project (IGAP)27 (N = ∼54K).

Our simulation studies revealed that BGW-TWAS achieved higher TWAS power by considering both cis- and trans-SNPs when trans-eQTL accounted for a non-negligible proportion of transcriptome variance. Our studies of human AD GWAS datasets identified several risk genes associated with AD phenotypes that were driven by trans-eQTL and thus not identified by alternative methods. The software for implementing BGW-TWAS is available freely on Github (see Web Resources).

Material and Methods

TWAS Procedure

The first step in a TWAS is to train an imputation model for profiled gene expression levels using genotype data as predictors on a per-gene basis.11, 12, 13 The general imputation model based on linear regression is given by

$$E_g = Xw + \epsilon, \tag{Equation 1}$$

where $E_g$ denotes the expression quantitative trait of the target gene, centered and adjusted for non-genetic covariates; $X$ denotes centered genotype data; $w$ denotes the corresponding “eQTL” effect sizes for the target gene; and $\epsilon$ is an error term following a $N(0, \sigma_\epsilon^2 I)$ distribution. The intercept term is dropped because both the response ($E_g$) and explanatory ($X$) variables are centered. With $\hat{w}$ estimated from the training data (i.e., reference data) that have both transcriptomic and genetic data profiled for the same subjects, a TWAS tests the association between the phenotype of interest and the imputed GReX obtained from individual-level GWAS genotype data $\tilde{X}$ of the test cohort as follows:

$$\widehat{GReX} = \tilde{X}\hat{w}.$$
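The two-stage procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the weights have already been fitted on reference data, uses a simple linear regression with a normal approximation for the association test, and omits covariates. The function names `impute_grex` and `association_test` and all toy values are hypothetical.

```python
import math

def impute_grex(X_test, w_hat):
    """Return X_test @ w_hat for centered test genotypes (rows are samples)."""
    return [sum(x * w for x, w in zip(row, w_hat)) for row in X_test]

def association_test(grex, y):
    """Regress y on imputed GReX; return (beta, se, two-sided p), normal approximation."""
    n = len(y)
    mean_g, mean_y = sum(grex) / n, sum(y) / n
    sxx = sum((g - mean_g) ** 2 for g in grex)
    sxy = sum((g - mean_g) * (v - mean_y) for g, v in zip(grex, y))
    beta = sxy / sxx
    rss = sum(((v - mean_y) - beta * (g - mean_g)) ** 2 for g, v in zip(grex, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value

# Toy data (hypothetical): 6 test samples, 2 centered SNPs, weights assumed already fitted.
X_test = [[-1, 0], [0, 1], [1, -1], [-1, 1], [0, 0], [1, -1]]
w_hat = [0.5, -0.2]
y = [-0.4, 0.1, 0.9, -1.0, 0.2, 0.5]

grex = impute_grex(X_test, w_hat)
beta, se, p_value = association_test(grex, y)
print(beta, se, p_value)
```

In a real analysis the second stage would typically include covariates and, for a dichotomous phenotype, a logistic model.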

Bayesian Variable Selection Regression

Existing TWAS methods only consider SNPs within 1 MB of the flanking 5' and 3' ends (cis-SNPs) in the gene expression imputation model (Equation 1).11, 12, 13 In order to leverage additional information provided by trans-eQTL that are located outside the 1 MB flanking region of the target gene, we utilize the Bayesian Variable Selection Regression (BVSR)24 model to account for both cis- and trans-SNPs as follows:

$$E_g = X_{cis} w_{cis} + X_{trans} w_{trans} + \epsilon, \quad \epsilon_i \sim N(0, \sigma_\epsilon^2). \tag{Equation 2}$$

The BVSR model assumes a spike-and-slab prior distribution for wi. That is, the prior on wi is a mixture distribution of a normal distribution with zero mean and a point-mass density function at 0. In order to model potentially different distributions of the effect sizes for cis- and trans-SNPs, we assume the following respective priors,

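$$w_{cis,i} \sim \pi_{cis} N(0, \sigma_{cis}^2 \sigma_\epsilon^2) + (1 - \pi_{cis}) \delta_0(w_{cis,i});$$
$$w_{trans,i} \sim \pi_{trans} N(0, \sigma_{trans}^2 \sigma_\epsilon^2) + (1 - \pi_{trans}) \delta_0(w_{trans,i}); \tag{Equation 3}$$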

where $(\pi_{cis}, \pi_{trans})$ denote the respective probabilities that the coefficient is non-zero and normally distributed, and $\delta_0(w_i)$ is the point-mass density function that takes value 0 when $w_i \neq 0$ and 1 when $w_i = 0$. Further, the following conjugate hyper prior distributions are respectively assumed for the cis- and trans-specific parameters,

$$\pi_{cis} \sim Beta(a_{cis}, b_{cis}); \quad \sigma_{cis}^2 \sim IG(k_1, k_2);$$
$$\pi_{trans} \sim Beta(a_{trans}, b_{trans}); \quad \sigma_{trans}^2 \sim IG(k_3, k_4); \tag{Equation 4}$$

where IG indicates the Inverse Gamma distribution and hyper parameters (acis,bcis,atrans,btrans,k1,k2,k3,k4) will be chosen to enable non-informative hyper prior distributions (see Supplemental Material and Methods for model details).
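To make the spike-and-slab prior concrete, the following sketch draws effect sizes from the mixture in Equation 3. It is only an illustration under simplifying assumptions: the inclusion probability and slab variance are fixed at hypothetical values rather than given the hyper priors of Equation 4, and the function name `sample_spike_and_slab` is not from the BGW-TWAS software.

```python
import random

def sample_spike_and_slab(n_snps, pi, sigma2, sigma2_eps, rng):
    """Draw n_snps effects from pi * N(0, sigma2 * sigma2_eps) + (1 - pi) * delta_0."""
    slab_sd = (sigma2 * sigma2_eps) ** 0.5
    return [rng.gauss(0.0, slab_sd) if rng.random() < pi else 0.0
            for _ in range(n_snps)]

rng = random.Random(1)
# Hypothetical settings: cis-SNPs get a larger inclusion probability than trans-SNPs.
w_cis = sample_spike_and_slab(1000, pi=0.01, sigma2=1.0, sigma2_eps=1.0, rng=rng)
w_trans = sample_spike_and_slab(100000, pi=1e-5, sigma2=1.0, sigma2_eps=1.0, rng=rng)

# Most draws are exactly zero; only a sparse subset comes from the normal slab.
print(sum(w != 0.0 for w in w_cis), sum(w != 0.0 for w in w_trans))
```

Using separate (π, σ²) pairs for the two SNP classes is what lets the model expect sparser signals among the far more numerous trans-SNPs.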

To facilitate computation, a latent indicator γi is assumed such that wi = 0 if γi=0, and wi follows a normal distribution if γi=1. Then the expected value of this indicator, E[γi], represents the posterior probability (PPi) for each individual SNP to have a non-zero effect size (i.e., to be an eQTL of the target gene).24 Moreover, we propose a Bayesian approach to estimate GReX for test samples that can account for the uncertainty for each SNP to be an eQTL:

$$\widehat{GReX} = \sum_{i=1}^{p} \tilde{X}_i \widehat{PP}_i \hat{w}_i, \tag{Equation 5}$$

where $\tilde{X}_i$ represents the genotype data of variant $i$ for test samples and $(\hat{w}_i, \widehat{PP}_i)$ denote the estimates of the effect size and the posterior probability (PP) of having a non-zero effect size from the BVSR model (Equations 2, 3, and 4) (see Supplemental Material and Methods for detailed Bayesian inference procedure). This Bayesian GReX estimate can then be used to conduct a TWAS with individual-level GWAS data by testing the association between the imputed GReX and the phenotype of interest.
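As a small worked example of Equation 5, the sketch below weights each SNP's effect-size estimate by its posterior probability before summing over SNPs. The function name `bayesian_grex` and the numbers are hypothetical and chosen only to show how a SNP with a low PP contributes little to the imputed value.

```python
def bayesian_grex(X_test, pp_hat, w_hat):
    """Equation 5: sum over SNPs of genotype * posterior probability * effect size."""
    weights = [pp * w for pp, w in zip(pp_hat, w_hat)]
    return [sum(x * wt for x, wt in zip(row, weights)) for row in X_test]

# Toy inputs (hypothetical): 2 test samples, 3 SNPs with centered genotypes.
X_test = [[1.0, -1.0, 0.0],
          [0.0, 1.0, 1.0]]
pp_hat = [0.9, 0.01, 0.5]   # posterior probabilities of a non-zero effect
w_hat = [0.2, 0.5, -0.1]    # effect-size estimates

grex = bayesian_grex(X_test, pp_hat, w_hat)
print(grex)
```

Here the second SNP has the largest effect-size estimate but a PP of only 0.01, so its weight (0.005) is far smaller than that of the first SNP (0.18).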

BGW-TWAS with Summary-Level GWAS Data

With summary-level GWAS data that were generated by single variant tests, we employed the S-PrediXcan14 approach to obtain a burden TWAS Z-score test statistic, including not only cis- but also trans-eQTL in the test. Let $\hat{\beta}_l$ denote the effect size of SNP $l$ from GWAS, $SE(\hat{\beta}_l)$ denote the standard error of $\hat{\beta}_l$, $Z_l$ denote the Z-score statistic value by single variant test, $\hat{\sigma}_l$ denote the estimated standard deviation of the genotype data of SNP $l$ from the reference panel, and $\hat{\sigma}_g$ denote the estimated standard deviation of the imputed expression of gene $g$ from the reference panel. The burden TWAS Z-score test statistic for gene $g$ is given by

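$$Z_g = \sum_{l \in Model_g} \hat{w}_{lg} \frac{\hat{\sigma}_l}{\hat{\sigma}_g} \frac{\hat{\beta}_l}{SE(\hat{\beta}_l)} = \sum_{l \in Model_g} \hat{w}_{lg} \frac{\hat{\sigma}_l}{\hat{\sigma}_g} Z_l = \frac{\sum_{l \in Model_g} (\hat{w}_{lg} \hat{\sigma}_l) Z_l}{\sqrt{\hat{w}' V \hat{w}}},$$
$$\hat{\sigma}_l^2 = Var(x_l), \quad \hat{\sigma}_g^2 = \hat{w}' V \hat{w}, \quad V = Cov(X),$$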

where $\hat{w}_{lg} = \widehat{PP}_l \hat{w}_l$ is the product of the posterior probability for SNP $l$ to have a non-zero eQTL effect size and the corresponding effect-size estimate from the BVSR model (Equation 2). Here, $X$ denotes the genotype matrix of analyzed SNPs from reference panels of the same ethnicity and $V$ denotes the corresponding genotype covariance matrix.
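The burden Z-score above can be computed directly from per-SNP weights, GWAS Z-scores, and a reference genotype covariance matrix. The sketch below is a plain-Python illustration under stated assumptions: `burden_z` is a hypothetical helper (not the S-PrediXcan or BGW-TWAS API), the covariance matrix is passed as a nested list, and all numbers are made up.

```python
import math

def burden_z(weights, snp_sd, z_scores, V):
    """Z_g = sum_l (w_l * sd_l * Z_l) / sqrt(w' V w)."""
    n = len(weights)
    numerator = sum(w * s * z for w, s, z in zip(weights, snp_sd, z_scores))
    var_g = sum(weights[i] * V[i][j] * weights[j]
                for i in range(n) for j in range(n))
    return numerator / math.sqrt(var_g)

# Toy inputs (hypothetical): two SNPs in the model for one gene.
V = [[0.42, 0.10],
     [0.10, 0.32]]                       # reference genotype covariance
snp_sd = [math.sqrt(V[0][0]), math.sqrt(V[1][1])]
pp_hat = [0.8, 0.05]                     # BVSR posterior probabilities
w_hat = [0.3, -0.2]                      # BVSR effect-size estimates
weights = [pp * w for pp, w in zip(pp_hat, w_hat)]
z_scores = [3.1, -1.2]                   # single-variant GWAS Z-scores

z_g = burden_z(weights, snp_sd, z_scores, V)
print(z_g)
```

A useful sanity check is the single-SNP case: with one SNP and a positive weight, the formula reduces to that SNP's own GWAS Z-score.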

Efficient Computation Techniques

In theory, the estimates of eQTL effect sizes and corresponding posterior probabilities $(\hat{w}_i, \widehat{PP}_i)$ can be obtained by using a standard Markov Chain Monte Carlo (MCMC)28 algorithm. However, in practice, the computational burden of modeling genome-wide genotype data is prohibitive because of the enormous memory required and the slow convergence rate of MCMC. To circumvent these practical limitations, we employ several techniques to enable computational efficiency such that the BGW-TWAS method can be deployed to leverage both cis- and trans-eQTL information in practice. In particular, we adapt a previously developed scalable expectation-maximization Markov Chain Monte Carlo (EM-MCMC) algorithm.25 Unlike the original EM-MCMC algorithm, which requires individual-level GWAS data, our adaptation utilizes only summary statistics, including the pre-calculated linkage disequilibrium (LD) coefficients and score statistics from standard eQTL analyses by single variant tests, which reduces the computation time by up to 90%. Additionally, we prune genome-wide genotypes into a subset of genome regions that are approximately independent and contain either at least one cis-SNP or one trans-SNP with p value < 1 × 10−5 by standard eQTL analyses (technical details are provided in Supplemental Material and Methods).

Simulation Study Design

We conducted simulation studies to validate the performance of our proposed BGW-TWAS method by comparing it with existing alternative methods, namely PrediXcan, TIGAR, and BVSR using only cis-eQTL. To mimic real studies, we used real genotype data from the ROS/MAP study to simulate gene expression and phenotype data. We took 499 samples as our training data and 1,209 samples as our test data. GReX imputation models were fitted using the training data where “eQTL” effect sizes and the corresponding posterior probabilities were estimated. Given these fitted GReX imputation models, GReX data were imputed for a follow-up TWAS with the test data.

We arbitrarily selected five approximately independent genome blocks, including one “cis-” and four “trans-” genotype blocks (variants were filtered with minor allele frequency (MAF) > 5% and Hardy-Weinberg p value > 10−5). With genotype matrix $X_g$ of the randomly selected causal eQTL, we generated effect sizes $w_i$ to target a selected gene expression heritability $h_e^2$ such that all causal eQTL explain equal expression heritability. Gene expression levels were generated by $E_g = X_g w + \epsilon_e$, with $\epsilon_e \sim N(0, 1 - h_e^2)$. Then we simulated phenotypes by $Y = \beta E_g + \epsilon_p$, where $\beta$ was selected with respect to a selected phenotype heritability $h_p^2$ and $\epsilon_p \sim N(0, 1 - h_p^2)$.
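The generative scheme just described can be sketched as follows. This is an illustrative reconstruction, not the authors' simulation code, and it makes several assumptions that are not stated in the text: genotypes are drawn at random instead of taken from ROS/MAP, causal SNPs are treated as uncorrelated when scaling effect sizes, all effects are positive, and β is set to the square root of the phenotype heritability on the assumption that simulated expression has variance close to 1. All function names are hypothetical.

```python
import math
import random

def centered_genotypes(n, maf, rng):
    """Draw n diploid genotypes (0/1/2) at allele frequency maf, then center them."""
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    mean = sum(g) / n
    return [v - mean for v in g]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def simulate_expression_and_phenotype(causal_cols, h2_e, h2_p, rng):
    """Return (w, expression, phenotype) for one simulated gene."""
    n, m = len(causal_cols[0]), len(causal_cols)
    # Each causal eQTL explains h2_e / m of expression variance
    # (exact only when the causal SNPs are uncorrelated).
    w = [math.sqrt(h2_e / (m * variance(col))) for col in causal_cols]
    expr = [sum(w[j] * causal_cols[j][i] for j in range(m))
            + rng.gauss(0.0, math.sqrt(1.0 - h2_e)) for i in range(n)]
    beta = math.sqrt(h2_p)  # assumes Var(E_g) is approximately 1
    pheno = [beta * e + rng.gauss(0.0, math.sqrt(1.0 - h2_p)) for e in expr]
    return w, expr, pheno

rng = random.Random(0)
causal_cols = [centered_genotypes(500, 0.3, rng) for _ in range(5)]
w, expr, pheno = simulate_expression_and_phenotype(causal_cols, 0.2, 0.2, rng)
print(variance(expr), variance(pheno))
```

With this scaling, both simulated traits have variance near 1, so the residual variances 1 − h² leave the intended share of variance to the genetic component.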

To mimic the complex genomic architecture of gene expression in practice, we considered two scenarios, one with 5 true causal eQTL representing the scenario with relatively large effect sizes and the other with 22 true causal eQTL representing the scenario with relatively small effect sizes. For the scenario with 5 true causal eQTL, we considered three sub-scenarios with respect to how these true causal eQTL were distributed over the considered genome blocks: (1) all causal eQTL are from the cis-block; (2) two causal eQTL are from the cis-block explaining 70% of the specified $h_e^2$ while the other three causal eQTL are from the trans-blocks explaining the other 30% of $h_e^2$; (3) all causal eQTL are from the trans-blocks. Similarly, for the scenario with 22 true causal eQTL, we considered three sub-scenarios where 30%, 50%, and 70% of the causal eQTL were from cis-genome blocks. We also varied the total expression trait heritability and phenotype heritability in both scenarios, i.e., $(h_e^2, h_p^2)$ = ((5%, 90%), (10%, 45%), (20%, 20%), (50%, 6%)) for the scenario with 5 true causal eQTL and $(h_e^2, h_p^2)$ = ((5%, 99%), (10%, 80%), (20%, 35%), (50%, 8%)) for the scenario with 22 true causal eQTL. Here, different levels of phenotype heritability were arbitrarily selected to achieve similar levels of TWAS power across all scenarios.

In each simulation, with training data, we first fitted GReX imputation models by BVSR (BGW-TWAS) with both cis- and trans-genome blocks, as well as by Elastic-Net (PrediXcan) and nonparametric Bayesian Dirichlet process regression (TIGAR) with only the cis-genome block. Then we conducted a TWAS with the GReX imputed by each respective method. We also compared BGW-TWAS with using only cis-eQTL estimates from the same BVSR model. The performance was compared in terms of R2 of the imputed GReX and TWAS power in test samples. Test R2 was calculated as the squared correlation between imputed GReX and simulated gene expression values of the test samples. TWAS power was calculated as the proportion of 1,000 repeated simulations of each scenario with p value < 2.5 × 10−6 (genome-wide significance threshold for gene-based association studies).
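The two evaluation metrics defined above are simple to compute. The sketch below shows one way to do so; the helper names `squared_correlation` and `twas_power` and the toy inputs are hypothetical, and the threshold default matches the 2.5 × 10−6 cutoff stated in the text.

```python
def squared_correlation(a, b):
    """Test R2: squared Pearson correlation between imputed and true expression."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

def twas_power(p_values, alpha=2.5e-6):
    """Proportion of simulation replicates with a TWAS p value below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Toy inputs (hypothetical).
imputed = [0.1, -0.3, 0.4, 0.0, -0.2]
observed = [0.5, -0.9, 1.1, 0.2, -0.4]
replicate_p_values = [1e-7, 0.5, 1e-9, 2.5e-6]

r2 = squared_correlation(imputed, observed)
power = twas_power(replicate_p_values)
print(r2, power)
```

Note that the comparison is strict, so a replicate with a p value exactly equal to the threshold is not counted as a detection.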

ROS/MAP and Mayo Clinic GWAS Data of AD

Following simulation studies, we applied BGW-TWAS method to individual-level genomic and AD related phenotype data from older adults available from several studies. We used transcriptomic data, GWAS data, clinical diagnosis of AD and postmortem indices of AD pathology from the Religious Orders Study (ROS) and Rush Memory and Aging Project (MAP)29, 30, 31 and GWAS data from the Mayo Clinic Alzheimer’s Disease Genetics Studies (MCADGS).32, 33, 34 All participants from ROS/MAP sign an informed consent, an Anatomic Gift Act, and a consent for their data to be deposited in the Rush Alzheimer’s Disease Center (RADC) repository. ROS/MAP studies were approved by the Institutional Review Board of Rush University Medical Center, Chicago, IL. MCADGS contains samples from two clinical AD case-control series (Mayo Clinic Jacksonville and Mayo Clinic Rochester) as well as a neuropathological series of autopsy-confirmed subjects from the Mayo Clinic Brain Bank.

Microarray genotype data generated for 2,093 European-descent subjects from ROS/MAP35 and 2,099 European-descent subjects from MCADGS were further imputed to the 1000 Genomes Project Phase 336 in our analysis.37

Post-mortem brain samples from the dorsal lateral prefrontal cortex from ∼30% of these ROS/MAP participants with assayed genotype data were profiled for transcriptomic data by next-generation RNA sequencing.38 These data were used as reference data to train GReX prediction models in this study. We conducted TWASs for both clinical and pathological AD phenotypes. The clinical diagnosis of late-onset Alzheimer dementia was available for both ROS/MAP and MCADGS. Postmortem pathology indices of AD were only available for ROS/MAP and included PHFtau tangle density, β-amyloid load, and a global measure of AD pathology based on measures of neuritic and diffuse plaques and neurofibrillary tangles.29, 30, 31 Additional details about the ongoing ROS/MAP cohort studies and how postmortem indices of tangles and β-amyloid load were quantified are included in prior publications29, 30, 31 and summarized in the Supplemental Material and Methods.

Results

Simulation Results

For the scenario with five true causal eQTL and various expression heritability, our simulation studies showed that BGW-TWAS obtained higher test R2 for GReX and higher TWAS power than PrediXcan and TIGAR when any portion of the true causal eQTL were distributed over trans-genome blocks (Figures 1A and 1B). This is because BGW-TWAS leverages both cis- and trans-eQTL information while the alternative methods fail to account for trans-eQTL. In particular, when all true causal eQTL are from trans-genome regions, the alternative methods have nearly zero test R2 and barely any power to identify the TWAS association. As expected, BGW-TWAS and PrediXcan performed comparably when all causal eQTL were from the cis-genome block, while TIGAR performed slightly worse with sparse true causal eQTL (Figure 1A).

Figure 1.


Simulation TWASs Comparing BGW-TWAS, BVSR with cis-eQTL only, PrediXcan, and TIGAR Methods

Simulation studies used various gene expression heritability $h_e^2 = (0.05, 0.1, 0.2, 0.5)$ and various true causal cis-eQTL proportions. Test R2 was calculated as the squared correlation between imputed GReX and simulated gene expression values of the test samples.

(A and B) Test R2 and TWAS power comparison with 5 true causal eQTL. BGW-TWAS was found to out-perform the alternative methods when a non-negligible proportion of true causal eQTL were from trans-genome regions.

(C and D) Test R2 and TWAS power comparison with 22 true causal eQTL. BGW-TWAS was found to out-perform the alternative methods when >50% of true causal eQTL were from trans-genome regions and $h_e^2 > 0.1$.

For the scenario with 22 mixed cis- and trans-eQTL, the performance comparison became more complicated with respect to various true expression heritability levels (Figures 1C and 1D). In particular, when $h_e^2 = 0.05$, all methods had difficulty accurately estimating eQTL effect sizes, resulting in nearly zero test R2. As expression heritability increased, the advantage of modeling both cis- and trans-genotype data by BGW-TWAS emerged and led to higher test R2 and TWAS power. When $h_e^2 = (0.1, 0.2)$ and 70% of the true causal eQTL were cis-, BGW-TWAS was less effective than PrediXcan and TIGAR while TIGAR achieved the best performance. This is likely because the nonparametric Bayesian Dirichlet process regression model used by TIGAR is preferred when true causal eQTL manifest relatively small effect sizes, which is consistent with previous findings.13

In contrast, when true causal eQTL signals have relatively large effect sizes and are distributed outside the cis-region of the target gene, BGW-TWAS method is preferred due to the improved accuracy for GReX prediction by leveraging trans-SNP data. By comparing with using only BVSR estimates of cis-eQTL, we showed that a significant proportion of transcriptome variation due to trans-eQTL was missed and the follow-up TWAS was underpowered.

TWAS of AD-Related Phenotypes with Individual-Level GWAS Data

Next, we applied BGW-TWAS to the individual-level GWAS data from ROS/MAP26,31 and MCADGS.32 First, we trained the BVSR GReX imputation models using samples (n = 499) from the ROS/MAP cohort that contained both profiled genotype data and transcriptomic data obtained from the dorsal lateral prefrontal cortex. All expression quantitative traits were normalized and corrected for age at death, sex, postmortem interval (PMI), study (ROS or MAP), batch effects, RNA integrity number (RIN), top three principal components derived from genome-wide genotype data, and cell type proportions (oligodendrocytes, astrocytes, microglia, neurons). The cell type proportions were derived by using CIBERSORT pipeline39 with single-cell RNA-seq transcriptome profiles from human brain tissues as in Darmanis et al. 40 to de-convolute bulk RNA-seq data.41

When we applied BGW-TWAS, we obtained GReX imputation models for 14,156 genes, compared to 6,011 and 14,214 genes by PrediXcan and TIGAR, respectively, that have at least one cis-eQTL with nonzero effect size on the expression quantitative trait. Across the 6,011 genes with GReX imputation models by PrediXcan, our BGW-TWAS approach had a smaller train R2 (squared correlation between fitted and profiled gene expression values) for expression quantitative traits for only 855 genes (Figure S1A). While TIGAR and BGW-TWAS yielded a similar number of GReX imputation models, the BGW-TWAS method is expected to result in higher imputation accuracy when trans-eQTL play an important role in affecting gene expression levels, as shown by our simulation results. Of 13,142 genes that had imputation models fitted by both TIGAR and BGW-TWAS, BGW-TWAS had smaller train R2 for only 3,304 genes. That is, BGW-TWAS would be preferred for genes that have sparse eQTL, especially trans-eQTL, while TIGAR would be preferred for genes that have less sparse eQTL that are mostly cis-eQTL (Figure S1B).

We imputed Bayesian GReX values for all remaining individuals with genotype data in ROS/MAP and MCADGS by using Equation 5. We then conducted TWASs by testing the association between the standardized GReX values (with unit variance) and both clinical and pathological AD phenotypes. The TWASs for these phenotypes controlled for age at death, sex, smoking, ROS or MAP study, education level, and top three principal components derived from genome-wide genotype data.

For the dichotomous phenotype of clinical diagnosis of AD, the case/control status was determined by different rules and the available confounding variables were different for ROS/MAP and MCADGS. Cognitive status at death for individuals from the ROS/MAP cohort is based on the review of all longitudinal clinical data available at the time of death blinded to all pathologic data. Individuals were classified as having no cognitive impairment, mild cognitive impairment, or AD. In this study, samples from individuals with AD were taken as case subjects and samples from individuals without dementia (i.e., either with no cognitive impairment or mild cognitive impairment) were taken as control subjects. For the MCADGS samples, case subjects were determined for samples with a medical history of late-onset AD diagnosis, and available confounding variables included only age, sex, and top three principal components derived from GWAS data. Therefore, we meta-analyzed these two cohorts for AD clinical diagnosis by applying the inverse-variance weighting method42 to summary statistics obtained by TWAS per cohort. We compared the meta-TWAS results obtained with BGW-TWAS to alternative TWAS methods.
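The inverse-variance weighting step can be illustrated with a short fixed-effect meta-analysis sketch. This is a generic textbook form of the method, assuming per-cohort effect sizes and standard errors are available; the function name `ivw_meta` and the toy numbers are hypothetical and are not the per-cohort values from this study.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    Returns (pooled beta, pooled SE, two-sided p value, normal approximation).
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value

# Toy per-cohort TWAS results (hypothetical numbers).
betas = [0.20, 0.40]
ses = [0.10, 0.10]

beta_meta, se_meta, p_meta = ivw_meta(betas, ses)
print(beta_meta, se_meta, p_meta)
```

With equal standard errors the pooled estimate is the simple average of the cohort estimates, and the pooled standard error shrinks by a factor of √2; cohorts with smaller standard errors receive proportionally more weight.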

BGW-TWAS identified ZC3H12B (located on chromosome X), whose GReX values were associated with AD with effect size β = 0.265, p value = 5.42 × 10−13, and FDR = 3.07 × 10−8 (Figure 2A; Table 1). Both within-cohort TWASs obtained positive effect sizes (β = 0.22 and 0.29 for ROS/MAP and MCADGS, respectively), with respective p values of 2 × 10−4 and 4.12 × 10−10. On the other hand, this gene was not identified by either PrediXcan or TIGAR because the association of this gene is completely driven by trans-eQTL (Figures S2 and S4).

Figure 2.


Manhattan Plots of BGW-TWAS Results of AD Clinical Diagnosis and Global AD Pathology

Here, -log10(p values) by BGW-TWAS were plotted and red lines denote genome-wide significant threshold (2.5 × 10−6) for gene-based association studies. ZC3H12B was found to be significantly associated with both AD clinical diagnosis (A) and global AD pathology (B).

Table 1.

Significant Risk Genes Identified by BGW-TWAS using Individual-Level GWAS Data from ROS/MAP and MCADGS Cohorts

Gene CHR Position Train R2 p Value Effect Size (SD) Phenotype
ZC3H12B X 64,708,614 0.24 5.42 × 10−13 0.265 (0.037) AD
ZC3H12B X 64,708,614 0.24 9.59 × 10−7 0.142 (0.029) global AD pathology
ZC3H12B X 64,708,614 0.24 1.89 × 10−6 0.138 (0.029) tangles
KCTD12 13 77,454,311 0.09 3.44 × 10−8 0.143 (0.026) β-amyloid

TWASs of pathological AD phenotypes were restricted to ROS/MAP participants, from whom postmortem AD indices were collected. TWASs were conducted for individuals with GWAS genotype data and AD pathology indices: tangles (n = 1,121), β-amyloid (n = 1,114), and global AD pathology (n = 1,139). These results are shown in the Manhattan plots in Figures 2B, 3A, 3B, S2, S3, and S5–S7.

Figure 3.


Manhattan Plots of BGW-TWAS Results of Neurofibrillary Tangle Density and β-Amyloid Load

Here, -log10(p values) by BGW-TWAS were plotted and red lines denote genome-wide significant threshold (2.5 × 10−6) for gene-based association studies. ZC3H12B was found to be significantly associated with neurofibrillary tangle density (A). KCTD12 was found to be significantly associated with β-amyloid load (B).

Using BGW-TWAS, ZC3H12B was identified to be associated with global AD pathology with p value = 9.59 × 10−7 (Figure 2B; Table 1) as well as neurofibrillary tangle density with p value = 1.89 × 10−6 (Figure 3A; Table 1). KCTD12 located on chromosome 13 was identified to be significantly associated with β-amyloid load with p value = 3.44 × 10−8 (Figure 3B; Table 1).

We show the BVSR posterior probabilities for considered SNPs to be eQTL for ZC3H12B and KCTD12 in Figure 4 and the standard eQTL analyses results for these two genes in Figure S4. These data suggest that the association between the imputed GReX values of ZC3H12B and AD phenotypes is completely driven by trans-eQTL, while the association between the GReX values of KCTD12 and β-amyloid load is driven by both cis- and trans-eQTL. Four of the top driving trans-eQTL (rs12721051, rs4420638, rs56131196, rs157592; Table 2) for ZC3H12B are located in APOC1, a known risk gene for AD43 and blood lipids,44, 45, 46 which is <12 KB away from the well-known AD risk gene APOE.47 Specifically, rs12721051 located in the 3' UTR region of APOC1 was identified as a GWAS signal of total cholesterol levels;46 rs4420638 located downstream of APOC1 is in linkage disequilibrium (LD) with the APOE-E4 allele (rs429358) and was identified to be a GWAS signal of various blood lipids measurements (i.e., low-density lipoprotein cholesterol measurement, C-reactive protein measurement, triglyceride measurement, and total cholesterol measurement)44 and AD;48 rs56131196 located downstream and rs157592 located in the regulatory region of APOC1 were identified as GWAS signals of AD and independent of APOE-E4.49 Additionally, ZC3H12B was found to regulate pro-inflammatory activation of macrophages50 and has higher expression in brain, spinal cord, and thymus tissue types compared to other tissues.51 These results suggest that the effects of these known GWAS signals (rs4420638, rs56131196, rs157592) of AD could be mediated through the expression levels of ZC3H12B.

Figure 4.


BVSR Posterior Probabilities (PP) of Having Non-zero eQTL Effect Sizes for Analyzed cis- and trans-SNPs, with Target Genes ZC3H12B and KCTD12

(A) ZC3H12B located on chromosome X has top eQTL from chromosomes 1, 6, and 19, all of which are trans-eQTL.

(B) KCTD12 located on chromosome 13 has top cis-eQTL from chromosome 13 and trans-eQTL from chromosomes 4 and 6.

Table 2.

trans-SNPs with Top Five Posterior Probability (PP) > 0.003 of Having Non-zero eQTL Effect Sizes for ZC3H12B

CHR POS rsID Function MAF PP w p Value
1 159,135,282 rs3026946 intergenic 0.213 0.0147 −0.071 6.25 × 10−7
19 45,422,160 rs12721051 3' UTR (APOC1) 0.161 0.0031 0.071 3.94 × 10−6
19 45,422,846 rs56131196 downstream (APOC1) 0.173 0.0048 0.069 1.75 × 10−6
19 45,422,946 rs4420638 downstream (APOC1) 0.173 0.0051 0.068 1.77 × 10−6
19 45,424,514 rs157592 regulatory region (APOC1) 0.181 0.0056 0.075 1.43 × 10−6

TWAS of AD with Summary-Level GWAS Data

To validate our findings using individual-level GWAS data from ROS/MAP and MCADGS, we conducted a TWAS of AD using the publicly available IGAP GWAS summary statistics.27 Specifically, we used the GWAS summary statistics that were generated by meta-analysis of four consortia (∼17K case subjects and ∼37K control subjects, Europeans): the Alzheimer’s Disease Genetic Consortium (ADGC), the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the European Alzheimer’s Disease Initiative (EADI), and the Genetic and Environmental Risk in Alzheimer’s Disease (GERAD) Consortium.

BGW-TWAS identified 13 significant genes located on chromosomes 3, 6, 7, 10, 11, 19, and X, including known GWAS risk genes HLA-DRB152 and APOC1, as well as ZC3H12B, which was also identified using individual-level GWAS data from ROS/MAP and MCADGS (Table 3). Moreover, seven of these genes (including HLA-DRB1 and APOC1) were also identified when we only considered cis-eQTL estimates by BVSR in the TWAS. CEACAM19 near the well-known GWAS risk gene APOE was also identified by S-PrediXcan and TIGAR. Known GWAS risk gene HLA-DRB152 was also identified by S-PrediXcan (Table 3; Tables S1–S3).

Table 3.

Significant Genes Identified by BGW-TWAS using IGAP GWAS Summary Statistics of AD

Gene CHR Position TWAS p Value
BGW-TWAS BVSR cis-eQTL PrediXcan TIGAR
GPX1a 3 49,394,608 2.45 × 10−98 2.45 × 10−98 3.15 × 10−1
FAM86DP 3 75,484,261 1.55 × 10−13 4.81 × 10−1 5.38 × 10−1 9.63 × 10−1
BTN3A2a 6 26,378,546 1.59 × 10−26 1.56 × 10−26 3.17 × 10−1 5.04 × 10−1
ZNF192a 6 28,124,089 1.26 × 10−32 1.25 × 10−32 8.56 × 10−2 2.07 × 10−1
AL022393.7a 6 28,144,452 3.25 × 10−178 2.24 × 10−178 1.50 × 10−1 8.36 × 10−2
HLA-DRB1a,b 6 32,557,625 1.02 × 10−12 8.99 × 10−13 2.06 × 10−6
AEBP1 7 44,154,161 5.55 × 10−220 8.62 × 10−1 6.69 × 10−1 4.19 × 10−1
BUB3 10 124,924,886 6.64 × 10−18 1.05 × 10−2 4.76 × 10−1
FBXO3 11 33,796,089 1.48 × 10−9 6.88 × 10−1 1.13 × 10−1
CEACAM19a,b,c 19 45,187,631 4.7 × 10−13 2.54 × 10−13 3.60 × 10−12 2.83 × 10−16
APOC1a 19 45,422,606 8.9 × 10−11 1.11 × 10−10 3.18 × 10−6 7.2 × 10−3
ZC3H12B X 64,727,767 2.08 × 10−37
CXorf56 X 118,699,397 6.02 × 10−7

TWAS p values from alternative methods (BVSR with cis-eQTL estimates only, PrediXcan, and TIGAR) are also provided. p values for genes that were missed by a TWAS method are indicated as “–.” ZC3H12B, which was identified by BGW-TWAS using individual-level GWAS data from ROS/MAP and MCADGS, was also identified by BGW-TWAS using IGAP summary-level GWAS statistics. CEACAM19 on chromosome 19 was identified by all TWAS methods, and HLA-DRB1 on chromosome 6 is a known GWAS risk locus.

a Genes that were also identified as significant by using BVSR cis-eQTL estimates.

b Genes that were also identified by PrediXcan.

c Genes that were also identified by TIGAR.

Our results showed that BGW-TWAS, which uses BVSR estimates of both cis- and trans-eQTL, identified the largest number of independent risk loci, including loci driven by trans-eQTL. For those significant genes driven mainly by cis-eQTL, a TWAS using BVSR estimates of cis-eQTL still identified more independent significant risk loci (distributed over chromosomes 2, 3, 6, 11, and 19) than S-PrediXcan and TIGAR, including all 4 significant genes (HLA-DRB1, SLC39A13, PVR, CEACAM19) identified by S-PrediXcan and 4 out of 21 significant genes (ZNF227, ZFP112, PVR, CEACAM19) identified by TIGAR (Tables S1–S3). Although TIGAR identified the largest number of significant TWAS genes (21), these genes are on chromosomes 11 and 19 and are likely to be driven by the same cis-eQTL from two independent loci.

These TWAS results using summary-level GWAS data with a much larger sample size validated our findings obtained with BGW-TWAS using individual-level GWAS data from ROS/MAP and MCADGS.

Insights about eQTL Genetic Architecture

In addition to imputing Bayesian GReX values (Equation 5), the posterior probabilities of having non-zero eQTL effect sizes estimated by BVSR also provide insights into the genetic architecture of eQTL, especially about how potential eQTL are distributed across the genome. Note that the posterior probability obtained from the BVSR model (Equations 2, 3, and 4) is essentially the expected probability for a SNP to be an eQTL. Therefore, the sum of posterior probabilities of having non-zero eQTL effect sizes represents the expected number of eQTL.
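The step from posterior probabilities to an expected eQTL count follows from linearity of expectation: if each SNP is treated as an indicator that is an eQTL with probability PP, the expected number of eQTL is the sum of the PPs. A minimal sketch, using only the five trans-SNPs listed in Table 2 for ZC3H12B (so the total covers just those five SNPs, not the whole genome); the helper name `expected_eqtl_count` is hypothetical:

```python
# Posterior probabilities (PP) of the five trans-SNPs listed in Table 2 for ZC3H12B.
# All five are labeled "trans" because Table 2 reports trans-SNPs only.
snps = [
    {"rsid": "rs3026946", "region": "trans", "pp": 0.0147},
    {"rsid": "rs12721051", "region": "trans", "pp": 0.0031},
    {"rsid": "rs56131196", "region": "trans", "pp": 0.0048},
    {"rsid": "rs4420638", "region": "trans", "pp": 0.0051},
    {"rsid": "rs157592", "region": "trans", "pp": 0.0056},
]

def expected_eqtl_count(records, region=None):
    """Sum the PPs, optionally restricted to 'cis' or 'trans' SNPs."""
    return sum(r["pp"] for r in records if region is None or r["region"] == region)

total = expected_eqtl_count(snps)
cis = expected_eqtl_count(snps, "cis")
trans = expected_eqtl_count(snps, "trans")
print(f"expected eQTL among these SNPs: {total:.4f} (cis {cis:.4f}, trans {trans:.4f})")
```

Applied to all SNPs considered for a gene, the same sum gives the whole-genome, cis-region, and trans-region quantities of the kind reported in Table 4.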

From our simulation studies, we observed that the expected proportions of cis-eQTL were consistent with the true proportions of causal cis-eQTL. The expected number of eQTL obtained across simulation scenarios is presented in Table 4, where two out of five (40%) causal eQTL and 11 out of 22 (50%) causal eQTL are from cis-genome regions. We can see that, with higher true expression heritability, the expected number of eQTL is closer to the true number of causal eQTL. We can also see that the expected number of eQTL is more accurate for the scenario with 5 true causal eQTL than with 22 true causal eQTL, which is due to the fact that the BVSR model prefers relatively larger eQTL effect sizes. These simulation results demonstrated the validity of our BGW-TWAS method based on the BVSR model as well as the usefulness of the sum of posterior probabilities of having non-zero eQTL effect sizes.

Table 4.

Average Sums of Posterior Probabilities of Having Non-zero eQTL Effect Sizes, Stratified by Gene Expression Heritability (Either True Simulated Heritability in Simulation Studies or the Range of Train R² of the Fitted BVSR Models with ROS/MAP Data)

Gene Expression Heritability Sum of Posterior Probabilities
Whole Genome cis-Region trans-Region
5 True Causal eQTL

0.05 0.79 0.46 0.33
0.1 2.28 1.13 1.15
0.2 3.72 1.44 2.28
0.5 4.91 1.56 3.35

22 True Causal eQTL

0.05 0.05 0.02 0.03
0.1 0.21 0.11 0.10
0.2 1.43 0.87 0.56
0.5 6.46 3.89 2.57

ROS/MAP

(0, 0.05) 1,504 genes 6.63 0.60 6.23
(0.05, 0.1) 1,964 genes 1.45 0.13 1.32
(0.1, 0.25) 6,617 genes 2.00 0.17 1.83
(0.25, 0.5) 3,224 genes 2.66 0.22 2.44
(0.5, 1) 474 genes 3.04 0.31 2.73

The simulation scenarios presented here are those with 2 of 5 and 11 of 22 true causal eQTL from the cis-regions.
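The comparison between expected and true cis proportions can be reproduced directly from the simulation rows of Table 4. The sketch below simply divides the cis-region sum by the whole-genome sum for each scenario and prints it next to the true cis proportion (2 of 5 and 11 of 22); the variable names are hypothetical and no values beyond those in Table 4 are used.

```python
# Simulation rows of Table 4:
# (true causal eQTL, expression heritability, whole genome, cis-region, trans-region)
table4_sim = [
    (5, 0.05, 0.79, 0.46, 0.33),
    (5, 0.1, 2.28, 1.13, 1.15),
    (5, 0.2, 3.72, 1.44, 2.28),
    (5, 0.5, 4.91, 1.56, 3.35),
    (22, 0.05, 0.05, 0.02, 0.03),
    (22, 0.1, 0.21, 0.11, 0.10),
    (22, 0.2, 1.43, 0.87, 0.56),
    (22, 0.5, 6.46, 3.89, 2.57),
]

# True proportion of causal eQTL in the cis-region, per scenario (from the table note).
true_cis_fraction = {5: 2 / 5, 22: 11 / 22}

cis_fraction = {}
for n_causal, h2, whole, cis, trans in table4_sim:
    cis_fraction[(n_causal, h2)] = cis / whole
    print(f"{n_causal:>2} causal, h2={h2:<4}: expected eQTL {whole:.2f}, "
          f"cis fraction {cis / whole:.2f} (true {true_cis_fraction[n_causal]:.2f})")
```

Rows with very small whole-genome sums (for example 22 causal eQTL at heritability 0.05) give ratios of two small numbers, so their cis fractions are best read with caution.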

For 14,156 genes with fitted GReX prediction models by BVSR using the ROS/MAP data, after excluding 19 outlier genes with >100 expected eQTL, we obtained the average number of expected eQTL as 2.44 (SD = 5.70) across genome-wide regions, 0.25 (SD = 1.24) for cis-eQTL, and 2.48 (SD = 5.49) for trans-eQTL. That is, on average, 88% of eQTL were from trans-genome regions with respect to the target gene. We can see that the ∼90% of genes with train R² > 0.05 have ∼2–3 average expected eQTL, whereas the ∼10% of genes with train R² < 0.05 have >5 average expected eQTL (Table 4). By linking these findings with our simulation studies, where train R² is likely to be >0.05 when true expression heritability is >0.1, we can conclude that ∼90% of genes are likely to have true expression heritability >0.1.
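The ∼90% and ∼10% figures can be checked against the gene counts listed in the ROS/MAP rows of Table 4. A minimal sketch follows; it uses only those listed counts, which sum to 13,783 genes, and the variable names are hypothetical.

```python
# Gene counts per train R^2 stratum, as listed in the ROS/MAP rows of Table 4.
genes_per_stratum = {
    "(0, 0.05)": 1504,
    "(0.05, 0.1)": 1964,
    "(0.1, 0.25)": 6617,
    "(0.25, 0.5)": 3224,
    "(0.5, 1)": 474,
}

total_genes = sum(genes_per_stratum.values())
frac_low_r2 = genes_per_stratum["(0, 0.05)"] / total_genes  # train R^2 < 0.05
frac_high_r2 = 1 - frac_low_r2                              # train R^2 > 0.05

print(f"genes in table: {total_genes}")
print(f"train R^2 < 0.05: {frac_low_r2:.1%}; train R^2 > 0.05: {frac_high_r2:.1%}")
```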

Additionally, from our Bayesian estimates of the cis- and trans-specific posterior probabilities of having non-zero eQTL effect sizes (i.e., πcis and πtrans in Equation 3) for genome-wide genes using ROS/MAP data (Figure S11), we can see that πcis and πtrans clearly follow different distributions. This supports our assumption of separate prior distributions for the cis- and trans-specific hyperparameters.

Discussion

In this paper, we proposed and validated a Bayesian genome-wide TWAS (BGW-TWAS) method based on the BVSR24 model to leverage the information of both cis- and trans-eQTL. We derived an efficient computational approach to fit the BVSR model with large-scale genomic data, by pruning genome regions that contain either at least one cis-SNP or one potential trans-eQTL and adapting the previously developed scalable EM-MCMC algorithm25 with pre-calculated LD coefficients and summary statistics from standard eQTL analyses. BGW-TWAS extends previous TWAS methods11, 12, 13 that utilize only partial genotype information from a small window of cis-SNPs to train the GReX imputation model.

Genotype data of trans-eQTL have been shown to explain a significant amount of variation of expression quantitative traits and provide important molecular mechanisms underlying known GWAS loci of complex diseases.22,23 The results from our simulation and application studies demonstrated that BGW-TWAS improves the yield of a TWAS by leveraging both cis- and trans-eQTL information. For example, higher precision of GReX prediction and power of TWASs were obtained in our simulation studies when true causal trans-eQTL existed. These results showed that BGW-TWAS has a greater advantage for scenarios where eQTL have relatively large effect sizes for the expression quantitative traits (e.g., 5 versus 22 true causal eQTL with the same expression heritability). This is because variable selection by the BVSR model is designed to select sparse signals with relatively large effect sizes as shown in previous GWASs.24,25

By applying our BGW-TWAS method to several human AD datasets, we identified a risk gene (ZC3H12B) with GReX values that were significantly associated with both clinical diagnosis of AD and postmortem AD pathology indices (neurofibrillary tangle density and global measure of AD pathology). This association was not identified by existing TWAS methods because this gene is shown to be completely driven by trans-eQTL. Importantly, a potential biological mechanism was revealed by showing that the top driven trans-eQTL of ZC3H12B are known GWAS signals of AD43 and blood lipids44, 45, 46 and are <12 kb away from the well-known AD risk gene (APOE).47 Thus, we expect that BGW-TWAS, by leveraging both cis- and trans-eQTL, has the potential to make a large impact on advancing our understanding of complex human diseases and traits.

By fitting BVSR models using both cis- and trans-eQTL, we not only can account for the uncertainty for a SNP to be an eQTL to predict GReX (Equation 5), but also can use the sum of posterior probabilities of having non-zero eQTL effect sizes to estimate the expected number of eQTL.24,25 The distribution of expected eQTL can also help characterize the underlying genetic architecture of expression quantitative traits.

The current study has several limitations. First, while BGW-TWAS reduces the computational burden for modeling both cis- and trans-eQTL, its computing costs are still substantial to train GReX prediction models for genome-wide genes (∼20K) per tissue type. It requires approximately 30 min of computation time and 3 GB memory per gene (with parallel computation implemented in 4 CPU cores). Parallel computation can be employed to make use of high-performance computation clusters with multiple cores to reduce computation time. Second, our current method is designed to use pre-calculated in-sample LD coefficients and summary statistics from single variant eQTL analyses; further work is required to expand this approach to use approximate LD coefficients generated from reference samples of the same ethnicity. Third, our simulation studies showed that the non-parametric Bayesian method TIGAR performed best when all causal eQTL are cis- with relatively small effect sizes (e.g., 22 true causal cis-eQTL). Our TWAS results of AD using the IGAP summary statistics demonstrated that TIGAR and BGW-TWAS yield complementary findings. These results highlight the potential utility of leveraging both methods especially for studies in which the true distributions of cis- and trans-eQTL of the test genes are generally unknown.
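The first limitation can be put in rough numbers. The sketch below takes the figures quoted above (∼20K genes, approximately 30 min and 3 GB per gene on 4 CPU cores) and adds one assumption that is not from the paper: a hypothetical cluster allocation of 100 gene-level jobs running concurrently.

```python
# Figures quoted in the text.
N_GENES = 20_000          # genome-wide genes per tissue type (approximate)
MINUTES_PER_GENE = 30     # approximate wall time per gene
CORES_PER_JOB = 4         # cores used by one gene-level job
MEMORY_GB_PER_JOB = 3     # memory used by one gene-level job

# Assumption for illustration only: number of jobs the cluster runs at once.
CONCURRENT_JOBS = 100

job_hours = N_GENES * MINUTES_PER_GENE / 60       # total hours if jobs ran one at a time
core_hours = job_hours * CORES_PER_JOB            # total CPU core-hours
wall_hours = job_hours / CONCURRENT_JOBS          # elapsed hours with concurrent jobs
peak_memory_gb = MEMORY_GB_PER_JOB * CONCURRENT_JOBS  # aggregate memory across jobs

print(f"{job_hours:,.0f} job-hours, {core_hours:,.0f} core-hours per tissue")
print(f"about {wall_hours:,.0f} h elapsed with {CONCURRENT_JOBS} concurrent jobs, "
      f"using about {peak_memory_gb} GB in aggregate")
```

Under these assumptions, training models for one tissue takes on the order of days on a shared cluster rather than more than a year on a single 4-core machine, which is why parallel execution across genes matters in practice.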

In conclusion, the BGW-TWAS method presented herein provides a framework for leveraging information from both cis- and trans-eQTL to conduct gene-based association studies. Because trans-QTL are common for other quantitative omics traits (e.g., epigenetic, proteomic, and metabonomic traits), our proposed computational procedure could also be used to investigate other quantitative omics traits in gene-based association studies. Integrative method developments stand to benefit from our BGW-TWAS method, especially from the perspectives of leveraging information from trans-QTL and the efficient computation techniques derived in this paper. In addition, BGW-TWAS can be applied to study other complex human phenotypes to identify potential risk genes that could be targeted in further drug discovery.

Declaration of Interests

The authors declare no competing interests.

Acknowledgment

J.Y. was supported by the startup funding from Department of Human Genetics at Emory University School of Medicine. ROS/MAP study data were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, and U01AG61356, the Illinois Department of Public Health, and the Translational Genomics Research Institute. The MCADGC led by Dr. Nilüfer Ertekin-Taner and Dr. Steven G. Younkin, Mayo Clinic, Jacksonville, FL uses samples from the Mayo Clinic Study of Aging, the Mayo Clinic Alzheimer disease Research Center, and the Mayo Clinic Brain Bank. MCADGC data collection was supported through funding by NIA grants P50 AG016574, R01 AG032990, U01 AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01 AG017216, and R01 AG003949, NINDS grant R01 NS080820, CurePSP Foundation, and support from Mayo Foundation.

Published: September 21, 2020

Footnotes

Supplemental Data can be found online at https://doi.org/10.1016/j.ajhg.2020.08.022.

Web Resources

Data and Code Availability

ROS/MAP data can be requested through Rush Alzheimer’s Disease Center and Synapse. MCADGS data can be requested through Synapse. IGAP summary statistics are available online. Summary statistics generated from our BGW-TWAS methods for studying AD are publicly available through Synapse. Source code of BGW-TWAS is available through Github. See Web Resources for URLs.

Supplemental Data

Document S1. Figures S1–S11, Tables S1–S3, and Supplemental Material and Methods
mmc1.pdf (10.9MB, pdf)
Document S2. Article plus Supplemental Information
mmc2.pdf (12.9MB, pdf)

References

  • 1. Hirschhorn J.N., Daly M.J. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 2005;6:95–108. doi: 10.1038/nrg1521.
  • 2. Wellcome Trust Case Control Consortium Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
  • 3. McCarthy M.I., Abecasis G.R., Cardon L.R., Goldstein D.B., Little J., Ioannidis J.P., Hirschhorn J.N. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 2008;9:356–369. doi: 10.1038/nrg2344.
  • 4. Nikpay M., Goel A., Won H.H., Hall L.M., Willenborg C., Kanoni S., Saleheen D., Kyriakou T., Nelson C.P., Hopewell J.C. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 2015;47:1121–1130. doi: 10.1038/ng.3396.
  • 5. Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.
  • 6. Huang Q. Genetic study of complex diseases in the post-GWAS era. J. Genet. Genomics. 2015;42:87–98. doi: 10.1016/j.jgg.2015.02.001.
  • 7. Gallagher M.D., Chen-Plotkin A.S. The Post-GWAS Era: From Association to Function. Am. J. Hum. Genet. 2018;102:717–730. doi: 10.1016/j.ajhg.2018.04.002.
  • 8. Battle A., Brown C.D., Engelhardt B.E., Montgomery S.B., GTEx Consortium Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213.
  • 9. Nicolae D.L., Gamazon E., Zhang W., Duan S., Dolan M.E., Cox N.J. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6:e1000888. doi: 10.1371/journal.pgen.1000888.
  • 10. Pickrell J.K., Marioni J.C., Pai A.A., Degner J.F., Engelhardt B.E., Nkadori E., Veyrieras J.B., Stephens M., Gilad Y., Pritchard J.K. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464:768–772. doi: 10.1038/nature08872.
  • 11. Gamazon E.R., Wheeler H.E., Shah K.P., Mozaffari S.V., Aquino-Michaels K., Carroll R.J., Eyler A.E., Denny J.C., Nicolae D.L., Cox N.J., Im H.K., GTEx Consortium A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
  • 12. Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W., Jansen R., de Geus E.J., Boomsma D.I., Wright F.A. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506.
  • 13. Nagpal S., Meng X., Epstein M.P., Tsoi L.C., Patrick M., Gibson G., De Jager P.L., Bennett D.A., Wingo A.P., Wingo T.S., Yang J. TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits. Am. J. Hum. Genet. 2019;105:258–266. doi: 10.1016/j.ajhg.2019.05.018.
  • 14. Barbeira A.N., Dickinson S.P., Bonazzola R., Zheng J., Wheeler H.E., Torres J.M., Torstenson E.S., Shah K.P., Garcia T., Edwards T.L., GTEx Consortium Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 2018;9:1825. doi: 10.1038/s41467-018-03621-1.
  • 15. Lappalainen T., Sammeth M., Friedländer M.R., ’t Hoen P.A., Monlong J., Rivas M.A., Gonzàlez-Porta M., Kurbatova N., Griebel T., Ferreira P.G., Geuvadis Consortium Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.
  • 16. Gibbs J.R., van der Brug M.P., Hernandez D.G., Traynor B.J., Nalls M.A., Lai S.L., Arepalli S., Dillman A., Rafferty I.P., Troncoso J. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 2010;6:e1000952. doi: 10.1371/journal.pgen.1000952.
  • 17. Mancuso N., Shi H., Goddard P., Kichaev G., Gusev A., Pasaniuc B. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am. J. Hum. Genet. 2017;100:473–487. doi: 10.1016/j.ajhg.2017.01.031.
  • 18. Gusev A., Mancuso N., Won H., Kousi M., Finucane H.K., Reshef Y., Song L., Safi A., McCarroll S., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nat. Genet. 2018;50:538–548. doi: 10.1038/s41588-018-0092-1.
  • 19. Wu L., Shi W., Long J., Guo X., Michailidou K., Beesley J., Bolla M.K., Shu X.O., Lu Y., Cai Q., NBCS Collaborators. kConFab/AOCS Investigators A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat. Genet. 2018;50:968–978. doi: 10.1038/s41588-018-0132-x.
  • 20. Raj T., Li Y.I., Wong G., Humphrey J., Wang M., Ramdhani S., Wang Y.C., Ng B., Gupta I., Haroutunian V. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nat. Genet. 2018;50:1584–1592. doi: 10.1038/s41588-018-0238-1.
  • 21. Wainberg M., Sinnott-Armstrong N., Mancuso N., Barbeira A.N., Knowles D.A., Golan D., Ermel R., Ruusalepp A., Quertermous T., Hao K. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 2019;51:592–599. doi: 10.1038/s41588-019-0385-z.
  • 22. Lloyd-Jones L.R., Holloway A., McRae A., Yang J., Small K., Zhao J., Zeng B., Bakshi A., Metspalu A., Dermitzakis M. The Genetic Architecture of Gene Expression in Peripheral Blood. Am. J. Hum. Genet. 2017;100:371. doi: 10.1016/j.ajhg.2017.01.026.
  • 23. Võsa U., Claringbould A., Westra H.-J., Bonder M.J., Deelen P., Zeng B., Kirsten H., Saha A., Kreuzhuber R., Kasela S. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. bioRxiv. 2018 doi: 10.1101/447367.
  • 24. Guan Y.T., Stephens M. Bayesian Variable Selection Regression for Genome-Wide Association Studies and Other Large-Scale Problems. Ann. Appl. Stat. 2011;5:1780–1815.
  • 25. Yang J., Fritsche L.G., Zhou X., Abecasis G., International Age-Related Macular Degeneration Genomics Consortium A Scalable Bayesian Method for Integrating Functional Information in Genome-wide Association Studies. Am. J. Hum. Genet. 2017;101:404–416. doi: 10.1016/j.ajhg.2017.08.002.
  • 26. De Jager P.L., Ma Y., McCabe C., Xu J., Vardarajan B.N., Felsky D., Klein H.U., White C.C., Peters M.A., Lodgson B. A multi-omic atlas of the human frontal cortex for aging and Alzheimer’s disease research. Sci. Data. 2018;5:180142. doi: 10.1038/sdata.2018.142.
  • 27. Lambert J.C., Ibrahim-Verbaas C.A., Harold D., Naj A.C., Sims R., Bellenguez C., DeStafano A.L., Bis J.C., Beecham G.W., Grenier-Boley B., European Alzheimer’s Disease Initiative (EADI) Genetic and Environmental Risk in Alzheimer’s Disease. Alzheimer’s Disease Genetic Consortium. Cohorts for Heart and Aging Research in Genomic Epidemiology Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 2013;45:1452–1458. doi: 10.1038/ng.2802.
  • 28. Casella G. Empirical Bayes Gibbs sampling. Biostatistics. 2001;2:485–500. doi: 10.1093/biostatistics/2.4.485.
  • 29. Bennett D.A., Schneider J.A., Arvanitakis Z., Wilson R.S. Overview and findings from the religious orders study. Curr. Alzheimer Res. 2012;9:628–645. doi: 10.2174/156720512801322573.
  • 30. Bennett D.A., Schneider J.A., Buchman A.S., Barnes L.L., Boyle P.A., Wilson R.S. Overview and findings from the rush Memory and Aging Project. Curr. Alzheimer Res. 2012;9:646–663. doi: 10.2174/156720512801322663.
  • 31. Bennett D.A., Buchman A.S., Boyle P.A., Barnes L.L., Wilson R.S., Schneider J.A. Religious Orders Study and Rush Memory and Aging Project. J. Alzheimers Dis. 2018;64(s1):S161–S189. doi: 10.3233/JAD-179939.
  • 32. Carrasquillo M.M., Zou F., Pankratz V.S., Wilcox S.L., Ma L., Walker L.P., Younkin S.G., Younkin C.S., Younkin L.H., Bisceglio G.D. Genetic variation in PCDH11X is associated with susceptibility to late-onset Alzheimer’s disease. Nat. Genet. 2009;41:192–198. doi: 10.1038/ng.305.
  • 33. Zou F., Chai H.S., Younkin C.S., Allen M., Crook J., Pankratz V.S., Carrasquillo M.M., Rowley C.N., Nair A.A., Middha S., Alzheimer’s Disease Genetics Consortium Brain expression genome-wide association study (eGWAS) identifies human disease-associated variants. PLoS Genet. 2012;8:e1002707. doi: 10.1371/journal.pgen.1002707.
  • 34. Allen M., Carrasquillo M.M., Funk C., Heavner B.D., Zou F., Younkin C.S., Burgess J.D., Chai H.S., Crook J., Eddy J.A. Human whole genome genotype and transcriptome data for Alzheimer’s and other neurodegenerative diseases. Sci. Data. 2016;3:160089. doi: 10.1038/sdata.2016.89.
  • 35. De Jager P.L., Shulman J.M., Chibnik L.B., Keenan B.T., Raj T., Wilson R.S., Yu L., Leurgans S.E., Tran D., Aubin C., Alzheimer’s Disease Neuroimaging Initiative A genome-wide scan for common variants affecting the rate of age-related cognitive decline. Neurobiol. Aging. 2012;33:1017.e1–1017.e15. doi: 10.1016/j.neurobiolaging.2011.09.033.
  • 36. Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  • 37. Das S., Forer L., Schönherr S., Sidore C., Locke A.E., Kwong A., Vrieze S.I., Chew E.Y., Levy S., McGue M. Next-generation genotype imputation service and methods. Nat. Genet. 2016;48:1284–1287. doi: 10.1038/ng.3656.
  • 38. De Jager P.L., Srivastava G., Lunnon K., Burgess J., Schalkwyk L.C., Yu L., Eaton M.L., Keenan B.T., Ernst J., McCabe C. Alzheimer’s disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci. Nat. Neurosci. 2014;17:1156–1163. doi: 10.1038/nn.3786.
  • 39. Newman A.M., Liu C.L., Green M.R., Gentles A.J., Feng W., Xu Y., Hoang C.D., Diehn M., Alizadeh A.A. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337.
  • 40. Darmanis S., Sloan S.A., Zhang Y., Enge M., Caneda C., Shuer L.M., Hayden Gephart M.G., Barres B.A., Quake S.R. A survey of human brain transcriptome diversity at the single cell level. Proc. Natl. Acad. Sci. USA. 2015;112:7285–7290. doi: 10.1073/pnas.1507125112.
  • 41. Wingo T.S., Yang J., Fan W., Min Canon S., Gerasimov E.S., Lori A., Logsdon B., Yao B., Seyfried N.T., Lah J.J. Brain microRNAs associated with late-life depressive symptoms are also associated with cognitive trajectory and dementia. NPJ Genom. Med. 2020;5:6. doi: 10.1038/s41525-019-0113-8.
  • 42. Hartung J., Knapp G., Sinha B.K. John Wiley & Sons; 2011. Statistical meta-analysis with applications.
  • 43. Marioni R.E., Harris S.E., Zhang Q., McRae A.F., Hagenaars S.P., Hill W.D., Davies G., Ritchie C.W., Gale C.R., Starr J.M. GWAS on family history of Alzheimer’s disease. Transl. Psychiatry. 2018;8:99. doi: 10.1038/s41398-018-0150-6.
  • 44. Ligthart S., Vaez A., Hsu Y.H., Stolk R., Uitterlinden A.G., Hofman A., Alizadeh B.Z., Franco O.H., Dehghan A., Inflammation Working Group of the CHARGE Consortium. PMI-WG-XCP. LifeLines Cohort Study Bivariate genome-wide association study identifies novel pleiotropic loci for lipids and inflammation. BMC Genomics. 2016;17:443. doi: 10.1186/s12864-016-2712-4.
  • 45. Ligthart S., Vaez A., Võsa U., Stathopoulou M.G., de Vries P.S., Prins B.P., Van der Most P.J., Tanaka T., Naderi E., Rose L.M., LifeLines Cohort Study. CHARGE Inflammation Working Group Genome Analyses of >200,000 Individuals Identify 58 Loci for Chronic Inflammation and Highlight Pathways that Link Inflammation and Complex Disorders. Am. J. Hum. Genet. 2018;103:691–706. doi: 10.1016/j.ajhg.2018.09.009.
  • 46. Klarin D., Damrauer S.M., Cho K., Sun Y.V., Teslovich T.M., Honerlaw J., Gagnon D.R., DuVall S.L., Li J., Peloso G.M., Global Lipids Genetics Consortium. Myocardial Infarction Genetics (MIGen) Consortium. Geisinger-Regeneron DiscovEHR Collaboration. VA Million Veteran Program Genetics of blood lipids among ∼300,000 multi-ethnic participants of the Million Veteran Program. Nat. Genet. 2018;50:1514–1523. doi: 10.1038/s41588-018-0222-9.
  • 47. Genin E., Hannequin D., Wallon D., Sleegers K., Hiltunen M., Combarros O., Bullido M.J., Engelborghs S., De Deyn P., Berr C. APOE and Alzheimer disease: a major gene with semi-dominant inheritance. Mol. Psychiatry. 2011;16:903–907. doi: 10.1038/mp.2011.52.
  • 48. Kamboh M.I., Demirci F.Y., Wang X., Minster R.L., Carrasquillo M.M., Pankratz V.S., Younkin S.G., Saykin A.J., Jun G., Baldwin C., Alzheimer’s Disease Neuroimaging Initiative Genome-wide association study of Alzheimer’s disease. Transl. Psychiatry. 2012;2:e117. doi: 10.1038/tp.2012.45.
  • 49. Zhou X., Chen Y., Mok K.Y., Kwok T.C.Y., Mok V.C.T., Guo Q., Ip F.C., Chen Y., Mullapudi N., Giusti-Rodríguez P., Alzheimer’s Disease Neuroimaging Initiative Non-coding variability at the APOE locus contributes to the Alzheimer’s risk. Nat. Commun. 2019;10:3310. doi: 10.1038/s41467-019-10945-z.
  • 50. Liang J., Wang J., Azfer A., Song W., Tromp G., Kolattukudy P.E., Fu M. A novel CCCH-zinc finger protein family regulates proinflammatory activation of macrophages. J. Biol. Chem. 2008;283:6337–6346. doi: 10.1074/jbc.M707861200.
  • 51. Fagerberg L., Hallström B.M., Oksvold P., Kampf C., Djureinovic D., Odeberg J., Habuka M., Tahmasebpoor S., Danielsson A., Edlund K. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics. 2014;13:397–406. doi: 10.1074/mcp.M113.035600.
  • 52. Jansen I.E., Savage J.E., Watanabe K., Bryois J., Williams D.M., Steinberg S., Sealock J., Karlsson I.K., Hägg S., Athanasiu L. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 2019;51:404–413. doi: 10.1038/s41588-018-0311-9.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
