Abstract
Transcriptome-wide association study (TWAS) methodologies aim to identify genetic effects on phenotypes through the mediation of gene transcription. In TWAS, in silico models of gene expression are trained as functions of genetic variants and then applied to genome-wide association study (GWAS) data. This post-GWAS analysis identifies gene-trait associations with high interpretability, enabling follow-up functional genomics studies and the development of genetics-anchored resources. We provide an overview of commonly used TWAS approaches, their advantages and limitations, and some widely used applications.
Keywords: Transcriptome-Wide Association Studies (TWAS), Complex traits, PrediXcan, Single cell transcriptomics, Joint Tissue Imputation (JTI), Electronic health records
Genome-wide association studies (GWAS) have been remarkably successful in finding genetic variants reproducibly associated with Mendelian and complex diseases. Increasingly large sample sizes have powered GWAS discoveries in thousands of traits, enabling an unprecedented view of the effects of genetic variants on the phenome. The availability of large-scale electronic health records (EHRs) linked to whole-genome information, pioneered by the UK Biobank (Bycroft et al., 2018) and other biobanks, has further accelerated the pace of discovery. Despite the success of GWAS, it remains challenging to determine the biological underpinnings of identified loci, as these are primarily located in noncoding or intergenic regions of the genome (Visscher et al., 2017). GWAS loci also tend to have modest effect sizes; thus, the underlying mechanisms and functional implications remain elusive. It is generally presumed that GWAS loci exert their phenotypic effects through the genetic regulation of gene expression, which is supported by the observation that trait-associated SNPs are often expression quantitative trait loci (eQTLs) (Nicolae et al., 2010; Nica et al., 2010). Large-scale reference RNA-seq resources, such as the Genotype-Tissue Expression (GTEx) project (GTEx Consortium, 2015), have facilitated the systematic investigation of gene expression variation and the discovery of eQTLs across a broad array of human tissues for integration with GWAS data (GTEx Consortium et al., 2017; GTEx Consortium, 2020).
Transcriptome-wide association study (TWAS)
TWAS historically exploited advances in methodology and resource generation, focusing on the mechanism of transcription to detect trait-associated genes (Figure 1) (Gamazon et al., 2015). TWAS is a two-stage procedure, each stage having an impact on statistical power. The first stage trains in silico genetic models of expression for a gene in a given tissue or cell type. The second stage performs the association analysis between the genetically regulated gene expression (GReX) and the phenotype.
Figure 1. TWAS.
In a typical workflow, after reference data pre-processing (a), gene expression prediction models are trained using genetic variants as features and applied to GWAS data (individual-level or summary statistics) to detect gene-level associations (b). Extended analyses include using a variety of machine learning approaches to predict expression, joint-tissue imputation to improve prediction, cross-tissue association analysis, colocalization, or Mendelian randomization to evaluate the causal effect of a gene on outcome (c).
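The two stages of the workflow can be illustrated with a minimal Python sketch. It assumes that SNP weights from the first stage are already available; the weights, dosages, and phenotype values are invented for illustration, and ordinary least squares stands in for whichever association model a given study would use.

```python
import math

def predict_grex(dosages, weights):
    """Apply trained SNP weights to genotypes: GReX = sum of weight * allele dosage."""
    return [sum(w * d for w, d in zip(weights, person)) for person in dosages]

def ols_association(x, y):
    """Regress phenotype y on GReX x; return slope, standard error, and t statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

# Hypothetical weights for three cis-SNPs and dosages (0, 1, 2) for six individuals.
weights = [0.5, -0.2, 0.1]
dosages = [[0, 1, 2], [1, 0, 0], [2, 2, 1], [0, 0, 0], [1, 1, 1], [2, 0, 2]]
phenotype = [0.3, 0.9, 1.4, 0.1, 0.8, 2.1]

grex = predict_grex(dosages, weights)                 # stage 1 output applied to GWAS genotypes
slope, se, t_stat = ols_association(grex, phenotype)  # stage 2 association test
print(f"slope={slope:.3f} se={se:.3f} t={t_stat:.2f}")
```

In practice the second stage would also adjust for covariates such as ancestry principal components, which this sketch omits.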
The model building of the first stage can be performed using machine learning and statistical approaches (e.g., Elastic Net, LASSO, BSLMM, best QTL). PrediXcan, the first TWAS methodology, trains expression models using Elastic Net regularization to estimate GReX (Gamazon et al., 2015). Consider the expression levels of a gene across n samples in a given tissue or cell type, Y, and the corresponding genotype matrix X of candidate variants. For each gene, PrediXcan solves the following optimization problem, written here in the standard Elastic Net form:

$$\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n}\lVert Y - X\beta \rVert_2^2 + \lambda\left(\alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2}\lVert \beta \rVert_2^2\right)$$

The L1 term induces sparsity, the L2 penalty promotes the grouping effect, λ controls the overall strength of regularization, and α gives the relative weights of the two penalties. In comparisons with a polygenic risk score (p-value threshold) based model, the best eQTL (single-SNP) model, and LASSO, this model was found to perform well, potentially capturing the underlying genetic architecture of gene expression, and to be more robust to slight changes in the input feature set (Gamazon et al., 2015). Joint-tissue methods (such as UTMOST and JTI) (Hu et al., 2019; Zhou et al., 2020) have been found to improve prediction by borrowing information across tissues. The association analysis of the second stage can be performed using individual-level data or using GWAS summary statistics (Gusev et al., 2016; Barbeira et al., 2018). The gain in power from a multivariate model depends on the prediction quality and the extent of allelic heterogeneity in gene expression. It should be noted that linkage disequilibrium contamination can complicate downstream inference (Wainberg et al., 2019). Thus, several methods to support causality, including probabilistic fine-mapping (Mancuso et al., 2019) and Mendelian randomization (Zhou et al., 2020), have been integrated into TWAS.
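To make the roles of the L1 and L2 penalties concrete, the following is a minimal coordinate-descent sketch of an Elastic Net fit in pure Python. It is not the PrediXcan implementation: the objective (stated in the docstring) is the standard Elastic Net form, the toy design matrix is deliberately orthogonal so the result can be checked by hand, and the values of the penalty strength `lam` and mixing weight `alpha` are illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the source of sparsity in the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Assumes y and the columns of X are centered (no intercept term)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of SNP j with the partial residual (excluding its own effect).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

# Toy centered genotypes for three SNPs (columns) in four samples (rows).
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
# Expression driven mainly by SNP 1, with a negligible contribution from SNP 3.
y = [2.05, 1.95, -2.05, -1.95]

beta = elastic_net(X, y, lam=0.2, alpha=0.5)
print(beta)
```

With these inputs the weak effect of the third SNP falls below the soft-threshold and is set to zero, while the first SNP's weight is shrunk by both penalties.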
TWAS is not the only gene-based strategy for finding trait associations. Other gene-based association methods include Multi-marker Analysis of GenoMic Annotation (MAGMA) (de Leeuw et al., 2015), which uses an arbitrary genomic window to assign variants to genes, and cS2G, which combines seven constituent SNP-to-gene (S2G) linking strategies to maximize their disease informativeness (Gazal et al., 2022). The TWAS approach explicitly utilizes multi-variant gene expression predictive models for integration with GWAS data to improve our understanding of the mechanistic basis of the genetic associations (Table 1).
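When only GWAS summary statistics are available, multi-variant weights can still be combined with SNP-level results. The sketch below follows, to our understanding, the approximation used in the summary-statistics approach of Barbeira et al. (2018): the gene-level z-score is a weighted sum of SNP z-scores, scaled by SNP standard deviations and by the variance of predicted expression from a reference panel. The function name and all numeric inputs are hypothetical.

```python
import math

def summary_twas_z(weights, gwas_z, snp_cov):
    """Gene-level z-score from SNP-level GWAS z-scores.

    weights : expression prediction weights w_l for the SNPs in the model
    gwas_z  : GWAS z-scores Z_l for the same SNPs (same order, same effect allele)
    snp_cov : SNP genotype covariance matrix from a reference panel
    """
    p = len(weights)
    # Variance of predicted expression: w' * Cov * w
    var_grex = sum(weights[i] * snp_cov[i][j] * weights[j]
                   for i in range(p) for j in range(p))
    numerator = sum(weights[l] * math.sqrt(snp_cov[l][l]) * gwas_z[l] for l in range(p))
    return numerator / math.sqrt(var_grex)

# Hypothetical two-SNP model.
weights = [0.5, -0.2]
gwas_z = [3.0, -1.0]
snp_cov = [[0.5, 0.1], [0.1, 0.4]]

gene_z = summary_twas_z(weights, gwas_z, snp_cov)
print(f"gene-level z = {gene_z:.3f}")
```

The covariance matrix is where linkage disequilibrium enters the calculation, so a reference panel that is poorly matched to the GWAS population can distort the result.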
Table 1. Select TWAS and related approaches.
TWAS is a two-stage procedure: gene expression prediction and association analysis. The “Expression prediction” column indicates the method used for gene expression prediction model training. “NA” means that the TWAS approach bypasses this first stage. The “Pleiotropy control” column indicates the method used for pleiotropic effect control, an important concern in Mendelian randomization in estimating the causal effect of gene expression on the phenotype of interest. Reference PubMed ID (PMID) for the source publication and Github link for the software are provided for each TWAS approach.
Approach | Expression Prediction | Pleiotropy Control | Features and Pros | Cons | Reference PMID | Github Link |
---|---|---|---|---|---|---|
PrediXcan | Elastic Net | NA | First TWAS method | No pleiotropy control | 26258848 | https://github.com/hakyimlab/PrediXcan; https://github.com/gamazonlab/MR-JTI |
Fusion | BSLMM | NA | First summary-statistics based | No pleiotropy control | 26854917 | http://gusevlab.org/projects/fusion/ |
SMR-HEIDI | Univariate | HEIDI | Heterogeneity modeling | Less powerful. Overly conservative/deflated p-values | 27019110 | https://yanglab.westlake.edu.cn/software/smr/#Overview |
TIGAR | Dirichlet process regression | NA | Bayesian method with data-driven nonparametric prior for cis-eQTL effect sizes | No pleiotropy control | 31230719 | https://github.com/yanglab-emory/ |
MultiXcan | Elastic Net | NA | Cross-tissue approach from principal components of predicted expression | No pleiotropy control | 30668570 | https://github.com/hakyimlab/MetaXcan |
UTMOST | Group-LASSO regularization | NA | Improved prediction quality; cross-tissue approach from generalized Berk-Jones (GBJ) test | No pleiotropy control | 30804563 | https://github.com/Joker-Jerome/UTMOST |
MR-JTI | Weighted Elastic Net | Bootstrap LASSO | Improved prediction quality; flexibility of heterogeneity modeling | Weak instruments? | 33020666 | https://github.com/gamazonlab/MR-JTI |
LDA-MR-EGGER | NA | Egger | Extending the MR estimator by relaxing the assumption of the SNPs being independent | Assumption of equal horizontal pleiotropic effects across SNPs (limitation of MR-Egger). | 29808603 | https://github.com/yuanzhongshang/PMRreproduce/blob/master/LDA%20MR-Egger.R |
PMR-EGGER | NA | Egger | Accounting for LD structure; considering the uncertainty of the unobserved gene expression | Weak instruments? Assumption of polygenic architecture for gene expression. Assumption of equal horizontal pleiotropic effects across SNPs (limitation of MR-Egger) | 32737316 | https://github.com/yuanzhongshang/PMR |
TWMR | NA | Heterogeneity test (Cochran’s Q test) | Accounting for confounding factor | Heterogeneity test is less flexible. | 31341166 | https://github.com/eleporcu/TWMR |
PTWAS | Probabilistic annotation of eQTLs | I2 statistic | Method uses strong instruments. | Heterogeneity test is less flexible. | 32912253 | https://github.com/xqwen/ptwas |
METRO | Weighted sum of effects from the genetic ancestries | Egger | Multi-ancestry TWAS | Assumption of equal horizontal pleiotropic effects across SNPs (limitation of MR-Egger). Power may be lower than Simes version of TWAS/PrediXcan when cis-SNP heritability is high. | 35334221 | https://github.com/zhengli09/METRO |
MRLocus | NA | Dispersion statistic | 2-step procedure: Colocalization prior to MR slope fitting | Weak instruments? | 33872308 | https://github.com/mikelove/mrlocusPaper |
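Several approaches in Table 1 rely on Egger regression for pleiotropy control. As a rough illustration, the sketch below implements basic MR-Egger for independent SNPs: SNP effects on the outcome are regressed on SNP effects on expression with inverse-variance weights, and a non-zero intercept suggests directional pleiotropy. It does not implement the LD-aware extensions (LDA MR-Egger, PMR-Egger) listed in the table, and the effect sizes are invented.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of outcome effects on exposure effects with an intercept.

    Returns (intercept, slope): the intercept estimates average directional
    pleiotropy, the slope estimates the causal effect of expression on the trait.
    Assumes the SNPs are independent (no LD)."""
    xs, ys = [], []
    for bx, by in zip(beta_exposure, beta_outcome):
        sign = -1.0 if bx < 0 else 1.0   # orient each SNP so its eQTL effect is positive
        xs.append(sign * bx)
        ys.append(sign * by)
    w = [1.0 / s ** 2 for s in se_outcome]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical eQTL and GWAS effect sizes for four independent SNPs.
eqtl_beta = [0.1, 0.2, 0.3, -0.4]
gwas_beta = [0.06, 0.10, 0.14, -0.18]
gwas_se = [0.02, 0.02, 0.03, 0.025]

intercept, slope = mr_egger(eqtl_beta, gwas_beta, gwas_se)
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

A single intercept for all SNPs is the equal-horizontal-pleiotropy assumption that Table 1 flags as a limitation of Egger-based methods.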
Reference panels
Increasingly, large-scale transcriptome reference panels are available to facilitate the development of genetic variation-based gene expression models for downstream GReX analysis. The most comprehensive transcriptome dataset is the Genotype-Tissue Expression (GTEx) project (GTEx Consortium, 2015; GTEx Consortium et al., 2017; GTEx Consortium, 2020). The GTEx project performed RNA-seq analysis on more than 50 tissues and cell types from 834 donors (for a total of more than 15,000 samples). The postmortem samples had been procured from healthy tissues of donors, though not every donor provided every tissue. The project has been an enabling resource, allowing GReX models in a wide range of tissues to be developed. The original models utilized common SNPs in the cis-region of a gene. Trans-eQTLs may well explain a sizable proportion of gene expression variation, but face a substantial multiple testing burden (Võsa et al., 2021). Nevertheless, both cis- and trans- SNPs can be incorporated (Võsa et al., 2021), as in the BGW-TWAS implementation built on Bayesian variable selection regression, potentially improving model accuracy.
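The cis-region feature selection used by the original models can be sketched as a simple filter over SNP annotations. The 1 Mb window and the minor allele frequency cutoff of 0.05 are assumptions chosen for illustration, as are the record fields and SNP identifiers.

```python
def select_cis_snps(snps, chrom, gene_start, gene_end, window=1_000_000, min_maf=0.05):
    """Return IDs of common SNPs lying within `window` bp of the gene body."""
    lower = max(0, gene_start - window)
    upper = gene_end + window
    selected = []
    for snp in snps:
        maf = min(snp["alt_freq"], 1.0 - snp["alt_freq"])
        if snp["chrom"] == chrom and lower <= snp["pos"] <= upper and maf >= min_maf:
            selected.append(snp["id"])
    return selected

# Hypothetical SNP records.
snps = [
    {"id": "snp_a", "chrom": "1", "pos": 1_200_000, "alt_freq": 0.30},  # in window, common
    {"id": "snp_b", "chrom": "1", "pos": 5_000_000, "alt_freq": 0.25},  # outside the window
    {"id": "snp_c", "chrom": "1", "pos": 2_100_000, "alt_freq": 0.01},  # in window, rare
    {"id": "snp_d", "chrom": "2", "pos": 1_500_000, "alt_freq": 0.40},  # other chromosome
    {"id": "snp_e", "chrom": "1", "pos": 2_900_000, "alt_freq": 0.90},  # in window, MAF 0.10
]

cis_ids = select_cis_snps(snps, chrom="1", gene_start=2_000_000, gene_end=2_050_000)
print(cis_ids)
```

Models that also incorporate trans-SNPs would need a different candidate set and a strategy for the much larger multiple testing burden.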
Although the GTEx project has generated a catalog of eQTLs in a broad collection of tissues and cell types, it should be noted that eQTL datasets, especially for difficult-to-acquire tissues (e.g., brain), are relatively small in sample size. Larger eQTL datasets are available but only for select tissues such as brain (PsychENCODE) (PsychENCODE Consortium et al., 2015) and whole blood (eQTLGen Consortium) (Võsa et al., 2021). One approach to expand the scale of a transcriptome dataset is Hypergraph Factorization (Viñas et al., 2023). This neural network approach allows for the imputation of gene expression in hard-to-collect tissues using available expression data in more accessible tissues, e.g., whole blood. Such multi-tissue imputation can in turn substantially improve eQTL detection.
A recent study asserts that GWAS and eQTL mapping are powered to detect different types of variants along diverse types of annotations (Mostafavi et al., 2023). Nevertheless, it remains unclear to what extent the sampling of GWAS traits to date is captured in relevant tissues, cell types, and developmental stages represented in existing eQTL reference resources. Furthermore, the use of regulatory variation in less-accessible contexts or rare cell types in colocalization analysis may require new methodologies in addition to the generation of a more complete panel.
Functional validation
Since TWAS identifies interpretable (gene-level) associations, follow-up functional validation becomes more direct than for GWAS-identified variants. The approach has the benefit of evaluating a molecular mechanism through which genetic variation affects phenotype. We and others have employed GReX analysis followed by functional validation to identify potentially causal genes for complex traits. Gusev et al. validated gene-level associations with schizophrenia using data on physical chromatin interactions during brain development (Gusev et al., 2018). Applying CRISPR/Cas9 gene editing at the TWAS locus 5q13.2 in CD34+ hematopoietic and progenitor cells, Yao et al. identified the causal gene in the locus for neutrophil count (Yao et al., 2020). Unlu et al. (2019) showed that GRIK5 contributes to the polygenic liability to develop eye diseases in humans through its GReX, which was further mechanistically investigated via depletion of its ortholog in zebrafish.
Single-cell transcriptomics
Advances in single-cell technologies are transforming the biomedical sciences by facilitating high-throughput profiling of molecular traits at unprecedented resolution (Hao et al., 2021). These technologies are helping to generate comprehensive reference maps of cell types and cell states (Rozenblatt-Rosen et al., 2017), expanding our understanding of cellular identity, cell-type-specific disease-relevant signatures, and the molecular function of genetic variation (Perez et al., 2022). Here, GReX modeling can shed light on the context-specific and context-shared components of gene expression (Thompson et al., 2022). Furthermore, genetic models of cell-type-specific and cell-state-adjusted gene expression from differentiating cell types can provide insights into the dynamics of gene regulation and enable the discovery of context-specific disease associations not accessible via bulk sequencing derived models (Abe et al., 2023).
Extensions to other omics layers
A natural extension of the TWAS approach is the proteome-wide association study (PWAS) methodology, which integrates proteomics data with GWAS data (Wingo et al., 2021a, 2021b; Zhang et al., 2022). Existing modeling approaches to train imputation models using genetic variants as features remain relevant for PWAS. Despite the relative paucity of proteogenomic data, there have been some notable developments. A PWAS of Alzheimer’s disease followed by Mendelian randomization and colocalization analysis identified 11 genes that are putatively causal for the disease, through the local regulation of protein abundance in brain. In this study, human brain proteomes from postmortem dorsolateral prefrontal cortex (dPFC) samples of 376 European ancestry participants in the ROS/MAP project were used as a discovery panel, while proteome data from a second set of 152 dPFC samples were used in a confirmation PWAS (Bennett et al., 2018). A multi-ancestry PWAS in large European ancestry and African ancestry cohorts of the Atherosclerosis Risk in Communities (ARIC) study developed prediction models in plasma (a potentially powerful diagnostic and prognostic biosample) in each population (Zhang et al., 2022). As in the PrediXcan modeling of gene expression, Elastic Net generated high-performance protein imputation models, gaining 36% and 40% accuracy in European ancestry and African ancestry populations, respectively, in comparison with the models utilizing only the sentinel cis-pQTL. Interestingly, the study found that the cross-population performance of imputation models was improved when African ancestry models were used to impute protein abundance in European ancestry individuals.
Isoform-level TWAS (isoTWAS) is a framework to integrate genetic variation and isoform-level expression for enhanced discovery of phenotypic associations (Bhattacharya et al., 2022b). The framework may offer some advantages over standard TWAS, prioritizing isoforms that explain the gene-level association. The approach focuses on alternative splicing – a fundamental mechanism that generates functional diversity – whose disruption may contribute to disease pathophysiology (Gamazon and Stranger, 2014).
Applications to disease mapping
Recent applications of the methodology have been broad. These include investigations into the genetic mechanisms of critical illness in COVID-19 (Pairo-Castineira et al., 2021, 2023); molecular dysregulation in autism spectrum disorder, schizophrenia, and bipolar disorder (Gandal et al., 2018); and the genetic determinants of blood pressure in a trans-ethnic study (Giri et al., 2019). PrediXcan has been used to identify a new Mendelian syndrome, CATIFA, uncovering a new mechanism for rare and common diseases (Unlu et al., 2020). A recent study combined organellar proteomics and metabolomics to solve long-standing questions in the fields of metabolism and redox biology, and the implications on human diseases were investigated using JTI (Wang et al., 2021).
Drug repurposing
Leveraging GReX can be a powerful approach to drug repurposing efforts. Indeed, drug candidates with genetic support are more likely to progress through clinical trials towards FDA approval (King et al., 2019; Nelson et al., 2015). GReX thus provides a powerful probe to predict repurposing candidates. For example, GReX analysis of COVID-19 severity led to the inclusion of the repurposing candidate baricitinib in a large clinical trial (Pairo-Castineira et al., 2021, 2023). The drug is now the first FDA-approved immunomodulatory treatment for COVID-19 after clinical trials showed therapeutic benefit (Rubin, 2022; Kalil et al., 2021; RECOVERY Collaborative Group, 2022). Several other GReX-based drug repositioning applications across a wide range of disease traits have been investigated (Gerring et al., 2021; So et al., 2017; Wu et al., 2022; He et al., 2023).
EHRs, biobanks, and undiagnosed disease
The TWAS approach in conjunction with dense EHR data can be used to generate an atlas of associations with the medical phenome. In EHRs without transcriptome data, gene expression can nevertheless be estimated using only genomic data. The resulting GReX-based atlas can be used in a wide variety of applications, including facilitating a phenome-wide association study (PheWAS). The Undiagnosed Diseases Network (UDN) aims to expedite the diagnosis of rare, novel, and previously unrecognized disorders and reduce lengthy and costly diagnostic odysseys (Ramoni et al., 2017). As a primary tool of the UDN site at Vanderbilt University Medical Center, TWAS, as implemented in JTI/PrediXcan, is currently one of the methods we use to prioritize genes with mutations in a proband by comparing the JTI/PrediXcan implicated phenotypes with the patient’s medical history. The genes that are implicated by rare variant analysis and show JTI/PrediXcan associations that match patient symptoms are given greater weights. A catalog of disease associations built on EHR data is especially useful for genes for which there is little known concerning the medical consequences of their disruption (Unlu et al., 2019).
Since TWAS results have a direction of effect, we can prioritize genes for which decreased expression is associated with disease, under the assumption that rare phenotypes observed in the UDN cases are caused by gene knockouts or a partial disruption of gene function. For each prioritized gene, we can then compare the observed (RNA-Seq) expression of the patient (if available) to the distribution of expression values in an expression reference database (such as GTEx) to determine whether the individual’s expression level is an outlier. If so, this may indicate the presence of large-effect rare variants within the gene or gene network, especially if the patient’s GReX (from the JTI/PrediXcan model of common variants) falls within the distribution of GReX values in a population panel (e.g., in 1000 Genomes). One caveat to keep in mind is that expression may vary by age, with UDN cases being predominantly pediatric, while GTEx samples represent older individuals.
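The comparison of a patient's observed expression and GReX against reference distributions can be sketched as two z-score checks. The thresholds (`expr_cut`, `grex_cut`), function names, and all values are hypothetical; a real analysis would also need to handle normalization differences between the patient sample and the reference panel.

```python
import statistics

def z_score(value, reference):
    """Standardize a value against a reference distribution."""
    return (value - statistics.mean(reference)) / statistics.stdev(reference)

def flag_low_expression_outlier(observed_expr, ref_expr, patient_grex, panel_grex,
                                expr_cut=-3.0, grex_cut=2.0):
    """Flag a gene when observed expression is unusually low but the common-variant
    GReX is unremarkable, a pattern that may point to rare large-effect variants."""
    expr_z = z_score(observed_expr, ref_expr)
    grex_z = z_score(patient_grex, panel_grex)
    flagged = expr_z <= expr_cut and abs(grex_z) < grex_cut
    return flagged, expr_z, grex_z

# Hypothetical reference expression (e.g., from a tissue panel) and population GReX values.
ref_expr = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
panel_grex = [0.1, -0.2, 0.3, -0.1, 0.0, 0.2, -0.3, 0.0]

flagged, expr_z, grex_z = flag_low_expression_outlier(
    observed_expr=7.5, ref_expr=ref_expr, patient_grex=0.1, panel_grex=panel_grex)
print(flagged, round(expr_z, 2), round(grex_z, 2))
```

Because expression can vary with age, a reference distribution drawn from older donors may overstate or understate how unusual a pediatric patient's value is.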
Multi-ancestry TWAS
TWAS in a multi-biobank, multi-ancestry setting may enable well-powered investigations into biological mechanisms of diseases. However, in this setting, as TWAS integrates genetic, gene expression, and phenotypic data, each layer introduces new methodological challenges. The challenges on the genetic level are shared with conventional GWAS and include confounding from genetic ancestry, population stratification, and complex linkage disequilibrium. On the gene expression level, tissue or cell type specific gene expression and the effect of cell state on expression may determine context specificity (Gamazon, 2021). Leveraging phenotypic data in a multi-biobank setting requires paying attention to case-control definition, data harmonization, and ascertainment and selection bias. Implementing JTI (Zhou et al., 2020) and MOSTWAS (Bhattacharya et al., 2021), the Global Biobank Meta-analysis Initiative (GBMI), a network of 24 biobanks consisting of 2.2 million patients of diverse ancestries, investigates these challenges and others, including the portability of the gene expression prediction across ancestries (Bhattacharya et al., 2022a; Lu et al., 2022; Li et al., 2022).
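One generic way to combine gene-level results across biobanks or ancestry groups is inverse-variance-weighted fixed-effects meta-analysis, sketched below. This is a textbook technique shown for orientation and is not necessarily the procedure used by GBMI; the cohort effect sizes and standard errors are invented.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effects meta-analysis: weight each cohort by 1 / se^2.

    Returns the pooled effect, its standard error, and the pooled z-score."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se

# Hypothetical gene-trait effect estimates from three cohorts.
cohort_betas = [0.21, 0.15, 0.30]
cohort_ses = [0.05, 0.08, 0.12]

pooled_beta, pooled_se, pooled_z = ivw_meta(cohort_betas, cohort_ses)
print(f"beta={pooled_beta:.3f} se={pooled_se:.3f} z={pooled_z:.2f}")
```

A fixed-effects model assumes the same underlying effect in every cohort, which may not hold when expression prediction models port poorly across ancestries.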
Gene knockout and function
It can be difficult to observe the consequences of gene knockouts in the general population. One way to overcome this challenge is to consider the effects of low expression level of a gene, i.e., a pseudo-knockout, on the medical phenome. Individuals at the low end of the GReX distribution for a given gene (such as from a PrediXcan model of common variants) may still be predisposed to disease in a comparable way as an individual with a large-effect rare loss-of-function variant (Salisbury-Ruf et al., 2018; Zhou et al., 2021). By analyzing the enriched phenotypes of individuals at the low end of the GReX distribution, we may be able to identify at least some of the potential consequences of a gene knockout. For genes with embryonic lethal phenotypes or genes buffered by the genetic redundancy of other genes with similar function, this approach is limited (although the complementary use of animal models may further our understanding of the function of these genes). Otherwise, as pseudo-knockouts predispose an individual to disease, we may be able to infer the rare-variant effects given a sufficiently large sample size.
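The pseudo-knockout idea can be sketched by comparing disease frequency in individuals at the low end of the GReX distribution with that in everyone else. The bottom-10% cutoff, the 0.5 continuity correction, and the toy data are assumptions for illustration; a real analysis would adjust for covariates and test many phenotypes.

```python
def pseudo_knockout_odds_ratio(grex, is_case, fraction=0.1):
    """Odds ratio of disease for individuals in the lowest `fraction` of GReX
    versus the remainder, with a 0.5 continuity correction in every cell."""
    n = len(grex)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: grex[i])
    low = set(order[:k])
    a = sum(1 for i in range(n) if i in low and is_case[i])          # low GReX, case
    b = sum(1 for i in range(n) if i in low and not is_case[i])      # low GReX, control
    c = sum(1 for i in range(n) if i not in low and is_case[i])      # other, case
    d = sum(1 for i in range(n) if i not in low and not is_case[i])  # other, control
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# Hypothetical cohort of 20 individuals with increasing GReX; three are cases.
grex = [i / 10 for i in range(20)]
is_case = [i in (0, 1, 5) for i in range(20)]

odds_ratio = pseudo_knockout_odds_ratio(grex, is_case)
print(f"odds ratio = {odds_ratio:.2f}")
```

With so few individuals in the low-GReX group the estimate is very unstable, which is why the approach depends on a sufficiently large sample size.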
Future directions and challenges
The portability of TWAS models trained in one population may be limited in another population (Geoffroy et al., 2020). Recent work has addressed this serious limitation (Hoffman et al., 2017; Geoffroy et al., 2020; Bhattacharya et al., 2020; Chen et al., 2023; Bhattacharya et al., 2022a), but the problem remains methodologically challenging, especially as datasets are combined across multiple resources. In admixed populations, use of local ancestry may help to better characterize the genetic architecture of molecular traits, including gene expression, but heterogeneity in ancestry-specific effects may require new methodological developments (Zhong et al., 2019). Notably, TWAS has even been used to investigate differences between archaic hominin species and anatomically modern humans as an avenue for studying phenotypic differences using genomic information alone (Colbran et al., 2019).
The development of TWAS models may be greatly enhanced by new technological advances, such as long-read sequencing (Marx, 2023) and single-cell technologies (Heumos et al., 2023). In the case of isoform-level TWAS, long-read sequencing can enable the discovery of novel isoforms. On the other hand, the use of the expression of the full transcript may mask domain-specific gene-phenotype associations. An example is the X-linked androgen receptor AR. Trinucleotide repeats in exon 1 cause spinal and bulbar muscular atrophy while mutations in the androgen receptor domain cause androgen insensitivity syndrome. This difference may also be important for apparent gain-of-function mutations whose estimated phenotypic effects may be masked in the analysis. Gain-of-function mutations, i.e., single mutations that result in a new phenotype, may not be easily detected in trait associations with the total expression of a gene in which the mutated amino acid residue in the corresponding protein acts as a “domain”. Finally, some eQTLs may be active only at certain developmental stages or in rare cell types (Perez et al., 2022). Single-cell modeling approaches may therefore be needed to detect the causal effects of these genes on disease (Abe et al., 2023).
Concluding remarks
GReX analysis has proven to be useful in studies of human phenotypes and complex diseases. The functional focus on gene expression in the approach leads to interpretable results that can then be followed up experimentally. The application to biobank data facilitates translatable research while also enabling basic science studies of biological mechanisms. The integration of diverse ancestries provides opportunities to identify causal genes and accelerate precision medicine while also presenting methodological challenges. The core idea behind PrediXcan will continue to play a critical role as we switch our focus from genetic discovery to biological mechanism of disease.
Acknowledgments
This work was supported by the following National Institutes of Health (NIH) grants to E.R.G.: NHGRI R35HG010718, NHGRI R01HG011138, NIA R56AG068026, NIGMS R01GM140287, and NIMH R01MH126459. We acknowledge support from the Vanderbilt Center of Excellence for Undiagnosed Disease (VCEUD). E.W.K. gratefully acknowledges financial support from the NIH: R01MH113362, R56AG068026.
Footnotes
Conflict of interest
The authors declare no conflict of interest.
Data availability statement
The data (gene expression prediction models for PrediXcan, JTI, and modified UTMOST) that support the protocol are openly available in Zenodo at http://www.doi.org/10.5281/zenodo.3842289. The MR-JTI source code is openly available in Zenodo at http://www.doi.org/10.5281/zenodo.4164739. This source code is supplementary to the Github repository for the project which can be accessed at https://github.com/gamazonlab/MR-JTI/.
Literature Cited
- Abe H, Lin P, Zhou D, Ruderfer DM, and Gamazon ER 2023. Mapping the landscape of lineage-specific dynamic regulation of gene expression using single-cell transcriptomics and application to genetics of complex disease. Genetic and Genomic Medicine Available at: http://medrxiv.org/lookup/doi/10.1101/2023.10.24.23297476 [Accessed November 18, 2023]. [Google Scholar]
- Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, Torstenson ES, Shah KP, Garcia T, Edwards TL, et al. 2018. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nature Communications 9:1825. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bennett DA, Buchman AS, Boyle PA, Barnes LL, Wilson RS, and Schneider JA 2018. Religious Orders Study and Rush Memory and Aging Project. Journal of Alzheimer’s disease: JAD 64:S161–S189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bhattacharya A, García-Closas M, Olshan AF, Perou CM, Troester MA, and Love MI 2020. A framework for transcriptome-wide association studies in breast cancer in diverse study populations. Genome Biology 21:42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bhattacharya A, Hirbo JB, Zhou D, Zhou W, Zheng J, Kanai M, Global Biobank Meta-analysis Initiative, Pasaniuc B, Gamazon ER, and Cox NJ 2022a. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: Lessons from the Global Biobank Meta-analysis Initiative. Cell Genomics 2:100180. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bhattacharya A, Li Y, and Love MI 2021. MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies. PLoS genetics 17:e1009398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bhattacharya A, Vo DD, Jops C, Kim M, Wen C, Hervoso JL, Pasaniuc B, and Gandal MJ 2022b. Isoform-level transcriptome-wide association uncovers extensive novel genetic risk mechanisms for neuropsychiatric disorders in the human brain. Genetic and Genomic Medicine Available at: http://medrxiv.org/lookup/doi/10.1101/2022.08.23.22279134 [Accessed November 15, 2023]. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, et al. 2018. The UK Biobank resource with deep phenotyping and genomic data. Nature 562:203–209. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen F, Wang X, Jang S-K, Quach BC, Weissenkampen JD, Khunsriraksakul C, Yang L, Sauteraud R, Albert CM, Allred NDD, et al. 2023. Multi-ancestry transcriptome-wide association analyses yield insights into tobacco use biology and drug repurposing. Nature Genetics 55:291–300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Colbran LL, Gamazon ER, Zhou D, Evans P, Cox NJ, and Capra JA 2019. Inferred divergent gene regulation in archaic hominins reveals potential phenotypic differences. Nature Ecology & Evolution 3:1598–1606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gamazon ER 2021. Detecting context-dependent gene regulation. Nature Computational Science 1:393–394. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gamazon ER, and Stranger BE 2014. Genomics of alternative splicing: evolution, development and pathophysiology. Human Genetics 133:679–687. [DOI] [PubMed] [Google Scholar]
- Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, GTEx Consortium, Nicolae DL, et al. 2015. A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics 47:1091–1098. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gandal MJ, Zhang P, Hadjimichael E, Walker RL, Chen C, Liu S, Won H, van Bakel H, Varghese M, Wang Y, et al. 2018. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science (New York, N.Y.) 362:eaat8127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gazal S, Weissbrod O, Hormozdiari F, Dey KK, Nasser J, Jagadeesh KA, Weiner DJ, Shi H, Fulco CP, O’Connor LJ, et al. 2022. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nature Genetics 54:827–836. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Geoffroy E, Gregga I, and Wheeler HE 2020. Population-Matched Transcriptome Prediction Increases TWAS Discovery and Replication Rate. iScience 23:101850. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gerring ZF, Gamazon ER, White A, and Derks EM 2021. Integrative Network-Based Analysis Reveals Gene Networks and Novel Drug Repositioning Candidates for Alzheimer Disease. Neurology. Genetics 7:e622. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Giri A, Hellwege JN, Keaton JM, Park J, Qiu C, Warren HR, Torstenson ES, Kovesdy CP, Sun YV, Wilson OD, et al. 2019. Trans-ethnic association study of blood pressure determinants in over 750,000 individuals. Nature Genetics 51:51–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Consortium GTEx 2015. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science (New York, N.Y.) 348:648–660. [DOI] [PMC free article] [PubMed] [Google Scholar]
- GTEx Consortium 2020. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science (New York, N.Y.) 369:1318–1330. [DOI] [PMC free article] [PubMed] [Google Scholar]
- GTEx Consortium, Laboratory, Data Analysis &Coordinating Center (LDACC)—Analysis Working Group, Statistical Methods groups—Analysis Working Group, Enhancing GTEx (eGTEx) groups, NIH Common Fund, NIH/NCI, NIH/NHGRI, NIH/NIMH, NIH/NIDA, Biospecimen Collection Source Site—NDRI, et al. 2017. Genetic effects on gene expression across human tissues. Nature 550:204–213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, Jansen R, de Geus EJC, Boomsma DI, Wright FA, et al. 2016. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics 48:245–252.
- Gusev A, Mancuso N, Won H, Kousi M, Finucane HK, Reshef Y, Song L, Safi A, Schizophrenia Working Group of the Psychiatric Genomics Consortium, McCarroll S, et al. 2018. Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nature Genetics 50:538–548.
- Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. 2021. Integrated analysis of multimodal single-cell data. Cell 184:3573–3587.e29.
- He C, Xu Y, Zhou Y, Fan J, Cheng C, Meng R, Gamazon ER, and Zhou D 2023. Integrating population-level and cell-based signatures for drug repositioning. bioRxiv: The Preprint Server for Biology:2023.10.25.564079.
- Heumos L, Schaar AC, Lance C, Litinetskaya A, Drost F, Zappia L, Lücken MD, Strobl DC, Henao J, Curion F, et al. 2023. Best practices for single-cell analysis across modalities. Nature Reviews. Genetics 24:550–572.
- Hoffman JD, Graff RE, Emami NC, Tai CG, Passarelli MN, Hu D, Huntsman S, Hadley D, Leong L, Majumdar A, et al. 2017. Cis-eQTL-based trans-ethnic meta-analysis reveals novel genes associated with breast cancer risk. PLoS Genetics 13:e1006690.
- Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Gu J, Muchnik S, et al. 2019. A statistical framework for cross-tissue transcriptome-wide association analysis. Nature Genetics 51:568–576.
- Kalil AC, Patterson TF, Mehta AK, Tomashek KM, Wolfe CR, Ghazaryan V, Marconi VC, Ruiz-Palacios GM, Hsieh L, Kline S, et al. 2021. Baricitinib plus Remdesivir for Hospitalized Adults with Covid-19. The New England Journal of Medicine 384:795–807.
- King EA, Davis JW, and Degner JF 2019. Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genetics 15:e1008489.
- de Leeuw CA, Mooij JM, Heskes T, and Posthuma D 2015. MAGMA: generalized gene-set analysis of GWAS data. PLoS Computational Biology 11:e1004219.
- Li Z, Zhao W, Shang L, Mosley TH, Kardia SLR, Smith JA, and Zhou X 2022. METRO: Multi-ancestry transcriptome-wide association studies for powerful gene-trait association detection. American Journal of Human Genetics 109:783–801.
- Lu Z, Gopalan S, Yuan D, Conti DV, Pasaniuc B, Gusev A, and Mancuso N 2022. Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. American Journal of Human Genetics 109:1388–1404.
- Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, and Pasaniuc B 2019. Probabilistic fine-mapping of transcriptome-wide association studies. Nature Genetics 51:675–682.
- Marx V 2023. Method of the year: long-read sequencing. Nature Methods 20:6–11.
- Mostafavi H, Spence JP, Naqvi S, and Pritchard JK 2023. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nature Genetics 55:1866–1875.
- Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, Shen Y, Floratos A, Sham PC, Li MJ, Wang J, et al. 2015. The support of human genetic evidence for approved drug indications. Nature Genetics 47:856–860.
- Nica AC, Montgomery SB, Dimas AS, Stranger BE, Beazley C, Barroso I, and Dermitzakis ET 2010. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genetics 6:e1000895.
- Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, and Cox NJ 2010. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genetics 6:e1000888.
- Pairo-Castineira E, Clohisey S, Klaric L, Bretherick AD, Rawlik K, Pasko D, Walker S, Parkinson N, Fourman MH, Russell CD, et al. 2021. Genetic mechanisms of critical illness in COVID-19. Nature 591:92–98.
- Pairo-Castineira E, Rawlik K, Bretherick AD, Qi T, Wu Y, Nassiri I, McConkey GA, Zechner M, Klaric L, Griffiths F, et al. 2023. GWAS and meta-analysis identifies 49 genetic variants underlying critical COVID-19. Nature 617:764–768.
- Perez RK, Gordon MG, Subramaniam M, Kim MC, Hartoularos GC, Targ S, Sun Y, Ogorodnikov A, Bueno R, Lu A, et al. 2022. Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science (New York, N.Y.) 376:eabf1970.
- PsychENCODE Consortium, Akbarian S, Liu C, Knowles JA, Vaccarino FM, Farnham PJ, Crawford GE, Jaffe AE, Pinto D, Dracheva S, et al. 2015. The PsychENCODE project. Nature Neuroscience 18:1707–1712.
- Ramoni RB, Mulvihill JJ, Adams DR, Allard P, Ashley EA, Bernstein JA, Gahl WA, Hamid R, Loscalzo J, McCray AT, et al. 2017. The Undiagnosed Diseases Network: Accelerating Discovery about Health and Disease. American Journal of Human Genetics 100:185–192.
- RECOVERY Collaborative Group 2022. Baricitinib in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial and updated meta-analysis. Lancet (London, England) 400:359–368.
- Rozenblatt-Rosen O, Stubbington MJT, Regev A, and Teichmann SA 2017. The Human Cell Atlas: from vision to reality. Nature 550:451–453.
- Rubin R 2022. Baricitinib Is First Approved COVID-19 Immunomodulatory Treatment. JAMA 327:2281.
- Salisbury-Ruf CT, Bertram CC, Vergeade A, Lark DS, Shi Q, Heberling ML, Fortune NL, Okoye GD, Jerome WG, Wells QS, et al. 2018. Bid maintains mitochondrial cristae structure and function and protects against cardiac disease in an integrative genomics study. eLife 7:e40907.
- So H-C, Chau CK-L, Chiu W-T, Ho K-S, Lo C-P, Yim SH-Y, and Sham P-C 2017. Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry. Nature Neuroscience 20:1342–1349.
- Thompson M, Gordon MG, Lu A, Tandon A, Halperin E, Gusev A, Ye CJ, Balliu B, and Zaitlen N 2022. Multi-context genetic modeling of transcriptional regulation resolves novel disease loci. Nature Communications 13:5704.
- Unlu G, Gamazon ER, Qi X, Levic DS, Bastarache L, Denny JC, Roden DM, Mayzus I, Breyer M, Zhong X, et al. 2019. GRIK5 Genetically Regulated Expression Associated with Eye and Vascular Phenomes: Discovery through Iteration among Biobanks, Electronic Health Records, and Zebrafish. American Journal of Human Genetics 104:503–519.
- Unlu G, Qi X, Gamazon ER, Melville DB, Patel N, Rushing AR, Hashem M, Al-Faifi A, Chen R, Li B, et al. 2020. Phenome-based approach identifies RIC1-linked Mendelian syndrome through zebrafish models, biobank associations and clinical studies. Nature Medicine 26:98–109.
- Viñas R, Joshi CK, Georgiev D, Lin P, Dumitrascu B, Gamazon ER, and Liò P 2023. Hypergraph factorization for multi-tissue gene expression imputation. Nature Machine Intelligence 5:739–753.
- Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, and Yang J 2017. 10 Years of GWAS Discovery: Biology, Function, and Translation. American Journal of Human Genetics 101:5–22.
- Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B, Kirsten H, Saha A, Kreuzhuber R, Yazar S, et al. 2021. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nature Genetics 53:1300–1310.
- Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, Ermel R, Ruusalepp A, Quertermous T, Hao K, et al. 2019. Opportunities and challenges for transcriptome-wide association studies. Nature Genetics 51:592–599.
- Wang Y, Yen FS, Zhu XG, Timson RC, Weber R, Xing C, Liu Y, Allwein B, Luo H, Yeh H-W, et al. 2021. SLC25A39 is necessary for mitochondrial glutathione import in mammalian cells. Nature 599:136–140.
- Wingo AP, Liu Y, Gerasimov ES, Gockley J, Logsdon BA, Duong DM, Dammer EB, Robins C, Beach TG, Reiman EM, et al. 2021a. Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer’s disease pathogenesis. Nature Genetics 53:143–146.
- Wingo TS, Liu Y, Gerasimov ES, Gockley J, Logsdon BA, Duong DM, Dammer EB, Lori A, Kim PJ, Ressler KJ, et al. 2021b. Brain proteome-wide association study implicates novel proteins in depression pathogenesis. Nature Neuroscience 24:810–817.
- Wu P, Feng Q, Kerchberger VE, Nelson SD, Chen Q, Li B, Edwards TL, Cox NJ, Phillips EJ, Stein CM, et al. 2022. Integrating gene expression and clinical data to identify drug repurposing candidates for hyperlipidemia and hypertension. Nature Communications 13:46.
- Yao Y, Yang J, Qin Q, Tang C, Li Z, Chen L, Li K, Ren C, Chen L, and Rao S 2020. Functional annotation of genetic associations by transcriptome-wide association analysis provides insights into neutrophil development regulation. Communications Biology 3:790.
- Zhang J, Dutta D, Köttgen A, Tin A, Schlosser P, Grams ME, Harvey B, CKDGen Consortium, Yu B, Boerwinkle E, et al. 2022. Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. Nature Genetics 54:593–602.
- Zhong Y, Perera MA, and Gamazon ER 2019. On Using Local Ancestry to Characterize the Genetic Architecture of Human Traits: Genetic Regulation of Gene Expression in Multiethnic or Admixed Populations. American Journal of Human Genetics 104:1097–1115.
- Zhou D, Jiang Y, Zhong X, Cox NJ, Liu C, and Gamazon ER 2020. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nature Genetics 52:1239–1246.
- Zhou D, Yu D, Scharf JM, Mathews CA, McGrath L, Cook E, Lee SH, Davis LK, and Gamazon ER 2021. Contextualizing genetic risk score for disease screening and rare variant discovery. Nature Communications 12:4418.
Data Availability Statement
The data that support the protocol (gene expression prediction models for PrediXcan, JTI, and modified UTMOST) are openly available in Zenodo at http://www.doi.org/10.5281/zenodo.3842289. The MR-JTI source code is openly available in Zenodo at http://www.doi.org/10.5281/zenodo.4164739. This source code supplements the GitHub repository for the project, which can be accessed at https://github.com/gamazonlab/MR-JTI/.