American Journal of Human Genetics. 2025 Jun 30;112(8):1936–1947. doi: 10.1016/j.ajhg.2025.06.006

TransferTWAS: A transfer learning framework for cross-tissue transcriptome-wide association study

Daoyuan Lai 1, Han Wang 2, Tian Gu 3, Siqi Wu 1, Dajiang J Liu 4, Pak Chung Sham 5,6, Yan Dora Zhang 1
PMCID: PMC12414720  PMID: 40592330

Summary

Transcriptome-wide association studies (TWASs) utilize gene-expression data to explore the genetic basis of complex traits. A key challenge in TWASs is developing robust imputation models for tissues with limited sample sizes. This paper introduces transfer learning-assisted TWAS (TransferTWAS), a framework that adaptively transfers information from multiple tissues to improve gene-expression prediction in the target tissue. TransferTWAS employs a data-driven strategy that assigns higher weights to genetically similar external tissues. It outperforms other multi-tissue TWAS methods, such as the Unified Test for Molecular Signatures (UTMOST), which neglects tissue similarity, and Joint-Tissue Imputation (JTI), which relies on functional annotations to represent tissue similarity. Simulation studies demonstrate that TransferTWAS achieves the highest imputation accuracy, and analyses using the ROS/MAP and GEUVADIS datasets show a substantial power gain while maintaining control over type-I errors. Furthermore, analysis of the low-density lipoprotein cholesterol GWAS dataset and other complex traits demonstrates that TransferTWAS effectively identifies more associations compared with existing methods.

Keywords: transcriptome-wide association study, transfer learning, genetic association, eQTLs, genome-wide association studies


TransferTWAS is a TWAS method that adaptively borrows information from multiple external tissues to boost gene-expression prediction in tissues with small sample sizes. It outperforms competing approaches both in gene-expression imputation accuracy and in the number of detected gene-trait associations.

Introduction

Genome-wide association studies (GWASs) have identified numerous single-nucleotide polymorphisms (SNPs) associated with complex diseases; however, they face significant challenges in pinpointing causal genes, particularly for SNPs in non-coding regions.1,2 To address these limitations, transcriptome-wide association studies (TWASs) have emerged as a powerful approach, focusing on the association between predicted levels of genetically regulated gene expression (GReX) and phenotypes of interest.3,4 TWAS leverages gene-expression reference panels, such as the Genotype-Tissue Expression (GTEx) project,5,6,7,8 to explore the relationship between genotype and phenotype.9 Central to TWAS is the hypothesis that SNPs influence complex traits through expression quantitative trait loci (eQTLs). A TWAS involves two key steps: imputing tissue-specific GReX using transcriptomic and genetic data from reference panels and conducting association analyses between GReX and the phenotypes.

TWAS methods can be broadly categorized into single-tissue and multi-tissue approaches. Single-tissue methods, which focus on gene expression in biologically relevant tissues, face several limitations. These include an inability to fully utilize the multi-tissue nature of gene-expression reference panels such as GTEx, disregard for cross-tissue transcriptional regulatory similarities, and poor performance in tissues with limited sample sizes.10,11,12 In response, multi-tissue methods have been developed to leverage information across multiple tissues, aiming to improve performance in tissues with small effective sample sizes by incorporating data from larger, external tissues.10,11,12,13

The multi-tissue method, MultiXcan, regresses the complex phenotype of interest onto the principal components of the GReX from all available tissues.14 However, MultiXcan does not enhance the quality of GReX in individual tissues, and the interpretability of its principal components is limited. Another approach, the Unified Test for Molecular Signatures (UTMOST), jointly models genotype and cross-tissue gene-expression data but fails to account for cross-tissue similarity.10,11 To address this limitation, Zhou et al.11 proposed the joint-tissue imputation (JTI) method, which leverages shared genetic regulation of gene expression across tissues and incorporates external annotations, such as tissue-level expression correlations and gene-level DNase I hypersensitive site similarity. While JTI improves prediction performance, its effectiveness depends on the quality and availability of the functional annotations.

To overcome these challenges, we propose TransferTWAS, a multi-tissue TWAS method that employs transfer learning to enhance GReX imputation. Unlike traditional methods, TransferTWAS does not depend on functional annotations; instead, it automatically prioritizes tissues with expression patterns similar to the target tissue, minimizing the impact of dissimilar tissues. Our extensive simulations show that TransferTWAS significantly enhances gene-expression imputation accuracy compared to existing multi-tissue methods, resulting in greater TWAS power while maintaining control over type-I error rates. When applied to a quantile-transformed low-density lipoprotein cholesterol (LDL-C; MIM: 605028) GWAS dataset, TransferTWAS effectively identifies well-known causal genes, including the SORT1 (MIM: 602458)-PSRC1 (MIM: 613126)-CELSR2 (MIM: 604265) cluster, KPNB1 (MIM: 602738), and LIPC (MIM: 151670). Additionally, when tested on 30 other complex traits, TransferTWAS uncovers the highest number of significant associations, highlighting its wide-ranging applicability and effectiveness in genetic research.

Material and methods

TWAS framework

The first step of TWAS involves estimating cis-eQTL effect sizes using a gene-expression reference panel, which includes both gene-expression and genotype data. For tissue $k$ ($k \in \{1, \ldots, K\}$), the relationship between gene expression and genotype is modeled as a multiple linear regression:

$$E^{(k)} = G^{(k)} \beta^{(k)} + \epsilon^{(k)},$$ (Equation 1)

where $E^{(k)} \in \mathbb{R}^{n_k}$ is a vector of gene expression for $n_k$ individuals in tissue $k$, $G^{(k)} \in \mathbb{R}^{n_k \times M}$ is the genotype matrix of the $M$ cis-SNPs (within 1 Mb of the gene’s flanking regions), $\beta^{(k)} = (\beta_1^{(k)}, \ldots, \beta_M^{(k)})^\top$ is the $M$-vector of the eQTL effect sizes, and $\epsilon^{(k)} \in \mathbb{R}^{n_k}$ denotes the residual error term. Gene expression $E^{(k)}$ is adjusted for non-genetic covariates and centered such that $\mathbb{E}(E^{(k)}) = 0$. The genotype matrix $G^{(k)}$ is centered but not standardized. After estimating $\hat{\beta}^{(k)}$ for $k \in \{1, \ldots, K\}$, the GReX for a GWAS dataset with genotype matrix $\bar{G}$ is imputed as

$$\widehat{\mathrm{GReX}} = \bar{G} \hat{\beta}^{(k)}.$$

Overview of TransferTWAS

We provide a visualization of the workflow of TransferTWAS in Figure 1. For simplicity, we first assume that we are working on a gene that only has expression in two tissues. To calculate the cis-eQTL effect size in tissue $k$, TransferTWAS optimizes the following loss function to enhance GReX imputation:

$$\hat{\beta}^{(k)} = \arg\min_{\beta^{(k)}} \frac{1}{n_k} \lVert E^{(k)} - G^{(k)} \beta^{(k)} \rVert_2^2 + \underbrace{\lambda \lVert \beta^{(k)} \rVert_2^2}_{\text{ridge penalty}} - \underbrace{2\eta \, (\hat{\beta}^{(-k)})^\top \beta^{(k)}}_{\text{angle-based penalty}}, \quad k = 1, \ldots, K.$$ (Equation 2)

Here, $\lambda, \eta \in \mathbb{R}$ are tuning parameters controlling the ridge and the angle-based penalties, respectively. Since we only have two tissues (one target and one external), $\hat{\beta}^{(-k)} \in \mathbb{R}^M$ is the estimated eQTL effect size from that external tissue. The angle-based penalty, motivated by Gu et al.,15 encourages alignment between $\beta^{(k)}$ and informative external effect directions.

Figure 1. The schematic workflow of TransferTWAS

Suppose the target tissue is tissue $k$. The first step uses standard ridge regression to train the expression prediction models in the other available tissues and obtain the imputation weights $\hat{\beta}^{(j)}$ for $j \neq k$. Next, algorithm 1 aggregates the information in the $\hat{\beta}^{(j)}$ into $\hat{\beta}^{(-k)}$. The third step uses $\hat{\beta}^{(-k)}$, tissue $k$’s gene expression $E^{(k)}$, and genotype data $G^{(k)}$ as inputs to calculate the final estimator.

We now turn to the scenario in which more than one external tissue is available. In this case, $\hat{\beta}^{(-k)}$ is adaptively aggregated across external tissues using algorithm 1, which assigns higher weights to tissues with stronger predictive utility. This ensures that only relevant tissues contribute to the model, reducing noise from uninformative sources. Readers may refer to Note S1 for a detailed explanation of algorithm 1 and its rationale.

Solving the loss function

Equation 2 has a closed-form solution given by

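$$\hat{\beta}^{(k)} = \left( G^{(k)\top} G^{(k)} + n_k \lambda I \right)^{-1} \left( G^{(k)\top} E^{(k)} + n_k \eta \hat{\beta}^{(-k)} \right).$$ (Equation 3)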
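
As a brief sketch of where this closed form comes from (our restatement, not reproduced from the supplementary notes), assume the objective is $\frac{1}{n_k}\lVert E^{(k)} - G^{(k)}\beta^{(k)}\rVert_2^2 + \lambda\lVert\beta^{(k)}\rVert_2^2 - 2\eta(\hat{\beta}^{(-k)})^\top\beta^{(k)}$. The objective is strictly convex for $\lambda > 0$, so setting its gradient to zero gives the unique minimizer:

```latex
\begin{aligned}
-\frac{2}{n_k} G^{(k)\top}\bigl(E^{(k)} - G^{(k)}\beta^{(k)}\bigr)
  + 2\lambda\,\beta^{(k)} - 2\eta\,\hat{\beta}^{(-k)} &= 0 \\
\Longrightarrow\quad
\bigl(G^{(k)\top}G^{(k)} + n_k\lambda I\bigr)\,\beta^{(k)}
  &= G^{(k)\top}E^{(k)} + n_k\eta\,\hat{\beta}^{(-k)} .
\end{aligned}
```

Setting $\eta = 0$ recovers the usual ridge estimator, so the external estimate enters only as an additive shift of the right-hand side.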

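However, calculating the inverse of $G^{(k)\top} G^{(k)} + n_k \lambda I$ becomes computationally expensive for large $M$. To address this, we derive an alternative formulation of Equation 3,

$$\hat{\beta}^{(k)} = V_t \Sigma_1 U_t^\top E^{(k)} + n_k \eta \left( V_t \Sigma_2 V_t^\top + \frac{I - V_t V_t^\top}{n_k \lambda} \right) \hat{\beta}^{(-k)}.$$ (Equation 4)

In this formulation, the matrices $U_t$ and $V_t$ consist of the first $t$ columns of the matrices $U$ and $V$, which are obtained through the singular value decomposition (SVD) of $G^{(k)}$. We define

$$\Sigma_1 := \mathrm{diag}\left( \frac{d_1}{d_1^2 + n_k \lambda}, \ldots, \frac{d_t}{d_t^2 + n_k \lambda} \right),$$
$$\Sigma_2 := \mathrm{diag}\left( \frac{1}{d_1^2 + n_k \lambda}, \ldots, \frac{1}{d_t^2 + n_k \lambda} \right),$$

where $(d_1, \ldots, d_t)$ are the first $t$ singular values of $G^{(k)}$, and $I$ is an identity matrix of size $M \times M$. Throughout this paper, we set $t = \min(n_k, M)$.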

Equation 4 offers two key advantages: it avoids the computationally expensive $M \times M$ matrix inversion, making it scalable for large $M$, and it significantly reduces computational time by using the economy-size (thin) SVD instead of the full SVD. A detailed derivation of Equation 4 can be found in Note S2.
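
The equivalence of the direct solution and the SVD-based formulation can be checked numerically. The sketch below is illustrative only: it assumes NumPy is available, uses simulated data in place of real genotypes, and the function names are hypothetical rather than part of the TransferTWAS software.

```python
import numpy as np

def transfer_ridge_direct(G, E, beta_ext, lam, eta):
    """Direct solve of (G'G + n*lam*I) beta = G'E + n*eta*beta_ext."""
    n, M = G.shape
    A = G.T @ G + n * lam * np.eye(M)
    return np.linalg.solve(A, G.T @ E + n * eta * beta_ext)

def transfer_ridge_svd(G, E, beta_ext, lam, eta):
    """Same estimator via the thin SVD G = U diag(d) V', never forming an M x M matrix."""
    n, M = G.shape
    U, d, Vt = np.linalg.svd(G, full_matrices=False)  # keeps t = min(n, M) components
    V = Vt.T
    s1 = d / (d ** 2 + n * lam)      # diagonal of Sigma_1
    s2 = 1.0 / (d ** 2 + n * lam)    # diagonal of Sigma_2
    data_term = V @ (s1 * (U.T @ E))
    proj = V.T @ beta_ext            # coordinates of beta_ext in the row space of G
    ext_term = V @ (s2 * proj) + (beta_ext - V @ proj) / (n * lam)
    return data_term + n * eta * ext_term

# Simulated stand-in data (assumption): n = 30 samples, M = 200 cis-SNPs
rng = np.random.default_rng(0)
n, M = 30, 200
G = rng.normal(size=(n, M))
G -= G.mean(axis=0)                  # centered, not standardized
E = rng.normal(size=n)
E -= E.mean()
beta_ext = rng.normal(scale=0.1, size=M)

b_direct = transfer_ridge_direct(G, E, beta_ext, lam=0.5, eta=0.2)
b_svd = transfer_ridge_svd(G, E, beta_ext, lam=0.5, eta=0.2)
max_abs_diff = float(np.max(np.abs(b_direct - b_svd)))
```

The SVD version only ever multiplies vectors by the $M \times t$ matrix $V_t$, which is where the saving comes from when $M$ is much larger than $n_k$.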

Choice of tuning parameters

The selection of the optimal tuning parameters $(\lambda, \eta)$ is a critical step in the implementation of TransferTWAS. The tuning process starts with $\lambda$. We first perform standard ridge regression using the R function cv.glmnet() with alpha=0 and nfolds=5. This function generates a sequence of $\lambda$ values, from which we select five equally spaced candidates within the range $(\lambda_{\min}, \lambda_{\max})$.

Next, candidate values for $\eta$ are generated based on the selected $\lambda$ candidates. As suggested by Gu et al.,15 the theoretical optimal $\eta_{\text{opt}}$ is related to $\lambda_{\text{opt}}$ through the following equation:

$$\eta_{\text{opt}} = \lambda_{\text{opt}} \, \rho \sqrt{\frac{\alpha_k}{\alpha_{-k}}}.$$ (Equation 5)

Here, $\rho$ is the Pearson correlation between $\hat{\beta}^{(k)}$ and $\hat{\beta}^{(-k)}$. The two terms $\alpha_k := \mathbb{E}(\lVert \hat{\beta}^{(k)} \rVert_2^2)$ and $\alpha_{-k} := \mathbb{E}(\lVert \hat{\beta}^{(-k)} \rVert_2^2)$ measure the signal strength of the target and external datasets, respectively. Using Equation 5, five $\eta$ candidates are generated from the $\lambda$ candidates. Finally, the best combination of $\lambda$ and $\eta$ is determined through cross-validation (CV), ensuring optimal performance of the model.
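
The candidate-generation step can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the relation takes the form $\eta = \lambda \rho \sqrt{\alpha_k / \alpha_{-k}}$, replaces the expectations $\alpha_k$ and $\alpha_{-k}$ with plug-in squared norms of the two estimates, and includes the endpoints of the $\lambda$ range. All function names and the toy effect sizes are hypothetical.

```python
import math

def lambda_candidates(lam_min, lam_max, n=5):
    """n equally spaced values from lam_min to lam_max (endpoints included: an assumption)."""
    step = (lam_max - lam_min) / (n - 1)
    return [lam_min + i * step for i in range(n)]

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def eta_candidate(lam, beta_target, beta_ext):
    """eta = lam * rho * sqrt(alpha_k / alpha_-k), with plug-in squared norms (assumption)."""
    rho = pearson(beta_target, beta_ext)
    alpha_k = sum(b * b for b in beta_target)
    alpha_ext = sum(b * b for b in beta_ext)
    return lam * rho * math.sqrt(alpha_k / alpha_ext)

# Toy effect sizes (hypothetical): the external estimate is twice the target estimate,
# so rho = 1 and sqrt(alpha_k / alpha_-k) = 0.5.
beta_target = [0.2, -0.1, 0.4, 0.0]
beta_ext = [0.4, -0.2, 0.8, 0.0]
lams = lambda_candidates(0.1, 0.5)
etas = [eta_candidate(lam, beta_target, beta_ext) for lam in lams]
```

Each $\lambda$ candidate yields one $\eta$ candidate, so the cross-validation search runs over five $(\lambda, \eta)$-linked values rather than a full two-dimensional grid chosen by hand.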

GTEx, GEUVADIS, and ROS/MAP data preprocessing

The preprocessing of the GTEx data follows Zhou et al.,11 Wang et al.,16 and Hu et al.10 Gene-expression imputation models were trained using genotype and normalized gene-expression data from 48 GTEx tissues. SNPs with ambiguous alleles, minor allele frequency (MAF) less than 0.05, or Hardy-Weinberg equilibrium (HWE) p values less than 0.05 were excluded. The gene-expression data were adjusted for potential confounding effects, including sex, sequencing platform, the top three principal components of genotype data, and the top probabilistic estimation of expression residuals (PEER) factors. The number of PEER factors included in the adjustment was determined by tissue sample size: 15 (<150 samples), 30 (150–250 samples), and 35 (>250 samples). Covariates were sourced from the GTEx portal, and biallelic SNPs within a 1-Mb region of the target gene were selected as features.
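
The filtering and covariate rules above can be written down compactly. The sketch below is only an illustration with hypothetical function names; it assumes that "ambiguous alleles" means strand-ambiguous A/T and C/G pairs, and that the 150 and 250 boundaries belong to the middle PEER bracket.

```python
# Strand-ambiguous allele pairs (assumption about what "ambiguous alleles" means)
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def passes_snp_qc(maf, hwe_p, ref, alt):
    """Keep a SNP only if MAF >= 0.05, HWE p >= 0.05, and the alleles are unambiguous."""
    return maf >= 0.05 and hwe_p >= 0.05 and (ref, alt) not in AMBIGUOUS_PAIRS

def n_peer_factors(n_samples):
    """Number of PEER factors by tissue sample size: 15 (<150), 30 (150-250), 35 (>250)."""
    if n_samples < 150:
        return 15
    if n_samples <= 250:
        return 30
    return 35

# Example with made-up SNP records: (id, MAF, HWE p value, ref, alt)
snps = [
    ("snp1", 0.20, 0.60, "A", "G"),
    ("snp2", 0.20, 0.60, "A", "T"),   # strand-ambiguous
    ("snp3", 0.01, 0.60, "C", "T"),   # rare
    ("snp4", 0.30, 0.001, "C", "T"),  # fails HWE
]
kept = [s[0] for s in snps if passes_snp_qc(*s[1:])]
```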

The predictive performance of TransferTWAS was evaluated using two distinct gene-expression reference panels: Religious Orders Study and Rush Memory Aging Project (ROS/MAP) with brain (prefrontal cortex) tissue,17 and Genetic European Variation in Health and Disease (GEUVADIS) with lymphoblastoid cell lines.18 These datasets enable a comprehensive assessment of TransferTWAS. Their preprocessing follows protocols from Wang et al.19 and Keys et al.,20 with detailed methods provided in Note S3.

Simulation study I: Gene-expression imputation accuracy

Simulation study I evaluates the accuracy of gene-expression imputation for UTMOST, JTI, and TransferTWAS. The design followed Khunsriraksakul et al.,21 Feng et al.,22 and Nagpal et al.23 We measured imputation accuracy using the squared Pearson correlation (i.e., $r^2$) between the observed and predicted gene-expression levels. We focused on the gene CPTP (MIM: 615467), which has expression across all GTEx tissues, and examined three types of tissues: causal tissue, where genetic variants directly affect expression levels; genetically correlated tissues, which influence expression in the target tissue to a lesser extent; and genetically uncorrelated tissues with no genetic relationship to the causal tissue. In the simulations, the brain (prefrontal cortex) was designated as the causal tissue.

Simulation of gene expression in causal tissue

The gene expression in the causal tissue, $E$, was simulated using the formula

$$E = G\beta + \epsilon.$$ (Equation 6)

Here, $G \in \mathbb{R}^{n \times M}$ is the normalized genotype matrix for $n = 130$ GTEx individuals with CPTP expression data in the target tissue, where the matrix has a mean of 0 and a variance of 1. The vector $\beta = (\beta_1, \ldots, \beta_M)^\top$ contains the eQTL effect sizes with $M = 2{,}212$. We randomly selected $p_{\text{causal}} = 5\%$ of the SNPs as causal SNPs. The set of non-zero coefficients (i.e., the causal SNPs) in $\beta$ is denoted by $S = \{j : \beta_j \neq 0\}$. For the SNPs in $S$, we generated effect sizes $\beta_j$ (where $j \in S$) from a standard normal distribution $N(0, 1)$, while the effect sizes for the remaining non-causal SNPs were set to 0. We then rescaled the effect sizes $\beta$ to ensure that the gene-expression heritability (i.e., the proportion of gene-expression variance explained by SNPs) is $h_e^2$. The residual error $\epsilon$ follows a normal distribution $N(0, (1 - h_e^2) I)$.
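
A minimal sketch of this simulation step is given below. It is not the authors' code: the genotype matrix is replaced by independent standard normal draws, the rescaling uses the empirical variance of $G\beta$ (an assumption about how the rescaling is done), and all names are hypothetical.

```python
import math
import random

def simulate_causal_expression(G, p_causal, h2, rng):
    """Simulate E = G beta + eps with a fraction p_causal of causal SNPs and heritability h2."""
    n, M = len(G), len(G[0])
    n_causal = max(1, round(p_causal * M))
    causal = set(rng.sample(range(M), n_causal))
    # Causal effects from N(0, 1); non-causal effects are exactly zero
    beta = [rng.gauss(0, 1) if j in causal else 0.0 for j in range(M)]
    genetic = [sum(g * b for g, b in zip(row, beta)) for row in G]
    mean_g = sum(genetic) / n
    var_g = sum((x - mean_g) ** 2 for x in genetic) / n
    # Rescale so that the empirical variance of G beta equals h2 (assumption)
    scale = math.sqrt(h2 / var_g)
    beta = [b * scale for b in beta]
    sd_eps = math.sqrt(1 - h2)
    E = [scale * x + rng.gauss(0, sd_eps) for x in genetic]
    return beta, E

# Stand-in standardized genotypes (assumption): 100 individuals, 200 SNPs
rng = random.Random(0)
G = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(100)]
beta, E = simulate_causal_expression(G, p_causal=0.05, h2=0.2, rng=rng)

# Check that the genetic component explains the intended variance
g_vals = [sum(g * b for g, b in zip(row, beta)) for row in G]
g_mean = sum(g_vals) / len(g_vals)
genetic_var = sum((x - g_mean) ** 2 for x in g_vals) / len(g_vals)
```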

Simulation of gene expression in correlated tissues

Gene expression in $N_{\text{corr}}$ randomly selected correlated tissues was simulated next. We assumed a uniform genetic correlation $\rho$ between the causal tissue and each correlated tissue. The effect sizes for the $k$-th correlated tissue were denoted as $\beta^{(k)}$ and simulated as follows:

$$\beta_j^{(k)} \sim N(\rho \beta_j, (1 - \rho^2) \times h_e^2), \quad k = 1, \ldots, N_{\text{corr}}, \quad j \in S_k.$$

Here, $\beta_j^{(k)}$ represents the $j$-th coordinate of $\beta^{(k)}$, and $S_k$ is the active set of $\beta^{(k)}$. We assumed that $S_k$ is a random subset of $S$, with $|S_k| = q_k |S|$. The percentage $q_k$ represents the proportion of the shared causal SNPs between the causal tissue and tissue $k$ in GTEx, with specific values provided in Khunsriraksakul et al.21 The residual gene-expression values in correlated tissues were simulated as

$$\epsilon^{(k)} \sim N\left(0, \mathrm{diag}\left(\sqrt{1 - h_e^2}\right) \times \Sigma \times \mathrm{diag}\left(\sqrt{1 - h_e^2}\right)\right),$$

where $\Sigma$ represents the residual correlation among the gene-expression levels across tissues. Following Khunsriraksakul et al.,21 we set $\Sigma = I$. The gene-expression levels in the $k$-th correlated tissue, $E^{(k)}$, were then simulated as

$$E^{(k)} = G^{(k)} \beta^{(k)} + \epsilon^{(k)},$$ (Equation 7)

where $G^{(k)} \in \mathbb{R}^{n_k \times M}$ is the genotype matrix of the $n_k$ GTEx individuals that have gene expression in the $k$-th tissue.
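
The effect-size draw for a correlated tissue can be sketched as below. This is a hedged illustration with hypothetical names and toy values: it takes the variance to be $(1 - \rho^2) h_e^2$, and picks the shared causal set as a uniformly random subset of the causal SNPs with size rounded from $q_k |S|$.

```python
import math
import random

def simulate_correlated_effects(beta, rho, h2, q_k, rng):
    """Draw beta^(k)_j ~ N(rho * beta_j, (1 - rho^2) * h2) on a random subset S_k of the causal set S."""
    causal = [j for j, b in enumerate(beta) if b != 0.0]
    n_shared = round(q_k * len(causal))        # |S_k| = q_k * |S|, rounded (assumption)
    shared = set(rng.sample(causal, n_shared))
    sd = math.sqrt((1 - rho ** 2) * h2)
    return [rng.gauss(rho * b, sd) if j in shared else 0.0 for j, b in enumerate(beta)]

# Toy causal-tissue effects (hypothetical): 5 causal SNPs out of 10
rng = random.Random(1)
beta = [0.0, 0.3, 0.0, -0.2, 0.5, 0.0, 0.1, 0.0, -0.4, 0.0]
beta_k = simulate_correlated_effects(beta, rho=0.8, h2=0.2, q_k=0.6, rng=rng)
support_k = [j for j, b in enumerate(beta_k) if b != 0.0]
```

Setting `rho=0` in the same function gives effect sizes that carry no information about the causal tissue's effects.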

Simulation of gene expression in uncorrelated tissues

Gene-expression data $(E^{(l)}, G^{(l)})$ in the $l$-th uncorrelated tissue were simulated using a similar model as in Equation 6, but with $\rho = 0$.

Simulations were repeated 1,000 times for each of the six combinations of $(\rho, N_{\text{corr}}, p_{\text{causal}})$, i.e., (0.3, 0, 5%), (0.3, 24, 5%), (0.3, 47, 5%), (0.8, 0, 5%), (0.8, 24, 5%), and (0.8, 47, 5%). Results were presented as averages across 1,000 replicates for each $(\rho, N_{\text{corr}}, p_{\text{causal}})$ combination. The proportion of causal SNPs, $p_{\text{causal}}$, was also varied over the values (0.1%, 2%, 5%, 10%).

Simulation study II: TWAS power analysis

Simulation study II compares the performance of UTMOST, JTI, and TransferTWAS in terms of TWAS power, following the design outlined by Zhou et al.11 We assumed $P$ causal genes for the brain (prefrontal cortex) based on the ROS/MAP panel, and the true expression $E_{g,\text{true}}$ for each causal gene $g = 1, 2, \ldots, P$ was simulated from a standard normal distribution $N(0, I)$. The phenotype $Y$ was genetically determined by these $P$ causal genes and generated using the equation

$$Y = \sum_{g=1}^{P} \alpha_g E_{g,\text{true}} + \epsilon.$$ (Equation 8)

In this equation, the coefficients $\alpha_g$ were drawn from $N(0, h_p^2 / P)$, while the residual $\epsilon$ followed $N(0, (1 - h_p^2) I)$. The term $h_p^2$ represents phenotypic heritability, which is the proportion of phenotypic variance explained by gene-expression levels. Equation 8 indicates that the overall phenotypic variance explained by gene expression is $h_p^2$, and each causal gene contributes, on average, $\mathbb{E}(\alpha_g^2) = h_p^2 / P$ to this variance.

We constructed predicted gene expressions, denoted as $\widehat{\mathrm{GReX}}_{\text{UTMOST}}$, $\widehat{\mathrm{GReX}}_{\text{JTI}}$, and $\widehat{\mathrm{GReX}}_{\text{TransferTWAS}}$. To achieve this, we utilized eQTL effect sizes estimated from UTMOST, JTI, and TransferTWAS, which were derived from GTEx data. These effect sizes were applied to predict gene expression in the ROS/MAP dataset. Next, we calculated the empirical correlations for the three methods: $r_{\text{UTMOST}}$, $r_{\text{JTI}}$, and $r_{\text{TransferTWAS}}$. These correlations represent the Pearson correlation between observed and predicted gene expressions based on each method’s performance in ROS/MAP. An empirical correlation matrix $\Phi$ was constructed as follows:

$$\Phi = \begin{pmatrix} 1 & r_{\text{UTMOST}} & r_{\text{JTI}} & r_{\text{TransferTWAS}} \\ r_{\text{UTMOST}} & 1 & 0 & 0 \\ r_{\text{JTI}} & 0 & 1 & 0 \\ r_{\text{TransferTWAS}} & 0 & 0 & 1 \end{pmatrix}.$$

The predicted gene expressions $\widehat{\mathrm{GReX}}_{\text{UTMOST}}$, $\widehat{\mathrm{GReX}}_{\text{JTI}}$, and $\widehat{\mathrm{GReX}}_{\text{TransferTWAS}}$ were determined using the following formula:

$$(E_{g,\text{true}}, \widehat{\mathrm{GReX}}_{\text{UTMOST}}, \widehat{\mathrm{GReX}}_{\text{JTI}}, \widehat{\mathrm{GReX}}_{\text{TransferTWAS}}) = (E_{g,\text{true}}, Z_1, Z_2, Z_3) \times \mathrm{cholesky}(\Phi) \times \mathrm{SD}(E_{g,\text{true}}) + \mathrm{mean}(E_{g,\text{true}}),$$

where $Z_1$, $Z_2$, and $Z_3$ were independently simulated from a standard normal distribution $N(0, I)$. The goal is to achieve specified correlations $r_{\text{UTMOST}}$, $r_{\text{JTI}}$, and $r_{\text{TransferTWAS}}$ between $E_{g,\text{true}}$ and the respective predicted gene expressions, mimicking the behavior of the prediction models under study. The predicted gene expressions were generated independently while ensuring they correlate with the true expression $E_{g,\text{true}}$.
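
The Cholesky construction can be illustrated with a short sketch. It is not the authors' code: the correlation values are made up, the upper-triangular factor $R$ with $R^\top R = \Phi$ is assumed (the convention of R's chol()), and the true expression is standardized before the transformation, which is an assumption about how SD and mean enter.

```python
import math
import random

def cholesky_upper(A):
    """Return upper-triangular R with R^T R = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return [[L[j][i] for j in range(n)] for i in range(n)]  # transpose of L

def simulate_predicted(e_true, r_utmost, r_jti, r_transfer, rng):
    """Rows of (E_true, GReX_UTMOST, GReX_JTI, GReX_TransferTWAS) with correlations given by Phi."""
    phi = [[1.0, r_utmost, r_jti, r_transfer],
           [r_utmost, 1.0, 0.0, 0.0],
           [r_jti, 0.0, 1.0, 0.0],
           [r_transfer, 0.0, 0.0, 1.0]]
    R = cholesky_upper(phi)  # requires r_utmost^2 + r_jti^2 + r_transfer^2 < 1
    n = len(e_true)
    mean = sum(e_true) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in e_true) / (n - 1))
    rows = []
    for e in e_true:
        x = [(e - mean) / sd, rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
        rows.append([sd * sum(x[i] * R[i][j] for i in range(4)) + mean for j in range(4)])
    return phi, R, rows

rng = random.Random(2)
e_true = [rng.gauss(0, 1) for _ in range(500)]
phi, R, rows = simulate_predicted(e_true, 0.20, 0.25, 0.30, rng)
# Reconstruct Phi from the factor to confirm R^T R = Phi
phi_back = [[sum(R[k][i] * R[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
```

Because the first column of $R$ is $(1, 0, 0, 0)^\top$, the first output column reproduces the true expression, and each predicted column is a mix of the true expression and independent noise.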

Subsequently, 1,000 simulations were performed to test the association between each predicted gene expression ($\widehat{\mathrm{GReX}}_{\text{UTMOST}}$, $\widehat{\mathrm{GReX}}_{\text{JTI}}$, $\widehat{\mathrm{GReX}}_{\text{TransferTWAS}}$) and the phenotype $Y$. The TWAS power was estimated as the proportion of simulations achieving statistical significance ($p_{\text{Bonferroni}} < 0.05$). To explore different scenarios, we varied the phenotypic heritability $h_p^2$ across the values (0.05, 0.1, 0.2, 0.3) and the number of causal genes $P$ among (40, 50, 60, 70). Additionally, we varied the causal tissue by using the GEUVADIS dataset as the reference panel and selecting Epstein-Barr virus (EBV)-transformed lymphocytes as the causal tissue to construct $\Phi$ and simulate $\widehat{\mathrm{GReX}}_{\text{UTMOST}}$, $\widehat{\mathrm{GReX}}_{\text{JTI}}$, $\widehat{\mathrm{GReX}}_{\text{TransferTWAS}}$, and $Y$.

Simulation study III: Type-I error analysis

To assess the type-I error rates of the three methods, we assume no association between the phenotype $Y$ and the true gene expression $E_{g,\text{true}}$. Therefore, $Y$ was directly simulated from a standard normal distribution $N(0, I)$. For each gene in the designated tissue, we simulated $Y$ and regressed it on the predicted gene expression $\widehat{\mathrm{GReX}}$ derived directly from GTEx. This process was replicated 1,000 times for each gene, and a significance threshold of 0.05 was used for the type-I error analysis. We considered two different GTEx tissues as the causal tissue: the brain (prefrontal cortex) and EBV-transformed lymphocytes.
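
The null simulation for a single gene can be sketched as follows. This is an illustration under stated assumptions: the predicted expression is a made-up vector, the slope test uses a normal approximation to the t statistic, and the function names are hypothetical.

```python
import math
import random

def slope_p_value(x, y):
    """Two-sided p value for the slope in y ~ x, using a normal approximation to the t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(slope / se) / math.sqrt(2))

def type1_error_rate(grex, n_rep, alpha, rng):
    """Fraction of null phenotypes Y ~ N(0, 1) declared associated with a fixed predictor."""
    hits = 0
    for _ in range(n_rep):
        y = [rng.gauss(0, 1) for _ in grex]
        if slope_p_value(grex, y) < alpha:
            hits += 1
    return hits / n_rep

rng = random.Random(3)
grex = [rng.gauss(0, 1) for _ in range(200)]   # stand-in for one gene's predicted expression
rate = type1_error_rate(grex, n_rep=1000, alpha=0.05, rng=rng)
```

A well-calibrated test should return a rate close to the nominal 0.05; systematic departures above that level indicate inflation.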

We also evaluated TWAS power and type-I error using alternative simulation designs from Nagpal et al.,23 Feng et al.,22 and Khunsriraksakul et al.21 These are designated as simulation studies IV (power) and V (type-I error), with detailed information provided in Note S4.

Model assessment in GTEx

The gene-expression imputation performances of UTMOST, JTI, and TransferTWAS were compared based on their $r^2$. The dataset was randomly divided into five equal-sized groups, with 3/5 as training set, 1/5 as validation set, and 1/5 as test set. A 5-fold CV was conducted on the training set to select the best tuning parameter that minimizes the prediction mean squared error on the validation set. The models were then trained on the training and validation sets using the selected tuning parameter, and prediction performance was evaluated on the test set through the Pearson correlation. The final correlation was calculated based on the average of the five Pearson correlation estimates, with $r^2$ set to 0 if the training model is null. This assessment procedure follows Khunsriraksakul et al.21
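
One way to realize the 3/5, 1/5, 1/5 partition with five rotations is sketched below. The rotation rule (the group after the test group serves as validation) is an assumption, since the text does not specify it, and the function name is hypothetical.

```python
import random

def rotating_splits(n_samples, n_groups=5, seed=0):
    """Yield (train, validation, test) index lists; each group is the test set exactly once."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    groups = [idx[g::n_groups] for g in range(n_groups)]
    splits = []
    for g in range(n_groups):
        v = (g + 1) % n_groups             # validation group follows the test group (assumption)
        train = [i for h in range(n_groups) if h not in (g, v) for i in groups[h]]
        splits.append((train, groups[v], groups[g]))
    return splits

splits = rotating_splits(100)
sizes = [(len(tr), len(va), len(te)) for tr, va, te in splits]
```

The five test-set correlations produced by such a scheme are then averaged to give the reported accuracy.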

TWAS with summary-level GWAS

When working with summary-level data in a GWAS, S-PrediXcan24 is applied to calculate the TWAS statistic using the following formula:

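$$Z_g = \sum_{m=1}^{M} \hat{\beta}_m \frac{\hat{\sigma}_m}{\hat{\sigma}_g} \frac{\hat{\gamma}_m}{\mathrm{se}(\hat{\gamma}_m)},$$ (Equation 9)

where $\hat{\beta}_m$ is the prediction weight of gene $g$’s SNP $m$ obtained in the first step of TWAS, $\hat{\sigma}_m$ is SNP $m$’s variance, $\hat{\sigma}_g$ is an estimate of gene $g$’s predicted expression’s variance, and $\hat{\gamma}_m$ and $\mathrm{se}(\hat{\gamma}_m)$ are the GWAS regression coefficient for SNP $m$ and the corresponding standard error.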
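
The summary-statistic computation can be sketched as below. This is a hedged illustration, not the S-PrediXcan software: the function name and inputs are hypothetical, and it assumes the two $\hat{\sigma}$ terms enter on the standard-deviation scale, with the predicted-expression variance obtained as $w^\top \Gamma w$ from a reference SNP covariance matrix $\Gamma$.

```python
import math

def spredixcan_z(weights, snp_cov, gwas_beta, gwas_se):
    """Gene-level Z score from prediction weights, a SNP covariance matrix, and GWAS summary statistics."""
    M = len(weights)
    # Variance of predicted expression: w' Gamma w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j] for i in range(M) for j in range(M))
    sigma_g = math.sqrt(var_g)
    z = 0.0
    for m in range(M):
        sigma_m = math.sqrt(snp_cov[m][m])   # SNP-level scale (assumption: SD scale)
        z += weights[m] * (sigma_m / sigma_g) * (gwas_beta[m] / gwas_se[m])
    return z

# Single-SNP gene: the gene-level Z equals the SNP's GWAS Z (here 0.02 / 0.01 = 2)
z_single = spredixcan_z([0.5], [[0.3]], [0.02], [0.01])

# Two independent unit-variance SNPs with equal weights: Z = (3 + 1) / sqrt(2)
z_pair = spredixcan_z([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [0.03, 0.01], [0.01, 0.01])
```

Only the prediction weights, a reference covariance matrix, and per-SNP GWAS estimates are needed, which is why the approach works without individual-level GWAS data.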

Results

Simulation studies

Simulations showed that TransferTWAS achieved the highest TWAS power, controlled type-I error, and improved gene-expression imputation accuracy.

Simulation study I examined TransferTWAS’s gene-expression imputation performance under various scenarios. Figure 2 shows that across different $(\rho, N_{\text{corr}})$ combinations, TransferTWAS consistently outperformed UTMOST and JTI in terms of $r^2$ when 5% of the SNPs were causal. Notably, we considered some extreme cases, for example, a single causal tissue with no correlated tissue ($N_{\text{corr}} = 0$). In this situation, TransferTWAS still achieved higher imputation accuracy compared to the other methods. This highlights the robustness of TransferTWAS, as it effectively avoids negative transfer, where a transfer learning method performs worse than a target-only method.25 We also considered the scenario where $\rho = 0.8$, indicating a strong genetic correlation between the causal and correlated tissues. Under this scenario, TransferTWAS still achieved improved performance.

Figure 2. Comparison of gene-expression imputation accuracy ($r^2$) among TransferTWAS, UTMOST, and JTI in simulation study I

The average squared Pearson correlations ($r^2$) between observed and predicted gene expression in the test dataset by TransferTWAS, UTMOST, and JTI, with various combinations of genetic correlation between causal and correlated tissues $\rho = (0.3, 0.8)$ and number of correlated tissues $N_{\text{corr}} = (0, 24, 47)$. We assumed the proportion of causal SNPs is 5% in this simulation.

TransferTWAS demonstrated robust performance across varying causal proportions. When $p_{\text{causal}} > 2\%$, it consistently achieved higher imputation $r^2$ on the test dataset compared with UTMOST and JTI across all levels of expression heritability $h_e^2$ and number of causal genes $P$ (Figures 2 and S4–S6). At $p_{\text{causal}} = 2\%$, TransferTWAS outperformed both methods for $h_e^2 = (0.05, 0.1, 0.15, 0.2)$, while maintaining an advantage over JTI at $h_e^2 = 0.25$ despite UTMOST’s slightly better performance in this specific scenario (Figure S4). However, under a sparse model with $p_{\text{causal}} = 0.1\%$, UTMOST and JTI yield higher $r^2$ in the test dataset compared with TransferTWAS (Figure S5). These patterns suggest that TransferTWAS achieves optimal performance when $p_{\text{causal}} \geq 2\%$ and $h_e^2 \leq 0.2$, with performance comparable to that of UTMOST at $h_e^2 = 0.25$. Since UTMOST does not incorporate tissue-similarity information, these results indicate that leveraging external tissue data provides the greatest benefit when $p_{\text{causal}} \geq 2\%$ and $h_e^2 \leq 0.2$. The observed performance differences reflect each method’s underlying assumptions. TransferTWAS assumes a non-sparse architecture, while UTMOST and JTI assume a sparse one. We will expand on these implications in the discussion.

Simulation studies II and IV indicated that TransferTWAS exhibits significantly higher statistical power than other methods across sample sizes from 5,000 to 500,000 (Figures 3, S1, and S2) when analyzing the brain (prefrontal cortex) using ROS/MAP data. This advantage remained consistent across varying levels of $h_e^2$ and $P$. A similar trend was observed with EBV-transformed lymphocytes, as shown in Figure S2. Additionally, TransferTWAS maintained robust performance under varying conditions, including the number of tissues correlated with the causal tissue ($N_{\text{corr}}$), expression heritability ($h_e^2$), and correlation strength ($\rho$), as detailed in Table S1.

Figure 3. Power comparison of UTMOST, JTI, and TransferTWAS based on simulation study II using ROS/MAP data

We simulated 40 causal genes ($P = 40$) explaining $h_p^2 = 2\%$ of the total phenotypic variance. True gene-expression levels and their effects on the trait were simulated, with each gene contributing $h_p^2 / P$ variance. Predicted expression levels were generated using the actual prediction performance ($r^2$) from ROS/MAP for each method. Power was calculated as the proportion of simulations with Bonferroni-corrected significance $p_{\text{Bonferroni}} < 0.05$. More scenarios were evaluated in Figure S1.

In addition to its enhanced power, TransferTWAS controlled type-I error rates in simulation studies III and V. Evaluations in the brain (prefrontal cortex) and EBV-transformed lymphocytes revealed that TransferTWAS maintains well-controlled type-I error rates, as shown in Figures 4 and S3. While UTMOST and JTI exhibited comparable type-I error rates in the brain (prefrontal cortex), UTMOST showed inflation in EBV-transformed lymphocytes. Further analysis (simulation study V) confirms that TransferTWAS’s type-I error remains well controlled across varying levels of $h_e^2$, $\rho$, and $N_{\text{corr}}$, as summarized in Table S2.

Figure 4. Comparison of type-I error rates for UTMOST, JTI, and TransferTWAS in simulation study III using ROS/MAP prefrontal cortex data

Quantile-quantile plots of TWAS p values from TransferTWAS, UTMOST, and JTI were generated to visualize the type-I error rates of these models in brain (prefrontal cortex) compared to the expected values, with the blue dashed lines representing the 95% confidence intervals of the expected log(p) values.

Real application to GTEx

We first compared the transcriptome-wide 5-fold CV $r^2$ of TransferTWAS, UTMOST, and JTI in GTEx. Figures 5A and S7 illustrate that TransferTWAS and UTMOST improve $r^2$ over JTI on the test dataset, with TransferTWAS showing increased imputation accuracy as GTEx tissue sample sizes decrease. Notably, TransferTWAS achieved a mean $\Delta r^2_{\text{TransferTWAS}}$ ($r^2$ difference between TransferTWAS and JTI) of 0.017, surpassing UTMOST’s mean $\Delta r^2_{\text{UTMOST}}$ ($r^2$ difference between UTMOST and JTI) of 0.006 (Table S3). While TransferTWAS’s $r^2$ was lower than JTI’s in four tissues with large sample size, it showed improvement over JTI in the remaining 44 GTEx tissues. The enhancement over UTMOST can be attributed to the inclusion of tissue similarity information, which UTMOST does not consider. Additionally, TransferTWAS’s improvement over JTI suggests that its data-driven approach to aggregating external tissue information is more effective in most GTEx tissues.

Figure 5. Gene-expression imputation accuracy improvement over JTI in GTEx tissues

(A) The average $r^2$ increment of UTMOST and TransferTWAS compared to JTI. The average $r^2$ values are calculated over all expressed genes in each tissue.

(B) The average iGene ($r^2 > 0.01$) number increment of UTMOST and TransferTWAS compared with JTI.

(C) The proportion of JTI iGenes captured by UTMOST and TransferTWAS.

Second, we compared the number of imputable genes (iGenes, defined as $r^2 > 0.01$, as suggested by multiple studies3,23,26). TransferTWAS showed an average of 10,744 iGenes, exceeding UTMOST’s 7,927 and JTI’s 6,668. TransferTWAS consistently outperformed JTI across all GTEx tissues, while UTMOST failed to do so in larger-sample-size tissues (Figure 5B and Table S3). Although TransferTWAS may not exceed JTI in $\Delta r^2$ for some larger tissues, it effectively captured more iGenes, indicating strong imputation capability.

Figure 5C and Table S3 report the proportion of JTI’s iGenes captured by each method. TransferTWAS captured an average of 81.68% of JTI’s iGenes compared to UTMOST’s 75.46%. Thus, TransferTWAS not only recovered a substantial share of JTI’s iGenes but also identified genes that JTI did not classify as imputable.

Focusing on tissues with sample sizes smaller than 300, TransferTWAS’s superiority became more evident, achieving a mean $\Delta r^2_{\text{TransferTWAS}} = 0.028$, versus UTMOST’s 0.013 (Table S3). It identified an average of 11,390 iGenes, exceeding UTMOST’s 8,419 and JTI’s 6,361, and captured 89.42% of JTI’s iGenes compared to UTMOST’s 87.50%. This confirms the effectiveness of TransferTWAS’s transfer learning approach for tissues with limited sample sizes.

In challenging contexts involving large sample sizes, TransferTWAS consistently outperformed or matched UTMOST and JTI. For muscle (skeletal) tissue ($n = 706$), TransferTWAS has $\Delta r^2_{\text{TransferTWAS}} = -0.001$ and identifies 8,380 iGenes, outperforming UTMOST ($\Delta r^2_{\text{UTMOST}} = -0.005$ with 6,077 iGenes) and JTI (6,454 iGenes). In the testis tissue, TransferTWAS achieved $\Delta r^2_{\text{TransferTWAS}} = 0.004$, surpassing UTMOST’s $\Delta r^2_{\text{UTMOST}} = -0.004$. Additionally, TransferTWAS identified a substantially larger number of iGenes, totaling 11,908, compared to UTMOST’s 8,932 and JTI’s 8,276. Overall, while TransferTWAS may show a slight disadvantage in imputation $r^2$ for tissues with relatively large sample size, it consistently allows for a greater number of genes to be classified as imputable.

Additionally, we conducted a replication study using weights trained on GTEx samples to predict expression levels in 373 European individuals from the GEUVADIS dataset. TransferTWAS achieved higher prediction $r^2$ and identified more iGenes compared to UTMOST and JTI (Table S4).

Overall, TransferTWAS enhanced imputation accuracy in GTEx tissues, which was consistent with our simulation results.

Real application to quantile-transformed LDL-C GWAS dataset

We applied the gene-expression imputation models from GTEx data to identify potential risk genes of quantile-transformed LDL-C ($N = 343{,}621$) using the UK Biobank GWAS dataset. The SNP-SNP covariance matrices for Equation 9 were estimated using the GTEx v.8 samples, and identified associations were validated against existing literature.

As shown in Figure 6A, TransferTWAS identified the largest number of significant associations (1,385) in liver tissue, outperforming JTI (375) and UTMOST (483). The significance threshold was set at a false discovery rate (FDR)-corrected p value of less than 0.05 ($p_{\text{FDR}} < 0.05$). Among associations identified by JTI, 54.67% (205) were also nominally significant ($p < 0.05$) under UTMOST, while TransferTWAS increased this proportion to 67.73% (254).
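
For readers who want to reproduce an FDR threshold of this kind, the sketch below computes adjusted p values with the Benjamini-Hochberg procedure. The text does not name the FDR procedure used, so Benjamini-Hochberg is an assumption here, and the p values are made up.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.5]           # hypothetical gene-level TWAS p values
p_fdr = benjamini_hochberg(p_values)
significant = [i for i, p in enumerate(p_fdr) if p < 0.05]
```

In this toy example only the first gene passes the 0.05 threshold after correction, although three of the four raw p values are below 0.05.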

Figure 6. TWAS results of studying low-density lipoprotein cholesterol

(A) The number of genes that were significant under TransferTWAS, UTMOST, or JTI. Here, significance was defined as false discovery rate (FDR)-corrected p value of less than 0.05 ($p_{\text{FDR}} < 0.05$).

(B) The number of predefined known LDL-C-related genes detected by the three methods.

(C) The number of genes identified in the biologically relevant tissue for each of the 30 complex traits. Each box includes two horizontal borders that represent the upper and lower quartiles and a solid line that represents the median. The highest and lowest points indicate the maximum and minimum values.

We examined TransferTWAS’s ability to replicate well-known LDL-C-associated genes. Among the 59 LDL-C-related genes reported by Zhou et al.,11 TransferTWAS identified 22, while UTMOST and JTI captured 11 (Figure 6B and Table S5). UTMOST, JTI, and TransferTWAS consistently captured many well-known LDL-C-related genes. For example, all three methods show similarly strong association signals for the potential LDL-C-related genes, including PCSK9 (MIM: 607786), the SORT1-PSRC1-CELSR2 cluster, KPNB1, and LIPC (Table S6). For other LDL-C genes, TransferTWAS showed a boosted performance. TransferTWAS uniquely identified ANGPTL3 (MIM: 603874) as imputable ($r^2 = 1.85\%$), leading to a significant TWAS association ($p = 0$) and supporting findings that inhibiting ANGPTL3 lowers LDL-C levels.27

We identified 898 additional associations through the TransferTWAS method (Table S7). Based on the suggestion of Zhou et al.,11 we defined the additional associations using the following criteria: TransferTWAS $p_{\text{FDR}} < 0.05$; UTMOST $p > 0.05$ or not imputable; and JTI $p > 0.05$ or not imputable. Among these, several associations merit discussion. An improved signal was detected for APOA1 (MIM: 107680) (TransferTWAS: $r^2 = 2.95\%$), whereas the other two methods reported this gene as not imputable. Such improvement in gene-expression imputation may contribute to the significant association from TransferTWAS ($p = 5.79 \times 10^{-7}$). Similarly, TransferTWAS showed an improved imputation quality in the APOB (MIM: 107730) gene (TransferTWAS: $r^2 = 17\%$; UTMOST: not imputable; JTI: not imputable), leading to a significant association (TransferTWAS: $p = 0$; UTMOST: not imputable; JTI: not imputable). This finding replicated the observations of Peloso et al.,28 who report that mutations in APOB may be associated with lower LDL-C. An enhanced imputation quality for ABCA6 (MIM: 612504) was suggested by TransferTWAS ($r^2 = 3.87\%$), and the corresponding TWAS p value is $5.10 \times 10^{-4}$. This finding is in line with those of Francis et al.,29 who associated a variant of this gene with LDL-C. UTMOST and JTI failed to impute this gene.
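
The three criteria for an additional association amount to a simple filter, sketched below with a hypothetical function name and made-up inputs. A value of None stands for a gene that a method could not impute.

```python
def is_additional_association(transfer_p_fdr, utmost_p, jti_p, alpha=0.05):
    """True if TransferTWAS is FDR-significant while UTMOST and JTI are each non-significant or not imputable."""
    def missed(p):
        # None encodes "not imputable" (an encoding chosen for this sketch)
        return p is None or p > alpha
    return transfer_p_fdr < alpha and missed(utmost_p) and missed(jti_p)

# Hypothetical genes: (TransferTWAS FDR p, UTMOST p, JTI p)
examples = {
    "geneA": (1e-6, None, None),    # imputable only by TransferTWAS
    "geneB": (1e-6, 0.20, None),    # UTMOST imputable but not nominally significant
    "geneC": (1e-6, 0.01, 0.30),    # UTMOST already nominally significant
    "geneD": (0.30, None, None),    # TransferTWAS itself not significant
}
additional = [g for g, vals in examples.items() if is_additional_association(*vals)]
```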

Enriched pathways in LDL-C

To assess the biological relevance between LDL-C and the significant genes identified by TransferTWAS, we performed functional enrichment analysis. As shown in Figure S8, the most significant pathways are directly tied to lipid metabolism and cardiovascular mechanisms. Specifically, the top seven significant pathways are all closely related to LDL-C: total cholesterol (p = 2.24×10⁻²⁵), LDL-C (p = 6.23×10⁻²¹), triglycerides (p = 1.89×10⁻¹⁵), metabolite levels (p = 1.08×10⁻¹³), lipid metabolism phenotypes (p = 1.93×10⁻¹²), cholesterol metabolism (p = 1.82×10⁻¹²), and high-density lipoprotein cholesterol (HDL-C) (p = 7.41×10⁻¹¹). Other pathways, such as cholesterol metabolic process (p = 5.81×10⁻⁶) and cholesterol homeostasis (p = 1.46×10⁻⁶), are also among the top 50 significant pathways, aligning with LDL-C’s central role in lipid regulation. Additionally, pathways linked to cardiovascular disease risk are strongly enriched, including coronary heart disease (p = 5.14×10⁻¹⁰), coronary artery disease (p = 3.08×10⁻⁷), and cardiovascular disease risk factors (p = 1.72×10⁻⁷). This is consistent with the clinical implications of elevated LDL-C. Interestingly, the enrichment of MHC class II antigen presentation and immune-related pathways (e.g., graft-versus-host disease) may reflect emerging links between lipid metabolism and inflammation. The strong functional overlap with established LDL-C-relevant pathways and disease mechanisms further supports the ability of TransferTWAS to capture trait-relevant genes.

Real application to other complex traits

We tested the TWAS performance of UTMOST, JTI, and TransferTWAS in 30 other complex traits, including depressive symptoms, schizophrenia, and Alzheimer disease (Ntotal ≈ 2.5 million without adjusting for cross-study sample overlap). These GWAS datasets were previously employed in Hu et al.10 To identify the most biologically relevant tissues for each analyzed trait, Hu et al.10 employed linkage-disequilibrium-score regression30 and tissue-specific functional genome predicted by GenoSkyline-Plus annotations.31 The results are listed in their Supplementary Table 24, and we used them to define the causal tissue of each trait.

TransferTWAS identified the greatest number of significant associations within biologically relevant tissues across 30 complex traits. As illustrated in Figure 6C, TransferTWAS outperformed competing methods, detecting substantially more associations in the most biologically relevant tissue for each trait. Specifically, TransferTWAS exhibited a 192.17% increase in associations compared to UTMOST and a 213.14% increase compared to JTI. Paired one-sided Wilcoxon tests on the number of associations identified by each method confirmed these improvements: TransferTWAS found significantly more associations than UTMOST (p = 2.64×10⁻²) and JTI (p = 2.39×10⁻²). In contrast, while UTMOST identified 23.3% more associations than JTI, this difference was not statistically significant (p = 0.2618). We list the number of associations identified in each trait in Table S8.
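The comparison above combines a relative increase with a paired one-sided Wilcoxon signed-rank test on per-trait counts. The sketch below illustrates both computations on made-up numbers. It assumes that the percentage is computed on totals summed across traits and that the alternative hypothesis is "the first method finds more", neither of which is spelled out in the text. The function names and toy counts are hypothetical, and the exact enumeration shown is only one way to obtain the p value.

```python
from typing import Sequence


def percent_increase(new_total: float, old_total: float) -> float:
    """Relative increase of new_total over old_total, in percent."""
    return 100.0 * (new_total - old_total) / old_total


def wilcoxon_greater_exact(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact one-sided paired Wilcoxon signed-rank p value (H1: x tends to exceed y)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    # Average ranks of |d|, stored doubled so that tied ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the average of ranks i+1..j+1
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # counts[s] = number of sign assignments whose doubled positive-rank sum is s.
    total2 = sum(ranks2)
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[w_plus2:]) / 2 ** n


# Hypothetical per-trait association counts for two methods (not from Table S8).
method_a_counts = [12, 30, 7, 45, 20, 9]
method_b_counts = [5, 28, 8, 30, 11, 3]
print(percent_increase(sum(method_a_counts), sum(method_b_counts)))
p_value = wilcoxon_greater_exact(method_a_counts, method_b_counts)
print(p_value)  # 0.03125
```

In practice a library routine such as `scipy.stats.wilcoxon(x, y, alternative="greater")` would typically be used. The hand-rolled version is shown only to make the signed-rank logic explicit.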

Discussion

The proposed TransferTWAS method aims to enhance gene-expression imputation accuracy by leveraging tissue-tissue similarity information. This approach borrows information from tissues with substantial sample sizes to improve predictions in tissues with limited samples. The performance of TransferTWAS was evaluated through extensive simulations and real data analysis using GTEx, GEUVADIS, ROS/MAP, and multiple GWAS datasets. We found that TransferTWAS can enhance the power of TWAS, and no evidence of inflated type-I error was observed. An enrichment analysis was conducted to clarify the biological relevance between LDL-C and the iGenes identified by TransferTWAS.

Transfer learning has been applied in various areas of statistical genetics, such as enhancing prediction accuracy by leveraging pretrained polygenic risk score models.32,33 TransferTWAS demonstrated improved TWAS power compared to other methods by leveraging eQTL effect-size information from multiple external tissues with similar genetic regulation profiles. The method’s ability to effectively utilize external tissue information across various scenarios reinforces its potential as a powerful tool for enhancing TWAS imputation performance.

In simulation I on the GTEx dataset, TransferTWAS outperformed competing methods across various scenarios in terms of imputation accuracy. Its shrinkage-based approach, which avoids SNP selection during optimization, aligns with its strength under an infinitesimal model (many SNPs with weak effects), whereas regularization-based methods like UTMOST excel under sparse architectures (a few causal SNPs with strong effects). TransferTWAS outperformed UTMOST when more than 2% of SNPs were causal (Figures 2 and S4–S6), highlighting the limitations of regularization-based methods (such as UTMOST and JTI) as default choices. This is also supported by TIGAR (transcriptome-integrated genetic association resource),23,34 which shows improved prediction accuracy over PrediXcan—a method relying on an elastic net model.35
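For reference, the contrast between shrinkage and SNP selection can be written in generic form. The notation below (expression vector y, cis-genotype matrix X, SNP weights β, tuning parameter λ) is introduced here for illustration only. These are the textbook ridge and LASSO objectives, not the exact objective functions of TransferTWAS, UTMOST, or JTI.

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2,
\qquad
\hat{\beta}^{\mathrm{LASSO}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .
```

The squared ℓ2 penalty shrinks all weights toward zero without setting any of them exactly to zero, which suits architectures with many weak effects. The ℓ1 penalty sets some weights exactly to zero, which suits architectures with a few strong effects.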

In simulation studies II–V, TransferTWAS was evaluated for power and type-I error using ROS/MAP and GEUVADIS panels. It demonstrated superior TWAS power compared to UTMOST and JTI by leveraging eQTL effect-size information from tissues with similar genetic regulation profiles. While UTMOST lacks tissue similarity modeling and JTI relies on functional annotation, TransferTWAS’s data-driven approach proved more effective, achieving the highest TWAS power across scenarios, even in tissues such as EBV-transformed lymphocytes (Figure S1), whose gene expression is highly specific11 and whose regulation is therefore less influenced by cross-tissue expression information. Simulation studies III and V confirmed no inflated type-I error, and pathway enrichment analysis of iGenes revealed significant overlap with LDL-C-related pathways, indicating minimal false positives.

While TransferTWAS achieved lower imputation r² than JTI in four tissues (Figure 5A), it increased the number of iGenes across all tissues (Figure 5B). This enabled more genes to enter the second step of TWAS, thereby enhancing the likelihood of identifying significant associations. In LDL-C analysis, TransferTWAS identified 898 associations missed by other methods. Given its primary goal of improving imputation in tissues with limited sample sizes, the lower r² in specific tissues is less critical.

Several future directions for TransferTWAS warrant consideration. First, the method does not currently account for uncertainty in weight estimation during gene-expression prediction. Recent studies have successfully incorporated such uncertainty by identifying cis-eQTLs, performing fine-mapping to pinpoint key variants, and using the multivariate adaptive shrinkage (MASH) method to jointly estimate eQTL effects across tissues while incorporating tissue-specific uncertainty and correlations.36,37,38,39,40,41,42 While TransferTWAS currently relies on standard ridge regression for tissue-specific effect estimation, integrating MASH could be a promising extension, although penalized regression methods (e.g., ridge, LASSO, and elastic net) face challenges in providing valid uncertainty due to biased estimates.43,44 Second, TransferTWAS could be enhanced by incorporating external eQTL summary-level data on tissue expression similarity. For instance, Zhang et al.45 proposed a TWAS method that leverages eQTL summary-level data to improve gene-expression prediction accuracy. Since TransferTWAS only requires tissue-specific point estimates as input, it appears well suited for integrating such data. Third, addressing potential false-positive inflation in TWAS, as highlighted by recent studies,46,47 could further improve TransferTWAS’s reliability and accuracy.

In summary, we introduced TransferTWAS, a transfer learning algorithm that leverages GTEx data for gene-expression imputation. By improving imputation accuracy and TWAS power, TransferTWAS has the potential to advance our understanding of the genetic underpinnings of complex traits.

Data and code availability

Acknowledgments

This work was supported by the Hong Kong Research Grants Council General Research Fund (17307324). The Genotype-Tissue Expression (GTEx) project was supported by the Common Fund of the Office of the Director of the National Institutes of Health and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. All protected data of the GTEx project are available through the database of Genotypes and Phenotypes (dbGaP) (accession number phs000424.v8.p2). We also thank Xiang Li for many insightful discussions.

Author contributions

Y.D.Z. and D.Y.L. conceived the project. D.Y.L. and H.W. implemented the method and performed the analyses. D.Y.L., H.W., T.G., S.W., D.J.L., P.C.S., and Y.D.Z. interpreted the results. D.Y.L. and Y.D.Z. drafted the original manuscript. All authors read and approved the final manuscript.

Declaration of interests

The authors declare no competing interests.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT in order to improve readability and language only. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Published: June 30, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2025.06.006.

Web resources

Supplemental information

Document S1. Figures S1–S8 and Notes S1–S4
mmc1.pdf (2.9MB, pdf)
Data S1. Tables S1–S8
mmc2.xlsx (2.2MB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (4.9MB, pdf)

References

  • 1. Gallagher M.D., Chen-Plotkin A.S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 2018;102:717–730. doi: 10.1016/j.ajhg.2018.04.002.
  • 2. Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J., et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794.
  • 3. Gamazon E.R., Wheeler H.E., Shah K.P., Mozaffari S.V., Aquino-Michaels K., Carroll R.J., Eyler A.E., Denny J.C., Nicolae D.L., et al. GTEx Consortium A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
  • 4. Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W.J.H., Jansen R., de Geus E.J.C., Boomsma D.I., Wright F.A., et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506.
  • 5. Lonsdale J., Thomas J., Salvatore M., Phillips R., Lo E., Shad S., Hasz R., Walters G., Garcia F., Young N., et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
  • 6. GTEx Consortium. Ardlie K.G., Deluca D.S., Segrè A.V., Sullivan T.J., Young T.R., Gelfand E.T., Trowbridge C.A., Maller J.B., Tukiainen T. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110.
  • 7. Aguet F., Brown A.A., Castel S.E., Davis J.R., He Y., Jo B., Mohammadi P., Park Y., Parsana P., Segrè A.V., et al. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
  • 8. GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
  • 9. Wainberg M., Sinnott-Armstrong N., Mancuso N., Barbeira A.N., Knowles D.A., Golan D., Ermel R., Ruusalepp A., Quertermous T., Hao K., et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 2019;51:592–599. doi: 10.1038/s41588-019-0385-z.
  • 10. Hu Y., Li M., Lu Q., Weng H., Wang J., Zekavat S.M., Yu Z., Li B., Gu J., Muchnik S., et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 2019;51:568–576. doi: 10.1038/s41588-019-0345-7.
  • 11. Zhou D., Jiang Y., Zhong X., Cox N.J., Liu C., Gamazon E.R. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat. Genet. 2020;52:1239–1246. doi: 10.1038/s41588-020-0706-2.
  • 12. Li B., Veturi Y., Verma A., Bradford Y., Daar E.S., Gulick R.M., Riddler S.A., Robbins G.K., Lennox J.L., Haas D.W., Ritchie M.D. Tissue specificity-aware TWAS (TSA-TWAS) framework identifies novel associations with metabolic, immunologic, and virologic traits in HIV-positive adults. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009464.
  • 13. Mai J., Lu M., Gao Q., Zeng J., Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun. Biol. 2023;6:899. doi: 10.1038/s42003-023-05279-y.
  • 14. Barbeira A.N., Pividori M., Zheng J., Wheeler H.E., Nicolae D.L., Im H.K. Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1007889.
  • 15. Gu T., Han Y., Duan R. Robust angle-based transfer learning in high dimensions. J. Roy. Stat. Soc. B Stat. Methodol. 2024 doi: 10.1093/jrsssb/qkae111.
  • 16. Wang A., Tian P., Zhang Y.D. TWAS-GKF: a novel method for causal gene identification in transcriptome-wide association studies with knockoff inference. Bioinformatics. 2024;40 doi: 10.1093/bioinformatics/btae502.
  • 17. Bennett D.A., Buchman A.S., Boyle P.A., Barnes L.L., Wilson R.S., Schneider J.A. Religious orders study and rush memory and aging project. J. Alzheimers Dis. 2018;64:S161–S189. doi: 10.3233/JAD-179939.
  • 18. Lappalainen T., Sammeth M., Friedländer M.R., 't Hoen P.A.C., Monlong J., Rivas M.A., Gonzàlez-Porta M., Kurbatova N., Griebel T., Ferreira P.G., et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.
  • 19. Wang H., Li X., Li T., Li Z., Sham P.C., Zhang Y.D. MAAT: a new nonparametric Bayesian framework for incorporating multiple functional annotations in transcriptome-wide association studies. Genome Biol. 2025;26:21. doi: 10.1186/s13059-025-03485-x.
  • 20. Keys K.L., Mak A.C.Y., White M.J., Eckalbar W.L., Dahl A.W., Mefford J., Mikhaylova A.V., Contreras M.G., Elhawary J.R., Eng C., et al. On the cross-population generalizability of gene expression prediction models. PLoS Genet. 2020;16 doi: 10.1371/journal.pgen.1008927.
  • 21. Khunsriraksakul C., McGuire D., Sauteraud R., Chen F., Yang L., Wang L., Hughey J., Eckert S., Dylan Weissenkampen J., Shenoy G., et al. Integrating 3D genomic and epigenomic data to enhance target gene discovery and drug repurposing in transcriptome-wide association studies. Nat. Commun. 2022;13:3258. doi: 10.1038/s41467-022-30956-7.
  • 22. Feng H., Mancuso N., Gusev A., Majumdar A., Major M., Pasaniuc B., Kraft P. Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1008973.
  • 23. Nagpal S., Meng X., Epstein M.P., Tsoi L.C., Patrick M., Gibson G., De Jager P.L., Bennett D.A., Wingo A.P., Wingo T.S., Yang J. TIGAR: an improved Bayesian tool for transcriptomic data imputation enhances gene mapping of complex traits. Am. J. Hum. Genet. 2019;105:258–266. doi: 10.1016/j.ajhg.2019.05.018.
  • 24. Barbeira A.N., Dickinson S.P., Bonazzola R., Zheng J., Wheeler H.E., Torres J.M., Torstenson E.S., Shah K.P., Garcia T., Edwards T.L., et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 2018;9:1825. doi: 10.1038/s41467-018-03621-1.
  • 25. Weiss K., Khoshgoftaar T.M., Wang D. A survey of transfer learning. J. Big Data. 2016;3:9–40.
  • 26. Wu L., Shi W., Long J., Guo X., Michailidou K., Beesley J., Bolla M.K., Shu X.-O., Lu Y., Cai Q., et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat. Genet. 2018;50:968–978. doi: 10.1038/s41588-018-0132-x.
  • 27. Gaudet D., Gipe D.A., Pordy R., Ahmad Z., Cuchel M., Shah P.K., Chyu K.-Y., Sasiela W.J., Chan K.-C., Brisson D., et al. ANGPTL3 inhibition in homozygous familial hypercholesterolemia. N. Engl. J. Med. 2017;377:296–297. doi: 10.1056/NEJMc1705994.
  • 28. Peloso G.M., Nomura A., Khera A.V., Chaffin M., Won H.-H., Ardissino D., Danesh J., Schunkert H., Wilson J.G., Samani N., et al. Rare protein-truncating variants in APOB, lower low-density lipoprotein cholesterol, and protection against coronary heart disease. Circ. Genom. Precis. Med. 2019;12 doi: 10.1161/CIRCGEN.118.002376.
  • 29. Francis M., Li C., Sun Y., Zhou J., Li X., Brenna J.T., Ye K. Genome-wide association study of fish oil supplementation on lipid traits in 81,246 individuals reveals new gene-diet interaction loci. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009431.
  • 30. Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
  • 31. Lu Q., Powles R.L., Abdallah S., Ou D., Wang Q., Hu Y., Lu Y., Liu W., Li B., Mukherjee S., et al. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease. PLoS Genet. 2017;13 doi: 10.1371/journal.pgen.1006933.
  • 32. Zhao Z., Fritsche L.G., Smith J.A., Mukherjee B., Lee S. The construction of cross-population polygenic risk scores using transfer learning. Am. J. Hum. Genet. 2022;109:1998–2008. doi: 10.1016/j.ajhg.2022.09.010.
  • 33. Tian P., Chan T.H., Wang Y.-F., Yang W., Yin G., Zhang Y.D. Multiethnic polygenic risk prediction in diverse populations through transfer learning. Front. Genet. 2022;13 doi: 10.3389/fgene.2022.906965.
  • 34. Parrish R.L., Buchman A.S., Tasaki S., Wang Y., Avey D., Xu J., De Jager P.L., Bennett D.A., Epstein M.P., Yang J. SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning. Nat. Commun. 2024;15:6646. doi: 10.1038/s41467-024-50983-w.
  • 35. Zou H., Hastie T. Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B Stat. Methodol. 2005;67:301–320.
  • 36. Gao G., Fiorica P.N., McClellan J., Barbeira A.N., Li J.L., Olopade O.I., Im H.K., Huo D. A joint transcriptome-wide association study across multiple tissues identifies candidate breast cancer susceptibility genes. Am. J. Hum. Genet. 2023;110:950–962. doi: 10.1016/j.ajhg.2023.04.005.
  • 37. Gao G., McClellan J., Barbeira A.N., Fiorica P.N., Li J.L., Mu Z., Olopade O.I., Huo D., Im H.K. A multi-tissue, splicing-based joint transcriptome-wide association study identifies susceptibility genes for breast cancer. Am. J. Hum. Genet. 2024;111:1100–1113. doi: 10.1016/j.ajhg.2024.04.010.
  • 38. Li J.L., McClellan J.C., Zhang H., Gao G., Huo D. Multi-tissue transcriptome-wide association studies identified 235 genes for intrinsic subtypes of breast cancer. J. Natl. Cancer Inst. 2024;116:1105–1115. doi: 10.1093/jnci/djae041.
  • 39. McClellan J.C., Li J.L., Gao G., Huo D. Expression-and splicing-based multi-tissue transcriptome-wide association studies identified multiple genes for breast cancer by estrogen-receptor status. Breast Cancer Res. 2024;26:51. doi: 10.1186/s13058-024-01809-6.
  • 40. Araujo D.S., Nguyen C., Hu X., Mikhaylova A.V., Gignoux C., Ardlie K., Taylor K.D., Durda P., Liu Y., Papanicolaou G., et al. Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations. HGG Adv. 2023;4 doi: 10.1016/j.xhgg.2023.100216.
  • 41. Chen D.M., Dong R., Kachuri L., Hoffmann T.J., Jiang Y., Berndt S.I., Shelley J.P., Schaffer K.R., Machiela M.J., Freedman N.D., et al. Transcriptome-wide association analysis identifies candidate susceptibility genes for prostate-specific antigen levels in men without prostate cancer. HGG Adv. 2024;5 doi: 10.1016/j.xhgg.2024.100315.
  • 42. Urbut S.M., Wang G., Carbonetto P., Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 2019;51:187–195. doi: 10.1038/s41588-018-0268-8.
  • 43. Zhang C.-H., Zhang S.S. Confidence intervals for low dimensional parameters in high dimensional linear models. J. Roy. Stat. Soc. B Stat. Methodol. 2014;76:217–242.
  • 44. Javanmard A., Montanari A. Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 2014;15:2869–2909.
  • 45. Zhang Z., Bae Y.E., Bradley J.R., Wu L., Wu C. SUMMIT: An integrative approach for better transcriptomic data imputation improves causal gene identification. Nat. Commun. 2022;13:6336. doi: 10.1038/s41467-022-34016-y.
  • 46. de Leeuw C., Werme J., Savage J.E., Peyrot W.J., Posthuma D. On the interpretation of transcriptome-wide association studies. PLoS Genet. 2023;19 doi: 10.1371/journal.pgen.1010921.
  • 47. Liang Y., Nyasimi F., Im H.K. Pervasive polygenicity of complex traits inflates false positive rates in transcriptome-wide association studies. Preprint at bioRxiv. 2024. doi: 10.1101/2023.10.17.562831.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
