Skip to main content
PLOS Genetics logoLink to PLOS Genetics
. 2023 Sep 7;19(9):e1010921. doi: 10.1371/journal.pgen.1010921

On the interpretation of transcriptome-wide association studies

Christiaan de Leeuw 1,*, Josefin Werme 1, Jeanne E Savage 1, Wouter J Peyrot 1,2, Danielle Posthuma 1,3
Editor: Heather J Cordell4
PMCID: PMC10508613  PMID: 37676898

Abstract

Transcriptome-wide association studies (TWAS) aim to detect relationships between gene expression and a phenotype, and are commonly used for secondary analysis of genome-wide association study (GWAS) results. Results from TWAS analyses are often interpreted as indicating a genetic relationship between gene expression and a phenotype, but this interpretation is not consistent with the null hypothesis that is evaluated in the traditional TWAS framework. In this study we provide a mathematical outline of this TWAS framework, and elucidate what interpretations are warranted given the null hypothesis it actually tests. We then use both simulations and real data analysis to assess the implications of misinterpreting TWAS results as indicative of a genetic relationship between gene expression and the phenotype. Our simulation results show considerably inflated type 1 error rates for TWAS when interpreted this way, with 41% of significant TWAS associations detected in the real data analysis found to have insufficient statistical evidence to infer such a relationship. This demonstrates that in current implementations, TWAS cannot reliably be used to investigate genetic relationships between gene expression and a phenotype, but that local genetic correlation analysis can serve as a potential alternative.

Author summary

The aim of transcriptome-wide association studies (TWAS) is to find genetic relationships between the expressions of genes and specific human traits of interest, and they do so using large-scale genetic association results from existing studies of those traits. However, the statistical methods traditionally used to perform these TWAS studies do not directly test such genetic relationships, and in this study we examine the implications of this limitation. Using both simulations as well as analyses of real data, we demonstrate that when traditional TWAS methods are interpreted as testing a genetic relationship between gene expression and a trait, this results in a considerable inflation in the false positive rates for these analysis. This suggests that these TWAS methods cannot reliably be used to study such genetic relationships, with their statistically valid interpretion being much more limited. We also show that methods such as local genetic correlation analysis can serve as potential alternative.

Introduction

Transcriptome-wide association studies (TWAS) [18] constitute a statistical framework commonly used to study relationships between gene expression and a phenotype. Combining expression quantitative trait loci (eQTL) and genome-wide association study (GWAS) results, a TWAS analysis estimates the genetic component of the gene expression and then tests whether this is associated with the phenotype. Since TWAS does not require that gene expression and the phenotype are measured in the same sample, it can be used with any phenotype for which GWAS data is available, giving it a broad scope of application.

From a significant TWAS association, it would be tempting to conclude that there is a genetic relationship between the expression of the tested gene and the phenotype of interest, and this interpretation is indeed commonly found in applied TWAS studies [913]. Such an interpretation is also suggested by various TWAS method papers such as Gamazon (2015) [7] and Mancuso (2019) [8], with for example the latter stating that “[TWAS] can be viewed as a test for non-zero local genetic correlation between expression and trait.”

However, the traditional TWAS framework does not directly test the relationship of the phenotype with the genetic component of the gene expression. Rather, it only tests the relationship with the predicted genetic component of the gene expression, without accounting for the uncertainty in that prediction [6,7]. It thereby omits the fact that the gene expression data is also the result of a sampling process from the analysis. Consequently, as will be shown, the test of association between that predicted genetic component and a phenotype reduces to merely a (weighted) test of joint association of the SNPs with the phenotype, which means that they cannot be used to infer a genetic relationship between gene expression and the phenotype on a population level.

In this study, we show that this distinction has important implications for the interpretation and utility of TWAS results. First, we dissect the mathematical structure of the TWAS framework, explaining the exact null hypothesis that is tested in TWAS analysis, and then demonstrating how this relates to the genetic relationship between gene expression and the phenotype. Second, we use large-scale simulations to demonstrate that if we were to incorrectly interpret TWAS as testing such genetic relationships, this can result in potentially highly inflated type 1 error rates. Finally, we use a real data application with five well-powered phenotypes to illustrate what these error rate inflations can look like in practice, showing that TWAS cannot be used as a reliable tool for detecting genetic relationships between gene expression and a phenotype of interest.

Results

The TWAS framework

The traditional TWAS framework is implemented as a two-stage procedure, as follows [1,6,7]. First, a model is specified for the expression level E of a particular gene

E=XαE+ξE, (1)

where X denotes the genotype matrix of SNPs local to that gene (typically within one megabase from the transcription region of the gene), and ξE a residual component independent of X. Note that any trans-effects on E of SNPs elsewhere on the genome will be absorbed into ξE, insofar as those SNPs are independent of X. We can further define GE = E, which reflects the true local genetic component of the gene expression captured by the SNPs in X. This model is fitted to a sample with genotype and gene expression data to obtain an estimated genetic effect vector α^E. With this, the estimated genetic component of the gene expression is then computed in the GWAS sample for the outcome phenotype Y as G^E=Xα^E. Finally, a linear regression model of the form

Y=G^Eβ+εY, (2)

with coefficient β and residual εY, is used to test the relationship between the estimated genetic component G^E and the phenotype Y, evaluating the null hypothesis of no association H0: β = 0.

Note that our presentation here is simplified for the sake of brevity, which includes assuming a continuous phenotype and omitting covariates. The second stage can also be rewritten to require only pre-existing GWAS summary statistics rather than raw genotype and phenotype data as input. See Methods - Outline of TWAS framework for a description of the more general case.

In practice, most existing TWAS methods have this mathematical structure [68,1424], including all the ‘linear’ models listed in Table 1 aside from CoMM [25]. Where they differ is in their implementation, particularly in how α^E is estimated [68,1425]. Since the number of SNPs in X is usually much larger than the sample size of the gene expression data, and are also highly collinear, it is not possible to fit the model in (1) using standard procedures such as Ordinary Least Squares. Most TWAS implementations therefore use a penalized regression approach such as Elastic Net, LASSO or a spike-and-slab model to resolve this issue (see Table 1). Although different kinds of penalized regression procedures will also yield differ different estimates α^E, all these methods are still conceptually equivalent, each providing an estimate G^E of the genetic component of the gene expression for use in the second stage.

Table 1. Overview of available TWAS analysis methods.

Method Weight estimationa Base model extensions
Linear models
    Gamazon (2015)[7]—PrediXcan Marginal
LASSO
Elastic net
-
    Gusev (2016)[6]—FUSIONb Top eQTL
BLUP
Bayesian LMM
LASSO
Elastic net
-
    Mancuso (2017)[14]—RhoGE BLUP -
    Barbeira (2018)[15]—MetaXcan Marginal
LASSO
Elastic net
-
    Su (2018)[16]—MiST External Models additional variance component for genetic effects not mediated by predicted expression
    Hu (2019)[17]—UTMOST Multivariate LASSO Simultaneously models multiple tissues during weight estimation
    Yang (2019)[25]—CoMM Collaborative mixed model Estimates weights and associations with phenotype simultaneously in single model
    Mancuso (2019)[8]—FOCUS External Models multiple genes at once, as well as additional pleiotropic genetic effects on phenotype
    Nagpal (2019)[18]—TIGAR Dirichlet process regression Multivariate model with multiple outcome phenotypes
    Liu (2020)[19]—T-GEN Spike & Slab Incorporates epigenetic information into weight estimation process
    Luningham (2020)[20]—BGW-TWAS Spike & Slab Models additional trans-eQTL component
    Bhattacharya (2021)[21]—MOSTWAS Elastic net
BLUP
Models additional components for trans-eQTL or other molecular phenotypes
Non-linear models
    Xu (2017)[22]—ASPU External Uses adaptive test combining sums of powers of score statistics for different powers (includes linear model)
    Zhang (2020)[23] External Uses adaptive test combining linear model with sum of squared score statistics
    Tang (2021)[24]—VC-TWAS External Uses sum of powers of score statistics instead of linear model

LASSO: least absolute shrinkage and selection operator; BLUP: best linear unbiased predictor; LMM: linear mixed model

a Multiple entries for a method denote different options; ‘marginal’ refers to marginal SNP effect sizes being used as weights, ‘external’ means the method requires precomputed weights from an external source

b The name ‘FUSION’ and the LASSO and elastic net options for this method were added after publication of the Gusev (2016) paper

Not all existing methods adopt this structure, such as the methods listed as ‘non-linear’ in Table 1, but these methods run into the same issues that will be discussed below for the traditional ‘linear’ TWAS framework (see Non-linear TWAS models in S1 Text for more details). Throughout this paper, unless otherwise specified, we will use TWAS to refer specifically to the linear TWAS framework as described above.

Structure of the TWAS null hypothesis

TWAS evaluates the null hypothesis H0: β = 0 for the model presented in Eq (2), which can be rewritten to mathematically equivalent formulations to provide a clearer understanding of what is being tested. Firstly, we can observe that β=cov(G^E,Y)var(G^E), and therefore β will be zero if and only if the covariance cov(G^E,Y) is zero. As such, the TWAS null hypothesis is equivalent to H0:cov(G^E,Y)=0.

To better understand the nature of the covariance term cov(G^E,Y), we can partition the phenotype Y in the same way as we did with E in Eq (1), such that

Y=XαY+ξY=GY+ξY (3)

Analogous to GE, this gives us a local genetic component GY for the phenotype that is based on the SNPs in X, and a residual ξY that is independent of X (that is, cov(Xj, ξY) = 0 for every SNP j). This local genetic component reflects all the genetic signal of Y that is (linearly) captured by the SNPs in X, including that of SNPs in LD with X, with the rest of the genetic signal of Y being absorbed into ξY.

Partitioning Y in this manner allows us to substitute Y = GY+ξY in cov(G^E,Y), giving us cov(G^E,Y)=cov(G^E,GY+ξY)=cov(G^E,GY)+cov(G^E,ξY) (where the last step follows from the distributive property of covariance). The covariance cov(G^E,Y) can thus be partitioned into two parts, of which the second part cov(G^E,ξY) must be zero. This follows from the fact that G^E=Xα^E=jXjα^Ej (with j indexing the individual SNPs), and therefore cov(G^E,ξY)=cov(jXjα^Ej,ξY)=jα^Ejcov(Xj,ξY). As noted in the previous paragraph, cov(Xj, ξY) = 0 for every SNP j, and as such cov(G^E,ξY)=0 as well. With that, we can conclude that cov(G^E,Y)=cov(G^E,GY). In other words, since GY captures all the association that the SNPs in X have with the phenotype Y, and G^E is entirely based on X, the covariance between G^E and the residual term ξY will be zero by definition. The result of this is that we can further rewrite the TWAS null hypothesis into H0:cov(G^E,GY)=0.

Relation to local genetic covariance

Local genetic correlation, analogous to genome-wide genetic correlations, quantifies the genetic relationship between two phenotypes within a particular genomic region, with gene expression, in this context, being one of those phenotypes [26,27]. Since the local genetic correlation can be defined as cov(GE,GY)var(GE)var(GY), and thus will be zero if and only if cov(GE, GY) is zero, the genetic relationship between gene expression and the phenotype can be tested using the null hypothesis H0:cov(GE, GY) = 0.

However, by contrast, TWAS performs a test of cov(G^E,GY) rather than cov(GE, GY), as shown above. To see how these two covariances relate to each other, we can partition the estimated genetic component G^E as G^E=GE+Δ^E, with Δ^E the deviation of the sample estimate from the true genetic component GE. Again using the distributive property of covariance, we can then conclude cov(G^E,GY)=cov(GE+Δ^E,GY)=cov(GE,GY)+cov(Δ^E,GY). For ease of notation, we will denote C^=cov(Δ^E,GY). We can thus see that the covariance term cov(G^E,GY) that is tested by TWAS is offset from the local genetic covariance cov(GE, GY) by C^, and plugging this relationship into H0:cov(G^E,GY)=0, we find that the TWAS null hypothesis is equivalent to H0:cov(GE,GY)=C^.

Crucially, the regression model in Eq (2) conditions on its predictor G^E, thus considering the value of G^E to be fixed. It follows that the values of the deviation term Δ^E, and therefore that of C^, are fixed as well, even though Δ^E is the result of the random sampling of E in the gene expression data. As such, the quantity C^ that TWAS implicitly tests cov(GE, GY) against, will have an unknown, arbitrary value (different for each gene) that reflects the specific error in the estimation that was realized when the gene expression E was generated, with the distribution from which this value is drawn also determined by choice of model used for estimating αE (for more details, see The TWAS null value in S1 Text).

To further illustrate this, consider an analogy to the t-test. Suppose we are interested in testing whether the population means μA and μB of some variable for groups A and B differ. However, instead of performing a two-sample t-test with H0:μA = μB, or equivalently H0:μA = μB = 0, we compute the sample mean μ^B for group B and use a single sample t-test with H0:μA=μ^B. Let us assume that in this case, in our sample from group B the realized error in the estimate μ^B is 1, ie. μ^B=μB+1. This means that with our single sample t-test, we would be testing H0:μA = μB = 1. As with TWAS, because we are disregarding the estimation uncertainty in μ^B when performing our test, we end up testing the quantity of interest, μA = μB, against an unknown null value, which isn’t very informative.

Interpretation of TWAS results

As shown above, TWAS implicitly tests the local genetic covariance cov(GE, GY) against C^ (for an unknown value of C^) rather than zero, and as such it cannot be considered a direct test of the genetic relationship between gene expression and the phenotype. This raises the question of how TWAS results actually can be interpreted, which we can address by determining the circumstances in which the TWAS null hypothesis H0:cov(G^E,GY)=0 is true.

As shown above, the null hypothesis H0:cov(G^E,GY)=0 is equivalent to H0:cov(GE,GY)=C^. Examining the structure of C^ more closely we can see that it decomposes into a weighted sum C^=jw^jcov(Xj,Y) of the (population-level) covariances cov(Xj, Y) of the SNPs in X with the phenotype (see The TWAS null value in S1 Text). These weights w^j reflect the error in the effect size estimate for each SNP j, ie. α^Ej=αEj+w^j, and as such their values are the realization of a random process.

It follows that, if there is any genetic association between the SNPs in X and the phenotype Y, the TWAS null hypothesis will only be true if by chance the values of w^j are such that jw^jcov(Xj,Y) happens to equal −cov(GE, GY). Because the value of jw^jcov(Xj,Y) is effectively a random draw from a continuous space, the probability of this happening is infinitessimal. Realistically, the TWAS null hypothesis will therefore only be true if there is no genetic association between X and Y, with cov(Xj, Y) = 0 for every SNP j. In this case, jw^jcov(Xj,Y) will be zero as well, and at the same time the absence of such genetic signal also means that the variance of GY is zero, and therefore no covariance with GE can exist. As such, in this case both cov(GE, GY) and C^ will be zero, and H0:cov(GE,GY)=C^ is therefore true.

In practice, the TWAS null hypothesis will therefore be true if, and only if, the SNPs in X are all independent of the phenotype, and a test of this null hypothesis therefore amounts to a joint genetic association test, ie. a test of independence between X and Y akin to testing H0:αY = 0 in the general regression model in Eq (3), though with somewhat different power characteristics. Although as shown, the TWAS null hypothesis H0:cov(GE,GY)=C^ being true does imply that the null hypothesis H0:cov(GE, GY) = 0 of no local genetic relationship is true as well, the reverse does not hold. If genetic association does exist between X and Y, then H0:cov(GE,GY)=C^ will be false, but as long as this genetic association is independent of the genetic signal captured by GE, then cov(GE, GY) will still be zero. This is why it does not follow from the TWAS null hypothesis being false that a local genetic relationship exists between gene the expression and the phenotype.

Despite this, the intuition may remain that even though G^E is uncertain, it is still based on the true GE and should therefore still contain some information relevant to the genetic relationship between gene expression and the phenotype. And this is indeed the case, because as shown the tested covariance cov(G^E,GY)=cov(GE,GY)+C^ is partly based on the true local genetic covariance cov(GE, GY). But although present, this information cannot meaningfully be extracted from the TWAS results, since even when cov(G^E,GY) is found to deviate significantly from zero, the TWAS cannot tell us whether this is due to a strong local genetic relationship through cov(GE, GY), or just through the noise term C^.

TWAS as a test of genetic relationship

As established above, TWAS does not technically provide a direct test of the genetic relationship between gene expression and the phenotype, since its null hypothesis is equivalent to H0:cov(GE,GY)=C^, with the value of C^ unknown. However, it is possible that in the scenarios that are likely to arise in practice, the value of C^ might be so small as to be negligible. If so, we could simply treat TWAS as if it is testing the null hypothesis H0:cov(GE, GY) = 0 after all, and interpret its significant results as indicative of genetic relationships between gene expression and the phenotype without issue.

In the remainder of this paper, we will evaluate the viability in practice of treating TWAS as a test of the null hypothesis H0:cov(GE, GY) = 0, first using large-scale simulations to determine the resulting type 1 error rates across a range of different scenarios, and then using an application to real data to illustrate the impact in practice. We used FUSION [6] for this evaluation, as it is a widely used tool, and it implements a range of different models within the same tool that are representative of those used in many other TWAS methods (see Table 1). We also included CoMM [25], since it uses a setup that is somewhat different than that of the other TWAS methods, and closer to that of local genetic correlation methods [26,27] (see Comparison with the CoMM model in S1 Text). Note that reported type 1 error rates will be relative to H0:cov(GE, GY) = 0, rather than the null hypotheses these methods are designed to test, as the aim is to evaluate the viability of using these methods when they are treated as a test the genetic relationship between gene expression and the phenotype (as they are often interpreted in practice).

To serve as a reference, we included the local genetic correlation model implemented in LAVA (denoted LAVA-rG), which is designed and calibrated to test the null hypothesis H0:cov(GE, GY) = 0 directly. Furthermore, we also implemented a TWAS model in the LAVA framework (denoted LAVA-TWAS), which is identical to LAVA-rG in every way, except that like other traditional TWAS methods it treats G^E as given (rather than modelling its distribution under sampling of the gene expression E, like LAVA-rG does; see Methods - LAVA implementation of TWAS). This implementation allows for a direct comparison between testing H0:cov(G^E,GY)=0 and testing H0:cov(GE, GY) = 0, unconfounded by other differences in model or implementation.

Simulation study

To evaluate the performance of TWAS when interpreted as a test of the genetic relationship between gene expression and outcome phenotype, we performed a series of simulations, generating gene expression and phenotype values under the null hypothesis H0:cov(GE, GY) = 0 and computed type 1 error rates. Genotype data from the UK Biobank [28,29] was used as a basis for the simulations, and we varied the number of SNPs per simulated gene, as well as the sample size and local heritability of the gene expression and outcome phenotype data (see Methods - Simulation study). Analysis was performed using LAVA-TWAS and LAVA-rG, as the four different models in FUSION (BLUP, Bayesian LMM, Elastic Net, LASSO), and CoMM.

The simulations showed considerable inflation of type 1 error rates (α = 0.05) when using LAVA-TWAS to evaluate H0:cov(GE, GY) = 0, depending on the simulation condition. The inflation strongly increased with larger local heritability or sample size for the outcome phenotype (Fig 1), with the local heritability also interacting with the number of SNPs in the gene (Fig A in S1 Text). In addition, some decrease in the type 1 error rate inflation was also found with greater local heritability or sample size for the gene expression, but this effect is much less pronounced. This is in part because in the typical sample size range of existing individual eQTL data sets, the relative contribution of these parameters to the overall variance is dominated by that of other parameters (see also the attenuation factor A discussed in Underlying distributions in S1 Text). At higher sample sizes for the gene expression, changes in both the sample size and the heritability of the gene expression would have a larger impact.

Fig 1. Type 1 error rate simulation results as a function of sample size and local heritability.

Fig 1

Shown is the type 1 error rate (at significance threshold of 0.05) of the LAVA-TWAS model for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for different sample sizes (N) for the expression (rows) and outcome (columns) data, and at 1,000 SNPs. As shown, the type 1 error rates get considerably larger with increasing sample size or local heritability for the outcome. Conversely, increasing these for the expression reduced the type 1 error rate, but the effect of this is much less pronounced.

The simulations also show that the type 1 error rate inflation also became progressively worse when evaluating it at stricter significance thresholds than the α of 0.05 used so far (Fig 2). Type 1 error rates also increased when filtering out genes that exhibit insufficient genetic signal for the gene expression, as is general practice in TWAS analysis (Fig B in S1 Text). To validate the LAVA-TWAS model, type 1 error rates for LAVA-TWAS under its actual null hypothesis of H0:cov(G^E,GY)=0 were also assessed. These were found to be well controlled in all conditions (Fig C in S1 Text), further emphasizing the disconnect between this null hypothesis and the null hypothesis of no genetic relationship.

Fig 2. Type 1 error rate inflation simulations results as a function of significance threshold.

Fig 2

Shown is the type 1 error rate inflation of the LAVA-TWAS model for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), for different levels of significance threshold α. The type 1 error rate inflation is defined as the type 1 error rate divided by the significance rate α, and equals 1 if the error rates are well-controlled. Results are shown for simulations with expression sample size of 1,000, expression local heritability of 5%, and 1,000 SNPs. As shown, the type 1 error rate becomes relatively more inflated the lower the significance threshold used (separate lines), with the difference becoming more pronounced at higher local heritability (h2) (horizontal axis) and sample size (N) (separate panels) for the outcome.

Results for the four FUSION penalized regression models are largely the same as for LAVA-TWAS (Fig 3), suggesting that the type of model and penalization used for the estimation of α^E does not strongly affect the subsequent type 1 error rate inflation. Although the Elastic Net and LASSO models did show lower inflation than other models in the 1% gene expression heritability conditions, this can be explained by the fact that they frequently failed to converge in these conditions when estimating Eq (1). This difference was no longer present in the 5% heritability conditions (see Fig D in S1 Text).

Fig 3. Comparison of type 1 error rate simulation results across models.

Fig 3

Shown is the type 1 error rate (at significance threshold of 0.05) of the five different TWAS models and LAVA-rG for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate panels). Results are shown for sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs. As shown, the results are very similar across the different models. The lower type 1 error rate for Elastic Net and LASSO compared to the other TWAS models for the 1% expression heritability condition is due to these two models sometimes failing to converge at very low genetic signal for the expression (see also Fig D in S1 Text).

Similar patterns of type 1 error rate inflation were found for CoMM as well and were generally more pronounced, though the influence of the gene expression heritability was more complex than for the other methods (Fig E in S1 Text). This is due to specific constraints in the model, which are analogous to how traditional TWAS models constrain αY to be proportional to α^E in the multiple regression model in Eq (3) (see Comparison with the CoMM model in S1 Text). As in the validation simulations in Werme (2022) [26], type 1 error rates for LAVA-rG were found to be well-controlled (Fig 3).

Overall, the results of our simulations demonstrate that treating TWAS analysis as if they were a test of the null hypothesis of no genetic relationship H0:cov(GE, GY) = 0 can cause considerable inflation of the type 1 error rates in many conditions. A strong effect was observed for the sample size and heritability of the outcome phenotypes, in line with the interpretation of TWAS as a joint association test. Of particular concern is that the inflation increases when filtering on the univariate signal of the gene expression, which is commonly done in TWAS studies. The inflation also increases when using lower significance thresholds, which is likely to apply in practice to correct for multiple testing, and often at more stringent levels than shown in our simulations. See Discussion of simulation results in S1 Text for more discussion of the findings from these simulations.

Real data analysis

To gauge the magnitude of the impact this issue has in a real data context, we applied TWAS to GWAS results of five well-powered phenotypes [3033] (see Table 2), with eQTL data for blood gene expression from GTEx [34] (v8). A total of 14,584 genes were analysed, performing TWAS analysis for those genes for which the univariate gene expression signal was significant at αBONF = 0.05/14,584 = 3.43×10−6 (see Methods - Data and Methods—Real data analysis). The data was analysed with the Elastic Net and LASSO models in FUSION and LAVA-TWAS. We analyzed the data with LAVA-rG as well to provide a baseline for the LAVA-TWAS results, and determine for how many associations the H0:cov(GE, GY) = 0 null hypothesis can be validly rejected.

Table 2. Summary of results of TWAS and local genetic correlation analyses of published summary statistics for five phenotypes.

Phenotype Sample sizea Number of SNPsb Genes tested Significance threshold LAVA signif. associations FUSION signif. associations
r G TWAS Overlap %c Elast. net LASSO
Blood pressure[33] 361K 5.94M 6,686 7.48×10−6 68 107 63.6 100 109
BMI[30] 807K 6.28M 5,453 9.17×10−6 134 232 57.8 297 328
Type 2 diabetes[33] 18.5K/366K 5.94M 6,686 7.48×10−6 27 45 60.0 26 33
Educational attainment[31] 766K 6.18M 5,532 9.04×10−6 59 108 54.6 149 165
Schizophrenia[32] 67.4K/94.0K 6.08M 5,765 8.67×10−6 53 92 57.6 109 114

Genes were included for bivariate testing if they exhibited significant univariate eQTL signal in the LAVA univariate test at αBONF = 0.05/14,584 = 3.43×10−6 (see Table A in S1 Text for results filtered on FUSION univariate test). Results were Bonferroni corrected per phenotype for the number genes tested (see Methods - Real data analysis). Note that all significant LAVA-rG associations were also significant in LAVA-TWAS.

a Showing case/control for binary phenotypes

b After filtering for overlap with 1,000 Genomes and GTEx SNPs

c Percentage of significant LAVA-TWAS associations that were also LAVA-rG associations

A summary of the results is given in Table 2 and as shown, the number of significant LAVA-TWAS associations is considerably higher than those for LAVA-rG, with on average only about 59% of LAVA-TWAS associations confirmed by LAVA-rG. From this it follows that if we were to interpret LAVA-TWAS as a test of genetic relationship between gene expression and the phenotype, 41% of the associations found would not be valid. This is because LAVA-rG is calibrated for testing H0:cov(GE, GY) = 0, and LAVA-TWAS only differs from LAVA-rG in the fact that it conditions on G^E whereas LAVA-rG models the distribution of the estimation uncertainty in G^E that we know exists. This single difference is the cause of the inflated type 1 error rates observed for LAVA-TWAS (but not LAVA-rG) in the simulations. Consequently, because there is no other difference between the two, the 41% of significant LAVA-TWAS associations not shared with LAVA-rG cannot be due to a legitimate increase in power, meaning that they are not valid if interpreting LAVA-TWAS as a test of H0:cov(GE, GY) = 0 (see also Underlying distributions in S1 Text).

Although no equivalent reference model directly testing H0:cov(GE, GY) = 0 is available for the FUSION models, the FUSION results paint a similar picture. The number of associations found when using FUSION are at a comparable level to LAVA-TWAS (Tables 2 and A in S1 Text), and the FUSION models had very similar type 1 error rate inflation levels as LAVA-TWAS in the simulations as well. Taken together, it can reasonably be concluded that the proportion of FUSION results in the real data analysis that are not supported by sufficient statistical evidence, when interpreted as a test of the genetic relationship between gene expression and the phenotype, is likely at a very similar level as LAVA-TWAS.

We also evaluated results for genes with a univariate p-value for the gene expression greater than 0.05. As shown in Table B in S1 Text, virtually no local genetic correlations are found with LAVA-rG for these genes, yet results for the TWAS analyses remain at a similar level as in the primary analysis in Table 2. Because these genes have no detectable genetic component for the gene expression, they would typically not be included in a TWAS analysis. But the fact that TWAS still yields a considerable number of significant associations in this context, when α^E reflects essentially just noise, serves to further illustrate how TWAS simply functions as a form of joint association test, with results not necessarily reflecting any genetic relationship between gene expression and the outcome phenotype.

Deflation of standard errors

Fundamentally, the statistical reason behind the results in our simulations and real data analysis is that the TWAS model omits a source of variance that we know is present. The estimated genetic effect vector α^E is subject to estimation uncertainty due to the sampling of the gene expression E, but this uncertainty is not accounted for in the TWAS test. If the aim is to evaluate the genetic relationship with the phenotype, by testing the null hypothesis H0:cov(GE, GY) = 0, the resulting standard errors will therefore only reflect the uncertainty due to sampling of Y, and not that of E. As such, the standard errors will be too small, creating a downward bias in the p-values (see Underlying distributions in S1 Text for more details).

For LAVA-TWAS the corrected standard errors are known, as these are the LAVA-rG standard errors, and we can therefore use these to illustrate the issue. We computed this standard error deflation individually for each gene in our analysis, as the ratio of the LAVA-TWAS standard errors to their corrected value. These are summarized in Table 3 and, as shown, they vary considerably across genes, with negligible deflation for some genes but severe deflation for others. Fig 4 illustrates the relation between this standard error deflation and the resulting bias in p-values, using the BMI results as an example, with Fig F in S1 Text further showing the level of realized bias. This high variability of standard error deflation across genes also demonstrates that this issue must be addressed at the level of the individual gene, and cannot be resolved by applying a simple post-hoc correction to the gene p-values.

Table 3. Summary of TWAS standard error deflation values per gene.

Phenotype Mean Minimum Quantiles
5% 25% Median 75% 95%
Blood pressure 0.93 0.69 0.84 0.90 0.94 0.97 0.99
BMI 0.88 0.46 0.75 0.84 0.89 0.94 0.98
Type 2 diabetes 0.95 0.51 0.89 0.93 0.96 0.98 1.00
Educational attainment 0.93 0.62 0.84 0.90 0.94 0.97 0.99
Schizophrenia 0.92 0.65 0.83 0.89 0.92 0.96 0.99

Based on genes with significant univariate eQTL signal in the LAVA univariate test, see Table 2. The standard error deflation is calculated as the ratio of the standard error for genes in the LAVA-TWAS model (not correcting for uncertainty in the estimate) to the corresponding standard errors in the LAVA-rG model (calibrated for testing H0:cov(GE, GY) = 0).

Fig 4. Illustration of p-value bias as a result of deflated standard errors.

Fig 4

Shown in the left panel are the p-values that will result from a LAVA-TWAS analysis at different levels of standard error deflation, when the true p-value computed with correct standard errors would either be 0.05 (left vertical axis) or 0.0001 (right vertical axis). The standard error deflation values are taken from the BMI analysis, with the large red point denoting the median and the other red points the 5th, 25th, 75th and 95th quantiles of their distribution (see Table 3). As shown, the relative bias is stronger the lower the p-value. For example, at the median standard error deflation of 0.89, at a reference p-value of 0.05 the biased p-value is about 0.028, about a 1.8-fold decrease in the p-value relative to its true value. At a reference p-value of 0.0001 however, the biased p-value is about 1.2×10−5, an 8.1-fold decrease. In the right panel is the general relationship between the true reference p-value and the p-value that would result at different levels of standard error deflation, based on the percentiles for the BMI analysis. This further illustrates the relative increase in bias for lower p-values, with the gap with the reference p-value (black line) getting larger at lower reference p-values. Note that all axes with p-values use a -log10(p-value) scaling, but labels are expressed as p-values for ease of interpretation.

Discussion

In this paper, our aim was to elucidate the interpretation of the TWAS null hypothesis and results. Although TWAS results are often interpreted as indicating a genetic relationship between gene expression and the phenotype, we showed that this is not what the TWAS null hypothesis allows us to test. Instead, the null hypothesis that the traditional TWAS models actually evaluate effectively makes them a test of joint genetic association between the SNPs included in the analysis and the phenotype, irrespective of any genetic relationship with the gene expression. However, because it is not uncommon for TWAS results to be misinterpreted as a test of such a genetic relationship, we aimed to determine to what extent this interpretation was viable in practice. To this end, we performed simulations to determine the degree of type 1 error rate inflation that occurs when treating TWAS results as a test of genetic relationship, and performed real data analysis to demonstrate what this translated to in practice.

Our results showed that using TWAS to infer genetic relationships between gene expression and the phenotype can lead to strongly inflated type 1 error rates, as well as a substantial proportion of invalid significant results, for which such a genetic relationship is insufficiently supported by the statistical evidence in the data. This was further emphasized by the fact that in our real data analysis, TWAS still yielded a considerable number of significant associations even when genetic signal for the gene expression was absent altogether, clearly illustrating how TWAS functions simply as a form of joint association test. The issue is further complicated by the fact that the degree to which standard errors are deflated varies across genes, depending on factors such as the gene size and level of genetic signal for both gene expression and phenotype. This therefore rules out the possibility of a simple post-hoc correction on existing TWAS results, making them difficult to interpret and use in practice. Although TWAS results can still be validly interpreted as a form of joint association test for the genetic relationship between the SNPs and the phenotype, as out results show, it does not allow for any conclusions to be drawn about the relationship of the phenotype with the gene expression, and therefore does not address the research questions that most researchers are looking to answer with such data.

While the issue of uncertainty in eQTL estimates has occasionally been mentioned in TWAS literature [1,25,35,36], its implications for the validity of the TWAS framework have received little scrutiny thus far. Moreover, although various TWAS methods have included simulations to evaluate type 1 error rates, these all either treated G^E as fixed in the simulations just as in the TWAS model itself [16,23], or only simulated under a null scenario of the phenotype having no genetic component at all (ie. GY = 0) [22,24,25]. This explains why the issue demonstrated in this paper has not yet been widely discussed.

Addressing this issue with TWAS generally requires fully accounting for the uncertainty of the estimate G^E in relation to the true genetic component GE. While it is in principle possible to incorporate this into existing TWAS models, in practice, the feasibility of implementing this will strongly depend on the approach used to fit the model in Eq (1), as the distribution of the noise component induced by penalized regression models can take mathematically very complex forms.

Another option is to use alternative methods to detect genetic relationships between gene expression and a phenotype. As demonstrated, local genetic correlation analysis methods could potentially fill this role, as these are explicitly designed to evaluate local genetic relationships between phenotypes. However, methods like colocalization [37,38] or Mendelian Randomization (MR) [3941] could potentially be used for this as well. These methods use quite different model assumptions, but test null hypotheses that can answer similar research questions pertaining to local genetic relationships between gene expression and phenotypes. Of note however is that the issue with TWAS described in this paper has a clear analogy to the issue of the “no measurement error” (NOME) assumption in MR analysis [42]. While the practice of selecting only highly significant instruments could reduce the impact of the NOME assumption, further evaluation would be needed to determine whether this same issue would still arise when using MR as an alternative to TWAS.

Investigating genetic relationships between gene expression and phenotypes, as TWAS is often intended to do, has the potential to provide valuable insights into the genetic etiology of many phenotypes. But although traditional TWAS models are statistically valid relative to their own null hypothesis, ultimately they only perform a test of joint association of the phenotype with the SNPs in a region, and thus cannot be used to reliably draw conclusions about any genetic relationship with gene expression. Further development of existing TWAS methods to account for this issue, or adoption of alternative methods such as local genetic correlation analysis, will therefore be essential to support future investigations of the relationship between gene expression and phenotypes.

Methods

Ethics statement

This study relied on simulated data based on the UK Biobank genotype data [28], as well as secondary analysis of publicly available summary statistics. Ethical approval for this data was obtained by the primary researchers, and as such no further ethical approval was required for this study.

Outline of TWAS framework

For a standardized genotype matrix X containing SNPs for a particular gene, and with E and Y the centered gene expression for that gene and outcome phenotype respectively, the general TWAS framework consists of a two-stage procedure, based on the Eqs (1) and (2) in the main text, with the estimated genetic component G^E=Xα^E, also given in the main text. For ease of notation, we have omitted model intercepts and covariates from these equations, but in practice these will usually be included. An alternative model may also be used rather than the linear regression in Eq (2), such as a logistic regression if the phenotype is binary.

For each gene and tissue, Eq (1) is first fitted to the eQTL data to obtain the estimated weight vector α^E. This is then used to compute G^E in the target GWAS sample in the second stage, and plugged into the linear model in Eq (2). A p-value is then obtained by performing a test on the coefficient β. Note that usually the second stage is only performed for genes and tissues that exhibit sufficient genetic association in the eQTL data. This second stage can also be rewritten in terms of GWAS summary statistics, allowing TWAS to be performed without having direct access to the GWAS sample. In this case a genotype matrix X obtained from a separate reference sample is used to estimate LD.

Which SNPs are included in X varies, but a common choice is to use all available SNPs within one megabase of the transcription region of the gene. Although for simplicity the same genotype matrix X is used for Eqs (1) and (2), there will be separate X genotype matrices for each sample. The analysis is therefore restricted to using only those SNPs that are available in both samples, as well as in the LD reference sample when using summary statistics as input.

In practice, Eq (1) cannot be fitted with a traditional multiple linear regression model, due to the high LD between SNPs (leading to extreme collinearity), and the number of SNPs typically exceeding the sample sizes of eQTL data. Some form of regularization in the regression model is therefore required to obtain α^E, and consequently one of the main discrepancies between TWAS implementations is the specific regularization used (see Table 1). In some cases, rather than fitting Eq (1), the elements of α^E are simply set to the marginal SNP effect estimates instead.

Note that some methods [2224] diverge from this linear model structure (Table 1, non-linear models). Statistically these can be seen as generalizations of the TWAS framework, though conceptually they can no longer be interpreted as imputing the genetic component of gene expression. See Non-linear TWAS models in S1 Text for more details.

Local genetic correlation

The LAVA implementation of local genetic correlation analysis has been described in detail in Werme et al. (2022) [26]. In brief, LAVA uses summary statistics and a reference genotype sample to fit Eqs (1) and (3), obtaining estimates of α^E and α^Y as well as a corresponding sampling covariance matrix for each (a logistic regression equivalent is used for binary phenotypes). To do so, a singular value decomposition for X is computed, pruning away excess principal components to attain regularization of the models and allowing them to be fitted.

With the pruned and standardized principal component matrix W = XR (with R the transformation matrix projecting the genotypes onto the principal components), we can write GE = E and GY = Y, where δE and δY are the genetic effect size vectors for these principal components. Their estimates δ^E and δ^Y can be used to obtain α^E and α^Y by reversing the transformation through R, such that α^E=Rδ^E and α^Y=Rδ^Y, with the projection to the principal components effectively providing a form of regularization in the estimation of αE and αY. In practice however, LAVA is defined and implemented directly in terms of δE and δY and its estimates, rather than working with αE and αY explicitly.

For ease of notation, we define the combined matrix G = (GE, GY) = for the genetic components, with combined effect size matrix δ = (δE, δY). We denote the effect sizes for a single principal component j as δj, corrresponding to the jth row of γ, and denote the number of principal components as K.

The estimates δ^E and δ^Y are obtained by reconstructing multiple linear regressions from the input summary statistics (see Werme et al. (2022)[26] for details). This uses two separate equations of the form E = E+ζE and Y = Y+ζY, analogous to Eqs (1) and (3) but regressing on W rather than X, with residual variances ηE2 and ηY2 for ζE and ζY. From these models we have estimates of the form δ^E=(WTW)1WTE=WTEN1 (since WTW = IK(N−1), with IK the size K identity matrix) and similarly δ^Y=WTYN1, with corresponding sampling distributions δ^EMVN(δE,σE2I) and δ^YMVN(δY,σY2I), where σE2=ηE2N1 and σY2=ηY2N1 are the sampling variances (ie. squared standard errors). In practice, we obtain these by estimating ηE2 and ηY2 and plugging these in to get estimates σ^E2 and σ^Y2.

For principal component j we therefore have δ^jMVN(δj,Σ^), where the diagonal elements of Σ^ are σ^E2 and σ^Y2 and the off-diagonal elements are 0 (in the general case the off-diagonal elements represent the sampling covariance resulting from sample overlap, but this is not present in the analyses in this study). Since for the covariance matrix of G, denoted Ω, we have Ω=cov(G)=GTGN1=δTWTWδN1=δTδ, it follows that inference on Ω can be performed using the sampling distributions for δ^E and δ^Y directly.

Using this model, separate univariate tests of joint association of the SNPs in X with E and Y can be performed, testing the null hypotheses δE=0 and δY=0 respectively (using standard linear regression F-test for continuous phenotypes (such as gene expression), or a χ2 test for binary phenotypes). This is equivalent to testing the local genetic variances ωE2 and ωY2, the diagonal elements of Ω. In general, it is recommended in LAVA to test both genetic variances before performing the bivariate analysis, since genetic covariance can only exist in a genomic region where there both phenotypes exhibit some degree of genetic variance, though analogous to standard practice in TWAS literature in this paper we only test on the genetic variance for the gene expression.

From the above distributions it follows that the expected value E[δ^Tδ^]=δTδ+KΣ^, and we can therefore use the method of moments to estimate Ω as Ω^=δ^Tδ^KΣ^. Since in the present analyses there is assumed to be no sample overlap, the off-diagonal elements of Σ are 0, and the estimate for the genetic covariance therefore reduces to ω^EY=δ^ETδ^Y. The matrix δ^Tδ^ has a non-central Wishart sampling distribution, which in the current implementation of LAVA is used to obtain p-values to test H0:ωEY = cov(GE, GY) = 0 using a simulation procedure (see Werme et al. (2022) for details). Note that the LAVA model as described in Werme et al. (2022) is technically defined in terms of testing H0:cov(δE, δY) = 0, but since cov(GE,GY)=GETGYN1=δETWTWδYN1=δETδY=Kcov(δE,δY), this is equivalent to testing H0:cov(GE, GY) = 0.

However, under this null hypothesis ω^EY has an expected value of zero and a variance of VrG=Kσ^E2σ^Y2+ω^E2σ^Y2+σ^E2ω^Y2. Although it is a sum of product-normal distributions, with large enough K its distribution converges on a normal distribution following the central limit theorem, allowing for a normal approximation of the sampling distribution of ω^EY. For the simulations and real data analysis we computed LAVA-rG p-values in both ways, and as shown in Fig G in S1 Text these were found to be virtually identical. As such, all LAVA-rG results reported in this paper are based on p-values computed using the distribution ω^EYN(0,VrG), to provide a mathematically more direct comparison for the LAVA-TWAS implementation.

LAVA implementation of TWAS

To construct a TWAS model within the LAVA framework, we note that in a linear regression for Eq (2) we have estimator β^=cov^(G^E,Y)var^(G^E). Since var^(G^E) is considered fixed in TWAS the sampling distribution of β^ directly proportional to the distribution of cov^(G^E,Y), the sample estimate of cov(G^E,Y). As both G^E and Y have means of zero, and since G^E=Wδ^E, we have cov^(G^E,Y)=G^ETYN1=δ^ETWTYN1=δ^ETWTYN1. As previously derived δ^Y=WTYN1, and it therefore follows that cov^(G^E,Y)=δ^ETδ^Y. This cov^(G^E,Y) is therefore equal to the genetic covariance estimate ω^EY used by LAVA-rG, and thus both LAVA-TWAS and LAVA-rG are shown to use the same test statistic δ^ETδ^Y=ω^EY.

We can also observe that this ω^EY can be rewritten as the sum jδ^Ejδ^Yj, weighting each genetic association δ^Yj by the corresponding eQTL estimate δ^Ej. This implementation is therefore analogous to how TWAS is performed using GWAS summary statistics in other TWAS methods (eg. Gusev (2016)[6]), except defined in terms of the estimated genetic associations of principal component matrix W rather than the original SNP genotype matrix X.

Data

Genotype data for simulations came from the UK Biobank [28], a national genomic study of over 500,000 volunteer participants in the UK. The study was approved by the National Research Ethics Service Committee North West–Haydock (reference 11/NW/0382) and data were accessed under application #16406. Data collection, primary quality control, and imputation of the genotype data were performed by the UK Biobank[29]. Data was converted to hardcalled genotypes, and then additionally filtered, using only SNPs with an info score of at least 0.9 and missingness no greater than 5%, and restricting the sample to unrelated, European-ancestry individuals with concordant sex. Remaining missing genotype values were then mean-imputed to ensure consistency of the input data across different methods.

The European panel of the 1,000 Genomes [43] data (N = 503, as downloaded from https://ctg.cncr.nl/software/magma) was used as genotype reference data to estimate LD for the real data analysis for both LAVA and FUSION, removing SNP with a minor allele frequency (MAF) below 0.5%. For eQTL data we used the GTEx [34] data (v8, European subset), for whole blood gene expression. The published cis-eQTL summary statistics (European ancestry) from GTEx Portal were used for the LAVA analysis, pre-computed FUSION model files were used for the FUSION analysis. For every analysed gene, we included all SNPs in the data within one megabase of the transcription start site. Genes were filtered to include only autosomal protein-coding and RNA genes expressed in blood, and only genes also available in the FUSION model files were used, resulting in a total of 14,584 genes available for analysis.

GWAS summary statistics were selected for five well-powered phenotypes, chosen to reflect a range of different domains. These were BMI (GIANT) [30] (no waist-hip ratio adjustment), educational attainment (SSGAC)[31], schizophrenia (PGC, wave 3)[32], diastolic blood pressure (GWAS Atlas)[33] and type 2 diabetes (GWAS Atlas)[33]. Sample size and number of SNPs for each sample can be found in Table 2.

Simulation study

Simulation studies were performed to evaluate the type 1 error rates of TWAS when used to evaluate the null hypothesis H0:cov(GE, GY) = 0. The simulations were based on the UK Biobank data, generating gene expression and outcome phenotypes while varying their sample sizes and the local heritabilities, as well as the number of SNPs per gene. The number of SNPs K in the genes was varied across values of 100, 500, 1,000 and 2,500 SNPs; sample sizes of either 250 or 1,000 were used for the gene expression (denoted NE), and 10,000, 50,000 or 100,000 for the outcome phenotype (denoted NY); and local heritabilities, defined as the percentage of variance explained by the SNPs in the gene, were set at either 1%, 5% or 10% for the gene expression (denoted hE2) [44], and 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5% or 1% for the outcome phenotype (denoted hY2). The number of SNPs K was only varied at sample size settings of NE = 1,000 and NY = 10,000, and the sample sizes were only varied at K = 1,000. All combinations of the two local heritabilities were simulated for each setting of K and sample sizes. The simulations were analysed using LAVA-TWAS, LAVA-rG, the four FUSION models and CoMM.

To create the samples to use, we first drew a random subset of 100,000 individuals from the UK Biobank data. For each sample size N used in the simulations, the first N individuals from this subset were used for that sample, to ensure that samples of different sizes are all nested. The MAF for all SNPs was computed based on the N = 100 subsample, discarding all SNPs with MAF below 5%. Since the subsamples are nested, this ensures that all SNPs have a sufficient minor allele count at every sample size used. Ten blocks of 2,500 consecutive SNPs were then selected to represent simulated genes, one block each from around the middle of the first ten chromosomes (avoiding the centromere region). The number of SNPs K was varied across conditions by using only a subset of these 2,500 SNPs, always selecting the first K SNPs from each block. For each simulation condition, the total number of iterations for that condition was evenly distributed over these ten blocks, and results were aggregated over the blocks. Type 1 error rates per condition were computed as the proportion of iterations for that condition for which the p-value was smaller than the significance threshold, which was set to 0.05 unless otherwise specified.

To simulate gene expression E and outcome phenotype Y for the standardized genotypes X of the SNPs in a gene, X was projected onto standardized principal components W, pruning away redundant components based on the cumulative genotypic variance explained by the components (retaining those that jointly explain 99% of the total variance). For each iteration, true genetic effect sizes δE and δY for the principal components under the null hypothesis of cov(GE, GY) = 0 were generated by drawing values from a normal distribution, then regressing one vector on the other and retaining only the residuals for the outcome vector to ensure that δE and δY were exactly independent. The two genetic components were then computed as GE = E and GY = Y, and both were subsequently standardized to a variance of one.

Simulated gene expression and phenotype values were then generated as E = E+ξE and Y = Y +ξY, drawing the residuals ξE and ξY from normal distributions with variance parameters equal to 100hE2hE2 and 100hY2hY2 respectively, ensuring the desired level of explained variance. Both E and Y were then regressed on each individual SNP in X using simple linear regression, and the resulting test statistics as well as the simulated E and Y themselves were stored for subsequent analysis.

To compare the different methods, a baseline set of simulation conditions with 1,000 iterations, with K = 1,000, NE = 1,000 and the remaining parameters across their full range, was generated and analysed with all methods: LAVA-TWAS, LAVA-rG, FUSION-BLUP, FUSION-BSLMM, FUSION-ElastNet, FUSION-LASSO and CoMM. LAVA analyses were performed using the summary statistics as input, FUSION analyses with the raw gene expression values for the first stage and outcome phenotype summary statistics for input, and CoMM using the raw values for both gene expression and outcome phenotype. For the first stage of FUSION, cross-validation was turned off and the true RE2=hE2100 value was set using the—hsq_set option. In the CoMM analyses, the model was found to run into convergence issues due to K being equal to NE, and the simulations were rerun setting K to 100 instead to resolve this. Moreover, an additional set of simulations was performed with he2 values of 1%, 2%, 4%, 5%, 6%, 8% and 10%, and hY2 values of 0.01%, 0.02%, 0.05%, 0.1%, 0.2%, 0.5% and 1%.

For LAVA-TWAS, further of simulations with 10,000 iterations were performed across the full range of parameter settings for analysis, to evaluate the impact of these parameters on type 1 error rates. To determine the effect of filtering genes on the presence of eQTL signal, as is common practice in TWAS analyses, for these simulations we also computed the LAVA univariate p-value for the gene expression. In addition to the regular type 1 error rates, filtered type 1 error rates were also computed based on only the iterations for which the univariate p-value was below either 0.05 or 0.0001. Finally, to allow for reliable computation of type 1 error rates at lower significance thresholds, for the subset of these conditions with K = 1,000, NE = 1,000, hE2=5% and hY2=0.001%,0.01%,0.1% or 1% the number of iterations was increased to 100,000. Note that although for LAVA-TWAS the number of iterations differs from the other methods and varies across conditions, for each plot in this paper the type 1 error rate values within that plot are always based on the same number of iterations.

Additional simulations were also performed under the TWAS null hypothesis of H0:cov(G^E,GY)=0 to validate the LAVA-TWAS model. To do so, the above procedure was modified to first generate δE and the corresponding GE, from which E was then simulated. This was used to estimate SNP summary statistics for the associations between X and E, from which the estimate vector δ^E was then computed for the outcome data. Finally, δY was then generated to be exactly independent of this δ^E, with the rest of the simulation process proceeding as normal. For hY2, the parameter range was extended to also include hY2=0%.

Real data analysis

TWAS and local genetic correlation analyses were performed on the GWAS data as follows, separately for each of the five phenotypes. In all the analyses, SNP filtering was applied to remove all SNPs with a minor allele frequency below 0.5%, and only SNPs available in the 1,000 Genomes data, the GTEx data, and the GWAS sample for that phenotype were used.

For every gene, univariate LAVA analysis for the eQTL signal, LAVA-rG and LAVA-TWAS analyses, and FUSION analyses with the Elastic Net and LASSO models was performed. For FUSION, the BLUP and BSLMM models were not used as these were not available in the precomputed model files. FUSION univariate p-values for the eQTL signal were also obtained from these model files. Note that separate SNP filtering was applied prior to these models being computed, and as such the LAVA and FUSION analyses are based on different sets of SNPs per gene.

For the primary analyses, the bivariate LAVA and FUSION analyses were only performed for genes for which the LAVA univariate tests were significant after correcting for the total number of genes, at a threshold of 0.05/14,584 = 3.43×10−6 (see Table A in S1 Text for results when using the FUSION univariate p-values instead). The significance threshold for the bivariate analyses was set separately for each phenotype, at a Bonferroni-correction for the total number of genes that was univariate significant for that phenotype (see Table 2). In a secondary analysis, to evaluate TWAS results in the absence of gene expression signal, bivariate analyses were performed for all genes for which the LAVA univariate p-value was greater than 0.05, again setting the significance threshold per phenotype for the number of genes with univariate p-value greater than 0.05 for that phenotype.

Supporting information

S1 Text. Supplemental Information.

Contains additional derivations and explanation of the core properties of TWAS and related tests and models, as well as figures and tables with additional results. Fig A. Type 1 error rate simulation results as a function of the number of SNPs in the gene. Shown is the type 1 error rate (at significance threshold of 0.05) of the LAVA-TWAS model for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis). Results are shown for sample sizes of 1,000 and 10,000 for the expression and outcome respectively, at 5% local heritability for the expression. As shown, the number of SNPs interacts with the local heritability of the outcome, with lower numbers of SNPs resulting in a lower type 1 error rate at lower heritability, but increasing more rapidly as a function of heritability than conditions with higher numbers of SNPs. Fig B. Type 1 error rate simulation results as a function of univariate input filtering. Shown is the type 1 error rate (at significance threshold of 0.05) of the LAVA-TWAS model for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and different gene filtering settings. The univariate filtering is based on the test of genetic signal for the gene expression implemented in LAVA, applying either no filtering or subsetting to iterations of the simulation with univariate p-value smaller than 0.05, 0.001 or 0.0001 (corresponding to multiple testing correction for 1, 50 or 500 genes respectively) before computing the type 1 error rate on the remaining iterations. Results are shown for a sample size (N) of 1,000 and a local heritability of 5% for the expression, and 1,000 SNPs. As shown, filtering at a univariate p-value threshold of 0.05 has little effect, but at the stricter thresholds the type 1 error rate becomes further increased. Fig C. Type 1 error rate simulation results for LAVA-TWAS under the TWAS null hypothesis. Shown is the type 1 error rate (at significance threshold of 0.05) of LAVA-TWAS under the TWAS null hypothesis cov(G^E,GY)=0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs. As shown, and in contrast to the results when testing the null hypothesis of no genetic covariance (cov(GE, GY) = 0, Fig 1), type 1 error rates here are well controlled regardless of condition. Fig D. Type 1 error rate simulation results for FUSION models. Shown is the type 1 error rate (at significance threshold of 0.05) of the four penalized regression models in FUSION for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs, thus corresponding to the results for LAVA-TWAS shown in the bottom-right panel of Fig 1. As shown, the results are very similar across the different models. The Elastic Net and LASSO models do show decreased type 1 error rate for the 1% expression heritability condition, which is due to these models more frequently failing to converge if there is little detectable genetic signal for the expression, making the TWAS analysis inherently non-significant in those cases (whereas the BLUP and Bayesian LMM models always yield an α^E). Fig E. Type 1 error rate simulation results for CoMM. Shown on the top row is the type 1 error rate (at significance threshold of 0.05) of CoMM for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for sample sizes of 1,000 for the expression, and sample sizes of 10,000 (left) and 100,000 (right) for the outcome. Results in the top-right panel thus correspond to the LAVA-TWAS results in the bottom-right panel of Fig 1 and the FUSION results in Fig D, except for using 100 rather than 1,000 SNPs. As shown, type 1 error rate inflation is generally more pronounced than for the other methods. Moreover, there is a (partial) reversal of the effect of the local heritability of the expression, with the error rates for the 10% heritability level generally in between those of 1% and 5% heritability. Subsequently, additional simulations were performed using a more fine-grained range of heritability values to further investigate this phenomenon, with results shown on the bottom row with local heritability for the gene expression now on the horizontal axis and for the outcome as separate lines. These additional results confirm the non-monotonic effect of gene expression heritability on type 1 error rate, and demonstrate an interaction between it and the heritability and sample size of the outcome as well. See also section Discussion of simulation results above. Fig F. P-value bias in the real data analysis for BMI. Shown is the relation between the -log10 p-values of LAVA-TWAS model against those of the LAVA-rG model, demonstrating the bias in the LAVA-TWAS p-values as a result of its failure to account for the uncertainty in G^E. On the left a scatterplot of all the -log10 p-values from the analysis, on the right the same values truncated to 1.0×10−10 to better visualize the region around the Bonferroni-corrected significance threshold (horizontal and vertical dashed lines). In green the genes significant for both LAVA-TWAS and LAVA-rG, in red the invalid associations significant in LAVA-TWAS only. Fig G. Comparison of -log10 p-values for different implementations of LAVA-rG. P-values in the published implementation of LAVA-rG are computed using a partially empirical, sampling-based approach. Shown here is a comparison with p-values computed using a normal approximation, results for which are used throughout this paper rather than those based on the empirical p-values. Here a comparison is given of all the LAVA-rG p-values from the simulations (left) and real-data analysis (right). As shown, these p-values are virtually identical, and as such results based on either approach can be considered interchangeable. Fig H. Type 1 error rate simulation results for FUSION models with post hoc permutation test. Shown is the type 1 error rate (at significance threshold of 0.05) of the four penalized regression models in FUSION for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for the outcome phenotype (horizontal axis). Results are shown for 5% heritability for the gene expression, and sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs. Significance is declared either when the permutation p-value is below 0.05 (black lines), or when the permutation p-value and the main analysis p-value are both below 0.05 (yellow lines). As shown, type 1 error rates are generally well controlled for the Elastic Net and LASSO models, and well controlled for the Bayesian LMM model when using both p-values, though for the BLUP model inflation remains even when using both p-values. Table A. Summary of results of TWAS and local genetic correlation analyses of published summary statistics for five phenotypes, with FUSION univariate filtering. Table B. Summary of results of TWAS and local genetic correlation analyses for genes with no detectable eQTL signal. Table C. Summary of results of FUSION permutation tests for published summary statistics for five phenotypes.

(DOCX)

Acknowledgments

The analyses were carried out on the Genetic Cluster Computer, which is financed by the Netherlands Organization for Scientific Research (NWO: 480-05-003), by the VU University (Amsterdam, The Netherlands) and the Dutch Brain Foundation, hosted by the Dutch National Computing and Networking Services SurfSARA.

Data Availability

For the real data analysis, summary statistics were obtained for five different phenotypes: blood pressure (https://atlas.ctglab.nl/traitDB/3379), BMI (https://zenodo.org/record/1251813#.YBAfYBbvJhE, combined.23May2018), educational attainment (https://www.thessgac.org/data), schizophrenia (https://www.med.unc.edu/pgc/download-results/, scz2022), type 2 diabetes (https://atlas.ctglab.nl/traitDB/3328). Note that the educational attainment data requires registering your name and email address to confirm compliance to the terms of use of the data. GTEx v8 European ancestry whole blood eQTL summary statistics and the gene definition file used for the LAVA-TWAS and LAVA-rG analyses were obtained via https://www.gtexportal.org/home/datasets, the corresponding FUSION model files from http://gusevlab.org/projects/fusion/#single-tissue-gene-expression (EUR Samples). The European subset of the 1000 Genomes data used as LD reference for the real data analyses was downloaded from https://ctg.cncr.nl/software/magma. UK Biobank genotype data was obtained from https://www.ukbiobank.ac.uk/ under application #16406. LAVA is implemented as an R package, which is publicly available at the LAVA website (https://ctg.cncr.nl/software/lava) and LAVA GitHub repository (https://github.com/josefin-werme/LAVA). GTEx v8 input files compatible with LAVA are available from the LAVA website as well. Version 0.1.0 of LAVA was used for the LAVA analyses in this paper; this version contains specific functionality for handling eQTL summary statistics input files. FUSION can be obtained from http://gusevlab.org/projects/fusion/ and CoMM from https://github.com/gordonliu810822/CoMM. Scripts used for the simulations and analyses performed for this paper, as well as scripts and data underlying the figures, are available from https://github.com/cadeleeuw/twas-validity-scripts2022.

Funding Statement

This work was funded by The Netherlands Organization for Scientific Research (Grant No. NWO VICI 435–14–005 [to DP]) and NWO Gravitation: BRAINSCAPES: A Roadmap from Neurogenetics to Neurobiology (Grant No. 024.004.012 [to DP]), and a European Research Council advanced grant (Grant No, ERC-2018-AdG GWAS2FUNC 834057 [to DP, and funding CdL]). CdL was funded by F. Hoffmann-La Roche AG. WJP was funded by an NWO Veni grant (NWO: 916-19-152). JES was supported by an NWO Veni grant (NWO: 201G-064). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, et al. Opportunities and challenges for transcriptome-wide association studies. Nat Genet. 2019. Apr;51(4):592–9. doi: 10.1038/s41588-019-0385-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Cano-Gamez E, Trynka G. From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front Genet. 2020;11. doi: 10.3389/fgene.2020.00424 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Zhao B, Shan Y, Yang Y, Yu Z, Li T, Wang X, et al. Transcriptome-wide association analysis of brain structures yields insights into pleiotropy with complex neuropsychiatric traits. Nat Commun. 2021. May 17;12(1):2878. doi: 10.1038/s41467-021-23130-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Strunz T, Lauwen S, Kiel C, Hollander A den, Weber BHF. A transcriptome-wide association study based on 27 tissues identifies 106 genes potentially relevant for disease pathology in age-related macular degeneration. Sci Rep. 2020. Jan 31;10(1):1584. doi: 10.1038/s41598-020-58510-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Derks EM, Gamazon ER. Transcriptome-wide association analysis offers novel opportunities for clinical translation of genetic discoveries on mental disorders. World Psychiatry. 2020. Feb;19(1):113–4. doi: 10.1002/wps.20702 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016. Mar;48(3):245–52. doi: 10.1038/ng.3506 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015. Sep;47(9):1091–8. doi: 10.1038/ng.3367 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, et al. Probabilistic fine-mapping of transcriptome-wide association studies. 2019;22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Cheng W, van der Meer D, Parker N, Hindley G, O’Connell KS, Wang Y, et al. Shared genetic architecture between schizophrenia and subcortical brain volumes implicates early neurodevelopmental processes and brain development in childhood. Mol Psychiatry. 2022. Sep 13; doi: 10.1038/s41380-022-01751-z [DOI] [PubMed] [Google Scholar]
  • 10.Chen X, Yao T, Cai J, Zhang Q, Li S, Li H, et al. A novel genetic variant potentially altering the expression of MANBA in the cerebellum associated with attention deficit hyperactivity disorder in Han Chinese children. The World Journal of Biological Psychiatry. 2022. Jan 13;1–12. [DOI] [PubMed] [Google Scholar]
  • 11.Li L, Chen Z, von Scheidt M, Li S, Steiner A, Güldener U, et al. Transcriptome-wide association study of coronary artery disease identifies novel susceptibility genes. Basic Res Cardiol. 2022. Dec;117(1):6. doi: 10.1007/s00395-022-00917-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Zhu Q, Schultz E, Long J, Roh JM, Valice E, Laurent CA, et al. UACA locus is associated with breast cancer chemoresistance and survival. npj Breast Cancer. 2022. Dec;8(1):39. doi: 10.1038/s41523-022-00401-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Asefa NG, Kamali Z, Pereira S, Vaez A, Jansonius N, Bergen AA, et al. Bioinformatic Prioritization and Functional Annotation of GWAS-Based Candidate Genes for Primary Open-Angle Glaucoma. Genes. 2022. Jun 13;13(6):1055. doi: 10.3390/genes13061055 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Mancuso N, Shi H, Goddard P, Kichaev G, Gusev A, Pasaniuc B. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. The American Journal of Human Genetics. 2017. Mar;100(3):473–87. doi: 10.1016/j.ajhg.2017.01.031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun. 2018. Dec;9(1):1825. doi: 10.1038/s41467-018-03621-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Su YR, Di C, Bien S, Huang L, Dong X, Abecasis G, et al. A Mixed-Effects Model for Powerful Association Tests in Integrative Functional Genomics. The American Journal of Human Genetics. 2018. May;102(5):904–19. doi: 10.1016/j.ajhg.2018.03.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet. 2019. Mar;51(3):568–76. doi: 10.1038/s41588-019-0345-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Nagpal S, Meng X, Epstein MP, Tsoi LC, Patrick M, Gibson G, et al. TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits. The American Journal of Human Genetics. 2019. Aug;105(2):258–66. doi: 10.1016/j.ajhg.2019.05.018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Liu W, Li M, Zhang W, Zhou G, Wu X, Wang J, et al. Leveraging functional annotation to identify genes associated with complex diseases. Kim S, editor. PLoS Comput Biol. 2020. Nov 2;16(11):e1008315. doi: 10.1371/journal.pcbi.1008315 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Luningham JM, Chen J, Tang S, De Jager PL, Bennett DA, Buchman AS, et al. Bayesian Genome-wide TWAS Method to Leverage both cis- and trans-eQTL Information through Summary Statistics. The American Journal of Human Genetics. 2020. Oct;107(4):714–26. doi: 10.1016/j.ajhg.2020.08.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Bhattacharya A, Li Y, Love MI. MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies. Zhu X, editor. PLoS Genet. 2021. Mar 8;17(3):e1009398. doi: 10.1371/journal.pgen.1009398 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Xu Z, Wu C, Wei P, Pan W. A Powerful Framework for Integrating eQTL and GWAS Summary Data. Genetics. 2017. Nov 1;207(3):893–902. doi: 10.1534/genetics.117.300270 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Zhang J, Xie S, Gonzales S, Liu J, Wang X. A fast and powerful eQTL weighted method to detect genes associated with complex trait using GWAS summary data. Genetic Epidemiology. 2020. Sep;44(6):550–63. doi: 10.1002/gepi.22297 [DOI] [PubMed] [Google Scholar]
  • 24.Tang S, Buchman AS, De Jager PL, Bennett DA, Epstein MP, Yang J. Novel Variance-Component TWAS method for studying complex human diseases with applications to Alzheimer’s dementia. Chen L, editor. PLoS Genet. 2021. Apr 2;17(4):e1009482. doi: 10.1371/journal.pgen.1009482 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Yang C, Wan X, Lin X, Chen M, Zhou X, Liu J. CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. Hancock J, editor. Bioinformatics. 2019. May 15;35(10):1644–52. doi: 10.1093/bioinformatics/bty865 [DOI] [PubMed] [Google Scholar]
  • 26.Werme J, van der Sluis S, Posthuma D, de Leeuw CA. An integrated framework for local genetic correlation analysis. Nat Genet. 2022. Mar;54(3):274–82. doi: 10.1038/s41588-022-01017-y [DOI] [PubMed] [Google Scholar]
  • 27.Shi H, Mancuso N, Spendlove S, Pasaniuc B. Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits. Am J Hum Genet. 2017. Nov 2;101(5):737–51. doi: 10.1016/j.ajhg.2017.09.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015. Mar 31;12(3):e1001779. doi: 10.1371/journal.pmed.1001779 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9. doi: 10.1038/s41586-018-0579-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J, et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Human Molecular Genetics. 2019. Jan 1;28(1):166–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018. Jul 23;50(8):1112–21. doi: 10.1038/s41588-018-0147-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.The Schizophrenia Working Group of the Psychiatric Genomics Consortium, Ripke S, Walters JT, O’Donovan MC. Mapping genomic loci prioritises genes and implicates synaptic biology in schizophrenia. medRxiv. 2020. Sep 13;2020.09.12. doi: 10.1042/BJ20090867 [DOI] [Google Scholar]
  • 33.Watanabe K, Stringer S, Frei O, Umićević Mirkov M, de Leeuw C, Polderman TJC, et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet. 2019. Sep;51(9):1339–48. doi: 10.1038/s41588-019-0481-0 [DOI] [PubMed] [Google Scholar]
  • 34.Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013. Jun;45(6):580–5. doi: 10.1038/ng.2653 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Xue H, Pan W. Some statistical consideration in transcriptome-wide association studies. Genet Epidemiol. 2020. Apr;44(3):221–32. doi: 10.1002/gepi.22274 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Zhu H, Zhou X. Transcriptome-wide association studies: a view from Mendelian randomization. Quant Biol. 2020. Jun 17; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. Williams SM, editor. PLoS Genet. 2014. May 15;10(5):e1004383. doi: 10.1371/journal.pgen.1004383 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Hormozdiari F, van de Bunt M, Segrè AV, Li X, Joo JWJ, Bilow M, et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. The American Journal of Human Genetics. 2016. Dec;99(6):1245–60. doi: 10.1016/j.ajhg.2016.10.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016. May;48(5):481–7. doi: 10.1038/ng.3538 [DOI] [PubMed] [Google Scholar]
  • 40.Yuan Z, Zhu H, Zeng P, Yang S, Sun S, Yang C, et al. Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat Commun. 2020. Dec;11(1):3861. doi: 10.1038/s41467-020-17668-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Porcu E, Rüeger S, Lepik K, Santoni FA, Reymond A, Kutalik Z. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat Commun. 2019. Dec;10(1):3300. doi: 10.1038/s41467-019-10936-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Bowden J, Del Greco M F, Minelli C, Zhao Q, Lawlor DA, Sheehan NA, et al. Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption. International Journal of Epidemiology. 2019. Jun 1;48(3):728–42. doi: 10.1093/ije/dyy258 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015. Oct;526(7571):68–74. doi: 10.1038/nature15393 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Lloyd-Jones LR, Holloway A, McRae A, Yang J, Small K, Zhao J, et al. The Genetic Architecture of Gene Expression in Peripheral Blood. The American Journal of Human Genetics. 2017. Feb;100(2):228–37. doi: 10.1016/j.ajhg.2016.12.008 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Text. Supplemental Information.

Contains additional derivations and explanation of the core properties of TWAS and related tests and models, as well as figures and tables with additional results. Fig A. Type 1 error rate simulation results as a function of the number of SNPs in the gene. Shown is the type 1 error rate (at significance threshold of 0.05) of the LAVA-TWAS model for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis). Results are shown for sample sizes of 1,000 and 10,000 for the expression and outcome respectively, at 5% local heritability for the expression. As shown, the number of SNPs interacts with the local heritability of the outcome, with lower numbers of SNPs resulting in a lower type 1 error rate at lower heritability, but increasing more rapidly as a function of heritability than conditions with higher numbers of SNPs. Fig B. Type 1 error rate simulation results as a function of univariate input filtering. Shown is the type 1 error rate (at significance threshold of 0.05) of the LAVA-TWAS model for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and different gene filtering settings. The univariate filtering is based on the test of genetic signal for the gene expression implemented in LAVA, applying either no filtering or subsetting to iterations of the simulation with univariate p-value smaller than 0.05, 0.001 or 0.0001 (corresponding to multiple testing correction for 1, 50 or 500 genes respectively) before computing the type 1 error rate on the remaining iterations. Results are shown for a sample size (N) of 1,000 and a local heritability of 5% for the expression, and 1,000 SNPs. As shown, filtering at a univariate p-value threshold of 0.05 has little effect, but at the stricter thresholds the type 1 error rate becomes further increased. Fig C. Type 1 error rate simulation results for LAVA-TWAS under the TWAS null hypothesis. Shown is the type 1 error rate (at significance threshold of 0.05) of LAVA-TWAS under the TWAS null hypothesis cov(G^E,GY)=0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs. As shown, and in contrast to the results when testing the null hypothesis of no genetic covariance (cov(GE, GY) = 0, Fig 1), type 1 error rates here are well controlled regardless of condition. Fig D. Type 1 error rate simulation results for FUSION models. Shown is the type 1 error rate (at significance threshold of 0.05) of the four penalized regression models in FUSION for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs, thus corresponding to the results for LAVA-TWAS shown in the bottom-right panel of Fig 1. As shown, the results are very similar across the different models. The Elastic Net and LASSO models do show decreased type 1 error rate for the 1% expression heritability condition, which is due to these models more frequently failing to converge if there is little detectable genetic signal for the expression, making the TWAS analysis inherently non-significant in those cases (whereas the BLUP and Bayesian LMM models always yield an α^E). Fig E. Type 1 error rate simulation results for CoMM. Shown on the top row is the type 1 error rate (at significance threshold of 0.05) of CoMM for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for outcome phenotype (horizontal axis) and gene expression (separate lines). Results are shown for sample sizes of 1,000 for the expression, and sample sizes of 10,000 (left) and 100,000 (right) for the outcome. Results in the top-right panel thus correspond to the LAVA-TWAS results in the bottom-right panel of Fig 1 and the FUSION results in Fig D, except for using 100 rather than 1,000 SNPs. As shown, type 1 error rate inflation is generally more pronounced than for the other methods. Moreover, there is a (partial) reversal of the effect of the local heritability of the expression, with the error rates for the 10% heritability level generally in between those of 1% and 5% heritability. Subsequently, additional simulations were performed using a more fine-grained range of heritability values to further investigate this phenomenon, with results shown on the bottom row with local heritability for the gene expression now on the horizontal axis and for the outcome as separate lines. These additional results confirm the non-monotonic effect of gene expression heritability on type 1 error rate, and demonstrate an interaction between it and the heritability and sample size of the outcome as well. See also section Discussion of simulation results above. Fig F. P-value bias in the real data analysis for BMI. Shown is the relation between the -log10 p-values of LAVA-TWAS model against those of the LAVA-rG model, demonstrating the bias in the LAVA-TWAS p-values as a result of its failure to account for the uncertainty in G^E. On the left a scatterplot of all the -log10 p-values from the analysis, on the right the same values truncated to 1.0×10−10 to better visualize the region around the Bonferroni-corrected significance threshold (horizontal and vertical dashed lines). In green the genes significant for both LAVA-TWAS and LAVA-rG, in red the invalid associations significant in LAVA-TWAS only. Fig G. Comparison of -log10 p-values for different implementations of LAVA-rG. P-values in the published implementation of LAVA-rG are computed using a partially empirical, sampling-based approach. Shown here is a comparison with p-values computed using a normal approximation, results for which are used throughout this paper rather than those based on the empirical p-values. Here a comparison is given of all the LAVA-rG p-values from the simulations (left) and real-data analysis (right). As shown, these p-values are virtually identical, and as such results based on either approach can be considered interchangeable. Fig H. Type 1 error rate simulation results for FUSION models with post hoc permutation test. Shown is the type 1 error rate (at significance threshold of 0.05) of the four penalized regression models in FUSION for the null hypothesis of no genetic covariance (cov(GE, GY) = 0), at different levels of local heritability (h2) for the outcome phenotype (horizontal axis). Results are shown for 5% heritability for the gene expression, and sample sizes of 1,000 and 100,000 for the expression and outcome respectively, with 1,000 SNPs. Significance is declared either when the permutation p-value is below 0.05 (black lines), or when the permutation p-value and the main analysis p-value are both below 0.05 (yellow lines). As shown, type 1 error rates are generally well controlled for the Elastic Net and LASSO models, and well controlled for the Bayesian LMM model when using both p-values, though for the BLUP model inflation remains even when using both p-values. Table A. Summary of results of TWAS and local genetic correlation analyses of published summary statistics for five phenotypes, with FUSION univariate filtering. Table B. Summary of results of TWAS and local genetic correlation analyses for genes with no detectable eQTL signal. Table C. Summary of results of FUSION permutation tests for published summary statistics for five phenotypes.

(DOCX)

Data Availability Statement

For the real data analysis, summary statistics were obtained for five different phenotypes: blood pressure (https://atlas.ctglab.nl/traitDB/3379), BMI (https://zenodo.org/record/1251813#.YBAfYBbvJhE, combined.23May2018), educational attainment (https://www.thessgac.org/data), schizophrenia (https://www.med.unc.edu/pgc/download-results/, scz2022), type 2 diabetes (https://atlas.ctglab.nl/traitDB/3328). Note that the educational attainment data requires registering your name and email address to confirm compliance to the terms of use of the data. GTEx v8 European ancestry whole blood eQTL summary statistics and the gene definition file used for the LAVA-TWAS and LAVA-rG analyses were obtained via https://www.gtexportal.org/home/datasets, the corresponding FUSION model files from http://gusevlab.org/projects/fusion/#single-tissue-gene-expression (EUR Samples). The European subset of the 1000 Genomes data used as LD reference for the real data analyses was downloaded from https://ctg.cncr.nl/software/magma. UK Biobank genotype data was obtained from https://www.ukbiobank.ac.uk/ under application #16406. LAVA is implemented as an R package, which is publicly available at the LAVA website (https://ctg.cncr.nl/software/lava) and LAVA GitHub repository (https://github.com/josefin-werme/LAVA). GTEx v8 input files compatible with LAVA are available from the LAVA website as well. Version 0.1.0 of LAVA was used for the LAVA analyses in this paper; this version contains specific functionality for handling eQTL summary statistics input files. FUSION can be obtained from http://gusevlab.org/projects/fusion/ and CoMM from https://github.com/gordonliu810822/CoMM. Scripts used for the simulations and analyses performed for this paper, as well as scripts and data underlying the figures, are available from https://github.com/cadeleeuw/twas-validity-scripts2022.


Articles from PLOS Genetics are provided here courtesy of PLOS

RESOURCES