European Journal of Human Genetics. 2022 Nov 29;32(3):263–269. doi: 10.1038/s41431-022-01244-1

Examination of a novel expression-based gene-SNP annotation strategy to identify tissue-specific contributions to heritability in multiple traits

Travis J Mize 1,2, Luke M Evans 1,2
PMCID: PMC10924090  PMID: 36446896

Abstract

Complex traits show clear patterns of tissue-specific expression influenced by single nucleotide polymorphisms (SNPs), yet current strategies aggregate SNP effects to genes by employing simple physical proximity-based windows. Here, we examined whether incorporating SNPs with effects on tissue-specific cis-expression would improve our ability to detect trait-relevant tissues across 31 complex traits using stratified linkage disequilibrium score regression (S-LDSC). We found that a physical proximity annotation produced more significant tissue enrichments and larger S-LDSC regression coefficients, as compared to an expression-based annotation. Furthermore, we showed that our expression-based annotation did not outperform an annotation strategy in which an equal number of randomly chosen SNPs were annotated to genes within the same genomic window, suggesting extensive redundancy among SNP effect estimates due to linkage disequilibrium. That said, current sample sizes limit estimation of cis-genetic SNP effects; therefore, we recommend reexamination of the expression-based annotation when larger tissue-specific expression datasets become available. To examine the influence of sample size, we used a large whole blood eQTL reference panel (N = 31,684) applying a similar expression-based annotation strategy. We found that significant cis-expression QTLs in whole blood did not outperform the physical proximity annotation when estimating tissue-specific SNP heritability enrichment for either high- or low-density lipoprotein phenotypes but performed similarly for inflammatory bowel disease. Finally, we report new and updated tissue enrichment estimates across 31 complex traits, such as significant heritability enrichment of the frontal cortex for cognitive performance, educational attainment, and intelligence, providing further evidence of this structure’s importance in higher cognitive function.

Subject terms: Transcriptomics, Heritable quantitative trait, Gene expression, Genome informatics

Introduction

Regulation of gene expression is one mechanism of heritable variation of complex traits [1–5], yet integration of expression-related processes in identifying genetic associations and influences on traits remains incomplete. Single nucleotide polymorphisms (SNPs) located in or near a gene (i.e., in cis) may influence that gene’s function through alterations of transcription regulatory elements [6], mRNA splicing [7], translation [8, 9], and many other factors [10, 11]. However, the functional effects of most SNPs on complex traits, particularly those without clear protein-coding changes, remain unknown [12]. Despite this, there is clear evidence that genetic influences on tissue- and cell-type-specific expression affect complex traits [13–16]. This has led to discussions of pleiotropic effects across tissues [17] and how additional transcriptomic complexity may influence our ability to estimate expression-mediated heritability [14].

Recently, Finucane et al. [2] used Genotype Tissue Expression (GTEx) [18] RNA-seq data to identify trait-relevant tissues and cell types across 48 traits and diseases through the application of stratified linkage disequilibrium score regression (S-LDSC) [19]. They identified sets of genes specifically expressed in tissues or brain regions, within which SNPs contribute significantly to heritable variation in complex traits. As is standard practice, they utilized an arbitrarily specified physical window within which all SNPs were annotated to genes. Previous studies suggest that while cis-acting factors influence tissue-specific gene expression [18, 20], causal genes are frequently further than 100 kb away from associated SNPs [21], and the addition of functional information (e.g., expression quantitative trait loci) can improve causal inference in genomic annotation [12]. Together, these studies imply that incorporating variants estimated to influence tissue-specific expression, in addition to specifying a larger genomic window, may capture additional transcriptomic information resulting in heightened model specificity and improved estimates of complex trait heritability due to tissue-specific gene expression.

There are several approaches to estimate SNP effects on tissue-specific gene expression [4, 14, 15, 18] and it is therefore possible to annotate variants to genes based on their predicted functional impact on expression, forgoing the assumption of the physically nearest SNPs being causal. Here, we examined whether annotating SNPs to genes based on their association with expression [1, 22] would identify novel expression-trait estimates of tissue-specific enrichment while leveraging the largest publicly available genome-wide association study (GWAS) summary statistics to date. We hypothesized the expression-related annotations would improve the ability to identify trait-relevant tissues by increasing gene-SNP annotation specificity while simultaneously incorporating a larger genomic window. We also updated estimates of partitioned cis-expression SNP heritability ($h^2_{SNP}$) of specifically expressed genes using the most recent available GWAS across 31 complex traits.

Methods

Overview of cell-type-specific stratified LDSC and specifically expressed genes

Cell-type-specific S-LDSC [2, 19] models the combined heritable contribution of a SNP and those with which it is in LD, such that $E[\chi_i^2] = 1 + Na + N\sum_k \tau_k\, l(i,k)$, where $E[\chi_i^2]$ is the expectation of the association test statistic for SNP $i$, $N$ is the sample size of the GWAS, $a$ is a measure of confounding bias (e.g., population stratification), and $l(i,k)$ is SNP $i$’s LD score for a functional category $k$. The regression coefficient ($\tau_k$) for the $k$th functional annotation estimates the contribution of that category’s specific expression to $h^2_{SNP}$ enrichment conditional on all other annotations. More specifically, when $\tau_k = 0$ there is no enrichment, when $\tau_k < 0$ there is a decrease in the per-SNP heritability while accounting for other annotations, and when $\tau_k > 0$ there is an increase in the per-SNP heritability while accounting for other annotations. The 53 baseline functional annotations incorporate genetic information not specific to cell type from gene structure (e.g., promoter, super enhancer, intronic, exonic) and methylation patterns (e.g., histone marks, chromatin structure) to increase the accuracy of estimated enrichment [19]. Refer to Finucane et al. [19] for a complete description of modeling and baseline functional annotation procedures.
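To make the regression model above concrete, the following is a minimal Python sketch that evaluates the expected association statistic for a single SNP given its per-category LD scores. The function name `expected_chisq` and the numeric inputs are hypothetical and chosen only for illustration; this is not the LDSC software itself, which estimates the coefficients by regression rather than taking them as given.

```python
def expected_chisq(n, a, taus, ld_scores):
    """Expected chi-square for one SNP under the S-LDSC model:
    E[chi2_i] = 1 + N*a + N * sum_k tau_k * l(i, k).

    n         -- GWAS sample size (N)
    a         -- confounding term (e.g., population stratification)
    taus      -- per-category regression coefficients (tau_k)
    ld_scores -- this SNP's LD score for each category, l(i, k)
    """
    if len(taus) != len(ld_scores):
        raise ValueError("taus and ld_scores must have the same length")
    return 1.0 + n * a + n * sum(t * l for t, l in zip(taus, ld_scores))


# Illustrative values only: two categories, no confounding.
value = expected_chisq(n=100_000, a=0.0, taus=[1e-7, 2e-7], ld_scores=[50.0, 10.0])
print(round(value, 3))
```

A positive coefficient for a category raises the expected statistic of every SNP that tags that category through LD, which is why enrichment is read from the sign and size of the coefficient.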

Sets of genes specifically expressed in individual tissues and brain regions using GTEx v6p expression data were identified by Finucane et al. [2], which we downloaded from https://alkesgroup.broadinstitute.org/LDSCORE/ and used in all subsequent analyses. The baseline gene-SNP annotation strategy for S-LDSC is to map all SNPs ±100 kb surrounding each gene of interest (Fig. 1). We refer to this approach as the physical proximity annotation henceforth.
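The physical proximity annotation can be sketched in a few lines of Python. The `Gene` and `Snp` records, the function `annotate_by_proximity`, and the coordinates below are hypothetical stand-ins; the sketch assumes the window extends 100 kb beyond each gene boundary, which is one reading of the ±100 kb rule described above.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int


def annotate_by_proximity(snps, genes, window_bp=100_000):
    """Return the IDs of SNPs lying within window_bp of any gene in the set."""
    annotated = set()
    for snp in snps:
        for gene in genes:
            same_chrom = snp.chrom == gene.chrom
            in_window = gene.start - window_bp <= snp.pos <= gene.end + window_bp
            if same_chrom and in_window:
                annotated.add(snp.rsid)
                break  # one matching gene is enough to annotate the SNP
    return annotated


# Toy coordinates, for illustration only.
genes = [Gene("GENE_A", "1", 1_000_000, 1_050_000)]
snps = [
    Snp("rs1", "1", 950_000),    # 50 kb upstream: inside the window
    Snp("rs2", "1", 1_200_000),  # 150 kb downstream: outside the window
    Snp("rs3", "2", 1_010_000),  # different chromosome
]
print(annotate_by_proximity(snps, genes))  # {'rs1'}
```

The nested loop is quadratic and is kept only for readability; a genome-wide run would normally use an interval index.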

Fig. 1. Gene-SNP annotation comparison flowchart.

Overview of S-LDSC physical proximity and expression-based annotation methods used to compare estimates of LDSC enrichment coefficients across tissues.

We examined, following Finucane et al. [2], two series of gene sets: (1) genes with unique expression within a category of similar tissues, as compared to the expression of other tissues (e.g., cortex vs. all non-brain tissues), and (2) genes with unique expression within a given brain region, as compared to the expression of all other available brain regions. For simplicity, we refer to gene sets from (1) as multi-tissue and (2) as within-brain. Refer to Finucane et al. [2] for a complete description of the methods employed to identify these sets of genes.

We accessed available GWAS summary statistics for 31 phenotypes [23–42] (Supplementary Table 1) and performed S-LDSC using both the physical (described above) and expression-based annotation approaches (described in the “Expression-based annotation” section). Due to the limited samples of publicly released GWAS summary statistics, only data from individuals of European descent were included in this study.

Expression-based annotation

We identified SNPs with evidence of expression effects in GTEx v7 data in each of 48 tissues based on pre-computed gene expression weights estimated in Functional Summary-based Imputation (FUSION) [1] and available at http://gusevlab.org/projects/fusion/. FUSION applies four statistical models (best linear unbiased prediction (BLUP), top1, lasso regression, and elastic net regression), each with distinct assumptions of polygenicity, to estimate tissue-specific gene expression weights ±500 kb from the transcription start site of all genes for which there was an available expression weight. For each gene, FUSION then selects the highest performing model based on the model’s $R^2$ and its corresponding p value. Briefly, BLUP includes all non-zero effect SNPs, top1 incorporates the single largest effect SNP, lasso regression generates a large effect SNP sparse model, and elastic net regression tests across a spectrum of SNP inclusion ranging from BLUP to top1. Thus, for each gene, a set of SNPs was identified that had non-zero cis-expression effects within 500 kb upstream and downstream of the transcription start site, and within specific tissues or brain regions.

For the expression-based annotation, we mapped these putatively expression-influencing SNPs specified by the best performing statistical model in FUSION to the specifically expressed sets of genes identified by Finucane et al. [2] (Fig. 1) and implemented gene set enrichment analyses in S-LDSC to test for trait-specific tissue relevance (see below). As LDSC performs poorly when annotations have too few SNPs [16], all gene expression weights for a given tissue, regardless of their estimated cis-genetic expression $h^2_{SNP}$, were included to ensure that enough SNPs were incorporated to control type 1 error in S-LDSC, as well as to ensure the highest degree of overlap with the Finucane et al. [2] sets of genes (Supplementary Table 2). We then directly compared our expression-based annotation approach to the standard physical proximity gene-SNP annotation using the exact same gene sets, all of which were published by Finucane et al. [2] (see below).
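The selection logic of the expression-based annotation can be illustrated with a short Python sketch. The nested-dictionary layout, the function names, and the toy weights are assumptions made for illustration and do not reflect the actual FUSION file format; for simplicity the best model is chosen by $R^2$ alone, whereas the text above notes that FUSION also considers the corresponding p value.

```python
def expression_snps_for_gene(models):
    """Return SNPs with non-zero weight in the gene's best-performing model.

    models -- {model_name: {"r2": float, "weights": {rsid: weight}}}
    Simplifying assumption: the best model is the one with the highest R^2.
    """
    best = max(models.values(), key=lambda m: m["r2"])
    return {rsid for rsid, weight in best["weights"].items() if weight != 0.0}


def expression_annotation(weights_by_gene, gene_set):
    """Union of expression-influencing SNPs over the genes in gene_set."""
    annotated = set()
    for gene in gene_set:
        models = weights_by_gene.get(gene)
        if models:  # genes without available weights contribute no SNPs
            annotated |= expression_snps_for_gene(models)
    return annotated


# Toy weights, for illustration only.
weights_by_gene = {
    "GENE_A": {
        "lasso": {"r2": 0.12, "weights": {"rs1": 0.3, "rs2": 0.0}},
        "top1": {"r2": 0.05, "weights": {"rs3": 0.8}},
    },
    "GENE_B": {"blup": {"r2": 0.02, "weights": {"rs4": -0.1}}},
}
print(expression_annotation(weights_by_gene, {"GENE_A", "GENE_C"}))  # {'rs1'}
```

In this toy example only `rs1` is annotated: `GENE_A`'s lasso model wins on $R^2$ and assigns `rs2` a zero weight, `GENE_B` is not in the gene set, and `GENE_C` has no weights.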

For each tissue, specifically expressed sets of genes were incorporated as a 54th functional annotation category alongside the 53 baseline functional categories [19] plus an additional annotation category that consisted of either all genes examined in the GTEx v6p gene expression dataset from which the specifically expressed gene sets were derived, as specified in Finucane et al. [2] (when using physical proximity annotation), or all genes for which there were available expression weights for the expression-based annotation. To correct for multiple tests, a false discovery rate (FDR) < 5% was applied across multi-tissue S-LDSC analyses, as in Finucane et al. [2].
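As an illustration of the multiple-testing step, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p values. The text specifies only an FDR < 5% threshold, so the choice of Benjamini-Hochberg here is an assumption, and the function name `benjamini_hochberg` and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking which tests pass the FDR threshold.

    Step-up rule: find the largest rank r (1-based, ascending p values)
    with p_(r) <= fdr * r / m, then reject all tests with rank <= r.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Illustrative p values for four tissues.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))  # [True, False, False, False]
```

Applying the procedure separately to the multi-tissue and within-brain analyses would mirror the way the thresholds are described in the text.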

For traits with multiple implicated brain regions based on the cross-tissue analyses, a within-brain analysis was conducted to control for overlap between brain expression gene sets implemented in the multi-tissue analyses, as described above and similar to Finucane et al. [2]. If no brain regions were identified as significant for a given trait in the multi-tissue analyses, that trait was excluded from consideration in the within-brain analyses. All background LD model functional annotations remained the same between the multi-tissue and within-brain analyses. Once again, to correct for multiple tests, an FDR < 5% was applied across within-brain S-LDSC analyses.

Whole blood significant Cis-eQTL annotation

Many tissue-specific expression datasets, such as GTEx [18], have relatively small sample sizes, which may limit the ability to accurately identify expression-influencing SNPs. To determine if a larger expression reference panel improved $h^2_{SNP}$ enrichment estimates, we used, to our knowledge, the largest (N = 31,684) publicly available dataset of significant whole blood cis-expression quantitative trait loci (eQTL) from eQTLGen [22] (https://www.eqtlgen.org/). Please refer to Võsa et al. [22] for a complete description of cis-eQTL mapping and association testing.

For each of the 16,920 genes examined, we mapped eQTLs to a gene if they fell within a ±500 kb window around the transcription start site of that gene. These annotated genes were then matched to the Finucane et al. [2] specifically expressed whole blood gene set (1534/2485 overlapped) and control gene set (13,894/16,902 overlapped) to create two functional annotations that were incorporated alongside the 53 baseline functional categories of S-LDSC to examine cis-expression $h^2_{SNP}$ via gene set enrichment analyses. This approach was then directly compared to the expression-based and physical proximity annotations using United Kingdom BioBank Neale Lab GWAS summary statistics [43] pertaining to high- and low-density lipoprotein cholesterol, as well as inflammatory bowel disease [44].

Whole blood eQTL conditional and joint analysis to determine independently associated variants

A high number of significant eQTLs were reported by eQTLGen [22] for most genes, many of which are not independent due to their close physical proximity and LD. As such, we expected a large degree of overlap between the physical proximity annotation and significantly associated whole blood eQTLs when SNPs were subset to LDSC baseline variants (hapmap3 [45]). Therefore, to identify the variants with independently associated eQTL effects for each gene’s whole blood expression, we performed conditional and joint analyses using Genome-wide Complex Trait Analysis [46] for each of the 19,250 genes examined by eQTLGen.

Specifically, for each gene tested by eQTLGen, SNPs were first mapped to a given gene using PLINK2 [47] if a variant was located within ±2 Mb of the center of that gene, mirroring the genomic region examined in the original study [22]. Then, independent association tests were run on each individual gene specifying a standard genome-wide significance threshold of 5e−8, a minor allele frequency of 0.01, and the --cojo-slct flag while using a random subset of 20,000 United Kingdom BioBank [48] unrelated individuals to estimate LD among variants and cis-eQTL summary statistics reported by eQTLGen. Annotating independently associated SNPs to Finucane’s specifically expressed whole blood gene set resulted in only ~8000 SNPs in the annotation, far fewer than is needed for reliable estimates of enrichment in LDSC [16]. Therefore, all independently associated whole blood eQTLs were instead annotated as their own functional category to test $h^2_{SNP}$ enrichment regardless of the specificity of gene expression, a slightly different question than our main analysis of tissue-specific enrichment.

Testing specificity of LDSC coefficients between expression- and proximity-based SNP annotations

To assess whether the expression-based annotation procedure led to increased specificity of heritable contribution relative to physical proximity-based annotation, we performed permutation tests. We used data from schizophrenia [41] (Fig. 2) as a baseline phenotype to compare estimated $h^2_{SNP}$ enrichment when applying expression-based annotation against a set of randomly chosen SNPs annotated within an equivalently sized genomic window (1 Mb). For each gene, and each permutation, the number of randomly annotated SNPs equaled the number of SNPs with non-zero expression weights from the best performing expression prediction model. For each permutation, linkage disequilibrium scores were re-calculated and $h^2_{SNP}$ LDSC regression coefficients estimated. We performed 1000 permutations to generate a normal distribution of S-LDSC coefficients for a single tissue, representing a null distribution to test whether the specific choice of SNPs used, namely, those with evidence of expression effects, provides more information than a random set of SNPs of equal number. We compared our expression-based LDSC regression coefficients to this null distribution using a one-tailed significance test (threshold p = 0.05). We performed this permutation procedure for three separate scenarios, applied to the 2014 schizophrenia GWAS summary statistics: (1) nonsignificant $h^2_{SNP}$ enrichment when using the expression-based annotation for the within-brain frontal cortex gene set, (2) significant $h^2_{SNP}$ enrichment when using the expression-based annotation for the within-brain cerebellum gene set, and (3) significant $h^2_{SNP}$ enrichment when using both the expression-based and physical proximity annotation for the multi-tissue cerebellum gene set. We chose these scenarios as they represent a range of results and possible outcomes.
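The two parts of the permutation procedure that do not require LDSC itself, drawing a count-matched random annotation and comparing the observed coefficient with the null, can be sketched in Python as follows. The function names and toy inputs are hypothetical; recomputing LD scores and refitting S-LDSC for each permutation is omitted; and the one-tailed p value is computed here as an empirical proportion with a +1 correction, which is an assumption rather than a documented detail of the original analysis.

```python
import random


def random_matched_annotation(window_snps, n_expression_snps, rng):
    """Draw, for each gene, as many random window SNPs as the gene has
    SNPs with non-zero expression weights.

    window_snps       -- {gene: set of SNP IDs inside the genomic window}
    n_expression_snps -- {gene: number of non-zero-weight SNPs}
    rng               -- random.Random instance (seeded for reproducibility)
    """
    chosen = set()
    for gene, n in n_expression_snps.items():
        candidates = sorted(window_snps[gene])  # sorted for deterministic sampling
        chosen.update(rng.sample(candidates, min(n, len(candidates))))
    return chosen


def one_tailed_empirical_p(observed, null_coefficients):
    """Proportion of null coefficients at least as large as the observed one."""
    exceed = sum(1 for c in null_coefficients if c >= observed)
    return (exceed + 1) / (len(null_coefficients) + 1)


# Toy inputs, for illustration only.
rng = random.Random(0)
annotation = random_matched_annotation(
    {"G1": {"a", "b", "c", "d"}, "G2": {"e", "f"}},
    {"G1": 2, "G2": 1},
    rng,
)
print(len(annotation))  # 3
print(one_tailed_empirical_p(0.5, [0.1, 0.6, 0.7, 0.2]))  # 0.6
```

In a full run, each random annotation would be passed through LD score calculation and S-LDSC to yield one null coefficient, and the loop would be repeated 1000 times before calling `one_tailed_empirical_p`.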

Fig. 2. Flowchart of LDSC coefficient permutations.

Overview of permutation procedure to test whether a SNP annotation based on putative expression effects provides stronger evidence of cell type or tissue enrichment than random sets of nearby SNPs.

Results

Ensuring consistency between our proximity-based annotation results and prior work, we first replicated the Finucane et al. multi-tissue schizophrenia S-LDSC analyses (correlation of the total number of SNPs included for each tissue, r = 1, Supplementary Table 3, and the correlation of the proportion of SNP heritability explained across all tissues, r > 0.99, Supplementary Table 3). This small difference is most likely due to minor variability in reported GWAS summary statistics over time.

Next, we sought to examine differences in S-LDSC coefficients across prior and newly reported GWAS for the same phenotypes. While some traits were highly consistent, such as height (LDSC regression coefficient estimates based on the new vs. prior GWAS summary statistics r = 0.9968, Supplementary Fig. 1), comparisons of other traits, such as Alzheimer’s disease (r = 0.3265, Supplementary Fig. 1), strongly differed (Supplementary Table 4). The large differences in regression coefficients for some traits are likely due to deviations in phenotypic definitions (e.g., proxy phenotype vs. clinical diagnosis). To examine discrepancies due to phenotypic differences, we performed a GWAS on inflammatory bowel disease (IBD) in United Kingdom BioBank unrelated individuals using ICD10 diagnoses and compared these to self-report IBD data made available by the Neale lab [43], as well as an additional IBD GWAS conducted by Jostins et al. [44] (see Supplementary Methods, Results, and Supplementary Fig. 2).

For tissue enrichment analyses, at least 100,000 SNPs were mapped to 46 of the 48 tissues when using an expression-based annotation (Supplementary Table 2), with pancreas and whole blood falling below this threshold (83,239 and 87,720 SNPs, respectively), suggesting that for these 46 tissues, S-LDSC regression coefficients are likely well controlled for type 1 error. We found the expression-based annotation resulted in fewer identified tissues or brain regions that contribute significantly to $h^2_{SNP}$ in complex traits when compared to the physical proximity-based annotation. Across the multi-tissue analyses, of the 31 phenotypes examined, 18 had at least one significant tissue when employing a physical proximity annotation, whereas only seven phenotypes had at least one significant tissue when using an expression-based annotation (FDR < 5%, Supplementary Figs. 3–5 and Supplementary Table 5). All tissue and trait combinations with significant expression-based annotation enrichments were also identified using the physical proximity annotation with the single exception of ovary tissue in Tourette syndrome (Supplementary Fig. 4). Of the phenotypes examined, schizophrenia (both sets of published GWAS summary statistics), educational attainment, and intelligence identified all 13 brain regions as significant, representing the maximum number of individual tissues identified for any trait.

Within-brain analyses were conducted for the 16 phenotypes with significant $h^2_{SNP}$ contribution of at least one brain region identified in the multi-tissue analyses. We identified significant contributions of specific brain regions in ten and six phenotypes when using a physical proximity and expression-based annotation, respectively (FDR < 5%, Fig. 3, Supplementary Fig. 6, and Supplementary Table 6). All significant expression-based annotations overlapped with a significant physical proximity annotation, with four exceptions: cortex in major depressive disorder (Fig. 3), cerebellum in the two schizophrenia datasets [41, 49] (Fig. 3), and cerebellum in Tourette syndrome (Supplementary Fig. 5). Additional analyses pertaining to synapse pathway enrichment for schizophrenia are provided in the Supplementary Materials (Supplementary Fig. 7 and Supplementary Table 7).

Fig. 3. Negative log10 p values for LDSC regression coefficients for specifically expressed gene set within-brain analyses utilizing a physical proximity (triangle) and expression-based (circle) annotation across 13 tissues categorized into four distinct subgroups for eight complex traits.

Each point represents a tissue type for that specific annotation. The dashed line represents an FDR < 5% at −log10(p) = 2.34 with larger shapes falling above this threshold.

Annotating all significant cis-expression QTLs in whole blood to their associated genes did not outperform the physical proximity annotation when estimating tissue-specific $h^2_{SNP}$ enrichment for either high- or low-density lipoprotein phenotypes but performed similarly for IBD (Supplementary Fig. 8 and Supplementary Table 8). However, independently associated whole blood eQTLs were enriched for high-density lipoprotein and clinically diagnosed IBD, but not for low-density lipoprotein or self-reported IBD (Supplementary Fig. 9 and Supplementary Table 9).

Permutation tests, to assess whether the expression-based annotation procedure led to increased specificity of heritable contribution relative to physical proximity-based annotation, suggested that SNP-gene annotations based on SNP expression effects do not differ from randomly chosen SNPs within the same regions. In all three instances examined, the strength of the regression coefficient using an expression-based annotation was no different than when annotating SNPs to genes at random (all p > 0.32, Fig. 4, Supplementary Figs. 10–12, and Supplementary Table 10).

Fig. 4. Distribution of 1000 permuted S-LDSC coefficients when using a random annotation procedure for schizophrenia in the frontal cortex within-brain analyses.

Solid line represents mean S-LDSC coefficient. Dashed line represents expression-based annotation coefficient.

Discussion

We tested whether expression-based annotation of SNPs to genes with tissue-specific gene expression influences the specificity of tissue enrichment estimates utilizing S-LDSC and updated estimates of partitioned $h^2_{SNP}$ across 31 different complex traits. We found little evidence that annotating SNPs to genes based on evidence of expression positively impacts $h^2_{SNP}$ estimates of S-LDSC. In both the multi-tissue and within-brain analyses, the physical proximity annotation resulted in more instances of significant tissue enrichment (Supplementary Tables 5 and 6) and larger S-LDSC regression coefficients, as compared to the expression-based annotation. There were only five occurrences in which the expression-based annotation identified significant tissue enrichments not found when employing the physical proximity annotation: (1) ovary tissue for Tourette syndrome in multi-tissue analyses, (2) cerebellum for Tourette syndrome in within-brain analyses, (3) brain cortex for major depressive disorder in within-brain analyses, (4) cerebellum for schizophrenia (Daner) [41] in within-brain analyses, and (5) cerebellum for schizophrenia (Clozuk) [49] in within-brain analyses.

Permutations suggest that the expression-based annotation did not outperform an annotation strategy in which an equal number of randomly chosen SNPs were annotated to genes within the same genomic window (Supplementary Table 10). There are several possible explanations for this result. The choice of SNPs within the windows surrounding the specifically expressed genes may make little difference, perhaps because of extensive LD among SNPs surrounding these genes. Alternatively, while genetic effects on gene regulation clearly impact complex traits [1, 2, 4], non-regulatory effects do as well, and their effects are ignored when only including expression-regulating SNPs. Finally, we included expression-based annotations for all genes with available FUSION weights in order to include as many SNPs for each annotation as possible to properly control for known type 1 error inflation in S-LDSC when annotations have too few SNPs [16], as well as to maximize the number of genes overlapping the specifically expressed genes of each tissue or brain region by Finucane et al. [2]. Even so, the total number of annotated SNPs using an expression-based annotation was still approximately tenfold lower on average than with a physical proximity annotation (see Supplementary Table 5). Additionally, roughly half of the genes implicated by Finucane et al. [2] did not have significant cis-genetic SNP heritability of expression (as estimated in FUSION [1]) and may have provided additional noise in our $h^2_{SNP}$ enrichment analyses. As tissue-specific expression sample sizes increase, we expect the number of genes, and the subsequent total number of annotated SNPs, with significant cis-$h^2_{SNP}$ to also increase, as well as the accuracy of their expression prediction models [1]. As such, we predict that as these resources continue to grow, expression-based annotation may result in higher specificity of identified relevant tissues and better estimates of partitioned $h^2_{SNP}$. Therefore, we suggest reexamination of the expression-based annotation in estimating tissue-specific enrichment when larger expression reference panel sample sizes become available.

We attempted to explore this possibility further by incorporating eQTLs from a large reference panel (N = 31,684) and applying a similar annotation strategy to that of the expression-based annotation. The underlying association models of FUSION and the eQTL associations are somewhat different, but annotating SNPs to genes based on their association with expression level may be an effective alternative to a physical proximity annotation (Supplementary Fig. 8). Additionally, reducing eQTLs to those that are independently associated with whole blood expression shows clear signs of enrichment for high-density lipoprotein and IBD, demonstrating significant contributions to heritability for these traits, and suggesting that as the sample sizes of expression studies increase, incorporating their results into analyses of $h^2_{SNP}$ enrichment may be fruitful.

In addition to our comparison of annotation strategies, here we also report new and updated S-LDSC tissue enrichments across a wide variety of traits (Supplementary Tables 5 and 6). For example, we report significant heritability enrichment of the frontal cortex for cognitive performance, educational attainment, and intelligence while controlling for expression in other brain regions, corroborating evidence of this structure’s importance in higher cognitive function [50]. Additionally, we found stronger tissue-specific heritability enrichment estimates across several traits compared to those originally reported by Finucane et al. [2], such as those for schizophrenia, reiterating the importance of GWAS sample size in S-LDSC tissue enrichment analyses. We also demonstrate the portability of our annotation methodology to other types of gene set analyses, such as those pertaining to pathway enrichment, an additional avenue to explore post-GWAS results (Supplementary Methods and Results, Supplementary Fig. 7).

Identification of heritable contributions to complex traits remains pertinent as we transition from macro-level estimates of heritability to a finer-scale of tissue- and cell-type relevancy. Improving annotation strategies of SNPs to genes is relevant for tissue-specific enrichment analyses as well as other gene set analyses, such as enrichment of a priori-identified pathways. Here, we have attempted to build upon current gene-SNP annotation strategies through the incorporation of estimated effects on gene expression, while simultaneously providing updated and new tissue-specific enrichment estimates across 31 complex traits. While our expression-based annotation did not improve our ability to identify trait-relevant tissues, we suggest further examination of this approach as tissue- and cell-type-specific transcriptome reference panels continue to grow.

Acknowledgements

We thank the participants and researchers of the original studies whose summary statistics and expression weights were utilized in this work. We also thank the authors of FUSION for generating the expression weights, as well as the Finucane lab for the specifically expressed gene sets. This work utilized the Summit supercomputer, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University. The Summit supercomputer is a joint effort of the University of Colorado Boulder and Colorado State University.

Author contributions

TJM and LME contributed to study conception, design, manuscript preparation, and approval of the final version of the manuscript. TJM conducted analyses and curated datasets.

Funding

This work was supported by the National Institutes of Health (grant numbers T32 DA017637 to John K. Hewitt, R01 AG046938-06 to Chandra A. Reynolds, R01 DA044283-01 to Scott I. Vrieze, and R01 MH100141-06 to Matthew C. Keller) and the University of Colorado Boulder Institute for Behavioral Genetics.

Data availability

GWAS summary statistics (Supplementary Table 2), FUSION expression weights (http://gusevlab.org/projects/fusion/), and S-LDSC specifically expressed genes (https://alkesgroup.broadinstitute.org/LDSCORE/) are available online. All data generated herein are reported in the Supplementary tables. ICD10 United Kingdom Biobank GWAS summary statistics are available upon request.

Code availability

All code is available upon request.

Competing interests

The authors declare no competing interests.

Ethical approval

This research was approved by the CU IRB under protocol 18-0091.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Travis J. Mize, Email: trmi1868@colorado.edu

Luke M. Evans, Email: luke.m.evans@colorado.edu

Supplementary information

The online version contains supplementary material available at 10.1038/s41431-022-01244-1.

Articles from European Journal of Human Genetics are provided here courtesy of Nature Publishing Group