Abstract
Identifying causal genetic variants and understanding their mechanisms of effect on traits remains a challenge in genome-wide association studies (GWASs). In particular, how genetic variants (i.e., trans-eQTLs) affect expression of remote genes (i.e., trans-eGenes) remains unknown. We hypothesized that some trans-eQTLs regulate expression of distant genes by altering the expression of nearby genes (cis-eGenes). Using published GWAS datasets with 39,165 single-nucleotide polymorphisms (SNPs) associated with 1,960 traits, we explored whole blood gene expression associations of trait-associated SNPs in 5,257 individuals from the Framingham Heart Study. We identified 2,350 trans-eQTLs (at p < 10−7); more than 80% of them were found to have cis-associated eGenes. Mediation testing suggested that for 35% of trans-eQTL-trans-eGene pairs on different chromosomes and 90% of pairs on the same chromosome, the disease-associated SNP may alter expression of the trans-eGene via cis-eGene expression. In addition, we identified 13 trans-eQTL hotspots, each affecting from ten to hundreds of genes, suggesting the existence of master genetic regulators. Using causal inference testing, we searched for causal variants across eight cardiometabolic traits (BMI, systolic and diastolic blood pressure, LDL cholesterol, HDL cholesterol, total cholesterol, triglycerides, and fasting blood glucose) and identified several cis-eGenes (ALDH2 for systolic and diastolic blood pressure, MCM6 and DARS for total cholesterol, and TRIB1 for triglycerides) that were causal mediators for the corresponding traits, as well as an example of a trans-mediator (TAGAP for LDL cholesterol). The extensive evidence of genome-wide mediation effects suggests a critical role of cryptic gene regulation underlying many disease traits.
Keywords: GWAS, trans, eQTLs, hotspots, mediation, causal variants, cardiometabolic traits
Introduction
Genome-wide association studies (GWASs) have identified tens of thousands of genetic variants associated with complex traits and diseases.1, 2 Genetic variants identified by GWASs, however, explain only a small proportion of phenotypic variation, even for diseases known to have a strong genetic component, such as obesity, diabetes, and schizophrenia.3, 4 This knowledge void has been termed the “missing heritability.”5 One important consideration in the search for missing heritability is that the top GWAS single-nucleotide polymorphisms (SNPs) are often not causal variants for their associated traits, but rather are in linkage disequilibrium (LD) with causal SNPs.3 In addition, fewer than 5% of GWAS SNPs are non-synonymous substitutions, while the remainder are located within non-coding regions.2, 6 This suggests that instead of directly altering the amino acid sequence of proteins, SNPs can affect phenotypes by other mechanisms, such as regulation of gene transcription levels.
Expression quantitative trait loci (eQTLs) are genetic variants that are associated with gene transcription levels.7 eQTLs that alter expression of nearby transcripts (cis-eGenes) are referred to as cis-eQTLs, whereas those associated with expression of remote transcripts (trans-eGenes), usually on different chromosomes, are referred to as trans-eQTLs.8, 9 When SNPs at a trans-eQTL locus affect the expression of multiple trans-eGenes, the region is defined as a trans-eQTL hotspot.10 cis-eQTLs typically reside close to transcription start sites (TSSs), suggesting that they directly impact gene expression.11 In contrast to cis-eQTLs, analysis of trans-eQTLs is vastly more computationally challenging, and reported trans-eQTLs have proven to be less replicable across studies.11, 12 Therefore, many eQTL studies focus only on cis-eQTLs or a small subset of trans-eQTLs.12, 13 trans-eQTL hotspots are of particular interest because SNPs linked to such hotspots could serve important regulatory roles. The mechanisms by which trans-eQTLs alter transcription of their linked trans-eGenes are largely unknown and likely reflect indirect or cryptic regulation.14, 15 For example, it has been proposed that expression of trans-eGenes could be mediated by transcription factors residing close to the corresponding trans-eQTLs.14 This phenomenon would allow cis-eQTLs near regulatory genes to serve as master regulators for a large number of trans-eGenes. We previously found that some eQTLs can affect expression of eGenes in both a cis and trans manner, whereby cis-eGenes mediate the associations between eQTLs and trans-eGenes.16
We investigated the associations of SNPs previously reported to be associated with a variety of traits in GWASs with whole blood gene expression measured in 5,257 Framingham Heart Study (FHS) participants. In total, we related genotypes for 39,165 genome-wide significant GWAS SNPs reported to be associated with 1,960 traits in GWAS databases with expression levels of 17,873 genes to identify cis- and trans-eQTLs and their associated eGenes.17 Our results reveal that a large number of eQTLs regulate gene expression in both a cis and trans manner. Additionally, we identified 13 trans-eQTL hotspots and found that about one third of trans regulation is significantly mediated by the expression of cis-eGenes. As proof of principle, we inferred causality and directionality of SNP-transcript-trait relationships using genetic variants as instrumental variables in causal inference analyses. Specifically, we looked at eight cardiometabolic traits that were extensively characterized in the FHS, including body mass index (BMI), systolic and diastolic blood pressure, low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, total cholesterol, and fasting blood glucose levels.
Material and Methods
Study Sample
In 1948, the FHS started recruiting participants (original cohort) from Framingham, MA, to begin the first round of physical examinations and lifestyle interviews to investigate cardiovascular disease (CVD) and its risk factors. In 1971 and 2002, FHS recruited offspring (and their spouses) and adult grandchildren of the original cohort participants into the Offspring and Third Generation cohorts, respectively.18 A total of 5,257 participants from the FHS Offspring and Third Generation cohorts had gene expression profiling and genome-wide genotyping.19 Methods for collection of whole blood samples and RNA isolation and preparation have been described previously.19 A summary of the cardiometabolic traits used in this study can be found in Table S1. All participants provided informed consent and the protocols were approved by the institutional review board.
Genotype Data
A total of 42,271 SNPs associated with 1,960 complex traits in GWASs (at p ≤ 5 × 10−8 in the GRASP database) were curated and matched with ∼8.5 million SNPs imputed from the 1000 Genomes Project Reference Panel. GRASP v.2.0 re-annotated genotype-phenotype results from 1,390 GWASs and corresponding open-access GWAS results.2 Imputation was performed with Minimac software.20 In brief, we combined genotype data with the HapMap CEU samples and inferred genotypes probabilistically based on haplotype stretches shared between study samples and HapMap release 22 build 36. For each genotype, imputation results were summarized as an “allele dosage,” defined as the expected number of copies of the minor allele at that SNP (a value between 0 and 2). SNPs with an imputation quality score (r2) < 0.3 or MAF < 0.01 were filtered out, resulting in 39,165 GWAS-significant SNPs for eQTL analysis (Figure 1).
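The allele dosage and the two-threshold variant filter described above can be sketched as follows. This is a minimal illustration, not Minimac's output format: the `ImputedSNP` record and all function names are hypothetical, and we assume a SNP is kept only when r2 ≥ 0.3 and MAF ≥ 0.01 (the exact handling of values on the boundary is not stated in the text).

```python
from dataclasses import dataclass


@dataclass
class ImputedSNP:
    """Hypothetical container for one imputed SNP (not Minimac's output format)."""
    rsid: str
    # Per-sample posterior probabilities of carrying 0, 1, or 2 minor alleles.
    probs: list
    info_r2: float  # imputation quality score (r2)


def allele_dosage(p0: float, p1: float, p2: float) -> float:
    """Expected number of minor-allele copies: 0*p0 + 1*p1 + 2*p2, in [0, 2]."""
    return p1 + 2.0 * p2


def minor_allele_frequency(dosages: list) -> float:
    """Allele frequency estimated from dosages, folded so that it is <= 0.5."""
    freq = sum(dosages) / (2.0 * len(dosages))
    return min(freq, 1.0 - freq)


def passes_qc(snp: ImputedSNP, min_r2: float = 0.3, min_maf: float = 0.01) -> bool:
    """Keep a SNP only if it clears BOTH the r2 and the MAF thresholds."""
    dosages = [allele_dosage(*p) for p in snp.probs]
    return snp.info_r2 >= min_r2 and minor_allele_frequency(dosages) >= min_maf


# Usage: two samples, imputation r2 = 0.85.
demo = ImputedSNP("rs_demo", [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1)], 0.85)
print([round(allele_dosage(*p), 3) for p in demo.probs])  # [0.1, 0.9]
print(passes_qc(demo))  # True
```

Because a SNP is dropped if it fails either threshold, the filter is written as a conjunction of the two "keep" conditions.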
Gene Expression
Whole blood was collected in PAXgene tubes (PreAnalytiX) and frozen at −80°C. RNA was extracted using a whole blood RNA System Kit (QIAGEN), and mRNA expression profiling was performed using the Affymetrix Human Exon 1.0 ST GeneChip platform (Affymetrix), which contains more than 5.5 million probes targeting the expression of 17,873 genes. The Robust Multi-array Average (RMA) package21 was used to normalize gene expression values and remove technical or spurious background variation. Linear regression models were used to adjust for technical covariates (batch, first principal component, and residual all-probeset mean) and differential blood cell proportions. The pedigreemm package22 was used to remove the effects of sex and age and to account for familial relationships. A total of 2,181 individuals from the Third Generation cohort had complete blood cell counts (white blood cells, neutrophils, lymphocytes, monocytes, eosinophils, and basophils). For the remaining samples, we imputed cell counts from gene expression data by partial least squares (PLS) prediction, using a model developed in participants with both measured cell counts and expression data. Results obtained with imputed cell counts did not differ significantly from those obtained with measured values. Therefore, we used measured cell counts when they were available and imputed values otherwise.
Identifying eQTLs and trans-eQTL Hotspots
eQTL analysis was conducted on 5,257 individuals from the FHS Offspring and Third Generation cohorts using available mRNA expression data and genome-wide genotyping. Twenty PEER factors were calculated using a Bayesian framework and were used to account for hidden confounding factors in the adjusted gene expression data.23 For each SNP-mRNA pair, a linear model was fitted to test the SNP-mRNA association, adjusting for PEER factors and familial relationships. p values were adjusted for multiple comparisons using the false discovery rate (FDR) method.24 eQTLs at FDR ≤ 0.05 were considered to be significant. cis-eQTLs were defined as SNPs that reside within 1 Mb of the TSS of the associated transcript. trans-eQTLs were defined as SNPs located more than 5 Mb from the TSS of an associated transcript on the same chromosome or on a different chromosome. eGenes were defined as genes associated with eQTLs. An independent set of eQTLs was obtained by pruning eQTLs in LD (R2 > 0.2) within 250 kb, keeping the most significant SNP per eGene. trans-eQTL hotspots were defined as an index eQTL, together with nearby SNPs in high LD (R2 > 0.8), associated with at least ten trans-eGenes. To avoid confounding by long-range LD patterns, we excluded from analysis eQTLs that resided on the same chromosome as their eGenes at a distance of more than 1 Mb but less than 5 Mb (i.e., neither cis nor trans).
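The distance rules and FDR step above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Pair` record and function names are hypothetical, p values are taken as already computed (the PEER- and family-adjusted models are not reproduced), and Benjamini-Hochberg is applied separately to cis and trans pairs. The separate adjustment is our assumption, suggested by the different nominal p cutoffs that correspond to FDR 0.05 for cis and trans eQTLs in the Results.

```python
from dataclasses import dataclass

CIS_WINDOW = 1_000_000       # SNP within 1 Mb of the transcript TSS -> cis
TRANS_MIN_DIST = 5_000_000   # > 5 Mb on the same chromosome -> trans


@dataclass
class Pair:
    """One SNP-transcript association test (hypothetical record layout)."""
    snp_chrom: str
    snp_pos: int
    tss_chrom: str
    tss_pos: int
    p: float


def classify(pair: Pair):
    """Return 'cis', 'trans', or None for the excluded 1-5 Mb same-chromosome band."""
    if pair.snp_chrom != pair.tss_chrom:
        return "trans"
    dist = abs(pair.snp_pos - pair.tss_pos)
    if dist <= CIS_WINDOW:
        return "cis"
    if dist > TRANS_MIN_DIST:
        return "trans"
    return None


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_eqtls(pairs, fdr=0.05):
    """Apply BH separately within cis and trans pairs (an assumption, see text)."""
    hits = []
    for label in ("cis", "trans"):
        group = [pr for pr in pairs if classify(pr) == label]
        qvals = bh_adjust([pr.p for pr in group])
        hits += [(pr, label, q) for pr, q in zip(group, qvals) if q <= fdr]
    return hits
```

Pairs in the 1-5 Mb band return `None` and never enter either FDR family, mirroring the exclusion rule.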
Mediation and Causal Testing
Mediation testing was conducted using the mediation package (see Web Resources) in R with the eQTL as the “exposure,” cis-eGene expression as the “mediator,” and trans-eGene expression as the “outcome.” A proportion of mediation of 100% indicated that the entire association between an eQTL and expression of a trans-eGene (total effect) is explained by effects of the eQTL on cis-eGene expression. Significant mediation effects were defined at a permutation threshold of p < 0.005 (1,000 permutations). The causal inference test (CIT) was conducted using the statistical package CIT25 in R based on the following conditions: (1) the trait (T) is associated with the locus (L); (2) L is associated with the eGene mediator (G) after adjusting for T; (3) G is associated with T after adjusting for L; and (4) L is independent of T after adjusting for G. Following the intersection-union test framework, the CIT p value is defined as the maximum of the p values of the four component tests (Figure S1).26 To determine whether cis-eGenes or trans-eGenes are causal mediators for a trait, CIT was performed for cis-eGenes and trans-eGenes separately. For a cis-eGene, we used its cis-eQTL with the smallest p value as an instrumental variable. For a trans-eGene, we used as the instrument its best cis-eQTL (smallest p value) among ∼8 million imputed SNPs residing within 1 Mb of the trans-eGene.
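The proportion of mediation can be illustrated with the difference method for linear models. This is a sketch, not the mediation package: that package estimates the average causal mediation effect by simulation and attaches confidence intervals, whereas the code below only computes point estimates from ordinary least squares without covariates. All function names are hypothetical. For OLS fits on the same samples, the drop from the total effect c to the adjusted effect c' equals the product a × b of the exposure-mediator and mediator-outcome coefficients, so (c − c')/c is the share of the total effect carried by the cis-eGene.

```python
def _mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """Slope of y ~ 1 + x by ordinary least squares."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def ols_two(x, m, y):
    """Coefficients of x and m in y ~ 1 + x + m, from the centered normal equations."""
    mx, mm, my = _mean(x), _mean(m), _mean(y)
    xc = [a - mx for a in x]
    mc = [b - mm for b in m]
    yc = [c - my for c in y]
    sxx = sum(a * a for a in xc)
    smm = sum(b * b for b in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * c for a, c in zip(xc, yc))
    smy = sum(b * c for b, c in zip(mc, yc))
    det = sxx * smm - sxm * sxm  # zero if x and m are collinear
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


def proportion_mediated(snp, cis_expr, trans_expr):
    """Share of the total SNP -> trans-eGene effect carried by the cis-eGene.

    total  c  : trans_expr ~ snp
    direct c' : trans_expr ~ snp + cis_expr (coefficient of snp)
    For OLS fits on the same samples, c - c' equals a*b (the indirect effect).
    """
    total = ols_slope(snp, trans_expr)
    direct, _ = ols_two(snp, cis_expr, trans_expr)
    return (total - direct) / total


# Usage with toy data: direct effect 1, indirect effect 2 * 3 = 6.
snp = [0, 1, 2, 0, 1, 2, 0, 1, 2]
noise = [1, -1, 0, -1, 0, 1, 0, 1, -1]
cis = [2 * s + e for s, e in zip(snp, noise)]
trans = [s + 3 * c for s, c in zip(snp, cis)]
print(round(proportion_mediated(snp, cis, trans), 4))  # 0.8571, i.e. 6/7
```

A trans-eGene whose expression depends on the SNP only through the cis-eGene gives a proportion of 1.0, the 100% case described in the text.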
Functional Annotation and Enrichment Testing
SNP annotations were conducted on HaploReg v.4.1,27 which linked the SNPs with chromatin state and protein binding annotation from the Roadmap Epigenomics and ENCODE project, sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on expression from eQTL studies. Regulatory motif enrichment was conducted using cis-eQTLs residing in trans-eQTL hotspots as test sets and all cis-eQTLs as background. The gene ontology and transcription factor target enrichment analyses were conducted by “Gene-Set Enrichment Analysis (GSEA).”28 The transcription factors (TFs) were extracted from FANTOM,29 the large international consortium that mapped all human TFs and the genes they regulate; it contains 1,672 human genes. The protein-protein interaction (PPI) network contained a systematically generated or literature-curated dataset of ∼58,000 PPIs among 10,690 human proteins.30 We defined hub proteins as those having no fewer than four interactions in the PPI network.
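The hub definition above (no fewer than four interactions) can be sketched from an undirected PPI edge list. This is an illustration only: function names are hypothetical, and we assume duplicate edges and self-interactions are not counted as extra interactions, which the text does not specify.

```python
from collections import defaultdict


def degrees(edges):
    """Number of distinct interaction partners per protein (undirected edges)."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored here (an assumption)
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}


def hub_proteins(edges, min_degree=4):
    """Proteins with no fewer than `min_degree` interactions."""
    return {protein for protein, d in degrees(edges).items() if d >= min_degree}


# Usage with a toy network: H interacts with four partners; the duplicate is counted once.
toy = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "H"), ("A", "B")]
print(hub_proteins(toy))  # {'H'}
```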
Results
eQTLs Associated with Complex Disease Traits
At a minor allele frequency > 0.01 and imputation r2 > 0.3, 39,165 genome-wide significant (p < 5 × 10−8) SNPs reported in published GWAS databases2 were genotyped or imputed in the FHS. At FDR < 0.05, we identified 23,579 cis-eQTLs (associated with expression of 2,933 cis-eGenes at a corresponding p < 1 × 10−4; Table S2) representing 5,974 independent SNPs (LD threshold < 0.2) and 2,350 trans-eQTLs (associated with expression of 606 trans-eGenes at a corresponding p < 1 × 10−7; Table S3) representing 486 independent SNPs (LD threshold < 0.2). Because many SNPs in high LD are associated with different traits in GWASs, we used non-pruned eQTLs in the subsequent analyses. In total, we determined that 23,951 out of 39,165 (61%) statistically significant GWAS SNPs are eQTLs, which is consistent with previous findings that GWAS SNPs are enriched for eQTLs (p < 0.0001 for 10,000 random sets of 39,165 SNPs at MAF > 0.01 and r2 > 0.3; average eQTL number = 9,022).13, 31
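The enrichment test above compares the observed number of eQTLs with counts in random SNP sets of the same size. A minimal sketch, assuming the background is a list of 0/1 eQTL flags for SNPs that pass the same MAF and r2 filters (the function name and the (k + 1)/(N + 1) convention are our choices, not necessarily those used in the study):

```python
import random


def empirical_enrichment(observed, background_is_eqtl, set_size, n_sets=10_000, seed=1):
    """Compare an observed eQTL count with random SNP sets of the same size.

    background_is_eqtl: one 0/1 flag per SNP in the MAF/r2-matched background.
    Returns (one-sided empirical p value, mean eQTL count across random sets).
    """
    rng = random.Random(seed)
    exceed = 0
    total = 0
    for _ in range(n_sets):
        count = sum(rng.sample(background_is_eqtl, set_size))
        total += count
        if count >= observed:
            exceed += 1
    # (k + 1) / (N + 1) keeps the estimate above zero when no random set exceeds.
    return (exceed + 1) / (n_sets + 1), total / n_sets
```

With 10,000 random sets and no set reaching the observed count, the smallest attainable value is 1/10,001, just under 1 × 10−4, which is why such a result is reported as a bound.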
Reproducibility and Mediation Effects of trans-eQTLs
In accordance with previous results,32 we found that trans effects on gene expression are much weaker than cis effects (Figure S2A, average trans-eQTL effect size on corresponding transcript R2 = 0.009 versus average cis-eQTL effect size R2 = 0.02, t test p = 1.1 × 10−16). Using the Blood eQTL Browser (meta-analysis in non-transformed peripheral blood samples from 5,311 individuals)12 as a reference database, we found that 331 out of 1,686 (20%) trans-eQTL-trans-eGene pairs from the database were statistically significant (at p < 1 × 10−7) in our results. Among them, 323 pairs (98%) have concordant directions of effects (Table S4). The overlapping pairs increased to 562 (33%) when we used p < 1 × 10−4 as our trans-eQTL threshold. On the other hand, the replication rate was much higher for cis-eQTLs; 17,118 out of 38,608 (44%) cis-eQTL-cis-eGene pairs in the Blood eQTL Browser were statistically significant (at p < 1 × 10−4) in our results. Among them, 14,208 pairs (83%) had the same direction of effect. We hypothesized that the genetic effects of trans-eQTLs on expression of trans-eGenes are mediated in some cases by the expression of cis-eGenes (Figure 2). To test this hypothesis, we conducted mediation analyses for all 8,566 trans-eQTL-trans-eGene pairs to identify the proportion of the association between a trans-eQTL and trans-eGene that was attributable to the effect of the eQTL on cis-eGene expression. For trans-eQTLs and trans-eGenes on different chromosomes, we found that 1,953 out of 2,324 trans-eQTLs (84%) affect cis-eGene expression and that 2,612 trans-eQTL-trans-eGene pairs (35%) are significantly mediated by expression of cis-eGenes near the trans-eQTL. The proportion of mediation ranged from 1.4% to 100% (mean 15%). 
For trans-eQTLs and trans-eGenes on the same chromosome (by definition, separated by at least 5 Mb), we found that 913 out of 931 trans-eQTLs (98%) affect cis-eGene expression and that 1,011 trans-eQTL-trans-eGene pairs (90%) are significantly mediated by expression of cis-eGenes near the trans-eQTL, suggesting that trans-eGenes on the same chromosome are highly regulated through cis-eGenes (Table S5).
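The replication comparison with the Blood eQTL Browser described in this section reduces to two ratios: the share of reference pairs that are significant in our data, and the share of those with the same direction of effect. A minimal sketch with hypothetical names, assuming both sets of effect sizes are already aligned to the same effect allele:

```python
def _sign(x):
    return (x > 0) - (x < 0)


def replication_summary(reference, ours, p_threshold):
    """reference, ours: dicts mapping (snp, gene) -> (beta, p value).

    A reference pair replicates if it is present in `ours` with p below the threshold;
    concordance is the fraction of replicated pairs with the same effect sign.
    """
    replicated = [k for k in reference if k in ours and ours[k][1] < p_threshold]
    concordant = [k for k in replicated if _sign(reference[k][0]) == _sign(ours[k][0])]
    return {
        "replicated": len(replicated),
        "replication_rate": len(replicated) / len(reference) if reference else 0.0,
        "concordant": len(concordant),
        "concordance_rate": len(concordant) / len(replicated) if replicated else 0.0,
    }
```

Plugging in the reported trans counts gives 331/1,686 ≈ 19.6% replicated and 323/331 ≈ 97.6% concordant, which the text rounds to 20% and 98%.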
trans-eQTL Hotspots
Among the 2,324 trans-eQTLs, we identified 13 trans-eQTL hotspots across eight chromosomes, with the index SNP associated with at least ten transcripts (Table 1 and Figure 3). Notably, 8 out of 13 trans-eQTL hotspots were also identified in the Blood eQTL Browser,12 indicating that hotspots are more replicable than individual trans-eQTLs. For these trans-eQTL hotspots, we found that cis-eQTLs linked to trans-eQTL hotspots have smaller effects than cis-eQTLs not located within trans-eQTL hotspots (mean R2 0.009 versus 0.02, t test p < 1 × 10−8) and have effect sizes similar to those of trans-eQTLs (mean R2 0.009 versus 0.01, Table S6, Figure S2B). We found that trans-eGenes associated with trans-eQTL hotspots have a directional bias, with 65% of trans-eGenes showing the same direction of effect in relation to the trans-eQTLs,33 rather than the equal ratio of overexpression to underexpression that would be expected if the trans-eQTLs randomly affected the direction of expression of their corresponding trans-eGenes. For example, for age at menarche and HDL cholesterol, the associated trans-eGenes show 100% directionally consistent expression in relation to the index trans-eQTL (Table 1). One explanation for this observation is that the trans-eQTL alters the activity or abundance of a transcription factor (TF, or other trans-acting factor), leading to concordant expression changes of all the target genes of this factor.
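The hotspot rule (an index SNP with at least ten trans-eGenes) and the directional bias reported in Table 1 can be sketched as follows, together with an exact sign test against the 50/50 expectation described above. This is a simplified illustration with hypothetical names: it assumes each index SNP already stands for its high-LD (R2 > 0.8) block, so the LD grouping itself is not implemented, and the sign test is our addition to make the "equal ratio" null concrete.

```python
from collections import defaultdict
from math import comb


def hotspot_bias(trans_hits, min_genes=10):
    """trans_hits: iterable of (index_snp, trans_egene, beta).

    Returns {index_snp: (n_trans_egenes, signed_bias_percent)} for index SNPs with at
    least `min_genes` trans-eGenes. The bias is the share of eGenes in the majority
    direction, negative when most associations are negative (as in Table 1).
    """
    by_snp = defaultdict(dict)
    for snp, gene, beta in trans_hits:
        by_snp[snp][gene] = beta  # one effect per SNP-gene pair
    result = {}
    for snp, genes in by_snp.items():
        n = len(genes)
        if n < min_genes:
            continue
        n_pos = sum(1 for b in genes.values() if b > 0)
        n_neg = sum(1 for b in genes.values() if b < 0)
        bias = 100.0 * n_pos / n if n_pos >= n_neg else -100.0 * n_neg / n
        result[snp] = (n, bias)
    return result


def sign_test_p(k, n):
    """Two-sided exact binomial test of k same-direction eGenes out of n vs p = 0.5."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For instance, 24 of 24 trans-eGenes in the same direction (the age at menarche hotspot) gives `sign_test_p(24, 24)` ≈ 1.2 × 10−7 under this null.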
Table 1.
Hotspot Location (hg19) | Number of trans-eQTLs | Number of trans-eGenes Associated with Index eQTL | Directional Bias of trans-eGenes Associated with Index eQTL | Traits Associated in GWAS with Index eQTLs | trans-eGene Enrichment in TF Motifs (a) |
---|---|---|---|---|---|
1: 205,187,981–205,244,972 | 10 | 10 | +64% | platelet count | NA |
1: 248,039,451 | 1 | 12 | −58% | red blood cell count | STAT1/STAT2 |
2: 60,708,597–60,725,451 | 14 | 14 | −79% | fetal hemoglobin level | NFAT/SP1 |
3: 50,093,209 | 1 | 24 | +100% | age at menarche | NA |
3: 56,849,749–56,865,776 | 2 | 126; 84 | −94%; +92% | platelet count; mean platelet volume | TCF3/ETS2 |
6: 135,411,228–135,435,501 | 13 | 22 | −55% | fetal hemoglobin | NA |
6: 139,840,693–139,844,429 | 13 | 48 | −70% | erythrocyte count | SP1/TCF3 |
7: 50,423,963–50,562,361 | 19 | 76 | −59% | childhood acute lymphoblastic leukemia | PAX4 |
12: 54,712,308–54,736,470 | 2 | 14 | +79% | mean platelet volume | ETS2 |
12: 111,884,608–112,610,714 | 9 | 13 | −62% | LDL cholesterol; blood pressure; asthma | NA |
16: 57,061,189–57,061,189 | 2 | 10 | −100% | HDL cholesterol | IRF8/IRF2 |
17: 27,072,463–27,322,441 | 45 | 32 | +55% | mean corpuscular volume | E4F1 |
17: 33,796,260–33,944,055 | 4 | 51 | +75% | mean platelet volume | ETS2/MAZ |
Plus sign (+) denotes the positive association; minus sign (−) denotes the negative association.
(a) Transcription factors whose motifs were matched with promoter regions [−2 kb, 2 kb] around the transcription start site of the trans-eGenes; NA, no TF target enrichment.
For all 13 trans-eQTL hotspots, we found that 37% of trans-eQTL-trans-eGene associations were mediated by the expression of cis-eGenes (at p < 0.005 based on 1,000 bootstrap permutations, Figure 1). The strongest mediation effect was found at the NLRC5 locus on chromosome 16, which is associated with expression of ten eGenes at the HLA locus on chromosome 6. We found that 88% of the genetic effect of rs291040 on TAP1 is mediated by the cis expression of NLRC5 (Table 2). Prior studies have shown that NLRC5 acts as a master regulator of MHC class I genes in the immune response34 and interacts with the RFX transcription factor complex to induce MHC class I gene expression.35 Using HaploReg v.4.1, we found that SNPs in trans-eQTL hotspots are significantly enriched for regulatory motifs (hypergeometric p = 3.7 × 10−5), with 121 out of 138 (88%) SNPs located in TF binding regions from ENCODE experiments (Table S7), suggesting that these SNPs may regulate expression of the trans-eGenes. Among 37 cis-eGenes linked to trans-eQTL hotspots, we found that two are TFs (NFE2 [MIM: 601490] and IKZF1 [MIM: 603023]). Although cis-eGenes are not enriched for TFs, we found that trans-eGenes are significantly enriched for TF targets (at FDR < 0.05, Table 1) in 9 of 13 trans-eQTL hotspots. In addition, we found that 13 of 37 cis-eGenes shared the same regulatory motifs with the trans-eGene(s) of the same trans-eQTL hotspot, suggesting that both cis-eGene and trans-eGene are under the same regulatory control. The enriched functions of trans-eGenes are also highly consistent with the traits affected by the trans-eQTLs. For example, trans-eQTLs on chromosome 2 are associated with platelet count in GWASs. We found that the trans-eGenes in this hotspot are enriched for platelet degranulation (Table S8).
Moreover, analyzing 25 cis-eGenes having binary interactions identified in a systematic screen for protein-protein interactions (PPI),30 we found that 15 of them are hub genes in the PPI network (Figure S3, p < 0.001 for randomly selecting 25 proteins in a PPI network), suggesting that cis-eGenes linked to trans-eQTLs play a central regulatory role in critical biological pathways through their mediation effects on trans-eGenes.
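The motif-enrichment result reported in this section is a hypergeometric test. A minimal exact sketch using integer binomial coefficients (the function name and the mapping of study quantities onto arguments are our assumptions; the background counts are not given in the text, so the reported p value is not reproduced here):

```python
from math import comb


def hypergeom_upper_tail(k, n_draws, n_success_pop, n_pop):
    """P(X >= k) for X ~ Hypergeometric(n_pop, n_success_pop, n_draws).

    Example mapping (an assumption): n_pop = all cis-eQTLs (background), n_success_pop =
    background cis-eQTLs in regulatory motifs, n_draws = cis-eQTLs in trans-eQTL hotspots,
    k = hotspot cis-eQTLs in regulatory motifs.
    """
    hi = min(n_draws, n_success_pop)
    numer = sum(comb(n_success_pop, i) * comb(n_pop - n_success_pop, n_draws - i)
                for i in range(k, hi + 1))
    return numer / comb(n_pop, n_draws)
```

Python integers are exact, so the ratio stays accurate even for background sets with tens of thousands of SNPs.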
Table 2.
SNP | Hotspot Number | cis-eGene | trans-eGene | SNP-trans-eGene Association (β) | SNP-trans-eGene Association Adjusted for cis-eGene (β) | Proportion of Mediation (a) | p Value for Mediation |
---|---|---|---|---|---|---|---|
rs3811444 | 2 | TRIM58 | ZER1 | −0.015 | −0.011 | 27% | 0.001 |
rs6762477 | 3 | UBA7 | RCAN3 (MIM: 605860) | 0.021 | 0.014 | 37% | 0.002 |
rs12718597 | 8 | IKZF1 (MIM: 603023) | TMEM9B (MIM: 616877) | −0.024 | −0.021 | 12% | 0.002 |
rs11065987 | 10 | ALDH2 | ARHGEF40 | 0.017 | 0.013 | 23% | 0.002 |
rs291040 | 11 | NLRC5 (MIM: 613537) | TAP1 (MIM: 170260) | −0.023 | −0.003 | 88% | 0.001 |
rs10512472 | 13 | AP2B1 (MIM: 601025) | TRAK2 (MIM: 607334) | 0.023 | 0.014 | 40% | 0.001 |
(a) Proportion of mediation of the trans-eQTL-trans-eGene association by the cis-eGene.
Causal Effects between eQTLs and Phenotypes
To test whether expression levels of eGenes (cis or trans) associated with eQTLs might explain the observed associations between eQTLs and phenotypes, we conducted causal inference testing (CIT) using the statistical package CIT in R.25, 36 We applied this approach to eight common cardiometabolic traits (BMI, blood lipid levels [HDL-cholesterol, LDL-cholesterol, triglycerides, and total cholesterol], fasting blood glucose, and systolic and diastolic blood pressure [SBP and DBP]) that were available along with genotype and gene expression data for 5,257 FHS participants. Among cis-eQTLs, we identified the SH2B3 (MIM: 605093)/ALDH2 (MIM: 100650) locus as having a causal effect on DBP (p = 0.005) and SBP (p = 0.02) through ALDH2 expression (Table 3). SNPs in this locus are associated with coronary artery disease (CAD)/myocardial infarction (MI), blood pressure, LDL-cholesterol, and type 1 diabetes (Figure 4). In addition to the association of this cis-locus with risk of CAD/MI,37 recent studies describe the trans-regulation of MYADM (MIM: 609959) and TAGAP (MIM: 609667) expression by the same trans-eQTL.38, 39 We also found that, on average, 11% of the trans-regulation of trans-eGenes in this module is mediated through expression of ALDH2, suggesting a new target and regulatory mechanism related to this CAD/MI module. Two additional causal loci are RAB3GAP1 (MIM: 602536) for total cholesterol (eGenes MCM6 [MIM: 601806] and DARS [MIM: 603084]) and LOC105375745 for triglycerides (eGene TRIB1 [MIM: 609461]). Among trans-eGenes, we identified the TAGAP locus as having a causal effect on LDL and total cholesterol through expression of TAGAP.
The TAGAP locus has been found to be significantly associated with lipoprotein (a) levels40 and TAGAP was reported to be differentially expressed in CAD patients41 and after atorvastatin treatment.42 Another possible mechanism to explain the association of trans-eQTLs with the expression of their trans-eGenes is reverse causality, whereby an eQTL alters expression of a trans-eGene through its effect on phenotype. In this case, the phenotype serves as a mediator (feedback effect). CIT, however, did not identify any examples of reverse causal effects (at p < 0.05).
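The intersection-union rule that turns the four CIT component tests into one p value can be sketched in a few lines. This is an illustration, not the CIT package: the component p values are taken as given, the function names are hypothetical, and we assume, as in the CIT framework, that the fourth test is set up so that a small p value supports conditional independence of L and T given G.

```python
def cit_pvalue(component_pvalues):
    """Intersection-union combination: a causal call needs all four component tests
    to reject, so the omnibus CIT p value is the largest component p value."""
    if len(component_pvalues) != 4:
        raise ValueError("CIT combines exactly four component tests")
    if any(not 0.0 <= p <= 1.0 for p in component_pvalues):
        raise ValueError("p values must lie in [0, 1]")
    return max(component_pvalues)


def is_causal_mediator(component_pvalues, alpha=0.05):
    return cit_pvalue(component_pvalues) < alpha


# Components (1)-(3) from the SBP/ALDH2 row of Table 3; component (4) is a made-up value.
sbp = [0.02, 1.1e-23, 8.4e-6, 0.001]
print(cit_pvalue(sbp))  # 0.02
```

Consistent with this rule, the reported CIT p value in each row of Table 3 is at least as large as every component p value shown in that row.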
Table 3.
Trait | eQTL ID | eQTL Location (hg19) | Causal eGene (a) | eQTL-Trait β | eQTL-Trait p Value | eQTL-eGene β Given Trait | eQTL-eGene p Value Given Trait | eGene-Trait β Given eQTL | eGene-Trait p Value Given eQTL | p Value (CIT) |
---|---|---|---|---|---|---|---|---|---|---|
SBP | rs11065898 | 12: 111,634,620 | ALDH2 (c) | −0.48 | 0.02 | 0.44 | 1.1 × 10−23 | 4.44 | 8.4 × 10−6 | 0.02 |
DBP | rs11065898 | 12: 111,634,620 | ALDH2 (c) | 0.6 | 0.005 | −0.04 | 5.7 × 10−13 | 2.5 | 0.0001 | 0.005 |
LDL-cholesterol | rs926657 | 6: 159,463,452 | TAGAP (t) | 2.5 | 0.04 | 0.09 | 8.8 × 10−23 | 6.6 | 0.003 | 0.04 |
Total cholesterol | rs7570971 | 2: 135,837,906 | MCM6 (c) | 3.0 | 8 × 10−5 | 0.09 | 1.3 × 10−109 | 7.8 | 0.003 | 0.01 |
Total cholesterol | rs7570971 | 2: 135,837,906 | DARS (c) | 3.0 | 8 × 10−5 | 0.03 | 2.3 × 10−23 | 11.4 | 0.0002 | 0.01 |
Total cholesterol | rs926657 | 6: 159,463,452 | TAGAP (t) | 3.1 | 0.04 | 0.09 | 2.1 × 10−22 | 8.7 | 0.001 | 0.04 |
Triglycerides | rs4604455 | 8: 125,505,785 | TRIB1 (c) | 0.02 | 0.0008 | 0.02 | 0.0005 | −0.03 | 0.02 | 0.03 |
Abbreviations are as follows: SBP, systolic blood pressure; DBP, diastolic blood pressure.
(a) cis-eGenes denoted (c); trans-eGenes denoted (t).
Discussion
Identifying disease-causal genes and variants within GWAS results is an enormous challenge that simple association analysis cannot address.43 Unlike GWASs, where the association between a genetic variant and trait is unidirectional, in transcriptome-wide association studies (TWASs) the direction of association between transcript and phenotype is not clear, and causal inference must be drawn with caution.44 eQTL studies hold the promise of revealing biological mechanisms of SNP-phenotype associations; integrating GWASs with TWASs may help prioritize genes and variants for functional studies.44 In this study, we used a causal inference approach to infer causal relations and their directionality by integrating SNPs from GWASs with gene expression and phenotype data. This approach is predicated on the assumption that if a gene is causally related to a phenotype, a nearby genetic variant (i.e., a cis-eQTL) that explains a large proportion of its expression should be associated with the same phenotype.
We discovered that many trans-eQTL-trans-eGene associations are mediated by cis-eGene expression, reflecting a complex regulatory mechanism. An intuitive explanation for hidden regulation of trans-eGenes is that the mediating cis-eGenes are TFs that directly influence gene transcription. Although we found no enrichment for TFs among cis-eGenes, we found that more than one-third of cis-eGenes (13 of 37) shared a common motif with trans-eGenes from the same trans-eQTL hotspot, indicating that cis-eGenes may be indirectly related to TFs. For example, we found that trans-eGenes in the chromosome 7 hotspot were enriched for targets of the TF PAX4 (MIM: 167413). The cis-eGene at this hotspot is the TF IKZF1, and although PAX4 and IKZF1 are different TFs, they share common motifs.
Using different cell types and populations, Pierce et al.15 also reported a similar proportion (∼20%) of trans-eQTLs that act through cis-mediation, indicating that cis-eGene mediation of trans-eGene expression may be a common feature genome wide. To extend this concept, we explored how this phenomenon affects disease pathways. For example, we observed that rs174538, which was reported to be associated with plasma phospholipids in GWASs,45 is a trans-eQTL of LDLR (MIM: 606945) expression (p = 3.69 × 10−8). This association, however, was not significant after adjusting for expression of FADS2 (MIM: 606149), a cis-eGene of rs174538, and the proportion of the rs174538-LDLR association mediated by FADS2 was 100%. FADS2 is a key gene influencing n-3 polyunsaturated fatty acid (PUFA) levels, and PUFAs have been found to upregulate LDL receptor protein expression in fibroblasts and HepG2 cells,46 suggesting a likely pathway from PUFA to lipid metabolism. A recent study reported that Fads1 knockout mice had 40% less atheromatous plaque than wild-type littermates.47 Therefore, the FADS genes could be putative therapeutic targets for cardiovascular disease prevention and treatment.
We found that 10 out of 13 trans-eQTL hotspots are blood trait related and five of them replicated in the Blood eQTL Browser.12 Among the 227 trans-eGenes associated with platelet SNPs, 26 were reported as platelet eQTL-genes,48 suggesting that trans-eQTLs are highly tissue specific and that SNPs might remotely affect tissue-specific eGenes. For example, some of the loci identified in GWASs for platelet traits (e.g., ARHGEF3) affect the expression of hundreds of genes and may be key drivers of hematopoiesis and affect multiple blood cell lineages.49
This study has several limitations. First, the Blood eQTL Browser12 is the only database that includes extensive trans-eQTL results from a comparably large sample. Therefore, our results cannot readily be validated in other tissues, as most other large eQTL databases provide only cis-eQTLs. Second, although ours is one of the largest studies to detect trans-eQTLs, we are still underpowered for causal inference testing, which requires a significant SNP-phenotype association as its first condition. Therefore, many genes were excluded from causality testing because they did not fulfill the first condition.
In summary, we provide evidence of a cis-mediated mechanism that explains distal regulation of trans-eGenes by their trans-eQTLs. Importantly, the causal loci, especially the trans-eQTLs identified from our integrative genomic approach, could not be detected from traditional GWASs by searching SNPs around the GWAS signal. Our next steps are to explore eQTL data from more disease-related tissues and to incorporate whole-genome sequence data to identify more causal eQTLs. We speculate that it may be worthwhile to apply this approach across eQTL databases and across multiple phenotypes as a means of identifying plausible targets for therapeutic intervention.
Acknowledgments
We thank all of the study participants who helped to create this valuable resource and supported this work. We thank the data management group of FHS for organizing and providing these data. We thank the NIH Fellows Editorial Board members for their valuable edits and comments. This study used the high-performance computational capabilities of the Biowulf Linux cluster at the NIH. The FHS is funded by NIH contract N01-HC-25195. The laboratory work for this investigation was funded by the Division of Intramural Research, National Heart, Lung, and Blood Institute, NIH. The analytical component of this project was funded by American Heart Association (AHA) Cardiovascular Genome-Phenome Study (CVGPS) grant 15CVGPS23430000. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the NIH; or the U.S. Department of Health and Human Services.
Published: March 9, 2017
Footnotes
Supplemental Data include three figures and eight tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2017.02.003.
Web Resources
Bioconductor, http://www.bioconductor.org
Blood eQTL Browser, http://genenetwork.nl/bloodeqtlbrowser/
GRASP, downloaded June 2016, http://grasp.nhlbi.nih.gov/Overview.aspx
HaploReg v.4.1, http://www.broadinstitute.org/mammals/haploreg/haploreg.php
mediation: Causal Mediation Analysis, https://cran.r-project.org/web/packages/mediation/index.html
OMIM, http://www.omim.org/
R statistical software, http://www.r-project.org/
References
1. Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff L., Parkinson H. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229.
2. Eicher J.D., Landowski C., Stackhouse B., Sloan A., Chen W., Jensen N., Lien J.P., Leslie R., Johnson A.D. GRASP v2.0: an update on the Genome-Wide Repository of Associations between SNPs and phenotypes. Nucleic Acids Res. 2015;43:D799–D804. doi: 10.1093/nar/gku1202.
3. Visscher P.M., Brown M.A., McCarthy M.I., Yang J. Five years of GWAS discovery. Am. J. Hum. Genet. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029.
4. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427. doi: 10.1038/nature13595.
5. Manolio T.A., Collins F.S., Cox N.J., Goldstein D.B., Hindorff L.A., Hunter D.J., McCarthy M.I., Ramos E.M., Cardon L.R., Chakravarti A. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494.
6. Schaub M.A., Boyle A.P., Kundaje A., Batzoglou S., Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22:1748–1759. doi: 10.1101/gr.136127.111.
7. Rockman M.V., Kruglyak L. Genetics of global gene expression. Nat. Rev. Genet. 2006;7:862–872. doi: 10.1038/nrg1964.
8. Michaelson J.J., Loguercio S., Beyer A. Detection and interpretation of expression quantitative trait loci (eQTL). Methods. 2009;48:265–276. doi: 10.1016/j.ymeth.2009.03.004.
9. Atanasovska B., Kumar V., Fu J., Wijmenga C., Hofker M.H. GWAS as a driver of gene discovery in cardiometabolic diseases. Trends Endocrinol. Metab. 2015;26:722–732. doi: 10.1016/j.tem.2015.10.004.
10. Breitling R., Li Y., Tesson B.M., Fu J., Wu C., Wiltshire T., Gerrits A., Bystrykh L.V., de Haan G., Su A.I., Jansen R.C. Genetical genomics: spotlight on QTL hotspots. PLoS Genet. 2008;4:e1000232. doi: 10.1371/journal.pgen.1000232.
11. Stranger B.E., Montgomery S.B., Dimas A.S., Parts L., Stegle O., Ingle C.E., Sekowska M., Smith G.D., Evans D., Gutierrez-Arcelus M. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 2012;8:e1002639. doi: 10.1371/journal.pgen.1002639.
12. Westra H.J., Peters M.J., Esko T., Yaghootkar H., Schurmann C., Kettunen J., Christiansen M.W., Fairfax B.P., Schramm K., Powell J.E. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 2013;45:1238–1243. doi: 10.1038/ng.2756.
13. Zhang X., Gierman H.J., Levy D., Plump A., Dobrin R., Goring H.H., Curran J.E., Johnson M.P., Blangero J., Kim S.K. Synthesis of 53 tissue and cell line expression QTL datasets reveals master eQTLs. BMC Genomics. 2014;15:532. doi: 10.1186/1471-2164-15-532.
14. Bryois J., Buil A., Evans D.M., Kemp J.P., Montgomery S.B., Conrad D.F., Ho K.M., Ring S., Hurles M., Deloukas P. Cis and trans effects of human genomic variants on gene expression. PLoS Genet. 2014;10:e1004461. doi: 10.1371/journal.pgen.1004461.
15. Pierce B.L., Tong L., Chen L.S., Rahaman R., Argos M., Jasmine F., Roy S., Paul-Brutus R., Westra H.J., Franke L. Mediation analysis demonstrates that trans-eQTLs are often explained by cis-mediation: a genome-wide analysis among 1,800 South Asians. PLoS Genet. 2014;10:e1004818. doi: 10.1371/journal.pgen.1004818.
16. Yao C., Chen B.H., Joehanes R., Otlu B., Zhang X., Liu C., Huan T., Tastan O., Cupples L.A., Meigs J.B. Integromic analysis of genetic variation and gene expression identifies networks for cardiovascular disease phenotypes. Circulation. 2015;131:536–549. doi: 10.1161/CIRCULATIONAHA.114.010696.
17. Joehanes R., Zhang X., Huan T., Yao C., Ying S.X., Nguyen Q.T., Demirkale C.Y., Feolo M.L., Sharopova N.R., Sturcke A. Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol. 2017;18:16. doi: 10.1186/s13059-016-1142-6.
18. Mahmood S.S., Levy D., Vasan R.S., Wang T.J. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. Lancet. 2014;383:999–1008. doi: 10.1016/S0140-6736(13)61752-3.
19. Joehanes R., Johnson A.D., Barb J.J., Raghavachari N., Liu P., Woodhouse K.A., O’Donnell C.J., Munson P.J., Levy D. Gene expression analysis of whole blood, peripheral blood mononuclear cells, and lymphoblastoid cell lines from the Framingham Heart Study. Physiol. Genomics. 2012;44:59–75. doi: 10.1152/physiolgenomics.00130.2011.
20. Howie B., Fuchsberger C., Stephens M., Marchini J., Abecasis G.R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 2012;44:955–959. doi: 10.1038/ng.2354.
21. Irizarry R.A., Hobbs B., Collin F., Beazer-Barclay Y.D., Antonellis K.J., Scherf U., Speed T.P. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4:249–264. doi: 10.1093/biostatistics/4.2.249.
22. Bates D., Mächler M., Bolker B.M., Walker S.C. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 2015;67:1–48.
23. Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 2012;7:500–507. doi: 10.1038/nprot.2011.457.
24. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B. 1995;57:289–300.
25. Millstein J., Zhang B., Zhu J., Schadt E.E. Disentangling molecular relationships with a causal inference test. BMC Genet. 2009;10:23. doi: 10.1186/1471-2156-10-23.
26. Orozco L.D., Morselli M., Rubbi L., Guo W., Go J., Shi H., Lopez D., Furlotte N.A., Bennett B.J., Farber C.R. Epigenome-wide association of liver methylation patterns and complex metabolic traits in mice. Cell Metab. 2015;21:905–917. doi: 10.1016/j.cmet.2015.04.025.
27. Ward L.D., Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012;40:D930–D934. doi: 10.1093/nar/gkr917.
28. Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J.P. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
29. Lizio M., Harshbarger J., Abugessaisa I., Noguchi S., Kondo A., Severin J., Mungall C., Arenillas D., Mathelier A., Medvedeva Y.A. Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals. Nucleic Acids Res. 2017;45(D1):D737–D743. doi: 10.1093/nar/gkw995.
30. Rolland T., Taşan M., Charloteaux B., Pevzner S.J., Zhong Q., Sahni N., Yi S., Lemmens I., Fontanillo C., Mosca R. A proteome-scale map of the human interactome network. Cell. 2014;159:1212–1226. doi: 10.1016/j.cell.2014.10.050.
31. Nicolae D.L., Gamazon E., Zhang W., Duan S., Dolan M.E., Cox N.J. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6:e1000888. doi: 10.1371/journal.pgen.1000888.
32. McKenzie M., Henders A.K., Caracella A., Wray N.R., Powell J.E. Overlap of expression quantitative trait loci (eQTL) in human brain and blood. BMC Med. Genomics. 2014;7:31. doi: 10.1186/1755-8794-7-31.
33. Thompson D., Regev A., Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu. Rev. Cell Dev. Biol. 2015;31:399–428. doi: 10.1146/annurev-cellbio-100913-012908.
34. Kobayashi K.S., van den Elsen P.J. NLRC5: a key regulator of MHC class I-dependent immune responses. Nat. Rev. Immunol. 2012;12:813–820. doi: 10.1038/nri3339.
35. Meissner T.B., Liu Y.J., Lee K.H., Li A., Biswas A., van Eggermond M.C., van den Elsen P.J., Kobayashi K.S. NLRC5 cooperates with the RFX transcription factor complex to induce MHC class I gene expression. J. Immunol. 2012;188:4951–4958. doi: 10.4049/jimmunol.1103160.
36. Millstein J., Chen G.K., Breton C.V. cit: hypothesis testing software for mediation analysis in genomic applications. Bioinformatics. 2016;32:2364–2365. doi: 10.1093/bioinformatics/btw135.
37. Deloukas P., Kanoni S., Willenborg C., Farrall M., Assimes T.L., Thompson J.R., Ingelsson E., Saleheen D., Erdmann J., Goldstein B.A., CARDIoGRAMplusC4D Consortium, DIAGRAM Consortium, CARDIOGENICS Consortium, MuTHER Consortium, Wellcome Trust Case Control Consortium. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat. Genet. 2013;45:25–33. doi: 10.1038/ng.2480.
38. Hunt K.A., Zhernakova A., Turner G., Heap G.A., Franke L., Bruinenberg M., Romanos J., Dinesen L.C., Ryan A.W., Panesar D. Newly identified genetic risk variants for celiac disease related to the immune response. Nat. Genet. 2008;40:395–402. doi: 10.1038/ng.102.
39. Huan T., Esko T., Peters M.J., Pilling L.C., Schramm K., Schurmann C., Chen B.H., Liu C., Joehanes R., Johnson A.D., International Consortium for Blood Pressure GWAS (ICBP). A meta-analysis of gene expression signatures of blood pressure and hypertension. PLoS Genet. 2015;11:e1005035. doi: 10.1371/journal.pgen.1005035.
40. Lu W., Cheng Y.C., Chen K., Wang H., Gerhard G.S., Still C.D., Chu X., Yang R., Parihar A., O’Connell J.R. Evidence for several independent genetic variants affecting lipoprotein (a) cholesterol levels. Hum. Mol. Genet. 2015;24:2390–2400. doi: 10.1093/hmg/ddu731.
41. Arvind P., Jayashree S., Jambunathan S., Nair J., Kakkar V.V. Understanding gene expression in coronary artery disease through global profiling, network analysis and independent validation of key candidate genes. J. Genet. 2015;94:601–610. doi: 10.1007/s12041-015-0548-3.
42. Won H.H., Kim S.R., Bang O.Y., Lee S.C., Huh W., Ko J.W., Kim H.G., McLeod H.L., O’Connell T.M., Kim J.W., Lee S.Y. Differentially expressed genes in human peripheral blood as potential markers for statin response. J. Mol. Med. 2012;90:201–211. doi: 10.1007/s00109-011-0818-3.
43. Wang K., Dickson S.P., Stolle C.A., Krantz I.D., Goldstein D.B., Hakonarson H. Interpretation of association signals and identification of causal variants from genome-wide association studies. Am. J. Hum. Genet. 2010;86:730–742. doi: 10.1016/j.ajhg.2010.04.003.
44. Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W., Jansen R., de Geus E.J., Boomsma D.I., Wright F.A. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506.
45. Lemaitre R.N., Tanaka T., Tang W., Manichaikul A., Foy M., Kabagambe E.K., Nettleton J.A., King I.B., Weng L.C., Bhattacharya S. Genetic loci associated with plasma phospholipid n-3 fatty acids: a meta-analysis of genome-wide association studies from the CHARGE Consortium. PLoS Genet. 2011;7:e1002193. doi: 10.1371/journal.pgen.1002193.
46. Yu-Poth S., Yin D., Kris-Etherton P.M., Zhao G., Etherton T.D. Long-chain polyunsaturated fatty acids upregulate LDL receptor protein expression in fibroblasts and HepG2 cells. J. Nutr. 2005;135:2541–2545. doi: 10.1093/jn/135.11.2541.
47. Powell D.R., Gay J.P., Smith M., Wilganowski N., Harris A., Holland A., Reyes M., Kirkham L., Kirkpatrick L.L., Zambrowicz B. Fatty acid desaturase 1 knockout mice are lean with improved glycemic control and decreased development of atheromatous plaque. Diabetes Metab. Syndr. Obes. 2016;9:185–199. doi: 10.2147/DMSO.S106653.
48. Simon L.M., Chen E.S., Edelstein L.C., Kong X., Bhatlekar S., Rigoutsos I., Bray P.F., Shaw C.A. Integrative multi-omic analysis of human platelet eQTLs reveals alternative start site in mitofusin 2. Am. J. Hum. Genet. 2016;98:883–897. doi: 10.1016/j.ajhg.2016.03.007.
49. Gieger C., Radhakrishnan A., Cvejic A., Tang W., Porcu E., Pistis G., Serbanovic-Canic J., Elling U., Goodall A.H., Labrune Y. New gene functions in megakaryopoiesis and platelet formation. Nature. 2011;480:201–208. doi: 10.1038/nature10659.