Large-Scale trans-eQTLs Affect Hundreds of Transcripts and Mediate Patterns of Transcriptional Co-regulation

Boel Brynedal; JinMyung Choi; Towfique Raj; Robert Bjornson; Barbara E Stranger; Benjamin M Neale; Benjamin F Voight; Chris Cotsapas

doi:10.1016/j.ajhg.2017.02.004

2017 Mar 9;100(4):581–591. doi: 10.1016/j.ajhg.2017.02.004

PMCID: PMC5384037 PMID: 28285767

Abstract

Efforts to decipher the causal relationships between differences in gene regulation and corresponding differences in phenotype have been stymied by several basic technical challenges. Although detecting local, cis-eQTLs is now routine, trans-eQTLs, which are distant from the genes of origin, are far more difficult to find because millions of SNPs must currently be compared to thousands of transcripts. Here, we demonstrate an alternative approach: we looked for SNPs associated with the expression of many genes simultaneously and found that hundreds of trans-eQTLs each affect hundreds of transcripts in lymphoblastoid cell lines across three African populations. These trans-eQTLs target the same genes across the three populations and show the same direction of effect. We discovered that target transcripts of a high-confidence set of trans-eQTLs encode proteins that interact more frequently than expected by chance, are bound by the same transcription factors, and are enriched for pathway annotations indicative of roles in basic cell homeostasis. We thus demonstrate that our approach can uncover trans-acting transcriptional control circuits that affect co-regulated groups of genes: a key to understanding how cellular pathways and processes are orchestrated.

Keywords: trans-eQTL, cross phenotype meta analysis, transcription, master regulator, regulatory network

Introduction

Biological processes are carefully orchestrated events, requiring precise activation and repression of participating genes by hierarchical gene regulatory mechanisms. This elaborate co-regulation can be seen in the complex patterns of gene co-expression across tissues¹ and conditions;² the overlap and organization of transcription factor target sets;³ the precise orchestration of developmental processes; and the organization of gene interaction networks.⁴ However, uncovering the underlying genes, understanding how some can participate in diverse or conflicting processes, and exposing the regulatory framework that controls each process remains challenging, even with the widespread deployment of genome-scale technologies. This dissection has become even more relevant as we have come to appreciate that a substantial fraction of common genetic variants driving human physiological traits, including susceptibility to a wide variety of diseases, affect gene regulation rather than altering protein structure,⁵^,⁶ demonstrating that regulatory changes to cellular processes are major determinants of variation between individuals in a population.

Transcript abundance levels are highly heritable, which is largely attributable to genetic variants far from the genomic loci encoding the affected genes,⁷ rather than in or near the gene locus itself.⁸ Expression quantitative trait loci (eQTL) mapping studies in yeast,⁹ mouse,¹⁰^,¹¹ rat,¹² maize,¹³ and human¹³^,¹⁴ have demonstrated that some genomic loci simultaneously affect the abundance of multiple transcripts encoded throughout the genome. These distant, or trans-eQTLs, must act on regulatory circuits governing groups of genes, implying that we can uncover the genes participating in biological processes and the mechanisms of regulation that govern them by mapping these little-known trans-acting eQTLs.⁸

For purely practical reasons, most studies to date have concentrated on mapping cis-acting eQTLs (local to the gene region), which explain a small fraction of total transcript-level heritability.⁷ Focusing on cis-eQTLs reduces the multiple testing burden: only a small subset of genetic markers proximal to each transcript need be tested for association, conserving the statistical power of eQTL cohorts, which are limited to a few hundred individuals because of expense and difficulty with sample acquisition. However, cis-eQTLs cannot help elucidate the co-regulatory landscape of the transcriptome and thus the higher organization of gene regulation. Although detecting trans-eQTLs can provide this deeper understanding, it currently requires testing all markers across the genome for association to each transcript, substantially increasing the multiple testing burden,⁹,¹⁵ though at least one approach minimizes this burden through adaptive false discovery rate estimation.¹⁶ This approach has identified a limited number of trans-eQTLs in human lymphoblastoid cell lines,¹⁷,¹⁸,¹⁹ adipose tissue,²⁰,²¹ and whole blood,²² further indicating that trans-eQTLs can be identified in these cohorts.

Here we present a complementary approach to identifying trans-eQTLs that influence many transcripts simultaneously. We reason that if no transcript abundances are associated to a genetic marker, the distribution of the association statistics for all transcripts to that marker will follow the expected null distribution. However, if a subset of transcripts is associated to the marker, the association statistics will actually form a mixture drawn from the null and alternative distributions.²³ We can therefore assess the likelihood of these competing hypotheses for test statistics at each marker, and reject the null for markers with evidence that many association statistics are non null. We have previously described cross-phenotype meta-analysis (CPMA²³), a statistic designed to test this prediction. Here, we adapt this approach to the detection of trans-eQTLs by deriving an empirical null distribution to account for the correlation between gene expression levels. This second-level significance testing²⁴ offers evidence for the presence of a trans-eQTL without requiring that we specify which transcripts are associated; in fact, it does not identify which transcripts are affected. We can then verify that the candidate trans-eQTLs affect the same targets across multiple cohorts, indicating that they represent true regulatory mechanisms. Here, we interrogate publicly available eQTL data from lymphoblastoid cell lines across three African HapMap populations¹⁹ and show evidence of many trans-eQTLs active in these data. We describe eight independent trans-eQTLs associated with many transcripts in each of the three populations, and which target the same genes with the same direction of effect across the three populations (neither parameter is considered by our distributional test). 
We then show that target transcripts of trans-eQTLs encode proteins that interact more frequently than expected by chance and are enriched for pathway annotations indicative of roles in basic cell homeostasis, suggesting that they are co-regulated sets of genes. We thus provide evidence that trans-eQTLs affect multiple human genes and regulate biologically coherent sets of genes. Importantly, our new methodology suggests not only that trans-eQTLs can be identified in ongoing transcriptional profiling of large cohorts²⁵ but that they are discoverable in extant data in the public domain.

Material and Methods

Unless otherwise stated, all statistical analyses were done using the R programming language (v.3.1.0).²⁶ Additional libraries are cited where appropriate. An overview of our pipeline (CPMAtranseqtl) is shown in Figure S1, and our pipeline is available for download (see Web Resources).

Genotype Data Processing

We selected unrelated individuals from the three African populations included in the HapMap project phase III, reasoning that the high genetic diversity and average minor allele frequencies observed in Africa will increase the statistical power of the eQTL association tests. We obtained genome-wide genotype data for 135 Maasai in Kinyawa, Kenya (MKK); 83 Luhya in Webuye, Kenya (LWK); and 107 Yoruba in Ibadan, Nigeria (YRI) from the HapMap Project website (see Web Resources). As our sample size is limited, we restricted our analysis to 737,867 autosomal markers with at least 15% minor allele frequency in all three populations. All remaining variants are in Hardy-Weinberg equilibrium (p_HWE > 1 × 10⁻⁶); all individuals have <3% of genotypes missing and all remaining variants have <8% missing data. Genotype data annotation was converted into hg38 coordinates.

Expression Data Processing

We obtained processed expression data for lymphoblastoid cell line profiling on the Illumina Human-6 v2 Expression BeadChip array for all 322 individuals (publicly available under ArrayExpress accession number E-MTAB-264).¹⁹ The expression data include 21,802 probes that each map to a single gene; probes that map to multiple genes or to genes on the X or Y chromosome are excluded, and the data have not been subjected to the PEER method.¹⁹ After quantile normalization to reduce inter-individual variability,²⁷ we removed probesets with low variance or low intensity in each population. Both the interquartile range and mean intensity across probe sets showed clear bimodal distributions (Figures S2 and S3); we used mixture modeling (mclust v.5.1 in R²⁸) to detect those probe sets that belonged to the higher component of each distribution with 80% probability. We retained those probe sets that had higher variance and higher intensity in all three populations, resulting in 9,085 analyzed probe sets.

By converting Illumina probe IDs to HGNC gene symbols (biomaRt v.2.22.0²⁹), we mapped 8,673/9,085 probesets to 7,984 unique HGNC genes with unambiguous hg38 genomic coordinates in GENCODE v.20. Unmapped probesets were excluded from analyses relying on annotation.

Expression data suffer from systematic, non-genetic biases, hampering eQTL studies.³⁰ Several multivariate approaches have been used to correct these data artifacts,¹⁶,³¹,³² all of which identify trends in variance in expression data assumed to stem from (usually unmeasured) confounders. These methods clearly improve power to detect cis-eQTLs²⁰,³³ but cannot distinguish between systematic artifacts and genuine trans-eQTLs, both of which will explain some proportion of variance across many transcripts.¹⁶,³¹,³² For this reason, we have chosen not to use these corrections in our data processing pipeline, as our goal is to detect the presence of trans-eQTLs.

Calculating eQTL Association Statistics

We calculated association statistics for each probeset intensity to each SNP by linear regression,³⁴ controlling for population stratification by adding structure principal components as covariates.³⁵ In each population, we estimated the optimal number of principal components by incremental inclusion of components until the overall test statistic inflation was minimized, as previously described³⁶ (see Appendix A and Figure S4). We included the top two principal components for YRI, ten for LWK, and 20 for MKK, as optimal corrections for population stratification.
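As an illustration of this selection step, the following Python sketch chooses the number of principal components by minimizing test statistic inflation. It assumes that inflation is summarized by a genomic-control style factor (the median squared z-score divided by the median of a χ² distribution with one degree of freedom) and that the best choice is the value closest to 1. The actual criterion is described in Appendix A and may differ; the function names and the `z_by_num_pcs` input are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square (1 df) variable: the squared 75th percentile of N(0, 1).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(z_scores):
    """Genomic-control style inflation: observed / expected median chi-square."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def select_num_pcs(z_by_num_pcs):
    """Return the candidate number of PCs with the least inflated statistics.

    z_by_num_pcs (hypothetical input) maps a candidate number of principal
    components to the z-scores obtained with that many PCs as covariates.
    """
    return min(
        z_by_num_pcs,
        key=lambda k: abs(inflation_factor(z_by_num_pcs[k]) - 1.0),
    )
```

In practice the z-scores for each candidate would come from rerunning the regression with that number of components as covariates.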

Identifying trans-eQTLs by Cross Phenotype Meta-analysis

Previous strategies to identify trans-eQTLs rely on either identifying significant associations to a single transcript²⁵,³⁷ or associating variance components affecting multiple transcripts with genetic markers as surrogate phenotypes.³¹,³⁸ We have previously described a second-level significance testing approach²⁴ to assess evidence of multiple associations at a genomic marker.²³ At each marker we tested for over-dispersion of association −log(p) values across all probe sets, with a null hypothesis that −log(p) should be exponentially distributed with a decay parameter $λ = 1$. Under the joint alternative hypothesis, where a subset of association statistics are non-null, $λ \neq 1$. We compared the evidence for these hypotheses as a likelihood ratio test for our cross-phenotype meta-analysis (CPMA), where the statistic S_CPMA is defined as:

S_{CPMA} = -2 \times \ln\left(\frac{P[\mathrm{Data} \mid λ = 1]}{P[\mathrm{Data} \mid λ = \hat{λ}]}\right) \sim χ_{df = 1}^{2}

where $\hat{λ}$ is the observed exponential decay rate in the data. Thus we needed to estimate only a single parameter, $\hat{λ}$ , so that the test has a single degree of freedom.
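A minimal Python sketch of this likelihood ratio follows. It assumes natural logarithms (so that −ln(p) has rate 1 under the null) and takes the maximum-likelihood estimate of the exponential rate, n / Σ(−ln p), as the observed decay rate. The function name is hypothetical, and the sketch does not include the empirical null calibration described in the following paragraphs.

```python
import math


def cpma_statistic(p_values):
    """Likelihood-ratio statistic testing Exponential(rate=1) for -ln(p)."""
    x = [-math.log(p) for p in p_values]  # -ln(p) ~ Exp(1) if p is uniform
    n = len(x)
    total = sum(x)
    rate_hat = n / total  # maximum-likelihood estimate of the decay rate

    # Exponential log-likelihood: n * ln(rate) - rate * sum(x)
    loglik_null = -total                     # rate fixed at 1
    loglik_alt = n * math.log(rate_hat) - n  # rate_hat * total equals n
    return -2.0 * (loglik_null - loglik_alt)
```

The statistic is zero when the mean of −ln(p) equals 1 and grows as the fitted rate departs from 1 in either direction.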

We accounted for the extensive correlation between probeset levels across individuals by empirical significance testing. We simulated eQTL association statistics under the null expectation of no association to any marker by sampling from a multivariate normal distribution that preserves the observed correlation between probeset association statistics (using the MASS package in R³⁹). We performed an eigen-decomposition:

C = Q Λ Q^{T}

where the covariance matrix C has entries c_{i,j} = cov(a_i, a_j), and a_i and a_j are vectors of scaled z-scores for the ith and jth probesets across all markers in the genome. All three sample covariance matrices thus have dimension 9,085 × 9,085; because they are calculated from the probeset × SNP matrix of eQTL Z statistics rather than the probeset × individual matrix of expression levels, we find all three are positive definite (data not shown; additional details in Appendix A).

To account for the correlation between transcript expression levels, we generated the empirical null distribution Z^∗ of association statistics using:

Z^{∗} = μ + Q \sqrt{Λ} z

where z is a vector of i.i.d. standard normal values (N(0,1)) and μ a vector of mean eQTL Z statistics of the 9,085 probe sets.
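To see why this construction reproduces the observed correlation structure, a short derivation may help. It uses only the definitions above (Q orthogonal, Λ the diagonal matrix of eigenvalues of C, z with identity covariance) and relies on the eigenvalues being non-negative, which holds because C is positive definite:

```latex
\begin{aligned}
\mathrm{E}[Z^{*}] &= \mu + Q\sqrt{\Lambda}\,\mathrm{E}[z] = \mu, \\
\mathrm{Cov}(Z^{*}) &= Q\sqrt{\Lambda}\,\mathrm{Cov}(z)\,\sqrt{\Lambda}^{T} Q^{T}
                     = Q\sqrt{\Lambda}\, I \,\sqrt{\Lambda}\, Q^{T}
                     = Q \Lambda Q^{T} = C .
\end{aligned}
```

Each simulated vector therefore has the same mean and covariance as the observed association statistics while carrying no marker-specific signal.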

We calculated p values from this null distribution, calculated S_CPMA, and determined empirical significance as:

P_{CPMA} = \frac{\sum_{n = 1}^{N} (S_{n} > S_{0}) + 1}{N + 1}

where S_n is S_CPMA for the nth iteration of the null simulation, S₀ is the observed S_CPMA, and N is the number of permutations (here, N = 5,000,000).
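The empirical p value above can be sketched in a few lines of Python; the function name is hypothetical. The added 1 in numerator and denominator means the smallest attainable value is 1/(N + 1), which is roughly 2 × 10⁻⁷ for N = 5,000,000.

```python
def empirical_p(null_stats, observed):
    """Empirical p value: (count of null statistics > observed, plus 1) / (N + 1)."""
    exceed = sum(1 for s in null_stats if s > observed)
    return (exceed + 1) / (len(null_stats) + 1)
```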

We investigated the overlap of signal across populations using hypergeometric tests (hint v.0.1-1⁴⁰) of the independent SNPs (see below) at FDR α levels 0.5, 0.4, 0.3, 0.2, 0.1, and 0.05. We also investigated the enrichment across populations using the Wilcoxon rank-sum test at different α levels.

Simulating eQTL Statistics to Test CPMA

To test the power of CPMA, we simulated trans-eQTLs by sampling mixtures of 9,085 association statistics drawn from the null and alternative distributions. We varied the proportion of statistics drawn from the alternative distribution and the non-centrality parameter of that alternative. For each combination of these parameters, we ran 1,000 trials and assessed CPMA performance using ROC curves, as shown in Figures S7 and S8.
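One way such a mixture could be generated is sketched below in Python. The sketch assumes the association statistics are z-scores, with null statistics drawn from N(0, 1) and alternative statistics from N(ncp, 1), and converts them to two-sided p values; the parameter values, seed, and function names are illustrative only and are not taken from the paper. It ignores the correlation between probesets.

```python
import math
import random


def simulate_mixture_z(n_stats, prop_alt, ncp, rng):
    """Draw n_stats z-scores: a fraction prop_alt from N(ncp, 1), the rest from N(0, 1)."""
    n_alt = round(n_stats * prop_alt)
    z = [rng.gauss(ncp, 1.0) for _ in range(n_alt)]
    z += [rng.gauss(0.0, 1.0) for _ in range(n_stats - n_alt)]
    return z


def two_sided_p(z):
    """Two-sided normal p value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


rng = random.Random(1)  # fixed seed so the example is reproducible
z = simulate_mixture_z(9085, 0.05, 3.0, rng)  # illustrative parameters
p = [two_sided_p(v) for v in z]

# Under the pure null the mean of -ln(p) is 1; the mixture pushes it higher.
mean_neg_log_p = sum(-math.log(v) for v in p) / len(p)
```

Repeating this over a grid of proportions and non-centrality parameters, and scoring each draw with the CPMA statistic, would give the inputs for ROC curves of the kind described.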

Meta-analysis of CPMA Statistics

We combined empirical CPMA statistics from the three African populations using sample-size weighted meta-analysis.⁴¹ To identify independent effects across the genome, we clumped these meta-analysis results at r² < 0.2.³⁴
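A sample-size weighted combination of this kind can be sketched as follows. The sketch assumes the common convention of converting each one-sided p value to a z-score and weighting by the square root of the sample size; the exact scheme used in the pipeline follows the cited method and may differ in detail. The function name and the clamping constant are hypothetical.

```python
import math
from statistics import NormalDist


def weighted_z_meta(p_values, sample_sizes, eps=1e-12):
    """Combine one-sided p values with weights proportional to sqrt(sample size)."""
    norm = NormalDist()
    # Clamp so that p = 1 (possible for empirical p values) stays in inv_cdf's domain.
    clamped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    z = [norm.inv_cdf(1.0 - p) for p in clamped]
    w = [math.sqrt(n) for n in sample_sizes]
    z_meta = sum(wi * zi for wi, zi in zip(w, z)) / math.sqrt(sum(wi * wi for wi in w))
    return 1.0 - norm.cdf(z_meta)


# Cohort sizes from the text: MKK, LWK, YRI.
p_meta = weighted_z_meta([0.05, 0.05, 0.05], [135, 83, 107])
```

Three cohorts that each show modest evidence combine into stronger joint evidence, which is the behavior the meta-analysis relies on.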

Analytical Validation of trans-eQTLs

To validate the detected trans-eQTLs, we performed two secondary analyses: we tested whether the trans-eQTL is associated to the same probesets in the three populations and whether the directions of effect are consistent across the three populations. This information is not used in the CPMA and meta-analysis calculations, and thus offers an independent validation analysis on these data.

We first empirically assessed evidence that a trans-eQTL is associated to the same probesets across populations. In a pair of populations P₁ and P₂, we observed N₁ and N₂ probesets with an eQTL p ≤ 0.05 at a trans-eQTL, with an intersect N_o = (N₁ ∩ N₂). We constructed the expected distribution of N_o using the N₁ and N₂ most associated probesets at all M independent SNPs across the autosomes, and computed empirical significance P_o as:

P_{o} = \frac{\sum_{m = 1}^{M} (N_{o, m} > N_{o})}{M + 1} .

Similarly, we assessed consistency of effect direction for the N_o probesets with an association p ≤ 0.05 to a trans-eQTL in a pair of populations P₁ and P₂. To allow for alternate linkage disequilibrium patterns in different populations (where effects can be opposite with respect to a detected trans-eQTL), we defined the overlap in directionality as $N_{dir} = \max((N_{1,p} \cap N_{2,p}) \cup (N_{1,n} \cap N_{2,n}), (N_{1,p} \cap N_{2,n}) \cup (N_{1,n} \cap N_{2,p}))$, where N_{1,p} is the set of probe sets in P₁ with increasing expression given the number of alleles of the SNP, N_{1,n} the set with decreasing expression (N_{2,p} and N_{2,n} are defined likewise for P₂), and the maximum is taken over the sizes of the two unions. We constructed the null distribution of N_dir of the targets of each trans-eQTL by computing it for all M independent SNPs across the autosomes and computed empirical significance P_dir as:

P_{dir} = \frac{\sum_{m = 1}^{M} (N_{dir, m} > N_{dir})}{M + 1} .
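The directionality overlap defined above can be illustrated with Python sets. In this sketch each argument is the set of probeset identifiers whose expression increases (`pos`) or decreases (`neg`) with allele count in one population; the function name and the toy identifiers are hypothetical.

```python
def direction_overlap(pos1, neg1, pos2, neg2):
    """Size of the larger of the 'same direction' and 'opposite direction' overlaps."""
    same = (pos1 & pos2) | (neg1 & neg2)      # concordant effects
    opposite = (pos1 & neg2) | (neg1 & pos2)  # effects flipped between populations
    return max(len(same), len(opposite))


# Toy example: three of four probesets agree in direction.
n_dir = direction_overlap({"a", "b", "c"}, {"d"}, {"a", "b"}, {"c", "d"})
```

Taking the maximum makes the measure insensitive to which allele is tracked, so a trans-eQTL whose effects are uniformly flipped in one population scores the same as one with uniformly concordant effects.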

Defining High-Confidence trans-eQTL Targets

We used a meta-analytic approach to define consensus target gene sets for high-confidence trans-eQTLs. For each candidate trans-eQTL, we meta-analyzed eQTL association statistics for each of the 9,085 probesets across the three populations, using sample-size weighted fixed effect meta-analysis,³⁴,⁴¹ and then defined the group of target probesets as those with FDR < 0.01. This approach differs from the meta-analysis of the aggregate CPMA statistics above, where we were combining overall evidence of a trans-eQTL rather than for association to specific probeset levels.

Functional Enrichment Analyses of trans-eQTL Target Probesets

For each set of trans-eQTL target transcripts, we calculated enrichment of proximal transcription factor binding events using publicly available chromatin immunoprecipitation/sequencing (ChIP-seq) data for 50 factors in lymphoblastoid cell lines from the ENCODE consortium.³^,⁴² We were able to annotate 2,405/7,984 unique HGNC genes corresponding to the 9,085 probesets in our analysis with at least one transcription factor binding event from these data. We observed TF_o, the number of binding events for each transcription factor in the target probesets of each trans-eQTL, and assessed significance empirically by resampling probesets with similar expression intensity over N = 1,000 iterations:

P = \frac{\sum_{n = 1}^{N} (TF_{n} > TF_{o}) + 1}{N + 1}

where TF_n is the number of binding events for a transcription factor in the nth iteration.

To test for functional categories over-represented in each set of trans-eQTL target transcripts, we looked for enrichment of Gene Ontology biological process annotations with the hypergeometric approach implemented in BioConductor,⁴³ which accounts for the dependencies in the hierarchical structure of the ontology. We considered only terms where at least ten genes were observed.

To establish whether each set of trans-eQTL target transcripts represent biological networks, we used our previously described Protein Interaction Network Tissue Search (PINTS) framework⁴⁴ (R packages PINTS v.0.1, igraph v.1.44.1, and BioNet v.1.29.1). In brief, for each trans-eQTL we first collapsed target probesets onto HGNC genes and then projected these onto a protein-protein interaction network. We detected the largest subnetwork of target genes using the prize-collecting Steiner tree algorithm and assessed significance by permuting the network 100 times and assessing the size and connectivity of the largest subnetwork in the observed data. For any subnetworks showing significant excess in either size or connectivity, we used PINTS to test for preferential expression across a tissue atlas.⁴⁴

Results

Replicable trans-eQTLs Affect Many Genes

We used CPMA to detect trans-eQTLs affecting the expression levels of many target genes across three African HapMap populations.¹⁹ To account for the correlation between expression levels, we developed an approach to estimate the expected null distribution of eQTL statistics in the absence of a trans-eQTL. We analyzed population structure-corrected eQTL data for 9,085 probe sets at 737,867 autosomal markers from the MKK, LWK, and YRI HapMap populations (135, 83, and 107 individuals, respectively¹⁹), empirically assessing significance to account for the correlation between eQTL statistics for each gene. We first compared CPMA results across the three cohorts to detect replicable trans-eQTLs and found consistency between all three populations (Tables 1 and S1). To further explore these results, we combined CPMA statistics across the three cohorts by meta-analysis (Figure S5) and found 16,484/178,464 (9.2%) pairwise-independent SNPs with meta-analysis p_meta < 0.05, though none reached genome-wide significance (minimum p_meta = 7.2 × 10⁻⁷ at rs10842750).

Table 1.

trans-eQTLs Are Abundant across the Autosomal Genome

CPMA α	Expected	Observed	Hypergeometric p Value
0.5	37,942	39,479	3.6 × 10⁻⁴⁰
0.4	20,459	21,361	1.7 × 10⁻¹⁸
0.3	9,048	9,350	8.2 × 10⁻⁵
0.2	2,815	2,787	0.73
0.1	362	361	0.54

We find an enrichment of independent SNPs (178,464 markers with pairwise r² < 0.2) with elevated cross-phenotype meta-analysis (CPMA²³) statistics across a variety of significance thresholds. CPMA tests whether the distribution of eQTL statistics at a given SNP is consistent with the null hypothesis that none of the 9,085 probesets examined are associated to that SNP, accounting for the correlation between expression levels. We observe that more SNPs than expected by chance have modest significance thresholds (CPMA α values), indicating that many trans-eQTL effects exist across the autosomal genome.

We next sought to prioritize a high-confidence subset of the 16,484 trans-eQTLs using additional independent criteria, which CPMA does not consider. True trans-eQTLs should fulfill two predictions: the genes they influence should be the same across populations and the direction of effect should be consistent between the populations. Since the extensive correlation between gene expression levels (and therefore between eQTL association statistics) is a major confounder when testing these predictions, we assessed the significance for both of these predictions empirically through pairwise comparisons of populations (see Material and Methods). Of the 16,484 nominal trans-eQTLs, 1,692 (10.2%; YRI and MKK), 1,851 (11.2%; YRI and LWK), and 1,892 (11.5%; MKK and LWK) had significant target overlaps (per-marker empirical overlap p < 0.05; Figure 1). Of these, 62 trans-eQTLs had significant target overlaps across all three pairwise comparisons (22 overlaps expected by chance, hypergeometric p = 4.5 × 10⁻¹³). Increasing the stringency of the CPMA cutoff threshold increases the proportions of these overlaps (Table S2), suggesting the presence of multiple trans-eQTLs affecting the same target genes across populations.

Figure 1.

Hundreds of Putative trans-eQTLs across the Genome Affect the Same Genes in the Same Direction across Three African HapMap Populations

We considered all autosomal variants with nominal evidence of association to multiple transcript levels (p_cpma < 0.05). We find that the targets of these trans-eQTLs overlap significantly in the three populations (empirical assessment of trans-eQTL target overlaps between YRI and LWK [A], YRI and MKK [B], and LWK and MKK [C]). We also find that trans-eQTL allelic effects are consistently in the same direction across the three populations (empirical assessment of trans-eQTL sign tests between YRI and LWK [D], YRI and MKK [E], and LWK and MKK [F]). These bulk results indicate that trans-eQTLs affect the expression of the same target genes, in the same direction, across populations.

We next tested our second prediction, that a trans-eQTL minor allele has the same direction of effect across populations. We found that 5,743 (34.8%; YRI and MKK), 5,762 (35.0%; YRI and LWK), and 5,498 (33.4%; MKK and LWK) of the 16,484 candidate trans-eQTLs had consistent effects (per-marker empirical p < 0.05; Figure 1). Of these, 1,062 trans-eQTLs were significant across all three pairwise comparisons (670 overlaps expected by chance, hypergeometric p = 8.1 × 10⁻⁶⁴). Again, increasing the stringency of the CPMA cutoff threshold increases the proportions of these overlaps (Table S3). We also found the target overlap and directionality overlap statistics to be significantly correlated (p < 2.2 × 10⁻¹⁶, Figure S6), indicating the presence of trans-eQTLs affecting the same target transcripts in the same way.
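The chance expectations quoted for the three-way overlaps (22 for target overlap and 670 for effect direction) appear consistent with treating the three pairwise comparisons as independent draws from the 16,484 candidates. The Python sketch below reproduces that arithmetic; the independence assumption is ours, since the text does not state how the expected counts were derived, and the function name is hypothetical.

```python
def expected_triple_overlap(total, counts):
    """Expected size of the intersection of independent subsets of `total` items."""
    expected = float(total)
    for c in counts:
        expected *= c / total
    return expected


# Counts of significant candidates in each pairwise comparison, from the text.
exp_targets = expected_triple_overlap(16484, [1692, 1851, 1892])    # about 21.8
exp_direction = expected_triple_overlap(16484, [5743, 5762, 5498])  # about 669.6
```

Against these expectations, the observed 62 and 1,062 three-way overlaps are roughly 2.8-fold and 1.6-fold enrichments, respectively.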

Our results thus provide several lines of evidence for trans-eQTLs replicating across multiple populations. We recognized that not all 16,484 candidates represent true effects, so we next defined a set of high-confidence trans-eQTLs. We found that eight trans-eQTLs with significant CPMA meta-analysis statistics were also significant in each of the pairwise population tests of target overlap and directionality, which we prioritized for further analysis (Table 2). From simulations, we found that CPMA was well-powered to detect trans-acting effects on this scale in our data (Figures S7 and S8). Our pairwise overlap analysis is useful to demonstrate that trans-eQTLs replicate across populations and have consistent target sets and directions of effect. However, it is not optimal for defining the set of target transcripts for each trans-eQTL, as we would have to set arbitrary eQTL significance thresholds in each population and take the transcripts passing this threshold in all populations as the consensus target set. We therefore combined the eQTL association statistics for each transcript across the three populations in a formal meta-analysis—distinct from the CPMA statistic meta-analysis above, where we combined evidence that a trans-eQTL actually exists—and defined the consensus target sets for each trans-eQTL as those passing a false discovery threshold across the entire, combined eQTL dataset (FDR < 0.01; Figures 2, 3, and S9A–S9F). These eight trans-eQTLs are thus supported by the strongest cumulative evidence across all our analyses, though the extensive overlaps we observe in our bulk overlaps strongly suggest many others will exist in these data.

Table 2.

Eight trans-eQTLs Affect Hundreds of Transcripts across the Genome

trans-eQTL	Position	Closest Gene	Distance to Gene	Distance to TSS	CPMA^a(p_meta)	Target Overlap^b(p)	Effect Direction^c(p)	Target Genes^d	TF Enriched^e	Network^f(p)
rs7694213	chr4: 53,157,184	SCFD2	0	108,383	1.6 × 10⁻²	1.7 × 10⁻⁴ 4.0 × 10⁻² 3.1 × 10⁻³	5.6 × 10⁻⁶ 2.0 × 10⁻⁴ 5.6 × 10⁻⁶	417	–	0.18
rs6899963	chr6: 104,048,696	NPM1P10	21,932	21,931	1.9 × 10⁻²	3.5 × 10⁻² 5.1 × 10⁻³ 2.0 × 10⁻²	5.6 × 10⁻⁶ 1.7 × 10⁻⁵ 3.9 × 10⁻⁵	393	SRF, STAT3	<10⁻³
rs9406332	chr6: 169,317,960	VTA1P1	20,000	20,000	3.7 × 10⁻²	3.4 × 10⁻² 2.8 × 10⁻² 3.9 × 10⁻²	3.4 × 10⁻⁵ 1.1 × 10⁻⁵ 5.3 × 10⁻⁴	77	–	0.21
rs10107976	chr8: 62,689,355	NKAIN3	0	167,779	7.9 × 10⁻³	1.7 × 10⁻³ 3.4 × 10⁻² 7.7 × 10⁻³	1.7 × 10⁻⁵ 5.6 × 10⁻⁶ 5.6 × 10⁻⁶	116	USF1, USF2, MAX, RFX5, ZZZ3	0.19
rs4773419	chr13: 111,658,732	RP11-65D24.2	0	62,722	2.4 × 10⁻²	5.6 × 10⁻³ 2.6 × 10⁻² 1.3 × 10⁻²	5.6 × 10⁻⁶ 1.2 × 10⁻⁴ 1.2 × 10⁻⁴	166	SP1, PBX3, FOS, NRF1, BRCA1	0.37
rs11621120	chr14: 29,324,177	RP11-562L8.1	0	54,479	1.9 × 10⁻²	8.5 × 10⁻⁴ 8.2 × 10⁻³ 4.9 × 10⁻³	5.6 × 10⁻⁶ 5.7 × 10⁻⁴ 7.8 × 10⁻⁵	228	–	0.19
rs10520643	chr15: 86,859,175	AGBL1	0	55,267	2.4 × 10⁻³	4.5 × 10⁻³ 2.3 × 10⁻² 1.3 × 10⁻²	5.6 × 10⁻⁶ 5.6 × 10⁻⁶ 3.9 × 10⁻⁵	833	NR2C2	<10⁻³
rs7281608	chr21: 23,675,283	AP000960.1	166,275	166,360	4.5 × 10⁻²	2.8 × 10⁻² 4.8 × 10⁻³ 7.1 × 10⁻³	1.5 × 10⁻² 1.1 × 10⁻² 4.5 × 10⁻⁴	97	–	1 × 10⁻²

We identified a subset of trans-eQTLs with nominally significant CPMA meta-analysis statistics; pairwise tests of target overlap; and pairwise tests of directionality across three populations. For each trans-eQTL, we defined a consensus set of target transcripts and found significant enrichment of transcription factor binding events at their promoters. These targets also form significant protein-protein interaction subnetworks.

^a CPMA meta-analysis statistics for the SNP.

^b Significance for pairwise tests of target overlap across three populations.

^c Significance for pairwise tests of directionality across three populations.

^d For each trans-eQTL, we defined a consensus set of target transcripts (FDR < 0.01) by meta-analyzing eQTL statistics for individual probesets across the three populations.

^e Transcription factors with significant enrichment of binding events at the target promoters.

^f Significance for protein-protein sub-networks by PINTS.

Figure 2.

A trans-eQTL at rs6899963 on Chromosome 6 Affects the Expression Levels of Many Genes across Three African HapMap Populations

(A) Meta-analysis p values for 9,085 transcript eQTLs at rs6899963.

(B) Effect directions are consistent across the three populations. In each population (x axis), we select SNPs where the minor allele increases (left) and decreases (right) expression and show the direction of effect in the other two populations as violin plots (trans-eQTL effect size, beta, on the y axis). The overwhelming majority of effects are consistent across all three populations.

(C) The target genes of the rs6899963 trans-eQTL form a large subnetwork, which is enriched for multiple Gene Ontology biological processes. Here, we show the interplay between the top two enriched terms: GO:0007088 (regulation of mitotic nuclear division) and GO:1901564 (organonitrogen compound metabolic process).

Figure 3.

A trans-eQTL at rs10520643 on Chromosome 15 Affects the Expression Levels of Many Genes across Three African HapMap Populations

(A) Meta-analysis p values for 9,085 transcript eQTLs at rs10520643.

(B) Effect directions are consistent across the three populations. In each population (x axis), we select SNPs where the minor allele increases (left) and decreases (right) expression and show the direction of effect in the other two populations as violin plots (trans-eQTL effect size, beta, on the y axis). The overwhelming majority of effects are consistent across all three populations.

(C) The target genes of the rs10520643 trans-eQTL form a large subnetwork, which is enriched for multiple Gene Ontology biological processes. Here, we show the term GO:0046483 (heterocycle metabolic process).

Five out of the eight high-confidence trans-eQTLs are located within introns of genes, with the remaining three intergenic markers at least 20 kb away from any known gene (Table 2). Notably, we do not detect significant cis-eQTL effects between any of the eight markers or their proxies and genes within 500 kb, suggesting that their trans-acting effects are not mediated by changes to the steady-state expression levels of nearby transcripts.

Target Genes of trans-eQTLs Are Co-regulated and Enriched in Biological Pathways

Our hypothesis predicts that targets of a trans-eQTL will be co-regulated and participate in a limited number of biological pathways and processes. To test whether each of the eight trans-eQTL consensus target sets are co-regulated, we asked whether any transcription factors were more likely to bind upstream of targets than expected by chance. We tested binding events for 50 transcription factors assayed by chromatin immunoprecipitation and sequencing (ChIP-seq) in lymphoblastoid cell lines by the ENCODE project³^,⁴² and found significant enrichment for at least one transcription factor in four out of the eight trans-eQTLs (Table 2). This suggests that the transcripts associated to our high-confidence trans-eQTLs are regulated by the same cellular mechanisms, in line with our prediction for true trans-eQTL target sets.

As we had made a deliberate choice not to correct for unknown systematic effects in the expression data, we tested whether each of the eight consensus trans-eQTL target set member genes would have been discovered by such analyses. We found that all eight gene sets were significantly correlated to absolute loading values with at least one of the top 20 PEER factors³¹ (Wilcoxon rank sum test p < 3.1 × 10⁻⁴, Bonferroni adjusted for the number of trans-eQTLs and PEER factors). Thus, the effect of trans-eQTLs on gene expression may be minimized by such data treatments, and discovery efforts focused on trans-acting effects should account for this in their design.

We next tested whether each of the eight trans-eQTL consensus target sets is enriched for pathway annotations, indicating membership in a particular biological process, and found that all eight are enriched for 20–100 biological processes defined in the Gene Ontology (Table S4 lists the ten most significant biological processes of each trans-eQTL). Notably, we saw strong enrichment for annotations indicating fundamental biological processes including cell cycle control, metabolism, and assembly of cellular machinery. To further characterize these functional connections, we asked whether consensus target set genes form interacting protein networks and found that three out of the eight trans-eQTL target sets form such networks that are larger and more densely connected than expected by chance⁴⁴ (Table 2). We also found that these subnetworks are preferentially expressed in particular tissues: the largest subnetwork of rs6899963 target genes is preferentially expressed in fetal tissues and inducible pluripotent stem cells (interaction network permutation tests: size p = 0.03; number of edges p < 0.001; connectivity coefficient p < 0.001; overall eQTL load p = 0.02; Figure 2). The largest subnetwork of rs10520643 target genes is preferentially expressed in a similar pattern across fetal tissues and inducible pluripotent stem cells (interaction network permutation tests: size p = 0.23; number of edges p < 0.001; connectivity coefficient p < 0.001; overall eQTL statistic load p = 0.18; Figure 3). Collectively, these results show that trans-acting eQTLs affect the transcription of functionally related groups of genes involved in basic cellular processes.
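To make the idea of a network permutation test concrete, the sketch below estimates an empirical p value for the number of interaction edges within a gene set by comparing it with random gene sets of the same size. This is a deliberately simplified null model chosen for illustration: the published procedure (reference 44) may differ, for example by controlling for node degree, and all names and the toy network are hypothetical.

```python
import random


def count_edges(genes, edges):
    """Number of interaction edges with both endpoints inside the gene set."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)


def edge_permutation_p(target_genes, all_genes, edges, n_perm=1000, seed=0):
    """Empirical one-sided p value for the observed edge count, using random
    gene sets of equal size as the null (assumed, simplified null model)."""
    rng = random.Random(seed)
    observed = count_edges(target_genes, edges)
    pool = sorted(all_genes)
    size = len(set(target_genes))
    hits = sum(
        1
        for _ in range(n_perm)
        if count_edges(rng.sample(pool, size), edges) >= observed
    )
    # Add-one estimator so the p value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: a fully connected 5-gene module plus a few unrelated pairs.
genes = [f"g{i}" for i in range(30)]
clique = [(f"g{i}", f"g{j}") for i in range(5) for j in range(i + 1, 5)]
sparse = [(f"g{i}", f"g{i + 1}") for i in range(10, 28, 2)]
observed, p = edge_permutation_p(genes[:5], genes, clique + sparse)
print(observed, p)
```

The same pattern applies to the other statistics listed above (subnetwork size, connectivity) by swapping `count_edges` for the relevant summary.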

Discussion

We have presented an approach for detecting trans-eQTLs simultaneously affecting many transcripts. We were able to detect reproducible trans-acting regulatory influences on the same genes, in the same direction, across multiple populations, and we showed that the targets of these trans-eQTLs participate in the same biological pathways, that they encode proteins that interact directly, and that they are regulated by the same transcription factors. Notably, we find that the trans-eQTL variants themselves do not act as cis-eQTLs on nearby genes, indicating that they do not alter gene regulation by affecting the transcript abundance of nearby genes in stable conditions. Thus, we provide strong evidence for trans-eQTLs affecting hundreds of co-regulated transcripts.

Despite the substantial heritability of gene expression attributed to them, trans-eQTLs have proven difficult to detect in human data by identifying multiple significant associations to a genomic locus.⁷ To date, only a few trans-eQTLs acting on small numbers of transcripts have been reported in human lymphoblastoid cell lines,¹⁷,¹⁸,¹⁹ adipose tissue,²⁰,²¹ and whole blood.²² Reports linking the abundances of many transcripts to specific genomic loci in yeast,⁹ mouse,¹⁰,¹¹ rat,¹² maize, and human¹³ have suggested that some trans-acting factors influence the transcription of hundreds of genes,⁸ but these results have been plagued by sensitivity of the analytical methods to noise from data processing artifacts.³⁰ While these results demonstrate the feasibility of detecting trans-eQTLs, the analytical approaches employed are generally inefficient and lack statistical power to detect many weak effects, as they rely on detecting many individually significant eQTLs at a locus.

To circumvent these limitations, several methods have been proposed to leverage the expectation that trans-eQTLs will simultaneously affect multiple transcripts by jointly analyzing expression data for many genes. Both principal component and latent variable analysis have been used to identify trends in covariance induced by a trans-eQTL in the expression levels of its targets and to use these trends as meta-traits in an association or linkage test.³¹,⁴⁵ However, these approaches have yet to yield the substantial numbers of trans-eQTL effects required to explain the 88% of heritability explained by trans-acting factors. More recently, a novel tensor decomposition method has been developed and shown to detect trans-eQTLs affecting many genes across multiple tissues,⁴⁶ indicating that trans-eQTL detection is a tractable analytical problem in existing datasets. Our own approach complements these efforts by looking for changes to the distribution of test statistics, rather than widespread trends in variance in the underlying expression data. Cumulatively, these efforts demonstrate that both the modest effect sizes of trans-acting variants¹⁷,²⁰ and the systematic noise in gene expression assays³⁰ can be addressed by novel analytical methods¹⁶,³¹ to maximize the insights gleaned from current datasets. We note that our approach is geared toward detecting trans-eQTLs influencing many genes, at the cost of low power to detect effects on single genes or a small number of targets. However, larger sample sizes will be required to estimate the relative contributions of trans-eQTLs affecting many genes and those affecting few targets to the overall heritability of transcript levels, so we cannot yet gauge how widespread variation in large transcriptional control networks may be.

As larger sample sizes are collected in eQTL studies and profiled with more technically robust assays such as RNA sequencing,³⁷ the statistical power to detect trans-acting eQTLs will increase.⁴⁷ This will be particularly true in the GTEx project,²⁵ which aims to profile more than 40 tissues from 1,000 donors (though the current public release, v6p, includes ∼330 donors, equivalent to the dataset used in our analysis). Increases in sample size will also help overcome the increased variability and consequent loss of statistical power when profiling tissue samples composed of heterogeneous cell types, where a trans-eQTL may manifest in only a subset of profiled cells. Combined with emerging analytical methods, these larger datasets will increase our ability to detect the structure of regulatory networks and thus the organization of biological processes. Our current results are consistent with precise regulation of biological processes at such a large scale, particularly for basic homeostatic mechanisms, and further support the notion that regulation of basic cell processes is highly orchestrated and occurs on several levels simultaneously.⁴⁸ Applying this approach to eQTL datasets from diverse tissues under different stimuli will yield rich insights into tissue-specific regulatory circuits driving diverse cellular processes. Finally, we note that biological exploration and dissection of these pathways will require new experimental tools that can address the subtleties of quantitative regulatory changes in large numbers of genes.

Acknowledgments

Computing resources at Yale were funded partly by NIH grants RR19895 and RR029676-01. B.B. was supported by a post-doctoral fellowship from the Swedish Research Council (grant 524-2012-6881).

Published: March 9, 2017

Footnotes

Supplemental Data include nine figures and four tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2017.02.004.

Appendix A

Finding the Optimal Number of Principal Components to Include

We estimated the optimal number of principal components to include for each population. Using the entire set of genes in this iterative analysis is computationally prohibitive, so we chose a subset of 100 random genes. Fifty of these genes were randomly selected among those within the top 95% of genomic inflation factor λ_gc to ensure that we were able to correct most of the extreme biases. In each population, we iteratively included 1 to 30 principal components as covariates for each gene in a linear regression. The distributions of λ_gc per number of principal components included are shown in Figure S4. The smallest number of principal components required to correct for population stratification was 2 for YRI, 10 for LWK, and 20 for MKK.
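The selection logic can be sketched as follows. The genomic inflation factor is computed here with its usual definition (median of the squared z-scores divided by the median of a χ² distribution with one degree of freedom, approximately 0.4549). The stopping rule, a median λ_gc across genes at or below a tolerance of 1.05, is an assumption for illustration only; the text above does not state the exact criterion used to read the optimum from Figure S4.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def lambda_gc(z_scores):
    """Genomic inflation factor for one gene's genome-wide eQTL z-scores."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def smallest_sufficient_k(lambda_by_k, tolerance=1.05):
    """Smallest number of principal components whose median lambda_gc across
    genes does not exceed the tolerance (hypothetical stopping rule)."""
    for k in sorted(lambda_by_k):
        if median(lambda_by_k[k]) <= tolerance:
            return k
    return None


# Hypothetical per-gene lambda_gc values after including k components.
lambda_by_k = {
    1: [1.40, 1.30, 1.35],
    2: [1.04, 1.02, 1.06],
    3: [1.00, 1.01, 1.02],
}
print(smallest_sufficient_k(lambda_by_k))
```

In a full analysis, `lambda_by_k[k]` would hold one λ_gc per sampled gene, each obtained from a regression of expression on genotype with the first k principal components as covariates.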

Shrinkage of Covariance Matrices

We estimated the covariance between the eQTL z-score vectors of 9,085 genes based on 737,867 eQTL z-scores using the cov function in R (scaling and centering each vector), which uses maximum likelihood estimation. Inferring large covariance matrices from sparse genomic data can be problematic, and we therefore evaluated whether a shrinkage approach would produce better-conditioned covariance matrices. For this test we used a subset of the MKK data: 10% of the genes (900) and 10% of the eQTL z-scores (73,786) per gene. We estimated the covariance of this subset using the cov function in R and then shrank the covariance matrix using the cov.shrink function from the corpcor package (v. 1.6.8). Both covariance matrices had full rank (900), but the shrunken covariance matrix has a larger condition number (1,253) than the originally estimated covariance matrix (1,442). We therefore chose not to shrink our covariance matrices.

The three full-size covariance matrices are all positive definite with full rank.

Web Resources

Code, http://www.github.com/cotsapaslab/CPMAtranseqtl
E-MTAB-264, http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-264/
HapMap (accessed 2014-06-18) Genotype data, ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/

Supplemental Data

Document S1. Figures S1–S9 and Tables S1–S4

mmc1.pdf (2.6 MB, PDF)

Document S2. Article plus Supplemental Data

mmc2.pdf (3.1 MB, PDF)

References

1. Castle J.C., Zhang C., Shah J.K., Kulkarni A.V., Kalsotra A., Cooper T.A., Johnson J.M. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat. Genet. 2008;40:1416–1425. doi: 10.1038/ng.264.
2. Choy E., Yelensky R., Bonakdar S., Plenge R.M., Saxena R., De Jager P.L., Shaw S.Y., Wolfish C.S., Slavik J.M., Cotsapas C. Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines. PLoS Genet. 2008;4:e1000287. doi: 10.1371/journal.pgen.1000287.
3. Gerstein M.B., Kundaje A., Hariharan M., Landt S.G., Yan K.K., Cheng C., Mu X.J., Khurana E., Rozowsky J., Alexander R. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489:91–100. doi: 10.1038/nature11245.
4. Barabási A.L., Oltvai Z.N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004;5:101–113. doi: 10.1038/nrg1272.
5. Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794.
6. Trynka G., Sandor C., Han B., Xu H., Stranger B.E., Liu X.S., Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 2013;45:124–130. doi: 10.1038/ng.2504.
7. Price A.L., Patterson N., Hancks D.C., Myers S., Reich D., Cheung V.G., Spielman R.S. Effects of cis and trans genetic ancestry on gene expression in African Americans. PLoS Genet. 2008;4:e1000294. doi: 10.1371/journal.pgen.1000294.
8. Rockman M.V., Kruglyak L. Genetics of global gene expression. Nat. Rev. Genet. 2006;7:862–872. doi: 10.1038/nrg1964.
9. Yvert G., Brem R.B., Whittle J., Akey J.M., Foss E., Smith E.N., Mackelprang R., Kruglyak L. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 2003;35:57–64. doi: 10.1038/ng1222.
10. Bystrykh L., Weersing E., Dontje B., Sutton S., Pletcher M.T., Wiltshire T., Su A.I., Vellenga E., Wang J., Manly K.F. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat. Genet. 2005;37:225–232. doi: 10.1038/ng1497.
11. Chesler E.J., Lu L., Shou S., Qu Y., Gu J., Wang J., Hsu H.C., Mountz J.D., Baldwin N.E., Langston M.A. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 2005;37:233–242. doi: 10.1038/ng1518.
12. Esparza-Gordillo J., Weidinger S., Fölster-Holst R., Bauerfeind A., Ruschendorf F., Patone G., Rohde K., Marenholz I., Schulz F., Kerscher T. A common variant on chromosome 11q13 is associated with atopic dermatitis. Nat. Genet. 2009;41:596–601. doi: 10.1038/ng.347.
13. Schadt E.E., Monks S.A., Drake T.A., Lusis A.J., Che N., Colinayo V., Ruff T.G., Milligan S.B., Lamb J.R., Cavet G. Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003;422:297–302. doi: 10.1038/nature01434.
14. Cheung V.G., Spielman R.S., Ewens K.G., Weber T.M., Morley M., Burdick J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005;437:1365–1369. doi: 10.1038/nature04244.
15. Brem R.B., Yvert G., Clinton R., Kruglyak L. Genetic dissection of transcriptional regulation in budding yeast. Science. 2002;296:752–755. doi: 10.1126/science.1069516.
16. Kang H.M., Ye C., Eskin E. Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics. 2008;180:1909–1925. doi: 10.1534/genetics.108.094201.
17. Battle A., Mostafavi S., Zhu X., Potash J.B., Weissman M.M., McCormick C., Haudenschild C.D., Beckman K.B., Shi J., Mei R. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 2014;24:14–24. doi: 10.1101/gr.155192.113.
18. Dimas A.S., Deutsch S., Stranger B.E., Montgomery S.B., Borel C., Attar-Cohen H., Ingle C., Beazley C., Gutierrez Arcelus M., Sekowska M. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science. 2009;325:1246–1250. doi: 10.1126/science.1174148.
19. Stranger B.E., Montgomery S.B., Dimas A.S., Parts L., Stegle O., Ingle C.E., Sekowska M., Smith G.D., Evans D., Gutierrez-Arcelus M. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 2012;8:e1002639. doi: 10.1371/journal.pgen.1002639.
20. Grundberg E., Small K.S., Hedman A.K., Nica A.C., Buil A., Keildson S., Bell J.T., Yang T.P., Meduri E., Barrett A., Multiple Tissue Human Expression Resource (MuTHER) Consortium. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 2012;44:1084–1089. doi: 10.1038/ng.2394.
21. Small K.S., Hedman A.K., Grundberg E., Nica A.C., Thorleifsson G., Kong A., Thorsteindottir U., Shin S.Y., Richards H.B., Soranzo N., GIANT Consortium, MAGIC Investigators, DIAGRAM Consortium, MuTHER Consortium. Identification of an imprinted master trans regulator at the KLF14 locus related to multiple metabolic phenotypes. Nat. Genet. 2011;43:561–564. doi: 10.1038/ng.833.
22. Westra H.J., Peters M.J., Esko T., Yaghootkar H., Schurmann C., Kettunen J., Christiansen M.W., Fairfax B.P., Schramm K., Powell J.E. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 2013;45:1238–1243. doi: 10.1038/ng.2756.
23. Cotsapas C., Voight B.F., Rossin E., Lage K., Neale B.M., Wallace C., Abecasis G.R., Barrett J.C., Behrens T., Cho J., FOCiS Network of Consortia. Pervasive sharing of genetic effects in autoimmune disease. PLoS Genet. 2011;7:e1002254. doi: 10.1371/journal.pgen.1002254.
24. Donoho D., Jin J. Higher criticism for detecting sparse heterogeneous mixtures. Ann. Stat. 2004;32:962–994.
25. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
26. Team T.R.C.D. R Foundation for Statistical Computing; Vienna, Austria: 2014. R: A Language and Environment for Statistical Computing.
27. Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80.
28. Fraley C., Raftery A.E. Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 2002;97:611–631.
29. Durinck S., Spellman P.T., Birney E., Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 2009;4:1184–1191. doi: 10.1038/nprot.2009.97.
30. Williams R.B., Cotsapas C.J., Cowley M.J., Chan E., Nott D.J., Little P.F. Normalization procedures and detection of linkage signal in genetical-genomics experiments. Nat. Genet. 2006;38:855–856. doi: 10.1038/ng0806-855. author reply 856–859.
31. Stegle O., Parts L., Durbin R., Winn J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 2010;6:e1000770. doi: 10.1371/journal.pcbi.1000770.
32. Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161.
33. Nica A.C., Parts L., Glass D., Nisbet J., Barrett A., Sekowska M., Travers M., Potter S., Grundberg E., Small K., MuTHER Consortium. The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet. 2011;7:e1002003. doi: 10.1371/journal.pgen.1002003.
34. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
35. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
36. Beecham A.H., Patsopoulos N.A., Xifara D.K., Davis M.F., Kemppinen A., Cotsapas C., Shah T.S., Spencer C., Booth D., Goris A., International Multiple Sclerosis Genetics Consortium (IMSGC), Wellcome Trust Case Control Consortium 2 (WTCCC2), International IBD Genetics Consortium (IIBDGC). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 2013;45:1353–1360. doi: 10.1038/ng.2770.
37. Lappalainen T., Sammeth M., Friedländer M.R., ’t Hoen P.A., Monlong J., Rivas M.A., Gonzàlez-Porta M., Kurbatova N., Griebel T., Ferreira P.G., Geuvadis Consortium. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.
38. Alter O., Brown P.O., Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA. 2000;97:10101–10106. doi: 10.1073/pnas.97.18.10101.
39. Venables W.N., Ripley B.D. Springer; 2002. Modern Applied Statistics with S-PLUS.
40. Kalinka A.T. The probability of drawing intersections: extending the hypergeometric distribution. arXiv. 2013. arXiv:1305.0717.
41. de Bakker P.I., Ferreira M.A., Jia X., Neale B.M., Raychaudhuri S., Voight B.F. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet. 2008;17(R2):R122–R128. doi: 10.1093/hmg/ddn288.
42. Cheng C., Min R., Gerstein M. TIP: a probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles. Bioinformatics. 2011;27:3221–3227. doi: 10.1093/bioinformatics/btr552.
43. Falcon S., Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007;23:257–258. doi: 10.1093/bioinformatics/btl567.
44. Choi J., Shooshtari P., Samocha K.E., Daly M.J., Cotsapas C. Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation. PLoS Genet. 2016;12:e1006121. doi: 10.1371/journal.pgen.1006121.
45. Fusi N., Stegle O., Lawrence N.D. Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies. PLoS Comput. Biol. 2012;8:e1002330. doi: 10.1371/journal.pcbi.1002330.
46. Hore V., Viñuela A., Buil A., Knight J., McCarthy M.I., Small K., Marchini J. Tensor decomposition for multiple-tissue gene expression experiments. Nat. Genet. 2016;48:1094–1100. doi: 10.1038/ng.3624.
47. Fehrmann R.S., Jansen R.C., Veldink J.H., Westra H.J., Arends D., Bonder M.J., Fu J., Deelen P., Groen H.J., Smolonska A. Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA. PLoS Genet. 2011;7:e1002197. doi: 10.1371/journal.pgen.1002197.
48. Gilad Y., Rifkin S.A., Pritchard J.K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 2008;24:408–415. doi: 10.1016/j.tig.2008.06.001.

[bib1] 1.Castle J.C., Zhang C., Shah J.K., Kulkarni A.V., Kalsotra A., Cooper T.A., Johnson J.M. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat. Genet. 2008;40:1416–1425. doi: 10.1038/ng.264. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2.Choy E., Yelensky R., Bonakdar S., Plenge R.M., Saxena R., De Jager P.L., Shaw S.Y., Wolfish C.S., Slavik J.M., Cotsapas C. Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines. PLoS Genet. 2008;4:e1000287. doi: 10.1371/journal.pgen.1000287. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Gerstein M.B., Kundaje A., Hariharan M., Landt S.G., Yan K.K., Cheng C., Mu X.J., Khurana E., Rozowsky J., Alexander R. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489:91–100. doi: 10.1038/nature11245. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Barabási A.L., Oltvai Z.N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004;5:101–113. doi: 10.1038/nrg1272. [DOI] [PubMed] [Google Scholar]

[bib5] 5.Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] 6.Trynka G., Sandor C., Han B., Xu H., Stranger B.E., Liu X.S., Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 2013;45:124–130. doi: 10.1038/ng.2504. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] 7.Price A.L., Patterson N., Hancks D.C., Myers S., Reich D., Cheung V.G., Spielman R.S. Effects of cis and trans genetic ancestry on gene expression in African Americans. PLoS Genet. 2008;4:e1000294. doi: 10.1371/journal.pgen.1000294. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8.Rockman M.V., Kruglyak L. Genetics of global gene expression. Nat. Rev. Genet. 2006;7:862–872. doi: 10.1038/nrg1964. [DOI] [PubMed] [Google Scholar]

[bib9] 9.Yvert G., Brem R.B., Whittle J., Akey J.M., Foss E., Smith E.N., Mackelprang R., Kruglyak L. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 2003;35:57–64. doi: 10.1038/ng1222. [DOI] [PubMed] [Google Scholar]

[bib10] 10.Bystrykh L., Weersing E., Dontje B., Sutton S., Pletcher M.T., Wiltshire T., Su A.I., Vellenga E., Wang J., Manly K.F. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat. Genet. 2005;37:225–232. doi: 10.1038/ng1497. [DOI] [PubMed] [Google Scholar]

[bib11] 11.Chesler E.J., Lu L., Shou S., Qu Y., Gu J., Wang J., Hsu H.C., Mountz J.D., Baldwin N.E., Langston M.A. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 2005;37:233–242. doi: 10.1038/ng1518. [DOI] [PubMed] [Google Scholar]

[bib12] 12.Esparza-Gordillo J., Weidinger S., Fölster-Holst R., Bauerfeind A., Ruschendorf F., Patone G., Rohde K., Marenholz I., Schulz F., Kerscher T. A common variant on chromosome 11q13 is associated with atopic dermatitis. Nat. Genet. 2009;41:596–601. doi: 10.1038/ng.347. [DOI] [PubMed] [Google Scholar]

[bib13] 13.Schadt E.E., Monks S.A., Drake T.A., Lusis A.J., Che N., Colinayo V., Ruff T.G., Milligan S.B., Lamb J.R., Cavet G. Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003;422:297–302. doi: 10.1038/nature01434. [DOI] [PubMed] [Google Scholar]

[bib14] 14.Cheung V.G., Spielman R.S., Ewens K.G., Weber T.M., Morley M., Burdick J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005;437:1365–1369. doi: 10.1038/nature04244. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] 15.Brem R.B., Yvert G., Clinton R., Kruglyak L. Genetic dissection of transcriptional regulation in budding yeast. Science. 2002;296:752–755. doi: 10.1126/science.1069516. [DOI] [PubMed] [Google Scholar]

[bib16] 16.Kang H.M., Ye C., Eskin E. Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics. 2008;180:1909–1925. doi: 10.1534/genetics.108.094201. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] 17.Battle A., Mostafavi S., Zhu X., Potash J.B., Weissman M.M., McCormick C., Haudenschild C.D., Beckman K.B., Shi J., Mei R. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 2014;24:14–24. doi: 10.1101/gr.155192.113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] 18.Dimas A.S., Deutsch S., Stranger B.E., Montgomery S.B., Borel C., Attar-Cohen H., Ingle C., Beazley C., Gutierrez Arcelus M., Sekowska M. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science. 2009;325:1246–1250. doi: 10.1126/science.1174148. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] 19.Stranger B.E., Montgomery S.B., Dimas A.S., Parts L., Stegle O., Ingle C.E., Sekowska M., Smith G.D., Evans D., Gutierrez-Arcelus M. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 2012;8:e1002639. doi: 10.1371/journal.pgen.1002639. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] 20.Grundberg E., Small K.S., Hedman A.K., Nica A.C., Buil A., Keildson S., Bell J.T., Yang T.P., Meduri E., Barrett A., Multiple Tissue Human Expression Resource (MuTHER) Consortium Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 2012;44:1084–1089. doi: 10.1038/ng.2394. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] 21.Small K.S., Hedman A.K., Grundberg E., Nica A.C., Thorleifsson G., Kong A., Thorsteindottir U., Shin S.Y., Richards H.B., Soranzo N., GIANT Consortium. MAGIC Investigators. DIAGRAM Consortium. MuTHER Consortium Identification of an imprinted master trans regulator at the KLF14 locus related to multiple metabolic phenotypes. Nat. Genet. 2011;43:561–564. doi: 10.1038/ng.833. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] 22.Westra H.J., Peters M.J., Esko T., Yaghootkar H., Schurmann C., Kettunen J., Christiansen M.W., Fairfax B.P., Schramm K., Powell J.E. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 2013;45:1238–1243. doi: 10.1038/ng.2756. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] 23.Cotsapas C., Voight B.F., Rossin E., Lage K., Neale B.M., Wallace C., Abecasis G.R., Barrett J.C., Behrens T., Cho J., FOCiS Network of Consortia Pervasive sharing of genetic effects in autoimmune disease. PLoS Genet. 2011;7:e1002254. doi: 10.1371/journal.pgen.1002254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] 24.Donoho D., Jin J. Higher criticism for detecting sparse heterogeneous mixtures. Ann. Stat. 2004;32:962–994. [Google Scholar]

[bib25] 25.Consortium G.T., GTEx Consortium The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] 26.Team T.R.C.D. R Foundation for Statistical Computing; Vienna, Austria: 2014. R: A Language and Environment for Statistical Computing. [Google Scholar]

[bib27] 27.Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] 28.Fraley C., Raftery A.E. Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 2002;97:611–631. [Google Scholar]

[bib29] 29.Durinck S., Spellman P.T., Birney E., Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 2009;4:1184–1191. doi: 10.1038/nprot.2009.97. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] 30.Williams R.B., Cotsapas C.J., Cowley M.J., Chan E., Nott D.J., Little P.F. Normalization procedures and detection of linkage signal in genetical-genomics experiments. Nat. Genet. 2006;38:855–856. doi: 10.1038/ng0806-855. author reply 856–859. [DOI] [PubMed] [Google Scholar]

[bib31] 31.Stegle O., Parts L., Durbin R., Winn J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 2010;6:e1000770. doi: 10.1371/journal.pcbi.1000770. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] 32.Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] 33. Nica A.C., Parts L., Glass D., Nisbet J., Barrett A., Sekowska M., Travers M., Potter S., Grundberg E., Small K., MuTHER Consortium The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet. 2011;7:e1002003. doi: 10.1371/journal.pgen.1002003.

[bib34] 34. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.

[bib35] 35. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.

[bib36] 36. Beecham A.H., Patsopoulos N.A., Xifara D.K., Davis M.F., Kemppinen A., Cotsapas C., Shah T.S., Spencer C., Booth D., Goris A., International Multiple Sclerosis Genetics Consortium (IMSGC), Wellcome Trust Case Control Consortium 2 (WTCCC2), International IBD Genetics Consortium (IIBDGC) Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 2013;45:1353–1360. doi: 10.1038/ng.2770.

[bib37] 37. Lappalainen T., Sammeth M., Friedländer M.R., ’t Hoen P.A., Monlong J., Rivas M.A., Gonzàlez-Porta M., Kurbatova N., Griebel T., Ferreira P.G., Geuvadis Consortium Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.

[bib38] 38. Alter O., Brown P.O., Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA. 2000;97:10101–10106. doi: 10.1073/pnas.97.18.10101.

[bib39] 39. Venables W.N., Ripley B.D. Springer; 2002. Modern Applied Statistics with S-PLUS.

[bib40] 40. Kalinka A.T. The probability of drawing intersections: extending the hypergeometric distribution. arXiv. 2013; arXiv:1305.0717.

[bib41] 41. de Bakker P.I., Ferreira M.A., Jia X., Neale B.M., Raychaudhuri S., Voight B.F. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet. 2008;17(R2):R122–R128. doi: 10.1093/hmg/ddn288.

[bib42] 42. Cheng C., Min R., Gerstein M. TIP: a probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles. Bioinformatics. 2011;27:3221–3227. doi: 10.1093/bioinformatics/btr552.

[bib43] 43. Falcon S., Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007;23:257–258. doi: 10.1093/bioinformatics/btl567.

[bib44] 44. Choi J., Shooshtari P., Samocha K.E., Daly M.J., Cotsapas C. Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation. PLoS Genet. 2016;12:e1006121. doi: 10.1371/journal.pgen.1006121.

[bib45] 45. Fusi N., Stegle O., Lawrence N.D. Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies. PLoS Comput. Biol. 2012;8:e1002330. doi: 10.1371/journal.pcbi.1002330.

[bib46] 46. Hore V., Viñuela A., Buil A., Knight J., McCarthy M.I., Small K., Marchini J. Tensor decomposition for multiple-tissue gene expression experiments. Nat. Genet. 2016;48:1094–1100. doi: 10.1038/ng.3624.

[bib47] 47. Fehrmann R.S., Jansen R.C., Veldink J.H., Westra H.J., Arends D., Bonder M.J., Fu J., Deelen P., Groen H.J., Smolonska A. Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA. PLoS Genet. 2011;7:e1002197. doi: 10.1371/journal.pgen.1002197.

[bib48] 48. Gilad Y., Rifkin S.A., Pritchard J.K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 2008;24:408–415. doi: 10.1016/j.tig.2008.06.001.
