Genetics. 2022 Nov 2;223(1):iyac162. doi: 10.1093/genetics/iyac162

Effect of all-but-one conditional analysis for eQTL isolation in peripheral blood

Margaret Brown 1, Emily Greenwood 2, Biao Zeng ✉,4, Joseph E Powell , Greg Gibson 3,
Editor: J Flint
PMCID: PMC9836021  PMID: 36321965

Abstract

Expression quantitative trait locus detection has become increasingly important for understanding how noncoding variants contribute to disease susceptibility and complex traits. The major challenges in expression quantitative trait locus fine-mapping and causal variant discovery relate to the impact of linkage disequilibrium on signals due to one or multiple functional variants that lie within a credible set. We perform expression quantitative trait locus fine-mapping using the all-but-one approach, conditioning each signal on all others detected in an interval, on the Consortium for the Architecture of Gene Expression cohorts of microarray-based peripheral blood gene expression in 2,138 European-ancestry human adults. We contrast these results with traditional forward stepwise conditional analysis and a Bayesian localization method. All-but-one conditioning significantly modifies effect-size estimates for 51% of 2,351 expression quantitative trait locus peaks, but only modestly affects credible set size and location. On the other hand, both conditioning approaches result in unexpectedly low overlap with Bayesian credible sets, with just 57% peak concordance and between 50% and 70% SNP sharing, leading us to caution against the assumption that any one localization method is superior to another. We also cross reference our results with ATAC-seq data, cell-type-specific expression quantitative trait locus, and activity-by-contact-enhancers, leading to the proposal of a 5-tier approach to further reduce credible set sizes and prioritize likely causal variants for all known inflammatory bowel disease risk loci active in immune cells.

Keywords: gene expression, quantitative trait loci, conditional analysis, fine-mapping, credible set

Introduction

Causal variant discovery is an increasingly important component of human genetics as it promises to improve individual genetic risk prediction, while also enhancing knowledge of the molecular mechanisms of disease. Most SNP to trait associations are documented at low resolution by genome-wide association studies (GWAS), generally discovering many variants in noncoding regions which are difficult to assign function to (ENCODE 2007; Farh et al. 2015). Statistical fine-mapping of such association signals precisely detects credible sets of SNPs associated with a particular trait, but rarely has the resolution to identify the causal variant (Emilsson et al. 2008; Chen et al. 2008; Zhang et al. 2013). Recognizing that regulation of gene expression is the likely mechanism of action, expression quantitative trait loci (eQTLs) are believed to provide a complementary source of information regarding which noncoding variants likely influence traits or disease susceptibility (Zeng et al. 2019; Wu et al. 2019). While recent studies demonstrate eQTLs in white blood cell types are often associated with autoimmune disease (Odhams et al. 2017), the example of Crohn’s disease shows fewer than 20% of GWAS and eQTL signals are likely to be coincident (Chun et al. 2017).

One of the major challenges in fine-mapping is overcoming the complexities imposed by linkage disequilibrium (LD), which results in misestimation both of allelic effect sizes and localization of the true peak (Kichaev et al. 2014; Visscher et al. 2017; Schaid et al. 2018). Sampling variance is sufficient to make distinction among variants in high LD ambiguous, and problems are compounded by interference among tightly linked causal SNPs. These can either reinforce one another, leading to overestimation of effect sizes, or act antagonistically, leading to underestimation of true effect sizes. As the number of contributing variants at a locus increases, accuracy decreases, and simulations show it is not uncommon for the causal variant(s) to be excluded from the fine-mapped credible set(s) (Zeng et al. 2017). Furthermore, embedding of credible sets within one another can increase the size of the observed set, hindering the specificity (Liu et al. 2021). The extent of these biases is not always recognized.

There are two current broad approaches to fine-mapping eQTLs by reducing the effects of LD and interference of tightly linked SNPs. One is based on conditional analysis, most often forward stepwise regression (FSR), where the effect of each newly identified peak SNP is included in subsequent modeling steps and identifies statistically independent effects (Hormozdiari et al. 2014; Jansen et al. 2017; Gudjonsson et al. 2022). In theory, if the identities of the true causal variants are known, then inclusion of all peaks should result in a linear model that optimizes the variance explained for the trait or transcript. With thousands of possible variants at any given locus, exhaustive searches are impractical, and instead the alternative strategy of Bayesian mapping has gained popularity (Kendziorski et al. 2006; Stegle et al. 2010; Fachal et al. 2020). These methods consider LD, and in some cases functional annotations, to generate posterior probabilities that highlight one or a handful of high priority candidates. We recently proposed a third approach, TreeMap, which evaluates associations contingent on the haplotype tree at the locus, demonstrating some gains in accuracy, but also highlighting disagreement among methods (Liu et al. 2021).

Each method utilizes raw genotypes and gene expression matrices, which are more difficult to work with than published summary statistics. Methods such as FINEMAP (Benner et al. 2016) and Sum of Single Effects (SuSiE; Wang et al. 2020) are competitive in speed and accuracy for estimating multi-site regulatory effects from summary data, and approaches to detecting sources of inaccuracy are under development (Zou et al. 2022). Version 2 of TIGAR (Parrish et al. 2022) implements a nonparametric Bayesian estimation of eQTL weights across a locus and integrates them with GWAS summary statistics for burden testing, while OTTERS extends this TWAS framework by also predicting gene expression from summary eQTL data (Dai et al. 2022).

Against this background of methodological improvement, we focus on exploring an enhancement of conditional association designed to improve allelic effect size estimation given a set of peaks identified by FSR. The idea of “all-but-one” modeling is to isolate each eQTL signal by conditioning on all other independent associations within a locus, thereby minimizing the effects of residual LD (Yang et al. 2012; Dobbyn et al. 2018). We demonstrate this method modifies effect estimates for many eQTL, and for a minority relocalizes the likely causal variant peak. We apply the approach to eQTL previously identified in the large Consortium for the Architecture of Gene Expression (CAGE) peripheral blood microarray gene expression dataset of 2,138 European ancestry adults (Lloyd-Jones et al. 2017). Of the 2,680 genes analyzed, 276 are associated with inflammatory bowel disease (IBD) by GWAS (de Lange et al. 2017), affording the opportunity to address how the fine-mapping compares with case–control European-ancestry GWAS meta-analysis summary statistics (Wu et al. 2019). We cross-reference our results to ATAC-seq peaks and to Bayesian mapping in the same dataset to evaluate whether the fine-mapping of additional IBD associations improves by all-but-one conditioning (AbO). Combining this information with three other public datasets, we further reduce credible set size and prioritize likely causal variants in a proposed 5-tier approach. The results suggest caution in assuming that any one statistical approach, Bayesian or frequentist, is more likely to fine map credible sets and causal variants.

Materials and methods

Gene expression data from the CAGE dataset

The gene expression and genotype dataset consists of 2,765 individuals whose whole genome genotype information was obtained from five studies that used Illumina microarray gene expression profiles: the Brisbane Systems Genetics Study (BSGS, N = 926 individuals), the Atlanta-based Center for Health Discovery and Well-Being (N = 439 individuals), the Emory Cardiology Genebank (N = 147 individuals), the Estonia Genome Centre of the University of Tartu (EGCUT, N = 1,065), and the Morocco Lifestyle study (N = 188). Gene expression levels were measured in whole blood, and genotype imputation to the 1000 Genomes Phase 1 Version 3 was performed on whole-genome genotypes including quality control steps described by Lloyd-Jones et al. (2017). We utilized the final genotype and normalized expression matrices described in that study. The BSGS included 461 pairs of twins (N = 922 total individuals). To avoid relatedness complications for downstream analysis, one family member was chosen at random for each family ID, which left 2,138 individuals of European ancestry for analysis across cohorts. To understand variance due to sampling, we repeated the random family-member selection five times for the 276 IBD-gene subset and compared the genes with 1 or more significant eQTL (P < 5 × 10⁻⁵) and the SNPs obtained in each credible set. Reported results are from one representative iteration, while replicates allow us to draw comparisons with the Bayesian mapping approach. All analyses were conducted with respect to the human reference genome GRCh37/hg19.
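The random selection of one member per family can be sketched as below. This is only an illustration under stated assumptions: the function name `pick_unrelated`, the sample IDs, and the family IDs are hypothetical, and the seed stands in for one of the repeated iterations described above.

```python
import random

def pick_unrelated(family_of, seed=0):
    """Choose one individual at random from each family ID.

    family_of: dict mapping individual ID -> family ID (hypothetical layout).
    Returns a sorted list with exactly one individual per family.
    """
    rng = random.Random(seed)
    by_family = {}
    for individual, family in family_of.items():
        by_family.setdefault(family, []).append(individual)
    # Sort members first so the draw depends only on the seed.
    return sorted(rng.choice(sorted(members)) for members in by_family.values())

# Hypothetical example: one twin pair (F1) and two singletons.
family_of = {"ind1": "F1", "ind2": "F1", "ind3": "F2", "ind4": "F3"}
chosen = pick_unrelated(family_of, seed=1)
print(chosen)
```

Changing `seed` gives a different draw, which is one way to mimic the repeated selections used to gauge sampling variance.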

eQTL identification and isolation

Traditional efforts to fine-map eQTLs involve forward stepwise conditional analysis. This method attempts to reduce the effects of LD and interference of tightly linked SNPs by conditioning on eSNPs (SNPs with the most significant association in an eQTL signal) to identify subsequent eQTLs (Hormozdiari et al. 2014), as described in Fig. 1a. While conditioning on eSNPs in a forward stepwise manner may remove effects of previously identified eQTLs, the effects of any subsequent eQTLs may still lead to inaccurate eQTL signals due to over- and underestimated effect sizes from SNPs either reinforcing one another or acting antagonistically, respectively (Zeng et al. 2017). A proposed “all-but-one” method to increase the precision of eQTL fine-mapping is to isolate eQTLs by conditioning on all identified eSNPs of eQTLs within a locus to reduce the effects of LD from all eQTLs both previously and subsequently identified (Dobbyn et al. 2018).

Fig. 1.


Explanation of forward stepwise conditional analysis modeling compared with all-but-one conditional analysis. The eSNPs detected and used are colored by the order they are identified to aid in visualizing their additive components in forward stepwise conditional analysis, and exclusions in all-but-one conditional analysis. Terms representing the intercept (mean expression) and variance (assumed to be normally distributed with a mean of zero) are excluded from the equations. βn represents the effect size associated with the nth significant eSNP, and β0 represents the estimate for all other SNPs being evaluated for a conditional effect. a) Forward stepwise conditional analysis is conducted by fitting eSNPs stepwise into PLINK’s linear association model to identify eSNPs one at a time. All significant eSNPs identified in previous steps are included as covariates, so each step adds one eSNP. b) All-but-one conditional analysis uses each forward-stepwise identified eSNP as a covariate when evaluating effects (β0) at all other SNPs, thus isolating eQTL signals.

A total of 2,680 genes were selected for eQTL identification and isolation based on evidence for the presence of eQTLs in the CAGE dataset obtained using PolyQTL, a Bayesian multi-eQTL mapping approach derived from DAP (Wen et al. 2016) that also accounts for identity-by-descent and population structure (Zeng et al. 2017). That study included both twins from the BSGS, so the sample size was slightly larger which may have affected the mapping, in addition to the modeling strategy. Since previous studies have suggested that the majority of cis-eQTLs are located within 500 kilobases (kb) of an associated gene’s transcription start site (TSS) (Wen et al. 2015; Dobbyn et al. 2018), a window of 1 megabase (Mb) was established for each gene by determining the positions 500 kb before and after each gene’s TSS resulting in variable amounts of intronic and 3′ intergenic DNA for each gene model.

A 2-part conditional association analysis was conducted using PLINK v1.9 (Purcell et al. 2007) to identify SNP-transcript abundance associations in a linear regression model. First, eQTLs were identified in a forward stepwise manner (Fig. 1a), incorporating each identified independent peak eSNP until no additional SNPs met the significance threshold. Next, the all-but-one method was applied to isolate each independent eQTL signal by conditioning on all eQTLs identified within the locus, excluding the eQTL being isolated (Fig. 1b). Instead of conditioning only on significant eSNPs identified previously, AbO waits until all significant SNPs have been identified and then re-evaluates the effect at each peak by recomputing associations across the locus with each of the other peak SNPs as covariates. Thus, if 3 associations were detected (SNP1, SNP2, and SNP3), 3 all-but-one models were evaluated, namely reidentifying SNP1 conditional on SNP2 and SNP3, then SNP2 on SNP1 and SNP3, and SNP3 on SNP1 and SNP2 (which is equivalent to the last forward stepwise model). A standard threshold of 5 × 10⁻⁵ was established to classify significance, based on permutation tests that established a conservative false discovery rate in our previous study (Zeng et al. 2017). All code is provided along with a worked example of CISD1 at GitHub: https://github.com/GibsonLab-GT/All-but-One.
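The covariate sets implied by the two schemes can be sketched as follows. This is a minimal illustration, not the authors' PLINK pipeline (which is available at the GitHub link above). The function names are hypothetical, and the peak labels follow the SNP1 to SNP3 example in the text.

```python
def forward_stepwise_models(peaks):
    """For each peak, in discovery order, list the covariates in the model
    that identified it: all peaks found in earlier steps."""
    return [(peak, list(peaks[:i])) for i, peak in enumerate(peaks)]

def all_but_one_models(peaks):
    """For each peak, list the covariates used to isolate it: every other
    peak at the locus, whether found earlier or later."""
    return [(peak, [p for p in peaks if p != peak]) for peak in peaks]

peaks = ["SNP1", "SNP2", "SNP3"]  # order of discovery by forward stepwise regression
fsr = forward_stepwise_models(peaks)
abo = all_but_one_models(peaks)

for target, covariates in abo:
    print(target, "conditioned on", covariates)
```

The last entries of `fsr` and `abo` coincide, which is why the last detected peak is excluded when the two approaches are compared.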

Defining credible sets

To compare credible set sizes across methods, we developed a heuristic approach to defining credible sets. Most studies adopt an (arbitrary) LD R2 threshold such as 0.8 to the peak SNP, but this approach would not provide any discrimination among credible set sizes when contrasting AbO and FSR since the SNPs included in these two analyses are the same. It may also assign SNPs to an inappropriate credible set in cases where two causal variants are in high LD. Instead, we borrowed from classical QTL literature where QTL peaks are often defined by 2 log of the odds (LOD) drop intervals (Lander and Botstein 1989). Since eQTL have vastly different effect sizes and hence significance levels, we define credible sets as all SNPs in high LD and within 20% of the negative log10(P-value), or NLP value, of the peak. Thus, if the peak NLP is 30, then all SNPs with R2 > 0.8 and NLP > 24 are placed within the credible set, whereas if the peak NLP is 15 the NLP cutoff is 12. In all cases, we also require that the variant has a greater LD R2 with the peak than with another independent peak in the locus. The choice of threshold is arbitrary but is generally consistent with LD R2 > 0.8 criteria and is mainly used here for illustrative purposes to demonstrate how conditional analysis affects inclusion. A more conservative 10% threshold is too strict for small effect sizes since manual inspection reveals very few SNPs in most of these cases. For Bayesian methods, credible sets are often defined as the summation of posterior inclusion probabilities greater than 0.95, but here we simply used R2 > 0.8 with the PolyQTL PIP peak.
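The heuristic above can be written as a short filter. The sketch below assumes a hypothetical data layout (a dict of NLP values and a dict of pairwise LD R2 values keyed by SNP and peak), and all SNP IDs and numbers are invented for illustration.

```python
def nlp_cutoff(peak_nlp, fraction=0.2):
    """NLP threshold: within `fraction` of the peak NLP (30 -> 24, 15 -> 12)."""
    return peak_nlp * (1.0 - fraction)

def credible_set(peak, nlp, r2, other_peaks=(), r2_min=0.8, fraction=0.2):
    """Return the SNPs assigned to the credible set of `peak`.

    nlp: dict SNP -> -log10(P) from the conditional analysis.
    r2:  dict (SNP, peak) -> LD R2; missing pairs are treated as 0.
    A SNP is kept if R2 with the peak exceeds r2_min, its NLP exceeds the
    cutoff, and it is in stronger LD with this peak than with any other peak.
    """
    cutoff = nlp_cutoff(nlp[peak], fraction)
    members = []
    for snp, value in nlp.items():
        if snp == peak:
            members.append(snp)
            continue
        r2_here = r2.get((snp, peak), 0.0)
        r2_elsewhere = max((r2.get((snp, p), 0.0) for p in other_peaks), default=0.0)
        if r2_here > r2_min and value > cutoff and r2_here > r2_elsewhere:
            members.append(snp)
    return members

# Invented example: rsA is the peak (NLP 30); rsP2 is another independent peak.
nlp = {"rsA": 30.0, "rsB": 27.0, "rsC": 20.0, "rsD": 26.0, "rsE": 25.0}
r2 = {("rsB", "rsA"): 0.95, ("rsC", "rsA"): 0.90, ("rsD", "rsA"): 0.60,
      ("rsE", "rsA"): 0.85, ("rsE", "rsP2"): 0.90}
members = credible_set("rsA", nlp, r2, other_peaks=["rsP2"])
print(members)
```

Here rsC fails the NLP cutoff, rsD fails the LD threshold, and rsE is in stronger LD with the other peak, so only rsA and rsB remain.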

Credible sets defined for our subset of IBD associated genes (165 of the 276 IBD loci observed to have at least 1 significant eQTL peak) were overlapped with those identified by the Bayesian detection (Zeng et al. 2019), allowing 139 comparisons. In each case, we initially explored the overlap only between the same eQTL credible sets, namely primary from AbO with primary Bayesian, secondary with secondary, and so forth. Subsequently, we explored overlap for all peaks detected at each locus.

Corroboration of identified eQTLs with Crohn’s disease

Annotations in Immunobase (https://genetics.opentargets.org/immunobase, accessed January 2019) were used to refine the GWAS list to 247 genes present in CAGE that have candidate causal associations with IBD. Summary statistics for Crohn’s Disease (15 studies), Ulcerative Colitis (16), and IBD (8) were downloaded through OpenTargets (https://www.opentargets.org, accessed November 2021) (Ghoussaini et al. 2021; Mountjoy et al. 2021), which added 29 genes to the IBD subset. Isolated eQTLs were obtained from CAGE and compared with case vs control Crohn’s disease European GWAS meta-analysis summary statistics (Franke et al. 2010) (https://www.ibdgenetics.org/downloads.html) to corroborate the presence of eQTLs in genes with at least one eQTL detected.

Cell specificity analysis of eQTLs in IBD

Schmiedel et al. (2018) conducted an eQTL study of 13 flow cytometry-sorted immune cell types to identify the specific source of cell type-specific blood eQTL signals. Their results reported in Database of Immune Cell Expression (DICE) were downloaded from https://dice-database.org/ to determine whether identified SNPs were within CAGE eQTL peaks. Fifty-six of the 165 IBD genes with an eQTL signal were available in DICE. These were associated with 41 eSNPs listed as a cell-type specific eQTL in DICE with 31 eSNPs associated with the primary eQTL and 10 eSNPs associated with the secondary.

Cross matching eQTL to ATAC-seq

Integration of epigenomics data with fine-mapping may provide additional insight into the mechanisms by which eQTL regulate gene expression (Zeng et al. 2022). Since SNPs in statistically fine-mapped credible sets generally have similar statistical support, we further asked whether one or more SNPs cross-match to open chromatin in the interval as assessed by ATAC-seq data from the NCBI Gene Expression Omnibus (Barrett et al. 2013) for 3 different immune cell subsets: HL-60 (GSM2083754) (Ramírez et al. 2017), Jurkat (GSM4005276) (Li et al. 2020), and Monocyte (GSM2679893) (Park et al. 2017). The HL-60 ATAC-seq data were transformed from GRCh38/hg38 to the GRCh37/hg19 assembly via the UCSC tool LiftOver (Kent et al. 2002; Karolchik et al. 2004). Custom ATAC-seq tracks were generated using pyGenomeTracks (Ramírez et al. 2018). ATAC-seq data for SNPs in credible sets were also cross-matched to Bayesian confidence sets (Zeng et al. 2019) to assess the overlap with open chromatin for this method. Ninety-eight percent of SNPs shared between the two methods had ATAC-seq data from the previously mentioned cell types.

Prioritization of credible set variants

The 5-tier prioritization method used to reduce the size of fine-mapped credible sets, and thereby limit the focus to a subset of highly probable causal variants, quantifies the degree of overlap between (1) predicted enhancers, (2) chromatin accessibility, (3) Bayesian credible sets, (4) published IBD GWAS results, and (5) OpenTargets prioritized IBD variants. Chromatin accessibility was evaluated using ATAC-seq peaks in the above-mentioned immune cell subsets (Park et al. 2017; Ramírez et al. 2017; Li et al. 2020). Previous IBD fine-mapped variants were collected from association studies conducted on approximately 68,000 individuals (Huang et al. 2017). Lead and tag variants for CD (from 15 published studies), UC (16), and IBD (8) were downloaded from OpenTargets (https://www.opentargets.org, accessed November 2021) (Ghoussaini et al. 2021; Mountjoy et al. 2021) and cross-matched to our credible sets. Predicted enhancers were obtained from a machine learning model, activity-by-contact (ABC), which improves causal variant identification by combining evidence of chromatin accessibility and promoter contacts (Fulco et al. 2019). The prioritization score is simply the number of these 5 categories observed at each locus divided by the total number of categories for which data were available for that variant.
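The score defined in the last sentence can be sketched as a simple ratio. The category keys and the example variant below are hypothetical labels chosen for illustration, with `None` marking a category for which no data were available.

```python
# Hypothetical labels for the 5 evidence categories listed above.
CATEGORIES = ("abc_enhancer", "atac_peak", "bayesian_set", "ibd_gwas", "opentargets")

def prioritization_score(evidence):
    """Observed categories divided by categories with available data.

    evidence: dict category -> True (observed), False (not observed),
              or None / missing (no data available for this variant).
    Returns None when no category has data.
    """
    available = [evidence[c] for c in CATEGORIES if evidence.get(c) is not None]
    if not available:
        return None
    return sum(1 for observed in available if observed) / len(available)

# Invented variant: data for 4 of 5 categories, 3 of them observed.
example = {"abc_enhancer": True, "atac_peak": True, "bayesian_set": False,
           "ibd_gwas": True, "opentargets": None}
score = prioritization_score(example)
print(score)
```

Dividing by the number of available categories, rather than by 5, avoids penalizing variants that are simply absent from one of the public datasets.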

Results

Simulation comparison of stepwise and Bayesian eQTL mapping

As background for evaluating the impact of AbO conditioning, we first summarize our previously conducted simulations of stepwise conditional and Bayesian eQTL mapping reported in Zeng et al. (2017) and Liu et al. (2021). Table 1 reviews 16 findings based on extensive permutation of the number, direction of effect, and level of LD among up to 4 linked causal variants superimposed on CAGE or 1000 Genomes European-ancestry genotype matrices for a similar sample size (1,835) as the empirical evaluations reported below. The main takeaway of these studies was that no fine-mapping methods accurately recall more than 85% of multisite regulatory eQTL because of interference among site effects. Phantom eQTL peaks not in strong LD with the true causal variant are observed surprisingly often; effect sizes and peak localization can be meaningfully misestimated, and significant disagreement among methods occurs for a quarter or more of peaks.

Table 1.

Summary of prior simulation multi-eQTL study findings.


Zeng et al. (2017) G3 [Simulated 1, 2, 3 or 4 eQTL affecting 500,000 loci in CAGE without twins, n = 1,839]
1. Sequential regression with up to 4 independent SNPs fails to tag >15% of eQTL; finds all 4 50% of the time
2. Phantom eQTL may be detected 10% of the time, notably when all SNPs operate in same direction at a locus
3. Interference due to linked allelic effects in opposite direction can reduce tagging efficiency to <75%
4. Incorporating all discovered SNPs into a joint multivariable model best estimates individual effects
5. Inefficient tagging of unimputed rare variants may drive as many as 10% of observed multisite peaks
6. Up to 15% of peak SNPs are not in high LD (R2 > 0.8) with the simulated causal variant
7. Mis-estimation biases (of effect size and location) increase with LD, allele frequency, and effect size
8. Bayesian mapping with eCaviar tagged 95% of 2-site eQTL and 85% of 3-site eQTL, similar to sequential reg.
9. Bayesian DAP slightly improved detection rates, tagging 88% of 4-site effects with 6% false discovery
10. In the presence of high LD and similar effect sizes, DAP performance drops and set sizes increase
Liu et al. (2021) Bioinformatics [Simulated 1, 2, 3 or 4 eQTL affecting 400 genes in 1000G Europeans, n = 1,835]
1. DAP better reports correct number of eQTL (70%) than CAVIARBF (∼60%), but is inferior to TreeMap (77%)
2. Lead recall rates (fraction of peak eQTL that are the causal variant) are consistently ∼58% for all methods
3. As the number of causal variants increases, precision-recall drops, and is slightly lower for sequential reg.
4. For r2 > 0.7 between linked causal variants, no methods correctly resolve both loci more than 30% of time
5. Stepwise analysis is more sensitive to LD than Bayesian methods, exhaustive search performs best
6. All methods can collapse linked peaks into a single incorrect peak, or split 1 peak into 2 spurious ones

Although Bayesian methods such as CAVIARBF (Chen et al. 2015) outperformed sequential stepwise regression by up to 10% in some conditions, that is not always the case, and they come at a substantial cost in computational burden. Given the prevalence of stepwise regression in transcriptome-wide analysis, we chose to use it as the foundation for the AbO analysis that follows, but note that similar conclusions are likely if peaks are chosen from Bayesian modeling. Our simulations also established the superiority of the Deterministic Approximation of Posteriors (DAP) approach in the presence of complex multisite regulation (Wen et al. 2016), again improving accuracy by up to 10%. Our PolyQTL method employed below is a modification of DAP that incorporates adjustment for ancestry and relatedness (Zeng and Gibson 2019) and gives identical results without either. Other Bayesian approaches, such as PAINTOR, have also been adapted to facilitate trans-ancestry mapping (Kichaev and Pasaniuc 2015). All of these simulations make it clear that no method is anywhere near as accurate as often assumed, and hence our results further quantify sources of error in eQTL, and by extension trait, fine-mapping. One limitation is that some of these methods, including DAP and PolyQTL, utilize raw genotype and gene expression data rather than more commonly available summary statistics.

Identification and isolation of eQTL in CAGE by stepwise regression

All-but-one conditional analysis was performed to identify and isolate eQTL in 2,680 genes previously found by Bayesian analysis in the CAGE dataset to harbor at least one regulatory interval. Of the 2,680 genes, 2,574 had a minimum of one eQTL by conditional analysis, and 1,528 had more than one eQTL, with an average of 1.91 independent eQTL per gene with at least one eQTL. As described in Zeng et al. (2017), there was a consistent tendency for between one-fifth and one-third of associations to yield a further independent association, and this pattern persisted for each new association. The number of genes with 1 through 9 independent signals were 1,046, 986, 360, 127, 34, 10, 5, 3, and 2 respectively, and one gene (HLA-DRB6) had 12 signals. There were 106 genes without eQTL in this analysis, possibly due to methodology, reduced sample size or noninclusion of family members. To address this loss of signal, we performed 5 iterations of selecting unrelated individuals. We observed 91% overlap of identical genes with at least one detected eQTL signal. This is similar to the overlap with the Bayesian analysis, implying that it is mainly due to stochastic sampling effects rather than inherent inferiority of frequentist analysis.
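The per-gene signal counts can be tallied to check the totals quoted in this section. The snippet below only re-adds numbers stated in the text. The variable names are illustrative.

```python
# Number of genes (values) with each count of independent eQTL signals (keys),
# as listed in the text; HLA-DRB6 is the single gene with 12 signals.
signals_per_gene = {1: 1046, 2: 986, 3: 360, 4: 127, 5: 34,
                    6: 10, 7: 5, 8: 3, 9: 2, 12: 1}

genes_with_eqtl = sum(signals_per_gene.values())
genes_with_multiple = genes_with_eqtl - signals_per_gene[1]
total_signals = sum(k * n for k, n in signals_per_gene.items())
mean_signals = total_signals / genes_with_eqtl

print(genes_with_eqtl, genes_with_multiple, total_signals, round(mean_signals, 2))
```

The tally reproduces the 2,574 genes with at least one eQTL, the 1,528 genes with more than one, and the mean of 1.91 signals per gene. The signal total of 4,925 matches the figure reported in the effect-size comparison.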

To visualize the isolated eQTL peaks, we adapted LocusZoom plots (Pruim et al. 2010) to color each SNP by reference to its LD with the closest association. The illustrative example of cis-eQTL affecting CAMK1D expression in Fig. 2 shows 4 independent association peaks. The x-axis represents the base pair position along the chromosome and the y-axis represents the −log10(P-value) for each SNP after the indicated conditional association analysis. The primary peak is blue, secondary green, tertiary yellow, and fourth red. In each case darker shading represents stronger LD (higher R2) with the peak eSNP for the credible set, defined as the SNP with the highest association. The left and right panels (Fig. 2a and b, respectively) compare the effects of FSR and AbO fitting. In this example the primary and quaternary SNPs operate in the same direction. AbO reduces the observed effect size for both associations, markedly (from NLP 44 to 12) for the primary association, which is also seen to have a small set of just 3 SNPs. In contrast, the signal for peak 3 is enhanced by conditioning on peak 4, which operates in the opposite direction. Conditioning on the fourth peak isolates peak 3, as shown in Fig. 2a and b.

Fig. 2.


eQTL isolation plots for gene CAMK1D. Plots display the 4 eQTL signals identified for CAMK1D. The shading is proportional to LD with the lead variant as indicated by the key in the top panels. Conditional analysis models are displayed below each plot. a) The eQTL signals from forward stepwise linear regression. b) The eQTL signals from all-but-one conditional analysis illustrate improved signal isolation for the first 3 eQTL signals. The fourth signal is the same between the two fine-mapping methods.

Characterization of the effect of all-but-one conditioning on eQTL estimates

We next evaluated the impact of AbO on 3 characteristics of eQTL identification: effect size, credible set size, and peak identity. Note, there is considerable variety in the degree to which eQTL signals are embedded within one another, or completely nonoverlapping at a locus. For example, signals 1 and 4 in Fig. 2 are fully embedded, there is some embedding with signal 3, and no embedding with signal 2 that sits about 600 kb away from the other signals. The majority (78%) of 1,528 genes with more than 1 eQTL contained at least 1 pair of overlapping eQTL, and 26% (314 genes) of these had a mixture of overlapping and nonoverlapping peaks.

Effect sizes were estimated using PLINK as the linear regression coefficient (beta) attributable to one copy of the allele relative to major allele homozygotes (Purcell et al. 2007). Table 2 lists the number of overlapping and nonoverlapping eQTL where the estimate either increased or decreased after AbO conditioning relative to the forward stepwise estimated peaks. A total of 4,925 eQTL were detected in 2,574 genes, but 1,046 of these genes had a single peak. After removing the last detected peak for each gene, since by definition the estimates are the same in both analyses, 2,351 peaks remain for comparison. Differential effect sizes were determined for any difference, and where the betas were at least 10% different. The observed proportions are minimally affected by this choice of threshold. A 2-by-2 contingency chi-square test shows that the proportions of increased or decreased effects are the same (∼50%) in both categories (P = 0.83). As a result, initial estimates are equally likely to be over- or under-estimated relative to all-but-one analysis, which meaningfully alters effect size estimation (i.e. at least a 10% change) for 51% of the eQTL. The effect of random twin selection was generally <2%. Thus, random selection was not responsible for the nondirectionality, which is more likely attributable to similar probabilities of variants increasing or decreasing expression. The magnitude of effect size change was typically much greater for overlapping sets as shown in Fig. 3. The largest observed change was more than 390%, corresponding to a 0.6 beta unit increase in magnitude. The median % change for overlapping eQTL was ∼13%, and 137 of the eQTL changed by more than 50%. In contrast, only a small minority of the nonoverlapping eQTL (n = 4) were affected by more than this amount, which is expected since interference from linked alternative associations does not occur.

Table 2.

Effect of all-but-one conditioning on effect size estimation.

eQTL signal type | No. of eQTLs (excluding last signal) | No. of genes | Increased (underestimated) effect sizes | Decreased (overestimated) effect sizes
Overlapping (all) | 1,781 | 1,196 | 868 eQTL (48.7%), 702 genes | 912 eQTL (51.2%), 728 genes
Not overlapping (all) | 569 | 483 | 281 eQTL (49.4%), 255 genes | 287 eQTL (50.4%), 260 genes
Overlapping (>10% different) | 1,047 | 818 | 515 eQTL (49.2%), 449 genes | 532 eQTL (50.8%), 464 genes
Not overlapping (>10% different) | 154 | 142 | 88 eQTL (57.1%), 83 genes | 66 eQTL (42.9%), 64 genes

Fig. 3.


Effect size comparisons between forward stepwise regression conditional analysis and all-but-one conditional analysis. a) Magnitude of effect size changes with FSR as the reference, such that “Increase+” implies a larger effect in the AbO estimate. b) Percent change in effect size. The two distributions on the left of each panel are overlapping credible sets, and distinct intervals not in LD are on the right.

Credible set sizes were calculated for all 2,351 eQTL present in genes with 2 or more independent peaks, excluding the last detected peak. Given the high degree of overlap, rather than using a simple LD R2 metric, we borrowed from the classical QTL literature that defines QTL limits in terms of 2-LOD drop intervals (Lander and Botstein 1989). Given the high variability in effect sizes, we chose heuristically to define credible sets as SNPs lying within 20% of the NLP maximum for the peak (see Materials and methods). If a SNP was included in 2 sets, it was assigned to the one with the highest degree of LD. By this criterion, approximately 30% of all credible sets change in size after all-but-one conditioning, of which 64% decreased the number of SNPs and 36% increased the number of SNPs. A change in credible set size in either direction was more likely to occur if that credible set overlapped at least one other (86% of decreased credible sets overlapped at least one other credible set; 73% of increased credible sets overlapped at least one other credible set). This highlights how eQTL fine-mapping can stratify embedded signals of overlapping peaks. In cases where credible set sizes changed, however, the reduction of the credible set size was about equally likely to occur where the effect size decreased (167 of 217) or increased (144 of 208). This indicates that signals may act in opposite directions or reinforce one another’s effects and thus demonstrates how eQTL fine-mapping can provide more accurate effect sizes by removing additive effects of surrounding SNPs.

In general, the number of SNPs involved was modest relative to the total size of the credible set. An example of this analysis for the 5 identified independent eQTL affecting AOAH expression is shown in Fig. 4. The first credible set decreased in size, with the effect size estimate changing from 0.38 to 0.26 (NLP from 35.2 to 13.9), because some of the SNPs in lower LD drop out of the credible set after conditioning. There is a slight increase in the second eQTL credible set size despite a small effect size decrease of 2.5% (NLP from 10.5 to 11.5), although the identities of some SNPs change. There was no change in the third peak, even with an effect size increase from 0.23 to 0.27 after AbO conditioning. The fourth peak increases in credible set size after AbO conditioning, despite a small (4%) decrease in effect size. In each of these 4 peaks, only a few SNPs drop in or out of a peak once it is isolated by including information on LD with other peaks, indicating that the peaks are cleanly isolated. The example of peak 1 demonstrates how AbO reduces the set size relative to the primary set defined without reference to any other peaks, in which case both the blue and green SNPs would have been included in the primary credible set.

Fig. 4.

eQTL isolation plots for gene AOAH. Plots display 5 eQTL signals identified for AOAH. Conditional analysis models are displayed below each plot. The red lines indicate the NLP threshold for defining credible sets. Beta values indicate the effect sizes for the eSNP, and n is the number of SNPs that met the criteria to be considered within the credible set. a) The eQTL signals from forward stepwise linear regression. b) The eQTL signals from all-but-one conditional analysis, which improved signal isolation for the first 3 eQTL signals. The fourth signal is the same between the two fine-mapping methods.

A third potential advantage of all-but-one conditioning is fine-mapping of the actual peak. Given statistical noise in high-LD intervals, it is common for the identity of the peak SNP to shift slightly. In some cases, we observed consequential disagreement between the two mapping approaches. We define consequential as situations where the peak after AbO conditioning is at least 10% different in effect size and has an LD R2 value <0.7 relative to the forward stepwise peak used to identify the eQTL. Of the 2,351 eQTL potentially affected, 1,893 (81%) retained the identical peak SNP, and of the 458 where differences were seen, 149 (33%) were deemed consequential. Figure 4 shows the example of AOAH in which the first eQTL signal shifts from rs79848624 to rs7780908, the fourth eQTL signal peak shifts from rs13438465 to rs10215905, and the second and third peaks are unaffected. The first peak signal shift is nonconsequential, as the LD R2 value of rs7780908 with rs79848624 is 0.90 and the effect size decreased by only 0.23%. The fourth peak signal shift is consequential, as the LD R2 value of rs10215905 with rs13438465 is 0.52 and the effect size decreased by 3.6%.

A possible explanation for eSNP disagreements is the influence of genotype by population interactions, namely differences in allele effects in one or more of the CAGE study populations. To evaluate this, we performed a 2-way ANOVA on the allelic effects of the FSR and AbO peak eSNPs for each set, adopting P < 5 × 10⁻⁵ as an approximate Bonferroni-adjusted threshold. Supplementary Figure 1 illustrates differences in relative transcript abundance for FSR eSNPs and AbO eSNPs for the same interval. Panel A is an example where the peak 1 FSR eSNP for CYP4V2 expression shows significant genotype by population interaction effects (Brisbane and Morocco as outlier studies) while the AbO eSNP on the right does not. Conversely, panel B illustrates population-specific effects for both the FSR and AbO eSNPs. Table 3 confirms that population differences in allele effects account for most cases of consequential disagreement. For intervals in which both FSR and AbO eSNPs had genotype by population interactions, only 12% of the cases were consequential (16 of 135). In contrast, for intervals in which only one of the 2 eSNPs (FSR or AbO) was found to have a genotype by population effect difference, 63% of cases were consequential (48 of 76) with the FSR eSNPs 4 times more likely to be the source of the interaction than AbO eSNPs (41 vs 7).

Table 3.

Distribution of both consequential and nonconsequential peak disagreements as a function of genotype by population interaction.

| Disagreement type | Both FSR and AbO eSNPs | FSR eSNP only | AbO eSNP only | Neither | Total |
| --- | --- | --- | --- | --- | --- |
| Consequential | 16 | 41 | 7 | 85 | 149 |
| Not consequential | 129 | 20 | 8 | 152 | 309 |
| Total | 135 | 61 | 15 | 237 | 458 |

All-but-one and Bayesian comparison

To evaluate differences between fine-mapping methods, credible sets from AbO conditional analysis were compared to those detected by the Bayesian mapping for 139 IBD genes (Zeng et al. 2019). 85% of genes had at least 1 SNP in a credible set that overlapped between the two methods. Of the 21 genes that had no overlap, 5 were due to differences in the length of the region included in the analyses and 16 were due to discordant inference. For example, Bayesian and AbO identified a significant eQTL for TYK2 within the same 100 kb region, but the peak SNPs identified were distinct. In Fig. 5, AbO variants are highlighted in blue and Bayesian in red on the custom LocusZoom plots based on P-values computed with the frequentist and Bayesian modeling approaches in panels A and B, respectively. This particular difference is likely due to the relatively small sample size, leading to low power to resolve the true peak.

Fig. 5.

Overlaid Bayesian and all-but-one credible sets from eQTL for gene TYK2. a) Isolated eQTL from all-but-one analysis with significance represented by –log(P-value) on the y-axis. b) Isolated eQTL from Bayesian analysis with converted P-values represented on the y-axis. All-but-one credible set SNPs are highlighted in blue with darker shading indicating higher LD with the lead variant. Bayesian credible set SNPs are highlighted in red with the same shading principle.

To determine whether sampling differences might account for these differences, we generated 5 iterations of AbO analysis, each randomly selecting one individual from each family. These iterations yielded 90% similarity at the gene level and 78% similarity for all credible sets. Three-quarters of the 16 genes were consistently discordant between methods across all 5 iterations, including the TYK2 case. Some of the mismatch between credible sets is therefore due to sampling variance, but for a small percentage of cases, there is a complete lack of concordance between the Bayesian and AbO fine-mapping methods.

Expanding the analysis to consider all eQTL peaks at the 139 IBD loci, 214 eQTL detected by AbO had a Bayesian credible set to compare. Of these, 57% of credible sets had a SNP that overlapped between the 2 methods. The average percentage of SNPs that overlapped the same credible set was 54% for AbO sets and 50% for Bayesian. These percentages were slightly higher when comparing the overlap of all credible sets detected per gene (70% and 60%, respectively), because of instances where secondary peaks match primary peaks.
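The overlap statistics reported here (whether two credible sets share any SNP, and what percentage of each set is shared) can be computed per eQTL with simple set operations. The sketch below is illustrative only: the function name and the toy SNP identifiers are hypothetical, and it assumes both sets are non-empty.

```python
def overlap_summary(abo_set, bayes_set):
    """Summarize SNP sharing between an AbO and a Bayesian credible set."""
    if not abo_set or not bayes_set:
        raise ValueError("both credible sets must contain at least one SNP")
    shared = abo_set & bayes_set
    return {
        "any_shared": bool(shared),
        "pct_abo_shared": 100.0 * len(shared) / len(abo_set),
        "pct_bayes_shared": 100.0 * len(shared) / len(bayes_set),
    }


# Toy credible sets for one eQTL (placeholder identifiers, not study data).
abo = {"rs1", "rs2", "rs3", "rs4"}
bayes = {"rs3", "rs4", "rs5", "rs6", "rs7"}
summary = overlap_summary(abo, bayes)
```

Here 2 SNPs are shared, which is 50% of the AbO set and 40% of the Bayesian set. Averaging the two percentages over all compared eQTL would give summary figures of the kind quoted above.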

Corroborating evidence for eQTL in IBD

Whole blood is a complex mixture of more than a dozen cell types, and it is known that a proportion of eQTL is cell type-specific (Mizuno and Okada 2019). Schmiedel et al. (2018) mapped eQTL in 15 flow-sorted immune cell populations from 100 healthy human donors, generating the Database of Immune Cell Expression, Expression quantitative trait loci (eQTLs), and Epigenomics (DICE). Of the 165 genes associated with IBD with at least 1 whole blood eQTL in CAGE, 56 have 41 shared eSNPs listed in the DICE database. Figure 6 shows these associations collapsed to 15 immune cell types. The size of each dot represents the percentage of SNPs in our AbO credible set that matched DICE associations with the gene in a given cell type, and the color represents the average DICE P-value measured for each SNP. Panel A reports credible set SNPs from peak 1 eQTL, while panel B reports credible set SNPs from peak 2 eQTL overlapping the DICE database. Of the 62 total eQTL associations, an average of 5 cell types were represented per eQTL, 80% with the primary eQTL, and 20% with the secondary eQTL.

Fig. 6.

Cross-matching Crohn’s disease eQTL to the DICE database. a) Dotplot depicting P-value associations from DICE between Crohn’s eSNP associations with 15 immune cell types for peak 1 eQTL. b) Dotplot depicting P-value associations from DICE between Crohn’s eSNP associations with 14 immune cell types for peak 2 eQTL.

The lymphoid (B, T, and NK cells) and myeloid (monocyte) lineages have the most distinct profiles, while CD4+ T cells show the largest number of eQTL. The eSNP of the primary eQTL for the gene ERAP2 has very strong associations with each cell type (Fig. 6a), whereas LGALS9 and PNKD have monocyte-specific eQTL. The eSNP in RIPK2, rs40380, was uniquely observed in regulatory T cells, further implicating this SNP in that cell type as the causal regulatory variant for IBD. The gene TMEM50B has two eQTL whose respective eSNPs have different cell type associations: the primary eQTL has strong associations in all cell types, whereas the secondary eQTL only has associations in helper T cells.

Given evidence that causal variants tend to lie within accessible chromatin (Maurano et al. 2012; Trynka et al. 2013; Wen et al. 2017; Soskic et al. 2019; Pan et al. 2020), we crossmatched eQTL peaks to ATAC-seq peaks described in 3 different immune cell subsets, namely the HL-60 myeloid cell line (GSM2083754) (Ramírez et al. 2017), the Jurkat T-cell line (GSM4005276) (Li et al. 2020), and primary monocytes (GSM2679893) (Park et al. 2017). Of the 270 AbO eQTL for 161 IBD genes, 72% aligned with an ATAC-seq peak in at least one of these datasets, with the highest proportion of alignments observed in the primary peak. Supplementary Figure 2 shows an example in the SPHK2 locus where two independent eQTL peaks align with two ATAC-seq peaks in an overlapping interval. This suggests there may be a causal SNP or group of SNPs in each peak responsible for the signal. The aligned plots for AbO and custom ATAC-seq tracks generated through pyGenomeTracks (Ramírez et al. 2018) for these 161 IBD genes are publicly available at https://eqtlhub-gt.shinyapps.io/shiny/. ATAC-seq data obtained for SNPs in AbO credible sets were also crossmatched to Bayesian sets. Of the SNPs in Bayesian sets overlapping AbO sets, we had ATAC-seq data for 98% of variants, of which 20% overlapped with an ATAC-seq peak in one or more cell lines. This was similar to the overlap of AbO credible set SNPs with chromatin accessibility.

SNP prioritization

Credible set variants can be further prioritized based on previous implication in published studies and fine-mapping methods. We quantified the overlap between AbO credible sets and ABC enhancers (Fulco et al. 2019; 18.1%), ATAC-seq data (20.5%), Bayesian credible sets (72%), IBD GWAS variants (Huang et al. 2017; 12.9%), and OpenTargets IBD lead and tagged variants (Ghoussaini et al. 2021; Mountjoy et al. 2021; 23.7%). Although the overlap between a credible set and each of these datasets individually is informative, we reasoned that combining the overlap percentages in a prioritization score may be more effective. Our proposed 5-tier approach prioritizes causal variants based on the overlap percentage adjusted for the data available for a given variant. This method allowed us to reduce the number of likely causal variants in our 275 credible sets for the 165 IBD gene subset. Resolution of prioritized variants increases with stricter prioritization scores, as some credible sets shrink to a handful of highly probable variants. Supplementary Figure 3 shows the relationship between priority score and number of variants per refined credible set, indicating that most high-priority SNPs were found in small sets. Our prioritized list reduces 4,195 credible set variants to 442 high-probability causal variants with a prioritization score of 60% or greater. Since the score is weighted by the number of data points available for the SNP, a high score is seen in a few cases because only one category was available. This is indicated in the data reported on our Shiny app. The highest priority variants should be considered for experimental validation.
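One way to picture a score that is "adjusted for the data available" is as the percentage of evidence categories with data in which the variant shows overlap. The sketch below makes that assumption explicitly; the exact weighting used for the published score is not specified in this section, so the equal weighting, the category keys, and the function name are all hypothetical.

```python
# The five evidence categories named in the text (key names are hypothetical).
CATEGORIES = ("abc_enhancer", "atac_seq", "bayesian_set", "ibd_gwas", "opentargets")


def priority_score(evidence):
    """Percent of available categories in which the variant shows overlap.

    `evidence` maps a category to True/False, or to None (or omits it) when
    no data exist for the variant. Equal weighting is an assumption.
    """
    available = [evidence[c] for c in CATEGORIES if evidence.get(c) is not None]
    if not available:
        return None  # no data in any category, so no score
    return 100.0 * sum(available) / len(available)


# Variant with data in four categories, overlapping three of them.
well_covered = {"abc_enhancer": True, "atac_seq": True, "bayesian_set": True,
                "ibd_gwas": False, "opentargets": None}
# Variant with data in a single category only.
sparse = {"atac_seq": True}

scores = (priority_score(well_covered), priority_score(sparse))
```

Under this assumed scheme the sparsely annotated variant scores 100% from a single category while the well-covered variant scores 75%, which mirrors the caveat above that a few high scores arise because only one category was available.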

Discussion

This is the seventh of a series of studies using the CAGE dataset to characterize the genetic architecture of gene expression in peripheral blood. The primary paper (Lloyd-Jones et al. 2017) estimated a median heritability of 0.14 for two-thirds of all probes on the Illumina microarrays, namely ∼10,000 genes, with half of this heritability explained by common variants in trans, and one-quarter by the lead eQTL in cis. Three studies have investigated the effect of purifying selection on SNP heritability (Zeng et al. 2018), the covariance of blood gene expression (Lukowski et al. 2017), and autosomal regulation across sexes (Kassam et al. 2016). We used CAGE for a simulation study of multi-SNP cis-acting transcriptional regulation (Zeng et al. 2017), inferring that constraints due to interference among closely linked eSNPs affect the localization of eQTL peaks pervasively and may result in the true causal variant lying outside mapped credible sets at least 5% of the time, likely more. We documented the instance of multi-SNP regulation using a Bayesian algorithm modified to adjust for relatedness (Zeng and Gibson 2019), finding disturbingly low concordance in comparison with another large blood eQTL dataset from the Framingham Heart Study (Huan et al. 2015). Here, we estimate the isolated effects of all cis-eQTL detected by stepwise regression and compare the localization with the Bayesian results, once again concluding that systematic constraints on fine-mapping should caution against overconfidence that mapped intervals are precise.

Our main finding is that implementing the AbO mapping procedure significantly changes the magnitude of estimated effect sizes for approximately half of all probes compared with more standard conditional analyses that model effects in a stepwise manner. In a small minority of cases, the localization of the peak eQTL is shifted such that the credible sets from the two methods are nonoverlapping. There are essentially three modes of estimating effect sizes considered here: univariate, FSR marginal, and AbO isolated marginal. The univariate approach of simply reporting the point estimate independent of all other SNPs in the region is the least reliable, yet is frequently reported. The widely used Blood eQTL portal now reports univariate associations from over 30,000 samples (Võsa et al. 2021; https://www.eqtlgen.org/cis-eqtls.html). Unfortunately, it is common for investigators to validate their GWAS SNP as “functional” due to significant association in this database, whereas colocalization evidence is formally required. Marginal estimates derived from conditional analysis in the context of other associations in the region provide evidence for statistical independence, although in blood the interpretation is complicated by variable cell-type proportions (Zhernakova et al. 2017). In theory, marginal effects also provide the best estimates of the combined effects of all eQTL on the expression of a gene once they have been localized. The AbO approach to isolating effects should improve on FSR because it includes all linked effects, not just those antecedent to it in the discovery pipeline. One interpretation of our findings is that these isolated effect sizes tend to be larger than inferred from FSR effects, providing context for the conclusion in Lloyd-Jones et al. (2017) that the lead eQTL typically explains more than three-quarters of the variance at a locus, since those estimates were based on FSR.
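The difference between the FSR and AbO models comes down to which other peak SNPs enter as covariates when a given signal is estimated. The following sketch only builds the model formulas and fits nothing; the SNP labels, function names, and formula syntax are hypothetical and serve purely as an illustration.

```python
def fsr_covariates(peaks, k):
    """Forward stepwise: condition peak k only on peaks discovered before it."""
    return peaks[:k]


def abo_covariates(peaks, k):
    """All-but-one: condition peak k on every other peak detected in the interval."""
    return peaks[:k] + peaks[k + 1:]


def model_formula(target, covariates, response="expression"):
    """Write the conditional model as a formula string (illustrative syntax)."""
    return f"{response} ~ " + " + ".join([target] + covariates)


# Five independent signals listed in order of discovery (placeholder labels).
peaks = ["snp1", "snp2", "snp3", "snp4", "snp5"]

fsr_models = [model_formula(p, fsr_covariates(peaks, k)) for k, p in enumerate(peaks)]
abo_models = [model_formula(p, abo_covariates(peaks, k)) for k, p in enumerate(peaks)]
```

For the second signal, the FSR model is `expression ~ snp2 + snp1`, whereas the AbO model is `expression ~ snp2 + snp1 + snp3 + snp4 + snp5`. For the last detected signal the two covariate sets coincide, which is consistent with the last peak being excluded from the comparisons.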

The second major finding of this study is the limited overlap between the credible sets defined by conditional linear modeling and our previously reported Bayesian analysis. Part of the discrepancy is due to the treatment of twins from the Brisbane Systems Genomics Study: both twins were included, with random effect adjustment for relatedness, in the Bayesian Poly-QTL modeling (Zeng and Gibson 2019), but only one, chosen by random selection, was included in this study. Permutation of the random selection modifies AbO marginal effect estimates by a median of only 1.4%, which is 6 times less than the median effect of comparing FSR to AbO conditioning, so this is not the major source of the differential mapping. Relaxation of thresholds for defining credible sets in both studies also increases overlap, but at the risk of increasing false positive colocalization. The choice of a 20% NLP drop interval is arbitrary, as is the 80% PIP in the Bayesian analysis, and may be conservative. Typical credible sets often range from 10 to 50 SNPs but can reach upwards of 100 even with these criteria. Given the popularity of Bayesian methods such as Caviar (Hormozdiari et al. 2014) and DAP (Wen et al. 2016) for fine-mapping regulatory intervals in support of GWAS interpretation, our results suggest caution. We encourage comparison of several different mapping approaches before investments are made in downstream studies based on the assumption that one method is superior to another. There is reason to expect that Bayesian and conditional mapping in the presence of multiple associations at a locus may yield varying results for standard GWAS-based complex trait mapping.

One of the areas where the accuracy of fine-mapping matters is tying regulatory variation to GWAS interpretation. The expectation that blood eQTL colocalize with IBD associations was poorly met in large-scale studies conducted 5 years ago (Farh et al. 2015; Huang et al. 2017): of 139 IBD loci, just 10 map to regulatory sites with single nucleotide resolution and another 72 to blood eQTL sets of 2–20 SNPs. Reasons for this include low power of most eQTL studies for multisite detection, the cellular complexity of blood, possible condition- and/or tissue residency-specific effects, systematic discovery biases in GWAS and eQTL analysis (Mostafavi et al. 2022), and the previously mentioned inherent constraints on fine-mapping. Some of these issues are now being overcome by single cell eQTL analysis, which directly observes condition-specific eQTL in specific immune cell types (Kundu et al. 2022; Soskic et al. 2022; Yazar et al. 2022). Numerous groups (e.g. Farh et al. 2015; Chen et al. 2016; Ghoussaini et al. 2021) have overlapped chromatin features, specifically differentially accessible regions (DARs) of chromatin defined by DNase hypersensitivity or ATAC-seq profiles, to help identify causal variants. A promising approach is ABC modeling, which matches DARs to promoters through chromatin contacts (Fulco et al. 2019), showing high precision and recall for GWAS signals in 30 of 37 immune-related IBD loci. We asked how many of these polymorphic regulatory elements overlap with eQTL peaks, identifying 152 cases. These results are encouraging with respect to the development of a revised ABC score that incorporates eQTL data into the algorithm. We expect that single cell RNA-seq data will be required to generate the desired resolution, likely in combination with emerging computational approaches to integrating multimodal genomic data (Chiou et al. 2021; Doke et al. 2021). In the meantime, we have generated a database of plausible credible sets and superimposed them on ATAC-seq data for all of the established IBD risk loci and incorporated this into our R-shiny browser at https://eqtlhub-gt.shinyapps.io/shiny/.

We conclude from the variable overlap of Bayesian and AbO eQTL mapping and of chromatin accessibility assays that experimental assays are essential to fine-mapping. However, even those may yield ambiguous results. It may be critical to perform experimental assays on the appropriate cell type, although extrachromosomal reporter assays may not capture true regulatory effects. We have developed a CROP-seq approach to systematically screen credible sets, which in a pilot experiment led to high confidence mapping of the IBD risk variants at CISD1 and PARK7 (Pan et al. 2020). Single cell perturbation assays combine moderate to high throughput with statistical power and the benefit of analyzing effects in the native chromatin context. Whether they are sufficient to resolve the molecular basis of regulatory associations to single causal variants consistently remains to be seen.

Acknowledgments

We thank all donors for their participation, as well as our CAGE collaborators Peter Visscher, Jian Yang, Grant Montgomery, Youssef Idaghdour, Arshed Quyyami, the late Kenneth Brigham, Andres Metspalu, and Tonu Esko.

Funding

This research was supported by a US National Institute of Human Genome Research grant awarded to GG, R01-HG-011459.

Contributor Information

Margaret Brown, Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Emily Greenwood, Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Greg Gibson, Center for Integrative Genomics, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Data Availability

Gene expression data for the CAGE studies are available from GEO under study accession no. GSE61672 (CHDWB), GSE49925 (CAD), GSE17065 (Morocco), GSE53195 (BSGS), and GSE48348 (EGCUT). All findings are reported on our R-Shiny at https://eqtlhub-gt.shinyapps.io/shiny/ including the SNP Prioritization score, and all raw data needed to repeat analyses. A working example for the gene CISD1 with code used to perform the analysis is available on Github at https://github.com/GibsonLab-GT/All-but-One.

Supplemental material is available at GENETICS online.

Literature cited

  1. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41(Database Issue):D991–D995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M.. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32(10):1493–1501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Chen W, Larrabee BR, Ovsyannikova IG, Kennedy RB, Haralambieva IH, Poland GA, Schaid DJ.. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics. 2015;200(3):719–736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chen W, McDonnell SK, Thibodeau SN, Tillmans LS, Schaid DJ.. Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics. 2016;204(3):933–958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ, Zhang C, Lamb J, Edwards S, Sieberts SK, et al. Variations in DNA elecudiate molecular networks that cause disease. Nature. 2008;452(7186):429–435. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Chiou J, Geusz RJ, Okino M-L, Han JY, Miller M, Melton R, Beebe E, Benaglio P, Huang S, Korgaonkar K, et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature. 2021;594(7863):398–402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Chun S, Casparino A, Patsopoulos NA, Croteau-Chonka DC, Raby BA, De Jager PL, Sunyaev SR, Cotsapas C.. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet. 2017;49(4):600–605. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Dai Q, Zhou G, Zhao H, Võsa U, Franke L, Battle A, Teumer A, Lehtimaki T, Raitakari O, Esko T, et al. OTTERS: a powerful TWAS framework leveraging summary-level reference data. bioRxiv. 10.1101/2022.03.30.486451v2, 2022. [DOI] [PMC free article] [PubMed]
  9. de Lange KM, Moutsianas L, Lee JC, Lamb CA, Luo Y, Kennedy NA, Jostins L, Rice DL, Gutierrez-Achury J, Ji S-G, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet. 2017;49(2):256–261. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Dobbyn A, Huckins LM, Boocock J, Sloofman LG, Glicksberg BS, Giambartolomei C, Hoffman GE, Perumal TM, Girdhar K, Jiang Y, et al. ; CommonMind Consortium . Landscape of conditional eQTL in dorsolateral prefrontal cortex and co-localization with schizophrenia GWAS. Am J Hum Genet. 2018;102(6):1169–1184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Doke T, Huang S, Qiu C, Sheng X, Seasock M, Liu H, Ma Z, Palmer M, Susztak K.. Genome-wide association studies identify the role of caspase-9 in kidney disease. Sci Adv. 2021;7(45):eabi8051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al. ; ENCODE Project Consortium . Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447(7146):799–816. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, Carlson S, Helgason A, Walters GB, Gunnarsdottir S, et al. Genetics of gene expression and its effect on disease. Nature. 2008;452(7186):423–428. [DOI] [PubMed] [Google Scholar]
  14. Fachal L, Aschard H, Beesley J, Barnes DR, Allen J, Kar S, Pooley KA, Dennis J, Michailidou K, Turman C, et al. ; ABCTB Investigators . Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat Genet. 2020;52(1):56–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJH, Shishkin AA, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518(7539):337–343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Franke A, Balschun T, Sina C, Ellinghaus D, Häsler R, Mayr G, Albrecht M, Wittig M, Buchert E, Nikolaus S, et al. ; IBSEN Study Group . Genome-wide association study for ulcerative colitis identifies risk loci at 7q22 and 22q13 (IL17REL). Nat Genet. 2010;42(4):292–294. [DOI] [PubMed] [Google Scholar]
  17. Fulco CP, Nasser J, Jones TR, Munson G, Bergman DT, Subramanian V, Grossman SR, Anyoha R, Doughty BR, Patwardhan TA, et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet. 2019;51(12):1664–1669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Ghoussaini M, Mountjoy E, Carmona M, Peat G, Schmidt EM, Hercules A, Fumis L, Miranda A, Carvalho-Silva D, Buniello A, et al. Open targets genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res. 2021;49(D1):D1311–D1320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Gudjonsson A, Gudmundsdottir V, Axelsson GT, Gudmundsson EF, Jonsson BG, Launer LJ, Lamb JR, Jennings LL, Aspelund T, Emilsson V, et al. A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat Commun. 2022;13(1):480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, Eskin E.. Identifying causal variants at loci with multiple signals of association. Genetics. 2014;198(2):497–508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Huan T, Meng Q, Saleh MA, Norlander AE, Joehanes R, Zhu J, Chen BH, Zhang B, Johnson AD, Ying S, et al. ; International Consortium for Blood Pressure GWAS (ICBP) . Integrative network analysis reveals molecular mechanisms of blood pressure regulation. Mol Syst Biol. 2015;11(1):799., [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Huang H, Fang M, Jostins L, Umićević Mirkov M, Boucher G, Anderson CA, Andersen V, Cleynen I, Cortes A, Crins F, et al. ; International Inflammatory Bowel Disease Genetics Consortium . Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature. 2017;547(7662):173–178., [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Jansen R, Hottenga J-J, Nivard MG, Abdellaoui A, Laport B, de Geus EJ, Wright FA, Penninx BWJH, Boomsma DI.. Conditional eQTL analysis reveals allelic heterogeneity of gene expression. Hum Mol Genet. 2017;26(8):1444–1451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ.. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32(Database Issue):D493–D496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Kassam I, Lloyd-Jones L, Holloway A, Small KS, Zeng B, Bakshi A, Metspalu A, Gibson G, Spector TD, Esko T, et al. Autosomal genetic control of human gene expression does not differ across the sexes. Genome Biol. 2016;17(1):248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Kendziorski CM, Chen M, Yuan M, Lan H, Attie AD.. Statistical methods for expression quantitative loci (eQTL) mapping. Biometrics. 2006;62(1):19–27. [DOI] [PubMed] [Google Scholar]
  27. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D.. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Kichaev G, Pasaniuc B.. Leveraging functional annotation data in trans-ethnic fine mapping studies. Am J Hum Genet. 2015;97(2):260–271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Kichaev G, Yang W-Y, Lindstrom S, Hormozdiari F, Eskin E, Price AL, Kraft P, Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10(10):e1004722.
  30. Kundu K, Tardaguila M, Mann AL, Watt S, Ponstingl H, Vasquez L, Von Schiller D, Morrell NW, Stegle O, Pastinen T, et al. Genetic associations at regulatory phenotypes improve fine-mapping of causal variants for 12 immune-mediated diseases. Nat Genet. 2022;54(3):251–262.
  31. Lander ES, Botstein DB. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121(1):185–199.
  32. Li Y, Liao Z, Luo H, Benyoucef A, Kang Y, Lai Q, Dovat S, Miller B, Chepelev I, Li Y, et al. Alteration of CTCF-associated chromatin neighborhood inhibits TAL1-driven oncogenic transcription program and leukemogenesis. Nucleic Acids Res. 2020;48(6):3119–3133.
  33. Liu L, Chandrashekar P, Zeng B, Sanderford MD, Kumar S, Gibson G. TreeMap: a structured approach to fine mapping of eQTL variants. Bioinformatics. 2021;37(8):1125–1134.
  34. Lloyd-Jones LR, Holloway A, McRae A, Yang J, Small K, Zhao J, Zeng B, Bakshi A, Metspalu A, Dermitzakis M, et al. The genetic architecture of gene expression in peripheral blood. Am J Hum Genet. 2017;100(2):228–237.
  35. Lukowski SW, Lloyd-Jones LR, Holloway A, Kirsten H, Hemani G, Yang J, Small K, Zhao J, Metspalu A, Dermitzakis ET, et al. Genetic correlations reveal the shared genetic architecture of transcription in human peripheral blood. Nat Commun. 2017;8(1):483.
  36. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337(6099):1190–1195.
  37. Mizuno A, Okada Y. Biological characterization of expression quantitative trait loci (eQTLs) showing tissue-specific opposite directional effects. Eur J Hum Genet. 2019;27(11):1745–1756.
  38. Mostafavi H, Spence JP, Naqvi S, Pritchard JK. Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery. bioRxiv. 2022. doi:10.1101/2022.05.07.491045.
  39. Mountjoy E, Schmidt EM, Carmona M, Schwartzentruber J, Peat G, Miranda A, Fumis L, Hayhurst J, Buniello A, Karim MA, et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat Genet. 2021;53(11):1527–1533.
  40. Odhams CA, Cunninghame Graham DS, Vyse TJ. Profiling RNA-seq at multiple resolutions markedly increases the number of causal eQTLs in autoimmune disease. PLoS Genet. 2017;13(10):e1007071.
  41. Pan YD, Tian RY, Lee C, Bao G, Gibson G. Fine-mapping within eQTL credible intervals by expression CROP-seq. Biol Methods Protoc. 2020;5(1):bpaa008.
  42. Park SH, Kang K, Giannopoulou E, Qiao Y, Kang K, Kim G, Park-Min K-H, Ivashkiv LB. Type I interferons and the cytokine TNF cooperatively reprogram the macrophage epigenome to promote inflammatory activation. Nat Immunol. 2017;18(10):1104–1116.
  43. Parrish RL, Gibson G, Epstein MP, Yang J. TIGAR-V2: efficient TWAS tool with nonparametric Bayesian eQTL weights of 49 tissue types from GTEx V8. HGG Adv. 2022;3(1):100068.
  44. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, Boehnke M, Abecasis GR, Willer CJ. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–2337.
  45. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575.
  46. Ramírez F, Bhardwaj V, Arrigoni L, Lam KC, Grüning BA, Villaveces J, Habermann B, Akhtar A, Manke T. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat Commun. 2018;9(1):189.
  47. Ramirez RN, El-Ali NC, Mager MA, Wyman D, Conesa A, Mortazavi A. Dynamic gene regulatory networks of human myeloid differentiation. Cell Syst. 2017;4(4):416–429.e3.
  48. Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet. 2018;19(8):491–504.
  49. Schmiedel BJ, Singh D, Madrigal A, Valdovino-Gonzalez AG, White BM, Zapardiel-Gonzalo J, Ha B, Altay G, Greenbaum JA, McVicker G, et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell. 2018;175(6):1701–1715.e16.
  50. Soskic B, Cano-Gamez E, Smyth DJ, Rowan WC, Nakic N, Esparza-Gordillo J, Bossini-Castillo L, Tough DF, Larminie CGC, Bronson PG, et al. Chromatin activity at GWAS loci identifies T cell states driving complex immune diseases. Nat Genet. 2019;51(10):1486–1493.
  51. Soskic B, Cano-Gamez E, Smyth DJ, Ambridge K, Ke Z, Matte JC, Bossini-Castillo L, Kaplanis J, Ramirez-Navarro L, Lorenc A, et al. Immune disease risk variants regulate gene expression dynamics during CD4+ T cell activation. Nat Genet. 2022;54(6):817–826.
  52. Stegle O, Parts L, Durbin R, Winn J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput Biol. 2010;6(5):e1000770.
  53. Trynka G, Sandor C, Han B, Xu H, Stranger BE, Liu XS, Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet. 2013;45(2):124–130.
  54. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22.
  55. Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B, Kirsten H, Saha A, Kreuzhuber R, Yazar S, et al.; i2QTL Consortium. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet. 2021;53(9):1300–1310.
  56. Wang G, Sarkar A, Carbonetto P, Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J R Stat Soc B. 2020;82(5):1273–1300.
  57. Wen X, Lee Y, Luca F, Pique-Regi R. Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am J Hum Genet. 2016;98(6):1114–1129.
  58. Wen X, Pique-Regi R, Luca F. Integrating molecular QTL data into genome-wide genetic association analysis: probabilistic assessment of enrichment and colocalization. PLoS Genet. 2017;13(3):e1006646.
  59. Wen X, Luca F, Pique-Regi R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 2015;11(4):e1005176.
  60. Wu Y, Broadaway KA, Raulerson CK, Scott LJ, Pan C, Ko A, He A, Tilford C, Fuchsberger C, Locke AE, et al. Colocalization of GWAS and eQTL signals at loci with multiple signals identifies additional candidate genes for body fat distribution. Hum Mol Genet. 2019;28(24):4161–4172.
  61. Yang J, Ferreira T, Morris AP, Medland SE, Madden PAF, Heath AC, Martin NG, Montgomery GW, Weedon MN, Loos RJ, et al.; Genetic Investigation of Anthropometric Traits (GIANT) Consortium. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44(4):369–375.
  62. Yazar S, Alquicira-Hernandez J, Wing K, Senabouth A, Gordon MG, Andersen S, Lu Q, Rowson A, Taylor TRP, Clarke L, et al. Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science. 2022;376(6589):eabf3041.
  63. Zeng B, Bendl J, Kosoy R, Fullard JF, Hoffman GE, Roussos P. Multi-ancestry eQTL meta-analysis of human brain identifies candidate causal variants for brain-related traits. Nat Genet. 2022;54(2):161–169.
  64. Zeng B, Gibson G. PolyQTL: Bayesian multiple eQTL detection with control for population structure and sample relatedness. Bioinformatics. 2019;35(6):1061–1063.
  65. Zeng B, Lloyd-Jones LR, Holloway A, Marigorta UM, Metspalu A, et al. Constraints on eQTL fine mapping in the presence of multisite local regulation of gene expression. G3-Genes Genomes Genetics. 2017;7:2532–2544.
  66. Zeng B, Lloyd-Jones LR, Montgomery GW, Metspalu A, Esko T, Franke L, Vosa U, Claringbould A, Brigham KL, Quyyumi AA, et al. Comprehensive multiple eQTL detection and its application to GWAS interpretation. Genetics. 2019;212(3):905–918.
  67. Zeng J, de Vlaming R, Wu Y, Robinson MR, Lloyd-Jones LR, Yengo L, Yap CX, Xue A, Sidorenko J, McRae AF, et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat Genet. 2018;50(5):746–753.
  68. Zhang B, Gaiteri C, Bodea L-G, Wang Z, McElwee J, Podtelezhnikov AA, Zhang C, Xie T, Tran L, Dobrin R, et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease. Cell. 2013;153(3):707–720.
  69. Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, van't Hof P, Mei H, van Dijk F, Westra H-J, et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat Genet. 2017;49(1):139–145.
  70. Zou Y, Carbonetto P, Wang G, Stephens M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 2022;18(7):e1010299.

Data Availability Statement

Gene expression data for the CAGE studies are available from GEO under study accession nos. GSE61672 (CHDWB), GSE49925 (CAD), GSE17065 (Morocco), GSE53195 (BSGS), and GSE48348 (EGCUT). All findings, including the SNP Prioritization score and all raw data needed to repeat the analyses, are reported on our R-Shiny site at https://eqtlhub-gt.shinyapps.io/shiny/. A working example for the gene CISD1, with the code used to perform the analysis, is available on GitHub at https://github.com/GibsonLab-GT/All-but-One.

Supplemental material is available at GENETICS online.


Articles from Genetics are provided here courtesy of Oxford University Press