American Journal of Human Genetics. 2025 Feb 7;112(3):659–674. doi: 10.1016/j.ajhg.2025.01.015

reg-eQTL: Integrating transcription factor effects to unveil regulatory variants

Rekha Mudappathi 1,2,3, Tatiana Patton 1,2, Hai Chen 1,2,3, Ping Yang 3, Zhifu Sun 4, Panwen Wang 6, Chang-Xin Shi 5, Junwen Wang 6,7, Li Liu 1,2,
PMCID: PMC11947170  PMID: 39922197

Summary

Regulatory single-nucleotide variants (rSNVs) in noncoding regions of the genome play a crucial role in gene transcription by altering transcription factor (TF) binding, chromatin states, and other epigenetic modifications. Existing expression quantitative trait locus (eQTL) methods identify genomic loci associated with gene-expression changes, but they often fall short in pinpointing causal variants. We introduce reg-eQTL, a computational method that incorporates TF effects and their interactions with genetic variants into eQTL analysis. This approach uncovers how TFs interact with SNVs to influence gene expression. It therefore provides deeper insight into regulatory mechanisms and brings us one step closer to identifying potential causal variants. The method defines a trio consisting of a genetic variant, a target gene, and a TF, and it tests the trio's impact on gene transcription. In comprehensive simulations, reg-eQTL showed improved power over traditional eQTL methods for detecting rSNVs with low population frequency, weak effects, or synergistic interactions with TFs. Application of reg-eQTL to GTEx data from lung, brain, and whole-blood tissues uncovered regulatory trios that include eQTLs and increased the number of eQTLs shared across tissue types. Regulatory networks constructed from these trios reveal intricate gene regulation across tissue types.

Keywords: eQTL analysis, transcription factors, TF-SNV interaction, regulatory trios, rare SNVs, tissue-specific eQTLs, bioinformatics

Graphical abstract



Reg-eQTL enhances eQTL analysis by simultaneously modeling genetic variant effects, transcription factor effects, and their interactions. It excels in detecting eQTL loci with low-frequency and weak-effect variants. Application of reg-eQTL to GTEx data discovered novel eQTLs and built regulatory networks, offering a deeper understanding of regulatory mechanisms.

Introduction

The human genome encompasses a wide array of single-nucleotide variants (SNVs), the vast majority of which are located in noncoding regions.1 While the functional impact of noncoding SNVs is largely unknown, a subset of them, known as regulatory SNVs (rSNVs), may modulate gene transcription through mechanisms such as altering transcription factor (TF) binding affinity,2 transforming chromatin states, and affecting other epigenetic modifications.3,4,5 These rSNVs have been implicated in a broad spectrum of diseases and phenotypic traits.6,7,8 For example, SNVs associated with Crohn disease (MIM: 266600), type 1 diabetes (MIM: 222100), and rheumatoid arthritis (MIM: 180300) are significantly enriched in transcription regulatory regions.6

Expression quantitative trait locus (eQTL) analysis includes a group of computational methods designed to identify genomic loci harboring rSNVs. cis-eQTL analysis focuses on finding rSNVs that act locally on nearby genes, while trans-eQTL analysis targets rSNVs that regulate distant genes.9 Single-QTL analysis tests each locus independently,10 whereas multiple-QTL analysis examines interactions between loci.11 MatrixQTL12 and fastQTL13 are among the most commonly used tools for single cis-eQTL analysis, and both streamline it through efficient algorithm implementations. These tools use linear regression models to examine the relationship between the transcript abundance of target genes (TGs) and SNVs, incorporating other factors such as age, sex, ethnicity, and population structure as covariates. The multiple-QTL mapping (MQM) approach involves automated model selection and can identify both large-effect and small-to-moderate-effect QTLs, including interactions between them. It starts by applying a single-QTL mapping method, such as interval mapping14 or Haley-Knott regression,15 to select markers based on logarithm of odds (LOD) scores. Stepwise regression then assesses main effects first and adds potential interactions afterward.11 The MQM method maps QTLs across chromosomes using maximum likelihood.16 Although various eQTL methods can identify lead SNVs strongly associated with gene-expression changes, these lead SNVs are not always the causal variants driving the observed effects.17 TFs are central to transcription regulation and interact closely with rSNVs. Several studies have reported that variants affecting TF binding are often enriched among causal variants for various traits.18 Therefore, considering TFs and their interactions with SNVs is vital for identifying true causal SNVs.

TFs modulate gene transcription by binding to specific DNA sequences known as transcription factor binding sites (TFBSs). By creating or disrupting TFBS sequences, rSNVs may alter the binding affinity of TFs to these sites,19 which in turn affects transcription of the TGs. For example, rs11672691 at 19q13 alters the binding site of HOXA2 (MIM: 604685), affecting the expression of PCAT19 (MIM: 618192) and CEACAM21 (MIM: 618191), which are implicated in aggressive prostate cancer progression.7 In lung cancer, the T allele of rs17079281 in the DCBLD1 (MIM: 608698) promoter creates a YY1-binding site, reducing DCBLD1 expression and consequently lowering the risk of adenocarcinoma.8 Several studies have explored the relationships among genetic variants, TFs, and gene regulation through eQTL and TFBS analyses. These studies revealed how functional variants alter TF binding and how such variants colocalize with genome-wide association study loci. For example, editing variants within the TBC1D4 gene (MIM: 612465), which contains a canonical nuclear factor κB (NF-κB) (MIM: 164012) binding site, was found to affect chromatin accessibility and binding of the p65 component of NF-κB.20 A recent study examined fine-mapped rSNVs from GTEx samples to identify TFs potentially participating in the regulatory process. That study used multiple linear regression to incorporate the main effects of TFs and SNVs as well as their interactions.21 Its significant results emphasize the importance of modeling TFs to accurately estimate the direction and magnitude of rSNV effects. However, no existing tool explicitly considers TF effects and TF-SNV interactions in eQTL analysis.

In this study, we propose an eQTL method called reg-eQTL that enhances traditional methods by incorporating TF effects and TF-SNV interactions into the analysis. This method defines a trio consisting of an SNV, a TG, and a TF. Within each trio, reg-eQTL tests the main effects of the TF and SNV as well as their interaction effect on the transcription level of the TG. Reg-eQTL detects rSNVs with a significant main SNV effect. It also detects regulatory trios (rTrios), which have significant SNV and TF main effects and a significant TF-SNV interaction. Through comprehensive simulations, we systematically evaluated the strengths and weaknesses of reg-eQTL and traditional eQTL methods. The results showed that reg-eQTL excelled in detecting rSNVs with low population frequency, weak effect size, or synergistic interaction with TFs. We applied reg-eQTL to GTEx data from lung, brain, and whole blood samples, uncovering numerous previously overlooked eQTLs as well as eQTLs shared across tissue types. By identifying eQTLs enriched with rTrios across tissues, we developed tissue-specific regulatory modules that capture intricate gene-regulation mechanisms across diverse biological contexts. The reg-eQTL method adds a layer of insight for identifying potential causal variants. It reveals the TFs involved and elucidates how these TFs interact with rSNVs to change gene expression. This capability enhances our understanding of the regulatory mechanisms at play, facilitating the identification of causal variants and the construction of regulatory networks.

Material and methods

Compilation of regulatory trios

Using annotations in the GeneHancer database,22 we retrieved the hg38 coordinates of regulatory elements (REs; promoters and enhancers) identified in the human genome experimentally or by imputation. We also retrieved their TGs and gene-association scores. We then obtained the TFs corresponding to each RE and combined them with the TG data to form trios. Specifically, we used two GeneHancer database files. The first file contains the GH ID, the start and end genomic coordinates, and the TGs of each RE. The second file contains the GH ID, the associated TFs, and the tissue-/cell-type information for each RE. Using the GH ID as a common key, we merged these two files to map each RE to its associated TFs and TGs. SNVs were mapped to REs based on hg38 genomic coordinates using the GenomicRanges package in R. A trio consists of a unique combination of an SNV inside an RE, a TF, and a TG based on GeneHancer annotations. The SNVs can be located in introns, exons, or intergenic regions, at distances of up to 1.2 Mbp from the transcription start site (TSS) of their associated TG (Figure S1).
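The merge-and-overlap procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation, which used R and GenomicRanges. The dictionary layouts standing in for the two GeneHancer files are hypothetical, as are the function name compile_trios and the RE, TF, and gene identifiers in the example. RE coordinates are assumed to be closed intervals on hg38. For clarity the sketch scans REs linearly per chromosome; a genome-scale run would need an interval index.

```python
from collections import defaultdict
from itertools import product


def compile_trios(re_coords, re_tfs, snvs):
    """Build unique (SNV, TF, TG) trios from GeneHancer-style annotations.

    re_coords: {gh_id: (chrom, start, end, [target genes])}  # file 1 (hypothetical layout)
    re_tfs:    {gh_id: [transcription factors]}               # file 2 (hypothetical layout)
    snvs:      [(snv_id, chrom, pos)] with hg38 positions
    """
    # Merge the two files on the shared GH ID key (keep REs present in both)
    # and group the REs by chromosome for the overlap scan.
    by_chrom = defaultdict(list)
    for gh_id, (chrom, start, end, _tgs) in re_coords.items():
        if gh_id in re_tfs:
            by_chrom[chrom].append((start, end, gh_id))

    trios = set()
    for snv_id, chrom, pos in snvs:
        for start, end, gh_id in by_chrom.get(chrom, []):
            if start <= pos <= end:  # SNV falls inside this RE (closed interval)
                target_genes = re_coords[gh_id][3]
                # Every TF of the RE pairs with every TG of the RE.
                for tf, tg in product(re_tfs[gh_id], target_genes):
                    trios.add((snv_id, tf, tg))
    return sorted(trios)


# Toy example with hypothetical identifiers: two overlapping REs on chr1.
re_coords = {
    "GH_RE1": ("chr1", 100, 200, ["TG_A"]),
    "GH_RE2": ("chr1", 150, 300, ["TG_B"]),
}
re_tfs = {"GH_RE1": ["TF_1", "TF_2"], "GH_RE2": ["TF_1"]}
snvs = [("snv1", "chr1", 160), ("snv2", "chr1", 250), ("snv3", "chr2", 160)]
trios = compile_trios(re_coords, re_tfs, snvs)
```

One SNV can fall in several REs, and each RE can list several TFs and TGs. The number of trios therefore grows multiplicatively, which is why the trio counts for the GTEx tissues reach tens of millions.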

The reg-eQTL algorithm

Given a regulatory trio in a sample s, we denote the expression of the TG as $TG_s$, the expression of the TF as $TF_s$, and the SNV genotype as $SNV_s$. We represent their regulatory relationship as a linear regression model,

$$TG_s = \delta + \alpha\,TF_s + \beta\,SNV_s + \gamma\,(TF_s \times SNV_s) + \Omega C_s + \epsilon_s, \qquad \text{(Equation 1)}$$

where δ is the intercept, α and β are the coefficients for the main effects of TF and SNV, respectively, and γ is the coefficient for the TF-SNV interaction. $C_s$ is a vector of covariates with corresponding coefficients Ω, and $\epsilon_s$ is a Gaussian-distributed error term. We fitted the model to the data using the R function glm(family = "gaussian"). To correct for multiple comparisons, we adjusted the nominal p values using the q value method with a false discovery rate (FDR) threshold of 0.05; a q value < 0.05 indicates a significant association. A significant nonzero β implies an rSNV. Significant nonzero α, β, and γ together imply an rTrio.
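The following is a minimal Python sketch of fitting Equation 1 for a single trio, or Equation 2 when the TF terms are dropped. It is an illustration under stated assumptions, not the reg-eQTL code. It uses ordinary least squares, which gives the same coefficient estimates as a Gaussian GLM. Its p values use a normal approximation to the t distribution, which is reasonable for GTEx-sized samples but is an assumption. The function names fit_trio and classify are hypothetical. The classify helper takes q values as input because q values must be computed across all tested trios, not within one.

```python
import math

import numpy as np


def fit_trio(tg, snv, tf=None, covariates=None):
    """OLS fit of Equation 1 (tf given) or Equation 2 (tf is None).

    Returns {term: (coefficient, two-sided p value)}. P values use a normal
    approximation to the t distribution (an assumption).
    """
    y = np.asarray(tg, dtype=float)
    g = np.asarray(snv, dtype=float)
    n = y.size
    cols, names = [np.ones(n)], ["intercept"]
    if tf is not None:
        t = np.asarray(tf, dtype=float)
        cols += [t, g, t * g]  # TF main effect, SNV main effect, TF x SNV
        names += ["TF", "SNV", "TF:SNV"]
    else:
        cols.append(g)
        names.append("SNV")
    if covariates is not None:
        C = np.asarray(covariates, dtype=float).reshape(n, -1)
        cols += list(C.T)
        names += [f"C{j}" for j in range(C.shape[1])]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return {name: (float(b), math.erfc(abs(b / s) / math.sqrt(2)))
            for name, b, s in zip(names, coef, se)}


def classify(q, fdr=0.05):
    """q: {term: q value} for one trio. Returns (is_rSNV, is_rTrio)."""
    rsnv = q["SNV"] < fdr
    rtrio = rsnv and q.get("TF", 1.0) < fdr and q.get("TF:SNV", 1.0) < fdr
    return rsnv, rtrio


# Example on simulated data (hypothetical effect sizes).
rng = np.random.default_rng(7)
n = 670
tf = rng.normal(size=n)
snv = rng.binomial(2, 0.3, size=n).astype(float)
tg = 0.2 + 0.5 * tf + 0.4 * snv + 0.5 * tf * snv + rng.normal(size=n)
reg = fit_trio(tg, snv, tf)   # Equation 1 (reg-eQTL)
simple = fit_trio(tg, snv)    # Equation 2 (s-eQTL)
```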

Building regulatory networks

After identifying rTrios, we constructed association networks to visualize the relationships among them. In a network, nodes represent TGs, TFs, and SNVs, and edges connect nodes with significant associations. An edge between a TG and a TF indicates that their transcription levels are correlated. An edge between a TG and an SNV indicates an eQTL. An edge between a TF and an SNV indicates that their interaction affects transcription; that is, the impact of the TF on the TG depends on the SNV genotype.
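The edge rules above translate directly into code. In this sketch, the record layout (keys snv, tf, tg, q_tf, q_snv, q_int) and the function name are hypothetical. Each q key holds the q value of the corresponding term in Equation 1 for one trio. Nodes are typed tuples, so a gene that acts as a TF in one trio and as a TG in another stays two distinct nodes; this is a design choice made for the sketch, not something stated in the text.

```python
def build_network(trio_results, fdr=0.05):
    """Return a set of undirected edges (node_a, node_b, edge_type).

    trio_results: iterable of dicts with keys 'snv', 'tf', 'tg', 'q_tf',
    'q_snv', 'q_int' (hypothetical layout, one dict per tested trio).
    """
    edges = set()
    for r in trio_results:
        tf, snv, tg = ("TF", r["tf"]), ("SNV", r["snv"]), ("TG", r["tg"])
        if r["q_tf"] < fdr:    # TF and TG expression are associated
            edges.add((tf, tg, "TF-TG"))
        if r["q_snv"] < fdr:   # significant SNV main effect: an eQTL edge
            edges.add((snv, tg, "eQTL"))
        if r["q_int"] < fdr:   # TF effect on TG depends on SNV genotype
            edges.add((tf, snv, "TF-SNV"))
    return edges


# Hypothetical results for two trios sharing one TF.
results = [
    {"snv": "snv1", "tf": "TF_A", "tg": "TG_1", "q_tf": 0.01, "q_snv": 0.02, "q_int": 0.03},
    {"snv": "snv2", "tf": "TF_A", "tg": "TG_2", "q_tf": 0.20, "q_snv": 0.01, "q_int": 0.50},
]
edges = build_network(results)
```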

Simulation data

To evaluate the performance of our method against traditional eQTL approaches, we used simulations. We used PLINK to generate synthetic genotype data for an SNV with a minor allele frequency (MAF) in one of four categories: A (0.01–0.05), B (0.05–0.1), C (0.1–0.2), and D (0.2–0.5). We generated TF expression data from the Gaussian distribution G(0,1), mimicking the distribution of GTEx data. To explore various scenarios, we selected a series of α, β, and γ values in Equation 1 to simulate the effect sizes of TF, SNV, and TF-SNV interaction, respectively. The α and β values ranged from −0.6 to 0.6 in steps of 0.1, and the γ values ranged from −0.3 to 0.3 in steps of 0.1. These effect sizes and MAF categories produced 4,732 unique combinations (13 α × 13 β × 7 γ × 4 MAF). For each combination, we simulated TG expression according to Equation 1, with the intercept and error drawn from G(0,1). Each simulated dataset consisted of 670 samples, similar in size to the GTEx whole blood dataset. Covariates were not included in the simulations. We chose these ranges of α, β, and γ to span small to large effects and to allow for both synergistic and antagonistic interactions between TF and SNV. The signs of these coefficients indicate the direction of effects. Their absolute values, however, do not directly represent effect magnitude, because magnitude also depends on the variances of the predictors and the error. To standardize interpretation, we converted the coefficients of each simulated trio into Cohen’s f² values. Small, medium, and large effects correspond to f² ≥ 0.02, ≥ 0.15, and ≥ 0.35, respectively.
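The simulation design can be sketched as follows. This is an assumption-laden stand-in for the authors' pipeline, not a reproduction of it. Genotypes are drawn as Binomial(2, MAF) under Hardy-Weinberg equilibrium instead of being generated with PLINK. Cohen's f² for the SNV term is computed as the local effect size (R²_full − R²_reduced) / (1 − R²_full), where the reduced model drops only the SNV main effect. The text does not state which f² definition was used, so treat cohens_f2_snv (a hypothetical name, like the other functions here) as one common choice.

```python
import numpy as np


def simulate_trio(alpha, beta, gamma, maf, n=670, seed=0):
    """Simulate one trio per Equation 1 without covariates."""
    rng = np.random.default_rng(seed)
    snv = rng.binomial(2, maf, size=n).astype(float)  # HWE genotypes (PLINK stand-in)
    tf = rng.normal(0.0, 1.0, size=n)                 # TF expression ~ G(0,1)
    delta = rng.normal(0.0, 1.0)                      # intercept ~ G(0,1)
    eps = rng.normal(0.0, 1.0, size=n)                # error ~ G(0,1)
    tg = delta + alpha * tf + beta * snv + gamma * tf * snv + eps
    return tg, tf, snv


def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def cohens_f2_snv(tg, tf, snv):
    """Local f2 of the SNV main effect in the full Equation 1 model."""
    full = r_squared(np.column_stack([tf, snv, tf * snv]), tg)
    reduced = r_squared(np.column_stack([tf, tf * snv]), tg)
    return (full - reduced) / (1.0 - full)


def effect_class(f2):
    """Cohen's guidelines: >=0.02 small, >=0.15 medium, >=0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return None  # below the small-effect bound


tg, tf, snv = simulate_trio(alpha=0.3, beta=0.6, gamma=0.2, maf=0.3)
f2 = cohens_f2_snv(tg, tf, snv)
```

The same β can fall into different effect-size classes across MAF categories. With β = 0.6 and MAF = 0.3, the SNV contributes a variance of about 0.6² × 2 × 0.3 × 0.7 ≈ 0.15 against unit error variance. With MAF = 0.03, the contribution drops to about 0.6² × 2 × 0.03 × 0.97 ≈ 0.02.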

For each simulated dataset, we executed the reg-eQTL method. We also executed a simple method (s-eQTL) that includes only the main effect of SNV,

$$TG_s = \delta + \beta\,SNV_s + \Omega C_s + \epsilon_s, \qquad \text{(Equation 2)}$$

which corresponds to the traditional eQTL analysis implemented in MatrixQTL and fastQTL. We confirmed that our implementation of Equation 2 produced the same results as these two methods (see Results). Our own implementation, however, provided the flexibility to test different scenarios.

Following the model fitting and correction for multiple comparisons for both reg-eQTL and s-eQTL, we compared their performance based on receiver-operating characteristic (ROC), area under the ROC curve (AUC), sensitivity, specificity, precision, and recall. To account for stochastic factors, we produced 200 simulation datasets for each combination of effect sizes and MAF and reported the average performance.
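The evaluation metrics named above can be computed without external libraries. The sketch below is illustrative, and its function names are hypothetical. It ranks trios by 1 − q; any score that increases with significance, such as −log10 p, could be substituted. It computes the AUC as the Mann-Whitney probability that a random rSNV outranks a random non-rSNV, counting ties as one-half. It then reports threshold-based metrics at a q value cutoff. Recall is identical to sensitivity.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, qvalues, cutoff=0.05):
    """Confusion-matrix metrics when calling q < cutoff significant."""
    tp = fp = tn = fn = 0
    for y, q in zip(labels, qvalues):
        called = q < cutoff
        if y and called:
            tp += 1
        elif y:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


labels = [1, 1, 0, 0]           # 1 = simulated rSNV (beta != 0)
qvals = [0.01, 0.2, 0.04, 0.9]  # hypothetical q values
auc = roc_auc(labels, [1 - q for q in qvals])
metrics = threshold_metrics(labels, qvals)
```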

GTEx datasets

We downloaded RNA-sequencing and genotype data of 670 whole blood, 165 brain hippocampus, and 515 lung samples from the GTEx data portal (v.8, mapped to the hg38 reference genome). Transcript abundances, quantified as transcripts per million (TPM), were available for 24,421 genes in brain, 20,315 genes in whole blood, and 26,095 genes in lung. We also downloaded covariate data that included five genotyping principal components, 60 probabilistic estimation of expression residuals (PEER) factors, sequencing protocol, sequencing platform, and sex. Defining trios as described above yielded a comprehensive set of 53,151,141 unique combinations. We then filtered out SNVs that met any of three conditions: they lay outside gene bodies and regulatory regions, had MAF < 0.01, or were genotyped in fewer than 20 samples. This left 1,096,234 SNVs in lung, 1,068,248 in brain, and 1,076,383 in whole blood. After mapping the SNVs to their TGs, we obtained 43,474,690 trios in lung, 40,941,207 in brain, and 38,527,177 in whole blood. For each TG, we applied the reg-eQTL and s-eQTL methods. The nominal p values were corrected for multiple testing using the q value method with an FDR threshold of 0.05.
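The multiple-testing step can be sketched with Storey-style q values. This is a simplified stand-in for the R qvalue package, not a reimplementation of it. The function estimate_pi0 uses a single fixed λ = 0.5, whereas the package smooths over a grid of λ values. With the default π0 = 1, the q values reduce to Benjamini-Hochberg adjusted p values, which is the conservative choice. Which tests are corrected together (per TG, per tissue, or genome-wide) is a modeling decision that the sketch leaves to the caller, and the function names are hypothetical.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey's fixed-lambda estimate of the fraction of true nulls, capped at 1."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))


def qvalues(pvalues, pi0=1.0):
    """Step-up q values: q_(i) = min over j >= i of pi0 * m * p_(j) / j.

    With pi0 = 1 this equals Benjamini-Hochberg adjusted p values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pi0 * m * pvalues[i] / rank)
        q[i] = running
    return q


# Hypothetical nominal p values for four tests.
pvals = [0.01, 0.04, 0.03, 0.5]
q_bh = qvalues(pvals)                        # pi0 = 1 (Benjamini-Hochberg)
significant = [qv < 0.05 for qv in q_bh]     # FDR threshold of 0.05
```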

Results

Performance on simulation data

In our simulations, true positives were SNVs with a nonzero simulated effect size (β ≠ 0, i.e., rSNVs), and true negatives were SNVs with β = 0. We organized the simulation data into subsets with matching effect sizes and MAF and balanced class labels. Specifically, for a given combination of {α,β,γ} values where β ≠ 0, the true-positive cases consisted of 200 trios, and the negative cases consisted of 200 trios with the same α value and γ value but β = 0. This procedure generated 4,368 balanced sets (13 α values × 12 nonzero β values × 7 γ values × 4 MAF).
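The combination counts can be checked by enumerating the grid. The sketch below builds the 13 × 13 × 7 × 4 parameter grid and pairs every β ≠ 0 setting with the β = 0 negative set that shares its α, γ, and MAF. The grid helper and the constant names are hypothetical. Values are rounded to one decimal to avoid floating-point drift in the 0.1 steps.

```python
from itertools import product


def grid(lo, hi, step=0.1):
    """Evenly spaced values from lo to hi inclusive, rounded to one decimal."""
    k = round((hi - lo) / step)
    return [round(lo + i * step, 1) for i in range(k + 1)]


ALPHAS = grid(-0.6, 0.6)   # 13 TF effects
BETAS = grid(-0.6, 0.6)    # 13 SNV effects
GAMMAS = grid(-0.3, 0.3)   # 7 interaction effects
MAF_CATEGORIES = ["A", "B", "C", "D"]

all_combinations = list(product(ALPHAS, BETAS, GAMMAS, MAF_CATEGORIES))

# Each positive setting (beta != 0) is paired with the negative setting that
# shares alpha, gamma, and MAF but has beta = 0.
balanced_sets = [
    ((a, b, g, m), (a, 0.0, g, m))
    for a, b, g, m in all_combinations
    if b != 0
]
```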

Using the simulation data, we first confirmed that the s-eQTL results were identical to the MatrixQTL results (Figures S2A and S2B), validating that our implementation of the traditional eQTL analysis algorithm is correct. Many commonly used eQTL analysis tools, such as fastQTL,13 QTLtools,23 and TensorQTL,24 are different implementations of the same linear regression algorithm used in MatrixQTL,12 and they differ mainly in computational efficiency. Therefore, the performance comparison between reg-eQTL and s-eQTL effectively reflects a comparison with these other tools.

We constructed ROC curves and computed AUC values for discriminating rSNVs from non-rSNVs. In 404 (9.2%) of these datasets, reg-eQTL produced significantly better ROC curves and AUC values than s-eQTL (DeLong test, p < 0.05; Table 1). Trios involving low-frequency SNVs showed the largest increase in AUC with reg-eQTL compared with s-eQTL (mean ΔAUC 0.05–0.07). These comprised 221 datasets with MAF 0.01–0.05 and 93 with MAF 0.05–0.1. In only 61 (1.4%) datasets did the additional TF and TF-SNV interaction terms in the reg-eQTL model overfit the data, as indicated by lower AUC values than those of s-eQTL. In these cases, the largest decrease in AUC was also observed for low-frequency SNVs (mean ΔAUC 0.04–0.05).
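For readers who want to reproduce this kind of comparison, the following is a compact Python sketch of DeLong's test for two correlated AUCs computed on the same trios. It follows the standard structural-components construction. First, each case gets a placement value. Next, the sketch estimates the covariance of these values across the two methods. Finally, it applies a two-sided z test. The function names are hypothetical, and this is not necessarily the implementation the authors used; for example, the R package pROC provides roc.test with method = "delong".

```python
import math

import numpy as np


def _placements(pos, neg):
    """Structural components with psi(x, y) = 1 if x > y, 0.5 on ties, else 0."""
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 per positive, V01 per negative


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test of AUC(a) vs AUC(b) on the same cases."""
    y = np.asarray(labels, dtype=bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    v10_a, v01_a = _placements(a[y], a[~y])
    v10_b, v01_b = _placements(b[y], b[~y])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()
    # Covariance of placement values across the two methods (rows = methods).
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    m, n = len(v10_a), len(v01_a)  # numbers of positives and negatives
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n)
    if var <= 0:  # identical or degenerate scores: no detectable difference
        return float(auc_a), float(auc_b), 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return float(auc_a), float(auc_b), math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores: method a separates the classes, method b is noise.
rng = np.random.default_rng(1)
y = np.array([1] * 200 + [0] * 200)
score_a = 2.0 * y + rng.normal(size=400)
score_b = rng.normal(size=400)
auc_a, auc_b, p = delong_test(y, score_a, score_b)
```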

Table 1.

Summary of simulated datasets for which reg-eQTL and s-eQTL methods show significantly different performance

eQTL method with better performance | No. of datasets | MAF category | Mean AUC difference | Mean p value (DeLong)
reg-eQTL | 221 | A (0.01–0.05) | 0.07 | 0.007
reg-eQTL | 93 | B (0.05–0.1) | 0.05 | 0.011
reg-eQTL | 68 | C (0.1–0.2) | 0.02 | 0.01
reg-eQTL | 22 | D (0.2–0.5) | 0.02 | 0.023
s-eQTL | 41 | A (0.01–0.05) | 0.05 | 0.011
s-eQTL | 7 | B (0.05–0.1) | 0.02 | 0.021
s-eQTL | 12 | C (0.1–0.2) | 0.002 | 0.035
s-eQTL | 1 | D (0.2–0.5) | 0.01 | 0.048

We observed the largest AUC difference (0.450) in trios with {α = −0.6, β = 0.1, γ = −0.2, MAF = A}, in which reg-eQTL achieved an AUC of 0.89 while the AUC of s-eQTL was merely 0.44 (Figure 1A). Notably, the ROC curve of s-eQTL sometimes fell below the diagonal, indicating performance worse than random prediction. In the other MAF categories, the largest AUC differences were 0.248, 0.108, and 0.056 in trios {α = 0.6, β = 0.1, γ = 0.3, MAF = B}, {α = 0.6, β = −0.1, γ = 0.3, MAF = C}, and {α = 0.6, β = 0.1, γ = 0.3, MAF = D}, respectively (Figures 1B–1D). We also examined the ROC curves in each MAF category where s-eQTL outperformed reg-eQTL (Figures 1E–1H). Although these differences were statistically significant, the ROC curves of the two methods lay close to each other. The largest AUC differences in each MAF category were 0.124, 0.033, 0.005, and 0.011 in trios {α = −0.5, β = 0.1, γ = 0.1, MAF = A}, {α = −0.3, β = 0.1, γ = 0.0, MAF = B}, {α = −0.6, β = −0.1, γ = 0.2, MAF = C}, and {α = −0.1, β = −0.1, γ = −0.3, MAF = D}, respectively.

Figure 1.


ROC curves and AUC values of reg-eQTL and s-eQTL methods across various MAF categories and effect-size combinations

(A–D) The reg-eQTL method outperformed the s-eQTL method. For each MAF category, ROC curves from the dataset with the largest performance difference are shown. AUC values for each method are displayed in parentheses. Above each panel, the MAF category and the effect sizes of TF, SNV, and TF-SNV interaction are displayed in that order.

(E–H) The s-eQTL method outperformed the reg-eQTL method.

The relative performance of the two methods also varied with the simulated effect sizes. When the SNV effect was weak (e.g., β = 0.1 or −0.1), reg-eQTL often performed better (Figures 2A and 2B). When the TF effect and the TF-SNV interaction had the same direction, reg-eQTL also outperformed s-eQTL (Figure 2C). However, when the TF effect and the TF-SNV interaction had opposite directions, s-eQTL showed an advantage (Figure 2F). In these cases, the conflicting effects might attenuate or cancel each other. As a result, s-eQTL, with only the SNV effect, fitted the data better than reg-eQTL with its inaccurately estimated effect sizes. Therefore, it is imperative to consider the interaction effect between TF and SNV in eQTL analysis to accurately assess the contribution of genetic variants to gene-expression regulation.
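Differentiating Equation 1 shows why opposite-signed TF and interaction effects can attenuate each other. In that model, the slope of TG expression on TF expression depends on genotype, and the SNV effect in turn depends on TF expression. The worked numbers assume additive genotype coding (0, 1, or 2 copies of the minor allele), which the text does not state explicitly.

```latex
\frac{\partial\, TG_s}{\partial\, TF_s} = \alpha + \gamma\, SNV_s,
\qquad
\frac{\partial\, TG_s}{\partial\, SNV_s} = \beta + \gamma\, TF_s

\left.\frac{\partial\, TG_s}{\partial\, TF_s}\right|_{\alpha = 0.6,\ \gamma = -0.3}
  = 0.6,\ 0.3,\ 0 \quad \text{for } SNV_s = 0,\ 1,\ 2
```

Here the TF slope shrinks with each copy of the minor allele and vanishes in homozygous carriers, which is the attenuation described above.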

Figure 2.


Heatmaps of the number of simulation datasets in which reg-eQTL or s-eQTL performed better, across different simulated effect sizes and directions

(A and D) TF effect size vs. SNV effect size.

(B and E) TF:SNV effect size vs. SNV effect size.

(C and F) TF effect size vs. TF:SNV effect size.

(G and H) Number of false positives and true positives at various q value cutoffs produced on a simulation dataset, in which s-eQTL shows the largest advantage over reg-eQTL.

Although s-eQTL outperformed reg-eQTL in only 61 simulation datasets, these cases imply that including TFs in the model may sometimes cause overfitting. To better understand these scenarios, we analyzed in detail the simulation dataset in which s-eQTL showed the largest advantage over reg-eQTL. The two methods had comparable specificities at q value cutoffs < 0.75 (Figure 2G). At q value cutoffs > 0.75, however, the specificity of reg-eQTL declined rapidly, whereas that of s-eQTL remained high. This difference in specificity outweighed the slight disadvantage of s-eQTL in sensitivity (Figure 2H), leading to its better overall performance as measured by the ROC curve. Notably, the advantage of s-eQTL was evident only at q value cutoffs > 0.75, which are rarely used in practice.

Performance of reg-eQTL and s-eQTL at q value threshold of 0.05

ROC curves and AUC values depict the performance landscape of a method independently of the threshold used for class prediction. However, identifying potential eQTLs requires a specific q value or FDR threshold. We first examined the predictions at the default threshold of q < 0.05. As expected, this default cutoff gave rise to very high specificity (>0.98) in all simulation datasets for both methods (Figure 3C). Accuracy and sensitivity were also high in the majority (72%) of the simulation datasets (accuracy > 0.90 and sensitivity > 0.90; Figures 3A and 3B). However, for 269 (6%) simulation datasets, the default threshold resulted in poor performance (sensitivity < 0.6 and accuracy < 0.6) for both methods.

Figure 3.


Performance of reg-eQTL and s-eQTL at a q value threshold of 0.05 across all simulations

(A–C) Histogram showing the accuracy (A), sensitivity (B), and specificity (C) distributions for different simulation datasets at a q value threshold of 0.05.

(D) Barplot showing the percentage of unique SNVs identified by reg-eQTL and s-eQTL across different MAF categories.

(E and F) Analysis of the impact of varying the q value threshold on the accuracy of reg-eQTL and s-eQTL methods for trios −0.6:0.1:−0.2 for MAF < 0.05 (E) and −0.5:0.1:0.1 for MAF < 0.05 (F). Vertical bars indicate the q value thresholds corresponding to the highest accuracy for reg-eQTL (red) and s-eQTL (blue).

At the default q value cutoff, reg-eQTL consistently identified a higher percentage of unique rSNVs than s-eQTL across all MAF categories (Figure 3D). In MAF category A, which represents the lowest-frequency SNVs, 4.40% of the rSNVs were identified only by reg-eQTL, whereas only 1.92% were identified only by s-eQTL. As the MAF increased, reg-eQTL retained its advantage, although the margin narrowed. These results underscore the effectiveness of reg-eQTL in identifying rSNVs across population frequencies, particularly in detecting rare regulatory variants.

Determining q value cutoff for reg-eQTL and s-eQTL

We noticed that the default q value threshold was not always optimal. Consider the set of trios {α = −0.6, β = 0.1, γ = −0.2, MAF < 0.05}, where reg-eQTL showed the largest AUC improvement over s-eQTL. For these trios, the default q value cutoff corresponded to sensitivities of 0.02 and 0.09 and accuracies of 0.51 and 0.54 for reg-eQTL and s-eQTL, respectively. We then examined whether altering the q value threshold could improve the performance of the two methods for this set of trios, iterating through cutoff values from 0.01 to 0.99 in steps of 0.01. At a q value threshold of 0.82, reg-eQTL achieved its highest accuracy of 0.91, whereas the highest accuracy for s-eQTL was only 0.67, at a q value cutoff of 0.52 (Figure 3E). Similarly, we examined another set of trios {α = −0.5, β = 0.1, γ = 0.1, MAF = A} in which s-eQTL outperformed reg-eQTL. The optimal q value threshold was 0.78 for reg-eQTL, corresponding to its highest accuracy of 0.87, while the optimal threshold for s-eQTL was 0.99, corresponding to an accuracy of 0.99 (Figure 3F).
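The cutoff scan described above is a one-dimensional grid search. In this minimal sketch, the function names are hypothetical and the example q values are illustrative only. It calls a trio significant when q < cutoff and returns the first cutoff in the 0.01–0.99 grid that attains the highest accuracy.

```python
def accuracy_at(qvalues, labels, cutoff):
    """Fraction of trios whose call (q < cutoff) matches the true label."""
    correct = sum((q < cutoff) == bool(y) for q, y in zip(qvalues, labels))
    return correct / len(labels)


def best_accuracy_cutoff(qvalues, labels):
    cutoffs = [round(0.01 * k, 2) for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    # max() keeps the first maximum, i.e., the smallest cutoff among ties.
    best = max(cutoffs, key=lambda c: accuracy_at(qvalues, labels, c))
    return best, accuracy_at(qvalues, labels, best)


# Hypothetical q values for two rSNVs and two non-rSNVs.
cutoff, acc = best_accuracy_cutoff([0.01, 0.2, 0.6, 0.9], [1, 1, 0, 0])
```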

We then systematically investigated how the performance of the two methods varied with the q value cutoff in four scenarios in MAF category A: (1) both TF and SNV effects were weak (α = 0.1, β = 0.1, Figures 4A–4C); (2) the TF effect was weak while the SNV effect was strong (α = 0.1, β = 0.6, Figures 4D–4F); (3) the TF effect was strong while the SNV effect was weak (α = 0.6, β = 0.1, Figures 4G–4I); and (4) both TF and SNV effects were strong (α = 0.6, β = 0.6, Figures 4J–4L). The specificity of both methods remained high (>0.99) for any α, β, and γ values and started decreasing only when q value cutoffs exceeded 0.31–0.93 (Figures 4B, 4E, 4H, and 4K). In the scenarios with a weak SNV effect (scenarios 1 and 3; Figures 4A and 4G), the sensitivity of detecting the main effect of SNVs increased steadily from 0 to 1 as the q value threshold increased from 0 to 1 for both methods. The accuracy curves of both methods followed a pattern similar to that of the sensitivity curves (Figures 4C and 4I). Their increase in accuracy slowed noticeably once q value cutoffs exceeded 0.31–0.93. In the scenarios with a strong SNV effect (scenarios 2 and 4; Figures 4D and 4J), the sensitivity of detecting the main effect of SNVs approached 1 quickly, at low q value cutoffs of 0.02–0.13. The accuracy of both methods then aligned more closely with the specificity curves, with higher accuracies at lower q value thresholds (Figures 4F and 4L). These results demonstrate that the conventional q value threshold of 0.05 may not be universally applicable across all scenarios. Instead, optimal q value cutoffs depend on the TF, SNV, and TF-SNV effect sizes. Notably, the dependence of statistical power on MAF and effect size is well documented for many commonly used eQTL mapping tools.25,26

Figure 4.


Systematic investigation of how performance of reg-eQTL and s-eQTL methods varied with q value cutoff in four scenarios

(A–C) Both TF (0.1) and SNV (0.1) effects were weak.

(D–F) TF effect (0.1) was weak while SNV effect (0.6) was strong.

(G–I) TF effect (0.6) was strong with weak SNV effect (0.1).

(J–L) Both TF effect (0.6) and SNV effect (0.6) were strong.

We thus estimated the optimal q value thresholds in different scenarios to inform context-specific identification of significant eQTLs. Because the effect sizes of TF, SNV, and TF-SNV interaction are not known in real datasets, we used Cohen’s f² as a standardized measure of effect size.27,28 To mimic real eQTL data enriched with functionally neutral variants,29,30 we performed simulations with an imbalanced class ratio of 1:100 (200 positives vs. 20,000 negatives). This generated 4,368 datasets, each corresponding to a unique combination of [MAF:α:β:γ] parameters. We followed established guidelines for interpreting f², under which ≥0.02 is a small effect, ≥0.15 a medium effect, and ≥0.35 a large effect. For each MAF category within each effect-size class, we then determined the q value threshold that yielded the highest F1 score. For SNVs with medium to large effects, the optimal q value thresholds were between 0.01 and 0.04 across all MAF categories, lower than the default threshold of 0.05. For SNVs with small effects, however, the thresholds increased to 0.08–0.23 (Table 2). Simulations with a class ratio of 1:10 (200 positives vs. 2,000 negatives) produced q value thresholds similar to those obtained with the 1:100 ratio (Table S1). Importantly, even with the substantially higher q value thresholds for small-effect groups, all SNVs passing the q value filter had raw p values < 0.05.
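The threshold-selection rule can be sketched as below, under two labeled assumptions. First, each simulated dataset carries one f² value, that of its positive trios, and its negative trios inherit it so that every group contains both classes. Second, ties in F1 are broken toward the smallest cutoff. The record layout and function names are hypothetical. The f² class bounds are Cohen's guidelines as stated in the paragraph above.

```python
from collections import defaultdict

CUTOFFS = [round(0.01 * k, 2) for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99


def effect_class(f2):
    """Cohen's guidelines: >=0.02 small, >=0.15 medium, >=0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return None  # below the small-effect bound


def f1_at(qs, ys, cutoff):
    tp = sum(1 for q, y in zip(qs, ys) if q < cutoff and y)
    fp = sum(1 for q, y in zip(qs, ys) if q < cutoff and not y)
    fn = sum(1 for q, y in zip(qs, ys) if q >= cutoff and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def optimal_cutoffs(records):
    """records: (maf_category, dataset_f2, q_value, is_rsnv), one per trio.

    Returns {(maf_category, effect_class): F1-maximizing q value cutoff}.
    """
    groups = defaultdict(lambda: ([], []))
    for maf, f2, q, y in records:
        cls = effect_class(f2)
        if cls is not None:
            qs, ys = groups[(maf, cls)]
            qs.append(q)
            ys.append(y)
    # max() returns the first maximum, so ties go to the smallest cutoff.
    return {key: max(CUTOFFS, key=lambda c: f1_at(qs, ys, c))
            for key, (qs, ys) in groups.items()}


# Hypothetical records from one small-effect and one medium-effect dataset.
records = [
    ("A", 0.05, 0.01, True), ("A", 0.05, 0.10, True),
    ("A", 0.05, 0.30, False), ("A", 0.05, 0.50, False),
    ("B", 0.20, 0.001, True), ("B", 0.20, 0.005, True),
    ("B", 0.20, 0.02, False), ("B", 0.20, 0.90, False),
]
cutoffs_by_group = optimal_cutoffs(records)
```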

Table 2.

Optimal q value cutoffs corresponding to highest F1 scores based on MAF and effect size of SNVs

Cohen’s f² effect size | MAF A (0.01–0.05) | MAF B (0.05–0.1) | MAF C (0.1–0.2) | MAF D (0.2–0.5)
Small (≥0.02) | 0.13 | 0.14 | 0.24 | 0.17
Medium (≥0.15) | 0.04 | 0.02 | 0.02 | 0.02
Large (≥0.35) | 0.02 | 0.01 | 0.01 | 0.01

Performance with class imbalance

Using the simulation datasets with a 1:100 positive-to-negative class ratio, we compared the performance of reg-eQTL and s-eQTL based on the area under the precision-recall curve (AUPRC) value. In 62.1% (2,712 out of 4,368) of the datasets, reg-eQTL outperformed s-eQTL. The largest improvement was observed in the [A:−0.6:−0.3:−0.3] dataset, where the AUPRC of reg-eQTL was 0.47 higher than that of s-eQTL (0.70 vs. 0.23, Figure S4). Conversely, s-eQTL outperformed reg-eQTL in only 8.0% (350) of these datasets, all showing small AUPRC differences (ΔAUPRC range 0.001–0.03). In the datasets with a less severe class imbalance ratio of 1:10, we observed similar trends: reg-eQTL outperformed in 56.5% of the datasets, while s-eQTL outperformed in only 10.3%. The results demonstrate that reg-eQTL is more robust to class imbalance than s-eQTL.
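AUPRC suits this setting because its chance-level baseline equals the positive-class prevalence, about 1/101 at a 1:100 ratio, whereas the ROC AUC baseline stays at 0.5 regardless of imbalance. AUPRC is therefore more sensitive to false positives among the many negatives. Below is a minimal sketch of the step-wise (average precision) estimate. It is one common AUPRC estimator and may differ slightly from the interpolation used by other tools. Ties in score are broken by input order, and the function name is hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC: mean precision at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # best score first
    n_pos = sum(1 for y in labels if y)
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
            total += tp / k  # precision among the top k calls
    return total / n_pos if n_pos else 0.0


# Hypothetical scores (e.g., 1 - q) for two rSNVs and two non-rSNVs.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```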

Power analysis

Complex models typically require larger sample sizes than simpler models to be fitted reliably. To understand the implications for reg-eQTL, we conducted a power analysis using simulations. We varied the MAF category, TF effect (α ∈ [0, 0.6]), SNV effect (β ∈ [0.1, 0.6]), and TF-SNV interaction (γ ∈ [0, 0.3]). For each combination, we simulated datasets with sample sizes n = 25, 50, 100, 250, and 500. We first examined the minimum sample size required to achieve a power of 0.8. Across 672 unique combinations of MAF, α, β, and γ, reg-eQTL required fewer samples than s-eQTL in 79 cases (11.8%), more samples in only three cases (0.4%), and equal numbers of samples in 445 cases (66.2%). In the remaining 145 cases, neither method achieved a power of 0.8 with 500 samples. We then analyzed power across all sample sizes, considering a total of 3,360 unique combinations of MAF, α, β, γ, and sample size. Reg-eQTL demonstrated higher power than s-eQTL in 43.1% (1,448) of cases, lower power in 35.0% (1,176), and equal power in the remaining 21.9% (736). The largest increase in power was for common SNVs with strong main effects and interaction in small samples (MAF > 0.2, α = 0.6, β = 0.4, γ = 0.3, n = 50; Figure S3A). The largest decrease in power was for rare SNVs with no TF effect but a strong SNV effect and strong interaction in small samples (MAF < 0.1, α = 0, β = 0.6, γ = 0.3, n = 25; Figure S3B). These results suggest that incorporating TF and TF-SNV interaction terms in the model can reduce the power to detect rSNVs when the TF is not a regulator of the TG. Conversely, it can increase power when the TF strongly impacts TG expression, and this effect varies with the properties of the SNV.
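Once power has been estimated at each simulated n, the minimum-sample-size comparison reduces to simple bookkeeping. In this sketch, power is the fraction of simulated replicates whose SNV-term p value falls below a significance level. The level (0.05 here), the function names, and the outcome labels are assumptions, because the text does not state which threshold defined a detection in the power analysis.

```python
def empirical_power(pvalues, alpha=0.05):
    """Fraction of simulated rSNV replicates called significant."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def min_sample_size(power_by_n, target=0.8):
    """Smallest simulated n whose power reaches the target, or None."""
    for n in sorted(power_by_n):
        if power_by_n[n] >= target:
            return n
    return None


def compare_methods(power_reg, power_s, target=0.8):
    n_reg = min_sample_size(power_reg, target)
    n_s = min_sample_size(power_s, target)
    if n_reg is None and n_s is None:
        return "neither reaches target"
    if n_s is None or (n_reg is not None and n_reg < n_s):
        return "reg-eQTL needs fewer"
    if n_reg is None or n_s < n_reg:
        return "s-eQTL needs fewer"
    return "equal"


# Hypothetical power curves over the simulated sample sizes.
power_reg = {25: 0.50, 50: 0.82, 100: 0.95, 250: 1.0, 500: 1.0}
power_s = {25: 0.40, 50: 0.70, 100: 0.90, 250: 1.0, 500: 1.0}
outcome = compare_methods(power_reg, power_s)
```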

Application to GTEx data

We analyzed the lung, brain, and whole blood datasets from the GTEx project. The lung dataset produced the largest number of trios (43,474,690), while the whole blood dataset produced the smallest number (38,527,177). Because an SNV, a TF, or a TG can be included in multiple trios, the numbers of unique SNVs, TFs, and TGs involved were much smaller (Table S2). We applied reg-eQTL and s-eQTL to these trios and confirmed that the s-eQTL results, including raw p values and coefficients, were the same as those reported in the GTEx portal (Figures S2C and S2D). We first compared the s-eQTL and reg-eQTL results for the lung dataset and then examined tissue specificity across the lung, brain, and whole blood datasets.

At a q value cutoff of 0.05 for the SNV effect, reg-eQTL identified 2,503,127 (5.8%) trios in the lung samples with a significant SNV main effect, i.e., rSNVs. These trios involved 260,002 (23.7%), 503 (99.8%), and 15,235 (76.6%) unique SNVs, TFs, and TGs, respectively. The vast majority of these rSNVs (247,071, 95%) were also identified by s-eQTL. Among these consensus rSNVs, reg-eQTL detected significant TF effects for 10,919 (4.2%) and significant TF-SNV interactions, i.e., rTrios, for 1,350 (0.52%). Reg-eQTL identified 12,931 rSNVs that were not detected by s-eQTL; the absolute standardized effect sizes of these SNVs ranged from 0.03 to 7.33. To examine whether the two methods identified different lead SNVs in the same locus, we scanned the ±1-kbp flanking regions of the rSNVs. We found 5,372 loci where only reg-eQTL detected significant rSNVs. Even when we expanded the flanking regions to 10 kbp surrounding the rSNVs, 1,569 loci remained unique to the reg-eQTL results. Among trios involving these uniquely identified rSNVs, 419 showed significant TF effects, 76 of which were rTrios. Conversely, s-eQTL identified 7,496 rSNVs that were not detected by reg-eQTL; the absolute effect sizes of these SNVs were small, ranging from 0.02 to 0.42. We also analyzed the significance of the TF and TF-SNV terms for these SNVs missed by reg-eQTL. Of these, 424 (5.66%) had significant TF effects and 1 (0.01%) had a significant TF-SNV interaction (Figure 5A). The results for the brain and whole blood datasets are provided in Figure S5.
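The flanking-region comparison can be sketched with a sorted-position search. A reg-eQTL rSNV is kept only if no s-eQTL rSNV lies within ±flank bp on the same chromosome. The sketch then merges the remaining rSNVs into loci by chaining neighbors that lie within flank bp of each other. That merging rule is an assumption about how loci were counted, and all names here are hypothetical.

```python
import bisect
from collections import defaultdict


def unique_to_reg(reg_snvs, s_snvs, flank=1000):
    """reg-eQTL rSNVs with no s-eQTL rSNV within +/- flank bp on the same chromosome."""
    s_pos = defaultdict(list)
    for chrom, pos in s_snvs:
        s_pos[chrom].append(pos)
    for positions in s_pos.values():
        positions.sort()
    unique = []
    for chrom, pos in reg_snvs:
        positions = s_pos.get(chrom, [])
        i = bisect.bisect_left(positions, pos - flank)  # first s-eQTL SNV >= pos - flank
        if i == len(positions) or positions[i] > pos + flank:
            unique.append((chrom, pos))
    return unique


def merge_loci(snvs, flank=1000):
    """Chain SNVs within flank bp of each other into (chrom, start, end) loci (assumed rule)."""
    loci = []
    for chrom, pos in sorted(snvs):
        if loci and loci[-1][0] == chrom and pos - loci[-1][2] <= flank:
            loci[-1] = (chrom, loci[-1][1], pos)
        else:
            loci.append((chrom, pos, pos))
    return loci


# Hypothetical rSNV positions from the two methods.
reg = [("chr1", 1000), ("chr1", 5000), ("chr1", 5500), ("chr2", 100)]
s = [("chr1", 1800)]
reg_only = unique_to_reg(reg, s)       # +/-1 kbp window
reg_only_loci = merge_loci(reg_only)
```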

Figure 5.

Comparison of reg-eQTL and s-eQTL methods

(A) Venn diagram showing the numbers of unique and shared significant SNVs between the reg-eQTL and s-eQTL methods in lung tissue, with subsets representing SNVs with or without significant TF and TF:SNV effects. “Not TF/TF:SNV” indicates that neither the TF nor the TF:SNV effect is significant.

(B and C) Scatterplots showing the distances of significant SNVs from the target genes for reg-eQTL (B) and s-eQTL (C).

(D and E) Significant SNVs associated with the target genes TOE1 in lung tissue (D) and MMACHC in whole blood (E) that were identified by reg-eQTL but missed by s-eQTL. Top panels show the HiC interactions, with blue indicating distant and gray indicating proximal interactions; middle panels show the effect sizes of the significant SNVs and the q values for the SNV, TF, and TF:SNV terms; and bottom panels show the LD block diagram for the given genomic range.

The GeneHancer annotations include the tissue specificity of REs, although the samples used to infer this information are few and heavily biased toward cancer cell lines (Table S3). For example, lung-specific REs are based on only three cell lines (two from lung cancer tissues and one from normal lung tissue). We reanalyzed the GTEx lung, brain, and whole blood samples focusing on REs specific to the matching tissue or cell types and present the results in Table S4. As expected, both the number of tested trios and the number of rSNVs decreased. For example, in the lung-specific analysis, the number of tested trios decreased to 39% of the original (from 43,474,690 to 16,955,403), and the number of rSNVs decreased to 17% (from 260,002 to 44,543). Despite these reductions, the distributions of SNV effects and p values were similar between tissue-specific REs and all REs (Figure S6). These findings suggest that although tissue-specific annotations refine the analysis, they do not substantially alter the overall patterns of detected associations.

We also tested adaptive q value cutoffs on the GTEx lung data. For SNVs with medium to large effects, a stricter q value cutoff (0.01–0.04) reduced the number of rSNVs only slightly (0%–2%, Table S5). However, for SNVs with small effects, adaptive q value cutoffs of 0.08, 0.15, 0.17, and 0.23 for MAF categories A to D, respectively, substantially increased the number of identified rSNVs, with inflation rates of 22%, 118%, 49%, and 88%, respectively. Because of this substantial inflation, users should carefully examine rSNVs in these categories to confirm their biological relevance. In the remaining analyses, we applied the default q value cutoff of 0.05.
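A sketch of how such category-specific cutoffs could be applied is shown below, assuming each result is a hypothetical (MAF category, effect class, nominal p, q) record. Only the small-effect cutoffs quoted above are encoded; the stricter 0.01 to 0.04 cutoffs for medium and large effects depend on Table S5 and are left at the 0.05 default here. Two further choices are assumptions: the inflation rate is taken as the relative increase over the count at the 0.05 cutoff, and the nominal p < 0.05 filter follows the recommendation given in the Discussion.

```python
# Small-effect q value cutoffs quoted in the text for MAF categories A to D.
# The MAF boundaries of each category are defined elsewhere (Table S5) and not reproduced.
SMALL_EFFECT_Q_CUTOFF = {"A": 0.08, "B": 0.15, "C": 0.17, "D": 0.23}
DEFAULT_Q = 0.05
NOMINAL_P = 0.05


def is_rsnv(record, adaptive=True):
    """record = (maf_category, effect_class, p, q); a hypothetical result format."""
    maf_cat, effect_class, p, q = record
    cutoff = DEFAULT_Q
    if adaptive and effect_class == "small":
        cutoff = SMALL_EFFECT_Q_CUTOFF[maf_cat]
    # Require nominal p < 0.05 whatever q value cutoff is in force.
    return q < cutoff and p < NOMINAL_P


def inflation_rate(records, maf_cat):
    """Relative gain in small-effect rSNVs from the adaptive cutoff versus the 0.05 default."""
    subset = [r for r in records if r[0] == maf_cat and r[1] == "small"]
    n_default = sum(is_rsnv(r, adaptive=False) for r in subset)
    n_adaptive = sum(is_rsnv(r, adaptive=True) for r in subset)
    return (n_adaptive - n_default) / n_default  # undefined if n_default == 0


records = [
    ("B", "small", 0.001, 0.03),
    ("B", "small", 0.004, 0.10),
    ("B", "small", 0.010, 0.14),
    ("B", "small", 0.200, 0.12),  # fails the nominal p filter
]
print(inflation_rate(records, "B"))  # 2.0
```

In this toy category, one record passes at q < 0.05 and three pass at q < 0.15, giving an inflation rate of 200%.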

Unique SNVs identified by reg-eQTL and s-eQTL

In the lung, brain, and whole blood tissues, reg-eQTL identified more unique rSNVs (12,931 in lung, 11,605 in whole blood, and 12,042 in brain) than s-eQTL did (7,496 in lung, 7,530 in whole blood, and 4,441 in brain). Notably, about 15% of the unique rSNVs identified by reg-eQTL (14.79% in lung, 14.67% in whole blood, and 14.76% in brain) were located far (>500 kbp) from the TSS (Figures 5B and 5C). Of these distant rSNVs, 7% in lung, 13% in whole blood, and 3% in brain exhibited significant TF effects or TF-SNV interactions, and 6% in lung, 4% in whole blood, and 2% in brain were also significantly associated with proximal TGs (within 100 kbp). For instance, the SNV rs2335410 is significantly associated with the distant TG KPNA6 (MIM: 610563) (q = 0.005, SNV-to-TG distance = 715 kbp), for which ARNT (MIM: 126110) is a significant TF. This SNV is also associated with the proximal TG YARS1 (MIM: 603623) (q = 0.022, SNV-to-TG distance = 0.866 kbp).

To identify chromatin interactions that may mediate the regulatory effects of distant SNVs in our eQTL analysis, we used ChIA-PET chromatin interaction data31 for various cell lines from the UCSC Genome Browser and retained interaction blocks that overlapped both the distant SNVs and the distant TGs. In lung tissue, examples of TGs with chromatin interactions linking them to distant eQTLs include TOE1 (MIM: 613931), a gene known to cause developmental defects and breathing abnormalities32; TTC9C (MIM: 610488), a potential novel tumorigenic regulator33; and TPRG1 (MIM: 611460), an immune-related gene correlated with tumor recurrence of stage Ia-Ib lung cancer.34 In whole blood, examples include ZFP91 (MIM: 619289), a potential target in acute myeloid leukemia (MIM: 601626) treatment35; SRCAP (MIM: 611421), with mutations implicated in human clonal hematopoiesis36; and MMACHC (MIM: 609831), in which mutations can cause methylmalonic acidemia (MIM: 251000), a disorder that can lead to blood abnormalities.37 Figures 5D and 5E show the chromatin interactions and linkage disequilibrium (LD) blocks encompassing the significant SNVs and their associated TGs, TOE1 in lung and MMACHC in whole blood. Reg-eQTL identified four rSNVs associated with TOE1 and MMACHC, whereas s-eQTL detected none. Among the rSNVs identified by reg-eQTL, rs78902799, rs115758619, and rs35708671 for TOE1 and rs114160820 and rs116724108 for MMACHC lie more than 500 kbp from their respective TGs and fall within significant HiC interactions, suggesting potential long-range regulatory effects.
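The overlap filter described above can be sketched as below. It is a simplified illustration under stated assumptions, not the authors' pipeline: each TG is reduced to a single TSS coordinate, interactions are hypothetical (chromosome, anchor A start/end, anchor B start/end) tuples already parsed from the ChIA-PET tracks, and an SNV-TG pair is kept only when the two are more than 500 kbp apart and the SNV and TSS sit in opposite anchors of the same interaction. Downloading and parsing the UCSC files is not shown.

```python
DISTANT_BP = 500_000  # "distant" SNV-TG threshold used in this section


def in_anchor(pos, start, end):
    return start <= pos <= end


def supported_distant_pairs(pairs, interactions):
    """Distant SNV-TG pairs whose SNV and TSS fall in opposite anchors of one interaction.

    pairs: (snv_id, chrom, snv_pos, gene, tss_pos) tuples (hypothetical format).
    interactions: (chrom, a_start, a_end, b_start, b_end) tuples, e.g., parsed ChIA-PET loops.
    """
    hits = []
    for snv_id, chrom, snv_pos, gene, tss in pairs:
        if abs(snv_pos - tss) <= DISTANT_BP:
            continue  # only distant pairs are examined
        for c, a0, a1, b0, b1 in interactions:
            if c != chrom:
                continue
            # Accept either orientation: SNV in anchor A and TSS in anchor B, or the reverse.
            forward = in_anchor(snv_pos, a0, a1) and in_anchor(tss, b0, b1)
            reverse = in_anchor(snv_pos, b0, b1) and in_anchor(tss, a0, a1)
            if forward or reverse:
                hits.append((snv_id, gene))
                break
    return hits


pairs = [
    ("snvA", "chr1", 1_000_000, "G1", 1_700_000),
    ("snvB", "chr1", 1_000_000, "G2", 1_200_000),  # only 200 kbp apart: skipped
    ("snvC", "chr1", 5_000_000, "G3", 4_000_000),
]
interactions = [
    ("chr1", 990_000, 1_010_000, 1_690_000, 1_710_000),
    ("chr1", 3_990_000, 4_010_000, 4_995_000, 5_005_000),
    ("chr1", 990_000, 1_010_000, 1_190_000, 1_210_000),
]
print(supported_distant_pairs(pairs, interactions))  # [('snvA', 'G1'), ('snvC', 'G3')]
```

Checking both orientations matters because interaction files do not order anchors by SNV or gene.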

Tissue specificity

In our analysis of tissue-shared and tissue-specific rSNVs with both reg-eQTL and s-eQTL, we identified overlapping and nonoverlapping rSNV regions among the lung, brain, and whole blood tissues. Specifically, we compared rSNV regions, each extended by a 10-kbp flanking region, across three tissue pairs: lung-brain, lung-whole blood, and brain-whole blood. Reg-eQTL identified more tissue-shared rSNV regions, and consequently fewer tissue-specific rSNV regions, than s-eQTL (Table S6). This trend persisted when we restricted the analysis to SNVs located within tissue-specific REs (Table S7).

For tissue-specific and tissue-shared rSNV regions, we further examined whether they were associated with the same or different TFs across tissues. Comparing tissue-specific rSNVs in lung and whole blood, we identified four lung-specific rSNVs and 13 whole blood-specific rSNVs that were associated with TG-TF pairs common to both tissues, such as NACC2 (MIM: 615786)-RXRA (MIM: 180245), ZFP36L1 (MIM: 601064)-FOS (MIM: 164810), and JUN (MIM: 165160)-ATF3 (MIM: 603148). We also found two lung-specific rSNVs and four whole blood-specific rSNVs that were associated with tissue-specific TG-TF pairs: COMMD3 (MIM: 616700)-BMI1 (MIM: 164831) and PIK3CB (MIM: 602925)-FOXP1 (MIM: 605515) in lung, and PIK3CB-BHLHE40 (MIM: 604256) and COMMD3-SMC3 (MIM: 606062) in whole blood. Comparing tissue-shared rSNV regions in lung and whole blood, we likewise identified several rSNV regions with common and unique TG-TF pairs. These findings suggest that tissue-specific regulation may involve the same TF binding to different TFBSs to regulate the same TG.

Regulatory network

Using reg-eQTL, we identified regulatory networks in each tissue by focusing on TGs involved in rTrios unique to that tissue. These TGs do not exhibit significant TF-SNV interactions in other tissues, highlighting tissue-specific regulatory mechanisms. For example, in lung tissue (Figure 6A), the gene RFWD3 (MIM: 614151) is significantly associated with SNV rs4402594 (q < 0.05) and shows a significant interaction with the TF ZNF629 (q < 0.1). RFWD3, an E3 ubiquitin ligase essential for the repair of DNA interstrand crosslinks in response to DNA damage, has been reported to be crucial for non-small cell lung cancer (NSCLC [MIM: 211980]) cell proliferation: silencing RFWD3 dramatically inhibits NSCLC cell proliferation and colony-forming activity, and its elevated expression in tumor samples is inversely associated with the clinical outcomes of NSCLC patients.38 In whole blood (Figure 6B), the gene BCL10 (MIM: 603517) is significantly associated with SNVs rs2735592 and rs2735591, which significantly interact with the TF RERE (MIM: 605226). Additionally, SNVs rs12032315 and rs12044882 show significant interactions with the TF ZNF687 (MIM: 610568), and SNV rs485928 shows a significant interaction with the TF ATF3 (MIM: 603148). The direct involvement of BCL10 in mucosa-associated lymphoid tissue lymphoma (MIM: 137245) has been studied previously: wild-type BCL10 promotes apoptosis and suppresses malignant transformation, whereas truncated mutants lose this activity and enhance transformation.39

Figure 6.

Tissue-specific regulatory networks identified using the reg-eQTL method in lung and whole blood tissues

(A) In lung, the gene RFWD3 is significantly associated with SNV rs4402594 (q value <0.05) and shows a significant interaction with the transcription factor ZNF629 (q value <0.1).

(B) In whole blood, the gene BCL10 is significantly associated with SNVs rs2735592 and rs2735591, which significantly interact with the transcription factor RERE. Additionally, SNVs rs12032315 and rs12044882 show significant interactions with ZNF687, and SNV rs485928 shows a significant interaction with ATF3.

(C) Two TFs, MLX (a suppressor) and ZNF207 (an activator), regulate the same target gene, TMEM104. The seven SNVs have negative effects on TMEM104 but positive interaction effects with MLX.

We found seven rTrios in lung tissue in which the TF and TF-SNV effects had opposite directions. Interestingly, all of these trios involved the same TF, MLX (MIM: 602976), and the same TG, TMEM104, and the SNVs all lay in the same locus upstream of the TG (Figure 6C). Consistently across these trios, the TF and SNV both showed negative effects, whereas the TF-SNV interaction had a positive effect. As a representative example, we focused on the SNV at chr17:74777655, stratified the data by SNV genotype, and plotted the relationship between TF and TG expression (Figure S7). The slope of the fitted line, representing the TF effect, became progressively more negative as the number of minor alleles increased from 0 to 1 to 2. A Chow test confirmed that the TF effect differed significantly across SNV genotypes (p = 10⁻⁵ and 10⁻⁸ for genotype 0 vs. 1 and 1 vs. 2, respectively). These findings suggest that the signals detected by reg-eQTL are unlikely to be false discoveries.
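The genotype-stratified comparison above can be sketched as a two-group Chow test, assuming TF and TG expression vectors are available for the samples in each genotype group. The test compares the pooled fit of TG on TF with separate fits in each group. Note that the classical Chow test asks whether the intercept and slope differ jointly, so this sketch does not isolate the slope alone. It uses NumPy and SciPy and is not taken from the reg-eQTL package; the simulated data are purely illustrative.

```python
import numpy as np
from scipy import stats


def rss_linear(x, y):
    """Residual sum of squares of the OLS fit y ~ 1 + x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_test(x1, y1, x2, y2):
    """Chow F test for different (intercept, slope) of TG on TF in two genotype groups."""
    k = 2  # parameters per regression: intercept and TF slope
    rss_pooled = rss_linear(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
    rss_split = rss_linear(x1, y1) + rss_linear(x2, y2)
    df_resid = len(x1) + len(x2) - 2 * k
    f_stat = ((rss_pooled - rss_split) / k) / (rss_split / df_resid)
    return f_stat, stats.f.sf(f_stat, k, df_resid)


# Simulated example: the TF slope is -0.2 in one genotype group and -1.0 in the other.
rng = np.random.default_rng(0)
tf0, tf1 = rng.normal(size=200), rng.normal(size=200)
tg0 = -0.2 * tf0 + rng.normal(scale=0.5, size=200)
tg1 = -1.0 * tf1 + rng.normal(scale=0.5, size=200)
f_stat, p_value = chow_test(tf0, tg0, tf1, tg1)
```

Comparing genotypes 0 vs. 1 and 1 vs. 2, as in the text, means calling `chow_test` once per adjacent pair of groups.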

A plausible biological explanation for this phenomenon is that other TFs competitively bind to the same SNV to regulate the TG, resulting in convoluted signals captured by reg-eQTL. We indeed identified another TF, ZNF207 (MIM: 603428), which also binds to this SNV (Figure 6C). Unlike MLX, ZNF207 positively regulates TMEM104. This dynamic, where two TFs with opposing regulatory effects—one acting as a suppressor and the other as an activator—bind to the same SNV and control the same TG, likely explains the observed statistical signals.

Discussion

In this study, we introduce reg-eQTL, an eQTL method that integrates TF effects and TF-SNV interactions into the conventional eQTL analysis framework. Through comprehensive simulations, we demonstrated that reg-eQTL was more robust than s-eQTL in detecting rare rSNVs (MAF < 0.05) and rSNVs with weak effects. However, the performance difference between reg-eQTL and s-eQTL depended on the direction and magnitude of the TF, SNV, and TF-SNV interaction effects. The largest AUC differences were observed when the TF and interaction effects had the same direction. Conversely, when the TF and TF-SNV interaction effects had opposite directions, s-eQTL performed better, likely because these effects attenuated or canceled each other in the reg-eQTL estimates. In some cases, the s-eQTL ROC curve even dipped below the diagonal, indicating worse-than-random performance. Overall, reg-eQTL proved more effective for low-frequency variants and under specific interaction dynamics. This highlights the importance of considering TF interactions in eQTL analysis for accurate estimation of variant effects.
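To make the trio concrete, the sketch below fits a single trio as an ordinary least-squares model with SNV, TF, and SNV:TF terms plus optional covariates such as PEER factors. The model specification (variable scaling, covariate set, and conversion of p values to q values) is an assumption here, not a transcription of the R/reg-eQTL package, and `fit_trio` is a hypothetical name. Under this model, the TF slope at genotype g is β_TF + β_SNV:TF·g. The interaction term is therefore what allows the TF effect to change with genotype.

```python
import numpy as np
from scipy import stats


def fit_trio(tg, snv, tf, covariates=None):
    """OLS fit of TG ~ SNV + TF + SNV:TF (+ covariates); returns {term: (beta, p)}."""
    tg, snv, tf = (np.asarray(v, dtype=float) for v in (tg, snv, tf))
    cols = [np.ones_like(tg), snv, tf, snv * tf]
    names = ["intercept", "SNV", "TF", "SNV:TF"]
    if covariates is not None:
        covariates = np.asarray(covariates, dtype=float)
        cols.extend(covariates.T)  # e.g., PEER factors, one column per covariate
        names.extend(f"cov{j + 1}" for j in range(covariates.shape[1]))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, tg, rcond=None)
    resid = tg - X @ beta
    df = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = 2 * stats.t.sf(np.abs(beta / se), df)  # two-sided t test per coefficient
    return {n: (float(b), float(pv)) for n, b, pv in zip(names, beta, p)}


# Simulated trio with additive genotype coding (0/1/2) and two covariates.
rng = np.random.default_rng(1)
n = 500
snv = rng.binomial(2, 0.3, size=n)
tf = rng.normal(size=n)
peer = rng.normal(size=(n, 2))
tg = 0.3 * snv + 0.5 * tf + 0.4 * snv * tf + rng.normal(scale=0.5, size=n)
fit = fit_trio(tg, snv, tf, covariates=peer)
```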

We also explored how the performance of reg-eQTL and s-eQTL varied across scenarios defined by the TF, SNV, and TF-SNV interaction effects and across q value thresholds. Specificity remained high at lower q value thresholds but decreased at higher thresholds, whereas sensitivity increased steadily with higher q values, particularly when the effects were consistently strong or consistently weak. These findings suggest that the conventional q value threshold of 0.05 may not be universally applicable and that optimal cutoffs should be chosen according to the effect sizes of the TFs and SNVs in the analysis at hand. In general, we recommend stringent thresholds for common variants with strong effects and lenient thresholds for rare variants with weak effects. The guidelines in Table 2 require knowledge of the effect size, which is not directly observable but can be estimated using the Cohen’s f² standardized coefficient provided in the reg-eQTL output. Importantly, nominal p values should remain <0.05 regardless of the q value threshold, particularly for variants with weak effects, for which the recommended q value cutoffs are lenient.
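Cohen’s f² for a single term is commonly computed as the local effect size described by Selya et al.,27 f² = (R²_full − R²_reduced) / (1 − R²_full), where the reduced model drops that term. The sketch below assumes this definition; whether the value in the reg-eQTL output is computed in exactly this way is not stated in this section. By Cohen's conventions, f² values of 0.02, 0.15, and 0.35 are usually read as small, medium, and large effects.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of the OLS fit y ~ 1 + X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def local_f2(X, y, term_cols):
    """Cohen's local f^2 for the predictor columns listed in term_cols."""
    drop = set(term_cols)
    keep = [j for j in range(X.shape[1]) if j not in drop]
    r2_full = r_squared(X, y)
    r2_reduced = r_squared(X[:, keep], y)
    return (r2_full - r2_reduced) / (1.0 - r2_full)


# Column 0 drives y; column 1 is pure noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = 1.0 * X[:, 0] + rng.normal(size=300)
f2_signal = local_f2(X, y, [0])
f2_noise = local_f2(X, y, [1])
```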

When applied to GTEx data, reg-eQTL and s-eQTL produced consistent results for the large majority of SNVs. However, reg-eQTL excelled at identifying distant eQTLs (located >500 kbp from the TSS). The analysis of tissue-specific and shared regulatory networks further highlighted distinct and overlapping regulatory mechanisms across tissues. Reg-eQTL identified more tissue-shared rSNV regions and fewer tissue-specific rSNV regions than s-eQTL, suggesting that it more readily captures regulatory variants shared across tissues. Additionally, the tissue-specific regulatory networks identified in lung and whole blood provide insights into the unique regulatory landscapes of these tissues. Like most eQTL-mapping algorithms, reg-eQTL assumes independence between multiple causal signals and tests each regulatory trio separately. This assumption could be violated if multiple TFs bind competitively to the same SNV. In such cases, the regulatory networks constructed by reg-eQTL are particularly useful, because they highlight SNVs shared across multiple TFs and TGs, which can then be jointly modeled in follow-up analyses to account for interdependencies and competitive interactions.

Despite these promising results, this study is limited by the large number of tests performed across the extensive set of trios, which raises the risk of false positives and increases the computational burden. By default, reg-eQTL corrects for multiple testing using q values, which are relatively lenient. For users requiring stricter control of false positives, reg-eQTL also supports alternatives such as Bonferroni correction. Users are encouraged to explore advanced methods for FDR control, such as hierarchical FDR filtering,25,40 iterative p-value threshold adaptation,41 and p-value-free FDR estimation.42 Ultimately, however, computationally identified eQTLs must be validated experimentally to confirm their biological relevance.
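The difference in stringency can be illustrated with two standard adjustments. The sketch below computes Benjamini-Hochberg adjusted p values alongside Bonferroni-corrected p values. BH values coincide with Storey q values only when the null proportion π0 is fixed at 1; because Storey's method estimates π0 ≤ 1, its q values are no larger than BH values. This is a generic NumPy illustration, not the reg-eQTL code path.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p values (equal to Storey q values when pi0 = 1)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking a running minimum from the largest p value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def bonferroni_adjust(p):
    """Bonferroni-corrected p values, capped at 1."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)


p_values = [0.01, 0.04, 0.03, 0.005]
bh = bh_adjust(p_values)
bonf = bonferroni_adjust(p_values)
```

At a 0.05 threshold, all four p values remain significant after BH adjustment, but only two survive Bonferroni correction.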

In the reg-eQTL model, including TFs alongside PEER factors as covariates can introduce multicollinearity. Specifically, if TFs are correlated with the PEER factors,43 this correlation can cause multicollinearity within the regression model and complicate the interpretation of individual effects. When TF interactions are assessed, such collinearity can obscure the true relationship between TFs and SNVs and may cause the TF-SNV interaction term to be estimated as nonsignificant. In our analysis, all 504 TFs examined in lung tissue exhibited a Pearson correlation coefficient greater than 0.5 with at least one of the PEER factor covariates (Figure S8). This highlights the need to assess, and where necessary adjust for, collinearity to ensure accurate modeling of TF effects in eQTL analysis.
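A quick diagnostic for this issue is to compute, for each TF, its strongest Pearson correlation with any PEER factor, as in the check summarized above. The sketch below assumes expression and covariate matrices with samples in rows and uses the absolute correlation; whether the 0.5 threshold was applied to signed or absolute values is an assumption. The simulated matrices are illustrative only.

```python
import numpy as np


def max_abs_corr(tf_expr, peer):
    """For each TF column, the largest |Pearson r| with any PEER factor column."""
    # z-scores with population SD, so the mean of products is the Pearson correlation.
    tf_z = (tf_expr - tf_expr.mean(axis=0)) / tf_expr.std(axis=0)
    peer_z = (peer - peer.mean(axis=0)) / peer.std(axis=0)
    r = tf_z.T @ peer_z / tf_expr.shape[0]  # (n_tfs, n_factors) correlation matrix
    return np.abs(r).max(axis=1)


rng = np.random.default_rng(3)
peer = rng.normal(size=(200, 3))
tf_expr = np.column_stack([
    peer[:, 0] + 0.1 * rng.normal(size=200),  # TF that tracks the first PEER factor
    rng.normal(size=200),                     # TF unrelated to the PEER factors
])
flagged = max_abs_corr(tf_expr, peer) > 0.5  # threshold quoted in the text
```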

Tissue- and cell-type specificity is essential for understanding gene-regulatory mechanisms. However, this information may not be fully captured in the GeneHancer annotations due to the limited number of samples used to derive tissue-specific REs, many of which are heavily biased toward cancer samples. Improving the quality and comprehensiveness of these annotations would significantly enhance their utility. Nevertheless, because the reg-eQTL model incorporates TF expression levels that vary by tissue type, the results inherently reflect some degree of tissue specificity. To further enhance the flexibility, the R/reg-eQTL package includes a utility function that allows users to generate regulatory trios using their own custom annotations, enabling analyses tailored to specific datasets or tissue types.

Fine mapping is a natural follow-up to eQTL analysis for pinpointing causal variants in high-LD regions. Several fine-mapping algorithms accept user-specified SNV-level weights as priors to guide the fine-mapping process, including PAINTOR,44 CAVIAR/CAVIARBF,41 TreeMap,45 and TORUS.46 The standardized coefficients of the SNV effect reported by reg-eQTL are well suited as weights for these tools, because they are comparable across loci and reflect the relative contribution of each SNV to TG expression.
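One simple way to turn standardized SNV coefficients into relative prior weights within a locus is to normalize their absolute values so that they sum to 1, as sketched below. This normalization is an illustrative assumption: PAINTOR, CAVIARBF, TreeMap, and TORUS each define their own priors and input formats, which this sketch does not reproduce. The SNV IDs are hypothetical placeholders.

```python
def snv_prior_weights(std_coef):
    """Scale |standardized SNV coefficient| within one locus so the weights sum to 1.

    std_coef: dict mapping SNV ID to its standardized coefficient (hypothetical input).
    """
    total = sum(abs(b) for b in std_coef.values())
    if total == 0:
        # No signal in the locus: fall back to equal weights.
        return {snv: 1 / len(std_coef) for snv in std_coef}
    return {snv: abs(b) / total for snv, b in std_coef.items()}


weights = snv_prior_weights({"rs1": 0.3, "rs2": -0.1})
```

Taking absolute values keeps the weight independent of effect direction, because both up- and down-regulating SNVs are candidate causal variants.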

In summary, reg-eQTL is a valuable addition to the existing eQTL analysis toolkit, providing functionalities to study regulatory networks.

Data and code availability

We developed an R package implementing the reg-eQTL algorithm and have provided open access at https://github.com/liliulab/reg-eQTL.

Acknowledgments

This work was supported by the National Institutes of Health (grant no. R01LM013438).

Declaration of interests

The authors declare no competing interests.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors utilized ChatGPT to assist with grammar checks and sentence formation in the drafting of the manuscript. Following the use of this tool, the authors thoroughly reviewed and edited the content to ensure its accuracy and appropriateness. The authors take full responsibility for the final content of the publication.

Published: February 7, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2025.01.015.

Web resources

Supplemental information

Document S1. Figures S1–S8 and Tables S1–S7
mmc1.pdf (840.4KB, pdf)
Document S2. Article plus supplemental information
mmc2.pdf (5.5MB, pdf)

References

1. Hrdlickova B., de Almeida R.C., Borek Z., Withoff S. Genetic variation in the non-coding genome: Involvement of micro-RNAs and long non-coding RNAs in disease. Biochim. Biophys. Acta. 2014;1842:1910–1922. doi: 10.1016/J.BBADIS.2014.03.011.
2. Kumar S., Ambrosini G., Bucher P. SNP2TFBS – a database of regulatory SNPs affecting predicted transcription factor binding site affinity. Nucleic Acids Res. 2017;45:D139–D144. doi: 10.1093/NAR/GKW1064.
3. Li M.J., Yan B., Sham P.C., Wang J. Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression. Brief. Bioinform. 2015;16:393–412. doi: 10.1093/BIB/BBU018.
4. Hardison R.C. Genome-wide epigenetic data facilitate understanding of disease susceptibility association studies. J. Biol. Chem. 2012;287:30932–30940. doi: 10.1074/JBC.R112.352427.
5. Schaub M.A., Boyle A.P., Kundaje A., Batzoglou S., Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22:1748–1759. doi: 10.1101/GR.136127.111.
6. Nicolae D.L., Gamazon E., Zhang W., Duan S., Dolan M.E., Cox N.J. Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genet. 2010;6. doi: 10.1371/JOURNAL.PGEN.1000888.
7. Gao P., Xia J.H., Sipeky C., Dong X.M., Zhang Q., Yang Y., Zhang P., Cruz S.P., Zhang K., Zhu J., et al. Biology and Clinical Implications of the 19q13 Aggressive Prostate Cancer Susceptibility Locus. Cell. 2018;174:576–589.e18. doi: 10.1016/J.CELL.2018.06.003.
8. Wang Y., Ma R., Liu B., Kong J., Lin H., Yu X., Wang R., Li L., Gao M., Zhou B., et al. SNP rs17079281 decreases lung cancer risk through creating an YY1-binding site to suppress DCBLD1 expression. Oncogene. 2020;39:4092–4102. doi: 10.1038/S41388-020-1278-4.
9. Sieberts S.K., Perumal T.M., Carrasquillo M.M., Allen M., Reddy J.S., Hoffman G.E., Dang K.K., Calley J., Ebert P.J., Eddy J., et al. Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. Sci. Data. 2020;7. doi: 10.1038/S41597-020-00642-8.
10. Wright F.A., Shabalin A.A., Rusyn I. Computational tools for discovery and interpretation of expression quantitative trait loci. Pharmacogenomics. 2012;13:343–352. doi: 10.2217/PGS.11.185.
11. Wang P., Dawson J.A., Keller M.P., Yandell B.S., Thornberry N.A., Zhang B.B., Wang I.M., Schadt E.E., Attie A.D., Kendziorski C. A Model Selection Approach for Expression Quantitative Trait Loci (eQTL) Mapping. Genetics. 2011;187:611–621. doi: 10.1534/GENETICS.110.122796.
12. Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/BIOINFORMATICS/BTS163.
13. Ongen H., Buil A., Brown A.A., Dermitzakis E.T., Delaneau O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics. 2016;32:1479–1485. doi: 10.1093/BIOINFORMATICS/BTV722.
14. Zeng Z.B. Precision mapping of quantitative trait loci. Genetics. 1994;136:1457–1468. doi: 10.1093/GENETICS/136.4.1457.
15. Haley C.S., Knott S.A. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity. 1992;69:315–324. doi: 10.1038/HDY.1992.131.
16. Arends D., Prins P., Jansen R.C., Broman K.W. R/qtl: high-throughput multiple QTL mapping. Bioinformatics. 2010;26:2990–2992. doi: 10.1093/BIOINFORMATICS/BTQ565.
17. Schaid D.J., Chen W., Larson N.B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 2018;19:491–504. doi: 10.1038/s41576-018-0016-z.
18. Tseng C.C., Wong M.C., Liao W.T., Chen C.J., Lee S.C., Yen J.H., Chang S.J. Genetic Variants in Transcription Factor Binding Sites in Humans: Triggered by Natural Selection and Triggers of Diseases. Int. J. Mol. Sci. 2021;22:4187. doi: 10.3390/IJMS22084187.
19. Degtyareva A.O., Antontseva E.V., Merkulova T.I. Regulatory SNPs: Altered Transcription Factor Binding Sites Implicated in Complex Traits and Diseases. Int. J. Mol. Sci. 2021;22:6454. doi: 10.3390/IJMS22126454.
20. Johnston A.D., Simões-Pires C.A., Thompson T.V., Suzuki M., Greally J.M. Functional genetic variants can mediate their regulatory effects through alteration of transcription factor binding. Nat. Commun. 2019;10:1–16. doi: 10.1038/s41467-019-11412-5.
21. Flynn E.D., Tsu A.L., Kasela S., Kim-Hellmuth S., Aguet F., Ardlie K.G., Bussemaker H.J., Mohammadi P., Lappalainen T. Transcription factor regulation of eQTL activity across individuals and tissues. PLoS Genet. 2022;18. doi: 10.1371/JOURNAL.PGEN.1009719.
22. Fishilevich S., Nudel R., Rappaport N., Hadar R., Plaschkes I., Iny Stein T., Rosen N., Kohn A., Twik M., Safran M., et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database. 2017;2017. doi: 10.1093/DATABASE/BAX028.
23. Delaneau O., Ongen H., Brown A.A., Fort A., Panousis N.I., Dermitzakis E.T. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 2017;8. doi: 10.1038/NCOMMS15452.
24. Taylor-Weiner A., Aguet F., Haradhvala N.J., Gosai S., Anand S., Kim J., Ardlie K., Van Allen E.M., Getz G. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 2019;20. doi: 10.1186/S13059-019-1836-7.
25. Huang Q.Q., Ritchie S.C., Brozynska M., Inouye M. Power, false discovery rate and Winner’s Curse in eQTL studies. Nucleic Acids Res. 2018;46:e133. doi: 10.1093/NAR/GKY780.
26. Ko B.S., Lee S.B., Kim T.K. A brief guide to analyzing expression quantitative trait loci. Mol. Cells. 2024;47. doi: 10.1016/J.MOCELL.2024.100139.
27. Selya A.S., Rose J.S., Dierker L.C., Hedeker D., Mermelstein R.J. A Practical Guide to Calculating Cohen’s f2, a Measure of Local Effect Size, from PROC MIXED. Front. Psychol. 2012;3:111. doi: 10.3389/FPSYG.2012.00111.
28. Nieminen P. Application of Standardized Regression Coefficient in Meta-Analysis. BioMedInformatics. 2022;2:434–458. doi: 10.3390/BIOMEDINFORMATICS2030028.
29. Aguet F., Barbeira A.N., Bonazzola R., Brown A., Castel S.E., Jo B., Kasela S., Kim-Hellmuth S., Liang Y., Oliva M., et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318. doi: 10.1126/SCIENCE.AAZ1776.
30. Lonsdale J., Thomas J., Salvatore M., Phillips R., Lo E., Shad S., Hasz R., Walters G., Garcia F., Young N., et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
31. Fullwood M.J., Han Y., Wei C.L., Ruan X., Ruan Y. Chromatin Interaction Analysis using Paired-End Tag Sequencing. Curr. Protoc. Mol. Biol. 2010;Chapter 21:21.15.1–21.15.25. doi: 10.1002/0471142727.MB2115S89.
32. Wang C., Ge Y., Li R., He G., Lin Y. Novel compound heterozygous missense variants in TOE1 gene associated with pontocerebellar hypoplasia type 7. Gene. 2023;862. doi: 10.1016/J.GENE.2023.147250.
33. Lintell N., Hsieh S.-M., Hunter K., Ambs S. Ttc9c: A potential novel tumorigenic regulator. American Association for Cancer Research. 2008;68(9_Supplement):LB–94.
34. Hong T., Piao S., Sun L., Tao Y., Ke M. Tumor protein P63 Regulated 1 contributes to inflammation and cell proliferation of cystitis glandularis through regulating the NF-κB/cyclooxygenase-2/prostaglandin E2 axis. Bosn. J. Basic Med. Sci. 2022;22:100–109. doi: 10.17305/BJBMS.2021.6763.
35. Zhang Z., Zhong L., Dan W., Chu X., Liu C., Luo X., Wan P., Liu Z., Lu Y., Wang X., Liu B. ZFP91 promotes cell proliferation and inhibits cell apoptosis in AML via inhibiting the proteasome-dependent degradation of RIP1. Int. J. Med. Sci. 2022;19:274–285. doi: 10.7150/IJMS.67436.
36. Chen C.W., Zhang L., Dutta R., Niroula A., Miller P.G., Gibson C.J., Bick A.G., Reyes J.M., Lee Y.T., Tovy A., et al. SRCAP mutations drive clonal hematopoiesis through epigenetic and DNA repair dysregulation. Cell Stem Cell. 2023;30:1503–1519.e8. doi: 10.1016/j.stem.2023.09.011.
37. Xu Q., Zhou H., Li M., Wang W., Xu M., Zhu Z., Zhang C., Wang Q., Yu F., He J. Structural Study of the Complex of cblC Methylmalonic Aciduria and Homocystinuria-Related Protein MMACHC with Cyanocobalamin. Crystals. 2022;12:468. doi: 10.3390/CRYST12040468.
38. Zhang Y., Zhao X., Zhou Y., Wang M., Zhou G. Identification of an E3 ligase-encoding gene RFWD3 in non-small cell lung cancer. Front. Med. 2020;14:318–326. doi: 10.1007/S11684-019-0708-6.
39. Gehring T., Seeholzer T., Krappmann D. BCL10-Bridging CARDs to immune activation. Front. Immunol. 2018;9. doi: 10.3389/FIMMU.2018.01539.
40. Peterson C.B., Bogomolov M., Benjamini Y., Sabatti C. TreeQTL: hierarchical error control for eQTL findings. Bioinformatics. 2016;32:2556–2558. doi: 10.1093/BIOINFORMATICS/BTW198.
41. Chen W., McDonnell S.K., Thibodeau S.N., Tillmans L.S., Schaid D.J. Incorporating Functional Annotations for Fine-Mapping Causal Variants in a Bayesian Framework Using Summary Statistics. Genetics. 2016;204:933–958. doi: 10.1534/GENETICS.116.188953.
42. Ge X., Chen Y.E., Song D., McDermott M., Woyshner K., Manousopoulou A., Wang N., Li W., Wang L.D., Li J.J. Clipper: p-value-free FDR control on high-throughput data from two conditions. Genome Biol. 2021;22. doi: 10.1186/S13059-021-02506-9.
43. Fan Y., Zhu H., Song Y., Peng Q., Zhou X. Efficient and effective control of confounding in eQTL mapping studies through joint differential expression and Mendelian randomization analyses. Bioinformatics. 2021;37:296–302. doi: 10.1093/BIOINFORMATICS/BTAA715.
44. Kichaev G., Yang W.Y., Lindstrom S., Hormozdiari F., Eskin E., Price A.L., Kraft P., Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10. doi: 10.1371/JOURNAL.PGEN.1004722.
45. Liu L., Chandrashekar P., Zeng B., Sanderford M.D., Kumar S., Gibson G. TreeMap: a structured approach to fine mapping of eQTL variants. Bioinformatics. 2021;37:1125–1134. doi: 10.1093/BIOINFORMATICS/BTAA927.
46. Wen X., Pique-Regi R., Luca F. Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization. PLoS Genet. 2017;13. doi: 10.1371/JOURNAL.PGEN.1006646.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
