Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of specific genes, which can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and has implicated functional genes and variants for complex traits and diseases. Although most eQTL studies have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Keywords: eQTL, Bulk samples, Single cell, Tissues, Cell-type-specific, Context-dependent
1. Introduction
Recent years have seen significant progress in identifying genomic regions associated with complex traits and diseases through genome-wide association studies (GWAS), where tens of thousands of genomic regions have been associated with thousands of traits, including many complex diseases (see the curated GWAS results at https://www.ebi.ac.uk/gwas/). One challenge in interpreting GWAS findings is that most of the associated genetic variants, e.g., single nucleotide polymorphisms (SNPs), are in intergenic regions, making it difficult to infer functional genes and variants in these regions. Many efforts have been made to annotate the human genome and infer the functional roles of SNPs and other variants through experimental studies (e.g., ENCODE project [The ENCODE Project Consortium et al., 2020]; Roadmap Epigenomics Project [Roadmap Epigenomics et al., 2015]; psychENCODE [Psych et al., 2015]) and computational approaches (e.g., CADD [Kircher et al., 2014]; GWAVA [Ritchie et al., 2014]; GenoCanyon [Lu et al., 2015]; GenoSkyline [Lu et al., 2016]; EIGEN [Ionita-Laza et al., 2016]; GenoSkyline-Plus [Lu et al., 2017]; STARR [Li et al., 2022]). These efforts include eQTL studies (The GTEx Consortium, 2020), where the goal is to infer genetic variants affecting gene regulation by associating genotypes with gene expression levels across a sample of individuals. Because eQTL studies measure expression levels of all the genes in the genome, they provide an unbiased view of the regulation of gene expression. Using results from eQTL studies in lymphoblastoid cell lines from HapMap samples, it was shown that SNPs associated with complex traits are significantly more likely to be eQTLs than minor-allele-frequency-matched SNPs (Nicolae et al., 2010).
Another study assessed the enrichment and depletion of variants in different annotation classes (Kindt et al., 2013), including genic regions, regulatory features, measures of conservation, and patterns of histone modifications. It was found that annotations associated with chromatin state and eQTLs were the most enriched groups. These early results stimulated many large community efforts to collect gene expression and genotype data for eQTL studies, and the accumulation of eQTL results parallels the great success of GWAS. If a SNP is associated with both a complex trait and the expression level of a specific gene, this gene may be implicated as a candidate gene for the trait. A number of methods have been developed to formalize this idea as co-localization analysis, which aims to find the SNPs that are associated with both expression and complex traits (Hormozdiari et al., 2016; Wen et al., 2016; Giambartolomei et al., 2018). Transcriptome-wide association analysis methods have been developed to use eQTL data to predict expression levels and associate the predicted (imputed) expression levels with the observed complex traits (Gamazon et al., 2015; Gusev et al., 2016; Hu et al., 2019). Mendelian randomization methods have also been proposed to investigate whether the expression trait is a causal factor for a complex trait of interest (Richardson et al., 2020; Yuan et al., 2020; Zhou et al., 2020; Liu et al., 2021).
The most well-known eQTL study is the GTEx project, where dozens of tissues from hundreds of individuals were analyzed to identify tissue-specific eQTLs (The GTEx Consortium, 2020). The GTEx project has proved to be a valuable resource for the research community. Version 8 of GTEx analyzed 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. It was found that cis-eQTLs showed 1.46-fold enrichment in the GWAS catalogue (https://www.ebi.ac.uk/gwas/), where significant GWAS association results are collected. The cross-tissue eQTL similarities were consistent with tissue relatedness, with brain tissues forming one cluster and other organs being more similar to each other, with the exceptions of testis, lymphoblastoid cell lines, whole blood, and liver, which are distinct from other tissues. BLUEPRINT collects genetic, epigenetic, and transcriptomic profiling in three immune cell types to investigate the contributions of different factors to gene expression (Chen et al., 2016). The eQTL Catalogue is a resource developed by reprocessing data from dozens of studies with more than 30,000 samples, where summary statistics are available for many cell types and tissues (Kerimov et al., 2021). The results from these studies and the resources thus generated have demonstrated the value of eQTL information in inferring causal genes and variants at GWAS loci.
Most eQTL studies to date have been performed on bulk samples, where the estimated effect size of a SNP represents the average effect across different cell types, and the cell type(s) of origin of the inferred eQTLs are often unknown for a bulk sample consisting of distinct cell types. Despite some successes in using eQTL results to infer disease-causing genes and variants, recent studies based on both modeling (Yao et al., 2020) and carefully chosen gene-trait pairs (Connally et al., 2022) have shown that the known eQTLs, which are mostly derived from analysis of bulk tissues, explain only a very small proportion of the GWAS signals through colocalization of GWAS hits with eQTL SNPs. There is growing evidence (as summarized below) that eQTL effects are often cell-type-specific and/or context-dependent, and many of the eQTLs uniquely identified through cell-type-specific and context-dependent analysis (either experimentally or computationally) colocalize with GWAS results (Aguirre-Gamboa et al., 2020; Donovan et al., 2020; Patel et al., 2021), suggesting the importance of cell-type-specific and context-dependent eQTLs for interpreting and understanding GWAS signals. Therefore, there is a great need to identify these additional eQTLs missed by tissue-based analysis to expand the space of eQTLs and make more informed inference on disease-causing genes and variants.
To facilitate the identification of cell-type-specific and context-dependent eQTLs, statistical methods have been developed both for bulk samples, through digital deconvolution analysis, and for single-cell data, which offer finer cell-type resolution and can capture dynamic effects of eQTLs. We illustrate three different data types that can be used for inferring cell-type-specific and context-dependent eQTLs (Fig. 1). In this review, we discuss existing statistical methods that use bulk tissues and single-cell data to identify cell-type-specific and context-dependent eQTLs, showing high-level analysis pipelines for bulk tissues consisting of distinct cell types and for single cells (Fig. 2), with details in the next section. We then summarize results from empirical studies using bulk samples, purified cells, and single-cell data. We conclude with the limitations of the existing computational methods and future methodological needs.
Fig. 1.

Illustration of eQTL analysis at different resolutions: single cells, purified cells, and bulk samples. Shown are data from three individuals with genotypes of AA, AG, and GG, respectively. There are two cell types making up the bulk samples, the oval-shaped cells and the triangle-shaped cells. For single-cell data, we can observe expression levels at the single-cell level. For example, for the first individual with genotype AA, there are four oval-shaped cells with expression levels of 0.9, 1.1, 0.8, and 1.2, and two triangle-shaped cells with expression levels of 3.2 and 2.8, respectively. eQTL analysis can be performed for the two cell types separately using single cells across these three individuals to correlate genotypes with observed single-cell level gene expression data. For data from purified cells, we observe aggregated gene expression levels for different cell types but without individual cell level measurements. The average expression level for the oval-shaped cells is 1, 2, and 3, respectively, for the three individuals. For data from bulk samples, we can no longer distinguish contributions from the two distinct cell types. The average expression level for the three individuals is 1.7, 2.0, and 1.7, respectively. For single-cell data, we can not only study the association between genotypes and cell-type-specific expression but also correlate genotypes with cell-type proportions. Through deconvolution methods, the bulk samples may be deconvoluted into different cell types to allow cell-type-specific eQTL analysis with estimated cell type proportions from different individuals.
Fig. 2.

General pipeline for (A) bulk-sample-based and (B) single-cell-based analysis to identify cell-type-specific and context-dependent eQTLs.
2. Analytical approaches for cell-type-specific and context-dependent eQTL inference
2.1. eQTL inference using bulk samples
Early eQTL studies collected gene expression data using microarrays, where gene expression levels need to be normalized to remove batch effects, and the normalized data are analyzed to identify eQTLs. Consider a study with $N$ subjects, $S$ SNPs, and $G$ genes. For the $n$th subject, let $y_{ng}$ denote the expression level of the $g$th gene, and $x_{ns}$ denote the genotype of the $s$th SNP. For a SNP with two alleles, say $A$ and $a$, its three genotypes $AA$, $Aa$, and $aa$ can be coded as 2, 1, and 0, respectively. We can study the relationship between the observed gene expression level and genotype through the following regression model
$$y_{ng} = \mu_{g} + \beta_{gs} x_{ns} + \varepsilon_{ng} \tag{1}$$
where $\mu_{g}$ is the intercept, $\beta_{gs}$ is the effect of the $s$th SNP on the expression of the $g$th gene, and $\varepsilon_{ng}$ is the error term, often assumed to follow a normal distribution. A more comprehensive model may also include other covariates, such as age and sex. Testing the null hypothesis that the $s$th SNP does not affect the expression level of the $g$th gene is equivalent to testing $\beta_{gs} = 0$. A typical eQTL study considers more than 20,000 genes and up to millions of SNPs. Because of the large number of SNPs to be tested, researchers often focus on cis-eQTLs for a given gene, which are SNPs in close physical proximity, say within one million base pairs of the candidate gene. In contrast, trans-eQTLs correspond to SNPs that are on different chromosomes or further away from the gene of interest on the same chromosome. Most eQTL findings have been for cis-eQTLs, largely due to the difference in statistical power for detecting cis-eQTLs and trans-eQTLs. With a few hundred samples, which is the typical size of an eQTL study, there is limited power to perform the genome-wide scan required to identify trans-eQTLs, which often have smaller effect sizes than cis-eQTLs. Even for cis-eQTL analysis, hundreds or thousands of SNPs often need to be considered, and multiple comparison adjustments must be made to appropriately control false positive findings. Several computational tools have been developed and are commonly used for eQTL analysis in bulk samples, such as MatrixEQTL (Shabalin, 2012) and FastQTL (Ongen et al., 2016).
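As a minimal sketch of the test behind model (1), the Python snippet below fits the regression for a single gene-SNP pair by ordinary least squares and returns the estimated SNP effect with its standard error and t-statistic. The function name `fit_eqtl` and the toy data are illustrative assumptions and do not come from any of the cited tools; covariates and multiple-testing correction are deliberately left out.

```python
import math


def fit_eqtl(genotypes, expression):
    """OLS fit of expression = mu + beta * genotype + error for one gene-SNP pair.

    genotypes: allele counts coded 0, 1, or 2 (one value per subject).
    expression: normalized expression of one gene (one value per subject).
    Returns (mu, beta, standard error of beta, t-statistic for beta = 0).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    beta = sxy / sxx
    mu = mean_y - beta * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom for two parameters.
    rss = sum((y - mu - beta * x) ** 2 for x, y in zip(genotypes, expression))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return mu, beta, se_beta, beta / se_beta


# Toy data, made up for illustration only.
genotypes = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 2.0, 3.1, 2.9, 3.0]
mu, beta, se_beta, t_stat = fit_eqtl(genotypes, expression)
print(f"beta = {beta:.3f}, se = {se_beta:.3f}, t = {t_stat:.1f}")
```

The t-statistic would be compared with a t distribution with $N - 2$ degrees of freedom; in a real cis-eQTL scan this test is repeated for every SNP near the gene, which is why multiple comparison adjustment is needed.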
The regression setting in (1), where the errors are assumed to be Gaussian, is reasonable for microarray-based gene expression measurements. However, with gene expression data collected through RNA-sequencing, such as those from the GTEx project, the measured gene expression level is the total number of sequence reads mapped to a specific gene, which needs to be adjusted for total sequencing depth and other factors. These data may be better modeled by other distributions, e.g., negative binomial, while accounting for factors that may impact the observed sequencing reads. In this case, a generalized linear regression model may be more appropriate than (1) and may also have better statistical power, although it may be computationally more expensive.
For RNA-sequencing, there is the added benefit of observing alternative splicing and allele-specific expression. In the case of allele-specific expression, consider a SNP in the transcribed region of a gene with two alleles $A$ and $a$, and the simple scenario that all the sequence reads contain this SNP. For heterozygous individuals with genotype $Aa$, a sequence read covering this SNP may have either $A$ or $a$. In one extreme case, all the sequence reads may contain $A$ but not $a$. Even in the absence of measured total gene expression levels for homozygous individuals with genotypes $AA$ and $aa$, the imbalance between the mapped sequence reads having $A$ and $a$ suggests the presence of a cis-eQTL that regulates the expression level of this gene, either the SNP with alleles $A$ and $a$ itself or some SNP in perfect dependence with this SNP. Statistical models have been proposed to explicitly incorporate allele-specific expression to identify cis-eQTLs, including TReCASE (Sun, 2012), RASQUAL (Kumasaka et al., 2016), and mixQTL (Liang et al., 2021). Using TReCASE, it was found that considering allele-specific expression could identify 20% to 100% more genes with eQTLs across 28 tissues in the GTEx project than considering total expression levels alone (Zhabotynsky et al., 2022), and the power gain of mixQTL was equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage (Liang et al., 2021).
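The allelic imbalance idea can be illustrated with a deliberately simplified test. The sketch below computes an exact two-sided binomial p-value for the null hypothesis that reads from a heterozygous individual carry either allele with probability 0.5. This is not the model used by TReCASE, RASQUAL, or mixQTL, which combine allele-specific and total read counts in more elaborate models; the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(reads_allele_1, reads_allele_2):
    """Exact two-sided binomial test of H0: each read carries allele 1 with probability 0.5.

    Under H0 every outcome k has probability comb(n, k) / 2**n, so the p-value is
    the total probability of outcomes that are no more likely than the observed one.
    """
    n = reads_allele_1 + reads_allele_2
    observed = comb(n, reads_allele_1)
    as_or_more_extreme = sum(
        comb(n, k) for k in range(n + 1) if comb(n, k) <= observed
    )
    return as_or_more_extreme / 2 ** n


# Hypothetical read counts at a heterozygous site in one individual.
p_balanced = allelic_imbalance_pvalue(10, 10)  # equal support for both alleles
p_skewed = allelic_imbalance_pvalue(18, 2)     # strong imbalance toward allele 1
print(p_balanced, p_skewed)
```

A small p-value for the skewed counts points to a nearby regulatory variant acting in cis, although real read counts are usually overdispersed relative to the binomial, so this simple test would tend to be anti-conservative.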
The analysis of tissue-level data also allows for the investigation of context-dependent eQTLs if the context can be well defined. For example, 369 sex-biased eQTLs were inferred through separate analyses of male and female GTEx samples (The GTEx Consortium, 2020), where the sex of an individual may be considered a context. Furthermore, 178 population-biased eQTLs were also implicated, where population origin may be regarded as another context. Other context-dependent effects can be considered by including an interaction term between the context variable and the SNP genotype in regression model (1).
2.2. Cell-type-specific eQTL inference using bulk samples
Several studies have used purified cells of different cell types to infer cell-type-specific eQTLs (ct-eQTLs). As the gene expression data in these samples are collected in the same manner as bulk tissue samples, the same statistical methods for bulk samples can be applied to infer ct-eQTLs from these data. However, the sample size tends to be smaller and the measurement noise may be higher. In addition, the purified cell types may be contaminated with other cell types.
Without collecting data from purified cell types, Westra et al. (2015) proposed identifying ct-eQTLs by investigating whether there is an interaction effect between the surrogate score for a cell type and the candidate SNP's genotype on bulk gene expression levels from the collected samples. More formally, this model can be written as
$$y_{ng} = \mu_{g} + \beta_{gs} x_{ns} + \gamma_{g} m_{n} + \delta_{gs} x_{ns} m_{n} + \varepsilon_{ng} \tag{2}$$
with two additional terms $\gamma_{g} m_{n}$ and $\delta_{gs} x_{ns} m_{n}$ compared to model (1), where $m_{n}$ is a proxy marker for the cell type of interest in the $n$th individual, $\gamma_{g}$ is the effect of the proxy marker on the expression level of the $g$th gene, and $\delta_{gs}$ is the interaction effect between genotype $x_{ns}$ and proxy marker $m_{n}$. A significant interaction effect, i.e., $\delta_{gs} \neq 0$, is interpreted as the cell-type-specific effect of SNP $s$ on the expression of gene $g$. The same approach was used to study context-dependent eQTLs by Zhernakova et al. (2017).
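To make the interaction model concrete, the sketch below simulates noise-free data from a regression with genotype, proxy marker, and genotype-by-proxy terms, then recovers the four coefficients by least squares. Everything here is an illustrative assumption: the helper names, the simulated values, and the choice to solve the normal equations directly. No test statistic is computed, and this is not the implementation of Westra et al. (2015).

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Simulated, noise-free toy data (all values are made up for illustration).
genotypes = [0, 1, 2, 0, 1, 2, 0, 1, 2]
proxy = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6, 0.8]
true_mu, true_beta, true_gamma, true_delta = 1.0, 0.2, 0.5, 0.8
expression = [
    true_mu + true_beta * x + true_gamma * m + true_delta * x * m
    for x, m in zip(genotypes, proxy)
]

# Design columns: intercept, genotype, proxy marker, genotype x proxy interaction.
design = [[1.0, x, m, x * m] for x, m in zip(genotypes, proxy)]
mu_hat, beta_hat, gamma_hat, delta_hat = ols(design, expression)
print(f"interaction estimate = {delta_hat:.3f}")
```

Because the simulated data contain no noise, the estimates match the simulated coefficients up to round-off; with real data the interaction coefficient would be tested against zero, and a context variable such as sex could take the place of the proxy marker in the same design.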
Instead of deriving cell-type-specific proxy markers or enrichment scores, the estimated cell type proportions can also be used as a proxy for a given cell type. Recent years have seen the development of many methods to deconvolute bulk RNA-seq samples to infer proportions of different cell types and cell-type-specific expression levels (Avila Cobos et al., 2020). For the $n$th subject, let $\pi_{nk}$ denote the estimated proportion of the $k$th cell type for this individual, where there is a total of $K$ cell types. We can use the following regression model to detect ct-eQTLs for the $k$th cell type.
$$y_{ng} = \mu_{g} + \beta_{gs} x_{ns} + \gamma_{gk} \pi_{nk} + \delta_{gsk} x_{ns} \pi_{nk} + \varepsilon_{ng} \tag{3}$$
In this model, $\gamma_{gk}$ is the cell type proportion effect from the $k$th cell type, and $\delta_{gsk}$ is the interaction effect between the $s$th SNP and the proportion of the $k$th cell type. A non-zero $\delta_{gsk}$ suggests a cell-type-specific effect for the $s$th SNP.
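The formulations in (2) and (3) consider one cell type at a time and ignore the contributions of possible cell-type-specific effects from other cell types, both in terms of proportions and expression profiles, leading to a potential loss of information. Moreover, because models (2) and (3) consider only one tissue and cell type pair at a time, they may not attribute a non-zero $\delta_{gsk}$ to the correct cell type. For example, consider the case of two cell types, where $k = 1$ or 2. If $\delta_{gs1} \neq 0$, then $\delta_{gs2} \neq 0$ due to the constraint that $\pi_{n1} + \pi_{n2} = 1$: substituting $\pi_{n2} = 1 - \pi_{n1}$ into model (3) for $k = 2$ gives $\delta_{gs2} = -\delta_{gs1}$. Furthermore, the power differs across cell types, with a higher statistical power in detecting ct-eQTLs for more abundant cell types. A more comprehensive model that takes into account all cell types simultaneously can be formulated as
$$y_{ng} = \sum_{k=1}^{K} \gamma_{gk} \pi_{nk} + \sum_{k=1}^{K} \delta_{gsk} x_{ns} \pi_{nk} + \varepsilon_{ng} \tag{4}$$
subject to the constraint that $\sum_{k=1}^{K} \pi_{nk} = 1$, so that the intercept and the main genotype effect of model (3) are absorbed into the two sums. Correspondingly, the $s$th SNP is a ct-eQTL for the $k$th cell type if $\delta_{gsk} \neq 0$. Another way to write this model is in the form of
$$y_{ng} = \sum_{k=1}^{K} \pi_{nk} \left( \gamma_{gk} + \delta_{gsk} x_{ns} \right) + \varepsilon_{ng} \tag{5}$$
Note that the term $\gamma_{gk} + \delta_{gsk} x_{ns}$ in model (5) is essentially the cell-type-specific gene expression for the $k$th cell type in sample $n$, and this is essentially the same model considered in Decon-eQTL (Aguirre-Gamboa et al., 2020).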
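The joint model over all cell types can be sketched for two cell types as a regression without intercept on four columns: the two proportions and each proportion multiplied by the genotype. The code below simulates noise-free bulk expression in which the SNP affects only the first cell type and recovers the cell-type-specific baselines and SNP effects by least squares. All names and numbers are made up for illustration, and this sketch is not the Decon-eQTL implementation.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Simulated, noise-free toy data (assumed values, for illustration only).
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
prop_type1 = [0.30, 0.50, 0.70, 0.40, 0.60, 0.80, 0.35, 0.55, 0.75]
prop_type2 = [1.0 - p for p in prop_type1]  # proportions sum to one per subject

# Cell type 1: baseline 1.0, SNP effect 1.0. Cell type 2: baseline 3.0, no SNP effect.
bulk_expression = [
    p1 * (1.0 + 1.0 * x) + p2 * (3.0 + 0.0 * x)
    for x, p1, p2 in zip(genotypes, prop_type1, prop_type2)
]

# No intercept column: the proportions already sum to one for every subject.
design = [
    [p1, p2, p1 * x, p2 * x]
    for x, p1, p2 in zip(genotypes, prop_type1, prop_type2)
]
base1_hat, base2_hat, effect1_hat, effect2_hat = ols(design, bulk_expression)
print(f"SNP effect in type 1 = {effect1_hat:.3f}, in type 2 = {effect2_hat:.3f}")
```

In this toy setting the joint fit attributes the SNP effect to the first cell type and estimates no effect in the second, which is the attribution that a one-cell-type-at-a-time interaction model cannot make on its own.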
In practice, ct-eQTL analysis based on the above models often uses transformed gene expression data instead of read counts. This may distort the association between the observed gene expression level and cell type compositions, leading to reduced power and inflated false positives. Recently, Little et al. (2022) proposed Cell type-Specific eQTL (CSeQTL) to jointly model total read counts and allele-specific counts by a negative binomial (or Poisson) and a beta-binomial (or binomial) distribution with the consideration of covariates, cell type composition, and SNP genotype. CSeQTL also includes allele-specific expression to further increase the power to detect cell type-specific eQTLs. Empirical studies showed higher power of CSeQTL than linear model-based methods. DeCAF is a linear model-based method that considers both total expression levels and allele-specific expression (Kalita and Gusev, 2022).
Although the above approaches are intuitive, applying them to infer ct-eQTLs in practice has challenges. First, there are uncertainties in the estimated proxy markers and cell type proportions, and these need to be appropriately incorporated in the analysis. However, this issue has only recently been studied (Xie and Wang, 2022), and the impact of incorporating these uncertainty estimates in ct-eQTL inference needs to be explored. Second, although ct-eQTLs may be inferred for all cell types in principle, inference is easier for more abundant cell types than for less abundant or rare cell types. Third, there has to be sufficient variation in cell type composition across subjects to allow ct-eQTL inference. For example, in the extreme case that all the subjects have identical cell type proportions, the parameters in the above models are not identifiable. Fourth, the above formulation does not consider similarity among some cell types, although methods have been proposed to consider cell lineage (Yankovitz et al., 2021).
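The third point can be seen directly from the joint model written as a proportion-weighted sum of cell-type-specific baselines and SNP effects. The short derivation below uses symbols chosen here for illustration ($\pi_{nk}$ for proportions, $\gamma_{gk}$ for baselines, $\delta_{gsk}$ for SNP effects) and shows what remains estimable when every subject has the same proportions.

```latex
% Assumed notation: \pi_{nk} = proportion of cell type k in subject n,
% \gamma_{gk} = baseline expression of gene g in cell type k,
% \delta_{gsk} = effect of SNP s on gene g in cell type k.
\begin{align*}
y_{ng} &= \sum_{k=1}^{K} \pi_{nk}\left(\gamma_{gk} + \delta_{gsk}\, x_{ns}\right) + \varepsilon_{ng} \\
       &= \underbrace{\sum_{k=1}^{K} \pi_{k}\gamma_{gk}}_{\text{one intercept}}
        + \underbrace{\left(\sum_{k=1}^{K} \pi_{k}\delta_{gsk}\right)}_{\text{one slope}} x_{ns}
        + \varepsilon_{ng}
        \qquad \text{when } \pi_{nk} = \pi_{k} \text{ for all } n .
\end{align*}
```

With identical proportions the data determine only the two weighted sums, so the $2K$ cell-type-specific parameters cannot be separated; the more the proportions vary across subjects, the better the individual $\delta_{gsk}$ are determined.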
2.3. Cell-type-specific eQTLs from single-cell data
In addition to bulk data, single-cell data are increasingly used for ct-eQTL inference (Jonkers and Wijmenga, 2017; van der Wijst et al., 2018; Liu et al., 2021; Neavin et al., 2021). Most published single-cell-based ct-eQTL analyses are performed by analyzing pseudo-bulk RNA-seq data for different cell types, where the single-cell data are first annotated to distinct cell types, and the cells annotated to the same cell types from a specific subject are combined to derive cell-type-specific gene expression levels. eQTL methods for bulk samples can then be applied to detect ct-eQTLs. For example, Yazar et al. (2022) grouped cells of the same type for each individual and adjusted for covariate effects before performing Spearman rank correlation analysis. However, the sample size is still much more limited for single-cell data compared to that of bulk samples, and there are many ongoing efforts for single-cell-based genetic association analysis, e.g., the single-cell eQTLGen consortium (van der Wijst et al., 2018) and the OneK1K cohort (Yazar et al., 2022).
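The pseudo-bulk step described above can be sketched as a simple aggregation over annotated cells. In the snippet below, the first donor reuses the single-cell values given in the caption of Fig. 1, whereas the second donor, the labels, and the function name are made up. Averaging normalized values is only one possible aggregation choice; summing raw counts before normalization is another.

```python
from collections import defaultdict


def pseudobulk_mean(cells):
    """Average expression over all cells sharing the same (donor, cell type).

    cells: iterable of (donor, cell_type, expression) tuples for one gene.
    Returns a dict mapping (donor, cell_type) to the mean expression.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for donor, cell_type, value in cells:
        sums[(donor, cell_type)] += value
        counts[(donor, cell_type)] += 1
    return {key: sums[key] / counts[key] for key in sums}


cells = [
    # Donor 1: values from the first individual in Fig. 1.
    ("donor1", "oval", 0.9), ("donor1", "oval", 1.1),
    ("donor1", "oval", 0.8), ("donor1", "oval", 1.2),
    ("donor1", "triangle", 3.2), ("donor1", "triangle", 2.8),
    # Donor 2: hypothetical values for illustration.
    ("donor2", "oval", 2.1), ("donor2", "oval", 1.9),
    ("donor2", "triangle", 3.1), ("donor2", "triangle", 2.9),
]
pseudobulk = pseudobulk_mean(cells)
for key in sorted(pseudobulk):
    print(key, round(pseudobulk[key], 2))
```

The resulting table has one value per donor and cell type, so a bulk-style eQTL test can then be run separately for each cell type across donors.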
Instead of aggregating all the cells in a given individual, Nathan et al. (2022) used Poisson mixed effects regression to model the effects of SNPs, cell states (which can be both discrete and continuous), batch structure, and other covariates (such as sex, age, genotype principal components and gene expression principal components, percentage of mitochondrial UMIs) on the observed gene expression level measured by UMI (unique molecular identifier) counts at the single cell level. The effect of an SNP is modeled as a fixed effect in the analysis. When the Poisson mixed effects model was compared with the computationally more expensive negative binomial mixed effects model, it was found that the Poisson model was adequate for the single-cell data analyzed.
Although single cells can be grouped into pre-defined cell types for ct-eQTL analysis, the very high resolution at the single cell level offers the opportunity for more refined analysis, where the individual cells can be characterized by a vector of continuous contexts. For example, principal component analysis can be performed for the highly variable genes across all the cells based on normalized gene expression data, and the top principal components for a single cell may be taken as the cellular states for this cell. After the cellular states are defined for a single cell, the effects of a SNP on gene expression may be studied in the context of these cellular states to see whether the effects vary across states. Assume we have $N$ subjects, with $I_{n}$ cells collected from the $n$th subject, and a total of $C$ different cellular contexts defined for each cell. Let the states of the $i$th cell for the $n$th subject be denoted by a vector $\mathbf{z}_{ni}$ of contexts of dimension $C$. Cuomo et al. (2022) proposed a cellular regulatory map model, called CellRegMap, as
$$y_{nig} = \mu_{g} + \beta_{gs} x_{ns} + \beta^{G \times C}_{nigs} x_{ns} + u_{nig} + c_{nig} + \varepsilon_{nig} \tag{6}$$
where $y_{nig}$ represents the measured expression level of the $g$th gene in the $i$th cell of the $n$th subject, $x_{ns}$ is the genotype of the $s$th SNP of the $n$th individual, $\mu_{g}$ is the baseline expression level, $\beta_{gs}$ represents the persistent effect of the $s$th SNP across all the cells in different subjects, $\beta^{G \times C}_{nigs}$ is the cell-specific effect on the expression level of the $g$th gene, $u_{nig}$ accounts for the fact that the cells are from the same subject, $c_{nig}$ accounts for the cell context effects, and $\varepsilon_{nig}$ is the error term. CellRegMap adopts an overall random effects model approach where, stacking each term over all cells, $\boldsymbol{\beta}^{G \times C} \sim N(\mathbf{0}, \sigma^{2}_{G \times C} \boldsymbol{\Sigma})$, $\mathbf{u} \sim N(\mathbf{0}, \sigma^{2}_{u} \mathbf{R} \odot \boldsymbol{\Sigma})$, $\mathbf{c} \sim N(\mathbf{0}, \sigma^{2}_{c} \boldsymbol{\Sigma})$, and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2}_{\varepsilon} \mathbf{I})$, with $\mathbf{R}$ encoding which cells come from the same or related subjects. The matrix $\boldsymbol{\Sigma}$ is defined by the cellular context vectors $\mathbf{z}_{ni}$. CellRegMap uses a score test to investigate whether a SNP has a context-dependent effect on gene expression level, with the null hypothesis $\sigma^{2}_{G \times C} = 0$. This model can also be used to test the main effect and to estimate the allelic effects in single cells for each gene-SNP pair based on the best linear unbiased predictor. In practice, it is important to define cellular contexts, and CellRegMap used MOFA to define cellular states (Argelaguet et al., 2018), where latent factors that explain variation in gene expression are inferred from the single-cell data. In real data analysis, because of the computational cost, the assumption of normal errors, and the sparsity of single-cell data, the single cells were aggregated to meta cells. In addition, only specific gene-SNP pairs were considered due to statistical power concerns.
Strober et al. (2022) proposed a similar approach, called Single-cell Unsupervised Regulation of Gene Expression (SURGE), where a continuous representation of the cell contexts is learned through a probabilistic model with matrix factorization. The model has a form similar to that of CellRegMap as follows:
$$y_{nig} = \mu_{g} + \beta_{gs} x_{ns} + \left( \sum_{l=1}^{L} z_{nil} v_{lgs} \right) x_{ns} + u_{ng} + \varepsilon_{nig} \tag{7}$$
where $y_{nig}$ is the standardized gene expression level for the $g$th gene in the $i$th cell of the $n$th subject and $x_{ns}$ is the standardized genotype for the $s$th SNP of the $n$th individual. The other parameters have the same meaning as in the CellRegMap model, but the context vector $\mathbf{z}_{ni} = (z_{ni1}, \ldots, z_{niL})$ is latent and learned from the model instead of prespecified, with $v_{lgs}$ the loading of the $l$th latent context for the gene-SNP pair and $u_{ng}$ a subject-level random effect. SURGE places prior distributions on the model parameters and approximates the posterior distribution of all latent variables using mean-field variational inference. Similar to CellRegMap, only eQTLs identified in previous studies were used for analysis due to statistical power concerns, and the reliance on the normal distribution assumption for the error terms limits its direct application to single-cell data. As a result, single cells have to be aggregated to meta cells before the model is applied.
3. Empirical results on cell-type-specific and context-dependent eQTL analyses
This section reviews the growing evidence of cell-type-specific and context-dependent eQTLs using bulk samples, purified cells, and single cells.
3.1. Tissue analysis
3.1.1. Whole blood
In one of the first analyses of this kind using bulk samples, Westra et al. (2015) analyzed whole blood gene expression data of 5683 individuals from seven cohorts to infer cell-type-specific cis-eQTLs. A total of 1115 cis-eQTLs (8.5% of the significant cis-eQTLs from prior eQTL analysis for the whole tissue) were found to have significant interaction effects with the neutrophil proxy. The results were replicated in six individual purified cell-type eQTL datasets. More importantly, the authors showed that SNPs associated with Crohn's disease preferentially affect gene expression within neutrophils, demonstrating the insights gained from cell-type-specific eQTL analysis. Zhernakova et al. (2017) performed eQTL and context-dependent eQTL analysis on RNA-seq data of peripheral blood from 2116 unrelated individuals, identifying 23,060 genes with eQTLs, among which 2743 (12%) showed context-dependent effects.
3.1.2. GTEx data
Cell-type-specific analysis was performed on the GTEx data (Kim-Hellmuth et al., 2020). The authors estimated cell type enrichment for seven cell types (adipocytes, epithelial cells, hepatocytes, keratinocytes, myocytes, neurons, and neutrophils) across 35 tissues. Across 43 pairs of tissues and cell types, they identified eQTLs specific to at least one cell type by testing for interaction effects between SNP and cell type enrichment on the observed expression levels. They found that these cell-type-interaction eQTLs, called ieQTLs, were enriched for genes with tissue-specific eQTLs and were generally not shared across unrelated tissues. Furthermore, these ieQTLs were enriched for complex trait associations and had colocalization signals for hundreds of loci that were undetected in bulk tissue.
3.2. Cultured and purified cells
3.2.1. Brain cells
Aygun et al. (2021) used a cell-type-specific in vitro model system including 85 neural progenitors and 74 virally labeled and sorted neuronal progeny for eQTL analysis. They identified 2079 and 872 eQTLs in progenitors and neurons, respectively, with 66% and 47% of these eQTLs not identified in fetal bulk brain eQTLs from a largely overlapping sample or in adult data from GTEx. These eQTLs had cell-type-specific colocalizations with GWAS hits for neuropsychiatric disorders and other brain-related traits.
Microglia in the brain play critical roles in immune defense and development, and are implicated in neurodegenerative disorders. Young et al. (2021) gathered gene expression profiles in primary microglia isolated from 141 patients undergoing neurosurgery. A total of 585 microglia eQTLs were identified. Through joint analysis with monocytes and IPSDMac, 855 microglia eQTLs were inferred, with 108 microglia specific, and 449 shared across three cell types. For colocalization with GWAS hits, there was an excess of colocalized microglial eQTLs for Alzheimer's disease, Parkinson's disease, and inflammatory bowel disease.
3.2.2. Melanocyte cultures
Because melanocytes give rise to melanoma but account for less than 5% of human skin biopsies, Zhang et al. (2018) performed eQTL analysis in primary melanocyte cultures from 106 newborn males to identify eQTLs in melanocytes. The identified melanocyte eQTLs differed considerably from those from the GTEx tissues, including skin. Novel risk genes for melanoma were implicated using the transcriptome-wide association study based on this data set.
3.2.3. Immune cells
The DICE project isolated 13 immune cell types from 106 leukapheresis samples of 91 healthy subjects (Schmiedel et al., 2018). It was found that eQTLs are highly cell type specific, and sex has a major effect on gene expression. In the ImmuNexUT study, with samples from 79 healthy controls and 337 patients diagnosed with different immune-mediated diseases, Ota et al. (2021) purified 28 immune cell types from these individuals with a total of 9852 samples and performed cell-type-specific eQTL analysis. They identified a median of 7092 genes with eQTLs in each cell type, 2.2-fold more than that identified in the DICE study (Schmiedel et al., 2018). They further identified eQTLs that were only present in patients.
3.3. Single-cell analysis
3.3.1. PBMC
In a proof-of-concept study, van der Wijst et al. (2018) analyzed single-cell RNA-seq data of 25,000 cells from 45 donors. They identified 379 unique cis-eQTLs involving 287 unique eGenes across six cell types. A total of 48 cis-eQTLs were only identified from cell-type-specific analysis. The authors also demonstrated the benefit of performing cell subtype analysis for cMonocytes and ncMonocytes.
Oelen et al. (2022) exposed PBMC samples from 120 individuals to three pathogens. They sequenced these samples in an unstimulated condition and after 3 hours and 24 hours of in vitro stimulation with each of the three pathogens. They identified cell-type-specific eQTLs, with the number of such eQTLs correlated with the cell type abundance. Furthermore, the effects of eQTLs differed across pathogen stimulations, and the strongest enrichment for GWAS signals was observed for eQTLs that were identified from stimulation experiments.
The investigators of the OneK1K cohort analyzed single-cell RNA-seq data of 1.27 million PBMCs from 982 donors of Northern European ancestry and performed eQTL analyses on 14 immune cell types (Yazar et al., 2022). A total of 26,597 cis-eQTLs were identified, with most having cell-type-specific effects. Dynamic effects were also observed based on the pseudo-time trajectory for the B cell landscape. In addition to cis-eQTLs, 990 trans-eQTLs were identified, with most genes regulated by trans-eQTLs being specific to a single cell type and none being ubiquitous across cell types. Co-localization analysis between eQTLs and GWAS signals suggested that 60% of colocalizing genes were detected upon activation and that co-localization is very cell-type-specific.
3.3.2. Induced pluripotent stem cells (iPSCs)
Neavin et al. (2021) gathered single-cell RNA-seq data from 64,018 fibroblasts of 79 donors and performed single-cell eQTL analysis. Across the six types of fibroblasts and four types of iPSCs studied, most of the eQTLs detected in fibroblasts were specific to one cell type. Only 41% of the 45,503 eQTLs identified in the six fibroblast types were significant in GTEx.
Using 125 iPSC lines derived from 125 donors, Cuomo et al. (2020) collected single-cell gene expression data from 36,044 cells at four differentiation time points using full-length RNA sequencing, and measured the expression levels of selected cell surface markers through cell sorting. Substantial regulatory changes were observed, with over 30% of eQTLs being specific to a single stage. Hundreds of eQTLs at the mesendoderm (mesendo) and definitive endoderm (defendo) stages were new. This study also tested for associations between pseudo-time and the genetic effect size using a linear model and identified 899 time-dynamic eQTLs.
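One common way to formalize a test of whether a genetic effect changes along pseudo-time is a genotype-by-pseudo-time interaction term in a linear model. The following is a minimal sketch under that assumption, not the model used by Cuomo et al. (2020). All function names, the simulated effect sizes, and the sample size are hypothetical, and cells are treated as independent, which a real analysis would need to relax (for example with donor-level random effects).

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(design, y):
    """Ordinary least squares: return coefficients and their standard errors."""
    n, p = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(design, y))
    sigma2 = rss / (n - p)
    se = []
    for j in range(p):
        # j-th diagonal element of (X'X)^-1, obtained by solving against a unit vector.
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        se.append((sigma2 * solve(xtx, unit)[j]) ** 0.5)
    return beta, se


def simulate(rng, n_cells=2000, maf=0.3):
    """Simulate expression with a genetic effect that grows along pseudo-time."""
    design, y = [], []
    for _ in range(n_cells):
        g = float(sum(rng.random() < maf for _ in range(2)))  # genotype coded 0/1/2
        t = rng.random()                                      # pseudo-time in [0, 1]
        expr = 1.0 + 0.2 * g + 0.5 * t + 0.8 * g * t + rng.gauss(0.0, 0.5)
        design.append([1.0, g, t, g * t])  # intercept, genotype, pseudo-time, interaction
        y.append(expr)
    return design, y


rng = random.Random(0)
design, y = simulate(rng)
beta, se = fit_ols(design, y)
z_interaction = beta[3] / se[3]
print(f"interaction estimate = {beta[3]:.3f}, z = {z_interaction:.1f}")
```

A large absolute z-statistic for the interaction coefficient indicates that the eQTL effect size varies with pseudo-time in this simulated setting.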
3.3.3. T cells
Soskic et al. (2022) analyzed 655,349 CD4+ T cells from 119 healthy donors, both for unstimulated cells and at three time points after cell activation. Different numbers of genes showing eQTL effects were detected at different time points, with hundreds of them detected only in specific cell states. Using pseudo-time trajectory information, 2,265 genes were found to have dynamic eQTL effects, representing about one-third of the genes. Genes colocalized with GWAS signals were enriched for time-dependent eQTLs.
Using data from 89 healthy donors, Schmiedel et al. (2022) performed eQTL analysis for more than one million activated CD4+ T cells classified into 19 distinct CD4+ T cell subsets. The effects of many eQTLs were strongly manifested only in specific cell types in an activation-dependent manner, and significant sex effects were also observed.
Nathan et al. (2022) performed single-cell eQTL analysis using gene expression data from more than 500,000 unstimulated memory T cells from 259 Peruvian individuals. They found that the effects of one-third of cis-eQTLs were mediated by continuous multimodally defined cell states, with independent eQTLs at some loci having opposing cell-state relationships.
3.3.4. Brain cortex
In a recent study using single-cell data from prefrontal cortex, temporal cortex, and deep white matter from 192 individuals, a total of 7,607 genes were found to have eQTLs across eight cell types (Bryois et al., 2022). A majority of cell-type-specific eQTLs were replicated in tissue-level eQTLs for cortical tissue, with eQTLs for more abundant cell types more likely to be replicated in the tissue results. It was also found that the effect sizes estimated from tissue tended to be lower than those estimated from cell-type-specific analysis. As expected, the number of cis-eQTLs identified in a cell type was strongly correlated with the number of cells available for that cell type. The effect sizes were more similar for similar cell types, with microglia being the most different from the others. Co-localization analysis suggested that disease risk at a given GWAS locus is usually mediated by a single gene acting in a specific cell type.
3.3.5. Dopaminergic neuron differentiation
Jerber et al. (2021) differentiated 215 human iPSC lines to profile over 1 million cells across four conditions, including three differentiation stages (progenitor-like, young neurons, and more mature neurons) and cells exposed to a chemical stressor. eQTL analysis was performed for 14 cell types, identifying 4,828 genes with eQTLs. Compared to eQTLs identified from GTEx brain tissues, this study identified 2,366 new eQTLs. In the colocalization analysis, 1,284 eQTLs were colocalized, with 597 being new, and 67% of these new colocalizations were associated with eQTLs detected in later differentiation stages or upon stimulation. A colocalization analysis using aggregated data from different cell types yielded a much smaller number of colocalizations, suggesting the importance of considering cell type specificity.
4. Discussion
A main driving force for many eQTL studies in recent years is their potential to offer insights into the GWAS signals, which mostly fall into non-coding regions of the human genome. Because of the importance of cell-type-specific and context-dependent eQTLs, a growing number of studies are collecting and analyzing data to facilitate such analyses. Coupled with the rise of rich data that offer cell-type-specific and context-dependent gene regulation information, there is also a need for more statistically robust and computationally efficient methods for these analyses. In this paper, we have reviewed statistical methods that have been developed and applied to analyze different types of data for cell-type-specific and context-dependent eQTL inference, including bulk samples, purified cells, and single cells.
Despite this progress, many issues remain to be resolved, especially in anticipation of the many population-level single-cell data sets to be gathered in the near future, which will involve tens of thousands of individuals and tens of millions of single cells across different tissues. For example, to fully respect the nature of single-cell data, the observed counts are more appropriately modeled with a Poisson or negative binomial distribution. Yet most published studies have adopted linear regression models and have often aggregated multiple cells to address the sparsity of the observed single-cell data. For bulk samples profiled by RNA sequencing, efforts have been made to use allele-specific expression to improve statistical power for identifying eQTLs, but limited work has been done to capitalize on allele-specific expression information in single-cell data. There are also challenges in appropriately defining cell types and contexts for cell-type-specific and context-dependent eQTL analysis. Cuomo et al. (2022) and Strober et al. (2022) represent some initial efforts, and more needs to be done to fully capture and utilize cell state information for eQTL discoveries, including non-linear transcription programs (Wang and Zhao, 2022). In addition, as more than one SNP may jointly affect expression levels (The GTEx Consortium, 2020; Abell et al., 2022), methods that include joint effects are likely to be more powerful and can better characterize the relationship between gene expression levels and SNPs.
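To make the count-model alternative concrete, the sketch below fits a Poisson regression with a log link and a per-cell sequencing-depth offset to simulated single-cell counts, using Newton-Raphson iterations. It is an illustration under strong simplifying assumptions rather than a method from any of the cited studies: cells are treated as independent (no donor random effect), there is no overdispersion (a negative binomial model would add a dispersion parameter), and all names and simulation settings are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_cells(rng, n_donors=200, cells_per_donor=50, maf=0.3, b0=-1.0, b1=0.3):
    """Simulate per-cell counts whose mean depends on the donor genotype."""
    counts, genotypes, depths = [], [], []
    for _ in range(n_donors):
        g = sum(rng.random() < maf for _ in range(2))  # donor genotype coded 0/1/2
        for _ in range(cells_per_donor):
            d = rng.uniform(0.5, 2.0)                  # relative sequencing depth
            counts.append(sample_poisson(rng, d * math.exp(b0 + b1 * g)))
            genotypes.append(g)
            depths.append(d)
    return counts, genotypes, depths


def fit_poisson_eqtl(counts, genotypes, depths, n_iter=25):
    """Newton-Raphson fit of log E[count] = log(depth) + b0 + b1 * genotype."""
    b0 = math.log(sum(counts) / sum(depths))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for y, g, d in zip(counts, genotypes, depths):
            mu = d * math.exp(b0 + b1 * g)
            s0 += y - mu            # score for the intercept
            s1 += (y - mu) * g      # score for the genotype effect
            h00 += mu               # entries of the Fisher information matrix
            h01 += mu * g
            h11 += mu * g * g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    se_b1 = math.sqrt(h00 / det)    # from the inverse information at the last iterate
    return b0, b1, se_b1


rng = random.Random(1)
counts, genotypes, depths = simulate_cells(rng)
b0_hat, b1_hat, se_b1 = fit_poisson_eqtl(counts, genotypes, depths)
print(f"log-scale genotype effect = {b1_hat:.3f} (SE {se_b1:.3f})")
```

Because the model works on raw counts with an offset, no aggregation across cells or log transformation of sparse values is needed; the estimated genotype effect is on the log-mean scale.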
With many studies performed on bulk tissues, and the availability of several methods to use bulk tissue samples for cell-type-specific eQTL analysis, there is a need to effectively integrate the results from single-cell data and bulk tissue data. Furthermore, as different tissues may share cells of similar cell types and states, there is also a need to better integrate results across tissues and studies (Flutre et al., 2013; Urbut et al., 2019).
As in all genetic studies, study design is important, including the number of samples to be collected and, in the case of single cells, the number of cells per subject and the sequencing depth of each cell. Several software tools have been developed to facilitate study design under relatively simple statistical models for data analysis (Mandric et al., 2020; Dong et al., 2021). Further developments are needed to incorporate more comprehensive statistical models for analysis and to consider other information, e.g., allele-specific expression, in inferring eQTLs. Owing to limited statistical power, most studies have focused on cis-eQTLs. For example, fewer than 150 genes were found to be affected by trans-eQTLs in the GTEx project (The GTEx Consortium, 2020). There is a critical need to design statistical methods to identify trans-eQTLs, both for bulk tissues and for cell-type-specific and context-dependent effects.
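As a rough illustration of how sample size enters such design calculations, the sketch below uses a normal approximation to the power of a two-sided test of the genotype slope in a simple linear regression. It is not the model implemented in the cited tools: it ignores the number of cells per subject and the sequencing depth, assumes Hardy-Weinberg equilibrium and additive 0/1/2 genotype coding, and the function name, effect size, and significance threshold are hypothetical.

```python
from statistics import NormalDist


def eqtl_power(n_donors, maf, beta, sigma=1.0, alpha=0.05):
    """Approximate two-sided power of a linear-regression eQTL test.

    beta is the per-allele effect on expression and sigma the residual
    standard deviation, both on the same (e.g. standardized) scale.
    """
    std_normal = NormalDist()
    genotype_var = 2.0 * maf * (1.0 - maf)  # variance of 0/1/2 coding under HWE
    shift = abs(beta) * (n_donors * genotype_var) ** 0.5 / sigma
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power as a function of the number of donors at a stringent threshold.
for n in (100, 500, 1000):
    print(n, round(eqtl_power(n, maf=0.3, beta=0.2, alpha=1e-5), 3))
```

Under this approximation the power depends on the data only through the product of the sample size, the genotype variance, and the squared standardized effect, which is why rare variants and small effects require many more donors.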
We have focused on the inference of eQTLs in this paper, but there are many downstream applications of eQTL analysis. For example, methods are needed to perform co-localization analysis with cell-type-specific and context-dependent eQTL results without restricting attention to a specific set of SNPs selected through simple thresholding of statistical significance, while accounting for the linkage disequilibrium among the SNPs in a region. Cell-type-specific and context-dependent eQTLs also offer the opportunity to improve the identification of candidate genes through transcriptome-wide association studies at the cell-type and context level, and recently developed methods for bulk samples (Okamoto et al., 2023; Song et al., 2023) can be extended to cell-type-specific and context-dependent analyses. There is also a need for Mendelian randomization methods to infer the causal relationship between transcript levels and complex traits and diseases (Liu et al., 2021). In this setting, the existing methods for bulk samples may not be adequate to deal with the count nature of single-cell data, data sparsity, and the need to integrate information from multiple data sources for more informed analysis and decisions. In addition to eQTLs that affect gene expression levels, genetic variants can also affect gene expression variances and co-variances (Hulse and Cai, 2013; Ek et al., 2018; Sarkar et al., 2019; Marderstein et al., 2021). Compared to eQTL analysis, more needs to be done to identify such variants. Even less has been explored at the cell type level, where cell-type-specific co-expressions may be inferred using either bulk samples (Su et al., 2022b) or single-cell data (Su et al., 2022a). A comprehensive catalog of eQTLs with cell-type-specific and context-dependent effects on gene expression variance and co-variance will better characterize the regulation of gene expression and aid the interpretation of GWAS results.
Although gene expression has been the focus of cell-type-specific and context-dependent analysis, other data types are being increasingly collected for similar analysis, such as methylation, chromatin accessibility, and proteomics, based on single-cell data (Wang et al., 2022). The methods and tools developed for bulk and single-cell RNA-seq data may also apply to other data types, such as recently published methylation data from the GTEx subjects (Oliva et al., 2023) and an meQTL dataset derived from primary melanocytes of 106 individuals (Zhang et al., 2021). More importantly, as different data types reflect different aspects of the same biological process, there is a need to integrate data from different modalities to assess the genetic effects of SNPs on gene expression, methylation, chromatin accessibility, protein expression, and other molecular phenotypes. Such integrated analysis will likely yield more informative annotations of the SNPs to facilitate the interpretation of the GWAS results (Hormozdiari et al., 2018).
Table 1.
Representative resources for eQTL studies
| Resource | Sample types | Sample sizes | Link |
|---|---|---|---|
| GTEx | Bulk tissues | 73 to 706 across 49 tissues | https://gtexportal.org/home/ |
| eQTL Catalogue | Bulk tissues from 29 studies | 73 to 838 across studies | https://www.ebi.ac.uk/eqtl/ |
| eQTLGen | Blood | 31,684 across 37 datasets | https://www.eqtlgen.org |
| OneK1K | Single cells from PBMC | 1.27 million cells from 982 donors | https://onek1k.org |
Table 2.
Representative statistical methods for detecting cell-type-specific and context-dependent eQTLs
| Samples | Methods | Key ideas | Pros | Cons |
|---|---|---|---|---|
| Bulk | Westra et al. 2015; Zhernakova et al. 2017; Avila Cobos et al. 2020; Aguirre-Gamboa et al. 2020 (https://github.com/molgenis/systemsgenetics/tree/master/Decon2) | Detect interaction effects between candidate eQTL genotypes and cell-type-specific proxy markers (e.g., cell type proportions) on gene expression levels in bulk tissues | Applicable to large collections of eQTL studies based on bulk samples | Limited resolution for cell types and dependence on informative and robust cell-type-specific proxy markers |
| Single cells | Cuomo et al. 2022 (https://github.com/limix/CellRegMap/); Strober et al. 2022 (https://github.com/BennyStrobes/surge) | Detect differential effects of candidate eQTL genotypes on gene expression levels for different cell types and/or contexts inferred from single-cell expression data | High-resolution cell types and different molecular contexts | Limited number of subjects available and sparsity in single-cell gene expression data |
Acknowledgments
Zhang's research is supported by NSF DMS-2015190 and DMS-2210469. Zhao's research is supported in part by NIH R01 GM134005 and R56 AG074015.
Conflict of interest
The authors declare no conflict of interest.
References
- Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, Montgomery SB, 2022. Multiple causal variants underlie genetic associations in humans. Science 375, 1247–1254.
- Aguirre-Gamboa R, de Klein N, di Tommaso J, Claringbould A, van der Wijst MG, de Vries D, Brugge H, Oelen R, Vosa U, Zorro MM, et al., 2020. Deconvolution of bulk blood eqtl effects into immune cell subpopulations. BMC Bioinformatics 21, 243.
- Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, Buettner F, Huber W, Stegle O, 2018. Multi-omics factor analysis-a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol 14, e8124.
- Avila Cobos F, Alquicira-Hernandez J, Powell JE, Mestdagh P, De Preter K, 2020. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 11, 5650.
- Aygun N, Elwell AL, Liang D, Lafferty MJ, Cheek KE, Courtney KP, Mory J, Hadden-Ford E, Krupa O, de la Torre-Ubieta L, et al., 2021. Brain-trait-associated variants impact cell-type-specific gene regulation during neurogenesis. Am J Hum Genet 108, 1647–1668.
- Bryois J, Calini D, Macnair W, Foo L, Urich E, Ortmann W, Iglesias VA, Selvaraj S, Nutma E, Marzin M, et al., 2022. Cell-type-specific cis-eqtls in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders. Nat Neurosci 25, 1104–1112.
- Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martin D, Watt S, Yan Y, Kundu K, Ecker S, et al., 2016. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414 e1324.
- Connally NJ, Nazeen S, Lee D, Shi H, Stamatoyannopoulos J, Chun S, Cotsapas C, Cassa CA, Sunyaev SR, 2022. The missing link between genetic association and regulatory function. Elife 11.
- Consortium, E.P., Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, et al., 2020. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710.
- Consortium, G.T., 2020. The gtex consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330.
- Cuomo ASE, Heinen T, Vagiaki D, Horta D, Marioni JC, Stegle O, 2022. Cellregmap: A statistical framework for mapping context-specific regulatory variants using scrna-seq. Mol Syst Biol 18, e10663.
- Cuomo ASE, Seaton DD, McCarthy DJ, Martinez I, Bonder MJ, Garcia-Bernardo J, Amatya S, Madrigal P, Isaacson A, Buettner F, et al., 2020. Publisher correction: Single-cell rna-sequencing of differentiating ips cells reveals dynamic genetic effects on gene expression. Nat Commun 11, 1572.
- Dong X, Li X, Chang TW, Scherzer CR, Weiss ST, Qiu W, 2021. Powereqtl: An r package and shiny application for sample size and power calculation of bulk tissue and single-cell eqtl analysis. Bioinformatics 37, 4269–4271.
- Donovan MKR, D'Antonio-Chronowska A, D'Antonio M, Frazer KA, 2020. Cellular deconvolution of gtex tissues powers discovery of disease and cell-type associated regulatory variants. Nat Commun 11, 955.
- Ek WE, Rask-Andersen M, Karlsson T, Enroth S, Gyllensten U, Johansson A, 2018. Genetic variants influencing phenotypic variance heterogeneity. Hum Mol Genet 27, 799–810.
- Flutre T, Wen X, Pritchard J, Stephens M, 2013. A statistical framework for joint eqtl analysis in multiple tissues. PLoS Genet 9, e1003486.
- Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, Consortium GT, Nicolae DL, et al., 2015. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091–1098.
- Giambartolomei C, Zhenli Liu J, Zhang W, Hauberg M, Shi H, Boocock J, Pickrell J, Jaffe AE, CommonMind C, Pasaniuc B, et al., 2018. A bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics 34, 2538–2545.
- Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, Jansen R, de Geus EJ, Boomsma DI, Wright FA, et al., 2016. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 48, 245–252.
- Hormozdiari F, Gazal S, van de Geijn B, Finucane HK, Ju CJ, Loh PR, Schoech A, Reshef Y, Liu X, O'Connor L, et al., 2018. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat Genet 50, 1041–1047.
- Hormozdiari F, van de Bunt M, Segre AV, Li X, Joo JWJ, Bilow M, Sul JH, Sankararaman S, Pasaniuc B, Eskin E, 2016. Colocalization of gwas and eqtl signals detects target genes. Am J Hum Genet 99, 1245–1260.
- Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Gu J, Muchnik S, et al., 2019. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet 51, 568–576.
- Hulse AM, Cai JJ, 2013. Genetic variants contribute to gene expression variability in humans. Genetics 193, 95–108.
- Ionita-Laza I, McCallum K, Xu B, Buxbaum JD, 2016. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet 48, 214–220.
- Jerber J, Seaton DD, Cuomo ASE, Kumasaka N, Haldane J, Steer J, Patel M, Pearce D, Andersson M, Bonder MJ, et al., 2021. Population-scale single-cell rna-seq profiling across dopaminergic neuron differentiation. Nat Genet 53, 304–312.
- Jonkers IH, Wijmenga C, 2017. Context-specific effects of genetic variants associated with autoimmune disease. Hum Mol Genet 26, R185–R192.
- Kalita CA, Gusev A, 2022. Decaf: A novel method to identify cell-type specific regulatory variants and their role in cancer risk. Genome Biol 23, 152.
- Kerimov N, Hayhurst JD, Peikova K, Manning JR, Walter P, Kolberg L, Samovica M, Sakthivel MP, Kuzmin I, Trevanion SJ, et al., 2021. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat Genet 53, 1290–1299.
- Kim-Hellmuth S, Aguet F, Oliva M, Munoz-Aguirre M, Kasela S, Wucher V, Castel SE, Hamel AR, Vinuela A, Roberts AL, et al., 2020. Cell type-specific genetic regulation of gene expression across human tissues. Science 369.
- Kindt AS, Navarro P, Semple CA, Haley CS, 2013. The genomic signature of trait-associated variants. BMC Genomics 14, 108.
- Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J, 2014. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310–315.
- Kumasaka N, Knights AJ, Gaffney DJ, 2016. Fine-mapping cellular qtls with rasqual and atac-seq. Nat Genet 48, 206–213.
- Li Z, Li X, Zhou H, Gaynor SM, Selvaraj MS, Arapoglou T, Quick C, Liu Y, Chen H, Sun R, et al., 2022. A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nat Methods 19, 1599–1611.
- Liang Y, Aguet F, Barbeira AN, Ardlie K, Im HK, 2021. A scalable unified framework of total and allele-specific counts for cis-qtl, fine-mapping, and prediction. Nat Commun 12, 1424.
- Little P, Zhabotynsky V, Li Y, Lin D, Sun W, 2022. Cell type-specific expression quantitative trait loci. doi: 10.1101/2022.03.31.486605.
- Liu L, Zeng P, Xue F, Yuan Z, Zhou X, 2021. Multi-trait transcriptome-wide association studies with probabilistic mendelian randomization. Am J Hum Genet 108, 240–256.
- Lu Q, Hu Y, Sun J, Cheng Y, Cheung KH, Zhao H, 2015. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Sci Rep 5, 10576.
- Lu Q, Powles RL, Abdallah S, Ou D, Wang Q, Hu Y, Lu Y, Liu W, Li B, Mukherjee S, et al., 2017. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset alzheimer's disease. PLoS Genet 13, e1006933.
- Lu Q, Powles RL, Wang Q, He BJ, Zhao H, 2016. Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies. PLoS Genet 12, e1005947.
- Mandric I, Schwarz T, Majumdar A, Hou K, Briscoe L, Perez R, Subramaniam M, Hafemeister C, Satija R, Ye CJ, et al., 2020. Optimized design of single-cell rna sequencing experiments for cell-type-specific eqtl analysis. Nat Commun 11, 5504.
- Marderstein AR, Davenport ER, Kulm S, Van Hout CV, Elemento O, Clark AG, 2021. Leveraging phenotypic variability to identify genetic interactions in human phenotypes. Am J Hum Genet 108, 49–67.
- Nathan A, Asgari S, Ishigaki K, Valencia C, Amariuta T, Luo Y, Beynor JI, Baglaenko Y, Suliman S, Price AL, et al., 2022. Single-cell eqtl models reveal dynamic t cell state dependence of disease loci. Nature 606, 120–128.
- Neavin D, Nguyen Q, Daniszewski MS, Liang HH, Chiu HS, Wee YK, Senabouth A, Lukowski SW, Crombie DE, Lidgerwood GE, et al., 2021. Single cell eqtl analysis identifies cell type-specific genetic control of gene expression in fibroblasts and reprogrammed induced pluripotent stem cells. Genome Biol 22, 76.
- Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ, 2010. Trait-associated snps are more likely to be eqtls: Annotation to enhance discovery from gwas. PLoS Genet 6, e1000888.
- Oelen R, de Vries DH, Brugge H, Gordon MG, Vochteloo M, single-cell eQTLGen Consortium, Consortium B, Ye CJ, Westra HJ, Franke L, et al., 2022. Single-cell rna-sequencing of peripheral blood mononuclear cells reveals widespread, context-specific gene expression regulation upon pathogenic exposure. Nat Commun 13, 3267.
- Okamoto J, Wang L, Yin X, Luca F, Pique-Regi R, Helms A, Im HK, Morrison J, Wen X, 2023. Probabilistic integration of transcriptome-wide association studies and colocalization analysis identifies key molecular pathways of complex traits. Am J Hum Genet 110, 44–57.
- Oliva M, Demanelis K, Lu Y, Chernoff M, Jasmine F, Ahsan H, Kibriya MG, Chen LS, Pierce BL, 2023. DNA methylation qtl mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat Genet 55, 112–122.
- Ongen H, Buil A, Brown AA, Dermitzakis ET, Delaneau O, 2016. Fast and efficient qtl mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485.
- Ota M, Nagafuchi Y, Hatano H, Ishigaki K, Terao C, Takeshima Y, Yanaoka H, Kobayashi S, Okubo M, Shirai H, et al., 2021. Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases. Cell 184, 3006–3021 e3017.
- Patel D, Zhang X, Farrell JJ, Chung J, Stein TD, Lunetta KL, Farrer LA, 2021. Cell-type-specific expression quantitative trait loci associated with alzheimer disease in blood and brain tissue. Transl Psychiatry 11, 250.
- Psych EC, Akbarian S, Liu C, Knowles JA, Vaccarino FM, Farnham PJ, Crawford GE, Jaffe AE, Pinto D, Dracheva S, et al., 2015. The psychencode project. Nat Neurosci 18, 1707–1712.
- Richardson TG, Hemani G, Gaunt TR, Relton CL, Davey Smith G, 2020. A transcriptome-wide mendelian randomization study to uncover tissue-dependent regulatory mechanisms across the human phenome. Nat Commun 11, 185.
- Ritchie GR, Dunham I, Zeggini E, Flicek P, 2014. Functional annotation of noncoding sequence variants. Nat Methods 11, 294–296.
- Roadmap Epigenomics C, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, et al., 2015. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330.
- Sarkar AK, Tung PY, Blischak JD, Burnett JE, Li YI, Stephens M, Gilad Y, 2019. Discovery and characterization of variance qtls in human induced pluripotent stem cells. PLoS Genet 15, e1008045.
- Schmiedel BJ, Gonzalez-Colin C, Fajardo V, Rocha J, Madrigal A, Ramirez-Suastegui C, Bhattacharyya S, Simon H, Greenbaum JA, Peters B, et al., 2022. Single-cell eqtl analysis of activated t cell subsets reveals activation and cell type-dependent effects of disease-risk variants. Sci Immunol 7, eabm2508.
- Schmiedel BJ, Singh D, Madrigal A, Valdovino-Gonzalez AG, White BM, Zapardiel-Gonzalo J, Ha B, Altay G, Greenbaum JA, McVicker G, et al., 2018. Impact of genetic polymorphisms on human immune cell gene expression. Cell 175, 1701–1715 e1716.
- Shabalin AA, 2012. Matrix eqtl: Ultra fast eqtl analysis via large matrix operations. Bioinformatics 28, 1353–1358.
- Song X, Ji J, Rothstein JH, Alexeeff SE, Sakoda LC, Sistig A, Achacoso N, Jorgenson E, Whittemore AS, Klein RJ, et al., 2023. Mixcan: A framework for cell-type-aware transcriptome-wide association studies with an application to breast cancer. Nat Commun 14, 377.
- Soskic B, Cano-Gamez E, Smyth DJ, Ambridge K, Ke Z, Matte JC, Bossini-Castillo L, Kaplanis J, Ramirez-Navarro L, Lorenc A, et al., 2022. Immune disease risk variants regulate gene expression dynamics during cd4(+) t cell activation. Nat Genet 54, 817–826.
- Strober BJ, Tayeb K, Popp J, Qi G, Gordon MG, Perez R, Ye CJ, Battle A, 2022. Uncovering context-specific genetic-regulation of gene expression from single-cell rna-sequencing using latent-factor models. bioRxiv, doi: 10.1101/2022.12.22.521678.
- Su C, Xu Z, Shan X, Cai B, Zhao H, Zhang J, 2022a. Cell-type-specific co-expression inference from single cell rna-sequencing data. doi: 10.1101/2022.12.13.520181.
- Su C, Zhang J, Zhao H, 2022b. Csnet: Estimating cell-type-specific gene co-expression networks from bulk gene expression data. doi: 10.1101/2021.12.21.473558.
- Sun W, 2012. A statistical framework for eqtl mapping using rna-seq data. Biometrics 68, 1–11.
- Urbut SM, Wang G, Carbonetto P, Stephens M, 2019. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat Genet 51, 187–195.
- van der Wijst MGP, Brugge H, de Vries DH, Deelen P, Swertz MA, LifeLines Cohort S, Consortium B, Franke L, 2018. Single-cell rna sequencing identifies celltype-specific cis-eqtls and co-expression qtls. Nat Genet 50, 493–497.
- Wang SK, Nair S, Li R, Kraft K, Pampari A, Patel A, Kang JB, Luong C, Kundaje A, Chang HY, 2022. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom 2.
- Wang Y, Zhao H, 2022. Non-linear archetypal analysis of single-cell rna-seq data by deep autoencoders. PLoS Comput Biol 18, e1010025.
- Wen X, Lee Y, Luca F, Pique-Regi R, 2016. Efficient integrative multi-snp association analysis via deterministic approximation of posteriors. Am J Hum Genet 98, 1114–1129.
- Westra HJ, Arends D, Esko T, Peters MJ, Schurmann C, Schramm K, Kettunen J, Yaghootkar H, Fairfax BP, Andiappan AK, et al., 2015. Cell specific eqtl analysis without sorting cells. PLoS Genet 11, e1005223.
- Xie D, Wang J, 2022. Robust statistical inference for cell type deconvolution. arXiv:2202.06420.
- Yankovitz G, Cohn O, Bacharach E, Peshes-Yaloz N, Steuerman Y, Iraqi FA, Gat-Viks I, 2021. Leveraging the cell lineage to predict cell-type specificity of regulatory variation from bulk genomics. Genetics 217.
- Yao DW, O'Connor LJ, Price AL, Gusev A, 2020. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat Genet 52, 626–633.
- Yazar S, Alquicira-Hernandez J, Wing K, Senabouth A, Gordon MG, Andersen S, Lu Q, Rowson A, Taylor TRP, Clarke L, et al., 2022. Single-cell eqtl mapping identifies cell type-specific genetic control of autoimmune disease. Science 376, eabf3041.
- Young AMH, Kumasaka N, Calvert F, Hammond TR, Knights A, Panousis N, Park JS, Schwartzentruber J, Liu J, Kundu K, et al., 2021. A map of transcriptional heterogeneity and regulatory variation in human microglia. Nat Genet 53, 861–868.
- Yuan Z, Zhu H, Zeng P, Yang S, Sun S, Yang C, Liu J, Zhou X, 2020. Testing and controlling for horizontal pleiotropy with probabilistic mendelian randomization in transcriptome-wide association studies. Nat Commun 11, 3861.
- Zhabotynsky V, Huang L, Little P, Hu YJ, Pardo-Manuel de Villena F, Zou F, Sun W, 2022. Eqtl mapping using allele-specific count data is computationally feasible, powerful, and provides individual-specific estimates of genetic effects. PLoS Genet 18, e1010076.
- Zhang T, Choi J, Dilshat R, Einarsdottir BO, Kovacs MA, Xu M, Malasky M, Chowdhury S, Jones K, Bishop DT, et al., 2021. Cell-type-specific meqtls extend melanoma gwas annotation beyond eqtls and inform melanocyte gene-regulatory mechanisms. Am J Hum Genet 108, 1631–1646.
- Zhang T, Choi J, Kovacs MA, Shi J, Xu M, Program NCS, Melanoma Meta-Analysis C, Goldstein AM, Trower AJ, Bishop DT, et al., 2018. Cell-type-specific eqtl of primary melanocytes facilitates identification of melanoma susceptibility genes. Genome Res 28, 1621–1635.
- Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, van 't Hof P, Mei H, van Dijk F, Westra HJ, et al., 2017. Identification of context-dependent expression quantitative trait loci in whole blood. Nat Genet 49, 139–145.
- Zhou D, Jiang Y, Zhong X, Cox NJ, Liu C, Gamazon ER, 2020. A unified framework for joint-tissue transcriptome-wide association and mendelian randomization analysis. Nat Genet 52, 1239–1246.
