A brief guide to analyzing expression quantitative trait loci

Byung Su Ko; Sung Bae Lee; Tae-Kyung Kim

doi:10.1016/j.mocell.2024.100139

letter

2024 Oct 22;47(11):100139. doi: 10.1016/j.mocell.2024.100139

PMCID: PMC11600780 PMID: 39447874

Abstract

Molecular quantitative trait locus (molQTL) mapping has emerged as an important approach for elucidating the functional consequences of genetic variants and unraveling the causal mechanisms underlying diseases or complex traits. However, the variety of analysis tools and sophisticated methodologies available for molQTL studies can be overwhelming for researchers with limited computational expertise. Here, we provide a brief guideline with a curated list of methods and software tools for analyzing expression quantitative trait loci, the most widely studied type of molQTL.

INTRODUCTION

In recent decades, genome-wide association studies (GWAS) have advanced our understanding of the genetic basis of diseases and complex traits by identifying trait-associated variants in human populations (Buniello et al., 2019, Wang et al., 2022, Visscher et al., 2017). To decipher the underlying mechanisms and discover potential therapeutic targets, there is a growing need to interpret the functional relevance of genetic variants (Cano-Gamez and Trynka, 2020). With the rapid advancements in high-throughput sequencing technologies, an increasing number of studies have adopted integrative approaches combining genetic information with various molecular phenotypes, such as gene expression, splicing, protein abundance, and chromatin modification/accessibility. These integrative strategies have paved the way for molecular quantitative trait loci (molQTL) mapping (Aguet et al., 2023), a powerful statistical framework that identifies genetic loci associated with quantitative variations in molecular phenotypes, thereby providing insights into the functional consequences of genetic variants.

Expression quantitative trait loci (eQTL) mapping determines the regulatory effects of genetic variants on gene expression levels, which can provide insights into disease mechanisms. Large-scale consortia and resources, such as the eQTL Catalogue (Kerimov et al., 2021, Kerimov et al., 2023), the Genotype-Tissue Expression (GTEx) project (GTEx Consortium, 2020), and the eQTLGen consortium (Vosa et al., 2021), offer catalogs of eQTL summaries and annotations in diverse human tissues. Given the population scale of genome-wide studies, robust eQTL analysis typically requires genetic data from hundreds of individuals to achieve sufficient statistical power (Huang et al., 2018). A wide range of computational tools and methodologies have been developed and integrated into bioinformatics pipelines to facilitate the analysis of large-scale genetic and phenotypic datasets (Kel et al., 2016, Kerimov et al., 2021, Lee et al., 2024, Wang et al., 2021, Yoo et al., 2021). Although eQTL mapping protocols and summary statistics are publicly accessible, researchers with limited computational expertise may encounter challenges in orchestrating computational workflows and processing large-scale datasets. Here, we provide a curated resource for eQTL mapping analysis to assist experimental biologists (Fig. 1 and Supplementary Table).

Fig. 1 — Framework diagram of eQTL mapping. A schematic diagram of the eQTL mapping process with commonly used tools for each step. eQTL, expression quantitative trait loci; HWE, Hardy-Weinberg Equilibrium; QC, quality control; MAF, minor allele frequency; WGS, whole-genome sequencing.

MAIN BODY

Overview of eQTL Mapping

eQTL mapping requires 2 types of datasets: genotype data and gene expression data. Before eQTL mapping, quality control (QC) of both datasets should be conducted to identify and remove problematic or outlying samples, preventing a loss of power in subsequent eQTL analysis. Using QC-processed datasets, eQTL mapping identifies genetic variants that significantly affect the expression levels of putative target genes, providing insights into the regulatory networks of gene expression. It should be noted that the statistical power of eQTL studies is highly dependent on sample size. Small sample sizes can lead to false positives or false negatives, thereby reducing the reliability of the results (Huang et al., 2018). To enhance the robustness of eQTL findings, researchers should aim for larger sample sizes or consider conducting meta-analyses that combine data from multiple studies (Sieberts et al., 2020).

Genotype Data

Genome-wide genotype data, obtained from whole-genome sequencing and/or single nucleotide polymorphism arrays combined with genotype imputation, provide comprehensive coverage of genetic variations across the genome and enhance the power to identify causal variants. Variant calling tools such as the Genome Analysis Toolkit (GATK) (McKenna et al., 2010), BCFtools (Li, 2011), DeepVariant (Poplin et al., 2018), Strelka2 (Kim et al., 2018), and FreeBayes (Garrison and Marth, 2012) are employed to detect variants from sequencing and microarray data. Among these tools, GATK, a widely adopted suite of tools developed by the Broad Institute, analyzes high-throughput sequencing data to discover genetic variants and integrates information on variants into VCF (variant call format) files (Danecek et al., 2011). VCF files can also be obtained from public repositories such as dbSNP (https://www.ncbi.nlm.nih.gov/snp/), the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015), gnomAD (https://gnomad.broadinstitute.org/), EVA (https://www.ebi.ac.uk/eva/), or UK Biobank (Bycroft et al., 2018), or from individual research publications.

QC of genotype data is an indispensable step to ensure the reliability and accuracy of eQTL analysis. Several QC tools such as PLINK (Chang et al., 2015, Marees et al., 2018, Purcell et al., 2007) and VCFtools (Danecek et al., 2011) offer a range of functionalities (eg, data formatting, filtering, and statistical analyses) to perform the overall genotype QC process. In this resource, we organized the multiple steps of genotype QC into 2 levels: sample-level QC and variant-level QC.

Sample-level QC

When combining genotype data from heterogeneous sources, missing genotypes per sample can be a common issue. VCFtools (--missing-indv) or PLINK (--mind) calculates the missing rate of genotypes for each sample, allowing the exclusion of low-quality samples with excessive missing genotypes.
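
As an illustration of the per-sample missingness filter described above, the following minimal Python sketch computes the fraction of missing genotype calls for each sample and keeps only samples at or below a cutoff. The toy genotype matrix, the sample names, and the 0.25 cutoff are hypothetical and chosen only for demonstration; in practice this step would be run with PLINK or VCFtools on real data.

```python
# Hypothetical toy data: each sample maps to its genotype calls across variants.
# Calls are alternate-allele counts (0, 1, 2); None marks a missing call.
genotypes = {
    "sample_A": [0, 1, 2, 0, 1],
    "sample_B": [0, None, 2, None, None],
    "sample_C": [1, 1, None, 0, 2],
}


def sample_missing_rate(calls):
    """Fraction of variants without a genotype call for one sample."""
    return sum(call is None for call in calls) / len(calls)


def filter_samples(genotypes, max_missing):
    """Keep samples whose missing rate does not exceed max_missing."""
    return {
        sample: calls
        for sample, calls in genotypes.items()
        if sample_missing_rate(calls) <= max_missing
    }


missing_rates = {s: sample_missing_rate(c) for s, c in genotypes.items()}
kept_samples = filter_samples(genotypes, max_missing=0.25)
print(missing_rates)
print(sorted(kept_samples))
```

The appropriate cutoff depends on the dataset and should be chosen by the analyst rather than taken from this example.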

Sex mismatches can be detected by examining the homozygosity rate of genetic variants on the X chromosome by using PLINK (--check-sex) (Zhao et al., 2018). Because males carry a single X chromosome, their expected X-chromosome homozygosity rate is close to 1, whereas females show a lower rate. Comparing reported sex information with the observed homozygosity rates identifies discrepancies, and the corresponding samples should be removed. Even after excluding sex-mismatched samples, heterozygous haploid genotypes of variants on the X chromosome may persist due to errors in genotype calling or sequencing. To maintain data integrity, these genotypes should be set to missing by using PLINK (--set-hh-missing).
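
The logic of a homozygosity-based sex check can be sketched as follows. This is a simplified illustration, not PLINK's implementation: it estimates an X-chromosome inbreeding coefficient F = (O - E) / (N - E), where O is the observed number of homozygous calls, E the number expected under random mating, and N the number of called variants. The allele frequencies, sample names, and genotypes are hypothetical, and the 0.2 and 0.8 cutoffs are assumptions that mirror commonly used defaults.

```python
def x_inbreeding_coefficient(calls, alt_freqs):
    """Method-of-moments F = (O - E) / (N - E) over X-chromosome variants.

    calls: alternate-allele counts (0, 1, 2) for one sample, None if missing.
    alt_freqs: assumed population alternate-allele frequency per variant.
    Assumes at least one polymorphic variant, so that N - E is not zero.
    """
    observed_hom = 0    # O: homozygous calls observed in this sample
    expected_hom = 0.0  # E: homozygous calls expected under random mating
    n_called = 0        # N: non-missing calls
    for call, p in zip(calls, alt_freqs):
        if call is None:
            continue
        n_called += 1
        observed_hom += call != 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    return (observed_hom - expected_hom) / (n_called - expected_hom)


def infer_sex(f_value, female_max=0.2, male_min=0.8):
    """Classify a sample from its X-chromosome F (cutoffs are assumptions)."""
    if f_value < female_max:
        return "female"
    if f_value > male_min:
        return "male"
    return "ambiguous"


# Hypothetical allele frequencies and samples: (reported sex, X genotypes).
alt_freqs = [0.5, 0.3, 0.4, 0.5, 0.2, 0.5]
samples = {
    "sample_A": ("male", [0, 2, 0, 2, 0, 2]),
    "sample_B": ("male", [1, 0, 1, 1, 0, 1]),
    "sample_C": ("female", [1, 2, 1, 0, 1, 1]),
}

# Flag samples whose inferred sex disagrees with the reported sex
# (samples classified as "ambiguous" are flagged as well).
mismatches = []
for name, (reported, calls) in samples.items():
    inferred = infer_sex(x_inbreeding_coefficient(calls, alt_freqs))
    if inferred != reported:
        mismatches.append(name)
print(mismatches)
```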

To reduce the false-positive rates in eQTL mapping analysis, the relatedness between each pair of samples should be assessed. The kinship coefficient, a common measure of relatedness, is defined as the probability that a pair of randomly sampled homologous alleles derived from 2 individuals are identical by descent (Speed and Balding, 2015, Thompson, 1975). One issue in relatedness estimation is that strong linkage disequilibrium (LD) among detected variants can lead to overestimation of relatedness. Hence, LD pruning is often recommended to improve the accuracy of relatedness estimates by reducing the number of redundant variants in strong LD. The PLINK --indep-pairwise command is widely used for LD pruning, effectively removing variants that are highly correlated with each other. The algorithm calculates pairwise LD (r²) between all variants within a specified window and removes 1 variant from each pair whose r² exceeds a chosen threshold. Following LD pruning, researchers can employ specialized tools such as KING (Manichaikul et al., 2010), SEEKIN (Dou et al., 2017), correctkin (Nyerki et al., 2023), and IBDkin (Zhou et al., 2020) to identify related individuals in each experiment by setting a certain threshold for expected kinship coefficients. Then, researchers can either remove 1 individual from each related pair or adjust for relatedness in the eQTL analysis using a linear mixed model, which incorporates kinship coefficients into kinship matrices to account for population structure and confounding effects (Hoffman, 2013, Lee, 2018, Pala et al., 2017).
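
The idea behind r²-based LD pruning can be illustrated with a short greedy sketch. It is a simplification under stated assumptions and does not reproduce PLINK's sliding-window algorithm: variants are visited in genomic order, and a variant is dropped when its r² with an already retained variant inside the window exceeds the threshold. The variant identifiers, dosages, window size, and threshold are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_prune(variants, window=50, r2_threshold=0.2):
    """Greedy pruning over variants listed in genomic order.

    variants: list of (variant_id, dosages) tuples.
    window: maximum index distance within which LD is evaluated.
    """
    kept = []  # tuples of (index, variant_id, dosages)
    for idx, (variant_id, dosages) in enumerate(variants):
        in_ld = any(
            idx - kept_idx < window
            and r_squared(dosages, kept_dosages) > r2_threshold
            for kept_idx, _, kept_dosages in kept
        )
        if not in_ld:
            kept.append((idx, variant_id, dosages))
    return [variant_id for _, variant_id, _ in kept]


# Hypothetical dosages for six samples at three neighboring variants.
toy_variants = [
    ("var_1", [0, 1, 2, 0, 1, 2]),
    ("var_2", [0, 1, 2, 0, 1, 1]),  # strongly correlated with var_1
    ("var_3", [2, 0, 1, 1, 0, 2]),  # uncorrelated with var_1
]
pruned_ids = ld_prune(toy_variants, window=50, r2_threshold=0.2)
print(pruned_ids)
```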

Population stratification is another crucial factor to consider in eQTL mapping. It refers to the existence of systematic differences in allele frequencies between subpopulations, which can be attributed to variations in ancestry or geographic origin (Yang et al., 2014). These differences can introduce confounding effects, potentially leading to false-positive or false-negative associations between genetic variants and gene expression levels. To mitigate this issue, principal component analysis (PCA) has been widely adopted to identify population structure and potential relatedness (Price et al., 2006). Principal components (PCs) defined from LD-pruned datasets can be used to identify and remove outlying samples that deviate from the primary ancestral clusters. These PCs, derived from genotype data, can then be incorporated as covariates in the eQTL model to adjust for population structures (see the section on Selecting Covariates).
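
To make the PCA step concrete, the sketch below extracts the first principal component of a small genotype matrix by power iteration and reports which sample lies farthest from the others along that axis. It is only an illustration written with the Python standard library; real analyses use many PCs computed on LD-pruned, genome-wide data with dedicated software. The genotype matrix is hypothetical, with the last sample deliberately made divergent.

```python
import math


def center_columns(matrix):
    """Subtract the per-variant (column) mean from every entry."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def first_pc_scores(matrix, n_iter=200):
    """Scores of each sample (row) on the first principal component.

    Uses power iteration on the sample-by-sample Gram matrix of the
    column-centered data. The sign of the scores is arbitrary.
    """
    x = center_columns(matrix)
    n = len(x)
    gram = [
        [sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        for i in range(n)
    ]
    # Start from a non-constant vector: a constant vector is annihilated
    # by the Gram matrix of column-centered data.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        vec = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    eigenvalue = sum(
        v * sum(g * w for g, w in zip(row, vec)) for v, row in zip(vec, gram)
    )
    return [v * math.sqrt(eigenvalue) for v in vec]


# Hypothetical genotypes: rows are samples, columns are variants.
toy_genotypes = [
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [2, 2, 2, 2, 2, 2],  # deliberately divergent sample
]
pc1 = first_pc_scores(toy_genotypes)
most_extreme = max(range(len(pc1)), key=lambda i: abs(pc1[i]))
print(most_extreme, [round(score, 3) for score in pc1])
```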

Variant-level QC

Because differences in read depth across experiments can produce missing genotypes, variants with a high missing genotype rate should be removed to prevent false-positive signals in subsequent analyses. Such variants can be identified and filtered by using the PLINK (--geno) or VCFtools (--max-missing) options.

To identify potential genotyping errors and population stratifications, researchers should confirm whether genetic variants violate the principle of Hardy-Weinberg Equilibrium (HWE), which assumes constant genotype and allele frequencies across generations in a large population without natural selection, newly occurred mutations, or gene migration (Wang et al., 2021, Wigginton et al., 2005). The Chi-squared test is commonly used to assess HWE violations. This test compares the observed genotype frequencies with the expected frequencies under the assumption of HWE (Rohlfs and Weir, 2008) and generates a P-value that indicates the significance of the deviation. In practice, an HWE P-value threshold of 10⁻⁶ is commonly used to filter out variants that significantly deviate from HWE, ensuring high-quality variants for downstream analyses.
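
The Chi-squared test for HWE can be written out in a few lines. The sketch below estimates the allele frequency from the observed genotype counts, derives the expected counts under HWE, and converts the test statistic to a P-value using the chi-squared distribution with 1 degree of freedom. The genotype counts are hypothetical, the function assumes a polymorphic variant, and exact tests are generally preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-squared test of Hardy-Weinberg equilibrium for one biallelic variant.

    Returns (chi2, p_value). Assumes both alleles are observed, so that
    all expected counts are greater than zero.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts.
chi2_ok, p_ok = hwe_chi_square(360, 480, 160)   # matches HWE expectations
chi2_bad, p_bad = hwe_chi_square(500, 0, 500)   # no heterozygotes at all
print(chi2_ok, p_ok)
print(chi2_bad, p_bad, p_bad < 1e-6)
```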

Variants with a minor allele frequency (MAF) below a certain threshold are often removed in eQTL mapping studies to reduce computational burden and false-positive associations. Such variants offer limited statistical power to detect significant associations with gene expression, so removing them allows subsequent analyses to prioritize variants with sufficient power, enhancing the overall robustness and reliability of the eQTL results. Several tools, including PLINK (--maf) and VCFtools (--freq to report allele frequencies, --maf to filter), can be used to detect and filter out variants with a MAF below the set threshold, which depends on sample size and study design (Hong and Park, 2012). For instance, a higher MAF threshold may be appropriate in studies with smaller sample sizes to ensure sufficient statistical power.
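
A MAF filter amounts to a simple calculation per variant, sketched below. The variant identifiers, genotypes, and the 0.1 threshold are hypothetical and serve only to show the computation.

```python
def minor_allele_frequency(calls):
    """MAF of one variant from alternate-allele counts (None = missing)."""
    called = [call for call in calls if call is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(variants, maf_threshold):
    """Keep variants whose MAF is at least maf_threshold."""
    return [
        variant_id
        for variant_id, calls in variants.items()
        if minor_allele_frequency(calls) >= maf_threshold
    ]


# Hypothetical genotypes for ten samples at three variants.
toy_calls = {
    "var_1": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "var_2": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],
    "var_3": [2, 2, 2, 2, None, 2, 1, 2, 2, 2],
}
maf_values = {v: minor_allele_frequency(c) for v, c in toy_calls.items()}
common_variants = filter_by_maf(toy_calls, maf_threshold=0.1)
print(maf_values)
print(common_variants)
```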

Phenotype Data

Publicly available RNA-seq datasets may be provided in various formats depending on the processing steps (eg, raw read data, read-aligned data, feature-counted data, or standardized data) (Sanchis et al., 2021). RNA-seq data compiled in different formats must be integrated into a single format for subsequent analysis. As a basic QC measure for RNA-seq datasets, low-quality samples (eg, those with poor sequencing quality or a low percentage of mapped reads) should be identified and excluded. Additionally, genes that exhibit expression levels below a defined threshold across samples should be filtered out to reduce noise. These QC procedures can be performed by using the following software tools: MultiQC (Ewels et al., 2016), RNA-SeQC2 (Graubert et al., 2021), RSeQC (Wang et al., 2012), FastQC (Andrews, 2010), and Picard (https://github.com/broadinstitute/picard).

After data integration, it is essential to identify and remove samples that exhibit atypical gene expression profiles. These outliers may arise from technical issues such as sample contamination or failures in RNA-seq library preparation. PCA on the RNA-seq data, using the first 2 components to capture major variations, can detect potential outliers by using tools such as PCAtools, factoextra, pcaExplorer, and smartPCA from the EIGENSOFT package (https://github.com/DReichLab/EIG).

To ensure the integrity of RNA-seq datasets, researchers should identify and correct sample swaps, mislabeled samples, or cross-contamination between RNA-seq samples by using tools such as Match Bam to VCF (Fort et al., 2017) and VerifyBamID2 (Zhang et al., 2020). In addition, sex-mismatched samples can be identified by measuring the expression levels of sex-specific genes, such as RPS4Y1 or XIST.

RNA-seq data normalization is a crucial step enabling the comparison of gene expression levels across samples. While intrasample normalization, such as CPM/FPKM/RPKM/TPM, normalizes gene expression levels within individual samples, it is not well-suited for comparing expression levels across samples and experiments. To address this issue, various software packages (edgeR [Robinson et al., 2010], DESeq2 [Love et al., 2014], NOISeq [Tarazona et al., 2015], cqn [Hansen et al., 2012], and EDASeq [Risso et al., 2011]) provide tools for cross-sample normalization of RNA-seq data, such as Trimmed Mean of M values (Robinson and Oshlack, 2010), Relative Log Expression (Abbas-Aghababazadeh et al., 2018), Quantile normalization (Bolstad et al., 2003, Cloonan et al., 2008), and Median Ratio Normalization (Dillies et al., 2013). These methods remove systematic biases that can arise from technical variations, such as library preparation or sequencing platforms. Following cross-sample normalization, the expression values of each gene should be transformed across samples by the inverse normal transformation method, which maps the ranked values onto a normal distribution and aligns them with the assumptions of regression models in subsequent analyses. The inverse normal transformation also reduces the impact of outliers on the comparison of gene expression levels across samples.
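
A rank-based inverse normal transformation for a single gene can be sketched as follows. The expression values are hypothetical, and the rank-to-quantile rule used here, (rank - 0.5) / n, is one of several conventions and should be read as an assumption rather than the rule applied by any particular pipeline.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transformation of one gene across samples.

    Tied values receive their average rank; each rank r is mapped to the
    standard normal quantile at (r - offset) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical normalized expression of one gene in five samples,
# including two extreme values.
expression = [5.0, 100.0, 7.0, 6.0, 1000.0]
transformed = inverse_normal_transform(expression)
print([round(z, 3) for z in transformed])
```

The transformation preserves the ordering of samples while pulling extreme values toward the tails of a standard normal distribution.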

Selecting Covariates

Adjustment of covariates accounts for unwanted variations introduced by confounding factors, enhancing the power to detect true associations between genetic variants and gene expression levels. Both technical (eg, batch effects, RNA-seq features such as read length, paired/single-end sequencing, library size, sequencing platform) and biological covariates (eg, age, sex, tissue type) can be regressed out using linear regression models. However, including too many covariates can lead to overfitting, which reduces the statistical power of detection and produces unreliable estimates of eQTL effects. Therefore, researchers should carefully consider and prioritize the most relevant covariates for inclusion in the regression models.
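
Regressing out a covariate can be illustrated with ordinary least squares on a single covariate. The sketch below removes a hypothetical batch effect from the expression values of one gene and returns the residuals. It handles only one covariate to stay self-contained; adjusting for several covariates at once requires a multiple regression fitted with a linear algebra library, and many eQTL tools instead include the covariates directly in the association model.

```python
def regress_out(expression, covariate):
    """Residuals of a simple linear regression of expression on one covariate."""
    n = len(expression)
    mean_y = sum(expression) / n
    mean_c = sum(covariate) / n
    sxy = sum((c - mean_c) * (y - mean_y) for c, y in zip(covariate, expression))
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    slope = sxy / sxx
    return [
        (y - mean_y) - slope * (c - mean_c)
        for c, y in zip(covariate, expression)
    ]


# Hypothetical data: six samples from two batches with a shift of about 2 units.
batch = [0, 0, 0, 1, 1, 1]
gene_expression = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
residuals = regress_out(gene_expression, batch)
print([round(r, 3) for r in residuals])
```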

In addition to known covariates, latent covariates that are not directly observable or measurable can be inferred from the patterns present in each dataset. PCs derived from both genotype and phenotype data can capture latent sources of variations. The number of PCs is often determined by identifying the elbow point of the scree plot, which depicts the proportion of variance in each PC, to improve the accuracy of the analysis (Zhou et al., 2022). In addition to PC analysis, statistical methods such as surrogate variable analysis (Leek and Storey, 2007), probabilistic estimation of expression residuals (Stegle et al., 2012), or hidden covariates with prior (Mostafavi et al., 2013) can also be employed to capture latent covariates from gene expression data and to control for confounding factors. The optimal number of these covariates to be included in the analysis can be determined by maximizing the number of detected eQTLs while minimizing the risk of overfitting.

eQTL Mapping

The primary goal of eQTL mapping is to identify genetic variants that influence gene expression levels and determine whether these associations exhibit genome-wide significance. In eQTL analysis, normalized gene expression values, covariate matrices, genotypes, and regression models are used to identify statistically significant associations between variants and phenotypes, thereby discovering QTLs in cis (proximal) and trans (distal) regions (de Klein et al., 2023, GTEx Consortium, 2017, Kerimov et al., 2021). Cis-regulatory variants affect gene expression through regulatory elements located near the target genes, typically within a 100 kb to 1 Mb window from the transcription start site. In contrast, trans-regulatory variants influence gene expression from a different chromosome or at least 5 Mb away from the target genes. This resource focuses on mapping cis-eQTLs, which are more commonly analyzed because they typically have larger effect sizes than trans-eQTLs (Liu et al., 2019, Pierce et al., 2014, Wang et al., 2024).
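
The two ingredients of a cis-eQTL test, selecting variants within a window around the transcription start site and regressing expression on genotype, can be sketched as follows. All positions, identifiers, and values are hypothetical, variants and gene are assumed to lie on the same chromosome, and the P-value relies on a normal approximation to the t distribution, which is an assumption made to keep the example within the standard library.

```python
import math


def cis_variants(variant_positions, tss, window=1_000_000):
    """Variants located within `window` bp of the transcription start site."""
    return [
        variant_id
        for variant_id, position in variant_positions.items()
        if abs(position - tss) <= window
    ]


def association_test(dosages, expression):
    """Simple linear regression of expression on allele dosage.

    Returns (slope, t_statistic, p_value). The two-sided P-value uses a
    normal approximation to the t distribution (an assumption that is
    reasonable only for large samples). Assumes non-zero residual variance.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    residual_ss = sum(
        (e - mean_e - slope * (g - mean_g)) ** 2
        for g, e in zip(dosages, expression)
    )
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    t_statistic = slope / std_err
    p_value = math.erfc(abs(t_statistic) / math.sqrt(2.0))
    return slope, t_statistic, p_value


# Hypothetical variant positions and gene TSS on one chromosome.
positions = {"var_a": 1_200_000, "var_b": 1_950_000, "var_c": 4_000_000}
cis_ids = cis_variants(positions, tss=1_500_000)

# Hypothetical dosages of var_a and normalized expression in ten samples.
dosages = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
expression = [-1.1, -0.9, 0.1, -0.2, 1.0, 1.2, -1.0, 0.0, 0.9, 0.1]
slope, t_statistic, p_value = association_test(dosages, expression)
print(cis_ids)
print(slope, t_statistic, p_value)
```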

eQTL analysis involves association tests between genetic variants and gene expression levels across the genome. Nominal P-values for each variant-gene pair are calculated using significance tests based on the null hypothesis of no association between the variant and gene expression. However, selecting an appropriate genome-wide significance threshold for eQTL mapping is challenging due to the large number of association tests conducted across variant-gene pairs throughout the genome. Therefore, multiple testing correction is applied to control the family-wise error rate or the false discovery rate (FDR), ensuring the genome-wide significance of the identified associations. Several multiple testing correction methods, such as Bonferroni correction, FDR correction (eg, the Benjamini-Hochberg procedure), or permutation-based methods, can be applied to adjust the P-values or significance threshold. Selecting the appropriate multiple-testing correction method depends on the study design, number of tests, effect sizes of the eQTLs, and the specific research objectives. For studies aiming to uncover a comprehensive list of potential eQTLs, a less stringent method like FDR might be more suitable. On the other hand, studies requiring high confidence in the identified eQTLs might prefer a more stringent correction method, such as the Bonferroni correction. However, the conservative nature of this method can increase the risk of false negatives, potentially missing true associations with smaller effect sizes. Researchers should carefully select a correction method that balances minimizing false positives with the ability to detect true associations, considering the objectives of their study.
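
The difference in stringency between the Bonferroni correction and FDR control can be seen in a small worked example. The sketch below implements the Bonferroni adjustment and the Benjamini-Hochberg procedure and applies both to a hypothetical set of five nominal P-values; real eQTL analyses involve far more tests.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: each P-value times the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that adjusted values
    # are monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal P-values from five association tests.
nominal = [0.001, 0.008, 0.02, 0.03, 0.6]
bonferroni_adj = bonferroni(nominal)
bh_adj = benjamini_hochberg(nominal)
n_bonferroni = sum(p < 0.05 for p in bonferroni_adj)
n_bh = sum(p < 0.05 for p in bh_adj)
print(bonferroni_adj, n_bonferroni)
print(bh_adj, n_bh)
```

In this toy example, fewer tests pass the 0.05 cutoff after Bonferroni adjustment than after Benjamini-Hochberg adjustment, which reflects the more conservative nature of the former.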

To efficiently perform a large number of association tests for eQTL mapping and subsequent multiple testing correction, several software packages have been developed, including Matrix eQTL (Shabalin, 2012), FastQTL/QTLtools (Ongen et al., 2016, Delaneau et al., 2017), and TensorQTL (Taylor-Weiner et al., 2019). While Matrix eQTL provides a basic framework for eQTL mapping, FastQTL/QTLtools improves the computational efficiency of permutation-based testing by modeling the null distribution of the association test statistics with a beta distribution. This approach enables the estimation of adjusted P-values with fewer permutations, thereby mitigating the computational burden. TensorQTL employs tensor operations and graphics processing unit acceleration to speed up the computation of association tests (it was originally built on TensorFlow, and current releases use PyTorch). GEMMA (Zhou and Stephens, 2012), LIMIX (Casale et al., 2015), and APEX (Corbin et al., 2020) support association mapping with linear mixed models and can be used for eQTL mapping.
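
The permutation scheme that the beta approximation is designed to accelerate can be sketched directly. In the illustration below, the expression values of one gene are shuffled across samples, the strongest absolute correlation over all cis variants is recorded for each permutation, and the gene-level adjusted P-value is the fraction of permutations that match or exceed the observed statistic. This is a plain permutation test under simplifying assumptions (no covariates, a small number of permutations, a fixed random seed); it does not implement the beta approximation, and all data are hypothetical.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def gene_level_permutation_p(cis_dosages, expression, n_perm=200, seed=0):
    """Permutation-adjusted P-value for the best cis variant of one gene."""
    rng = random.Random(seed)
    observed = max(abs_correlation(d, expression) for d in cis_dosages)
    shuffled = list(expression)
    n_exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the genotype-expression link
        best = max(abs_correlation(d, shuffled) for d in cis_dosages)
        n_exceed += best >= observed
    return (n_exceed + 1) / (n_perm + 1)


# Hypothetical dosages of two cis variants in 20 samples; expression tracks
# the first variant with a small deterministic perturbation.
variant_1 = [0] * 7 + [1] * 7 + [2] * 6
variant_2 = [0, 1, 2, 1, 0] * 4
expression = [g + 0.1 * math.sin(i) for i, g in enumerate(variant_1)]
adjusted_p = gene_level_permutation_p([variant_1, variant_2], expression)
print(adjusted_p)
```

With a finite number of permutations, the smallest attainable adjusted P-value is 1 / (n_perm + 1), which is one motivation for approximating the null distribution instead of running very large numbers of permutations.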

Translating eQTL Findings Into Meaningful Biological Insights

While eQTL mapping provides a robust statistical framework for identifying associations between genetic variants and gene expression, the true value of these studies lies in uncovering the biological mechanisms driving these associations and their relevance to diseases. To translate statistical associations into biological insights, it is crucial to identify causal variants and understand their regulatory mechanisms. Fine-mapping methods play a critical role in this process by refining the list of candidate causal variants within an eQTL region. Tools such as CAVIAR (Hormozdiari et al., 2014), DAP-G (Wen et al., 2016), FINEMAP (Benner et al., 2016), and SuSiE (Wang et al., 2020) employ statistical models to estimate the probability that a given variant affects gene expression, producing credible sets of variants at a specified probability threshold.
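
The construction of a credible set from per-variant posterior probabilities can be sketched as follows. The sketch assumes a single causal variant in the region and posterior probabilities that sum to 1; the variant identifiers and probabilities are hypothetical, and fine-mapping tools that model multiple causal signals build their credible sets differently.

```python
def credible_set(posterior_probs, coverage=0.95):
    """Smallest set of top-ranked variants whose probabilities reach `coverage`.

    posterior_probs: mapping of variant_id to posterior probability of
    causality, assumed to sum to 1 under a single-causal-variant model.
    Returns (variant_ids, cumulative_probability).
    """
    ranked = sorted(posterior_probs.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant_id, prob in ranked:
        selected.append(variant_id)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected, cumulative


# Hypothetical posterior probabilities for five variants in one eQTL region.
posteriors = {
    "var_1": 0.62,
    "var_2": 0.25,
    "var_3": 0.09,
    "var_4": 0.03,
    "var_5": 0.01,
}
set_95, covered = credible_set(posteriors, coverage=0.95)
print(set_95, round(covered, 2))
```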

Once potential causal variants are prioritized, integration of functional genomic annotations becomes essential for biological interpretation. These annotations include overlaps with regulatory elements (eg, enhancers, promoters), chromatin accessibility data (eg, ATAC-seq or DNase-seq data), and histone modifications. Databases such as ENCODE (ENCODE Project Consortium, 2012) provide these annotations, enabling researchers to contextualize eQTL variants within the regulatory landscape. Given the highly context-dependent nature of gene expression, validating the tissue specificity of identified eQTLs is crucial. The GTEx project exemplifies this approach by conducting eQTL mapping across multiple human tissues and integrating tissue-specific functional annotations, providing a comparative framework for eQTL analysis (GTEx Consortium, 2020). This approach allows researchers to determine whether identified eQTLs are tissue-specific or shared across multiple tissues, thereby linking genetic variants to relevant biological functions and potential disease associations.

Colocalization analysis enhances the biological interpretation of eQTLs by determining whether the same genetic variant underlies both an eQTL signal, indicating the association between a variant and gene expression levels, and a GWAS signal, which links genetic variants to complex traits or diseases. By using tools like coloc (Giambartolomei et al., 2014) and eCAVIAR (Hormozdiari et al., 2016), this approach helps to uncover causal relationships between genetic variants and disease risk, allowing researchers to pinpoint specific genes or regulatory elements that may contribute to disease mechanisms. For instance, a study on cerebral cortical development identified eQTLs that overlap with GWAS loci for neuropsychiatric disorders, such as schizophrenia and autism, providing insights into disease mechanisms and potential therapeutic targets (Werling et al., 2020). Moreover, integrating eQTL data with other omics layers offers a more comprehensive understanding of how genetic variants influence cellular pathways and biological processes. In the context of cancer, a recent study combined multiple layers of omics data, including genomic, transcriptomic, proteomic, and phosphoproteomic information, to reveal key signaling pathways and protein networks involved in tumor progression (Chen et al., 2020). By leveraging these multiomics data, researchers can gain deeper insights into the pathological mechanisms underlying various diseases, thereby enhancing the precision and effectiveness of targeted therapies.

While computational methods provide strong evidence for eQTL associations, experimental validation is crucial for establishing causality. Techniques such as massively parallel reporter assays (Tewhey et al., 2016) and clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing (Gasperini et al., 2019) complement computational methods by providing functional evidence of the causal relationship between genetic variants and gene expression changes. By combining computational and experimental strategies, researchers can effectively translate eQTL findings into meaningful biological insights, advancing our understanding of gene regulation and its impact on human health and disease.

CONCLUDING REMARKS

In this resource, we provide an introductory guideline for biologists with little or no expertise in bioinformatics who are interested in conducting eQTL analysis. Our guidelines focus on the key steps and software tools for bulk RNA-seq data, which is commonly used for gene expression profiling. However, it is important to note that single-cell RNA sequencing data offer a unique opportunity to investigate cell type–specific gene regulation in complex tissues, providing deeper insights into biological processes and diseases. Recent studies on COVID-19 (Edahiro et al., 2023) and neurological disorders (de Klein et al., 2023) highlighted the power of single-cell eQTL analysis in unraveling cell type–specific regulatory mechanisms underlying disease pathology.

While single-cell approaches provide unprecedented resolution, they also present distinct challenges, such as high technical noise, data sparsity, and the complexity of managing cellular heterogeneity. The large data volumes and the complexity of integrating genetic and transcriptomic information increase computational demands. Additionally, the dynamic nature of single-cell gene expression and the immense burden of multiple testing require careful statistical considerations. Despite these challenges, recent advances in computational tools have improved the integration of single-cell RNA sequencing data into eQTL mapping. Tools such as SCeQTL (Hu et al., 2020), scReQTL (Liu et al., 2021), CellRegMap (Cuomo et al., 2022), and FastGxC (Lu et al., 2021) are specifically designed to address these challenges, enabling more robust and efficient single-cell eQTL analysis.

Furthermore, various other molQTL data types, such as splicing QTLs, methylation QTLs, and chromatin accessibility QTLs, can be utilized to uncover associations between genetic variants and different layers of gene regulation. This approach allows for a more comprehensive understanding of the functional consequences of genetic variation and the identification of key regulatory pathways and networks that can unravel the complex interplay between genetic variation and molecular phenotypes.

AUTHOR CONTRIBUTIONS

Byung Su Ko: Writing – review & editing, Writing – original draft, Conceptualization. Sung Bae Lee: Supervision, Conceptualization. Tae-Kyung Kim: Writing – review & editing, Supervision, Conceptualization.

DECLARATION OF COMPETING INTERESTS

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

ACKNOWLEDGMENTS

This work was supported by Samsung Science & Technology Foundation (SSTF-BA2102-09 to T.-K.KIM), the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2023R1A2C300337811, 2020H1D3A1A04104610, RS-2023-00265581, and RS-2023-00217798 (Multitasking Macrophage Research Center) to T.-K.KIM, 24-BR-03-02 (the Korea Brain Research Institute (KBRI) Research Program) to S.B.LEE), and Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Ministry of Education (2021R1A6C101A390) (T.-K.KIM).

Footnotes

Supplemental material associated with this article can be found in the online version at: doi:10.1016/j.mocell.2024.100139.

REFERENCES

Abbas-Aghababazadeh F., Li Q., Fridley B.L. Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing. PLoS One. 2018;13 doi: 10.1371/journal.pone.0206312. [DOI] [PMC free article] [PubMed] [Google Scholar]
Aguet F., Alasoo K., Li Y.I., Battle A., Im H.K., Montgomery S.B., Lappalainen T. Molecular quantitative trait loci. Nat. Rev. Methods Primers. 2023;3:4. doi: 10.1038/s43586-022-00188-6. [DOI] [Google Scholar]
Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. 〈https://www.bioinformatics.babraham.ac.uk/projects/fastqc/〉.
Benner C., Spencer C.C., Havulinna A.S., Salomaa V., Ripatti S., Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32:1493–1501. doi: 10.1093/bioinformatics/btw018. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bolstad B.M., Irizarry R.A., Astrand M., Speed T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185. [DOI] [PubMed] [Google Scholar]
Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E., et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. doi: 10.1093/nar/gky1120. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cano-Gamez E., Trynka G. From GWAS to function: using functional genomics to identify the mechanisms underlying complex diseases. Front. Genet. 2020;11:424. doi: 10.3389/fgene.2020.00424. [DOI] [PMC free article] [PubMed] [Google Scholar]
Casale F.P., Rakitsch B., Lippert C., Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods. 2015;12:755–758. doi: 10.1038/nmeth.3439. [DOI] [PubMed] [Google Scholar]
Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen Y.J., Roumeliotis T.I., Chang Y.H., Chen C.T., Han C.L., Lin M.H., Chen H.W., Chang G.C., Chang Y.L., Wu C.T., et al. Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression. Cell. 2020;182:226–244. doi: 10.1016/j.cell.2020.06.012. e217. [DOI] [PubMed] [Google Scholar]
Cloonan N., Forrest A.R., Kolle G., Gardiner B.B., Faulkner G.J., Brown M.K., Taylor D.F., Steptoe A.L., Wani S., Bethel G., et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods. 2008;5:613–619. doi: 10.1038/nmeth.1223. [DOI] [PubMed] [Google Scholar]
Corbin Q., Li G., Zilin L., Xihao L., Rounak D., Yaowu L., Laura S. and Xihong L. A versatile toolkit for molecular QTL mapping and meta-analysis at scale. bioRxiv 2020 〈10.1101/2020.12.18.4234902020.2012.2018.423490〉.
Cuomo A.S.E., Heinen T., Vagiaki D., Horta D., Marioni J.C., Stegle O. CellRegMap: a statistical framework for mapping context-specific regulatory variants using scRNA-seq. Mol. Syst. Biol. 2022;18 doi: 10.15252/msb.202110663. [DOI] [PMC free article] [PubMed] [Google Scholar]
Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156 2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
de Klein N., Tsai E.A., Vochteloo M., Baird D., Huang Y., Chen C.Y., van Dam S., Oelen R., Deelen P., Bakker O.B., et al. Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases. Nat. Genet. 2023;55:377–388. doi: 10.1038/s41588-023-01300-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Delaneau O., Ongen H., Brown A.A., Fort A., Panousis N.I., Dermitzakis E.T. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 2017;8:15452. doi: 10.1038/ncomms15452. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dillies M.A., Rau A., Aubert J., Hennequet-Antier C., Jeanmougin M., Servant N., Keime C., Marot G., Castel D., Estelle J., et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013;14:671–683. doi: 10.1093/bib/bbs046. [DOI] [PubMed] [Google Scholar]
Dou J., Sun B., Sim X., Hughes J.D., Reilly D.F., Tai E.S., Liu J., Wang C. Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data. PLoS Genet. 2017;13 doi: 10.1371/journal.pgen.1007021. [DOI] [PMC free article] [PubMed] [Google Scholar]
Edahiro R., Shirai Y., Takeshima Y., Sakakibara S., Yamaguchi Y., Murakami T., Morita T., Kato Y., Liu Y.-C., Motooka D., et al. Single-cell analyses and host genetics highlight the role of innate immune cells in COVID-19 severity. Nat. Genet. 2023;55:753–767. doi: 10.1038/s41588-023-01375-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ewels P., Magnusson M., Lundin S., Kaller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fort A., Panousis N.I., Garieri M., Antonarakis S.E., Lappalainen T., Dermitzakis E.T., Delaneau O. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics. 2017;33:1895–1897. doi: 10.1093/bioinformatics/btx074. [DOI] [PMC free article] [PubMed] [Google Scholar]
Garrison E., Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint. 2012 arXiv:1207.3907 [q-bio.GN].
Gasperini M., Hill A.J., McFaline-Figueroa J.L., Martin B., Kim S., Zhang M.D., Jackson D., Leith A., Schreiber J., Noble W.S., et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell. 2019;176:377–390.e19. doi: 10.1016/j.cell.2018.11.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
1000 Genomes Project Consortium, Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
Giambartolomei C., Vukcevic D., Schadt E.E., Franke L., Hingorani A.D., Wallace C., Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10 doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
Graubert A., Aguet F., Ravi A., Ardlie K.G., Getz G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics. 2021;37:3048–3050. doi: 10.1093/bioinformatics/btab135. [DOI] [PMC free article] [PubMed] [Google Scholar]
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776. [DOI] [PMC free article] [PubMed] [Google Scholar]
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hansen K.D., Irizarry R.A., Wu Z. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics. 2012;13:204–216. doi: 10.1093/biostatistics/kxr054. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hoffman G.E. Correcting for population structure and kinship using the linear mixed model: theory and extensions. PLoS One. 2013;8 doi: 10.1371/journal.pone.0075707. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hong E.P., Park J.W. Sample size and statistical power calculation in genetic association studies. Genom. Inform. 2012;10:117–122. doi: 10.5808/GI.2012.10.2.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hormozdiari F., van de Bunt M., Segrè A.V., Li X., Joo J.W.J., Bilow M., Sul J.H., Sankararaman S., Pasaniuc B., Eskin E. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 2016;99:1245–1260. doi: 10.1016/j.ajhg.2016.10.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hormozdiari F., Kostem E., Kang E.Y., Pasaniuc B., Eskin E. Identifying causal variants at loci with multiple signals of association. Genetics. 2014;198:497–508. doi: 10.1534/genetics.114.167908. [DOI] [PMC free article] [PubMed] [Google Scholar]
Huang Q.Q., Ritchie S.C., Brozynska M., Inouye M. Power, false discovery rate and Winner's Curse in eQTL studies. Nucleic Acids Res. 2018;46 doi: 10.1093/nar/gky780. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hu Y., Xi X., Yang Q., Zhang X. SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data. BMC Bioinform. 2020;21:184. doi: 10.1186/s12859-020-3534-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kel I., Chang Z., Galluccio N., Romeo M., Beretta S., Diomede L., Mezzelani A., Milanesi L., Dieterich C., Merelli I. SPIRE, a modular pipeline for eQTL analysis of RNA-Seq data, reveals a regulatory hotspot controlling miRNA expression in C. elegans. Mol. Biosyst. 2016;12:3447–3458. doi: 10.1039/c6mb00453a. [DOI] [PubMed] [Google Scholar]
Kerimov N., Hayhurst J.D., Peikova K., Manning J.R., Walter P., Kolberg L., Samovica M., Sakthivel M.P., Kuzmin I., Trevanion S.J., et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 2021;53:1290–1299. doi: 10.1038/s41588-021-00924-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kerimov N., Tambets R., Hayhurst J.D., Rahu I., Kolberg P., Raudvere U., Kuzmin I., Chowdhary A., Vija A., Teras H.J., et al. eQTL Catalogue 2023: new datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. PLoS Genet. 2023;19 doi: 10.1371/journal.pgen.1010932. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim S., Scheffler K., Halpern A.L., Bekritsky M.A., Noh E., Kallberg M., Chen X., Kim Y., Beyter D., Krusche P., et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods. 2018;15:591–594. doi: 10.1038/s41592-018-0051-x. [DOI] [PubMed] [Google Scholar]
Lee C. Genome-wide expression quantitative trait loci analysis using mixed models. Front. Genet. 2018;9:341. doi: 10.3389/fgene.2018.00341. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee G.Y., Ham S., Lee S.V. Brief guide to RNA sequencing analysis for nonexperts in bioinformatics. Mol. Cells. 2024;47 doi: 10.1016/j.mocell.2024.100060. [DOI] [PMC free article] [PubMed] [Google Scholar]
Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu H., Prashant N.M., Spurr L.F., Bousounis P., Alomran N., Ibeawuchi H., Sein J., Slowinski P., Tsaneva-Atanasova K., Horvath A. scReQTL: an approach to correlate SNVs to gene expression from individual scRNA-seq datasets. BMC Genom. 2021;22:40. doi: 10.1186/s12864-020-07334-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu X., Li Y.I., Pritchard J.K. Trans effects on gene expression can drive omnigenic inheritance. Cell. 2019;177:1022–1034.e6. doi: 10.1016/j.cell.2019.04.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lu A., Thompson M., Gordon M.G., Dahl A., Ye C.J., Zaitlen N., Balliu B. Fast and powerful statistical method for context-specific QTL mapping in multi-context genomic studies. bioRxiv. 2021 doi: 10.1101/2021.06.17.448889.
Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559. [DOI] [PMC free article] [PubMed] [Google Scholar]
Marees A.T., de Kluiver H., Stringer S., Vorspan F., Curis E., Marie-Claire C., Derks E.M. A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int. J. Methods Psychiatr. Res. 2018;27 doi: 10.1002/mpr.1608. [DOI] [PMC free article] [PubMed] [Google Scholar]
McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mostafavi S., Battle A., Zhu X., Urban A.E., Levinson D., Montgomery S.B., Koller D. Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge. PLoS One. 2013;8 doi: 10.1371/journal.pone.0068141. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nyerki E., Kalmar T., Schutz O., Lima R.M., Neparaczki E., Torok T., Maroti Z. correctKin: an optimized method to infer relatedness up to the 4th degree from low-coverage ancient human genomes. Genome Biol. 2023;24:38. doi: 10.1186/s13059-023-02882-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ongen H., Buil A., Brown A.A., Dermitzakis E.T., Delaneau O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics. 2016;32:1479–1485. doi: 10.1093/bioinformatics/btv722. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pala M., Zappala Z., Marongiu M., Li X., Davis J.R., Cusano R., Crobu F., Kukurba K.R., Gloudemans M.J., Reinier F., et al. Population- and individual-specific regulatory variation in Sardinia. Nat. Genet. 2017;49:700–707. doi: 10.1038/ng.3840. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pierce B.L., Tong L., Chen L.S., Rahaman R., Argos M., Jasmine F., Roy S., Paul-Brutus R., Westra H.J., Franke L., et al. Mediation analysis demonstrates that trans-eQTLs are often explained by cis-mediation: a genome-wide analysis among 1,800 South Asians. PLoS Genet. 2014;10 doi: 10.1371/journal.pgen.1004818. [DOI] [PMC free article] [PubMed] [Google Scholar]
Poplin R., Chang P.C., Alexander D., Schwartz S., Colthurst T., Ku A., Newburger D., Dijamco J., Nguyen N., Afshar P.T., et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 2018;36:983–987. doi: 10.1038/nbt.4235. [DOI] [PubMed] [Google Scholar]
Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
Risso D., Schwartz K., Sherlock G., Dudoit S. GC-content normalization for RNA-Seq data. BMC Bioinform. 2011;12:480. doi: 10.1186/1471-2105-12-480. [DOI] [PMC free article] [PubMed] [Google Scholar]
Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616. [DOI] [PMC free article] [PubMed] [Google Scholar]
Robinson M.D., Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. doi: 10.1186/gb-2010-11-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rohlfs R.V., Weir B.S. Distributions of Hardy-Weinberg equilibrium test statistics. Genetics. 2008;180:1609–1616. doi: 10.1534/genetics.108.088005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sanchis P., Lavignolle R., Abbate M., Lage-Vickers S., Vazquez E., Cotignola J., Bizzotto J., Gueron G. Analysis workflow of publicly available RNA-sequencing datasets. STAR Protoc. 2021;2 doi: 10.1016/j.xpro.2021.100478. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/bioinformatics/bts163. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sieberts S.K., Perumal T.M., Carrasquillo M.M., Allen M., Reddy J.S., Hoffman G.E., Dang K.K., Calley J., Ebert P.J., Eddy J., et al. Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. Sci. Data. 2020;7:340. doi: 10.1038/s41597-020-00642-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Speed D., Balding D.J. Relatedness in the post-genomic era: is it still useful? Nat. Rev. Genet. 2015;16:33–44. doi: 10.1038/nrg3821. [DOI] [PubMed] [Google Scholar]
Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 2012;7:500–507. doi: 10.1038/nprot.2011.457. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tarazona S., Furio-Tari P., Turra D., Pietro A.D., Nueda M.J., Ferrer A., Conesa A. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res. 2015;43 doi: 10.1093/nar/gkv711. [DOI] [PMC free article] [PubMed] [Google Scholar]
Taylor-Weiner A., Aguet F., Haradhvala N.J., Gosai S., Anand S., Kim J., Ardlie K., Van Allen E.M., Getz G. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 2019;20:228. doi: 10.1186/s13059-019-1836-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tewhey R., Kotliar D., Park D.S., Liu B., Winnicki S., Reilly S.K., Andersen K.G., Mikkelsen T.S., Lander E.S., Schaffner S.F., et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell. 2016;165:1519–1529. doi: 10.1016/j.cell.2016.04.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thompson E.A. The estimation of pairwise relationships. Ann. Hum. Genet. 1975;39:173–188. doi: 10.1111/j.1469-1809.1975.tb00120.x. [DOI] [PubMed] [Google Scholar]
Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang L., Babushkin N., Liu Z., Liu X. Trans-eQTL mapping in gene sets identifies network effects of genetic variants. Cell Genom. 2024;4 doi: 10.1016/j.xgen.2024.100538. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang X., Glubb D.M., O'Mara T.A. 10 years of GWAS discovery in endometrial cancer: aetiology, function and translation. EBioMedicine. 2022;77 doi: 10.1016/j.ebiom.2022.103895. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vosa U., Claringbould A., Westra H.J., Bonder M.J., Deelen P., Zeng B., Kirsten H., Saha A., Kreuzhuber R., Yazar S., et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 2021;53:1300–1310. doi: 10.1038/s41588-021-00913-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang G., Sarkar A., Carbonetto P., Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 2020;82:1273–1300. doi: 10.1111/rssb.12388. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang T., Liu Y., Ruan J., Dong X., Wang Y., Peng J. A pipeline for RNA-seq based eQTL analysis with automated quality control procedures. BMC Bioinform. 2021;22:403. doi: 10.1186/s12859-021-04307-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang L., Wang S., Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28:2184–2185. doi: 10.1093/bioinformatics/bts356. [DOI] [PubMed] [Google Scholar]
Wen X., Lee Y., Luca F., Pique-Regi R. Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am. J. Hum. Genet. 2016;98:1114–1129. doi: 10.1016/j.ajhg.2016.03.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
Werling D.M., Pochareddy S., Choi J., An J.Y., Sheppard B., Peng M., Li Z., Dastmalchi C., Santpere G., Sousa A.M.M., et al. Whole-genome and RNA sequencing reveal variation and transcriptomic coordination in the developing human prefrontal cortex. Cell Rep. 2020;31 doi: 10.1016/j.celrep.2020.03.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wigginton J.E., Cutler D.J., Abecasis G.R. A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:887–893. doi: 10.1086/429864. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yang J., Zaitlen N.A., Goddard M.E., Visscher P.M., Price A.L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014;46:100–106. doi: 10.1038/ng.2876. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yoo T., Joo S.K., Kim H.J., Kim H.Y., Sim H., Lee J., Kim H.H., Jung S., Lee Y., Jamialahmadi O., et al. Disease-specific eQTL screening reveals an anti-fibrotic effect of AGXT2 in non-alcoholic fatty liver disease. J. Hepatol. 2021;75:514–523. doi: 10.1016/j.jhep.2021.04.011. [DOI] [PubMed] [Google Scholar]
Zhang F., Flickinger M., Taliun S.A.G., InPSYght Psychiatric Genetics Consortium, Abecasis G.R., Scott L.J., McCaroll S.A., Pato C.N., Boehnke M., Kang H.M. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res. 2020;30:185–194. doi: 10.1101/gr.246934.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhao S., Jing W., Samuels D.C., Sheng Q., Shyr Y., Guo Y. Strategies for processing and quality control of Illumina genotyping arrays. Brief Bioinform. 2018;19:765–775. doi: 10.1093/bib/bbx012. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhou Y., Browning S.R., Browning B.L. IBDkin: fast estimation of kinship coefficients from identity by descent segments. Bioinformatics. 2020;36:4519–4520. doi: 10.1093/bioinformatics/btaa569. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhou H.J., Li L., Li Y., Li W., Li J.J. PCA outperforms popular hidden variable inference methods for molecular QTL mapping. Genome Biol. 2022;23:210. doi: 10.1186/s13059-022-02761-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhou X., Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012;44:821–824. doi: 10.1038/ng.2310. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Supplementary material

mmc1.xlsx (14.8 KB, xlsx)

[bib1] Abbas-Aghababazadeh F., Li Q., Fridley B.L. Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing. PLoS One. 2018;13 doi: 10.1371/journal.pone.0206312. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Aguet F., Alasoo K., Li Y.I., Battle A., Im H.K., Montgomery S.B., Lappalainen T. Molecular quantitative trait loci. Nat. Rev. Methods Primers. 2023;3:4. doi: 10.1038/s43586-022-00188-6. [DOI] [Google Scholar]

[bib3] Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. 〈https://www.bioinformatics.babraham.ac.uk/projects/fastqc/〉.

[bib4] Benner C., Spencer C.C., Havulinna A.S., Salomaa V., Ripatti S., Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32:1493–1501. doi: 10.1093/bioinformatics/btw018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Bolstad B.M., Irizarry R.A., Astrand M., Speed T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185. [DOI] [PubMed] [Google Scholar]

[bib6] Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E., et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. doi: 10.1093/nar/gky1120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Cano-Gamez E., Trynka G. From GWAS to function: using functional genomics to identify the mechanisms underlying complex diseases. Front. Genet. 2020;11:424. doi: 10.3389/fgene.2020.00424. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Casale F.P., Rakitsch B., Lippert C., Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods. 2015;12:755–758. doi: 10.1038/nmeth.3439. [DOI] [PubMed] [Google Scholar]

[bib10] Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Chen Y.J., Roumeliotis T.I., Chang Y.H., Chen C.T., Han C.L., Lin M.H., Chen H.W., Chang G.C., Chang Y.L., Wu C.T., et al. Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression. Cell. 2020;182:226–244. doi: 10.1016/j.cell.2020.06.012. e217. [DOI] [PubMed] [Google Scholar]

[bib12] Cloonan N., Forrest A.R., Kolle G., Gardiner B.B., Faulkner G.J., Brown M.K., Taylor D.F., Steptoe A.L., Wani S., Bethel G., et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods. 2008;5:613–619. doi: 10.1038/nmeth.1223. [DOI] [PubMed] [Google Scholar]

[bib13] Corbin Q., Li G., Zilin L., Xihao L., Rounak D., Yaowu L., Laura S. and Xihong L. A versatile toolkit for molecular QTL mapping and meta-analysis at scale. bioRxiv 2020 〈10.1101/2020.12.18.4234902020.2012.2018.423490〉.

[bib14] Cuomo A.S.E., Heinen T., Vagiaki D., Horta D., Marioni J.C., Stegle O. CellRegMap: a statistical framework for mapping context-specific regulatory variants using scRNA-seq. Mol. Syst. Biol. 2022;18 doi: 10.15252/msb.202110663. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156 2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] de Klein N., Tsai E.A., Vochteloo M., Baird D., Huang Y., Chen C.Y., van Dam S., Oelen R., Deelen P., Bakker O.B., et al. Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases. Nat. Genet. 2023;55:377–388. doi: 10.1038/s41588-023-01300-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] Delaneau O., Ongen H., Brown A.A., Fort A., Panousis N.I., Dermitzakis E.T. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 2017;8:15452. doi: 10.1038/ncomms15452. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Dillies M.A., Rau A., Aubert J., Hennequet-Antier C., Jeanmougin M., Servant N., Keime C., Marot G., Castel D., Estelle J., et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013;14:671–683. doi: 10.1093/bib/bbs046. [DOI] [PubMed] [Google Scholar]

[bib19] Dou J., Sun B., Sim X., Hughes J.D., Reilly D.F., Tai E.S., Liu J., Wang C. Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data. PLoS Genet. 13. 2017 doi: 10.1371/journal.pgen.1007021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Edahiro R., Shirai Y., Takeshima Y., Sakakibara S., Yamaguchi Y., Murakami T., Morita T., Kato Y., Liu Y.-C., Motooka D., et al. Single-cell analyses and host genetics highlight the role of innate immune cells in COVID-19 severity. Nat. Genet. 2023;55:753–767. doi: 10.1038/s41588-023-01375-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57 74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] Ewels P., Magnusson M., Lundin S., Kaller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047 3048. doi: 10.1093/bioinformatics/btw354. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Fort A., Panousis N.I., Garieri M., Antonarakis S.E., Lappalainen T., Dermitzakis E.T., Delaneau O. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics. 2017;33:1895–1897. doi: 10.1093/bioinformatics/btx074. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] E. Garrison G. Marth. Haplotype-based variant detection from short-read sequencing, arXiv preprint 2012 arXiv:1207.3907 [q-bio.GN]Erik.

[bib25] Gasperini M., Hill A.J., McFaline-Figueroa J.L., Martin B., Kim S., Zhang M.D., Jackson D., Leith A., Schreiber J., Noble W.S., et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell. 2019;176:377–390. doi: 10.1016/j.cell.2018.11.029. e319. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Genomes Project C., Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., et al. A global reference for human genetic variation. Nature. 2015;526:68 74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Giambartolomei C., Vukcevic D., Schadt E.E., Franke L., Hingorani A.D., Wallace C., Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10 doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Graubert A., Aguet F., Ravi A., Ardlie K.G., Getz G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics. 2021;37:3048–3050. doi: 10.1093/bioinformatics/btab135. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318 1330. doi: 10.1126/science.aaz1776. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] GTEx Consortium, Laboratory D. A., Coordinating Center -Analysis Working G. Statistical Methods groups-Analysis Working G., Enhancing G. g., Fund N. I. H. C., Nih/Nci, Nih/Nhgri, Nih/Nimh, Nih/Nida et al. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Hansen K.D., Irizarry R.A., Wu Z. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics. 2012;13:204–216. doi: 10.1093/biostatistics/kxr054. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Hoffman G.E. Correcting for population structure and kinship using the linear mixed model: theory and extensions. PLoS One. 2013;8 doi: 10.1371/journal.pone.0075707. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Hong E.P., Park J.W. Sample size and statistical power calculation in genetic association studies. Genom. Inform. 2012;10:117–122. doi: 10.5808/GI.2012.10.2.117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Hormozdiari F., van de Bunt M., Segrè A.V., Li X., Joo J.W.J., Bilow M., Sul J.H., Sankararaman S., Pasaniuc B., Eskin E. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 2016;99:1245–1260. doi: 10.1016/j.ajhg.2016.10.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] Hormozdiari F., Kostem E., Kang E.Y., Pasaniuc B., Eskin E. Identifying causal variants at loci with multiple signals of association. Genetics. 2014;198:497–508. doi: 10.1534/genetics.114.167908. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] Huang Q.Q., Ritchie S.C., Brozynska M., Inouye M. Power, false discovery rate and Winner's Curse in eQTL studies. Nucleic Acids Res. 2018;46 doi: 10.1093/nar/gky780. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Hu Y., Xi X., Yang Q., Zhang X. SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data. BMC Bioinform. 2020;21:184. doi: 10.1186/s12859-020-3534-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] Kel I., Chang Z., Galluccio N., Romeo M., Beretta S., Diomede L., Mezzelani A., Milanesi L., Dieterich C. and Merelli I. SPIRE, a modular pipeline for eQTL analysis of RNA-Seq data, reveals a regulatory hotspot controlling miRNA expression in C. elegans. Mol. Biosyst. 2016;12:3447–3458. doi: 10.1039/c6mb00453a. [DOI] [PubMed] [Google Scholar]

[bib39] Kerimov N., Hayhurst J.D., Peikova K., Manning J.R., Walter P., Kolberg L., Samovica M., Sakthivel M.P., Kuzmin I., Trevanion S.J., et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 2021;53:1290–1299. doi: 10.1038/s41588-021-00924-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] Kerimov N., Tambets R., Hayhurst J.D., Rahu I., Kolberg P., Raudvere U., Kuzmin I., Chowdhary A., Vija A., Teras H.J., et al. eQTL Catalogue 2023: new datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. PLoS Genet. 2023;19 doi: 10.1371/journal.pgen.1010932. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] Kim S., Scheffler K., Halpern A.L., Bekritsky M.A., Noh E., Kallberg M., Chen X., Kim Y., Beyter D., Krusche P., et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods. 2018;15:591–594. doi: 10.1038/s41592-018-0051-x. [DOI] [PubMed] [Google Scholar]

[bib42] Lee C. Genome-wide expression quantitative trait loci analysis using mixed models. Front. Genet. 2018;9:341. doi: 10.3389/fgene.2018.00341. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Lee G.Y., Ham S., Lee S.V. Brief guide to RNA sequencing analysis for nonexperts in bioinformatics. Mol. Cells. 2024;47. doi: 10.1016/j.mocell.2024.100060.

[bib44] Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161.

[bib45] Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.

[bib46] Liu H., Prashant N.M., Spurr L.F., Bousounis P., Alomran N., Ibeawuchi H., Sein J., Slowinski P., Tsaneva-Atanasova K., Horvath A. scReQTL: an approach to correlate SNVs to gene expression from individual scRNA-seq datasets. BMC Genom. 2021;22:40. doi: 10.1186/s12864-020-07334-y.

[bib47] Liu X., Li Y.I., Pritchard J.K. Trans effects on gene expression can drive omnigenic inheritance. Cell. 2019;177:1022–1034. doi: 10.1016/j.cell.2019.04.014.

[bib48] Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.

[bib49] Lu A., Thompson M., Grace Gordon M., Dahl A., Ye C.J., Zaitlen N., Balliu B. Fast and powerful statistical method for context-specific QTL mapping in multi-context genomic studies. bioRxiv. 2021. doi: 10.1101/2021.06.17.448889.

[bib50] Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559.

[bib51] Marees A.T., de Kluiver H., Stringer S., Vorspan F., Curis E., Marie-Claire C., Derks E.M. A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int. J. Methods Psychiatr. Res. 2018;27. doi: 10.1002/mpr.1608.

[bib52] McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.

[bib53] Mostafavi S., Battle A., Zhu X., Urban A.E., Levinson D., Montgomery S.B., Koller D. Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge. PLoS One. 2013;8. doi: 10.1371/journal.pone.0068141.

[bib54] Nyerki E., Kalmar T., Schutz O., Lima R.M., Neparaczki E., Torok T., Maroti Z. correctKin: an optimized method to infer relatedness up to the 4th degree from low-coverage ancient human genomes. Genome Biol. 2023;24:38. doi: 10.1186/s13059-023-02882-4.

[bib55] Ongen H., Buil A., Brown A.A., Dermitzakis E.T., Delaneau O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics. 2016;32:1479–1485. doi: 10.1093/bioinformatics/btv722.

[bib56] Pala M., Zappala Z., Marongiu M., Li X., Davis J.R., Cusano R., Crobu F., Kukurba K.R., Gloudemans M.J., Reinier F., et al. Population- and individual-specific regulatory variation in Sardinia. Nat. Genet. 2017;49:700–707. doi: 10.1038/ng.3840.

[bib57] Pierce B.L., Tong L., Chen L.S., Rahaman R., Argos M., Jasmine F., Roy S., Paul-Brutus R., Westra H.J., Franke L., et al. Mediation analysis demonstrates that trans-eQTLs are often explained by cis-mediation: a genome-wide analysis among 1,800 South Asians. PLoS Genet. 2014;10. doi: 10.1371/journal.pgen.1004818.

[bib58] Poplin R., Chang P.C., Alexander D., Schwartz S., Colthurst T., Ku A., Newburger D., Dijamco J., Nguyen N., Afshar P.T., et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 2018;36:983–987. doi: 10.1038/nbt.4235.

[bib59] Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.

[bib60] Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.

[bib61] Risso D., Schwartz K., Sherlock G., Dudoit S. GC-content normalization for RNA-Seq data. BMC Bioinform. 2011;12:480. doi: 10.1186/1471-2105-12-480.

[bib62] Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.

[bib63] Robinson M.D., Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. doi: 10.1186/gb-2010-11-3-r25.

[bib64] Rohlfs R.V., Weir B.S. Distributions of Hardy-Weinberg equilibrium test statistics. Genetics. 2008;180:1609–1616. doi: 10.1534/genetics.108.088005.

[bib65] Sanchis P., Lavignolle R., Abbate M., Lage-Vickers S., Vazquez E., Cotignola J., Bizzotto J., Gueron G. Analysis workflow of publicly available RNA-sequencing datasets. STAR Protoc. 2021;2. doi: 10.1016/j.xpro.2021.100478.

[bib66] Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/bioinformatics/bts163.

[bib67] Sieberts S.K., Perumal T.M., Carrasquillo M.M., Allen M., Reddy J.S., Hoffman G.E., Dang K.K., Calley J., Ebert P.J., Eddy J., et al. Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. Sci. Data. 2020;7:340. doi: 10.1038/s41597-020-00642-8.

[bib68] Speed D., Balding D.J. Relatedness in the post-genomic era: is it still useful? Nat. Rev. Genet. 2015;16:33–44. doi: 10.1038/nrg3821.

[bib69] Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 2012;7:500–507. doi: 10.1038/nprot.2011.457.

[bib70] Tarazona S., Furio-Tari P., Turra D., Pietro A.D., Nueda M.J., Ferrer A., Conesa A. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res. 2015;43. doi: 10.1093/nar/gkv711.

[bib71] Taylor-Weiner A., Aguet F., Haradhvala N.J., Gosai S., Anand S., Kim J., Ardlie K., Van Allen E.M., Getz G. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 2019;20:228. doi: 10.1186/s13059-019-1836-7.

[bib72] Tewhey R., Kotliar D., Park D.S., Liu B., Winnicki S., Reilly S.K., Andersen K.G., Mikkelsen T.S., Lander E.S., Schaffner S.F., et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell. 2016;165:1519–1529. doi: 10.1016/j.cell.2016.04.027.

[bib73] Thompson E.A. The estimation of pairwise relationships. Ann. Hum. Genet. 1975;39:173–188. doi: 10.1111/j.1469-1809.1975.tb00120.x.

[bib74] Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.

[bib75] Wang L., Babushkin N., Liu Z., Liu X. Trans-eQTL mapping in gene sets identifies network effects of genetic variants. Cell Genom. 2024;4. doi: 10.1016/j.xgen.2024.100538.

[bib76] Wang X., Glubb D.M., O'Mara T.A. 10 years of GWAS discovery in endometrial cancer: aetiology, function and translation. EBioMedicine. 2022;77. doi: 10.1016/j.ebiom.2022.103895.

[bib77] Vosa U., Claringbould A., Westra H.J., Bonder M.J., Deelen P., Zeng B., Kirsten H., Saha A., Kreuzhuber R., Yazar S., et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 2021;53:1300–1310. doi: 10.1038/s41588-021-00913-z.

[bib78] Wang G., Sarkar A., Carbonetto P., Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 2020;82:1273–1300. doi: 10.1111/rssb.12388.

[bib79] Wang T., Liu Y., Ruan J., Dong X., Wang Y., Peng J. A pipeline for RNA-seq based eQTL analysis with automated quality control procedures. BMC Bioinform. 2021;22:403. doi: 10.1186/s12859-021-04307-0.

[bib80] Wang L., Wang S., Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28:2184–2185. doi: 10.1093/bioinformatics/bts356.

[bib81] Wen X., Lee Y., Luca F., Pique-Regi R. Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am. J. Hum. Genet. 2016;98:1114–1129. doi: 10.1016/j.ajhg.2016.03.029.

[bib82] Werling D.M., Pochareddy S., Choi J., An J.Y., Sheppard B., Peng M., Li Z., Dastmalchi C., Santpere G., Sousa A.M.M., et al. Whole-genome and RNA sequencing reveal variation and transcriptomic coordination in the developing human prefrontal cortex. Cell Rep. 2020;31. doi: 10.1016/j.celrep.2020.03.053.

[bib83] Wigginton J.E., Cutler D.J., Abecasis G.R. A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:887–893. doi: 10.1086/429864.

[bib84] Yang J., Zaitlen N.A., Goddard M.E., Visscher P.M., Price A.L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014;46:100–106. doi: 10.1038/ng.2876.

[bib85] Yoo T., Joo S.K., Kim H.J., Kim H.Y., Sim H., Lee J., Kim H.H., Jung S., Lee Y., Jamialahmadi O., et al. Disease-specific eQTL screening reveals an anti-fibrotic effect of AGXT2 in non-alcoholic fatty liver disease. J. Hepatol. 2021;75:514–523. doi: 10.1016/j.jhep.2021.04.011.

[bib86] Zhang F., Flickinger M., Taliun S.A.G., InPSYght Psychiatric Genetics Consortium, Abecasis G.R., Scott L.J., McCaroll S.A., Pato C.N., Boehnke M., Kang H.M. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res. 2020;30:185–194. doi: 10.1101/gr.246934.118.

[bib87] Zhao S., Jing W., Samuels D.C., Sheng Q., Shyr Y., Guo Y. Strategies for processing and quality control of Illumina genotyping arrays. Brief Bioinform. 2018;19:765–775. doi: 10.1093/bib/bbx012.

[bib88] Zhou Y., Browning S.R., Browning B.L. IBDkin: fast estimation of kinship coefficients from identity by descent segments. Bioinformatics. 2020;36:4519–4520. doi: 10.1093/bioinformatics/btaa569.

[bib89] Zhou H.J., Li L., Li Y., Li W., Li J.J. PCA outperforms popular hidden variable inference methods for molecular QTL mapping. Genome Biol. 2022;23:210. doi: 10.1186/s13059-022-02761-4.

[bib90] Zhou X., Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012;44:821–824. doi: 10.1038/ng.2310.
