Abstract
Background & Aims
The association of genetic variation with tissue-specific gene expression and alternative splicing guides functional characterization of complex trait-associated loci and may suggest novel genes implicated in disease. Here, our aims were as follows: (1) to generate reference profiles of colon mucosa gene expression and alternative splicing and compare them across colon subsites (ascending, transverse, and descending), (2) to identify expression and splicing quantitative trait loci (QTLs), (3) to find traits for which identified QTLs contribute to single-nucleotide polymorphism (SNP)-based heritability, (4) to propose candidate effector genes, and (5) to provide a web-based visualization resource.
Methods
We collected colonic mucosal biopsy specimens from 485 healthy adults and performed bulk RNA sequencing. We performed genome-wide SNP genotyping from blood leukocytes. Statistical approaches and bioinformatics software were used for QTL identification and downstream analyses.
Results
We provided a complete quantification of gene expression and alternative splicing across colon subsites and described their differences. We identified thousands of expression and splicing QTLs and defined their enrichment at genome-wide regulatory regions. We found that part of the SNP-based heritability of diseases affecting colon tissue, such as colorectal cancer and inflammatory bowel disease, but also of diseases affecting other tissues, such as psychiatric conditions, can be explained by the identified QTLs. We provided candidate effector genes for multiple phenotypes. Finally, we provided the Colon Transcriptome Explorer web application.
Conclusions
We provide a large-scale characterization of gene expression and splicing across colon subsites. Our findings provide greater etiologic insight into complex traits and diseases influenced by transcriptomic changes in colon tissue.
Keywords: Gene Expression, Alternative Splicing, QTLs, Colon
Abbreviations used in this paper: AS, alternative splicing; BarcUVa-Seq, University of Barcelona and University of Virginia genotyping and RNA sequencing project; CoTrEx, colon transcriptome explorer; CRC, colorectal cancer; eGene, eQTL gene; eQTL, expression quantitative trait locus; eSNP, eQTL SNP; FDR, false-discovery rate; FWER, family-wise error rate; GTEx, Genotype-Tissue Expression project; GWAS, genome-wide association study; LD, linkage disequilibrium; MAF, minor allele frequency; PSI, percent spliced-in; RBP, RNA-binding protein; RNA-Seq, RNA sequencing; sGene, sQTL gene; SNP, single-nucleotide polymorphism; sSNP, sQTL SNP; sQTL, splicing quantitative trait locus; TSS, transcription start site
Summary.
We profiled gene expression and alternative splicing of non-neoplastic colon from biopsy specimens from 445 healthy individuals. We showed that single-nucleotide polymorphisms associated with these profiles are enriched in disease-associated loci, including colorectal cancer and inflammatory bowel disease.
Transcriptome-wide gene expression profiles of normal colon tissue have been assessed in population-based studies, using data sets with a range of different characteristics, including variable colon anatomic subsites, collection methods, sample sizes, sequencing technologies, and data processing methods.1, 2, 3, 4, 5, 6, 7, 8 A large public transcriptome data set for non-neoplastic colon tissue from the Genotype-Tissue Expression (GTEx) project included samples collected from the transverse and sigmoid colon of post-mortem subjects and included both mucosa and muscularis propria.8 In most studies, the transcriptome has been assessed only in terms of gene expression; a comprehensive characterization of alternative splicing (AS) has not been performed in normal colon epithelial tissue derived from living individuals.
AS is a post-transcriptional regulatory mechanism by which multiple messenger RNA transcripts are produced from a single locus, expanding the functional repertoire of cells.9 More than 90% of human genes have the potential to undergo AS.10 Common AS patterns include exon skipping, alternative 5’ and 3’ splice sites, mutually exclusive exons, intron retention, and alternative first or last exons.11 Based on these predefined patterns and transcript expression levels, different AS events and their relative abundances can be identified for a given gene.12 In addition, by measuring alternative excision of introns, novel and more complex alternative splicing events can be identified.13 AS has been assessed in multiple tissue types across several large cohorts, including healthy8 and pathologic tissues,14, 15, 16 allowing the association of particular AS events with phenotypes such as age17 and cancer type.14, 15, 16 In colon tissue, AS events have been measured in adenocarcinomas and paired adjacent normal tissue and have been associated with colorectal cancer (CRC) anatomic location18 and prognosis.18, 19, 20
Single-nucleotide polymorphisms (SNPs) have been associated with gene expression (ie, expression quantitative trait loci [eQTLs]) and AS (sQTLs), and increasingly are identified in studies of both normal8,21, 22, 23, 24, 25 and malignant tissues.26 Such associations can indicate the functional effects of SNPs at genetic risk loci, help prioritize SNPs and genes for functional assays, serve as prognostic biomarkers, and suggest disease mechanisms.10,26,27 In the case of normal colon tissue, eQTL data sets have been generated,1, 2, 3, 4, 5, 6, 7, 8 but there is no information about sQTLs derived from living individuals.
In this study, we analyzed a novel RNA sequencing (RNA-Seq) data set of normal colon tissue biopsy specimens including colon anatomic subsites not investigated previously (ascending, transverse, and descending). Our data set is representative of the transcriptome of colon epithelial cells of living subjects because all biopsy specimens were collected from mucosa at colonoscopy. This characteristic makes it well suited for investigating normal physiology across the colon, and it is relevant not only for studying the etiologic aspects of diseases affecting this tissue, such as CRC, but also for diseases affecting other tissues, such as those that involve epithelial–neuronal communication28 and those affected by perturbations of intestinal permeability.29
The aims of this study were as follows: (1) to provide a reference transcriptomic data set for normal colon epithelium by profiling gene expression and AS, (2) to identify SNPs associated with variation in gene expression and AS (ie, QTLs), (3) to list traits for which identified QTLs contribute to SNP-based heritability, (4) to prioritize candidate effector genes, and (5) to provide a web-based resource to visualize the expression profiles and QTLs.
Results
The University of Barcelona and University of Virginia genotyping and RNA Sequencing Project: A Novel Reference Data Set for Colon Tissue Transcriptome Analysis
The University of Barcelona and University of Virginia genotyping and RNA sequencing project (BarcUVa-Seq) cross-sectional study included 485 adult volunteers found to have an endoscopically healthy colon (ie, a normal colon without polyps or other lesions) from whom we collected superficial colon biopsy specimens and blood samples. Bulk RNA was isolated from biopsy samples and sequenced in several batches. Subjects were genotyped using the Illumina (San Diego, CA) OncoArray 500K beadchip,30 and genome-wide SNPs were imputed. After filtering the data to select for individuals with high-quality RNA-Seq and genotype samples (see the Materials and Methods section), we included data from 445 individuals, among whom 283 were female (64%). Biopsy specimens were obtained from sites along the ascending (n = 138; 31%), transverse (n = 143; 32%), and descending (n = 164; 37%) colon (Table 1). We profiled gene expression and alternative splicing and identified cis-acting eQTLs and sQTLs (see the Materials and Methods section).
Table 1.
Descriptive Characteristics of the BarcUVa-Seq Data Set
| Total individuals, N | 445 |
|---|---|
| Sex, n (%) | |
| Female | 283 (63.6) |
| Male | 162 (36.4) |
| Age, y, mean ± SD | 60 ± 7.44 |
| Colon anatomic location overall and stratified by sex, n (%) | |
| Ascending (right) | 138 (31.0) |
| Female | 86 (62.3) |
| Male | 52 (37.7) |
| Transverse | 143 (32.1) |
| Female | 90 (62.9) |
| Male | 53 (37.1) |
| Descending (left) | 164 (36.9) |
| Female | 107 (65.2) |
| Male | 57 (34.8) |
Gene Expression and Alternative Splicing
Expression was analyzed based on GENCODE (EMBL-EBI, Hinxton, UK) release 19 annotations.31 After filtering out features with low or no expression, 21,281 genes and 104,769 transcripts remained (see the Materials and Methods section). Gene and transcript abundances of interest can be visualized online (see the Colon Transcriptome Explorer [CoTrEx] section). We considered 13,243 AS events in 6178 genes after applying filters (see AS events annotations in Supplementary Table 1). We categorized AS events as follows: alternative first exons (30%), exon skipping (24%), alternative 3’ splice-site (12%), alternative 5’ splice-site (12%), intron retention (10%), alternative last exons (10%), and mutually exclusive exons (1%) (Figure 1, Table 2). Most genes had AS events from 1 or 2 categories, and few had AS events from up to 6 categories. In addition, as a complementary AS metric, we computed the abundances of 269,586 alternatively excised introns grouped in 73,313 clusters. Of these introns, 23% were novel and 77% were annotated, the latter mapping to 15,912 genes. We filtered out introns with low expression or low complexity and retained 42,808 intron clusters annotated in 8953 genes for sQTL analysis (see the Materials and Methods section).
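The two AS metrics described above can be illustrated with a minimal sketch. It assumes a SUPPA2-style definition of event PSI (summed abundance of transcripts carrying the alternative form divided by the summed abundance of all transcripts defining the event) and LeafCutter-style intron excision ratios (each intron's split-read count divided by its cluster total); the text does not name the tools used, and the transcript, intron, and cluster identifiers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def event_psi(tpm: dict[str, float], inclusion: set[str], event_transcripts: set[str]) -> float | None:
    """PSI of one AS event: abundance of transcripts carrying the alternative form
    divided by the abundance of all transcripts that define the event."""
    total = sum(tpm.get(t, 0.0) for t in event_transcripts)
    if total == 0:
        return None  # undefined when none of the event's transcripts is expressed
    return sum(tpm.get(t, 0.0) for t in inclusion) / total


def intron_excision_ratios(counts: dict[str, int], cluster_of: dict[str, str]) -> dict[str, float]:
    """Split-read count of each intron divided by the total count of its cluster."""
    cluster_totals: dict[str, int] = defaultdict(int)
    for intron, n in counts.items():
        cluster_totals[cluster_of[intron]] += n
    return {
        intron: n / cluster_totals[cluster_of[intron]] if cluster_totals[cluster_of[intron]] else 0.0
        for intron, n in counts.items()
    }


# Hypothetical example: transcripts T1-T3 define an exon-skipping event; only T1 includes the exon.
tpm = {"T1": 6.0, "T2": 2.0, "T3": 2.0}
print(event_psi(tpm, {"T1"}, {"T1", "T2", "T3"}))
print(intron_excision_ratios({"i1": 30, "i2": 10, "i3": 5}, {"i1": "c1", "i2": "c1", "i3": "c2"}))
```

Both metrics are ratios within a gene or cluster, so they describe relative isoform usage independently of overall gene expression level.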
Figure 1.
Alternative splicing events. (A) Schematic of gene and alternatively spliced transcript structures in 7 AS categories: alternative first exons (AF), exon skipping (SE), alternative 3’ splice-site (A3), alternative 5’ splice-site (A5), intron retention (RI), alternative last exons (AL), and mutually exclusive exons (MX). Constitutive exons (ie, those retained in all processed transcripts after splicing) are shown in gray. Exons in red or gold are alternatively included in processed transcripts after splicing. Dashed lines indicate alternative splicing patterns for a gene. (B) Frequency of AS events and genes by AS category. One gene can be processed according to different AS categories.
Table 2.
Description of AS Events and Genes by AS Category
| Event category | Total AS events, n (%) | Total genes, n (%) | AS events associated with sSNPs, n (%) |
|---|---|---|---|
| SE | 3235 (24.43) | 2542 (41.20) | 316 (28.1) |
| AF | 4023 (30.38) | 2146 (34.78) | 253 (22.5) |
| A3 | 1627 (12.29) | 1378 (22.33) | 140 (12.4) |
| A5 | 1579 (11.92) | 1344 (21.78) | 148 (13.2) |
| RI | 1327 (10.02) | 1022 (16.56) | 126 (11.2) |
| AL | 1292 (9.76) | 785 (12.72) | 259 (11.5) |
| MX | 160 (1.21) | 148 (2.40) | 12 (1.1) |
| Overall | 13,243 (100.00) | 6170 (100.00) | 1125 (100.0) |
NOTE. A given gene can have AS events from up to 6 categories.
AF, alternative first exons; AL, alternative last exon; A3, alternative 3’ splice-site; A5, alternative 5’ splice-site; RI, intron retention; MX, mutually exclusive exons; SE, exon skipping.
Transcriptomic Profiles Differ Between Colon Subsites
We aimed to identify genes and splicing features that were expressed differentially across colon subsites, treating the transverse colon as an intermediate phenotype (see the Materials and Methods section). Overall, 4430 genes were expressed differentially between ascending, transverse, and descending subsites (family-wise error rate [FWER], ≤0.05), with absolute log fold changes of up to 3.7 (Figure 2A). Hierarchical clustering of the top 30 genes with the smallest FWER showed that the transverse colon clustered with the descending colon (Figure 2B). Full differential gene expression results are listed in Supplementary Table 2. Next, we tested whether genes expressed differentially across subsites were enriched in a wide array of curated gene sets, signatures, functional pathways, and ontologies. We found enrichment in a gene set associated with the transformation of normal colon tissue into adenoma, in pathways involved in drug metabolism, and in other biological processes such as the antimicrobial humoral response. Full enrichment results are listed in Supplementary Table 3. For splicing, we found 236 genes with different relative abundances of AS events (false-discovery rate [FDR], ≤0.05) (Supplementary Tables 4 and 5) and 280 genes with different relative abundances of excised introns between the ascending and descending colon (FDR, ≤0.05) (Supplementary Table 6).
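The paragraph above uses FWER control for gene expression and FDR control for splicing. A minimal sketch of two standard adjustments follows, assuming Bonferroni for FWER (the text does not state which FWER procedure was used; Holm is another common choice) and Benjamini–Hochberg for FDR. The P values are hypothetical.

```python
from __future__ import annotations


def bonferroni(pvals: list[float]) -> list[float]:
    """FWER control: multiply each P value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """FDR control: adjusted p_(k) = min over j >= k of p_(j) * m / j, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-gene P values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

With these inputs only the first gene passes a 0.05 threshold under Bonferroni, whereas BH is less conservative; this is why FWER thresholds are typically reserved for settings where even one false positive is costly.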
Figure 2.
Differential gene expression profiles across colon anatomic subsites. (A) Volcano plot showing the distribution of gene log fold changes and statistical significance. Points above the horizontal dashed line represent genes considered significantly differentially expressed (FWER ≤ 0.05). Points in red and blue represent genes that are overexpressed (red) or underexpressed (blue) following a consistent trend from ascending to descending colon (ie, for overexpressed genes, higher expression in transverse than in ascending colon and higher expression in descending than in transverse colon). (B) Heatmap showing the expression profiles of the top 30 differentially expressed genes across colon subsites ranked by FWER-adjusted P values. Hierarchical clustering shows the similarity between genes (rows) and samples (columns) based on Euclidean distances.
Identification of eQTLs and sQTLs
We identified 11,739 eQTLs (Q value ≤ 0.05) including 11,427 unique SNPs (eSNPs) associated with the expression of 11,739 genes (eGenes) (Supplementary Table 7). Most eSNPs were associated with a single eGene, but some eSNPs were associated with up to 6 eGenes. Neither the distance between the eSNP and the gene transcription start site (TSS) nor the eSNP allele frequency was associated with the eQTL effect size (Figure 3). eQTLs can be explored on the CoTrEx web application (see the Colon Transcriptome Explorer section). Full eQTL summary statistics are publicly available (see the Data availability statement). In addition, we performed an eQTL interaction analysis for colon subsites (ascending vs descending) and found 26 eQTLs with a Q value of 0.05 or less (Supplementary Table 8). For example, the eQTL rs6684275-RIMKLA showed opposite directions of effect in the ascending and descending colon (Figure 4).
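Figure 4 reports expression as inverse normal transformed values. As a minimal sketch of a single cis-eQTL test, the code below applies a rank-based inverse normal transform (assumed Blom offset c = 3/8) and regresses expression on allele dosage. It assumes no covariates and uses a normal approximation for the P value; production eQTL pipelines typically add covariates (eg, genotype principal components and hidden expression factors) and permutation-based Q values, and the study's actual software is not named here. The genotypes and expression values are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist, fmean


def inverse_normal_transform(values: list[float], c: float = 3 / 8) -> list[float]:
    """Rank-based INT: map ranks to standard-normal quantiles (Blom offset c = 3/8)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank for ties
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def eqtl_test(dosage: list[float], expression: list[float]) -> tuple[float, float, float]:
    """Simple linear regression of expression on allele dosage (0-2), no covariates.
    Returns (beta, t statistic, two-sided P value from a normal approximation)."""
    n = len(dosage)
    mx, my = fmean(dosage), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    beta = sxy / sxx
    residuals = [y - my - beta * (x - mx) for x, y in zip(dosage, expression)]
    se = (sum(r * r for r in residuals) / (n - 2) / sxx) ** 0.5
    t = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return beta, t, p


dosage = [0, 0, 1, 1, 2, 2]                # hypothetical genotypes of one SNP
raw = [0.0, 0.2, 1.0, 1.2, 2.0, 2.2]       # hypothetical normalized expression of one gene
beta, t, p = eqtl_test(dosage, inverse_normal_transform(raw))
print(beta, t, p)
```

The transform forces each gene's expression onto a standard normal scale, which limits the influence of outlier samples on the regression slope.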
Figure 3.
eQTL features. (A) Distribution of distances between eSNP locations and the TSS of the corresponding eGenes. (B) Distribution of eQTL absolute beta values (the regression slope from the nominal association test) and eSNP minor allele frequencies (MAF). These variables were only weakly correlated (r = 0.14).
Figure 4.
Example of eQTLs interacting with colon subsite. Distribution of expression level (inverse normal transformed trimmed means of M values) of RIMKLA by rs6684275 genotype and colon subsite.
Next, we mapped 1125 sQTLs (Q value ≤ 0.05) including 1122 unique SNPs (sSNPs) associated with 1125 genes (sGenes) (Supplementary Table 9). The proportions of AS categories among SNP-associated AS events were similar to those found for total AS events (Table 2). Although 82% of sGenes were also eGenes, only 8% of sGenes shared the same variant with the corresponding eGene (6%) or had an sSNP in high linkage disequilibrium (LD R² > 0.8) with the eSNP (2%) (Figure 5A). We also identified a further 1062 sQTLs (Q value ≤ 0.05), involving 1058 sSNPs associated with clusters of excised introns in 1062 genes (Supplementary Table 10), and observed that 40% of these sGenes were also sGenes for AS events. sQTLs can be explored on the CoTrEx web application (see the Colon Transcriptome Explorer section), and full summary statistics are publicly available (see the Data availability statement).
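The sharing categories in Figure 5A (same variant, high LD, or independent) can be sketched as follows. This assumes LD R² is approximated by the squared Pearson correlation of genotype dosages in the study sample; the text does not state how LD was computed (a reference panel is a common alternative), and the rsIDs and dosages are hypothetical.

```python
from __future__ import annotations


def ld_r2(dosage_a: list[float], dosage_b: list[float]) -> float:
    """Approximate LD R^2 as the squared Pearson correlation of genotype dosages."""
    n = len(dosage_a)
    ma, mb = sum(dosage_a) / n, sum(dosage_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(dosage_a, dosage_b))
    va = sum((a - ma) ** 2 for a in dosage_a)
    vb = sum((b - mb) ** 2 for b in dosage_b)
    if va == 0 or vb == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (va * vb)


def classify_pair(ssnp: str, esnp: str, dosages: dict[str, list[float]], threshold: float = 0.8) -> str:
    """Label the relationship between the sSNP and eSNP of a gene that is both an sGene and an eGene."""
    if ssnp == esnp:
        return "same variant"
    if ld_r2(dosages[ssnp], dosages[esnp]) > threshold:
        return "high LD"
    return "independent"


# Hypothetical dosages for 6 individuals.
dosages = {
    "rsA": [0, 1, 2, 0, 1, 2],
    "rsB": [0, 1, 2, 0, 1, 2],
    "rsC": [2, 0, 1, 1, 0, 2],
}
for pair in [("rsA", "rsA"), ("rsA", "rsB"), ("rsA", "rsC")]:
    print(pair, classify_pair(*pair, dosages))
```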
Figure 5.
Colocalization among sSNPs and eSNPs and genomic region annotation. (A) Percentages of colocalization patterns among sSNPs and eSNPs in common genes according to LD R². (B) Percentages of eSNPs and sSNPs at specific genomic regions. Note that the axis is broken between 15% and 30% and rescaled between 30% and 60% to show the differences in the least represented categories. UTR, untranslated region.
Replication and Meta-Analysis With GTEx
We performed replication and meta-analyses using data from the GTEx project, version 8 (v8).8 For replication analysis, we used samples from the sigmoid and transverse colon (n = 318 and n = 368, respectively). For the replication of eQTLs, we downloaded the list of GTEx eQTLs (see the Materials and Methods section). For the replication of sQTLs, we computed AS events from GTEx transcript expression data and mapped sQTLs with GTEx genotypes, using the same approach as for the BarcUVa-Seq data (Supplementary Tables 11 and 12). We compared the P value distributions between the BarcUVa-Seq and GTEx colon data sets and computed the π1 statistic32 (Figure 6). For eQTLs, replication was higher in GTEx transverse colon (π1 = 0.76) than in sigmoid colon (π1 = 0.56). For sQTLs, the same replication value was obtained in both GTEx colon tissues (π1 = 0.67).
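The π1 statistic estimates the fraction of discovery QTLs that are also true associations in the replication data, computed as 1 − π0 from the replication P values of the discovery QTLs. A minimal sketch, assuming Storey's estimator at a single tuning value λ = 0.5 (the qvalue software instead smooths π0 over a grid of λ values); the replication P values below are hypothetical.

```python
from __future__ import annotations


def storey_pi0(pvals: list[float], lam: float = 0.5) -> float:
    """Estimate the proportion of true nulls: P values above lam are assumed to be
    null and uniformly distributed, so their count is rescaled by 1 / (1 - lam)."""
    above = sum(1 for p in pvals if p > lam)
    return min(1.0, above / (len(pvals) * (1.0 - lam)))


def pi1(pvals: list[float], lam: float = 0.5) -> float:
    """Estimated proportion of true associations (pi1 = 1 - pi0)."""
    return 1.0 - storey_pi0(pvals, lam)


# Hypothetical replication P values: 60 clearly replicated QTLs plus 40 uniformly spread nulls.
replication_p = [1e-6] * 60 + [(i + 0.5) / 40 for i in range(40)]
print(round(pi1(replication_p), 2))  # 0.6
```

Because π1 uses the whole P value distribution rather than a significance cutoff, it is less sensitive to differences in sample size between the discovery and replication data sets.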
Figure 6.
Replication analysis of eQTLs/sQTLs with GTEx v8 colon data. The value of the π1 statistic is shown. The distribution of P values is shown for (A) transverse colon eQTLs, (B) sigmoid colon eQTLs, (C) transverse colon sQTLs, and (D) sigmoid colon sQTLs.
We performed a meta-analysis of BarcUVa-Seq eQTLs with the full GTEx v8 data set (n = 49 tissues) using a multivariate adaptive shrinkage approach.33 Hierarchical clustering of pairwise correlations on the resulting effect sizes showed that BarcUVa-Seq eQTLs from colonic mucosa clustered with GTEx eQTLs from transverse colon and terminal ileum (Figure 7A). The correlations between BarcUVa-Seq eQTL effect sizes and all GTEx tissues showed that transverse colon, terminal ileum, stomach, minor salivary gland, and kidney cortex were the GTEx tissues with the highest correlation (ρ > 0.7) (Figure 7B).
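The tissue-to-tissue comparison in Figure 7 is based on Spearman correlation of eQTL effect sizes. A minimal sketch of that correlation, assuming the posterior effect sizes from the meta-analysis have already been filtered (FDR and LFSR) and aligned so that position i refers to the same eQTL in both tissues; the effect sizes are hypothetical. A distance such as 1 − ρ could then be fed to hierarchical clustering, although the exact clustering settings used are not stated here.

```python
from __future__ import annotations


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical posterior effect sizes for the same 5 eQTLs in two tissues.
tissue_a = [0.42, -0.10, 0.85, 0.05, -0.60]
tissue_b = [0.38, -0.02, 0.70, 0.11, -0.45]
rho = spearman(tissue_a, tissue_b)
print(rho, 1 - rho)  # correlation and the corresponding clustering distance
```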
Figure 7.
Meta-analysis with GTEx v8 tissues. (A) Clustering of BarcUVa-Seq and GTEx v8 tissues based on pairwise Spearman correlation of eQTL effect sizes derived from mashr meta-analysis. We only considered significant (FDR ≤ 0.05) and active (local false sign rate [LFSR] ≤ 0.05) eQTLs. (B) Spearman correlation of eQTL effect sizes between BarcUVa-Seq and GTEx v8 tissues. eQTL effect sizes were derived from mashr meta-analysis. We only considered significant (FDR ≤ 0.05) and active (LFSR ≤ 0.05) eQTLs.
Annotation and Functional Enrichment Analyses
We observed eSNPs and sSNPs distributed in similar patterns across the following genomic regions: introns, intergenic regions, upstream and downstream gene regions, 3’ and 5’ untranslated regions, and splice regions (including donor and acceptor variants). Intronic variants were the most common for both types of SNPs. Intergenic and upstream regions harbored higher proportions of eSNPs than sSNPs, and splice and untranslated regions harbored higher proportions of sSNPs than eSNPs (Figure 5B). Functional consequences also were assessed: most SNPs were not classified, but a small proportion of SNPs were classified as nonsense, start loss, frameshift, canonical splice site, missense, or synonymous variants (Supplementary Table 13).
Next, we performed enrichment analysis at regulatory regions (open chromatin regions, active enhancers, superenhancers, and transcription factor binding sites) using data derived from colon cell lines as well as from normal and cancerous colon tissue. We found significant enrichment (P value ≤ .05) in all types of regulatory regions for both eSNPs and sSNPs. In addition, we tested for enrichment in genome-wide binding sites of 170 RNA-binding proteins (RBPs). The top 20 RBPs with the lowest P values for eSNP enrichment are shown in Figure 8A; 15 of these also were among the top 20 RBPs most enriched for sSNPs. In both cases, heterogeneous nuclear ribonucleoprotein C was the RBP with the most significant enrichment. The RBPs with the highest enrichment values for sSNPs are shown in Figure 8B. sSNPs were enriched at binding sites of spliceosome constituents such as the splicing factor U2 small nuclear RNA auxiliary factor 1. Full enrichment results are listed in Supplementary Table 14.
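The enrichment tests above are not specified in this section. As an illustration only, the sketch below uses a simple one-sided hypergeometric test (equivalent to a one-sided Fisher exact test) of whether QTL SNPs overlap a region set more often than background SNPs do. It ignores LD and allele-frequency matching, which dedicated enrichment tools usually account for, so it is a stand-in rather than the study's method; all counts are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P value, P(X >= k).

    N: background SNPs tested; K: background SNPs inside the region set;
    n: QTL SNPs (eg, eSNPs); k: QTL SNPs inside the region set.
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # exact integer arithmetic, converted to float at the end


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed over expected fraction of SNPs falling in the region set."""
    return (k / n) / (K / N)


# Hypothetical: 1000 of 10,000 background SNPs lie in open chromatin; 40 of 200 eSNPs do.
print(fold_enrichment(40, 200, 1_000, 10_000), enrichment_p(40, 200, 1_000, 10_000))
```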
Figure 8.
Enrichment of eSNPs/sSNPs in genome-wide binding sites of RBPs. (A) The top 20 RBPs with the lowest enrichment P values for eSNPs. (B) The top 20 RBPs with the highest enrichment values for sSNPs (P value < .05).
Phenotype Heritability Enrichment and Colocalization Analyses
To quantify how much of the genetic risk of each phenotype can be attributed to BarcUVa-Seq QTLs, we assessed the contribution of eSNPs/sSNPs to the total SNP-based heritability of multiple complex traits. SNP-based heritability is the proportion of phenotypic variance explained by the SNPs genotyped or imputed in a genome-wide association study (GWAS). We performed SNP-based heritability enrichment tests for 63 complex diseases and traits that we considered a priori to influence or be influenced by colon homeostasis. eSNPs were enriched in the SNP-based heritability of 20 diseases or traits after Bonferroni adjustment (P value ≤ 8 × 10⁻⁴) and of 31 diseases or traits at an unadjusted P value ≤ .01. SNP-heritability enrichments for 33 traits and diseases are shown in Figure 9A, and full results are listed in Supplementary Table 15. BarcUVa-Seq eSNPs explained 17% of the total SNP-based heritability of CRC (P value = 9 × 10⁻⁸), the SNP-based heritability of CRC itself being estimated at 10% (based on a recent GWAS34). Interestingly, eSNPs also were enriched in the SNP-based heritability of psychiatric and neurologic traits, such as schizophrenia, bipolar disorder, and multisite chronic pain. BarcUVa-Seq sSNPs were enriched in the SNP-based heritability of 10 diseases and traits at a P value ≤ .01, but no enrichment was statistically significant after Bonferroni adjustment (Figure 9B shows 33 representative traits or diseases; Supplementary Table 15 has the full list of results). BarcUVa-Seq sSNPs explained 3% of the total SNP-based heritability of ulcerative colitis (P value = .02), whose SNP-based heritability is estimated at 13% (Figure 9B).
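The arithmetic behind the thresholds and percentages above can be made explicit. The Bonferroni threshold is the family-wise α divided by the 63 traits tested. The second part assumes that the 10% (CRC) and 13% (ulcerative colitis) figures are the SNP-based heritabilities on a phenotypic-variance scale; under that reading, multiplying by the proportion explained by the QTL SNPs gives the share of phenotypic variance they tag. The function name is hypothetical.

```python
ALPHA = 0.05
N_TRAITS = 63  # complex traits and diseases tested (from the text)

# Bonferroni: divide the family-wise alpha by the number of tests.
bonferroni_threshold = ALPHA / N_TRAITS  # about 7.9e-4, reported as 8 x 10^-4


def variance_share(prop_h2snp_from_qtls: float, h2snp: float) -> float:
    """Hypothetical helper: fraction of phenotypic variance tagged by QTL SNPs,
    assuming h2snp is expressed as a fraction of phenotypic variance."""
    return prop_h2snp_from_qtls * h2snp


crc_share = variance_share(0.17, 0.10)  # eSNPs, colorectal cancer: about 0.017
uc_share = variance_share(0.03, 0.13)   # sSNPs, ulcerative colitis: about 0.004
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"CRC: {crc_share:.3f}, UC: {uc_share:.4f}")
```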
Figure 9.
BarcUVa-Seq QTL enrichment results for total SNP heritability of 33 complex traits and diseases related to colon tissue. (A) Proportion of total SNP heritability explained by eSNPs is shown on the x axis, along with error bars. The size of the points indicates the percentage of the total SNP heritability out of the total heritability of the phenotype. (B) Proportion of total SNP heritability explained by sSNPs is shown on the x axis, along with error bars. The size of the points indicates the percentage of the total SNP heritability out of the total heritability of the phenotype.
Subsequently, to nominate candidate genes at GWAS-identified risk loci, we performed colocalization analyses for the complex traits and diseases whose SNP-based heritability enrichment for BarcUVa-Seq eSNPs passed Bonferroni correction. The regional colocalization probability quantifies the probability that an eQTL and a GWAS signal share the same causal variant and is used as a proxy for a gene's causal role.35 In the case of CRC, we identified 13 genes with a regional colocalization probability greater than 0.9, including known risk genes such as COLCA1 and COLCA2,6 as well as other less well-described genes such as ANKRD36. In the case of inflammatory bowel disease, we identified 6 genes with a regional colocalization probability greater than 0.9, such as IRF8 and RGS14 (Figure 10). Full results are available in the Supplementary Data.
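The regional colocalization probability used here comes from the method of reference 35. To illustrate the idea of shared causal variants with a related and widely used technique, the sketch below implements the approximate Bayes factor approach of the coloc framework (Wakefield Bayes factors, 1 causal variant per trait per region), which returns posterior probabilities for hypotheses H0 to H4; PP4 is the probability that both traits share 1 causal variant. This is a swapped-in method, not the one used in the study. It assumes per-SNP z-scores and standard errors for both traits over the same SNPs, coloc's default priors (p1 = p2 = 1e-4, p12 = 1e-5), and a prior effect SD of 0.15 (coloc's default for quantitative traits; 0.2 is typical for case-control traits). The z-scores are hypothetical.

```python
import math


def log_abf(z: float, se: float, prior_sd: float = 0.15) -> float:
    """Wakefield's log approximate Bayes factor (association vs no association) for one SNP."""
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_posteriors(z1, se1, z2, se2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0 (no signal) .. H4 (one shared causal variant)."""
    l1 = [log_abf(z, s) for z, s in zip(z1, se1)]
    l2 = [log_abf(z, s) for z, s in zip(z2, se2)]
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])
    # H3 sums over pairs of distinct SNPs: S1 * S2 - S12, computed on the log scale.
    frac = -math.expm1(s12 - s1 - s2)
    lh3_tail = math.log(frac) if frac > 0 else -math.inf
    lh = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + s1 + s2 + lh3_tail,
        math.log(p12) + s12,
    ]
    total = logsumexp(lh)
    return {f"PP{h}": math.exp(x - total) for h, x in enumerate(lh)}


# Hypothetical 3-SNP region: eQTL and GWAS signals peak at the same SNP.
se = [0.05, 0.05, 0.05]
shared = coloc_posteriors([8.0, 1.0, 0.5], se, [7.0, 0.8, 0.3], se)
# Hypothetical region where the two signals peak at different SNPs.
distinct = coloc_posteriors([8.0, 0.5, 0.2], se, [0.2, 0.5, 7.0], se)
print(round(shared["PP4"], 3), round(distinct["PP3"], 3))
```

In the shared scenario nearly all posterior mass falls on H4, whereas in the distinct scenario it falls on H3, illustrating why a high colocalization probability nominates the eGene as a candidate effector gene without proving causality.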
Figure 10.
The top eQTLs of the genes with the highest regional colocalization probability for CRC and inflammatory bowel disease. (A) Expression level (inverse normal transformed trimmed means of M values [TMMs]) of COLCA2 by genotype of the eSNP rs11213820. (B) Expression level (inverse normal transformed TMMs) of IRF8 by genotype of the eSNP rs16940186.
Colon Transcriptome Explorer
Gene and transcript abundances for the BarcUVa-Seq data set, as well as eQTLs/sQTLs, have been loaded into the web-based visualization resource CoTrEx. This tool facilitates searches for genes and transcripts of interest for their visualization in customizable plots, such as a strip chart, heatmap, and principal component analysis (PCA) plots. The interactive application includes different options for filtering and coloring the data by covariates. Figure 11 shows an example in the Expression tab. CoTrEx is freely available online at http://barcuvaseq.org/cotrex.
Figure 11.
Overview of the expression tab of CoTrEx. As an example, the transcript expression values and relative abundances of the TP53 gene are shown, along with different display options.
Discussion
In the present study we analyzed a large data set (BarcUVa-Seq) comprising germline SNPs and transcriptome profiles from mucosal biopsy specimens of ascending, transverse, and descending colon collected from 445 healthy living individuals. Differential expression patterns were identified across colon subsites. We identified 11,739 eQTLs comprising 11,427 unique SNPs associated with the expression of 11,739 genes. In addition, we identified 13,243 AS events from 7 distinct AS categories and identified 1125 AS events in 1125 genes associated with 1122 unique SNPs (sQTLs). These eQTLs/sQTLs frequently were intronic and enriched in regulatory regions. We showed that these QTLs are useful for annotating GWAS-identified risk loci and prioritizing candidate effector genes. Moreover, we replicated and meta-analyzed our QTLs with GTEx v8 data. Finally, we built an interactive web resource to explore the expression profiles and QTLs of the BarcUVa-Seq data set.
In contrast to BarcUVa-Seq, the GTEx project provided RNA-Seq data on sigmoid and transverse colon tissue from post-mortem subjects and extracted RNA from full-thickness and muscularis-only sections.8,36 Our novel BarcUVa-Seq data set overcomes some of the limitations of the GTEx colon data sets. BarcUVa-Seq samples were collected as superficial mucosal biopsy specimens from living subjects undergoing colonoscopy, which provides an optimal representation of the normal physiology of the colon epithelium. Moreover, they included subsites of the large intestine not assessed previously. Together with the enrichment of colon epithelial cells in superficial biopsy specimens, the inclusion of ascending, transverse, and descending colon samples makes BarcUVa-Seq a unique colon transcriptome data set.
Next-generation RNA-Seq data provide estimates of AS. Although long-read sequencing technologies can provide transcriptomic profiles with full-length isoform information, such technologies have lower base-level fidelity and are less feasible in large population-based studies at their current cost.11 In this study we used 2 complementary methods to provide a comprehensive profile of AS. The frequencies of genes with specific AS patterns that we identified in colon tissue are similar to those described in other tissues, where genes with exon skipping events were the most frequent.17 Predicting AS events helps generate hypotheses about specific molecular mechanisms involved in post-transcriptional modifications. In contrast to profiling individual transcripts to characterize the transcriptome, AS events group transcripts with similar structure. However, the profiles of annotated AS events are sensitive to the choice of transcript annotations,11 and other measures of AS, such as clusters of excised introns, complement the characterization of AS events.13
Regarding colon location, transcriptomic differences between subsites in normal colon have been described previously,37 including expression differences in genes from the cytochrome P450 family. In addition, different AS events have been identified between CRC tumors located in the ascending and descending colon.38 Indeed, tumor distribution across the colon has been associated with differential mutation and immune profiles, prognosis, and treatment response.39,40 In this study, we identified a subset of genes expressed differentially between colon subsites that are involved in molecular pathways related to lipid, xenobiotic, and drug metabolism, and a subset of genes involved in the antimicrobial response. We observed that the gene expression profile of transverse colon tissue was more similar to the descending than to the ascending colon, which was unexpected based on embryologic origin and adult blood supply. Differential gene expression across the colon may reflect differences in cell type composition, because the differentially expressed genes include markers of distinct colon epithelial cell types identified in single-cell RNA-Seq studies.41, 42, 43 For instance, using our data, we confirmed that goblet cell markers defined elsewhere,41 such as MUC2 and TFF3, are overexpressed in the descending colon (Supplementary Table 2), consistent with previous findings that goblet cell content increases caudally from the duodenum to the distal colon.44 Differential expression also may be influenced by differential exposure owing to variability in luminal content along the length of the colon, including microbial communities.43
We identified eQTLs and sQTLs assumed to participate in the transcriptional regulation of colon epithelium via cis mechanisms. These replicated strongly in the transverse colon from GTEx v8, and their effects were more similar to those in tissues with a high proportion of mucosa (eg, terminal ileum, stomach, and salivary gland) than to those in other GTEx v8 tissues, supporting the robustness of the BarcUVa-Seq data. The lower replication value in sigmoid colon may be owing to the higher proportion of muscularis in this tissue.8,36 We found fewer sGenes than eGenes, partly because fewer genes showed splicing variability than expression variability. In addition, at our depth of coverage we had lower power to quantify expression at the transcript level than at the gene level. We found similar distributions of eSNPs/sSNPs around gene TSSs, as well as across estimated effect sizes, genomic locations, and functional consequences. We observed a high proportion of sGenes among eGenes, as reported elsewhere.24,25 Although they can colocalize, eQTLs and sQTLs usually are independent.27 sQTLs add information to eQTLs because they associate SNPs with changes in the relative use of specific sets of transcripts sharing a common structure and post-transcriptional mechanism.
Regulation of gene expression and AS is associated with tissue-specific epigenetic variation, including chromatin remodeling and histone modifications,45 and dysregulation of these features has been associated with the initiation and progression of diseases such as CRC.45,46 We showed that normal colon eSNPs/sSNPs are enriched at regulatory regions marked by epigenetic signatures, such as open chromatin and proximal enhancers, of both normal and malignant colon tissue. In addition, we identified specific RBPs and transcription factors as potential regulators of AS in normal colon.
We provide a comprehensive profile of AS in normal tissue across colon subsites in living subjects. We described differential gene expression and splicing between the ascending and descending normal colon, involving genes related to immune response and drug metabolism. We expanded the number of known colon QTLs and assessed eQTL interactions with colon subsite. In addition, we observed that colon eQTLs/sQTLs contributed to the SNP-based heritability of brain-related traits and diseases, consistent with a model of epithelial–neuronal communication along the gut–brain axis.28 Thus, our QTL catalog may be of interest to researchers investigating traits and diseases that primarily affect organs other than the colon. It is important to note that these results could also reflect regulation of expression shared across tissues. In addition, colocalization points to potential molecular mechanisms at risk loci but does not establish causality.
Overall, our findings provide evidence of the regulation of gene expression and alternative splicing in the colon as potential underlying mechanisms of genetic risk loci and should serve as a rich resource for the research community.
Methods
Sample Collection
Subjects included in the study (n = 445; 64% female) had a mean age of 60 years, were nearly all of European ancestry, and were referred for colonoscopy either after a positive fecal immunochemical test result (hemoglobin level, >20 µg Hb/g feces) or directly by their physician. Subjects had no lesions at colonoscopy and no history of polyps or CRC. Non-neoplastic colon mucosa biopsy specimens were obtained endoscopically from the ascending (n = 138; 31%), transverse (n = 143; 32%), and descending (n = 164; 37%) colon (Table 1). Peripheral blood samples also were collected. Informed consent was obtained from all participants. The study protocol was approved by the Bellvitge University Hospital Ethics Committee (PR073/11 and PR286/15) and followed national and international directives on ethics and data protection. More information about the BarcUVa-Seq project can be accessed online at https://barcuvaseq.org. All authors had access to the study data and reviewed and approved the final manuscript.
RNA-Seq Library Preparation and Sequencing
RNA was extracted from frozen tissue using the mirVana kit (Thermo Fisher Scientific, Waltham, MA) after homogenization with the Minilys bead mill (Bertin Instruments, Montigny-le-Bretonneux, France). The RNA was DNase treated and concentrated using the RNA Clean and Concentrator-5 kit (Zymo Research, Irvine, CA). Total RNA was quantified with a Qubit Fluorometer (Invitrogen, Waltham, MA), and quality was assessed with an Agilent (Santa Clara, CA) 2100 Bioanalyzer or TapeStation. Libraries were prepared with the Illumina TruSeq Stranded Total RNA Library Prep Gold kit and tagged with unique adapter indexes. Final libraries were validated on the Agilent 2100 Bioanalyzer, quantified by quantitative polymerase chain reaction, pooled at equimolar ratios, diluted, denatured, and loaded onto a paired-end flow cell on an Illumina HiSeq 2500 (high-output mode) for batches 1–7 or an Illumina NovaSeq 6000 for batch 8.
RNA-Seq Data Processing
Low-quality bases, sequencing adapters, and ribosomal RNA sequences were trimmed from raw RNA-Seq reads using the BBTools suite (Joint Genome Institute, Berkeley, CA).47 FastQC (Babraham Bioinformatics, Cambridge, UK)48 was used for quality control. Trimmed reads were aligned to the Genome Reference Consortium human build 37 (GRCh37/hg19) with the Spliced Transcripts Alignment to a Reference (STAR; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY) software in 2-pass mode,49 using GENCODE (EMBL-EBI, Hinxton, UK) release 19 annotations, which include a total of 57,952 genes and 196,667 transcripts.31 We only included samples with more than 10 million mappable paired-end reads, a multimapping rate lower than 15%, and a unique mapping rate greater than 80%. The mean library size was 32 million (SD, 8.5 million). Gene and transcript expression were quantified with RSEM (University of Wisconsin-Madison, Madison, WI).50 Genes and transcripts that did not reach at least 6 and 3 counts, respectively, in at least 10% of the samples were considered not expressed and filtered out. Trimmed mean of M-values (TMM) normalization was applied to the counts to correct for library size and RNA composition.
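The count-based expression filter above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used: it assumes raw counts held as a features × samples list of lists, the function name `expressed_mask` is hypothetical, and TMM normalization (performed in edgeR) is not reproduced.

```python
def expressed_mask(counts, min_count, min_fraction=0.10):
    """Flag features (rows) whose count reaches `min_count` in at least
    `min_fraction` of the samples (columns); unflagged features are
    treated as not expressed and filtered out."""
    mask = []
    for row in counts:
        # Number of samples in which this feature reaches the threshold
        n_pass = sum(1 for c in row if c >= min_count)
        mask.append(n_pass >= min_fraction * len(row))
    return mask


# Thresholds from the text: 6 counts for genes, 3 counts for transcripts
gene_counts = [
    [12, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 1 of 10 samples reaches 6 -> kept
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],   # never reaches 6 -> filtered out
]
print(expressed_mask(gene_counts, min_count=6))  # [True, False]
```

The same function is reused for transcripts by passing `min_count=3`.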
Genotype Data Processing
Approximately 400,000 SNPs were genotyped with the Illumina OncoArray BeadChip.30 We only included samples with a genotyping rate greater than 95%. Before imputation, we also assessed duplicate and related samples (relatedness > 0.8), missing rate per SNP (> 0.1), missing rate per sample (> 0.1), sex concordance (genetic vs reported sex), heterozygosity (outside the mean ± 4 SD), and Hardy–Weinberg disequilibrium (P < 1 × 10⁻⁴). We obtained allelic dosages for 39,117,105 autosomal and 1,228,035 chromosome X SNPs using SHAPEIT (University of Oxford, Oxford, UK)51 for phasing and Minimac 3 (University of Michigan, Ann Arbor, MI)53 for imputation with the Haplotype Reference Consortium panel on the Michigan Imputation Server.52 SNPs with an imputation quality R² less than 0.7 or a minor allele frequency (MAF) less than 1% were excluded, leaving 6,804,675 autosomal and 183,788 chromosome X SNPs. Allelic dosages were used for subsequent QTL analyses. SNP IDs were annotated using dbSNP version 142.53 Principal components of the genetic data were obtained with PLINK 1.9 (Complete Genomics, Mountain View, CA).54 We used Picard Tools CheckFingerprint (Broad Institute, Cambridge, MA) to verify that genotype and RNA-Seq samples were labeled correctly and belonged to the same individual.
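A minimal sketch of the post-imputation SNP filter, assuming imputed dosages (expected number of copies of the counted allele, 0 to 2) for diploid autosomal genotypes; chromosome X dosages in males would need separate handling. The helper names are hypothetical, and the MAF is estimated directly from the dosages rather than read from the imputation server output.

```python
def minor_allele_frequency(dosages):
    """MAF from imputed allelic dosages (0-2 copies per individual),
    assuming diploid (autosomal) genotypes."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def keep_snp(dosages, imputation_r2, min_r2=0.7, min_maf=0.01):
    """Keep a SNP only if imputation R2 >= 0.7 and MAF >= 1%."""
    return imputation_r2 >= min_r2 and minor_allele_frequency(dosages) >= min_maf


print(keep_snp([0.0, 1.1, 1.9, 1.0], imputation_r2=0.92))  # True
print(keep_snp([0.0, 1.1, 1.9, 1.0], imputation_r2=0.55))  # False (low R2)
print(keep_snp([0.0] * 199 + [1.0], imputation_r2=0.95))   # False (MAF 0.25%)
```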
Alternative Splicing Profiling
To quantify AS, we used 2 complementary methods that provide the relative abundance (ie, percent splicing index [PSI]) of specific AS features. First, we defined 7 types of AS events from GENCODE version 19 annotations with SUPPA2 (Catalan Institution for Research and Advanced Studies, Barcelona, Spain).12 In this case, the PSI of an event is the fraction of a gene's total transcript expression contributed by the transcripts that contain the event (ie, inclusion transcripts).11 SUPPA2 calculated this metric for each AS event by dividing the summed expression of the inclusion transcripts by the summed expression of all transcripts of the gene. We kept AS events with a median PSI across all samples between 0.05 and 0.95 (see AS event annotations in Supplementary Table 1). As a complementary approach, we used LeafCutter (Stanford University, Stanford, CA)13 following the analysis procedure described elsewhere8 to compute the relative abundance of alternatively excised introns.
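The event-level PSI and the median-PSI filter can be illustrated as follows. This is a sketch under stated assumptions, not SUPPA2 itself: transcript expression is assumed to be a dictionary of per-transcript abundances for one gene in one sample (SUPPA2 works from transcript TPMs), transcript identifiers are hypothetical, and the 0.05 to 0.95 bounds are treated as inclusive.

```python
import statistics


def psi(transcript_expression, inclusion_ids):
    """PSI of one AS event: expression of the inclusion transcripts divided
    by the total expression of all transcripts of the gene."""
    total = sum(transcript_expression.values())
    if total == 0:
        return float("nan")  # undefined when the gene is not expressed
    inclusion = sum(transcript_expression[t] for t in inclusion_ids)
    return inclusion / total


def keep_event(psi_values, low=0.05, high=0.95):
    """Keep an event if its median PSI across samples lies in [low, high]."""
    finite = [v for v in psi_values if v == v]  # NaN != NaN, so this drops NaN
    return bool(finite) and low <= statistics.median(finite) <= high


# Hypothetical gene with 3 transcripts; T1 and T2 include the event
sample = {"T1": 30.0, "T2": 10.0, "T3": 60.0}
print(psi(sample, ["T1", "T2"]))        # 0.4
print(keep_event([0.40, 0.52, 0.02]))   # True (median 0.40)
print(keep_event([0.99, 0.97, 0.98]))   # False (median 0.98)
```

Events whose PSI stays near 0 or 1 in most samples carry little splicing variability, which is why they are dropped before QTL mapping.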
Differential Gene Expression and Splicing Analysis
Differential gene expression analysis was performed with the quasi-likelihood F test implemented in the R package edgeR (Garvan Institute of Medical Research, Parkville, Australia).55 Hierarchical clustering used Ward's minimum variance method with Euclidean distances. For differential splicing analysis, normalized PSI values of AS events were fitted to a linear model adjusted for sex, age, and sequencing batch with the R package limma (University of Melbourne, Parkville, Australia).56 The diffSplice function was used to perform an F test of whether log-fold changes differ among the AS events of a gene, yielding a single gene-level P value; diffSplice also was used for t tests of individual AS events. Differential use of excised introns was tested with LeafCutter,13 adjusting for sex, age, and sequencing batch. Functional enrichment analysis was performed with the FUMA GENE2FUNC module (Vrije Universiteit Amsterdam, Amsterdam, The Netherlands)57 using differentially expressed genes with an FWER of 0.05 or less. The FWER was controlled for multiple testing with the Bonferroni correction.
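The Bonferroni procedure used to control the FWER multiplies each P value by the number of tests m (capping at 1) and declares significance at an adjusted P of 0.05 or less. The sketch below is illustrative only; the P values are made up and the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P values: p * m, capped at 1 (controls the FWER)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def significant_at_fwer(pvalues, alpha=0.05):
    """Indices of tests with Bonferroni-adjusted P <= alpha."""
    return [i for i, p_adj in enumerate(bonferroni(pvalues)) if p_adj <= alpha]


pvals = [1e-6, 0.004, 0.02, 0.6]  # hypothetical gene-level P values
print(significant_at_fwer(pvals))  # [0, 1]
```

With 4 tests, the third P value (0.02) becomes 0.08 after adjustment and no longer passes, which illustrates why Bonferroni is conservative compared with FDR control.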
eQTL/sQTL Mapping
We tested SNPs within 1 Mb of the TSS of each gene, assuming that QTLs identified in this window influence the expression of nearby genes via cis mechanisms. For QTL identification, we used FastQTL (University of Geneva Medical School, Geneva, Switzerland) version 2.0.58 We applied an inverse normal transformation to gene trimmed mean of M-values and to PSI values, which mitigates the effect of outliers and normalizes their distribution across samples. We adjusted the models for age, sex, sequencing batch, tissue anatomic location, genetic ancestry (2 principal components), and probabilistic estimation of expression residuals factors,59 which capture the effects of unknown confounders. We chose the number of probabilistic estimation of expression residuals factors that maximized the discovery of eGenes/sGenes. The FDR (Storey and Tibshirani procedure) was computed with the R package qvalue (Princeton University, Princeton, NJ).60 For the colon subsite eQTL interaction analysis, we used the interaction mode of FastQTL version 2.0.58
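The rank-based inverse normal transformation and the 1-Mb cis window can be sketched as follows. The offset (rank − 0.5)/n is an assumption (several offsets, such as Blom's 3/8, are in common use), ties receive average ranks, and the window boundary is treated as inclusive; none of these details is taken from FastQTL's source, and the function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def inverse_normal_transform(values):
    """Map each value to the standard normal quantile of (rank - 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def in_cis_window(snp_pos, tss_pos, window=1_000_000):
    """True if the SNP lies within `window` bp of the gene TSS."""
    return abs(snp_pos - tss_pos) <= window


# The outlier 1000 is pulled in to the same distance from 0 as the minimum
print([round(z, 3) for z in inverse_normal_transform([10, 1000, 3, 5])])
# [0.319, 1.15, -1.15, -0.319]
```

Because only ranks enter the transformation, an extreme expression value cannot dominate the regression of expression on genotype.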
Replication and Meta-Analysis With GTEx Data
For replication analysis, we estimated the π1 statistic33 with the R package qvalue.60 This statistic estimates the proportion of BarcUVa-Seq QTLs that are also true associations in the corresponding GTEx v8 QTL analysis, based on the distribution of their GTEx v8 P values. Following a common approach described elsewhere,8 we only included the association involving the SNP with the lowest P value for each gene, to avoid including many SNPs in LD. For meta-analysis, full GTEx v8 eQTL summary statistics (n = 49 tissues) were downloaded from the Google Cloud Platform (Mountain View, CA) under gtex-resources. We applied multivariate adaptive shrinkage with the R package mashr (University of Chicago, Chicago, IL),33 following the analytic pipeline described elsewhere.8 Posterior effect size estimates and local false sign rates from mashr were used as metrics of QTL magnitude and activity, respectively; a local false sign rate below 0.05 was the threshold for significant QTL activity.
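For intuition about π1, the sketch below computes Storey's π0 at a single tuning value λ = 0.5 and reports π1 = 1 − π0 from GTEx v8 P values of BarcUVa-Seq lead SNP–gene pairs. This is a simplification: the qvalue package by default smooths π0 estimates over a grid of λ values, so its numbers will generally differ. The P values and function names are hypothetical.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey's estimate of the proportion of true nulls at a single lambda:
    pi0 = #{p > lambda} / (m * (1 - lambda)), capped at 1."""
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def pi1(pvalues, lam=0.5):
    """Estimated proportion of true associations: 1 - pi0."""
    return 1.0 - storey_pi0(pvalues, lam)


# Hypothetical GTEx v8 P values for top SNP-gene pairs found in BarcUVa-Seq
gtex_p = [1e-8, 3e-5, 0.002, 0.01, 0.04, 1e-4, 0.003, 2e-6, 0.62, 0.91]
print(round(pi1(gtex_p), 2))  # 0.6
```

The logic is that null P values are uniform, so the density of P values above λ estimates how many tests are null; an excess of small P values in GTEx v8 raises π1.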
Annotation and Functional Enrichment Analysis
For annotation of genomic regions and classification of variants by functional consequence, we used the Ensembl Variant Effect Predictor (EMBL-EBI, Hinxton, UK).61 We used the --pick flag to select a single annotation per variant, following an ordered set of criteria for prioritizing annotations. For functional enrichment analysis in regulatory regions across the genome (Supplementary Table 14), we compiled publicly available regions relevant to colon tissue from different studies (ie, active enhancers,46 variant enhancer loci,46 open chromatin sites,34,46 superenhancers,62 and transcription factor binding sites63). Regions from multiple samples of the same assay type were merged. In addition, we downloaded RBP binding sites, including splicing factor binding sites, from CLIPdb (Tsinghua University, Beijing, China).64 We used GREGOR (University of Michigan, Ann Arbor, MI),65 which defines enrichment (fold change) as the ratio of the observed to the expected number of SNPs overlapping the regulatory regions. This approach accounts for the number of LD proxies, gene proximity, and MAF.
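The fold change described above can be illustrated with a simplified Python sketch: region sets are merged per assay, observed overlaps of QTL SNPs are counted, and the expectation is taken as the mean overlap across control SNP sets. It assumes the control sets were already matched on number of LD proxies, distance to the nearest gene, and MAF (GREGOR performs that matching itself and also considers the LD proxies of each index SNP, which this sketch ignores). Coordinates and function names are hypothetical, and intervals are half-open.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def n_overlapping(snps, regions):
    """Count (chrom, pos) SNPs falling inside any region (naive scan)."""
    return sum(
        1 for chrom, pos in snps
        if any(r_chrom == chrom and r_start <= pos < r_end
               for r_chrom, r_start, r_end in regions)
    )


def fold_enrichment(qtl_snps, control_sets, regions):
    """Observed overlaps divided by the mean overlap of matched control sets."""
    observed = n_overlapping(qtl_snps, regions)
    expected = sum(n_overlapping(c, regions) for c in control_sets) / len(control_sets)
    return observed / expected


regions = merge_intervals([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 100, 200)])
qtl_snps = [("chr1", 120), ("chr1", 250), ("chr2", 150), ("chr1", 900)]
controls = [
    [("chr1", 120), ("chr1", 900), ("chr2", 500), ("chr2", 700)],
    [("chr1", 400), ("chr2", 110), ("chr1", 50), ("chr2", 300)],
]
print(fold_enrichment(qtl_snps, controls, regions))  # 3.0
```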
Phenotype Heritability Enrichment and Colocalization Analyses
For SNP-based heritability enrichment analysis (partitioned heritability analysis) of eSNPs/sSNPs among disease- and trait-associated loci, we applied LD score regression using the LD SCore software (Broad Institute of MIT and Harvard, Cambridge, MA)66 with the baselineLD model. The GWAS summary statistics used for this analysis and related information are listed in Supplementary Table 15. Total SNP heritability of the tested phenotypes was estimated on the observed scale for continuous traits and on the liability scale for binary traits, using LD score regression on 1,217,312 SNPs with a MAF greater than 0.05 in HapMap phase 3 populations (NHGRI, Bethesda, MD).66 Under the null hypothesis that all SNPs contribute equally to total SNP-based heritability, the 1122 sSNPs and 11,427 eSNPs identified in this study would be expected to explain approximately 0.09% and 0.94%, respectively, of the estimated total SNP heritability. Population prevalence, and lifetime risk in the case of CRC, were curated from the literature. For colocalization, we used fastENLOC (University of Michigan).35 We computed Z-score–derived posterior inclusion probabilities for GWAS summary statistics with TORUS (University of Michigan)67 and assigned each locus to an LD block using the references defined elsewhere.68 We performed multi-SNP fine-mapping of eQTLs with DAP-G (University of Michigan).69
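The null expectation quoted above is simple arithmetic: an annotation's expected share of SNP heritability equals its share of the regression SNPs. The sketch below reproduces those figures and, for binary traits, shows the standard observed-to-liability-scale transformation (Lee et al, 2011), which LD SCore applies when sample and population prevalences are supplied. Function names are hypothetical, and enrichment is shown as the ratio of the proportion of heritability to the proportion of SNPs.

```python
from statistics import NormalDist


def expected_h2_fraction(n_annotated, n_total):
    """Share of SNP heritability expected for an annotation if every SNP
    contributed equally (the null hypothesis in the text)."""
    return n_annotated / n_total


def enrichment(prop_h2, prop_snps):
    """Partitioned-heritability enrichment: Pr(h2) / Pr(SNPs)."""
    return prop_h2 / prop_snps


def observed_to_liability(h2_obs, sample_prev, pop_prev):
    """Convert observed-scale h2 of a binary trait to the liability scale
    (Lee et al, 2011): h2_obs * K^2 (1-K)^2 / (z^2 P (1-P))."""
    K, P = pop_prev, sample_prev
    threshold = NormalDist().inv_cdf(1.0 - K)
    z = NormalDist().pdf(threshold)  # normal density at the liability threshold
    return h2_obs * K**2 * (1 - K) ** 2 / (z**2 * P * (1 - P))


n_hapmap3 = 1_217_312
print(f"{100 * expected_h2_fraction(1122, n_hapmap3):.2f}%")   # 0.09%
print(f"{100 * expected_h2_fraction(11427, n_hapmap3):.2f}%")  # 0.94%
```

An annotation is considered enriched when the heritability it explains exceeds this expected share, that is, when `enrichment(...)` is greater than 1.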
Web Application
The web-based visualization resource CoTrEx was developed with Shiny (RStudio, Boston, MA)70 using open-source software.
Data Availability
The RNA-Seq and SNP data that support the findings of this study as well as the sample covariates are available from the European Genome-phenome Archive under accession number EGAS00001004891. Complete summary statistics (including all FastQTL nominal pass results) for all QTLs identified in this study are available from the Digital Repository of the University of Barcelona at http://hdl.handle.net/2445/172697. Top-QTLs per gene are available in Supplementary Tables 7, 9, 10, 11, and 13.
Acknowledgments
The authors thank the "Centres de Recerca de Catalunya" (CERCA) Program, Generalitat de Catalunya for institutional support. The authors particularly acknowledge the patients participating in this study, the endoscopy units from the Bellvitge University Hospital and the Viladecans Hospital, as well as Carmen Atencia, Judith Rocamora, Susana Lopez, Gemma Aiza, and the Biobank, Bellvitge University Hospital, Catalan Institute of Oncology Bellvitge Biomedical Research Institute (HUB-ICO-IDIBELL) (PT17/0015/0024) for their collaboration. RNA-Seq was provided by the Genomics Core Facility of the Case Western Reserve University (CWRU) School of Medicine's Genetics and Genome Sciences Department as well as the Northwest Genomics Center at the University of Washington. Colon artwork in the CoTrEx logo was designed by Smashicons from Flaticon (Málaga, Spain).
CRediT Authorship Contributions
Virginia Díez-Obrero (Data curation: Lead; Formal analysis: Lead; Software: Lead; Visualization: Lead; Writing – original draft: Lead)
Christopher H Dampier (Data curation: Lead; Formal analysis: Equal; Writing – original draft: Lead; Writing – review & editing: Lead)
Ferran Moratalla-Navarro (Data curation: Equal; Formal analysis: Equal; Software: Equal; Writing – review & editing: Equal)
Matthew Devall (Data curation: Equal; Formal analysis: Equal; Writing – review & editing: Equal)
Sarah J Plummer (Data curation: Equal; Resources: Lead; Writing – review & editing: Equal)
Anna Díez-Villanueva (Formal analysis: Equal; Software: Equal; Writing – review & editing: Equal)
Ulrike Peters (Funding acquisition: Equal; Resources: Equal; Supervision: Equal; Writing – review & editing: Equal)
Stephanie Bien (Supervision: Equal; Writing – review & editing: Equal)
Jeroen R Huyghe (Supervision: Equal; Writing – review & editing: Equal)
Anshul Kundaje (Supervision: Equal)
Gemma Ibáñez-Sanz (Resources: Lead; Writing – review & editing: Equal)
Elisabeth Guinó (Data curation: Lead)
Mireia Obón-Santacana (Data curation: Equal; Writing – review & editing: Equal)
Robert Carreras-Torres (Conceptualization: Equal; Software: Equal; Supervision: Equal; Writing – original draft: Lead; Writing – review & editing: Lead)
Graham Casey (Conceptualization: Lead; Funding acquisition: Lead; Resources: Lead; Supervision: Lead; Writing – review & editing: Equal)
Victor Moreno (Conceptualization: Lead; Funding acquisition: Lead; Resources: Lead; Software: Equal; Supervision: Lead; Writing – review & editing: Equal)
Footnotes
Conflicts of interest The authors disclose no conflicts.
Funding Supported by the Agency for Management of University and Research Grants of the Catalan Government grants 2017SGR723; the Instituto de Salud Carlos III, co-funded by European Regional Development Fund (FEDER) funds “A Way to Build Europe” grants PI14-00613, PI17-00092; the Spanish Association Against Cancer Scientific Foundation grant GCTRA18022MORE; Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP) CB07/02/2005; and the National Institutes of Health grants R01CA204279, R01CA143237, and R01CA201407. Also supported by EU H2020 - Marie Skłodowska-Curie (MSC) grant 796216 (R.C.T.); a postdoctoral fellowship through the “Fundación Científica de la Asociación Española Contra el Cáncer (AECC)” (M.O.S.); the National Institutes of Health training grant T32 5T32CA163177-07 (C.H.D.); and the Ministerio de Universidades through predoctoral fellowship number FPU16/00599 for the “Formación del Profesorado Universitario” (V.D.O.). A sample collection of this work was supported by the Xarxa de Bancs de Tumors de Catalunya (XBTC) sponsored by Pla Director d’Oncologia de Catalunya, “Plataforma Biobancos PT13/0010/0013,” and the Biobank of the Catalan Institute of Oncology (ICOBIOBANC), sponsored by the Catalan Institute of Oncology. This work was supported in part by National Institutes of Health/National Cancer Institute grants CA143237 and CA204279 (G.C.).
Contributor Information
Graham Casey, Email: gc8r@virginia.edu.
Víctor Moreno, Email: v.moreno@iconcologia.net.
Supplementary Material
References
- 1. Momozawa Y., Dmitrieva J., Théâtre E., Deffontaine V., Rahmouni S., Charloteaux B., Crins F., Docampo E., Elansary M., Gori A.-S., Lecut C., Mariman R., Mni M., Oury C., Altukhov I., Alexeev D., Aulchenko Y., Amininejad L., Bouma G., Hoentjen F., Löwenberg M., Oldenburg B., Pierik M.J., Vander Meulen-de Jong A.E., Janneke van der Woude C., Visschedijk M.C., International IBD Genetics Consortium, Lathrop M., Hugot J.-P., Weersma R.K., De Vos M., Franchimont D., Vermeire S., Kubo M., Louis E., Georges M. IBD risk loci are enriched in multigenic regulatory modules encompassing putative causative genes. Nat Commun. 2018;9:2427. doi: 10.1038/s41467-018-04365-8.
- 2. Closa A., Cordero D., Sanz-Pamplona R., Solé X., Crous-Bou M., Paré-Brunet L., Berenguer A., Guino E., Lopez-Doriga A., Guardiola J., Biondo S., Salazar R., Moreno V. Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis. Carcinogenesis. 2014;35:2039–2046. doi: 10.1093/carcin/bgu092.
- 3. Moreno V., Alonso M.H., Closa A., Vallés X., Diez-Villanueva A., Valle L., Castellví-Bel S., Sanz-Pamplona R., Lopez-Doriga A., Cordero D., Solé X. Colon-specific eQTL analysis to inform on functional SNPs. Br J Cancer. 2018;119:971–977. doi: 10.1038/s41416-018-0018-9.
- 4. Singh T., Levine A.P., Smith P.J., Smith A.M., Segal A.W., Barrett J.C. Characterization of expression quantitative trait loci in the human colon. Inflamm Bowel Dis. 2015;21:251–256. doi: 10.1097/MIB.0000000000000265.
- 5. Hulur I., Gamazon E.R., Skol A.D., Xicola R.M., Llor X., Onel K., Ellis N.A., Kupfer S.S. Enrichment of inflammatory bowel disease and colorectal cancer risk variants in colon expression quantitative trait loci. BMC Genomics. 2015;16:138. doi: 10.1186/s12864-015-1292-z.
- 6. Law P.J., Timofeeva M., Fernandez-Rozadilla C., Broderick P., Studd J., Fernandez-Tajes J., Farrington S., Svinti V., Palles C., Orlando G., Sud A., Holroyd A., Penegar S., Theodoratou E., Vaughan-Shaw P., Campbell H., Zgaga L., Hayward C., Campbell A., Harris S., Deary I.J., Starr J., Gatcombe L., Pinna M., Briggs S., Martin L., Jaeger E., Sharma-Oates A., East J., Leedham S., Arnold R., Johnstone E., Wang H., Kerr D., Kerr R., Maughan T., Kaplan R., Al-Tassan N., Palin K., Hänninen U.A., Cajuso T., Tanskanen T., Kondelin J., Kaasinen E., Sarin A.-P., Eriksson J.G., Rissanen H., Knekt P., Pukkala E., Jousilahti P., Salomaa V., Ripatti S., Palotie A., Renkonen-Sinisalo L., Lepistö A., Böhm J., Mecklin J.-P., Buchanan D.D., Win A.-K., Hopper J., Jenkins M.E., Lindor N.M., Newcomb P.A., Gallinger S., Duggan D., Casey G., Hoffmann P., Nöthen M.M., Jöckel K.-H., Easton D.F., Pharoah P.D.P., Peto J., Canzian F., Swerdlow A., Eeles R.A., Kote-Jarai Z., Muir K., Pashayan N., PRACTICAL Consortium, Harkin A., Allan K., McQueen J., Paul J., Iveson T., Saunders M., Butterbach K., Chang-Claude J., Hoffmeister M., Brenner H., Kirac I., Matošević P., Hofer P., Brezina S., Gsur A., Cheadle J.P., Aaltonen L.A., Tomlinson I., Houlston R.S., Dunlop M.G. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat Commun. 2019;10:2154. doi: 10.1038/s41467-019-09775-w.
- 7. Ongen H., Andersen C.L., Bramsen J.B., Oster B., Rasmussen M.H., Ferreira P.G., Sandoval J., Vidal E., Whiffin N., Planchon A., Padioleau I., Bielser D., Romano L., Tomlinson I., Houlston R.S., Esteller M., Orntoft T.F., Dermitzakis E.T. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014;512:87–90. doi: 10.1038/nature13602.
- 8. The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
- 9. Di C., Syafrizayanti, Zhang Q., Chen Y., Wang Y., Zhang X., Liu Y., Sun C., Zhang H., Hoheisel J.D. Function, clinical application, and strategies of Pre-mRNA splicing in cancer. Cell Death Differ. 2018;26:1181–1194. doi: 10.1038/s41418-018-0231-3.
- 10. Manning K.S., Cooper T.A. The roles of RNA processing in translating genotype to phenotype. Nat Rev Mol Cell Biol. 2017;18:102–114. doi: 10.1038/nrm.2016.139.
- 11. Park E., Pan Z., Zhang Z., Lin L., Xing Y. The expanding landscape of alternative splicing variation in human populations. Am J Hum Genet. 2018;102:11–26. doi: 10.1016/j.ajhg.2017.11.002.
- 12. Trincado J.L., Entizne J.C., Hysenaj G., Singh B., Skalic M., Elliott D.J., Eyras E. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018;19:40. doi: 10.1186/s13059-018-1417-1.
- 13. Li Y.I., Knowles D.A., Humphrey J., Barbeira A.N., Dickinson S.P., Im H.K., Pritchard J.K. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet. 2018;50:151–158. doi: 10.1038/s41588-017-0004-9.
- 14. Ryan M., Wong W.C., Brown R., Akbani R., Su X., Broom B., Melott J., Weinstein J. TCGASpliceSeq a compendium of alternative mRNA splicing in cancer. Nucleic Acids Res. 2016;44:D1018–D1022. doi: 10.1093/nar/gkv1288.
- 15. Kahles A., Lehmann K.-V., Toussaint N.C., Hüser M., Stark S.G., Sachsenberg T., Stegle O., Kohlbacher O., Sander C., Cancer Genome Atlas Research Network, Rätsch G. Comprehensive analysis of alternative splicing across tumors from 8,705 patients. Cancer Cell. 2018;34:211–224.e6. doi: 10.1016/j.ccell.2018.07.001.
- 16. Climente-González H., Porta-Pardo E., Godzik A., Eyras E. The functional impact of alternative splicing in cancer. Cell Rep. 2017;20:2215–2226. doi: 10.1016/j.celrep.2017.08.012.
- 17. Wang K., Wu D., Zhang H., Das A., Basu M., Malin J., Cao K., Hannenhalli S. Comprehensive map of age-associated splicing changes across human tissues and their contributions to age-associated diseases. Sci Rep. 2018;8:10929. doi: 10.1038/s41598-018-29086-2.
- 18. Huang X., Liu J., Mo X., Liu H., Wei C., Huang L., Chen J., Tian C., Meng Y., Wu G., Xie W., P C F.J., Liu Z., Tang W. Systematic profiling of alternative splicing events and splicing factors in left- and right-sided colon cancer. Aging. 2019;11:8270–8293. doi: 10.18632/aging.102319.
- 19. Xiong Y., Deng Y., Wang K., Zhou H., Zheng X., Si L., Fu Z. Profiles of alternative splicing in colorectal cancer and their clinical significance: A study based on large-scale sequencing data. EBioMedicine. 2018;36:183–195. doi: 10.1016/j.ebiom.2018.09.021.
- 20. Zong Z., Li H., Yi C., Ying H., Zhu Z., Wang H. Genome-wide profiling of prognostic alternative splicing signature in colorectal cancer. Front Oncol. 2018;8:537. doi: 10.3389/fonc.2018.00537.
- 21. Takata A., Matsumoto N., Kato T. Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci. Nat Commun. 2017;8:14519. doi: 10.1038/ncomms14519.
- 22. Zhang X., Joehanes R., Chen B.H., Huan T., Ying S., Munson P.J., Johnson A.D., Levy D., O’Donnell C.J. Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat Genet. 2015;47:345–352. doi: 10.1038/ng.3220.
- 23. Li Y.I., Wong G., Humphrey J., Raj T. Prioritizing Parkinson’s disease genes using population-scale transcriptomic data. Nat Commun. 2019;10:994. doi: 10.1038/s41467-019-08912-9.
- 24. Rotival M., Quach H., Quintana-Murci L. Defining the genetic and evolutionary architecture of alternative splicing in response to infection. Nat Commun. 2019;10:1671. doi: 10.1038/s41467-019-09689-7.
- 25. Walker R.L., Ramaswami G., Hartl C., Mancuso N., Gandal M.J., de la Torre-Ubieta L., Pasaniuc B., Stein J.L., Geschwind D.H. Genetic control of expression and splicing in developing human brain informs disease mechanisms. Cell. 2019;179:750–771.e22. doi: 10.1016/j.cell.2019.09.021.
- 26. Tian J., Wang Z., Mei S., Yang N., Yang Y., Ke J., Zhu Y., Gong Y., Zou D., Peng X., Wang X., Wan H., Zhong R., Chang J., Gong J., Han L., Miao X. CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer. Nucleic Acids Res. 2019;47:D909–D916. doi: 10.1093/nar/gky954.
- 27. Li Y.I., van de Geijn B., Raj A., Knowles D.A., Petti A.A., Golan D., Gilad Y., Pritchard J.K. RNA splicing is a primary link between genetic variation and disease. Science. 2016;352:600–604. doi: 10.1126/science.aad9417.
- 28. Najjar S.A., Davis B.M., Albers K.M. Epithelial–neuronal communication in the colon: implications for visceral pain. Trends Neurosci. 2020;43:170–181. doi: 10.1016/j.tins.2019.12.007.
- 29. Camilleri M. Leaky gut: mechanisms, measurement and clinical implications in humans. Gut. 2019;68:1516–1526. doi: 10.1136/gutjnl-2019-318427.
- 30. Amos C.I., Dennis J., Wang Z., Byun J., Schumacher F.R., Gayther S.A., Casey G., Hunter D.J., Sellers T.A., Gruber S.B., Dunning A.M., Michailidou K., Fachal L., Doheny K., Spurdle A.B., Li Y., Xiao X., Romm J., Pugh E., Coetzee G.A., Hazelett D.J., Bojesen S.E., Caga-Anan C., Haiman C.A., Kamal A., Luccarini C., Tessier D., Vincent D., Bacot F., Van Den Berg D.J., Nelson S., Demetriades S., Goldgar D.E., Couch F.J., Forman J.L., Giles G.G., Conti D.V., Bickeböller H., Risch A., Waldenberger M., Brüske-Hohlfeld I., Hicks B.D., Ling H., McGuffog L., Lee A., Kuchenbaecker K., Soucy P., Manz J., Cunningham J.M., Butterbach K., Kote-Jarai Z., Kraft P., FitzGerald L., Lindström S., Adams M., McKay J.D., Phelan C.M., Benlloch S., Kelemen L.E., Brennan P., Riggan M., O’Mara T.A., Shen H., Shi Y., Thompson D.J., Goodman M.T., Nielsen S.F., Berchuck A., Laboissiere S., Schmit S.L., Shelford T., Edlund C.K., Taylor J.A., Field J.K., Park S.K., Offit K., Thomassen M., Schmutzler R., Ottini L., Hung R.J., Marchini J., Amin Al Olama A., Peters U., Eeles R.A., Seldin M.F., Gillanders E., Seminara D., Antoniou A.C., Pharoah P.D.P., Chenevix-Trench G., Chanock S.J., Simard J., Easton D.F. The OncoArray Consortium: a network for understanding the genetic architecture of common cancers. Cancer Epidemiol Biomarkers Prev. 2017;26:126–135. doi: 10.1158/1055-9965.EPI-16-0106.
- 31. Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S., Barnes I., Bignell A., Boychenko V., Hunt T., Kay M., Mukherjee G., Rajan J., Despacio-Reyes G., Saunders G., Steward C., Harte R., Lin M., Howald C., Tanzer A., Derrien T., Chrast J., Walters N., Balasubramanian S., Pei B., Tress M., Rodriguez J.M., Ezkurdia I., van Baren J., Brent M., Haussler D., Kellis M., Valencia A., Reymond A., Gerstein M., Guigó R., Hubbard T.J. GENCODE: the reference human genome annotation for the ENCODE Project. Genome Res. 2012;22:1760–1774. doi: 10.1101/gr.135350.111.
- 32. Storey J.D., Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
- 33. Urbut S.M., Wang G., Carbonetto P., Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat Genet. 2019;51:187–195. doi: 10.1038/s41588-018-0268-8.
- 34. Huyghe J.R., Bien S.A., Harrison T.A., Kang H.M., Chen S., Schmit S.L., Conti D.V., Qu C., Jeon J., Edlund C.K., Greenside P., Wainberg M., Schumacher F.R., Smith J.D., Levine D.M., Nelson S.C., Sinnott-Armstrong N.A., Albanes D., Alonso M.H., Anderson K., Arnau-Collell C., Arndt V., Bamia C., Banbury B.L., Baron J.A., Berndt S.I., Bézieau S., Bishop D.T., Boehm J., Boeing H., Brenner H., Brezina S., Buch S., Buchanan D.D., Burnett-Hartman A., Butterbach K., Caan B.J., Campbell P.T., Carlson C.S., Castellví-Bel S., Chan A.T., Chang-Claude J., Chanock S.J., Chirlaque M.-D., Cho S.H., Connolly C.M., Cross A.J., Cuk K., Curtis K.R., de la Chapelle A., Doheny K.F., Duggan D., Easton D.F., Elias S.G., Elliott F., English D.R., Feskens E.J.M., Figueiredo J.C., Fischer R., FitzGerald L.M., Forman D., Gala M., Gallinger S., Gauderman W.J., Giles G.G., Gillanders E., Gong J., Goodman P.J., Grady W.M., Grove J.S., Gsur A., Gunter M.J., Haile R.W., Hampe J., Hampel H., Harlid S., Hayes R.B., Hofer P., Hoffmeister M., Hopper J.L., Hsu W.-L., Huang W.-Y., Hudson T.J., Hunter D.J., Ibañez-Sanz G., Idos G.E., Ingersoll R., Jackson R.D., Jacobs E.J., Jenkins M.A., Joshi A.D., Joshu C.E., Keku T.O., Key T.J., Kim H.R., Kobayashi E., Kolonel L.N., Kooperberg C., Kühn T., Küry S., Kweon S.-S., Larsson S.C., Laurie C.A., Le Marchand L., Leal S.M., Lee S.C., Lejbkowicz F., Lemire M., Li C.I., Li L., Lieb W., Lin Y., Lindblom A., Lindor N.M., Ling H., Louie T.L., Männistö S., Markowitz S.D., Martín V., Masala G., McNeil C.E., Melas M., Milne R.L., Moreno L., Murphy N., Myte R., Naccarati A., Newcomb P.A., Offit K., Ogino S., Onland-Moret N.C., Pardini B., Parfrey P.S., Pearlman R., Perduca V., Pharoah P.D.P., Pinchev M., Platz E.A., Prentice R.L., Pugh E., Raskin L., Rennert G., Rennert H.S., Riboli E., Rodríguez-Barranco M., Romm J., Sakoda L.C., Schafmayer C., Schoen R.E., Seminara D., Shah M., Shelford T., Shin M.-H., Shulman K., Sieri S., Slattery M.L., Southey M.C., Stadler Z.K., Stegmaier C., Su Y.-R., Tangen C.M., Thibodeau S.N., Thomas D.C., Thomas S.S., Toland A.E., Trichopoulou A., Ulrich C.M., Van Den Berg D.J., van Duijnhoven F.J.B., Van Guelpen B., van Kranen H., Vijai J., Visvanathan K., Vodicka P., Vodickova L., Vymetalkova V., Weigl K., Weinstein S.J., White E., Win A.K., Wolf C.R., Wolk A., Woods M.O., Wu A.H., Zaidi S.H., Zanke B.W., Zhang Q., Zheng W., Scacheri P.C., Potter J.D., Bassik M.C., Kundaje A., Casey G., Moreno V., Abecasis G.R., Nickerson D.A., Gruber S.B., Hsu L., Peters U. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet. 2019;51:76–87. doi: 10.1038/s41588-018-0286-6.
- 35.Wen X., Pique-Regi R., Luca F. Integrating molecular QTL data into genome-wide genetic association analysis: probabilistic assessment of enrichment and colocalization. PLoS Genet. 2017;13. doi: 10.1371/journal.pgen.1006646.
- 36.Breschi A., Muñoz-Aguirre M., Wucher V., Davis C.A., Garrido-Martín D., Djebali S., Gillis J., Pervouchine D.D., Vlasova A., Dobin A., Zaleski C., Drenkow J., Danyko C., Scavelli A., Reverter F., Snyder M.P., Gingeras T.R., Guigó R. A limited set of transcriptional programs define major cell types. Genome Res. 2020;30:1047–1059. doi: 10.1101/gr.263186.120.
- 37.Glebov O.K., Rodriguez L.M., Nakahara K., Jenkins J., Cliatt J., Humbyrd C.-J., DeNobile J., Soballe P., Simon R., Wright G., Lynch P., Patterson S., Lynch H., Gallinger S., Buchbinder A., Gordon G., Hawk E., Kirsch I.R. Distinguishing right from left colon by the pattern of gene expression. Cancer Epidemiol Biomarkers Prev. 2003;12:755–762.
- 38.Puccini A., Marshall J.L., Salem M.E. Molecular variances between right- and left-sided colon cancers. Curr Colorectal Cancer Rep. 2018;14:152–158.
- 39.Zhang L., Zhao Y., Dai Y., Cheng J.-N., Gong Z., Feng Y., Sun C., Jia Q., Zhu B. Immune landscape of colorectal cancer tumor microenvironment from different primary tumor location. Front Immunol. 2018;9:1578. doi: 10.3389/fimmu.2018.01578.
- 40.Stintzing S., Tejpar S., Gibbs P., Thiebach L., Lenz H.-J. Understanding the role of primary tumour localisation in colorectal cancer treatment and outcomes. Eur J Cancer. 2017;84:69–80. doi: 10.1016/j.ejca.2017.07.016.
- 41.Smillie C.S., Biton M., Ordovas-Montanes J., Sullivan K.M., Burgin G., Graham D.B., Herbst R.H., Rogel N., Slyper M., Waldman J., Sud M., Andrews E., Velonias G., Haber A.L., Jagadeesh K., Vickovic S., Yao J., Stevens C., Dionne D., Nguyen L.T., Villani A.-C., Hofree M., Creasey E.A., Huang H., Rozenblatt-Rosen O., Garber J.J., Khalili H., Desch A.N., Daly M.J., Ananthakrishnan A.N., Shalek A.K., Xavier R.J., Regev A. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell. 2019;178:714–730.e22. doi: 10.1016/j.cell.2019.06.029.
- 42.Parikh K., Antanaviciute A., Fawkner-Corbett D., Jagielowicz M., Aulicino A., Lagerholm C., Davis S., Kinchen J., Chen H.H., Alham N.K., Ashley N., Johnson E., Hublitz P., Bao L., Lukomska J., Andev R.S., Björklund E., Kessler B.M., Fischer R., Goldin R., Koohy H., Simmons A. Colonic epithelial cell diversity in health and inflammatory bowel disease. Nature. 2019;567:49–55. doi: 10.1038/s41586-019-0992-y.
- 43.James K.R., Gomes T., Elmentaite R., Kumar N., Gulliver E.L., King H.W., Stares M.D., Bareham B.R., Ferdinand J.R., Petrova V.N., Polański K., Forster S.C., Jarvis L.B., Suchanek O., Howlett S., James L.K., Jones J.L., Meyer K.B., Clatworthy M.R., Saeb-Parsy K., Lawley T.D., Teichmann S.A. Distinct microbial and immune niches of the human colon. Nat Immunol. 2020;21:343–353. doi: 10.1038/s41590-020-0602-z.
- 44.Kim Y.S., Ho S.B. Intestinal goblet cells and mucins in health and disease: recent insights and progress. Curr Gastroenterol Rep. 2010;12:319–330. doi: 10.1007/s11894-010-0131-2.
- 45.Amirkhah R., Naderi-Meshkin H., Shah J.S., Dunne P.D., Schmitz U. The intricate interplay between epigenetic events, alternative splicing and noncoding RNA deregulation in colorectal cancer. Cells. 2019;8:929. doi: 10.3390/cells8080929.
- 46.Cohen A.J., Saiakhova A., Corradin O., Luppino J.M., Lovrenert K., Bartels C.F., Morrow J.J., Mack S.C., Dhillon G., Beard L., Myeroff L., Kalady M.F., Willis J., Bradner J.E., Keri R.A., Berger N.A., Pruett-Miller S.M., Markowitz S.D., Scacheri P.C. Hotspots of aberrant enhancer activity punctuate the colorectal cancer epigenome. Nat Commun. 2017;8:14400. doi: 10.1038/ncomms14400.
- 47.Bushnell B. BBTools: BBMap short read aligner and other bioinformatic tools. Available from: sourceforge.net/projects/bbmap. Accessed December 2019.
- 48.Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc. Accessed December 2019.
- 49.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 50.Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 51.O’Connell J., Gurdasani D., Delaneau O., Pirastu N., Ulivi S., Cocca M., Traglia M., Huang J., Huffman J.E., Rudan I., McQuillan R., Fraser R.M., Campbell H., Polasek O., Asiki G., Ekoru K., Hayward C., Wright A.F., Vitart V., Navarro P., Zagury J.-F., Wilson J.F., Toniolo D., Gasparini P., Soranzo N., Sandhu M.S., Marchini J. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet. 2014;10. doi: 10.1371/journal.pgen.1004234.
- 52.Das S., Forer L., Schönherr S., Sidore C., Locke A.E., Kwong A., Vrieze S.I., Chew E.Y., Levy S., McGue M., Schlessinger D., Stambolian D., Loh P.-R., Iacono W.G., Swaroop A., Scott L.J., Cucca F., Kronenberg F., Boehnke M., Abecasis G.R., Fuchsberger C. Next-generation genotype imputation service and methods. Nat Genet. 2016;48:1284–1287. doi: 10.1038/ng.3656.
- 53.NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2015;43:D6–D17. doi: 10.1093/nar/gku1130.
- 54.Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
- 55.Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
- 56.Ritchie M.E., Phipson B., Wu D., Hu Y., Law C.W., Shi W., Smyth G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. doi: 10.1093/nar/gkv007.
- 57.Watanabe K., Taskesen E., van Bochoven A., Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8:1826. doi: 10.1038/s41467-017-01261-5.
- 58.Ongen H., Buil A., Brown A.A., Dermitzakis E.T., Delaneau O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics. 2016;32:1479–1485. doi: 10.1093/bioinformatics/btv722.
- 59.Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc. 2012;7:500–507. doi: 10.1038/nprot.2011.457.
- 60.Storey J.D., Bass A.J., Dabney A., Robinson D. qvalue: Q-value estimation for false discovery rate control. 2018. Available from: http://github.com/jdstorey/qvalue. Accessed November 2020.
- 61.McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The Ensembl variant effect predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
- 62.Hnisz D., Abraham B.J., Lee T.I., Lau A., Saint-André V., Sigova A.A., Hoke H.A., Young R.A. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053.
- 63.The ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9. doi: 10.1371/journal.pbio.1001046.
- 64.Yang Y.-C.T., Di C., Hu B., Zhou M., Liu Y., Song N., Li Y., Umetsu J., Lu Z.J. CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genomics. 2015;16:51. doi: 10.1186/s12864-015-1273-2.
- 65.Schmidt E.M., Zhang J., Zhou W., Chen J., Mohlke K.L., Chen Y.E., Willer C.J. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics. 2015;31:2601–2606. doi: 10.1093/bioinformatics/btv201.
- 66.Bulik-Sullivan B.K., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Loh P.-R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 67.Wen X. Molecular QTL discovery incorporating genomic annotations using Bayesian false discovery rate control. Ann Appl Stat. 2016;10:1619–1638.
- 68.Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.
- 69.Wen X., Lee Y., Luca F., Pique-Regi R. Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am J Hum Genet. 2016;98:1114–1129. doi: 10.1016/j.ajhg.2016.03.029.
- 70.Chang W., Cheng J., Allaire J.J., Xie Y., McPherson J. Shiny: web application framework for R. 2018. Available from: https://CRAN.R-project.org/package=shiny. Accessed November 2020.
Data Availability Statement
The RNA-Seq and SNP data that support the findings of this study, as well as the sample covariates, are available from the European Genome-phenome Archive under accession number EGAS00001004891. Complete summary statistics (including all FastQTL nominal pass results) for all QTLs identified in this study are available from the Digital Repository of the University of Barcelona at http://hdl.handle.net/2445/172697. The top QTLs per gene are available in Supplementary Tables 7, 9, 10, 11, and 13.