iScience. 2021 Feb 4;24(3):102144. doi: 10.1016/j.isci.2021.102144

Cis-regulatory mutations with driver hallmarks in major cancers

Zhongshan Cheng 1,3, Michael Vermeulen 1,4, Micheal Rollins-Green 1, Brian DeVeale 2,5,6,, Tomas Babak 1,5,∗∗
PMCID: PMC7903341  PMID: 33665563

Summary

Despite the recent availability of complete genome sequences of tumors from thousands of patients, isolating disease-causing (driver) non-coding mutations from the plethora of somatic variants remains challenging, and only a handful of validated examples exist. By integrating whole-genome sequencing, genetic data, and allele-specific gene expression from TCGA, we identified 320 somatic non-coding mutations that affect gene expression in cis (FDR<0.25). These mutations cluster into 47 cis-regulatory elements that modulate expression of their subject genes through diverse molecular mechanisms. We further show that these mutations have hallmark features of non-coding drivers; namely, that they preferentially disrupt transcription factor binding motifs, are associated with a selective advantage, increased oncogene expression and decreased tumor suppressor expression.

Subject areas: Genetics, Genomics, Cancer Systems Biology


Highlights

  • Enrichment of functional non-coding somatic mutations predicts drivers

  • Elevated variant allele frequencies are consistent with roles in tumorigenesis

  • Putative non-coding drivers disrupt transcription factor binding motifs

  • Predicted drivers associate with increased oncogene and decreased TSG expression



Introduction

Identification of somatic mutations that contribute to tumorigenesis is an essential step to understanding disease prognosis and developing therapies (Gerstung et al., 2015; Kulik et al., 1989; Verhaak et al., 2010). Despite extensive exome and genome sequencing efforts, a substantial proportion of causal or driver mutations (called drivers from here on) are thought to be unknown (Kandoth et al., 2013; Cancer Genome Atlas Research Network, 2014, 2015; Nik-Zainal et al., 2016). On average, 22.2% of tumor samples within each cancer type do not harbor coding mutations in any of 144 common driver genes (Schroeder et al., 2014). Moreover, since multiple drivers are typically involved (Vogelstein et al., 2013), even tumors with well-characterized mutations likely harbor additional causal alterations (Beerenwinkel et al., 2007; Merid et al., 2014; Sjoblom et al., 2006; Vogelstein et al., 2013). Mutations in cis-regulatory elements (CREs) are postulated to comprise a large fraction of the undiscovered drivers (Sjoblom et al., 2006). However, despite the availability of hundreds of complete tumor genomes, only a few non-coding drivers have been experimentally validated (Table S1).

Distinguishing drivers from passengers outside coding regions requires overcoming several known challenges: the search space is orders of magnitude larger, functional impact cannot be predicted from amino acid changes (especially gain-of-function alterations), mutation rates are higher (Poulos et al., 2015), and positive selection pressure on relative growth is relaxed. These challenges have been partially overcome by associating mutations with disruption or acquisition of transcription factor binding sites (Kalender Atak et al., 2017; Mathelier et al., 2015; Melton et al., 2015; Svetlichnyy et al., 2015; Weinhold et al., 2014), altered mRNA abundance (Fredriksson et al., 2014), clinical data (Smith et al., 2015; Weinhold et al., 2014), and evolutionary conservation (Carter et al., 2009; Foo et al., 2015; Fu et al., 2014; Piraino and Furney, 2017). Combinations of these features have also been weighed to prioritize putative drivers and determine significant mutational hotspots (Fu et al., 2014; Kalender Atak et al., 2017; Piraino and Furney, 2017; Puente et al., 2015; Weinhold et al., 2014).

Since the tumorigenic role of a non-coding driver is likely exerted through a cis-change in gene expression (Khurana et al., 2016), mapping genes whose expression is impacted by cis-acting regulatory effects has significant promise. Allele-specific expression (ASE), where one allele of a gene is more highly expressed than the other, is a powerful approach for detecting cis-regulatory effects, since trans-regulatory effects impact both alleles equally (Fraser, 2011). By comparing ASE in tumors to matched normal ASE (“diffASE”), it is further possible to distinguish somatic from germ-line effects. Ongen et al. applied this approach to identify 71 putative driver genes in colorectal cancer (Ongen et al., 2014). Furthermore, after predicting functional non-coding variants by prioritizing those generating de novo transcription factor binding motifs, Atak et al. showed that many of these somatic mutations were associated with ASE (Kalender Atak et al., 2017). In practice, however, the sparse availability of matched tumor and normal gene expression and genetic data poses a significant limitation. Just 7.7% of The Cancer Genome Atlas (TCGA) tumor samples have matched normal RNA-Seq data (Figure S1A).

Here we show that the vast majority of differential ASE is acquired in tumors, enabling us to dispense with the matched normal requirement and expand our survey 13-fold. We interrogated all whole-genome sequenced non-coding somatic mutations across 1,165 TCGA patients and identified 47 putative drivers as mutated CREs on the basis of robust association with ASE in tumors. The driver role of these mutations is further supported by functional disruption of transcription factor binding sites and elevated variant allele frequencies. This functional catalog of non-coding features significantly expands our knowledge of non-coding tumor driver biology.

Results

Survey of breast invasive carcinoma reveals that differential ASE is due to ASE in tumors

We initially focused on breast invasive carcinoma (BRCA) since it is the cancer type with the largest set of matched tumor and normal RNA-seq data accompanied by whole-genome sequence (WGS) in TCGA (Figure S1A). Measuring ASE relies on counting RNA-Seq reads that map over heterozygous single-nucleotide polymorphisms (SNPs) (Figure 1A) detected by genotyping arrays. To maximize our sensitivity, we first imputed and phased SNPs using the 1000 Genomes haplotypes (Howie et al., 2009) (Figure 1A), which on average increases the number of informative SNPs by 20%. We have previously shown that this is more accurate than relying on WGS, particularly where coverage is low (Figure S1B), and reduces false-positive SNPs that have a disproportionately high impact on estimates of ASE since all reads are assigned to one haplotype (Babak et al., 2015). Phasing also allowed us to combine allelic counts across SNPs within the same gene, which contributes to the improved accuracy (Babak et al., 2015) (and see Transparent methods and Figure S2 for details). We observed extensive diffASE in BRCA (Figure 1B).
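The idea of pooling phased allelic counts across SNPs within a gene can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the input layout (one pair of haplotype read counts per heterozygous SNP), and the exact two-sided binomial test against a 0.5 null are assumptions made for illustration.

```python
import math


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    if n == 0:
        return 1.0

    def log_pmf(i: int) -> float:
        # log of C(n, i) * 0.5**n, computed in log space to avoid overflow
        return (math.lgamma(n + 1) - math.lgamma(i + 1)
                - math.lgamma(n - i + 1) + n * math.log(0.5))

    # The null is symmetric, so sum all outcomes at least as far from n/2 as k.
    distance = abs(k - n / 2)
    p = sum(math.exp(log_pmf(i)) for i in range(n + 1)
            if abs(i - n / 2) >= distance - 1e-9)
    return min(1.0, p)


def gene_level_ase(snp_counts):
    """Pool phased read counts over the heterozygous SNPs of one gene.

    snp_counts: list of (haplotype1_reads, haplotype2_reads), one per SNP.
    Returns (hap1_total, hap2_total, hap1_fraction, p_value).
    """
    hap1 = sum(a for a, _ in snp_counts)
    hap2 = sum(b for _, b in snp_counts)
    total = hap1 + hap2
    ratio = hap1 / total if total else float("nan")
    return hap1, hap2, ratio, binom_two_sided_p(hap1, total)
```

Because phasing assigns each SNP allele to a haplotype, the counts can simply be summed before testing, which gives more reads per test than any single SNP would.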

Figure 1.


Allelic bias commonly arises in tumors independent of copy number variations (CNVs) and promoter methylation

(A) A schematic of the allele-specific expression (ASE) analysis strategy implemented on TCGA breast cancer samples. In brief, imputed genotyping data and tumor RNA-seq (together with matched normal RNA-seq, where available) are assessed for gene-level ASE by calculating the allelic imbalance ratios for imputed and phased heterozygous single-nucleotide polymorphisms (SNPs). We report differential ASE (diffASE) between tumors and matched normals, and ASE in tumors (tumorASE) in cases where matched normals are unavailable.

(B) diffASE events in breast cancer tumors exceed the background. A diffASE event is called between a tumor and its matched normal when their allelic ratios differ at p < 0.001 using a chi-squared test and the skew is greater in the tumor. Six hundred thirty-two diffASE events were obtained when the diffASE events calculated with the actual sample identities were compared to the background obtained with 10,000 permutations of randomized normal/tumor identities (FDR<0.05 and greater ASE in the tumor than the matched normal, n = 92; for clarity, only 100 permuted datasets are displayed in the figure). The FDR reflects the proportion of permutations where the most significant diffASE event was obtained with the actual tumor/normal data.

(C) >90% of diffASE events originate in breast cancer tumors. The overlap of diffASE events with ASE events in tumors and matched normals by individual (p < 0.001, binomial distribution, n = 92).

(D) In BRCA tumors, most ASE is not correlated with CNV. The Pearson correlation between linear regression of gene-level tumorASE and the absolute tumor CNV signal is significant (R = 0.11, p = 1 × 10−82, n = 92) but does not explain the majority of ASE. This analysis includes every gene exhibiting ASE (binomial test, p < 0.001) in an individual tumor and excludes all others. The CNV signal intensity is obtained from CNV microarrays. Only 10% of the genes are depicted for clarity.

(E) Gene-level diffASE is weakly correlated with the promoter (±2 kb from TSS) methylation (Pearson's linear correlation R = −0.01, p = 2.4 × 10−6, n = 92). As in Figure 1D, all genes exhibiting ASE (binomial test, p < 0.001) in an individual tumor were included. The methylation beta value is the ratio of methylated to total probe intensity. Only 10% of the genes are depicted for clarity.

See also Figures S1–S3.
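The diffASE call described in the legend (a chi-squared test on tumor versus matched normal allelic counts, with greater skew in the tumor) can be sketched as follows. This is an illustrative reading of the legend, not the published code: the function names, the 2 × 2 count layout, and the absence of a continuity correction are assumptions, and the permutation-based FDR step is not reproduced here.

```python
import math


def chi2_2x2(t1, t2, n1, n2):
    """Pearson chi-squared test (1 df, no continuity correction).

    Rows are tumor (t1, t2) and matched normal (n1, n2); columns are the
    read counts for haplotype 1 and haplotype 2 of one gene.
    Returns (statistic, p_value).
    """
    row_sums = (t1 + t2, n1 + n2)
    col_sums = (t1 + n1, t2 + n2)
    total = sum(row_sums)
    # Without reads in every row and column the test is undefined.
    if total == 0 or 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0
    stat = 0.0
    for row, row_sum in zip(((t1, t2), (n1, n2)), row_sums):
        for observed, col_sum in zip(row, col_sums):
            expected = row_sum * col_sum / total
            stat += (observed - expected) ** 2 / expected
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


def is_diff_ase(t1, t2, n1, n2, alpha=0.001):
    """Call diffASE when the ratios differ and the tumor is the more skewed sample."""
    _, p = chi2_2x2(t1, t2, n1, n2)
    if p >= alpha:
        return False
    tumor_skew = abs(t1 / (t1 + t2) - 0.5)
    normal_skew = abs(n1 / (n1 + n2) - 0.5)
    return tumor_skew > normal_skew
```

For example, `is_diff_ase(90, 10, 50, 50)` is a call, whereas swapping the tumor and normal counts is not, because the skew then sits in the normal sample.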

Nearly all of the diffASE can be attributed to an increase of ASE in tumors relative to matched controls (Figure 1B). We reasoned that this trend may be due to higher clonality of tumors relative to matched normal tissue, which would be expected to be more complex. We first considered whether loss of heterozygosity (LOH) may be a confounding factor. Since all BRCA tumors are from female patients, a comparison of allelic expression between autosomes and the X chromosome could illuminate the contribution of clonality. X chromosomes are randomly inactivated across cells comprising normal tissue. Comparison with a clone derived from this tissue (where all cells retain monoallelic expression from the same allele) would yield strong diffASE for any expressed gene on chromosome X. If clonality was the dominant source of greater ASE in tumors, we would expect enrichment of highly ranked X-linked genes when evaluated for diffASE. This enrichment would not be expected if LOH was the dominant source. We indeed observed a high enrichment of X-linked genes (66/100) among the top diffASE genes, suggesting that these tumors are highly clonal (Figure S2D). When we performed the ASE analysis using only tumor expression (tumorASE), >98% of the diffASE events were recapitulated in tumorASE and >90% of diffASE events were attributable to increased allelic bias in tumors (Figure 1C). Finally, neither CNVs nor methylation explained the majority of ASE events originating in tumors (Figures 1D, 1E, and S3). CNVs showed the stronger correlation but only account for about 11% of the ASE in tumors. These findings suggested that altered cis-regulatory mechanisms of gene expression might explain the observed ASE in tumors, and that this signal is a valuable starting point for identifying non-coding drivers.

Identification of mutations that explain ASE in tumors

The availability of WGS data for 113 BRCA RNA-seq tumor samples (Figure S1A, Table S2) allowed us to find specific mutations that are associated with, and may explain, the observed ASE in tumors. We evaluated common mutation callers and implemented a robust filtering scheme to yield high-confidence somatic variants (Figures S4A–S4D; see Transparent methods for details). We then asked whether the presence or absence of these variants near a gene is associated with ASE of that gene across BRCA tumor samples. Unfortunately, using mutations within 10 kb upstream of each transcription start site (TSS) as well as within each gene body did not yield associations that survived multiple-testing correction, even in this heavily surveyed cancer type. We chose this window because cis-regulatory variants are heavily enriched in the 10 kb window upstream of the TSS (Group et al., 2020). The high proportion of neutral mutations relative to genuine non-coding drivers likely explains this result and necessitates an enrichment strategy for variants that are likely to have a functional impact.

The vast majority of previously validated non-coding driver mutations occur in 3′ UTRs, promoters, enhancers, and CTCF binding sites (Table S1). As these collectively encompass major sites of transcriptional regulation, we focused on somatic variants within these features and refined them using several publicly available annotation resources (see Transparent methods). To comprehensively map genomic regions where transcription is regulated, we also included an aggregate map of TF binding sites (‘TF binding’) and accessible chromatin (see Transparent methods, Table S3). For the enrichment analysis, we grouped the somatic mutations in these CREs by regulatory feature and asked if they were within 10 kb upstream of the TSS, or within the gene body, of a gene exhibiting ASE. Since the active enhancers affecting target gene transcription vary by cell type and frequently regulate non-adjacent genes, we used the regulatory relationships previously defined for distinct cancer types by the association of chromatin accessibility and gene expression (cancer-specific) (Corces et al., 2018). Using active elements defined in cells matching the query tumor further reduces the search space and improves sensitivity by focusing on the active subset of enhancers in the query tumor (Perera et al., 2014). Putative drivers were identified by positive correlations between gene-level ASE and somatic regulatory mutations within a cis-regulatory feature (see Transparent methods and Figures 2A, S4A, and S5). This approach revealed candidate non-coding driver mutations regulating genes including some that had been previously implicated in breast cancer by coding variants.
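The association between mutated regulatory features and gene-level ASE is evaluated with a Wilcoxon rank-sum test. A minimal sketch of such a test is given here, comparing absolute gene-level ASE in tumors that carry a mutation in a feature against the remaining tumors. The function name and inputs are hypothetical, and the p value uses the normal approximation with tie correction, which is coarse when only a few tumors carry the mutation; an exact test may be preferable in that setting.

```python
import math


def rank_sum_test(carriers, non_carriers):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    carriers / non_carriers: absolute gene-level ASE values per tumor.
    Returns (U statistic for carriers, two-sided p value).
    """
    n1, n2 = len(carriers), len(non_carriers)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in carriers] + [(v, 1) for v in non_carriers])

    rank_sum_carriers = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1..j
        group_size = j - i
        tie_term += group_size ** 3 - group_size
        n_carriers_here = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_carriers += mid_rank * n_carriers_here
        i = j

    u1 = rank_sum_carriers - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In a genome-wide scan this test would be repeated for every gene and feature, which is why the resulting p values need the FDR calibration described in the Transparent methods.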

Figure 2.


Seven regulatory features harboring somatic mutations are enriched for ASE in breast cancer

(A) A schematic of somatic mutations in cis-regulatory elements causing allele-specific expression.

(B) The significance of gene-level associations between mutated regulatory features and ASE in relation to FDR in breast cancer. The association of gene-level ASE was evaluated with a Wilcoxon rank-sum test (n = 113). The FDR is calculated as depicted in Figure S5 and detailed in the Transparent methods. The association was performed genome-wide.

(C) The ASE ratio of putative BRCA driver TIMP3 (p = 1.4 × 10−8, FDR = 0.11, Wilcoxon rank-sum test, n = 113). The boxplot is delimited by the first and the third quartile, the whiskers encompass minimum and maximum data, while the diamonds and dots represent the medians and raw absolute gene-level ASE, respectively.

(D) The ASE ratio of putative BRCA driver ITPR3 (p = 1.4 × 10−6, FDR = 0.15, Wilcoxon rank-sum test, n = 113). Boxplot features as in ‘C.’

See also Figures S4–S6 as well as Table S4.

Using these features, we found ten genes that are enriched for somatic mutations in regulatory elements that coincide with altered cis-regulation in breast cancer (FDR<0.25, n = 113). These include mutations in the CTCF binding sites of DAAM1, variants in the promoters of UNC5B, and in TF binding sites near EHMT1, FHIT, GSN, ITPR3, NCOA3, TIMP3, VPS13B, and ZDHHC14 (Figure 2B). The most significant association was between variants in TF binding sites adjacent to TIMP3 and its altered cis-regulation (Figure 2C). TIMP3 is an inhibitor of matrix metalloproteinases whose upregulation suppresses tumor growth (Anand-Apte et al., 1996). In the 3 tumors harboring enhancer mutations, dysregulation is evident from the ASE in mutated tumors compared to the remainder (Figure 2C, FDR = 0.11, p = 1.4 × 10−8, Wilcoxon rank-sum test). Somatic non-coding mutations associated with dysregulation of ITPR3 were the next most confident finding; their distribution in relation to ITPR3 is shown in Figure 2D (FDR = 0.15, p = 1.4 × 10−6). ITPR3 mediates the release of intracellular calcium in response to IP3 (Yamamoto-Hino et al., 1994). It was recently implicated as the target of the tumor suppressor BAP1 that triggers apoptosis following exposure to genotoxic stress (Bononi et al., 2017). Gene set enrichment analysis revealed enriched interactions between the established breast cancer pathway and the genes dysregulated by these putative non-coding driver mutations (Figure S6, p = 0.044, KEGG ‘Breast Cancer’).

Somatic mutations in regulatory features are enriched for gene-level ASE in diverse tumors

To identify relevant non-coding somatic mutations in other cancer types, we applied our pipeline to 11 other cancer types that had a sufficient number of matched WGS, RNA-Seq, genotyping, and cancer-specific chromatin accessibility data (derived by ATAC-seq (Corces et al., 2018)) (Figure S1A). Overall we identified 320 mutations in 47 CREs associated with ASE of a nearby gene (Figures 2B and 3, Tables 1 and S4, FDR<0.25). We will collectively refer to these 47 mutated CREs as the “putative drivers”. The top ranked putative driver by FDR was SEMA6D in stomach adenocarcinoma (STAD) (FDR = 0.01). SEMA6D promotes survival and anchorage-independent growth of malignant pleural mesothelioma (Catalano et al., 2009). The second ranked putative driver by FDR was CBLB in acute myeloid leukemia (LAML) (FDR = 0.04). CBLB is an E3 ubiquitin ligase previously implicated in myeloid malignancies that helps to attenuate proliferative signals transduced by activated receptor tyrosine kinases (Makishima et al., 2009). Variants in the CTCF-bound regions of CBLB were prevalent, occurring in 12.2% of tumors (n = 5/41). Other notable examples of putative drivers based on prevalence include the CTCF-bound regions of FHIT in BRCA (11.5%; n = 13/113) and the CTCF-bound region of SEMA4D in lung adenocarcinoma (LUAD) (11.5%; n = 26/226). The putative drivers were generally associated with consistently skewed ASE across tumors (Figure S8). The transcript abundance of most genes exhibiting cis-dysregulation in association with somatic variants was unaffected (Figure 3F). Moreover, the coding regions of genes exhibiting altered cis-regulation are free of nonsense mutations, consistent with the ASE dysregulation occurring at the level of transcription rather than being a secondary consequence of nonsense-mediated decay. The majority of genes impacted by our putative drivers have been implicated previously in cancer, and compelling cases for their mechanistic roles as drivers are explored further below (see Discussion).

Figure 3.


Thirty-nine regulatory features harboring somatic mutations are enriched for ASE across 11 additional cancer types

(A) The number of somatic mutations in each tumor as well as the median number for each type of cancer.

(B) The number of somatic mutations in regulatory features that were tested in the ASE-Mutation association in each tumor as well as the median number for each type of cancer.

(C) The significance of gene-level associations between mutated regulatory features and ASE in relation to FDR in 11 additional cancer types. The association of gene-level ASE was evaluated with a Wilcoxon rank-sum test (see Figure S5 and Transparent methods for detail and Figure S7D for ‘n’ evaluated in each cancer). The associations were performed genome-wide and independently on each cancer type.

(D) The percentage of tumors where each mutated regulatory element is found. The percentage of mutated regulatory elements is presented for the cancer type where mutated regulatory elements were associated with ASE.

(E) The distribution of mutated regulatory features across tumors. The inset illustrates that the majority of samples do not harbor driver mutations.

(F) The abundance of most putative driver genes does not change.

See also Figures S7 and S8 as well as Table S4.

Table 1.

Annotated catalog of the 47 putative driver hotspots

| Cancer | Regulatory feature | Gene symbol | P | FDR | Carriers | ASE-CNV assoc. P |
|---|---|---|---|---|---|---|
| BRCA | CTCF | DAAM1 | 2.5 × 10−6 | 0.24 | 5 | 8.3 × 10−1 |
| BRCA | Promoter | UNC5B | 9.6 × 10−3 | 0.18 | 4 | 1.0# |
| BRCA | TF binding site | EHMT1 | 3.0 × 10−5 | 0.22 | 3 | ∗4.6 × 10−2 |
| BRCA | TF binding site | FHIT | 6.6 × 10−6 | 0.16 | 13 | 5.0 × 10−1 |
| BRCA | TF binding site | GSN | 5.6 × 10−5 | 0.21 | 3 | 3.6 × 10−1 |
| BRCA | TF binding site | ITPR3 | 1.3 × 10−6 | 0.15 | 5 | 7.2 × 10−1 |
| BRCA | TF binding site | NCOA3 | 3.8 × 10−5 | 0.21 | 3 | 1.0 |
| BRCA | TF binding site | TIMP3 | 1.3 × 10−8 | 0.11 | 3 | 4.7 × 10−1 |
| BRCA | TF binding site | VPS13B | 4.2 × 10−7 | 0.15 | 3 | 1.5 × 10−1 |
| BRCA | TF binding site | ZDHHC14 | 5.5 × 10−6 | 0.18 | 6 | 4.8 × 10−1 |
| HNSC | Promoter | WLS | 6.1 × 10−3 | 0.06 | 3 | ∗6.7 × 10−3 |
| LAML | CTCF | CBLB | 3.6 × 10−4 | 0.04 | 4 | 1.0# |
| LAML | CTCF | WAC | 4.3 × 10−2 | 0.24 | 3 | 1.0# |
| LUAD | 3′ UTR | ADAMTS2 | 1.7 × 10−8 | 0.23 | 4 | 3.6 × 10−1 |
| LUAD | Cancer-specific enhancer | PIP5K1B | 8.6 × 10−9 | 0.11 | 5 | 2.1 × 10−1 |
| LUAD | CTCF | BLNK | 1.6 × 10−6 | 0.16 | 6 | 4.7 × 10−1 |
| LUAD | CTCF | C16orf75 | 2.4 × 10−6 | 0.15 | 5 | 1.0# |
| LUAD | CTCF | C9orf95 | 5.4 × 10−5 | 0.17 | 8 | 1.0# |
| LUAD | CTCF | CEP192 | 9.1 × 10−6 | 0.16 | 3 | 9.0 × 10−1 |
| LUAD | CTCF | DSCR3 | 1.2 × 10−4 | 0.19 | 6 | 3.8 × 10−1 |
| LUAD | CTCF | ENC1 | 1.3 × 10−5 | 0.17 | 6 | 3.2 × 10−1 |
| LUAD | CTCF | ERI2 | 6.3 × 10−8 | 0.18 | 4 | 2.9 × 10−1 |
| LUAD | CTCF | FAM120A | 1.0 × 10−7 | 0.14 | 7 | 7.5 × 10−1 |
| LUAD | CTCF | FAM120AOS | 2.3 × 10−5 | 0.16 | 5 | 9.7 × 10−1 |
| LUAD | CTCF | GATA3 | 1.0 × 10−4 | 0.18 | 5 | 9.5 × 10−1 |
| LUAD | CTCF | HAUS8 | 3.1 × 10−5 | 0.15 | 2 | 2.3 × 10−1 |
| LUAD | CTCF | KIF11 | 1.5 × 10−6 | 0.18 | 12 | 1.4 × 10−3 |
| LUAD | CTCF | LPAR1 | 2.4 × 10−4 | 0.23 | 24 | 6.0 × 10−1 |
| LUAD | CTCF | TMEM147 | 4.4 × 10−7 | 0.15 | 3 | 1.0# |
| LUAD | CTCF | PRR14 | 8.4 × 10−5 | 0.19 | 5 | 1.0# |
| LUAD | CTCF | RRP1B | 2.5 × 10−5 | 0.15 | 3 | 1.0# |
| LUAD | CTCF | SEMA4D | 3.4 × 10−6 | 0.15 | 26 | 1.4 × 10−2 |
| LUAD | CTCF | SETD4 | 4.0 × 10−6 | 0.14 | 5 | 1.3 × 10−2 |
| LUAD | CTCF | TSPAN14 | 2.1 × 10−5 | 0.17 | 4 | 7.1 × 10−1 |
| LUAD | CTCF | TTC28 | 4.6 × 10−5 | 0.17 | 8 | 7.0 × 10−2 |
| LUAD | Accessible chromatin | GGPS1 | 6.4 × 10−4 | 0.23 | 3 | 4.7 × 10−1 |
| LUAD | Accessible chromatin | SND1 | 7.5 × 10−4 | 0.22 | 3 | 8.8 × 10−1 |
| LUAD | Promoter | SF3A1 | 4.4 × 10−10 | 0.11 | 3 | 5.8 × 10−1 |
| LUAD | TF binding site | C12orf5 | 1.9 × 10−10 | 0.14 | 3 | 1.0# |
| LUAD | TF binding site | CREBL2 | 3.5 × 10−8 | 0.14 | 4 | 7.7 × 10−1 |
| LUAD | TF binding site | DNAJC5 | 9.7 × 10−9 | 0.17 | 3 | 8.9 × 10−3 |
| LUAD | TF binding site | ITGAE | 1.9 × 10−8 | 0.17 | 11 | 1.9 × 10−1 |
| LUAD | TF binding site | TTC23 | 4.3 × 10−7 | 0.23 | 6 | 2.2 × 10−2 |
| LUAD | TF binding site | VPS16 | 6.8 × 10−8 | 0.15 | 7 | 4.3 × 10−1 |
| SKCM | CTCF | FRMD4A | 3.3 × 10−2 | 0.15 | 3 | ∗3.7 × 10−2 |
| STAD | 3′ UTR | SEMA6D | 2.1 × 10−4 | 0.01 | 3 | 3.9 × 10−1 |
| STAD | TF binding site | TSHZ2 | 2.7 × 10−6 | 0.10 | 4 | ∗1.5 × 10−2 |

Note: ∗Among the 7 genes where ASE is associated with CNV, tumors coincidentally harboring driver mutations and CNV occurred in SEMA4D, SETD4, DNAJC5, and TTC23 of LUAD and TSHZ2 of STAD. SEMA4D was only nominally significant (p = 0.03) after exclusion of CNV carriers. TSHZ2 had one tumor carrying both a driver mutation and CNV; however, the association between mutations and ASE remained significant for TSHZ2 after excluding that tumor (p = 2.2 × 10−5). SETD4, DNAJC5, and TTC23 were not significant after excluding CNV carriers in the ASE-Mut association, with p values of 0.20, 0.48, and 0.76, respectively. #When a driver gene does not have multiple samples harboring CNV for the association test between ASE and CNV, the association is assigned as p = 1.

CNV, copy number variation; P, ASE-mutation association p value; ASE-CNV Assoc P, ASE-CNV association p value; FDR, false discovery rate; Carriers, driver mutation carriers.

Candidate drivers have elevated variant allele frequencies

By definition, driver mutations confer a selective advantage to the cells in which they occur. Variant allele frequency (VAF) measures the fraction of alleles in a sample in which the variant is present. Hence, if a mutation confers a selective advantage to the cell in which it occurs, its VAF would be higher, on average, than that of passenger mutations that arose coincidentally. A corollary is that mutations with increased VAF occurred early enough during tumor evolution for this selective advantage to manifest as increased VAF. To ask whether the putative driver mutations conferred a selective advantage, we compared the normalized VAF of all putative drivers to that of all non-coding mutations that were not enriched for ASE (p > 0.5). As a positive control, we used known coding driver mutations (Schroeder et al., 2014). As expected, we found that the VAF of known coding drivers (n = 116) was, on average, higher than that of background mutations in coding regions (Figure 4A, p value = 2.9 × 10−6, n = 2,971). Importantly, we found that the VAF of our candidate drivers was also higher (Figures 4A and 4B), an effect that is independent of CNV based on the stable ratio of adjacent heterozygous SNPs (Tables 1 and S4).
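A small sketch shows how a per-patient normalized VAF could be computed. The raw VAF follows the definition used here (mutant reads over all reads covering the site). The normalization, however, is an assumption: this example divides each VAF by the median VAF of all mutations in the same patient, and the actual scheme in the Transparent methods may differ. The function name and input tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def normalized_vafs(mutations):
    """Compute raw and patient-normalized variant allele frequencies.

    mutations: iterable of (patient_id, alt_reads, total_reads).
    Returns a list of (patient_id, raw_vaf, normalized_vaf), where the
    normalized value is raw_vaf divided by that patient's median VAF
    (an assumed normalization, chosen only for illustration).
    """
    rows = []
    by_patient = defaultdict(list)
    for patient, alt_reads, total_reads in mutations:
        if total_reads <= 0:
            continue  # no coverage, VAF undefined
        vaf = alt_reads / total_reads
        rows.append((patient, vaf))
        by_patient[patient].append(vaf)

    medians = {p: median(v) for p, v in by_patient.items()}
    return [
        (p, vaf, vaf / medians[p] if medians[p] > 0 else float("nan"))
        for p, vaf in rows
    ]
```

Normalizing within each patient is what makes VAFs comparable across tumors of differing purity and heterogeneity; a normalized value above 1 then marks a mutation that is more prevalent than the typical mutation in that tumor.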

Figure 4.


Variant allele frequency (VAF) of putative non-coding drivers suggests positive selection

VAF was calculated as the fraction of all sequencing reads covering a variant position that carry the mutation, and was normalized against all mutations within each patient to account for differences in tumor heterogeneity.

(A) Coding Drivers (n = 116) represent all mutations within known driver genes that yield a functional amino acid change and Coding Bcg (n = 2,971) represents identically selected mutations in all other coding genes (Schroeder et al., 2014). Putative non-coding drivers represent all mutations from Table S4 (n = 320) and Non-coding Bcg represents all non-coding mutations not enriched for ASE (n = 122,603). Both Coding and Non-coding VAFs are positively shifted relative to background (p = 2.9 × 10−6, p = 1.4 × 10−7, 2-tailed Student's t-test, equal variance). The boxplots are delimited by the first and the third quartile, red lines indicate medians, whiskers encompass minimum and maximum data and red points indicate outliers. Please see Transparent methods for more details.

(B) Same as (A) with putative non-coding drivers divided by feature. The number of putative non-coding driver and background mutations, as well as the p value (2-tailed Student's t-test, equal variance) comparing the VAF between putative driver and background mutations for each feature are: promoters (n = 10, n = 1,747, p = 0.19), cancer-specific enhancers (n = 5, n = 2,329, p = 0.22), CTCF binding sites (n = 208, n = 26,128, p = 5.03 × 10−11), TF binding sites (n = 88, n = 50,202, p = 1.12 × 10−7), accessible chromatin (n = 6, n = 583, p = 0.50), 3′ UTRs (n = 8, n = 1,661, p = 0.86).

Candidate drivers disrupt transcription factor binding motifs

To further explore functional evidence supporting our non-coding driver mutations (Table 1), we asked whether they may be impacting DNA binding of transcription factors. Two features, specifically, might be expected to reflect this type of mechanism: TF binding sites and cancer-specific enhancers. We therefore limited our analysis to these features. Transcription factor binding affinities are typically represented by a generalized position-weight matrix (PWM) that represents a motif and a probability of observing any of the four bases at each position in that motif. These probabilities are typically constructed from observed frequencies of genuine binding events and can be represented by bit-scores. A bit-score of 2 implies that a particular base is always found at that position. We first scanned the genome for 392 transcription factor motifs and overlapped them with mutations (not filtered on ASE). Each motif had between 2 and 2,000 mutations (Figure 5A) and the number of mutations directly correlated with the number of motifs found in the genome (Figure 5B inset). As a preliminary analysis, we asked whether driver mutations (Table 1) are enriched in any putative transcription factor motifs. We observed a compelling enrichment of mutations among nucleotides important for transcription factor binding (Figure 5B). The challenge with relying on PWMs exclusively to identify transcription factor binding is that there is typically insufficient information to distinguish genuine binding sites from the many possible motif sequence matches in the genome. Here, we relied on the built-in component of our analysis to only consider mutations in functionally annotated regions. To further enrich for genuine transcription factor binding sites, we took advantage of ATAC-Seq data (Corces et al., 2018). Specifically, we considered motifs within open chromatin (i.e. in ATAC-Seq peaks) that were within 5 kb of a TSS. As expected, mutations tended to disrupt overall binding affinity (i.e. the shift in bit-score between reference and mutated bases across all LUAD-ATAC-TSS-filtered mutations is overall negative; Figure 5C). Interestingly, driver mutations have an even stronger shift (Figure 5D). We did not see a significant difference in any other features, as expected, and also note that the numbers of driver mutations severely restricted our statistical power when exploring subsets of driver mutations outside of LUAD. It is also possible that improving binding (i.e. a positive delta-bit) could cause ASE, but we are not powered to explore this possibility.
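The bit-score and delta-bit quantities can be made concrete with a short sketch. It assumes the common sequence-logo convention, in which the information content of a DNA motif position is 2 + Σ p·log2(p) bits and each base contributes its probability times that information content; under this convention a base that is always observed scores exactly 2 bits. Whether the analysis here used precisely this scoring is not stated, so the function names and the delta definition below are illustrative assumptions.

```python
import math


def information_content(column):
    """Information (bits) at one motif position; column maps base -> probability."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def base_bits(column, base):
    """Bit-score of one base: its probability scaled by the position's information."""
    return column[base] * information_content(column)


def delta_bits(pwm, offset, ref, alt):
    """Change in bit-score when ref is mutated to alt at a motif position.

    pwm: list of columns, each a dict with keys 'A', 'C', 'G', 'T'.
    A negative value means the mutation moves toward a less favored base.
    """
    column = pwm[offset]
    return base_bits(column, alt) - base_bits(column, ref)


# Hypothetical three-position motif used only to exercise the functions.
example_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # invariant A: 2 bits
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},      # A or C: 1 bit of information
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative: 0 bits
]
```

With this toy motif, mutating the invariant A at position 0 gives a delta of −2 bits, while any substitution at the uninformative last position gives 0, matching the intuition that only mutations at informative positions should shift predicted binding.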

Figure 5.


Driver mutations disrupt transcription factor binding motifs

(A) Number of overlapping mutation/motifs across 392 genome-mapped position-weight matrices (PWMs). TF binding site mutations outnumber cancer-specific enhancer driver mutations (inset).

(B) Putative driver mutations are overrepresented in several genome-mapped motif sets and disrupt important binding residues within the motifs of several transcription factors. The enrichment of mutations disrupting transcription factor binding motifs relative to background mutations was evaluated using a chi-squared test.

(C) The frequency at which mutations enhance or disrupt transcription factor binding motifs, evaluated as changes in PWM bits. Most mutations lead to a lower-affinity PWM.

(D) LUAD driver mutations (TF binding site and cancer-specific enhancers merged) within high-likelihood functionally bound motifs (ATAC-Seq support, within 5 kb of TSS; n = 18) result in a stronger shift than matched background (equivalent selection criteria except no association with ASE; n = 4,179). p values are two-tailed and computed using a Wilcoxon rank-sum test. The boxplots are delimited by the first and the third quartile, red lines indicate medians, and whiskers encompass minimum and maximum data.

The expression changes of putative driver genes are consistent with roles in tumorigenesis

Next, we considered how cis-regulatory drivers might contribute to tumorigenesis. The mechanism of non-coding drivers might be ectopic target expression or altered target expression kinetics; however, since altered target abundance is a common driver mechanism (Bailey et al., 2018), we leveraged DepMap to ask whether the upregulated target genes associated with predicted drivers were oncogenic and vice-versa (Figure 6A) (Meyers et al., 2017).

Figure 6.


Transcript abundance of regulatory variant targets aligns with roles in tumorigenesis

(A) The expression change associated with putative driver mutations. The fold-change of target genes compares expression in tumors harboring putative drivers relative to the remaining tumors of the same type. Upregulated and downregulated target genes were inferred to be oncogenic and tumor suppressors respectively.

(B) The fitness impact of deleting predicted driver genes. Deletion of 40 coding oncogenes decreases fitness, while deletion of 91 coding tumor suppressor genes (TSGs) increases fitness across 808 cell lines. Deletion of target genes associated with predicted non-coding drivers had fitness effects equivalent to those of coding drivers: deletion of putative oncogenes reduced fitness (n = 26) and deletion of putative TSGs increased fitness (n = 20, 2-sided unpaired t test for unequal variance, p values are as shown). The boxplots are delimited by the first and the third quartile, red lines indicate medians, whiskers encompass minimum and maximum data and red points indicate outliers.

(C and D) The fitness impact of deleting putative TSGs ADAMTS2 and SEMA4D. The boxplot features are the same as panel 'B'.

(E) Kaplan-Meier plot of LUAD patients (n = 515) with low expression of SEMA4D (n = 75) compared to the remaining patients. Low expression of SEMA4D is marginally associated with worse overall survival (p = 0.06; log rank test; ‘+’ indicates censored data).

In an effort to reveal the contribution of genes targeted by predicted drivers to tumorigenesis, we asked how their deletion affects cellular expansion (fitness). As expected, we found that deletion of coding oncogenes decreased fitness and deletion of TSGs increased fitness (Figure 6B, 2-sided unpaired t test for unequal variance, p < 1 × 10^−200 (oncogenes), p = 9.2 × 10^−89 (TSGs)). Deletion of target genes associated with predicted drivers mirrored the effects of known coding drivers: deletion of upregulated genes led to decreased fitness, and fitness increased following deletion of genes downregulated in association with predicted drivers (Figures 6B–6D, 2-sided unpaired t test for unequal variance, p = 4.9 × 10^−185 (oncogenes), p = 1.7 × 10^−80 (TSGs)).
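The comparisons above use an unpaired t test for unequal variance (Welch's t test). As a minimal sketch, assuming two lists of per-observation fitness scores, the statistic and the Welch-Satterthwaite degrees of freedom can be computed with the Python standard library. The function name and the toy scores are hypothetical, and the sketch stops short of the p value, which requires the Student t distribution (available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)
    for two independent samples with possibly unequal variances."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy fitness scores (hypothetical): more negative = larger fitness loss.
oncogene_deletion = [-0.9, -0.7, -1.1, -0.8, -1.0]
background = [-0.1, 0.0, -0.2, 0.1, -0.05]
t_stat, dof = welch_t(oncogene_deletion, background)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A negative t statistic here indicates that deleting the putative oncogenes reduces fitness relative to the background set.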

The deletion of target genes in cell lines harboring predicted driver mutations is consistent with tumorigenic roles. Evaluating the oncogenic and suppressive potential of predicted driver genes across all CCLE lines provided a robust measurement of each gene's fitness contribution; however, it does not capture the role of target genes in lines where the target gene might be a driver. To identify CCLE lines potentially driven by the predicted non-coding drivers, we screened CCLE lines for identical mutations. In 3 of 4 cell lines harboring variants identical to predicted driver mutations, deletion of target genes had the expected impact on fitness (e.g., Figures 6C and 6D). While most drivers are not associated with a clinical outcome (Smith and Sheltzer, 2018), we also found predicted drivers associated with expression changes relevant to patient survival. For example, predicted driver variants decreased SEMA4D expression, and low SEMA4D expression is marginally associated with worse overall survival (Figure 6E, p = 0.06). Collectively, these analyses are consistent with many predicted drivers promoting tumorigenesis by altering target abundance.
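The survival comparison in Figure 6E rests on Kaplan-Meier curves. For readers unfamiliar with the estimator, the following is a minimal sketch in plain Python, assuming one follow-up time and one event flag per patient. The function name and the toy data are hypothetical, and the log rank test used to compare the two groups is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True for an observed death and False for a censored patient
    (censored observations are the '+' marks on a Kaplan-Meier plot)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        removed = sum(1 for tt, _ in data if tt == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
        i += removed
    return curve


# Toy cohort (hypothetical): times in months, False = censored.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Censored patients lower the number at risk without producing a step in the curve, which is why the second patient above changes later estimates but adds no point of its own.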

Discussion

The association between mutated CREs and gene expression altered in cis revealed 47 clusters of non-coding mutations across 12 cancer types that exhibit the hallmarks of driver mutations. These 47 mutation hotspots significantly expand the landscape of putative non-coding cancer drivers. Prior approaches did not reveal the majority of our findings, although there is partial overlap with previous non-coding driver discoveries. For example, we found enriched cis-regulatory mutations in the CTCF binding sites of DAAM1 in BRCA. DAAM1 is a member of the formin protein family activated by Disheveled binding (Liu et al., 2008). It regulates cytoskeletal dynamics through its control of linear actin assembly (Li et al., 2011). Regulatory mutations in DAAM1 were recently implicated in invasiveness of melanoma (Zhang et al., 2018). Our findings also overlap previous reports in that somatic mutations in other regulatory regions of the same genes in the same type of cancer have been implicated as drivers. For example, mutations in the splice-acceptor site of GATA3 were previously implicated in LUAD (Hornshoj et al., 2018). Here we implicated promoter mutations in GATA3 in LUAD. This overlap suggests that the consequences of mutated regulatory features may overlap in these cases, and that combining the association of distinct features that regulate the same gene may increase sensitivity.

Many of the genes impacted by the putative non-coding drivers discovered here (Tables 1 and S4) have been previously implicated in cancer biology. The predicted non-coding drivers disproportionately impact established coding drivers (hypergeometric, p = 0.006). The target genes impacted by non-coding drivers include the COSMIC genes CBLB, FHIT, GATA3, and SND1. The genes associated with mutated CREs in BRCA illustrate how driver roles clearly tie into the established functions of the dysregulated genes. NCOA3 is a transcriptional co-activator that is alternatively known as Amplified in Breast 1 (AIB-1) after its amplification and increased abundance were discovered in breast cancer (Anzick et al., 1997). NCOA3 enhances estrogen-dependent transcription (Anzick et al., 1997). In this analysis, NCOA3 was neither amplified nor elevated in abundance in the tumors with putative driver mutations, but our approach still implicated it in BRCA based on the association of somatic mutations with its dysregulation (Table 1). EHMT1 provides insight into the observation that the total abundance of most genes dysregulated in association with non-coding mutations is unchanged. EHMT1 represses transcription by methylating H3K9 residues in conjunction with EHMT2 (G9a in mice) (Tachibana et al., 2005). EHMT1/2, complexed with E2F6 and polycomb proteins, preferentially occupies promoters in the G0 phase of the cell cycle and is associated with cellular quiescence (Ogawa et al., 2002; Tachibana et al., 2005), suggesting that non-coding mutations associated with EHMT1 might disrupt its coordination with the cell cycle as opposed to its abundance. Conversely, FHIT is one of the few putative drivers that is both dysregulated and differentially expressed. Consistent with the observed decrease in expression, FHIT is an established tumor suppressor gene (Waters et al., 2014).
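The enrichment of established coding drivers among the target genes is reported as a hypergeometric p value. As a minimal sketch of how such an upper-tail probability can be computed, the function below uses only the Python standard library. The function name and all counts in the example are hypothetical; the gene universe and overlap counts behind the reported p = 0.006 are not restated here.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a universe
    of N genes, K of which are established drivers."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total


# Hypothetical counts: 10 genes in the universe, 3 known drivers,
# 4 target genes, at least 2 of which are known drivers.
print(hypergeom_upper_tail(k=2, N=10, K=3, n=4))
```

The one-sided upper tail is used because the question is whether the overlap is larger than expected by chance.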

Extensive cis-regulatory changes occur during tumorigenesis that are unrelated to copy number variation. In contrast with previous reports in different tumor types, CNVs were not responsible for the majority of altered cis-regulation in BRCA tumors (Mayba et al., 2014). Somatic regulatory variants are a major source of altered cis-regulation in tumors. Somatic variants in non-coding regions that are enriched for altered cis-regulation were found in 11.9% of the tumors analyzed. This high prevalence is predicted by multi-hit models as well as by divergent phenotypes between tumors with common known drivers. While many of the associations involve genes thought to be involved in tumorigenesis, the implication of specific mutations and regulatory features is a mechanistic advance. Indeed, we are not aware of any of the specific mutated regulatory features reported here previously being implicated as drivers of tumorigenesis.

Although TCGA and other emerging cancer data now include >1000 available genomes, illuminating the complete set of non-coding drivers will require a substantially broader collection. Even with the approach employed here of focusing on functional somatic variants with underlying evidence of gene expression regulation, we found ourselves limited by statistical power, especially in cancer types with fewer than 100 genomes. Deeper genome sequencing with longer reads will also improve driver detection sensitivity by enabling phasing of mutations with the direction of ASE. This would allow more evidence to be used to prioritize genuine drivers (e.g., disruption of an activating transcription factor binding site should reduce expression of that allele). This was generally not possible with the currently available data, since accurate phasing of somatic variants more than a few hundred base pairs away from the gene would require long-read technology or much deeper coverage. Improved matching of the regulatory features to each cell type will also improve sensitivity. When possible, cellular context was prioritized throughout these analyses to account for context-specific aspects of gene regulation. For example, enhancers were matched to the cancer type being analyzed (Corces et al., 2018), and each cancer was analyzed separately in parallel. However, enhancer-to-gene maps are still incomplete and will no doubt improve as chromatin accessibility readouts expand. In any case, we believe our approach, made freely available as a dockerized pipeline (see Transparent methods), will be a powerful tool for taking advantage of these emerging resources and building on our discoveries.

Limitations of the study

The greatest limitation currently hindering our ability to apply our approach to broadly map cis-acting regulatory mutations across cancer is the limited availability of matched WGS and RNA-Seq data. Most RNA-Seq samples in TCGA do not have WGS data at all, and many WGS samples have relatively low coverage, which makes identifying somatic mutations in regulatory regions difficult. Statistical power is insufficient to detect meaningful associations when only dozens (sometimes hundreds) of samples within each cancer type have WGS. A second, related limitation is the current difficulty in phasing somatic variants captured only by WGS into the haplotypes against which ASE is ascertained. This adds complexity to characterizing the putative causal mechanism of ASE associated with a specific mutation. Long reads are a potential solution, but even increasing the coverage of paired-end short reads via WGS could dramatically improve phasing from overlapping reads.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Brian DeVeale (brian.deveale@ucsf.edu).

Materials availability

All of the data analyzed in this study was generated and made accessible by TCGA.

Data and code availability

We have made all of the code written and used in this analysis freely and publicly available. Details are described in the Transparent methods section. Our Driver-ASE package is available via GitHub (https://github.com/MichealRollins-Green/Driver-ASE) and as a Docker image (https://hub.docker.com/r/mikegreen24/driver-ase). All raw gene-level ASE and somatic mutations called in this analysis can be accessed via Mendeley Data: https://data.mendeley.com/datasets/4kx5sfx9vz/2.

Driver-ASE uses data or software provided by the following websites:

UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTables), Genomic Data Commons (https://gdc.cancer.gov/), Genomic Data Commons (https://portal.gdc.cancer.gov/legacy-archive/search/f), The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov), PLINK (www.cog-genomics.org/plink), NIH Roadmap Epigenomics Mapping Consortium (www.roadmapepigenomics.org), SAMtools (www.htslib.org), overlapSelect (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/overlapSelect), Varscan2 (http://massgenomics.org/varscan), impute2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html), and shapeit (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html).

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.

Acknowledgments

We thank the Canadian Cancer Society, Canada, for the funding that enabled this research.

This study was supported by Canadian Cancer Society (CBCF grant BC-RG-15-2, PI: T.B.). No funding sources were involved in study design, data collection and interpretation, or the decision to submit the work for publication.

Author contributions

T.B. and B.D. designed the study. T.B., B.D., Z.C., M.V., and M.R. analyzed the data and wrote the manuscript.

Ethics approval and consent to participate

We agreed to and followed the TCGA data use agreement.

Consent for publication

TCGA obtained informed consent from participants for all of the donated specimens. TCGA uses the informed consent guidelines developed by NCI and NHGRI as detailed on their website.

Declaration of interests

The authors have no conflicts of interest to declare.

Published: March 19, 2021

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2021.102144.

Contributor Information

Brian DeVeale, Email: brian.deveale@ucsf.edu.

Tomas Babak, Email: tomas.babak@queensu.ca.

Supplemental information

Document S1. Transparent methods and Figures S1–S8
mmc1.pdf (5.3MB, pdf)
Table S1. A table of known non-coding drivers, related to Figures 2 and 3
mmc2.xlsx (15.7KB, xlsx)
Table S2. Annotation of the samples used in this analysis, related to Figures 2 and 3
mmc3.xlsx (107.5KB, xlsx)
Table S3. Annotation of the samples used to generate the meta-track of TF binding sites, related to Figures 2 and 3
mmc4.xlsx (20.9KB, xlsx)
Table S4. Annotated catalog of the 320 potential driver mutations among 47 putative driver hotspots, related to Figures 2 and 3
mmc5.xlsx (17.5KB, xlsx)

References

  1. Anand-Apte B., Bao L., Smith R., Iwata K., Olsen B.R., Zetter B., Apte S.S. A review of tissue inhibitor of metalloproteinases-3 (TIMP-3) and experimental analysis of its effect on primary tumor growth. Biochem. Cell Biol. 1996;74:853–862. doi: 10.1139/o96-090.
  2. Anzick S.L., Kononen J., Walker R.L., Azorsa D.O., Tanner M.M., Guan X.Y., Sauter G., Kallioniemi O.P., Trent J.M., Meltzer P.S. AIB1, a steroid receptor coactivator amplified in breast and ovarian cancer. Science. 1997;277:965–968. doi: 10.1126/science.277.5328.965.
  3. Babak T., DeVeale B., Tsang E.K., Zhou Y., Li X., Smith K.S., Kukurba K.R., Zhang R., Li J.B., van der Kooy D. Genetic conflict reflected in tissue-specific maps of genomic imprinting in human and mouse. Nat. Genet. 2015;47:544–549. doi: 10.1038/ng.3274.
  4. Bailey M.H., Tokheim C., Porta-Pardo E., Sengupta S., Bertrand D., Weerasinghe A., Colaprico A., Wendl M.C., Kim J., Reardon B. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;173:371–385.e18. doi: 10.1016/j.cell.2018.02.060.
  5. Beerenwinkel N., Antal T., Dingli D., Traulsen A., Kinzler K.W., Velculescu V.E., Vogelstein B., Nowak M.A. Genetic progression and the waiting time to cancer. PLoS Comput. Biol. 2007;3:e225. doi: 10.1371/journal.pcbi.0030225.
  6. Bononi A., Giorgi C., Patergnani S., Larson D., Verbruggen K., Tanji M., Pellegrini L., Signorato V., Olivetto F., Pastorino S. BAP1 regulates IP3R3-mediated Ca(2+) flux to mitochondria suppressing cell transformation. Nature. 2017;546:549–553. doi: 10.1038/nature22798.
  7. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014;511:543–550. doi: 10.1038/nature13385.
  8. Cancer Genome Atlas Research Network. The molecular taxonomy of primary prostate cancer. Cell. 2015;163:1011–1025. doi: 10.1016/j.cell.2015.10.025.
  9. Carter H., Chen S., Isik L., Tyekucheva S., Velculescu V.E., Kinzler K.W., Vogelstein B., Karchin R. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 2009;69:6660–6667. doi: 10.1158/0008-5472.CAN-09-1133.
  10. Catalano A., Lazzarini R., Di Nuzzo S., Orciari S., Procopio A. The plexin-A1 receptor activates vascular endothelial growth factor-receptor 2 and nuclear factor-kappaB to mediate survival and anchorage-independent growth of malignant mesothelioma cells. Cancer Res. 2009;69:1485–1493. doi: 10.1158/0008-5472.CAN-08-3659.
  11. Corces M.R., Granja J.M., Shams S., Louie B.H., Seoane J.A., Zhou W., Silva T.C., Groeneveld C., Wong C.K., Cho S.W. The chromatin accessibility landscape of primary human cancers. Science. 2018;362:eaav1898. doi: 10.1126/science.aav1898.
  12. Foo J., Liu L.L., Leder K., Riester M., Iwasa Y., Lengauer C., Michor F. An evolutionary approach for identifying driver mutations in colorectal cancer. PLoS Comput. Biol. 2015;11:e1004350. doi: 10.1371/journal.pcbi.1004350.
  13. Fraser H.B. Genome-wide approaches to the study of adaptive gene expression evolution: systematic studies of evolutionary adaptations involving gene expression will allow many fundamental questions in evolutionary biology to be addressed. Bioessays. 2011;33:469–477. doi: 10.1002/bies.201000094.
  14. Fredriksson N.J., Ny L., Nilsson J.A., Larsson E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 2014;46:1258–1263. doi: 10.1038/ng.3141.
  15. Fu Y., Liu Z., Lou S., Bedford J., Mu X.J., Yip K.Y., Khurana E., Gerstein M. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 2014;15:480. doi: 10.1186/s13059-014-0480-5.
  16. Gerstung M., Pellagatti A., Malcovati L., Giagounidis A., Porta M.G., Jadersten M., Dolatshad H., Verma A., Cross N.C., Vyas P. Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes. Nat. Commun. 2015;6:5901. doi: 10.1038/ncomms6901.
  17. PCAWG Transcriptome Core Group, Calabrese C., Davidson N.R., Demircioglu D., Fonseca N.A., He Y., Kahles A., Lehmann K.V., Liu F., Shiraishi Y. Genomic basis for RNA alterations in cancer. Nature. 2020;578:129–136. doi: 10.1038/s41586-020-1970-0.
  18. Hornshoj H., Nielsen M.M., Sinnott-Armstrong N.A., Switnicki M.P., t Juul M., Madsen T., Sallari R., Kellis M., Orntoft T., Hobolth A. Pan-cancer screen for mutations in non-coding elements with conservation and cancer specificity reveals correlations with expression and survival. NPJ Genom Med. 2018;3:1. doi: 10.1038/s41525-017-0040-5.
  19. Howie B.N., Donnelly P., Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. doi: 10.1371/journal.pgen.1000529.
  20. Kalender Atak Z., Imrichova H., Svetlichnyy D., Hulselmans G., Christiaens V., Reumers J., Ceulemans H., Aerts S. Identification of cis-regulatory mutations generating de novo edges in personalized cancer gene regulatory networks. Genome Med. 2017;9:80. doi: 10.1186/s13073-017-0464-7.
  21. Kandoth C., McLellan M.D., Vandin F., Ye K., Niu B., Lu C., Xie M., Zhang Q., McMichael J.F., Wyczalkowski M.A. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–339. doi: 10.1038/nature12634.
  22. Khurana E., Fu Y., Chakravarty D., Demichelis F., Rubin M.A., Gerstein M. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 2016;17:93–108. doi: 10.1038/nrg.2015.17.
  23. Kulik G.I., Pel'kis F.P., Korol V.I. Adaptation of the body to alkylating anti-tumor substances. Eksp. Onkol. 1989;11:34–38.
  24. Li D., Hallett M.A., Zhu W., Rubart M., Liu Y., Yang Z., Chen H., Haneline L.S., Chan R.J., Schwartz R.J. Dishevelled-associated activator of morphogenesis 1 (Daam1) is required for heart morphogenesis. Development. 2011;138:303–315. doi: 10.1242/dev.055566.
  25. Liu W., Sato A., Khadka D., Bharti R., Diaz H., Runnels L.W., Habas R. Mechanism of activation of the formin protein Daam1. Proc. Natl. Acad. Sci. U S A. 2008;105:210–215. doi: 10.1073/pnas.0707277105.
  26. Makishima H., Cazzolli H., Szpurka H., Dunbar A., Tiu R., Huh J., Muramatsu H., O'Keefe C., Hsi E., Paquette R.L. Mutations of e3 ubiquitin ligase cbl family members constitute a novel common pathogenic lesion in myeloid malignancies. J. Clin. Oncol. 2009;27:6109–6116. doi: 10.1200/JCO.2009.23.7503.
  27. Mathelier A., Lefebvre C., Zhang A.W., Arenillas D.J., Ding J., Wasserman W.W., Shah S.P. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biol. 2015;16:84. doi: 10.1186/s13059-015-0648-7.
  28. Mayba O., Gilbert H.N., Liu J., Haverty P.M., Jhunjhunwala S., Jiang Z., Watanabe C., Zhang Z. MBASED: allele-specific expression detection in cancer tissues and cell lines. Genome Biol. 2014;15:405. doi: 10.1186/s13059-014-0405-3.
  29. Melton C., Reuter J.A., Spacek D.V., Snyder M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 2015;47:710–716. doi: 10.1038/ng.3332.
  30. Merid S.K., Goranskaya D., Alexeyenko A. Distinguishing between driver and passenger mutations in individual cancer genomes by network enrichment analysis. BMC Bioinformatics. 2014;15:308. doi: 10.1186/1471-2105-15-308.
  31. Meyers R.M., Bryan J.G., McFarland J.M., Weir B.A., Sizemore A.E., Xu H., Dharia N.V., Montgomery P.G., Cowley G.S., Pantel S. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 2017;49:1779–1784. doi: 10.1038/ng.3984.
  32. Nik-Zainal S., Davies H., Staaf J., Ramakrishna M., Glodzik D., Zou X., Martincorena I., Alexandrov L.B., Martin S., Wedge D.C. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature. 2016;534:47–54. doi: 10.1038/nature17676.
  33. Ogawa H., Ishiguro K., Gaubatz S., Livingston D.M., Nakatani Y. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science. 2002;296:1132–1136. doi: 10.1126/science.1069861.
  34. Ongen H., Andersen C.L., Bramsen J.B., Oster B., Rasmussen M.H., Ferreira P.G., Sandoval J., Vidal E., Whiffin N., Planchon A. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014;512:87–90. doi: 10.1038/nature13602.
  35. Perera D., Chacon D., Thoms J.A., Poulos R.C., Shlien A., Beck D., Campbell P.J., Pimanda J.E., Wong J.W. OncoCis: annotation of cis-regulatory mutations in cancer. Genome Biol. 2014;15:485. doi: 10.1186/s13059-014-0485-0.
  36. Piraino S.W., Furney S.J. Identification of coding and non-coding mutational hotspots in cancer genomes. BMC Genomics. 2017;18:17. doi: 10.1186/s12864-016-3420-9.
  37. Poulos R.C., Sloane M.A., Hesson L.B., Wong J.W. The search for cis-regulatory driver mutations in cancer genomes. Oncotarget. 2015;6:32509–32525. doi: 10.18632/oncotarget.5085.
  38. Puente X.S., Bea S., Valdes-Mas R., Villamor N., Gutierrez-Abril J., Martin-Subero J.I., Munar M., Rubio-Perez C., Jares P., Aymerich M. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature. 2015;526:519–524. doi: 10.1038/nature14666.
  39. Schroeder M.P., Rubio-Perez C., Tamborero D., Gonzalez-Perez A., Lopez-Bigas N. OncodriveROLE classifies cancer driver genes in loss of function and activating mode of action. Bioinformatics. 2014;30:i549–i555. doi: 10.1093/bioinformatics/btu467.
  40. Sjoblom T., Jones S., Wood L.D., Parsons D.W., Lin J., Barber T.D., Mandelker D., Leary R.J., Ptak J., Silliman N. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–274. doi: 10.1126/science.1133427.
  41. Smith J.C., Sheltzer J.M. Systematic identification of mutations and copy number alterations associated with cancer patient prognosis. Elife. 2018;7:e39217. doi: 10.7554/eLife.39217.
  42. Smith K.S., Yadav V.K., Pedersen B.S., Shaknovich R., Geraci M.W., Pollard K.S., De S. Signatures of accelerated somatic evolution in gene promoters in multiple cancer types. Nucleic Acids Res. 2015;43:5307–5317. doi: 10.1093/nar/gkv419.
  43. Svetlichnyy D., Imrichova H., Fiers M., Kalender Atak Z., Aerts S. Identification of high-impact cis-regulatory mutations using transcription factor specific random forest models. PLoS Comput. Biol. 2015;11:e1004590. doi: 10.1371/journal.pcbi.1004590.
  44. Tachibana M., Ueda J., Fukuda M., Takeda N., Ohta T., Iwanari H., Sakihama T., Kodama T., Hamakubo T., Shinkai Y. Histone methyltransferases G9a and GLP form heteromeric complexes and are both crucial for methylation of euchromatin at H3-K9. Genes Dev. 2005;19:815–826. doi: 10.1101/gad.1284005.
  45. Verhaak R.G., Hoadley K.A., Purdom E., Wang V., Qi Y., Wilkerson M.D., Miller C.R., Ding L., Golub T., Mesirov J.P. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell. 2010;17:98–110. doi: 10.1016/j.ccr.2009.12.020.
  46. Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Jr., Kinzler K.W. Cancer genome landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122.
  47. Waters C.E., Saldivar J.C., Hosseini S.A., Huebner K. The FHIT gene product: tumor suppressor and genome "caretaker". Cell. Mol. Life Sci. 2014;71:4577–4587. doi: 10.1007/s00018-014-1722-0.
  48. Weinhold N., Jacobsen A., Schultz N., Sander C., Lee W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 2014;46:1160–1165. doi: 10.1038/ng.3101.
  49. Yamamoto-Hino M., Sugiyama T., Hikichi K., Mattei M.G., Hasegawa K., Sekine S., Sakurada K., Miyawaki A., Furuichi T., Hasegawa M. Cloning and characterization of human type 2 and type 3 inositol 1,4,5-trisphosphate receptors. Recept. Channels. 1994;2:9–22.
  50. Zhang W., Bojorquez-Gomez A., Velez D.O., Xu G., Sanchez K.S., Shen J.P., Chen K., Licon K., Melton C., Olson K.M. A global transcriptional network connecting noncoding mutations to changes in tumor gene expression. Nat. Genet. 2018;50:613–620. doi: 10.1038/s41588-018-0091-2.

Articles from iScience are provided here courtesy of Elsevier
