Skip to main content
eBioMedicine logoLink to eBioMedicine
. 2017 Dec 1;27:156–166. doi: 10.1016/j.ebiom.2017.11.028

Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response

Magali Champion a,1, Kevin Brennan a,1, Tom Croonenborghs b,c, Andrew J Gentles d, Nathalie Pochet b, Olivier Gevaert a,
PMCID: PMC5828545  PMID: 29331675

Abstract

The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes through integration of copy number, DNA methylation and gene expression data. Then AMARETTO connects these driver genes with co-expressed target genes that they control, defined as regulatory modules. Thirdly, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. Here we applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis led to the identification of pancancer driver genes of smoking-induced cancers and ‘antiviral’ interferon-modulated innate immune response.

Software availability

AMARETTO is available as an R package at https://bitbucket.org/gevaertlab/pancanceramaretto

Keywords: Data fusion, Cancer driver gene discovery, Module network

Highlights

  • We present an algorithm for pancancer identification of cancer driver genes based on multi-omics data fusion

  • GPX2 is a novel driver gene in smoking induced cancers and validated using knockdown of GPX2 in the A549 cell line.

  • OAS2 is a novel driver gene defining cancers with an antiviral signature and infiltration of tumor-associated macrophages

We present an algorithm that combines multiple sources of molecular data to identify novel genes that are involved in cancer development. We applied this algorithm on multiple cancers in a combined fashion and identified a network of pancancer driver genes. We highlighted two genes in detail GPX2 and OAS2. We showed that GPX2 is an important cancer gene in smoking induced cancers, and validated our predictions using experimental data where GPX2 was inactivated in a lung cancer cell line. Similarly we showed that OAS2 is an important cancer driver gene in cancers that show an antiviral signature.

1. Introduction

In the last two decades, advances in high-throughput experimental technologies have produced an abundance of molecular data. An increasing number of large multi-omics projects have launched and provide millions of data points for thousands of biological samples. For example, The Cancer Genome Atlas (TCGA) project (Hoadley et al., 2014, Cancer Genome Atlas Research Network, 2013, Yuan et al., 2014) was launched to improve our ability to diagnose, treat and prevent cancer and has produced an enormous amount of multi-omics data. Interpreting these high dimensional datasets to identify novel cancer driver genes represents an outstanding challenge. True cancer driver genes are those whose perturbation pushes a cell towards a malignant phenotype. Within this study, we define cancer driver genes as genes that fulfill all of the following criteria: (1) genes that are genetically and/or epigenetically deregulated in cancer, (2) genes whose genetic and epigenetic aberrations have a direct impact on their own functional gene expression levels, and (3) genes that are predicted to play regulatory roles high in the causal hierarchy of the origin of tumors. These include, for example, transcription factors, cell cycle genes or epigenetic modifying enzymes, whose altered state in cancer results in deregulation of downstream target genes; as well as upstream signaling molecules. They typically hide amongst a large number of passenger genes that are only by chance genetically or epigenetically altered (Eifert and Powers, 2012).

Previously, several computational methods have been developed to integrate multi-omics data. For example, Ciriello et al. used a method based on mutual exclusivity of copy number and mutation events to identify driver genes in glioblastoma (Ciriello et al., 2012). Similarly, Vandin et al. developed a method to identify driver genes in cancer, but focused on finding pathways with a significant enrichment of mutually exclusive genes (Vandin et al., 2012). In addition, Akavia et al. built further on this work and used copy number data to identify potential cancer driver genes in a modified Bayesian module network analysis called CONEXIC (Akavia et al., 2010). More recently, other groups are focusing on identifying driver genes through network analysis of copy number data to identify potential drivers using a Bayesian module network analysis (Ray et al., 2014).

We have previously developed AMARETTO, an algorithm that integrates copy number, DNA methylation and gene expression data to identify a set of driver genes altered by DNA methylation or DNA copy number alterations, and constructs a gene expression network to connect them to clusters of co-expressed genes, defined as modules (Gevaert and Plevritis, 2013, Gevaert et al., 2013). These gene expression modules are subsequently ascribed biological pathways using gene set enrichment analysis (GSEA), revealing the pathways affected by cancer driver gene regulation. AMARETTO is thus a data driven pathway approach, using genomic, epigenomics and transcriptomics data as inputs, and produces modules and cancer driver genes associated with these modules as output. Integration of epigenomics data is essential to comprehensive analysis of cancer genomic analysis, as DNA methylation is a major mechanism of transcriptional deregulation in virtually all cancers. For example, cancer driver genes such as BRCA1 and MLH1, which are often altered by mutation in cancer, are also frequently deregulated by DNA methylation in other patients, with similar downstream consequences (Simpkins et al., 1999, Das and Singal, 2004, Catteau and Morris, 2002). Our data-driven pathway approach contrasts with previous work that relies upon use of known cancer pathways and networks such as PARADIGM, an algorithm that uses human-curated pathways and estimates their activity using DNA copy number and mRNA expression data (Vaske et al., 2010).

Here, we present an extension of AMARETTO to a pancancer application using multi-omics data of eleven cancer sites from TCGA. We show that AMARETTO captures modules enriched in major pathways of cancers and modules that accurately predict molecular subtypes. Next, we connect the modules of co-expressed genes in a pancancer module network. We show that this allows the identification of major oncogenic pathways and cancer driver genes involved in multiple cancers. More specifically, we identified a pancancer driver gene that is involved in smoking induced cancers and a pancancer driver gene that is involved in antiviral IFN modulated immune response. Overall, our results show the potential of pancancer multi-omics data fusion to identify cancer drivers that are high within the causal hierarchy of cancer development and associated with common pathways across different types of tumors that eventually can lead to the identification of pancancer drug targets. The AMARETTO algorithm and its pancancer application are publicly available.

2. Materials and Methods

2.1. Data Preprocessing

We used gene expression, copy number and DNA methylation data from TCGA for 11 cancer sites, namely bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon and rectal adenocarcinoma (COADREAD), glioblastoma (GBM), head and neck squamous cell carcinoma (HNSC), clear cell renal carcinoma (KIRC), acute myeloid leukemia (LAML), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV) and uterine corpus endometrial carcinoma (UCEC) (Table 1). All data sets are available at the TCGA data portals.

Table 1.

Number of samples and number of genes for each of the data modalities (gene expression, DNA copy number and DNA methylation) and for the eleven studied cancer sites.

TCGA cancer site TCGA cancer code Gene expression
GISTIC
MethylMix
Samples Genes Samples Genesa Samples Genesb
Bladder urothelial carcinoma BLCA 181 15.432 178 1.974 123 472
Breast invasive carcinoma BRCA 985 16.02 968 1.523 887 890
Colon and rectum adenocarcinoma COADREAD 589 15.533 578 2.523 570 522
Glioblastoma multiforme GBM 501 17.811 481 1.561 321 395
Head and neck squamous cell carcinoma HNSC 371 15.828 365 2.184 308 753
Kidney renal clear cell carcinoma KIRC 509 16.123 501 3.052 497 567
Acute myeloid leukemia LAML 173 14.296 166 1.681 170 613
Lung adenocarcinoma LUAD 489 16.092 487 3.585 367 678
Lung squamous cell carcinoma LUSC 490 16.219 487 2.592 355 679
Ovarian serous cystadenocarcinoma OV 541 17.814 528 1.499 540 510
Uterine corpus endometrial carcinoma UCEC 508 15.706 500 2.074 496 821
a

Number of significant genes found after running GISTIC (data available in the TCGA data portal).

b

Number of genes with significant methylation patterns identified using MethylMix.

The gene expression data were produced using Agilent microarrays for GBM and OV cancers, and RNA sequencing for all other cancer sites. Preprocessing was done by log-transformation and quantile normalization of the arrays. The DNA methylation data were generated using the Illumina Infinium Human Methylation 27 Bead Chip. DNA methylation was quantified using β-values ranging from 0 to 1 according to the DNA methylation levels. We removed CpG sites with more than 10% of missing values in all samples. We used the 15-K nearest neighbor algorithm to estimate the remaining missing values in the data set (Troyanskaya et al., 2001). Finally, the copy number data we used are produced by the Agilent Sure Print G3 Human CGH Microarray Kit 1Mx1M platform. This platform has high redundancy at the gene level, but we observed high correlation between probes matching the same gene. Therefore, probes matching the same gene were merged by taking the average. For all data sources, gene annotation was translated to official gene symbols based on the HUGO Gene Nomenclature Committee (version August 2012). TCGA samples are analyzed in batches and significant batch effects were observed based on a one-way analysis of variance in most data modes. We applied Combat to adjust for these effects (Johnson et al., 2007).

2.2. AMARETTO: Multi-omics Data Fusion

Our approach for analyzing TCGA cancer data is based on AMARETTO, a novel algorithm devoted to construct modules of co-expressed genes through the integration of multi-omics data (Gevaert and Plevritis, 2013, Gevaert et al., 2013). More precisely, AMARETTO is a three-step algorithm that (i) identifies tumor specific DNA copy number or DNA methylation changes, (ii) identifies a set of potential cancer driver genes by integrating DNA copy number, DNA methylation and gene expression data, (iii) connects these cancer driver genes to modules of co-expressed target genes that they control using a penalized regulatory program. AMARETTO, consists of three steps (Fig. 1).

Fig. 1.

Fig. 1

Workflow of pancancer AMARETTO analysis. AMARETTO generates a list of candidate cancer driver genes by investigating significant correlations between methylation, copy number and gene expression data for each putative driver gene separately, and connects them with their downstream targets by constructing a module network. We then identify modules that capture major hallmarks of cancers. We finally detect subnetworks of modules across all cancers.

2.2.1. Step 1

Identification of candidate cancer driver genes with tumor-specific DNA copy number or DNA methylation alterations compared to normal tissue: we first restrict the list of candidates to genes that have either copy number or DNA methylation alterations. These alterations are detected using the GISTIC (Taylor et al., 2008, Mermel et al., 2011) and MethylMix (Gevaert, 2015, Gevaert et al., 2015) algorithms for copy number and DNA methylation data respectively. GISTIC separately models arm-level and focal alterations, identifying amplified and deleted genes. Modeling DNA methylation aberrations in cancer is less well studied. We recently developed MethylMix, a method that identifies hypo and hypermethylated genes by (i) detecting methylation states of each gene with univariate beta mixture models, (ii) comparing them with the DNA methylation levels of normal tissue samples. We used GISTIC to identify significantly and recurrently deleted or amplified regions in the genome (Mermel et al., 2011). Similarly, we used MethylMix to identify recurrently hyper-or hypomethylated genes (Gevaert, 2015).

2.2.2. Step 2

Modeling effect of candidate driver genes on gene expression: a given gene is considered as a candidate driver gene if its expression can be explained by genomic events. Our rationale is that genes driven by multiple genomic events in a significant subset of samples are unlikely to be randomly deregulated. To establish a list of cancer driver genes, we thus investigate the linear effects of copy number and DNA methylation on gene expression through a linear regression model performed on each gene independently:

ExpressionGenei=fβ1MethylationGenei+β2Copy NumberGenei

We used the R2 statistic to evaluate whether copy number has a significant positive effect (β2 > 0) and DNA methylation a significant negative effect (β1 < 0) on gene expression.

2.2.3. Step 3

Associating candidate driver genes with their downstream targets: given the cancer driver genes identified in Step 1, Step 2 aims at connecting them to their regulated targets to construct the regulatory module network. First, the filtered data are clustered in modules of co-expressed genes using a k-means algorithm with 100 clusters. Then, we learn the regulatory programs for each of the modules as a linear combination of cancer driver genes that together explain each module’s mean expression:

ExpressionModulei=fα1Driver1++αnDrivern

In order to induce sparseness, we use an L1-penalty on the regression weights (Tibshirani et al., 2010). The modules are further optimized by running iteratively over the two following steps: (i) reassigning genes based on the closest match to the updated regulatory programs, (ii) updating the regulatory programs based on the new gene assignments (Lee et al., 2009). These two steps are repeated until less than 1% of the cancer driver genes are assigned to new modules.

2.3. Pancancer Module Network

After running AMARETTO on each cancer site individually, the modules are connected in a pancancer network. Specifically, we evaluate whether there is a significant association between all pairs of modules through a hyper-geometric test. We correct for multiple hypothesis testing using the false discovery rate (Benjamini and Hochberg, 1995). We consider the association to be significant if both of the following conditions are satisfied: (i) the adjusted p-value is smaller than 0.05 and (ii) the overlap between two modules is larger than 5 genes. This defines a module network where each edge is scored based on the negative log-transformed adjusted p-value.

We cluster the weighted module network to identify significantly connected subnetworks using the Girvan-Newman algorithm (Newman and Girvan, 2004). This algorithm is a divisive method, which aims at detecting subnetworks by progressively removing edges from the module network according to a score. The original proposed score is based on the betweenness of edges, where betweenness is a measure that favors edges that are between subnetworks, and thus responsible for connecting many pairs of others.

We used the igraph R package to visualize the network and the edge.betweenness.community function (implementation of the Girvan-Newman algorithm) to detect subnetworks. We only focus on subnetworks that satisfy the following conditions: number of nodes larger than the 1% of the total number of nodes in the network, number of represented cancers larger than the 10% of the subnetwork size (and at least, larger than 2) and finally, ratio between edges inside/outside the subnetwork larger than ½.

2.4. Gene Set Enrichment Analysis

To assign biological meaning to these subnetworks of modules, we perform gene set enrichment analysis based on the databases GeneSetDB (Culhane et al., 2010) and MSigDB (Liberzon et al., 2015). For the latter, we restrict the enrichment to hallmark (H), curated (C2), GO (C5), oncogenic (C6) and immunologic signatures (C7) gene sets, which include the gene sets most relevant to cancer gene expression profiles. The enrichment is evaluated by performing a hyper-geometric test, corrected for multiple testing using the FDR (Benjamini and Hochberg, 1995). We used averaged p-values to combine p-values of the pathway enrichment for modules within a subnetwork. We used the following cutoffs: p-value smaller than 0.05, overlap with one module larger than 5 and gene set size smaller than 500. As a negative control we compared the enrichment of the actual modules for each cancer with 100 random permutations of the gene to module assignments. We then counted for each permutation the average number of significant gene sets per module, over all modules using the same cutoffs as the actual enrichment results.

2.5. Prediction Performances

After running AMARETTO on the provided data sets, we computed the prediction performances by comparing the observed module expression data matrix with its predicted value. We reported the averaged mean squared error (MSE) and the R-square taken across all modules.

2.6. Smoking Signatures

To investigate the role played by GPX2 on smoking related pathways of different cancer sites, we first used an oxidative gene signature from the Gene Ontology (Supplementary Table 6a). We defined an associated score by taking the average expression of these genes. We then used a Pearson test to measure the correlation with GPX2 expression. We did the same for a second GO signature associated to xenobiotic metabolism (Supplementary Table 6a).

2.7. Correlation With Smoking Data

We investigated whether GPX2 expression is correlated with smoking. We used clinical data from TCGA containing 743 characteristics (e.g. ethnicity, gender, tumor size, etc.). We restrict our study to clinical variables that are related to smoking (profile, started smoking year, stopping smoking year and pack years). We obtained a significant number of clinical data for only 4 of the 11 cancer sites, namely BLCA, HNSC, LUAD and LUSC. For each of these cancer sites, correlation coefficients between the associated clinical variables and GPX2 expression are calculated through the Spearman test for continuous variables, and the Wilcoxon test, or Kendall test for discrete variables with two or more than two groups respectively. In addition, we drew boxplots representing the association between smoking profile and GPX2 expression.

2.8. Experimental Validation Using GPX2 Knockdown Experiments

We extracted GPX2 perturbation experiments from the LINCS database (Lamb et al., 2006a, Keenan et al., 2017). GPX2 perturbation experiments were available for the lung adenocarcinoma A549 cell line that best resembles the LUAD cancer site, while no matching cell lines were available for the 4 other cancer sites, i.e., LUSC, BLCA, HNSC and UCEC. Four perturbation experiments measuring experimental targets of GPX2 knockdown in the A549 cell line were available in LINCS, including three shRNA experiments and a consensus signature derived from these three experiments. In one of these four experiments, a positive differential expression z-score for GPX2 was measured, and we therefore removed this experiment from the analysis since successful GPX2 knockdown is expected to result in a negative z-score. We used the “preranked” Gene Set Enrichment Analysis (GSEA) (GSEA) (Subramanian et al., 2005) tool to test for enrichment of the target genes of modules regulated by GPX2 in the genome-wide differential expression z-score profiles of the three GPX2 knockdown experiments. We restricted our analysis to the landmark genes (measured on the L1000 platform) and bing genes (inferred with high confidence), and we collapsed multiple probes to unique genes by selecting the probe with the most reliable (landmark over bing) and the highest absolute z-score value. GSEA enrichments were estimated using the normalized enrichment score (NES) and significance of the enrichments was assessed at the FDR < 0.25 level as well as p-value < 0.05 and FDR < 0.25 levels.

2.9. Correlation With PDL1-PDL2 Expression

To investigate the role played by OAS2 on immune response pathways of different cancer sites, we correlated OAS2 expression with CD274, more commonly known as PDL1, and PDCD1LG2 expressions, more commonly known as PDL2, using a Pearson test.

2.10. Inference of Tumor Associated Leukocyte Levels Using CIBERSORT

CIBERSORT (Newman et al., 2015, Gentles et al., 2015) is a computational method that characterizes cell composition of complex tissues from their expression profiles. We applied CIBERSORT to TCGA gene expression data to infer leukocyte representation in the 11 considered cancer sites. More precisely, we used expression profiles for 22 distinct leukocyte subsets (TALs). Only patients for whom estimated p-values are less than 0.05 (indicating high confidence TAL estimation) were included in downstream analyses.

3. Results

AMARETTO is an algorithm that allows the integration of multi-omics data to identify cancer drivers and associate them to their downstream targets. Here, we present a pancancer application of AMARETTO on eleven different cancer sites (Fig. 1, Table 1).

3.1. AMARETTO Captures Major Hallmarks of Cancer

AMARETTO modules capture the major oncogenic pathways whereas randomly permuted modules do not result in significant gene sets in all cancer sites (Fig. 2, Supplementary Table 1). We found 22 modules enriched in cell cycle pathways. Four of these modules are regulated by CHEK1, a well-known cell cycle gene required for checkpoint-mediated cell cycle arrest in response to DNA damage (Walworth et al., 1993, Patil et al., 2013). Next, we found that 43 modules are highly enriched with genes related to angiogenesis. The most common cancer driver gene, FSTL1, regulates 8 of the 43 enriched modules, representing a potential cancer driver gene that regulates angiogenesis. This gene has been shown to be involved in proliferation, migration and invasion (Zhou et al., 2016, Chan et al., 2009, Kudo-Saito et al., 2013) and was recently linked with angiogenesis in post-myocardial infarction rats (Xi et al., 2016). Twelve modules are enriched in epithelial-to-mesenchymal transition (EMT) pathways. The cancer driver genes of EMT modules include TGFB3, a member of the TGF beta pathway known to regulate EMT (Jalali et al., 2012, Papageorgis, 2015, Vendrell et al., 2012, Chen et al., 2013a), and NUAK, which has been implicated in several cancers (Ye et al., 2014, Cui et al., 2013, Chen et al., 2013b). In addition, NUAK1 is involved in EMT in ovarian cancer (Zhang et al., 2015) and is part of two ovarian cancer modules identified by AMARETTO that are enriched in EMT-related genes (Jalali et al., 2012). Next, 100 modules are enriched in immune response pathways, with the inflammatory chemokine CCL5 regulating 14 of these modules (Aldinucci and Colombatti, 2014). Overall, AMARETTO enabled us to find modules enriched in many major hallmarks of cancers, including hypoxia, apoptosis, metastases, integrin and epidermal growth factor receptor (EGFR) signaling demonstrating the validity of our approach (Fig. 2, Supplementary Table 1).

Fig. 2.

Fig. 2

Heatmap representing the enrichment between modules of all cancer sites (in rows) and gene sets associated with major hallmarks of cancers (in columns): angiogenesis, hypoxia, epithelial mesenchymal transition (EMT), cell cycle, immune response, apoptosis, metastases, integrin and epidermal growth factor receptor (EGFR).

3.2. Methylation Driven Genes are More Predictive of Downstream Expression Than DNA Copy Number Driver Genes

The module networks for each of the 11 cancer types contain on average 408 cancer driver genes and 7.67 cancer driver genes per module. The top cancer driver genes across all histologies include 45 genes that regulate more than 15 modules across an average number of 4.9 cancer sites per gene (Supplementary Table 2). Interestingly, for all cancers, a higher proportion of selected drivers are DNA methylation driven compared to DNA copy number driven genes (Supplementary Fig. 1). Over 90% of cancer driver genes present aberrant DNA methylation patterns, highlighting the importance of DNA methylation-mediated deregulation. Moreover, using methylation data with or without copy number data considerably increases the predictive performance of cognate gene expression relative to copy number alone (Supplementary Fig. 2 and Table 3). We found that adding methylation driver genes led to an averaged R-squared increase of between 6% for LUSC up to 16% for BRCA when predicting cognate gene expression (Fig. 3, Supplementary Fig. 2).

Fig. 3.

Fig. 3

Boxplots representing the prediction performances of cancer driver genes predicting their target genes measured in two ways: R-square and mean squared error (MSE) obtained after running AMARETTO in three ways using only copy number data, only DNA methylation data and with both copy number and methylation data, shown for four cancer sites.

3.3. Connecting AMARETTO Modules Reveals Pancancer Driver Genes

To identify pancancer driver genes, we connected AMARETTO modules across 11 cancer sites in a network and identified significantly connected subnetworks. Our results show a module network with 2,693 edges between 713 modules (Supplementary Fig. 3). Given this network, we detected 20 subnetworks containing between 9 and 74 modules (Fig. 4, Supplementary Table 4). Among these subnetworks, seven represent all 11 cancer sites. The most heterogeneous one is Subnetwork 17, consisting of 11 modules each representing a different cancer. The least represented cancer site is LAML, with only 26 modules. It is also absent from 7 subnetworks reflecting most likely the difference between hematological and epithelial cancers. On the contrary, HNSC, LUSC, LUAD and BLCA, which are part of 19 subnetworks, are over-represented with more than 60 modules. Next, we focused on subnetworks that show high degrees of overlapping cancer driver genes in an effort to identify pancancer driver genes of important biological pathways. For example we identified a pancancer subnetwork enriched in cell cycle pathways (Fig. 4, dark blue subnetwork, Supplementary note). Next, we describe two subnetworks that led to the identification of two novel pancancer driver genes: a subnetwork involved in smoking induced cancers and a subnetwork involved in ‘antiviral’ interferon-modulated innate immune response.

Fig. 4.

Fig. 4

(a) Network representing all pancancer modules and 20 subnetworks. One node corresponds to one module from one cancer site. An edge between two modules represents a significant association between the module genes. The identified subnetworks are highlighted using different colors. (b) Three subnetworks are highlighted: the blue subnetwork is related to smoking, the red subnetwork captures immune response and the dark blue subnetwork captures the cell cycle.

3.4. GPX2 is a Driver of Smoking Induced Cancers

We found a subnetwork containing 15 modules representing 8 different cancers that is significantly enriched in smoking induced pathways (Fig. 4, blue subnetwork). Two smoking associated cancer sites, LUSC and HNSC, have three modules each in this subnetwork. This subnetwork contains one cancer driver gene GPX2, a glutathione peroxidase from the GPx family of genes, that is part of 8 modules across multiple cancer sites (Supplementary Table 5). GPX2 is hypo-methylated gene in all of the 11 cancers, except in HNSC where it is hyper or hypo-methylated in different patient subgroups. We found that several modules of the subnetwork are enriched in three smoking-related pathways (Supplementary Table 5). These particularly include genes involved in protection against chronic inflammation and asthma in lung cancers, as well as smoking-related gene expression signatures (Malhotra et al., 2010, Harvey et al., 2007, Boelens et al., 2009).

To verify the smoking association of this subnetwork, we used two gene signatures reflecting smoking damage, a xenobiotic metabolism signature, and an oxidative stress gene signature (Supplementary Table 6a). This analysis showed a significant correlation between GPX2 expression and xenobiotic metabolism for all cancers (p-value < 0.001, Fig. 5a, Supplementary Fig. 4 and Table 6a) and a significant correlation with oxidative stress for six cancer sites in this subnetwork (p-value < 0.001, Supplementary Fig. 5 and Table 6a), suggesting a role of GPX2 in meditating cellular response to carcinogens in tobacco smoke (Boyle et al., 2010, Beane et al., 2011).

Fig. 5.

Fig. 5

(a) Correlation between GPX2 expression and the averaged expression of a xenobiotic response gene signature showing significant correlation. (b) Heatmap showing the enrichments of the target genes of the 8 modules regulated by GPX2 derived from the 5 cancer sites (columns organized by cancer sites in following order: LUAD, LUSC, BLCA, HNSC, UCEC) in the three GPX2 knockdown experiments measured in the lung adenocarcinoma A549 cell line (rows: shRNA1, shRNA2, and consensus). Enrichments are represented by the GSEA normalized enrichment scores (NES). Only significant enrichments (top panel: p-value < 0.05 and FDR < 0.25; bottom panel: FDR < 0.25) are shown (red: induced; blue: repressed; grey: not significant). The bottom panel shows the modules that contain at least 10 target genes or at least 50% of the target genes in LINCS.

Next, we correlated GPX2 expression with smoking data, available for BLCA, HNSC, LUAD and LUSC. Even in the presence of significant missing clinical data (Supplementary Table 6b), we found significant correlations with the number of smoked years (p-value = 0.011, corr = 0.26) and pack years (p-value = 0.003, corr = 0.21) for HNSC (Supplementary Table 6b). We also found a significant association with GPX2 expression and smoking profile for HNSC (p-value < 0.001, Supplementary Fig. 6), the most represented cancer within the subnetwork, and a borderline significant correlation for BLCA (p-value = 0.05, Supplementary Fig. 6).

To experimentally validate GPX2 as a driver of the smoking subnetwork, we interrogated the target genes of the 8 modules learned in the 5 cancer sites against publicly available genetic perturbation studies of GPX2 in the Library of Integrated Network-Based Cellular Signatures (LINCS) database (Keenan et al., 2017, Lamb et al., 2006b). We used Gene Set Enrichment Analysis (GSEA) to test for enrichment of the target genes of these modules in GPX2 knockdown experiments performed in the lung adenocarcinoma A549 cell line (Subramanian et al., 2005, Mootha et al., 2003). We observed that the target genes of these modules that are activated by induced GPX2 expression are significantly repressed upon knockdown of GPX2 in the A549 cell line. This expected behavior was observed in LUAD, and also consistent in the four other cancer sites, i.e., LUSC, BLCA, HNSC and UCEC (Fig. 5, Supplementary Table 7), which confirmed our hypothesis that GPX2 is a causative driver of these smoking-related modules.

3.5. OAS2 is a Driver of ‘Antiviral’ Interferon-Modulated Innate Immune Response

We found a second intriguing pancancer subnetwork, Subnetwork 12, containing 15 modules representing 10 different cancers and enriched in interferon-inducible antiviral response pathways (Fig. 4, red subnetwork). After ranking the 65 cancer driver genes in this subnetwork based on overlap, we identified one gene, OAS2, regulating 10 modules, and TRIM22 regulating 6 modules, 6 genes regulating 4 modules and 8 genes overlapping 4 modules (Supplementary Table 8). OAS2 is an interferon (IFN)-inducible enzyme that senses of double-stranded RNA (dsRNA) produced by viruses and subsequently activates RNAse L to destroy viruses (Gribaudo et al., 1991). Next, TRIM22 is also an interferon-inducible motif family antiviral protein, though its mechanism of viral repression is less clear (Eldin et al., 2009, Uchil et al., 2013). Interestingly, almost all of the cancer driver genes of this subnetwork present aberrant methylation patterns.

Using enrichment analysis, we found that the subnetwork is enriched by 21 different gene sets, most of which represented antiviral response and/or interferon inducible pathways (Supplementary Table 8). These included genes up or down-regulated in T-cells (Hutcheson et al., 2008), dendritic cells (Fulcher et al., 2006), and other blood cell types (Querec et al., 2009). A strong enrichment was also found with an IFN gamma response gene set from the hallmark gene collection (Liberzon et al., 2015) and other IFN response gene sets were highly enriched (Ulloa-Montoya et al., 2013). Using the IFNs database Interferome as a reference (Rusinova et al., 2013), we confirmed that pancancer driver genes of the immune subnetwork were related to response to IFNs, including all three IFN classes (Supplementary Fig. 7). Despite its role in antiviral response, OAS2 was not differentially expressed between cancers harboring oncogenic viruses and those without detectable viruses, based on data for detection of viral transcripts in TCGA cancers provided (Tang et al., 2013) (data not shown).

Immune gene expression signatures in cancer can reflect the profile of mixed infiltrating tumor associated leukocyte (TAL) cell types within the tumor. To determine the TAL types associated with the IFN responsive signature, we inferred the levels of 22 TAL types in all TCGA cases using CIBERSORT (Newman et al., 2015), and tested the correlation of pancancer driver genes OAS2 and TRIM22 with each TAL type. Both OAS2 and TRIM22 were strongly correlated with M1 macrophage levels across all cancer types. This is consistent with the fact that these pro-inflammatory M1 tumor associated macrophages (TAMs) are activated by IFNγ in response to pathogen infection or cancer, and that M1 macrophage activation or ‘polarization’ coincides with upregulation of IFN-responsive genes (Jiang et al., 2017). Interestingly, SP110, a pancancer driver gene of the IFN response subnetwork, is known to regulate macrophage gene expression and differentiation (Jones et al., 2014, Wu et al., 2016).

We have recently reported that a molecular subtype of HNSC with high levels of M1 TAMs overexpresses CD274, the gene encoding PD-L1, a ligand for the CD8 + T cell-expressed immune checkpoint receptor PD-1 (Brennan et al., 2017). This suggests that M1 TAM expression of PD-L1 may contribute to evasion of CD8 + T cell-mediated anti-cancer immunity, as previously reported (Jiang et al., 2017). To investigate this further, we tested the correlation of OAS2, as a marker of the IFN-responsive/M1 TAM signature, with both ligands for the immune checkpoint receptor PD-1: CD274, the gene encoding PD-L1, and PDCD1LG2, encoding PD-L2. We observed a significant correlation for both genes and all cancer sites (p-values < 0.001, Supplementary Fig. 8 and Table 9).

4. Discussion

Here, we present a pancancer analysis using AMARETTO, an algorithm that addresses the challenge of integrating and interpreting multi-omics cancer data. AMARETTO first identifies cancer drivers, by considering genes with either DNA copy number or DNA methylation aberrations that have an effect on gene expression. AMARETTO then connects these cancer driver genes with target genes in the form of modules. Modules are subsequently associated with known biological pathways. We have shown that AMARETTO captures major biological pathways and reveals pancancer driver genes through network analysis.

AMARETTO focuses on DNA copy number and DNA methylation altered genes and their effect on gene expression, and does not model DNA somatic mutations for several reasons. First, somatic mutations do not necessarily affect gene expression, and additional data are required to be able to model the effect of a somatic mutation on expression, the key idea behind AMARETTO. Secondly, for each cancer site, besides the most significant mutated genes, many sequencing projects show that many genes are mutated in less than 5% of the cohort. These long tails create very sparse mutation data that do not add any predictive power in AMARETTO (data not shown). Future algorithmic work is needed to investigate how sparse somatic mutations can be integrated in a multi-omics framework like AMARETTO. Overall, AMARETTO is a complementary technique to identify cancer driver genes alongside methods focusing on distinguishing driver mutations from passenger mutations from DNA sequencing data such as MutSig (Lawrence et al., 2013).

Other computational methods have focused on identifying cancer driver genes using transcriptomics and multi-omics data. For example, ARACNE is a method that uses gene expression and a mutual information statistic to identify cancer driver genes through connecting transcription factors to their targets (Margolin et al., 2006, Basso et al., 2005). ARACNE is thus focused solely on gene expression data. CONEXIC is the most similar method to AMARETTO and uses a Bayesian strategy to connect cancer driver genes to their targets (Akavia et al., 2010). We argue however that AMARETTO significantly improves upon CONEXIC. First, CONEXIC only takes into account DNA copy number changes, and does not model DNA methylation data. Our results show that DNA methylation driven genes are more predictive of transcription. Secondly, in a large benchmark, a previous comparison between AMARETTO and CONEXIC showed that in terms of R-square AMARETTO outperforms CONEXIC when predicting gene expression on unseen data (Manolakos et al., 2014). Thirdly, CONEXIC involves a large number of parameters and is more computationally demanding for large data sets (Manolakos et al., 2014).

In our results, we focused particularly on the biological implications of the two novel pancancer driver genes: GPX2 related to smoking and OAS2 related to IFN response. Regarding the former subnetwork, previous work has shown the importance of GPX2 in lung cancer. The GPX genes, glutathione peroxidases, are involved in protection of cells against oxidative stress and have been shown to be regulated by the Nrf2-pathway in lung. Activation of the Nrf2-pathway plays an important role in resistance to oxidant stress and its deregulation is one of the major causes of lung cancers (Rangasamy et al., 2004, Ma, 2013). Among the 5 GPX genes, expression of GPX2 has been shown to be related to smoking response, induced by Nrf2 activation in the lungs (Singh et al., 2006). GPX2 inhibits apoptosis in response to oxidative stress (Yan and Chen, 2006), such as may be caused by smoking. This subnetwork discovered by pancancer AMARETTO analysis indicates a key role for GPX2 in regulating gene expression in smoking-related cancers (LUAD, LUSC, HNSC, BLCA), and surprisingly, in UCEC, for which smoking is a protective factor. That GPX2 may block oxidative stress-induced apoptosis suggests that its inhibition may restore apoptosis, making it a potential drug target.

Next, the interferon response subnetwork showed that all of its predicted cancer driver genes represent genes that are expressed in response to IFNs. IFNs are cytokines that protect against cancer by activating innate immune inflammatory response to pathogens or cancer, triggering cancer cell death (Munoz-Fontela et al., 2008). We found that OAS2, overexpressed due to DNA hypo-methylation, was a major driver of IFN/immune related modules. While OAS2 is primarily implicated in mediating immune response to viruses (Crouse et al., 2015, Sadler and Williams, 2008), most of the cancers overexpressing the IFN module do not have detectable oncogenic viruses (Tang et al., 2013). This is consistent with previous reports of an ‘antiviral’ gene expression profile marked by expression of OAS2 and other IFN responsive genes, in non-virus-related cancers (Liu et al., 2013). A similar ‘interferon-inducible antiviral response’ gene expression signature (Featuring OAS2, OAS1, IRF7, MX2 and ISG20, regulators of our IFN response subnetwork) was associated with response to expression of dsRNA derived from human endogenous retroviruses (HERVs) in ovarian and colorectal cancer cell lines, upon loss of DNA methylation-mediated repression of HERVs (Chiappinelli et al., 2015, Roulois et al., 2015). Given that OAS2 is a viral dsRNA sensor, and all of the IFN response subnetwork regulators were abnormally methylated, it is plausible that expression of this subnetwork reflects response to reactivation of HERVs, a frequent event in, and potential cause of, many cancers (Kassiotis, 2014).

The IFN-response subnetwork was associated with levels of M1 TAMs across cancer types. TAMs include both pro-inflammatory M1 TAMs and anti-inflammatory M2 TAMs, both of which derive from M0 mature macrophages. M1 macrophage activation is stimulated by IFNγ, and is associated with expression of IFN-responsive genes such including OAS2 (Jiang et al., 2017, Martinez et al., 2006, Hu et al., 2008). Therefore, expression of IFN-response subnetwork genes may reflect infiltration of M1 TAMs, as part of innate response to cancer. Such inflammatory responses are generally considered to be anti-tumorigenic, indeed, stimulation of inflammatory response by treatment with IFNα is used therapeutically to stimulate anti-cancer immune response (Jiang et al., 2017, Zitvogel et al., 2015). The current paradigm asserts that M2 TAMs are immunosuppressive and their levels are generally associated with poor prognosis in cancer, as they promote invasion, metastasis and therapy resistance. Conversely M1 TAMs that kill tumor cells are associated with prolonged survival (Mei et al., 2016, Costa et al., 2013, Ostuni et al., 2015). On the other hand, tumors can modulate TAMs to express pro-oncogenic factors. We observed that OAS2 expression is correlated with expression of CD274, the gene encoding PD-L1, a ligand for the CD8 + T cell-expressed immune checkpoint receptor PD-1 that suppresses CD8 + T cell-mediated anti-cancer immunity. Indeed, IFN induces expression of PD-L1 during chronic inflammation or viral infections, dampening CD8 + T cell response (Akhmetzyanova et al., 2015, Barber et al., 2006). PD-L1 is expressed by TAMs as well as tumor cells, and emerging evidence indicates that tumors modulate TAMs to express high levels of PD-L1 (Jiang et al., 2017, Schalper et al., 2015). Our findings indicate that PD-L1 expression is particularly correlated with M1 TAMs, as opposed to M0 or M2 TAMs, indicating a novel immunosuppressive role of M1 TAMs.

Monoclonal antibodies targeting PD-L1 or PD-1 can restore CD8 + T cell cytotoxic anti-cancer immunity, suggesting a potential therapeutic opportunity for patients displaying the IFN signature. Indeed, an IFN signature has recently been shown to be favorably predictive of response to PD-1 blockade (Ayers et al., 2015, Ribas et al., 2015) and recent experimental evidence indicates that IFN signaling, particularly IFN gamma signaling upregulates PD-L1 and PD-L2 expression in cancer, both in vitro and in vivo (Garcia-Diaz et al., 2017). A cancer driver gene such as OAS2 or TRIM22 may help to distinguish between patients that will benefit for PD-1/PD-L1 immunotherapy, and those for whom ineffective treatment may cause autoimmune side effects (Patel and Kurzrock, 2015). AMARETTO has hereby enabled us to identify OAS2 and TRIM22 as epigenetically deregulated cancer driver genes within a pan-cancer IFN responsive pathway that provides novel biological insight into tumor-immune interactions that may have implications for immunotherapy.

In summary, pancancer AMARETTO allows identifying cancer driver genes for major hallmark cancer pathways, a pancancer driver gene involved in smoking induced cancers and a pancancer driver gene of immune response. AMARETTO thus provides a computational method for cancer driver gene identification in a multi-omics setting and might lead to novel drug targets.

The following are the supplementary data related to this article.

Supplementary Fig. 1

Number of regulator genes (genes that regulate at least one module from one cancer type) identified in AMARETTO by only copy number, only DNA methylation or both.

mmc1.pdf (62.1KB, pdf)
Supplementary Fig. 2

Boxplots representing the prediction performances (mean squared error MSE and R-square) obtained after running AMARETTO using copy number data only, methylation only or both copy number and methylation on the 11 cancer sites.

mmc2.pdf (1.1MB, pdf)
Supplementary Fig. 3

Overview of the module network (upper figure), the pancancer module network (on the bottom, left) and a zoomed view of intriguing subnetworks. The nodes of the graph are all the modules across all cancers sites. Their size depends on the node degree (number of incident edges). An edge between two modules stands for a significant association between them (measured through the minus log-transformation of the adjusted p-value, which also defines the edge thickness). For the top figure, the node color depends on the associated cancer site, for the two bottom figures, the node color depends on the subnetwork it belongs to.

mmc3.pdf (285.4KB, pdf)
Supplementary Fig. 4

Correlations between GPX2 expression and the averaged expression of an oxidative response gene signature for all cancer sites.

mmc4.pdf (254.4KB, pdf)
Supplementary Fig. 5

Correlations between GPX2 expression and the averaged expression of a xenobiotic response gene signature for all cancer sites.

mmc5.pdf (214.1KB, pdf)
Supplementary Fig. 6

Boxplots representing the association between GPX2 expression and smoking profile for BLCA and HNSC cancers.

mmc6.pdf (23.9KB, pdf)
Supplementary Fig. 7

Venn diagram representing the number of genes regulating the immune response subnetwork and induced by IFNs of type I, II or III.

mmc7.pdf (4.8KB, pdf)
Supplementary Fig. 8

Scatterplots representing the associations between OAS2 expression and PD-L1 expression on the left, and PD-L2 expression, on the right for all cancer sites.

mmc8.pdf (608.1KB, pdf)
Supplementary Table 1

(a) The top 50 most selected driver genes regulating modules enriched in major pathways of cancer including angiogenesis, hypoxia, EMT, cell cycle, immune response, apoptosis, metastases, integrin signaling and EGFR across all cancer sites, (b) genetic and epigenetic alterations of the top drivers across all cancer sites and (c) comparison of average number of enriched gene sets per module in 100 random permutations vs. the actual modules per cancer.

mmc9.pdf (124.4KB, pdf)
Supplementary Table 2

Number of modules regulated by driver genes for all cancer types. Only the top regulator genes, ranked according to the total number of regulated modules across all cancer sites (last column) are represented.

mmc10.pdf (50.4KB, pdf)
Supplementary Table 3

Prediction performances (R-square and mean squared error MSE) obtained after running AMARETTO using copy number data only, methylation only or both copy number and methylation on the 11 cancer sites. The two last columns indicate the R-square and MSE increase when adding methylation data.

mmc11.pdf (35.8KB, pdf)
Supplementary Table 4

Distribution of the modules per cancer (column) and subnetwork (row). The last column indicates the total number of modules within a subnetwork.

mmc12.pdf (33.2KB, pdf)
Supplementary Table 5

Cancer driver genes and enrichment results of the smoking subnetwork ranked by number of modules each driver gene is participating in.

mmc13.pdf (63.8KB, pdf)
Supplementary Table 6

(a) Oxidative and xenobiotic gene signatures provided by the GO ontology and used to measure the association of GPX2 expression with smoking. (b) Correlations and p-values measuring the association between smoking related data (smoking profile, number of smoked years and pack years) and GPX2 expression across all cancer sites with enough data (BLCA, HNSC, LUAD, LUSC). The percentage of missing data is also indicated.

mmc14.pdf (60.7KB, pdf)
Supplementary Table 7

Experimental validation of the modules and their target genes regulated by GPX2 as a causative driver of the pancancer smoking subnetwork. Table shows overall GSEA enrichment results of the three perturbation experiments upon GPX2 knockdown in the lung adenocarcinoma A549 cell line derived from LINCS (columns: consensus, shRNA1, shRNA2) in each of the 8 modules regulated by GPX2 in the 5 cancer sites (rows: modules organized by the 5 sites in the following order: LUAD, LUSC, BLCA, HNSC, UCEC). On top are shown the significance levels (p-values and FDR values; green if p-value < 0.05 and FDR < 0.25, yellow if FDR < 0.25) and below, the normalized enrichment scores (NES; blue if repressed, red if induced). On the right are the sizes of the signatures, from left to right: number of genes in the modules, number of genes that are part of the LINCS landmark and bing genes, and the numbers of those genes that are identified as ‘leading edge’ genes driving the GSEA enrichment scores in the three experiments (consensus, shRNA1, shRNA2).

mmc15.pdf (119KB, pdf)
Supplementary Table 8

Cancer driver genes and enrichment results of the immune response subnetwork ranked by number of modules each driver gene is participating in.

mmc16.pdf (83.6KB, pdf)
Supplementary Table 9

Correlations and p-values for Pearson test measuring the association between OAS2 expression and PD-L1/PD-L2 expression for all cancer sites.

mmc17.pdf (40.4KB, pdf)
Supplementary Table 10

Cancer driver genes and enrichment results of the histone subnetwork ranked by number of modules each driver gene is participating in.

mmc18.pdf (65.3KB, pdf)

Funding Sources

Research reported in this publication was supported by the National Institute of Dental & Craniofacial Research (NIDCR) under Award Number U01 DE025188 and the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R01 EB020527 and The National Cancer Institute (NCI) Informatics Technology for Cancer Research (ITCR) under Award Number R21 CA209940. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders did not play a role in the design and execution of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

OG conceived and designed the study. OG, KB and MC developed the algorithm, performed experiments and analyzed the data. KB and AG analyzed CIBERSORT data and interpreted results. TC and NP performed experiments and carried out analysis using LINCS data. All authors contributed in manuscript drafting and revision.

Acknowledgements

We thank The Cancer Genome Atlas (TCGA) for making the data publicly available.

References

  1. Akavia U.D. An integrated approach to uncover drivers of cancer. Cell. 2010;143:1005–1017. doi: 10.1016/j.cell.2010.11.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Akhmetzyanova I. PD-L1 Expression on Retrovirus-Infected Cells Mediates Immune Escape from CD8 + T Cell Killing. PLoS Pathog. 2015;11:e1005224. doi: 10.1371/journal.ppat.1005224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Aldinucci D., Colombatti A. The inflammatory chemokine CCL5 and cancer progression. Mediat. Inflamm. 2014;2014:292376. doi: 10.1155/2014/292376. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Ayers M. Relationship between immune gene signatures and clinical response to PD-1 blockade with pembrolizumab (MK-3475) in patients with advanced solid tumors. J. Immunother. Cancer. 2015;3 [Google Scholar]
  5. Barber D.L. Restoring function in exhausted CD8 T cells during chronic viral infection. Nature. 2006;439:682–687. doi: 10.1038/nature04444. [DOI] [PubMed] [Google Scholar]
  6. Basso K. Reverse engineering of regulatory networks in human B cells. Nat. Genet. 2005;37:382–390. doi: 10.1038/ng1532. [DOI] [PubMed] [Google Scholar]
  7. Beane J. Characterizing the impact of smoking and lung cancer on the airway transcriptome using RNA-Seq. Cancer Prev. Res. 2011;4:803–817. doi: 10.1158/1940-6207.CAPR-11-0212. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 1995:289–300. [Google Scholar]
  9. Boelens M.C. Current smoking-specific gene expression signature in normal bronchial epithelium is enhanced in squamous cell lung cancer. J. Pathol. 2009;218:182–191. doi: 10.1002/path.2520. [DOI] [PubMed] [Google Scholar]
  10. Boyle J.O. Effects of cigarette smoke on the human oral mucosal transcriptome. Cancer Prev. Res. 2010;3:266–278. doi: 10.1158/1940-6207.CAPR-09-0192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Brennan C., Koenig J.L., Gentles A.J., Sunwoo J.B., Gevaert O. Identification of an atypical etiological head and neck squamous carcinoma subtype featuring the CpG island methylator phenotype. EBioMedicie. 2017;17:223–236. doi: 10.1016/j.ebiom.2017.02.025. [Epub ahead of print] [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cancer Genome Atlas Research Network The cancer genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Catteau A., Morris J.R. BRCA1 methylation: a significant role in tumour development? Semin. Cancer Biol. 2002;12:359–371. doi: 10.1016/s1044-579x(02)00056-1. [DOI] [PubMed] [Google Scholar]
  14. Chan Q.K. Tumor suppressor effect of follistatin-like 1 in ovarian and endometrial carcinogenesis: a differential expression and functional analysis. Carcinogenesis. 2009;30:114–121. doi: 10.1093/carcin/bgn215. [DOI] [PubMed] [Google Scholar]
  15. Chen C.L. Single-cell analysis of circulating tumor cells identifies cumulative expression patterns of EMT-related genes in metastatic prostate cancer. Prostate. 2013;73:813–826. doi: 10.1002/pros.22625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Chen P., Li K., Liang Y., Li L., Zhu X. High NUAK1 expression correlates with poor prognosis and involved in NSCLC cells migration and invasion. Exp. Lung Res. 2013;39:9–17. doi: 10.3109/01902148.2012.744115. [DOI] [PubMed] [Google Scholar]
  17. Chiappinelli K.B. Inhibiting DNA methylation causes an interferon response in cancer via dsRNA including endogenous retroviruses. Cell. 2015;162:974–986. doi: 10.1016/j.cell.2015.07.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Ciriello G., Cerami E., Sander C., Schultz N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 2012;22:398–406. doi: 10.1101/gr.125567.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Costa N.L. Tumor-associated macrophages and the profile of inflammatory cytokines in oral squamous cell carcinoma. Oral Oncol. 2013;49:216–223. doi: 10.1016/j.oraloncology.2012.09.012. [DOI] [PubMed] [Google Scholar]
  20. Crouse J., Kalinke U., Oxenius A. Regulation of antiviral T cell responses by type I interferons. Nat. Rev. Immunol. 2015;15:231–242. doi: 10.1038/nri3806. [DOI] [PubMed] [Google Scholar]
  21. Cui J. Overexpression of ARK5 is associated with poor prognosis in hepatocellular carcinoma. Tumour Biol. 2013;34:1913–1918. doi: 10.1007/s13277-013-0735-x. [DOI] [PubMed] [Google Scholar]
  22. Culhane A.C. GeneSigDB—a curated database of gene expression signatures. Nucleic Acids Res. 2010;38:D716–725. doi: 10.1093/nar/gkp1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Das P.M., Singal R. DNA methylation and cancer. J. Clin. Oncol. 2004;22:4632–4642. doi: 10.1200/JCO.2004.07.151. [DOI] [PubMed] [Google Scholar]
  24. Eifert C., Powers R.S. From cancer genomes to oncogenic drivers, tumour dependencies and therapeutic targets. Nat. Rev. Cancer. 2012;12:572–578. doi: 10.1038/nrc3299. [DOI] [PubMed] [Google Scholar]
  25. Eldin P. TRIM22 E3 ubiquitin ligase activity is required to mediate antiviral activity against encephalomyocarditis virus. J. Gen. Virol. 2009;90:536–545. doi: 10.1099/vir.0.006288-0. [DOI] [PubMed] [Google Scholar]
  26. Fulcher J.A. Galectin-1-matured human monocyte-derived dendritic cells have enhanced migration through extracellular matrix. J. Immunol. 2006;177:216–226. doi: 10.4049/jimmunol.177.1.216. [DOI] [PubMed] [Google Scholar]
  27. Garcia-Diaz A. Interferon receptor signaling pathways regulating PD-L1 and PD-L2 expression. Cell Rep. 2017;19:1189–1201. doi: 10.1016/j.celrep.2017.04.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Gentles A.J. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat. Med. 2015;21(8):938–945. doi: 10.1038/nm.3909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Gevaert O. MethylMix: an R package for identifying DNA methylation-driven genes. Bioinformatics. 2015;31:1839–1841. doi: 10.1093/bioinformatics/btv020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Gevaert O., Plevritis S. Identifying master regulators of cancer and their downstream targets by integrating genomic and epigenomic features. Pac. Symp. Biocomput. 2013:123–134. [PMC free article] [PubMed] [Google Scholar]
  31. Gevaert O., Villalobos V., Sikic B.I., Plevritis S.K. Identification of ovarian cancer driver genes by using module network integration of multi-omics data. Interface Focus. 2013;3:20130013. doi: 10.1098/rsfs.2013.0013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Gevaert O., Tibshirani R., Plevritis S.K. Pancancer analysis of DNA methylation-driven genes using MethylMix. Genome Biol. 2015;16:17. doi: 10.1186/s13059-014-0579-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Gribaudo G., Lembo D., Cavallo G., Landolfo S., Lengyel P. Interferon action: binding of viral RNA to the 40-kilodalton 2'-5'-oligoadenylate synthetase in interferon-treated HeLa cells infected with encephalomyocarditis virus. J. Virol. 1991;65:1748–1757. doi: 10.1128/jvi.65.4.1748-1757.1991. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Harvey B.G. Modification of gene expression of the small airway epithelium in response to cigarette smoking. J. Mol. Med. 2007;85:39–53. doi: 10.1007/s00109-006-0103-z. [DOI] [PubMed] [Google Scholar]
  35. Hoadley K.A. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell. 2014;158:929–944. doi: 10.1016/j.cell.2014.06.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Hu X., Chakravarty S.D., Ivashkiv L.B. Regulation of interferon and Toll-like receptor signaling during macrophage activation by opposing feedforward and feedback inhibition mechanisms. Immunol. Rev. 2008;226:41–56. doi: 10.1111/j.1600-065X.2008.00707.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Hutcheson J. Combined deficiency of proapoptotic regulators Bim and Fas results in the early onset of systemic autoimmunity. Immunity. 2008;28:206–217. doi: 10.1016/j.immuni.2007.12.015. [DOI] [PubMed] [Google Scholar]
  38. Jalali A., Zhu X., Liu C., Nawshad A. Induction of palate epithelial mesenchymal transition by transforming growth factor beta3 signaling. Develop. Growth Differ. 2012;54:633–648. doi: 10.1111/j.1440-169X.2012.01364.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Jiang C., Yuan F., Wang J., Wu L. Oral squamous cell carcinoma suppressed antitumor immunity through induction of PD-L1 expression on tumor-associated macrophages. Immunobiology. 2017;222(4):651–657. doi: 10.1016/j.imbio.2016.12.002. [DOI] [PubMed] [Google Scholar]
  40. Johnson W.E., Li C., Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127. doi: 10.1093/biostatistics/kxj037. [DOI] [PubMed] [Google Scholar]
  41. Jones A., Kainz D., Khan F., Lee C., Carrithers M.D. Human macrophage SCN5A activates an innate immune signaling pathway for antiviral host defense. J. Biol. Chem. 2014;289:35326–35340. doi: 10.1074/jbc.M114.611962. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Kassiotis G. Endogenous retroviruses and the development of cancer. J. Immunol. 2014;192:1343–1349. doi: 10.4049/jimmunol.1302972. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Keenan A.B, Jenkins S.L., Jagodnik K.M., Koplev S., He E., Torre D., Wang Z., Dohlman A.B., Silverstein M.C., Lachmann A., Kuleshov M.V., Ma'ayan A., Stathias V., Terryn R., Cooper D., Forlin M., Koleti A., Vidovic D., Chung C., Schürer S.C., Vasiliauskas J., Pilarczyk M., Shamsaei B., Fazel M., Ren Y., Niu W., Clark N.A., White S., Mahi N., Zhang L., Kouril M., Reichard J.F., Sivaganesan S., Medvedovic M., Meller J., Koch R.J., Birtwistle M.R., Iyengar R., Sobie E.A., Azeloglu E.U., Kaye J., Osterloh J., Haston K., Kalra J., Finkbiener S., Li J., Milani P., Adam M., Escalante-Chong R., Sachs K., Lenail A., Ramamoorthy D., Fraenkel E., Daigle G., Hussain U., Coye A., Rothstein J., Sareen D., Ornelas L., Banuelos M., Mandefro B., Ho R., Svendsen C.N., Lim R.G., Stocksdale J., Casale M.S., Thompson T.G., Wu J., Thompson L.M., Dardov V., Venkatraman V., Matlock A., Van Eyk J.E., Jaffe J.D., Papanastasiou M., Subramanian A., Golub T.R., Erickson S.D., Fallahi-Sichani M., Hafner M., Gray N.S., Lin J.R., Mills C.E., Muhlich J.L., Niepel M., Shamu C.E., Williams E.H., Wrobel D., Sorger P.K., Heiser L.M., Gray J.W., Korkola J.E., Mills G.B., LaBarge M., Feiler H.S., Dane M.A., Bucher E., Nederlof M., Sudar D., Gross S., Kilburn D.F., Smith R., Devlin K., Margolis R., Derr L., Lee A., Pillai A. The library of integrated network-based cellular signatures NIH program: system-level cataloging of human cells response to perturbations. Cell Syst. 2017 doi: 10.1016/j.cels.2017.11.001. pii: S2405-4712(17)30490-8. [Epub ahead of print] [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Kudo-Saito C., Fuwa T., Murakami K., Kawakami Y. Targeting FSTL1 prevents tumor bone metastasis and consequent immune dysfunction. Cancer Res. 2013;73:6185–6193. doi: 10.1158/0008-5472.CAN-13-1364. [DOI] [PubMed] [Google Scholar]
  45. Lamb J. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313:1929–1935. doi: 10.1126/science.1132939. [DOI] [PubMed] [Google Scholar]
  46. Lamb J. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science (New York, N.Y.) 2006;313:1929–1935. doi: 10.1126/science.1132939. [DOI] [PubMed] [Google Scholar]
  47. Lawrence M.S. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. doi: 10.1038/nature12213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Lee S.I. Learning a prior on regulatory potential from eQTL data. PLoS Genet. 2009;5:e1000358. doi: 10.1371/journal.pgen.1000358. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Liberzon A. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Liu Y.P., Suksanpaisan L., Steele M.B., Russell S.J., Peng K.W. Induction of antiviral genes by the tumor microenvironment confers resistance to virotherapy. Sci. Rep. 2013;3:2375. doi: 10.1038/srep02375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Ma Q. Role of nrf2 in oxidative stress and toxicity. Annu. Rev. Pharmacol. Toxicol. 2013;53:401–426. doi: 10.1146/annurev-pharmtox-011112-140320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Malhotra D. Global mapping of binding sites for Nrf2 identifies novel targets in cell survival response through ChIP-Seq profiling and network analysis. Nucleic Acids Res. 2010;38:5718–5734. doi: 10.1093/nar/gkq212. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Manolakos A., Ochoa I., Venkat K., Goldsmith A.J., Gevaert O. CaMoDi: a new method for cancer module discovery. BMC Genomics. 2014;15(Suppl. 10):S8. doi: 10.1186/1471-2164-15-S10-S8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Margolin A.A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinforma. 2006;7(Suppl. 1):S7. doi: 10.1186/1471-2105-7-S1-S7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Martinez F.O., Gordon S., Locati M., Mantovani A. Transcriptional profiling of the human monocyte-to-macrophage differentiation and polarization: new molecules and patterns of gene expression. J. Immunol. 2006;177:7303–7311. doi: 10.4049/jimmunol.177.10.7303. [DOI] [PubMed] [Google Scholar]
  56. Mei J. Prognostic impact of tumor-associated macrophage infiltration in non-small cell lung cancer: a systemic review and meta-analysis. Oncotarget. 2016;7:34217–34228. doi: 10.18632/oncotarget.9079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Mermel C.H. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12:R41. doi: 10.1186/gb-2011-12-4-r41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Mootha V.K. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003;34:267–273. doi: 10.1038/ng1180. [DOI] [PubMed] [Google Scholar]
  59. Munoz-Fontela C. Transcriptional role of p53 in interferon-mediated antiviral immunity. J. Exp. Med. 2008;205:1929–1938. doi: 10.1084/jem.20080383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Newman M.E., Girvan M. Finding and evaluating community structure in networks. Phys. Rev. E. 2004;69:026113. doi: 10.1103/PhysRevE.69.026113. [DOI] [PubMed] [Google Scholar]
  61. Newman A.M. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Ostuni R., Kratochvill F., Murray P.J., Natoli G. Macrophages and cancer: from mechanisms to therapeutic implications. Trends Immunol. 2015;36:229–239. doi: 10.1016/j.it.2015.02.004. [DOI] [PubMed] [Google Scholar]
  63. Papageorgis P. TGFbeta signaling in tumor initiation, epithelial-to-mesenchymal transition, and metastasis. J. Oncol. 2015;2015:587193. doi: 10.1155/2015/587193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Patel S.P., Kurzrock R. PD-L1 Expression as a Predictive Biomarker in Cancer Immunotherapy. Mol. Cancer Ther. 2015;14:847–856. doi: 10.1158/1535-7163.MCT-14-0983. [DOI] [PubMed] [Google Scholar]
  65. Patil M., Pabla N., Dong Z. Checkpoint kinase 1 in DNA damage response and cell cycle regulation. Cell. Mol. Life Sci. 2013;70:4009–4021. doi: 10.1007/s00018-013-1307-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Querec T.D. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nat. Immunol. 2009;10:116–125. doi: 10.1038/ni.1688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Rangasamy T. Genetic ablation of Nrf2 enhances susceptibility to cigarette smoke-induced emphysema in mice. J. Clin. Invest. 2004;114:1248–1259. doi: 10.1172/JCI21146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Ray P., Zheng L., Lucas J., Carin L. Bayesian joint analysis of heterogeneous genomics data. Bioinformatics. 2014;30:1370–1376. doi: 10.1093/bioinformatics/btu064. [DOI] [PubMed] [Google Scholar]
  69. Ribas A. Association of response to programmed death receptor 1 (PD-1) blockade with pembrolizumab (MK-3475) with an interferon-inflammatory immune gene signature. J. Clin. Oncol. 2015;33 abstr 3001. [Google Scholar]
  70. Roulois D. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts. Cell. 2015;162:961–973. doi: 10.1016/j.cell.2015.07.056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Rusinova I. Interferome v2.0: an updated database of annotated interferon-regulated genes. Nucleic Acids Res. 2013;41:D1040–1046. doi: 10.1093/nar/gks1215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Sadler A.J., Williams B.R. Interferon-inducible antiviral effectors. Nat. Rev. Immunol. 2008;8:559–568. doi: 10.1038/nri2314. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Schalper K.A. Clinical significance of PD-L1 protein expression on tumor-associated macrophages in lung cancer. J. Immunother. Cancer. 2015;3:P415. [Google Scholar]
  74. Simpkins S.B. MLH1 promoter methylation and gene silencing is the primary cause of microsatellite instability in sporadic endometrial cancers. Hum. Mol. Genet. 1999;8:661–666. doi: 10.1093/hmg/8.4.661. [DOI] [PubMed] [Google Scholar]
  75. Singh A. Glutathione peroxidase 2, the major cigarette smoke-inducible isoform of GPX in lungs, is regulated by Nrf2. Am. J. Respir. Cell Mol. Biol. 2006;35:639–650. doi: 10.1165/rcmb.2005-0325OC. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Subramanian A. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Tang K.W., Alaei-Mahabadi B., Samuelsson T., Lindh M., Larsson E. The landscape of viral expression and host gene fusion and adaptation in human cancer. Nat. Commun. 2013;4:2513. doi: 10.1038/ncomms3513. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Taylor B.S. Functional copy-number alterations in cancer. PLoS One. 2008;3:e3179. doi: 10.1371/journal.pone.0003179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Tibshirani R., Hastie T., Friedman J. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010;33:1–22. [PMC free article] [PubMed] [Google Scholar]
  80. Troyanskaya O. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–525. doi: 10.1093/bioinformatics/17.6.520. [DOI] [PubMed] [Google Scholar]
  81. Uchil P.D. TRIM protein-mediated regulation of inflammatory and innate immune signaling and its association with antiretroviral activity. J. Virol. 2013;87:257–272. doi: 10.1128/JVI.01804-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Ulloa-Montoya F. Predictive gene signature in MAGE-A3 antigen-specific cancer immunotherapy. J. Clin. Oncol. 2013;31:2388–2395. doi: 10.1200/JCO.2012.44.3762. [DOI] [PubMed] [Google Scholar]
  83. Vandin F., Upfal E., Raphael B.J. De novo discovery of mutated driver pathways in cancer. Genome Res. 2012;22:375–385. doi: 10.1101/gr.120477.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Vaske C.J. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. 2010;26:i237–245. doi: 10.1093/bioinformatics/btq182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Vendrell J.A. ZNF217 is a marker of poor prognosis in breast cancer that drives epithelial-mesenchymal transition and invasion. Cancer Res. 2012;72:3593–3606. doi: 10.1158/0008-5472.CAN-11-3095. [DOI] [PubMed] [Google Scholar]
  86. Walworth N., Davey S., Beach D. Fission yeast chk1 protein kinase links the rad checkpoint pathway to cdc2. Nature. 1993;363:368–371. doi: 10.1038/363368a0. [DOI] [PubMed] [Google Scholar]
  87. Wu Y. The transcriptional foundations of Sp110-mediated macrophage (RAW264.7) resistance to Mycobacterium tuberculosis H37Ra. Sci. Rep. 2016;6 doi: 10.1038/srep22041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Xi Y., Gong D.W., Tian Z. FSTL1 as a potential mediator of exercise-induced cardioprotection in post-myocardial infarction rats. Sci. Rep. 2016;6:32424. doi: 10.1038/srep32424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Yan W., Chen X. GPX2, a direct target of p63, inhibits oxidative stress-induced apoptosis in a p53-dependent manner. J. Biol. Chem. 2006;281:7856–7862. doi: 10.1074/jbc.M512655200. [DOI] [PubMed] [Google Scholar]
  90. Ye X.T., Guo A.J., Yin P.F., Cao X.D., Chang J.C. Overexpression of NUAK1 is associated with disease-free survival and overall survival in patients with gastric cancer. Med. Oncol. 2014;31:61. doi: 10.1007/s12032-014-0061-1. [DOI] [PubMed] [Google Scholar]
  91. Yuan Y. Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat. Biotechnol. 2014;32:644–652. doi: 10.1038/nbt.2940. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Zhang H.Y., Li J.H., Li G., Wang S.R. Activation of ARK5/miR-1181/HOXA10 axis promotes epithelial-mesenchymal transition in ovarian cancer. Oncol. Rep. 2015;34:1193–1202. doi: 10.3892/or.2015.4113. [DOI] [PubMed] [Google Scholar]
  93. Zhou X. Epigenetic inactivation of follistatin-like 1 mediates tumor immune evasion in nasopharyngeal carcinoma. Oncotarget. 2016;7:16433–16444. doi: 10.18632/oncotarget.7654. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Zitvogel L., Galluzzi L., Kepp O., Smyth M.J., Kroemer G. Type I interferons in anticancer immunity. Nature reviews. Immunology. 2015;15:405–414. doi: 10.1038/nri3845. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Fig. 1

Number of regulator genes (genes that regulate at least one module from one cancer type) identified in AMARETTO by only copy number, only DNA methylation or both.

mmc1.pdf (62.1KB, pdf)
Supplementary Fig. 2

Boxplots representing the prediction performances (mean squared error MSE and R-square) obtained after running AMARETTO using copy number data only, methylation only or both copy number and methylation on the 11 cancer sites.

mmc2.pdf (1.1MB, pdf)
Supplementary Fig. 3

Overview of the module network (upper figure), the pancancer module network (on the bottom, left) and a zoomed view of intriguing subnetworks. The nodes of the graph are all the modules across all cancers sites. Their size depends on the node degree (number of incident edges). An edge between two modules stands for a significant association between them (measured through the minus log-transformation of the adjusted p-value, which also defines the edge thickness). For the top figure, the node color depends on the associated cancer site, for the two bottom figures, the node color depends on the subnetwork it belongs to.

mmc3.pdf (285.4KB, pdf)
Supplementary Fig. 4

Correlations between GPX2 expression and the averaged expression of an oxidative response gene signature for all cancer sites.

mmc4.pdf (254.4KB, pdf)
Supplementary Fig. 5

Correlations between GPX2 expression and the averaged expression of a xenobiotic response gene signature for all cancer sites.

mmc5.pdf (214.1KB, pdf)
Supplementary Fig. 6

Boxplots representing the association between GPX2 expression and smoking profile for BLCA and HNSC cancers.

mmc6.pdf (23.9KB, pdf)
Supplementary Fig. 7

Venn diagram representing the number of genes regulating the immune response subnetwork and induced by IFNs of type I, II or III.

mmc7.pdf (4.8KB, pdf)
Supplementary Fig. 8

Scatterplots representing the associations between OAS2 expression and PD-L1 expression on the left, and PD-L2 expression, on the right for all cancer sites.

mmc8.pdf (608.1KB, pdf)
Supplementary Table 1

(a) The top 50 most selected driver genes regulating modules enriched in major pathways of cancer including angiogenesis, hypoxia, EMT, cell cycle, immune response, apoptosis, metastases, integrin signaling and EGFR across all cancer sites, (b) genetic and epigenetic alterations of the top drivers across all cancer sites and (c) comparison of average number of enriched gene sets per module in 100 random permutations vs. the actual modules per cancer.

mmc9.pdf (124.4KB, pdf)
Supplementary Table 2

Number of modules regulated by driver genes for all cancer types. Only the top regulator genes, ranked according to the total number of regulated modules across all cancer sites (last column) are represented.

mmc10.pdf (50.4KB, pdf)
Supplementary Table 3

Prediction performances (R-square and mean squared error MSE) obtained after running AMARETTO using copy number data only, methylation only or both copy number and methylation on the 11 cancer sites. The two last columns indicate the R-square and MSE increase when adding methylation data.

mmc11.pdf (35.8KB, pdf)
Supplementary Table 4

Distribution of the modules per cancer (column) and subnetwork (row). The last column indicates the total number of modules within a subnetwork.

mmc12.pdf (33.2KB, pdf)
Supplementary Table 5

Cancer driver genes and enrichment results of the smoking subnetwork ranked by number of modules each driver gene is participating in.

mmc13.pdf (63.8KB, pdf)
Supplementary Table 6

(a) Oxidative and xenobiotic gene signatures provided by the GO ontology and used to measure the association of GPX2 expression with smoking. (b) Correlations and p-values measuring the association between smoking related data (smoking profile, number of smoked years and pack years) and GPX2 expression across all cancer sites with enough data (BLCA, HNSC, LUAD, LUSC). The percentage of missing data is also indicated.

mmc14.pdf (60.7KB, pdf)
Supplementary Table 7

Experimental validation of the modules and their target genes regulated by GPX2 as a causative driver of the pancancer smoking subnetwork. Table shows overall GSEA enrichment results of the three perturbation experiments upon GPX2 knockdown in the lung adenocarcinoma A549 cell line derived from LINCS (columns: consensus, shRNA1, shRNA2) in each of the 8 modules regulated by GPX2 in the 5 cancer sites (rows: modules organized by the 5 sites in the following order: LUAD, LUSC, BLCA, HNSC, UCEC). On top are shown the significance levels (p-values and FDR values; green if p-value < 0.05 and FDR < 0.25, yellow if FDR < 0.25) and below, the normalized enrichment scores (NES; blue if repressed, red if induced). On the right are the sizes of the signatures, from left to right: number of genes in the modules, number of genes that are part of the LINCS landmark and bing genes, and the numbers of those genes that are identified as ‘leading edge’ genes driving the GSEA enrichment scores in the three experiments (consensus, shRNA1, shRNA2).

mmc15.pdf (119KB, pdf)
Supplementary Table 8

Cancer driver genes and enrichment results of the immune response subnetwork ranked by number of modules each driver gene is participating in.

mmc16.pdf (83.6KB, pdf)
Supplementary Table 9

Correlations and p-values for Pearson test measuring the association between OAS2 expression and PD-L1/PD-L2 expression for all cancer sites.

mmc17.pdf (40.4KB, pdf)
Supplementary Table 10

Cancer driver genes and enrichment results of the histone subnetwork ranked by number of modules each driver gene is participating in.

mmc18.pdf (65.3KB, pdf)

Articles from EBioMedicine are provided here courtesy of Elsevier

RESOURCES