2024 May 15;48(8):414–432. doi: 10.1002/gepi.22566

Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma

Diptavo Dutta 1, Ananda Sen 2,3, Jaya M Satagopan 4
PMCID: PMC11589067  PMID: 38751238

Abstract

Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating gene expressions that drive critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing “gene component scores” and test their interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA‐KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of ASAH1 gene trans‐regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage.
These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia‐regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.

Keywords: interaction, pathway, sparse canonical correlation

1. INTRODUCTION

Cancer exhibits a diverse and enriched burden of somatic DNA copy number aberrations (CNAs) and DNA methylations (Eder et al., 2005; Zender et al., 2006; Zhang et al., 2018). The CNAs and methylations may regulate cellular growth pathways by altering gene expressions at the RNA level that are crucial biological processes for tumorigenesis and disease outcomes. Modern cancer studies increasingly obtain large volumes of genome‐wide data on CNAs, methylations, and gene expressions from tumor samples. Analyzing these large data to identify biologically relevant CNAs and methylations, to discern the mechanisms through which they affect downstream gene expressions, and to investigate the resulting effect on cancer outcome is crucial for effective disease management and for developing successful biomarker‐based anticancer therapies. We use the term biomarker to refer to downstream genes whose expressions are regulated by key CNA sites and methylation sites, and whose expression impacts disease outcome. Although identifying individual CNAs associated with disease outcomes provides important etiological insights, identifying downstream biomarkers, that is, intermediate molecular phenotypes like gene expressions associated with the disease, is important for developing therapeutic strategies and drug repurposing, providing ways to translate the etiological insights into clinical management options (Chin & Gray, 2008; Malone et al., 2020). In this paper, we present an analysis pipeline for identifying such biomarkers.

Standard approaches for identifying biomarkers using large and multiple types of genomics data include individual feature analysis (Despierre et al., 2014; Nguyen et al., 2020; Sapkota et al., 2013) and regularization techniques (Chen et al., 2009; Hastie et al., 2001; Witten et al., 2009). These methods do not leverage putative biological relationships between CNAs, methylation, and gene expressions. However, it is well‐understood that genes and gene products rarely act in isolation, but form networks by working with other genes to address specific biological functions that inform disease outcomes (Wang et al., 2011). Therefore, alternative methods such as piecewise linear regression spline, lots of lasso, weighted correlation network analysis, and Oncodrive‐CIS strive to harness underlying biological relationships by selecting significant CNA‐gene expression pairs (Chin et al., 2007; Langfelder & Horvath, 2008; Leday & van de Wiel, 2013; Meinshausen & Bühlmann, 2010; Meinshausen et al., 2009; Tamborero et al., 2013). However, these methods do not identify comprehensive maps of networks or regulatory modules that are essential for gaining insights into biomarkers reflecting disease biology. The purpose of this paper is to fill this gap in the analysis of genomics data.

In genomics data analysis, the major challenges in identifying regulatory modules are the large dimension of CNAs, methylations, and gene expressions, and the sparsity of CNAs. As a result, it becomes quite difficult to model the large and sparse matrix of CNA data in relation to large matrices of methylation and gene expression data. To address this difficulty, sparse canonical correlation analysis (sCCA) is emerging as an important computational tool for modeling large genomics matrices by leveraging the inter‐relatedness of DNA and RNA features (Hardoon & Shawe‐Taylor, 2011; Witten et al., 2009). The sCCA approach correlates two types of genomics data—for example, CNA and gene expression—and extracts maximally correlated sets or modules or components of features. To achieve this, sCCA produces sparse latent variables representing biologically relevant sets of genomics data. This approach has been previously used in high‐dimensional analyses of transcriptomics (Dutta, He, et al., 2022) and especially various cancers. For example, Witten and Tibshirani used sCCA to select CNA‐regulated gene expression components (i.e., sets of gene expression which we also refer to as gene components) associated with survival of diffuse large B‐cell lymphoma patients (Witten & Tibshirani, 2009). Lin and colleagues used a modified sCCA to select biologically relevant single‐nucleotide polymorphism (SNP) sets and gene components in human gliomas (Lin et al., 2013). Priya and colleagues used sCCA within a machine learning framework to select sets of host microbiomes and gene components and used these to examine host genes and pathways in colorectal cancer, inflammatory bowel disease, and irritable bowel syndrome (Priya et al., 2022). In prior work, we used sCCA to identify gene components regulated by CNA sites in breast tumors and demonstrated the association between these biomarkers and breast cancer outcomes (Dutta, Sen, et al., 2022).

Although sCCA has proven effective in identifying a smaller number of correlated features between two data types (Laha & Mukherjee, 2023), modern genomics studies measure multiple types of genomics features such as CNAs, methylation, and gene expression. The sCCA approach cannot be readily used to examine the relationship between multiple features. Therefore, recent works examined correlations between pairs of data types using sCCA: CNAs versus methylations, CNAs versus gene expressions, and methylations versus gene expressions (Rodosthenous et al., 2020). Focusing on correlations between pairs of data types separately effectively ignores the underlying dependency between multiple data types that may be informative in identifying potential biomarkers. Therefore, to overcome such gaps, Witten and Tibshirani developed the sparse multiple canonical correlation approach (Witten & Tibshirani, 2009), which maximizes the sum of correlation between pairs of genomic data sets (for example, the sum of correlation between CNA and methylation, methylation and gene expression, and gene expression and CNA). This approach treats each data type symmetrically in the sense that it examines correlations between every pair of data types. However, in cancer genomic studies, one may often want to examine specific patterns of correlations, derived from the underlying biological or mechanistic hypothesis, for example, to identify gene components that are regulated by CNA‐associated methylation sites. This entails an asymmetric investigation by first identifying sets of methylation sites that are correlated with sets of CNA sites, and then identifying gene components that are correlated with these methylation sites. The sparse multiple canonical correlation approach cannot be directly applied to such investigations.
Therefore, in this paper, we propose a novel analysis pipeline, known as the joint sparse canonical correlation analysis (jsCCA) pipeline, to analyze multiple genomic data types to examine such asymmetric relationships for identifying biomarkers.

Our proposed jsCCA analysis pipeline consists of two key parts that are executed in three steps. Part 1 (Step 1; see below) simultaneously optimizes CNA‐methylation and methylation‐gene expression correlations, to identify gene components that are regulated by CNA‐associated methylations. The motivation behind part 1 is the hypothesis that associations at the DNA level can impact biological processes at the RNA level via regulating methylation levels. Part 2 (Steps 2–3; see below) investigates the association between the expressions of genes in these gene components and disease outcome. The motivation behind part 2 is the hypothesis that biological processes at the RNA level can impact disease outcome. Given the limited sample sizes of current omics studies, we can have low power to detect individual gene expressions within the identified gene modules that are associated with the outcome of interest. However, the overall effect of the gene module on the outcome might still be significant due to the aggregation of numerous weaker effects. Thus, to better leverage potential cumulative effects of gene expressions on disease outcome, we propose to construct gene component scores (GCSs) for each module as a weighted linear combination of the expression of genes in each component and examine the association between the scores and outcome. This is analogous to constructing polygenic scores (PGS) in genetic association studies, using gene expression modules rather than significant variants. Thus, Part 2 of our pipeline consists of two steps: first, estimating the weights to construct GCSs, and then examining the association between the GCSs and outcome in an independent sample. In a similar vein, we can also adjust for additional covariates and examine interactions of such GCSs with risk factors and exposures.

Throughout this article, we shall use CNA, methylation, gene expression, clinical data, and smoking exposure in renal clear cell carcinoma (ccRCC) cases from The Cancer Genome Atlas (TCGA) (i.e., TCGA‐KIRC) to illustrate our proposed jsCCA analysis pipeline (La Rochelle et al., 2010; Monzon et al., 2011). We will use stage at diagnosis of ccRCC as our outcome of interest. Previous works have identified genomic features associated with ccRCC. For example, mutations and copy number changes in PBRM1, BAP1, SETD2, and KDM5C genes are associated with higher‐grade tumor and poor survival (Bihr et al., 2019; Joosten et al., 2018; Weaver et al., 2022); methylation of ZNF677 and PCDH8 genes are associated with poor survival (Ghatalia & Rathmell, 2018). ClearCode34, a validated score based on the expression of 34 genes, has been suggested as a biomarker to identify patients who will benefit most from risk‐adapted treatment approaches since this score is associated with recurrence in patients with localized disease (Ghatalia & Rathmell, 2018). However, due to the identification of association between a small number of such CNAs, methylation sites, and genes, coupled with low prevalence of specific CNAs and the limited applicability of gene expression scores beyond specific subgroups, current genomic features can only be used for managing a small segment of patients. Indeed, for ccRCC, the incorporation of biomarkers into clinical applications has not been widely implemented (Weaver et al., 2022). We will examine whether our jsCCA analysis pipeline identifies any previously known biomarkers for stage at diagnosis of ccRCC by leveraging potential inter‐relatedness between CNAs, methylations, and gene expressions.

We describe the jsCCA analysis pipeline in Section 2. Application of the pipeline to analyze ccRCC data from TCGA is given in Section 3. Lastly, in Section 4, we provide a discussion, limitations, and areas of future research.

2. METHODS

The overarching goal of jsCCA is to identify sets of gene expressions, termed gene components, whose expression levels are regulated by CNA‐associated methylation sites and investigate the association between these components and a disease outcome of interest. Achieving this goal involves three key steps as outlined in Figure 1.

  • Step 1 (Part 1; see above) identifies gene components by iteratively applying sparse canonical correlation analysis to $N_{cna}$ CNA features, $N_{meth}$ methylation features, and $N_{expr}$ gene expression features measured in a set of independent individuals (Box 1 and Box 2 of Figure 1). Each gene component consists of a set of gene expressions (or features) selected for their biological relevance in the sense that these gene expressions are regulated by certain CNA‐associated methylation sites. This method identifies multiple orthogonal gene components, where the total number of features identified across all the components is considerably smaller than $N_{expr}$.

  • Step 2 (Part 2; see above) fits a generalized linear model for a disease outcome in relation to the biologically relevant gene expressions, separately for each gene component selected in Step 1. This step provides the estimated effects of the gene expressions on the outcome for each gene component. These estimates can be used to create GCSs as weighted linear combinations of gene expressions, separately for each component, by using the estimated effects as weights (Box 3 of Figure 1).

  • Finally, Step 3 (Part 2; see above) fits a generalized linear model to test the association between each GCS and the disease outcome, incorporating, where relevant, environmental exposures and their interactions with the GCS and adjusting for covariates (Box 4 of Figure 1).

Figure 1.

Current analysis pipeline including identification of joint sparse canonical correlation analysis (jsCCA) modules. The jsCCA to select gene expression modules regulated by copy number aberrations‐associated methylation sites is described in section “jsCCA.” Performing association analysis of these gene modules in relation to outcome to estimate the effects of individual genes in a module to obtain gene component scores (GCS) is described in section “GCS calculation.” Performing association analysis of GCS, including analysis of their interaction with exposure (smoking), is described in section “Association and interaction.” Application of this approach to identify genes associated with stage at diagnosis of renal clear cell carcinoma (ccRCC) and interpretation are given in section “Results.”

To avoid optimistic bias in gene component selection, effect estimation, and the investigation of their association with disease outcome, we propose to conduct Steps 1 and 2 using an independent set of samples, referred to as training samples, that are distinct from those used for Step 3, which are referred to as test samples. The methodological details of these steps are given below.

Step 1 (Box 1 and Box 2, Figure 1): jsCCA to identify gene components regulated by CNA‐associated methylation sites. Let $C_{N_{train} \times N_{cna}}$, $M_{N_{train} \times N_{meth}}$, and $E_{N_{train} \times N_{expr}}$ be matrices representing CNA, normalized methylation, and normalized gene expression levels for $N_{cna}$ CNA sites, $N_{meth}$ methylation sites, and $N_{expr}$ genes, respectively, in $N_{train}$ individuals. To identify gene components regulated by CNA‐associated methylation sites, we solve an optimization problem to discover sparse vectors corresponding to CNA ($u_{N_{cna} \times 1}$; termed CNA component), methylation sites ($w_{N_{meth} \times 1}$; termed methylation component), and gene expressions ($v_{N_{expr} \times 1}$; termed gene component) such that the correlations between linear combinations $Cu$ and $Mw$ and between $Mw$ and $Ev$ are maximized. In notation, this is given by:

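$$(u, w) = \arg\max_{u, w}\; u^{T} Z_{CM}\, w,$$
$$(w, v) = \arg\max_{w, v}\; w^{T} Z_{ME}\, v,$$

with $\|u\|_1 \le \lambda_u$; $\|v\|_1 \le \lambda_v$; $\|w\|_1 \le \lambda_w$ and $\|u\|_2 = 1$, $\|v\|_2 = 1$ and $\|w\|_2 = 1$, where $Z_{CM} = C^{T} M$ and $Z_{ME} = M^{T} E$, $\|\cdot\|_h$ denotes the $L_h$ norm, the superscript $T$ denotes matrix transpose, and $\lambda_u$, $\lambda_v$, and $\lambda_w$ are sparsity parameters.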

This is numerically equivalent to solving the following optimization problem:

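$$(u, w, v) = \arg\max_{u, w, v}\; u^{T} C^{T} M w + w^{T} M^{T} E v,$$

with $\|u\|_1 \le \lambda_u$; $\|v\|_1 \le \lambda_v$; $\|w\|_1 \le \lambda_w$ and $\|u\|_2 = 1$, $\|v\|_2 = 1$ and $\|w\|_2 = 1$.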

Previous work has considered similar optimization problems in the context of imaging genetics, prediction, and process monitoring, among others (Fang et al., 2016; Wilms & Croux, 2015; Xiu et al., 2021).

The maximization details are given in the Appendix. Ideally, this solution selects a sparse set of CNA sites that are correlated with a sparse set of methylation sites, which, in turn, are correlated with a sparse set of gene expressions. This sparse set is denoted by nonzero elements of the components of the triplet (u,v,w), which we refer to as a jsCCA module. Using the same methylation component w in the above equations selects gene components regulated by methylation sites that are maximally correlated with CNAs, that is, gene components regulated by CNA‐associated methylations.
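The authors' maximization procedure is in the Appendix, which is not reproduced here. As a rough illustration only, the sketch below assumes an alternating soft-thresholding scheme in the spirit of the penalized matrix decomposition (Witten et al., 2009): each vector is updated in turn with the other two held fixed, then rescaled to unit L2 norm with a threshold chosen by bisection so that the L1 bound holds. The function names, the SVD-based starting value, the iteration counts, and the simulated matrices are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def sparse_unit_vector(a, l1_bound, n_bisect=50):
    """Soft-threshold `a`, rescale to unit L2 norm, and choose the threshold
    by bisection so that the L1 norm does not exceed `l1_bound`."""
    if l1_bound < 1.0:
        raise ValueError("l1_bound must be at least 1 for a unit-L2 vector")

    def normalized(delta):
        s = soft_threshold(a, delta)
        norm = np.linalg.norm(s)
        return s / norm if norm > 0 else s

    x = normalized(0.0)
    if np.abs(x).sum() <= l1_bound:
        return x  # constraint already holds without any thresholding
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.abs(normalized(mid)).sum() > l1_bound:
            lo = mid
        else:
            hi = mid
    return normalized(hi)

def jscca_first_module(C, M, E, lam_u, lam_w, lam_v, n_iter=100):
    """Alternating updates for max u'C'Mw + w'M'Ev under the L1/L2 constraints.
    Illustrative only; not the authors' implementation."""
    Z_cm = C.T @ M  # N_cna x N_meth
    Z_me = M.T @ E  # N_meth x N_expr
    # Assumed starting value: leading left singular vector of Z_me.
    w = np.linalg.svd(Z_me, full_matrices=False)[0][:, 0]
    for _ in range(n_iter):
        u = sparse_unit_vector(Z_cm @ w, lam_u)               # CNA component
        v = sparse_unit_vector(Z_me.T @ w, lam_v)             # gene component
        w = sparse_unit_vector(Z_cm.T @ u + Z_me @ v, lam_w)  # shared methylation component
    return u, w, v

# Simulated, column-centered toy data (30 individuals); not TCGA data.
rng = np.random.default_rng(0)
C = rng.standard_normal((30, 12))
M = rng.standard_normal((30, 10))
E = rng.standard_normal((30, 15))
C, M, E = (X - X.mean(axis=0) for X in (C, M, E))
u, w, v = jscca_first_module(C, M, E, lam_u=2.0, lam_w=2.0, lam_v=2.0)
```

The update for w uses both cross-product matrices at once, which is what ties the methylation component to the CNA and gene components simultaneously rather than in two separate pairwise fits.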

After obtaining the first jsCCA module, subsequent modules are obtained iteratively by Hotelling's matrix deflation and under the constraint of being uncorrelated and “strictly orthogonal” with previous modules (see Appendix for details). To facilitate interpretation, we choose the sparsity parameters $\lambda_u$, $\lambda_w$, and $\lambda_v$ such that there is no overlap between the CNA sites selected in different jsCCA modules and, similarly, no overlap between the methylation sites or gene expressions selected in the different modules (see Appendix for details), which we call strict orthogonality. This allows for a conceptual partitioning of the features through jsCCA, allowing us to interpret the jsCCA modules as potentially distinct biological processes affecting the outcome. Each jsCCA module combines multiple associations between the selected CNA sites, methylation sites, and gene expressions and, thus, represents regulation of gene components by CNA‐associated methylation sites.
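The exact deflation and orthogonality constraints are specified in the Appendix, which is not shown here. The following sketch only illustrates the two ideas named in the text, under the assumption that Hotelling's deflation is applied to a cross-product matrix by subtracting the rank-one part captured by a pair of unit-norm vectors, and that strict orthogonality is checked as non-overlapping supports. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def hotelling_deflate(Z, a, b):
    """Subtract the rank-one part captured by unit vectors a and b:
    Z - (a' Z b) a b'."""
    d = a @ Z @ b
    return Z - d * np.outer(a, b)

def supports_disjoint(vectors):
    """True if no feature index is nonzero in more than one vector."""
    seen = set()
    for x in vectors:
        idx = set(np.flatnonzero(x).tolist())
        if seen & idx:
            return False
        seen |= idx
    return True

# Toy cross-product matrix and unit-norm sparse vectors (hypothetical values).
rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 3))
a = np.array([0.6, 0.8, 0.0, 0.0])
b = np.array([1.0, 0.0, 0.0])

Z_deflated = hotelling_deflate(Z, a, b)
residual = a @ Z_deflated @ b  # numerically zero after deflation

disjoint = supports_disjoint([a, np.array([0.0, 0.0, 1.0, 0.0])])
overlapping = supports_disjoint([a, np.array([0.0, 1.0, 0.0, 0.0])])
```

After deflation, the covariance along the already-extracted pair of directions is removed, so the next module has to explain a different part of the cross-product matrix.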

The maximum number of jsCCA modules that can be selected in this iterative approach is $\min\{N_{cna}, N_{meth}, N_{expr}\}$. However, to get a reasonable number of components, we extract the number of components that maximizes the iterative ratio of canonical correlations (see Appendix).

Step 2 (Box 3, Figure 1): Identify genes associated with disease outcome and estimate overall effects of gene expression in identified gene components on disease outcome to calculate GCS. Suppose we identify $K$ ($\le \min\{N_{cna}, N_{meth}, N_{expr}\}$) jsCCA modules in Step 1. This corresponds to obtaining $K$ sets of triplet vectors $(u, v, w)$ using jsCCA. Consider the gene expressions corresponding to the gene component of the $k$‐th jsCCA module. This will be a matrix of $p_k$ normalized gene expressions for all the $N_{train}$ individuals, where $p_k$ is the total number of nonzero elements of the gene component $v$ in the $k$‐th jsCCA module. Denote $E_k$ as this $N_{train} \times p_k$ matrix, where $k = 1, \ldots, K$. The $(i,j)$‐th element of this matrix, denoted $E_{k,i,j}$, is the normalized expression of gene $j$ in component $k$ for person $i$. For each gene component identified in the $k$‐th jsCCA module ($k = 1, \ldots, K$), we relate the $p_k$ genes to disease outcome of interest using a generalized linear model:

$$g[\mathrm{Exp}(y_i)] = \beta_{k0} + E_k \beta_k,$$

where $g[\cdot]$ is a canonical link function, $\mathrm{Exp}(y_i)$ is the expected value of the outcome $y_i$ of the $i$‐th individual, $\beta_{k0}$ is the model intercept and $\beta_k = (\beta_{k1}, \beta_{k2}, \ldots, \beta_{kp_k})$ are the effects of the $p_k$ gene expressions in the gene component of the $k$‐th jsCCA module. These effects can be estimated using standard maximum likelihood estimation methods for generalized linear models.

We propose to use these estimated effects to obtain a parsimonious composite score for the $k$‐th gene component of an individual. We define this score, which we refer to as the GCS, as the weighted linear combination of the $p_k$ gene expressions in an individual, where the weights are the estimated effects. In notation, the GCS corresponding to the $k$‐th gene component of the $i$‐th individual is written as:

$$\mathrm{GCS}_{ki} = \sum_{j=1}^{p_k} \beta_{kj} \times E_{kij}.$$

Once we select the gene component in Step 1, we estimate the overall effects of these genes, aggregated within a gene component, using the above generalized linear model. Note that we use all the $p_k$ genes to obtain the GCS, without resorting to p‐value thresholding to calculate a GCS based solely on statistically significant gene expressions.

In practical applications, we may examine the statistical significance of the effects by calculating false discovery rates to gain insights into the characteristics of gene expression features having significant association with the outcome, even though we do not select significant features to construct the GCSs.
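Step 2 can be sketched as follows, assuming a binary outcome (such as low versus high stage) with a logit link. The Newton-Raphson fitter, the tiny ridge term added for numerical stability, and the simulated expression data are assumptions made for this illustration and do not come from the paper; any standard GLM routine could be substituted.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit logit P(y = 1) = b0 + X b by Newton-Raphson.
    The small ridge term is only a numerical safeguard (an assumption here)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # add intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        weights = p * (1.0 - p)
        grad = A.T @ (y - p)
        hess = A.T @ (A * weights[:, None]) + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1:]

def gene_component_score(E_k, beta_k):
    """GCS for each individual: weighted sum of the p_k gene expressions."""
    return E_k @ beta_k

# Simulated training data for one gene component with p_k = 3 genes.
rng = np.random.default_rng(2)
E_train = rng.standard_normal((400, 3))
true_beta = np.array([1.0, -1.0, 0.5])  # hypothetical effects used for simulation
prob = 1.0 / (1.0 + np.exp(-(E_train @ true_beta)))
y_train = (rng.random(400) < prob).astype(float)

beta0_hat, beta_hat = fit_logistic(E_train, y_train)

# Scores for new individuals use the training weights; all p_k genes are kept.
E_test = rng.standard_normal((5, 3))
gcs_test = gene_component_score(E_test, beta_hat)
```

Note that the intercept is estimated but does not enter the score, matching the GCS definition as a weighted sum of gene expressions only.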

Step 3 (Box 4, Figure 1): Association testing. In this final step, we test the statistical significance of each GCS in relation to disease outcome using a generalized linear model in an independent test data set of $N_{test}$ individuals. We can account for covariates and incorporate interactions between the GCSs and environmental exposures of interest. Denote $Z_i$ as a vector of covariates of the $i$‐th individual with outcome $y_i$ in the test data set. Let $S_i$ denote the value of an environmental exposure in this individual. For example, $S_i$ can be a binary variable denoting the presence or absence of exposure. Let $\mathrm{GCS}_{ki}$ denote the GCS corresponding to the gene component of the $k$‐th jsCCA module for person $i$ in the test data set. For each $k = 1, 2, \ldots, K$, we fit a generalized linear model relating the $k$‐th GCS, exposure, and their interaction to disease outcome, accounting for covariates, as:

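$$g[\mathrm{Exp}(y_i)] = \mu_{k0} + Z_i \delta_k + S_i \times \mu_{k1} + \mathrm{GCS}_{ki} \times \mu_{k2} + S_i \times \mathrm{GCS}_{ki} \times \theta_k,$$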

where $\mu_{k0}$ is the model intercept, $\delta_k$ is a vector of covariate effects, $\mu_{k1}$ is the main effect of the exposure when modeling the $k$‐th GCS in relation to the outcome, $\mu_{k2}$ is the main effect of the $k$‐th GCS, and $\theta_k$ is the effect of interaction between exposure and the $k$‐th GCS. We use the score statistic to test the statistical significance of the main and interaction effects and adjust for multiple comparisons by calculating false discovery rates (FDR). Gene components and interactions having FDR < 0.05 are declared as being significantly associated with the outcome.
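Two mechanical pieces of Step 3 can be sketched as below: building the design matrix with the exposure-by-GCS interaction column, and converting a set of p-values into FDR values. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment is assumed here. The score tests themselves are not implemented; the p-values, covariates, and scores in the example are hypothetical placeholders.

```python
import numpy as np

def interaction_design(Z, S, gcs):
    """Design matrix columns: intercept, covariates Z, exposure S, GCS, S * GCS."""
    n = len(S)
    return np.column_stack([np.ones(n), Z, S, gcs, S * gcs])

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (an assumed choice of FDR method)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Hypothetical test-set values for four individuals.
Z = np.array([[60.0, 1.0], [55.0, 0.0], [70.0, 1.0], [48.0, 0.0]])  # e.g. age, sex
S = np.array([1.0, 0.0, 1.0, 0.0])    # binary exposure, e.g. smoking
gcs = np.array([0.2, -0.5, 1.1, 0.3])  # scores for one gene component
X = interaction_design(Z, S, gcs)

# Hypothetical p-values standing in for score tests across gene components.
pvals = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(pvals)
significant = fdr < 0.05
```

In this toy input only the first p-value stays below the 0.05 FDR cutoff after adjustment, which shows why a nominal p-value of 0.03 or 0.04 need not be declared significant.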

3. RESULTS

3.1. TCGA kidney renal clear cell cancer data

We obtained the TCGA kidney renal clear cell cancer (TCGA‐KIRC) data (Cancer Genome Atlas Research Network, 2013), which include 537 ccRCC cases, from the publicly available cBioPortal. This repository hosts uniformly preprocessed and normalized cancer omics data and thus we did not further preprocess the data. Demographic data on age at diagnosis, sex, race, ethnicity, and clinical data on tumor characteristics are available for all 537 cases. Our outcome of interest is stage at diagnosis, a binary variable denoting low or high‐stage ccRCC. Molecular data on CNAs, CpG island methylation, and gene expressions are available for 530, 322, and 536 cases, respectively. All three molecular features (CNAs, methylation, and gene expression) are available for 313 cases. Further, CNAs and gene expression data, but not methylation data, are available for 215 cases. Thus, we will use the 313 cases as training data (i.e., $N_{train}$ = 313) to identify the jsCCA modules and to estimate weights for deriving the GCSs. We will use the remaining 215 cases as test data (i.e., $N_{test}$ = 215) to examine the effect of GCSs on stage and evaluate the interaction between the GCSs and smoking, which is an established risk factor for ccRCC tumor stage. The characteristics of the training and test data are described in Supporting Information S1: Table 1.
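The training and test split described above follows from data availability alone, and can be expressed with set operations. The sketch below uses made-up case identifiers, not TCGA barcodes, and the function name is hypothetical.

```python
def split_cases(cna_ids, meth_ids, expr_ids):
    """Training: cases with all three data types.
    Test: cases with CNA and expression data but no methylation data."""
    cna, meth, expr = set(cna_ids), set(meth_ids), set(expr_ids)
    train = cna & meth & expr
    test = (cna & expr) - meth
    return train, test

# Hypothetical case identifiers for illustration only.
cna_ids = ["P1", "P2", "P3", "P4", "P5"]
meth_ids = ["P1", "P2", "P6"]
expr_ids = ["P1", "P2", "P3", "P4", "P6"]

train_ids, test_ids = split_cases(cna_ids, meth_ids, expr_ids)
```

By construction the two sets cannot overlap, which is what keeps the Step 3 association tests independent of the samples used to select modules and estimate weights.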

3.2. Estimation of jsCCA modules (Step 1)

We identified K = 8 jsCCA modules using the training data. Each module consists of a CNA component, methylation component, and gene component. Across the eight components, jsCCA selects 698 CNA sites, 1996 methylation sites, and 1336 gene expression sites (Table 1 and Supporting Information S1: Table 2).

Table 1.

Overall description of the CNA, methylation sites, and genes identified in the eight jsCCA modules.

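| jsCCA module | CNA component: number of CNA selected | CNA component: chromosome | CNA component: maximally enriched genomic location (Mb) | Methylation component: number of sites selected | Methylation component: sites on different chromosome selected by CNA | Gene component: number of genes selected | Gene component: trans‐associations |
|---|---|---|---|---|---|---|---|
| 1 | 58 | 20 | 45–47 | 262 | 230 | 151 | 135 |
| 2 | 53 | 18 | 2–11 | 218 | 191 | 166 | 122 |
| 3 | 61 | 5 | 131–148 | 320 | 249 | 184 | 113 |
| 4 | 34 | 12 | 1–3 | 238 | 188 | 164 | 141 |
| 5 | 97 | 1 | 216–229 | 210 | 184 | 157 | 124 |
| 6 | 238 | 3 | 9–27 and 42–52 | 255 | 211 | 155 | 118 |
| 7 | 58 | 14 | 60–78 | 233 | 201 | 183 | 143 |
| 8 | 99 | 10 | 101–117 | 260 | 194 | 176 | 131 |
| Total | 698 | | | 1996 | 1648 | 1336 | 1027 |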

Note: Within the gene components, trans‐associations are defined as genes that have no selected CNA or methylation sites within ±5 Mb of their transcription start site.

In general, for each jsCCA module, the CNA component mostly selects CNA sites located in a small subregion within a chromosome (Figure 2a). Our jsCCA analysis was agnostic of the physical location of the CNA in the genome and hence the estimation of jsCCA modules is not guided by or biased towards selecting positionally proximal CNA sites. However, due to the high correlation between physically proximal CNAs, each CNA component primarily selects a small chromosomal subregion of highly correlated CNAs.

Figure 2.

Modules identified by joint sparse canonical correlation analysis (jsCCA). (a) positions of the copy number aberrations (CNA) selected in the CNA components across eight identified jsCCA modules. (b) The methylation sites associated with the CNAs selected in the corresponding components in jsCCA module 6 on chromosome 10. (c) Examples of cis and trans associations among genes identified by gene and methylation components of jsCCA module 6. Several trans‐associated genes and the corresponding associations, which do not have any CNA selected in CNA component 6, or methylations sites in methylation component 6, are marked in the dark red.

For example, the 238 CNA sites selected in CNA component 6 are located in the chromosome 3p21 region (Table 1; Supporting Information S1: Table 2). Interestingly, the sites in this CNA component include the genes PBRM1, BAP1, and SETD2, which have previously been reported to harbor mutations associated with advanced‐stage ccRCC tumors, as noted earlier (see Section 1). The SETD2 gene is a histone methyltransferase responsible for specific histone modifications and epigenetic regulation, which indirectly impacts transcription (Cancer Genome Atlas Research Network, 2013; Guo et al., 2012). Additionally, PBRM1 is a subunit of the SWI/SNF chromatin remodeling complex involved in altering the structure of chromatin to regulate gene expression (Cancer Genome Atlas Research Network, 2013). The methylation component 6 (i.e., methylation sites having maximal correlation with CNA component 6) contains a set of 255 methylation sites (Table 1) including known cancer‐related transcription factors (TF) such as GATA3 and TCF12, indicating that the CNAs in the 3p21 region might impact the methylation of these genes and consequently the downstream targets of these TFs. The corresponding gene component can then be viewed as the potential set of genes whose expression levels are associated with the methylation sites regulated by selected CNAs in the 3p21 region.

3.3. Comparison with pairwise‐sCCA

As an exploratory analysis, we further compared jsCCA results with those obtained from pairwise sCCA on CNA‐methylation (CM), methylation‐gene expression (ME), and CNA‐gene expression. Given the availability of individual‐level data, we implemented sCCA using the PMA package by Witten and Tibshirani (2009) and Witten et al. (2009), under default settings. Since jsCCA aims to identify shared methylation sites that are associated with both CNA and gene expressions, we focused on the intersection of the methylation sites selected in the sCCA analyses with CM and ME (Supporting Information Figure). We identified 78 shared or common methylation sites between CM and ME analyses as opposed to 262 identified in jsCCA. Of these 78 methylation sites, 66 were also identified using jsCCA. Interestingly, a methylation site at MAPK3, which is a known cancer‐related gene, is not identified in the pairwise analyses but is identified through jsCCA. This provides initial and suggestive evidence that jsCCA can identify plausible and relevant candidates which pairwise sCCA might not be able to capture.

We further examined the biological relevance of the CNA sites, methylation sites, and gene expressions selected in the eight jsCCA modules. These are summarized in the following subsections.

3.3.1. jsCCA modules identify cis & trans associations

The identified jsCCA modules encompass both cis and trans‐associations between the CNA, methylation sites, and gene expressions (Table 1). For the CNA and methylation components, only 348 (17.4%) of the selected methylation sites highlighted cis associations within a neighborhood of ±5 Mb, indicating that most of the genetic regulation of methylation is potentially through distal or trans associations. In contrast, 309 (23.1%) of the gene expressions had a cis methylation site or CNA site in the corresponding methylation or CNA components, respectively, indicating that methylation might have a slightly stronger cis‐regulatory effect on gene expression. However, several interesting examples of distal (trans) regulatory effects on expressions of genes on different chromosomes were also identified in the gene components (Figure 2b,c).
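The cis and trans labels above rest on a ±5 Mb window rule, which can be sketched as a simple lookup. The gene names and coordinates below are hypothetical, and the function names are made up for this illustration; the final lines only re-derive the reported counts and percentages from the Table 1 totals.

```python
WINDOW_BP = 5_000_000  # the +/- 5 Mb neighborhood used in the text

def is_cis(gene_chrom, gene_tss, selected_sites, window=WINDOW_BP):
    """True if any selected site lies on the same chromosome within the window
    around the gene's transcription start site."""
    return any(
        chrom == gene_chrom and abs(pos - gene_tss) <= window
        for chrom, pos in selected_sites
    )

def count_cis_trans(genes, selected_sites):
    """Return (number of cis genes, number of trans genes)."""
    n_cis = sum(is_cis(chrom, tss, selected_sites) for _, chrom, tss in genes)
    return n_cis, len(genes) - n_cis

# Hypothetical selected CNA/methylation sites: (chromosome, position in bp).
sites = [("10", 110_000_000), ("16", 600_000)]
# Hypothetical genes: (name, chromosome, transcription start site in bp).
genes = [
    ("geneA", "10", 112_500_000),  # 2.5 Mb from a selected site: cis
    ("geneB", "10", 120_000_000),  # 10 Mb away: trans
    ("geneC", "2", 46_000_000),    # different chromosome: trans
]
n_cis, n_trans = count_cis_trans(genes, sites)

# Arithmetic check against the Table 1 totals.
cis_meth = 1996 - 1648   # methylation sites not counted as distal
cis_genes = 1336 - 1027  # genes not counted as trans-associations
pct_meth = round(100 * cis_meth / 1996, 1)
pct_genes = round(100 * cis_genes / 1336, 1)
```

The subtraction reproduces the 348 and 309 quoted in the text, so the percentages are consistent with the table totals.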

For example, in jsCCA module 8, the 99 CNA sites are in chromosome 10q (Table 1). The methylation component of this module selects 260 sites which include Phosphatidylinositol Glycan Anchor Biosynthesis Class Q (PIGQ), on chromosome 16 (which corresponds to one of the arcs connecting chromosomes 10 and 16 in Figure 2b), and Phosphatidylinositol Glycan Anchor Biosynthesis Class H (PIGH) on chromosome 14 (one of the arcs connecting chromosomes 10 and 14 in Figure 2b), highlighting potential trans associations. Both PIGQ and PIGH are reported to be prognostic markers for renal cancer (https://www.proteinatlas.org/) and are downstream targets of TF SMC3 in at least one ENCODE ChIP‐seq experiment (AbascalMoore et al., 2020; ENCODE Project Consortium, 2012). In the CNA component of this module, a CNA of SMC3 was selected among the CNA sites, which potentially reinforces known biology about interactions of SMC3 with PIGQ and PIGH. It is to be noted that, if we were to estimate pairwise correlations between CNA sites and methylation sites and test for their statistical significance by accounting for multiple testing, the correlations between SMC3 and the methylation of PIGQ (p = 2.1 × 10⁻³) and PIGH (p = 1.9 × 10⁻²) do not surpass the multiple testing burden of pairwise regression. This correlation can only be identified through the jsCCA approach.

The methylation component of the jsCCA module 8 selects General Transcription Factor IIF Subunit 1 (GTF2F1), which is an essential component of the RNA polymerase II (Pol II) transcription initiation complex and can lead to dysregulation of oncogenes and tumor suppressor genes (Tompkins et al., 2022). In the corresponding gene component, we identify GTF2F1 which indicates that methylation of the regulatory region/CpG islands near this gene can significantly alter its gene expression through cis‐regulation. However, GTF2F1 is a TF target of SMC3 which is selected in the corresponding CNA component. Thus, the CNA selected in jsCCA module 8 in the 10q region, regulates the methylation of GTF2F1, which in turn impacts the expression level of the same gene. Such biological explanation of the associations between CNA, methylation, and gene expression is one of the key strengths of jsCCA.

3.3.2. VHL and HIF2a/EPAS1 interaction

Among the identified jsCCA components, CNA component 6 includes sites in subregions of chromosome 3p (Table 1), which harbors the Von Hippel Lindau tumor suppressor gene (VHL). The VHL gene is the most frequently mutated gene in renal cell carcinoma, with alterations in 41% of cases according to the COSMIC cancer gene census (Sondka et al., 2018), whose primary role is to promote ubiquitination and degradation of hypoxia‐inducible factor (HIF) genes. Interestingly, the sites in the corresponding methylation component 6 contain 255 genes that include the EPAS1 gene located in chromosome 2. The EPAS1 gene encodes HIF‐2α, another key hypoxia response protein which, like several other members of the HIF genes, interacts with VHL and promotes angiogenesis via activation of VEGF and FLT1 (Takeda et al., 2004). The correlations indicate that copy number deletions in VHL are associated with the lower methylation of EPAS1 (r = −0.45, p = 9.9 × 10−18) and hence, higher expression of EPAS1 (r = 0.26, p = 3.1 × 10−06). It is to be noted that jsCCA captures such well‐established and potentially transregulatory processes involved in ccRCC.

3.3.3. Evidence of coregulation in gene components enriched for TF targets

To investigate whether the gene components in each jsCCA module highlight biological processes that are potentially coregulated or represent similar processes, we queried whether the gene components are enriched for targets of TFs. Our analysis showed that each of the eight gene components was enriched for the presence of at least one TF binding motif within 300 base pairs of its transcription start site (Supporting Information S1: Table 3). For example, the gene component for jsCCA module 4 is enriched for motifs binding several TFs including KLF5 (p = 2.8 × 10−03), TP53 (p = 8.2 × 10−03), and ZBTB33 (p = 8.5 × 10−04), indicating that the gene component might be simultaneously regulated by these TFs. Gene component 7 is enriched for motifs binding RFX6 (p = 2.9 × 10−04). The majority of these TFs have been established to be related to different cancers or, more specifically, ccRCC (Luo & Chen, 2021; Singhal et al., 2021; Zhao et al., 2020). Taken together, this suggests that the identified gene components represent potentially biologically coregulated modules.

3.4. Association of gene expressions with tumor stage: Marker for prognosis (Step 2)

We dichotomized tumor stage as low (stages 1 and 2) and high (stages 3 and 4), resulting in 77 (35.8%) high‐stage tumors, and used multivariable logistic regression to estimate the effects of the gene expressions in each gene component on stage. We used these estimates to subsequently derive GCSs for the test data in Section 3.5.

To gain insights into the characteristics of the gene expression features associated with stage in each component, we identified statistically significant genes satisfying an FDR cutoff of 5% (Supporting Information S1: Table 4) and examined the biological relevance of these genes. A total of three genes satisfied the FDR threshold for statistical significance. This includes the N‐Acyl sphingosine Amidohydrolase 1 (ASAH1) gene in gene component 8, which is critically involved in sphingolipid metabolism and does not have any nearby CNA or methylation sites selected in the corresponding components, indicating that it is potentially trans‐regulated. Aberrant sphingolipid metabolism, including changes in ceramide levels, has been associated with tumor progression and resistance to cancer therapies (Ogretmen, 2017). Previous work has suggested a role of ASAH1 and its role in sphingolipid metabolism in various cancer types, including breast cancer, prostate cancer, leukemia, and so on (Sänger et al., 2015; Vijayan et al., 2019). In the current data, we find that higher expression of ASAH1 is significantly associated (p = 1.01 × 10−05) with low‐stage ccRCC tumors, indicating a potential protective effect of ASAH1 expression.
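The text specifies a 5% FDR cutoff but does not name the procedure used to control it. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up rule; the function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans (in input order) marking the hypotheses rejected
    by the Benjamini-Hochberg step-up rule at level `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    # Every hypothesis ranked at or below the cutoff is rejected.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Called on the per-gene p-values of one gene component, the function flags the genes that would be reported as significant under this assumed procedure.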

A key advantage of our jsCCA approach is that using additional information on methylation and CNA through the corresponding components, we can identify a suggestive biological explanation for the potential trans‐regulation (Figure 3). Among other sites, methylation component 8 selects a TF SIX5 which is known to target ASAH1 downstream (www.genecards.org). In fact, ASAH1 has a proximal enhancer site which is a recognized binding site of SIX5 among other TFs and shows evidence in the current data to be significantly associated to SIX5 methylation (p = 1.07 × 10−09). Further, in the corresponding CNA component, we identify a well‐known TF Transcription Factor 7‐Like 2 (TCF7L2), which is known to have SIX5 as one of its targets. We find that deletions in TCF7L2 gene are associated with lower levels of methylation in SIX5 gene (p = 1.74 × 10−03). Taken together this evidence suggests a potential biological explanation for the association of the ASAH1 with tumor stage, in that CNA in the 10q region and especially TCF7L2 potentially regulates the methylation of SIX5 which in turn regulates the expression of ASAH1 which is associated with lower stage tumors.

Figure 3.

Association of ASAH1 gene with renal clear cell carcinoma tumor stage. (a) Association of TCF7L2 CNA with methylation levels of SIX5. (b) Association of methylation levels of SIX5 with gene expression of ASAH1. (c) Association of ASAH1 with tumor stage in ccRCC.

This suggestive biological interpretation through integrating multiple data modalities is a key strength of jsCCA. Also, it is to be noted that the association between CNA in TCF7L2 and SIX5 or the association between SIX5 methylation and ASAH1 expression do not surpass the multiple testing correction burden if we analyzed all possible pairs of CNA, methylation, and gene expressions. Thus, by extracting the specific modular structure through jsCCA, we not only improve interpretation but also identify effects not captured due to weaker magnitude or smaller sample size, by lowering the multiple testing burden.

3.5. Association analysis (Step 3): Differential interactions of gene components with smoking

We conducted this association analysis using the 215 test data samples. First, using the estimated effects of the gene expressions for each gene component (obtained from Step 2 above), we constructed $\mathrm{GCS}_{ki}$ for jsCCA modules $k = 1, \ldots, 8$ in test data samples $i = 1, \ldots, 215$. We examined the association between each GCS and tumor stage using a logistic regression model. This model included main effects for GCS and smoking and their multiplicative interaction and was adjusted for age and sex. The main effects of GCS for five jsCCA modules were significantly associated with tumor stage (Supporting Information S1: Table 5), indicating that there are potential causal genes in these components. The genes in these modules have biological relevance for cancer, as summarized below.
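The construction described above can be sketched as follows. The exact definition of the GCS is given in the Methods rather than in this section, so the sketch assumes the simplest reading: a weighted sum of a sample's expression values, with the Step 2 logistic regression estimates as weights. The function names, gene labels, and covariate coding are hypothetical.

```python
def gene_component_score(expression, effects):
    """Assumed GCS: sum over the genes in one component of
    (estimated effect from Step 2) x (expression in this sample).
    Both arguments are dicts keyed by gene symbol; genes without an
    estimated effect are ignored."""
    return sum(beta * expression[gene] for gene, beta in effects.items())


def interaction_design_row(gcs, smoker, age, sex):
    """Covariates for the logistic model of tumor stage:
    intercept, GCS, smoking, GCS x smoking, age, sex."""
    smoke = float(smoker)
    return [1.0, gcs, smoke, gcs * smoke, float(age), float(sex)]
```

Stacking one such row per test sample gives a design matrix that any standard logistic regression routine can fit; the coefficient on the fourth column is the multiplicative GCS-by-smoking interaction.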

There are 157 genes in the gene component of jsCCA module 5 (Table 1). These genes are associated with methylation sites regulated by CNAs in the sub‐region of chromosome 1q (Table 1). A pathway analysis of the genes selected in this component reveals several interesting and important pathways that are related to immune responses, different cancer‐related processes, and specifically, known RCC‐related mechanisms (Table 2). For example, the 157 genes selected in the gene component of jsCCA module 5 are overrepresented in several hallmark pathways including inflammatory response (FDR adjusted p = 1.9 × 10−05), IL6‐JAK‐STAT3 signaling (FDR adjusted p = 7.0 × 10−04) and interferon‐gamma response (FDR adjusted p = 2.4 × 10−07) among others and Kyoto Encyclopedia of Genes and Genomes pathways related to immunodeficiency (FDR adjusted p = 2.2 × 10−04), PD‐L1 expression and PD‐1 checkpoint (FDR adjusted p = 1.8 × 10−02) and chemokine signaling (FDR adjusted p = 2.5 × 10−05) and other immune‐related processes in Gene Ontology (GO) terms.

Table 2.

Pathway enrichment of the genes selected in gene component 5.

Category Pathway Adjusted p‐value Genes in pathway Genes overlap
GO Mononuclear cell differentiation 3.8 × 10−11 469 21
Immune response‐regulating signaling pathway 1.2 × 10−10 756 25
Molecular transducer activity 3.3 × 10−02 1940 23
Hallmark Interferon Gamma Response 2.4 × 10−07 199 11
IL6 JAK STAT3 signaling 7.0 × 10−04 86 5
Inflammatory response 1.9 × 10−05 200 8
KEGG Primary immunodeficiency 2.2 × 10−04 38 5
PD‐L1 expression and PD‐1 checkpoint pathway in cancer 1.8 × 10−02 89 4
Chemokine signaling pathway 2.5 × 10−05 191 10
Curated Gene sets RODWELL AGING KIDNEY UP 9.3 × 10−11 494 21
CHEN METABOLIC SYNDROM NETWORK 1.7 × 10−12 1205 36
FLECHNER BIOPSY KIDNEY TRANSPLANT REJECTED VS. OK UP 1.6 × 10−11 88 12
Immunologic Signatures GSE3039 NKT CELL VS. ALPHAALPHA CD8 TCELL DN 9.3 × 10−11 199 16
GSE22886 NAIVE CD8 TCELL VS. MONOCYTE UP 5.8 × 10−10 199 15
GSE29618 MONOCYTE VS. PDC DAY7 FLU VACCINE UP 5.8 × 10−10 199 15
TF targets TRRUST: SPI1 1.6 × 10−03 62 5
TRANSFAC & JASPAR: FOS 6.4 × 10−03 1275 19
Reg Network: STAT2 1.8 × 10−02 660 12

Note: See Supporting Information S1: Table 2 for a full list of genes.

Of particular interest is the PD‐1/PD‐L1 pathway which plays a significant role in the immune response to renal cell carcinoma (RCC) and has become a target for immunotherapy in the treatment of this cancer (Kammerer‐Jacquet et al., 2019). RCC has been shown to exploit the PD‐1/PD‐L1 pathway as a mechanism of immune evasion. This has led to the development of related immunotherapies known as checkpoint inhibitors, which attempt to block the interaction between PD‐1 and PD‐L1, preventing the cancer cells from inhibiting the immune response. In RCC, checkpoint inhibitors like nivolumab (Sheng & Ornstein, 2020) and pembrolizumab (Chau & Bilusic, 2020) have been approved for the treatment of advanced disease. Similarly, immune‐related processes are also enriched among GO terms and others (Table 2). Taken together, this implies that the genes in the gene component of jsCCA module 5 represents mechanisms related to immune activity and inflammatory response.

There are 176 genes in the gene component of jsCCA module 8 (Table 1). These genes are associated with methylation sites regulated by CNAs primarily in the sub‐region of chromosome 10q25 (Table 1). A pathway analysis of the 176 genes selected in this gene component identifies several enriched pathways distinctive in function compared to the previous pathways enriched for genes selected in gene component 5 (Table 3). We identify enrichment for putative VHL targets (FDR adjusted p = 1.2 × 10−02), which is one of the key RCC hallmark genes, as mentioned earlier, and has been reported extensively in literature. Further, the selected genes are also overrepresented among HIF1A targets (FDR adjusted p = 1.0 × 10−06).

Table 3.

Pathway enrichment of the genes selected in gene component 8.

Category Pathway Adjusted p‐value Genes in pathway Genes overlap
GO TRNA modification 3.6 × 10−03 95 6
SRP‐dependent co‐translational protein targeting to membrane 1.3 × 10−03 112 7
Macromolecule catabolic process 1.6 × 10−03 1620 26
KEGG Ribosome 3.3 × 10−03 134 7
Curated Gene sets JIANG VHL TARGETS 1.2 × 10−02 87 5
SANSOM APC MYC TARGETS 8.0 × 10−03 234 8
WANG TUMOR INVASIVENESS UP 6.7 × 10−03 358 10
Immunologic Signatures GSE42724 NAIVE VS. B1 BCELL DN 1.3 × 10−03 197 9
GSE45837 WT VS. GFI1 KO PDC DN 1.3 × 10−03 198 9
GSE32901 TH1 VS. TH17 ENRICHED CD4 TCELL DN 4.8 × 10−03 180 8
TF targets Marbach: HIF1A 1.0 × 10−06 1653 31
ENCODE: MEF2C 1.1 × 10−02 1402 18
ChEA: CREB1 in HEK293T 8.7 × 10−06 443 15

Note: See Supporting Information S1: Table 2 for a full list of genes.

The GCS of jsCCA modules 5 and 8 have statistically significant interactions with smoking in relation to tumor stage. Compared to never‐smokers, smokers with high‐stage tumors have significantly lower mean GCS in jsCCA module 5 (Figure 4a). It is to be noted that smoking has been associated with changes in gene expressions including those related to cancer and the immune system. For example, smoking is an established cause of inflammation, and can lead to the differential expression of genes related to inflammation and immune responses creating a microenvironment conducive to cancer development and progression (Jung et al., 2020), and can cause epigenetic modification and immune evasion (Alexandrov et al., 2016; Jin et al., 2017; Lin et al., 2010).

Figure 4.

Interaction of gene component scores (GCS) of gene components 5 and 8 with smoking in predicting tumor stage. GCS across low and high tumor stages for patients with and without a history of smoking for (a) gene component 5 and (b) gene component 8.

Compared to never‐smokers, smokers with high‐stage tumors have lower mean GCS in jsCCA module 8 (Figure 4b). It is known that smoking can influence the expression of HIF‐related genes by reducing the blood's oxygen‐carrying capacity and is a source of oxidative stress and inflammation in the body. These conditions can activate HIF‐related genes as part of the cellular response to oxidative stress and tissue damage (Daijo et al., 2016).

4. DISCUSSIONS

Complex diseases like cancers are characterized by a wide spectrum of genetic, epigenetic, and transcriptomic changes that are highly inter‐related. Integrating these data modalities into a joint analysis framework can allow researchers to gain a more complete and holistic understanding of the underlying biology of cancer. In this article, we propose a new analysis pipeline called jsCCA, which aims to jointly analyze CNA, methylation, and gene expression to identify gene expressions regulated by CNA‐associated methylation sites. We have illustrated this pipeline using the TCGA‐KIRC genomic and clinical data and showed that the gene components and the associated CNA sites and methylation sites have biological relevance. We further quantified the effect of each gene component on ccRCC using GCSs for each component. Our analysis using GCSs also identified a significant interaction between two gene components and smoking, indicating that these gene sets might be susceptible to gene‐environment interaction effects on ccRCC. Together, these analyses provide suggestive evidence that the gene components identified through jsCCA have distinct and separate biological functions. Our analysis identified several biomarkers previously implicated in ccRCC and other biomarkers that would not be identified by studying pairwise correlations between CNAs, methylation, and gene expressions and adjusting for multiple testing.

The CNA, methylation, and gene components identified in the jsCCA modules are agnostic of the tumor characteristics or disease outcomes. This is a key advancement that we propose over existing work on this topic. Numerous methods have been developed to identify significant associations between individual pairs of CNA and methylation sites or gene expressions (Dutta, He, et al., 2022; Dutta, Sen, et al., 2022; Lin et al., 2013; Priya et al., 2022; Witten & Tibshirani, 2009). However, identifying individual pairwise associations, or analyzing any two data modalities separately, although informative, cannot provide deeper insight into the overall genetic architecture. Especially in cancer multiomics studies, certain biological hypotheses naturally call for treating the data modalities asymmetrically, as is demonstrated in the manuscript and discussed in previous sections. jsCCA is naturally equipped to handle such scenarios arising from specific mechanistic hypotheses. Here, through our joint analysis approach in jsCCA, we map groups of CNAs to methylation sites and then to gene expressions, which essentially identifies gene programs potentially impacted by CNA‐regulated methylation. Further biological validation is provided by existing TF databases, which show that the gene components detected through jsCCA have suggestive evidence of coregulation, indicating that such a joint mapping approach can potentially reveal true biological patterns of regulation.

It is also noteworthy that jsCCA selects physically proximal sets of CNAs in the CNA component, facilitating interpretation of the corresponding jsCCA module as the network of methylation sites and gene expressions potentially regulated by CNAs in a smaller chromosomal sub‐region. In this paper, we have described jsCCA as an analysis approach to select gene expressions regulated by CNA‐associated methylation sites. This approach can be easily used to select gene expressions regulated by methylation‐associated CNA sites by switching the roles of the C and M matrices in Step 1.

Our proposed jsCCA pipeline is highly generalizable as an overall analysis framework. This pipeline can be viewed as an extension of the sCCA approach. Although we have built jsCCA here on top of sCCA for its ease of interpretation in terms of linear combination of features, several other similar methods like adaptive clustering or matrix factorization can be adapted accordingly to identify the modules. In fact, groupings based on functional annotation can also be incorporated to further strengthen the mechanistic interpretation of the identified modules. One of the major advantages of jsCCA is that, in principle, the identification of modules can be performed in a separate data set while the subsequent evaluation of the effect of gene components on tumor characteristics can be performed in a different data set. For example, in current studies, genomic, epigenomic, and transcriptomic data are available for many individuals, while detailed information on their phenotypes and traits might not be available. Thus, the identification of jsCCA modules can be performed using such data while the subsequent association analysis can be carried out in a separate data using generalized meta‐analysis.

Our proposed pipeline must be interpreted in view of several limitations. To improve interpretability and enforce partitioning, we have chosen the regularization parameters to be such that the CNA components have no overlap for selected CNAs. This could lead to stringent regularization. Further research is warranted on the parameter settings to obtain desirable performance and interpretation. Additionally, the optimal number of components in jsCCA is chosen heuristically by maximizing the iterative ratio of canonical correlations rather than using any significance or enrichment tests. Although statistical tests have been recently proposed for canonical correlation loadings, formulating analytical tests of significance has proven difficult and methods based on sCCA have mostly resorted to resampling methods (Dutta, He, et al., 2022). In the future, research on Bayesian formulations coupled with sequential testing can be pursued to perform tests of significance for regularization methods which can indicate the optimal number of components and parameter settings. It remains to be explored whether other regularization techniques like fused Lasso, which are more suitable for correlated settings, produce more interpretable and precise results. Additionally, jsCCA only explores linear interactions between CNA, methylation, and gene expressions. Although this assumption has been used previously, further research is required to understand whether linearity of interactions is a valid assumption and whether possible kernelization is required to introduce nonlinear effects.

In summary, the integration of CNAs, DNA methylation, and gene expression data in tumor analysis is pivotal for unraveling the complexity of cancer biology, identifying actionable targets, and advancing precision oncology. Given the growth and aging of the global population and, with this, the projected 56.7% increase in cancer incidence and 63.7% increase in mortality over the next two decades (Sung et al., 2021), there is an urgent need for further research to identify biomarkers through comprehensive investigations of diverse genomic and transcriptomic data to facilitate effective management of all types of cancers. Integrated analysis of multi‐omics data can lead to the discovery of biomarkers for cancer diagnosis, prognosis, and treatment response. Our proposed jsCCA analysis pipeline offers a strategy for conducting such integrated analyses. Application of this pipeline to additional data sets and additional research to address the limitations can help further strengthen the utility of this approach.

AUTHOR CONTRIBUTIONS

Diptavo Dutta, Ananda Sen, and Jaya M. Satagopan conceived the study. Diptavo Dutta performed the data analysis with inputs from Ananda Sen and Jaya M. Satagopan. All the authors wrote, edited, and approved the manuscript.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

WEB RESOURCES

TCGA KIRC: https://www.cbioportal.org/study/summary?id=kirc_tcga

Code for analysis: https://github.com/diptavo/

ShinyGO: https://bioinformatics.sdstate.edu/go/

FUMA: https://fuma.ctglab.nl/

ENCODE: https://maayanlab.cloud/Harmonizome/dataset/ENCODE+Transcription+Factor+Targets

Supporting information

Supporting information.

GEPI-48-414-s001.docx (39.9KB, docx)

Supporting information.

GEPI-48-414-s002.xlsx (65.3KB, xlsx)

1.

To introduce jsCCA method, we first start with an exposition of canonical correlation analysis (CCA) and sparse canonical correlation analysis (sCCA).

Canonical Correlation Analysis (CCA):

Given two quantitative variables $C$ and $M$ measured in $N_{ind}$ individuals, their association can be measured using Pearson or Spearman correlation. Suppose there are $N_{cna}$ $C$ variables and $N_{meth}$ $M$ variables measured in $N_{ind}$ individuals, with $N_{cna} > 1$ or $N_{meth} > 1$, where $C = (C_1, C_2, \ldots, C_{N_{cna}})$ and $M = (M_1, M_2, \ldots, M_{N_{meth}})$, with $C_j = (C_{1j}, \ldots, C_{N_{ind},j})^T$ and $M_k = (M_{1k}, \ldots, M_{N_{ind},k})^T$ denoting column vectors of measurements of the $j$‐th $C$ variable and $k$‐th $M$ variable measured in all $N_{ind}$ individuals. Measuring the association between $C$ and $M$ will, in principle, require obtaining $N_{cna} \times N_{meth}$ pairs of cross‐correlations, which can quickly become large and infeasible when $N_{cna}$ or $N_{meth}$ are large. CCA, popularized by Hotelling (Hotelling, 1936), is a multivariate approach that offers a pragmatic strategy to address this issue using pairs of canonical variables from $C$ and $M$.

Each pair of canonical variables is a linear combination, written as:

$$C^{(j)} = u_{j1} C_1 + u_{j2} C_2 + \cdots + u_{j,N_{cna}} C_{N_{cna}},$$
$$M^{(j)} = w_{j1} M_1 + w_{j2} M_2 + \cdots + w_{j,N_{meth}} M_{N_{meth}},$$

where $j = 1, \ldots, \min\{N_{cna}, N_{meth}\}$. The pair $(C^{(j)}, M^{(j)})$ is referred to as the $j$‐th canonical variate pair. The coefficients $u_{jk}$ and $w_{jr}$ ($k = 1, \ldots, N_{cna}$ and $r = 1, \ldots, N_{meth}$) are chosen to maximize the correlation between the $j$‐th canonical pair $(C^{(j)}, M^{(j)})$. The resulting correlation is referred to as the canonical correlation. Since there can be up to $\min\{N_{cna}, N_{meth}\}$ canonical pairs, in addition to maximizing the correlation, the coefficients for each pair are also obtained such that the $j$‐th pair and each $l$‐th pair, $l = 1, 2, \ldots, j-1$, of canonical variables are orthogonal.
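For readers who prefer the maximization written out, the first canonical pair can be expressed as below. This is the standard textbook form rather than a formula taken from the paper, and it assumes that every column of $C$ and $M$ has been centered; the symbol $\rho_1$ for the first canonical correlation is introduced here for illustration.

```latex
\rho_1 \;=\; \max_{u_1,\, w_1}\;
\frac{u_1^{T} C^{T} M\, w_1}
     {\sqrt{u_1^{T} C^{T} C\, u_1}\;\sqrt{w_1^{T} M^{T} M\, w_1}},
\qquad
C^{(1)} = C u_1, \quad M^{(1)} = M w_1 .
% For j > 1 the same ratio is maximized subject to
% (C^{(j)})^{T} C^{(l)} = 0 and (M^{(j)})^{T} M^{(l)} = 0 for all l < j.
```

The ratio is unchanged when $u_1$ or $w_1$ is rescaled, which is why the coefficient vectors are usually fixed by a normalization constraint.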

CCA has a useful geometric interpretation. Each element of the canonical pair $(C^{(j)}, M^{(j)})$ is a linear projection of the $N_{cna}$ $C$ variables and the $N_{meth}$ $M$ variables into a two‐dimensional space. The coefficients correspond to the directions of the projections, and maximizing the correlation between the $j$‐th pair corresponds to minimizing the angle between the projected variables in the two‐dimensional space.

Sparse canonical correlation analysis (sCCA): In the context of high‐dimensional data, it is important to identify association through a relatively small number of features that regulate the activity. In that vein, sparse CCA (Parkhomenko et al., 2009) introduces sparsity constraints in the coefficients of the above maximization problems by shrinking $u_{jk}$ and $w_{jr}$ towards 0, through the use of appropriate and context‐dependent penalty functions. Thus, sCCA identifies sparse linear combinations $(C^{(j)}, M^{(j)})$ that are highly correlated. Sparsity can be introduced by imposing $L_1$ or $L_2$ or any other equivalent constraints on the coefficients.

Joint sparse canonical correlation analysis (jsCCA): Given the data setup described in the Methods section, previous methods have focused primarily on the analysis of two data modalities at a time, employing methods like factor analysis or sCCA, which select a subset of variables from one data modality strongly associated with a subset of variables from the second data modality. As mentioned above, given data matrices $C$, $M$, and $E$, corresponding to CNAs, methylation, and gene expression respectively (see Section 2), sCCA finds sparse linear combinations of $(u, w_c)$ or $(v, w_e)$ such that the correlations between the sparse linear combinations $Cu$ and $Mw_c$ and between $Mw_e$ and $Ev$ are maximized, where $w_c$ and $w_e$ are canonical loading vectors of $M$ with respect to its correlation with $C$ and $E$, and $u$ and $v$ are canonical loading vectors of $C$ and $E$, respectively. Thus, the main objective is to reduce the dimension of the data sets by considering linear combinations of CNAs, methylation, or gene expressions such that they are maximally correlated with the other data modality being considered. In jsCCA, we assume that there exists a linear combination or latent linear factor of methylation which is correlated with both a latent linear factor of CNA and a latent linear factor of gene expressions. In other words, we address the specific estimation problem of interest, that is, we assume that there exists a common set of methylation sites that are associated with a subset of CNAs and a subset of gene expression levels. Thus, we set $w_c = w_e$ and denote this common canonical loading vector as $w$. The optimization problem for jsCCA is given by two simultaneous optimization problems:

$$(u, w) = \arg\max\; u^T Z_{CM} w,$$
$$(w, v) = \arg\max\; w^T Z_{ME} v,$$

with $\lVert u \rVert_1 \le \lambda_u$; $\lVert v \rVert_1 \le \lambda_v$; $\lVert w \rVert_1 \le \lambda_w$ and $\lVert u \rVert_2 = 1$, $\lVert v \rVert_2 = 1$ and $\lVert w \rVert_2 = 1$, where $Z_{CM} = C^T M$ and $Z_{ME} = M^T E$,

which is equivalent to:

$$(u, w, v) = \arg\max\; u^T C^T M w + w^T M^T E v,$$

with $\lVert u \rVert_1 \le \lambda_u$; $\lVert v \rVert_1 \le \lambda_v$; $\lVert w \rVert_1 \le \lambda_w$ and $\lVert u \rVert_2 = 1$, $\lVert v \rVert_2 = 1$ and $\lVert w \rVert_2 = 1$.

We perform the estimation as follows, given the sparsity parameters $\lambda_u$, $\lambda_v$, and $\lambda_w$:

Step A. Fix an initial value for u, v, and w.

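Step B. Repeat until convergence, with subscript $i$ denoting the value of a vector at the $i$‐th iteration:

$$w_i \leftarrow \frac{S(Z_{CM}^T u_{i-1}, \delta_w)}{\lVert S(Z_{CM}^T u_{i-1}, \delta_w) \rVert_2},$$
$$v_i \leftarrow \frac{S(Z_{ME}^T w_{i-1}, \delta_v)}{\lVert S(Z_{ME}^T w_{i-1}, \delta_v) \rVert_2},$$
$$u_i \leftarrow \frac{S(Z_{CM} w_{i-1}, \delta_u)}{\lVert S(Z_{CM} w_{i-1}, \delta_u) \rVert_2},$$

where, for each $x \in \{u, v, w\}$, $\delta_x = 0$ if this yields $\lVert x \rVert_1 \le \lambda_x$; otherwise $\delta_x > 0$ is chosen such that $\lVert x \rVert_1 = \lambda_x$. Here $S(a, \delta) = \mathrm{sign}(a)\,\max(|a| - \delta, 0)$ is the soft‐thresholding function, applied elementwise.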

Step C: The final triplet (u,w,v) is termed a jsCCA module with the individual vectors being CNA component, methylation component, and gene component.
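Steps A to C can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cross‐product matrices are passed as lists of rows, the starting vectors are all‐ones vectors, a fixed number of iterations replaces the convergence check, and each threshold is found by bisection. All function names are hypothetical.

```python
import math


def soft_threshold(a, delta):
    """Elementwise S(a, delta) = sign(a) * max(|a| - delta, 0)."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]


def unit(a):
    """Scale `a` to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else list(a)


def sparse_unit(a, lam, iters=100):
    """S(a, delta) / ||S(a, delta)||_2, with delta = 0 when the L1 norm is
    already within `lam`, otherwise delta found by bisection so that the
    L1 norm of the returned vector does not exceed `lam`."""
    x = unit(a)
    if sum(abs(t) for t in x) <= lam:
        return x
    lo, hi = 0.0, max(abs(t) for t in a)  # thresholding at `hi` zeroes everything
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(abs(t) for t in unit(soft_threshold(a, mid))) > lam:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # feasible: try a smaller threshold
    return unit(soft_threshold(a, hi))


def matvec(Z, x):
    """Z x for a matrix stored as a list of rows."""
    return [sum(z * t for z, t in zip(row, x)) for row in Z]


def t_matvec(Z, x):
    """Z^T x for a matrix stored as a list of rows."""
    return [sum(Z[i][j] * x[i] for i in range(len(Z))) for j in range(len(Z[0]))]


def jscca_module(Z_cm, Z_me, lam_u, lam_w, lam_v, n_iter=50):
    """One jsCCA module (u, w, v) from Z_cm = C^T M and Z_me = M^T E."""
    # Step A: fixed starting values (an assumption of this sketch).
    u = unit([1.0] * len(Z_cm))
    w = unit([1.0] * len(Z_me))
    # Step B: every update uses the vectors from the previous iteration.
    for _ in range(n_iter):
        w_new = sparse_unit(t_matvec(Z_cm, u), lam_w)
        v = sparse_unit(t_matvec(Z_me, w), lam_v)
        u = sparse_unit(matvec(Z_cm, w), lam_u)
        w = w_new
    # Step C: the triplet is the module.
    return u, w, v
```

Because each returned vector has unit $L_2$ norm, the sparsity parameters are only meaningful at values of 1 or more; values just above 1 leave essentially one non‐zero loading per component.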

The canonical correlation values are defined as

$$q_{CM} = u^T Z_{CM} w,$$
$$q_{ME} = w^T Z_{ME} v.$$

Given the $k$‐th jsCCA module $(u_k, w_k, v_k)$, we use Hotelling's deflation with the above algorithm to extract the $(k+1)$‐th jsCCA module as:

$$Z_{CM;k+1} = Z_{CM;k} - |q_{CM}|\, u_k w_k^T; \qquad Z_{ME;k+1} = Z_{ME;k} - |q_{ME}|\, w_k v_k^T;$$
$$Z_{CM;1} = Z_{CM} \quad \text{and} \quad Z_{ME;1} = Z_{ME}.$$
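The deflation step can be illustrated with a few lines of Python. In this sketch the rank‐one term is oriented as (left vector) × (right vector)$^T$ so that it has the same shape as the matrix being deflated; that orientation, the list‐of‐rows storage, and the function names are assumptions made for illustration.

```python
def canonical_correlation(left, Z, right):
    """q = left^T Z right for a cross-product matrix stored as a list of rows."""
    return sum(left[i] * Z[i][j] * right[j]
               for i in range(len(left)) for j in range(len(right)))


def deflate(Z, left, right):
    """Hotelling-style deflation: Z - |q| * left right^T, with q = left^T Z right."""
    q = canonical_correlation(left, Z, right)
    return [[Z[i][j] - abs(q) * left[i] * right[j] for j in range(len(right))]
            for i in range(len(left))]
```

For the CNA‐methylation matrix the call would be `deflate(Z_cm, u_k, w_k)`, and for the methylation‐expression matrix `deflate(Z_me, w_k, v_k)`; the next module is then estimated from the two deflated matrices.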

Following the stopping criteria used previously in Dutta et al., we extract $\min\left\{\arg\max_K \frac{q_{CM;K}}{q_{CM;K+1}},\; \arg\max_S \frac{q_{ME;S}}{q_{ME;S+1}}\right\}$ components, that is, we maximize the ratio of successive canonical correlations and take the minimum across the CNA‐methylation and methylation‐gene components.
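The stopping rule can be sketched as below, assuming the canonical correlations of successive modules have already been computed and are non‐zero. The function names and the numbers in the usage note are hypothetical.

```python
def best_ratio_index(q):
    """1-based index K that maximizes |q_K| / |q_{K+1}| over successive
    canonical correlations q = [q_1, q_2, ...] (needs at least two values)."""
    ratios = [abs(q[k]) / abs(q[k + 1]) for k in range(len(q) - 1)]
    return ratios.index(max(ratios)) + 1


def n_modules(q_cm, q_me):
    """Number of jsCCA modules to keep: the smaller of the two cut points."""
    return min(best_ratio_index(q_cm), best_ratio_index(q_me))
```

With `q_cm = [0.9, 0.8, 0.7, 0.2, 0.1]` the largest drop follows the third value, and with `q_me = [0.8, 0.75, 0.3, 0.25]` it follows the second, so `n_modules` returns 2.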

Choice of sparsity parameters. Most applications of variable selection use cross‐validation techniques to determine the choice of the sparsity parameters $\lambda_u$, $\lambda_w$, and $\lambda_v$ that will maximize the prediction accuracy in an independent or hold‐out test sample. However, since our pivotal goal is interpretation of jsCCA modules in terms of potentially orthogonal patterns of biological associations, we used an intersection‐minimization approach to select the tuning parameters. In jsCCA, each optimization problem being an sCCA, exact orthogonality cannot be guaranteed by the estimation algorithm, although the resulting components are approximately orthogonal. We define two CNA components $u_j$ and $u_k$ to be strictly orthogonal if the two components do not have the same elements with non‐zero loadings, that is, if the intersection of the non‐zero elements of the two components is a null set. In other words, $\lVert u_j \circ u_k \rVert_0 = 0$, where $\circ$ denotes the element‐wise product of two vectors. Similar conditions can be defined for methylation and gene components. Thus, $u_j$ and $u_k$ select mutually exclusive sets of CNAs. In our estimation, we aim to choose $\lambda_u$, $\lambda_w$, and $\lambda_v$ to capture such orthogonal patterns of associations. The primary motivation behind this is to obtain a conceptual partition of the data modalities through jsCCA, such that they are distinct in the selected features and hence can be treated independently. We propose the following steps for that:

A. For $k = 1$, that is, the first jsCCA component, choose $\lambda_u$, $\lambda_w$, and $\lambda_v$ through grid search, such that $\lVert u_k \rVert_0 \le 100$, $\lVert v_k \rVert_0 \le 200$, and $\lVert w_k \rVert_0 \le 400$, so as to select a smaller number of CNAs, methylation sites, and gene expressions.

B. For $k > 1$, that is, the subsequent components, choose $\lambda_u$, $\lambda_w$, and $\lambda_v$ such that there is strict orthogonality between the $k$‐th CNA/methylation/gene component and all previously estimated CNA/methylation/gene components.
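The strict orthogonality condition used in steps A and B amounts to checking that the supports (sets of non‐zero loadings) of two components are disjoint. A minimal sketch follows, with hypothetical function names and loading vectors stored as plain lists.

```python
def support(loading, tol=0.0):
    """Indices of the non-zero loadings of one component."""
    return {i for i, x in enumerate(loading) if abs(x) > tol}


def strictly_orthogonal(a, b):
    """True when the element-wise product of a and b has no non-zero entry,
    i.e. the two components select mutually exclusive features."""
    return support(a).isdisjoint(support(b))


def orthogonal_to_all(candidate, previous):
    """Check a newly estimated component against every earlier one (step B)."""
    return all(strictly_orthogonal(candidate, p) for p in previous)
```

In a grid search over the sparsity parameters, a candidate setting for the $k$‐th module would be retained only if `orthogonal_to_all` holds for its CNA, methylation, and gene components.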

While it is difficult to guarantee the existence of strict orthogonality in general scenarios and applications, in the current application we were able to obtain estimates under such a condition. However, the choice of the number of non‐zero elements in the first component, needs to be updated according to the data application and the choices here are made to ensure a relatively interpretable number of elements.

To summarize, for a given jsCCA module, several elements of the canonical loading vectors u, v, and w will be 0. The CNAs, methylations sites, and gene expressions corresponding to the non‐zero elements of these vectors will be deemed biologically relevant CNA, methylation sites, and gene sets corresponding to that jsCCA module.

Dutta, D. , Sen, A. , & Satagopan, J. M. (2024). Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma. Genetic Epidemiology, 48, 414–432. 10.1002/gepi.22566

Supplementary Materials

Supporting information: GEPI-48-414-s001.docx (39.9KB, docx)

Supporting information: GEPI-48-414-s002.xlsx (65.3KB, xlsx)
