Abstract
There is a growing awareness that tumor-adjacent normal tissues used as control samples in cancer studies do not represent fully healthy tissues. Instead, they are intermediates between healthy tissues and tumors. The factors that contribute to the deviation of such control samples from healthy state include exposure to the tumor-promoting factors, tumor-related immune response, and other aspects of tumor microenvironment. Characterizing the relation between gene expression of tumor-adjacent control samples and tumors is fundamental for understanding roles of microenvironment in tumor initiation and progression, as well as for identification of diagnostic and prognostic biomarkers for cancers.
To address the demand, we developed and validated TranNet, a computational approach that utilizes gene expression in matched control and tumor samples to study the relation between their gene expression profiles. TranNet infers a sparse weighted bipartite graph from gene expression profiles of matched control samples to tumors. The results allow us to identify predictors (potential regulators) of this transition. To our knowledge, TranNet is the first computational method to infer such regulation.
We applied TranNet to the data of several cancer types and their matched control samples from The Cancer Genome Atlas (TCGA). Many predictors identified by TranNet are genes associated with regulation by the tumor microenvironment, as they are enriched in G-protein coupled receptor signaling, cell-to-cell communication, immune processes, and cell adhesion. Correspondingly, targets of inferred predictors are enriched in pathways related to tissue remodelling (including the epithelial-mesenchymal transition (EMT)), immune response, and cell proliferation. This implies that the predictors are markers and potential stromal facilitators of tumor progression. Our results provide new insights into the relationships between the tumor-adjacent control sample, the tumor, and the tumor environment. Moreover, the set of predictors identified by TranNet will provide a valuable resource for future investigations.
The TranNet method was implemented in Python. The source code and the data sets used for and generated during this study are available on GitHub at https://github.com/ncbi/TranNet.
1. Introduction
In multi-stage carcinogenesis theory, mutations accumulate to perturb the cell regulatory program, eventually causing cell transformation. These perturbations interact with other factors such as the micro- and macro-environment or preexisting health conditions. Although mutations are central to the emergence of cancer, much of the understanding of the disease comes from studies of cancer-related changes in gene expression. Cancer is characterized by dysregulation of many cellular processes including proliferation, cell-cell interactions, chromatin organization, DNA repair, and others. Yet many of these alterations are not mechanistically linked to specific mutations, but are driven by changes in gene expression [1]. One of the emerging concepts is that cancer progression is facilitated by increased cell plasticity, which allows cancer cells to switch dynamically between differentiated states in response to stress [2]. Cancer cell plasticity has been linked to the epithelial-to-mesenchymal transition (EMT), which has been shown to respond to microenvironmental signals or cancer therapy [3, 4, 5, 6, 7, 8]. In addition, it has been estimated that at least 25% of cancers are associated with chronic inflammation [9, 10]. Last but not least, a recent study suggested that the tumor microenvironment and the microenvironment of the control sample are strongly dependent on each other [11]. In fact, tumor-related alterations of adjacent tissue are believed to contribute to postoperative cancer recurrences, which occur in up to a third of patients [12]. This is not a surprise, as tumors and their adjacent normal tissues also share genetic variation and some exogenous exposures, including environmental factors such as smoking and diet. However, could gene expression from a control sample inform on tumor state? Which molecular pathways and functions in tumor are associated with gene expression changes in normal tissue? What can they teach us about tumor progression?
To address these questions, we developed the Transition Network model (TranNet), a computational approach to study the relation between the gene expression patterns in matched normal and tumor tissues. Focusing on tumor genes (defined as genes differentially expressed between tumor and control samples), TranNet infers a transition function from gene expression in a control sample to an estimate of gene expression in the matched tumor sample. Simultaneously, the method infers a set of genes that are predictors of this transition (Figure 1). These predictor genes might regulate this transition, but they do not have to: they might instead reflect expression changes due to factors common to control and tumor samples. Functional analysis of the predictors and their target genes helps to shed light on possible sources of this relation.
Figure 1: Workflow of the analysis.
TranNet constructs a transition function from gene expression in control samples to gene expression in the matched tumor samples. Simultaneously, the method infers a set of predictors in the control tissue that are used for computing the transition, and target genes in the tumor tissue that are influenced by the predictors. After validating the method, functional analysis of the predictors and their target genes is used to shed light on possible sources of the relation between gene expression in control and tumor samples.
Computationally, TranNet utilizes a network construction based on a sparse estimation of partial correlation [13] that uses an l1 norm constraint to ensure the selection of the most informative predictors (Figure 2). Recognising that the expression of the genes which did not pass the p-value threshold to be included in the set of tumor genes might also contribute to gene expression in tumor, we extended the network by including additional nodes, principal components, representing the informative trends in the expression data of these genes. We refer to the genes and principal components whose activity in control samples influences gene expression in tumor tissue samples as predictors.
Figure 2: Outline of the TranNet method.
Input matrices $X$ and $Y$ represent the expression of genes in control ($X$, panel (A1)) and tumor ($Y$, panel (A2)) samples across patients (rows). Genes which are differentially expressed between control and tumor samples (referred to as tumor genes) are represented by individual nodes (columns from 1 to n) and the rest of the genes are represented by principal components of their expression (the remaining columns), preserving major trends of the expression. The principal components which are differentiated between normal and tumor tissue samples are considered as additional variables in the multivariate analysis. The distribution trend of the expression of a differentially expressed gene is visualized in the left of the yellow panel. As illustrated in a 2-dimensional space in the right of the yellow panel, although neither of the two genes shown is differentially expressed between control and tumor tissue samples, their joint distribution is differentiated along the axis of one principal component, showing a similar trend to the DE genes, while the other principal component is not differentiated (B). The transition map is a linear operator defined by the matrix $W$, computed to minimise the representation error of $Y$ subject to a sparsity constraint explained in the main text (C). The result is summarised as a bipartite network representing regulatory influences from control samples to tumors (D). The regulatory potential of a gene represents its total contribution to the transition network (E).
TranNet opens a new way to explore the relationship between gene expression in adjacent control samples and tumors. Our results indicate that the former can provide information about the latter and link, at least in part, the inferred relations to the tumor environment. As elaborated in the Discussion, not all inferred associations are assumed to be tumor drivers. Yet, as predictors of the expression of tumor genes, they are markers of other tumor-related processes, including tumor-environment interaction, and are important in this respect. Indeed, using TranNet we identified a set of genes that can serve as predictors of the tumor-environment relation, as well as genes and pathways involved in this interaction. Taken together, this work offers a computational method to infer the relation between the gene expression patterns in matched normal and tumor tissues and provides a new understanding of the relation between tumor, normal tissue, and the environment.
2. The Transition Network Model (TranNet)
An overview of the TranNet workflow is presented in Figure 2. Two matrices representing gene expressions in normal (A1) and tumor (A2) tissues of cancer patients are provided as the input. Tumor genes, defined as genes differentially expressed between control and tumor samples, are represented as network nodes while the expression of the rest of the genes is represented by meta-nodes corresponding to the principal components of the expression of these remaining genes. Each gene or principal component is represented by two nodes — one for each condition. The conceptual idea of including the principal components in the analysis is illustrated in (B). The transition matrix from normal to tumor state is obtained by solving the problem in (C) and the bipartite network formed by the transition matrix represents dependency flow from normal to tumor tissues (D). Finally, the strength of the total influence of a node in normal tissue on gene expression in tumor is quantified by regulatory potentials (RP) (E).
2.1. Inference of the transition network
Let $Y$ and $X$ be the matrices describing gene expression in tumor and control samples, respectively, for $m$ patients. The first $n$ columns of each matrix represent the expression of the tumor genes, followed by $k$ meta nodes corresponding to the principal components summarising gene expression trends of the remaining set of genes.
Specifically, $y_{ij}$ and $x_{ij}$ denote the expression of gene $j$ (or of principal component $j-n$ if $j > n$) in the $i$th patient in tumor and normal tissue, respectively, where the “expression” of a principal component in patient $i$ is explained below (see also Figure 2B right, for the description of a differentially expressed (DE) principal component).
Assuming that $X$ and $Y$ represent regulatory influences (or markers of such influences) and their targets, the weight matrix $W$ describing transformations between them can be written as

$$Y = XW \tag{1}$$

More precisely, for patient $i$, the expression value $y_{ij}$ of gene or principal component $j$ in the patient’s tumor tissue can be approximated by a linear combination of the expressions of the genes and meta nodes in the patient’s normal tissue

$$y_{ij} \approx \sum_{l=1}^{n+k} x_{il}\, w_{lj},$$

where $w_{lj}$ denotes the transition weight from the normal tissue expression of gene or principal component $l$ to the tumor tissue expression of gene or principal component $j$. Thus, our goal is to minimize the least square error subject to a unit $\ell_1$ norm constraint on $w_j$, the $j$th column of $W$, as follows:

$$\min_{w_j} \; \lVert y_j - X w_j \rVert_2^2 \quad \text{subject to} \quad \lVert w_j \rVert_1 \le 1 \tag{2}$$

The $\ell_1$ norm constraint on the edge weights allows for selecting the strongest transition effects on a given target node. Hence, this regularization acts to avoid over-fitting, and only a non-zero $w_{lj}$, selected for target $j$, denotes a transition effect from predictor $l$. In this setting, for every target node, the optimization problem in Equation (2) (a constrained version of Equation (1)) searches over all possible combinations of transition effects from the predictor nodes and selects the best combination with their optimal weights to explain the activity of the target node [13].
To explain the role of principal components, recall that a fundamental assumption of multivariate analysis is that there are no unobserved factors affecting both explanatory and response variables globally. The expression of the genes that were not selected as tumor genes (DE genes) might have such an influence. To adjust for such effects on the transition mapping, we include meta nodes (adjustment variables in [14]) that represent the principal components of the expression data for these non-tumor genes (see Section 6). By definition, principal components are vector representations of the general trends in multi-dimensional data [15]. Here, data points represent all the samples (control and tumor) and each point is a vector representing gene expression (Figure 2B right). Embedding this data using principal components as reference axes, we say that a given principal component is differentially expressed between the control and tumor samples if the coordinates of the tumor samples on the axis defined by this principal component are significantly larger or smaller than the coordinates of the control samples (Figure 2B right).
Finally, we comment on the optimization problem. Although the least square minimization in Equation (2) is convex, the l1 norm constraint is non-smooth, and derivative-based optimality conditions such as Lagrangian multipliers and Karush–Kuhn–Tucker (KKT) conditions are not, in general, directly applicable. While a coordinate descent algorithm with convex penalties [16] is commonly used to approximate this non-smooth problem, there is no optimal strategy for tuning the parameters controlling the strength of the penalty term [17]. Similarly, complementing the standard quadratic programming formulation, the soft shrinkage method decomposes the l1 norm into inequality constraints [18]. However, handling the resulting constraints is not practical for a large-scale problem. To overcome this issue, we implemented the projected gradient method [19], which converges to the optimal solution of the non-smooth constrained problem.
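The optimization described above can be illustrated with a short sketch. The code below is not the authors' implementation (that is available in the TranNet repository); it is a minimal pure-Python illustration, assuming that the constraint set is an l1 ball of radius 1, that each target node is fitted independently, and that a fixed step size derived from the squared Frobenius norm of the control matrix is acceptable. The function names `project_l1_ball` and `fit_target`, and the symbols `X`, `y`, and `w`, are chosen here for illustration only. The projection uses the standard sort-based algorithm for Euclidean projection onto the l1 ball.

```python
def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : sum(|w_i|) <= radius}."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)                      # already feasible
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - radius) / k
        if u_k > t:                         # condition holds for a prefix of k
            theta = t                       # keep the threshold of the last such k
    # Soft-threshold every coordinate by theta, keeping its sign.
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]


def fit_target(X, y, radius=1.0, n_iter=500):
    """Projected gradient for min ||y - X w||^2 subject to ||w||_1 <= radius.

    X is a list of rows (patients x predictors); y holds one target node.
    """
    n_samples, n_features = len(X), len(X[0])
    # Assumption: fixed step 1/L, where L = 2 * ||X||_F^2 bounds the
    # Lipschitz constant of the gradient 2 X^T (X w - y).
    step = 1.0 / (2.0 * sum(x * x for row in X for x in row))
    w = [0.0] * n_features
    for _ in range(n_iter):
        resid = [sum(a * b for a, b in zip(row, w)) - y_i
                 for row, y_i in zip(X, y)]
        grad = [2.0 * sum(X[i][j] * resid[i] for i in range(n_samples))
                for j in range(n_features)]
        w = project_l1_ball([w_j - step * g for w_j, g in zip(w, grad)], radius)
    return w


# Toy data (hypothetical): the target depends only on the first predictor.
X_toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y_toy = [0.8 * row[0] for row in X_toy]
w_toy = fit_target(X_toy, y_toy)
```

On this toy input the unconstrained least-squares solution already lies inside the l1 ball, so the projection never shrinks it. When the solution falls outside the ball, the projection sets the smallest weights to exactly zero, which is the source of sparsity in a network built this way.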
2.2. Regulatory potentials of predictors
TranNet aims to identify predictors (genes and principal components) whose expression in one condition (here normal) predicts gene expression in a different condition (here tumor). We model the information flow from normal to tumor tissue by the transition matrix in Equation (1), obtained by solving the non-smooth convex optimization problem in Equation (2). The total contribution of an individual gene or principal component to the transition is captured by the non-zero transition weights from the corresponding node to its selected targets in the network. We define the regulatory potential (RP) of such a predictor $i$ as the sum of the absolute weights $|w_{ij}|$ of the outgoing edges from the node representing the corresponding gene or principal component in normal tissue to its targets $j$:

$$RP(i) = \sum_{j} \lvert w_{ij} \rvert \tag{3}$$
Thus, predictors can be prioritized with respect to their regulatory potential defined in Equation (3).
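The ranking by regulatory potential can be sketched in a few lines. This is an illustrative example, assuming the transition weights are stored as a list of rows in which row i holds the outgoing edge weights of predictor i (one column per target); the function name `regulatory_potentials`, the toy weights, and the node labels are hypothetical.

```python
def regulatory_potentials(W, names):
    """Return (name, RP) pairs sorted by decreasing RP.

    RP of a predictor is the sum of absolute weights in its row of W,
    i.e. over all of its outgoing edges.
    """
    rp = {name: sum(abs(w) for w in row) for name, row in zip(names, W)}
    return sorted(rp.items(), key=lambda item: item[1], reverse=True)


# Toy transition matrix: 3 predictors (rows) x 3 targets (columns).
W_toy = [
    [0.0, 0.25, 0.0],    # geneA: one weak outgoing edge
    [-0.5, 0.0, 0.75],   # geneB: two edges; the sign is ignored by RP
    [0.0, 0.0, 0.0],     # PC1: not selected for any target
]
ranked = regulatory_potentials(W_toy, ["geneA", "geneB", "PC1"])
```

Because absolute values are summed, a predictor with strong negative weights ranks as high as one with equally strong positive weights.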
3. Validation of the TranNet model
We applied the TranNet model to the TCGA gene expression data for five different cancer types, focusing on solid tumors selected based on the availability of matched control-tumor samples: Breast Cancer (BRCA), Lung Adenocarcinoma (LUAD), Lung Squamous Cell Carcinoma (LUSC), Prostate Adenocarcinoma (PRAD), and Liver Hepatocellular Carcinoma (LIHC) (see Section 6 for data processing). To validate the TranNet model, we first show that the impact of gene expression in normal tissue on gene expression in tumor is higher than expected by chance (predictability). Next, we show that the network edges inferred by TranNet are enriched in functional interactions. Subsequently, we analyse gene-wise prediction accuracy. These validations for BRCA are collected in Figure 3A, and extended figures including the validation results for all five cancers are provided in Figures S1, S2 and S3 in Additional file 1. Finally, we confirm that the results are stable, so that the conclusions of this study do not depend on the threshold used for selecting the gene set for analysis (Table 2).
Figure 3: Validations and visualisation of the results.
(A) Sample-wise predictivity of TranNet was evaluated based on a leave-one-out test (top sub-panel, Section 3.1); interactions inferred by TranNet were tested for enrichment of functional interactions (middle sub-panel, Section 3.2); gene-wise prediction accuracy of TranNet was evaluated by the prediction errors (bottom sub-panel, Section 3.1). (B) Enrichment of GO terms for the predictor genes: rows correspond to enriched terms, while the color represents the enrichment p-value. GO terms enriched in at least three cancers are shown. The complete lists of GO enrichment for the lists of predictor genes sorted by their RP scores are provided in Additional file 2. (C) Enrichment of GO terms for self-predictors, defined as predictor genes with edges leading from a gene in control tissue to the same gene in tumor. GO terms enriched in at least two cancers are shown, and the complete lists of GO terms enriched in such self-predicting genes are provided, for each cancer, in Additional file 4. (D) KEGG pathway enrichment for the targets of the top 10 regulators (Table 3) in each cancer type. Rows correspond to the predictor genes, while columns represent the pathways enriched for more than 10 predictor genes over all five cancers. The complete lists are combined in Additional file 5.
Table 2:
Stability of TranNet.
| Cancer | Main | Alternative | Selected predictors | Overlap | Jaccard Index | HG p-value |
|---|---|---|---|---|---|---|
| BRCA | 9309 | 8172 | 279 | 228 | 0.8 | ≈ 0.0 |
| LUAD | 8691 | 7225 | 347 | 282 | 0.805 | ≈ 0.0 |
| LUSC | 9965 | 8635 | 398 | 317 | 0.79 | ≈ 0.0 |
| PRAD | 6399 | 4464 | 127 | 73 | 0.57 | 8.7e-125 |
| LIHC | 7514 | 5806 | 300 | 215 | 0.711 | 3.5e-322 |
Overlap in the selected predictors based on the networks constructed from two different selections of tumor genes. The Main set is described in Table 4 in Section 6, and the Alternative set was selected with a different threshold as described in Section 3.3. We selected the same number of top predictors from each list, based on the number of optimal predictors for the Main set. Overlap reports the number of common predictors; Jaccard Index is the Jaccard similarity between the two sets of the highest scoring predictors; and HG p-value is the hypergeometric p-value for the enrichment of the predictors obtained with the Alternative set in the Main set.
3.1. Expression of genes in control samples informs gene expression in tumor
The motivation for the TranNet model is the hypothesis that gene expression in control samples carries relevant information on gene expression in matched tumors. To confirm that this is indeed the case, we used a leave-one-out strategy (see Section 6) to test whether the model can predict the expression of tumor genes from the expression of predictors better than expected by chance. The paired-sample T-test is used to compare the results (prediction errors) of two predictions: the first based on the real expression data and the second based on permuted data. In addition, we tested how prediction accuracy depends on the number of predictors used, assuming that the markers are selected in decreasing order of their RP values. The prediction accuracy varies with the selection of different percentages of the top RP scoring predictors (see Sample-wise predictability in Figure 3A and Figure S1 in Additional file 1, and Table 1). As discussed in Section 3.2, the distribution of regulatory potentials in all five cancers is characterised by a small number of nodes with high regulatory potential and a very large number of genes with low regulatory potential. We point out that, despite their very small percentage, the top predictors above the cut-offs (Table 1) account for 42.2% (BRCA), 46.6% (LUAD), 50.7% (LUSC), 56.5% (PRAD) and 29.5% (LIHC) of the total regulatory potential over all predictors (Additional file 6).
Table 1:
Leave-one-out validation for the TranNet Model.
| Cancer | BRCA | LUAD | LUSC | PRAD | LIHC |
|---|---|---|---|---|---|
| Number/Percentage of predictors (optimized set) | 279/3% | 347/4% | 398/4% | 300/4% | 127/2% |
| Paired-sample T-statistic (optimized set) | 11.7 | 11.1 | 8.3 | 8.7 | 9.0 |
| Paired-sample P-value (optimized set) | 1.1e-20 | 8.7e-16 | 5.6e-11 | 2.6e-11 | 4.5e-11 |
| The number of predictors (top 30%) | 2792 | 2607 | 2989 | 1919 | 2254 |
| Paired-sample T-statistic (top 30%) | 8.8 | 7.5 | 6.1 | 6.0 | 6.5 |
| Paired-sample P-value (top 30%) | 2.6e-14 | 4.9e-10 | 1.6e-07 | 2.7e-07 | 8.1e-08 |
The paired-sample T-test was used to compare the sample-wise prediction errors on real and permuted response data. This test was performed for different numbers of top RP scoring predictors. The selected predictors corresponding to the optimal cut-off are provided with their RP scores in Additional file 6. The bottom rows report the results using the top 30% of predictors. The top rows report the results for the cancer-specific optimised set of predictors (see Figure 3A and Figure S1 in Additional file 1).
As for the target genes, we note that expression is not predicted equally well for all the tumor genes. Importantly, the error distributions are highly asymmetric and skewed towards small errors, as visualized in Figure 3A, suggesting that the expression of only some tumor genes is related to their expression in control samples. The GO terms enriched for the gene lists sorted by the prediction errors on the genes (tumor tissue) are summarized in Additional file 7 and discussed in Section 4.2.
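The comparison summarised in Table 1 rests on a paired-sample t statistic over per-sample prediction errors. The following is a minimal sketch, assuming the leave-one-out errors for the real and the permuted data have already been computed and are stored in two lists with the same sample order. The function and argument names are hypothetical, and the sign convention (permuted minus real) is an assumption chosen so that a positive statistic means the real data are predicted with smaller error. The p-value, which requires the t distribution with n - 1 degrees of freedom, is omitted.

```python
import math
import statistics


def paired_t_statistic(errors_permuted, errors_real):
    """Paired-sample t statistic for per-sample prediction errors."""
    if len(errors_permuted) != len(errors_real):
        raise ValueError("both error lists must cover the same samples")
    # Work on the per-sample differences, as a paired test requires.
    diffs = [p - r for p, r in zip(errors_permuted, errors_real)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n))


# Toy errors for three held-out samples (hypothetical values).
t_stat = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Pairing matters here because each held-out patient contributes one error under each condition, so between-patient variability cancels in the differences.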
3.2. The transition edges inferred by TranNet are enriched in functional interaction edges
TranNet infers a sparse bipartite graph modelling the transition from the expression of predictors to the expression of tumor genes in matched tumor samples. Since these edges encode potential regulatory influences (or markers associated with such influences), we expect that they should be enriched for functional interactions. To test this, we computed the overlap of the TranNet edges with the edges in the human functional interaction network [20] (see Section 6) and tested whether this overlap is larger than expected by chance. To this end, we constructed 1,000 random networks by permuting target nodes of the inferred transition network without changing the topology of the network. The enrichment of the functional interactions [20] for BRCA is provided in Figure 3A, and the performances for all the five cancers are summarized as Figure S2 in Additional file 1. These results confirm that TranNet infers relations consistent with functional interactions [20].
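The permutation test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: inferred edges are (predictor, target) pairs, the reference network is treated as undirected, target labels are permuted as a whole so that the topology of the inferred network is preserved, and the empirical p-value uses the common add-one correction. The function name `edge_enrichment` and the toy gene labels are hypothetical.

```python
import random


def edge_enrichment(edges, reference, n_perm=1000, seed=0):
    """Overlap with a reference network and its empirical p-value."""
    rng = random.Random(seed)
    ref = {frozenset(e) for e in reference}          # undirected reference edges
    observed = sum(frozenset(e) in ref for e in edges)
    labels = sorted({t for _, t in edges})           # distinct target nodes
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(labels, shuffled))        # random relabelling of targets
        overlap = sum(frozenset((s, relabel[t])) in ref for s, t in edges)
        hits += overlap >= observed
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: every inferred edge is also a reference edge.
toy_edges = [("a", "x"), ("b", "y"), ("c", "z")]
observed, p_value = edge_enrichment(toy_edges, toy_edges)
```

Relabelling targets, rather than shuffling edge endpoints independently, keeps the degree of every node and the wiring pattern fixed, so only the identity of the targets is randomised.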
We note that, similar to other biological networks, the distribution of both degrees and regulatory potentials in the TranNet networks is characterised by a large number of predictors with very small regulatory potentials and a small proportion of predictors with high potentials (see Figure S4 in Additional file 1 and the table in Additional file 6).
3.3. Stability of TranNet
Since the networks inferred by TranNet are sparse (see the table in Figure S4 in Additional file 1) and inferred from expression data using some cut-offs (see Section 6), it is important to test whether an alternative selection of tumor genes would lead to very different results. To test this, we computed the transition matrix for an alternative definition of tumor (DE) and non-DE genes, using a q-value cut-off of 0.001 in the T-test for differential expression. Then, we compared the optimal set of predictors for the main set with the same number of predictors computed for the alternative set. The Jaccard similarity and hypergeometric test results between the two selected sets of predictors showed very high agreement, as summarized in Table 2.
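The two agreement measures named above can be sketched in their standard textbook forms. This is an illustrative example only: the function names are hypothetical, the toy sets are not taken from the study, and the choice of background population for the hypergeometric test is an assumption that the text does not specify.

```python
from math import comb


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two non-empty collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def hypergeom_sf(k, population, marked, drawn):
    """P(overlap >= k) when `drawn` items are sampled without replacement
    from `population` items, of which `marked` belong to the other set."""
    total = comb(population, drawn)
    tail = sum(comb(marked, i) * comb(population - marked, drawn - i)
               for i in range(k, min(marked, drawn) + 1))
    return tail / total


# Toy predictor sets (hypothetical labels).
main_set = {"g1", "g2", "g3"}
alt_set = {"g2", "g3", "g4"}
similarity = jaccard(main_set, alt_set)
# Chance of sharing at least 2 of 3 predictors in a background of 10 genes.
p_overlap = hypergeom_sf(len(main_set & alt_set), 10, len(main_set), len(alt_set))
```

Exact integer binomial coefficients are used so that very small tail probabilities are not lost to floating-point cancellation before the final division.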
4. Insights into the relation between gene expression in tumor and in matched control samples
After validating the TranNet model, we investigated the properties of the inferred interactions between control and tumor tissue samples. As noted above, gene expression is not predicted equally well for all genes. However, the error distributions are highly asymmetric and skewed towards small errors, as visualized in Figure 3A for BRCA (the error distributions for all five cancers are provided as Figure S3 in Additional file 1). We started by investigating which biological pathways are enriched among the genes with better predictions. To this end, we performed enrichment analysis for the gene list ranked by prediction accuracy. Strikingly, the genes whose expression was predicted with higher accuracy were enriched in specific groups of GO categories related to environment, cell communication, immune response, signalling, and cell cycle (see the complete list of enriched GO terms in Additional file 7). In particular, the enriched pathways in BRCA included cell-to-cell signalling, ion transport, and regulation of hormone level. Pathways in LUAD and LUSC had many terms related to cilia, as expected given the known impact of smoking on the properties of ciliated cells [21]. Pathways in LUAD were generally related to cell cycle. Finally, pathways enriched in LIHC included, in addition to the pathways related to cell cycle, pathways related to detoxification and taxis, and thus to the liver-specific relation with the environment [22].
Interestingly, the third principal components (PC3) in LUAD and PRAD are significantly differentiated between control and tumor samples, with q-values of 2.10E-05 for LUAD and 4.72E-05 for PRAD (see Table 4), and have relatively high regulatory potentials, ranking 104th among the LUAD predictors and 37th among the PRAD predictors. This indicates that the expression of genes that did not pass the significance threshold for differential expression has an effect on the differentially expressed genes. As for the target genes of these components, both sets are enriched in terms related to metabolism, DNA repair, and the spliceosome (see Table S1 in Additional file 1), while also containing two cancer-type-specific terms.
Table 4:
Data summary.
| Cancer | DE Genes | Non-DE Genes | DE PCs (q-value) | TranNet | Patients | HN Genes | HN Edges |
|---|---|---|---|---|---|---|---|
| BRCA | 9308 | 4831 | PC27 (0.0015) | 9309 | 105 | 8633 | 308985 |
| LUAD | 8688 | 5798 | PC3, PC6, PC8 (2.1E-05, 0.0056, 0.0099) | 8691 | 57 | 8087 | 269821 |
| LUSC | 9965 | 4853 | PC5 (0.0003) | 9965 | 50 | 9246 | 345514 |
| PRAD | 6398 | 8440 | PC3 (4.72E-05) | 6399 | 47 | 5951 | 129963 |
| LIHC | 7514 | 6177 | none | 7514 | 40 | 7011 | 230952 |
DE Genes (tumor genes): differentially expressed genes selected based on T-test q-value < 0.01; Non-DE Genes: genes whose expression is not differentiated between control and tumor samples; DE PCs: principal components which are differentiated between control and tumor samples (T-test q-value in parentheses), representing potential impacts of Non-DE Genes on the transition mapping; TranNet: pairs of nodes in the transition network (DE Genes + DE PCs); Patients: patients for whom both control and tumor samples are available; HN Genes: genes in both DE Genes and HumanNet-FN; HN Edges: edges within HN Genes in the functional network.
4.1. Predictor genes are enriched in pathways that imply the features of tumor microenvironment
To find the properties of the inferred predictor genes, we used GOrilla [23] to perform a GO enrichment analysis. For each cancer, we identified predictor genes, ranked them by their regulatory potential (RP) scores, and performed a ranked list enrichment analysis [23]. We considered all three GO domains: biological process (P), molecular function (F) and cellular component (C) (Figure 3B).
The predictor genes for all five cancer types are enriched in pathways that reflect features of the microenvironment around the tumors. For Biological Process, they include humoral immune response, cell-cell signaling and cell communication, and G-protein coupled receptor signaling. Molecular Function terms were enriched in cell communication (receptor regulator activity and receptor ligand activity), hormone activity, and inflammation (cytokine activity, serine hydrolase activity, serine-type peptidase activity). With respect to Cellular Component, all cancers were enriched in terms related to extracellular space and membrane. When compared among individual cancer types, the predictor genes show distinct features of the tumor microenvironment. This is illustrated by the top ten predictor genes in each cancer type (Table 3; the top 10 cut-off was selected based on a 50% drop in regulatory potential). Notably, 85% of these genes have been identified in the literature as cancer related and are generally associated with tumor progression. For BRCA, the predictors include inflammation-induced genes (CCDC102B [24], SPOCD1 [25], EPYC [26], GLYATL1, SLC6A9 [27]), immune cell markers (CLEC4G [28] and BATF2 [29]), and neurogenesis genes (ADRA2B [30] and NNAT [31]). For LUAD, in addition to inflammation-induced genes (LYPD1 [32]), immune cell markers (CD244 [33]), and neural genes (WDR62 [34]), the predictors also include invasiveness-related genes (ELMO1 [35], LY6K [36], TMEM52 [37], TSACC, PCDH7). Similarly, the predictors of LUSC include inflammation-induced genes (ENTPD6 [38], SLC51B [39], GNLY, SLCO1C1 [40], and B3GAT1 [41]), invasiveness-related genes (CITED1, OR2W3 [42], RASGEF1A [43]), and neural genes (GAL and SH2D5). Importantly, most of these genes are involved in chronic obstructive pulmonary disease (COPD).
Taken together, these results suggest that BRCA, LUAD, and LUSC exhibit inflammatory neuroepithelial reactive stroma regulated by different mechanisms in immune responses and tissue regeneration. As also shown in Figure 3B, the predictors of PRAD and LIHC were enriched in pathways distinct from those of the other cancer types. Their features are demonstrated by the functions of the top-ranked predictor genes. For PRAD, the majority of the top ten predictors are neurogenesis genes (NTRK2 [44], LRRC7, LMX1B, ROPN1B); others include genes involved in inflammatory response (TMEM178A) and androgen receptor regulation (PRMT5 [45]). This is relevant to the intense neuroepithelial interactions in prostate cancer. For LIHC, the predictors include genes responsive to DNA damage (AUNIP, CHEK2, CENPI) and hypoxia (FLVCR1), and genes involved in liver diseases such as fatty liver and hepatitis (GSTA4, ATP2A1 [46], RNF125 [47]). They reflect the metabolic modification in the tumor microenvironment.
Table 3:
Top ten RP scoring genes.
| BRCA | Ref. | LUAD | Ref. | LUSC | Ref. | PRAD | Ref. | LIHC | Ref. |
|---|---|---|---|---|---|---|---|---|---|
| NNAT | [48] | LY6K | [49] | ENTPD6 | [38] | LRRC23 | [50] | ATP2A1 | [51] |
| DPEP1 | [52] | OLAH | [53] | RASGEF1A | [54] | PRMT5 | [45] | TMEM253 | |
| CLEC4G | [55] | LYPD1 | [56] | SLCO1C1 | [57] | NTRK2 | [58] | GSTA4 | [59] |
| CCDC102B | [24] | SLC13A4 | [60] | OR2W3 | [61] | ADAD2 | [62] | FLVCR1 | [63] |
| GLYATL1 | [64] | CD244 | [65] | SH2D5 | [66] | FAM86B1 | | IGHMBP2 | [67] |
| DLEU1 | [68] | TSACC | | B3GAT1 | [69] | TMEM178A | | CENPI | [70] |
| ADRA2B | [30] | TMEM52 | [71] | SLC51B | [72] | ROPN1B | [73] | CHEK2 | [74] |
| SPOCD1 | [75] | WDR62 | [76] | CITED1 | [77] | LRRC7 | [50] | RNF125 | [78] |
| BATF2 | [79] | ELMO1 | | GNLY | [80] | LMX1B | [81] | AUNIP | [82] |
| EPYC | [83] | PCDH7 | [84] | GAL | | C2orf73 | | OAZ3 | [85] |
The lists of the top 10 genes with the highest regulatory potentials (Equation (3)) in each cancer together with references to sample papers discussing their roles in cancer. In bold are references to papers that point to relations with the same cancer type. The regulatory potentials of the top predictors for the five cancers are provided at https://github.com/ncbi/TranNet.
4.2. Pathway enrichment of genes identified as targets of predictors revealed tumor-stroma interactions
To investigate how the microenvironments might regulate the tumors, we identified the pathways enriched (see Section 6 for the pathway enrichment analysis) among the tumor genes (“targets”; Matrix Y in Figure 2B) associated with the top-ranked predictor genes in Table 3. As shown in Figure 3D, the targets in BRCA and LUAD are more enriched in cell proliferation-related pathways (the first group in Figure 3D). In LUSC, the majority of targets are enriched in tissue remodeling and immune response pathways (the second and third groups in Figure 3D, respectively). In PRAD, in addition to immune response and tissue remodeling, the targets are mostly enriched in neural and hormone pathways (fourth and fifth groups in Figure 3D, respectively). In LIHC, the targets are enriched mostly in cell proliferation and tissue remodeling pathways. These results reflect the distinct signaling for tumor initiation and expansion in each cancer type. For example, studies have shown that the occurrence of LUSC and LIHC is frequently preceded by Chronic Obstructive Pulmonary Disease (COPD) and hepatitis, respectively, and that the tissue injury and remodeling in such chronic inflammation promote tumorigenesis. Interestingly, the same types of pathways in tumors could be associated with different types of predictors, depending on cancer type. For example, cell proliferation-related pathways in tumors are associated with inflammation-related predictors for BRCA (CCDC102B and GLYATL1) and LUSC (SLCO1C1, SLC51B, and BATF2); with invasiveness predictors for LUAD (LY6K and TMEM52); with neurogenesis predictors (LRRC7, LMX1B, ROPN1B) for PRAD; and with hypoxic (FLVCR1) and DNA-damage response (AUNIP) predictors for LIHC. These observations suggest that tumor growth is stimulated by distinct microenvironmental factors in each cancer type, as summarised in Figure 4. Taken together, our results also have important implications for the detection and diagnosis of cancers at an early stage.
4.3. Predictors whose expression in control samples is predictive of expression in tumor have unique properties
Interestingly, we noted that the top predictors do not include genes that have edges leading from a gene in control tissue to the same gene in tumor. We refer to such genes as self-predictors and asked whether they have properties distinct from those of the top predictors. To answer this question, we identified all self-predictors in each cancer (see Additional file 3 for the list of self-predictors). Note that since all network edges are between differentially expressed genes only, this excludes genes whose expression is simply the same in normal and tumor tissues. Not surprisingly, gene expression in control and tumor samples for self-predictors tends to be significantly correlated (Additional file 3). Strikingly, in contrast to the predictors with high regulatory potential, GO enrichment analysis of self-predicting genes revealed relations to oxidation (including hemoglobin complex) and detoxification (Figure 3C, Additional file 4). In particular, the enrichment terms for self-predictors did not include terms directly related to tissue remodelling. While the enrichment results for both the top predictors and the self-predictors contained terms related to immune response, the terms for self-predictive genes were more strongly focused on Major Histocompatibility Complex (MHC)-related immune response (Figure 3C and Additional file 4). Methods and data used for functional enrichment are described in Section 6.
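The notion of a self-predictor can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the inferred network is available as a dictionary mapping (control gene, tumor gene) pairs to edge weights and that expression values are stored per gene as lists over matched patients; the function names, the data layout, and the gene labels in the usage note are hypothetical and are not taken from the TranNet code.

```python
from math import sqrt


def self_predictors(edges):
    """Return {gene: weight} for nonzero edges from a control gene to the same tumor gene.

    `edges` is a hypothetical layout: {(control_gene, tumor_gene): weight}.
    """
    return {src: w for (src, dst), w in edges.items() if src == dst and w != 0}


def pearson(a, b):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def self_correlations(edges, control_expr, tumor_expr):
    """Correlate control and tumor expression for every self-predictor."""
    return {
        gene: pearson(control_expr[gene], tumor_expr[gene])
        for gene in self_predictors(edges)
    }
```

For example, with edges `{("GENE_A", "GENE_A"): 0.5, ("GENE_A", "GENE_B"): 0.3}`, only `GENE_A` would be reported as a self-predictor, and its control-tumor correlation could then be compared against the values listed in Additional file 3.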
5. Conclusions
The regulation of tumor progression is not fully understood. Recent studies demonstrated that control samples representing adjacent normal tissue in solid tumor studies differ from healthy tissue [11]. We hypothesised that genes whose expression in control samples is predictive of the expression in tumor samples might provide insights into this relation.
We developed and validated a new computational method, TranNet, which identifies predictor genes, that is, genes that are predictive of (and potentially facilitate) regulation of a tumor by the tumor environment (Figure 4). We found that the functional properties of the predictor genes are consistent with the mechanisms of regulation by the environment. In addition, many of the identified predictors are previously recognised markers of microenvironment-mediated tumor progression. One limitation of this study is that, while TranNet identifies predictors of such external influences on a tumor, it does not establish causality.
Figure 4: Interpretations of the relations between tumor and normal tissues inferred by TranNet.
(A) TranNet predictors (light blue) in normal tissue predict environment-related changes in tumor (light gray). (B) Cancer-specific functional enrichment of predictor genes (light blue) and inferred environmentally modulated changes in tumor (light gray).
It is important to recognize that some external factors might impact both normal tissues and tumors without necessarily causing tumor progression directly. Even so, many of these factors interact with and/or regulate the tumor and are interesting in this respect; COPD and hepatitis are likely examples. There is also an association between the incidence of a wide variety of malignancies and diabetes. Identification of such interactions is of fundamental importance for preventing cancer and developing novel treatments.
The concept of tumor environment has many layers. Previous studies distinguished a tumor’s micro- and macroenvironments to capture the effects of the closest neighborhood (such as normal cells, molecules, and blood vessels that surround and feed a tumor cell) and of more distant influences, respectively [86]. Here we also allow yet another level of influences: organismal environment and germline variations. Our approach provides a proof of principle that control samples can be used to gain insight into the impact of external factors, jointly referred to here as the environment, on tumor and tumor progression. The enrichment of predictor genes in processes associated with immune response suggests that some of these predictors might in fact be sensors of common exposure to chronic infection in normal and tumor tissues. Indeed, it is recognized that cigarette smoking induces lung inflammation, leading to changes in cellular composition and function, such as a reduction in the number of ciliated cells and alterations in their properties. In future studies, it would be interesting to extend the model to allow for a separation of different types of external contributions. In general, it might be difficult to fully deconvolute such processes, as environmental contributors are mostly unknown. However, some environmental factors are mutagenic, and in such cases the exposure of the organism to an environmental factor can be estimated by the strength of a corresponding mutational signature [87, 88]. These potential extensions aside, the results obtained by TranNet already provide new insights into the relation between tumor, adjacent control samples, and tumor environment.
6. Materials and Methods
Gene expression data:
Batch-effect-corrected TCGA gene expression data [89] were downloaded from the GitHub repository (https://github.com/mskcc/RNAseqDB) on April 5th, 2022. Requiring matched normal and cancer samples, we selected five types of EMT cancers for the analysis: Breast Cancer (BRCA), Lung Adenocarcinoma (LUAD), Lung Squamous Cell Carcinoma (LUSC), Prostate Adenocarcinoma (PRAD), and Liver Hepatocellular Carcinoma (LIHC). For each cancer type, only patients with both normal and tumor gene expression data were kept. We excluded genes that are expressed in neither normal (control) nor tumor samples; the expression level of a gene is considered significant if 80% or more of its expression values across samples are greater than a threshold (see Figure S5 in Additional file 1). We then selected the genes that are differentially expressed between control and tumor samples, and compressed the expression data for the remaining (non-differentially expressed) genes into the principal components that preserve 99% of the covariate information of the data [15]. Of these principal components (PCs), only the differentially expressed components are included as additional meta nodes in the network, representing potential covariates in the transition network that arise from the non-differentially expressed genes [14]. The conceptual idea behind defining the differentially expressed principal components and including them as additional variables in the present analysis is described in Section 2.1 and Figure 2B. Finally, the activity of each node (expression of genes and magnitudes of PCs) is standardized with a Z-score to bring dissimilar features onto a similar scale. The data cohorts for all five cancers are provided in Table 6.
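The preprocessing steps described above can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the TranNet implementation: the function names are hypothetical, the differential-expression call is taken as a given boolean mask, the PCA is run jointly on stacked control and tumor samples so that both share the same components, "99% of covariate information" is interpreted as 99% of explained variance, and the additional filtering of PCs to differentially expressed ones is omitted.

```python
import numpy as np


def expressed_mask(expr, threshold, min_fraction=0.8):
    """Genes (columns) whose values exceed `threshold` in >= min_fraction of samples."""
    return (expr > threshold).mean(axis=0) >= min_fraction


def pcs_for_variance(expr, variance_kept=0.99):
    """Scores of the leading PCs that together explain `variance_kept` of the variance."""
    centered = expr - expr.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, variance_kept) + 1)
    return U[:, :k] * s[:k]


def zscore(features):
    """Standardize each column to zero mean and unit variance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)


def build_nodes(control, tumor, de_mask, threshold):
    """Assemble standardized node activities for matched control and tumor samples.

    `control` and `tumor` are (patients x genes) arrays with matched rows;
    `de_mask` is a boolean array marking differentially expressed genes
    (how it is computed is outside this sketch).
    """
    # Keep genes expressed in at least one of the two conditions.
    keep = expressed_mask(control, threshold) | expressed_mask(tumor, threshold)
    de = keep & de_mask
    rest = keep & ~de_mask
    # Assumption: one joint PCA over stacked samples for the non-DE genes.
    scores = pcs_for_variance(np.vstack([control[:, rest], tumor[:, rest]]))
    n = control.shape[0]
    X = zscore(np.hstack([control[:, de], scores[:n]]))
    Y = zscore(np.hstack([tumor[:, de], scores[n:]]))
    return X, Y
```

The two returned arrays have one row per patient and one column per node (differentially expressed genes followed by PC meta nodes), which is the layout assumed for matched control and tumor profiles in this sketch.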
Leave-one-out test:
The leave-one-out test was performed as follows. For each patient $i$, remove the corresponding expression values $x_i$ (a row) and $y_i$ (a row) from the data $X$ and $Y$. A transition matrix $W$ is then computed from the remaining data (such as $X_{-i}$ and $Y_{-i}$). Then, check whether the prediction error on the true response $y_i$ is less than the error on the random response $\tilde{y}_i$, that is, whether $\|x_i W - y_i\|_1 < \|x_i W - \tilde{y}_i\|_1$, where $\tilde{y}_i$ is obtained by permuting $y_i$ and $\|\cdot\|_1$ denotes the sum of absolute prediction errors over the genes and principal components. A paired-sample t-test was performed to check whether this prediction error is less than the random error. The prediction accuracy varies depending on the number of selected predictors, as shown in Figure 3A for BRCA and in Figure S1 (Additional file 1) for all five cancers. The number of predictors for each cancer was chosen based on different cut-off percentages of the top genes and principal components with the highest regulatory potentials. The test results for the five cancers are summarized in Table 1.
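The leave-one-out procedure can be illustrated with the following Python sketch. It is a hedged illustration only: a closed-form ridge regression is used as a simple stand-in for the sparse transition-matrix fit performed by TranNet, and the function names, the `ridge` parameter, and the symbols `X`, `Y`, and `W` (control activities, tumor activities, and transition matrix) are assumptions made for this example.

```python
import numpy as np


def fit_transition(X, Y, ridge=1.0):
    """Ridge stand-in for the sparse fit: W = (X'X + ridge * I)^-1 X'Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ Y)


def leave_one_out(X, Y, ridge=1.0, seed=0):
    """Per-patient L1 prediction error on the true and on a permuted response."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    true_err = np.empty(n)
    rand_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # hold out patient i
        W = fit_transition(X[keep], Y[keep], ridge)
        pred = X[i] @ W
        y_perm = rng.permutation(Y[i])            # random response for patient i
        true_err[i] = np.abs(pred - Y[i]).sum()
        rand_err[i] = np.abs(pred - y_perm).sum()
    return true_err, rand_err


def paired_t_statistic(a, b):
    """t statistic of the paired differences a - b (negative if a tends to be smaller)."""
    d = a - b
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

A strongly negative statistic indicates that held-out tumor profiles are predicted better than their permuted counterparts. If SciPy is available, `scipy.stats.ttest_rel(true_err, rand_err, alternative="less")` would additionally provide a one-sided p-value.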
Functional Network:
We used a functional network [20] of human genes for disease studies downloaded from HumanNet v3 (https://www.inetbio.org/humannet). This data set contains 18,459 protein-coding genes and 977,495 interactions inferred from various datasets and functional networks. The interactions corresponding to our gene sets are used as the edge validation sets in the analysis of the cancers. The characteristics of the gene expression data and edge validation set are summarized in Table 4.
Enrichment of GO terms:
Enrichment analysis of GO terms (biological process, molecular function and cellular component) was performed using GOrilla [23] on either sorted gene lists or selected subsets of genes.
Enrichment of KEGG pathways:
Pathway enrichment was performed with a hypergeometric test followed by the Benjamini–Hochberg procedure for multiple comparison correction using KEGG pathways [90]. Pathways enriched with q-value < 0.01 are reported in this study.
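For illustration, a hypergeometric enrichment test with Benjamini–Hochberg correction can be sketched in plain Python as follows. This is a generic sketch of the two statistical steps rather than the code used in the study; the function names, the dictionary layout of the pathway annotation, and the choice of gene universe are assumptions.

```python
from math import comb


def hypergeom_pvalue(n_universe, n_pathway, n_targets, n_overlap):
    """P(overlap >= n_overlap) when n_targets genes are drawn without replacement."""
    upper = min(n_pathway, n_targets)
    total = comb(n_universe, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def enriched_pathways(targets, pathways, universe, q_cutoff=0.01):
    """Return {pathway: q-value} for pathways with q-value below `q_cutoff`.

    `pathways` is a hypothetical layout: {pathway_name: iterable of gene symbols}.
    """
    universe = set(universe)
    targets = set(targets) & universe
    names, pvals = [], []
    for name, genes in pathways.items():
        genes = set(genes) & universe
        overlap = len(targets & genes)
        names.append(name)
        pvals.append(
            hypergeom_pvalue(len(universe), len(genes), len(targets), overlap)
        )
    qvals = benjamini_hochberg(pvals)
    return {name: q for name, q in zip(names, qvals) if q < q_cutoff}
```

In this sketch the universe would typically be the set of genes retained after filtering, and the default cut-off mirrors the q-value < 0.01 threshold stated above.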
Availability of source code and data:
The TranNet method was implemented in Python. The source code, data sets and supplemental tables used for and generated during this study are available at the GitHub site https://github.com/ncbi/TranNet.
Supplementary Material
Additional file 1: Supplemental Information with Table S1 and Figures S1, S2, S3, S4 and S5 (PDF).
Additional file 2: Lists of GO terms enriched for the gene list sorted based on regulatory potential (RP) scores for each of the five types of cancers. Enrichment of biological process, molecular function and cellular component is provided in separate sheets in the same file (Excel).
Additional file 3: Lists of the self-predictor genes with their self-interaction weights, the correlation between their normal and tumor tissue expression, and the corresponding p-values (Excel).
Additional file 4: Lists of enriched GO terms for the genes whose normal samples are predictive of the corresponding tumor samples (self-predictors), covering biological process, molecular function and cellular component (Excel).
Additional file 5: Lists of enriched KEGG pathways for the target genes of the top ten RP-scoring predictors, with the enriched pathways and the corresponding p-values (Excel).
Additional file 6: Lists of the selected high-scoring predictor genes and principal components with their regulatory potentials for the five cancers (Excel).
Additional file 7: Lists of GO terms enriched for the gene list sorted based on prediction errors for each of the five cancers (genes whose tumor expression is predicted with smaller errors are ranked higher in the list). Enrichment of biological process, molecular function and cellular component is provided in separate sheets in the same file (Excel).
Acknowledgement
We want to thank M.G. Hirsch for comments on the manuscript. This research was supported by the Intramural Research Program of the National Library of Medicine.
References
- [1] Sager R. Expression genetics in cancer: shifting the focus from DNA to RNA. Proc Natl Acad Sci U S A, 94(3):952–955, Feb 1997.
- [2] da Silva-Diz V., Lorenzo-Sanz L., Bernat-Peguera A., Lopez-Cerda M., and oz P. Cancer cell plasticity: Impact on tumor progression and therapy response. Semin Cancer Biol, 53:48–58, Dec 2018.
- [3] Shen S. and Clairambault J. Cell plasticity in cancer cell populations. F1000Res, 9, 2020.
- [4] Yuan S., Norgard R. J., and Stanger B. Z. Cellular Plasticity in Cancer. Cancer Discov, 9(7):837–851, Jul 2019.
- [5] Huang S. Genetic and non-genetic instability in tumor progression: link between the fitness landscape and the epigenetic landscape of cancer cells. Cancer Metastasis Rev, 32(3–4):423–448, Dec 2013.
- [6] Huang S. Tumor progression: chance and necessity in Darwinian and Lamarckian somatic (mutation-less) evolution. Prog Biophys Mol Biol, 110(1):69–86, Sep 2012.
- [7] Wang M., Zhao J., Zhang L., Wei F., Lian Y., Wu Y., Gong Z., Zhang S., Zhou J., Cao K., Li X., Xiong W., Li G., Zeng Z., and Guo C. Role of tumor microenvironment in tumorigenesis. J Cancer, 8(5):761–773, 2017.
- [8] Quail D. F. and Joyce J. A. Microenvironmental regulation of tumor progression and metastasis. Nat Med, 19(11):1423–1437, Nov 2013.
- [9] Balkwill F. and Mantovani A. Inflammation and cancer: back to Virchow? Lancet, 357(9255):539–545, Feb 2001.
- [10] Coussens L. M. and Werb Z. Inflammation and cancer. Nature, 420(6917):860–867, 2002.
- [11] Aran D., Camarda R., Odegaard J., Paik H., Oskotsky B., Krings G., Goga A., Sirota M., and Butte A. J. Comprehensive analysis of normal adjacent to tumor transcriptomes. Nat Commun, 8(1):1077, October 2017.
- [12] Mehlen P. and Puisieux A. Metastasis: a question of life or death. Nat Rev Cancer, 6(6):449–458, Jun 2006.
- [13] Amgalan B. and Lee H. DEOD: uncovering dominant effects of cancer-driver genes based on a partial covariance selection method. Bioinformatics, 31(15):2452–2460, Aug 2015.
- [14] Jablonski K. P., Pirkl M., Ćevid D., Bühlmann P., and Beerenwinkel N. Identifying cancer pathway dysregulations using differential causal effects. Bioinformatics, Dec 2021.
- [15] Ringnér M. What is principal component analysis? Nat Biotechnol, 26(3):303–304, Mar 2008.
- [16] Kim J., Kim Y., and Kim Y. A gradient-based optimization algorithm for lasso. Journal of Computational and Graphical Statistics, 17(4):994–1009, 2008.
- [17] Friedman J., Hastie T., and Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw, 33(1):1–22, 2010.
- [18] Donoho D. L. and Johnstone J. M. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.
- [19] Gafni E. M. and Bertsekas D. P. Two-metric projection methods for constrained optimization. SIAM Journal on Control and Optimization, 22(6):936–964, 1984.
- [20] Kim C. Y., Baek S., Cha J., Yang S., Kim E., Marcotte E. M., Hart T., and Lee I. HumanNet v3: an improved database of human gene networks for disease research. Nucleic Acids Res, 50(D1):D632–D639, January 2022.
- [21] Leopold P. L., O’Mahony M. J., Lian X. J., Tilley A. E., Harvey B. G., and Crystal R. G. Smoking is associated with shortened airway cilia. PLoS One, 4(12):e8157, Dec 2009.
- [22] Grant D. M. Detoxification pathways in the liver. J Inherit Metab Dis, 14(4):421–430, 1991.
- [23] Eden E., Navon R., Steinfeld I., Lipson D., and Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics, 10:48, Feb 2009.
- [24] Si J., Guo R., Xiu B., Chi W., Zhang Q., Hou J., Su Y., Chen J., Xue J., Shao Z. M., Wu J., and Chi Y. B Pathway. Front Oncol, 12:927358, 2022.
- [25] Sato K., Masuda T., Hu Q., Tobo T., Kidogami S., Ogawa Y., Saito T., Nambara S., Komatsu H., Hirata H., Sakimura S., Uchi R., Hayashi N., Iguchi T., Eguchi H., Ito S., Nakagawa T., and Mimori K. Phosphoserine Phosphatase Is a Novel Prognostic Biomarker on Chromosome 7 in Colorectal Cancer. Anticancer Res, 37(5):2365–2371, May 2017.
- [26] Zhang J., Zhang S., Zhou Y., Qu Y., Hou T., Ge W., and Zhang S. KLF9 and EPYC acting as feature genes for osteoarthritis and their association with immune infiltration. J Orthop Surg Res, 17(1):365, Jul 2022.
- [27] Carmans S., Hendriks J. J., Thewissen K., Van den Eynden J., Stinissen P., Rigo J. M., and Hellings N. The inhibitory neurotransmitter glycine modulates macrophage activity by activation of neutral amino acid transporters. J Neurosci Res, 88(11):2420–2430, Aug 2010.
- [28] Dominguez-Soto A., Aragoneses-Fenoll L., Martin-Gayo E., Martinez-Prats L., Colmenares M., Naranjo-Gomez M., Borras F. E., Munoz P., Zubiaur M., Toribio M. L., Delgado R., and Corbi A. L. The DC-SIGN-related lectin LSECtin mediates antigen capture and pathogen binding by human myeloid cells. Blood, 109(12):5337–5345, Jun 2007.
- [29] Roy S., Guler R., Parihar S. P., Schmeier S., Kaczkowski B., Nishimura H., Shin J. W., Negishi Y., Ozturk M., Hurdayal R., Kubosaki A., Kimura Y., de Hoon M. J., Hayashizaki Y., Brombacher F., and Suzuki H. Batf2/Irf1 induces inflammatory responses in classically activated macrophages, lipopolysaccharides, and mycobacterial infection. J Immunol, 194(12):6035–6044, Jun 2015.
- [30] Rivero E. M., Martinez L. M., Bruque C. D., Gargiulo L., Bruzzone A., and Lüthy I. A. Prognostic significance of α and β2-adrenoceptor gene expression in breast cancer patients. Br J Clin Pharmacol, 85(9):2143–2154, September 2019.
- [31] Nass N., Walter S., Jechorek D., Weissenborn C., Ignatov A., Haybaeck J., Sel S., and Kalinski T. High neuronatin (NNAT) expression is associated with poor outcome in breast cancer. Virchows Arch, 471(1):23–30, Jul 2017.
- [32] Fu X. W., Song P. F., and Spindel E. R. Role of Lynx1 and related Ly6 proteins as modulators of cholinergic signaling in normal and neoplastic bronchial epithelium. Int Immunopharmacol, 29(1):93–98, Nov 2015.
- [33] Sun L., Gang X., Li Z., Zhao X., Zhou T., Zhang S., and Wang G. Advances in Understanding the Roles of CD244 (SLAMF4) in Immune Regulation and Associated Diseases. Front Immunol, 12:648182, 2021.
- [34] Chen J. F., Zhang Y., Wilde J., Hansen K. C., Lai F., and Niswander L. Microcephaly disease gene Wdr62 regulates mitotic progression of embryonic neural stem cells and brain size. Nat Commun, 5:3885, May 2014.
- [35] Li H., Yang L., Fu H., Yan J., Wang Y., Guo H., Hao X., Xu X., Jin T., and Zhang N. i2 and ELMO1/Dock180 connects chemokine signalling with Rac activation and metastasis. Nat Commun, 4:1706, 2013.
- [36] AlHossiny M., Luo L., Frazier W. R., Steiner N., Gusev Y., Kallakury B., Glasgow E., Creswell K., Madhavan S., Kumar R., and Upadhyay G. Ly6E/K Signaling to TGFβ Promotes Breast Cancer Progression, Immune Escape, and Drug Resistance. Cancer Res, 76(11):3376–3386, June 2016.
- [37] Ehrlich K. C., Lacey M., and Ehrlich M. Gene Families. Epigenomes, 4(1), Jan 2020.
- [38] Zhuang Z. and Gao C. Development of a Clinical Prognostic Model for Metabolism-Related Genes in Squamous Lung Cancer and Correlation Analysis of Immune Microenvironment. Biomed Res Int, 2022:6962056, 2022.
- [39] Berg T., ck T., Olsson M., rd J., m V., Zhou X. H., Grunewald J., Gustavsson L., and Nord M. Gene expression analysis of membrane transporters and drug-metabolizing enzymes in the lung of healthy and COPD subjects. Pharmacol Res Perspect, 2(4):e00054, Aug 2014.
- [40] Ridder D. A., Lang M. F., Salinin S., derer J. P., Struss M., Maser-Gluth C., and Schwaninger M. TAK1 in brain endothelial cells mediates fever and lethargy. J Exp Med, 208(13):2615–2623, Dec 2011.
- [41] Trimarco J. D., Nelson S. L., Chaparian R. R., Wells A. I., Murray N. B., Azadi P., Coyne C. B., and Heaton N. S. Cellular glycan modification by B3GAT1 broadly restricts influenza virus infection. Nat Commun, 13(1):6456, Oct 2022.
- [42] Gao M., Kong W., Huang Z., and Xie Z. Identification of Key Genes Related to Lung Squamous Cell Carcinoma Using Bioinformatics Analysis. Int J Mol Sci, 21(8), Apr 2020.
- [43] Wuttig D., Baier B., Fuessel S., Meinhardt M., Herr A., Hoefling C., Toma M., Grimm M. O., Meye A., Rolle A., and Wirth M. P. Gene signatures of pulmonary metastases of renal cell carcinoma reflect the disease-free interval and the number of metastases per patient. Int J Cancer, 125(2):474–482, Jul 2009.
- [44] Chow R., Wessels J. M., and Foster W. G. Brain-derived neurotrophic factor (BDNF) expression and function in the mammalian reproductive Tract. Hum Reprod Update, 26(4):545–564, Jun 2020.
- [45] Beketova E., Owens J. L., Asberry A. M., and Hu C. D. PRMT5: a putative oncogene and therapeutic target in prostate cancer. Cancer Gene Ther, 29(3–4):264–276, March 2022.
- [46] Kong J., Sun S., Min F., Hu X., Zhang Y., Cheng Y., Li H., Wang X., and Liu X. Integrating Network Pharmacology and Transcriptomic Strategies to Explore the Pharmacological Mechanism of Hydroxysafflor Yellow A in Delaying Liver Aging. Int J Mol Sci, 23(22), Nov 2022.
- [47] Toyoda H., Kumada T., Kiriyama S., Tanikawa M., Hisanaga Y., Kanamori A., Tada T., Kitabatake S., and Murakami Y. Association between hepatic steatosis and hepatic expression of genes involved in innate immunity in patients with chronic hepatitis C. Cytokine, 63(2):145–150, Aug 2013.
- [48] Pieper W., Ignatov A., Kalinski T., Haybaeck J., Czapiewski P., and Nass N. The predictive potential of Neuronatin for neoadjuvant chemotherapy of breast cancer. Cancer Biomark, 32(2):161–173, 2021.
- [49] Ali M. M., Di Marco M., Mahale S., Jachimowicz D., Kosalai S. T., Reischl S., Statello L., Mishra K., Darnfors C., Kanduri M., and Kanduri C. LY6K-AS lncRNA is a lung adenocarcinoma prognostic biomarker and regulator of mitotic progression. Oncogene, 40(13):2463–2478, April 2021.
- [50] Santoni M. J., Kashyap R., Camoin L., and Borg J. P. The Scribble family in cancer: twentieth anniversary. Oncogene, 39(47):7019–7033, Nov 2020.
- [51] Zhang G., Shang H., Liu B., Wu G., Wu D., Wang L., Li S., Wang Z., Wang S., and Yuan J. Predicts Poor Prognosis in Patients With Colorectal Carcinoma. Front Genet, 13:661348, 2022.
- [52] Green A. R., Krivinskas S., Young P., Rakha E. A., Paish E. C., Powe D. G., and Ellis I. O. Loss of expression of chromosome 16q genes DPEP1 and CTCF in lobular carcinoma in situ of the breast. Breast Cancer Res Treat, 113(1):59–66, Jan 2009.
- [53] Wu Q., Wang D., Zhang Z., Wang Y., Yu W., Sun K., Maimela N. R., Sun Z., Liu J., Yuan W., and Zhang Y. DEFB4A is a potential prognostic biomarker for colorectal cancer. Oncol Lett, 20(4):114, Oct 2020.
- [54] Suzuki C., Takahashi K., Hayama S., Ishikawa N., Kato T., Ito T., Tsuchiya E., Nakamura Y., and Daigo Y. Identification of Myc-associated protein with JmjC domain as a novel therapeutic target oncogene for lung cancer. Mol Cancer Ther, 6(2):542–551, Feb 2007.
- [55] Zhang Y., Wei H., Fan L., Fang M., He X., Lu B., and Pang Z. CLEC4s as Potential Therapeutic Targets in Hepatocellular Carcinoma Microenvironment. Front Cell Dev Biol, 9:681372, 2021.
- [56] Burnett R. M., Craven K. E., Krishnamurthy P., Goswami C. P., Badve S., Crooks P., Mathews W. P., Bhat-Nakshatri P., and Nakshatri H. Organ-specific adaptive signaling pathway activation in metastatic breast cancer cells. Oncotarget, 6(14):12682–12696, May 2015.
- [57] Xiao Y. F., Li B. S., Liu J. J., Wang S. M., Liu J., Yang H., Hu Y. Y., Gong C. L., Li J. L., and Yang S. M. Role of lncSLCO1C1 in gastric cancer progression and resistance to oxaliplatin therapy. Clin Transl Med, 12(4):e691, April 2022.
- [58] Faltermeier C. M., Drake J. M., Clark P. M., Smith B. A., Zong Y., Volpe C., Mathis C., Morrissey C., Castor B., Huang J., and Witte O. N. Functional screen identifies kinases driving prostate cancer visceral and bone metastasis. Proc Natl Acad Sci U S A, 113(2):E172–181, Jan 2016.
- [59] White D. L., Li D., Nurgalieva Z., and El-Serag H. B. Genetic variants of glutathione S-transferase as possible risk factors for hepatocellular carcinoma: a HuGE systematic review and meta-analysis. Am J Epidemiol, 167(4):377–389, Feb 2008.
- [60] Yang M. L., Zhang J. H., Li S., Zhu R., and Wang L. SLC13A4 Might Serve as a Prognostic Biomarker and be Correlated with Immune Infiltration into Head and Neck Squamous Cell Carcinoma. Pathol Oncol Res, 27:1609967, 2021.
- [61] Masjedi S., Zwiebel L. J., and Giorgio T. D. Olfactory receptor gene abundance in invasive breast carcinoma. Sci Rep, 9(1):13736, September 2019.
- [62] Pan J., Dai Q., Xiang Z., Liu B., and Li C. Three Biomarkers Predict Gastric Cancer Patients’ Susceptibility To Fluorouracil-based Chemotherapy. J Cancer, 10(13):2953–2960, 2019.
- [63] Zhang K., Zhao Z., Yu J., Chen W., Xu Q., and Chen L. LncRNA FLVCR1-AS1 acts as miR-513c sponge to modulate cancer cell proliferation, migration, and invasion in hepatocellular carcinoma. J Cell Biochem, 119(7):6045–6056, Jul 2018.
- [64] Wang J., Shidfar A., Ivancic D., Ranjan M., Liu L., Choi M. R., Parimi V., Gursel D. B., Sullivan M. E., Najor M. S., Abukhdeir A. M., Scholtens D., and Khan S. A. Overexpression of lipid metabolism genes and PBX1 in the contralateral breasts of women with estrogen receptor-negative breast cancer. Int J Cancer, 140(11):2484–2497, Jun 2017.
- [65] Vaes R. D. W., Reynders K., Sprooten J., Nevola K. T., Rouschop K. M. A., Vooijs M., Garg A. D., Lambrecht M., Hendriks L. E. L., Rucevic M., and De Ruysscher D. Identification of Potential Prognostic and Predictive Immunological Biomarkers in Patients with Stage I and Stage III Non-Small Cell Lung Cancer (NSCLC): A Prospective Exploratory Study. Cancers (Basel), 13(24), Dec 2021.
- [66] Zheng Y., Ming P., Zhu C., Si Y., Xu S., Chen A., Wang J., and Zhang B. Hepatitis B virus X protein-induced SH2 domain-containing 5 (SH2D5) expression promotes hepatoma cell growth via an SH2D5-transketolase interaction. J Biol Chem, 294(13):4815–4827, March 2019.
- [67] Shen J., Terry M. B., Gammon M. D., Gaudet M. M., Teitelbaum S. L., Eng S. M., Sagiv S. K., Neugut A. I., and Santella R. M. IGHMBP2 Thr671Ala polymorphism might be a modifier for the effects of cigarette smoking and PAH-DNA adducts to breast cancer risk. Breast Cancer Res Treat, 99(1):1–7, Sep 2006.
- [68] Pang B., Sui S., Wang Q., Wu J., Yin Y., and Xu S. Upregulation of DLEU1 expression by epigenetic modification promotes tumorigenesis in human cancer. J Cell Physiol, 234(10):17420–17432, August 2019.
- [69] Ramello M. C., Núñez N. G., Tosello Boari J., Bossio S. N., Canale F. P., Abrate C., Ponce N., Del Castillo A., Ledesma M., Viel S., Richer W., Sedlik C., Tiraboschi C., Muñoz M., Compagno D., Gruppi A., Acosta Rodríguez E. V., Piaggio E., and Montes C. L. T Cells Infiltrate Tumors and Are Expanded in Peripheral Blood From Breast Cancer Patients. Front Immunol, 12:713132, 2021.
- [70] Ding N., Li R., Shi W., and He C. CENPI is overexpressed in colorectal cancer and regulates cell migration and invasion. Gene, 674:80–86, Oct 2018.
- [71] Wang Z., Pei H., Liang H., Zhang Q., Wei L., Shi D., Chen Y., and Zhang J. Construction and Analysis of a circRNA-Mediated ceRNA Network in Lung Adenocarcinoma. Onco Targets Ther, 14:3659–3669, 2021.
- [72] Cheng J., Li Y., Wang X., Dong Z., Chen Y., Zhang R., Huang J., Jin X., Yao J., Ge A., Song L., Lu Y., and Zeng Z. Response Stratification in the First-Line Combined Immunotherapy of Hepatocellular Carcinoma at Genomic, Transcriptional and Immune Repertoire Levels. J Hepatocell Carcinoma, 8:1281–1295, 2021.
- [73] Wu W., Warner M., Wang L., He W. W., Zhao R., Guan X., Botero C., Huang B., Ion C., Coombes C., and Gustafsson J. A. Drivers and suppressors of triple-negative breast cancer. Proc Natl Acad Sci U S A, 118(33), Aug 2021.
- [74] Lulli M., Del Coco L., Mello T., Sukowati C., Madiai S., Gragnani L., Forte P., Fanizzi F. P., Mazzocca A., Rombouts K., Galli A., and Carloni V. DNA Damage Response Protein CHK2 Regulates Metabolism in Liver Cancer. Cancer Res, 81(11):2861–2873, June 2021.
- [75].Liu D., Yang Y., Yan A., and Yang Y.. SPOCD1 accelerates ovarian cancer progression and inhibits cell apoptosis via the PI3K/AKT pathway. Onco Targets Ther, 13:351–359, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [76].Shinmura K., Kato H., Kawanishi Y., Igarashi H., Inoue Y., Yoshimura K., Nakamura S., Fujita H., Funai K., Tanahashi M., Niwa H., Ogawa H., and Sugimura H.. WDR62 overexpression is associated with a poor prognosis in patients with lung adenocarcinoma. Mol Carcinog, 56(8):1984–1991, Aug 2017. [DOI] [PubMed] [Google Scholar]
- [77].Cantelli G., Orgaz J. L., Rodriguez-Hernandez I., Karagiannis P., Maiques O., Matias-Guiu X., Nestle F. O., Marti R. M., Karagiannis S. N., and Sanz-Moreno V.. TGF-β-Induced Transcription Sustains Amoeboid Melanoma Migration and Dissemination. Curr Biol, 25(22):2899–2914, Nov 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [78].Kodama T., Kodama M., Jenkins N. A., Copeland N. G., Chen H. J., and Wei Z.. Ring Finger Protein 125 Is an Anti-Proliferative Tumor Suppressor in Hepatocellular Carcinoma. Cancers (Basel), 14(11), May 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [79].Lin Y., Zhou X., Peng W., Wu J., Wu X., Chen Y., and Cui Z.. Expression and clinical implications of basic leucine zipper ATF-like transcription factor 2 in breast cancer. BMC Cancer, 21(1):1062, Sep 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [80].Zhang Z., Ji W., Huang J., Zhang Y., Zhou Y., Zhang J., Dong Y., Yuan T., Yang Q., Ding X., Tang L., Li H., Yin J., Wang Y., Ji T., Fei J., Zhang B., Chen P., and Hu H.. Characterization of the tumour microenvironment phenotypes in malignant tissues and pleural effusion from advanced osteoblastic osteosarcoma patients. Clin Transl Med, 12(11):e1072, Nov 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [81].Meng M. and Wu Y. C.. LMX1B Activated Circular RNA GFRA1 Modulates the Tumorigenic Properties and Immune Escape of Prostate Cancer. J Immunol Res, 2022:7375879, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [82] Ma C., Kang W., Yu L., Yang Z., and Ding T. AUNIP Expression Is Correlated With Immune Infiltration and Is a Candidate Diagnostic and Prognostic Biomarker for Hepatocellular Carcinoma and Lung Adenocarcinoma. Front Oncol, 10:590006, 2020.
- [83] Duss S., Brinkhaus H., Britschgi A., Cabuy E., Frey D. M., Schaefer D. J., and Bentires-Alj M. Mesenchymal precursor cells maintain the differentiation and proliferation potentials of breast epithelial cells. Breast Cancer Res, 16(3):R60, Jun 2014.
- [84] Chen Y., Shen L., Chen B., Han X., Yu Y., Yuan X., and Zhong L. in lung cancer. Ann Transl Med, 9(10):843, May 2021.
- [85] Ngoc P. C. T., Tan S. H., Tan T. K., Chan M. M., Li Z., Yeoh A. E. J., Tenen D. G., and Sanda T. Identification of novel lncRNAs regulated by the TAL1 complex in T-cell acute lymphoblastic leukemia. Leukemia, 32(10):2138–2151, Oct 2018.
- [86] Rutkowski M. R., Svoronos N., Perales-Puchalt A., and Conejo-Garcia J. R. The Tumor Macroenvironment: Cancer-Promoting Networks Beyond Tumor Beds. Adv Cancer Res, 128:235–262, 2015.
- [87] Alexandrov L. B., Kim J., Haradhvala N. J., Huang M. N., Tian Ng A. W., Wu Y., Boot A., Covington K. R., Gordenin D. A., Bergstrom E. N., Islam S. M. A., Lopez-Bigas N., Klimczak L. J., McPherson J. R., Morganella S., Sabarinathan R., Wheeler D. A., Mustonen V., Getz G., Rozen S. G., Stratton M. R., Alexandrov L. B., Bergstrom E. N., Boot A., Boutros P., Chan K., Covington K. R., Fujimoto A., Getz G., Gordenin D. A., Haradhvala N. J., Huang M. N., Islam S. M. A., Kazanov M., Kim J., Klimczak L. J., Lopez-Bigas N., Lawrence M., Martincorena I., McPherson J. R., Morganella S., Mustonen V., Nakagawa H., Tian Ng A. W., Polak P., Prokopec S., Roberts S. A., Rozen S. G., Sabarinathan R., Saini N., Shibata T., Shiraishi Y., Stratton M. R., Teh B. T., Vázquez-García I., Wheeler D. A., Wu Y., Yousif F., and Yu W. The repertoire of mutational signatures in human cancer. Nature, 578(7793):94–101, Feb 2020.
- [88] Kim Y. A., Hodzic E., Amgalan B., Saslafsky A., Wojtowicz D., and Przytycka T. M. Mutational Signatures as Sensors of Environmental Exposures: Analysis of Smoking-Induced Lung Tissue Remodeling. Biomolecules, 12(10), Sep 2022.
- [89] Wang Q., Armenia J., Zhang C., Penson A. V., Reznik E., Zhang L., Minet T., Ochoa A., Gross B. E., Iacobuzio-Donahue C. A., Betel D., Taylor B. S., Gao J., and Schultz N. Unifying cancer and normal RNA sequencing data from different sources. Sci Data, 5:180061, Apr 2018.
- [90] Kanehisa M. and Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 28(1):27–30, Jan 2000.
Supplementary Materials
Additional file 1 — Supplemental Information: Table S1, Figures S1, S2, S3, S4 and S5. (PDF)
Additional file 2 — Lists of GO terms enriched in the gene list ranked by regulatory potential (RP) score for each of the five cancer types. Enrichments for biological process, molecular function, and cellular component are provided in separate sheets of the same file. (Excel)
Additional file 3 — Lists of the self-predictor genes with their self-interaction weights, the correlation between their expression in normal and tumor tissue, and the corresponding p-values. (Excel)
Additional file 4 — Lists of enriched GO terms for the genes whose expression in normal samples is predictive of their expression in the corresponding tumor samples (self-predictors): enrichment of biological process, molecular function, and cellular component. (Excel)
Additional file 5 — Lists of enriched KEGG pathways for the target genes of the ten predictors with the highest RP scores: enriched pathways and the corresponding p-values. (Excel)
Additional file 6 — Lists of the selected high scoring predictor genes and principal components with their regulatory potentials for the five cancers. (Excel)
Additional file 7 — Lists of GO terms enriched in the gene list ranked by prediction error for each of the five cancers (genes whose tumor expression is predicted with smaller error rank higher). Enrichments for biological process, molecular function, and cellular component are provided in separate sheets of the same file. (Excel)




