Abstract
Multi-omics cancer data provides complementary views of tumorigenesis and progression. Technical challenges exist in integrating these heterogeneous data into deep learning models to better understand tumorigenesis and predict cancer recurrence. We herein propose a novel end-to-end deep learning method (MULGONET) for cancer recurrence prediction and biomarker discovery. First, MULGONET can effectively solve the curse of dimensionality and the lack of model interpretability in multi-omics data integration. Second, it explores interactions and regulatory relationships between genes and GO terms, thus providing biological insights. Benchmark results show that MULGONET outperforms other contemporary classification methods. It achieves AUPRs of 0.774 ± 0.015, 0.873 ± 0.003 and 0.702 ± 0.011 on the bladder, pancreatic and stomach cancer datasets, respectively. We also show that MULGONET can effectively identify prognostic genes and GO terms associated with cancer recurrence.
Keywords: Cancer recurrence, Multi-omics data, Deep learning, Interpretability, Biomarker discovery
1. Introduction
Cancer is mainly caused by malfunctioning endogenous mechanisms that control cell growth and proliferation, and its incidence is rising alarmingly [1]. According to available statistics, approximately 19.3 million new cancer cases and 10 million cancer-related deaths are estimated to have occurred in 2020 [2]. Advances in surgery, chemotherapy, molecularly targeted therapies and immunotherapy have significantly improved patient survival over the past decades. However, the 5-year survival rate for pancreatic, lung and liver cancer patients is still low due to cancer recurrence and metastasis [3,4,5]. Cancer recurrence usually means that a few tumour cells escape treatment and remain in place after surgery. These surviving tumour cells exploit various resistance mechanisms to re-establish themselves after initial treatment. In addition, some normal cells undergo numerous genetic alterations to generate new tumour cells [6]. These new tumour cells destroy the normal structure and function of human tissues and organs and can even cause the death of patients. Therefore, it is critical to evaluate the risk of recurrence and metastasis in patients with cancer treated by surgery and to discover biomarkers associated with cancer recurrence in order to improve the survival of cancer patients.
With the development of high-throughput sequencing technologies, large amounts of multi-omics data can be obtained. These data are used to predict cancer recurrence and to reveal its mechanisms and molecular complexity [7,8]. Multi-omics data have already been used for cancer invasion identification [9], cancer subtype classification [10] and the discovery of tumour regulators [11]. They provide valuable information about the cellular metabolic processes that drive tumour formation and progression and help design optimal cancer treatment [12]. In addition, multi-omics data can help discover biomarkers and cellular pathways or processes that distinguish cancer patients from normal cohorts.
Deep learning is a multi-level representation learning method that has led to breakthroughs in processing images, video, speech and audio [13,14]. In recent years, it has been employed for cancer data classification and biomarker identification owing to the power of multilayer nonlinear transformation. For example, Wei et al. [15] integrated multi-omics data with an autoencoder and extracted new features at the bottleneck layer; these features were then used to predict prostate cancer recurrence by k-means clustering. Mars et al. [16] constructed cross-omics correlation networks based on multi-omics data to identify putative host-microbial-metabolite interactions in a non-targeted manner and further discover relevant biomarkers of irritable bowel syndrome. Lan et al. [17] utilized the biological relationship between genes and pathways to design a biological hierarchy module and a pathway self-attention module to effectively integrate multiple omics data and learn potential biological features. However, the “black box” nature of deep neural networks makes it challenging to understand the underlying mechanisms, and a growing number of studies attempt to address this problem. Zhao et al. [18] constructed a scalable and interpretable multi-omics cancer survival analysis framework by combining a priori knowledge of biological pathways to capture the nonlinear and hierarchical effects of biological pathways associated with survival time. Bourgeais et al. [19] designed a self-explaining neural network by encapsulating the Gene Ontology graph to provide interpretability in phenotype prediction. Moon et al. [20] proposed a multi-task attention learning algorithm based on multi-omics data for cancer classification and interpretation. An interpretable convolutional neural network has been proposed for survival prediction and pathway analysis of glioblastoma [21]. This method constructs pathway images based on specific biological pathways and maps multi-omics data into the pathway images to identify critical pathways by using the Grad-CAM interpretation method [22]. Wang et al. [23] proposed a multi-omics graph convolutional neural network for biomarker discovery by using interpretable sensitivity analysis methods. Elmarakeby et al. [24] proposed a deep learning model with biological information based on Reactome data to provide insights into targeted therapy for prostate cancer.
Although these methods have achieved promising results, limitations and challenges remain. Some methods are unable to effectively improve the predictive performance of the model due to design limitations. They often fail to overcome the curse of dimensionality when integrating multi-omics data. Multi-omics data often contain a large number of missing and zero values, which can affect the training and predictive performance of the model. In addition, while these methods are able to identify key features and assess the internal weights of the model, they often provide only a general interpretation of the model and cannot provide deeper biological insights from a model perspective.
In this paper, we propose a new interpretable neural network (MULGONET) for cancer recurrence prediction and biomarker discovery. This model is designed to utilize the extensive biological knowledge already embedded in the functional annotations, e.g., biological process (BP) and molecular function (MF), contained in the Gene Ontology (GO) database [25]. In MULGONET, network nodes represent specific biological entities (genes, GO terms), and edges represent the relationships between entities. MULGONET utilizes the annotation relationships between genes and GO terms for multi-omics data integration, which effectively avoids the “curse of dimensionality” of multi-omics integration. In addition, integrating biological networks from different domains can provide potential interaction information to improve the prediction performance of the model. Furthermore, calculating node importance scores based on the attribution method can help identify complete gene-phenotype regulatory relationships, which provides more insights into complex biological mechanisms. We demonstrate that MULGONET outperforms other advanced classification methods in 5-fold cross-validation experiments. Furthermore, the case study demonstrates that MULGONET is an effective tool for discovering essential genes and GO terms associated with cancer recurrence. Finally, the interpretability of MULGONET lends reliability and transparency to the experimental results.
2. Materials and methods
2.1. Overview of MULGONET
Overall, MULGONET is an interpretable multi-omics integration framework that predicts cancer recurrence and identifies relevant biomarkers. There are five major steps in the MULGONET framework (Fig. 1). (i) Gene annotation is performed to map copy number variation Ensembl IDs and methylated CpG probes to the corresponding gene names. Then, the data are preprocessed by removing low-quality multi-omics data and non-protein-coding genes. (ii) Feature selection is performed on each type of omics data to remove redundant features. Then, the BP and MF entities are extracted according to the GO namespace. (iii) Based on the hierarchical relationship of GO, BP and MF relationship matrices are constructed to represent the corresponding GO hierarchical networks, respectively. Furthermore, utilizing the annotation relationships between genes and GO terms, multi-omics data are linked to the first-layer GO entities of the GO hierarchical network, forming the gene-GO relationship layers. (iv) A complete biological network is constructed by integrating the gene-GO relationship layers with the GO hierarchical network. Then, the biological network is converted into a computational model to achieve network learning and prediction. (v) MULGONET generates an importance score for each entity (gene and GO term) to indicate the significance of the entity in cancer recurrence by using the interpretable attribution method.
Fig. 1.
Overview of the MULGONET framework. (a) Data processing. Collection and preprocessing of multi-omics data and Gene Ontology data. Then, the GO hierarchy is constructed as a directed acyclic graph, and the graph is subjected to relationship extraction to obtain the GO hierarchical matrix. Finally, the annotation relationships between the genes and GO terms are used to generate the gene-GO relationship matrix. (b) Network construction. MULGONET is constructed based on the gene-GO relationship layers and GO hierarchy networks. Nodes on the far left represent omics data types, and the subsequent five layers represent higher-level biological entities (GO terms). The upper layer is constructed based on the BP hierarchy, and the lower layer is constructed based on the MF hierarchy. Finally, the outputs of the two sub-networks are merged by a multilayer perceptron to predict cancer recurrence. (c) Interpretability of the model. The relative importance scores of the MULGONET nodes are calculated by the integrated gradients method, and the node scores are visualized to reveal the regulatory process of cancer recurrence.
2.2. Data description and data preprocessing
In this experiment, the performance of our model is evaluated by using three datasets, including pancreatic cancer (PAAD), stomach cancer (STAD) and bladder cancer (BLCA), which are downloaded from the Xena TCGA Pan-Cancer web [26]. Each dataset includes mRNA expression data (mRNA), copy number variation data (CNV) and DNA methylation data (DNA_Meth), where mRNA is the “HTSeq - FPKM” data type, CNV is the “VarScan2 Variant Aggregation and Masking” data type, and DNA_Meth is the “Illumina Human Methylation 450” data type. To clearly describe the effect of different copy number variations on the final prediction, the CNV data is divided into two sub-datasets: copy number amplification (Cnv_Amp) and copy number deletion (Cnv_Del). In addition, the clinical data corresponding to these datasets are downloaded from the TCGA database [27]. Only samples with matched mRNA, CNV and DNA_Meth are included in our study.
Due to the inherent differences among various omics data types, appropriate preprocessing is essential. Firstly, for raw mRNA data, non-protein-coding genes and lowly expressed genes are filtered out. Secondly, non-protein-coding genes are removed from the DNA methylation data. Furthermore, to ensure that genes can be linked to GO terms, those without annotations in the first layer of the network are excluded. Considering the high dimensionality of each omics type, the chi-square test is employed for feature selection, and the top 1000 gene features are selected based on their corresponding p-values. The details of the datasets are presented in Table 1. Moreover, GO hierarchy data are obtained from the Gene Ontology database, and BP and MF entities are extracted according to the GO namespace. In Gene Ontology, only the “is a” relationship is utilized to construct the biological hierarchy network, which retains only five GO layers. The number of nodes in each layer for the MF and BP networks is detailed in Table 2.
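The chi-square feature selection described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes non-negative feature values and a binary recurrence label, and it computes the statistic from per-class feature sums, which is one common formulation. The function names `chi2_scores` and `select_top_k` and the toy matrix are hypothetical. Because every feature has the same degrees of freedom, ranking by the largest statistic is equivalent to ranking by the smallest p-value.

```python
def chi2_scores(X, y):
    """Chi-square statistic per feature (assumption: X is non-negative, rows = samples)."""
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    # Fraction of samples in each class.
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        stat = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


def select_top_k(X, y, k):
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: 4 samples x 3 genes; y = 1 marks recurrence (hypothetical values).
X = [[5, 1, 0], [4, 1, 1], [0, 1, 5], [1, 1, 4]]
y = [1, 1, 0, 0]
selected = select_top_k(X, y, k=2)
```

In the study the same idea would be applied per omics type with k = 1000; the toy call keeps the two genes whose totals differ most between classes and drops the uninformative one.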
Table 1.
Summary of datasets.
| Dataset | Categories | Number of features (mRNA, DNA_Meth, Cnv_Amp, Cnv_Del) | Number of training features (mRNA, DNA_Meth, Cnv_Amp, Cnv_Del) |
|---|---|---|---|
| PAAD | Recurrence: 81; Non-Recurrence: 94 | 18,580, 33,479, 19,645, 19,645 | 1000, 1000, 1000, 1000 |
| STAD | Recurrence: 85; Non-Recurrence: 218 | 18,580, 33,461, 19,645, 19,645 | 1000, 1000, 1000, 1000 |
| BLCA | Recurrence: 144; Non-Recurrence: 259 | 18,580, 17,983, 19,729, 19,729 | 1000, 1000, 1000, 1000 |
Note: PAAD, pancreatic cancer; STAD, stomach cancer; BLCA, bladder cancer; mRNA, mRNA expression data; DNA_Meth, DNA methylation data; Cnv_Amp, copy number amplification data; Cnv_Del, copy number deletion data; Recurrence, positive samples; Non-Recurrence, negative samples.
Table 2.
Number of neurons in MULGONET.
| Categories | First layer | Second layer | Third layer | Fourth layer | Fifth layer |
|---|---|---|---|---|---|
| BP | 2819 | 2252 | 534 | 114 | 19 |
| MF | 1428 | 522 | 172 | 37 | 10 |
Note: BP, biological process network; MF, molecular function network.
2.3. Construction of network with GO hierarchy
In this section, we focus on the overall design of MULGONET. It is a biological network consisting of a gene layer and two GO hierarchical networks, where the gene layer is used to represent the input omics data, and the two GO hierarchical networks are represented as BP and MF networks, respectively. In addition, the BP and MF networks are constructed separately, and then the two networks are integrated to form the structure of the MULGONET. Firstly, for the BP network, let denote the number of layers of the hidden layer, and denote the i th hidden layer. The connection relationship of the entity in the hidden layer is represented as, which is defined as follows:
| (1) |
where denotes the q-th entity (node) at the i th layer, and denotes the set of successor nodes of the p-th entity in the (i-1)-th layer. If equals 1, then entity p is connected to entity q. Otherwise, entity p is dropped. Meanwhile, to fully represent the connection relationships between the adjacent layers, the relationship matrix between adjacent layers is calculated as follows:
| (2) |
where represents the counting function, and represents relationship matrix between the (i-1)-th and i th layers. Furthermore, the relationship matrix of the BP network is defined as follows:
| (3) |
where represents the relationship matrix of the BP network. In this experiment, the value of is 4, which means that there are five layers of GO terms in the BP network. Similarly, the relationship matrix of the MF network is defined as follows:
| (4) |
Except for defining the relationship matrix between GO terms, the connection relationship between genes and GO terms is represented by and is defined as follows:
| (5) |
where denotes the -th gene of k-th kind of omics data, and denotes the m-th GO term of the first layer in the BP network. If equals 1, the gene i is connected to the GO term m. Then, the relationship matrix between the genes and GO terms in the BP network is calculated as follows:
| (6) |
where () denotes the number of genes for the k-th type of omics data, denotes the number of GO terms in the first layer of the BP network, and denotes the relationship matrix between the k-th type of omics data and GO terms in the BP network. Here, we concatenate the relationship matrix of the four omics data, which is defined as follows:
| (7) |
Similarly, the relationship matrix between the genes and GO terms in the MF network is calculated as follows:
| (8) |
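The construction of the relationship matrices in this section can be illustrated with a small sketch. It is only an illustration under stated assumptions: the GO identifiers, the successor sets and the gene annotations below are toy values, and the helper names `layer_mask`, `gene_go_mask` and `stack_omics` are hypothetical. Rows index the nodes of the earlier layer (or the genes) and columns index the nodes of the next layer, following the textual description; the per-omics gene-GO matrices are concatenated row-wise.

```python
def layer_mask(prev_layer, next_layer, successors):
    """Binary matrix between adjacent GO layers: entry is 1 if q is a successor of p."""
    return [[1 if q in successors.get(p, set()) else 0 for q in next_layer]
            for p in prev_layer]


def gene_go_mask(genes, first_layer, annotations):
    """Binary matrix linking genes (rows) to first-layer GO terms (columns)."""
    return [[1 if term in annotations.get(g, set()) else 0 for term in first_layer]
            for g in genes]


def stack_omics(masks):
    """Concatenate the per-omics gene-GO matrices row-wise."""
    stacked = []
    for m in masks:
        stacked.extend(m)
    return stacked


# Toy three-layer hierarchy (hypothetical identifiers, not real GO terms).
layers = [["GO:a", "GO:b", "GO:c"], ["GO:d", "GO:e"], ["GO:f"]]
successors = {"GO:a": {"GO:d"}, "GO:b": {"GO:d", "GO:e"}, "GO:c": {"GO:e"},
              "GO:d": {"GO:f"}, "GO:e": {"GO:f"}}
hierarchy = [layer_mask(layers[i], layers[i + 1], successors)
             for i in range(len(layers) - 1)]

# Toy annotations for two omics types (hypothetical gene-to-term links).
annotations = {"CDK6": {"GO:a"}, "WNT5A": {"GO:b", "GO:c"}}
omics_genes = {"mRNA": ["CDK6", "WNT5A"], "Cnv_Amp": ["CDK6"]}
gene_go = stack_omics([gene_go_mask(genes, layers[0], annotations)
                       for genes in omics_genes.values()])
```

Each matrix then acts as a connectivity mask: a gene or GO term only feeds the nodes it is annotated to or related to, so the number of trainable connections is set by the ontology rather than by the raw input dimensionality.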
2.4. Training and optimization
MULGONET is designed as a feedforward neural network. The network propagation based on BP and MF network is calculated as follows:
(1) The propagation from the gene layer to the first GO layer is calculated as follows:
| (9) |
| (10) |
(2) The propagation of subsequent layers is calculated as follows:
| (11) |
| (12) |
where σ represents the tanh activation function. , , , and b represent the trainable parameter of the network. represents the transposal function, and represents the multilayer perceptron. In addition, X denotes the concatenated four multi-omics data. and denotes the output of the first layer BP and MF networks, respectively. is the sum of the outputs of the two GO networks. Then, is input to the multilayer perceptron to predict cancer recurrence.
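A sparse layer of this kind can be sketched in a few lines. The sketch assumes that each layer multiplies its trainable weights element-wise by the binary relationship matrix before applying tanh; the exact form of equations (9) to (12) may differ, and the name `masked_layer` and the toy numbers are hypothetical.

```python
import math


def masked_layer(x, weights, mask, bias):
    """One sparse layer: out[q] = tanh(sum_p x[p] * W[p][q] * M[p][q] + b[q])."""
    out = []
    for q in range(len(bias)):
        z = bias[q]
        for p, x_p in enumerate(x):
            # Connections absent from the biological network (mask 0) contribute nothing.
            z += x_p * weights[p][q] * mask[p][q]
        out.append(math.tanh(z))
    return out


# Toy example: two inputs, two output nodes, one allowed connection per node.
x = [1.0, 2.0]
weights = [[0.5, 9.0], [9.0, -0.25]]
mask = [[1, 0], [0, 1]]
bias = [0.0, 0.0]
hidden = masked_layer(x, weights, mask, bias)
```

The two weights of 9.0 sit on masked connections and therefore have no effect on `hidden`, which is what keeps every hidden node tied to a named GO term.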
Since there is an imbalance between positive and negative samples in multi-omics data, category weights are added to the loss function for the category balance. The weight relative to the class is calculated as follows:
| (13) |
where denotes the number of classes. denotes the balance coefficient of j-th class. N denotes the number of total classes and denotes the total number of j-th class.
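As an illustration of class balancing, the sketch below uses the common "balanced" heuristic, in which the weight of a class is the total number of samples divided by the product of the number of classes and the number of samples in that class. Whether equation (13) has exactly this form is an assumption, and `class_weights` is a hypothetical helper. The counts are the BLCA figures from Table 1.

```python
def class_weights(counts):
    """Balanced class weights: total / (n_classes * n_samples_in_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# BLCA sample counts from Table 1.
blca_weights = class_weights({"recurrence": 144, "non_recurrence": 259})
```

Under this heuristic the minority (recurrence) class receives a weight above 1 and the majority class a weight below 1, so both classes contribute equally to the loss in aggregate.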
In our method, the cross-entropy loss function is used as the loss function which is defined as follows:
| (14) |
where is the regularization parameter to avoid model overfitting. α denotes all parameters of model. represents the class weight of positive samples. represents the class weight of negative samples. and represent the actual value of the target class and the predicted probability of the target class, respectively.
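A class-weighted cross-entropy with a regularization term can be sketched as follows. The sketch assumes a binary target, averaging over samples and a squared L2 penalty on the parameters; the normalisation and the type of penalty used in equation (14) are not recoverable from the text, so these are assumptions, as is the function name `weighted_cross_entropy`.

```python
import math


def weighted_cross_entropy(y_true, y_prob, w_pos, w_neg, params=(), lam=0.0, eps=1e-12):
    """Mean class-weighted binary cross-entropy plus lam * sum of squared parameters."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    data_term = total / len(y_true)
    penalty = lam * sum(a * a for a in params)
    return data_term + penalty


# Toy call: one positive and one negative sample, both predicted at 0.5.
loss = weighted_cross_entropy([1, 0], [0.5, 0.5], w_pos=2.0, w_neg=1.0,
                              params=[1.0, 2.0], lam=0.1)
```

With equal predictions, the positive sample contributes twice as much to the loss as the negative one, which is the intended effect of the class weights.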
2.5. Interpretability of the model
The contribution score of a gene indicates the extent of the gene's role in predicting cancer recurrence. A higher score indicates that the gene has a greater impact on the prediction outcome and may be an important biomarker. The contribution score of a GO entity indicates the extent to which the biological process or molecular function plays a role in predicting cancer recurrence. A higher score indicates that the biological process or molecular function has a greater impact on the prediction result and may reveal the underlying mechanism of cancer recurrence.
To evaluate the impact of each node on the final output, the integrated gradients method [28] is used to obtain the importance score of the nodes in the network. It is calculated as follows:
| (15) |
where denotes the activation of the current node. denotes the reference activation of the current node. m is the step value of the integrated Riemann sum approximation. In this experiment, the value of m is set as 20. The integrated gradient method is a local interpretation method that captures only the score of a single sample. Therefore, to obtain global interpretability, the samples with cancer recurrence are used as input and the importance scores of each sample are weighted to achieve global interpretability in this experiment.
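The Riemann-sum approximation of integrated gradients can be sketched as below: gradients are evaluated at m points on the straight line from the reference to the input, averaged, and scaled by the difference between input and reference. The sketch uses a central-difference numerical gradient as a stand-in for the automatic differentiation a deep learning framework would provide; the function names and the toy linear model are hypothetical.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def integrated_gradients(f, x, baseline, m=20):
    """Riemann approximation with m steps along the path from baseline to x."""
    sums = [0.0] * len(x)
    for k in range(1, m + 1):
        point = [b + (k / m) * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_gradient(f, point)):
            sums[i] += g
    return [(xi - b) * s / m for xi, b, s in zip(x, baseline, sums)]


# Toy model: a linear score over two input features.
def model(v):
    return 2.0 * v[0] + 3.0 * v[1]


attributions = integrated_gradients(model, [1.0, 2.0], [0.0, 0.0], m=20)
```

For this linear toy model the attributions sum to the difference between the model output at the input and at the reference, which is the completeness property that motivates the method.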
3. Results
3.1. Classification performance evaluation
We use 5-fold cross-validation to evaluate the performance of our model. In 5-fold cross-validation, all data are divided into five subsets; four subsets are used as the training set, and the remaining subset is used as the test set. In addition, the 5-fold cross-validation is repeated five times to further ensure the reliability of the experiment. Recall (REC), F1-score (F1), area under the ROC curve (AUC) and area under the precision-recall curve (AUPR) are employed as evaluation metrics. Finally, the mean and standard deviation of the evaluation metrics are reported as the final prediction results.
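The evaluation protocol can be sketched as a repeated 5-fold split followed by a mean and standard deviation over the per-fold scores. Two details are assumptions that the text does not state: the folds are stratified by class label, and the sample standard deviation is used. The names `repeated_kfold` and `summarize` are hypothetical.

```python
import random
import statistics


def repeated_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs; folds are stratified by label (assumption)."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal the shuffled indices of this class round-robin into the folds.
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train_idx, test_idx


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy labels: 10 recurrence (1) and 15 non-recurrence (0) samples.
labels = [1] * 10 + [0] * 15
splits = list(repeated_kfold(labels))
```

Five repeats of five folds give 25 train/test splits; a metric such as AUPR would be computed on each test fold and then passed to `summarize`.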
3.2. Performance of MULGONET compared with other advanced classification methods
In this experiment, MULGONET is utilized to predict cancer recurrence in BLCA, PAAD and STAD. To validate the effectiveness of our model, we compare MULGONET with several other advanced classification methods, including stochastic gradient descent (SGD), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Adaptive boosting with the decision tree (AdaBoost), RBF support vector machines (RBF-SVM), MOGONET [23], PathCNN [21], PNET [24], ProgCAE [29] and MOCAT [30]. To meet the input requirements of these methods, we perform data preprocessing. We concatenate the four kinds of preprocessed omics data as input to the machine learning methods (SGD, RF, LR, AdaBoost, DT and RBF-SVM). For the MOGONET method, we use the four types of preprocessed omics data directly as input. For the PathCNN, PNET, ProgCAE and MOCAT methods, we use the four types of raw omics data as input. We evaluate all methods on the three datasets and ensure that the same training steps and hyperparameters are used. The results are shown in Fig. 2. MULGONET outperforms the other classification methods on the three datasets. MULGONET achieves an AUPR of 0.774 ± 0.015 on BLCA, 0.873 ± 0.003 on PAAD and 0.702 ± 0.011 on STAD, which is better than all of the compared methods. The experimental results demonstrate the practicability of converting the Gene Ontology into a computational model for cancer recurrence prediction. They also confirm the effectiveness of integrating multi-omics data in the GO space. In conclusion, MULGONET is a biologically inspired computational model that can perform feature fusion of different omics data in GO space to improve prediction performance.
Fig. 2.
The performance of MULGONET compared with other classification methods. Result: mean ± standard deviation; (a) performance comparison of different methods on STAD dataset; (b) performance comparison of different methods on PAAD dataset; (c) performance comparison of different methods on BLCA dataset; SGD, stochastic gradient descent; RF, Random Forest; LR, Logistic Regression; DT, Decision Tree; AdaBoost, Adaptive boosting with the decision tree; RBF-SVM, RBF support vector machines; MOGONET, PathCNN, PNET, ProgCAE and MOCAT, Multi-omics Integrative Network.
3.3. Performance of MULGONET in integration of different omics data
To further evaluate the ability of MULGONET to integrate different types of omics data in the classification tasks, we compare its performance on the three cancer datasets with different combinations of the four types of omics data. The results are shown in Fig. 3 and Supplementary materials Fig. 1 and 2. Integrating four types of omics data achieves the best prediction performance compared with other data integrations. Meanwhile, CNV data performs better than other single omics data in the cancer recurrence prediction task, as shown in Fig. 3a, Supplementary materials Fig. 1a and 2a. These results suggest that CNV is a more critical factor in cancer recurrence than other single omics data, which is consistent with earlier studies that identify it as one of the important drivers of cancer development and progression [31,32]. We next investigate the predictive performance of different combinations of omics data, and the results are shown in Fig. 3b, Supplementary materials Fig. 1b and 2b. The results demonstrate that MULGONET with four omics data types achieves the best performance compared to using three omics data types. Moreover, MULGONET with three omics data types performs better than using only single omics data types. Furthermore, the predictive performance of DNA_Meth + CNV_Amp is better than that of mRNA + CNV_Amp on the three datasets, even though mRNA performs better than DNA_Meth as a single omics type. Some studies suggest that DNA methylation is also a key factor in cancer recurrence [33,34,35]. This indicates that MULGONET effectively captures the potential interaction information between DNA_Meth and CNV_Amp to improve predictive performance. In addition, MULGONET can be extended to other types of omics data without suffering from the “curse of dimensionality” by using the annotation relationships between genes and GO terms. This experiment further demonstrates the superiority of MULGONET in integrating multi-omics data.
Fig. 3.
The performance of MULGONET in integrating different omics data in the STAD dataset. (a) The results of single omics data and integration of the four omics data on 5-fold cross-validation. (b) The results of integrating two or more omics data on 5-fold cross-validation. ACC, Accuracy; F1, F1 score; AUC, the area under the ROC curve; AUPR, the area under the precision-recall curve. DNA_Meth, DNA methylation data; mRNA, mRNA expression data; CNV_Amp, copy number amplification data; Cnv_Del, copy number deletion data; Four_Omics, DNA_Meth + mRNA + CNV_Amp + Cnv_Del. *p < 0.05 vs. Four_Omics. **p < 0.01 vs. Four_Omics.
3.4. Prediction performance of MULGONET constructed from different GO ontology networks
We next investigate the prediction performance of using different GO networks, i.e., BP vs MF. We use either BP or MF as the backbone network of MULGONET and evaluate the predictive performance of the model. MF represents activities at the molecular level for individual gene products (including proteins and RNAs) or complexes of multiple gene products, while BP describes biological objectives accomplished by one or more sequential molecular activities or reactions [25]. We construct three networks from three GO networks (MF, BP and MF+BP) to ascertain their performance on the three datasets. The results are shown in Fig. 4. It can be observed that integrating both GO networks (BP+MF) achieves an AUPR of 0.774 on BLCA, 0.873 on PAAD and 0.702 on STAD, better than either single GO network. This indicates that integrating the BP network and MF network can provide deeper interaction information to improve the prediction performance of the model.
Fig. 4.
The performance of MULGONET in different GO ontology networks. (a) The performance of different gene ontology networks on the STAD dataset. (b) The performance of different gene ontology networks on the PAAD dataset. (c) The performance of different gene ontology networks on the BLCA dataset. BP represents the MULGONET network constructed by the BP graph on GO, MF represents the MULGONET network constructed by the MF graph on GO and BP+MF represents the MULGONET network constructed by the BP graph and MF graph on GO. *p < 0.05 vs. BP+MF. **p < 0.01 vs. BP+MF.
To investigate the prediction performance at different network depths, we use 5-layer, 4-layer and 3-layer networks as the backbone of MULGONET and evaluate the prediction performance of the model. The results are shown in Supplementary Materials Fig. 5. The AUPR on BLCA with the 5-layer network is 0.774, which is better than the 3-layer network (0.622) and the 4-layer network (0.732). The AUPR on PAAD with the 5-layer network is 0.873, which is better than the 3-layer network (0.720) and the 4-layer network (0.776). The AUPR on STAD with the 5-layer network is 0.702, which is better than the 3-layer network (0.593) and the 4-layer network (0.625). This indicates that a 5-layer network can provide deeper interaction information and thus improve the predictive performance of the model.
Fig. 5.
Kaplan-Meier curves for bladder cancer genes. (a) CDK6 gene Kaplan-Meier curve in bladder cancer. (b) WNT5A gene Kaplan-Meier curve in bladder cancer. (c) KITLG gene Kaplan-Meier curve in bladder cancer. (d) SCRN1 gene Kaplan-Meier curve in bladder cancer.
3.5. MULGONET identifies biomarkers related to bladder cancer
To further evaluate the ability of MULGONET for biomarker discovery, we inspect the gene layer (input layer) and use the integrated gradients method to obtain the gradient value of individual genes. The gradient value of the feature is usually regarded as the contribution of the feature to the classification result. We selected the top nine genes in each omics based on the contribution results of the features as the crucial genes associated with bladder cancer recurrence, which is shown in Table 3. Some of these genes with higher scores have previously been shown to be associated with bladder cancer. For mRNA data, it has been verified that high expression of WNT5A inhibits invasion in bladder cancer cell lines [36]. In addition, WNT5A is shown to increase gemcitabine resistance and decrease cell apoptosis in bladder cancer cells induced by gemcitabine [37]. It has been proved that SFRP1 can reduce the malignant potential of bladder cancer cells and may be a prognostic marker for muscle-invasive bladder cancer [38]. It has been verified that the expression of CDK6 is increased in cases of invasive bladder cancer which may contribute to bladder cancer development, and it also can serve as a biomarker for bladder cancer [39]. Gohji et al. [40] indicated that the expression of HPSE is associated with the invasion, metastasis and differentiation of bladder cancer.
Table 3.
Biomarkers identified by MULGONET on the bladder cancer.
| Omics data type | Biomarkers(genes) |
|---|---|
| mRNA | WNT5A, IRS1, HPSE, SFRP1, USP37, CDK6, SCRN1, KITLG, INHBA. |
| Cnv_Amp | ROCK1, GDF5, SEMA5A, BCL2L1, SOX9, IFNK, HCK, GNRH2, IL2. |
| Cnv_Del | CDKN2A, IFNA1, EDN1, WNT5A, PIK3C3, MTAP, PRKCD, APEH, SEMA3F. |
| DNA_Meth | INHBA, BMP10, CARD16, PLG, UTS2, MMP7, HSD3B1, MMP3, TNFSF4. |
For copy number amplification data, it has been demonstrated that high expression of ROCK1 is correlated with poor prognosis of urothelial bladder cancer, and that its silencing inhibits cell proliferation and promotes apoptosis, which makes it a possible therapeutic target for bladder cancer treatment [41]. In addition, ROCK1 is known to be regulated by multiple microRNAs to participate in the invasion and metastasis of bladder cancer [42,43]. Wan et al. [44] discovered that the expression of SOX9 is significantly upregulated in bladder cancer, and that it is correlated with the clinical stage of bladder cancer. Moreover, their experimental results suggested that SOX9 may play an important role in the tumour progression of bladder cancer and may be used as a biomarker. In addition, it is shown that the inducibility of IL-2 is an independent prognostic parameter and a useful predictive indicator of remission versus relapse in bladder carcinoma [45].
For copy number deletion data, Worst et al. [46] showed that the deletion of CDKN2A is associated with fibroblast growth factor receptor 3 (FGFR3) mutations, which can serve as progression markers of non-muscle invasive bladder cancer. It has been demonstrated that the inhibition of PIK3C3 promotes mitochondrial dysfunction, ROS production and DNA damage in bladder cancer cells [47]. Protein kinase C delta (PRKCD), a critical regulator of cell proliferation, survival and cell death [49], has been reported to be associated with overall survival in bladder cancer [48]. EDN1 (also known as ET1) is a potential biomarker of unfavourable prognosis in bladder cancer and has been reported to be associated with bladder tumour invasion and patient survival [50,51].
For DNA methylation data, Kao et al. [52] showed that INHBA is a biomarker of tumour progression and metastasis in patients with bladder cancer, and that high expression of INHBA is associated with a lower survival rate in these patients. It has been demonstrated that the overexpression of BMP10 can inhibit the growth, adhesion and migration of bladder cancer cells, and that it can be used as a novel potential target for therapeutic intervention in bladder cancer [53]. MMP7 is identified as an independent prognostic factor in bladder cancer, where its overexpression is associated with metastatic lesions [54,55]. It has been revealed that piRABC promotes bladder cancer cell apoptosis by up-regulating TNFSF4, and that the expression of TNFSF4 protein is lower in bladder cancer tissues than in normal tissues [56].
In addition, Kaplan-Meier plots for biomarker genes identified by MULGONET, including CDK6, WNT5A, KITLG and SCRN1, are shown in Fig. 5. We use an independent set to validate the results, as shown in Supplementary Material Fig. 4. The results show that the recurrence-free survival rate of patients with high expression of CDK6, WNT5A, KITLG and SCRN1 is lower than that of patients with low expression.
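A Kaplan-Meier comparison of this kind can be sketched as follows. The sketch assumes that patients are split into high- and low-expression groups at the median expression value, which the text does not specify; the helper names and the toy follow-up data are hypothetical. An event of 1 marks a recurrence and 0 marks a censored observation.

```python
import statistics


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (1 = recurrence, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def split_by_median(expression):
    """Indices of patients above the median (high) and at or below it (low)."""
    cutoff = statistics.median(expression)
    high = [i for i, v in enumerate(expression) if v > cutoff]
    low = [i for i, v in enumerate(expression) if v <= cutoff]
    return high, low


# Toy follow-up data (hypothetical): time in months and event indicator.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
high, low = split_by_median([0.2, 1.5, 0.9, 3.1, 0.4])
```

In practice one curve would be estimated per expression group and the two compared, for example with a log-rank test.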
3.6. MULGONET identified biomarkers related to pancreatic cancer
To further explore the contribution of genes to pancreatic cancer recurrence, we utilize MULGONET to calculate the importance scores of the genes in each omics data type. Table 4 lists the genes with high scores for each omics data type. Many of these genes have prior evidence of being closely associated with pancreatic cancer. For mRNA expression data, it has been discovered that the expression of CDK6 increases significantly in pancreatic cancer, and that patients with high expression of CDK6 have poor overall survival [57]. It has been found that high expression of WNT11 correlates with clinical features such as malignancy, tumour invasion and survival in pancreatic ductal adenocarcinoma [58]. Igbinigie et al. [59] showed that high expression of DKK1 is associated with poor prognosis in pancreatic cancer, and that DKK1 could be a promising prognostic marker and therapeutic target. It has been demonstrated that AXL is involved in the regulation of epithelial-mesenchymal transition (EMT), which promotes the invasive and metastatic ability of pancreatic cancer cells [60].
Table 4.
Biomarkers identified by MULGONET in pancreatic cancer.
| Omics data type | Biomarkers (genes) |
|---|---|
| mRNA | TGFB2, NKX3-1, TNFSF18, CDK6, WNT11, GCG, WNT7A, DKK1, KLK7, AXL, CTSV, CCND1 |
| Cnv_Amp | ROCK1, MYC, TG, AGO2, CDK6, RPS5, CPT1A, RCE1, BMP4, APEX1 |
| Cnv_Del | APELA, MAP3K13, ABCA1, EGFR, PPARGC1A, TRABD2A, PARL, VAMP8, ZDHHC7, KLF4 |
| DNA_Meth | VEGFC, IGF2, GCG, IFNA4, INSL6, MIA, WNT3A, MMP9, WT1, LTF, CGB2, BRINP1 |
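As a rough illustration of how a per-omics list such as Table 4 could be derived from attribution scores, the sketch below ranks genes by their mean absolute attribution across patients. The aggregation rule, the function name `rank_genes`, the placeholder gene names and the numbers are all assumptions made for this example; the section does not state how MULGONET aggregates its scores.

```python
def rank_genes(attributions, genes, top_k=2):
    """Rank genes by mean absolute attribution across patients.

    attributions : one list of scores per patient, aligned with `genes`
    genes        : gene identifiers (placeholders here, not real genes)
    Returns the top_k (gene, mean absolute score) pairs, highest first.
    """
    n_patients = len(attributions)
    mean_abs = [
        sum(abs(row[j]) for row in attributions) / n_patients
        for j in range(len(genes))
    ]
    order = sorted(range(len(genes)), key=lambda j: mean_abs[j], reverse=True)
    return [(genes[j], mean_abs[j]) for j in order[:top_k]]


# Made-up attribution scores for three patients and four placeholder genes.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
toy_scores = [
    [0.2, -0.9, 0.1, 0.0],
    [0.4, 0.7, -0.1, 0.0],
    [-0.3, -0.8, 0.1, 0.1],
]
top_genes = rank_genes(toy_scores, genes, top_k=2)
```

Applying the same function separately to the scores of each omics data type would give one ranked list per row of the table.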
For copy number amplification data, it has been found that ROCK1 is overexpressed in pancreatic tumour tissues, and that its expression promotes the invasion and metastasis of pancreatic cancer cells [61]. Hessmann et al. [62] showed that MYC is related to the development and progression of pancreatic cancer through multiple pathways, including cell proliferation, metabolism and growth. In addition, the overexpression and amplification of MYC promote the proliferation and migration of pancreatic cancer cells. Hamada et al. [63] found that BMP4 induces epithelial-mesenchymal transition (EMT) in pancreatic cancer cells and further enhances their migration and invasion. Moreover, BMP4 can also activate the ERK and p38 MAPK pathways to promote the proliferation and growth of pancreatic cancer cells. Studies also show that AGO2 is involved in the proliferation, invasion, and metastasis of pancreatic ductal adenocarcinoma cells by interacting with critical molecules in the EGFR and RAS signaling pathways [64].
For copy number deletion data, the overexpression of EGFR can promote the proliferation and metastasis of pancreatic cancer cells, and EGFR could be a potential therapeutic target for pancreatic cancer treatment [65]. Huang et al. [66] showed that the overexpression of PPARGC1A suppresses the proliferation, migration, and invasion of pancreatic cancer cells. PPARGC1A also regulates fatty acid synthesis and metabolic remodeling. It has been demonstrated that the overexpression of KLF4 can significantly inhibit the proliferation and migration of pancreatic cancer cells [67].
For DNA methylation data, it has been reported that the overexpression of VEGFC potentially promotes tumour growth and metastasis in pancreatic cancer [68]. The overexpression of MMP9 is associated with tumour metastasis and survival in pancreatic cancer [69], and the overexpression of WT1 is associated with the growth of pancreatic cancer cells [70]. There are also reports that the overexpression of TGFB2 enhances the proliferation and invasion of pancreatic cancer cells [71]. Furthermore, to explore the association between genes and patient prognosis, we perform a survival analysis that relates gene expression levels to patient survival. The Kaplan-Meier plots of several high-scoring genes (CDK6, MYC, TGFB2 and GNRH2) are shown in Fig. 6. The results show a significant survival difference between the high and low-risk groups for all four genes, which supports the ability of MULGONET to discover biomarkers.
Fig. 6.
Kaplan-Meier curves for pancreatic cancer genes. (a) CDK6 gene Kaplan-Meier curve in pancreatic cancer. (b) MYC gene Kaplan-Meier curve in pancreatic cancer. (c) TGFB2 gene Kaplan-Meier curve in pancreatic cancer. (d) GNRH2 gene Kaplan-Meier curve in pancreatic cancer.
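The survival comparison behind such curves can be sketched in a few lines. The example below is a minimal illustration, assuming that patients are split at the median expression of a gene (the split rule is an assumption, as this section does not state it) and that the curves use the standard Kaplan-Meier product-limit estimator. The function names and the six-patient dataset are hypothetical and are not taken from the study.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, estimated recurrence-free probability)
    with one entry per distinct time at which a recurrence occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences > 0:
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """Indices of patients above (high) and at or below (low) the median."""
    cutoff = median(expression)
    high = [i for i, v in enumerate(expression) if v > cutoff]
    low = [i for i, v in enumerate(expression) if v <= cutoff]
    return high, low


# Toy, made-up data for six patients (not taken from the study).
expression = [5.1, 2.0, 7.3, 1.2, 6.8, 3.3]
months = [6, 30, 4, 24, 10, 18]
recurred = [1, 0, 1, 1, 1, 0]

high, low = split_by_median(expression)
km_high = kaplan_meier([months[i] for i in high], [recurred[i] for i in high])
km_low = kaplan_meier([months[i] for i in low], [recurred[i] for i in low])
```

Judging whether two such curves differ significantly would additionally require a test such as the log-rank test, which is not shown here.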
Multi-omics integration improves biomarker discovery by fusing features from different omics. Specifically, mRNA expression data reveal the status of genes under specific conditions and help to identify active or repressed genes. Copy number variation data reveal genome-level changes that may increase or decrease gene expression, affecting cellular function and disease processes. DNA methylation data provide critical information on the regulation of gene expression. Multi-omics integration thus provides a comprehensive view of key biological processes and pathways, which can help to discover potential biomarkers.
3.7. The biological explanation of bladder cancer recurrence
To understand the interactions between different genes, biological processes and molecular functions that contributed to the predictive performance, we next use the integrated gradients method to calculate importance scores for GO terms on bladder cancer. Fig. 7 shows these importance scores as a data flow in the GO map. In the regulatory network of bladder cancer, some GO terms, including negative regulation of cell differentiation (GO:0045596), positive regulation of mitochondrial fission (GO:0090141), regulation of insulin receptor signaling pathway (GO:0046626), regulation of G protein-coupled receptor signaling pathway (GO:0008277), cysteine-type exopeptidase activity (GO:0070004) and chemorepellent activity (GO:0045499) are identified by MULGONET as potential GO terms implicated in the recurrence of bladder cancer. Mitochondrial fission is recognized as an important characteristic of mitochondrial dynamics, and it has been implicated in the pathogenesis of various human diseases [72]. Several studies suggest that mitochondrial function plays a crucial role in promoting the development of bladder cancer [73,74]. G protein-coupled receptors (GPCRs) are a large family of transmembrane proteins that play a crucial role in signal transduction across the cell membrane [75]. In addition, GPCRs and their signaling contribute to the development and metastasis of human cancers [76].
Fig. 7.
Data flow of bladder cancer recurrence in MULGONET. The nodes (red entities) on the left layer represent the input omics features. The following five layers represent higher-level biological entities, where the blue entities represent the BP network, and the purple entities represent the MF network. The last layer (yellow entity) represents the output of the model. Furthermore, the “residue” nodes represent the total importance of entities not shown in these corresponding layers. PR, positive regulation; NR, negative regulation; GP-CRSP, G protein-coupled receptor signaling pathway; IRSP, insulin receptor signaling pathway; PA, phosphotransferase activity; TRPTKA, transmembrane receptor protein tyrosine kinase activity; TRPK, transmembrane receptor protein kinase; TP-CG, transferring phosphorus-containing groups.
In addition, several more specific GO terms are associated with bladder cancer. It has been demonstrated that genetic inactivation of Notch signaling (GO:0008593) leads to Erk1/2 phosphorylation, resulting in tumorigenesis in the urinary tract [77]. Furthermore, it has been found that the Notch signaling pathway plays a tumour-suppressive role in bladder cancer, and that its inactivation promotes bladder cancer development [78]. Receptor protein tyrosine kinase (RPTK) (GO:0004714) is involved in regulating many cellular programs, including the control of cell growth and differentiation [79]. In addition, tyrosine kinases regulated by RPTK have been confirmed to be involved in the progression and metastasis of bladder cancer [80].
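The integrated gradients attribution used in this section can be illustrated on a toy model. The sketch below is not the MULGONET implementation: it assumes a small logistic model with an analytic gradient as a stand-in for the trained network, an all-zero baseline, and a midpoint Riemann sum to approximate the path integral. All names and numbers are hypothetical. In practice the gradient would come from an automatic differentiation framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic model standing in for the trained network."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        grad = model_grad(point, w, b)
        grad_sum = [s + g for s, g in zip(grad_sum, grad)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - x0) * s / steps for xi, x0, s in zip(x, baseline, grad_sum)]


# Made-up input, baseline and parameters for three features.
x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
w = [0.8, -0.3, 1.5]
b = -0.2
scores = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property of integrated gradients: the attributions should sum, up to approximation error, to the difference between the model output at the input and at the baseline.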
3.8. The biological explanation of pancreatic cancer recurrence
We next use the integrated gradients method to calculate importance scores for the GO entities in pancreatic cancer. In the regulatory network of pancreatic cancer (Fig. S3), some GO terms, including cellular response to lipopolysaccharide (GO:0071222), G1/S transition of mitotic cell cycle (GO:0000082), cellular response to cAMP (GO:0071320), G protein-coupled receptor binding (GO:0001664), protein serine/threonine kinase activity (GO:0004674) and receptor ligand activity (GO:0048018), are identified by MULGONET as potential GO terms for the recurrence of pancreatic cancer. Ikebe et al. [81] showed that lipopolysaccharide (LPS) enhances the invasiveness of pancreatic cancer cells by inducing the expression of MMP9. In addition, G protein-coupled receptors (GO:0001664) have been shown to play a crucial role in tumour growth and cell proliferation, and they are also closely associated with the development of pancreatic cancer [82,83]. Several studies have also shown that the protein serine/threonine kinase (GO:0004674) can increase protein expression in pancreatic cancer tissues and is associated with the progression stages of tumours [84].
4. Discussion and conclusion
Cancer recurrence is a critical factor that affects patient survival. Predicting its occurrence is essential for developing personalized treatment plans, optimizing clinical decision-making and improving patient quality of life. In recent years, deep learning-based multi-omics data integration has made significant progress in cancer recurrence prediction, tumour grading and cancer subtyping. However, multi-omics data are heterogeneous, sparse, noisy and class-imbalanced, and they combine high dimensionality with small sample sizes. These factors directly impact the application of deep learning to multi-omics analysis. Moreover, a major drawback of deep learning is its lack of interpretability, which is important for biological understanding. In this paper, we propose an interpretable deep-learning framework that implements multi-omics data integration and GO graph representation learning for cancer recurrence prediction and the discovery of its regulatory mechanisms. The advantages of our method are as follows: (1) MULGONET can map different genomic features to the same gene ontology space for omics data integration and encapsulate the biological hierarchy of Gene Ontology into neural networks for representation learning. (2) It can clearly describe the hierarchical relationships of the model and provide more biological insights. (3) It makes the integration of additional omics data types easier and more effective.
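The idea of mapping gene-level features into a shared GO space can be sketched as a sparse layer whose connections follow gene-to-GO annotations. The example below is only an illustration under stated assumptions: the gene and GO identifiers are placeholders, the annotations and weights are made up, and the choice of a `tanh` activation and of one weight matrix per omics type sharing a single annotation mask is hypothetical rather than a description of the actual MULGONET architecture.

```python
import math


def build_mask(genes, go_terms, annotations):
    """mask[i][j] is 1 when gene i is annotated to GO term j, else 0."""
    return [[1 if term in annotations.get(gene, set()) else 0
             for term in go_terms] for gene in genes]


def masked_layer(x, weights, mask, bias):
    """One sparse layer: each GO node only sees its annotated genes."""
    out = []
    for j in range(len(bias)):
        z = bias[j] + sum(x[i] * weights[i][j] * mask[i][j]
                          for i in range(len(x)))
        out.append(math.tanh(z))
    return out


# Placeholder genes, GO terms and annotations (not real biology).
genes = ["gene_a", "gene_b", "gene_c"]
go_terms = ["GO:term_1", "GO:term_2"]
annotations = {"gene_a": {"GO:term_1"},
               "gene_b": {"GO:term_1", "GO:term_2"},
               "gene_c": {"GO:term_2"}}
mask = build_mask(genes, go_terms, annotations)

# Two omics views over the same genes share the mask but not the weights.
weights = {"mRNA": [[0.5, 0.1], [0.2, -0.4], [0.9, 0.3]],
           "DNA_Meth": [[-0.3, 0.2], [0.6, 0.1], [0.0, 0.7]]}
inputs = {"mRNA": [1.2, -0.5, 0.8], "DNA_Meth": [0.1, 0.4, -0.9]}
bias = [0.0, 0.0]

go_space = {omics: masked_layer(inputs[omics], weights[omics], mask, bias)
            for omics in inputs}
```

Because every omics type ends up as a vector over the same GO terms, a further omics type only needs its own gene-level inputs and weights, and each GO node remains traceable to the genes annotated to it.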
We demonstrate that MULGONET can effectively predict cancer recurrence and achieve genotype-phenotype association discovery by using prior knowledge. MULGONET performs better in the tested prediction tasks than other advanced multi-omics integration methods. Furthermore, integrating biological networks from different domains can provide potential interaction information to improve the prediction performance of the model. Experiments also demonstrate that the fusion of several types of biological networks achieves better performance in prediction tasks than single biological networks. In two case studies of bladder and pancreatic cancer, MULGONET can effectively identify biomarkers associated with cancer recurrence by using the attribution method. In addition, MULGONET examines the impact of entities on cancer recurrence by visualizing the contribution of entities in the GO network, which can provide more biological insights into the mechanisms of disease and clinical treatments. Although we only utilize four types of omics data in this experiment for cancer recurrence prediction, MULGONET can be readily extended to more omics data to meet different omics integration requirements by using the annotation relationships between genes and GO terms. Overall, MULGONET has the potential to advance our understanding of the complex regulatory mechanisms underlying cancer recurrence. Furthermore, it can be easily extended to other biological problems and provide a novel research tool for biological discovery.
MULGONET has some limitations that can be further explored and improved in the future. First, MULGONET is highly dependent on the quality and size of multi-omics data. For example, noise and missing values in data such as mRNA expression, copy number variation and DNA methylation can affect the predictive performance of the model. Second, since some GO terms may have multiple parent terms at the same time, the optimal parent terms need to be identified when constructing a biological knowledge network. In addition, the GO dataset is updated every few months, so the biological network structure needs to be updated accordingly.
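To make the multiple-parent issue concrete, the sketch below shows one possible convention for placing GO terms into network layers: a term is assigned a depth one greater than that of its deepest parent, so it always sits below all of its parents. This is an illustrative heuristic only, not the strategy used by MULGONET. The function name `term_depths` and the toy graph are hypothetical, and the input is assumed to be acyclic.

```python
def term_depths(parents):
    """Longest-path depth of every term in a directed acyclic graph.

    parents maps each term to the list of its parent terms; roots map to [].
    A term with several parents is placed one level below its deepest parent.
    """
    depth = {}

    def visit(term):
        if term not in depth:
            ps = parents.get(term, [])
            depth[term] = 0 if not ps else 1 + max(visit(p) for p in ps)
        return depth[term]

    for term in parents:
        visit(term)
    return depth


# Toy graph with placeholder terms; term_c has two parents at different depths.
toy_go = {
    "root": [],
    "term_a": ["root"],
    "term_b": ["term_a"],
    "term_c": ["root", "term_b"],
}
layers = term_depths(toy_go)
```

Re-running such a layering step on each new GO release would be one way to keep the network structure in sync with the ontology.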
Data availability
The Gene Ontology dataset is obtained from the Gene Ontology Resource (http://geneontology.org/). Omics data of STAD, BLCA and PAAD are obtained from the Xena TCGA Pan-Cancer website (https://xenabrowser.net/datapages/). The clinical data of STAD, BLCA and PAAD are obtained from The Cancer Genome Atlas (TCGA) program (https://portal.gdc.cancer.gov/repository).
Code availability
The source code of this work can be downloaded from GitHub (https://github.com/lanbiolab/MULGONET) and the web server is available at http://101.33.251.35/MULGONET/#/GeneSelection
Declaration of competing interest
The authors declare that they have no conflicts of interest in this work.
Acknowledgments
This paper was supported by the National Natural Science Foundation of China (U24A20256, 62472128); the Natural Science Foundation of Guangxi (2024GXNSFFA010006); the Guangxi Bagui Youth Talent Program; the Project of Guangxi Key Laboratory of Eye Health (GXYJK-202407); the Project of Guangxi Health Commission Eye and Related Diseases Artificial Intelligence Screen Technology Key Laboratory (GXYAI-202402).
Biographies
Wei Lan received the Ph.D. degree in computer science from Central South University, China, in 2016. Currently, he is an associate professor in the School of Computer, Electronic and Information at Guangxi University, Nanning, China. His current research interests include bioinformatics and machine learning.
Zhaolei Zhang received the Ph.D. degree in biophysics from the University of California at Berkeley, USA, in 2000. Currently, he is a professor in the Donnelly Centre for Cellular and Biomolecular Research at the University of Toronto, Canada. His current research interests include non-coding RNA, gene regulation, and evolutionary genomics.
Jianxin Wang received the BEng and MEng degrees in computer engineering from Central South University, China, in 1992 and 1996, respectively, and the PhD degree in computer science from Central South University, China, in 2001. He is the chair and a professor at the School of Computer Science and Engineering, Central South University, China. His current research interests include algorithm analysis and optimization, parameterized algorithms, bioinformatics, and computer networks.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.fmre.2025.01.004.
Contributor Information
Zhaolei Zhang, Email: zhaolei.zhang@utoronto.ca.
Jianxin Wang, Email: jxwang@csu.edu.cn.
References
- 1. Bukhtoyarov O.V., Samarin D.M. Pathogenesis of cancer: Cancer reparative trap. J. Cancer Ther. 2015;6:339–412.
- 2. Sung H., Ferlay J., Siegel R.L., et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660.
- 3. Ethun C.G., Kooby D.A. The importance of surgical margins in pancreatic cancer. J. Surg. Oncol. 2016;113:283–288. doi: 10.1002/jso.24092.
- 4. Larsen J.E., Pavey S.J., Passmore L.H., et al. Gene expression signature predicts recurrence in lung adenocarcinoma. Clin. Cancer Res. 2007;13:2946–2954. doi: 10.1158/1078-0432.CCR-06-2525.
- 5. Lee S.C., Tan H.T., Chung M.C.M. Prognostic biomarkers for prediction of recurrence of hepatocellular carcinoma: Current status and future prospects. World J. Gastroenterol. 2014;20:3112–3124. doi: 10.3748/wjg.v20.i12.3112.
- 6. Esmatabadi M.J., Bakhshinejad B., Motlagh F.M., et al. Therapeutic resistance and cancer recurrence mechanisms: Unfolding the story of tumour coming back. J. Biosci. 2016;41:497–506. doi: 10.1007/s12038-016-9624-y.
- 7. Lan W., Ling H., Chen Q., et al. scMoMtF: An interpretable multitask learning framework for single-cell multi-omics data analysis. PLOS Comput. Biol. 2024;20 doi: 10.1371/journal.pcbi.1012679.
- 8. Xu Z., Liao H., Huang L., et al. IBPGNET: Lung adenocarcinoma recurrence prediction based on neural network interpretability. Brief. Bioinform. 2024;25:bbae080. doi: 10.1093/bib/bbae080.
- 9. Albaradei S., Napolitano F., Thafar M.A., et al. MetaCancer: A deep learning-based pan-cancer metastasis prediction model developed using multi-omics data. Comput. Struct. Biotechnol. J. 2021;19:4404–4411. doi: 10.1016/j.csbj.2021.08.006.
- 10. Xu J., Wu P., Chen Y., et al. A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data. BMC Bioinform. 2019;20:1–11. doi: 10.1186/s12859-019-3116-7.
- 11. Ghaffari S., Hanson C., Schmidt R.E., et al. An integrated multi-omics approach to identify regulatory mechanisms in cancer metastatic processes. Genome Biol. 2021;22:1–28. doi: 10.1186/s13059-020-02213-x.
- 12. Bamji-Stocke S., van Berkel V., Miller D.M., et al. A review of metabolism-associated biomarkers in lung cancer diagnosis and treatment. Metabolomics. 2018;14:1–16. doi: 10.1007/s11306-018-1376-2.
- 13. Lan W., Yang T., Chen Q., et al. Multiview subspace clustering via low-rank symmetric affinity graph. IEEE Trans. Neural Networks Learn. Syst. 2024;35:11382–11395. doi: 10.1109/TNNLS.2023.3260258.
- 14. Lan W., Zhou G., Chen Q., et al. Contrastive clustering learning for multi-behavior recommendation. ACM Trans. Inform. Syst. 2024;43:1–23.
- 15. Wei Z., Han D., Zhang C., et al. Deep learning-based multi-omics integration robustly predicts relapse in prostate cancer. Front. Oncol. 2022;12 doi: 10.3389/fonc.2022.893424.
- 16. Mars R.A.T., Yang Y., Ward T., et al. Longitudinal multi-omics reveals subset-specific mechanisms underlying irritable bowel syndrome. Cell. 2020;182:1460–1473. doi: 10.1016/j.cell.2020.08.007.
- 17. Lan W., Liao H., Chen Q., et al. DeepKEGG: A multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief. Bioinform. 2024;25(3):bbae185. doi: 10.1093/bib/bbae185.
- 18. Zhao L., Dong Q., Luo C., et al. DeepOmix: A scalable and interpretable multi-omics deep learning framework and application in cancer survival analysis. Comput. Struct. Biotechnol. J. 2021;19:2719–2725. doi: 10.1016/j.csbj.2021.04.067.
- 19. Bourgeais V., Zehraoui F., Hanczar B. GraphGONet: A self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression. Bioinformatics. 2022;38:2504–2511. doi: 10.1093/bioinformatics/btac147.
- 20. Moon S., Lee H. MOMA: A multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics. 2022;38:2287–2296. doi: 10.1093/bioinformatics/btac080.
- 21. Oh J.H., Choi W., Ko E., et al. PathCNN: Interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics. 2021;37:i443–i450. doi: 10.1093/bioinformatics/btab285.
- 22. Selvaraju R.R., Cogswell M., Das A., et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vision. 2020;128:336–359.
- 23. Wang T., Shao W., Huang Z., et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 2021;12:1–13. doi: 10.1038/s41467-021-23774-w.
- 24. Elmarakeby H.A., Hwang J., Arafeh R., et al. Biologically informed deep neural network for prostate cancer discovery. Nature. 2021;598:348–352. doi: 10.1038/s41586-021-03922-4.
- 25. Harris M.A., Clark J., Ireland A., et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
- 26. Goldman M.J., Craft B., Hastie M., et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 2020;38:675–678. doi: 10.1038/s41587-020-0546-8.
- 27. Weinstein J.N., Collisson E.A., Mills G.B., et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.
- 28. Sundararajan M., Taly A., Yan Q. Axiomatic attribution for deep networks. 34th Int. Confer. Mach. Learning. 2017;70:3319–3328.
- 29. Liu Q., Song K. ProgCAE: A deep learning-based method that integrates multi-omics data to predict cancer subtypes. Brief. Bioinform. 2023;24(4):bbad196. doi: 10.1093/bib/bbad196.
- 30. Yao X., Jiang X., Luo H., et al. MOCAT: Multi-omics integration with auxiliary classifiers enhanced autoencoder. BioData Min. 2024;17(1):9. doi: 10.1186/s13040-024-00360-6.
- 31. Liang L., Fang J.Y., Xu J. Gastric cancer and gene copy number variation: Emerging cancer drivers for targeted therapy. Oncogene. 2016;35:1475–1482. doi: 10.1038/onc.2015.209.
- 32. Steele C.D., Abbasi A., Islam S.M.A., et al. Signatures of copy number alterations in human cancer. Nature. 2022;606:984–991. doi: 10.1038/s41586-022-04738-6.
- 33. Lan W., Lai D., Chen Q., et al. LDICDL: lncRNA-disease association identification based on collaborative deep learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020;19(3):1715–1723. doi: 10.1109/TCBB.2020.3034910.
- 34. Lan W., Wu X., Chen Q., et al. GANLDA: Graph attention network for lncRNA-disease associations prediction. Neurocomputing. 2022;469:384–393.
- 35. Lan W., Dong Y., Chen Q., et al. KGANCDA: Predicting circRNA-disease associations based on knowledge graph attention network. Brief. Bioinform. 2022;23(1):bbab494. doi: 10.1093/bib/bbab494.
- 36. Moran J.D., Kim H.H., Li Z., et al. SOX4 regulates invasion of bladder cancer cells via repression of WNT5a. Int. J. Oncol. 2019;55:359–370. doi: 10.3892/ijo.2019.4832.
- 37. Cao J., Wang Q., Wu G., et al. miR-129-5p inhibits gemcitabine resistance and promotes cell apoptosis of bladder cancer cells by targeting Wnt5a. Int. Urol. Nephrol. 2018;50:1811–1819. doi: 10.1007/s11255-018-1959-x.
- 38. Rogler A., Kendziorra E., Giedl J., et al. Functional analyses and prognostic significance of SFRP1 expression in bladder cancer. J. Cancer Res. Clin. Oncol. 2015;141:1779–1790. doi: 10.1007/s00432-015-1942-1.
- 39. Wang G., Zheng L., Yu Z., et al. Increased cyclin-dependent kinase 6 expression in bladder cancer. Oncol. Lett. 2012;4:43–46. doi: 10.3892/ol.2012.695.
- 40. Gohji K., Okamoto M., Kitazawa S., et al. Heparanase protein and gene expression in bladder cancer. J. Urol. 2001;166:1286–1290.
- 41. Mei Y., Wu Y., Ma L., et al. Overexpression of ROCK1 promotes cancer cell proliferation and is associated with poor prognosis in human urothelial bladder cancer. Mamm. Genome. 2021;32:466–475. doi: 10.1007/s00335-021-09896-y.
- 42. Xu X., Li S., Lin Y., et al. MicroRNA-124-3p inhibits cell migration and invasion in bladder cancer cells by targeting ROCK1. J. Transl. Med. 2013;11:1–13. doi: 10.1186/1479-5876-11-276.
- 43. Majid S., Dar A.A., Saini S., et al. MicroRNA-1280 inhibits invasion and metastasis by targeting ROCK1 in bladder cancer. PLoS One. 2012;7:e46743. doi: 10.1371/journal.pone.0046743.
- 44. Wan Y.P., Xi M., He H.C., et al. Expression and clinical significance of SOX9 in renal cell carcinoma, bladder cancer and penile cancer. Oncol. Res. Treat. 2017;40:15–20. doi: 10.1159/000455145.
- 45. Kaempfer R., Gerez L., Farbstein H., et al. Prediction of response to treatment in superficial bladder carcinoma through pattern of interleukin-2 gene expression. J. Clin. Oncol. 1996;14:1778–1786. doi: 10.1200/JCO.1996.14.6.1778.
- 46. Worst T.S., Weis C.A., Stöhr R., et al. CDKN2A as transcriptomic marker for muscle-invasive bladder cancer risk stratification and therapy decision-making. Sci. Rep. 2018;8:14383. doi: 10.1038/s41598-018-32569-x.
- 47. Chen C.H., Changou C.A., Hsieh T.H., et al. Dual inhibition of PIK3C3 and FGFR as a new therapeutic approach to treat bladder cancer. Clin. Cancer Res. 2018;24:1176–1189. doi: 10.1158/1078-0432.CCR-17-2066.
- 48. Reyland M.E. Protein kinase C isoforms: Multi-functional regulators of cell life and death. Front. Biosci. 2009;14:2386. doi: 10.2741/3385.
- 49. Zhou C., Li A.H., Liu S., et al. Identification of an 11-autophagy-related-gene signature as promising prognostic biomarker for bladder cancer patients. Biology (Basel) 2021;10:375. doi: 10.3390/biology10050375.
- 50. Mitrakas L., Gravas S., Karasavvidou F., et al. Endothelin-1 overexpression: A potential biomarker of unfavorable prognosis in non-metastatic muscle-invasive bladder cancer. Tumour Biol. 2015;36:4699–4705. doi: 10.1007/s13277-015-3118-7.
- 51. Said N., Smith S., Sanchez-Carbayo M. Tumor endothelin-1 enhances metastatic colonization of the lung in mouse xenograft models of bladder cancer. J. Clin. Invest. 2011;121:132–147. doi: 10.1172/JCI42912.
- 52. Kao C.C., Chang Y.L., Liu H.Y., et al. DNA Hypomethylation is associated with the overexpression of INHBA in upper tract urothelial carcinoma. Int. J. Mol. Sci. 2022;23:2072. doi: 10.3390/ijms23042072.
- 53. Zhang N., Ye L., Wu L., et al. Expression of bone morphogenetic protein-10 (BMP10) in human urothelial cancer of the bladder and its effects on the aggressiveness of bladder cancer cells in vitro. Anticancer Res. 2013;33:1917–1925.
- 54. Szarvas T., Becker M., vom Dorp F., et al. Matrix metalloproteinase-7 as a marker of metastasis and predictor of poor survival in bladder cancer. Cancer Sci. 2010;101:1300–1308. doi: 10.1111/j.1349-7006.2010.01506.x.
- 55. Szarvas T., Singer B.B., Becker M., et al. Urinary matrix metalloproteinase-7 level is associated with the presence of metastasis in bladder cancer. BJU Int. 2011;107:1069–1073. doi: 10.1111/j.1464-410X.2010.09625.x.
- 56. Chu H., Hui G., Yuan L., et al. Identification of novel piRNAs in bladder cancer. Cancer Lett. 2015;356:561–567. doi: 10.1016/j.canlet.2014.10.004.
- 57. Zhao Y.X., Xu B.W., Wang F.Q., et al. nc-RNA-mediated high expression of CDK6 correlates with poor prognosis and immune infiltration in pancreatic cancer. Cancer Med. 2023;12:5110–5123. doi: 10.1002/cam4.5260.
- 58. Dart D.A., Arisan D.E., Owen S., et al. Wnt-11 expression promotes invasiveness and correlates with survival in human pancreatic ductal adeno carcinoma. Genes (Basel) 2019;10:921. doi: 10.3390/genes10110921.
- 59. Igbinigie E., Guo F., Jiang S.W., et al. Dkk1 involvement and its potential as a biomarker in pancreatic ductal adenocarcinoma. Clin. Chim. Acta. 2019;488:226–234. doi: 10.1016/j.cca.2018.11.023.
- 60. Du W., Phinney N.Z., Huang H., et al. AXL is a key factor for cell plasticity and promotes metastasis in pancreatic cancer. Mol. Cancer Res. 2021;19:1412–1421. doi: 10.1158/1541-7786.MCR-20-0860.
- 61. Whatcott C.J., Ng S., Barrett M.T., et al. Inhibition of ROCK1 kinase modulates both tumor cells and stromal fibroblasts in pancreatic cancer. PLoS One. 2017;12 doi: 10.1371/journal.pone.0183871.
- 62. Hessmann E., Schneider G., Ellenrieder V., et al. MYC in pancreatic cancer: Novel mechanistic insights and their translation into therapeutic strategies. Oncogene. 2016;35:1609–1618. doi: 10.1038/onc.2015.216.
- 63. Hamada S., Satoh K., Hirota M., et al. Bone morphogenetic protein 4 induces epithelial-mesenchymal transition through MSX2 induction on pancreatic cancer cell line. J. Cell. Physiol. 2007;213:768–774. doi: 10.1002/jcp.21148.
- 64. Shankar S., Tien J.C., Siebenaler R.F., et al. An essential role for Argonaute 2 in EGFR-KRAS signaling in pancreatic cancer development. Nat. Commun. 2020;11:2817. doi: 10.1038/s41467-020-16309-2.
- 65. Oliveira-Cunha M., Newman W.G., Siriwardena A.K. Epidermal growth factor receptor in pancreatic cancer. Cancers (Basel) 2011;3:1513–1526. doi: 10.3390/cancers3021513.
- 66. Huang X., Pan L., Zuo Z., et al. LINC00842 inactivates transcription co-regulator PGC-1α to promote pancreatic cancer malignancy through metabolic remodelling. Nat. Commun. 2021;12:3830. doi: 10.1038/s41467-021-23904-4.
- 67. Zammarchi F., Morelli M., Menicagli M., et al. KLF4 is a novel candidate tumor suppressor gene in pancreatic ductal carcinoma. Am. J. Pathol. 2011;178:361–372. doi: 10.1016/j.ajpath.2010.11.021.
- 68. Tang R.F., Wang S.X., Peng L., et al. Expression of vascular endothelial growth factors A and C in human pancreatic cancer. World J. Gastroenterol. 2006;12:280–286. doi: 10.3748/wjg.v12.i2.280.
- 69. Xu Y., Li Z., Jiang P., et al. The co-expression of MMP-9 and Tenascin-C is significantly associated with the progression and prognosis of pancreatic cancer. Diagn. Pathol. 2015;10:1–8. doi: 10.1186/s13000-015-0445-3.
- 70. Oji Y., Nakamori S., Fujikawa M., et al. Overexpression of the Wilms' tumor gene WT1 in pancreatic ductal adenocarcinoma. Cancer Sci. 2004;95:583–587. doi: 10.1111/j.1349-7006.2004.tb02490.x.
- 71. Hilbig A., Oettle H. Transforming growth factor beta in pancreatic cancer. Curr. Pharm. Biotechnol. 2011;12:2158–2164. doi: 10.2174/138920111798808356.
- 72. Archer S.L. Mitochondrial dynamics–mitochondrial fission and fusion in human diseases. N. Engl. J. Med. 2013;369:2236–2251. doi: 10.1056/NEJMra1215233.
- 73. Cormio A., Sanguedolce F., Musicco C., et al. Mitochondrial dysfunctions in bladder cancer: Exploring their role as disease markers and potential therapeutic targets. Crit. Rev. Oncol. Hematol. 2017;117:67–72. doi: 10.1016/j.critrevonc.2017.07.001.
- 74. Huang L., Luan T., Chen Y., et al. LASS2 regulates invasion and chemoresistance via ERK/Drp1 modulated mitochondrial dynamics in bladder cancer cells. J. Cancer. 2018;9:1017–1024. doi: 10.7150/jca.23087.
- 75. Rosenbaum D.M., Rasmussen S.G.F., Kobilka B.K. The structure and function of G-protein-coupled receptors. Nature. 2009;459:356–363. doi: 10.1038/nature08144.
- 76. Dorsam R.T., Gutkind J.S. G-protein-coupled receptors and cancer. Nat. Rev. Cancer. 2007;7:79–94. doi: 10.1038/nrc2069.
- 77. Rampias T., Vgenopoulou P., Avgeris M., et al. A new tumor suppressor role for the Notch pathway in bladder cancer. Nat. Med. 2014;20:1199–1205. doi: 10.1038/nm.3678.
- 78. Maraver A., Fernandez-Marcos P.J., Cash T.P., et al. NOTCH pathway inactivation promotes bladder cancer progression. J. Clin. Invest. 2015;125:824–830. doi: 10.1172/JCI78185.
- 79. van der Geer P., Hunter T., Lindberg R.A. Receptor protein-tyrosine kinases and their signal transduction pathways. Annu. Rev. Cell Biol. 1994;10:251–337. doi: 10.1146/annurev.cb.10.110194.001343.
- 80.Zangouei A.S., Barjasteh A.H., Rahimi H.R., et al. Role of tyrosine kinases in bladder cancer progression: An overview. Cell Commun. Signal. 2020;18:1–14. doi: 10.1186/s12964-020-00625-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Ikebe M., Kitaura Y., Nakamura M., et al. Lipopolysaccharide (LPS) increases the invasive ability of pancreatic cancer cells through the TLR4/MyD88 signaling pathway. J. Surg. Oncol. 2009;100:725–731. doi: 10.1002/jso.21392. [DOI] [PubMed] [Google Scholar]
- 82.Dorsam R.T., Gutkind J.S. G-protein-coupled receptors and cancer. Nat. Rev. Cancer. 2007;7:79–94. doi: 10.1038/nrc2069. [DOI] [PubMed] [Google Scholar]
- 83.Schuller H.M. Regulatory role of G protein-coupled receptors in pancreatic cancer development and progression. Curr. Med. Chem. 2018;25:2566–2575. doi: 10.2174/0929867324666170303121708. [DOI] [PubMed] [Google Scholar]
- 84.Wang Q., Xu D., Han C., et al. Overexpression of serine/threonine-protein kinase-1 in pancreatic cancer tissue: Serine/threonine-protein kinase-1 knockdown increases the chemosensitivity of pancreatic cancer cells. Mol. Med. Rep. 2015;12:475–481. doi: 10.3892/mmr.2015.3434. [DOI] [PubMed] [Google Scholar]
Data Availability Statement
The Gene Ontology dataset was obtained from the Gene Ontology Resource (http://geneontology.org/). Omics data for STAD, BLCA and PAAD were obtained from the Xena TCGA Pan-Cancer website (https://xenabrowser.net/datapages/). Clinical data for STAD, BLCA and PAAD were obtained from The Cancer Genome Atlas (TCGA) program (https://portal.gdc.cancer.gov/repository).








