An Unsupervised Deep Learning-Based Model Using Multiomics Data to Predict Prognosis of Patients with Stomach Adenocarcinoma

Sizhen Chen; Yiteng Zang; Biyun Xu; Beier Lu; Rongji Ma; Pengcheng Miao; Bingwei Chen

doi:10.1155/2022/5844846

. 2022 Oct 27;2022:5844846. doi: 10.1155/2022/5844846

An Unsupervised Deep Learning-Based Model Using Multiomics Data to Predict Prognosis of Patients with Stomach Adenocarcinoma

Sizhen Chen ¹, Yiteng Zang ¹, Biyun Xu ², Beier Lu ¹, Rongji Ma ¹, Pengcheng Miao ¹, Bingwei Chen ^1,^✉

PMCID: PMC9633210 PMID: 36339684

Abstract

Methods

Patients (363 in total) with stomach adenocarcinoma from The Cancer Genome Atlas (TCGA) cohort were included. An autoencoder was constructed to integrate the RNA sequencing, miRNA sequencing, and methylation data. The features of the bottleneck layer were used to perform the k-means clustering algorithm to obtain different subgroups for evaluating the prognosis-related risk of stomach adenocarcinoma. The model's robustness was verified using a 10-fold cross-validation (CV). Survival was analyzed by the Kaplan-Meier method. Univariate and multivariate Cox regression was used to estimate hazard risk. The model was validated in three independent cohorts with different endpoints.

Results

The patients were divided into low-risk and high-risk groups according to the k-means clustering algorithm. The high-risk group had a significantly higher risk of poor survival (log-rank P value = 2.80e − 06; adjusted hazard ratio = 2.386, 95% confidence interval: 1.607~3.543), a concordance index (C-index) of 0.714, and a Brier score of 0.184. The model performed well both in the 10-fold CV procedure and three independent cohorts from the Gene Expression Omnibus (GEO) repository.

Conclusions

A robust and generalizable model based on the autoencoder was proposed to integrate multiomics data and predict the prognosis of patients with stomach adenocarcinoma. The model demonstrates better performance than two alternative approaches on prognosis prediction. The results might provide the grounds for further exploring the potential biomarkers to predict the prognosis of patients with stomach adenocarcinoma.

1. Introduction

Stomach cancer is responsible for approximately 769,000 deaths in 2020 and is the fifth most common cancer and the fourth cause of global cancer-related mortality [1]. Stomach adenocarcinoma is the most prevalent histological subtype and has increasing incidence and mortality rates in recent years [2]. Compared to early diagnosis, the 5-year overall survival (OS) rate of stomach adenocarcinoma drops to less than 30% because of recurrence and metastasis at an advanced stage [3, 4]. Identification of the patients with high risk of stomach adenocarcinoma could help guide the future development of targeted treatment strategies and improve prognosis. Therefore, it is essential to develop a model for risk identification.

As the result of technological advancement and decreasing costs, high-throughput sequencing technology generated a large amount of multiomics data, which contains a wide spectrum of omics such as genome, transcriptome, and epigenome, and provides an opportunity to identify the risk of stomach adenocarcinoma at different molecular levels. Since single-omics data could only provide limited information, multiomics data could provide a holistic view of the biological system through identifying important biomarkers and providing biological information at multiple levels [5–7].Therefore, integration of multiomics data would provide better risk identification for patients with stomach adenocarcinoma.

Deep learning, also called deep neural network (DNN), is a new category of machine learning methods and widely used in many fields [8–11]. Deep learning consists of multiple layers containing multiple artificial neurons which have a weight and a shift value updated during backpropagation to minimize global loss function [12, 13]. An autoencoder is an unsupervised deep learning framework which is aimed at reconstructing its original input through a series of nonlinear transformations. The hidden layer could represent the information of the input layer. Compared to other methods for dimension reduction, the autoencoder is considered to fit a complex nonlinear relationship well. In recent years, many researchers paid attention to using autoencoder to integrate the information of multiomics and identifying the subtype of patients with cancer [14–16]. Some of the studies stacked the matrices of all multiomics data and then constructed one autoencoder to extract the latent information and to perform a clustering algorithm. Other studies constructed autoencoders for each type of the multiomics data and then stacked the nodes from each autoencoder for clustering.

In this study, an autoencoder was constructed to integrate multiomics data of patients from The Cancer Genome Atlas (TCGA) cohort with stomach adenocarcinoma to identify the risk and predict the prognosis, including mRNA expression, miRNA expression, and CpG methylation. The bottleneck layer was thought to represent the information of multiomics data, and then, the univariate Cox proportional hazard (Cox-PH) model was used to select the nodes in the bottleneck layer related to survival. The k-means clustering algorithm was performed to obtain the subgroup of patients with stomach adenocarcinoma. A Kaplan-Meier curve was drawn for survival analysis, and the log-rank method was used for statistical testing. The obtained subgroup was compared with two alternative approaches: principal component analysis (PCA) and similarity network fusion (SNF). A 10-fold cross-validation- (CV-) like was performed to assess the robustness, and three independent cohorts from the Gene Expression Omnibus (GEO) repository were used to validate the performance of the prognosis model.

2. Literature Review

Since more information could be provided by multiomics data compared to single-omics data, many algorithms were proposed to integrate different types of omics data, such as similarity network fusion (SNF) [17] and iCluster [18]. This section highlights the studies identifying subgroups of patients with cancer using a deep learning-based model to predict the prognosis and identify potential prognostic biomarkers. Table 1 presents the review on applications of an unsupervised deep learning model in subgroup identification and prognosis prediction for patients with cancer.

Table 1.

Deep learning applications in subgroup identification and prognosis prediction for patients with cancer.

Ref.	Datasets	Omics data	Deep learning model	Validation cohort
[14]	TCGA-HCC	mRNA miRNA DNA methylation	Autoencoder	LIRI-JP cohort NCI cohort miRNA GSE31384 cohort E-TABM-36 cohort Hawaiian cohort

[15]	TCGA-ESCC	mRNA DNA methylation	Autoencoder with early-fusion strategy and joint multimodal representation strategy	E-GEOD-53624 E-GEOD-53624

[16]	TCGA-HNSCC	mRNA miRNA DNA methylation	Autoencoder	E-GEOD-26549 E-GEOD-27020 E-GEOD-65858

[19]	TCGA-PRAD	mRNA miRNA DNA methylation	Autoencoder	—

[20]	TCGA-BLCA	mRNA miRNA DNA methylation	Autoencoder	GSE84525

[21]	TCGA-COAD	mRNA miRNA DNA methylation	Autoencoder	E-GEOD-17538 E-GEOD-39582 E-GEOD-28722

[22]	TCGA-BRCA	mRNA CNV	Autoencoder	—

[23]	TCGA-LUAD	mRNA miRNA DNA methylation CNV	Autoencoder	GSE81089 GSE63805 GSE63384 TCGA-LUAD CNV (lack of mRNA, miRNA, or methylation information)

[24]	TCGA-OV	mRNA/RNA-seq, CNV, DNA methylation	VAE, MMD-VAE	—

[25]	TCGA-PAAD	mRNA miRNA DNA methylation	Autoencoder	—

[26]	TCGA-STAD	mRNA DNA methylation	BiDNN	E-GEOD-62254 E-GEOD-26253

Open in a new tab

CNV: copy number variation; VAE: variational autoencoder; MMD-VAE: maximum mean discrepancy variational autoencoder; BiDNN: bidirectional deep neural network.

To compare the performance of different deep learning autoencoders for cancer subtype identification using multiomics data, Franco et al. [27] performed four autoencoders (vanilla, denoising, sparse, and variational) on four types of cancer (glioblastoma multiforme, GBM; colon adenocarcinoma, COAD; kidney renal clear cell carcinoma, KRCC; breast invasive carcinoma, BIC) from TCGA datasets. The study concluded that generally vanilla and variational autoencoders showed the best performance to identify different subgroups, in spite of the varied performance of different autoencoders on different datasets.

Besides, some deep learning frameworks were proposed to integrate multiomics data of patients with cancer to identify the subgroup and predict the prognosis. A framework to integrate multiomics data by a denoising autoencoder for accurate cancer prognosis prediction (DCAP) was proposed by Chai et al. [28]. The framework was applied in 15 TCGA datasets and presented superior accuracy and robustness.

Poirion et al. [29] proposed an ensemble framework of deep learning and machine learning approaches, named DeepProg, to robustly predict the survival subgroups of patients. DeepProg presents superior predictive accuracy and robustness on 32 cancers from TCGA datasets.

Yang et al. [30] proposed Subtype-GAN, which uses a generative adversarial network (GAN) to extract latent variables and uses consensus clustering and the Gaussian Mixture model to identify tumor samples' subgroups. The result proved the well performance in ten TCGA datasets.

Some of the frameworks involve other types of information beside multiomics data to obtain the subgroup and predict the prognosis. Liu et al. [31] proposed deep learning fusion clustering (DLFS) to integrate multiomics data on TCGA-BCRA dataset for breast cancer subgroup detection. DLFS involves both clustering loss and classification loss into the total loss of the whole framework. Classification loss is calculated based on prior biological knowledge.

Zhao et al. [32] proposed a scalable and interpretable multiomics deep learning framework named DeepOmix to identify the relationships between multiomics data and the survival data and to incorporate some prior biological information. The output layer of DeepOmix consists of survival time and status, and patients would be divided into high- and low-risk subgroups based on the output layer. More and more types of data like single-cell RNA sequencing data or spatial transcriptomics data would be available for the DeepOmix framework in the future.

However, only few studies constructed a deep learning-based model for detecting the subgroups of patients with stomach adenocarcinoma to predict the prognosis and identify the prognostic biomarkers.

3. Materials and Methods

3.1. Data Collection and Preprocessing

In this study, TCGA-Assembler package [33] in R was used to derive the multiomics data of 363 patients with stomach adenocarcinoma from the TCGA cohort, including RNA sequencing (RNA-Seq) data (UNC IlluminaHiSeq_RNASeqV2; Level 3), microRNA sequencing (miRNA-Seq) data (BCGSC IlluminaHiSeq_miRNASeq; Level 3), and DNA methylation data (JHU-USC HumanMethylation450; Level 3). For mRNA and miRNA data, features which have zero value in more than 20% samples were excluded. And then, samples which have zero value in more than 20% features were excluded. For DNA methylation data, average methylation β-value of all CpG within the gene promoter region, which is defined as 1500 base pairs (bp) upstream of transcription start sites (TSS) of genes, was calculated as the promoter methylation values. Promoter features which have missing value in more than 20% samples were excluded, and samples which have missing value in more than 20% promoter features were excluded. And then, the missing values were imputed by the impute package (https://bioconductor.org/packages/release/bioc/html/impute.html) in R. All the features obtained above were unit scaled by samples as follows:

\begin{matrix} v_{n o r m e d} = v \frac{1}{{‖v‖}_{2}^{2}}, \end{matrix}

(1)

where v is a vector of samples and ‖v‖₂² is the l₂ norm of v.

To validate the performance, three independent cohorts obtained from the GEO dataset were used. The first cohort was obtained from GSE15459 [34–39], which derived the RNA-Seq data of 192 samples using the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array. The patients of GSE15459 were recruited from the National Cancer Center and hospitals of the National Healthcare Group. The second cohort was obtained from GSE26253 [40, 41] Illumina HumanRef-8 WG-DASL v3.0 dataset with 432 samples. Samples of the GSE26253 cohort were from Samsung Medical Center and passed the RNA quality control. The third cohort was obtained from the GSE84437 [42] Illumina HumanHT-12 V3.0 expression beadchip microarray dataset with 433 samples. Total RNA of samples from GSE84437 was extracted from the fresh-frozen gastrectomy specimens at the Yonsei University Severance Hospital (South Korea) between 2000 and 2010. According to the annotation file from each platform, average expression of all probes corresponding to one gene symbol was calculated as the mRNA expression value. Unit scaling was also performed on each cohort. OS, disease-free survival (DFS), and the response of fourfold increase from the baseline of the heterotopic tumor size were used as the endpoint of the three cohorts.

3.2. Autoencoder Construction

Autoencoder is an unsupervised feedforward neural network. According to the previous study [14], more hidden layers contribute little to the final performance but lead to a large amount of calculation. Thus, an autoencoder with three hidden layers was implemented using the Python library Keras. The number of nodes of hidden layers was set to 2000, 500, and 2000 (Supplementary Table S1). Though the model with 5000, 500, and 5000 nodes in each hidden layer provided the lowest log-rank P value and the highest concordance index (C-index), the model was not chosen because of its bad performance in the 10-fold CV procedure (Supplementary Table S2). The layer with 500 nodes named the bottleneck layer was thought to contain the representation features. Autoencoder reconstructs its input layer x by a series of nonlinear transformations. In this study, tanh was used as the activation function of each layer. A loss function of the mean square error was used to measure the error between input layer x and output layer x′: L(x, x′) = (x − x′)². However, to avoid overfitting, a batch normalization layer was appended after the hidden layer. Moreover, an adaptive momentum (Adam) algorithm was used with 32 epochs.

3.3. Selection of Representation Features and k-Means Clustering

For each representation feature obtained from the bottleneck layer, a univariate Cox proportional hazards (Cox-PH) model was built using survival package in R, and the features with P value greater than 0.10 were screened out. The k-means clustering algorithm was applied to the remaining features. Silhouette index [43] and Calinski-Harabasz criterion [44] were calculated to determine the optimal number of clusters, and the label of each patient was obtained. On account of the labels being related to the prognosis risk of stomach adenocarcinoma, we used “subgroup” instead of label or cluster. Kaplan-Meier survival curves of different subgroups were drawn, and log-rank P value, C-index, and Brier score were calculated.

3.4. Supervised Classification

To verify the robustness of the subgroup obtained from the k-means algorithm, internal and an external validations were performed.

In order to ensure sufficient sample size of the test set, a 10-fold CV-like procedure [14, 15, 21] was used to partition the TCGA dataset into a training set and a test set in the internal validation. Through the 10-fold CV-like procedure, 363 samples were randomly split into 5 folds, and then, ten different combinations of 3 folds (training sets) and 2 folds (test sets) were obtained. For each of these ten combinations, a different autoencoder was built using the data from the training set, and k-means clustering was performed to obtain the subgroup labels. Then, the top 50 mRNA, 30 miRNA, and 50 methylation features were selected based on the P value of analysis of variance (ANOVA) to build a support vector machine (SVM) model. Three single-omics data and the multiomics data were used to build a SVM model, respectively. Subgroup labels of samples from the test sets were predicted by the SVM classifier. Then, log-rank P value, C-index, and Brier score were calculated to evaluate the robustness of the prognosis model.

In the external validation, the mRNA features within both the TCGA cohort and the independent cohort were firstly selected. Then, like in the internal validation, a SVM classifier was built using the top 50 mRNA features, which were selected by the P value of ANOVA, of the whole TCGA cohort. The subgroup labels of three independent cohorts were predicted, and the log-rank P value, C-index, and Brier score were calculated.

However, two scaling steps were applied before constructing a SVM classifier whether in training sets, test sets, or the independent cohort. For the mRNA and methylation features, median scale normalization and robust scale normalization were performed. And for miRNA features, median scale normalization and unit scale normalization were performed. Unit scale normalization was shown in the aforementioned procedure, and the median and robust scale normalization were performed as follows:

\begin{matrix} median scale normalization : x_{scaled} = \frac{(x - median (x))}{mad (x)}, \\ robust scale normalization : x_{whitened} = \{\frac{x_{i} - {mean}_{25 - 75} (x_{i})}{{s d}_{25 - 75} (x_{i})}, x_{i} \in x\}, \end{matrix}

(2)

where x = (x₁, ⋯, x_n) is a vector of feature and mad(x) = median({|x_i − median(x)|, x_i ∈ x}).

All SVM classifiers were constructed using the e1071 package (https://CRAN.R-project.org/package=e1071) in R, and 5-fold CV was used to perform a grid search of the best hyperparameters. C-index and Brier score were calculated using the survcomp package [45] in R.

3.5. Alternative Approaches

Two alternative approaches were used to compare with the autoencoder. In the first approach, principal component analysis (PCA) was performed and the number of principal components was set to the same as the bottleneck layer. Then, a univariate Cox-PH model was built for each principal component, and the k-means clustering was performed to obtain the subgroups which indicated to the prognosis-related risk. In the second approach, similarity network fusion (SNF) [17] was used to construct sample similarity matrices for each of the omics data types using pairwise correlation. Through the fused network of samples, the optimal numbers and subgroup labels were obtained according to the spectral clustering algorithm. The SNF was implemented using the SNFtool package in R.

3.6. Bioinformatics Analysis

Differential expression analysis was performed between different subgroups of the TCGA cohort. For mRNA and miRNA data, the DESeq2 package in R was used to identify the differentially expressed genes (DEGs) and miRNA expression. The features of the mRNA or miRNA which satisfy the criteria of |log2FC| > 1 and false discovery rate (FDR) < 0.05 were considered to be significant. For methylation data, beta value was transformed into M value using the lumi package [46] in R, and then, significant differentially methylated genes (DMGs) were identified in the criteria of |M value difference| > 1 and FDR < 0.05 using moderate t-test with the limma package [47] in R.

Kyoto Encyclopedia of Genes and Genomes (KEGG) enriched analysis was performed for the significant DEGs, which were obtained from the differential expression analysis, with the threshold of FDR < 0.05, using the clusterProfiler package [48, 49] in R.

4. Results

4.1. Two Significant Subgroups Were Identified according to the TCGA Cohort

The workflow of overall study is shown in Figure 1(a). From the TCGA-STAD project, the total number of patients with stomach adenocarcinoma included in this study was 363. The clinical information of the TCGA cohort is shown in Table 2. After preprocessing, 16,699 genes from RNA-Seq, 390 miRNAs from miRNA-Seq, and 18,992 genes from DNA methylation data were obtained as input features. Three omics data were stacked as the features of the input layer. The architecture of the autoencoder is shown in Figure 1(b).

Architecture of autoencoder and overall workflow. (a) The workflow of overall study. (b) The architecture of autoencoder to integrate multiomics data.

Table 2.

Clinical information of TCGA and three independent cohorts.

Item	TCGA (n = 363)	GSE15459 (n = 192)	GSE26253 (n = 432)	GSE 84437 (n = 433)
Age (years; mean ± sd)	64.98 ± 10.55	64.37 ± 13.24		60.06 ± 11.58
Gender (female/male)	125/238	67/125		137/296
Race (Asian/Black or African American/White/not reported)	84/12/236/31
Time (days; median)	463	574	1735	2087
Status (0/1)^a	219/144	97/95	255/177	224/209
Stage (I/II/III/IV/-)	46/118/161/28/10	31/29/72/60/0	68/167/130/67/0
Pathologic T stage (T1/T2/T3/T4)	18/72/168/105			11/38/92/292
Pathologic N stage (N0/N1/N2/N3/NX/-)	112/95/72/76/6/2			80/188/132/33/0/0
Pathologic M stage (M0/M1/MX)	327/19/17

Open in a new tab

^aStatus 1 means reaching the endpoint. For TCGA and GSE15459, the endpoint is OS. For GSE26253, the endpoint is DFS. For GSE84437, the endpoint is the response of the fourfold increase from the baseline of the heterotopic tumor size.

The representation features obtained from the bottleneck layer were selected using the univariate COX-PH model, and the cut-off of P value was set to 0.10. Then, 104 features were thought to be related to survival and used to implement the k-means clustering. According to the silhouette index and the Calinski-Harabasz criterion, K = 2 could be considered as the optimal number of clusters (Figure 2). Thus, two subgroups were obtained (Supplementary Table S3).

Selection of the best cluster K according to silhouette index and Calinski-Harabasz score: (a) silhouette index; (b) Calinski-Harabasz score.

Furthermore, prognostic difference between these two subgroups was assessed using the Cox-PH model. The log-rank P value, C-index, and Brier score were 2.8e − 06, 0.714, and 0.184, respectively (Figure 3(a)). However, 104 features were obtained after a dimension reduction method, PCA, and k-means clustering were performed. The log-rank P value, C-index, and Brier score were 2.05e − 03, 0.649, and 0.206, respectively (Figure 3(b)). And a SNF was constructed with number of neighbors K, hyperparameter sigma, and number of iteration T setting to 14, 0.3, and 27, respectively. The log-rank P value, C-index, and Brier score were 8.91e − 04, 0.653, and 0.200, respectively (Figure 3(c)). Thus, the subgroup obtained from the autoencoder performed better than those from two other alternative approaches.

Kaplan–Meier curves for OS time between two different subgroups of TCGA cohort obtained from different approaches: (a) autoencoder; (b) PCA; (c) SNF.

4.2. Cox-PH Model including Subgroups and Other Clinical Features

As the univariate Cox-PH models (Table 3) showed, subgroup (log-rank P value 2.81e − 06), age (log-rank P value 1.08e − 02), pathologic tumor (T) stage (log-rank P value 3.23e − 02), pathologic node (N) stage (log-rank P value 5.95e − 03), and stage (log-rank P value 3.20e − 03) were identified as risk factors related to OS time. The multivariate Cox regression model also found the same result (adjusted HR 2.386; 95% CI 1.607-3.543; P value 1.62e − 05) after the adjustment of age, pathologic T stage, pathologic N stage, and stage.

Table 3.

Identification for risk factor using univariate and multivariate Cox-PH model.

Factors	Univariate				Multivariate
Factors	HR	95% CI	z	P value^a	HR	95% CI	z	P value^b
Subgroup				2.81e − 06
Low risk	1.000	—	—	—	1.000	—	—	—
High risk	2.392	(1.642, 3.485)	4.545	5.51e − 06	2.386	(1.607, 3.543)	4.312	1.62e − 05
Age				1.08e − 02
<65	1.000	—	—	—	1.000	—	—	—
≥65	1.538	(1.102, 2.148)	2.528	1.15e − 02	1.982	(1.389, 2.829)	3.769	1.64e − 04
Gender				7.59e − 02
Female	1.000	—	—	—
Male	1.386	(0.965, 1.992)	1.767	7.72e − 02
Race				5.22e − 01
Not reported	1.000	—	—	—
Asian	0.812	(0.430, 1.534)	-0.641	5.21e − 01
Black or African American	1.478	(0.636, 3.434)	0.907	3.64e − 01
White	0.919	(0.544, 1.552)	-0.316	7.52e − 01
Pathologic T stage				3.23e − 02
T1	1.000	—	—	—	1.000	—	—	—
T2	3.032	(0.716, 12.845)	1.506	1.32e − 01	1.935	(0.433, 8.637)	0.864	3.87e − 01
T3	4.457	(1.093, 18.180)	2.083	3.72e − 02	4.233	(0.846, 21.18)	1.756	7.90e − 02
T4	4.902	(1.188, 20.231)	2.198	2.80e − 02	4.738	(0.93, 24.147)	1.872	6.12e − 02
Pathologic N stage				5.95e − 03
N0	1.000	—	—	—	1.000	—	—	—
N1	1.560	(0.973, 2.50)	1.847	6.47e − 02	1.583	(0.794, 3.154)	1.305	1.92e − 01
N2	1.628	(0.976, 2.71)	1.868	6.17e − 02	1.646	(0.718, 3.771)	1.178	2.39e − 01
N3	2.430	(1.520, 3.890)	3.708	2.09e − 04	2.390	(1.041, 5.485)	2.055	3.99e − 02
NX or missing	1.812	(0.432, 7.600)	0.812	4.17e − 01	1.566	(0.352, 6.970)	0.588	5.56e − 01
Pathologic M stage				1.04e − 01
M0	1.000	—	—	—
M1	1.785	(0.936, 3.410)	1.759	7.86e − 02
MX	1.627	(0.759, 3.490)	1.251	2.11e − 01
Stage				3.20e − 03
I	1.000	—	—	—	1.000	—	—	—
II	1.362	(0.706, 2.630)	0.923	3.56e − 01	0.586	(0.221, 1.556)	-1.072	2.84e − 01
III	2.084	(1.130, 3.850)	2.350	1.88e − 02	0.518	(0.142, 1.892)	-0.996	3.19e − 01
IV	2.946	(1.406, 6.170)	2.863	4.20e − 03	0.842	(0.232, 3.057)	-0.262	7.93e − 01
Missing	3.760	(1.319, 10.72)	2.479	1.32e − 02	1.457	(0.349, 6.082)	0.516	6.06e − 01

Open in a new tab

^aBoth log-rank P value of each univariate Cox-PH model and P value of each variable were calculated. ^bScore test for the multivariate Cox-PH model P value 1.15e − 07.

4.3. Internal Validation to Assess the Robustness of the Subgroups

Since the two subgroups of the TCGA cohort were obtained from k-means clustering, a 10-fold CV-like procedure was performed to assess the robustness. In each fold, ANOVA was performed to select the top 50 mRNA, 30 miRNA, and 50 DNA methylation gene features associated with the obtained subgroup (Supplementary Table 4). The remaining features were used to build a SVM classifier which used the subgroup obtained from the autoencoder as the true label. For the training set, a C-index of 0.677 ± 0.042, a low Brier score of 0.216 ± 0.009, and a significant log-rank P value of 3.51e − 03 were generated (Table 4). For the test set, a C-index of 0.644 ± 0.030, a low Brier score of 0.209 ± 0.006, and a significant log-rank P value of 3.19e − 03 were generated (Table 4). SVMs using each three-single omics features were also built, and impressive performances were also produced. For the test set based on mRNA features only, a C-index of 0.636 ± 0.020, a Brier score of 0.208 ± 0.008, and a log-rank P value of 3.56e − 03 were generated (Table 4). For the test set based on miRNA features only, a C-index of 0.631 ± 0.017, a Brier score of 0.201 ± 0.006, and a log-rank P value of 1.81e − 03 were generated (Table 4). For the test set based on methylation features only, a C-index of 0.653 ± 0.016, a Brier score of 0.213 ± 0.004, and a log-rank P value of 2.63e − 03 were generated (Table 4).

Table 4.

CV-like performance of SVM classifier on training set and test of TCGA cohort.

Datasets	Omics type	Log-rank P value (geo.Mean)	C-index	Brier score
Training	3-omics	3.51e − 03	0.677 ± 0.042	0.216 ± 0.009

Test	3-omics	3.19e − 03	0.644 ± 0.030	0.209 ± 0.006
	mRNA only	3.56e − 03	0.636 ± 0.020	0.208 ± 0.008
	miRNA only	1.81e − 03	0.631 ± 0.017	0.201 ± 0.006
	Methylation only	2.63e − 03	0.653 ± 0.016	0.213 ± 0.004

Open in a new tab

4.4. External Validation in Three Independent Cohorts

To prove the performance of the classification model at predicting prognosis outcomes, three independent cohorts, GSE15459, GSE26253, and GSE84437, were used. The clinical information of the three cohorts is shown in Table 2. 14,881, 14,920, and 10,973 common mRNA features were used to select the top 50 mRNA features associated with the subgroup obtained from the prognosis model based on the autoencoder for the three independent cohorts, respectively. The GSE15459 cohort produced a C-index of 0.615, a Brier score of 0.219, and a significant log-rank P value of 1.19e − 02 (Figure 4(a)). The GSE26253 cohort produced a C-index of 0.609, a Brier score of 0.314, and a significant log-rank P value of 3.84e − 02 (Figure 4(b)). The GSE84437 cohort produced a C-index of 0.635, a Brier score of 0.278, and a significant log-rank P value of 4.10e − 03 (Figure 4(c)).

Kaplan–Meier curves between two different subgroups of three independent cohorts: (a) GSE15459; (b) GSE26253; (c) GSE84437.

4.5. Bioinformatics Analysis

Differential analysis was performed to detect the differentially expressed genes (DEGs), differentially expressed miRNAs, and differentially methylated genes (DMGs). 1839 upregulated and 56 downregulated genes were obtained with a criterion of |log2FC| < 1 and FDR < 0.05. 27 differentially expressed miRNA were obtained with the same criterion as DEGs. 1323 DMGs were obtained with a criterion of |M value difference| > 1 and FDR < 0.05. The top 10 DEGs (CLDN6, TF, ETNK2, CLEC2L, RSPO4, CHGB, RBFOX3, PDLIM4, APBB1, PPP1R14A), differentially expressed miRNAs (hsa-mir-483, hsa-mir-675, hsa-mir-187, hsa-mir-337, hsa-mir-654, hsa-mir-145, hsa-mir-133a-1, hsa-mir-133b, hsa-mir-218-2, hsa-mir-99a), and DMGs (CBX7, NPAS1, BDNF, MAD2L2, BAIAP3, INSRR, CYB5R1, RRP15, LHX4, ATP1B2) are shown in Figure 5.

Heat map for the top 10 DEGs, differentially expressed miRNAs, and DMGs: (a) DEGs; (b) differentially expressed miRNAs; (c) DMGs.

As the result of implementation of KEGG pathway enrichment analysis, the upregulated and downregulated DEGs were significantly involved in 30 and 8 signaling pathways, respectively (Figure 6 and Supplementary Table S5).

5. Discussion

Wide popularity and poor prognosis of stomach adenocarcinoma make it an urgent request for risk identification. In previous studies, deep learning-based unsupervised methods were implemented using multiomics data and obtained better prognosis performances [14–16]. The autoencoder is an architecture of deep learning which is able to fit a complex nonlinear relationship between the input layer and the output layer. The representative features from the bottleneck layer could reflect the information of multiomics data well, and the deep learning-based model overcomes the problem of high dimension of the multiomics data. In this study, two subgroups were identified using an autoencoder deep learning architecture. Difference of survival probabilities between two subgroups was significant, and the model fitness performed well. Performance of the model based on the autoencoder was better than those of two alternative approaches, PCA and SNF. It suggested that a deep learning-based method like autoencoder could utilize the information of multiomics data better through capturing the nonlinear relationship than the traditional dimension reduction method like PCA, and the model based on an autoencoder could produce a more accurate prediction of prognosis than the model which constructs and fuses a similarity network to have a comprehensive view of a biological process in specific diseases like SNF. The deep learning-based model would contribute to predicting the prognosis of patients with stomach adenocarcinoma and identifying the potential prognostic biomarkers.

According to the 10-fold CV-like validation, the robustness of the model identifying the subgroups has been verified. In addition, the subgroup could be considered as a risk factor according to the univariate and multivariate Cox-PH model. Moreover, the model was validated in three independent cohorts with different endpoints, and each cohort produced a significant log-rank P value. Regretfully, GSE26253 and GSE84437, with an endpoint of DFS and response of fourfold increasement of tumor size, respectively, produced a high Brier score, indicating high error of the model fitting on survival data. This suggests that the endpoint of OS would be more suitable in prognosis prediction using autoencoder-integrating multiomics data.

In fact, Xu et al. [26] have already used a deep learning architecture of bidirectional deep neural networks (BiDNNs) to integrate RNA-Seq and DNA methylation data of patients with gastric cancer to obtain two subgroups associated with prognosis-related risk. The model based on the autoencoder shows a better performance than that on BiDNNs, in log-rank P value (2.80e − 06 vs. 9.05e − 05, respectively) and C-index (0.714 vs. 0.673, respectively). This may indicate that complex architecture could not provide a better performance. Information from miRNA expression may also play an important role in the model based on the autoencoder to show its robustness. Both studies provide guidance for personalization of medical services.

DEGs, differentially expressed miRNAs, and DMGs between two subgroups were obtained and provided an opportunity to be used as potential prognostic biomarkers or candidate therapeutic targets in future clinical practice. Differential analysis and enrichment analysis were practiced to understand the difference between two subgroups at the molecular level. Some studies have revealed the relationship between the top 10 DEGs, DMGs, and stomach adenocarcinoma. Expression of CLDN6 indicates poor prognosis of gastric cancer [50–52]. RBFOX3 was considered to be associated to the growth and progression of gastric cancer [53]. CBX7 was thought to be related to the proliferation and metastasis of the gastric cancer cell and be able to reveal the prognosis [54–56]. BDNF expression was reported to be associated with poor prognosis of gastric cancer [57, 58]. CLDN6, TF, CLEC2L, and APBB1 were also identified as top 10 DEGs, and no common top 10 DMGs were found in the previous study using the model based on BiDNNs [26]. Low has-miR-337-3p expression was reported to be related to lymph node metastasis of human gastric cancer [59]. hsa-miR-145 and hsa-mir-133a-1 were considered to serve as an underlying prognostic indicator for the patient with stomach adenocarcinoma [60]. Little was known about the relationship between the other 7 of the top 10 differentially expressed miRNAs and stomach adenocarcinomas.

However, only RNA-Seq, miRNA-Seq, and DNA methylation data were used in this study. This might not provide sufficient information of the biological process, and more omics data could be included in the further study. Regretfully, no independent cohort with miRNA-Seq and DNA methylation data has been found to use in the external validation. Moreover, the poor interpretability of deep learning may prevent the prognosis model from being widely used in clinical practice, and a more theoretic approach is required to explain how the model works.

6. Conclusions

Prognosis-related risk was evaluated based on the architecture of the autoencoder, and the patients were divided into low-risk and high-risk groups. The high-risk group had a significantly higher risk of poor survival with an adjusted HR of 2.386, a log-rank P value of 2.80e − 06, a C-index of 0.714, and a Brier score of 0.184. The robustness of the prognosis model was successfully validated using a 10-fold CV-like procedure, and the performance was verified in three independent cohorts. The model based on the autoencoder has a better performance than PCA (log-rank P value: 2.05e − 03; C-index: 0.649) and SNF (log-rank P value: 8.91e − 03; C-index: 0.653) on prognosis prediction. DEGs, differentially expressed miRNAs, and DMGs between two subgroups were uncovered. The results might provide the foundation for further exploring the potential biomarkers to predict the prognosis of patients with stomach adenocarcinoma.

Acknowledgments

This work was supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (grant number SJCX21_0086).

Data Availability

All data included in this study are available from The Cancer Genome Atlas (TCGA) database (http://cancergenome.nih.gov) and GEO repository (https://www.ncbi.nlm.nih.gov/geo/).

Conflicts of Interest

The authors declare no conflict of interest.

Supplementary Materials

The following supplementary materials are available. Table S1: performance of models based on autoencoders with different numbers of nodes in hidden layers. Table S2: CV-like performance of models based on autoencoders with different numbers of nodes in hidden layers. Table S3: subgroup of patients from TCGA cohort. Table S4: top 50 mRNA, 30 miRNA, and 50 methylation features associated with subgroup based on ANOVA. Table S5: result of KEGG pathway enrichment analysis.

Click here for additional data file.^{(63.7KB, docx)}

References

1.Sung H., Ferlay J., Siegel R. L., et al. Global Cancer Statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a Cancer Journal for Clinicians . 2021;71(3):209–249. doi: 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]
2.Du Y., Wang Z., Wan W. High expression of Erk-related Rasgrf2 predicts poor prognosis in patients with stomach adenocarcinoma and correlates with M2 macrophage. Journal of Cancer . 2021;12(23):7177–7189. doi: 10.7150/jca.63029. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Johnston F. M., Beckman M. Updates on management of gastric cancer. Current Oncology Reports . 2019;21(8):p. 67. doi: 10.1007/s11912-019-0820-4. [DOI] [PubMed] [Google Scholar]
4.Suzuki H., Oda I., Abe S., et al. High rate of 5-year survival among patients with early gastric cancer undergoing curative endoscopic submucosal dissection. Gastric Cancer . 2016;19(1):198–205. doi: 10.1007/s10120-015-0469-0. [DOI] [PubMed] [Google Scholar]
5.Huang S., Chaudhary K., Garmire L. X. More is better: recent progress in multi-omics data integration methods. Frontiers in Genetics . 2017;8:p. 84. doi: 10.3389/fgene.2017.00084. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Ritchie M. D., Holzinger E. R., Li R., Pendergrass S. A., Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nature Reviews Genetics . 2015;16(2):85–97. doi: 10.1038/nrg3868. [DOI] [PubMed] [Google Scholar]
7.Subramanian I., Verma S., Kumar S., Jere A., Anamika K. Multi-omics data integration, interpretation, and its application. Bioinformatics and Biology Insights . 2020;14:p. 1215734613. doi: 10.1177/1177932219899051. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Lakshmanna K., Kaluri R., Gundluru N., et al. A review on deep learning techniques for IoT data. Electronics . 2022;11(10):p. 1604. doi: 10.3390/electronics11101604. [DOI] [Google Scholar]
9.Chang C.-Y., Sweta Bhattacharya P. M., Vincent D. R., Lakshmanna K., Srinivasan K. An efficient classification of neonates cry using extreme gradient boosting-assisted grouped-support-vector network. Journal of Healthcare Engineering . 2021;2021:14. doi: 10.1155/2021/7517313.7517313 [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Gundluru N., Rajput D. S., Lakshmanna K., et al. Enhancement of detection of diabetic retinopathy using Harris Hawks optimization with deep learning model. Computational Intelligence and Neuroscience . 2022;2022:13. doi: 10.1155/2022/8512469.8512469 [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Chowdhary C. L., Khare N., Patel H., Koppu S., Kaluri R., Rajput D. S. Past, present and future of gene feature selection for breast cancer classification - a survey. International Journal of Engineering Systems Modelling and Simulation . 2022;13(2):140–153. doi: 10.1504/IJESMS.2022.123355. [DOI] [Google Scholar]
12.Esteva A., Robicquet A., Ramsundar B., et al. A guide to deep learning in healthcare. Nature Medicine . 2019;25(1):24–29. doi: 10.1038/s41591-018-0316-z. [DOI] [PubMed] [Google Scholar]
13.Zhu W., Xie L., Han J., Guo X. The application of deep learning in cancer prognosis prediction. Cancers . 2020;12(3):p. 603. doi: 10.3390/cancers12030603. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Chaudhary K., Poirion O. B., Liangqun L., Garmire L. X. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clinical Cancer Research . 2018;24(6):1248–1259. doi: 10.1158/1078-0432.CCR-17-0853. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Yu J., Xiaoliu W., Lv M., et al. A model for predicting prognosis in patients with esophageal squamous cell carcinoma based on joint representation learning. Oncology Letters . 2020;20(6):p. 387. doi: 10.3892/ol.2020.12250. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Zhao Z., Li Y., Yuanqing W., Chen R. Deep learning-based model for predicting progression in patients with head and neck squamous cell carcinoma. Cancer Biomarkers . 2019;27(1):19–28. doi: 10.3233/CBM-190380. [DOI] [PubMed] [Google Scholar]
17.Wang B., Mezlini A. M., Demir F., et al. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods . 2014;11(3):333–337. doi: 10.1038/nmeth.2810. [DOI] [PubMed] [Google Scholar]
18.Shen R., Olshen A. B., Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics . 2009;25(22):2906–2912. doi: 10.1093/bioinformatics/btp543. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Wang T.-H., Lee C.-Y., Lee T.-Y., Huang H.-D., Hsu J. B.-K., Chang T.-H. Biomarker identification through multiomics data analysis of prostate cancer prognostication using a deep learning model and similarity network fusion. Cancers . 2021;13(11):p. 2528. doi: 10.3390/cancers13112528. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Poirion O. B., Chaudhary K., Garmire L. X. Deep learning data integration for better risk stratification models of bladder cancer. AMIA Summits on Translational Science Proceedings . 2018;2018:197–206. [PMC free article] [PubMed] [Google Scholar]
21.Lv J., Wang J., Shang X., Liu F., Guo S. Survival prediction in patients with colon adenocarcinoma via multiomics data integration using a deep learning algorithm. Bioscience Reports . 2020;40(12) doi: 10.1042/BSR20201482. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Liu Q., Pingzhao H. Association analysis of deep genomic features extracted by denoising autoencoders in breast cancer. Cancers . 2019;11(4):p. 494. doi: 10.3390/cancers11040494. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Lee T.-Y., Huang K.-Y., Chuang C.-H., Lee C.-Y., Chang T.-H. Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Computational Biology and Chemistry . 2020;87:p. 107277. doi: 10.1016/j.compbiolchem.2020.107277. [DOI] [PubMed] [Google Scholar]
24.Hira M. T., Razzaque M. A., Angione C., Scrivens J., Sawan S., Sarker M. Integrated multi-omics analysis of ovarian cancer using variational autoencoders. Scientific Reports . 2021;11(1):1–16. doi: 10.1038/s41598-021-85285-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Baek B., Lee H. Prediction of survival and recurrence in patients with pancreatic cancer by integrating multi-omics data. Scientific Reports . 2020;10(1):p. 18951. doi: 10.1038/s41598-020-76025-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Xu J., Yao Y., Binghua X., Li Y., Zhijian S. Unsupervised learning of cross-modal mappings in multi-omics data for survival stratification of gastric cancer. Future Oncology . 2022;18(2):215–230. doi: 10.2217/fon-2021-1059. [DOI] [PubMed] [Google Scholar]
27.Franco E. F., Rana P., Cruz A., et al. Performance comparison of deep learning autoencoders for cancer subtype detection using multi-omics data. Cancers . 2021;13(9):p. 2013. doi: 10.3390/cancers13092013. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Chai H., Zhou X., Zhang Z., Rao J., Zhao H., Yang Y. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Computers in Biology and Medicine . 2021;134:p. 104481. doi: 10.1016/j.compbiomed.2021.104481. [DOI] [PubMed] [Google Scholar]
29.Poirion O. B., Jing Z., Chaudhary K., Huang S., Garmire L. X. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Medicine . 2021;13(1):p. 112. doi: 10.1186/s13073-021-00930-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Yang H., Chen R., Li D., Wang Z. Subtype-Gan: a deep learning approach for integrative cancer subtyping of multi-omics data. Bioinformatics . 2021;37(16):2231–2237. doi: 10.1093/bioinformatics/btab109. [DOI] [PubMed] [Google Scholar]
31.Liu S., Lin Q., Tie Y., Liu F. A deep learning fusion clustering framework for breast cancer subtypes identification by integrating multi-omics data. 2020 5th International Conference On Mechanical, Control And Computer Engineering (Icmcce 2020); December 2020; Harbin, China. pp. 1710–1714. [DOI] [Google Scholar]
32.Zhao L., Dong Q., Luo C., et al. Deepomix: a scalable and interpretable multi-omics deep learning framework and application in cancer survival analysis. Computational and Structural Biotechnology Journal . 2021;19:2719–2725. doi: 10.1016/j.csbj.2021.04.067. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Zhu Y., Qiu P., Ji Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nature Methods . 2014;11(6):599–600. doi: 10.1038/nmeth.2956. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Ooi C. H., Ivanova T., Wu J., et al. Oncogenic pathway combinations predict clinical prognosis in gastric cancer. PLoS Genetics . 2009;5(10):p. e1000676. doi: 10.1371/journal.pgen.1000676. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Tao J., Deng N. T., Ramnarayanan K., et al. Cd44-Slc1a2 gene fusions in gastric cancer. Science Translational Medicine . 2011;3(77):p. 77ra30. doi: 10.1126/scitranslmed.3001423. [DOI] [PubMed] [Google Scholar]
36.Muratani M., Deng N., Ooi W. F., et al. Nanoscale chromatin profiling of gastric adenocarcinoma reveals cancer- associated cryptic promoters and somatically acquired regulatory elements. Nature Communications . 2014;5(1):p. 4361. doi: 10.1038/ncomms5361. [DOI] [PubMed] [Google Scholar]
37.Chia N. Y., Deng N., Das K., et al. Regulatory crosstalk between lineage-survival oncogenes KLF5, GATA4 and GATA6 cooperatively promotes gastric cancer development. Gut . 2015;64(5):707–719. doi: 10.1136/gutjnl-2013-306596. [DOI] [PubMed] [Google Scholar]
38.Lei Z., Tan I. B., Das K., et al. Identification of molecular subtypes of gastric cancer with different responses to PI3-kinase inhibitors and 5-fluorouracil. Gastroenterology . 2013;145(3):554–565. doi: 10.1053/j.gastro.2013.05.010. [DOI] [PubMed] [Google Scholar]
39.Subhash V. V., Yeo M. S., Wang L., et al. Anti-tumor efficacy of selinexor (Kpt-330) in gastric cancer is dependent on nuclear accumulation of P53 tumor suppressor. Scientific Reports . 2018;8(1):p. 12248. doi: 10.1038/s41598-018-30686-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Lee J., Sohn I., Do I. G., et al. Nanostring-based multigene assay to predict recurrence for gastric cancer patients after surgery. PLoS One . 2014;9(3):p. e90133. doi: 10.1371/journal.pone.0090133. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Oh S. C., Sohn B. H., Cheong J. H., et al. Clinical and genomic landscape of gastric cancer with a mesenchymal phenotype. Nature Communications . 2018;9(1):p. 1777. doi: 10.1038/s41467-018-04179-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Yoon S. J., Park J., Shin Y., et al. Deconvolution of diffuse gastric cancer and the suppression of CD34 on the Balb/C nude mice model. BMC Cancer . 2020;20(1):p. 314. doi: 10.1186/s12885-020-06814-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Rousseeuw P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics . 1987;20:53–65. doi: 10.1016/0377-0427(87)90125-7. [DOI] [Google Scholar]
44.Calinski T. A dendrite method for cluster analysis. Biometrics . 1968;24(1):p. 207. [Google Scholar]
45.Ms S., Culhane A. C., Quackenbush J., Haibe-Kains B. Survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics . 2011;27(22):3206–3208. doi: 10.1093/bioinformatics/btr511. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Du P., Kibbe W. A., Lin S. M. Lumi: a pipeline for processing Illumina microarray. Bioinformatics . 2008;24(13):1547–1548. doi: 10.1093/bioinformatics/btn224. [DOI] [PubMed] [Google Scholar]
47.Ritchie M. E., Phipson B., Wu D., et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research . 2015;43(7):p. e47. doi: 10.1093/nar/gkv007. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Wu T., Erqiang H., Shuangbin X., et al. ClusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovations . 2021;2(3):p. 100141. doi: 10.1016/j.xinn.2021.100141. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Yu G., Wang L.-G., Han Y., He Q.-Y. Clusterprofiler: an R package for comparing biological themes among gene clusters. OMICS: a Journal of Integrative Biology . 2012;16(5):284–287. doi: 10.1089/omi.2011.0118. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Kohmoto T., Masuda K., Shoda K., et al. Claudin-6 is a single prognostic marker and functions as a tumor-promoting gene in a subgroup of intestinal type gastric cancer. Gastric Cancer . 2020;23(3):403–417. doi: 10.1007/s10120-019-01014-x. [DOI] [PubMed] [Google Scholar]
51.Yu S., Zhang Y., Li Q., Zhang Z., Zhao G., Jia X. CLDN6 promotes tumor progression through the Yap1-Snail1 axis in gastric cancer. Cell Death & Disease . 2019;10(12):1–14. doi: 10.1038/s41419-019-2168-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Gao F., Li M., Xiang R., Zhou X., Zhu L., Zhai Y. Expression of CLDN6 in tissues of gastric cancer patients: association with clinical pathology and prognosis. Oncology Letters . 2019;17(5):4621–4625. doi: 10.3892/ol.2019.10129. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Luo C., Zhu X., Luo Q., et al. RBFOX3 promotes gastric cancer growth and progression by activating HTERT signaling. Frontiers in Oncology . 2020;10:p. 1044. doi: 10.3389/fonc.2020.01044. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
54.Ni S.-J., Zhao L.-Q., Wang X.-F., et al. Cbx7 regulates stem cell-like properties of gastric cancer cells via P16 and Akt-Nf-kappa B-Mir-21 pathways. Journal of Hematology & Oncology . 2018;11(1):1–14. doi: 10.1186/s13045-018-0562-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Peng P., Han X. Cbx7 and Hoxc10 promote gastric cancer cell proliferation and metastasis through NF-κB activation. Acta Poloniae Pharmaceutica . 2021;78(1):41–48. doi: 10.32383/appdr/132891. [DOI] [Google Scholar]
56.He M., Yue L., Wang H., et al. Evaluation of the prognostic value of CBXs in gastric cancer patients. Scientific Reports . 2021;11(1):1–9. doi: 10.1038/s41598-021-91649-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Okugawa Y., Tanaka K., Inoue Y., et al. Brain-derived neurotrophic factor/tropomyosin-related kinase B pathway in gastric cancer. British Journal of Cancer . 2013;108(1):121–130. doi: 10.1038/bjc.2012.499. [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Choi B., Lee E.-J., Shin M.-K., et al. Upregulation of brain-derived neurotrophic factor in advanced gastric cancer contributes to bone metastatic osteolysis by inducing long pentraxin 3. Oncotarget . 2016;7(34):55506–55517. doi: 10.18632/oncotarget.10747. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Wang Z., Wang J., Yang Y., et al. Loss of has-miR-337-3p expression is associated with lymph node metastasis of human gastric cancer. Journal of Experimental & Clinical Cancer Research . 2013;32(1):1–9. doi: 10.1186/1756-9966-32-76. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Luo S.-S., Liao X.-W., Zhu X.-D. Genome-wide analysis to identify a novel microRNA signature that predicts survival in patients with stomach adenocarcinoma. Journal of Cancer . 2019;10(25):6298–62313. doi: 10.7150/jca.33250. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Click here for additional data file.^{(63.7KB, docx)}

Data Availability Statement

All data included in this study are available from The Cancer Genome Atlas (TCGA) database (http://cancergenome.nih.gov) and GEO repository (https://www.ncbi.nlm.nih.gov/geo/).

[B1] 1.Sung H., Ferlay J., Siegel R. L., et al. Global Cancer Statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a Cancer Journal for Clinicians . 2021;71(3):209–249. doi: 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]

[B2] 2.Du Y., Wang Z., Wan W. High expression of Erk-related Rasgrf2 predicts poor prognosis in patients with stomach adenocarcinoma and correlates with M2 macrophage. Journal of Cancer . 2021;12(23):7177–7189. doi: 10.7150/jca.63029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] 3.Johnston F. M., Beckman M. Updates on management of gastric cancer. Current Oncology Reports . 2019;21(8):p. 67. doi: 10.1007/s11912-019-0820-4. [DOI] [PubMed] [Google Scholar]

[B4] 4.Suzuki H., Oda I., Abe S., et al. High rate of 5-year survival among patients with early gastric cancer undergoing curative endoscopic submucosal dissection. Gastric Cancer . 2016;19(1):198–205. doi: 10.1007/s10120-015-0469-0. [DOI] [PubMed] [Google Scholar]

[B5] 5.Huang S., Chaudhary K., Garmire L. X. More is better: recent progress in multi-omics data integration methods. Frontiers in Genetics . 2017;8:p. 84. doi: 10.3389/fgene.2017.00084. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] 6.Ritchie M. D., Holzinger E. R., Li R., Pendergrass S. A., Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nature Reviews Genetics . 2015;16(2):85–97. doi: 10.1038/nrg3868. [DOI] [PubMed] [Google Scholar]

[B7] 7.Subramanian I., Verma S., Kumar S., Jere A., Anamika K. Multi-omics data integration, interpretation, and its application. Bioinformatics and Biology Insights . 2020;14:p. 1215734613. doi: 10.1177/1177932219899051. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] 8.Lakshmanna K., Kaluri R., Gundluru N., et al. A review on deep learning techniques for IoT data. Electronics . 2022;11(10):p. 1604. doi: 10.3390/electronics11101604. [DOI] [Google Scholar]

[B9] 9.Chang C.-Y., Sweta Bhattacharya P. M., Vincent D. R., Lakshmanna K., Srinivasan K. An efficient classification of neonates cry using extreme gradient boosting-assisted grouped-support-vector network. Journal of Healthcare Engineering . 2021;2021:14. doi: 10.1155/2021/7517313.7517313 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10.Gundluru N., Rajput D. S., Lakshmanna K., et al. Enhancement of detection of diabetic retinopathy using Harris Hawks optimization with deep learning model. Computational Intelligence and Neuroscience . 2022;2022:13. doi: 10.1155/2022/8512469.8512469 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] 11.Chowdhary C. L., Khare N., Patel H., Koppu S., Kaluri R., Rajput D. S. Past, present and future of gene feature selection for breast cancer classification - a survey. International Journal of Engineering Systems Modelling and Simulation . 2022;13(2):140–153. doi: 10.1504/IJESMS.2022.123355. [DOI] [Google Scholar]

[B12] 12.Esteva A., Robicquet A., Ramsundar B., et al. A guide to deep learning in healthcare. Nature Medicine . 2019;25(1):24–29. doi: 10.1038/s41591-018-0316-z. [DOI] [PubMed] [Google Scholar]

[B13] 13.Zhu W., Xie L., Han J., Guo X. The application of deep learning in cancer prognosis prediction. Cancers . 2020;12(3):p. 603. doi: 10.3390/cancers12030603. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14.Chaudhary K., Poirion O. B., Liangqun L., Garmire L. X. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clinical Cancer Research . 2018;24(6):1248–1259. doi: 10.1158/1078-0432.CCR-17-0853. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] 15.Yu J., Xiaoliu W., Lv M., et al. A model for predicting prognosis in patients with esophageal squamous cell carcinoma based on joint representation learning. Oncology Letters . 2020;20(6):p. 387. doi: 10.3892/ol.2020.12250. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16.Zhao Z., Li Y., Yuanqing W., Chen R. Deep learning-based model for predicting progression in patients with head and neck squamous cell carcinoma. Cancer Biomarkers . 2019;27(1):19–28. doi: 10.3233/CBM-190380. [DOI] [PubMed] [Google Scholar]

[B17] 17.Wang B., Mezlini A. M., Demir F., et al. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods . 2014;11(3):333–337. doi: 10.1038/nmeth.2810. [DOI] [PubMed] [Google Scholar]

[B18] 18.Shen R., Olshen A. B., Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics . 2009;25(22):2906–2912. doi: 10.1093/bioinformatics/btp543. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19.Wang T.-H., Lee C.-Y., Lee T.-Y., Huang H.-D., Hsu J. B.-K., Chang T.-H. Biomarker identification through multiomics data analysis of prostate cancer prognostication using a deep learning model and similarity network fusion. Cancers . 2021;13(11):p. 2528. doi: 10.3390/cancers13112528. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] 20.Poirion O. B., Chaudhary K., Garmire L. X. Deep learning data integration for better risk stratification models of bladder cancer. AMIA Summits on Translational Science Proceedings . 2018;2018:197–206. [PMC free article] [PubMed] [Google Scholar]

[B21] 21.Lv J., Wang J., Shang X., Liu F., Guo S. Survival prediction in patients with colon adenocarcinoma via multiomics data integration using a deep learning algorithm. Bioscience Reports . 2020;40(12) doi: 10.1042/BSR20201482. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 22.Liu Q., Pingzhao H. Association analysis of deep genomic features extracted by denoising autoencoders in breast cancer. Cancers . 2019;11(4):p. 494. doi: 10.3390/cancers11040494. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] 23.Lee T.-Y., Huang K.-Y., Chuang C.-H., Lee C.-Y., Chang T.-H. Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Computational Biology and Chemistry . 2020;87:p. 107277. doi: 10.1016/j.compbiolchem.2020.107277. [DOI] [PubMed] [Google Scholar]

[B24] 24.Hira M. T., Razzaque M. A., Angione C., Scrivens J., Sawan S., Sarker M. Integrated multi-omics analysis of ovarian cancer using variational autoencoders. Scientific Reports . 2021;11(1):1–16. doi: 10.1038/s41598-021-85285-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25.Baek B., Lee H. Prediction of survival and recurrence in patients with pancreatic cancer by integrating multi-omics data. Scientific Reports . 2020;10(1):p. 18951. doi: 10.1038/s41598-020-76025-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26.Xu J., Yao Y., Binghua X., Li Y., Zhijian S. Unsupervised learning of cross-modal mappings in multi-omics data for survival stratification of gastric cancer. Future Oncology . 2022;18(2):215–230. doi: 10.2217/fon-2021-1059. [DOI] [PubMed] [Google Scholar]

[B27] 27.Franco E. F., Rana P., Cruz A., et al. Performance comparison of deep learning autoencoders for cancer subtype detection using multi-omics data. Cancers . 2021;13(9):p. 2013. doi: 10.3390/cancers13092013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28] 28.Chai H., Zhou X., Zhang Z., Rao J., Zhao H., Yang Y. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Computers in Biology and Medicine . 2021;134:p. 104481. doi: 10.1016/j.compbiomed.2021.104481. [DOI] [PubMed] [Google Scholar]

[B29] 29.Poirion O. B., Jing Z., Chaudhary K., Huang S., Garmire L. X. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Medicine . 2021;13(1):p. 112. doi: 10.1186/s13073-021-00930-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] 30.Yang H., Chen R., Li D., Wang Z. Subtype-Gan: a deep learning approach for integrative cancer subtyping of multi-omics data. Bioinformatics . 2021;37(16):2231–2237. doi: 10.1093/bioinformatics/btab109. [DOI] [PubMed] [Google Scholar]

[B31] 31.Liu S., Lin Q., Tie Y., Liu F. A deep learning fusion clustering framework for breast cancer subtypes identification by integrating multi-omics data. 2020 5th International Conference On Mechanical, Control And Computer Engineering (Icmcce 2020); December 2020; Harbin, China. pp. 1710–1714. [DOI] [Google Scholar]

[B32] 32.Zhao L., Dong Q., Luo C., et al. Deepomix: a scalable and interpretable multi-omics deep learning framework and application in cancer survival analysis. Computational and Structural Biotechnology Journal . 2021;19:2719–2725. doi: 10.1016/j.csbj.2021.04.067. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B33] 33.Zhu Y., Qiu P., Ji Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nature Methods . 2014;11(6):599–600. doi: 10.1038/nmeth.2956. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B34] 34.Ooi C. H., Ivanova T., Wu J., et al. Oncogenic pathway combinations predict clinical prognosis in gastric cancer. PLoS Genetics . 2009;5(10):p. e1000676. doi: 10.1371/journal.pgen.1000676. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B35] 35.Tao J., Deng N. T., Ramnarayanan K., et al. Cd44-Slc1a2 gene fusions in gastric cancer. Science Translational Medicine . 2011;3(77):p. 77ra30. doi: 10.1126/scitranslmed.3001423. [DOI] [PubMed] [Google Scholar]

[B36] 36.Muratani M., Deng N., Ooi W. F., et al. Nanoscale chromatin profiling of gastric adenocarcinoma reveals cancer- associated cryptic promoters and somatically acquired regulatory elements. Nature Communications . 2014;5(1):p. 4361. doi: 10.1038/ncomms5361. [DOI] [PubMed] [Google Scholar]

[B37] 37.Chia N. Y., Deng N., Das K., et al. Regulatory crosstalk between lineage-survival oncogenes KLF5, GATA4 and GATA6 cooperatively promotes gastric cancer development. Gut . 2015;64(5):707–719. doi: 10.1136/gutjnl-2013-306596. [DOI] [PubMed] [Google Scholar]

[B38] 38.Lei Z., Tan I. B., Das K., et al. Identification of molecular subtypes of gastric cancer with different responses to PI3-kinase inhibitors and 5-fluorouracil. Gastroenterology . 2013;145(3):554–565. doi: 10.1053/j.gastro.2013.05.010. [DOI] [PubMed] [Google Scholar]

[B39] 39.Subhash V. V., Yeo M. S., Wang L., et al. Anti-tumor efficacy of selinexor (Kpt-330) in gastric cancer is dependent on nuclear accumulation of P53 tumor suppressor. Scientific Reports . 2018;8(1):p. 12248. doi: 10.1038/s41598-018-30686-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B40] 40.Lee J., Sohn I., Do I. G., et al. Nanostring-based multigene assay to predict recurrence for gastric cancer patients after surgery. PLoS One . 2014;9(3):p. e90133. doi: 10.1371/journal.pone.0090133. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B41] 41.Oh S. C., Sohn B. H., Cheong J. H., et al. Clinical and genomic landscape of gastric cancer with a mesenchymal phenotype. Nature Communications . 2018;9(1):p. 1777. doi: 10.1038/s41467-018-04179-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B42] 42.Yoon S. J., Park J., Shin Y., et al. Deconvolution of diffuse gastric cancer and the suppression of CD34 on the Balb/C nude mice model. BMC Cancer . 2020;20(1):p. 314. doi: 10.1186/s12885-020-06814-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B43] 43.Rousseeuw P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics . 1987;20:53–65. doi: 10.1016/0377-0427(87)90125-7. [DOI] [Google Scholar]

[B44] 44.Calinski T. A dendrite method for cluster analysis. Biometrics . 1968;24(1):p. 207. [Google Scholar]

[B45] 45.Ms S., Culhane A. C., Quackenbush J., Haibe-Kains B. Survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics . 2011;27(22):3206–3208. doi: 10.1093/bioinformatics/btr511. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B46] 46.Du P., Kibbe W. A., Lin S. M. Lumi: a pipeline for processing Illumina microarray. Bioinformatics . 2008;24(13):1547–1548. doi: 10.1093/bioinformatics/btn224. [DOI] [PubMed] [Google Scholar]

[B47] 47.Ritchie M. E., Phipson B., Wu D., et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research . 2015;43(7):p. e47. doi: 10.1093/nar/gkv007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B48] 48.Wu T., Erqiang H., Shuangbin X., et al. ClusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovations . 2021;2(3):p. 100141. doi: 10.1016/j.xinn.2021.100141. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B49] 49.Yu G., Wang L.-G., Han Y., He Q.-Y. Clusterprofiler: an R package for comparing biological themes among gene clusters. OMICS: a Journal of Integrative Biology . 2012;16(5):284–287. doi: 10.1089/omi.2011.0118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B50] 50.Kohmoto T., Masuda K., Shoda K., et al. Claudin-6 is a single prognostic marker and functions as a tumor-promoting gene in a subgroup of intestinal type gastric cancer. Gastric Cancer . 2020;23(3):403–417. doi: 10.1007/s10120-019-01014-x. [DOI] [PubMed] [Google Scholar]

[B51] 51.Yu S., Zhang Y., Li Q., Zhang Z., Zhao G., Jia X. CLDN6 promotes tumor progression through the Yap1-Snail1 axis in gastric cancer. Cell Death & Disease . 2019;10(12):1–14. doi: 10.1038/s41419-019-2168-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B52] 52.Gao F., Li M., Xiang R., Zhou X., Zhu L., Zhai Y. Expression of CLDN6 in tissues of gastric cancer patients: association with clinical pathology and prognosis. Oncology Letters . 2019;17(5):4621–4625. doi: 10.3892/ol.2019.10129. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B53] 53.Luo C., Zhu X., Luo Q., et al. RBFOX3 promotes gastric cancer growth and progression by activating HTERT signaling. Frontiers in Oncology . 2020;10:p. 1044. doi: 10.3389/fonc.2020.01044. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]

[B54] 54.Ni S.-J., Zhao L.-Q., Wang X.-F., et al. Cbx7 regulates stem cell-like properties of gastric cancer cells via P16 and Akt-Nf-kappa B-Mir-21 pathways. Journal of Hematology & Oncology . 2018;11(1):1–14. doi: 10.1186/s13045-018-0562-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B55] 55.Peng P., Han X. Cbx7 and Hoxc10 promote gastric cancer cell proliferation and metastasis through NF-κB activation. Acta Poloniae Pharmaceutica . 2021;78(1):41–48. doi: 10.32383/appdr/132891. [DOI] [Google Scholar]

[B56] 56.He M., Yue L., Wang H., et al. Evaluation of the prognostic value of CBXs in gastric cancer patients. Scientific Reports . 2021;11(1):1–9. doi: 10.1038/s41598-021-91649-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B57] 57.Okugawa Y., Tanaka K., Inoue Y., et al. Brain-derived neurotrophic factor/tropomyosin-related kinase B pathway in gastric cancer. British Journal of Cancer . 2013;108(1):121–130. doi: 10.1038/bjc.2012.499. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B58] 58.Choi B., Lee E.-J., Shin M.-K., et al. Upregulation of brain-derived neurotrophic factor in advanced gastric cancer contributes to bone metastatic osteolysis by inducing long pentraxin 3. Oncotarget . 2016;7(34):55506–55517. doi: 10.18632/oncotarget.10747. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B59] 59.Wang Z., Wang J., Yang Y., et al. Loss of has-miR-337-3p expression is associated with lymph node metastasis of human gastric cancer. Journal of Experimental & Clinical Cancer Research . 2013;32(1):1–9. doi: 10.1186/1756-9966-32-76. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B60] 60.Luo S.-S., Liao X.-W., Zhu X.-D. Genome-wide analysis to identify a novel microRNA signature that predicts survival in patients with stomach adenocarcinoma. Journal of Cancer . 2019;10(25):6298–62313. doi: 10.7150/jca.33250. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

An Unsupervised Deep Learning-Based Model Using Multiomics Data to Predict Prognosis of Patients with Stomach Adenocarcinoma

Sizhen Chen

Yiteng Zang

Biyun Xu

Beier Lu

Rongji Ma

Pengcheng Miao

Bingwei Chen

Abstract

Methods

Results

Conclusions

1. Introduction

2. Literature Review

Table 1.

3. Materials and Methods

3.1. Data Collection and Preprocessing

3.2. Autoencoder Construction

3.3. Selection of Representation Features and k-Means Clustering

3.4. Supervised Classification

3.5. Alternative Approaches

3.6. Bioinformatics Analysis

4. Results

4.1. Two Significant Subgroups Were Identified according to the TCGA Cohort

Figure 1.

Table 2.

Figure 2.

Figure 3.

4.2. Cox-PH Model including Subgroups and Other Clinical Features

Table 3.

4.3. Internal Validation to Assess the Robustness of the Subgroups

Table 4.

4.4. External Validation in Three Independent Cohorts

Figure 4.

4.5. Bioinformatics Analysis

Figure 5.

Figure 6.

5. Discussion

6. Conclusions

Acknowledgments

Data Availability

Conflicts of Interest

Supplementary Materials

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases