Abstract
Objective
Oral cancer is an important health challenge worldwide and accurate survival time prediction of this disease can guide treatment decisions. This study aims to propose a deep learning-based model, DeepOmicsSurv, to predict survival in oral cancer patients using clinical and multi-omics data.
Methods
DeepOmicsSurv builds on the DeepSurv model, incorporating multi-head attention convolutional layers, dropout, pooling, and batch normalization to boost its strength and precision. Various dimensionality reduction techniques, including Principal Component Analysis (PCA), Kernel PCA, Non-Negative Matrix Factorization (NMF), Singular Value Decomposition (SVD), Partial Least Squares (PLS), Multidimensional Scaling (MDS), and Autoencoders, were employed to manage the high-dimensional omics data. The model's performance was evaluated against DeepSurv, DeepHit, Cox Proportional Hazards (CoxPH), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). Additionally, SHapley Additive Explanations (SHAP) was used to analyze the impact of clinical features on survival predictions.
Results
DeepOmicsSurv achieved a C-index of 0.966, MSE of 0.0138, RMSE of 0.1174, MAE of 0.0795, and MedAE of 0.0515, outperforming other deep learning models. Among various dimensionality reduction techniques, autoencoder performed the best with DeepOmicsSurv. SHAP analysis showed that Age, AJCC N Stage, alcohol history and patient smoking history are prevalent clinical features for survival time.
Conclusion
In conclusion, DeepOmicsSurv has the potential to predict survival time in oral cancer patients. The model achieved high accuracy across various data types, including clinical, DNA methylation + clinical, mRNA + clinical, copy number alteration + clinical, and multi-omics data. Additionally, SHAP analysis reveals clinical factors that influence survival time.
Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-02346-0.
Keywords: DeepOmicsSurv, HNSCC, Dimensionality reduction, Survival prediction, Deep learning
Introduction
Oral cancer is a type of cancer that originates in the oral cavity, including the lips, tongue, gums, floor of the mouth, buccal mucosa, hard palate, retromolar trigone, and inner cheeks [1]. The disease accounts for about 65,700 new cases and 330,000 deaths every year worldwide [2]. India has the highest mortality rate from oral cancer [3], and oral cancer accounts for 30% of the country's cancer cases [4]. In 2020, India reported about 119,992 new cases of oral cancer and, as per the World Health Organization (WHO), 72,616 deaths [3]. While India has some of the highest oral cancer rates, there has been a steady rise in both the incidence and mortality rates of oral cancer in many other countries globally [5].
The high prevalence of oral cancer is attributed to several factors, including high consumption rates of both smoking and smokeless tobacco [6]. Additionally, poor oral hygiene can increase the risk of developing oral cancer [7]. Other contributing factors include excessive alcohol consumption and human papillomavirus (HPV) infection [8]. Patients diagnosed at an early stage have a five-year survival rate of 82% [9], whereas those diagnosed at an advanced stage have a significantly lower survival rate of 27% [10]. Early detection improves quality of life and reduces complications during follow-up compared to later diagnoses [11]. Previous studies have leveraged clinical data for survival prediction in oral cancer. Hung et al. [12] used the Extreme Gradient Boosting (XGBoost) model, which resulted in a high mean square error (MSE) of 486.55. Adeoye et al. [13] and Kim et al. [14] implemented DeepSurv on clinical data, achieving C-indices of 0.89 and 0.81, respectively. DeepSurv can capture interactions between patient characteristics and treatment strategies, facilitating the development of a treatment recommender system [15]. While traditional models based solely on clinical data remain valuable, they fail to capture complex genetic mutations that may impact survival time. Recent advancements in multi-omics technologies provide valuable insights into genetic variations influencing cancer prognosis [16]. These advancements allow for a deeper understanding of molecular changes affecting patient survival outcomes. Sharma et al. [17] implemented DeepSurv with a combination of clinical and multi-omics data, achieving a C-index of 0.916. From this discussion, it can be concluded that the integration of clinical and multi-omics data enhances survival prediction accuracy. However, the high dimensionality and heterogeneity of such data present significant challenges. 
To address these challenges, deep learning models are increasingly being adopted for survival analysis tasks [18].
In this study, the DeepOmicsSurv model is proposed as an extension of DeepSurv for predicting survival time in oral cancer patients. The model incorporates advanced neural network layers, including convolutional, dropout, multi-head attention, and pooling layers. The convolutional layer extracts local patterns from the input data, while the pooling layer reduces the spatial dimensions while preserving essential features. The batch normalization layer stabilizes the training process, and the dropout layer prevents overfitting by randomly disabling a fraction of input units during training. The multi-head attention layer captures intricate relationships across various dimensions. To handle the high dimensionality of multi-omics data, different dimensionality reduction techniques are applied. The aims of this study are:

- To propose DeepOmicsSurv, which integrates clinical and multi-omics data using advanced neural network layers to predict survival time.
- To incorporate SHAP explainable AI to analyze the impact of clinical features, providing insights into their relative importance in survival prediction.
- To evaluate the performance of DeepOmicsSurv in comparison with the DeepSurv, DeepHit, Cox Proportional Hazards (CoxPH), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) deep learning models.
- To present an analysis of various dimensionality reduction techniques applied to multi-omics datasets for oral cancer survival prediction.
Materials and methods
Materials
For this study, the data of the TCGA-HNSC project was downloaded from The Cancer Genome Atlas (TCGA), which is publicly accessible. The TCGA-HNSC dataset includes a diverse range of head and neck squamous cell carcinoma (HNSC) cases, covering multiple anatomical subsites. The dataset comprises samples from the lips, base of tongue, floor of mouth, gum, palate, tonsil, oropharynx, hypopharynx, and larynx, as well as other and unspecified parts of the mouth, tongue, and pharynx. Additionally, it includes cases from bones, joints, and articular cartilage of other and unspecified sites and ill-defined regions of the oral cavity and pharynx. In TCGA, tissue samples are primarily used, requiring at least 60% tumor nuclei and less than 20% necrotic tissue for inclusion. After pathology review, nucleic acids (DNA and RNA) are extracted, undergo molecular quality control, and are distributed to Cancer Genome Characterization and Genome Sequencing Centers for high-quality genomic analysis. This standardized approach ensures the consistency and reproducibility of molecular data for cancer research. The model uses clinical, DNA methylation, Copy Number Alteration (CNA), and mRNA expression data from 528 patients. The clinical dataset includes parameters such as age, tumor stage, gender, and TNM staging, which can influence the prediction of overall survival time [19]. After preprocessing, 26 clinical features were selected; the distribution of data across features is presented in Table 1.
Table 1.
Overview of demographic characteristics of oral cancer patients
| Variables | Classification | Count | Percentage (%) |
|---|---|---|---|
| Gender | Female | 142 | 26.89 |
| | Male | 386 | 73.11 |
| | Total | 528 | |
| Age at the time of Diagnosis (in Years) | Less than 30 | 8 | 1.52 |
| | 31–40 | 14 | 2.65 |
| | 41–50 | 75 | 14.20 |
| | 51–60 | 163 | 30.87 |
| | Above 60 | 268 | 50.76 |
| | Total | 528 | |
| Race | African-American or Black | 48 | 9.09 |
| | Alaska Native or American Indian | 2 | 0.38 |
| | White | 452 | 85.61 |
| | Asian | 11 | 2.08 |
| | Missing | 15 | 2.84 |
| | Total | 528 | |
| AJCC Clinical Stage | Stage 1 | 21 | 3.98 |
| | Stage 2 | 99 | 18.75 |
| | Stage 3 | 107 | 20.27 |
| | Stage 4a | 269 | 50.95 |
| | Stage 4b | 11 | 2.08 |
| | Stage 4c | 7 | 1.33 |
| | Missing | 14 | 2.65 |
| | Total | 528 | |
| AJCC Clinical T | T1 | 37 | 7.01 |
| | T2 | 152 | 28.79 |
| | T3 | 139 | 26.33 |
| | T4 | 25 | 4.73 |
| | T4a | 156 | 29.55 |
| | T4b | 3 | 0.57 |
| | Missing | 16 | 3.03 |
| | Total | 528 | |
| AJCC Clinical N | N0 | 246 | 46.59 |
| | N1 | 85 | 16.10 |
| | N2 | 19 | 3.60 |
| | N2a | 17 | 3.22 |
| | N2b | 85 | 16.10 |
| | N2c | 45 | 8.52 |
| | N3 | 9 | 1.70 |
| | Nx | 18 | 3.41 |
| | Missing | 4 | 0.76 |
| | Total | 528 | |
| AJCC Clinical M | M0 | 496 | 93.94 |
| | M1 | 6 | 1.14 |
| | Mx | 21 | 3.98 |
| | Missing | 5 | 0.95 |
| | Total | 528 | |
| AJCC Pathologic T | T0 | 1 | 0.19 |
| | T1 | 49 | 9.28 |
| | T2 | 140 | 26.52 |
| | T3 | 101 | 19.13 |
| | T4 | 11 | 2.08 |
| | T4a | 160 | 30.30 |
| | T4b | 4 | 0.76 |
| | Tx | 39 | 7.39 |
| | Missing | 23 | 4.36 |
| | Total | 528 | |
| AJCC Pathologic N | N0 | 180 | 34.09 |
| | N1 | 68 | 12.88 |
| | N2 | 12 | 2.27 |
| | N2a | 8 | 1.52 |
| | N2b | 104 | 19.70 |
| | N2c | 48 | 9.09 |
| | N3 | 8 | 1.52 |
| | Nx | 75 | 14.20 |
| | Missing | 25 | 4.73 |
| | Total | 528 | |
| AJCC Pathologic M | M0 | 191 | 36.17 |
| | M1 | 1 | 0.19 |
| | Mx | 65 | 12.31 |
| | Missing | 271 | 51.33 |
| | Total | 528 | |
| Alcohol History | Yes | 352 | 66.67 |
| | No | 165 | 31.25 |
| | Unknown | 11 | 2.08 |
| | Total | 528 | |
| AJCC Staging | 4th | 1 | 0.19 |
| | 5th | 10 | 1.89 |
| | 6th | 125 | 23.67 |
| | 7th | 392 | 74.25 |
| | Total | 528 | |
| Overall Survival Status | Living | 304 | 57.58 |
| | Deceased | 224 | 42.42 |
| | Total | 528 | |
This study incorporated three types of multi-omics data: DNA methylation, mRNA expression, and Copy Number Alteration (CNA). mRNA expression refers to Messenger RNA (mRNA), which is transcribed from DNA and conveys genetic information during protein synthesis [20]. mRNA expression profiles reflect transcriptional activity, with dysregulated mRNA levels contributing to oncogenesis in oral cancer. Overexpression of oncogenic mRNAs and downregulation of tumor suppressor mRNAs drive tumor progression, making mRNA biomarkers valuable for diagnosis, prognosis, and therapeutic targeting. DNA methylation is an important feature used for cancer research, as it can describe the epigenetic alterations related to carcinogenesis [21]. DNA methylation modulates transcriptional activity by silencing genes, particularly tumor suppressors, through CpG island hypermethylation in oral cancer. It ensures genomic stability by suppressing transposable elements and regulates cell fate via epigenetic imprinting. Environmental carcinogens induce aberrant methylation, driving oncogenesis. Methylation biomarkers facilitate early cancer detection, prognostication, and therapeutic intervention, highlighting its significance in oral cancer. CNA refers to genetic alterations that involve the loss or increase of DNA segments, and these changes are commonly observed in cancer cells [22]. CNA involves genomic amplifications or deletions, leading to dosage imbalances of oncogenes and tumor suppressor genes in oral cancer. These alterations promote tumorigenesis by enhancing proliferative signals or impairing cell cycle control, making CNA profiling crucial for cancer classification and targeted therapy.
The dataset obtained from TCGA contains missing values, which require proper handling before further analysis. To ensure data consistency and reliability, normalization techniques are applied. Some features are categorical, making them difficult to process directly. Therefore, this study implemented a comprehensive preprocessing pipeline to prepare the data for model training. The data refinement process includes several steps as shown in Fig. 1. First, NULL values from each data type are removed, and features with significant missing data are dropped. Missing values in numerical features are replaced with the mean value, while those in categorical features are handled using the forward-fill or backward-fill method. Next, categorical variables are converted into numerical representations to facilitate uniform analysis. Finally, a Min–Max scaler is applied to normalize the dataset, rescaling values between 0 and 1 to ensure consistency across different feature ranges.
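A minimal sketch of this pipeline, assuming a pandas DataFrame; the function name, column names, and the 50% drop threshold are illustrative, not values stated in the paper:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, drop_threshold: float = 0.5) -> pd.DataFrame:
    """Sketch of the steps in Fig. 1: drop sparse features, impute, encode, scale."""
    # Drop features with significant missing data (assumed threshold: >50% missing).
    df = df.loc[:, df.isna().mean() < drop_threshold].copy()
    # Impute: mean for numerical features, forward-/backward-fill for categorical ones.
    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.columns.difference(num_cols)
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
    df[cat_cols] = df[cat_cols].ffill().bfill()
    # Convert categorical variables into numerical representations.
    for c in cat_cols:
        df[c] = df[c].astype("category").cat.codes
    # Min-Max scale every feature into [0, 1].
    df[df.columns] = MinMaxScaler().fit_transform(df)
    return df
```

Applying the function to a small frame with missing values yields a fully numeric, gap-free matrix in [0, 1].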
Fig. 1.
Steps of data pre-processing
Omics data often suffers from the curse of dimensionality, which must be addressed to improve model performance [23]. To tackle this issue, dimensionality reduction methods are employed in this study to eliminate irrelevant features from large datasets. For survival analysis, various techniques such as Principal Component Analysis (PCA) [24–29], Non-negative Matrix Factorization (NMF) [30–33], Partial Least Squares (PLS) [34–36], Multidimensional Scaling (MDS) [37–42], Kernel PCA [43–46], Singular Value Decomposition (SVD) [47–50], and autoencoders [51–54] are utilized. The number of features before and after dimensionality reduction for various data types is provided in Table 2.
Table 2.
Number of features of various datatypes
| Datatype | No. of features (before dimensionality reduction) | No. of features (after dimensionality reduction) |
|---|---|---|
| mRNA | 20,531 | 301 |
| DNA methylation | 16,529 | 229 |
| CNA | 24,776 | 467 |
Principal Component Analysis (PCA) identifies the most important patterns and directions of variation in the data, representing it in terms of these patterns. The first principal component captures the greatest variation in the data, followed by the second component, which is orthogonal to the first, and so on [25]. PCA is widely used due to its simplicity, but it can reduce interpretability, as principal components are linear combinations of the original features and may lack clear clinical relevance. Non-negative Matrix Factorization (NMF) reduces the dimension of a matrix while ensuring that all elements remain non-negative [30]. However, NMF assumes non-negativity, which may not apply to all datasets and can lead to suboptimal performance in cases with negative values. Partial Least Squares (PLS) seeks a small number of latent variables that explain variance in the original data. Unlike PCA, PLS identifies latent variables using both the predictor and response variables. However, PLS may overfit when the number of predictors significantly exceeds the sample size. Multidimensional Scaling (MDS) reduces the number of attributes while preserving relevant information [37, 38]. The primary goal of MDS is to map distances between objects into a lower-dimensional space while maintaining the original pairwise distances [39]. However, it may struggle with high-dimensional or noisy datasets, potentially distorting the mapping. Kernel PCA extends PCA by applying a non-linear transformation (kernel function) to the original data before PCA is performed. This projects the data into a higher-dimensional space, making it more linearly separable. However, choosing the correct kernel and hyperparameters is crucial, as improper selection may lead to poor results or reduced interpretability. Singular Value Decomposition (SVD) is a matrix factorization technique used for dimensionality reduction in cancer survival prediction.
It decomposes a feature matrix into singular values, left singular vectors, and right singular vectors, representing the original data [47, 48]. However, SVD assumes linearity and does not account for non-linear relationships inherent in many survival datasets. Autoencoders, a neural network-based dimensionality reduction technique, aim to minimize reconstruction error. They learn a compressed representation of input data, capturing essential features while discarding noise and irrelevant information. The reduced-dimensional representation can then be used as input for survival analysis models [53]. However, autoencoders require significant computational resources, careful hyperparameter tuning, and sufficient data to avoid overfitting.
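As an illustration, several of the techniques above share scikit-learn's `fit_transform` interface, so swapping reducers is a one-line change. The matrix shape and component count below are placeholders, not the values in Table 2:

```python
import numpy as np
from sklearn.decomposition import PCA, NMF, TruncatedSVD, KernelPCA

rng = np.random.default_rng(0)
# Non-negative toy data so that NMF is also applicable.
X = np.abs(rng.normal(size=(100, 500)))

reducers = {
    "PCA": PCA(n_components=10),
    "KernelPCA": KernelPCA(n_components=10, kernel="rbf"),
    "NMF": NMF(n_components=10, init="nndsvda", max_iter=500),
    "SVD": TruncatedSVD(n_components=10),
}
# Each reducer maps the 500-dimensional features down to 10 components.
reduced = {name: r.fit_transform(X) for name, r in reducers.items()}
for name, Z in reduced.items():
    print(name, Z.shape)  # each yields (100, 10)
```

An autoencoder would play the same role, with the bottleneck layer's activations serving as the reduced representation.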
DeepOmicsSurv
In this study, the DeepOmicsSurv model is proposed for predicting survival time in oral cancer patients. It is built upon the DeepSurv model with the addition of advanced neural network layers, including convolutional, pooling, multi-head attention, and dropout layers. These enhancements enable the model to capture complex relationships within the input data. The architecture of the proposed model is illustrated in Fig. 2.
Fig. 2.
Architecture of DeepOmicsSurv
The detailed architecture of proposed model with explanation of each layer is mentioned below:
Input layer
The input layer takes as input the feature matrix with shape $(n, p, 1)$. Here, $n$ is the number of observations, $p$ is the number of features, which can be very large, especially in genomics, and the last dimension (1) represents the channel dimension, necessary for convolutional operations, even if there is a single channel. The feature matrix is computed using Eq. 1.

$$X = [x_1, x_2, \ldots, x_p], \qquad X \in \mathbb{R}^{n \times p \times 1} \tag{1}$$

where each $x_i$ represents a feature for a given sample.
Convolutional layers
Convolutional layers are applied to the high-dimensional multi-omics data to capture local patterns and the relationships between features. Mathematically, for an input with shape $(n, p, 1)$, a filter of size $k$, and stride $s$, the output feature map at position $i$ is given by Eq. 2.

$$z_i = \sum_{j=1}^{k} w_j \, x_{i \cdot s + j} + b \tag{2}$$

where $x_{i \cdot s + j}$ is the input segment and $w_j$ is the corresponding filter weight. Padding = 'same' ensures that the output length of the feature map is equal to the number of inputs, by zero-padding if necessary.
Pooling layers
Max pooling reduces the feature map by taking the maximum value over a window, down-sampling the input and thereby reducing the computational complexity, as in Eq. (3).

$$z_i^{\text{pool}} = \max_{j = 1, \ldots, m} z_{(i-1) \cdot m + j} \tag{3}$$

where $m$ is the pooling size, which decrements the dimension of the feature map. Dropout is then applied to the pooled output, as in Eq. (4).

$$\tilde{z}_i = z_i^{\text{pool}} \cdot M_i, \qquad M_i \sim \mathrm{Bernoulli}(1 - r) \tag{4}$$

The dropout rate $r$ is typically between 0.1 and 0.5 to prevent overfitting.
Residual operation
Residual connections add the input of a layer to its output, which helps counteract vanishing gradients and allows for deeper networks, using Eq. 5.

$$y = F(x) + x \tag{5}$$

where $x$ is the layer input and $F(x)$ is the layer's transformation of it.
Muti-head attention layer
This layer is designed to capture intricate relationships between various features. The feature vector is transformed into query ($Q$), key ($K$), and value ($V$) matrices. Attention weights are computed as in Eq. 6.

$$A = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) \tag{6}$$

where $d$ is the attention dimension. The attention-weighted output is given in Eq. (7).

$$\mathrm{Attention}(Q, K, V) = A\,V \tag{7}$$
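Equations 6 and 7 correspond to standard scaled dot-product attention; a NumPy sketch for a single head (matrix sizes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """A = softmax(Q K^T / sqrt(d)) (Eq. 6); output = A V (Eq. 7)."""
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))  # (n_queries, n_keys) attention weights
    return A @ V, A

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, A = scaled_dot_product_attention(Q, K, V)
# Each row of A is a probability distribution over the keys.
```

A multi-head layer runs several such heads in parallel on learned projections of the input and concatenates their outputs.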
Global average pooling
This layer squashes the spatial dimension of its input to one, essentially summarizing each feature map, with Eq. (8).

$$g_c = \frac{1}{p} \sum_{i=1}^{p} z_{i,c} \tag{8}$$

where $g_c$ is the average of feature map $c$ over all $p$ positions.
Dense layers
Dense (fully connected) layers are the workhorses in learning complex feature interactions. They are used after the convolutional and global average pooling layers to process the extracted information before preparing it for the final output. A dense layer performs a linear transformation and then applies the ReLU activation function, using Eq. 9.

$$h^{(j)}_k = f\!\left(\sum_{i=1}^{n_j} w^{(j)}_{ki}\, a^{(j-1)}_i + b^{(j)}_k\right) \tag{9}$$

where $h^{(j)}_k$ is the output, $n_j$ is the number of neurons, $f$ is the activation function applied to the layer, and $a^{(j-1)}$ is the input to the $j$th dense layer, which can be the output from a previous layer or the input to the network.

The entire operation for a dense layer can be written as in Eq. (10).

$$h^{(j)} = f\!\left(W^{(j)} a^{(j-1)} + b^{(j)}\right) \tag{10}$$

where $W^{(j)}$ represents the weight matrix, $b^{(j)}$ is the bias vector, and $f$ is the activation function.
Batch normalization
This layer normalizes the outputs of the dense layer to a variance of 1 and a mean of 0. This helps stabilize and speed up the training process, as shown in Eq. (11).

$$y = \mathrm{BN}\!\left(h^{(j)}\right) \tag{11}$$

where $y$ is the output of the batch normalization layer. Batch normalization adjusts and scales the output from the dense layer. The batch normalization operation is defined as in Eq. (12).

$$\mathrm{BN}(h) = \gamma \, \frac{h - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \tag{12}$$

where $\mu$ represents the batch mean, $\sigma^2$ is the batch variance, $\gamma$ and $\beta$ are learnable parameters, and $\epsilon$ is a small constant added for numerical stability.
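Equation 12 can be checked numerically: with $\gamma = 1$ and $\beta = 0$, the normalized batch has mean ≈ 0 and variance ≈ 1 in each feature. A minimal sketch:

```python
import numpy as np

def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    """BN(h) = gamma * (h - mu) / sqrt(var + eps) + beta  (Eq. 12)."""
    mu = h.mean(axis=0)       # per-feature batch mean
    var = h.var(axis=0)       # per-feature batch variance
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

# A batch far from zero mean / unit variance before normalization.
h = np.random.default_rng(2).normal(loc=5.0, scale=3.0, size=(64, 4))
y = batch_norm(h)
```

In a trained network, $\gamma$ and $\beta$ let the layer recover any useful scale and shift the normalization removed.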
Dropout
This layer randomly disables a portion of the input units during each training iteration, as depicted in Eq. (13). This technique helps reduce overfitting by preventing the model from becoming too reliant on any individual neuron.

$$\tilde{h} = \mathrm{Dropout}(h, r) \tag{13}$$

where $\tilde{h}$ is the output and $r$ is the dropout rate.

The dropout operation can be expressed as in Eq. (14).

$$\tilde{h} = h \odot M \tag{14}$$

where $\odot$ denotes element-wise multiplication and $M$ represents a binary mask with the same shape as $h$, where each element is zero with probability $r$ and one with probability $1 - r$.
Output layer
The model has an output layer that predicts the log hazard function for each sample. This layer is essential in survival analysis, as it outputs the estimated log hazard ratios, which are then used to compute the risk of an event occurring. This final dense layer produces the raw outputs: scalar values representing the log hazard function for each sample, using Eq. (15).

$$\hat{h}(x) = \mathrm{Dense}_1(h) \tag{15}$$

where $h$ is the output of the last hidden dense layer, $\hat{h}(x)$ is the output from the final dense layer, and $\mathrm{Dense}_1$ indicates a dense layer with 1 unit and a linear activation function, producing a single output value per sample, with L2 regularization applied to its weights.

The operation can be expressed as in Eq. (16).

$$\hat{h}(x) = w^{\top} h + b \tag{16}$$

where $w$ is the weight vector and $b$ is the bias of the output layer.
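The layer stack described above can be condensed into a Keras sketch. All layer sizes here are illustrative placeholders, not the tuned values from the grid search, and `build_deepomicssurv` is a hypothetical helper name:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_deepomicssurv(n_features: int) -> tf.keras.Model:
    """Illustrative stack: Conv1D -> max pooling -> attention (residual) -> dense head."""
    inp = layers.Input(shape=(n_features, 1))
    x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(pool_size=2)(x)
    # Multi-head self-attention with a residual connection around it (Eq. 5).
    attn = layers.MultiHeadAttention(num_heads=2, key_dim=16)(x, x)
    x = layers.Add()([x, attn])
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    # Single linear unit: the predicted log hazard, with L2-regularized weights.
    out = layers.Dense(1, activation="linear",
                       kernel_regularizer=tf.keras.regularizers.l2(0.01))(x)
    return tf.keras.Model(inp, out)

model = build_deepomicssurv(100)
```

The single linear output unit is what makes the network usable as the risk function inside a Cox-style loss.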
Loss function
The model is optimized using a hybrid loss function, combining the negative log-likelihood (NLL), MSE, and a regularization penalty:

$$\mathcal{L}_{\mathrm{NLL}} = -\sum_{i:\, E_i = 1} \left( \hat{h}(x_i) - \log \sum_{j \in R(T_i)} e^{\hat{h}(x_j)} \right) \tag{17}$$

where $E_i$ is the event indicator, $R(T_i)$ is the set of patients still at risk at time $T_i$, and $\hat{y}_i$ is the predicted value.

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{18}$$

$$\mathcal{L}_{\mathrm{reg}} = \lambda \sum \lVert W \rVert_2^2 \tag{19}$$

The total loss is shown in Eq. (20).

$$\mathcal{L} = w_1 \mathcal{L}_{\mathrm{NLL}} + w_2 \mathcal{L}_{\mathrm{MSE}} + w_3 \mathcal{L}_{\mathrm{reg}} \tag{20}$$

where $w_1$, $w_2$, $w_3$ are the weights.
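Assuming the NLL term takes the standard Cox partial-likelihood form used by DeepSurv, the hybrid loss can be sketched in NumPy; the function names and default weights are illustrative:

```python
import numpy as np

def cox_nll(log_h, time, event):
    """Negative log partial likelihood: for each event, log-hazard minus the
    log-sum-exp of log-hazards over the risk set (patients with time >= T_i)."""
    order = np.argsort(-time)                  # sort by descending survival time
    log_h, event = log_h[order], event[order]
    log_risk = np.logaddexp.accumulate(log_h)  # running log-sum-exp = risk-set term
    return -np.sum((log_h - log_risk)[event == 1])

def hybrid_loss(log_h, y_pred, y_true, time, event, weight_mats, w=(1.0, 1.0, 0.01)):
    """Total loss: w1 * NLL + w2 * MSE + w3 * L2 penalty on the weight matrices."""
    nll = cox_nll(log_h, time, event)
    mse = np.mean((y_true - y_pred) ** 2)
    reg = sum(np.sum(W ** 2) for W in weight_mats)
    return w[0] * nll + w[1] * mse + w[2] * reg
```

In training, the same combination would be expressed as a differentiable loss in the deep learning framework rather than NumPy.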
Grid search
To optimize the performance of the DeepOmicsSurv model, a grid search approach is employed to explore various combinations of hyperparameters. These hyperparameters include the number of dense layers, nodes per dense layer, number of convolution layers, filters per convolutional layer, convolutional kernel size, number of attention heads, regularization weights, learning rate, and dropout rate. The grid search process systematically evaluates all possible hyperparameter combinations using a validation set. For each combination, the model is trained on the training data, and its performance is assessed using the C-Index and Mean Squared Error (MSE). A weighted score is then calculated as described in Eq. (21).
$$\mathrm{Score} = \text{C-index} - \beta \cdot \mathrm{MSE} \tag{21}$$

where $\beta$ is the weight for MSE. The combination yielding the highest weighted score is selected as the optimal hyperparameter set. The values for the various hyperparameters supplied to the grid search are presented in Table 3.
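A minimal sketch of the selection loop, assuming a score of the form C-index minus a weighted MSE; the `evaluate` callback (which would train on the training split and score on the validation split) and the toy grid are hypothetical:

```python
import itertools

def grid_search(param_grid, evaluate, beta=0.3):
    """Exhaustively evaluate all combinations and keep the highest weighted score."""
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        c_index, mse = evaluate(params)   # validation-set metrics for this combination
        score = c_index - beta * mse      # assumed weighted-score form (Eq. 21)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy evaluator: pretend lower dropout helps, and learning rate 0.01 minimizes MSE.
grid = {"dropout": [0.2, 0.4], "lr": [0.001, 0.01, 0.1]}
fake = lambda p: (0.9 - p["dropout"] / 10, 0.02 if p["lr"] == 0.01 else 0.05)
best, score = grid_search(grid, fake)  # -> {"dropout": 0.2, "lr": 0.01}
```

With the grid in Table 3, this loop would evaluate 3 × 3 × 2 × 3 × 2 × 2 × 3 × 3 × 2 combinations.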
Table 3.
Hyper parameters tuning features
| No. of dense layers | [2, 4, 8] |
| No. of nodes in each dense layer | [16,32,64] |
| No. of convolution layers | [2, 4] |
| Number of filters in each convolutional layer | [16,32,64] |
| Size of the convolutional kernel | [3, 5] |
| No. of heads for the attention layer | [2, 4] |
| Regularization weights(L1,L2) | [0.001,0.01,0.1] |
| Learning rate | [0.001,0.01,0.1] |
| Dropout rate | [0.2,0.4] |
Model’s performance evaluation
The model's performance is assessed using several metrics: C-index, MSE, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Median Absolute Error (MedAE). Harrell's C-index is a performance metric used to evaluate the predictive ability of regression models [55]. The formula for the concordance index in regression [56] is given by Eq. (22):
$$\text{C-index} = \frac{\text{Concordant pairs} + 0.5 \times \text{Ties}}{\text{Total number of pairs}} \tag{22}$$
Concordant pairs refer to sample pairs where the predicted value for one sample is higher than for another, and the true value follows the same pattern. Ties occur when both samples have identical predicted values, and the Total number of pairs represents all possible pairs in the dataset.
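A direct implementation of Eq. 22 over all sample pairs (pairs tied on the true value are treated as non-comparable, a common convention):

```python
from itertools import combinations

def c_index(y_true, y_pred):
    """(concordant pairs + 0.5 * prediction ties) / total comparable pairs (Eq. 22)."""
    concordant = ties = total = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue                      # tied true values: pair not comparable
        total += 1
        if p1 == p2:
            ties += 1                     # tied predictions count half
        elif (t1 < t2) == (p1 < p2):
            concordant += 1               # predictions ordered like the true values
    return (concordant + 0.5 * ties) / total
```

Perfectly ordered predictions give a C-index of 1.0, perfectly reversed predictions give 0.0, and random predictions hover around 0.5.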
Mean Squared Error (MSE) evaluates the performance of a regression model [57] by computing the average squared differences between predicted and actual values. The MSE formula is given by Eq. (23):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{23}$$

where $n$ is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value. MSE provides the average squared difference between predicted and actual values. RMSE is the square root of MSE, reflecting the average distance between predicted and actual values, expressed in the same units as the target variable [58]. The formula for RMSE is given in Eq. (24).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{24}$$
Mean Absolute Error (MAE) represents the average absolute difference between actual and predicted values [59]. The formula for MAE is given by Eq. (25):
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{25}$$

where $\hat{y}_i$ is the predicted value.
Median Absolute Error (MedAE) is another metric used for regression model evaluation, calculating the median of absolute differences between true and predicted values [59]. The formula for MedAE is given by Eq. (26):
$$\mathrm{MedAE} = \mathrm{median}\!\left( \left| y_1 - \hat{y}_1 \right|, \ldots, \left| y_n - \hat{y}_n \right| \right) \tag{26}$$
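Equations 23–26 amount to a few lines of NumPy; a small worked example:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and MedAE (Eqs. 23-26)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MedAE": np.median(np.abs(err)),
    }

m = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
# errors = [0, 0, -2] -> MSE = 4/3, MAE = 2/3, MedAE = 0
```

Note how MedAE ignores the single large error that dominates MSE, which is why the two metrics can rank models differently.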
Model Interpretability
The SHapley Additive Explanations (SHAP) summary plot underscores the critical role of various features in influencing the model’s predictions, as shown in Fig. 3a. Diagnosis Age emerged as the most impactful feature, highlighting its central role in determining patient outcomes. This is followed by the Neoplasm Disease Lymph Node Stage (AJCC_N_Stage), which appears twice, reinforcing its significance in cancer staging and disease progression. Lifestyle factors such as Smoking History and Alcohol History also contribute to the model’s predictions, aligning with the clinical understanding of their impact on overall health and survival. Features related to cancer staging, including AJCC Tumor Stage and AJCC T Stage, exhibit high SHAP values, further validating their prognostic importance in evaluating disease severity.
Fig. 3.
a SHAP summary plot; b SHAP feature importance bar plot on a clinical dataset
As depicted in Fig. 3b, the SHAP feature importance bar plot provides a clearer ranking of these variables, confirming the dominance of Diagnosis Age and AJCC_N_Stage in shaping patient outcomes. Demographic variables such as Race Category and Ethnicity Category, while less influential compared to biological markers, offer valuable context for understanding disparities in survival rates. Lower-impact features, including Neoadjuvant Therapy and Prior Cancer Diagnosis, may contribute minimally to the model and could be considered for removal to enhance model efficiency. The SHAP analysis not only corroborates the significance of well-established clinical indicators but also offers valuable insights into the interplay between lifestyle, demographic, and pathological factors, reinforcing the model’s clinical relevance and potential for improving prognostic assessments.
Proposed methodology
The methodology employed to enhance survival time prediction integrates clinical data with DNA methylation, copy number alteration, and mRNA expression data, as illustrated in Fig. 4. Dimensionality reduction techniques are applied individually to the pre-processed data of each omic type, simplifying the data streams and highlighting the most critical features or components for each type. Subsequently, the dimensionally reduced features from DNA methylation, copy number alteration, and mRNA expression data are merged with clinical features by aligning them on common identifiers, such as PatientId. The result is a 2-D feature matrix that combines all the different data types into a single dataset used for further analysis and modeling. Once the data is integrated, 70% of it is used for training and 30% for testing, with the training set further divided in an 80:20 ratio for training and validation purposes. The DeepOmicsSurv model is trained on the training subset, validated on the validation subset, and evaluated on the testing subset. The model predicts survival time in months, and performance is assessed using metrics such as the c-index, MSE, RMSE, MAE, and MedAE. By utilizing both multi-omics and clinical data, the model demonstrates its capabilities in survival prediction alongside other deep learning models.
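The nested 70/30 and 80:20 splits can be sketched with scikit-learn; the feature matrix and survival times below are synthetic stand-ins:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(3).normal(size=(528, 20))      # 528 patients, toy features
y = np.random.default_rng(4).uniform(0, 120, size=528)   # synthetic survival time (months)

# First split: 70% for training+validation, 30% held out for testing.
X_trv, X_test, y_trv, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Second split: the 70% portion is divided 80:20 into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(X_trv, y_trv, test_size=0.2, random_state=42)
```

The fixed `random_state` values are illustrative; they simply make the splits reproducible.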
Fig. 4.
Detailed methodology
Experiment results
In this study, the model named DeepOmicsSurv is developed to predict survival time. The proposed model is assessed against several deep learning models, including DeepSurv, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), DeepHit, and CoxPH, across multiple metrics.
Dimensionality reduction is crucial in handling the high-dimensional data inherent in multi-omics and clinical datasets. An analysis of the dimensionality reduction techniques PCA, Autoencoder, Kernel-PCA, NMF, SVD, MDS, and PLS is performed with the DeepOmicsSurv model. Table 4 summarizes the results for multi-omics data with the various dimensionality reduction techniques.
Table 4.
Performance of DeepOmicsSurv with dimensionality reduction techniques (multi-omics data)
| Technique | C-index | MSE | RMSE | MAE | MedAE |
|---|---|---|---|---|---|
| PCA (Principal Component Analysis) | 0.9562 | 0.0226 | 0.1504 | 0.096 | 0.0535 |
| Autoencoder | 0.966 | 0.0138 | 0.1174 | 0.0795 | 0.0415 |
| Kernel-PCA | 0.963 | 0.0168 | 0.1298 | 0.0898 | 0.0521 |
| NMF (Non-negative Matrix Factorization) | 0.9378 | 0.0199 | 0.1413 | 0.0835 | 0.045 |
| SVD (Singular Value Decomposition) | 0.9553 | 0.0272 | 0.165 | 0.087 | 0.4334 |
| MDS (MultiDimensional Scaling) | 0.9442 | 0.0217 | 0.1475 | 0.0892 | 0.0672 |
| PLS (Partial Least Squares) | 0.9442 | 0.0204 | 0.1431 | 0.0908 | 0.0506 |
Bold values indicate better results
The results in Table 4 demonstrate that DeepOmicsSurv consistently achieves high predictive accuracy across dimensionality reduction techniques. The Autoencoder obtained the highest c-index (0.966) and the lowest MSE (0.0138), RMSE (0.1174), MAE (0.0795), and MedAE (0.0415), showcasing its strong ability to handle complex, high-dimensional data. Kernel-PCA, with a c-index of 0.963, MSE of 0.0168, and RMSE of 0.1298, also performed well, closely following the Autoencoder. NMF and SVD performed slightly worse, with c-indices of 0.9378 and 0.9553, respectively.
Also, the performance of DeepOmicsSurv model is compared with other deep learning models, including DeepSurv, CNN, RNN, DeepHit, and CoxPH using both clinical and multi-omics data. For fair comparison, these models are implemented on the data used in this study. The comparative performance of various deep learning models with DeepOmicsSurv, using clinical data is detailed in Table 5.
Table 5.
Comparative analysis of DeepOmicsSurv with other models (clinical data only)
| Model | C-index | MSE | RMSE | MAE | MedAE |
|---|---|---|---|---|---|
| DeepOmicsSurv | 0.8912 | 0.0316 | 0.178 | 0.115 | 0.0804 |
| DeepSurv | 0.8006 | 0.08086 | 0.2844 | 0.2508 | 0.2149 |
| CNN (Convolutional Neural Network) | 0.5258 | 0.1337 | 0.3657 | 0.2932 | 0.2464 |
| RNN (Recurrent Neural Network) | 0.5747 | 0.1333 | 0.2825 | 0.3271 | 0.2906 |
| DeepHit | 0.7898 | 9.0077 | 3.0012 | 2.6337 | 2.8941 |
| CoxPH | 0.8871 | 263.23 | 16.224 | 7.763 | 0.5431 |
Bold values indicate better results
With clinical data, DeepOmicsSurv outperformed the other models, achieving a c-index of 0.8912, MSE of 0.0316, RMSE of 0.178, MAE of 0.115, and MedAE of 0.0804. From this, it can be inferred that the DeepOmicsSurv architecture is better than previous deep learning models at predicting survival time. It is superior to DeepSurv (c-index: 0.8006) and CoxPH (c-index: 0.8871); although CoxPH achieved a c-index close to that of DeepOmicsSurv, its error metrics (MSE, RMSE, MAE) are much higher, suggesting that while the c-indices may be similar, DeepOmicsSurv provides more reliable and stable predictions.
The comparison of different deep learning models with DeepOmicsSurv using multi-omics data is shown in Table 6.
Table 6.
Comparative analysis of DeepOmicsSurv with other models (multi-omics data)
| Model | C-index | MSE | RMSE | MAE | MedAE |
|---|---|---|---|---|---|
| DeepOmicsSurv | 0.966 | 0.0138 | 0.1174 | 0.0795 | 0.0515 |
| DeepSurv | 0.8702 | 0.0577 | 0.17599 | 0.0898 | 0.069 |
| CNN (Convolutional Neural Network) | 0.5768 | 0.4248 | 0.6518 | 0.5986 | 0.5677 |
| RNN (Recurrent Neural Network) | 0.6112 | 0.0373 | 0.1937 | 0.1389 | 0.1017 |
| DeepHit | 0.8175 | 4.1103 | 2.0273 | 1.7515 | 1.7932 |
| CoxPH | 0.6969 | 1.7192 | 1.3112 | 1.0371 | 0.8749 |
Bold values indicate better results
DeepOmicsSurv outperforms all other models across all metrics when using multi-omics data. It achieves the highest c-index (0.966), indicating superior discriminatory power, and significantly lower MSE (0.0138), RMSE (0.1174), MAE (0.0795), and MedAE (0.0515) values, highlighting its improved accuracy in survival prediction. In comparison, DeepSurv achieved a c-index of 0.8702, while CNN and RNN exhibited lower performance, with c-indices of 0.5768 and 0.6112, respectively. DeepHit and CoxPH were also less competitive, with c-indices of 0.8175 and 0.6969, respectively. These results show that DeepOmicsSurv provides a more accurate and reliable prediction of survival outcomes. The performance of DeepOmicsSurv on the various data types is presented in Table 7.
Table 7.
Comparative analysis of DeepOmicsSurv based on different datatypes
| Datatype | C-index | MSE | RMSE | MAE | MedAE |
|---|---|---|---|---|---|
| Multi-omics | 0.966 | 0.0138 | 0.1174 | 0.0795 | 0.0515 |
| Clinical | 0.8912 | 0.0316 | 0.178 | 0.115 | 0.0804 |
| DNA Methylation + clinical | 0.9139 | 0.0208 | 0.1755 | 0.107 | 0.054 |
| mRNA + clinical | 0.9048 | 0.0257 | 0.1606 | 0.1107 | 0.0754 |
| CNA + clinical | 0.9103 | 0.0183 | 0.1352 | 0.0846 | 0.0589 |
Bold values indicate better results
DeepOmicsSurv performed best with multi-omics data (c-index 0.966, MSE 0.0138, RMSE 0.1174, MAE 0.0795, and MedAE 0.0515), followed by DNA Methylation + Clinical (c-index 0.9139) and CNA + Clinical (c-index 0.9103). Clinical data alone yielded a c-index of 0.8912, lower than that of the multi-omics and combined datatypes but still demonstrating strong predictive capability. These results suggest that combining clinical data with omics data types such as DNA methylation or CNA improves performance over clinical data alone, and that multi-omics data allows the model to learn complex interactions between genomic and clinical features for survival analysis.
To further assess the effectiveness of the proposed method, a comparison with existing models is conducted, and the results are presented in Table 8. The proposed model outperforms previous models in predicting survival time, achieving exceptionally low values for MSE (0.0138), RMSE (0.1174), MAE (0.0795), and MedAE (0.0515), alongside the highest c-index of 0.966. It is important to note that not all studies report all evaluation metrics, and these comparisons should be interpreted with caution due to potential variations in datasets, model architectures, and other influencing factors. Overall, the proposed method exhibits promising potential for predicting patient outcomes with high accuracy.
Table 8.
Comparison of proposed method with existing models
| Authors | Dataset used | No. of patients | Method used | C-index | MSE | RMSE | MAE | MedAE |
|---|---|---|---|---|---|---|---|---|
| Sharma et al. [17] | Clinical + Multi-omics | 528 | DeepSurv | 0.916 | – | – | – | – |
| Kim et al. [14] | Clinical | 255 | DeepSurv | 0.810 | – | – | – | – |
| Hung et al. [12] | Clinical | 257,880 | XGBoost | – | 486.55 | 22.06 | 13.55 | – |
| Adeoye et al. [13] | Clinical | 313 | DeepSurv | 0.89 | – | – | – | – |
| Proposed model | Clinical + Multi-omics | 528 | DeepOmicsSurv | 0.966 | 0.0138 | 0.1174 | 0.0795 | 0.0515 |
Bold values indicate better results
Discussion
This study proposes the DeepOmicsSurv model, which uses clinical and multi-omics data to predict survival time in oral cancer patients. With the addition of advanced neural network layers, the model makes predictions more accurately than other deep learning models, and it performs well on every data type considered. PCA, NMF, Kernel-PCA, Autoencoder, SVD, MDS, and PLS were used in this work to address the high dimensionality of the data.
The DeepOmicsSurv model was evaluated on a dataset of 528 patients and obtained a C-index of 0.966, MSE of 0.0138, RMSE of 0.1174, MAE of 0.0795, and MedAE of 0.0515, outperforming all other implemented deep learning models. These results indicate that the model can capture complex relationships in the input that existing models may miss. They also underline the value of integrating clinical and multi-omics data. Even with clinical data alone, however, the model performed best among all compared models, which reveals its potential to work with any data type and makes it suitable for clinical workflows even when multi-omics data are not available.
Comparing these results with previous studies shows that DeepOmicsSurv outperformed many existing models in predictive accuracy. Sharma et al. [17] reported a C-index of 0.916, which is lower than that of DeepOmicsSurv (c-index: 0.966). Kim et al. [14] (C-index: 0.810) and Adeoye et al. [13] (C-index: 0.89) used the DeepSurv model, but their results fall short of what DeepOmicsSurv achieved even with clinical data alone (c-index: 0.8912). These figures indicate the ability of DeepOmicsSurv to provide a more accurate prediction of survival time with any type of data. DeepOmicsSurv with the Autoencoder approach demonstrated its effectiveness in capturing complex data relationships by attaining the highest C-index of 0.966 and the lowest MSE of 0.0138. In comparison, models like DeepSurv and CoxPH showed relatively lower C-index values and higher error metrics. This aligns with recent studies that suggest the superiority of deep learning models over traditional statistical methods, especially when handling multi-omics data [14, 17].
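The autoencoder-based reduction discussed above can be illustrated with a minimal NumPy sketch: an encoder compresses the omics features to a low-dimensional code by minimizing reconstruction error, and that code is what a survival network would consume. The data, layer sizes, and hyperparameters below are toy assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))   # toy stand-in for an omics matrix (patients x features)

d_in, d_hid, lr, epochs = X.shape[1], 8, 1e-3, 300
W1 = rng.normal(scale=0.1, size=(d_in, d_hid)); b1 = np.zeros(d_hid)   # encoder
W2 = rng.normal(scale=0.1, size=(d_hid, d_in)); b2 = np.zeros(d_in)    # decoder

def forward(X):
    H = np.tanh(X @ W1 + b1)     # low-dimensional code
    return H, H @ W2 + b2        # code and reconstruction

losses = []
for _ in range(epochs):
    H, Xr = forward(X)
    err = Xr - X
    losses.append(float(np.mean(err ** 2)))
    # gradients of the squared reconstruction error, averaged over patients
    dXr = 2.0 * err / X.shape[0]
    dW2, db2 = H.T @ dXr, dXr.sum(axis=0)
    dH = (dXr @ W2.T) * (1.0 - H ** 2)    # tanh derivative
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

Z, _ = forward(X)  # Z: the 8-dimensional embedding a survival network would take as input
print(f"reconstruction loss {losses[0]:.4f} -> {losses[-1]:.4f}; embedding {Z.shape}")
```

Unlike PCA, the nonlinear activation lets the code capture non-linear feature interactions, which is consistent with the autoencoder outperforming the linear reduction techniques in Table 4.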
Although the results are promising, the proposed work has certain limitations. The major limitation is the use of TCGA-HNSC data only. While this dataset is widely used, only internal validation was performed in this study, and the lack of external validation limits generalizability. In future work, the model will be applied to real-world data from hospitals. The proposed methodology uses only four types of data, namely clinical, CNA, DNA methylation, and mRNA; future work may incorporate additional data types, such as miRNA and protein data. The model used various dimensionality reduction techniques to select the most relevant features for survival time prediction, but in the future key molecular markers will be isolated from the complete dataset to make the approach suitable for clinical application.
Conclusion
The present study proposes the DeepOmicsSurv model, which accurately predicts the survival time of oral cancer patients. The model achieved outstanding performance with multi-omics data: a C-index of 0.966, MSE of 0.0138, RMSE of 0.1174, MAE of 0.0795, and MedAE of 0.0515. This highlights the ability of DeepOmicsSurv to capture complex relationships within high-dimensional clinical and multi-omics data. Among the dimensionality reduction techniques evaluated, the Autoencoder performed best. With clinical data, DeepOmicsSurv achieved a C-index of 0.8912, MSE of 0.0316, RMSE of 0.178, MAE of 0.115, and MedAE of 0.0804, yielding more precise survival predictions than traditional models, including CoxPH and DeepHit. The SHAP analysis reveals the clinical features most important for the survival time of oral cancer patients. Thus, DeepOmicsSurv significantly improves predictive power, and with further refinement and validation on larger datasets it holds great potential as a valuable tool in precision medicine.
Supplementary Information
Acknowledgements
We would like to thank the anonymous reviewers for their constructive feedback that helped us to improve the manuscript.
Author contributions
D.G., N.G., and P.K. conceptualized the study. D.G. acquired the data and performed data analysis and interpretation. D.G. prepared the manuscript draft. All authors reviewed the manuscript.
Funding
No funds, grants, or other support was received.
Data availability
The data used in this research can be accessed at National Cancer Institute. "Genomic Data Commons Data Portal: TCGA-HNSC." 2021, https://portal.gdc.cancer.gov/projects/TCGA-HNSC.
Code availability
The source code for the project is available upon request. To obtain access to the code, please send an email to goyaldeepali1@gmail.com with the request, and we will provide you with the necessary instructions and access details.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Khalili J. Oral cancer: risk factors, prevention and diagnostic. Exp Oncol. 2008;30(4):259–64. [PubMed] [Google Scholar]
- 2.Annual report to the nation 2021: national trends in cancer death rates infographic. Surveillance, epidemiology, and end results program, National institutes of health. 2021. https://seer.cancer.gov/report_to_nation/infographics/trends_mortality.html. Accessed 20 Feb 2025.
- 3.Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209–49. 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]
- 4.Li Y, Feng Y, Cao P, et al. Role of synovium-derived fibrous cartilage in temporomandibular joint synovial chondromatosis. J Oral Pathol Med. 2019;48:79–86. 10.1111/jop.12788. [DOI] [PubMed] [Google Scholar]
- 5.Dhanuthai K, Rojanawatsirivej S, Thosaporn W, Kintarak S, Subarnbhesaj A, Darling M, Kryshtalskyj E, Chiang CP, Shin HI, Choi SY, Lee SS, Aminishakib P. Oral cancer: a multicenter study. Med Oral Patol Oral Cir Bucal. 2018;23(1):e23–9. 10.4317/medoral.21999. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Borse V, Konwar AN, Buragohain P. Oral cancer diagnosis and perspectives in India. Sens Int. 2020;1: 100046. 10.1016/j.sintl.2020.100046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Niaz K, Maqbool F, Khan F, Bahadar H, Ismail Hassan F, Abdollahi M. Smokeless tobacco (paan and gutkha) consumption, prevalence, and contribution to oral cancer. Epidemiol Health. 2017;9(39): e2017009. 10.4178/epih.e2017009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Irani S. New insights into oral cancer-risk factors and prevention: a review of literature. Int J Prev Med. 2020;30(11):202. 10.4103/ijpvm.IJPVM_403_18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.González-Moles MÁ, Aguilar-Ruiz M, Ramos-García P. Challenges in the early diagnosis of oral cancer, evidence gaps and strategies for improvement: a scoping review of systematic reviews. Cancers (Basel). 2022;14(19):4967. 10.3390/cancers14194967. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Sandra SC, Raghavan A, Madan Kumar PD. Application of artificial intelligence in the diagnosis and survival prediction of patients with oral cancer: a systematic review. J Oral Res Rev. 2022;14:154–60. [Google Scholar]
- 11.Thavarool SB, Muttath G, Nayanar S, Duraisamy K, Bhat P, Shringarpure K, Nayak P, Tripathy JP, Thaddeus A, Philip S. Improved survival among oral cancer patients: findings from a retrospective study at a tertiary care cancer center in rural Kerala, India. World J Surg Oncol. 2019;17(1):15. 10.1186/s12957-018-1550-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Hung M, et al. Artificial intelligence in dentistry: harnessing big data to predict oral cancer survival. World J Clin Oncol. 2020;11(11):918. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Adeoye J, et al. Comparison of time-to-event machine learning models in predicting oral cavity cancer prognosis. Int J Med Inform. 2022;157:104635. [DOI] [PubMed] [Google Scholar]
- 14.Kim DW, et al. Deep learning-based survival prediction of oral cancer patients. Sci Rep. 2019;9(1):6994. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18:1–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Tran TO, Vo TH, Le NQ. Omics-based deep learning approaches for lung cancer decision-making and therapeutics development. Brief Funct Genomics. 2024;23(3):181–92. [DOI] [PubMed] [Google Scholar]
- 17.Sharma D, Garg VK, Kashyap D, Goel N. A deep learning-based integrative model for survival time prediction of head and neck squamous cell carcinoma patients. Neural Comput Appl. 2022;34(23):21353–65. [Google Scholar]
- 18.Le NQ. Hematoma expansion prediction: still navigating the intersection of deep learning and radiomics. Eur Radiol. 2024;22:1–3. [DOI] [PubMed] [Google Scholar]
- 19.Ahmad P, Nawaz R, Qurban M, Shaikh GM, Mohamed RN, Nagarajappa AK, Asif JA, Alam MK. Risk factors associated with the mortality rate of oral squamous cell carcinoma patients: a 10-year retrospective study. Medicine. 2021;100(36): e27127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Chery J. RNA therapeutics: RNAi and antisense mechanisms and clinical applications. Postdoc J. 2016;4(7):35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Mikeska T, Bock C, Do H, Dobrovic A. DNA methylation biomarkers in cancer: progress towards clinical implementation. Expert Rev Mol Diagn. 2012;12(5):473–87. [DOI] [PubMed] [Google Scholar]
- 22.Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, Tsang P, Curry B, Baird K, Meltzer PS, Yakhini Z, Bruhn L, Laderman S. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc Natl Acad Sci USA. 2004;101(35):12973–8. 10.1073/pnas.0402927101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Sun YS, Zhao Z, Yang ZN, Xu F, Lu HJ, Zhu ZY, Shi W, Jiang J, Yao PP, Zhu HP. Risk factors and preventions of breast cancer. Int J Biol Sci. 2017;13(11):1387–97. 10.7150/ijbs.21635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Ghojogh B, Crowley M, Karray F, Ghodsi A. Principal component analysis. In: Ghojogh B, Crowley M, Karray F, Ghodsi A, editors. Elements of dimensionality reduction and manifold learning. Cham: Springer International Publishing; 2023. p. 123–54. [Google Scholar]
- 25.Hasan BM, Abdulazeez AM. A review of principal component analysis algorithm for dimensionality reduction. J Soft Comput Data Min. 2021;2(1):20–30. [Google Scholar]
- 26.Zebari R, Abdulazeez A, Zeebaree D, Zebari D, Saeed J. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. J Appl Sci Technol Trends. 2020;1(2):56–70. [Google Scholar]
- 27.Kabir MF, Chen T, Ludwig SA. Performance analysis of dimensionality reduction algorithms in machine learning models for cancer prediction. Healthc Anal. 2023;1(3): 100125. [Google Scholar]
- 28.Sen D, Erazo K, Zhang W, Nagarajaiah S, Sun L. On the effectiveness of principal component analysis for decoupling structural damage and environmental effects in bridge structures. J Sound Vib. 2019;29(457):280–98. [Google Scholar]
- 29.Abdi H, Williams LJ. Principal component analysis. Wiley Interdiscip Rev Comput Statist. 2010;2(4):433–59. [Google Scholar]
- 30.Meng Y, Shang R, Shang F, Jiao L, Yang S, Stolkin R. Semi-supervised graph regularized deep NMF with bi-orthogonal constraints for data representation. IEEE Transact Neural Netw Learn Syst. 2019;31(9):3245–58. [DOI] [PubMed] [Google Scholar]
- 31.Fomin F, Panolan F, Patil A, Tanveer A. Boolean and Fp-matrix factorization: from theory to practice. In: Fomin F, editor. 2022 International joint conference on neural networks (IJCNN). Padua: IEEE; 2022. p. 1–8. [Google Scholar]
- 32.Gillies RJ, Morse DL. In vivo magnetic resonance spectroscopy in cancer: a review. NMR Biomed. 2005;18(4):321–40. 10.1002/nbm.974. [DOI] [PubMed] [Google Scholar]
- 33.Zhang W, Zeng S, Liu X, Zhao Y, Wei Y. Cancer diagnosis and prognosis prediction based on non-negative matrix factorization of microarray data. J Biomed Inform. 2018;86:99–105. 10.1016/j.jbi.2018.07.008. [Google Scholar]
- 34.Boulesteix AL. PLS dimension reduction for classification with microarray data. Statist Appl Genet Mol Biol. 2004;3(1):1. [DOI] [PubMed] [Google Scholar]
- 35.Rosipal R, Krämer N. Overview and recent advances in partial least squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, latent structure and feature selection. Berlin: Springer; 2006. p. 34–51. [Google Scholar]
- 36.Geladi P, Kowalski BR. Partial least-squares regression: a tutorial. Anal Chim Acta. 1986;185:1–17. 10.1016/0003-2670(86)80028-9. [Google Scholar]
- 37.Ayesha S, Hanif MK, Talib R. Overview and comparative study of dimensionality reduction techniques for high dimensional data. Inform Fusion. 2020;1(59):44–58. [Google Scholar]
- 38.Nguyen LH, Holmes S. Ten quick tips for effective dimensionality reduction. PLoS Comput Biol. 2019;15(6): e1006907. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Sumithra V, Surendran S. A review of various linear and nonlinear dimensionality reduction techniques. Int J Comput Sci Inf Technol. 2015;6(3):2354–60. [Google Scholar]
- 40.Hu L, Sung AH, May WL. A dimension reduction approach to selecting and clustering time-course gene expression data. Bioinformatics. 2005;21(15):3054–62. [Google Scholar]
- 41.Borg I, Groenen P. Modern multidimensional scaling: theory and applications. 2nd ed. Springer: Amsterdam; 2005. [Google Scholar]
- 42.Cox TF, Cox MAA. Multidimensional scaling. 2nd ed. London: Chapman and Hall/CRC; 2001. [Google Scholar]
- 43.Aziz R, Verma CK, Srivastava N. Dimension reduction methods for microarray data: a review. AIMS Bioeng. 2017;4(1):179–97. [Google Scholar]
- 44.Li H, Liu J, Wu J. A Kernel principal component analysis method for survival prediction of oral cancer patients. J Healthc Eng. 2018;2018:9409261. 10.1155/2018/9409261. [Google Scholar]
- 45.Schölkopf B, Smola A, Müller K. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998;10(5):1299–319. [Google Scholar]
- 46.Yan L, Jin G, Su Z, Li C. Kernel principal component analysis for feature reduction of ECG signals. J Med Syst. 2016;40(11):235. [Google Scholar]
- 47.Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323–6. [DOI] [PubMed] [Google Scholar]
- 48.Lu Y, Li Y, Liang K, Pan Y. Feature extraction of mass spectrometry data by singular value decomposition for breast cancer classification. J Healthc Eng. 2019.
- 49.Howland P, Park H. Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans Pattern Anal Mach Intell. 2004;26(8):995–1006. [DOI] [PubMed] [Google Scholar]
- 50.Golub GH, Reinsch C. Singular value decomposition and least squares solutions. Numer Math. 1970;14(5):403–20. [Google Scholar]
- 51.Torkey H, Atlam M, El-Fishawy N, Salem H. A novel deep autoencoder-based survival analysis approach for microarray dataset. PeerJ Comput Sci. 2021;21(7): e492. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Gupta S, Kalaivani S, Rajasundaram A, Ameta GK, Oleiwi AK, Dugbakie BN. Prediction performance of deep learning for colon cancer survival prediction on SEER data. Biomed Res Int. 2022;16:2022. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
- 53.Irsoy O, Alpaydın E. Unsupervised feature extraction with autoencoder trees. Neurocomputing. 2017;4(258):63–73. [Google Scholar]
- 54.Alom MZ, Taha TM, Yakopcic C, Westberg K, Sidike P, Nasrin MS, Hasan M. A review of deep learning techniques for the prediction of protein-protein interactions. Brief Bioinform. 2019;20(4):1281–98. [Google Scholar]
- 55.Longato E, Vettoretti M, Di Camillo B. A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. J Biomed Inform. 2020;1(108): 103496. [DOI] [PubMed] [Google Scholar]
- 56.Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Liu L, Tao D. A convolutional neural network approach to predicting the remaining useful life of aero-engines. Sci Rep. 2015;5:10528. 10.1038/srep10528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Saleh SI, Saha S, Hossain MA. Oral cancer survival prediction using machine learning techniques: a systematic review. J Healthc Eng. 2021.
- 59.Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE, and RMSE in regression analysis evaluation. PeerJ Computer Science. 2021;5(7): e623. [DOI] [PMC free article] [PubMed] [Google Scholar]