OncoMet: a deep learning framework for the prediction of oncogenic signaling pathways and metastasis in esophageal cancer patients using histopathology images from primary tumors

Syed Wajid Aalam; Abdul Basit Ahanger; Tabasum Majeed; Ab Naffi Ahanger; Tariq Masoodi; Ajaz A Bhat; Assif Assad; Muzafar Ahmad Macha; Muzafar Rasool Bhat

2025 Aug 21;23:945. doi: 10.1186/s12967-025-06914-4

PMCID: PMC12372372 PMID: 40842012

Abstract

Background

Despite recent advancements in the diagnosis and prognosis of esophageal cancer (EC), it remains among the leading causes of cancer-related mortality. Timely and cost-effective diagnosis, particularly in predicting the risk of metastasis and identifying the deregulation of oncogenic signaling pathways, could open new frontiers towards precision medicine and targeted therapy of EC. However, current diagnostic practices for identifying metastasis and deregulated oncogenic pathways involve molecular testing, which is time-consuming and costly. Advances in deep learning analysis of digital pathology images offer promising avenues for automating and enhancing cancer diagnosis and risk stratification.

Methods

High-resolution H&E-stained diagnostic whole slide images were obtained from the open repository of The Cancer Genome Atlas (TCGA). The WSIs underwent several pre-processing steps, including patching, color normalization and augmentation. A deep learning model was designed and trained on WSI data and tissue-level labels to generate image feature representations for predicting metastatic potential and identifying the deregulation of four major oncogenic signaling pathways, viz. mTOR, PTEN, p53, and PI3K/AKT.

Results

The proposed model achieved an AUC of 0.92 for predicting metastatic risk and AUCs ranging from 0.64 to 0.92 for the identification of deregulated oncogenic pathways. In a first, we were able to operate the model without the need for exhaustive patch-level annotations, relying instead on slide-level annotations only.

Conclusion

In this work, we highlighted the transformative potential of deep learning in accurately detecting metastasis and identifying deregulated oncogenic pathways from H&E slides using slide-level annotation, thus opening new doors in precision medicine and targeted therapy.

Keywords: Esophageal cancer, Histopathology, Metastasis, Signaling pathways, Deep learning, Machine learning, Whole slide images, Color normalization, Patching

Introduction

Esophageal cancer (EC) poses a significant global health challenge, with approximately 604,000 new cases and 544,000 deaths annually. It is the sixth leading cause of cancer-related mortality and the eighth most common cancer worldwide [1]. EC is highly aggressive, often metastasizing early to organs such as the lungs, pleura, liver, stomach, peritoneum, kidneys, adrenal glands, and bones [2, 3]. Despite advancements in oncology, EC remains understudied, resulting in low 5-year survival rates of less than 25%, primarily due to loco-regional and distant metastasis [4]. Alarmingly, up to 50% of EC cases are diagnosed with metastasis at initial presentation. Even superficially located primary tumors can lead to metastatic spread, underscoring the formidable challenge in prognostication and treatment planning [5, 6]. Esophageal cancer has been on the rise globally, particularly among men, contributing significantly to cancer-related mortality. By 2040, the number of new esophageal cancer cases worldwide is expected to reach nearly 1 million annually [7, 8].

EC consists of two main subtypes: squamous cell carcinoma (ESCC), arising from the squamous cells lining the esophagus, and adenocarcinoma (EAC), originating from glandular cells in the lower esophagus. While early detection and classification of these subtypes are essential, the more significant challenge lies in addressing metastasis, where cancer cells spread to secondary sites. The formation of the secondary cancer sites, known as metastases, is a continuous process that can start early in the growth of the primary tumor. Metastases can also create new lesions in other organs, essentially “metastasizing within metastases” [9].

The heterogeneity of metastatic tumors, which vary in size, location, age, and cellular composition, poses a formidable obstacle to current treatment modalities. This complexity limits the efficacy of therapeutic interventions and contributes to poor prognoses. Most fatalities in metastatic cancer cases result from organ failure caused by the cumulative burden of widespread metastases or complications related to aggressive treatment strategies [10]. Hence, an advanced and highly accurate diagnostic system is imperative. Such a system could assist pathologists in predicting metastatic potential during the early stages of cancer, thereby improving patient outcomes and survival rates [11].

Recent studies have explored the use of histological features to predict metastatic involvement in certain cancers, such as lung, renal, and prostate carcinomas. Leveraging deep learning (DL) techniques, these studies achieved area under the curve (AUC) values of up to 0.78, 0.86, and 0.68, respectively, in receiver operating characteristic (ROC) analyses [12–14]. While these advancements highlight the potential of artificial intelligence in improving metastasis prediction, none of these approaches has explored signaling pathway identification using histopathological whole-slide images (WSIs) at the slide level. Our study bridges this gap: it introduces a novel method for metastasis prediction that achieves higher accuracy and advances the field by offering unique insights into the underlying biological mechanisms.

A critical aspect of understanding EC lies in the diverse oncogenic signaling pathways involved in its progression. Key pathways such as mTOR, p53, PTEN, and PI3K/AKT are pivotal in cellular functions like growth, proliferation, differentiation, invasion, and apoptosis. Dysregulation of these pathways contributes significantly to cancer development and metastasis [15]. For instance, the mTOR pathway is essential for cell growth and metabolism, and its dysregulation can lead to uncontrolled cell proliferation. The p53 pathway is necessary for DNA repair and apoptosis; mutations in the p53 gene are common in EC and lead to evasion of apoptosis and increased survival of malignant cells. The PTEN pathway acts as a tumor suppressor, and its inactivation can enhance cell survival and growth. Similarly, the PI3K/AKT pathway promotes cell survival and development, and its hyperactivation is frequently observed in various cancers, including EC [16–19].

Traditional cancer diagnosis and grading have relied on the microscopic analysis of Hematoxylin and Eosin (H&E) stained sections. This process has become increasingly important with the rise in cancer cases and the need for personalized treatments. Pathologists must now examine numerous slides, often with additional stains, and extract various quantitative parameters for grading systems, making the process time-intensive and susceptible to variation [20, 21]. To address these challenges, integrating computational science with histopathology through whole-slide scanning systems has opened new avenues for enhancing pathological procedures. Digitizing high-resolution tissue slides allows for efficient study and analysis, potentially increasing metastasis detection’s speed, sensitivity, and consistency [22].

Machine learning and artificial intelligence (AI) algorithms, particularly deep learning models such as Convolutional Neural Networks (CNNs), have demonstrated superior capabilities in diagnosing, prognosing, and classifying malignancies based on medical imaging data [19]. These technologies hold promise for automating the initial assessment of images, providing supplementary information to pathologists, and streamlining the diagnostic process.

Using histopathology images, we present OncoMet, a novel DL framework designed to predict metastasis and identify oncogenic signaling pathways in esophageal cancer patients. OncoMet enables direct slide-level predictions by building a single feature representation for each whole slide image (WSI). Large WSIs are first pre-processed and tiled, the tile features are then consolidated into one feature vector per slide, and this vector is classified using machine learning classifiers. We also adopted, for the first time, a three-step data augmentation strategy that applies Reinhard normalization three times, each with a different reference image, to introduce greater variability and richness in the training data.

The performance and generalizability of the proposed framework are validated through external validation using the TCGA-STAD dataset of gastric cancer patients, providing additional insights into its efficacy. This innovative approach aims to support pathologists in predicting metastasis at early stages and identifying critical oncogenic pathways such as mTOR, p53, PTEN, and PI3K/AKT, ultimately improving patient prognosis and survival outcomes.

Materials and methods

Dataset description

For this study, we selected WSIs from 124 patients diagnosed with esophageal carcinoma (TCGA-ESCA) from The Cancer Genome Atlas (TCGA) via the GDC data transfer tool (https://portal.gdc.cancer.gov/). These WSIs were captured at 40x magnification, with an average size of 100,000 × 80,000 pixels and an 8-bit color depth, with each pixel corresponding to a spatial resolution of 0.25 micrometers (µm). The images, provided in .svs format, specifically focused on primary tumors stained with H&E. The dataset was randomly partitioned into an 80:20 ratio, adhering to the Pareto principle, a commonly applied rule of thumb in practice.

Additionally, we incorporated diagnostic slides from 20 patients with stomach adenocarcinoma (TCGA-STAD) to validate our method. The clinical and demographic characteristics of both TCGA-ESCA and TCGA-STAD cohorts, including gender distribution, age groups, vital status, and tumor location, are summarized in Table 1. Notably, esophageal tumors were predominantly located in the lower third (76 cases) and middle third (32 cases) of the esophagus. In comparison, stomach tumors were primarily found in the cardia (4 cases), body (4 cases), and pyloric antrum (12 cases) of the stomach.

Table 1.

Clinical and demographic characteristics of TCGA-ESCA and TCGA-STAD cohorts

Characteristics	TCGA-ESCA	TCGA-STAD
Gender
Male	112	14
Female	12	6
Age
<=60	10	2
> 60	114	18
Vital status
Dead	56	12
Alive	68	8
Location (Esophagus)
Lower Third	76	-
Middle Third	32	-
Thoracic	2	-
Upper Third	4	-
Unspecified Location	10	-
Location (Stomach)
Cardia	-	4
Central region (Body)	-	4
Pyloric antrum	-	12

We conducted an integrative molecular analysis using somatic mutation and copy number variation (CNV) data derived from the TCGA-ESCA and TCGA-STAD cohorts to identify patients with potential for distant metastasis. We prioritized known metastasis-associated biomarkers, including Matrix Metalloproteinases (MMPs), vascular endothelial growth factor (VEGF), and E-cadherin (CDH1), based on their well-established roles in promoting tissue invasion, angiogenesis, and epithelial-to-mesenchymal transition (EMT), hallmark processes in metastatic progression.

Somatic mutation profiles were downloaded from cBioPortal [23]. Recurrently mutated genes were identified (mutated in at least two samples). These included key oncogenes and tumor suppressor genes such as TP53, CDKN2A, FAT1, NOTCH1, and PIK3CA, among others, known to regulate critical cellular pathways involved in tumor progression, EMT, and metastatic spread. We also incorporated CNVs, focusing on amplifications of oncogenes and deletions of tumor suppressors, as these structural variations are often associated with aggressive tumor phenotypes.

The complete list of recurrently mutated genes was submitted to Ingenuity Pathway Analysis (IPA) [24] to identify significantly enriched canonical pathways with functional relevance to metastasis. The analysis yielded 10 top-ranked pathways commonly altered in carcinoma cases and implicated in the metastatic process. These included pathways involved in cell cycle dysregulation, apoptosis evasion, PI3K/AKT signaling, and EMT-related cascades. Patients showing alterations in any of these metastasis-associated pathways were labeled as having a high metastatic potential. These molecular labels were subsequently used to guide histopathological image processing and model training, linking morphological features to the presence of metastasis-driving alterations.

Dataset pre-processing

To ensure consistent quality in our dataset, we filtered low-quality whole-slide images from the training and validation sets based on stringent criteria such as visual clarity, image blurring, staining abnormalities, and ink marks. This process refined our collection to 124 diagnostic whole-slide images from the 154 diagnostic WSIs available on the TCGA portal. We allocated 80% of these images for training and the remaining 20% for testing, maintaining the same proportion of metastatic and non-metastatic images as outlined in previous studies [25–27].

The selected 124 WSIs were utilized to analyze four oncogenic signaling pathways. To address data imbalance, we supplemented our dataset with additional cases from the TCGA-HNSCC cohort, thereby increasing the representation of minority class cases. This approach ensured that our ML models were trained on a more balanced and representative dataset, crucial for accurately identifying and predicting patterns in minority classes. A balanced dataset reduces the risk of model bias towards the majority class, thereby enhancing the robustness and generalizability of the predictions. The supplemented datasets were then randomly divided into an 80:20 ratio for training and testing the models.

Whole slide image pre-processing

With images captured at various magnification levels, WSIs possess a pyramidal data structure and are exceedingly large [28]. To balance storage needs and information retention, we processed the data at a 20x magnification level using an image segmentation tool built on the OpenSlide library [29]. This process yielded a total of 390,532 tiles. In WSI analysis, tile dimensions and overlap selection are critical parameters. We extracted non-overlapping tiles with dimensions of 512 × 512 pixels. These 512 × 512 tiles were resized to 224 × 224 pixels during the feature extraction phase. Additionally, tiles with more than 90% background coverage were excluded, adhering to the methodology outlined in [30].
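The tiling and background-filtering step can be illustrated with a minimal sketch. It works on an RGB array that stands in for a region already read from a slide; reading the slide itself (for example through OpenSlide) and the later resize to 224 × 224 are not shown. The function names and the white-pixel threshold of 220 are illustrative assumptions, not values reported in this study.

```python
import numpy as np

TILE = 512             # tile edge in pixels, as stated in the text
MAX_BACKGROUND = 0.90  # tiles with more than 90% background are dropped


def background_fraction(tile_rgb, white_threshold=220):
    """Fraction of pixels whose R, G and B values all reach the threshold.

    The threshold is an assumed heuristic for near-white glass background.
    """
    is_background = np.all(tile_rgb >= white_threshold, axis=-1)
    return float(is_background.mean())


def iter_tissue_tiles(region_rgb, tile=TILE, max_background=MAX_BACKGROUND):
    """Yield (top, left, patch) for non-overlapping tiles that pass the filter."""
    height, width = region_rgb.shape[:2]
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            patch = region_rgb[top:top + tile, left:left + tile]
            if background_fraction(patch) <= max_background:
                yield top, left, patch


# Toy region: blank glass everywhere except one tissue-coloured tile.
region = np.full((1024, 1024, 3), 255, dtype=np.uint8)
region[0:512, 0:512] = (180, 90, 150)
kept = [(top, left) for top, left, _ in iter_tissue_tiles(region)]
```

In this toy region only the tile at the top-left corner survives the filter; the three blank tiles are discarded.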

Histopathology images are inherently prone to color variations due to fluctuations in staining concentrations, disparities in equipment usage, and inconsistencies during tissue sectioning. When these variations occur within WSIs, they introduce significant variability into the training data, which can profoundly impact the performance of DL models. In medical image analysis, where precision is paramount, addressing these color discrepancies is essential to ensure accurate diagnoses and reliable feature extraction.

Several stain normalization techniques have been developed to address color inconsistencies in histopathology images, including Histogram Specification, Structure-Preserving Color Normalization, and the widely used Reinhard Algorithm (RA). Among these, RA has shown strong potential in improving the accuracy and reliability of deep learning models in medical image analysis [31, 32].

Reinhard color normalization is a widely used technique in histopathology image analysis to reduce staining variability across whole-slide images (WSIs). It works by transforming the RGB color space of an image into the Lab color space, which separates luminance (L) from color information (a and b channels). Once the image is converted to Lab color space, Reinhard normalization adjusts the brightness and color channels so that the target image has the same average color and contrast as a chosen reference image. This helps ensure that all slides, no matter where or how they were stained, end up with a similar overall appearance, making it easier for models to learn consistent patterns. After normalization, the image is converted back to the RGB space for further processing. By harmonizing color distributions, Reinhard normalization enhances the consistency of image features extracted by deep learning models and improves the generalizability of predictions across datasets acquired under different staining conditions [33].
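The statistics-matching core of this procedure can be sketched as follows, assuming the source and reference images have already been converted to Lab. The RGB-to-Lab conversion and the conversion back would be handled by an image library such as scikit-image (`rgb2lab`, `lab2rgb`) and are not shown. The function name `reinhard_match` and the random test data are illustrative assumptions, not the code used in this study.

```python
import numpy as np


def reinhard_match(source_lab, reference_lab, eps=1e-8):
    """Shift and scale each Lab channel of the source to the reference mean/std."""
    src = np.asarray(source_lab, dtype=np.float64)
    ref = np.asarray(reference_lab, dtype=np.float64)
    # Per-channel statistics over all pixels (axes 0 and 1 are height and width).
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    # Standardize the source, then rescale to the reference statistics.
    return (src - src_mean) / (src_std + eps) * ref_std + ref_mean


# Synthetic Lab-like images with different channel statistics.
rng = np.random.default_rng(0)
source = rng.normal(loc=[60.0, 20.0, -10.0], scale=[5.0, 3.0, 2.0], size=(64, 64, 3))
reference = rng.normal(loc=[70.0, 10.0, -5.0], scale=[8.0, 4.0, 3.0], size=(64, 64, 3))
normalized = reinhard_match(source, reference)
```

After the transform, each channel of `normalized` has the mean and standard deviation of the corresponding reference channel, which is what gives differently stained slides a similar overall appearance.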

In our study, we successfully mitigated color variations within our histopathology dataset by applying Reinhard color normalization, facilitating feature extraction and simplifying subsequent analysis. This approach is invaluable in medical image analysis and DL, where precision and consistency are paramount. By harmonizing the color properties of all images, RA ensures a consistent dataset, ultimately improving the accuracy and reliability of DL models in medical diagnostics.

OncoMet algorithm

OncoMet employs a supervised workflow to achieve cost-effective automated identification of metastasis and oncogenic signaling pathways in esophageal carcinoma. The OncoMet algorithm comprises three distinct phases:

(i) Feature extraction: This phase is dedicated to effectively characterizing the input data. Each tile of every whole-slide image is passed through the ResNet101 network with the top layer removed and the pre-trained parameter set to True, so that the network uses weights pre-trained on the ImageNet dataset. These weights are kept fixed during training and testing. Each input tile yields a feature vector of dimensionality 2048, and the extracted features are stored for subsequent analysis. This process generates an N × 2048 feature matrix for each patient slide, where N represents the total number of tumor tiles within the slide.

(ii) Feature aggregation: The feature aggregation process consolidates the extracted features from each tile into a cohesive representation for the entire slide. Formally defined by Eqs. 1 and 2, this step ensures that the high-dimensional feature vectors are effectively summarized, providing a comprehensive overview of the slide’s characteristics.

(iii) Classification: Following feature aggregation, we shift our focus to classification tasks using a variety of machine learning classifiers, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest, AdaBoost, and Decision Tree. These classifiers make predictions based on the feature vectors derived from the WSIs.

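Given a set of feature vectors $F = \{f_1, f_2, \ldots, f_N\}$ of all the tiles in a WSI, we calculate the mean and median of $F$ as follows:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} f_i \tag{1}$$

$$m = \operatorname{median}(f_1, f_2, \ldots, f_N) \tag{2}$$

where $f_i \in \mathbb{R}^{2048}$ and the median is taken element-wise across the $N$ tiles.

For each WSI in our dataset, we calculate two feature vectors, the mean ($\mu$) and the median ($m$), over all the tiles within that WSI, and concatenate them sequentially to form a single feature vector, $v$. For each WSI $W$, the generated feature vector is computed as:

$$v_W = [\mu_W ; m_W] \in \mathbb{R}^{4096}$$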

Model setup

The feature extraction process employs the ResNet101 model pre-trained on ImageNet, with the top (classification) layer removed and the remaining layers frozen. Before feature extraction, each 512 × 512 pixel tile is resized to 224 × 224 pixels. A crucial step in our approach is color normalization, where each whole-slide image is normalized using the mean and standard deviation derived from a chosen reference image, following the Reinhard Algorithm (RA). This ensures that the color properties are consistent across all images. Subsequently, every tile within each whole-slide image is passed through the feature extractor to obtain its feature vector, consisting of 2048 deep features. These extracted features for each tile are saved to disk to expedite the training and evaluation process.

Given the variations in the dimensions of WSIs, the tiling process produces tile sets of varying lengths, leading to inconsistent feature set lengths for each whole slide image. To establish consistent feature representations, we calculate the mean and median of each feature set, resulting in two distinct feature vectors, each with 2048 dimensions. These two feature vectors are then concatenated into a single feature vector of length 4096, serving as a comprehensive representation of the WSI. Subsequently, these concatenated feature vectors are fed into various supervised machine-learning classifiers together with the corresponding WSI labels. This enables the classifiers to learn the feature representations and classify the test set into metastasis and non-metastasis categories.
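As a minimal sketch of this aggregation step, the function below collapses a tile feature matrix of any length into one fixed-length slide vector. The function name and the random matrices standing in for ResNet101 features are illustrative assumptions.

```python
import numpy as np


def aggregate_slide_features(tile_features):
    """Collapse an (N, D) tile feature matrix into one (2 * D,) slide vector."""
    tile_features = np.asarray(tile_features, dtype=np.float64)
    if tile_features.ndim != 2 or tile_features.shape[0] == 0:
        raise ValueError("expected a non-empty (N, D) feature matrix")
    mean_vec = tile_features.mean(axis=0)           # element-wise mean over tiles
    median_vec = np.median(tile_features, axis=0)   # element-wise median over tiles
    return np.concatenate([mean_vec, median_vec])   # mean first, then median


# Two slides with different tile counts give vectors of the same length.
rng = np.random.default_rng(0)
slide_a = rng.random((37, 2048))
slide_b = rng.random((512, 2048))
vec_a = aggregate_slide_features(slide_a)
vec_b = aggregate_slide_features(slide_b)
```

Both `vec_a` and `vec_b` have length 4096 regardless of how many tiles each slide produced, which is what allows standard classifiers to consume them directly.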

Model evaluation

A range of quantitative metrics is employed to comprehensively assess the OncoMet algorithm’s performance and predictive capability. These metrics, including accuracy, precision, recall, and F1-score, offer intuitive insights into classification performance without exposing the underlying mathematical complexity.

Accuracy (ACC) denotes the percentage of correct predictions, serving as a straightforward measure of overall model performance. Precision quantifies the proportion of true positives among all instances classified as positive, reflecting the model’s ability to avoid false positives. Recall, or sensitivity, indicates how effectively the model identifies all actual positive cases of metastasis, highlighting its capacity to detect actual positives. The F1-score, calculated as the harmonic mean of precision and recall, offers a balanced perspective on the model’s performance, considering both false positives and false negatives.

Furthermore, the Area Under the Curve (AUC) score is critical for evaluating the model’s ability to distinguish between positive (metastatic) and negative cases. A higher AUC signifies superior discrimination capacity, indicating more accurate cancer prognosis and diagnosis predictions. The Receiver Operating Characteristic (ROC) curve, which plots sensitivity against 1 − specificity across all classification thresholds, visually assesses the model’s discriminatory power. An ideal ROC curve is characterized by high sensitivity and specificity, demonstrating the model’s proficiency in metastasis detection.
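One way to make the AUC concrete is its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustrative values, not results from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one positive (0.4) is ranked below one negative (0.5).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9.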

Results

Prediction of metastasis using OncoMet

In the experiment illustrated in Fig. 1, the training set comprised 100 diagnostic WSIs. Features were extracted using the pre-trained ResNet101 architecture and subsequently fed into various machine-learning algorithms for classification in an end-to-end supervised manner. Given the limited dataset size and the risk of model overfitting, data augmentation was employed. Data augmentation applies various transformations to the existing data, enhancing the dataset’s diversity and reducing overfitting, thereby improving the generalizability and robustness of DL models. In our approach, augmentation was based on Reinhard stain normalization and was applied to the training data only.

We performed this process three times, each time with a different reference image, tripling the training data to a total of 300 images. These augmented samples were named with the original patient’s name and the name of the reference image used for normalization. This augmentation strategy contributed significantly to our DL models’ robustness and generalization capacity, making them more adept at handling the inherent variations in WSI characteristics across different medical institutions. Notably, testing data was not augmented to ensure a realistic evaluation of the model’s performance. Finally, the model was tested using the 24 images held out during the train-test split.

The extracted features are passed to different machine learning models for classification. KNN performed better than the other models, including Support Vector Machine, Logistic Regression, Decision Tree, AdaBoost, and Random Forest. The model performs strongly in predicting metastasis, achieving a precision of 0.88, a recall of 1.00, and an F1-score of 0.94. For non-metastasis, the model attained a precision of 1.00, a recall of 0.78, and an F1-score of 0.88. The model’s overall accuracy and AUC score were both 92%. Among the 24 test images, 15 were metastatic, and 9 were non-metastatic. The model correctly identified all 15 metastatic slides and 7 out of 9 non-metastatic slides, underscoring the accuracy and effectiveness of the OncoMet algorithm. The performance of all the models on the test set and the external validation set is shown in Table 2.
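The reported KNN figures can be reproduced from the slide counts given above (15 of 15 metastatic and 7 of 9 non-metastatic slides correctly identified). The following sketch is a worked check of that arithmetic; the helper name `class_metrics` is illustrative.

```python
def class_metrics(true_pos, false_pos, false_neg):
    """Precision, recall and F1-score for one class from raw counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the text, with metastatic as the positive class.
tp, fn = 15, 0   # metastatic slides: all 15 detected
tn, fp = 7, 2    # non-metastatic slides: 7 correct, 2 called metastatic

meta = class_metrics(tp, fp, fn)
# For the non-metastatic class the roles of the error types swap.
non_meta = class_metrics(tn, fn, fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

The metastatic class gives precision 15/17 ≈ 0.88, recall 1.00 and F1 ≈ 0.94; the non-metastatic class gives precision 1.00, recall 7/9 ≈ 0.78 and F1 ≈ 0.88; accuracy is 22/24 ≈ 0.92, matching the values in the text.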

Table 2.

Performance metrics for predicting metastatic and Non-Metastatic esophageal cancer using various algorithms on test set and external dataset

Algorithms	Precision (Meta)	Precision (Non-Meta)	Recall (Meta)	Recall (Non-Meta)	F1-Score (Meta)	F1-Score (Non-Meta)	ACC	AUC
SVC	0.71	0.50	0.67	0.56	0.69	0.53	0.62	0.63
LR	0.79	0.60	0.73	0.67	0.76	0.63	0.71	0.83
DT	0.75	0.44	0.40	0.78	0.52	0.56	0.54	0.49
AdaBoost	0.69	0.45	0.60	0.56	0.64	0.50	0.65	0.58
RFC	0.85	0.64	0.73	0.78	0.79	0.70	0.75	0.75
KNN	0.88	1.00	1.00	0.78	0.94	0.88	0.92	0.92
KNN (External Test set)	0.82	0.89	0.90	0.80	0.86	0.84	0.85	0.84

Model validation on an external dataset

To validate our best-performing model, we employed an external dataset of gastric cancer from TCGA. This external validation was crucial for assessing the model’s generalizability and robustness beyond the training data. We selected an equal number of metastatic and non-metastatic patients to ensure the data was balanced. The pre-processing of gastric cancer WSIs was conducted using PyHIST with the same parameters applied during the training phase. Color normalization was performed using Reinhard Normalization, utilizing one of the three reference images in the pre-processing step. The difference between patches before and after normalization is shown in Fig. 2.

Fig. 2 — Color Normalization effects: The difference between patches of gastric cancer WSIs before and after color normalization using Reinhard Normalization is visible. This normalization was applied during the pre-processing of an external dataset from TCGA to validate our model. The figure demonstrates how the normalization process standardizes the color variations, ensuring consistency across WSIs

The consistency in pre-processing ensured that the WSIs were standardized across training and validation datasets. Unlike the training set, the validation set did not undergo any augmentation, guaranteeing an unbiased evaluation of the model’s performance. The KNN model, trained on esophageal cancer data, achieved an accuracy of 85% on the gastric cancer validation set, with an AUC score of 84%. The precision for non-metastatic slides was 0.89, while the precision for metastatic slides was 0.82. The recall and F1 score for metastatic slides were higher than those for non-metastatic slides, with a recall of 0.90 and an F1-score of 0.86 for metastasis, compared to a recall of 0.80 and an F1-score of 0.84 for non-metastatic slides. These results indicate that the model effectively generalized to the external dataset, maintaining high-performance metrics across both metastasis and non-metastasis classifications.

Gastric cancer was chosen for external validation due to its close histological and pathological resemblance to esophageal adenocarcinoma. Both cancers originate from the upper gastrointestinal tract and often share similar glandular morphologies, tumor microenvironments, and progression patterns. Moreover, they exhibit overlapping molecular features, including dysregulation of pathways such as PI3K/AKT and p53, which are critical to our study. These similarities make gastric cancer a biologically relevant choice for validating the generalizability of our model beyond esophageal carcinoma, particularly in the context of metastasis prediction using histopathological images [34, 35].

Prediction of signaling pathways using OncoMet

We utilized the extracted features from metastasis prediction to explore four signaling pathways associated with EC: mTOR, p53, PI3K/AKT, and PTEN. The whole experiment is illustrated in Fig. 3. For each pathway, every WSI was labeled as present (1) or absent (0). There were significant data imbalances between the two classes in each pathway. For instance, the p53 pathway showed an evident imbalance, with only 13% of cases where the pathway was absent and 87% where it was present. Similar patterns were observed for the other pathways: the mTOR pathway had 89% of cases absent and 11% present, the PI3K/AKT pathway had 85% present and 15% absent, and the PTEN pathway had 92% absent and only 8% present. These imbalances highlight the challenges in training robust models and underscore the necessity for strategies such as data augmentation and the inclusion of external datasets to ensure balanced representation.

Fig. 3 — Signaling Pathways pipeline: This figure illustrates the four signaling pathways (mTOR, p53, PI3K/AKT, and PTEN) in esophageal cancer that were explored using features extracted from metastasis prediction. Each pathway is categorized as present (1) or absent (0) in WSI images, highlighting significant data imbalances between pathway states. Strategies such as data augmentation and external dataset inclusion were crucial to address these imbalances and enhance model reliability for pathway analysis

Addressing these imbalances is crucial for improving our models’ reliability and predictive performance. We addressed the class imbalance by incorporating additional TCGA-HNSCC cohort data to increase the minority class representation within our dataset. Specifically, we identified 21 patients with mTOR pathway activation, 8 patients with p53 pathway activation, 10 with PI3K/AKT pathway activation, and 26 with PTEN pathway activation from the HNSCC cohort. These patients were selected to match the minority class instances and were incorporated to balance the pathway data in our study. The cases from the TCGA-HNSCC cohort were pre-processed similarly to those from the TCGA-ESCA cohort. Each WSI was divided into patches, and those containing less than 10% tissue were discarded to ensure quality. These patches were then passed through a pre-trained ResNet101 model (with the top layer removed) for feature extraction. The features extracted from ResNet101 were saved for subsequent analysis. The resulting dataset was randomly split into an 80:20 ratio for training and testing.

Despite incorporating additional HNSCC data, some imbalances persisted. To further mitigate this issue in the training dataset, we applied the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE is a powerful data augmentation method that helps address class imbalances in machine learning datasets. It generates synthetic samples for the minority class by intelligently interpolating between existing samples. This process creates a more balanced training dataset, ensuring the model is exposed to a more equitable representation of the majority and minority classes. The test set remained untouched to preserve its originality and provide an unbiased model performance evaluation. This approach ensured that the training data was balanced, enhancing the model’s ability to learn from diverse examples and reducing the risk of bias. Using the SMOTE technique, 187 training samples were generated for both classes for each pathway. The resulting balanced dataset is depicted in Fig. 4, showcasing the equitable representation of both minority and majority groups.
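The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the implementation used in this study (which is not specified; library versions such as the `SMOTE` class in imbalanced-learn are commonly used). The function name, the neighbor count `k`, and the random data are assumptions.

```python
import numpy as np


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic rows by interpolating between minority-class neighbours."""
    minority = np.asarray(minority, dtype=np.float64)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its k nearest neighbours for each new row.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[picked] - minority[base])


# Toy minority class: 12 samples with 8 features each.
rng = np.random.default_rng(1)
minority_train = rng.random((12, 8))
synthetic = smote_sketch(minority_train, n_synthetic=30)
```

Because every synthetic row lies on a straight segment between two real minority samples, oversampling of this kind adds no information beyond what linear interpolation can express, and it should be applied to the training split only.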

Fig. 4 — SMOTE Balancing in Training Dataset: This figure illustrates the application of the Synthetic Minority Over-sampling Technique (SMOTE) to mitigate class imbalances in the training dataset for pathway analysis. We incorporated additional HNSCC data and then applied SMOTE to generate synthetic samples for the minority class. This process created a balanced training dataset with equitable representation across minority and majority classes, which is crucial for enhancing model robustness and reducing bias. The test set was kept untouched to ensure unbiased model performance evaluation

Despite applying SMOTE to address class imbalance, performance in PTEN pathway prediction remained relatively low. We attribute this to the under-representation of PTEN alterations, which affected performance even after SMOTE-based oversampling. Because synthetic samples generated by SMOTE are linear interpolations of minority instances, they may not adequately capture the non-linear and high-dimensional variability typical of histopathological image features, especially in complex molecular phenotypes such as PTEN inactivation.

For each pathway, we experimented with multiple classifiers including KNN, SVM, Logistic Regression, Random Forest, Decision Tree, and AdaBoost. The classifier with the best performance based on accuracy, precision, recall, F1-score, and AUC was selected. We observed that AdaBoost performed better for the p53 and PI3K/AKT pathways, likely due to its ability to handle complex patterns and class imbalances, whereas Decision Tree achieved superior results for the mTOR and PTEN pathways, benefiting from simpler decision boundaries.

The Decision Tree classifier achieved the best performance for predicting the mTOR signaling pathway, with an accuracy of 87% and an AUC (area under the ROC curve) of 64%. When the pathway was absent, the model showed high precision (0.88), recall (0.96), and F1-score (0.92). When the pathway was present, precision remained high (0.80), although recall (0.57) and F1-score (0.67) were lower. The sensitivity was 0.57 and the specificity was 0.95. For the p53 signaling pathway, the AdaBoost model outperformed the other classifiers. It achieved precision, recall, and F1-scores of 0.67 when the pathway was absent and 0.96 when it was present. This model exhibited high sensitivity (0.95) but lower specificity (0.66). Overall, it attained an accuracy of 93% and an AUC of 76%.

AdaBoost also performed best in predicting the PI3K/AKT signaling pathway. It achieved precision, recall, and F1-scores of 0.75 for the absence and 0.96 for the presence of the pathway. The model showed high sensitivity (0.95), a specificity of 0.75, an accuracy of 93%, and an AUC of 92%. For the PTEN signaling pathway, the Decision Tree classifier provided the best results. It achieved precision, recall, and F1-scores of 0.83, 0.87, and 0.85, respectively, when the pathway was absent, and 0.57, 0.50, and 0.53 when the pathway was present. The sensitivity was 0.50, the specificity 0.86, the accuracy 77%, and the AUC 74%. The performance metrics of the best models for predicting signaling pathways in esophageal cancer are summarized in Table 3. These models were selected based on their accuracy, precision, recall, F1-score, and AUC.

Table 3.

Performance metrics of the best models for predicting signaling pathways in esophageal cancer

Pathway	Best model	Precision (0)	Precision (1)	Recall (0)	Recall (1)	F1-Score (0)	F1-Score (1)	ACC	AUC
PI3K	AdaBoost	0.75	0.96	0.75	0.96	0.75	0.96	0.93	0.92
mTOR	DT	0.88	0.80	0.96	0.57	0.92	0.67	0.87	0.64
p53	AdaBoost	0.67	0.96	0.67	0.96	0.67	0.96	0.93	0.76
PTEN	DT	0.83	0.57	0.87	0.50	0.85	0.53	0.77	0.74
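For readers relating the per-class columns of Table 3 to the sensitivity and specificity quoted in the text, the sketch below shows how these quantities follow from a binary confusion matrix: recall for class 1 is the sensitivity, and recall for class 0 is the specificity. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    Class 1 is 'pathway present'; class 0 is 'pathway absent'.
    """
    def ratio(num, den):
        # Return 0.0 rather than failing when a class has no samples.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall for class 1
    specificity = ratio(tn, tn + fp)   # recall for class 0
    precision = ratio(tp, tp + fp)     # precision for class 1
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    for name, value in binary_metrics(tp=8, fp=2, tn=18, fn=2).items():
        print(f"{name}: {value:.2f}")
```

With imbalanced test sets, accuracy is dominated by the majority class, which is why the per-class recall values are reported alongside it.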


Discussion and conclusion

EC is a highly aggressive cancer, with around 604,000 new cases and 544,000 deaths annually [36]. It is the sixth leading cause of cancer-related mortality and commonly metastasizes early to multiple organs, resulting in low survival rates of 15–20%. The two main types of EC are ESCC, which is linked to tobacco and alcohol use, and EAC, which is primarily associated with obesity, GERD, and Barrett’s esophagus. Key oncogenic pathways involved in EC progression include mTOR, p53, PTEN, and PI3K/AKT [37]. Traditional diagnostic methods rely on time-consuming microscopic analysis of tissue slides. Integrating AI and ML methods such as CNNs promises faster and more accurate detection and classification, aiding pathologists in diagnosing EC and predicting metastasis.

This study showed the capability of DL models to predict both metastasis and signaling pathways in EC using WSIs. We trained our model on the EC dataset and then validated its metastasis prediction performance on an external set of gastric cancer WSIs, demonstrating its potential as a tool for cancer prognosis in clinical settings. Despite the promising results, we faced several challenges inherent to the domain, particularly the limited availability of sufficiently large training samples, which can lead to overfitting.

We explored data augmentation strategies to address these challenges, including Reinhard stain normalization and the Synthetic Minority Over-sampling Technique (SMOTE). These efforts mitigated the challenges posed by limited training data and potential overfitting, contributing to a more reliable and accurate metastasis prediction model for esophageal cancer analysis. The total number of tiles used in this study, including those generated through data augmentation, was 1,078,883, which increased the dataset size and gave the model a more diverse range of examples to learn from. The tile distribution across the cohorts used for metastasis prediction is illustrated in Fig. 5.

Fig. 5 — Tile Distribution: This illustrates the distribution of tiles used for predicting metastasis across different cohorts. It displays the number of tiles in the training set before and after augmentation and in the test set and external validation set. The increase in tile numbers post-augmentation highlights the importance of data augmentation in deep learning for improved feature extraction and model performance

Handling WSIs is challenging due to their large size, necessitating the breaking up of WSIs into tiles or patches for model training. We extracted non-overlapping tiles to avoid repetition, ensure consistent tile dimensions, reduce computational load, prevent redundancy, and ensure overall resource efficiency. This approach streamlined the processing of WSIs, ensuring that machine learning models could effectively learn from the data. Decomposing a single WSI into tiles, followed by normalization and feature extraction, is time-consuming, averaging 15 to 25 min per slide. To expedite these processes, we utilized parallel processing on an NVIDIA Tesla V100 GPU with 32 GB memory, significantly enhancing our capability for intensive data analysis.
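The non-overlapping tiling described above amounts to stepping across the slide with a stride equal to the tile size. The following sketch computes the top-left coordinates of such a grid. The function name, the tile size of 256 pixels, and the decision to drop partial tiles at the right and bottom edges are assumptions for illustration, since the text does not specify them here.

```python
def tile_origins(width, height, tile_size):
    """Top-left (x, y) coordinates of a non-overlapping tile grid.

    The stride equals the tile size, so no pixel belongs to two tiles.
    Partial tiles at the right and bottom edges are dropped (an assumption),
    which keeps all tiles the same size.
    """
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


if __name__ == "__main__":
    origins = tile_origins(width=1000, height=600, tile_size=256)
    print(len(origins), origins[:3])
```

Each coordinate could then be passed to a slide reader to extract the corresponding region, and independent tiles can be processed in parallel.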

Our research highlights the benefits of tissue-level classification over traditional patch-level analysis in digital pathology. Although widely used for disease progression and cancer metastasis detection, patch-level approaches have several significant drawbacks. These methods require extensive manual annotation by pathologists, who must meticulously mark each patch within WSIs. This process is labor-intensive, time-consuming, and susceptible to human error. Furthermore, patch-level analysis is vulnerable to sampling bias, as selected patches may not accurately represent the entire slide, potentially obscuring critical pathological details. In contrast, our whole slide image analysis method provides a comprehensive view of the pathological context, improving diagnostic performance and reliability. By analyzing the entire WSI, our approach preserves essential spatial relationships and contextual information often lost in patch-level analysis. This comprehensive method reduces the annotation burden on pathologists and mitigates the risk of sampling bias. As a result, it enhances the accuracy and generalizability of cancer detection and classification, offering a more robust and efficient alternative to patch-level techniques [38].

This study provides a substantial advancement in the field by validating the OncoMet algorithm’s efficacy in predicting metastasis in esophageal cancer through the analysis of WSIs. The study also extended the OncoMet algorithm to predict the activation status of key signaling pathways (mTOR, PTEN, p53, and PI3K/AKT) in esophageal cancer, addressing data imbalances and enhancing model reliability through advanced techniques like additional dataset incorporation and SMOTE. The findings underscore the OncoMet algorithm’s potential as a powerful tool for cancer prognosis and pathway analysis, with implications for targeted treatment strategies. This research contributes to advancing the field of digital pathology and computational oncology, paving the way for future studies and applications in cancer research and clinical practice.

The predictions made by our model are consistent with findings reported in the existing literature on EC. Specifically, the following molecular alterations predicted by the model align closely with known biological patterns in EC: (a) High frequencies of p53 pathway alterations, PTEN loss, and aberrant activation of the PI3K/AKT and mTOR pathways have been well documented in EC, particularly in esophageal adenocarcinoma. Our model reflected these trends, with p53 and PI3K/AKT showing high predicted activation rates, while PTEN frequently appeared inactivated, consistent with its tumor suppressor loss reported in multiple genomic studies [39–41]. (b) A pathway alteration score of 0.93 for PI3K/AKT implies strong pathway activation. The PI3K/AKT pathway is frequently hyperactivated in EC, driving proliferation, survival, and metastasis. For example, phosphorylated AKT is constitutively elevated in the majority of tumors: one study found AKT active in ~75% of esophageal cancers versus matched normal tissue [42]. (c) mTOR lies downstream of PI3K/AKT and orchestrates protein synthesis, metabolism, and cell growth. In ESCC, elevated mTOR or phosphorylated-mTOR expression strongly predicts worse survival. Analyses show that mTOR overexpression is a robust biomarker of poor overall and disease-free survival [43]. A high mTOR accuracy of 0.87 correlates with its aggressiveness. (d) In EC, TP53 is by far the most commonly altered gene, with over 90% of dysplastic and cancerous lesions exhibiting TP53 loss [44]. In adenocarcinomas of the esophagus and gastroesophageal junction, nearly half of tumors harbor TP53 mutations, with significantly worse survival rates [45]. (e) In EC, PTEN downregulation is well documented, with studies reporting significantly lower PTEN protein levels in tumors than in adjacent normal mucosa [46]. PTEN loss correlates with aggressive features, and tumors with PTEN loss tend to have deeper invasion and higher stage.

Our model’s capability to predict the activation of these pathways further emphasizes its potential utility in clinical settings. Identifying the status of mTOR, PTEN, p53, and PI3K/AKT pathways can provide valuable insights into the molecular foundations of esophageal cancer in individual patients, thereby guiding personalized treatment strategies. The significant data imbalances encountered, with most instances showing the pathways as absent, highlight the necessity for data augmentation and balancing techniques. Incorporating additional data from the TCGA-HNSCC cohort helped to mitigate these imbalances. Yet, some persisted, underscoring the ongoing challenge of achieving perfectly balanced datasets.

By employing a combination of ResNet101 for feature extraction and various machine learning classifiers, we achieved an overall accuracy and an AUC score of 92% in detecting metastasis on the test set. The model’s ability was further validated on an external dataset of gastric cancer WSIs, maintaining robust performance with an accuracy of 85% and an AUC score of 84%. The ROC curves for the test and validation sets are shown in Fig. 6. To address the inherent challenges posed by limited training data and data imbalance, we applied data augmentation techniques, including Reinhard stain normalization and SMOTE, to enhance the model’s robustness and mitigate overfitting. The incorporation of additional data from the TCGA-HNSCC cohort and the careful pre-processing of WSIs improved the representation of minority classes, further improving the reliability of the model.

Fig. 6 — AUC ROC Curves for Metastasis: This figure presents the AUC ROC curves for both the test and external validation sets. The curves demonstrate the model’s performance in distinguishing between metastatic and non-metastatic esophageal cancer cases. A higher AUC indicates better discrimination ability, showcasing the model’s robustness and generalizability across different datasets
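As a reminder of what the AUC values above measure, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 0/1 ground truth (1 = metastatic in this context);
    `scores` holds the model's predicted scores. Ties count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

This pairwise form is quadratic in the number of samples, which is acceptable for small test sets; rank-based formulas give the same value more efficiently on large ones.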

Our investigation into the mTOR, PTEN, p53, and PI3K/AKT signaling pathways revealed significant data imbalances, effectively addressed through data augmentation using two different techniques. The results highlight the importance of balanced datasets in training robust predictive models and demonstrate the feasibility of transferring learned features from metastasis prediction to pathway analysis. Future work should further increase dataset diversity, explore advanced augmentation techniques, and integrate multiomics data to enhance model performance. Developing more sophisticated algorithms capable of handling imbalanced data more effectively would further improve predictive accuracy and robustness.

Author contributions

SWA, TAM, AA, MAM, and MRB contributed to the study’s concept and design, wrote the manuscript, generated figures, and critically edited it. SWA, ABA, TM, ANA, and TAM performed experiments. SWA, ABA, TM, ANA, TAM, AAB, AA, MRB, and MAM critically revised and edited the scientific content. All authors read and approved the final manuscript.

Funding

This study was funded, in part, by a Research Grant (Grant number: ID No. 2022–16465) from the Indian Council of Medical Research (ICMR), Govt. of India, New Delhi, to Muzafar A. Macha, and by a Promotion of University Research and Scientific Excellence (PURSE) grant (SR/PURSE/2022/121) from the Department of Science and Technology, Govt. of India, New Delhi, to the Islamic University of Science and Technology (IUST), Awantipora.

Data availability

Not applicable.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Muzafar Ahmad Macha, Email: muzafar.macha@iust.ac.in.

Muzafar Rasool Bhat, Email: muzafarrasool@gmail.com.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209–49.
2. Koizumi W, Kitago M, Shinoda M, Yagi H, Abe Y, Oshima G, Hori S, Inomata K, Kawakubo H, Kawaida M. Successful resection of pancreatic metastasis from oesophageal squamous cell carcinoma: a case report and review of the literature. BMC Cancer. 2019;19:1–5.
3. Weinberg JS, Suki D, Hanbali F, Cohen ZR, Lenzi R, Sawaya R. Metastasis of esophageal carcinoma to the brain. Cancer. 2003;98:1925–33.
4. Pennathur A, Gibson MK, Jobe BA, Luketich JD. Oesophageal carcinoma. Lancet. 2013;381:400–12.
5. Enzinger PC, Mayer RJ. Esophageal cancer. N Engl J Med. 2003;349:2241–52.
6. Dempsey D. Esophagectomy for T1 esophageal cancer: outcomes in 100 patients and implications for endoscopic therapy. In: Pennathur A, Farkas A, Krasinskas AM et al. (eds) (University of Pittsburgh Med Ctr; The Univ of Pittsburgh Cancer Inst Biostatistics Facility, Pittsburgh, PA) Ann Thorac Surg. 2009;87:1048–1055. Year Book of Gastroenterology 2009, 2009;157–158.
7. Vijayakumar S, Saravanan A, Sayeed N, Kirezi NGR, Duggirala NK, El-Hashash AH, Al Hussein H. Analyzing mortality patterns and location of death in patients with malignant esophageal neoplasms: a two-decade study in the United States. Cureus. 2023;15.
8. Santucci C, Mignozzi S, Malvezzi M, Collatuzzo G, Levi F, La Vecchia C, Negri E. Global trends in esophageal cancer mortality with predictions to 2025, and in incidence by histotype. Cancer Epidemiol. 2023;87:102486.
9. Kim N. Sex difference of esophageal cancer: esophageal squamous cell carcinoma vs. esophageal adenocarcinoma. Sex/Gender-Specific medicine in the Gastrointestinal diseases. Springer; 2022. pp. 69–92.
10. Gerstberger S, Jiang Q, Ganesh K. Metastasis. Cell. 2023;186:1564–79.
11. Sheikh M, Roshandel G, McCormack V, Malekzadeh R. Current status and future prospects for esophageal cancer. Cancers. 2023;15:765.
12. Lee V, King ALO, Sritharan D, Moore NS, Chadha S, Maresca R, Hager T, Aneja S. Lymph node metastasis prediction with non-small cell lung cancer histopathology imaging. American Society of Clinical Oncology; 2024.
13. Gao F, Jiang L, Guo T, Lin J, Xu W, Yuan L, Han Y, Yang J, Pan Q, Chen E. Deep learning-based pathological prediction of lymph node metastasis for patient with renal cell carcinoma from primary whole slide images. J Transl Med. 2024;22:568.
14. Wessels F, Schmitt M, Krieghoff-Henning E, Jutzi T, Worst TS, Waldbillig F, Neuberger M, Maron RC, Steeg M, Gaiser T. Deep learning approach to predict lymph node metastasis directly from primary tumour histology in prostate cancer. BJU Int. 2021;128:352–60.
15. Wu N, Du Z, Zhu Y, Song Y, Pang L, Chen Z. The expression and prognostic impact of the PI3K/AKT/mTOR signaling pathway in advanced esophageal squamous cell carcinoma. Technol Cancer Res Treat. 2018;17:1533033818758772.
16. Basson MA. Signaling in cell differentiation and morphogenesis. Cold Spring Harb Perspect Biol. 2012;4.
17. da Silva HB, Amaral EP, Nolasco EL, de Victo NC, Atique R, Jank CC, Anschau V, Zerbini LF, Correa RG. Dissecting major signaling pathways throughout the development of prostate cancer. Prostate Cancer. 2013;2013:920612.
18. Nisar S, Hashem S, Macha MA, Yadav SK, Muralitharan S, Therachiyil L, Sageena G, Al-Naemi H, Haris M, Bhat AA. Exploring dysregulated signaling pathways in cancer. Curr Pharm Des. 2020;26:429–45.
19. Luo Q, Du R, Liu W, Huang G, Dong Z, Li X. PI3K/AKT/mTOR signaling pathway: role in esophageal squamous cell carcinoma, regulatory mechanisms and opportunities for targeted therapy. Front Oncol. 2022;12:852383.
20. Litjens G, Sánchez CI, Timofeeva N, Hermsen M, Nagtegaal I, Kovacs I, Hulsbergen-Van De Kaa C, Bult P, Van Ginneken B, Van Der Laak J. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep. 2016;6:26286.
21. Wang X, Chen Y, Gao Y, Zhang H, Guan Z, Dong Z, Zheng Y, Jiang J, Yang H, Wang L. Predicting gastric cancer outcome from resected lymph node histopathology images using deep learning. Nat Commun. 2021;12:1637.
22. He L, Long LR, Antani S, Thoma GR. Histology image analysis for carcinoma detection and grading. Comput Methods Programs Biomed. 2012;107:538–56.
23. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–4.
24. Krämer A, Green J, Pollard J Jr., Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30:523–30.
25. Talebi A, Celis-Morales CA, Borumandnia N, Abbasi S, Pourhoseingholi MA, Akbari A, Yousefi J. Predicting metastasis in gastric cancer patients: machine learning-based approaches. Sci Rep. 2023;13:4163.
26. Talebi R, Celis-Morales CA, Akbari A, Talebi A, Borumandnia N, Pourhoseingholi MA. Machine learning-based classifiers to predict metastasis in colorectal cancer patients. Front Artif Intell. 2024;7:1285037.
27. Aalam SW, Ahanger AB, Masoodi TA, Bhat AA, Akil ASA, Khan MA, Assad A, Macha MA, Bhat MR. Deep learning-based identification of esophageal cancer subtypes through analysis of high-resolution histopathology images. Front Mol Biosci. 2024;11:1346242.
28. Berman AG, Orchard WR, Gehrung M, Markowetz F. PathML: a unified framework for whole-slide image analysis with deep learning. medRxiv. 2021:2021.07.07.21260138.
29. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: a vendor-neutral software foundation for digital pathology. J Pathol Inf. 2013;4:27.
30. Hoffman RA, Kothari S, Phan JH, Wang MD. A high-resolution tile-based approach for classifying biological regions in whole-slide histopathological images. IFMBE Proc. 2014;42:280–3.
31. Lakshmanan B, Anand S, Jenitha T. Stain removal through color normalization of haematoxylin and eosin images: a review. In: Journal of physics: conference series. IOP Publishing; 2019: 012108.
32. Qu H, Zhou M, Yan Z, Wang H, Rustgi VK, Zhang S, Gevaert O, Metaxas DN. Genetic mutation and biological pathway prediction based on whole slide images in breast carcinoma using deep learning. NPJ Precis Oncol. 2021;5:87.
33. Roy S, Panda S, Jangid M. Modified Reinhard algorithm for color normalization of colorectal cancer histopathology images. In: 2021 29th European Signal Processing Conference (EUSIPCO). IEEE; 2021. pp. 1231–1235.
34. Quante M, Wang TC, Bass AJ. Adenocarcinoma of the oesophagus: is it gastric cancer? Gut. 2023;72(6):1027–9.
35. Hu B, Hajj E, Sittler N, Lammert S, Barnes N, R., Meloni-Ehrig A. Gastric cancer: classification, histology and application of molecular pathology. J Gastrointest Oncol. 2012;3(3):251.
36. Qu H-T, Li Q, Hao L, Ni Y-J, Luan W-Y, Yang Z, Chen X-D, Zhang T-T, Miao Y-D, Zhang F. Esophageal cancer screening, early detection and treatment: current insights and future directions. World J Gastrointest Oncol. 2024;16:1180.
37. Hölscher AH, Bollschweiler E, Schneider PM, Siewert JR. Prognosis of early esophageal cancer. Comparison between adeno-and squamous cell carcinoma. Cancer. 1995;76:178–86.
38. Li B, Qin W, Yang L, Li H, Jiang C, Yao Y, Cheng S, Zou B, Fan B, Dong T. From pixels to patient care: deep learning-enabled pathomics signature offers precise outcome predictions for immunotherapy in esophageal squamous cell cancer. J Transl Med. 2024;22:195.
39. Zhong L, Li H, Chang W, Ao Y, Wen Z, Chen Y. TP53 mutations in esophageal squamous cell carcinoma. Front Biosci Landmark. 2023;28(9):219.
40. Wang G, Guo S, Zhang W, Li Z, Xu J, Li D, Zhan Q. A comprehensive analysis of alterations in DNA damage repair pathways reveals a potential way to enhance the radio-sensitivity of esophageal squamous cell cancer. Front Oncol. 2020;10:575711.
41. Fusco N, Sajjadi E, Venetis K, Gaudioso G, Lopez G, Corti C, Rocco EG, Criscitiello C, Malapelle U, Invernizzi M. PTEN alterations and their role in cancer management: are we making headway on precision medicine? Genes. 2020;11(7):719.
42. Li B, Li J, Xu WW, Guan XY, Qin YR, Zhang LY, Cheung AL. Suppression of esophageal tumor growth and chemoresistance by directly targeting the PI3K/AKT pathway. Oncotarget. 2014;5(22):11576.
43. Lu J, Pan Y, Xia X, Gu Y, Lei Y. Prognostic significance of mTOR and PTEN in patients with esophageal squamous cell carcinoma. Biomed Res Int. 2015;2015:417210.
44. Murai K, Dentro S, Ong SH, Sood R, Fernandez-Antoran D, Herms A, Kostiou V, Abnizova I, Hall BA, Gerstung M, Jones PH. p53 mutation in normal esophagus promotes multiple stages of carcinogenesis but is constrained by clonal competition. Nat Commun. 2022;13(1):6206.
45. Ireland AP, Shibata DK, Chandrasoma P, Lord RV, Peters JH, DeMeester TR. Clinical significance of p53 mutations in adenocarcinoma of the esophagus and cardia. Ann Surg. 2000;231(2):179–87.
46. Chang D, Wang TY, Li HC, Wei JC, Song JX. Prognostic significance of PTEN expression in esophageal squamous cell carcinoma from Linzhou city, a high incidence area of Northern China. Dis Esophagus. 2007;20(6):491–6.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Not applicable.

[CR1] 1.Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Cancer J Clin. 2021;71:209–49. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Koizumi W, Kitago M, Shinoda M, Yagi H, Abe Y, Oshima G, Hori S, Inomata K, Kawakubo H, Kawaida M. Successful resection of pancreatic metastasis from oesophageal squamous cell carcinoma: a case report and review of the literature. BMC Cancer. 2019;19:1–5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Weinberg JS, Suki D, Hanbali F, Cohen ZR, Lenzi R, Sawaya R. Metastasis of esophageal carcinoma to the brain. Cancer. 2003;98:1925–33. [DOI] [PubMed] [Google Scholar]

[CR4] 4.Pennathur A, Gibson MK, Jobe BA, Luketich JD. Oesophageal carcinoma. Lancet. 2013;381:400–12. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Enzinger PC, Mayer RJ. Esophageal cancer. N Engl J Med. 2003;349:2241–52. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Dempsey D. Esophagectomy for T1 esophageal cancer: outcomes in 100 patients and implications for endoscopic therapy. In: Pennathur A, Farkas A, Krasinskas AM et al. (eds) (University of Pittsburgh Med Ctr; The Univ of Pittsburgh Cancer Inst Biostatistics Facility, Pittsburgh, PA) Ann Thorac Surg. 2009;87:1048–1055. Year Book of Gastroenterology 2009, 2009;157–158. [DOI] [PMC free article] [PubMed]

[CR7] 7.Vijayakumar S, Saravanan A, Sayeed N, Kirezi NGR, Duggirala NK, El-Hashash AH, Al Hussein H. Analyzing mortality patterns and location of death in patients with malignant esophageal neoplasms: a two-decade study in the united States. Cureus 2023;15. [DOI] [PMC free article] [PubMed]

[CR8] 8.Santucci C, Mignozzi S, Malvezzi M, Collatuzzo G, Levi F, La Vecchia C, Negri E. Global trends in esophageal cancer mortality with predictions to 2025, and in incidence by histotype. Cancer Epidemiol. 2023;87:102486. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Kim N. Sex difference of esophageal cancer: esophageal squamous cell carcinoma vs. esophageal adenocarcinoma. Sex/Gender-Specific medicine in the Gastrointestinal diseases. Springer; 2022. pp. 69–92.

[CR10] 10.Gerstberger S, Jiang Q, Ganesh K. Metastasis. Cell. 2023;186:1564–79. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Sheikh M, Roshandel G, McCormack V, Malekzadeh R. Current status and future prospects for esophageal cancer. Cancers. 2023;15:765. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Lee V, King ALO, Sritharan D, Moore NS, Chadha S, Maresca R, Hager T, Aneja S. Lymph node metastasis prediction with non-small cell lung cancer histopathology imaging. American Society of Clinical Oncology; 2024.

[CR13] 13.Gao F, Jiang L, Guo T, Lin J, Xu W, Yuan L, Han Y, Yang J, Pan Q, Chen E. Deep learning-based pathological prediction of lymph node metastasis for patient with renal cell carcinoma from primary whole slide images. J Translational Med. 2024;22:568. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Wessels F, Schmitt M, Krieghoff-Henning E, Jutzi T, Worst TS, Waldbillig F, Neuberger M, Maron RC, Steeg M, Gaiser T. Deep learning approach to predict lymph node metastasis directly from primary tumour histology in prostate cancer. BJU Int. 2021;128:352–60. [DOI] [PubMed] [Google Scholar]

[CR15] 15.Wu N, Du Z, Zhu Y, Song Y, Pang L, Chen Z. The expression and prognostic impact of the PI3K/AKT/mTOR signaling pathway in advanced esophageal squamous cell carcinoma. Technol Cancer Res Treat. 2018;17:1533033818758772. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Basson MA. Signaling in cell differentiation and morphogenesis. Cold Spring Harb Perspect Biol 2012;4. [DOI] [PMC free article] [PubMed]

[CR17] 17.da Silva HB, Amaral EP, Nolasco EL, de Victo NC, Atique R, Jank CC, Anschau V, Zerbini LF, Correa RG. Dissecting major signaling pathways throughout the development of prostate cancer. Prostate Cancer. 2013;2013:920612. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Nisar S, Hashem S, Macha MA, Yadav SK, Muralitharan S, Therachiyil L, Sageena G, Al-Naemi H, Haris M, Bhat AA. Exploring dysregulated signaling pathways in cancer. Curr Pharm Des. 2020;26:429–45. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Luo Q, Du R, Liu W, Huang G, Dong Z, Li X. PI3K/AKT/mTOR signaling pathway: role in esophageal squamous cell carcinoma, regulatory mechanisms and opportunities for targeted therapy. Front Oncol. 2022;12:852383. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Litjens G, Sánchez CI, Timofeeva N, Hermsen M, Nagtegaal I, Kovacs I, Hulsbergen-Van De Kaa C, Bult P, Van Ginneken B, Van Der Laak J. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep. 2016;6:26286. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Wang X, Chen Y, Gao Y, Zhang H, Guan Z, Dong Z, Zheng Y, Jiang J, Yang H, Wang L. Predicting gastric cancer outcome from resected lymph node histopathology images using deep learning. Nat Commun. 2021;12:1637. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.He L, Long LR, Antani S, Thoma GR. Histology image analysis for carcinoma detection and grading. Comput Methods Programs Biomed. 2012;107:538–56. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24. Krämer A, Green J, Pollard J Jr, Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30:523–30.

[CR25] 25. Talebi A, Celis-Morales CA, Borumandnia N, Abbasi S, Pourhoseingholi MA, Akbari A, Yousefi J. Predicting metastasis in gastric cancer patients: machine learning-based approaches. Sci Rep. 2023;13:4163.

[CR26] 26. Talebi R, Celis-Morales CA, Akbari A, Talebi A, Borumandnia N, Pourhoseingholi MA. Machine learning-based classifiers to predict metastasis in colorectal cancer patients. Front Artif Intell. 2024;7:1285037.

[CR27] 27. Aalam SW, Ahanger AB, Masoodi TA, Bhat AA, Akil ASA, Khan MA, Assad A, Macha MA, Bhat MR. Deep learning-based identification of esophageal cancer subtypes through analysis of high-resolution histopathology images. Front Mol Biosci. 2024;11:1346242.

[CR28] 28. Berman AG, Orchard WR, Gehrung M, Markowetz F. PathML: a unified framework for whole-slide image analysis with deep learning. medRxiv. 2021:2021.07.07.21260138.

[CR29] 29. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: a vendor-neutral software foundation for digital pathology. J Pathol Inf. 2013;4:27.

[CR30] 30. Hoffman RA, Kothari S, Phan JH, Wang MD. A high-resolution tile-based approach for classifying biological regions in whole-slide histopathological images. IFMBE Proc. 2014;42:280–3.

[CR31] 31. Lakshmanan B, Anand S, Jenitha T. Stain removal through color normalization of haematoxylin and eosin images: a review. In: Journal of Physics: Conference Series. IOP Publishing; 2019. p. 012108.

[CR32] 32. Qu H, Zhou M, Yan Z, Wang H, Rustgi VK, Zhang S, Gevaert O, Metaxas DN. Genetic mutation and biological pathway prediction based on whole slide images in breast carcinoma using deep learning. NPJ Precis Oncol. 2021;5:87.

[CR33] 33. Roy S, Panda S, Jangid M. Modified Reinhard algorithm for color normalization of colorectal cancer histopathology images. In: 2021 29th European Signal Processing Conference (EUSIPCO). IEEE; 2021. p. 1231–5.

[CR34] 34. Quante M, Wang TC, Bass AJ. Adenocarcinoma of the oesophagus: is it gastric cancer? Gut. 2023;72(6):1027–9.

[CR35] 35. Hu B, El Hajj N, Sittler S, Lammert N, Barnes R, Meloni-Ehrig A. Gastric cancer: classification, histology and application of molecular pathology. J Gastrointest Oncol. 2012;3(3):251.

[CR36] 36. Qu H-T, Li Q, Hao L, Ni Y-J, Luan W-Y, Yang Z, Chen X-D, Zhang T-T, Miao Y-D, Zhang F. Esophageal cancer screening, early detection and treatment: current insights and future directions. World J Gastrointest Oncol. 2024;16:1180.

[CR37] 37. Hölscher AH, Bollschweiler E, Schneider PM, Siewert JR. Prognosis of early esophageal cancer. Comparison between adeno- and squamous cell carcinoma. Cancer. 1995;76:178–86.

[CR38] 38. Li B, Qin W, Yang L, Li H, Jiang C, Yao Y, Cheng S, Zou B, Fan B, Dong T. From pixels to patient care: deep learning-enabled pathomics signature offers precise outcome predictions for immunotherapy in esophageal squamous cell cancer. J Transl Med. 2024;22:195.

[CR39] 39. Zhong L, Li H, Chang W, Ao Y, Wen Z, Chen Y. TP53 mutations in esophageal squamous cell carcinoma. Front Biosci Landmark. 2023;28(9):219.

[CR40] 40. Wang G, Guo S, Zhang W, Li Z, Xu J, Li D, Zhan Q. A comprehensive analysis of alterations in DNA damage repair pathways reveals a potential way to enhance the radio-sensitivity of esophageal squamous cell cancer. Front Oncol. 2020;10:575711.

[CR41] 41. Fusco N, Sajjadi E, Venetis K, Gaudioso G, Lopez G, Corti C, Rocco EG, Criscitiello C, Malapelle U, Invernizzi M. PTEN alterations and their role in cancer management: are we making headway on precision medicine? Genes. 2020;11(7):719.

[CR42] 42. Li B, Li J, Xu WW, Guan XY, Qin YR, Zhang LY, Cheung AL. Suppression of esophageal tumor growth and chemoresistance by directly targeting the PI3K/AKT pathway. Oncotarget. 2014;5(22):11576.

[CR43] 43. Lu J, Pan Y, Xia X, Gu Y, Lei Y. Prognostic significance of mTOR and PTEN in patients with esophageal squamous cell carcinoma. Biomed Res Int. 2015;2015:417210.

[CR44] 44. Murai K, Dentro S, Ong SH, Sood R, Fernandez-Antoran D, Herms A, Kostiou V, Abnizova I, Hall BA, Gerstung M, Jones PH. p53 mutation in normal esophagus promotes multiple stages of carcinogenesis but is constrained by clonal competition. Nat Commun. 2022;13(1):6206.

[CR45] 45. Ireland AP, Shibata DK, Chandrasoma P, Lord RV, Peters JH, DeMeester TR. Clinical significance of p53 mutations in adenocarcinoma of the esophagus and cardia. Ann Surg. 2000;231(2):179–87.

[CR46] 46. Chang D, Wang TY, Li HC, Wei JC, Song JX. Prognostic significance of PTEN expression in esophageal squamous cell carcinoma from Linzhou city, a high incidence area of Northern China. Dis Esophagus. 2007;20(6):491–6.
