Abstract
Objective
To develop a deep learning-based MRI model for predicting tongue cancer T-stage.
Methods
This retrospective study analyzed clinical and MRI data from 579 patients with tongue cancer from two centers (Xiangya Cancer Hospital and Jiangsu Province Hospital). T2-weighted (T2WI) and contrast-enhanced T1-weighted (CET1) images were preprocessed (anonymization, resampling, and intensity calibration). Regions of interest (ROIs) were segmented by two radiologists, and 2375 reproducible radiomics features (intraclass correlation coefficient (ICC) > 0.75) extracted with PyRadiomics were retained. ResNet18 and ResNet50 were used to build deep learning radiomics models (DLRresnet18 and DLRresnet50), which were compared with a radiomics model (Rad) based on 17 selected features. Performance was evaluated with the AUC, decision curve analysis (DCA), integrated discrimination improvement (IDI), and net reclassification improvement (NRI) in the training, test, and external validation sets.
Results
In the training set, both deep learning models outperformed Rad (AUC: DLRresnet18 = 0.837, DLRresnet50 = 0.847 vs. Rad = 0.828). Results in the test and external validation sets were consistent (DLRresnet18, AUC = 0.805/0.857; DLRresnet50, AUC = 0.810/0.860). Decision curve analysis (DCA) showed that both deep learning models performed better than the Rad model in the training, test, and external validation sets. Furthermore, both the NRI and IDI of the two deep learning models relative to the Rad model were greater than 0.
Conclusion
The DLRresnet18 and DLRresnet50 models significantly improve T-stage prediction accuracy over traditional radiomics, reducing subjective interpretation errors and supporting personalized treatment planning. These findings provide new ideas and tools for image-assisted T staging of tongue cancer.
Level of evidence
III.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12885-025-14627-6.
Keywords: Tongue cancer, Magnetic resonance imaging, Radiomics, Deep learning
Introduction
Tongue cancer is one of the most common malignant tumors in the head and neck region, accounting for 25–40% of oral cancers. Its incidence is rising globally, especially in Asia, where it is closely related to smoking, alcohol consumption, and HPV infection [1]. Although surgical resection combined with radiotherapy or chemotherapy is the main treatment modality, patient prognosis remains limited by early tumor metastasis and recurrence. Clinically, accurate assessment of the tumor's T stage (primary tumor size and extent of local invasion) is crucial for formulating individualized treatment plans [2]. However, traditional imaging assessment (CT or MRI) relies on the subjective experience of physicians to judge the T stage, which can lead to considerable inter-observer variability [3].
Radiomics is an emerging technology that extracts and analyzes large amounts of high-dimensional data from medical images. By analyzing radiomic features such as shape, texture, and contrast, the biological characteristics of tumors can be quantified more precisely. In recent years, radiomics has provided new ideas for non-invasive tumor staging by combining machine learning algorithms with high-throughput extraction of quantitative image features [4]. Bilal et al. used deep learning models to predict molecular pathway status from colorectal cancer tissue sections, demonstrating the potential of multimodal data integration [5]. However, individual radiomic features often fail to capture tumor heterogeneity, a shortcoming that deep learning may overcome by automatically learning high-level semantic features. This is particularly relevant in tongue cancer, where tumor boundaries are indistinct and contrast with surrounding tissues is low, which challenges traditional feature extraction methods [6]. Deep learning, a powerful subset of machine learning, excels at automatically extracting hierarchical features from raw data. Unlike radiomics, which relies on manually engineered features, deep learning models such as convolutional neural networks (CNNs) and residual networks (ResNets) can learn complex patterns directly from images, enhancing predictive accuracy and generalizability [7]. For instance, in medical imaging, deep learning models have shown superior performance in predicting tumor stages and outcomes, with higher AUC values than traditional methods. They can also handle image heterogeneity when combined with correction algorithms and the selection of consistent features [3]. Future directions include exploring lightweight models such as MobileNet and integrating multimodal data to further improve performance [8].
The potential of deep learning in medical imaging is immense, promising more accurate and efficient diagnostic tools.
This study aims to construct a fusion model for predicting the T stage of tongue cancer by integrating traditional radiomic features with deep learning features, and to validate its generalization ability in multicenter datasets. The results are expected to provide an objective and automated T-staging tool for clinical use, assisting in treatment decision-making.
Materials and methods
Patients
This study retrospectively enrolled 579 patients from the two hospitals between December 2019 and October 2024. All patients underwent surgical resection, and their pathological stage was confirmed by histopathological examination. The study was approved by the medical ethics committees of Jiangsu Province Hospital (No. 2025-SR-111) and Xiangya Cancer Hospital, Central South University (No. SBQLL-2024–064), and the need for informed consent was waived.
The inclusion criteria were as follows: (1) patients with tongue cancer who underwent MRI within 2 weeks before surgery; (2) patients who had not received surgery, radiotherapy, chemotherapy, or targeted therapy; and (3) patients with complete clinical records, imaging data, and pathological reports. The exclusion criteria were as follows: (1) poor image quality; (2) absence of a CET1 or T2WI sequence; and (3) prior surgery, radiotherapy, chemotherapy, or targeted therapy. After screening, 579 patients were enrolled. Patients from center 1 were divided into an internal training set and an internal validation set at a ratio of 7:3 (357:153), and patients from center 2 (n = 69) formed the external test set. The flow chart of patient inclusion and exclusion is shown in Fig. 1.
Fig. 1.
Flow chart of inclusion and exclusion criteria
Image acquisition and processing
All participants underwent MRI on one of two scanners (Siemens or GE); the scan protocols are described in Supplementary Table S1. All MRI images were resampled to a voxel size of 0.5 × 0.5 × 4 mm³ before feature extraction, and N4 bias field correction [9] was applied to correct intensity inhomogeneity.
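As an illustration of the resampling step, the sketch below resamples a NumPy volume to the 0.5 × 0.5 × 4 mm target spacing with `scipy.ndimage.zoom`. It assumes the array axes are ordered like the spacing tuple; the helper name `resample_to_spacing` and the choice of linear interpolation for images and nearest-neighbour interpolation for masks are illustrative assumptions, not details reported by the study. N4 correction itself is usually run with a dedicated toolkit (for example, SimpleITK's `N4BiasFieldCorrectionImageFilter`) and is not reproduced here.

```python
import numpy as np
from scipy.ndimage import zoom

# Target voxel spacing in mm, as reported in the text.
TARGET_SPACING = (0.5, 0.5, 4.0)


def resample_to_spacing(volume, spacing, target=TARGET_SPACING, is_mask=False):
    """Hypothetical helper: resample a 3D array from `spacing` to `target` (mm).

    Assumes the array axes follow the same order as the spacing tuples.
    """
    # Zoom factor per axis: >1 adds voxels, <1 removes them.
    factors = [s / t for s, t in zip(spacing, target)]
    if is_mask:
        # Nearest-neighbour interpolation keeps label values intact.
        return zoom(volume, factors, order=0)
    # Linear interpolation for image intensities.
    return zoom(volume.astype(np.float32), factors, order=1)


# Synthetic example: a 10 x 10 x 6 volume acquired at 1.0 x 1.0 x 2.0 mm.
rng = np.random.default_rng(0)
vol = rng.normal(size=(10, 10, 6))
mask = (vol > 0).astype(np.uint8)
img_rs = resample_to_spacing(vol, spacing=(1.0, 1.0, 2.0))
mask_rs = resample_to_spacing(mask, spacing=(1.0, 1.0, 2.0), is_mask=True)
print(img_rs.shape, mask_rs.shape)  # (20, 20, 3) (20, 20, 3)
```

Resampling the mask with order 0 keeps it binary, so the ROI stays aligned with the resampled image without introducing fractional labels.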
ROI segmentation
ITK-SNAP software was used to register T2WI and CET1 so that anatomical positions and spatial structures were consistent between the two sequences. The tumor was delineated along its boundary on each MRI slice, generating the region of interest (ROI) layer by layer.
Twenty patients with T1-2 tumors and 20 with T3-4 tumors were randomly selected for intraclass correlation coefficient (ICC) analysis. Two radiologists, blinded to the patients' pathological data, independently performed the segmentation. An ICC > 0.75 for radiomics and deep learning (DL) features indicated good agreement.
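The ICC filter can be sketched in NumPy as follows. The study does not state which ICC form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss), and the helper names `icc_2_1` and `reproducible_features` are hypothetical. Each feature is measured on the same patients by both readers, and only features with ICC > 0.75 are kept.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_subjects, k_raters) array for one feature.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between-subject
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between-rater
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(reader1, reader2, threshold=0.75):
    """Hypothetical helper: indices of features with ICC(2,1) above `threshold`.

    reader1, reader2: (n_subjects, n_features) arrays from the two radiologists.
    """
    r1 = np.asarray(reader1, dtype=float)
    r2 = np.asarray(reader2, dtype=float)
    iccs = np.array([icc_2_1(np.column_stack([r1[:, j], r2[:, j]]))
                     for j in range(r1.shape[1])])
    return np.flatnonzero(iccs > threshold), iccs


# Synthetic example: 40 patients (20 T1-2 + 20 T3-4), 3 features.
rng = np.random.default_rng(42)
truth = rng.normal(size=(40, 3))
reader1 = truth.copy()
reader2 = truth.copy()
reader2[:, 1] += rng.normal(scale=0.1, size=40)  # small disagreement
reader2[:, 2] = rng.normal(size=40)              # unrelated measurement
kept, iccs = reproducible_features(reader1, reader2)
print(kept, np.round(iccs, 3))
```

Features that the two readers reproduce closely keep an ICC near 1, whereas a feature that changes unpredictably between segmentations falls far below the 0.75 cut-off and is discarded.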
Radiomics feature and deep learning feature extraction
PyRadiomics was used to extract radiomics features from the CET1 and T2WI images of each patient. The features included first-order features, two-dimensional shape features, and texture features derived from the gray-level co-occurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level size-zone matrix (GLSZM), gray-level run-length matrix (GLRLM), and neighboring gray tone difference matrix (NGTDM).
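To illustrate what a texture-matrix feature measures, the sketch below builds a normalised, symmetric GLCM for a single 2D pixel offset and computes two GLCM features, contrast and joint energy. This is a simplified teaching example with hypothetical function names: PyRadiomics applies its own gray-level discretisation and aggregates matrices over multiple directions (in 3D by default), so its values will not match this single-offset sketch.

```python
import numpy as np


def glcm_2d(image, levels, offset=(0, 1), symmetric=True):
    """Normalised gray-level co-occurrence matrix for one pixel offset.

    `image` must already be discretised to integers in [0, levels).
    """
    img = np.asarray(image, dtype=int)
    dr, dc = offset
    rows, cols = img.shape
    # Crop so that every reference pixel has a neighbour at (row + dr, col + dc).
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    ref = img[r0:r1, c0:c1].ravel()
    nbr = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc].ravel()
    P = np.zeros((levels, levels), dtype=float)
    np.add.at(P, (ref, nbr), 1.0)  # count each (reference, neighbour) pair
    if symmetric:
        P = P + P.T  # count each pair in both directions
    return P / P.sum()


def glcm_contrast(P):
    # Weights co-occurrences by the squared gray-level difference.
    i, j = np.indices(P.shape)
    return float(np.sum(P * (i - j) ** 2))


def glcm_joint_energy(P):
    # Sum of squared probabilities; high for uniform, repetitive texture.
    return float(np.sum(P ** 2))


checker = np.array([[0, 1], [1, 0]])
P = glcm_2d(checker, levels=2)
print(glcm_contrast(P), glcm_joint_energy(P))  # 1.0 0.5
```

A checkerboard, where every horizontal neighbour differs, has maximal contrast for two levels, whereas a constant patch has zero contrast and a joint energy of 1.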
For each patient, the largest tumor region of interest (ROI) was cropped from both the T2WI and CET1 sequences. The cropped images were augmented with random horizontal and vertical flips and then uniformly cropped to 224 × 224 pixels. ResNet18 and ResNet50 (https://gitee.com/wangqingbaidu/OnekeyCompo) are widely used deep learning architectures in radiomics [3, 5]. Both networks were pre-trained on the ImageNet dataset, and transfer learning was performed on the training set. During training, the ResNet18 and ResNet50 parameters were iteratively updated by backpropagation using a cross-entropy loss between the output probability and the T-stage label. The learning rate was set to 1 × 10⁻⁴, and the parameters were updated with the Adam optimizer. A batch size of 64, L2 regularization, and an early stopping strategy were used to prevent overfitting. The trained CNN predicts the T stage of each MRI patch, and patient-level probabilities were obtained by averaging the probabilities of all MRI patches for a patient. After training, 2048 DL features per patch were extracted from the penultimate (average pooling) layer of ResNet50, and 512 DL features were extracted from the corresponding layer of ResNet18. The workflow of the deep learning radiomics analysis is shown in Fig. 2.
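The patient-level aggregation described above (averaging patch probabilities per patient) can be sketched in NumPy as follows. The function name is hypothetical, and the positive class is assumed to be T3-4. In PyTorch/torchvision, penultimate-layer features are commonly obtained by replacing the network's final `fc` layer with `torch.nn.Identity()`, which yields 512-dimensional outputs for ResNet18 and 2048-dimensional outputs for ResNet50; that step is not reproduced here.

```python
import numpy as np


def patient_level_probabilities(patient_ids, patch_probs):
    """Average patch-level probabilities into one probability per patient.

    patient_ids: one identifier per patch.
    patch_probs: predicted probability of the positive class (assumed T3-4) per patch.
    Returns {patient_id: mean probability}.
    """
    ids = np.asarray(patient_ids)
    probs = np.asarray(patch_probs, dtype=float)
    unique_ids, inverse = np.unique(ids, return_inverse=True)
    sums = np.bincount(inverse, weights=probs)  # summed probability per patient
    counts = np.bincount(inverse)               # number of patches per patient
    return dict(zip(unique_ids.tolist(), (sums / counts).tolist()))


ids = ["P001", "P001", "P001", "P002", "P002"]
probs = [0.62, 0.70, 0.55, 0.20, 0.30]
print(patient_level_probabilities(ids, probs))
```

Averaging gives every patch equal weight; other pooling rules (for example, taking the maximum) are possible but are not what the text describes.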
Fig. 2.
Workflow of deep learning-radiomics analysis
Statistical analysis
Categorical variables were compared between groups using the chi-square test or Fisher's exact test, and continuous variables using the Mann–Whitney U test or the independent-samples t-test. Model performance was evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), along with accuracy, sensitivity, and specificity. The net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were used to compare the performance of different models. Statistical significance was set at P < 0.05. The following software tools were used: ITK-SNAP (v3.8.0), custom Python (v3.7.12) scripts, and RStudio (v4.3.2). Key Python packages included Pandas (v1.2.4), NumPy (v1.20.2), PyTorch (v1.8.0), Onekey (v2.2.3), OpenSlide (v1.2.0), Seaborn (v0.11.1), Matplotlib (v3.4.2), SciPy (v1.7.3), Scikit-learn (v1.0.2), and PyRadiomics (v3.0).
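The AUC equals the probability that a randomly chosen positive case (here assumed to be T3-4) receives a higher score than a randomly chosen negative case, which can be computed from the Mann–Whitney rank statistic. The sketch below shows this together with accuracy, sensitivity, and specificity; the 0.5 decision threshold and the function names are assumptions, since the study does not report its operating threshold.

```python
import numpy as np
from scipy.stats import rankdata


def auc_mann_whitney(y_true, y_score):
    """AUC from ranks: P(score of a positive > score of a negative), ties count 1/2."""
    y = np.asarray(y_true).astype(bool)
    ranks = rankdata(np.asarray(y_score, dtype=float))  # average ranks handle ties
    n_pos, n_neg = y.sum(), (~y).sum()
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for the positives
    return float(u / (n_pos * n_neg))


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at an assumed probability threshold."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(y_score, dtype=float) >= threshold
    tp, tn = np.sum(pred & y), np.sum(~pred & ~y)
    fp, fn = np.sum(pred & ~y), np.sum(~pred & y)
    return {
        "accuracy": float((tp + tn) / y.size),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc_mann_whitney(y, s))   # 0.75
print(threshold_metrics(y, s))  # {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0}
```

Unlike the threshold metrics, the AUC does not depend on the chosen cut-off, which is why it is reported alongside them in Table 2.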
Results
Clinical characteristics of the patients
Baseline clinical data for all patients were retrieved retrospectively from the electronic medical record systems of the two hospitals. The clinical characteristics of the patients are shown in Table 1.
Table 1.
Clinical characteristics of patients in the training set, test set and external validation set
| Characteristic | Training set (n = 357), n (%) | T1-2 | T3-4 | P value | Test set (n = 69), n (%) | T1-2 | T3-4 | P value | Validation set (n = 153), n (%) | T1-2 | T3-4 | P value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age (years), mean ± SD | 60.89 ± 12.18 | 60.02 ± 12.23 | 61.86 ± 12.09 | 0.090 | 60.16 ± 12.67 | 58.30 ± 12.35 | 66.87 ± 11.87 | 0.019 | 60.82 ± 12.28 | 61.62 ± 12.61 | 59.95 ± 11.93 | 0.479 |
| Gender | | | | 0.208 | | | | 0.156 | | | | 0.212 |
| Male | 335(93.84) | 174(92.06) | 161(95.83) | | 42(60.87) | 30(55.56) | 12(80.00) | | 150(98.04) | 80(100.00) | 70(95.89) | |
| Female | 22(6.16) | 15(7.94) | 7(4.17) | | 27(39.13) | 24(44.44) | 3(20.00) | | 3(1.96) | 0(0.00) | 3(4.11) | |
| Smoking | | | | 0.177 | | | | 0.137 | | | | 0.478 |
| No | 198(55.46) | 98(51.85) | 100(59.52) | | 37(53.62) | 32(59.26) | 5(33.33) | | 79(51.63) | 44(55.00) | 35(47.95) | |
| Yes | 159(44.54) | 91(48.15) | 68(40.48) | | 32(46.38) | 22(40.74) | 10(66.67) | | 74(48.37) | 36(45.00) | 38(52.05) | |
| Drinking | | | | 0.656 | | | | 0.086 | | | | 0.907 |
| No | 252(70.59) | 131(69.31) | 121(72.02) | | 43(62.32) | 37(68.52) | 6(40.00) | | 94(61.44) | 50(62.50) | 44(60.27) | |
| Yes | 105(29.41) | 58(30.69) | 47(27.98) | | 26(37.68) | 17(31.48) | 9(60.00) | | 59(38.56) | 30(37.50) | 29(39.73) | |
| N stage | | | | < 0.001 | | | | 0.002 | | | | 0.009 |
| 0 | 215(60.22) | 138(73.02) | 77(45.83) | | 46(66.67) | 39(72.22) | 7(46.67) | | 97(63.40) | 60(75.00) | 37(50.68) | |
| 1 | 62(17.37) | 28(14.81) | 34(20.24) | | 9(13.04) | 9(16.67) | 0(0.00) | | 19(12.42) | 9(11.25) | 10(13.70) | |
| 2 | 78(21.85) | 23(12.17) | 55(32.74) | | 10(14.49) | 5(9.26) | 5(33.33) | | 36(23.53) | 11(13.75) | 25(34.25) | |
| 3 | 2(0.56) | 0(0.00) | 2(1.19) | | 4(5.80) | 1(1.85) | 3(20.00) | | 1(0.65) | 0(0.00) | 1(1.37) | |
Radiomics feature selection and model construction
A total of 2394 radiomics features were extracted from CET1 and T2WI, of which 2375 with ICC > 0.75 were retained for subsequent feature selection. The radiomics features were standardized using z-score normalization. To address strong correlations between features (Spearman correlation coefficient ≥ 0.9), a greedy recursive elimination strategy was employed: the most redundant feature was iteratively removed from the current set until no pair of features with a correlation coefficient ≥ 0.9 remained. The features were then further refined using multivariate least absolute shrinkage and selection operator (LASSO) regression (Fig. 3A, D) and minimum redundancy maximum relevance (mRMR). Finally, 17 radiomics features were used to construct the radiomics model (Rad); their weights are shown in Fig. 4A. The Naive Bayes algorithm is known for its simplicity and efficiency [10], and the Rad model was constructed with a Naive Bayes classifier; the specific parameters are shown in Supplementary Table S2.
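The z-score step and the greedy correlation filter can be sketched as follows. The text does not define "most redundant"; this sketch assumes it means the feature with the most partners at |Spearman ρ| ≥ 0.9, with ties broken by the larger mean |ρ|. The function names are hypothetical, and the subsequent LASSO and mRMR steps (for example, scikit-learn's `LassoCV` for LASSO) are not reproduced.

```python
import numpy as np
from scipy.stats import rankdata


def zscore_fit_transform(train, *others):
    """Standardise with training-set mean/SD and apply the same transform to other sets."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    return [(x - mu) / sd for x in (train, *others)]


def greedy_spearman_filter(X, threshold=0.9):
    """Drop the most redundant feature until no pair has |Spearman rho| >= threshold.

    Returns the column indices that are kept.
    """
    ranks = rankdata(X, axis=0)                     # Spearman = Pearson on ranks
    rho = np.abs(np.corrcoef(ranks, rowvar=False))
    np.fill_diagonal(rho, 0.0)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        sub = rho[np.ix_(keep, keep)]
        n_high = (sub >= threshold).sum(axis=1)     # highly correlated partners
        if n_high.max() == 0:
            break
        # lexsort uses the last key as primary: n_high first, then mean |rho|.
        worst = np.lexsort((sub.mean(axis=1), n_high))[-1]
        keep.pop(int(worst))
    return keep


# Synthetic example: column 1 is a monotone copy of column 0.
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
X = np.column_stack([a, 2 * a + 1, b])
X_train = zscore_fit_transform(X)[0]
kept = greedy_spearman_filter(X_train)
print(kept)
```

Fitting the normalisation on the training set only and reusing its mean and SD for the other sets avoids leaking information from the validation and test cohorts into model construction.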
Fig. 3.
LASSO regression modeling. (A, B, C) LASSO coefficient path diagrams of the radiomics, DLRresnet18, and DLRresnet50 models. (D, E, F) Cross-validation curves of the LASSO regression for the radiomics, DLRresnet18, and DLRresnet50 models
Fig. 4.
Feature construction for the radiomics and deep learning radiomics models using Naive Bayes. (A) Radiomics model. (B) DLRresnet18 model. (C) DLRresnet50 model
Deep learning feature and deep learning radiomics model construction
Supplementary Fig. 1 illustrates region-specific activation patterns of the deep learning architectures in tongue cancer MRI. ResNet50 (a 50-layer residual network) localizes the tumor core with precise correspondence to pathological boundaries, indicating sensitivity to morphometric determinants (e.g., maximum tumor diameter) and infiltration characteristics, which are established indicators for T-stage classification. Conversely, ResNet18 shows stronger activation at the tumor–normal tissue interface, capturing subtle morphological transitions through its residual blocks. This differential attention suggests complementary roles: ResNet50 quantifies invasion extent, whereas ResNet18 delineates marginal heterogeneity, and both are essential for staging accuracy.
ResNet18 and ResNet50 yielded 512 and 2048 DL features, respectively, of which 512 and 1883 features with ICC > 0.75 were retained for further screening. These features were standardized by z-score normalization and then screened by Spearman correlation analysis, LASSO, and minimum redundancy maximum relevance to construct the DLRresnet18 and DLRresnet50 models (Fig. 3B, C, E, F). Twelve radiomics features and two DL features were used to construct the DLRresnet18 model, and 17 radiomics features and seven DL features were used to construct the DLRresnet50 model (Fig. 4B, C). As for the Rad model, Naive Bayes was used for classification.
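The study does not state which Naive Bayes variant was used. Assuming a Gaussian Naive Bayes (the usual choice for continuous features, available as scikit-learn's `GaussianNB`), the classifier applied to concatenated radiomics and DL features can be sketched from scratch as below. The variance smoothing mirrors scikit-learn's scheme; the class name and the synthetic arrays `rad_feats` and `dl_feats` are hypothetical.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are independent normals within each class."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        eps = self.var_smoothing * X.var(axis=0).max()  # keeps variances > 0
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Class log-likelihood = sum of per-feature Gaussian log-densities.
        diff2 = (X[:, None, :] - self.mu_[None, :, :]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        joint = log_lik + self.log_prior_
        joint -= joint.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(joint)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Synthetic example mirroring DLRresnet18: 12 radiomics + 2 DL features.
rng = np.random.default_rng(1)
rad_feats = np.vstack([rng.normal(0, 1, (30, 12)), rng.normal(1.5, 1, (30, 12))])
dl_feats = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(1.5, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)   # 0 = T1-2, 1 = T3-4
X = np.hstack([rad_feats, dl_feats])  # feature-level fusion by concatenation
model = GaussianNaiveBayes().fit(X, y)
proba_t34 = model.predict_proba(X)[:, 1]
```

Because the class log-likelihood is a plain sum over features, radiomics and DL features contribute on the same footing once concatenated; this is also where the independence assumption discussed in the limitations enters.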
Evaluation of the radiomics, deep learning models and the fusion models
In the training set, the AUCs of the two DLR models (DLRresnet18, AUC = 0.837; DLRresnet50, AUC = 0.847) were higher than that of the Rad model (AUC = 0.828). In the test set and the external test set, the AUCs of DLRresnet18 (AUC = 0.805 and 0.857) and DLRresnet50 (AUC = 0.810 and 0.860) were consistent with the training set and higher than those of the Rad model (AUC = 0.770 and 0.801) (Fig. 5). DCA curves also showed that DLRresnet18 and DLRresnet50 were superior to the Rad model in the training, test, and external validation sets (Fig. 6). To assess whether the two DLR models improved classification relative to the Rad model, we used the NRI and IDI. In the test set, the NRI and IDI of DLRresnet18 relative to the Rad model were 0.001 and 0.030, respectively, and those of DLRresnet50 were 0.033 and 0.042. In the external dataset, DLRresnet18 (NRI = 0.044, IDI = 0.029) and DLRresnet50 (NRI = 0.048, IDI = 0.062) also showed positive NRI and IDI (Fig. 7). Supplementary Fig. 2 shows the NRI and IDI of the two DLR models relative to the Rad model in the training set: 0.058 and 0.065 for DLRresnet18, and 0.056 and 0.066 for DLRresnet50. The probability distribution histograms for the Rad and DLR models are presented in Supplementary Fig. 3. The performance metrics (AUC, accuracy, sensitivity, and specificity) of all three models are shown in Table 2.
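The two reclassification measures can be sketched as follows. The IDI is the gain in the discrimination slope (mean predicted probability in events minus that in non-events) of the new model over the old one. The text does not specify the NRI variant, so this sketch assumes a two-category NRI at a hypothetical 0.5 cutoff; the function names and the toy data are illustrative.

```python
import numpy as np


def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement: change in discrimination slope."""
    y = np.asarray(y_true).astype(bool)
    p_old, p_new = np.asarray(p_old, float), np.asarray(p_new, float)
    old_slope = p_old[y].mean() - p_old[~y].mean()
    new_slope = p_new[y].mean() - p_new[~y].mean()
    return new_slope - old_slope


def categorical_nri(y_true, p_old, p_new, cutoff=0.5):
    """Two-category NRI at a single probability cutoff (cutoff is an assumption)."""
    y = np.asarray(y_true).astype(bool)
    old_pos = np.asarray(p_old, float) >= cutoff
    new_pos = np.asarray(p_new, float) >= cutoff
    up = new_pos & ~old_pos    # moved to the higher-risk category
    down = old_pos & ~new_pos  # moved to the lower-risk category
    nri_events = up[y].mean() - down[y].mean()
    nri_nonevents = down[~y].mean() - up[~y].mean()
    return nri_events + nri_nonevents


# Toy example: 1 = T3-4 (event), 0 = T1-2; p_old from a reference model, p_new from a new one.
y = [1, 1, 0, 0]
p_old = [0.4, 0.6, 0.6, 0.3]
p_new = [0.7, 0.6, 0.4, 0.3]
print(categorical_nri(y, p_old, p_new))  # 1.0
print(idi(y, p_old, p_new))
```

A positive value of either measure means the new model moves probabilities in the right direction on balance: upward for T3-4 patients and downward for T1-2 patients.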
Fig. 5.
The AUCs of the Rad model and the two DLR models in the training set (A), test set (B) and external validation set (C)
Fig. 6.
DCA curves of the Rad model and the two DLR models in the training set (A), test set (B) and external validation set (C)
Fig. 7.
The NRI and IDI values of the two DLR models compared with the Rad model in test set (A, C) and external validation set (B, D)
Table 2.
The performance metrics, including AUC, ACC, TPR, and TNR, for all three models
| Model | Cohort | AUC (95% CI) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Rad | Training | 0.828 (0.7866–0.8695) | 0.751 | 0.702 | 0.794 |
| | Validation | 0.770 (0.6948–0.8449) | 0.719 | 0.712 | 0.725 |
| | External test | 0.801 (0.6599–0.9426) | 0.783 | 0.733 | 0.796 |
| DLRresnet18 | Training | 0.837 (0.7965–0.8783) | 0.773 | 0.839 | 0.714 |
| | Validation | 0.805 (0.7374–0.8729) | 0.712 | 0.849 | 0.587 |
| | External test | 0.857 (0.7592–0.9544) | 0.710 | 0.867 | 0.667 |
| DLRresnet50 | Training | 0.847 (0.8075–0.8863) | 0.776 | 0.780 | 0.772 |
| | Validation | 0.810 (0.7433–0.8769) | 0.732 | 0.808 | 0.662 |
| | External test | 0.860 (0.7433–0.8769) | 0.783 | 0.800 | 0.778 |
Discussion
Traditional imaging methods such as CT and MRI, while capable of providing morphological information about tumors, often fall short in reflecting the internal heterogeneity of tumors. Deep learning radiomics analysis, by extracting and analyzing these advanced quantitative features, can more accurately assess the nature and extent of tumors [11]. Radiomics not only plays a role in diagnosis but also provides important support in the treatment of tongue cancer [12]. MRI radiomics based on machine learning can predict the pathological differentiation degree of tongue cancer [13]. This is further supported by our DLR models' high AUC (0.860 in the external test set, Table 2) and clear feature separation in the Naive Bayes probability distributions (Fig. 4), demonstrating superior capture of tumor heterogeneity compared with the traditional Rad model.
By extracting imaging features of tumors and combining them with machine learning algorithms, it is possible to establish classification or predictive models of imaging features related to tumor phenotypes or gene-protein characteristics [14]. Radiomics can predict the risk of neck lymph node metastasis in patients with tongue cancer by analyzing MRI image features. For example, MRI texture analysis based on the gray-level co-occurrence matrix (GLCM) can help predict cervical lymph node metastasis in patients with tongue cancer [3]. Our clinical data (Table 1) support this: N stage was significantly associated with T stage (p < 0.001), with T3-4 patients having a higher N2 rate (32.74% vs 12.17% in the training set), in line with the DLR model's ability to detect advanced invasion.
Radiomics can also be used to assess the response of patients with HNSCC to treatment. By comparing the changes in imaging features before and after treatment, the therapeutic effect can be more accurately evaluated, providing a basis for adjusting the treatment plan [15]. By predicting pathological differentiation, neck lymph node metastasis, and treatment response, radiomics provides new evidence for clinical decision-making.
In contrast, deep learning models, with their multi-layered neural network architecture, can automatically identify optimal features from raw image data, offering greater flexibility, nonlinearity, and rich parameterization to predict clinical outcomes [16]. Moreover, the performance of deep learning models tends to improve as the volume of training data and model capacity increase. As a result, deep learning algorithms have been shown to enhance predictive performance in 74% of studies, whether used alone or integrated with conventional radiomics [17]. Our results are consistent with this: DLRresnet50 achieved IDI = 0.062 and NRI = 0.048 in external validation (Fig. 7), indicating improved discrimination over the Rad model. The Grad-CAM heatmaps (Supplementary Fig. 1) further illustrate how deep features localize invasion boundaries (e.g., tongue base infiltration in T3 cases).
The T staging of tongue cancer is based on tumor size and depth of invasion and is mainly used to guide the selection of treatment plans. Although T staging plays an important role in guiding treatment, it still has certain limitations [18]. The 2017 AJCC 8th edition introduced the concept of depth of invasion, but its histological assessment is somewhat subjective, which can affect the choice of treatment plan. Currently, there is no effective imaging tool for preoperative T staging in clinical application, and no relevant models have been reported. In this multicenter retrospective cohort study, we established, for the first time, deep learning radiomics (DLR) models to predict the preoperative T stage of tongue cancer. DLRresnet18 and DLRresnet50 are derived from the ResNet architecture and use residual networks with 18 and 50 layers, respectively, to process complex features from the input data. In medical image analysis, these architectures have demonstrated powerful feature extraction capabilities and adaptability; with architectural improvements and transfer learning, they can significantly enhance diagnostic accuracy and efficiency, providing a robust tool for medical image analysis [19, 20]. The LASSO feature selection process (Fig. 3) refined this capability: for example, DLRresnet50 retained seven deep features, limiting overfitting while maintaining an external test AUC of 0.860 (Table 2).
Compared to traditional CT, MRI has good soft tissue resolution in the preoperative assessment of tongue cancer. In tongue cancer with severe soft tissue involvement, MRI is more suitable for accurate T staging [21]. Supplementary Table S1 details the MRI parameters enabling this: uniform 4 mm slice thickness across GE/Siemens scanners ensured feature extraction consistency, critical for multicenter validation.
This study retrospectively analyzed MRI data from 579 patients with tongue cancer from two centers and constructed a radiomics (Rad) model and deep learning radiomics (DLR) models based on the ResNet18 and ResNet50 architectures, which can effectively distinguish T1-2 from T3-4 tongue cancer. To address potential heterogeneity of images from different centers, we applied intensity inhomogeneity correction and performed interobserver reproducibility assessments to select features with good consistency (ICC > 0.75). By integrating CET1 and T2WI radiomics features with deep learning features from the largest tumor cross-section, the DLR models demonstrated reliable predictive ability, with AUC values (0.805–0.860) higher than those of the traditional radiomics model (Rad, AUC = 0.770–0.828) in the training, test, and external validation sets, indicating that deep learning features can effectively complement traditional features (NRI = 0.001–0.048, IDI = 0.029–0.062). Figure 7 and Supplementary Fig. 2 quantify this complementarity: NRI values were consistently positive across all cohorts (p < 0.05), confirming the superior reclassification ability of the DLR models. Decision curve analysis (DCA) showed that the fusion models had higher net benefit across a wide range of threshold probabilities, supporting their potential as non-invasive staging tools. As seen in Fig. 6, DLRresnet50 provided the highest net benefit at clinical decision thresholds. We validated the applicability and generalizability of the proposed DLR analysis using the internal and external datasets. These findings suggest that a multimodal feature fusion strategy can significantly enhance the predictive accuracy of tongue cancer staging, providing a reliable basis for individualized treatment.
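The net benefit plotted in decision curve analysis weighs true positives against false positives at each threshold probability p_t: NB(p_t) = TP/n − (FP/n) · p_t / (1 − p_t). A minimal sketch of this standard formula (after Vickers and Elkin) is shown below, together with the "treat all" reference curve; the function names and toy data are hypothetical. A model adds clinical value at thresholds where its curve lies above both "treat all" and "treat none" (net benefit 0).

```python
import numpy as np


def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of treating patients whose predicted probability is >= each threshold."""
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(y_prob, dtype=float)
    n = y.size
    out = []
    for pt in thresholds:
        pred = p >= pt
        tp = np.sum(pred & y)
        fp = np.sum(pred & ~y)
        # False positives are penalised by the odds of the threshold probability.
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)


def net_benefit_treat_all(y_true, thresholds):
    """Reference curve: every patient is treated as positive (here, T3-4)."""
    prevalence = np.mean(np.asarray(y_true).astype(bool))
    pt = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
print(net_benefit(y, p, [0.2, 0.5]))
print(net_benefit_treat_all(y, [0.2, 0.5]))
```

Sweeping the thresholds over a grid (for example, 0.01 to 0.99) and plotting both functions, plus the zero line for "treat none", reproduces the layout of a standard DCA plot.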
In comparison with other studies, this study shares similarities with the work of Lan et al. on oral squamous cell carcinoma [22], which focused on predicting lymph node metastasis (DLRresnet50, AUC = 0.796–0.928), whereas this study is the first to apply a DLR model to T staging (DLRresnet50, AUC = 0.810–0.860). Likewise, in our T-staging task, DLRresnet50 demonstrated superior predictive efficacy compared with the Rad model. Additionally, this study quantifies model improvement through NRI and IDI, providing a more comprehensive performance evaluation than AUC alone, echoing the multi-omics feature selection framework proposed by Bhadra et al. [10, 23]. The probability distribution histograms (Supplementary Fig. 3) further differentiate our approach: the DLR models showed bimodal distributions with minimal overlap between the T1-2 and T3-4 classes, reducing ambiguity in clinical staging.
Our study also has several limitations. First, the current model relies on high-resolution MRI data, which limits its application in resource-limited regions. The quality and standardization of imaging data are important factors affecting the application of radiomics; differences between devices and operators may lead to inconsistent data, affecting the accuracy and generalization ability of the model. Clinical deployment faces significant challenges: MRI-based deep learning for tongue cancer T staging encounters three barriers, namely resource accessibility, technical robustness, and clinical integration maturity. Overcoming them requires synergistic advances in rapid scanning technology, cross-center federated learning frameworks, and interpretable algorithms. In the future, lightweight models (such as MobileNet) or the combination of CT features can be explored. Secondly, the interpretability and generalization ability of machine learning and deep learning algorithms remain active research issues; improving them so that algorithms maintain high accuracy across different datasets is an important direction for future research. The feature construction analysis in Fig. 4 reveals a related challenge: Naive Bayes assumes feature independence and may therefore underutilize the spatial relationships captured in ResNet activations. Thirdly, the inherent selection bias of the retrospective design may limit the model's generalizability. In addition, advanced imaging sequences, such as diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) maps, were not included in the radiomics analysis. These sequences could provide additional information because tumor tissues typically exhibit distinct diffusion characteristics. Prospective validation in a larger cohort with multiparametric data is required to explore the impact of these sequences.
Finally, the regions of interest (ROIs) were manually delineated. Although the intraclass correlation coefficient (ICC) was calculated to assess interobserver variability, automated segmentation methods could improve accuracy and reduce the labor intensity of manual delineation, especially with advances in artificial intelligence algorithms.
Conclusion
This study proposed a novel deep learning radiomics (DLR) model to predict the T stage of tongue cancer based on features from CET1 and T2WI MRI images. As an important tool in tongue cancer research and clinical practice, deep learning radiomics continues to evolve. By integrating advanced imaging technologies and machine learning algorithms, it offers new methods and ideas for the early diagnosis, precise treatment, and prognostic assessment of tongue cancer.
Supplementary Information
Supplementary Material 1: Supplementary Table S1. Detailed MRI scan parameters. Supplementary Table S2. Parameters of machine learning. Supplementary Fig. 1. Illustration of deep learning feature heatmaps for predicting the T stage of tongue cancer: (A) Grad-CAM of ResNet18; (B) Grad-CAM of ResNet50. Supplementary Fig. 2. The NRI and IDI values of the two DLR models compared with the Rad model in the training set. Supplementary Fig. 3. The probability distribution histograms for the Rad and DLR models.
Authors’ contributions
Xi Chen designed the study; Zhaoyi Lu wrote the manuscript; Hang ling coordinated the project; Zhaoyi Lu and Bowen Zhu performed data analysis; All members participated in discussion. All authors approved the final manuscript.
Funding
The study was supported by Jiangsu Province Hospital (the First Affiliated Hospital with Nanjing Medical University) Clinical Capacity Enhancement Project (JSPH-MA-2023–1), Clinical Diagnosis and Treatment Technology Innovation Challenge Project of Jiangsu Province Hospital (JBGS202420) and the Natural Science Foundation of Jiangsu Province (Grants No. BK20230739).
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
This study was approved by the Medical Ethics Committee of Jiangsu Province Hospital (No. 2025-SR-111) and Xiangya Cancer Hospital, Central South University (No. SBQLL-2024–064), and the need for informed consent was waived. All procedures adhered to the Declaration of Helsinki and relevant national guidelines and we confirm that our study does not involve experiments on humans or the use of human tissue samples.
Consent for publication
All authors approved the final manuscript.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zhaoyi Lu and Bowen Zhu are co-authors.
References
1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
2. Bourhis J, Le Tourneau C, Calderon B, Martin L, Sire C, Pointreau Y, Ramee J, Coutte A, Boisselier P, Kaminsky-Forrett M. LBA33 5-year overall survival (OS) in patients (pts) with locally advanced squamous cell carcinoma of the head and neck (LA SCCHN) treated with xevinapant + chemoradiotherapy (CRT) vs placebo + CRT in a randomized, phase II study. Ann Oncol. 2022;33:S1400.
3. Lan T, Kuang S, Liang P, Ning C, Li Q, Wang L, Wang Y, Lin Z, Hu H, Yang L, et al. MRI-based deep learning and radiomics for prediction of occult cervical lymph node metastasis and prognosis in early-stage oral and oropharyngeal squamous cell carcinoma: a diagnostic study. Int J Surg. 2024;110(8):4648–59.
4. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, Sanduleanu S, Larue R, Even AJG, Jochems A, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62.
5. Bilal M, Raza SEA, Azam A, Graham S, Ilyas M, Cree IA, Snead D, Minhas F, Rajpoot NM. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study. Lancet Digit Health. 2021;3(12):e763–72.
6. Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S, Bussink J, Monshouwer R, Haibe-Kains B, Rietveld D, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:4006.
7. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–8.
8. Liu L, Wang X, Bao Q, Li X. Behavior detection and evaluation based on multi-frame MobileNet. Multimedia Tools Appl. 2024;83(6):15733–50.
9. Dovrou A, Nikiforaki K, Zaridis D, Manikis GC, Mylona E, Tachos N, Tsiknakis M, Fotiadis DI, Marias K. A segmentation-based method improving the performance of N4 bias field correction on T2-weighted MR imaging data of the prostate. Magn Reson Imaging. 2023;101:1–12.
10. Bhadra T, Mallik S, Hasan N, Zhao Z. Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer. BMC Bioinformatics. 2022;23(Suppl 3):153.
11. Crombe A, Spinnato P, Italiano A, Brisse HJ, Feydy A, Fadli D, Kind M. Radiomics and artificial intelligence for soft-tissue sarcomas: current status and perspectives. Diagn Interv Imaging. 2023;104(12):567–83.
12. Tagliabue M, Ruju F, Mossinelli C, Gaeta A, Raimondi S, Volpe S, Zaffaroni M, Isaksson LJ, Garibaldi C, Cremonesi M, et al. The prognostic role of MRI-based radiomics in tongue carcinoma: a multicentric validation study. Radiol Med. 2024;129(9):1369–81.
13. Yu B, Huang C, Xu J, Liu S, Guan Y, Li T, Zheng X, Ding J. Prediction of the degree of pathological differentiation in tongue squamous cell carcinoma based on radiomics analysis of magnetic resonance images. BMC Oral Health. 2021;21(1):585.
14. Ren J, Qi M, Yuan Y, Tao X. Radiomics of apparent diffusion coefficient maps to predict histologic grade in squamous cell carcinoma of the oral tongue and floor of mouth: a preliminary study. Acta Radiol. 2021;62(4):453–61.
15. Lin P, Xie W, Li Y, Zhang C, Wu H, Wan H, Gao M, Liang F, Han P, Chen R, et al. Intratumoral and peritumoral radiomics of MRIs predicts pathologic complete response to neoadjuvant chemoimmunotherapy in patients with head and neck squamous cell carcinoma. J Immunother Cancer. 2024. doi:10.1136/jitc-2024-009616.
16. Shehta AI, Nasr M, El Ghazali A. Blood cancer prediction model based on deep learning technique. Sci Rep. 2025;15(1):1889.
17. Wan Q, Lindsay C, Zhang C, Kim J, Chen X, Li J, Huang RY, Reardon DA, Young GS, Qin L. Comparative analysis of deep learning and radiomic signatures for overall survival prediction in recurrent high-grade glioma treated with immunotherapy. Cancer Imaging. 2025;25(1):5.
18. Dai B, Tan H, Yang Z, Gong Z, Wu K, Wu H. Necessity of applying anatomical unit resection surgery in suspected posterior oral squamous cell carcinoma. BMC Oral Health. 2025;25(1):212.
19. Gong W, Vaishnani DK, Jin X-C, Zeng J, Chen W, Huang H, Zhou Y-Q, Hla KWY, Geng C, Ma J. Evaluation of an enhanced ResNet-18 classification model for rapid on-site diagnosis in respiratory cytology. BMC Cancer. 2025;25(1):10.
20. Su F, Sun Y, Hu Y, Yuan P, Wang X, Wang Q, Li J, Ji JF. Development and validation of a deep learning system for ascites cytopathology interpretation. Gastric Cancer. 2020;23(6):1041–50.
21. Bae MR, Roh JL, Kim JS, Lee JH, Cho KJ, Choi SH, Nam SY, Kim SY. (18)F-FDG PET/CT versus CT/MR imaging for detection of neck lymph node metastasis in palpably node-negative oral cavity cancer. J Cancer Res Clin Oncol. 2020;146(1):237–44.
22. Malone J, Hill C, Tanskanen A, Liu K, Ng S, MacAulay C, Poh CF, Lane PM. Imaging biomarkers of oral dysplasia and carcinoma measured with in vivo endoscopic optical coherence tomography. Cancers (Basel). 2024. doi:10.3390/cancers16152751.
23. Bhadra T, Maulik U. Unsupervised feature selection using iterative shrinking and expansion algorithm. IEEE Trans Emerg Top Comput Intell. 2022;6(6):1453–62.
Supplementary Materials
Supplementary Material 1: Supplementary Table S1. Detailed MRI scan parameters. Supplementary Table S2. Machine learning parameters. Fig. S1. Deep learning feature heatmaps for predicting tongue cancer T-stage: (A) Grad-CAM of ResNet18; (B) Grad-CAM of ResNet50. Fig. S2. NRI and IDI values of the two DLR models compared with the Rad model in the training set. Fig. S3. Probability distribution histograms for the Rad and DLR models.