An interpretable machine learning model using SHapley Additive exPlanations for preoperative cervical lymph node metastasis risk stratification in tongue squamous cell carcinoma: a multicenter study

Yang Li; Nengwen Huang; Li Wang; Haotian Xiao; Weiping Chen; Yifeng Xing; Takashi Nishioka; Kangwei Zhou; Ikuho Kojima; Jiang Chen; Yanjing Ou; Wen Li

2025 Dec 27;26:185. doi: 10.1186/s12903-025-07528-4

PMCID: PMC12853609 PMID: 41455998

Abstract

Objectives

Tongue squamous cell carcinoma (TSCC) is characterized by early lymph node metastasis (LNM), which significantly impacts prognosis. Traditional diagnostic methods rely on invasive biopsies or postoperative histopathology, highlighting the need for non-invasive preoperative prediction tools. This study aimed to develop an interpretable radiomics model using tumor shape features from magnetic resonance imaging (MRI) to predict cervical LNM in TSCC.

Methods

We retrospectively analyzed data from 293 TSCC patients across two hospitals. Shape-related radiomic features were extracted from preoperative contrast-enhanced T1-weighted imaging (CET1WI) and T2-weighted imaging (T2WI). A radiomics model was developed using logistic regression (LR) and validated internally and externally. Clinical variables were integrated into a combined model. The SHapley Additive exPlanations (SHAP) framework was employed to interpret feature contributions.

Results

The radiomics model achieved AUCs of 0.818 (training cohort), 0.739 (validation cohort), and 0.755 (test cohort). Incorporating clinical variables did not significantly improve performance. SHAP analysis identified T2WI_SurfaceVolumeRatio as the most influential feature. Individualized force plots and a web-based nomogram provided intuitive visualizations of model predictions.

Conclusions

Tumor shape features derived from MRI, particularly SurfaceVolumeRatio, independently predict cervical LNM in TSCC. The SHAP-interpretable radiomics model offers a clinically transparent, non-invasive tool for preoperative risk stratification, aiding personalized treatment decisions.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12903-025-07528-4.

Keywords: Tongue squamous cell carcinoma, Lymph node metastasis, Radiomics, SHapley Additive exPlanations, Machine learning

Introduction

Tongue squamous cell carcinoma (TSCC) is one of the most prevalent subtypes of oral cancer [1]. Its pathogenesis is closely associated with early regional lymph node metastasis (LNM) [2, 3] and aggressive tumor proliferation [4]. Despite recent advancements in therapeutic modalities, the five-year survival rate and overall quality of life for TSCC patients have shown limited improvement [5]. Unlike the overt clinical manifestations typically observed in advanced TSCC, early-stage lesions are often asymptomatic and require histopathological assessment, such as tissue biopsy, for accurate detection and diagnosis. In accordance with the World Health Organization’s recommendation to prioritize early diagnosis and prevention in oral cancer control, the development and implementation of effective strategies for the early detection and diagnosis of TSCC are urgently needed to prolong survival and enhance the quality of life of affected individuals [6].

Recent studies have highlighted the prognostic significance of tumor morphological characteristics in TSCC [7]. Tumor shape, encompassing parameters such as contour irregularity, border definition, and growth pattern, has been increasingly recognized as a surrogate marker for tumor aggressiveness and metastatic potential [8, 9]. For example, tumor shape features extracted from magnetic resonance imaging (MRI) have served as imaging markers for predicting perineural invasion in high-grade prostate cancer [10]. Such shape features may reflect the biological behavior of the tumor, including its capacity for local invasion and distant dissemination. Integrating tumor shape analysis into the clinical decision-making process may therefore improve the accuracy of prognostic evaluation.

Radiomics provides a non-invasive approach for extracting high-dimensional quantitative imaging features, which can be used to predict tumor status and patient prognosis [11]. In TSCC, predictive models based on radiomic features (encompassing texture, first-order, and shape characteristics) frequently face limited interpretability, which hinders their clinical translation despite their potential utility in assessing tumor status [12, 13]. Unlike deep-seated visceral tumors, TSCC lesions are accessible for direct visual inspection, making macroscopic morphological features particularly relevant for clinical assessment. The SHapley Additive exPlanations (SHAP) method addresses this challenge by quantifying the contribution of each individual feature to the model’s output [14]. SHAP is rooted in cooperative game theory and determines feature importance by calculating the Shapley value—the average marginal contribution of a feature across all possible feature combinations. This method offers both local and global interpretability, enabling clinicians to understand individual predictions as well as overall model behavior. Such transparency is crucial for building trust in radiomic models and facilitating their clinical translation.
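The Shapley value described above can be written compactly. The notation is introduced here for illustration only and does not come from the original text: N is the set of features, S is a subset that excludes feature i, and v(S) is the model output when only the features in S are known.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr],
\qquad
\sum_{i \in N} \phi_i(v) \;=\; v(N) - v(\emptyset)
```

The second identity (efficiency) is what allows the per-feature contributions of one patient to add up to the difference between that patient's model output and the baseline output.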

Based on these considerations, we hypothesized that a machine learning model integrating radiomic shape features and clinical variables could effectively predict LNM in TSCC, and that this model would maintain generalizability across external cohorts. We further hypothesized that the SHAP framework would provide interpretable insights into the model’s decision-making process. To test these hypotheses, we developed and externally validated a fused radiomics-clinical model for preoperative LNM prediction in TSCC, and applied SHAP to visualize and interpret the model’s behavior.

Methods

Study population of the dataset

This study included data from 293 patients with TSCC across two hospitals in China, as illustrated in Fig. 1. The training cohort consisted of 182 patients diagnosed at the First Affiliated Hospital of Fujian Medical University between June 2022 and July 2024. An additional 59 patients diagnosed at the same hospital from August 2024 to March 2025 were assigned to the validation cohort. The external test cohort comprised 52 patients diagnosed at the First Affiliated Hospital of Zhengzhou University between June 2022 and March 2025.

Comprehensive clinical data, including age, gender, and inflammatory markers, were systematically collected. The inclusion criteria were: (i) patients underwent contrast-enhanced MRI prior to surgery; (ii) no history of prior treatment for TSCC; (iii) availability of complete clinical information. Exclusion criteria were: (i) absence of either contrast-enhanced T1-weighted imaging (CET1WI) or T2-weighted imaging (T2WI); (ii) no neck lymph node dissection performed.

This study was approved by the respective institutional ethics committees (approval No. FMU[2023]639) and conducted in accordance with the Declaration of Helsinki. Due to its retrospective design, the requirement for informed consent was waived.

CEMRI examination and image preprocessing

Figure 2 presents the flowchart of this study. All MRI examinations were performed using a 3.0T superconducting MR scanner (Siemens, Germany) equipped with a head and neck array coil. Both the CET1WI and T2WI sequences were acquired with fat suppression. The acquisition parameters were a repetition time (TR)/echo time (TE) of 500 ms/2.4 ms, a field of view (FOV) of 220 × 220 mm for both sequences, and a slice thickness of 5 mm.

Fig. 2 — Workflow of the study. ROIs: regions of interest; TSCC: tongue squamous cell carcinoma; CET1WI: contrast-enhanced T1-weighted imaging; T2WI: T2-weighted imaging; SHAP: SHapley Additive exPlanations

The image preprocessing pipeline included three steps to ensure data consistency and quality. First, N4 bias field correction was applied to correct low-frequency intensity inhomogeneities in the CET1WI and T2WI images [15]. Second, voxel resampling was performed to standardize all images to a uniform voxel size of 1 mm³, ensuring spatial consistency. Finally, image standardization was conducted to normalize intensity values across different scanners and protocols [16].

Regions of interest (ROIs) were manually delineated using ITK-SNAP software (version 3.8.0). This task was conducted by an experienced oral and maxillofacial surgeon who was blinded to the patients’ clinical data to ensure unbiased segmentation. The delineations were then reviewed and confirmed by a senior oral and maxillofacial surgeon to ensure accuracy and consistency of the imaging data.

Radiomics feature extraction, selection, and model construction

Given our focus on the impact of tumor morphology on LNM, we extracted shape-related radiomic features. A total of 14 shape features were extracted, as illustrated in Fig. 2. These included: Elongation, Flatness, Least Axis Length, Major Axis Length, Maximum 2D Diameter (Column), Maximum 2D Diameter (Row), Maximum 2D Diameter (Slice), Maximum 3D Diameter, Mesh Volume, Minor Axis Length, Sphericity, Surface Area, Surface Volume Ratio, and Voxel Volume. Feature extraction was performed using PyRadiomics (version 2.2.0) [17]. Reproducibility was assessed by calculating intraclass correlation coefficients (ICCs) between two independent radiologists. Features with ICCs < 0.75, either intra- or inter-observer, were considered unreliable and excluded from further analysis [18].
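To make the three shape features that later enter the signature more concrete, the following minimal sketch evaluates them on idealized solids. It assumes the standard definitions used by PyRadiomics (SurfaceVolumeRatio = A/V, Sphericity = (36πV²)^(1/3)/A, Elongation = sqrt(λ_minor/λ_major)). The helper names are hypothetical, and real tumors are measured on a triangle mesh rather than with closed-form geometry.

```python
import math


def shape_features(volume_mm3: float, surface_area_mm2: float) -> dict:
    """SurfaceVolumeRatio and Sphericity from a volume and a surface area."""
    surface_volume_ratio = surface_area_mm2 / volume_mm3  # unit: 1/mm
    # Sphericity: surface of the equal-volume sphere divided by the actual surface.
    sphericity = (36.0 * math.pi * volume_mm3 ** 2) ** (1.0 / 3.0) / surface_area_mm2
    return {
        "surface_volume_ratio": surface_volume_ratio,
        "sphericity": sphericity,
    }


def sphere(radius_mm: float) -> dict:
    """Closed-form features of an ideal sphere (illustrative only)."""
    volume = 4.0 / 3.0 * math.pi * radius_mm ** 3
    area = 4.0 * math.pi * radius_mm ** 2
    return shape_features(volume, area)


def cube(side_mm: float) -> dict:
    """Closed-form features of an ideal cube (illustrative only)."""
    return shape_features(side_mm ** 3, 6.0 * side_mm ** 2)


def elongation(lambda_major: float, lambda_minor: float) -> float:
    """Square root of the minor-to-major principal-component eigenvalue ratio.

    Values lie in (0, 1]; values near 1 mean a non-elongated cross section.
    """
    return math.sqrt(lambda_minor / lambda_major)


small = sphere(5.0)
large = sphere(10.0)
print("SVR, r = 5 mm :", round(small["surface_volume_ratio"], 3))
print("SVR, r = 10 mm:", round(large["surface_volume_ratio"], 3))
print("Sphericity of a cube:", round(cube(4.0)["sphericity"], 3))
```

For a sphere the ratio equals 3/r, so at fixed shape it falls as the lesion grows. This is one reason the feature mixes size and irregularity and needs context-specific interpretation.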

All features were standardized using z-score normalization. For features with a normal distribution, Student’s t-tests were applied, and only those with p-values < 0.05 were retained. To reduce redundancy, Spearman correlation coefficients were calculated between features, and for any pair with a coefficient > 0.9 only one feature was retained [19, 20]. A greedy recursive elimination strategy was then used to further reduce feature dimensionality.
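The standardization and redundancy filtering steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the z-score uses the population standard deviation, the filter keeps the first feature of each highly correlated pair, the absolute value of the coefficient is compared against the threshold, and the feature values are invented.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize a feature to zero mean and unit (population) variance."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |rho| <= threshold with all kept ones."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: VoxelVolume is monotone in MeshVolume, so rho = 1.
toy = {
    "MeshVolume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "VoxelVolume": [1.1, 2.1, 2.9, 4.2, 5.3],
    "Sphericity": [0.9, 0.5, 0.8, 0.4, 0.7],
}
standardized = {name: zscore(vals) for name, vals in toy.items()}
selected = drop_redundant(standardized)
print(selected)
```

Because Spearman correlation depends only on ranks, running the filter before or after z-scoring gives the same selection.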

Radiomics signature development was based on the training cohort using Least Absolute Shrinkage and Selection Operator (LASSO) regression. The optimal regularization parameter (λ) was determined via 10-fold cross-validation. Features with non-zero coefficients in the LASSO model were used to construct the final radiomics signature. Modeling was performed using the scikit-learn package in Python [21].
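For reference, the LASSO step can be stated as an optimization problem. The text does not specify the loss function, so the squared-error form below (the objective minimized by scikit-learn's `Lasso` estimator) is an assumption, and the symbols are introduced here: n patients, p standardized features x_ij, and binary LNM labels y_i.

```latex
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
\qquad
\lambda^{*} \;=\; \arg\min_{\lambda}\; \frac{1}{10} \sum_{k=1}^{10} \mathrm{MSE}_k(\lambda)
```

The L1 penalty drives some coefficients exactly to zero, and the features with non-zero coefficients at λ* form the signature. Here MSE_k(λ) denotes the mean squared error on the k-th held-out fold.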

To evaluate the predictive performance of different machine learning models, we constructed classifiers using five algorithms: logistic regression (LR), support vector machine (SVM), LightGBM, Naive Bayes, and multilayer perceptron (MLP). After performance comparison, the best-performing model was combined with selected clinical variables to build a combined model.

Model explanation and visualization

To interpret our model’s predictions, we applied SHAP to quantify the contribution of each feature to individual outcomes. Using the Kernel Explainer, we computed Shapley values, which represent the average marginal impact of a feature across all possible subsets. This approach provides clear visualization of feature importance and direction of influence [22].
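As an illustration of what a Shapley value measures, the sketch below enumerates all feature subsets exactly for a small linear model on the log-odds scale. It is not the Kernel Explainer itself, which approximates these values by weighted regression over a background sample. The weights, intercept, patient vector, and single baseline vector are hypothetical.

```python
import itertools
import math


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features absent from a subset take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of every subset of this size in the Shapley average.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical logistic-regression model, evaluated on the log-odds scale.
WEIGHTS = [-0.8, 0.3, 0.5]
INTERCEPT = -0.6


def log_odds(z):
    return INTERCEPT + sum(w * v for w, v in zip(WEIGHTS, z))


patient = [1.0, -0.5, 2.0]   # standardized feature values (made up)
baseline = [0.0, 0.0, 0.0]   # mean of z-scored features
phi = shapley_values(log_odds, patient, baseline)
print([round(p, 3) for p in phi])
```

For a linear model each value reduces to the weight times the feature's deviation from the baseline, which gives a convenient sanity check on the enumeration. With six features, as in the final signature, only 2^6 = 64 subsets exist, so exact enumeration would remain cheap.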

Clinical feature selection and model development

Univariate logistic regression was performed to evaluate the association between clinical characteristics, inflammatory markers, and the outcome. Variables with a statistically significant association (p-value < 0.05) were selected as clinical predictors for model construction.

Subsequently, two models were developed: (1) a radiomics model based on shape features extracted from CET1WI and T2WI, and (2) a combined model incorporating both radiomics shape features (from CET1WI and T2WI) and the selected clinical variables. The combined (fusion) model was constructed using a logistic regression framework.

Nomogram construction

To simplify the combined model into a user-friendly tool, a nomogram was constructed to represent the predictive model. The total score was calculated by summing points assigned to clinical features and the radiomics signature. For practical use, web-based calculators were also created for instant deployment and easy access.

Statistical analysis

Categorical variables were analyzed using the chi-square test, while continuous variables were assessed with Student’s t-test. Model performance was evaluated using receiver operating characteristic (ROC) curves by calculating the area under the curve (AUC). The optimal sensitivity and specificity were determined at the cutoff point that maximized the Youden index. The DeLong test was adopted to compare the diagnostic performance of the ROC curves between the radiomics model and the combined model.
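The AUC and the Youden-index cutoff can be computed as in the sketch below. The labels and scores are invented toy values, and the function names are hypothetical. The AUC is obtained through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return (J, cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is called positive when its score is >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best


# Toy data: 1 = cervical LNM, 0 = no LNM (values are made up).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

auc = roc_auc(labels, scores)
j_stat, cutoff, sens, spec = youden_cutoff(labels, scores)
print(f"AUC = {auc:.3f}, cutoff = {cutoff}, sens = {sens:.3f}, spec = {spec:.3f}")
```

A cutoff chosen this way on the training cohort and then applied unchanged to other cohorts can give quite different sensitivity and specificity there, even when the AUC is similar.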

To estimate the confidence intervals (CIs), the bootstrap method with 1000 resamples was applied. All statistical analyses were performed using SPSS software (version 21.0), and a two-sided p-value < 0.05 was considered statistically significant.
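A bootstrap confidence interval for the AUC can be sketched as follows. The text does not state which bootstrap variant was used, so the percentile method is an assumption. The toy data, seed, and function names are hypothetical, and resamples containing only one class are redrawn because the AUC is undefined for them.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes must be present
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(alpha / 2.0 * len(aucs))]
    upper = aucs[int((1.0 - alpha / 2.0) * len(aucs)) - 1]
    return lower, upper


# Toy data: 1 = cervical LNM, 0 = no LNM (values are made up).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_resamples=1000, seed=0)
print(f"AUC = {roc_auc(labels, scores):.3f}, 95% CI: {ci_low:.3f}-{ci_high:.3f}")
```

With cohorts of 50 to 60 patients the resulting intervals are wide, as the intervals reported for the validation and test cohorts illustrate.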

Results

Patient characteristics

At the First Affiliated Hospital of Fujian Medical University (Hospital A), a total of 182 eligible patients were enrolled in the training cohort, comprising 65 cases with cervical LNM and 117 cases without. Additionally, 59 eligible patients were included in the validation cohort, with 21 cases of cervical LNM and 38 without. At the First Affiliated Hospital of Zhengzhou University (Hospital B), 52 eligible patients were selected for the test cohort, including 15 cases with cervical LNM and 37 without.

The mean age of patients in the training cohort was 56.10 ± 12.41 years, significantly higher than that in the validation cohort (49.15 ± 13.05 years) and the test cohort (50.19 ± 12.05 years) (both p-values < 0.01). No significant differences in gender distribution were observed across the three cohorts. The inflammatory marker lymphocyte-to-monocyte ratio (LMR) also differed between cohorts: compared with the training cohort (5.34 ± 2.33), the LMR was lower in the validation cohort (4.62 ± 1.94; p-value = 0.02) and in the test cohort (4.67 ± 2.07; p-value = 0.04), indicating differences in systemic inflammatory status across the cohorts (Table 1).

Table 1.

Baseline characteristics of study sets

Characteristics	Training cohort (n = 182)	Validation cohort (n = 59)	p-value	Test cohort (n = 52)	p-value
Age (years)	56.10 ± 12.41	49.15 ± 13.05	< 0.01	50.19 ± 12.05	< 0.01
Gender, n (%)			0.99		0.69
Male	122(67.03)	40(67.80)		37(71.15)
Female	60(32.97)	19(32.20)		15(28.85)
PLR	150.25 ± 78.17	143.71 ± 86.81	0.17	145.24 ± 73.07	0.55
NLR	3.02 ± 4.41	2.99 ± 6.34	0.33	2.62 ± 3.53	0.14
LMR	5.34 ± 2.33	4.62 ± 1.94	0.02	4.67 ± 2.07	0.04
SIRI	1.46 ± 3.55	1.15 ± 1.92	0.39	1.40 ± 3.22	0.67

PLR platelet-to-lymphocyte ratio, NLR neutrophil-to-lymphocyte ratio, LMR lymphocyte-to-monocyte ratio, SIRI systemic inflammation response index. P-values were calculated separately for the training cohort versus each of the other two cohorts

Feature selection

For each tumor, 14 shape-related features were extracted separately from CET1WI and T2WI, yielding a total of 28 features. Following reproducibility assessment based on ICCs and the removal of features with p-value ≥ 0.05 in Student’s t-tests, 26 features were retained. Subsequent redundancy filtering, which excluded features with Spearman correlation coefficients > 0.9, further reduced the feature set to 9. Finally, the LASSO regression was applied, resulting in a total of six features with non-zero coefficients—specifically, Elongation, Sphericity, and Surface Volume Ratio from each MRI sequence—which constituted the final radiomic signature.

In addition, clinical information and inflammatory markers were subjected to univariate logistic regression analysis. Three features with statistical significance (p-value < 0.05) were retained: age, PLR, and LMR.

Machine learning model selection

To evaluate the performance of different machine learning models, six shape-related features were used to construct predictive models for cervical LNM (Fig. 3). In the training cohort, LR (AUC = 0.818), Naive Bayes (AUC = 0.819), and MLP (AUC = 0.828) demonstrated relatively strong performance. Considering results from both the validation and test cohorts, LR showed favorable and consistent predictive performance, with AUCs of 0.739 and 0.755, respectively. Therefore, LR was ultimately selected as the final predictive model.

Fig. 3 — Predictive performance of different machine learning models. A Training cohort; B Validation cohort; C Test cohort. LR: logistic regression; SVM: support vector machine; MLP: multilayer perceptron; AUC: area under the curve

Performance comparison between the radiomics and combined models

The performance evaluation of the radiomics model and the combined model is presented in Table 2. In the training cohort, the combined model and the radiomics model achieved the same AUC (0.818, 95% CI: 0.754–0.881). In the validation cohort, the combined model achieved an AUC of 0.743 (95% CI: 0.614–0.872), comparable to that of the radiomics model (AUC = 0.739, 95% CI: 0.609–0.870). In the test cohort, the performance of the combined model (AUC = 0.735, 95% CI: 0.577–0.894) was likewise comparable to that of the radiomics model (AUC = 0.755, 95% CI: 0.611–0.899). The DeLong test showed no significant difference in predictive performance between the radiomics model and the combined model in either the validation (p-value = 0.646) or test (p-value = 0.411) cohort.

Table 2.

Performances of the predictive models

Models	Cohort	AUC(95%CI)	Accuracy	Sensitivity	Specificity	PPV	NPV
Radiomics model	Training cohort	0.818 (0.754–0.881)	0.780	0.723	0.812	0.681	0.841
Combined model	Training cohort	0.818 (0.754–0.881)	0.786	0.692	0.838	0.703	0.831
Radiomics model	Validation cohort	0.739 (0.609–0.870)	0.627	1.000	0.421	0.488	0.999
Combined model	Validation cohort	0.743 (0.614–0.872)	0.627	0.952	0.447	0.488	0.944
Radiomics model	Test cohort	0.755 (0.611–0.899)	0.731	0.600	0.784	0.529	0.829
Combined model	Test cohort	0.735 (0.577–0.894)	0.769	0.533	0.865	0.615	0.821

AUC area under the curve, CI confidence interval, PPV positive predictive value, NPV negative predictive value
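The threshold-based metrics in Table 2 all follow from a 2 × 2 confusion matrix. As a worked check, the sketch below uses counts back-calculated from the reported training-cohort rates of the radiomics model (65 LNM-positive and 117 LNM-negative patients). The counts TP = 47, FN = 18, TN = 95, FP = 22 are an inference made for this illustration and are not reported in the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics derived from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated (assumed) counts for the radiomics model, training cohort.
metrics = classification_metrics(tp=47, fp=22, tn=95, fn=18)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
```

Rounded to three decimals, these counts reproduce the first row of Table 2 (accuracy 0.780, sensitivity 0.723, specificity 0.812, PPV 0.681, NPV 0.841).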

Explanation and visualization of radiomics model

SHAP provided a quantitative interpretation for the LR machine learning model. The SHAP summary plot offered a visually concise representation by illustrating both the range and distribution of each feature’s importance on the model output and by relating the feature value to its impact. Features were ranked by their global importance. Each dot, representing a SHAP value for a given feature from an individual patient, was plotted horizontally and stacked vertically to indicate the density of identical SHAP values. Additionally, each dot was color-coded according to the feature value, ranging from low (blue) to high (red). As shown in Fig. 4, T2WI_SurfaceVolumeRatio was identified as the most important feature for distinguishing patients with and without cervical LNM. The density of the SHAP plot for T2WI_SurfaceVolumeRatio demonstrated variation in SHAP values across the cohort, and the color gradient indicated that the model’s predicted probability increased as the feature value decreased.

Fig. 4 — SHAP summary plots of radiomics model. CET1WI: contrast-enhanced T1-weighted imaging; T2WI: T2-weighted imaging; SHAP: SHapley Additive exPlanations

The force plot (Fig. 5) provided an interpretation of the model’s prediction for an individual patient. It visualized each feature’s SHAP value as a force that either increased or decreased the final prediction. Every prediction started from the base value (−1.588), which represented the average model output over the background samples used by the explainer. The length of each arrow indicated the magnitude of a feature’s contribution to the prediction, while the color of the arrow denoted the direction of the effect: red for positive contributions and blue for negative ones.

As shown in Fig. 5A, the model output for this patient was −1.643, which was lower than the base value (−1.588), indicating that this patient could be assessed as belonging to the non-LNM group. Among the features, the negative (blue) arrow of T2WI_SurfaceVolumeRatio, with a value of −0.12, made a substantial contribution to the assessment of non-LNM. As shown in Fig. 5B, for another patient, the model output was −1.434, which was higher than the base value (−1.588), suggesting that this patient could be assessed as having LNM. The T2WI_SurfaceVolumeRatio arrow, with a value of 0.06, made a positive (red) contribution to the classification of LNM.
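The two force plots can be read through SHAP's additivity property: a patient's model output equals the base value plus the sum of that patient's per-feature SHAP values. The short sketch below uses only the numbers quoted in the text to compute the net shift contributed by all features together. The function name is hypothetical.

```python
BASE_VALUE = -1.588  # base value reported for the force plots


def net_shap_shift(model_output: float, base_value: float = BASE_VALUE) -> float:
    """Sum of all SHAP values for one patient = model output - base value."""
    return model_output - base_value


shift_patient_a = net_shap_shift(-1.643)  # patient in Fig. 5A
shift_patient_b = net_shap_shift(-1.434)  # patient in Fig. 5B
print(f"Patient A: {shift_patient_a:+.3f}")
print(f"Patient B: {shift_patient_b:+.3f}")
```

The net shift is negative for the first patient and positive for the second, which matches the direction of the two assessments described above.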

In addition to model interpretation using SHAP values, a nomogram was constructed to facilitate individualized prediction (Appendix Fig. 1). An online version of the tool was also developed to enable real-time prediction of cervical LNM, with its user interface illustrated in Appendix Figs. 2 and 3.

Discussion

In this study, we developed and validated a radiomics model based exclusively on MRI-derived tumor shape features for the preoperative prediction of cervical LNM in patients with TSCC. The model exhibited favorable discriminative ability and reproducibility across both internal and external validation cohorts. Notably, the integration of clinical variables into a combined model did not significantly improve predictive performance, suggesting that shape-based radiomics features alone may be sufficient for this prediction task. SHAP-based interpretation further identified T2WI_SurfaceVolumeRatio as the predominant predictor, offering transparent insight into the model’s decision-making process. Collectively, these results support the potential of shape-focused radiomics combined with explainable machine learning as a non-invasive tool for preoperative LNM prediction in TSCC, which may facilitate clinical translation and support personalized treatment planning.

Previous studies have identified a range of clinical and pathological factors associated with cervical LNM in TSCC, including greater tumor thickness or depth of invasion, poor histological differentiation, higher clinical T stage, and presence of vascular or perineural invasion [23–25]. While most radiomics investigations have prioritized texture and first-order features, few have systematically evaluated shape characteristics [26]. By showing that MRI-derived shape features such as the SurfaceVolumeRatio carry independent predictive value, our findings highlight a novel, non-invasive imaging biomarker that may complement, or potentially substitute for, conventional clinical indicators.

A key methodological innovation of this work lies in integrating SHAP with the LR-based radiomics model, thereby addressing the critical issue of interpretability that often hinders the clinical adoption of conventional machine learning. Although LR classifiers are widely utilized in radiomics research and have shown reliable performance, models built on abstract radiomic features can still appear as a “black box” to clinicians, which can impede clinical trust and application [27]. SHAP mitigates this limitation by delivering both global and local interpretability of feature contributions [28], thereby offering clinician-friendly visualizations that illustrate how individual features influence predictive outcomes.

At the cohort level, SHAP summary plots present a ranked, color-coded distribution of feature impact, where the horizontal spread of dots reflects the magnitude of influence and the color indicates the feature’s actual value (from low in blue to high in red). This offers more nuanced insights than traditional bar plots, which show only relative importance. In our study, clinicians can clearly observe, for example, how lower values of SurfaceVolumeRatio (shown in blue) contribute to predicting lymph node metastasis, whereas higher values (shown in red) exert the opposite effect.

Beyond global interpretability, SHAP also enables individualized decision support through force plots, which offer a localized explanation for each patient’s prediction [29]. This feature is particularly valuable in clinical settings where personalized treatment strategies are essential. Compared to traditional nomograms—which require manual calculation of feature scores to estimate a patient’s risk—SHAP force plots provide a more efficient and intuitive visualization. For each patient, the SHAP force plot illustrates how specific features contribute to shifting the predicted probability from a baseline value toward a higher or lower risk of cervical lymph node metastasis. In these plots, features are represented by colored arrows, where red indicates a positive contribution toward metastasis and blue indicates a negative contribution. The length of each arrow corresponds to the strength of the contribution. Clinicians can quickly interpret which features are driving the prediction and to what extent, without performing manual scoring or look-up procedures.

Pathophysiologically, the most influential shape features identified by SHAP (Elongation, Sphericity, and SurfaceVolumeRatio) reflect key aspects of tumor shape that may be indicative of underlying biological behavior. Elongation quantifies how stretched a tumor appears along its principal axis, with a more elongated shape suggesting directional growth or infiltration [30, 31]. Note that PyRadiomics defines this feature as the inverse of true elongation (the square root of the ratio of the minor to the major principal-component eigenvalue), so values closer to 0 correspond to more elongated tumors. An elongated shape may represent tumors that grow along anatomical planes or perineural pathways, a pattern previously associated with aggressive phenotypes in head and neck cancers. Sphericity, on the other hand, measures how closely a tumor resembles a perfect sphere; lower sphericity implies more irregular and lobulated contours, which may reflect local invasiveness or heterogeneous proliferation. Li et al. demonstrated that lower tumor sphericity was associated with more irregular and complex morphological patterns on MRI, as well as a reduced likelihood of achieving pathologic complete response [32]. The SurfaceVolumeRatio can be interpreted in various ways depending on the tumor context. Traditionally, a higher ratio indicates a greater surface area relative to volume, which has been associated with a more aggressive tumor–host interface and an increased likelihood of local invasion or lymphovascular spread [33, 34]. However, in our model, a lower SurfaceVolumeRatio was strongly associated with a higher risk of metastasis in TSCC. This inverse relationship is consistent with the findings reported by Chu et al. [35]. Therefore, the interpretation of this feature must be contextualized within the specific anatomy and pathology of the disease.

This study presents a non-invasive, preoperative approach for predicting LNM in TSCC using a SHAP-based radiomics model, which may support clinical diagnosis and inform treatment decisions. For patients predicted to be at high risk, neck lymph node dissection is recommended in accordance with current clinical practice. In contrast, for those identified as low-risk, the potential trauma and complications associated with surgical intervention should be carefully weighed against its expected benefit. By integrating SHAP into radiomics-based predictive modeling for TSCC, particularly in the analysis of shape-related features, clinicians and researchers can gain a clearer understanding of which imaging characteristics drive individual predictions. This combination holds promise for improving the reliability, interpretability, and clinical applicability of radiomics models in TSCC management.

The present study had some limitations. First, its retrospective design may introduce inherent selection bias, potentially affecting the generalizability of the findings. Second, although an external test cohort was included, its relatively small size (n = 52) and the fact that all data were collected from two Chinese hospitals may limit the robustness of the validation and the applicability of the model to other populations, races, and regions. Future prospective studies with larger, multi-center, and more diverse cohorts are needed to confirm and generalize the model’s performance. Third, this study focused exclusively on shape-based radiomics features. While these morphological characteristics yielded valuable insights, incorporating textural features or deep learning–derived representations may further improve predictive accuracy.

Conclusion

In summary, we developed and validated a SHAP-based radiomics model for the non-invasive, preoperative prediction of LNM in TSCC. The model demonstrated promising predictive performance by effectively integrating shape-related features and interpretability techniques, while also exhibiting potential for clinical translation to support personalized treatment planning. Further validation in larger, prospective multi-center studies is warranted to confirm its generalizability and assess its impact on clinical outcomes.

Supplementary Information

Supplementary Material 1 (1.6 MB, docx)

Clinical trial number

Not applicable.

Abbreviations

AUC: Area under the curve
CET1WI: Contrast-enhanced T1-weighted imaging
CIs: Confidence intervals
ICCs: Intraclass correlation coefficients
LASSO: Least absolute shrinkage and selection operator
LMR: Lymphocyte-to-monocyte ratio
LNM: Lymph node metastasis
LR: Logistic regression
MLP: Multilayer perceptron
MRI: Magnetic resonance imaging
ROC: Receiver operating characteristic
ROIs: Regions of interest
SHAP: SHapley Additive exPlanations
SVM: Support vector machine
TSCC: Tongue squamous cell carcinoma
T2WI: T2-weighted imaging

Authors’ contributions

Y.L., N.H., W.L. and Y.O. contributed to the conception and design of the study. Y.L., N.H., W.L. and Y.O. prepared the figures and drafted the manuscript. L.W. and H.X. reviewed and revised the manuscript. W.C. and Y.X. conducted the data analysis. J.C. and K.Z. participated in the collection, follow-up, detection and analysis of clinical data. L.W., T.N. and I.K. provided critical revisions and suggestions for improving the manuscript. All authors read and approved the final manuscript.

Funding

This research was funded by the Natural Science Foundation of Fujian Province (2022J01267).

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations

Ethics approval and consent to participate

This study was approved by the Medical Ethics Committee of the First Affiliated Hospital of Fujian Medical University (approval No. FMU[2023]639). All procedures adhered to the Declaration of Helsinki. For the retrospective analysis of anonymized data, the requirement for informed consent was waived by the same ethics committee.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yang Li and Nengwen Huang contributed equally to this work.

Contributor Information

Yanjing Ou, Email: ouyanjing_FJMU@163.com.

Wen Li, Email: zzlw8651@163.com.

References

1. Weatherspoon DJ, Chattopadhyay A, Boroumand S, Garcia I. Oral cavity and oropharyngeal cancer incidence trends and disparities in the United States: 2000–2010. Cancer Epidemiol. 2015;39(4):497–504.
2. Xie N, Wang C, Liu X, et al. Tumor budding correlates with occult cervical lymph node metastasis and poor prognosis in clinical early-stage tongue squamous cell carcinoma. J Oral Pathol Med. 2015;44(4):266–72.
3. Kwon M. Prediction of occult lymph node metastasis in early tongue cancer. Clin Exp Otorhinolaryngol. 2022;15(4):297–8.
4. Li HF, Liu YQ, Shen ZJ, et al. Downregulation of MACC1 inhibits invasion, migration and proliferation, attenuates cisplatin resistance and induces apoptosis in tongue squamous cell carcinoma. Oncol Rep. 2015;33(2):651–60.
5. Ding L, Fu Y, Zhu N, et al. OXTRHigh stroma fibroblasts control the invasion pattern of oral squamous cell carcinoma via ERK5 signaling. Nat Commun. 2022;13(1):5124.
6. Sun Y, Li Y, Zhou W, Liu Z. MicroRNA expression as a prognostic biomarker of tongue squamous cell carcinoma (TSCC): a systematic review and meta-analysis. BMC Oral Health. 2024;24(1):406.
7. Matsuo K, Akiba J, Kusukawa J, Yano H. Squamous cell carcinoma of the tongue: subtypes and morphological features affecting prognosis. Am J Physiol Cell Physiol. 2022;323(6):C1611–23.
8. Baytok A, Ecer G, Balasar M, Koplay M. Computed tomography and magnetic resonance imaging characteristics of renal cell carcinoma: differences between subtypes and clinical evaluation. J Clin Imaging Sci. 2025;15:10.
9. Le HDM, Vo DT, Do HT, et al. Hepatectomy in a young patient with advanced hepatocellular carcinoma and poor prognostic imaging features: a case of recurrence-free survival. Radiol Case Rep. 2025;20(6):2704–9.
10. Zhang W, Zhang W, Li X, Cao X, Yang G, Zhang H. Predicting tumor perineural invasion status in high-grade prostate cancer based on a clinical-radiomics model incorporating T2-weighted and diffusion-weighted magnetic resonance images. Cancers (Basel). 2022;15(1):86.
11. Kang W, Qiu X, Luo Y, et al. Application of radiomics-based multiomics combinations in the tumor microenvironment and cancer prognosis. J Transl Med. 2023;21(1):598.
12. Li W, Li Y, Wang L, et al. Evaluating fusion models for predicting occult lymph node metastasis in tongue squamous cell carcinoma. Eur Radiol. 2025;35(9):5228–38. doi:10.1007/s00330-025-11473-9
13. Xu Y, Liu X, Cao X, et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation. 2021;2(4):100179.
14. Rodríguez-Pérez R, Bajorath J. Interpretation of compound activity predictions from complex machine learning models using local approximations and Shapley values. J Med Chem. 2020;63(16):8761–77.
15. Tustison NJ, Avants BB, Cook PA, et al. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging. 2010;29(6):1310–20.
16. Nyúl LG, Udupa JK, Zhang X. New variants of a method of MRI scale standardization. IEEE Trans Med Imaging. 2000;19(2):143–50.
17. van Griethuysen J, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77(21):e104–7.
18. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63.
19. Luo Y, Sun X, Kong X, et al. A DWI-based radiomics-clinical machine learning model to preoperatively predict the futile recanalization after endovascular treatment of acute basilar artery occlusion patients. Eur J Radiol. 2023;161:110731.
20. Wang W, Peng Y, Feng X, et al. Development and validation of a computed tomography-based radiomics signature to predict response to neoadjuvant chemotherapy for locally advanced gastric cancer. JAMA Netw Open. 2021;4(8):e2121143.
21. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
22. Borgonovo E, Plischke E, Rabitti G. The many Shapley values for explainable artificial intelligence: a sensitivity analysis perspective. Eur J Oper Res. 2024;318(3):911–26.
23. Fujii S, Katada C, Watanabe H, et al. Tumor thickness as a novel risk factor for lymph node metastasis by superficial squamous cell carcinoma of head and neck. Cancer Sci. 2024;115(9):3169–79.
24. Navarro Cuéllar I, Espías Alonso S, Alijo Serrano F, et al. Depth of invasion: influence of the latest TNM classification on the prognosis of clinical early stages of oral tongue squamous cell carcinoma and its association with other histological risk factors. Cancers (Basel). 2023;15(19):4882.
25. Wang W, Wang Y, Zeng W, et al. Prognostic factors in surgically treated tongue squamous cell carcinoma in stage T1-2N0-1M0: a retrospective analysis. Cancer Med. 2024;13(3):e7016.
26. Valizadeh P, Jannatdoust P, Pahlevan-Fallahy MT, et al. Diagnostic accuracy of radiomics and artificial intelligence models in diagnosing lymph node metastasis in head and neck cancers: a systematic review and meta-analysis. Neuroradiology. 2025;67(2):449–67.
27. Rizzo S, Botta F, Raimondi S, et al. Radiomics of high-grade serous ovarian cancer: association between quantitative CT features, residual tumour and disease progression within 12 months. Eur Radiol. 2018;28(11):4849–59.
28. Bi Y, Xiang D, Ge Z, Li F, Jia C, Song J. An interpretable prediction model for identifying N7-methylguanosine sites based on XGBoost and SHAP. Mol Ther Nucleic Acids. 2020;22:362–72.
29. Ponce-Bobadilla AV, Schmitt V, Maier CS, Mensing S, Stodtmann S. Practical guide to SHAP analysis: explaining supervised machine learning model predictions in drug development. Clin Transl Sci. 2024;17(11):e70056.
30. Ahmed AA, Elmohr MM, Fuentes D, et al. Radiomic mapping model for prediction of Ki-67 expression in adrenocortical carcinoma. Clin Radiol. 2020;75(6):479.e17–479.e22.
31. Zhou B, Xu J, Tian Y, Yuan S, Li X. Correlation between radiomic features based on contrast-enhanced computed tomography images and Ki-67 proliferation index in lung cancer: a preliminary study. Thorac Cancer. 2018;9(10):1235–40.
32. Li W, Newitt DC, Yun B, et al. Tumor sphericity predicts response in neoadjuvant chemotherapy for invasive breast cancer. Tomography. 2020;6(2):216–22.
33. Bi X, Sterling JA, Merkel AR, Perrien DS, Nyman JS, Mahadevan-Jansen A. Prostate cancer metastases alter bone mineral and matrix composition independent of effects on bone architecture in mice--a quantitative study using MicroCT and Raman spectroscopy. Bone. 2013;56(2):454–60.
34. Hu X, Ye W, Li Z, et al. Non-invasive evaluation for benign and malignant subcentimeter pulmonary ground-glass nodules (≤ 1 cm) based on CT texture analysis. Br J Radiol. 2020;93(1114):20190762.
35. Chu H, Pang P, He J, et al. Value of radiomics model based on enhanced computed tomography in risk grade prediction of gastrointestinal stromal tumors. Sci Rep. 2021;11(1):12009.
