An explainable radiomics model based on multiparametric magnetic resonance for differentiating benign and malignant orbital tumors

Guozheng Zhang; Xingjian Xu; Rujian Hong; Xiaowei Han; Weitao Huang

doi:10.1186/s12885-026-15853-2

2026 Mar 13;26:507. doi: 10.1186/s12885-026-15853-2

PMCID: PMC13101340 PMID: 41826882

Abstract

Objective

To develop and internally test a multiparametric radiomics combined model for differentiating benign and malignant orbital tumors.

Patients and methods

This retrospective study analyzed 147 patients from two centers (December 2014 to March 2024) with pathologically confirmed orbital tumors and preoperative contrast-enhanced magnetic resonance imaging (MRI). After image preprocessing, 3668 radiomics features were extracted from T2-weighted imaging (T2WI) and contrast-enhanced T1-weighted imaging (CE T1WI) sequences. Feature reduction and selection were performed using the t-test/U-test, Pearson correlation coefficient, minimum redundancy maximum relevance (mRMR), and least absolute shrinkage and selection operator (LASSO) regression. Three machine learning algorithms, logistic regression (LR), naive Bayes classifier (NaiveBayes), and multilayer perceptron (MLP), were used to construct radiomics models. A combined radiomics model (CRM), defined as an MLP-based model incorporating selected features from both T2WI and CE T1WI sequences, was subsequently built and integrated with clinical factors to create a radiomics nomogram. Model performance was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity. Decision curve analysis (DCA) assessed clinical utility, and SHapley Additive exPlanations (SHAP) provided model interpretability.

Results

Six key radiomics features were selected to establish the CRM. The MLP-based model achieved the highest AUC among the individual machine learning models in both training and test cohorts. The CRM demonstrated superior performance compared to models based solely on T2WI or CE T1WI, with AUCs of 0.877 (training cohort) and 0.860 (test cohort). The final nomogram, integrating the CRM and clinical factors, showed favorable discriminatory performance, achieving AUCs of 0.890 and 0.846 in the training and test cohorts, respectively. SHAP analysis identified ‘squareroot_firstorder_Skewness_CE T1WI’ and ‘wavelet_LLH_glcm_Correlation_CE T1WI’ as important predictors for malignant orbital tumors.

Conclusion

This study presents an effective and explainable multiparametric MRI radiomics model that accurately differentiates benign from malignant orbital tumors. The developed nomogram demonstrates promising performance within the internal validation framework and may provide supportive information for clinical decision-making pending further external validation.

Keywords: Orbital tumors, Benign, Malignant, Magnetic resonance imaging, Radiomics, Nomogram

Introduction

Orbital tumors and orbital masses represent a wide spectrum of disorders, ranging from benign to malignant conditions, each capable of causing significant morbidity and functional impairment. The anatomy of the orbit, with its complex structures and vital connections to the visual system, presents unique challenges in the diagnosis and management of orbital tumors. Benign orbital tumors, such as cavernous hemangioma and dermoid cysts, although not life-threatening, can cause significant discomfort and cosmetic deformities that may lead to psychological distress [1]. On the other hand, malignant orbital diseases, such as orbital lymphoma, rhabdomyosarcoma, and metastatic tumors, pose a severe threat due to their potential for rapid progression and high morbidity [2]. Accurate differentiation between benign and malignant orbital tumors is therefore a critical diagnostic decision that directly influences clinical management strategies.

Recent advances in imaging techniques, including high-resolution MRI and PET scans, have significantly improved the diagnostic accuracy, allowing for earlier and more precise interventions [3]. However, accurate preoperative diagnosis of orbital tumors remains challenging due to the complexity of the orbital anatomy and the overlapping imaging characteristics of different tumor types [4, 5]. Conventional diagnostic methods, including biopsy and surgical interventions, often carry risks of complications and inaccuracies, particularly in the delicate orbital area [6]. Recent advances in medical imaging and machine learning have paved the way for non-invasive diagnostic techniques, offering a promising alternative through radiomics—the extraction of a large number of features from radiographic medical images [7, 8]. An imaging-based radiomics model focused on orbital tumors may assist in distinguishing benign from malignant lesions and support existing clinical and imaging assessments rather than replace them.

The development of a non-invasive imaging radiomics model specifically for orbital tumors has the potential to enhance diagnostic confidence and aid clinical decision-making, rather than replace established diagnostic pathways. Imaging radiomics allows for detailed characterization of tissue heterogeneity, which may contribute to differentiating benign from malignant orbital tumors and help stratify patients for biopsy, surveillance, or further multidisciplinary evaluation [9]. This is crucial because invasive intervention can damage the optic nerve or other critical structures within the compact orbital space. Studies have shown that radiomics can effectively capture the heterogeneity of tumors in various anatomical regions, which is often linked to underlying genetic mutations, tumor behavior, and prognosis [10, 11].

Multiparametric magnetic resonance imaging (MRI) combines various MRI techniques to provide a comprehensive assessment of tissues, leveraging the strengths of each modality to improve diagnostic accuracy [12]. In this study, we used T2-weighted (T2WI) and contrast-enhanced T1-weighted (CE T1WI) images for orbital imaging. T2WI provides high tissue contrast, which is particularly useful for differentiating between soft tissue structures and detecting edema or inflammation. CE T1WI, on the other hand, uses contrast agents to enhance the visibility of blood vessels and tumors, allowing better delineation of lesions, which is particularly useful for distinguishing malignant from benign tumors. The integration of these sequences is particularly suitable for radiomics-based modeling aimed at differentiating benign and malignant orbital tumors [13]. The development of a multiparametric MRI radiomics model represents a significant advancement in medical imaging [14], particularly for complex anatomical regions such as the orbit, where distinguishing between benign and malignant lesions is critical yet challenging.

Based on our literature review, there are few reports using multiparametric magnetic resonance imaging radiomic features to distinguish between benign and malignant orbital lesions [9, 15, 16]. Recent studies have shown that while the application of radiomics has demonstrated its value in numerous ophthalmic diseases, there is a lack of research specifically focusing on orbital lesions [16]. However, the application of multiparametric MRI radiomics in orbital imaging remains in its early stages, and the use of these features to differentiate between benign and malignant lesions is still an ongoing research topic. A limited number of published studies involve small patient cohorts and focus on specific orbital locations or subtypes of orbital tumors, such as lacrimal gland tumors, or on technical points [15, 17, 18].

Therefore, the objective of this study was to investigate whether multiparametric MRI radiomics features derived from T2WI and CE T1WI can differentiate benign from malignant orbital tumors, addressing a clinically relevant diagnostic question.

Patients and methods

Patients

This retrospective study was approved by the Ethics Committee of the Quzhou People’s Hospital (No. 2024-139), and the requirement for written informed consent was waived. We enrolled 147 patients with pathologically confirmed orbital tumors who underwent surgical procedures at two institutions between December 2014 and March 2024. All patients had preoperative contrast-enhanced MRI examinations and complete clinical-pathological data.

Inclusion criteria comprised: patients with histologically verified benign or malignant orbital tumors; MRI examination performed within 2 weeks before surgery; and availability of complete medical records. Exclusion criteria included: lesions smaller than 5 mm in maximum diameter; MRI scans with substantial motion or susceptibility artifacts that compromised segmentation quality; and cases with incomplete follow-up information. Figure 1 presents the patient selection process, while Fig. 2 outlines the complete analytical workflow including lesion segmentation, radiomics feature extraction, feature selection, and model development. Patient allocation followed a 7:3 random split between training and test cohorts. To maximize the utilization of our sample size for model development, data from both institutions were merged before random partitioning. It should be noted that this constitutes an internal validation strategy. The critical step of external validation across institutions will be addressed in future prospective studies.
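
The 7:3 random allocation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the patient identifiers, the seed, and the helper name `split_70_30` are hypothetical, and rounding the training size down is an assumption that happens to reproduce the reported cohort sizes (102 and 45).

```python
import random

def split_70_30(patient_ids, seed):
    """Shuffle patient IDs with a fixed seed and cut at 70% (rounded down)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.7 * len(ids))  # 147 patients -> 102 training, 45 test
    return ids[:n_train], ids[n_train:]

# Hypothetical pooled identifiers for the patients of both centers
patients = [f"P{i:03d}" for i in range(147)]
train_ids, test_ids = split_70_30(patients, seed=0)
print(len(train_ids), len(test_ids))  # 102 45
```

Using a dedicated `random.Random(seed)` instance keeps each partition reproducible without touching the global random state, which is convenient when the split is repeated with different seeds.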

Clinical baseline characteristics and MRI image acquisition

Patient age and sex information was obtained from hospital medical record systems. Tumor size measurements (maximum and minimum diameters) represented the average values determined by two radiologists. MRI examinations were performed using a 3-Tesla scanner (Magnetom Skyra, Siemens) with all images stored in DICOM format. The imaging protocol included T2-weighted imaging (T2WI) using fat-saturated fast spin-echo sequences and contrast-enhanced T1-weighted imaging (CE T1WI) using standard gradient-echo sequences. Technical parameters were consistent across all studies: slice thickness 2 mm, slice gap 0.2 mm, and acquisition matrix 512 × 512. Sequence-specific parameters included: T1WI (TR 952 ms, TE 12 ms) and fat-saturated T2WI (TR 5000 ms, TE 82 ms). Contrast administration involved gadopentetate dimeglumine (Magnevist, Bayer) at 0.1 mmol/kg. Prior to radiomics feature extraction, all images underwent a single, unified preprocessing workflow. First, voxel intensities were truncated at the 0.5th and 99.5th percentiles to reduce the influence of outliers. Second, images were resampled to 1 × 1 × 1 mm³ isotropic voxel spacing using linear interpolation to ensure spatial consistency across patients. Finally, gray-level discretization was performed using a fixed number of bins (64 bins) for all sequences prior to feature extraction.

This preprocessing strategy was applied identically to both T2WI and CE T1WI images to ensure consistency and reproducibility of the radiomics analysis.
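
Two of the preprocessing steps, percentile truncation and fixed-bin-number discretization, can be sketched with NumPy as below. This is an illustrative sketch on a synthetic array, not the study pipeline: the function name `clip_and_discretize` is hypothetical, the equal-width binning formula is one common convention and is assumed here, and the resampling step is omitted because it depends on the image spacing metadata.

```python
import numpy as np

def clip_and_discretize(volume, lower_pct=0.5, upper_pct=99.5, n_bins=64):
    """Clip intensities to the given percentiles, then map them to bins 1..n_bins."""
    lo, hi = np.percentile(volume, [lower_pct, upper_pct])
    clipped = np.clip(volume, lo, hi)
    if hi == lo:  # constant image: everything falls in the first bin
        return clipped, np.ones(volume.shape, dtype=int)
    # Fixed bin number: equal-width bins spanning the clipped intensity range
    bins = np.floor(n_bins * (clipped - lo) / (hi - lo)).astype(int) + 1
    bins[bins > n_bins] = n_bins  # the maximum intensity belongs to the last bin
    return clipped, bins

rng = np.random.default_rng(0)
# Synthetic stand-in for an MRI volume, with one artificial outlier voxel
volume = rng.normal(loc=300.0, scale=80.0, size=(16, 16, 8))
volume[0, 0, 0] = 5000.0
clipped, bins = clip_and_discretize(volume)
print(clipped.max() < 5000.0, bins.min(), bins.max())
```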

Image segmentation

Accurate delineation of orbital tumors served as the foundation for subsequent image analysis. All segmentations were performed manually by radiologists using ITK-SNAP software (version 3.8.0). An experienced radiologist (with 11 years in diagnostic imaging) conducted the primary segmentation process, creating three-dimensional volumes of interest through sequential slice-by-slice annotation of entire tumor volumes. To assess interobserver variability, a second radiologist independently segmented 30 randomly selected lesions. We calculated intraclass correlation coefficients (ICC) to evaluate the consistency of radiomics features between observers, considering ICC values ≥ 0.75 indicative of satisfactory reliability.
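
The interobserver check can be illustrated for a single feature as follows. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption here; the function name and the reader values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows: one row per lesion, one column per reader."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares from a two-way ANOVA without replication
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical values of one radiomics feature from two readers on five lesions
reader_a = [0.82, 0.65, 0.91, 0.40, 0.73]
reader_b = [0.80, 0.68, 0.89, 0.45, 0.70]
icc = icc_2_1([list(pair) for pair in zip(reader_a, reader_b)])
print(f"ICC = {icc:.3f}, reliable = {icc >= 0.75}")
```

In a full pipeline this function would be applied feature by feature, and only features reaching the 0.75 threshold would be carried forward.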

Radiomics feature extraction, selection, and model evaluation

Our analysis followed a structured radiomics pipeline. After manual segmentation of regions of interest using ITK-SNAP (version 3.8.0), we implemented comprehensive image preprocessing: intensity values were truncated to the 0.5th-99.5th percentile range to reduce outlier effects, and all images were resampled to 1 mm³ isotropic resolution for spatial normalization.

Radiomics feature extraction was conducted separately for T2WI and CE T1WI sequences using PyRadiomics (version 3.0.1). This process yielded an initial high-dimensional feature pool of over 3,000 radiomic features per patient. Image intensity discretization employed a fixed bin width approach (5 bins). To capture multi-scale texture information, we applied both Wavelet transforms and Laplacian of Gaussian (LoG) filters with three sigma values ([1.0, 2.0, 3.0] mm). All extraction procedures complied with the standardized workflow provided by PyRadiomics.

Feature selection involved a multi-step process: initial statistical screening retained features that differed significantly between groups (t-test or U-test, p < 0.05); subsequent redundancy reduction using Pearson correlation coefficients removed highly correlated features (retaining one feature from each pair with correlation > 0.9); the mRMR method selected features with an optimal relevance-redundancy balance; and final selection employed LASSO-logistic regression. The LASSO regression model served to reduce feature dimensionality and construct radiomic signatures. Through ten-fold cross-validation within the training cohort, we identified the optimal regularization parameter λ that minimized cross-validation error. This ensures that the feature selection process is blind to the test cohort data. Notably, λ values were determined independently for each imaging modality to account for feature distribution differences. Features with non-zero coefficients were incorporated into the final radiomic signatures.
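
The Pearson redundancy-reduction step can be sketched as a greedy filter. This is an illustration under assumptions: the text does not say which feature of a correlated pair was retained, so keeping the first one encountered is a choice made here, and the feature names and values are hypothetical. The statistical screening, mRMR, and LASSO steps are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| <= threshold with every feature already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

features = {  # hypothetical feature columns, one value per patient
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly 2 * feat_a, so redundant
    "feat_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_redundant(features))  # ['feat_a', 'feat_c']
```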

To prevent information leakage, the dataset was randomly divided into a training cohort and an independent test cohort at a 7:3 ratio, with the test cohort excluded from all feature selection, model training, hyperparameter tuning, and early stopping procedures. Within the training cohort, a stratified 80%/20% internal training–validation split was applied using a fixed random seed. Early stopping for the MLP model was implemented based on internal validation loss (patience = 20), and the validation set was used solely to prevent overfitting. All feature selection, model training, and model selection steps were conducted exclusively within the training cohort. To rigorously assess the stability of the selected features and the robustness of model performance against the randomness of data splitting, we employed a repeated random sub-sampling validation strategy. Specifically, the 7:3 training-test split process was repeated 100 times. For each split, the entire feature selection pipeline (including statistical screening, mRMR, and LASSO) and model training were independently re-performed on the training subset. The performance metrics (e.g., AUC, Accuracy) reported in this study represent the mean ± standard deviation across these 100 iterations. This approach mitigates the risk that our conclusions are dependent on a single, fortunate data partition. Final model performance was evaluated once on the independent test cohort to obtain unbiased estimates.
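
The early-stopping rule (patience = 20 on the internal validation loss) can be illustrated independently of any deep learning framework. The sketch below is a simplified stand-in for a training loop: the function name and the synthetic loss curve are hypothetical, and it only shows when training would halt and which epoch would be kept.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new best: reset the waiting period
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch

# Synthetic curve: the loss improves for 30 epochs, then drifts upward (overfitting)
val_losses = [1.0 - 0.02 * e for e in range(30)] + [0.5 + 0.01 * e for e in range(100)]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=20)
print(stop_epoch, best_epoch)  # 49 29
```

In practice the model weights from `best_epoch` would be restored after stopping, so the extra 20 epochs cost training time but do not affect the final model.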

Following LASSO feature selection, we implemented three machine learning classifiers: logistic regression (LR), naive Bayes classifier (NaiveBayes), and multilayer perceptron (MLP). Model training utilized 5-fold cross-validation within the training cohort to identify the optimal classifier, while the independent test cohort was reserved for final performance evaluation only. Performance evaluation included area under the curve (AUC), accuracy, sensitivity, and specificity, with clinical utility assessed via decision curve analysis (DCA). The model demonstrating the highest AUC was selected as optimal. We employed SHapley Additive exPlanations (SHAP) to enhance model interpretability and applied Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) metrics for comparative model assessment.
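
Because classifiers were ranked by AUC, it may help to recall what the metric measures: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case, with ties counted as one half. The sketch below computes this directly by pairwise comparison; the labels and scores are hypothetical, and the study itself used the pROC package in R.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]               # 1 = malignant, 0 = benign (hypothetical)
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # hypothetical predicted probabilities
print(round(auc_pairwise(labels, scores), 3))  # 0.889
```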

Statistical analyses

Statistical analyses were performed using R software (version 4.0.2). Normally distributed continuous data are expressed as mean ± standard deviation (x̄ ± s) and compared using the independent samples t-test. Non-normally distributed continuous data are presented as median (interquartile range) [M(Q1, Q3)] and analyzed with the Mann-Whitney U test. Categorical variables are described as frequencies (percentages) [n (%)] and compared using the χ² test or Fisher’s exact test. We constructed receiver operating characteristic (ROC) curves using the “pROC” package to evaluate diagnostic performance. Model performance metrics including AUC, accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV) were compared against gold standard pathological diagnoses. The DeLong test facilitated AUC comparisons, while the Hosmer-Lemeshow test assessed model calibration. Additional evaluations included net reclassification improvement (NRI) and integrated discrimination improvement (IDI) analyses. Statistical significance was defined as p < 0.05. For results derived from repeated random sub-sampling (e.g., model performance metrics), data are presented as mean ± standard deviation. The consistency of feature selection frequency across the 100 iterations was also recorded to assess the stability of the radiomic signature.
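
The NRI and IDI comparisons can be sketched as follows. The text does not specify whether a categorical or category-free NRI was used, so the category-free (continuous) form is assumed here, and all probabilities are hypothetical. "Old" and "new" stand for the two models being compared, for example the CRM and the nomogram.

```python
def idi(labels, p_old, p_new):
    """Integrated discrimination improvement of the new model over the old one."""
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]

    def mean(indices, probs):
        return sum(probs[i] for i in indices) / len(indices)

    gain_events = mean(events, p_new) - mean(events, p_old)
    gain_nonevents = mean(nonevents, p_new) - mean(nonevents, p_old)
    return gain_events - gain_nonevents

def continuous_nri(labels, p_old, p_new):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    def net_up(indices):
        up = sum(1 for i in indices if p_new[i] > p_old[i])
        down = sum(1 for i in indices if p_new[i] < p_old[i])
        return (up - down) / len(indices)

    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    return net_up(events) - net_up(nonevents)

labels = [1, 1, 1, 0, 0, 0]              # 1 = malignant (hypothetical cases)
p_old = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]   # hypothetical probabilities, old model
p_new = [0.8, 0.7, 0.3, 0.4, 0.2, 0.4]   # hypothetical probabilities, new model
print(round(continuous_nri(labels, p_old, p_new), 3), round(idi(labels, p_old, p_new), 3))
```

Both quantities are positive when the new model moves malignant cases toward higher probabilities and benign cases toward lower ones, which is how two models with similar AUCs can still differ on NRI and IDI.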

Results

Patient characteristics

Among the 176 patients who met the inclusion criteria, 29 were excluded: 8 had tumors smaller than 5 mm, and 21 had images with significant artifacts that impaired analysis. Ultimately, 147 patients were included in this study, comprising 65 patients with malignant orbital tumors and 82 patients with benign orbital tumors; the histopathological types of the lesions are detailed in Table 1. Table 2 summarizes the baseline characteristics of malignant and benign orbital tumors in the training and test cohorts. Univariate logistic regression analyses (Table 3) showed that none of the clinical factors demonstrated statistically significant associations (P > 0.05). It is noteworthy that patient age, which showed a significant intergroup difference in the training cohort (P = 0.030, Table 2), was not retained as a significant predictor in the univariate logistic regression (P > 0.05, Table 3). This suggests that the predictive information conveyed by age may be shared with other baseline characteristics. However, tumor size is a well-established imaging biomarker associated with malignancy, as it reflects the rate of tumor progression [19], and age is an important clinical factor for determining tumor malignancy [20]. Therefore, age, minimum tumor diameter, and maximum tumor diameter were selected as the clinical factors included in the final nomogram model.

Table 1.

Participants’ lesion histopathologies

n(%)		Whole Sample (n = 147)
Histopathology	Benign tumor	Overall	82/147(55.7%)
		Cavernous hemangioma	30/147(20.4%)
		Orbital inflammation	23/147(15.6%)
		Neurogenic tumors	9/147(6.1%)
		Pleomorphic adenoma	9/147(6.1%)
		Other benign tumors	11/147(7.5%)
	Malignant tumor	Overall	65/147(44.3%)
		Lymphoma	32/147(21.8%)
		Rhabdomyosarcoma	9/147(6.1%)
		Astrocytoma	5/147(3.4%)
		Orbital solitary fibrous tumor*	3/147(2.0%)
		Adenoid cystic carcinoma	3/147(2.0%)
		Other malignant tumors	13/147(8.7%)

* Orbital solitary fibrous tumors (SFTs) included in the malignant group met the pathological criteria for malignancy (e.g., mitotic count > 4/10 HPF). According to the WHO classification, these correspond to grade 3 (malignant) SFTs

Table 2.

Clinical characteristics of patients

Characteristics	Training cohort (n = 102)			Test cohort (n = 45)
Characteristics	malignant	benign	P	malignant	benign	P
Age(years)	41.35 ± 22.06	45.54 ± 16.91	0.030	39.95 ± 23.94	40.00 ± 19.28	0.964
Gender(%)			0.829			1.000
Male	28(65.12%)	36(61.02%)		10(45.45%)	11(47.83%)
Female	15(34.88%)	23(38.98%)		12(54.55%)	12(52.17%)
Maximum diameter (cm)	2.80 ± 1.15	2.54 ± 1.10	0.159	3.12 ± 1.27	2.41 ± 0.92	0.035
Minimum diameter (cm)	1.78 ± 0.95	1.55 ± 0.63	0.302	1.70 ± 0.69	1.63 ± 0.59	0.733

Table 3.

Univariate logistic regression analysis of clinical factors for orbital lesions

Clinical factors	Log(OR)(95%CI)	OR(95%CI)	P
Age	-0.004 (-0.01-0.002)	0.996(0.990–1.002)	0.313
Gender	-0.251(-0.666-0.163)	0.778(0.514–1.177)	0.319
Maximum diameter	-0.070(-0.185-0.044)	0.915(0.764–1.095)	0.311
Minimum diameter	-0.089(-0.269-0.091)	0.932(0.831–1.045)	0.417

Features selection

LASSO-logistic regression analysis was used to perform dimensionality reduction on the features extracted from the T2WI images. The selection of the penalty coefficient (λ = 0.0791), the process of feature selection, and the curve of the variation of the feature coefficients with λ are shown in Fig. 3. After the final screening, a total of 4 features extracted from T2WI images were selected: three first-order statistical features and one shape feature. The radiomics feature importance score was constructed from these 4 features and their corresponding regression coefficients.

Fig. 3 — T2WI feature selection using the least absolute shrinkage and selection operator (LASSO), and the histogram of the radiomics feature importance score based on the selected features. The optimal λ value of 0.0791 was selected, and a total of 4 features were chosen

LASSO-logistic regression analysis was used to perform dimensionality reduction on the features extracted from CE T1WI images. The selection of the penalty coefficient (λ = 0.0339), the process of feature selection, and the curve of the variation of the feature coefficients with λ are shown in Fig. 4. After the final screening, a total of 7 features extracted from CE T1WI images were selected, including first-order statistical features (1), NGTDM (1), GLRLM (1), GLSZM (2), GLDM (1), and GLCM (1). The radiomics feature importance score was constructed from these 7 features and their corresponding regression coefficients.

Fig. 4 — CE T1WI feature selection using the least absolute shrinkage and selection operator (LASSO), and the histogram of the radiomics feature importance score based on the selected features. The optimal λ value of 0.0339 was selected, and a total of 7 features were chosen

LASSO-logistic regression analysis was used to perform dimensionality reduction on the fused features. The selection of the penalty coefficient (λ = 0.0596), the feature screening process, and the curve of the variation of the feature coefficients with λ are shown in Fig. 5. Six radiomics features (two from T2WI and four from CE T1WI) were ultimately retained and used as the final input features for construction of the combined radiomics model (CRM) (Fig. 6). For transparency, a linear radiomics score (Rad_signature) was also calculated using the LASSO coefficients of the six selected fused features as follows:

Rad_signature = 0.4215686274509803
− 0.062693 * original_shape_Sphericity_T2WI
− 0.047678 * wavelet_HLH_firstorder_Median_T2WI
− 0.034699 * exponential_glszm_SizeZoneNonUniformityNormalized_CE T1WI
− 0.055310 * wavelet_LLH_glcm_Correlation_CE T1WI
− 0.089995 * squareroot_firstorder_Skewness_CE T1WI
+ 0.031029 * square_glszm_ZonePercentage_CE T1WI

Fig. 5 — CRM feature selection using the least absolute shrinkage and selection operator (LASSO), and the histogram of the radiomics feature importance score based on the selected features. The optimal λ value of 0.0596 was selected, and a total of 6 features were chosen

Fig. 6 — The selected radiomics features (with non-zero coefficients) and their corresponding coefficients

In the above formula, “+” indicates a positive coefficient and “−” indicates a negative coefficient. The larger the absolute value of a coefficient, the greater the contribution of that feature to the score. For example, square_glszm_ZonePercentage_CE T1WI is the only feature with a positive coefficient, while squareroot_firstorder_Skewness_CE T1WI has the largest negative coefficient.

Notably, Rad_signature was used as an intermediate linear representation for feature reporting and comparison, whereas the finalized CRM in this study refers to the machine-learning classifier (MLP) trained on the six selected fused features.
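
As a worked illustration, the linear Rad_signature can be evaluated directly from the reported intercept and coefficients. The sketch below assumes that the inputs are the standardized (z-scored) feature values used when fitting the LASSO model, which the text does not state explicitly; the example patient and the function name are hypothetical.

```python
INTERCEPT = 0.4215686274509803
COEFFICIENTS = {
    "original_shape_Sphericity_T2WI": -0.062693,
    "wavelet_HLH_firstorder_Median_T2WI": -0.047678,
    "exponential_glszm_SizeZoneNonUniformityNormalized_CE T1WI": -0.034699,
    "wavelet_LLH_glcm_Correlation_CE T1WI": -0.055310,
    "squareroot_firstorder_Skewness_CE T1WI": -0.089995,
    "square_glszm_ZonePercentage_CE T1WI": 0.031029,
}

def rad_signature(features):
    """Linear score: intercept plus the coefficient-weighted sum of the six features."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)

# Hypothetical patient: every feature at the cohort mean (z-score 0),
# except skewness, which is one standard deviation above the mean
patient = {name: 0.0 for name in COEFFICIENTS}
patient["squareroot_firstorder_Skewness_CE T1WI"] = 1.0
print(round(rad_signature(patient), 6))  # 0.331574
```

A patient with all features at zero receives a score equal to the intercept, so each coefficient can be read as the change in score per unit change of the corresponding input.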

Predictive performance of the models

After feature selection, three machine-learning classifiers, logistic regression (LR), naive Bayes (NaiveBayes), and multilayer perceptron (MLP), were trained using the selected radiomics features for the T2WI model, CE T1WI model, and the combined radiomics model (CRM). For the CRM, the six fused radiomics features served as model inputs. Among the evaluated classifiers, the MLP consistently achieved the highest AUC for the T2WI, CE T1WI, and CRM models (Fig. 7). Therefore, the MLP-based CRM was selected as the final combined radiomics model. The MLP classifier was implemented with two hidden layers (128 and 64 neurons, respectively), using ReLU activation and a dropout rate of 0.5 after each hidden layer to mitigate overfitting. The model was trained with the Adam optimizer (learning rate = 0.001, batch size = 32) for a maximum of 500 epochs, incorporating early stopping (patience = 20) based on validation loss.

As shown in Tables 4 and 5 and Fig. 8, the CRM was the best-performing radiomics model, with an AUC of 0.877 (95% CI, 0.8112–0.9420) for the training cohort and 0.860 (95% CI, 0.7444–0.9749) for the test cohort. The CRM was combined with the clinical factors to construct a nomogram with an AUC of 0.890 (95% CI, 0.8262–0.9546) for the training cohort and 0.846 (95% CI, 0.7208–0.9709) for the test cohort. Although the DeLong test showed no significant difference between the CRM and the nomogram in the training cohort (P = 0.389), further analysis using Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) metrics demonstrated that the nomogram exhibited significant improvements in classification accuracy compared to the CRM. Specifically, the nomogram showed improvements in patient reclassification and overall discrimination ability, indicating its potential advantages in the management and treatment of orbital tumor patients (Fig. 9). DCA indicated that the nomogram offered greater net benefit to patients than the Clinic, T2WI, CE T1WI, and CRM prediction models (Fig. 10).
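
For readers unfamiliar with decision curve analysis, the quantity plotted is the net benefit at a given threshold probability: the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold. The sketch below uses the standard definition with hypothetical labels and probabilities; it is not derived from the study data.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

labels = [1, 1, 1, 0, 0, 0]              # 1 = malignant (hypothetical cases)
probs = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # hypothetical model probabilities
for pt in (0.25, 0.45, 0.65):
    print(pt, round(net_benefit(labels, probs, pt), 3), round(net_benefit_treat_all(labels, pt), 3))
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its curve lies above both the treat-all line and the treat-none line (net benefit of zero).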

Fig. 7 — The AUCs of three machine learning methods in the training cohort (A) and test cohort (B). The MLP achieved the highest AUC among the three algorithms

Table 4.

Distinguishing performances of radiomics models in the training cohort and test cohort

Modality	AUC(95%CI)	Accuracy	Sensitivity	Specificity	PPV	NPV	F1
Training cohort
T2WI	0.804(0.7208–0.8882)	0.735	0.953	0.576	0.621	0.944	0.752
CE T1WI	0.873(0.8074–0.9395)	0.814	0.884	0.763	0.731	0.900	0.800
CRM	0.877(0.8112–0.9420)	0.804	0.837	0.780	0.735	0.868	0.783
Test cohort
T2WI	0.789(0.6461–0.9309)	0.800	0.864	0.739	0.760	0.850	0.809
CE T1WI	0.830(0.7043–0.9557)	0.800	0.773	0.826	0.810	0.792	0.791
CRM	0.860(0.7444–0.9749)	0.822	0.864	0.783	0.792	0.857	0.826

Table 5.

Distinguishing performances of clinic model, CRM and nomogram

Modality	AUC(95%CI)	Accuracy	Sensitivity	Specificity	PPV	NPV
Training cohort
Clinic	0.627(0.5143–0.7388)	0.657	0.651	0.661	0.583	0.722
CRM	0.877(0.8112–0.9420)	0.804	0.837	0.780	0.735	0.868
Nomogram	0.890(0.8262–0.9546)	0.853	0.860	0.847	0.804	0.893
Test cohort
Clinic	0.585(0.4132–0.7568)	0.622	0.818	0.435	0.581	0.714
CRM	0.860(0.7444–0.9749)	0.822	0.864	0.783	0.792	0.857
Nomogram	0.846(0.7208–0.9709)	0.844	0.773	0.913	0.895	0.808
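
The threshold-based metrics in Table 5 follow from a confusion matrix. As an illustration, the counts below reproduce the nomogram row for the test cohort; they are inferred from the reported rates and the cohort composition in Table 2 (22 malignant, 23 benign) and are not reported directly in the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics, with malignant as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Inferred counts (an assumption) for the nomogram in the test cohort
nomogram_test = diagnostic_metrics(tp=17, fn=5, tn=21, fp=2)
print({name: round(value, 3) for name, value in nomogram_test.items()})
# {'accuracy': 0.844, 'sensitivity': 0.773, 'specificity': 0.913, 'ppv': 0.895, 'npv': 0.808}
```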

Fig. 8 — The AUCs of various prediction models in the training cohort and test cohort. A & B show the AUCs for the T2WI, CE T1WI, and CRM models, while C & D display the AUCs for the Clinic, CRM, and Nomogram models

Fig. 9 — The DeLong test, Net Reclassification Improvement (NRI), and Integrated Discrimination Improvement (IDI) of various prediction models in the training cohort

Fig. 10 — Decision curve analysis was developed with various prediction models. A shows the DCA of the T2WI, CE T1WI, and CRM models, while B shows the DCA of the Clinic, CRM, and Nomogram models

Radiomic nomogram construction

After comparing the predictive performance of the T2WI, CE T1WI, and combined radiomics model (CRM), the CRM implemented as an MLP demonstrated the best overall discriminatory ability and clinical utility. Therefore, a radiomics nomogram was constructed by integrating the CRM-predicted probability with selected clinical baseline characteristics, including age, minimum tumor diameter, and maximum tumor diameter (Fig. 11). This nomogram provides an intuitive and individualized visualization for differentiating benign from malignant orbital tumors and facilitates clinical decision support.

Fig. 11 — The radiomics nomogram integrating clinical factors (age, minimum and maximum tumor diameters) and the CRM-derived radiomics score based on T2WI and CE T1WI. The points for each predictor were determined by drawing a vertical line to the points axis. The total points were then summed and projected onto the total points axis, which corresponds to the predicted probability of malignant orbital tumors on the risk axis. The plot illustrates the relative importance and distribution of the six CRM radiomics features, showing their individual contributions to the predictive performance of the trained MLP classifier. Feature values are color-coded from low (blue) to high (red)

Explanation and visualization of nomogram model

SHapley Additive exPlanations (SHAP) were applied to the finalized MLP-based combined radiomics model (CRM) to provide quantitative and interpretable explanations of model predictions. SHAP summary plots offer a concise visualization of the range and distribution of each feature’s importance to the model’s output and relate each feature’s value to its impact. Features were first sorted by their global importance. Each dot, representing the SHAP value of one feature for one patient, was plotted horizontally and stacked vertically to show the density of similar SHAP values. Each dot was then colored by the value of the feature, from low (blue) to high (red) [21]. As shown in Fig. 12, wavelet_LLH_glcm_Correlation_CE T1WI was identified as the most influential feature in the CRM for distinguishing malignant from benign orbital tumors. The distribution of SHAP values indicates substantial inter-patient variability, and the color gradient demonstrates that lower values of this feature were associated with higher predicted probabilities of malignancy, reflecting the nonlinear behavior of the MLP model.
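
The additive property that SHAP plots rely on can be demonstrated with a brute-force Shapley computation on a toy model. This sketch is not the estimator used by the SHAP library and not the study's MLP: the scoring function, the three inputs, and the single all-zero baseline are hypothetical, chosen only to show that the per-feature contributions sum to the difference between the prediction and the baseline output.

```python
from itertools import combinations
from math import exp, factorial

def exact_shapley(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical nonlinear scorer standing in for a trained classifier."""
    logit = -0.8 * z[0] - 0.5 * z[1] + 0.3 * z[0] * z[2]
    return 1.0 / (1.0 + exp(-logit))

x = [1.2, -0.4, 0.7]         # hypothetical standardized feature values for one patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = exact_shapley(toy_model, x, baseline)
print([round(p, 4) for p in phi], round(toy_model(baseline) + sum(phi), 4), round(toy_model(x), 4))
```

The last two printed numbers agree, which is the same relationship a force plot displays: the base value plus the feature contributions equals the output for that patient. Exact enumeration scales as 2^n and is only practical for a handful of features, such as the six used in the CRM.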

Fig. 12 — SHAP summary plot of the MLP-based CRM model

As shown in Fig. 13, for one patient, the SHAP value was 0.56, which is higher than the base value (0.423); this patient was therefore assigned to the malignant orbital tumor group. The squareroot_firstorder_Skewness_CE T1WI arrow value of -0.79 made a negative (red) impact on the benign-malignant orbital tumor classification.

As shown in Fig. 14, for another patient, the SHAP value was 0.07, which is lower than the base value (0.423); this patient was therefore assigned to the benign orbital tumor group. The wavelet_LLH_glcm_Correlation_CE T1WI arrow value of 1.798 made a positive (blue) impact on the benign-malignant orbital tumor classification.

Discussion

We have demonstrated that the MLP model based on multiparametric magnetic resonance imaging radiomics showed good performance in differentiating between benign and malignant orbital tumors. Additionally, the nomogram constructed by combining clinical factors and CRM may enable individualized risk assessment and support clinical decision-making based on the specific characteristics of the patients, thereby potentially improving predictive performance through visualization.

In this retrospective study, we developed and evaluated three non-invasive, personalized radiomics models based on MR imaging to differentiate between benign and malignant orbital tumors. The T2WI model achieved an AUC of 0.804 (95% CI, 0.7208–0.8882) in the training cohort and 0.789 (95% CI, 0.6461–0.9309) in the test cohort. The CE T1WI model achieved an AUC of 0.873 (95% CI, 0.8074–0.9395) in the training cohort and 0.830 (95% CI, 0.7043–0.9557) in the test cohort. The CRM achieved an AUC of 0.877 (95% CI, 0.8112–0.9420) in the training cohort and 0.860 (95% CI, 0.7444–0.9749) in the test cohort. By integrating clinical factors with the CRM-derived radiomics score, we further constructed a nomogram, which achieved an AUC of 0.890 (95% CI, 0.8262–0.9546) in the training cohort and 0.846 (95% CI, 0.7208–0.9709) in the test cohort. Although the DeLong test showed no statistically significant difference between the nomogram and the CRM, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) analyses suggested an incremental improvement in model performance, indicating that the nomogram may provide additional diagnostic value beyond radiomics features alone.
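To make the comparison metrics concrete, the following sketch computes the AUC as a rank statistic together with the IDI and a category-free (continuous) NRI for two sets of predicted probabilities. The labels and probabilities are invented toy values, not study data, and the choice of the continuous NRI is an assumption, since the variant used in the analysis is not stated here.

```python
def auc(labels, scores):
    """AUC as the probability that a malignant case outranks a benign one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def idi(labels, old, new):
    """Integrated discrimination improvement: mean probability gain in events
    minus mean probability gain in non-events."""
    ev = [n - o for y, o, n in zip(labels, old, new) if y == 1]
    ne = [n - o for y, o, n in zip(labels, old, new) if y == 0]
    return sum(ev) / len(ev) - sum(ne) / len(ne)


def continuous_nri(labels, old, new):
    """Category-free NRI: net upward movement in events plus net downward movement in non-events."""
    ev = [(o, n) for y, o, n in zip(labels, old, new) if y == 1]
    ne = [(o, n) for y, o, n in zip(labels, old, new) if y == 0]
    ev_net = (sum(n > o for o, n in ev) - sum(n < o for o, n in ev)) / len(ev)
    ne_net = (sum(n < o for o, n in ne) - sum(n > o for o, n in ne)) / len(ne)
    return ev_net + ne_net


# Toy data (hypothetical): 1 = malignant, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
crm_prob = [0.70, 0.55, 0.40, 0.80, 0.30, 0.45, 0.20, 0.60]
nomogram_prob = [0.78, 0.60, 0.38, 0.85, 0.22, 0.40, 0.25, 0.55]

print("AUC (radiomics only):", auc(labels, crm_prob))
print("AUC (with clinical factors):", auc(labels, nomogram_prob))
print("IDI:", round(idi(labels, crm_prob, nomogram_prob), 4))
print("NRI:", round(continuous_nri(labels, crm_prob, nomogram_prob), 4))
```

The example shows why a small, statistically non-significant AUC difference can coexist with positive NRI and IDI values: the AUC depends only on the ranking of cases, whereas NRI and IDI respond to how far individual predicted probabilities move in the correct direction.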

Although radiomics models have demonstrated their value in many ophthalmic diseases, few studies in the literature focus on the analysis of orbital lesions. The few published studies involve a limited number of patients and primarily concentrate on specific orbital regions or subtypes of orbital tumors, such as distinguishing lacrimal gland tumors or lymphomas from idiopathic inflammation [17, 22, 23], or differentiating between two types of orbital tumors (solitary fibrous tumor and schwannoma) [24]. In this study, we included a variety of histopathological types and utilized multiparametric MRI features (from T2WI and CE T1WI sequences), aiming to achieve reliable diagnostic performance while maintaining clinical feasibility. The CRM, which combines selected features from the T2WI and CE T1WI sequences, achieved high diagnostic performance (AUC of 0.877 in the training cohort). The nomogram, which integrates clinical factors with the CRM, showed a further improvement in AUC in the training cohort (0.890), although its AUC in the test cohort (0.846) was slightly lower than that of the CRM (0.860). These results suggest that magnetic resonance imaging–based radiomics may have potential utility in supporting the clinical assessment of patients with orbital tumors. It may help identify patients with malignant orbital tumors who could be considered for more aggressive management strategies, such as surgery or chemotherapy. Conversely, if a benign lesion compresses orbital structures such as the optic nerve or eyeball, simple follow-up or surgery may suffice [16].

In this study, the radiomics model was constructed from only two MRI sequences: T2WI and CE T1WI. Existing literature indicates that, compared with standalone morphological imaging, multiparametric MRI provides higher accuracy [25, 26]. In the study by O’Shaughnessy et al. [16], both models incorporated features from diffusion-weighted imaging (DWI), dynamic contrast-enhanced (DCE) imaging, and intravoxel incoherent motion (IVIM) imaging. Each of these three techniques offers unique, non-overlapping information, thereby enhancing the accuracy and precision of machine learning models. However, owing to the retrospective design of our study, comprehensive imaging data could not be collected for all patients. Future prospective studies that perform a comprehensive MRI protocol on all patients initially diagnosed with orbital tumors, although challenging to conduct, may further enhance model performance and robustness.

Consistent with previous studies [27, 28], our radiomics model constructed using an MLP demonstrated good performance. The MLP, a type of artificial neural network (ANN), is a widely used feedforward network composed of multiple layers of neurons. It establishes a non-linear mapping between inputs and outputs through inter-layer connections, weights, and biases. In the MLP architecture, each input is passed through multiple channels, and the model is built from weight matrices, bias terms, and nonlinear activation functions [29, 30]. Although MLP-based models are widely used and powerful, they cannot be applied to clinical practice unless their predictions can be interpreted. Through summary plots and force plots, SHAP can provide model interpretation and visualization in a clinician-friendly manner [31, 32]. From a global perspective, SHAP summary plots offer intuitive and concise graphics by representing the range and distribution of each feature's importance to the model output (the more important the feature, the wider the range of the points) and by associating feature values with feature impacts (from low (blue) to high (red)). This provides an alternative feature importance plot for the entire cohort, whereas ordinary feature importance bars only provide relative importance [19, 33–35]. In this study, clinicians can clearly observe how the “wavelet_LLH_glcm_Correlation_CE T1WI” feature, along with other feature values, influences the assessment, which is visualized through the range and color of the points. This allows clinicians to identify the most influential features in the model at a glance. However, it should be noted that SHAP-based explanations primarily reflect the contribution of radiomics features to the model’s predictive behavior, rather than direct biological mechanisms. 
Although certain texture features may plausibly relate to tumor heterogeneity, cellularity, or vascular patterns, such interpretations remain speculative in the absence of matched histopathological or molecular validation. Therefore, the present SHAP analysis is intended to enhance model transparency at the feature level, rather than to establish definitive biological causation. The “wavelet_LLH_glcm_Correlation_CE T1WI” feature combines two higher-order texture analysis methods, the wavelet transform and the gray-level co-occurrence matrix (GLCM). The wavelet transform is a signal processing tool widely used in medical image analysis, especially in tumor grading studies. Combined with GLCM-based texture analysis, it captures spatial relationships within tumor tissue in more detail, which benefits image classification tasks and may improve the accuracy of tumor assessment [36]. Such features may contribute to improved model performance and offer additional quantitative information that could support clinical decision-making [37]. Liu et al. [38] reported that texture features associated with tumor heterogeneity are strongly linked to tumor grading. In this study, the “wavelet_LLH_glcm_Correlation_CE T1WI” feature was significantly lower in patients with benign tumors than in those with malignant tumors, and it showed a notable association with the model-based classification of benign and malignant tumor groups. This may be related to the heterogeneity of malignant tumors.
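As an illustration of what a GLCM correlation feature measures, the sketch below builds a symmetric, normalized co-occurrence matrix for horizontally adjacent pixels and computes its correlation. It is a simplified assumption-laden example: it works on a tiny two-level 2D image, omits the wavelet (LLH) filtering and gray-level discretization applied in the study, and the function names `glcm` and `glcm_correlation` are introduced here for illustration only.

```python
def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # pair counted in both directions
                counts[b][a] += 1  # so the matrix is symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def glcm_correlation(p):
    """Correlation of neighboring gray levels: covariance / (sd_x * sd_y)."""
    levels = range(len(p))
    px = [sum(p[i][j] for j in levels) for i in levels]
    py = [sum(p[i][j] for i in levels) for j in levels]
    mu_x = sum(i * px[i] for i in levels)
    mu_y = sum(j * py[j] for j in levels)
    sd_x = sum((i - mu_x) ** 2 * px[i] for i in levels) ** 0.5
    sd_y = sum((j - mu_y) ** 2 * py[j] for j in levels) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a flat region")
    cov = sum(p[i][j] * (i - mu_x) * (j - mu_y) for i in levels for j in levels)
    return cov / (sd_x * sd_y)


# Two toy textures: broad homogeneous blocks versus rapidly alternating stripes.
blocks = [[0, 0, 1, 1]] * 4
stripes = [[0, 1, 0, 1]] * 4

print("blocks :", round(glcm_correlation(glcm(blocks, 2)), 4))
print("stripes:", round(glcm_correlation(glcm(stripes, 2)), 4))
```

Neighboring pixels in the block texture tend to share a gray level, so the correlation is positive, whereas in the striped texture every neighbor differs and the correlation reaches its minimum. The feature therefore summarizes how predictable a pixel's gray level is from its neighbor's, which is one way of quantifying texture regularity.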

We conducted SHAP analysis on the six selected radiomics features incorporated into the CRM, of which four were derived from CE T1WI and two from T2WI, highlighting the value of CE T1WI sequences in predicting malignant orbital tumors. The value of CE T1WI-derived radiomics features may be related to their ability to capture enhancement patterns and intratumoral heterogeneity, which are commonly observed in malignant lesions, to provide clear lesion contrast through fat suppression techniques, and to quantify enhancement heterogeneity [39, 40]. SHAP analysis indicated that squareroot_firstorder_Skewness_CE T1WI and wavelet_LLH_glcm_Correlation_CE T1WI were positively associated with model predictions of malignancy. These two radiomics features reflect the high spatial heterogeneity inherent in malignant tumors. Such heterogeneity may be related to underlying biological processes, including invasive growth, internal necrosis, cyst formation, hemorrhage, and destruction of tissue structure, which are quantified and captured through radiomic features in two different but interconnected dimensions: skewness (gray-level distribution pattern) and correlation (spatial texture regularity) [41]. The wavelet_LLH_glcm_Correlation, squareroot_firstorder_Skewness, and original_shape_Sphericity features showed a strong correlation with benign orbital tumors, associations consistent, both biologically and radiologically, with their expansive growth, encapsulated boundaries, and relatively ordered internal structures. These radiomics features together form a composite signature that may be useful for non-invasive differentiation between benign and malignant orbital tumors [42, 43]. Nevertheless, these associations are observational and do not imply direct biological causation.
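For readers less familiar with first-order features, the following minimal sketch shows how skewness summarizes the asymmetry of an intensity distribution. The intensity lists are invented values, the square-root image filter applied before feature extraction in the study is omitted, and returning 0 for a flat region is a convention chosen here for the example.

```python
def skewness(values):
    """First-order skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # flat region: convention chosen for this sketch
    return m3 / m2 ** 1.5


# Hypothetical voxel intensities inside a region of interest.
symmetric = [1, 2, 3, 4, 5]
bright_tail = [1, 1, 1, 1, 10]   # a few very bright voxels
dark_tail = [10, 10, 10, 10, 1]  # a few very dark voxels

print("symmetric  :", round(skewness(symmetric), 4))
print("bright tail:", round(skewness(bright_tail), 4))
print("dark tail  :", round(skewness(dark_tail), 4))
```

A positive value indicates a tail of unusually bright voxels and a negative value a tail of unusually dark voxels, so the sign and magnitude of skewness describe how unevenly gray levels are distributed within the lesion.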

The present study has several limitations. First, a key methodological limitation concerns the validation strategy. Although patient data were collected from two institutions, the datasets were merged and randomly partitioned into training and testing sets for internal validation. Consequently, a truly independent external validation, using data from one institution to test a model trained exclusively on the other, was not performed. This approach was primarily adopted due to the limited overall sample size and the imbalanced distribution of tumor subtypes between the two centers; separating the data strictly by institution would have substantially reduced the statistical robustness of model development and evaluation. Nevertheless, this strategy limits the ability of the present study to rigorously assess model generalizability across different clinical settings, MRI scanners, and imaging protocols. The absence of institution-level external validation therefore constrains confidence in the immediate real-world applicability of the proposed models. Prospective multicenter studies with datasets separated strictly by institution are essential future steps to validate robustness and clinical utility across heterogeneous patient populations and healthcare environments.
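The institution-level validation described above differs from a pooled random split only in how cases are partitioned. A minimal sketch of such a partition follows; the record fields (`id`, `center`, `malignant`) and the center labels are hypothetical and do not reflect the study's data structure.

```python
def split_by_institution(records, held_out_center):
    """Train on every center except `held_out_center`; test only on that center."""
    train = [r for r in records if r["center"] != held_out_center]
    test = [r for r in records if r["center"] == held_out_center]
    if not train or not test:
        raise ValueError("both partitions must contain at least one case")
    return train, test


def malignant_fraction(records):
    """Share of malignant cases, useful for spotting class imbalance between centers."""
    return sum(r["malignant"] for r in records) / len(records)


# Invented records for illustration only.
records = [
    {"id": 1, "center": "A", "malignant": 1},
    {"id": 2, "center": "A", "malignant": 0},
    {"id": 3, "center": "A", "malignant": 0},
    {"id": 4, "center": "B", "malignant": 1},
    {"id": 5, "center": "B", "malignant": 1},
    {"id": 6, "center": "B", "malignant": 0},
]

train, test = split_by_institution(records, held_out_center="B")
print("train ids:", [r["id"] for r in train], "malignant share:", malignant_fraction(train))
print("test ids :", [r["id"] for r in test], "malignant share:", malignant_fraction(test))
```

Because no case from the held-out center is seen during training, performance on that partition reflects scanner and protocol differences between institutions, which a random split of pooled data cannot expose.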

Second, manual segmentation of the regions of interest (ROIs) was used in our study to ensure the accuracy of tumor segmentation and reduce inter-rater variability among radiologists. However, this method remains subjective and labor-intensive, which could introduce biases and inconsistencies in the data. Moreover, manual segmentation may result in the loss of information from certain tumor regions, particularly in cases of complex or irregularly shaped tumors [44, 45]. Future studies incorporating automated or semi-automated segmentation techniques may improve efficiency, reproducibility, and objectivity.

Third, the high dimensionality of the initial radiomic feature set relative to our sample size represents a potential limitation. We extracted a large number of features from a limited patient cohort. Although we employed rigorous dimensionality reduction techniques, including the Least Absolute Shrinkage and Selection Operator, to select a parsimonious set of six final features for model construction, the risk of overfitting and the possibility of selecting features that reflect noise or dataset-specific idiosyncrasies rather than generalizable biological signals cannot be fully ruled out. While internal cross-validation was used to mitigate this risk, the stability of the selected feature set across different data partitions warrants further investigation. Future validation on larger, independent, multi-institutional cohorts is essential to confirm the robustness, biological relevance, and generalizability of the identified radiomic signature.

Conclusions

Our research suggests that a machine learning approach that combines multiparametric magnetic resonance radiomics features with clinical factors may serve as a non-invasive tool for distinguishing between malignant and benign orbital tumors. SHAP-based interpretability enhances understanding of model predictions and may assist clinicians in individualized risk assessment, although further validation is required before routine clinical implementation.

Acknowledgements

The authors would like to thank The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, and the Eye & ENT Hospital, Fudan University, for their support of this project. We would also like to thank the participants in this study, and we are especially grateful to Professor Rujian Hong for support and attention throughout the work.

Abbreviations

MRI: Magnetic resonance imaging
T2WI: T2-weighted imaging
CE T1WI: Contrast-enhanced T1-weighted imaging
mRMR: Minimum redundancy maximum relevance
LASSO: Least absolute shrinkage and selection operator
LR: Logistic regression
NaiveBayes: Naive Bayes classifier
MLP: Multilayer perceptron
CRM: Combined radiomics model
AUC: Area under the curve
DCA: Decision curve analysis
SHAP: SHapley Additive exPlanations

Authors’ contributions

G.Z. and W.H. wrote the main manuscript text, and X.X. prepared the figures. G.Z. and X.H. provided funding. R.H. provided resources. All authors reviewed the manuscript.

Funding

This work was supported by Quzhou Municipal Science and Technology Bureau (grant no. 2022k65).

Data availability

All data generated or analysed during this study are included in this published article [and its supplementary information files].

Declarations

Ethics approval and consent to participate

This study complied with the Declaration of Helsinki and was approved by the Human Research Ethics Committee of the Quzhou People’s Hospital. The requirement for informed consent was waived by the same committee owing to the retrospective nature of the study.

Consent for publication

Not Applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guozheng Zhang and Xingjian Xu contributed equally to this work and share first authorship.

References

1. Shields JA, Shields CL, Scartozzi R. Survey of 1264 patients with orbital tumors and simulating lesions: The 2002 Montgomery Lecture, part 1. Ophthalmology. 2004;111(5):997–1008.
2. Xu XQ, Qian W, Ma G, Hu H, Su GY, Liu H, et al. Combined diffusion-weighted imaging and dynamic contrast-enhanced MRI for differentiating radiologically indeterminate malignant from benign orbital masses. Clin Radiol. 2017;72(10):903.e9–903.e15.
3. Lecler A, Duron L, Charlson E, Kolseth C, Kossler AL, Wintermark M, et al. Comparison between 7 Tesla and 3 Tesla MRI for characterizing orbital lesions. Diagn Interv Imaging. 2022;103(9):433–9.
4. Cohen LM, Yoon MK. Update on Current Aspects of Orbital Imaging: CT, MRI, and Ultrasonography. Int Ophthalmol Clin. 2019;59(4):69–79.
5. Koukkoulli A, Pilling JD, Patatas K, El-Hindy N, Chang B, Kalantzis G. How accurate is the clinical and radiological evaluation of orbital lesions in comparison to surgical orbital biopsy? Eye. 2018;32(8):1329–33.
6. Martel A, Baillif S, Nahon-Esteve S, Gastaud L, Bertolotto C, Lassalle S, et al. Orbital exenteration: an updated review with perspectives. Surv Ophthalmol. 2021;66(5):856–76.
7. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48(4):441–6.
8. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020;295(2):328–38.
9. Duron L, Heraud A, Charbonneau F, Zmuda M, Savatovsky J, Fournier L, et al. A Magnetic Resonance Imaging Radiomics Signature to Distinguish Benign From Malignant Orbital Lesions. Invest Radiol. 2021;56(3):173–80.
10. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology. 2015;278(2):563–77.
11. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62.
12. Zhang Q, Peng Y, Liu W, Bai J, Zheng J, Yang X, et al. Radiomics Based on Multimodal MRI for the Differential Diagnosis of Benign and Malignant Breast Lesions. J Magn Reson Imaging. 2020;52(2):596–607.
13. Geng W, Zhu J, Li M, Pi B, Wang X, Xing J, et al. Radiomics Based on Multimodal magnetic resonance imaging for the Differential Diagnosis of Benign and Malignant Vertebral Compression Fractures. Orthop Surg. 2024;16(10):2464–74.
14. Tay S, Stephenson M, Allameen N, Narayanan S, Lee B, Mak A. A Multimodal Magnetic Resonance Imaging Study of Cognitive Function in Systemic Lupus Erythematosus: A Machine Learning Approach. Ann Rheum Dis. 2022;81(Suppl 1):668.
15. Nakagawa J, Fujima N, Hirata K, Tang M, Tsuneta S, Suzuki J, et al. Utility of the deep learning technique for the diagnosis of orbital invasion on CT in patients with a nasal or sinonasal tumor. Cancer Imaging. 2022;22(1):52.
16. O’Shaughnessy E, Senicourt L, Mambour N, Savatovsky J, Duron L, Lecler A. Toward Precision Diagnosis: Machine Learning in Identifying Malignant Orbital Tumors With Multiparametric 3 T MRI. Invest Radiol. 2024;59(10):737–45.
17. Xie X, Yang L, Zhao F, Wang D, Zhang H, He X, et al. A deep learning model combining multimodal radiomics, clinical and imaging features for differentiating ocular adnexal lymphoma from idiopathic orbital inflammation. Eur Radiol. 2022;32(10):6922–32.
18. Tooley AA, Tailor P, Tran AQ, Garrity JA, Eckel L, Link MJ. Differentiating intradiploic orbital dermoid and epidermoid cysts utilizing clinical features and machine learning. Indian J Ophthalmol. 2022;70(6).
19. Dregely I, Prezzi D, Kelly-Morland C, Roccia E, Neji R, Goh V. Imaging biomarkers in oncology: Basics and application to MRI. J Magn Reson Imaging. 2018;48(1):13–26.
20. O’Connor JP, Aboagye EO, Adams JE, Aerts HJ, Barrington SF, Beer AJ, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol. 2016;14(3):169–86.
21. Wang Y, Lang J, Zuo JZ, Dong Y, Hu Z, Xu X, et al. The radiomic-clinical model using the SHAP method for assessing the treatment response of whole-brain radiotherapy: a multicentric study. Eur Radiol. 2022;32(12):8737–47.
22. Hou Y, Xie X, Chen J, Lv P, Jiang S, He X, et al. Bag-of-features-based radiomics for differentiation of ocular adnexal lymphoma and idiopathic orbital inflammation from contrast-enhanced MRI. Eur Radiol. 2020;31(1):24–33.
23. Han Q, Du L, Mo Y, Huang C, Yuan Q. Machine Learning Based Non-Enhanced CT Radiomics for the Identification of Orbital Cavernous Venous Malformations: An Innovative Tool. J Craniofac Surg. 2022;33(3):814–20.
24. Ren J, Yuan Y, Qi M, Tao X. MRI-based radiomics nomogram for distinguishing solitary fibrous tumor from schwannoma in the orbit: a two-center study. Eur Radiol. 2023;34(1):560–8.
25. Ro SR, Asbach P, Siebert E, Bertelmann E, Hamm B, Erb-Eigner K. Characterization of orbital masses by multiparametric MRI. Eur J Radiol. 2015;85(2):324–36.
26. Xu XQ, Hu H, Liu H, Wu JF, Cao P, Shi HB, et al. Benign and malignant orbital lymphoproliferative disorders: Differentiating using multiparametric MRI at 3.0T. J Magn Reson Imaging. 2016;45(1):167–76.
27. Yang L, Zhang H, Xie X, Jiang S, Zhang H, Cao X, et al. MRI-Based Radiomics Nomogram for Preoperative Differentiation Between Ocular Adnexal Lymphoma and Idiopathic Orbital Inflammation. J Magn Reson Imaging. 2022;57(5):1594–604.
28. Armstrong GW, Lorch AC. A(eye): A Review of Current Applications of Artificial Intelligence and Machine Learning in Ophthalmology. Int Ophthalmol Clin. 2020;60(1):57–71.
29. Huang CC, Kuo WY, Shen YT, Chen CJ, Lin HJ, Hsu CC, et al. Artificial intelligence prediction of In-Hospital mortality in patients with dementia: A multi-center study. Int J Med Inf. 2024;191:105590.
30. Ma H, Kang X, Duan S, Li Y. Efficient Structural Optimization under Transient Impact Loads Using Multilayer Perceptron and Genetic Algorithms. Int J Nonlin Mech. 2024;104950.
31. Fu M, Liu Y, Hou Z, Wang Z. Interpretable prediction of acute ischemic stroke after hip fracture in patients 65 years and older based on machine learning and SHAP. Arch Gerontol Geriat. 2024;129:105641.
32. Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem. 2019;63(16):8761–77.
33. Giraud P, Giraud P, Nicolas E, Boisselier P, Alfonsi M, Rives M, et al. Interpretable Machine Learning Model for Locoregional Relapse Prediction in Oropharyngeal Cancers. Cancers (Basel). 2020;13(1).
34. Ma L, Xiao Z, Li K, Li S, Li J, Yi X. Game theoretic interpretability for learning based preoperative gliomas grading. Future Gener Comp Sy. 2020;112:1–10.
35. Li R, et al. Machine Learning-Based Interpretation and Visualization of Nonlinear Interactions in Prostate Cancer Survival. JCO Clin Cancer Inf. 2020;4:637–46.
36. Verma S, Kumar M. A hybrid machine learning model for skin disease classification using discrete wavelet transform and gray level co-occurrence matrix (GLCM). Multimed Tools. 2024.
37. Tomaszewski MR, Gillies RJ. The Biological Meaning of Radiomic Features. Radiology. 2021;299(2):E256.
38. Liu Y, Liu S, Qu F, Li Q, Cheng R, Ye Z. Tumor heterogeneity assessed by texture analysis on contrast-enhanced CT in lung adenocarcinoma: association with pathologic grade. Oncotarget. 2017;8(32):53664–74.
39. Yuan Y, Kuai XP, Chen XS, Tao XF. Assessment of dynamic contrast-enhanced magnetic resonance imaging in the differentiation of malignant from benign orbital masses. Eur J Radiol. 2013;82(9):1506–11.
40. Li J, Zhou C, Qu X, Du L, Yuan Q, Han Q, et al. Perilesional dominance: radiomics of multiparametric MRI enhances differentiation of IgG4-Related ophthalmic disease and orbital MALT lymphoma. BMC Med Imaging. 2025;25(1):238.
41. He Z, Mao Y, Lu S, Tan L, Xiao J, Tan P, et al. Machine learning-based radiomics for histological classification of parotid tumors using morphological MRI: a comparative study. Eur Radiol. 2022;32(12):8099–110.
42. Ou S, Lin Y, Zhang Y, Shi K, Wu H. Epidemiology and tumor microenvironment of ocular surface and orbital tumors on growth and malignant transformation. Front Oncol. 2024;14:1388156.
43. Ladbury C, Zarinshenas R, Semwal H, Tam A, Vaidehi N, Rodin AS, et al. Utilization of model-agnostic explainable artificial intelligence frameworks in oncology: a narrative review. Transl Cancer Res. 2022;11(10):3853–68.
44. Kniep HC, Madesta F, Schneider T, Hanning U, Schönfeld MH, Schön G, et al. Radiomics of Brain MRI: Utility in Prediction of Metastatic Tumor Type. Radiology. 2018;290(2):479–87.
45. Whybra P, Zwanenburg A, Andrearczyk V, Schaer R, Apte AP, Ayotte A, et al. The Image Biomarker Standardization Initiative: Standardized Convolutional Filters for Reproducible Radiomics and Enhanced Clinical Insights. Radiology. 2024;310(2):e231319.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

All data generated or analysed during this study are included in this published article [and its supplementary information files].

[CR1] 1.Shields JA, Shields CL, Scartozzi R. Survey of 1264 patients with orbital tumors and simulating lesions: The 2002 Montgomery Lecture, part 1. Ophthalmology. 2004;111(5):997–1008. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Xu XQ, Qian W, Ma G, Hu H, Su GY, Liu H, et al. Combined diffusion-weighted imaging and dynamic contrast-enhanced MRI for differentiating radiologically indeterminate malignant from benign orbital masses. Clin Radiol. 2017;72(10):e9039–90315. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Lecler A, Duron L, Charlson E, Kolseth C, Kossler AL, Wintermark M, et al. Comparison between 7 Tesla and 3 Tesla MRI for characterizing orbital lesions. Diagn Interv Imaging. 2022;103(9):433–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Cohen LM, Yoon MK. Update on Current Aspects of Orbital Imaging: CT, MRI, and Ultrasonography. Int Ophthalmol Clin. 2019;59(4):69–79. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Koukkoulli A, Pilling JD, Patatas K, El-Hindy N, Chang B, Kalantzis G. How accurate is the clinical and radiological evaluation of orbital lesions in comparison to surgical orbital biopsy? Eye. 2018;32(8):1329–33. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Martel A, Baillif S, Nahon-Esteve S, Gastaud L, Bertolotto C, Lassalle S, et al. Orbital exenteration: an updated review with perspectives. Surv Ophthalmol. 2021;66(5):856–76. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48(4):441–6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020;295(2):328–38. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Duron L, Heraud A, Charbonneau F, Zmuda M, Savatovsky J, Fournier L, et al. A Magnetic Resonance Imaging Radiomics Signature to Distinguish Benign From Malignant Orbital Lesions. Invest Radiol. 2021;56(3):173–80. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures. They Are Data Radiol. 2015;278(2):563–77. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Zhang Q, Peng Y, Liu W, Bai J, Zheng J, Yang X, et al. Radiomics Based on Multimodal MRI for the Differential Diagnosis of Benign and Malignant Breast Lesions. J Magn Reson Imaging. 2020;52(2):596–607. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Geng W, Zhu J, Li M, Pi B, Wang X, Xing J, et al. Radiomics Based on Multimodal magnetic resonance imaging for the Differential Diagnosis of Benign and Malignant Vertebral Compression Fractures. Orthop Surg. 2024;16(10):2464–74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Tay S, Stephenson M, Allameen N, Narayanan S, Lee B, Mak A. A Multimodal Magnetic Resonance Imaging Study of Cognitive Function in Systemic Lupus Erythematosus: A Machine Learning Approach. Ann Rheum Dis. 2022;81(Suppl 1):668. [Google Scholar]

[CR15] 15.Nakagawa J, Fujima N, Hirata K, Tang M, Tsuneta S, Suzuki J, et al. Utility of the deep learning technique for the diagnosis of orbital invasion on CT in patients with a nasal or sinonasal tumor. Cancer Imaging. 2022;22(1):52. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.O’Shaughnessy E, Senicourt L, Mambour N, Savatovsky J, Duron L, &Lecler A. Toward Precision Diagnosis: Machine Learning in Identifying Malignant Orbital Tumors With Multiparametric 3 T MRI. Invest. Radiol. 2024;59(10):737–45. [DOI] [PubMed] [Google Scholar]

[CR17] 17.Xie X, Yang L, Zhao F, Wang D, Zhang H, He X, et al. A deep learning model combining multimodal radiomics, clinical and imaging features for differentiating ocular adnexal lymphoma from idiopathic orbital inflammation. Eur Radiol. 2022;32(10):6922–32. [DOI] [PubMed] [Google Scholar]

[CR18] 18.Tooley AA, Tailor P, Tran AQ, Garrity JA, Eckel L, Link MJ. Differentiating intradiploic orbital dermoid and epidermoid cysts utilizing clinical features and machine learning. Indian J Ophthalmol.2022;70 (6). [DOI] [PMC free article] [PubMed]

[CR19] 19.Dregely I, Prezzi D, Kelly-Morland C, Roccia E, Neji R, Goh V. Imaging biomarkers in oncology: Basics and application to MRI. J Magn Reson Imaging. 2018;48(1):13–26. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.O’Connor JP, Aboagye EO, Adams JE, Aerts HJ, Barrington SF, Beer AJ, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol. 2016;14(3):169–86. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Wang Y, Lang J, Zuo JZ, Dong Y, Hu Z, Xu X, et al. The radiomic-clinical model using the SHAP method for assessing the treatment response of whole-brain radiotherapy: a multicentric study. Eur Radiol. 2022;32(12):8737–47. [DOI] [PubMed] [Google Scholar]

[CR22] 22.Hou Y, Xie X, Chen J, Lv P, Jiang S, He X, et al. Bag-of-features-based radiomics for differentiation of ocular adnexal lymphoma and idiopathic orbital inflammation from contrast-enhanced MRI. Eur Radiol. 2020;31(1):24–33. [DOI] [PubMed] [Google Scholar]

[CR23] 23.Han Q, Du L, Mo Y, Huang C, Yuan Q. Machine Learning Based Non-Enhanced CT Radiomics for the Identification of Orbital Cavernous Venous Malformations: An Innovative Tool. J Craniofac Surg. 2022;33(3):814–20. [DOI] [PubMed] [Google Scholar]

[CR24] 24.Ren J, Yuan Y, Qi M, Tao X. MRI-based radiomics nomogram for distinguishing solitary fibrous tumor from schwannoma in the orbit: a two-center study. Eur Radiol. 2023;34(1):560–8. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Ro SR, Asbach P, Siebert E, Bertelmann E, Hamm B, Erb-Eigner K. Characterization of orbital masses by multiparametric. MRI Eur J Radiol. 2015;85(2):324–36. [DOI] [PubMed] [Google Scholar]

[CR26] 26.Xu XQ, Hu H, Liu H, Wu JF, Cao P, Shi HB, et al. Benign and malignant orbital lymphoproliferative disorders: Differentiating using multiparametric MRI at 3.0T. J Magn Reson Imaging. 2016;45(1):167–76. [DOI] [PubMed] [Google Scholar]

[CR27] 27.Yang L, Zhang H, Xie X, Jiang S, Zhang H, Cao X, et al. MRI-Based Radiomics Nomogram for Preoperative Differentiation Between Ocular Adnexal Lymphoma and Idiopathic Orbital Inflammation. J Magn Reson Imaging. 2022;57(5):1594–604. [DOI] [PubMed] [Google Scholar]

[CR28] 28. Armstrong GW, Lorch AC. A(eye): A Review of Current Applications of Artificial Intelligence and Machine Learning in Ophthalmology. Int Ophthalmol Clin. 2020;60(1):57–71.

[CR29] 29. Huang CC, Kuo WY, Shen YT, Chen CJ, Lin HJ, Hsu CC, et al. Artificial intelligence prediction of In-Hospital mortality in patients with dementia: A multi-center study. Int J Med Inf. 2024;191:105590.

[CR30] 30. Ma H, Kang X, Duan S, Li Y. Efficient Structural Optimization under Transient Impact Loads Using Multilayer Perceptron and Genetic Algorithms. Int J Nonlin Mech. 2024;104950.

[CR31] 31. Fu M, Liu Y, Hou Z, Wang Z. Interpretable prediction of acute ischemic stroke after hip fracture in patients 65 years and older based on machine learning and SHAP. Arch Gerontol Geriat. 2024;129:105641.

[CR32] 32. Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem. 2019;63(16):8761–77.

[CR33] 33. Giraud P, Giraud P, Nicolas E, Boisselier P, Alfonsi M, Rives M, et al. Interpretable Machine Learning Model for Locoregional Relapse Prediction in Oropharyngeal Cancers. Cancers (Basel). 2020;13(1).

[CR34] 34. Ma L, Xiao Z, Li K, Li S, Li J, Yi X. Game theoretic interpretability for learning based preoperative gliomas grading. Future Gener Comp Sy. 2020;112:1–10.

[CR35] 35. Li R, et al. Machine Learning-Based Interpretation and Visualization of Nonlinear Interactions in Prostate Cancer Survival. JCO Clin Cancer Inf. 2020;4:637–46.

[CR36] 36. Verma S, Kumar M. A hybrid machine learning model for skin disease classification using discrete wavelet transform and gray level co-occurrence matrix (GLCM). Multimed Tools. 2024.

[CR37] 37. Tomaszewski MR, Gillies RJ. The Biological Meaning of Radiomic Features. Radiology. 2021;299(2):E256.

[CR38] 38. Liu Y, Liu S, Qu F, Li Q, Cheng R, Ye Z. Tumor heterogeneity assessed by texture analysis on contrast-enhanced CT in lung adenocarcinoma: association with pathologic grade. Oncotarget. 2017;8(32):53664–74.

[CR39] 39. Yuan Y, Kuai XP, Chen XS, Tao XF. Assessment of dynamic contrast-enhanced magnetic resonance imaging in the differentiation of malignant from benign orbital masses. Eur J Radiol. 2013;82(9):1506–11.

[CR40] 40. Li J, Zhou C, Qu X, Du L, Yuan Q, Han Q, et al. Perilesional dominance: radiomics of multiparametric MRI enhances differentiation of IgG4-Related ophthalmic disease and orbital MALT lymphoma. BMC Med Imaging. 2025;25(1):238.

[CR41] 41. He Z, Mao Y, Lu S, Tan L, Xiao J, Tan P, et al. Machine learning-based radiomics for histological classification of parotid tumors using morphological MRI: a comparative study. Eur Radiol. 2022;32(12):8099–110.

[CR42] 42. Ou S, Lin Y, Zhang Y, Shi K, Wu H. Epidemiology and tumor microenvironment of ocular surface and orbital tumors on growth and malignant transformation. Front Oncol. 2024;14:1388156.

[CR43] 43. Ladbury C, Zarinshenas R, Semwal H, Tam A, Vaidehi N, Rodin AS, et al. Utilization of model-agnostic explainable artificial intelligence frameworks in oncology: a narrative review. Transl Cancer Res. 2022;11(10):3853–68.

[CR44] 44. Kniep HC, Madesta F, Schneider T, Hanning U, Schönfeld MH, Schön G, et al. Radiomics of Brain MRI: Utility in Prediction of Metastatic Tumor Type. Radiology. 2018;290(2):479–87.

[CR45] 45. Whybra P, Zwanenburg A, Andrearczyk V, Schaer R, Apte AP, Ayotte A, et al. The Image Biomarker Standardization Initiative: Standardized Convolutional Filters for Reproducible Radiomics and Enhanced Clinical Insights. Radiology. 2024;310(2):e231319.
