Abstract
Background
Accurately distinguishing the different molecular subtypes of 2021 World Health Organization (WHO) grade 4 Central Nervous System (CNS) gliomas is highly relevant for prognostic stratification and personalized treatment.
Objectives
To develop and validate a machine learning (ML) model based on multiparametric MRI for the preoperative differentiation of astrocytoma, CNS WHO grade 4, from glioblastoma (GBM), isocitrate dehydrogenase-wild-type (IDH-wt) (WHO 2021) (Task 1: grade 4 vs. GBM); to further stratify astrocytoma, CNS WHO grade 4, by distinguishing astrocytoma, IDH-mutant (IDH-mut), CNS WHO grade 4, from astrocytoma, IDH-wt, CNS WHO grade 4 (Task 2: IDH-mut grade 4 vs. IDH-wt grade 4); and to evaluate the model’s prognostic value.
Methods
We retrospectively analyzed 320 glioma patients from three hospitals (split 7:3 into training and testing sets) and 99 patients from The Cancer Genome Atlas (TCGA) database for external validation. Radiomic features were extracted from tumor and edema regions on contrast-enhanced T1-weighted imaging (CE-T1WI) and T2 fluid-attenuated inversion recovery (T2-FLAIR). Extreme gradient boosting (XGBoost) was used to construct the ML, clinical, and combined models. Model performance was evaluated with receiver operating characteristic (ROC), decision, and calibration curves, and stability was assessed with six additional classifiers. Kaplan-Meier (KM) survival analysis and the log-rank test were used to assess the model’s prognostic value.
Results
In both tasks, the combined model (Task 1 AUC = 0.907, 0.852, and 0.832; Task 2 AUC = 0.899, 0.895, and 0.792) and the optimal ML model (Task 1 AUC = 0.902, 0.854, and 0.830; Task 2 AUC = 0.904, 0.899, and 0.783) outperformed the clinical model (Task 1 AUC = 0.671, 0.656, and 0.543; Task 2 AUC = 0.619, 0.605, and 0.400) in the training, testing, and validation sets, respectively; the differences were statistically significant in all sets except the Task 2 validation set. In survival analysis, the stratification provided by the combined model did not differ significantly from that provided by the molecular subtype in either task (p = 0.964 and p = 0.746).
Conclusion
The multiparametric MRI-based ML model effectively distinguished astrocytoma, CNS WHO grade 4, from GBM, IDH-wt (WHO 2021), and differentiated astrocytoma, IDH-mut, from astrocytoma, IDH-wt, CNS WHO grade 4. Additionally, the model provided reliable survival stratification for glioma patients across the different molecular subtypes.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12885-025-14529-7.
Keywords: Astrocytoma, Glioblastoma, Magnetic resonance imaging, Machine learning, Molecular subtype
Background
The 2021 World Health Organization (WHO) Classification of central nervous system (CNS) tumors underscores the importance of integrating molecular parameters with histological findings for accurate grading [1]. Notably, isocitrate dehydrogenase-mutant (IDH-mut) grade 2–3 astrocytomas are now reclassified as astrocytoma, IDH-mutant, CNS WHO grade 4, when homozygous deletion of cyclin-dependent kinase inhibitor 2A/B (CDKN2A/B) is present, regardless of necrosis and/or microvascular proliferation. Similarly, IDH wild-type (IDH-wt) grade 2–3 astrocytomas are reclassified as astrocytoma, IDH-wt, with molecular features of glioblastoma (mGBM), CNS WHO grade 4, if they harbor one or more of the following: telomerase reverse transcriptase (TERT) promoter mutation, epidermal growth factor receptor (EGFR) amplification, or chromosome +7/−10 copy number changes. These molecularly defined grade 2–3 astrocytomas represent high-risk molecular subtypes with poor prognosis [1, 2].
Despite their poor prognosis, patients with astrocytoma, IDH-wt, CNS WHO grade 4 have shown longer overall survival (OS) and progression-free survival (PFS) than patients with glioblastoma (GBM) [3]. Moreover, the two IDH variants of astrocytoma, CNS WHO grade 4 exhibit distinct biological behaviors and clinical outcomes, with IDH-mut astrocytoma showing less aggressive behavior and a better prognosis under similar treatment protocols [2, 4]. Accurate discrimination of molecular subtypes in WHO grade 4 glioma is therefore highly relevant for prognostic stratification and personalized treatment. However, current molecular diagnostics depend on invasive biopsies, so a reliable non-invasive approach for predicting glioma molecular subtypes is needed.
Magnetic resonance imaging (MRI) is extensively used for glioma diagnosis and treatment monitoring but does not directly reveal histological and molecular details. Radiomics extracts high-throughput quantitative features from images, providing insight into tumor heterogeneity and thereby enhancing diagnostic and therapeutic accuracy [5, 6]. Machine learning (ML) algorithms can process radiomics data to predict tissue characteristics and identify molecular features in glioma [7, 8]. Previous studies have developed predictive models for glioma grading based on MRI radiomics or ML, but most adhered to the 2016 or earlier WHO classifications [9–17]. The differentiation of astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021), as well as the distinction between astrocytoma, IDH-mut and IDH-wt, CNS WHO grade 4, therefore remains under-investigated. We aimed to construct an ML model using multiparametric MRI to differentiate astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM) and to further stratify grade 4 astrocytoma into IDH-mut and IDH-wt subtypes (Task 2: IDH-mut grade 4 vs. IDH-wt grade 4). Additionally, we evaluated the model’s prognostic value for OS.
Materials and methods
Machine learning-based classification
We constructed two independent binary classification tasks: (1) differentiation between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM), and (2) stratification of grade 4 astrocytoma into IDH-mut and IDH-wt subtypes (Task 2: IDH-mut grade 4 vs. IDH-wt grade 4).
Patient population
This retrospective study enrolled 320 patients with pathologically confirmed glioma from three institutions (First Hospital of Shanxi Medical University, Shanxi Provincial People’s Hospital, and Shanxi Bethune Hospital) between February 2011 and December 2023. The institutional cohort was randomly split into training and testing sets at a 7:3 ratio using stratified sampling. An additional 99 cases from The Cancer Genome Atlas (TCGA) database served as an independent external validation set. To assess the prognostic value of the molecular subtype prediction model, OS was defined as the time from initial diagnosis to death or the last follow-up. Survival information was available for 312 patients in the institutional cohort and for all 99 cases in the TCGA cohort. The institutional review board waived the requirement for patient informed consent (KYYJ-2023-058).
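The 7:3 stratified split can be sketched as follows. This is a minimal illustration, assuming per-class shuffling with a fixed seed and rounding of each class’s training share; the study does not report its seed or exact procedure, and the function name `stratified_split` is hypothetical.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, test_idx) while preserving class proportions.

    Sketch only: the seed and the per-class rounding rule are assumptions,
    not the study's reported settings.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # random order within each class
        n_train = round(len(idxs) * train_frac)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)
```

With the Task 1 class counts (124 grade 4 astrocytomas and 196 GBMs), this rounding rule gives 224 training and 96 testing cases, the sizes listed in Table 1.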
Inclusion criteria: (1) histological diagnosis of diffuse glioma; (2) no radiotherapy, chemotherapy, or surgery before the MRI examination; (3) no history of craniocerebral surgery or other systemic malignancies; (4) available molecular information, including IDH, 1p/19q, O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation (MGMTmet), CDKN2A/B, EGFR, TERT, and chromosome 7/10 status; and (5) for TCGA cases, follow-up of more than two years.
The exclusion criteria were: (1) 2021 CNS WHO grade 2–3 astrocytoma; (2) Oligodendroglioma; (3) Incomplete or poor-quality MRI images; (4) Incomplete molecular information. A flowchart of patient selection and machine learning classification is shown in Fig. 1.
Fig. 1.
Patient flow and machine learning-based classification chart. CNS: Central Nervous System; WHO: World Health Organization; IDH-mut: Isocitrate dehydrogenase-mutant; IDH-wt: Isocitrate dehydrogenase wild-type; TERTpMUT: telomerase reverse transcriptase promoter mutation; EGFRAMP: epidermal growth factor receptor amplification; GBM: glioblastoma; mGBM: molecular glioblastoma; TCGA: The Cancer Genome Atlas
Clinical-radiological characteristics collection
Clinical-radiological characteristics included gender, age, MGMTmet, treatment, Karnofsky Performance Status (KPS) score, tumor number, tumor margin, intratumoral hemorrhage, intratumoral necrosis, peritumoral edema, maximum diameter, midline shift, enhancement pattern, enhancement quality, tumor crosses midline (TCM), edema crosses midline (ECM), cortical involvement, deep white matter invasion, pial invasion, and ependymal invasion.
Molecular biomarker detection
IDH mutation, CDKN2A/B homozygous deletion, and TERT promoter mutation status were detected by Sanger sequencing (ABI 3500, Thermo Fisher Scientific, Waltham, MA, USA). EGFR amplification, chromosome 7/10 alterations, and 1p/19q deletion were determined by fluorescence in situ hybridization (FISH). Bisulfite modification of DNA extracted from glioma tissue was performed with the BisulFlash™ DNA Modification Kit (Epigentek, Farmingdale, NY, USA), and MGMTmet status was assessed by PCR amplification with the DRR006 Kit (Takara, Kusatsu, Shiga, Japan).
MRI image acquisition and preprocessing
MR images were acquired on 3T scanners (Signa HDxt, GE Healthcare, USA; Skyra, Siemens Healthineers, Germany). The scanning sequences comprised axial T2-weighted fluid-attenuated inversion recovery (T2WI-FLAIR) and contrast-enhanced T1WI (CE-T1WI). Scanning parameters were: T2WI-FLAIR, TR 6800–8000 ms, TE 80–95 ms, TI 2000 ms; CE-T1WI, TR 195–240 ms, TE 4.8–8.6 ms; slice thickness 5.0 mm, slice gap 1.5 mm, FOV 240 mm × 240 mm, matrix 256 × 256. Contrast-enhanced sequences used an intravenous gadolinium-based contrast agent (0.1 mmol/kg).
Images were preprocessed as follows: signal intensity was normalized to a scale of 100, N4ITK bias field correction was applied, images were resampled to a voxel size of 1 mm × 1 mm × 1 mm, and voxel intensities were discretized with a fixed bin width of 5. Preprocessing used FeAture Explorer V.0.5.7 (FAE, https://github.com/salan668/FAE) and 3D Slicer version 5.7.20240325 (https://www.slicer.org).
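The intensity normalization and fixed-bin-width discretization can be sketched as below, assuming PyRadiomics-style definitions: a z-score multiplied by a scale factor, and bin index floor(x/W) - floor(min/W) + 1 for bin width W. FAE’s internal implementation is not described in the text, so this is an approximation, and the function names are hypothetical.

```python
import math
import statistics


def normalize(values, scale=100.0):
    """Z-score intensities, then multiply by `scale`.

    Assumed to correspond to 'a scale of 100' (PyRadiomics-style normalizeScale).
    """
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [scale * (v - mu) / sd for v in values]


def discretize_fixed_bin_width(values, bin_width=5.0):
    """Map intensities to 1-based gray-level bins of fixed width."""
    lo = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lo + 1 for v in values]
```

Texture matrices such as GLCM and GLRLM are then computed on these integer bin indices rather than on raw intensities.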
Image segmentation, feature extraction and selection
A rigid registration algorithm was used to register T2WI-FLAIR to CE-T1WI images. A radiologist with 8 years of experience (Radiologist A) manually delineated the tumor and edema slice by slice on CE-T1WI and T2-FLAIR, respectively. The edema volume of interest (VOI) was then copied onto CE-T1WI. Three VOIs were thus obtained for each patient: CE-T1WI tumor (T1C tumor), CE-T1WI edema (T1C edema), and T2WI-FLAIR tumor (T2F tumor). One month later, thirty randomly selected patients were segmented again by an independent radiologist with 5 years of experience (Radiologist B) to calculate the intraclass correlation coefficient (ICC) for inter-observer agreement. Both radiologists were blinded to molecular status and clinical outcomes.
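The text does not specify which ICC form was used. The sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, as defined by Shrout and Fleiss), computed from two-way ANOVA mean squares for one feature measured by the two radiologists. The function name `icc_2_1` is illustrative.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per subject, each row containing the k raters'
    values for a single radiomic feature.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Under the study’s threshold, a feature would be kept if this value is at least 0.75. Because absolute agreement is assumed, a constant offset between raters lowers the ICC even when their rankings agree perfectly.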
The extracted features encompassed seven categories: first-order, shape, Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), Gray Level Dependence Matrix (GLDM), and Neighborhood Gray Tone Difference Matrix (NGTDM) features. Feature selection was performed on the training set. Features with ICC ≥ 0.75 were retained and z-score normalized. Normality was assessed with the Shapiro-Wilk test, and features with p < 0.05 on an independent-samples t-test or Mann-Whitney U test (chosen according to the distribution) were retained. Finally, the least absolute shrinkage and selection operator (LASSO), with its regularization parameter selected by ten-fold cross-validation, was applied to obtain a robust final feature set.
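The final LASSO step can be illustrated with cyclic coordinate descent and soft-thresholding. This sketch assumes a squared-error LASSO on the 0/1 class label with z-scored features and a centered outcome (the study may instead have used a logistic LASSO). It omits the ten-fold cross-validation loop that would choose `alpha`, and the function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200, tol=1e-10):
    """Minimize (1/2n)||y - Xb||^2 + alpha*||b||_1 by coordinate descent.

    Assumes the columns of X and the outcome y are already centered
    (e.g. z-scored), so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

Features whose coefficients are exactly zero at the cross-validated `alpha` would be discarded. Larger `alpha` values zero out more features.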
Image segmentation utilized ITK-SNAP software (http://www.itksnap.org/, version 4.0.0). Features were extracted using FAE software. Region of interest (ROI) segmentation is shown in Fig. 2. The radiomics workflow is shown in Fig. 3.
Fig. 2.
ROI segmentation of T1C tumor, T1C edema and T2F tumor. T1C: contrast-enhanced T1-weighted imaging; T2F: T2-weighted imaging fluid attenuated inversion recovery
Fig. 3.
The workflow of radiomics analysis, comprising ROI segmentation, feature extraction, feature selection, ML model construction, and model evaluation. GLCM: Gray Level Co-occurrence Matrix; GLRLM: Gray Level Run Length Matrix; GLSZM: Gray Level Size Zone Matrix; GLDM: Gray Level Dependence Matrix; NGTDM: Neighborhood Gray Tone Difference Matrix; ICC: intraclass correlation coefficient; M-W: Mann-Whitney test; LASSO: least absolute shrinkage and selection operator; XGBoost: extreme gradient boosting; ML: machine learning.
Construction of the ML model
Extreme gradient boosting (XGBoost), an ensemble learning algorithm based on gradient-boosted decision trees, was used to build the ML models for both tasks. ML models for single sequences and for combined sequences were constructed from the features selected for each sequence and from their combination. The ML model with the best predictive performance in the validation set was selected as the optimal ML model, and the Rad-score was calculated from it.
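For reference, the second-order objective that XGBoost optimizes when adding the t-th tree is shown below for the logistic loss used in binary classification, together with the closed-form optimal leaf weight and split gain. This follows the original XGBoost formulation; the regularization hyperparameters λ and γ used in this study are not reported.

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\Big[g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t(\mathbf{x}_i)^2\Big] + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,
\qquad p_i = \sigma\big(\hat{y}_i^{(t-1)}\big),\quad g_i = p_i - y_i,\quad h_i = p_i(1 - p_i)

w_j^{*} = -\frac{G_j}{H_j + \lambda},\qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i

\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
```

Here T is the number of leaves, I_j is the set of samples falling in leaf j, and L and R denote the candidate child nodes. A split is worthwhile only if its gain is positive, so γ acts as a minimum loss reduction per split.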
Construction of the clinical model
Clinical-radiological characteristics were screened by univariate logistic regression (LR) analysis, and variables with p < 0.05 were entered into multivariate LR analysis. The variables that remained significant in the multivariate analysis were used to build the clinical model with XGBoost.
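The univariate screening step can be sketched as a one-predictor logistic regression fitted by Newton-Raphson. The sketch reports the odds ratio with a Wald 95% confidence interval and a Wald p-value. It is an illustrative stand-in (the study used R, and its exact routine is not reported), and the intervals in Table 2 may have been computed with a different method, such as profile likelihood. The function name `univariate_logistic` is hypothetical.

```python
import math


def univariate_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson.

    Returns (odds_ratio, ci_low, ci_high, wald_p) for the slope b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # score for the intercept
            g1 += (yi - p) * xi    # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * score, where I is the Fisher information
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se = math.sqrt(h00 / det)  # var(b1) is the [1, 1] element of I^{-1}
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se), p_value
```

In use, each candidate characteristic would be screened in turn, and those with p < 0.05 would be passed to the multivariate model, which is not shown here.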
Construction of the combined model and nomogram
A combined model was constructed with XGBoost from the Rad-score and the independent clinical-radiological risk factors. A nomogram was generated using LR to visually discriminate between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021).
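A nomogram maps each predictor to points on a common scale and converts the total points to a probability through the logistic model. The sketch below follows a common convention, used for example by R’s rms package, in which the predictor with the widest contribution range spans 0 to 100 points. The fitted coefficients of this study’s nomogram are not reported, so the values in the usage test are hypothetical, as are the function names.

```python
import math


def nomogram_points(coefs, ranges, x):
    """Points earned by one patient on a logistic-regression nomogram.

    coefs: {name: beta}; ranges: {name: (min, max)}; x: {name: value}.
    The predictor with the largest |beta| * (max - min) spans 0-100 points.
    """
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    max_span = max(spans.values())
    points = {}
    for k, b in coefs.items():
        lo, hi = ranges[k]
        ref = lo if b >= 0 else hi  # predictor value that earns 0 points
        points[k] = 100.0 * b * (x[k] - ref) / max_span
    return points


def nomogram_probability(intercept, coefs, x):
    """Predicted probability from the logistic linear predictor."""
    lp = intercept + sum(b * x[k] for k, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-lp))
```

Because total points are a linear rescaling of the linear predictor, the probability axis at the bottom of a nomogram is simply this logistic transform read off against the total-points axis.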
Model evaluation and model comparison
The performance of each model in predicting WHO grade 4 glioma molecular subtypes was evaluated with receiver operating characteristic (ROC) curves, from which the area under the curve (AUC), sensitivity (SEN), specificity (SPE), accuracy (ACC), positive predictive value (PPV), and negative predictive value (NPV) were derived. DeLong’s test compared the AUCs of different models, with statistical significance set at p < 0.05. Calibration curves and decision curves were used to evaluate model calibration and clinical utility, respectively.
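The threshold-dependent metrics in Table 3 follow from a confusion matrix at each model’s cut-off. The sketch below computes the AUC via its Mann-Whitney formulation and chooses the cut-off by maximizing Youden’s index (SEN + SPE - 1). The Youden criterion and the convention that label 1 is the positive class are assumptions, since the text does not state how cut-offs were chosen; the function names are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def confusion_metrics(scores, labels, cutoff):
    """SEN, SPE, ACC, PPV, NPV when score >= cutoff is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "SEN": ratio(tp, tp + fn),
        "SPE": ratio(tn, tn + fp),
        "ACC": (tp + tn) / len(pairs),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


def youden_cutoff(scores, labels):
    """Observed score that maximizes SEN + SPE - 1 (assumed criterion)."""
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        m = confusion_metrics(scores, labels, c)
        j = m["SEN"] + m["SPE"] - 1.0
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut
```

Fixing the cut-off on the training set and reusing it unchanged in the testing and validation sets avoids optimistic estimates of SEN and SPE.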
Six other ML algorithms were used to evaluate the generalization and stability of the combined model: LR, support vector machine (SVM), multi-layer perceptron (MLP), linear discriminant analysis (LDA), random forest (RF), and naive Bayes (NB).
Survival analysis
To evaluate the prognostic value of the molecular subtype prediction model, Kaplan-Meier (KM) survival analysis and the log-rank test were performed on the groups predicted by the combined model. A Z-test was used to compare the prognostic value of the combined model with that of the molecular subtype.
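A minimal product-limit (Kaplan-Meier) estimator and a two-group log-rank test are sketched below. The two groups are assumed to be the classes predicted by the combined model. This sketch is illustrative only: the study used R, the Z-test comparison with molecular subtype is not reproduced, and the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1=death, 0=censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1). Returns (chi2, p), 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = var = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p
```

Censored patients contribute to the risk set until their last follow-up but never count as events, which is why both functions track the number at risk explicitly.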
Statistical analysis
Statistical analyses were performed with R version 4.2.3 (http://www.R-project.org). Numerical variables are expressed as mean ± standard deviation; normality was assessed with the Shapiro-Wilk test, and the independent-samples t-test or Mann-Whitney U test was used according to the distribution. Categorical variables are expressed as frequencies (percentages) and were compared with Pearson’s chi-square test or Fisher’s exact test. p < 0.05 was considered statistically significant.
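For the 2 × 2 comparisons in Table 1, Pearson’s chi-square statistic has a closed form. The sketch below includes an optional Yates continuity correction, which R’s `chisq.test` applies to 2 × 2 tables by default; whether the reported p-values used it is not stated. Fisher’s exact test, used when expected counts are small, is not shown, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (1 df).

    With yates=True, |ad - bc| is reduced by n/2 (but not below zero).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p
```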
Results
Clinical-radiological baseline characteristics
Baseline characteristics are detailed in Table 1. Among the 320 patients classified according to the 2021 WHO CNS tumor classification, there were 196 GBMs, 41 IDH-mut grade 4 astrocytomas, and 83 IDH-wt grade 4 astrocytomas. The training and testing sets included 224/96 cases for Task 1 (grade 4 vs. GBM) and 87/37 cases for Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4). The validation set (n = 99) consisted of 76 GBMs, 3 IDH-mut grade 4 astrocytomas, and 20 IDH-wt grade 4 astrocytomas. The diagnostic reclassification scheme is detailed in Supplementary Table 1.
Table 1.
Clinical-radiological baseline characteristics between training, testing, and validation sets of the two tasks
| Variables | Task 1 training set (n = 224) | Task 1 testing set (n = 96) | P-value | Task 1 validation set (n = 99) | P-value | Task 2 training set (n = 87) | Task 2 testing set (n = 37) | P-value | Task 2 validation set (n = 23) | P-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender | | | 0.712 | | 0.182 | | | 0.554 | | 0.355 |
| Male | 130 (58.04%) | 53 (55.21%) | | 49 (49.49%) | | 48 (55.17%) | 23 (62.16%) | | 10 (43.48%) | |
| Female | 94 (41.96%) | 43 (44.79%) | | 50 (50.51%) | | 39 (44.83%) | 14 (37.84%) | | 13 (56.52%) | |
| Age (years) | 53.76 ± 13.48 | 52.23 ± 13.62 | 0.201 | 58.51 ± 12.32 | 0.011 | 49.26 ± 14.71 | 52.11 ± 14.21 | 0.216 | 55.17 ± 10.29 | 0.031 |
| MGMTmet | | | 0.027 | | 0.051 | | | 0.090 | | 0.302 |
| Yes | 136 (60.71%) | 45 (46.88%) | | 48 (48.48%) | | 64 (73.56%) | 21 (56.76%) | | 14 (60.87%) | |
| No | 88 (39.29%) | 51 (53.12%) | | 51 (51.52%) | | 23 (26.44%) | 16 (43.24%) | | 9 (39.13%) | |
| Treatment | | | 1.000 | | <0.001 | | | 0.440 | | <0.001 |
| Surgery | 88 (39.29%) | 38 (39.58%) | | 6 (6.06%) | | 38 (43.68%) | 19 (51.35%) | | 1 (4.35%) | |
| Combination therapy | 136 (60.71%) | 58 (60.42%) | | 93 (93.94%) | | 49 (56.32%) | 18 (48.65%) | | 22 (95.65%) | |
| KPS | 78.64 ± 12.08 | 79.23 ± 11.46 | 0.913 | 77.98 ± 11.86 | 0.426 | 78.46 ± 12.11 | 79.78 ± 6.68 | 0.884 | 80.43 ± 12.24 | 0.114 |
| Tumor number | | | 0.625 | | 0.630 | | | 0.305 | | 0.621 |
| Single | 118 (52.68%) | 54 (56.25%) | | 49 (49.49%) | | 59 (67.82%) | 21 (56.76%) | | 14 (60.87%) | |
| Multiple | 106 (47.32%) | 42 (43.75%) | | 50 (50.51%) | | 28 (32.18%) | 16 (43.24%) | | 9 (39.13%) | |
| Tumor margin | | | 0.713 | | 0.463 | | | 0.073 | | 0.806 |
| Clear | 97 (43.30%) | 44 (45.83%) | | 38 (38.38%) | | 31 (35.63%) | 20 (54.05%) | | 7 (30.43%) | |
| Non-clear | 127 (56.70%) | 52 (54.17%) | | 61 (61.62%) | | 56 (64.37%) | 17 (45.95%) | | 16 (69.57%) | |
| Intratumoral hemorrhage | | | 1.000 | | <0.001 | | | 0.617 | | 1.000 |
| Yes | 54 (24.11%) | 23 (23.96%) | | 57 (57.58%) | | 15 (17.24%) | 8 (21.62%) | | 4 (17.39%) | |
| No | 170 (75.89%) | 73 (76.04%) | | 42 (42.42%) | | 72 (82.76%) | 29 (78.38%) | | 19 (82.61%) | |
| Intratumoral necrosis | | | 0.640 | | 0.273 | | | 0.269 | | 0.137 |
| Yes | 180 (80.36%) | 80 (83.33%) | | 85 (85.86%) | | 61 (70.11%) | 30 (81.08%) | | 12 (52.17%) | |
| No | 44 (19.64%) | 16 (16.67%) | | 14 (14.14%) | | 26 (29.89%) | 7 (18.92%) | | 11 (47.83%) | |
| Peritumoral edema | | | 0.243 | | 0.021 | | | 0.033 | | 0.116 |
| Yes | 212 (94.64%) | 94 (97.92%) | | 99 (100.00%) | | 76 (87.36%) | 37 (100.00%) | | 23 (100.00%) | |
| No | 12 (5.36%) | 2 (2.08%) | | 0 (0.00%) | | 11 (12.64%) | 0 (0.00%) | | 0 (0.00%) | |
| Maximum diameter | 4.73 ± 1.82 | 4.74 ± 1.96 | 0.840 | 4.50 ± 1.64 | 0.354 | 4.66 ± 1.94 | 4.64 ± 1.52 | 0.960 | 3.86 ± 1.85 | 0.070 |
| Midline shift | | | 0.456 | | 0.116 | | | 0.433 | | 0.058 |
| Yes | 111 (47.45%) | 48 (50.00%) | | 39 (39.39%) | | 44 (50.57%) | 22 (59.46%) | | 6 (26.09%) | |
| No | 113 (52.55%) | 48 (50.00%) | | 60 (60.61%) | | 43 (49.43%) | 15 (40.54%) | | 17 (73.91%) | |
| Enhancement pattern | | | 0.316 | | <0.001 | | | 0.392 | | 0.008 |
| No enhancement | 32 (14.29%) | 10 (10.42%) | | 6 (6.06%) | | 24 (27.59%) | 6 (16.22%) | | 6 (26.09%) | |
| Annular enhancement | 103 (45.98%) | 41 (42.71%) | | 58 (58.59%) | | 36 (41.38%) | 15 (40.54%) | | 5 (21.74%) | |
| Nodular enhancement | 30 (13.39%) | 10 (10.42%) | | 23 (23.23%) | | 13 (14.94%) | 6 (16.22%) | | 11 (47.83%) | |
| Mixed enhancement | 59 (26.34%) | 35 (36.46%) | | 12 (12.12%) | | 14 (16.09%) | 10 (27.03%) | | 1 (4.35%) | |
| Enhancement quality | | | 0.357 | | 0.057 | | | 0.108 | | 1.000 |
| No enhancement | 31 (13.84%) | 9 (9.38%) | | 6 (6.06%) | | 24 (27.59%) | 5 (13.51%) | | 6 (26.09%) | |
| Enhancement | 193 (86.16%) | 87 (90.62%) | | 93 (93.94%) | | 63 (72.41%) | 32 (86.49%) | | 17 (73.91%) | |
| TCM | | | 0.537 | | 0.010 | | | 1.000 | | 0.069 |
| Yes | 41 (18.30%) | 21 (21.88%) | | 7 (7.07%) | | 20 (22.99%) | 8 (21.62%) | | 1 (4.35%) | |
| No | 183 (81.70%) | 75 (78.12%) | | 92 (92.93%) | | 67 (77.01%) | 29 (78.38%) | | 22 (95.65%) | |
| ECM | | | 0.063 | | 0.652 | | | 0.384 | | 0.297 |
| Yes | 47 (20.98%) | 30 (31.25%) | | 18 (18.18%) | | 27 (31.03%) | 8 (21.62%) | | 4 (17.39%) | |
| No | 177 (79.02%) | 66 (68.75%) | | 81 (81.82%) | | 60 (68.97%) | 29 (78.38%) | | 19 (82.61%) | |
| Cortical involvement | | | 1.000 | | <0.001 | | | 1.000 | | 0.002 |
| Yes | 158 (70.54%) | 68 (70.83%) | | 97 (97.98%) | | 62 (71.26%) | 26 (70.27%) | | 23 (100.00%) | |
| No | 66 (29.46%) | 28 (29.17%) | | 2 (2.02%) | | 25 (28.74%) | 11 (29.73%) | | 0 (0.00%) | |
| Deep white matter invasion | | | 0.788 | | <0.001 | | | 1.000 | | <0.001 |
| Yes | 162 (72.32%) | 68 (70.83%) | | 35 (35.35%) | | 64 (73.56%) | 27 (72.97%) | | 5 (21.74%) | |
| No | 62 (27.68%) | 28 (29.17%) | | 64 (64.65%) | | 23 (26.44%) | 10 (27.03%) | | 18 (78.26%) | |
| Pial invasion | | | 0.710 | | <0.001 | | | 0.169 | | <0.001 |
| Yes | 92 (41.07%) | 37 (38.54%) | | 96 (96.97%) | | 46 (52.87%) | 14 (37.84%) | | 22 (95.65%) | |
| No | 132 (58.93%) | 59 (61.46%) | | 3 (3.03%) | | 41 (47.13%) | 23 (62.16%) | | 1 (4.35%) | |
| Ependymal invasion | | | 1.000 | | 0.780 | | | 0.193 | | 0.034 |
| Yes | 56 (25.00%) | 24 (25.00%) | | 23 (23.23%) | | 27 (31.03%) | 7 (18.92%) | | 2 (8.70%) | |
| No | 168 (75.00%) | 72 (75.00%) | | 76 (76.77%) | | 60 (68.97%) | 30 (81.08%) | | 21 (91.30%) | |
GBM glioblastoma, IDH-mut isocitrate dehydrogenase-mutant, IDH-wt isocitrate dehydrogenase wild-type, MGMTmet O6-methylguanine-DNA methyltransferase promoter methylation, KPS Karnofsky Performance Status score, TCM tumor crosses midline, ECM edema crosses midline
Construction of the clinical model
Univariate and multivariate LR results are presented in Table 2. For Task 1 (grade 4 vs. GBM), age (p = 0.014) and MGMTmet (p = 0.028) were significant predictors. For Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), ECM (p = 0.005) and deep white matter invasion (p = 0.003) showed significance.
Table 2.
Univariate and multivariate logistic regression analysis in the training sets of the two tasks
| Variables | Task 1 univariate OR (95% CI) | P | Task 1 multivariate OR (95% CI) | P | Task 2 univariate OR (95% CI) | P | Task 2 multivariate OR (95% CI) | P |
|---|---|---|---|---|---|---|---|---|
| Gender | 1.103 (0.638–1.918) | 0.727 | | | 1.120 (0.458–2.729) | 0.802 | | |
| Age (years) | 1.034 (1.013–1.056) | 0.002 | 1.028 (1.006–1.052) | 0.014 | 0.979 (0.948–1.009) | 0.170 | | |
| MGMTmet | 0.564 (0.316–0.991) | 0.049 | 0.503 (0.269–0.918) | 0.028 | 1.700 (0.612–5.247) | 0.326 | | |
| Treatment | 1.487 (0.856–2.585) | 0.159 | | | 0.831 (0.340–2.035) | 0.684 | | |
| KPS | 0.987 (0.963–1.010) | 0.271 | | | 1.003 (0.967–1.044) | 0.863 | | |
| Tumor number | 1.557 (0.903–2.706) | 0.113 | | | 0.673 (0.244–1.745) | 0.426 | | |
| Tumor margin | 0.971 (0.561–1.676) | 0.917 | | | 1.167 (0.465–3.032) | 0.745 | | |
| Intratumoral hemorrhage | 1.776 (0.925–3.555) | 0.093 | | | 0.940 (0.268–2.959) | 0.918 | | |
| Intratumoral necrosis | 1.696 (0.867–3.308) | 0.120 | | | 1.643 (0.617–4.752) | 0.335 | | |
| Peritumoral edema | 5.480 (1.581–25.261) | 0.013 | 1.960 (0.424–10.969) | 0.405 | 2.625 (0.621–18.028) | 0.238 | | |
| Maximum diameter | 1.062 (0.915–1.238) | 0.432 | | | 0.888 (0.699–1.118) | 0.316 | | |
| Midline shift | 0.972 (0.565–1.671) | 0.918 | | | 1.184 (0.488–2.895) | 0.709 | | |
| Enhancement pattern | 1.465 (1.117–1.944) | 0.006 | 1.085 (0.765–1.552) | 0.649 | 1.492 (0.966–2.348) | 0.075 | | |
| Enhancement quality | 4.333 (1.97–10.127) | <0.001 | 2.568 (0.832–8.27) | 0.105 | 1.396 (0.518–4.067) | 0.521 | | |
| TCM | 1.387 (0.692–2.752) | 0.350 | | | 0.733 (0.264–2.113) | 0.555 | | |
| ECM | 1.462 (0.756–2.804) | 0.254 | | | 0.338 (0.129–0.866) | 0.025 | 0.211 (0.067–0.602) | 0.005 |
| Cortical involvement | 0.892 (0.496–1.621) | 0.705 | | | 1.400 (0.526–3.651) | 0.493 | | |
| Deep white matter invasion | 1.024 (0.562–1.894) | 0.939 | | | 3.594 (1.344–9.956) | 0.012 | 5.688 (1.912–18.609) | 0.003 |
| Pial invasion | 1.423 (0.822–2.465) | 0.208 | | | 2.217 (0.909–5.575) | 0.084 | | |
| Ependymal invasion | 1.350 (0.725–2.496) | 0.340 | | | 2.333 (0.857–7.121) | 0.112 | | |
GBM glioblastoma, IDH-mut isocitrate dehydrogenase-mutant, IDH-wt isocitrate dehydrogenase wild-type, OR odds ratio, CI confidence interval, MGMTmet O6-methylguanine-DNA methyltransferase promoter methylation, KPS Karnofsky Performance Status score, TCM tumor crosses midline, ECM edema crosses midline
The clinical models achieved AUCs of 0.671 and 0.619 in the training sets, 0.656 and 0.605 in the testing sets, and 0.543 and 0.400 in the validation sets for Task 1 and Task 2, respectively (Fig. 4A-C, G-I, and Table 3).
Fig. 4.
ROC curves of the ML models, clinical models, and combined models in the training (A, G), testing (B, H), and validation sets (C, I) for the two tasks. Heat maps of DeLong’s test p-values for the training (D, J), testing (E, K), and validation sets (F, L) for the two tasks. ROC: receiver operating characteristic; ML: machine learning; T1C: contrast-enhanced T1-weighted imaging; T2F: T2-weighted imaging fluid attenuated inversion recovery.
Table 3.
The diagnostic performance of clinical model, ML model, and combined model in the training, testing, and validation sets of the two tasks
| Model | Set | Task 1 AUC (95% CI) | Cut-off | SEN | SPE | ACC | PPV | NPV | Task 2 AUC (95% CI) | Cut-off | SEN | SPE | ACC | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T1C Edema | Training | 0.843 (0.787–0.892) | 0.667 | 0.707 | 0.833 | 0.754 | 0.876 | 0.631 | 0.837 (0.748–0.911) | 0.385 | 0.767 | 0.772 | 0.770 | 0.639 | 0.863 |
| | Testing | 0.829 (0.734–0.911) | 0.505 | 0.875 | 0.750 | 0.823 | 0.831 | 0.811 | 0.787 (0.615–0.930) | 0.402 | 0.636 | 0.846 | 0.784 | 0.636 | 0.846 |
| | Validation | 0.715 (0.574–0.839) | 0.598 | 0.763 | 0.609 | 0.727 | 0.866 | 0.438 | 0.717 (0.500–1.000) | 0.304 | 0.677 | 1.000 | 0.435 | 0.048 | 0.000 |
| T1C Tumor | Training | 0.825 (0.768–0.877) | 0.681 | 0.700 | 0.786 | 0.732 | 0.845 | 0.611 | 0.735 (0.624–0.838) | 0.382 | 0.800 | 0.614 | 0.678 | 0.522 | 0.854 |
| | Testing | 0.722 (0.608–0.830) | 0.584 | 0.821 | 0.625 | 0.740 | 0.754 | 0.714 | 0.750 (0.535–0.927) | 0.373 | 0.818 | 0.654 | 0.703 | 0.500 | 0.895 |
| | Validation | 0.786 (0.646–0.915) | 0.533 | 0.842 | 0.739 | 0.818 | 0.914 | 0.586 | 0.700 (0.533–1.000) | 0.367 | 0.667 | 1.000 | 0.957 | 1.000 | 0.952 |
| T2F Tumor | Training | 0.777 (0.711–0.838) | 0.558 | 0.886 | 0.571 | 0.768 | 0.775 | 0.750 | 0.719 (0.601–0.825) | 0.394 | 0.733 | 0.632 | 0.667 | 0.512 | 0.818 |
| | Testing | 0.629 (0.491–0.741) | 0.549 | 0.821 | 0.425 | 0.656 | 0.667 | 0.630 | 0.650 (0.432–0.818) | 0.370 | 0.909 | 0.423 | 0.568 | 0.400 | 0.917 |
| | Validation | 0.642 (0.529–0.751) | 0.608 | 0.579 | 0.783 | 0.626 | 0.898 | 0.360 | 0.750 (0.450–0.967) | 0.413 | 0.667 | 0.700 | 0.696 | 0.250 | 0.933 |
| Optimal ML | Training | 0.902 (0.859–0.938) | 0.692 | 0.707 | 0.940 | 0.795 | 0.952 | 0.658 | 0.904 (0.838–0.957) | 0.373 | 0.933 | 0.772 | 0.828 | 0.683 | 0.957 |
| | Testing | 0.854 (0.771–0.925) | 0.678 | 0.732 | 0.850 | 0.781 | 0.872 | 0.694 | 0.899 (0.783–0.986) | 0.378 | 0.909 | 0.808 | 0.838 | 0.667 | 0.955 |
| | Validation | 0.830 (0.728–0.923) | 0.596 | 0.698 | 0.913 | 0.748 | 0.964 | 0.477 | 0.783 (0.367–1.000) | 0.407 | 0.667 | 1.000 | 0.957 | 1.000 | 0.952 |
| Clinical | Training | 0.671 (0.596–0.743) | 0.665 | 0.693 | 0.583 | 0.652 | 0.735 | 0.533 | 0.619 (0.519–0.720) | 0.400 | 0.467 | 0.772 | 0.667 | 0.519 | 0.733 |
| | Testing | 0.656 (0.492–0.766) | 0.680 | 0.357 | 0.875 | 0.573 | 0.800 | 0.493 | 0.605 (0.449–0.767) | 0.400 | 0.364 | 0.846 | 0.703 | 0.500 | 0.759 |
| | Validation | 0.543 (0.451–0.697) | 0.520 | 0.855 | 0.217 | 0.778 | 0.800 | 0.556 | 0.400 (0.300–0.475) | - | 1.000 | 0.000 | 0.130 | 0.130 | - |
| Combined | Training | 0.907 (0.866–0.943) | 0.499 | 0.893 | 0.762 | 0.844 | 0.862 | 0.810 | 0.899 (0.832–0.951) | 0.377 | 0.933 | 0.772 | 0.828 | 0.683 | 0.957 |
| | Testing | 0.852 (0.767–0.925) | 0.670 | 0.768 | 0.825 | 0.792 | 0.860 | 0.717 | 0.895 (0.787–0.979) | 0.457 | 0.909 | 0.808 | 0.838 | 0.667 | 0.955 |
| | Validation | 0.832 (0.729–0.925) | 0.564 | 0.711 | 0.913 | 0.758 | 0.964 | 0.488 | 0.792 (0.400–1.000) | 0.387 | 0.667 | 1.000 | 0.957 | 1.000 | 0.952 |
GBM glioblastoma, IDH-mut Isocitrate dehydrogenase-mutant, IDH-wt Isocitrate dehydrogenase wild-type, AUC area under curve, CI confidence interval, SEN sensitivity, SPE specificity, ACC accuracy, PPV Positive Predictive Value, NPV Negative Predictive Value, T1C contrast-enhanced T1-weighted imaging, T2F T2-weighted imaging fluid attenuated inversion recovery, ML machine learning
Construction of the ML model
A total of 1688 radiomics features were extracted. After feature selection and reduction, 11 T1C edema, 14 T1C tumor, and 10 T2F tumor features were retained for Task 1, and 9 T1C edema, 6 T1C tumor, and 3 T2F tumor features were retained for Task 2. These features were pooled within each task, yielding final sets of 35 and 18 features for the combined-sequence ML models of the two tasks.
The combined-sequence ML models achieved the highest validation-set AUCs in both Task 1 (grade 4 vs. GBM; training set = 0.902, testing set = 0.854, validation set = 0.830) and Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4; training set = 0.904, testing set = 0.899, validation set = 0.783). They were therefore selected as the optimal ML models (Fig. 4A-C, G-I, and Table 3), and the Rad-scores were calculated from them.
Construction of the combined model and nomogram
In Task 1 (grade 4 vs. GBM), the combined model had the highest AUC in the training (0.907) and validation sets (0.832) but was slightly below the optimal ML model in the testing set (0.852 vs. 0.854). In Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), the combined model had lower AUCs than the optimal ML model in the training (0.899 vs. 0.904) and testing sets (0.895 vs. 0.899) but attained the highest AUC in the validation set (0.792), as shown in Fig. 4A-C and G-I.
The nomogram based on the combined model further facilitates individualized discrimination between grade 4 astrocytoma and GBM (Fig. 5A). Representative MR images and the corresponding case nomograms for one patient with each diagnosis are shown in Fig. 5B-E.
Fig. 5.
A Nomogram of the combined model for Task 1 (grade 4 vs. GBM), incorporating the Rad-score and independent clinical-radiological risk factors. B-E Representative MR images and case nomograms for patients with grade 4 astrocytoma and GBM. B CE-T1WI and T2WI-FLAIR images of a 50-year-old male patient with grade 4 astrocytoma. C Rad-score = 1.36, giving 172 total points on the nomogram and a predicted probability of 0.141. D CE-T1WI and T2WI-FLAIR images of a 53-year-old male patient with GBM. E Rad-score = 8.53, giving 268 total points on the nomogram and a predicted probability of 0.937.
Model evaluation and comparison
DeLong’s test showed that both the optimal ML model and the combined model had higher AUCs than the clinical model in both tasks. For Task 1 (grade 4 vs. GBM), the differences were statistically significant in the training, testing, and validation sets (all p < 0.001). For Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), the optimal ML model and the combined model significantly outperformed the clinical model in the training set (both p < 0.001) and testing set (both p < 0.01). In the validation set, the clinical model’s AUC was substantially lower than those of the optimal ML model (difference 0.383, p = 0.089) and the combined model (difference 0.392, p = 0.063), although these differences did not reach statistical significance. The AUC differences between the combined model and the optimal ML model were not statistically significant in the training set (Task 1: p = 0.428; Task 2: p = 0.716), testing set (Task 1: p = 0.887; Task 2: p = 0.856), or validation set (Task 1: p = 0.889; Task 2: p = 0.842). DeLong’s test results are shown in Fig. 4D-F and J-L.
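DeLong’s test compares two correlated AUCs measured on the same patients using the structural components (placement values) of each AUC. The following is a compact O(mn) sketch of the DeLong et al. (1988) covariance estimator. It is not the implementation used in the study; library implementations such as `roc.test` in the R package pROC use faster algorithms. The helper names are hypothetical.

```python
import math


def _placements(pos, neg):
    """Structural components: V10 per positive case, V01 per negative case."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """DeLong test for two AUCs on the same cases.

    Returns (auc_a, auc_b, z, two_sided_p).
    """
    idx_pos = [i for i, y in enumerate(labels) if y == 1]
    idx_neg = [i for i, y in enumerate(labels) if y == 0]
    (v10a, v01a), (v10b, v01b) = (
        _placements([s[i] for i in idx_pos], [s[i] for i in idx_neg])
        for s in (scores_a, scores_b)
    )
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    m, n = len(idx_pos), len(idx_neg)
    # Var(AUC_a - AUC_b) from the covariance of the structural components
    var_diff = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
                + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return auc_a, auc_b, z, p
```

Because both AUCs are computed on the same patients, the covariance term typically reduces the variance of the difference, so a paired test like this can detect smaller AUC gaps than treating the two AUCs as independent would.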
The combined model demonstrated good calibration in all sets for Task 1 (grade 4 vs. GBM; Fig. 6A-C) and in the training (Fig. 6G) and testing (Fig. 6H) sets for Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), whereas calibration in the Task 2 validation set was only moderate (Fig. 6I). Decision curve analysis (DCA) indicated that the combined model and the optimal ML model provided greater net benefit than the clinical model in predicting glioma molecular subtypes. The threshold probability ranges for the combined model and the optimal ML model, respectively, were as follows. For Task 1: 0.12 to 0.92 and 0.02 to 0.96 (training set, Fig. 6D), 0.18 to 0.86 and 0.06 to 0.92 (testing set, Fig. 6E), and 0.36 to 0.98 and 0.10 to 0.98 (validation set, Fig. 6F). For Task 2: 0.06 to 0.70 and 0.02 to 0.98 (training set, Fig. 6J), 0.06 to 0.62 and 0.02 to 0.96 (testing set, Fig. 6K), and 0.08 to 0.98 and 0.10 to 0.98 (validation set, Fig. 6L).
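The net benefit plotted in a decision curve follows the standard DCA definition, NB(p_t) = TP/N − FP/N × p_t/(1 − p_t), which is compared against treating all patients or treating none. The sketch below, with hypothetical function names and toy data, shows how threshold ranges like those reported above could be read off as the thresholds at which a model's net benefit exceeds both reference strategies. The study's threshold grid and software are not stated, so the grid used here is an assumption.

```python
from typing import List, Sequence


def net_benefit(y_true: Sequence[int], prob: Sequence[float], pt: float) -> float:
    """Net benefit of treating every patient whose predicted probability >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, prob) if p >= pt and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * pt / (1 - pt)


def treat_all_benefit(y_true: Sequence[int], pt: float) -> float:
    """Net benefit of the treat-all strategy at threshold pt."""
    prev = sum(y_true) / len(y_true)
    return prev - (1 - prev) * pt / (1 - pt)


def useful_thresholds(y_true: Sequence[int], prob: Sequence[float],
                      grid: Sequence[float]) -> List[float]:
    """Thresholds at which the model beats both treat-all and treat-none (NB = 0)."""
    return [pt for pt in grid
            if net_benefit(y_true, prob, pt) > max(0.0, treat_all_benefit(y_true, pt))]
```

For example, `useful_thresholds([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1], [0.1, 0.3, 0.5, 0.7, 0.9])` returns the thresholds from 0.3 upward; at 0.1 the toy model only ties the treat-all strategy.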
Fig. 6.
Calibration curves of the combined model in the training (A, G), testing (B, H), and validation sets (C, I) for two tasks. Decision curve analysis (DCA) of the training (D, J), testing (E, K), and validation sets (F, L) of the clinical model, optimal ML model, and combined model for two tasks. ML: machine learning
Compared with the optimal ML model, the combined model offered limited net benefit; the optimal ML model provided greater net benefit across multiple threshold ranges.
We used six additional ML classifiers to evaluate the stability and reliability of the combined model. In the training sets of both tasks, the RF model performed best (AUC = 1.000 and 0.996), while NB and XGBoost performed slightly worse. The remaining models performed similarly, with AUCs ranging from 0.903 to 0.919 for Task 1 (grade 4 vs. GBM) and from 0.922 to 0.943 for Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4) (Fig. 7A, G, D, J). In the Task 1 testing set, the models showed comparable performance (AUC range: 0.851 to 0.882; Fig. 7B, H), and DeLong’s test showed a significant difference only between MLP and SVM (p = 0.04; all other p > 0.05; Fig. 7N). In the Task 2 testing set, all models except XGBoost (AUC = 0.897) showed consistent discrimination (AUC range: 0.944 to 0.976; Fig. 7E, K), with no significant differences (all p > 0.05; Fig. 7Q). In the validation sets, AUCs ranged from 0.790 to 0.832 for Task 1 (Fig. 7C, I). In Task 2, RF showed lower discrimination (AUC = 0.675; Fig. 7F, L), whereas the other models maintained AUCs of 0.700 to 0.833, with no significant differences detected by DeLong’s test (all p > 0.05; Fig. 7R).
Fig. 7.
AUCs for the training (A, D, G, J), testing (B, E, H, K), and validation sets (C, F, I, L) of the combined model constructed with 7 classifiers. Heat maps of DeLong’s test p values for the training (M, P), testing (N, Q), and validation sets (O, R) for two tasks. ROC: receiver operating characteristic; RF: random forest; SVM: support vector machines; NB: naive Bayes; XGBoost: extreme gradient boosting; LDA: linear discriminant analysis; MLP: multi-layer perceptrons; LR: logistic regression.
Survival analysis
Patients were divided into high-risk and low-risk groups using the training-set cutoff values of the combined model for the two tasks (0.499 and 0.377, respectively). In both Task 1 (grade 4 vs. GBM) and Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), the average OS of low-risk patients was significantly longer than that of high-risk patients (Task 1: 874 vs. 487 days; Task 2: 1316 vs. 627 days), reflecting the different prognoses associated with the molecular subtypes. KM analysis revealed significant differences between the risk groups for both the combined model and the molecular subtype in Task 1 and Task 2 (all p < 0.001). The Z-test indicated no statistically significant difference in prognostic value between the molecular subtype and the combined model in either task (p = 0.964 and p = 0.746), as illustrated in Fig. 8.
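The risk stratification described above can be sketched as follows: dichotomize each patient's combined-model output at a fixed training-set cutoff (for example, 0.499 for Task 1) and compare the two groups with a two-group log-rank test. This is a minimal pure-Python sketch with hypothetical function names and invented toy survival data, not the authors' code. It assumes that scores at or above the cutoff are labeled high risk, which in practice depends on how the positive class is coded in each task.

```python
import math
from typing import List, Sequence, Tuple


def split_by_cutoff(scores: Sequence[float], cutoff: float) -> List[int]:
    """Label each patient 1 (high risk) if score >= cutoff, else 0 (low risk)."""
    return [1 if s >= cutoff else 0 for s in scores]


def logrank_test(time: Sequence[float], event: Sequence[int],
                 group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p value) with 1 df.

    event[i] is 1 for death and 0 for censoring; group[i] is 1 or 0.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(time, event) if e == 1}):
        at_risk = [i for i, ti in enumerate(time) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i] == 1)
        d = sum(1 for i in at_risk if time[i] == t and event[i] == 1)
        d1 = sum(1 for i in at_risk if time[i] == t and event[i] == 1 and group[i] == 1)
        # Observed minus expected deaths in group 1 at this event time.
        observed_minus_expected += d1 - d * n1 / n
        # Hypergeometric variance term (undefined, and skipped, when n == 1).
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With toy data in which the four high-score patients die at 100 to 400 days and the low-score patients survive to 900 days or longer, the test gives a chi-square of about 7.3 (p < 0.01).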
Fig. 8.
Kaplan-Meier survival analysis of the combined model for two tasks. The combined model (dotted line) effectively stratified Task 1 and Task 2 cases into high-risk (red line) and low-risk (green line) groups, with significant prognostic differences. The combined model performed comparably to the molecular subtype (solid line) in both tasks.
Discussion
This study constructed two machine learning tasks to predict the molecular subtypes of 2021 WHO grade 4 glioma using multiparametric MRI, clinical-radiological characteristics, and their combination. Our findings indicate that the ML model performed well in distinguishing astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and in discriminating astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4. Additionally, the combined model effectively stratified Task 1 (grade 4 vs. GBM) and Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4) cases into high-risk and low-risk groups according to OS, with prognostic performance comparable to that of molecular subtype.
Predictive value of the clinical model
Our study identified age and MGMTmet status as significant predictors for distinguishing astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM), with AUCs of 0.671 (training set), 0.656 (testing set), and 0.543 (validation set). Similarly, ECM and deep white matter invasion were significant predictors for differentiating astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4 (Task 2: IDH-mut grade 4 vs. IDH-wt grade 4), with AUCs of 0.619 (training set), 0.605 (testing set), and 0.400 (validation set).
GBM, IDH-wt (WHO 2021) patients were older on average than astrocytoma, CNS WHO grade 4 patients (56.5 vs. 50.9 years), and astrocytoma, IDH-wt, CNS WHO grade 4 patients were older than astrocytoma, IDH-mut, CNS WHO grade 4 patients (53.4 vs. 45.0 years), consistent with a previous study [18]. This suggests that older age may be associated with higher malignant potential. Younger age has been associated with a better prognosis, and older age with poorer survival, in adult glioma patients [19, 20], and our findings indirectly support this. Additionally, MGMTmet was more frequent in astrocytoma, CNS WHO grade 4 than in GBM, IDH-wt (WHO 2021) (67.3% vs. 47.8%), suggesting that it may be associated with less aggressive tumor behavior. Previous studies have shown that MGMTmet is linked to an improved response to temozolomide and longer OS [2, 21, 22], consistent with our findings. Compared with astrocytoma, IDH-mut, CNS WHO grade 4 patients, astrocytoma, IDH-wt, CNS WHO grade 4 patients exhibited a higher rate of ECM (79.6% vs. 59.1%) and a lower rate of deep white matter invasion (30.1% vs. 45.5%).
Despite these findings, the clinical model showed limited predictive power in both tasks, underscoring the difficulty of relying solely on clinical-radiological features for precise molecular subtyping of 2021 WHO grade 4 gliomas. This limitation is likely due to the inherent heterogeneity of glioma, as clinical and radiological characteristics may not fully capture the underlying molecular alterations.
Predictive value of the ML model
In our study, the ML model significantly outperformed the clinical model in distinguishing astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and in discriminating astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4. The optimal ML model for both tasks showed strong predictive performance in the training (AUC = 0.902 and 0.904), testing (0.854 and 0.899), and validation (0.830 and 0.783) sets. Specifically, it achieved positive predictive values (PPV) of 0.952 (training), 0.872 (testing), and 0.964 (validation) for identifying GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM), and negative predictive values (NPV) of 0.957, 0.955, and 0.952, respectively, for excluding the IDH-mut subtype (Task 2: IDH-mut grade 4 vs. IDH-wt grade 4). Thus, more than 85% of cases predicted as GBM were correctly classified in all sets, and more than 95% of cases predicted as IDH-wt were correctly classified in the training and testing sets, supporting the model’s potential clinical utility for molecular subtyping. To our knowledge, only Wei et al. [23] have conducted a related study, developing a subregion-based MRI RadioFusionOmics model to discriminate between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021); their model achieved AUCs of 0.976 and 0.974 in the training and validation cohorts, respectively, consistent with our findings. However, we further stratified astrocytoma, CNS WHO grade 4 into IDH-mut and IDH-wt subtypes and evaluated the prognostic value of the combined model. Previous studies employing radiomics or ML models to distinguish GBM, IDH-mut, WHO grade 4 from GBM, IDH-wt were based on the 2016 WHO CNS criteria [24–26]. To our knowledge, no study has yet addressed the differentiation of astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4, both reclassified as grade 4 astrocytoma under the 2021 WHO CNS criteria.
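Because PPV, NPV, and accuracy answer different questions, a short sketch may help. Assuming GBM (Task 1) or IDH-mut (Task 2) is coded as the positive class, PPV is the fraction of positive predictions that are correct, NPV is the fraction of negative predictions that are correct, and accuracy is the fraction of all predictions that are correct. The function name and toy labels below are hypothetical.

```python
from typing import Sequence, Tuple


def ppv_npv_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float, float]:
    """Return (PPV, NPV, accuracy) from true and predicted class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    tn = sum(1 for t, p in pairs if p != positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # PPV: correct among predicted positives; NPV: correct among predicted negatives.
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return ppv, npv, accuracy
```

For instance, `ppv_npv_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])` gives a PPV of 2/3, an NPV of 1/2, and an accuracy of 0.6, showing that the three metrics can diverge on the same predictions.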
Our multiparametric MRI analysis comprised T1C edema, T1C tumor, and T2F tumor models. In both tasks, apart from the optimal ML model and the combined model, the T1C edema model showed the highest AUC in the training (0.843 and 0.837) and testing (0.829 and 0.787) sets. In the validation sets of Task 1 (grade 4 vs. GBM) and Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), the T1C edema model still achieved AUCs above 0.7 (0.715 and 0.717), although these were slightly lower than those of the T1C tumor or T2F tumor models. The peritumoral region adjacent to GBM, a mix of infiltrative tumor and vasogenic edema, is often the site of recurrence [16, 27]. Thus, both the tumor and edema regions of grade 4 glioma reflect tumor heterogeneity. Although peritumoral edema was not a significant predictor of the 2021 glioma molecular subtypes, its prevalence was higher in GBM, IDH-wt (WHO 2021) than in astrocytoma, CNS WHO grade 4 (98.9% vs. 92.5%). This suggests that peritumoral edema may capture heterogeneity between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021), consistent with a previous study [23].
Predictive value of the combined model
The combined model, integrating the Rad-score and clinical-radiological characteristics, was intended to leverage the strengths of both approaches. However, its performance was similar to that of the optimal ML model. In Task 1 (grade 4 vs. GBM), the AUCs of the combined model vs. the optimal ML model in the training, testing, and validation sets were 0.907 vs. 0.902 (p = 0.428), 0.852 vs. 0.854 (p = 0.887), and 0.832 vs. 0.830 (p = 0.889), respectively. In Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4), the corresponding AUCs were 0.899 vs. 0.904 (p = 0.716), 0.895 vs. 0.899 (p = 0.856), and 0.792 vs. 0.783 (p = 0.842), respectively. These findings indicate that incorporating clinical-radiological features adds limited value.
The nomogram based on the combined model of Task 1 (grade 4 vs. GBM) offers a practical tool for individualized risk prediction, providing a visual and user-friendly representation to aid clinicians in stratifying patients and tailoring treatment strategies.
Model evaluation and comparison
We further evaluated the combined model using various ML algorithms, including LR, SVM, MLP, LDA, RF, and NB. Most algorithms showed consistent performance across the training, testing, and external validation sets for both tasks. However, we observed moderate calibration and suboptimal decision curve performance specifically in the validation set for Task 2 (IDH-mut grade 4 vs. IDH-wt grade 4). This limitation likely stems from two factors: (1) severe class imbalance in the validation cohort (3 IDH-mut vs. 20 IDH-wt cases), which fundamentally challenges model calibration; and (2) potentially inferior clinical outcomes in the external validation set, which may not fully represent the target population.
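To illustrate why a validation set with only 3 IDH-mut cases strains calibration assessment, the sketch below computes a simple equal-width binned calibration table (mean predicted probability versus observed event rate per bin) together with the Brier score. The binning scheme and function names are assumptions for illustration; the study does not specify how its calibration curves were constructed.

```python
from typing import List, Sequence, Tuple


def calibration_bins(y_true: Sequence[int], prob: Sequence[float],
                     n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Equal-width bins over [0, 1]: (mean predicted, observed rate, count) per non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table


def brier_score(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, prob)) / len(y_true)
```

With only three positive cases, most bins contain zero or one event, so each observed rate jumps between extreme values and the calibration curve becomes unstable even if the underlying model is reasonable.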
In addition, RF showed notable performance degradation (validation AUC = 0.675 vs. training AUC = 0.996) compared with the more stable linear models, suggesting algorithm-specific sensitivity to dataset shift. These results emphasize that clinical application requires rigorous multicenter validation.
Survival analysis
We further evaluated the prognostic value of the combined model. It effectively stratified patients into high-risk and low-risk groups, with prognostic value comparable to that of molecular subtype. This suggests that our model can predict glioma molecular subtypes and carries prognostic value. While our study demonstrates the feasibility of MRI-based molecular subtyping, future research should focus on integrating radiomic biomarkers into clinical trial designs and surveillance protocols. Potential applications include stratifying patients for targeted therapy trials [28] or monitoring treatment response [29]. Nevertheless, such applications would require prospective validation and standardization of imaging protocols across institutions.
Limitations
This study has several limitations. Firstly, the retrospective nature and reliance on data from three institutions may introduce selection bias. Prospective validation on a larger multicenter cohort is necessary to confirm the findings. Secondly, different equipment and scanning parameters used across the three institutions, despite image preprocessing, may have influenced the radiomic features. Future studies should aim for uniform scanning parameters. Thirdly, the potential for segmentation bias in radiomic feature extraction should be acknowledged, particularly for first-order features, despite our demonstration of good intra-observer agreement (ICC ≥ 0.75). Fourthly, although this study included multi-sequence MRI of tumor and edema regions, future research should incorporate more functional imaging modalities, such as diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI), to further enhance the model’s predictive power. Fifthly, the lack of standardized extent of resection data represents an important constraint, as surgical radicality is known to significantly impact survival outcomes in high-grade glioma. Finally, our study focused on static preoperative imaging, which limits the ability to capture dynamic tumor changes. Future studies could integrate parametric response mapping (PRM) [30, 31] to longitudinally track spatial heterogeneity in multi-parametric MRI data.
Conclusion
In conclusion, the multiparametric MRI machine learning model effectively differentiated astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and distinguished the IDH-mut and IDH-wt subtypes of astrocytoma, CNS WHO grade 4. Furthermore, the model stratified patients with different glioma molecular subtypes into high-risk and low-risk groups according to OS, providing a proof of concept for non-invasive molecular subtyping.
Supplementary Information
Acknowledgements
Not applicable.
Abbreviations
- ACC
Accuracy
- AUC
Area under the curve
- AMP
Amplification
- CDKN
Cyclin-dependent kinase inhibitor
- CE
Contrast-enhanced
- CI
Confidence interval
- CNS
Central nervous system
- DCA
Decision curve analysis
- DWI
Diffusion-weighted imaging
- ECM
Edema crosses midline
- EGFR
Epidermal growth factor receptor
- FLAIR
Fluid attenuated inversion recovery
- GBM
Glioblastoma
- GLCM
Gray level co-occurrence matrix
- GLDM
Gray level dependence matrix
- GLRLM
Gray level run length matrix
- GLSZM
Gray level size zone matrix
- ICC
Intraclass correlation coefficient
- IDH-mut
Isocitrate dehydrogenase mutant
- IDH-wt
Isocitrate dehydrogenase wild-type
- KM
Kaplan-Meier
- KPS
Karnofsky performance status
- LASSO
Least absolute shrinkage and selection operator
- LDA
Linear discriminant analysis
- LR
Logistic regression
- mGBM
Molecular features of glioblastoma
- MGMTmet
O6-methylguanine-DNA methyltransferase promoter methylation
- ML
Machine learning
- MRI
Magnetic resonance imaging
- MLP
Multi-layer perceptrons
- NB
Naive Bayes
- NGTDM
Neighborhood gray tone difference matrix
- NPV
Negative predictive value
- OR
Odds ratio
- OS
Overall survival
- PFS
Progression-free survival
- PPV
Positive predictive value
- PWI
Perfusion-weighted imaging
- RF
Random forest
- ROC
Receiver operating characteristic
- ROI
Region of interest
- SEN
Sensitivity
- SPE
Specificity
- SVM
Support vector machines
- TCM
Tumor crosses midline
- TERT
Telomerase reverse transcriptase
- T1WI
T1-weighted imaging
- T2WI
T2-weighted imaging
- VOI
Volume of interest
- WHO
World Health Organization
- XGBoost
Extreme gradient boosting
Authors’ contributions
Conception and design: Y T, H Z, Xiaochun Wang. Administrative support: H Z, Xiaochun Wang, Guoqiang Yang, Jiangfeng Du. Provision of study materials or patients: WJ X, YY L. Collection and assembly of data: WJ X, YY L, J Z, Guoqiang Yang, Jiangfeng Du. Data analysis and interpretation: WJ X, YY L, J Z, ZY Z, PX S. Manuscript writing: All authors. Final approval of manuscript: All authors.
Funding
This work was supported by the National Natural Science Foundation of China [grant numbers 82371941, 82071893 to Yan Tan, and U21A20386 to Hui Zhang]; the Research Project Supported by Shanxi Scholarship Council of China [grant number 2023-186 to Yan Tan]; and Shanxi Province Higher Education “Billion Project” Science and Technology Guidance Project [grant number BYJL017 to Yan Tan].
Data availability
The datasets generated or analyzed during the study are not publicly available due to institutional regulations but are available from the corresponding author on reasonable request.
Declarations
Ethics approval and consent to participate
Because the study was retrospective and anonymous, the Ethics Committee of the First Hospital of Shanxi Medical University waived the requirement for written informed consent (No. KYYJ-2023-058).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Wenji Xu and Yangyang Li are co-first authors and contributed equally to this work.
Contributor Information
Hui Zhang, Email: zhanghui_mr@163.com.
Yan Tan, Email: tanyan123456@sina.com.
References
- 1. Louis DN, Perry A, Wesseling P, Brat DJ, Cree IA, Figarella-Branger D, et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro Oncol. 2021;23:1231–51.
- 2. Horbinski C, Berger T, Packer RJ, Wen PY. Clinical implications of the 2021 edition of the WHO classification of central nervous system tumours. Nat Rev Neurol. 2022;18:515–29.
- 3. Ramos-Fresnedo A, Pullen MW, Perez-Vega C, Domingo RA, Akinduro OO, Almeida JP, et al. The survival outcomes of molecular glioblastoma IDH-wildtype: a multicenter study. J Neurooncol. 2022;157:177–85.
- 4. Gritsch S, Batchelor TT, Gonzalez Castro LN. Diagnostic, therapeutic, and prognostic implications of the 2021 World Health Organization classification of tumors of the central nervous system. Cancer. 2022;128:47–58.
- 5. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, De Jong EEC, Van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14:749–62.
- 6. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, Van Stiphout RGPM, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–6.
- 7. Avanzo M, Wei L, Stancanello J, Vallières M, Rao A, Morin O, et al. Machine and deep learning methods for radiomics. Med Phys. 2020;47:e185–202.
- 8. Moodi F, Khodadadi Shoushtari F, Ghadimi DJ, Valizadeh G, Khormali E, Salari HM, et al. Glioma tumor grading using radiomics on conventional MRI: a comparative study of WHO 2021 and WHO 2016 classification of central nervous tumors. J Magn Reson Imaging. 2024;60:923–38.
- 9. Tian Q, Yan L, Zhang X, Zhang X, Hu Y, Han Y, et al. Radiomics strategy for glioma grading using texture features from multiparametric MRI. Magn Reson Imaging. 2018;48:1518–28.
- 10. Chiu F-Y, Le NQK, Chen C-Y. A multiparametric MRI-based radiomics analysis to efficiently classify tumor subregions of glioblastoma: a pilot study in machine learning. JCM. 2021;10:2030.
- 11. Lin K, Cidan W, Qi Y, Wang X. Glioma grading prediction using multiparametric magnetic resonance imaging-based radiomics combined with proton magnetic resonance spectroscopy and diffusion tensor imaging. Med Phys. 2022;49:4419–29.
- 12. Ding J, Zhao R, Qiu Q, Chen J, Duan J, Cao X, et al. Developing and validating a deep learning and radiomic model for glioma grading using multiplanar reconstructed magnetic resonance contrast-enhanced T1-weighted imaging: a robust, multi-institutional study. Quant Imaging Med Surg. 2022;12:1517–28.
- 13. Vijithananda SM, Jayatilake ML, Gonçalves TC, Rato LM, Weerakoon BS, Kalupahana TD, et al. Texture feature analysis of MRI-ADC images to differentiate glioma grades using machine learning techniques. Sci Rep. 2023;13:15772.
- 14. Xing X, Zhu M, Chen Z, Yuan Y. Comprehensive learning and adaptive teaching: distilling multi-modal knowledge for pathological glioma grading. Med Image Anal. 2024;91:102990.
- 15. Malik N, Geraghty B, Dasgupta A, Maralani PJ, Sandhu M, Detsky J, et al. MRI radiomics to differentiate between low grade glioma and glioblastoma peritumoral region. J Neurooncol. 2021;155:181–91.
- 16. Szekeres D, Jetty SN, Soni N. The role of multiparametric MRI in diagnosing and grading glioma. Neurol India. 2023;71:1274–5.
- 17. Naser MA, Deen MJ. Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med. 2020;121:103758.
- 18. Lee D, Riestenberg RA, Haskell-Mendoza A, Bloch O. Diffuse astrocytic glioma, IDH-wildtype, with molecular features of glioblastoma, WHO grade IV: a single-institution case series and review. J Neurooncol. 2021;152:89–98.
- 19. Weller M, Van Den Bent M, Preusser M, Le Rhun E, Tonn JC, Minniti G, et al. EANO guidelines on the diagnosis and treatment of diffuse gliomas of adulthood. Nat Rev Clin Oncol. 2021;18:170–86.
- 20. Park YW, Kim S, Park CJ, Ahn SS, Han K, Kang S-G, et al. Adding radiomics to the 2021 WHO updates may improve prognostic prediction for current IDH-wildtype histological lower-grade gliomas with known EGFR amplification and TERT promoter mutation status. Eur Radiol. 2022;32:8089–98.
- 21. Agarwal A, Edgar MA, Desai A, Gupta V, Soni N, Bathla G. Molecular GBM versus histopathological GBM: radiology-pathology-genetic correlation and the new WHO 2021 definition of glioblastoma. AJNR Am J Neuroradiol. 2024;45:1006–12.
- 22. Zeng C, Song X, Zhang Z, Cai Q, Cai J, Horbinski C, et al. Dissection of transcriptomic and epigenetic heterogeneity of grade 4 gliomas: implications for prognosis. Acta Neuropathol Commun. 2023;11:133.
- 23. Wei R, Lu S, Lai S, Liang F, Zhang W, Jiang X, et al. A subregion-based radiofusionomics model discriminates between grade 4 astrocytoma and glioblastoma on multisequence MRI. J Cancer Res Clin Oncol. 2024;150:73.
- 24. Pasquini L, Napolitano A, Tagliente E, Dellepiane F, Lucignani M, Vidiri A, et al. Deep learning can differentiate IDH-mutant from IDH-wild GBM. J Pers Med. 2021;11:290.
- 25. Calabrese E, Villanueva-Meyer JE, Cha S. A fully automated artificial intelligence method for non-invasive, imaging-based identification of genetic alterations in glioblastomas. Sci Rep. 2020;10:11852.
- 26. Kandalgaonkar P, Sahu A, Saju AC, Joshi A, Mahajan A, Thakur M, et al. Predicting IDH subtype of grade 4 astrocytoma and glioblastoma from tumor radiomic patterns extracted from multiparametric magnetic resonance images using a machine learning approach. Front Oncol. 2022;12:879376.
- 27. Cheng J, Liu J, Yue H, Bai H, Pan Y, Wang J. Prediction of glioma grade using intratumoral and peritumoral radiomic features from multiparametric MRI images. IEEE/ACM Trans Comput Biol Bioinform. 2022;19:1084–95.
- 28. Liu C, Liu Y, Lin H, Zhang C, Zhang B, Song H, et al. Multi-omics landscape of alternative splicing in diffuse midline glioma reveals immune- and neural-driven subtypes with implications for spliceosome-targeted therapy. Front Immunol. 2025;16:1587009.
- 29. Zhou Q, Xue C, Ke X, Zhou J. Treatment response and prognosis evaluation in high-grade glioma: an imaging review based on MRI. J Magn Reson Imaging. 2022;56:325–40.
- 30. Lausch A, Yeung TP, Chen J, Law E, Wang Y, Urbini B, et al. A generalized parametric response mapping method for analysis of multi-parametric imaging: a feasibility study with application to glioblastoma. Med Phys. 2017;44:6074–84.
- 31. Hoff BA, Lemasson B, Chenevert TL, Luker GD, Tsien CI, Amouzandeh G, et al. Parametric response mapping of FLAIR MRI provides an early indication of progression risk in glioblastoma. Acad Radiol. 2021;28:1711–20.