Predicting the molecular subtypes of 2021 WHO grade 4 glioma by a multiparametric MRI-based machine learning model

Wenji Xu; Yangyang Li; Jie Zhang; Zhiyi Zhang; Pengxin Shen; Xiaochun Wang; Guoqiang Yang; Jiangfeng Du; Hui Zhang; Yan Tan

doi:10.1186/s12885-025-14529-7

2025 Jul 14;25:1171. doi: 10.1186/s12885-025-14529-7

PMCID: PMC12261723 PMID: 40660102

Abstract

Background

Accurately distinguishing the different molecular subtypes of 2021 World Health Organization (WHO) grade 4 Central Nervous System (CNS) gliomas is highly relevant for prognostic stratification and personalized treatment.

Objectives

To develop and validate a machine learning (ML) model using multiparametric MRI for the preoperative differentiation of astrocytoma, CNS WHO grade 4, from glioblastoma (GBM), isocitrate dehydrogenase-wild-type (IDH-wt) (WHO 2021) (Task 1: grade 4 vs. GBM); to stratify astrocytoma, CNS WHO grade 4, by distinguishing astrocytoma, IDH-mutant (IDH-mut), CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4 (Task 2: IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}); and to evaluate the model’s prognostic value.

Methods

We retrospectively analyzed 320 glioma patients from three hospitals (training/testing, 7:3 ratio) and 99 patients from The Cancer Genome Atlas (TCGA) database for external validation. Radiomic features were extracted from tumor and edema on contrast-enhanced T1-weighted imaging (CE-T1WI) and T2 fluid-attenuated inversion recovery (T2-FLAIR). Extreme gradient boosting (XGBoost) was utilized for constructing the ML, clinical, and combined models. Model performance was evaluated with receiver operating characteristic (ROC) curves, decision curves, and calibration curves. Stability was evaluated using six additional classifiers. Kaplan-Meier (KM) survival analysis and the log-rank test assessed the model’s prognostic value.

Results

In Task 1 and Task 2, the combined model (AUC = 0.907, 0.852 and 0.832 for Task 1; AUC = 0.899, 0.895 and 0.792 for Task 2) and the optimal ML model (AUC = 0.902, 0.854 and 0.830 for Task 1; AUC = 0.904, 0.899 and 0.783 for Task 2) outperformed the clinical model (AUC = 0.671, 0.656, and 0.543 for Task 1; AUC = 0.619, 0.605 and 0.400 for Task 2) in the training, testing, and validation sets, respectively; the differences were statistically significant in all sets except the Task 2 validation set. Survival analysis showed the combined model performed similarly to molecular subtype in both tasks (p = 0.964 and p = 0.746).

Conclusion

The multiparametric MRI ML model effectively distinguished astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and differentiated astrocytoma, IDH-mut from astrocytoma, IDH-wt, CNS WHO grade 4. Additionally, the model provided reliable survival stratification for glioma patients across different molecular subtypes.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12885-025-14529-7.

Keywords: Astrocytoma, Glioblastoma, Magnetic resonance imaging, Machine learning, Molecular subtype

Background

The 2021 World Health Organization (WHO) Classification of central nervous system (CNS) tumors underscores the importance of integrating molecular parameters with histological findings for accurate grading [1]. Notably, isocitrate dehydrogenase-mutant (IDH-mut) grade 2–3 astrocytomas are now reclassified as astrocytoma, IDH-mutant, CNS WHO grade 4, in the presence of homozygous deletion of cyclin-dependent kinase inhibitor 2A/B (CDKN2A/B), regardless of necrosis and/or microvascular proliferation. Similarly, IDH wild-type (IDH-wt) grade 2–3 astrocytomas are reclassified as astrocytoma, IDH-wt, with molecular features of glioblastoma (mGBM), CNS WHO grade 4, if any one or a combination of the following is present: telomerase reverse transcriptase (TERT) promoter mutation, epidermal growth factor receptor (EGFR) amplification, or chromosome +7/−10 copy number changes. These molecularly characterized grade 2–3 astrocytomas represent high-risk molecular subtypes with poor prognosis [1, 2].

Despite their poor prognosis, patients with astrocytoma, IDH-wt, CNS WHO grade 4 demonstrated longer overall survival (OS) and progression-free survival (PFS) than glioblastoma (GBM) patients [3]. Moreover, the two IDH variants of astrocytoma, CNS WHO grade 4 exhibit distinct biological behaviors and clinical outcomes: IDH-mut astrocytoma shows less aggressive behavior and a better prognosis under similar treatment protocols [2, 4]. Therefore, accurate discrimination of molecular subtypes in WHO grade 4 glioma is highly relevant for prognostic stratification and personalized treatment. However, current molecular diagnostics depend on invasive biopsies, so a reliable non-invasive approach to predicting the molecular subtypes of glioma is needed.

Magnetic resonance imaging (MRI) is extensively utilized for glioma diagnosis and treatment monitoring but falls short in revealing histological and molecular details. Radiomics, by extracting high-throughput image features, provides insights into tumor heterogeneity, thereby enhancing diagnostic and therapeutic accuracy [5, 6]. Machine learning (ML) algorithms enable the processing of radiomics data to predict tissue characteristics and identify molecular features in glioma [7, 8]. Previous studies have developed predictive models for glioma grading based on MRI radiomics or machine learning, mostly adhering to the 2016 or earlier WHO classifications or focusing on glioma grading prediction [9–17]. However, the differentiation of grade 4 astrocytoma from GBM, IDH-wt (WHO 2021), as well as the distinction between astrocytoma, IDH-mut and IDH-wt, CNS WHO grade 4, remains underinvestigated. Therefore, we aim to construct an ML model using multiparametric MRI to differentiate between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM), and to further stratify grade 4 astrocytoma by distinguishing IDH-mut from IDH-wt subtypes (Task 2: IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}). Additionally, we seek to analyze its prognostic value for OS.

Materials and methods

Machine learning-based classification

We constructed two independent binary classification tasks: (1) differentiation between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM), and (2) stratification of grade 4 astrocytoma into IDH-mut and IDH-wt subtypes (Task 2: IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}).

Patient population

This retrospective study enrolled 320 patients with pathologically confirmed glioma from three institutions (First Hospital of Shanxi Medical University, Shanxi Provincial People’s Hospital, and Shanxi Bethune Hospital) between February 2011 and December 2023. The institutional cohort was randomly split, with stratification, into training and testing sets at a 7:3 ratio. An additional 99 cases from The Cancer Genome Atlas (TCGA) database served as an independent external validation set. To assess the prognostic value of the molecular subtype prediction model, OS was defined as the time from initial diagnosis to death or the last follow-up. Survival information was available for 312 patients in the institutional cohort and all 99 cases in the TCGA cohort. The institutional review board waived the requirement for patient informed consent (KYYJ-2023-058).
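As an illustration of the 7:3 stratified split described above, the following is a minimal sketch using only the Python standard library. The function name `stratified_split`, the per-class rounding rule, and the random seed are assumptions made for this example; the paper does not state which tool or seed was used, so the per-class composition of the authors' sets may differ.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Round the per-class training size to the nearest integer.
        n_train = round(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Task 1 class counts from the Results: 196 GBM and 41 + 83 = 124 grade 4 astrocytomas.
labels = ["GBM"] * 196 + ["grade4"] * 124
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class, rather than over the pooled cohort, keeps the minority class from being under-represented in the smaller testing set.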

Inclusion criteria: (1) Histological diagnosis of diffuse glioma; (2) No prior radiotherapy, chemotherapy, or surgery before MRI examination; (3) No history of craniocerebral surgery or other systemic malignancies; (4) Availability of molecular information, including IDH, 1p/19q, O6-methylguanine-DNA methyltransferase promoter methylation (MGMTmet), CDKN2A/B, EGFR, TERT, and chromosome 7/10 status; (5) For TCGA data, a follow-up time of more than two years.

The exclusion criteria were: (1) 2021 CNS WHO grade 2–3 astrocytoma; (2) Oligodendroglioma; (3) Incomplete or poor-quality MRI images; (4) Incomplete molecular information. A flowchart of patient selection and machine learning classification is shown in Fig. 1.

Fig. 1 — Patient flow and machine learning-based classification chart. CNS: Central Nervous System; WHO: World Health Organization; IDH-mut: Isocitrate dehydrogenase-mutant; IDH-wt: Isocitrate dehydrogenase wild-type; TERTp^MUT: telomerase reverse transcriptase promoter mutation; EGFR^AMP: epidermal growth factor receptor amplification; GBM: glioblastoma; mGBM: molecular glioblastoma; TCGA: The Cancer Genome Atlas

Clinical-radiological characteristics collection

Clinical-radiological characteristics included gender, age, O6-methylguanine-DNA methyltransferase promoter methylation (MGMTmet), treatment, Karnofsky Performance Status (KPS) score, tumor number, tumor margin, intratumoral hemorrhage, intratumoral necrosis, peritumoral edema, maximum diameter, midline shift, enhancement pattern, enhancement quality, tumor crosses midline (TCM), edema crosses midline (ECM), cortical involvement, deep white matter invasion, pial invasion, and ependymal invasion.

Molecular biomarker detection

IDH mutation status, CDKN2A/B co-deletion status, and TERT promoter mutation status were detected via Sanger sequencing (ABI 3500, Thermo Fisher Scientific, Waltham, MA, USA). EGFR amplification, alterations in chromosome 7/10, and 1p/19q deletions were determined using fluorescence in situ hybridization (FISH). Bisulfite modification of DNA extracted from glioma tissue was performed using the BisulFlash™ DNA Modification Kit (Epigentek, Farmingdale, NY, USA), and PCR amplification for MGMTmet status was conducted using the DRR006 Kit (Takara, Kusatsu, Shiga, Japan).

MRI image acquisition and preprocessing

MR images were obtained using 3T scanners (Signa HDxt, GE Healthcare, USA; Skyra, Siemens Healthineers, Germany). Scanning sequences comprised axial T2-weighted fluid-attenuated inversion recovery imaging (T2WI-FLAIR) and contrast-enhanced T1WI (CE-T1WI). MRI scanning parameters were: T2WI-FLAIR (TR 6800–8000 ms, TE 80–95 ms, TI 2000 ms); CE-T1WI (TR 195–240 ms, TE 4.8–8.6 ms); slice thickness 5.0 mm, slice spacing 1.5 mm, FOV 240 mm × 240 mm, matrix 256 × 256. Contrast-enhanced sequences used an intravenous gadolinium contrast agent (0.1 mmol/kg).

Images were preprocessed to standardize signal intensity on a scale of 100, corrected for bias fields using N4ITK, and resampled to a voxel size of 1 mm × 1 mm × 1 mm, with voxel intensity discretized using a fixed bin width of 5. Image preprocessing used FeAture Explorer V.0.5.7 (FAE, https://github.com/salan668/FAE) and 3D Slicer version 5.7.20240325 (https://www.slicer.org).
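To make the intensity steps concrete, the sketch below shows one plausible reading of them in plain Python. Two points are assumptions rather than statements from the paper: that "a scale of 100" means z-scoring the intensities and multiplying by 100, and that fixed-bin-width discretization follows the common convention floor(x / w) − floor(min / w) + 1 (the convention used, for example, by PyRadiomics). The function names are hypothetical.

```python
import math


def normalize_intensity(values, scale=100.0):
    """Z-score intensities, then multiply by `scale` (assumed meaning of 'scale of 100')."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("constant intensities cannot be normalized")
    return [(v - mean) / std * scale for v in values]


def discretize_fixed_bin_width(values, bin_width=5.0):
    """Map intensities to integer gray levels starting at 1, one level per bin_width."""
    lowest_bin = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest_bin + 1 for v in values]


# Toy voxel intensities, for illustration only.
voxels = [12.0, 14.9, 15.0, 27.3, 40.0]
print(discretize_fixed_bin_width(voxels))  # [1, 1, 2, 4, 7]
```

With a fixed bin width, the number of gray levels grows with the intensity range of each volume, which is why the normalization step is applied before discretization.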

Image segmentation, feature extraction and selection

A rigid registration algorithm was used to register T2WI-FLAIR to CE-T1WI images. A radiologist with 8 years of experience (Radiologist A) manually delineated tumor and edema areas layer by layer on CE-T1WI and T2-FLAIR, respectively, while blinded to molecular status and clinical outcomes. The volume of interest (VOI) outline of the edema was replicated onto CE-T1WI. Three VOIs were acquired for each patient: CE-T1WI tumor (T1C tumor), CE-T1WI edema (T1C edema), and T2WI-FLAIR tumor (T2F tumor). Thirty patients were randomly selected for a second segmentation by an independent radiologist with 5 years of experience (Radiologist B) a month later to calculate the intraclass correlation coefficient (ICC) for inter-observer agreement. The two radiologists were blinded to molecular status and clinical outcomes.
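The inter-observer ICC mentioned above can be computed per feature from the two radiologists' values. The paper does not state which ICC form was used; the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), and the function name and toy numbers are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding one value per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# One feature measured by Radiologist A and Radiologist B on five patients (toy values).
feature_values = [[10.0, 10.5], [12.0, 11.6], [15.0, 15.2], [9.0, 9.4], [20.0, 19.5]]
icc = icc_2_1(feature_values)
print(round(icc, 3))
```

A feature whose ICC falls below the chosen threshold would be treated as too sensitive to the segmentation and dropped before modeling.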

The extracted features encompassed seven categories: first-order, shape, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), Gray Level Dependence Matrix (GLDM), and Neighborhood Gray Tone Difference Matrix (NGTDM) features. Feature selection was performed on the training set. Features with an intraclass correlation coefficient (ICC) ≥ 0.75 were selected, followed by z-score normalization. Normality was assessed using the Shapiro-Wilk test. Features were retained if p < 0.05 using independent sample t-tests or Mann-Whitney U tests, depending on distribution. The least absolute shrinkage and selection operator (LASSO) was then applied, with the regularization parameter chosen by ten-fold cross-validation to ensure the robustness of feature selection.
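The first two selection steps (ICC filtering and z-score normalization) can be sketched as follows. The feature names and ICC values are hypothetical, and applying the training-set mean and standard deviation to the testing data is an assumption about how leakage was avoided; the paper only states that selection was performed on the training set.

```python
import statistics

ICC_THRESHOLD = 0.75  # threshold stated in the text


def reproducible_features(icc_by_feature, threshold=ICC_THRESHOLD):
    """Keep the names of features whose ICC meets the threshold."""
    return [name for name, icc in icc_by_feature.items() if icc >= threshold]


def fit_zscore(train_values):
    """Estimate mean and (population) standard deviation on the training set only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        raise ValueError("feature is constant in the training set")
    return mean, std


def apply_zscore(values, mean, std):
    """Normalize any set (training, testing, validation) with training statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical feature names and ICC values, for illustration only.
icc_by_feature = {"firstorder_mean": 0.91, "glcm_contrast": 0.62, "shape_volume": 0.75}
kept = reproducible_features(icc_by_feature)
print(kept)  # ['firstorder_mean', 'shape_volume']

train_feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_feature = [5.0, 11.0]
mean, std = fit_zscore(train_feature)
print(apply_zscore(test_feature, mean, std))  # [0.0, 3.0]
```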

Image segmentation utilized ITK-SNAP software (http://www.itksnap.org/, version 4.0.0). Features were extracted using FAE software. Region of interest (ROI) segmentation is shown in Fig. 2. The radiomics workflow is shown in Fig. 3.

Fig. 2 — ROI segmentation of T1C tumor, T1C edema and T2F tumor. T1C: contrast-enhanced T1-weighted imaging; T2F: T2-weighted imaging fluid attenuated inversion recovery

Fig. 3 — The workflow of radiomics analysis. The radiomics workflow includes ROI segmentation, feature extraction, feature selection and ML model construction and model evaluation. GLCM: Gray Level Co-occurrence Matrix; GLRLM: Gray Level Run Length Matrix; GLSZM: Gray Level Size Zone Matrix; GLDM: Gray Level Dependence Matrix; NGTDM: Neighborhood Gray Tone Difference Matrix; ICC: intraclass correlation coefficient; M-W: Mann-Whitney test; LASSO: least absolute shrinkage and selection operator; XGBoost: extreme gradient boosting; ML: machine learning.

Construction of the ML model

Extreme gradient boosting (XGBoost), an ensemble learning algorithm based on gradient boosting decision tree, was employed to build ML models for the two tasks. Based on the selected features of each sequence and their combination, ML models for single and combined sequences were constructed. The ML model with the best predictive performance in the validation set was selected as the optimal ML model, and the Rad-score was calculated.

Construction of the clinical model

Univariate logistic regression (LR) analysis screened clinical-radiological characteristics, and variables with p < 0.05 were included in multivariate LR analysis. Significant variables were used to establish the clinical model using XGBoost.
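For readers less familiar with LR output, the relation between a fitted coefficient and the odds ratio (OR) reported in such analyses can be written as below. The Wald-type interval is an assumption for illustration; the paper does not state how its confidence intervals were computed.

```latex
\mathrm{OR} = e^{\hat\beta},
\qquad
95\%\ \mathrm{CI} = \left[\, e^{\hat\beta - 1.96\,\mathrm{SE}(\hat\beta)},\;
                           e^{\hat\beta + 1.96\,\mathrm{SE}(\hat\beta)} \,\right]
```

For instance, a per-year OR of 1.034 (the univariate value reported for age in Table 2) corresponds to a coefficient of ln 1.034 ≈ 0.0334, so a 10-year age difference multiplies the odds by 1.034^10 ≈ 1.40.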

Construction of the combined model and nomogram

A combined model was constructed using XGBoost based on the Rad-score and independent clinical-radiological risk factors. A nomogram was generated using LR to visually discriminate between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021).

Model evaluation and model comparison

The performance of each model in predicting WHO grade 4 glioma molecular subtypes was evaluated using the receiver operating characteristic (ROC) curve, with the area under the curve (AUC), sensitivity (SEN), specificity (SPE), accuracy (ACC), positive predictive value (PPV), and negative predictive value (NPV). DeLong’s test was used to compare predictive performance between models, with statistical significance set at p < 0.05. Calibration curves and decision curves were used to evaluate model calibration and clinical utility, respectively.
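The threshold-based metrics listed above all derive from the four cells of a confusion matrix, and the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch follows; the counts and scores are hypothetical and the function names are chosen for this example.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, PPV and NPV from confusion-matrix counts."""
    return {
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "ACC": (tp + tn) / (tp + fn + tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = confusion_metrics(tp=80, fn=20, tn=45, fp=5)
print({name: round(value, 3) for name, value in metrics.items()})

auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
print(round(auc, 3))
```

Because PPV and NPV depend on class prevalence, they can shift between sets with different class balance even when sensitivity and specificity are stable.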

Six additional ML algorithms, namely LR, support vector machines (SVM), multi-layer perceptrons (MLP), linear discriminant analysis (LDA), random forest (RF), and naive Bayes (NB), were used to evaluate the generalization and stability of the combined model.

Survival analysis

To evaluate the prognostic value of the molecular subtype prediction model, Kaplan-Meier (KM) survival analysis and log-rank test were employed based on the combined model. The Z-test compared the prognostic value between the combined model and the molecular subtype.
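The Kaplan-Meier estimator used here multiplies, at each observed death time, the fraction of at-risk patients who survive that time. The following standard-library sketch illustrates the product-limit calculation on toy data; it is not the authors' R code, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    events[i] is True if patient i died at times[i], False if censored then.
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy follow-up times in days; False marks a censored observation.
times = [100, 200, 200, 300, 400]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))  # survival is about 0.8, 0.6 and 0.3
```

Censored patients contribute to the at-risk count up to their last follow-up, which is how the estimator uses incomplete follow-up without treating it as death.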

Statistical analysis

Statistical analyses were performed using R version 4.2.3 (http://www.R-project.org). Numerical variables were expressed as mean ± standard deviation. The Shapiro-Wilk test assessed normality. Independent sample t-test or Mann-Whitney U-test was used based on the distribution. Categorical variables were expressed as frequencies (percentages) and evaluated using Pearson’s chi-square test or Fisher’s exact test. p < 0.05 was considered statistically significant.

Results

Clinical-radiological baseline characteristics

Baseline characteristics are detailed in Table 1. Among the 320 patients enrolled based on the 2021 WHO CNS tumor classification, there were 196 GBMs, 41 IDH-mut ^{grade 4}, and 83 IDH-wt ^{grade 4} astrocytoma cases. The training and testing sets included 224/96 cases for Task 1 (grade 4 vs. GBM) and 87/37 cases for Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), as listed in Table 1. The validation set (n = 99) consisted of 76 GBMs, 3 IDH-mut ^{grade 4}, and 20 IDH-wt ^{grade 4} astrocytomas. The diagnostic reclassification scheme is detailed in Supplementary Table 1.

Table 1.

Clinical-radiological baseline characteristics between training, testing, and validation sets of the two tasks

Variables	Task 1 (grade 4 vs. GBM)					Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4})
Variables	Training set (n = 224)	Testing set (n = 96)	P-value	Validation set (n = 99)	P-value	Training set (n = 87)	Testing set (n = 37)	P-value	Validation set (n = 23)	P-value
Gender			0.712		0.182			0.554		0.355
Male	130 (58.04%)	53 (55.21%)		49 (49.49%)		48 (55.17%)	23 (62.16%)		10 (43.48%)
Female	94 (41.96%)	43 (44.79%)		50 (50.51%)		39 (44.83%)	14 (37.84%)		13 (56.52%)
Age(years)	53.76 ± 13.48	52.23 ± 13.62	0.201	58.51 ± 12.32	0.011	49.26 ± 14.71	52.11 ± 14.21	0.216	55.17 ± 10.29	0.031
MGMTmet			0.027		0.051			0.090		0.302
Yes	136 (60.71%)	45 (46.88%)		48 (48.48%)		64 (73.56%)	21 (56.76%)		14 (60.87%)
No	88 (39.29%)	51 (53.12%)		51 (51.52%)		23 (26.44%)	16 (43.24%)		9 (39.13%)
Treatment			1.000		<0.001			0.440		<0.001
Surgery	88 (39.29%)	38 (39.58%)		6 (6.06%)		38 (43.68%)	19 (51.35%)		1 (4.35%)
Combination therapy	136 (60.71%)	58 (60.42%)		93 (93.94%)		49 (56.32%)	18 (48.65%)		22 (95.65%)
KPS	78.64 ± 12.08	79.23 ± 11.46	0.913	77.98 ± 11.86	0.426	78.46 ± 12.11	79.78 ± 6.68	0.884	80.43 ± 12.24	0.114
Tumor number			0.625		0.630			0.305		0.621
Single	118 (52.68%)	54 (56.25%)		49 (49.49%)		59 (67.82%)	21 (56.76%)		14 (60.87%)
Multiple	106 (47.32%)	42 (43.75%)		50 (50.51%)		28 (32.18%)	16 (43.24%)		9 (39.13%)
Tumor margin			0.713		0.463			0.073		0.806
Clear	97 (43.30%)	44 (45.83%)		38 (38.38%)		31 (35.63%)	20 (54.05%)		7 (30.43%)
Non-clear	127 (56.70%)	52 (54.17%)		61 (61.62%)		56 (64.37%)	17 (45.95%)		16 (69.57%)
Intratumoral hemorrhage			1.000		<0.001			0.617		1.000
Yes	54 (24.11%)	23 (23.96%)		57 (57.58%)		15 (17.24%)	8 (21.62%)		4 (17.39%)
No	170 (75.89%)	73 (76.04%)		42 (42.42%)		72 (82.76%)	29 (78.38%)		19 (82.61%)
Intratumoral necrosis			0.640		0.273			0.269		0.137
Yes	180 (80.36%)	80 (83.33%)		85 (85.86%)		61 (70.11%)	30 (81.08%)		12 (52.17%)
No	44 (19.64%)	16 (16.67%)		14 (14.14%)		26 (29.89%)	7 (18.92%)		11 (47.83%)
Peritumoral edema			0.243		0.021			0.033		0.116
Yes	212 (94.64%)	94 (97.92%)		99 (100.00%)		76 (87.36%)	37 (100.00%)		23 (100.00%)
No	12 (5.36%)	2 (2.08%)		0 (0.00%)		11 (12.64%)	0 (0.00%)		0 (0.00%)
Maximum diameter	4.73 ± 1.82	4.74 ± 1.96	0.840	4.50 ± 1.64	0.354	4.66 ± 1.94	4.64 ± 1.52	0.960	3.86 ± 1.85	0.070
Midline shift			0.456		0.116			0.433		0.058
Yes	111 (47.45%)	48 (50.00%)		39 (39.39%)		44 (50.57%)	22 (59.46%)		6 (26.09%)
No	113 (52.55%)	48 (50.00%)		60 (60.61%)		43 (49.43%)	15 (40.54%)		17 (73.91%)
Enhancement pattern			0.316		<0.001			0.392		0.008
No enhancement	32 (14.29%)	10 (10.42%)		6 (6.06%)		24 (27.59%)	6 (16.22%)		6 (26.09%)
Annular enhancement	103 (45.98%)	41 (42.71%)		58 (58.59%)		36 (41.38%)	15 (40.54%)		5 (21.74%)
Nodular enhancement	30 (13.39%)	10 (10.42%)		23 (23.23%)		13 (14.94%)	6 (16.22%)		11 (47.83%)
Mixed enhancement	59 (26.34%)	35 (36.46%)		12 (12.12%)		14 (16.09%)	10 (27.03%)		1 (4.35%)
Enhancement quality			0.357		0.057			0.108		1.000
No enhancement	31 (13.84%)	9 (9.38%)		6 (6.06%)		24 (27.59%)	5 (13.51%)		6 (26.09%)
Enhancement	193 (86.16%)	87 (90.62%)		93 (93.94%)		63 (72.41%)	32 (86.49%)		17 (73.91%)
TCM			0.537		0.010			1.000		0.069
Yes	41 (18.30%)	21 (21.88%)		7 (7.07%)		20 (22.99%)	8 (21.62%)		1 (4.35%)
No	183 (81.70%)	75 (78.12%)		92 (92.93%)		67 (77.01%)	29 (78.38%)		22 (95.65%)
ECM			0.063		0.652			0.384		0.297
Yes	47 (20.98%)	30 (31.25%)		18 (18.18%)		27 (31.03%)	8 (21.62%)		4 (17.39%)
No	177 (79.02%)	66 (68.75%)		81 (81.82%)		60 (68.97%)	29 (78.38%)		19 (82.61%)
Cortical involvement			1.000		<0.001			1.000		0.002
Yes	158 (70.54%)	68 (70.83%)		97 (97.98%)		62 (71.26%)	26 (70.27%)		23 (100.00%)
No	66 (29.46%)	28 (29.17%)		2 (2.02%)		25 (28.74%)	11 (29.73%)		0 (0.00%)
Deep white matter invasion			0.788		<0.001			1.000		<0.001
Yes	162 (72.32%)	68 (70.83%)		35 (35.35%)		64 (73.56%)	27 (72.97%)		5 (21.74%)
No	62 (27.68%)	28 (29.17%)		64 (64.65%)		23 (26.44%)	10 (27.03%)		18 (78.26%)
Pial invasion			0.710		<0.001			0.169		<0.001
Yes	92 (41.07%)	37 (38.54%)		96 (96.97%)		46 (52.87%)	14 (37.84%)		22 (95.65%)
No	132 (58.93%)	59 (61.46%)		3 (3.03%)		41 (47.13%)	23 (62.16%)		1 (4.35%)
Ependymal invasion			1.000		0.780			0.193		0.034
Yes	56 (25.00%)	24 (25.00%)		23 (23.23%)		27 (31.03%)	7 (18.92%)		2 (8.70%)
No	168 (75.00%)	72 (75.00%)		76 (76.77%)		60 (68.97%)	30 (81.08%)		21 (91.30%)

GBM glioblastoma, IDH-mut Isocitrate dehydrogenase-mutant, IDH-wt Isocitrate dehydrogenase wild-type, MGMTmet O6-methylguanine-DNA methyltransferase promoter methylation, KPS Karnofsky Performance Status score, TCM tumor crosses midline, ECM edema crosses midline

Construction of the clinical model

Univariate and multivariate LR results are presented in Table 2. For Task 1 (grade 4 vs. GBM), age (p = 0.014) and MGMTmet (p = 0.028) were significant predictors. For Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), ECM (p = 0.005) and deep white matter invasion (p = 0.003) showed significance.

Table 2.

Univariate and multivariate logistic regression analysis in the training sets of the two tasks

Variables	Task 1 (grade 4 vs. GBM)				Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4})
	Univariate analysis		Multivariate analysis		Univariate analysis		Multivariate analysis
	OR (95%CI)	P	OR (95%CI)	P	OR (95%CI)	P	OR (95%CI)	P
Gender	1.103 (0.638–1.918)	0.727			1.120 (0.458–2.729)	0.802
Age(years)	1.034 (1.013–1.056)	0.002	1.028 (1.006–1.052)	0.014	0.979 (0.948–1.009)	0.170
MGMTmet	0.564 (0.316–0.991)	0.049	0.503 (0.269–0.918)	0.028	1.700 (0.612–5.247)	0.326
Treatment	1.487 (0.856–2.585)	0.159			0.831 (0.340–2.035)	0.684
KPS	0.987 (0.963–1.010)	0.271			1.003 (0.967–1.044)	0.863
Tumor number	1.557 (0.903–2.706)	0.113			0.673 (0.244–1.745)	0.426
Tumor margin	0.971 (0.561–1.676)	0.917			1.167 (0.465–3.032)	0.745
Intratumoral hemorrhage	1.776 (0.925–3.555)	0.093			0.940 (0.268–2.959)	0.918
Intratumoral necrosis	1.696 (0.867–3.308)	0.120			1.643 (0.617–4.752)	0.335
Peritumoral edema	5.480 (1.581–25.261)	0.013	1.960 (0.424–10.969)	0.405	2.625 (0.621–18.028)	0.238
Maximum diameter	1.062 (0.915–1.238)	0.432			0.888 (0.699–1.118)	0.316
Midline shift	0.972 (0.565–1.671)	0.918			1.184 (0.488–2.895)	0.709
Enhancement pattern	1.465 (1.117–1.944)	0.006	1.085 (0.765–1.552)	0.649	1.492 (0.966–2.348)	0.075
Enhancement quality	4.333 (1.97–10.127)	<0.001	2.568 (0.832–8.27)	0.105	1.396 (0.518–4.067)	0.521
TCM	1.387 (0.692–2.752)	0.350			0.733 (0.264–2.113)	0.555
ECM	1.462 (0.756–2.804)	0.254			0.338 (0.129–0.866)	0.025	0.211 (0.067–0.602)	0.005
Cortical involvement	0.892 (0.496–1.621)	0.705			1.400 (0.526–3.651)	0.493
Deep white matter invasion	1.024 (0.562–1.894)	0.939			3.594 (1.344–9.956)	0.012	5.688 (1.912–18.609)	0.003
Pial invasion	1.423 (0.822–2.465)	0.208			2.217 (0.909–5.575)	0.084
Ependymal invasion	1.350 (0.725–2.496)	0.340			2.333 (0.857–7.121)	0.112

GBM glioblastoma, IDH-mut Isocitrate dehydrogenase-mutant, IDH-wt Isocitrate dehydrogenase wild-type, OR odds ratio, CI confidence interval, MGMTmet O6-methylguanine-DNA methyltransferase promoter methylation, KPS Karnofsky Performance Status score, TCM tumor crosses midline, ECM edema crosses midline

The clinical models achieved AUCs of 0.671 and 0.619 (training set), 0.656 and 0.605 (testing set), and 0.543 and 0.400 (validation set) for Tasks 1 and 2, respectively (Fig. 4A-C, G-I, and Table 3).

Fig. 4 — ROC curves of the ML models, clinical models, and combined models in the training (A, G), testing (B, H), and validation sets (C, I) for the two tasks. DeLong’s test p-value heat maps of the training (D, J), testing (E, K), and validation sets (F, L) for the two tasks. ROC: receiver operating characteristic; ML: machine learning; T1C: contrast-enhanced T1-weighted imaging; T2F: T2-weighted imaging fluid attenuated inversion recovery.

Table 3.

The diagnostic performance of clinical model, ML model, and combined model in the training, testing, and validation sets of the two tasks

		Task 1 (grade 4 vs. GBM)							Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4})
Model	Set	AUC (95% CI)	Cut-off	SEN	SPE	ACC	PPV	NPV	AUC (95% CI)	Cut-off	SEN	SPE	ACC	PPV	NPV
T1C Edema	Training	0.843 (0.787–0.892)	0.667	0.707	0.833	0.754	0.876	0.631	0.837 (0.748–0.911)	0.385	0.767	0.772	0.770	0.639	0.863
	Testing	0.829 (0.734–0.911)	0.505	0.875	0.750	0.823	0.831	0.811	0.787 (0.615–0.930)	0.402	0.636	0.846	0.784	0.636	0.846
	Validation	0.715 (0.574–0.839)	0.598	0.763	0.609	0.727	0.866	0.438	0.717 (0.500-1.000)	0.304	0.677	1.000	0.435	0.048	0.000
T1C Tumor	Training	0.825 (0.768–0.877)	0.681	0.700	0.786	0.732	0.845	0.611	0.735 (0.624–0.838)	0.382	0.800	0.614	0.678	0.522	0.854
	Testing	0.722 (0.608–0.830)	0.584	0.821	0.625	0.740	0.754	0.714	0.750 (0.535–0.927)	0.373	0.818	0.654	0.703	0.500	0.895
	Validation	0.786 (0.646–0.915)	0.533	0.842	0.739	0.818	0.914	0.586	0.700 (0.533-1.000)	0.367	0.667	1.000	0.957	1.000	0.952
T2F Tumor	Training	0.777 (0.711–0.838)	0.558	0.886	0.571	0.768	0.775	0.750	0.719 (0.601–0.825)	0.394	0.733	0.632	0.667	0.512	0.818
	Testing	0.629 (0.491–0.741)	0.549	0.821	0.425	0.656	0.667	0.630	0.650 (0.432–0.818)	0.370	0.909	0.423	0.568	0.400	0.917
	Validation	0.642 (0.529–0.751)	0.608	0.579	0.783	0.626	0.898	0.360	0.750 (0.450–0.967)	0.413	0.667	0.700	0.696	0.250	0.933
Optimal ML	Training	0.902 (0.859–0.938)	0.692	0.707	0.940	0.795	0.952	0.658	0.904 (0.838–0.957)	0.373	0.933	0.772	0.828	0.683	0.957
	Testing	0.854 (0.771–0.925)	0.678	0.732	0.850	0.781	0.872	0.694	0.899 (0.783–0.986)	0.378	0.909	0.808	0.838	0.667	0.955
	Validation	0.830 (0.728–0.923)	0.596	0.698	0.913	0.748	0.964	0.477	0.783 (0.367-1.000)	0.407	0.667	1.000	0.957	1.000	0.952
Clinical	Training	0.671 (0.596–0.743)	0.665	0.693	0.583	0.652	0.735	0.533	0.619 (0.519–0.720)	0.400	0.467	0.772	0.667	0.519	0.733
	Testing	0.656 (0.492–0.766)	0.680	0.357	0.875	0.573	0.800	0.493	0.605 (0.449–0.767)	0.400	0.364	0.846	0.703	0.500	0.759
	Validation	0.543 (0.451–0.697)	0.520	0.855	0.217	0.778	0.800	0.556	0.400 (0.300–0.475)	-	1.000	0.000	0.130	0.130	-
Combined	Training	0.907 (0.866–0.943)	0.499	0.893	0.762	0.844	0.862	0.810	0.899 (0.832–0.951)	0.377	0.933	0.772	0.828	0.683	0.957
	Testing	0.852 (0.767–0.925)	0.670	0.768	0.825	0.792	0.860	0.717	0.895 (0.787–0.979)	0.457	0.909	0.808	0.838	0.667	0.955
	Validation	0.832 (0.729–0.925)	0.564	0.711	0.913	0.758	0.964	0.488	0.792 (0.400-1.000)	0.387	0.667	1.000	0.957	1.000	0.952

GBM glioblastoma, IDH-mut Isocitrate dehydrogenase-mutant, IDH-wt Isocitrate dehydrogenase wild-type, AUC area under curve, CI confidence interval, SEN sensitivity, SPE specificity, ACC accuracy, PPV Positive Predictive Value, NPV Negative Predictive Value, T1C contrast-enhanced T1-weighted imaging, T2F T2-weighted imaging fluid attenuated inversion recovery, ML machine learning

Construction of the ML model

A total of 1688 radiomics features were extracted. After feature selection and reduction, 11 T1C edema, 14 T1C tumor, and 10 T2F tumor features for Task 1 and 9 T1C edema, 6 T1C tumor, and 3 T2F tumor features for Task 2 were retained. These features were then amalgamated separately, resulting in final sets of 35 and 18 features for constructing combined-sequence ML models for the two tasks.

Among the ML models, the combined-sequence models achieved the highest AUCs for Task 1 (grade 4 vs. GBM; training set = 0.902, testing set = 0.854, and validation set = 0.830) and Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}; training set = 0.904, testing set = 0.899, and validation set = 0.783) and were therefore selected as the optimal ML models (Fig. 4A-C, G-I, and Table 3). The Rad-score was then calculated from the optimal ML model for each task.

Construction of the combined model and nomogram

In Task 1 (grade 4 vs. GBM), the combined model had the highest AUC in the training (0.907) and validation set (0.832), but was slightly lower than the optimal ML model in the testing set (0.852). In Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), the combined model exhibited lower AUC values than the optimal ML model in the training set (0.899) and testing set (0.895), though it attained the highest AUC in the validation set (0.792), as shown in Fig. 4A-C and G-I.

The nomogram based on the combined model further facilitates individualized discrimination between grade 4 astrocytoma and GBM (Fig. 5A). Their representative MR images and case nomograms are shown in Fig. 5B-E.

Model evaluation and comparison

DeLong’s test demonstrated that the optimal ML model and the combined model had significantly higher AUCs than the clinical model in all sets for Task 1 and in the training and testing sets for Task 2. For Task 1 (grade 4 vs. GBM), the differences were statistically significant in the training set, testing set, and validation set (all p values < 0.001). For Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), the optimal ML model and combined model showed significant improvements over the clinical model in both the training set (all p < 0.001) and testing set (all p < 0.01). In the validation set, the clinical model showed substantially lower AUC values (differences of 0.383 and 0.392) compared to the optimal ML model (p = 0.089) and combined model (p = 0.063), respectively, though these differences did not reach statistical significance. The AUC differences between the combined model and the optimal ML model were not statistically significant in the training set (Task 1: p = 0.428, Task 2: p = 0.716), testing set (Task 1: p = 0.887, Task 2: p = 0.856), or validation set (Task 1: p = 0.889, Task 2: p = 0.842). DeLong’s test results are shown in Fig. 4D-F and J-L.

The combined model demonstrated good calibration in all sets for Task 1 (grade 4 vs. GBM, Fig. 6A-C) and in the training (Fig. 6G) and testing sets (Fig. 6H) for Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}). However, moderate calibration was observed in the validation set for Task 2 (Fig. 6I). Decision curve analysis (DCA) indicated that the combined model and the optimal ML model provided significantly better net benefits in predicting glioma molecular subtypes compared to the clinical model. For Task 1, threshold probabilities were 0.12 to 0.92 and 0.02 to 0.96 (training set, Fig. 6D), 0.18 to 0.86 and 0.06 to 0.92 (testing set, Fig. 6E), and 0.36 to 0.98 and 0.10 to 0.98 (validation set, Fig. 6F). For Task 2, threshold probabilities ranged from 0.06 to 0.70 and 0.02 to 0.98 (training set, Fig. 6J), from 0.06 to 0.62 and 0.02 to 0.96 (testing set, Fig. 6K), and from 0.08 to 0.98 and 0.10 to 0.98 (validation set, Fig. 6L).

The combined model offered a limited net benefit compared to the optimal ML model, which offered greater net benefits across multiple threshold ranges.
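The net benefit plotted in a decision curve weighs true positives against false positives at a given threshold probability p_t: net benefit = TP/n − (FP/n) × p_t / (1 − p_t). The sketch below computes it for a model and for the "treat all" reference strategy; the labels and predicted probabilities are hypothetical, and the function names are chosen for this example.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling a case positive when its predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that calls every case positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = positive class) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.7, 0.4]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)  # 0.25 0.0
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the "treat all" curve and the "treat none" line at zero.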

We used six additional ML methods to evaluate the stability and reliability of our combined model. In both tasks, the RF model demonstrated superior performance with AUCs of 1.000 and 0.996 in the training sets, while NB and XGBoost showed slightly lower performance. The remaining models performed similarly, with AUCs ranging from 0.903 to 0.919 (Task 1: grade 4 vs. GBM) and 0.922 to 0.943 (Task 2: IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), respectively (Fig. 7A, G, D, J). In the Task 1 testing set, models showed comparable performance (AUC range: 0.851 to 0.882; Fig. 7B, H), with DeLong’s test revealing that only the MLP vs. SVM comparison reached significance (p = 0.04; all other p > 0.05; Fig. 7N). In the Task 2 testing set, excluding XGBoost (AUC = 0.897), models demonstrated consistent discrimination (AUC range: 0.944 to 0.976; Fig. 7E, K) without significant differences (all p > 0.05; Fig. 7Q). Validation set results showed AUCs of 0.790–0.832 for Task 1 (Fig. 7C, I). In Task 2, while RF demonstrated lower discrimination (AUC = 0.675; Fig. 7F, L), other models maintained AUCs of 0.700 to 0.833, with no significant differences detected by DeLong’s test (all p > 0.05; Fig. 7R).

Fig. 7 — AUCs for the training (A, D, G, J), testing (B, E, H, K), and validation sets (C, F, I, L) of the combined model constructed by 7 classifiers. DeLong’s test p-value heat maps of the training (M, P), testing (N, Q), and validation sets (O, R) for the two tasks. ROC: receiver operating characteristic; RF: random forest; SVM: support vector machines; NB: naive Bayes; XGBoost: extreme gradient boosting; LDA: linear discriminant analysis; MLP: multi-layer perceptrons; LR: logistic regression.
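For readers unfamiliar with the pairwise comparison behind the heat maps, the sketch below shows one way a two-sided DeLong test for two correlated AUCs can be computed. It is a plain-Python illustration under stated assumptions: both classifiers are scored on the same cases, the simple quadratic-time formulation is used, and all names and toy scores are hypothetical. It is not the software used in the study.

```python
import math


def _placements(pos, neg):
    """Return per-positive and per-negative placement values (V10, V01)."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, p) for two score lists on the same cases."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases per class")
    v10_a, v01_a = _placements([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _placements([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of the AUC difference from the two placement covariance terms.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        # Degenerate case: the variance estimate is zero, so the test is undefined.
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy example (hypothetical scores, not study data).
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1]
model_b = [0.6, 0.9, 0.3, 0.5, 0.7, 0.4, 0.2, 0.1]
print(delong_test(y_toy, model_a, model_b))
```

With small samples such as this toy example the normal approximation is rough, which is one reason a non-significant p-value should not be read as evidence that two classifiers are equivalent.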

Survival analysis

Using the training set cutoff values from the combined model for the two tasks (0.499 and 0.377), patients were divided into high-risk and low-risk groups. The average OS of low-risk patients was significantly longer than that of high-risk patients in both Task 1 (grade 4 vs. GBM; 874 vs. 487 days) and Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}; 1316 vs. 627 days), reflecting the different prognoses associated with the molecular subtypes. KM analysis revealed significant differences between these groups for both the combined model and the molecular subtype in Task 1 and Task 2 (all p < 0.001). The Z-test indicated no statistically significant difference in prognostic value between the molecular subtype and the combined model in either task (p = 0.964 and p = 0.746), as illustrated in Fig. 8.

Fig. 8 — Kaplan-Meier survival analysis of the combined model for the two tasks. The combined model (dotted line) effectively stratified Task 1 and Task 2 cases into high-risk (red line) and low-risk (green line) groups, with significant prognostic differences. The combined model performed comparably to the molecular subtype (solid line) in both tasks.
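The stratification step can be illustrated with a small sketch: dichotomize patients at a score cutoff, then compute a Kaplan-Meier curve per group. The cutoff 0.499 is the Task 1 value reported above; everything else is an assumption for illustration, including the direction of the split (scores at or above the cutoff are labeled high-risk), the function names, and the toy scores and follow-up data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def split_by_cutoff(scores, cutoff):
    """Label a patient high-risk when the model score reaches the cutoff."""
    return ["high" if s >= cutoff else "low" for s in scores]


CUTOFF_TASK1 = 0.499  # training-set cutoff reported in the text for Task 1

# Hypothetical model scores, OS in days, and death indicators (not study data).
scores = [0.81, 0.12, 0.66, 0.30, 0.95, 0.45]
os_days = [310, 1200, 450, 980, 200, 760]
died = [1, 0, 1, 1, 1, 0]

groups = split_by_cutoff(scores, CUTOFF_TASK1)
for label in ("high", "low"):
    t = [d for d, g in zip(os_days, groups) if g == label]
    e = [x for x, g in zip(died, groups) if g == label]
    print(label, kaplan_meier(t, e))
```

The group-wise curves would then be compared with a log-rank test, as described in the Methods.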

Discussion

This study constructed two machine learning tasks to predict molecular subtypes of 2021 WHO grade 4 glioma using multiparametric MRI, clinical-radiological characteristics, and their combination. Our findings indicate that the ML model performed well in distinguishing astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and in discriminating astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4. Additionally, the combined model effectively stratified Task 1 (grade 4 vs. GBM) and Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}) cases into high-risk and low-risk groups according to OS, with prognostic performance comparable to that of the molecular subtype.

Predictive value of the clinical model

Our study identified age and MGMTmet status as significant predictors for distinguishing astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) (Task 1: grade 4 vs. GBM), with AUCs of 0.671 (training set), 0.656 (testing set), and 0.543 (validation set). Similarly, ECM and deep white matter invasion were significant predictors for differentiating astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4 (Task 2: IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), with AUCs of 0.619 (training set), 0.605 (testing set), and 0.400 (validation set).

GBM, IDH-wt (WHO 2021) patients were older on average than astrocytoma, CNS WHO grade 4 patients (56.5 vs. 50.9 years), and astrocytoma, IDH-wt, CNS WHO grade 4 patients were older than astrocytoma, IDH-mut, CNS WHO grade 4 patients (53.4 vs. 45.0 years), consistent with a previous study [18]. This suggests that older age may be associated with higher malignant potential. Previous studies have indicated that younger age is associated with better prognosis, whereas older age is linked to poor survival in adult glioma patients [19, 20]; our results indirectly support this. Additionally, a higher proportion of MGMTmet was observed in astrocytoma, CNS WHO grade 4 than in GBM, IDH-wt (WHO 2021) (67.3% vs. 47.8%), suggesting that MGMTmet may be associated with less aggressive tumor behavior. Previous studies have shown that MGMTmet is linked to an improved response to temozolomide and longer OS [2, 21, 22], consistent with our findings. Astrocytoma, IDH-wt, CNS WHO grade 4 patients exhibited higher rates of ECM (79.6% vs. 59.1%) and lower rates of deep white matter invasion (30.1% vs. 45.5%) than astrocytoma, IDH-mut, CNS WHO grade 4 patients.

Despite these findings, the clinical model demonstrated limited predictive power in both tasks, underscoring the difficulty of relying solely on clinical-radiological features for precise molecular subtyping of glioma under the 2021 WHO classification. This limitation is likely due to the inherent heterogeneity of glioma, where clinical and radiological characteristics may not fully capture the underlying molecular alterations.

Predictive value of the ML model

In our study, the ML model significantly outperformed the clinical models in distinguishing astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and in discriminating astrocytoma, IDH-mut, CNS WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4. The optimal ML model for both tasks had strong predictive performance in the training (AUC = 0.902 and 0.904), testing (0.854 and 0.899), and validation (0.830 and 0.783) sets. Specifically, it achieved positive predictive values (PPV) of 0.952 (training), 0.872 (testing), and 0.964 (validation) for GBM, IDH-wt (WHO 2021) identification (Task 1: grade 4 vs. GBM), with corresponding negative predictive values (NPV) of 0.957, 0.955, and 0.952 for excluding IDH-mut subtypes (Task 2: IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}). These metrics correspond to a PPV above 85% for GBM prediction in all three sets and an NPV above 95% in the training and testing sets for ruling out astrocytoma, IDH-mut, CNS WHO grade 4 cases, supporting the model's potential clinical utility for molecular subtyping. Currently, only Wei et al. [23] have conducted a relevant study, in which they developed a subregion-based MRI RadioFusionOmics model to discriminate between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021), achieving AUCs of 0.976 and 0.974 for the training and validation cohorts, respectively, consistent with our findings. However, we further stratified astrocytoma, CNS WHO grade 4 into IDH-mut and IDH-wt subtypes and evaluated the prognostic value of the combined model. Previous studies employing radiomics or ML models to distinguish GBM, IDH-mut, WHO grade 4 from GBM, IDH-wt (WHO 2021) were based on the 2016 WHO CNS criteria [24–26]. To our knowledge, no study has yet addressed differentiating astrocytoma, IDH-mut, 2021 WHO grade 4 from astrocytoma, IDH-wt, CNS WHO grade 4 reclassified as grade 4 astrocytoma under the 2021 WHO CNS criteria.
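As a reminder of how PPV and NPV are derived, the sketch below computes both from binary labels and predictions. It is a generic illustration: the function name and the toy data are hypothetical, and which class counts as "positive" (for example, GBM in Task 1) is an assumption made by the caller.

```python
def predictive_values(y_true, y_pred):
    """Return (PPV, NPV) from binary ground truth and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    # PPV: fraction of positive calls that are correct.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: fraction of negative calls that are correct.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Toy labels and predictions (illustrative only, not study data).
truth = [1, 1, 1, 0, 0, 1, 0, 0]
calls = [1, 1, 0, 0, 0, 1, 1, 0]
print(predictive_values(truth, calls))
```

Unlike sensitivity and specificity, both values depend on class prevalence in the evaluated cohort, so they are not directly comparable between cohorts with different subtype proportions.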

Our multiparametric MRI includes T1C edema, T1C tumor, and T2F tumor. Apart from the optimal ML model and the combined model, T1C edema showed the highest AUC in both tasks in the training (0.843 and 0.837) and testing (0.829 and 0.787) sets. In the validation sets of Task 1 (grade 4 vs. GBM) and Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), T1C edema achieved AUCs above 0.7 (0.715 and 0.717), though slightly lower than those of T1C tumor or T2F tumor. The peritumoral region adjacent to GBM, a mix of infiltrative tumor and vasogenic edema, is often the site of recurrence [16, 27]. Thus, both tumor and edema regions in grade 4 glioma reflect tumor heterogeneity. Although peritumoral edema was not a significant predictor of the 2021 glioma molecular subtypes, it was more frequent in GBM, IDH-wt (WHO 2021) than in astrocytoma, CNS WHO grade 4 (98.9% vs. 92.5%). This suggests that peritumoral edema may capture heterogeneity between astrocytoma, CNS WHO grade 4 and GBM, IDH-wt (WHO 2021), consistent with a previous study [23].

Predictive value of the combined model

The combined model, integrating Rad-score and clinical-radiological characteristics, aimed to leverage the strengths of both approaches. However, its performance was similar to that of the optimal ML model. In Task 1 (grade 4 vs. GBM), the AUCs of the combined vs. optimal ML model for the training, testing, and validation sets were 0.907 vs. 0.902 (p = 0.428), 0.852 vs. 0.854 (p = 0.887), and 0.832 vs. 0.830 (p = 0.889), respectively. In Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}), the corresponding AUCs were 0.899 vs. 0.904 (p = 0.716), 0.895 vs. 0.899 (p = 0.856), and 0.792 vs. 0.783 (p = 0.842), respectively. These findings indicate that incorporating clinical-radiological features added little value.

The nomogram based on the combined model of Task 1 (grade 4 vs. GBM) offers a practical tool for individualized risk prediction, providing a visual and user-friendly representation to aid clinicians in stratifying patients and tailoring treatment strategies.

Model evaluation and comparison

We evaluated the combined model using various ML algorithms, including LR, SVM, MLP, LDA, RF, and NB. Most algorithms showed consistent performance across training, testing, and the external validation sets for both tasks. However, we observed moderate calibration and suboptimal decision curve performance specifically in the validation set for Task 2 (IDH-mut ^{grade 4} vs. IDH-wt ^{grade 4}). This limitation likely stems from several factors: (1) severe class imbalance in the validation cohort (3 mutant vs. 20 wildtype cases), which fundamentally challenges model calibration; (2) potentially inferior clinical outcomes in the external validation set that may not fully represent the target population.
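The effect of the small, imbalanced Task 2 validation cohort can be made concrete with a short calculation. With 3 mutant and 20 wild-type cases there are only 60 mutant/wild-type pairs, so a single misordered pair moves the AUC by 1/60. The sketch below demonstrates this with a pairwise AUC; treating the mutant class as "positive", the function name, and the scores are all hypothetical choices for illustration.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


n_mut, n_wt = 3, 20          # Task 2 validation cohort sizes reported in the text
pairs = n_mut * n_wt         # number of comparable pairs
step = 1.0 / pairs           # smallest possible change in AUC

# Hypothetical scores: perfectly separated classes.
pos = [0.90, 0.80, 0.70]
neg = [0.01 * k for k in range(1, 21)]
auc_clean = pairwise_auc(pos, neg)

# Move one negative above one positive: exactly one pair flips.
neg_shifted = neg[:-1] + [0.75]
auc_shifted = pairwise_auc(pos, neg_shifted)

print(pairs, round(step, 4), auc_clean, round(auc_shifted, 4))
```

With so few positive cases, AUC and calibration estimates in this cohort are coarse, which is consistent with the caution expressed about the Task 2 validation results.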

In addition, the notable performance degradation of RF (validation AUC = 0.675 vs. training AUC = 0.996) compared with the more stable linear models suggests algorithm-specific sensitivity to dataset shifts. These results emphasize that clinical application requires rigorous multicenter validation.

Survival analysis

We further evaluated the prognostic value of the combined model. It effectively stratified patients into high-risk and low-risk groups, with prognostic value comparable to that of the molecular subtype. This suggests that our model can predict glioma molecular subtype and holds prognostic value. While our study demonstrates the feasibility of MRI-based molecular subtyping, future research should focus on integrating radiomic biomarkers into clinical trial designs and surveillance protocols. Potential applications include stratifying patients for targeted therapy trials [28] or monitoring treatment response [29]. Nevertheless, such applications would require prospective validation and standardization of imaging protocols across institutions.

Limitations

This study has several limitations. Firstly, the retrospective nature and reliance on data from three institutions may introduce selection bias. Prospective validation on a larger multicenter cohort is necessary to confirm the findings. Secondly, different equipment and scanning parameters used across the three institutions, despite image preprocessing, may have influenced the radiomic features. Future studies should aim for uniform scanning parameters. Thirdly, the potential for segmentation bias in radiomic feature extraction should be acknowledged, particularly for first-order features, despite our demonstration of good intra-observer agreement (ICC ≥ 0.75). Fourthly, although this study included multi-sequence MRI of tumor and edema regions, future research should incorporate more functional imaging modalities, such as diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI), to further enhance the model’s predictive power. Fifthly, the lack of standardized extent of resection data represents an important constraint, as surgical radicality is known to significantly impact survival outcomes in high-grade glioma. Finally, our study focused on static preoperative imaging, which limits the ability to capture dynamic tumor changes. Future studies could integrate parametric response mapping (PRM) [30, 31] to longitudinally track spatial heterogeneity in multi-parametric MRI data.

Conclusion

In conclusion, the multiparametric MRI machine learning model effectively differentiated astrocytoma, CNS WHO grade 4 from GBM, IDH-wt (WHO 2021) and distinguished between the IDH-mut and IDH-wt subtypes of astrocytoma, CNS WHO grade 4. Furthermore, the model stratified glioma patients of various molecular subtypes into high-risk and low-risk groups according to OS, providing a proof-of-concept for non-invasive molecular subtyping.

Supplementary Information

Supplementary Material 1 (133.5 KB, docx).

Acknowledgements

Not applicable.

Abbreviations

ACC: Accuracy
AUC: Area under the curve
AMP: Amplification
CDKN: Cyclin-dependent kinase inhibitor
CE: Contrast-enhanced
CI: Confidence interval
CNS: Central nervous system
DCA: Decision curve analysis
DWI: Diffusion-weighted imaging
ECM: Edema crosses midline
EGFR: Epidermal growth factor receptor
FLAIR: Fluid attenuated inversion recovery
GBM: Glioblastoma
GLCM: Gray level co-occurrence matrix
GLDM: Gray level dependence matrix
GLRLM: Gray level run length matrix
GLSZM: Gray level size zone matrix
ICC: Intraclass correlation coefficient
IDH-mut: Isocitrate dehydrogenase mutant
IDH-wt: Isocitrate dehydrogenase wild-type
KM: Kaplan-Meier
KPS: Karnofsky performance status
LASSO: Least absolute shrinkage and selection operator
LDA: Linear discriminant analysis
LR: Logistic regression
mGBM: Molecular features of glioblastoma
MGMTmet: Methyl guanine methyl transferase promoter methylation
ML: Machine learning
MRI: Magnetic resonance imaging
MLP: Multi-layer perceptrons
NB: Naive bayes
NGTDM: Neighborhood gray tone difference matrix
NPV: Negative predictive value
OR: Odds ratio
OS: Overall survival
PFS: Progression-free survival
PPV: Positive predictive value
PWI: Perfusion-weighted imaging
RF: Random forest
ROC: Receiver operating characteristic
ROI: Region of interest
SEN: Sensitivity
SPE: Specificity
SVM: Support vector machines
TCM: Tumor crosses midline
TERT: Telomerase reverse transcriptase
T1WI: T1-weighted imaging
T2WI: T2-weighted imaging
VOI: Volume of interest
WHO: World Health Organization
XGBoost: Extreme gradient boosting

Authors’ contributions

Conception and design: Y T, H Z, Xiaochun Wang. Administrative support: H Z, Xiaochun Wang, Guoqiang Yang, Jiangfeng Du. Provision of study materials or patients: WJ X, YY L. Collection and assembly of data: WJ X, YY L, J Z, Guoqiang Yang, Jiangfeng Du. Data analysis and interpretation: WJ X, YY L, J Z, ZY Z, PX S. Manuscript writing: All authors. Final approval of manuscript: All authors.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 82371941, 82071893 to Yan Tan, and U21A20386 to Hui Zhang]; the Research Project Supported by Shanxi Scholarship Council of China [grant number 2023-186 to Yan Tan]; and Shanxi Province Higher Education “Billion Project” Science and Technology Guidance Project [grant number BYJL017 to Yan Tan].

Data availability

The datasets generated or analyzed during the study are not publicly available due to institutional regulations but are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate

The study was retrospective and anonymized, so the Ethics Committee of the First Hospital of Shanxi Medical University waived the requirement for written informed consent (No. KYYJ-2023-058).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Wenji Xu and Yangyang Li are co-first authors and contributed equally to this work.

Contributor Information

Hui Zhang, Email: zhanghui_mr@163.com.

Yan Tan, Email: tanyan123456@sina.com.

References

1. Louis DN, Perry A, Wesseling P, Brat DJ, Cree IA, Figarella-Branger D, et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro Oncol. 2021;23:1231–51.
2. Horbinski C, Berger T, Packer RJ, Wen PY. Clinical implications of the 2021 edition of the WHO classification of central nervous system tumours. Nat Rev Neurol. 2022;18:515–29.
3. Ramos-Fresnedo A, Pullen MW, Perez-Vega C, Domingo RA, Akinduro OO, Almeida JP, et al. The survival outcomes of molecular glioblastoma IDH-wildtype: a multicenter study. J Neurooncol. 2022;157:177–85.
4. Gritsch S, Batchelor TT, Gonzalez Castro LN. Diagnostic, therapeutic, and prognostic implications of the 2021 world health organization classification of tumors of the central nervous system. Cancer. 2022;128:47–58.
5. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, De Jong EEC, Van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14:749–62.
6. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, Van Stiphout RGPM, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–6.
7. Avanzo M, Wei L, Stancanello J, Vallières M, Rao A, Morin O, et al. Machine and deep learning methods for radiomics. Med Phys. 2020;47:e185–202.
8. Moodi F, Khodadadi Shoushtari F, Ghadimi DJ, Valizadeh G, Khormali E, Salari HM, et al. Glioma tumor grading using radiomics on conventional MRI: a comparative study of WHO 2021 and WHO 2016 classification of central nervous tumors. J Magn Reson Imaging. 2024;60:923–38.
9. Tian Q, Yan L, Zhang X, Zhang X, Hu Y, Han Y, et al. Radiomics strategy for glioma grading using texture features from multiparametric MRI. Magn Reson Imaging. 2018;48:1518–28.
10. Chiu F-Y, Le NQK, Chen C-Y. A multiparametric MRI-Based radiomics analysis to efficiently classify tumor subregions of glioblastoma: A pilot study in machine learning. JCM. 2021;10:2030.
11. Lin K, Cidan W, Qi Y, Wang X. Glioma grading prediction using multiparametric magnetic resonance imaging-based radiomics combined with proton magnetic resonance spectroscopy and diffusion tensor imaging. Med Phys. 2022;49:4419–29.
12. Ding J, Zhao R, Qiu Q, Chen J, Duan J, Cao X, et al. Developing and validating a deep learning and radiomic model for glioma grading using multiplanar reconstructed magnetic resonance contrast-enhanced T1-weighted imaging: a robust, multi-institutional study. Quant Imaging Med Surg. 2022;12:1517–28.
13. Vijithananda SM, Jayatilake ML, Gonçalves TC, Rato LM, Weerakoon BS, Kalupahana TD, et al. Texture feature analysis of MRI-ADC images to differentiate glioma grades using machine learning techniques. Sci Rep. 2023;13:15772.
14. Xing X, Zhu M, Chen Z, Yuan Y. Comprehensive learning and adaptive teaching: distilling multi-modal knowledge for pathological glioma grading. Med Image Anal. 2024;91:102990.
15. Malik N, Geraghty B, Dasgupta A, Maralani PJ, Sandhu M, Detsky J, et al. MRI radiomics to differentiate between low grade glioma and glioblastoma peritumoral region. J Neurooncol. 2021;155:181–91.
16. Szekeres D, Jetty SN, Soni N. The role of multiparametric MRI in diagnosing and grading glioma. Neurol India. 2023;71:1274–5.
17. Naser MA, Deen MJ. Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med. 2020;121:103758.
18. Lee D, Riestenberg RA, Haskell-Mendoza A, Bloch O. Diffuse astrocytic glioma, IDH-Wildtype, with molecular features of glioblastoma, WHO grade IV: A single-institution case series and review. J Neurooncol. 2021;152:89–98.
19. Weller M, Van Den Bent M, Preusser M, Le Rhun E, Tonn JC, Minniti G, et al. EANO guidelines on the diagnosis and treatment of diffuse gliomas of adulthood. Nat Rev Clin Oncol. 2021;18:170–86.
20. Park YW, Kim S, Park CJ, Ahn SS, Han K, Kang S-G, et al. Adding radiomics to the 2021 WHO updates may improve prognostic prediction for current IDH-wildtype histological lower-grade gliomas with known EGFR amplification and TERT promoter mutation status. Eur Radiol. 2022;32:8089–98.
21. Agarwal A, Edgar MA, Desai A, Gupta V, Soni N, Bathla G. Molecular GBM versus histopathological GBM: radiology-pathology-genetic correlation and the new WHO 2021 definition of glioblastoma. AJNR Am J Neuroradiol. 2024;45:1006–12.
22. Zeng C, Song X, Zhang Z, Cai Q, Cai J, Horbinski C, et al. Dissection of transcriptomic and epigenetic heterogeneity of grade 4 gliomas: implications for prognosis. Acta Neuropathol Commun. 2023;11:133.
23. Wei R, Lu S, Lai S, Liang F, Zhang W, Jiang X, et al. A subregion-based radiofusionomics model discriminates between grade 4 Astrocytoma and glioblastoma on multisequence MRI. J Cancer Res Clin Oncol. 2024;150:73.
24. Pasquini L, Napolitano A, Tagliente E, Dellepiane F, Lucignani M, Vidiri A, et al. Deep learning can differentiate IDH-Mutant from IDH-wild GBM. J Pers Med. 2021;11:290.
25. Calabrese E, Villanueva-Meyer JE, Cha S. A fully automated artificial intelligence method for non-invasive, imaging-based identification of genetic alterations in glioblastomas. Sci Rep. 2020;10:11852.
26. Kandalgaonkar P, Sahu A, Saju AC, Joshi A, Mahajan A, Thakur M, et al. Predicting IDH subtype of grade 4 Astrocytoma and glioblastoma from tumor radiomic patterns extracted from multiparametric magnetic resonance images using a machine learning approach. Front Oncol. 2022;12:879376.
27. Cheng J, Liu J, Yue H, Bai H, Pan Y, Wang J. Prediction of glioma grade using intratumoral and peritumoral radiomic features from multiparametric MRI images. IEEE/ACM Trans Comput Biol Bioinform. 2022;19:1084–95.
28. Liu C, Liu Y, Lin H, Zhang C, Zhang B, Song H, et al. Multi-omics landscape of alternative splicing in diffuse midline glioma reveals immune- and neural-driven subtypes with implications for spliceosome-targeted therapy. Front Immunol. 2025;16:1587009.
29. Zhou Q, Xue C, Ke X, Zhou J. Treatment Response and Prognosis Evaluation in High-Grade Glioma: An Imaging Review Based on MRI. J Magn Reson Imaging. 2022;56:325–40.
30. Lausch A, Yeung TP, Chen J, Law E, Wang Y, Urbini B, et al. A generalized parametric response mapping method for analysis of multi-parametric imaging: A feasibility study with application to glioblastoma. Med Phys. 2017;44:6074–84.
31. Hoff BA, Lemasson B, Chenevert TL, Luker GD, Tsien CI, Amouzandeh G, et al. Parametric response mapping of FLAIR MRI provides an early indication of progression risk in glioblastoma. Acad Radiol. 2021;28:1711–20.

