Author manuscript; available in PMC: 2026 Feb 1.
Published in final edited form as: Comput Biol Med. 2024 Dec 4;185:109436. doi: 10.1016/j.compbiomed.2024.109436

Deep Learning-Based Overall Survival Prediction in Patients with Glioblastoma: An Automatic End-to-End Workflow Using Pre-Resection Basic Structural Multiparametric MRIs

Zi Yang 1,3,*, Aroosa Zamarud 2,*, Neelan J Marianayagam 2, David J Park 2, Ulas Yener 2, Scott G Soltys 2,3, Steven D Chang 2, Antonio Meola 2, Hao Jiang 4, Weiguo Lu 1, Xuejun Gu 3
PMCID: PMC11761382  NIHMSID: NIHMS2040028  PMID: 39637462

Abstract

Purpose:

Accurate and automated early survival prediction is critical for patients with glioblastoma (GBM) as their poor prognosis requires timely treatment decision-making. To address this need, we developed a deep learning (DL)-based end-to-end workflow for GBM overall survival (OS) prediction using pre-resection basic structural multiparametric magnetic resonance images (Bas-mpMRI) with a multi-institutional public dataset and evaluated it with an independent dataset of patients on a prospective institutional clinical trial.

Materials and methods:

The proposed end-to-end workflow includes a skull-stripping model, a GBM sub-region segmentation model and an ensemble learning-based OS prediction model. The segmentation model utilizes skull-stripped Bas-mpMRIs to segment three GBM sub-regions. The segmented GBM is fed into the contrastive learning-based OS prediction model to classify the patients into different survival groups. Our datasets include both a multi-institutional public dataset from Medical Image Computing and Computer Assisted Intervention (MICCAI) Brain Tumor Segmentation (BraTS) challenge 2020 with 235 patients, and an institutional dataset from a 5-fraction SRS clinical trial with 19 GBM patients. Each data entry consists of pre-operative Bas-mpMRIs, survival days and patient ages. Basic clinical characteristics are also available for SRS clinical trial data. The multi-institutional public dataset was used for workflow establishing (90% of data) and initial validation (10% of data). The validated workflow was then evaluated on the institutional clinical trial data.

Results:

Our proposed OS prediction workflow achieved an area under the curve (AUC) of 0.86 on the public dataset and 0.72 on the institutional clinical trial dataset when classifying patients into two OS classes, long-survivors (>12 months) and short-survivors (<12 months), despite the large variation in Bas-mpMRI protocols. In addition, as an intermediate result, the workflow provides detailed auto-segmentation of GBM sub-regions, with a whole-tumor Dice score of 0.91.

Conclusion:

Our study demonstrates the feasibility of employing this DL-based end-to-end workflow to predict the OS of patients with GBM using only the pre-resection Bas-mpMRIs. This DL-based workflow can be potentially applied to assist timely clinical decision-making.

Keywords: Survival prediction, Glioblastoma, Stereotactic radiosurgery, Deep learning

Introduction

Glioblastoma (GBM) represents the most prevalent malignant brain tumor in adults, with a reported incidence rate of 3.26 per 100,000 population1. Moreover, the incidence of GBM has risen in the last decade2. Despite advances in contemporary therapeutic modalities, GBM patients continue to face a poor prognosis, with a median overall survival (OS) of 16.6 months and a two-year median survival rate of 34%3,4. Given the poor prognosis, emphasis on early diagnosis, close monitoring, and prompt treatment decision-making is imperative as precise and automated survival prediction in the early stages can facilitate both treatment planning and post-treatment assessment.

Magnetic resonance imaging (MRI) is the foremost non-invasive imaging modality used for brain tumors, including GBM. Basic structural multiparametric MRI (Bas-mpMRI) sequences, such as T1-weighted, T1 contrast-enhanced (T1ce), T2-weighted, and T2-Fluid Attenuated Inversion Recovery (FLAIR), are routinely employed for the diagnosis, treatment, and follow-up of GBM5,6. While these Bas-mpMRI techniques may not provide the detailed quantitative functional information offered by advanced MRI sequences such as diffusion tensor imaging, their combination permits comprehensive analysis of diverse parameters that contain complementary information. Such analyses have yielded promising results for OS prediction in GBM patients, and multiple studies have reported the use of Bas-mpMRI for this task7–9.

One popular approach is radiomics, in which quantitative features are extracted from the tumor volume in radiographic images and subsequently fed into appropriate machine learning models to predict OS. Although the radiomics approach has proven effective in several studies9,10,31, its major drawback is the complicated and subjective hand-crafted feature extraction performed during image pre-processing. Furthermore, radiomic feature extraction requires tumor contours, which may not always be available in clinical cases: most GBM patients undergo surgical resection as the primary course of treatment, and detailed tumor delineation is typically not part of the protocol.

Recent advancements in deep learning (DL)-powered technology have opened new avenues for GBM OS prediction. Firstly, DL enables automated GBM segmentation, which relieves the contouring burden in OS prediction. More importantly, the DL-based OS prediction approach does not require manual feature extraction. Instead, DL models are trained to automatically explore complex features through vast computations11. Although several DL approaches have been developed5,12–15,32, to our knowledge no end-to-end DL-based OS prediction workflow has been developed and validated in clinical studies. To enable a fully automatic workflow that can be easily adopted for clinical studies, a DL-based brain image analysis platform and an ensemble learning-based GBM OS prediction model were developed and validated on the publicly available multi-institutional BraTS 2020 dataset16. We then evaluated the entire workflow in a cohort of patients from our institutional phase I/II clinical trial of 5-fraction GBM stereotactic radiosurgery (SRS), demonstrating the potential for clinical adoption.

Materials and Methods

Patient data

Public dataset

The first dataset used in this study, for establishing and initially validating the workflow, consists of 235 glioma patients from the public MICCAI BraTS 2020 database6,16,17. For each patient, the data include pre-operative Bas-mpMRI scans of glioblastoma with pathologically confirmed diagnosis, age, and OS (in days). All BraTS Bas-mpMRI scans include four imaging sequences (T1, T1ce, T2, and T2-FLAIR), which were acquired with different clinical protocols and various scanners at multiple institutions (n=16).

GBM SRS clinical trial dataset

This clinical trial dataset consists of a total of 30 patients enrolled on our institutional phase I/II clinical trial18. Patients older than 18 years with newly diagnosed, pathologically confirmed, supratentorial GBM were candidates for this institutional review board–approved prospective trial. Detailed inclusion/exclusion criteria and clinical trial management can be found at clinicaltrials.gov: NCT01120639. All the enrolled patients underwent surgical resection followed by 5-fraction SRS with 5-mm margins delivered to the resection cavity with concurrent temozolomide at our institution between 2010 and 2015. The dose escalation scheme in the SRS trial contains four levels (25 Gy, 30 Gy, 35 Gy, and 40 Gy) and uses a standard 3+3 design. Patients were divided into two arms based on the planning target volume (PTV) size: Group 1 had a PTV of 0.1 to 60 cc, and Group 2 had a PTV of 60 to 150 cc. Only 19 of these 30 patients had all four Bas-mpMRI sequences (T1, T1ce, T2, and T2-FLAIR) acquired for GBM diagnosis before resection and were therefore selected for this study. The Bas-mpMRIs were acquired during 2010–2015 with various scanners and protocols, and the majority were acquired at a resolution of ~2.5mm × 2.5mm × 5mm. In addition to the Bas-mpMRI scans, clinical data are available for these patients, including age, sex, tumor characteristics, clinical presentation such as Karnofsky Performance Scale (KPS) and Eastern Cooperative Oncology Group (ECOG) Performance Status, treatment characteristics, follow-up period, and clinical outcomes such as OS and progression-free survival (PFS). The Common Terminology Criteria for Adverse Events (CTCAE) grades 3–5 were used to define adverse events. Table 1 gives a summary of the basic demographic characteristics and SRS treatment parameters.

Table 1:

Basic demographic characteristics of the institutional SRS clinical trial dataset

Variables                          Numbers

Patient Characteristics
  Number of patients               19
  Age (years)
    Median                         66
    Range                          (24–79)
  Sex
    Female                         58% (11)
    Male                           42% (8)
  IDH-1 immunostaining
    Negative                       47% (9)
    Unknown                        53% (10)
  MGMT methylation status
    Negative                       63% (12)
    Positive                       21% (4)
    Unknown                        16% (3)
  Pre-op KPS
    Median                         80
    Range                          (50–90)
  Pre-op ECOG
    Median                         1
    Range                          (1–3)

Treatment Characteristics
  Treatment                        CK SRS
  Volume (cc)
    Median                         29.27
    Range                          (0.98–81.01)
  Prescription dose (Gy)
    Median                         35
    Range                          (18–40)
  Fractions
    Median                         5
    Range                          (1–5)
  Mean dose (Gy)
    Median                         39.61
    Range                          (20.32–46.37)
  Maximum dose (Gy)
    Median                         42.68
    Range                          (22.5–50)
  Isodose line (%)
    Median                         82
    Range                          (79–86)
  Conformality Index (CI)
    Median                         2.07
    Range                          (1.36–3.16)

Follow-up period (months)
    Median                         11.4
    Range                          (1.2–126.4)
CTCAE Grade
    Median                         1
    Range                          (0–2)

CK, CyberKnife; SRS, Stereotactic radiosurgery; IDH, isocitrate dehydrogenase; MGMT, O6-Methylguanine-DNA-methyltransferase; ECOG, Eastern Cooperative Oncology Group; CTCAE, Common Terminology Criteria for Adverse Events

End-to-end workflow overview

The proposed end-to-end workflow is illustrated in Figure 1. Raw images of four pre-operative Bas-mpMRI scans (T1, T1ce, T2, and T2-FLAIR) are utilized as the input. Firstly, all the input images are preprocessed on a brain image analysis platform19 to segment out detailed GBM sub-regions using in-house preprocessing algorithms and DL models. The segmented GBM is then fed into the OS prediction model to classify the patient into different OS classes.

Figure 1:

Overview of the proposed OS prediction end-to-end workflow.

Brain image analysis platform

As shown in Figure 1, all the raw image data were preprocessed on a brain image analysis platform19. The platform was initially developed to support the brain metastases SRS clinical workflow and was later expanded to image analysis of various disease sites in radiotherapy. In the backend, the platform is supported by conventional image preprocessing tools, such as image registration, cropping, and resampling, and by novel DL-based image segmentation tools. The frontend is implemented with WebGL techniques to display processed images for human visual inspection and interaction. In addition, an SQLite database is integrated with the platform for efficient data management. The server is hosted within our institutional secure network, which ensures that HIPAA requirements are met; it also provides a function for automatic data anonymization.

All Bas-mpMRIs used in this study were first imported into the platform to conduct the following image preprocessing steps: (1) Co-registering to the Montreal Neurological Institute (MNI) standard brain template with rigid registration; (2) Interpolating to the same isotropic resolution (1 mm³); (3) Skull stripping using an in-house DL model; (4) Normalizing the image intensity using z-score normalization to account for intensity variation among different scanning protocols; (5) DL-based GBM sub-region segmentation. Since the BraTS public dataset was already skull-stripped and resampled, only intensity normalization and sub-region segmentation were conducted on this dataset through the platform.
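The z-score normalization in step (4) can be sketched as follows. This is a minimal illustration, not the platform's code: the function name `zscore_normalize`, the use of a flattened voxel list, and the choice to compute the statistics only inside the brain mask are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def zscore_normalize(voxels, mask=None, eps=1e-8):
    """Z-score normalize a flat list of voxel intensities.

    Mean and standard deviation are computed over voxels where `mask` is True
    (assumed here to be the brain mask produced by skull stripping); voxels
    outside the mask are set to 0. `eps` guards against division by zero.
    """
    if mask is None:
        mask = [True] * len(voxels)
    inside = [v for v, m in zip(voxels, mask) if m]
    mean = fmean(inside)
    std = pstdev(inside)  # population standard deviation
    return [(v - mean) / (std + eps) if m else 0.0 for v, m in zip(voxels, mask)]


# Toy example: the first voxel is background and is left at zero.
normalized = zscore_normalize([0, 10, 20, 30], [False, True, True, True])
```

After this step each sequence has approximately zero mean and unit variance inside the mask, which is what makes intensities from different scanners and protocols comparable.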

Skull stripping model

The DL-based skull stripping model was built on a UNet architecture and trained, using a Dice coefficient loss, on data from 122 SRS patients from our department for whom skull contours were available. It takes the T1 MRI as the input to generate a brain mask. Because all the brain scans were co-registered to the same standard anatomical template, this brain mask can then be applied to the other Bas-mpMRI scans for skull removal.

GBM sub-region segmentation model

As part of the preprocessing for the OS prediction task, GBM sub-regions including the contrast-enhancing tumor (ET), peri-tumoral edema (ED), and the necrotic and non-enhancing tumor core (NCR/NET) are segmented on the skull-stripped pre-operative Bas-mpMRIs, using an in-house DL segmentation model embedded in the brain image analysis platform. The segmentation model adopted the state-of-the-art nnUNet architecture20 with a Dice coefficient loss. It was trained with the MICCAI BraTS 2020 public dataset to automatically delineate the ET, ED, and NCR/NET sub-regions, using four pre-resection Bas-mpMRI GBM scans (T1, T1ce, T2, and T2-FLAIR) as the input.

Ensemble learning based OS prediction model

After the GBM segmentation, all the 2D slices of the Bas-mpMRIs (T1ce, T2, and T2-FLAIR) that contained tumor core (NCR/NET and ET) were extracted and concatenated as the inputs to an ensemble learning based DL model for the OS prediction process. As shown in Figure 2, our ensemble model for GBM OS prediction consists of 3 parts: (1) A ResNet50-based Siamese network: The feature extractor utilizes the ResNet50 design to extract features from the input images, and the extracted features are fed into a CNN classifier to classify the input into the predefined OS categories. (2) A K-NN model: The extracted features of the input are also sent into a K-NN classifier, which conducts neighborhood analysis against the training set features to assign an OS class to the input. (3) Prediction ensemble: the final OS prediction is an ensemble of the predictions from both the Siamese network and the K-NN model. Each patient may have multiple slices utilized in this OS prediction process. For each patient, the OS label was taken as the majority-voting result of the predicted labels from all the slices involved.
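The patient-level majority vote described above can be sketched as below. The function name `patient_level_labels` and the input layout are hypothetical, and the tie-breaking rule (smallest class index wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter, defaultdict


def patient_level_labels(slice_predictions):
    """Aggregate slice-wise predictions into one OS label per patient.

    `slice_predictions` is an iterable of (patient_id, predicted_class) pairs,
    one pair per 2D slice. The most frequent class among a patient's slices
    becomes the patient label.
    """
    votes = defaultdict(list)
    for patient_id, label in slice_predictions:
        votes[patient_id].append(label)

    result = {}
    for patient_id, labels in votes.items():
        counts = Counter(labels)
        top = max(counts.values())
        # Assumption: ties are broken toward the smallest class index.
        result[patient_id] = min(l for l, c in counts.items() if c == top)
    return result


# Toy example with two hypothetical patients.
labels = patient_level_labels([("p1", 1), ("p1", 1), ("p1", 0), ("p2", 0), ("p2", 1)])
```

Voting over slices means a few misclassified slices do not flip the patient-level decision as long as most slices agree.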

Figure 2:

The workflow of the ensemble OS prediction model, which consists of a ResNet50-based Siamese network model and a K-NN model.

A Siamese network is a neural network with two twin feature extractor branches with the same weights and hyperparameters. During the training phase, the twin feature extractor branches work in tandem on two different input images to compute comparable output feature vectors to learn the inter-class differences. This contrastive learning scheme greatly increases the training inputs as the combination of any two images from the training set can be treated as one input, which can help to alleviate the limitations of the small dataset and to mitigate the concern of overfitting. In the testing phase, only one of the twin branches is used to extract the feature vector from every input image and send it to the subsequent CNN classifier. We implemented a contrastive loss Lcontrastive together with a cross-entropy loss LCE as the total loss Ltotal for training the entire ResNet50-based Siamese network, where Lcontrastive was for the feature extractor and LCE for the classifier:

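$$
L_{contrastive} = \sum_{i=1}^{P} \frac{1}{2}\left[(1-Y_i)\,\big(d(X_1,X_2)_i\big)^2 + Y_i\,\big\{\max\big(0,\ \lambda - d(X_1,X_2)_i\big)\big\}^2\right],
$$

$$
\text{with}\quad Y_i = \begin{cases} 0, & \text{if } (X_1,X_2)_i \text{ are the same class} \\ 1, & \text{otherwise} \end{cases}
\quad\text{and}\quad d(X_1,X_2)_i = \left\lVert F(X_{1i}) - F(X_{2i}) \right\rVert_2,
$$

$$
L_{CE} = -\sum_{c=1}^{n} l_c \log(p_c)
$$

$$
L_{total} = \alpha L_{contrastive} + (1-\alpha) L_{CE} + \gamma R
$$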

where P is the number of input pairs and λ is a designated threshold. Yi is the corresponding group label. d(X1,X2)i is the Euclidean distance between the feature maps F(X1i) and F(X2i) from the two branches of the Siamese network for the ith input image pair (X1,X2)i. Within LCE, n is the number of OS classes, lc is the ground truth indicator for the cth OS class, and pc is the predicted probability of the cth class. To combine LCE and Lcontrastive, α and γ are weighting parameters and R is a regularization term to avoid overfitting.
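The three loss terms can be traced numerically with the short sketch below. It is written in plain Python for readability rather than as the PyTorch training code; the function names and the small `eps` inside the logarithm are illustrative assumptions, and feature vectors are represented as plain lists.

```python
import math


def euclidean(f1, f2):
    """Euclidean distance d between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def contrastive_loss(pairs, margin):
    """Contrastive loss summed over pairs.

    `pairs` is a list of (f1, f2, y), where y = 0 if the two inputs belong to
    the same class and y = 1 otherwise; `margin` plays the role of lambda.
    """
    total = 0.0
    for f1, f2, y in pairs:
        d = euclidean(f1, f2)
        total += 0.5 * ((1 - y) * d ** 2 + y * max(0.0, margin - d) ** 2)
    return total


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot ground truth and class probabilities."""
    return -sum(l * math.log(p + eps) for l, p in zip(one_hot, probs))


def total_loss(l_contrastive, l_ce, alpha, gamma=0.0, reg=0.0):
    """Weighted combination of the two losses plus a regularization term."""
    return alpha * l_contrastive + (1 - alpha) * l_ce + gamma * reg
```

For a same-class pair the loss grows with the squared distance, pulling the features together; for a different-class pair the loss is non-zero only while the distance is below the margin, pushing the features apart until they are at least λ away.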

More details about our DL-based GBM OS prediction ensemble model can be found in our previous publication5.

Establishing the proposed workflow in the multi-institutional public dataset

To establish this proposed end-to-end workflow, we randomly split the total number of patients of MICCAI BraTS 2020 into ~9:1, with 212 patients’ Bas-mpMRIs used for training and validating DL models, while the remaining 23 patients were reserved as an independent holdout testing set. Notably, this dataset and the specific data partition were employed to establish both the GBM sub-region segmentation model and the OS prediction model.
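A patient-level random split of this kind can be sketched as follows. The function name, the fixed seed, and the integer patient identifiers are assumptions for illustration; the actual partition used in the study is not reproduced by this code.

```python
import random


def split_patients(patient_ids, n_test, seed=0):
    """Randomly split patient IDs into (train_val, holdout_test) lists.

    Splitting is done per patient so that all slices of one patient fall on
    the same side of the partition.
    """
    ids = sorted(patient_ids)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]


# 235 patients with 23 held out, matching the 212/23 partition in the text.
train_val_ids, test_ids = split_patients(range(235), n_test=23)
```

Splitting by patient rather than by slice matters here because the OS model is trained on 2D slices: a slice-level split would place neighboring slices of the same tumor in both training and testing sets.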

The OS prediction model was implemented using the PyTorch library. Transfer learning was adopted to initialize the ResNet50-based feature extractor with a set of parameters previously trained in our brain metastases false-positive reduction model21. The parameters of the CNN classifier were initialized with the “Xavier” algorithm. Further details of the training scheme and experimental design can be found in our previous publication5. Notably, in that previous work, we approached the GBM OS prediction task as classifying patients into three OS categories: (1) Short-survivors (<10 months), (2) Mid-survivors (between 10 and 15 months), and (3) Long-survivors (>15 months). The purpose of this stratification was to follow the guideline and evaluation scheme of the BraTS Challenge 2020 and to allow comparison with other studies, in order to validate the effectiveness and technical improvements of our proposed ensemble learning model. In this study, we changed the stratification to two OS categories, short-survivors (<12 months) and long-survivors (>12 months), and retrained the OS prediction model to be more suitable for clinical studies.
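The two stratifications can be written as simple threshold functions on survival days. The days-to-months conversion factor and the handling of values that fall exactly on a threshold are assumptions of this sketch; the text defines the classes only with strict inequalities.

```python
# Assumption: average month length; the conversion used in the study is not stated.
DAYS_PER_MONTH = 365.25 / 12


def os_class_2(days):
    """Two-class label: 0 = short-survivor (<12 months), 1 = long-survivor (>12 months).

    Assumption: a survival of exactly 12 months is assigned to the short class.
    """
    months = days / DAYS_PER_MONTH
    return 1 if months > 12 else 0


def os_class_3(days):
    """Three-class label following the BraTS 2020 scheme.

    0 = short (<10 months), 1 = mid (10 to 15 months), 2 = long (>15 months).
    """
    months = days / DAYS_PER_MONTH
    if months < 10:
        return 0
    if months <= 15:
        return 1
    return 2
```

For example, a patient surviving 400 days (about 13.1 months) is a mid-survivor under the 3-class scheme and a long-survivor under the 2-class scheme.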

Applying the proposed workflow to the institutional 5-fraction SRS clinical trial dataset

To assess the applicability and efficacy of the proposed workflow in clinical studies, we collected institutional GBM 5-fraction SRS clinical trial data, as described earlier, and evaluated the established OS prediction workflow on it. Firstly, we visually inspected the DL-based skull-stripping results to avoid introducing any errors. Next, all the auto-segmented GBM sub-regions (ET, ED, NCR/NET) were visually inspected and approved by neurosurgery fellows. After obtaining valid segmentation results, 2D slices of the Bas-mpMRIs that contained the tumor core were automatically extracted and concatenated as inputs to the ensemble learning OS prediction model, which classifies patients into short- or long-term survivors.

Statistical Analysis of the SRS clinical trial dataset

After applying the proposed workflow to the SRS trial dataset, we further analyzed the correlation between clinical features and OS classes as well as the correlation between clinical features and prediction accuracy. A total of 15 clinical features were pre-selected, including gender, age, treatment volume, mean dose, prescription dose, max dose, CTCAE, pre- and post-surgery KPS, pre- and post-surgery ECOG, pre- and post-RT KPS, and pre- and post-RT ECOG. We used ANOVA to analyze continuous clinical features, such as age, and the Chi-square test to analyze categorical ones, such as gender.
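As an illustration of the Chi-square test for a categorical feature, the sketch below computes the Pearson statistic of a contingency table and, for the special case of one degree of freedom (a 2 × 2 table), the corresponding p-value. The function names are hypothetical, no continuity correction is applied, and the closed-form p-value is valid only for one degree of freedom; in practice a statistics library would normally be used.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_value_1dof(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical 2 x 2 table (rows: feature level, columns: OS class).
toy_stat = chi_square_statistic([[10, 0], [0, 10]])
```

With one degree of freedom, χ2 values of 1.17 and 0.70 map to p-values of about 0.28 and 0.40, the same pairs reported for CTCAE in the Results; that the test there had one degree of freedom is an inference from this agreement, not something the text states.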

Results

Auto-segmentation validation and inspection

The auto-segmentation model was first validated using the holdout testing dataset of MICCAI BraTS 2020. After visual inspection, we calculated the Dice scores (mean ± standard deviation) between the manual contours and the auto-segmentations for the 23 testing patients. The Dice scores for whole tumor (including all three sub-regions NCR/NET, ET and ED), tumor core (NCR/NET and ET) and ET were 0.91±0.09, 0.89±0.14, and 0.87±0.09, respectively, which are clinically acceptable. The top panel of Figure 3 compares the segmentation with the ground truth (manual) contour for a sample patient from the BraTS 2020 testing dataset, overlaid on the T1ce image. For the SRS clinical dataset, the segmentation results were visually inspected because no ground truth contours are available. The segmentations of all 19 patients were approved by neurosurgery fellows with no obvious defects. Auto-segmentation results for a sample patient from the clinical trial dataset are illustrated in the bottom panel of Figure 3 with T2 and T1ce images.
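The three Dice scores can be computed from label maps as sketched below. The integer label values (1 = NCR/NET, 2 = ED, 4 = ET) follow the BraTS 2020 annotation convention, which the text does not spell out and is assumed here; the function names and the flattened-list representation are illustrative.

```python
def dice(pred_mask, truth_mask):
    """Dice coefficient between two flattened binary masks."""
    pred_mask = list(pred_mask)
    truth_mask = list(truth_mask)
    intersection = sum(1 for p, t in zip(pred_mask, truth_mask) if p and t)
    size = sum(pred_mask) + sum(truth_mask)
    # Convention used here: two empty masks count as perfect agreement.
    return 1.0 if size == 0 else 2.0 * intersection / size


# Assumed BraTS 2020 labels: 1 = NCR/NET, 2 = ED, 4 = ET.
REGIONS = {
    "whole_tumor": {1, 2, 4},
    "tumor_core": {1, 4},
    "enhancing_tumor": {4},
}


def region_dice(pred_labels, truth_labels):
    """Dice per composite region from flattened integer label maps."""
    return {
        name: dice(
            [p in labels for p in pred_labels],
            [t in labels for t in truth_labels],
        )
        for name, labels in REGIONS.items()
    }


# Toy example: one edema voxel is mislabeled as enhancing tumor.
scores = region_dice([0, 1, 2, 4, 4, 0], [0, 1, 2, 2, 4, 0])
```

In the toy example the whole-tumor Dice stays at 1.0 because the mislabeled voxel is still inside the tumor, while the tumor-core and ET scores drop to 0.8 and about 0.67, which shows why the nested regions are reported separately.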

Figure 3:

(a) ground truth vs. (b) auto-segmentation results on a sample patient of BraTS dataset; Auto-segmentation result on the pre-resection (c) T2 image and (d) T1ce image of a sample patient of SRS clinical dataset.

OS prediction in the multi-institutional public dataset

As described earlier, the OS prediction model from our previous study classifies GBM patients into three OS classes according to the BraTS Challenge 2020 guidelines. For this study, we modified the OS prediction model to classify patients into 2 OS classes to establish the proposed workflow for clinical settings. Here we present the OS prediction results using both models to provide more comprehensive information. Specifically, we use the original 3-class model to compare with other studies and demonstrate its effectiveness, while the new 2-class model is intended for clinical adoption and application.

Table 2 shows patient-wise accuracy (ACC) and area under the curve (AUC) of the training set and the independent testing set using both 3-class and 2-class models. Figure 4 plots the ROC (Receiver Operating Characteristic) curves and the confusion matrix of the testing set from both models.

Table 2:

The patient-wise accuracy (ACC) and area under the curve (AUC) of the OS prediction using both 3-class and 2-class models in the training set and the independent testing set of the multi-institutional public dataset, and the institutional SRS clinical trial dataset.

       3-class                    2-class
       Train    Test     SRS      Train    Test     SRS
AUC    0.95     0.81     0.69     0.98     0.86     0.72
ACC    89.9%    65.2%    52.6%    99.5%    78.3%    68.4%

Figure 4:

OS prediction results of the BraTS independent testing set using the original 3-class model ((a) ROC curve, (b) confusion matrix), and the new 2-class model ((c) ROC curve, (d) confusion matrix) implemented in the proposed workflow. (Pred: model prediction, Class: ground truth OS class)

Our original 3-class model achieved an ACC of 65.2% and AUC of 0.81 in the independent testing set, while our proposed workflow using the new 2-class model achieved an ACC of 78.3% and AUC of 0.86 in the same testing set.
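For reference, the two metrics for the binary case can be computed as in the sketch below. The AUC is obtained through its rank interpretation (the probability that a randomly chosen long-survivor receives a higher score than a randomly chosen short-survivor), which is equivalent to the area under the ROC curve; the function names and toy scores are illustrative, and how the 3-class AUC was aggregated is not covered here.

```python
def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted class matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Binary ROC AUC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical scores for four patients.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

As a consistency check on the reported accuracies, 18 of 23 correct predictions gives 78.26% and 15 of 23 gives 65.22%; these counts are inferred from the percentages rather than stated in the text.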

OS prediction in the institutional 5-fraction SRS clinical trial dataset

We directly applied the proposed workflow, which was established on the BraTS 2020 dataset, to predict OS for the 19 patients enrolled on a clinical trial of 5-fraction SRS. Table 2 shows the corresponding AUC and ACC in OS prediction using both 3-class and 2-class models. The original 3-class model achieved an ACC of 52.6% and AUC of 0.69. Our proposed workflow using the new 2-class model achieved an ACC of 68.4% and AUC of 0.72 in the same SRS clinical set, demonstrating its applicability in clinical studies. The corresponding ROC curves and confusion matrices are plotted in Figure 5.

Figure 5:

OS prediction results of the institutional SRS clinical trial using the original 3-class model ((a) ROC curve, (b) confusion matrix), and the new 2-class model ((c) ROC curve, (d) confusion matrix) implemented in the proposed workflow. (Pred: model prediction, Class: ground truth OS class)

After obtaining the above prediction results, we performed the statistical analysis described in the Methods section. No statistically significant association was found between the clinical features and either the ground truth OS class or the model prediction within this SRS clinical trial. The CTCAE grade and the post-RT KPS score showed the closest, although still non-significant, association with both the patients’ ground truth OS class and the DL model OS prediction. In detail, CTCAE presents a χ2 score of 1.17 and p-value of 0.28 with the ground truth OS class, and a χ2 score of 0.70 and p-value of 0.40 with the DL model OS prediction. The post-RT KPS score presents an F-score of 1.82 and p-value of 0.19 with the ground truth class, and an F-score of 2.86 and p-value of 0.11 with the model prediction.

Discussion

GBM is a highly aggressive tumor associated with poor OS. The current standard for treating GBM is maximal surgical resection followed by concurrent chemoradiation22. To aid in surgical planning, it would be ideal for the surgeon to have a reliable method for estimating OS based on patient characteristics, especially image features. Similarly, radiation oncologists would greatly benefit from knowing the survival probabilities in order to optimize treatment regimens and target margins. For example, for patients with short expected survival, conventional 30-fraction radiation regimens may not be suitable, and for patients with long expected survival, a conservative margin might need to be considered to balance disease control and radiation toxicity for an optimal quality of life. However, accurately predicting GBM OS remains a challenging task, especially when only pre-operative basic structural multiparametric MRIs (Bas-mpMRI) are available.

Recent advancements in machine learning technology have significantly improved the field of medical image analysis. Many researchers have utilized novel machine learning methods to address the challenges of GBM OS prediction. Deep learning-based approaches have demonstrated the ability to automatically extract features predictive of survival from raw images, but this automation comes with a lack of interpretability of the extracted features and requires substantial computation and large datasets to avoid overfitting. In this study, we presented a novel DL-based end-to-end automatic GBM OS prediction workflow using basic structural multiparametric MRIs. The workflow automation is enabled by a comprehensive platform that integrates and automates image pre-processing, image segmentation and OS prediction.

The workflow is novel in both technical and clinical aspects. Technically, our OS prediction model does not require hand-crafted feature extraction, which makes it easier to apply in the clinic. Most current studies of GBM OS prediction use radiomics or radiomics combined with deep-learned features23,24. However, hand-crafted radiomic feature extraction usually requires complex preprocessing and is prone to generalization issues, especially with multi-parameter MRIs and varied image acquisition protocols. After feature extraction, feature engineering is often required to select important features and remove redundancy, which introduces additional processing steps with user-dependent operations23,24. Thus, survival analysis in these studies23,24 demands hand-crafted, user-dependent processing rather than a fully automated method such as the one we developed. Clinically, the developed platform provides a fully automated solution, which distinguishes it from most works25 that are still in the research stage.

Upon the completion of scanning, users can push the Bas-mpMRI images from the institutional picture archiving and communication system (PACS) to our platform server. Once the images are received, the platform automatically executes all the steps described in the workflow, including all the preprocessing and model inference. At the same time, users can access the platform via any web browser within the institutional network to review the images as well as the results from each step. The segmentation results can be exported as a DICOM RT Structure file for the subsequent planning procedure, and the other numerical results can be batch-exported as an Excel file for future analysis. In addition, using only Bas-mpMRIs rather than advanced MRIs allows the workflow to be applied retrospectively to clinical data, as these basic sequences are often available in routine clinical practice. We validated the workflow and tested its generalizability by first building and validating it on the BraTS multi-institutional public dataset and then applying it to an institutional SRS clinical trial. The reasonable performance achieved in both testing datasets shows the potential of the platform. Implementing our workflow and platform in clinical practice requires some computational resources, such as a dedicated server with a GPU, as well as periodic software maintenance to ensure data transmission between different systems. However, these computational requirements can be easily met, as most commercially available gaming GPUs, such as the RTX 20 and 30 series, can support the computation.

As previously mentioned, the segmentation of GBM sub-regions is an essential intermediate step in our proposed OS prediction workflow. Our segmentation model achieved Dice scores of 0.91±0.09 for the whole tumor, 0.89±0.14 for the tumor core (including NCR/NET and ET), and 0.87±0.09 for ET on a public dataset, which indicates high accuracy. Furthermore, when directly applied to an institutional clinical trial dataset, all the segmentation results passed inspection by neurosurgery fellows without obvious defects, indicating the segmentation model’s ability to generalize to other clinical studies.

The DL OS prediction model implemented in this workflow is a contrastive-learning based ensemble model from our previous work5. The novelty of this model comes from the following two aspects: firstly, it employs a 2D-based Siamese branch design and contrastive learning scheme to increase training inputs and enhance the network’s ability to identify interclass discriminative features. Secondly, image features from the training dataset are incorporated as a reference in the classification process, providing additional information to guide prediction alongside the CNN classifier.

In our previous research, we approached GBM OS prediction as a 3-class classification task, in line with the guidelines set by the BraTS Challenge. The Challenge organizers set the OS threshold values at <10 months, 10–15 months, and >15 months, with the aim of balancing the sample distribution for optimal prediction performance and accounting for uncertainties by including an intermediate class between long- and short-survival groups. However, these threshold values lack supporting clinical evidence and are therefore not well-suited for clinical applications. Consequently, in this study, we reformulated GBM OS prediction as a 2-class classification problem based on one-year survival, a common approach utilized in several studies26–28. This 2-class OS stratification is more generic and appropriate for clinical applications. We utilized the same DL architecture as in our previous work, but retrained and built the proposed workflow using this 2-class approach. In addition, we also tested the 3-class OS prediction model on both datasets to provide comprehensive information and enable comparison with other studies using the same public dataset.

Our 3-class OS prediction model achieved a prediction ACC of 65.22% and AUC of 0.81 on the testing set from the multi-institutional BraTS dataset. These results are reasonable and comparable to those reported by participants in the BraTS Challenge 2020, who used the same dataset10. However, most of the participating groups, including the top 3 performing teams10,12–15, approached this OS prediction task by extracting hand-crafted radiomics features from the tumor volumes, followed by feature selection and machine learning algorithms such as SVM. Their approach can therefore require extra preprocessing and introduce more uncertainty in generalization, especially when dealing with multi-modal MRIs. This performance comparison demonstrates that the design of our OS prediction model is promising for the early prediction of GBM OS using Bas-mpMRIs. Subsequently, we applied this 3-class model to the SRS clinical trial patients. The model achieved an ACC of 52.6% and AUC of 0.69. Although this ACC is not as high as the 65.22% achieved on the BraTS data, it is comparable to other reported results, such as 58.9% by Bommineni et al.12 and 57.9% by Ali et al.13, and surpasses the 48.3% reported by Rafi et al.14.

After validating the design and performance of the OS model using the original 3-class approach, we retrained the model as a 2-class OS prediction approach and implemented it into the proposed workflow. The established workflow achieved a prediction ACC of 78.26% and an AUC of 0.86 on the independent testing set from the multi-institutional BraTS dataset, which is also promising compared to other studies using the same 2-class approach, such as an AUC of 0.74 and ACC of 74% by Ben Ahmed et al. 15, and an AUC of 0.75 by Bakas et al. 9. This workflow was then further validated with the same SRS clinical trial dataset and achieved an ACC of 68.42% and an AUC of 0.72, demonstrating its applicability and generalization ability in clinical studies.
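
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how ACC and AUC can be computed for a 2-class predictor. The labels and scores are toy values invented for illustration, not study data, and the helper names are hypothetical. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = survived at least one year.
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]             # predicted probability of class 1
y_pred = [int(s >= 0.5) for s in scores]      # assumed decision threshold of 0.5

acc = accuracy(y_true, y_pred)   # 3 of 4 correct
auc = roc_auc(y_true, scores)    # 3 of 4 positive/negative pairs ordered correctly
```

On a cohort of 19 patients, each patient accounts for roughly 5.3 percentage points of ACC, so the 68.42% reported for the SRS data corresponds to 13 of 19 correct predictions. This granularity should be kept in mind when comparing ACC values across small test sets.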

The performance degradation observed in the SRS dataset, compared to the BraTS dataset, may be attributed to several factors. Firstly, the image quality of the SRS dataset is relatively lower because the patients were enrolled and scanned during 2010–2015, and the majority of MRIs were acquired at a lower resolution (~2.5 mm × 2.5 mm × 5 mm) than the BraTS MRIs, which had a resolution of 1 mm × 1 mm × 1 mm. Although the lower-resolution images were resampled to 1 mm × 1 mm × 1 mm as inputs, some information may have been lost at acquisition and cannot be recovered by resampling. Secondly, MRI intensities are known to be non-standardized and highly dependent on various factors, such as manufacturer, sequence type, and acquisition parameters 29. Although we applied z-normalization to the MRIs to mitigate intensity variability, such normalization may not fully compensate for the data variation, which could compromise model performance. To better address variation in data distribution, machine learning methods such as transfer learning and domain adaptation have been developed in the field of computer vision 30. In the future, we may adopt these methods to improve the performance of our workflow. Thirdly, the OS prediction model was trained with pre-operative MRI data and patient age only, because the publicly available BraTS challenge dataset provides only the imaging data and patient age together with OS, and our institutional SRS trial dataset was too small to allow us to fine-tune the model with clinical features. Therefore, the performance of our workflow on both the BraTS and SRS datasets is limited by the lack of comprehensive and critical clinical information, such as patients’ KPS and whether they received Chemo-RT, which have a great influence on patient OS.
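
The limitation of z-normalization noted above can be illustrated with a minimal sketch. It assumes that normalization is computed over non-zero (brain) voxels of a skull-stripped volume; the function name, the synthetic volume, and the simulated scanner differences are hypothetical and do not reproduce our preprocessing code. The sketch shows that z-normalization removes a linear gain and offset difference between two acquisitions exactly, but leaves a residual difference when the contrast relationship is nonlinear.

```python
import numpy as np


def z_normalize(volume, mask=None):
    """Zero-mean, unit-variance scaling over the voxels selected by `mask`."""
    volume = np.asarray(volume, dtype=np.float64)
    if mask is None:
        # Assumption: background voxels are exactly zero after skull-stripping.
        mask = volume != 0
    voxels = volume[mask]
    if voxels.size == 0 or voxels.std() == 0:
        raise ValueError("mask must select voxels with non-constant intensity")
    out = np.zeros_like(volume)
    out[mask] = (voxels - voxels.mean()) / voxels.std()
    return out


# Synthetic, strictly positive "brain" intensities (illustration only).
rng = np.random.default_rng(0)
brain = rng.gamma(shape=2.0, scale=50.0, size=(8, 8, 8)) + 1.0

scanner_a = brain
scanner_b = 3.0 * brain + 40.0      # linear gain/offset difference
scanner_c = 20.0 * np.sqrt(brain)   # nonlinear contrast difference

za, zb, zc = (z_normalize(v) for v in (scanner_a, scanner_b, scanner_c))

linear_gap = np.abs(za - zb).max()     # numerically zero
nonlinear_gap = np.abs(za - zc).max()  # clearly non-zero
```

Residual differences of the second kind are one reason why distribution-level methods such as domain adaptation may be needed in addition to per-image intensity normalization.
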
Moreover, the distribution of this critical information could differ considerably between the BraTS and our clinical SRS datasets, so the model trained on BraTS could yield biased, and potentially less accurate, predictions for our clinical SRS patients. To enhance the performance of our workflow, we aim in the future to incorporate critical clinical information as well as additional time-series image data and genomic data, if available. By including these data, we expect to refine and improve the predictive power of our model. In addition, this is a retrospective study and therefore has the inherent limitations typical of such designs. Our requirement that included patients had undergone SRS treatment and had Bas-mpMRI and OS data available restricted the sample size of our study. This may introduce selection bias and limit the extent to which our findings can be applied. Expanding the dataset to include multi-institutional clinical data would improve the model’s generalizability. However, this is difficult to achieve in a healthcare context because of HIPAA and data safety considerations. Federated learning is a potential approach for collaborating with other institutions to expand the current clinical data.
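
As a rough illustration of why federated learning is attractive under these constraints, the sketch below shows only the server-side aggregation step of federated averaging (FedAvg), in which each institution trains locally and shares model parameters rather than patient data. The function name, parameter names, cohort sizes, and parameter values are hypothetical; local training, communication, and privacy safeguards are omitted, and this is not part of the present workflow.

```python
import numpy as np


def federated_average(site_params, site_sizes):
    """Sample-size-weighted average of per-site model parameters."""
    if len(site_params) != len(site_sizes) or not site_params:
        raise ValueError("need one sample count per participating site")
    total = float(sum(site_sizes))
    averaged = {}
    for name in site_params[0]:
        # Each site contributes in proportion to its number of training cases.
        averaged[name] = sum(
            params[name] * (n / total)
            for params, n in zip(site_params, site_sizes)
        )
    return averaged


# Two hypothetical institutions with locally trained parameters.
site_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
site_b = {"w": np.array([3.0, 4.0]), "b": np.array([4.0])}

global_params = federated_average([site_a, site_b], site_sizes=[30, 10])
# global_params["w"] is [1.5, 2.5]; global_params["b"] is [1.0]
```

Only the parameter dictionaries leave each site in this scheme, which is the property that makes the approach compatible in principle with restrictions on sharing imaging and clinical records.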

Building on our previously developed OS prediction model, we have created a fully automated, end-to-end workflow tailored for direct clinical application. In this study, we aimed to validate the entire workflow and demonstrate its applicability in a clinical setting, as evidenced by outcomes on both the BraTS dataset and our clinical SRS dataset. Detailed information regarding the current model can be found in our previous publication 5. Additionally, the end-to-end workflow we developed not only supports the current models but can also be adapted to integrate newer models in the future. We believe that our study can provide valuable insights to the existing body of literature on computational survival prediction in GBM patients.

Conclusion

In conclusion, we present a DL-based end-to-end workflow for automatic prediction of OS in patients with GBM using basic structural multiparametric pre-operative MRIs. The workflow was tested on both public and institutional datasets, demonstrating its promising performance. The developed model has the potential to assist in timely clinical decision-making for patients with GBM, thereby improving patient outcomes.

Highlights.

  • Early overall survival (OS) prediction of Glioblastoma with pre-operative basic structural multi-parametric MRI.

  • The end-to-end DL-based workflow, including skull-stripping, GBM segmentation and OS prediction, frees users from complex processing.

  • Contrastive learning-based ensemble learning for OS prediction.

  • Validated on a multi-institutional public dataset and an institutional 5-fraction SRS clinical trial.

Funding Statement

This work was supported by the National Institutes of Health under Grant No. R01-CA235723 and SBIR 75N91021C00031.

Footnotes

Conflict of Interest

None

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability:

BraTS data are publicly available. SRS clinical trial data are not available at this time.

References

  • 1. Ostrom QT, Price M, Neff C, et al. CBTRUS statistical report: Primary brain and other central nervous system tumors diagnosed in the United States in 2015–2019. Neuro-oncology 2022;24:v1–v95.
  • 2. Grech N, Dalli T, Mizzi S, et al. Rising incidence of glioblastoma multiforme in a well-defined population. Cureus 2020;12.
  • 3. Gilbert MR, Wang M, Aldape KD, et al. Dose-dense temozolomide for newly diagnosed glioblastoma: A randomized phase III clinical trial. Journal of clinical oncology 2013;31:4085.
  • 4. Shah JL, Li G, Shaffer JL, et al. Stereotactic radiosurgery and hypofractionated radiotherapy for glioblastoma. Neurosurgery 2018;82:24–34.
  • 5. Yang Z, Chen M, Kazemimoghadam M, et al. Ensemble learning for glioma patients overall survival prediction using pre-operative MRIs. Phys Med Biol 2022;67:245002.
  • 6. Menze BH, Jakab A, Bauer S, et al. The multimodal brain tumor image segmentation benchmark (BraTS). IEEE transactions on medical imaging 2014;34:1993–2024.
  • 7. Nicolasjilwan M, Hu Y, Yan C, et al. Addition of MR imaging features and genetic biomarkers strengthens glioblastoma survival prediction in TCGA patients. Journal of Neuroradiology 2015;42:212–221.
  • 8. Seow P, Wong JHD, Ahmad-Annuar A, et al. Quantitative magnetic resonance imaging and radiogenomic biomarkers for glioma characterisation: A systematic review. The British journal of radiology 2018;91:20170930.
  • 9. Bakas S, Shukla G, Akbari H, et al. Overall survival prediction in glioblastoma patients using structural magnetic resonance imaging (MRI): Advanced radiomic features may compensate for lack of advanced MRI modalities. Journal of Medical Imaging 2020;7:031505.
  • 10. McKinley R, Rebsamen M, Dätwyler K, et al. Uncertainty-driven refinement of tumor-core segmentation using 3D-to-2D networks with label uncertainty. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers, Part I 6. Springer. 2021. pp. 401–411.
  • 11. Tang Z, Xu Y, Jin L, et al. Deep learning of imaging phenotype and genotype for predicting overall survival time of glioblastoma patients. IEEE transactions on medical imaging 2020;39:2100–2109.
  • 12. Bommineni VL. PieceNet: A redundant UNet ensemble. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers, Part II 6. Springer. 2021. pp. 331–341.
  • 13. Ali MJ, Akram MT, Saleem H, et al. Glioma segmentation using ensemble of 2D/3D U-Nets and survival prediction using multiple features fusion. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers, Part II 6. Springer. 2021. pp. 189–199.
  • 14. Rafi A, Madni TM, Janjua UI, et al. Multi-level dilated convolutional neural network for brain tumour segmentation and multi-view-based radiomics for overall survival prediction. International Journal of Imaging Systems and Technology 2021;31:1519–1535.
  • 15. Ben Ahmed K, Hall LO, Goldgof DB, et al. Ensembles of convolutional neural networks for survival time estimation of high-grade glioma patients from multimodal MRI. Diagnostics 2022;12:345.
  • 16. Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BraTS challenge. arXiv preprint arXiv:1811.02629 2018.
  • 17. Bakas S, Akbari H, Sotiras A, et al. Advancing the Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific data 2017;4:1–13.
  • 18. Azoulay M, Chang SD, Gibbs IC, et al. A phase I/II trial of 5-fraction stereotactic radiosurgery with 5-mm margins with concurrent temozolomide in newly diagnosed glioblastoma: Primary outcomes. Neuro-oncology 2020;22:1182–1189.
  • 19. Yang Z, Liu H, Liu Y, et al. A web-based brain metastases segmentation and labeling platform for stereotactic radiosurgery. Medical physics 2020;47:3263–3276.
  • 20. Isensee F, Jaeger PF, Kohl SA, et al. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature methods 2021;18:203–211.
  • 21. Yang Z, Chen M, Kazemimoghadam M, et al. Deep-learning and radiomics ensemble classifier for false positive reduction in brain metastases segmentation. Physics in Medicine & Biology 2022;67:025004.
  • 22. Molinaro AM, Hervey-Jumper S, Morshed RA, et al. Association of maximal extent of resection of contrast-enhanced and non–contrast-enhanced tumor with survival within molecular subgroups of patients with newly diagnosed glioblastoma. JAMA oncology 2020;6:495–503.
  • 23. Fu J, Singhrao K, Zhong X, et al. An automatic deep learning–based workflow for glioblastoma survival prediction using preoperative multimodal MR images: A feasibility study. Advances in radiation oncology 2021;6:100746.
  • 24. Kaur G, Rana PS, Arora V. Deep learning and machine learning-based early survival predictions of glioblastoma patients using pre-operative three-dimensional brain magnetic resonance imaging modalities. International Journal of Imaging Systems and Technology 2023;33:340–361.
  • 25. Di Noia C, Grist JT, Riemer F, et al. Predicting survival in patients with brain tumors: Current state-of-the-art of AI methods applied to MRI. Diagnostics 2022;12:2125.
  • 26. Zhang X, Lu H, Tian Q, et al. A radiomics nomogram based on multiparametric MRI might stratify glioblastoma patients according to survival. European radiology 2019;29:5528–5538.
  • 27. Sanghani P, Ang BT, King NKK, et al. Overall survival prediction in glioblastoma multiforme patients from volumetric, shape and texture features using machine learning. Surgical oncology 2018;27:709–714.
  • 28. Das S, Bose S, Nayak GK, et al. Brain tumor segmentation and overall survival period prediction in glioblastoma multiforme using radiomic features. Concurrency and Computation: Practice and Experience 2022;34:e6501.
  • 29. Simmons A, Tofts PS, Barker GJ, et al. Sources of intensity nonuniformity in spin echo images at 1.5 T. Magnetic resonance in medicine 1994;32:121–128.
  • 30. Redko I, Morvant E, Habrard A, et al. Advances in domain adaptation theory. Elsevier; 2019.
