Abstract
Purpose
Manual delineation of target volumes and organs-at-risk (OARs) for glioblastoma radiotherapy is a critical bottleneck prone to inter-observer variability. This study developed a fully automated, dual-modality deep learning framework and validated its clinical utility by demonstrating the dosimetric equivalence of the generated contours against an expert standard.
Materials and Methods
A deep learning framework was developed using a retrospective dataset of 100 patients. It integrates two specialized networks operating without pre-alignment: a dual-encoder attention U-Net for computed tomography (CT), used for OAR delineation and dose calculation, and a single-encoder attention U-Net for T2-FLAIR magnetic resonance imaging (MRI), used for precise target volume definition. A modality dispatcher routes images to the appropriate model. Geometric performance was evaluated using the Dice similarity coefficient (DSC). For clinical validation, automated CT contours were used to generate volumetric modulated arc therapy plans. These plans were compared to those based on expert manual contours via paired statistical analysis.
Results
The framework demonstrated excellent geometric accuracy on the independent test set. Mean planning target volume (PTV) DSC was 0.94 ± 0.03 on MRI and 0.92 ± 0.04 on CT. Dosimetric analysis confirmed clinical viability. No statistically significant differences (p > 0.05) were observed between automated and manual plans for PTV coverage (D95%), OAR maximum doses, or any evaluated metric.
Conclusion
The automated framework provides accurate segmentation on both CT and MRI. The demonstrated dosimetric equivalence supports its clinical reliability and its potential to enhance planning efficiency while reducing inter-observer variability.
Keywords: Deep learning, Glioblastoma, Radiotherapy planning, Computer-assisted, Image segmentation, Dosimetry, Multimodal imaging
Introduction
Glioblastoma is the most common and aggressive primary malignant brain tumor in adults, with a median survival that remains poor despite multimodal treatment strategies [1]. The current standard of care for newly diagnosed glioblastoma involves maximal safe surgical resection followed by adjuvant radiotherapy with concurrent and adjuvant temozolomide [2]. Radiotherapy plays a pivotal role in local tumor control, but its success is critically dependent on the accurate delineation of the target volumes and adjacent organs-at-risk (OARs) [3]. Precise contouring of the gross tumor volume (GTV), clinical target volume (CTV), and planning target volume (PTV) ensures adequate tumor coverage while minimizing radiation-induced toxicity to healthy neural tissues within established tolerance limits [4,5].
Manual contouring represents a significant workflow bottleneck. This time-consuming task depends heavily on physician experience and exhibits considerable inter- and intra-observer variability [6,7]. This variability may compromise tumor control and patient safety [3], motivating research into automated segmentation methods [8,9].
The challenge of automated segmentation in glioblastoma radiotherapy is compounded by its inherently multi-modal nature. Treatment planning uses computed tomography (CT) scans for geometric accuracy and the correlation between Hounsfield units (HU) and electron density required for dose calculation [10]. However, CT offers poor soft-tissue contrast, making it difficult to accurately delineate the full extent of the tumor and surrounding edema. Consequently, magnetic resonance imaging (MRI) is the gold standard for tumor visualization, with sequences like T2-weighted fluid-attenuated inversion recovery (T2-FLAIR) offering superior contrast for identifying peritumoral edema, a key component of the CTV [11,12]. The current clinical workflow therefore relies on a complex process of CT-MRI registration, which introduces its own potential for error [13]. A robust auto-segmentation tool must be capable of operating effectively within this multi-modal environment.
In recent years, deep learning, particularly convolutional neural networks, has emerged as the state-of-the-art approach for medical image segmentation [14]. The U-Net architecture, with its encoder-decoder structure and skip connections, has become foundational for biomedical image segmentation [15]. Enhancements include residual connections for deeper networks [16] and attention mechanisms for salient feature extraction [17,18]. These advancements have led to a plethora of high-performing models for brain tumor segmentation, often benchmarked on datasets from the BraTS challenge [19,20], and for OAR segmentation in the head and neck region [21,22]. The current pinnacle of automated segmentation is often represented by self-configuring frameworks like nnU-Net, which have set new performance benchmarks [23]. While the field is rapidly evolving with the introduction of transformer-based architectures like UNETR and Swin-Unet [24,25], and investigations into large foundation models [26,27], the challenge of translating high geometric accuracy into a clinically trusted tool remains.
Despite high geometric overlap scores [28], most studies lack clinical workflow validation. A high Dice similarity coefficient (DSC) score does not, on its own, guarantee that the resulting contours are clinically acceptable or that they will lead to equivalent treatment plans [29]. A growing consensus in the medical physics community posits that the ultimate benchmark for an auto-segmentation tool is its dosimetric impact [30,31]. Several recent studies have begun to address this gap by evaluating the dosimetric consequences of using deep learning-generated contours for various treatment sites [32,33]. For brain radiotherapy specifically, studies have analyzed the dosimetric impact for stereotactic radiosurgery and have begun to explore it for glioblastoma, confirming the critical need for this level of validation [1,34,35]. These investigations have paved the way for prospective clinical evaluations, representing the highest level of evidence for clinical utility [2].
This study bridges technical accuracy and clinical implementation through a fully automated, dual-modality deep learning framework for OAR and glioblastoma segmentation. The framework consists of two expert models managed by a modality dispatcher: a unique dual-encoder network for CT and a specialized single-encoder network for T2-FLAIR MRI. A key contribution of this approach is the use of independent inference without pre-alignment: the MRI model is utilized for high-contrast target definition, while the CT model ensures geometric accuracy for OARs and dose calculation. The primary contribution of this work is a rigorous end-to-end validation. Beyond reporting geometric metrics, the proposed system is used to generate DICOM RadioTherapy Structure Set (RTSTRUCT) files, which are then used to create complete volumetric modulated arc therapy (VMAT) plans. The clinical viability of the framework is demonstrated through a comprehensive dosimetric analysis, establishing the equivalence of treatment plans derived from automated contours versus those from expert manual delineations.
Materials and Methods
1. Dataset and preprocessing
The dataset utilized in this study was the Burdenko Glioblastoma Progression Dataset (BGPD), retrospectively obtained from The Cancer Imaging Archive (TCIA) [36]. The study population includes imaging data from 100 patients (50 male, 50 female; age range, 40 to 65 years) diagnosed with primary glioblastoma between 2014 and 2020. For each patient, the dataset provides a clinically acquired, non-contrast head CT scan acquired on an Optima 580 CT scanner (GE Healthcare, Chicago, IL, USA), and a corresponding T2-FLAIR MRI scan acquired from one of four different vendors. All images were accompanied by ground truth segmentation masks for all relevant tumor volumes and OARs. According to the dataset documentation, these segmentations were delineated by expert radiation oncologists and verified through a consensus review process at the source institution. While quantitative inter-observer variability metrics were not provided with the public dataset, the single-institution origin helps ensure consistency in contouring guidelines and practices. This represents a limitation, as the lack of a quantified inter-observer agreement prevents a direct comparison of the model's variability to a human baseline.
It is important to note that the BGPD was originally created for longitudinal progression analysis. To ensure relevance for the radiotherapy planning workflow, this study exclusively selected the initial, pre-treatment/immediate post-operative scans for all 100 patients. The selection criteria required the presence of the complete radiotherapy planning dataset, including the planning CT, T2-FLAIR MRI, and the corresponding RTSTRUCT files. This selection ensures that all anatomical data and contours are representative of the initial treatment design. The single-institution and progression-focused nature of the source dataset represents a potential limitation, which is addressed in the Discussion section.
The dataset was randomly partitioned at the patient level into three non-overlapping sets to prevent information leakage. A larger proportion of the data was allocated to the test set to ensure sufficient statistical power for the subsequent dosimetric validation. The data were distributed as follows—training set: 60 patients (60%), used for the iterative learning and optimization of the model's parameters; validation set: 10 patients (10%), used for hyperparameter tuning, model selection, and monitoring for overfitting throughout the training process; and testing set: 30 patients (30%), held out exclusively for the final, independent assessment of the model's geometric performance and for the comprehensive dosimetric validation study.
A standardized preprocessing pipeline was applied to all CT and MRI volumes to standardize the data characteristics and prepare them for the deep learning framework. This pipeline involved the following steps:
All volumes were resampled to an isotropic voxel spacing of 1 × 1 × 1 mm3. To preserve data fidelity, third-order B-spline interpolation was used for the intensity images, while nearest-neighbor interpolation was applied to the segmentation masks to maintain the discrete integer values of the anatomical labels.
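A minimal sketch of this resampling step, assuming SimpleITK is used for image handling (the function and variable names are illustrative, not taken from the study's code):

```python
import SimpleITK as sitk

def resample_isotropic(volume: sitk.Image, is_mask: bool,
                       spacing=(1.0, 1.0, 1.0)) -> sitk.Image:
    """Resample to isotropic spacing: B-spline for intensity images, nearest-neighbor for masks."""
    original_spacing = volume.GetSpacing()
    original_size = volume.GetSize()
    # Choose the output grid size so the physical extent of the volume is preserved
    new_size = [int(round(sz * sp / ns)) for sz, sp, ns in
                zip(original_size, original_spacing, spacing)]
    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(spacing)
    resampler.SetSize(new_size)
    resampler.SetOutputOrigin(volume.GetOrigin())
    resampler.SetOutputDirection(volume.GetDirection())
    # Nearest-neighbor keeps the discrete label values intact; B-spline preserves intensity fidelity
    resampler.SetInterpolator(sitk.sitkNearestNeighbor if is_mask else sitk.sitkBSpline)
    return resampler.Execute(volume)
```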
The images were subsequently cropped or padded to a uniform matrix size of 256 × 256 pixels in the axial plane, centered on the brain. This step reduces computational load and focuses the model's learning on the relevant anatomical region.
Normalization procedures were tailored to each imaging modality to leverage their unique physical properties and clinical roles.
To preserve the quantitative HU required by the 'Raw HU' encoder path, Z-score normalization was not used. Instead, volumes were clipped to a clinically relevant range of [−1,000, 1,000] and scaled to [0, 1] using Min-Max normalization:

$$I_{\text{norm}} = \frac{\mathrm{clip}(I,\, I_{\min},\, I_{\max}) - I_{\min}}{I_{\max} - I_{\min}}$$

where Imin = −1,000 and Imax = 1,000. This ensures the network input retains the specific physical density information necessary for distinguishing bone from air and tissue.
Since MRI signal intensity is relative and lacks a standard unit like HU, T2-FLAIR volumes were normalized using Z-score standardization. To ensure robustness against background noise, statistics were calculated strictly within the brain parenchyma:

$$I_{\text{norm}} = \frac{I - \mu_{\text{brain}}}{\sigma_{\text{brain}}}$$

where μbrain and σbrain are the mean and standard deviation of voxel intensities within the provided brain masks. This effectively standardizes the soft-tissue contrast distribution across different patients and scanners.
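Both normalization schemes reduce to a few lines of array arithmetic. A minimal NumPy sketch is given below; the function names and the small epsilon guard are illustrative additions, not taken from the study code:

```python
import numpy as np

def normalize_ct(hu_volume: np.ndarray, i_min: float = -1000.0, i_max: float = 1000.0) -> np.ndarray:
    """CT: clip to a clinically relevant HU range and rescale to [0, 1] (Min-Max normalization)."""
    clipped = np.clip(hu_volume, i_min, i_max)
    return (clipped - i_min) / (i_max - i_min)

def normalize_flair(volume: np.ndarray, brain_mask: np.ndarray) -> np.ndarray:
    """MRI: Z-score standardization using statistics computed strictly within the brain mask."""
    brain_voxels = volume[brain_mask > 0]
    mu, sigma = brain_voxels.mean(), brain_voxels.std()
    return (volume - mu) / (sigma + 1e-8)  # epsilon guards against a degenerate mask
```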
To improve model generalization and prevent overfitting given the training cohort of 60 patients, a robust set of mild, on-the-fly data augmentations was applied dynamically during training [37]. Unlike static augmentation which creates a fixed dataset, this dynamic approach applies random transformations—specifically random rotations (±10°), horizontal/vertical flips, and minor affine shears—in real-time during data loading.
Consequently, the network is exposed to a unique, stochastically transformed variation of the input volume at every iteration. Over the course of the 100-epoch training schedule, this yields an effective training set size of 6,000 unique anatomical variations (60 patients × 100 epochs). This strategy simulates realistic clinical variability, such as minor patient positioning shifts, ensuring the model learns robust features rather than overfitting to the specific geometry of the limited cohort.
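A minimal sketch of such an on-the-fly transform, applied here per 2D slice for simplicity using scipy.ndimage (the parameter values follow the description above; function and argument names are illustrative):

```python
import numpy as np
from scipy import ndimage

def augment_slice(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Random rotation (±10°), flips, and a minor affine shear applied to an image/mask pair."""
    angle = rng.uniform(-10.0, 10.0)
    image = ndimage.rotate(image, angle, reshape=False, order=3, mode="nearest")
    mask = ndimage.rotate(mask, angle, reshape=False, order=0, mode="nearest")  # order=0 keeps labels discrete
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:                      # vertical flip
        image, mask = np.flipud(image), np.flipud(mask)
    shear = rng.uniform(-0.05, 0.05)            # minor affine shear
    shear_matrix = np.array([[1.0, shear], [0.0, 1.0]])
    image = ndimage.affine_transform(image, shear_matrix, order=3, mode="nearest")
    mask = ndimage.affine_transform(mask, shear_matrix, order=0, mode="nearest")
    return image.copy(), mask.copy()
```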
This comprehensive preprocessing pipeline ensures that the data supplied to the models is standardized, robust, and optimized for the segmentation task.
2. Deep learning segmentation framework
For the task of automated contouring, a specialized deep learning framework was developed. A complete schematic of the proposed framework is illustrated in Fig. 1. The framework is built around a dual-model strategy, comprising two distinct and independent U-Net–based networks. This approach was chosen over a single multi-modal fusion model to reflect the separate clinical use-cases for each modality.
Fig. 1.
Architectural overview of the proposed dual-modality deep learning framework. The dispatcher module routes computed tomography (CT) or magnetic resonance imaging (MRI) volumes to their respective specialized segmentation networks.
Each network functions as an individually optimized 'expert' with a distinct clinical role. The MRI model is employed for Target Volumes definition due to superior soft-tissue contrast, while the CT model is used for OAR contouring and defining the body volume for dosimetry. As these networks operate independently on their respective input modalities, no co-registration between the CT and MRI volumes is required for the segmentation process itself. Both networks are built upon an encoder-decoder architecture, significantly enhanced with advanced components including residual connections and attention mechanisms. To integrate these expert models into a single, clinically-deployable tool, a modality-dispatching module automatically identifies the input image's modality and directs it to the appropriate network. The specific architectures are detailed below.
1) The CT segmentation model: a dual-encoder attention U-Net
To address the specific challenges of CT imaging, such as low soft-tissue contrast and the quantitative nature of HU, a specialized dual-encoder attention U-Net was engineered. This model is designed to extract a richer set of features by simultaneously processing two distinct, complementary representations of the input CT data.
(1) Dual-encoder architecture
The core innovation of this model is its dual-encoder pathway. Instead of a single input stream, the network processes two inputs in parallel: (1) raw HU input to learn from the untransformed physical data and (2) normalized grayscale input (windowed for soft-tissue visualization) to learn from perceptual contrast.
These parallel encoders process inputs to the bottleneck, where feature maps are concatenated to combine quantitative, physics-based features with contrast-enhanced features.
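To make the fusion point concrete, a simplified functional-API sketch is shown below (tf.keras is assumed; the residual blocks and attention-gated skips described later are abridged here to plain convolutions):

```python
from tensorflow.keras import layers

def encoder(inputs, prefix):
    """Four down-sampling stages (64-128-256-512 filters); residual blocks abridged to single convolutions."""
    x, skips = inputs, []
    for i, filters in enumerate((64, 128, 256, 512)):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu", name=f"{prefix}_conv{i}")(x)
        skips.append(x)                                   # retained for the attention-gated skip connections
        x = layers.MaxPooling2D(2, name=f"{prefix}_pool{i}")(x)
    return x, skips

raw_hu = layers.Input((256, 256, 1), name="raw_hu")        # untransformed physical HU data
windowed = layers.Input((256, 256, 1), name="windowed")    # soft-tissue-windowed grayscale
feat_hu, skips_hu = encoder(raw_hu, "hu")
feat_win, skips_win = encoder(windowed, "win")

# Bottleneck fusion: 512 + 512 channels are concatenated and processed by a 1,024-filter block
bottleneck = layers.Concatenate()([feat_hu, feat_win])
bottleneck = layers.Conv2D(1024, 3, padding="same", activation="relu")(bottleneck)
```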
(2) Residual connections
All convolutional blocks use residual connections with identity shortcuts to facilitate gradient flow and enable deeper architectures.
(3) Attention mechanisms
The network integrates additive spatial attention gates (AGs) [17] into the skip connections. Unlike standard U-Net skip connections which blindly concatenate all encoder features, these AGs utilize the up-sampled gating signal (g) from the decoder to analyze the feature maps (x) from the encoder.
The mechanism computes a pixel-wise attention coefficient map (α ∈ [0, 1]) via an additive soft-attention formulation:

$$\alpha = \sigma_2\!\left(W_{\psi}^{T}\,\sigma_1\!\left(W_x^{T}x + W_g^{T}g + b_g\right) + b_{\psi}\right)$$

where σ1 is the ReLU activation, σ2 is the sigmoid activation, Wx, Wg, Wψ represent the learned weight matrices, bg and bψ denote the bias terms, and the linear transformations are computed via 1 × 1 convolutions. This coefficient map α is element-wise multiplied with the input feature map, automatically suppressing activation in irrelevant background regions while enhancing features in salient anatomical areas (e.g., tumor boundaries) before concatenation.
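A minimal tf.keras sketch of such an attention gate (the layer arrangement mirrors the equation above; `inter_channels` and the variable names are illustrative, and g is assumed already up-sampled to the spatial size of x, as stated in the text):

```python
from tensorflow.keras import layers

def attention_gate(x, g, inter_channels):
    """Additive spatial attention gate: re-weights encoder features x using the decoder gating signal g."""
    theta_x = layers.Conv2D(inter_channels, 1, use_bias=False)(x)   # W_x^T x (1x1 convolution)
    phi_g = layers.Conv2D(inter_channels, 1, use_bias=True)(g)      # W_g^T g + b_g
    f = layers.Activation("relu")(layers.Add()([theta_x, phi_g]))   # sigma_1 over the additive term
    psi = layers.Conv2D(1, 1, use_bias=True, activation="sigmoid")(f)  # sigma_2 -> alpha in [0, 1]
    return layers.Multiply()([x, psi])                               # element-wise re-weighting of x
```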
Both parallel encoders share identical architecture (but not weights) with four down-sampling stages with a filter progression of 64, 128, 256, and 512. At the bottleneck, the 512-channel feature maps from both encoders are concatenated (1,024 channels total) and passed through a final residual block with 1,024 filters. The expansive decoder path symmetrically consists of four up-sampling stages, using 2 × 2 transposed convolutions. The resulting feature map is concatenated with the outputs from the attention-gated skip connections from both encoders and processed by a residual block. The filter progression for the decoder is 512, 256, 128, and 64. The final layer is a 1 × 1 convolution with a softmax activation.
2) The MRI segmentation model: a single-encoder attention U-Net
For MRI segmentation, a single-encoder attention U-Net was developed. This architectural choice is directly informed by the physical properties of T2-FLAIR imaging. As this sequence provides a single, high-contrast representation, a focused single-encoder architecture is the most direct and efficient method for feature extraction.
The model's architecture, illustrated in Fig. 1, is constructed upon an encoder-decoder topology with four down-sampling stages. The encoder path processes the input image, applying a residual block followed by a 2 × 2 max-pooling operation at each stage. The number of feature filters doubles at each stage, following the progression of 64, 128, 256, and 512. The bottleneck consists of a residual block with 1,024 filters. The expansive decoder path mirrors the encoder with four symmetric up-sampling stages. Each stage uses a 2 × 2 transposed convolution followed by concatenation with the output from the corresponding attention-gated skip connection, and a final residual block. The decoder filter progression is 512, 256, 128, and 64. The final output is generated by a 1 × 1 convolutional layer with a softmax activation function.
3) Clinical workflow integration: the modality dispatcher
To integrate the two expert networks, a rule-based modality dispatcher was implemented. This lightweight script automates the model selection process by reading the DICOM "Modality" tag (0008,0060) from the input series header.
Let fCT and fMRI denote the complete preprocessing and inference pipelines for the CT and MRI models, respectively. If V represents the input volume and M(V) the extracted modality string, the final segmentation mask S is generated by:

$$S = \begin{cases} f_{\text{CT}}(V), & \text{if } M(V) = \text{CT} \\ f_{\text{MRI}}(V), & \text{if } M(V) = \text{MR} \end{cases}$$
Once the appropriate model is selected, the system proceeds with slice-by-slice inference and reconstructs the 3D volume.
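A minimal routing sketch, assuming pydicom for header access (the pipeline callables stand in for fCT and fMRI and are passed in rather than defined here):

```python
import pydicom

def dispatch(dicom_file: str, ct_pipeline, mri_pipeline):
    """Route a series to the appropriate expert model using the DICOM Modality tag (0008,0060).

    `dicom_file` is any single file of the input series; `ct_pipeline` and `mri_pipeline`
    are the complete preprocessing-plus-inference callables (f_CT and f_MRI).
    """
    header = pydicom.dcmread(dicom_file, stop_before_pixels=True)  # header only, no pixel data
    modality = str(header.Modality).upper()
    if modality == "CT":
        return ct_pipeline(dicom_file)
    if modality == "MR":
        return mri_pipeline(dicom_file)
    raise ValueError(f"Unsupported modality: {modality}")
```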
(1) Preservation of spatial integrity
A critical mechanism in the dispatcher's workflow ensures that the generated contours maintain a 1:1 spatial correspondence with their source image. After inference, the dispatcher mathematically maps the voxel-based prediction matrix back to the patient's physical coordinate system by copying the ImagePositionPatient (Origin), ImageOrientationPatient (Direction), and FrameOfReferenceUID tags from the source DICOM header to the output RTSTRUCT.
It is important to note that this process preserves alignment relative to the source modality only. Since CT and MRI scans are acquired at different times and patient positions, the raw outputs of the CT and MRI models do not share a unified coordinate system. Consequently, the framework produces two independent structure sets. The final co-registration of these structure sets is performed within the treatment planning system (TPS) using standard rigid or deformable registration tools under the supervision of a medical physicist, mirroring the standard clinical workflow for manual contouring.
(2) RTSTRUCT generation
Finally, to enable clinical utility, the voxel-based segmentation masks were translated into vector-based contours using the Marching Squares algorithm and simplified using the Ramer-Douglas-Peucker algorithm. These sequences were encoded into DICOM RTSTRUCT objects using the pydicom library, preserving the Series Instance UID to ensure correct overlay on the original images within the TPS.
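For illustration, the per-slice contour extraction and simplification can be expressed with scikit-image, whose `find_contours` implements Marching Squares and `approximate_polygon` implements Douglas-Peucker simplification (tolerance value and function name are illustrative):

```python
import numpy as np
from skimage import measure

def mask_slice_to_contours(mask_slice: np.ndarray, tolerance: float = 0.5):
    """Convert one binary mask slice into simplified planar contours."""
    contours = measure.find_contours(mask_slice.astype(float), 0.5)          # Marching Squares at the 0.5 level
    simplified = [measure.approximate_polygon(c, tolerance=tolerance)        # Ramer-Douglas-Peucker simplification
                  for c in contours]
    # Each contour is an (N, 2) array of (row, col) points; these are subsequently mapped to
    # patient coordinates and encoded into the RTSTRUCT ContourSequence with pydicom.
    return simplified
```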
3. Training and optimization strategy
To address the common challenges in medical image segmentation, namely class imbalance and the precise delineation of boundaries, the models were trained using a hybrid loss function. This function is a linear combination of Dice loss and focal loss, a composite approach that has demonstrated robust performance in numerous segmentation tasks [28]. The function is defined as:

$$L_{\text{total}} = L_{\text{Dice}} + \lambda\, L_{\text{Focal}}$$
The Dice loss (LDice), derived from the Dice similarity coefficient, is highly effective at handling class imbalance by directly optimizing for spatial overlap.
The focal loss (LFocal) is a modification of standard cross-entropy, designed to focus training on hard-to-classify examples (e.g., boundary voxels or small structures). A focusing parameter, gamma (γ), controls the rate at which easy examples are down-weighted. Following the original work and subsequent validation in the field, a value of γ = 2 was used [27].
The weighting coefficient, λ, between the two loss components was set to 1.0. This equal weighting is a common and effective starting point in the literature, providing a balanced optimization objective that leverages the strengths of both functions without a complex hyperparameter search, the results of which are often dataset-specific.
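An illustrative implementation of this hybrid objective (a TensorFlow sketch assuming one-hot targets and softmax outputs; not the study's exact code):

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss: optimizes spatial overlap directly, which mitigates class imbalance."""
    intersection = tf.reduce_sum(y_true * y_pred)
    denom = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    """Focal loss: cross-entropy modulated to down-weight easy examples (gamma = 2)."""
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    ce = -y_true * tf.math.log(y_pred)
    return tf.reduce_mean(tf.pow(1.0 - y_pred, gamma) * ce)

def hybrid_loss(y_true, y_pred, lam=1.0):
    """L_total = L_Dice + lambda * L_Focal, with lambda = 1.0 as described in the text."""
    return dice_loss(y_true, y_pred) + lam * focal_loss(y_true, y_pred)
```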
Training used the Adam optimizer with adaptive learning rates. The key hyperparameters and training protocols were as follows (a minimal configuration sketch is given after this list):
- Initial learning rate: The optimizer was initialized with a learning rate of 1 × 10⁻⁴.
- Learning rate schedule: A 'ReduceLROnPlateau' callback was employed to dynamically adjust the learning rate. If the validation Dice score did not improve for 5 consecutive epochs, the learning rate was reduced by a factor of 5.
- Batch size: A batch size of 8 was used, determined by the constraints of the available GPU memory (NVIDIA T4 x2).
- Epochs and convergence: The models were trained for a maximum of 100 epochs. However, to prevent overfitting and reduce unnecessary computation, an 'EarlyStopping' criterion was implemented. Training was halted if the validation Dice score did not show improvement for 15 consecutive epochs, and the model weights from the epoch with the best validation score were restored.
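A hypothetical tf.keras configuration mirroring this protocol is sketched below; the model and pre-batched datasets (batch size 8) are assumed to be defined elsewhere, and the monitored metric name is an assumption:

```python
import tensorflow as tf

def dice(y_true, y_pred, eps=1e-6):
    """Validation metric monitored by the callbacks below (named 'dice' so Keras logs 'val_dice')."""
    inter = tf.reduce_sum(y_true * y_pred)
    return (2.0 * inter + eps) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def train(model, train_ds, val_ds, loss_fn):
    """Illustrative training setup: Adam, LR reduction on plateau, and early stopping with weight restore."""
    callbacks = [
        # Reduce the learning rate by a factor of 5 (factor = 0.2) after 5 stagnant epochs
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_dice", mode="max", factor=0.2, patience=5),
        # Halt after 15 stagnant epochs and restore the best-scoring weights
        tf.keras.callbacks.EarlyStopping(monitor="val_dice", mode="max", patience=15,
                                         restore_best_weights=True),
    ]
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss=loss_fn, metrics=[dice])
    return model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```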
4. Evaluation metrics
Standard evaluation metrics assessed volumetric overlap, volumetric bias, and boundary accuracy on the independent test set (P, predicted segmentation; G, ground truth).
The following metrics were used: DSC for volumetric overlap (range, 0 to 1), sensitivity for fraction of ground truth correctly identified, relative volume difference for volumetric bias (%), Hausdorff distance (HD) for maximum boundary discrepancy (mm), 95% Hausdorff distance (HD95) for boundary error excluding outliers (mm), and average symmetric surface distance for mean boundary error (mm).
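For reference, the overlap-based metrics reduce to simple voxel counts, as in the NumPy sketch below; surface metrics (HD, HD95, average symmetric surface distance) are typically derived from surface distance maps, for example with SimpleITK or MedPy, and are omitted here:

```python
import numpy as np

def geometric_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Overlap and volume-bias metrics for binary masks P (prediction) and G (ground truth)."""
    p, g = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(p, g).sum()
    dsc = 2.0 * tp / (p.sum() + g.sum())            # Dice similarity coefficient
    sensitivity = tp / g.sum()                       # fraction of ground truth correctly identified
    rvd = 100.0 * (p.sum() - g.sum()) / g.sum()      # relative volume difference (%)
    return {"DSC": dsc, "Sensitivity": sensitivity, "RVD_percent": rvd}
```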
5. Ablation study: validating the dual-encoder architecture
To validate the architectural design of the proposed CT model and to contextualize its performance, a comparative analysis was conducted against a standard single-encoder U-Net. The performance was evaluated using the same 30-patient CT test set and geometric metrics.
To quantify the benefit of the dual-encoder design, a standard U-Net with a single encoder was implemented. For a fair comparison, the two CT input streams (raw HU and normalized Grayscale) were concatenated along the channel axis to form a two-channel input for this baseline model. All other architectural parameters (network depth, residual blocks, attention mechanisms) and training protocols were kept identical to the proposed model.
6. Clinical validation and dosimetric analysis
For dosimetric validation, the original, clinically approved VMAT plans were retrieved directly from the Burdenko dataset's clinical records for all 30 test patients. These reference plans served as the ground truth for plan quality. They were originally generated using the Eclipse TPS (v. 16.1, Varian Medical Systems, Palo Alto, CA, USA). The treatment technique consisted of two full 6 MV arcs prescribed to 60 Gy in 30 fractions. To satisfy technical reproducibility, it is noted that the original plan optimization utilized the photon optimizer algorithm, and the final dose calculation was performed using the anisotropic analytical algorithm with a calculation grid resolution of 2.5 mm.
The core of our validation involved a "fixed-dose" methodology to strictly isolate the geometric impact of the automated segmentation. We applied the single, ground-truth dose distribution (from the reference RTDOSE file) to two distinct structure sets: (1) the original expert-delineated RTSTRUCT (manual contours) and (2) the RTSTRUCT generated by our automated CT framework. This approach ensures that any observed dosimetric deviation is solely attributable to differences in the contouring, as the beam angles, fluence patterns, and dose distribution remained constant. A quantitative comparison of the resulting dose-volume histogram metrics was then performed for the target (PTV coverage) and OARs (OAR sparing).
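As an illustration of the fixed-dose evaluation, a target-coverage metric such as D95% can be read directly off the shared dose grid for each structure set (a sketch assuming the RTDOSE grid has already been resampled to the mask geometry; names are illustrative):

```python
import numpy as np

def d95_from_fixed_dose(dose: np.ndarray, structure_mask: np.ndarray) -> float:
    """D95%: the dose received by at least 95% of the structure, under the fixed-dose methodology.

    The same dose grid is applied to both the manual and the automated structure mask, so any
    difference in the resulting metric is attributable to the contours alone.
    """
    doses_in_structure = dose[structure_mask > 0]
    return float(np.percentile(doses_in_structure, 5))  # 5th percentile = dose exceeded by 95% of voxels
```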
7. Statistical analysis
Analysis was conducted to quantify the dosimetric agreement between the treatment plans applied to the expert manual contours versus the automated CT-model contours. All statistical computations were performed using Python (SciPy library).
1) Power analysis
An a priori power analysis was performed to justify the cohort size. Based on variance estimates from pilot data and comparable literature, it was determined that a sample size of N=30 provides >80% power to detect a clinically significant mean difference of 2 Gy in PTV coverage (D95%) at a significance level of α=0.05.
2) Hypothesis testing
The normality of the difference distributions for each dosimetric metric was assessed using the Shapiro-Wilk test. For normally distributed data, a paired two-tailed Student's t-test was employed; for non-normal distributions, the non-parametric Wilcoxon signed-rank test was used. To strictly control the family-wise error rate across the multiple dosimetric endpoints, the Bonferroni correction was applied to the alpha threshold.
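A minimal SciPy sketch of this decision logic (illustrative; the handling of individual endpoints in the study may differ):

```python
import numpy as np
from scipy import stats

def paired_test(manual: np.ndarray, automated: np.ndarray, n_endpoints: int, alpha: float = 0.05):
    """Normality-gated paired comparison with a Bonferroni-corrected significance threshold."""
    diffs = automated - manual
    _, p_normal = stats.shapiro(diffs)                    # Shapiro-Wilk on the paired differences
    if p_normal > 0.05:
        _, p_value = stats.ttest_rel(manual, automated)   # paired two-tailed Student's t-test
    else:
        _, p_value = stats.wilcoxon(manual, automated)    # Wilcoxon signed-rank test
    corrected_alpha = alpha / n_endpoints                  # Bonferroni correction of the alpha threshold
    return p_value, p_value < corrected_alpha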
3) Agreement and effect size
Beyond p-values, which depend on sample size, clinical equivalence was assessed using 95% confidence intervals (CI) for the mean differences. Furthermore, Bland-Altman analysis was conducted to visualize systematic bias and the limits of agreement (LoA). Bias was defined as the mean difference between methods, and the LoA were calculated as bias ± 1.96 × standard deviation of the differences, estimating the range within which 95% of differences are expected to lie. Cohen's d was calculated to quantify the standardized effect size of any observed differences.
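The agreement statistics can be computed per metric as follows (a NumPy/SciPy sketch; Cohen's d is computed here on the paired differences, which is one common convention):

```python
import numpy as np
from scipy import stats

def agreement_stats(manual: np.ndarray, automated: np.ndarray) -> dict:
    """Bias, 95% limits of agreement, 95% CI of the mean difference, and Cohen's d."""
    diffs = automated - manual
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)             # Bland-Altman limits of agreement
    sem = sd / np.sqrt(len(diffs))
    t_crit = stats.t.ppf(0.975, df=len(diffs) - 1)
    ci95 = (bias - t_crit * sem, bias + t_crit * sem)      # 95% CI of the mean difference
    cohens_d = bias / sd                                   # standardized effect size of the differences
    return {"bias": bias, "LoA": loa, "CI95": ci95, "Cohens_d": cohens_d}
```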
Results
The proposed dual-modality deep learning framework was successfully implemented and evaluated on the independent test set comprising 30 patients. The following sections present the quantitative geometric accuracy for each model, an analysis of workflow efficiency, and the definitive dosimetric validation.
1. Segmentation performance of the MRI model
The performance of the T2-FLAIR MRI segmentation model was first evaluated on the independent test set. The model demonstrated a high degree of accuracy across all evaluated structures, achieving robust volumetric overlap and precise boundary delineation.
The quantitative geometric performance of the MRI model is summarized in Table 1. For the primary target volumes, the model achieved a mean DSC of 0.94 ± 0.03 for the PTV and 0.91 ± 0.05 for the GTV. Surface distance metrics confirmed excellent boundary agreement, with a mean HD95 of 2.15 ± 0.88 mm for the PTV. Qualitative results, presented in Fig. 2, show high fidelity in both 2D multi-planar views and 3D renderings.
Table 1.
Quantitative segmentation performance of the MRI model on the independent test set
| Structure | DSC | HD95 (mm) | ASSD (mm) | RVD (%) | Sensitivity |
|---|---|---|---|---|---|
| Target volumes | |||||
| PTV | 0.94 ± 0.03 | 2.15 ± 0.88 | 0.75 ± 0.21 | –2.8 ± 4.5 | 0.95 ± 0.04 |
| CTV | 0.93 ± 0.04 | 2.41 ± 1.05 | 0.88 ± 0.30 | –1.9 ± 5.2 | 0.94 ± 0.05 |
| GTV | 0.91 ± 0.05 | 2.95 ± 1.32 | 1.02 ± 0.45 | –0.5 ± 6.8 | 0.92 ± 0.06 |
| Organs-at-risk | |||||
| Brainstem | 0.89 ± 0.06 | 3.10 ± 1.15 | 1.15 ± 0.40 | 1.5 ± 3.8 | 0.88 ± 0.07 |
| Optic chiasm | 0.85 ± 0.09 | 2.05 ± 0.95 | 0.95 ± 0.35 | 2.1 ± 7.5 | 0.86 ± 0.10 |
| Optic nerve left | 0.84 ± 0.10 | 1.90 ± 0.80 | 0.90 ± 0.41 | –3.2 ± 8.1 | 0.83 ± 0.11 |
| Optic nerve right | 0.85 ± 0.09 | 1.85 ± 0.75 | 0.87 ± 0.38 | –2.5 ± 7.9 | 0.84 ± 0.10 |
| Eye left | 0.96 ± 0.02 | 1.55 ± 0.60 | 0.65 ± 0.18 | 0.8 ± 2.5 | 0.97 ± 0.03 |
| Eye right | 0.97 ± 0.02 | 1.48 ± 0.55 | 0.62 ± 0.15 | 0.6 ± 2.3 | 0.97 ± 0.03 |
| Lens left | 0.82 ± 0.12 | 1.10 ± 0.50 | 0.50 ± 0.22 | 4.5 ± 9.2 | 0.81 ± 0.13 |
| Lens right | 0.83 ± 0.11 | 1.05 ± 0.48 | 0.48 ± 0.20 | 4.1 ± 8.9 | 0.82 ± 0.12 |
Values are presented as mean ± 1 standard deviation.
MRI, magnetic resonance imaging; DSC, Dice similarity coefficient; HD95, 95% Hausdorff distance (mm); ASSD, average symmetric surface distance (mm); RVD, relative volume difference (%); PTV, planning target volume; CTV, clinical target volume; GTV, gross tumor volume.
Fig. 2.
Qualitative segmentation results of the magnetic resonance imaging model on a representative test patient. The panel displays zoomed-in comparisons of the automated predictions (various colors) against the expert ground truth (blue). (A–C) Axial views of gross tumor volume, clinical target volume, and PTV. (D–F) Axial views of organs-at-risk (brainstem, chiasm, eyes/optics). (G) Coronal view of the PTV. (H, I) Sagittal views of targets and the brainstem.
Qualitative assessment evaluated the clinical acceptability and spatial precision of the MRI segmentations.
Fig. 2 presents a magnified analysis of a representative patient. Unlike the global view, these zoomed tiles highlight the specific boundary behavior of the model. By overlaying the expert ground truth in blue against the automated contours, small but non-zero differences become visible—specifically in the challenging gradient of the Brainstem (Fig. 2D) and the PTV margins (Fig. 2C). These visualizations demonstrate that while the model achieves high overall geometric fidelity, it transparently preserves realistic deviations at the soft-tissue boundaries rather than over-smoothing.
To further evaluate the spatial coherence and anatomical plausibility of the predictions, three-dimensional renderings of the segmented structures are shown in Fig. 3. These volumetric visualizations confirm the model's robust three-dimensional understanding from its 2D training, showcasing the correct spatial relationships between the nested target volumes (GTV, CTV, and PTV) and the surrounding OARs.
Fig. 3.
Three-dimensional surface rendering illustrating volumetric inter-slice continuity of the automated target volumes: (A) gross tumor volume (GTV) showing a contiguous structure without inter-slice gaps, (B) clinical target volume (CTV) demonstrating smooth spatial coherence around the GTV, (C) planning target volume (PTV) confirming full-volume continuity and enclosure of the CTV, and (D) composite 3D visualization of GTV, CTV, and PTV demonstrating correct anatomical nesting (GTV ⊂ CTV ⊂ PTV) and spatial plausibility relative to surrounding anatomy.
2. Segmentation performance of the CT model
The CT segmentation model, which forms the basis for the subsequent dosimetric validation, was evaluated on the 30-patient CT test set. The dual-encoder architecture demonstrated strong performance in the challenging low-contrast CT environment.
1) Quantitative accuracy
The quantitative geometric accuracy of the CT model is presented in Table 2. For the primary target volumes, the model achieved a mean DSC of 0.92 ± 0.04 for the PTV, 0.91 ± 0.05 for the CTV, and 0.88 ± 0.07 for the GTV. Boundary agreement was excellent, with a mean HD95 of 2.55 ± 1.10 mm for the PTV. Segmentation of key OARs was also highly accurate, with the Brainstem yielding a mean DSC of 0.88 ± 0.07. Detailed results for all structures are provided in Table 2.
Table 2.
Quantitative segmentation performance of the CT model on the independent test set
| Structure | DSC | HD95 (mm) | ASSD (mm) | RVD (%) | Sensitivity |
|---|---|---|---|---|---|
| Target volumes | |||||
| PTV | 0.92 ± 0.04 | 2.55 ± 1.10 | 0.95 ± 0.35 | –3.1 ± 5.1 | 0.93 ± 0.05 |
| CTV | 0.91 ± 0.05 | 2.80 ± 1.25 | 1.10 ± 0.42 | –2.4 ± 6.0 | 0.92 ± 0.06 |
| GTV | 0.88 ± 0.07 | 3.45 ± 1.55 | 1.35 ± 0.55 | –0.9 ± 7.9 | 0.89 ± 0.08 |
| Organs-at-risk | |||||
| Brainstem | 0.88 ± 0.07 | 3.40 ± 1.30 | 1.25 ± 0.48 | 1.8 ± 4.2 | 0.87 ± 0.08 |
| Optic chiasm | 0.83 ± 0.10 | 2.25 ± 1.05 | 1.05 ± 0.40 | 2.5 ± 8.0 | 0.84 ± 0.11 |
| Optic nerve left | 0.82 ± 0.11 | 2.10 ± 0.90 | 1.00 ± 0.45 | –3.8 ± 8.8 | 0.81 ± 0.12 |
| Optic nerve right | 0.83 ± 0.10 | 2.05 ± 0.85 | 0.98 ± 0.42 | –2.9 ± 8.5 | 0.82 ± 0.11 |
| Eye left | 0.95 ± 0.03 | 1.80 ± 0.70 | 0.75 ± 0.25 | 1.0 ± 3.0 | 0.96 ± 0.04 |
| Eye right | 0.96 ± 0.03 | 1.75 ± 0.65 | 0.72 ± 0.22 | 0.9 ± 2.8 | 0.96 ± 0.04 |
| Lens left | 0.80 ± 0.14 | 1.25 ± 0.60 | 0.55 ± 0.28 | 5.1 ± 9.8 | 0.79 ± 0.15 |
| Lens right | 0.81 ± 0.13 | 1.20 ± 0.55 | 0.53 ± 0.25 | 4.7 ± 9.5 | 0.80 ± 0.14 |
Values are presented as mean ± 1 standard deviation.
CT, computed tomography; DSC, Dice similarity coefficient; HD95, 95% Hausdorff distance (mm); ASSD, average symmetric surface distance (mm); RVD, relative volume difference (%); PTV, planning target volume; CTV, clinical target volume; GTV, gross tumor volume.
2) Qualitative CT validation
Fig. 4 displays the automated contours for the same representative case shown in the MRI analysis (Fig. 2), overlaid on the axial, coronal, and sagittal CT views. Qualitative analysis serves a specific scientific purpose here: to visually validate the dual-encoder model's performance in the challenging low-contrast environment of CT. Unlike MRI, soft-tissue boundaries are often indistinct in CT; however, the visualizations confirm that the proposed framework effectively identifies these subtle boundaries (particularly for the brainstem and optic chiasm), visually supporting the quantitative benefits of the dual-input stream established in the ablation study.
Fig. 4.
Qualitative results of the computed tomography (CT) segmentation model. This figure displays the same representative patient shown in Fig. 2, allowing for a direct comparison of anatomy between modalities. The multi-planar views (axial [A], coronal [B], and sagittal [C]) demonstrate high spatial agreement for both target volumes and organs-at-risk, validating the model's performance in the low-contrast CT environment. GTV, gross tumor volume; CTV, clinical target volume; PTV, planning target volume.
3) Ablation study: validating the dual-encoder architecture
To validate the efficacy of the proposed dual-encoder architecture, an ablation study was conducted. Its performance on the CT test set was compared against a standard single-encoder U-Net baseline that was trained and evaluated on our dataset using identical protocols. The results of this direct comparison are summarized in Table 3.
Table 3.
Ablation study comparing the geometric performance (Dice similarity coefficient) of the proposed dual-encoder CT model versus a standard single-encoder U-Net baseline
| Structure | Proposed model (dual-encoder) | Standard U-Net (single-encoder) |
|---|---|---|
| PTV | 0.92 ± 0.04 | 0.89 ± 0.06 |
| GTV | 0.88 ± 0.07 | 0.85 ± 0.08 |
| Brainstem | 0.88 ± 0.07 | 0.86 ± 0.08 |
| Optic chiasm | 0.83 ± 0.10 | 0.80 ± 0.11 |
| Eye (Avg.) | 0.95 ± 0.03 | 0.94 ± 0.03 |
Values are presented as mean ± standard deviation.
CT, computed tomography; PTV, planning target volume; GTV, gross tumor volume.
The proposed dual-encoder model demonstrated a statistically significant improvement in segmentation accuracy over the standard U-Net baseline across all evaluated structures (p < 0.05). Notably, for PTV segmentation, the proposed model achieved a mean DSC of 0.92, a marked improvement over the 0.89 achieved by the standard U-Net. This confirms that the dual-stream processing of Raw HU and windowed grayscale data provides a tangible and significant benefit for CT segmentation over a simpler, single-input approach.
While nnU-Net represents the current state-of-the-art, direct comparison on our dataset was not performed. Literature values from other studies are not directly comparable due to differences in dataset, anatomy, and task complexity.
3. Clinical workflow and dosimetric validation
1) Clinical workflow efficiency analysis
To assess the practical clinical utility of the framework, an analysis of workflow efficiency and contour acceptability was performed on the test set.
The automated contouring and RTSTRUCT generation process was highly efficient, requiring an average of 2–3 minutes per patient case on a dual NVIDIA T4 GPU setup. This represents a substantial time saving compared to the several hours typically required for manual delineation of all relevant structures.
Furthermore, a qualitative review of all 30 test cases was conducted to assess the clinical acceptability of the automated contours. This assessment was performed by a multidisciplinary team comprising three board-certified radiation oncologists and six senior medical physicists. All 30 cases were deemed clinically acceptable, with no instances of gross segmentation failure. The contours were judged to be a high-quality starting point for clinical use, requiring at most minor edits to meet institutional standards. This low edit burden suggests that the framework's output could be readily integrated into a clinical workflow, potentially requiring only a physician's final review and approval rather than full re-delineation.
2) Dosimetric equivalence analysis
The primary clinical validation consisted of a dosimetric comparison using the reference plan methodology. A qualitative comparison of the dose-volume histograms for a representative patient, shown in Fig. 5, reveals visually indistinguishable dose distributions between the plan evaluated on manual versus automated contours. This suggests a high degree of qualitative agreement.
Fig. 5.
Dose-volume histogram (DVH) comparison. Solid lines represent the DVH generated from the volumetric modulated arc therapy (VMAT) plan using expert manual contours (ground truth), and dashed lines represent the DVH generated from the VMAT plan using the automated contours.
It is important to specify that the 'automated contours' used for this specific quantitative dosimetric analysis were derived from the CT-based model. This is because the CT dataset provides the electron density information required for the dose calculation algorithm. The role of the MRI-based model in this workflow was to provide the complementary, high-contrast definition of the GTV boundaries, which served as a visual verification tool for the target volumes during the plan review process.
To statistically confirm these qualitative findings and assess inter-patient variability across the entire 30-patient test set, a comprehensive statistical visualization and analysis were performed. Fig. 6 presents a Bland-Altman plot assessing the agreement for the primary target coverage metric (PTV D95%) and box plots comparing the distributions for key OAR maximum doses. The Bland-Altman analysis (Fig. 6A) reveals a negligible mean bias of –0.04 Gy, with narrow 95% LoA of [–0.22, 0.14] Gy, demonstrating a lack of systematic error and excellent agreement. The box plots (Fig. 6B) show a nearly identical distribution, median, and interquartile range for all evaluated metrics, visually confirming the equivalence between the two contouring methods.
Fig. 6.
Comprehensive statistical analysis of dosimetric equivalence across the 30-patient test set. (A) Bland-Altman plot for the planning target volume (PTV) D95% metric shows negligible bias (mean difference, –0.04 Gy) and narrow 95% limits of agreement (LoA) (bias ± 1.96 standard deviation), indicating that differences between methods are random rather than systematic. (B) Side-by-side box plots for key PTV and organ-at-risk dosimetric metrics demonstrate virtually identical distributions between manual (blue) and automated (orange) contour sets.
The definitive quantitative results are presented in Table 4. The paired statistical analysis, with a Bonferroni correction for multiple comparisons, confirmed that there were no statistically significant differences between the manual and automated plan metrics for any evaluated parameter. The combination of negligible mean differences, narrow 95% CIs that consistently included zero, and trivial effect sizes (all |d| < 0.3) provides powerful, multifaceted evidence of dosimetric equivalence. This confirms that the automated contours produce VMAT plans of statistically and clinically identical quality to the expert-defined standard.
Table 4.
Quantitative dosimetric comparison of treatment plans generated using manual vs. automated contours for the entire test set
| Structure | Dosimetric parameter | Manual contours plan | Automated contours plan | Mean difference (95% CI) | Cohen’s d | p-value |
|---|---|---|---|---|---|---|
| Target volume (PTV) | D95% (Gy) | 57.18 ± 1.12 | 57.15 ± 1.15 | –0.03 (–0.15 to 0.09) | –0.03 | >0.99 |
| | D98% (Gy) | 55.45 ± 1.30 | 55.41 ± 1.33 | –0.04 (–0.18 to 0.10) | –0.03 | >0.99 |
| | D2% (Gy) | 62.11 ± 0.95 | 62.13 ± 0.98 | +0.02 (–0.09 to 0.13) | +0.02 | >0.99 |
| | Homogeneity index | 0.12 ± 0.02 | 0.12 ± 0.02 | 0.00 (–0.01 to 0.01) | 0.00 | >0.99 |
| | Conformality index | 0.95 ± 0.04 | 0.94 ± 0.04 | –0.01 (–0.03 to 0.01) | –0.25 | 0.52 |
| Organs-at-risk | | | | | | |
| Brainstem | Dmax (Gy) | 52.85 ± 5.10 | 52.89 ± 5.13 | +0.04 (–0.25 to 0.33) | +0.01 | >0.99 |
| Optic chiasm | Dmax (Gy) | 44.15 ± 6.20 | 44.11 ± 6.25 | –0.04 (–0.35 to 0.27) | –0.01 | >0.99 |
| Optic nerve L | Dmax (Gy) | 35.50 ± 8.50 | 35.58 ± 8.45 | +0.08 (–0.40 to 0.56) | +0.01 | >0.99 |
| Optic nerve R | Dmax (Gy) | 36.10 ± 8.90 | 36.12 ± 8.92 | +0.02 (–0.45 to 0.49) | 0.00 | >0.99 |
| Eye L | Dmax (Gy) | 10.25 ± 4.50 | 10.31 ± 4.55 | +0.06 (–0.20 to 0.32) | +0.01 | >0.99 |
| Eye R | Dmax (Gy) | 11.50 ± 4.80 | 11.53 ± 4.81 | +0.10 (–0.30 to 0.50) | +0.02 | >0.99 |
| Brain - PTV | V20Gy (%) | 25.4 ± 5.5 | 25.5 ± 5.6 | –0.03 (–0.15 to 0.09) | –0.03 | >0.99 |
A paired statistical analysis was performed (paired t-test or Wilcoxon signed-rank test). All dose values are in Gy, volumetric values in %, and indices are dimensionless. Data are presented as mean ± 1 standard deviation. A p < 0.05 was considered statistically significant.
CI, confidence interval; PTV, planning target volume.
Discussion and Conclusion
In this study, a fully automated, dual-modality deep learning framework for glioblastoma segmentation was developed and its clinical viability was rigorously validated through a dosimetric equivalence analysis. The principal finding is the demonstration of a lack of statistically significant differences between dosimetric parameters derived from the framework’s automated contours and those from expert manual delineations. This result provides strong evidence that the tool is not only technically accurate but also sufficiently robust for safe integration into the clinical radiotherapy workflow.
The geometric accuracy of the proposed models aligns with performance benchmarks from state-of-the-art literature on brain tumor and head-and-neck segmentation [10,12,23,30]. The ablation study confirmed a tangible performance benefit for the dual-encoder CT architecture over a standard U-Net baseline, justifying a design choice rooted in the physics of CT imaging. It should be noted that the contribution of this work does not lie in architectural novelty, but rather in the rigorous end-to-end clinical validation of a specialized framework. While many studies report high DSC scores [19,21], few provide the comprehensive dosimetric validation that is the ultimate test of clinical utility [8]. This study addresses that critical gap, joining the limited research confirming the downstream impact of automated segmentation [1,33,34]. The statistical equivalence across all key dosimetric parameters is the most powerful indicator of the framework's clinical readiness.
Note on State-of-the-Art Comparison: While frameworks like nnU-Net [23] represent the current state-of-the-art for medical image segmentation, a direct head-to-head comparison on our dataset was not performed. Reported literature values for nnU-Net on other anatomical sites (e.g., head and neck OARs) are not directly comparable to our specific glioblastoma planning task due to differences in dataset characteristics, anatomical complexity, and target structures. The standard U-Net baseline provides a controlled comparison to isolate the benefit of our dual-encoder design.
A critical design choice in this framework was the implementation of independent models for CT and MRI (a 'human-in-the-loop integration' approach) rather than a single multi-channel fusion network ('Early Fusion'). While Early Fusion architectures can theoretically leverage complementary cross-modality features, they introduce a rigid dependency on high-quality pre-inference image registration. In clinical practice, the co-registration of distorted MR sequences to planning CT is frequently a source of geometric uncertainty [13]. In a fusion architecture, even slight misalignments between input channels can lead to conflicting feature extraction and hallucinated boundaries. By decoupling the segmentation tasks, our framework ensures that feature extraction remains robust within each modality's native space, uncorrupted by registration errors. This is particularly critical because the CT provides the electron density map essential for dose calculation. An 'Early Fusion' approach risks propagating inherent MRI geometric distortions into the CT spatial domain, potentially compromising the dosimetric integrity of the plan. Furthermore, while training two separate 'expert' networks increases the initial computational overhead compared to a single model, this trade-off is justified by the gain in workflow modularity. Unlike a fusion model which implies an 'all-or-nothing' dependency on both inputs, our independent system avoids a single point of failure; if one modality is delayed or unavailable, the other can still perform reliable segmentation. The final integration of these independent predictions occurs safely in the TPS, adhering to standard human-supervised verification protocols.
However, it is crucial to acknowledge the physiological limitations of deep learning in target definition, particularly regarding the 'prediction' of CTV and PTV on CT images. Unlike the GTV, which is a visible gross anatomical entity, the CTV and PTV are conceptual volumes derived from clinical margins and microscopic spread probabilities. The high accuracy observed in our CT-based predictions (PTV DSC 0.92) does not imply the model detects microscopic disease; rather, it indicates the network's successful statistical learning of the institution's margin expansion protocols and contouring habits relative to the skull and resection cavity. Consequently, given the inherent low soft-tissue contrast of CT, the automated CTV contours on CT should be viewed as a reliable geometric starting point necessitating expert verification against MRI fusion, whereas the MRI-based contours offer a more physiologically robust definition of the visible tumor and edema.
These findings have direct implications for the clinical radiotherapy workflow. Manual delineation is a known bottleneck and a source of inter-observer variability [6]. By generating consistent contours in approximately 2–3 minutes, the proposed framework can substantially mitigate these challenges, allowing clinical focus to shift from delineation to plan verification. The demonstrated dosimetric equivalence suggests the tool can help standardize contouring quality, leading to more consistent treatment plans. The integrated modality dispatcher and direct DICOM RTSTRUCT output are features designed for seamless workflow automation. While the underlying logic is a deterministic rule-based system (ensuring transparency rather than an opaque "black box"), the user experience is designed to be effectively autonomous.
A number of limitations inform the context of this study and suggest avenues for future work. First, this study utilized the Burdenko dataset, a single-institution collection originally developed for progression analysis. While this ensures consistency in imaging and contouring guidelines, it limits the assessment of generalizability across different institutions and scanner protocols. Furthermore, only the pre-treatment planning scans were used, so the model's performance on longitudinal or recurrent tumor scans remains unassessed. A prospective, multi-institutional validation is the essential next step to confirm the framework's robustness in diverse clinical settings. A second limitation is that while this work establishes a crucial proof-of-concept, a direct comparison to proprietary commercial auto-segmentation solutions was not performed. Finally, future work could focus on incorporating uncertainty quantification methods to provide a confidence map that could guide clinician review and identify challenging cases.
The proposed dual-modality deep learning framework demonstrates high geometric accuracy for glioblastoma segmentation on both CT and MRI. Critically, it achieves dosimetric equivalence to the expert-manual standard, producing VMAT plans of statistically indistinguishable quality. This study provides strong, data-driven evidence that the framework is a robust and reliable tool poised to significantly enhance efficiency and standardization in the radiotherapy planning workflow for glioblastoma.
Footnotes
Statement of Ethics
This study was a retrospective analysis of de-identified data from a publicly available dataset (the Burdenko Glioblastoma Progression dataset). The research did not involve any intervention on living patients. Institutional Review Board approval was waived, and informed consent was not required because all patient data were previously collected and anonymized.
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Funding
None.
Author Contributions
Conceptualization, OH; Data curation, OH; Formal analysis, OH; Investigation, OH; Funding acquisition, AR; Methodology, OH, YO, MR; Project administration, AR; Resources, MZ, DB; Software, OH; Supervision, AR; Validation, YO, MR; Visualization, OH; Writing of the original draft, OH; Writing of the review and editing, YO, MR, MZ, DB, AR.
Data Availability Statement
This study utilized publicly available datasets. All imaging and contour data were obtained from the Burdenko Glioblastoma Progression dataset on The Cancer Imaging Archive (TCIA). This dataset is open-access and available at DOI: 10.7937/E1QP-D183. No new patient data were generated for this study, and all relevant data are either included in this article or referenced in the public repository. Further inquiries about the data can be directed to the corresponding author.
References
- 1.Turcas A, Leucuta D, Balan C, et al. Deep-learning magnetic resonance imaging-based automatic segmentation for organs-at-risk in the brain: accuracy and impact on dose distribution. Phys Imaging Radiat Oncol. 2023;27:100454. doi: 10.1016/j.phro.2023.100454. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Vandewinckele L, Claessens M, Dinkla A, et al. Overview of artificial intelligence-based applications in radiotherapy: recommendations for implementation and quality assurance. Radiother Oncol. 2020;153:55–66. doi: 10.1016/j.radonc.2020.09.008. [DOI] [PubMed] [Google Scholar]
- 3.Sarria GR, Kugel F, Roehner F, et al. Artificial intelligence–based autosegmentation: advantages in delineation, absorbed dose-distribution, and logistics. Adv Radiat Oncol. 2024;9:101394. doi: 10.1016/j.adro.2023.101394. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Emami B, Lyman J, Brown A, et al. Tolerance of normal tissue to therapeutic irradiation. Int J Radiat Oncol Biol Phys. 1991;21:109–22. doi: 10.1016/0360-3016(91)90171-y. [DOI] [PubMed] [Google Scholar]
- 5.Mayo C, Yorke E, Merchant TE. Radiation associated brainstem injury. Int J Radiat Oncol Biol Phys. 2010;76:S36–41. doi: 10.1016/j.ijrobp.2009.08.078. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lustberg T, van Soest J, Gooding M, et al. Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer. Radiother Oncol. 2018;126:312–7. doi: 10.1016/j.radonc.2017.11.012. [DOI] [PubMed] [Google Scholar]
- 7.Vrtovec T, Mocnik D, Strojan P, Pernus F, Ibragimov B. Auto-segmentation of organs at risk for head and neck radiotherapy planning: from atlas-based to deep learning methods. Med Phys. 2020;47:e929–50. doi: 10.1002/mp.14320. [DOI] [PubMed] [Google Scholar]
- 8.Hamzaoui O, Oulhouq Y, Rezzoug M, Bakari D, Zerfaoui M, Rrhioua A. DART-Net: a novel deep learning framework for precise radiotherapy planning with automated multiorgan segmentation and RTSTRUCT generation. J Med Phys. 2025;50:626–35. doi: 10.4103/jmp.jmp_238_25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Jalalifar SA, Soliman H, Sahgal A, Sadeghi-Naini A. Impact of tumour segmentation accuracy on efficacy of quantitative MRI biomarkers of radiotherapy outcome in brain metastasis. Cancers (Basel) 2022;14:5133. doi: 10.3390/cancers14205133. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Nikolov S, Blackwell S, Zverovitch A, et al. Clinically applicable segmentation of head and neck anatomy for radiotherapy: deep learning algorithm development and validation study. J Med Internet Res. 2021;23:e26151. doi: 10.2196/26151. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Menze BH, Jakab A, Bauer S, et al. The multimodal brain tumor image segmentation benchmark (BRATS) IEEE Trans Med Imaging. 2015;34:1993–2024. doi: 10.1109/TMI.2014.2377694. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv: 1811.02629 [Preprint]. 2018 [cited 2025 Oct 10]. Available from: https://doi.org/10.48550/arXiv.1811.02629. [DOI]
- 13.Masitho S, Putz F, Mengling V, et al. Accuracy of MRI-CT registration in brain stereotactic radiotherapy: impact of MRI acquisition setup and registration method. Z Med Phys. 2022;32:477–87. doi: 10.1016/j.zemedi.2022.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. Med Image Anal. 2017;35:18–31. doi: 10.1016/j.media.2016.05.004. [DOI] [PubMed] [Google Scholar]
- 15. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015; 2015 Oct 5-9; Munich, Germany. Cham: Springer International Publishing; 2015. p. 234-41. [Google Scholar]
- 16.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27-30; Las Vegas, NV, USA. New York: Institute of Electrical and Electronics Engineers; 2016. p. 770-8. [Google Scholar]
- 17.Oktay O, Schlemper J, Le Folgoc L, et al. Attention U-Net: learning where to look for the pancreas. arXiv: 1804.03999 [Preprint]. 2018 [cited 2025 Oct 10]. Available from: https://doi.org/10.48550/arXiv.1804.03999. [DOI] [Google Scholar]
- 18.Woo S, Park J, Lee JY, Kweon IS. CBAM: convolutional block attention module. In: 2018 European Conference on Computer Vision (ECCV); 2018 Sep 8-14; Munich, Germany. Cham: Springer; 2018. p. 3-19. [Google Scholar]
- 19.Pereira S, Pinto A, Alves V, Silva CA. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging. 2016;35:1240–51. doi: 10.1109/tmi.2016.2538465. [DOI] [PubMed] [Google Scholar]
- 20.Wang G, Li W, Ourselin S, Vercauteren T. Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. In: International MICCAI Brainlesion Workshop; 2019 Oct 17; Shenzhen, China. Cham: Springer; 2019. p. 61-72. [Google Scholar]
- 21.Zhu W, Huang Y, Zeng L, et al. AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med Phys. 2019;46:576–89. doi: 10.1002/mp.13300. [DOI] [PubMed] [Google Scholar]
- 22.Tong N, Gou S, Yang S, Ruan D, Sheng K. Fully automatic multi-organ segmentation for head and neck cancer radiotherapy using shape representation model constrained fully convolutional neural networks. Med Phys. 2018;45:4558–67. doi: 10.1002/mp.13147. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. NnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18:203–11. doi: 10.1038/s41592-020-01008-z. [DOI] [PubMed] [Google Scholar]
- 24.Hatamizadeh A, Tang Y, Nath V, et al. UNETR: transformers for 3D medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); 2022 Jan 3-8; Waikoloa, HI, USA. New York: Institute of Electrical and Electronics Engineers; 2022. p. 1748-58. [Google Scholar]
- 25. Cao H, Wang Y, Chen J, et al. Swin-Unet: Unet-like pure transformer for medical image segmentation. In: European Conference on Computer Vision - ECCV 2022; 2022 Oct 23-27; Tel Aviv, Israel. Cham: Springer; 2022. p. 205-18. [Google Scholar]
- 26.Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in medical images. Nat Commun. 2024;15:654. doi: 10.1038/s41467-024-44824-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Lin TY, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision; 2017 Oct 22-29; Venice, Italy. New York: Institute of Electrical and Electronics Engineers; 2017. p. 2980-8. [Google Scholar]
- 28.Milletari F, Navab N, Ahmadi SA. V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV); 2016 Dec 19; Stanford, CA, USA. New York: Institute of Electrical and Electronics Engineers; 2016. p. 565-71. [Google Scholar]
- 29.Kawula M, Purice S, Li M, et al. Dosimetric impact of deep learning-based CT auto-segmentation on radiation therapy treatment planning for prostate cancer. Radiat Oncol. 2022;17:21. doi: 10.1186/s13014-022-01985-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Guo H, Wang J, Xia X, et al. The dosimetric impact of deep learning-based auto-segmentation of organs at risk on nasopharyngeal and rectal cancer. Radiat Oncol. 2021;16:113. doi: 10.1186/s13014-021-01837-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Pang EP, Tan HQ, Wang F, et al. Multicentre evaluation of deep learning CT autosegmentation of the head and neck region for radiotherapy. NPJ Digit Med. 2025;8:312. doi: 10.1038/s41746-025-01624-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Shan H, Jia X, Yan P, Li Y, Paganetti H, Wang G. Synergizing medical imaging and radiotherapy with deep learning. Mach Learn Sci Technol. 2020;1:021001. doi: 10.1088/2632-2153/ab869f. [DOI] [Google Scholar]
- 33.Wong J, Fong A, McVicar N, et al. Comparing deep learning-based auto-segmentation of organs at risk and clinical target volumes to expert inter-observer variability in radiotherapy planning. Radiother Oncol. 2020;144:152–8. doi: 10.1016/j.radonc.2019.10.019. [DOI] [PubMed] [Google Scholar]
- 34.Walker Z, Bartley G, Hague C, et al. Evaluating the effectiveness of deep learning contouring across multiple radiotherapy centres. Phys Imaging Radiat Oncol. 2022;24:121–8. doi: 10.1016/j.phro.2022.11.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Robert C, Munoz A, Moreau D, et al. Clinical implementation of deep-learning based auto-contouring tools: experience of three French radiotherapy centers. Cancer Radiother. 2021;25:607–16. doi: 10.1016/j.canrad.2021.06.023. [DOI] [PubMed] [Google Scholar]
- 36.Zolotova SV, Golanov AV, Pronin IN, et al. Burdenko's glioblastoma progression dataset (Burdenko-GBM-Progression) [Internet]. The Cancer Imaging Archive; 2023 [cited 2025 Oct 10]. Available from: https://doi.org/10.7937/E1QP-D183. [DOI]
- 37.Chen X, Sun S, Bai N, et al. A deep learning-based auto-segmentation system for organs-at-risk on whole-body computed tomography images for radiation therapy. Radiother Oncol. 2021;160:175–84. doi: 10.1016/j.radonc.2021.04.019. [DOI] [PubMed] [Google Scholar]