Abstract
Background and Purpose:
Precise delineation of pelvic organs-at-risk (OARs) is crucial for high-dose-rate brachytherapy (HDR-BT) in cervical cancer treatment. While deep learning methods have shown promise in automatic delineation, substantial and complex organ deformations pose significant challenges. This study presents a novel approach to address these issues.
Materials and Methods:
We introduce a coarse-to-refine strategy for annotation, utilizing limited existing data to expedite the process. Combined with deformation-based data augmentation, we incorporate this information into a three-dimensional attention U-Net (C2FAU-Net). The study included 100 cervical cancer patients, with OARs annotated by experienced oncologists. The dataset was divided into 80 patients for training, 10 for validation, and 10 for testing. To assess the delineation performance, we employed the volumetric dice similarity coefficient (DSC), 95th percentile Hausdorff Distance (HD95), average symmetric surface distance (ASSD), precision, and recall. We compared the time consumed by manual delineation versus artificial intelligence (AI)-assisted delineation. Dosimetric parameters were compared using different contours to evaluate the clinical impact of the automated approach.
Results:
Our method achieved an average DSC of 89.7%, HD95 of 3.61 mm, and an ASSD of 1.02 mm in the test cohort. The AI-assisted method significantly reduced the manual delineation time from 17.85 ± 3.84 min to 7.54 ± 4.95 min. No significant difference was observed in ΔD2cc, ΔD1cc, ΔD0.1cc, and ΔDmax for bladder, rectum, and sigmoid when comparing contours generated by C2FAU-Net to those created manually.
Conclusion:
We introduce an effective automatic delineation framework for pelvic OARs, enhancing efficiency within the HDR-BT workflow and potentially improving treatment outcomes.
Keywords: Automatic delineation, cervical cancer, deep learning, high-dose-rate brachytherapy, multiscale
INTRODUCTION
Cervical cancer remains a significant global health concern, affecting over half a million women annually and leading to over 300,000 deaths worldwide.[1] In the curative management of cervical carcinoma, brachytherapy plays a pivotal role, significantly enhancing survival outcomes.[2] High-dose-rate brachytherapy (HDR-BT), in particular, has demonstrated remarkable improvement in survival rates and has become a crucial component of cervical cancer therapy.[3] The success of this treatment modality heavily relies on the precise delineation of organs-at-risk (OARs) to minimize damage to surrounding tissues and organs. However, manual annotation of OARs in a clinical setting is time-consuming and prone to intra- and interobserver inconsistencies.[4,5]
Over the past decade, automatic segmentation techniques have been widely employed to enhance the accuracy and efficiency of medical procedures.[6,7] These techniques leverage computational algorithms and image processing technologies to delineate OARs automatically. Recently, deep learning methods, particularly those utilizing deep neural networks, have demonstrated exceptional performance in medical image analysis tasks, including automatic segmentation.[8,9,10] Deep convolutional neural networks (DCNNs) represent an end-to-end model capable of autonomously learning features from raw images, eliminating manual feature extraction. They efficiently capture structural and textural features of three-dimensional (3D) medical images and can establish generalized models over large training datasets, effectively identifying lesions.[11]
In the context of brachytherapy for cervical cancer, accurate organ delineation is crucial for treatment planning and evaluation due to the steep dose gradients involved. Several studies have developed advanced deep learning models, such as ResU-Net, nnU-Net, and RefineNet, to delineate OARs for cervical brachytherapy, providing fast and accurate results. These approaches have also reduced brachytherapy treatment duration by automating OAR delineation, thereby minimizing patient waiting time. However, challenges remain in the clinical implementation of these models, including the need for powerful data processors, long training times, and an unfamiliar operating environment.[12,13,14,15,16] The bladder, sigmoid, and rectum are three pivotal organs closely associated with cervical cancer treatment. Their 3D tubular structures, with inherent bends, twists, and expansions, can significantly impact image presentation and segmentation.[17,18] Accurate segmentation thus requires a comprehensive understanding of these organs’ anatomical complexity. Moreover, the variability in anatomical structure, disease progression, and organ morphology (e.g., bladder filling) among patients and different fractions complicates the segmentation tasks.[19] The scarcity of comprehensive patient image datasets due to privacy concerns poses additional challenges to the application of deep learning techniques in this field.[20]

To address these challenges, our study introduces a novel approach that integrates a coarse-to-refine strategy into the physician’s manual annotation workflow, reducing the burden of manual annotations. We employ deformation-based data augmentation methods to mitigate the scarcity of medical data, utilizing diverse deformation patterns within the training data to enhance the DCNN’s ability to capture a broader data distribution.
Furthermore, we introduce the C2FAU-Net with a multi-scale strategy to improve the quality of data learned by the network, effectively eliminating irrelevant background information and noise interference. This research aims to reduce the data pressure when applying artificial intelligence (AI) to clinical workflows and improve the efficiency of OAR contour analysis for cervical cancer in HDR-BT. By addressing the current limitations in automatic segmentation for brachytherapy, we strive to enhance the overall quality and efficiency of cervical cancer treatment planning and delivery.
MATERIALS AND METHODS
Clinical dataset
This retrospective study included computed tomography (CT) images of 437 fractions from 100 patients with locally advanced cervical cancer, collected between 2019 and 2022 from National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital and Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College. The study was approved by the Institutional Review Board (Cancer Hospital Chinese Academy of Medical Sciences, Shenzhen Center Ethics Committee, approval number: JS2024-5-4). All patients initially received external beam radiation therapy, delivered in 25 fractions with a dose of 1.8 Gy per fraction. This was followed by HDR-BT as a boost treatment, delivered in 4–6 fractions with a mean dose of 6 Gy per fraction. For each patient, a Foley catheter was inserted into the bladder before each brachytherapy implantation session, and all the CT images were acquired after applicator placement. The bladder was filled with approximately 100 cm3 of normal saline in the CT scanning room. All patients were scanned using a Discovery RT scanner (GE Healthcare, Milwaukee, WI, USA) for simulation, with the scan range encompassing the entire pelvis, extending from the vulva to at least the lumbar vertebra L4. The images were reconstructed with a matrix size of 512 × 512 and a slice thickness of 5 mm. All treatment plans were designed using Eclipse 13.6 (Varian Medical Systems, Palo Alto, CA) and delivered using the GammaMed Plus (Varian Medical Systems, Palo Alto, CA) system.
The patient dataset was divided into three parts: 80 patients for training, 10 patients for validation, and 10 patients for testing. Binary masks were created for each patient’s OARs using the 3D Slicer software and structure sets. Background, bladder, rectum, and sigmoid were represented by voxel values of 0, 1, 2, and 3, respectively. The annotated areas and the corresponding OARs are depicted in Figure 1.
Figure 1.

Schematic illustration of organs-at-risk (OARs) to be delineated in the body region. The bladder (red), rectum (green), and sigmoid (blue) regions correspond to voxel labels 1, 2, and 3, respectively. OARs: Organs-at-risk
Data preprocessing
The performance of DCNNs in medical image analysis is highly dependent on the quality and diversity of input data. To enhance model generalization across varied clinical datasets, we implemented deformation-based data augmentation techniques that introduce realistic anatomical variability, even within a limited dataset. By leveraging patient-specific deformation patterns, our approach enables the network to learn from a broader spectrum of plausible anatomical configurations, thereby improving robustness.
A central component of our strategy is interpatient deformation-based augmentation. We utilized a deformation-based registration model, inspired by the VoxelMorph framework[21] to capture and learn deformation differences between patients. While VoxelMorph was originally developed for unsupervised deformable image registration, we adapted it for our application by modifying the input and output layers of the Attention U-Net architecture and employing normalized cross correlation (NCC) as the loss function [Equation 1]. The NCC loss is particularly well-suited for this task, as it is robust to intensity variations between images – a crucial property when dealing with inter-patient variability.
Here, the deformation D is defined as the mapping of each voxel in image A to its corresponding position in image B, and a regularization term is included to ensure the resulting deformation is both smooth and physically plausible.
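Equation 1 itself did not survive formatting. A VoxelMorph-style registration loss consistent with the description above, combining the NCC similarity term with a smoothness regularizer weighted by λ, would take the following form (our reconstruction, not the authors’ verbatim equation):

```latex
\mathcal{L}_{\mathrm{reg}}(A, B, D)
  = -\,\mathrm{NCC}\!\left(A,\; B \circ D\right)
  + \lambda \sum_{v} \left\lVert \nabla D(v) \right\rVert^{2}
```

where B ∘ D denotes image B warped by the deformation field D, the sum runs over all voxels v, and λ balances registration accuracy against deformation smoothness.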
To further improve registration accuracy and anatomical consistency, we used a population-averaged template as the reference image [denoted as Image A in Equation 1]. This shared reference facilitates the extraction of a general deformation vector field that can be applied across the entire patient cohort, as validated by improvements in Dice coefficients.
To analyze and exploit the learned deformation patterns, we incorporated the Karhunen–Loève transform,[22] which statistically decomposes deformation fields into principal components. This allows us to capture the main modes of anatomical variation present within the cohort. Our augmentation pipeline combines these inter-patient deformations, learned via the registration network, with intra-patient deformations generated using random displacement control points. This hybrid approach produces anatomically plausible and diverse spatial variations, expanding the spatial distribution of OARs in the augmented dataset. As a result, the segmentation models trained on this data demonstrate improved generalization and robustness to a wide range of anatomical variations during inference.[23]
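The Karhunen–Loève step can be sketched in a few lines of NumPy. This is an illustrative decomposition over flattened deformation fields; the function names and the mode-sampling scheme are our own, not the study’s implementation:

```python
import numpy as np

def deformation_modes(fields, k=3):
    """Decompose a stack of per-patient deformation fields into a mean field
    and the top-k principal modes (the Karhunen-Loeve / PCA basis)."""
    X = fields.reshape(fields.shape[0], -1).astype(float)  # (N patients, D)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of `modes` are orthonormal components.
    _, sing, modes = np.linalg.svd(X - mean, full_matrices=False)
    return mean, modes[:k], sing[:k]

def sample_deformation(mean, modes, sing, rng, scale=1.0):
    """Draw a plausible new field as the mean plus a random combination of
    the principal modes, scaled by their singular values."""
    n = len(sing)
    coeff = rng.normal(0.0, scale, size=n) * sing / np.sqrt(max(n - 1, 1))
    return mean + coeff @ modes
```

A synthetic field sampled this way stays within the anatomical variation spanned by the training cohort, which is what makes the augmented shapes plausible rather than arbitrary.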
For intrapatient deformation, we employed a 3D thin plate spline model to simulate realistic local deformations. By randomly generating 20–60 control point pairs and applying displacements ranging from 0 mm to 30 mm, we effectively mimic the localized motion of organs within individual patients. The combination of inter- and intra-patient deformations enables the model to accommodate a wide range of potential organ shapes, thereby further enhancing its generalization capability for future patients.
To minimize noise in segmentation, we first computed histogram statistics of CT image intensities within the pelvic region. We then truncated the intensity range using the 99th percentile minimum and maximum values (Imin and Imax), followed by Z-score normalization (Equation 2) to standardize the image intensities.
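The truncation and normalization of Equation 2 can be sketched as follows. The exact percentile bounds (Imin, Imax) are our assumption; the paper clips the extreme intensity percentiles before standardizing:

```python
import numpy as np

def normalize_ct(img, lo_pct=1.0, hi_pct=99.0, eps=1e-8):
    """Percentile truncation followed by Z-score normalization.
    lo_pct/hi_pct are assumed bounds for the (Imin, Imax) clipping."""
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    clipped = np.clip(img, lo, hi)
    # Standardize to zero mean, unit variance; eps guards an all-constant image.
    return (clipped - clipped.mean()) / (clipped.std() + eps)
```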
In addition, to improve the contrast between OARs and surrounding tissues in the pelvic region, we optimized the CT window settings by setting the window width to 750 Hounsfield Units (HU) and the window level to 0 HU. This adjustment enhances the visibility of critical anatomical structures, facilitating accurate manual delineation. Importantly, these window settings are used solely for visualization and do not affect the underlying raw CT voxel values, which remain unaltered for all computational analyses and model training.
Network architecture of C2FAU-Net
The U-shaped network architecture effectively integrates feature information at different scales from the encoder and decoder, making it well-suited for automatic delineation tasks.[24,25,26] However, the simplistic U-Net assigns equal learning weights to different anatomical features, rendering it susceptible to interference from background regions and noise. To address this limitation, we implemented an attention mechanism in U-Net[27] and combined it with a novel coarse-to-fine framework [Figure 2] to develop a network called C2FAU-Net for automatic OAR delineation in CT scanning. Both the coarse and fine segmentation networks are trained using the Attention-UNet architecture,[28] as shown in Figure 3. The network’s loss function is based on Dice similarity coefficients (DSCs), with weights α and β used to balance precision and recall, as defined by the following equation:
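The printed equation was lost in extraction. Given the symbol definitions that follow, a Dice-based loss with precision/recall weights α and β takes the standard Tversky form (a reconstruction consistent with the description, not the authors’ verbatim equation):

```latex
\mathcal{L}(P, M) = 1 - \frac{1}{J} \sum_{j=1}^{J}
\frac{\sum_{i=1}^{I} P_{i,j}\, M_{i,j}}
     {\sum_{i=1}^{I} P_{i,j}\, M_{i,j}
      + \alpha \sum_{i=1}^{I} P_{i,j}\,(1 - M_{i,j})
      + \beta  \sum_{i=1}^{I} (1 - P_{i,j})\, M_{i,j}
      + \epsilon}
```

With α = β = 0.5, this reduces to the familiar soft Dice loss.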
Figure 2.

Architecture of the Attention C2FAU-Net model. In clinical workflows, the input computed tomography images will be processed through a coarse segmentation network to predict a region of interest, followed by fine segmentation, ultimately producing output in the original size. CT: Computed tomography, ROI: Region of interest
Figure 3.

Illustration of the U-Net block, which is used to train the coarse network and fine network, where C, H, W, and L represent the channel, height, width, and length of the computed tomography feature image. ReLU: Rectified linear unit, CT: Computed tomography
Let I denote the total number of voxels, which are the fundamental 3D units of the segmented grid. J represents the number of OAR classes, set at a value of 3. Each element Pi, j, and Mi, j corresponds to the predicted delineation and the one-hot encoded ground-truth mask values, respectively, for class j at voxel i. The parameters α and β are both configured at 0.5 to strike a balanced trade-off between precision and recall. To maintain numerical stability during the optimization process, a smoothing term ϵ is incorporated into the denominator, with a nominal value of 10−5.
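As a runnable check of this formulation, a minimal NumPy version of the loss (an illustrative re-implementation, not the authors’ training code) is:

```python
import numpy as np

def weighted_dice_loss(pred, mask, alpha=0.5, beta=0.5, eps=1e-5):
    """Dice-based loss with alpha weighting false positives (precision) and
    beta weighting false negatives (recall).
    pred: (I, J) softmax probabilities; mask: (I, J) one-hot ground truth."""
    tp = (pred * mask).sum(axis=0)          # soft true positives per class
    fp = (pred * (1.0 - mask)).sum(axis=0)  # soft false positives per class
    fn = ((1.0 - pred) * mask).sum(axis=0)  # soft false negatives per class
    per_class = tp / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - per_class.mean()
```

With α = β = 0.5 the per-class score equals the soft Dice coefficient, so a perfect prediction drives the loss toward zero.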
In clinical automatic delineation, the input CT images underwent preliminary segmentation using a coarse segmentation network. This initial segmentation served as prior information to crop the region of interest (ROI), highlighting features within the segmented area while excluding a substantial amount of background voxel information and noise interference. Subsequently, the corresponding fine segmentation network was employed to automatically delineate OARs within the ROI. The resulting delineations were then resampled to the original image resolution for output.
Auto-segmentation implementation details
We conducted all experiments on a high-performance workstation equipped with an Intel CPU i5-11400F @ 2.60GHz, an NVIDIA RTX A6000 GPU, and 128 GB of RAM. The experimental environment was set up using Python 3.9 and PyTorch 1.12.1. We employed the Adam optimizer with a batch size of 1 and an initial learning rate of 5 × 10−3. For the Attention U-Net network, we employed a five-layer encoder and a four-layer decoder architecture. Each layer comprised a convolution block followed by batch normalization and a Rectified Linear Unit (ReLU) activation function. We also incorporated attention gates into the skip connections to implement the attention mechanism. The final segmentation result was generated through a fully connected softmax layer. Both the coarse and fine segmentation models were trained for 150 epochs to ensure complete convergence.
During the training stage, we resampled the entire 3D CT to a size of 192 × 192 × 192 for the coarse segmentation network. Although a smaller resolution would have sufficed, we opted for a larger sample size to better capture the detailed tubular structures. This network’s primary function was to assist in roughly localizing the 3D segmentation ROI for new and unknown patients, serving as a priori information for the subsequent fine segmentation process.
The fine segmentation network was trained using segmentation boundaries manually delineated by an oncologist. To enhance the network’s focus on organ shape features and their immediate surroundings, we initially expanded the segmentation boundaries by 5 pixels within each CT slice, applying this as a two-dimensional (2D) operation in the X/Y plane. After this boundary expansion, the cropped CT images and their corresponding labels were resampled to a volumetric size of 192 × 192 × 192, resulting in a 3D voxel-based representation. All subsequent operations, including cropping and fine segmentation, were then performed in 3D space using these voxel-based volumes, ensuring consistency and anatomical accuracy throughout the preprocessing and segmentation pipeline. This two-step approach offers several benefits. The coarse segmentation provides an initial ROI, reducing the computational load for the fine segmentation step. By automatically cropping the 3D CT of the pelvic region based on expert prior knowledge and edge shape information, we effectively excluded background interference and noise. This refined input allows the deep neural network to perform optimally, focusing on the most relevant areas for accurate organ delineation.
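The in-plane boundary expansion and ROI cropping can be sketched with SciPy morphology. This is an illustrative version of the step described above; the function name and the structuring element are our own:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def crop_roi(ct, coarse_mask, margin=5):
    """Expand the coarse mask by `margin` pixels within each axial slice
    (a 2D operation: no growth across slices), then crop the 3D bounding box."""
    inplane = np.ones((1, 3, 3), dtype=bool)  # 2D structuring element (X/Y only)
    expanded = binary_dilation(coarse_mask > 0, structure=inplane,
                               iterations=margin)
    idx = np.argwhere(expanded)
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    roi = tuple(slice(l, h) for l, h in zip(lo, hi))
    return ct[roi], roi
```

The returned `roi` slices can be reused to paste the fine segmentation back into the full-resolution volume.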
Performance metric
To comprehensively evaluate the performance of the proposed segmentation method, the following metrics were adopted: Volumetric DSC, 95th percentile Hausdorff Distance (HD95), Average Symmetric Surface Distance (ASSD), Precision, and Recall.[29] All these equations are defined as follows:
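The metric equations were lost in extraction; the standard definitions, consistent with the explanations that follow, are:

```latex
\mathrm{DSC}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
\mathrm{HD95}(A,B) = \max\!\Big(
  P_{95}\{ d(a,B) \}_{a \in A},\;
  P_{95}\{ d(b,A) \}_{b \in B} \Big)

\mathrm{ASSD}(A,B) =
  \frac{\sum_{a \in S(A)} d\big(a, S(B)\big)
      + \sum_{b \in S(B)} d\big(b, S(A)\big)}
       {|S(A)| + |S(B)|}

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```

Here P95 denotes the 95th percentile operator.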
where A represents the region from automatic segmentation, B represents the region from ground truth, |A ∩ B| denotes the number of overlapping voxels, and |A| and |B| are the voxel counts of the two regions.
where HD represents the directed Hausdorff distance from region A to region B. It is defined as the maximum of the shortest distances from any point in A to the closest point in B. HD95 is typically calculated as the 95th percentile of these distances, which means that 95% of the distances are smaller than this value.
where S (A) represents the surface voxels in the set A, d (v, S (A)) denotes the shortest distance from any voxel to S (A).
where TP (true positives) represents the number of correctly segmented voxels, FP (false positives) the number of background voxels incorrectly labeled as organ, and FN (false negatives) the number of organ voxels missed by the segmentation.
These metrics provide a comprehensive evaluation of the segmentation performance. A DSC value approaching 1 indicates that the segmentation result is nearly identical to the ground truth. The HD95 metric, serving as a robust alternative to the traditional Hausdorff distance, quantifies the 95th percentile of the maximum distances between each organ segmentation result and its corresponding ground truth, offering a more resilient measure to outliers in multi-organ segmentation tasks. ASSD refers to the average of all distances from a point on the boundary of the automatic delineation result to the boundary of the manual delineation result.
Clinical evaluation
The efficiency of manual versus AI-assisted methods in delineating OARs is a crucial consideration in clinical settings. Our study employed two approaches to evaluate the potential application of the proposed AI framework in treatment planning: direct manual delineation from original CT scans and refinement based on automatic segmentation results. Using CT slices from 10 test patients, we compared the time required for each method. To mitigate the influence of subjective delineation variations, two experts independently timed their efforts using both manual and AI-assisted methods. For a robust statistical evaluation of the time differences, we conducted a two-sided paired Student’s t-test using SciPy 1.8.0 and Python 3.9, providing insights into the statistical significance of the differences.
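The paired test is a one-liner in SciPy; the per-patient times below are hypothetical placeholders for illustration, not the study data:

```python
import numpy as np
from scipy import stats

# Hypothetical per-patient delineation times (minutes) -- NOT the study data.
manual = np.array([18.2, 16.9, 21.3, 14.8, 17.5, 19.0, 22.1, 15.6, 18.8, 16.4])
assisted = np.array([7.1, 6.0, 9.8, 5.2, 7.9, 8.4, 10.2, 5.9, 8.1, 6.6])

# Two-sided paired test: each patient serves as their own control.
t_stat, p_value = stats.ttest_rel(manual, assisted)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```

The paired design matters here: it removes between-patient variability (case complexity, anatomy) from the comparison, so the test isolates the method effect.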
The accuracy of automatic OAR delineation is crucial for determining the feasibility of adopting this approach in clinical practice. To evaluate the dosimetric differences between manually generated contours and automatic delineation, we overlaid the original dose distribution on the predicted segmentations, generated new cumulative dose volume histograms (DVHs), and evaluated and compared the dosimetric discrepancies of D2cc, D1cc, D0.1cc, and the maximum dose for OARs with the DVH statistics from manual contours. We also employed a two-sided paired Student’s t-test to compare these dosimetric parameters between manual and automatic delineation methods. This comprehensive analysis provides valuable insights into both the efficiency and accuracy of automatic OAR delineation, contributing to the assessment of its potential integration into clinical workflows.
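For readers unfamiliar with the DVH notation, D2cc is the minimum dose received by the hottest 2 cm³ of an organ. A minimal sketch of its computation from per-voxel doses (function name and arguments are illustrative):

```python
import numpy as np

def d_cc(organ_dose, voxel_volume_cc, cc=2.0):
    """Minimum dose received by the hottest `cc` cm^3 of an organ
    (D2cc for cc=2, D1cc for cc=1, ...). `organ_dose` holds the
    per-voxel doses inside the organ contour."""
    n = max(1, int(round(cc / voxel_volume_cc)))   # voxels making up `cc` cm^3
    hottest = np.sort(organ_dose.ravel())[::-1][:n]
    return hottest[-1]
```

Evaluating this on both the manual and predicted contours, with the same dose grid, yields the ΔD2cc, ΔD1cc, and ΔD0.1cc discrepancies reported below.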
RESULTS
Quantitative evaluation of automatic organs-at-risk delineation
Compared to 3D U-Net, the proposed method delivered superior segmentation performance with enhanced detail when evaluated on a clinical dataset. Figure 4 illustrates two representative examples of automatic versus manual delineation, providing a visualization of randomly selected delineation results. Figure 4a shows a case requiring minimal clinical modifications, while Figure 4b presents a case needing more extensive adjustments. A comparison to 3D U-Net segmentation results is also included. Table 1 presents detailed quantitative results, reported as mean ± standard deviation, to evaluate both models’ performance in terms of DSC, HD95, ASSD, precision, and recall. Our method significantly outperforms 3D U-Net across all five metrics for the three organs of interest. Notably, the DSC evaluation metric reaches 92.4% ± 3.0% for the bladder, 90.2% ± 2.9% for the rectum, and 86.5% ± 3.8% for the sigmoid. The bladder achieved the highest DSC, precision, and recall scores, as well as the lowest HD95 metric. In contrast, the sigmoid exhibited the largest variation in performance metrics. This variability can be attributed to the sigmoid’s anatomical characteristics: (1) its position varies significantly due to differing filling degrees of various intestines within the abdominal cavity; (2) its density fluctuates due to varying contents within the abdominal cavity. The bladder and rectum demonstrate greater consistency in these aspects, leading to their superior performance compared to the sigmoid colon.
Figure 4.

Visualization of automated segmentation results in test patients compared to those manually sketched by physicians, where (a) the model automatically contours patient cases with high accuracy (less time spent for revision) (b) the model automatically contours patient cases with lower accuracy (more time spent for revision). Three rows of representative axial slices were selected. The first column represents the original computed tomography (CT) slice, with W representing the window width, and L representing the level. A white frame was utilized to enlarge the raw CT slice at the image’s center. The second column displays the manual delineation, the third column presents the model’s prediction, and the fourth column compares the delineations from the first two columns. CT: Computed tomography
Table 1.
Quantitative evaluation results for organ segmentation using five metrics
| Metrics | Models | Bladder | Rectum | Sigmoid | Average |
|---|---|---|---|---|---|
| DSC (%) | Proposed | 92.4±3.0 | 90.2±2.9 | 86.5±3.8 | 89.7 |
| | 3D U-Net | 85.8±7.3 | 85.6±6.1 | 80.4±9.4 | 83.9 |
| HD95 (mm) | Proposed | 2.81±2.76 | 3.40±2.12 | 4.61±1.97 | 3.61 |
| | 3D U-Net | 5.19±3.64 | 5.87±2.96 | 6.44±2.38 | 5.83 |
| ASSD (mm) | Proposed | 1.03±0.67 | 0.78±0.46 | 1.26±1.14 | 1.02 |
| | 3D U-Net | 1.49±0.38 | 1.17±0.37 | 1.86±1.53 | 1.51 |
| Precision (%) | Proposed | 92.1±2.4 | 90.6±3.4 | 87.1±3.9 | 89.9 |
| | 3D U-Net | 84.7±6.1 | 85.1±4.9 | 81.3±5.8 | 83.7 |
| Recall (%) | Proposed | 93.8±2.7 | 90.4±3.3 | 87.4±3.6 | 90.5 |
| | 3D U-Net | 86.5±5.9 | 86.2±5.1 | 83.6±5.9 | 85.4 |
OAR: Organs-at-risk, DSC: Dice similarity coefficient, HD95: 95th percentile Hausdorff distance, ASSD: Average symmetric surface distance
Time consumption of manual and artificial intelligence-assisted delineation
The clinical evaluation results, presented in Table 2, demonstrate a significant reduction in delineation time using the AI-assisted method. For manual delineation, the average times for the bladder, rectum, and sigmoid were 6.33±2.87 min, 4.30±1.89 min, and 7.22±2.45 min, respectively. In contrast, the AI-assisted method substantially reduced these times to 2.67±2.65 min, 1.95±1.35 min, and 2.92±1.82 min, respectively. The proposed AI-assisted method significantly decreased the overall delineation time from 17.85±3.84 min to 7.54±4.95 min. Notably, the system can automatically generate organ contours for each patient in just 1.5 s. The time reduction is highly statistically significant (P < 0.001), indicating that these improvements are not due to random chance. These results underscore the superior efficiency of the proposed AI-assisted method in clinical applications, potentially streamlining workflow and reducing patient wait times in brachytherapy planning.
Table 2.
Comparison of the average time required for manual and artificial intelligence (AI)-assisted delineation methods in clinical workflow (unit: min)
| | Manual | AI-assisted | P |
|---|---|---|---|
| OARs | 17.85±3.84 | 7.54±4.95 | <0.001 |
| Bladder | 6.33±2.87 | 2.67±2.65 | <0.001 |
| Rectum | 4.30±1.89 | 1.95±1.35 | <0.001 |
| Sigmoid | 7.22±2.45 | 2.92±1.82 | <0.001 |
All values are described in the form of mean±SD. AI: Artificial intelligence, OARs: Organs-at-risk, SD: Standard deviation
Dosimetric comparison using different delineation methods
To evaluate the dosimetric accuracy of the proposed method, we compared dose-volume parameters obtained from predicted contours with those from manually delineated contours [Table 3]. The prescribed dose for patients was 6 Gy. Our analysis revealed that: (1) No statistically significant differences were found in the mean D2cc, D1cc, D0.1cc, Dmax, and volume for bladder, rectum, and sigmoid (all P > 0.05); (2) the largest mean change in DVH parameters was observed for the bladder, with a mean change in Dmax of 1.73 Gy; (3) the sigmoid showed better performance in terms of mean changes in D2cc, D1cc, and Dmax compared to the bladder and rectum; (4) The rectum demonstrated the smallest mean change in D0.1cc among the three organs. These results indicate that the automatically generated contours produce dosimetric parameters comparable to those from manual delineation, suggesting the clinical applicability of the proposed method for treatment planning in cervical cancer brachytherapy.
Table 3.
Dosimetric comparison of bladder, rectum and sigmoid using manual and automatic delineation methods
| | Bladder (Gy) (P) | Rectum (Gy) (P) | Sigmoid (Gy) (P) |
|---|---|---|---|
| ΔD2cc | 0.46±0.41 (0.71) | 0.22±0.20 (0.30) | 0.14±0.16 (0.77) |
| ΔD1cc | 0.54±0.52 (0.62) | 0.26±0.22 (0.28) | 0.14±0.16 (0.77) |
| ΔD0.1cc | 1.16±1.25 (0.41) | 0.29±0.34 (0.16) | 2.13±6.45 (0.36) |
| ΔDmax | 1.73±2.01 (0.49) | 0.84±1.48 (0.17) | 0.7±1.75 (0.94) |
| ΔVolume | 2.95±1.80 (0.41) | 2.34±1.81 (0.39) | 6.76±10.55 (0.18) |
All values are described in the form of mean±SD. The unit is Gy for ΔD2cc, ΔD1cc, ΔD0.1cc, and ΔDmax. The unit is cc for ΔVolume. SD: Standard deviation
DISCUSSION
The delineation of OARs is critically important in HDR brachytherapy for gynecological cancers. Manual delineation of OARs is time-consuming and can prolong treatment, potentially increasing patient discomfort. Moreover, the process is subject to inter- and intra-observer variability, depending on the clinician’s experience. Consequently, automatic delineation of OARs in cervical cancer is crucial for establishing efficient and precise HDR brachytherapy treatment plans.[30,31,32] Deep neural network methods have shown promising results in automatic organ delineation. However, these robust deep-learning models require large amounts of high-quality data for training to ensure high-precision performance across multiple patients.[33,34] In addition, the large volumetric range of CT scans poses a challenge for deep learning models to effectively learn the features required for automatic delineation. Furthermore, the presence of background information and noise can further limit the network’s learning capability.[35,36]
Our proposed algorithm effectively addresses the challenges of OAR delineation in cervical cancer HDR-BT through several innovative approaches. We developed a novel coarse-to-refine annotation workflow that provides prior information for refinement annotations. This approach alleviates the burden on physicians and facilitates the construction of a high-quality dataset for the DCNN. Through deformation-based data augmentation methods, we learn inter-patient variations from limited data and combine them with intra-patient deformations to generate diverse and realistic patient anatomy structures. This approach effectively addresses the problem of data scarcity in clinical practice. In terms of network architecture, C2FAU-Net employs a multi-scale strategy, where the coarse stage locates the ROI to facilitate precise segmentation during the fine stage. The introduction of an attention mechanism within the network helps select significant features at organ boundaries and suppresses irrelevant areas and noise interference, enhancing the quality of segmentation results. This is particularly important for segmenting the bladder and tubular organs in the pelvic area, as their shapes and positions often change, making edge and texture information difficult to extract.
Compared to the 3D U-Net approach, a previously widely used implementation in clinical workflows, our proposed algorithm exhibits higher accuracy for OARs in cervical cancer HDR-BT. When predicting the target area structure of a new patient, the organ deformations generated by the deformation model are used for training, integrating rich anatomical structure information into the DCNN. This leads to accurate and automatic delineation results that benefit physicians in brachytherapy plan development. The experimental results in Table 1 show that, compared to the 3D U-Net, our method improves the average DSC by 5.8% and reduces the average HD95 and ASSD by 2.22 mm and 0.49 mm, respectively. Furthermore, high precision and recall are maintained, as evidenced by the quantitative analysis and the visualization of segmentation results in Figure 4. The experimental results in Table 2 show that, compared to traditional manual delineation by physicians, the average time duration is reduced from 17.85±3.84 min to 7.54±4.95 min. Notably, for the sigmoid, which contains complex structures, the average time duration is reduced by 4.3 min, representing a 60% improvement in efficiency. These results demonstrate the superiority of our proposed method in terms of accuracy, efficiency, and clinical applicability for OAR delineation in cervical cancer HDR-BT.
Table 4 provides a comprehensive comparison of the C2FAU-Net model’s performance metrics with those reported in previous studies. Mohammadi et al. employed the ResU-Net architecture, which uses long and short skip connections to improve the feature extraction procedure and segmentation accuracy.[16] Using CT images from 113 patients, they achieved DSC values of 95.7% and 96.6% for automated segmentation of the bladder and rectum, respectively. While the model performed well on the training set, the introduction of residuals significantly increased computational complexity and training difficulty. The model also had high data quality requirements, potentially affecting performance in the presence of noise or image artifacts. Moreover, its generalization ability was limited, with suboptimal performance on the test set.[37] Duprez et al. developed a self-configuring method based on a well-trained U-Net, offering three different trainable configurations: a 2D U-Net, a 3D full-resolution U-Net, and a 3D cascade U-Net. This approach can empirically determine the best-performing configuration or ensemble for specific tasks, including whether to apply all-but-largest-component-suppression postprocessing. However, the model still faces challenges such as high computational costs, substantial hardware requirements, sensitivity to hyperparameters, and limited generalization ability.[38] Based on the results of our study, the DSC achieved for the rectum is lower than that reported in the aforementioned methods. This discrepancy highlights the ongoing challenges in achieving consistent, high-quality automatic delineation across different anatomical structures and datasets.
Table 4.
Comparison of the performance metric between the current study and previous work
| Studies | Bladder | Rectum | Sigmoid |
|---|---|---|---|
| Wong et al.[39] Method: DSD-UNet | DSC: 92%; HD95: 3.5 mm | DSC: 76%; HD95: 18.4 mm | DSC: 76%; HD95: 18.4 mm |
| Mohammadi et al.[16] Method: ResU-Net | DSC: 95.7±3.7%; HD95: 4.05±5.17 mm | DSC: 96.6±1.5%; HD95: 1.96±2.19 mm | DSC: 96.6±1.5%; HD95: 1.96±2.19 mm |
| Duprez et al.[13] Method: 3DFR nnU-Net | DSC: 92±4%; HD95: 3.0±1.1 mm | DSC: 84±2%; HD95: 5.3±1.5 mm | / |
| Our study: C2FAU-Net | DSC: 92.4±3%; HD95: 2.81±2.76 mm | DSC: 90.2±2.9%; HD95: 3.40±2.12 mm | DSC: 86.5±3.8%; HD95: 4.61±1.97 mm |

DSC: Dice similarity coefficient, HD95: 95th percentile Hausdorff distance, 3DFR: 3D full-resolution U-Net
Recent research in image segmentation has focused on adapting large vision models for 3D medical imaging applications.[40,41,42] Large vision models may offer significant advantages over traditional image segmentation algorithms for OAR delineation in cervical cancer. These models demonstrate high accuracy (DSC values > 0.7) for most OARs, particularly larger organs, and exhibit strong generalization capabilities across different anatomical structures.[43] They also provide user-friendly, interactive interfaces driven by simple prompts such as clicks or bounding boxes. However, large vision models often require substantial computational resources and may struggle with very small or poorly defined structures, such as the sigmoid and intestine in the pelvic region. While ongoing research aims to optimize these models for medical imaging, they currently lack the long-standing medical-specific optimizations of traditional algorithms. DCNN algorithms, in contrast, may excel in delineating smaller, less well-defined structures and generally demand fewer computational resources, although they typically require substantial task-specific training data and complex parameter tuning, which can limit their flexibility; in return, they offer more straightforward interpretability of their decision-making processes. The choice between these approaches ultimately depends on the specific clinical context, available resources, and target organs. A hybrid approach, leveraging the strengths of both large vision models and traditional algorithms, might prove optimal for comprehensive OAR delineation in cervical cancer radiotherapy planning.
Our study, while demonstrating significant improvements in automation and physician efficiency, has several limitations that guide future work. (1) The reliance on high-performance computing may limit adoption in resource-constrained settings. Future work will focus on deploying our algorithm on platforms that distribute computational tasks across multiple centers. (2) Our current dataset is from a single clinical trial. We plan to collaborate with other healthcare organizations to incorporate multi-center data, enhancing the model’s generalizability. (3) We aim to combine our expanded dataset with detailed dose assessment reports to quantitatively demonstrate the benefits for HDR-BT planning. (4) To improve performance on resource-limited devices, we will explore model pruning, quantization, and lightweight architectures such as MobileNetV2 or ShuffleNet. (5) We will investigate methods to improve model performance across various imaging modalities and patient populations, potentially using techniques like GANs for data augmentation. (6) Our study primarily employs DSC and HD as core metrics because of their direct clinical relevance in assessing spatial overlap and boundary accuracy for OAR segmentation, whereas accuracy and specificity are not included in the study. These metrics could provide additional insights into pixel-wise classification performance, particularly in scenarios with extreme class imbalance. Future work will incorporate specificity analysis to quantify false-positive reductions in adjacent critical tissues. By addressing these limitations, we aim to develop a more accessible, efficient, and generalizable solution for automated OAR segmentation in HDR brachytherapy planning.
CONCLUSION
We established a multi-scale strategy for OAR segmentation in cervical cancer HDR-BT, demonstrating its effectiveness in clinical scenarios. Our models significantly enhance automatic delineation, reducing inter-observer variability and planning time. By integrating the proposed approach, we anticipate substantial improvements in both the accuracy and efficiency of brachytherapy delineation for cervical cancer. Our study lays the groundwork for future research in radiotherapy planning, potentially leading to more personalized and effective treatment strategies. As we continue to refine and validate these models, we envision AI-assisted planning becoming an integral part of clinical practice, ultimately improving patient outcomes and healthcare efficiency.
Ethics approval and consent to participate
This project was approved by the Ethics Committee of Shenzhen Hospital, Cancer Hospital, Chinese Academy of Medical Sciences, China (2020–23). Written consent to participate was waived.
Author contributions statement
Conceptualization, M.Z., W.H., X.L. and C.L.; M.Z. and W.H. contributed equally to this work and share first authorship; methodology, M.Z. and W.H.; validation, J.D. and Y.X.; data curation, Z.C., N.X., S.L., D.C., and C.L.; writing-original draft preparation, M.Z. and W.H.; writing-review and editing, J.D., Y.X., Z.C., N.X., S.L. and X.L.; supervision, X.L. and C.L. All authors have read and agreed to the published version of the manuscript.
Conflicts of interest
There are no conflicts of interest.
Funding Statement
This work is partly supported by grants from the National Natural Science Foundation of China (82202954, 12005301), Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515012456), National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen (No. E010124002, SZ2020MS002, SZ2020QN013, E010321001), Shenzhen High-level Hospital Construction Fund, Shenzhen Key Medical Discipline Construction Fund (No. SZXK013), Sanming Project of Medicine in Shenzhen (No. SZSM202211030), and the National Key Research and Development Program of China (2023YFC2411502).
REFERENCES
- 1.Cohen PA, Jhingran A, Oaknin A, Denny L. Cervical cancer. Lancet. 2019;393:169–82. doi: 10.1016/S0140-6736(18)32470-X.
- 2.Holschneider CH, Petereit DG, Chu C, Hsu IC, Ioffe YJ, Klopp AH, et al. Brachytherapy: A critical component of primary radiation therapy for cervical cancer: From the Society of Gynecologic Oncology (SGO) and the American Brachytherapy Society (ABS). Brachytherapy. 2019;18:123–32. doi: 10.1016/j.brachy.2018.11.009.
- 3.Oud M, Kolkman-Deurloo IK, Mens JW, Lathouwers D, Perkó Z, Heijmen B, et al. Fast and fully-automated multi-criterial treatment planning for adaptive HDR brachytherapy for locally advanced cervical cancer. Radiother Oncol. 2020;148:143–50. doi: 10.1016/j.radonc.2020.04.017.
- 4.Gupta V, Wang Y, Méndez Romero A, Myronenko A, Jordan P, Maurer C, et al. Fast and robust adaptation of organs-at-risk delineations from planning scans to match daily anatomy in pre-treatment scans for online-adaptive radiotherapy of abdominal tumors. Radiother Oncol. 2018;127:332–8. doi: 10.1016/j.radonc.2018.02.014.
- 5.van Dijk LV, Van den Bosch L, Aljabar P, Peressutti D, Both S, Steenbakkers RJ, et al. Improving automatic delineation for head and neck organs at risk by deep learning contouring. Radiother Oncol. 2020;142:115–23. doi: 10.1016/j.radonc.2019.09.022.
- 6.Isambert A, Dhermain F, Bidault F, Commowick O, Bondiau PY, Malandain G, et al. Evaluation of an atlas-based automatic segmentation software for the delineation of brain organs at risk in a radiation therapy clinical context. Radiother Oncol. 2008;87:93–9. doi: 10.1016/j.radonc.2007.11.030.
- 7.Tao CJ, Yi JL, Chen NY, Ren W, Cheng J, Tung S, et al. Multi-subject atlas-based auto-segmentation reduces interobserver variation and improves dosimetric parameter consistency for organs at risk in nasopharyngeal carcinoma: A multi-institution clinical study. Radiother Oncol. 2015;115:407–11. doi: 10.1016/j.radonc.2015.05.012.
- 8.Chen Y, Gensheimer MF, Bagshaw HP, Butler S, Yu L, Zhou Y, et al. Patient-specific auto-segmentation on daily kVCT images for adaptive radiation therapy. Int J Radiat Oncol Biol Phys. 2023;117:505–14. doi: 10.1016/j.ijrobp.2023.04.026.
- 9.Costea M, Zlate A, Durand M, Baudier T, Grégoire V, Sarrut D, et al. Comparison of atlas-based and deep learning methods for organs at risk delineation on head-and-neck CT images using an automated treatment planning system. Radiother Oncol. 2022;177:61–70. doi: 10.1016/j.radonc.2022.10.029.
- 10.Liao W, Luo X, He Y, Dong Y, Li C, Li K, et al. Comprehensive evaluation of a deep learning model for automatic organs-at-risk segmentation on heterogeneous computed tomography images for abdominal radiation therapy. Int J Radiat Oncol Biol Phys. 2023;117:994–1006. doi: 10.1016/j.ijrobp.2023.05.034.
- 11.Zhu Z, Albadawy E, Saha A, Zhang J, Harowicz MR, Mazurowski MA. Deep learning for identifying radiogenomic associations in breast cancer. Comput Biol Med. 2019;109:85–90. doi: 10.1016/j.compbiomed.2019.04.018.
- 12.Cao Y, Vassantachart A, Ragab O, Bian S, Mitra P, Xu Z, et al. Automatic segmentation of high-risk clinical target volume for tandem-and-ovoids brachytherapy patients using an asymmetric dual-path convolutional neural network. Med Phys. 2022;49:1712–22. doi: 10.1002/mp.15490.
- 13.Duprez D, Trauernicht C, Simonds H, Williams O. Self-configuring nnU-Net for automatic delineation of the organs at risk and target in high-dose rate cervical brachytherapy, a low/middle-income country’s experience. J Appl Clin Med Phys. 2023;24:e13988. doi: 10.1002/acm2.13988.
- 14.Kim N, Chang JS, Kim YB, Kim JS. Atlas-based auto-segmentation for postoperative radiotherapy planning in endometrial and cervical cancers. Radiat Oncol. 2020;15:106. doi: 10.1186/s13014-020-01562-y.
- 15.Li Z, Zhu Q, Zhang L, Yang X, Li Z, Fu J. A deep learning-based self-adapting ensemble method for segmentation in gynecological brachytherapy. Radiat Oncol. 2022;17:152. doi: 10.1186/s13014-022-02121-3.
- 16.Mohammadi R, Shokatian I, Salehi M, Arabi H, Shiri I, Zaidi H. Deep learning-based auto-segmentation of organs at risk in high-dose rate brachytherapy of cervical cancer. Radiother Oncol. 2021;159:231–40. doi: 10.1016/j.radonc.2021.03.030.
- 17.Chamié LP, Ribeiro DM, Caiado AH, Warmbrand G, Serafini PC. Translabial US and dynamic MR imaging of the pelvic floor: Normal anatomy and dysfunction. Radiographics. 2018;38:287–308. doi: 10.1148/rg.2018170055.
- 18.Swamidas J, Kirisits C, De Brabandere M, Hellebust TP, Siebert FA, Tanderup K. Image registration, contour propagation and dose accumulation of external beam and brachytherapy in gynecological radiotherapy. Radiother Oncol. 2020;143:1–11. doi: 10.1016/j.radonc.2019.08.023.
- 19.Lempart M, Nilsson MP, Scherman J, Gustafsson CJ, Nilsson M, Alkner S, et al. Pelvic U-Net: Multi-label semantic segmentation of pelvic organs at risk for radiation therapy anal cancer patients using a deeply supervised shuffle attention convolutional neural network. Radiat Oncol. 2022;17:114. doi: 10.1186/s13014-022-02088-1.
- 20.Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review. J Am Med Inform Assoc. 2018;25:1419–28. doi: 10.1093/jamia/ocy068.
- 21.Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV. VoxelMorph: A learning framework for deformable medical image registration. IEEE Trans Med Imaging. 2019. doi: 10.1109/TMI.2019.2897538.
- 22.Levey A, Lindenbaum M. Sequential Karhunen-Loeve basis extraction and its application to images. IEEE Trans Image Process. 2000;9:1371–4. doi: 10.1109/83.855432.
- 23.He W, Zhang C, Dai J, Liu L, Wang T, Liu X, et al. A statistical deformation model-based data augmentation method for volumetric medical image segmentation. Med Image Anal. 2024;91:102984. doi: 10.1016/j.media.2023.102984.
- 24.Kelly BS, Judge C, Bollard SM, Clifford SM, Healy GM, Aziz A, et al. Radiology artificial intelligence: A systematic review and evaluation of methods (RAISE). Eur Radiol. 2022;32:7998–8007. doi: 10.1007/s00330-022-08784-6.
- 25.Tang H, Chen X, Liu Y, Lu Z, You J, Yang M, et al. Clinically applicable deep learning framework for organs at risk delineation in CT images. Nat Mach Intell. 2019;1:480–91.
- 26.Wasserthal J, Breit HC, Meyer MT, Pradella M, Hinck D, Sauter AW, et al. TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images. Radiol Artif Intell. 2023;5:e230024. doi: 10.1148/ryai.230024.
- 27.Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science. Vol. 9351. Springer; 2015. pp. 234–241.
- 28.Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, et al. Attention gated networks: Learning to leverage salient regions in medical images. Med Image Anal. 2019;53:197–207. doi: 10.1016/j.media.2019.01.012.
- 29.Karimi D, Warfield SK, Gholipour A. Transfer learning in medical image segmentation: New insights from analysis of the dynamics of model parameters and learned representations. Artif Intell Med. 2021;116:102078. doi: 10.1016/j.artmed.2021.102078.
- 30.Zhang D, Yang Z, Jiang S, Zhou Z, Meng M, Wang W. Automatic segmentation and applicator reconstruction for CT-based brachytherapy of cervical cancer using 3D convolutional neural networks. J Appl Clin Med Phys. 2020;21:158–69. doi: 10.1002/acm2.13024.
- 31.Mahantshetty U, Gurram L, Bushra S, Ghadi Y, Aravindakshan D, Paul J, et al. Single application multifractionated image guided adaptive high-dose-rate brachytherapy for cervical cancer: Dosimetric and clinical outcomes. Int J Radiat Oncol Biol Phys. 2021;111:826–34. doi: 10.1016/j.ijrobp.2021.06.014.
- 32.Hu H, Shao Y, Hu S. A review of the application of deep learning in brachytherapy. OALib. 2020;7:1–9.
- 33.Holzinger A, Dehmer M, Emmert-Streib F, Cucchiara R, Augenstein I, Del Ser J, et al. Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Inf Fusion. 2021;79:263–78.
- 34.Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, et al. Publisher correction: Scientific discovery in the age of artificial intelligence. Nature. 2023;621:E33. doi: 10.1038/s41586-023-06559-7.
- 35.Aghamohammadi A, Ranjbarzadeh R, Naiemi F, Mogharrebi M, Dorosti S, Bendechache M. TPCNN: Two-path convolutional neural network for tumor and liver segmentation in CT images using a novel encoding approach. Expert Syst Appl. 2021;183:115406.
- 36.Kim G, Han M, Shim H, Baek J. A convolutional neural network-based model observer for breast CT images. Med Phys. 2020;47:1619–32. doi: 10.1002/mp.14072.
- 37.de Boer M, Kos TM, Fick T, van Doormaal JA, Colombo E, Kuijf HJ, et al. NnU-Net versus mesh growing algorithm as a tool for the robust and timely segmentation of neurosurgical 3D images in contrast-enhanced T1 MRI scans. Acta Neurochir (Wien). 2024;166:92. doi: 10.1007/s00701-024-05973-8.
- 38.Avesta A, Hossain S, Lin M, Aboian M, Krumholz HM, Aneja S. Comparing 3D, 2.5D, and 2D approaches to brain image auto-segmentation. Bioengineering (Basel). 2023;10:181. doi: 10.3390/bioengineering10020181.
- 39.Wong J, Fong A, McVicar N, Smith S, Giambattista J, Wells D, et al. Comparing deep learning-based auto-segmentation of organs at risk and clinical target volumes to expert inter-observer variability in radiotherapy planning. Radiother Oncol. 2020;144:152–8. doi: 10.1016/j.radonc.2019.10.019.
- 40.Li Y, Liu J, Zhang Y, Wei J, Luo Y, Yang X, et al. Large scale models in radiology: Revolutionizing the future of medical imaging. Radiology Sci. 2024;3:15–24. doi: 10.15212/RADSCI-2023-0018.
- 41.Wang J, Liu Z, Zhao L, Wu Z, Ma C, Yu S, et al. Review of large vision models and visual prompt engineering. Meta-Radiology. 2023;1:100047.
- 42.Oakden-Rayner L. Exploring large-scale public medical image datasets. Acad Radiol. 2020;27:106–12. doi: 10.1016/j.acra.2019.10.006.
- 43.Zhang L, Liu Z, Zhang L, Wu Z, Yu X, Holmes J, et al. Technical note: Generalizable and promptable artificial intelligence model to augment clinical delineation in radiation oncology. Med Phys. 2024;51:2187–99. doi: 10.1002/mp.16965.
