Abstract
Background:
Adaptive radiotherapy for non-small cell lung cancer (NSCLC) requires accurate image registration to account for anatomical changes during treatment. Artificial intelligence (AI)-based approaches have shown potential in improving automated landmark detection and deformable image registration.
Aims and Objectives:
This study aimed to investigate the application of AI for automated anatomic landmark detection and image deformation in NSCLC cases to support adaptive radiotherapy planning.
Materials and Methods:
A multimodal image registration approach combining cone-beam computed tomography (CBCT) and computed tomography (CT) images was implemented. The workflow consisted of multistage registration, beginning with rigid registration followed by deep-learning-based deformable registration using the VoxelMorph framework. A total of 1040 axial CBCT and CT images were used for training, validation, and testing of the landmark detection model based on the You Only Look Once (YOLO) algorithm. Various YOLO models were compared for spine landmark detection. Model performance was evaluated using Intersection over Union (IoU), Mean Average Precision (mAP), Dice Similarity Coefficient (DSC), and Target Registration Error (TRE).
Results:
Among the evaluated models, YOLOv3 achieved the highest accuracy for spine landmark detection, with an IoU of 0.818 and an mAP (50–95) of 0.67. For organ-at-risk segmentation in deformable registration, YOLOv9 outperformed YOLOv8. Rigid registration demonstrated an average DSC of 0.88 ± 0.04 and a TRE of 1.7 ± 1.0 mm, within the tolerance range recommended by AAPM Task Group 132. Deformable registration using VoxelMorph achieved acceptable micro-DSC values; however, macro-DSC results indicated the need for further refinement.
Conclusion:
AI-based methods demonstrate promising performance for automated landmark detection and deformable image registration in adaptive radiotherapy for NSCLC. Nevertheless, further optimization is required to improve accuracy and reduce variability in registered images before routine clinical implementation.
Keywords: Artificial intelligence, cone-beam computed tomography, deformable registration, landmark annotation
INTRODUCTION
Respiratory movements cause significant lung deformation, leading to anatomical variability among patients and complicating lung cancer treatment planning, often resulting in tumour displacements.[1] Lung image registration methods include rigid, affine, and deformable approaches; all primarily face challenges such as sliding tissue motion, which complicates image alignment and causes misalignment in tumour targeting.[2] To address this challenge, deformable image registration (DIR) assumes smooth transformations but may fail when deformations are discontinuous.
Lung deformation may also involve anisotropic changes that are not necessarily linked to volume, introducing substantial variability across patients.[3] This variability can cause registration failures when anatomical correspondences cannot be easily discerned by standard algorithms,[4] requiring frequent fine-tuning of patient-specific models.[5] Image quality issues, such as artifacts and intensity changes from respiratory motion, along with low lung tissue density and limited cone-beam computed tomography (CBCT) field of view, further hinder registration accuracy.[6,7]
To address these challenges, lung image registration can be performed with a multi-stage approach, starting with rigid registration followed by deformable registration.[8] Classical methods such as Intermediate DIR (IDIR) remain important,[9] but deep learning (DL)-based models such as VoxelMorph offer newer, faster solutions for DIR.[10] Beyond image registration, the application of DL in medical imaging has expanded rapidly, particularly in lung cancer detection. Shatnawi et al. demonstrated that an Enhanced Convolutional Neural Network model achieved a testing accuracy of up to 100% in lung cancer classification using computed tomography (CT) scans.[11] Furthermore, Wani et al. introduced DeepXplainer, a hybrid DL-based technique that incorporates interpretability for lung cancer detection while providing explanations of model predictions, thereby enhancing transparency and clinical trust.[12] Despite their promise, DL methods often suffer from limited training data.
This study investigated the feasibility and performance of artificial intelligence (AI)-based DIR for addressing respiratory-induced lung deformation and interpatient anatomical variability using a retrospective clinical dataset. The proposed framework evaluated rigid and deformable registration approaches, along with AI-assisted annotation, segmentation, and image deformation, in non-small cell lung cancer (NSCLC) cases across multiple fractionations. In addition, the deformable registration performance was compared with the classical IDIR method.
MATERIALS AND METHODS
Institutional CT and CBCT images were retrospectively collected from 17 NSCLC patients treated with radiotherapy at a single hospital using a Varian Trilogy linear accelerator. Ethical approval was obtained before data usage, and the requirement for informed consent was waived due to the study’s retrospective nature. All imaging data were fully anonymized before analysis, and no additional imaging or interventions beyond routine clinical practice were performed. All CBCT images were acquired using a standardized thoracic imaging protocol. The acquisition parameters were fixed at 110 kVp and 20 mAs, with half-fan geometry applied for thoracic scanning. These parameters were consistently applied across all patients and treatment fractions.
In total, 1040 axial images, consisting of 520 CT simulator and 520 CBCT images, were randomly selected and split 70%/20%/10% into training, validation, and testing sets, respectively. In a patient-wise splitting setting with unseen patients, interpatient anatomical variability would likely reduce performance owing to morphological and positional differences across patients, yet such an approach offers a more clinically representative evaluation of model generalization. Rigid and deformable registration were performed to assess patient position and structural changes during the interfraction radiotherapy process, as seen in Figure 1. Rigid registration was tested on images from the third week of irradiation to observe changes in the patient’s body shape or internal structure. To evaluate the deformable registration method, data from a single patient were used, comprising 12 CBCT and 12 rigidly registered CT axial images, with scans acquired on November 14, November 22, November 30, and December 9, corresponding to weeks 1 through 4.
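As a concrete illustration, the random 70/20/10 partition described above can be reproduced with a few lines of Python; the folder layout and random seed below are assumptions for the sketch, not details reported in this study:

```python
import random
from pathlib import Path

random.seed(42)  # assumed seed; the study only states that the split was random

# Hypothetical folder holding the 1040 converted PNG slices
images = sorted(Path("data/png_slices").glob("*.png"))
random.shuffle(images)

n_train = int(0.7 * len(images))   # 728 of 1040
n_val = int(0.2 * len(images))     # 208 of 1040
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],  # remaining 104 images
}
for name, files in splits.items():
    print(f"{name}: {len(files)} images")
```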
Figure 1.

Flowchart of image registration. CT: Computed tomography, CBCT: Cone-beam computed tomography
AI aided the rigid registration process through automatic landmark detection and the deformable registration process by automatically deforming the CT image to match the CBCT image shape. The CBCT image was set as the fixed image and used as the reference image for deformable registration evaluation.
Preparation and data preprocessing
This process began by converting DICOM images into PNG format using Python libraries such as pydicom and OpenCV. The result was CBCT images of 384 × 384 pixels and CT images of 512 × 512 pixels, with resolutions of 0.85 and 0.73 pixels/mm, respectively. This step was necessary because the You Only Look Once (YOLO) algorithm does not yet support direct processing or annotation of data in the DICOM format. However, this conversion inherently removes clinically relevant metadata such as Hounsfield Unit values and limits analysis to two-dimensional (2D) slice-wise information, which may affect volumetric consistency and clinical interpretability. Accordingly, the proposed pipeline should be interpreted as a methodological feasibility study rather than a fully integrated clinical radiotherapy workflow.
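A minimal sketch of this conversion step, assuming single-frame CT/CBCT slices and using only pydicom and OpenCV calls, is shown below; the windowing choice and file naming are illustrative, not the study’s exact settings:

```python
import cv2
import numpy as np
import pydicom

def dicom_to_png(dicom_path: str, png_path: str) -> None:
    """Convert one DICOM slice to an 8-bit PNG.

    Note: this discards Hounsfield Unit calibration and all DICOM metadata,
    as discussed above; it is a preprocessing sketch, not a clinical tool.
    """
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)

    # Apply rescale slope/intercept when present (CT stores HU this way)
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    pixels = pixels * slope + intercept

    # Min-max normalize to 0-255 for PNG export
    pixels -= pixels.min()
    if pixels.max() > 0:
        pixels = pixels / pixels.max() * 255.0
    cv2.imwrite(png_path, pixels.astype(np.uint8))

# Example usage with a hypothetical file path
dicom_to_png("patient01/ct_slice_042.dcm", "data/png_slices/ct_042.png")
```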
The creation of the bony landmark detection model dataset for rigid registration involved the annotation of CBCT and CT images, as seen in Figure 2a. Annotation is the process of labelling predefined objects or classes into bounding boxes. Since the spine remained stationary during respiration, the annotated area was the spine, labeled as the “backbone” class in 1040 PNG image data files.
Figure 2.

Annotation of (a) spine for rigid transformation and (b) some organs for deformable transformation
For the deformable registration, segmentation was performed on four predefined organs at risk (OAR), i.e. the right lung, left lung, spinal cord, and body contour as seen in Figure 2b. The lungs were included to assess the algorithm’s ability to account for respiratory-induced deformation, the spinal cord to evaluate registration accuracy in a rigid and clinically critical structure, and the body contour to represent global patient geometry. These structures were contoured on CT images and propagated to CBCT images using a semiautomatic annotation process, serving as ground truth for deformable registration evaluation.
For deformable registration, image annotation was performed semiautomatically using Roboflow[13] with polygon-shaped labels matched to the anatomy, creating segmentation masks as ground truth. Figure 2b displays the four labels based on the CT contours, i.e., right lung, left lung, spinal cord, and body. These labels on the axial CBCT images, together with the results of rigid registration, facilitated a detailed and informative evaluation of the registration outcomes.
Landmark detection and segmentation model
The spine detection model was developed using the YOLO algorithm. In this study, the performance of YOLOv3, YOLOv5, and YOLOv8 was compared in detecting the spine in CBCT and CT images. The main differences between these models lie in their architecture: YOLOv3 uses Darknet53 as its backbone, YOLOv5 employs a modified CSP backbone, and YOLOv8 uses its own redesigned backbone. The models also differ in their frameworks: YOLOv3 was originally built on the Darknet framework, whereas YOLOv5 and YOLOv8 are built on the PyTorch framework.[14]
For the segmentation of the four OARs, two YOLO architectures, YOLOv8 and YOLOv9, were utilized. YOLOv8 is engineered for fast and accurate results, making it an ideal choice for image segmentation and object detection tasks. On the other hand, YOLOv9 offers high precision while maintaining or reducing computational demands compared to its predecessor. Although YOLOv8 supports both detection and segmentation, these tasks were separated because spine localization can be addressed using bounding boxes, whereas organ-at-risk segmentation requires more precise spatial delineation for radiotherapy assessment.
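For readers who wish to reproduce a comparable setup, the sketch below shows how such detection and segmentation models can be trained with the Ultralytics Python API; the pretrained weight files, dataset YAML names, epoch count, and image size are assumptions, since the study does not report its exact training configuration:

```python
from ultralytics import YOLO

# Hypothetical dataset YAMLs listing the train/val PNG images and either the
# single "backbone" class (detection) or the four OAR classes (segmentation)
DETECT_DATA = "spine_detection.yaml"
SEG_DATA = "oar_segmentation.yaml"

# Spine landmark detection (bounding boxes)
det_model = YOLO("yolov8n.pt")        # assumed pretrained detection weights
det_model.train(data=DETECT_DATA, epochs=100, imgsz=512)

# OAR segmentation (polygon masks)
seg_model = YOLO("yolov9c-seg.pt")    # assumed pretrained segmentation weights
seg_model.train(data=SEG_DATA, epochs=100, imgsz=512)

# Inference on a test slice returns boxes/masks with confidence scores
results = seg_model("patient01_cbct_slice.png")
```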
Evaluation of annotation and segmentation
The object detection model needs to be evaluated to ensure its accuracy in detecting objects. The model outputs include confidence level, class name, and bounding box coordinates. If no object was detected, the confidence level would be zero. The model’s effectiveness was measured using the Intersection over Union (IoU) metric as follows:
$$\text{IoU} = \frac{A_{\text{intersect}}}{A_{\text{union}}} \quad (1)$$

where $A_{\text{intersect}}$ and $A_{\text{union}}$ are the intersection area and union area of the AI detection and the original annotation (ground truth), respectively. Detection results are categorized as true positive, false positive, and false negative. Furthermore, mean average precision (mAP) was used for assessment. A higher mAP score indicates better model performance. The following equation was used to determine mAP
$$\text{mAP} = \frac{1}{n}\sum_{i=1}^{n} \text{AP}_i \quad (2)$$

where $n$ is the number of evaluated images and $\text{AP}_i$ is the average precision computed from the precision-recall curve, with $\text{precision} = \frac{TP}{TP + FP}$ and $\text{recall} = \frac{TP}{TP + FN}$, where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively.
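Both metrics can be computed directly from bounding boxes and detection counts; a small Python sketch with hypothetical box coordinates follows:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: a detection counts as a true positive when IoU >= 0.5
pred = (50, 60, 120, 140)   # hypothetical predicted spine bounding box
gt = (55, 58, 118, 150)     # hypothetical ground-truth annotation
print(f"IoU = {iou(pred, gt):.3f}")
```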
Image registration process
To perform the rigid transformation, the CT image was translated to match the CBCT’s size and position. The CT image was converted into a NumPy array, and an empty 384 × 384 pixel array was initialized for alignment. The center points of both images were calculated, and their difference determined the translation value. Pixels from the CT image were then mapped to the new array based on this translation, gradually filling the array. Through this process, the CBCT and CT images eventually shared the same pixel dimensions.
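A minimal sketch of this translation step, assuming the two midpoints have already been obtained (e.g., from the detected bounding-box centers), might look as follows; the pixel-by-pixel mapping mirrors the description above rather than an optimized implementation:

```python
import numpy as np

def center_ct_on_cbct(ct: np.ndarray, cbct_shape=(384, 384),
                      ct_center=None, cbct_center=None) -> np.ndarray:
    """Paste the CT slice into an empty CBCT-sized array so that the two
    midpoints (e.g., the bounding-box centers from the spine detector)
    coincide. Pixels falling outside the 384x384 canvas are cropped."""
    canvas = np.zeros(cbct_shape, dtype=ct.dtype)
    if ct_center is None:
        ct_center = (ct.shape[0] // 2, ct.shape[1] // 2)
    if cbct_center is None:
        cbct_center = (cbct_shape[0] // 2, cbct_shape[1] // 2)

    # Translation value = difference between the two midpoints
    dy = cbct_center[0] - ct_center[0]
    dx = cbct_center[1] - ct_center[1]

    # Map each CT pixel to the new array, gradually filling it
    for r in range(ct.shape[0]):
        for c in range(ct.shape[1]):
            rr, cc = r + dy, c + dx
            if 0 <= rr < cbct_shape[0] and 0 <= cc < cbct_shape[1]:
                canvas[rr, cc] = ct[r, c]
    return canvas
```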
The deformable transformation used VoxelMorph from G. Balakrishnan’s research toolbox, employing an unsupervised learning approach. In this study, the VoxelMorph model was trained with CBCT images as the fixed image and CT rigid images as the moving image.
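A training sketch following the publicly available VoxelMorph (TensorFlow) tutorial pattern is shown below; the U-Net feature counts, loss weights, and batch size are assumptions, as the study does not report its exact hyperparameters:

```python
import numpy as np
import voxelmorph as vxm  # Balakrishnan et al.'s toolbox (TensorFlow backend)

inshape = (384, 384)  # CBCT / rigidly aligned CT slice size

# U-Net feature counts follow the VoxelMorph tutorial defaults (an assumption)
model = vxm.networks.VxmDense(
    inshape,
    nb_unet_features=[[16, 32, 32, 32], [32, 32, 32, 32, 32, 16, 16]],
    int_steps=0)

# Unsupervised losses: image similarity between the warped CT and the fixed
# CBCT, plus a smoothness penalty on the predicted deformation field
losses = [vxm.losses.MSE().loss, vxm.losses.Grad('l2').loss]
model.compile(optimizer='adam', loss=losses, loss_weights=[1.0, 0.01])

def gen(ct_rigid, cbct, batch=4):
    """Yield ([moving CT, fixed CBCT], [fixed CBCT, zero flow]) batches.

    ct_rigid and cbct are assumed to be (N, 384, 384) numpy arrays of
    normalized, rigidly pre-aligned slice pairs."""
    zeros = np.zeros((batch, *inshape, 2))
    while True:
        idx = np.random.randint(0, len(ct_rigid), size=batch)
        moving = ct_rigid[idx][..., None]  # add channel axis
        fixed = cbct[idx][..., None]
        yield [moving, fixed], [fixed, zeros]

# model.fit(gen(ct_rigid_stack, cbct_stack), steps_per_epoch=100, epochs=50)
```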
Evaluation of image registration
The evaluation of image registration results was reviewed based on the Target Registration Error (TRE), Dice Similarity Coefficient (DSC), and Mean Distance to Agreement (MDA) metrics. Registration is considered successful when corresponding anatomical points and structures in both images exhibit spatial consistency. In calculating TRE, three identical points on the CT and CBCT images being evaluated were used, following Brock et al.[15] The Euclidean distance between these identical points on both images was then calculated as follows
$$\text{TRE} = \frac{1}{n}\sum_{i=1}^{n}\sqrt{(T_{ix}-T'_{ix})^2 + (T_{iy}-T'_{iy})^2 + (T_{iz}-T'_{iz})^2} \quad (3)$$
where $n$ is the total number of evaluated point pairs, $T_{ix}, T_{iy}, T_{iz}$ are the coordinates of the identical points on the CBCT image, and $T'_{ix}, T'_{iy}, T'_{iz}$ are the coordinates of the corresponding points on the CT image after rigid transformation. The evaluation of deformable registration was conducted using the DSC and MDA metrics. DSC was applied in both micro and macro evaluations. In the micro evaluation, overall image alignment was assessed without distinguishing individual anatomical structures. In the macro evaluation, DSC was calculated separately for four segmented structures: right lung, left lung, spinal cord, and body; the final macro-DSC score was derived by averaging the DSC values of these four segmented classes. DSC was calculated using the following equation
$$\text{DSC} = \frac{2\,|A \cap B|}{|A| + |B|} \quad (4)$$
where A represents the output image of deformable registration, and B is the CBCT image before registration, which serves as the ground truth. According to AAPM Task Group 132 (TG-132), the expected range for the DSC in image registration typically lies between 0.8 and 0.9. A DSC below 0.8 suggests that the registration may not be sufficiently accurate, whereas values closer to 1 indicate a highly precise registration.
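Both DSC variants can be computed from label maps; the sketch below assumes integer-labeled masks (0 = background, 1–4 = the four segmented structures) and follows the micro/macro definitions given above:

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks (Eq. 4)."""
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

def macro_micro_dsc(pred: np.ndarray, ref: np.ndarray, labels=(1, 2, 3, 4)):
    """pred/ref are label maps (0 = background; 1-4 = the four structures).

    Macro: mean of the per-structure DSC values.
    Micro: DSC of all labeled pixels pooled together, ignoring class identity.
    """
    per_class = [dice(pred == k, ref == k) for k in labels]
    macro = float(np.mean(per_class))
    micro = dice(pred > 0, ref > 0)
    return macro, micro
```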
To complement the overlap-based DSC metric, MDA was used to evaluate geometric agreement between corresponding contours. MDA was calculated as the average bidirectional distance between all contour points in image sets A and B, defined as
$$\text{MDA} = \frac{\sum_{a \in A} d(a, B) + \sum_{b \in B} d(b, A)}{|A| + |B|} \quad (5)$$
where A represents the set of contour points derived from the deformable registration result, and B denotes the contour points from the reference CBCT image. The term $d(a, B)$ is defined as the minimum Euclidean distance from a point $a$ on contour A to contour B, while $d(b, A)$ represents the minimum distance from a point $b$ on contour B to contour A. By averaging these bidirectional distances, MDA provides a symmetric global measure of spatial correspondence between the registered and reference contours and is less sensitive to localized contour mismatches than DSC. Smaller MDA values indicate better overall geometric agreement between registered images. In addition, the IDIR method was employed as a comparison, representing the conventional deformable registration approach. The IDIR process followed the toolbox of MATLAB code developed by Aganj and Fischl.[9]
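Equation (5) can be evaluated efficiently with a nearest-neighbour search; the sketch below uses SciPy’s cKDTree on (N, 2) arrays of contour points, in pixel or millimeter coordinates matching the units reported in Table 3:

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_distance_to_agreement(contour_a: np.ndarray,
                               contour_b: np.ndarray) -> float:
    """Symmetric mean surface distance between two (N, 2) point sets (Eq. 5).

    Averages, over both directions, the distance from each point on one
    contour to its nearest neighbour on the other contour."""
    d_a_to_b, _ = cKDTree(contour_b).query(contour_a)  # min dist per point of A
    d_b_to_a, _ = cKDTree(contour_a).query(contour_b)  # min dist per point of B
    return float((d_a_to_b.sum() + d_b_to_a.sum()) /
                 (len(contour_a) + len(contour_b)))
```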
RESULTS
Landmark detection and rigid registration
The mean IoU over the 104 test images was 0.818 ± 0.088 for YOLOv3, 0.813 ± 0.090 for YOLOv5, and 0.817 ± 0.085 for YOLOv8. Moreover, Table 1 shows the mAP evaluation results for all datasets and YOLO versions employed. On this basis, the YOLOv3 object detection model was chosen as the superior algorithm for detecting bony landmark points on CBCT and CT images in this study. Before performing image registration, the translation center point was determined from the midpoint of the bounding box, as seen in Figure 3. Once the midpoint was determined, a rigid transformation was applied.
Table 1.
Annotation model evaluation of employed you only look once version
| Model | Precision | Recall | mAP 50 | mAP 50–95 |
|---|---|---|---|---|
| YOLOv3 | 0.989 | 0.99 | 0.99 | 0.67 |
| YOLOv5 | 0.989 | 0.99 | 0.99 | 0.67 |
| YOLOv8 | 0.986 | 0.99 | 0.99 | 0.66 |
The mAP 50 represents the mean average precision at a 50% IoU threshold, while the mAP 50–95 indicates the mAP averaged over IoU thresholds ranging from 50% to 95%. YOLO: You only look once, mAP: Mean average precision, IoU: Intersection over union
Figure 3.

The typical intersection over union results contain the bounding box. The midpoint of the box represents the translation center position. IoU: Intersection over union
Twelve patients who underwent the third-week fractionation were evaluated for rigid registration. The average shift along the X-axis was 1.45 ± 0.57 cm, whereas the shift along the Y-axis was 1.72 ± 0.69 cm. This shift occurred due to the difference between the midpoint of the CT image and the midpoint of the CBCT image. Table 2 shows the results of the rigid registration evaluation for the 12 patient images. The average DSC value was 0.88 ± 0.04, and the TRE value was 1.7 ± 1.0 mm.
Table 2.
The position shifts, rotation, dice similarity coefficient, and target registration error evaluation of rigid registration of computed tomography and cone-beam computed tomography for 12 patients
| Patient number | X-axis shift (cm) | Y-axis shift (cm) | Rotation (°) | DSC | TRE (mm) |
|---|---|---|---|---|---|
| 1 | 0.48 | 1.57 | 1.0 | 0.91 | 1.9 |
| 2 | 0.95 | 1.01 | 0.3 | 0.89 | 0.3 |
| 3 | 1.71 | 1.90 | 1.2 | 0.89 | 2.6 |
| 4 | 2.14 | 1.55 | 1.3 | 0.90 | 1.2 |
| 5 | 1.72 | 2.65 | 0.0 | 0.92 | 0.6 |
| 6 | 0.92 | 2.83 | 1.0 | 0.89 | 2.3 |
| 7 | 1.72 | 1.53 | 1.3 | 0.93 | 1.2 |
| 8 | 1.45 | 2.67 | 0.0 | 0.83 | 3.2 |
| 9 | 1.06 | 1.86 | 0.3 | 0.80 | 0.8 |
| 10 | 0.96 | 1.57 | 0.0 | 0.88 | 0.5 |
| 11 | 2.64 | 1.11 | 1.8 | 0.86 | 2.6 |
| 12 | 1.64 | 0.45 | 0.0 | 0.89 | 3.2 |
DSC: Dice similarity coefficient, TRE: Target registration error
Deformable registration
The deformable registration started with segmenting four structures to assess anatomical changes. We tested two YOLO versions and found that YOLOv8 required 9.72 min for training and evaluation, whereas YOLOv9 required 17.72 min, indicating that YOLOv8 is more computationally efficient. However, YOLOv9 outperformed YOLOv8 in the segmentation task: it accurately detected all labels in the 12 test images with better sensitivity and image context understanding, whereas YOLOv8 missed 8 out of 12 spinal cord labels on the CBCT images, as shown in Figure 4. These segmentations then became the reference structures for the deformable evaluation.
Figure 4.

Segmentation structure detection by the YOLOv8 and YOLOv9 models on the 12 computed tomography (CT) and cone-beam CT (CBCT) test images. CT: Computed tomography, CBCT: Cone-beam computed tomography, YOLOv8: You only look once, version 8, YOLOv9: You only look once, version 9
Figure 5 shows the overlay of the 4-week evaluation for one patient, with each week containing three axial slices (top, middle, and bottom of the lung structure). Noticeable differences remained between the deformable registration results and the reference CBCT images. Moreover, the macro-evaluation DSC values were mostly below 0.8, whereas the micro-evaluation values consistently exceeded 0.8. The MDA evaluation was performed on segmented images to ensure anatomical correspondence, so that the distance measurements represent geometric differences between homologous anatomical contours. The image pairs (a, b, c) in Table 3 represent the top, middle, and bottom axial slices of the lung. The MDA evaluation shows that the IDIR method yields lower average distances between corresponding anatomical contours than the VoxelMorph method. Nevertheless, the MDA values obtained from both methods fall within the submillimeter range, approaching 1 mm at most. According to the AAPM TG-132 recommendations, geometric deviations within a tolerance range of 2–3 mm are still considered acceptable for DIR evaluation.
Figure 5.
The overlay of deformable registration over time using VoxelMorph, where a, b, and c are the top, middle, and bottom areas of the chest. Blue contours represent the cone-beam computed tomography (CBCT) reference, and orange contours represent the computed tomography (CT) image after registration
Table 3.
Dice similarity coefficient and mean distance to agreement evaluation of the VoxelMorph and intermediate deformable image registration methods for one patient undergoing 4-week fractionations
| Image pair | VoxelMorph DSC (macro) | VoxelMorph DSC (micro) | VoxelMorph MDA (mm) | VoxelMorph runtime (s) | IDIR DSC (macro) | IDIR DSC (micro) | IDIR MDA (mm) | IDIR runtime (s) |
|---|---|---|---|---|---|---|---|---|
| 1st week | ||||||||
| a | 0.75 | 0.93 | 0.404 | 0.384 | 0.52 | 0.85 | 0.425 | 188.244 |
| b | 0.54 | 0.88 | 0.920 | 0.436 | 0.55 | 0.86 | 0.580 | 192.305 |
| c | 0.91 | 0.96 | 0.326 | 0.440 | 0.99 | 0.99 | 0.004 | 189.525 |
| 2nd week | ||||||||
| a | 0.55 | 0.88 | 0.827 | 0.621 | 0.85 | 0.95 | 0.014 | 184.135 |
| b | 0.52 | 0.87 | 0.908 | 0.952 | 0.94 | 0.98 | 0.005 | 210.716 |
| c | 0.91 | 0.96 | 0.414 | 0.674 | 0.99 | 1.00 | 0.001 | 209.988 |
| 3rd week | ||||||||
| a | 0.49 | 0.86 | 0.914 | 1.245 | 0.84 | 0.95 | 0.034 | 179.619 |
| b | 0.55 | 0.88 | 0.709 | 0.662 | 0.46 | 0.83 | 0.296 | 184.742 |
| c | 0.91 | 0.95 | 0.507 | 0.624 | 0.98 | 0.99 | 0.095 | 219.667 |
| 4th week | ||||||||
| a | 0.78 | 0.94 | 0.454 | 0.616 | 0.48 | 0.84 | 0.509 | 202.902 |
| b | 0.74 | 0.93 | 0.312 | 0.748 | 0.82 | 0.95 | 0.018 | 202.514 |
| c | 0.96 | 0.92 | 0.367 | 0.695 | 0.99 | 0.99 | 0.012 | 214.039 |
The evaluation was divided into macro (per-OAR segmentation) and micro (overall) assessments. Image pairs a, b, and c represent the top, middle, and bottom axial slices of the lung. DSC: Dice similarity coefficient, MDA: Mean distance to agreement, IDIR: Intermediate deformable image registration, OAR: Organ at risk
In addition, we used the IDIR method as a comparative deformable model. The iterative process is shown in Figure 6, and Figure 7 shows the overlay of the deformed images over the CBCT reference images, analogous to those produced by VoxelMorph. The macro-DSC value improved with the IDIR method. The details of the DSC and MDA results and their run times can be seen in Table 3.
Figure 6.
Iterative process of the intermediate deformable image registration (IDIR) method at selected iterations
Figure 7.
The overlay of deformable registration over time using intermediate deformable image registration (IDIR), where a, b, and c are the top, middle, and bottom areas of the chest
DISCUSSION
In this study, the CBCT and CT rigid images were annotated using a semi-automatic annotation approach, which allows for a blend of automated and manual processes to enhance labelling accuracy. This method is particularly valuable in medical imaging, where precision is crucial. The annotations were performed in accordance with the contours outlined by radiation oncologists, ensuring that the labels accurately reflect the anatomical structures relevant to the patients.[16] The use of semi-automatic annotation tools has been widely supported in medical applications. These tools facilitate the annotation process by enabling quick and efficient labeling while allowing for expert oversight.[17,18]
YOLOv3 uses Darknet53 as its backbone to extract features, which improves the algorithm’s accuracy through residual blocks, skip connections, and up-sampling. In this research, we used YOLOv3 because its accuracy was better than that of YOLOv5 and YOLOv8 in terms of IoU and mAP results. A review by Nepal and Eslamiat showed that YOLOv5 outperformed YOLOv3.[19] This does not significantly contradict our findings, as the average IoU difference between YOLOv3 and YOLOv5 was only 0.005, and the two models had the same mAP value. Meanwhile, in our study, YOLOv8 showed an mAP 50–95 value 0.01 lower than both YOLOv5 and YOLOv3. Research investigating YOLO using COCO2017 also showed that YOLOv8’s AP was 2% lower than YOLOv5’s.[20]
The results for rigid registration showed an average DSC value of 0.88 ± 0.04 and a TRE value of 1.7 ± 1.0 mm. According to AAPM TG-132 recommendations, these results are within the tolerance range, with DSC and TRE values between 0.8–0.9 and 2–3 mm, respectively.[15] The rotation results showed variations from 0° to 1.8°. This deviation is due to annotations made on the spinal area without considering the shape of the patient tabletop.
YOLOv9 is an excellent choice for image segmentation due to its accuracy, outperforming earlier versions like YOLOv8. Its architecture, particularly the YOLOv9c-seg version, ensures precise boundary detection of anatomical structures crucial for such tasks. Widely used YOLO models have been applied in medical imaging for tumour detection, lung nodule identification, and segmentation, making YOLOv9 a natural progression for these applications.[21,22] In our results, YOLOv9 exceeded YOLOv8 on small organs such as the spinal cord, despite the relatively poor image quality of CBCT.
Evaluation of deformable registration was performed using DSC metrics, divided into micro (overall evaluation) and macro (per segmented structure). All the micro-DSC values passed within the AAPM TG-132 tolerance, as seen in Table 3. However, the overall evaluation was not directly correlated with internal movement due to the respiratory process, so we additionally conducted the macro-DSC evaluation. Almost all the macro-DSC values failed the AAPM TG-132 tolerances except for image pair c, in which only a small lung area was evaluated [Figures 5 and 6]. These results agree with Balakrishnan et al., who reported that the average macro-DSC in brain cases was lower than the AAPM TG-132 tolerance.[10]
The same pattern was also found with the IDIR method. The average macro-DSC score for IDIR was 0.786 ± 0.180, whereas VoxelMorph achieved 0.635 ± 0.172. This indicates that IDIR provides more consistent and accurate segmentation across different anatomical structures, particularly for more complex regions, as reflected in its higher mean macro-DSC. To complement overlap-based evaluation, MDA was incorporated as a surface-based metric to assess geometric correspondence between homologous segmented contours. MDA directly quantifies spatial discrepancies between anatomical boundaries and is less sensitive to volume effects. As shown in Table 3, MDA values varied substantially across lung regions and fractions, ranging from 0.312 to 0.920 mm for VoxelMorph and from 0.001 to 0.580 mm for IDIR, reflecting heterogeneous respiration-induced deformation.
From the perspective of processing time, VoxelMorph surpasses the conventional IDIR-based method. With a registration time of approximately 1 s, VoxelMorph shows great potential as a real-time deformable registration tool for adaptive radiotherapy. The iterative nature of IDIR, as illustrated in Figure 6, further highlights the gradual refinement of deformation fields, contrasting with the rapid single-step output of VoxelMorph.
In this study, deformable registration was evaluated using axial slice-wise data from a single patient acquired across four CBCT fractions, enabling assessment of longitudinal intra-patient anatomical variations and comparison between VoxelMorph[10] and IDIR.[9] Restricting the analysis to one patient avoided inter-patient cross-matching, as anatomical deformations are highly individual. However, the use of slice-wise data from a single patient limits the ability to assess volumetric consistency and restricts the generalizability of the results to unseen patients.
There are limitations to this work. As shown in Figure 5, residual discrepancies remain between the deformably registered images and the reference CBCT, indicating that further refinement is required to reduce registration errors. In addition, the analysis was limited to 2D data due to DICOM-to-PNG conversion, which resulted in the loss of clinically relevant information. Future work will address these limitations by incorporating multipatient longitudinal datasets, multiple respiratory phases, and frameworks supporting direct DICOM-based three-dimensional (3D) processing, such as MONAI (Medical Open Network for AI).[21,23] The direct use of AI with DICOM image formats and 3D evaluation is beyond the scope of this research and will be investigated in the near future.
CONCLUSIONS
This study presents a proof-of-concept investigation into the feasibility of applying AI-based methods for anatomic landmark detection and image deformation in NSCLC cases using a multi-stage registration framework. Spine detection was performed using YOLOv3, while organ at risk segmentation employed YOLOv9. The results indicate that rigid registration achieved acceptable performance, whereas deformable registration remains challenging and requires further optimization. While the findings suggest the potential of AI-assisted approaches to support adaptive radiotherapy workflows, the current results are limited by the retrospective design, slice-wise evaluation, and restricted dataset size. Therefore, the proposed approach should be interpreted as an initial exploratory step rather than a clinically deployable solution. Future work will focus on patient-wise evaluation, increased training data, and clinically realistic preprocessing to improve robustness, generalizability, and clinical applicability.
Conflicts of interest
There are no conflicts of interest.
Funding Statement
Hibah Riset FMIPA Universitas Indonesia Tahun 2023-2024 (PKS-022/UN2.F3.D/PPM.00.02/2023).
REFERENCES
- 1.Keall PJ, Mageras GS, Balter JM, Emery RS, Forster KM, Jiang SB, et al. The management of respiratory motion in radiation oncology report of AAPM Task Group 76. Med Phys. 2006;33:3874–900. doi: 10.1118/1.2349696. [DOI] [PubMed] [Google Scholar]
- 2.Fu Y, Lei Y, Wang T, Higgins K, Bradley JD, Curran WJ, et al. LungRegNet: An unsupervised deformable image registration method for 4D-CT lung. Med Phys. 2020;47:1763–74. doi: 10.1002/mp.14065. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Nenoff L, Ribeiro CO, Matter M, Hafner L, Josipovic M, Langendijk JA, et al. Deformable image registration uncertainty for inter-fractional dose accumulation of lung cancer proton therapy. Radiother Oncol. 2020;147:178–85. doi: 10.1016/j.radonc.2020.04.046. [DOI] [PubMed] [Google Scholar]
- 4.Nenoff L, Amstutz F, Murr M, Archibald-Heeren B, Fusella M, Hussein M, et al. Review and recommendations on deformable image registration uncertainties for radiotherapy applications. Phys Med Biol. 2023;68:24TR01. doi: 10.1088/1361-6560/ad0d8a. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Yamamoto T, Kabus S, Klinder T, von Berg J, Lorenz C, Loo BW, et al. Four-dimensional computed tomography pulmonary ventilation images vary with deformable image registration algorithms and metrics. Med Phys. 2011;38:1348–58. doi: 10.1118/1.3547719. [DOI] [PubMed] [Google Scholar]
- 6.Peterlik I, Strzelecki A, Lehmann M, Messmer P, Munro P, Paysan P, et al. Reducing residual-motion artifacts in iterative 3D CBCT reconstruction in image-guided radiation therapy. Med Phys. 2021;48:6497–507. doi: 10.1002/mp.15236. [DOI] [PubMed] [Google Scholar]
- 7.Schröder L, Bootsma G, Stankovic U, Ploeger L, Sonke JJ. Impact of cone-beam computed tomography artifacts on dose calculation accuracy for lung cancer. Med Phys. 2024;51:4709–20. doi: 10.1002/mp.16994. [DOI] [PubMed] [Google Scholar]
- 8.Xiao H, Xue X, Zhu M, Jiang X, Xia Q, Chen K, et al. Deep learning-based lung image registration: A review. Comput Biol Med. 2023;165:107434. doi: 10.1016/j.compbiomed.2023.107434. [DOI] [PubMed] [Google Scholar]
- 9.Aganj I, Fischl B. Intermediate Deformable Image Registration Via Windowed Cross-Correlation, Proceedings of the IEEE 20th International Symposium on Biomedical Imaging. 2023:1. doi: 10.1109/isbi53787.2023.10230715. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV. VoxelMorph: A learning framework for deformable medical image registration. IEEE Trans Med Imaging. 2019;38:1788–1800. doi: 10.1109/TMI.2019.2897538. [DOI] [PubMed] [Google Scholar]
- 11.Shatnawi MQ, Abuein Q, Al-Quraan R. Deep learning-based approach to diagnose lung cancer using CT-scan images. Intell Based Med. 2025;11:100188. [Google Scholar]
- 12.Wani NA, Kumar R, Bedi J. DeepXplainer: An interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. Comput Methods Programs Biomed. 2024;243:107879. doi: 10.1016/j.cmpb.2023.107879. [DOI] [PubMed] [Google Scholar]
- 13.Dwyer B, Nelson J, Hansen T. Roboflow (Version 1.0) [Software] Available from: https://roboflow.com. [Last retrieved on 2024 Jul 25]
- 14.Ragab MG, Abdelsamea MM, Mohamed MH, Seoud L, Hussein A, Hassanien AE. A comprehensive systematic review of YOLO for medical object detection (2018 to 2023) IEEE Access. 2024;12:57815–57836. doi: 10.1109/ACCESS.2024.3386826. [Google Scholar]
- 15.Brock KK, Mutic S, McNutt TR, Li H, Kessler ML. Use of image registration and fusion algorithms and techniques in radiotherapy: Report of the AAPM radiation therapy committee task group no. 132. Med Phys. 2017;44:e43–76. doi: 10.1002/mp.12256. [DOI] [PubMed] [Google Scholar]
- 16.Philbrick KA, Weston AD, Akkus Z, Kline TL, Korfiatis P, Sakinis T, et al. RIL-Contour: A medical imaging dataset annotation tool for and with deep learning. J Digit Imaging. 2019;32:571–81. doi: 10.1007/s10278-019-00232-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Aljabri M, AlAmir M, AlGhamdi M, Abdel-Mottaleb M, Collado-Mesa F. Towards a better understanding of annotation tools for medical imaging: A survey. Multimed Tools Appl. 2022;81:25877–911. doi: 10.1007/s11042-022-12100-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Krenzer A, Makowski K, Hekalo A, Fitting D, Troya J, Zoller WG, et al. Fast machine learning annotation in the medical domain: A semi-automated video annotation tool for gastroenterologists. Biomed Eng Online. 2022;21:33. doi: 10.1186/s12938-022-01001-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Nepal U, Eslamiat H. Comparing YOLOv3, YOLOv4 and YOLOv5 for autonomous landing spot detection in faulty UAVs. Sensors. 2022;22:2. doi: 10.3390/s22020464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Terven J, Córdova-Esparza DM, Romero-González JA. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach Learn Knowledge Extraction. 2023;5:1680–716. [Google Scholar]
- 21.Chen A, Lin D, Gao Q. Enhancing brain tumor detection in MRI images using YOLO-NeuroBoost model. Front Neurol. 2024;15:01–17. doi: 10.3389/fneur.2024.1445882. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Wang M. YOLOv8-Based Framework for Accurate Lung CT Nodule Images Detection, Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence. 2024:775. [Google Scholar]
- 23.Diaz-Pinto A, Alle S, Nath V, Tang Y, Ihsani A, Asad M, et al. MONAI label: A framework for AI-assisted interactive labeling of 3D medical images. Med Image Anal. 2024;95:103207. doi: 10.1016/j.media.2024.103207. [DOI] [PubMed] [Google Scholar]