Abstract
Background:
Intrafraction motion monitoring in External Beam Radiation Therapy (EBRT) is usually accomplished by establishing a correlation between the tumor and surrogates such as an external infrared reflector, implanted fiducial markers, or the patient's skin surface. These techniques either have an unstable surrogate-tumor correlation or are invasive. Markerless real-time onboard imaging is a noninvasive alternative that directly images the target motion. However, low target visibility due to overlapping tissues along the X-ray projection path makes tumor tracking challenging.
Purpose:
To enhance target visibility in projection images, a patient-specific model was trained to synthesize the Target-Specific Digitally Reconstructed Radiograph (TS-DRR).
Methods:
Patient-specific models were built using a conditional Generative Adversarial Network (cGAN) to map onboard projection images to TS-DRRs. The standard Pix2Pix network was adopted as our cGAN model. We synthesized TS-DRRs from onboard projection images in phantom and patient studies for spine and lung tumors. Using previously acquired CT images, we generated DRRs and their corresponding TS-DRRs to train the network. For data augmentation, random translations were applied to the CT volume when generating the training images. For the spine, separate models were trained for an anthropomorphic phantom and a patient treated with paraspinal stereotactic body radiation therapy (SBRT). For the lung, separate models were trained for a phantom with a spherical tumor insert and a patient treated with free-breathing SBRT. The models were tested using Intrafraction Motion Review (IMR) images for the spine and CBCT projection images for the lung. The performance of the models was validated using phantom studies with known couch shifts for the spine and known tumor deformation for the lung.
Results:
Both the patient and phantom studies showed that the proposed method can effectively enhance the target visibility of the projection images by mapping them into synthetic TS-DRRs (sTS-DRRs). For the spine phantom with known shifts of 1mm, 2mm, 3mm, and 4mm, the absolute mean errors for tumor tracking were 0.11±0.05mm in the x direction and 0.25±0.08mm in the y direction. For the lung phantom with known tumor motion of 1.8mm, 5.8mm, and 9mm superiorly, the absolute mean errors for the registration between the sTS-DRR and ground truth were 0.1±0.3mm in both the x and y directions. Compared to the projection images, the sTS-DRR increased the image correlation with the ground truth by around 83% and increased the structural similarity index measure with the ground truth by around 75% for the lung phantom.
Conclusions:
The sTS-DRR can greatly enhance the target visibility in onboard projection images for both spine and lung tumors. The proposed method could be used to improve markerless tumor tracking accuracy for EBRT.
1. Introduction
Image Guided Radiation Therapy (IGRT) is widely used to accurately deliver radiation to the tumor while sparing surrounding organs at risk (OARs). In external beam radiotherapy (EBRT), kV pairs and Cone Beam CT (CBCT) are routinely used to consistently set up patients according to the CT simulation, effectively minimizing patient setup error before beam delivery. Due to intrafraction target motion, accurate patient setup before beam delivery alone may not be sufficient to ensure accurate dose delivery. Motion management techniques such as deep inspiration breath hold and abdominal compression are routinely used in the clinic to restrict target motion. The use of an internal target volume (ITV) is another way to mitigate the dosimetric uncertainty caused by internal target motion, such as respiratory motion, during treatment. Even with these motion management and mitigation procedures in place, it is still desirable to have an effective intrafraction monitoring method to ensure high target coverage and critical OAR sparing, especially for stereotactic body radiation therapy (SBRT), which adopts tight margins and delivers high doses per fraction.
It is very appealing to the radiotherapy community to have an image guidance system that allows us to ‘see what we treat, as we treat’. To this end, many intrafraction monitoring techniques have been developed to visualize and track target motion during radiation treatment1. Onboard kV imaging systems, e.g., the Varian On-Board Imager (OBI), have been integrated with linear accelerators (LINACs) to enable intrafraction monitoring. MR-LINAC systems, e.g., the ViewRay MRIdian and Elekta Unity, combine MR imaging with a LINAC to leverage the superior soft tissue contrast of MRI for radiotherapy. The BrainLab ExacTrac and Vero Gimbal systems use stereoscopic kV imaging with external breathing signals for patient motion monitoring during treatment. Other motion tracking systems include electromagnetic transponders (e.g., Varian Calypso), surface imaging (e.g., AlignRT), and ultrasound imaging. In this study, we focus on kV x-ray imaging-based intrafraction motion monitoring using the onboard imager equipped on most modern LINACs.
Due to poor soft tissue contrast and the 2D nature of kV imaging, high-contrast fiducial markers are commonly used as surrogates for the tumor position. However, fiducial marker implantation is an invasive and costly procedure that carries medical risks and is therefore not universally available to patients. Moreover, the small number of implanted fiducial markers is a sparse point representation of the 3D solid tumor, which could change shape and volume over the course of treatment. Markers can also migrate, yielding positional errors in the kV images2. Markerless target monitoring using kV imaging is therefore highly desired. In 2019, an AAPM grand challenge, the MArkerless Lung Target Tracking CHallenge (MATCH), was organized to systematically investigate and benchmark the accuracy of markerless lung tumor motion tracking methods3. One common challenge for markerless tumor tracking is the low tumor visibility in onboard kV projection images: the tumor target is obscured by overlapping structures along the x-ray projection path. Zhang et al. performed MV/kV imaging-based lung tumor tracking and identified low tumor visibility as a major obstacle4,5. De Bruin et al. performed kV imaging-based markerless lung tumor tracking and showed that low tumor visibility was one of the major causes of unsuccessful tracking6. Hence, it is important to develop a method to enhance tumor visibility in the projection images for accurate tumor tracking. To achieve this goal, Menten et al. investigated the use of dual-energy x-ray imaging to enhance tumor visibility by removing overlapping bony structures such as the ribs7. Bowman et al. showed the feasibility of optimizing dual-energy x-ray parameters to enhance soft-tissue imaging8. Berg et al. proposed a thoracic bone suppression algorithm to enhance the sensitivity and specificity of the detection and localization of lung nodules9. He et al. proposed a deep learning model to decompose the spine from x-ray projection images, which can be used to improve paraspinal tumor tracking10,11. Shen et al. developed a deep learning-based patient-specific model to reconstruct volumetric CT images from a single projection image or a few projection images12. The model was tested on one upper-abdomen patient, one head-and-neck patient, and one lung patient, demonstrating the feasibility of using a patient-specific model with prior CT knowledge to reconstruct volumetric CT images. Another proof-of-concept study, by Lei et al., reconstructed volumetric CT from a single digitally reconstructed radiograph (DRR) for lung patients13. Their model was trained on 9 phases of lung 4DCT images and tested on the remaining phase to demonstrate the potential use for tumor tracking. In tests on 20 patients, the method reconstructed the volumetric CT with tumor center-of-mass positional accuracy within 2.6 mm. These methods have demonstrated the great potential of patient-specific deep learning models for sparse-view CT reconstruction. However, as proof-of-concept studies, one common limitation is that the models were tested only on DRRs rather than on real onboard projection images; training and testing on the same image domain favors network performance. Moreover, we believe it is too challenging for a network to reconstruct volumetric CT with sub-millimeter geometric accuracy from a single 2D projection image, owing to the absence of information in the third dimension.
In radiotherapy, DRRs are usually created from the simulation CT to simulate the onboard projection images for patient setup. Rigid image registration between the kV setup image pairs and their corresponding DRRs is routinely performed to set up the patient. This method normally works well for bony structures due to their high visibility. The low soft tissue contrast of onboard projection images and DRRs makes them less accurate for aligning soft tissue targets. Therefore, CBCT is often acquired after the kV/kV or kV/MV setup pair to further improve target positioning accuracy before beam delivery. The target visibility of the projection images is affected by the kV energy used, the detector efficiency, the soft tissue attenuation coefficients, and other factors. Nevertheless, one of the major contributors to low target visibility is the presence of structures overlapping the target along the x-ray projection path. To improve target visibility, we introduce a new imaging modality, the Target-Specific DRR (TS-DRR). The reconstruction process of the TS-DRR is shown in Fig. 1. Considering one thoracic vertebra as the planning target volume (PTV), the CT volume is truncated to include only the section that overlaps with the PTV for TS-DRR generation. Fig. 1 shows the comparison between the DRR and TS-DRR at four projection angles; target visibility is greatly improved on the TS-DRR compared to the DRR. Therefore, we propose to build a patient-specific model to generate a synthetic Target-Specific DRR (sTS-DRR) from intrafraction kV images for intrafraction tumor monitoring.
Fig. 1.
DRR vs. TS-DRR. Left: DRR is projected through the whole body while TS-DRR is projected through the target-only volume. Right: Target visibility was improved for TS-DRR compared to DRR.
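The difference between the two projections in Fig. 1 can be illustrated with a toy parallel-beam sketch (the clinical system is cone-beam, and the function names here are illustrative, not the study's code): the DRR integrates the full CT volume along the ray direction, while the TS-DRR first truncates the volume to the slab of slices that overlaps the target.

```python
import numpy as np

def drr_parallel(volume, axis=0):
    """Toy parallel-beam DRR: line integral of the whole CT volume along one
    axis (the clinical geometry is cone-beam; this is an illustrative stand-in)."""
    return volume.sum(axis=axis)

def ts_drr_parallel(volume, target_mask, axis=0):
    """Toy TS-DRR: keep only the slab of slices (along the projection axis)
    that overlaps the target, then project."""
    other_axes = tuple(i for i in range(volume.ndim) if i != axis)
    slab = target_mask.any(axis=other_axes)           # slices containing target
    truncated = np.compress(slab, volume, axis=axis)  # drop non-target slices
    return truncated.sum(axis=axis)
```

Because overlapping tissue outside the slab never enters the line integral, the target dominates the TS-DRR contrast, which is the visibility gain shown in Fig. 1.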
2. Materials and Methods
2.1. Workflow
The proposed workflow of using the sTS-DRR for treatment is shown in Fig. 2. During treatment, the patient was first positioned on the couch using kV setup pairs and CBCT. After beam-on, intrafraction kV projection images were taken at a pre-determined frequency. Real-time kV projection images such as the Intrafraction Motion Review (IMR) images were fed to the trained model to generate sTS-DRRs with enhanced target visibility. The IMR is a real-time 2D motion management tool on the Varian TrueBeam™ system featuring triggered imaging acquired with the OBI during beam delivery. A tumor TS-DRR template was generated from the planning CT. The tumor in the sTS-DRR generated from the real-time kV projection images was registered to the tumor template in the TS-DRR for motion tracking. Based on the calculated tumor motion, motion management techniques such as beam gating or multileaf collimator tumor tracking could be applied during beam delivery.
Fig. 2.
The workflow of the proposed method. A Pix2Pix patient-specific model was trained using the TS-DRR and DRR image pairs. A synthetic TS-DRR can be generated from the real-time kV projection images to enhance the target visibility. The synthetic TS-DRR was registered to the tumor template TS-DRR generated from the planning CT for intrafraction tumor motion monitoring.
2.2. Network
Training DRR and TS-DRR image pairs were first generated from the simulation CT by projecting the full CT volume and the target-only volume at different angles. The DRR and TS-DRR are perfectly matched image pairs with pixel-to-pixel correspondence. Hence, we chose the Pix2Pix conditional GAN as our network14,15. UNet-256 was selected as the network generator; an input image of size 512×512 is downsampled to 2×2 at the bottleneck. The generator has a total of 54 million learnable parameters. For the discriminator, the ‘PatchGAN’ classifier described in the original Pix2Pix paper was adopted: rather than judging the whole image at once, the discriminator's convolution layers classify each 70×70 patch of the generated sTS-DRR as real or fake to encourage realistic image generation. Such a patch-level discriminator has fewer parameters than a full-image discriminator. The Pix2Pix network can thus be trained to create a patient-specific model for sTS-DRR generation from projection images.
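The architecture sizes quoted above can be sanity-checked arithmetically, assuming the standard Pix2Pix layer configuration (4×4 kernels; the 70×70 PatchGAN uses strides 2, 2, 2, 1, 1) rather than any custom variant:

```python
# UNet-256 generator: eight stride-2 downsamplings take a 512x512 input to 2x2.
size = 512
for _ in range(8):
    size //= 2
assert size == 2

# 70x70 PatchGAN: 4x4 conv layers with strides 2,2,2,1,1 give each output
# element a 70x70 receptive field, computed back-to-front with
# RF_in = RF_out * stride + (kernel - stride).
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]  # (kernel, stride)
rf = 1
for kernel, stride in reversed(layers):
    rf = rf * stride + (kernel - stride)
assert rf == 70
```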
2.3. Training and Testing
The original Pix2Pix paper focused on image style transfer without an explicit emphasis on rigorous pixel-to-pixel correspondence; minor misalignments between training image pairs can be tolerated, and the training pairs were generated by cropping roughly aligned source- and target-domain images at pixel locations within a certain distance of each other. In our case, the DRR and TS-DRR image pairs have exact pixel-to-pixel correspondence, so the training image pairs were cropped at the same pixel location in both the DRR and TS-DRR. For natural image processing tasks such as image classification, translation invariance is a desirable property for a neural network, meaning the network prediction is robust to minor image shifts16. For intrafraction motion monitoring, translation invariance is undesirable, since any input image shift caused by intrafraction patient motion must be preserved in the output sTS-DRR. To solve this problem, image augmentation was used to train the network to be equivariant to translation, meaning the output image shifts equally when the input image is shifted. Specifically, the CT was randomly shifted in the lateral, longitudinal, and vertical directions within ±1cm around the isocenter with a uniform distribution. This is analogous to rigidly shifting the treatment couch around the treatment isocenter during DRR generation. To improve network robustness, speckle noise with a variance of 0.001 was added to the DRR during training. For the discriminator, we found that the least-squares GAN (LSGAN) outperformed the vanilla GAN17. The network was trained on an Nvidia RTX A6000 graphics card with 48 GB of memory. Training took around 4 to 8 hours depending on the treatment site and the number of training image pairs used. The number of training image pairs depends on the CT (e.g., whether a 4DCT or a single CT was used) and on the degree of augmentation (e.g., the number of random shifts performed per projection angle). To make the image size consistent between training and testing, the IMR and raw CBCT projection images were cropped from 1024×768 to 512×512. The code was implemented using PyTorch version 1.12.1 and modified from the GitHub repository at: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix. The specific training parameters used were “--model pix2pix --input_nc 1 --output_nc 1 --preprocess crop --no_flip --no_html --load_size 512 --crop_size 512 --gan_mode lsgan”.
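The two augmentations described in this section, uniform random couch shifts within ±1 cm of the isocenter and speckle noise with variance 0.001, can be sketched as follows (function names are illustrative; the shift would be applied to the CT volume before DRR generation):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_couch_shift_mm(limit_mm=10.0):
    """Uniform random shift in the lateral, longitudinal, and vertical
    directions within +/-1 cm of the isocenter (applied to the CT before
    projecting the DRR/TS-DRR training pair)."""
    return rng.uniform(-limit_mm, limit_mm, size=3)

def add_speckle_noise(drr, variance=0.001):
    """Speckle (multiplicative) noise: img + img * n with n ~ N(0, variance)."""
    noise = rng.normal(0.0, np.sqrt(variance), size=drr.shape)
    return drr + drr * noise
```

Applying the same random shift to both images of a pair is what teaches the network translation equivariance rather than invariance.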
By design, a simulation CT with sub-millimeter slice thickness should be used for model training, because it provides adequate spatial resolution and, being acquired well before treatment, leaves enough time to train the network before the patient starts treatment. Nevertheless, a CBCT dataset could be used as a surrogate when a high-resolution CT is not available. CT usually has higher image quality but lower spatial resolution than CBCT. DRRs generated from a same-day CBCT usually have higher anatomical consistency with the IMR images acquired shortly after. However, a same-day CBCT is not available until the treatment session begins, which leaves limited time for network training. CBCT from a previous fraction could be used when spatial resolution is of absolute importance, e.g., for paraspinal SBRT. For lung tumors, 4DCT is often acquired to assess the tumor motion amplitude and to decide on the appropriate motion management technique, such as free breathing or breath hold. The 4DCT should be used for model training since it contains the tumor motion at 10 different phases within a full respiratory cycle.
Since the DRRs and projection images belong to different imaging domains, pre-processing of the projection images is needed prior to network inference. Compared to the DRRs generated from the simulation CT, the projection images acquired by the OBI of a LINAC exhibit Compton scatter artifacts, a different x-ray source energy, a different detector response, and a different beam geometry. All of these factors contribute to the domain difference between the projection images and the DRRs. Hence, it is important to preprocess the projection images to match the training DRRs before network inference. Fig. 3 shows the proposed pre-processing steps that effectively mitigate this problem. In Fig. 3(A), the raw CBCT projection data were first corrected for the bowtie filter attenuation18. Then, the histogram of each projection image was matched to the training DRR at the same projection angle. For the spine, IMR images were used for patient motion monitoring. Since no bowtie filter is used for IMR, the histogram of the IMR was directly matched to the DRR, as shown in Fig. 3(B). The preprocessed projection images were then used to generate the sTS-DRR.
Fig. 3.
Image preprocessing for (A) CBCT projection images for a lung case, (B) IMR images for a spine case.
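The histogram-matching step of the preprocessing pipeline can be sketched in NumPy as a CDF-based gray-level mapping (the bowtie-filter correction is detector-specific and omitted here; this `match_histograms` is an illustrative implementation, not the exact code used in the study):

```python
import numpy as np

def match_histograms(source, reference):
    """Map the gray levels of `source` so its histogram matches `reference`.
    Each source value is sent to the reference value at the same cumulative
    probability (quantile mapping)."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # look up, for each source quantile, the reference gray level at that quantile
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched_vals[src_idx].reshape(source.shape)
```

In the described workflow, `source` would be a (bowtie-corrected) projection image and `reference` the training DRR at the same projection angle.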
2.4. Spine tumor
2.4.1. Phantom study
For paraspinal SBRT, the target is usually very close to the spinal cord. In our clinic, IMR images are acquired to monitor patient intrafraction motion during beam delivery to ensure target coverage and spinal cord sparing. In this study, we used a nine-field fixed-gantry-angle IMRT treatment plan19. The 9 IMR projection angles were [10°, 30°, 50°, 70°, 90°, 110°, 130°, 150°, 170°]. An IMR image was acquired every 200 MU during beam delivery. To quantify target motion, the IMR acquired during treatment can be registered to the planning DRR around the PTV; the robustness of this registration is sometimes impaired by the target's low visibility11. The sTS-DRR aims to improve target visibility and could thereby increase tumor tracking accuracy and robustness. Because a simulation CT with slice thickness <1mm was not available, a high-resolution CBCT (0.45×0.45×0.45mm3) was used as a surrogate to generate the training images. The treatment isocenter for the phantom was at the 7th cervical vertebra. To investigate the potential of the sTS-DRR for intrafraction motion monitoring, the phantom was manually shifted by 1mm, 2mm, 3mm, and 4mm in each of the lateral, longitudinal, and vertical directions. For each shift, nine IMR images from the 9 gantry angles were acquired during beam delivery for analysis. sTS-DRRs were generated from these IMR images for testing.
2.4.2. Patient study
Due to the unavailability of ground truth for the patient study, we investigated one important property of the patient-specific model: its equivariance to translation. The trained model is equivariant to translation if the output sTS-DRR is shifted equally when its input IMR is shifted. Since a high-resolution simulation CT with slice thickness <1mm was not available for paraspinal patients in our clinic, CBCT was used as a surrogate to generate the training images. A patient-specific model was trained for a patient who underwent paraspinal SBRT with nine-field fixed-gantry-angle IMRT and IMR intrafraction monitoring. The projection angles were the same as in the phantom study. To investigate whether the model is equivariant to translation, the IMR images were intentionally shifted in the x and y directions by 1mm, 2mm, 3mm, and 4mm, and sTS-DRRs were generated from the shifted IMRs for testing. Translation-only image registration was performed in Matlab between the sTS-DRRs with and without the shift, using the built-in OnePlusOneEvolutionary optimizer with MattesMutualInformation as the image similarity metric.
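The study uses Matlab's mutual-information registration for this step. As an illustrative Python analogue for recovering a pure translation between two images, phase correlation finds the shift as the peak of the inverse FFT of the normalized cross-power spectrum (integer-pixel accuracy only in this sketch, whereas the reported results are sub-pixel):

```python
import numpy as np

def phase_correlation_shift(fixed, moving):
    """Estimate the integer-pixel translation between two same-size images.
    Returns the (row, col) shift to apply to `moving` (e.g., via np.roll)
    to align it with `fixed`."""
    F = np.fft.fft2(fixed)
    M = np.fft.fft2(moving)
    cross = F * np.conj(M)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real         # delta peak at the translation
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shifts = np.array(peak, dtype=float)
    # wrap shifts larger than half the image size to negative values
    for i, n in enumerate(corr.shape):
        if shifts[i] > n // 2:
            shifts[i] -= n
    return shifts
```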
2.5. Lung tumor
2.5.1. Lung phantom study
A LUNGMAN phantom (Kyoto Kagaku Co., Ltd) with a spherical tumor of 12mm diameter was used to train the network for lung tumor enhancement. The spherical tumor, with a density of 100 HU, was manually placed in the right lung. Fig. 4 shows the experimental setup. The abdomen block of the phantom was initially pulled out by ~2cm to simulate end-inhalation. First, one CBCT was acquired to align the treatment isocenter to the center of the tumor. Subsequent CBCTs were taken after the abdomen block was pushed in by hand, little by little. The three subsequent CBCTs revealed that the lung was deformed by the abdomen block and that the tumor moved superiorly by 1.8mm, 5.8mm, and 9mm, respectively, as shown in Fig. 5. The tumor also moved by ~1mm laterally and vertically as the abdomen block was pushed in.
Fig. 4.
The LUNGMAN phantom with 12mm diameter tumor. Lung and tumor were deformed by manually pushing the abdomen block in.
Fig. 5.
CBCTs showing that the tumor moved superiorly. (A) tumor is at isocenter, (B-D): tumor was deformed superiorly by 1.8mm, 5.8mm and 9mm, respectively.
For lung tumor visibility enhancement, 4DCT or 4DCBCT is preferred over a single static CT or CBCT for generating training image pairs, since 4DCT or 4DCBCT captures the tumor motion at different phases throughout a respiratory cycle. However, since neither 4DCT nor 4DCBCT was available for this experimental setup, we generated the training image pairs using only the CBCT with the tumor at the isocenter (Fig. 5A). Alternatively, a simulation CT could be used to train the network. A total of 3600 training image pairs were generated at 1° projection angle intervals with 10 random shifts per projection angle within ±1cm around the isocenter in the lateral, vertical, and longitudinal directions. The same training and testing configurations as for the spine were used for the lung phantom. The network was trained for 200 epochs in about 8 hours. The trained network was tested on separately acquired CBCT projection images after the lung was deformed by the abdomen block, as shown in Fig. 5(B–D).
2.5.2. Patient study
A patient-specific model was trained using the 4DCT of one patient treated with free-breathing lung SBRT. The voxel size of the 4DCT images was 1.17×1.17×2 mm3. A total of 9000 training image pairs were generated from the 10 phases of CT images by projecting the DRR and TS-DRR at every 2° and augmenting with five random shifts within ±1 cm around the isocenter in the lateral, vertical, and longitudinal directions. The same training and testing configurations as for the spine were used for the lung patient. The network was trained for 50 epochs, which took around 8 hours. Since real-time kV projection images during beam-on were not available, we used CBCT projection images for testing to generate the sTS-DRR. The PTV, ITV, and GTV contours were shown to indicate the location of the tumor. The tumor was manually contoured on the phase 50% CT, and the centroid of the tumor was identified from the binary tumor mask. Template matching with the GTV as the region of interest and SSIM as the image similarity metric was performed between the sTS-DRR and the phase 50% TS-DRR to track the tumor.
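The SSIM-scored template matching described above can be sketched as an exhaustive translation search, scored here by a simplified single-window SSIM (the standard SSIM uses local Gaussian-weighted windows, and the study's exact implementation may differ):

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Simplified SSIM over a single window (no local Gaussian weighting).
    Inputs are assumed normalized to roughly [0, 1]."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def template_match_ssim(image, template):
    """Return the (row, col) of the top-left corner where the template
    (e.g., the GTV region of the phase 50% TS-DRR) best matches the image
    (e.g., the sTS-DRR) under SSIM."""
    th, tw = template.shape
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            s = ssim_global(image[r:r + th, c:c + tw], template)
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos
```

A real implementation would restrict the search to a window around the expected tumor position for speed.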
3. Results
3.1. Spine
3.1.1. Phantom study
The IMR images were pre-processed to match the histogram of the training DRRs prior to testing, as shown in Fig. 3. Fig. 6 shows the IMR and its respective sTS-DRR at four projection angles. Around 10 landmarks were manually selected for each projection angle. The landmarks were bony edges or vertebral body corners/edges that could be clearly observed on the projection image, on the sTS-DRR, or on both. The same landmarks were plotted on the IMR and its respective sTS-DRR to allow visual assessment of the anatomical correspondence between the two. Fig. 6 shows that the sTS-DRR greatly improved target visibility, and the anatomy corresponds well between the IMR and sTS-DRR. Because the shoulder obscured the target at certain projection angles such as 70°, 90°, and 110°, the target visibility was very low for these angles. Fig. 6 (70°) shows that the network was able to reconstruct the target structures even when the IMR had very low image contrast.
Fig. 6.
Top row: IMR images, Bottom row: respective sTS-DRR. Same landmarks per projection angle were annotated to show the anatomical correspondence between the IMR and its respective sTS-DRR.
When using the IMR for intrafraction monitoring, the online IMR is registered to the DRR generated from the simulation CT; hence, DRR vs. IMR-shifted registrations were performed. When using the TS-DRR for intrafraction monitoring, the sTS-DRR generated from the online IMR is registered to the TS-DRR generated from the simulation CT; hence, TS-DRR vs. sTS-DRR-shifted registrations were performed for comparison. The registration results are shown in Fig. 7. The region of interest for each registration was the PTV, i.e., the 7th cervical vertebral body with a 1cm margin. The registration was performed manually by shifting one image in the x and y directions to match the other image, using image fusion and image toggle tools to assess the alignment. For angles 10°, 30°, 50°, 130°, 150°, and 170°, where shoulder occlusion was not an issue, the sTS-DRR performed slightly better than the IMR in terms of tracking accuracy. For the DRR vs. IMR-shifted registration, the absolute mean errors were 0.13±0.06mm in the x direction and 0.31±0.05mm in the y direction. For the TS-DRR vs. sTS-DRR-shifted registration, the absolute mean errors were 0.11±0.05mm in the x direction and 0.25±0.08mm in the y direction. However, for angles 70°, 90°, and 110°, where the shoulder obscured the target, the DRR vs. IMR-shifted registration could not be performed due to the low image contrast around the PTV. In contrast, the TS-DRR vs. sTS-DRR-shifted registration achieved absolute mean errors of 0.62±0.13mm in the x direction and 0.47±0.13mm in the y direction. Therefore, the sTS-DRR has overall better tracking accuracy than the IMR, especially for angles with low target visibility.
Fig. 7.
Manual registration error comparison in the X and Y directions for the phantom study. DRR vs. IMR-shifted shows the error of the DRR and IMR-shifted registration. TS-DRR vs. sTS-DRR-shifted shows the error of the TS-DRR and sTS-DRR-shifted registration. For angles 70°, 90° and 110°, the registration errors for DRR vs. IMR-shifted were not plotted due to registration failure caused by low image contrast around the PTV.
3.1.2. Patient study
Fig. 8 shows the IMR images and their respective sTS-DRR images at four different projection angles. Around 20 landmarks were identified as vertebral body corners/edges that could be clearly observed on the projection image, on the sTS-DRR, or on both, in order to allow visual assessment of the bony structure correspondence between the two. For the equivariance study, the absolute mean errors of the registration between the sTS-DRRs with and without the 1mm, 2mm, 3mm, and 4mm shifts were 0.18±0.17mm and 0.04±0.03mm in the x and y directions, respectively. Due to the high contrast of the vertebra in the longitudinal direction, the registration error in that direction is almost negligible. Compared to the longitudinal direction, the x direction has relatively larger errors, mainly due to the less visible vertebral body edges in the IMR. Considering the pixel size of 0.26mm for the onboard imager at the isocenter plane, the mean absolute error of 0.18mm in the x direction is at the sub-pixel level. Therefore, the trained model is considered to be equivariant to translations with sub-pixel accuracy, a desired property for intrafraction motion monitoring.
Fig. 8.
Top row: IMR images, Bottom row: respective sTS-DRR. Same landmarks per projection angle were annotated to show the anatomical correspondence between the IMR and its respective sTS-DRR.
3.2. Lung
3.2.1. Phantom study
The results are shown in Fig. 9, including the CBCT projection images, the generated sTS-DRR, and the ground truth TS-DRR for six projection angles at ~50° intervals. Fig. 9 shows that the tumor visibility was greatly improved. The red cross in Fig. 9 marks the tumor centroid as identified in the ground truth image. For the projection angle of 150°, the network's performance was impaired because the vertebral body overlapped with the tumor. The performance could be improved by using 4DCT or 4DCBCT, which includes the tumor motion at different phases throughout a respiratory cycle.
Fig. 9 (A-C):
Tumor moved 1.8mm, 5.8mm and 9mm superiorly. Six projection angles were selected at ~50° intervals. Each triplet shows the projection image, the sTS-DRR, and the ground truth (GT). The projection angle is shown in red. The red cross landmark shows the tumor center identified in the GT.
To quantitatively demonstrate the tumor visibility enhancement, we calculated the 2D image correlation and structural similarity index measure (SSIM) against the ground truth for both the projection images and the sTS-DRR, within a region around the tumor with a 5mm margin. The results are shown in Table 1 and Fig. 10. The image correlation and SSIM were greatly improved for all angles, except for some angles around 140° and 320°, where the tumor overlapped with the vertebral body; nevertheless, moderate improvements were still observed at these projection angles. The registration error shows the same trend, with slightly larger errors for projection angles around 140° and 320°. Compared to the projection images, the sTS-DRR increased the image correlation with the ground truth by around 83% and the SSIM with the ground truth by around 75%.
TABLE 1.
Image correlation and SSIM for images cropped around tumor with 5 mm margin.
| | Tumor moved 1.8 mm superiorly | | Tumor moved 5.8 mm superiorly | | Tumor moved 9 mm superiorly | |
|---|---|---|---|---|---|---|
| | Projection vs. GT | sTS-DRR vs. GT | Projection vs. GT | sTS-DRR vs. GT | Projection vs. GT | sTS-DRR vs. GT |
| Image correlation | 0.48 ± 0.22 | 0.91 ± 0.08 | 0.47 ± 0.23 | 0.86 ± 0.16 | 0.47 ± 0.23 | 0.84 ± 0.21 |
| SSIM | 0.31 ± 0.06 | 0.57 ± 0.11 | 0.31 ± 0.07 | 0.54 ± 0.14 | 0.31 ± 0.07 | 0.52 ± 0.14 |
Fig. 10 (A-C):
Image correlation and SSIM for a tumor motion of 1.8mm (A), 5.8mm (B) and 9mm (C) superiorly. Top row: Image correlation comparison between CBCT projection and synthetic TS-DRR. Bottom row: SSIM comparison between CBCT projection and synthetic TS-DRR.
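The image correlation reported in Table 1 is the 2D Pearson correlation coefficient computed over the crop around the tumor (with the 5mm margin); a minimal sketch with an illustrative cropping helper (SSIM can be computed analogously over the same crop):

```python
import numpy as np

def image_correlation(a, b):
    """2D Pearson correlation coefficient between two same-size image crops."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def crop_around(img, center_rc, half_size):
    """Square ROI around the tumor centroid, e.g., tumor radius plus a 5 mm
    margin converted to pixels (illustrative helper, not the study's code)."""
    r, c = center_rc
    return img[r - half_size:r + half_size, c - half_size:c + half_size]
```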
To demonstrate the superiority of the sTS-DRR over the CBCT projection images for tumor tracking, we separately registered the sTS-DRR and the CBCT projection images to the ground truth using the same translation-only registration as described for the spine case. The registration errors and their respective histograms are shown in Fig. 11. For the sTS-DRR, the absolute mean registration errors were 0.1±0.3mm in both the x and y directions for all target motions. Out of the 893 projection angles, the registration errors were less than 1mm for 98% of the angles and less than 0.5mm for 90% of the angles. In contrast, for the CBCT projection images, the absolute mean registration errors were 0.4±0.5mm in the x direction and 0.4±0.3mm in the y direction; the registration errors were less than 1mm for 95% of the angles and less than 0.5mm for 81% of the angles. Therefore, the sTS-DRR outperformed the CBCT projection images in terms of tumor tracking accuracy.
Fig. 11 (A-C):
Registration error and histogram for a tumor motion of 1.8mm (A), 5.8mm (B) and 9mm (C) superiorly. Top: TS-DRR vs. sTS-DRR registration. Bottom: TS-DRR vs. CBCT projection images registration.
3.2.2. Patient study
The centroid of the tumor was mapped to the projection images and the sTS-DRR based on the template matching results, as shown by the red cross in Fig. 12. Fig. 12(A) shows the results for projection angles at which the lung tumor was vaguely visible to the human eye in the CBCT projection images. Fig. 12(B) shows the results for projection angles at which the tumor was hardly visible to the human eye in the CBCT projection images. The network was able to reconstruct the tumor location and shape from the image features learned from the training data.
Fig. 12.
(A) Lung tumor was vaguely visible in projection images. (B) Lung tumor was hardly visible in projection images. Projection: CBCT projection images; sTS-DRR: sTS-DRR based on projection images; 50% TS-DRR: TS-DRR generated based on the 50% phase of the 4DCT. Red cross: tumor centroid as defined on the phase 50% CT and mapped to the projection and sTS-DRR images by template matching. Red, green, and blue contours are the PTV, ITV, and GTV, respectively.
The tumor trajectory from 2D template matching is shown in Fig. 13(A–B). Sequential triangulation was implemented to calculate the 3D tumor coordinates from the 2D match results.20 For each projection angle, multiple prior 2D matches were selected according to specific screening criteria and used simultaneously to perform multi-view triangulation. The screening criteria were (1) stereo separation angle ≥ 25° and (2) epipolar distance ≤ 1mm. The 3D tumor location was first calculated as the location with the least mean square distance from the multiple prior 2D matches. This location was then projected onto the current tumor tracking ray to find the final 3D tumor location. Any projection distance greater than 1mm was considered unsuccessful. Fig. 13(C–E) shows the trajectory of the calculated tumor centroid in the vertical, lateral, and longitudinal directions. The magnitude of the tumor motion was consistent with that measured from 4DCT, which was approximately 3.4mm sup-inf, 2.2mm lateral, and 2.2mm vertical.
Fig. 13 (A-B):
Tumor motion trajectory in 2D obtained by template matching the GTV in the x and y directions. (C-E): The tumor motion trajectory in 3D calculated using sequential triangulation.
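The geometric core of the sequential triangulation described above can be sketched in numpy: a least-squares 3D point with minimum summed squared distance to the back-projection rays of the prior 2D matches, followed by projection onto the current tracking ray. The screening by separation angle and epipolar distance, and the construction of each ray from the gantry geometry, are assumed to happen upstream and are omitted here:

```python
import numpy as np

def triangulate(ray_points, ray_dirs):
    """Least-squares 3D point minimizing the sum of squared distances to a
    set of rays (ray_points[i] + t * ray_dirs[i], with unit-length dirs)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(ray_points, ray_dirs):
        P = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
        A += P
        b += P @ p
    return np.linalg.solve(A, b)

def project_to_ray(x, p, d):
    """Project point x onto the ray p + t*d (d unit-length); return the
    projected point and the perpendicular (projection) distance."""
    x_proj = p + np.dot(x - p, d) * d
    return x_proj, float(np.linalg.norm(x - x_proj))
```

Following the criterion in the text, a triangulation would be flagged unsuccessful when the returned projection distance exceeds 1mm.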
4. Discussion
The sTS-DRR generated by the patient-specific model can greatly enhance the target visibility in the onboard projection images. The anatomical correspondence demonstrated in the spine and lung studies shows that the network learned to reconstruct the target based on the image features of the input projection images, rather than simply ‘memorizing’ the absolute image coordinates of the training datasets. The lung phantom study shows that the network trained using CBCT with the tumor at the isocenter (Fig. 5A) can be applied to projection images after the tumor has moved to different locations (Fig. 5B–D). The patient studies show that the proposed method can greatly enhance tumor visibility even when the tumor is only vaguely visible in the projection images (Fig. 12A). For projection angles at which the tumor is hardly identifiable by the human eye (Fig. 12B), the sTS-DRR could still reconstruct the tumor, albeit with a blurred tumor shape. The impact of projection image quality on tumor localization accuracy with the proposed method needs further thorough investigation and is beyond the scope of this study.
With the phantom studies, we have demonstrated that the sTS-DRR outperformed the IMR for the spine tumor case and the CBCT projection images for the lung tumor case in terms of tumor tracking accuracy. The improvement was due to the enhanced tumor visibility across all projection angles. Low target visibility can be caused by overlapping structures along the X-ray projection line, including soft tissues and bony structures. Though the network's performance was slightly impaired when bony structures obscured the target, improved target visibility with increased image correlation and SSIM was still observed for these angles. Sub-millimeter tracking accuracy was achieved using sTS-DRR for the spine tumor, even for angles where the IMR failed to track the tumor due to low target visibility. For the lung tumor, the mean tracking error was reduced by 75% when using sTS-DRR, as compared to using the CBCT projections.
Using an Nvidia RTX A6000 graphics card with 48 GB of memory, the average time needed to generate one sTS-DRR of size 512×512 is 35ms. With additional disk input/output operations and image post-processing, the latency is approximately 50ms. The time interval between two consecutive CBCT projections is approximately 70ms. Hence, the tumor tracking for the current CBCT projection can be finished before the next CBCT projection image is available. For beam gating based on tumor location, the latency could result in a delayed beam hold after the tumor has moved out of the PTV; a tighter margin could be used to mitigate this latency effect. For tumor location calculation in 3D, the sequential triangulation algorithm relies on multiple previous projections at roughly the same respiratory phase to estimate the tumor location with the least mean square distance. Therefore, a highly irregular breathing pattern would result in a large estimation error of the tumor location.
Instead of using a more realistic deformation model, translation-only data augmentation was used in this work for its simplicity: the CT was randomly shifted around the isocenter within ±1cm. Our phantom and patient studies indicate that this is sufficient. For each projection angle, at least 5 training image pairs should be generated by randomly shifting the CT around the isocenter. This data augmentation step is crucial since it encourages image feature-based learning and prevents the network from simply ‘memorizing’ the image feature locations. Nevertheless, rotations and deformations could be used to further augment the training datasets, which could improve the performance at the cost of increased training time. Furthermore, to incorporate a variety of tumor motion scenarios, artificial lung motion can be generated using principal component analysis of the patient-specific lung deformation. Such augmentation could potentially increase the performance by allowing the network to learn from a wide range of tumor motion scenarios. The proposed histogram matching before network testing is an effective way to account for the differences between the training DRR and the projection images due to Compton scatter, different image filters, X-ray source energy, mAs, and detectors. The generalizability of a patient-specific model refers to whether the model trained on DRR generated from the simulation CT/4DCT can be applied to the real-time kV images. To improve the generalizability of the model, we introduced data augmentation for model training and histogram-matching image pre-processing for model testing. Our results show that the patient-specific models generalized well on the IMR for the spine tumor and on the CBCT projection images for the lung tumor. However, only one case was investigated for the spine and lung tumors, respectively. Each patient is unique in terms of tumor size, location, and tumor occlusion by other structures in the kV projection images.
Therefore, the proposed method needs to be evaluated on more patients with a variety of clinical scenarios to demonstrate its potential in intrafraction motion monitoring.
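The histogram-matching pre-processing described above can be implemented with a standard CDF-based intensity mapping. The numpy sketch below is a generic version of this technique and not necessarily the exact implementation used in this work (a library routine such as scikit-image's `match_histograms` would be a drop-in alternative):

```python
import numpy as np

def histogram_match(source, reference):
    """Remap `source` intensities so their empirical distribution matches
    `reference`. Here, `source` would be a kV projection image and
    `reference` a training DRR, narrowing the domain gap before inference."""
    src_shape = source.shape
    src = source.ravel()
    ref = reference.ravel()
    # Quantile (rank / N) of each source pixel within the source distribution.
    src_order = np.argsort(src)
    quantiles = np.empty(src.size, dtype=float)
    quantiles[src_order] = np.linspace(0.0, 1.0, src.size)
    # Look up the reference intensity at the same quantile.
    ref_sorted = np.sort(ref)
    matched = np.interp(quantiles, np.linspace(0.0, 1.0, ref.size), ref_sorted)
    return matched.reshape(src_shape)
```

The mapping preserves the rank order of the source pixels while forcing the output intensity range and histogram shape onto those of the reference.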
Though the proposed method works on simulation CT and 4DCT, same-day CBCT is preferred if the CBCT has good image quality around the target. This is because 1) same-day CBCT shows greater anatomy consistency with the real-time projection images than simulation CT due to daily anatomy variations, and 2) CBCT has higher spatial resolution than simulation CT, which helps to mitigate the blurring effect of DRR generation. The challenges of using same-day CBCT for training are two-fold: 1) CBCT generally has lower image quality than simulation CT due to patient motion blur and scatter; this could be mitigated by using 4DCBCT for targets with motion or a fast CBCT acquisition. 2) The limited time between CBCT acquisition and beam delivery does not allow full network training to finish. Currently, without code optimization, the training takes about 4 hours to converge for spine cases and approximately 8 hours for lung cases on one Nvidia RTX A6000 graphics card. This limitation could be mitigated by code optimization, more powerful GPU clusters, and pre-training the model using simulation CT or previous-fraction CBCT. Another workaround is to train the network using the previous-fraction CBCT or 4DCBCT and apply it to the next fraction, assuming no substantial tumor shrinkage or growth has occurred between the two fractions.
Though the proposed method worked well for kV projection images, it is not trivial to apply it to megavoltage (MV) treatment beam imaging due to the small beam apertures caused by IMRT multileaf collimator modulation.21 For future work, we will investigate the feasibility of applying the proposed method to MV treatment beam images for MV/kV tumor tracking.
5. Conclusion
The sTS-DRR can greatly enhance the target visibility of the onboard projection images for both the spine and lung tumors. The proposed method could be used to improve the markerless tumor tracking accuracy for external beam treatment.
Acknowledgment
Memorial Sloan-Kettering Cancer Center has a research agreement with Varian Medical Systems. This research was partially supported by the MSK Cancer Center Support Grant/Core Grant (P30 CA008748).
References
- 1. Bertholet J, Knopf A, Eiben B, et al. Real-time intrafraction motion monitoring in external beam radiotherapy. Physics in Medicine & Biology. 2019;64(15):15TR01.
- 2. Imura M, Yamazaki K, Shirato H, et al. Insertion and fixation of fiducial markers for setup and tracking of lung tumors in radiotherapy. International Journal of Radiation Oncology, Biology, Physics. 2005;63(5):1442–1447.
- 3. Mueller M, Poulsen PR, Hansen R, et al. The markerless lung target tracking AAPM grand challenge (MATCH) results. Medical Physics. 2021.
- 4. Wang C, Hunt M, Zhang L, et al. Technical Note: 3D localization of lung tumors on cone beam CT projections via a convolutional recurrent neural network. Medical Physics. 2020.
- 5. Zhang P, Hunt M, Telles AB, et al. Design and validation of a MV/kV imaging-based markerless tracking system for assessing real-time lung tumor motion. Medical Physics. 2018;45:5555–5563.
- 6. de Bruin K, Dahele MR, Mostafavi H, Slotman BJ, Verbakel WFAR. Markerless Real-Time 3-Dimensional kV Tracking of Lung Tumors During Free Breathing Stereotactic Radiation Therapy. Advances in Radiation Oncology. 2021;6.
- 7. Menten MJ, Fast MF, Nill S, Oelfke U. Using dual-energy x-ray imaging to enhance automated lung tumor tracking during real-time adaptive radiotherapy. Medical Physics. 2015;42(12):6987–6998.
- 8. Bowman W, Robar JL, Sattarivand M. Optimizing dual-energy x-ray parameters for the ExacTrac clinical stereoscopic imaging system to enhance soft-tissue imaging. Medical Physics. 2017;44:823–831.
- 9. von Berg J, Young S, Carolus H, et al. A novel bone suppression method that improves lung nodule detection. International Journal of Computer Assisted Radiology and Surgery. 2016;11(4):641–655.
- 10. He X, Cai W, Li F, et al. Decompose kV projection using neural network for improved motion tracking in paraspinal SBRT. Medical Physics. 2021.
- 11. Fan Q, Pham HD, Zhang P, Li X, Li T. Evaluation of a proprietary software application for motion monitoring during stereotactic paraspinal treatment. Journal of Applied Clinical Medical Physics. 2022;23.
- 12. Shen L, Zhao W, Xing L. Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nature Biomedical Engineering. 2019;3:880–888.
- 13. Lei Y, Tian Z, Wang T, et al. Deep learning-based real-time volumetric imaging for lung stereotactic body radiation therapy: a proof of concept study. Physics in Medicine & Biology. 2020;65.
- 14. Isola P, Zhu J-Y, Zhou T, Efros AA. Image-to-Image Translation with Conditional Adversarial Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017:5967–5976.
- 15. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV). 2017.
- 16. Kayhan OS, van Gemert JC. On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
- 17. Mao X, Li Q, Xie H, Lau RYK, Wang Z, Smolley SP. Least Squares Generative Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV). 2017:2813–2821.
- 18. Zhang H, Kong V, Huang K, Jin J-Y. Correction of Bowtie-Filter Normalization and Crescent Artifacts for a Clinical CBCT System. Technology in Cancer Research & Treatment. 2017;16:81–91.
- 19. Zarepisheh M, Hong L, Zhou Y, et al. Automated intensity modulated treatment planning: The expedited constrained hierarchical optimization (ECHO) system. Medical Physics. 2019.
- 20. Hazelaar C, van der Weide L, Mostafavi H, Slotman BJ, Verbakel WFAR, Dahele MR. Feasibility of markerless 3D position monitoring of the central airways using kilovoltage projection images: Managing the risks of central lung stereotactic radiotherapy. Radiotherapy and Oncology. 2018;129(2):234–241.
- 21. Li T, Li F, Cai W, Zhang P, Li X. Technical Note: Synthetic treatment beam imaging for motion monitoring during spine SBRT treatments - a phantom study. Medical Physics. 2020.