Abstract
Background:
An automated, accurate, and efficient lung four-dimensional computed tomography (4DCT) image registration method is clinically important to quantify respiratory motion for optimal motion management.
Purpose:
The purpose of this work is to develop a weakly supervised deep learning method for 4DCT lung deformable image registration (DIR).
Methods:
The landmark-driven cycle network is proposed as a deep learning platform that performs DIR of individual phase datasets in a simulation 4DCT. This proposed network comprises a generator and a discriminator. The generator accepts moving and target CTs as input and outputs the deformation vector fields (DVFs) to match the two CTs. It is optimized during both forward and backward paths to enhance the bi-directionality of DVF generation. Further, the landmarks are used to weakly supervise the generator network. Landmark-driven loss is used to guide the generator’s training. The discriminator then judges the realism of the deformed CT to provide extra DVF regularization.
Results:
We performed four-fold cross-validation on 10 4DCT datasets from the public DIR-Lab dataset and a hold-out test on our clinic dataset, which included 50 4DCT datasets. The DIR-Lab dataset was used to evaluate the performance of the proposed method against other methods in the literature by calculating the DIR-Lab Target Registration Error (TRE). The proposed method outperformed other deep learning-based methods on the DIR-Lab datasets in terms of TRE. Bi-directional and landmark-driven loss were shown to be effective for obtaining high registration accuracy. The mean and standard deviation of TRE for the DIR-Lab datasets were 1.20±0.72 mm, and the mean absolute error (MAE) and structural similarity index (SSIM) for our datasets were 32.1±11.6 HU and 0.979±0.011, respectively.
Conclusion:
The landmark-driven cycle network has been validated and tested for automatic deformable image registration of patients’ lung 4DCTs with results comparable to or better than competing methods.
Keywords: deformable image registration, deep learning, lung CT, radiotherapy
1. Introduction
Lung cancer is the most lethal and third most common cancer type. In 2021, there were over 235,000 new cases and over 131,000 deaths in the United States.1 For non-small cell lung cancer, radiation therapy is delivered to approximately 42% of patients with stage I and II disease, 66% with stage III disease, and 43% with stage IV disease.2 4DCT is one of the common methods to account for respiratory motion in radiotherapy.3 In lung cancer radiotherapy, it is routinely acquired in the clinic to assess tumor motion and select appropriate motion management techniques, such as abdominal compression, breath-hold, or amplitude/phase gating.4,5 DIR among different phases of 4DCT volumes has been shown to be an effective tool throughout the entire treatment course. DIR can be employed to propagate contours from one phase to another during the treatment planning stage to facilitate the contouring process.6 During treatment, DIR can also be the key technique to evaluate the dose received by the patient by accumulating dose from different phases onto a reference phase.7,8 Moreover, DIR can be used to quantify breathing-induced lung density/volume changes, which are essential information for lung ventilation imaging. Ventilation imaging can serve as a functional imaging modality to guide functional avoidance in lung cancer radiotherapy.9 Thus, 4DCT lung registration enables quantitative analysis of tumor motion throughout the respiratory cycle, thereby allowing optimal treatment planning and evaluation. It is therefore desirable to develop a fast and accurate 4DCT lung deformable registration method.
Due to their superior runtime and robustness compared to traditional methods, deep learning (DL)-based DIR methods have been a hot research topic in recent years.10,11 Two commonly adopted DL DIR networks are supervised and unsupervised transformation prediction networks, which are trained to predict DVFs to align the input images. For the supervised method, the transformation learning process is considered to be fast and moderately regularized due to the existence of the ground truth transformation. However, the ground truth transformations are often unavailable and are generated using either conventional registration or analytical models, which could introduce bias for transformation prediction. Since the introduction of the spatial transformer network (STN)12, which allows image similarity loss to be defined within the network, unsupervised transformation prediction has become increasingly popular.
De Vos et al.13 proposed an unsupervised model that utilized multistage training and testing. They incorporated several image dataset sources, including cardiac cine MRIs, chest CTs, and the DIR-Lab dataset. For the DIR-Lab dataset, the group performed leave-one-out cross-validation due to the small size of the dataset. On average, their method achieved target registration error (TRE) values of 2.64±3.47 mm. A significant drawback of current unsupervised models is that their variance is larger than that of many supervised models, which makes unsupervised methods less robust for clinical application, where both accuracy and precision are paramount. With clinical application as a major focus of our work, some degree of supervision was necessary when designing our model.
Sentker et al.14 developed a supervised image registration model called GDL-FIRE, which was trained with ground truths generated by DIR frameworks including PlastiMatch, NiftyReg, and VarReg on in-house datasets. The DIR-Lab dataset was used for testing. A major strength of the method was the speed – a few seconds compared to approximately 15 minutes one would typically see in a traditional registration algorithm. One limitation of the method is inconsistent estimated motion patterns between the GDL-FIRE variants when certain artifacts are present. This inconsistency raised some doubts for the authors with regard to the model’s applicability for treatment planning. On average, their method achieved TRE values of 2.50±2.49 mm.
Eppenhof et al.15 proposed a supervised model using U-Net. Their network was trained using synthetic random transformations to a small set of representative images and tested on the DIR-Lab dataset. One limitation of their work includes using entire uncropped images as input, which forced them to downsample the original images to save memory and ultimately lose some image information. Additionally, random transformations are not analogous to normal lung motion, which may lead to suboptimal DVF regularization. On average, their method achieved TRE values of 2.17±1.87 mm. Additionally, they subsequently trained their model on seven of the DIR-Lab datasets and tested it on the remaining three, which showed marginally improved image metrics.
Fu et al.16 proposed an unsupervised DIR model using generative adversarial networks (GANs). Their work differs from others in that it uses adversarial loss and pulmonary vessel enhancement to register 4DCT lung images, in addition to the generator-discriminator pair. They performed five-fold cross-validation on ten datasets from their clinic, optimized registration accuracy in terms of TRE, and tested the model on the DIR-Lab dataset. The group concluded that biomechanical constraints were needed for subsequent improvement, which were explored in another work17 with CT/MRI to improve DIL delineation, treatment planning, and dose monitoring for prostate radiotherapy. In another work,18 the group developed a multi-scale unsupervised DIR model and applied it to abdominal 4DCT registration. A limitation of this study is that the sliding motion at the lung pleura was not supported well, which caused the DVFs at the pleura to be too small. On average, their method achieved TRE values of 1.59±2.06 mm.
Zhang et al.19 developed a method called GroupRegNet, which used group-wise implicit template registration, one-shot unsupervised learning, and deformation field smoothing. One drawback of the one-shot learning strategy is that it is an iterative process which is typically slower than non-iterative networks which can generate the DVF in a single forward prediction. However, the network is simpler than traditional registration methods, functions well with a small set of input data, and requires less manipulation on the input images, notably by eliminating the need for breaking images into patches. On average, the method achieved TRE values of 1.26±0.84 mm on the DIR-Lab dataset.
To improve deformation regularization and improve the physical fidelity of the transformation prediction, CycleMorph20 was proposed to utilize cycle consistency to drive the deformed image to return to the original image, which could minimize the discretization errors originating from the inverse deformation fields. This approach resulted in preservation of topological information and improved registration accuracy over other state-of-the-art methods, such as VoxelMorph.
To further improve the registration regularization introduced by CycleMorph, we proposed to incorporate a discriminator and a loss term defined using pre-identified landmark pairs for network training. Pre-defined landmark pairs were required during the training process but were not used during network testing. Inspired by CycleMorph, we propose a novel landmark-driven cycle network, built on the CycleMorph framework, to develop a fast and accurate DIR method for 4DCT lung images.
The landmark-driven loss focuses on the local TRE as defined by the landmarks. The cycle consistency loss aims to ensure the model's consistency when deforming the image back and forth between the target and moving images. With only these two losses, the trained model would primarily focus on the accuracy of the transformations at the sparsely labeled landmark locations and on the image similarity between the deformed and target images. As such, the model may not learn the plausibility of the deformations. To address this potential issue, we included a generator-discriminator pair that assesses the deformed image's plausibility relative to the original image. By including this term, the deformation plausibility of the trained model can be improved.
Novel aspects of our method include (1) utilizing pre-defined, highly accurate landmark pairs to aid the registration in a cycle-consistent framework with a generator-discriminator pair to improve the realism and accuracy of the deformations, and (2) using bi-directional loss, which is capable of counteracting unrealistic deformations, to train the model and promote plausible, properly regularized transformation predictions.
2. Methods and materials
2.A. Overview
The proposed workflow, the landmark-driven cycle network, is shown in Figure 1. The training of the network follows the feed-forward paths, indicated by both solid black arrows and dashed blue arrows, and is optimized by several loss terms, indicated by dashed red arrows and red boxes. Inference on a new patient's moving-phase and target-phase lung CTs follows the feed-forward paths of the landmark-driven cycle network, indicated by black arrows. The network includes two sub-networks, a generator and a discriminator.
Figure 1.

Schematic of the proposed landmark-driven cycle network-based registration method. The black arrows denote the feed forward path in both the training and inference procedures. The blue dashed arrows denote the feed forward path for only the training procedure. The red dashed arrows denote the supervision.
During training, each iteration took as inputs a moving phase $I_m$, a target phase $I_t$, and the corresponding landmark movements $v_{mt}$ and $v_{tm}$. The movements $v_{mt}$ and $v_{tm}$ were calculated using landmarks of the moving phase to the target phase and of the target phase to the moving phase, respectively. The generator took $I_m$ and $I_t$ as inputs and then output the DVF $\phi_{mt}$ that deformably registers the patient's moving phase to the target phase, and the DVF $\phi_{tm}$ that deformably registers the target phase to match the moving phase. The deformable registrations were achieved via a spatial transformer.10
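The resampling step performed by the spatial transformer can be sketched numerically. The following is a NumPy/SciPy illustration of warping a volume with a per-voxel displacement field, not the authors' TensorFlow implementation; the function name and the pure-translation example are ours.

```python
# NumPy/SciPy sketch of the spatial transformer's resampling step: warp a
# moving volume with a DVF given as per-voxel displacements (in voxels).
# This illustrates the idea only; it is not the authors' TensorFlow code.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(moving: np.ndarray, dvf: np.ndarray) -> np.ndarray:
    """Deform `moving` (Z, Y, X) with `dvf` (3, Z, Y, X): sample at x + u(x)."""
    grid = np.mgrid[tuple(slice(0, s) for s in moving.shape)].astype(float)
    return map_coordinates(moving, grid + dvf, order=1, mode="nearest")

# Example: a pure +1-voxel translation along z moves the bright voxel.
vol = np.zeros((4, 4, 4))
vol[2, 2, 2] = 1.0
dvf = np.zeros((3, 4, 4, 4))
dvf[0] += 1.0                     # sample one voxel further along z
warped = warp(vol, dvf)           # bright voxel now appears at z = 1
```

Trilinear (order-1) interpolation is used here, matching the smooth resampling a spatial transformer performs; a trained network would supply `dvf` instead of the hand-built translation.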
The discriminator was used to judge the image quality of the deformed image $I_m \circ \phi_{mt}$, obtained by applying $\phi_{mt}$ to $I_m$ to match $I_t$, and of the deformed image $I_t \circ \phi_{tm}$, obtained by applying $\phi_{tm}$ to $I_t$ to match $I_m$. These transformations can be written as $\tilde{I}_t = I_m \circ \phi_{mt}$ and $\tilde{I}_m = I_t \circ \phi_{tm}$.
The discriminator loss, together with the image distance loss (the MAE and gradient difference calculated between each deformed-target image pair), was calculated between $I_m \circ \phi_{mt}$ and $I_t$ and between $I_t \circ \phi_{tm}$ and $I_m$, forming the bi-directional loss. This loss was used to minimize the image-wise error of applying the DVFs $\phi_{mt}$ and $\phi_{tm}$, thus increasing the accuracy of the deformation model.
Then, landmark-driven loss was applied to further weakly supervise the generated DVFs. During training, we had sparse movements $v_{mt}$, calculated from landmarks of the moving phase to the target phase, and $v_{tm}$, calculated from landmarks of the target phase to the moving phase. Thus, if the obtained DVFs $\phi_{mt}$ and $\phi_{tm}$ were coherent, then the displacements at the corresponding landmark positions of $\phi_{mt}$ and $\phi_{tm}$ should be the same as $v_{mt}$ and $v_{tm}$. Namely, we used landmark-driven loss, the error calculated between each DVF-landmark movement pair (between $\phi_{mt}$ and $v_{mt}$ and between $\phi_{tm}$ and $v_{tm}$), to further increase the accuracy of the generated DVFs. Since only sparse landmarks are available from the moving and target phases, this landmark-driven loss provides weak supervision. In addition, regularization terms that consider the coherence of the DVFs, such as spatial consistency, were also included in the landmark-driven loss to encourage plausible DVFs.
With the unsupervised deformable registration method, the generated DVF does not explicitly enforce topological preservation. Without topological preservation, structural information is lost and inaccurate image registration may occur.20 To correct for this, the DVFs' cycle-consistency strategy21 was implemented in the training of the proposed network by applying the DVF $\phi_{tm}$ to the deformed image $I_m \circ \phi_{mt}$ to obtain an estimate of $I_m$, denoted as $\hat{I}_m$, and vice versa. Then, the distances between $\hat{I}_m$ and $I_m$ and between $\hat{I}_t$ and $I_t$ were minimized.
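The cycle-consistency idea — deforming forward with one DVF and backward with its inverse should recover the original image — can be checked numerically. The following is an illustrative NumPy example with trivially inverse translations, not the paper's implementation; in training, the mean absolute difference below is a loss term driven toward zero.

```python
# Illustrative NumPy check of cycle consistency: warping forward with one
# DVF and backward with its inverse should recover the original image.
# Here the two DVFs are hand-built inverse translations.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(vol, dvf):
    grid = np.mgrid[tuple(slice(0, s) for s in vol.shape)].astype(float)
    return map_coordinates(vol, grid + dvf, order=1, mode="nearest")

vol = np.zeros((4, 4, 4))
vol[2, 1, 1] = 1.0
dvf_fwd = np.zeros((3, 4, 4, 4)); dvf_fwd[0] += 1.0   # +1 voxel in z
dvf_bwd = np.zeros((3, 4, 4, 4)); dvf_bwd[0] -= 1.0   # -1 voxel in z

recovered = warp(warp(vol, dvf_fwd), dvf_bwd)  # moving -> deformed -> back
cycle_loss = float(np.abs(recovered - vol).mean())
```

With learned DVFs the two fields are only approximately inverse, and minimizing this residual is what discourages folding and loss of topology.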
During training, for both the matching of the moving phase to the target phase and the matching of the target phase to the moving phase, the generator and discriminator were updated iteratively and alternately within each iteration. Thus, the proposed landmark-driven cycle network could ensure consistent registration performance in a bi-directional fashion.
2.B. Network Architecture
The network comprises a generator and a discriminator. Since the training of the network is bilateral and inference only requires one side of the feed-forward path, we describe one direction for simplicity.
Suppose one patient's moving phase $I_m$ and target phase $I_t$ are fed into the generator to derive the DVF $\phi_{mt}$, which a spatial transformer22 can use to deformably register $I_m$ to match $I_t$, i.e., $\tilde{I}_t = I_m \circ \phi_{mt}$. The generator and discriminator are shown in Fig. 2. The generator was implemented as a fully convolutional network (FCN) and comprised several convolutional layers with a stride of 2 to down-sample the output. The convolutional layers in the generator were followed by a residual convolutional block used to focus the model's learning primarily on the residual of the movement between $I_m$ and $I_t$. Then, a combination of a deconvolutional layer and bilinear interpolation was used to adjust the output DVF's size back to the input size. Bilinear interpolation was incorporated to generate smooth DVFs that were more reasonable for deformable registration. The discriminator, another FCN, output a quality measurement of the derived images. The derived images can be either target images $I_t$, regarded as real, or deformed images $I_m \circ \phi_{mt}$, regarded as fake.
Figure 2.

Network architecture of the generator and discriminator.
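As a rough illustration of the encoder-decoder size bookkeeping described above, the sketch below tracks spatial sizes through stride-2 convolutions and the final ×2 upsampling stages. The patch size, kernel size, and number of layers here are illustrative assumptions, not the paper's settings.

```python
# Shape bookkeeping for the generator's encoder/decoder path: stride-2
# convolutions halve each spatial dimension, and the deconvolution plus
# bilinear interpolation restores the DVF to the input size.
# Patch size, kernel size, and layer count are illustrative assumptions.
def down(size: int, k: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Output size of a strided convolution."""
    return (size + 2 * pad - k) // stride + 1

size = 96                       # hypothetical cubic patch edge length
sizes = [size]
for _ in range(3):              # three stride-2 convolutions (assumed)
    sizes.append(down(sizes[-1]))

# The residual block keeps the bottleneck size; three x2 upsampling steps
# (deconvolution + bilinear interpolation) restore the original size.
restored = sizes[-1] * 2 ** 3
```

The key design point is that the decoder must exactly undo the encoder's downsampling so the predicted DVF has one displacement vector per input voxel.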
2.C. Bi-directional Loss
One potential limitation of recent deep learning-based DIR methods10 is that small inconsistencies may occur because the discrete representation of the deformations generally comprises finite parameters. Notably, $\phi_{mt}$ may not be equal to the inverse of $\phi_{tm}$. To solve this potential issue, the investigated bi-directional framework forces the coherence of the deformation from moving image to target image and from target image to moving image with bilateral deformable registration. The module should both register $I_m$ to match $I_t$ (left side of Fig. 1) and register $I_t$ to match $I_m$ (right side of Fig. 1). The bi-directional loss was thus composed of two parts. Suppose we have a pair $I_m$ and $I_t$ and the corresponding generated DVFs $\phi_{mt}$ and $\phi_{tm}$. The bi-directional loss is represented by:
$\mathcal{L}_{bi} = \mathcal{L}(I_m, I_t, \phi_{mt}) + \mathcal{L}(I_t, I_m, \phi_{tm})$ (1)
where $\mathcal{L}$ represents the optimization term. For simplicity, we only introduce the term via $\mathcal{L}(I_m, I_t, \phi_{mt})$, which is implemented via compound optimization terms composed of the image difference loss and regularization terms.
In this work, the image difference loss was implemented as a compound loss of normalized cross-correlation (NCC) loss,16 a structural similarity measured between the deformed phase $I_m \circ \phi_{mt}$ and the target phase $I_t$, and TukeyBiweight loss,23 calculated between $I_m \circ \phi_{mt}$ and $I_t$. To further account for the motion of local structural details and treat the boundary equally so that the model was able to emphasize vessels, we introduced another TukeyBiweight loss measured between the gradient images of $I_m \circ \phi_{mt}$ and $I_t$. A single threshold parameter, $c$, represents the value above which voxel differences were cropped and had no further effect; that is, they were treated as outliers and automatically discounted. The image difference loss is represented as follows:
$\mathcal{L}_{img}(I_m, I_t, \phi_{mt}) = \mathcal{L}_{NCC}\!\left(I_m \circ \phi_{mt}, I_t\right) + \mathcal{L}_{TB}\!\left(I_m \circ \phi_{mt}, I_t\right) + \mathcal{L}_{TB}\!\left(\nabla(I_m \circ \phi_{mt}), \nabla I_t\right)$ (2)
where $\mathcal{L}_{NCC}$ denotes the NCC loss, $\mathcal{L}_{TB}$ denotes the TukeyBiweight loss, and $\nabla$ denotes the gradient image.
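Minimal NumPy sketches of the two image-difference ingredients are shown below: a global NCC similarity and the Tukey biweight penalty, whose threshold caps the influence of large (outlier) voxel differences. These are our simplified global formulations (the paper's NCC may be computed over local windows), and the threshold value is illustrative.

```python
# Simplified NumPy sketches of the image-difference ingredients: a global
# NCC similarity and the Tukey biweight penalty, whose threshold c caps
# the influence of large (outlier) voxel differences. The paper's exact
# (possibly windowed) formulations may differ; names here are ours.
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Global normalized cross-correlation; 1.0 for identical images."""
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return float((a * b).mean())

def tukey_biweight(residual: np.ndarray, c: float = 1.0) -> float:
    """Tukey biweight penalty: saturates at c**2/6 for |r| >= c, so large
    differences are treated as outliers and automatically discounted."""
    r = np.minimum(np.abs(residual), c)
    rho = (c ** 2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return float(rho.mean())

x = np.random.default_rng(0).normal(size=(8, 8, 8))
```

The saturation of the biweight penalty is what makes it robust: a residual far beyond the threshold contributes the same bounded cost as one just past it, so artifacts cannot dominate the loss.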
To generate realistic DVFs, regularization was used. A frequently used method for regularization in the literature is isotropic smoothing. To achieve isotropic smoothing, the $\ell_1$-norm (sparseness)24 of the gradient matrix of the DVFs was used. Additionally, sigmoid cross entropy (SCE), derived from the discriminator, guided the realism of the deformed image. During training, the generator generated the DVF $\phi_{mt}$ that moved $I_m$ to match $I_t$ while minimizing the loss term $\mathcal{L}_G$. The discriminator attempted to score the image quality of the deformed image $I_m \circ \phi_{mt}$ against the target image $I_t$ so that the generator could improve the realism of the deformed image to compete against the discriminator and thus improve the deformation performance. The discriminator can be optimized by:
$\mathcal{L}_D = \mathrm{SCE}\!\left(D(I_t), 1\right) + \mathrm{SCE}\!\left(D(I_m \circ \phi_{mt}), 0\right)$ (3)
2.D. Landmark-driven and cycle-consistency
In this work, we used a landmark-driven strategy to weakly supervise the learning process in order to generate accurate DVFs with small TRE. Namely, we used sparse movements calculated from landmarks of the moving phase to the target phase and from landmarks of the target phase to the moving phase. We denote these two sets of movements as $v_{mt}$ (moving to target) and $v_{tm}$ (target to moving). If the obtained DVFs $\phi_{mt}$ and $\phi_{tm}$ are accurate, then the displacements at the corresponding landmark positions of $\phi_{mt}$ and $\phi_{tm}$ should be the same as $v_{mt}$ and $v_{tm}$, i.e.:
$\phi_{mt}(p_m^i) = v_{mt}^i, \qquad \phi_{tm}(p_t^i) = v_{tm}^i, \qquad i = 1, \ldots, 300$ (4)
$p_m$ and $p_t$ denote the sets of 300 landmark pairs' positions in the moving and target images, and $v_{mt}^i$ denotes the movement of the $i$th moving landmark from the moving image $I_m$ to its corresponding position in the target image $I_t$. The difference between $p_m$ and $p_t$ gives the point movements of these 300 landmark pairs. To guarantee that the direction of movement was accurate, the $\ell_2$-norm of the difference between the DVF-predicted and landmark-derived movements was used, represented as follows:21
$\mathcal{L}_{lm} = \frac{1}{300}\sum_{i=1}^{300}\left(\left\|\phi_{mt}(p_m^i) - v_{mt}^i\right\|_2 + \left\|\phi_{tm}(p_t^i) - v_{tm}^i\right\|_2\right)$ (5)
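The landmark term can be sketched as sampling the predicted DVF at the landmark positions and penalizing its deviation from the known movements. The NumPy sketch below covers one direction only, assumes integer voxel landmark positions for simplicity (real landmarks would require interpolation), and uses a squared-error variant; all names and toy values are ours.

```python
# Sketch of the landmark-driven term for one direction: sample the
# predicted DVF at landmark voxel positions and penalize the squared L2
# deviation from the known landmark movements. Integer voxel positions
# are assumed for simplicity; real landmarks would need interpolation.
import numpy as np

def landmark_loss(dvf: np.ndarray, points: np.ndarray,
                  movements: np.ndarray) -> float:
    """dvf: (3, Z, Y, X); points: (N, 3) voxel indices; movements: (N, 3)."""
    z, y, x = points[:, 0], points[:, 1], points[:, 2]
    predicted = dvf[:, z, y, x].T          # (N, 3) displacement per landmark
    return float(np.mean(np.sum((predicted - movements) ** 2, axis=1)))

dvf = np.zeros((3, 8, 8, 8))
dvf[0] += 2.0                              # uniform +2-voxel motion in z
pts = np.array([[1, 2, 3], [4, 4, 4]])
moves = np.array([[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]])  # matching movements
```

Because only a sparse set of positions is penalized, this supervision is weak: the smoothness and cycle-consistency terms must propagate the constraint to the rest of the volume.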
Given the regularization terms of the DVFs and the image difference loss, the generator $G$ can be optimized by minimizing the combined loss and regularization terms:
$\mathcal{L}_G = \mathcal{L}_{bi} + \lambda_{lm}\,\mathcal{L}_{lm} + \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{reg}\left\|\nabla\phi\right\|_1 + \lambda_{adv}\,\mathrm{SCE}\!\left(D(I_m \circ \phi_{mt}), 1\right)$ (6)
where $\mathcal{L}_{cyc}$ denotes the cycle-consistency loss and the $\lambda$ terms are weighting factors balancing the loss components.
2.E. Dataset
We used the public DIR-Lab datasets25,26 as separate testing datasets for evaluating registration accuracy of the proposed method in comparison with other DIR methods. The DIR-Lab datasets consist of 10 4DCT cases, each comprising 10 images and 300 manually selected landmark pairs for end-expiration (EE) and end-inspiration (EI) phases. Additional dense landmark pairs obtained by Fu et al.27 were also used to introduce extra supervision for the registration. Following the evaluation protocols used in the other methods, image similarity metrics were calculated on the EE and EI phases only, as the deformations are most extreme between these two phases.
The landmarks used in this work and others were collected from the public datasets available through DIR-Lab. Each available dataset has an associated coordinate list of anatomical landmarks that were manually delineated and registered by an expert in thoracic imaging, with repeat registration performed by multiple observers to estimate the spatial variance in feature identification. The point sets served as a reference for evaluating DIR spatial accuracy within the lung for each case.
The original in-plane image resolutions of the DIR-Lab datasets ranged from 0.97 to 1.16 mm, while the slice thickness was 2.5 mm. As a preprocessing step, each dataset was resampled in the superior-inferior direction to match the in-plane resolution. Since we were interested in registering only the lung volumes, we cropped the images to cover only the lungs. To avoid boundary effects, a margin of 24 pixels was preserved after image cropping. Therefore, a total of twenty 4DCT lung datasets with isotropic resolutions were used in this study. For our clinic's dataset, we resampled the data to the same resolution as the DIR-Lab data to facilitate testing.
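The superior-inferior resampling step can be sketched with SciPy. The lung bounding-box crop with its 24-pixel margin is noted but not implemented here, since it requires a lung mask; the spacings below are illustrative values within the ranges quoted above.

```python
# SciPy sketch of the preprocessing: resample along the superior-inferior
# axis so the through-plane spacing matches the in-plane spacing. The lung
# bounding-box crop with a 24-pixel margin is omitted here, since it
# requires a lung mask; spacings below are illustrative.
import numpy as np
from scipy.ndimage import zoom

def resample_si(ct: np.ndarray, slice_mm: float, pixel_mm: float) -> np.ndarray:
    """Resample axis 0 (slices) so voxels become (near-)isotropic."""
    factor = slice_mm / pixel_mm           # e.g. 2.5 mm / 1.0 mm = 2.5
    return zoom(ct, (factor, 1.0, 1.0), order=1)

ct = np.zeros((10, 16, 16))                # 10 slices of 2.5 mm each
iso = resample_si(ct, slice_mm=2.5, pixel_mm=1.0)
```

Linear (order-1) interpolation is a common choice for CT resampling; higher spline orders trade speed for slightly sharper interfaces.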
In each fold, 4DCT datasets of eight patients consisting of 16 3DCTs were taken as training and validation (6 training : 2 validation split) datasets while the datasets of the two other patients were used as the testing dataset. After cross-validating the four folds for each pair of testing images, values were tabulated and the next pair of testing images was selected. This process was repeated five times to include all ten patients in the testing dataset. During training, image pairs between EE and EI phases were taken as the moving and fixed image pairs. Additionally, the total number of training image pairs was doubled after switching the moving and fixed image pairs. During the testing on the DIR-Lab datasets, the EE phase of a tested patient was taken as the moving image and the EI phase was taken as the target image. We calculated the metrics between the deformed and target images.
To verify the robustness of the proposed DIR model, we included a hold-out test with 50 4DCT lung datasets that were obtained in our department. These 4DCT images were acquired on a Siemens SOMATOM Definition AS CT scanner. CT voxels were 0.977×0.977×2.0 mm. Each 4DCT dataset comprised 10 3DCT scans, each of which corresponded to a respiratory phase. The model was trained and cross-validated on DIR-Lab's 10-patient 4DCT dataset and then tested on our 50 4DCT lung datasets.
2.F. Implementation and evaluation
Our proposed algorithm was implemented in Python 3.6 and TensorFlow on an NVIDIA Tesla V100 GPU with 32 GB of memory. The Adam optimizer with a learning rate of 2×10⁻⁴ was used for optimization. A loss curve from one of the folds is shown in Fig. 1* in the Supplementary Material.
To quantitatively evaluate the performance of the proposed method, MAE, peak-signal-to-noise ratio (PSNR), SSIM, and TRE between the deformed image and the target phase were calculated. Specifically, the TRE was defined as the Euclidean distance between positions of the corresponding landmarks after registration.
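The TRE definition above is straightforward to compute; below is a minimal sketch with landmark coordinates in voxels converted to mm via the voxel spacing. The coordinates are toy values for illustration.

```python
# Minimal sketch of TRE: the Euclidean distance (in mm) between
# corresponding landmark positions after registration. Coordinates are in
# voxels and converted to mm with the voxel spacing; values are toy data.
import numpy as np

def tre(moved: np.ndarray, target: np.ndarray, spacing) -> np.ndarray:
    """Per-landmark TRE in mm for (N, 3) voxel coordinates."""
    diff_mm = (moved - target) * np.asarray(spacing, dtype=float)
    return np.linalg.norm(diff_mm, axis=1)

moved = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
target = np.array([[0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
errors = tre(moved, target, spacing=(1.0, 1.0, 1.0))
```

Reported TRE statistics (e.g., the per-case means in Table 2) are then simple means and standard deviations of these per-landmark distances.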
We also used the Jacobian determinant for evaluation based on our previous study.28 It was calculated for assessment of characteristics that affected fidelity of the predicted DVF, including topological preservation and minimization of physically unrealistic deformation. Performance was inspected through comparison of the original and deformed images, along with the subtracted images. Additionally, this inspection was augmented with quantitative intensity profiles in the anterior-posterior direction and across notable anatomical landmarks to evaluate the structural accuracy of registration.
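The voxel-wise Jacobian determinant of the mapping x + u(x) can be computed with finite differences; a determinant at or below zero flags folding, i.e., a physically unrealistic deformation. The sketch below is our NumPy formulation, not necessarily the implementation used in the paper.

```python
# NumPy sketch of the voxel-wise Jacobian determinant of the mapping
# x + u(x), via central finite differences; det(J) <= 0 flags folding
# (physically unrealistic deformation). This is our formulation, not
# necessarily the implementation used in the paper.
import numpy as np

def jacobian_determinant(dvf: np.ndarray) -> np.ndarray:
    """dvf: (3, Z, Y, X) displacements; returns (Z, Y, X) determinants."""
    # grads[i, j] = d u_i / d x_j at every voxel
    grads = np.stack([np.stack(np.gradient(dvf[i]), axis=0) for i in range(3)])
    jac = grads + np.eye(3)[:, :, None, None, None]   # Jacobian of x + u(x)
    jac = np.moveaxis(jac, (0, 1), (-2, -1))          # (..., 3, 3) per voxel
    return np.linalg.det(jac)

det = jacobian_determinant(np.zeros((3, 5, 5, 5)))    # identity mapping
```

For the identity mapping the determinant is 1 everywhere; a percentage of voxels with non-positive determinant is a common summary statistic for DVF fidelity.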
3. Results
3.A. Efficacy of landmark-driven supervision
To demonstrate the effectiveness of the landmark-driven supervision, we compared the registration accuracy of the proposed method with and without landmark-driven supervision. The results shown here are for registrations with EI as the moving image and EE as the target image. Deformation results of the two variants of the proposed method are shown in Fig. 3. Fig. 3 (b2) shows the fused image between the moving and target images before registration. Fig. 3 (c2) and (d2) show the fused images between the target and registered images for the proposed method without and with landmark-driven supervision, respectively. Fused images were color-coded with red representing the target image, green representing the moving image, and yellow representing good intensity agreement between the moving/deformed and target images. The arrow in Fig. 3 (b2) indicates artery misalignment between the moving and target images; this misalignment remained for the proposed method without landmark-driven supervision but was improved with landmark-driven supervision, demonstrating the benefit of the landmark-driven loss. Intensity difference images before and after registration are shown in Fig. 3 (b3–d3). The intensity differences after registration were greater for the proposed method without landmark-driven supervision than with it, which suggests that landmark-driven supervision improved the alignment of small structures in the lungs.
Figure 3.

(a1): EI phase, (b1): EE phase and ground truth, (b2): fusion image between EI (moving) and EE (target) phases, (b3): difference image between EI and EE phases. (c1, d1): deformed images via proposed method without landmark-driven supervision, and with landmark-driven supervision, respectively. (c2, d2): fusion images between deformed images of the two variant methods and the target phase. (c3, d3): intensity difference images between deformed images of the two variant methods and target phase. For fusion images, red represents the target image while green represents the moving image. Yellow represents intensity agreement, indicating good alignment between the fixed and deformed images. The window level of CT images is set to [−1000, 200] HU. The window level of difference images is set to [−300, 300] HU.
Additionally, we calculated other image similarity metrics between the target and registered images, as well as the Jacobian determinant metric of the generated DVFs, for the proposed method with and without landmark-driven supervision (Table 1). Two-tailed, paired two-sample t-tests were conducted on the data in Table 1. While the proposed method with landmark-driven supervision produced lower MAE (p = .359) and higher PSNR (p = .156) and SSIM (p = .719), it produced a higher (worse) Jacobian determinant metric (p = .756). None of the p-values were significant at α = 0.05. The degradation of the Jacobian determinant metric may be explained by the model not being sufficiently robust with such a small training set of six patients. Additionally, a larger training set could lead to improved p-values.
Table 1.
Metrics of the two variants of the proposed method using DIR-Lab datasets.
| Set | Without Landmark Loss | With Landmark Loss | ||||||
|---|---|---|---|---|---|---|---|---|
|
| ||||||||
| MAE | PSNR | SSIM | Jac. | MAE | PSNR | SSIM | Jac. | |
| 1 | 19.8 | 42.7 | 0.988 | 0.067% | 20.2 | 45.6 | 0.991 | 0.015% |
| 2 | 40.1 | 30.7 | 0.968 | 0.184% | 21.8 | 31.2 | 0.988 | 0.007% |
| 3 | 24.3 | 30.4 | 0.986 | 0.243% | 23.1 | 45.9 | 0.990 | 0.090% |
| 4 | 25.4 | 25.2 | 0.989 | 0.317% | 28.2 | 29.2 | 0.983 | 0.109% |
| 5 | 30.0 | 28.7 | 0.982 | 0.011% | 22.7 | 32.7 | 0.985 | 1.897% |
| 6 | 44.7 | 32.4 | 0.984 | 0.016% | 45.1 | 26.2 | 0.969 | 2.104% |
| 7 | 46.8 | 26.0 | 0.968 | 0.141% | 39.6 | 29.2 | 0.977 | 0.024% |
| 8 | 41.7 | 28.7 | 0.975 | 2.109% | 54.6 | 28.8 | 0.955 | 0.074% |
| 9 | 54.2 | 28.8 | 0.955 | 0.172% | 27.5 | 32.0 | 0.985 | 0.013% |
| 10 | 29.4 | 31.5 | 0.984 | 0.105% | 37.3 | 31.1 | 0.971 | 0.085% |
|
| ||||||||
| Mean | 35.6±11.4 | 30.5±4.8 | 0.978±0.011 | 0.337±0.630% | 32.0±11.6 | 33.2±6.9 | 0.979±0.011 | 0.453±0.818% |
Note: Jac. denotes the Jacobian determinant.
To compare the proposed method with other deep learning-based DIR methods, we calculated TRE using the 300 landmark pairs per case given by DIR-Lab. Quantitative results are reported in Table 2. One-tailed, paired two-sample t-tests using the mean values of each set were conducted comparing the proposed method to each of the other groups' methods. Additionally, Eppenhof et al.15 also trained their network on seven of the DIR-Lab datasets and tested it on the remaining three, as indicated in the column with the asterisk. The differences in mean TRE between the proposed method and the other methods were statistically significant for all groups except Zhang et al.,19 suggesting that the proposed method was more accurate than four of the other five deep learning-based methods in the examined cases. The standard deviations reported in the "Mean" row have been recalculated in some cases to report the standard deviation consistently across methods, as the groups did not use the same formulations for their reported values. The pooled standard deviation was calculated as $\sigma = \sqrt{\frac{1}{N}\sum_{j} n_j\left(\sigma_j^2 + \mu_j^2\right) - \mu^2}$, where $n_j$, $\mu_j$, and $\sigma_j$ are the landmark count, mean, and standard deviation for case $j$, $N = \sum_j n_j$, and $\mu$ is the overall mean TRE.
Table 2.
Comparison of TRE values for different deep learning-based methods on DIR-Lab. The p-values under each column are calculated with respect to the proposed method and the column group’s method.
| Set | Before registration | Eppenhof et al.15 | *Eppenhof et al.15 | De Vos et al.13 | Zhang et al.19 | Fu et al.16 | Sentker et al.14 | Proposed |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.89±2.78 | 1.45±1.06 | 1.27±1.16 | 1.02±0.51 | 0.98±0.54 | 1.20±0.60 | 0.97±0.56 | |
| 2 | 4.34±3.90 | 1.46±0.76 | 1.24±0.61 | 1.20±1.12 | 1.04±0.49 | 0.98±0.52 | 1.19±0.63 | 0.96±0.43 |
| 3 | 6.94±4.05 | 1.57±1.10 | 1.48±1.26 | 1.24±0.71 | 1.14±0.64 | 1.67±0.90 | 1.03±0.63 | |
| 4 | 9.83±4.86 | 1.95±1.32 | 1.70±1.00 | 2.09±1.93 | 1.43±0.97 | 1.39±0.99 | 2.53±2.01 | 1.05±0.62 |
| 5 | 7.48±5.51 | 2.07±1.59 | 1.95±2.10 | 1.41±1.22 | 1.43±1.31 | 2.06±1.56 | 1.22±0.71 | |
| 6 | 10.89±6.90 | 3.04±2.73 | 5.16±7.09 | 1.31±0.72 | 2.26±2.93 | 2.90±1.70 | 1.56±0.84 | |
| 7 | 11.03±7.40 | 3.41±2.75 | 3.05±3.01 | 1.28±0.65 | 1.42±1.16 | 3.60±2.99 | 1.04±0.78 | |
| 8 | 15.00±9.01 | 2.80±2.46 | 6.48±5.37 | 1.33±1.08 | 3.13±3.77 | 5.29±5.52 | 1.34±0.89 | |
| 9 | 7.92±3.98 | 2.18±1.24 | 1.61±0.82 | 2.10±1.66 | 1.30±0.69 | 1.27±0.94 | 2.38±1.46 | 1.28±0.70 |
| 10 | 7.30±6.35 | 1.83±1.36 | 2.09±2.24 | 1.22±0.63 | 1.93±3.06 | 2.13±1.88 | 1.59±1.14 | |
|
| ||||||||
| Mean | 8.46±6.08 | 2.17±1.87 | 1.52±1.01 | 2.64±3.47 | 1.26±0.84 | 1.59±2.06 | 2.50±2.49 | 1.20±0.79 |
|
| ||||||||
| p-value | (<.001) | (.009) | (.235) | (.024) | (.003) | |||
* Indicates the group's results for their testing set when using the other seven DIR-Lab patients as the training set.
3.B. Registration accuracies on institutional dataset
We also performed a hold-out test on data from 50 of our clinic's patients. We calculated the image similarity metrics between the target and deformed images after registration, as well as the Jacobian determinant metric of the generated DVFs, for the proposed method with landmark-driven supervision (Table 3).
Table 3.
Metrics of the proposed method using 50 additional datasets for hold-out test.
| Set | MAE | PSNR | SSIM | Jac. |
|---|---|---|---|---|
| Mean | 41.7±11.3 | 29.9±3.0 | 0.986±0.009 | 0.57±0.80% |
Note: Jac. denotes the Jacobian determinant.
4. Discussion
A method for automated, accurate, and efficient CT image registration is clinically important. For lung cancer patients simulated with 4DCT, this platform can co-register the individual phases, thereby facilitating contour propagation and quantitative motion estimates for nearby critical tissues and advising the physician on how best to manage respiratory motion. The optimization of conventional DIR methods is typically slow because of their iterative nature. Such methods also often require parameter fine-tuning, which depends on many factors, including the image modalities and the chosen image similarity metrics. This time-consuming process does not allow efficient patient-specific decision making at the time of simulation. Deep learning-based methods are therefore more desirable because they produce registrations in a single forward prediction. In this work, we demonstrated the robustness of the proposed deep learning model by testing it on 50 hold-out cases from our clinic.
A major difference between the proposed method and other supervised deep learning methods is that supervision in the proposed method is driven by landmarks. No full-volume ground truth DVFs are needed, which sidesteps the scarcity of databases that include them. Instead, landmark pairs were used to guide the generated DVF during training, providing weak supervision. By combining the landmark pairs from DIR-Lab with the dense landmarks from Fu et al.27, approximately 2200 landmarks distributed throughout the whole lung region were available per patient to guide the DVF. This improved the derived DVF within the lung region (the mean Jacobian determinant of the generated DVFs is within 0.001 of unity).
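The landmark-driven supervision described above can be sketched as follows: the predicted DVF is sampled at each moving landmark, and the sampled displacement is compared against the known landmark displacement. This is an illustrative NumPy version only; the paper's exact loss formulation and interpolation scheme are not reproduced here, so the nearest-neighbor sampling and L2 penalty are assumptions:

```python
import numpy as np

def landmark_loss(dvf, moving_pts, target_pts):
    """Mean squared error between DVF-predicted and known landmark displacements.

    dvf: (D, H, W, 3) predicted displacement field (voxels).
    moving_pts, target_pts: (N, 3) paired landmark voxel coordinates.
    """
    gt_disp = target_pts - moving_pts                 # ground-truth landmark shift
    idx = np.round(moving_pts).astype(int)
    pred_disp = dvf[idx[:, 0], idx[:, 1], idx[:, 2]]  # nearest-neighbor DVF sample
    return np.mean(np.sum((pred_disp - gt_disp) ** 2, axis=1))
```

During training, such a term would be added to the image similarity and regularization losses with a weighting factor.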
Table 2 shows that the proposed method achieves a statistically significant improvement over all compared groups except Zhang et al.19 Likely contributors to this performance include inverse consistency through the additional cycle-consistent regularization term, use of vasculature information through the landmark-driven loss, and image cropping, among others. For example, the image-cropping shortcoming of the work by Eppenhof et al.15 has been ameliorated in our work. The cycle-consistent regularization and landmark-driven loss further contribute to the improved accuracy and precision relative to the other methods.
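The cycle-consistent (inverse-consistency) regularization mentioned above rewards forward and backward DVFs that compose to the identity mapping. A minimal NumPy sketch under simplifying assumptions (nearest-neighbor composition, L2 penalty, border clipping; the network's actual implementation may differ):

```python
import numpy as np

def cycle_consistency_loss(dvf_fwd, dvf_bwd):
    """Penalize deviation of x -> x + u_f(x) + u_b(x + u_f(x)) from identity.

    Both fields are (D, H, W, 3) displacements in voxels.
    """
    shape = dvf_fwd.shape[:3]
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"),
                    axis=-1)
    warped = grid + dvf_fwd                            # forward-mapped coordinates
    idx = np.clip(np.round(warped).astype(int), 0, np.array(shape) - 1)
    # Residual of the round trip: u_f(x) + u_b(x + u_f(x)) should be ~0.
    residual = dvf_fwd + dvf_bwd[idx[..., 0], idx[..., 1], idx[..., 2]]
    return np.mean(np.sum(residual ** 2, axis=-1))

# Constant +2 / -2 voxel shifts along one axis are exact inverses, so the
# composed mapping is the identity and the loss vanishes.
f = np.zeros((8, 8, 8, 3))
f[..., 0] = 2.0
b = -f
```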
Similar to the work by Fu et al.,16 one limitation of this study is that the sliding motion of the lung against the chest wall was not explicitly modeled. In conventional DIR methods, sliding motion is modeled via repeated application of direction-dependent spatial filters, which is most helpful near the lung pleura. In future work, we plan to model the sliding motion by integrating a biomechanical model into the DVF regularization.
In this study, we used 300 landmark pairs from DIR-Lab for each case, sparsely distributed within the lung. Better performance is expected from using more landmark pairs, and the method proposed by Fu et al.27 can automatically generate a large number of landmark pairs in the lungs. However, landmarks are usually located in regions of high image contrast; few are available in low-contrast regions, such as the air in the lungs. Despite this sparsity, our results showed improved registration performance with landmark-pair guidance during network training. In future work, performance could be improved in both the lung and other regions by increasing the total number of landmarks and raising the landmark density in areas found to underperform.
In addition to the sparsity of landmark pairs, this work was limited by the small number of patients. Training a model generally calls for a larger number of input datasets. While that remains a concern here, it is partially alleviated by the 300 landmark pairs per dataset from DIR-Lab and the additional 1900 landmark pairs from Fu et al.,27 which also mitigate possible overfitting to some degree. As overfitting is a common concern in such works, another avenue of future work is to expand the number of patient datasets, though doing so would require many physician hours.
Another potential limitation of this study is that DIR performance may be affected by the image quality of the 4DCT images. Noise and streaking artifacts, arising from an insufficient number of projections in a respiratory phase,29 can degrade image quality. While their severity can be reduced, the trade-off is increased patient dose. Additionally, respiratory motion can cause blurring and more severe motion artifacts that may require rescanning the patient, incurring more dose. Patient motion limits the precision of target delineation, with slow, shallow breathing yielding the smallest uncertainty in GTV location30 but decreased SNR and CNR.29 Developing deep learning-based 4DCT image quality enhancement as a pre-processing step for 4DCT lung DIR will be another focus of future study.
This work lends itself to future studies on automatic organ-at-risk (OAR) segmentation and dose-volume histogram prediction, which together constitute a pipeline for further optimizing patient outcomes and clinical feasibility of 4DCT imaging for lung patients. In the current radiation therapy workflow, the proposed model can serve as part of a pipeline for automating and improving 4DCT treatment planning. Ideally, it would be applied after the patient undergoes 4DCT simulation, by which point the patient will have already undergone PET/CT imaging with the tumors contoured. The contours on the PET/CT dataset would be projected onto one phase of the 4DCT dataset using deformable registration, OAR contours would be generated on that scan, and the projected contours would then be propagated across the remaining nine phases of the 4DCT using the deformation vector fields from this model. Other applications, such as 4DCT ventilation imaging, are also potential clinical uses.
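Propagating a contour across phases, as described above, amounts to resampling a binary mask through the DVF. A minimal NumPy sketch, where nearest-neighbor sampling (which keeps the mask binary) and a pull-back displacement convention are simplifying assumptions; clinical implementations typically rely on dedicated registration toolkits:

```python
import numpy as np

def propagate_contour(mask, dvf):
    """Warp a binary contour mask from one 4DCT phase to another.

    mask: (D, H, W) binary mask on the moving phase.
    dvf: (D, H, W, 3) displacement field in voxels mapping target coordinates
         to moving coordinates (pull-back convention): out(x) = mask(x + u(x)).
    """
    shape = mask.shape
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"),
                    axis=-1)
    # Nearest-neighbor source coordinates, clipped to the volume bounds.
    src = np.clip(np.round(grid + dvf).astype(int), 0, np.array(shape) - 1)
    return mask[src[..., 0], src[..., 1], src[..., 2]]
```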
5. Conclusion
A landmark-driven cycle network for automatic deformable image registration of patients’ lung 4DCT individual phase datasets was proposed and tested on DIR-Lab’s public dataset and a local clinical dataset. The proposed method performs accurate and fast deformable registration on the evaluated data and is a promising tool for improving lung motion management strategies and treatment planning during radiation therapy.
Acknowledgments
This research is supported in part by the National Cancer Institute of the National Institutes of Health under Award Numbers R01CA215718 and R01CA272991.
References
- 1.Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer Statistics, 2021. CA: A Cancer Journal for Clinicians. 2021;71(1):7–33.
- 2.Miller KD, Nogueira L, Mariotto AB, et al. Cancer treatment and survivorship statistics, 2019. CA: A Cancer Journal for Clinicians. 2019;69(5):363–385.
- 3.Keall PJ, Mageras GS, Balter JM, et al. The management of respiratory motion in radiation oncology report of AAPM Task Group 76. Med Phys. 2006;33(10):3874–3900.
- 4.Benedict SH, Yenice KM, Followill D, et al. Stereotactic body radiation therapy: the report of AAPM Task Group 101. Med Phys. 2010;37(8):4078–4101.
- 5.Keall P. 4-dimensional computed tomography imaging and treatment planning. Semin Radiat Oncol. 2004;14(1):81–90.
- 6.Speight R, Sykes J, Lindsay R, Franks K, Thwaites D. The evaluation of a deformable image registration segmentation technique for semi-automating internal target volume (ITV) production from 4DCT images of lung stereotactic body radiotherapy (SBRT) patients. Radiother Oncol. 2011;98(2):277–283.
- 7.Rong Y, Rosu-Bubulac M, Benedict SH, et al. Rigid and Deformable Image Registration for Radiation Therapy: A Self-Study Evaluation Guide for NRG Oncology Clinical Trial Participation. Pract Radiat Oncol. 2021;11(4):282–298.
- 8.Tong Y, Yin Y, Cheng P, Gong G. Impact of deformable image registration on dose accumulation applied electrocardiograph-gated 4DCT in the heart and left ventricular myocardium during esophageal cancer radiotherapy. Radiat Oncol. 2018;13(1):145.
- 9.Hegi-Johnson F, Keall P, Barber J, Bui C, Kipritidis J. Evaluating the accuracy of 4D-CT ventilation imaging: First comparison with Technegas SPECT ventilation. Med Phys. 2017;44(8):4045–4055.
- 10.Fu Y, Lei Y, Wang T, Curran WJ, Liu T, Yang X. Deep learning in medical image registration: a review. Physics in Medicine & Biology. 2020;65(20):20TR01.
- 11.Haskins G, Kruger U, Yan P. Deep learning in medical image registration: a survey. Machine Vision and Applications. 2020;31(1):8.
- 12.Jaderberg M, Simonyan K, Zisserman A, Kavukcuoglu K. Spatial Transformer Networks. arXiv:1506.02025. 2015. doi:10.48550/arXiv.1506.02025.
- 13.de Vos BD, Berendsen FF, Viergever MA, Sokooti H, Staring M, Išgum I. A deep learning framework for unsupervised affine and deformable image registration. Med Image Anal. 2019;52:128–143.
- 14.Sentker T, Madesta F, Werner R. GDL-FIRE4D: Deep Learning-Based Fast 4D CT Image Registration. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. Cham; 2018.
- 15.Eppenhof KAJ, Pluim JPW. Pulmonary CT Registration Through Supervised Learning With Convolutional Neural Networks. IEEE Trans Med Imaging. 2019;38(5):1097–1105.
- 16.Fu Y, Lei Y, Wang T, et al. LungRegNet: An unsupervised deformable image registration method for 4D-CT lung. Med Phys. 2020;47(4):1763–1774.
- 17.Fu Y, Wang T, Lei Y, et al. Deformable MR-CBCT prostate registration using biomechanically constrained deep learning networks. Medical Physics. 2021;48(1):253–263.
- 18.Lei Y, Fu Y, Wang T, et al. 4D-CT deformable image registration using multiscale unsupervised deep learning. Physics in Medicine & Biology. 2020;65(8):085003.
- 19.Zhang Y, Wu X, Gach HM, Li H, Yang D. GroupRegNet: a groupwise one-shot deep learning-based 4D image registration method. Phys Med Biol. 2021;66(4):045030.
- 20.Kim B, Kim DH, Park SH, Kim J, Lee J-G, Ye JC. CycleMorph: Cycle consistent unsupervised deformable image registration. Medical Image Analysis. 2021;71:102036.
- 21.Zhang J. Inverse-Consistent Deep Networks for Unsupervised Deformable Image Registration. arXiv:1809.03443. 2018.
- 22.Li H, Fan Y. Non-rigid image registration using self-supervised fully convolutional networks without training data. Proc IEEE Int Symp Biomed Imaging. 2018;2018:1075–1078.
- 23.Reuter M, Rosas HD, Fischl B. Highly accurate inverse consistent registration: a robust approach. Neuroimage. 2010;53(4):1181–1196.
- 24.Zhang Y, He X, Tian Z, et al. Multi-Needle Detection in 3D Ultrasound Images Using Unsupervised Order-Graph Regularized Sparse Dictionary Learning. IEEE Trans Med Imaging. 2020;39(7):2302–2315.
- 25.Castillo E, Castillo R, Martinez J, Shenoy M, Guerrero T. Four-dimensional deformable image registration using trajectory modeling. Phys Med Biol. 2010;55(1):305–327.
- 26.Castillo R, Castillo E, Guerra R, et al. A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Phys Med Biol. 2009;54(7):1849–1870.
- 27.Fu Y, Wu X, Thomas AM, Li HH, Yang D. Automatic large quantity landmark pairs detection in 4DCT lung images. Med Phys. 2019;46(10):4490–4501.
- 28.Lei Y, Fu Y, Tian Z, et al. Deformable CT image registration via a dual feasible neural network. Med Phys. 2022. doi:10.1002/mp.15875.
- 29.Lee S, Yan G, Lu B, Kahler D, Li JG, Sanjiv SS. Impact of scanning parameters and breathing patterns on image quality and accuracy of tumor motion reconstruction in 4D CBCT: a phantom study. Journal of Applied Clinical Medical Physics. 2015;16(6):195–212.
- 30.Watkins WT, Li R, Lewis J, et al. Patient-specific motion artifacts in 4DCT. Medical Physics. 2010;37(6Part1):2855–2861.
