Abstract
Background
Four-dimensional computed tomography (4D-CT) provides important respiration-related information for thoracic radiotherapy, but its quality is challenged by varying respiratory patterns, and its acquisition carries a risk of higher radiation exposure. Synthesizing a 4D sequence by warping a high-quality volumetric image with a continuously estimated deformation is a possible solution.
Purpose
To propose a non-patient-specific cascaded ensemble model (CEM) that estimates respiration-induced thoracic tissue deformation from surface motion.
Methods
The CEM cascades three deep learning-based models. Given the surface motion as input, the CEM outputs a deformation vector field (DVF) inside the thorax. In this work, the surface motion was simulated using body contours derived from 4D-CT. The CEM was trained on our private database of 62 4D-CT sets and tested on a public database of 80 4D-CT sets. To evaluate the CEM, we used the model's output DVFs to generate several series of synthesized CTs and compared them with the ground truth. The CEM was also compared with other published works.
Results
The CEM synthesized CT with an mRMSE (average root mean square error) of 61.06 ± 10.43 HU (average ± standard deviation), an mSSIM (average structural similarity index measure) of 0.990 ± 0.004, and an mMAE (average mean absolute error) of 26.80 ± 5.65 HU. Compared with other works, the CEM showed the best results.
Conclusions
The results demonstrate the effectiveness of the CEM in estimating the tissue DVF inside the thorax. The CEM requires no patient-specific breathing data sampling and no additional training before treatment, and it shows potential for broad application.
Keywords: body deformation features, respiration, thoracic deformation estimation from surface, tissue deformation features
1. INTRODUCTION
Four-dimensional computed tomography (4D-CT) 1 is a useful approach to characterize respiratory motion in lung cancer radiotherapy. 1, 2 It can be analyzed for the mean tumor position, the tumor motion range for treatment planning, and so forth. 3, 4 However, 4D-CT quality is challenged by the variety of respiration patterns, 5, 6 since the reconstruction is based on the hypothesis of a periodic breathing cycle. 7 Moreover, the increased scanning time 8 and higher radiation dose 9 raise the risk of radiation exposure. To tackle these concerns, a 4D synthesis that warps a high-quality image using an estimated deformation is a possible solution, where the internal tissue deformation can be estimated from external surrogates. 10, 11, 12
The body surface is a common external surrogate. 13, 14 Its correlation with the internal tissue deformation can be categorized into two types. The first type uses the surface as a breathing phase indicator 15 for an interpolation. 16 Before treatment, a database containing motion data and the corresponding surfaces is constructed. The motion data can be a series of deformation vector fields (DVFs), 17, 18 respiration-correlated images, 11 and so forth. During treatment, a real-time surface is captured and matched with those in the database; the candidate motion data corresponding to the matched surfaces are then selected for interpolation-based deformation estimation. Most methods of this type are based on the hypothesis of respiratory periodicity. 11 However, the periodicity can be violated by different breathing patterns (such as deep/shallow breathing 19, 20, 21) and instantaneous changes (such as coughing and sneezing 22, 23). The second type is an inference. 24 It uses the surface motion to estimate the internal deformation via a biomechanical model, 18, 25, 26 a statistical model, 10, 27 a regression model, 28 and so forth. Compared to the first type, an inference method has the potential to adapt to breathing variety, since it is not limited by historical data.
Our previous work 29 showed that an internal-external correlation model in a high-dimensional deformation feature space is robust against breathing variety. Inspired by this, we propose a non-patient-specific cascaded ensemble model (CEM) to estimate the respiration-induced thoracic tissue deformation from surface motion (as shown in Figure 1). The contributions of this model are as follows:
The CEM is non‐invasive and applicable for various patients.
The CEM requires no patient-specific motion data sampling and no additional training before treatment.
FIGURE 1.
RCCT synthesis workflow of our proposed CEM*. *CEM, a non‐patient‐specific cascaded ensemble model; RCCT, respiration‐correlated computed tomography. CEM = BDFs encoder + BDFs‐TDFs‐Net + TDFs decoder.
The rest of this paper is organized as follows. Section 2 introduces the method and the architecture of our proposed CEM, and details the experiments: for a comprehensive evaluation, we tested the model on a public database for respiration-correlated CT (RCCT) synthesis, compared it with other published works, and conducted an ablation study. Section 3 shows the results, Section 4 discusses the performance further, and Section 5 concludes our work.
2. MATERIALS AND METHODS
2.1. Method overview
Figure 2 presents a schematic representation of our work, which has different configurations in the training and testing phases. The training framework is composed of the following blocks:
DIR‐Body: a deformable image registration (DIR) model for body. Its encoder outputs the body deformation features (BDFs).
DIR‐Tissue: a DIR model for internal tissue. Its encoder outputs the tissue deformation features (TDFs).
BDFs-TDFs-Net: a model that estimates TDFs from BDFs.
FIGURE 2.
Schematic representation of the proposed model.
During testing, the BDFs decoder and the TDFs encoder are removed. We tested the proposed CEM on RCCT synthesis. In the following subsections, we detail the DIR-Body, the DIR-Tissue, and the BDFs-TDFs-Net.
2.2. Model structure
In this section, we present the architectures of three models.
2.2.1. DIR‐Body for encoding BDFs
BDFs are derived by the DIR-Body encoder. The DIR-Body structure is shown in Figure 3. Its input is a pair of volumetric binary images of the body at the reference respiration phase (Bodyref) and at the ith respiration phase (Body i).
FIGURE 3.
Illustration of DIR‐Body. (a) DIR‐Body training framework, (b) illustration of encoding BDFs, and (c) DIR‐Body layers*. *Legends: Conv3D(kernel size, stride, padding), LeakyReLU(negative slope). The number on the top of each rectangle is the number of filters. *DIR‐Body, deformable image registration model for body; Conv3D, convolution in three dimensions; BatchNorm, batch normalization.
2.2.2. DIR‐Tissue for encoding TDFs
DIR-Tissue's input is a pair of CT images at the reference respiration phase (CTref) and at the ith respiration phase (CT i). DIR-Tissue, as shown in Figure 4, was optimized using the unsupervised learning framework suggested by Balakrishnan et al. 30
FIGURE 4.
Illustration of DIR‐Tissue. (a) DIR‐Tissue training framework, (b) illustration of encoding TDFs and (c) DIR‐Tissue layers*. *DIR‐Tissue, deformable image registration model for internal tissue. Other abbreviations and legends are the same as those in Figure 3.
2.2.3. BDFs‐TDFs‐Net
Figure 5 shows the architecture of BDFs-TDFs-Net, which outputs estimated TDFs from input BDFs. In this model, we employed a modified dynamic region-aware convolution in three dimensions (modDRConv3d). 2 The modDRConv3d is based on the DRConv proposed by Chen et al. 31 Unlike the DRConv, the modDRConv3d generates several randomly-initialized filters and then assigns these filters to different input regions to execute the convolution.
FIGURE 5.
Illustration of BDFs-TDFs-Net. (a) Model architecture, (b) illustration of generating the weight matrix W, and (c–e) illustrations of generating the model's input variables.* *Legends: modDRConv3d(kernel size, region number), TransConv3d(kernel size, stride, padding). modDRConv3d, modified dynamic region-aware convolution in three dimensions (detailed in Appendix B); TransConv3d, transposed convolution in three dimensions. Other abbreviations and legends are the same as those in Figure 2.
2.3. Experiment
2.3.1. Data acquisition
Two datasets were used in this experiment: a private dataset containing 62 4D-CT images, and the public 4D-Lung dataset 32 provided by The Cancer Imaging Archive (TCIA), which includes 80 4D-CT sets from 20 locally-advanced, non-small cell lung cancer patients.
The private database was collected from 62 lung cancer patients receiving radiation treatment in our department (Department of Radiation Oncology). All patients underwent CT scans on a Brilliance CT Big Bore system (Philips Healthcare, Best, the Netherlands). Each 4D-CT comprised several sets of three-dimensional (3D) CT, 3 with different 3D CTs corresponding to different respiratory phases. All CT images were reconstructed into a matrix of 512 × 512 with a slice thickness of 2–5 mm and a pixel spacing of 0.97–1.18 mm.
In the TCIA 4D‐Lung database, the images were acquired on a 16‐slice helical CT scanner (Brilliance Big Bore, Philips Medical Systems, Andover, MA) and were reconstructed into 10 breathing phases (0.0%–90.0%). 33 The in‐plane resolution was 0.9766 mm with a grid size of 512 × 512. The slice thickness was 3 mm.
2.3.2. Data pre‐processing
This experiment needed two types of data: the 3D CT and its corresponding volumetric binary image of the body (Body), which is used to simulate the surface motion.
For the private database, each 3D CT was resampled to a slice thickness of 2.5 mm and a pixel spacing of 2.0 mm due to the graphics processing unit (GPU) memory limitation, and the images were then centrally cropped to matrices of 128 × 192 × 192. To focus on lung soft tissue, the image intensities were clamped to the range of −1000 to 500 HU and normalized to 0–1. The Body was generated using a commercially available automatic segmentation tool (Shenzhen Yino Intelligent Technology Development Co., Ltd, Shenzhen, China).
For the TCIA 4D-Lung database, the Body was generated using threshold segmentation and a Moore neighbor contour tracing algorithm. 34 All Body images and 3D CTs were resampled to a resolution of 2.5 mm × 1.16 mm × 1.16 mm and cropped to matrices of 120 × 265 × 265 covering the lung.
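A minimal sketch of such a threshold-based Body mask is below; the −300 HU threshold, the largest-component selection, and the slice-wise hole filling are illustrative assumptions standing in for the Moore neighbor contour tracing step, not the exact pipeline used here.

```python
import numpy as np
from scipy import ndimage

def body_mask_from_ct(ct_hu: np.ndarray, threshold_hu: float = -300.0) -> np.ndarray:
    """Binary body mask (Body) from a CT volume in HU via threshold segmentation.

    Assumed stand-in for the threshold + contour tracing step: threshold out
    air, keep the largest connected component, and fill internal cavities
    (e.g., lungs) slice by slice.
    """
    foreground = ct_hu > threshold_hu                 # air is ~ -1000 HU; the body is denser
    labels, n_components = ndimage.label(foreground)  # connected components
    if n_components == 0:
        return np.zeros(ct_hu.shape, dtype=np.uint8)
    sizes = ndimage.sum(foreground, labels, range(1, n_components + 1))
    body = labels == (1 + int(np.argmax(sizes)))      # largest component = patient body
    for k in range(body.shape[0]):                    # fill holes on each axial slice
        body[k] = ndimage.binary_fill_holes(body[k])
    return body.astype(np.uint8)
```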
Note that the different image dimensions and resolutions for the two databases serve to train the model according to our design and to better evaluate the model performance; a detailed discussion is presented in the Supplementary Material S1.
2.3.3. Data splitting
The private database was used for training and the TCIA 4D-Lung database for testing. Note that the test and training data came from two different institutions, and no model fine-tuning was performed before testing.
For each 4D-CT, the end-exhalation phase (50%) was the reference 4 and the other phases were the moving ones. The images at the reference phase and at any moving phase composed one sample pair. In total, there were 540 sample pairs for training and 720 sample pairs for testing. The splitting for the different models is detailed in Table 1.
TABLE 1.
Quantity of training sets for different models.
| Models | Input variables | Output variables | Training samples | Validation samples |
| --- | --- | --- | --- | --- |
| DIR-Tissue | CTref, CT i | ∖ | 387 | 153 |
| DIR-Body | Bodyref, Body i | ∖ | 387 | 153 |
| BDFs-TDFs-Net | BDFs i, TDFsref, BDFsref | TDFs i | 387 | 153 |

Note: Other abbreviations are the same as those in Figure 2.
Abbreviations: ref, the reference respiration phase; i, the ith respiration phase; ∖, not applicable.
2.3.4. Model training
All models were implemented using PyTorch and trained on two NVIDIA TITAN RTX GPUs. Adam 39 was used to optimize the model weights. The training settings are listed in Table 2.
TABLE 2.
Training setting for different models.
| Models | Learning rate | Batch size | Epochs | Optimal model |
| --- | --- | --- | --- | --- |
| DIR-Body a | 10^−4 b | 1 | 300 | The one with the minimum loss on the validation set c |
| DIR-Tissue a | 10^−4 b | 1 | 300 | The one with the minimum loss on the validation set c |
| BDFs-TDFs-Net a | 10^−3 b | 8 | 300 | The one with the minimum loss on the validation set c |
Note: The abbreviations are the same as those in Figure 2.
a The three models were trained separately and then cascaded without end-to-end fine-tuning. The end-to-end training results are presented, compared, and discussed in the Supplementary Material S2.
b The learning rate is reduced by a factor of 0.1 if no loss decrease on the training set is seen for 10 epochs.
c This indicator is chosen because the losses involve a data fidelity term (for a faithful estimation) and a regularization term (for a realistic appearance).
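The schedule in footnote b matches PyTorch's built-in ReduceLROnPlateau; a minimal sketch follows, in which the one-layer model and random data are placeholders rather than any of the networks above.

```python
import torch
import torch.nn as nn

model = nn.Conv3d(2, 3, kernel_size=3, padding=1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Reduce the learning rate by a factor of 0.1 when the monitored loss
# has not decreased for 10 epochs (footnote b of Table 2).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10)

for epoch in range(300):
    x = torch.randn(1, 2, 8, 16, 16)               # dummy mini-batch
    loss = model(x).pow(2).mean()                  # dummy training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())                    # step on the epoch's training loss
```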
The loss function for DIR‐Body is:
$$\mathrm{Loss}_{\text{DIR-Body}} = -\,\mathrm{SSIM}\big(\mathrm{Body}_i,\ \varphi(\mathrm{Body}_{\mathrm{ref}}, \mathrm{DVF}_{\mathrm{body}})\big) + \lambda\, R(\mathrm{DVF}_{\mathrm{body}}) \quad (1)$$

$$R(\mathrm{DVF}_{\mathrm{body}}) = \frac{1}{3 N_x N_y N_z} \sum_{p} \big\| \nabla \mathrm{DVF}_{\mathrm{body}}(p) \big\|_2^2 \quad (2)$$

in which SSIM represents the structural similarity index measure, and $\mathrm{DVF}_{\mathrm{body}}$ denotes the output DVF of DIR-Body. φ refers to the spatial transform 5 that warps Bodyref by $\mathrm{DVF}_{\mathrm{body}}$. R assesses the smoothness of $\mathrm{DVF}_{\mathrm{body}}$, and λ is an adjustment factor balancing SSIM and R; λ = 0.01 as suggested by Balakrishnan et al. 30 $N_x$, $N_y$, and $N_z$ refer to the image size of Body i along the x, y, and z axes, respectively. Other denotations are the same as those in Figure 3.
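Equation (2) is the diffusion-style smoothness regularizer used in VoxelMorph-like registration; a minimal PyTorch sketch under that assumption (forward finite differences approximate the spatial gradient):

```python
import torch

def smoothness_loss(dvf: torch.Tensor) -> torch.Tensor:
    """Diffusion regularizer R of Equation (2) for a (B, 3, D, H, W) DVF.

    Forward finite differences approximate the spatial gradient along each
    axis; taking the mean of the squared differences plays the role of the
    1/(3 * Nx * Ny * Nz) normalization.
    """
    dz = dvf[:, :, 1:, :, :] - dvf[:, :, :-1, :, :]
    dy = dvf[:, :, :, 1:, :] - dvf[:, :, :, :-1, :]
    dx = dvf[:, :, :, :, 1:] - dvf[:, :, :, :, :-1]
    return (dz.pow(2).mean() + dy.pow(2).mean() + dx.pow(2).mean()) / 3.0
```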
The loss function for DIR‐Tissue is:
$$\mathrm{Loss}_{\text{DIR-Tissue}} = \mathrm{MSE}\big(\mathrm{CT}_i,\ \varphi(\mathrm{CT}_{\mathrm{ref}}, \mathrm{DVF}_{\mathrm{tissue}})\big) + \lambda\, R_{\mathrm{body}}(\mathrm{DVF}_{\mathrm{tissue}}) \quad (3)$$

$$R_{\mathrm{body}}(\mathrm{DVF}_{\mathrm{tissue}}) = \frac{1}{3 N_{\mathrm{body}}} \sum_{p \in \mathrm{Body}_i} \big\| \nabla \mathrm{DVF}_{\mathrm{tissue}}(p) \big\|_2^2 \quad (4)$$

wherein MSE represents the mean square error, and $\mathrm{DVF}_{\mathrm{tissue}}$ denotes the output DVF of DIR-Tissue. $R_{\mathrm{body}}$ assesses the smoothness of $\mathrm{DVF}_{\mathrm{tissue}}$ within Body i, and $N_{\mathrm{body}}$ is the number of voxels with a value of 1 in Body i. Other denotations are the same as those in Figure 4; φ represents the same spatial transform as in Equation (1).
The loss function for BDFs‐TDFs‐Net is:
$$\mathrm{Loss}_{\text{BDFs-TDFs-Net}} = \mathrm{wMSE}\big(\mathrm{TDFs}_i,\ \widehat{\mathrm{TDFs}}_i\big) + \alpha\, L_{\mathrm{VGG}} + \beta\, \|\theta\|_2^2 \quad (5)$$

$$\mathrm{wMSE} = \frac{1}{N} \sum_{p} W(p)\, \big(\mathrm{TDFs}_i(p) - \widehat{\mathrm{TDFs}}_i(p)\big)^2 \quad (6)$$

$$L_{\mathrm{VGG}} = \frac{1}{N_F} \sum_{q} \big(F(\mathrm{TDFs}_i)(q) - F(\widehat{\mathrm{TDFs}}_i)(q)\big)^2 \quad (7)$$

$$\|\theta\|_2^2 = \sum_{j} \theta_j^2 \quad (8)$$

in which wMSE represents a weighted mean square error as shown in Equation (6), and $\widehat{\mathrm{TDFs}}_i$ denotes the TDFs estimated by BDFs-TDFs-Net. The weight matrix W in Equation (6) aims to focus on the parts within the body; it has the same size as TDFs i and is obtained by down-sampling Bodyref. N denotes the total number of voxels in TDFs i. $L_{\mathrm{VGG}}$ represents the VGG loss 6 suggested by Ledig et al., 40 deployed for better high texture detail; $F(\mathrm{TDFs}_i)$ and $F(\widehat{\mathrm{TDFs}}_i)$ refer to the feature maps output by the first nine layers of the pre-trained VGG16, 41 and $N_F$ is the number of elements in these feature maps. $\|\theta\|_2^2$ represents L2 regularization, where θ denotes the BDFs-TDFs-Net parameters. α and β denote two adjustment factors; α = 10^−2 and β = 10^−5 empirically. Other denotations are the same as those in Figures 4 and 5. Note that, in Equations (5)–(7), TDFs i is the output of the TDFs encoder and was regarded as the truth when training BDFs-TDFs-Net.
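A sketch of the two data terms is below. weighted_mse follows Equation (6) directly; the VGG term is shown for 2D maps because the pre-trained VGG16 expects three-channel 2D inputs, so how the 3D TDFs are sliced and fed to it is an assumption on our part (the cited open-source VGG loss code operates on 2D images).

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def weighted_mse(tdfs_true: torch.Tensor, tdfs_pred: torch.Tensor,
                 w: torch.Tensor) -> torch.Tensor:
    """Equation (6): w is the down-sampled Bodyref mask, broadcastable to the TDFs."""
    return (w * (tdfs_true - tdfs_pred).pow(2)).mean()

# First nine layers of a pre-trained VGG16 as the fixed feature extractor F.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:9].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def vgg_loss_2d(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Equation (7) on 2D maps: a, b are (B, 1, H, W); the single channel is
    replicated to the 3 channels VGG16 expects."""
    fa = vgg_features(a.repeat(1, 3, 1, 1))
    fb = vgg_features(b.repeat(1, 3, 1, 1))
    return F.mse_loss(fa, fb)
```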
2.3.5. Validation on RCCT synthesis
The test was performed on the TCIA 4D-Lung database. By inputting Bodyref and Body i, our model gives an estimated internal tissue DVF. According to this DVF, CTref is transformed into an estimated CT at the ith phase ($\widehat{\mathrm{CT}}_i$). By comparing $\widehat{\mathrm{CT}}_i$ with its ground truth ($\mathrm{CT}_i^{\mathrm{GT}}$) using the following three metrics, we evaluated the proposed CEM on RCCT synthesis.
$$\mathrm{RMSE}_i = \sqrt{\frac{1}{N} \big\| \widehat{\mathrm{CT}}_i - \mathrm{CT}_i^{\mathrm{GT}} \big\|_2^2}\,, \qquad \mathrm{mRMSE} = \frac{1}{T} \sum_{i=1}^{T} \mathrm{RMSE}_i \quad (9)$$

in which $\mathrm{RMSE}_i$ means the root mean square error (RMSE), and the subscript i is the ith respiration phase. N means the number of voxels in $\widehat{\mathrm{CT}}_i$, and $\|\cdot\|_2$ means the L2-norm. mRMSE represents the average RMSE over a breathing time (T).

$$\mathrm{SSIM}_i = \frac{\big(2 \mu_{\widehat{\mathrm{CT}}_i} \mu_{\mathrm{CT}_i^{\mathrm{GT}}} + C_1\big)\big(2 \sigma_{\widehat{\mathrm{CT}}_i \mathrm{CT}_i^{\mathrm{GT}}} + C_2\big)}{\big(\mu_{\widehat{\mathrm{CT}}_i}^2 + \mu_{\mathrm{CT}_i^{\mathrm{GT}}}^2 + C_1\big)\big(\sigma_{\widehat{\mathrm{CT}}_i}^2 + \sigma_{\mathrm{CT}_i^{\mathrm{GT}}}^2 + C_2\big)}\,, \qquad \mathrm{mSSIM} = \frac{1}{T} \sum_{i=1}^{T} \mathrm{SSIM}_i \quad (10)$$

in which $\mathrm{SSIM}_i$ refers to the SSIM between $\widehat{\mathrm{CT}}_i$ and $\mathrm{CT}_i^{\mathrm{GT}}$. $\mu_{\widehat{\mathrm{CT}}_i}$ and $\mu_{\mathrm{CT}_i^{\mathrm{GT}}}$ mean their averages, $\sigma_{\widehat{\mathrm{CT}}_i}^2$ and $\sigma_{\mathrm{CT}_i^{\mathrm{GT}}}^2$ their variances, and $\sigma_{\widehat{\mathrm{CT}}_i \mathrm{CT}_i^{\mathrm{GT}}}$ their covariance. $C_1$ = 10^−4 and $C_2$ = 9 × 10^−4. mSSIM is the mean SSIM over T. Other denotations are the same as those in Equation (9).

$$\mathrm{MAE}_i = \frac{1}{N} \sum_{p} \big| \widehat{\mathrm{CT}}_i(p) - \mathrm{CT}_i^{\mathrm{GT}}(p) \big|\,, \qquad \mathrm{mMAE} = \frac{1}{T} \sum_{i=1}^{T} \mathrm{MAE}_i \quad (11)$$

wherein $\mathrm{MAE}_i$ represents the mean absolute error, and mMAE is the mean MAE over T. Other denotations are the same as those in Equation (9).
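A sketch of the three metrics follows; RMSE and MAE are unambiguous, while the SSIM here is computed globally over the volume with the stated C1 and C2 (which presume intensities normalized to 0–1) rather than with the local windows some SSIM implementations use.

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Equation (9) for one phase."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def mae(a: np.ndarray, b: np.ndarray) -> float:
    """Equation (11) for one phase."""
    return float(np.mean(np.abs(a - b)))

def ssim(a: np.ndarray, b: np.ndarray, c1: float = 1e-4, c2: float = 9e-4) -> float:
    """Equation (10) evaluated once over the whole (0-1 normalized) volume."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float((2 * mu_a * mu_b + c1) * (2 * cov + c2) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))

def phase_average(metric, synth_phases, gt_phases) -> float:
    """mRMSE / mSSIM / mMAE: average a per-phase metric over the breathing cycle T."""
    return float(np.mean([metric(s, g) for s, g in zip(synth_phases, gt_phases)]))
```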
2.3.6. Comparison with other works
Our model was compared with other approaches that have been applied to the TCIA 4D-Lung database. To align with the comparative methods, the test data are the four intermediate phases (10.0%–40.0%) from 20 sets of the TCIA 4D-Lung database, with matrices of 96 × 256 × 256 and a resolution of 2.5 mm × 1.16 mm × 1.16 mm. The comparative methods are:
SE-linear: A linear scaling method that estimates any intermediate-phase DVF (i.e., an intermediate-phase DVF = the EE-EI DVF × α). 42 The EE-EI DVF is the CT images' DVF between the end-exhalation (EE) and end-inhalation (EI) phases, and α is a scaling factor representing the normalized phase index. The EE-EI DVF was generated by the classic B-spline registration in the SimpleElastix toolbox. 43
DL-linear: A linear scaling method similar to SE-linear, but using a deep learning network to generate the EE-EI DVF. 42
Pix2pix network: A U-net that maps the image at the EE phase to an image at any intermediate phase. 42
ConReg network: A deep learning-based conditional registration model that estimates an intermediate-phase DVF from the EE CT scan, the EI CT scan, and a conditional variable (the phase index). 42
2.3.7. Ablation study
The modDRConv3d is our key design for better estimating TDFs from BDFs. To study its benefit, we replaced the modDRConv3d with a conventional 3D convolution (denoted Conv3d) and trained the BDFs-TDFs-Net as described in Section 2.2.3. Finally, we compared the performance discrepancy of the two CEMs using mRMSE, mSSIM, mMAE, memory size, inference time, and an additional metric, defined in Appendix D, that evaluates the model error in the lung region showing volume variations.
To differentiate the two CEMs, we denote the one using Conv3d as conCEM, and denote the one using modDRConv3d as CEM.
More ablation studies on our work are presented and discussed in the Supplementary Material S3.
3. RESULTS
3.1. Validation on RCCT synthesis
3.1.1. Quantitative results
The quantitative results are summarized in Table 3. On the 80 testing 4D-CT sets, our CEM reaches an mRMSE of 61.06 ± 10.43 HU, an mSSIM of 0.990 ± 0.004, and an mMAE of 26.80 ± 5.65 HU.
TABLE 3.
Quantitative results of our CEM on 80 testing sets. a
|  | mRMSE (HU) | mSSIM | mMAE (HU) b |
| --- | --- | --- | --- |
| Average ± standard deviation | 61.06 ± 10.43 | 0.990 ± 0.004 | 26.80 ± 5.65 |
| Median | 60.01 | 0.991 | 25.95 |
| 70th percentile | 65.39 | 0.992 | 29.47 |
| 90th percentile | 73.56 | 0.994 | 34.85 |
| 95th percentile | 82.08 | 0.994 | 36.68 |
Note: A lower mRMSE, a lower mMAE, and a higher mSSIM indicate a better result; mSSIM ranges from 0 to 1. The metrics were calculated over the whole image rather than only within the body; however, the testing image resolution and dimension settings focus the validation on the deformation region. Further details are stated in the Supplementary Material S1.
Abbreviations: CEM, a non-patient-specific cascaded ensemble model; mRMSE, average root mean square error over a breathing time; mSSIM, average structural similarity index measure over a breathing time; mMAE, average mean absolute error over a breathing time.
a Testing set details: nine phases of each set were involved, with a resolution of 2.5 mm × 1.16 mm × 1.16 mm and a matrix of 120 × 265 × 265. The image at the 50% phase is the model input and is not involved in the test.
b The outliers of the voxel-wise absolute errors on a volumetric image are exhibited in a video and are discussed in the Supplementary Material S4.
Figure 6 compares the CEM metrics with the CTref-versus-CT i metrics, as a further demonstration of the CEM performance. In our validation experiment, the synthesized CT was generated by transforming CTref based on the model's output DVF; if there were no difference between CTref and CT i, the comparison could not demonstrate the CEM's effectiveness. In Figure 6, our CEM achieves better metrics in most cases. In six cases, indicated by the blue dots in Figure 6, our CEM exhibits an equal or worse metric. Among these failure cases, case No. 13 presents the worst performance, with no improvement in any of the three metrics. A further study of it is detailed in Section 4.
FIGURE 6.
Evaluation metric comparison of CEM versus GT and ref versus GT; their differences are filled by the shaded regions. A lower mRMSE, a lower mMAE, and a higher mSSIM correspond to a better result. The blue dots represent the cases with no improvement: in the top-to-bottom subfigures, they denote the cases with mRMSE(CEM versus GT) ≥ mRMSE(ref versus GT), mSSIM(CEM versus GT) ≤ mSSIM(ref versus GT), and mMAE(CEM versus GT) ≥ mMAE(ref versus GT), respectively.* *GT, ground truth; ref, the reference image. Other abbreviations are the same as those in Table 3.
3.1.2. Qualitative visualization
Synthesis examples are exhibited in Figure 7. 7 They show that our CEM performs a good estimation of the diaphragm motion and the structures inside the lung.
FIGURE 7.
The synthesis examples*. The last two rows show the image intensity difference compared to the ground truth. The red and yellow arrows indicate our model performances on the diaphragm motion and the structures inside the lung, respectively. *For a clear view, the images are interpolated to a resolution of 1 mm × 1 mm.
3.2. Comparison with other works
The comparison results are listed in Table 4; our CEM presents the lowest RMSE and the highest SSIM.
TABLE 4.
Comparison results with other works on 20 testing sets. a The results are expressed as average ± standard deviation. b
|  | SE-linear 42 | DL-linear 42 | Pix2pix network 42 | ConReg network 42 | Ours |
| --- | --- | --- | --- | --- | --- |
| RMSE (HU) | 119.1 ± 54.9 | 131.2 ± 53.7 | 81.3 ± 55.2 | 70.1 ± 33.0 | **58.8 ± 9.4** |
| SSIM | 0.870 ± 0.065 | 0.854 ± 0.075 | 0.914 ± 0.076 | 0.926 ± 0.044 | **0.990 ± 0.003** |
| Inference time (ms) | 26.59 | 26.59 | 10.61 | 200.47 | 143.23 |
| Number of parameters | 1.62 × 10^4 | 8.81 × 10^5 | 2.07 × 10^8 | 8.81 × 10^5 | 1.25 × 10^6 |
Abbreviations: RMSE, root mean square error; SSIM, structural similarity index measure.
a Testing set details: four intermediate phases of each set were involved, with a resolution of 2.5 mm × 1.16 mm × 1.16 mm and a matrix of 96 × 256 × 256. The details differ from those in Table 3 to align with the published comparative works.
b The best results (lowest RMSE, highest SSIM) are printed in bold.
3.3. Ablation study
Table 5 lists the ablation study results. Evidently, the proposed CEM using modDRConv3d occupies more memory than the one using Conv3d. Over the whole volumetric image, their performances and inference times show little difference, since all p-values are > 0.05. However, in the lung region showing volume variations, our CEM achieves a smaller error than conCEM at the two extreme phases (i.e., the 0% and 40% phases, indicated by the two gray boxes), as shown in the boxplots in Figure 8A. Figure 8B presents three examples, indicated by the red arrows, in which our model with modDRConv3d exhibits better deformation detail than the one with Conv3d.
TABLE 5.
Ablation study results.
|  | mRMSE (HU) a | mSSIM a | mMAE (HU) a | Number of parameters | Inference time (ms) a |
| --- | --- | --- | --- | --- | --- |
| CEM b | 61.06 ± 10.43 | 0.990 ± 0.004 | 26.80 ± 5.65 | 1.25 × 10^6 | 143.23 ± 25.38 |
| conCEM b | 60.93 ± 10.45 | 0.990 ± 0.004 | 26.71 ± 5.67 | 4.16 × 10^5 | 143.65 ± 22.78 |
| p-value c | 0.54 | 0.10 | 0.15 | – | 0.14 |
a The results of mRMSE, mSSIM, mMAE, and inference time are expressed as average ± standard deviation.
b CEM is our proposed model and conCEM is the comparative one; their difference is the convolution in BDFs-TDFs-Net. Specifically, CEM = BDFs encoder + BDFs-TDFs-Net (using modDRConv3d) + TDFs decoder, and conCEM = BDFs encoder + BDFs-TDFs-Net (using Conv3d) + TDFs decoder.
c Paired-samples t-test with a significance level of 0.05. A p-value < 0.05 means the observed average difference between the two groups is statistically significant; a p-value > 0.05 means it is not. "–" means not applicable.
FIGURE 8.
(A) The boxplots of our proposed model (CEM) and the one using Conv3d (conCEM). * The gray boxes indicate that our CEM reaches a smaller error than conCEM. * Q1, 25th percentile; Q3, 75th percentile; IQR = Q3−Q1. (B) The synthesized examples* of CEM and conCEM. The (a–c) columns refer to three patients, respectively. The last two rows show the image intensity difference compared to the ground truth. The red arrows indicate the two models’ performance differences. *For a clear view, the images are interpolated to a resolution of 1 mm×1 mm.
4. DISCUSSION
4.1. Further study on the failure case
Figure 6 shows six cases (indicated by the blue dots) in which our CEM achieves no metric improvement. Among them, case No. 13 exhibits the worst performance. To study it further, we chose its volumetric image at the 0% phase as the study object and display the CEM-estimated slices with the worst metrics, together with their CTref and CT i, in Figure 9. The red dashed boxes in the fourth column of Figure 9 indicate the principal differences between the CEM output and the ground truth. As shown in the first three columns (also indicated by red dashed boxes), these principal differences lie in regions of respiration-induced image artifacts, and the other no-improvement cases fail for the same reason. The proposed model generates the synthesized images based on the estimated DVF, but it cannot correct artifacts that already exist in the CTref.
FIGURE 9.
Illustration of failure case No.13. (a–c) show the slices with the worst RMSE, MAE, and SSIM, respectively*. The principal differences are in the respiration motion artifact regions as indicated by the red dashed boxes. *MAE, mean absolute error; RMSE, root mean square error; SSIM, structural similarity index measure.
4.2. Detailed study on our CEM's accumulated error caused by the cascaded structure
Our proposed CEM cascades three models. To study their accumulated error, we conducted a comparison experiment as shown in Figure 10.
FIGURE 10.
Experiment setup for the accumulated error study of different cascaded structures.* *The abbreviations are the same as those in Figure 2.
Table 6 and Figure 11 show the quantitative and qualitative synthesis errors of the different cascaded structures. Table 6 demonstrates the existence of the CEM's accumulated error, and Figure 11 shows that this error mainly comes from anatomic sites with high image intensity gradients, such as the diaphragm and the trachea, as indicated by the red arrows. To reduce the accumulated error, adding a loss term to Loss BDFs-TDFs-Net with an image intensity gradient-based weight matrix is a possible solution.
TABLE 6.
Quantitative results on 80 testing sets for different cascaded structures. The results are expressed as average ± standard deviation. a
|  | mRMSE (HU) | mSSIM | mMAE (HU) |
| --- | --- | --- | --- |
| TDFs decoder | 45.71 ± 6.93 | 0.995 ± 0.002 | 22.82 ± 5.16 |
| CEM b | 61.06 ± 10.43 | 0.990 ± 0.004 | 26.80 ± 5.65 |
FIGURE 11.
The synthesis examples of different cascaded structures (i.e., TDFs decoder and CEM).* The (a–c) columns refer to three patients, respectively. The last row shows the absolute image intensity difference between the two structures. The red arrows indicate the principal differences and their corresponding anatomic sites in the synthesized images. *The abbreviations are the same as those in Table 6 and Figure 10.
4.3. Limitation and future work
4.3.1. Limitation
In this experiment, the respiration-induced surface motion was simulated using body contours derived from 4D-CT. In the future, we will employ surface motion captured by an optical camera to assess the proposed model's performance in clinical practice.
Considering that it is hard to directly capture such a whole-body contour with an optical camera when the patient is in the supine position, we will use an alternative approach, as illustrated in Figure 12: an optical camera captures the 3D anterior body surface, which moves with breathing, and this surface is then fused with the posterior body surface from a planning CT to form a whole-body contour. Additionally, using only the anterior surface as input is also a potential option.
FIGURE 12.
Workflow of capturing time‐resolved body binary mask.
In our work, we adopted different edge detection methods to derive the body contours of the training and testing sets, to simulate the difference in body motion data between the simulation and the real scenario. With this experimental setup, we implicitly evaluated the model's robustness against such differences. Although the results show good robustness, a further assessment using real data is still necessary.
4.3.2. Exploration on time‐resolved dose calculation
The proposed method can also serve time-resolved dose calculation (TR-DC) 44 in adaptive radiotherapy (ART). TR-DC provides dose accumulation, which is crucial for predicting the dosimetric outcome in the presence of intra-fraction organ motion. Specifically, we can use the proposed method to generate a series of 3D images showing the time-varying internal anatomy within one fraction. These images, combined with the irradiation parameters recorded in a log file, 45, 46 are transferred to dose calculation software to obtain the dose distribution on each 3D image. By warping the dose distributions of all 3D images onto a reference image and accumulating them, we obtain the total dose delivered to the tumor and the normal tissues.
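A conceptual sketch of the warp-and-accumulate step is below (in our experiment this step was performed in commercial software); the equal phase weighting, trilinear interpolation, and DVF convention (reference-to-phase displacements sampled on the reference grid) are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_dose(dose_i: np.ndarray, dvf_ref_to_i: np.ndarray) -> np.ndarray:
    """Pull a phase-i dose grid back onto the reference geometry.

    dose_i:        (D, H, W) dose grid at phase i.
    dvf_ref_to_i:  (3, D, H, W) displacement mapping each reference voxel
                   into phase-i coordinates.
    """
    grid = np.indices(dose_i.shape).astype(np.float64)
    coords = grid + dvf_ref_to_i                     # where each reference voxel lands in phase i
    return map_coordinates(dose_i, coords, order=1)  # trilinear interpolation

def accumulate_dose(phase_doses, phase_dvfs, weights=None) -> np.ndarray:
    """Sum per-phase doses on the reference scan; weights model the fraction
    of the breathing cycle spent in each phase (equal by default)."""
    if weights is None:
        weights = [1.0 / len(phase_doses)] * len(phase_doses)
    return sum(w * warp_dose(d, f)
               for w, d, f in zip(weights, phase_doses, phase_dvfs))
```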
In this subsection, we explore the proposed model on TR-DC and list the results in Table 7. Five patients 8 receiving radiation treatment in our department were involved. All of them underwent 4D-CT scanning, and we used the proposed model to generate the corresponding synthesized 4D-CTs. The dose distributions, resulting from the same treatment plan delivered to each phase of the true and the synthesized 4D-CTs, were calculated using the Eclipse v15.1 treatment planning software (Varian Medical Systems, Inc., CA, USA), and then warped and accumulated onto the 50% phase scans using the MIM v7.3.2 software (MIM Software Inc., OH, USA). The dosimetric impact of the intensity differences between the estimated and the ground-truth volumes was evaluated using the dose-volume histogram (DVH) criteria for the gross target volume (GTV) and normal organs. The GTV DVH criterion is D99%, as suggested by Shintani et al. 47 The normal tissues' DVH criteria are those suggested by Timmerman. 48
TABLE 7.
The time‐resolved dose calculation comparison between the GT 4D‐CT and our synthesized one.
| No. | mMAE a (HU) | GTV D99% (Gy): GT | ours | Δ | Lung D1500cm³ (Gy): GT | ours | Δ | Lung D950cm³ (Gy): GT | ours | Δ | Heart D15cm³ (Gy): GT | ours | Δ | Spinal cord D0.35cm³ (Gy): GT | ours | Δ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 29.95 | 51.14 | 51.43 | −0.29 | 0.24 | 0.34 | −0.10 | 0.51 | 0.63 | −0.13 | 14.02 | 14.19 | −0.17 | 21.60 | 22.29 | −0.69 |
| 2 | 36.43 | 62.52 | 63.16 | −0.64 | 3.62 | 3.63 | −0.01 | 6.12 | 6.15 | −0.02 | 14.72 | 15.31 | −0.59 | 12.98 | 12.88 | 0.09 |
| 3 | 17.86 | 56.04 | 56.27 | −0.24 | 0.64 | 0.62 | 0.02 | 1.64 | 1.59 | 0.05 | 4.87 | 4.91 | −0.04 | 10.57 | 10.64 | −0.07 |
| 4 | 34.61 | 58.35 | 56.86 | 1.49 | 0.81 | 0.79 | 0.02 | 2.05 | 1.98 | 0.07 | 12.11 | 12.35 | −0.23 | 5.60 | 5.73 | −0.13 |
| 5 | 20.25 | 50.94 | 51.02 | −0.08 | 0.21 | 0.20 | 0.01 | 0.30 | 0.30 | 0.00 | 7.19 | 7.40 | −0.20 | 12.99 | 13.01 | −0.02 |
Abbreviations: GTV, gross target volume; D99%, the minimum dose delivered to 99% of the GTV; Dx cm³, the minimum dose delivered to x cm³ of the tissue; GT, the ground truth; mMAE, average mean absolute error over a breathing time; Δ, the metric difference obtained by subtracting ours from GT.
a The mMAE is calculated in the intersection of the body and the irradiation region.
In Table 7, the difference in GTV D99% between the ground truth and the synthesis is within 1.49 Gy. The DVH metric differences are within 0.13 Gy for lung, 0.59 Gy for heart, and 0.69 Gy for spinal cord.
In the currently proposed ART workflow, the delivered dose used for adaptation is computed fraction-wise. 49 This poses no real-time requirement, since the synthesized images can be generated in the interval between two fractions (usually about 2 days).
4.4. Effect discussion of motion hysteresis and deformable drift
Motion hysteresis and deformable drift are two common factors that violate the correlation between external surrogates and internal targets. In this subsection, we further discuss their effects on our model.
4.4.1. Effect discussion of motion hysteresis
Motion hysteresis implies a difference between the target motion trajectories during inhalation and exhalation. To study our model's performance against hysteresis, we used only the inhalation data to train all models and evaluated the performance on the exhalation data, following Section 2.
The quantitative results are listed in Table 8. Compared with the testing results 9 in Table 3, the inhalation-data-trained CEM presents a statistically significantly different performance (all p-values < 0.05 in Table 8), but the differences are small (the 95% CIs for mRMSE, mSSIM, and mMAE are [3.21, 7.08] HU, [−0.003, −0.001], and [1.56, 2.80] HU, respectively).
TABLE 8.
Quantitative results of the inhalation‐data‐trained CEM on 80 testing sets of exhalation data.
|  | mRMSE (HU) | mSSIM | mMAE (HU) |
| --- | --- | --- | --- |
| Average ± standard deviation | 66.65 ± 13.76 | 0.988 ± 0.005 | 29.09 ± 5.96 |
| Median | 64.27 | 0.989 | 28.23 |
| 70th percentile | 70.87 | 0.991 | 32.80 |
| 90th percentile | 86.48 | 0.993 | 37.27 |
| 95th percentile | 92.90 | 0.994 | 38.25 |
| p-value | 1.03 × 10^−6 | 5.73 × 10^−6 | 6.88 × 10^−10 |
| 95% CI a | [3.21, 7.08] | [−0.003, −0.001] | [1.56, 2.80] |
Note: The p-value is obtained by performing the paired-samples t-test on the results in Table 8 against the testing results during exhalation in Table 3. A p-value < 0.05 means that their difference is statistically significant; otherwise not.
Abbreviation: 95% CI, 95% confidence interval. Other abbreviations are the same as those in Table 3.
a The 95% CI means that the two populations' average difference falls in this interval with 95% certainty.
4.4.2. Effect discussion of deformable drift
The deformable drift indicates the intra-fraction target drift in which the patient relaxes and gradually deviates from the original position. To assess our model's performance against this drift, we chose cases No. 45 and No. 47 as the study objects. The two cases were acquired from one patient but on different dates, and their image difference at the 50% phase is regarded as the deformable drift. By inputting the No. 45 case's 50% phase images into our model, we estimated the deformation corresponding to the No. 47 case. In this scenario, the model's performance reflects its robustness against the deformable drift.
Figure 13 displays the result. The red arrows in Figure 13a indicate that our model estimates the diaphragm motion well. Figure 13b,c shows our model's robustness on the rib motion, although the effect is slight. The yellow arrows in Figure 13a show that our model fails on the structures inside the lung. Overall, Figure 13 demonstrates our model's potential against the deformable drift; however, further performance improvement on the intrapulmonary structures and the ribs is needed.
FIGURE 13.
Illustration of our proposed model's performance against deformable drift. (a) The red arrows indicate our model's performance on diaphragm motion; the yellow arrows indicate that our model fails on the structures inside the lung. (b–c) The white arrows indicate our model's robustness on the rib motion, although the effect is slight.
4.5. Further study on DIR‐Tissue
In this work, DIR-Tissue determines the upper bound of our model's accuracy. In this section, we further study its uncertainty and residual error, and compare it with other DIR approaches.
4.5.1. Further study on the uncertainty and residual error
We exhibit the three DIR-Tissue registration results with the worst MAEs in Figure 14. The registration errors are mainly located on the lung boundary and the structures inside the lung. In future work, adding the VGG loss 40 term to the current loss function of DIR-Tissue could potentially improve the registration performance, since the VGG loss benefits high texture details.
FIGURE 14.
DIR‐Tissue registration results on test sets with the worst MAE* results. (a–c) correspond to three cases. *MAE, mean absolute error; DIR, deformable image registration.
4.5.2. Comparison with other DIR approaches
We compared our DIR-Tissue with VoxelMorph, 30 the pyramidal Lucas-Kanade (Pyr-LK) optical flow algorithm, 50 and the optical flow method proposed by Farneback (Farneback-OP). 51 The VoxelMorph was trained using the same data and the same training strategy, and the three comparative approaches were evaluated on the same data as described in Section 2. The results are listed in Table 9.
TABLE 9.
Deformable image registration (DIR) results comparison. The results are expressed as average ± standard deviation (p‐value from paired samples t‐test, 95% CI).
|  | mRMSE (HU) | mSSIM | mMAE (HU) | Inference time (ms) | Number of parameters |
| --- | --- | --- | --- | --- | --- |
| DIR-Tissue | 45.71 ± 6.93 (1.67 × 10^−21, [−8.84, −6.51]) | 0.995 ± 0.002 (6.67 × 10^−18, [0.0015, 0.0021]) | 22.82 ± 5.16 (2.95 × 10^−29, [−3.30, −2.63]) | 51.58 | 2.80 × 10^5 |
| VoxelMorph | **38.03 ± 6.53** | **0.996 ± 0.001** | **19.86 ± 5.16** | 55.11 | 3.97 × 10^5 |
| Pyr-LK | 56.36 ± 10.25 (2.86 × 10^−27, [−20.54, −16.11]) | 0.992 ± 0.003 (1.34 × 10^−20, [0.0040, 0.0054]) | 25.01 ± 5.37 (6.05 × 10^−33, [−5.66, −4.64]) | 44539.74 | 1.77 × 10^6 |
| Farneback-OP | 54.90 ± 10.90 (1.08 × 10^−23, [−19.21, −14.53]) | 0.992 ± 0.004 (7.84 × 10^−17, [0.0035, 0.0051]) | 24.48 ± 5.30 (8.26 × 10^−29, [−5.15, −4.10]) | 9287.00 | 9.3 × 10^3 |
Note: The best results are printed in bold.
Abbreviation: 95% CI, 95% confidence interval; DIR‐Tissue, DIR model for internal tissue; VoxelMorph, a deep‐learning‐based DIR model; Pyr‐LK, the pyramidal Lucas‐Kanade optical flow algorithm; Farneback‐OP, the optical flow method proposed by Farneback. Other abbreviations are the same as those in Table 3.
In Table 9, our DIR-Tissue is the second best; VoxelMorph shows the best result. The 95% confidence intervals comparing our DIR-Tissue with VoxelMorph are [−8.84, −6.51] HU for mRMSE, [0.0015, 0.0021] for mSSIM, and [−3.30, −2.63] HU for mMAE. Although the difference is small, the U-net-like model in VoxelMorph suggests that skip connections (i.e., the defining feature of the U-net architecture) and a deeper network 10 could be adopted in our future work to further decrease the registration errors. In that scenario, a well-designed BDFs-TDFs-Net correlating the surface motion and the internal tissue deformation would be needed, because the skip connections introduce more features into the decoding layers; this complicates the external-internal relationship and necessitates a cost-effectiveness analysis. The focus of our work is to estimate the time-varying internal deformation from the external surface motion alone.
4.5.3. Further discussion on DIR‐Tissue and VoxelMorph
The U-net-like model in VoxelMorph is widely used. The reason our DIR-Tissue does not adopt such a structure lies in our hypothesis.
The focus of our work is to use only the external surface motion to estimate the time-varying internal tissue deformation. In our hypothesis, the internal tissue and the surface are two sub-systems of the respiratory dynamics system. Both are driven by the same source, namely breathing; thus, their motions should be related to each other.
Based on this hypothesis, we consider the two sub-systems' high-level deformation features appropriate for mapping to each other: in deep learning's hierarchical learning manner, the high-level features provide a more powerful and abstract representation of each sub-system's whole deformation, so it is easier to establish the connection between the two sub-systems' high-level features.
The low-level features, for example, corners and edges, are frequently unique to the input CT images. It would be difficult to infer such low-level features from the body, since they are strongly correlated with the input CT images. However, in the decoder of a U-net-like model, the skip connections force the low-level features to be concatenated to the high-level ones to output a DVF. That is why we do not use a U-net-like model in our DIR-Tissue.
Given that the U-net-like model achieves a better registration result, as shown in Table 9, we will conduct more studies on estimating these low-level features in future work. In this work, we aim to explore the feasibility of relating internal tissue deformation to external surface motion.
5. CONCLUSION
The proposed CEM is effective in estimating the respiration-induced thoracic tissue deformation from surface motion. It needs no patient-specific breathing data sampling and no additional training before application. It shows promise for clinical practice and potential for broad application in the field of respiration-related correction techniques.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
SUPPORTING INFORMATION
Supplementary Materials S1–S6, referenced throughout the text, are available in the online version of this article.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (82001928) and General Project of Zhejiang Medical and Health Plan (2024KY843).
APPENDIX A.
List of abbreviations
- 4D‐CT
four-dimensional computed tomography
- 95% CI
95% confidence interval
- ART
adaptive radiotherapy
- BDFs
body deformation features
- BDFs i
BDFs at ith respiration phase
- BDFsref
BDFs at reference respiration phase
- Body
volumetric binary image of body
- Body i
Body at ith respiration phase
- Bodyref
Body at reference respiration phase
- CEM
a non‐patient‐specific cascaded ensemble model
- Conv3d
conventional three-dimensional convolution
- CT
computed tomography
- CT i
CT images at ith respiration phase
- CTref
CT images at reference respiration phase
- DIR
deformable image registration
- DIR‐Body
DIR model for body
- DIR‐Tissue
DIR model for internal tissue
- DRConv
dynamic region‐aware convolution
- DVF
deformation vector field
- EE
end‐exhalation
- EI
end‐inhalation
- Farneback‐OP
the optical flow method proposed by Farneback
- GT
ground truth
- mMAE
average mean absolute error
- modDRConv3d
modified DRConv in three dimensions
- mRMSE
average root mean square error
- mSSIM
average structural similarity index measure
- Pyr‐LK
the pyramidal Lucas‐Kanade optical flow algorithm
- RCCT
respiration‐correlated computed tomography
- TDFs
tissue deformation features
- TDFs i
TDFs at ith respiration phase
- TDFsref
TDFs at reference respiration phase
APPENDIX B.
B.1.
Details of the modDRConv3d in our work
For simplicity, Figure B1 shows the modDRConv in two dimensions. As shown in Figure B1(a), the modDRConv has two input variables (i.e., the guided input and X) and one output variable (i.e., Y). During calculation, X is divided into m regions based on the guided input, and each region is assigned an individual, randomly-initialized filter W i. We convolve the different regions of X with their corresponding filters to obtain Y.
FIGURE B1.
Illustration of modDRConv with kernel size k×k and region number m. A guided feature is obtained from the guided input with a standard k×k convolution, and m randomly-initialized filters are generated. The spatial dimension is divided into m regions as the guided mask shows; every region has an individual filter W i shared within that region, and the k×k convolution is executed with the corresponding filter in each region of X to output Y.
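A minimal 2D PyTorch sketch of this routing scheme follows; the guide-branch design, the initialization scale, and the hard argmax routing (the original DRConv uses a softmax relaxation during training so that the guide branch receives gradients) are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModDRConv2d(nn.Module):
    """Sketch of the modified dynamic region-aware convolution (2D, as in Figure B1).

    A guide branch predicts m region logits per pixel; each pixel receives the
    output of the randomly-initialized filter assigned to its argmax region.
    Unlike the original DRConv, the m filters are plain learnable parameters
    rather than being generated from the input.
    """

    def __init__(self, in_ch: int, out_ch: int, guide_ch: int, k: int = 3, m: int = 4):
        super().__init__()
        self.guide = nn.Conv2d(guide_ch, m, kernel_size=k, padding=k // 2)
        self.filters = nn.Parameter(torch.randn(m, out_ch, in_ch, k, k) * 0.02)
        self.k, self.m = k, m

    def forward(self, x: torch.Tensor, guided_input: torch.Tensor) -> torch.Tensor:
        logits = self.guide(guided_input)              # (B, m, H, W)
        regions = F.one_hot(logits.argmax(1), self.m)  # (B, H, W, m) guided mask
        regions = regions.permute(0, 3, 1, 2).float()  # (B, m, H, W)
        out = 0.0
        for r in range(self.m):                        # one convolution per region filter
            y_r = F.conv2d(x, self.filters[r], padding=self.k // 2)
            out = out + y_r * regions[:, r:r + 1]      # keep y_r only inside region r
        return out

# Usage: x and the guided input share the spatial size but may differ in channels.
layer = ModDRConv2d(in_ch=8, out_ch=8, guide_ch=16, k=3, m=4)
y = layer(torch.randn(1, 8, 32, 32), torch.randn(1, 16, 32, 32))  # (1, 8, 32, 32)
```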
B.2.
The hypothesis of employing the modDRConv3d in our work
Based on the hypothesis that various anatomic sites show different deformation correlations between body and internal tissues, we employed the modDRConv3d in BDFs‐TDFs‐Net.
As shown in Figure B2, the modDRConv3d's guided input is the concatenated features of normBDFsref and normTDFsref. From it, we generate a guided mask to classify the various anatomic sites into different categories. Based on these categories, we assign different convolution filters to approximate the different body‐tissue deformation correlations.
FIGURE B2.
The analogy between (a) the symbolic input and output variables of modDRConv and (b) the corresponding ones in BDFs-TDFs-Net. The dashed lines indicate the counterparts.* *The denotations in (a) are the same as those in Figure B1(a); the abbreviations and legends in (b) are the same as those in Figure 5. (b) is part of the BDFs-TDFs-Net structure in Figure 5(a).
APPENDIX C.
Calculation of the spatial transform φ
In our work, $m_w = \varphi(m, \phi)$, where $m_w$ and $m$ represent the warped and the reference images, respectively, and $\phi$ denotes the registration field that maps the coordinates of $m_w$ to the coordinates of $m$.

Let $m$ and $m_w$ be three-dimensional (3D) image matrices, where $m(i, j, k)$ represents the voxel value at the 3D coordinate $(i, j, k)$, and likewise for $m_w(i, j, k)$. For $\phi$, let $u(i, j, k)$, $v(i, j, k)$, and $w(i, j, k)$ denote the coordinate displacements along the three orthogonal axes.

Finally, $m_w(i, j, k) = m\big(i + u(i, j, k),\ j + v(i, j, k),\ k + w(i, j, k)\big)$. For the voxels whose sampling coordinates are non-integer and thus cannot be assigned directly through this formula, we use bilinear interpolation 52 to estimate their values.
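A PyTorch sketch of φ under these definitions; the normalized-coordinate convention and align_corners=True follow grid_sample's API (whose "bilinear" mode performs trilinear interpolation for 3D inputs), which is an implementation assumption on our part.

```python
import torch
import torch.nn.functional as F

def spatial_transform(m_ref: torch.Tensor, dvf: torch.Tensor) -> torch.Tensor:
    """Warp m_ref (B, 1, D, H, W) with voxel displacements dvf (B, 3, D, H, W).

    Implements m_w(i, j, k) = m(i + u, j + v, k + w), with interpolation for
    non-integer sampling coordinates.
    """
    b, _, d, h, w = m_ref.shape
    zz, yy, xx = torch.meshgrid(torch.arange(d), torch.arange(h),
                                torch.arange(w), indexing="ij")
    identity = torch.stack((zz, yy, xx)).float()   # (3, D, H, W) voxel coordinates
    coords = identity.unsqueeze(0) + dvf           # ϕ: sampling coordinates in m_ref
    # Normalize to [-1, 1] and reorder to (x, y, z) as grid_sample expects.
    sizes = torch.tensor([d - 1, h - 1, w - 1], dtype=torch.float32).view(1, 3, 1, 1, 1)
    grid = (2.0 * coords / sizes - 1.0).flip(1).permute(0, 2, 3, 4, 1)
    return F.grid_sample(m_ref, grid, mode="bilinear", align_corners=True)
```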
APPENDIX D.
The lung-region error metric

This metric evaluates the model error in the lung region showing volume variations (see Section 2.3.7). It is calculated as shown in Figure D1.

FIGURE D1.

The calculation of the lung-region error metric.
Zhang J, Bai X, Shan G. Deep learning‐based estimation of respiration‐induced deformation from surface motion: A proof‐of‐concept study on 4D thoracic image synthesis. Med Phys. 2025;52:e17804. 10.1002/mp.17804
Footnotes
1. All abbreviations are listed in Appendix A.
2. The modDRConv3d is detailed in Appendix B.
3. Among the 62 4D-CT sets in the private database, 58 sets were ten-phase 4D-CTs, each including 10 sets of 3D CT; two sets were four-phase 4D-CTs including four sets of 3D CT; one set was a five-phase 4D-CT including five sets of 3D CT; and one set was a nine-phase 4D-CT including nine sets of 3D CT.
4. Because of the stability of the end-exhalation (EE) phase, 35, 36 the EE phase is chosen as the reference in most clinical practices, such as respiratory-gated radiotherapy 37 and measurement of tumor drift. 38 To align with this, this work also chooses the EE phase as the reference.
5. The spatial transform φ is explicitly defined in Appendix C.
6. The VGG loss code is open-source at https://github.com/crowsonkb/vgg_loss/blob/master/vgg_loss.py
7. Figure 7 shows three randomly chosen cases, namely cases No. 1, No. 2, and No. 43 in Figure 6. The five best and five worst cases are shown in the format of Figure 7 and discussed in the Supplementary Material S5.
8. The patient cohort descriptors are summarized in the Supplementary Material S6.
9. The data used to train all models involve both inhalation and exhalation data.
10. Our DIR-Tissue shares the same unsupervised deep learning framework for deformable image registration with VoxelMorph; their difference principally lies in the model structure. Specifically, (a) VoxelMorph adopts a U-net-like model containing skip connections, whereas our DIR-Tissue does not, and (b) the U-net-like model has two more layers in its decoding path than our DIR-Tissue.
REFERENCES
1. Vinogradskiy Y, Castillo R, Castillo E, et al. Results of a multi-institutional phase 2 clinical trial for 4DCT-ventilation functional avoidance thoracic radiation therapy. Int J Radiat Oncol Biol Phys. 2022;112(4):986-995.
2. Socha J, Rygielska A, Uziębło-Życzkowska B, et al. Contouring cardiac substructures on average intensity projection 4D-CT for lung cancer radiotherapy: A proposal of a heart valve contouring atlas. Radiother Oncol. 2022;167:261-268.
3. Rabe M, Thieke C, Düsberg M, et al. Real-time 4DMRI-based internal target volume definition for moving lung tumors. Med Phys. 2020;47(4):1431-1442.
4. Trémolières P, Gonzalez-Moya A, Paumier A, et al. Lung stereotactic body radiation therapy: personalized PTV margins according to tumor location and number of four-dimensional CT scans. Radiat Oncol. 2022;17:1-9.
5. Szkitsak J, Werner R, Fernolendt S, et al. First clinical evaluation of breathing controlled four-dimensional computed tomography imaging. Phys Imaging Radiat Oncol. 2021;20:56-61.
6. Keall PJ, Mageras GS, Balter JM, et al. The management of respiratory motion in radiation oncology report of AAPM Task Group 76. Med Phys. 2006;33(10):3874-3900.
7. Werner R, Hofmann C, Mücke E, Gauer T. Reduction of breathing irregularity-related motion artifacts in low-pitch spiral 4D CT by optimized projection binning. Radiat Oncol. 2017;12:1-8.
8. Pan T, Lee TY, Rietzel E, Chen GT. 4D-CT imaging of a volume influenced by respiratory motion on multi-slice CT. Med Phys. 2004;31(2):333-340.
9. Cao Y-H, Jaouen V, Bourbonne V, et al. Patient-specific 4DCT respiratory motion synthesis using tumor-aware GANs. Paper presented at: IEEE Nuclear Science Symposium and Medical Imaging Conference; 2022.
10. Lafrenière M, Mahadeo N, Lewis J, Rottmann J, Williams CL. Continuous generation of volumetric images during stereotactic body radiation therapy using periodic kV imaging and an external respiratory surrogate. Physica Med. 2019;63:25-34.
11. Feng L, Tyagi N, Otazo R. MRSIGMA: Magnetic Resonance SIGnature MAtching for real-time volumetric imaging. Magn Reson Med. 2020;84(3):1280-1292.
12. Shao HC, Li Y, Wang J, Jiang S, Zhang Y. Real-time liver motion estimation via deep learning-based angle-agnostic X-ray imaging. Med Phys. 2023;50(11):6649-6662.
13. Zhou Z, Jiang S, Yang Z, Zhou N, Ma S, Li Y. A high-dimensional respiratory motion modeling method based on machine learning. Expert Syst Appl. 2023;242:122757.
14. Freislederer P, Kügele M, Öllers M, et al. Recent advances in surface guided radiation therapy. Radiat Oncol. 2020;15(1):1-11.
15. Wang T, He T, Zhang Z, et al. A personalized image-guided intervention system for peripheral lung cancer on patient-specific respiratory motion model. Int J Comput Assist Radiol Surg. 2022;17(10):1751-1764.
16. Zhang J, Huang X, Shen Y, Chen Y, Cai J, Ge Y. Nearest neighbor method to estimate internal target for real-time tumor tracking. Technol Cancer Res Treat. 2018;17:1533033818786597.
17. Fakhraei S, Ehler E, Sterling D, Cho L, Alaei P. A patient-specific correspondence model to track tumor location in thorax during radiation therapy. Int J Radiat Oncol Biol Phys. 2021;111(3):e531.
18. Shao H, Mengke T, Chen H, Wang J, Zhang Y. Deep learning-driven real-time liver tumor localization via optical surface imaging and biomechanical modeling. Int J Radiat Oncol Biol Phys. 2022;114(3):S34.
19. Mann P, Witte M, Mercea P, Nill S, Lang C, Karger C. Feasibility of markerless fluoroscopic real-time tumor detection for adaptive radiotherapy: development and end-to-end testing. Phys Med Biol. 2020;65(11):115002.
20. Korreman S. Image-guided radiotherapy and motion management in lung cancer. Br J Radiol. 2015;88(1051):20150100.
21. Siochi RA, Kim Y, Bhatia S. Tumor control probability reduction in gated radiotherapy of non-small cell lung cancers: A feasibility study. J Appl Clin Med Phys. 2015;16(1):8-21.
22. Liang Z, Zhang M, Shi C, Huang ZR. Real-time respiratory motion prediction using photonic reservoir computing. Sci Rep. 2023;13(1):5718.
23. Wang Y, Yu Z, Sivanagaraja T, Veluvolu KC. Fast and accurate online sequential learning of respiratory motion with random convolution nodes for radiotherapy applications. Appl Soft Comput. 2020;95:106528.
24. Li G. Advances and potentials of optical surface imaging in radiotherapy. Phys Med Biol. 2022;67(16):10.
25. Ladjal H, Beuve M, Giraud P, Shariat B. Towards non-invasive lung tumor tracking based on patient specific model of respiratory system. IEEE Trans Biomed Eng. 2021;68(9):2730-2740.
26. Shao H-C, Li Y, Wang J, Jiang S, Zhang Y. Real-time liver tumor localization via combined surface imaging and a single x-ray projection. Phys Med Biol. 2023;68(6):065002.
27. Huang Y, Dong Z, Wu H, Li C, Liu H, Zhang Y. Deep learning-based synthetization of real-time in-treatment 4D images using surface motion and pretreatment images: A proof-of-concept study. Med Phys. 2022;49(11):7016-7024.
28. Wikström KA, Isacsson UM, Nilsson KM, Ahnesjö A. Evaluation of four surface surrogates for modeling lung tumor positions over several fractions in radiotherapy. J Appl Clin Med Phys. 2021;22(9):103-112.
29. Zhang J, Wang Y, Bai X, Chen M. Extracting lung contour deformation features with deep learning for internal target motion tracking: a preliminary study. Phys Med Biol. 2023;68(19):195009.
30. Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV. VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans Med Imaging. 2019;38(8):1788-1800.
31. Chen J, Wang X, Guo Z, Zhang X, Sun J. Dynamic region-aware convolution. Paper presented at: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021.
32. Hugo GD, Weiss E, Sleeman WC, et al. Data from 4D Lung Imaging of NSCLC Patients (Version 2). The Cancer Imaging Archive; 2016. 10.7937/K9/TCIA.2016.ELN8YGLE
33. Balik S, Weiss E, Jan N, et al. Evaluation of 4-dimensional computed tomography to 4-dimensional cone-beam computed tomography deformable image registration for lung cancer adaptive radiation therapy. Int J Radiat Oncol Biol Phys. 2013;86(2):372-379.
34. Moore FR, Langdon GG. A generalized firing squad problem. Inf Control. 1968;12(3):212-220.
35. George R, Chung TD, Vedam SS, et al. Audio-visual biofeedback for respiratory-gated radiotherapy: impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy. Int J Radiat Oncol Biol Phys. 2006;65(3):924-933.
36. Wu W, Chan C, Wong Y, Cuijpers J. A study on the influence of breathing phases in intensity-modulated radiotherapy of lung tumours using four-dimensional CT. Br J Radiol. 2010;83(987):252-256.
37. Farzaneh MJK, Momennezhad M, Naseri S. Gated radiotherapy development and its expansion. J Biomed Phys Eng. 2021;11(2):239.
38. Kamima T, Iino M, Sakai R, et al. Evaluation of the four-dimensional motion of lung tumors during end-exhalation breath-hold conditions using volumetric cine computed tomography images. Radiother Oncol. 2023;182:109573.
39. Kingma DP, Ba J. Adam: A method for stochastic optimization. Paper presented at: 3rd International Conference on Learning Representations (ICLR 2015); May 7–9, 2015; San Diego, CA, USA.
40. Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network. Paper presented at: IEEE Conference on Computer Vision and Pattern Recognition; July 21–26, 2017; Honolulu, HI, USA.
41. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Paper presented at: 3rd International Conference on Learning Representations (ICLR 2015); May 7–9, 2015; San Diego, CA, USA.
42. Sang Y, Ruan D. A conditional registration network for continuous 4D respiratory motion synthesis. Med Phys. 2023;50(7):4379-4387.
43. Marstal K, Berendsen F, Staring M, Klein S. SimpleElastix: A user-friendly, multi-lingual library for medical image registration. Paper presented at: IEEE Conference on Computer Vision and Pattern Recognition Workshops; June 27–30, 2016; Las Vegas, NV, USA.
44. Kostiukhina N, Palmans H, Stock M, Knopf A, Georg D, Knaeusl B. Time-resolved dosimetry for validation of 4D dose calculation in PBS proton therapy. Phys Med Biol. 2020;65(12):125015.
45. Meijers A, Jakobi A, Stützer K, et al. Log file-based dose reconstruction and accumulation for 4D adaptive pencil beam scanned proton therapy in a clinical treatment planning system: implementation and proof-of-concept. Med Phys. 2019;46(3):1140-1149.
46. Inui S, Nishio T, Ueda Y, et al. Machine log file-based dose verification using novel iterative CBCT reconstruction algorithm in commercial software during volumetric modulated arc therapy for prostate cancer patients. Physica Med. 2021;92:24-31.
47. Shintani T, Nakamura M, Matsuo Y, et al. Investigation of 4D dose in volumetric modulated arc therapy-based stereotactic body radiation therapy: does fractional dose or number of arcs matter? J Radiat Res (Tokyo). 2020;61(2):325-334.
48. Timmerman R. A story of hypofractionation and the table on the wall. Int J Radiat Oncol Biol Phys. 2022;112(1):4-21.
49. Brock KK. Adaptive radiotherapy: moving into the future. Semin Radiat Oncol. 2019;29(3):181-184.
50. Bouguet J-Y. Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm. Intel Corporation. 2001;5(1-10):4.
51. Farnebäck G. Two-frame motion estimation based on polynomial expansion. Paper presented at: Image Analysis: 13th Scandinavian Conference (SCIA 2003); June 29–July 2, 2003; Halmstad, Sweden.
52. Hawkes PW. Advances in Imaging and Electron Physics. Elsevier; 2004.