Abstract
Objective.
Synthesize realistic and controllable respiratory motions in the extended cardiac-torso (XCAT) phantoms by developing a generative adversarial network (GAN)-based deep learning technique.
Methods.
A motion generation model was developed based on bicycle-GAN with a novel 4D generator. Taking the end-of-inhale (EOI) phase image and a Gaussian perturbation as input, the model generates inter-phase deformation vector fields (DVFs), which are composed and applied to the input to generate 4D images. The model was trained and validated using 71 4D-CT images from lung cancer patients and then applied to the XCAT EOI images to generate 4D-XCAT with realistic respiratory motions. A separate respiratory motion amplitude control model was built using decision tree regression to predict the input perturbation needed for a specific motion amplitude; this model was developed using 300 4D-XCAT generated from 6 XCAT phantom sizes with 50 different perturbations for each size. In both the patient and phantom studies, Dice coefficients for the lungs and lung volume variation during respiration were compared between the simulated and reference images. The generated DVFs were evaluated by deformation energy. DVFs and ventilation maps of the simulated 4D-CTs were compared with those of the reference 4D-CTs using cross correlation and Spearman’s correlation. DVFs and ventilation maps were also compared among the original 4D-XCAT, the generated 4D-XCAT, and reference patient 4D-CTs to show the improvement of motion realism by the model. Finally, the amplitude control error was calculated.
Results.
Comparing the simulated and reference 4D-CTs, the maximum deviation of lung volume during respiration was 5.8%, and the Dice coefficient reached at least 0.95 for the lungs. The generated DVFs presented comparable deformation energy levels. The cross correlation of DVFs achieved 0.89 ± 0.10/0.86 ± 0.12/0.95 ± 0.04 along the x/y/z directions in the testing group. The cross correlation of the derived ventilation maps achieved 0.80 ± 0.05/0.67 ± 0.09/0.68 ± 0.13, and the Spearman’s correlation achieved 0.70 ± 0.05/0.60 ± 0.09/0.53 ± 0.11, respectively, in the training/validation/testing groups. The generated 4D-XCAT phantoms presented similar deformation energy to the patient data while maintaining the lung volumes of the original XCAT phantom (Dice = 0.95, maximum lung volume variation = 4%). The motion amplitude control model kept the amplitude control error below 0.5 mm.
Conclusions.
The results demonstrated the feasibility of synthesizing realistic controllable respiratory motion in the XCAT phantom using the proposed method. This crucial development enhances the value of XCAT phantoms for various 4D imaging and therapy studies.
Keywords: XCAT phantom, generative adversarial network, deformation vector field, virtual trial
1. Introduction
Anthropomorphic phantoms have been used as an essential tool for testing, evaluating, and comparing radiotherapy and medical imaging techniques. Since a new technique typically involves several selectable parameters, it is not feasible to evaluate every configuration on real patients under clinical conditions, especially for techniques that deliver extra radiation dose to patients. Meanwhile, the use of physical phantoms is usually limited by the inconvenience and expense of fabricating physical phantoms with a realistic range of patient sizes and variations. Therefore, digital phantoms have been more widely used for diverse imaging technique assessment (Ren et al 2014, Bosca and Jackson 2016) and virtual clinical trials (Bakic et al 2018, Barufaldi et al 2018). In addition to the advantages of convenience and low cost, digital phantoms provide a gold standard for the quantitative evaluation of imaging techniques and devices, since the phantoms’ exact anatomy and physiological functions are known.
Simulating respiratory motions in digital phantoms is crucial for their application in radiation therapy, such as evaluating 4D imaging and treatment planning techniques for thoracic and abdominal tumors that are affected by breathing. The 4D extended cardiac-torso (XCAT) phantom (Segars et al 2008, 2010) is a widely used phantom developed by William P. Segars at Duke University that can simulate cardiac and respiratory motions under diverse human anatomies (e.g. genders, heights, tumor sizes and locations). Using non-uniform rational b-spline (NURBS) surfaces to construct organ shapes, the 4D-XCAT provides a realistic and flexible anthropomorphic model based on multi-contrast high-resolution imaging data, including CT, MR, and PET. Thus, the 4D-XCAT phantoms have gained broad applications in biomedical imaging and therapy research, including medical imaging technique evaluation (Chen et al 2018, Lafata et al 2018, Pham et al 2019), radiation therapy technique evaluation (Ren et al 2014, Zhang et al 2017, 2018), radiation dosimetry (Xie and Zaidi 2014), and respiratory motion studies (Segars et al 2018).
One of the main limitations of the 4D-XCAT phantom is its simplified breathing motion pattern. In general, the breathing motion in the 4D-XCAT phantom approximately presents a linearly increasing pattern from the lung apex to the diaphragm along both superior–inferior (SI) and anterior–posterior (AP) directions, which is far from realistic compared to real patients. Therefore, the current phantom is limited for research that involves lung function, motion management, 4D imaging or treatment optimization, etc. Simulating realistic respiratory motion patterns in the 4D-XCAT phantom is essential for broadening its applications.
To the best of our knowledge, no studies have adequately addressed this limitation. With the rapid development of artificial intelligence techniques in medical imaging, we see their potential to solve this limitation of the 4D-XCAT phantom. In particular, generative adversarial network (GAN)-based algorithms have shown promising applications in image synthesis and image translation (Liu et al 2019). In the GAN architecture, one neural network, called the generator, generates synthetic images from random Gaussian perturbations, while another neural network, called the discriminator, evaluates the synthetic images for authenticity. Synthetic images gradually become indistinguishable from real images through an adversarial training process. Based on the vanilla GAN approach, numerous GAN-based structures have been developed for different applications. For example, the bicycle-GAN algorithm was developed to synthesize a distribution of possible outputs given different input perturbations (Zhu et al 2017). Such a method offers great potential for the task of synthesizing 4D-XCAT phantoms with various breathing amplitudes.
In this study, we developed and trained a 4D bicycle-GAN to synthesize realistic respiratory motions in the 4D-XCAT. Beyond that, we built a machine learning model to control the respiratory motion amplitude generated by the 4D bicycle-GAN model. The generated 4D-XCAT images were evaluated using both the deformation fields and ventilation maps to assess the realism of the synthesized respiratory motions. The accuracy of the respiratory motion amplitude control algorithm was evaluated by comparing the generated amplitude with the predefined amplitude. Different control algorithms, including linear regression, multilayer perceptron (MLP), and decision tree regression, were compared to investigate their efficacy. Results demonstrated the feasibility of using deep learning models to generate realistic and controllable respiratory motions in 4D-XCAT phantoms, which significantly enhances the value of the phantom for radiation therapy studies.
2. Methods and materials
2.1. Overall workflow
A schematic illustration of the overall workflow is depicted in figure 1. First, the 4D bicycle-GAN motion generation model is trained and validated using 4D-CT images from real patients. The phase one image of a 4D-CT, which is also the end-of-inhalation (EOI) phase, and a Gaussian perturbation are input to the model. The model generates the inter-phase deformation vector fields (DVFs), which are then composed to form input-to-phase DVFs. The input-to-phase DVFs are applied to the input EOI phase image to generate the following phase images. The loss function is calculated in both the DVF and the image domains. Then, the 4D bicycle-GAN model is tested with the EOI phase image of the XCAT phantom to generate realistic respiratory motions in XCAT phantoms. Various Gaussian perturbations are introduced to generate 4D-XCAT phantoms with different motion amplitudes. After the establishment of the motion generation model, a respiratory motion amplitude control model is trained to learn the relationship between the input Gaussian perturbation and the generated motion amplitudes, so that the phantom can generate user-defined breathing amplitudes.
Figure 1.

The overall workflow of the respiratory motion generation and amplitude control.
2.2. Motion generation model
As mentioned above, the bicycle-GAN algorithm (Zhu et al 2017) has the advantage of synthesizing a distribution of possible outputs. However, applying it to this breathing motion generation task poses some challenges. First, while the original model generates a single image from one input, we would like to synthesize multiple, temporally coherent phase images from a single-phase input; in other words, the problem elevates from 2D-to-2D image translation to 3D-to-4D image translation. Second, while the original bicycle-GAN model does not control its specific output, we would like to control the respiratory motion amplitudes of the output 4D-XCAT images. Therefore, we made certain modifications to the vanilla bicycle-GAN to address these challenges.
The model architecture is shown in figure 2. The motion generation model is developed based on the bicycle-GAN algorithm and contains two main parts: (1) a conditional variational encoder-GAN (cVAE-GAN) and (2) a conditional latent regressor-GAN (cLR-GAN). The cVAE-GAN encodes the reference breathing phases into a latent perturbation, giving the generator a noisy encoding of the desired output. Using this encoding along with the input EOI image, the generator synthesizes images that simulate the other respiratory phases. To ensure that Gaussian perturbations can induce the desired outputs during testing, a KL-divergence loss regularizes the latent perturbation to be close to a standard Gaussian distribution. For the cLR-GAN, a perturbation randomly drawn from the standard Gaussian distribution is first provided to the generator. The same encoder used in the cVAE-GAN then attempts to recover the perturbation from the output images. This part enhances the encoder’s ability to encode the reference images into a Gaussian latent space.
Figure 2.

An illustration of the motion generation model structure.
Different from the conventional bicycle-GAN, which directly generates images of the same domain as the input image, the generator is modified to output a sequence of DVFs, as shown in figure 3. The design of the network is described below.
Figure 3.

An illustration of the 4D generator structure.
The generator consists of several generation blocks. In the first generation block, an eight-digit Gaussian perturbation (dimension: 1 × 8) is broadcast to the size of the input image by repeating each digit. The broadcast Gaussian perturbation is then concatenated with the input image and used as the input to the first generation block. Each generation block outputs an inter-phase DVF (DVFt). The DVFt is applied to the image at phase t (It) to obtain the image at phase t + 1 (It+1) using linear interpolation (Jiang et al 2020), which is then used as the input of the next generation block. During model validation and testing, the inter-phase DVFs are composed to form input-to-phase DVFs. The composed input-to-phase DVFs are applied to the input EOI phase image (I1) to obtain the image at each phase using the same linear interpolation method. This step solves the blurring issue caused by repeated interpolations when sequentially generating phase images with inter-phase DVFs.
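The warping and composition steps above can be sketched compactly. The following is a minimal NumPy/SciPy illustration under our own assumptions (DVF stored as a `(3, D, H, W)` array of voxel displacements, resampling via `map_coordinates`), not the authors’ implementation:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(image, dvf):
    """Warp a 3D image with a DVF using linear interpolation (order=1).

    dvf has shape (3, D, H, W): per-voxel displacement (in voxels) along
    each axis, so output(x) = image(x + dvf(x))."""
    coords = np.indices(image.shape).astype(float) + dvf
    return map_coordinates(image, coords, order=1, mode="nearest")

def compose(dvf_1_to_t, dvf_t_to_t1):
    """Compose two DVFs into an input-to-phase DVF:
    phi_{1->t+1}(x) = phi_{1->t}(x) + phi_{t->t+1}(x + phi_{1->t}(x))."""
    coords = np.indices(dvf_1_to_t.shape[1:]).astype(float) + dvf_1_to_t
    resampled = np.stack([map_coordinates(c, coords, order=1, mode="nearest")
                          for c in dvf_t_to_t1])
    return dvf_1_to_t + resampled
```

Applying the composed input-to-phase DVF once to I1, rather than warping through each intermediate phase, is what avoids the cumulative interpolation blurring described above.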
Each generation block is built on the U-net structure, which consists of five downsampling blocks, four upsampling blocks, three concatenated layers, and one output layer. Each upsampling/downsampling block utilizes a 3D convolutional/deconvolutional layer with a 4 × 4 kernel and a stride of two for padded convolution, followed by batch normalization and leaky-ReLU (rectified linear unit) activation, except for the first downsampling block, which omits the batch normalization. The number of feature channels increases at each downsampling layer and decreases correspondingly in the upsampling layers to allow concatenation. The output layer utilizes a 3D deconvolution with a 1 × 1 kernel and a stride of one for padded convolution.
The encoder consists of two 3D convolutional layers using a 4 × 4 kernel and a stride of two and three residual blocks, followed by a leaky-ReLU activation and a 3D average pooling layer.
The two discriminators, one in the cVAE-GAN and one in the cLR-GAN, utilize the patchGAN structure, which consists of five convolutional blocks. Each block is comprised of an unpadded convolutional layer using a 4 × 4 kernel and a stride of two followed by batch normalization and leaky-ReLU activation, except for the final block where the stride is adjusted to 1, and sigmoid activation is applied.
The loss function is modified from the original bicycle-GAN. In the cVAE-GAN part, the modified loss function includes: (1) the discriminator loss to distinguish the synthetic and reference 4D-CT phase images, which utilizes cross-entropy; (2) the KL-divergence loss to push the latent perturbation towards the standard Gaussian distribution; (3) the image reconstruction loss, computed as the structural similarity (SSIM) between the simulated and reference phase images; and (4) the unsupervised DVF gradient smoothness loss (Jiang et al 2020) of the inter-phase DVFs to guarantee image continuity between every two phases. The cVAE-GAN loss function is calculated as shown in equation (1)
$\mathcal{L}_{\text{cVAE-GAN}} = \mathcal{L}_{\text{GAN}}^{\text{VAE}} + \lambda_{\text{KL}}\,\mathcal{L}_{\text{KL}} + \lambda_{\text{SSIM}}\,\mathcal{L}_{\text{SSIM}} + \lambda_{\text{smooth}}\,\mathcal{L}_{\text{smooth}}$  (1)
In the cLR-GAN part, the loss function is the same as that of the original bicycle-GAN, including the discriminator loss using cross-entropy and an L1-loss between the standard Gaussian distribution and the latent perturbation encoded from synthetic images. The cLR-GAN loss function is calculated as shown in equation (2)
$\mathcal{L}_{\text{cLR-GAN}} = \mathcal{L}_{\text{GAN}} + \lambda_{\text{latent}}\,\mathcal{L}_{1}\left(z, E(G(I_{1}, z))\right)$  (2)
Thus, the overall loss function can take advantage of both cycles ($\mathcal{L}_{\text{cVAE-GAN}}$ and $\mathcal{L}_{\text{cLR-GAN}}$), as shown in equation (3):
$\mathcal{L} = \mathcal{L}_{\text{cVAE-GAN}} + \mathcal{L}_{\text{cLR-GAN}}$  (3)
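Two of the loss terms described above can be stated concretely in code. The sketch below is a minimal NumPy illustration of a standard KL-divergence term (regularizing the latent perturbation towards a standard Gaussian) and of a gradient-based DVF smoothness penalty; the exact forms and weightings used by the authors are not specified in the text, so treat these as illustrative assumptions:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over the latent digits.

    mu and log_var are the encoder outputs for one latent code."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def dvf_smoothness(dvf):
    """Mean squared spatial gradient of a DVF of shape (3, D, H, W):
    an unsupervised penalty that encourages smooth inter-phase DVFs."""
    return float(np.mean([g ** 2 for i in range(3) for g in np.gradient(dvf[i])]))
```

A latent code exactly matching N(0, I) gives zero KL loss, and a spatially constant DVF (rigid translation) gives zero smoothness penalty, so both terms only penalize deviations from the desired behavior.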
With the well-trained motion generation model, the EOI phase images of the XCAT phantoms and various Gaussian perturbations are input to the generator, which generates the inter-phase DVFs and finally outputs 4D-XCAT images with diverse breathing motion amplitudes.
2.3. Respiratory motion amplitude control model
Besides synthesizing realistic breathing motion patterns, we also aim to establish a mechanism to control the respiratory motion amplitude generated in 4D-XCAT models so that users can generate the desired motion amplitude based on their needs. To achieve this goal, machine learning models are trained to predict the input Gaussian perturbation needed for the motion generation model to generate 4D-XCAT with a desired respiratory motion amplitude. The chosen machine learning algorithms are the ones that are commonly used for a multioutput regression problem, considering the predicted Gaussian perturbation has multiple digits. Specifically, linear regression, decision tree regression, and MLP were used and compared.
As shown in figure 4, the model input includes the respiratory amplitude and phantom size, and the model output is an eight-digit perturbation. Phantom size is a vital input, as we found that the same Gaussian perturbation with phantoms of different sizes can lead to different respiratory amplitudes generated by the motion generation model. The respiratory motion amplitude was calculated as the moving vector length of a landmark located at the apex of the diaphragm. In each algorithm, the model parameters were tuned empirically to achieve the optimal performance of its type. For example, in the decision tree regression model, the tunable parameter was the maximum depth of the tree; in the MLP model, the tunable parameters included hidden layer sizes, penalty parameters, optimization method, etc. All the machine learning models were built using the scikit-learn package (Pedregosa et al 2011).
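As a concrete illustration of the control model, a single scikit-learn decision tree can regress all eight perturbation digits at once, since scikit-learn trees natively support multi-output targets. The training pairs below are synthetic stand-ins (our assumption) for the (amplitude, phantom size) → perturbation data described above:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 300 samples of [amplitude (mm), phantom size factor]...
X = rng.uniform([7.5, 0.8], [15.0, 1.2], size=(300, 2))
# ...and the eight-digit Gaussian perturbations that produced those amplitudes.
Z = rng.standard_normal((300, 8))

# One multi-output tree maps (amplitude, size) to the full perturbation vector.
model = DecisionTreeRegressor(max_depth=8, random_state=0).fit(X, Z)
z_pred = model.predict([[10.0, 1.0]])  # perturbation for a 10 mm amplitude
print(z_pred.shape)                    # (1, 8)
```

The predicted perturbation is then fed back to the motion generation model; comparing the amplitude it produces with the requested one yields the motion control error evaluated in section 2.5.2.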
Figure 4.

A schematic illustration of respiratory motion amplitude control mechanism.
2.4. Materials
To construct the motion generation model, we used 71 4D-CT images of 20 lung cancer patients from The Cancer Imaging Archive (TCIA) dataset for model training and validation. Each 4D-CT contains 10 phases; due to the GPU memory limitation, we used phase 1 to phase 5, with phase one as the EOI phase and phase five as the end-of-exhale (EOE) phase. 65 4D-CT images were used for training, and 6 were used for validation. To avoid potential bias towards the validation data, we also tested the model on an independent group of patient data, which included 9 4D-CT images from the SPARE challenge dataset (Shieh et al 2019). Due to the limited GPU memory, all 4D-CT images were resampled to a 128 × 128 × 64 volume with an identical pixel spacing of 3.9064 mm and slice thickness of 4.125 mm.
After the motion generation model was trained, validated, and tested using patient data, it was tested using the XCAT images to generate 4D-XCAT with respiratory motions. 6 XCAT phantoms of different body sizes were used to generate 4D-XCAT phantom images, and each phantom was tested with 50 random perturbations to generate different breathing amplitudes.
The generated 4D-XCAT phantoms with respiratory motion amplitudes ranging from 7.5 to 15 mm were used for training, validating, and testing the motion amplitude control model. Specifically, there were 300 sets of XCAT phantoms in total with different breathing amplitudes and phantom sizes. 90% of them were used for the model training and validation with the 10-fold cross-validation technique to reduce overfitting, and 10% of the data were used for model testing. The motion control error was calculated for the testing group.
2.5. Evaluation
2.5.1. Evaluate the realism of the generated breathing motion
The efficacy of the motion generation model was evaluated in both the image domain and the DVF domain. In the image domain, several metrics were used: (1) Dice coefficients for the lungs between the reference and simulated images at the EOE phase. (2) Lung volume variation during respiration of the reference and simulated images. Note that the simulated images used in evaluation metrics (1) and (2) were not the directly generated images. Instead, they were generated by linearly interpolating the generated DVFs to a higher resolution (pixel size: 0.9766 mm, slice thickness: 3 mm; volume dimension: 512 × 512 × 96) and applying them to the 4D-CT EOI phase image before downsampling. This step was taken to preserve image quality. The lungs were segmented using a thresholding method (threshold: −200 HU). (3) Cross correlation of whole-body DVFs between the reference and generated 4D-CTs. The whole-body DVFs here were obtained by deformable registration between the EOI and EOE images using a pre-trained deformable registration model (Jiang et al 2020) for both the reference and generated images; the DVFs produced by the motion generation model were not directly used for this evaluation, to avoid inconsistency caused by different deformable registration methods. (4) Ventilation maps, computed from the registered DVFs and compared between the reference and generated 4D-CTs using cross correlation and Spearman’s correlation. In the DVF domain, the deformation energy of the directly generated DVFs was calculated and compared between the real 4D-CTs and simulated images. Deformation energy is a commonly used metric indicating the smoothness of a DVF (Zhang et al 2013) and is normalized by the number of voxels in the lungs.
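The image-domain metrics above are standard and can be sketched compactly. The −200 HU threshold follows the paper; the function names and array conventions are our own assumptions:

```python
import numpy as np

def lung_mask(ct_hu, threshold=-200.0):
    """Threshold-based lung segmentation: voxels below -200 HU, as in the paper."""
    return ct_hu < threshold

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

def cross_correlation(a, b):
    """Normalized (Pearson) cross correlation between two fields,
    e.g. two DVF components or two ventilation maps."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical masks give Dice = 1, disjoint masks give 0; the cross correlation is invariant to offset and positive scaling, which makes it suitable for comparing DVF components with different absolute magnitudes.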
The ventilation map calculates the lung ventilation based on the lung volume deformation from the EOI phase to the end-of-expiration (EOE) phase to represent the function of the lung. In this study, the ventilation map was used to evaluate the realism of the simulated breathing motion. Specifically, the ventilation map was calculated based on the Jacobian determinant (Reinhardt et al 2008), as shown in equation (4)
$v(x, y, z) = J(x, y, z) - 1 = \det\begin{pmatrix} 1 + \frac{\partial u_x}{\partial x} & \frac{\partial u_x}{\partial y} & \frac{\partial u_x}{\partial z} \\ \frac{\partial u_y}{\partial x} & 1 + \frac{\partial u_y}{\partial y} & \frac{\partial u_y}{\partial z} \\ \frac{\partial u_z}{\partial x} & \frac{\partial u_z}{\partial y} & 1 + \frac{\partial u_z}{\partial z} \end{pmatrix} - 1$  (4)
where v (x, y, z) is the ventilation of a small volume at point (x, y, z); J (x, y, z) is the Jacobian of the small volume at point (x, y, z); ux, uy, uz correspond to the x, y, z component of the DVF, respectively.
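Under the v = J − 1 convention of Reinhardt et al (2008), the ventilation map can be computed from a DVF with finite differences. This NumPy sketch assumes the DVF is stored as a `(3, D, H, W)` array; it is an illustration, not the authors' implementation:

```python
import numpy as np

def ventilation_map(dvf, spacing=(1.0, 1.0, 1.0)):
    """Jacobian-determinant ventilation v = J - 1 for a DVF of shape (3, D, H, W).

    J is the determinant of I + grad(u), the Jacobian of the deformation
    x -> x + u(x) from the EOI phase to the EOE phase."""
    # grads[i][j] = d u_i / d x_j, each of shape (D, H, W)
    grads = [np.gradient(dvf[i], *spacing) for i in range(3)]
    # Per-voxel 3x3 Jacobian matrices of the deformation, shape (D, H, W, 3, 3)
    F = np.stack([np.stack(g, axis=-1) for g in grads], axis=-2) + np.eye(3)
    return np.linalg.det(F) - 1.0
```

As a sanity check, a zero DVF yields v = 0 everywhere, and a uniform 10% expansion along one axis (u_x = 0.1 x) yields v = 0.1, i.e. a 10% local volume gain.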
To evaluate the 4D-XCAT generation results, we evaluated the image domain and the DVF domain using the same methods as in the patient study. Specifically, six 4D-XCAT phantoms of different sizes were generated using the motion generation model, and the corresponding original 4D-XCAT phantoms with similar maximum breathing amplitudes were also generated. Dice coefficients for the lungs between the original and generated 4D-XCAT phantoms were calculated at the EOE phase. Lung volume variation during respiration was compared between the original and generated 4D-XCAT phantoms. The deformation energy of the generated DVFs was calculated and compared with that of the original 4D-XCAT phantoms. Ventilation maps were calculated for both phantoms and compared visually. Given the absence of ground truth for the generated 4D-XCAT images, a ventilation map from a real patient was also included as a reference to evaluate the realism of the 4D-XCAT.
2.5.2. Evaluate the motion amplitude control error
To evaluate the effectiveness of the predicted perturbation, we calculated a motion control error, as shown in figure 4. During this process, the predicted perturbations are returned to the motion generation model, and the generated breathing amplitudes (Agenerated) are compared with the initially desired breathing amplitudes (Adesired) to calculate the motion control error, as shown in equation (5)
$\text{Error} = \left| A_{\text{generated}} - A_{\text{desired}} \right|$  (5)
3. Results
3.1. Motion model training, validation, and testing using patient 4D-CT
Figure 5 shows an example of the 4D-CT validation result from the motion generation model. As can be seen, the motion generation model simulated the breathing motion in the 4D-CT image. The Dice coefficients for the lungs between the reference and simulated EOE images are shown in table 1. The lung volume variation during respiration is shown in table 2.
Figure 5.

An example of 4D-CT validation results. 1st row: reference 4D-CT phases; 3rd row: simulated 4D-CT phases; 2nd and 4th rows: image variations from the input EOI phase to other phases. 5th row: difference between the 1st row and 3rd row images.
Table 1.
Dice coefficients between the reference and simulated EOE images.
|  | Training group | Validation group | Testing group |
|---|---|---|---|
| Lungs | 0.96 ± 0.03 | 0.95 ± 0.02 | 0.96 ± 0.02 |
Table 2.
Lung volume variation during respiration in 4D-CT.
| Lung volume (cc) |  | Phase 2 | Phase 3 | Phase 4 | Phase 5 (EOE) |
|---|---|---|---|---|---|
| Training group | Reference | 4372.9 ± 1603.3 | 4240.1 ± 1600.0 | 4103.7 ± 1573.2 | 3991.9 ± 1535.6 |
|  | Simulation | 4399.8 ± 1592.0 | 4319.4 ± 1586.3 | 4239.0 ± 1574.0 | 4105.7 ± 1597.7 |
|  | %Difference | 1.0% | 2.5% | 3.8% | 5.4% |
| Validation group | Reference | 3313.3 ± 675.11 | 3225.4 ± 650.59 | 3137.9 ± 621.17 | 3081.2 ± 592.35 |
|  | Simulation | 3338.9 ± 657.94 | 3311.8 ± 651.49 | 3275.2 ± 642.34 | 3244.7 ± 634.28 |
|  | %Difference | 1.2% | 2.9% | 4.6% | 5.3% |
| Testing group | Reference | 3543.0 ± 1341.5 | 3416.2 ± 1339.0 | 3313.5 ± 1294.1 | 3248.6 ± 1269.8 |
|  | Simulation | 3550.2 ± 1317.8 | 3500.7 ± 1306.5 | 3456.6 ± 1291.6 | 3420.8 ± 1277.9 |
|  | %Difference | 0.7% | 3.1% | 4.8% | 5.8% |
An example of the DVF comparison is shown in figure 6. The quantitative evaluation of the generated DVFs is shown in tables 3 and 4. The cross correlation between the reference and simulated whole-body DVFs ranged from 0.81 to 0.99 across the x, y, and z directions. Furthermore, we derived the ventilation map from the generated DVF in the lung to evaluate the realism of the motion pattern. An example of the ventilation maps for the reference and simulated 4D images is shown in figure 7. All the images are shown at the same window and level for comparison. As can be seen, the model successfully simulated the visibly larger breathing motion around the rib cage and the lower lobes of the lungs. Quantitative evaluation results are shown in table 5.
Figure 6.

Comparison of reference DVF and simulated DVF.
Table 3.
Deformation energy of the generated DVFs.
|  |  | Training group | Validation group | Testing group |
|---|---|---|---|---|
| Deformation energy | Reference | 0.33 ± 0.17 | 0.22 ± 0.14 | 0.36 ± 0.14 |
|  | Simulation | 0.50 ± 0.25 | 0.34 ± 0.05 | 0.53 ± 0.18 |
Table 4.
Cross correlation between the whole-body DVFs of the reference images and simulation images.
|  |  | Training group | Validation group | Testing group |
|---|---|---|---|---|
| Cross correlation | DVF_x | 0.97 ± 0.01 | 0.84 ± 0.07 | 0.89 ± 0.10 |
|  | DVF_y | 0.96 ± 0.02 | 0.81 ± 0.09 | 0.86 ± 0.12 |
|  | DVF_z | 0.99 ± 0.01 | 0.89 ± 0.08 | 0.95 ± 0.04 |
Figure 7.

Comparison of reference and simulated ventilation maps.
Table 5.
Comparison of ventilation maps computed from the reference and generated DVFs.
|  | Training group | Validation group | Testing group |
|---|---|---|---|
| Cross correlation | 0.80 ± 0.05 | 0.67 ± 0.09 | 0.68 ± 0.13 |
| Spearman’s correlation | 0.70 ± 0.05 | 0.60 ± 0.09 | 0.53 ± 0.11 |
3.2. 4D-XCAT synthesis
To test the performance of the motion generation model, we also evaluated the images and DVF properties of the generated 4D-XCAT phantoms. The Dice coefficient for lungs between the reference and simulated EOE images in the XCAT phantom is 0.950 ± 0.004. The lung volume variation results are shown in table 6. The deformation energy was calculated from the newly generated DVFs and the motion vector fields generated from the original 4D-XCAT phantoms, and the results are shown in table 7.
Table 6.
Lung volume variation during respiration in XCAT phantom.
| Lung volume (cc) | Phase 2 | Phase 3 | Phase 4 | Phase 5 (EOE) |
|---|---|---|---|---|
| Original XCAT | 4419.6 ± 529.8 | 4468.7 ± 535.6 | 4294.2 ± 514.0 | 4048.3 ± 500.2 |
| Generated XCAT | 4507.1 ± 522.3 | 4307.7 ± 495.1 | 4120.5 ± 474.6 | 3960.6 ± 457.8 |
| %Difference | 1.9% | 3.6% | 4.0% | 2.2% |
Table 7.
Deformation energy of the generated DVFs.
|  | Original XCAT | Generated XCAT |
|---|---|---|
| Deformation energy | 0.08 ± 0.002 | 0.36 ± 0.07 |
Ventilation maps were calculated for both the original and generated 4D-XCAT phantoms and compared visually. Given the absence of ground truth for the generated 4D-XCAT images, a ventilation map from a real patient was also included as a reference to evaluate the realism of the 4D-XCAT. Figure 8 shows the comparison between the ventilation maps. As shown in the first column of figure 8, the ventilation map of the original XCAT is relatively uniform because of its simplified linear motion pattern. By comparison, the ventilation map of the XCAT phantom generated using our method presents a more heterogeneous pattern with greater ventilation in the lower lobes, resembling that of the real 4D-CT. Note that the difference between the ventilation maps of the generated 4D-XCAT and the real 4D-CT is due to the anatomical difference between the XCAT and the real CT and to patient-specific differences in ventilation. The deep learning model used to synthesize respiratory motions in the 4D-XCAT was trained on a group of patients, so its ventilation maps represent the average characteristics of that group, whereas the ventilation map from a specific patient’s 4D-CT represents patient-specific ventilation characteristics. As a result, the difference between the generated 4D-XCAT and the real 4D-CT in figure 8 is partially due to inter-patient variation in ventilation.
Figure 8.

Comparison of ventilation maps among the original 4D-XCAT, generated 4D-XCAT, and real patient 4D-CT. 1st row: coronal view; 2nd row: sagittal view.
3.3. 4D-XCAT breathing amplitude control
3.3.1. Synthesis of diverse motion patterns
As shown in figure 9, different levels of diaphragm motions were synthesized from the EOI to EOE phases when different Gaussian perturbations were used. The breathing amplitude is 14.2 mm in the simulated EOE#1 image, and the breathing amplitude is 8 mm in the simulated EOE#2 image.
Figure 9.

An example of a 4D-XCAT phantom generated with different breathing amplitudes. Visualization window: [0 1500].
3.3.2. Respiratory motion amplitude control error
The motion amplitude control mechanism was found to work accurately. As shown in figure 10, the decision tree algorithm performs the best among the three algorithms, with a general motion amplitude control error smaller than 0.5 mm.
Figure 10.

Prediction error of the different motion amplitude control algorithms.
3.4. Model runtime
In terms of time consumption, it took about 18 h to train the motion generation model on a 24 GB NVIDIA TITAN RTX GPU, while it took about 0.23 s to generate a 4D-CT or 4D-XCAT during validation and testing. The training and validation of the motion amplitude control model was rapid using the scikit-learn package. With 10-fold cross-validation, the training and validation only took less than 0.1 s for each algorithm, and the end-to-end evaluation of a testing case took about 0.5 s.
4. Discussion
In this study, we developed a deep learning method to generate realistic respiratory motion, i.e. spatial and temporal respiratory motion patterns similar to those of real patients, which the original phantom does not provide, in the widely used 4D-XCAT digital phantom. The method was built on the bicycle-GAN algorithm (Zhu et al 2017) with a novel 4D generator and was trained using breathing motion patterns extracted from real 4D-CT. The original bicycle-GAN model successfully simulated RGB images from a greyscale image input. The task of generating RGB images is analogous to the task of generating DVFs: RGB generation requires synthesizing three color-channel values at each pixel of the input image, while DVF generation requires synthesizing three deformation vector values (Dx, Dy, Dz along the three axes) at each pixel. The original paper showed the efficacy of using Gaussian perturbations to generate images of different RGB colors; it could therefore be expected that Gaussian perturbations could feasibly generate DVFs of different magnitudes, and thus 4D-XCAT phantoms of different breathing amplitudes. Our results further demonstrated the feasibility of this approach. In addition, because of the complicated relationship between the Gaussian perturbation and the motion amplitude, we built a machine learning model to control the breathing amplitude of the generated phantoms.
Simulating realistic breathing motion in the 4D-XCAT phantom is significant for translating the findings of phantom studies into clinical practice, especially for the following studies: (1) 4D imaging studies: Various 4D imaging acquisition and reconstruction techniques can be evaluated using XCAT. The complexity of the respiratory motion can significantly impact the data acquisition adequacy and the intra-phase motion amplitude of different structures, which consequently affects the quality of the reconstructed 4D images. Simulating realistic motions can validate the techniques in scenarios close to real patients, thus increasing the value of the phantom study for optimizing the techniques for clinical usage. (2) Motion modeling: motion modeling is often used to build a patient respiratory model from the 4D images acquired during simulation, which is then used for on-board image reconstruction or localization. The linearly increasing respiratory motion in the original XCAT is too simple to represent real patient breathing motions, and thus is inadequate to test the efficacy of motion modeling techniques. (3) Functional imaging studies: XCAT can be a valuable tool to evaluate and standardize different technologies for reconstructing 4D images and deriving functional information, such as ventilation maps, from them. However, as shown in figure 8, the original XCAT has an unrealistic uniform ventilation map, which makes it inadequate for such studies. (4) Interplay effect studies: The interplay between respiratory motion and MLC motion for fluence modulation can affect the delivered dose to the tumor and nearby structures. XCAT provides an important tool to investigate this effect, since it provides ground-truth images of the respiratory motion. The impact of the interplay effects is highly dependent on the complexity of the respiratory motion.
Therefore, establishing a more realistic motion pattern will greatly increase the value of the phantom for such study. This improvement can be more significant for pencil-beam scanning proton treatments as well as that with MLCs, as such treatments are more sensitive to the exact motion. (5) Target tracking: XCAT can be valuable for developing target localization or tracking techniques based on 2D projection images. The complexity of the respiratory motion in XCAT affects the deformation and overlapping of the tumor and other structures at different time frames, which has an important impact on tracking accuracy. Therefore, simulating realistic respiratory motion makes such phantom studies more meaningful for future clinical implementation. For applications (4) and (5), they require the phantom to be with breath-to-breath variation and hysteresis. Currently, the newly developed model can simulate breath-to-breath variation, however, not being able to simulate hysteresis by simulating 5 phases. This limitation and potential solutions are discussed later.
During the training and validation of the motion generation model using real patient 4D-CTs, the realism of the generated breathing motion was evaluated with multiple metrics in both the image and DVF domains. As shown in tables 1 and 2, the motion generation model simulated lung motion and lung volume variation during respiration that closely matched the reference 4D-CT images. The ventilation map, calculated from the Jacobian determinant of the DVF, was also used to evaluate the realism of the generated breathing motion. Spearman's correlation is a commonly used method to evaluate the reproducibility or correlation of ventilation maps (Yamamoto et al 2012, Woodruff et al 2017, Kipritidis et al 2019). Our results in table 5 showed that the Spearman's correlation between the generated and ground-truth ventilation maps was 0.53 ± 0.10 in the patient testing group. We consider this value satisfactory for the following reasons. (1) Inter-patient variations: the motion generation model was trained using respiratory motion data from the 65 patients in the training group, and therefore learned the general motion patterns of those patients. When the model was tested on the nine patients in the testing group, the generated motion still represented the characteristics of the training group, which is expected to differ somewhat from those of the testing group due to inter-patient variations. (2) Reference values in previous literature: to our knowledge, no studies have examined the correlation between ventilation maps from different patients, potentially due to the challenges posed by geometric differences between patients. However, Yamamoto et al (2012) quantified the reproducibility of 4D-CT ventilation imaging for the same patient, reporting an average Spearman's correlation of 0.50 ± 0.15 between two intra-patient ventilation maps. Since we compared the ground-truth 4D-CT images from a specific patient with simulated 4D-CT images generated by a group-trained model, the Spearman's correlation is affected by inter-patient variations and is thus expected to be equivalent to or worse than the intra-patient values reported above. Therefore, our correlation value of 0.53 ± 0.10 is considered acceptable.
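The Jacobian-based ventilation computation described above can be sketched in numpy as follows. This is a minimal illustration, not the study's implementation: it assumes a dense DVF stored as a (3, Z, Y, X) array in voxel units, and includes a simple rank-based Spearman correlation for comparing two maps inside a lung mask.

```python
import numpy as np

def ventilation_from_dvf(dvf):
    """Jacobian-determinant ventilation of a DVF of shape (3, Z, Y, X).

    Builds the Jacobian of the transform x -> x + dvf(x) at every voxel and
    returns det(J) - 1: positive where tissue expands, negative where it contracts.
    """
    grads = [np.gradient(dvf[i]) for i in range(3)]   # grads[i][j] = d(dvf_i)/d(axis_j)
    jac = np.zeros(dvf.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            jac[..., i, j] = (i == j) + grads[i][j]   # identity + displacement gradient
    return np.linalg.det(jac) - 1.0

def spearman(a, b):
    """Spearman rank correlation of two flattened maps (no tie correction)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```

Two ventilation maps restricted to the lungs would then be compared as `spearman(vent_a[lung_mask], vent_b[lung_mask])`, where `lung_mask` is a boolean lung segmentation.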
The improved realism of the 4D-XCAT phantom is demonstrated by the comparable deformation energies of the DVFs and the similar ventilation maps of the generated 4D-XCAT phantoms and real 4D-CT images, both of which differ markedly from those of the original 4D-XCAT. The deformation energy of the original 4D-XCAT DVF between the EOI and EOE phases is 0.08 ± 0.002, much lower than that of the reference 4D-CTs (training group: 0.33 ± 0.17; validation group: 0.22 ± 0.14; testing group: 0.36 ± 0.14) and the generated 4D-XCAT phantom (0.36 ± 0.07), due to the simple linear deformation model used in the original XCAT phantom. Note that the difference between the energies of the reference and simulated DVFs can be caused by inter-patient differences and by differences introduced by the deformable registration algorithm. The ventilation map of the generated 4D-XCAT phantoms was heterogeneous and showed clearly greater ventilation in the lower lobes, similar to real patient 4D-CT. In contrast, the original 4D-XCAT phantom presents a relatively uniform ventilation map. Note that we would expect differences between the ventilation maps of the generated 4D-XCAT phantoms and real patient 4D-CT data, caused by (1) the anatomical differences between the XCAT and real patients, and (2) patient-specific differences: the generated 4D-XCAT phantoms are produced by a motion generation model trained on a group of patients, while real 4D-CT presents patient-specific ventilation characteristics. Additionally, phantoms of diverse breathing amplitudes can be generated by tuning the model input perturbations, with the motion amplitude control error kept within 0.5 mm.
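The amplitude control maps a desired motion amplitude back to the input perturbation that produces it. A toy sketch of that inverse mapping using scikit-learn's DecisionTreeRegressor (the study's control model was built with decision tree regression) is shown below; the calibration data, the linear perturbation-amplitude relation, and all variable names are entirely synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the calibration data: 50 sampled perturbation
# magnitudes and the motion amplitude (mm) each one produced in the phantom.
rng = np.random.default_rng(0)
perturbation = rng.uniform(0.0, 2.0, size=50)
amplitude_mm = 8.0 * perturbation + rng.normal(0.0, 0.1, size=50)  # toy relation

# Learn the inverse mapping: desired amplitude -> required perturbation.
model = DecisionTreeRegressor(max_depth=5, random_state=0)
model.fit(amplitude_mm.reshape(-1, 1), perturbation)

# Perturbation predicted to yield a 10 mm motion amplitude.
needed = float(model.predict([[10.0]])[0])
```

In the study, separate calibration sets were built per phantom size; the sketch above covers a single size only.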
This study has several novelties. First, we modified the bicycle-GAN structure for this specific task: we developed a 4D generator that synthesizes DVFs instead of directly generating CT images, and included an unsupervised gradient smoothness term for the DVF in the loss function. Generating continuous phases of images from a single-phase input is challenging, and the gradient smoothness loss enforces continuity between the generated phases. Second, we used ventilation maps to evaluate the realism of the generated respiratory motion. The ventilation map reflects the patient's lung function by quantifying regional lung volume change during breathing; specifically, the voxel values of a ventilation map represent the local volume change of each voxel during respiration. A ventilation map is therefore appropriate for evaluating the realism of the generated breathing motion. Third, we developed a respiratory motion amplitude control mechanism using machine learning and evaluated its effectiveness by computing an end-to-end motion amplitude control error. The results demonstrated that this mechanism was able to control the breathing amplitude of the output phantoms with sub-mm accuracy.
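A gradient smoothness term of the kind mentioned above can be illustrated with a minimal numpy sketch. This is one standard first-order formulation, not the study's actual loss implementation:

```python
import numpy as np

def gradient_smoothness_loss(dvf):
    """First-order smoothness penalty on a DVF of shape (3, Z, Y, X):
    mean squared spatial gradient, averaged over the three displacement
    components. Lower values mean smoother, more continuous deformations.
    """
    total = 0.0
    for c in range(3):                      # each displacement component
        for g in np.gradient(dvf[c]):       # derivatives along z, y, x
            total += np.mean(g ** 2)
    return total / 3.0
```

The penalty is zero for any rigid shift (constant DVF) and grows with the spatial variation of the field, which is what discourages abrupt jumps between neighboring voxels and, when applied across phases, between the generated phases.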
A main challenge in this study is that there is no 'ground-truth' DVF against which to evaluate the synthesized DVF. Real patients have only 4D images, without ground-truth DVFs, and any registration algorithm used to derive a DVF introduces registration errors. In the XCAT phantom, since we are synthesizing a more realistic DVF, no 'ground-truth' DVF is available either. This absence of ground truth makes evaluating the synthetic DVF challenging. The original XCAT phantom generates a linearly increasing DVF, which has neither been validated against any ground-truth DVF nor been quantitatively evaluated. Therefore, we mainly evaluated the properties of the generated DVF and compared them against the patient data. Specifically, we evaluated the deformation energy (Zhang et al 2013) of the DVF, which is commonly used to evaluate DVFs in deformable registration, and compared it with that of the reference DVF registered between the ground-truth images using our published deep learning registration method (Jiang et al 2020). This evaluation might not be ideal but should provide valuable information for reference.
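Deformation energy can be quantified in several ways; as an illustration (an assumption for this sketch, not a reproduction of the definition in Zhang et al (2013)), the following numpy code computes a second-order bending energy, which is zero for any affine deformation and grows as the DVF departs from one:

```python
import numpy as np

def bending_energy(dvf, spacing=(1.0, 1.0, 1.0)):
    """Second-order bending energy of a DVF of shape (3, Z, Y, X).

    Sums the mean squared second spatial derivatives of every displacement
    component; spacing gives the voxel size (mm) along z, y, x.
    """
    energy = 0.0
    for c in range(3):
        first = np.gradient(dvf[c], *spacing)       # d/dz, d/dy, d/dx
        for gi in first:
            for gj in np.gradient(gi, *spacing):    # second derivatives
                energy += np.mean(gj ** 2)
    return energy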
Compared with the original XCAT phantom, the newly developed model retains the original's merit of using real patient breathing traces to simulate breath-to-breath variation. In the original 4D-XCAT phantom, organ motion is simulated by moving control points on organ surfaces with a cubic NURBS algorithm driven by two breathing signals (SI motion and AP motion); the motion of the control points at each phase is linearly proportional to the maximum breathing amplitude given by the signals. The newly developed phantom can utilize real patient breathing traces in a similar way: the generated DVF between the EOI and EOE phases can be scaled according to a real patient breathing signal, thereby simulating breath-to-breath variation. Using this scaling method, we can also simulate a full breathing cycle, which usually contains 10 phases, by linearly scaling the more realistic EOI-to-EOE DVF according to the patient breathing trace to generate the remaining phases. Another approach is to mirror the generated respiratory phases from the first half of the breathing cycle to the second half, and then linearly scale the deformation fields according to the patient breathing trace in the second half of the cycle.
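The scaling scheme just described can be sketched as follows; the helper name and trace values are illustrative only, with the trace normalized so that 0 corresponds to EOI and 1 to EOE.

```python
import numpy as np

def dvfs_from_trace(dvf_eoi_to_eoe, trace):
    """Scale the generated EOI-to-EOE DVF by a normalized breathing trace
    (0 = EOI, 1 = EOE) to obtain one DVF per phase of the cycle."""
    return [amp * dvf_eoi_to_eoe for amp in trace]

# A 10-phase cycle whose exhale half simply mirrors the inhale half
inhale = [0.0, 0.3, 0.7, 0.9, 1.0]
trace = inhale + inhale[::-1]

dvf = np.random.default_rng(0).normal(size=(3, 4, 4, 4))  # stand-in generated DVF
phases = dvfs_from_trace(dvf, trace)
```

A measured patient trace would replace the mirrored `trace` here, which is how breath-to-breath variation enters: each cycle of the trace rescales the same realistic EOI-to-EOE deformation by a different amplitude.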
Another merit of the original 4D-XCAT phantom is that it can simulate hysteresis by introducing a phase shift between the SI and AP signals. Since our model does not use separate signals and simulates only a half breathing cycle, we acknowledge that the current phantom cannot generate hysteresis.
This limitation can be addressed in the future in two ways: (1) with larger GPU memory, we can train the model to generate all 10 phase images of the breathing cycle, which will include hysteresis learned from real patient breathing motion; (2) we can potentially train separate models to generate DVFs along the SI and AP directions, similar to how XCAT is designed, and then introduce a phase shift between the SI and AP motions to produce hysteresis.
One limitation of the study is that the image resolution is currently limited by GPU memory. In the presented method, we composed the generated inter-phase DVFs and used linear interpolation to apply them to the full-resolution 4D-CT EOI phase images for better image resolution. One potential improvement is to compose the inter-phase DVFs during model training, although we expect only a minor gain from that. Another way to address this limitation in the near future lies in hardware advances. Currently, we train the model on an NVIDIA TITAN RTX GPU with 24 GB of memory. Based on our estimation, the image resolution can be improved to 1.9 mm × 1.9 mm × 3 mm for an image size of 256 × 256 × 96 on a 48 GB GPU (NVIDIA Quadro RTX 8000). NVIDIA recently announced a new 80 GB A100 GPU to be released in early 2021, which will allow us to further improve the resolution to more closely match the clinical lung CT scan resolution of around 1 mm × 1 mm × 3 mm. Note that this memory issue exists only in the training stage; once the model is trained and disseminated, users do not need a large GPU memory to generate the XCAT images.
Organ mass is an important measurement for radiation dosimetry. In the original XCAT phantom, all organs are assigned uniform intensities that do not change during respiration. The XCAT phantom includes a separate parameter defining lung density; this parameter is not reflected in the image pixel values but is an add-on parameter assigned to the lungs so that their mass can be calculated for dose calculation and other purposes. This density parameter is typically adjusted for each respiratory phase so that lung mass remains constant during breathing. The same adjustment can be used in our model, and a similar approach can be applied to other organs in the phantom.
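The constant-mass adjustment amounts to scaling the density parameter inversely with lung volume. A one-line sketch with hypothetical numbers:

```python
def lung_density_for_phase(eoi_density, eoi_volume, phase_volume):
    """Per-phase lung density chosen so mass = density * volume stays constant."""
    return eoi_density * eoi_volume / phase_volume

# Hypothetical example: lungs of 4000 cc at EOI with density 0.26 g/cc keep
# their mass when compressed to 3600 cc at EOE, so EOE density is higher.
eoe_density = lung_density_for_phase(0.26, 4000.0, 3600.0)
```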
The respiratory motion pattern of the generated 4D-XCAT phantoms depends on the characteristics of the training data. In the future, we would like to include more training patients to further expand the model's ability to simulate different patient scenarios. Potentially, patient data can be divided into groups based on lung function, with a separate model trained to simulate the respiratory characteristics of each group. Sliding motion between organs, which mostly accounts for the out-of-sync motion of the lungs and rib cage, can also be incorporated in future studies to further enhance the realism of the respiratory motion; for example, the DVF smoothness constraint at the sliding surfaces could be relaxed in the training loss function to allow sliding motion in the motion generation model. In this study, patient 4D-CT data were used to train the motion generation model, since 4D-CT is the most widely used imaging technique for motion assessment in clinical practice. Data from other imaging modalities, such as 4D-MRI, can also be used for motion modeling in the future. Both 4D-CT and 4D-MRI are susceptible to motion artifacts; in our study, images with severe motion artifacts were excluded from the training data to ensure training quality. In the future, advanced image sorting, reconstruction, and processing techniques can be employed to minimize such artifacts in the training data.
5. Conclusion
The study demonstrated the feasibility of generating realistic controllable respiratory motion in the 4D-XCAT phantom using the proposed motion generation and amplitude control models. This crucial development greatly enhances the value of the XCAT phantom, giving it the ability to more realistically simulate respiratory motions for various 4D imaging and radiotherapy studies, such as 4D image optimization, motion management, and robust treatment planning.
Acknowledgments
This work was supported by the National Institutes of Health under Grant Nos. R01-CA184173 and R01-EB028324.
References
- Bakic PR, Barufaldi B, Higginbotham D, Weinstein SP, Avanaki AN, Espig KS, Xthona A, Kimpe TRL and Maidment ADA 2018 Virtual clinical trial of lesion detection in digital mammography and digital breast tomosynthesis Proc. SPIE 10573 1057306
- Barufaldi B, Higginbotham D, Bakic PR and Maidment ADA 2018 OpenVCT: a GPU-accelerated virtual clinical trial pipeline for mammography and digital breast tomosynthesis Proc. SPIE 10573 1057358
- Bosca RJ and Jackson EF 2016 Creating an anthropomorphic digital MR phantom—an extensible tool for comparing and evaluating quantitative imaging algorithms Phys. Med. Biol. 61 974
- Chen Y, Yin F-F, Zhang Y, Zhang Y and Ren L 2018 Low dose CBCT reconstruction via prior contour based total variation (PCTV) regularization: a feasibility study Phys. Med. Biol. 63 085014
- Jiang Z, Yin F-F, Ge Y and Ren L 2020 A multi-scale framework with unsupervised joint training of convolutional neural networks for pulmonary deformable image registration Phys. Med. Biol. 65 015011
- Kipritidis J et al 2019 The VAMPIRE challenge: a multi-institutional validation study of CT ventilation imaging Med. Phys. 46 1198–217
- Lafata K, Cai J, Wang C, Hong J, Kelsey CR and Yin F-F 2018 Spatial-temporal variability of radiomic features and its effect on the classification of lung cancer histology Phys. Med. Biol. 63 225003
- Liu Y, Lei Y, Wang T, Kayode O, Tian S, Liu T, Patel P, Curran WJ, Ren L and Yang X 2019 MRI-based treatment planning for liver stereotactic body radiotherapy: validation of a deep learning-based synthetic CT generation method Br. J. Radiol. 92 20190067
- Pedregosa F et al 2011 Scikit-learn: machine learning in Python J. Mach. Learn. Res. 12 2825–30 (https://www.jmlr.org/papers/v12/pedregosa11a.html)
- Pham J, Harris W, Sun W, Yang Z, Yin F-F and Ren L 2019 Predicting real-time 3D deformation field maps (DFM) based on volumetric cine MRI (VC-MRI) and artificial neural networks for on-board 4D target tracking: a feasibility study Phys. Med. Biol. 64 165016
- Reinhardt JM, Ding K, Cao K, Christensen GE, Hoffman EA and Bodas SV 2008 Registration-based estimates of local lung tissue expansion compared to xenon CT measures of specific ventilation Med. Image Anal. 12 752–63
- Ren L, Zhang Y and Yin FF 2014 A limited-angle intrafraction verification (LIVE) system for radiation therapy Med. Phys. 41 020701
- Segars W, Sturgeon G, Mendonca S, Grimes J and Tsui BM 2010 4D XCAT phantom for multimodality imaging research Med. Phys. 37 4902–15
- Segars WP, Mahesh M, Beck TJ, Frey EC and Tsui BMW 2008 Realistic CT simulation using the 4D XCAT phantom Med. Phys. 35 3800–8
- Segars WP, Tsui BMW, Cai J, Yin F, Fung GSK and Samei E 2018 Application of the 4D XCAT phantoms in biomedical imaging and beyond IEEE Trans. Med. Imaging 37 680–92
- Shieh CC et al 2019 SPARE: sparse-view reconstruction challenge for 4D cone-beam CT from a 1 min scan Med. Phys. 46 3799–811
- Woodruff HC, Shieh CC, Hegi-Johnson F, Keall PJ and Kipritidis J 2017 Quantifying the reproducibility of lung ventilation images between 4-dimensional cone beam CT and 4-dimensional CT Med. Phys. 44 1771–81
- Xie T and Zaidi H 2014 Effect of respiratory motion on internal radiation dosimetry Med. Phys. 41 112506
- Yamamoto T et al 2012 Reproducibility of four-dimensional computed tomography-based lung ventilation imaging Acad. Radiol. 19 1554–65
- Zhang Y, Deng X, Yin FF and Ren L 2018 Image acquisition optimization of a limited-angle intrafraction verification (LIVE) system for lung radiotherapy Med. Phys. 45 340–51
- Zhang Y, Yin FF, Segars WP and Ren L 2013 A technique for estimating 4D-CBCT using prior knowledge and limited-angle projections Med. Phys. 40 121701
- Zhang Y, Yin F-F, Zhang Y and Ren L 2017 Reducing scan angle using adaptive prior knowledge for a limited-angle intrafraction verification (LIVE) system for conformal arc radiotherapy Phys. Med. Biol. 62 3859
- Zhu J-Y et al 2017 Toward multimodal image-to-image translation arXiv:1711.11586
