Author manuscript; available in PMC: 2026 Apr 1.
Published in final edited form as: Int J Radiat Oncol Biol Phys. 2024 Dec 6;121(5):1349–1360. doi: 10.1016/j.ijrobp.2024.11.077

Data-Driven Volumetric CT Image Generation from Surface Structures using a Patient-Specific Deep Learning Model

Shaoyan Pan 1,2,6, Chih-Wei Chang 1,6, Zhen Tian 3, Tonghe Wang 4, Marian Axente 1, Joseph Shelton 1, Tian Liu 5, Justin Roper 1, Xiaofeng Yang 1,2
PMCID: PMC13036628  NIHMSID: NIHMS2133776  PMID: 39577474

Abstract

Purpose:

Optical surface imaging presents a radiation-dose-free and noninvasive approach for image-guided radiotherapy, allowing continuous monitoring during treatment delivery. However, it falls short in cases where the correlation between body-surface motion and internal tumor motion is complex, limiting the use of purely surface-based surrogates for tumor tracking. Relying solely on surface-guided radiation therapy (SGRT) therefore may not ensure accurate intra-fractional monitoring. This work aims to develop a data-driven framework that mitigates the limitations of SGRT in lung cancer radiotherapy by reconstructing volumetric CT images from surface images.

Methods and Materials:

We conducted a retrospective analysis involving 50 lung cancer patients who underwent radiotherapy and had 10-phase 4DCT scans during their treatment simulation. For each patient, we utilized nine phases of 4DCT images for patient-specific model training and validation, reserving one phase for testing purposes. Our approach employed a surface-to-volume image synthesis framework, harnessing cycle-consistency generative adversarial networks to transform surface images into volumetric representations. The framework was extensively validated using an additional 6-patient cohort with re-simulated 4DCT.

Results:

The proposed technique produced accurate volumetric CT images from the patient's body surface. Compared with the ground truth CT images, the synthetic images generated by the proposed method exhibited a GTV center-of-mass difference of 1.72±0.87 mm, an overall mean absolute error of 36.2±7.0 HU, a structural similarity index measure of 0.94±0.02, and a Dice similarity coefficient of 0.81±0.07. Furthermore, the robustness of the proposed framework was found to be linked to respiratory motion.

Conclusion:

The proposed approach provides a novel solution to overcome the limitations of SGRT for lung cancer radiotherapy and can potentially enable real-time volumetric imaging during radiation treatment delivery for accurate tumor tracking without radiation-induced risk. This data-driven framework offers a comprehensive solution for motion management in radiotherapy without necessitating the rigid application of first-principles modeling for organ motion.

1. Introduction

In radiation therapy for thoracic and abdominal treatments, tumors can move several centimeters due to respiratory motion. Effective motion management is crucial to guarantee treatment accuracy and successful treatment outcomes, particularly for hypo-fractionated radiotherapy and proton therapy (1). Time-resolved imaging, that is, four-dimensional (4D) computed tomography (CT), is now routinely used at treatment simulation to evaluate motion for treatment planning. However, unpredictable changes in respiration between simulation and treatment may cause the delivered dose to deviate significantly from the planned dose, leading to under-dosing of targets and/or over-dosing of normal tissues (2). In-treatment real-time imaging is hence highly desired for continuous tumor motion monitoring and management during treatment delivery.

Conventional real-time tumor motion tracking usually relies on surrogates for actual tumor motion (3,4), and direct tumor tracking can be achieved by implanting markers within tumors or surrounding tissues (5–9). However, implanting markers is invasive and costly and carries medical risks, including marker migration. Markerless tracking for lung tumor positioning is therefore highly desirable. Projection-based 2D imaging techniques such as kV fluoroscopy are commonly employed for real-time, marker-free tumor tracking (10–15). However, these methods cannot fully capture the three-dimensional structure of the tumor because they project the 3D anatomy onto a 2D plane, reducing tumor visibility. This limitation becomes particularly pronounced when the tumor is obscured by high-intensity structures such as bone. Cone-beam CT (CBCT) has been widely deployed in clinics to minimize positional uncertainty during treatment delivery. However, due to concerns about potential collisions, the rotation speed of the LINAC gantry is constrained; a CBCT half-fan acquisition takes approximately 1 minute, which is two orders of magnitude longer than the temporal resolution required for real-time imaging. Recently, deep learning (DL) has emerged as a promising solution for generating volumetric imaging using patient-specific training schemes (16,17). This approach leverages the ability of DL to encode anatomical relationships during model training, achieved through augmented datasets that include diverse 4DCT data representing various respiratory phases and anatomical variations. Prior research (18,19) has successfully shown the potential of DL techniques to generate instant 3D CT images from orthogonal or even a single 2D kV image, particularly in lung imaging scenarios. Nevertheless, these projection-based techniques still necessitate acquiring X-ray projections at a high temporal rate, which raises concerns regarding imaging dose under the public radiation protection requirements regulated by the U.S. NRC.

Optical surface imaging (20,21) has emerged as an attractive option for in-treatment real-time motion monitoring due to its non-ionizing nature, large field of view (FOV), sub-millimeter detectability, and high temporal resolution. However, several studies (22–24) have noted the imperfect correlation between body surface motion and the motion of internal tumors. In this study, we aim to surmount the constraints associated with surface imaging while harnessing its benefits for lung cancer radiotherapy. Our research delves into the potential of creating 3D anatomical images based on a patient's 3D surface image, capitalizing on prior anatomical information. More specifically, a DL framework is designed to infer the hidden spatial anatomical details from the ultra-sparse body surface. The proposed framework is based on two key hypotheses: 1) patient-specific DL training on volumetric images can provide the essential information to bridge the gap between surface imaging and volumetric imaging; and 2) patients' breathing patterns and anatomy remain consistent between the planning CT simulation and the treatment delivery phase. This framework has the potential to provide real-time imaging to guide photon and proton therapy, particularly FLASH radiotherapy (25), which demands highly accurate treatment delivery. Such a data-driven approach maximally leverages prior knowledge to enhance image prediction by integrating data from different modalities, an emerging topic in multiple disciplines (26,27).

2. Methods

This study utilizes DL to explore the relationship between patients' skin surface movements and internal tumor motion, guided by the two primary hypotheses stated in the Introduction. The first hypothesis supports the development of a DL-based synthetic CT framework capable of predicting internal tumor motion from non-invasive surface imaging. The second hypothesis assumes that no significant anatomical changes occur between the planning CT scan and the treatment delivery phase, ensuring consistency in patient anatomy. Consequently, the proposed framework can generate CT images that reduce uncertainties in anatomical predictions. For this initial phase of the investigation, we assume that the surface meshes obtained through optical imaging can be accurately translated into surface volumes, specifically a binary body mask of the patient. Here, the surface volume is obtained from the patient's body contour and has the same dimensions as the ground truth CT. Fig. 1 depicts the architecture of the surface-to-volume image synthesis framework. The input is the patient's binary body surface volume at the resolution of the ground truth CT. The output is a 3D CT image volume (i.e., synthetic CT) corresponding to the input binary surface volume.
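As a minimal sketch of how a binary surface volume could be derived from a CT volume for training: the paper itself assumes the surface volume comes from the optical surface mesh or body contour; the HU threshold of −400 and the slice-wise hole filling below are our illustrative assumptions, not the authors' method.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def body_surface_volume(ct_hu, air_threshold=-400.0):
    """Sketch: derive a binary body-surface volume from a CT volume in HU.

    Assumptions (ours, not from the paper): the body separates from air by a
    simple HU threshold, and interior low-density cavities (e.g., lungs) are
    closed per axial slice so the mask describes only the outer body contour.
    """
    mask = ct_hu > air_threshold                      # body voxels denser than air
    filled = np.stack([binary_fill_holes(s) for s in mask])
    return filled.astype(np.uint8)                    # same dimensions as input CT
```

The resulting mask has the same dimensions as the ground truth CT, matching the framework's input specification.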

Fig. 1 |. The proposed surface-to-volume image synthesis framework which consists of three translation neural networks.


A binary surface volume, which has the same resolution as the ground truth CT, is first input to the reconstruction network to generate an intermediate CT volume (generated volume). The generated volume is then fed into a refinement network to correct the HU intensities of the reconstructed CT, including the high- and low-density materials, and to generate a full-resolution volume. In addition, a verification network converts the generated volume back to a binary surface volume, which should align with the ground truth surface volume. This step helps minimize information loss that might occur during the surface-to-volume translation.

The network architecture consists of three modularized sub-networks: a reconstruction network, a verification network, and a refinement network. The reconstruction network first encodes the surface images into various feature maps and transforms the surface feature maps into volumetric feature maps, which allows the decoder to reconstruct the volumetric images. During the training phase, this reconstruction network learns how to correlate the feature distributions between the surface and volumetric images. The verification network transforms the reconstructed volumetric images back to generated surface images to ensure the invariance of DL-based image synthesis to the input surface images. The refinement network ensures the scale invariance of CT numbers between the output volumetric images and ground truth CT images. This network learns the high-density tissue intensity, image noise level, and contrasts from ground truth CT images, which is patient-specific and machine-specific prior knowledge.

Limited data often challenges DL applications in medical settings. To address this issue, we designed the surface-to-volume network to be patient-specific; that is, we train a patient-specific network model (Fig. 1) for each patient using his or her own 4D CT images. The 4D CT is commonly acquired during treatment simulation for lung cancer patients to capture and evaluate respiratory motion for treatment planning, and it provides patient-specific prior knowledge on the relationship between the patient's 3D anatomy and the body surface.

2.1. Data acquisition and preprocessing

A retrospective study was performed with a cohort of 50 lung cancer patients identified from the institutional database who underwent radiotherapy at our institution and had a 4DCT scan during their treatment simulation. The cohort included 25 females and 25 males who received stereotactic body radiation therapy (SBRT), with a median age of 76.4 years (range, 54–98). Table A1 (Appendix A) summarizes patient diagnosis details. Each 4D CT image consists of 10 sets of 3D CT images corresponding to 10 different respiratory phases. Independent experiments were conducted for each patient. All image data were acquired on a Siemens SOMATOM Definition AS at 120 kV using the standard Siemens Lung 4D CT protocol (software version Syngo CT VA48A, pitch of 0.8, and reconstruction kernel Bf37). The image matrix size is 512 × 512 × (133–168) voxels with a voxel spacing of 0.9756 × 0.9756 × 2 mm3. For training efficiency, we cropped the CT images to reduce the background air region and generated down-sampled CT images by a ratio of 2, yielding a resolution of 160 × 256 for the transverse CT slices. In each experiment, 8 respiratory phases are used for training, another randomly selected phase is used for validation, and the remaining phase is used for inference.
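The cropping and 2× down-sampling step can be sketched as follows; the fixed crop window and the use of 2×2 average pooling are illustrative assumptions, not the authors' exact preprocessing.

```python
import numpy as np

def preprocess_phase(ct, crop_rows=(96, 416)):
    """Sketch of the per-phase preprocessing described above.

    Assumptions (ours): the background air region is removed with a fixed row
    crop, and the in-plane down-sampling by a ratio of 2 is done with 2x2
    average pooling.
    """
    cropped = ct[:, crop_rows[0]:crop_rows[1], :]      # drop background air rows
    z, y, x = cropped.shape
    pooled = cropped.reshape(z, y // 2, 2, x // 2, 2).mean(axis=(2, 4))
    return pooled
```

With these assumed crop bounds, a 512 × 512 slice cropped to 320 rows and pooled yields the 160 × 256 transverse resolution used for training.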

The surface-to-volume system was implemented using the PyTorch framework (28) in Python 3.8.11 on a Windows 11 workstation with a single NVIDIA A100 GPU. Compared with DL applications in computer vision and other fields, data availability is a limiting factor in medical settings, making it fundamentally challenging to achieve generalizability of DL models across a large patient population with significant variations in tumor shape, size, location, and motion. We leverage the patient's similarity between the treatment simulation and each treatment fraction to build patient-specific DL models with data augmentation techniques, overcoming this fundamental challenge and improving the system's performance and robustness. We train a patient-specific model for each patient and do not use that model for other patients. For each patient-specific training, the 50% phase was reserved for testing, and the model was trained on 8 phases randomly selected from (0%, 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90%). The remaining phase was used for validation to prevent overfitting. This patient-specific training ensures maximum applicability and relevance between the models and the proposed application: generating volumetric images from patient-specific surface images for surface-guided radiation therapy (SGRT). After training for 2600 epochs (600 epochs for the reconstruction-verification networks and 2000 epochs for the refinement network), the models with the smallest refinement loss were saved as the final models. The training time is around 9 hours, and inference takes around 0.4 seconds per volume.
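The phase split described above can be sketched as follows (the function name and seed handling are ours):

```python
import random

def split_phases(seed=0):
    """Patient-specific phase split: the 50% phase is held out for testing;
    8 of the remaining 9 phases train the model and 1 is used for validation."""
    phases = [0, 10, 20, 30, 40, 60, 70, 80, 90]   # 50% reserved for testing
    rng = random.Random(seed)
    rng.shuffle(phases)
    return {"train": sorted(phases[:8]), "val": phases[8:], "test": [50]}
```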

2.2. Network architecture

The proposed reconstruction network utilizes an encoder-decoder architecture (Fig. 1) to translate the binary surface into a volume. It takes a down-sampled surface as input and generates a down-sampled volume as output. The encoder begins with early convolutional layers with a kernel size of 3 and a stride of 1, which capture early semantic features. Subsequently, three down-sampling and one non-sampling convolutional modules learn semantic features at various resolution levels. Each convolutional module consists of four convolutional layers with a kernel size of 3 and a stride of 1, responsible for learning semantic features, followed by a down-sampling trilinear interpolation layer (with a ratio of 2) and a convolutional layer with a kernel size of 1 and a stride of 1 for down-sampling the features. In the non-sampling module, the interpolation and the final 1-kernel convolutional layer are removed. Shortcut connections (29) are employed between the input of the first and third convolutional layers, as well as between the output of the second and fourth layers, enhancing the learning process. The encoder features are then fed into the transformer modules, which are two sequential convolutional modules with a kernel size of 3 and a stride of 1. The transformer refines the semantic features and transfers them to the decoder. The decoder, symmetric to the encoder, expands the features using one non-sampling and three up-sampling convolutional modules (with the same structure as the down-sampling modules, except for using up-sampling instead of down-sampling interpolation) and a final convolutional layer with a kernel size of 3 and a stride of 1 to reconstruct the volume. Instance normalization (30) and the sigmoid linear unit activation function (31) are applied after each convolutional layer except the last one (Tanh). Additionally, skip connections (32) are utilized to connect the corresponding modules at the same resolution levels between the encoder and decoder, enabling the network to capture hidden correlations in the data through hierarchical structures.
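A much-simplified PyTorch sketch of the encoder-decoder idea follows. The channel widths, depth, and module counts are illustrative only: the paper's network uses three down-sampling modules, four-layer convolutional modules, and transformer modules that are not reproduced here.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Sketch of one convolutional module: stacked 3x3x3 convs with a shortcut,
    optionally followed by x0.5 trilinear interpolation and a 1x1x1 conv."""
    def __init__(self, cin, cout, down=True):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(cin, cin, 3, 1, 1), nn.InstanceNorm3d(cin), nn.SiLU(),
            nn.Conv3d(cin, cin, 3, 1, 1), nn.InstanceNorm3d(cin), nn.SiLU(),
        )
        self.down = down
        self.proj = nn.Conv3d(cin, cout, 1, 1)       # 1-kernel channel change

    def forward(self, x):
        x = x + self.body(x)                         # shortcut connection
        if self.down:
            x = nn.functional.interpolate(x, scale_factor=0.5, mode="trilinear")
        return self.proj(x)

class ReconstructionNet(nn.Module):
    """Encoder-decoder sketch: binary surface volume in, CT-like volume out,
    with a skip connection between matching resolutions."""
    def __init__(self, c=8):
        super().__init__()
        self.stem = nn.Conv3d(1, c, 3, 1, 1)         # early convolutional layer
        self.enc1 = ConvModule(c, 2 * c, down=True)
        self.enc2 = ConvModule(2 * c, 2 * c, down=False)   # non-sampling module
        self.dec1 = ConvModule(2 * c, 2 * c, down=False)
        self.up = nn.Upsample(scale_factor=2, mode="trilinear")
        self.out = nn.Conv3d(2 * c + c, 1, 3, 1, 1)

    def forward(self, s):
        e0 = self.stem(s)
        e1 = self.enc1(e0)
        d = self.dec1(self.enc2(e1))
        d = self.up(d)
        d = torch.cat([d, e0], dim=1)                # skip connection
        return torch.tanh(self.out(d))               # Tanh output, as in the text
```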

The refinement network, depicted in Fig. 1, follows an encoder-decoder architecture. Its objective is to generate a full-resolution volume. Initially, the down-sampled generated volume is up-sampled by trilinear interpolation. The blurry volume then undergoes processing through a non-sampling convolutional residual module. It then passes through five sequential down-sampling residual convolution modules, followed by another non-sampling transformer module. These features are subsequently passed through five sequential up-sampling and one non-sampling residual modules, culminating in a final convolutional layer with a kernel size of 1 and a stride of 1 and a Tanh activation function. The non-sampling residual module has two paths: one path with two 3-kernel convolutional layers with strides of 1, and another path with a 1-kernel convolutional layer. The output of the first path is summed with that of the second path to create a residual connection that stabilizes the network. The down-sampling convolution/transformer module has a similar architecture to the non-sampling module, except that the stride of the first convolutional layer in each path is changed to 2 (for the first down-sampling module, the kernel size and stride size are (1,2,1)). The up-sampling module comprises a deconvolutional layer with a kernel size of 2 and a stride of 2 (for the first up-sampling module, the kernel size and stride size are (1,2,1)), followed by two convolutional layers with a kernel size of 3 and a stride of 1. The output of the deconvolutional layer is added elementwise to the output of the final convolutional layer. In addition, skip connections are employed between the modules of the encoder and decoder to improve the network's learning capability. Instance normalization and LeakyReLU (31) activation functions are applied after each convolutional layer except the last one.
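The two-path residual module can be sketched in PyTorch as follows; the channel widths are illustrative, and the paper's (1,2,1) kernel/stride variants for the first down-/up-sampling modules are omitted.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """Sketch of the non-/down-sampling residual module: a two-conv path summed
    with a 1x1x1-conv path; stride=2 in the first layer of both paths gives the
    down-sampling variant."""
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.path1 = nn.Sequential(
            nn.Conv3d(cin, cout, 3, stride, 1), nn.InstanceNorm3d(cout),
            nn.LeakyReLU(0.2),
            nn.Conv3d(cout, cout, 3, 1, 1), nn.InstanceNorm3d(cout),
            nn.LeakyReLU(0.2),
        )
        self.path2 = nn.Conv3d(cin, cout, 1, stride)   # projection shortcut

    def forward(self, x):
        return self.path1(x) + self.path2(x)           # stabilizing residual sum
```

The elementwise sum of the two paths is the residual connection the text describes as stabilizing the network.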

2.3. Reconstruction and verification networks’ optimization

The proposed reconstruction and verification networks, $G_R$ and $G_V$, are trained jointly for 600 epochs with a mini-batch size of 2. The two networks are trained under the principle of "conditional generation" (33): the inputs are concatenated with Gaussian noise $N \sim \mathcal{N}(0, 1)$, and the networks should generate volumes that minimize the intensity difference with the ground truths. Therefore, with the assistance of a discriminator $D_R$, consisting of five down-sampling residual convolutional blocks (each followed by instance normalization and LeakyReLU activations) and a final average pooling layer, the reconstruction and verification networks are trained with the objective

$$\mathcal{L} = \mathcal{L}_{vol} + \mathcal{L}_{surf} = \mathcal{L}_D(S \to V) + \mathcal{L}_G(S \to V) + \mathcal{L}_I(V \to V) + \mathcal{L}_C(V \to S \to V) + \mathcal{L}_D(V \to S) + \mathcal{L}_I(S \to S) + \mathcal{L}_C(S \to V \to S),$$

where $V$ and $S$ indicate the volume and surface.

Now, denoting "$\cdot$" as concatenation, the difference losses are

$$\mathcal{L}_D(S \to V) = 100 \cdot \mathrm{MAE}\big(G_R(S \cdot N),\, V\big) \tag{1}$$

$$\mathcal{L}_D(V \to S) = 100 \cdot \mathrm{DICE}\big(G_V(V \cdot N),\, S\big) \tag{2}$$

and the generative adversarial network (GAN) loss for enhancing the visual appearance of the reconstructed volume is

$$\mathcal{L}_G(S \to V) = -D_R\big(S \cdot G_R(S \cdot N)\big) \tag{3}$$

and the identity losses for preserving structural details are

$$\mathcal{L}_I(V \to V) = 5 \cdot \mathrm{MAE}\big(G_R(V \cdot N),\, V\big) \tag{4}$$

$$\mathcal{L}_I(S \to S) = 5 \cdot \mathrm{DICE}\big(G_V(S \cdot N),\, S\big) \tag{5}$$

and the cycle-consistency losses for preserving overall details are

$$\mathcal{L}_C(V \to S \to V) = 20 \cdot \mathrm{MAE}\big(G_R(G_V(V \cdot N) \cdot N),\, V\big) \tag{6}$$

$$\mathcal{L}_C(S \to V \to S) = 20 \cdot \mathrm{DICE}\big(G_V(G_R(S \cdot N) \cdot N),\, S\big) \tag{7}$$

where MAE indicates the mean absolute error and DICE indicates the Dice score (32). In addition, the discriminator is also conditionally optimized to ensure that it can provide correct visual-appearance information to the reconstruction network:

$$\mathcal{L}_{D_R} = D_R\big(S \cdot G_R(S \cdot N)\big) - D_R(S \cdot V) + 10 \cdot \Big(\big\|\nabla D_R\big(S \cdot (\epsilon V + (1-\epsilon)\, G_R(S \cdot N))\big)\big\|_2 - 1\Big)^2 \tag{8}$$

where $\nabla$ indicates the network gradients with respect to the corresponding inputs, and $\epsilon$ is a random number sampled from a beta distribution with both shape parameters equal to 0.2.

In training, the reconstruction and verification networks and the discriminator are optimized alternately for each batch of data, with the generator objective $\mathcal{L}$ applied before the discriminator objective $\mathcal{L}_{D_R}$. An AdamW optimizer (34) optimizes the networks with a learning rate of 10−4. After every ten training epochs, the reconstruction network was evaluated on the validation dataset to monitor model performance on unseen data. Once training is finished, only the reconstruction network is needed to translate surface to volume.
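A numerical sketch of the weighted generator objective in Eqs. (1)-(7) follows. The weights (100, 5, 20) come from the text; we read the DICE term as a Dice loss (1 − Dice overlap), and the argument names are our own. The adversarial and gradient-penalty terms require network outputs and autograd, so only the discriminator score enters here as a scalar.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two volumes."""
    return float(np.mean(np.abs(a - b)))

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss (1 - Dice overlap) for binary surface volumes
    (our reading of the DICE term; the paper says 'Dice score')."""
    inter = np.sum(pred * target)
    return float(1.0 - 2.0 * inter / (np.sum(pred) + np.sum(target) + eps))

def generator_losses(v_from_s, v, s_from_v, s, v_cycle, s_cycle,
                     v_identity, s_identity, d_score):
    """Weighted sum of Eqs. (1)-(7); d_score stands in for D_R(S . G_R(S . N))."""
    return (100 * mae(v_from_s, v)            # L_D(S->V), Eq. (1)
            + 100 * dice_loss(s_from_v, s)    # L_D(V->S), Eq. (2)
            - d_score                         # L_G(S->V), Eq. (3)
            + 5 * mae(v_identity, v)          # L_I(V->V), Eq. (4)
            + 5 * dice_loss(s_identity, s)    # L_I(S->S), Eq. (5)
            + 20 * mae(v_cycle, v)            # L_C(V->S->V), Eq. (6)
            + 20 * dice_loss(s_cycle, s))     # L_C(S->V->S), Eq. (7)
```

When every generated quantity matches its ground truth and the discriminator score is zero, the objective is approximately zero, as expected.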

2.4. Refinement network’s optimization

After training the reconstruction/verification system over the whole dataset, we obtain the down-sampled reconstructed volumes from the surfaces via the reconstruction network. The reconstructed volumes are then up-sampled to blurry, intermediate volumes $V_I$ using trilinear interpolation. In each epoch, we unconditionally (without concatenating a Normal noise) optimize the refinement network $R_F$ to refine the intermediate volumes $V_I$ toward the ground truth $V_{I,truth}$:

$$\mathcal{L}_{refine} = \mathrm{MAE}\big(R_F(V_I),\, V_{I,truth}\big) \tag{9}$$

We trained the refinement stage over the dataset, which is shuffled, with a mini-batch size of 1. The AdamW optimizer optimizes the refinement network with a learning rate of 10−4. The refinement network was evaluated every ten epochs, and the model at the epoch with the best validation accuracy was saved as the final model.

2.5. Evaluation

We employed six distinct metrics to evaluate under which conditions the proposed framework can generate volumetric images with minimal inaccuracies from surface images. The mean absolute error (MAE) and peak signal-to-noise ratio (PSNR) metrics evaluate the quality of the reconstructed volumetric images, focusing on aspects such as noise level, contrast, and CT numbers. The structural similarity index measure (SSIM) (35) assesses the structural congruence between synthetic and original simulation CT images, reflecting anatomical fidelity. These metrics were used to perform region of interest (ROI)-based assessments for organs at risk. To explore the predictive capability of the proposed framework, we included an additional 6 patients (Table A2), each with an initial 4DCT image set and 2 re-simulated 4DCT image sets. The second and third 4DCT image sets were acquired on average 18.0±5.5 and 36.3±9.3 days after the initial 4DCT scan. We used all phases of the initial 4DCT to train the patient-specific model and tested the model on the two re-simulated 4DCTs acquired on different dates.

For tumor tracking evaluation, we used RayStation 2023B to manually contour the gross tumor volume (GTV) on the ground truth and predicted CT. We computed the GTV center of mass (COM) distances between the ground truth and predicted CT to investigate the capability of the proposed framework for potential tumor tracking. To better understand the network's capability to create high-resolution CT volumes, we performed a correlation analysis between the CT reconstruction accuracy, quantified by MAE, and the body motion observed in the 4DCT scans. We used Gaussian process regression (GPR) (36) with an adaptive linear/quadratic kernel to characterize the underlying correlations between body motion and reconstruction accuracy, and a 95% confidence interval was instituted to assess the precision of the regression analysis. Additionally, we utilized the Dice similarity coefficient (DSC) to assess the similarity between the tumors in the predicted and ground truth CT images, and we reported the 95th-percentile Hausdorff distance (HD95) to quantify how closely the predicted tumor boundaries match the actual boundaries. Furthermore, we defined motion correlation as the geometric mean of the Pearson correlation coefficients in the left-right, anterior-posterior, and superior-inferior directions of the COM derived from the GTV contours in the ground truth and generated CT images.
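A few of the evaluation metrics above can be sketched as follows. The PSNR data range and the default voxel spacing are our assumptions; SSIM and HD95 are omitted for brevity.

```python
import numpy as np

def psnr(pred, gt, data_range=2000.0):
    """PSNR in dB; the HU data range of 2000 is an assumed window (ours)."""
    mse = np.mean((pred - gt) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))

def dice_coefficient(a, b):
    """Dice similarity coefficient (DSC) between two binary masks."""
    return float(2 * np.sum(a & b) / (np.sum(a) + np.sum(b)))

def com_distance(mask_a, mask_b, spacing=(2.0, 0.9756, 0.9756)):
    """GTV center-of-mass distance in mm, given (z, y, x) voxel spacing."""
    ca = np.array(np.nonzero(mask_a)).mean(axis=1) * np.array(spacing)
    cb = np.array(np.nonzero(mask_b)).mean(axis=1) * np.array(spacing)
    return float(np.linalg.norm(ca - cb))
```

For example, a one-voxel in-plane shift of a GTV mask yields a COM distance equal to the in-plane voxel spacing.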

In our current research, we focused on two motion metrics that are paramount for the GPR analysis. All quantitative values were normalized to ensure the robustness of the analysis. The first metric, the lung volume change, "Lung Volume Variation (MAX-MIN)" ($\Delta V_{Lung}$), is the absolute difference between the normalized maximum and minimum lung volumes observed across the respiratory phases; it captures the amplitude of lung volume fluctuation and serves as a marker for the maximum disparity in lung size during respiration. The second metric, the lung volume variation, is the standard deviation ($\sigma$) of the normalized lung volume ($V_{Lung}$) across all observed respiratory phases; it is particularly insightful for understanding the variability and dispersion in lung volume. While the lung volume change offers a snapshot of peak volume changes, the lung volume variation provides a holistic perspective on the inherent variability in lung volume throughout the respiratory cycle.
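The two motion metrics can be sketched as follows; normalizing by the maximum lung volume is our assumption, as the text only states that values were normalized.

```python
import numpy as np

def motion_metrics(lung_volumes):
    """Reduce the lung volumes over the 10 respiratory phases to the two
    motion metrics: (max - min) of the normalized volumes and their standard
    deviation across phases."""
    v = np.asarray(lung_volumes, dtype=float)
    v_norm = v / v.max()                           # assumed normalization (ours)
    delta = float(v_norm.max() - v_norm.min())     # Lung Volume Variation (MAX-MIN)
    sigma = float(v_norm.std())                    # dispersion across phases
    return delta, sigma
```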

3. Results

Among the 50 patients, Table 1 indicates that the reconstructed CT volumes achieve an average MAE, PSNR, and SSIM of 36.2±7.0 HU, 36.4±1.9 dB, and 0.94±0.02 for the body contour. For the re-simulated 4DCT patients, both image sets achieve comparable MAE, PSNR, and SSIM, with differences of 1.5 HU, 0.5 dB, and 0.02 in the GTV contours. Fig. 2 shows the reconstructed volumetric images and ground truth CT in transverse views for cases from the different groups, along with comparisons of SSIM, difference maps, and CT-number line profiles between synthetic images and ground truth. Fig. 3 illustrates the sagittal views of the reconstructed volumetric images, together with the ground truth CT and histogram comparisons of CT numbers between reconstructed images and ground truth. All evaluation metrics indicate the model's potential to generate volumetric images comparable to 3D CT images acquired from treatment planning CT scanners.

Table 1 |. Evaluation of the generated volumetric images using surface images from 50 patients and additional 6 patients with re-simulated 4DCT.

The comparisons included various regions of interest (ROI) such as body, organs at risk, and gross tumor volume (GTV) to evaluate the global and local image quality.

ROI            MAE (HU) (↓)   PSNR (dB) (↑)   SSIM (↑)

4DCT
  GTV          51.8±29.6      35.5±5.0        0.88±0.09
  Heart        43.7±14.0      34.5±3.8        0.90±0.04
  Lung         41.2±17.1      36.1±6.8        0.92±0.04
  Esophagus    41.5±13.0      35.3±3.8        0.92±0.04
  Spinal cord  40.8±17.5      37.5±7.4        0.93±0.05
  Body         36.2±7.0       36.4±1.9        0.94±0.02

Re-simulated 4DCT Set 1
  GTV          57.0±13.8      34.0±1.5        0.87±0.05
  Body         38.8±4.2       35.7±1.4        0.93±0.01

Re-simulated 4DCT Set 2
  GTV          55.5±13.0      34.5±1.3        0.89±0.04
  Body         37.8±4.0       36.3±1.2        0.94±0.01

Fig. 2 |. Examples of predicted volumetric images from each group.


The transverse views of the ground truth and predicted CT are displayed. The presented cases #1 to #6 are ordered from the highest to the lowest SSIM. The evaluation metrics include the structural similarity index measure (SSIM), relative difference maps, and line profiles. The horizontal solid and oblique dotted lines on the ground truth images indicate the locations of the profile comparisons.

Fig. 3 |. Examples of predicted volumetric images from each group with histogram distributions of CT numbers.


The sagittal views of the ground truth and predicted CT are displayed. The presented cases #1 to #6 are ordered from the highest to the lowest SSIM.

Fig. 4(a) shows the lung volume variation between inhale and exhale phases. Fig. 4(b1)-(b2) depicts the correlation plots showing the trendline and 95% confidence intervals between the motion and the reconstruction accuracy based on MAE. The GPR between the lung volume change and MAE demonstrates a confidence interval of ±13.38 HU, and the GPR between the lung volume variation and MAE demonstrates a confidence interval of ±13.31 HU. Intriguingly, both parameters have a near-linear positive correlation with MAE, signifying a roughly linear detrimental effect on the accuracy of our reconstruction model. The confidence intervals for these parameters align well with those of previously examined variables, bolstering the consistency and reliability of the model across these motion characteristics. These correlations imply that minor alterations in either variable could lead to corresponding shifts in reconstruction accuracy. Fig. 4(c1)-(d3) depict the GTV COM distances, DSC, and HD95 between the ground truth and predicted CT for all 50 patients and the 6 additional patients with re-simulated 4DCT image sets. Table 2 summarizes the motion evaluation metrics based on GTV contour evaluation for all patients. The GTV contours from all generated 4DCT sets achieve a mean DSC above 0.8. For the 50 patients with a single 4DCT image set, the mean relative GTV COM distance and motion correlation are 0.20±0.10 and 0.83±0.13. For the patients with re-simulated 4DCT, the mean GTV COM distances are 2.33±0.49 and 2.16±0.45 mm, and the mean motion correlations are 0.79±0.09 and 0.79±0.07, respectively. Figure B1 (Appendix B) shows the images for the tumor motion correlation analysis.

Fig. 4 |. The performance of the surface-to-volume image reconstruction network changes with motion metrics.


a) Visualization of the lung volume variation due to respiratory motion across two different phases. b1-b2) Population distribution (red dots), trendlines from the Gaussian process regression (GPR) (blue), and the 95% confidence interval for reconstruction accuracy (MAE) on the y-axis, with motion characteristics on the x-axis; each panel corresponds to a specific motion characteristic. c1) Histogram of the GTV COM distances and relative GTV COM distances for all 50 patients. c2-c3) DSC and HD95 between ground truth and predicted CT. d1) Histogram of GTV COM distances and relative GTV COM distances for the 6 patients with two re-simulated 4DCT image sets, where A-F, 1, and 2 on the x-axis denote patients 1–6, the first 4DCT set, and the second 4DCT set. d2-d3) DSC and HD95 between ground truth and predicted CT.

Table 2 |. Tumor motion tracking evaluation in generated volumetric images using surface images from 50 patients and additional 6 patients with re-simulated 4DCT.

The comparisons included GTV COM distances, relative GTV COM distances, DSC, HD95, and motion correlation for the tumor in the ground truth and predicted CT.

GTV COM Distance (mm) (↓) Relative GTV COM Distance (↓) DSC (↑) HD95 (mm) (↓) Motion Correlation (↑)
4DCT 1.72±0.87 0.20±0.10 0.81±0.07 2.50±1.03 0.83±0.13
Re-simulated 4DCT Set 1 2.23±0.49 0.24±0.05 0.80±0.03 2.54±0.68 0.79±0.09
Re-simulated 4DCT Set 2 2.16±0.45 0.23±0.04 0.81±0.02 2.40±0.60 0.79±0.07

4. Discussion

We validated the proposed framework within the context of lung cancer patients, utilizing 4DCT images. This framework generates synthetic CT images without any radiation exposure, featuring detailed anatomical information crucial for lesion localization. This capability has the potential to facilitate real-time image-guided radiation therapy. Each patient benefits from a unique, individually trained model. The training process can seamlessly occur in the background during the treatment planning stage. Once the model is trained, it can be effectively employed with surface imaging devices for real-time prediction of 3D patient anatomy. This aids in patient setup, verification, and continuous online monitoring within the existing workflow for both photon and proton radiation therapy. Furthermore, the framework’s application can extend to real-time imaging for radiosurgery, interventional procedures, or ultra-high dose rate FLASH radiotherapy. Integration of this method into these clinical procedures may lead to a significant reduction in imaging dose exposure to meet the goal of radiation protection and real-time tumor tracking.

Table 1 shows good agreement between the proposed data-driven framework and the ground truth for ROI-based accuracy across anatomical regions. The esophagus, heart, lung, and spinal cord regions demonstrate good reconstruction fidelity, with metrics indicating a balance between error tolerance and image quality. The GTV region presents the greatest challenge, reflected in higher variability and slightly lower pattern congruence. Despite these challenges, structural and contrast similarity, as measured by SSIM, remains consistently high across all regions, affirming the method's reliability in preserving critical CT features. Pattern alignment, crucial for accurate reconstruction, is strongest in the lung and spinal cord regions, as evidenced by the NCC scores, suggesting precise alignment with the actual images. Visual realism, evaluated through FID and LPIPS, indicates a high degree of perceptual quality, particularly in the lung region, which achieves notably lower scores, indicating closer agreement with human visual perception.
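For concreteness, the ROI-restricted error and pattern-alignment metrics referenced above (MAE and NCC) have simple definitions; the following is an illustrative numpy sketch, assuming CT volumes and ROI masks are numpy arrays, and is not the authors' evaluation code:

```python
import numpy as np

def roi_mae(gt, pred, mask):
    """Mean absolute error (in HU) restricted to a binary ROI mask."""
    return float(np.mean(np.abs(gt[mask] - pred[mask])))

def roi_ncc(gt, pred, mask):
    """Normalized cross-correlation of voxel intensities within a binary ROI mask."""
    a = gt[mask].astype(float)
    b = pred[mask].astype(float)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

SSIM, FID, and LPIPS depend on windowed statistics or pretrained networks and are typically computed with libraries such as scikit-image or the lpips package rather than re-implemented.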

The results of this study have significant implications for radiotherapy, which relies on precise target delineation to treat tumors effectively while safeguarding nearby organs at risk. First, our findings indicate that the developed framework can anticipate the likely bounds of reconstruction accuracy, allowing for more informed treatment planning prior to synthesis. Second, they show that variations in patient motion across different phases of 4DCT can degrade reconstruction fidelity, potentially compromising the effectiveness of established radiation therapy procedures. This underscores the importance of thorough assessment and precautionary measures when formulating radiation therapy strategies, especially for patients exhibiting pronounced motion characteristics, such as substantial lung volume changes across their CT volumes. There is therefore a clear need for further research aimed at mitigating the adverse effects of motion during both image reconstruction and treatment, to ensure more dependable and successful outcomes in radiation therapy.

A comprehensive analysis of motion, quantified by lung volume variation as illustrated in Fig. 4, reveals relationships between distinct motion characteristics and reconstruction accuracy. Notably, both motion characteristics exhibit a nearly linear positive correlation with MAE, i.e., a correspondingly linear negative impact on reconstruction accuracy. Moreover, Fig. 4(b1)-(b2) shows the uncertainty ranges associated with lung volume variation. Caution should be exercised when applying the proposed data-driven framework to synthesize volumetric images, as the results can be influenced by data quality issues such as irregular breathing patterns. These uncertainties demarcate the expected margin of error when using these characteristics to estimate the performance of the proposed model on new patients, defining the ranges within which the accuracy of synthetic CT volumes is expected to fall.
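As an illustration of how such a trendline with a 95 percent confidence band is obtained, a minimal Gaussian process regression in plain numpy is sketched below; the RBF kernel, length scale, and noise level here are assumed for demonstration and are not the hyperparameters fitted in the study:

```python
import numpy as np

def rbf(x1, x2, length=1.0, sigma=1.0):
    """Squared-exponential (RBF) kernel between two 1D input arrays."""
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-4, length=1.0):
    """GP posterior mean and 95% band (mean ± 1.96·std) at x_test."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train, length)
    Kss = rbf(x_test, x_test, length)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    std = np.sqrt(np.maximum(var, 0.0))
    return mean, mean - 1.96 * std, mean + 1.96 * std
```

In practice, the same computation is available from scikit-learn's GaussianProcessRegressor with hyperparameters optimized by marginal likelihood; here the x-axis would be a motion characteristic (e.g., lung volume variation) and the y-axis the per-patient MAE.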

Fig. 4(c1) shows that the average GTV COM distance is 1.72±0.87 mm for all 50 patients and 2.20±0.47 mm for the patients with re-simulated 4DCT. The average DSC is 0.81±0.06 for both re-simulated 4DCT datasets, indicating good agreement between the tumors in the generated CTs and the ground truth in both location and shape. Table 2 and Figure B1 show that the GTV motion correlation for the 4DCT data is 0.83±0.13, indicating highly consistent motion patterns. Similarly, the average over the two re-simulated 4DCT datasets shows a slightly lower but still strong motion correlation of 0.79±0.08. These motion correlation results confirm that the generated 4DCT accurately follows the true tumor motion: the tumors in the generated CTs not only maintain a small absolute distance from the true tumors but also move in the correct direction over the respiratory cycle.
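The tracking metrics in this comparison have simple definitions; a minimal sketch follows, assuming binary GTV masks stored as numpy arrays and, for the motion correlation, per-phase COM coordinates along one axis (a Pearson correlation is used here as one common choice; the paper's exact formulation may differ):

```python
import numpy as np

def com_distance_mm(mask_ref, mask_test, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance (mm) between the centers of mass of two binary masks."""
    c_ref = np.array([idx.mean() for idx in np.nonzero(mask_ref)])
    c_test = np.array([idx.mean() for idx in np.nonzero(mask_test)])
    return float(np.linalg.norm((c_ref - c_test) * np.asarray(spacing)))

def dice(mask_ref, mask_test):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(mask_ref, mask_test).sum()
    return float(2.0 * inter / (mask_ref.sum() + mask_test.sum()))

def motion_correlation(com_traj_ref, com_traj_test):
    """Pearson correlation between two per-phase COM trajectories (1D)."""
    a = np.asarray(com_traj_ref, dtype=float)
    b = np.asarray(com_traj_test, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])
```

Voxel spacing must be supplied to express the COM distance in millimeters; HD95 additionally requires surface distance transforms and is usually taken from an image-analysis library.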

The proposed surface-to-volume framework can assimilate all available, relevant, and adequately evaluated image data to enhance model performance. In cases where irregular patterns are detected in specific patients, additional pre-treatment quality assurance images, such as quality assurance CT scans and daily/weekly CBCT scans, can be used to adapt the model and account for any anatomical or respiratory changes that may occur during treatment. It is worth noting that we keep the model architecture simple for interpretability (37-39) and for inference efficiency. Although patient-specific training requires around 9 hours, it can be completed during a typical 5- or 10-day treatment planning and preparation care path before the initial treatment, since the 4DCT is acquired before this planning process. The model inference time is about 0.4 seconds on a single NVIDIA A100 GPU. Another challenge for potential clinical implementation is the requirement of an online training interface. Future investigations will focus on reducing the model inference time to within 100 milliseconds and simplifying the framework and model structures for potential clinical applications.

Surface imaging is an effective tool for patient positioning in specific clinical sites (40), and Leong et al (41) demonstrated that it can reduce the need for orthogonal kV imaging prior to CBCT. However, Leong et al (41) still incorporated CBCT into the patient setup workflow as a final step to determine the necessary positional adjustments before radiotherapy delivery. Lai et al (42) reported that only 35.71% of stereotactic radiosurgery cases achieved positional uncertainties of less than 0.5 mm when comparing surface imaging to CBCT. These findings highlight the critical role of volumetric imaging in verifying the treatment position before radiation delivery. Given that tumors in different lung locations can exhibit varying degrees of motion, the use of validated volumetric images ensures precise radiation beam targeting. Additionally, validated volumetric imaging enables online dose evaluation and online adaptive therapy. Future research will explore whether the proposed method can be extended beyond the current hypotheses.

As a preliminary feasibility study, our proposed method is presently restricted to patients with 4DCT images, which are typically obtained during simulation for lung and abdominal cancer cases. This limitation arises from the need for a representative dataset of patient motion to train the proposed model, i.e., conventional convolutional neural network-based models require the assumption of independent and identically distributed samples. Given the inherent temporal correlation and dependency among different respiratory phases, it is imperative to explore alternative network architectures that can seamlessly integrate into the proposed surface-to-volume framework. Such exploration is essential to identify more sophisticated network structures, such as recurrent GAN-type methods (43,44), capable of effectively processing and interpreting these time-sequenced respiratory data. In upcoming research, we plan to improve our methodology by incorporating advanced techniques such as denoising diffusion probabilistic models (45-47) to further enhance the accuracy of reconstructed volumes. This study serves as a proof of concept, demonstrating the feasibility of generating volumetric CT images from a patient's body surface image, achieved by simulating the surface image using the body contour from CT images. While the advantages of patient-specific training have been demonstrated, caution must be exercised because this training uses small datasets, and the model can potentially overfit to specific scenarios. For instance, the current work is limited by the two hypotheses stated in the Introduction. In future studies, we plan to assess our method in the context of broader motion management challenges in radiotherapy, including scenarios involving significant patient weight fluctuations, unexpected changes in internal anatomy, and variations in tumor size due to shrinkage or growth.

The current work is also limited by its use of simulated patient surface volumes, whereas conventional clinical surface imaging systems capture only a subarea of the body surface. To implement our method in clinical practice, it is crucial to extend the network to generate CT images from these partial surfaces. Advancing the model to achieve this capability is a key aspect of our future work toward validating the framework in real clinical settings. Future work will also use authentic surface image datasets to assess the proposed framework's effectiveness and applicability, including exploring the convergence of mesh resolution with respect to synthetic CT accuracy. These comprehensive evaluations will provide valuable insights and validate the practical utility of our approach in clinical applications.

5. Conclusion

Our study has demonstrated the potential of data-driven modeling for generating volumetric CT images by leveraging surface information together with patient-specific priors (4DCT). The developed surface-to-volume network establishes meaningful correlations between surface features and the concealed 3D patient anatomy, enabling the synthesis of patient-specific CT images. This approach presents a promising solution for motion management in radiotherapy, as it integrates available data without requiring rigid, first-principles modeling of organ motion.

Supplementary Material

MMC1

Acknowledgments

This research is supported in part by the National Institutes of Health under Award Numbers R01CA215718, R56EB033332, R01EB032680, and P30CA008748.

Footnotes

Competing interests

The authors declare that there are no competing interests.

Additional information

Emory IRB review board approval was obtained (IRB #114349), and informed consent was not required for this Health Insurance Portability and Accountability Act (HIPAA) compliant retrospective analysis.


Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • 1.Keall PJ, Mageras GS, Balter JM, et al. The management of respiratory motion in radiation oncology: Report of AAPM Task Group 76. Medical Physics 2006;33:3874–3900. [DOI] [PubMed] [Google Scholar]
  • 2.Liu G, Hu F, Ding X, et al. Simulation of dosimetry impact of 4dct uncertainty in 4d dose calculation for lung sbrt. Radiation Oncology 2019;14:1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Caillet V, Booth JT, Keall P. Igrt and motion management during lung sbrt delivery. Physica Medica 2017;44:113–122. [DOI] [PubMed] [Google Scholar]
  • 4.Bertholet J, Knopf A, Eiben B, et al. Real-time intrafraction motion monitoring in external beam radiotherapy. Physics in Medicine & Biology 2019;64:15TR01. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Torshabi AE, Pella A, Riboldi M, et al. Targeting accuracy in real-time tumor tracking via external surrogates: A comparative study. Technology in Cancer Research & Treatment 2010;9:551–561. [DOI] [PubMed] [Google Scholar]
  • 6.Ghorbanzadeh L, Torshabi AE, Nabipour JS, et al. Development of a synthetic adaptive neuro-fuzzy prediction model for tumor motion tracking in external radiotherapy by evaluating various data clustering algorithms. Technology in Cancer Research & Treatment 2016;15:334–347. [DOI] [PubMed] [Google Scholar]
  • 7.Teo TP, Ahmed SB, Kawalec P, et al. Feasibility of predicting tumor motion using online data acquired during treatment and a generalized neural network optimized with offline patient tumor trajectories. Medical Physics 2018;45:830–845. [DOI] [PubMed] [Google Scholar]
  • 8.Özbek Y, Bárdosi Z, Freysinger W. Respitrack: Patient-specific real-time respiratory tumor motion prediction using magnetic tracking. International Journal of Computer Assisted Radiology and Surgery 2020;15:953–962. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Zhou D, Nakamura M, Mukumoto N, et al. Development of ai-driven prediction models to realize real-time tumor tracking during radiotherapy. Radiation Oncology 2022;17:42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Zhao W, Shen L, Han B, et al. Markerless pancreatic tumor target localization enabled by deep learning. International Journal of Radiation Oncology* Biology* Physics 2019;105:432–439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.de Bruin K, Dahele M, Mostafavi H, et al. Markerless real-time 3-dimensional kv tracking of lung tumors during free breathing stereotactic radiation therapy. Advances in Radiation Oncology 2021;6:100705. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.He X, Cai W, Li F, et al. Decompose kv projection using neural network for improved motion tracking in paraspinal sbrt. Medical physics 2021;48:7590–7601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Dai Z, He Q, Zhu L, et al. Automatic prediction model for online diaphragm motion tracking based on optical surface monitoring by machine learning. Quantitative Imaging in Medicine and Surgery 2022;13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Shao H-C, Li Y, Wang J, et al. Real-time liver tumor localization via combined surface imaging and a single x-ray projection. Physics in Medicine & Biology 2023;68:065002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhou D, Nakamura M, Mukumoto N, et al. Feasibility study of deep learning-based markerless real-time lung tumor tracking with orthogonal x-ray projection images. Journal of Applied Clinical Medical Physics 2023;24:e13894. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Shen L, Zhao W, Xing L. Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nature Biomedical Engineering 2019;3:880–888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lei Y, Tian Z, Wang T, et al. Deep learning-based fast volumetric imaging using kv and mv projection images for lung cancer radiotherapy: A feasibility study. Medical Physics. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ying X, Guo H, Ma K, et al. X2ct-gan: Reconstructing ct from biplanar x-rays with generative adversarial networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. pp. 10611–10620. [Google Scholar]
  • 19.Montoya JC, Zhang C, Li Y, et al. Reconstruction of three-dimensional tomographic patient models for radiation dose modulation in ct from two scout views using deep learning. Medical Physics 2022;49:901–916. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Al-Hallaq HA, Cerviño L, Gutierrez AN, et al. Aapm task group report 302: Surface-guided radiotherapy. Medical Physics 2022;49:e82–e112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Gierga DP, Turcotte JC, Tong LW, et al. Analysis of setup uncertainties for extremity sarcoma patients using surface imaging. Practical Radiation Oncology 2014;4:261–266. [DOI] [PubMed] [Google Scholar]
  • 22.Stanley DN, McConnell KA, Kirby N, et al. Comparison of initial patient setup accuracy between surface imaging and three point localization: A retrospective analysis. Journal of applied clinical medical physics 2017;18:58–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Walter F, Freislederer P, Belka C, et al. Evaluation of daily patient positioning for radiotherapy with a commercial 3d surface-imaging system (catalyst™). Radiation oncology 2016;11:1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Carl G, Reitz D, Schönecker S, et al. Optical surface scanning for patient positioning in radiation therapy: A prospective analysis of 1902 fractions. Technology in cancer research & treatment 2018;17:1533033818806002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Gao Y, Liu R, Chang C-W, et al. A potential revolution in cancer treatment: A topical review of flash radiotherapy. Journal of Applied Clinical Medical Physics 2022;23:e13790. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Chang C-W, Dinh NT. Classification of machine learning frameworks for data-driven thermal fluid models. International Journal of Thermal Sciences 2019;135:559–579. [Google Scholar]
  • 27.Karniadakis GE, Kevrekidis IG, Lu L, et al. Physics-informed machine learning. Nature Reviews Physics 2021;3:422–440. [Google Scholar]
  • 28.Paszke A, Gross S, Massa F, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 2019. pp. 8026–8037. [Google Scholar]
  • 29.He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. pp. 770–778. [Google Scholar]
  • 30.Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: The missing ingredient for fast stylization 2016. [Google Scholar]
  • 31.Xu B, Wang N, Chen T, et al. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 2015. [Google Scholar]
  • 32.Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Cham. Springer International Publishing. 2015. pp. 234–241. [Google Scholar]
  • 33.Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 2014. [Google Scholar]
  • 34.Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 2017. [Google Scholar]
  • 35.Zhou W, Bovik AC, Sheikh HR, et al. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 2004;13:600–612. [DOI] [PubMed] [Google Scholar]
  • 36.Williams C, Rasmussen C. Gaussian processes for regression. Advances in neural information processing systems 1995;8. [Google Scholar]
  • 37.de Silva BM, Higdon DM, Brunton SL, et al. Discovery of physics from data: Universal laws and discrepancies. Frontiers in Artificial Intelligence 2020;3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Lusch B, Kutz JN, Brunton SL. Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 2018;9:4950. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Chang C-W, Gao Y, Wang T, et al. Dual-energy ct based mass density and relative stopping power estimation for proton therapy using physics-informed deep learning. Physics in Medicine & Biology 2022;67:115010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Batista V, Meyer J, Kügele M, et al. Clinical paradigms and challenges in surface guided radiation therapy: Where do we go from here? Radiotherapy and Oncology 2020;153:34–42. [DOI] [PubMed] [Google Scholar]
  • 41.Leong B, Padilla L. Impact of use of optical surface imaging on initial patient setup for stereotactic body radiotherapy treatments. Journal of Applied Clinical Medical Physics 2019;20:149–158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Lai JL, Liu SP, Jiang XX, et al. Can optical surface imaging replace non-coplanar cone-beam computed tomography for non-coplanar set-up verification in single-isocentre non-coplanar stereotactic radiosurgery and hypofractionated stereotactic radiotherapy for single and multiple brain metastases? Clinical Oncology 2023;35:e657–e665. [DOI] [PubMed] [Google Scholar]
  • 43.Mogren O. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv preprint arXiv:1611.09904 2016. [Google Scholar]
  • 44.Esteban C, Hyland SL, Rätsch G. Real-valued (medical) time series generation with recurrent conditional gans. arXiv preprint arXiv:1706.02633 2017. [Google Scholar]
  • 45.Peng J, Qiu RLJ, Wynne JF, et al. Cbct-based synthetic ct image generation using conditional denoising diffusion probabilistic model. Medical Physics 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Sohl-Dickstein J, Weiss E, Maheswaranathan N, et al. Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning, PMLR; 2015. pp. 2256–2265. [Google Scholar]
  • 47.Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 2020;33:6840–6851. [Google Scholar]
