Med Phys. 2025 Jun 4;52(7):e17911. doi: 10.1002/mp.17911

Deep learning‐based cone‐beam CT motion compensation with single‐view temporal resolution

Joscha Maier 1, Stefan Sawall 1,2, Marcel Arheit 3, Pascal Paysan 3, Marc Kachelrieß 1,2
PMCID: PMC12258004  PMID: 40467957

Abstract

Background

Cone-beam CT (CBCT) scans that are affected by motion often require motion compensation to reduce artifacts or to reconstruct 4D (3D+time) representations of the patient. To do so, most existing approaches rely on some sort of gating strategy that sorts the acquired projections into motion bins. Subsequently, these bins can be reconstructed individually before further post-processing may be applied to improve image quality. While this concept is useful for periodic motion patterns, it fails in the case of non-periodic motion as observed, for example, in irregularly breathing patients.

Purpose

To address this issue and to increase temporal resolution, we propose the deep single angle‐based motion compensation (SAMoCo).

Methods

To avoid gating, and therefore its downsides, the deep SAMoCo trains a U‐net‐like network to predict displacement vector fields (DVFs) representing the motion that occurred between any two given time points of the scan. To do so, 4D clinical CT scans are used to simulate 4D CBCT scans as well as the corresponding ground truth DVFs that map between the different motion states of the scan. The network is then trained to predict these DVFs as a function of the respective projection views and an initial 3D reconstruction. Once the network is trained, an arbitrary motion state corresponding to a certain projection view of the scan can be recovered by estimating DVFs from any other state or view and by considering them during reconstruction.

Results

Applied to 4D CBCT simulations of breathing patients, the deep SAMoCo provides high‐quality reconstructions for periodic and non‐periodic motion. Here, the deviations with respect to the ground truth are less than 27 HU on average, while respiratory motion, or the diaphragm position, can be resolved with an accuracy of about 0.75 mm. Similar results were obtained for real measurements where a high correlation with external motion monitoring signals could be observed, even in patients with highly irregular respiration.

Conclusions

The ability to estimate DVFs as a function of two arbitrary projection views and an initial 3D reconstruction makes deep SAMoCo applicable to arbitrary motion patterns with single-view temporal resolution. Therefore, the deep SAMoCo is particularly useful for cases with unsteady breathing, compensation of residual motion during a breath-hold scan, or scans with fast gantry rotation times in which the data acquisition only covers a very limited number of breathing cycles. Furthermore, not requiring gating signals may simplify the clinical workflow and reduce the time needed for patient preparation.

Keywords: 4D CBCT, deep learning, motion compensation

1. INTRODUCTION

In recent years, cone-beam computed tomography (CBCT) has found wide application in various fields of medical imaging, including dentistry, 1 orthopedics, 2 interventional radiology, 3 and image-guided radiotherapy. 4 Among the main reasons for this trend are the high flexibility of CBCT, its high spatial resolution, comparably low cost, and constantly improving image quality. However, on the downside, CBCT's poor temporal resolution poses a major challenge for any application dealing with patient motion. Since the gantry rotation time is often as long as 60 s, the acquisition time is long compared to the time scale of typical motion patterns. As a result, anatomical regions affected by motion appear blurred or distorted in the corresponding CBCT reconstructions. Therefore, several approaches have been proposed to address this issue.

Considering non-periodic motion such as involuntary muscle motion, twitching, or swallowing, existing approaches are designed to provide a single artifact-free volume. This is typically achieved by performing some sort of warped backprojection that compensates for the present motion. In this context, different strategies have been proposed to derive the motion estimate according to which the backprojection matrices are modified. These strategies include the use of fiducial markers, 5 the use of projection domain consistency conditions, 6 2D/3D registration to a motion-free prior volume, 7 , 8 iterative re-projection schemes, 9 as well as the minimization of motion artifact metrics. 10 , 11

Applications dealing with cardiac or respiratory motion, on the other hand, rather rely on 4D (3D + time) reconstructions, that is, temporal sequences of 3D reconstructions representing the patient in consecutive states of the motion cycle. Here, these reconstructions typically rely on some sort of retrospective gating strategy. For that purpose, a motion surrogate signal, such as the displacement of a breathing belt (respiratory motion) or an ECG (cardiac motion), is acquired along with the CBCT scan. In that way, any x‐ray projection of the CBCT scan can be assigned to a certain motion phase. Sorting the projections according to their phase into different motion bins (referred to as gating) and reconstructing them separately yields the desired time‐resolved representation of the motion cycle. 12 , 13 However, as each bin only uses a subset of the acquired projections, the gated reconstructions may show strong angular undersampling artifacts. One option to address this issue is to use hardware‐based approaches which adapt the gantry rotation speed and the acquisition of projection images to the patient's respiration. 14 , 15 , 16

Other software-based approaches rather make use of 4D reconstruction algorithms or rely on motion compensation strategies. The former are usually based on iterative reconstruction schemes that incorporate some sort of spatio-temporal regularization, 17 , 18 , 19 , 20 , 21 make use of 4D image filter operations, 22 , 23 or employ deep learning to cope with the limited amount of data. 24 , 25 , 26 Motion compensation, on the other hand, aims to reconstruct any motion phase from all available data. In particular, this is realized by estimating displacement vector fields (DVFs) that model inter-phase motion. Using these DVFs, any phase can be deformed into an arbitrary reference phase such that the sum of the deformed phases yields the final motion-compensated reconstruction. While early motion compensation approaches required properly sampled prior CT scans to estimate the DVFs, 27 , 28 , 29 later approaches were optimized to estimate DVFs directly from the gated reconstructions using additional regularization techniques, 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 or even from an initial 3D reconstruction. 38

However, despite recent advances, there are still several downsides of current approaches. Most of them are directly linked to the gating process which implicitly assumes a periodicity and cannot be applied to irregular motion patterns. Furthermore, gating requires additional patient preparation and leads to a loss of temporal resolution by sorting the projections into a limited number of phase bins. Gating‐free approaches, on the other hand, are currently not designed to provide 4D reconstructions but only a single artifact‐free reconstruction.

Therefore, we propose a novel single angle-based motion compensation (SAMoCo) approach to overcome these drawbacks. The general idea of the proposed approach is inspired by the partial angle-based motion compensation (PAMoCo), our prior work on coronary artery motion compensation in cardiac CT. 39 , 40 To increase temporal resolution, the PAMoCo divides the scan range into several consecutive partial angles, reconstructs them separately within a small patch around the coronary artery, deforms them to the central motion state of the scan according to a motion model, and finally sums the deformed images to obtain a motion-compensated reconstruction. Here, we adapt this strategy to 4D CBCT. In particular, this includes the backprojection and deformation of single projection views instead of multiple views to account for CBCT's low gantry rotation speed, the estimation of dense, unconstrained vector fields for the entire field of view instead of a single vector at the coronary artery, and the estimation of vector fields between arbitrary motion states. In that way, a distinct reconstruction can be provided for any motion state that occurred during the CBCT scan. Since motion is estimated without any further assumptions or constraints on the underlying motion, the proposed approach applies to periodic and non-periodic motion patterns and provides single-view temporal resolution for 3D and 4D applications.

2. MATERIALS AND METHODS

2.1. SAMoCo

2.1.1. General concept

In the following, a patient is described in terms of the time-dependent distribution of the attenuation coefficient $f(r,t)$. During a CBCT scan, x-ray projections of $f(r,t)$ are acquired at $N$ successive angles $\vartheta \in \{\vartheta(t_1), \ldots, \vartheta(t_N)\}$ that correspond to the time points $t \in \{t_1, \ldots, t_N\}$. Using the short form $f_n \equiv f(r,t)|_{t=t_n}$, the projection $p_n$ corresponding to the $n$th view (since we assume every view to have an individual motion state, $n$ also refers to the motion state) is given as:

$$p_n = X_n f_n, \qquad (1)$$

where $X$ denotes the x-ray transform operator and $X_n$ its $n$th component corresponding to the $n$th view of the scan. Since different motion states in the set of projections $\{p_n\}$ are superimposed during the CT reconstruction process, the corresponding image (i.e., $X^{-1} p$) suffers from motion artifacts. To address this issue, existing approaches typically sort projections corresponding to similar motion states into $K$ motion bins that are reconstructed independently to derive a set of gated reconstructions:

$$f_k = \sum_{n=1}^{N} X_n^{-1} \, \Pi_{kn} \, p_n. \qquad (2)$$

Here, $X_n^{-1}$ is the $n$th component of the reconstruction operator and $\Pi_{kn}$ is a gating function that equals one if the motion states $f_k$ and $f_n$ fulfill a certain similarity criterion and zero otherwise.

Current motion compensation approaches operate on these gated reconstructions by estimating motion in terms of a deformable transformation $T_l^k : r \mapsto r + u_l^k(r)$, which consists of the identity mapping and a DVF $u_l^k$ such that:

$$f_k = T_l^k f_l = f_l(r + u_l^k). \qquad (3)$$

Accordingly, the motion‐compensated reconstruction is given as

$$f_{k,\mathrm{MoCo}} = \sum_{l=1}^{K} T_l^k f_l. \qquad (4)$$

In order to achieve single-view resolution, the proposed SAMoCo follows a similar strategy but drops the gating to avoid the downsides discussed in Section 1. Instead of applying transformations $T_l^k$ that map between gated reconstructions, we apply transformations $T_i^n$ to the (filtered) backprojections of single views ($X_i^{-1} p_i$), which we refer to as single-angle reconstructions (SARs). Thus, the SAMoCo reconstruction is given as:

$$f_{n,\mathrm{SAMoCo}} = \sum_{i=1}^{N} T_i^n X_i^{-1} p_i. \qquad (5)$$

In that way, any motion state of the scan can be reconstructed by setting the reference index $n \in \{1, \ldots, N\}$ accordingly.
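To make the summation in Equation (5) concrete, the following minimal Python sketch spells out the SAMoCo reconstruction loop. The callables fbp_single_view, estimate_dvf, and warp are hypothetical placeholders for the single-view filtered backprojection $X_i^{-1}$, the trained DVF estimator, and the spatial transformer; they are not names from the paper.

```python
def samoco_reconstruction(projections, n_ref, fbp_single_view, estimate_dvf, warp):
    """Sketch of Equation (5): sum of warped single-angle reconstructions (SARs).

    projections     : list of N projection views p_i
    n_ref           : index n of the reference motion state to reconstruct
    fbp_single_view : hypothetical operator X_i^{-1} (filtered backprojection of one view)
    estimate_dvf    : hypothetical DVF estimator returning u_i^n (the trained network)
    warp            : hypothetical spatial transformer applying T_i^n to a volume
    """
    recon = None
    for i, p_i in enumerate(projections):
        sar = fbp_single_view(p_i, view_index=i)  # single-angle reconstruction X_i^{-1} p_i
        dvf = estimate_dvf(i, n_ref)              # motion from state i to reference state n
        warped = warp(sar, dvf)                   # T_i^n applied to the SAR
        recon = warped if recon is None else recon + warped
    return recon
```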

2.1.2. Practical realization – deep SAMoCo

Implementing the motion compensation according to Equation (5) requires knowledge of the transformations $T_i^n$ that map motion states $f_i$ to motion states $f_n$. Here, we aim to determine these transformations as a function of the respective projections using a deep neural network, as shown in Figure 1. In the deep SAMoCo framework, the network is trained to learn the following mapping:

$$M_{\mathrm{SAMoCo}} : [g_i(p_i), g_n(p_n)] \mapsto u_i^n. \qquad (6)$$

Here, the function $g_i$ is designed to provide additional morphological information that is not present in $p_i$ or $p_n$. For that purpose, we chose $g_i$ to be the first update of an iterative reconstruction:

$$g_i : p_i \mapsto \bar{f} + \frac{1}{X_i^T 1} \, X_i^T \, \frac{p_i - X_i \bar{f}}{X_i 1}, \qquad (7)$$

where $\bar{f}$ is an initial (motion-blurred) reconstruction that uses all available data, that is

$$\bar{f} = \sum_{n=1}^{N} X_n^{-1} p_n. \qquad (8)$$

In that way, the temporal information contained in $p_i$ is combined with the tomographic information of $\bar{f}$ to establish a robust mapping. Finally, the $n$th motion state $f_n$ is reconstructed by estimating $N$ DVFs $\{u_i^n\}_{i \in \{1, \ldots, N\}}$ and by applying them according to Equation (5).
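As an illustration, Equation (7) can be written as a short function. The per-view operator object A_i with forward() and backward() methods is a hypothetical interface standing in for $X_i$ and $X_i^T$; the epsilon guard is our addition to avoid division by zero outside the support.

```python
import numpy as np

def modified_sar(p_i, f_bar, A_i, eps=1e-6):
    """Sketch of Equation (7): one SART-like update of the blurred volume f_bar.

    p_i   : measured projection of view i
    f_bar : initial motion-blurred reconstruction using all views (Equation 8)
    A_i   : hypothetical object exposing forward() (X_i) and backward() (X_i^T)
            for the i-th view
    """
    row_norm = A_i.forward(np.ones_like(f_bar))   # X_i 1  (ray sums through a unit volume)
    col_norm = A_i.backward(np.ones_like(p_i))    # X_i^T 1 (backprojection of a unit projection)
    residual = (p_i - A_i.forward(f_bar)) / (row_norm + eps)
    return f_bar + A_i.backward(residual) / (col_norm + eps)
```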

FIGURE 1.

Workflow of the proposed Deep SAMoCo approach to reconstruct the 1st motion state. Any other motion state j can be reconstructed in the same way by changing the input to the network accordingly. Note that a shifted detector setup is considered. Therefore, the single angle reconstructions only cover about half the image. SAMoCo, single angle‐based motion compensation.

Here, the deformation itself is realized via the spatial transformer function described in reference 41.

2.1.3. Deep SAMoCo network

Since it has shown good performance in other motion estimation tasks, 36 , 37 , 41 a U-Net-like architecture is used to predict the DVF (required by Equation 5) that maps a given motion state $f_i$ to a reference motion state $f_n$. 42 As $f_i$ and $f_n$ are to be estimated by the proposed approach, they are not available at inference time. Rather, the modified SARs, $g_i(p_i)$ and $g_n(p_n)$, are provided as a two-channel input to the network according to Equation (6).

The network itself consists of six stages with skip connections between the encoding and the decoding path. Each stage in the encoder and the decoder employs two convolutional layers (kernel size = 3 × 3 × 3) with a parametric rectified linear unit (PReLU) activation function. 43 The number of filters of the convolutional layers is increased by a factor of two in each stage starting with 8 filters in the first one. To reduce the spatial dimensions, the encoding path uses 2 × 2 × 2 max pooling while the decoding path applies 2 × 2 × 2 nearest neighbor upsampling to get back to the initial dimensions. Note that the final convolution only uses three filters (and a linear activation) such that the three channels of the output represent the three spatial dimensions of the DVF.
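A possible PyTorch realization of this architecture is sketched below. The stated design choices (six stages, 8 base filters doubling per stage, PReLU, 2 × 2 × 2 max pooling and nearest-neighbor upsampling, skip connections, linear three-channel output) are taken from the text; everything else, such as padding and the exact layer ordering, is an assumption.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3x3 convolutions, each followed by a PReLU activation.
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1), nn.PReLU(),
        nn.Conv3d(c_out, c_out, kernel_size=3, padding=1), nn.PReLU(),
    )

class SAMoCoUNet(nn.Module):
    """Sketch of the described U-Net-like DVF estimator (input dims must be
    divisible by 2^5 = 32; the resampled 320 x 256 x 160 grid satisfies this)."""

    def __init__(self, in_channels=2, base_filters=8, stages=6):
        super().__init__()
        chans = [base_filters * 2**s for s in range(stages)]   # 8, 16, ..., 256
        self.encoders = nn.ModuleList()
        c_prev = in_channels
        for c in chans:
            self.encoders.append(conv_block(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool3d(2)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.decoders = nn.ModuleList(
            conv_block(c_skip + c_up, c_skip)
            for c_skip, c_up in zip(chans[-2::-1], chans[:0:-1])
        )
        # Final convolution with 3 filters and linear activation: x/y/z of the DVF.
        self.head = nn.Conv3d(chans[0], 3, kernel_size=3, padding=1)

    def forward(self, x):
        skips = []
        for enc in self.encoders[:-1]:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.encoders[-1](x)                   # bottleneck stage
        for dec, skip in zip(self.decoders, reversed(skips)):
            x = self.up(x)
            x = dec(torch.cat([x, skip], dim=1))   # skip connection
        return self.head(x)
```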

2.1.4. Data generation and training

To train the SAMoCo approach in a supervised setup, paired data $\{(g_i(p_i), g_n(p_n)), u_i^n\}$ consisting of the modified SAR inputs $(g_i(p_i), g_n(p_n))$ and the corresponding ground truth vector field $u_i^n$ are required. Here, these data were generated using CBCT simulations based on a 4D clinical CT dataset. The dataset consists of respiratory-gated CT scans (acquired with a Philips Brilliance Big Bore CT) of 84 patients who were examined for clinical indications and who in part had lung tumors.

Here, each scan is reconstructed into 10 separate volumes $\hat{f}_i$ with $i \in \{1, \ldots, 10\}$, representing 10 successive motion phases of the respiratory cycle. Due to the high temporal resolution of clinical CT, it can be assumed that these reconstructions are quasi-static and free of motion artifacts. Therefore, they serve as prior volumes for the generation of paired training data.

Here, it has to be noted that we do not simulate CBCT scans which would include a continuous transition between these 10 reconstructions but just single samples of these motion states.

To do so, all reconstructions were resampled in a first step to a $320 \times 256 \times 160$ grid with an isotropic voxel size of 1.5 mm. In a second step, all possible DVFs $u_i^n$ between any two phases $i, n \in \{1, \ldots, 10\}$ were calculated for the resampled reconstructions via deformable image registration. For the purpose of this study, this was performed using VoxelMorph. 41 However, it has to be noted that the general concept does not rely on VoxelMorph in particular. Any other deformable image registration approach, for instance the demons 44 or the DEEDS 45 algorithm, could be used instead.

Finally, inputs to the network $(g_i(p_i), g_n(p_n))$ were generated by a forward projection in CBCT geometry, followed by a backprojection within the scope of the function $g_i$ as described by Equation (7). In any case, the projection operators were implemented to match those of the Varian TrueBeam system described in Section 2.3.3. With $N$ views per CBCT scan (where $N$ is typically on the order of several hundred) and 10 phases of the prior CT scans, there are $10N$ different realizations of $g_i(p_i)$ and thus $(10N)^2$ different input combinations per patient. Since a precalculation of all realizations would lead to memory issues, they are instead calculated on-the-fly during training by randomly sampling two phases, $i$ and $n$, as well as two view angles, $\vartheta_i$ and $\vartheta_n$.

In that way 1000 random samples were drawn for each patient per epoch. Using 68 of the 84 patients (average lung volume: 3.54 ± 0.96 L, average difference between inspiration and expiration: 0.40 ± 0.12 L), the approach was trained for 500 epochs on four NVIDIA RTX 3090 GPUs using an Adam optimizer, a batch size of four, a learning rate of 0.0001, and the mean squared error between prediction and ground truth DVF as loss function. The network realization that performed best on another eight independent validation patients was used for performance evaluation on the remaining eight test patients (average lung volume: 3.39 ± 0.91 L, average difference between inspiration and expiration: 0.35 ± 0.08 L), as described in Section 2.3.2.
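The on-the-fly sampling could look roughly as follows. This sketch reuses the modified_sar function from the sketch in Section 2.1.2; patient.phase, patient.f_bar, projector, and dvf_lookup are hypothetical accessors for the resampled phase volumes, the blurred reconstruction of Equation (8), the per-view projection operators, and the precomputed ground-truth DVFs.

```python
import random

def sample_training_pair(patient, projector, dvf_lookup, n_phases=10, n_views=657):
    """Draw one random (input, label) pair as described above (names hypothetical)."""
    i, n = random.randint(1, n_phases), random.randint(1, n_phases)  # phases
    vi, vn = random.randrange(n_views), random.randrange(n_views)    # view angles
    A_vi, A_vn = projector(vi), projector(vn)    # per-view forward/backprojectors
    p_i = A_vi.forward(patient.phase(i))         # Equation (1) for a single view
    p_n = A_vn.forward(patient.phase(n))
    inputs = (modified_sar(p_i, patient.f_bar, A_vi),   # g_i(p_i), Equation (7)
              modified_sar(p_n, patient.f_bar, A_vn))   # g_n(p_n)
    label = dvf_lookup(i, n)                     # precomputed ground-truth DVF u_i^n
    return inputs, label
```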

2.2. Residual artifact correction (RAC)

2.2.1. Problem formulation

Let us assume we have an object with different motion states $\{f_i\}_{i \in \{1, \ldots, N\}}$ and a set of transformations $T_i^1$ which map to the first motion state such that $f_1 = T_i^1 f_i$. In that case, the SAR of $f_1$ is given as:

$$X_i^{-1} X_i f_1 = X_i^{-1} X_i (T_i^1 f_i), \qquad (9)$$

where $X_i$ is the forward projection of the $i$th view and $X_i^{-1}$ the corresponding filtered backprojection (note that for a single view these operations do not cancel out). Summing up the SARs for all view angles yields:

$$f_1 = \sum_i X_i^{-1} X_i f_1 = \sum_i X_i^{-1} X_i (T_i^1 f_i). \qquad (10)$$

Within the deep SAMoCo framework, however, we do not have access to the $f_i$'s but only to their SARs $X_i^{-1} X_i f_i$. Therefore, we rather use the approximation $X_i^{-1} X_i (T_i^1 f_i) \approx T_i^1 (X_i^{-1} X_i f_i)$ and apply the transformation to the SARs instead. Thus, according to Equation (5), the deep SAMoCo for this example is given as

$$f_{1,\mathrm{SAMoCo}} = \sum_i T_i^1 (X_i^{-1} X_i f_i) \approx \sum_i X_i^{-1} X_i (T_i^1 f_i) = f_1. \qquad (11)$$

Due to this approximation, some residual artifacts remain in our deep SAMoCo reconstruction even if the transformations are exactly known.

2.2.2. Toy example

To illustrate the issue described in Section 2.2.1, Figure 2 provides the results of a toy example. Here, a cylindrical phantom containing the logo of our institution was scaled periodically to simulate motion‐corrupted projection data

$$p_i = X_i T_i f_D, \qquad (12)$$

with $f_D$ being the phantom and

$$T_i : r \mapsto \begin{pmatrix} 1 + 0.1 \cdot \sin(0.15 \cdot i) & 0 & 0 \\ 0 & 1 + 0.1 \cdot \sin(0.15 \cdot i) & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot r, \qquad (13)$$

representing the transformation that applies the periodic scaling in the axial plane. For our purpose, the motion frequency was chosen to correspond to a typical number of respiratory cycles during a 60 s CBCT scan. Due to the simplicity of $T_i$, the SAMoCo can be performed according to Equation (5) using the exact inverse of $T_i$. As shown in the right column of Figure 2, this yields an image similar to the ground truth but with residual streak artifacts.
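A minimal sketch of this toy experiment is given below, assuming hypothetical single-view projector operators; only the centered axial scaling of Equation (13) and its exact inverse are spelled out.

```python
import numpy as np
from scipy.ndimage import affine_transform

def axial_scale(volume, s):
    """Apply the diagonal scaling of Equation (13): factor s in x and y, unity in z.
    affine_transform maps output to input coordinates, so the inverse scale is passed;
    axes are assumed to be ordered (x, y, z)."""
    matrix = np.diag([1.0 / s, 1.0 / s, 1.0])
    center = (np.array(volume.shape) - 1) / 2
    offset = center - matrix @ center            # keep the scaling centered
    return affine_transform(volume, matrix, offset=offset, order=1)

def toy_samoco(phantom, forward_project_view, fbp_single_view, n_views=657):
    """Sketch of the toy experiment; the projector callables are hypothetical."""
    recon = np.zeros_like(phantom)
    for i in range(n_views):
        s = 1.0 + 0.1 * np.sin(0.15 * i)
        p_i = forward_project_view(axial_scale(phantom, s), i)   # Equation (12)
        sar = fbp_single_view(p_i, i)            # single-angle reconstruction
        recon += axial_scale(sar, 1.0 / s)       # exact inverse transform T_i^{-1}
    return recon
```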

FIGURE 2.

Toy example demonstrating the need for the proposed residual streak artifact correction. Motion‐corrupted projection data are generated by applying a view‐dependent axial scaling factor (1+0.1·sin(0.15·i)), with i denoting the view index, to the phantom prior to the forward projection. While a conventional 3D reconstruction shows severe motion artifacts (middle column), the SAMoCo using the known DVFs (right column) compensates for motion but contains residual streaks. DVF, displacement vector field; SAMoCo, single angle‐based motion compensation.

2.2.3. RAC network

To account for the residual streak artifacts $A \equiv A(r)$, a second network is trained to learn the following mapping $M_{\mathrm{RAC}}$:

$$M_{\mathrm{RAC}} : f_{n,\mathrm{SAMoCo}} \mapsto A, \quad \text{s.t.} \quad f_{n,\mathrm{SAMoCo}} + A = f_n. \qquad (14)$$

Here, the RAC network uses the same architecture as the SAMoCo network (see Section 2.1.3), except for the following two modifications: First, the RAC network uses a residual connection to estimate the artifacts $A$ according to Equation (14), and second, it has only a single input channel ($f_{n,\mathrm{SAMoCo}}$) and a single output channel ($A$).
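A sketch of this residual wrapper, reusing the hypothetical SAMoCoUNet class from the sketch in Section 2.1.3, might look as follows.

```python
import torch.nn as nn

class RACNet(nn.Module):
    """Sketch of the RAC network: same U-Net backbone as the SAMoCo network,
    but with one input channel, one output channel, and a residual connection
    implementing Equation (14)."""

    def __init__(self):
        super().__init__()
        self.backbone = SAMoCoUNet(in_channels=1)
        # Replace the 3-channel DVF head by a single-channel artifact head
        # (an assumption; the text only states one output channel).
        self.backbone.head = nn.Conv3d(8, 1, kernel_size=3, padding=1)

    def forward(self, f_samoco):
        artifact = self.backbone(f_samoco)   # predicted artifact image A
        return f_samoco + artifact           # corrected reconstruction f_n
```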

2.2.4. Data generation and training

The generation of training data for the RAC network is based on the 4DCT dataset described in Section 2.1.4. For each training example, a random sequence $(\hat{f}_{\nu_1}, \hat{f}_{\nu_2}, \ldots, \hat{f}_{\nu_N})$ was sampled, with $\nu_i \in \{1, \ldots, 10\}$ being a uniform random number and $N = 657$ corresponding to the number of views. Subsequently, projection data were generated according to Equation (1) and reconstructed using the SAMoCo approach according to Equation (5). The corresponding artifact-free labels were generated by performing an ideal motion compensation.

In total, 50 random sequences were simulated per patient. Here, 68 of the 84 patients were used for training, while 8 were used for validation and another 8 were reserved for testing. The network was trained for 300 epochs on an NVIDIA RTX 3090 GPU using an Adam optimizer, a batch size of one, a learning rate of 0.0001, and the mean squared error between the prediction and the ideal motion compensation as loss function.

2.3. Evaluation

2.3.1. Accuracy of DVFs

The accuracy of the DVFs predicted by the deep SAMoCo is evaluated by a comparison against the ground truth DVFs (see Section 2.1.4) of the test set. To do so, ten random view angles were sampled for each patient and each phase of the prior 4DCT to generate projections $p_i$ according to Equation (1) and modified SARs $g_i(p_i)$ according to Equation (7). Subsequently, the deep SAMoCo network was applied to predict DVFs according to Equation (6) between any combination of modified SARs.

To further quantify the quality of the DVFs, an anatomy‐specific evaluation was performed. For that purpose the CT reconstructions were segmented using an internal segmentation tool. As shown in Figure 3, each voxel of the patient was assigned exclusively to one of the following classes: lung, lung nodules, heart, ribs, diaphragm, remainder.

FIGURE 3.

Segmentation to perform an anatomy‐specific evaluation of the DVFs. Blue: lung, orange: lung nodules, green: heart, white: ribs, yellow: diaphragm, gray: remainder. DVF, displacement vector field.

Based on this segmentation, mean values of the predicted DVFs

$$\bar{u}_c = \frac{1}{K_c} \sum_{i=1}^{10} \sum_{n=1}^{10} \sum_{\nu_i=1}^{10} \sum_{\nu_n=1}^{10} \sum_{r} w_{n,c}(r) \cdot |u_i^n(r)| \qquad (15)$$

as well as their mean absolute error with respect to the ground truth

$$E_c = \frac{1}{K_c} \sum_{i=1}^{10} \sum_{n=1}^{10} \sum_{\nu_i=1}^{10} \sum_{\nu_n=1}^{10} \sum_{r} w_{n,c}(r) \cdot |u_i^n(r) - u_{i,\mathrm{GT}}^n(r)| \qquad (16)$$

were calculated, where $K_c = 10^4 \cdot \sum_r w_{n,c}(r)$ is a normalization constant, $i$ and $n$ are the source and target motion phases, $\nu_i$ and $\nu_n$ refer to the corresponding random view angles, and $w_{n,c}(r)$ is the class- and motion state-specific binary weight resulting from the segmentation.
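The following sketch evaluates Equations (15) and (16) for one anatomical class. Folding the four ten-fold sums into a single sample axis is our implementation choice, not the authors' code.

```python
import numpy as np

def anatomy_metrics(dvf_pred, dvf_gt, weights):
    """Sketch of Equations (15) and (16) for one class c.

    dvf_pred, dvf_gt : arrays of shape (n_samples, 3, X, Y, Z) collecting the
                       predicted / ground-truth DVFs u_i^n over all sampled
                       phase pairs (i, n) and random view angles
    weights          : binary masks w_{n,c} of shape (n_samples, X, Y, Z)
    """
    mag_pred = np.linalg.norm(dvf_pred, axis=1)        # |u_i^n(r)|
    err = np.linalg.norm(dvf_pred - dvf_gt, axis=1)    # |u_i^n(r) - u_{i,GT}^n(r)|
    k_c = weights.sum()                                # normalization K_c
    mean_dvf = (weights * mag_pred).sum() / k_c        # Equation (15)
    mae = (weights * err).sum() / k_c                  # Equation (16)
    return mean_dvf, mae
```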

2.3.2. Simulation study

To test the proposed SAMoCo approach including the RAC network, simulations were performed using the eight test patients of the 4DCT dataset described in Section 2.1.4. For each patient, CBCT projections were generated according to Equation (1) by a forward projection of a sequence of $N$ (here: 657) prior volumes $(f_1, f_2, \ldots, f_N)$ using the Varian TrueBeam geometry described in Section 2.3.3. Here, the $n$th prior volume $f_n$ is given as:

$$f_n = \left( (\phi(n) - \Phi_n) \cdot T_{\Phi_n}^{\Phi_n + 1} \right) \hat{f}_{\Phi_n}, \qquad (17)$$

where $\hat{f}_{\Phi_n}$ is one of the ten phases of the 4DCT dataset, $\phi(n) : \{1, \ldots, N\} \to [1, 11)$ is a (non-integer) phase signal representing the distribution of motion states during the scan, and $\Phi_n = \lfloor \phi(n) \rfloor$ is the greatest integer less than or equal to $\phi(n)$. Here, the scalar prefactor scales the DVF of the transformation $T_{\Phi_n}^{\Phi_n+1}$, such that $f_n$ interpolates between the motion states $\hat{f}_{\Phi_n}$ and $\hat{f}_{\Phi_n+1}$.

In that way we can mimic a continuous and smooth transition between different motion states as it would occur during a real CBCT scan.
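Under the assumption that the scalar prefactor scales the inter-phase DVF, Equation (17) could be realized as follows; phase_volumes, dvfs, and warp are hypothetical containers and operators, and the cyclic wrap from phase 10 back to phase 1 is our assumption about a respiratory cycle.

```python
import numpy as np

def interpolated_state(phase_volumes, dvfs, phi, warp):
    """Sketch of Equation (17): realize a fractional phase phi in [1, 11).

    phase_volumes : dict mapping phase index 1..10 to the prior volume f_hat
    dvfs          : dict mapping (k, k+1) to the DVF between consecutive phases
    phi           : non-integer phase signal value phi(n)
    warp          : hypothetical spatial transformer applying a DVF to a volume
    """
    k = int(np.floor(phi))                # Phi_n, the lower integer phase
    frac = phi - k                        # fractional part phi(n) - Phi_n
    u = frac * dvfs[(k, k % 10 + 1)]      # linearly scaled inter-phase DVF
    return warp(phase_volumes[k], u)
```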

Following this strategy, a periodic and a non-periodic motion pattern were simulated according to the phase signals shown in Figure 4. Here, the two scenarios can be thought of as two extreme cases. While the periodic case corresponds to an ideal motion pattern with a respiration frequency similar to what is observed in most of our patient scans, the non-periodic case shows a very complex motion pattern with a very limited number of respirations.

FIGURE 4.

Top: Periodic and non‐periodic motion phases that were used for simulation. Bottom: Clinical example of a 60 s scan with a selected 15 s interval showing highly non‐periodic motion. Particularly in scans with such short acquisition times, the motion patterns may resemble the non‐periodic simulation case.

This can be seen as the result of a CBCT scan of a free‐breathing patient with a very short scan time as illustrated in Figure 4. Although this is currently a rather rare case, it may become more important in the future, especially with the trend towards systems with faster gantry rotation speed. 46

Finally, the corresponding motion compensated reconstructions were evaluated in terms of their gray value accuracy as well as the accuracy of the position of the diaphragm. For the latter a sigmoid function

$$s_{z_0, h, c, b}(z) = \frac{h}{1 + \exp(c \cdot (z - z_0))} + b, \qquad (18)$$

with open parameters $z_0$, $h$, $c$, and $b$, was fitted to line profiles (extracted from the motion-compensated reconstructions) that run perpendicular to the surface of the diaphragm. In that way, the fitted inflection point $z_0$ can be used as a measure of the diaphragm position.

Here, 400 line profiles were evaluated and averaged for each patient and each motion-compensated reconstruction $f_{n,\mathrm{SAMoCo}}$. For comparison, the same evaluation was performed for the ground truth, that is, $f_n$ as given in Equation (17).
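A minimal sketch of this fit using scipy.optimize.curve_fit is given below; the initial parameter guesses are assumptions chosen from the profile itself.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(z, z0, h, c, b):
    # Equation (18): logistic edge model of the diaphragm transition.
    return h / (1.0 + np.exp(c * (z - z0))) + b

def diaphragm_position(z, profile):
    """Fit Equation (18) to one line profile and return the inflection point z0."""
    p0 = [z[np.argmax(np.abs(np.gradient(profile)))],   # z0: steepest gradient
          profile.max() - profile.min(),                # h : edge height
          1.0,                                          # c : slope (assumed guess)
          profile.min()]                                # b : baseline offset
    params, _ = curve_fit(sigmoid, z, profile, p0=p0, maxfev=5000)
    return params[0]
```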

2.3.3. Pilot patient study

Patient measurements were performed using the kV imaging unit of a Varian TrueBeam system. Here, the source-to-detector distance is 1500 mm and the source-to-isocenter distance is 1000 mm. All scans were performed at 125 kV with the 2 × 2 binning mode in which the CsI-based detector (PaxScan 4030) has 1024 × 768 pixels with an effective pixel size of 0.388 mm × 0.388 mm. To increase the field of measurement to about 46 cm, the system was operated in shifted-detector mode in which the detector is laterally off-centered by 160 mm. In total, 660 to 840 projections were acquired over 360° with a frame rate between 7 and 11 fps, leading to a scan time of 60 s to 120 s. To monitor respiratory motion, the Varian Real-time Position Management (RPM) system was synchronized with the CBCT scan.
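For reference, the stated acquisition parameters can be collected in a plain record; this is only a summary of the numbers above, not a vendor API.

```python
from dataclasses import dataclass

@dataclass
class TrueBeamKVGeometry:
    """Scan parameters of the pilot study as described in the text."""
    source_detector_mm: float = 1500.0
    source_isocenter_mm: float = 1000.0
    tube_voltage_kv: float = 125.0
    detector_pixels: tuple = (1024, 768)        # 2x2 binning mode
    pixel_size_mm: float = 0.388
    detector_lateral_offset_mm: float = 160.0   # shifted-detector mode
    n_projections: tuple = (660, 840)           # range over the evaluated scans
    frame_rate_fps: tuple = (7, 11)
    scan_time_s: tuple = (60, 120)
```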

For this study, measurements of five patients were evaluated. Since there is no ground truth, the position of the diaphragm was determined as a function of the scan time (see Section 2.3.2) and compared quantitatively to the signals (see Figure 5) of the RPM system.

FIGURE 5.

RPM signals of the five patient measurements evaluated in this study. Exemplary reconstructions for the first two patients can be found in Figure 9. RPM, real‐time position management.

3. RESULTS

3.1. Accuracy of DVFs

The accuracy of the predicted DVFs was evaluated as described in Section 2.3.1. As shown for one exemplary test patient in Figure 6, the predicted DVFs of the deep SAMoCo are in good agreement with the ground truth. However, since DVFs are predicted as a function of only two projections, high frequencies of the DVFs are reproduced with reduced accuracy. The quantitative analysis summarized in Table 1 shows similar trends. While the anatomy‐specific mean values of the predicted DVFs differ by less than 10% from the corresponding ground truth, the evaluation of the mean absolute error on a pixel level yields slightly higher deviations. Nevertheless, these deviations are on average well below the voxel size for all anatomies. Here, the highest errors occur in the heart, which is most likely due to the superposition of respiratory and cardiac motion. Since our training strategy is focused on the respiratory motion, it is not optimal for estimating cardiac motion and leads to a reduced accuracy within the heart.

FIGURE 6.

x-, y-, and z-components (top, middle, bottom) of the ground truth DVFs, the corresponding deep SAMoCo prediction, as well as their difference. It has to be noted that this figure shows only a subset of the predicted DVFs (phases 1→2, 1→4, 1→6, 1→8, 1→10), while DVFs were calculated between any combination of the 10 phases of the 4DCT prior. DVF, displacement vector field; SAMoCo, single angle-based motion compensation.

TABLE 1.

Anatomy‐specific mean of the DVFs of the GT and the deep SAMoCo as well as the (pixelwise) MAE for the eight patients of the simulated test data set according to Equations (15) and (16).

Anatomy | Mean GT/mm | Mean deep SAMoCo/mm | MAE/mm
Lung | 1.50 ± 1.04 | 1.49 ± 1.11 | 0.73 ± 0.23
Lung nodules | 1.81 ± 1.39 | 1.87 ± 1.49 | 0.79 ± 0.28
Heart | 1.72 ± 1.00 | 1.56 ± 1.09 | 1.06 ± 0.30
Ribs | 0.45 ± 0.27 | 0.41 ± 0.22 | 0.32 ± 0.09
Diaphragm | 2.69 ± 2.08 | 2.80 ± 2.23 | 0.86 ± 0.33
Remainder | 0.94 ± 0.49 | 0.84 ± 0.48 | 0.53 ± 0.12

Abbreviations: DVF, displacement vector field; GT, ground truth; MAE, mean absolute error; SAMoCo, single angle‐based motion compensation.

3.2. Simulation study

Motion-compensated reconstructions were generated as described in Section 2.1 using the data introduced in Section 2.3.2. In addition, gated reconstructions were performed according to Equation (2) as a reference. Here, the retrospective gating is based on the phase signals shown in Figure 4, that is, for every time point all data within a 20% window centered around the respective phase were used for reconstruction. The corresponding results for a periodic test case are shown in Figure 7. Since the retrospective gating leads to sparse angular sampling, the corresponding gated reconstructions suffer from severe sparse-view artifacts. However, despite the poor image quality, high-contrast structures such as the diaphragm are clearly resolved. Therefore, its position can still be determined quite accurately (see Figure 7, bottom curve).

FIGURE 7.

Simulation study for a periodic motion signal. The CT images show motion compensated reconstructions for two motion states (top: end‐exhale, middle: end‐inhale). The diaphragm position according to Section 2.3.2 is shown below for all views or motion states respectively.

The deep SAMoCo, in contrast, is designed to use all data by estimating and applying DVFs that account for the present motion. As a result, the corresponding reconstructions show substantially improved image quality. Even though no gating information is used within the SAMoCo framework, all motion states present in the scan, and thus the diaphragm positions, can be resolved with high accuracy. However, due to the inconsistencies discussed in Section 2.2, some streak artifacts still remain. Therefore, a further improvement can be achieved by applying the RAC network, which is trained to address this issue. In fact, the SAMoCo + RAC reconstructions are almost free of artifacts and only show some blurring in the region of the heart.

Similar results can be obtained for the non-periodic case shown in Figure 8. As the deep SAMoCo is trained to estimate DVFs between arbitrary motion states independent of their temporal distribution, it can handle periodic and non-periodic cases in the same way and with the same accuracy. This becomes evident when considering the evaluation of the diaphragm position, as shown in the bottom curve. Similar to the periodic case, the application of the RAC network can further boost image quality without affecting the accuracy of the motion compensation. Comparing these results with those of the gated reconstruction, the advantages of the proposed approach become particularly clear. Since any gating strategy relies on different views sharing the same motion state, it fails for non-periodic motion patterns. This downside is reflected by the poor image quality, which does not allow a reasonable determination of the diaphragm position.

FIGURE 8.

Simulation study for a non‐periodic motion signal. The CT images show motion compensated reconstructions for two motion states (top: end‐exhale, middle: end‐inhale). The diaphragm position according to Section 2.3.2 is shown below for all views or motion states respectively.

A more quantitative evaluation is given in Table 2. Considering the average deviation of the diaphragm position as well as the mean absolute error of the CT values, there is good agreement with the qualitative findings discussed above. While the gated reconstruction yields poor image quality, it is still able to provide accurate diaphragm positions for the periodic case. As expected, it fails completely in the non-periodic case. The SAMoCo, in contrast, provides a similar accuracy for both cases. As intended, the RAC network does not affect the accuracy of the diaphragm position but leads to an improved image quality.

TABLE 2.

Deviation from the ground truth averaged over the eight patients of the simulated test data set.

Method | Diaphragm position/mm, periodic | Diaphragm position/mm, non-periodic | Mean absolute error/HU, periodic | Mean absolute error/HU, non-periodic
Gated reco. | 0.49 ± 0.17 | 3.79 ± 1.32 | 245 ± 21 | 1120 ± 87
SAMoCo | 0.73 ± 0.24 | 0.74 ± 0.31 | 35 ± 5 | 33 ± 5
SAMoCo+RAC | 0.73 ± 0.25 | 0.75 ± 0.31 | 27 ± 5 | 26 ± 4

Abbreviations: RAC, residual artifact correction; SAMoCo, single angle‐based motion compensation.

3.3. Pilot patient study

Patient measurements were performed according to Section 2.3.3. Similar to the simulation study, gated reconstructions were used as reference. Here, however, the retrospective gating was performed according to the external RPM signal. The corresponding reconstructions as well as the results of the SAMoCo including the RAC network are shown in Figure 9 for two different patients. Similar to the previous experiments, the gated reconstructions suffer from poor image quality. Especially in the extremal motion states (end-exhale and end-inhale), there are very few projections that can be used if the respiration is not perfectly regular. In particular, this can be observed in the second case, in which the motion pattern changes strongly during the scan. The deep SAMoCo, in contrast, is not affected by the motion pattern and yields constant image quality independent of the motion state and the view angle.

FIGURE 9.

Results for a 1 min CBCT scan (top) and a 2 min CBCT scan (bottom) of two different patients. The CT images show the motion-corrupted 3D CBCT reconstruction as well as motion-compensated reconstructions for two motion states (left: end-exhale, right: end-inhale). As there is no ground truth, the normalized RPM signal is plotted against the normalized diaphragm position (according to Section 2.3.2) below.

To further assess the quality of the SAMoCo reconstructions, the diaphragm position was evaluated as a function of the scan time, or the view angle respectively, and compared against the RPM signal. Even though the RPM signal actually represents the motion of a marker block on the patient's chest, we expect a high correlation once both signals are normalized to the same range (here: [0, 1]). The corresponding comparison is plotted below the CT images in Figure 9 and indeed shows good agreement with the RPM signal as well as with the curve of the gated reconstruction.

Comparing the normalized diaphragm positions to the normalized RPM signal of all test patients yields an average deviation of 7.5% ± 1.4% for the gated reconstruction and 7.1% ± 1.2% for the SAMoCo as well as the SAMoCo with RAC.
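Once both signals are sampled on the same view grid, this comparison reduces to a small computation; the resampling and temporal alignment of the two signals are not described in the text and are assumed here.

```python
import numpy as np

def normalized_deviation(diaphragm, rpm):
    """Sketch of the comparison above: normalize both signals to [0, 1]
    and average the absolute deviation."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min())
    return np.mean(np.abs(norm(diaphragm) - norm(rpm)))
```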

4. DISCUSSION AND CONCLUSION

Current 4D motion compensation approaches usually rely on some kind of gating strategy. However, especially in the case of irregular or non-periodic motion patterns, these strategies have major downsides, as demonstrated by our experiments. Even though the gated reconstructions shown here are the simplest representatives of this class of approaches, we expect a similar behavior for more sophisticated gating-based motion compensation strategies. Since they are typically designed to operate on a set of gated reconstructions (either by estimating DVFs as, for example, in refs. [30, 31] or by applying some sort of post-processing as, for example, in refs. [22, 23, 24]), the gated reconstructions must have a certain image quality in the first place. If this is the case, advanced approaches as discussed in Section 1 are likely to provide image quality as high as that of the proposed deep SAMoCo. However, if only very few projections share the same motion state, this criterion can certainly not be met.

To address this issue, the deep SAMoCo approach does not rely on gating at all, but is trained to estimate DVFs on a single-view level as a function of so-called SARs. In practice, this offers several advantages. First of all, not requiring gating signals simplifies the clinical workflow and reduces the time needed for patient preparation. Second, operating on a single-view level provides a temporal resolution that corresponds to the acquisition time of a single projection image. Last and most importantly, the ability to estimate DVFs from any two SARs regardless of their motion states makes the deep SAMoCo applicable to arbitrary motion patterns. Therefore, the deep SAMoCo is particularly useful for cases with unsteady breathing, compensation of residual motion during a breath-hold scan, or scans with fast gantry rotation times in which the data acquisition only covers a very limited number of breathing cycles. Furthermore, this makes it possible to update an existing patient model, for example, a planning CT or a previously acquired CBCT, to the current motion state by acquiring only one additional x-ray image. Thus, such an update can be performed almost in real time with minimal effort and minimal dose, offering new possibilities for image-guided radiation therapy or interventional procedures. Nevertheless, it has to be noted that for periodic motion the deep SAMoCo performs at least as well as existing gating-based methods.

These advantages are confirmed by our experiments.

Evaluating the quality of the predicted DVFs shows a high similarity to the ground truth, with average errors well below the voxel size used in our experiments. In particular, the DVF estimates show a high accuracy for different anatomical regions, with slightly higher deviations in the heart due to the superposition of respiratory and cardiac motion. Applied to periodic and non-periodic CBCT simulations, the deep SAMoCo performs equally well in both cases and provides motion-compensated reconstructions that deviate from the ground truth by less than 35 HU on average without the RAC network and less than 27 HU with the RAC network. In both cases, these reconstructions allowed an accurate determination of the diaphragm position, which was used as an additional performance measure. Independent of the motion pattern, it could be determined with an accuracy of about 0.75 mm, which corresponds to half the voxel size.

Similar results could be obtained for real measurements at a Varian TrueBeam system. Even for patients with very irregular breathing, the deep SAMoCo provides constant image quality across all motion states and a high correlation of the extracted diaphragm positions with the external RPM signal.

Finally, it has to be noted that even though the focus of this study is on respiratory motion compensation, the general concept of the deep SAMoCo applies similarly to other types of motion. If appropriate training data are available, it may be used, for example, to compensate for cardiac motion in CT and CBCT, to handle motion in interventional radiology, or to compensate for head motion in dental CBCT.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

ACKNOWLEDGMENTS

This study was supported by Varian Medical Systems, a Siemens Healthineers Company. Parts of the reconstruction software were provided by RayConStruct GmbH, Nürnberg, Germany.

Open access funding enabled and organized by Projekt DEAL.

Maier J, Sawall S, Arheit M, Paysan P, Kachelrieß M. Deep learning‐based cone‐beam CT motion compensation with single‐view temporal resolution. Med Phys. 2025;52:e17911. 10.1002/mp.17911

REFERENCES

1. Kiljunen T, Kaasalainen T, Suomalainen A, Kortesniemi M. Dental cone beam CT: a review. Physica Med. 2015;31:844-860.
2. Carrino JA, Muhit AA, Zbijewski W, et al. Dedicated cone-beam CT system for extremity imaging. Radiology. 2014;270:816-824.
3. Orth RC, Wallace MJ, Kuo MD. C-arm cone-beam CT: general principles and technical considerations for use in interventional radiology. J Vasc Interv Radiol. 2008;19:814-820.
4. Jaffray DA. Image-guided radiotherapy: from current concept to future perspectives. Nat Rev Clin Oncol. 2012;9:688-699.
5. Choi JH, Maier A, Keil A, et al. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. II. Experiment. Med Phys. 2014;41.
6. Berger M, Xia Y, Aichinger W, et al. Motion compensation for cone-beam CT using Fourier consistency conditions. Phys Med Biol. 2017;62:7181-7215.
7. Berger M, Müller K, Aichert A, et al. Marker-free motion correction in weight-bearing cone-beam CT of the knee joint. Med Phys. 2016;43:1235-1248.
8. Ouadah S, Jacobson M, Stayman JW, Ehtiati T, Weiss C, Siewerdsen JH. Correction of patient motion in cone-beam CT using 3D-2D registration. Phys Med Biol. 2017;62:8813-8831.
9. Sun T, Jacobs R, Pauwels R, Tijskens E, Fulton R, Nuyts J. A motion correction approach for oral and maxillofacial cone-beam CT imaging. Phys Med Biol. 2021;66:125008.
10. Capostagno S, Sisniega A, Stayman JW, Ehtiati T, Weiss CR, Siewerdsen JH. Deformable motion compensation for interventional cone-beam CT. Phys Med Biol. 2021;66:055010.
11. Huang H, Siewerdsen JH, Zbijewski W, et al. Reference-free learning-based similarity metric for motion compensation in cone-beam CT. Phys Med Biol. 2022;67:125020.
12. Sonke JJ, Zijp L, Remeijer P, Van Herk M. Respiratory correlated cone beam CT. Med Phys. 2005;32:1176-1186.
13. Li T, Xing L, Munro P, et al. Four-dimensional cone-beam computed tomography using an on-board imager. Med Phys. 2006;33:3825-3833.
14. Lu J, Guerrero TM, Munro P, et al. Four-dimensional cone beam CT with adaptive gantry rotation and adaptive data sampling. Med Phys. 2007;34:3520-3529.
15. Dillon O, Keall PJ, Shieh CC, O'Brien RT. Evaluating reconstruction algorithms for respiratory motion guided acquisition. Phys Med Biol. 2020;65:175009.
16. O'Brien RT, Dillon O, Lau B, et al. The first-in-human implementation of adaptive 4D cone beam CT for lung cancer radiotherapy: 4DCBCT in less time with less dose. Radiother Oncol. 2021;161:29-34.
17. Isola AA, Ziegler A, Koehler T, Niessen WJ, Grass M. Motion-compensated iterative cone-beam CT image reconstruction with adapted blobs as basis functions. Phys Med Biol. 2008;53:6777-6797.
18. Ritschl L, Sawall S, Knaup M, Hess A, Kachelrieß M. Iterative 4D cardiac micro-CT image reconstruction using an adaptive spatio-temporal sparsity prior. Phys Med Biol. 2012;57:1517-1525.
19. Chen GH, Thériault-Lauzier P, Tang J, et al. Time-resolved interventional cardiac C-arm cone-beam CT: an application of the PICCS algorithm. IEEE Trans Med Imaging. 2012;31:907-923.
20. Mory C, Auvray V, Zhang B, et al. Cardiac C-arm computed tomography using a 3D + time ROI reconstruction method with spatial and temporal regularization. Med Phys. 2014;41:021903.
21. Zhi S, Kachelrieß M, Mou X. High-quality initial image-guided 4D CBCT reconstruction. Med Phys. 2020;47:2099-2115.
22. Sawall S, Bergner F, Lapp R, et al. Low-dose cardio-respiratory phase-correlated cone-beam micro-CT of small animals. Med Phys. 2011;38:1416.
23. Tian Z, Jia X, Dong B, Lou Y, Jiang SB. Low-dose 4DCT reconstruction via temporal nonlocal means. Med Phys. 2011;38:1359-1365.
24. Zhi S, Kachelrieß M, Pan F, Mou X. CycN-Net: a convolutional neural network specialized for 4D CBCT images refinement. IEEE Trans Med Imaging. 2021;40:3054-3064.
25. Yang P, Ge X, Tsui T, et al. Four-dimensional cone beam CT imaging using a single routine scan via deep learning. IEEE Trans Med Imaging. 2023;42:1495-1508.
26. Jiang Z, Chang Y, Zhang Z, Yin FF, Ren L. Fast four-dimensional cone-beam computed tomography reconstruction using deformable convolutional networks. Med Phys. 2022;49:6461-6476.
27. Li T, Schreibmann E, Yang Y, Xing L. Motion correction for improved target localization with on-board cone-beam computed tomography. Phys Med Biol. 2006;51:253-267.
28. Li T, Koong A, Xing L. Enhanced 4D cone-beam CT with inter-phase motion model. Med Phys. 2007;34:3688-3695.
29. Rit S, Wolthaus JW, Van Herk M, Sonke JJ. On-the-fly motion-compensated cone-beam CT using an a priori model of the respiratory motion. Med Phys. 2009;36:2283-2296.
30. Brehm M, Paysan P, Oelhafen M, Kunz P, Kachelrieß M. Self-adapting cyclic registration for motion-compensated cone-beam CT in image-guided radiation therapy. Med Phys. 2012;39:7603-7618.
31. Brehm M, Paysan P, Oelhafen M, Kachelrieß M. Artifact-resistant motion estimation with a patient-specific artifact model for motion-compensated cone-beam CT. Med Phys. 2013;40:101913.
32. Wang J, Gu X. Simultaneous motion estimation and image reconstruction (SMEIR) for 4D cone-beam CT. Med Phys. 2013;40:101912.
33. Sauppe S, Hahn A, Brehm M, Paysan P, Seghers D, Kachelrieß M. Five-dimensional motion compensation for respiratory and cardiac motion with cone-beam CT of the thorax region. In: Proceedings of the SPIE Medical Imaging Conference. SPIE; 2016:97830H.
34. Zhang H, Ma J, Bian Z, Zeng D, Feng Q, Chen W. High quality 4D cone-beam CT reconstruction using motion-compensated total variation regularization. Phys Med Biol. 2017;62:3313-3329.
35. Riblett MJ, Christensen GE, Weiss E, Hugo GD. Data-driven respiratory motion compensation for four-dimensional cone-beam computed tomography (4D-CBCT) using groupwise deformable registration. Med Phys. 2018;45:4471-4482.
36. Huang X, Zhang Y, Chen L, Wang J. U-net-based deformation vector field estimation for motion-compensated 4D-CBCT reconstruction. Med Phys. 2020;47:3000-3012.
37. Zhang Z, Liu J, Yang D, Kamilov US, Hugo GD. Deep learning-based motion compensation for four-dimensional cone-beam computed tomography (4D-CBCT) reconstruction. Med Phys. 2023;50:808-820.
38. Gardner M, Dillon O, Byrne H, Keall P, O'Brien R. Data-driven rapid 4D cone-beam CT reconstruction for new generation linacs. Phys Med Biol. 2024;69:18NT02.
39. Hahn J, Bruder H, Rohkohl C, et al. Motion compensation in the region of the coronary arteries based on partial angle reconstructions from short-scan CT data. Med Phys. 2017;44:5795-5813.
40. Maier J, Lebedev S, Erath J, Eulig E, Sawall S, Fournié E, Stierstorfer K, Lell M, Kachelrieß M. Deep learning-based coronary artery motion estimation and compensation for short-scan cardiac CT. Med Phys. 2021;48:3559-3571.
41. Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV. VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans Med Imaging. 2019;38:1788-1800.
42. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015. Vol 9351. Lecture Notes in Computer Science. Springer, Cham; 2015. doi: 10.1007/978-3-319-24574-4_28
43. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV). 2015:1026-1034. doi: 10.1109/ICCV.2015.123
44. Thirion JP. Image matching as a diffusion process: an analogy with Maxwell's demons. Med Image Anal. 1998;2:243-260.
45. Heinrich MP, Jenkinson M, Brady M, Schnabel JA. MRF-based deformable registration and ventilation estimation of lung CT. IEEE Trans Med Imaging. 2013;32:1239-1248.
46. Kim E, Park YK, Zhao T, et al. Image quality characterization of an ultra-high-speed kilovoltage cone-beam computed tomography imaging system on an O-ring linear accelerator. J Appl Clin Med Phys. 2024;25(5):e14337.
