2025 Oct 9;52(10):e70070. doi: 10.1002/mp.70070

Motion correction in CBCT imaging using gate‐less model‐based reconstruction of non‐rigid motion and images

Ethan Waterink 1, Rodrigo José Santo 1, Cornelis A T van den Berg 1,2, Alessandro Sbrizzi 1,2, Casper Beijst 1
PMCID: PMC12509791  PMID: 41065309

Abstract

Background

Cone beam computed tomography (CBCT) plays a crucial role in verifying patient positioning during radiotherapy. However, CBCT images are often blurred by motion originating from breathing, bowel motion, and/or repositioning. Conventional methods often employ gating techniques that mitigate motion artifacts by assuming periodicity, which restricts them to respiratory motion. Moreover, gating does not resolve a‐periodic motion such as irregular breathing.

Purpose

The purpose of this study is to estimate and correct for both periodic motion and irregular motion. We present a novel method for gate‐less reconstruction of motion and images (CBCT‐MOTUS), thereby estimating and correcting for all non‐rigid 3D motion at high temporal frequency (per‐projection temporal resolution of 182 ms).

Methods

Starting from a motion‐corrupted image, we perform the reconstruction process by alternating between a motion estimation step and an image correction step. Motion estimation is model‐based, that is, it is performed directly in projection space by comparing acquired projections to simulated projections that take motion‐fields into account. The optimization process benefits from assumptions, including (i) compressing the motion‐fields using B‐spline parameterization to reduce the number of parameters to estimate, (ii) exploiting spatio‐temporal correlation of motion using a low‐rank motion model, and (iii) enforcing smoothness using spatial regularization to include a priori knowledge on motion‐fields. High temporal resolution (182 ms for this scanner) is achieved as motion‐fields are estimated for each acquired gantry angle. The method is validated in silico, on phantoms and on clinical in vivo acquisitions.

Results

The framework can estimate and correct irregular and periodic motion with high temporal resolution (per‐projection temporal resolution of 182 ms). The motion‐corrected images show improved image features for all experiments with reduction of motion artifacts, such as deblurring on organ interfaces.

Conclusions

We have developed and validated a gate‐less reconstruction method (CBCT‐MOTUS) for model‐based motion estimation and correction in CBCT imaging for any kind of motion. High temporal resolution (per‐projection temporal resolution of 182 ms) is achieved by exploiting the smoothness and compressibility of motion using a low‐rank B‐spline motion model, enabling the correction for non‐rigid irregular and periodic motion. These proof‐of‐principle results warrant further clinical validation.

Keywords: CBCT motion correction, gate‐less motion reconstruction, model‐based non‐rigid motion

1. INTRODUCTION

Radiotherapy is the cornerstone of many cancer treatments. Anatomically accurate images acquired prior to radiotherapy are essential for accurate treatments, especially when smaller margins are used for daily adjustments, referred to as adaptive radiotherapy. 1 Cone‐beam computed tomography (CBCT) is often used for position verification prior to radiotherapy. However, a major challenge in acquiring high‐quality CBCT images is motion from breathing, bowel motion, and/or repositioning, especially a‐periodic and irregular motion.

Conventional methods often employ gating or sorting techniques to mitigate motion artifacts in (CB)CT. They assume periodicity of motion to reconstruct different phases separately, from which the deformation vector fields are subsequently estimated by non‐rigid registration. 2 Gating has been shown to improve target localization discrepancy and organ visibility. 3 However, gating does not solve irregular, a‐periodic motion such as dysfunctional breathing, resulting in images with inherent blurring. Irregular motion can also lead to misalignment artifacts and disconnected anatomy. 4 Moreover, registration in image space is affected by reconstruction artifacts and data sparsity for individual frames/gates. 5

4D CBCT scans have a longer acquisition time to collect more data for higher‐quality gating or sorting. Together with a surrogate signal, data can be sorted into phases for reconstruction. 6 A surrogate‐driven approach for model fitting of respiratory motion has been deployed by extracting respiration surrogate signals from the projection data with the intensity analysis method. 7 Motion compensation in cardiac imaging can be performed with an ECG surrogate signal. 8 Navigators are used to track patient motion, either internally 9 or externally. 10 These approaches require longer acquisition times and increased delivered radiation, or rely on (external) surrogate signals.

Artificial intelligence (AI) has been employed in pre‐ and/or post‐processing steps for motion estimation and correction, 19 or as part of the reconstruction process. 20 Real‐time CBCT imaging has been implemented with deep learning by reconstructing a dynamic CBCT from a pre‐treatment CBCT scan, 21 and by capturing spatiotemporal correlation in the projections using PCA coefficient estimation. 22 Neural networks have been trained to map motion‐corrupted projections to motion‐free projections by “freezing” the patient in a certain motion phase from 4D CT scans. 23 Deep learning networks are generally black‐box methods and do not provide the same understanding as model‐based approaches. Hybrid methods, however, combine deep learning with models to improve interpretability. Deep‐autofocus methods leverage convolutional neural networks to estimate motion‐fields, which are then used to compensate for motion in analytical reconstruction. 24 , 25 , 26

Joint reconstruction methods simultaneously perform reconstruction of motion and images using the full projection data, resulting in less noise. The motion optimization problem has been solved in projection space with a derivative‐free approach using the Nelder–Mead algorithm, 11 or with a gradient‐based optimization algorithm and generalized derivatives of the cone beam backprojection operator. 12 Recently, motion compensation has been achieved via a surrogate‐optimized respiratory motion model from unsorted projection data. 13 Additionally, diffeomorphic motion models are used to incorporate fundamental physical properties of organ motion. 14 Motion detection algorithms have been used to isolate motion‐free projections from which a regularized reference reconstruction is made, after which motion is estimated. 15 Other approaches for irregular motion use the autofocus concept to estimate motion by maximizing a sharpness metric computed on the reconstructed image (e.g., image entropy, total variation, or a gradient‐based metric). 16 , 17 , 18 Most previously proposed joint reconstruction methods only perform rigid motion estimation.

In this work, we present a novel method for gate‐less model‐based reconstruction of motion and images in CBCT imaging, thereby estimating and correcting for all non‐rigid 3D motion at high temporal frequency (per‐projection temporal resolution of 182 ms). We assume that motion can be parameterized by parsimonious models. Therefore, we postulate that the motion estimation problem can be optimized by (i) compressing the motion‐fields using B‐spline parameterization to reduce the number of parameters to estimate, (ii) exploiting spatio‐temporal correlation of motion using a low‐rank motion model, and (iii) enforcing smoothness using spatial regularization to include a priori knowledge on motion‐fields. Starting from a motion‐corrupted image, we resolve for motion by alternating between a motion estimation and an image correction step. Motion estimation is performed directly in projection space (model‐based) by comparing acquired projections to simulated projections that take motion‐fields into account. High temporal resolution is achieved as motion‐fields are estimated for each acquired gantry angle. We draw inspiration from the MRI framework Model‐based Reconstruction of MOTion from Undersampled Signals (MR‐MOTUS) 27 and subsequent developments 28 , 29 for our CBCT setting, hence the name CBCT‐MOTUS. Proof‐of‐principle tests of CBCT‐MOTUS are conducted on in silico, phantom and clinical acquisitions.

2. THEORY

The CBCT‐MOTUS framework performs the reconstruction process by alternating between motion estimation and correction (Figure 1), starting from a motion‐corrupted image. Motion estimation is performed directly in projection space (model‐based) by comparing the acquired signal (Section 2.1) to simulated projections that take motion‐fields into account (Section 2.2). Motion correction is performed with the estimated motion‐fields (Section 2.3). Motion estimation and correction are subsequently iteratively performed with, respectively, the motion‐corrected images and estimated motion‐fields. We will refer to one step of motion estimation and correction as an alternation. For the overall CBCT‐MOTUS framework, we refer the reader to Algorithm 1.

FIGURE 1.


Schematic overview of the CBCT‐MOTUS framework: alternate between motion estimation and correction. Motion estimation is performed directly in projection space (model‐based) using the acquired signal (projection data) and simulated projections of motion‐corrupted/corrected images. Motion correction is performed with the estimated motion‐fields. CBCT, cone beam computed tomography.

ALGORITHM 1. The CBCT‐MOTUS framework for joint reconstruction of motion estimation and correction.


2.1. Signal

Let xt[r] denote the densities of a deforming object at time t over a uniform spatial grid r. A CBCT system rotates an x‐ray source and imaging sensors (detectors) around the object and measures the photon intensity after attenuation through the object. The projection signal can be modeled as

$s_t = s_0 \exp\left(-\int_{L_t} x_t \,\mathrm{d}r\right) = s_0 \exp(-y_t)$, (1)

where $s_0$ is the source intensity, $L_t$ are the lines from source to detectors, and $y_t$ the line integrals. The modeling is mono‐energetic and does not take into consideration non‐idealities of the imaging chain such as scattering and noise. Line integrals are modeled through the system matrix $A_t = [a_{ij}]_t$, specifying the contribution of voxel $j$ to detector $i$, giving

$y_t = A_t x_t$. (2)

Then, let Dt be the motion‐field that deforms x0 to xt:

$x_t[r] = x_0[D_t(r)] = x_0[r + \delta_t(r)]$, (3)

with displacement function $\delta_t$ to sample $x_0$. We will refer to $x_0$ as the reference image, $s_t$ as the projection signal, and $y_t$ as the integral signal. The signal is acquired at high frequency from beginning to end of the acquisition, which we will refer to as time‐resolved. The reference image is an image reconstruction in an arbitrary motion state and is updated in each alternation. Since the motion is time‐resolved, any time frame can be used as the reference motion state and no separate stationary scan is necessary for registration. For the first alternation, a motion‐corrupted image is used, that is, a reconstruction without any motion compensation. An explicit relation between the reference image, motion‐fields, and the projection/integral signal of the deforming object can be obtained by substituting (3) in (2) and rewriting (1) to obtain the signal model:

$y_t = -\ln\dfrac{s_t}{s_0} = A_t x_0[D_t(r)]$. (4)

This relation ensures that the signal model is linear rather than exponential.
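As a toy numerical illustration of this linearization (arbitrary stand-in sizes and values, not the scanner geometry), the log-transform recovers the line integrals from the exponential intensity model:

```python
import numpy as np

# Toy sketch of Equations (1), (2), and (4): the measured intensity s_t is
# exponential in the line integrals y_t, so -log(s_t / s_0) yields a model
# that is linear in the object densities x_t.
rng = np.random.default_rng(0)

x_t = rng.random((16, 16))            # toy 2D object densities
A_t = rng.random((8, 16 * 16))        # toy system matrix (8 detector bins)

y_t = A_t @ x_t.ravel()               # line integrals, Eq. (2)
s_0 = 1.0e5                           # source intensity
s_t = s_0 * np.exp(-y_t)              # projection signal, Eq. (1)

y_recovered = -np.log(s_t / s_0)      # log-transform, Eq. (4)
```

The round trip is exact up to floating-point precision, which is why the reconstruction can work with the linear integral signal instead of the raw intensities.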

2.2. Motion estimation

Motion estimation is performed directly in projection space by minimization of a model‐based functional. High temporal resolution is achieved as motion‐fields are estimated for each acquired gantry angle. However, motion estimation for all three components is challenging due to limited angular information per time frame. The optimization process benefits from a priori assumptions that the motion‐fields can be compressed using B‐spline parameterization to reduce the number of parameters to estimate (Section 2.2.1), the spatio‐temporal correlation of motion can be exploited using a low‐rank motion model (Section 2.2.2), and that smoothness can be enforced using spatial regularization to include a priori knowledge on motion‐fields (Section 2.2.3).

Note that (4) explicitly connects the integral signal to a reference object through non‐rigid motion‐fields Dt. That is, if reference image x0 and integral signal yt are available, the motion can be estimated by solving the inverse problem corresponding to (4). We parameterize the motion‐fields with a lower‐dimensional basis using basis coefficients c to reduce the number of parameters. Writing (4) in operator form

$y_t = F_t(c \mid x_0) = A_t x_0[D_t(r \mid c)]$,

where Ft is the model that performs image‐domain deformations and forward projections, we estimate motion‐fields by solving the following minimization problem (Section 2.2.4):

$\arg\min_{c} \sum_{t=1}^{T} \left\| F_t(c \mid x_0) - y_t \right\|_2^2 + \lambda R\left(D_t(r \mid c)\right)$, (5)

where $T$ is the number of time frames, $R$ is a spatial regularizer that models a priori knowledge on motion‐fields, and $\lambda \in \mathbb{R}_+$ controls the strength of the regularization. Note that there is no explicit modeling of noise in the projection data.

2.2.1. Motion compression

We assume that motion is spatially smooth, which has been shown to be highly compressible. 27 Accordingly, we can represent the motion in a lower‐dimensional basis to compress it and reduce the number of parameters to optimize. We obtain smooth non‐rigid motion‐fields by defining the spatial parametrization as uniformly spaced control points of cubic B‐splines. The number of control points per dimension affects the complexity of motion and local smoothness.
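A minimal 1D sketch of this parameterization (toy coefficients and grid sizes, not the actual implementation): a handful of cubic B-spline control points represent a smooth displacement over a much denser voxel grid, compressing the number of motion parameters.

```python
import numpy as np

def cubic_bspline(u):
    """Cubic B-spline kernel evaluated at offsets u (support on |u| < 2)."""
    u = np.abs(u)
    out = np.zeros_like(u)
    m1 = u < 1
    m2 = (u >= 1) & (u < 2)
    out[m1] = (4 - 6 * u[m1] ** 2 + 3 * u[m1] ** 3) / 6
    out[m2] = (2 - u[m2]) ** 3 / 6
    return out

# Toy 1D example: 8 control points represent a smooth displacement over
# 256 voxels -- a 32x compression of the motion parameters.
n_vox, n_ctrl = 256, 8
coeffs = np.sin(np.linspace(0, np.pi, n_ctrl))   # toy spline coefficients
r = np.linspace(0, n_ctrl - 1, n_vox)            # voxel positions in control-grid units

# Dense displacement = sum of shifted kernels weighted by the coefficients
basis = cubic_bspline(r[:, None] - np.arange(n_ctrl)[None, :])  # (256, 8)
displacement = basis @ coeffs                     # smooth field on the full grid
```

The number of control points trades off motion complexity against smoothness, exactly as described above.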

2.2.2. Motion model

Patient motion contains high spatio‐temporal correlation between subsequent time frames. That is, motion is spatially and temporally smooth. Respiratory motion, for example, has a semi‐periodic nature. We exploit this correlation with a low‐rank motion model 30 to separate the motion into spatial and temporal components. This further reduces the number of parameters to optimize for. A single angle only contains 2D information, making it more difficult to estimate 3D motion‐fields. Temporal splines allow for correlation between frames and thus enable estimating 3D motion‐fields from 2D images. Several components can be combined to account for different types of motion, including irregular motion. The modeled motion is defined by scaling spatial components with temporal ones:

$$D_t(r \mid c) = r + \sum_{c=1}^{N_{\mathrm{comp}}} \underbrace{\begin{bmatrix} b_x(r)\, c_c^{x} \\ b_y(r)\, c_c^{y} \\ b_z(r)\, c_c^{z} \end{bmatrix}}_{\sigma_c(r)} \cdot \underbrace{b_\tau(t)\, c_c^{\tau}}_{\tau_c(t)}, \quad (6)$$

where spatial coefficients cx,y,z are expanded by 3D B‐spline basis functions bx,y,z evaluated on grid r, and temporal coefficients cτ are expanded by 1D B‐spline basis function bτ evaluated at time t. That is, for each spatio‐temporal component c, the spatial component σ is scaled by temporal component τ. The final motion‐field is constructed by adding all Ncomp spatio‐temporal components, which equals the rank of the model.
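The low-rank construction of Equation (6) can be sketched as follows (random stand-in components; in the actual model the spatial and temporal components are themselves B-spline expansions of the coefficients):

```python
import torch

# Toy sketch of the low-rank motion model, Eq. (6): each component is a
# static 3D spatial field sigma_c(r) scaled by a 1D temporal signal tau_c(t);
# the motion-field is the identity grid plus the sum over N_comp components.
n_comp, T = 3, 160
grid = torch.stack(torch.meshgrid(
    torch.arange(16.0), torch.arange(16.0), torch.arange(16.0),
    indexing="ij"), dim=-1)                       # (16,16,16,3) identity r

sigma = torch.randn(n_comp, 16, 16, 16, 3)        # spatial components sigma_c(r)
tau = torch.randn(n_comp, T)                      # temporal components tau_c(t)

def motion_field(t):
    # D_t(r) = r + sum_c sigma_c(r) * tau_c(t)
    return grid + (sigma * tau[:, t].view(n_comp, 1, 1, 1, 1)).sum(dim=0)

D_0 = motion_field(0)                             # motion-field for frame 0
```

Only the temporal scalars change between frames, which is what makes per-projection (182 ms) motion estimation tractable.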

2.2.3. Motion regularization

Regularization of the motion‐fields is necessary because of the limited angular information for each time frame; estimating motion merely from data consistency in projection space is therefore an ill‐posed problem. The spatial regularizer models prior information on the motion‐fields to encourage realistic motion. In particular, we apply the total variation (TV) 31 on the Jacobian of the motion‐fields for regularizer R:

$$R(D_t(r \mid c)) = \sum_{p \in \{x,y,z\}} \sum_{d \in \{x,y,z\}} \left\| \frac{\partial D_t^{p}(r \mid c)}{\partial d} \right\|_2^2,$$

where $\partial D_t^{p} / \partial d$ is the gradient in the $d$‐direction of the $p$th dimension of the motion‐field. This expression considers all components of the displacement gradients jointly. The gradients of the motion‐fields can be computed analytically from (6) using B‐spline derivatives or numerically by finite differences.
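A sketch of such a regularizer using the finite-difference variant (applied to a toy displacement field; the analytic B-spline derivatives of the actual method are replaced here by `torch.diff`):

```python
import torch

# Toy sketch of the regularizer R: squared L2-norms of all nine
# displacement-gradient components (the Jacobian of the motion-field),
# computed with forward finite differences along each axis.
def jacobian_regularizer(delta):
    """delta: displacement field of shape (X, Y, Z, 3)."""
    total = 0.0
    for d in range(3):                      # derivative direction d
        grad = torch.diff(delta, dim=d)     # finite difference along axis d
        total = total + (grad ** 2).sum()   # sum over all field components p
    return total

delta = torch.randn(8, 8, 8, 3)             # toy displacement field
reg = jacobian_regularizer(delta)
```

Applying the penalty to the displacement (rather than the full motion-field including the identity) avoids a constant offset from the identity's unit diagonal.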

2.2.4. Motion optimization

The minimization problem of (5) is performed directly in projection space by minimizing the difference between the acquired signal and simulated projections created from deformed reference images, with respect to motion parameters c. The deformations are controlled by the motion compression, low‐rank motion model and spatial motion regularization.

Derivative‐based algorithms can be used for solving this minimization problem. Specifically, we apply gradient descent to find motion parameters c by rephrasing (5) and minimizing loss function

$L(c) = \sum_{t=1}^{T} \left\| F_t(c \mid x_0) - y_t \right\|_2^2 + \lambda R(D_t(r \mid c))$. (7)

We use PyTorch's automatic differentiation (autograd) 32 functionality to automatically compute derivatives. It tracks all operations performed on motion parameters c and computes the gradients $\partial L / \partial c$ by applying backpropagation once (7) has been evaluated. We explicitly define the backward projection $A_t^{T}$ as the backward step for the line integrals. We use the NAdam optimizer 33 as the gradient descent algorithm with Nepochs epochs and learning rate γ. The motion parameters c are randomly initialized. For the pseudocode of the algorithm we refer the reader to Algorithm 2.
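A minimal sketch of this optimization loop, with a toy linear operator standing in for the forward model $F_t$ (the NAdam optimizer and autograd usage mirror the description above; all sizes, values, and the placeholder regularizer are arbitrary):

```python
import torch

# Toy sketch of the motion optimization: autograd computes dL/dc and
# NAdam updates the randomly initialized motion parameters c.
torch.manual_seed(0)
T, n_params = 10, 4
F = [torch.randn(6, n_params) for _ in range(T)]    # stand-in forward operators
c_true = torch.randn(n_params)
y = [F_t @ c_true for F_t in F]                     # "acquired" signals

c = torch.randn(n_params, requires_grad=True)       # random initialization
optimizer = torch.optim.NAdam([c], lr=1e-1)
lam = 1e-6

initial_loss = None
for epoch in range(200):
    optimizer.zero_grad()
    loss = sum(((F_t @ c - y_t) ** 2).sum() for F_t, y_t in zip(F, y))
    loss = loss + lam * (c ** 2).sum()              # placeholder regularizer
    if initial_loss is None:
        initial_loss = loss.item()
    loss.backward()                                 # autograd backpropagation
    optimizer.step()
```

In the actual framework the forward model additionally deforms the reference image and forward-projects it, with a custom backward step for the line integrals.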

ALGORITHM 2. Motion estimation: gate‐less model‐based non‐rigid motion estimation algorithm.

2.3. Motion correction

Motion correction is performed to mitigate motion‐related artifacts by reconstructing an image in a single reference position using the motion‐fields obtained through the methods of Section 2.2. Image reconstruction requires projection signals over a range of gantry angles. Concatenating the system matrices and integral signals of all time frames, we get

$\arg\min_x \left\| A x - y \right\|$. (8)

In order to compute x in (8), we solve a weighted‐least squares problem using the simultaneous iterative reconstruction technique (SIRT), 34 given by

$x^{(n+1)} = x^{(n)} + C A^{T} R \left( y - A x^{(n)} \right) = x^{(n)} + C \sum_{t} A_t^{T} R_t \left( y_t - A_t x^{(n)} \right)$, (9)

where $C = 1 / \left( \sum_{t=1}^{T} A_t^{T} \mathbb{1}_P \right)$ and $R_t = 1 / \left( A_t \mathbb{1}_I \right)$ are normalizations, with $\mathbb{1}_P$ and $\mathbb{1}_I$ images of ones in projection and image space, respectively. This method iteratively updates the current guess by backprojection of the error between the measured data and simulated projections for all time frames.
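A dense toy sketch of the SIRT update of Equation (9), without motion and with a random stand-in system matrix (the real operators are GPU cone-beam projectors):

```python
import numpy as np

# Toy SIRT iteration, Eq. (9): C and R are inverse column/row sums of the
# system matrix A, and each step backprojects the weighted residual.
rng = np.random.default_rng(1)
A = rng.random((20, 10)) + 0.1        # toy system matrix (strictly positive)
x_true = rng.random(10)
y = A @ x_true                        # noiseless "measured" integral signal

C = 1.0 / A.sum(axis=0)               # 1 / (A^T 1_P): inverse column sums
R = 1.0 / A.sum(axis=1)               # 1 / (A 1_I):   inverse row sums

x = np.zeros(10)
for _ in range(200):
    x = x + C * (A.T @ (R * (y - A @ x)))
    x = np.maximum(x, 0.0)            # non-negativity constraint
```

In the motion-corrected variant, the current guess is additionally warped to each frame's motion state before projection and warped back after backprojection (Algorithm 3).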

We incorporate motion correction by first deforming the current guess to match the motion state of yt, assuming that the former is in the reference position. Let Wt be the deformation operation corresponding to (3):

$x_t[r] = x_0[D_t(r)] = W_t\{x_0\}[r]$, (10)

and let Dt1 be the inverse motion‐field that deforms xt to x0 with corresponding deformation operator Wt1:

$x_0[r] = x_t[D_t^{-1}(r)] = x_t[r + \delta_t^{-1}(r)] = W_t^{-1}\{x_t\}[r]$, (11)

with $\delta_t^{-1}$ the inverse displacement function of $\delta_t$ such that

$D_t^{-1}(D_t(r)) = r = D_t(D_t^{-1}(r)), \qquad W_t^{-1}\{W_t\{x\}\} = x = W_t\{W_t^{-1}\{x\}\}$. (12)

Assuming that the current guess x(n) is in the reference position, we must first deform it with Wt to match the motion state of yt. After computing the error and backprojection, the result is warped back with Wt1 to the reference position to perform motion correction. Moreover, a non‐negativity constraint is applied. For the pseudocode of the algorithm we refer the reader to Algorithm 3.

ALGORITHM 3. Motion correction: motion‐corrected image reconstruction algorithm.


In general, the exact inverse displacement function $\delta_t^{-1}$ does not exist. Instead, we make use of adjoint pull and push operations, which, respectively, sample and splat an image with respect to a motion‐field. Specifically, we define

$W_t\{x\} = \mathrm{pull}(x, D_t(r)), \qquad W_t^{-1}\{x\} = \mathrm{push}(x, D_t(r)) \,/\, \mathrm{push}(\mathbb{1}, D_t(r))$, (13)

where we normalized the latter to account for the number of voxels splatting into each voxel.
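A 1D nearest-neighbour sketch of the pull/push pair (the actual method uses tri-linear sampling/splatting; this toy version only illustrates the adjoint relation and the normalization of Equation (13)):

```python
import numpy as np

# Toy pull/push pair: pull samples an image at warped positions,
# push splats values back and is normalized by the splat count.
def pull(x, D):
    """Sample x at the warped positions D (nearest neighbour)."""
    idx = np.clip(np.round(D).astype(int), 0, len(x) - 1)
    return x[idx]

def push(x, D):
    """Splat x into the positions D (adjoint of pull)."""
    out = np.zeros_like(x)
    idx = np.clip(np.round(D).astype(int), 0, len(x) - 1)
    np.add.at(out, idx, x)                # unbuffered accumulation
    return out

x = np.arange(8.0)
D = np.arange(8.0) + 1.0                  # shift-by-one motion-field
warped = pull(x, D)
# Normalized push approximates the inverse deformation, Eq. (13):
unwarped = push(warped, D) / np.maximum(push(np.ones_like(x), D), 1e-8)
```

The normalization divides each voxel by the number of contributions splatted into it, which is what makes the push a usable stand-in for the (generally nonexistent) exact inverse warp.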

3. METHODS

The following data were acquired in three different experiments:

  1. In silico: allows for validation against ground‐truth reference images and motion‐fields.

  2. Phantom acquisition: validates against static images and ground‐truth motion‐fields.

  3. Clinical acquisitions: patient data providing more complex motion‐fields.

We evaluated the performance of the CBCT‐MOTUS framework for all experiments by comparing the initial motion‐corrupted reference with the motion‐corrected image, and with the use of line profiles. Moreover, we quantified the in silico experiment by computing the structural similarity index measure (SSIM) between static and motion‐corrupted/corrected images over the field of view (FOV) volume, and by the spatial resolution $1/w_{\mathrm{ESF}}$, where $w_{\mathrm{ESF}}$ is the width of the edge spread function (ESF) fitted to the line profiles. 26

3.1. In silico: XCAT phantom

In the first experiment, the digital anthropomorphic XCAT phantom 35 was utilized with varying amplitudes and frequencies of respiration for realistic physiological motion with an average respiratory period of 5 s. We defined the ground truth motion using the initial reference image and motion‐fields to create each dynamic. The projection signals were simulated by forward projecting each dynamic, using cone geometry and gantry angles from a clinical scan. Expanding on (1), the simulated signal becomes

$s_t = \mathrm{Poisson}\!\left(s_0 \exp(-y_t)\right) + \mathrm{Normal}\!\left(0, \sigma_{\mathrm{electronic}}^2\right)$,

where Poisson adds Poisson noise to the projection signal to represent X‐ray quantum noise, and Normal adds electronic background noise with variance $\sigma_{\mathrm{electronic}}^2$ to test the framework's robustness to noise. We set the source intensity $s_0 = 1.0 \times 10^5$ and $\sigma_{\mathrm{electronic}}^2 = 10$. 22
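A sketch of this noise simulation using the stated $s_0$ and $\sigma_{\mathrm{electronic}}^2$ (the line integrals here are random stand-ins for the forward-projected XCAT dynamics):

```python
import numpy as np

# Toy sketch of the simulated projection signal: Poisson quantum noise on
# the attenuated intensity plus additive Gaussian electronic noise.
rng = np.random.default_rng(42)

y_t = rng.uniform(0.0, 4.0, size=1000)        # stand-in line integrals
s_0 = 1.0e5                                   # source intensity (as stated)
sigma2_electronic = 10.0                      # electronic noise variance

s_t = rng.poisson(s_0 * np.exp(-y_t)).astype(float)
s_t += rng.normal(0.0, np.sqrt(sigma2_electronic), size=s_t.shape)
```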

Moreover, we quantitatively evaluated the performance of the motion estimation method by re‐running it using a motion‐ and artifact‐free reference image (i.e., the XCAT ground truth image) for x0 in (5) instead of the motion‐corrupted image. This provides validation that using motion‐corrupted/corrected images as surrogates for motion‐free reference images is effective.

3.2. Phantom acquisition

With the second experiment, part of an Alderson radiation therapy (ART) torso phantom was placed on an in‐house developed moving platform, 36 constructed with a DC electric motor and programmable through Arduino, inside the Elekta XVI system (Elekta AB, Stockholm, Sweden; 512×512 detector pixels, 0.8 mm pixel spacing, and 1536 mm source‐to‐detector distance) for data acquisition (Figure 2). This phantom is made up of individual slices with gaps/paper between them. We applied 1D transverse motion mimicking a respiratory pattern with different amplitudes and slightly different periods. As the phantom was larger than the transverse FOV, motion was also present outside the FOV. Similar to the in silico experiment, the ground truth motion is known and can be used for comparison. Moreover, we made a scan of the static phantom for reference.

FIGURE 2.


Phantom acquisition setup with part of an Alderson radiation therapy torso phantom placed on a programmable moving platform.

3.3. Clinical acquisitions

For the third experiment, we utilized raw patient data from clinical CBCT examinations acquired for position verification prior to radiotherapy at the University Medical Center Utrecht. As we used clinically obtained anonymized data, our medical ethics committee waived the requirement for written informed consent for participation in the study. We used the data of four patients: patient A (lung cancer), patient B (lung cancer), patient C (bone metastases), and patient D (lung cancer).

3.4. Convergence

The CBCT‐MOTUS framework alternates between motion estimation and image correction steps until a maximum number of alternations or convergence has been reached. We define convergence for our framework when the motion estimation no longer improves (determined by visual inspection).

3.5. General settings

The acquisition time for all experiments was 29 s with 182 ms between consecutive projections, creating 160 frames in total and covering 180°. The temporal resolution of the motion is also 182 ms, as we estimate 3D motion‐fields for each projection. The source intensity $s_0$ was estimated for the phantom and clinical experiments from their projection signals by computing the mean value of an area of background pixels (i.e., where there was no object between source and detector). To account for the fact that the patient is not fully inside the scanner's FOV, we extended the reconstruction domain to prevent artifacts in the SIRT reconstructions. The regularization strength was determined empirically such that motion is relatively smooth while ensuring sufficient data consistency. See Table 1 for all relevant parameter settings of the motion estimation and correction.
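The background-based estimate of $s_0$ can be sketched as follows (synthetic projection with an assumed air-only strip; the strip position and noise level are hypothetical):

```python
import numpy as np

# Toy sketch of estimating the source intensity s_0 from a projection:
# average the background pixels where no object attenuates the beam.
rng = np.random.default_rng(3)

proj = np.full((512, 512), 2.0e4)                      # attenuated object region
proj[:, :50] = 1.0e5 + rng.normal(0, 300, (512, 50))   # unattenuated background strip

background_mask = np.zeros_like(proj, dtype=bool)
background_mask[:, :50] = True                          # assumed air-only columns
s_0_est = proj[background_mask].mean()                  # estimate of s_0
```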

TABLE 1.

Details of the experiments: settings for image reconstruction, motion model, and motion estimation.

| Parameter | In silico | Phantom acquisition | Clinical acquisitions |
|---|---|---|---|
| Resolution | 256×256×256 | 264×270×270 | 264×270×270 |
| Voxel size (mm) | 1.2×1.2×1.2 | 1.15×1.22×1.22 | 1.15×1.22×1.22 |
| Number of reconstruction iterations N_iterations | 200 | 200 | 200 |
| Spatial spline control points per dimension | 32 | 7 | 32 |
| Temporal spline control points | 2.0 s⁻¹ | 2.0 s⁻¹ | 3.0 s⁻¹ |
| Number of spatio‐temporal components N_comp | 1 | 1 | 3 |
| Learning rate γ | 1.0·10⁻¹ | 1.0·10⁻¹ | 1.0·10⁻¹ |
| Spatial regularization coefficient λ | 1.0·10⁻⁶ | 1.0·10⁻⁶ | 1.0·10⁻⁶ |
| Number of motion estimation epochs N_epochs | 100 | 100 | 100 |
| Maximum number of alternations N_alternations | 50 | 50 | 20 |

The scan data were processed with the ASTRA Toolbox 37 , 38 on an NVIDIA Quadro P6000 graphics processing unit (GPU) using PyTorch and the tomosipo 39 library for 3D tomography. Warped volumes were resampled on a grid using tri‐linear interpolation.

4. RESULTS

4.1. In silico: XCAT phantom

The motion‐corrected image of the in silico data shows improved image features (Figure 3). The motion‐corrupted image shows motion artifacts around the lung–liver interface, heart, and abdominal aorta (Figure 3a, 0.86 SSIM). The lung–liver interface, initially blurred due to the breathing motion of the XCAT simulation, is visually sharp after applying motion correction on the in silico data (Figure 3b, 0.90 SSIM). This motion artifact is reflected by its relatively flat line profile ($1/w_{\mathrm{ESF}} = 0.12$ mm⁻¹) with two steeper sections, indicating two contours (Figure 3c). The single steep line profile ($1/w_{\mathrm{ESF}} = 0.40$ mm⁻¹) of the final motion‐corrected image approximates the static image, indicating a sharp transition and a reduction of motion artifacts. The image reconstruction of the motion‐field validation, where we use the ground truth XCAT image as the reference, shows that the applied alternations approach convergence (0.92 SSIM, $1/w_{\mathrm{ESF}} = 0.37$ mm⁻¹), indicating that the motion‐corrected image converges toward the ground truth.

FIGURE 3.


In silico (XCAT phantom): coronal slice of (a) the motion‐corrupted and (b) motion‐corrected images. (c) A line profile (magenta) through the lung–liver interface is taken of the motion‐corrupted, motion‐corrected, static and validation image reconstructions. (d) Coronal slice of the estimated motion‐fields (from reference to max‐exhale) imposed on the motion‐corrected image and (e) the motion profile in a voxel (magenta) tracing z‐displacement of the estimated, ground truth, and validation motion.

The motion estimation captures the non‐rigid and irregular respiratory motion of the XCAT phantom (Figure 3d, Animated Figure 1). The non‐rigid motion is mostly estimated around the lung–liver interface and other regions with high contrast. Little motion is estimated in the liver, and some ribs move as well, resulting in artifacts. The estimated motion profile in the lung–liver interface (Figure 3e) approximates the ground truth motion with a root mean squared error (RMSE) of [x: 0.09, y: 1.05, z: 0.50] mm. The motion‐field validation profile shows that the applied alternations approach convergence, and has an RMSE of [x: 0.09, y: 1.17, z: 0.32] mm w.r.t. the ground truth.

ANIMATED FIGURE 1.


In silico (XCAT phantom): animation of the ground truth, estimated and validation motion. Motion‐fields are imposed and warp the ground truth and motion‐corrected images.

4.2. Phantom acquisition

The motion‐corrected image of the phantom acquisition shows reduced motion artifacts (Figure 4). The two contours of the lung–liver interface in the motion‐corrupted image (Figure 4a) have been corrected into one sharp contour. Moreover, the blurred gaps between the phantom slabs are more clearly defined after motion correction (Figure 4b). Note that gaps further from the center are less sharp due to cone‐beam artifacts, which we also observed in the static image reconstruction. The two steep line profiles of the final motion‐corrected image approximate the static reference image (Figure 4c), indicating recovery of image features. Note that the center gaps appear sharper as they are more parallel to the gantry.

FIGURE 4.


Alderson phantom acquisition: coronal slice of (a) the motion‐corrupted and (b) motion‐corrected images. (c) A line profile (magenta) through the lung–liver interface is taken of the motion‐corrupted, motion‐corrected and static images. (d) Coronal slice of the estimated motion‐fields (from reference to maximum amplitude) imposed on the motion‐corrected image and (e) the motion profile in a voxel (magenta) tracing z‐displacement of the estimated and ground truth motion.

The motion estimation captures the applied transverse rigid motion, where all deformation vectors point in the same direction (Figure 4d, Animated Figure 2). Less motion is estimated near the homogeneous bottom region of the phantom. The largest deviation from the ground truth is found around a gantry angle of 0° (Figure 4e), which can be explained by the fact that at horizontal projections the motion information encoded in the 2D projections overlaps for many organs. The estimated motion profile approximates the applied ground truth motion with an RMSE of [x: 0.79, y: 1.04, z: 1.42] mm.

ANIMATED FIGURE 2.


Alderson phantom acquisition: animation of the ground truth and estimated motion. Motion‐fields are imposed and warp the static and motion‐corrected images.

4.3. Clinical acquisitions

The motion‐corrected images of the clinical acquisitions show improved image features for all four patients (Figure 5). The initially blurred lung–liver interfaces of all patients (Figure 5a,d) have sharper appearances after motion correction in both coronal and axial views. Moreover, the outline of the lung tumor of patient B is more clearly defined (Figure 5b,e). The steeper line profiles show how the interfaces have sharpened visually, as opposed to the flat line profiles of the motion‐corrupted images (Figure 5c).

FIGURE 5.


Clinical acquisition: coronal/axial slices of (a,d) the motion‐corrupted and (b,e) motion‐corrected images of patients A, B, C, and D. (c) Line profiles (magenta) through a tumor and the lung–liver interfaces are taken of the coronal motion‐corrupted and motion‐corrected images.

The motion estimation captures the irregular and periodic non‐rigid respiratory motion of the patients (Figure 6a, Animated Figure 3). Most of the motion is estimated around the lung–liver interface and patient B's tumor, which are regions with high contrast. The motion profiles near the lung–liver interface of patients B and D indicate regular breathing, while the breathing of patients A and C is more irregular (Figure 6b). Amplitudes of all estimated motion range between 5.2 and 30.6 mm (15.3 mm average) and periods between 2.0 and 6.0 s (3.6 s average), which are realistic and expected for real human breathing. We observe that patient A takes shorter breaths halfway through the scan; however, this patient's motion profile shows more jagged breathing around gantry angles of 0°. Moreover, we observe that motion is estimated near and outside the FOV, particularly for patient D. In short, the clinical data confirm that our gate‐less method is able to estimate irregular motion in a time‐resolved manner.

FIGURE 6.


Clinical acquisition: (a) Coronal slices of the estimated motion‐fields (from reference to max‐exhale) imposed on the motion‐corrected images. (b) The motion profile in a voxel (magenta) tracing z‐displacement of the estimated motion.

ANIMATED FIGURE 3.


Clinical acquisition: animation of the estimated motion for all four patients. Motion‐fields are imposed and warp the motion‐corrected images.

4.4. Description of the animated figures

The animated figures are part of the main body of this work and can be found online.

4.5. Convergence

The motion estimation step converges for all three experiments (Figure 7). The final motion estimation loss of (7) decreases with each alternation. The convergence plot plateaus after about 50 alternations for the in silico and phantom experiments. The clinical experiments converged after 10, 5, 7, and 16 alternations for patients A, B, C, and D, respectively. We observe that applying further alternations increases the final loss and thus worsens the motion estimation and correction. Therefore, we stop the framework at the converged alternation.
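The stopping rule described above can be sketched as follows. Here `estimate_motion` and `correct_image` are hypothetical stand-ins, passed in as callables, for the model-based motion estimation step and the motion-compensated reconstruction; they are illustrative assumptions, not the actual implementation.

```python
def alternate(estimate_motion, correct_image, image, projections, max_alternations=50):
    """Alternate motion estimation and image correction, stopping as soon as
    the final motion-estimation loss no longer decreases (convergence)."""
    best_loss = float("inf")
    for _ in range(max_alternations):
        motion, loss = estimate_motion(image, projections)
        if loss >= best_loss:   # further alternations would worsen the result
            break               # keep the image from the converged alternation
        best_loss = loss
        image = correct_image(projections, motion)
    return image, best_loss
```

For example, with successive loss values 10.0, 6.0, 4.0, 4.5, the loop stops at the fourth alternation and returns the image reconstructed from the motion-fields that achieved the loss of 4.0.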

FIGURE 7.

Convergence plots for the in silico, phantom and clinical (patients A and B) experiments. The vertical line indicates the converged alternation.

5. DISCUSSION

The CBCT‐MOTUS framework estimates and corrects irregular and periodic non‐rigid motion with high temporal resolution (per‐projection temporal resolution of 182 ms), as shown in in silico simulations, phantom acquisitions and clinical data. The motion‐corrected images show improved image features for all experiments, such as deblurring. The motion is time‐resolved and represents real movement instead of averaged respiratory motion as no assumption on periodicity is used. Clinical gating typically requires prolonged, high‐dose 4D acquisitions to ensure robustness. In contrast, our method achieves reliable performance with shorter 3D scans at high per‐projection temporal resolution, underscoring its added clinical value.

Image features of all image reconstructions are improved after performing motion correction, with an increase in SSIM and spatial resolution for the in silico experiment. The initially blurred lung–liver interfaces are now sharp, and patient B's tumor is more clearly defined. The clinical relevance of the motion correction lies in the reduced uncertainty regarding motion artifacts in the image reconstructions. The reference position appears to be near mid‐ventilation, as apparent from the motion profiles. If desired, a different reference position may be chosen prior to reconstruction.

We observe from the in silico and clinical experiments that motion is encoded in regions of high contrast. These regions are discernible in projection space and facilitate motion estimation. Therefore, motion is estimated around the lung–liver interface, while motion of the liver is not, due to its homogeneity. Estimating motion in homogeneous areas with no encoding power remains a challenge. The low‐rank motion model ensures a certain degree of smoothness in time and is able to extract respiratory signals in the temporal components. Future work includes applying this model to estimate complex, non‐compressible motion with decoupled spatial and temporal components (e.g., cardiac + respiratory), and other regions of the body (e.g., abdomen, extremities, brain/head). The TV spatial regularization on the Jacobian of the motion‐fields and the cubic B‐spline parameterization ensure a certain degree of motion smoothness around the lung–liver interface. However, non‐smooth motion, such as sliding motion, can be a major challenge for accurate motion estimation, as some ribs move along with the estimated motion. Future work includes matching the model complexity with the motion types.
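As a toy illustration of the compression provided by the low-rank motion model, the motion-field coefficients over all gantry angles factor into a few spatial and temporal components. The sizes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rank, n_ctrl, n_angles = 2, 1000, 330        # illustrative sizes, not the paper's
rng = np.random.default_rng(0)
U = rng.standard_normal((3 * n_ctrl, rank))  # spatial components: x/y/z B-spline coefficients stacked
V = rng.standard_normal((n_angles, rank))    # temporal components: one row per gantry angle
D = U @ V.T                                  # full motion-field coefficients, shape (3*n_ctrl, n_angles)

# The factorization stores rank * (3*n_ctrl + n_angles) parameters
# instead of 3*n_ctrl*n_angles for unconstrained per-angle motion-fields.
n_lowrank = rank * (3 * n_ctrl + n_angles)
n_full = 3 * n_ctrl * n_angles
```

Besides reducing the number of unknowns, coupling all gantry angles through the shared spatial components `U` is what lets poorly encoded angles borrow information from well-encoded ones.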

Although image features are improved after one alternation, applying more alternations gradually improves both the motion estimation and the correction. 13 The motion‐corrected images allow for better motion estimation, and the estimated motion‐fields yield better motion‐corrected images. At convergence, the motion estimation no longer improves and, as a consequence, neither does the motion correction. We validated with the in silico experiment that using motion‐corrupted/corrected images as a surrogate for motion‐free reference images is effective; the phantom experiment corroborates this. The clinical experiments converged after a few alternations in terms of the loss function. Future work includes investigating the convergence properties, parameters, and criteria of the framework to fully automate it.

The results show that the CBCT‐MOTUS framework can estimate and correct motion despite not being optimized to correct for image‐degrading effects, such as photon scatter and ghosting. Ideally, changes in the measured signal intensities correspond only to patient motion. The framework can benefit from addressing these degradations by improving the signal model or using more advanced iterative reconstruction techniques 40 rather than SIRT. Better image reconstruction quality can be achieved by, for example, performing photon scatter estimation, modeling the quantum noise and electronic noise that are intrinsic to the x‐ray detectors, and correcting for cone‐beam artifacts such as ghosting.
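For reference, the SIRT baseline has the update x ← x + C Aᵀ R (b − A x), with R and C the inverse row and column sums of the system matrix A. A minimal dense sketch on a toy 3×3 system follows; the matrix, data, and iteration count are arbitrary choices for illustration, not a real projector.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])   # toy system matrix standing in for the projector
x_true = np.array([1.0, 2.0, 3.0])
b = A @ x_true                    # noiseless "projections"

R = 1.0 / A.sum(axis=1)           # inverse row sums
C = 1.0 / A.sum(axis=0)           # inverse column sums
x = np.zeros(3)
for _ in range(500):              # SIRT update: x <- x + C A^T R (b - A x)
    x = x + C * (A.T @ (R * (b - A @ x)))
```

On this consistent toy system the iterate converges to `x_true`; with real, noisy projection data the simultaneous update instead averages the backprojected residual, which is what makes SIRT robust but slow compared to the more advanced iterative techniques mentioned above.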

Performing one alternation takes 33 min with the settings from Table 1 and GPU‐acceleration, of which 25 min are spent on motion estimation and 8 min on motion correction. The motion estimation may be further accelerated by using solvers with faster convergence, such as Gauss‐Newton or L‐BFGS. 41 Moreover, explicitly defining the gradients for faster convergence 13 would obviate the use of PyTorch's autograd 32 functionality. Deep learning could replace the motion estimation loop with networks that take the reference image and undersampled signals as input and output motion‐fields, at the expense of having no guarantee of data fidelity. Moreover, the settings in Table 1 can be altered to trade off computation time against quality. Lower resolution images could be used for faster simulated projections for motion estimation. 7
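As an example of the suggested acceleration, a generic quadratic fit can be solved with L-BFGS via SciPy, with the gradient supplied explicitly instead of via autograd. This is an illustrative stand-in for the motion-parameter optimization, not the framework's code; `A` and `b` are toy data.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 0.0], [0.0, 3.0]])   # toy forward model
b = np.array([4.0, 9.0])                 # toy measurements

def loss_and_grad(theta):
    r = A @ theta - b                    # residual
    return 0.5 * float(r @ r), A.T @ r   # explicit gradient, no autograd needed

res = minimize(loss_and_grad, x0=np.zeros(2), jac=True, method="L-BFGS-B")
```

Passing `jac=True` tells SciPy that the objective returns the loss and its gradient together, so the limited-memory quasi-Newton updates can be built without automatic differentiation overhead.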

The motion estimation appears to be more challenging around gantry angles of 0°. The signal is attenuated more when scanning the patient from the side, so less information is visible than when scanning from the top. The quality of the motion estimation is thus angle‐dependent and sensitive to motion parallel versus orthogonal to the gantry angle. This is addressed by exploiting spatio‐temporal correlation in the low‐rank motion model and spatial motion‐field regularization. Another consideration is that motion outside the FOV (e.g., patient B's left lung) is not present in all the signal data and could affect motion estimation inside the FOV. This is handled by expanding the reconstruction domain to outside the FOV and estimating motion there for the relevant gantry angles. We stop at convergence to prevent overfitting due to extrapolation of motion and propagation of noise. The in silico and phantom experiments show, however, that the captured motion is quantitatively correct.

The CBCT‐MOTUS framework could be improved by performing parameter studies to determine optimal, perhaps patient‐specific, values for the parameters in Table 1, and by statistical analyses on more clinical acquisitions. Different convergence criteria, looking at image features rather than the motion optimization, could be explored. Moreover, the impact of gantry rotation speed and other acquisition parameters could be investigated with different protocols. As mentioned previously, implementing photon scatter estimation and correction would improve the signal model, and therefore the motion estimation.

6. CONCLUSION

We have developed a gate‐less reconstruction method (CBCT‐MOTUS) for model‐based non‐rigid motion estimation and correction in CBCT imaging for irregular and periodic motion. The motion estimation is performed in projection space to achieve high temporal resolution (per‐projection temporal resolution of 182 ms) and estimate 3D motion‐fields per gantry angle. The correction of non‐rigid irregular and periodic motion is enabled by exploiting the smoothness and compressibility of motion using a low‐rank B‐spline motion model. The framework employs the motion‐corrupted images as the initial reference and signal data to alternate between motion estimation and motion‐corrected image reconstruction. We applied it to three proposed experiments, which all showed reduced motion artifacts and improved image features in the image reconstructions, and irregular and periodic motion patterns in the motion estimation.

CONFLICTS OF INTEREST STATEMENT

This work was funded by the Dutch Cancer Society (KWF) with contributions from Philips and Tesla DC, and by the Netherlands Organisation for Scientific Research (NWO). Casper Beijst, Rodrigo José Santo, Cornelis A.T. van den Berg, and Alessandro Sbrizzi have a patent pending for the motion correction technique.

ACKNOWLEDGMENTS

This work was funded by the Dutch Cancer Society (KWF) under grant 14848 with contributions from Philips and Tesla DC, and by the Netherlands Organisation for Scientific Research (NWO) under grant 19003. The authors would like to thank Hugo W.A.M. de Jong for the fruitful discussions, and André A.J.M. Wopereis, Dominique J.P.M. Nouws, and Tom T.F. Nijsink for helping with scanning the Alderson phantom.

Waterink E, Santo RJ, Berg CAT, Sbrizzi A, Beijst C. Motion correction in CBCT imaging using gate‐less model‐based reconstruction of non‐rigid motion and images. Med Phys. 2025;52:e70070. 10.1002/mp.70070

Footnotes

1

The Animated Figures can be found online. See Section 4.4 for a description of all the animated figures.

REFERENCES

  • 1. Liu H, Schaal D, Curry H. Review of cone beam computed tomography based online adaptive radiotherapy: current trend and future direction. Radiat Oncol. 2023;18:144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Kyme AZ, Fulton RR. Motion estimation and correction in SPECT, PET and CT. Phys Med Biol. 2021;66:18TR02. [DOI] [PubMed] [Google Scholar]
  • 3. Kincaid REK Jr, Hertanto AE, Hu YC, et al. Evaluation of respiratory motion‐corrected cone‐beam CT at end expiration in abdominal radiotherapy sites: a prospective study. Acta Oncol. 2018;57:1017‐1024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Hugo GD, Rosu M. Advances in 4D radiation therapy for managing respiration: part I ‐ 4D imaging. Z Med Phys. 2012;22:258‐271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Park S, Plishker W, Quon H, Wong J, Shekhar R, Lee J. Deformable registration of CT and cone‐beam CT with local intensity matching. Phys Med Biol. 2017;62:927‐947. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Martin R, Pan T. Target volume and artifact evaluation of a new data‐driven 4D CT. Practical Radiat Oncol. 2017;7:e345‐e354. [DOI] [PubMed] [Google Scholar]
  • 7. Huang Y, Thielemans K, Price G, McClelland JR. Surrogate‐driven respiratory motion model for projection‐resolved motion estimation and motion compensated cone‐beam CT reconstruction from unsorted projection data. Phys Med Biol. 2024;2:025020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Philips , Motion‐compensated reconstruction for coronary imaging, White paper, Philips, 2021.
  • 9. Rankine L, Wan H, Parikh P, et al. Cone‐beam computed tomography internal motion tracking should be used to validate 4‐dimensional computed tomography for abdominal radiation therapy patients. Int J Radiat Oncol Biol Phys. 2016;95:818‐826. [DOI] [PubMed] [Google Scholar]
  • 10. Wijenayake U, Park SY. Real‐time external respiratory motion measuring technique using an RGB‐D camera and principal component analysis. Sensors. 2017;17:1840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Niebler S, Schömer E, Tjaden H, Schwanecke U, Schulze R. Projection‐based improvement of 3D reconstructions from motion‐impaired dental cone beam CT data. Med Phys. 2019;46:4470‐4480. [DOI] [PubMed] [Google Scholar]
  • 12. Thies M, Wagner F, Maul N, et al. A gradient‐based approach to fast and accurate head motion compensation in cone‐beam CT. IEEE Trans Med Imaging. 2024;44:1098‐1109. [DOI] [PubMed] [Google Scholar]
  • 13. Huang Y, Thielemans K, McClelland JR. Surrogate‐optimized respiration motion model to estimate motion for each projection of 3D CBCT Scan. In: XXth International Conference on the Use of Computers in Radiation Therapy ; 2024.
  • 14. Hinkle J, Szegedi M, Wang B, Salter B, Joshi S. 4D CT image reconstruction with diffeomorphic motion model. Med Image Anal. 2012;16:1307‐1316. [DOI] [PubMed] [Google Scholar]
  • 15. Ali ASRA, Landi C, Sarti C, Fusiello A. Non‐iterative compensation for patient motion in dental CBCT imaging. In: 2023 IEEE Nuclear Science Symposium, Medical Imaging Conference, and Room‐Temperature Semiconductor Detectors Symposium ; 2023.
  • 16. Thies M, Wagner F, Maul N, et al. Gradient‐based geometry learning for fan‐beam CT reconstruction. Phys Med Biol. 2023;68:205004. [DOI] [PubMed] [Google Scholar]
  • 17. Sisniega A, Stayman JW, Yorkston J, Siewerdsen JH, Zbijewski W. Motion compensation in extremity cone‐beam CT using a penalized image sharpness criterion. Phys Med Biol. 2017;62:3712‐3734. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Capostagno S, Sisniega A, Stayman JW, Ehtiati T, Weiss CR, Siewerdsen JH. Deformable motion compensation for interventional cone‐beam CT. Phys Med Biol. 2021;66:055010. [DOI] [PubMed] [Google Scholar]
  • 19. Amirian M, Montoya‐Zegarra JA, Herzig I, et al. Mitigation of motion‐induced artifacts in cone beam computed tomography using deep convolutional neural networks. Med Phys. 2023;50:6228‐6242. [DOI] [PubMed] [Google Scholar]
  • 20. Amirian M, Barco D, Herzig I, Schilling FP. Artifact reduction in 3D and 4D cone‐beam computed tomography images with deep learning: a review. IEEE Access. 2024;12:10281‐10295. [Google Scholar]
  • 21. Shao HC, Mengke T, Pan T, Zhang Y. Real‐Time CBCT imaging via a single arbitrarily‐angled x‐ray image using a joint dynamic reconstruction and motion estimation (DREME) framework. In: XXth International Conference on the Use of Computers in Radiation therapy ; 2024.
  • 22. Zhang H, Chen K, Xu X, You T, Sun W, Dang J. Spatiotemporal correlation enhanced real‐time 4D‐CBCT imaging using convolutional LSTM networks. Front Oncol. 2024;14:1390398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Maier J, Herbst T, Sawall S, Arheit M, Paysan P, Kachelrieß M. Projection‐based motion correction for shifted‐detector cone‐beam CT. In: 2023 IEEE Nuclear Science Symposium, Medical Imaging Conference, and Room‐Temperature Semiconductor Detectors Symposium ; 2023.
  • 24. Huang Y, Preuhs A, Lauritsch G, Manhart M, Huang X, Maier A. Data Consistent Artifact Reduction for Limited Angle Tomography with Deep Learning Prior, MICCAI MLMIR 2019.
  • 25. Huang H, Siewerdsen J, Zbijewski W, et al. Reference‐free learning‐based similarity metric for motion compensation in cone‐beam CT. Phys Med Biol. 2022;67‐79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Huang H, Liu Y, Siewerdsen J, et al. Deformable motion compensation in interventional cone‐beam CT with a context‐aware learned autofocus metric. Med Phys. 2024;51:4158‐4180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Huttinga NRF, van den Berg CAT, Luijten PR, Sbrizzi A. MR‐MOTUS: model‐based non‐rigid motion estimation for MR‐guided radiotherapy using a reference image and minimal k‐space data. Phys Med Biol. 2020;65:015004. [DOI] [PubMed] [Google Scholar]
  • 28. Olausson TE, Terpstra ML, Huttinga NRF, et al. Free‐Running Time‐Resolved First‐Pass Myocardial Perfusion Using a Multi‐Scale Dynamics Decomposition: CMR‐MOTUS; Magn Reson Mater Phy. 2025. [DOI] [PubMed]
  • 29. Santo RJ, Waterink E, Van Den Berg CA, De Jong HW, Sbrizzi A, Beijst C. PET‐MOTUS: 500 ms non‐rigid motion estimation and correction for PET imaging. In: 2024 IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room Temperature Semiconductor Detector Conference (RTSD) ; 2024:1‐1.
  • 30. Huttinga NRF, Bruijnen T, van den Berg CAT, Sbrizzi A. Nonrigid 3D motion estimation at high temporal resolution from prospectively undersampled k‐space data using low‐rank MR‐MOTUS. Magn Reson Med. 2021;85:2309‐2326. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Blomgren P, Chan T. Color TV: total variation methods for restoration of vector‐valued images. IEEE Trans Image Process. 1998;7:304‐309. [DOI] [PubMed] [Google Scholar]
  • 32. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G. PyTorch: An Imperative Style, High‐Performance Deep Learning Library, Curran Associates Inc.; 2019;8026–8037. [Google Scholar]
  • 33. Dozat T. Incorporating Nesterov momentum into Adam. In: International Conference on Learning Representations ; 2016.
  • 34. Gregor J, Benson T. Computational Analysis and Improvement of SIRT. IEEE Trans Med Imaging. 2008;27:918‐924. [DOI] [PubMed] [Google Scholar]
  • 35. Segars WP, Sturgeon G, Mendonca S, Grimes J, Tsui BM. 4D XCAT phantom for multimodality imaging research. Med Phys. 2010;37:4902‐4915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Dietze MMA, Kunnen B, Lam MGEH, de Jong HWAM. Interventional respiratory motion compensation by simultaneous fluoroscopic and nuclear imaging: a phantom study. Phys Med Biol. 2021;66:065001. [DOI] [PubMed] [Google Scholar]
  • 37. van Aarle W, Palenstijn WJ, Beenhouwer JD, et al. The ASTRA Toolbox: a platform for advanced algorithm development in electron tomography. Ultramicroscopy. 2015;157:35‐47. [DOI] [PubMed] [Google Scholar]
  • 38. van Aarle W, Palenstijn WJ, Cant J, et al. Fast and flexible x‐ray tomography using the ASTRA Toolbox. Opt Express. 2016;24:25129‐25147. [DOI] [PubMed] [Google Scholar]
  • 39. Hendriksen A, Schut D, Palenstijn WJ, et al. Tomosipo: Fast, flexible, and convenient 3D tomography for complex scanning geometries in Python. Opt Express. 2021;29:40494‐40513. [DOI] [PubMed] [Google Scholar]
  • 40. Cao Q, Zbijewski W, Sisniega A, Yorkston J, Siewerdsen JH, Stayman JW. Multiresolution iterative reconstruction in high‐resolution extremity cone‐beam CT. Phys Med Biol. 2016;61:7263‐7281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Liu DC, Nocedal J. On the limited memory BFGS method for large scale optimization. Math Program. 1989;45:503‐528. [Google Scholar]

Articles from Medical Physics are provided here courtesy of Wiley
