Author manuscript; available in PMC: 2024 May 26.
Published in final edited form as: Med Phys. 2024 Apr 3;51(5):3309–3321. doi: 10.1002/mp.17047

Estimate and Compensate Head Motion in Non-contrast Head CT Scans Using Partial Angle Reconstruction and Deep Learning

Zhennong Chen 1, Quanzheng Li 1, Dufan Wu 1
PMCID: PMC11128317  NIHMSID: NIHMS1995729  PMID: 38569143

Abstract

Background:

Patient head motion is a common source of image artifacts in computed tomography (CT) of the head, leading to degraded image quality and potentially incorrect diagnoses. Partial angle reconstruction (PAR) divides the CT projection data into several consecutive angular segments and reconstructs each segment individually. Although motion estimation and compensation using PAR have been developed and investigated for cardiac CT scans, their potential for reducing motion artifacts in head CT scans remains unexplored.

Purpose:

To develop a deep learning (DL) model capable of directly estimating head motion from PAR images of head CT scans and to integrate the estimated motion into an iterative reconstruction process to compensate for the motion.

Methods:

Head motion is considered a rigid transformation described by 6 time-variant variables: 3 for translation and 3 for rotation. Each motion variable is modeled using a B-spline defined by 5 control points (CP) along time. We split the full 360° projection data into 25 consecutive PARs and input them into a convolutional neural network (CNN) that outputs the estimated CPs for each motion variable. The estimated CPs are used to calculate the object motion in each projection view, which is incorporated into the forward projection and backprojection of an iterative reconstruction algorithm to reconstruct the motion-compensated image. The performance of our DL model is evaluated through both simulation and phantom studies.

Results:

The DL model achieved high accuracy in estimating head motion, as demonstrated in both the simulation study (mean absolute error (MAE) ranging from 0.28~0.45 mm or degree across different motion variables) and the phantom study (MAE ranging from 0.40~0.48 mm or degree). The resulting motion-corrected image, IDL,PAR, exhibited a significant reduction in motion artifacts when compared to the traditional filtered back-projection reconstructions, which is evidenced both in the simulation study (image MAE drops from 178±33HU to 37±9HU, structural similarity index (SSIM) increases from 0.60±0.06 to 0.98±0.01) and the phantom study (image MAE drops from 117±17HU to 42±19HU, SSIM increases from 0.83±0.04 to 0.98±0.02).

Conclusions:

We demonstrate that using PAR and our proposed deep learning model enables accurate estimation of patient head motion and effectively reduces motion artifacts in the resulting head CT images.

Keywords: head CT, motion correction, partial angle reconstruction, deep learning

1. INTRODUCTION

Patient head motion is a common source of image artifacts in computed tomography (CT) of the head1. The motion causes data inconsistency among the acquired CT projections and results in blurring, streak artifacts and distortion in the reconstructed images2. Common approaches to reduce the likelihood of motion involve employing immobilization devices for patient restraint and administering anesthesia or sedation. However, immobilization devices may not be suitable for patients with head trauma or stroke3, and anesthesia complicates the procedure and may cause adverse effects, particularly for pediatric patients4. On the other hand, improved hardware designs such as faster gantry rotation or dual source can reduce scanning time and mitigate motion artifacts, but these designs may not be applicable to certain CT systems such as portable CT or cone-beam CT. Another hardware-based solution is integrating an external optical tracking system with markers on the patient to capture the motion trajectory5–7, but the accuracy might be degraded by the complicated calibration between the tracking system coordinates and CT coordinates as well as by marker invisibility.

On the software side, motion artifact reduction solutions fall into two categories. The first category involves deep learning techniques operating solely in the image domain. These solutions employ convolutional neural networks (CNN), either fused with a ResNet2 or with attention modules8, that take the motion-corrupted image as input and output an artifact-reduced image. Although these solutions are easy to apply because they do not require the projection data, the absence of projection data results in a lack of data consistency constraints and potential image hallucination, which may lead to false positive diagnoses8. The other category compensates for motion artifacts by estimating the motion from the projection data. One solution is to use 3D-2D registration9,10 to register each 2D projection to a 3D prior image to estimate the motion. The 3D prior image can be a previously acquired motion-free image10, which is not always available in clinical practice. Alternatively, the 3D image can be initialized with the filtered-backprojection (FBP) result and iteratively updated9, but the image registration, especially in the early iterations, is prone to errors. Another solution employs an iterative optimization scheme that aims to optimize image-based motion artifact metrics (MAM) such as the total variation norm11 or image entropy and sharpness12. The process involves repeated iterations of motion estimation and motion-compensated (MC) image reconstruction until the MAM value is optimized. However, this approach may face challenges with higher motion amplitudes and get trapped in local minima of the MAM. Furthermore, the MAMs are mostly handcrafted and may not lead to the desired solution.

Another robust motion estimation approach, proposed and investigated in cardiac CT but not yet explored in head CT, is the use of partial angle reconstruction (PAR) images13,14. The underlying concept is as follows. The CT projection data are divided into small consecutive angular segments, and each segment is backprojected individually to create its corresponding PAR image. Each PAR image represents the object's pose within a short time interval. By registering two PAR images, a motion vector field (MVF) between the two time intervals can be calculated. Kim et al.13 proposed to estimate the MVF via registration of two conjugate PARs separated by 180°, while Hahn et al.14 segmented the projection into 27 PARs and significantly improved the temporal resolution of motion estimation. Nonetheless, PAR images with higher temporal resolution suffer from more pronounced limited-angle artifacts, potentially degrading the accuracy of image-based registration. To address this concern, Maier et al.15 recently proposed to leverage deep learning to improve registration performance for the coronary artery in cardiac CT. They developed a CNN that takes all 25 PAR images as input and outputs the MVFs between the PARs. This DL-driven approach demonstrated high performance in both simulation and clinical studies.

The efficacy of PAR-based motion estimation and motion correction in head CT is unknown, given that head anatomy is more complex than the coronary arteries targeted in cardiac CT. Therefore, this paper aims to address this question with two primary objectives:

  1. Investigate the effectiveness of PAR images in estimating and compensating motion in head CT scans.

  2. Develop a DL-driven approach capable of directly outputting head motion from PAR images.

2. METHODS

2.1. 3D rigid motion model

The whole head can be regarded as a single 3D rigid body; therefore, a head pose can be described using 6 motion variables: translations (tx, ty, tz) and rotations (rx, ry, rz) along the three image axes. Note that among the 6 motion variables, there are 3 intra-slice motions, namely the intra-slice translations tx, ty and rotation rz, and 3 inter-slice motions, namely the inter-slice translation tz and rotations rx, ry. To model a series of head poses, we employed the model proposed by Jang et al.12, in which each motion variable at the ith projection view is represented by a cubic B-spline model with 5 control points (CP) equally distributed across one gantry rotation time. For example, the intra-slice translation along the x-axis at the ith projection view, tx,i, is described as:

$$t_{x,i} = B_{1D}\left(CP_{t_x,0},\ CP_{t_x,1},\ CP_{t_x,2},\ CP_{t_x,3},\ CP_{t_x,4},\ i\right) \quad \text{(Eq. 1)}$$

where $B_{1D}$ denotes 1-D B-spline interpolation and $CP_0$ to $CP_4$ refer to the 5 control points. If the time of one gantry rotation is $t_{rot}$, the ith control point ($CP_i$) represents the motion at time $\frac{i}{5}t_{rot}$. The first CP ($CP_0$) represents the initial state of the head and is always set to 0, while $CP_1$ to $CP_4$ are varied to represent different motions. The subscript in each CP notation emphasizes that each of the 6 motion variables has its own B-spline model with different CPs. Figure 1A and 1B illustrate two examples of our B-spline motion model.
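To make the motion model concrete, the sketch below evaluates one motion variable over a full gantry rotation from its 5 control points. It uses an interpolating cubic spline through the CP values, which is an assumption about the exact spline construction; the function and variable names (motion_curve, t_rot) are illustrative rather than taken from the authors' code.

```python
# Minimal sketch of the per-variable B-spline motion model (Eq. 1), assuming a cubic
# spline interpolating 5 control points equally spaced over one gantry rotation.
import numpy as np
from scipy.interpolate import make_interp_spline

def motion_curve(control_points, n_views, t_rot=1.0):
    """Evaluate one motion variable (e.g., t_x) at every projection view.

    control_points: 5 values [CP0, ..., CP4]; CP0 is fixed to 0 (initial pose).
    n_views: number of projection views in one full rotation.
    t_rot: gantry rotation time in seconds.
    """
    cp_times = np.linspace(0.0, t_rot, num=len(control_points))  # CPs equally spaced in time
    spline = make_interp_spline(cp_times, control_points, k=3)   # cubic spline through the CPs
    view_times = np.linspace(0.0, t_rot, num=n_views, endpoint=False)
    return spline(view_times)

# Example: a continuous 4 mm translation along x, similar to Figure 1A
tx_per_view = motion_curve([0.0, 1.0, 2.5, 3.5, 4.0], n_views=1440)
print(tx_per_view[0], tx_per_view[-1])  # starts at 0 mm, ends near 4 mm
```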

Figure 1. Motion model and deep learning pipeline.

Figure 1.

(A) and (B) depict two examples of our B-spline motion model. In both figures, the 5 blue dots represent the control points (CP). The first CP is fixed with a value of 0, while the remaining 4 CPs are varied to represent different motion patterns. Specifically, (A) illustrates a continuous translation spanning a full gantry rotation (x-axis represents time) with motion amplitude = 4mm, whereas (B) illustrates a rotation with a duration lasting half of the gantry rotation and motion amplitude = −3°. (C) shows the pipeline of our approach, which consists of four steps. Step 1 involves dividing a CT projection equally into K segments. In Step 2, each segment undergoes FBP to produce a series of K PAR images covering a full gantry rotation (2π). Moving on to Step 3, all PAR images are input into a CNN, which outputs the CPs necessary to build the B-spline motion model for each of 6 motion variables outlined in Step 4. As illustrated in the last step, the first CP remains 0 while the remaining 4 CPs are obtained from the CNN’s output. FBP = filtered back-projection. PAR = partial angle reconstruction. H = height of PAR. W = width of PAR. Z = number of slices of PAR. CNN = convolutional neural network. CP = control point.

To clarify, after the motion modeling, estimating the head motion is technically equivalent to estimating the values of CP1 to CP4 for each of the 6 motion variables. In total, 4 × 6 = 24 CPs need to be estimated.

2.2. Partial angle reconstruction (PAR) images

Our method requires the CT projections before rebinning. The CT projections are divided equally into K non-overlapping angular segments. Hence, each segment $k$ ($-\frac{K-1}{2} \le k \le \frac{K-1}{2}$) covers an angular interval of $\left[\theta_k - \frac{\Delta}{2},\ \theta_k + \frac{\Delta}{2}\right]$, where $\theta_k = \pi + \Delta \times k$ and $\Delta = \frac{2\pi}{K}$. Each segment is then reconstructed by FBP without any view weighting (such as Parker weights) to obtain a series of K PAR images covering the entire 360° gantry rotation. In our study, a large value of K = 25 (25 PAR images in total) was chosen to maintain high temporal resolution ($= \frac{t_{rot}}{25}$, where $t_{rot}$ is the time of one gantry rotation) and to reduce the motion artifacts within each PAR. Each PAR has dimensions [H, W, Z], where H and W represent height and width and Z represents the number of z-slices it includes. The first two steps in Figure 1C illustrate the process of PAR image generation.
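For illustration, the sketch below splits a full-scan projection set into K angular segments and computes each segment's central angle following the definitions above. The FBP call for each segment is left as a placeholder for whichever reconstruction toolbox is used; all names here are illustrative assumptions.

```python
# Sketch of the PAR segmentation in section 2.2: K non-overlapping angular segments,
# each later reconstructed by FBP without view weighting to form one PAR image.
import numpy as np

def split_into_segments(projections, K=25):
    """projections: array of shape [n_views, n_det_u, n_det_z] covering 360 degrees."""
    n_views = projections.shape[0]
    views_per_seg = n_views // K  # if n_views is not divisible by K, pad or use uneven segments
    return [projections[k * views_per_seg:(k + 1) * views_per_seg] for k in range(K)]

def segment_center_angles(K=25):
    """Central angle theta_k of each segment, with Delta = 2*pi/K and k symmetric around 0."""
    delta = 2.0 * np.pi / K
    ks = np.arange(K) - (K - 1) / 2.0
    return np.pi + delta * ks, delta

# segments = split_into_segments(projections)
# par_images = [fbp_without_view_weighting(seg) for seg in segments]  # placeholder FBP call
```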

2.3. Deep learning (DL) model designs

We developed a DL model to estimate the head motion using PAR images as model input. As mentioned in the last paragraph in section 2.1, the task of the DL model is to estimate CP1 to CP4 for each of the 6 motion variables.

The model architecture of our 3D convolutional network is explained in Figure 2. The input to the model consists of a sequence of 25 PAR images, with dimensions [H, W, Z, 25], where each channel represents an individual PAR image. The model outputs 24 CPs (4 CPs for each motion variable) required to subsequently build the B-spline for each motion variable following Equation 1. The model loss is the summation of the mean-squared-error of each motion variable. This process is illustrated in the last two steps in Figure 1C.
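A hedged sketch of a regressor of this kind is given below: a stack of 25 PAR images enters as channels and 24 control points come out. The layer counts, filter sizes and pooling scheme are assumptions for illustration; the authors' exact architecture is the one shown in Figure 2.

```python
# Illustrative 3D CNN that maps [H, W, Z, 25] PAR stacks to 24 control points
# (CP1..CP4 for each of the 6 motion variables), trained with an MSE loss.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_motion_cnn(h=128, w=128, z=15, n_par=25, n_outputs=24):
    inputs = tf.keras.Input(shape=(h, w, z, n_par))  # each channel is one PAR image
    x = inputs
    for filters in (32, 64, 128, 256):               # assumed encoder depth and widths
        x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling3D(pool_size=(2, 2, 1))(x)  # pool in-plane, keep the z dimension
    x = layers.GlobalAveragePooling3D()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(n_outputs)(x)             # linear regression head for the 24 CPs
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")      # MSE over all CPs, matching the paper's loss up to scaling
    return model

model = build_motion_cnn()
model.summary()
```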

Figure 2. Convolutional Neural Network (CNN) architecture.

Figure 2.

The model input is a series of 25 PARs, where each PAR has 15 z-slices and a height and width of 128 pixels each. The model output consists of 4 CPs (corresponding to CP1 to CP4 in the B-spline model) for each of the 6 motion variables. Conv3D = 3D convolution layer. CP = control point.

2.4. Integrate the estimated motion into iterative reconstruction

We used iterative reconstruction for easier integration of the estimated motion into the reconstruction. We observed that a simple FBP-and-warp approach led to aliasing artifacts because the sampling is no longer uniform due to the random motion during the acquisition. This violates the basic assumption underlying the discretization of the FBP algorithm, especially when the motion magnitude is large. Hence, iterative reconstruction is employed so that flaws of the reconstruction algorithm do not confound our motion estimation, which is the focus of this work. After the CPs are predicted by the trained CNN, we use the B-spline in Equation 1 to calculate the motion variables at each frame k, which are then used to construct the affine matrix $T_k$ from the 6 motion variables. Concretely, $T_k$ is defined as $T_k = \begin{bmatrix} r_k & t_k \\ 0 & 1 \end{bmatrix}$, where $r_k$ is a standard 3×3 rotation matrix composed of rx, ry and rz, and $t_k$ is a 3×1 translation vector composed of tx, ty and tz. Using the B-spline interpolation, the temporal resolution of k can be as fine as one projection (i.e., it is not limited to $\frac{t_{rot}}{25}$ as we set for the deep learning study).
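For illustration, a minimal sketch of assembling such a per-frame matrix from the six interpolated motion variables is shown below. The rotation composition order (Rz·Ry·Rx) and the mm/degree conventions are assumptions; the paper does not specify them.

```python
# Sketch of building the per-frame affine matrix T_k = [[R_k, t_k], [0, 1]] from the
# six motion variables (translations in mm, rotations in degrees).
import numpy as np

def rigid_transform(tx, ty, tz, rx, ry, rz):
    rx, ry, rz = np.deg2rad([rx, ry, rz])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx          # assumed composition order
    T[:3, 3] = [tx, ty, tz]
    return T
```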

The affine matrices $T_k$ can be used to construct the MVF matrix $M \in \mathbb{R}^{JK \times J}$, which maps the image to be reconstructed, $x \in \mathbb{R}^{J}$, to a per-projection moving image sequence $Mx \in \mathbb{R}^{JK}$. Here J is the number of image voxels and K is the number of frames. Denoting the projection data as $y \in \mathbb{R}^{I}$, where I is the number of detector pixels times the number of projections, the cost function of the iterative reconstruction is:

$$x^* = \arg\min_x \frac{1}{2}\left\| AMx - y \right\|^2 + \beta \sum_j \sum_{j' \in N_j} \frac{1}{27}\left(x_j - x_{j'}\right)^2 \quad \text{(Eq. 2)}$$

where $A \in \mathbb{R}^{I \times JK}$ is the system matrix that forward projects the K moving images to form the motion-corrupted projections. The second term is a Gaussian prior to regularize noise, where $N_j$ is the 27-neighborhood of voxel j, including itself, and β is the hyperparameter controlling the strength of the Gaussian prior. $x_j$ denotes the jth voxel of x and $x_{j'}$ denotes its neighboring voxels in $N_j$.

Compared to the conventional, static system matrix in $\mathbb{R}^{I \times J}$, the moving system matrix A has a block structure with K diagonal submatrices $A_k \in \mathbb{R}^{I_k \times J}$, where $\sum_k I_k = I$, and zeros everywhere else. $A_k$ corresponds to the forward projection of the image at frame k onto the related projections. Despite the increased size of the system matrix, everything can be calculated on the fly and no additional memory is needed to store the matrix.

Eq. 2 can be solved via the separable quadratic surrogate algorithm16 (derivation in Supplemental Text S-2):

$$x^{(n+1)} = x^{(n)} - \frac{M^T A^T\left(AMx^{(n)} - y\right) + 4\beta\left(x^{(n)} - F(x^{(n)})\right)}{M^T A^T A M \mathbf{1} + 8\beta} \quad \text{(Eq. 3)}$$

where F(·) is a 3×3×3 mean filter, $\mathbf{1}$ is an all-ones vector with the same shape as x, and the division is element-wise. The algorithm was initialized with FBP-and-warp. Nesterov acceleration (γ = 0.5) was used to speed up the algorithm17,18. 100 iterations with 12 ordered subsets were used for the reconstruction.

We calculate the MVF matrix M and the projection matrix A, as well as their transposes, on the fly. M is implemented by applying the transformation matrices $T_k$ to the image to obtain the moving image for each projection. A is implemented by forward projecting the image corresponding to each projection angle using the distance-driven method. $A^T$ is implemented by individually backprojecting each projection without summing, again using the distance-driven method. $M^T$ is implemented by applying the inverse transformation matrix $T_k^{-1}$ to each angle's backprojection and summing the results. Note that our implementation does not guarantee that $M^T$ is the exact transpose of M, but no divergence of the algorithm was observed in our experiments.
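The sketch below illustrates the resulting update (Eq. 3) written against abstract operator callables, where forward(x) stands for A M x and back(r) for Mᵀ Aᵀ r. Ordered subsets and Nesterov acceleration are omitted for brevity, and the function names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the separable quadratic surrogate update in Eq. 3.
import numpy as np
from scipy.ndimage import uniform_filter

def sqs_reconstruct(y, forward, back, x0, beta=1e-3, n_iters=100):
    """forward(x) applies A M x; back(r) applies M^T A^T r; x0 is the initial image."""
    x = x0.copy()
    denom = back(forward(np.ones_like(x))) + 8.0 * beta            # M^T A^T A M 1 + 8*beta
    for _ in range(n_iters):
        grad_data = back(forward(x) - y)                            # M^T A^T (A M x - y)
        grad_prior = 4.0 * beta * (x - uniform_filter(x, size=3))   # 4*beta*(x - F(x)), F = 3x3x3 mean
        x = x - (grad_data + grad_prior) / denom                    # element-wise division
    return x
```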

After this pipeline of DL motion estimation and iterative reconstruction, the motion-corrected image x* is obtained. We denote this image as IDL,PAR in this paper to emphasize the contribution of both PAR and deep learning.

2.5. Simulation study

2.5.1. Dataset

100 head CT scans without any motion artifacts, denoted as Istatic, were retrospectively collected from a single medical center. The pixel size of all images was resampled to 1 mm in the x- and y-directions, and the z-slice thickness was resampled to 2.5 mm. The images were cropped to dimensions of 256×256×45 voxels, covering the region from the nose to the top of the head.

2.5.2. Motion simulation

To train and test our DL model, we conducted a simulation study where the ground truth simulated motion was known. B-spline motions were simulated by randomly sampling CP1 to CP4 from [−max, max], where max is the motion amplitude, equal to 5 mm for each translation or 5° for each rotation1,19,20. These simulated motions were then applied to the entire motion-free image. A simulated full-scan (360° gantry rotation) projection was generated by forward projecting the moving image using a multi-slice fan-beam geometry, which is close to the geometry of portable CT scanners21. Following the steps outlined in section 2.2, 25 PAR images were generated from the projection. Each PAR had the same dimensions and pixel size as the CT image. To fit all PARs on the GPU for model input, we further (1) downsampled each PAR by a factor of 2 in the x- and y-directions, resulting in H = 128 and W = 128; and (2) included only 15 consecutive slices in each PAR, resulting in approximately 4 cm (≈ 15 × 2.5 mm) of z-direction coverage. During the training phase, the model randomly sampled 15 consecutive slices as the model input in each iteration.
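As a concrete illustration of the motion sampling described above, the sketch below draws CP1 through CP4 uniformly from [−max, max] for each of the six variables, with CP0 fixed at 0; the names are illustrative.

```python
# Sketch of the random motion sampling in section 2.5.2: CP0 stays 0 (initial pose),
# CP1..CP4 are drawn uniformly within the motion amplitude for each motion variable.
import numpy as np

def sample_motion_cps(max_amp=5.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    variables = ["tx", "ty", "tz", "rx", "ry", "rz"]   # mm for translations, degrees for rotations
    cps = {}
    for v in variables:
        cp = np.zeros(5)
        cp[1:] = rng.uniform(-max_amp, max_amp, size=4)
        cps[v] = cp
    return cps

print(sample_motion_cps(rng=np.random.default_rng(0)))
```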

Among the 100 CT scans, 80 were used for training and the remaining 20 were used for testing. In the training dataset, 100 different motion simulations were generated for each scan, resulting in a total of 8000 images. In the testing dataset, 50 different motion simulations were generated for each scan, resulting in a total of 1000 images.

2.5.3. Investigation on z-coverage

We conducted two investigation studies on the performance of our model using different z-coverages. First, a CT scanner may not be able to cover the entire head along the z-direction in one gantry rotation, so the model may only see a limited z-coverage. Considering the anatomical variation across the head, as depicted in Figure 3, we divided the head into three parts: top (showing the cerebrum surrounded by the skull), middle (showing the brain's ventricles and the frontal sinus within the frontal bone of the skull) and bottom (displaying the nose, maxillofacial bones and cerebellum). We generated PAR images covering 15 z-slices from each part and evaluated the stability of model performance when using different parts as the model input. Note that the model was trained by randomly sampling 15 consecutive slices as the model input, so it had already learned the different anatomies. We refer to this investigation as our "main simulation study" in this paper, and we report the motion estimation accuracy for each part separately in the Results section.

Figure 3. Divide a head into three parts: top, middle and bottom.

Figure 3.

Display window level = 50HU, window width = 100HU.

Second, some CT scanners have a relatively short z-coverage in each gantry rotation (e.g., some portable CTs21 have a 1 cm z-coverage per rotation), so it is worth accommodating a smaller z-coverage in our model input for these scanners. Concretely, we upsampled the motion-free image to a pixel size of 0.625 mm in the z-direction and applied the same motions as in the main simulation study to the upsampled motion-free images. Once again, following the steps in section 2.2, we generated 25 PAR images from the simulated projection, with each PAR having a z-resolution of 0.625 mm. By using a 15-slice model input, the z-coverage is approximately 1 cm (≈ 15 × 0.625 mm). We evaluated the model performance with this shorter z-coverage. We refer to this investigation as the "shorter z-coverage study" in this paper.

2.6. Phantom study

To evaluate the model performance on real CT projections, we conducted a head phantom study. An ACS head phantom (Kyoto Kagaku, Kyoto, Japan) was scanned using an OmniTom Elite PCD portable photon-counting CT (Neurologica, Danvers, MA, US). The system is shown in Figure 4A. The x-ray tube voltage was set to 120 kVp and the tube current to 20 mA. The gantry rotation speed was 1 second per rotation, resulting in 1440 projections per rotation. The z-directional slice thickness of the acquired projection was 0.707 mm. Only the slices from the nose to the top of the head phantom were used in this analysis.

Figure 4. Phantom study.

Figure 4.

(A) Photograph of the experimental setup of our phantom study. (B) A projection acquired from one CT scan of the static head phantom. To simulate intra-slice rotation and inter-slice translation with known ground truth, the projection was translated along different axes. θ = projection view direction, u = detector direction, z = z-slice direction, Δ = resolution.

To create a motion-contained projection, we stacked segments from different projection sets representing different head poses. Concretely, we acquired one projection dataset from a CT scan of the static head phantom and then applied data transformations to this projection dataset to simulate motions with known ground truth (Figure 4B). A 3D projection dataset has three directions: the projection view direction θ, the detector direction u and the z-slice direction z. By translating the projections along the θ direction, we generated the intra-slice rotation, rz. The rotation was discrete, with a step size equal to the angular coverage of each projection view, 360°/1440 views = 0.25°. By translating the projections along the z direction, we generated the inter-slice translation, tz. The translation was also discrete, with a step size equal to the slice thickness of the projection, 0.707 mm. We did not generate tx, ty, rx and ry due to technical difficulties in accurately applying small, controlled perturbations. Note that with rz and tz we already had both intra-slice and inter-slice motion. To create a motion pattern spanning a full gantry rotation, we obtained a sequence of 25 projection sets, each representing a particular combination of rz and tz generated using the method above. From these projection sets, we sequentially selected the respective 1/25 segment from each (i.e., we equally divided each projection set into 25 segments and selected the ith segment from the ith projection set) and stacked these segments together, resulting in a motion-contained projection with a temporal resolution equal to $\frac{t_{rot}}{25}$.
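The sketch below illustrates this projection-domain manipulation under the assumption that the static projection array is indexed as [view, u, z]: shifting along the view axis emulates the discrete rz and shifting along the z axis emulates the discrete tz. The array layout, the function names and the wrap-around behavior of np.roll are simplifying assumptions, not the authors' exact procedure.

```python
# Sketch of building a motion-contained projection (section 2.6) by stacking the i-th
# 1/25 segment from 25 differently "posed" copies of a static projection dataset.
import numpy as np

def motion_contaminated_projection(static_proj, rz_deg_per_seg, tz_mm_per_seg,
                                   K=25, deg_per_view=0.25, mm_per_slice=0.707):
    """static_proj: [n_views, n_u, n_z]; the two pose lists give rz / tz for each of K segments."""
    n_views = static_proj.shape[0]
    seg_len = n_views // K
    out = np.empty_like(static_proj)
    for k in range(K):
        view_shift = int(round(rz_deg_per_seg[k] / deg_per_view))   # discrete intra-slice rotation
        z_shift = int(round(tz_mm_per_seg[k] / mm_per_slice))       # discrete inter-slice translation
        posed = np.roll(static_proj, shift=view_shift, axis=0)      # np.roll wraps at the edges;
        posed = np.roll(posed, shift=z_shift, axis=2)               # a real implementation would pad instead
        out[k * seg_len:(k + 1) * seg_len] = posed[k * seg_len:(k + 1) * seg_len]
    return out
```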

We created 20 motion-contained projections with different motion patterns, each of which had the motion amplitude within ±5mm or 5° and had the motion between two consecutive temporal segments within ±1mm or 1°. From each motion-contained projection, we reconstructed 25 PAR images with dimensions and pixel sizes consistent with those used in the main simulation study. These PAR images were then input into our trained DL model for performance evaluation. Note that we averaged the estimated motion across the results obtained by using three parts of the head phantom (top, middle and bottom) as model input and report the averaged values in the Results section.

2.7. Evaluations and Comparisons

2.7.1. Quantitative Evaluation

We reported the mean-absolute-error (MAE) to evaluate the estimated motion against the ground truth. The MAE was averaged across the four CPs (CP1 to CP4) for each motion variable. As defined in the last paragraph of section 2.4, IDL,PAR is the motion-corrected image produced by our approach. We reported MAE, root-mean-squared-error (RMSE) and structural similarity index (SSIM) to evaluate IDL,PAR against the motion-free reference images, Istatic. All image errors were measured on the foreground pixels whose values are larger than −10 HU in the reference images.
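As an illustration of these image metrics, the sketch below computes MAE and RMSE on foreground voxels (reference > −10 HU) and SSIM with scikit-image. Computing SSIM over the full volume with default settings is a simplifying assumption, since the exact SSIM configuration is not stated in the paper.

```python
# Sketch of the image-quality metrics in section 2.7.1 (MAE, RMSE on foreground; SSIM).
import numpy as np
from skimage.metrics import structural_similarity

def evaluate(recon, reference, threshold_hu=-10.0):
    mask = reference > threshold_hu                       # foreground voxels in the reference
    diff = recon[mask] - reference[mask]
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    ssim = structural_similarity(recon, reference,        # full-volume SSIM (assumed settings)
                                 data_range=reference.max() - reference.min())
    return {"MAE_HU": mae, "RMSE_HU": rmse, "SSIM": ssim}
```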

2.7.2. Comparison with an image-domain-based CNN

We also applied an image-domain-based DL approach proposed in Su et al.2 (more details in Supplemental Material Text S-1) to the testing dataset and denoted the generated image as IDL,image. Unlike our approach that requires projection data, this approach reduces motion artifacts solely based on the image domain. It takes the motion-corrupted image obtained through FBP as the model input and outputs the motion-corrected image. The concerns of this approach include the potential lack of data fidelity and the challenges related to skull deformities2. This CNN was trained on the same training dataset and then evaluated on the same testing dataset as our method did. The one-tailed paired t-test was used to determine whether there is a significant improvement from IDL,image to IDL,PAR in terms of MAE, RMSE and SSIM. Statistical significance was set to p ≤ 0.05.

2.8. Run-time for Our Approach

Regarding the PAR image generation, it took approximately 5 seconds per case to perform FBP on the 25 angular segments of the CT projection and produce the 25 PAR images. Regarding the DL model deployment, we performed all DL experiments using TensorFlow (http://www.tensorflow.org/) on an Ubuntu (version 20.04.5) workstation with 40 GB of RAM equipped with an NVIDIA DGX A100 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The DL model with input dimensions [128,128,15,25] (where 25 represents the 25 PAR images) took ~38 GB of memory during training and inference. The training time for one epoch was around 45 minutes, and we implemented early stopping to optimize the training duration. For inference, the model was capable of predicting all control points for the 6 motion variables in less than 1 second.

3. RESULTS

3.1.1. Main simulation study: motion estimation accuracy

As mentioned in section 2.5.3, we assessed model performance when using PAR images made from three pre-defined parts of the head as the model input. Table 1 shows the motion estimation accuracy for intra-slice and inter-slice motions in the bottom, middle and top parts. Averaged across the three parts, the motion estimation accuracy is 0.40±0.21 mm for tx, 0.36±0.18 mm for ty, 0.36±0.18 mm for tz, 0.30±0.14° for rx, 0.35±0.17° for ry and 0.39±0.21° for rz.

Table 1.

Mean-absolute-error (MAE) of motion estimation by DL

part of head bottom middle top
z-coverage 4cm 1cm 4cm 1cm 4cm 1cm
intra-slice tx(mm) 0.38±0.20 0.54±0.50* 0.41±0.15 0.54±0.32* 0.42±0.20 0.59±0.52*
ty(mm) 0.32±0.14 0.57±0.57* 0.33±0.15 0.50±0.29* 0.43±0.21 0.55±0.31*
rz(°) 0.33±0.14 0.41±0.35* 0.34±0.16 0.38±0.31* 0.51±0.24 0.69±0.36*
inter-slice tz(mm) 0.33±0.14 0.43±0.32* 0.34±0.16 0.51±0.38* 0.40±0.22 0.51±0.30*
rx(°) 0.28±0.13 0.53±0.44* 0.29±0.14 0.64±0.50* 0.31±0.14 0.68±0.52*
ry(°) 0.30±0.13 0.47±0.39* 0.34±0.16 0.47±0.35* 0.41±0.20 0.49±0.38*

z-coverage = 4cm corresponds to the main simulation study where we input 15 PAR images with z-resolution = 2.5mm. z-coverage = 1cm corresponds to the shorter z-coverage study where we input 15 PAR images with z-resolution = 0.625mm.

*

means statistically significant larger (p<0.05 by a one-tailed paired t-test).

3.1.2. Main simulation study: quantitative image evaluation

We integrated the predicted motion into the CT system matrix and employed the iterative reconstruction proposed in section 2.4 to generate the motion-corrected image IDL,PAR. Each part of the head was reconstructed individually using its corresponding predicted motion. Figure 5 compares the images produced by FBP (IFBP), by the image-domain-based CNN2 (IDL,image) and by our method (IDL,PAR) for four patients with different simulated motions. In these examples, IDL,PAR shows better capability in removing streaking artifacts (Figure 5A), recovering both large and small anatomical features (Figure 5B) and addressing skull deformities (Figure 5C). IDL,PAR also successfully reduces the motion-induced blurring in a patient with hemorrhage (Figure 5D). In all examples, we observed that IDL,image suffers from over-smoothing, especially in the brain tissue regions (Figure 5A–D).

Figure 5. Comparisons of IFBP, IDL,image and IDL,PAR.

Figure 5.

(A), (B) and (C) display examples of the bottom, middle and top parts of the head, respectively. The zoomed-in view of the yellow box is presented at the bottom right corner. In (A), (B) and (C), the problem with IDL,image is that it over-smooths the image, especially in the brain tissue. In contrast, in (A), IDL,PAR effectively removes the streaking artifacts; in (B), IDL,PAR not only accurately recovers large anatomical structures, such as the ventricles, but also restores small anatomical details, like the fissure between the two hemispheres of the brain (yellow box); in (C), IFBP suffers from the double-skull artifact due to the large head motion, while IDL,PAR effectively reduces motion artifacts by estimating accurate motion from the PAR images, leading to successful recovery of both the skull and the surrounding brain tissue. (D) shows an example with hemorrhage. In IFBP, the location of the hemorrhage appears blurred. IDL,image over-smooths the region of the hemorrhage, while IDL,PAR reduces this blurring (yellow box). IFBP = image reconstructed by FBP. IDL,image = image reconstructed by the image-domain-based CNN. IDL,PAR = image reconstructed by our method. PSNR = Peak Signal-to-Noise Ratio.

We also quantitatively evaluated the performance of each method in the testing dataset. When considering all three parts of the head, our method, IDL,PAR, significantly outperforms IDL,image and IFBP with lower MAE values (MAE = 37±9, 150±28, 178±33HU for IDL,PAR, IDL,image and IFBP, p<0.05), lower RMSE values (RMSE = 80±20, 310±51, 346±56HU for IDL,PAR, IDL,image and IFBP, p<0.05) and higher SSIM values (SSIM = 0.98±0.01, 0.67±0.08, 0.60±0.10 for IDL,PAR, IDL,image and IFBP, p<0.05). Detailed quantitative evaluation for each part of the head can be found in Table 2 and Figure 6.

Table 2.

Quantitative evaluations of IFBP, IDL,image and IDL,PAR

IFBP IDL,image IDL,PAR
MAE (HU) all slices 178±33 150±28 37±9
bottom 177±36 165±40 39± 12
middle 147±33 124±28 29±10
top 245±79 182±54 42±16
RMSE (HU) all slices 346±56 310±51 80±20
bottom 334±54 322±61 83±26
middle 312±64 275±58 71±25
top 421±105 182±54 88±33
SSIM all slices 0.60±0.10 0.67±0.08 0.98±0.01
bottom 0.59±0.11 0.62±0.12 0.97±0.02
middle 0.63±0.13 0.70±0.09 0.98±0.02
top 0.56±0.16 0.70±0.11 0.98±0.02

Figure 6. Box plots of quantitative evaluations of IFBP, IDL,image and IDL,PAR.

Figure 6.

(A), (B) and (C) display the box plots of MAE, RMSE and SSIM, respectively. The red, blue, brown, and orange boxes represent the results for all slices, the bottom part, the middle part, and the top part of the head, respectively. Note that the y-axis ticks are adapted to the different value ranges of each metric.

3.2. Shorter z-coverage study: motion estimation accuracy

As stated in section 2.5.3, we evaluated the motion estimation accuracy when the z-coverage is only approximately 1cm. The ground truth motions are the same as the main simulation study. Table 1 presents the motion estimation accuracy with larger z-coverage (≈4cm) and shorter z-coverage (≈1cm) and demonstrates that the accuracy is significantly lower with shorter z-coverage (p < 0.05 by a one-tailed paired t-test).

3.3. Phantom study

We created 20 motion-contained projections representing different motion patterns of the head phantom. Our DL model estimated motions using PAR images made from each motion-contained projection and achieved motion estimation errors of 0.48±0.19° for the intra-slice rotation rz, 0.40±0.22 mm for the inter-slice translation tz, and 0.11±0.10 mm, 0.17±0.17 mm, 0.09±0.05° and 0.13±0.07° for tx, ty, rx and ry, which were static in the ground truth.

Regarding the quantitative evaluation of the motion-corrected images, IDL,PAR significantly outperforms IDL,image and IFBP with lower MAE values (MAE = 42±19, 122±14, 117±17HU for IDL,PAR, IDL,image and IFBP, p<0.05), lower RMSE values (RMSE = 85±39, 248±22, 239±28HU for IDL,PAR, IDL,image and IFBP, p<0.05) and higher SSIM values (SSIM = 0.98±0.02, 0.80±0.03, 0.83±0.04 for IDL,PAR, IDL,image and IFBP, p<0.05). An illustrative example of the model performance on the phantom study is presented in Figure 7.

Figure 7. Phantom study: an illustrative example.

Figure 7.

For this example, we show the simulated ground truth motion patterns as green lines in (A). As mentioned in section 2.6, the simulated motion was discrete and only tz and rz were simulated with non-zero values. The dashed brown lines in (A) represent a B-spline fitted to the ground truth discrete motion, with blue asterisks as the corresponding control points. The red lines in (A) represent the motion estimated by DL, with blue dots as the corresponding control points. The results indicate accurate estimation for tz and rz, and the estimated tx and rx remain close to 0, as in the ground truth. ty and ry are not shown here but also remain close to 0, as in the ground truth. (B) shows the IFBP, IDL,image and IDL,PAR generated for this example. Each row corresponds to a different slice. The zoomed-in view of the yellow box is presented at the bottom right corner. IDL,PAR demonstrates superior performance by effectively removing blurring in bone (first row), removing streaking artifacts (second row) and restoring small anatomical features (last row). Consistent with the simulation study results, IDL,image over-smooths the image, especially in the brain tissue. Quantitatively, there is no improvement in MAE and SSIM for IDL,image. The MAE and SSIM of each image are presented at the top of each column.

4. DISCUSSION

In this study, we investigated the effectiveness of using partial angle reconstruction (PAR) images to estimate and compensate for patient head motion in head CT scans. Concretely, we developed a DL model that uses PAR images as input to estimate the head motion. We then integrated the estimated motions into an iterative reconstruction scheme to compensate for motion artifacts in the final CT image. The DL model achieved high accuracy in estimating head motion, as evidenced by results from both a simulation study (MAE ranging from 0.28~0.45 mm or degrees for different motion variables) and a phantom study (MAE ranging from 0.40~0.48 mm or degrees). The resulting motion-corrected image, IDL,PAR, exhibited a significant reduction in motion artifacts when compared to IFBP from traditional FBP reconstruction, as evidenced in both the simulation study (MAE drops from 178±33HU to 37±9HU, SSIM increases from 0.60±0.10 to 0.98±0.01) and the phantom study (MAE drops from 117±17HU to 42±19HU, SSIM increases from 0.83±0.04 to 0.98±0.02).

4.1. Investigation on z-coverage

We conducted two investigation studies on the performance of our model using different z-coverages. First, we investigated the model performance when using different parts of the head, each with different anatomical features, as the model input. Table 1 shows the motion estimation MAE for the bottom part (ranging from 0.28 to 0.38 mm or degrees), the middle part (ranging from 0.29 to 0.41) and the top part (ranging from 0.31 to 0.51). The top part exhibited slightly lower motion estimation accuracy, displaying larger magnitudes and a broader range of MAE compared to the other two parts. A plausible explanation for this observation is that the bottom and middle parts encompass a larger volume of the head, including more bone structures (e.g., the maxillofacial bones in the bottom part) that may be helpful for motion estimation, which is fundamentally driven by image registration of the PAR images through our DL model.

Second, we investigated the model performance when using PAR images with a shorter z-coverage (approximately 1 cm) and a higher z-resolution (pixel size = 0.625 mm in the z-direction) as the model input. Table 1 reveals that the motion estimation accuracy is significantly lower with the shorter z-coverage (p < 0.05 by a one-tailed paired t-test). A plausible explanation is that the higher z-resolution decreases the disparity between neighboring slices, making it more challenging for the DL model to perform precise image registration, particularly along the z-direction. Although the smaller z-coverage led to worse motion estimation, the errors are still less than 0.7 mm or degrees for most of the variables, demonstrating the feasibility of applying the proposed method to portable CTs with a small z-coverage.

4.2. Phantom study

Compared with the simulation study, the motion estimation accuracy in the phantom study is slightly lower (0.40±0.22 mm versus 0.36±0.18 mm for tz and 0.48±0.19° versus 0.39±0.21° for rz, phantom versus simulation). Two plausible explanations are the domain shift and the discretized motion in the phantom study, which cannot be modeled well by the B-spline.

4.3. Comparison with an image-domain-based CNN

We compared the performance of IDL,PAR generated by our proposed method against IDL,image generated by the image-domain-based CNN proposed by Su et al.2 and showed the superior performance of our method (p<0.05). Both our simulation study and phantom study reveal that IDL,image tends to over-smooth the reconstructed image, especially in the brain tissue. Although IDL,image was reported to reduce motion artifacts in the original paper2, it is worth mentioning that their motion simulation involved much milder motion (e.g., the SSIM of their motion-corrupted images was 0.94 compared to our 0.60) in both the training and testing datasets, which simplified the task of motion correction. In contrast, our simulated motion can be categorized as moderate to severe (i.e., RMSE = 346±56HU, SSIM = 0.60±0.10) according to Kim et al.7, and in these harder cases IDL,image did not perform satisfactorily. This indicates that our approach, IDL,PAR, is a better solution when the patient experiences large motion during the CT acquisition.

4.4. Image Downsampling

We downsampled the PARs by a factor of 2 in the x- and y-directions to fit all 25 PARs into GPU memory. One concern is whether doing so affects the accuracy of the motion estimation and the sharpness of the reconstruction.

Because the motion was estimated at a lower resolution, it may suffer from a loss of precision at the pixel level. However, most visible motion artifacts in brain CT are due to rigid motions that are far beyond the pixel level. Hence, downsampled PARs are sufficient for our task of reducing motion artifacts. Our reconstruction was performed at full resolution with the estimated motion parameters. Therefore, the fact that the PARs are downsampled does not directly harm the sharpness of the reconstructed image.

4.5. Limitations

Our study has limitations. First, the head motion simulated in our study may not fully represent real patient motion in two respects. On the one hand, it might not adequately capture abrupt motion. We utilized a B-spline motion model with 5 control points, proposed by Jang et al.12, to describe the patient head motion during the CT scan. While our DL model effectively learns this motion model and estimates the values of the control points, it restricts the temporal resolution of our results to one-fifth of one gantry rotation. Consequently, this temporal resolution might not capture the abrupt motion that can occur in real patients. For instance, the portable CT we used in the phantom study has a gantry rotation lasting 1 second, meaning our method can only capture head motion lasting no less than 0.2 seconds. In future work, it will be essential to investigate the feasibility of incorporating additional control points into the motion model to enhance temporal resolution and to explicitly evaluate our model's performance in scenarios of abrupt motion. On the other hand, our study may not fully capture motion with large amplitudes, such as the movement of the patient's body, which can also induce motion artifacts in CT images. The rotations and translations we simulated in this study were always centered around the image center, which approximately corresponds to the center of the head under proper head positioning. Consequently, our simulated motion mainly represented patient head motion, typically with amplitudes in the range of 2~5 mm or degrees1. However, patient body movement can primarily be decomposed into translations along different directions, likely with larger amplitudes. Future investigations should consider these larger-amplitude motions.

Second, this method may not work properly for helical CT scans. Unlike axial CT scans, where the z-coverage remains constant during one gantry rotation, helical CT scans move along the z-direction during the gantry rotation, leading to a varying z-coverage for each PAR image. This dynamic z-coverage can pose challenges for the DL model in accurately performing image registration between PARs and estimating motion. However, note that our method can be straightforwardly extended to cone-beam geometry with a circular trajectory.

Lastly, while our study provides promising results from simulation and phantom studies, it is essential to further validate our method on real clinical CT scans in future work. The transition from simulation and phantom studies to real clinical scenarios may present certain challenges. These challenges include the need for a proper head motion model that effectively represents the typical head motion observed in real patients (as discussed in the first limitation) and the anatomical variation between patients.

5. CONCLUSIONS

We have demonstrated in this work that patient head motion can be accurately estimated using partial angle reconstruction (PAR) images of head CT scans. By developing a deep learning model that takes PAR images as input, we achieve accurate motion estimation. Integrating this motion estimate into an iterative reconstruction process significantly reduces motion artifacts in the final reconstructed images. Our method has been validated in simulation and phantom studies. Future work involves further investigation of our method on real clinical CT scans.

Supplementary Material

Supplemental Material 1
Supplemental Material 2

Acknowledgement:

We thank Neurologica for providing support during the physical phantom study.

Disclosure of Conflicts of Interest:

Dufan Wu is supported in part by NIBIB under grant R21EB031939. Zhennong Chen and Quanzheng Li are supported in part by NIBIB under grant R01HL159183.

References

  • 1. Kyme AZ, Fulton RR. Motion estimation and correction in SPECT, PET and CT. Phys Med Biol. 2021;66(18). doi: 10.1088/1361-6560/ac093b
  • 2. Su B, Wen Y, Liu Y, et al. A deep learning method for eliminating head motion artifacts in computed tomography. Med Phys. 2022;49(1):411–419. doi: 10.1002/mp.15354
  • 3. Fahmi F, Beenen LFM, Streekstra GJ, et al. Head movement during CT brain perfusion acquisition of patients with suspected acute ischemic stroke. Eur J Radiol. 2013;82(12):2334–2341. doi: 10.1016/j.ejrad.2013.08.039
  • 4. Malviya S, Voepel-Lewis T, Eldevik OP, Rockwell DT, Wong JH, Tait AR. Sedation and general anaesthesia in children undergoing MRI and CT: adverse events and outcomes. British Journal of Anaesthesia. 2000;84(6):743–748. doi: 10.1093/oxfordjournals.bja.a013586
  • 5. Kim JH, Nuyts J, Kuncic Z, Fulton R. The feasibility of head motion tracking in helical CT: a step toward motion correction. Med Phys. 2013;40(4):041903. doi: 10.1118/1.4794481
  • 6. Kim JH, Nuyts J, Kyme A, Kuncic Z, Fulton R. A rigid motion correction method for helical computed tomography (CT). Phys Med Biol. 2015;60(5):2047. doi: 10.1088/0031-9155/60/5/2047
  • 7. Kim JH, Sun T, Alcheikh AR, Kuncic Z, Nuyts J, Fulton R. Correction for human head motion in helical x-ray CT. Phys Med Biol. 2016;61(4):1416–1438. doi: 10.1088/0031-9155/61/4/1416
  • 8. Ko Y, Moon S, Baek J, Shim H. Rigid and non-rigid motion artifact reduction in X-ray CT using attention module. Medical Image Analysis. 2021;67:101883. doi: 10.1016/j.media.2020.101883
  • 9. Sun T, Kim JH, Fulton R, Nuyts J. An iterative projection-based motion estimation and compensation scheme for head x-ray CT. Med Phys. 2016;43(10):5705. doi: 10.1118/1.4963218
  • 10. Ouadah S, Jacobson M, Stayman JW, Ehtiati T, Weiss C, Siewerdsen JH. Correction of patient motion in cone-beam CT using 3D-2D registration. Phys Med Biol. 2017;62(23):8813–8831. doi: 10.1088/1361-6560/aa9254
  • 11. Bruder H, Rohkohl C, Stierstorfer K, Flohr T. Compensation of skull motion and breathing motion in CT using data-based and image-based metrics, respectively. In: Medical Imaging 2016: Physics of Medical Imaging. Vol 9783. SPIE; 2016:348–359. doi: 10.1117/12.2217395
  • 12. Jang S, Kim S, Kim M, Ra JB. Head motion correction based on filtered backprojection for x-ray CT imaging. Med Phys. 2018;45(2):589–604. doi: 10.1002/mp.12705
  • 13. Kim S, Chang Y, Ra JB. Cardiac motion correction based on partial angle reconstructed images in x-ray CT. Med Phys. 2015;42(5):2560–2571. doi: 10.1118/1.4918580
  • 14. Hahn J, Bruder H, Rohkohl C, et al. Motion compensation in the region of the coronary arteries based on partial angle reconstructions from short-scan CT data. Med Phys. 2017;44(11):5795–5813. doi: 10.1002/mp.12514
  • 15. Maier J, Lebedev S, Erath J, et al. Deep learning-based coronary artery motion estimation and compensation for short-scan cardiac CT. Med Phys. 2021;48(7):3559–3571. doi: 10.1002/mp.14927
  • 16. Elbakri IA, Fessler JA. Statistical image reconstruction for polyenergetic X-ray computed tomography. IEEE Transactions on Medical Imaging. 2002;21(2):89–99. doi: 10.1109/42.993128
  • 17. Wu D, Kim K, El Fakhri G, Li Q. Iterative Low-Dose CT Reconstruction With Priors Trained by Artificial Neural Network. IEEE Trans Med Imaging. 2017;36(12):2479–2486. doi: 10.1109/TMI.2017.2753138
  • 18. Kim K, El Fakhri G, Li Q. Low-dose CT reconstruction using spatially encoded nonlocal penalty. Med Phys. 2017;44(10):e376–e390. doi: 10.1002/mp.12523
  • 19. Johnson PM, Drangova M. Conditional generative adversarial network for 3D rigid-body motion correction in MRI. Magn Reson Med. 2019;82(3):901–910. doi: 10.1002/mrm.27772
  • 20. Sommer K, Saalbach A, Brosch T, Hall C, Cross NM, Andre JB. Correction of Motion Artifacts Using a Multiscale Fully Convolutional Neural Network. AJNR Am J Neuroradiol. 2020;41(3):416–423. doi: 10.3174/ajnr.A6436
  • 21. Park SJ, Park J, Kim D, et al. The first mobile photon-counting detector CT: the human images and technical performance study. Phys Med Biol. 2023;68(9). doi: 10.1088/1361-6560/acc8b3
  • 22. Erdogan H, Fessler JA. Monotonic algorithms for transmission tomography. IEEE Transactions on Medical Imaging. 1999;18(9):801–814. doi: 10.1109/42.802758
