Abstract
Rationale and Objectives:
To evaluate the clinical performance of a diffusion model-based motion correction algorithm for portable brain CT.
Materials and Methods:
We retrospectively collected 67 portable brain CT scans with corresponding fixed CT scans acquired within ± 2 days as reference. A pre-trained diffusion model was applied to correct motion artifacts in the portable scans. Each case yielded three volumes as follows: original (motion group), corrected (corrected group), and fixed (reference group). Images were reviewed in randomized order by three professional readers (one neuroradiologist, one neuroradiology fellow, and one radiology resident), with at least two weeks between sessions to reduce recall bias. Eight lesion types and four image quality metrics were scored using a 5-point Likert scale. ACR phantom testing was performed to assess compliance with diagnostic image quality standards.
Results:
Corrected images significantly outperformed motion images in all image quality metrics (improvement: 0.33–0.79, p < 0.001), except for sharpness (p = 0.34). Diagnostic confidence improved from 2.52 to 2.86. Lesion detectability remained comparable before and after correction, with no significant differences in agreement rates (McNemar’s p > 0.10) or AUCs (DeLong’s p > 0.06) across all lesion types. Agreement rates ranged from 0.866 to 0.985 in the corrected group against the reference, and AUCs from 0.788 to 0.964. The net reclassification index was 2.66%. Corrected images passed all ACR criteria in phantom testing.
Conclusion:
The diffusion model-based algorithm effectively improves image quality and diagnostic confidence without compromising lesion detection, supporting its potential for clinical use in portable brain CT.
Keywords: Portable Brain CT, Motion Correction, Diffusion Model
INTRODUCTION
Transporting critically ill patients from the intensive care unit (ICU) to the CT suite poses significant challenges and may result in a high incidence of adverse events (1). Portable CT scanners were designed to overcome this issue. These portable scanners can be easily moved around the hospital to the bedside, or to an operating room for intra-operative imaging. Furthermore, portable CT can also be installed in ambulances for time-sensitive acute ischemic stroke (AIS) management. An ambulance equipped with a portable CT scanner can eliminate the delay in scanning upon arrival at the emergency department by allowing patients to be scanned immediately at the site or en route (2,3), therefore increasing the likelihood of favorable outcomes from recanalization therapies (4). Portable CT scanners have been used primarily for brain imaging.
Patient motion is a major source of artifacts in portable brain CT for three main reasons. First, the slower gantry rotation (one second per rotation in portable CT (5), compared to ~400 ms in fixed CT (6)) and narrower z-axis coverage (1 cm for portable CT (5), compared to approximately 16 cm for a 320-detector-row CT (7)) prolong the scan needed to cover the patient’s entire head, increasing the likelihood of head motion during acquisition. Second, critically ill patients often cannot voluntarily stabilize their head, and existing immobilization devices may not be appropriate for such patients (8). Third, portable CT scanners currently do not incorporate advanced hardware improvements, such as faster gantry rotation or dual-source technology, which have been adopted in standard CT scanners to minimize scan time.
Several solutions have been proposed to address motion artifacts in brain CT images. They can be broadly categorized into two groups: (1) methods that estimate head motion from CT projection data and perform motion compensation during image reconstruction (9–12) and (2) deep learning-based techniques that perform motion correction solely within the image domain by learning mappings between motion-corrupted and motion-free image pairs (13,14). Among these, Zhou et al. (15) clinically validated their deep learning-based algorithm (14) for reducing motion artifacts and improving lesion detectability in fixed brain CT. However, none of these prior studies has specifically targeted motion correction in portable brain CT.
To address this gap, a diffusion model-based algorithm for motion correction in portable CT has been proposed (16). The diffusion model is conditioned on motion-corrupted images and trained to estimate the score function of the corresponding motion-free images. During inference, the trained model generates motion-corrected images through iterative sampling. While the original study included a simple human observer study that subjectively assessed motion artifact reduction, it lacked artifact-free reference images and therefore could not quantify the impact of motion correction on lesion detectability. Compromised lesion detectability (e.g., the introduction of false positives and false negatives in motion-corrected images) is a major concern when applying generative models to medical imaging (17). In this paper, we aim to provide a more rigorous clinical validation of this algorithm. We retrospectively collected portable brain CT scans with motion artifacts alongside corresponding fixed CT scans of the same patients, using the latter as a reference standard for lesion presence. A reader study was conducted to score the presence of eight different types of intracranial lesion on the original portable CT, the motion-corrected images, and the reference fixed CT. To the best of our knowledge, this is the first study to evaluate a diffusion model-based method for portable CT image reconstruction using a task-based human observer study focused on diagnostic performance.
METHODS AND MATERIALS
Data Sources
This retrospective, single-center study was approved by the local institutional review board (IRB2022P002233). Figure 1 shows the flowchart for case collection. Between January 2018 and July 2023, we retrospectively collected 250 portable brain CT scans. Among these, 131 scans had corresponding fixed CT scans acquired within ± 2 days for the same patients, with the fixed CT considered as the reference standard for lesion detection. Within this subset, 67 portable scans exhibited visible motion artifacts and were therefore included in this study. Each scan pair was from a unique patient. Specifically, 23 scan pairs were acquired on the same day, 30 within ± 1 day, and 14 within ± 2 days.
Figure 1.

Flowchart for case collection.
The data were divided into the following three groups: (1) the motion group, consisting of the original portable CT scans with motion artifacts; (2) the corrected group, consisting of motion-corrected images; and (3) the reference group, consisting of the corresponding fixed CT scans. The images of the motion and reference group were non-contrast volumes reconstructed with soft kernels. The corrected group was generated by the proposed diffusion algorithm (16).
Image Acquisition
The original portable CT scans (motion group) were acquired on a portable CT scanner (CereTom, Neurologica, Danvers, MA, USA) using a standard non-contrast brain protocol. The tube voltage was 120 kVp and the X-ray tube current was 6–7 mA. The volume CT dose index ranged from 41.26 to 70.73 mGy (mean: 54.74 mGy). The in-plane pixel spacing was 0.49 mm in both the x and y directions, and the slice thickness was 1.25 mm. The fixed CT scans (reference group) were acquired on Siemens (n = 55) and GE (n = 12) scanners using a standard non-contrast brain protocol. The tube voltage was 120 kVp and the X-ray tube current ranged from 145 to 935 mA (mean: 309 mA). The volume CT dose index ranged from 23.21 to 94.44 mGy (mean: 59.41 mGy). The in-plane pixel spacing ranged from 0.38 to 0.52 mm (mean: 0.43 mm) in the x and y directions, while the slice thickness ranged from 0.5 to 1 mm (mean: 0.61 mm). All volumes, including both portable and fixed CTs, were reconstructed using soft tissue kernels. To meet the input requirements of the diffusion model, all volumes were resampled to a voxel size of 1×1×2.5 mm³ and then cropped or padded to 256×256×50 voxels.
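The resampling and crop/pad step described above can be illustrated with a small planning sketch. It only computes grid sizes and crop/pad offsets; the interpolation itself is omitted. The function names are hypothetical, and the choice of centered cropping and padding is an assumption, since the text states only that volumes were cropped or padded. The 512×512×120 input grid in the usage note is likewise a made-up example.

```python
def resampled_shape(shape, spacing, new_spacing=(1.0, 1.0, 2.5)):
    """Voxel count per axis after resampling from `spacing` to `new_spacing` (mm)."""
    return tuple(max(1, round(n * s / t))
                 for n, s, t in zip(shape, spacing, new_spacing))


def center_crop_or_pad_1d(length, target):
    """Return (start, stop, pad_before, pad_after) for one axis.

    Assumption: cropping and padding are centered; the paper does not say.
    """
    if length >= target:
        start = (length - target) // 2
        return start, start + target, 0, 0
    total_pad = target - length
    pad_before = total_pad // 2
    return 0, length, pad_before, total_pad - pad_before


def plan_crop_pad(shape, target=(256, 256, 50)):
    """Per-axis crop/pad plan that brings `shape` to the model input size."""
    return [center_crop_or_pad_1d(n, t) for n, t in zip(shape, target)]


# Hypothetical portable volume: 512 x 512 x 120 voxels at 0.49 x 0.49 x 1.25 mm
shape = resampled_shape((512, 512, 120), (0.49, 0.49, 1.25))
plan = plan_crop_pad(shape)
```

For this hypothetical input, the in-plane axes are slightly smaller than 256 after resampling and are padded, while the slice axis is larger than 50 and is cropped.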
Motion Correction Algorithm
The diffusion-based motion correction algorithm was proposed in a previous technical paper (16). Its key technical innovations include the following: (1) using the original portable CT images as conditional inputs to the diffusion network to guide the generation of the corresponding motion-corrected outputs; (2) applying histogram equalization during image preprocessing to resolve the performance discrepancy between brain tissue and skull caused by their different intensity ranges; and (3) leveraging the Elucidated Diffusion Model (EDM) (18) framework for an expedited sampling process. With this approach, motion correction for a full case of 256×256×50 voxels can be completed in approximately 2 min on a single NVIDIA A100 GPU. The process of algorithm development and clinical validation is illustrated in Figure 2. The source code is publicly available at https://github.com/zhennongchen/Diffusion_for_CT_motion.
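As background on why the EDM framework allows a short sampling process, the sketch below generates the noise-level schedule described in the EDM paper (18), which concentrates a small number of sampling steps near low noise levels. The default values of `sigma_min`, `sigma_max`, and `rho` are the defaults from that paper; they are assumptions here, and the number of steps and the settings actually used by the motion correction model are not stated in this article.

```python
def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in the EDM paper.

    Assumption: default parameter values follow the EDM paper, not this study.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]


# Hypothetical 18-step schedule; a sampler would denoise from sigmas[0] to sigmas[-1]
sigmas = karras_sigmas(18)
```

A sampler would start from noise at the largest level and, conditioned on the motion-corrupted image, step through the list toward the smallest level.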
Figure 2.

The process of algorithm development and clinical validation. Note that during model training, we used fixed CT as reference standard since it is practically more challenging to find portable CT completely free of image artifacts due to the nature of portable technology.
Qualitative Scoring of Image Quality and Lesion Presence
CT images were independently reviewed by three professional readers as follows: (1) a 4th-year radiology resident (4 years of experience that includes emergency radiology and radiological services for ICU, with a dedicated 12 months in diagnostic neuroradiology), (2) a neuroradiology fellowship trained and board-certified radiologist (6 years of experience) and (3) a neuroradiology fellow (5 years of experience, 4 years of radiology residency and 1 year of neuroradiology fellowship experience). Images from the motion, corrected and reference group were mixed in a randomized order and reviewed in three distinct sessions with a minimum interval of two weeks to minimize memorization and reduce the risk of recall bias.
Four qualitative image quality metrics, including overall quality, sharpness, motion artifacts, and diagnostic confidence, were graded across the motion, corrected, and reference groups using a 5-point Likert scale (19), where higher scores indicate better image quality.
Regarding lesion detectability, the presence of eight different lesion types was graded across the three groups using a 5-point Likert scale, with higher scores indicating greater likelihood of lesion presence. The lesion types evaluated were epidural hematoma (EDH), intraparenchymal hematoma (IPH), subdural hematoma (SDH), subarachnoid hemorrhage (SAH), intraventricular hemorrhage (IVH), pneumocephalus, mass effect, and hydrocephalus. A complete list of scoring criteria for both image quality and lesion presence is provided in Appendix Table A.
ACR Phantom Testing
To evaluate whether the motion correction algorithm preserves key image quality metrics and meets diagnostic imaging standards, we performed ACR phantom testing. An ACR CT phantom was scanned on a portable CT scanner (OmniTom, Neurologica, Danvers, MA, USA) using a standard adult head clinical protocol. The volume was resampled to 1×1×2.5 mm³, and the trained diffusion model was applied directly to the phantom volume without further modification. The volumes before and after applying the motion correction model were analyzed by a diagnostic medical physicist for the following aspects: low contrast performance, spatial resolution, CT number accuracy, uniformity, and artifacts.
Statistical Analysis
For each image, the three reader scores were aggregated using the median to reduce the influence of outliers. Inter-reader variability was assessed using the intraclass correlation coefficient (ICC) across all three readers.
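As a minimal sketch of the aggregation step, the three reader scores for each image can be reduced to their median with the Python standard library. The function name and the toy scores are hypothetical.

```python
from statistics import median


def aggregate_scores(reader_scores):
    """Collapse per-image reader scores, e.g. (r1, r2, r3), to their median."""
    return [median(case) for case in reader_scores]


# Toy example: two images, each scored by three readers on a 5-point Likert scale
scores = aggregate_scores([(2, 3, 5), (4, 4, 1)])
```

With three readers the median is always one of the observed Likert values, so a single outlying score (such as the 1 in the second image) does not shift the aggregated result.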
Image Quality Metrics
Likert-scale scores for image quality in each group were summarized using the mean and standard deviation. To evaluate whether motion correction improved image quality, a one-sided Wilcoxon signed-rank test was performed to compare scores between the motion and corrected groups. A p-value less than 0.05 was considered statistically significant.
Lesion Presence
Lesion detectability in the motion and corrected groups was evaluated using agreement rates, receiver operating characteristic (ROC) analysis and net reclassification index.
For agreement rate evaluation, reader scores from the motion or corrected groups were compared against scores from the reference group. If and only if $|s_{\text{portable}} - s_{\text{reference}}| \le 1$, where $s_{\text{portable}}$ is the score on the portable CT (motion or corrected) and $s_{\text{reference}}$ is the score on the fixed CT, the two readings were considered to be in agreement; otherwise, they were classified as in disagreement. The agreement rate was calculated as the number of agreed cases divided by the total number of cases. Differences in agreement rates between the motion and corrected groups were assessed using McNemar’s test (20). A p-value less than 0.05 was considered statistically significant.
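The agreement analysis can be sketched as follows, using the criterion given in the footnote of Table 3 (absolute score difference of at most 1). The function names are hypothetical, and the exact binomial form of McNemar’s test is an assumption; the text does not state which variant of the test was used.

```python
from math import comb


def agrees(portable_score, reference_score, tolerance=1):
    """Agreement criterion: absolute Likert difference of at most `tolerance`."""
    return abs(portable_score - reference_score) <= tolerance


def agreement_rate(portable, reference):
    """Fraction of cases whose portable score agrees with the reference score."""
    return sum(agrees(p, r) for p, r in zip(portable, reference)) / len(reference)


def mcnemar_exact(motion, corrected, reference):
    """Two-sided exact McNemar p-value on paired agree/disagree outcomes.

    Assumption: exact binomial variant; the paper does not specify the variant.
    """
    b = c = 0  # b: only motion agrees, c: only corrected agrees
    for m, k, r in zip(motion, corrected, reference):
        a_motion, a_corrected = agrees(m, r), agrees(k, r)
        if a_motion and not a_corrected:
            b += 1
        elif a_corrected and not a_motion:
            c += 1
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, nothing to test
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy scores for five cases (not study data)
reference = [1, 5, 3, 4, 2]
motion = [1, 2, 3, 1, 2]
corrected = [2, 4, 5, 4, 2]
p_value = mcnemar_exact(motion, corrected, reference)
```

Only the discordant pairs (cases where exactly one of the two portable images agrees with the reference) contribute to the test, which is why small changes in agreement rate rarely reach significance with 67 cases.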
For ROC analysis, binary reference labels were defined based on scores from the reference group (fixed CT), which were binarized such that a score ≥4 was labeled as positive (label = 1) and a score <4 as negative (label = 0). Reader scores from the motion and corrected groups were retained as 5-point Likert values. The area under the ROC curve (AUC) was calculated for the motion and corrected groups, and 95% confidence intervals (CIs) were estimated using non-parametric bootstrap resampling with 1000 iterations. For EDH, no CI was calculated because the reference group contained only one positive case. AUCs were reported separately for each lesion type, along with an overall AUC calculated for all lesions. Statistical differences in AUC between the motion and corrected groups were assessed using a fast implementation of DeLong’s algorithm (21).
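The ROC analysis can be illustrated with a small pure-Python sketch. The AUC is computed in its rank-based (Mann-Whitney) form, which handles tied Likert scores by counting ties as half. The function names are hypothetical, and two details are assumptions: the percentile method for the bootstrap interval and the redrawing of resamples that contain only one class. DeLong’s test is not shown.

```python
import random


def binarize_reference(ref_scores, threshold=4):
    """Reference label: 1 if the fixed CT score is >= threshold, else 0."""
    return [1 if s >= threshold else 0 for s in ref_scores]


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, case resampling)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # redraw resamples that lack one of the classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Toy scores for six cases (not study data)
labels = binarize_reference([5, 4, 2, 1, 3, 5])
portable_scores = [4, 3, 3, 1, 2, 5]
value = auc(portable_scores, labels)
low, high = bootstrap_ci(portable_scores, labels)
```

With a single positive case, as for EDH, most bootstrap resamples contain no positive case at all, which illustrates why an interval is not meaningful in that setting.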
Lastly, we calculated the net reclassification index (NRI) to assess the impact of motion correction on classification. A lesion was considered present if the reader score was ≥4 and absent if the score was ≤2; scores of 3 were treated as uncertain and excluded from the analysis. We defined the following four types of reclassification: (1) up|event, the portable CT label changed from absent to present after motion correction, and the reference indicates presence; (2) down|event, the label changed from present to absent, while the reference indicates presence; (3) down|non-event, the label changed from present to absent, and the reference indicates absence; and (4) up|non-event, the label changed from absent to present, while the reference indicates absence. Among the four, (1) and (3) represent correct reclassification, while (2) and (4) indicate incorrect reclassification (i.e., (2) represents the introduction of a false negative and (4) the introduction of a false positive). The NRI was calculated as:

$$\mathrm{NRI} = \left[ P(\text{up} \mid \text{event}) - P(\text{down} \mid \text{event}) \right] + \left[ P(\text{down} \mid \text{non-event}) - P(\text{up} \mid \text{non-event}) \right]$$

where probabilities were computed by dividing the number of cases in each category by the total number of positive or negative reference cases, respectively.
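A minimal sketch of the NRI computation under the labeling rule above is given below. The function names are hypothetical. Two choices are assumptions that the text does not spell out: a case is skipped if any of its three scores (motion, corrected, reference) equals 3, and the denominators count only the retained positive and negative reference cases. The output for study data therefore need not match the reported value exactly.

```python
def label(score):
    """Map a Likert score to present (1), absent (0), or uncertain (None)."""
    if score >= 4:
        return 1
    if score <= 2:
        return 0
    return None


def net_reclassification_index(motion, corrected, reference):
    """NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)]."""
    up_e = down_e = up_ne = down_ne = 0
    n_event = n_nonevent = 0
    for m, c, r in zip(motion, corrected, reference):
        lm, lc, lr = label(m), label(c), label(r)
        if lm is None or lc is None or lr is None:
            continue  # assumption: drop the case if any score is uncertain
        if lr == 1:
            n_event += 1
            up_e += lm == 0 and lc == 1
            down_e += lm == 1 and lc == 0
        else:
            n_nonevent += 1
            up_ne += lm == 0 and lc == 1
            down_ne += lm == 1 and lc == 0
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("NRI needs both event and non-event reference cases")
    return (up_e - down_e) / n_event + (down_ne - up_ne) / n_nonevent


# Toy scores for six cases (not study data); the fifth case is excluded
motion = [1, 5, 4, 2, 3, 1]
corrected = [4, 5, 2, 1, 4, 1]
reference = [5, 5, 1, 2, 5, 1]
nri = net_reclassification_index(motion, corrected, reference)
```

In the toy data, one of two retained events moves up and one of three retained non-events moves down, so both bracketed terms are positive.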
RESULTS
Table 1 summarizes the patient characteristics of the case collection. The inter-reader variability is reported in Appendix Table B.
TABLE 1.
Patient Characteristics of the Collected Brain CT Cases
| Patients’ Characteristics (n = 67) | |
|---|---|
| Mean age (years) | 53 ± 17 (18–91) |
| Male | 39 (58.2%) |
| EDH | 1 (1.5%) |
| IPH | 34 (50.7%) |
| SDH | 31 (46.3%) |
| SAH | 27 (40.3%) |
| IVH | 26 (38.8%) |
| Pneumocephalus | 17 (25.4%) |
| Mass effect | 40 (59.7%) |
| Hydrocephalus | 13 (19.4%) |
The presence of lesion types was determined using the reference binary labels (see Statistical Analysis, Lesion Presence).
Image Quality
Table 2 summarizes the image quality scores across the four assessed metrics. The corrected group demonstrated higher mean scores than the motion group on all metrics. The differences were statistically significant for three of the four metrics (p < 0.001 by one-sided Wilcoxon signed-rank test); the exception was sharpness (p = 0.34). The greatest improvement was observed in the motion artifact score, which increased from 2.43 ± 0.87 in the motion group to 3.22 ± 0.95 in the corrected group (higher scores indicate better image quality). For diagnostic confidence specifically, 31 cases in the motion group received scores below 3 (indicating low or very low confidence). Among these, 18 cases (58.1%) showed an increase of at least 1 point in the corrected group, suggesting improved confidence after motion correction. As expected, the reference group consistently achieved the highest scores across all metrics. Figure 3 provides visual examples of improved image quality.
TABLE 2.
The Image Quality Scores for Each Metric
| Metric | Motion | Corrected | Reference | P-value |
|---|---|---|---|---|
| Overall quality | 2.28 ± 1.15 | 2.61 ± 0.92 | 3.44 ± 1.00 | < 0.001 |
| Sharpness | 2.48 ± 0.86 | 2.51 ± 0.89 | 3.34 ± 1.04 | 0.340 |
| Motion artifacts | 2.43 ± 0.87 | 3.22 ± 0.95 | 3.91 ± 0.90 | < 0.001 |
| Diagnostic confidence | 2.52 ± 0.94 | 2.86 ± 0.96 | 3.69 ± 0.97 | < 0.001 |
Larger scores mean better quality. p-values were calculated between the motion and corrected groups
Figure 3.

Visual examples of image quality improvements. (a) and (b) show corrections of motion artifacts within the brain tissue, where (a) corrects the star-like artifacts originating from the skull and (b) corrects the severe streaking artifacts across the brain. (c) shows the removal of a “double skull” artifact caused by substantial head motion. Image display window is [0,80]HU for (a) and (b) and [−500, 1500]HU for (c).
Lesion Detectability
Figure 4 presents representative examples where motion correction improves lesion detectability.
Figure 4.

Visual examples of improved lesion detection by motion correction. (a) Correct reclassification of a positive case (up|event). Motion artifacts obscure the presence of an IPH. The corrected image restores the lesion’s true morphology, which is confirmed in the reference. (b) Correct reclassification of a negative case (down|non-event). A suspected IPH in the right frontal lobe is seen in the motion image, but it disappears in both the corrected and reference images, suggesting it was a false-positive finding induced by minor motion artifacts that was successfully corrected by our algorithm. Image display window is [0, 80] HU. The bottom right corner shows a zoomed-in view of the region of interest. The reference image in (a) was acquired on the same day as the portable scan, while the reference in (b) was acquired one day earlier.
Table 3 summarizes agreement rates between the motion or corrected groups and the reference group for lesion detection. Agreement rates were reported for all cases, as well as stratified by the severity of motion, defined by the motion artifact score in the motion group (score≤1 classified as significant/severe motion and > 1 as non-significant motion). Across all cases, improvements in agreement after motion correction were observed for SDH, pneumocephalus, and hydrocephalus, while slight decreases were noted for IPH, IVH, and mass effect. However, McNemar’s test indicated no statistically significant differences in agreement rates before and after motion correction across any lesion type or motion severity subgroup (p≥0.109).
TABLE 3.
Agreement Rate of Lesion Detection in Portable CT (the Motion and Corrected Group) Compared to Fixed CT (the Reference Group)
| | All Cases (n = 67) | | Significant Motion (n = 20) | | Non-significant Motion (n = 47) | |
|---|---|---|---|---|---|---|
| Lesion Type | Motion | Corrected | Motion | Corrected | Motion | Corrected |
| EDH | 0.985 | 0.985 | 1.000 | 1.000 | 0.979 | 0.979 |
| IPH | 0.910 | 0.866 | 0.850 | 0.800 | 0.936 | 0.894 |
| SDH | 0.806 | 0.896 | 0.650 | 0.850 | 0.872 | 0.915 |
| SAH | 0.866 | 0.866 | 0.900 | 0.850 | 0.851 | 0.872 |
| IVH | 0.955 | 0.925 | 0.950 | 0.900 | 0.957 | 0.936 |
| Pneumocephalus | 0.851 | 0.881 | 0.800 | 0.850 | 0.872 | 0.894 |
| Mass effect | 0.881 | 0.866 | 0.850 | 0.750 | 0.894 | 0.915 |
| Hydrocephalus | 0.925 | 0.985 | 0.950 | 1.000 | 0.915 | 0.979 |
Agreement is defined as an absolute difference of at most 1 between the portable CT score and the fixed CT score. The motion subgroups were determined using the motion artifact score in the motion group.
Table 4 presents the AUCs and corresponding 95% CIs for each lesion type. Both the motion and corrected groups achieved high AUCs (≥0.788) for the eight lesion types when compared to the reference group. While none of the AUC differences between the motion and corrected groups reached statistical significance (all p≥0.059 by DeLong’s test), the corrected group demonstrated the largest numerical improvement for EDH (AUC improved from 0.826 to 0.955), and the largest decrease for IVH (decrease from 0.949 to 0.911). The overall AUC calculated for all lesions was 0.890 (CI: 0.872–0.907) for the motion group and 0.892 (CI: 0.874–0.909) for the corrected group. Figure 5 presents the ROC curve for each lesion type.
TABLE 4.
AUCs and 95% CI for Lesion Detectability Across the Eight Lesion Types
| Lesion type | Motion | Corrected | p-value |
|---|---|---|---|
| EDH* | 0.826 | 0.955 | 0.401 |
| IPH | 0.904 (0.858, 0.947) | 0.886 (0.839, 0.934) | 0.381 |
| SDH | 0.818 (0.756, 0.869) | 0.810 (0.753, 0.865) | 0.766 |
| SAH | 0.796 (0.730, 0.855) | 0.788 (0.716, 0.849) | 0.731 |
| IVH | 0.949 (0.912, 0.980) | 0.911 (0.869, 0.950) | 0.059 |
| Pneumocephalus | 0.824 (0.745, 0.892) | 0.851 (0.780, 0.918) | 0.260 |
| Mass effect | 0.887 (0.839, 0.933) | 0.868 (0.809, 0.917) | 0.458 |
| Hydrocephalus | 0.938 (0.887, 0.979) | 0.964 (0.928, 0.990) | 0.325 |
| All types | 0.890 (0.872, 0.907) | 0.892 (0.874, 0.909) | 0.799 |
p-values were calculated using DeLong’s test to compare the AUCs between the motion and corrected groups.
* The 95% CI for EDH was not calculated because there was only one positive case.
Figure 5.

ROC curves for lesion detection in the motion (blue) and corrected (orange) group. The shaded areas indicate the 95% CI. (Color version of figure is available online.)
Appendix Table C summarizes the cases that were reclassified by the motion correction algorithm. In total, there were 13 correct reclassifications (i.e., 5 up|event and 8 down|non-event cases; visual examples in Figure 4) and 5 incorrect reclassifications (i.e., 4 down|event and 1 up|non-event; visual examples in Figure 6). The resulting NRI was 2.66%.
Figure 6.

Visual examples of incorrect reclassification by motion correction. (a) shows an example of down|event, while (b) shows an up|non-event case. (a) Both the reference and motion images show a subtle IPH in the temporal horn, which is no longer visible in the corrected image. This may be due to the motion correction algorithm removing fine details that it mistook for motion artifacts. (b) The motion and corrected images show mostly parenchymal hemorrhage, whereas the reference (2 days later) shows more intraventricular extension. This suggests there was interval evolution. The mass effect appears less evident in the reference scan, likely due to ventricular decompression, and was therefore scored as “no mass effect” by the readers. The motion image was also scored as “no mass effect,” whereas the corrected image was scored as “mass effect,” likely because improved visualization enhanced reader confidence in identifying it. The reference image in (a) was acquired on the same day as the portable CT scan, while the reference in (b) was acquired two days later.
ACR Phantom Testing
For low contrast performance, the contrast-to-noise ratio (CNR) was 1.15 before and 1.01 after applying the motion correction model. Spatial resolution remained unchanged at 5 line pairs per cm. For CT number accuracy, the measured CT numbers for water, air, Teflon, polyethylene, and acrylic were −1.3, −995.6, 1018.7, −101.3, and 118.4 HU, respectively, before the correction, and −5.5, −986.2, 1026.0, −102.4, and 114.9 HU after the correction. For uniformity, the maximum difference between the peripheral ROIs and the center ROI was 1.14 HU before and 4.19 HU after the correction. No artifacts were seen in either volume. All test values for both volumes were within the ACR acceptance range, indicating that the motion correction model did not substantially affect imaging performance.
DISCUSSION
In this study, we evaluated a diffusion model-based algorithm for correcting motion artifacts in 67 real-world portable brain CT scans. To our knowledge, this is the first work to apply a diffusion model to portable CT image reconstruction and assess its performance using a task-based human observer study. The findings highlight two key messages. First, the diffusion model significantly reduces motion artifacts (reflected by an improvement in the motion artifact score from 2.43 to 3.22 on a 5-point Likert scale) and enhances overall image quality (from 2.28 to 2.61) as well as diagnostic confidence (from 2.52 to 2.86) in portable brain CT. Second, the motion correction does not compromise lesion detectability, as indicated by comparable agreement rates and AUCs across eight lesion types and a modest NRI of 2.66%.
A widely explored software strategy for correcting motion artifacts in brain CT involves estimating patient head motion directly from raw projection data and compensating for this motion during image reconstruction. Solutions under this framework include the registration of 2D projections to a 3D prior image (9,10), optimization of image-based motion artifact metrics (MAM) (11,22), and the use of partial angle reconstruction (PAR) images (12,23). Despite promising results, a major limitation of projection-based methods is the restricted access to raw projection data on most commercial CT scanners. As a result, these projection-based approaches have only been validated in numerical simulations, in-house phantom studies, or through limited visual assessment on very few real-world cases (9,10). Furthermore, solving these ill-posed inverse problems typically requires iterative schemes with high computational cost and often relies on pre-defined motion patterns that may not capture the complexity of real head motion.
Recently, advances in deep learning have enabled motion correction directly within the image domain, eliminating the need for projection data and facilitating larger-scale evaluation on clinical datasets. Convolutional neural networks (CNN) (13,14) have been trained to learn the mapping between motion-corrupted and motion-free images using supervised training on paired data. Zhou et al. (15) conducted a comprehensive clinical validation of their CNN-based motion correction algorithm on 53 fixed brain CT scans with immediate rescans serving as the reference standard. With this robust reference, they quantitatively compared the original and corrected images against the reference using metrics such as mean squared error and peak signal-to-noise ratio and found that motion correction significantly improved these metrics. In a subgroup analysis of ischemic stroke cases, they also reported higher AUC in the Alberta Stroke Program Early CT Score (ASPECTS) assessment after motion correction. However, their method showed limited performance in handling severe motion artifacts, particularly motion-induced skull deformities, which may reflect the architectural limitations of CNNs in capturing complex spatial distortions. In contrast, our proposed diffusion model demonstrated strong performance in correcting such challenging artifacts (e.g., the “double skull” case in Fig 3c) and was validated on real-world portable brain CT images, which are inherently more challenging due to lower baseline image quality compared to fixed CT.
One major concern of image-domain AI image processing methods is the potential for introducing hallucinations (i.e., false positives) or unintentionally removing true lesions (i.e., false negatives). In our study, we identified 1 false positive case (up|non-event) and 4 false negative cases (down|event). Among the false negative cases, we observed that the algorithm may inadvertently suppress subtle hemorrhages (Fig 6a), potentially mistaking them for motion artifacts. In the only false positive case, the algorithm actually enhanced the visibility and diagnostic confidence of mass effect (Fig 6b). Although both the motion and corrected images showed signs of mass effect, the reference scan—acquired two days later—did not. The motion image was scored as “no mass effect,” likely due to poor visualization and low reader confidence, whereas the corrected image was scored as “mass effect,” resulting in an “up|non-event” classification. The discrepancy with the reference scan is plausibly attributed to interval evolution, as the increased intraventricular extension observed in the reference may have reduced the apparent mass effect. Taken together, these findings suggest that our algorithm did not introduce obvious hallucinations leading to false positive diagnoses in our dataset.
The ACR phantom study showed that all ACR criteria have been met by the motion-corrected images, indicating that the model preserved the crucial imaging performance for diagnosis, including low contrast performance, spatial resolution, CT number accuracy, and uniformity. Furthermore, the fact that the model was trained only with human data but can be seamlessly applied to phantom data also demonstrates the robustness of the model to unseen data.
Our study has several limitations. First, and most notably, lesion detectability did not improve significantly, and some lesion types even showed a slight numerical decrease in AUC after motion correction. One possible explanation is the domain shift introduced during model training. Specifically, due to the lack of portable brain CT scans that are completely free of motion artifacts, the model was trained using simulated motion generated from fixed CT images and portable CT x-ray geometries (Fig 2), rather than real portable CT scans. To address this, future work will involve training the model on paired scan/rescan data or on artifact-free portable CT scans and their simulated counterparts. In addition, we will explore combining image-domain diffusion models with projection-domain motion correction methods, such as MAM-based (11) and PAR-based (12) techniques, which could help ensure data fidelity (to avoid incorrect addition or removal of lesions) and may further enhance diagnostic utility. Second, we were unable to prospectively collect immediate rescans to serve as a robust reference standard. Instead, we retrospectively collected cases with fixed CT scans acquired within ± 2 days of the portable CT, during which intracranial conditions may have changed. Given the promising results of this retrospective study, a large-scale prospective study is planned. Third, due to the inherently lower image quality of portable brain CT, lesion identification becomes more challenging for readers, which contributed to greater inter-reader variability (ICC ranging from 0.192 to 0.794 across lesion types). This variability could reduce the sensitivity of our evaluation in detecting subtle differences in lesion detectability. The inter-reader variability may also stem from the varying experience levels of our readers (one radiologist, one fellow, and one resident). Fourth, all images were downsampled to 256×256×50 voxels to fit in GPU memory, which may have limited the pathology that could be distinguished in the human observer study. A more efficient implementation is needed to retain the image resolution for clinical deployment. Lastly, all portable CT scans in our study came from one type of scanner (CereTom), so the evaluation of generalizability was limited.
In conclusion, we demonstrated that a diffusion model-based motion correction approach can effectively reduce motion artifacts and improve both image quality and diagnostic confidence, without compromising lesion detection. These findings suggest that this approach holds clinical potential by providing radiologists with cleaner, more interpretable images and increasing their confidence in making accurate diagnostic assessments.
Supplementary Material
ACKNOWLEDGMENTS
Research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R01EB035394. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
DECLARATION OF COMPETING INTEREST
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Dufan Wu reports that financial support was provided by the National Institute of Biomedical Imaging and Bioengineering. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Abbreviations:
- ACR: American College of Radiology
- ASPECTS: Alberta Stroke Program Early CT Score
- CNN: Convolutional Neural Network
- CT: Computed Tomography
- EDH: Epidural Hematoma
- EDM: Elucidated Diffusion Model
- GPU: Graphics Processing Unit
- ICC: Intraclass Correlation Coefficient
- ICU: Intensive Care Unit
- IPH: Intraparenchymal Hematoma
- IVH: Intraventricular Hemorrhage
- MAM: Motion Artifact Metrics
- NRI: Net Reclassification Index
- PAR: Partial Angle Reconstruction
- SAH: Subarachnoid Hemorrhage
- SDH: Subdural Hematoma
APPENDIX
Appendix Table A.
A Complete List of Scoring Criteria
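| Category | Item | Scoring scale |
|---|---|---|
| Image quality | Overall quality | 1=poor, 2=fair, 3=acceptable, 4=good, 5=excellent |
| Image quality | Sharpness | 1=very low, 2=low, 3=moderate, 4=high, 5=very high |
| Image quality | Motion artifacts | 1=severe, 2=significant, 3=moderate, 4=minimal, 5=no artifacts |
| Image quality | Diagnostic confidence | 1=very low, 2=low, 3=moderate, 4=high, 5=very high |
| Lesion presence | Epidural hematoma (EDH) | 1=definitely absent, 2=probably absent, 3=uncertain, 4=probably present, 5=definitely present |
| Lesion presence | Intraparenchymal hematoma (IPH) | Same scale as EDH |
| Lesion presence | Subdural hematoma (SDH) | Same scale as EDH |
| Lesion presence | Subarachnoid hemorrhage (SAH) | Same scale as EDH |
| Lesion presence | Intraventricular hemorrhage (IVH) | Same scale as EDH |
| Lesion presence | Pneumocephalus | Same scale as EDH |
| Lesion presence | Mass effect | Same scale as EDH |
| Lesion presence | Hydrocephalus | Same scale as EDH |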
Appendix Table B.
Intraclass Correlation Coefficient (ICC) for Qualitative Scoring Across Three Readers
| | All groups | Motion | Corrected | Reference |
|---|---|---|---|---|
| Overall quality | 0.556 | 0.378 | 0.459 | 0.440 |
| Sharpness | 0.473 | 0.368 | 0.349 | 0.409 |
| Motion artifacts | 0.562 | 0.266 | 0.507 | 0.446 |
| Diagnostic confidence | 0.558 | 0.286 | 0.498 | 0.567 |
| EDH | 0.192 | 0.077 | 0.204 | 0.278 |
| IPH | 0.767 | 0.716 | 0.722 | 0.851 |
| SDH | 0.554 | 0.522 | 0.571 | 0.570 |
| SAH | 0.568 | 0.564 | 0.599 | 0.553 |
| IVH | 0.784 | 0.756 | 0.822 | 0.784 |
| Pneumocephalus | 0.750 | 0.790 | 0.729 | 0.739 |
| Mass effect | 0.745 | 0.728 | 0.721 | 0.783 |
| Hydrocephalus | 0.765 | 0.764 | 0.747 | 0.786 |
The ICC is reported for all cases pooled (first data column) and for each group separately (last three columns).
Appendix Table C.
Re-classification of Lesion Detections
| Lesion Type | Re-classification: Up \| event | Re-classification: Down \| event | Re-classification: Down \| non-event | Re-classification: Up \| non-event | Reference: Present | Reference: Absent | Reference: Uncertain |
|---|---|---|---|---|---|---|---|
| EDH | 0 | 0 | 0 | 0 | 1 | 62 | 4 |
| IPH | 0 | 1 | 1 | 0 | 34 | 32 | 1 |
| SDH | 1 | 0 | 3 | 0 | 31 | 29 | 7 |
| SAH | 0 | 0 | 1 | 0 | 27 | 34 | 6 |
| IVH | 0 | 2 | 0 | 0 | 26 | 40 | 1 |
| Pneumocephalus | 1 | 0 | 0 | 0 | 17 | 50 | 0 |
| Mass effect | 1 | 1 | 1 | 1 | 40 | 27 | 0 |
| Hydrocephalus | 2 | 0 | 2 | 0 | 13 | 54 | 0 |
| All types | 5 | 4 | 8 | 1 | 189 | 328 | 19 |
In total, there were 13 correct re-classifications (up|event and down|non-event) and 5 incorrect re-classifications (down|event and up|non-event).
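The pooled counts in the "All types" row can be turned into a net reclassification index with a few lines of arithmetic. The sketch below assumes the standard categorical NRI definition (net proportion of events moved up plus net proportion of non-events moved down) and assumes that the "Present" and "Absent" reference counts serve as the denominators, with "Uncertain" reference cases excluded. The function name is hypothetical and this is not necessarily the authors' exact procedure, although under these assumptions it reproduces the 2.66% reported in the abstract.

```python
# Counts copied from the "All types" row of Appendix Table C.
UP_EVENT, DOWN_EVENT = 5, 4          # re-classifications among lesion-present cases
DOWN_NONEVENT, UP_NONEVENT = 8, 1    # re-classifications among lesion-absent cases
N_EVENT, N_NONEVENT = 189, 328       # reference "Present" and "Absent" totals (assumed denominators)


def net_reclassification_index(up_e, down_e, down_ne, up_ne, n_e, n_ne):
    """Categorical NRI: net correct movement in events plus net correct movement in non-events."""
    event_term = (up_e - down_e) / n_e          # correct minus incorrect, lesion present
    nonevent_term = (down_ne - up_ne) / n_ne    # correct minus incorrect, lesion absent
    return event_term + nonevent_term


nri = net_reclassification_index(
    UP_EVENT, DOWN_EVENT, DOWN_NONEVENT, UP_NONEVENT, N_EVENT, N_NONEVENT
)
print(f"NRI = {nri:.2%}")  # NRI = 2.66%
```

The event term contributes (5 − 4)/189 ≈ 0.53% and the non-event term (8 − 1)/328 ≈ 2.13%, so most of the net benefit under this reading comes from lesion-absent cases being moved toward "absent" after correction.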
Footnotes
CREDIT AUTHORSHIP CONTRIBUTION STATEMENT
Zhennong Chen: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Quirin Strotzer: Writing – review & editing, Formal analysis, Data curation. Min Lang: Writing – review & editing, Formal analysis, Data curation. Maryam Vejdani-Jahromi: Writing – review & editing, Formal analysis, Data curation. Baihui Yu: Writing – review & editing, Formal analysis, Data curation. Rehab Naeem Khalid: Writing – review & editing, Data curation. Siyeop Yoon: Writing – review & editing, Software. Matthew Tivnan: Writing – review & editing, Software. Quanzheng Li: Writing – review & editing, Supervision, Resources, Funding acquisition. Michael H. Lev: Writing – review & editing, Supervision, Resources, Project administration. Rajiv Gupta: Writing – review & editing, Supervision, Resources, Project administration, Funding acquisition. Dufan Wu: Writing – review & editing, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization.
Contributor Information
Zhennong Chen, Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, China.
Quirin Strotzer, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Min Lang, Department of Radiology, Brigham and Women’s Hospital, Boston, Massachusetts.
Maryam Vejdani-Jahromi, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Baihui Yu, Department of Radiology, The University of Chicago, Chicago, Illinois.
Rehab Naeem Khalid, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Siyeop Yoon, Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Matthew Tivnan, Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Quanzheng Li, Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Michael H. Lev, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Rajiv Gupta, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Dufan Wu, Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, Department of Radiology, The Ohio State University, Columbus, Ohio.
REFERENCES
- 1. Peace K, Wilensky EM, Frangos S, et al. The use of a portable head CT scanner in the intensive care unit. J Neurosci Nurs 2010; 42(2):109–116. doi:10.1097/jnn.0b013e3181ce5c5b
- 2. Walter S, Kostopoulos P, Haass A, et al. Diagnosis and treatment of patients with stroke in a mobile stroke unit versus in hospital: a randomised controlled trial. Lancet Neurol 2012; 11(5):397–404. doi:10.1016/S1474-4422(12)70057-1
- 3. Ebinger M, Winter B, Wendt M, et al. Effect of the use of ambulance-based thrombolysis on time to thrombolysis in acute ischemic stroke: a randomized clinical trial. JAMA 2014; 311(16):1622–1631. doi:10.1001/jama.2014.2850
- 4. Emberson J, Lees KR, Lyden P, et al. Effect of treatment delay, age, and stroke severity on the effects of intravenous thrombolysis with alteplase for acute ischaemic stroke: a meta-analysis of individual patient data from randomised trials. Lancet 2014; 384(9958):1929–1935. doi:10.1016/S0140-6736(14)60584-5
- 5. Park SJ, Park J, Kim D, et al. The first mobile photon-counting detector CT: the human images and technical performance study. Phys Med Biol 2023; 68(9):095013. doi:10.1088/1361-6560/acc8b3
- 6. Fukuda A, Lin PP, Matsubara K, et al. Measurement of gantry rotation time in modern CT. J Appl Clin Med Phys 2014; 15(1):303–308. doi:10.1120/jacmp.v15i1.4517
- 7. Diekmann S, Siebert E, Juran R, et al. Dose exposure of patients undergoing comprehensive stroke imaging by multidetector-row CT: comparison of 320-detector row and 64-detector row CT scanners. Am J Neuroradiol 2010; 31(6):1003–1009. doi:10.3174/ajnr.A1971
- 8. Fahmi F, Beenen LFM, Streekstra GJ, et al. Head movement during CT brain perfusion acquisition of patients with suspected acute ischemic stroke. Eur J Radiol 2013; 82(12):2334–2341. doi:10.1016/j.ejrad.2013.08.039
- 9. Sun T, Kim JH, Fulton R, et al. An iterative projection-based motion estimation and compensation scheme for head x-ray CT. Med Phys 2016; 43(10):5705. doi:10.1118/1.4963218
- 10. Ouadah S, Jacobson M, Stayman JW, et al. Correction of patient motion in cone-beam CT using 3D-2D registration. Phys Med Biol 2017; 62(23):8813–8831. doi:10.1088/1361-6560/aa9254
- 11. Jang S, Kim S, Kim M, et al. Head motion correction based on filtered backprojection for x-ray CT imaging. Med Phys 2018; 45(2):589–604. doi:10.1002/mp.12705
- 12. Chen Z, Li Q, Wu D. Estimate and compensate head motion in non-contrast head CT scans using partial angle reconstruction and deep learning. Med Phys 2024; 51(5):3309–3321. doi:10.1002/mp.17047
- 13. Ko Y, Moon S, Baek J, et al. Rigid and non-rigid motion artifact reduction in X-ray CT using attention module. Med Image Anal 2021; 67:101883. doi:10.1016/j.media.2020.101883
- 14. Su B, Wen Y, Liu Y, et al. A deep learning method for eliminating head motion artifacts in computed tomography. Med Phys 2022; 49(1):411–419. doi:10.1002/mp.15354
- 15. Zhou L, Liu H, Zou YX, et al. Clinical validation of an AI-based motion correction reconstruction algorithm in cerebral CT. Eur Radiol 2022; 32(12):8550–8559. doi:10.1007/s00330-022-08883-4
- 16. Chen Z, Yoon S, Strotzer Q, et al. Portable head CT motion artifact correction via diffusion-based generative model. Comput Med Imaging Graph 2025; 119:102478. doi:10.1016/j.compmedimag.2024.102478
- 17. Chu LC, Anandkumar A, Shin HC, et al. The potential dangers of artificial intelligence for radiology and radiologists. J Am Coll Radiol 2020; 17(10):1309–1311. doi:10.1016/j.jacr.2020.04.010
- 18. Karras T, Aittala M, Aila T, et al. Elucidating the design space of diffusion-based generative models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22), 2022; 26565–26577. doi:10.5555/3600270.3602196
- 19. Weinrich JM, Well L, Regier M, et al. MDCT in suspected lumbar spine fracture: comparison of standard and reduced dose settings using iterative reconstruction. Clin Radiol 2018; 73(7):675.e9–675.e15. doi:10.1016/j.crad.2018.02.015
- 20. Leon AC. Descriptive and inferential statistics. Comprehensive Clin Psychol 1998:243–285. doi:10.1016/B0080-4270(73)00264-9
- 21. Sun X, Xu W. Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process Lett 2014; 21(11):1389–1393. doi:10.1109/LSP.2014.2337313
- 22. Bruder H, Rohkohl C, Stierstorfer K, et al. Compensation of skull motion and breathing motion in CT using data-based and image-based metrics, respectively. Med Imaging 2016 Phys Med Imaging. SPIE; 2016. p. 348–359. doi:10.1117/12.2217395
- 23. Hahn J, Bruder H, Rohkohl C, et al. Motion compensation in the region of the coronary arteries based on partial angle reconstructions from short-scan CT data. Med Phys 2017; 44(11):5795–5813. doi:10.1002/mp.12514
