Abstract
Objective. Motion artifacts (ARTs) in brain magnetic resonance imaging (MRI), mainly from rigid head motion, degrade image quality and hinder downstream applications. Conventional methods to mitigate these ARTs, including repeated acquisitions or motion tracking, impose workflow burdens. This study introduces Res-MoCoDiff, an efficient denoising diffusion probabilistic model specifically designed for MRI motion ART correction. Approach. Res-MoCoDiff exploits a novel residual error shifting mechanism during the forward diffusion process to incorporate information from motion-corrupted images. This mechanism allows the model to simulate the evolution of noise with a probability distribution closely matching that of the corrupted data, enabling a reverse diffusion process that requires only four steps. The model employs a U-net backbone, with attention layers replaced by Swin Transformer blocks, to enhance robustness across resolutions. Furthermore, the training process integrates a combined loss function, which promotes image sharpness and reduces pixel-level errors. Res-MoCoDiff was evaluated on both an in-silico dataset generated using a realistic motion simulation framework and an in-vivo movement-related ARTs dataset. Comparative analyses were conducted against established methods, including cycle generative adversarial network, Pix2pix, and a diffusion model with a vision transformer backbone, using quantitative metrics such as peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and normalized mean squared error (NMSE). Main results. The proposed method demonstrated superior performance in removing motion ARTs across minor, moderate, and heavy distortion levels. Res-MoCoDiff consistently achieved the highest SSIM and the lowest NMSE values, with a PSNR of up to dB for minor distortions. Notably, the average sampling time was reduced to 0.37 s per batch of two image slices, compared with 101.74 s for conventional approaches. 
Significance. Res-MoCoDiff offers a robust and efficient solution for correcting MRI motion ARTs, preserving fine structural details while significantly reducing computational overhead. Its speed and restoration fidelity underscore its potential for integration into clinical workflows, enhancing diagnostic accuracy and patient care.
Keywords: MRI, deep learning, motion correction, MoCo, efficient, diffusion model
1. Introduction
Magnetic resonance imaging (MRI) is a cornerstone of modern diagnostics, treatment planning, and patient follow-up, providing high-resolution images of soft tissues without the use of ionizing radiation. However, prolonged MRI acquisitions increase the likelihood of patient movement, leading to motion artifacts (ARTs). These ARTs can alter the B0 field, which results in susceptibility ARTs (Safari et al 2023), and disrupt the k-space readout lines, potentially violating the Nyquist criterion and causing ghosting and ringing ARTs (Zaitsev et al 2015). As one of the most common ARTs encountered in MRI (Sreekumari et al 2019), motion ARTs may compromise post-processing procedures such as image segmentation (Kemenczky et al 2022) and target tracking in MR-guided radiation therapy (Sui et al 2024). Moreover, due to the limited availability of prospective motion correction (MoCo) techniques and the additional complexity they introduce, clinical workflows often resort to repeating the imaging acquisition. The severity and spatial distribution of these ARTs underscore a need for robust methods capable of effectively removing or significantly reducing motion ARTs without repeated imaging.
Traditional MoCo algorithms have primarily aimed at mitigating ARTs by optimizing image quality metrics such as entropy and image gradient (Manduca et al 2000), as well as by estimating the motion-corrupted k-space lines (Bydder et al 2002) and corresponding motion trajectories (Loktyushin et al 2013). Additional strategies include navigator echoes, external tracking devices, and sequence modifications that prospectively adjust acquisition to account for motion (Zaitsev et al 2015). Retrospective approaches in the reconstruction domain, such as phase shift correction in k-space, blind motion trajectory estimation, and compressed sensing with motion models, have also been widely explored (Manduca et al 2000). While these methods have shown clinical utility, their adoption is often limited by the need for raw k-space data, which is not routinely stored in clinical archives, and by reconstruction pipelines that vary across scanner vendors, reducing reproducibility and generalizability (Karakuzu et al 2022, Tamir et al 2025).
In parallel, deep learning (DL) approaches have demonstrated superior performance in suppressing motion ARTs compared to conventional methods (Duffy et al 2021, Spieker et al 2024). In particular, both supervised and unsupervised generative models based on generative adversarial networks (GANs) have been successfully employed for MRI motion ART removal (Küstner et al 2019, Liu et al 2021a, Safari et al 2024). However, GAN-based approaches frequently encounter practical limitations, including mode collapse and unstable training, while reconstruction- and k-space-based methods often require scanner-specific modifications. These limitations highlight the importance of image-domain methods that operate directly on magnitude images, which are universally available and readily integrated into existing clinical workflows without changes to acquisition hardware or vendor-specific reconstruction software.
Indeed, image-domain methods operate directly on reconstructed magnitude images, which are universally available in both retrospective studies and multi-center collaborations, enabling wide applicability without requiring changes to the acquisition hardware or reconstruction software. This characteristic makes image-based models particularly attractive as off-the-shelf solutions for clinical deployment. Furthermore, the use of image-domain methods allows integration with existing clinical pipelines for tasks such as segmentation, registration, and radiotherapy planning, where corrected magnitude images can be seamlessly substituted for corrupted ones. For these reasons, this study focused on the image domain, aiming to balance methodological innovation with clinical feasibility.
Recently, denoising diffusion probabilistic models (DDPMs) have revolutionized image generation techniques by markedly improving synthesis quality (Ho et al 2020) and have been adapted for various medical imaging tasks, including image synthesis (Özbey et al 2023, Pan et al 2025), denoising (Wu et al 2023), MRI acceleration (Safari et al 2025), and vision foundation models for MRI (Sun et al 2024, Wang et al 2025a, 2025b). DDPMs involve a forward process in which a Markov chain gradually transforms the input image into Gaussian noise over a large number of steps, followed by a reverse process in which a neural network reconstructs the original image from the noisy data (Luo 2022). Existing DDPM-based MoCo models concatenate the motion-corrupted image y with the noisy latent at each step and perform the backward diffusion to reconstruct a motion-free image similar to the ground truth image x (Liu et al 2024, Safari et al 2024, Sarkar et al 2025). Although these methods achieved promising results, their reliance on numerous diffusion steps substantially increases the inference time. Additionally, initiating reconstruction from fully Gaussian noise might be suboptimal for the MRI MoCo task (see section 2.1).
In this study, we present Res-MoCoDiff, a residual-guided, efficient motion-correction denoising diffusion probabilistic model that explicitly exploits the residual error between motion-free x and motion-corrupted y images (i.e. r = y − x) in the forward diffusion process. Integrating this residual error into the diffusion process enables generation of noisy images at step N with a probability distribution closely matching that of the motion-corrupted images, namely a Gaussian centred on y. This approach offers two significant advantages: (1) enhanced reconstruction fidelity by avoiding the restrictive purely Gaussian prior assumption of conventional DDPMs, and (2) substantial computational efficiency, as the reverse diffusion process can be reduced to only four steps, substantially accelerating reconstruction times compared to traditional DDPMs.
In summary, the main contributions of this study are as follows:
•

Res-MoCoDiff is an efficient diffusion model leveraging residual information, substantially reducing the reverse diffusion process to just four steps.

•

Res-MoCoDiff employs a novel noise scheduler that enables a more precise transition between diffusion steps by incorporating the residual error.

•

Res-MoCoDiff replaces the attention layers of the U-net backbone with Swin Transformer blocks.

•

Extensive evaluation of Res-MoCoDiff is performed on both simulated (in-silico) and clinical (in-vivo) datasets covering various levels of motion-induced distortions.
2. Materials and methods
2.1. DDPM
DDPMs are inspired by non-equilibrium thermodynamics and aim to approximate complex data distributions using a tractable distribution, such as the standard Gaussian distribution, as a prior (Sohl-Dickstein et al 2015). Specifically, DDPMs employ a Markov chain consisting of two distinct processes: a forward diffusion and a backward (denoising) process. During the forward diffusion, the input image x is gradually perturbed through a sequence of small Gaussian noise injections, eventually converging toward pure Gaussian noise after a large number of diffusion steps (Ho et al 2020, Luo 2022). Conversely, the backward process employs a DL model to iteratively remove noise and reconstruct the original image from the Gaussian noise by approximating the reverse Markov chain of the forward diffusion.
In traditional DDPM implementations, this reconstruction (reverse diffusion) typically requires many iterative steps (often hundreds to thousands), significantly increasing the computational burden and limiting clinical applicability, especially in time-sensitive scenarios (Ho et al 2020, Luo 2022).
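As a minimal illustration of why this reconstruction is costly, the standard DDPM forward corruption q(xt | x0) = N(√ᾱt x0, (1 − ᾱt)I) can be sketched in a few lines of numpy. This is a toy sketch, not the authors' code; the linear β schedule follows Ho et al (2020), and the image shape is arbitrary.

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar_t = np.cumprod(1.0 - betas)[t]        # cumulative product \bar{alpha}_t
    eps = rng.standard_normal(x0.shape)        # injected Gaussian noise
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

betas = np.linspace(1e-4, 0.02, 1000)          # linear schedule (Ho et al 2020)
x0 = np.zeros((64, 64))                        # toy "image"
xT = ddpm_forward(x0, t=999, betas=betas)      # essentially pure Gaussian noise
```

At t = 999 the signal term is negligible, which is precisely why hundreds of reverse steps are then needed to undo the corruption.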
Formally, MoCo algorithms aim to recover an unknown motion-free image x from a motion-corrupted image y according to

y = A(x) + n,

where A denotes an unknown motion corruption operator and n represents additive noise. Since this inverse problem is ill-posed, it is essential to impose regularization or prior assumptions to constrain the solution space. Without such constraints, multiple plausible solutions for x may be consistent with the observed data y. From a Bayesian perspective, this regularization is introduced via a prior distribution p(x), which, when combined with the likelihood term p(y|x), yields the posterior distribution:

p(x|y) ∝ p(y|x) p(x).
Traditional DDPMs typically assume a standard Gaussian prior, p(xN) = N(xN; 0, I). While mathematically convenient, this assumption might not be ideal for inverse problems (Yue et al 2023, 2025) such as the MRI MoCo task because it can encourage unrealistic reconstructions, introducing unwanted ARTs or image hallucinations, as suggested by recent studies (Safari et al 2024).
2.2. Problem formulation
Similar to conventional DDPMs, Res-MoCoDiff employs a Markov chain for both the forward and backward diffusion processes. However, it introduces a key modification: explicitly incorporating the residual error r between the motion-corrupted (y) and the motion-free (x) images into the forward diffusion process. This process is illustrated in figure 1.
Figure 1.
Flowchart of the Res-MoCoDiff approach. The forward process employs a Markov chain to shift the residual error (r = y − x), thus simulating the forward diffusion. The backward diffusion is also modeled via a Markov chain pθ(xt−1 | xt, y), where a DL model parametrized by θ is trained to iteratively remove the noise and recover the original image.
2.2.1. Forward process
Res-MoCoDiff. Res-MoCoDiff employs a monotonically increasing shifting sequence {ηt} to modulate the residual error r, starting with η1 → 0 and culminating in ηN → 1, as illustrated in figure 1, where each forward step progressively integrates more of the residual into the motion-free image. The transition kernel for each forward step is given by

q(xt | xt−1, y) = N(xt; xt−1 + αt r, γ² αt I), with αt = ηt − ηt−1,

where r = y − x and η0 = 0. The hyperparameter γ enhances the flexibility of the forward process. Following a procedure similar to that described in Luo (2022), Safari et al (2025a), it can be shown that the marginal distribution of the data at a given time step t given the input image x is

q(xt | x, y) = N(xt; x + ηt r, γ² ηt I),

where we denote the motion-free input image by x, omitting the subscript (i.e. x0).
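Under our reading of the marginal above, i.e. xt = x + ηt r + γ√ηt ε with r = y − x, the forward sampling step reduces to a one-liner. The sketch below is illustrative, not the authors' implementation; the function name and toy values are ours.

```python
import numpy as np

def resmocodiff_forward(x, y, eta_t, gamma=2.0, rng=np.random.default_rng(0)):
    """Sample x_t = x + eta_t * r + gamma * sqrt(eta_t) * eps, with r = y - x."""
    r = y - x                                    # residual between corrupted and clean
    eps = rng.standard_normal(x.shape)
    return x + eta_t * r + gamma * np.sqrt(eta_t) * eps

x = np.zeros((4, 4))                             # toy motion-free image
y = np.ones((4, 4))                              # toy motion-corrupted image
x_start = resmocodiff_forward(x, y, eta_t=1e-4)  # eta_1 ~ 0: stays close to x
x_end = resmocodiff_forward(x, y, eta_t=0.999)   # eta_N ~ 1: centred on y
```

With ηt near zero the sample stays close to the clean image, and with ηt near one it is centred on the corrupted image, which is exactly the distribution-matching property the method exploits.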
Noise scheduler. We employ a non-uniform geometric noise scheduler, as proposed by Yue et al (2023), to compute the shifting sequence {ηt}. Formally,

√ηt = √η1 · b0^βt, with βt = ((t − 1)/(N − 1))^p (N − 1) and b0 = (ηN/η1)^(1/(2(N − 1))),

where p is a hyperparameter controlling the growth rate. As shown in figure 2, lower values of p lead to greater noise levels in the images xt across the forward diffusion steps. In addition, it is recommended to keep η1 sufficiently small to ensure x1 ≈ x (see (4)) (Sohl-Dickstein et al 2015, Ho et al 2020). Hence, we set η1 to a small value and chose γ = 2. We also set ηN close to 1 to satisfy the upper bound ηN ⩽ 1. Unlike p, which modulates the rate at which noise accumulates, a larger γ amplifies the overall noise level at each step. Panels (a)–(c), (d)–(f), and (i)–(k) of figure 2 illustrate how different values of p and γ alter the forward diffusion process at various time steps t, while panel (g) depicts the ground truth x, the motion-corrupted image y, and the residual r. The corresponding noise scheduler curves for each hyperparameter combination are shown in panel (h).
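A sketch of this geometric schedule, following the formulation of Yue et al (2023); the endpoint values η1 and ηN below are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

def geometric_eta(N=20, eta_1=1e-4, eta_N=0.99, p=0.3):
    """Non-uniform geometric shifting sequence: sqrt(eta_t) = sqrt(eta_1) * b0**beta_t,
    with beta_t = ((t - 1) / (N - 1))**p * (N - 1) and b0 fixed so eta_N is hit exactly."""
    t = np.arange(1, N + 1)
    beta = ((t - 1) / (N - 1)) ** p * (N - 1)
    b0 = (eta_N / eta_1) ** (1.0 / (2 * (N - 1)))
    return (np.sqrt(eta_1) * b0 ** beta) ** 2

eta = geometric_eta()   # strictly increasing from eta_1 to eta_N
```

Because p < 1 front-loads the exponent βt, the sequence rises quickly at early steps, matching the behavior shown in figure 2 for small p.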
Figure 2.
Illustration of the influence of hyperparameters on the forward diffusion process. Panels (a)–(c), (d)–(f), and (i)–(k) demonstrate how varying the hyperparameter γ affects the noise level in the generated images xt for different values of p, with higher γ leading to stronger noise. Panels (a) and (d) specifically compare the effect of p for a fixed γ. Panel (g) displays the ground truth motion-free image x, the motion-corrupted image y, and the residual error r. Panel (h) shows the evolution of ηt over the time steps t for various hyperparameter combinations.
2.2.2. Backward process
This process trains a DL model, parameterized by θ, that employs a U-net backbone in which the conventional attention layers are replaced by Swin Transformer blocks (Liu et al 2021b) to improve generalization across different image resolutions (Safari et al 2025b). The network architecture is depicted in figure 3.
Figure 3.
The Res-MoCoDiff network architecture. The inputs consist of a motion-corrupted image y, a noisy image xt at a given time step t, and the corresponding time step information. The output is the estimated motion-free image at each time step.
The Res-MoCoDiff model is trained to estimate the posterior distribution p(x | y) as follows:

p(x | y) = ∫ p(xN | y) ∏t pθ(xt−1 | xt, y) dx1:N,

where p(xN | y) is the terminal distribution centred on the motion-corrupted image y, and fθ(xt, y, t) denotes a DL model, parameterized by θ, which approximates the motion-free image x given xt.
Following the conventional DDPM literature (Luo 2022, Yue et al 2023, 2025, Safari et al 2025a), we assume that the reverse process follows a Gaussian distribution:

pθ(xt−1 | xt, y) = N(xt−1; μθ(xt, y, t), Σθ(xt, y, t)),

where the parameters θ are optimized by minimizing the following evidence lower bound:

minθ Σt E[ DKL( q(xt−1 | xt, x, y) ‖ pθ(xt−1 | xt, y) ) ],

with DKL(·‖·) denoting the Kullback–Leibler divergence. Detailed derivations can be found in Yue et al (2023, 2025), Safari et al (2025a).
Based on (3) and (4), the target distribution q(xt−1 | xt, x, y) is given by:

q(xt−1 | xt, x, y) = N(xt−1; (ηt−1/ηt) xt + (αt/ηt) x, γ² (ηt−1/ηt) αt I).

Since the variance γ² (ηt−1/ηt) αt I is independent of the inputs x and y, we set Σθ(xt, y, t) = γ² (ηt−1/ηt) αt I, in accordance with previous works (Ho et al 2020, Safari et al 2025a, Yue et al 2025).
The mean parameter μθ is modeled as follows:

μθ(xt, y, t) = (ηt−1/ηt) xt + (αt/ηt) fθ(xt, y, t),

where fθ denotes the DL model parameterized by θ.
Under the assumption of a Gaussian kernel and a Markov chain, it can be shown that (6) can be optimized by minimizing the loss below,

Lℓ2 = Σt E[ ‖fθ(xt, y, t) − x‖₂² ].

Additionally, our experiments demonstrate that incorporating an ℓ1 regularizer can enhance high-resolution image reconstruction by promoting sparsity in the learned representations. The overall loss function is defined as:

L = Σt E[ ‖fθ(xt, y, t) − x‖₂² + ‖fθ(xt, y, t) − x‖₁ ],

with the effectiveness of the ℓ1 regularizer further validated in the ablation study in section 3.3. The pseudo-codes for the training and sampling processes are provided in algorithms 1 and 2, respectively.
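As a concrete illustration, the combined objective described above can be sketched as follows; the equal weighting mirrors the description in this section, and the function name is ours.

```python
import numpy as np

def combined_loss(pred, target):
    """Equal-weight l2 + l1 objective on the network's motion-free estimate."""
    l2 = np.mean((pred - target) ** 2)     # penalises large pixel deviations
    l1 = np.mean(np.abs(pred - target))    # encourages sparsity and sharp edges
    return l2 + l1

loss = combined_loss(np.full(4, 2.0), np.zeros(4))   # 4.0 + 2.0 = 6.0
```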
| Algorithm 2. Sampling process. |
|---|
| Input: motion-corrupted image y; number of steps N = 4; shifting sequence {ηt} from the noise scheduler with growth rate p = 0.3; noise scaling γ = 2 |
| for t = N, ..., 1 do |
| z ∼ N(0, I) if t > 1 else z = 0 |
| xt−1 = μθ(xt, y, t) + √Σθ · z, with μθ and Σθ given in (10) |
| end for |
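A numpy sketch of this short sampling loop, assuming the posterior mean and variance take the residual-shifting form μθ = (ηt−1/ηt) xt + (αt/ηt) fθ(xt, y, t) and Σθ = γ² (ηt−1/ηt) αt I with αt = ηt − ηt−1 (our reconstruction of (10)). The "network" below is a stand-in callable, not a trained model, and the schedule values are illustrative.

```python
import numpy as np

def sample(y, f_theta, eta, gamma=2.0, rng=np.random.default_rng(0)):
    """Reverse diffusion sketch; eta[0] = 0 and eta[t] is the shift at step t."""
    N = len(eta) - 1
    x = y + gamma * np.sqrt(eta[N]) * rng.standard_normal(y.shape)  # start near y
    for t in range(N, 0, -1):
        x0_hat = f_theta(x, y, t)                  # network's motion-free estimate
        alpha_t = eta[t] - eta[t - 1]
        mean = (eta[t - 1] / eta[t]) * x + (alpha_t / eta[t]) * x0_hat
        std = gamma * np.sqrt((eta[t - 1] / eta[t]) * alpha_t)
        z = rng.standard_normal(y.shape) if t > 1 else 0.0
        x = mean + std * z
    return x

eta = np.array([0.0, 0.1, 0.4, 0.8, 0.999])        # illustrative 4-step schedule
oracle = lambda x_t, y, t: np.zeros_like(y)        # stand-in "network": true x is 0
x_hat = sample(np.ones((4, 4)), oracle, eta)       # recovers the all-zero image
```

Because η0 = 0, the final step collapses onto the network's estimate of the motion-free image, so only a handful of steps are needed when that estimate is accurate.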
In our implementation, the training objective combined ℓ2 and ℓ1 losses with equal weighting. This design was chosen to balance the strengths of the two norms: the ℓ2 component penalizes large deviations and promotes overall pixel-level consistency, whereas the ℓ1 component preserves high-frequency features and sharper structural details. While ℓ2 loss alone can lead to overly smoothed reconstructions, the inclusion of ℓ1 mitigates this effect by encouraging sparsity and edge sharpness. This complementary behavior has been reported in prior medical image restoration studies and was confirmed in our ablation analysis (section 3.3). No additional hyperparameter optimization of the relative weights was performed in this work.
In Res-MoCoDiff, the forward diffusion process was implemented with N = 20 steps. However, the reverse process was reduced to only four steps due to the residual-guided formulation, which shifts the corrupted image distribution closer to the motion-free distribution. This substantially reduces the gap that the reverse diffusion must bridge. The number of reverse steps was selected empirically by evaluating different step counts on a validation set, where four steps provided the optimal trade-off between reconstruction quality and efficiency.
Res-MoCoDiff was implemented in PyTorch (version 2.5.1) and executed on an NVIDIA A100 with 80 GB GPU RAM. The model was trained for 100 epochs with a batch size of 32. Optimization was performed using the RAdam optimizer (Liu et al 2020) in conjunction with a cosine annealing learning rate scheduler (Loshchilov and Hutter 2017), with an initial learning rate of and a minimum learning rate of . A warm-up phase comprising 5000 steps was employed prior to transitioning to the cosine schedule to stabilize early training dynamics.
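The warm-up plus cosine-annealing schedule can be written as a plain function of the step index; the learning-rate values and total step count below are placeholders, not the paper's settings (which are not reproduced here), and the helper name is ours.

```python
import math

def lr_at(step, warmup=5000, total=100_000, lr_max=1e-4, lr_min=1e-6):
    """Linear warm-up for `warmup` steps, then cosine annealing down to lr_min."""
    if step < warmup:
        return lr_max * step / warmup
    progress = (step - warmup) / (total - warmup)   # 0 -> 1 over the decay phase
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The rate climbs linearly to its maximum at the end of warm-up, then decays smoothly to the minimum at the final step, stabilizing early training as described above.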
2.3. Patient data acquisition and data pre-processing
This study utilizes two publicly available datasets, namely the IXI dataset (https://brain-development.org/ixi-dataset/) and the movement-related ARTs (MR-ART) dataset from Open-Neuro (Nárai et al 2022), to train and evaluate our models.
The IXI dataset comprises 580 cases of T1-weighted (T1-w) brain MRI images. We partitioned the dataset into two non-overlapping subsets: a training set consisting of 480 cases (54 160 slices) and a testing set comprising 100 cases (11 980 slices). We adapted the motion simulation technique of Duffy et al (2021) to generate an in-silico dataset with varying levels of motion ARTs, including heavy, moderate, and minor, by perturbing 15, 10, and 7 k-space lines along the phase encoding direction, respectively. Random slabs, with widths ranging between three and seven k-space lines, were selected along the phase encoding direction and were subjected to rotational perturbations and translational shifts of ±5 mm.
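A toy version of this slab-based k-space corruption is sketched below (translation only; rotation and the exact slab widths are omitted for brevity, and the helper name is ours, not the Duffy et al framework).

```python
import numpy as np

def simulate_motion(img, n_lines=15, max_shift=5, rng=np.random.default_rng(0)):
    """Replace random phase-encode (row) lines of k-space with lines taken from
    a rigidly translated copy of the image; a simplified motion-ART simulator."""
    k = np.fft.fft2(img)
    shift = tuple(rng.integers(-max_shift, max_shift + 1, size=2))  # random shift
    k_moved = np.fft.fft2(np.roll(img, shift, axis=(0, 1)))
    rows = rng.choice(img.shape[0], size=n_lines, replace=False)
    k[rows, :] = k_moved[rows, :]               # corrupt selected phase-encode lines
    return np.abs(np.fft.ifft2(k))

corrupted = simulate_motion(np.pad(np.ones((32, 32)), 16))   # toy 64x64 "brain"
```

Mixing k-space lines from inconsistent object positions is what produces the characteristic ghosting and ringing patterns discussed above.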
Additionally, model performance on in-vivo data was evaluated using the MR-ART T1-w brain MRI dataset, which comprises 148 cases (95 females and 53 males). This dataset includes three types of images: ground truth motion-free images, motion-corrupted images with a low level of distortion (level 1), and motion-corrupted images with a high level of distortion (level 2). Rigid brain image registration was performed using FSL-FLIRT (Jenkinson and Smith 2001, Jenkinson et al 2002) to compensate for misalignment between the motion-free and motion-corrupted images.
It is important to note that our study was conducted in the image domain, rather than in k-space. This choice was made because most publicly available datasets and many clinical archives provide only reconstructed magnitude images, not raw k-space data. Working in the image domain therefore enables broader applicability of Res-MoCoDiff to retrospective studies and multi-institutional data, where access to raw acquisition information is not readily available.
2.4. Quantitative and statistical analysis
We compared our model against benchmark approaches, including CycleGAN (Zhu et al 2017), Pix2pix (Isola et al 2017), and a conventional DDPM variant that employs a vision transformer backbone (Pan et al 2023).
To quantitatively assess the performance of the models in removing brain motion ARTs, we reported three metrics: normalized mean squared error (NMSE), structural similarity index measure (SSIM) (Wang et al 2004), and peak signal-to-noise ratio (PSNR). Lower NMSE values indicate better performance, although NMSE may favor solutions with increased blurriness (Zhang et al 2018). SSIM ranges from –1 to 1, with a value of 1 representing optimal structural similarity between the reconstructed and ground truth images. Likewise, a higher PSNR denotes improved performance and is more aligned with human perception due to its logarithmic scaling (Eidex et al 2024). The quantitative metrics were computed using the PIQ library (https://piq.readthedocs.io/en/latest, version 0.8.0) (Kastryulin et al 2022) with its default parameters.
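For reference, PSNR and NMSE reduce to a few lines each; the helpers below are illustrative, not the PIQ implementations used in this work (SSIM is more involved and is omitted here).

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def nmse(ref, test):
    """Normalised mean squared error in percent: ||ref - test||^2 / ||ref||^2."""
    return 100.0 * np.sum((ref - test) ** 2) / np.sum(ref ** 2)
```

The logarithmic scaling in PSNR is what makes it track perceived quality more closely than raw MSE, as noted above.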
3. Results
This section presents both qualitative and quantitative results for the in-silico and in-vivo datasets. In addition, an ablation study is conducted to quantify the contribution of each component of the proposed Res-MoCoDiff model.
3.1. Qualitative results
The motion ARTs observed in the motion-corrupted images confirm that our simulation procedure successfully reproduces both ringing ARTs inside the skull and ghosting of bright fat tissue outside the skull, as indicated by the white and green arrows in figures 4(a) and (d). Notably, the zoomed-in regions in figure 4(b) illustrate that Res-MoCoDiff preserves fine structural details more effectively than the comparative methods. Furthermore, the pixel-level distortion maps in figures 4(c), (f), and (i) underscore the superior ART removal achieved by Res-MoCoDiff.
Figure 4.
Qualitative results for the in-silico dataset are shown. Panels (a)–(c), (d)–(f), and (g)–(i) illustrate the outcomes for heavy, moderate, and minor distortion levels, respectively. The white and green arrows in panels (a) and (d) indicate ringing ARTs inside the skull and ghosting of bright fat tissue outside the skull, respectively, while panels (b), (e), and (h) present zoomed-in views of the regions highlighted by the red boxes.
Although our approach demonstrates a generally robust ability to preserve detailed structures, a few residual ringing ARTs remain (highlighted by arrows in figure 4(d)) for the moderate distortion level. For the minor distortion level, the overall performance among all methods is similar in mitigating motion ARTs, as shown in figures 4(g)–(i). Finally, the pixel-wise correlation plots in figures 5(a)–(c) confirm the qualitative findings: Res-MoCoDiff attained Pearson correlation coefficients of ρ = 0.9974, ρ = 0.9990, and ρ = 0.9999 for heavy, moderate, and minor distortion levels, respectively, surpassing the second-best MT-DDPM method, which yields ρ = 0.9961, ρ = 0.9987, and ρ = 0.9997.
Figure 5.
Pixel-wise correlations for the in-silico dataset are shown. Panels (a)–(c) display the corresponding pixel-wise correlation plots for heavy, moderate, and minor distortion levels.
Qualitative results for the in-vivo MR-ART dataset are presented in figure 6, where both optimal and suboptimal reconstructions are illustrated. In panel (a), corresponding to cases with recoverable motion corruption, red arrows indicate ringing ARTs inside the skull, and a white arrow highlights ghosting of bright fat tissue outside the skull. The green arrows denote regions where Res-MoCoDiff successfully restores fine structural details, achieving improvements from 28.81 dB and 75.71 (motion-corrupted) to 30.61 dB and 95.30 (Res-MoCoDiff) in PSNR and SSIM for Level 1, and from 27.03 dB and 67.58 to 30.78 dB and 94.75 for Level 2. In panel (b), however, we show suboptimal reconstructions in which Res-MoCoDiff failed to fully recover the anatomical structures. As indicated by the yellow arrows, the model produced hallucinated features when the motion-corrupted input lacked sufficient soft-tissue contrast, underscoring the inherent difficulty of reconstructing details that are not present in the corrupted images.
Figure 6.
Qualitative results for the in-vivo MR-ART dataset, illustrating both optimal (a) and suboptimal (b) reconstructions. Green and red arrows indicate ringing ARTs inside the skull, while the white arrow highlights ghosting of bright fat tissue outside the skull. Yellow arrows denote regions where Res-MoCoDiff hallucinated structures due to insufficient soft-tissue contrast in the corrupted inputs.
3.2. Quantitative results
As illustrated in figure 7, motion corruption progressively reduces PSNR and SSIM across minor, moderate, and heavy distortion levels, reaching average values of , , and , respectively. This negative trend confirms that increased distortion degrades image quality. Conversely, NMSE values rise from to and as the distortion intensifies. These results are further detailed in table 1, which shows that our proposed Res-MoCoDiff method consistently achieves higher PSNR and SSIM, as well as lower NMSE, compared with the benchmark approaches at all distortion levels.
Figure 7.
Boxplots of PSNR, SSIM, and NMSE metrics across different motion artifact levels for the in-silico IXI dataset.
Table 1.
Quantitative metrics (mean) across different motion artifact levels of the in-silico dataset are summarized. The arrows indicate the direction of better performance. Bold indicates the best values.
| Metrics | Distortion level | Corrupted | Pix2Pix | CycleGAN | MT-DDPM | Res-MoCoDiff (ours) |
|---|---|---|---|---|---|---|
| PSNR [dB] | Minor | |||||
| Moderate | ||||||
| Heavy | ||||||
|
| ||||||
| SSIM [-] | Minor | |||||
| Moderate | ||||||
| Heavy | ||||||
|
| ||||||
| NMSE [%] | Minor | |||||
| Moderate | ||||||
| Heavy | ||||||
As shown in figure 7 and table 1, Res-MoCoDiff consistently achieves the lowest NMSE across all distortion levels. For minor distortion, Res-MoCoDiff outperforms all comparative methods in PSNR () and SSIM (), while also obtaining the lowest NMSE (). CycleGAN provides the second-best NMSE () at this level, although it achieves a lower PSNR () and SSIM () compared with Res-MoCoDiff.
For moderate distortion, Res-MoCoDiff achieves the highest SSIM () and the lowest NMSE (). CycleGAN attains the best PSNR () for this distortion level, but its SSIM () and NMSE () remain below Res-MoCoDiff’s performance. MT-DDPM ranks as the second-best method in NMSE (), highlighting its competitive ART-reduction capabilities, although it still trails Res-MoCoDiff.
At the heavy distortion level, Res-MoCoDiff again secures the best SSIM () and NMSE (), whereas CycleGAN achieves the highest PSNR (). Nonetheless, CycleGAN’s SSIM () and NMSE () are notably worse than those of Res-MoCoDiff, indicating that a higher PSNR alone may not guarantee superior structural fidelity or overall ART removal. Across all three distortion levels, the boxplots reveal that Res-MoCoDiff’s performance distribution is consistently shifted toward higher PSNR and SSIM and lower NMSE values compared with the other methods, underscoring its robustness in mitigating motion ARTs.
In the in-vivo MR-ART dataset, the original motion-corrupted images at distortion levels 1 and 2 yield NMSE, SSIM, and PSNR values of and , respectively. Res-MoCoDiff reduces NMSE by 33.21% (to ) for level 1 and 37.41% (to ) for level 2. Additionally, it raises SSIM by 23.60% (to ) for level 1 and 25.58% (to ) for level 2, while also improving PSNR by 6.26% (to ) for level 1 and 7.71% (to ) for level 2. These gains underline the robust performance of Res-MoCoDiff for both in-silico and in-vivo motion ART correction.
3.3. Ablation study
We conducted an ablation study to quantify the contributions of the ℓ2 loss defined in (11) and the combined ℓ2 + ℓ1 loss specified in (12) to the overall performance of the proposed method across minor, moderate, and heavy distortion levels. Table 2 summarizes the PSNR, SSIM, and NMSE metrics obtained under the two training scenarios.
Table 2.
The ablation study results are summarized. The arrows indicate the direction of better performance. The numbers inside the parentheses in red are the improvements of the complete Res-MoCoDiff compared with the training scenario that only used the ℓ2 loss.
| Scenarios | Distortion level | PSNR [dB] | SSIM [-] | NMSE [%] |
|---|---|---|---|---|
| ℓ2 only | Minor | | | |
| Moderate | ||||
| Heavy | ||||
|
| ||||
| ℓ2 + ℓ1 (Res-MoCoDiff) | Minor | | | |
| Moderate | ||||
| Heavy | ||||
When training with only the ℓ2 loss, the model achieved a PSNR of dB, an SSIM of , and an NMSE of for minor distortion. For moderate distortion, the performance was dB in PSNR, in SSIM, and in NMSE, while for heavy distortion the corresponding values were dB, , and , respectively.
In contrast, when the model was trained using the complete Res-MoCoDiff strategy that incorporates both the ℓ2 and ℓ1 losses, performance improvements were observed consistently across all distortion levels. For minor distortion, the PSNR increased to dB (an improvement of 3.63%), SSIM improved marginally to (an increase of 0.42%), and the NMSE was reduced to , corresponding to a reduction of 26.29%. For moderate distortion, the PSNR improved to dB (a 2.04% increase), the SSIM increased to (an improvement of 0.48%), and the NMSE decreased to , a reduction of 14.74%. For heavy distortion, the use of the combined loss resulted in a PSNR of dB (an improvement of 1.21%), an SSIM of (an increase of 0.95%), and an NMSE of , corresponding to a reduction of 5.73%.
These findings indicate that the inclusion of the ℓ1 regularizer is instrumental in reducing pixel-level errors and preserving fine structural details, thereby contributing significantly to the overall performance of the method. The improvements, particularly in NMSE, underscore the efficacy of the combined loss function in mitigating residual errors and enhancing the robustness of motion ART correction across varying levels of distortion.
4. Discussion
MRI is a versatile imaging modality that provides excellent soft-tissue contrast and valuable physiological information. However, the prolonged acquisition times inherent to MRI increase the likelihood of patient motion, which in turn manifests as ghosting and ringing ARTs. Although the simplest solutions to mitigate motion ARTs involve repeating the scan or employing motion tracking systems, these approaches impose additional costs and burdens on the clinical workflow.
In this study, we proposed Res-MoCoDiff, an efficient denoising diffusion probabilistic model designed to reconstruct motion-free images. By leveraging a residual error shifting mechanism (illustrated in figure 1), our method performs the sampling process in only four steps (see algorithm 2), thereby facilitating its integration into current clinical practices. Unlike conventional DDPMs that require hundreds of reverse steps, Res-MoCoDiff uses residual error shifting to bring the corrupted distribution closer to the motion-free image distribution at the terminal diffusion step, substantially reducing the gap that the reverse diffusion must bridge and enabling accurate image restoration in only four reverse steps. Our validation experiments showed that using more than four steps did not yield significant improvements, confirming the efficiency of this reduced-step sampling. Notably, Res-MoCoDiff achieves an average sampling time of 0.37 s per batch of two image slices, which is substantially lower than the 101.74 s per batch required by the conventional MT-DDPM approach.
Our motion simulation technique effectively generates realistic ARTs, including ringing within the skull and ghosting of bright fat tissue outside the skull, as indicated by the white arrows in figure 4 for the in-silico dataset and similarly in figure 6 for the in-vivo dataset. This capability underscores the potential of our simulation framework to closely mimic the clinical appearance of motion ARTs.
Extensive qualitative and quantitative evaluations on both in-silico and in-vivo datasets demonstrate the superior performance of Res-MoCoDiff in removing motion ARTs across different distortion levels. While comparative models often leave residual ARTs, particularly under heavy and moderate motion conditions, Res-MoCoDiff consistently eliminates these imperfections, as highlighted by the green arrows in figure 4. Moreover, the proposed method excels at recovering fine structural details, resulting in higher pixel-wise correlations (see figure 5(a)–(c)). Although our method achieves the second highest PSNR among the evaluated techniques (see table 1), its overall improvements in SSIM and NMSE, together with the perceptually superior image quality, underscore its clinical efficacy.
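For reference, the metrics reported above follow their standard definitions; the sketch below operates on flat lists of pixel intensities:

```python
import math

def nmse(ref, est):
    """Normalized MSE: ||ref - est||^2 / ||ref||^2 (lower is better)."""
    num = sum((r - e) ** 2 for r, e in zip(ref, est))
    den = sum(r ** 2 for r in ref)
    return num / den

def psnr(ref, est, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    return 10.0 * math.log10(max_val ** 2 / mse)
```

SSIM additionally compares local luminance, contrast, and structure statistics (Wang et al 2004) and is omitted here for brevity.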
The inclusion of an ℓ1 regularizer during training further enhanced image sharpness by reducing NMSE, which is particularly important given that higher NMSE is often associated with blurry reconstructions. This improvement was consistent with the increases observed in both PSNR and SSIM, resulting in images that were structurally closer to the ground truth (Safari et al 2023).
While the ℓ2 loss is widely used in medical image restoration to reduce global pixel-level error, it can lead to oversmoothed reconstructions. By contrast, the ℓ1 term preserves high-frequency information and improves edge sharpness, thereby producing more realistic structures. The complementary behavior of ℓ1 and ℓ2 losses has been reported in both general image restoration (Zhang et al 2018) and MRI ART removal (Liu et al 2021a). Our ablation study (table 2) confirmed these benefits, showing that the combined loss reduced NMSE while enhancing perceptual quality compared with the ℓ2 loss alone.
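A minimal sketch of such a combined ℓ2 + ℓ1 objective follows; the relative weighting LAMBDA_L1 is a hypothetical illustration, not the value used in training:

```python
LAMBDA_L1 = 1.0  # hypothetical weight on the l1 term

def combined_loss(pred, target):
    """Mean l2 term (smooths globally) plus weighted mean l1 term (keeps edges)."""
    n = len(pred)
    l2 = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / n
    return l2 + LAMBDA_L1 * l1
```

In practice the same expression is applied to prediction and target tensors within the training loop of the diffusion model.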
A further consideration in designing Res-MoCoDiff is our decision to operate in the image domain rather than in k-space. While k-space methods can, in principle, provide direct access to raw acquisition information, their use in clinical practice is limited by several practical factors. Raw k-space data are not routinely stored in most hospital archives or large-scale public repositories, and reconstruction pipelines differ substantially across scanner vendors and software versions, which complicates reproducibility. In addition, implementing k-space MoCo often requires vendor-specific software environments that are not widely accessible. By contrast, reconstructed magnitude images are universally available and can be directly integrated into existing clinical workflows. Our image-domain design therefore prioritizes generalizability and clinical feasibility, allowing Res-MoCoDiff to serve as an off-the-shelf solution that can be applied retrospectively across a wide range of studies and institutions.
We focused on T1-w brain MRI because this sequence is both highly susceptible to motion ARTs and central to radiation oncology workflows (De Pietro et al 2024). This emphasis ensures that our evaluation addresses a clinically important sequence with high impact on both diagnostic and therapeutic decision making.
This study has several limitations that warrant discussion. First, in severely degraded in-vivo cases where the motion-corrupted input lacks soft-tissue contrast, Res-MoCoDiff may hallucinate anatomical details that are not consistent with the ground truth. This limitation reflects the fact that when essential structural information is absent from the input, the model cannot reliably recover it. Future work should therefore explore strategies such as incorporating multi-contrast MRI, embedding stronger physics-informed priors, or using uncertainty quantification to identify high-risk reconstructions. Second, our work was restricted to brain MRI, and other sequences such as diffusion-weighted echo-planar imaging and high-resolution T2-w imaging were not explored. Motion in brain MRI is typically irregular, unpredictable, and of relatively small amplitude, and the residual-guided design of Res-MoCoDiff is well suited for these conditions. However, larger and non-rigid motion events, such as sneezing or swallowing during long 3D acquisitions, may introduce more complex ARTs that are not fully addressed by the current framework. In addition, our motion simulation followed the established approach of Duffy et al (2021), which reproduces realistic ghosting and ringing but does not explicitly account for the increased probability of motion during longer scans. Furthermore, it should be clarified that this study addresses retrospective motion ART correction in structural brain MRI, which is distinct from physiological MoCo approaches (e.g. respiratory or cardiac compensation) used in free-breathing acquisitions. To extend Res-MoCoDiff to these broader scenarios, retraining on sequence-specific data may be required, and future studies could explore whether architectural modifications would further improve performance in modeling non-rigid motion.
Addressing these extensions will be important for establishing the generalizability of the framework across diverse imaging protocols and clinical applications. Finally, future work should also explore integration with motion modeling or hybrid prospective-retrospective strategies to improve robustness in more complex scenarios.
5. Conclusion
Res-MoCoDiff represents a significant advancement in motion ART correction for MRI. Its rapid processing speed and robust performance across a range of distortion levels make it a promising candidate for clinical adoption, potentially reducing the need for repeated scans and thereby improving patient throughput and diagnostic and treatment efficiency. Future work will focus on further optimizing the model, exploring its application to other imaging modalities, and validating its performance in larger, multi-center clinical studies.
Acknowledgments
This research is supported in part by the National Institutes of Health under Award Numbers R01DE033512 and R01CA272991.
Data availability statement
The datasets used in this study are publicly available. The IXI dataset can be accessed at https://brain-development.org/ixi-dataset/, and the MR-ART dataset is available through OpenNeuro https://openneuro.org/datasets/ds004173/versions/1.0.2.
Conflicts of interest
The authors declare no conflicts of interest.
References
- Bydder M, Larkman D J, Hajnal J V. Detection and elimination of motion artifacts by regeneration of k-space. Magn. Reson. Med. 2002;47:677–86. doi: 10.1002/mrm.10093.
- De Pietro S, et al. The role of MRI in radiotherapy planning: a narrative review “from head to toe”. Insights Imaging. 2024;15:255. doi: 10.1186/s13244-024-01799-1.
- Duffy B A, Zhao L, Sepehrband F, Min J, Wang D J, Shi Y, Toga A W, Kim H. Retrospective motion artifact correction of structural MRI images using deep learning improves the quality of cortical surface reconstructions. NeuroImage. 2021;230:117756. doi: 10.1016/j.neuroimage.2021.117756.
- Eidex Z, Wang J, Safari M, Elder E, Wynne J, Wang T, Shu H-K, Mao H, Yang X. High-resolution 3T to 7T ADC map synthesis with a hybrid CNN-transformer model. Med. Phys. 2024;51:4380–8. doi: 10.1002/mp.17079.
- Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Proc. 34th Int. Conf. on Neural Information Processing Systems (NIPS’20); Curran Associates Inc.; 2020.
- Isola P, Zhu J-Y, Zhou T, Efros A A. Image-to-image translation with conditional adversarial networks. 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR); 2017.
- Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage. 2002;17:825–41. doi: 10.1006/nimg.2002.1132.
- Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 2001;5:143–56. doi: 10.1016/S1361-8415(01)00036-6.
- Karakuzu A, Biswas L, Cohen-Adad J, Stikov N. Vendor-neutral sequences and fully transparent workflows improve inter-vendor reproducibility of quantitative MRI. Magn. Reson. Med. 2022;88:1212–28. doi: 10.1002/mrm.29292.
- Kastryulin S, Zakirov J, Prokopenko D, Dylov D V. PyTorch image quality: metrics for image quality assessment. 2022.
- Kemenczky P, Vakli P, Somogyi E, Homolya I, Hermann P, Gál V, Vidnyánszky Z. Effect of head motion-induced artefacts on the reliability of deep learning-based whole-brain segmentation. Sci. Rep. 2022;12:1618. doi: 10.1038/s41598-022-05583-3.
- Küstner T, Armanious K, Yang J, Yang B, Schick F, Gatidis S. Retrospective correction of motion-affected MR images using deep learning frameworks. Magn. Reson. Med. 2019;82:1527–40. doi: 10.1002/mrm.27783.
- Liu L, Jiang H, He P, Chen W, Liu X, Gao J, Han J. On the variance of the adaptive learning rate and beyond. 8th Int. Conf. on Learning Representations, ICLR 2020 (Addis Ababa, Ethiopia, 26–30 April 2020); OpenReview.net; 2020.
- Liu S, Thung K-H, Liangqiong Q, Lin W, Shen D, Yap P-T. Learning MRI artefact removal with unpaired data. Nat. Mach. Intell. 2021;3:60–67. doi: 10.1038/s42256-020-00270-2.
- Liu Y, Diao J, Zhou Z, Qi H, Hu P. Cardiac cine MRI motion correction using diffusion models. 2024 IEEE Int. Symp. on Biomedical Imaging (ISBI); 2024. pp 1–5.
- Liu Z, Lin Y, Cao Y, Han H, Wei Y, Zhang Z, Lin S, Guo B. Swin transformer: hierarchical vision transformer using shifted windows. Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV); 2021. pp 10012–22.
- Loktyushin A, Nickisch H, Pohmann R, Schölkopf B. Blind retrospective motion correction of MR images. Magn. Reson. Med. 2013;70:1608–18. doi: 10.1002/mrm.24615.
- Loshchilov I, Hutter F. SGDR: stochastic gradient descent with warm restarts. 5th Int. Conf. on Learning Representations, ICLR 2017 (Toulon, France, 24–26 April 2017); OpenReview.net; 2017.
- Luo C. Understanding diffusion models: a unified perspective. 2022 (arXiv: 2208.11970).
- Manduca A, McGee K P, Welch E B, Felmlee J P, Grimm R C, Ehman R L. Autocorrection in MR imaging: adaptive motion correction without navigator echoes. Radiology. 2000;215:904–9. doi: 10.1148/radiology.215.3.r00jn19904.
- Nárai Á, Hermann P, Auer T, Kemenczky P, Szalma J, Homolya I, Somogyi E, Vakli P, Weiss B, Vidnyánszky Z. Movement-related artefacts (MR-ART) dataset of matched motion-corrupted and clean structural MRI brain scans. Sci. Data. 2022;9:630. doi: 10.1038/s41597-022-01694-8.
- Özbey M, Dalmaz O, Dar S U H, Bedel H A, Özturk, Güngör A, Çukur T. Unsupervised medical image translation with adversarial diffusion models. IEEE Trans. Med. Imaging. 2023;42:3524–39. doi: 10.1109/TMI.2023.3290149.
- Pan S, et al. 2D medical image synthesis using transformer-based denoising diffusion probabilistic model. Phys. Med. Biol. 2023;68:105004. doi: 10.1088/1361-6560/acca5c.
- Pan S, Eidex Z, Safari M, Qiu R, Yang X. Cycle-guided denoising diffusion probability model for 3D cross-modality MRI synthesis. In: Gimi B S, Krol A, editors. Medical Imaging 2025: Clinical and Biomedical Imaging vol 13410. SPIE; 2025. p 134101W.
- Safari M, Eidex Z, Pan S, Qiu R L J, Yang X. Self-supervised adversarial diffusion models for fast MRI reconstruction. Med. Phys. 2025;52:3888–99. doi: 10.1002/mp.17675.
- Safari M, Fatemi A, Afkham Y, Archambault L. Patient-specific geometrical distortion corrections of MRI images improve dosimetric planning accuracy of vestibular schwannoma treated with gamma knife stereotactic radiosurgery. J. Appl. Clin. Med. Phys. 2023;24:e14072. doi: 10.1002/acm2.14072.
- Safari M, Fatemi A, Archambault L. MedFusionGAN: multimodal medical image fusion using an unsupervised deep generative adversarial network. BMC Med. Imaging. 2023;23:203. doi: 10.1186/s12880-023-01160-w.
- Safari M, Wang S, Eidex Z, Li Q, Middlebrooks E H, Yu D S, Yang X, Qiu R L. MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting. Phys. Med. Biol. 2025a;70:125008. doi: 10.1088/1361-6560/ade049.
- Safari M, Wang S, Eidex Z, Qiu R, Chang C-W, Yu D S, Yang X. A physics-informed deep learning model for MRI brain motion correction. 2025b (arXiv: 2502.09296v1).
- Safari M, Yang X, Chang C-W, Qiu R L J, Fatemi A, Archambault L. Unsupervised MRI motion artifact disentanglement: introducing MAUDGAN. Phys. Med. Biol. 2024;69:115057. doi: 10.1088/1361-6560/ad4845.
- Safari M, Yang X, Fatemi A, Archambault L. MRI motion artifact reduction using a conditional diffusion probabilistic model (MAR-CDPM). Med. Phys. 2024;51:2598–610. doi: 10.1002/mp.16844.
- Sarkar A, Das A, Ram K, Ramanarayanan S, Joel S E, Sivaprakasam M. AutoDPS: an unsupervised diffusion model based method for multiple degradation removal in MRI. Comput. Methods Programs Biomed. 2025;263:108684. doi: 10.1016/j.cmpb.2025.108684.
- Sohl-Dickstein J, Weiss E A, Maheswaranathan N, Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics. Proc. 32nd Int. Conf. on Machine Learning (ICML’15); JMLR.org; 2015. pp 2256–65.
- Spieker V, Eichhorn H, Hammernik K, Rueckert D, Preibisch C, Karampinos D C, Schnabel J A. Deep learning for retrospective motion correction in MRI: a comprehensive review. IEEE Trans. Med. Imaging. 2024;43:846–59. doi: 10.1109/TMI.2023.3323215.
- Sreekumari A, et al. A deep learning-based approach to reduce rescan and recall rates in clinical MRI examinations. Am. J. Neuroradiol. 2019;40:217–23. doi: 10.3174/ajnr.A5926.
- Sui Z, Palaniappan P, Brenner J, Paganelli C, Kurz C, Landry G, Riboldi M. Intra-frame motion deterioration effects and deep-learning-based compensation in MR-guided radiotherapy. Med. Phys. 2024;51:1899–917. doi: 10.1002/mp.16702.
- Sun Y, Wang L, Li G, Lin W, Wang Li. A foundation model for enhancing magnetic resonance images and downstream segmentation, registration and diagnostic tasks. Nat. Biomed. Eng. 2024;9:521–38. doi: 10.1038/s41551-024-01283-7.
- Tamir J I, Blumenthal M, Wang J, Oved T, Shimron E, Zaiss M. MRI acquisition and reconstruction cookbook: recipes for reproducibility, served with real-world flavour. Magn. Reson. Mater. Phys. Biol. Med. 2025;38:367–85. doi: 10.1007/s10334-025-01236-4.
- Wang S, Jin Z, Hu M, Safari M, Zhao F, Chang C-W, Qiu R L J, Roper J, Yu D S, Yang X. Unifying biomedical vision-language expertise: towards a generalist foundation model via multi-CLIP knowledge distillation. 2025a (arXiv: 2506.22567).
- Wang S, Safari M, Li Q, Chang C-W, Qiu R L J, Roper J, Yu D S, Yang X. Triad: vision foundation model for 3D magnetic resonance imaging. 2025b. doi: 10.21203/rs.3.rs-6129856/v1.
- Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 2004;13:600–12. doi: 10.1109/TIP.2003.819861.
- Wu Z, Chen X, Xie S, Shen J, Zeng Y. Super-resolution of brain MRI images based on denoising diffusion probabilistic model. Biomed. Signal Process. Control. 2023;85:104901. doi: 10.1016/j.bspc.2023.104901.
- Yue Z, Wang J, Loy C C. Efficient diffusion model for image restoration by residual shifting. IEEE Trans. Pattern Anal. Mach. Intell. 2025;47:116–30. doi: 10.1109/TPAMI.2024.3461721.
- Yue Z, Wang J, Loy C C. ResShift: efficient diffusion model for image super-resolution by residual shifting. In: Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, editors. Advances in Neural Information Processing Systems; Curran Associates, Inc.; 2023. pp 13294–307.
- Zaitsev M, Maclaren J, Herbst M. Motion artifacts in MRI: a complex problem with many partial solutions. J. Magn. Reson. Imaging. 2015;42:887–901. doi: 10.1002/jmri.24850.
- Zhang R, Isola P, Efros A A, Shechtman E, Wang O. The unreasonable effectiveness of deep features as a perceptual metric. Proc. IEEE Conf. on Computer Vision and Pattern Recognition; 2018. pp 586–95.
- Zhu J-Y, Park T, Isola P, Efros A A. Unpaired image-to-image translation using cycle-consistent adversarial networks. 2017 IEEE Int. Conf. on Computer Vision (ICCV); 2017.