
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Sep 4:arXiv:2505.03498v2. Originally published 2025 May 6. [Version 2]

Res-MoCoDiff: Residual-guided diffusion models for motion artifact correction in brain MRI

Mojtaba Safari 1, Shansong Wang 1, Qiang Li 1, Zach Eidex 1, Richard LJ Qiu 1, Chih-Wei Chang 1, Hui Mao 2, Xiaofeng Yang 1
PMCID: PMC12083705  PMID: 40386577

Abstract

Objective.

Motion artifacts in brain MRI, mainly from rigid head motion, degrade image quality and hinder downstream applications. Conventional methods to mitigate these artifacts, including repeated acquisitions or motion tracking, impose workflow burdens. This study introduces Res-MoCoDiff, an efficient denoising diffusion probabilistic model specifically designed for MRI motion artifact correction.

Approach.

Res-MoCoDiff exploits a novel residual error shifting mechanism during the forward diffusion process to incorporate information from motion-corrupted images. This mechanism allows the model to simulate the evolution of noise with a probability distribution closely matching that of the corrupted data, enabling a reverse diffusion process that requires only four steps. The model employs a U-net backbone, with attention layers replaced by Swin Transformer blocks, to enhance robustness across resolutions. Furthermore, the training process integrates a combined $\ell_1 + \ell_2$ loss function, which promotes image sharpness and reduces pixel-level errors. Res-MoCoDiff was evaluated on both an in-silico dataset generated using a realistic motion simulation framework and an in-vivo MR-ART dataset. Comparative analyses were conducted against established methods, including CycleGAN, Pix2pix, and a diffusion model with a vision transformer backbone (MT-DDPM), using quantitative metrics such as peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and normalized mean squared error (NMSE).

Main results.

The proposed method demonstrated superior performance in removing motion artifacts across minor, moderate, and heavy distortion levels. Res-MoCoDiff consistently achieved the highest SSIM and the lowest NMSE values, with a PSNR of up to 41.91 ± 2.94 dB for minor distortions. Notably, the average sampling time was reduced to 0.37 seconds per batch of two image slices, compared with 101.74 seconds for conventional approaches.

Significance.

Res-MoCoDiff offers a robust and efficient solution for correcting MRI motion artifacts, preserving fine structural details while significantly reducing computational overhead. Its speed and restoration fidelity underscore its potential for integration into clinical workflows, enhancing diagnostic accuracy and patient care.

Keywords: MRI, Deep learning, Motion correction, MoCo, efficient, diffusion model

1. Introduction

Magnetic resonance imaging (MRI) is a cornerstone of modern diagnostics, treatment planning, and patient follow-up, providing high-resolution images of soft tissues without the use of ionizing radiation. However, prolonged MRI acquisitions increase the likelihood of patient movement, leading to motion artifacts. These artifacts can alter the $B_0$ field, which results in susceptibility artifacts [1], and disrupt the k-space readout lines, potentially violating the Nyquist criterion and causing ghosting and ringing artifacts [2]. As one of the most common artifacts encountered in MRI [3], motion artifacts may compromise post-processing procedures such as image segmentation [4] and target tracking in MR-guided radiation therapy [5]. Moreover, due to the limited availability of prospective motion correction techniques and the additional complexity they introduce, clinical workflows often resort to repeating the imaging acquisition. The severity and spatial distribution of these artifacts underscore a need for robust methods capable of effectively removing or significantly reducing motion artifacts without repeated imaging.

Traditional motion correction (MoCo) algorithms have primarily aimed at mitigating artifacts by optimizing image quality metrics such as entropy and image gradient [6], as well as by estimating the motion-corrupted k-space lines [7] and corresponding motion trajectories [8]. Additional strategies include navigator echoes, external tracking devices, and sequence modifications that prospectively adjust acquisition to account for motion [2]. Retrospective approaches in the reconstruction domain, such as phase shift correction in k-space, blind motion trajectory estimation, and compressed sensing with motion models, have also been widely explored [6]. While these methods have shown clinical utility, their adoption is often limited by the need for raw k-space data, which is not routinely stored in clinical archives, and by reconstruction pipelines that vary across scanner vendors, reducing reproducibility and generalizability [9, 10].

In parallel, deep learning (DL) approaches have demonstrated superior performance in suppressing motion artifacts compared with conventional methods [11, 12]. In particular, both supervised and unsupervised generative models based on generative adversarial networks (GANs) have been successfully employed for MRI motion artifact removal [13–15]. However, GAN-based approaches frequently encounter practical limitations, including mode collapse and unstable training, while reconstruction- and k-space-based methods often require scanner-specific modifications.

In contrast, image-domain methods operate directly on reconstructed magnitude images, which are universally available in both retrospective studies and multi-center collaborations; they can therefore be applied widely without changes to the acquisition hardware or vendor-specific reconstruction software. This characteristic makes image-based models particularly attractive as off-the-shelf solutions for clinical deployment. Furthermore, image-domain methods integrate with existing clinical pipelines for tasks such as segmentation, registration, and radiotherapy planning, where corrected magnitude images can be seamlessly substituted for corrupted ones. For these reasons, this study focused on the image domain, aiming to balance methodological innovation with clinical feasibility.

Recently, denoising diffusion probabilistic models (DDPMs) have revolutionized image generation by markedly improving synthesis quality [16] and have been adapted for various medical imaging tasks, including image synthesis [17, 18], denoising [19], MRI acceleration [20], and vision foundation models for MRI [21–23]. DDPMs involve a forward process in which a Markov chain gradually transforms the input image into Gaussian noise, $\mathcal{N}(0, I)$, over a large number of steps, followed by a reverse process in which a neural network reconstructs the original image from the noisy data [24]. Existing DDPM-based MoCo models concatenate the motion-corrupted image $y$ with $x_N \sim \mathcal{N}(0, I)$ and perform the backward diffusion to reconstruct a motion-free image $\hat{x}$ that approximates the ground truth image $x$ [25–27]. Although these methods achieve promising results, their reliance on numerous diffusion steps substantially increases the inference time. Additionally, initiating reconstruction from pure Gaussian noise $x_N \sim \mathcal{N}(0, I)$ may be suboptimal for the MRI motion correction task (see Section 2.1).

In this study, we present Res-MoCoDiff, a residual-guided efficient motion-correction denoising diffusion probabilistic model that explicitly exploits the residual error between the motion-free image $x$ and the motion-corrupted image $y$ (i.e., $r = y - x$) in the forward diffusion process. Integrating this residual error into the diffusion process enables generation of noisy images at step $N$ whose probability distribution closely matches that of the motion-corrupted images, specifically $p(x_N) \approx \mathcal{N}(x_N; y, \gamma^2 I)$. This approach offers two significant advantages: (1) enhanced reconstruction fidelity, by avoiding the restrictive purely Gaussian prior assumption of conventional DDPMs, and (2) computational efficiency, as the reverse diffusion process can be reduced to only four steps, substantially accelerating reconstruction compared with traditional DDPMs.

In summary, the main contributions of this study are as follows:

  • Res-MoCoDiff is an efficient diffusion model that leverages residual information, reducing the reverse diffusion process to just four steps.

  • Res-MoCoDiff employs a novel noise scheduler that enables a more precise transition between diffusion steps by incorporating the residual error.

  • Res-MoCoDiff replaces the attention layers of the U-net backbone with Swin Transformer blocks.

  • Extensive evaluation of Res-MoCoDiff is performed on both simulated (in-silico) and clinical (in-vivo) datasets covering various levels of motion-induced distortions.

2. Materials and Methods

2.1. DDPM

DDPMs are inspired by non-equilibrium thermodynamics and aim to approximate complex data distributions using a tractable distribution, such as a standard Gaussian prior [28]. Specifically, DDPMs employ a Markov chain consisting of two distinct processes: a forward diffusion process and a backward (denoising) process. During the forward diffusion, the input image $x$ is gradually perturbed through a sequence of small Gaussian noise injections, eventually converging toward pure Gaussian noise $\mathcal{N}(0, I)$ after a large number of diffusion steps [16, 24]. Conversely, the backward process employs a DL model to iteratively remove noise and reconstruct the original image from the Gaussian noise by approximating the reverse Markov chain of the forward diffusion.

In traditional DDPM implementations, this reconstruction (reverse diffusion) typically requires many iterative steps (often hundreds to thousands), significantly increasing the computational burden and limiting clinical applicability, especially in time-sensitive scenarios [16, 24].

Formally, MoCo algorithms aim to recover an unknown motion-free image $x$ from a motion-corrupted image $y$ according to

$$ y = \mathcal{A}(x) + n, \tag{1} $$

where $\mathcal{A}$ denotes an unknown motion corruption operator and $n$ represents additive noise. Since this inverse problem is ill-posed, it is essential to impose regularization or prior assumptions to constrain the solution space; without such constraints, multiple plausible solutions for $x$ may be consistent with the observed data $y$. From a Bayesian perspective, this regularization is introduced via a prior distribution $p(x)$, which, when combined with the likelihood $p(y \mid x)$, yields the posterior distribution:

$$ p(x \mid y) \propto p(x)\, p(y \mid x). \tag{2} $$

Traditional DDPMs typically assume a standard Gaussian prior, $p(x) = \mathcal{N}(0, I)$. While mathematically convenient, this assumption may be suboptimal for inverse problems [29, 30] such as MRI motion correction, because it can encourage unrealistic reconstructions that introduce unwanted artifacts or hallucinated structures, as suggested by recent studies [25].

2.2. Problem Formulation

Similar to conventional DDPMs, Res-MoCoDiff employs a Markov chain for both the forward and backward diffusion processes. However, it introduces a key modification: explicitly incorporating the residual error r between the motion-corrupted (y) and the motion-free (x) images into the forward diffusion process. This process is illustrated in Figure 1.

Figure 1:


Flowchart of the Res-MoCoDiff approach. The forward process $q(x_t \mid x_{t-1}, y)$ employs a Markov chain to shift the residual error ($r = y - x$), thus simulating the forward diffusion. The backward diffusion is also modeled via a Markov chain $p_\theta(x_{t-1} \mid x_t, y)$, where a DL model parameterized by $\theta$ is trained to iteratively remove the noise and recover the original image.

2.2.1. Forward Process

Res-MoCoDiff.

Res-MoCoDiff employs a monotonically increasing shifting sequence $\{\beta_t\}_{t=1}^{N}$ to modulate the residual error $r$, starting with $\beta_1 \to 0$ and culminating in $\beta_N \to 1$, as illustrated in Figure 1; each forward step progressively adds more of the residual to the motion-free image. The transition kernel for each forward step is given by

$$ q(x_t \mid x_{t-1}, y) = \mathcal{N}\big(x_t;\ x_{t-1} + \alpha_t r,\ \gamma^2 \alpha_t I\big), \quad t = 1, \dots, N, \tag{3} $$

where $\alpha_t = \beta_t - \beta_{t-1}$ for $t > 1$ and $\alpha_1 = \beta_1 \to 0$. The hyperparameter $\gamma$ enhances the flexibility of the forward process. Following a procedure similar to that described in [24, 31], it can be shown that the marginal distribution of $x_t$ at a given time step $t$, conditioned on the input image $x$, is

$$ q(x_t \mid x, y) = \mathcal{N}\big(x_t;\ x + \beta_t r,\ \gamma^2 \beta_t I\big), \quad t = 1, \dots, N, \tag{4} $$

where we denote the motion-free input image by x, omitting the subscript (i.e., x0).
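Because (4) is available in closed form, $x_t$ can be drawn in one shot instead of running the chain. The NumPy sketch below is an illustration, not the authors' code: it samples (4) directly and also iterates the per-step kernel (3), then checks by Monte Carlo that both yield mean $x + \beta_N r$ and variance $\gamma^2 \beta_N$. The helper names `q_sample` and `q_chain` and the toy shifting sequence are hypothetical.

```python
import numpy as np

def q_sample(x, y, beta_t, gamma=2.0, rng=None):
    """Draw x_t ~ q(x_t | x, y) = N(x + beta_t * r, gamma^2 * beta_t * I), Eq. (4)."""
    rng = np.random.default_rng() if rng is None else rng
    r = y - x  # residual error between corrupted and motion-free images
    return x + beta_t * r + gamma * np.sqrt(beta_t) * rng.standard_normal(np.shape(x))

def q_chain(x, y, betas, gamma=2.0, rng=None):
    """Iterate Eq. (3), x_t ~ N(x_{t-1} + alpha_t r, gamma^2 alpha_t I), for t = 1..N."""
    rng = np.random.default_rng() if rng is None else rng
    r = y - x
    alphas = np.diff(betas, prepend=0.0)  # alpha_1 = beta_1, alpha_t = beta_t - beta_{t-1}
    x_t = np.array(x, dtype=np.float64)
    for a in alphas:
        x_t = x_t + a * r + gamma * np.sqrt(a) * rng.standard_normal(x_t.shape)
    return x_t

# Monte Carlo check on many scalar "pixels" with x = 0 and y = 1 (so r = 1).
rng = np.random.default_rng(0)
betas = np.array([0.01, 0.2, 0.6, 0.999])  # toy monotone shifting sequence
x0 = np.zeros(200_000)
y0 = np.ones(200_000)
direct = q_sample(x0, y0, betas[-1], rng=rng)
chained = q_chain(x0, y0, betas, rng=rng)
print(direct.mean(), chained.mean())  # both close to beta_N * r = 0.999
print(direct.var(), chained.var())    # both close to gamma^2 * beta_N = 3.996
```

The agreement follows from the variances of independent Gaussian increments adding up: $\sum_{t} \gamma^2 \alpha_t = \gamma^2 \beta_N$.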

Noise scheduler.

We employ a non-uniform geometric noise scheduler, as proposed by Yue et al. [30], to compute the shifting sequence $\{\beta_t\}_{t=1}^{N}$. Formally,

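$$ \sqrt{\beta_t} = \sqrt{\beta_1}\, \exp\!\left[\frac{1}{2}\left(\frac{t-1}{N-1}\right)^{p} \log\frac{\beta_N}{\beta_1}\right], \quad t = 2, \dots, N-1, \tag{5} $$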

where $p$ is a hyperparameter controlling the growth rate. As shown in Figure 2, lower values of $p$ lead to greater noise levels in the images $x_t$ across the forward diffusion steps. In addition, it is recommended to keep $\gamma\sqrt{\beta_1}$ sufficiently small to ensure $q(x_1 \mid x, y) \approx q(x)$ (see (4)) [16, 28]. Hence, we set $\gamma\sqrt{\beta_1} = 0.04$ by choosing $\beta_1 = (0.04/\gamma)^2$ with $\gamma = 2$. We also set $\beta_N = 0.999$ so that $\beta_N$ approaches its upper bound of 1. Unlike $p$, which modulates the rate at which noise accumulates, a larger $\gamma$ amplifies the overall noise level at each step. Panels (a)–(c), (d)–(f), and (i)–(k) of Figure 2 illustrate how different values of $p$ and $\gamma$ alter the forward diffusion process at various time steps $t$, while panel (g) depicts the ground truth $x$, the motion-corrupted image $y$, and the residual $r$. The corresponding noise scheduler curves for each hyperparameter combination are shown in panel (h).
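A direct transcription of (5) makes the boundary conditions easy to check. The NumPy sketch below uses a hypothetical function name, `shifting_schedule`, with defaults $\gamma = 2$, $\beta_N = 0.999$, and $p = 0.3$ taken from Algorithm 2. It evaluates the square-root form of (5) at every step and reproduces $\beta_1 = (0.04/\gamma)^2$ and $\beta_N$ at the endpoints.

```python
import numpy as np

def shifting_schedule(n_steps, gamma=2.0, beta_n=0.999, p=0.3):
    """Geometric shifting sequence {beta_t}, t = 1..N, following Eq. (5)."""
    if n_steps < 2:
        raise ValueError("need at least two diffusion steps")
    beta_1 = (0.04 / gamma) ** 2            # so that gamma * sqrt(beta_1) = 0.04
    t = np.arange(1, n_steps + 1, dtype=np.float64)
    frac = ((t - 1) / (n_steps - 1)) ** p   # 0 at t = 1, 1 at t = N
    sqrt_beta = np.sqrt(beta_1) * np.exp(0.5 * frac * np.log(beta_n / beta_1))
    return sqrt_beta ** 2

betas = shifting_schedule(4)
alphas = np.diff(betas, prepend=0.0)        # alpha_1 = beta_1, alpha_t = beta_t - beta_{t-1}
print(betas)  # first entry 4e-4, last entry 0.999 (up to floating-point rounding)
```

Since $0 < p < 1$ makes $((t-1)/(N-1))^p$ rise quickly for small $t$, a lower $p$ front-loads the residual shift and the noise, which matches the trend described for Figure 2.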

Figure 2:


Illustration of the influence of hyperparameters on the forward diffusion process. Panels (a)–(c), (d)–(f), and (i)–(k) demonstrate how varying the hyperparameter $\gamma$ affects the noise level in the generated images $x_t$ for different values of $p$, with higher $\gamma$ leading to stronger noise. Panels (a) and (d) specifically compare the effect of $p$ for a fixed $\gamma$. Panel (g) displays the ground truth motion-free image $x$, the motion-corrupted image $y$, and the residual error $r$. Panel (h) shows the evolution of $\beta_t$ over the time steps $t$ for various hyperparameter combinations.

2.2.2. Backward process

This process trains a DL model, parameterized by θ, that employs a U-net backbone in which the conventional attention layers are replaced by Swin Transformer blocks [32] to improve generalization across different image resolutions [33]. The network architecture is depicted in Figure 3.
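The text does not detail the internals of the Swin Transformer block. As a sketch of its core mechanism from the original Swin design [32], not of the authors' implementation, the NumPy functions below partition a feature map into non-overlapping windows (self-attention is then computed inside each window), reverse that partition, and apply the cyclic shift used by alternating shifted-window blocks. The function names and the (B, H, W, C) layout are assumptions.

```python
import numpy as np

def window_partition(x, ws):
    """Split a (B, H, W, C) map into (B * H/ws * W/ws, ws, ws, C) non-overlapping windows."""
    b, h, w, c = x.shape
    if h % ws or w % ws:
        raise ValueError("window size must divide H and W")
    x = x.reshape(b, h // ws, ws, w // ws, ws, c)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, ws, ws, c)

def window_reverse(windows, ws, h, w):
    """Inverse of window_partition: reassemble windows into a (B, H, W, C) map."""
    c = windows.shape[-1]
    b = windows.shape[0] // ((h // ws) * (w // ws))
    x = windows.reshape(b, h // ws, w // ws, ws, ws, c)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)

def cyclic_shift(x, ws):
    """Roll the map by ws // 2 pixels so shifted windows straddle the previous window borders."""
    s = ws // 2
    return np.roll(x, shift=(-s, -s), axis=(1, 2))

feat = np.arange(2 * 8 * 8 * 3).reshape(2, 8, 8, 3)  # toy (B, H, W, C) feature map
windows = window_partition(feat, 4)                   # 2 * 2 * 2 = 8 windows of 4 x 4
```

Restricting attention to fixed-size windows keeps the cost linear in the number of pixels, which is one reason window-based attention transfers across image resolutions more easily than global attention.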

Figure 3:


The Res-MoCoDiff network architecture. The inputs consist of a motion-corrupted image $y$, the intermediate image $x_t$ at a given time step $t$, and the corresponding time step information. The output is the estimated motion-free image $x_{t'}$ for $t' < t$.

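The Res-MoCoDiff model is trained to estimate the posterior distribution $p_\theta(x \mid y)$ as follows:

$$ p_\theta(x \mid y) = \int p(x_N \mid y) \prod_{t=1}^{N} p_\theta(x_{t-1} \mid x_t, y)\, dx_{1:N}, \tag{6} $$

where $p(x_N \mid y) \approx \mathcal{N}(y, \gamma^2 I)$, and $p_\theta(x_{t-1} \mid x_t, y)$ denotes a DL model, parameterized by $\theta$, that approximates $x_{t-1}$ given $x_t$.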

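Following the conventional DDPM literature [24, 29–31], we assume that the reverse process follows a Gaussian distribution:

$$ p_\theta(x_{t-1} \mid x_t, y) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, y, t),\ \Sigma_\theta(x_t, y, t)\big), \tag{7} $$

where the parameters $\theta$ are optimized by minimizing the following Kullback-Leibler terms of the (negative) evidence lower bound:

$$ \sum_{t=1}^{N} D_{\mathrm{KL}}\big[\, q(x_{t-1} \mid x_t, x, y)\ \big\|\ p_\theta(x_{t-1} \mid x_t, y) \,\big], \tag{8} $$

with $D_{\mathrm{KL}}[\cdot \,\|\, \cdot]$ denoting the Kullback-Leibler divergence. Detailed derivations can be found in [29–31].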

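Based on (3) and (4), the target distribution $q(x_{t-1} \mid x_t, x, y)$ is given by:

$$ q(x_{t-1} \mid x_t, x, y) = \mathcal{N}\Big(x_{t-1};\ \underbrace{\frac{\beta_{t-1}}{\beta_t}\, x_t + \frac{\alpha_t}{\beta_t}\, x}_{\mu_q},\ \underbrace{\gamma^2 \frac{\beta_{t-1}}{\beta_t}\, \alpha_t I}_{\Sigma_q}\Big). \tag{9} $$

Since $\Sigma_q$ is independent of the inputs $x$ and $y$, we set $\Sigma_\theta(x_t, y, t) = \Sigma_q$, in accordance with previous works [16, 29, 31].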
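For readers who want the missing step behind (9), the posterior can be recovered by standard Gaussian conditioning. The sketch assumes, as in the forward process, that $x_{t-1} \mid x, y$ follows (4) and that $x_t$ is obtained from $x_{t-1}$ by one step of (3) with independent noise $\epsilon_t \sim \mathcal{N}(0, I)$:

```latex
\begin{aligned}
x_{t-1} \mid x, y &\sim \mathcal{N}\!\left(x + \beta_{t-1} r,\ \gamma^2 \beta_{t-1} I\right),
\qquad x_t = x_{t-1} + \alpha_t r + \gamma \sqrt{\alpha_t}\,\epsilon_t, \\
\operatorname{Cov}(x_{t-1}, x_t) &= \gamma^2 \beta_{t-1} I,
\qquad \operatorname{Var}(x_t) = \gamma^2 (\beta_{t-1} + \alpha_t) I = \gamma^2 \beta_t I, \\
\mu_q &= (x + \beta_{t-1} r) + \frac{\beta_{t-1}}{\beta_t}\left(x_t - x - \beta_t r\right)
       = \frac{\beta_{t-1}}{\beta_t}\, x_t + \frac{\alpha_t}{\beta_t}\, x, \\
\Sigma_q &= \gamma^2 \beta_{t-1} I - \frac{(\gamma^2 \beta_{t-1})^2}{\gamma^2 \beta_t} I
          = \gamma^2 \frac{\beta_{t-1}}{\beta_t}\, \alpha_t I .
\end{aligned}
```

The residual $r$ cancels in $\mu_q$, so the mean depends on $y$ only through $x_t$; this is consistent with (10), where the network output $f_\theta$ takes the place of $x$.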

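The mean parameter $\mu_\theta(x_t, y, t)$ is modeled as follows:

$$ \mu_\theta(x_t, y, t) = \frac{\beta_{t-1}}{\beta_t}\, x_t + \frac{\alpha_t}{\beta_t}\, f_\theta(x_t, y, t), \tag{10} $$

where $f_\theta(\cdot)$ denotes the DL model parameterized by $\theta$.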

Under the assumption of a Gaussian kernel and a Markov chain, it can be shown that (6) can be optimized by minimizing the following $\ell_2$ loss:

$$ \hat{\theta} = \arg\min_\theta \big\| f_\theta(x_t, y, t) - x \big\|_2^2. \tag{11} $$

Additionally, our experiments demonstrate that incorporating an $\ell_1$ regularizer can enhance high-resolution image reconstruction by promoting sparsity in the learned representations. The overall loss function is defined as:

$$ \mathcal{L}_\theta(x_t, y, t) = \big\| f_\theta(x_t, y, t) - x \big\|_2^2 + \big\| f_\theta(x_t, y, t) - x \big\|_1, \tag{12} $$

with the effectiveness of the $\ell_1$ regularizer further validated in the ablation study in Section 3.3. Pseudocode for the training and sampling processes is provided in Algorithms 1 and 2, respectively.

In our implementation, the training objective combined the $\ell_1$ and $\ell_2$ losses with equal weighting. This design was chosen to balance the strengths of the two norms: the $\ell_2$ component penalizes large deviations and promotes overall pixel-level consistency, whereas the $\ell_1$ component preserves high-frequency features and sharper structural details. While an $\ell_2$ loss alone can lead to overly smoothed reconstructions, the inclusion of $\ell_1$ mitigates this effect by encouraging sparsity and edge sharpness. This complementary behavior has been reported in prior medical image restoration studies and was confirmed in our ablation analysis (Section 3.3). No additional hyperparameter optimization of the relative weights was performed in this work.
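A minimal PyTorch sketch of the combined objective in (12) is shown below. The function name `res_mocodiff_loss` and the weights `w2` and `w1` are hypothetical; averaging both norms over pixels (PyTorch's default `mean` reduction) is an assumption, since (12) is written with unnormalized norms.

```python
import torch
import torch.nn.functional as F

def res_mocodiff_loss(pred, target, w2=1.0, w1=1.0):
    """Equal-weight l2 + l1 objective in the spirit of Eq. (12).

    pred   : network output f_theta(x_t, y, t), e.g. shape (B, C, H, W)
    target : motion-free image x, same shape
    Mean reduction divides both norms by the same number of elements, so
    w2 = w1 = 1 keeps the 1:1 balance between the two terms.
    """
    l2 = F.mse_loss(pred, target)  # mean squared error (scaled squared l2 norm)
    l1 = F.l1_loss(pred, target)   # mean absolute error (scaled l1 norm)
    return w2 * l2 + w1 * l1

loss = res_mocodiff_loss(torch.tensor([[0.0, 2.0]]), torch.zeros(1, 2))
# MSE = (0 + 4) / 2 = 2.0 and L1 = (0 + 2) / 2 = 1.0, so loss = 3.0
```

The squared term grows quadratically with large errors, while the absolute term keeps a constant gradient magnitude for small ones, which is the complementary behavior the paragraph above appeals to.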

In Res-MoCoDiff, the forward diffusion process was implemented with N=20 steps. However, the reverse process was reduced to only four steps due to the residual-guided formulation, which shifts the corrupted image distribution closer to the motion-free distribution. This substantially reduces the gap that the reverse diffusion must bridge. The number of reverse steps was selected empirically by evaluating different step counts on a validation set, where four steps provided the optimal trade-off between reconstruction quality and efficiency.
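To make the four-step reverse process concrete, the NumPy sketch below transcribes Algorithm 2 with a schedule from (5). Here `f_theta` stands in for the trained Swin-U-net and can be any callable `f_theta(x_t, y, t)` returning an estimate of $x$; the names `make_schedule`, `sample`, and `oracle` are hypothetical. With an oracle that always returns the ground truth, the last step ($t = 1$, where $\beta_0 = 0$ and $\alpha_1 = \beta_1$) reduces to $\mu_\theta = f_\theta(\cdot)$ with zero injected noise, so the sampler returns $x$ after exactly four network calls.

```python
import numpy as np

def make_schedule(n_steps, gamma=2.0, beta_n=0.999, p=0.3):
    """Shifting sequence beta_1..beta_N from Eq. (5), written in squared form."""
    beta_1 = (0.04 / gamma) ** 2
    t = np.arange(1, n_steps + 1, dtype=np.float64)
    frac = ((t - 1) / (n_steps - 1)) ** p
    return beta_1 * np.exp(frac * np.log(beta_n / beta_1))

def sample(y, f_theta, n_steps=4, gamma=2.0, p=0.3, rng=None):
    """Reverse process of Algorithm 2: start near y and denoise with n_steps network calls."""
    rng = np.random.default_rng() if rng is None else rng
    betas = make_schedule(n_steps, gamma=gamma, p=p)
    betas_prev = np.concatenate(([0.0], betas[:-1]))  # beta_0 = 0
    alphas = betas - betas_prev                         # alpha_1 = beta_1
    # x_N ~ N(y, gamma^2 beta_N I): the chain starts from the corrupted image, not pure noise.
    x_t = y + gamma * np.sqrt(betas[-1]) * rng.standard_normal(y.shape)
    for t in range(n_steps, 0, -1):
        b_t, b_prev, a_t = betas[t - 1], betas_prev[t - 1], alphas[t - 1]
        mu = (b_prev / b_t) * x_t + (a_t / b_t) * f_theta(x_t, y, t)  # Eq. (10)
        eps = rng.standard_normal(y.shape) if t > 1 else np.zeros_like(y)
        x_t = mu + gamma * np.sqrt(b_prev * a_t / b_t) * eps            # std from Sigma_q, Eq. (9)
    return x_t

x_true = np.linspace(0.0, 1.0, 16).reshape(4, 4)
y_obs = x_true + 0.2  # toy "corrupted" image
calls = []

def oracle(x_t, y, t):
    """Hypothetical perfect denoiser that records the steps at which it is called."""
    calls.append(t)
    return x_true

x_hat = sample(y_obs, oracle, rng=np.random.default_rng(1))
```

The cost of inference is dominated by the number of `f_theta` evaluations, so starting from $y$ rather than from $\mathcal{N}(0, I)$ is what allows the loop to be this short.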

Res-MoCoDiff was implemented in PyTorch (version 2.5.1) and executed on an NVIDIA A100 GPU with 80 GB of memory. The model was trained for 100 epochs with a batch size of 32. Optimization was performed using the RAdam optimizer [34] in conjunction with a cosine annealing learning rate scheduler [35], with an initial learning rate of 2×10⁻⁴ and a minimum learning rate of 2×10⁻⁵. A warm-up phase comprising 5,000 steps was employed prior to transitioning to the cosine schedule to stabilize early training dynamics.

Algorithm 1.

Training process

Input: motion-free dataset $\mathcal{T}$, motion-corrupted dataset $\mathcal{T}_c$
repeat
  $x \sim \mathcal{T}$, $y \sim \mathcal{T}_c$
  $t \sim \mathrm{Uniform}(\{1, \dots, N\})$
  $x_t \sim q(x_t \mid x, y)$ ▷ Given in (4)
  Take a gradient descent step on $\mathcal{L}_\theta(x_t, y, t)$ ▷ Given in (12)
until converged

Algorithm 2.

Sampling process

Input: motion-corrupted image $y$; number of steps $N = 4$; noise scheduler $\{\beta_t\}_{t=1}^{N}$ with $\beta_N = 0.999$, $\beta_1 = (0.04/\gamma)^2$, growth rate $p = 0.3$, and $\alpha_t = \beta_t - \beta_{t-1}$; noise scaling $\gamma = 2$
   $x_N \sim \mathcal{N}(x_N; y, \gamma^2 \beta_N I)$
for $t = N, \dots, 1$ do
   $\epsilon \sim \mathcal{N}(\epsilon; 0, I)$ if $t > 1$ else $\epsilon = 0$
   $\mu_\theta = \frac{\beta_{t-1}}{\beta_t} x_t + \frac{\alpha_t}{\beta_t} f_\theta(x_t, y, t)$ ▷ Given in (10)
   $x_{t-1} = \mu_\theta + \gamma \sqrt{\frac{\beta_{t-1}}{\beta_t}\alpha_t}\, \epsilon$
end for
return $x_0$
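The training loop of Algorithm 1 and the optimizer settings above can be sketched in PyTorch as follows. This is a hedged illustration, not the authors' code: `TinyDenoiser` is a hypothetical single-convolution stand-in for the Swin-U-net, the warm-up is assumed to be linear with a start factor of 0.01, and the forward chain uses $N = 20$ steps as stated in the text. For the reported setting, `total_steps` would be roughly 100 × ⌈54,160/32⌉ = 169,300 if each training sample is one slice, which is also an assumption.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for f_theta; the real model is a Swin-U-net (Figure 3)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, x_t, y, t):
        # Condition on y by channel concatenation; this toy model ignores t.
        return self.net(torch.cat([x_t, y], dim=1))

def make_betas(n_steps, gamma=2.0, beta_n=0.999, p=0.3):
    """Shifting sequence of Eq. (5) as a float32 tensor."""
    beta_1 = (0.04 / gamma) ** 2
    t = torch.arange(1, n_steps + 1, dtype=torch.float32)
    frac = ((t - 1) / (n_steps - 1)) ** p
    return beta_1 * torch.exp(frac * math.log(beta_n / beta_1))

def build_optimizer(model, total_steps, warmup_steps=5000):
    """RAdam, linear warm-up, then cosine annealing from 2e-4 down to 2e-5."""
    opt = torch.optim.RAdam(model.parameters(), lr=2e-4)
    warmup = torch.optim.lr_scheduler.LinearLR(opt, start_factor=0.01, total_iters=warmup_steps)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=total_steps - warmup_steps, eta_min=2e-5)
    sched = torch.optim.lr_scheduler.SequentialLR(
        opt, schedulers=[warmup, cosine], milestones=[warmup_steps])
    return opt, sched

def train_step(model, opt, sched, x, y, betas, gamma=2.0):
    """One iteration of Algorithm 1 on paired slices x, y shaped (B, 1, H, W)."""
    t = torch.randint(1, betas.numel() + 1, (x.shape[0],))          # t ~ Uniform{1..N}
    b = betas[t - 1].view(-1, 1, 1, 1)
    x_t = x + b * (y - x) + gamma * b.sqrt() * torch.randn_like(x)   # sample Eq. (4)
    pred = model(x_t, y, t)
    loss = F.mse_loss(pred, x) + F.l1_loss(pred, x)                  # Eq. (12), mean-reduced
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
    return loss.item()

torch.manual_seed(0)
model = TinyDenoiser()
opt, sched = build_optimizer(model, total_steps=10, warmup_steps=2)  # tiny toy run
betas = make_betas(20)                                               # forward chain with N = 20
x = torch.rand(4, 1, 16, 16)
y = x + 0.1 * torch.randn_like(x)
loss = train_step(model, opt, sched, x, y, betas)
```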

2.3. Patient Data Acquisition and Data Pre-processing

This study utilizes two publicly available datasets, namely the IXI dataset (https://brain-development.org/ixi-dataset/) and the movement-related artifacts (MR-ART) dataset from OpenNeuro [36], to train and evaluate our models.

The IXI dataset comprises 580 cases of T1-weighted (T1-w) brain MRI images. We partitioned the dataset into two non-overlapping subsets: a training set of 480 cases (54,160 slices) and a testing set of 100 cases (11,980 slices). We adapted the motion simulation technique of Duffy et al. [12] to generate an in-silico dataset with three levels of motion artifacts (heavy, moderate, and minor) by perturbing 15, 10, and 7 k-space lines, respectively, along the phase-encoding direction. Random slabs, with widths ranging between three and seven k-space lines, were selected along the phase-encoding direction and subjected to rotational perturbations of ±7° and translational shifts of ±5 mm.
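The full simulation follows Duffy et al. [12]; the NumPy sketch below only illustrates the underlying idea and is not the authors' pipeline. It relies on the Fourier shift theorem: translating the object during the acquisition of some phase-encoding lines multiplies those lines of k-space by a linear phase ramp. Simplifying assumptions (all hypothetical): axis 0 is the phase-encoding direction, motion is an in-plane translation only (rotation is omitted), shifts are in pixels rather than millimeters, and slabs are placed independently, so they may overlap.

```python
import numpy as np

def simulate_translation_motion(img, n_lines=10, max_shift_px=5.0, rng=None):
    """Toy retrospective motion simulation on a 2-D magnitude image (translation only)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    k = np.fft.fftshift(np.fft.fft2(img))              # centred k-space
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]   # cycles/pixel along phase encoding
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]   # cycles/pixel along readout
    k_corrupt = k.copy()
    remaining = n_lines
    while remaining > 0:
        # Slab of 3-7 lines; the last slab is truncated to respect the line budget.
        width = min(int(rng.integers(3, 8)), remaining)
        start = int(rng.integers(0, h - width + 1))
        dy, dx = rng.uniform(-max_shift_px, max_shift_px, size=2)
        # Fourier shift theorem: a translation by (dy, dx) is a linear phase ramp in k-space.
        ramp = np.exp(-2j * np.pi * (fy[start:start + width] * dy + fx * dx))
        k_corrupt[start:start + width] = k[start:start + width] * ramp
        remaining -= width
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k_corrupt)))

rng = np.random.default_rng(0)
clean = rng.random((64, 64))                                      # stand-in magnitude slice
heavy = simulate_translation_motion(clean, n_lines=15, rng=rng)   # "heavy" budget from the text
```

Because only a subset of k-space lines is inconsistent with the rest, the reconstructed image shows the ghosting and ringing patterns described above rather than a simple global shift.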

Additionally, model performance on in-vivo data was evaluated using the MR-ART T1-w brain MRI dataset, which comprises 148 cases (95 females and 53 males). This dataset includes three types of images: ground truth motion-free images, motion-corrupted images with a low level of distortion (level 1), and motion-corrupted images with a high level of distortion (level 2). Rigid brain image registration was performed using FSL-FLIRT [37, 38] to compensate for misalignment between the motion-free and motion-corrupted images.

It is important to note that our study was conducted in the image domain, rather than in k-space. This choice was made because most publicly available datasets and many clinical archives provide only reconstructed magnitude images, not raw k-space data. Working in the image domain therefore enables broader applicability of Res-MoCoDiff to retrospective studies and multi-institutional data, where access to raw acquisition information is not readily available.

2.4. Quantitative and Statistical Analysis

We compared our model against benchmark approaches, including CycleGAN [39], Pix2pix [40], and a conventional DDPM variant with a vision transformer backbone (MT-DDPM) [41].

To quantitatively assess the performance of the models in removing brain motion artifacts, we reported three metrics: normalized mean squared error (NMSE), structural similarity index measure (SSIM) [42], and peak signal-to-noise ratio (PSNR). Lower NMSE values indicate better performance, although NMSE may favor solutions with increased blurriness [43]. SSIM ranges from −1 to 1, with a value of 1 representing optimal structural similarity between the reconstructed and ground truth images. Likewise, a higher PSNR denotes improved performance and is more aligned with human perception due to its logarithmic scaling [44]. The quantitative metrics were computed using the PIQ library (https://piq.readthedocs.io/en/latest, version 0.8.0) [45] with its default parameters.
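The authors computed the metrics with PIQ (for example `piq.psnr` and `piq.ssim`). For orientation only, the NumPy sketch below shows common textbook definitions of NMSE and PSNR; the exact normalization used for the reported NMSE is not stated, so the formula here, the function names, and the assumed [0, 1] intensity range are assumptions. SSIM is omitted because it involves local windowed statistics best taken from a library.

```python
import numpy as np

def nmse_percent(pred, ref):
    """Normalized MSE in percent: 100 * ||pred - ref||^2 / ||ref||^2."""
    return 100.0 * np.sum((pred - ref) ** 2) / np.sum(ref ** 2)

def psnr_db(pred, ref, data_range=1.0):
    """Peak signal-to-noise ratio: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((pred - ref) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.full((4, 4), 0.5)
pred = ref + 0.1                # uniform error of 0.1 on a [0, 1] intensity scale
print(psnr_db(pred, ref))       # about 20 dB, since MSE = 0.01
print(nmse_percent(pred, ref))  # about 4 %, since 0.01 / 0.25 = 0.04
```

The example also shows why PSNR and NMSE can rank methods differently: PSNR depends on the chosen `data_range`, whereas NMSE is normalized by the energy of the reference image.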

3. Results

This section presents both qualitative and quantitative results for the in-silico and in-vivo datasets. In addition, an ablation study is conducted to quantify the contribution of each component of the proposed Res-MoCoDiff model.

3.1. Qualitative results

The motion artifacts observed in the motion-corrupted images confirm that our simulation procedure successfully reproduces both ringing artifacts inside the skull and ghosting of bright fat tissue outside the skull, as indicated by the white and green arrows in Figures 4(a) and (d). Notably, the zoomed-in regions in Figure 4(b) illustrate that Res-MoCoDiff preserves fine structural details more effectively than the comparative methods. Furthermore, the pixel-level distortion maps in Figures 4(c), (f), and (i) underscore the superior artifact removal achieved by Res-MoCoDiff.

Figure 4:


Qualitative results for the in-silico dataset are shown. Panels (a)-(c), (d)-(f), and (g)-(i) illustrate the outcomes for heavy, moderate, and minor distortion levels, respectively. The white and green arrows in panels (a) and (d) indicate ringing artifacts inside the skull and ghosting of bright fat tissue outside the skull, respectively, while panels (b), (e), and (h) present zoomed-in views of the regions highlighted by the red boxes.

Although our approach demonstrates a generally robust ability to preserve detailed structures, a few residual ringing artifacts remain (highlighted by arrows in Figure 4(d)) for the moderate distortion level. For the minor distortion level, the overall performance among all methods is similar in mitigating motion artifacts, as shown in Figure 4(g)-(i). Finally, the pixel-wise correlation plots in Figure 5(a)-(c) confirm the qualitative findings: Res-MoCoDiff attained Pearson correlation coefficients of ρ=0.9974, ρ=0.9990, and ρ=0.9999 for high, moderate, and minor distortion levels, respectively, surpassing the second-best MT-DDPM method, which yields ρ=0.9961, ρ=0.9987, and ρ=0.9997.

Figure 5:


Pixel-wise correlations for the in-silico dataset are shown. Panels (a)-(c) display the corresponding pixel-wise correlation plots for heavy, moderate, and minor distortion levels.

Qualitative results for the in-vivo MR-ART dataset are presented in Figure 6, where both optimal and suboptimal reconstructions are illustrated. In panel (a), corresponding to cases with recoverable motion corruption, red arrows indicate ringing artifacts inside the skull, and a white arrow highlights ghosting of bright fat tissue outside the skull. The green arrows denote regions where Res-MoCoDiff successfully restores fine structural details. For this example, PSNR and SSIM improve from 28.81 dB and 0.7571 (motion-corrupted) to 30.61 dB and 0.9530 (Res-MoCoDiff) for Level 1, and from 27.03 dB and 0.6758 to 30.78 dB and 0.9475 for Level 2. In panel (b), however, we show suboptimal reconstructions in which Res-MoCoDiff failed to fully recover the anatomical structures. As indicated by the yellow arrows, the model produced hallucinated features when the motion-corrupted input lacked sufficient soft-tissue contrast, underscoring the inherent difficulty of reconstructing details that are not present in the corrupted images.

Figure 6:


Qualitative results for the in-vivo MR-ART dataset, illustrating both optimal (a) and suboptimal (b) reconstructions. Green and red arrows indicate ringing artifacts inside the skull, while the white arrow highlights ghosting of bright fat tissue outside the skull. Yellow arrows denote regions where Res-MoCoDiff hallucinated structures due to insufficient soft-tissue contrast in the corrupted inputs.

3.2. Quantitative results

As illustrated in Figure 7, motion corruption progressively reduces PSNR and SSIM across minor, moderate, and heavy distortion levels, reaching average values of (34.62± 3.25 dB, 0.87±0.05), (30.46±2.48 dB, 0.79±0.05), and (28.14±2.20 dB, 0.74±0.05), respectively. This negative trend confirms that increased distortion degrades image quality. Conversely, NMSE values rise from 0.56 ± 0.43% to 1.33 ± 0.84% and 2.94 ± 1.79% as the distortion intensifies. These results are further detailed in Table 1, which shows that our proposed Res-MoCoDiff method consistently achieves higher PSNR and SSIM, as well as lower NMSE, compared with the benchmark approaches at all distortion levels.

Figure 7:


Boxplots of PSNR, SSIM, and NMSE metrics across different motion artifact levels for the in-silico IXI dataset.

Table 1:

Quantitative metrics (mean ± std) across different motion artifact levels of the in-silico dataset. The arrows indicate the direction of better performance.

| Metric | Distortion level | Corrupted | Pix2Pix | CycleGAN | MT-DDPM | Res-MoCoDiff (ours) |
|---|---|---|---|---|---|---|
| PSNR [dB] ↑ | Minor | 34.62 ± 3.25 | 37.03 ± 1.65 | 41.21 ± 2.91 | 38.25 ± 3.08 | **41.91 ± 2.94** |
| | Moderate | 30.46 ± 2.48 | 37.16 ± 4.86 | **38.96 ± 2.33** | 35.10 ± 2.64 | 37.97 ± 2.39 |
| | Heavy | 28.14 ± 2.20 | 31.95 ± 3.01 | **35.93 ± 1.48** | 32.65 ± 2.30 | 34.15 ± 2.42 |
| SSIM [−] ↑ | Minor | 0.87 ± 0.05 | 0.91 ± 0.07 | 0.97 ± 0.01 | 0.98 ± 0.01 | **0.99 ± 0.00** |
| | Moderate | 0.79 ± 0.05 | 0.92 ± 0.05 | 0.92 ± 0.05 | 0.98 ± 0.01 | **0.98 ± 0.01** |
| | Heavy | 0.74 ± 0.05 | 0.89 ± 0.03 | 0.91 ± 0.06 | 0.95 ± 0.02 | **0.96 ± 0.01** |
| NMSE [%] ↓ | Minor | 0.56 ± 0.43 | 0.18 ± 0.13 | 0.15 ± 0.18 | 0.24 ± 0.20 | **0.10 ± 0.09** |
| | Moderate | 1.33 ± 0.84 | 0.57 ± 0.43 | 0.60 ± 0.60 | 0.47 ± 0.32 | **0.24 ± 0.16** |
| | Heavy | 2.94 ± 1.79 | 1.14 ± 0.69 | 0.88 ± 0.54 | 0.81 ± 0.56 | **0.58 ± 0.40** |

Bold indicates the best values.

As shown in Figure 7 and Table 1, Res-MoCoDiff consistently achieves the lowest NMSE across all distortion levels. For minor distortion, Res-MoCoDiff outperforms all comparative methods in PSNR (41.91 ± 2.94 dB) and SSIM (0.99 ± 0.00), while also obtaining the lowest NMSE (0.10 ± 0.09%). CycleGAN provides the second-best NMSE (0.15 ± 0.18%) at this level, although it achieves a lower PSNR (41.21 ± 2.91 dB) and SSIM (0.97 ± 0.01) than Res-MoCoDiff.

For moderate distortion, Res-MoCoDiff achieves the highest SSIM (0.98 ± 0.01) and the lowest NMSE (0.24 ± 0.16%). CycleGAN attains the best PSNR (38.96 ± 2.33 dB) at this distortion level, but its SSIM (0.92 ± 0.05) and NMSE (0.60 ± 0.60%) are worse than those of Res-MoCoDiff. MT-DDPM ranks second in NMSE (0.47 ± 0.32%), highlighting its competitive artifact-reduction capabilities, although it still trails Res-MoCoDiff.

At the heavy distortion level, Res-MoCoDiff again secures the best SSIM (0.96 ± 0.01) and NMSE (0.58 ± 0.40%), whereas CycleGAN achieves the highest PSNR (35.93 ± 1.48 dB). Nonetheless, CycleGAN’s SSIM (0.91 ± 0.06) and NMSE (0.88 ± 0.54%) are notably worse than those of Res-MoCoDiff, indicating that a higher PSNR alone may not guarantee superior structural fidelity or overall artifact removal. Across all three distortion levels, the boxplots reveal that Res-MoCoDiff’s performance distribution is consistently shifted toward higher PSNR and SSIM and lower NMSE values compared with the other methods, underscoring its robustness in mitigating motion artifacts.

In the in-vivo MR-ART dataset, the original motion-corrupted images at distortion levels 1 and 2 yield NMSE, SSIM, and PSNR values of (2.56 ± 2.38%, 0.74 ± 0.10, 28.61 ± 2.85 dB) and (3.30 ± 2.76%, 0.72 ± 0.10, 27.51 ± 2.83 dB), respectively. Res-MoCoDiff reduces NMSE by 33.21% (to 1.71 ± 1.49%) for level 1 and by 37.41% (to 2.07 ± 1.79%) for level 2. It also raises SSIM by 23.60% (to 0.92 ± 0.05) for level 1 and by 25.58% (to 0.91 ± 0.05) for level 2, and improves PSNR by 6.26% (to 30.40 ± 2.90 dB) for level 1 and by 7.71% (to 29.63 ± 2.97 dB) for level 2. These gains underline the robust performance of Res-MoCoDiff for both in-silico and in-vivo motion artifact correction.
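The percentages above appear to be relative changes with respect to the motion-corrupted baseline; this is our reading, not a definition stated in the text. The plain-Python check below (the helper name is hypothetical) recomputes them from the rounded means: the PSNR gains are reproduced exactly, while metrics reported with only two decimals, such as SSIM, can deviate slightly from the published percentages because of rounding.

```python
def relative_change_percent(before, after):
    """Relative change of a metric with respect to the motion-corrupted baseline, in percent."""
    return 100.0 * (after - before) / before

# Rounded MR-ART means (motion-corrupted -> Res-MoCoDiff).
psnr_level1 = relative_change_percent(28.61, 30.40)  # about +6.26
psnr_level2 = relative_change_percent(27.51, 29.63)  # about +7.71
nmse_level1 = relative_change_percent(2.56, 1.71)    # about -33.20 (reported: 33.21 % reduction)
```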

3.3. Ablation Study

We conducted an ablation study to quantify the contributions of the $\ell_2$ loss defined in (11) and the combined $\ell_1 + \ell_2$ loss specified in (12) to the overall performance of the proposed method across minor, moderate, and heavy distortion levels. Table 2 summarizes the PSNR, SSIM, and NMSE metrics obtained under the two training scenarios.

Table 2:

Ablation study results. The arrows indicate the direction of better performance. Numbers in parentheses give the relative change of the complete Res-MoCoDiff with respect to the training scenario that used only the $\ell_2$ loss.

| Scenario | Distortion level | PSNR [dB] ↑ | SSIM [−] ↑ | NMSE [%] ↓ |
|---|---|---|---|---|
| $\ell_2$ | Minor | 40.44 ± 2.77 | 0.99 ± 0.01 | 0.14 ± 0.11 |
| | Moderate | 37.21 ± 2.26 | 0.97 ± 0.01 | 0.28 ± 0.18 |
| | Heavy | 33.74 ± 2.15 | 0.95 ± 0.01 | 0.61 ± 0.37 |
| $\ell_2 + \ell_1$ (Res-MoCoDiff) | Minor | 41.91 ± 2.94 (3.63%) | 0.99 ± 0.00 (0.42%) | 0.10 ± 0.09 (−26.29%) |
| | Moderate | 37.97 ± 2.39 (2.04%) | 0.98 ± 0.01 (0.48%) | 0.24 ± 0.16 (−14.74%) |
| | Heavy | 34.15 ± 2.42 (1.21%) | 0.96 ± 0.01 (0.95%) | 0.58 ± 0.40 (−5.73%) |

When training with only the $\ell_2$ loss, the model achieved a PSNR of 40.44 ± 2.77 dB, an SSIM of 0.99 ± 0.01, and an NMSE of 0.14 ± 0.11% for minor distortion. For moderate distortion, the performance was 37.21 ± 2.26 dB in PSNR, 0.97 ± 0.01 in SSIM, and 0.28 ± 0.18% in NMSE, while for heavy distortion the corresponding values were 33.74 ± 2.15 dB, 0.95 ± 0.01, and 0.61 ± 0.37%, respectively.

In contrast, when the model was trained using the complete Res-MoCoDiff strategy that incorporates both the $\ell_1$ and $\ell_2$ losses, performance improvements were observed consistently across all distortion levels. For minor distortion, the PSNR increased to 41.91 ± 2.94 dB (an improvement of 3.63%), SSIM improved marginally to 0.99 ± 0.00 (an increase of 0.42%), and the NMSE was reduced to 0.10 ± 0.09%, corresponding to a reduction of 26.29%. For moderate distortion, the PSNR improved to 37.97 ± 2.39 dB (a 2.04% increase), the SSIM increased to 0.98 ± 0.01 (an improvement of 0.48%), and the NMSE decreased to 0.24 ± 0.16%, a reduction of 14.74%. For heavy distortion, the use of the combined loss resulted in a PSNR of 34.15 ± 2.42 dB (an improvement of 1.21%), an SSIM of 0.96 ± 0.01 (an increase of 0.95%), and an NMSE of 0.58 ± 0.40%, corresponding to a reduction of 5.73%.

These findings indicate that the ℓ1 regularizer plays an important role in reducing pixel-level errors and preserving fine structural details, and thus contributes substantially to the overall performance of the method. The improvements, particularly in NMSE, underscore the efficacy of the combined loss in mitigating residual errors across distortion levels, although the relative gain diminished as distortion became more severe (an NMSE reduction of 26.29% for minor versus 5.73% for heavy distortion).
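A minimal NumPy sketch of a combined ℓ1+ℓ2 objective follows. It assumes (12) has the form mean squared error plus a weighted mean absolute error; the weight `l1_weight` is a hypothetical hyperparameter, and the actual formulation and weighting are those given in (12). In training, the same expression would be written with the deep learning framework's tensors so that it can be differentiated.

```python
import numpy as np

def combined_l1_l2_loss(pred: np.ndarray, target: np.ndarray, l1_weight: float = 1.0) -> float:
    """Mean squared error plus a weighted mean absolute error (assumed form of the loss)."""
    residual = pred - target
    l2_term = np.mean(residual ** 2)     # quadratic penalty: dominated by large errors
    l1_term = np.mean(np.abs(residual))  # linear penalty: weighs small residuals more than the squared term
    return float(l2_term + l1_weight * l1_term)

pred = np.array([0.9, 0.5, 0.1])
target = np.array([1.0, 0.5, 0.0])
print(combined_l1_l2_loss(pred, target))
```

For residuals smaller than one in magnitude, |r| exceeds r², so the ℓ1 term keeps penalizing the many small errors that an ℓ2-only objective largely tolerates.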

4. Discussion

MRI is a versatile imaging modality that provides excellent soft-tissue contrast and valuable physiological information. However, the prolonged acquisition times inherent to MRI increase the likelihood of patient motion, which in turn manifests as ghosting and ringing artifacts. Although the simplest solutions to mitigate motion artifacts involve repeating the scan or employing motion tracking systems, these approaches impose additional costs and burdens on the clinical workflow.

In this study, we proposed Res-MoCoDiff, an efficient denoising diffusion probabilistic model designed to reconstruct motion-free images. Unlike conventional DDPMs, which require hundreds of reverse steps starting from pure Gaussian noise, Res-MoCoDiff uses a residual error shifting mechanism (illustrated in Figure 1) that incorporates the residual between the motion-corrupted and motion-free images into the forward diffusion process, so that the distribution at the terminal diffusion step closely matches that of the motion-corrupted data. The reverse process therefore starts from a noisy version of the corrupted image rather than from pure noise and restores the image in only four steps (see Algorithm 2), which facilitates integration into current clinical practice. Our validation experiments showed that using more than four steps did not yield significant improvements, indicating that four steps are sufficient. Notably, Res-MoCoDiff achieves an average sampling time of 0.37 seconds per batch of two image slices, roughly 275 times faster than the 101.74 seconds per batch required by MT-DDPM.
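The residual shifting idea can be illustrated with a forward process in the style of ResShift [29, 30]: instead of diffusing toward pure Gaussian noise, x_t moves from the motion-free image x0 toward the corrupted image y by a fraction η_t of the residual e0 = y − x0, with added noise of variance κ²η_t. The sketch below is an illustration under that assumption only; the four-step schedule `etas` and the noise scale `kappa` are hypothetical values, and Res-MoCoDiff's actual forward process and schedule are defined in its Methods section.

```python
import numpy as np

def residual_shift_forward(x0, y, eta_t, kappa, rng):
    """Sample x_t ~ N(x0 + eta_t * (y - x0), kappa^2 * eta_t * I)  (ResShift-style, assumed)."""
    e0 = y - x0                                   # residual between corrupted and clean images
    noise = rng.standard_normal(x0.shape)
    return x0 + eta_t * e0 + kappa * np.sqrt(eta_t) * noise

# Hypothetical 4-step schedule increasing to 1, so that x_T is centered on y.
etas = np.array([0.04, 0.2, 0.55, 1.0])
kappa = 0.5  # hypothetical noise scale

rng = np.random.default_rng(0)
x0 = np.zeros((8, 8))  # toy "motion-free" image
y = np.ones((8, 8))    # toy "motion-corrupted" image
for t, eta in enumerate(etas, start=1):
    xt = residual_shift_forward(x0, y, eta, kappa, rng)
    # The expected value of x_t is x0 + eta_t * (y - x0), i.e. it drifts from 0 toward 1.
    print(f"t={t}: mean(x_t) = {xt.mean():.2f}")
```

Because the terminal state is centered on y rather than on zero-mean noise, the reverse process only has to remove the residual and a small amount of noise, which is what makes very short sampling chains plausible.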

Our motion simulation technique generates realistic artifacts, including ringing within the skull and ghosting of bright fat tissue outside the skull, as indicated by the white arrows in Figure 4 for the in-silico dataset; similar artifacts appear in the motion-corrupted in-vivo images in Figure 6. This suggests that the simulation framework closely mimics the clinical appearance of motion artifacts.
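As a simplified illustration of why rigid motion during acquisition produces ghosting, the NumPy sketch below translates the object along the phase-encoding axis only for the k-space lines acquired after a hypothetical motion event. By the Fourier shift theorem, a translation corresponds to a linear phase ramp in k-space. It assumes a 2D slice and a linear (top-to-bottom) phase-encoding order, and it is considerably simpler than the framework of Duffy et al. [12] used in this study; `shift_px` and `motion_line` are hypothetical parameters.

```python
import numpy as np

def simulate_translation_ghosting(image, shift_px, motion_line):
    """Translate the object along axis 0 (phase encoding) for k-space lines
    acquired at or after `motion_line`, then reconstruct a magnitude image."""
    ny = image.shape[0]
    kspace = np.fft.fftshift(np.fft.fft2(image))       # centered k-space
    ky = np.arange(ny) - ny // 2                        # signed phase-encoding frequencies
    phase = np.exp(-2j * np.pi * ky * shift_px / ny)    # shift theorem: translation <-> linear phase
    corrupted = kspace.copy()
    corrupted[motion_line:, :] *= phase[motion_line:, None]  # only lines after the motion event
    return np.abs(np.fft.ifft2(np.fft.ifftshift(corrupted)))

# Toy rectangular "head" that moves by 3.5 pixels after 40 of 64 lines were acquired.
phantom = np.zeros((64, 64))
phantom[20:44, 24:40] = 1.0
corrupted = simulate_translation_ghosting(phantom, shift_px=3.5, motion_line=40)
```

Because only part of k-space is inconsistent with the rest, the error is distributed along the phase-encoding direction as ghosts and ringing rather than as a clean shift of the whole object.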

Extensive qualitative and quantitative evaluations on both in-silico and in-vivo datasets demonstrate the superior performance of Res-MoCoDiff in removing motion artifacts across distortion levels. Whereas the comparison models often leave residual artifacts, particularly under moderate and heavy motion, Res-MoCoDiff consistently removes them, as highlighted by the green arrows in Figure 4. The proposed method also recovers fine structural details, resulting in higher pixel-wise correlations (see Figure 5(a)-(c)). Although our method achieves only the second-highest PSNR among the evaluated techniques (see Table 1), its improvements in SSIM and NMSE, together with its perceptually superior image quality, support its potential clinical utility.

Including an ℓ1 regularizer during training further reduced NMSE and improved image sharpness, which is important because higher NMSE is often associated with blurry reconstructions. The NMSE reduction was accompanied by increases in both PSNR and SSIM, yielding images that were structurally closer to the ground truth [46].

Although the ℓ2 loss is widely used in medical image restoration to reduce global pixel-level error, it can lead to oversmoothed reconstructions. By contrast, the ℓ1 term better preserves high-frequency information and edge sharpness, producing more realistic structures. The complementary behavior of ℓ1 and ℓ2 has been reported in both general image restoration [47] and MRI artifact removal [48]. Our ablation study (Table 2) is consistent with these reports: compared with ℓ2 alone, the combined loss reduced NMSE and improved PSNR and SSIM at every distortion level.
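A one-pixel toy example shows why ℓ2 training tends to oversmooth. Suppose the same input is paired with ambiguous targets, for instance a pixel that lies on the dark side of an edge in two of three training pairs and on the bright side in the third. The constant prediction that minimizes the mean squared error is the mean of the targets, an intermediate gray value, whereas the minimizer of the mean absolute error is the median, one of the observed intensities. This is a standard property of the two losses rather than an experiment from this study; the targets below are hypothetical.

```python
import numpy as np

targets = np.array([0.0, 0.0, 1.0])        # hypothetical ambiguous targets for one pixel
candidates = np.linspace(0.0, 1.0, 1001)   # grid of constant predictions

# Average loss of each candidate prediction over all targets.
l2_risk = ((candidates[:, None] - targets[None, :]) ** 2).mean(axis=1)
l1_risk = np.abs(candidates[:, None] - targets[None, :]).mean(axis=1)

best_l2 = candidates[np.argmin(l2_risk)]   # near the mean (1/3): a blurred intermediate value
best_l1 = candidates[np.argmin(l1_risk)]   # the median (0.0): a sharp, observed intensity
print(f"l2 minimizer ~ {best_l2:.3f}, l1 minimizer ~ {best_l1:.3f}")
```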

A further consideration in designing Res-MoCoDiff is our decision to operate in the image domain rather than in k-space. While k-space methods can, in principle, provide direct access to raw acquisition information, their use in clinical practice is limited by several practical factors. Raw k-space data are not routinely stored in most hospital archives or large-scale public repositories, and reconstruction pipelines differ substantially across scanner vendors and software versions, which complicates reproducibility. In addition, implementing k-space motion correction often requires vendor-specific software environments that are not widely accessible. By contrast, reconstructed magnitude images are universally available and can be directly integrated into existing clinical workflows. Our image-domain design therefore prioritizes generalizability and clinical feasibility, allowing Res-MoCoDiff to serve as an off-the-shelf solution that can be applied retrospectively across a wide range of studies and institutions.

We focused on T1-w brain MRI because this sequence is both highly susceptible to motion artifacts and central to radiation oncology workflows [49]. This focus ensures that our evaluation addresses a clinically important sequence that informs both diagnostic and therapeutic decisions.

This study has several limitations. First, in severely degraded in-vivo cases where the motion-corrupted input lacks soft-tissue contrast, Res-MoCoDiff may hallucinate anatomical details that are inconsistent with the ground truth: when essential structural information is absent from the input, the model cannot reliably recover it. Future work should therefore explore strategies such as incorporating multi-contrast MRI, embedding stronger physics-informed priors, or using uncertainty quantification to identify high-risk reconstructions. Second, our work was restricted to T1-w brain MRI; other sequences, such as diffusion-weighted echo-planar imaging (EPI) and high-resolution T2-w imaging, were not explored. Motion in brain MRI is typically irregular, unpredictable, and of relatively small amplitude, conditions for which the residual-guided design of Res-MoCoDiff is well suited. However, larger and non-rigid motion events, such as sneezing or swallowing during long 3D acquisitions, may introduce more complex artifacts that the current framework does not fully address. Third, our motion simulation followed the established approach of Duffy et al. [12], which reproduces realistic ghosting and ringing but does not explicitly account for the increased probability of motion during longer scans. Finally, this study addresses retrospective motion artifact correction in structural brain MRI, which is distinct from physiological motion correction (e.g., respiratory or cardiac compensation) in free-breathing acquisitions. Extending Res-MoCoDiff to these broader scenarios may require retraining on sequence-specific data, and future studies could examine whether architectural modifications improve the modeling of non-rigid motion.
Addressing these extensions will be important for establishing the generalizability of the framework across diverse imaging protocols and clinical applications. Future work should also explore integration with motion modeling or hybrid prospective-retrospective strategies to improve robustness in more complex scenarios.

5. Conclusion

Res-MoCoDiff offers an efficient and effective approach to retrospective motion artifact correction in brain MRI. Its rapid processing and robust performance across distortion levels make it a promising candidate for clinical adoption, potentially reducing the need for repeated scans and thereby improving patient throughput and the efficiency of diagnosis and treatment. Future work will focus on further optimizing the model, exploring its application to other imaging modalities, and validating its performance in larger, multi-center clinical studies.

Acknowledgment

This research is supported in part by the National Institutes of Health under Award Numbers R56EB033332, R01DE033512, and R01CA272991.

Footnotes

Conflicts of interest

There are no conflicts of interest declared by the authors.

Data availability

The datasets used in this study are publicly available. The IXI dataset can be accessed at https://brain-development.org/ixi-dataset/, and the MR-ART dataset is available through OpenNeuro https://openneuro.org/datasets/ds004173/versions/1.0.2.

References

  • [1] Safari Mojtaba, Fatemi Ali, Afkham Younes, and Archambault Louis. Patient-specific geometrical distortion corrections of MRI images improve dosimetric planning accuracy of vestibular schwannoma treated with gamma knife stereotactic radiosurgery. Journal of Applied Clinical Medical Physics, 24(10):e14072, 2023.
  • [2] Zaitsev Maxim, Maclaren Julian, and Herbst Michael. Motion artifacts in MRI: a complex problem with many partial solutions. Journal of Magnetic Resonance Imaging, 42(4):887–901, 2015.
  • [3] Sreekumari A., Shanbhag D., Yeo D., Foo T., Pilitsis J., Polzin J., Patil U., Coblentz A., Kapadia A., Khinda J., Boutet A., Port J., and Hancu I. A deep learning–based approach to reduce rescan and recall rates in clinical MRI examinations. American Journal of Neuroradiology, 40(2):217–223, 2019.
  • [4] Kemenczky Péter, Vakli Pál, Somogyi Eszter, Homolya István, Hermann Petra, Gál Viktor, and Vidnyánszky Zoltán. Effect of head motion-induced artefacts on the reliability of deep learning-based whole-brain segmentation. Scientific Reports, 12(1):1618, Jan 2022.
  • [5] Sui Zhuojie, Palaniappan Prasannakumar, Brenner Jakob, Paganelli Chiara, Kurz Christopher, Landry Guillaume, and Riboldi Marco. Intra-frame motion deterioration effects and deep-learning-based compensation in MR-guided radiotherapy. Medical Physics, 51(3):1899–1917, 2024.
  • [6] Manduca Armando, McGee Kiaran P., Welch E. Brian, Felmlee Joel P., Grimm Roger C., and Ehman Richard L. Autocorrection in MR imaging: adaptive motion correction without navigator echoes. Radiology, 215(3):904–909, 2000.
  • [7] Bydder M., Larkman D.J., and Hajnal J.V. Detection and elimination of motion artifacts by regeneration of k-space. Magnetic Resonance in Medicine, 47(4):677–686, 2002.
  • [8] Loktyushin Alexander, Nickisch Hannes, Pohmann Rolf, and Schölkopf Bernhard. Blind retrospective motion correction of MR images. Magnetic Resonance in Medicine, 70(6):1608–1618, 2013.
  • [9] Tamir Jonathan I., Blumenthal Moritz, Wang Jiachen, Oved Tal, Shimron Efrat, and Zaiss Moritz. MRI acquisition and reconstruction cookbook: recipes for reproducibility, served with real-world flavour. Magnetic Resonance Materials in Physics, Biology and Medicine, 38(3):367–385, Jul 2025.
  • [10] Karakuzu Agah, Biswas Labonny, Cohen-Adad Julien, and Stikov Nikola. Vendor-neutral sequences and fully transparent workflows improve intervendor reproducibility of quantitative MRI. Magnetic Resonance in Medicine, 88(3):1212–1228, 2022.
  • [11] Spieker Veronika, Eichhorn Hannah, Hammernik Kerstin, Rueckert Daniel, Preibisch Christine, Karampinos Dimitrios C., and Schnabel Julia A. Deep learning for retrospective motion correction in MRI: a comprehensive review. IEEE Transactions on Medical Imaging, 43(2):846–859, 2024.
  • [12] Duffy Ben A., Zhao Lu, Sepehrband Farshid, Min Joyce, Wang Danny JJ, Shi Yonggang, Toga Arthur W., and Kim Hosung. Retrospective motion artifact correction of structural MRI images using deep learning improves the quality of cortical surface reconstructions. NeuroImage, 230:117756, 2021.
  • [13] Küstner Thomas, Armanious Karim, Yang Jiahuan, Yang Bin, Schick Fritz, and Gatidis Sergios. Retrospective correction of motion-affected MR images using deep learning frameworks. Magnetic Resonance in Medicine, 82(4):1527–1540, 2019.
  • [14] Safari Mojtaba, Yang Xiaofeng, Chang Chih-Wei, Qiu Richard L J, Fatemi Ali, and Archambault Louis. Unsupervised MRI motion artifact disentanglement: introducing MAUDGAN. Physics in Medicine & Biology, 69(11):115057, May 2024.
  • [15] Liu Siyuan, Thung Kim-Han, Qu Liangqiong, Lin Weili, Shen Dinggang, and Yap Pew-Thian. Learning MRI artefact removal with unpaired data. Nature Machine Intelligence, 3(1):60–67, Jan 2021.
  • [16] Ho Jonathan, Jain Ajay, and Abbeel Pieter. Denoising diffusion probabilistic models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020. Curran Associates Inc.
  • [17] Özbey Muzaffer, Dalmaz Onat, Dar Salman U. H., Bedel Hasan A., Özturk Șaban, Güngör Alper, and Çukur Tolga. Unsupervised medical image translation with adversarial diffusion models. IEEE Transactions on Medical Imaging, 42(12):3524–3539, 2023.
  • [18] Pan Shaoyan, Eidex Zach, Safari Mojtaba, Qiu Richard, and Yang Xiaofeng. Cycle-guided denoising diffusion probability model for 3D cross-modality MRI synthesis. In Gimi Barjor S. and Krol Andrzej, editors, Medical Imaging 2025: Clinical and Biomedical Imaging, volume 13410, page 134101W. International Society for Optics and Photonics, SPIE, 2025.
  • [19] Wu Zhanxiong, Chen Xuanheng, Xie Sangma, Shen Jian, and Zeng Yu. Super-resolution of brain MRI images based on denoising diffusion probabilistic model. Biomedical Signal Processing and Control, 85:104901, 2023.
  • [20] Safari Mojtaba, Eidex Zach, Pan Shaoyan, Qiu Richard L. J., and Yang Xiaofeng. Self-supervised adversarial diffusion models for fast MRI reconstruction. Medical Physics, 52(6):3888–3899, 2025.
  • [21] Wang Shansong, Safari Mojtaba, Li Qiang, Chang Chih-Wei, Qiu Richard LJ, Roper Justin, Yu David S., and Yang Xiaofeng. Triad: vision foundation model for 3D magnetic resonance imaging, 2025.
  • [22] Sun Yue, Wang Limei, Li Gang, Lin Weili, and Wang Li. A foundation model for enhancing magnetic resonance images and downstream segmentation, registration and diagnostic tasks. Nature Biomedical Engineering, Dec 2024.
  • [23] Wang Shansong, Jin Zhecheng, Hu Mingzhe, Safari Mojtaba, Zhao Feng, Chang Chih-Wei, Qiu Richard LJ, Roper Justin, Yu David S., and Yang Xiaofeng. Unifying biomedical vision-language expertise: towards a generalist foundation model via multi-CLIP knowledge distillation, 2025.
  • [24] Luo Calvin. Understanding diffusion models: a unified perspective, 2022.
  • [25] Safari Mojtaba, Yang Xiaofeng, Fatemi Ali, and Archambault Louis. MRI motion artifact reduction using a conditional diffusion probabilistic model (MAR-CDPM). Medical Physics, 51(4):2598–2610, 2024.
  • [26] Sarkar Arunima, Das Ayantika, Ram Keerthi, Ramanarayanan Sriprabha, Joel Suresh Emmanuel, and Sivaprakasam Mohanasankar. AutoDPS: an unsupervised diffusion model based method for multiple degradation removal in MRI. Computer Methods and Programs in Biomedicine, 263:108684, 2025.
  • [27] Liu Yang, Diao Jiameng, Zhou Zijian, Qi Haikun, and Hu Peng. Cardiac cine MRI motion correction using diffusion models. In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), pages 1–5, 2024.
  • [28] Sohl-Dickstein Jascha, Weiss Eric A., Maheswaranathan Niru, and Ganguli Surya. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, Volume 37, ICML'15, pages 2256–2265. JMLR.org, 2015.
  • [29] Yue Zongsheng, Wang Jianyi, and Loy Chen Change. Efficient diffusion model for image restoration by residual shifting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(1):116–130, 2025.
  • [30] Yue Zongsheng, Wang Jianyi, and Loy Chen Change. ResShift: efficient diffusion model for image super-resolution by residual shifting. In Oh A., Naumann T., Globerson A., Saenko K., Hardt M., and Levine S., editors, Advances in Neural Information Processing Systems, volume 36, pages 13294–13307. Curran Associates, Inc., 2023.
  • [31] Safari Mojtaba, Wang Shansong, Eidex Zach, Li Qiang, Middlebrooks Erik H., Yu David S., and Yang Xiaofeng. MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting, 2025.
  • [32] Liu Ze, Lin Yutong, Cao Yue, Hu Han, Wei Yixuan, Zhang Zheng, Lin Stephen, and Guo Baining. Swin transformer: hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, October 2021.
  • [33] Safari Mojtaba, Wang Shansong, Eidex Zach, Qiu Richard, Chang Chih-Wei, Yu David S., and Yang Xiaofeng. A physics-informed deep learning model for MRI brain motion correction, 2025.
  • [34] Liu Liyuan, Jiang Haoming, He Pengcheng, Chen Weizhu, Liu Xiaodong, Gao Jianfeng, and Han Jiawei. On the variance of the adaptive learning rate and beyond. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net, 2020.
  • [35] Loshchilov Ilya and Hutter Frank. SGDR: stochastic gradient descent with warm restarts. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
  • [36] Nárai Ádám, Hermann Petra, Auer Tibor, Kemenczky Péter, Szalma János, Homolya István, Somogyi Eszter, Vakli Pál, Weiss Béla, and Vidnyánszky Zoltán. Movement-related artefacts (MR-ART) dataset of matched motion-corrupted and clean structural MRI brain scans. Scientific Data, 9(1):630, Oct 2022.
  • [37] Jenkinson Mark, Bannister Peter, Brady Michael, and Smith Stephen. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2):825–841, 2002.
  • [38] Jenkinson Mark and Smith Stephen. A global optimisation method for robust affine registration of brain images. Medical Image Analysis, 5(2):143–156, 2001.
  • [39] Zhu Jun-Yan, Park Taesung, Isola Phillip, and Efros Alexei A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.
  • [40] Isola Phillip, Zhu Jun-Yan, Zhou Tinghui, and Efros Alexei A. Image-to-image translation with conditional adversarial networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, 2017.
  • [41] Pan Shaoyan, Wang Tonghe, Qiu Richard L J, Axente Marian, Chang Chih-Wei, Peng Junbo, Patel Ashish B, Shelton Joseph, Patel Sagar A, Roper Justin, and Yang Xiaofeng. 2D medical image synthesis using transformer-based denoising diffusion probabilistic model. Physics in Medicine & Biology, 68(10):105004, May 2023.
  • [42] Wang Zhou, Bovik A.C., Sheikh H.R., and Simoncelli E.P. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  • [43] Zhang Richard, Isola Phillip, Efros Alexei A., Shechtman Eli, and Wang Oliver. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [44] Eidex Zach, Wang Jing, Safari Mojtaba, Elder Eric, Wynne Jacob, Wang Tonghe, Shu Hui-Kuo, Mao Hui, and Yang Xiaofeng. High-resolution 3T to 7T ADC map synthesis with a hybrid CNN-transformer model. Medical Physics, 51(6):4380–4388, 2024.
  • [45] Kastryulin Sergey, Zakirov Jamil, Prokopenko Denis, and Dylov Dmitry V. PyTorch Image Quality: metrics for image quality assessment, 2022.
  • [46] Safari Mojtaba, Fatemi Ali, and Archambault Louis. MedFusionGAN: multimodal medical image fusion using an unsupervised deep generative adversarial network. BMC Medical Imaging, 23(1):203, Dec 2023.
  • [47] Zhang Richard, Isola Phillip, Efros Alexei A., Shechtman Eli, and Wang Oliver. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
  • [48] Liu Siyuan, Thung Kim-Han, Qu Liangqiong, Lin Weili, Shen Dinggang, and Yap Pew-Thian. Learning MRI artefact removal with unpaired data. Nature Machine Intelligence, 3(1):60–67, 2021.
  • [49] De Pietro Simona, Di Martino Giulia, Caroprese Mara, Barillaro Angela, Cocozza Sirio, Pacelli Roberto, Cuocolo Renato, Ugga Lorenzo, Briganti Francesco, Brunetti Arturo, Conson Manuel, and Elefante Andrea. The role of MRI in radiotherapy planning: a narrative review "from head to toe". Insights into Imaging, 15(1):255, Oct 2024.
