Abstract
Objective:
Magnetic resonance imaging (MRI) is essential in clinical and research contexts, providing exceptional soft-tissue contrast. However, prolonged acquisition times often lead to patient discomfort and motion artifacts. Diffusion-based deep learning super-resolution (SR) techniques reconstruct high-resolution (HR) images from low-resolution (LR) pairs, but they involve extensive sampling steps, limiting real-time application. To overcome these issues, this study introduces a residual error-shifting mechanism markedly reducing sampling steps while maintaining vital anatomical details, thereby accelerating MRI reconstruction.
Approach:
We developed Res-SRDiff, a novel diffusion-based SR framework incorporating residual error shifting into the forward diffusion process. This integration aligns the degraded HR and LR distributions, enabling efficient HR image reconstruction. We evaluated Res-SRDiff using ultra-high-field brain T1 MP2RAGE maps and T2-weighted prostate images, benchmarking it against Bicubic, Pix2pix, CycleGAN, SPSR, I2SR, and TM-DDPM methods. Quantitative assessments employed peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), gradient magnitude similarity deviation (GMSD), and learned perceptual image patch similarity (LPIPS). Additionally, we qualitatively and quantitatively assessed the proposed framework’s individual components through an ablation study and conducted a Likert-based image quality evaluation.
Main results:
Res-SRDiff surpassed most comparison methods in PSNR, SSIM, and GMSD on both datasets, with statistically significant improvements. The model achieved high-fidelity image reconstruction using only four sampling steps, reducing computation time to under one second per slice. In contrast, conventional diffusion-based methods such as TM-DDPM and I2SB required approximately 20 and 38 seconds per slice, respectively. Qualitative analysis showed that Res-SRDiff effectively preserved fine anatomical details and lesion morphologies. In the Likert study, our method received the highest scores, 4.14 ± 0.77 (brain) and 4.80 ± 0.40 (prostate).
Significance:
Res-SRDiff demonstrates efficiency and accuracy, markedly improving computational speed and image quality. Incorporating residual error shifting into diffusion-based SR facilitates rapid, robust HR image reconstruction, enhancing clinical MRI workflow and advancing medical imaging research. Code available at https://github.com/mosaf/Res-SRDiff
Keywords: Super-resolution, MRI, Deep learning, Reconstruction, Diffusion model, Brain T1 map, Ultra-high field MRI
1. Introduction
Magnetic resonance imaging (MRI) is an indispensable tool in both clinical practice and research, providing detailed anatomical and functional images. Quantitative techniques, such as 3D magnetization-prepared 2 rapid acquisition gradient echo (MP2RAGE) T1-maps, offer robust imaging free from reception bias and first-order transmit field inhomogeneities, thereby enabling precise diagnosis and treatment planning [1–3]. For example, T1-maps are employed to identify hypoxic regions that can inform adaptive dose-painting radiation therapy [4–6]. Moreover, in addition to these quantitative methods, T2-weighted (T2w) MRI provides enhanced tissue contrast, rendering it a critical imaging modality for prostate cancer treatment by delineating tumor boundaries and guiding therapeutic decisions [7]. Nevertheless, the lengthy acquisition times associated with both T1-mapping and T2w imaging may induce patient discomfort and elevate the risk of motion artifacts [8], thereby potentially compromising image quality and diagnostic accuracy.
To accelerate MRI acquisition, super-resolution (SR) studies have aimed to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts [9]. Conventional SR models, a subcategory of the broader field of image restoration, employ a maximum a posteriori framework, a Bayesian paradigm consisting of a likelihood (loss) function and a prior (regularization) term, to resolve the ill-posed SR task. The likelihood term presupposes an underlying noise distribution, yielding $\ell_2$ and $\ell_1$ losses for Gaussian and Laplacian noise assumptions, respectively. Typical regularizers include Tikhonov [10], non-local similarity [11], wavelet [12], and total variation [13].
Deep learning algorithms, particularly generative deep learning models, consistently outperform traditional methods in medical imaging tasks such as reconstruction [14, 15] and denoising [16]. Despite the impressive visual fidelity achieved by generative adversarial networks (GANs), they often face challenges such as mode collapse and unstable training [17, 18], which might undermine their reliability in practical and clinical settings. For example, Cheng et al. introduced a structure-preserving SR (SPSR) method that incorporates gradient guidance into the SR process [19]. This approach highlights the importance of high-fidelity gradient maps in preserving geometric consistency and mitigating structural distortions, which are prevalent challenges in GAN-based super-resolution techniques.
Recently, diffusion models have emerged as a compelling alternative to address these limitations. These models have demonstrated considerable success in MRI-related applications, such as reconstruction [20], denoising [21], synthesis [22], and super-resolution [23, 24]. The operational framework of diffusion models involves a forward process that gradually diffuses data towards a prior distribution, typically modeled as a multivariate standard Gaussian $\mathcal{N}(\mathbf{0}, \mathbf{I})$, followed by a reverse process in which a neural network (NN) is trained to approximate the inverse trajectory across numerous sampling steps. However, two significant drawbacks of the standard paradigm persist for image SR: first, the iterative nature of the denoising process renders the generation of HR images from LR inputs computationally demanding; second, the reliance on pure Gaussian noise for initializing the reverse process is inherently better suited to image generation than to restoration tasks. Yue et al. [25] highlighted these inefficiencies, and subsequent research has demonstrated that centering the initial reconstruction distribution around the LR image, by adjusting residual errors over the sampling steps, can significantly enhance sampling efficiency [26].
Recent advancements in diffusion models have significantly reduced the required sampling steps, enhancing their efficiency. For example, a simulation-free Image-to-Image Schrödinger Bridge framework [27] employs a nonlinear diffusion bridge that directly utilizes degraded image information to guide restoration, resulting in more interpretable generative pathways. In parallel, partial diffusion models [28] have been introduced for MRI applications, leveraging latent convergence observations between LR and HR images. This latent alignment strategy effectively bypasses redundant denoising steps, markedly decreasing computational load. Additionally, a method that distills the stochastic diffusion process into a single deterministic generation step [29] has been proposed, achieving substantial acceleration without sacrificing perceptual quality.
In this study, we present an efficient diffusion model named Res-SRDiff, which leverages the residual error shift between HR and LR image pairs to reconstruct HR axial T2w prostate images and quantitative brain MRI T1 MP2RAGE maps obtained from ultra-high B0 fields, extending the work presented in [25, 26]. To our knowledge, this is the first investigation aimed at recovering HR MRI using an efficient diffusion model that requires only four sampling steps, in contrast to the thousands required by conventional diffusion models. This considerable reduction in sampling steps substantially enhances computational efficiency while maintaining the high quality of the restored HR images.
The contributions of this work are:
- Formulating an efficient diffusion model specifically tailored for the SR task, enabling inference in only four sampling steps.
- Utilizing a U-net architecture in which a Swin Transformer block replaces the traditional attention layer, to improve generalization across varying image resolutions.
- Conducting extensive evaluations using publicly available axial T2w prostate image datasets and institutionally acquired ultra-high-field (7T) T1 MP2RAGE brain MRI maps.
- Demonstrating, for the first time, the application of an efficient diffusion model to reconstruct HR axial T2w pelvic images and ultra-high-field brain T1 maps from LR image pairs.
2. Materials and Methods
In this section, we first review the traditional Denoising Diffusion Probabilistic Model (DDPM). Next, we introduce our proposed method, Res-SRDiff, which is designed to recover HR images from their LR counterparts. We assume that the HR and LR images have the same spatial size, an assumption that can be readily satisfied by pre-upsampling the LR images using nearest-neighbor interpolation.
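The pre-upsampling assumption can be illustrated with a minimal NumPy sketch. The function name `upsample_nearest` and the integer `factor` argument are illustrative choices, not part of the original implementation, and only integer scale factors are handled here.

```python
import numpy as np


def upsample_nearest(lr: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling of a 2D slice by an integer factor.

    Every LR pixel is repeated `factor` times along both axes, so the
    output lies on the same grid size as the HR image.
    """
    rows_repeated = np.repeat(lr, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)


# Toy 2 x 2 "LR slice" brought onto an 8 x 8 grid (4-fold per axis).
lr_slice = np.array([[1.0, 2.0],
                     [3.0, 4.0]])
lr_on_hr_grid = upsample_nearest(lr_slice, factor=4)
print(lr_on_hr_grid.shape)
```

Because the values are only repeated, no new intensities are introduced; the network is then responsible for restoring the missing high-frequency detail.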
2.1. DDPM
The DDPM was initially inspired by non-equilibrium thermodynamics [30], aiming to approximate a complex data distribution with a tractable distribution, such as a standard Gaussian distribution. It was later enhanced by integrating stochastic differential equations and denoising score matching [31, 32]. The DDPM comprises two diffusion processes: a forward process and a reverse process. The forward process degrades the input image into noise following a standard Gaussian distribution over numerous steps. The reverse process trains an NN to approximate the sampling trajectory required to recover the input image from Gaussian noise over an equally large number of steps, which diminishes the sampling efficiency of the DDPM.
2.2. Problem formulation
Res-SRDiff is built upon a Markov chain, similar to the conventional DDPM. However, it aims to degrade the input HR image $x_0^{HR}$ into an image $x_T^{HR}$ over $T$ steps such that the resulting distribution approximates that of the LR image $x^{LR}$ rather than converging to $\mathcal{N}(\mathbf{0}, \mathbf{I})$. This is achieved by introducing the residual $e_0 = x^{LR} - x_0^{HR}$, which is used to shift $x_0^{HR}$ toward $x^{LR}$ over the $T$ steps. This process is illustrated in Figure 1.
Figure 1:
Illustration of the diffusion process. In the forward diffusion process of Res-SRDiff, an HR image $x_0^{HR}$ is progressively shifted to match the LR distribution. The model introduces a residual error $e_0$, which drives $x_0^{HR}$ through $T$ Markov steps until $x_T^{HR}$ approximates the LR image $x^{LR}$, rather than converging to a standard Gaussian distribution. The reverse process trains an NN to estimate the posterior distribution given in (5).
Forward process.
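To simulate the forward diffusion process, a monotonically increasing shifting sequence $\{\eta_t\}_{t=1}^{T}$ over the time steps, with bounding conditions $\eta_1 \to 0$ and $\eta_T \to 1$, is used. The transition kernel for simulating the forward diffusion process is given in (1), which is constructed based on the Markov chain and the residual error shift sequence (see Figure 1):

$$q\left(x_t^{HR} \mid x_{t-1}^{HR}, x^{LR}\right) = \mathcal{N}\left(x_t^{HR};\; x_{t-1}^{HR} + \alpha_t e_0,\; \kappa^2 \alpha_t \mathbf{I}\right), \quad t = 1, \dots, T \tag{1}$$

where $\alpha_1 = \eta_1$ and $\alpha_t = \eta_t - \eta_{t-1}$ for $t > 1$, and $\kappa$ is a hyper-parameter introduced to improve the flexibility of the forward diffusion process. Considering the Markov chain, we can compute the image at step $t$ from the image at step $t-1$ using the reparameterization trick as follows:

$$x_t^{HR} = x_{t-1}^{HR} + \alpha_t e_0 + \kappa \sqrt{\alpha_t}\, \epsilon_t \tag{2}$$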
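where $\epsilon_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Since sampling the forward process step by step using (2) increases the computational burden, it is also possible to compute the image at step $t$ directly from the noise-free input image $x_0$ as follows:

$$
\begin{aligned}
x_t &= x_{t-1} + \alpha_t e_0 + \kappa \sqrt{\alpha_t}\, \epsilon_t && \text{(3a)} \\
&= x_{t-2} + (\alpha_{t-1} + \alpha_t)\, e_0 + \kappa \left( \sqrt{\alpha_{t-1}}\, \epsilon_{t-1} + \sqrt{\alpha_t}\, \epsilon_t \right) && \text{(3b)} \\
&= \cdots && \text{(3c)} \\
&= x_0 + \sum_{i=1}^{t} \alpha_i\, e_0 + \kappa \sum_{i=1}^{t} \sqrt{\alpha_i}\, \epsilon_i && \text{(3d)}
\end{aligned}
$$

Here, we omit the superscript HR for brevity. In (3d), the summed coefficient of the second term (the mean shift), $\sum_{i=1}^{t} \alpha_i$, and the summed squared coefficients of the third term (the variance), $\sum_{i=1}^{t} (\sqrt{\alpha_i})^2$, are both equal to $\eta_t$. Thus, the marginal distribution at any time step $t$ can be computed analytically as follows:

$$q\left(x_t \mid x_0, x^{LR}\right) = \mathcal{N}\left(x_t;\; x_0 + \eta_t e_0,\; \kappa^2 \eta_t \mathbf{I}\right) \tag{4}$$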
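The closed-form marginal can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the residual-shifting form used in [26], namely a mean of `x0 + eta_t * e0` and a standard deviation of `kappa * sqrt(eta_t)`; the names `forward_marginal`, `eta_t`, and `kappa`, and the toy arrays, are assumptions made for the example rather than the authors' code.

```python
import numpy as np


def forward_marginal(x0, x_lr, eta_t, kappa, rng):
    """Draw x_t directly from the HR image x0 (no step-by-step simulation).

    Assumed form: x_t = x0 + eta_t * e0 + kappa * sqrt(eta_t) * noise,
    where e0 = x_lr - x0 is the residual between the LR and HR images.
    """
    e0 = x_lr - x0
    noise = rng.standard_normal(x0.shape)
    return x0 + eta_t * e0 + kappa * np.sqrt(eta_t) * noise


rng = np.random.default_rng(0)
x0 = rng.random((8, 8))    # toy stand-in for an HR slice
x_lr = rng.random((8, 8))  # toy stand-in for the pre-upsampled LR slice

# eta_t = 0 returns the HR image; eta_t = 1 with kappa = 0 returns the LR image.
x_at_start = forward_marginal(x0, x_lr, eta_t=0.0, kappa=2.0, rng=rng)
x_at_end_noiseless = forward_marginal(x0, x_lr, eta_t=1.0, kappa=0.0, rng=rng)
```

The two limiting cases show why the shifting sequence is bounded near 0 and 1: the chain starts at the HR image and ends at a noisy version of the LR image rather than at pure Gaussian noise.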
Reverse process.
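The reverse process trains an NN to estimate the posterior distribution $p(x_0 \mid x^{LR})$, as follows [33]:

$$p\left(x_0 \mid x^{LR}\right) = \int p\left(x_T \mid x^{LR}\right) \prod_{t=1}^{T} p_\theta\left(x_{t-1} \mid x_t, x^{LR}\right) \mathrm{d}x_{1:T} \tag{5}$$

where $p(x_T \mid x^{LR}) \approx \mathcal{N}(x_T; x^{LR}, \kappa^2 \mathbf{I})$ and $p_\theta(x_{t-1} \mid x_t, x^{LR})$ is a reverse transition kernel that aims to learn $x_{t-1}$ from $x_t$ by training a network with parameters $\theta$. Similar to conventional diffusion models [31–33], it can be written as follows by adopting the Gaussian assumption:

$$p_\theta\left(x_{t-1} \mid x_t, x^{LR}\right) = \mathcal{N}\left(x_{t-1};\; \mu_\theta(x_t, x^{LR}, t),\; \Sigma_\theta(x_t, x^{LR}, t)\right) \tag{6}$$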
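where the optimum parameter $\theta$ is achieved by minimizing the Kullback-Leibler (KL) divergence between the forward and reverse kernels summed over all time steps as follows [33]:

$$\min_\theta \sum_t D_{\mathrm{KL}}\left[ q\left(x_{t-1} \mid x_t, x_0, x^{LR}\right) \,\big\|\, p_\theta\left(x_{t-1} \mid x_t, x^{LR}\right) \right] \tag{7}$$

The target distribution $q(x_{t-1} \mid x_t, x_0, x^{LR})$ can be computed using (1) and (4), along with the Markov chain assumption, which states $q(x_t \mid x_{t-1}, x_0, x^{LR}) = q(x_t \mid x_{t-1}, x^{LR})$, as follows:

$$q\left(x_{t-1} \mid x_t, x_0, x^{LR}\right) = \frac{q\left(x_t \mid x_{t-1}, x^{LR}\right)\, q\left(x_{t-1} \mid x_0, x^{LR}\right)}{q\left(x_t \mid x_0, x^{LR}\right)} \tag{8}$$

The multiplication of two Gaussian distributions yields another Gaussian distribution that can be computed tractably [34] as follows:

$$q\left(x_{t-1} \mid x_t, x_0, x^{LR}\right) = \mathcal{N}\left(x_{t-1};\; \frac{\eta_{t-1}}{\eta_t} x_t + \frac{\alpha_t}{\eta_t} x_0,\; \kappa^2 \frac{\eta_{t-1}}{\eta_t} \alpha_t \mathbf{I}\right) \tag{9}$$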
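By assuming that the forward and backward covariance matrices are similar, $\Sigma_\theta(x_t, x^{LR}, t) = \kappa^2 \frac{\eta_{t-1}}{\eta_t} \alpha_t \mathbf{I}$, the KL divergence given in (7) simplifies to:

$$\min_\theta \sum_t \frac{1}{2 \kappa^2 \frac{\eta_{t-1}}{\eta_t} \alpha_t} \left\| \tilde{\mu}_t(x_t, x_0) - \mu_\theta(x_t, x^{LR}, t) \right\|_2^2 \tag{10}$$

where $\tilde{\mu}_t(x_t, x_0)$ denotes the mean of the target distribution in (9). The mean parameter $\mu_\theta$ is parameterized as follows:

$$\mu_\theta\left(x_t, x^{LR}, t\right) = \frac{\eta_{t-1}}{\eta_t} x_t + \frac{\alpha_t}{\eta_t} f_\theta\left(x_t, x^{LR}, t\right) \tag{11}$$

After substituting it into (10), the final loss function is achieved as follows:

$$\min_\theta \sum_t \left\| f_\theta\left(x_t, x^{LR}, t\right) - x_0 \right\|_2^2 \tag{12}$$

The constant weights $\alpha_t / (2 \kappa^2 \eta_t \eta_{t-1})$ that arise from the substitution were dropped, as experiments demonstrated that this improves the model’s performance [26, 32]. In addition to the data fidelity loss, a learned perceptual image patch similarity (LPIPS) loss [35] was employed. The overall optimization function is given by:

$$\mathcal{L} = \left\| f_\theta\left(x_t, x^{LR}, t\right) - x_0 \right\|_2^2 + \lambda\, \mathcal{L}_{\mathrm{LPIPS}}\left( f_\theta\left(x_t, x^{LR}, t\right), x_0 \right) \tag{13}$$

where $\lambda$ is a hyper-parameter controlling the relative importance of the perceptual term, which we set to 10 in this study. The training and sampling pseudo-codes are provided in Algorithms 1 and 2.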
Algorithm 1.
Training process
Algorithm 2.
Sampling process
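| Require: Low-resolution image $x^{LR}$, trained network $f_\theta$ | |
| Sample $x_T \sim \mathcal{N}(x^{LR}, \kappa^2 \eta_T \mathbf{I})$ | |
| for $t = T, \dots, 1$ do | |
| $\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ if $t > 1$ else $\epsilon = \mathbf{0}$ | |
| $x_{t-1} = \mu_\theta(x_t, x^{LR}, t) + \kappa \sqrt{\eta_{t-1} \alpha_t / \eta_t}\; \epsilon$ ▷ $\mu_\theta$ given in (11) | |
| end for |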
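The sampling loop can be sketched as follows. This is a minimal NumPy illustration under the residual-shifting posterior of [26] (mean `(eta_prev / eta_t) * x_t + (alpha_t / eta_t) * x0_hat`, variance `kappa**2 * eta_prev * alpha_t / eta_t`, with `eta_0` taken as 0). The function `sample`, the callable `predict_x0` standing in for the trained network, and the four example `etas` values are all hypothetical.

```python
import numpy as np


def sample(x_lr, etas, kappa, predict_x0, rng):
    """Reverse sampling sketch; `etas` holds the increasing values eta_1..eta_T."""
    etas = np.asarray(etas, dtype=float)
    etas_prev = np.concatenate(([0.0], etas[:-1]))  # eta_0 := 0 by convention
    num_steps = len(etas)

    # Start from a noisy version of the LR image instead of pure noise.
    x = x_lr + kappa * np.sqrt(etas[-1]) * rng.standard_normal(x_lr.shape)

    for t in range(num_steps, 0, -1):
        eta_t, eta_prev = etas[t - 1], etas_prev[t - 1]
        alpha_t = eta_t - eta_prev
        x0_hat = predict_x0(x, x_lr, t)  # network estimate of the HR image
        mean = (eta_prev / eta_t) * x + (alpha_t / eta_t) * x0_hat
        if t > 1:
            std = kappa * np.sqrt(eta_prev * alpha_t / eta_t)
            x = mean + std * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Toy check with an "oracle" predictor that always returns a fixed target.
target = np.full((4, 4), 0.5)


def oracle(x_t, x_lr, t):
    return target


restored = sample(np.zeros((4, 4)), etas=[0.01, 0.2, 0.6, 0.999], kappa=2.0,
                  predict_x0=oracle, rng=np.random.default_rng(0))
```

At the last step the weight on `x_t` is zero, so the output equals the network's final estimate of the HR image; this is why the quality of a four-step sampler rests on the accuracy of that estimate.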
2.3. Noise scheduler
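This study utilizes a hyper-parameter $\kappa$ and a noise scheduler $\eta_t$ in the forward diffusion process. Given that $\eta_t$ and the scaling factor $\kappa$ in (4) control the forward process, and it has been shown that an NN can approximate the forward diffusion trajectory [30, 32], $\kappa \sqrt{\eta_1}$ needs to be small, such as 0.04, which ensures that $q(x_1 \mid x_0, x^{LR}) \approx q(x_0)$. Thus, we set $\kappa$ and $\eta_1$ to satisfy the first bounding condition, $\eta_1 \to 0$ (see Figure 1), and $\eta_T$ to satisfy the second bounding condition, $\eta_T \to 1$. We employed the non-uniform geometric noise scheduler proposed by [26] for $\sqrt{\eta_t}$ as follows:

$$\sqrt{\eta_t} = \sqrt{\eta_1} \times b_0^{\beta_t}, \quad \beta_t = \left( \frac{t-1}{T-1} \right)^{p} (T-1), \quad b_0 = \exp\left[ \frac{1}{2(T-1)} \log \frac{\eta_T}{\eta_1} \right], \quad t = 2, \dots, T-1 \tag{14}$$

where the hyper-parameter $p$ controls the growth rate of $\sqrt{\eta_t}$, as shown in Figure 2. We used a value of $p$ similar to that of a recent study [26]. Furthermore, we used 15 steps for training and four steps for sampling.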
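A geometric schedule of this kind can be sketched with the standard library alone. The formula follows the scheduler described in [26]; the function name `geometric_eta_schedule` and the numeric values `eta_1 = 1e-4`, `eta_T = 0.999`, and `p = 0.3` are illustrative assumptions, since only the 15 training steps are stated in the text.

```python
import math


def geometric_eta_schedule(num_steps, eta_1, eta_T, p):
    """Return [eta_1, ..., eta_T] with sqrt(eta_t) = sqrt(eta_1) * b0 ** beta_t."""
    b0 = math.exp(math.log(eta_T / eta_1) / (2 * (num_steps - 1)))
    etas = []
    for t in range(1, num_steps + 1):
        # beta_t runs from 0 (t = 1) to num_steps - 1 (t = T); p bends the curve.
        beta_t = ((t - 1) / (num_steps - 1)) ** p * (num_steps - 1)
        sqrt_eta_t = math.sqrt(eta_1) * b0 ** beta_t
        etas.append(sqrt_eta_t ** 2)
    return etas


# Assumed example values; only the number of training steps comes from the text.
etas = geometric_eta_schedule(num_steps=15, eta_1=1e-4, eta_T=0.999, p=0.3)
faster_growth = geometric_eta_schedule(num_steps=15, eta_1=1e-4, eta_T=0.999, p=0.1)
```

The endpoints are fixed by `eta_1` and `eta_T`, so `p` only changes how quickly the curve rises between them: a smaller `p` gives larger intermediate values and therefore more noise at a given step.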
Figure 2:

Residual shift denoising diffusion process. (a) shows the HR image, $x_0$; (b) displays the corresponding LR image, $x^{LR}$; and (c) illustrates the residual error, $e_0$. (d) presents the evolution of the noise scaling factor, $\sqrt{\eta_t}$, as a function of the diffusion time step, $t$. Panels (e)-(h) demonstrate the forward diffusion process driven by the residual error shift for different hyper-parameter sets.
Figure 2 comprises several panels demonstrating the effects of the hyper-parameters $p$ and $\kappa$ on the forward diffusion process. Specifically, panels (a)-(c) show the HR image, the LR image, and the residual error, respectively. Panel (d) illustrates how $\sqrt{\eta_t}$ varies with the time step $t$ for different values of $p$. Finally, panels (e)-(h) depict representative outputs of the forward diffusion process at selected time steps $t$, enabling a visual comparison of noise levels under varying $p$ and $\kappa$.
Comparing panels (e) with (g) and (f) with (h), one observes that, for a fixed time step $t$ and fixed $\kappa$, reducing $p$ increases the amount of noise in the diffused images. This outcome aligns with panel (d), where a lower $p$ corresponds to a higher $\sqrt{\eta_t}$, implying more additive noise according to (4). Conversely, keeping $t$ and $p$ fixed but increasing $\kappa$ also produces noisier images, as evidenced by comparing panels (e) and (f), and similarly (g) and (h).
The Res-SRDiff model was implemented using PyTorch (version 2.5.1) and executed on NVIDIA A100 GPUs. The model was trained for 182,000 and 131,000 steps on the brain and prostate datasets, respectively, using a batch size of 16. The network was optimized with the Rectified Adam (RAdam) optimizer [36] and employed a cosine annealing learning rate scheduler [37]. The initial learning rate was set in the range of $2 \times 10^{-5}$ to $5 \times 10^{-5}$, and it was adjusted according to a cosine decay schedule throughout training. A warm-up phase of 5,000 steps was applied before transitioning to the cosine decay schedule to stabilize early training dynamics.
2.4. Patient data acquisition and data preprocessing
We used institutional ultra-high 7T brain T1 MP2RAGE maps [38] and publicly available axial T2w prostate cancer data [39] to train and evaluate the proposed method.
Our institutional dataset comprises 142 patients with a confirmed diagnosis of multiple sclerosis, who were divided into two non-overlapping sets: a training set (121 cases, 14,566 slices) and a test set (21 cases, 2,552 slices). This retrospective study was approved by the Mayo Clinic IRB. The institutional data were acquired using a 7 T Siemens MAGNETOM Terra with an 8-channel transmit/32-channel receive head coil and the following key imaging parameters: TR = 4.5 s, TE = 2.2 ms, TI1/TI2 = 0.95/2.5 s, FA1/FA2 = 6°/4°, FOV = 230 × 230 mm², matrix size of 288 × 288, a resolution of 0.8 × 0.8 × 0.8 mm³, and a total acquisition time of 8:44 min. FSL BET [40] was used to extract the brain mask from the first-inversion image, which was subsequently applied to the T1 maps to remove the noisy background and skull. The T1 maps were down-sampled by a factor of 4³ = 64 in voxel volume, resulting in a voxel size of 3.2 × 3.2 × 3.2 mm³ (a 4-fold reduction in each direction).
We randomly selected data from 334 patients in the public prostate dataset, which were split into two non-overlapping sets: a training set (268 patients, 10,480 slices) and an evaluation set (66 patients, 2,668 slices). The T2w MR images were acquired using a 1.5 T Siemens scanner with the following parameters: TR = 2.2 s, TE = 202 ms, FA = 110°, matrix size of 256 × 256, an in-plane resolution of 0.66 × 0.66 mm², and a slice thickness of 1.5 mm. The T2w MR images were down-sampled by a factor of 18 in voxel volume, yielding a voxel size of 2 × 2 × 3 mm³ (a 9-fold reduction in-plane and a 2-fold reduction along the slice axis).
Down-sampling of the ultra-high-field brain T1 maps and the axial T2w prostate images was performed in image space using the Resample function of the SimpleITK package (version 2.1.1) [41].
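The stated down-sampling factors follow directly from the voxel sizes reported above, which the short sketch below verifies. The helper `volume_factor` is a hypothetical name introduced for this check; the voxel sizes are the ones given in this section.

```python
def volume_factor(hr_voxel_mm, lr_voxel_mm):
    """Per-axis and total (volumetric) down-sampling factors from voxel sizes."""
    per_axis = [lr / hr for hr, lr in zip(hr_voxel_mm, lr_voxel_mm)]
    total = 1.0
    for factor in per_axis:
        total *= factor
    return per_axis, total


# Brain T1 maps: 0.8 mm isotropic -> 3.2 mm isotropic.
brain_axes, brain_total = volume_factor((0.8, 0.8, 0.8), (3.2, 3.2, 3.2))

# Prostate T2w: 0.66 x 0.66 x 1.5 mm -> 2 x 2 x 3 mm.
prostate_axes, prostate_total = volume_factor((0.66, 0.66, 1.5), (2.0, 2.0, 3.0))

print(round(brain_total), round(prostate_total))
```

The brain factor is 4 per axis (64 in volume), while the prostate factor is roughly 3 per in-plane axis and 2 along the slice axis, which rounds to the reported factor of 18.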
2.5. Quantitative and statistical analysis
We evaluated our method against six benchmark approaches: Bicubic, Pix2pix [42], CycleGAN [43], SPSR [19], I2SB [27], and TM-DDPM, which is a conventional DDPM with a vision transformer backbone [21]. All methods were trained for the same number of steps and with similar training parameters, except that the TM-DDPM and I2SB models had approximately three times as many trainable parameters.
The reconstructed HR image quality was quantitatively evaluated using four metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM)[44], gradient magnitude similarity deviation (GMSD)[45], and LPIPS [46]. Higher SSIM and PSNR values, and lower GMSD and LPIPS values, indicate better image restoration performance. PSNR quantifies the residual error between the restored and ground truth images, and its logarithmic scale aligns better with human perceptual judgments [47]. Furthermore, SSIM, GMSD, and LPIPS provide measures of the structural similarity between the restored images and the HR ground truth images.
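As a concrete reference for the first metric, PSNR can be computed from the mean squared error as in the sketch below. The function name `psnr` and the `data_range` argument (the maximum possible intensity span, assumed to be 1.0 for normalized images) are illustrative; the paper does not state how intensities were scaled before evaluation.

```python
import numpy as np


def psnr(reference, restored, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
ground_truth = np.zeros((16, 16))
noisy = ground_truth + 0.1
example_psnr = psnr(ground_truth, noisy)
```

Because PSNR depends only on the pixel-wise error, it is complemented here by SSIM, GMSD, and LPIPS, which respond to structural and perceptual differences.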
Two statistical tests were employed to assess the significance of differences: a one-way analysis of variance (ANOVA) and Tukey’s honestly significant difference (HSD) test. Prior to these analyses, the Shapiro-Wilk test was conducted to evaluate the normality of the residuals. When the normality assumption was not satisfied, non-parametric methods were used, specifically the Kruskal-Wallis test followed by Dunn’s test with Bonferroni correction for multiple comparisons. The ANOVA tested the null hypothesis that the mean values for each method are equal, while the Kruskal-Wallis test assessed whether the distributions of the groups differed significantly. Tukey’s HSD and Dunn’s test with Bonferroni correction were then used to identify which specific pairs of groups differed significantly. For all analyses, the significance level was set at p < 0.05.
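The choice between the parametric and non-parametric omnibus tests can be sketched with SciPy. This is a minimal illustration, not the authors' analysis script: the function `compare_methods` and the toy data are hypothetical, and the post-hoc step (Tukey's HSD or Dunn's test) is omitted, since Dunn's test is not provided by SciPy itself.

```python
import numpy as np
from scipy import stats


def compare_methods(groups, alpha=0.05):
    """Pick ANOVA or Kruskal-Wallis based on Shapiro-Wilk normality of residuals.

    `groups` maps a method name to its per-slice metric values.
    Returns the name of the omnibus test that was used and its p-value.
    """
    samples = [np.asarray(values, dtype=float) for values in groups.values()]
    # Residuals: each observation minus its own group mean, pooled together.
    residuals = np.concatenate([s - s.mean() for s in samples])
    residuals_look_normal = stats.shapiro(residuals).pvalue >= alpha
    if residuals_look_normal:
        return "anova", stats.f_oneway(*samples).pvalue
    return "kruskal-wallis", stats.kruskal(*samples).pvalue


# Toy data with strong outliers, so the normality check is expected to fail.
toy_metrics = {
    "method_a": [0.0] * 9 + [10.0],
    "method_b": [5.0] * 9 + [15.0],
}
test_used, p_value = compare_methods(toy_metrics)
print(test_used, p_value < 0.05)
```

A significant omnibus result only indicates that at least one method differs; the pairwise post-hoc comparisons described above are what identify which methods differ from each other.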
2.6. Subjective Image-Quality Evaluation (Likert Ratings)
A subjective image-quality evaluation was performed by two experienced medical physicists certified by the American Board of Radiology. Each rater independently reviewed SR images and assigned scores on a 5-point Likert scale (1 = Poor, 2 = Fair, 3 = Acceptable, 4 = Good, 5 = Excellent) with respect to three criteria: contrast, edge sharpness, and preservation of anatomical detail.
For every reconstruction method, 25 images were randomly selected from the brain dataset and 25 from the prostate dataset, yielding 50 images per method and 300 images in total. Mean ± standard deviation scores are reported; larger values denote superior perceived quality. The 95% confidence intervals (CIs) for the mean values were computed using the percentile bootstrap method with 10,000 iterations, along with the bias-corrected and accelerated (BCa) bootstrap approach [48]. Statistical comparisons between methods were performed using the framework detailed in Section 2.5, and the resulting p-values are provided alongside the descriptive statistics.
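The percentile bootstrap for a mean score can be sketched as below. The function `bootstrap_mean_ci` and the ten example ratings are hypothetical; the sketch covers only the plain percentile interval and does not implement the bias-corrected and accelerated adjustment.

```python
import numpy as np


def bootstrap_mean_ci(scores, n_boot=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Resample with replacement: one row of indices per bootstrap iteration.
    indices = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    boot_means = scores[indices].mean(axis=1)
    tail = (1.0 - level) / 2.0 * 100.0
    ci_low, ci_high = np.percentile(boot_means, [tail, 100.0 - tail])
    return scores.mean(), ci_low, ci_high


# Hypothetical 5-point Likert ratings for one method.
example_scores = [3, 4, 4, 5, 5, 4, 3, 5, 4, 4]
mean_score, ci_low, ci_high = bootstrap_mean_ci(example_scores)
print(f"{mean_score:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

Resampling avoids any normality assumption, which suits ordinal Likert ratings whose distribution is bounded and often skewed.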
3. Results
3.1. Brain T1 maps
The proposed method demonstrated superior performance, exhibiting lower residual errors and higher structural similarity, as illustrated in Figure 3 (first row). Detailed comparisons provided by the zoomed-in panels (second row of Figure 3), indicated by white and red arrows, reveal that our method effectively captured fine structural details more accurately than baseline methods. However, despite successfully reconstructing these highlighted fine details, our method, like the comparative approaches, was unable to fully reconstruct very subtle structures diminished by partial volume effects, as indicated by the pink arrow in Figure 3. The difference maps shown in the third row of Figure 3 further confirm the lower global discrepancies between our method’s outputs and the HR ground truth images. These visual observations align well with quantitative findings, where our proposed method yielded higher PSNR values and lower GMSD and LPIPS scores (refer to Table 1). The reduced global residual errors depicted in the third row of Figure 3 additionally suggest a greater consistency between the reconstructed outputs of our method and the ground truth images.
Figure 3:
Qualitative results of the ultra-high field brain T1 MP2RAGE maps. The first row shows the ground truth image along with the restored outputs from our proposed Res-SRDiff and comparative models. The second row displays the zoomed-in regions corresponding to the dashed red and brown boxes. The white and red arrows highlight regions where our method outperforms the comparative models. The last row presents the difference map between the restored images and the ground truth.
Table 1:
Quantitative comparison of super-resolution models on two datasets: Axial T2w pelvic MRI and 7T brain T1 MP2RAGE maps. Results are presented as mean ± standard deviation for our proposed Res-SRDiff and comparative models. Bold values highlight the best-performing results, while underlined values indicate the second-best performance. Arrows indicate the direction of better results.
| | Pelvic T2w MRI | | | | 7T brain T1 MP2RAGE map | | | |
|---|---|---|---|---|---|---|---|---|
| Models | PSNR [dB] ↑ | SSIM [−] ↑ | GMSD [−] ↓ | LPIPS [−] ↓ | PSNR [dB] ↑ | SSIM [−] ↑ | GMSD [−] ↓ | LPIPS [−] ↓ |
| Bicubic | 25.47±2.61 | 0.75±0.06 * | 0.10±0.02 | 0.69±0.15 | 22.00±1.37 | 0.31±0.16 | 0.12±0.02 | 0.38±0.07 |
| CycleGAN | 25.84±1.96 | 0.73±0.05 | 0.10±0.01 | 0.45±0.10 | 21.89±1.09 | 0.86±0.02 | 0.12±0.02 | 0.21±0.05 |
| Pix2pix | 24.83±2.09 | 0.66±0.05 | 0.11±0.01 | 0.20±0.05 * | 24.63±1.32 | 0.90±0.03 | 0.10±0.02 | 0.09±0.04 * |
| TM-DDPM | 25.12±4.46 | 0.73±0.16 | 0.13±0.04 | 0.51±0.49 | 23.22±5.02 | 0.85±0.13 | 0.12±0.05 | 0.25±0.10 |
| SPSR | 24.74±1.96 | 0.68±0.07 | 0.11±0.01 | 0.20±0.09 * | 24.76±1.12 | 0.93±0.02 | 0.10±0.01 | 0.08±0.02 |
| I2SB | 24.74±1.60 | 0.70±0.04 | 0.14±0.01 | 0.33±0.13 | 23.22±0.98 | 0.84±0.04 | 0.12±0.01 | 0.15±0.03 |
| Res-SRDiff | 27.72±2.26 | 0.75±0.05 | 0.08±0.02 | 0.21±0.11 | 26.28±1.41 | 0.92±0.03 | 0.07±0.02 | 0.08±0.02 |

\* denotes results that are not statistically significant based on the multi-comparison test (p-value > 0.05).
In terms of computational efficiency, the proposed method achieved an average evaluation time of 0.46 ± 0.21 seconds per slice, which was substantially faster than I2SB (31.15 ± 0.27 seconds per slice) and TM-DDPM (66.84 ± 27.72 seconds per slice). Because the quantitative metrics did not pass the Shapiro-Wilk normality test, non-parametric analyses, specifically the Kruskal-Wallis test and Dunn’s post-hoc test, were conducted. The Kruskal-Wallis test indicated statistically significant differences among the evaluated methods across all quantitative metrics. On average, the proposed method outperformed the alternative approaches in PSNR, GMSD, and LPIPS with statistically significant improvements, except for LPIPS, where the difference from the Pix2pix method was not statistically significant (p = 0.08).
The Shapiro-Wilk test indicated that the Likert scores were not normally distributed for any of the evaluated methods. Our proposed method achieved an average score of 4.14 ± 0.77 (95% CI 3.90, 4.31), higher than that of the second-best method, TM-DDPM, which had an average score of 3.51 ± 0.75 (95% CI 3.29, 3.71); however, this difference was not statistically significant (p = 0.11). In contrast, the differences between these two top-performing methods and the third-best method, Pix2pix, with a score of 3.04 ± 0.68 (95% CI 2.84, 3.20), were statistically significant.
3.2. Pelvic T2w images
We compared our proposed Res-SRDiff model against Bicubic, CycleGAN, Pix2pix, SPSR, I2SB, and TM-DDPM. Our proposed method was able to restore axial T2w pelvic images with improved fidelity to the HR ground truth, as shown in Figure 4. Although the Pix2pix and SPSR methods successfully restored HR images that were globally similar to the ground truth, our method better restored the lesion, as indicated by the red arrow in the second row of Figure 4. These findings are further confirmed by the difference maps shown in the third row of Figure 4, where our method exhibits the smallest residual error compared with the other methods.
Figure 4:
Qualitative results of the pelvic axial T2w images. The first row presents the ground truth image along with the restored outputs from our proposed Res-SRDiff and comparative models. The second row shows the zoomed-in regions outlined by the red dashed lines, where the red arrows indicate lesions that are visually restored closer to the ground truth by our method. The last row depicts the difference map between the restored images and the ground truth.
In a further analysis, the evaluation time of our proposed method (0.95 ± 0.74 seconds per slice) remained considerably lower than that of I2SB (38.32 ± 25.74 seconds per slice) and that of TM-DDPM (20.66 ± 14.00 seconds per slice). The quantitative metrics failed the Shapiro-Wilk normality test; thus, we performed the non-parametric Kruskal-Wallis and Dunn’s tests. The Kruskal-Wallis test indicated that the differences between the methods were statistically significant for all quantitative metrics. Specifically, our method achieved the highest PSNR (27.72 ± 2.26) and the lowest GMSD (0.08 ± 0.02). Although our method, on average, achieved the second-best LPIPS after Pix2pix and SPSR, the differences were not statistically significant, with p = 0.17 and p = 0.31, respectively. Table 1 summarizes the quantitative metrics and indicates whether the differences are statistically significant. Furthermore, Figure 5 illustrates the boxplot of the quantitative metrics, where vertical orange lines and red rectangular markers show the median and average values.
Figure 5:
The boxplot of the quantitative metrics is illustrated for the T2w pelvic and T1 map brain MRI.
The Shapiro-Wilk test revealed that the Likert scores were not normally distributed for any of the evaluated methods. Our proposed method achieved an average score of 4.80 ± 0.40 (95% CI 4.65, 4.88), higher than that of the second-best method, TM-DDPM, which obtained an average score of 4.33 ± 0.68 (95% CI 4.12, 4.49). Nonetheless, the difference was not statistically significant. Furthermore, both of these methods significantly outperformed the third-best method, Pix2pix, which scored 3.67 ± 0.55 (95% CI 3.49, 3.80).
3.3. Ablation study
The ablation study was conducted to evaluate the contribution of the Swin Transformer block, which replaces the attention layer, by comparing Res-SRDiff with a variant that uses a U-net without the Swin Transformer block. The quantitative results are presented in Table 2. The Shapiro-Wilk test indicated that PSNR (p = 0.046) and SSIM for the brain dataset were not normally distributed, and that none of the metrics for the pelvis dataset were normally distributed. The results of the paired t-test or the Wilcoxon signed-rank test, chosen according to the data distribution, are summarized in Table 2.
Table 2:
Summary of the ablation study results. The absolute percentage changes are written inside parentheses, where red and blue indicate improvement and reduction in performance, respectively, of the Res-SRDiff model compared with its variant without the Swin Transformer block.

| Imaging region | Training scenario: w/o¹ | Training scenario: w² | PSNR [dB] ↑ | SSIM [−] ↑ | GMSD [−] ↓ | LPIPS [−] ↓ |
|---|---|---|---|---|---|---|
| Pelvis T2w MRI | • | ◦ | 26.92±2.18 | 0.71±0.05 | 0.10±0.02 | 0.22±0.10 |
| Pelvis T2w MRI | ◦ | • | 27.72±2.26 (2.97) | 0.75±0.05 (5.63) | 0.08±0.02 (20.00) | 0.21±0.11 (4.55) |
| 7T brain T1 MP2RAGE map | • | ◦ | 26.41±1.42 | 0.90±0.02 | 0.10±0.02 | 0.10±0.02 |
| 7T brain T1 MP2RAGE map | ◦ | • | 26.28±1.41 (0.49) | 0.92±0.03 (2.22) | 0.09±0.02 (10.00) | 0.08±0.02 (20.00) |

¹ Trained without using the Swin Transformer block.
² Trained using the Swin Transformer block (Res-SRDiff).
\* Denotes results that are not statistically significant (p-value > 0.05).
Res-SRDiff with the Swin Transformer block outperformed the U-net variant without it in reconstructing high-resolution pelvis T2w MRI on all quantitative metrics. Specifically, the Swin Transformer block increased PSNR and SSIM by about 2.97% and 5.63%, and reduced GMSD and LPIPS by 20% and 4.55%, respectively; all differences were statistically significant.
Similarly, Res-SRDiff with the Swin Transformer block outperformed the U-net variant without it in reconstructing high-resolution brain T1 maps on the SSIM, GMSD, and LPIPS metrics. Specifically, using the Swin Transformer block increased SSIM by 2.22% and reduced GMSD and LPIPS by 10% and 20%, respectively. However, the variant that did not use the Swin Transformer block achieved a slightly higher PSNR (26.41 dB versus 26.28 dB). All differences were statistically significant, except for PSNR (p = 0.09).
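The percentage changes quoted for the pelvis dataset can be reproduced from the means in Table 2 with the short check below. The helper `pct_change` is a hypothetical name; it computes the absolute change relative to the variant without the Swin Transformer block, which is the convention that matches the tabulated values.

```python
def pct_change(without_swin, with_swin):
    """Absolute percentage change relative to the variant without the Swin block."""
    return abs(with_swin - without_swin) / without_swin * 100.0


# (without Swin, with Swin) mean values for pelvis T2w MRI, taken from Table 2.
pelvis_means = {
    "PSNR": (26.92, 27.72),
    "SSIM": (0.71, 0.75),
    "GMSD": (0.10, 0.08),
    "LPIPS": (0.22, 0.21),
}
pelvis_changes = {name: pct_change(a, b) for name, (a, b) in pelvis_means.items()}
for name, change in pelvis_changes.items():
    print(f"{name}: {change:.2f}%")
```

Note that the changes are computed from rounded means, so a small difference in a low-valued metric such as GMSD or LPIPS translates into a large percentage.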
Res-SRDiff with the Swin Transformer block preserved fine details better than the variant that uses a U-net without the Swin Transformer block, as indicated by the red and white arrows in Figure 6. Although the global difference between the two scenarios is small for the pelvis T2w images, Res-SRDiff with the Swin Transformer block reconstructed images with sharper details, as indicated by the yellow arrow in Figure 6.
Figure 6:
The ablation results to reconstruct the brain and pelvis MRI images are illustrated for scenarios where the Res-SRDiff was trained with Swin Transformer block (w Swin block) and without Swin Transformer block (w/o Swin block).
In addition, we compared local attribution maps (LAMs) to visualize the influence of neighboring pixels on the reconstruction of the regions marked with stars in Figure 7. By leveraging a broader range of information, Res-SRDiff with the Swin Transformer block achieved improved results. Compared with Res-SRDiff without the Swin Transformer block, the proposed method with the Swin Transformer block draws on a wider neighborhood to reconstruct the given region for both brain and pelvis MRI (see Figure 7).
Figure 7:

The influence of pixel neighborhood to reconstruct the region shown by red stars is shown for the scenarios where the Res-SRDiff was trained with (w Swin block) and without (w/o Swin block) Swin Transformer blocks. Wider distributions indicate the involvement of more pixels that might help to reconstruct the region with higher fidelity.
4. Discussion
MRI remains one of the most versatile modalities in both clinical practice and research due to its excellent soft-tissue contrast and ability to generate multiple image contrasts without ionizing radiation. However, the inherently long acquisition times can lead to patient discomfort and motion artifacts [49], often forcing a trade-off between spatial resolution and acquisition efficiency. One of the easiest approaches to mitigate these challenges is to increase the voxel size, but this can adversely affect the diagnostic quality [50] by introducing partial volume effects.
In this study, we introduced Res-SRDiff, an efficient probabilistic diffusion model designed to reconstruct HR MRI images from LR inputs. By leveraging the residual error between the LR and HR images in the forward diffusion process, our approach shifts the HR image distribution toward that of the LR images. This enables the reverse process, implemented with a neural network, to accurately recover fine image details in only four sampling steps, markedly reducing the reconstruction time to under one second per slice, compared with conventional diffusion models such as TM-DDPM, which required approximately 20 seconds per slice.
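The residual-shifting idea can be sketched in a few lines. The example below assumes the forward marginal used in the ResShift formulation [25, 26], in which the state at step t is drawn from a Gaussian with mean x0 + η_t (y0 − x0) and variance κ² η_t; the exact parameterization used by Res-SRDiff may differ. The schedule values, κ, and the toy pixel values are hypothetical and chosen only for illustration.

```python
import math
import random


def shift_forward(x0, y0, eta_t, kappa, rng):
    """Draw x_t ~ N(x0 + eta_t * (y0 - x0), kappa**2 * eta_t * I), element-wise.

    x0: HR pixel values; y0: LR pixel values already resampled to the HR grid.
    """
    std = kappa * math.sqrt(eta_t)
    return [x + eta_t * (y - x) + std * rng.gauss(0.0, 1.0)
            for x, y in zip(x0, y0)]


# Toy 1-D "images" (hypothetical values, not study data).
hr = [0.9, 0.1, 0.8, 0.2]
lr = [0.6, 0.4, 0.6, 0.4]

# Hypothetical monotone four-step schedule ending at eta_T = 1.
etas = [0.05, 0.35, 0.70, 1.00]

rng = random.Random(0)
for t, eta in enumerate(etas, start=1):
    mean_only = shift_forward(hr, lr, eta, kappa=0.0, rng=rng)  # noise-free mean
    noisy = shift_forward(hr, lr, eta, kappa=0.1, rng=rng)      # with shifting noise
    print(t, [round(v, 3) for v in mean_only], [round(v, 3) for v in noisy])
```

With κ set to zero, the state moves linearly from the HR values toward the LR values and coincides with the LR input at the last step. This is why the reverse process can start from the LR image plus a small amount of noise rather than from pure Gaussian noise, which is one plausible reading of why so few sampling steps are needed.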
Our experiments on both brain T1 maps and pelvic T2w images demonstrate that Res-SRDiff not only improves computational efficiency but also preserves critical anatomical details. For the brain T1 maps, qualitative assessments (as indicated by the white and red arrows in Figure 3) reveal that our method recovers fine structures with smaller residual errors than competing models. Quantitatively, our approach consistently achieved the highest PSNR and lowest GMSD, with statistically significant improvements. Moreover, the small standard deviation observed across test samples suggests that incorporating the residual error contributes to a more stable and robust reconstruction process.
Similarly, in the pelvic T2w images, Res-SRDiff successfully reconstructs HR images with improved lesion depiction. Unlike the TM-DDPM method, which tended to exaggerate lesion sizes, possibly due to its progressive sampling process, our method maintained more anatomically accurate representations while also exhibiting lower residual errors. These findings align with a previous study reporting that DDPMs tend to generate blurry images [51]. The consistency of these results across both datasets underscores the advantage of integrating residual error information into the diffusion process.
The application of our SR approach offers substantial clinical benefits by enhancing lesion diagnosis and contouring in both brain and prostate imaging. For the brain T1 MP2RAGE maps, our method preserves fine anatomical details and reduces residual errors, which is critical for differentiating subtle lesions such as those associated with tumors, demyelinating diseases, or stroke. Similarly, the improved depiction of lesion morphology in pelvic T2w images, particularly in areas where traditional methods tend to exaggerate lesion boundaries, underscores the potential for more accurate identification and delineation of prostate lesions. In both cases, the enhanced image quality may not only support more precise treatment planning by reducing diagnostic and contouring uncertainty but also contribute to decreased acquisition times, thereby minimizing patient discomfort and motion artifacts. This comprehensive improvement in image quality across multiple anatomical regions highlights the clinical relevance and robustness of the Res-SRDiff method, paving the way for its integration into routine diagnostic and treatment workflows.
Looking forward, several promising research avenues arise from our work. Expanding the Res-SRDiff framework to other imaging modalities and incorporating it into real-time clinical workflows could substantially enhance its usefulness. Furthermore, refining the diffusion process by incorporating adaptive noise scheduling [52] or by aligning the measurement gradient with the local manifold structure of the Res-SRDiff diffusion state [53] has the potential to further improve image quality.
This study presents limitations that merit attention. Although the datasets comprise 3D volumes with resolution reduced across all three dimensions, the Res-SRDiff framework is implemented as a 2D super-resolution method that processes individual slices independently. This approach capitalizes on the efficiency of 2D convolutional architectures and simplifies both training and inference. However, because each slice is enhanced separately, there is no explicit mechanism in the current model to guarantee inter-slice continuity. As such, although improved spatial resolution and lesion delineation are achieved on a per-slice basis, the volumetric consistency across slices remains an open challenge. Future work will aim to address this limitation, potentially by incorporating 3D super-resolution architectures or additional continuity constraints to ensure robust inter-slice coherence in fully volumetric reconstructions. In addition, we acknowledge that the SR process may inadvertently alter or omit very fine anatomical structures (pink arrows shown in Figure 3). While such alterations are rare, they could have clinical implications, and future improvements should focus on incorporating structure-preserving mechanisms or uncertainty estimation to better safeguard clinically relevant features.
5. Conclusions
The proposed Res-SRDiff marks a substantial advancement in the creation of efficient diffusion-based super-resolution models for MRI. By minimizing the number of necessary sampling steps and utilizing residual error information, our approach achieves superior image restoration performance while ensuring both computational efficiency and consistency across a range of datasets.
Res-SRDiff provides a highly efficient and precise framework for MRI super-resolution, offering a notable reduction in computational time while maintaining or even exceeding the image quality of state-of-the-art methods. The integration of residual error shifting within the diffusion process signifies a meaningful step forward in medical image reconstruction, with potential implications for accelerating high-quality imaging in both clinical workflows and research applications.
Acknowledgment
This research is supported in part by the National Institutes of Health under Award Numbers R56EB033332, R01DE033512, and R01CA272991.
Footnotes
Conflicts of interest
There are no conflicts of interest declared by the authors.
Data availability
The ProstateX data is publicly available at the TCIA portal (https://www.cancerimagingarchive.net/analysis-result/prostatex-seg-hires/). Our institutional data cannot be made publicly available upon publication because they contain sensitive personal information.
References
- [1] Marques José P., Kober Tobias, Krueger Gunnar, van der Zwaag Wietske, Van de Moortele Pierre-François, and Gruetter Rolf. Mp2rage, a self bias-field corrected sequence for improved segmentation and t1-mapping at high field. NeuroImage, 49(2):1271–1281, 2010.
- [2] Schelbert Erik B. and Messroghli Daniel R. State of the art: Clinical applications of cardiac t1 mapping. Radiology, 278(3):658–676, 2016.
- [3] Taylor Andrew J., Salerno Michael, Dharmakumar Rohan, and Jerosch-Herold Michael. T1 mapping. JACC: Cardiovascular Imaging, 9(1):67–81, 2016.
- [4] Muir Eric R., Zhang Yi, Emeterio Nateras Oscar San, Peng Qi, and Duong Timothy Q. Human vitreous: Mr imaging of oxygen partial pressure. Radiology, 266(3):905–911, 2013.
- [5] Epel Boris, Maggio Matthew C., Barth Eugene D., Miller Richard C., Pelizzari Charles A., Krzykawska-Serda Martyna, Sundramoorthy Subramanian V., Aydogan Bulent, Weichselbaum Ralph R., Tormyshev Victor M., and Halpern Howard J. Oxygen-guided radiation therapy. International Journal of Radiation Oncology*Biology*Physics, 103(4):977–984, 2019.
- [6] Sun Park Claire Keun, Warner Noah Stanley, Kaza Evangelia, and Sudhyadhom Atchar. Optimization and validation of low-field mp2rage t1 mapping on 0.35t mr-linac: Toward adaptive dose painting with hypoxia biomarkers. Medical Physics, 51(11):8124–8140, 2024.
- [7] Fu Yabo, Lei Yang, Wang Tonghe, Tian Sibo, Patel Pretesh, Jani Ashesh B., Curran Walter J., Liu Tian, and Yang Xiaofeng. Pelvic multi-organ segmentation on cone-beam ct for prostate adaptive radiotherapy. Medical Physics, 47(8):3415–3422, 2020.
- [8] Safari Mojtaba, Yang Xiaofeng, Chang Chih-Wei, Qiu Richard L J, Fatemi Ali, and Archambault Louis. Unsupervised mri motion artifact disentanglement: introducing maudgan. Physics in Medicine & Biology, 69(11):115057, may 2024.
- [9] Lepcha Dawa Chyophel, Goyal Bhawna, Dogra Ayush, and Goyal Vishal. Image super-resolution: A comprehensive review, recent trends, challenges and applications. Information Fusion, 91:230–260, 2023.
- [10] Zhang Xin, Lam Edmund Y., Wu Ed X., and Wong Kenneth K. Y. Application of tikhonov regularization to super-resolution reconstruction of brain mri images. In Gao Xiaohong, Müller Henning, Loomes Martin J., Comley Richard, and Luo Shuqian, editors, Medical Imaging and Informatics, pages 51–56, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
- [11] Dong Weisheng, Zhang Lei, Shi Guangming, and Li Xin. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.
- [12] Vonesch Cédric and Unser Michael. A fast multilevel algorithm for wavelet-regularized image restoration. IEEE Transactions on Image Processing, 18(3):509–523, 2009.
- [13] Joshi Shantanu H., Marquina Antonio, Osher Stanley J., Dinov Ivo, Van Horn John D., and Toga Arthur W. Mri resolution enhancement using total variation regularization. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 161–164, 2009.
- [14] Safari Mojtaba, Eidex Zach, Pan Shaoyan, Qiu Richard L. J., and Yang Xiaofeng. Self-supervised adversarial diffusion models for fast mri reconstruction. Medical Physics, n/a(n/a).
- [15] Chen Yutong, Schönlieb Carola-Bibiane, Liò Pietro, Leiner Tim, Dragotti Pier Luigi, Wang Ge, Rueckert Daniel, Firmin David, and Yang Guang. Ai-based reconstruction for fast mri—a systematic review and meta-analysis. Proceedings of the IEEE, 110(2):224–245, 2022.
- [16] Mishro Pranaba K., Agrawal Sanjay, Panda Rutuparna, and Abraham Ajith. A survey on state-of-the-art denoising techniques for brain magnetic resonance images. IEEE Reviews in Biomedical Engineering, 15:184–199, 2022.
- [17] Yi Xin, Walia Ekta, and Babyn Paul. Generative adversarial network in medical imaging: A review. Medical Image Analysis, 58:101552, 2019.
- [18] Wang Longguang, Guo Yulan, Wang Yingqian, Dong Xiaoyu, Xu Qingyu, Yang Jungang, and An Wei. Unsupervised degradation representation learning for unpaired restoration of images and point clouds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(1):1–18, 2025.
- [19] Ma Cheng, Rao Yongming, Cheng Yean, Chen Ce, Lu Jiwen, and Zhou Jie. Structure-preserving super resolution with gradient guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [20] Safari Mojtaba, Yang Xiaofeng, and Fatemi Ali. MRI data consistency guided conditional diffusion probabilistic model for MR imaging acceleration. In Gimi Barjor S. and Krol Andrzej, editors, Medical Imaging 2024: Clinical and Biomedical Imaging, volume 12930, page 129300R. International Society for Optics and Photonics, SPIE, 2024.
- [21] Pan Shaoyan, Wang Tonghe, Qiu Richard L J, Axente Marian, Chang Chih-Wei, Peng Junbo, Patel Ashish B, Shelton Joseph, Patel Sagar A, Roper Justin, and Yang Xiaofeng. 2d medical image synthesis using transformer-based denoising diffusion probabilistic model. Physics in Medicine & Biology, 68(10):105004, may 2023.
- [22] Pan Shaoyan, Eidex Zach, Safari Mojtaba, Qiu Richard, and Yang Xiaofeng. Cycle-guided denoising diffusion probability model for 3D cross-modality MRI synthesis. In Gimi Barjor S. and Krol Andrzej, editors, Medical Imaging 2025: Clinical and Biomedical Imaging, volume 13410, page 134101W. International Society for Optics and Photonics, SPIE, 2025.
- [23] Chang Chih-Wei, Peng Junbo, Safari Mojtaba, Salari Elahheh, Pan Shaoyan, Roper Justin, Qiu Richard L J, Gao Yuan, Shu Hui-Kuo, Mao Hui, and Yang Xiaofeng. High-resolution mri synthesis using a data-driven framework with denoising diffusion probabilistic modeling. Physics in Medicine & Biology, 69(4):045001, feb 2024.
- [24] Moser Brian B., Shanbhag Arundhati S., Raue Federico, Frolov Stanislav, Palacio Sebastian, and Dengel Andreas. Diffusion models, image super-resolution, and everything: A survey. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2024.
- [25] Yue Zongsheng, Wang Jianyi, and Loy Chen Change. Resshift: Efficient diffusion model for image super-resolution by residual shifting. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [26] Yue Zongsheng, Wang Jianyi, and Loy Chen Change. Efficient diffusion model for image restoration by residual shifting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(1):116–130, 2025.
- [27] Liu Guan-Horng, Vahdat Arash, Huang De-An, Theodorou Evangelos, Nie Weili, and Anandkumar Anima. I²SB: Image-to-image schrödinger bridge. In International Conference on Machine Learning (ICML), July 2023.
- [28] Zhao Kai, Pang Kaifeng, Yu Hung Alex Ling, Zheng Haoxin, Yan Ran, and Sung Kyunghyun. Mri super-resolution with partial diffusion models. IEEE Transactions on Medical Imaging, pages 1–1, 2024.
- [29] Wang Yufei, Yang Wenhan, Chen Xinyuan, Wang Yaohui, Guo Lanqing, Chau Lap-Pui, Liu Ziwei, Qiao Yu, Kot Alex C., and Wen Bihan. Sinsr: Diffusion-based image super-resolution in a single step. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25796–25805, June 2024.
- [30] Sohl-Dickstein Jascha, Weiss Eric, Maheswaranathan Niru, and Ganguli Surya. Deep unsupervised learning using nonequilibrium thermodynamics. In Bach Francis and Blei David, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2256–2265, Lille, France, 07–09 Jul 2015. PMLR.
- [31] Song Yang, Sohl-Dickstein Jascha, Kingma Diederik P, Kumar Abhishek, Ermon Stefano, and Poole Ben. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- [32] Ho Jonathan, Jain Ajay, and Abbeel Pieter. Denoising diffusion probabilistic models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020. Curran Associates Inc.
- [33] Luo Calvin. Understanding diffusion models: A unified perspective, 2022.
- [34] Murphy Kevin P. Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
- [35] Zhang Richard, Isola Phillip, Efros Alexei A., Shechtman Eli, and Wang Oliver. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- [36] Liu Liyuan, Jiang Haoming, He Pengcheng, Chen Weizhu, Liu Xiaodong, Gao Jianfeng, and Han Jiawei. On the variance of the adaptive learning rate and beyond, 2021.
- [37] Loshchilov Ilya and Hutter Frank. Sgdr: Stochastic gradient descent with warm restarts, 2017.
- [38] Middlebrooks Erik H., Patel Vishal, Zhou Xiangzhi, Straub Sina, Murray John V. Jr., Agarwal Amit K., Okromelidze Lela, Singh Rahul B., Lopez Chiriboga Alfonso S., Westerhold Erin M., Gupta Vivek, Singh Sandhu Sukhwinder Johnny, Marin Collazo Iris V., and Tao Shengzhen. 7 t lesion-attenuated magnetization-prepared gradient echo acquisition for detection of posterior fossa demyelinating lesions in multiple sclerosis. Investigative Radiology, 59(7), 2024.
- [39] Armato Samuel G., Huisman Henkjan, Drukker Karen, Hadjiiski Lubomir, Kirby Justin S., Petrick Nicholas, Redmond George, Giger Maryellen L., Cha Kenny, Mamonov Artem, Kalpathy-Cramer Jayashree, and Farahani Keyvan. PROSTATEx Challenges for computerized classification of prostate lesions from multiparametric magnetic resonance images. Journal of Medical Imaging, 5(4):044501, 2018.
- [40] Smith Stephen M. Fast robust automated brain extraction. Human Brain Mapping, 17(3):143–155, 2002.
- [41] Yaniv Ziv, Lowekamp Bradley C., Johnson Hans J., and Beare Richard. Simpleitk image-analysis notebooks: a collaborative environment for education and reproducible research. Journal of Digital Imaging, 31(3):290–303, Jun 2018.
- [42] Isola Phillip, Zhu Jun-Yan, Zhou Tinghui, and Efros Alexei A. Image-to-image translation with conditional adversarial networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, 2017.
- [43] Zhu Jun-Yan, Park Taesung, Isola Phillip, and Efros Alexei A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.
- [44] Wang Zhou, Bovik A.C., Sheikh H.R., and Simoncelli E.P. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [45] Xue Wufeng, Zhang Lei, Mou Xuanqin, and Bovik Alan C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Transactions on Image Processing, 23(2):684–695, 2014.
- [46] Sheikh H.R. and Bovik A.C. Image information and visual quality. IEEE Transactions on Image Processing, 15(2):430–444, 2006.
- [47] Safari Mojtaba, Fatemi Ali, and Archambault Louis. Medfusiongan: multimodal medical image fusion using an unsupervised deep generative adversarial network. BMC Medical Imaging, 23(1):203, Dec 2023.
- [48] Virtanen Pauli, Gommers Ralf, Oliphant Travis E., Haberland Matt, Reddy Tyler, Cournapeau David, Burovski Evgeni, Peterson Pearu, Weckesser Warren, Bright Jonathan, van der Walt Stéfan J., Brett Matthew, Wilson Joshua, Millman K. Jarrod, Mayorov Nikolay, Nelson Andrew R. J., Jones Eric, Kern Robert, Larson Eric, Carey C J, Polat Ilhan, Feng Yu, Moore Eric W., VanderPlas Jake, Laxalde Denis, Perktold Josef, Cimrman Robert, Henriksen Ian, Quintero E. A., Harris Charles R., Archibald Anne M., Ribeiro Antônio H., Pedregosa Fabian, van Mulbregt Paul, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020.
- [49] Safari Mojtaba, Yang Xiaofeng, Fatemi Ali, and Archambault Louis. Mri motion artifact reduction using a conditional diffusion probabilistic model (marcdpm). Medical Physics, 51(4):2598–2610, 2024.
- [50] Mao Lijuan, Zhang Xiaoling, Chen Tingting, Li Zhoulei, and Yang Jianyong. High-resolution reduced field-of-view diffusion-weighted magnetic resonance imaging in the diagnosis of cervical cancer. Quantitative Imaging in Medicine and Surgery, 13(6), 2023.
- [51] Gao Yuan, Xie Huiqiao, Chang Chih-Wei, Peng Junbo, Pan Shaoyan, Qiu Richard L. J., Wang Tonghe, Ghavidel Beth, Roper Justin, Zhou Jun, and Yang Xiaofeng. Ct-based synthetic iodine map generation using conditional denoising diffusion probabilistic model. Medical Physics, 51(9):6246–6258, 2024.
- [52] Lee Seunghan, Lee Kibok, and Park Taeyoung. ANT: Adaptive noise schedule for time series diffusion models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [53] Zirvi Rayhan, Tolooshams Bahareh, and Anandkumar Anima. Diffusion state-guided projected gradient for inverse problems. In The Thirteenth International Conference on Learning Representations, 2025.