Abstract
The quality of brain MRI volumes is often compromised by motion artifacts arising from respiratory patterns and involuntary head movements, which manifest as blurring and ghosting that markedly degrade image quality. In this study, we introduce an innovative approach employing a 3D deep learning framework to restore brain MR volumes afflicted by motion artifacts. The framework integrates a densely connected 3D U-net architecture with generative adversarial network (GAN)-informed training and a novel volumetric reconstruction loss function tailored to 3D GANs to enhance the quality of the volumes. Our methodology is substantiated through comprehensive experimentation involving a diverse set of motion artifact-affected MR volumes. After motion correction, the generated high-quality MR volumes have volumetric signatures comparable to those of motion-free MR volumes. This underscores the significant potential of harnessing this 3D deep learning system to aid in the rectification of motion artifacts in brain MR volumes, highlighting a promising avenue for advanced clinical applications.
Keywords: MR motion correction, volume translation, Generative adversarial networks, Deep Learning
1. INTRODUCTION
The advent of Magnetic Resonance Imaging (MRI) has revolutionized medical diagnostics, providing high-resolution images of the internal structure of the body without the use of ionizing radiation. However, the integrity of these images can be significantly compromised by motion artifacts, which may arise from involuntary patient movements, including physiological motions such as cardiac pulsation and respiration1,2. This degradation not only diminishes the quality and reliability of the MRI but also increases the probability of diagnostic inaccuracies and the need for repeat scans, which can affect between 10% and 42% of brain MR scans3,4. Traditional strategies to mitigate such artifacts often require the acquisition and alignment of multiple images, a process that is time-intensive and can be subject to further errors14.
Recent advancements in artificial intelligence, particularly with the development of 2D deep convolutional neural networks (CNNs11-13) and 3D CNNs15,16, have offered promising alternatives to conventional methods. These AI-based methods have been pivotal in translating a domain of images into its equivalent target domain, a process which is especially beneficial in the context of medical imaging where the enhancement of diversity and quality of 2D images is crucial5. Techniques such as Generative Adversarial Networks (GANs) have been explored for medical image translation, with studies showcasing their efficacy in enhancing image quality and facilitating the generation of synthetic yet realistic medical images6-10. These AI-driven approaches are revolutionizing the correction of motion artifacts by enabling the synthesis of high-quality images that are free from the distortions caused by patient movement.
Furthermore, unsupervised methods using Cycle-MedGAN and adversarial correction have advanced the field, allowing for the retrospective correction of rigid and non-rigid MR motion artifacts without the need for paired images7,9. These methods have demonstrated the ability to generate corrected images that retain high fidelity to the original anatomy, even in the absence of a ground truth reference. Moreover, the exploration of uncertainty-guided progressive GANs provides an innovative framework for medical image translation, underscoring the potential of such methods to accommodate and rectify uncertainties inherent in medical imaging datasets10. Although these techniques can generate high-quality synthetic 2D image slices, the actual content of the 3D volume might differ significantly from the corresponding true representations. Our work shows that the proposed methodology rectifies varying levels of motion artifacts in the volume by producing high-quality motion-corrected MR volumes.
2. METHODS
2.1. Dataset Preparation
To facilitate the translation of motion-corrupted MR volumes into motion-free equivalents, a dataset of 973 motion-free T2-weighted brain MR volumes (1.5T) was acquired from the ADHD-200 database20, representing diverse patient profiles. Linear interpolation was applied to resize the 3D MR volumes to a uniform 240 × 240 × 240 voxel dimension, followed by normalization to zero mean and unit variance. Using the simulation software package TorchIO19, realistic motion artifacts were introduced in three distinct categories. Low-level artifacts were simulated with a random rotational range of (−3, 3) degrees and a random translational range of (0, 5) mm. Medium-level artifacts adopted a rotational range of (−5, 5) degrees and a translational range of (0, 5) mm. High-level artifacts employed a rotational range of (−10, 10) degrees and a translational range of (0, 5) mm.
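As an illustration, the severity presets and preprocessing described above could be configured with TorchIO roughly as follows. This is a minimal sketch: the transform names follow TorchIO's public API, and any settings beyond the stated rotation/translation ranges (e.g., the resampling transform and default number of simulated movements) are assumptions rather than the exact configuration used in our pipeline.

```python
import torchio as tio

# Severity presets mirroring the rotation/translation ranges described above
# (assumption: TorchIO's RandomMotion with its default number of transforms).
low_motion    = tio.RandomMotion(degrees=(-3, 3),   translation=(0, 5))
medium_motion = tio.RandomMotion(degrees=(-5, 5),   translation=(0, 5))
high_motion   = tio.RandomMotion(degrees=(-10, 10), translation=(0, 5))

# Preprocessing applied before corruption: resample to a 240^3 grid and
# normalize each volume to zero mean and unit variance.
preprocess = tio.Compose([
    tio.Resize((240, 240, 240)),
    tio.ZNormalization(),
])
```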
These artifact categories informed the creation of three datasets. Unlike the traditional approach, in which each dataset consists of pre-generated pairs of motion-corrupted and motion-free images, our methodology uses a dynamic process. For each of the 923 training MR volumes, the same volume is loaded twice during training, and in every iteration TorchIO is used to apply random motion artifacts to one of the two copies. Because the corruption varies with each iteration, this generates 1846 image pairs for training and results in a more robust training process, as the model is exposed to a wider variety of artifacts. For testing, we use 100 image pairs derived from 50 held-out subject MR volumes, following a similar dynamic corruption process to evaluate the model across a varied set of scenarios. Additionally, we trained a model on unrealistic high-level motion artifacts, generated with TorchIO using a rotational range of (−10, 10) degrees, a translational range of (−20, 20) mm, and num_transforms = 5; for this model we used a pre-generated dataset of 1846 volume pairs to reduce training time, but evaluated it on realistic high-level artifacts produced with the dynamic corruption process. Figure 1 shows an example of an MR volume corrupted by different levels of motion artifacts.
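The dynamic pairing described above can be sketched as a PyTorch dataset that re-corrupts a copy of each clean volume every time it is sampled. The class and variable names below (e.g., PairedMotionDataset) are illustrative and not taken from the original implementation.

```python
import torch
import torchio as tio

class PairedMotionDataset(torch.utils.data.Dataset):
    """Returns (motion-corrupted, motion-free) volume pairs; the corruption is
    re-sampled every time a volume is requested, so artifacts differ across epochs."""

    def __init__(self, volume_paths, corrupt_transform, preprocess):
        self.volume_paths = volume_paths
        self.corrupt = corrupt_transform      # e.g. one of the severity presets above
        self.preprocess = preprocess          # resize + z-normalization

    def __len__(self):
        return len(self.volume_paths)

    def __getitem__(self, idx):
        clean = self.preprocess(tio.ScalarImage(self.volume_paths[idx]))
        corrupted = self.corrupt(clean)       # fresh random motion on each call
        return corrupted.data.float(), clean.data.float()

# Unrealistic high-level preset used for the additional experiment.
unrealistic_high = tio.RandomMotion(degrees=(-10, 10),
                                    translation=(-20, 20),
                                    num_transforms=5)
```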
Figure 1.
MR volume in three different views with varying levels of motion artifacts
2.2. Image-to-Image Translation of Volumes using Generative Adversarial Networks
The idea of applying adversarial networks to translation tasks involves using a conditional variant called cGAN, where the generator transforms a source domain volume $x$ into a target domain volume $y$ using a mapping function $G: \{x, z\} \rightarrow y$, where $z$ is a noise vector that serves as an additional input to the generator. Figure 2 shows the framework of GANs for MR motion correction using the image-to-image translation of volumes. The adversarial loss is given as:
$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \qquad (1)$$
with the discriminator classifying real and fake pairs. However, relying solely on adversarial loss for image-to-image translation does not consistently yield good results. To address this, a volumetric reconstruction loss $\mathcal{L}_{vol}$ (Volumetric loss) is added, which is a combination of structural similarity index (SSIM) loss17 and peak signal-to-noise ratio (PSNR) loss18, normalized by a factor of 2.0.
Figure 2.
3D GAN framework for MR volume correction.
SSIM loss calculates the structural similarity between the target and generated volumes by considering factors such as luminance, contrast, and structure, providing a more comprehensive assessment of volume quality.
$$\mathrm{SSIM}(y, \hat{y}) = \frac{\left(2\mu_y\mu_{\hat{y}} + c_1\right)\left(2\sigma_{y\hat{y}} + c_2\right)}{\left(\mu_y^2 + \mu_{\hat{y}}^2 + c_1\right)\left(\sigma_y^2 + \sigma_{\hat{y}}^2 + c_2\right)} \qquad (2)$$

where $\mu_y$ and $\mu_{\hat{y}}$ are the means of the target and generated volumes, $\sigma_y$ and $\sigma_{\hat{y}}$ are the standard deviations of the target and generated volumes, $\sigma_{y\hat{y}}$ is the covariance between the target and generated volumes, and $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are constants introduced for stability, with $k_1 = 0.01$, $k_2 = 0.03$, and $L$ the dynamic range of the voxel values.
The mean operation is applied to calculate the average SSIM across all voxels in the volumes.
The PSNR loss evaluates the fidelity of a volume by comparing the maximum potential pixel value to the root mean squared error between the target and generated volumes.
$$\mathrm{PSNR}(y, \hat{y}) = 20 \log_{10}\!\left(\frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}}\right), \qquad \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \qquad (3)$$

where $\mathrm{MAX}$ is the maximum possible pixel value in the volumes, $\mathrm{MSE}$ is the mean squared error between the target and the generated volumes, and $N$ is the total number of pixels in the volumes.
The SSIM and PSNR losses were finally combined as a single volumetric loss term:
$$\mathcal{L}_{vol} = \frac{\mathcal{L}_{SSIM} + \mathcal{L}_{PSNR}}{2} \qquad (4)$$
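A minimal PyTorch sketch of the volumetric reconstruction loss follows. The sign conventions (1 − SSIM, negative PSNR) and the use of global rather than windowed SSIM statistics are assumptions made so that the combined term can be minimized directly; they are not specified in the text.

```python
import torch
import torch.nn.functional as F

def ssim_loss(target, generated, max_val=1.0, k1=0.01, k2=0.03):
    """1 - SSIM computed from global volume statistics (a simplification;
    whether windowed SSIM is used is not stated in the text)."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_t, mu_g = target.mean(), generated.mean()
    var_t, var_g = target.var(), generated.var()
    cov = ((target - mu_t) * (generated - mu_g)).mean()
    ssim = ((2 * mu_t * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_t ** 2 + mu_g ** 2 + c1) * (var_t + var_g + c2))
    return 1.0 - ssim

def psnr_loss(target, generated, max_val=1.0):
    """Negative PSNR, so that minimizing this term maximizes PSNR; how the
    PSNR term is scaled against the SSIM term is an assumption."""
    mse = F.mse_loss(generated, target)
    return -10.0 * torch.log10((max_val ** 2) / mse)

def volumetric_loss(target, generated):
    """Combined volumetric reconstruction loss, normalized by 2.0 (Eq. 4)."""
    return (ssim_loss(target, generated) + psnr_loss(target, generated)) / 2.0
```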
The final training objective in the image-to-image translation of volumes is for the generator to minimize the cGAN loss while the discriminator maximizes it, augmented by a weighted volumetric reconstruction loss:
$$G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{vol}(G) \qquad (5)$$

with $\lambda$ as the weighting hyperparameter.
Image-to-Image translation of volumes refers to the process of transforming a high-dimensional input into an output that exhibits distinct surface characteristics while retaining the fundamental structure. Within the 3D GAN framework, the emphasis lies on achieving resilience across diverse input modalities without necessitating alterations tailored to specific applications. As a result, the fundamental building block chosen for 3D GAN is a 3D encoder-decoder UNet architecture.
The 3D UNet architecture consists of an encoder and a decoder for the image-to-image translation of volumes. Table 1 shows the detailed architecture of 3D UNet. The encoder uses (4, 4, 4) kernels with (2, 2, 2) strides for downsampling, starting with 64 filters and increasing to 128 and 256 filters in subsequent blocks. Batch normalization and Leaky ReLU activation are used. The decoder features transpose convolutional layers with (4, 4, 4) kernels and (2, 2, 2) strides for upsampling, followed by optional dropout (rate 0.5). Decoder blocks merge upsampled and encoder feature maps, applying ReLU activation. The generator incorporates the encoder and decoder. Encoder blocks have 64, 128, and 256 filters, while the bottleneck uses a 256-filter convolutional layer with ReLU activation. Decoder blocks use 256, 128, and 64 filters. The output layer, with (4, 4, 4) kernels and (2, 2, 2) strides, produces the output volume, applying ReLU activation. In summary, the 3D UNet architecture utilizes defined kernel sizes, strides, and filters within the encoder and decoder. This design efficiently converts input volumes to output volumes, altering their appearance while maintaining the underlying structure.
Table 1.
3D UNet Architecture
| Layer | Kernels, Filters and Strides | Output shape |
|---|---|---|
| Encoder 1 | k = 4×4×4, f = 64, s = 2×2×2 | 120×120×120 |
| Encoder 2 | k = 4×4×4, f = 128, s = 2×2×2 | 60×60×60 |
| Encoder 3 | k = 4×4×4, f = 256, s = 2×2×2 | 30×30×30 |
| Bottleneck | k = 4×4×4, f = 256, s = 2×2×2 | 15×15×15 |
| Decoder 1 | k = 4×4×4, f = 256, s = 2×2×2 | 30×30×30 |
| Decoder 2 | k = 4×4×4, f = 128, s = 2×2×2 | 60×60×60 |
| Decoder 3 | k = 4×4×4, f = 64, s = 2×2×2 | 120×120×120 |
| Output | k = 4×4×4, f = 1, s = 2×2×2 | 240×240×240 |
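A minimal PyTorch sketch of the generator summarized in Table 1 is given below. Padding values, dropout placement, and the exact skip-connection wiring are assumptions, since the text specifies only kernel sizes, strides, filter counts, and activations.

```python
import torch
import torch.nn as nn

def encoder_block(in_ch, out_ch):
    """4x4x4 strided convolution: halves each spatial dimension."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

def decoder_block(in_ch, out_ch, dropout=False):
    """4x4x4 transposed convolution: doubles each spatial dimension."""
    layers = [
        nn.ConvTranspose3d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    ]
    if dropout:
        layers.append(nn.Dropout3d(0.5))
    return nn.Sequential(*layers)

class UNet3D(nn.Module):
    """240^3 single-channel input -> 240^3 single-channel output."""
    def __init__(self):
        super().__init__()
        self.enc1 = encoder_block(1, 64)            # 240 -> 120
        self.enc2 = encoder_block(64, 128)          # 120 -> 60
        self.enc3 = encoder_block(128, 256)         # 60  -> 30
        self.bottleneck = encoder_block(256, 256)   # 30  -> 15
        self.dec1 = decoder_block(256, 256, dropout=True)        # 15 -> 30
        self.dec2 = decoder_block(256 + 256, 128, dropout=True)  # 30 -> 60
        self.dec3 = decoder_block(128 + 128, 64)                 # 60 -> 120
        self.out = nn.Sequential(                                # 120 -> 240
            nn.ConvTranspose3d(64 + 64, 1, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        b = self.bottleneck(e3)
        d1 = self.dec1(b)
        d2 = self.dec2(torch.cat([d1, e3], dim=1))   # skip connections merge
        d3 = self.dec3(torch.cat([d2, e2], dim=1))   # decoder and encoder maps
        return self.out(torch.cat([d3, e1], dim=1))
```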
The discriminator is a modified version of the PatchGAN architecture proposed by (Isola et al., 2017)5. PatchGAN divides input volumes into smaller patches through convolution, reducing the receptive field before classification and averaging. This confines the discriminator's focus to compact patches, enhancing accuracy in high-frequency details and enabling detailed generator outputs. Empirical results show that using smaller patches along with existing non-adversarial losses like volumetric reconstruction enhances image sharpness and removes tiling artifacts. The approach employs a 4 × 4 × 4 patch size achieved through three convolutional layers with 64, 128, and 256 spatial filters, followed by batch normalization and Leaky-ReLU activation. Finally, a convolutional layer with one output dimension and a sigmoid activation produces the needed confidence probability map.
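The discriminator could be sketched as follows. The two-channel input (corrupted volume concatenated with the candidate volume) follows the conditional GAN formulation and is an assumption, as are the padding values.

```python
import torch
import torch.nn as nn

class PatchDiscriminator3D(nn.Module):
    """Classifies local patches of a (corrupted, candidate) volume pair as
    real or fake and returns a confidence probability map."""
    def __init__(self, in_ch=2):   # corrupted volume + candidate (real or generated)
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(256, 1, kernel_size=4, stride=1, padding=1),
            nn.Sigmoid(),          # per-patch confidence probabilities
        )

    def forward(self, corrupted, candidate):
        return self.model(torch.cat([corrupted, candidate], dim=1))
```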
The 3D GAN framework consists of a 3D UNet generator penalized from the volumetric reconstruction and pixel perspectives via an adversarial discriminator network. The framework is trained via a min-max optimization task using the final objective loss function:
$$G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{vol}(G) \qquad (6)$$
where $\lambda$ is a weighting hyperparameter whose value was selected through extensive hyperparameter optimization. For training, we use the Adam optimizer with a learning rate of 0.0006 in the PyTorch framework. The network is trained for 300 epochs on 6 NVIDIA A6000 GPUs with a batch size of 1. The generator model is saved every 10 epochs; as training continues, the generator improves at producing high-quality volumes and the discriminator improves at distinguishing generated from motion-free volumes. Training took approximately 11 days for each category of data.
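One adversarial training step could look like the following sketch, reusing the modules and losses sketched above. The value of lambda_vol is a placeholder, since the tuned weighting is not reproduced here, and the binary cross-entropy formulation of the adversarial terms is an assumption consistent with Eq. 1.

```python
import torch

# Assumes UNet3D, PatchDiscriminator3D, and volumetric_loss from the sketches above.
generator = UNet3D()
discriminator = PatchDiscriminator3D()
g_opt = torch.optim.Adam(generator.parameters(), lr=6e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=6e-4)
bce = torch.nn.BCELoss()
lambda_vol = 10.0   # placeholder weight; the tuned value is not reproduced here

def train_step(corrupted, clean):
    fake = generator(corrupted)

    # Discriminator update: push real pairs toward 1, generated pairs toward 0.
    d_opt.zero_grad()
    d_real = discriminator(corrupted, clean)
    d_fake = discriminator(corrupted, fake.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    d_opt.step()

    # Generator update: fool the discriminator and minimize the volumetric loss.
    g_opt.zero_grad()
    d_fake = discriminator(corrupted, fake)
    g_loss = bce(d_fake, torch.ones_like(d_fake)) + \
             lambda_vol * volumetric_loss(clean, fake)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```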
3. RESULTS
Table 2 shows results for each independently trained network, optimized and tuned on its artifact category, when evaluated on the held-out test datasets (100 image pairs each). In our evaluation of motion correction models across varying artifact levels, we observed that the model trained on unrealistic motion artifacts demonstrated a commendable degree of generalization when tested on realistic high-level motion artifacts, as evidenced by a Structural Similarity Index Measure (SSIM) of 92.1 and a notable Peak Signal-to-Noise Ratio (PSNR) of 68.9. This performance suggests that while there is a marginal decrease in SSIM compared to models trained on realistic artifacts, the capacity for noise reduction in practical scenarios remains robust. Intriguingly, the model trained on medium-level artifacts exhibited the highest SSIM, implying optimal performance within its specific training domain. Overall, the consistent SSIM scores across different models, alongside the variability in PSNR, underscore the potential for models trained on non-realistic data to achieve significant noise reduction, albeit with a slight compromise in structural fidelity when applied to real-world high-motion conditions.
Table 2.
Results from the 3D GAN framework when evaluated on three categories of motion
| Test Datasets | Models trained on artifact level | SSIM | PSNR |
|---|---|---|---|
| Low-Level motion | Low-level | 92.8 ± 3.39 | 63.1 ± 9.13 |
| Medium-Level motion | Medium-level | 93.0 ± 2.66 | 66.4 ± 8.78 |
| High-Level motion | High-level | 92.6 ± 3.12 | 62.9 ± 9.79 |
| High-Level motion | Unrealistic High-level | 92.1 ± 3.38 | 68.9 ± 7.32 |
In our comparative analysis, as shown in Table 3, the models trained across varying artifact levels demonstrated notable robustness when assessed against realistic motion artifacts, with the model trained on unrealistic high-level motion artifacts showing exceptional adaptability, achieving the highest PSNR values across all tested levels. This model sustained SSIM scores above 92%, indicating consistent structural preservation despite the training on exaggerated artifacts. Interestingly, the models trained on low- and medium-level artifacts exhibited domain-specific optimality in their respective artifact levels, with the medium-level model achieving the highest SSIM on medium artifacts. The relatively uniform standard deviations across metrics suggest stable performance within each model's operational domain. These findings highlight the potential efficacy of training on non-realistic data, presenting a valuable approach for scenarios where realistic training datasets are not readily available. Figure 3 presents a comparison of MRI scans across sagittal, coronal, and axial views, showing the presence of motion artifacts and the subsequent corrections by the model trained on unrealistic high-level artifacts. The corrections show a marked improvement over the motion artifact-affected volumes.
Table 3.
Comparative analysis of models on different levels of motion artifacts
| Models trained on artifact level | Low-Level test SSIM | Low-Level test PSNR | Medium-Level test SSIM | Medium-Level test PSNR | High-Level test SSIM | High-Level test PSNR |
|---|---|---|---|---|---|---|
| Low-level | 92.8 ± 3.4 | 63.1 ± 9.1 | 92.7 ± 3.1 | 63.8 ± 7.6 | 92.6 ± 2.9 | 62.8 ± 7.2 |
| Medium-level | 92.4 ± 2.8 | 62.2 ± 10.2 | 93.0 ± 2.6 | 65.4 ± 8.7 | 92.9 ± 2.8 | 65.4 ± 8.8 |
| High-level | 92.5 ± 3.5 | 63.4 ± 9.8 | 92.7 ± 3.0 | 63.2 ± 8.4 | 92.6 ± 3.1 | 62.9 ± 9.7 |
| Unrealistic High-level | 92.1 ± 3.1 | 68.2 ± 7.0 | 92.2 ± 2.8 | 68.6 ± 7.1 | 92.1 ± 3.3 | 68.9 ± 7.3 |
Figure 3.
Motion artifact correction using the 3D UNet trained on unrealistic high-level motion artifacts, shown in three different views
4. DISCUSSION
In this work, we introduce an approach involving a 3D U-net architecture with a novel loss function and a generative adversarial network (GAN)-informed training strategy to address artifact removal in MR volumes. The network capitalizes on spatial information within the volume, refining 3D representation features through a tailored volumetric reconstruction loss function. This refinement process significantly reduces noise artifacts, enhancing image quality. The model trained on unrealistic high-level artifacts performed well not only on realistic high-level artifacts but also on medium- and low-level artifacts, suggesting that separate training on medium- and low-level artifacts may not be necessary. The 3D deep learning model is not only effective in addressing varying levels of realistic motion artifacts but also maintains, and perhaps even enhances, its correction capability as the complexity of the artifacts increases. This is a positive outcome, highlighting the model's ability to generalize and perform well across various artifacts. Importantly, this approach eliminates the need for additional preprocessing or post-processing steps that involve slicing the volume and applying 2D networks to each slice, simplifying the workflow while yielding high-resolution brain MR volumes.
5. CONCLUSION
Our proposed method adeptly processes the motion-corrupted volumes, yielding high-resolution, motion-corrected MR volumes. Although our achievements in producing high-quality MR volumes are commendable, further refinement of gray matter sharpness in the MR volumes can be achieved through the integration of advanced neural networks, such as transformers. This approach is particularly beneficial, as constructing deep networks solely with 3D convolutions can be computationally expensive. By adopting a 3D-based strategy, the necessity for volume slicing and employing 2D networks on slices is notably reduced, streamlining the workflow and enhancing efficiency.
ACKNOWLEDGMENTS
Research reported in this publication was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R01CA288379 and R01CA260705 and by the Cancer Prevention and Research Institute of Texas (CPRIT) under Award Number RP240289. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
DISCLOSURES
The authors have no relevant financial interests in this article and no potential conflicts of interest to disclose.
REFERENCES
- 1. Enzmann DR, O'Donohue J, Rubin JB, Shuer L, Cogen P, Silverberg G. CSF pulsations within nonneoplastic spinal cord cysts. American Journal of Roentgenology. 1987;149(1):149–157.
- 2. Kjos BO, Ehman RL, Brant-Zawadzki M, Kelly WM, Norman D, Newton TH. Reproducibility of relaxation times and spin density calculated from routine MR imaging sequences: clinical study of the CNS. American Journal of Roentgenology. 1985;144(6):1165–1170.
- 3. Zaitsev M, Maclaren J, Herbst M. Motion artifacts in MRI: A complex problem with many partial solutions. Journal of Magnetic Resonance Imaging. 2015;42(4):887–901.
- 4. Maclaren J, Herbst M, Speck O, Zaitsev M. Prospective motion correction in brain imaging: a review. Magnetic Resonance in Medicine. 2013;69(3):621–636.
- 5. Isola P, et al. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
- 6. Armanious K, et al. MedGAN: Medical image translation using GANs. arXiv preprint, 2018. Available: http://arxiv.org/abs/1806.06397v1.
- 7. Armanious K, et al. Unsupervised medical image translation using Cycle-MedGAN. 2019 27th European Signal Processing Conference (EUSIPCO). 2019:1–5.
- 8. Armanious K, et al. Retrospective correction of rigid and non-rigid MR motion artifacts using GANs. 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI). IEEE, 2019.
- 9. Armanious K, et al. Unsupervised adversarial correction of rigid MR motion artifacts. 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). 2020:1494–1498.
- 10. Upadhyay U, Chen Y, Hepp T, Gatidis S, Akata Z. Uncertainty-guided progressive GANs for medical image translation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III. Springer-Verlag, Berlin, Heidelberg, 614–624. doi:10.1007/978-3-030-87199-4_58.
- 11. Bangalore Yogananda CG, Madhuranthakam AJ, Fei B. Non-invasive profiling of molecular markers in brain gliomas using deep learning and magnetic resonance images. Ph.D. Dissertation, The University of Texas at Arlington. 2021. Advisors: Maldjian JA, Liu H. Order Number: AAI28674456.
- 12. Feinler MS, Halm BN. Retrospective motion correction in gradient echo MRI by explicit motion estimation using deep CNNs. arXiv preprint arXiv:2303.17239. 2023.
- 13. Hossbach J, et al. Deep learning-based motion quantification from k-space for fast model-based magnetic resonance imaging motion correction. Medical Physics. 2023;50(4):2148–2161. doi:10.1002/mp.16119.
- 14. Godenschweger F, et al. Motion correction in MRI of the brain. Physics in Medicine & Biology. 2016;61(5):R32.
- 15. Duffy BA, et al. Retrospective correction of motion artifact affected structural MRI images using deep learning of simulated motion. Medical Imaging with Deep Learning. 2022.
- 16. Kurzawski JW, et al. Retrospective rigid motion correction of three-dimensional magnetic resonance fingerprinting of the human brain. Magnetic Resonance in Medicine. 2020;84(5):2606–2615.
- 17. Wang Z, et al. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing. 2004;13(4):600–612.
- 18. Sheikh HR, Bovik AC. Image information and visual quality. IEEE Transactions on Image Processing. 2006;15(2):430–444. doi:10.1109/TIP.2005.859378.
- 19. Pérez-García F, Sparks R, Ourselin S. TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Computer Methods and Programs in Biomedicine. 2021;208:106236.
- 20. Bellec P, et al. The Neuro Bureau ADHD-200 preprocessed repository. NeuroImage. 2017;144:275–286.