Magn Reson Med. 2025 Aug 29;95(1):363–381. doi: 10.1002/mrm.70052

A network‐assisted joint image and motion estimation approach for robust 3D MRI motion correction across severity levels

Brian Nghiem 1,2,, Zhe Wu 1, Sriranga Kashyap 1, Lars Kasper 1,3, Kâmil Uludağ 1,2,4,5
PMCID: PMC12620172  PMID: 40883879

Abstract

Purpose

The purpose of this work was to develop and evaluate a novel method that leverages neural networks and physical modeling for 3D motion correction at different levels of corruption.

Methods

The novel method (“UNet+JE”) combines an existing neural network (“UNetmag”) with a physics‐informed algorithm for jointly estimating motion parameters and the motion‐compensated image (“JE”). UNetmag and UNet+JE were trained on two training datasets separately with different distributions of motion corruption severity and compared to JE as a benchmark. All five resulting methods were tested on T1w 3D MPRAGE scans of healthy participants with simulated (n = 40) and in vivo (n = 10) motion corruption ranging from mild to severe motion.

Results

UNet+JE provided better motion correction than UNetmag (p < 10⁻² for all metrics for both simulated and in vivo data), under both training datasets. UNetmag exhibited residual image artifacts and blurring, as well as greater susceptibility to data distribution shifts than UNet+JE. UNet+JE and JE did not significantly differ in image correction quality (p > 0.05 for all metrics), even under strong distribution shifts for UNet+JE. However, UNet+JE reduced runtimes relative to JE by median factors of 2.00 to 3.80 in the simulation study and 4.05 in the in vivo study.

Conclusions

UNet+JE benefitted from the robustness of joint estimation and the fast image improvement provided by the neural network, enabling the method to provide high quality 3D image correction under a wide range of motion corruption within shorter runtimes.

Keywords: deep learning, image reconstruction, motion correction

1. INTRODUCTION

Patient motion is a common source of image quality degradation in MRI. 1 Neuroanatomical MRI scans typically require several minutes of data acquisition, and changes to a patient's head position during that time can lead to ghosting artifacts, blurring, and signal loss, which reduce the diagnostic value of the image. Head motion is prevalent in pediatric and elderly cohorts, as well as patients with movement disorders and participants experiencing discomfort during scans. 2 , 3 In standard clinical practice, heavily corrupted scans are often repeated, which extends the length of the MRI exam and further increases patient discomfort. Scan repeats because of motion occur in 20% of MRI exams 4 and amount to an estimated cost of $1.4 billion USD per year in the United States. 5 In research settings, motion artifacts can bias or confound findings. 6 , 7

Motion mitigation strategies include patient training, head restraints, 8 and administering sedatives, which are not always feasible or desired. 2 Alternatively, prospective motion correction attempts to compensate for motion by adjusting MRI gradients in real time using data from external sensors 9 , 10 , 11 , 12 or navigator sequences. 13 , 14 , 15 , 16 Although prospective correction could theoretically account for most deleterious effects of motion, these approaches require dedicated hardware or specialized pulse sequences and are vulnerable to issues with calibration, accuracy, and latency.

Alternatively, data‐driven retrospective motion correction (RMC) methods estimate the uncorrupted image without requiring external motion tracking. Classical methods use a physical model of the impact of motion on the data 17 to jointly estimate the underlying image and the subject's head movement. The motion trajectory is estimated by minimizing some measure of the presence of motion artifacts in the data, such as image gradient entropy 18 , 19 , 20 or data consistency. 21 , 22 , 23 , 24 , 25 , 26 These joint estimation (JE) approaches involve large‐scale non‐convex optimization, which is computationally challenging. Optimized sampling patterns 25 and scout images 26 , 27 have been proposed to improve the convergence properties of JE.

More recently, various deep learning (DL) approaches have been proposed. Spieker et al. 28 conducted a comprehensive review of DL methods, including image‐to‐image translation approaches. 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 Although DL methods are attractive because of their fast runtime and their ability to learn image priors that are difficult to mathematically formulate, DL‐based approaches suffer from “hallucinations” (i.e., spurious image features) and smoothing, particularly under data distribution shifts. 30 , 41 , 42 Spieker et al. 28 concluded that DL‐based methods should be grounded in the physics of image acquisition (i.e., enforcing data consistency) to increase their robustness.

Haskell et al. 43 proposed the NAMER algorithm, which incorporated an artifact‐removal neural network within the JE algorithm, leading to better convergence properties. Neural networks have subsequently been shown to benefit image sparsity‐based JE 44 and scout‐assisted JE. 45 To provide greater robustness against different sampling patterns, Levac et al. 46 trained an unsupervised image denoiser to assist with JE. To reduce JE to only its motion estimation subcomponent, Singh et al. 47 trained a hypernetwork to estimate the motion‐compensated image. Conversely, Dabrowski et al. 48 reduced the problem to only image estimation by training a regression network to directly estimate motion parameters from the corrupted k‐space data.

Although promising, hybrid DL‐JE works have only been applied to 2D image correction and have not been implemented for 3D volumetric correction, which is non‐trivial because of its greater computational complexity. Although Cordero‐Grande et al. 24 , 25 have worked extensively on JE for 3D motion correction, they do not leverage DL. Although Wang et al. 49 have applied DL to improve 3D motion‐compensated image reconstruction, their study focused on the special case where motion parameters were known a priori.

Furthermore, limited research has been conducted on the performance of hybrid approaches under data distribution shifts with respect to motion corruption severity. It is important to characterize the impact of a method's training distribution on its performance. To this end, we developed a novel 3D hybrid DL‐JE algorithm and evaluated its performance on T1‐weighted 3D MPRAGE scan data with a wide range of simulated head motion corruption severity relative to different training datasets. Finally, we validated our method on data with in vivo motion corruption. Our source code is publicly available on GitHub (https://github.com/BRAIN‐TO/PyMoCo_v2).

2. METHODS

2.1. Rigid body model of motion corruption

The impact of motion on the MRI signal can be approximated using basic properties of the Fourier transform under rotations and translations. 17 For rigid body motion, the transformed signal encoding operator is given by Equation 1, where U is the sampling mask, F is the Fourier transform, C are the complex coil profiles, and T_θ is the rigid-body transform (rotations followed by translations) parameterized by θ. The operators U and F are known a priori, whereas C is typically measured independently. Equation 1 does not account for the impact of head motion on physical MRI properties (e.g., B0, B1+, spin excitation and phase history 1 ), which would require a more complicated physical model. Larger head motions lead to greater deviations from the assumption of rigid body motion.

We can approximate continuous changes in head position as piecewise constant motion states, where the nth discrete motion state (θn) consists of 6 degrees of freedom and is prescribed over a segment of the sampling pattern (Un). The motion‐corrupted signal encoding model is then described by Equation 2, where s is the acquired k‐space signal, m is the magnetization image, and η is Gaussian noise.

$E_{\theta} = U\,F\,C\,T_{\theta}$  (1)

$s = \sum_{n=1}^{N} U_n\,F\,C\,T_{\theta_n}\,m + \eta = E_{\theta}\,m + \eta$  (2)
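To make the shot-wise encoding of Equation 2 concrete, the following sketch applies the corrupted forward model to an image volume. It is an image-domain approximation using SciPy interpolation for the rigid transform; the function names, the coil-array layout (coil dimension first), and the per-shot boolean mask format are assumptions of this sketch, not the paper's JAX implementation.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def apply_rigid(m, theta):
    """Image-domain rigid transform T_theta: rotations (deg) followed by translations (voxels)."""
    def transform(x):
        out = x
        for axes, angle in zip([(1, 2), (0, 2), (0, 1)], theta[3:]):
            out = rotate(out, angle, axes=axes, reshape=False, order=1)
        return shift(out, theta[:3], order=1)
    if np.iscomplexobj(m):
        return transform(m.real) + 1j * transform(m.imag)
    return transform(m)

def encode(m, thetas, coils, shot_masks):
    """Eq. (2): s = sum_n U_n F C T_{theta_n} m, one term per shot."""
    s = np.zeros_like(coils, dtype=complex)
    for theta, mask in zip(thetas, shot_masks):
        moved = apply_rigid(m, theta)                        # T_{theta_n} m
        ks = np.fft.fftn(coils * moved, axes=(-3, -2, -1))   # F C (.)
        s += mask * ks                                       # U_n (.)
    return s
```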

In the absence of explicit motion tracking (e.g., using external trackers 9 , 10 , 11 , 12 or navigators 13 , 14 , 15 , 16 ), we can jointly estimate both m and θ. 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 43 , 44 , 45 , 46 , 47 , 48 , 49 In this work, we focus on the data consistency‐based JE algorithm (Equation 3), which iteratively estimates m and θ by minimizing discrepancies between the measured signal (s) and the projected data (Eθm).

$(\hat{m}, \hat{\theta}) = \arg\min_{m,\theta}\; \| E_{\theta} m - s \|_2^2$  (3)

It is difficult for the JE algorithm to resolve motion states at every read‐out line because of the limited spatial frequency information contained per line. 23 , 46 Instead, discrete motion states are typically assumed to cover several phase‐encoding (PE) steps 24 , 47 (referred to as a “shot”) with negligible intra‐shot motion. The JE algorithm has been shown to benefit from re‐ordering the PE steps such that all shots sample a similarly wide range of low and high spatial frequencies. 23 , 24 , 25 Although optimized patterns (i.e., DISORDER 25 ) have been developed, simpler patterns (e.g., interleaved pattern 24 ) have been proposed, which require fewer sequence modifications.

2.2. Implementation details of motion correction methods

Developing a hybrid DL‐JE method for 3D motion correction involved implementing a 3D JE algorithm, identifying a suitable 3D DL method to be combined with the JE algorithm, and finally evaluating the methods under different data distribution shifts with simulated and in vivo data. The DL approaches were trained on an NVIDIA V100 32 GB GPU (driver v470.42.01, CUDA v11.4). To distribute the computational workload, the methods were evaluated on a separate server equipped with an NVIDIA Quadro RTX 8000 48 GB GPU (driver v470.239.06, CUDA v11.4).

2.2.1. Implementation of 3D joint estimation

Referencing the algorithms described by Haskell et al. 43 and Cordero‐Grande et al., 24 , 25 we implemented a 3D JE algorithm (“JE”) (Figure 1A). JE solves Equation 3 by alternating between estimating the image (m̂) and the motion trajectory (θ̂) (i.e., 6 parameters per shot) using the conjugate gradient algorithm 50 and the quasi‐Newton Broyden‐Fletcher‐Goldfarb‐Shanno (BFGS) algorithm. 51 To simplify the algorithm, we did not implement the regularizers, multi‐resolution strategies, or data compression strategies described by Cordero‐Grande et al. 24 , 25

FIGURE 1. Overview of the 3D retrospective motion correction methods compared in this work. (A) The first method is the data consistency‐based joint estimation (JE) algorithm. (B) The second method is the stacked UNet with self‐assisted priors (UNetmag). (C) Our proposed method combines a complex‐valued version of the UNet (UNetcom) and the JE algorithm (UNet+JE). The JE algorithm consists of motion estimation (using 1 iteration of the Broyden‐Fletcher‐Goldfarb‐Shanno [BFGS] algorithm for each motion state separately) and image reconstruction (using 3 iterations of the conjugate gradient [CG] algorithm).

Similar to Haskell et al., 43 the motion parameters for different shots (θn) are estimated independently (Equation 4a) because of the separable nature of motion corruption. 24 , 43 We assumed that each shot corresponded to one motion state. The convergence criterion was empirically selected such that JE terminates when the first‐order finite difference of the total data consistency loss (Equation 5) is less than 0.1 for a minimum of 10 consecutive iterations. The algorithm was implemented in Python v3.9.7 using JAX 52 v0.2.27 for accelerated computation. When evaluated on one test case with 16 motion states, the typical processing time for one iteration of JE was 61 s.

$\hat{\theta}_n = \arg\min_{\theta_n}\; \| E_{\theta_n} W \hat{m} - s \|_2^2$  (4a)

$\hat{m} = \arg\min_{m}\; \| E_{\hat{\theta}} m - s \|_2^2$  (4b)

$\mathcal{L}_{\text{total}} = \| E_{\theta} W \hat{m} - s \|_2^2$  (5)

Our JE implementation includes a binary matrix, W, which masks out regions that violate the rigid‐body motion model (Equation 4a). W is set equal to the identity matrix for the simulated dataset. For the in vivo dataset, W is a 3D rectangular window that was manually designed to mask out axial slices below the base of each subject's cerebellum. This does not hinder motion estimation because it does not truncate ghosting artifacts, which manifest along the PE direction (i.e., within the axial plane).
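A minimal sketch of this alternating scheme (Equations 4a, 4b, and 5) is given below, reusing encode() and apply_rigid() from the earlier sketch. The gradient-descent image update is a simplified stand-in for the conjugate gradient solver, and the step size, helper names, and convergence bookkeeping are assumptions of this sketch rather than the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def data_consistency(thetas, m, s, coils, masks):
    """Eq. (5): squared l2 data consistency, reusing encode() from the sketch above."""
    return float(np.linalg.norm(encode(m, thetas, coils, masks) - s) ** 2)

def update_image(m, thetas, s, coils, masks, n_steps=3, step=1e-3):
    """Simplified stand-in for the CG image update of Eq. (4b): a few gradient steps."""
    for _ in range(n_steps):
        resid = encode(m, thetas, coils, masks) - s
        grad = np.zeros_like(m)
        for theta, mask in zip(thetas, masks):
            # adjoint: conj(C) F^H U^H applied to the residual, summed over coils
            img = np.sum(np.conj(coils) * np.fft.ifftn(mask * resid, axes=(-3, -2, -1)), axis=0)
            grad += apply_rigid(img, -np.asarray(theta))   # approximate adjoint of T_theta
        m = m - step * grad
    return m

def joint_estimation(s, thetas, W, coils, masks, max_iter=200, tol=0.1, patience=10):
    """Alternating JE loop: per-shot BFGS motion updates, then an image update."""
    losses = []
    m = update_image(np.zeros(coils.shape[1:], complex), thetas, s, coils, masks)
    for _ in range(max_iter):
        for n in range(len(thetas)):                       # Eq. (4a): one BFGS step per shot
            obj = lambda th: data_consistency([*thetas[:n], th, *thetas[n + 1:]],
                                              W * m, s, coils, masks)
            thetas[n] = minimize(obj, thetas[n], method="BFGS", options={"maxiter": 1}).x
        m = update_image(m, thetas, s, coils, masks)        # Eq. (4b)
        losses.append(data_consistency(thetas, W * m, s, coils, masks))  # Eq. (5)
        # stop once the loss changes by < 0.1 for 10 consecutive iterations
        if len(losses) > patience and np.all(np.abs(np.diff(losses[-(patience + 1):])) < tol):
            break
    return m, thetas
```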

2.2.2. Implementation of stacked UNets with self‐assisted priors

The stacked UNet with self‐assisted priors 36 (“UNetmag”) (Figure 1B) was selected as a representative 3D DL method because of its state‐of‐the‐art performance and the authors' thorough ablation study of their network architecture, which is available on GitHub. UNetmag processes 3D volumes in stacks of three adjacent sagittal slices with a stride of 1, first extracting relevant through‐plane information using attention networks before applying in‐plane motion correction using a stacked UNet architecture. The authors showed that incorporating the local through‐plane information enabled them to correct 3D volumes while using only 2D convolution layers. We compared the UNetmag model to another state‐of‐the‐art method 39 that provides volumetric correction using 3D convolution layers. We found that UNetmag provided better image correction when both models were trained with the same dataset; we present our methods and results in the Supporting Information (Figures S7–S9). UNetmag was implemented using TensorFlow v2.4.1. 53 We trained the network with structural similarity index measure (SSIM) loss using the Adam optimizer for 100 epochs, with a batch size of 10 and a learning rate of 0.001 (additional details are provided in Section 2.3). The typical time for running UNetmag on one volume was 4 s.
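The three-slice stacking that UNetmag consumes could be prepared as in the following sketch; the clamped boundary handling and array layout are assumptions of this sketch.

```python
import numpy as np

def three_slice_stacks(volume):
    """Group a 3D volume into overlapping stacks of three adjacent slices (stride 1),
    the input format consumed by the stacked UNet; boundary slices are clamped
    (an assumption of this sketch)."""
    n = volume.shape[0]
    neighbours = np.clip(np.arange(n)[:, None] + np.array([-1, 0, 1]), 0, n - 1)
    return volume[neighbours]  # shape: (n_slices, 3, H, W)

stacks = three_slice_stacks(np.zeros((128, 256, 256)))  # e.g., 128 axial slices
```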

2.2.3. Implementation of UNet‐Assisted JE

Our UNet‐assisted JE algorithm (“UNet+JE”) (Figure 1C) incorporates a modified version of UNetmag with the complex‐valued JE algorithm. We modified the network architecture to process real and imaginary components of complex‐valued data in separate channels. This involved increasing the number of input channels in the first layer and output channels in the last layer of the UNet model, and the rest of the network architecture was kept the same. The resulting network (“UNetcom”) was trained independently from JE using a loss function comprised of a weighted sum of the SSIM of the magnitude, real, and imaginary components of the model's output (Equation 6). UNetcom was trained with the same optimizer, number of epochs, batch size, and learning rate as UNetmag (Table S2).

$\mathcal{L}_{\text{com}} = 0.70 \cdot \text{SSIM}(|\hat{m}|) + 0.15 \cdot \text{SSIM}(\text{Re}(\hat{m})) + 0.15 \cdot \text{SSIM}(\text{Im}(\hat{m}))$  (6)

$\hat{\theta} = \arg\min_{\theta}\; \| E_{\theta} W\, \text{UNet}_{\text{com}}(\hat{m}) - s \|_2^2$  (7a)

$\hat{m} = \arg\min_{m}\; \| E_{\hat{\theta}} m - s \|_2^2$  (7b)
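A possible TensorFlow realization of the loss in Equation 6 is sketched below; the two-channel real/imaginary layout and the intensity scaling to [0, 1] are assumptions of this sketch, not a description of the exact training code.

```python
import tensorflow as tf

def complex_ssim_loss(y_true, y_pred):
    """Sketch of Eq. (6) as a training loss (1 - weighted SSIM).

    Assumes images of shape [batch, H, W, 2] with real/imaginary parts in separate
    channels and intensities scaled to [0, 1].
    """
    def ssim(a, b):
        return tf.reduce_mean(tf.image.ssim(a, b, max_val=1.0))

    re_t, im_t = y_true[..., 0:1], y_true[..., 1:2]
    re_p, im_p = y_pred[..., 0:1], y_pred[..., 1:2]
    mag_t = tf.sqrt(re_t ** 2 + im_t ** 2)
    mag_p = tf.sqrt(re_p ** 2 + im_p ** 2)
    score = (0.70 * ssim(mag_t, mag_p)
             + 0.15 * ssim(re_t, re_p)
             + 0.15 * ssim(im_t, im_p))
    return 1.0 - score

# e.g., model.compile(optimizer="adam", loss=complex_ssim_loss)
```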

The artifact‐removing UNetcom was applied to each intermediate image estimate before the motion estimation step (Equation 7a). The motion estimates were then used to update the image estimate (Equation 7b). All other details of JE remained the same when implementing and evaluating UNet+JE. When evaluated on one test case with 16 motion states, the typical processing time for one iteration of UNet+JE was 65 s.
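As a sketch, one UNet+JE iteration could then look as follows, reusing the JE helpers from the earlier sketch; unet_com stands for the trained network and is an assumed callable, not the paper's exact interface.

```python
from scipy.optimize import minimize

def unet_assisted_je_iteration(m, thetas, s, W, coils, masks, unet_com):
    """One UNet+JE iteration (Eqs. 7a, 7b), reusing data_consistency() and update_image()."""
    m_clean = unet_com(m)                                   # artifact removal before motion estimation
    for n in range(len(thetas)):                            # Eq. (7a)
        obj = lambda th: data_consistency([*thetas[:n], th, *thetas[n + 1:]],
                                          W * m_clean, s, coils, masks)
        thetas[n] = minimize(obj, thetas[n], method="BFGS", options={"maxiter": 1}).x
    m = update_image(m, thetas, s, coils, masks)            # Eq. (7b)
    return m, thetas
```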

2.3. Training and testing datasets with simulated motion corruption

We generated data with simulated motion corruption using publicly available T1‐weighted 3D MPRAGE 54 k‐space data 55 (the scanning parameters are summarized in Table 1). This dataset contained fully sampled, 12‐channel data from 67 participants with negligible motion artifacts. The coil sensitivity profiles were estimated using the Berkeley Advanced Reconstruction Toolbox's (BART) ESPIRiT algorithm. 56 , 57 Data from 60 randomly selected participants were used to generate the training dataset, while the remaining seven participants were reserved for testing. Two of the seven test participants were subsequently excluded because of large wrap‐around artifacts, leaving five for testing.

TABLE 1.

The scanning parameters for the dataset with simulated motion corruption (generated from a public dataset) and our acquired dataset with in vivo motion corruption.

Parameter                   Dataset with simulated motion corruption   Dataset with in vivo motion corruption
TR/TE/TI (ms)               1600/2.75/650                              1600/2.75/650
Phase encode directions     PE1: A/P; PE2: L/R                         PE1: A/P; PE2: L/R
Resolution (mm)             1 × 1 × 1                                  1 × 1 × 1
Acceleration                None                                       None
Partial Fourier             None                                       None
Bandwidth (Hz/px)           260                                        260
Matrix size                 224 × 218 × 180                            256 × 256 × 192
Acquisition time (min)      6:00                                       6:50

Note: The image contrast of the acquired dataset with in vivo motion corruption was matched to the public dataset that was used to generate the test dataset with simulated motion corruption.

Rigid‐body head motion was simulated using the MRI signal encoding model (Equation 2). The sampling pattern had no acceleration and was designed with an interleaved pattern along the first phase encoding (PE1) direction. The PE1 steps were subdivided into 14 segments (“shots”), each one resembling a Cartesian sampling pattern with a down‐sampling factor of R = 14 (Figure S1A). Each shot was assigned a discrete motion state, and the first shot corresponded to zero motion. Given that the TR of the acquisition was 1.6 s (Table 1), this corresponded to a duration of 22.4 s per motion state. We also evaluated UNet+JE on simulated test cases with a finer temporal resolution of 3.2 s, as documented in the Supporting Information (Table S3 and Figures S10–S11).
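The interleaved binning of PE1 steps into shots described above could be generated as in the following sketch; the function name and boolean-mask representation are choices of this sketch rather than the exact implementation.

```python
import numpy as np

def interleaved_shot_masks(n_pe1, n_shots):
    """Assign PE1 lines to shots in an interleaved fashion (cf. Figure S1A):
    shot k samples lines k, k + n_shots, k + 2*n_shots, ..., so every shot
    covers both low and high spatial frequencies."""
    masks = np.zeros((n_shots, n_pe1), dtype=bool)
    for k in range(n_shots):
        masks[k, k::n_shots] = True
    return masks

# 218 PE1 lines binned into 14 shots, i.e., an effective per-shot undersampling of R = 14
pe1_masks = interleaved_shot_masks(n_pe1=218, n_shots=14)
```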

The motion trajectories were generated using a pseudo‐random walk algorithm with different probabilities of moving and displacement ranges for the 6 degrees of freedom. We established four levels of motion corruption, which are described in Table S1. The parameters for generating “mild” (e.g., maximum of ~1 mm and ~1°) and “moderate” (e.g., maximum of ~3 mm and ~3°) motion were determined based on typical head motion characteristics reported in previous works. 36 , 58 , 59 We implemented the “large” and “extreme” simulation parameters to simulate clinical cases with severe patient motion.
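A minimal sketch of such a pseudo-random walk is shown below; the parameter values are illustrative only, and the actual per-level settings are those listed in Table S1.

```python
import numpy as np

def random_walk_trajectory(n_shots, p_move, max_step, rng=None):
    """Pseudo-random walk over shots for the 6 rigid-body degrees of freedom.

    p_move and max_step are per-DOF probabilities of moving and maximum step sizes
    (illustrative values; the actual severity-level parameters are in Table S1).
    The first shot is motion-free.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_move, max_step = np.broadcast_to(p_move, 6), np.broadcast_to(max_step, 6)
    traj = np.zeros((n_shots, 6))  # tx, ty, tz (mm), rx, ry, rz (deg)
    for n in range(1, n_shots):
        move = rng.random(6) < p_move
        step = rng.uniform(-max_step, max_step)
        traj[n] = traj[n - 1] + move * step
    return traj

# e.g., a "mild"-like trajectory (hypothetical parameters)
mild_trajectory = random_walk_trajectory(n_shots=14, p_move=0.2, max_step=0.5)
```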

2.3.1. Training datasets with simulated motion corruption

To evaluate the impact of data distribution shifts, we generated two training datasets (Table S2). Training dataset A consisted of 240 training samples, which were generated from 2 “mild” and 2 “moderate” motion simulations applied to each of the 60 images that were reserved for training. Training dataset B includes an additional 240 training samples with “large” and “extreme” levels of motion corruption, amounting to 480 training samples. We trained the UNetmag and UNetcom models de novo using the two training datasets separately, resulting in UNetmag,A and UNetmag,B as well as UNet+JEA and UNet+JEB. The SSIM training loss curves for all models are plotted in Figure S2.

Although Al‐Masni et al. 36 had trained UNetmag on sagittal slices from 83 participants (21 248 sagittal slices), we decided to train on axial slices because the sagittal orientation distributes non‐brain head anatomy (e.g., neck) across all slices, which is not relevant for our brain‐specific application. To account for memory constraints and to reduce the number of slices containing non‐brain anatomy, we trained only on the central 128 slices out of 256 axial slices per volume, resulting in 30 720 and 61 440 axial slices for training datasets A and B.

2.3.2. Testing dataset with simulated motion corruption

To generate the testing dataset (n = 40), we imposed eight motion trajectories onto each of the five images reserved for testing (Table S2). For each test subject, we generated two motion trajectories from each of the four levels of motion corruption. We evaluated five different motion correction methods on this testing dataset: JE, UNetmag,A, UNetmag,B, UNet+JEA, and UNet+JEB.

2.4. Testing dataset with in vivo motion corruption

Ten healthy volunteers (age range: 24–47; 4 females) participated in this study with informed consent. The study conformed to the Declaration of Helsinki and was approved by the Research Ethics Board of the University Health Network. All scans were performed on a Siemens MAGNETOM Prisma 3 T scanner (Siemens Healthineers) at the Slaight Family Centre for Advanced MRI (Toronto Western Hospital). We acquired T1‐weighted 3D MPRAGE data with scanning parameters that were matched to the dataset with simulated motion (Table 1), except for a larger FOV to minimize wrap‐around artifacts.

We acquired a pair of images from each subject: a reference scan where the subject was asked to be stationary and a motion corrupted scan. The participants were trained to identify their neutral, maximum (upward nod), and minimum (downward nod) head positions that they felt comfortable performing within the head coil. The participants were supervised while they practiced a sinusoidal trajectory that spanned these three positions across 16 timepoints (Figure S3). This corresponded to a duration of 25.6 s per motion state. During the scan, the participants wore MRI‐compatible headphones and were verbally cued to switch between the head positions at the onset of each shot. Participants were asked to maintain their pose for the duration of the shot to minimize intrashot motion.

The reference scans were acquired with a linear order of PE, whereas the motion‐corrupted scans were acquired with a custom interleaved pattern along the PE1 direction (Figure S1B). Data was acquired using a 20‐channel head and neck coil, and all participants were placed in a head‐first supine pose. The coil sensitivity profiles were estimated using BART's ESPIRiT algorithm. 56 To focus on the data distribution shift of the in vivo testing dataset relative to training dataset A, we evaluated three different motion correction methods on this testing dataset: JE, UNetmag,A, and UNet+JEA.

2.5. Image quality and convergence rate assessment

We evaluated the performance of the different methods by computing the image percent error (ϵ, Equation 8) and SSIM, 60 which are standard full‐reference image quality metrics (IQM). Additionally, gray‐matter (GM) segmentations were evaluated to assess the recovery of gray‐white matter boundaries. 34 , 38 , 61 This was done using FSL FAST 62 because its image‐based algorithm better reflects the impact of motion artifacts than other algorithms that use additional priors (e.g., SPM). 63 The GM segmentation quality was quantified using Dice coefficients (Dice). Furthermore, we compared the convergence rates of JE and UNet+JE by evaluating their runtimes. Finally, we compared the motion estimation accuracy of JE and UNet+JEA/UNet+JEB for the simulated dataset.

To isolate the impact of motion artifacts within the brain, all images underwent rigid‐body registration 64 to their corresponding reference images and underwent skull‐stripping 65 before evaluating IQMs. All figures presenting image results (e.g., Figures 3, 4, 8) are plotted post‐registration. To account for the varying image quality across different scans, we reported the relative change of each IQM (Equation 9). We performed Friedman tests against the null hypothesis that all methods provided the same median relative change in each IQM. We then made pairwise comparisons by performing paired 2‐tailed sign tests with Bonferroni correction.

$\epsilon = \dfrac{\| m_{\text{final}} - m_{\text{ref}} \|_2}{\| m_{\text{ref}} \|_2} \times 100$  (8)

$\Delta\text{IQM} = \dfrac{\text{IQM}(m_{\text{final}}) - \text{IQM}(m_{\text{initial}})}{\text{IQM}(m_{\text{initial}})}$  (9)
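The metrics and statistical tests described above could be computed as in the following sketch using NumPy and SciPy; the function names are illustrative, and the Bonferroni correction is left to the caller.

```python
import numpy as np
from scipy.stats import friedmanchisquare, binomtest

def percent_error(m_final, m_ref):
    """Eq. (8): global percent error with respect to the reference image."""
    return 100.0 * np.linalg.norm(m_final - m_ref) / np.linalg.norm(m_ref)

def relative_change(iqm_final, iqm_initial):
    """Eq. (9): relative change of an image quality metric (IQM)."""
    return (iqm_final - iqm_initial) / iqm_initial

def dice(seg_a, seg_b):
    """Dice coefficient between two binary segmentations."""
    seg_a, seg_b = np.asarray(seg_a, bool), np.asarray(seg_b, bool)
    return 2.0 * np.sum(seg_a & seg_b) / (np.sum(seg_a) + np.sum(seg_b))

def paired_sign_test(a, b):
    """Two-tailed sign test on paired samples (ties discarded). A Bonferroni
    correction would multiply the returned p-value by the number of comparisons."""
    diff = np.asarray(a) - np.asarray(b)
    n_pos, n = int(np.sum(diff > 0)), int(np.sum(diff != 0))
    return binomtest(n_pos, n, 0.5, alternative="two-sided").pvalue

# Omnibus Friedman test across methods (one array of relative changes per method), e.g.:
# stat, p = friedmanchisquare(delta_je, delta_unet_mag, delta_unet_je)
```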

FIGURE 3. Results of a test case with simulated “moderate” motion corruption, which falls within the distribution of both training datasets A and B (please see Table S2). The maximum translation and rotation magnitudes are provided on the left of the figure. The top row includes the motion‐free reference image and corrupted image, as well as the results of joint estimation (JE), UNetmag,A, and UNet+JEA. The bottom row shows the results of UNetmag,B and UNet+JEB. The global percent error (ϵ) and structural similarity index measure (SSIM) values are included. The zoomed‐in inset figures illustrate how well the different methods can recover intricate cortical folds, while the red arrows highlight residual artifacts.

FIGURE 4. Results of a test case with simulated “large” motion corruption, which only falls within the distribution of training dataset B (please see Table S2). The maximum translation and rotation magnitudes are provided on the left of the figure. The top row includes the motion‐free reference image and corrupted image, as well as the results of joint estimation (JE), UNetmag,A, and UNet+JEA. The bottom row shows the results of UNetmag,B and UNet+JEB. The global percent error (ϵ) and structural similarity index measure (SSIM) values are included. The zoomed‐in inset figures illustrate how well the different methods can recover intricate cortical folds. The red arrows highlight subtle features that were well‐recovered by JE and UNet+JE compared to UNetmag (for both training datasets A and B).

FIGURE 8. Results of two test cases from the dataset with in vivo motion corruption, featuring moderate (A) and large (B) discrete head motion. For UNetmag and UNet+JE, we only evaluated results for the models that were trained with training dataset A. The maximum motion magnitudes that were estimated by UNet+JEA are included on the left of each row; the true motion trajectories were not measured when acquiring the dataset. In both (A,B), the top and bottom rows display axial and coronal slices of the motion‐free reference image, corrupted image, and the results of the three motion correction methods. The global percent error (ϵ), structural similarity index measure (SSIM), and Dice coefficients are included for each test case.

Figure 2 compares the SSIM distribution of training datasets A and B with the simulated and in vivo test datasets. The simulated test data that fall within the distribution of training dataset A had a median SSIM of 0.85 (interquartile range [IQR]: 0.12); the remaining simulated test data had a median SSIM of 0.49 (IQR: 0.37). The in vivo testing dataset had a median SSIM of 0.60 (IQR: 0.20). Please note that, for the in vivo data, motion induces effects on the MRI signal that are not modeled in Equation 1 (e.g., changes in B0, B1+, and spin history). Therefore, the SSIM values could be lowered by these effects.

FIGURE 2. Comparison of the structural similarity index measure (SSIM) distributions of the training and testing datasets. UNetA and UNetB correspond to the UNet models that were trained with training dataset A and B, respectively (please see Table S2). The SSIM values of training datasets A and B are displayed as orange and blue histograms, respectively. The scatterplot above the histogram displays the image quality of each simulated (n = 40) and in vivo (n = 10) test case. The test cases with simulated motion corruption are color‐coded with respect to their simulation parameters.

3. RESULTS

3.1. Dataset with simulated motion corruption

Figures 3 and 4 show the results of two representative test cases generated with “moderate” (Figure 3) (maximum translation/rotation: 1.37 mm, 1.65°) and “large” (Figure 4) (maximum translation/rotation: 4.19 mm, 1.68°) simulation parameters. In each figure, the two left columns show the data before and after simulated motion corruption. The three right columns show the results of using JE, UNetmag, and UNet+JE, while the top and bottom rows compare the methods when trained with training datasets A and B, respectively. A comparison of the corresponding GM segmentations for each result is included in the Supporting Information (Figures S4 and S5). Coronal and axial slices are presented to highlight the 3D motion correction performance of all methods. The inset zoomed‐in figures highlight intricate gray‐white matter boundaries.

In Figure 3, the corrupted image contained prominent ghosting artifacts (ϵ: 11.60%, SSIM: 0.83), which led to false‐positive/negative labels in its GM segmentation (Figure S4) (Dice: 0.85). Although UNetmag,A (ϵ: 8.73%, SSIM: 0.92, Dice: 0.95) and UNetmag,B (ϵ: 13.55%, SSIM: 0.91, Dice: 0.89) were able to remove ghosting artifacts, both methods produced residual artifacts and blurring as well as apparent changes to the brightness/contrast of the image (particularly for UNetmag,B, which led to worsened ϵ values). In contrast, JE (ϵ: 4.50%, SSIM: 0.98, Dice: 0.97), UNet+JEA (ϵ: 3.81%, SSIM: 0.98, Dice: 0.97), and UNet+JEB (ϵ: 2.02%, SSIM: 0.99, Dice: 0.99) all accurately recovered the image.

In Figure 4 and its corresponding GM segmentations in Figure S5, the anatomical features in the corrupted image have largely been obscured (ϵ: 20.20%, SSIM: 0.78, Dice: 0.84). UNetmag,A (ϵ: 22.80%, SSIM: 0.66, Dice: 0.78) smoothed the appearance of repeated edges, but significant residual artifacts and blurring remained. UNetmag,B (ϵ: 25.06%, SSIM: 0.66, Dice: 0.79) further smoothed the image while preserving the overall contour of the brain to a greater degree than UNetmag,A. However, UNetmag,B did not accurately recover the underlying anatomical features. The JE (ϵ: 9.87%, SSIM: 0.94, Dice: 0.94), UNet+JEA (ϵ: 9.84%, SSIM: 0.94, Dice: 0.93), and UNet+JEB (ϵ: 9.86%, SSIM: 0.94, Dice: 0.93) algorithms recovered the image with high fidelity, although residual artifacts remained, in contrast to the near‐perfect image recovery that these methods achieved in Figure 3.

Figures 5 and 6 summarize the performance of all methods on subsets of the simulated testing dataset that are within and outside of the distribution of training dataset A, respectively. In both figures, the boxplots in (A–C) display the relative change in percent error (Δrϵ), SSIM (ΔrSSIM), and Dice coefficient (ΔrDice). The median relative change in each IQM is reported below each corresponding boxplot. Additionally, the results of the 2‐tailed sign test are annotated.

FIGURE 5. Comparing the performance of all five methods across the subset of the simulated testing dataset that falls within the distribution of training dataset A (n = 20) (refer to Table S2). (A–C) compare the distribution of relative change (Δr) in percent error (ϵ), structural similarity index measure (SSIM), and Dice, respectively. (D) compares the convergence rates of joint estimation (JE), UNet+JEA, and UNet+JEB for the simulated test dataset. The boxplots are annotated with p‐values from paired 2‐tailed sign tests with Bonferroni correction. The median of each boxplot is displayed below each subfigure.

FIGURE 6. Similar to Figure 5, this figure compares the performance of all methods across the simulated test cases that fall outside of the distribution of training dataset A (n = 20) (refer to Table S2). (A–C) compare the distribution of relative change (Δr) in percent error (ϵ), structural similarity index measure (SSIM), and Dice, respectively. (D) compares the convergence rates of joint estimation (JE) and UNet+JE for the simulated test dataset. The boxplots are annotated with p‐values from paired 2‐tailed sign tests with Bonferroni correction. The median of each boxplot is displayed below each subfigure.

First, considering only the methods trained on training dataset A, JE and UNet+JEA consistently outperformed UNetmag,A with strong statistical significance (p < 5.0·10⁻⁴ for all IQMs). In particular, UNetmag,A provided minimal improvement in the accuracy of the GM segmentations (median ΔrDice: 0.05 and 0.03 in Figures 5C and 6C) compared to JE (median ΔrDice: 0.12 and 0.37) and UNet+JEA (median ΔrDice: 0.12 and 0.33). Although JE and UNet+JEA did not differ in image quality, UNet+JEA converged significantly faster than JE (p < 5.0·10⁻⁴), reducing the runtime by a median factor of 2.35 (Figure 5D) and 3.20 (Figure 6D) for test cases with “mild/moderate” and “large/extreme” motion corruption, respectively.

When comparing the impact of training distributions on UNetmag,A/UNetmag,B and UNet+JEA/UNet+JEB, there were generally no statistically significant differences in image quality for either subset of the testing dataset (Figures 5A–C and 6A–C). The only exceptions were for UNet+JEA/UNet+JEB for Δrϵ with “mild/moderate” test cases (Figure 5A) (W = 5.00, p = 2.07·10⁻¹) and for ΔrDice with “large/extreme” test cases (Figure 6C) (W = 4.00, p = 5.77·10⁻¹). There were no statistically significant differences in the convergence rates of UNet+JEA and UNet+JEB across the entire testing dataset (Figures 5D and 6D). Similar to UNet+JEA, UNet+JEB provided a greater reduction in runtime for “large/extreme” motion cases (median factor: 3.8) than for “mild/moderate” motion cases (median factor: 2.0).

To further investigate the impact of training distributions on UNet+JEA/UNet+JEB, Figure 7 compares the motion estimation error (x_est − x_true) of JE, UNet+JEA, and UNet+JEB. Although all methods generally estimated the motion parameters with high accuracy (median errors on the order of 10⁻³ mm and degrees for all methods), only UNet+JEB led to errors greater than 1.0 mm or 1.0 deg. These cases appear to be outliers and generally occurred for larger motion parameters (e.g., translations >5 mm; rotations >5 deg). Figure S6 presents one such test case where UNet+JEB converged to a local minimum.

FIGURE 7. The error (x_est − x_true) of the motion parameters estimated by joint estimation (JE), UNet+JEA, and UNet+JEB across all n = 40 simulated test cases. Rows (A,B) plot the distribution of the accuracy of the estimated translation and rotation parameters against the ground truth motion parameters, which have been grouped to the nearest 1 mm or 1 deg along the x‐axis. All three methods generally estimated motion parameters with high accuracy (median errors of 0.0 for all methods). However, only UNet+JEB exhibited outlier cases where motion parameters were mis‐estimated by >1.0 mm or >1.0 deg. Please refer to Figure S6 for an example of a test case where UNet+JEB converged to a local minimum.

3.2. Dataset with in vivo motion corruption

Figure 8 shows two test cases from the dataset with in vivo motion corruption. In Figure 8A, the image contained moderately strong ghosting artifacts (ϵ: 24.84%, SSIM: 0.72, Dice: 0.80). Although UNetmag,A reduced their appearance, residual artifacts and blurring remained (ϵ: 30.11%, SSIM: 0.75, Dice: 0.79). Although JE (ϵ: 18.16%, SSIM: 0.91, Dice: 0.95) and UNet+JEA (ϵ: 18.69%, SSIM: 0.91, Dice: 0.95) provided more accurate image recovery, both contained mild residual artifacts. This led to slightly lower improvements in ϵ, SSIM, and Dice compared to the near‐perfect correction of the simulated test dataset (e.g., Figure 3). JE and UNet+JEA converged after 204 and 84 min, respectively.

In Figure 8B, the corrupted image contained strong artifacts (ϵ: 37.87%, SSIM: 0.49, Dice: 0.66). Using UNetmag,A on its own sharpened the appearance of the ventricles and the cortical folds of the posterior lobes; however, anatomical features remained largely obscured (ϵ: 41.08%, SSIM: 0.54, Dice: 0.70). Although the outputs of JE (ϵ: 22.91%, SSIM: 0.85, Dice: 0.90) and UNet+JEA (ϵ: 22.25%, SSIM: 0.85, Dice: 0.90) contained obvious residual artifacts, they recovered more of the underlying anatomical features, leading to significant improvements in the accuracy of their GM segmentations. JE and UNet+JEA converged after 533 and 119 min, respectively.

Figure 9 compares the three methods across the entire in vivo dataset. In Figure 9A–C, JE and UNet+JEA outperformed UNetmag,A with strong statistical significance (p < 1.0·10⁻³ for all 3 IQMs). Notably, UNetmag,A consistently provided negligible improvement in SSIM (median ΔrSSIM: 0.05, IQR: 0.07) and GM segmentation quality (median ΔrDice: 0.03, IQR: 0.04). Although JE and UNet+JEA did not statistically differ in any IQM, Figure 9D shows that UNet+JEA converged significantly faster (W = 4.1, p = 2.86·10⁻³), leading to a reduction in runtime by a median factor of 4.05.

FIGURE 9. The performance of the three motion correction methods across the test dataset with in vivo motion corruption (n = 10). (A–C) compare the distribution of relative change (Δr) in percent error (ϵ), structural similarity index measure (SSIM), and Dice, respectively. (D) compares the convergence rates of joint estimation (JE) and UNet+JE for the in vivo test dataset. The boxplots are annotated with p‐values from paired 2‐tailed sign tests with Bonferroni correction. The median of each boxplot is displayed below each subfigure.

4. DISCUSSION

Although recent RMC methods 43 , 44 , 45 , 46 , 47 , 48 have demonstrated the benefits of leveraging both DL and physical modeling, these methods are limited to 2D applications and their generalizability to different levels of motion corruption has yet to be evaluated. UNet+JE is the first hybrid DL‐JE algorithm that can provide 3D motion correction and that has been assessed under various extents of motion corruption relative to its training distribution.

For the simulated dataset, we found that UNet+JEA/UNet+JEB consistently outperformed UNetmag,A/UNetmag,B in all metrics, even under strong distribution shifts. This suggests that the JE subcomponent of UNet+JE conferred greater robustness to the overall algorithm. Furthermore, we found that UNet+JEA/UNet+JEB converged faster than JE, which corroborates findings reported by Haskell et al. 43 In particular, we found that both UNet+JEA and UNet+JEB provided greater reduction in runtime for “large/extreme” motion cases than for “mild/moderate” cases, which indicates that UNet+JE offers greater performance gains for more heavily corrupted test cases, regardless of its training distribution. These results were echoed in the in vivo dataset, where JE and UNet+JEA outperformed UNetmag,A in all IQMs, and UNet+JEA converged faster than JE.

Although UNet+JEA and UNet+JEB did not exhibit any statistically significant differences in IQMs, we found that only UNet+JEB yielded outlier cases where the algorithm converged to local minima (e.g., Figure S6). This suggests that training with the augmented dataset B led to more “hallucinated” image features and blurring that could hinder the joint estimation algorithm. It remains an open question how to optimally train a neural network to generalize over a large distribution of test cases, particularly for clinical applications (e.g., more training samples; transfer learning from “specialized” models tailored to correcting large/extreme motion, etc.).

We found that UNetmag,A/UNetmag,B were able to reduce the appearance of ghosting artifacts (particularly in the background) while preserving the overall contour of the skull. However, both models struggled to reliably recover image features, and their outputs frequently contained residual artifacts and image blurring, even for test cases that fell within the models' training distributions (e.g., Figure 3). On a case‐by‐case basis for “large” and “extreme” motion corruption, UNetmag,B appeared to remove more ghosting artifacts than UNetmag,A, albeit with a greater extent of image blurring and “hallucinated” image features; however, both models were still unable to recover the actual underlying image features.

Despite these discrepancies, the ϵ, SSIM, and Dice values were similar between the image outputs of both UNetmag models, with no statistically significant differences. We believe that this is because of limitations of these full‐reference IQMs. Although these IQMs can be used to describe how dissimilar the image outputs are relative to their corresponding reference data, they do not directly describe how the image outputs differ from each other. In particular, while RMC outputs with “better” IQM scores (e.g., ϵ → 0, SSIM and Dice → 1) are close to their reference image and, therefore, necessarily more similar to each other, image outputs that are highly dissimilar from each other (e.g., in terms of smoothness) can share similarly poor IQM values (e.g., ϵ > 20%, SSIM and Dice < 0.80) with respect to their reference image (e.g., UNetmag,A and UNetmag,B in Figure 4, both of which share an SSIM of 0.66). We believe that reference‐free IQMs that are tailored to quantifying salient image textures for our application (i.e., repeated edges, smoothing, and hypo‐intensities) would enable better comparisons between the different RMC methods. 66

Another major limitation of our work is that UNet+JE requires data to be acquired with an interleaved PE1 ordering because the standard sequential ordering is known to lead to unfavorable convergence properties for JE. 23 , 24 Although optimized sampling patterns have been proposed for JE (i.e., DISORDER 25 ), they require more sequence modifications and significantly amplify the appearance of motion artifacts. 27 Although Levac et al. 46 explored using unsupervised learning to increase robustness to different sampling patterns, their Cartesian data were still simulated with interleaved sampling. We are currently exploring the use of regression neural networks for motion estimation 48 to circumvent the limitations of classical optimizers.

Additionally, UNet+JE currently assigns one motion state to each shot, similar to previous JE methods. 47 We instructed our participants to move accordingly to minimize intrashot motion. In the Supporting Information, we outline and validate a multi‐resolution version of the UNet+JE algorithm, which consistently improved image quality for simulated test cases at a temporal resolution of 3.2 s. To estimate arbitrary motion trajectories at finer temporal resolutions, we will explore shot rejection 24 and intra‐shot 43 , 46 motion estimation approaches. Although data consistency‐based JE methods require modified k‐space trajectories and assumptions regarding discrete head motion, the results of our in vivo study and preliminary simulation results with intrashot motion show that UNet+JE can still improve image quality despite violations of the method's assumptions.

Finally, in clinical settings, UNet+JE would need to preserve anatomical features (including pathology) with high fidelity, depending on the use case. Although UNet+JE has yet to be tested on clinical data, its performance on distribution shifts with respect to severe motion artifacts is promising. UNet+JE consistently yielded better image quality than the initial corrupted image for all test cases. Although residual artifacts remain visible for more severely corrupted cases (e.g., Figure 4, Figure 8B), UNet+JE recovered anatomical features in images that were otherwise diagnostically unusable. Previous studies 61 , 67 have been conducted to assess the utility of RMC methods on clinical data. We will explore transfer learning 68 approaches to ensure that UNet+JE generalizes well for clinical data.

5. CONCLUSIONS

We have presented a novel retrospective motion correction method (UNet+JE) that combines a neural network (UNetmag) with a physics‐based algorithm (JE). UNet+JE is the first hybrid DL‐JE method that has been applied to 3D volumetric correction. When assessed on simulated and in vivo data with varying levels of motion corruption, we found that UNet+JE was more robust to data distribution shifts than UNetmag, while converging to similar image outputs as JE in fewer iterations. Our findings demonstrate that it is worthwhile to combine DL and classical MRI signal modeling to leverage their respective benefits.

CONFLICT OF INTEREST

Z.W. now works for Siemens Healthineers, Oakville, ON, Canada.

Supporting information

Figure S1. Illustrations of the interleaved sampling patterns used to generate the simulated (A) and in vivo (B) datasets. In both subfigures, each column (delineated by dashed lines) plots the sampling pattern of the first phase encoding (PE1) and second phase encoding (PE2) steps for a single shot. In (A), the 218 PE1 steps are binned into 14 shots. The first 8 shots contain 16 PE1 steps, while the last 6 shots contain 15 PE1 steps; all shots have an effective undersampling factor of R = 14. In (B), the 256 PE1 steps are binned into 16 shots, each containing 16 equidistant PE1 lines.

Figure S2. Plots of the training loss (1‐SSIM) curves for all UNet models. The top row corresponds to the magnitude‐only UNet models (UNetmag), while the bottom row displays the complex‐valued UNet models (UNetcom). The left and right columns display the results of training with Training Dataset A (UNetmag,A, UNetcom,A) and Dataset B (UNetmag,B, UNetcom,B), respectively. The blue and orange curves correspond to the loss values for the training and validation datasets. All UNet models were trained for 100 epochs and displayed stable convergence.

Figure S3. Illustrating the targeted head motion trajectory performed by participants for the in vivo data acquisitions. Participants were instructed to perform pitch rotations (i.e., rotations about the left–right axis). The gray dashed lines demarcate the minimum (downward nod), neutral, and maximum (upward nod) head positions which the participants were trained to identify before the scan. The blue solid lines indicate the relative head positions that the participants were asked to maintain during each shot, while the dotted blue lines indicate transitions between head positions.

Figure S4. The gray matter segmentations corresponding to each image displayed in Figure 3 (main text). For each result, the top image displays gray matter segmentations (bluish‐gray) overlayed on top of their corresponding anatomical images; the global Dice coefficients are included. The bottom image plots the difference image of each gray matter segmentation with respect to the reference segmentation; false‐positive and false‐negative labels are displayed as red and navy blue labels, respectively.

Figure S5. Similar to Figure S4, this figure shows the gray matter segmentations corresponding to each image displayed in Figure 4 (main text). For each result, the top image displays gray matter segmentations (bluish‐gray) overlayed on top of their corresponding anatomical images; the global Dice coefficients are included. The bottom image plots the difference image of each gray matter segmentation with respect to the reference segmentation; false‐positive and false‐negative labels are displayed as red and navy blue labels, respectively.

Figure S6. An example of a simulated test case where UNet+JEB converged to a local minimum. (A) A test case with “extreme” motion corruption. The percent error (ϵ) and structural similarity index (SSIM) are provided. (B) The relative data consistency (DC) loss is plotted for the JE, UNet+JEA and UNet+JEB algorithms. While JE and UNet+JEA converge to similar DC values (0.06), UNet+JEB resulted in a larger DC value (0.17). The dashed vertical lines denote the iteration at which each algorithm converged. (C) Comparing the image estimates provided by UNetmag,A, UNetmag,B, as well as the image and motion estimates provided by UNet+JEA and UNet+JEB. In the motion trajectory plots, the dashed lines correspond to the ground truth parameters (“T_GT”, “R_GT”), while the solid lines are the estimated parameters (“T_est”, “R_est”). The red dashed rectangle highlights motion states that were mis‐estimated by UNet+JEB.

Figure S7. Comparing the training loss curves of the UNetAl‐Masni et al (Al‐Masni et al. NeuroImage 2021; left column) and the UNetDuffy et al (Duffy et al. Nature 2021; right column) models. The former was trained with structural similarity (SSIM) loss, while the latter was trained with mean absolute error (MAE) loss. Both SSIM and MAE loss values were evaluated during all training sessions and displayed for completeness. Both models were trained with magnitude‐only data from Training Dataset A; the dataset was transformed into 3D image patches to be compatible with the 3D convolutions used by UNetDuffy et al.

Figure S8. Comparing the performance of UNetAl‐Masni et al (corresponds to “UNetmag,A” in main text) and UNetDuffy et al with the same simulated test cases from Figures 3 and 4. In (A), this test case falls within the training distribution of both models. Both methods improve the image quality of the corrupted scan, as reflected by the improved global SSIM values and the reduction of ghosting artifacts in the inset zoomed‐in ROIs. In (B), this test case falls outside of the training distribution of both UNet models. While both methods produced images with residual artifacts and strong image blurring, UNetDuffy et al produces stronger image distortions, leading to worse ϵ and SSIM than the initial corrupted image.

Figure S9. Comparing UNetAl‐Masni et al and UNetDuffy et al across the simulated testing dataset (N = 40). Plots (A) – (C) compare the distribution of relative change (Δr) in percent error (ϵ), SSIM, and Dice, respectively. The boxplots are annotated with p‐values from evaluating paired two‐tailed sign test samples, and the median of each boxplot is displayed below each subfigure. Based on the median relative changes in each IQM, the UNetAl‐Masni et al outperformed UNetDuffy et al in all 3 IQMs with strong statistical significance.

Figure S10. (A) A test case with simulated intrashot motion corruption. Motion parameters were simulated at a temporal resolution of 3.2 seconds. This test case corresponds to Subject 5 in Figure S11. (B) Comparing the image output and estimated motion parameters at Level 1 and Level 4 of the multiresolution UNet+JE algorithm. The top row displays the intermediate image and motion estimates at Level 1, in which the translation and rotation parameters are estimated at a temporal resolution of 25.6 seconds. The bottom row displays the final image and motion estimates at Level 4 of the algorithm, where motion parameters are estimated with a resolution of 3.2 seconds. The percent error (ϵ) and structural similarity index (SSIM) are included. (C) The change in total data consistency loss across all iterations of the multiresolution UNet+JE algorithm.

Figure S11. Displaying the change in image quality across the multi‐resolution UNet+JE algorithm for the 5 test cases with simulated intrashot motion. Plots (A) – (C) show the RMSE, SSIM, and Dice, respectively. While the RMSE trajectories display non‐monotonic behavior for certain test cases, the SSIM and Dice trajectories display more monotonic increase in image quality across each level.

Table S1. A summary of the motion corruption parameters for the training dataset. Motion trajectories were generated as a pseudo‐random walk from shot to shot, with different probabilities and step intervals assigned to each of the 3 translational and 3 rotational degrees of freedom.

Table S2. Details of the two datasets generated for training the UNet models. Training Dataset A consists of 240 samples, generated from data of 60 subjects that each underwent 2 simulations with “mild” motion corruption and 2 simulations with “moderate” motion corruption. Similarly, Training Dataset B consists of 480 samples generated from the same 60 subjects, now augmented to include “large” and “extreme” levels of motion corruption. During training, the number of epochs, the learning rate, and the batch size were kept the same between the different training sessions with Dataset A and B.

Table S3. Description of the multi‐resolution UNet+JE algorithm. The algorithm is divided into 4 levels, where each level carries out motion estimation and correction at increasingly higher temporal resolutions. Level 1 is equivalent to the standard UNet+JEA presented in the main text.


ACKNOWLEDGMENTS

We thank Dr. Melissa Haskell for sharing helpful insights about the NAMER algorithm. We also thank Dr. Mohammed Al‐Masni for sharing code for the Stacked UNets with Self‐Assisted Priors. Finally, we thank Asma Naheed for her assistance with data acquisition.

Nghiem B., Wu Z., Kashyap S., Kasper L., and Uludağ K., “A network‐assisted joint image and motion estimation approach for robust 3D MRI motion correction across severity levels,” Magnetic Resonance in Medicine 95, no. 1 (2026): 363–381, 10.1002/mrm.70052.

DATA AVAILABILITY STATEMENT

The source code is openly available on GitHub at https://github.com/BRAIN-TO/PyMoCo_v2. The data that support the findings of this study are available from the corresponding author on reasonable request.

REFERENCES

  • 1. Zaitsev M, Maclaren J, Herbst M. Motion artifacts in MRI: a complex problem with many partial solutions. J Magn Reson Imaging. 2015;1:887‐901. doi: 10.1002/jmri.24850 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. van der Kouwe A. Motion artifacts and correction in neuro MRI. Advances in Magnetic Resonance Technology and Applications. Vol 4. Elsevier; 2021:53‐68. doi: 10.1016/B978-0-12-822479-3.00012-9 [DOI] [Google Scholar]
  • 3. Slipsager JM, Glimberg SL, Søgaard J, et al. Quantifying the Financial Savings of Motion Correction in brain MRI: a model‐based estimate of the costs arising from patient head motion and potential savings from implementation of motion correction. J Magn Reson Imaging. 2020;52:731‐738. doi: 10.1002/jmri.27112 [DOI] [PubMed] [Google Scholar]
  • 4. Andre JB, Bresnahan BW, Mossa‐Basha M, et al. Toward quantifying the prevalence, severity, and cost associated with patient motion during clinical MR examinations. J Am Coll Radiol. 2015;12:689‐695. doi: 10.1016/j.jacr.2015.03.007 [DOI] [PubMed] [Google Scholar]
  • 5. Wald LL. Ultimate MRI. J Magn Reson. 2019;306:139-144. doi: 10.1016/j.jmr.2019.07.016
  • 6. Kim H, Lepage C, Maheshwary R, et al. NEOCIVET: towards accurate morphometry of neonatal gyrification and clinical applications in preterm newborns. Neuroimage. 2016;138:28-42. doi: 10.1016/j.neuroimage.2016.05.034
  • 7. Reuter M, Tisdall MD, Qureshi A, Buckner RL, van der Kouwe AJW, Fischl B. Head motion during MRI acquisition reduces gray matter volume and thickness estimates. Neuroimage. 2015;107:107-115. doi: 10.1016/j.neuroimage.2014.12.006
  • 8. Power JD, Silver BM, Silverman MR, Ajodan EL, Bos DJ, Jones RM. Customized head molds reduce motion during resting state fMRI scans. Neuroimage. 2019;189:141-149. doi: 10.1016/j.neuroimage.2019.01.016
  • 9. Maclaren J, Armstrong BSR, Barrows RT, et al. Measurement and correction of microscopic head motion during magnetic resonance imaging of the brain. PLoS One. 2012;7:e48088. doi: 10.1371/journal.pone.0048088
  • 10. Aranovitch A, Haeberlin M, Gross S, et al. Prospective motion correction with NMR markers using only native sequence elements. Magn Reson Med. 2018;79:2046-2056. doi: 10.1002/mrm.26877
  • 11. Slipsager JM, Ellegaard AH, Glimberg SL, et al. Markerless motion tracking and correction for PET, MRI, and simultaneous PET/MRI. PLoS One. 2019;14:e0215524. doi: 10.1371/journal.pone.0215524
  • 12. Frost R, Wighton P, Karahanoğlu FI, et al. Markerless high-frequency prospective motion correction for neuroanatomical MRI. Magn Reson Med. 2019;82:126-144. doi: 10.1002/mrm.27705
  • 13. Tisdall MD, Hess AT, Reuter M, Meintjes EM, Fischl B, van der Kouwe AJW. Volumetric navigators for prospective motion correction and selective reacquisition in neuroanatomical MRI. Magn Reson Med. 2012;68:389-399. doi: 10.1002/mrm.23228
  • 14. Gallichan D, Marques JP, Gruetter R. Retrospective correction of involuntary microscopic head movement using highly accelerated fat image navigators (3D FatNavs) at 7T. Magn Reson Med. 2016;75:1030-1039. doi: 10.1002/mrm.25670
  • 15. Wallace TE, Afacan O, Waszak M, Kober T, Warfield SK. Head motion measurement and correction using FID navigators. Magn Reson Med. 2019;81:258-274. doi: 10.1002/mrm.27381
  • 16. Ulrich T, Riedel M, Pruessmann KP. Servo navigators: linear regression and feedback control for rigid-body motion correction. Magn Reson Med. 2024;91:1876-1892. doi: 10.1002/mrm.29967
  • 17. Batchelor PG, Atkinson D, Irarrazaval P, Hill DLG, Hajnal J, Larkman D. Matrix description of general motion correction applied to multishot images. Magn Reson Med. 2005;54:1280. doi: 10.1002/mrm.20656
  • 18. Atkinson D, Hill DLG, Stoyle PNR, Summers PE, Keevil SF. Automatic correction of motion artifacts in magnetic resonance images using an entropy focus criterion. IEEE Trans Med Imaging. 1997;16:903-910. doi: 10.1109/42.650886
  • 19. Lin W, Song HK. Improved optimization strategies for autofocusing motion compensation in MRI via the analysis of image metric maps. Magn Reson Imaging. 2006;24:751-760. doi: 10.1016/j.mri.2006.02.003
  • 20. Loktyushin A, Nickisch H, Pohmann R, Schölkopf B. Blind retrospective motion correction of MR images. Magn Reson Med. 2013;70:1608-1618. doi: 10.1002/mrm.24615
  • 21. Loktyushin A, Nickisch H, Pohmann R, Schölkopf B. Blind multirigid retrospective motion correction of MR images. Magn Reson Med. 2015;73:1457-1468. doi: 10.1002/mrm.25266
  • 22. Odille F, Vuissoz PA, Marie PY, Felblinger J. Generalized reconstruction by inversion of coupled systems (GRICS) applied to free-breathing MRI. Magn Reson Med. 2008;60:146-157. doi: 10.1002/mrm.21623
  • 23. Cordero-Grande L, Teixeira RPAG, Hughes EJ, Hutter J, Price AN, Hajnal JV. Sensitivity encoding for aligned multishot magnetic resonance reconstruction. IEEE Trans Comput Imaging. 2016;2:266-280. doi: 10.1109/tci.2016.2557069
  • 24. Cordero-Grande L, Hughes EJ, Hutter J, Price AN, Hajnal JV. Three-dimensional motion corrected sensitivity encoding reconstruction for multi-shot multi-slice MRI: application to neonatal brain imaging. Magn Reson Med. 2018;79:1365-1376. doi: 10.1002/mrm.26796
  • 25. Cordero-Grande L, Ferrazzi G, Teixeira RPAG, O'Muircheartaigh J, Price AN, Hajnal JV. Motion-corrected MRI with DISORDER: distributed and incoherent sample orders for reconstruction deblurring using encoding redundancy. Magn Reson Med. 2020;84:713-726. doi: 10.1002/mrm.28157
  • 26. Polak D, Splitthoff DN, Clifford B, et al. Scout accelerated motion estimation and reduction (SAMER). Magn Reson Med. 2021;87:163-178. doi: 10.1002/mrm.28971
  • 27. Polak D, Hossbach J, Splitthoff DN, et al. Motion guidance lines for robust data consistency-based retrospective motion correction in 2D and 3D MRI. Magn Reson Med. 2023;89:1777-1790. doi: 10.1002/mrm.29534
  • 28. Spieker V, Eichhorn H, Hammernik K, et al. Deep learning for retrospective motion correction in MRI: a comprehensive review. IEEE Trans Med Imaging. 2024;43:846-859. doi: 10.1109/TMI.2023.3323215
  • 29. Loktyushin A, Schuler C, Scheffler K, Schölkopf B. Retrospective motion correction of magnitude-input MR images. Revised Selected Papers of the First International Workshop on Machine Learning Meets Medical Imaging. Vol 9487. Springer-Verlag; 2015:3-12. doi: 10.1007/978-3-319-27929-9_1
  • 30. Küstner T, Armanious K, Yang J, Yang B, Schick F, Gatidis S. Retrospective correction of motion-affected MR images using deep learning frameworks. Magn Reson Med. 2019;82:1527-1540. doi: 10.1002/mrm.27783
  • 31. Johnson PM, Drangova M. Conditional generative adversarial network for 3D rigid-body motion correction in MRI. Magn Reson Med. 2019;82:901-910. doi: 10.1002/mrm.27772
  • 32. Pawar K, Chen Z, Shah NJ, Egan GF. Suppressing motion artefacts in MRI using an inception-ResNet network with motion simulation augmentation. NMR Biomed. 2019;32:e4225. doi: 10.1002/nbm.4225
  • 33. Lee J, Kim B, Park H. MC2-net: motion correction network for multi-contrast brain MRI. Magn Reson Med. 2021;86:1077-1092. doi: 10.1002/mrm.28719
  • 34. Oksuz I, Clough JR, Ruijsink B, et al. Deep learning-based detection and correction of cardiac MR motion artefacts during reconstruction for high-quality segmentation. IEEE Trans Med Imaging. 2020;39:4001-4010. doi: 10.1109/TMI.2020.3008930
  • 35. Duffy BA, Zhang W, Tang H, et al. Retrospective correction of motion artifact affected structural MRI images using deep learning of simulated motion. https://openreview.net/forum?id=H1hWfZnjM.
  • 36. Al-masni MA, Lee S, Yi J, et al. Stacked U-nets with self-assisted priors towards robust correction of rigid motion artifact in brain MRI. Neuroimage. 2022;259:119411. doi: 10.1016/j.neuroimage.2022.119411
  • 37. Usman M, Latif S, Asim M, Lee BD, Qadir J. Retrospective motion correction in multishot MRI using generative adversarial network. Sci Rep. 2020;10:4786. doi: 10.1038/s41598-020-61705-9
  • 38. Liu S, Thung KH, Qu L, Lin W, Shen D, Yap PT. Learning MRI artefact removal with unpaired data. Nat Mach Intell. 2021;3:60-67. doi: 10.1038/s42256-020-00270-2
  • 39. Duffy BA, Zhao L, Sepehrband F, et al. Retrospective motion artifact correction of structural MRI images using deep learning improves the quality of cortical surface reconstructions. Neuroimage. 2021;230:117756. doi: 10.1016/j.neuroimage.2021.117756
  • 40. Van der Goten LA, Guo J, Smith K. MAMOC: MRI Motion Correction Via Masked Autoencoding. 2024. Accessed August 2, 2024. http://arxiv.org/abs/2405.14590.
  • 41. Hewlett M, Petrov I, Johnson PM, Drangova M. Deep-learning-based motion correction using multichannel MRI data: a study using simulated artifacts in the fastMRI dataset. NMR Biomed. 2024;37:e5179. doi: 10.1002/nbm.5179
  • 42. Levac B, Kumar S, Kardonik S, Tamir JI. FSE compensated motion correction for MRI using data driven methods. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, eds. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. Lecture Notes in Computer Science. Vol 13436. Springer Nature Switzerland; 2022:707-716. doi: 10.1007/978-3-031-16446-0_67
  • 43. Haskell MW, Cauley SF, Bilgic B, et al. Network accelerated motion estimation and reduction (NAMER): convolutional neural network guided retrospective motion correction using a separable motion model. Magn Reson Med. 2019;82:1452-1461. doi: 10.1002/mrm.27771
  • 44. Kuzmina E, Razumov A, Rogov OY, Adalsteinsson E, White J, Dylov DV. Autofocusing+: noise-resilient motion correction in magnetic resonance imaging. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, eds. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. Lecture Notes in Computer Science. Springer Nature Switzerland; 2022:365-375. doi: 10.1007/978-3-031-16446-0_35
  • 45. Hossbach J, Splitthoff DN, Cauley S, et al. Deep learning-based motion quantification from k-space for fast model-based magnetic resonance imaging motion correction. Med Phys. 2023;50:2148-2161. doi: 10.1002/mp.16119
  • 46. Levac B, Kumar S, Jalal A, Tamir JI. Accelerated motion correction with deep generative diffusion models. Magn Reson Med. 2024;92:853-868. doi: 10.1002/mrm.30082
  • 47. Singh NM, Dey N, Hoffmann M, et al. Data consistent deep rigid MRI motion correction. doi: 10.48550/arXiv.2301.10365
  • 48. Dabrowski O, Falcone JL, Klauser A, et al. SISMIK for brain MRI: deep-learning-based motion estimation and model-based motion correction in k-space. IEEE Trans Med Imaging. 2025;44:396-408. doi: 10.1109/TMI.2024.3446450
  • 49. Wang C, Liang Y, Wu Y, Zhao S, Du YP. Correction of out-of-FOV motion artifacts using convolutional neural network. Magn Reson Imaging. 2020;71:93-102. doi: 10.1016/j.mri.2020.05.004
  • 50. Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: sensitivity encoding for fast MRI. Magn Reson Med. 1999;42:952-962.
  • 51. Nocedal J, Wright SJ. Numerical Optimization. 2nd ed. Springer; 2006.
  • 52. Bradbury J, Frostig R, Hawkins P, et al. JAX: Autograd and XLA. Accessed November 14, 2023. https://github.com/google/jax.
  • 53. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. 2016. arXiv:1603.04467.
  • 54. Mugler JP III, Brookeman JR. Three-dimensional magnetization-prepared rapid gradient-echo imaging (3D MP RAGE). Magn Reson Med. 1990;15:152-157. doi: 10.1002/mrm.1910150117
  • 55. Souza R, Lucena O, Garrafa J, et al. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement. Neuroimage. 2018;170:482-494. doi: 10.1016/j.neuroimage.2017.08.021
  • 56. Uecker M, Ong F, Tamir JI, et al. Berkeley advanced reconstruction toolbox. In: Proc Intl Soc Mag Reson Med (Annual Meeting ISMRM, Toronto). 2015;23:2486. https://archive.ismrm.org/2015/2486.html
  • 57. Uecker M, Lai P, Murphy MJ, et al. ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA. Magn Reson Med. 2014;71:990-1001. doi: 10.1002/mrm.24751
  • 58. Hess AT, Alfaro‐Almagro F, Andersson JLR, Smith SM. Head movement in UK Biobank, analysis of 42,874 fMRI motion logs. 2022.
  • 59. Eichhorn H, Vascan AV, Nørgaard M, et al. Characterisation of children's head motion for magnetic resonance imaging with and without general anaesthesia. Front Radiol. 2021;1:1. doi: 10.3389/fradi.2021.789632
  • 60. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13:600-612. doi: 10.1109/TIP.2003.819861
  • 61. Vecchiato K, Egloff A, Carney O, et al. Evaluation of DISORDER: retrospective image motion correction for volumetric brain MRI in a pediatric setting. Am J Neuroradiol. 2021;42:774-781. doi: 10.3174/ajnr.A7001
  • 62. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging. 2001;20:45-57. doi: 10.1109/42.906424
  • 63. Ashburner J, Friston KJ. Unified segmentation. Neuroimage. 2005;26:839-851. doi: 10.1016/j.neuroimage.2005.02.018
  • 64. Avants BB, Epstein CL, Grossman M, Gee JC. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal. 2008;12:26-41. doi: 10.1016/j.media.2007.06.004
  • 65. Hoopes A, Mora JS, Dalca AV, Fischl B, Hoffmann M. SynthStrip: skull-stripping for any brain image. Neuroimage. 2022;260:119474. doi: 10.1016/j.neuroimage.2022.119474
  • 66. Marchetto E, Eichhorn H, Gallichan D, Schnabel JA, Ganz M. Agreement of image quality metrics with radiological evaluation in the presence of motion artifacts. Magn Reson Mater Phys Biol Med. 2025. doi: 10.1007/s10334-025-01266-y
  • 67. Pawar K, Chen Z, Seah J, Law M, Close T, Egan G. Clinical utility of deep learning motion correction for T1 weighted MPRAGE MR images. Eur J Radiol. 2020;133:109384. doi: 10.1016/j.ejrad.2020.109384
  • 68. Knoll F, Hammernik K, Kobler E, Pock T, Recht MP, Sodickson DK. Assessment of the generalization of learned image reconstruction and the potential for transfer learning. Magn Reson Med. 2019;81:116-128. doi: 10.1002/mrm.27355

Associated Data


Supplementary Materials

Figure S1. Illustrations of the interleaved sampling patterns used to generate the simulated (A) and in vivo (B) datasets. In both subfigures, each column (delineated by dashed lines) plots the sampling pattern of the first phase encoding (PE1) and second phase encoding (PE2) steps for a single shot. In (A), the 218 PE1 steps are binned into 14 shots. The first 8 shots contain 16 PE1 steps, while the last 6 shots contain 15 PE1 steps; all shots have an effective undersampling factor of R = 14. In (B), the 256 PE1 steps are binned into 16 shots, each containing 16 equidistant PE1 lines.
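
For readers who want to reproduce the shot structure described in Figure S1, the following minimal NumPy sketch bins PE1 indices into interleaved shots, assuming a simple interleave in which shot s acquires every n_shots-th line; the study's actual ordering of PE1/PE2 steps may differ in detail.

```python
import numpy as np

def interleaved_shots(n_pe1, n_shots):
    """Bin PE1 indices into interleaved shots: shot s acquires every n_shots-th
    line starting at offset s, giving an effective undersampling factor R = n_shots."""
    return [np.arange(s, n_pe1, n_shots) for s in range(n_shots)]

# Simulated protocol (panel A): 218 PE1 steps binned into 14 shots.
shots_sim = interleaved_shots(218, 14)
print([len(s) for s in shots_sim])   # first 8 shots hold 16 lines, last 6 hold 15

# In vivo protocol (panel B): 256 PE1 steps binned into 16 shots of 16 equidistant lines.
shots_vivo = interleaved_shots(256, 16)
print([len(s) for s in shots_vivo])  # every shot holds 16 lines
```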

Figure S2. Plots of the training loss (1‐SSIM) curves for all UNet models. The top row corresponds to the magnitude‐only UNet models (UNetmag), while the bottom row displays the complex‐valued UNet models (UNetcom). The left and right columns display the results of training with Training Dataset A (UNetmag,A, UNetcom,A) and Dataset B (UNetmag,B, UNetcom,B), respectively. The blue and orange curves correspond to the loss values for the training and validation datasets. All UNet models were trained for 100 epochs and displayed stable convergence.
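
For completeness, one way to evaluate the 1-SSIM loss plotted in Figure S2 for a pair of 3D volumes is sketched below using scikit-image; this is an evaluation-time sketch only and is not the TensorFlow training code used for the UNet models.

```python
import numpy as np
from skimage.metrics import structural_similarity

def one_minus_ssim(estimate, reference):
    """Evaluate the 1 - SSIM loss between an image estimate and its motion-free reference."""
    data_range = float(reference.max() - reference.min())
    return 1.0 - structural_similarity(estimate, reference, data_range=data_range)
```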

Figure S3. Illustration of the targeted head motion trajectory performed by participants during the in vivo data acquisitions. Participants were instructed to perform pitch rotations (i.e., rotations about the left–right axis). The gray dashed lines demarcate the minimum (downward nod), neutral, and maximum (upward nod) head positions that participants were trained to identify before the scan. The solid blue lines indicate the relative head positions that participants were asked to maintain during each shot, while the dotted blue lines indicate transitions between head positions.

Figure S4. The gray matter segmentations corresponding to each image displayed in Figure 3 (main text). For each result, the top image displays the gray matter segmentation (bluish-gray) overlaid on its corresponding anatomical image; the global Dice coefficients are included. The bottom image plots the difference of each gray matter segmentation with respect to the reference segmentation; false-positive and false-negative labels are displayed in red and navy blue, respectively.

Figure S5. Similar to Figure S4, this figure shows the gray matter segmentations corresponding to each image displayed in Figure 4 (main text). For each result, the top image displays the gray matter segmentation (bluish-gray) overlaid on its corresponding anatomical image; the global Dice coefficients are included. The bottom image plots the difference of each gray matter segmentation with respect to the reference segmentation; false-positive and false-negative labels are displayed in red and navy blue, respectively.
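
The global Dice coefficients and the false-positive/false-negative difference maps reported in Figures S4 and S5 can be computed from binary gray matter masks as in the following sketch; variable names are illustrative and not taken from the study's code.

```python
import numpy as np

def global_dice(seg, ref):
    """Global Dice overlap between two binary segmentation masks of equal shape."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    return 2.0 * np.logical_and(seg, ref).sum() / (seg.sum() + ref.sum())

def difference_labels(seg, ref):
    """False-positive (in seg, not in ref) and false-negative (in ref, not in seg) maps."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    return seg & ~ref, ~seg & ref
```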

Figure S6. An example of a simulated test case where UNet+JEB converged to a local minimum. (A) A test case with “extreme” motion corruption. The percent error (ϵ) and structural similarity index (SSIM) are provided. (B) The relative data consistency (DC) loss is plotted for the JE, UNet+JEA, and UNet+JEB algorithms. While JE and UNet+JEA converge to similar DC values (0.06), UNet+JEB resulted in a larger DC value (0.17). The dashed vertical lines denote the iteration at which each algorithm converged. (C) Comparison of the image estimates provided by UNetmag,A and UNetmag,B, as well as the image and motion estimates provided by UNet+JEA and UNet+JEB. In the motion trajectory plots, the dashed lines correspond to the ground truth translation and rotation parameters (“TGT”, “RGT”), while the solid lines are the estimated parameters (“Test”, “Rest”). The red dashed rectangle highlights motion states that were mis-estimated by UNet+JEB.

Figure S7. Comparison of the training loss curves of the UNetAl-Masni et al (Al-Masni et al., NeuroImage 2022; left column) and UNetDuffy et al (Duffy et al., NeuroImage 2021; right column) models. The former was trained with a structural similarity (SSIM) loss, while the latter was trained with a mean absolute error (MAE) loss. Both SSIM and MAE loss values were evaluated during all training sessions and are displayed for completeness. Both models were trained with magnitude-only data from Training Dataset A; the dataset was transformed into 3D image patches to be compatible with the 3D convolutions used by UNetDuffy et al.

Figure S8. Comparing the performance of UNetAl-Masni et al (corresponding to “UNetmag,A” in the main text) and UNetDuffy et al on the same simulated test cases shown in Figures 3 and 4. (A) A test case that falls within the training distribution of both models. Both methods improve the image quality of the corrupted scan, as reflected by the improved global SSIM values and the reduction of ghosting artifacts in the inset zoomed-in ROIs. (B) A test case that falls outside the training distribution of both UNet models. While both methods produced images with residual artifacts and strong blurring, UNetDuffy et al produced stronger image distortions, leading to worse ϵ and SSIM than the initial corrupted image.

Figure S9. Comparing UNetAl-Masni et al and UNetDuffy et al across the simulated testing dataset (n = 40). Plots (A)–(C) compare the distributions of relative change (Δr) in percent error (ϵ), SSIM, and Dice, respectively. The boxplots are annotated with p-values from paired two-tailed sign tests, and the median of each boxplot is displayed below each subfigure. Based on the median relative change in each image quality metric (IQM), UNetAl-Masni et al outperformed UNetDuffy et al on all three IQMs with strong statistical significance.
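
The statistics summarized in Figure S9 can be reproduced along the following lines, using SciPy's binomial test to implement the paired two-tailed sign test; the normalization used here for the relative change Δr is an assumption and may differ from the paper's exact definition.

```python
import numpy as np
from scipy.stats import binomtest

def relative_change(after, before):
    """Relative change of an image quality metric with respect to its baseline value."""
    after, before = np.asarray(after), np.asarray(before)
    return (after - before) / np.abs(before)

def paired_sign_test(x, y):
    """Two-tailed paired sign test: binomial test on the number of positive
    differences among the non-zero pairs, under the null of equal medians."""
    diff = np.asarray(x) - np.asarray(y)
    diff = diff[diff != 0]
    return binomtest(int((diff > 0).sum()), n=diff.size, p=0.5,
                     alternative="two-sided").pvalue
```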

Figure S10. (A) A test case with simulated intrashot motion corruption. Motion parameters were simulated at a temporal resolution of 3.2 seconds. This test case corresponds to Subject 5 in Figure S11. (B) Comparing the image output and estimated motion parameters at Level 1 and Level 4 of the multiresolution UNet+JE algorithm. The top row displays the intermediate image and motion estimates at Level 1, in which the translation and rotation parameters are estimated at a temporal resolution of 25.6 seconds. The bottom row displays the final image and motion estimates at Level 4, where motion parameters are estimated at a resolution of 3.2 seconds. The percent error (ϵ) and structural similarity index (SSIM) are included. (C) The change in total data consistency loss across all iterations of the multiresolution UNet+JE algorithm.

Figure S11. The change in image quality across the multiresolution UNet+JE algorithm for the five test cases with simulated intrashot motion. Plots (A)–(C) show the RMSE, SSIM, and Dice, respectively. While the RMSE trajectories are non-monotonic for certain test cases, the SSIM and Dice trajectories show a more monotonic increase in image quality from level to level.

Table S1. A summary of the motion corruption parameters for the training dataset. Motion trajectories were generated as a pseudo‐random walk from shot to shot, with different probabilities and step intervals assigned to each of the 3 translational and 3 rotational degrees of freedom.
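
A shot-to-shot pseudo-random walk of the kind summarized in Table S1 can be sketched as follows; the per-DOF probabilities and step intervals used here are placeholders, not the values listed in the table.

```python
import numpy as np

def random_walk_trajectory(n_shots, step_prob, step_range, seed=None):
    """Generate a pseudo-random-walk motion trajectory over shots for the 6 rigid-body
    degrees of freedom (3 translations, 3 rotations). At each shot transition, DOF d
    takes a step with probability step_prob[d], drawn uniformly from
    [-step_range[d], +step_range[d]]; otherwise the position is held."""
    rng = np.random.default_rng(seed)
    steps = rng.uniform(-np.asarray(step_range), np.asarray(step_range),
                        size=(n_shots - 1, 6))
    moved = rng.random((n_shots - 1, 6)) < np.asarray(step_prob)
    return np.vstack([np.zeros(6), np.cumsum(steps * moved, axis=0)])

# Illustrative call: 14 shots, each DOF stepping with probability 0.5 and a maximum
# step size of 1.0 (mm or deg) per shot; these are placeholder values.
traj = random_walk_trajectory(14, step_prob=np.full(6, 0.5), step_range=np.full(6, 1.0))
```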

Table S2. Details of the two datasets generated for training the UNet models. Training Dataset A consists of 240 samples, generated from the data of 60 subjects who each underwent 2 simulations with “mild” motion corruption and 2 simulations with “moderate” motion corruption. Similarly, Training Dataset B consists of 480 samples generated from the same 60 subjects, now augmented to include “large” and “extreme” levels of motion corruption. During training, the number of epochs, the learning rate, and the batch size were kept the same between the training sessions with Datasets A and B.

Table S3. Description of the multiresolution UNet+JE algorithm. The algorithm is divided into 4 levels, where each level carries out motion estimation and correction at a progressively higher temporal resolution. Level 1 is equivalent to the standard UNet+JEA presented in the main text.

MRM-95-363-s001.docx (3.5MB, docx)

Data Availability Statement

The source code is openly available on GitHub at https://github.com/BRAIN-TO/PyMoCo_v2. The data that support the findings of this study are available from the corresponding author upon reasonable request.

