NeSVoR: Implicit Neural Representation for Slice-to-Volume Reconstruction in MRI

Junshen Xu; Daniel Moyer; Borjan Gagoski; Juan Eugenio Iglesias; P Ellen Grant; Polina Golland; Elfar Adalsteinsson

doi:10.1109/TMI.2023.3236216

Author manuscript; available in PMC: 2024 Jun 1.

Published in final edited form as: IEEE Trans Med Imaging. 2023 Jun 1;42(6):1707–1719. doi: 10.1109/TMI.2023.3236216

PMCID: PMC10287191 NIHMSID: NIHMS1905997 PMID: 37018704

Abstract

Reconstructing 3D MR volumes from multiple motion-corrupted stacks of 2D slices has shown promise in imaging of moving subjects, e.g., fetal MRI. However, existing slice-to-volume reconstruction methods are time-consuming, especially when a high-resolution volume is desired. Moreover, they are still vulnerable to severe subject motion and when image artifacts are present in acquired slices. In this work, we present NeSVoR, a resolution-agnostic slice-to-volume reconstruction method, which models the underlying volume as a continuous function of spatial coordinates with implicit neural representation. To improve robustness to subject motion and other image artifacts, we adopt a continuous and comprehensive slice acquisition model that takes into account rigid inter-slice motion, point spread function, and bias fields. NeSVoR also estimates pixel-wise and slice-wise variances of image noise and enables removal of outliers during reconstruction and visualization of uncertainty. Extensive experiments are performed on both simulated and in vivo data to evaluate the proposed method. Results show that NeSVoR achieves state-of-the-art reconstruction quality while providing two to ten-fold acceleration in reconstruction times over the state-of-the-art algorithms.

Keywords: MRI, slice-to-volume reconstruction, motion correction, super-resolution, 3D reconstruction, implicit neural representation, fetal brain MRI

I. INTRODUCTION

A. Motivation

High-resolution 3D Magnetic Resonance Imaging (MRI) plays an important role in clinical examinations but is vulnerable to artifacts caused by subject motion. To address this problem, ultra-fast sequences, e.g., single-shot fast spin echo $T_{2}$-weighted imaging [1], have been developed to “freeze” in-plane motion, making within-slice motion artifacts less severe than with multi-shot methods. Nevertheless, inter-slice motion still exists and remains a problem. Therefore, in order to reconstruct 3D volumes, multiple stacks of slices at different orientations are acquired. These slices are then realigned to correct subject motion with slice-to-volume registration and then combined using super-resolution reconstruction [2]–[4]. This slice-to-volume reconstruction (SVR) framework has a wide range of applications in clinical practice and image analysis, including fetal and neonatal MRI [4], [5], cardiac MRI [6] and diffusion-weighted MRI [7]. Existing SVR algorithms explicitly represent the reconstructed volume as a discrete function on a pre-defined grid. In this formulation, the complexity and memory footprint of SVR are proportional to the number of voxels in the volume. For example, reducing the voxel spacing by half in every dimension would approximately increase the run time by a factor of eight. Moreover, the discrete representation of the volume may also introduce discretization error during reconstruction.

Recently, implicit neural representation (INR) has gained popularity in a variety of tasks in computer vision and graphics [8], [9]. In contrast to explicit representation, INR models a 2D slice or a 3D volume as a continuous function of spatial coordinates, and parameterizes the function with a neural network, e.g., a multi-layer perceptron (MLP). INR has several advantages. i) It is resolution-agnostic, i.e., the network learns a continuous function during training and is able to sample volumes at different resolutions during inference. ii) Prior knowledge and constraints of the problem can be injected into the model by designing the architecture of the implicit network [10]. iii) INR can also overcome the high storage costs of dense discretized voxel grids [8].

However, few works have applied INR to 3D MRI reconstruction and the continuous forward model for slice acquisition is poorly studied. Here, we propose Neural Slice-to-Volume Reconstruction (NeSVoR), a novel method to solve the problem of 3D volumetric MR reconstruction from multiple motion-corrupted stacks of slices utilizing implicit neural representation, which allows a continuous and resolution-agnostic representation of the reconstructed volume.

B. Related Works

1). Slice-to-Volume Reconstruction:

Rousseau et al. [2] proposed a 3D fetal brain reconstruction approach that consisted of three steps: i) motion correction with multi-resolution slice alignment, ii) intensity correction for the local relative intensity distortion between stacks, and iii) super-resolution reconstruction using scattered interpolation with a Gaussian point spread function (PSF). Jiang et al. [5] improved the scattered interpolation method by utilizing a multi-level B-spline kernel. Kim et al. [11] proposed a slice intersection motion correction method that realigned stacks of slices based on the slice intersection, followed by a gradient-weighted averaging step for volume reconstruction. Gholipour et al. [3] formulated volume reconstruction as a maximum likelihood error norm minimization problem and developed a robust M-estimation solution that reduced the influence of potential outliers in the acquired data. Building on this idea, Kuklisova-Murgasova et al. [4] proposed an SVR approach with complete outlier removal using robust statistics based on expectation maximization (EM). An additional intensity matching step was used to compensate for inconsistent scaling factors and bias fields of different slices. Tourbier et al. [12] extended the method in [4] with a total variation regularization, which can be solved efficiently using the primal-dual hybrid gradient (PDHG) method. Kainz et al. [13] developed a fast reconstruction algorithm based on [4], which leveraged the acceleration from multiple GPUs. Ebner et al. [14] proposed an automated reconstruction framework that included fetal brain localization and segmentation, and used a novel slice-level outlier rejection method by removing outlier slices with low similarity scores.

2). Implicit Neural Representation:

The idea of INR has been widely applied in neural rendering. Mildenhall et al. [8] proposed Neural Radiance Fields (NeRF) to learn a 3D scene with 2D images at different camera positions. NeRF modeled the density and RGB color as continuous fields in 5D space (3D spatial location + 2D viewing direction) and simulated 2D observations from the radiance fields following the principles of volume rendering. The network was optimized by minimizing the error between the simulated and ground-truth images. To mitigate the misalignment of images in cases where ground-truth camera poses are unknown, NeRF−− [15] was proposed to optimize the camera parameters and neural networks simultaneously. NeRF-W [16] introduced image-dependent embedding vectors to model the appearance and transient objects that vary from image to image. These embeddings were optimized during reconstruction and helped synthesize scenes robust to variations in appearance and occluders.

In the field of medical imaging, attempts have been made to reconstruct super-resolved volumes from 2D slices using INR. IREM [17] was proposed for super-resolution reconstruction of adult brain MRI from stacks of thick slices, where only the motion between stacks is considered. Inspired by NeRF−−, Yeung et al. developed ImplicitVol [18] for 3D ultrasound reconstruction. ImplicitVol optimizes both the implicit network and the rigid transformation of each slice to compensate for inter-slice fetal motion. However, the aforementioned methods ignore the complex slice acquisition model as well as the artifacts and noise that occur during acquisition, and therefore cannot be directly applied to fetal or neonatal MRI.

Furthermore, the training of NeRF is known to be time-consuming. Recent research has revealed that the encoding layer, i.e., the input layer of the implicit network, has a significant impact on the convergence of the network [19], [20]. Pre-defined encoding functions, such as sine and cosine, require a deep network and a long training time to fit the underlying function. To this end, parametric encodings [20], [21] were proposed, which have trainable parameters in addition to the network weights. These parameters are arranged in a sparse data structure and help reduce the depth of the network and shorten training time significantly [20].

C. Contribution

In this work, we present NeSVoR, a novel SVR method that extends INR to learn a continuous representation of the unknown 3D volume from multiple 2D slices corrupted by subject motion and image artifacts. The main contributions of NeSVoR are: 1) We use INR to model the underlying volume as a neural network that is resolution-agnostic and more efficient than the explicit grid representation, especially when high-resolution volumes are desired. 2) In tandem with the INR, we adopt a continuous slice acquisition model that takes into account inter-slice subject motion, PSF, and bias fields. 3) We also introduce a novel approach for outlier removal during reconstruction by estimating the pixel-level and slice-level variances. 4) With a GPU-accelerated implementation, NeSVoR achieves a 2 to 10 times speedup compared to the baselines while providing state-of-the-art results.

II. MATERIALS AND METHODS

A. Slice Acquisition Model

1). Discrete Model:

Let $I \in R^{N_{s} \times N_{p}}$ be the data of the acquired slices, where $I_{i j}$ is the intensity of the j-th pixel in the i-th slice, $N_{s}$ and $N_{p}$ are the number of slices and the number of pixels in each slice, respectively. The goal of 3D reconstruction is to find an unknown volume $V$ of the 3D object, which is represented as an array $V \in R^{N_{v}}$ in traditional methods, where $N_{v}$ is the number of voxels. The forward slice acquisition model can be expressed as [3], [4], [13], [14]

I_{i j} = C_{i} B_{i j} \sum_{k = 1}^{N_{v}} M_{i j k} V_{k} .

(1)

The relationship between voxels of the reconstructed volume and pixels of the acquired data is described by $M \in R^{N_{s} \times N_{p} \times N_{v}}$ , where $M_{i j k}$ is the coefficient of the spatially aligned, discretized PSF for the acquisition of pixel $I_{i j}$ from voxel $V_{k}$ in the volume. $B_{i j}$ is the multiplicative bias field for pixel $I_{i j}$ and $C_{i}$ is the scaling factor of the i-th slice which accounts for global intensity inconsistency in different slices.

2). Proposed Continuous Model:

There are two disadvantages of the aforementioned discrete model. First, the discrete representation of the volume, bias field, and PSF might introduce discretization errors during reconstruction. Second, this formulation reconstructs a volume only at a specific resolution. To address these problems, we propose a continuous slice acquisition model in NeSVoR:

I_{i j} = C_{i} \int_{Ω} M_{i j} (x) B_{i} (x) [V (x) + ϵ_{i} (x)] d x,

(2)

where $Ω$ is the 3D region of interest (ROI). The main differences between the proposed model and the discrete model are as follows: i) Instead of discretized arrays, we model the volume, PSF, and bias field as continuous functions of spatial coordinates $x$. ii) The bias fields are modeled in volume coordinates rather than slice coordinates so that they share the coordinate encoding with the volume in INR. Since the movement of the fetus changes its position relative to the scanner and creates inconsistencies in the bias field of the reconstructed volume, we keep the bias fields slice-dependent as in the previous model. iii) We also adopt a residual (noise) term $ϵ_{i}$ in our formulation. The aim is twofold: to model slice-dependent noise in the acquired data, and to enable automatic outlier removal during reconstruction.

Assume that $ϵ_{i} (x)$ is white Gaussian noise with $E [ϵ_{i} (x)] = 0$ and $E [ϵ_{i} (x) ϵ_{i} (y)] = σ_{i}^{2} (x) δ (x - y)$, where $δ (\cdot)$ is the Dirac delta function. The mean and variance of pixel $I_{i j}$ are

{\overline{I}}_{i j} = E [I_{i j}] = C_{i} \int_{Ω} M_{i j} (x) B_{i} (x) V (x) d x,

(3)

σ_{i j}^{2} = \mathrm{var} (I_{i j}) = C_{i}^{2} \int_{Ω} M_{i j}^{2} (x) B_{i}^{2} (x) σ_{i}^{2} (x) d x .

(4)

In general, there are no closed-form solutions to the integrals in Eq. (3) and (4). Therefore, we use Monte Carlo sampling to estimate them. In many cases, the PSF can be modeled as an anisotropic 3D Gaussian distribution [2], [4],

M_{i j} (x) = g (T_{i}^{- 1} \circ x - p_{i j}; Σ),

(5)

g (u; Σ) = \frac{1}{\sqrt{(2 π)^{3} \det (Σ)}} \exp (- \frac{1}{2} u^{T} Σ^{- 1} u),

(6)

where $T_{i}$ is the (unknown) rigid transformation from the i-th slice to the 3D space, $p_{i j}$ is the location of pixel $I_{i j}$ in the slice coordinates, and $Σ$ is the covariance matrix of the Gaussian PSF. The expression $(T_{i}^{- 1} \circ x - p_{i j})$ maps the 3D position $x$ back to the slice coordinates centered at $p_{i j}$ , where the PSF is unrotated. Therefore, we generate $K$ i.i.d. samples from the Gaussian distribution, with $x_{i j k} = T_{i} \circ (u_{i j k} + p_{i j}), u_{i j k} \sim 𝒩 (0, Σ)$ , $k = 1, \dots, K$ , to compute Eq. (3) and (4).

E [I_{i j}] = \frac{C_{i}}{K} \sum_{k = 1}^{K} B_{i} (x_{i j k}) V (x_{i j k}),

(7)

var (I_{i j}) = \frac{C_{i}^{2}}{K} \sum_{k = 1}^{K} M_{i j} (x_{i j k}) B_{i}^{2} (x_{i j k}) σ_{i}^{2} (x_{i j k}) .

(8)
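The sampling scheme behind Eq. (7) can be illustrated with a minimal sketch. It assumes a diagonal PSF covariance and takes the volume, bias field, and rigid transformation as plain Python callables; the names `mc_pixel_mean`, `volume`, `bias`, and `transform` are hypothetical and are not taken from the NeSVoR source code.

```python
import random


def mc_pixel_mean(volume, bias, transform, p_ij, sigmas, scale=1.0, K=128, rng=None):
    """Monte Carlo estimate of Eq. (7): (C_i / K) * sum_k B_i(x_k) * V(x_k).

    volume, bias: callables mapping a 3D point (list of floats) to a float.
    transform:    callable applying the rigid transformation T_i to a 3D point.
    p_ij:         pixel location in slice coordinates.
    sigmas:       per-axis standard deviations of the (diagonal) Gaussian PSF.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(K):
        # u ~ N(0, Sigma), drawn in slice coordinates where the PSF is unrotated.
        u = [rng.gauss(0.0, s) for s in sigmas]
        # x = T_i o (u + p_ij): map the sample into volume coordinates.
        x = transform([u[d] + p_ij[d] for d in range(3)])
        total += bias(x) * volume(x)
    return scale * total / K


# Hypothetical usage: a linear "volume", unit bias field, and a pure translation.
shift_x = lambda q: [q[0] + 1.0, q[1], q[2]]
estimate = mc_pixel_mean(lambda x: x[0], lambda x: 1.0, shift_x,
                         p_ij=[2.0, 0.0, 0.0], sigmas=[0.5, 0.5, 1.0], K=20000)
```

Because the samples are drawn from the PSF itself, the PSF weight does not appear explicitly in the average; for the linear volume above, the estimate approaches the volume intensity at the transformed pixel centre.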

B. Implicit Neural Representation

1). Hash Grid Encoding:

In INR, a volume is modeled as a continuous function $f (x)$ parameterized by a neural network that takes coordinates as inputs, $f (x) = \mathrm{MLP} (ϕ (x))$, where $ϕ$ is an encoding function that maps coordinates $x$ to a high-dimensional feature vector which is then fed into an $MLP$ to fit $f (x)$. For example, $ϕ$ can be the multi-resolution sequence of $L$ sine and cosine functions [8], [19], $ϕ (x) = [\sin (2^{0} x), \dots, \sin (2^{L - 1} x), \cos (2^{0} x), \dots, \cos (2^{L - 1} x)]$. However, with fixed encoding functions, we only rely on the weights of the $MLP$ to fit the target function $f (x)$, and thus require a deeper network that typically converges more slowly.
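As an illustration of the fixed encoding described above, the following sketch evaluates the sine and cosine features for each coordinate of a 3D point. The function name `frequency_encoding` is hypothetical, and applying the encoding independently per coordinate is an assumption of this sketch.

```python
import math


def frequency_encoding(x, L):
    """Fixed encoding: [sin(2^0 c), ..., sin(2^(L-1) c), cos(2^0 c), ..., cos(2^(L-1) c)]
    evaluated for every coordinate c of the point x and concatenated."""
    features = []
    for c in x:
        features += [math.sin(2.0 ** l * c) for l in range(L)]
        features += [math.cos(2.0 ** l * c) for l in range(L)]
    return features


phi = frequency_encoding([0.1, 0.2, 0.3], L=4)  # 3 coordinates * 2 * 4 = 24 features
```

The encoding has no trainable parameters, which is why the MLP alone must absorb all of the fitting capacity in this setting.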

To enable fast INR training, we adopt the recently proposed hash grid encoding [20], which arranges additional trainable parameters in multi-resolution 3D grids, and therefore reduces the depth of the $MLP$. Specifically, let $Φ_{l} \in R^{N_{l} \times N_{l} \times N_{l} \times F}$ be the grid of parameters at the l-th level, $l = 1, \dots, L$, where $L$ is the number of levels, $N_{l}$ is the size of the l-th grid in every dimension, and each vertex of the grid conceptually stores a feature vector with a length of $F$. To compute the encoding at position $x$, we perform trilinear interpolation on the grid, i.e., we find the eight vertices around position $x$ and compute the linear combination of the feature vectors stored in these eight vertices as the feature vector at position $x$.

Starting with the coarsest grid with a size of $N_{1}$, each following grid increases the size by a factor of $s$, i.e., $N_{l} = ⌊N_{1} s^{l - 1}⌋$. The coarse-to-fine strategy enables the model to learn multi-scale features in a progressively refined manner, where low-level grids encode slowly varying features, such as the bias field, while high-level grids learn high-frequency details, like edges in the image. However, if we store the multi-level grids naively as dense 3D arrays, the memory footprint of each level increases as the cube of the grid size, $O (N_{l}^{3})$. In a multi-scale representation, high-resolution details tend to be sparse, so a large amount of memory space in the dense array is wasted. To this end, the dense 3D array $Φ_{l}$ is replaced by a hash table $Φ_{l}^{hash} \in R^{N_{h} \times F}$, where $N_{h}$ is the size of the hash table and $N_{h} ≪ {(N_{L})}^{3}$. Therefore, the query of the grid $Φ_{l}$ at index $(i, j, k)$ is translated into two steps: i) mapping $(i, j, k)$ to a hash code with a hash function, ii) accessing the hash table $Φ_{l}^{hash}$ with the hash code, $Φ_{l} (i, j, k) = Φ_{l}^{hash} ((i \oplus j π_{1} \oplus k π_{2}) \bmod N_{h})$, where $π_{1}$ and $π_{2}$ are two large primes, and $\oplus$ denotes the bit-wise XOR operation.

With the hash table structure, we essentially compress the grids at high levels so that they have a much smaller memory footprint and only store details that are necessary for volume reconstruction. The final encoding provided to the networks is the concatenation of encodings at different levels, $ϕ (x) = [ϕ_{1} (x), \dots, ϕ_{L} (x)]$ .
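The encoding described in this subsection can be sketched in a few lines of plain Python: grid sizes follow $N_{l} = ⌊N_{1} s^{l - 1}⌋$, vertex indices are hashed with XOR, and features are trilinearly interpolated. This is a simplified illustration rather than the Tiny CUDA NN implementation; the two multipliers, the assumption that $x$ lies in $[0, 1]^{3}$, and all function names are choices made for this sketch only.

```python
import itertools
import math

PI_1 = 2654435761  # illustrative large multiplier (assumed value)
PI_2 = 805459861   # illustrative large multiplier (assumed value)


def grid_sizes(n1, s, L):
    """N_l = floor(N_1 * s^(l-1)) for l = 1..L."""
    return [math.floor(n1 * s ** (l - 1)) for l in range(1, L + 1)]


def hash_index(i, j, k, table_size):
    """(i XOR j*pi_1 XOR k*pi_2) mod N_h."""
    return (i ^ (j * PI_1) ^ (k * PI_2)) % table_size


def encode_level(x, n, table):
    """Trilinear interpolation of hashed vertex features at x in [0, 1]^3."""
    num_features = len(table[0])
    # Scale x to grid coordinates and locate the enclosing cell.
    pos = [min(max(c, 0.0), 1.0) * (n - 1) for c in x]
    base = [min(int(p), n - 2) for p in pos]
    frac = [p - b for p, b in zip(pos, base)]
    out = [0.0] * num_features
    for corner in itertools.product((0, 1), repeat=3):
        weight = 1.0
        for d in range(3):
            weight *= frac[d] if corner[d] else 1.0 - frac[d]
        i, j, k = (base[d] + corner[d] for d in range(3))
        feature = table[hash_index(i, j, k, len(table))]
        for f in range(num_features):
            out[f] += weight * feature[f]
    return out


def encode(x, n1, s, tables):
    """Concatenate the per-level encodings: phi(x) = [phi_1(x), ..., phi_L(x)]."""
    phi = []
    for n, table in zip(grid_sizes(n1, s, len(tables)), tables):
        phi += encode_level(x, n, table)
    return phi
```

In this sketch every level uses a hash table, whereas an implementation could store coarse levels densely when $N_{l}^{3} \le N_{h}$; hash collisions at fine levels are simply left to the optimizer to resolve.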

2). Implicit Networks:

The architecture of the proposed implicit network is shown in Fig. 1-B. Given the multi-resolution encoding $ϕ (x)$ , an $MLP$ is used to regress the intensity of the volume at position $x$ . The $MLP$ also outputs a feature vector $z (x)$ for downstream processing.

[V (x), z (x)] = \mathrm{MLP}_{V} (ϕ (x))

(9)

Since bias fields are slice-dependent, we model the slice-specific information with latent variable optimization [16], [22] by assigning each slice an embedding vector $e_{i}$, $i = 1, \dots, N_{s}$. These slice embedding vectors are trainable and able to learn slice-specific information during optimization. It is worth noting that we do not incorporate these embeddings in $\mathrm{MLP}_{V}$, since we want $\mathrm{MLP}_{V}$ to learn only information that is slice-independent, i.e., the intensity of the underlying volume.

Note that in Eq. (7), $B_{i} (x)$ is more general than $V (x)$, as it can be slice-dependent. Hence, if we use the same input encoding $ϕ (x)$ as in $\mathrm{MLP}_{V}$, $B_{i} (x)$ may learn the product of bias field and volume such that $V (x)$ becomes a constant. To avoid this undesired solution, we need to limit the information going through the bias field branch. One important prior of the bias field is that it is a smoothly varying function of spatial location. Therefore, instead of the full encoding $ϕ (x)$, we only use the first $b$ levels of the encoding, which contain the low-frequency information, as the input to the bias field network. In summary, a second $MLP$ is adopted to regress the bias field $B_{i} (x)$ from the low-level encoding $ϕ_{1 : b} (x) = [ϕ_{1} (x), \dots, ϕ_{b} (x)]$ and the slice embedding $e_{i}$.

B_{i} (x) = \mathrm{MLP}_{B} (ϕ_{1 : b} (x), e_{i}) .

(10)

The last component for evaluating Eq. (7) and (8) is the variance $σ_{i}^{2}$ which is also slice-dependent. We use a third $MLP$ to estimate the variance at location $x$ from the feature vector $z (x)$ , and the slice embedding $e_{i}$ ,

σ_{i}^{2} (x) = \mathrm{MLP}_{σ} (z (x), e_{i}) .

(11)

C. Loss Functions

1). Slice Reconstruction:

Given the estimates of the mean and variance of pixel intensity in Eq. (7) and (8), the underlying volume can be reconstructed by minimizing the negative log-likelihood of Gaussian distribution:

ℒ_{i j} = \frac{{(I_{i j} - {\overline{I}}_{i j})}^{2}}{2 σ_{i j}^{2}} + \frac{1}{2} \log (σ_{i j}^{2}) .

(12)

Another way to interpret this loss function is from the perspective of outlier removal. The acquired MR slices are often corrupted by different types of artifacts, e.g., motion blurring and spin history. Such slices or pixels should be excluded during reconstruction to avoid artifacts in the reconstructed volume. The precision $1 / σ_{i j}^{2}$ can be interpreted as the weight of pixel $I_{i j}$. The model should assign a large variance (small weight) to outliers so that they are ignored during reconstruction. Also, the log-variance term prevents $σ_{i j}^{2}$ from going to infinity. The NeSVoR model is optimized with stochastic gradient descent, i.e., in each iteration, a batch of data $ℬ \subset \{1, \dots, N_{s}\} \times \{1, \dots, N_{p}\}$ is sampled to compute the loss function:

ℒ_{I} = \frac{1}{| ℬ |} \sum_{(i, j) \in ℬ} ℒ_{i j}

(13)
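A small numerical sketch may help make the weighting interpretation of Eq. (12) and (13) concrete. The function names are hypothetical, and the values are toy numbers rather than results from the paper.

```python
import math


def gaussian_nll(observed, mean, variance):
    """Per-pixel loss of Eq. (12): squared residual weighted by the precision,
    plus a log-variance penalty."""
    return (observed - mean) ** 2 / (2.0 * variance) + 0.5 * math.log(variance)


def batch_loss(batch):
    """Eq. (13): average of the per-pixel losses over a sampled batch.

    batch: list of (I_ij, predicted mean, predicted variance) tuples.
    """
    terms = [gaussian_nll(obs, mean, var) for obs, mean, var in batch]
    return sum(terms) / len(terms)


# For a fixed residual of 2, the loss is smallest when the variance is about 2^2 = 4.
losses = {var: gaussian_nll(2.0, 0.0, var) for var in (1.0, 4.0, 16.0)}
```

For a fixed residual, the per-pixel loss is minimized when the variance equals the squared residual, so a pixel that the model cannot explain is pushed toward a larger variance and hence a smaller weight, while the log term keeps the variance finite.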

2). Image Regularization:

SVR is an ill-posed problem due to subject motion and insufficient ROI coverage. Thus, regularization methods are adopted to improve image quality and suppress noise. Although the network architecture implicitly regularizes the outputs [10], [23], we provide a way to add (optional) explicit regularizations to the loss function to demonstrate the flexibility of NeSVoR. A widespread approach is the first-order regularizer,

ℛ_{V} = \int_{Ω} r (∥ \nabla V (x) ∥_{2}) d x .

(14)

The function $r$ can be the identity function (isotropic total variation), square function (first-order Tikhonov), or Huber function (edge-preserving). Although it is possible to compute $\nabla V (x)$ with automatic differentiation, the extra computation graph significantly increases computational cost. Instead, we approximate the first-order regularizer by estimating the directional derivative from the random samples. Specifically, we split the $K$ samples for computing Eq. (7) and (8) into $K / 2$ pairs, ${(1, 1 + K / 2), \dots, (K / 2, K)}$ ¹. The directional derivative for each pair is $|V (x_{i j k}) - V (x_{i j l})| / {∥x_{i j k} - x_{i j l}∥}_{2}$ , where $l = k + K / 2$ . Then the regularization can be approximated by

ℛ_{V} = \frac{2}{K | ℬ |} \sum_{(i, j) \in ℬ} \sum_{k = 1}^{K / 2} r (\frac{| V (x_{i j k}) - V (x_{i j l}) |}{{‖ x_{i j k} - x_{i j l} ‖}_{2}}) .

(15)

This method requires no extra forward/backward pass of the network, and therefore, adds only marginal computational cost.
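The pairing scheme of Eq. (15) can be sketched for the samples of a single pixel as follows. The function name `paired_regularizer` is hypothetical, the default `r` is the identity (isotropic total variation), and the sketch assumes that the two points in each pair are distinct.

```python
import math


def paired_regularizer(values, points, r=lambda g: g):
    """Approximate the first-order regularizer from the K samples of one pixel.

    values: V(x_k) for k = 1..K (already computed for the slice loss).
    points: the corresponding 3D sample locations x_k.
    Samples are paired as (k, k + K/2), and a finite difference along the line
    joining each pair serves as a directional-derivative estimate.
    """
    half = len(values) // 2
    total = 0.0
    for k in range(half):
        l = k + half
        distance = math.dist(points[k], points[l])
        total += r(abs(values[k] - values[l]) / distance)
    return total / half  # equals (2 / K) * sum over the K / 2 pairs


# Toy check with V(x) = 3 * x_0 sampled along the first axis: every pair gives 3.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
tv = paired_regularizer([3.0 * p[0] for p in pts], pts)
```

Averaging this quantity over the pixels in the batch gives the full expression in Eq. (15). Since `values` are reused from the slice loss, no additional network evaluation is needed.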

We use isotropic total variation as the default regularization for the reconstructed volume.

3). Bias Field:

In Eq. (7), the bias field $B_{i}$ and volume $V$ are only unique up to a constant factor. If $(B_{i}, V)$ is a solution, $(c B_{i}, \frac{1}{c} V)$ is also a feasible solution for any constant $c > 0$. In order to disambiguate, extra constraints are required. For example, we can force the mean of the log bias field to be zero, $\int_{Ω} \log B_{i} (x) d x = 0$, which can be achieved with the following regularization of the sample mean of the log bias field:

ℛ_{B} = {(\frac{1}{K | ℬ |} \sum_{(i, j) \in ℬ} \sum_{k = 1}^{K} \log B_{i} (x_{i j k}))}^{2}

(16)
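As a brief illustration of Eq. (16), the sketch below computes the squared sample mean of the log bias field over all sampled points in a batch. The function name and the flat list input are assumptions of this sketch.

```python
import math


def bias_regularizer(bias_samples):
    """Eq. (16): squared sample mean of log B_i(x_ijk) over all samples in the batch."""
    mean_log = sum(math.log(b) for b in bias_samples) / len(bias_samples)
    return mean_log ** 2


balanced = bias_regularizer([2.0, 0.5])        # log 2 + log 0.5 = 0, so no penalty
rescaled = bias_regularizer([2.0 * math.e, 0.5 * math.e])  # global factor e is penalized
```

A bias field that is rescaled by a global constant $c$ shifts the mean log bias by $\log c$, so the penalty grows as $(\log c)^{2}$ and discourages the scale ambiguity.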

D. Other trainable parameters

1). Transformation:

To allow unconstrained optimization of the rigid transformation with gradient descent, we adopt the axis-angle representation for the transformation of each slice, i.e., the transformation of the i-th slice $T_{i}$ is parameterized by a six-dimensional vector $(θ_{i} n_{i 1}, θ_{i} n_{i 2}, θ_{i} n_{i 3}, t_{i 1}, t_{i 2}, t_{i 3})$ , where $(n_{i 1}, n_{i 2}, n_{i 3})$ is a unit vector representing the rotation axis, $θ_{i}$ is the rotation angle, and $(t_{i 1}, t_{i 2}, t_{i 3})$ is the translation vector.
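The six-parameter representation can be converted to a rotation matrix and translation with Rodrigues' formula, as in the sketch below. The function names are hypothetical, and the convention that the rotation is applied before the translation is an assumption of this sketch.

```python
import math


def axis_angle_to_matrix(rvec):
    """Rodrigues' formula: rvec = theta * n, with n a unit rotation axis."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta < 1e-12:
        # Zero rotation vector: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (c / theta for c in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def apply_rigid(params, p):
    """Apply T_i to point p, with params = (theta*n1, theta*n2, theta*n3, t1, t2, t3)."""
    rotation = axis_angle_to_matrix(params[:3])
    translation = params[3:]
    return [sum(rotation[a][b] * p[b] for b in range(3)) + translation[a]
            for a in range(3)]


# A 90-degree rotation about the third axis followed by a translation of (1, 2, 3).
moved = apply_rigid([0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0], [1.0, 0.0, 0.0])
```

Every six-dimensional vector corresponds to a valid rigid transformation, which is what makes the parameterization suitable for unconstrained gradient descent.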

2). Slice Scaling Factor:

The scaling factor $C_{i}$ in Eq. (7) introduces an arbitrary constant factor to the solution, and therefore constraints on $C_{i}$ need to be imposed. Here, we assume that the average of the scaling factor is 1, i.e., $\frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} C_{i} = 1$ and reparameterize $C$ to satisfy this constraint,

C = N_{s} softmax (c), C_{i} = \frac{N_{s} \exp (c_{i})}{\sum_{j = 1}^{N_{s}} \exp (c_{j})},

(17)

so that the new parameter vector $c$ is unconstrained.
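The reparameterization in Eq. (17) can be checked numerically with a short sketch. The function name is hypothetical; subtracting the maximum before exponentiation is a standard numerical-stability step that leaves the result unchanged.

```python
import math


def scaling_factors(c):
    """Eq. (17): C = N_s * softmax(c), so that the C_i are positive and average to 1."""
    shift = max(c)  # does not change the softmax, only avoids overflow
    exps = [math.exp(v - shift) for v in c]
    total = sum(exps)
    return [len(c) * e / total for e in exps]


factors = scaling_factors([0.3, -1.2, 2.0, 0.0])
mean_factor = sum(factors) / len(factors)
```

Whatever values the unconstrained vector $c$ takes during optimization, the resulting factors stay positive and their mean stays equal to 1.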

3). Slice-wise Variance:

Under severe artifacts, the whole slice might be corrupted. Therefore, in addition to the pixel-wise variance $\mathrm{var} (I_{i j})$, we also introduce a slice variance $ν_{i}^{2}$ which down-weights the whole slice during reconstruction when the entire slice is of low quality. Eq. (4) is then modified as

σ_{i j}^{2} = \mathrm{var} (I_{i j}) + ν_{i}^{2},

(18)

i.e., the total variance of pixel $I_{i j}$ is the sum of the pixel-wise variance $\mathrm{var} (I_{i j})$ and the slice-wise variance $ν_{i}^{2}$.

E. Training and inference

During training, we solve the optimization problem:

\arg \min_{Θ} ℒ (Θ), \quad ℒ = ℒ_{I} + λ_{B} ℛ_{B} + λ_{V} ℛ_{V},

(19)

where $λ_{B}$ and $λ_{V}$ are the weights for the regularization terms, and $Θ$ is the set of trainable parameters, including the weights of the MLPs, the hash grid $Φ^{hash}$, the slice transformations $T$, the slice embeddings $e$, the scaling factors $c$, and the log slice variances $\log ν^{2}$. We adopt an Adam optimizer [24] with an initial learning rate of $5 \times 10^{- 3}$, which decays by a factor of $γ = 1 / 3$ at iterations $N_{τ} / 2$ and $3 N_{τ} / 4$, where $N_{τ}$ is the total number of iterations. We use a batch size of 4096 and set the number of samples $K = 128$. The covariance matrix of the Gaussian PSF is defined as in [4], $Σ = \mathrm{diag} ({(\frac{1.2 r_{1}}{2.355})}^{2}, {(\frac{1.2 r_{2}}{2.355})}^{2}, {(\frac{r_{3}}{2.355})}^{2})$, where $r_{1}$ and $r_{2}$ are the in-plane pixel spacings and $r_{3}$ is the slice thickness.
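The step-decay schedule and the PSF covariance stated above can be written out as a short sketch. The function names are hypothetical, and applying each decay from the stated iteration onward (rather than after it) is an assumption.

```python
def learning_rate(step, n_total, base_lr=5e-3, gamma=1.0 / 3.0):
    """Initial rate of 5e-3, multiplied by gamma at N/2 and again at 3N/4."""
    lr = base_lr
    if step >= n_total // 2:
        lr *= gamma
    if step >= (3 * n_total) // 4:
        lr *= gamma
    return lr


def psf_covariance_diag(r1, r2, r3):
    """Diagonal of Sigma: full width at half maximum of 1.2 times the pixel spacing
    in plane and the slice thickness through plane, converted with FWHM = 2.355 * sigma."""
    return ((1.2 * r1 / 2.355) ** 2, (1.2 * r2 / 2.355) ** 2, (r3 / 2.355) ** 2)


rates = [learning_rate(step, 4000) for step in (0, 2000, 3000)]
sigma_diag = psf_covariance_diag(0.8, 0.8, 3.0)  # e.g. 0.8 mm in plane, 3 mm slices
```

The factor 2.355 is approximately $2 \sqrt{2 \ln 2}$, the ratio between the full width at half maximum and the standard deviation of a Gaussian.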

All MLPs have one hidden layer with 64 units and ReLU activation. $\mathrm{MLP}_{V}$ uses softplus² as the output activation while the other MLPs use the exponential function. The length of the slice embedding is set to 16 and the slice embedding is initialized with the standard normal distribution. For the hash encoding, we choose the hyperparameters following the strategy in [20], $s = 1.38$, $N_{1} \in [6, 16]$, $L \in [9, 12]$, depending on the size of slices. The parameters in the hash grid are initialized with a uniform distribution $U (- 10^{- 4}, 10^{- 4})$. When there is more than one input stack, we first perform a volume-to-volume registration to coarsely correct the motion between different stacks [4]. The stack transformation after volume-to-volume registration is then used to initialize the transformation of each slice $T_{i}$.

After training the model, $V (x)$ learns a continuous representation of the underlying volume. Slices at different views and volumes with different fields of view (FOVs) can then be sampled from the function $V (x)$. Sampling directly from $V (x)$ might result in aliasing and image noise. Therefore, we sample the intensity at position $x$ using the PSF model,

V^{out} (x) = \int_{Ω} M (x) V (x) d x .

(20)

where $M$ is an isotropic Gaussian PSF with $σ = r / 2.3548$ and $r$ is the isotropic voxel spacing of the output volume.

F. Implementation

All models were tested on a system with two Intel Xeon Gold 6238R CPUs @2.20 GHz with 768 GB RAM, and an NVIDIA Tesla V100 GPU with 32 GB RAM. The networks were implemented with PyTorch [25] and Tiny CUDA NN [20], [26]. To further accelerate the training of INR, we adopt the strategy of automatic mixed precision training [27] where the hash grid and MLPs are trained with half-precision format while the other parameters are stored in single-precision format. The source code is available on GitHub³.

III. EXPERIMENTS

A. Datasets

We performed extensive experiments to evaluate NeSVoR using the following four datasets.

1). Simulated Adult Brain Data:

An adult brain MRI dataset was synthesized from the data in the Human Connectome Project (HCP) [28]. We randomly selected the $T_{1}$-weighted and $T_{2}$-weighted images of 30 subjects, which were acquired at 0.7 mm isotropic resolution, and used them as ground truth. We simulated 3 orthogonal stacks of slices from each volume, with an in-plane resolution of 1 mm and a slice thickness of 2 mm. Simulated inter-slice motion with random translations and rotations was incorporated. The translations along the $x$-, $y$-, and $z$-axes of each slice were sampled independently from the range of [−3, 3] mm. The angles of 3D rotation were randomly sampled from the range of [−6, 6] degrees. In-plane motion artifacts and ghosting artifacts were simulated as in [29]. Rician noise [30], with a standard deviation of 3% of the maximum intensity, was added to each slice.

2). Simulated Fetal Brain Data:

We also simulated fetal data from the FeTA [31] dataset, which consisted of fetal brain volumes with 0.5 mm isotropic resolution. We selected 10 volumes with gestational age (GA) from 27 to 35 weeks as ground truths. For each subject, 3 orthogonal stacks were simulated with an in-plane resolution of 0.8 mm and a slice thickness of 3 mm. Fetal brain motion trajectories were simulated using the method in [32]. Specifically, we sampled head motion from a fetal keypoint dataset [33] that represents realistic fetal brain motion trajectories during MRI scans. The maximum translation and rotation speeds in the motion trajectory dataset are 21.4 mm/s and 59.7 degrees/s, respectively. The fetal brain volumes were transformed according to the sampled trajectory, and slices were extracted from the volume at the corresponding positions. Bias fields and signal void artifacts were also simulated. Image noise was added as in the adult brain data.

Fig. 2 shows example stacks from the two simulated datasets. The goal of the simulated data is twofold: i) to quantitatively evaluate our approach with ground-truth data, ii) to show that the proposed method can be applied to data with different contrasts, sizes of ROI, and image artifacts.

3). Clinical Neonatal Brain Data:

We used the clinical neonatal brain data from the Developing Human Connectome Project (dHCP) [34] to evaluate the proposed method. The raw $T_{2}$-weighted magnitude images of 10 neonatal subjects were selected from this dataset. Each subject consists of 2 to 4 image stacks with in-plane resolution of 0.8 mm, slice gap of 0.8 mm, and slice thickness of 1.6 mm. Details on acquisition parameters can be found in [34].

4). Clinical Fetal Brain Data:

A fetal MRI dataset was collected to evaluate the method. This dataset consisted of $T_{2}$-weighted MRI from 20 fetuses, with GA from 21 to 32 weeks. All scans were performed in accordance with the local institutional review board protocol. The data were acquired with in-plane resolution of 1–1.3 mm, slice thickness of 2–4 mm, no gap, TE = 100–120 ms, and TR = 1.4–1.8 s. Each subject had 3 to 10 stacks of slices. Fetal brains were segmented from the slices using an existing, trained segmentation network [35].

B. Baselines

We adopted as baselines three state-of-the-art SVR methods that have open-source implementations: 1) SVRTK⁴: The slice-to-volume reconstruction toolkit is an implementation of the algorithm in [4], which is accelerated with multi-core parallelism on CPU. 2) SVR-GPU⁵: This approach [13] is a GPU-accelerated implementation of [4]. Although it has been pointed out in previous works [14] that the GPU-accelerated approach tends to produce blurrier outcomes, we still incorporated this method in experiments mainly for comparing the efficiency of algorithms on GPU. 3) NiftyMIC⁶: It is the implementation of the reconstruction algorithm in [14], which runs on CPU with no parallelism and is slow when the size of the output volume is large. We excluded this method from the experiments on the simulated adult brain data and the neonatal brain data as the run time for each case exceeded 12 hours.

For tuning hyperparameters for different methods, we randomly picked one subject from the simulated adult/fetal dataset and adjusted the hyperparameters to minimize the mean squared error between the reconstructed volume and the ground truth. The tuned hyperparameters were then applied to the other data in the same dataset. For the clinical neonatal brain dataset, we used the same hyperparameters as the simulated adult brain data, and for the clinical fetal brain dataset, we used the same hyperparameters as the simulated fetal brain data.

C. Results on Simulated Data

We reconstructed the simulated adult and fetal data at the isotropic resolution of 0.7 mm and 0.5 mm respectively to match the original resolutions of the ground truths. We compared the reconstructed volumes and ground truths by different quantitative metrics, including peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [36], normalized root mean square error (NRMSE), and normalized cross-correlation (NCC). Results are shown in Table I. NeSVoR achieved comparable results with SVRTK on the simulated adult brain data with 4× speedup. Although SVR-GPU was faster than SVRTK due to GPU acceleration, the reconstruction quality of this implementation was lower than NeSVoR and SVRTK. For the simulated fetal data, NeSVoR outperformed the baselines in terms of both accuracy and speed.
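For reference, three of the metrics named above can be sketched on flattened intensity lists as follows. The normalization choices (intensity range of the ground truth for PSNR and NRMSE) are assumptions of this sketch, since the exact definitions used in the experiments are not stated here; SSIM is omitted because it requires windowed local statistics.

```python
import math


def _mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def psnr(pred, target):
    """Peak signal-to-noise ratio in dB, with the peak taken as the target range (assumed)."""
    mse = _mse(pred, target)
    if mse == 0.0:
        return math.inf
    data_range = max(target) - min(target)
    return 10.0 * math.log10(data_range ** 2 / mse)


def nrmse(pred, target):
    """Root mean square error divided by the target intensity range (assumed)."""
    return math.sqrt(_mse(pred, target)) / (max(target) - min(target))


def ncc(pred, target):
    """Normalized cross-correlation (Pearson correlation of the intensities)."""
    n = len(target)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


recon, truth = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0]
scores = (psnr(recon, truth), nrmse(recon, truth), ncc(recon, truth))
```

NCC is invariant to a global affine change of intensity, whereas PSNR and NRMSE are not, which is one reason to report several metrics side by side.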

TABLE I.

Mean values of quantitative metrics for different models on the simulated datasets (standard deviation in parentheses). ↓ indicates that lower values are better, and ↑ indicates that higher values are better.

Methods	PSNR / dB ↑	SSIM ↑	NRMSE ↓	NCC ↑	run time / min ↓

Simulated adult brain data

SVRTK [4]	28.71 (3.33)	0.8861 (0.0459)	0.1129 (0.0452)	0.8706 (0.0598)	24.57 (3.42)
SVR-GPU [13]	26.68 (4.38)	0.7526 (0.1936)	0.1518 (0.0867)	0.7360 (0.2775)	11.63 (2.23)
NeSVoR	28.85 (3.22)	0.8901 (0.0317)	0.1103 (0.0427)	0.8772 (0.0526)	6.13 (0.30)

Simulated fetal brain data

SVRTK [4]	22.10 (2.39)	0.8694 (0.0792)	0.1772 (0.0504)	0.6517 (0.2007)	8.38 (2.83)
SVR-GPU [13]	21.64 (0.70)	0.8031 (0.0460)	0.1809 (0.0169)	0.6552 (0.0779)	2.36 (0.87)
NiftyMIC [14]	21.09 (2.20)	0.7919 (0.1651)	0.1978 (0.0525)	0.5647 (0.2445)	189.24 (79.81)
NeSVoR	23.63 (1.17)	0.9290 (0.0354)	0.1446 (0.0199)	0.7804 (0.055)	1.92 (0.09)
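The intensity-based metrics in Table I can be sketched in a few lines. This is a minimal illustration, not the evaluation code used in the experiments: the choice of peak value for PSNR and the normalization of NRMSE are assumptions, all function names are hypothetical, and SSIM [36] is omitted because it requires local windowed statistics.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(ref, rec):
    """Root mean square error between two equally sized intensity lists."""
    return math.sqrt(_mean([(a - b) ** 2 for a, b in zip(ref, rec)]))


def psnr(ref, rec):
    """PSNR in dB. The peak is taken as max(ref), which is an assumption."""
    return 20.0 * math.log10(max(ref) / rmse(ref, rec))


def nrmse(ref, rec):
    """RMSE normalized by the intensity range of ref (an assumption)."""
    return rmse(ref, rec) / (max(ref) - min(ref))


def ncc(ref, rec):
    """Zero-mean normalized cross-correlation, in [-1, 1]."""
    mean_ref, mean_rec = _mean(ref), _mean(rec)
    num = sum((a - mean_ref) * (b - mean_rec) for a, b in zip(ref, rec))
    den = math.sqrt(
        sum((a - mean_ref) ** 2 for a in ref)
        * sum((b - mean_rec) ** 2 for b in rec)
    )
    return num / den


# Toy flattened "volumes" standing in for a ground truth and a reconstruction.
reference = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
reconstruction = [0.1, 0.2, 0.5, 0.6, 0.7, 1.0]

print("PSNR / dB:", psnr(reference, reconstruction))
print("NRMSE:", nrmse(reference, reconstruction))
print("NCC:", ncc(reference, reconstruction))
```

Note that `psnr` is undefined for a perfect reconstruction (zero RMSE). A practical implementation would guard against that case.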


To further study the trade-off between the run time and the reconstruction quality, we ran NeSVoR on the simulated adult data with different numbers of training steps $N_{τ}$ and altered the number of outer iterations in the baselines, i.e., the number of cycles of registration and reconstruction. The resulting curves are shown in Fig. 3. NeSVoR converged much faster than SVRTK, requiring only 6% to 25% run time to achieve comparable results with SVRTK. Although running on GPU, SVR-GPU suffered from lower image quality compared to the other methods, resulting in a sub-optimal trade-off curve.

Fig. 3. — The image quality of reconstructed volume vs. run time (number of iterations) for different methods. NeSVoR: number of iterations = 1000, 2000, 4000, 8000, 16000; SVRTK: number of outer iterations = 1, 2, 3; SVR-GPU: number of outer iterations = 1, 2, 4, 6.

Since fetal MRI routinely suffers from subject motion during scans, we also evaluated the methods with different motion levels. We randomly chose a fetal brain volume and simulated motion trajectories of different levels (3D translation and 3D rotation were simulated and evaluated separately). Fig. 4 shows the PSNRs and SSIMs of different methods. In comparison, NeSVoR is more robust than the baselines when the motion is small to moderate. As the intensity of motion increases, the PSNRs and SSIMs of all the reconstruction methods decrease. Since the slice transformations are optimized by maximizing the local similarity between the slices and the volume, they easily get stuck in local minima, and therefore, result in a limited capture range of motion.

Fig. 4. — Comparative reconstruction performance (PSNR and SSIM) of different methods on the simulated fetal data with different levels of motion. The x-axes represent the distance traveled and the accumulated rotation over the trajectory for translation and rotation, respectively.

D. Results on Clinical MRI data

We reconstructed the neonatal data with resolution of 0.5 mm. As there was no ground truth, we evaluated the methods by measuring the consistency between the output volumes and the input slices. We extracted slices from the motion-corrected locations and computed the NCC and SSIM between the extracted slices and the corresponding slices of an input stack. The results are presented in Fig. 5 and show that NeSVoR had similar SSIM and higher NCC while achieving 9× speedup on average compared to SVRTK. Fig. 6 shows the reconstruction results of a neonatal subject. NeSVoR produced results with fewer image artifacts than the baseline method.

Fig. 5. — Quantitative comparison based on the similarity metrics between the input slices and the slices extracted from the reconstructed volumes. Results of Wilcoxon signed rank test are presented, *: p<0.05, **: p<0.01, and n.s.: not significant (p≥0.05).

Fig. 6. — The reconstruction results and an input stack of a subject in the dHCP dataset. Green arrows indicate artifacts in SVRTK that are eliminated in NeSVoR.

For the clinical fetal data, we reconstructed volumes with isotropic resolution of 0.8 mm. Since there was also no ground truth for the clinical fetal data, we adopted an automated MRI quality assessment (QA) approach to evaluate the image quality of reconstructed volumes. Specifically, we used the trained QA network proposed in [37], which predicts a QA score between 0 and 1 for a 2D $T_{2}$-weighted fetal brain MR image to assess the artifacts in the image, with higher scores indicating better image quality. The score of the volume was computed as the average of the scores of all slices in the volume. We also evaluated the reconstructed volumes in terms of signal-to-noise ratio (SNR) and partial volume effect (PVE). We considered the Gaussian mixture model (GMM) in [38], [39] that models three types of brain tissues, i.e., cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM). The SNR of a volume is computed as $\mathrm{SNR} = 20 \log_{10} (μ / σ)$, where $μ$ and $σ$ are the weighted averages of the mean signal intensities and the standard deviations of noise of the three Gaussian components. We consider a voxel to belong to the k-th tissue if its intensity is within $\pm δ_{k}$ of the mean of the k-th Gaussian component, where $δ_{k}$ is the corresponding half of the full width at half maximum (FWHM). Thus, the percentage of voxels outside the three ranges can be used as a proxy for PVE.
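The SNR and PVE proxies described above can be sketched as follows, given the parameters of an already fitted three-component GMM. This is an illustration under stated assumptions rather than the evaluation code: the averages are weighted by the mixture weights (the text does not specify the weights), the GMM parameters and intensities are made-up toy values, and the function names are hypothetical. The half FWHM of a Gaussian with standard deviation $σ_{k}$ is $\sqrt{2 \ln 2}\, σ_{k}$.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 ln 2) * std, so half FWHM = sqrt(2 ln 2) * std.
HALF_FWHM_FACTOR = math.sqrt(2.0 * math.log(2.0))


def gmm_snr_db(weights, means, stds):
    """SNR = 20 log10(mu / sigma) with mixture-weighted averages (assumed weighting)."""
    total = sum(weights)
    mu = sum(w * m for w, m in zip(weights, means)) / total
    sigma = sum(w * s for w, s in zip(weights, stds)) / total
    return 20.0 * math.log10(mu / sigma)


def pve_fraction(intensities, means, stds):
    """Fraction of voxels lying outside mean_k +/- half FWHM_k for every tissue k."""

    def in_any_tissue(value):
        return any(
            abs(value - m) <= HALF_FWHM_FACTOR * s for m, s in zip(means, stds)
        )

    outside = sum(1 for value in intensities if not in_any_tissue(value))
    return outside / len(intensities)


# Hypothetical GMM for CSF, GM, and WM (toy numbers, not fitted to real data).
weights = [0.2, 0.4, 0.4]
means = [300.0, 150.0, 100.0]
stds = [10.0, 10.0, 10.0]

# Toy voxel intensities; 125, 200, and 320 fall between or beyond the tissue ranges.
voxels = [100.0, 105.0, 125.0, 150.0, 200.0, 300.0, 320.0]

print("SNR / dB:", gmm_snr_db(weights, means, stds))
print("PVE proxy:", pve_fraction(voxels, means, stds))
```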

Fig. 7 shows the QA scores, SNRs, and PVEs of different methods. NeSVoR achieved higher image quality and SNR compared to the baselines, while there was no difference among the PVEs of the different reconstruction algorithms. Fig. 8 presents a visual comparison of NeSVoR and the baselines for three challenging cases that are corrupted by severe fetal motion. NeSVoR yielded results with the best perceptual quality.

Fig. 8. — The reconstruction results of three challenging cases in the clinical fetal brain dataset. For each reconstructed volume, three orthogonal views are displayed. Rows 1 to 4 show the results of different methods and the last row shows one of the input stacks for each case.

E. Reconstructing Volumes at Different Resolutions

In NeSVoR, the INR learns a continuous representation of the reconstructed volume. Therefore, the model only needs to be trained once, from which volumes at different resolutions can be sampled (the time for sampling a volume is negligible compared to the training time). In comparison, conventional methods reconstruct a volume only at a specific resolution, and therefore, to reconstruct volumes at different resolutions, the algorithm needs to be re-run. We call this property of NeSVoR resolution-agnostic reconstruction.

To demonstrate this, we reconstructed a fetal subject with different isotropic voxel spacings (0.8 mm, 0.6 mm, and 0.4 mm). We also performed the same experiment using SVRTK, where the reconstruction algorithm was re-run with different resolutions. Fig. 9 shows the reconstruction results at different resolutions for the two methods. Volumes reconstructed at higher resolution yielded sharper edges. The bar plot in Fig. 9 shows the run time of the two methods. The run time of SVRTK increases drastically as the voxel spacing decreases because the number of voxels in the volume is inversely proportional to the cube of the voxel spacing. In contrast, the run time of NeSVoR is independent of the voxel spacing. When reconstructing a volume with 0.4 mm voxel spacing, NeSVoR achieved 18× speedup compared to SVRTK.
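The two points above, sampling one continuous representation at several voxel spacings and the cubic growth of the voxel count, can be illustrated with a short sketch. The function `toy_volume` is a hypothetical stand-in for a trained INR, the 96 mm field of view is an assumed value chosen for illustration, and the code counts voxels only; it does not model run time.

```python
def grid_shape(fov_mm, spacing_mm):
    """Number of samples along each axis for a given field of view and spacing."""
    return tuple(int(round(extent / spacing_mm)) for extent in fov_mm)


def voxel_count(fov_mm, spacing_mm):
    nx, ny, nz = grid_shape(fov_mm, spacing_mm)
    return nx * ny * nz


def sample_volume(f, fov_mm, spacing_mm):
    """Evaluate a continuous function f(x, y, z) at voxel centers of a regular grid."""
    nx, ny, nz = grid_shape(fov_mm, spacing_mm)
    return [
        [
            [
                f((i + 0.5) * spacing_mm, (j + 0.5) * spacing_mm, (k + 0.5) * spacing_mm)
                for k in range(nz)
            ]
            for j in range(ny)
        ]
        for i in range(nx)
    ]


def toy_volume(x, y, z):
    """Hypothetical stand-in for a trained INR: any function of continuous coordinates."""
    return x + 2.0 * y + 3.0 * z


# Voxel counts for an assumed 96 mm cubic field of view at the three spacings.
fov = (96.0, 96.0, 96.0)
for spacing in (0.8, 0.6, 0.4):
    print(spacing, "mm:", grid_shape(fov, spacing), voxel_count(fov, spacing), "voxels")

# The same continuous function sampled at two spacings, with no re-fitting in between.
coarse = sample_volume(toy_volume, (2.4, 2.4, 2.4), 0.8)
fine = sample_volume(toy_volume, (2.4, 2.4, 2.4), 0.4)
```

Halving the spacing from 0.8 mm to 0.4 mm multiplies the voxel count by eight, which is why a method that optimizes a discrete grid has to do more work at finer resolutions, whereas sampling a fitted continuous function only changes the number of evaluation points.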

Fig. 9. — The reconstruction results of a fetal brain at different resolutions. A) The reconstruction results of NeSVoR and SVRTK, where different columns show volumes reconstructed with different voxel spacings. B) The bar plot of the run time for the two methods. The numbers shown on top of the bars indicate the speedup of NeSVoR compared to SVRTK.

F. Ablation Study

To investigate the contribution of each component in NeSVoR, we evaluated the model on the simulated fetal dataset by ablating PSF, bias field estimation, transformation optimization, variance estimation, slice embedding, slice scaling, and INR. When ablating INR, we represented the volume, bias fields, and variance as dense 3D grids that were optimized directly. The PSNR and SSIM of different variants of the model are shown in Table II. The results show that the full model outperforms the other variants.

TABLE II.

Mean values of PSNR and SSIM for different ablated models on the simulated fetal brain datasets (standard deviation in parentheses).

Methods	PSNR / dB	SSIM

full model	23.63 (1.17)	0.9290 (0.0354)
w/o PSF	19.49 (1.23)	0.7101 (0.0916)
w/o bias field correction	23.57 (1.29)	0.8848 (0.0623)
w/o transformation optimization	19.26 (0.70)	0.7179 (0.0495)
w/o variance estimation	17.59 (0.85)	0.4917 (0.1438)
w/o slice embedding	19.94 (0.63)	0.8856 (0.0317)
w/o slice scaling	22.57 (0.78)	0.9242 (0.0376)
w/o INR	18.17 (0.48)	0.7402 (0.0599)


1). Bias Field:

We compared the reconstructed volumes with and without bias field correction, and the results are shown in Fig. 10. NeSVoR was able to mitigate the effect of bias field. For comparison, we performed the same experiment with SVRTK whose forward model also took into account the bias fields. However, it failed to correct the smoothly varying bias field in this subject. The last row shows a selected intensity profile of the resulting reconstructions without bias field correction (bottom left) and with bias field correction (bottom middle).

Moreover, since we implement the bias field model in a separate network, computational cost can be reduced by disabling this module, when the effect of the bias field is small or when other techniques of bias field correction are available.

2). Variance:

Fig. 11 shows examples of estimated slice-wise and pixel-wise variances. From the original slices (the first row), we can see that images corrupted by severe artifacts have high values of log slice variance $\log ν_{i}^{2}$ (the number labeled on top of each slice), indicating that NeSVoR can identify outlier slices and reduce their influence by assigning high slice variances. The second row of Fig. 11 shows the maps of pixel-wise variance learned by the model. The pixels with large variances match the locations of image artifacts. Thus, the variance maps also provide a way to visualize pixel-level uncertainty. The reconstructed volumes show that the result without the variance model suffered from severe artifacts propagated from the corrupted slices, while the variance model succeeded in excluding those slices from reconstruction.
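The way a learned variance reduces the influence of outliers can be illustrated with a generic heteroscedastic Gaussian negative log-likelihood. This is a sketch of the general mechanism, assuming a per-sample log-variance parameter. It is not claimed to be the exact loss used in NeSVoR, and the function names are hypothetical.

```python
import math


def gaussian_nll(residual, log_var):
    """Gaussian negative log-likelihood, up to an additive constant."""
    return 0.5 * residual ** 2 / math.exp(log_var) + 0.5 * log_var


def residual_weight(log_var):
    """Factor that scales the gradient of the loss with respect to the residual."""
    return math.exp(-log_var)


# A small residual (clean pixel) and a large residual (artifact), both in toy units.
clean, artifact = 0.1, 2.0

# For a fixed residual r, the loss is minimized at log_var = log(r ** 2), so
# samples the model cannot explain are driven toward a large variance.
print("clean pixel:   ", gaussian_nll(clean, math.log(clean ** 2)))
print("artifact pixel:", gaussian_nll(artifact, math.log(artifact ** 2)))

# With a large variance, the artifact contributes a much smaller gradient.
print("weight, clean:   ", residual_weight(math.log(clean ** 2)))
print("weight, artifact:", residual_weight(math.log(artifact ** 2)))
```

The log-variance term penalizes large variances, so the model cannot lower the loss by inflating the variance everywhere. Only samples with persistently large residuals end up with high variance and a correspondingly low weight.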

3). PSF:

Fig. 12 shows the reconstructed volumes of the full model and the model ablating PSF during training or inference. The model trained without PSF suffered from partial volume effects leading to blurred results. Ablating the PSF during inference improved the sharpness of the output while being more vulnerable to image noise and aliasing.

G. Understanding Hash Grid Encoding

Fig. 13 visualizes the learned hash grids at different levels as well as the corresponding reconstruction result. The low-level hash grid is of low resolution, and therefore, learns low-frequency features in the images. The middle level has a finer grid, but the conceptual grid size is still comparable to the actual size of the hash table, so it is able to encode anatomical structures of the brain volume. For the high-level grid, however, the conceptual grid size is far greater than the actual size of the hash table, resulting in severe hash collisions. Therefore, it would encode some sparse features or high-frequency noise in the images.
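The relation between the conceptual grid size and the hash table size can be made concrete with a small calculation. The sketch below follows the geometric growth of grid resolution used in multiresolution hash encoding [20]. The base resolution, scale factor, number of levels, and table size are illustrative assumptions, not the settings used in the experiments.

```python
import math


def level_resolutions(base_res, scale, n_levels):
    """Grid resolution per level, growing geometrically with the scale factor."""
    return [int(math.floor(base_res * scale ** level)) for level in range(n_levels)]


def vertices_per_slot(resolution, table_size):
    """Conceptual grid vertices per hash-table entry; above 1, collisions are unavoidable."""
    return (resolution + 1) ** 3 / table_size


# Hypothetical configuration chosen only for illustration.
base_res, scale, n_levels = 16, 2.0, 5
table_size = 2 ** 19

for level, res in enumerate(level_resolutions(base_res, scale, n_levels)):
    ratio = vertices_per_slot(res, table_size)
    status = "fits in table" if ratio <= 1.0 else "hash collisions"
    print(f"level {level}: resolution {res}, vertices/slot {ratio:.3f} ({status})")
```

With these assumed settings, the coarse and middle levels have fewer grid vertices than table entries, whereas the finest levels map many vertices to each entry, which matches the qualitative behavior described above.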

One of the advantages of hash grid encoding compared to other encodings, e.g., frequency encoding, is the convergence speed. Fig. 14 shows the convergence of NeSVoR on a fetal subject. The model converged to a high-quality volume in one to two minutes. Fig. 15 shows the reconstruction results of the same subject obtained by replacing the hash grid encoding with frequency encoding and using eight-layer MLPs as in [16]. The frequency encoding needs much more time to converge, and its results tend to be smoother than those of the hash grid encoding.

Fig. 15. — Results of reconstruction with frequency encoding and hash grid encoding.

H. Hyperparameters

In this section, we study the impact of different hyperparameters on the performance of NeSVoR. Fig. 16 shows the PSNR and runtime of NeSVoR per hyperparameter setting, where we varied one hyperparameter at a time.

The PSNR increases with the size of slice embeddings, since it could potentially encode more slice-specific information. The gain starts to saturate after 16, while the run time increases faster.

The number of hidden layers has a minimal impact on the PSNR. This result is consistent with the previous work [20] and indicates that most of the information of the volume is encoded in the hash grid.

The scale factor $s$ of the hash grid determines how the size of the grid increases per level. When $s$ is too small, the grid size is not enough to encode high-frequency details in the images. On the other hand, if $s$ is too large, the grid size increases too fast while the actual size of the hash table is fixed, leading to severe hash collisions.

Fixing the factor $s$ , the PSNR first increases with the number of levels, as more and more features are encoded. It also saturates after a threshold, which indicates the resolution at the highest level is finer than the finest detail in the data.

In NeSVoR, we propose a method to impose image regularization using sampling. To demonstrate the efficacy of the regularization, we reconstructed a fetal brain with different weights of regularization $λ_{V}$ . As expected, the reconstructed volume becomes smoother as $λ_{V}$ increases.

For slice-to-volume reconstruction, multiple stacks of slices of different orientations are acquired to oversample the brain ROI and the number of input stacks would affect the quality of reconstruction. We collected 10 stacks of slices of a subject (2 axial, 4 sagittal, and 4 coronal) and used different subsets of data to reconstruct the brain volume with NeSVoR. Fixing the number of input stacks, the setting that contains more different orientations yields better reconstruction results (e.g., 4S vs. 2S+2C). Moreover, increasing the number of stacks per orientation can further increase the reconstruction quality (e.g., 1A+1S+1C vs 2A+2S+2C).

I. Incorporating Deep Initializer

Slice-to-volume reconstruction is vulnerable to subject motion, since the slice transformations are optimized by maximizing the local slice-to-volume consistency, leading to a limited capture range of subject motion. Fig. 19 shows the reconstruction of a fetal subject ( $G A = 21$ weeks) with 7 input stacks. The input data suffered from severe motion and the transformation optimization in NeSVoR failed to correct the slice misalignment in the data.

Fig. 19. — The reconstruction results of NeSVoR with and without SVoRT initialization on a fetal subject with severe motion. The last row shows one of the input stacks that is corrupted by fetal motion.

Recently, many methods were proposed to address this problem by formulating the slice-to-volume registration as a learning problem [40]–[42], where deep neural networks are trained to predict the 3D location of each input slice. By learning from a large dataset in a supervised manner, these approaches are able to identify and correct large motions in fetal MRI. The predicted slice transformations can be used to initialize downstream reconstruction algorithms to improve the robustness in presence of motion.

To demonstrate the potential of combining NeSVoR with learning based slice-to-volume registration methods, we adopted the Slice-to-Volume Registration Transformer (SVoRT) [42] to predict the slice transformation of the data in Fig. 19, and used them to initialize NeSVoR. The reconstruction results show that, with the SVoRT initialization, NeSVoR is able to restore the correct 3D anatomical structures from the 2D slices even though they are corrupted by extreme subject motion during the scan. Fig. 20 visualizes the output transformations of NeSVoR, where three input slices from different stacks are placed at the corresponding output locations. NeSVoR alone cannot correct the large movement in the data, leading to a local minimum with significant slice misalignment. With the help of SVoRT, however, NeSVoR yields results with better consistency between different slices.

Fig. 20. — Visualization of output transformations of NeSVoR with and without SVoRT initialization. Three input slices from different stacks are placed in the 3D space according to the corresponding output locations.

IV. DISCUSSION AND CONCLUSION

We have presented NeSVoR, a novel approach for fast, robust slice-to-volume reconstruction based on implicit neural representation. We adopt a continuous representation for the underlying volume and the slice acquisition model, which is resolution-agnostic and efficient for reconstructing volumes at high resolution. We also introduce a probabilistic noise model for outlier removal in reconstruction. Extensive evaluations on both simulated and clinical data show that NeSVoR produces high-quality results that are robust to subject motion, bias fields, and artifacts while achieving a significant speedup over traditional SVR methods. The proposed implementation can reconstruct a high-quality fetal brain volume in about a minute (Fig. 14), and potentially enables online reconstruction of fetal MRI during scans, which can be combined with online image quality assessment [43] and fetal brain tracking [44] to implement a fully automated pipeline for fetal MRI. Also, for an input dataset with 9 stacks (309 slices), the peak GPU memory usage of NeSVoR is only 832MB. Therefore, it can also be run on a GPU with less RAM.

It is noteworthy that the current model only focuses on the rigid motion for brain MRI. For applications with larger ROIs, such as fetal body and placenta reconstruction, a deformable motion model should be employed [45], [46]. Moreover, as the slice transformations are optimized with gradient descent, NeSVoR is only able to recover relatively limited transformations of the target object. To this end, we further demonstrate that deep learning based slice-to-volume registration methods, e.g., SVoRT [42], can be used to initialize slice transformations and improve the robustness of NeSVoR in the presence of large motion. Also, PSFs beyond the Gaussian function, such as the sinc kernel, will be explored in the future. The developed formulation of INR-based reconstruction is general and well suited to other reconstruction problems that involve a PSF model. For future work, we plan to extend NeSVoR to other types of MR acquisitions and even other modalities.

Taken together, NeSVoR provides a rather general framework for exploiting implicit neural representation in slice-to-volume reconstruction, which is potentially applicable to a broader range of reconstruction problems in medical imaging.

Fig. 17. — The reconstruction results and quantitative metrics for a subject using different numbers of input stacks. We use the result with 10 input stacks as reference. A, S, and C mean axial, sagittal, and coronal respectively. For instance, 2A+4S+4C means the input stacks consist of two axial stacks, four sagittal stacks, and four coronal stacks.

Fig. 18. — The reconstruction results of a subject from the clinical fetal dataset with different weights for the image regularization term.

ACKNOWLEDGMENT

This research was supported by NIH R01EB032708, R01HD100009, NIBIB NAC P41EB015902, U01HD087211, R01EB017337, R01AG070988, RF1MH123195, ERC Starting Grant 677697, and ARUK-IRG2019A-003.

Footnotes

¹ Since the $K$ samples that we have are i.i.d., it doesn’t matter how we pair the points. Here we chose to pair point $k$ and point $k + K / 2$ as this is easy to implement.

² The softplus function is defined as $\mathrm{softplus}(x) = \log (1 + \exp (x))$.

³ https://github.com/daviddmc/NeSVoR

⁴ https://github.com/SVRTK/SVRTK

⁵ https://github.com/bkainz/fetalReconstruction

⁶ https://github.com/gift-surg/NiftyMIC

Contributor Information

Junshen Xu, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA.

Daniel Moyer, Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA, USA.

Borjan Gagoski, Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA, and Harvard Medical School, Boston, MA, USA.

Juan Eugenio Iglesias, Center for Medical Image Computing, UCL, London, UK, the Martinos Center for Biomedical Imaging, Harvard Medical School, Boston, MA, USA, and the Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA, USA.

P. Ellen Grant, Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA, and Harvard Medical School, Boston, MA, USA.

Polina Golland, Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA, USA.

Elfar Adalsteinsson, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA.

REFERENCES

[1] Saleem SN, “Fetal MRI: An approach to practice: A review,” Journal of advanced research, vol. 5, no. 5, pp. 507–523, 2014.
[2] Rousseau F, Glenn OA, Iordanova B, Rodriguez-Carranza C, Vigneron DB, Barkovich JA, and Studholme C, “Registration-based approach for reconstruction of high-resolution in utero fetal MR brain images,” Academic radiology, vol. 13, no. 9, pp. 1072–1081, 2006.
[3] Gholipour A, Estroff JA, and Warfield SK, “Robust super-resolution volume reconstruction from slice acquisitions: application to fetal brain MRI,” IEEE transactions on medical imaging, vol. 29, no. 10, pp. 1739–1758, 2010.
[4] Kuklisova-Murgasova M, Quaghebeur G, Rutherford MA, Hajnal JV, and Schnabel JA, “Reconstruction of fetal brain MRI with intensity matching and complete outlier removal,” Medical image analysis, vol. 16, no. 8, pp. 1550–1564, 2012.
[5] Jiang S, Xue H, Glover A, Rutherford M, Rueckert D, and Hajnal JV, “MRI of moving subjects using multislice snapshot images with volume reconstruction (SVR): application to fetal, neonatal, and adult brain studies,” IEEE transactions on medical imaging, vol. 26, no. 7, pp. 967–980, 2007.
[6] Odille F, Bustin A, Chen B, Vuissoz P-A, and Felblinger J, “Motion-corrected, super-resolution reconstruction for high-resolution 3D cardiac cine MRI,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 435–442, Springer, 2015.
[7] Marami B, Scherrer B, Afacan O, Erem B, Warfield SK, and Gholipour A, “Motion-robust diffusion-weighted brain MRI reconstruction through slice-level registration-based motion tracking,” IEEE transactions on medical imaging, vol. 35, no. 10, pp. 2258–2269, 2016.
[8] Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, and Ng R, “NeRF: Representing scenes as neural radiance fields for view synthesis,” in European conference on computer vision, pp. 405–421, Springer, 2020.
[9] Sitzmann V, Zollhöfer M, and Wetzstein G, “Scene representation networks: Continuous 3d-structure-aware neural scene representations,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[10] Zhang K, Riegler G, Snavely N, and Koltun V, “NeRF++: Analyzing and improving neural radiance fields,” arXiv preprint arXiv:2010.07492, 2020.
[11] Kim K, Habas PA, Rousseau F, Glenn OA, Barkovich AJ, and Studholme C, “Intersection based motion correction of multislice MRI for 3-D in utero fetal brain image formation,” IEEE transactions on medical imaging, vol. 29, no. 1, pp. 146–158, 2009.
[12] Tourbier S, Bresson X, Hagmann P, Thiran J-P, Meuli R, and Cuadra MB, “An efficient total variation algorithm for super-resolution in fetal brain MRI with adaptive regularization,” NeuroImage, vol. 118, pp. 584–597, 2015.
[13] Kainz B, Steinberger M, Wein W, Kuklisova-Murgasova M, Malamateniou C, Keraudren K, Torsney-Weir T, Rutherford M, Aljabar P, Hajnal JV, et al., “Fast volume reconstruction from motion corrupted stacks of 2D slices,” IEEE transactions on medical imaging, vol. 34, no. 9, pp. 1901–1913, 2015.
[14] Ebner M, Wang G, Li W, Aertsen M, Patel PA, Aughwane R, Melbourne A, Doel T, Dymarkowski S, et al., “An automated framework for localization, segmentation and super-resolution reconstruction of fetal brain MRI,” NeuroImage, vol. 206, p. 116324, 2020.
[15] Wang Z, Wu S, Xie W, Chen M, and Prisacariu VA, “NeRF--: Neural radiance fields without known camera parameters,” arXiv preprint arXiv:2102.07064, 2021.
[16] Martin-Brualla R, Radwan N, Sajjadi MS, Barron JT, Dosovitskiy A, and Duckworth D, “NeRF in the wild: Neural radiance fields for unconstrained photo collections,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7210–7219, 2021.
[17] Wu Q, Li Y, Xu L, Feng R, Wei H, Yang Q, Yu B, Liu X, Yu J, and Zhang Y, “IREM: High-resolution magnetic resonance image reconstruction via implicit neural representation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 65–74, Springer, 2021.
[18] Yeung P-H, Hesse L, Aliasi M, Haak M, Xie W, Namburete AI, et al., “Implicitvol: Sensorless 3D ultrasound reconstruction with deep implicit representation,” arXiv preprint arXiv:2109.12108, 2021.
[19] Tancik M, Srinivasan P, Mildenhall B, Fridovich-Keil S, Raghavan N, Singhal U, Ramamoorthi R, Barron J, and Ng R, “Fourier features let networks learn high frequency functions in low dimensional domains,” Advances in Neural Information Processing Systems, vol. 33, pp. 7537–7547, 2020.
[20] Müller T, Evans A, Schied C, and Keller A, “Instant neural graphics primitives with a multiresolution hash encoding,” arXiv preprint arXiv:2201.05989, 2022.
[21] Chabra R, Lenssen JE, Ilg E, Schmidt T, Straub J, Lovegrove S, and Newcombe R, “Deep local shapes: Learning local sdf priors for detailed 3d reconstruction,” in European Conference on Computer Vision, pp. 608–625, Springer, 2020.
[22] Bojanowski P, Joulin A, Lopez-Paz D, and Szlam A, “Optimizing the latent space of generative networks,” arXiv preprint arXiv:1707.05776, 2017.
[23] Ulyanov D, Vedaldi A, and Lempitsky V, “Deep image prior,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9446–9454, 2018.
[24] Kingma DP and Ba J, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[25] Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, and Lerer A, “Automatic differentiation in pytorch,” 2017.
[26] Müller T, Rousselle F, Novák J, and Keller A, “Real-time neural radiance caching for path tracing,” arXiv preprint arXiv:2106.12372, 2021.
[27] Micikevicius P, Narang S, Alben J, Diamos G, Elsen E, Garcia D, Ginsburg B, Houston M, Kuchaiev O, Venkatesh G, et al., “Mixed precision training,” arXiv preprint arXiv:1710.03740, 2017.
[28] Van Essen DC, Smith SM, Barch DM, Behrens TE, Yacoub E, Ugurbil K, Consortium W-MH, et al., “The WU-Minn human connectome project: an overview,” Neuroimage, vol. 80, pp. 62–79, 2013.
[29] Pérez-García F, Sparks R, and Ourselin S, “TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning,” Computer Methods and Programs in Biomedicine, vol. 208, p. 106236, 2021.
[30] Gudbjartsson H and Patz S, “The Rician distribution of noisy MRI data,” Magnetic resonance in medicine, vol. 34, no. 6, pp. 910–914, 1995.
[31] Payette K, de Dumast P, Kebiri H, Ezhov I, Paetzold JC, Shit S, Iqbal A, Khan R, Kottke R, Grehten P, et al., “An automatic multi-tissue human fetal brain segmentation benchmark using the fetal tissue annotation dataset,” Scientific Data, vol. 8, no. 1, pp. 1–14, 2021.
[32] Xu J, Abaci Turk E, Grant PE, Golland P, and Adalsteinsson E, “STRESS: Super-resolution for dynamic fetal MRI using self-supervised learning,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 197–206, Springer, 2021.
[33] Xu J, Zhang M, Turk EA, Zhang L, Grant PE, Ying K, Golland P, and Adalsteinsson E, “Fetal pose estimation in volumetric mri using a 3d convolution neural network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 403–410, Springer, 2019.
[34] Hughes EJ, Winchman T, Padormo F, Teixeira R, Wurie J, Sharma M, Fox M, Hutter J, Cordero-Grande L, Price AN, et al., “A dedicated neonatal brain imaging system,” Magnetic resonance in medicine, vol. 78, no. 2, pp. 794–804, 2017.
[35] Ranzini M, Fidon L, Ourselin S, Modat M, and Vercauteren T, “MONAIfbs: MONAI-based fetal brain MRI deep learning segmentation,” arXiv preprint arXiv:2103.13314, 2021.
[36] Wang Z, Bovik AC, Sheikh HR, and Simoncelli EP, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
[37] Xu J, Lala S, Gagoski B, Abaci Turk E, Grant PE, Golland P, and Adalsteinsson E, “Semi-supervised learning for fetal brain MRI quality assessment with ROI consistency,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 386–395, Springer, 2020.
[38] Sui Y, Afacan O, Jaimes C, Gholipour A, and Warfield SK, “Scan-specific generative neural network for mri super-resolution reconstruction,” IEEE Transactions on Medical Imaging, 2022.
[39] Laidlaw DH, Fleischer KW, and Barr AH, “Partial-volume bayesian classification of material mixtures in mr volume data using voxel histograms,” IEEE transactions on medical imaging, vol. 17, no. 1, pp. 74–86, 1998.
[40] Hou B, Khanal B, Alansary A, McDonagh S, Davidson A, Rutherford M, Hajnal JV, Rueckert D, Glocker B, and Kainz B, “3-D reconstruction in canonical co-ordinate space from arbitrarily oriented 2-D images,” IEEE transactions on medical imaging, vol. 37, no. 8, pp. 1737–1750, 2018.
[41] Shi W, Xu H, Sun C, Sun J, Li Y, Xu X, Zheng T, Zhang Y, Wang G, and Wu D, “Affirm: Affinity fusion-based framework for iteratively random motion correction of multi-slice fetal brain mri,” arXiv preprint arXiv:2205.05851, 2022.
[42] Xu J, Moyer D, Grant PE, Golland P, Iglesias JE, and Adalsteinsson E, “SVoRT: Iterative transformer for slice-to-volume registration in fetal brain MRI,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2022 – 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part VI (Wang L, Dou Q, Fletcher PT, Speidel S, and Li S, eds.), vol. 13436 of Lecture Notes in Computer Science, pp. 3–13, Springer, 2022.
[43] Gagoski B, Xu J, Wighton P, Tisdall MD, Frost R, Lo W-C, Golland P, van Der Kouwe A, Adalsteinsson E, and Grant PE, “Automated detection and reacquisition of motion-degraded images in fetal HASTE imaging at 3T,” Magnetic resonance in medicine, vol. 87, no. 4, pp. 1914–1922, 2022.
[44] Hoffmann M, Abaci Turk E, Gagoski B, Morgan L, Wighton P, Tisdall MD, Reuter M, Adalsteinsson E, Grant PE, Wald LL, et al., “Rapid head-pose detection for automated slice prescription of fetal-brain MRI,” International journal of imaging systems and technology, vol. 31, no. 3, pp. 1136–1154, 2021.
[45] Alansary A, Rajchl M, McDonagh SG, Murgasova M, Damodaram M, Lloyd DF, Davidson A, Rutherford M, Hajnal JV, Rueckert D, et al., “PVR: patch-to-volume reconstruction for large area motion correction of fetal MRI,” IEEE transactions on medical imaging, vol. 36, no. 10, pp. 2031–2044, 2017.
[46] Uus A, Zhang T, Jackson LH, Roberts TA, Rutherford MA, Hajnal JV, and Deprez M, “Deformable slice-to-volume registration for motion correction of fetal body and placenta MRI,” IEEE transactions on medical imaging, vol. 39, no. 9, pp. 2750–2759, 2020.

