Abstract
In recent years, there has been attention on leveraging the statistical modeling capabilities of neural networks for reconstructing sub-sampled Magnetic Resonance Imaging (MRI) data. Most proposed methods assume the existence of a representative fully-sampled dataset and use fully-supervised training. However, for many applications, fully sampled training data is not available, and may be highly impractical to acquire. The development and understanding of self-supervised methods, which use only sub-sampled data for training, are therefore highly desirable. This work extends the Noisier2Noise framework, which was originally constructed for self-supervised denoising tasks, to variable density sub-sampled MRI data. We use the Noisier2Noise framework to analytically explain the performance of Self-Supervised Learning via Data Undersampling (SSDU), a recently proposed method that performs well in practice but until now lacked theoretical justification. Further, we propose two modifications of SSDU that arise as a consequence of the theoretical developments. Firstly, we propose partitioning the sampling set so that the subsets have the same type of distribution as the original sampling mask. Secondly, we propose a loss weighting that compensates for the sampling and partitioning densities. On the fastMRI dataset we show that these changes significantly improve SSDU’s image restoration quality and robustness to the partitioning parameters.
Index Terms: Deep learning, image reconstruction, magnetic resonance imaging
I. Introduction
The data acquisition process in Magnetic Resonance Imaging (MRI) consists of traversing a sequence of smooth paths through the Fourier representation of the image, referred to as “k-space”, which is inherently time-consuming. Images can be reconstructed from accelerated, sub-sampled acquisitions by leveraging the non-uniformity of receiver coil sensitivities, referred to as “parallel imaging” [1], [2], [3], [4]. Compressed sensing [5], [6], which uses sparse models to reconstruct incoherently sampled data, has also been widely applied to MRI [7], [8], [9].
There has been significant research attention in recent years on methods that reconstruct sub-sampled MRI data with neural networks [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]. The majority of these works use fully-supervised training. To train a network in a fully-supervised manner, there must be a dataset comprised of fully sampled k-space data y0,t ∈ ℂN, where N is the dimension of k-space multiplied by the number of coils, and paired sub-sampled data yt = MΩt y0,t. Here, t indexes the training set and MΩt is a diagonal sub-sampling mask with sampling set Ωt, so that the jth diagonal entry of MΩt is 1 if j ∈ Ωt and zero otherwise. Then a network fθ with parameters θ is trained by seeking a minimum of a non-convex loss function:
θ* ∈ arg minθ Σt L(y0,t, fθ(yt)), (1)
which could be, for example, an ℓ1 or ℓ2 norm in the image domain after coil combination [25]. The network estimates the ground truth in the image domain or k-space depending on the choice of loss function. For a k-space to k-space network, y0,s can be estimated with ŷ0,s = fθ̂(ys), where s indexes the test set and θ̂ denotes the trained parameters.
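To make the supervised setup concrete, the following is a minimal sketch (not the authors' released code) of a training objective of the form (1), assuming a k-space-to-k-space network and a simple squared error; the tensor shapes and the name f_theta are illustrative.

```python
import torch

def supervised_loss(f_theta, y0, mask_omega):
    """Fully-supervised loss of the form (1) for a k-space-to-k-space network.

    y0:         fully sampled multi-coil k-space, complex tensor of shape (C, H, W)
    mask_omega: binary mask M_Omega_t, broadcastable to y0 (1 where sampled)
    """
    y = mask_omega * y0              # sub-sampled measurements y_t = M_Omega_t y_0,t
    y0_hat = f_theta(y)              # network estimate of fully sampled k-space
    return torch.sum(torch.abs(y0_hat - y0) ** 2)
```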
Given sufficient representative training data, fully-supervised networks can yield substantial reconstruction quality gains over sparsity-based compressed sensing methods. There are a number of large datasets available for fully supervised training, such as the fastMRI knee and brain data [25]. However, for many contrasts, orientations, or anatomical regions of interest, fully sampled datasets are not publicly available. Fully sampled data is rarely acquired as part of a normal scanning protocol, so acquiring sufficient training data for a specific application is highly resource intensive. In some cases, it may not even be technically feasible to acquire such data [26], [27], [28]. Therefore, for MRI reconstruction with deep learning to be applicable to datasets acquired using only standard protocols, a training method that uses solely sub-sampled data is required.
There have been several attempts to train networks with only sub-sampled MRI data [29], [30], [31], [32], [33], [34], [35], [36], [37], some of which are based on methods from the denoising literature [38], [39], [40], [41], [42], [43], [44]. One such approach is Noise2Noise [38]. Rather than mapping yt to y0,t, Noise2Noise trains a network to map yt to another sub-sampled k-space of the same ground truth, yT = MΩT y0,t, where ΩT and Ωt are sampled independently [31]. A limitation of Noise2Noise is that it requires paired data, so the dataset must contain two independently sampled scans of the same k-space [14], which is not part of standard protocols. Further, unless compensated for [45], any motion and phase drifts between scans would cause the paired data to be inconsistent, violating the central assumption that underlies the method.
SSDU [33] is a recently proposed method for ground-truth free training that does not require paired data. SSDU partitions the sampling set Ωt into two disjoint sets At and Bt, where At ∪ Bt = Ωt and At ∩ Bt = ∅. Then the network is trained to recover MAt yt from MBt yt:
LSSDU(θ) = Σt L(MAt yt, MAt fθ(MBt yt)). (2)
At inference, the estimate ŷ0,s = fθ̂(ys) is used. With a physics-guided network architecture, SSDU was found to have a reconstruction quality comparable with fully supervised training given certain empirically selected choices of At and Bt. However, it was presented without theoretical justification. Although SSDU has similarities with Noise2Self [40], Noise2Self's analysis has a strong requirement of independent noise, so does not apply to k-space sampling in general.
A. Contributions
This article considers the recently proposed Noisier2Noise framework [41], which was originally constructed for denoising problems. We modify Noisier2Noise so that it can be applied to variable density sub-sampled MRI data. To our knowledge, this is the first work that applies Noisier2Noise to image reconstruction. Like SSDU, the proposed modification of Noisier2Noise does not require paired data, and involves training a network to map from one subset of Ωt to another. While SSDU recovers one disjoint set from the other, Noisier2Noise applies a second sub-sampling mask to the data, ỹt = MΛt yt, and the network is trained to recover yt from ỹt with an ℓ2 loss. Then, at inference, the fully sampled data is estimated via a correction term based on the distributions of Λt and Ωt that ensures that the estimate is correct in expectation.
Despite their superficial differences, we show that, in fact, SSDU and Noisier2Noise are closely related. Specifically, we demonstrate that SSDU is a version of Noisier2Noise with a particular loss function modification that removes the need for the correction term at inference. The primary contribution of this article is the use of Noisier2Noise to theoretically explain SSDU's excellent empirical performance. Specifically, we show that SSDU with an ℓ2 loss correctly estimates fully sampled k-space in expectation: see Section II-D.
The second contribution of this article is the proposal of two modifications of SSDU that significantly improve its reconstruction quality and robustness to the parameters of MΛ, both of which arise as a consequence of SSDU's connection to Noisier2Noise. Firstly, we use Noisier2Noise to inform SSDU's sampling set partition: we show that SSDU's performance improves when Bt has the same type of distribution as the original mask Ωt, but not necessarily with the same parameters. Secondly, we show that SSDU's performance improves when a particular weighting is employed in the loss function. This non-trivial weighting, which arises as a consequence of the novel theoretical analysis of SSDU, depends on the distributions of Λt and Ωt and has minimal additional computational cost: see Section II-F.
Although this paper focuses on MRI reconstruction, we emphasize that none of the theoretical developments are specific to k-space. This framework is therefore applicable to any image reconstruction problem with a forward model that involves random sub-sampling, such as low dose x-ray computed tomography [46] or astronomical imaging [47].
II. Theory
This section describes how the Noisier2Noise framework can be applied to sub-sampled data. Additive and multiplicative noise versions of Noisier2Noise are proposed in [41]. Based on the observation that a k-space sub-sampling mask can be considered as multiplicative “noise”, we extend Noisier2Noise to image reconstruction by modifying the latter. It is standard practice in MRI to sub-sample k-space with variable density, so that low frequencies, where the spectral density is larger, are sampled with higher probability [7]. Since the multiplicative noise version of standard Noisier2Noise assumes uniformity, this requires a modification of the framework to variable density sampling.
A. Variable Density Noisier2Noise for Reconstruction
The terms in the measurement model can be considered as instances of random variables. We denote Y = MΩY0, where Y, MΩ and Y0 are the random variables corresponding to yt, MΩt, and y0,t respectively. Now consider the multiplication of Y by a second mask represented by the random variable MΛ, Ỹ = MΛY, so that Ỹ is a further sub-sampled random variable. The following result states how the expectation of Y0 can be computed from Ỹ and Y. Here, and throughout this article, 𝔼[·] is used to denote the expectation over all random variables within the brackets.
Claim 1: When pj > 0 and p̃j < 1 for all j, the expectation of Y0 given Ỹ is
𝔼[Y0 | Ỹ] = (𝟙 − K)−1(𝔼[Y | Ỹ] − KỸ), (3)
where K is a diagonal matrix defined as
K = (𝟙 − PP̃)−1(𝟙 − P), (4)
for P = 𝔼[MΩ] and P̃ = 𝔼[MΛ], with diagonal entries pj = ℙ(j ∈ Ω) and p̃j = ℙ(j ∈ Λ) respectively.
Proof: See Section A of the Appendix, which is based on the proof given in Section III.D of [41].
Equation (3) generalizes the version of Noisier2Noise proposed for uniform, multiplicative noise in [41] to variable density sampling. The difference between the uniform and variable density versions is the matrix K, which is a scalar in [41]. For the special case where MΩ and MΛ are uniformly random sub-sampling masks, P and P̃, and therefore K, are proportional to the identity matrix, and (3) simplifies to the uniform version. The mathematical requirement that pj > 0 and p̃j < 1 for all j simply ensures that (𝟙 − K) is invertible: see Section A of the Appendix.
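As a concrete illustration of (4), the sketch below computes the diagonal of K and of the inference correction (𝟙 − K)−1 element-wise from the per-location sampling probabilities; the array names are assumptions of this sketch, not the released implementation.

```python
import numpy as np

def correction_matrices(p, p_tilde):
    """Diagonals of K and (1 - K)^{-1} from (4), computed element-wise.

    p:       array of p_j = P(j in Omega), must satisfy p_j > 0
    p_tilde: array of p~_j = P(j in Lambda), must satisfy p~_j < 1
    """
    K = (1.0 - p) / (1.0 - p * p_tilde)       # K_jj = (1 - p_j) / (1 - p_j p~_j)
    one_minus_K_inv = 1.0 / (1.0 - K)         # correction applied at inference
    return K, one_minus_K_inv
```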
Equation (3) implies that 𝔼[Y0 | Ỹ] can be estimated without fully sampled data by training a network to estimate 𝔼[Y | Ỹ]. To do this, a network can be trained to minimize
L(θ) = 𝔼[‖W(fθ(Ỹ) − Y)‖₂²] (5)
for a full-rank matrix W. The minimum occurs when the gradient with respect to θ is zero:
∇θL(θ) ∝ 𝔼[JWHW(fθ(Ỹ) − Y)] = 0,
where J is the Jacobian matrix with entries Jkj = ∂[fθ(Ỹ)]j/∂θk. The number of parameters is typically much greater than N, so J has far more rows than columns. Assuming that the rows of J are maximally linearly independent, so that the row space is N-dimensional, the only solution is
𝔼[WHW(fθ*(Ỹ) − Y) | Ỹ] = 0. (6)
If W is full-rank, WHW is also full rank, so left-multiplying by (WHW)−1 and using 𝔼[fθ*(Ỹ) | Ỹ] = fθ*(Ỹ),
fθ*(Ỹ) = 𝔼[Y | Ỹ].
Therefore, by (3), a candidate for estimating fully sampled k-space with sub-sampled data only is
Ŷ0 = (𝟙 − K)−1(fθ*(Ỹ) − KỸ).
This expression does not use Y, so does not use all available data. Two candidate approaches for using all available data at inference are considered in this article. Firstly, one can overwrite known entries of the network output with Y:
Ŷ0DC = Y + (𝟙 − MΩ)(𝟙 − K)−1(fθ*(Ỹ) − KỸ) = Y + (𝟙 − MΩ)(𝟙 − K)−1fθ*(Ỹ),
where the final step uses (𝟙 − MΩ)Ỹ = 0. Here, the superscript DC refers to "data consistent", since the estimate is exactly consistent with Y. We emphasize that Ŷ0DC is consistent with all available data Y, not just the data in Ỹ. Alternatively, similar to the approaches suggested in both SSDU [33] and the additive noise examples in Noisier2Noise [41], one can use singly sub-sampled k-space Y as the network input at inference:
Ŷ0 = Y + (𝟙 − MΩ)(𝟙 − K)−1fθ*(Y). (7)
Since Claim 1 applies to Ỹ, not Y, (7) is not guaranteed to be correct in expectation. However, it has the advantage that all available data is used by the network. Hence, despite deviating from strict theory, we have found that it performs well in practice: see Section IV.
This suggests the following procedure, illustrated in Fig. 1, for training a network without fully-sampled data. For each sub-sampled k-space yt in the training set, generate a further sub-sampled k-space ỹt = MΛt yt, where MΛt is an instance of MΛ. Then, approximate (5) by training a network to minimize the loss function
L(θ) = Σt ‖W(fθ(ỹt) − yt)‖₂² (8)
for some full-rank matrix W. During inference, estimate fully-sampled k-space with either
ŷ0,s = ys + (𝟙 − MΩs)(𝟙 − K)−1fθ̂(ỹs) (9)
or
ŷ0,s = ys + (𝟙 − MΩs)(𝟙 − K)−1fθ̂(ys), (10)
where s indexes the test set and θ̂ denotes the trained parameters.
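The procedure can be summarized by the following sketch, assuming a PyTorch network f_theta, a hypothetical helper sample_mask that draws an instance of MΛ from the probabilities p̃, and the element-wise correction from the earlier sketch; it illustrates (8) and (10) and is not the released implementation.

```python
import torch

def noisier2noise_train_step(f_theta, y, p_tilde, W=None):
    """One training example's contribution to the loss (8)."""
    mask_lambda = sample_mask(p_tilde)       # hypothetical helper: instance of M_Lambda
    y_tilde = mask_lambda * y                # doubly sub-sampled k-space
    residual = f_theta(y_tilde) - y
    if W is not None:                        # optional loss weighting (W = identity here)
        residual = W * residual
    return torch.sum(torch.abs(residual) ** 2)

def noisier2noise_inference(f_theta, y, mask_omega, one_minus_K_inv):
    """Estimate (10): singly sub-sampled input with the (1 - K)^{-1} correction."""
    return y + (1 - mask_omega) * one_minus_K_inv * f_theta(y)
```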
Fig. 1.
Schematic of the self-supervised training methods in this article. If the loss weighting W is full rank, the training method is variable density Noisier2Noise, as proposed in Section II-A, whereas if W = (𝟙 − MΛt)MΩt, the training method is SSDU: see Section II-D.
In other words, we train a network to estimate the "singly" sub-sampled k-space yt from the "doubly" sub-sampled k-space ỹt and then, during inference, apply a correction based on the diagonal matrix K to estimate the fully sampled data. The correction term only needs to be applied during inference and has minimal computational cost.
In [41], only the version with W = 𝟙 was presented. Here we present a version with non-trivial W because it provides a theoretical link to SSDU; Section II-D shows that Noisier2Noise with the rank-deficient W = (𝟙 − MΛ)MΩ is SSDU exactly.
Noisier2Noise and SSDU work because the network cannot deduce from ỹt which entries of yt are non-zero [41]. Therefore, the loss is minimized when the network learns to recover all of k-space: see Section V for a detailed discussion.
B. Choice of Mask Distributions
The only condition on the first mask MΩ from Claim 1 is that pj > 0 for all j. In other words, the guarantee only applies when there is a non-zero probability that there are sampled examples of all k-space locations in the training set.
Claim 1 also states that the second mask MΛ must obey p̃j < 1 for all j. This ensures that there is a non-zero probability that any given entry of Y is masked in Ỹ. Unlike MΩ, whose distribution is determined by the acquisition protocol, the distribution of MΛ can be chosen freely during training. Following [41], we suggest using a distribution of MΛ that is the same type as MΩ, but not necessarily with the same parameters. For instance, if MΩ is column-wise sampling with variable density, such as in Fig. 1, an appropriate MΛ is one that is also column-wise, but possibly with a different variable density distribution.
C. Choice of Network
Noisier2Noise is agnostic to the network architecture. We have found that using the data consistent function
fθ(ỹt) = ỹt + (𝟙 − MΛtMΩt)gθ(ỹt), (11)
where gθ is a network with arbitrary architecture, may improve the performance of Noisier2Noise. This is because the gθ in (11) only recovers regions of k-space that are not already sampled in ỹt, so the network does not need to learn to map sampled k-space locations to themselves. We emphasize that (11) ensures that fθ(ỹt) is consistent with ỹt, while (9) ensures the estimate is consistent with ys; the latter is applied only at inference and cannot be part of the network architecture when ỹs is used as the input.
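A sketch of the data consistent wrapper (11) is given below; gθ stands for any k-space-to-k-space module, and the class name is illustrative rather than part of the released code.

```python
import torch.nn as nn

class DataConsistentNet(nn.Module):
    """Implements (11): only unsampled locations of the input are filled by g_theta."""

    def __init__(self, g_theta):
        super().__init__()
        self.g_theta = g_theta

    def forward(self, y_tilde, mask_input):
        # mask_input is M_Lambda_t M_Omega_t, the sampling pattern of y_tilde
        return y_tilde + (1 - mask_input) * self.g_theta(y_tilde)
```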
Many popular network architectures for MRI reconstruction are based on a sequence of "unrolled" iterations of an optimization algorithm [48] such as the Iterative Shrinkage Thresholding Algorithm (ISTA) [49] or the Alternating Direction Method of Multipliers (ADMM) [50]. These are variously known as "physics-guided", "physics-based" or "model-based" methods due to their explicit use of the MRI forward model. These architectures typically alternate between a module that recovers missing k-space entries by removing aliasing in the image domain and a module that ensures consistency with the k-space data. This implies that (11), or possibly a "soft" version of it where the data is not forced to be exactly consistent, may already be implemented as part of the network architecture. In the experimental evaluation of the methods in this article we used the Variational Network (VarNet) [12], [51], which is one such architecture where (11) is not necessary. However, in preliminary studies not presented in this article we found that a U-net [52], which does not already employ data consistency, benefited considerably from (11).
D. Relationship to SSDU
This section shows that SSDU [33] with an ℓ2 loss is a version of Noisier2Noise with a particular rank-deficient loss weighting matrix W.
To see the connection between SSDU and Noisier2Noise, it is instructive to see the relationship between Noisier2Noise's Λt and SSDU's disjoint subsets At and Bt. Disjoint subsets of Ωt can be formed in terms of Ωt and Λt by setting At = Ωt \ Λt and Bt = Ωt ∩ Λt. The distributions of At and Bt are defined by the distributions of Ωt and Λt and always satisfy At ∪ Bt = Ωt and At ∩ Bt = ∅ as required. In terms of sampling masks, this is written as MAt = (𝟙 − MΛt)MΩt and MBt = MΛtMΩt. Therefore, SSDU's loss (2) with a squared ℓ2 norm is
LSSDU(θ) = Σt ‖(𝟙 − MΛt)MΩt(fθ(ỹt) − yt)‖₂²,
so is exactly Noisier2Noise with W = (𝟙 − MΛt)MΩt. In other words, while Noisier2Noise's loss is computed over all of k-space, SSDU's loss is computed only on indices that are in Ωt but not in Λt.
SSDU’s weighting ensures that any indices not sampled in Y are ignored in the loss. One might think that the correct choice for this goal would be . However, if a data consistent network is employed, as in (11), the contribution to the loss from indices in both Ωt and Λt would be zero because they are consistent by construction. Therefore the loss for and would be identical. A similar idea was presented for fully supervised learning in [53], where a mask is applied to the training data multiple times.
E. Proof of SSDU
This section shows that SSDU’s loss weighting causes the correction (𝟙 − K)−1 at inference to no longer be necessary. When the weighting matrix W is the random variable (𝟙 − MΛ)MΩ, the network parameters are trained to seek a minimum of
LSSDU(θ) = 𝔼[‖(𝟙 − MΛ)MΩ(fθ(Ỹ) − Y)‖₂²]. (12)
Unlike Noisier2Noise, W = (𝟙 − MΛ)MΩ is not full-rank, so the argument of Section II-A does not apply and we cannot conclude that fθ*(Ỹ) = 𝔼[Y | Ỹ]. The usual theoretical goal for self-supervised methods is to prove that the network is correct in expectation [38], [39], [40], [41], [42], [43], [44], as in Claim 1 for variable density Noisier2Noise. In the following we state, to our knowledge, the first such result for SSDU.
Claim 2: A network with parameters θ* that minimizes (12) satisfies
(𝟙 − K)𝔼[(𝟙 − MΛMΩ)(Y0 − fθ*(Ỹ)) | Ỹ] = 0. (13)
Proof: See Section B of the Appendix.
If 𝟙 − K is invertible, which holds when pj > 0 and p̃j < 1 for all j,
𝔼[(𝟙 − MΛMΩ)(Y0 − fθ*(Ỹ)) | Ỹ] = 0.
Therefore, in general, fθ*(Ỹ) is correct in expectation, but only in regions of k-space that are not sampled in Ỹ. This contrasts with the variable density Noisier2Noise method presented in Section II-A, which is correct in expectation for all k-space indices. However, as described in the following, this apparent shortcoming can easily be circumvented by using all available data at inference.
Similarly to Noisier2Noise’s (9) and (10), we consider two options for the k-space estimate at inference, both of which use all available data. Firstly, similarly to (9), the data consistent estimate
ŷ0,s = ys + (𝟙 − MΩs)fθ̂(ỹs) (14)
can be used, which is correct in expectation everywhere in k-space for any network architecture. Alternatively, the SSDU paper [33] suggests using
ŷ0,s = ys + (𝟙 − MΩs)fθ̂(ys) (15)
and a physics-guided network architecture. Like (10) for Noisier2Noise, the network input for (15) is singly sub-sampled, so Claim 2 does not apply and the estimate is not guaranteed to be correct in expectation. Nonetheless, it has the advantage over (14) that it uses all available data in the input to the network. As in [33], we have found that (15) performs well in practice when the network architecture includes a data consistency module: see Section IV.
We emphasize that unlike Noisier2Noise, SSDU does not require the correction term (𝟙 − K)−1 at inference. This implies that SSDU is less sensitive to inaccuracies in the network's estimate, and we have found that SSDU outperforms Noisier2Noise in general: see Section IV.
F. K-Weighted SSDU
Since we train on a finite number of instances of the random variables Y, Ỹ, Ω and Λ, the network parameters we obtain in practice, which we denote θ̂, are an approximation of the ideal θ* from (12). In this case, the right-hand-side of (13) is not exactly zero. Rather,
(𝟙 − K)𝔼[(𝟙 − MΛMΩ)(Y0 − fθ̂(Ỹ)) | Ỹ] = ε, (16)
where ε is a vector random variable. The vector ε characterizes the difference between a true expectation and the network's estimate of it, which is non-zero for finite data. In other words, ε is a statistical error due to finite sampling. The difference between the trained network's output and the expectation of interest, 𝔼[(𝟙 − MΛMΩ)Y0 | Ỹ], is (𝟙 − K)−1ε. This implies that the network is more sensitive to errors in k-space locations where (𝟙 − K)−1 is large.
To compensate for this, we propose minimizing the following weighted version of SSDU's loss as an alternative to (12):
LK(θ) = 𝔼[‖(𝟙 − K)−1/2(𝟙 − MΛ)MΩ(fθ(Ỹ) − Y)‖₂²].
Introducing (𝟙 − K)−1/2 in the loss cancels the 𝟙 − K in (16), so mitigates the error amplification caused by the approximation of θ*. We find that this version of SSDU, which we refer to as "K-weighted SSDU" throughout the remainder of this article, substantially improves the image restoration quality and robustness to training hyperparameters: see Section IV. We chose the power −1/2 because the weighting appears squared in the ℓ2 loss, so it exactly cancels the 𝟙 − K on the left-hand-side of (16); we also tried the weighting (𝟙 − K)−1 and found that, as expected, it did not perform as well in practice.
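A sketch of the K-weighted loss, assuming tensors p and p_tilde of per-location sampling probabilities; the clamp that guards against division by very small pj(1 − p̃j) is an implementation choice of this sketch, not of the article.

```python
import torch

def k_weighted_ssdu_loss(f_theta, y, mask_omega, mask_lambda, p, p_tilde):
    """K-weighted SSDU: the residual of (12) is scaled by (1 - K)^{-1/2}."""
    one_minus_K = p * (1 - p_tilde) / (1 - p * p_tilde)     # diagonal of 1 - K
    weight = one_minus_K.clamp_min(1e-8).pow(-0.5)          # (1 - K)^{-1/2}
    y_tilde = mask_lambda * y
    residual = (1 - mask_lambda) * mask_omega * (f_theta(y_tilde) - y)
    return torch.sum(torch.abs(weight * residual) ** 2)
```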
G. Understanding the Need for Correction
This section intuitively explains why Noisier2Noise requires correction at inference but SSDU does not. We can write the W = 𝟙 version of the loss (5) as
L(θ) = 𝔼[‖fθ(Ỹ) − Y‖₂²] = 𝔼[‖[(𝟙 − MΛ)MΩ + (𝟙 − MΩ) + MΛMΩ](fθ(Ỹ) − Y)‖₂²],
where we have used that the term in square brackets equals the identity matrix. When fθ(Ỹ) is consistent with Ỹ, such as in (11), MΛMΩ(fθ(Ỹ) − Y) = 0. Therefore
L(θ) = 𝔼[‖(𝟙 − MΛ)MΩ(fθ(Ỹ) − Y)‖₂² + ‖(𝟙 − MΩ)fθ(Ỹ)‖₂²], (17)
where we have used (𝟙 − MΩ)Y = 0. The first term of (17) is SSDU's loss function (12); the second is a contribution from all k-space indices j ∉ Ω.
Intuitively, the second term on the right-hand-side of (17) causes the proposed method to underestimate regions of k-space with index j ∉ Ω. This underestimation is compensated for with (𝟙 − K)−1 at inference. For SSDU, where W = (𝟙 − MΛ)MΩ, the second term on the right-hand-side of (17) is zero, k-space is not underestimated anywhere, and there is no need for a correction term at inference.
III. Experimental Method
A. Description of Data
We used the multi-coil brain and knee data from the fastMRI dataset [25], which is comprised of multi-channel raw k-space MRI data. The reference fastMRI test set data is magnitude images only, without fully sampled k-space data. Since we also require phase, we discarded the data allocated for testing and generated our own partition into training, validation and test sets. For the brain data, we only used data that was acquired on 16 coils, and used training, validation and test set sizes of 127, 19 and 14 volumes (2020, 302, and 224 slices) respectively. For the knee data, the training, validation and test sets consisted of 166, 19 and 14 volumes (5977, 665, and 493 slices) respectively. We set the network output to be zero in regions of k-space where the reference data had zero padding.
B. Network Architecture
For fθ, we used the variant of the VarNet [12] that estimates coil sensitivities on-the-fly [51], which performs competitively on the fastMRI leaderboard and is available as part of the fastMRI package.1 After a coil sensitivity estimation module, VarNet uses multiple repetitions of a module based on gradient descent, which is comprised of a data consistency term in k-space and a prior based on a U-net [52] that acts in the image domain after an inverse Fourier transform and coil combination. The output of the neural network was in k-space. We used 6 repetitions of the main module, so that our model had around 1.5×107 parameters. Note that in [25], the Structural Similarity Index (SSIM) [54] was used as the loss, while in this article we use an ℓ2 loss.
The only additional operations SSDU and Noisier2Noise require compared to fully-supervised training are simple entry-wise masks, so all methods had similar memory requirements and training time. We trained for 50 epochs, which took around 17 hours on a GTX 1080 Ti GPU with 11 GB of RAM for the brain data. For all methods we used the Adam optimizer [55] with a fixed learning rate of 10−3. Our PyTorch implementation is publicly available on GitHub.2
C. Distribution of Masks
So that the distribution of the sampling masks was known exactly, we generated our own masks rather than using those suggested in fastMRI. Unless stated otherwise, the distribution of the first mask MΩ was 1D column-wise. We fully sampled the central 10 columns and sampled the remainder with polynomial variable density. We used polynomial order 8, and scaled the probability density P so that it matched a desired acceleration factor. We ran each method with RΩ ∈ {4, 8}, where RΩ = N/𝔼[|Ω|] is the expected acceleration factor. An example at RΩ = 4 is shown in Fig. 2(a).
Fig. 2.
Example of the singly sub-sampled mask MΩt, and the doubly sub-sampled MΛtMΩt with two MΛ distribution types. Here, the acceleration factor of the first mask is RΩ = 4 and the second is RΛ = 2.
In [41], it is suggested that the distribution of Noisier2Noise's second random variable is the same type as the first, but not necessarily with the same distribution parameters. Therefore, for Noisier2Noise's second mask MΛ, we used the same type of distribution as MΩ with a different variable density. An example with RΩ = 4 and RΛ = 2 is shown in Fig. 2(b). Concretely, we define two masks as having the same 'type' of distribution when the conditional dependence of the sampling set indices is the same. Let pj|k = ℙ(j ∈ Ω | k ∈ Ω). If pj|k = pj for all j and k, the entries are independent and the mask is of type '2D Bernoulli'. If pj|k = 1 when j and k are in the same k-space column and pj|k = pj otherwise, the mask is of type '1D column-wise'. The experiments in this article focus on these two types of masks; other types are discussed in Section V. We emphasize that constraining a mask to a type does not constrain the pjs, which define the variable sampling density.
To ensure that p̃j < 1 everywhere, we set p̃j = 1 − ϵ in the central 10 columns of k-space, where ϵ is a small real constant. The network architecture ensures that the central region is consistent with the input, so ϵ can be small without penalty. We used ϵ = 10−3.
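The following sketch draws an instance of a 1D column-wise variable density mask with a fully sampled centre, given per-column sampling probabilities; the function name and interface are illustrative rather than the authors' exact implementation, and the polynomial density itself is omitted.

```python
import numpy as np

def columnwise_mask(p_col, shape, n_center=10, rng=None):
    """Draw an instance of a 1D column-wise mask.

    p_col: length-W array of per-column sampling probabilities
    shape: (H, W) size of k-space
    """
    rng = rng if rng is not None else np.random.default_rng()
    H, W = shape
    cols = rng.random(W) < p_col                                    # Bernoulli draw per column
    cols[W // 2 - n_center // 2 : W // 2 + n_center // 2] = True    # fully sampled centre
    return np.repeat(cols[None, :], H, axis=0).astype(np.float32)
```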
In order to be a realistic simulation of prospectively sub-sampled data, the sampling set Ωt must be fixed for all epochs. However, Λt need not be. Therefore, we re-generated Λt from the distribution of MΛ once per epoch. Since the network sees more samples from the distribution of MΛ, the training loss is a closer approximation of the expectation in (5), so the trained network is expected to be a more accurate approximation of fθ*. This has similarities with training data augmentation, as each slice is used to generate several inputs to the network [56].
D. Comparative Methods
We trained Noisier2Noise using different weightings W of the loss stated in (8). For each self-supervised method, we considered two possible estimates at inference: one with the doubly sub-sampled ỹs as the network input and the other with the singly sub-sampled ys. The methods and their two estimates at inference are summarized in Table I.
TABLE I. The Self-Supervised Methods Evaluated in This Paper.

| Name | Loss weighting W | MΛ distribution | Estimate with ỹs input | Estimate with ys input |
|---|---|---|---|---|
| Unweighted Noisier2Noise | 𝟙 | 1D column-wise | (9) | (10) |
| 2D partitioned SSDU | (𝟙 − MΛt)MΩt | 2D Bernoulli | (14) | (15) |
| 1D partitioned SSDU | (𝟙 − MΛt)MΩt | 1D column-wise | (14) | (15) |
| K-weighted 1D partitioned SSDU | (𝟙 − K)−1/2(𝟙 − MΛt)MΩt | 1D column-wise | (14) | (15) |

Here, and throughout this paper, the subscripts t and s index the training and test sets respectively. Examples of MΛtMΩt for 2D Bernoulli and 1D column-wise partitioning are shown in Fig. 2.
We trained with W = 𝟙, referred to as "Unweighted Noisier2Noise". By Claim 1, Unweighted Noisier2Noise requires (𝟙 − K)−1 correction at inference: see Table I. We have found that the need for correction substantially reduces the image quality compared to SSDU, so we do not recommend using Unweighted Noisier2Noise in practice. Nonetheless, we include some Unweighted Noisier2Noise results to illustrate the value of SSDU's loss weighting.
We also trained Noisier2Noise with W = (𝟙 − MΛ)MΩ which, based on the relationship described in Section II-D, we refer to as "SSDU", despite some differences between our implementation and [33]. In [33], a mixture of an ℓ1 and ℓ2 loss was used, whereas here, so that it can be directly compared with Unweighted Noisier2Noise, we used an ℓ2 loss. We also used a different MΩ distribution, dataset and network architecture to [33].
SSDU [33] was originally applied to an architecture that requires pre-computed sensitivity maps. It was suggested that MΛ has a fully sampled 4 × 4 central region and 2D Gaussian variable density otherwise, so that high frequencies are sampled with higher probability. For the architecture considered in this article, which has a coil sensitivity estimation module, we found that increasing the size of the fully sampled central region considerably improved the method's performance. Since MΩ has 10 fully sampled central columns, we increased the size of the central region of MΛ to 10 × 10.
As the probability of sampling each location in k-space is independent, the sampling set partition proposed in [33] is equivalent to a 2D variable density Bernoulli MΛ distribution. To estimate their variable density distribution we ran the SSDU authors' set partitioning code3 1000 times on a fully sampled mask and averaged the result. We trained SSDU using a distribution of MΛ of this type, referred to as "2D partitioned SSDU", illustrated in Fig. 2(c). We also trained SSDU using the same distribution type of MΛ as MΩ, as in Fig. 2(b). We refer to this method as "1D partitioned SSDU", or "K-weighted 1D partitioned SSDU" when the (𝟙 − K)−1/2 weighting is used in the loss as described in Section II-F. Like Unweighted Noisier2Noise, Λt was re-generated once per epoch [56]. We emphasize that although 2D partitioned SSDU has a similar MΛ distribution as in [33], the distribution of MΩ here is random variable density columns, not equidistant columns as in [33]. Therefore, 2D partitioned SSDU is not necessarily expected to perform as well as SSDU in [33].
As a best-case target, we also trained using a fully supervised method with an (unweighted) ℓ2 loss. All deep learning methods had the same network architecture and training hyperparameters, as described in Section III-B.
Finally, as a comparative method that does not use deep learning, we ran a compressed sensing algorithm with a sparse model on wavelet coefficients, which we implemented via the Berkeley Advanced Reconstruction Toolbox (BART) [57]. We used BART's default settings with fourth-order Daubechies wavelets and a sparsity weighting of λ = 2 × 10−3.
E. Quality Metrics
To evaluate the reconstruction quality, we computed the Normalized Mean Squared Error (NMSE) in k-space on the test set: NMSE = ‖ŷ0,s − y0,s‖₂²/‖y0,s‖₂², averaged over all test slices. We also computed the image-domain root-sum-of-squares (RSS) estimate, (Σc |F−1ŷ0,s,c|²)1/2, where ŷ0,s,c denotes the estimated k-space on coil c and F is the discrete Fourier transform, cropped the RSS estimate to a central 320×320 region and computed the SSIM, as suggested in fastMRI [25].
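A sketch of these metrics, assuming complex multi-coil k-space arrays of shape (C, H, W); the Fourier shift convention is an assumption of this sketch, and SSIM would be computed on the cropped RSS images with a standard implementation such as skimage.metrics.structural_similarity.

```python
import numpy as np

def nmse(y0_hat, y0):
    """k-space normalized mean squared error for a single test example."""
    return np.sum(np.abs(y0_hat - y0) ** 2) / np.sum(np.abs(y0) ** 2)

def rss_image(y, crop=320):
    """Root-sum-of-squares image from multi-coil k-space, centre-cropped."""
    coil_imgs = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(y, axes=(-2, -1)),
                                             axes=(-2, -1)), axes=(-2, -1))
    rss = np.sqrt(np.sum(np.abs(coil_imgs) ** 2, axis=0))
    H, W = rss.shape
    return rss[(H - crop) // 2:(H + crop) // 2, (W - crop) // 2:(W + crop) // 2]
```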
IV. Results
For brevity, the results presented here focus on RΩ = 8. Similar results for the brain data at RΩ = 4 are shown in the supplementary material: see Figs. S1-S4.
For the brain data, we evaluated the dependence of the methods’ performance on the distribution of MΛ by varying the parameters so that the sub-sampling factor RΛ changed. We trained with RΛ ∈ {1.2, 1.6, 2, 4, 6}, except for 2D partitioned SSDU, which we found needed finer tuning and a smaller RΛ for the best performance, so we trained with RΛ ∈ {1.1, 1.2, …, 2, 3, 4, 6}.
A. Performance With Tuned RΛ
This section focuses on the case where RΛ has been tuned to minimize the ground truth test set NMSE. Figs. 3 and S1 show bar charts of the percentage difference between fully supervised training and each method: (μ−μfull)/μfull where μ and μfull are the mean NMSE of interest and mean NMSE of fully supervised training respectively. The best performance was for K-weighted 1D partitioned SSDU with a ys input; its mean NMSE was only 1.1% and 0.8% larger than fully supervised for RΩ = 8, 4 respectively. Figs. 4 and S2 show box plots of the NMSE of each method for RΩ = 8 and RΩ = 4 respectively: see Table S1 of the supplementary material for the numerical values.
Fig. 3.
Mean test set NMSE percentage difference between fully supervised training and each method at RΩ = 8 and a 1D distributed MΩ, where RΛ has been tuned to minimize the test set NMSE. Fig. S1 shows a similar plot for RΩ = 4.
Fig. 4.
NMSE for all methods at RΩ = 8 and a 1D distributed MΩ, where RΛ has been tuned to minimize the test set NMSE. Fig. S2 shows a similar plot for RΩ = 4 and the exact numerical values are in Table S1.
To evaluate whether the proposed changes to SSDU were statistically significant, we performed a one-sided Wilcoxon signed-rank test at a significance level of 0.01 on the test set NMSEs. For both the ys and ỹs inputs, we found that there was a statistically significant difference between 2D and 1D partitioned SSDU. We also found that the difference between 1D partitioned SSDU and K-weighted 1D partitioned SSDU was statistically significant.
Figs. 5 and S3 show RSS estimates from the test set at RΩ = 8 and RΩ = 4 respectively. Qualitatively, K-weighted 1D partitioned SSDU performs the most similarly to fully supervised training. Although 2D partitioned SSDU has a competitive quantitative score for the estimate with ỹs input, it exhibits some streaking artifacts.
Fig. 5.
Reconstruction example with a 1D sub-sampled MΩ and RΩ = 8, with a RΛ tuned to minimize the test set NMSE. A similar figure for RΩ = 4 is in the supplementary material, Fig. S3.
Unweighted Noisier2Noise’s performance was substantially worse than SSDU. Therefore we compare SSDU and its modifications only in the remainder of this article.
B. Robustness to RΛ
For actual, prospectively sampled data, it would not be possible to tune RΛ on the ground truth test set NMSE. The practicality of SSDU therefore depends greatly on the robustness to RΛ. Figs. 6 and S4 show the dependence of the mean test set NMSE on RΛ for RΩ = 8 and RΩ = 4 respectively. K-weighted 1D partitioned SSDU was the most robust to the tuning of RΛ. 2D partitioned SSDU was the least robust, especially for the estimate with ys input. This is visualized in Fig. 7, which shows reconstruction examples for a number of RΛs. K-weighted 1D partitioned SSDU performs very similarly for all RΛs between 1.6 and 6, while 2D partitioned SSDU’s restoration quality significantly degrades qualitatively and quantitatively for mistunings as small as 0.1.
Fig. 6.
Dependence of the test set NMSE on the acceleration factor of the second mask MΛ, denoted RΛ, at RΩ = 8 for both inputs. 1D partitioned SSDU is far more robust to the tuning of RΛ than 2D partitioned SSDU. Fully supervised learning does not use a second mask MΛ, so has the same performance for all RΛ. A similar figure for RΩ = 4 is in the supplementary material, Fig. S4.
Fig. 7.
Robustness to RΛ, where the blue box highlights the case where RΛ is tuned. K-weighted 1D partitioned SSDU is very robust to RΛ, with very similar restoration quality for all RΛ between 1.6 and 6. 2D partitioned SSDU is far more sensitive, with substantial degradation in image quality for mistunings as small as 0.1. Here, we show the estimate with ys input only.
C. Performance on 2D Sampled Brain Data
To further evaluate the role of the partitioning distribution, we also ran 1D and 2D partitioned SSDU on the brain data with a 2D Bernoulli sampled MΩ. In this case, the type matching of the second mask to MΩ is switched: 2D partitioned SSDU’s second mask has the same type of distribution as the first, while 1D partitioned SSDU has a different type. For MΩ, we used a fully sampled 10 × 10 central region and a polynomial variable density that samples low frequencies with higher probability otherwise. We used RΛ = 1.2 and RΛ = 4 for 2D and 1D partitioned SSDU respectively. All other hyperparameters and network specifics were unchanged.
In this case, the best performance was 2D partitioned SSDU, which performed very similarly to fully supervised training: see Fig. 8. The ỹs input had a mean test set NMSE of 0.141 and 0.144 for 2D and 1D partitioned SSDU respectively, and the ys input had 0.141 and 0.145, compared with 0.139 for fully supervised training. Although not shown in Fig. 8 for brevity, we also trained 2D partitioned SSDU with a (𝟙 − K)−1/2 loss weighting. As for 1D partitioned SSDU in Section IV-A, we found that this reduced the mean NMSE further, to 0.140 for both the ys and ỹs inputs.
Fig. 8.
Reconstruction example from the brain fastMRI dataset with a 2D Bernoulli distributed MΩ and RΩ = 8. Compared to Fig. 5, the comparative performance of the SSDU algorithms are switched: here, 2D partitioned SSDU performs similarly to fully supervised training, while 1D partitioned SSDU suffers from streaking artifacts.
D. Performance on 1D Sampled Knee Data
We also trained K-weighted 1D partitioned SSDU on the fastMRI knee data with the same network architecture, training hyperparameters, and a 1D distributed MΩ. The sub-sampling factors of the first and second masks were RΩ = 8 and RΛ = 2 respectively. The mean test set NMSE was 0.233 and 0.231 for the estimates with ỹs and ys inputs respectively, compared with 0.230 for fully supervised training. Fig. 9 shows two example reconstructions from the test set, demonstrating qualitatively competitive performance with fully supervised training.
Fig. 9.
Two reconstruction examples of K-weighted 1D partitioned SSDU from the knee fastMRI dataset, where MΩ is 1D. As in Fig. 5, K-weighted 1D partitioned SSDU’s restoration quality is very similar to fully supervised training.
V. Discussion
Due to its need for correction at inference, Unweighted Noisier2Noise had consistently the worst score. We therefore do not recommend using Unweighted Noisier2Noise in practice. Rather, we suggest using a variant of SSDU, which has a loss weighting that removes the need for such a correction.
The hierarchy of 1D and 2D partitioned SSDU depends on the distribution of MΩ. In particular, the best performance was achieved when MΩ and the partition were both 1D or both 2D. It is conventional wisdom that better reconstruction quality is possible when k-space is randomly sub-sampled in both spatial dimensions (see, for instance, [58]). This is because the image-domain aliasing is incoherent in both dimensions, so is easier to remove. The superior performance of 1D partitioned SSDU compared with 2D partitioned SSDU when MΩ is 1D shows that it is not necessarily true that the sampling set partition should also ideally be two-dimensional. Rather, better performance is possible when the distributions of MΩ and MΛMΩ are of the same type.
To see why, consider the nature of the aliasing caused by sub-sampling and further sub-sampling k-space, focusing on the example of a random 1D column sampled MΩ. Such sampling causes the image-domain aliasing to be horizontally incoherent and vertically coherent. With a 1D column-wise Λt, further horizontal aliasing is introduced. Since the network cannot distinguish between the horizontal aliasing caused by Ωt or Λt, the loss is minimized when the aliasing due to both is removed. On the other hand, a 2D Λt introduces some aliasing that is orthogonal to the original aliasing, which is distinguishable in principle. In this case, the loss is minimized when the network removes the aliasing caused by Λt, but not necessarily the original aliasing caused by Ωt. This is visible in Figs. 5 and 8, where SSDU fails to completely remove artifacts caused by MΩ when MΛ does not have the same type of distribution.
This implies that, in general, better performance is possible when the distribution of the aliasing of and yt are of the same type. For both the independent 1D column sampling and 2D Bernoulli sampling considered here, this can be achieved by choosing a MΛ with the same type of distribution as MΩ. Recently, in [59], this was also observed empirically for SSDU with random spoke sampling. However, such a procedure does not always achieve this goal. For instance, while the SSDU paper [33] considers a fully sampled central region and equidistant column sampling, recovery of images with regular under-sampling is not currently considered in the proposed framework. In this case, a Λt of the same type would not give a with the same aliasing type as yt. The 2D Gaussian variable density partition employed in this article was originally constructed to handle such sampling patterns, and was found to perform very well in this context. Future work includes establishing the correct sampling set partitions for MΩ distributions not in [33] or covered by the approach suggested here.
We found that K-weighted SSDU further improved the image quality and robustness to RΛ. Consider the jth entry of the (squared) weighting (𝟙 − K)−1 in terms of sampling probabilities:
[(𝟙 − K)−1]jj = (1 − pjp̃j) / (pj(1 − p̃j)) = ℙ(j ∉ Ω ∩ Λ) / ℙ(j ∈ Ω \ Λ).
This leads to the following intuitive interpretation of the proposed loss weighting as compensation for the variable density of Ω and Λ. A smaller denominator ℙ(j ∈ Ω \ Λ) implies that the jth location occurs less frequently in the loss, which is compensated for by an increased weighting. A smaller numerator implies that the jth location is estimated by the network less frequently, so has a decreased weighting.
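For illustration with arbitrarily chosen values: a densely sampled location with pj = 0.9 and p̃j = 0.5 receives a squared weight of (1 − 0.45)/(0.9 × 0.5) ≈ 1.2, whereas a rarely sampled location with pj = 0.1 and the same p̃j receives (1 − 0.05)/(0.1 × 0.5) = 19, so locations that appear rarely in the loss but must often be estimated are up-weighted by more than an order of magnitude.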
The benefit of the (𝟙 − K)−1 weighting highlights and addresses a general challenge of self-supervised learning with variable density sampling: regions of k-space sampled with lower probability are underrepresented in the loss. This issue has been noted in other works. For instance, for variable density reconstruction with Noise2Noise, [60] suggests weighting the loss function by the sampling density. An alternative approach was suggested in [61], which suggests passing the training target through the network before it is employed in the loss function. We note that if the sampling and partitioning had uniform density, such as in [56], K would also be uniform, so the proposed weighting would not be required. This may explain in part the empirical performance observed in [56].
When MΩ was 1D, with the exception of 2D partitioned SSDU, Fig. 6 shows that the estimate with ys input performed similarly or better than with ỹs input when RΛ is tuned. This indicates that, for these methods, the advantage of using all the data in the input to the network outweighs the disadvantage that the input data has a different sampling distribution to the training data, so is not guaranteed by Claim 1 or 2 to be correct in expectation. Heuristically, when MΩ and MΛMΩ are both variable density column-wise sampled, a network trained on doubly sub-sampled data is likely to also be able to handle singly sub-sampled data. However, for 2D partitioned SSDU, MΛMΩ is no longer column-wise: see Fig. 2(c). Accordingly, 2D partitioned SSDU was the only method that had a higher NMSE for the ys input than for the ỹs input.
The best RΛ for 2D partitioned SSDU was lower than for the competing methods: RΛ = 1.8 and RΛ = 1.2 for the ys and ỹs inputs respectively. In [33], the sampling set partition was quantified in terms of the ratio ρ = |At|/|Bt|, and it was found that ρ = 0.4 offered the best performance. Since the MΩ distributions are different here, the optimal ρ is not necessarily expected to be the same. For 2D partitioned SSDU, RΛ = 1.8 and RΛ = 1.2 correspond to ρ = 0.52 and ρ = 0.21 respectively, while the other methods' best performance, at RΛ = 4, corresponded to ρ = 0.57. Therefore the values of ρ were reasonably similar despite the substantial difference in RΛ.
Since the network architecture uses ỹt in its coil sensitivity estimation module, not yt, it is plausible that the differences between 1D and 2D partitioning could be due to poorer coil sensitivity estimation rather than an intrinsic property of the partition change. To examine this, we re-trained tuned 1D and 2D partitioned SSDU on the 1D sampled brain data with k-space masked to a central 10 × 10 region in the coil sensitivity estimation module. We found that the test set NMSE was within 1% of the usual approach. This verifies that the performance improvement was indeed a consequence of the partition change, not simply a consequence of specifics of the architecture.
Unweighted Noisier2Noise’s correction at inference (𝟙 − K)−1 is only valid when an loss is used; we have found that other loss functions do not perform well in practice. This loss leads to smoothing artifacts, even for fully supervised training. For SSDU, since there is no correction term, loss functions other than are possible. For instance, in [33], a mixture of and was used. Better visual quality may be achievable when SSDU is implemented with a different loss; we do not suggest using an loss in general, it is only required here so that it can be compared directly with Noisier2Noise.
For all self-supervised methods in this work, we re-generated Λt once per epoch. This has similarities to the multi-mask SSDU approach proposed in [56]. However, in [56], a fixed number nΛ of Λts were generated for each Ωt, each of which was treated as an additional member of the training set. Therefore, unlike in this article, each epoch was nΛ times as long. Future work includes establishing whether it is also advantageous to limit the number of unique Λts per Ωt for the approach considered in this article.
All methods in this article were trained without taking measurement noise into account [62], [63]. Recent work by the present authors has shown that the additive and multiplicative versions of Noisier2Noise can be combined to recover higher fidelity images than SSDU in the presence of noise [64].
VI. Conclusions and Future Work
Based on the observation that SSDU is a version of Noisier2Noise with a particular rank-deficient loss weighting, we proved that SSDU correctly estimates Y0 in expectation. This analysis led to two proposals that we found significantly improved SSDU's performance in practice. Firstly, we propose employing a distribution of MΛMΩ that is the same type as the original mask MΩ. Secondly, we propose introducing a weighting of (𝟙 − K)−1/2 in SSDU's loss. We found that each of these modifications significantly improved SSDU's test set NMSE and robustness to RΛ.
There are a number of other self-supervised learning methods that also use sampling set partitioning [37], [56], [65], some of which are variants of SSDU. For instance, [37], [65], [66] propose training two networks in parallel, one for each sampling subset, with a loss function that includes the difference between the outputs of the two networks. Another recent development is zero-shot SSDU [67], which shows that sampling set partitioning can also be applied to recover images without a training dataset [68]. Future work includes determining whether the theoretical and practical developments of this article can be extended to these methods.
Supplementary Material
Acknowledgment
For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Biographies
Charles Millard received the M.Sc. degree in physics from Imperial College London, London, U.K. and the doctorate degree in mathematics with biomedical imaging from the University of Oxford, Oxford, U.K. He is currently a Postdoctoral Researcher with the Wellcome Centre for Integrative Neuroimaging, University of Oxford. His research focuses on methods for reconstructing accelerated magnetic resonance imaging acquisitions with compressed sensing and deep learning.
Mark Chiew received the B.ASc. degree in engineering physics from the University of British Columbia, Vancouver, BC, Canada, and the Ph.D. degree in medical biophysics from the University of Toronto, Toronto, ON, Canada. From 2012 to 2022, he was a Postdoctoral Researcher and then the Royal Academy of Engineering Research Fellow with the University of Oxford, Oxford, U.K. Since 2022, he has been an Associate Professor with the University of Toronto, and the Scientist with Sunnybrook Research Institute, Toronto, ON. His research interests include the development of acquisition and image reconstruction strategies for magnetic resonance imaging.
Footnotes
1. [Online]. Available: https://github.com/facebookresearch/fastMRI
2. [Online]. Available: https://github.com/charlesmillard/Noisier2Noise_for_recon
3. [Online]. Available: https://github.com/byaman14/SSDU
Contributor Information
Charles Millard, Email: charles.millard@ndcn.ox.ac.uk, the Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, OX3 9DU Oxford, U.K.
Mark Chiew, Email: mark.chiew@utoronto.ca, the Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, OX3 9DU Oxford, U.K., with the Department of Medical Biophysics, University of Toronto, Toronto, ON M5S 1A1, Canada, and also with Physical Sciences, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada.
References
- [1] Ra JB, Rim CY. Fast imaging using subencoding data sets from multiple detectors. Magn Reson Med. 1993;30(1):142–145. doi: 10.1002/mrm.1910300123.
- [2] Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: Sensitivity encoding for fast MRI. Magn Reson Med. 1999 Nov;42:952–962.
- [3] Griswold MA, et al. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med. 2002 Jun;47:1202–1210. doi: 10.1002/mrm.10171.
- [4] Uecker M, et al. ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA. Magn Reson Med. 2014 Mar;71:990–1001. doi: 10.1002/mrm.24751.
- [5] Donoho DL. Compressed sensing. IEEE Trans Inf Theory. 2006 Apr;52(4):1289–1306.
- [6] Candes EJ, Romberg J, Tao T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory. 2006 Feb;52:489–509.
- [7] Lustig M, Donoho D, Pauly JM. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn Reson Med. 2007 Dec;58:1182–1195. doi: 10.1002/mrm.21391.
- [8] Ye JC. Compressed sensing MRI: A review from signal processing perspective. BMC Biomed Eng. 2019 Dec;1. Art no 8. doi: 10.1186/s42490-019-0006-z.
- [9] Jaspan ON, Fleysher R, Lipton ML. Compressed sensing MRI: A review of the clinical literature. Brit J Radiol. 2015 Dec;88. Art no 20150487. doi: 10.1259/bjr.20150487.
- [10] Wang S, et al. Accelerating magnetic resonance imaging via deep learning. Proc IEEE 13th Int Symp Biomed Imag; pp. 514–517.
- [11] Kwon K, Kim D, Park H. A parallel MR imaging method using multilayer perceptron. Med Phys. 2017;44(12):6209–6224. doi: 10.1002/mp.12600.
- [12] Hammernik K, et al. Learning a variational network for reconstruction of accelerated MRI data. Magn Reson Med. 2018;79(6):3055–3071. doi: 10.1002/mrm.26977.
- [13] Yazdanpanah AP, Afacan O, Warfield S. Deep plug-and-play prior for parallel MRI reconstruction. Proc IEEE/CVF Int Conf Comput Vis Workshop; 2019. pp. 3952–3958.
- [14] Liu J, Sun Y, Eldeniz C, Gan W, An H, Kamilov US. RARE: Image reconstruction using deep priors learned without groundtruth. IEEE J Sel Topics Signal Process. 2020 Oct;14(6):1088–1099.
- [15] Yang Y, Sun J, Li H, Xu Z. Deep ADMM-Net for compressive sensing MRI. Proc 30th Int Conf Neural Inf Process Syst; pp. 10–18.
- [16] Yang Y, Sun J, Li H, Xu Z. ADMM-CSNet: A deep learning approach for image compressive sensing. IEEE Trans Pattern Anal Mach Intell. 2020 Mar;42(3):521–538. doi: 10.1109/TPAMI.2018.2883941.
- [17] Zhang J, Ghanem B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. Proc IEEE Conf Comput Vis Pattern Recognit; 2018. pp. 1828–1837.
- [18] Zhu B, Liu JZ, Cauley SF, Rosen BR, Rosen MS. Image reconstruction by domain-transform manifold learning. Nature. 2018;555(7697):487–492. doi: 10.1038/nature25988.
- [19] Quan TM, Nguyen-Duc T, Jeong W-K. Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss. IEEE Trans Med Imag. 2018 Jun;37(6):1488–1497. doi: 10.1109/TMI.2018.2820120.
- [20] Mardani M, et al. Deep generative adversarial neural networks for compressive sensing MRI. IEEE Trans Med Imag. 2019 Jan;38(1):167–179. doi: 10.1109/TMI.2018.2858752.
- [21] Aggarwal HK, Mani MP, Jacob M. MoDL: Model-based deep learning architecture for inverse problems. IEEE Trans Med Imag. 2019 Feb;38(2):394–405. doi: 10.1109/TMI.2018.2865356.
- [22] Ahmad R, et al. Plug-and-play methods for magnetic resonance imaging: Using denoisers for image recovery. IEEE Signal Process Mag. 2020 Jan;37(1):105–116. doi: 10.1109/msp.2019.2949470.
- [23] Wang S, et al. DIMENSION: Dynamic MR imaging with both k-space and spatial prior knowledge obtained via multi-supervised network training. NMR Biomed. 2022;35(4). Art no e4131. doi: 10.1002/nbm.4131.
- [24] Chen Y, et al. AI-based reconstruction for fast MRI—A systematic review and meta-analysis. Proc IEEE. 2022 Feb;110(2):224–245.
- [25] Zbontar J, et al. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv:1811.08839. 2018.
- [26] Uecker M, Zhang S, Voit D, Karaus A, Merboldt K-D, Frahm J. Real-time MRI at a resolution of 20 ms. NMR Biomed. 2010;23(8):986–994. doi: 10.1002/nbm.1585.
- [27] Haji-Valizadeh H, et al. Validation of highly accelerated real-time cardiac cine MRI with radial k-space sampling and compressed sensing in patients at 1.5 T and 3T. Magn Reson Med. 2018;79(5):2745–2751. doi: 10.1002/mrm.26918.
- [28] Lim Y, Zhu Y, Lingala SG, Byrd D, Narayanan S, Nayak KS. 3D dynamic MRI of the vocal tract during natural speech. Magn Reson Med. 2019;81(3):1511–1520. doi: 10.1002/mrm.27570.
- [29] Yoo J, Jin KH, Gupta H, Yerly J, Stuber M, Unser M. Time-dependent deep image prior for dynamic MRI. IEEE Trans Med Imag. 2021 Dec;40(12):3337–3348. doi: 10.1109/TMI.2021.3084288.
- [30] Tamir JI, Stella XY, Lustig M. Unsupervised deep basis pursuit: Learning reconstruction without ground-truth data. Proc ISMRM Annu Meeting; 2019. Art no 0660.
- [31] Huang P, et al. Deep MRI reconstruction without ground truth for training. Proc 27th Annu Meeting ISMRM; 2019. [Online] Available: https://archive.ismrm.org/2019/4668.html.
- [32] Cole EK, Pauly JM, Vasanawala SS, Ong F. Unsupervised MRI reconstruction with generative adversarial networks. arXiv:2008.13065. 2020.
- [33] Yaman B, Hosseini SAH, Moeller S, Ellermann J, Uğurbil K, Akçakaya M. Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magn Reson Med. 2020;84(6):3172–3191. doi: 10.1002/mrm.28378.
- [34] Liu S, Schniter P, Ahmad R. MRI recovery with a self-calibrated denoiser. Proc IEEE Int Conf Acoust, Speech Signal Process; 2022. pp. 1351–1355.
- [35] Aggarwal HK, Pramanik A, Jacob M. ENSURE: Ensemble Stein's unbiased risk estimator for unsupervised learning. Proc IEEE Int Conf Acoust, Speech Signal Process; 2021. pp. 1160–1164.
- [36] Zeng G, et al. A review on deep learning MRI reconstruction without fully sampled k-space. BMC Med Imag. 2021;21(1):1–11. doi: 10.1186/s12880-021-00727-9.
- [37] Hu C, Li C, Wang H, Liu Q, Zheng H, Wang S. Self-supervised learning for MRI reconstruction with a parallel network training framework. Proc 24th Int Conf Med Image Comput Comput-Assist Interv; 2021. pp. 382–391.
- [38] Lehtinen J, et al. Noise2Noise: Learning image restoration without clean data. Proc Int Conf Mach Learn; 2018. pp. 2965–2974.
- [39] Krull A, Buchholz T-O, Jug F. Noise2Void—learning denoising from single noisy images. Proc IEEE/CVF Conf Comput Vis Pattern Recognit; 2019. pp. 2129–2137.
- [40] Batson J, Royer L. Noise2Self: Blind denoising by self-supervision. Proc Int Conf Mach Learn; 2019. pp. 524–533.
- [41] Moran N, Schmidt D, Zhong Y, Coady P. Noisier2Noise: Learning to denoise from unpaired noisy data. Proc IEEE/CVF Conf Comput Vis Pattern Recognit; 2020. pp. 12064–12072.
- [42] Xie Y, Wang Z, Ji S. Noise2Same: Optimizing a self-supervised bound for image denoising. Proc Adv Neural Inf Process Syst. 2020:20320–20330.
- [43] Hendriksen AA, Pelt DM, Batenburg KJ. Noise2Inverse: Self-supervised deep convolutional denoising for tomography. IEEE Trans Comput Imag. 2020;6:1320–1335.
- [44] Kim K, Ye JC. Noise2Score: Tweedie's approach to self-supervised image denoising without clean images. Adv Neural Inf Process Syst. 2021;34:864–874.
- [45] Gan W, Sun Y, Eldeniz C, Liu J, An H, Kamilov US. Deformation-compensated learning for image reconstruction without ground truth. IEEE Trans Med Imag. 2022 Sep;41(9):2371–2384. doi: 10.1109/TMI.2022.3163018.
- [46] Kang E, Min J, Ye JC. A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction. Med Phys. 2017;44(10):e360–e375. doi: 10.1002/mp.12344.
- [47] Flamary R. Astronomical image reconstruction with convolutional neural networks. Proc IEEE 25th Eur Signal Process Conf; 2017. pp. 2468–2472.
- [48] Hammernik K, et al. Physics-driven deep learning for computational magnetic resonance imaging: Combining physics and machine learning for improved medical imaging. IEEE Signal Process Mag. 2023;40(1):98–114. doi: 10.1109/msp.2022.3215288.
- [49] Daubechies I, Defrise M, De Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun Pure Appl Math. 2004 Nov;57:1413–1457.
- [50] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn. 2010;3(1):1–122.
- [51] Sriram A, et al. End-to-end variational networks for accelerated MRI reconstruction. Proc 23rd Int Conf Med Image Comput Comput-Assist Interv; 2020. pp. 64–73.
- [52] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. Proc 18th Int Conf Med Image Comput Comput-Assist Interv; 2015. pp. 234–241.
- [53] Yaman B, Hosseini SAH, Moeller S, Akçakaya M. Improved supervised training of physics-guided deep learning image reconstruction with multi-masking. Proc IEEE Int Conf Acoust, Speech Signal Process; 2021. pp. 1150–1154.
- [54] Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: From error visibility to structural similarity. IEEE Trans Image Process. 2004 Apr;13(4):600–612. doi: 10.1109/tip.2003.819861.
- [55] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980. 2014.
- [56] Yaman B, et al. Multi-mask self-supervised learning for physics-guided neural networks in highly accelerated magnetic resonance imaging. NMR Biomed. 2022;35(12). Art no e4798. doi: 10.1002/nbm.4798.
- [57] Uecker M, Tamir JI, Ong F, Lustig M. The BART toolbox for computational magnetic resonance imaging. Proc Int Soc Magn Reson Med. 2016;24. [Online] Available: https://www.user.gwdg.de/∼muecker1/basp-uecker2.pdf.
- [58] Deshpande V, Nickel D, Kroeker R, Kannengiesser S, Laub G. Optimized CAIPIRINHA acceleration patterns for routine clinical 3D imaging. Proc 20th Annu Meeting ISMRM; 2012. [Online] Available: https://archive.ismrm.org/2012/0104.html.
- [59] Blumenthal M, Luo G, Schilling M, Haltmeier M, Uecker M. NLINV-Net: Self-supervised end-2-end learning for reconstructing undersampled radial cardiac real-time data. Proc ISMRM Annu Meeting; 2022. [Online] Available: https://archive.ismrm.org/2022/0499.html.
- [60] Gan W, et al. Self-supervised deep equilibrium models for inverse problems with theoretical guarantees. arXiv:2210.03837. 2022.
- [61] Liu X, Zou J, Zheng X, Li C, Zheng H, Wang S. Iterative data refinement for self-supervised MR image reconstruction. arXiv:2211.13440. 2022.
- [62] Desai AD, et al. Noise2Recon: Enabling SNR-robust MRI reconstruction with semi-supervised and self-supervised learning. Magn Reson Med, to be published. doi: 10.1002/mrm.29759.
- [63] Chen D, Tachella J, Davies ME. Robust equivariant imaging: A fully unsupervised framework for learning to image from noisy and partial measurements. Proc IEEE/CVF Conf Comput Vis Pattern Recognit; 2022. pp. 5647–5656.
- [64] Millard C, Chiew M. Simultaneous self-supervised reconstruction and denoising of sub-sampled MRI data with Noisier2Noise. arXiv:2210.01696. 2022.
- [65] Zou J, et al. SelfCoLearn: Self-supervised collaborative learning for accelerating dynamic MR imaging. Bioengineering. 2022;9(11). Art no 650. doi: 10.3390/bioengineering9110650.
- [66] Wang S, et al. PARCEL: Physics-based unsupervised contrastive representation learning for multi-coil MR imaging. IEEE/ACM Trans Comput Biol Bioinf, early access, 2022 Oct 11. doi: 10.1109/TCBB.2022.3213669.
- [67] Yaman B, Hosseini SAH, Akcakaya M. Zero-shot physics-guided deep learning for subject-specific MRI reconstruction. Proc Neural Inf Process Syst Workshop Deep Learn Inverse Problems; 2021. [Online] Available: https://openreview.net/forum?id=Nzv2jICkWV7.
- [68] Ulyanov D, Vedaldi A, Lempitsky V. Deep image prior. Proc IEEE Conf Comput Vis Pattern Recognit; 2018. pp. 9446–9454.