Author manuscript; available in PMC: 2021 Dec 1.
Published in final edited form as: Magn Reson Med. 2020 Jul 2;84(6):3172–3191. doi: 10.1002/mrm.28378

Self-Supervised Learning of Physics-Guided Reconstruction Neural Networks without Fully-Sampled Reference Data

Burhaneddin Yaman 1,2, Seyed Amir Hossein Hosseini 1,2, Steen Moeller 2, Jutta Ellermann 2, Kâmil Uğurbil 2, Mehmet Akçakaya 1,2
PMCID: PMC7811359  NIHMSID: NIHMS1661354  PMID: 32614100

Abstract

Purpose:

To develop a strategy for training a physics-guided MRI reconstruction neural network without a database of fully-sampled datasets.

Theory and Methods:

Self-supervised learning via data under-sampling (SSDU) for physics-guided deep learning (DL) reconstruction partitions available measurements into two disjoint sets, one of which is used in the data consistency units in the unrolled network and the other is used to define the loss for training. The proposed training without fully-sampled data is compared to fully-supervised training with ground-truth data, as well as conventional compressed sensing and parallel imaging methods using the publicly available fastMRI knee database. The same physics-guided neural network is used for both proposed SSDU and supervised training. The SSDU training is also applied to prospectively 2-fold accelerated high-resolution brain datasets at different acceleration rates, and compared to parallel imaging.

Results:

Results on five different knee sequences at an acceleration rate of 4 show that the proposed self-supervised approach performs comparably to supervised learning, while significantly outperforming conventional compressed sensing and parallel imaging, as characterized by quantitative metrics and a clinical reader study. The results on prospectively sub-sampled brain datasets, where supervised learning cannot be employed due to lack of ground-truth reference, show that the proposed self-supervised approach successfully performs reconstruction at high acceleration rates (4, 6 and 8). Image readings indicate improved visual reconstruction quality with the proposed approach compared to parallel imaging at acquisition acceleration.

Conclusion:

The proposed SSDU approach allows training of physics-guided DL-MRI reconstruction without fully-sampled data, while achieving comparable results with supervised DL-MRI trained on fully-sampled data.

Keywords: accelerated imaging, image reconstruction, parallel imaging, deep learning, convolutional neural networks, unsupervised learning, self-supervised learning, non-linear estimation

Introduction

Data acquisition in MRI is inherently slow, necessitating the use of accelerated imaging techniques. In these approaches, data is acquired at sub-Nyquist rates, and reconstructed using additional information. Parallel imaging exploits the redundancies between receiver coils and is the most clinically used approach (1–3). Compressed sensing is another method that utilizes the compressibility of images based on linear sparsifying transforms for a regularized reconstruction (4–9), which can also be synergistically combined with multi-coil acquisitions (10–12). At high acceleration rates, parallel imaging suffers from noise amplification (13–15), while compressed sensing may lead to residual artifacts (16,17). Furthermore, compressed sensing reconstruction is computationally lengthy in nature and typically requires empirical fine-tuning of regularization parameters, although recent approaches using rapid self-tuning show promise for principled parameter selection (18,19).

Recently, deep learning (DL) has gained interest for high-quality accelerated MRI. DL-based MRI reconstruction algorithms can be roughly divided into two categories, purely data-driven and physics-guided (20). In purely data-driven approaches, a mapping from the undersampled k-space/aliased image to the full k-space/artifact-free image is learned (21–26). In the so-called physics-guided methods, the knowledge of the forward encoding operator, which contains the undersampling pattern and typically the coil sensitivities, is taken into account to solve an inverse problem based on a regularized least squares objective function (27–35). Some other works have directly worked with multi-coil data without explicitly including the coil sensitivities (36,37). These techniques unroll an iterative reconstruction algorithm for solving this objective function for a fixed number of iterations. The unrolled network alternates between data consistency and regularization, where the regularization is implemented implicitly using a neural network. Subsequently, these unrolled networks are trained end-to-end with a loss function that characterizes similarity with a reference image obtained from fully-sampled data (20). The parameters of the network can be different across the unrolled iterations (27,31) or shared across them (28,33).

The aforementioned physics-guided methods have been trained in a supervised manner, where fully-sampled data is used as a reference during the training. However, in many practical imaging scenarios, it is infeasible to acquire fully-sampled datasets. For instance, when imaging moving organs, such as the heart, there is often a short period of time during which the data needs to be acquired. Example acquisitions include real-time imaging, myocardial perfusion, and numerous contrast-enhanced scans (38–40). Another hindrance for fully-sampled acquisitions in some applications is signal decay. This is pronounced in acquisitions such as diffusion MRI with echo-planar imaging, where the signal decays quickly with T2*, thus prohibiting use of fully-sampled acquisitions, especially at high resolutions (41,42). In several other scenarios, such as whole-heart coronary MRI or high-resolution anatomical brain imaging, it is impractical to acquire fully-sampled datasets as the scan time becomes extremely lengthy.

Furthermore, accelerated imaging methods are often used to improve acquisition resolution. When higher acceleration rates are achievable, these are not solely used for image time reduction, but rather a trade-off is made with improved resolution (12,43,44). However, this newer resolution may necessitate re-training of the DL reconstruction, since neural networks do not necessarily generalize across different resolutions, as depicted in Supporting Information Figure S1. Thus, if fully-sampled data is required for training at higher resolutions, this may lead to excessive scan times, even for anatomical imaging protocols, making it difficult to make protocol changes to fully utilize the benefits of accelerated imaging.

In this study, we sought to develop a new self-supervised learning approach to train physics-guided DL-MRI reconstruction without fully-sampled reference data. The proposed self-supervised approach which we term as Self-Supervision via Data Undersampling (SSDU) splits the acquired k-space indices into two disjoint sets. One of these is used in the data consistency unit for the network, while the other set is used to define the loss function in k-space. Hence, end-to-end training and evaluation of the network is done through only the acquired measurements without making any other assumptions about image output or characteristics. We apply the proposed self-supervised training without fully-sampled data, on the fastMRI knee datasets and prospectively undersampled high-resolution brain MRI datasets. These are compared to parallel imaging, compressed sensing and a supervised training of a DL-MRI network when fully-sampled reference data is available. Our results indicate that the proposed self-supervised method performs similarly to the supervised approach trained on fully-sampled data, although it is trained only on undersampled data.

Theory

Physics-Guided Neural Networks for MRI Reconstruction

Let $x$ denote the image to be recovered and $y_\Omega$ represent the acquired k-space measurements with undersampling pattern $\Omega$. The forward model for the acquisition is given as

$$y_\Omega = E_\Omega x + n, \tag{1}$$

where $E_\Omega : \mathbb{C}^{M_1 \times M_2} \to \mathbb{C}^{P}$ is the encoding operator including a partial Fourier matrix sampling the locations specified by $\Omega$ and the coil sensitivities, and $n \in \mathbb{C}^{P}$ is measurement noise. The forward model presented in Equation [1] is usually ill-conditioned due to sub-Nyquist sampling, and hence regularizers that induce prior information are incorporated into the objective function for the reconstruction. Possible choices for the regularizer include total variation (10,45,46), ℓ1-norm of wavelet coefficients (4,8,47), sparsity in adaptive transform domains (9,48), and more recently neural networks (27,28,33). The image recovery is then formulated as an optimization problem

$$\arg\min_{x} \|y_\Omega - E_\Omega x\|_2^2 + \mathcal{R}(x), \tag{2}$$

where the first term represents data consistency with acquired measurements, while ℛ(∙) is a regularization term. The optimization problem in Equation [2] can be solved in numerous ways, including proximal gradient descent, variable splitting with quadratic penalty, alternating direction method of multipliers among others (27,30,32,49). In this study, we will consider the variable splitting with quadratic penalty approach (50) for implementation, which has also been used in previous physics-guided DL-MRI approaches (28,32). In this method, data consistency and regularization are decoupled as

$$\arg\min_{x,z} \|y_\Omega - E_\Omega x\|_2^2 + \mu \|x - z\|_2^2 + \mathcal{R}(z), \tag{3}$$

where z is the auxiliary variable that is initially constrained to be equal to x, and μ is the parameter for the quadratic penalty for relaxing this intermediate constrained problem to an unconstrained one. The optimization problem in Equation [3] is then solved iteratively by alternating the minimization over the variables x and z as follows

$$z^{(i-1)} = \arg\min_{z} \mu \|x^{(i-1)} - z\|_2^2 + \mathcal{R}(z), \tag{4}$$
$$x^{(i)} = \arg\min_{x} \|y_\Omega - E_\Omega x\|_2^2 + \mu \|x - z^{(i-1)}\|_2^2, \tag{5}$$

where $x^{(0)}$ is the initial image obtained from zero-filled under-sampled k-space data, $x^{(i)}$ is the network output at iteration $i$ and $z^{(i)}$ is an intermediate variable. In compressed sensing methods, these problems are solved in an iterative manner by alternating between the regularizer and data consistency units until a stopping criterion is met, as shown in Figure 1a.

Figure 1.

a) Depiction of a conventional iterative optimization algorithm for solving regularized inverse reconstruction problems. These algorithms alternate between regularization (R) and data consistency (DC). b) For neural networks, this iterative algorithm is unrolled for T steps, leading to a feed-forward structure alternating between R and DC units, where R is implemented by means of a neural network. c) The ResNet architecture (49) used as regularizer (R) in this study consists of 15 residual blocks (RB), each of which contains two convolution layers with the first one followed by a ReLU and the second one followed by a constant multiplication layer.

In physics-guided DL-MRI approaches, this iterative algorithm is unrolled for a fixed number of iterations, as depicted in Figure 1b. The regularization sub-problem in Equation [4] is implicitly solved using a neural network. The data consistency sub-problem in Equation [5] has a closed form solution

$$x^{(i)} = \left(E_\Omega^H E_\Omega + \mu I\right)^{-1} \left(E_\Omega^H y_\Omega + \mu z^{(i-1)}\right), \tag{6}$$

where $I$ is the identity operator and $(\cdot)^H$ is the conjugate transpose operator. Equation [6] can be solved using gradient descent or conjugate gradient, which itself is unrolled for a number of iterations (28).
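The alternation between Equations [4] and [6] can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: it assumes a single-coil Cartesian acquisition, so that the encoding operator reduces to an orthonormal FFT followed by a binary sampling mask, and `regularizer` is a hypothetical stand-in for the neural network that implicitly solves Equation [4]. The function names and the default value of `mu` are illustrative assumptions.

```python
import numpy as np

def E(x, mask):
    """Forward operator (assumed single-coil): orthonormal 2D FFT, then sampling."""
    return mask * np.fft.fft2(x, norm="ortho")

def EH(y, mask):
    """Adjoint operator: zero-fill unsampled locations, then inverse FFT."""
    return np.fft.ifft2(mask * y, norm="ortho")

def data_consistency(y, z, mask, mu, n_cg=10):
    """Solve (E^H E + mu I) x = E^H y + mu z (Equation [6]) by conjugate gradient."""
    normal_op = lambda v: EH(E(v, mask), mask) + mu * v
    rhs = EH(y, mask) + mu * z
    x = np.zeros_like(rhs)
    r = rhs - normal_op(x)          # initial residual
    p = r.copy()                    # initial search direction
    rs_old = np.vdot(r, r).real
    for _ in range(n_cg):
        if rs_old < 1e-20:          # residual is negligible, stop early
            break
        Ap = normal_op(p)
        alpha = rs_old / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def unrolled_recon(y, mask, regularizer, mu=0.05, n_unroll=10):
    """Alternate a regularizer step (Equation [4]) and data consistency (Equation [6])."""
    x = EH(y, mask)                 # x^(0): zero-filled reconstruction
    for _ in range(n_unroll):
        z = regularizer(x)          # hypothetical stand-in for the CNN
        x = data_consistency(y, z, mask, mu)
    return x
```

In the multi-coil setting, `E` and `EH` would additionally apply the coil sensitivity maps and their conjugates; the conjugate gradient loop itself would be unchanged.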

Supervised Training with Fully-Sampled Reference Datasets

Supervised learning performs end-to-end training using ground truth images as the reference labels for the training loss function (21,27). Ground truth images are obtained through SENSE-1 coil combination (2), which is the sum across the coil dimension of the product of the conjugate of the coil sensitivity maps with the corresponding coil images (31,32). Suppose that $x_{\mathrm{ref}}^i$ is the ground truth image for subject $i$, and $f(y_\Omega^i, E_\Omega^i; \theta)$ denotes the output of the unrolled network that is parametrized by $\theta$ for subsampled k-space data $y_\Omega^i$ and corresponding encoding matrix $E_\Omega^i$ of the same subject $i$. The supervised training of a physics-guided DL-MRI method can be performed by minimizing the image domain loss

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(x_{\mathrm{ref}}^i, f(y_\Omega^i, E_\Omega^i; \theta)\right), \tag{7}$$

where $N$ is the number of fully-sampled training datasets in the database, and $\mathcal{L}(\cdot,\cdot)$ denotes the loss between the ground truth and network output image (27,28,31). Alternatively, supervised training may be evaluated in k-space as

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(y_{\mathrm{ref}}^i, E_{\mathrm{full}}^i\left(f(y_\Omega^i, E_\Omega^i; \theta)\right)\right), \tag{8}$$

where $y_{\mathrm{ref}}^i$ is the fully-sampled reference k-space and $E_{\mathrm{full}}^i$ is the fully-sampled encoding operator that transforms the network output to k-space across coils. Example loss functions include ℓ1 norm, ℓ2 norm, mixed norm and perception-based loss (25,32,51–53). We note that the subsampling patterns $\Omega$ used in this study are equispaced and the same for all subjects. However, the subsampling pattern $\Omega$ may vary per subject, i.e. be indexed by $i$, if random subsampling is used.

Proposed Self-supervised Training without Fully-Sampled Reference Data

As discussed previously, acquiring fully sampled data is often difficult or impossible in many scenarios, due to constraints such as organ motion, signal decay or lengthy scan times. Such cases pose an important challenge for the practicality of DL-MRI reconstruction methods that rely on supervised training, since ground truth data is not available for training. To tackle this problem, we propose a self-supervised approach illustrated in Figure 2, where the acquired sub-sampled data indices, Ω from each scan is divided into two sets Θ and Λ as

$$\Omega = \Theta \cup \Lambda. \tag{9}$$

Figure 2.

The self-supervised learning scheme to train physics-guided deep learning without fully-sampled data. The acquired sub-sampled k-space measurements, Ω, are split into two disjoint sets, Θ and Λ. The first set of indices, Θ, is used in the data consistency unit of the unrolled network, while the latter set, Λ is used to define the loss function for training. During training, the output of the network is transformed to k-space, and the available subset of measurements at Λ are compared with the corresponding reconstructed k-space values. Based on this training loss, the network parameters are subsequently updated.

The set of k-space locations specified by Θ are used within the network during training in the data consistency units, while the set of k-space points in Λ are used to define the loss function. Thus, to enable training without using fully-sampled data, the following loss function is minimized

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(y_\Lambda^i, E_\Lambda^i\left(f(y_\Theta^i, E_\Theta^i; \theta)\right)\right). \tag{10}$$

In other words, the unrolled network output image $f(y_\Theta^i, E_\Theta^i; \theta)$, which only uses the indices specified by Θ for data consistency, is transformed to k-space using the encoding operator $E_\Lambda^i$ specified by the k-space indices in Λ. Then the loss is calculated in k-space with respect to the acquired k-space data at these locations. In the proposed SSDU approach, Θ was chosen as Ω\Λ. Thus, in our self-supervised training methodology, the unrolled network only sees the acquired k-space data at locations Θ = Ω\Λ to enforce data consistency. The quality of the final reconstruction, i.e. the network output image, is then checked by mapping to the individual coil k-spaces via $E_\Lambda^i$, and checking the discrepancy to the acquired measurements at the remaining locations Λ. Thus, the network is trained to decrease the discrepancy between the network output transformed to all the coil k-spaces and the acquired measurements that it does not see within its unrolled data consistency units. After the network is trained with our proposed self-supervised approach, the reconstruction for unseen test data is performed by using all available measurements at locations Ω.
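A single evaluation of the training loss in Equation [10] can be sketched as follows. This is a simplified illustration under stated assumptions rather than the study's implementation: it uses a single-coil orthonormal FFT as the encoding operator, a uniformly random split for brevity, and a plain normalized ℓ2 discrepancy as the loss. `recon_fn` is a hypothetical stand-in for the unrolled network $f$, and `split_mask` and `ssdu_loss` are illustrative names.

```python
import numpy as np

def split_mask(omega, rho, rng):
    """Split the acquired locations (binary mask omega) into disjoint theta and lam."""
    acquired = np.flatnonzero(omega)
    n_loss = int(round(rho * acquired.size))
    loss_idx = rng.choice(acquired, size=n_loss, replace=False)
    lam = np.zeros(omega.size, dtype=omega.dtype)
    lam[loss_idx] = 1
    lam = lam.reshape(omega.shape)
    theta = omega - lam             # theta = omega minus lam
    return theta, lam

def ssdu_loss(y_omega, omega, recon_fn, rho, rng):
    """Loss between acquired k-space at lam and the re-encoded network output."""
    theta, lam = split_mask(omega, rho, rng)
    x_hat = recon_fn(theta * y_omega, theta)          # network only sees theta
    k_hat = lam * np.fft.fft2(x_hat, norm="ortho")    # encode output at lam
    y_lam = lam * y_omega                             # held-out measurements
    return np.linalg.norm(y_lam - k_hat) / np.linalg.norm(y_lam)
```

At test time no split is needed: the trained network would be called with all acquired data, i.e. `recon_fn(y_omega, omega)`.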

Our proposed self-supervised approach shares similarities with the widely used concept of cross-validation. In machine learning, cross-validation is commonly used to evaluate how accurately a model will perform with robustness to bias and over-fitting issues. Cross-validation is performed by partitioning available data into two sets, one of which is used to train the model and the other for validation, i.e. to check whether the trained model generalizes to unseen data. The key difference between our approach and cross-validation is that we perform partitioning for each slice in the dataset, whereas in cross-validation the whole dataset is partitioned only once. The key hyper-parameter for success of cross-validation is the number of folds, which should be well-designed (54). Similarly, in our proposed self-supervised approach, subset selection mechanisms for Λ and Θ are critical, which are thoroughly studied in the next section.

Methods

Network and Training Details

The network for solving sub-problems [4] and [5] was unrolled for 10 iterations. The data consistency in the unrolled network was implemented with the conjugate gradient method for solving Equation [6], which itself was unrolled for 10 iterations. The neural network for solving the sub-problem [4] was implemented using a convolutional neural network (CNN) based on a ResNet structure, which has shown success in other regression problems (55). This CNN, shown in Figure 1c, consisted of input and output convolution layers, and 15 residual blocks (RB) with skip connections that facilitate information flow during network training. Each RB comprised two convolutional layers, in which the first layer is followed by a rectified linear unit (ReLU) and the second layer is followed by a constant multiplication layer, with factor C = 0.1 (55). All layers had a kernel size of 3×3 and 64 channels. This ResNet CNN had a total of 592,129 trainable parameters, which were shared across the unrolled iterations. Coil sensitivity maps were generated from the 24×24 center of k-space using ESPIRiT (56) with a kernel size of 6×6, as well as thresholds of 0.02 and 0.95 for calibration-matrix and eigenvalue decomposition.

A normalized ℓ1-ℓ2 loss, defined as

$$\mathcal{L}(u, v) = \frac{\|u - v\|_2}{\|u\|_2} + \frac{\|u - v\|_1}{\|u\|_1}, \tag{11}$$

was used for both the supervised and the proposed self-supervised training. In the supervised setting, $u$ and $v$ correspond to the reference ground-truth image/fully-sampled k-space and network output image/network output k-space obtained by transforming network output images to k-space by applying a fully-sampled encoding operator, while for the proposed self-supervised training these correspond to the acquired k-space measurements at locations specified by Λ and the k-space corresponding to the network output image at the same locations. For supervised training, k-space loss was used throughout the study as it outperforms the image domain loss used in our preliminary results (57) (Supporting Information Figure S2), while also matching our self-supervised framework. Prior to processing, the maximum absolute value of the k-space datasets was normalized to 1 in all cases. The networks were trained using the Adam optimizer with a learning rate of $10^{-3}$ unless specified otherwise, by minimizing the corresponding loss function with a batch size of 1 over 100 epochs. All training was performed using TensorFlow in Python, and processed on a workstation with an Intel E5-2640V3 CPU (2.6 GHz and 256 GB memory), and an NVIDIA Tesla V100 GPU with 32 GB memory.
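The normalized ℓ1-ℓ2 loss of Equation [11] can be written in a few lines of NumPy. The sketch below is illustrative only (the study used TensorFlow); the function name `l1_l2_loss` is an assumption, and the inputs are flattened so that the norms are vector norms over all entries.

```python
import numpy as np

def l1_l2_loss(u, v):
    """Normalized l1-l2 loss: ||u-v||_2 / ||u||_2 + ||u-v||_1 / ||u||_1."""
    u = np.asarray(u).ravel()       # reference (e.g. acquired k-space at lam)
    v = np.asarray(v).ravel()       # estimate (e.g. network output k-space at lam)
    diff = u - v
    l2_term = np.linalg.norm(diff, 2) / np.linalg.norm(u, 2)
    l1_term = np.linalg.norm(diff, 1) / np.linalg.norm(u, 1)
    return l2_term + l1_term
```

Because both terms are normalized by the corresponding norm of the reference `u`, the loss is invariant to a common scaling of `u` and `v`; it equals 0 when the two agree and 2 when `v` is identically zero.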

Choice of the Loss Mask

The proposed SSDU approach divides the acquired sub-sampled data into two disjoint sets Θ and Λ. Furthermore, in our implementation, Λ is allowed to vary for each different slice in the training database, i.e. they can be indexed as $\{\Lambda^i\}_{i=1}^{N}$. The subset Λ is retrospectively selected from the acquired k-space points, Ω, in order to define the loss function. Hence, unlike the data acquisition process for sampling k-space locations Ω, which is affected by concerns about contrast changes or eddy current artifacts (9), selection of Λ is not limited by any physical constraints. This is because Λ is selected after data acquisition and amounts to the selection of an index set from all possible acquired k-space locations. Thus, distribution and size of Λ were the two hyper-parameters that were studied. For the distribution of Λ, a uniformly random selection among elements of Ω, as well as a variable density selection based on Gaussian random weighting were investigated. For its size, the ratio ρ = |Λ|/|Ω| was varied among 0.05, 0.1, 0.2, …, 0.8, 0.9, where |∙| is the cardinality of the index set. A 5-fold cross-validation was also performed on training data for quantitative assessment of the distribution of Λ, as well as a subset of ρ values among 0.1, 0.2, 0.3, 0.4, 0.5, 0.6.
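One possible reading of the variable-density Gaussian selection is sketched below: points of Ω are drawn without replacement with probability proportional to a 2D Gaussian centered on the k-space center. The text does not specify the Gaussian width, so `std_scale` is a hypothetical parameter, and the sketch further assumes that the k-space array is centered (DC component in the middle of the array).

```python
import numpy as np

def gaussian_loss_mask(omega, rho=0.4, std_scale=4.0, rng=None):
    """Pick lam from omega with Gaussian-weighted probabilities; theta is the rest."""
    rng = np.random.default_rng() if rng is None else rng
    acquired = np.asarray(omega) > 0
    n_rows, n_cols = acquired.shape
    ky, kx = np.meshgrid(np.arange(n_rows) - n_rows // 2,
                         np.arange(n_cols) - n_cols // 2, indexing="ij")
    # Assumed weighting: standard deviation is (array size / std_scale) per axis.
    weights = np.exp(-0.5 * ((ky / (n_rows / std_scale)) ** 2
                             + (kx / (n_cols / std_scale)) ** 2))
    weights = weights * acquired            # only acquired points can be chosen
    prob = (weights / weights.sum()).ravel()
    n_loss = int(round(rho * np.count_nonzero(acquired)))
    idx = rng.choice(acquired.size, size=n_loss, replace=False, p=prob)
    lam = np.zeros(acquired.size, dtype=bool)
    lam[idx] = True
    lam = lam.reshape(acquired.shape)
    theta = acquired & ~lam                 # disjoint complement within omega
    return theta, lam
```

Calling this function once per training slice, with a fresh random draw each time, corresponds to letting Λ vary across slices as described above.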

Additionally, the impact of the overlap between Θ and Λ on the reconstruction performance was also studied. The first scenario considered was the limiting case when Ω=Θ=Λ. Subsequently, we created three different partial overlap scenarios for the best performing ρ value: 1) The first case, referred to as disjoint sets, in which there is no overlap between Θ and Λ (as originally proposed); 2) The second case, referred to as 50% overlap, where we included 50% of the points from Λ in Θ as well, i.e. |Λ∩Θ| / |Λ| = 0.5; 3) Lastly, the 100% overlap case, where all points in Λ are included in Θ as well (in this case Ω = Θ, but Λ is a subset of Ω).
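The three overlap scenarios can be generated from a given Λ with a small helper such as the one below. It is a sketch under the assumption that the shared points are chosen uniformly at random from Λ, which the text does not specify; `theta_with_overlap` is an illustrative name.

```python
import numpy as np

def theta_with_overlap(omega, lam, overlap, rng=None):
    """Build theta as (omega minus lam) plus a fraction `overlap` of the points in lam."""
    rng = np.random.default_rng() if rng is None else rng
    omega = np.asarray(omega, dtype=bool)
    lam = np.asarray(lam, dtype=bool)
    theta = (omega & ~lam).ravel()              # disjoint part (new array)
    lam_idx = np.flatnonzero(lam)
    n_shared = int(round(overlap * lam_idx.size))
    shared = rng.choice(lam_idx, size=n_shared, replace=False)
    theta[shared] = True                        # points used in both DC and loss
    return theta.reshape(omega.shape)
```

Here `overlap=0.0` gives the disjoint sets, `overlap=0.5` the 50% overlap case, and `overlap=1.0` the case Θ = Ω.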

Fully-Sampled Knee MRI

Knee datasets were obtained from the New York University (NYU) fastMRI initiative database, which was curated with an approval from the NYU School of Medicine Institutional Review Board (58). Fully sampled raw data were acquired on a clinical 3T system (Magnetom Skyra, Siemens, Erlangen, Germany) with a 15-channel knee coil using 2D turbo spin-echo sequences. The imaging parameters used for the knee data acquisitions are provided in the Supporting Information Table S1.

The fully-sampled raw data were under-sampled retrospectively for both training and testing using equispaced sampling patterns provided in the fastMRI database with an acceleration rate (R) = 4 (27,58,59). The center of k-space was fully-sampled with 24 lines of auto-calibrated signal (ACS). The training set consisted of 300 slices from 15 subjects for coronal PD, coronal PDFS, and 10 subjects for sagittal PD, sagittal T2, axial T2. Testing was performed on all slices from 10 different subjects for all knee sequences. Ground truth images for supervised training were generated with a SENSE-1 combination of the fully-sampled data (31,32). The proposed self-supervised approach was compared with supervised DL-MRI trained on fully-sampled dataset and conjugate gradient SENSE (CG-SENSE) (60). Additionally, comparison to a multi-coil compressed sensing reconstruction incorporating coil sensitivities with total generalized variation (TGV) as regularizer (45) was carried out for illustration purposes. However, TGV was not performed on all test datasets since it is computationally expensive, and a comparison between supervised DL-MRI and TGV was already performed in (27). For TGV, the MATLAB implementation provided by authors was utilized (45). We note that TGV and CG-SENSE approaches are shown only for comparison purposes with more traditional methods, and are not considered as competitive baseline images, consistent with previously reported results in the literature (27).
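The retrospective sampling pattern described above (equispaced lines at R = 4 plus a fully-sampled block of 24 ACS lines) can be approximated with the short sketch below. It does not reproduce the exact masks shipped with the fastMRI database; the line offset, the centering of the ACS block, and the choice of the first array axis as the undersampled direction are assumptions.

```python
import numpy as np

def equispaced_mask(n_ky, n_kx, accel=4, n_acs=24, offset=0):
    """Binary mask with every `accel`-th ky line plus `n_acs` central ACS lines."""
    lines = np.zeros(n_ky, dtype=bool)
    lines[offset::accel] = True                 # equispaced phase-encode lines
    start = (n_ky - n_acs) // 2                 # assumed: ACS block is centered
    lines[start:start + n_acs] = True           # fully-sampled center
    return np.repeat(lines[:, None], n_kx, axis=1)

mask = equispaced_mask(320, 368)
effective_rate = mask.size / mask.sum()         # lower than 4 because of the ACS lines
```

For 320 phase-encode lines, 80 equispaced lines and 24 ACS lines share 6 lines, so 98 lines are sampled and the effective acceleration is about 3.3 rather than the nominal 4.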

Prospectively Accelerated Brain MRI

Brain imaging was performed on 19 healthy subjects on a 3T Siemens Magnetom Prisma (Siemens Healthcare, Erlangen, Germany) system using a 32‐channel receiver head coil‐array. The imaging protocols were approved by the local institutional review board, and written informed consent was obtained from all participants before each examination for this HIPAA-compliant study. Data acquisition was performed using a standard Siemens 3D‐MPRAGE sequence with the following parameters: FOV = 224×224×157 mm³, resolution = 0.7×0.7×0.7 mm³, TR/TE = 2400 ms/2.2 ms, inversion time = 1000 ms, flip angle = 8°, bandwidth = 210 Hz/pixel, 3D matrix size = 320×320×224, prospective acceleration R = 2 (equispaced in ky), ACS lines = 32, acquisition orientation = sagittal. The k-space data was inverse Fourier transformed along the read-out (foot-head) direction, and these axial slices were processed individually. The prospectively undersampled brain datasets were further retrospectively undersampled to R = 4, 6, 8 using a sheared equispaced ky-kz undersampling pattern (61), with a 32×32 ACS region in the ky-kz plane. Sampling masks are provided in Supporting Information Figure S3. We note that while in principle prospectively sub-sampled data can be acquired at all these different rates, we chose to utilize further retrospective sub-sampling of prospectively accelerated data since our focus is on the reconstruction quality and this approach avoids confounding factors between different scans, such as subject motion or variations from T1 recovery. We also note that when the self-supervised approach was used at one of these higher acceleration rates, it only had access to the k-space data corresponding to that acceleration rate, both during training and testing. The learning rate for training was set to $5 \cdot 10^{-4}$. The training set consisted of 300 slices from 10 subjects, formed by taking the central 30 slices from each subject. Testing was performed on all slices from 9 different subjects.

The proposed self-supervised DL-MRI results were compared to the CG-SENSE method. We note that a comparison to supervised DL-MRI was not possible in this setting, since there was no fully-sampled ground truth data.

Image Evaluation

Experimental results were quantitatively evaluated using normalized mean square error (NMSE) and structural similarity index (SSIM). Additionally, qualitative assessment of the image quality was performed by an experienced radiologist. For knee MRI, the proposed self-supervised DL-MRI approach was compared to ground truth fully-sampled images, supervised DL-MRI trained on fully-sampled data and CG-SENSE at the same acceleration R = 4. As noted earlier, TGV was not included in the comparison due to its computational complexity and availability of a previous study comparing supervised DL-MRI and TGV (27). For brain MRI, the proposed self-supervised DL-MRI reconstructions at acceleration R = 4, 6 and 8 were compared with the CG-SENSE approach at the acquisition acceleration R = 2. The reader was blinded to the reconstruction method, except for the knowledge of the reference image in knee MRI datasets. The order in which the methods were shown was also randomized. There were differences between the sequences used for the fastMRI database and our institutional sequences, thus this knowledge allowed the radiologist to assess the baseline image quality. All five knee MRI weightings and the brain dataset were evaluated on a 4-point ordinal scale, adopted from (27), for blurring (1: no blurring, 2: mild blurring, 3: moderate blurring, 4: severe blurring), SNR (1: excellent, 2: good, 3: fair, 4: poor), aliasing artifacts (1: none, 2: mild, 3: moderate, 4: severe) and overall image quality (1: excellent, 2: good, 3: fair, 4: poor). The Wilcoxon signed-rank test was used to evaluate the scores with a significance level of P < 0.05.
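For reference, NMSE is commonly computed as the squared ℓ2 error normalized by the squared ℓ2 norm of the reference. The text does not state the exact definition used, so the sketch below shows this common convention as an assumption; `nmse` is an illustrative name.

```python
import numpy as np

def nmse(reference, reconstruction):
    """Normalized mean square error: ||ref - rec||_2^2 / ||ref||_2^2 (assumed convention)."""
    reference = np.asarray(reference)
    reconstruction = np.asarray(reconstruction)
    error_energy = np.sum(np.abs(reference - reconstruction) ** 2)
    reference_energy = np.sum(np.abs(reference) ** 2)
    return error_energy / reference_energy
```

With this definition a perfect reconstruction gives 0, and an all-zero reconstruction gives 1.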

Results

Choice of the Loss Mask

Figure 3 depicts the self-supervised network training using varying subsets across slices by uniformly random and variable-density Gaussian selection of Λ ⊂ Ω for ρ = 0.1. Uniformly random selection of Λ suffers from visible residual artifacts, marked by red arrows. These artifacts are further suppressed in the Gaussian-based approach, and the difference images align with these observations. The quantitative assessment from 5-fold cross-validation is consistent with these qualitative assessments. The median and interquartile range of SSIM values were 0.9380 [0.9197, 0.9527] and 0.9457 [0.9293, 0.9575], and those of NMSE values were 0.0021 [0.0016, 0.0027] and 0.0019 [0.0015, 0.0023], using uniform random selection and Gaussian selection, respectively. Supporting Information Figure S4 shows additional reconstructions for uniform random and Gaussian selection for different ρ values, which further highlights that Gaussian selection consistently outperforms uniform random selection across different ρ values. Thus, a variable-density Gaussian selection was used for Λ for the remainder of the study.

Figure 3.

a) Acquired sub-sampling pattern, Ω; b) Example uniform random and c) variable-density Gaussian random selection for subset Λ (allowed to differ for each slice in the training dataset) that is used to define the training loss; d) Ground-truth reference data; e) and f) Self-supervised DL-MRI reconstruction and corresponding difference images with loss masks Λ as in b) and c), respectively. Red arrows mark residual artifacts in uniform random selection. These artifacts are further suppressed in the Gaussian random selection, which is used for the remainder of the study.

Figure 4 shows the impact of network training with varying ρ ∈ {0.05, 0.1, 0.2, …, 0.8, 0.9} using variable-density Gaussian selection. Red arrows show visible residual artifacts for low ρ values of 0.05, 0.1, 0.2. As the cardinality of Λ increases towards ρ = 0.4, residual artifacts decrease. At ρ = 0.4, visible artifacts seen at lower ρ values are further suppressed. Residual artifacts start to reappear starting from ρ = 0.5, and these artifacts become more pronounced as ρ increases. The quantitative assessment from 5-fold cross-validation aligns with these qualitative assessments. The median and interquartile range of SSIM values were 0.9457 [0.9293, 0.9575], 0.9477 [0.9323, 0.9591], 0.9488 [0.9328, 0.9603], 0.9507 [0.9352, 0.9614], 0.9450 [0.9297, 0.9569], 0.9391 [0.9225, 0.9524], and NMSE values were 0.0019 [0.0015, 0.0023], 0.0018 [0.0013, 0.0023], 0.0018 [0.0014, 0.0022], 0.0017 [0.0013, 0.0021], 0.0020 [0.0015, 0.0024], 0.0022 [0.0016, 0.0028] using Gaussian selection for ρ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}, respectively. Hence, ρ = 0.4 was used for the remainder of the study.

Figure 4.

A representative test slice depicting the reconstruction results for different ratios of ρ = |Λ|/|Ω|. Λ is used only for defining loss function, while Θ = Ω\Λ is only used within data consistency units. Red arrows mark visible residual artifacts for ρ ≤ 0.3 and ρ≥0.5. These artifacts are suppressed at ρ = 0.4, which is used for the remainder of the study.

Figure 5 shows the impact of different degrees of overlap between Λ and Θ for ρ = |Λ|/|Ω| = 0.4, as well as the limiting case that uses all available data for both data consistency and loss (i.e. Ω=Θ=Λ). For the limiting case with Ω=Θ=Λ, the reconstruction results suffer from residual noise amplification. On the other hand, when Λ and Θ were disjoint as proposed, such noise amplifications are significantly suppressed. Quantitative SSIM and NMSE evaluation of these methods over the dataset are presented in Supporting Information Table S2, indicating that for different rates of overlap between Λ and Θ with ρ = 0.4, the performance degrades as the amount of overlap increases. Thus disjoint sets were used for the remainder of the study.

Figure 5.

Reconstruction results for different degrees of overlap between Λ and Θ, i.e. |Λ∩Θ|/|Λ|, for ρ = |Λ|/|Ω| = 0.4, as well as the limiting case that uses all available data for both data consistency and loss (i.e. Ω=Θ=Λ). For the limiting case with Ω=Θ=Λ, the reconstruction suffers from noise amplification, which is significantly suppressed for the proposed disjoint Λ and Θ. The performance of the self-supervised approach degrades as the amount of overlap increases.

Knee MRI

Figure 6 demonstrates the reconstruction results of coronal PD images using CG-SENSE, TGV, supervised DL-MRI and proposed self-supervised DL-MRI approach along with the ground truth reference, as well as difference images with respect to this reference. CG-SENSE and TGV suffer from visible residual artifacts, marked by red arrows, with the latter having fewer artifacts. The proposed self-supervised and supervised DL-MRI approaches successfully remove the residual artifacts, while achieving similar qualitative and quantitative performance. Quantitative metrics and difference images displayed in the figure are in agreement with these observations. Supporting Information Figure S5 shows the training loss curves for both approaches where loss decreases over epochs in a similar trend.

Figure 6.

A representative test slice from fastMRI coronal PD knee MRI dataset depicting the reconstruction results for proposed self-supervised DL-MRI, supervised DL-MRI, CG-SENSE and TGV approaches for retrospective equispaced undersampling R = 4. Zoomed views and error images show the residual artifacts observed in CG-SENSE and TGV approaches. Both self-supervised and supervised DL-MRI approaches successfully suppress these artifacts, while showing similar quantitative performance.

The same trends were observed for coronal PD-FS as depicted in Figure 7. Both proposed and supervised DL-MRI approaches show similar performance, while improving the suppression of residual artifacts that are visible in CG-SENSE and TGV methods. Quantitative evaluation and the residual artifacts apparent in the difference images also highlight these observations. Supporting Information Figure S6 shows reconstruction results for the axial T2, sagittal T2 and sagittal PD weighted knee datasets, which align with the observations from the coronal knee datasets.

Figure 7.

A reconstructed test slice showing reconstruction results from fastMRI coronal PD-FS datasets for retrospective equispaced undersampling R = 4. Red arrows indicate visible artifacts, especially apparent in the zoom views and error images for CG-SENSE and TGV techniques. Proposed self-supervised and supervised DL-MRI eliminate these artifacts, while showing similar quantitative and qualitative performance.

Figure 8 shows a box-plot displaying the median and interquartile range (25th-75th percentile) of the quantitative metrics, SSIM and NMSE, across all test datasets for each knee sequence. In all sequences, supervised and self-supervised DL-MRI approaches achieve similar quantitative performance for both SSIM and NMSE, while significantly outperforming the CG-SENSE approach. We note again that TGV was not included in these comparisons, as it is computationally expensive, and a comparison between supervised DL-MRI and TGV was already performed in (27).

Figure 8.

Boxplots showing the median and interquartile range (25th-75th percentile) of the quantitative metrics, (a) structural similarity index and (b) normalized mean squared error (NMSE) for all five knee MRI sequences. Both proposed self-supervised and supervised DL-MRI significantly outperform CG-SENSE in terms of SSIM and NMSE for all knee sequences, while showing similar quantitative performance.
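As an illustration of how such summary statistics can be computed, the sketch below evaluates NMSE per slice and then reports the median and interquartile range. The exact NMSE definition used in the study is not given in this section; the common form ‖x − x_ref‖² / ‖x_ref‖² is assumed here, and the function names and toy arrays are hypothetical.

```python
import numpy as np

def nmse(recon, ref):
    """Normalized mean squared error, assuming ||recon - ref||^2 / ||ref||^2."""
    err = np.sum(np.abs(recon - ref) ** 2)
    return err / np.sum(np.abs(ref) ** 2)

def median_iqr(values):
    """Return (median, 25th percentile, 75th percentile) of per-slice values."""
    q25, q50, q75 = np.percentile(values, [25, 50, 75])
    return q50, q25, q75

# Toy example: a reference image and a reconstruction with a 10% intensity error.
ref = np.ones((4, 4))
recon = 1.1 * ref
print(nmse(recon, ref))

# Summarizing a set of hypothetical per-slice scores.
print(median_iqr([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The absolute value inside `nmse` keeps the function valid for complex-valued images as well as magnitude images.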

Prospectively Accelerated Brain MRI

Figure 9 depicts a sagittal slice of the 3D MPRAGE dataset at acquisition acceleration R = 2 and further retrospective acceleration R = 4, 6 and 8 reconstructed with CG-SENSE, as well as R = 4, 6 and 8 reconstructed with the proposed self-supervised DL-MRI on a representative test subject, following reformatting to the original acquisition (sagittal) plane. CG-SENSE suffers from significant noise amplification at higher acceleration rates. Self-supervised DL-MRI successfully performs reconstruction at these higher acceleration rates, while achieving a lower noise level and similar overall image quality to CG-SENSE at R = 2. Results from another subject are depicted in Supporting Information Figure S7 and show similar trends. TGV was not applied due to the high computational runtime across all axial slices, and supervised DL-MRI cannot be applied in this setting due to the lack of fully-sampled references.

Figure 9.

Reconstruction results from prospectively 2-fold equispaced undersampled brain MRI. CG-SENSE and the proposed self-supervised approach are applied at further retrospective acceleration rates of 4, 6 and 8 with equispaced sheared ky-kz undersampling patterns, while CG-SENSE is also used at the acquisition rate of 2. CG-SENSE suffers from visibly higher noise amplification at high acceleration rates. The proposed approach successfully reconstructs brain MRI at these higher rates, achieving similar image quality to CG-SENSE at R = 2. Note the supervised DL-MRI cannot be applied here due to the lack of fully-sampled ground truth data for training.

Image Evaluation Scores

Figure 10 summarizes the results of the reader study for knee and brain datasets. For knee datasets, both supervised and self-supervised DL-MRI approaches get comparable scores to the reference image in terms of SNR, blurring, aliasing artifacts and overall image quality. There was no statistical difference between reference and DL-MRI approaches in terms of the evaluation criteria for all knee sequences, except for blurring between reference and DL-MRI approaches in coronal PD-FS. CG-SENSE was significantly outperformed by both DL-MRI approaches, while showing statistically significant differences to the reference and both DL-MRI approaches for all knee sequences, except in blurring for coronal PD and PD-FS sequences. More comprehensive bar plots of the average scores including CG-SENSE and supervised training with image domain loss as in Equation [7] are presented in Supporting Information Figure S8.

Figure 10.

The image reading results from the clinical reader study for knee and brain datasets. Bar-plots show average reader scores and their standard deviation across the test subjects. Statistical testing was performed by one-sided Wilcoxon signed-rank test, with * showing significant statistical difference with P < 0.05. For knee MRI, both supervised and self-supervised DL-MRI approaches get comparable scores to the reference image in terms of SNR, blurring, aliasing artifacts and overall image quality. There was no statistical difference between reference and DL-MRI approaches in terms of the evaluation criteria for the knee datasets, except for blurring between reference and DL-MRI approaches in coronal PD-FS. For brain MRI, CG-SENSE at R = 2 and self-supervision at R = 4, 6 and 8 do not show any significant differences in terms of SNR and blurring. Self-supervision at all rates was evaluated to be significantly improved compared to CG-SENSE in terms of aliasing artifacts and overall image quality. Additionally, self-supervision at R = 6 and 8 was also significantly worse than self-supervision at R = 4 in terms of overall image quality.

For the 3D MPRAGE dataset, DL-MRI reconstructions trained using the proposed self-supervised approach at acceleration rates 4, 6 and 8 show similar statistical properties in terms of SNR and blurring to CG-SENSE at acquisition R = 2. However, in terms of aliasing artifacts and overall image quality, the proposed self-supervised approach at all three acceleration rates (R = 4, 6 and 8) outperforms CG-SENSE at R = 2. In terms of aliasing artifacts, the proposed self-supervised approach at rates 4 and 6 shows similar statistical behavior, while significantly improving upon self-supervised DL-MRI at R = 8 and CG-SENSE at R = 2, which perform statistically similarly to each other. The proposed self-supervised approach at R = 4 shows the best overall image quality and shows statistically significant differences with self-supervision at R = 6, 8 and CG-SENSE at R = 2. As expected, the overall image quality decreases with higher acceleration rates using the proposed self-supervised DL-MRI approach, although these reconstructions still outperform CG-SENSE at R = 2.

Discussion

In this study, we developed a framework for self-supervised training of physics based DL-MRI reconstruction without fully sampled data. The proposed approach split the acquired under-sampled k-space indices into two disjoint sets Θ and Λ, where the former was used across the unrolled network to enforce data consistency, while the latter was used to define the loss function for the training. The results on retrospectively under-sampled knee datasets showed that our SSDU approach achieves comparable results with a supervised DL-MRI approach using the same neural network architecture, while outperforming conventional CG-SENSE and TGV approaches. Results on prospectively under-sampled brain datasets, for which supervised learning methods cannot be applied due to unavailability of fully-sampled data, further confirmed the effectiveness of the proposed self-supervised training approach for DL-MRI reconstruction. These reconstructions at higher acceleration rates of 4, 6 and 8, visually outperformed CG-SENSE at R = 2 according to the reader study. We note that CG-SENSE was implemented without regularization, and its performance may be improved using Tikhonov regularization with the regularization parameter selected over a training set (27).
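The partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the mask size, the equispaced pattern and the function name `split_mask` are hypothetical, and Λ is drawn uniformly at random for simplicity.

```python
import numpy as np

def split_mask(omega, rho=0.4, seed=0):
    """Split the acquired-sample mask omega into disjoint masks (theta, lam).

    rho is the fraction |lam| / |omega| of acquired points used for the loss.
    """
    omega = np.asarray(omega, dtype=bool)
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(omega)              # flat indices of acquired points
    n_loss = int(round(rho * acquired.size))      # |lam| = rho * |omega|
    loss_idx = rng.choice(acquired, size=n_loss, replace=False)
    lam = np.zeros(omega.size, dtype=bool)
    lam[loss_idx] = True
    lam = lam.reshape(omega.shape)
    theta = omega & ~lam                          # remaining points feed the DC units
    return theta, lam

# Hypothetical 64x64 k-space with every 4th column acquired (R = 4, no ACS lines).
omega = np.zeros((64, 64), dtype=bool)
omega[:, ::4] = True
theta, lam = split_mask(omega, rho=0.4)
print(theta.sum(), lam.sum(), omega.sum())
```

In this sketch the two masks are disjoint and their union recovers Ω, so no acquired point is discarded; it is only assigned to either data consistency or the loss.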

Most DL-MRI approaches use supervised learning for network training in order to provide improved accelerated MRI reconstruction (28,29,32,33,59). However, acquiring fully-sampled data is challenging in many practical scenarios of interest. This may be due to constraints on timing, physiological constraints, signal decay or long scan times (38–42). As an example, the fully-sampled acquisition for the 3D MPRAGE sequence with the resolution used in this study would take more than 15 minutes (41), which is impractical for large studies and may lead to patient discomfort. Furthermore, such long scan times increase susceptibility to motion artifacts, which would be more pronounced at these high resolutions. To further highlight the need for training data, we have also performed experiments on prospectively sub-sampled snapshot cardiac MRI, where it is infeasible to collect the ground truth data. Results from these experiments are shown in Supporting Information Figure S9, showing the applicability of our method in this setting as well. Thus, being able to train DL-MRI reconstruction methods without fully-sampled data is imperative to broaden their application to settings in which such data is challenging to acquire, where supervised training is no longer practical. Furthermore, this may also facilitate the integration of DL-MRI methods into many clinical scans that readily include a form of accelerated imaging, most commonly in the form of parallel imaging, by enabling the use of prospectively undersampled raw k-space data for training.

Given the importance of training without fully sampled data, there have been several works which have tried to tackle this issue. For purely data-driven de-aliasing of single-coil data using image domain to image domain mapping without the encoding operator, a self-supervised approach has been proposed (62) using a mixture of measurement and k-space losses. Unlike our approach, it uses all available data for training and loss, i.e. identical sets. As a result, the reconstructions suffer from visible noise amplifications which also align with our observation about usage of identical sets in Figure 5. An alternative approach, which assumes the same data is acquired with two separate acquisitions using different undersampling patterns was also proposed (63,64) extending on the Noise2Noise denoising framework (65). In the same image-domain reconstruction setting, a self-supervised learning scheme using cycleGANs with optimal transport cost minimization was proposed (66), although initial results exhibit blurring artifacts. Although purely data-driven image domain methods have been used for DL-MRI reconstruction, physics-guided DL-MRI techniques are more desirable as they offer a degree of interpretability by incorporating domain knowledge on the MRI encoding mechanism (20,27,28,30,31,33). In this physics-guided setting, earlier work used the output of a regularized CG-SENSE algorithm based on compressed sensing as the reference for supervised training (67), showing that such training may outperform the compressed sensing output, as some images are over-regularized while others are under-regularized. However, this approach assumes that the compressed sensing algorithm output will be a reliable estimate of the image without residual aliasing artifacts, and thus is limited by sampling strategies and acquisition acceleration rates, as high acceleration rates or equispaced sampling may lead to degradation in the compressed sensing results. 
More recently, an unpaired learning approach using Wasserstein GANs was proposed (68), but this procedure still assumes the presence of high-quality images albeit not requiring pairwise matching with undersampled data. Another approach uses the so-called unsupervised basis pursuit (69,70), where the unrolled network consists of regularizer units followed by several consecutive DC units. This approach uses the current output of the DC unit as the training label, and iteratively updates both network parameters and this training label, in a method reminiscent of semi-supervised training. This method was investigated with random undersampling patterns, where intermediate outputs tend to suffer from noise amplification but without significant residual artifacts. In this setting, this approach was able to reduce noise further, even though noise amplification was observed when compared to supervised training (69,70). However, this method was not investigated for equispaced undersampling, as is the focus of this study, where intermediate DC outputs are both noisy and likely to have residual aliasing artifacts. Thus, the utility of this method in equispaced undersampling is unclear and warrants further investigation. In contrast, our SSDU approach uses physics-guided DL-MRI reconstruction, while not making any explicit assumptions about the final output in image space. In particular, we do not enforce the output of our network to align with a generative model or consider intermediate estimates as reference output for training. The training in SSDU only considers the acquired k-space data to evaluate the reconstruction quality, in effect using a physics-guided self-supervision approach. Furthermore, SSDU works for both equispaced undersampling patterns, as is the focus of the study, and random undersampling patterns (results not shown). 
Note that the former was considered to be more challenging for physics-guided DL-MRI reconstruction in previous studies, as networks trained with equispaced sampling were shown to generalize well to random sampling, but not vice versa (27,71).

Our training method is also reminiscent of the broader and fundamental concept of cross-validation in machine learning and statistics (72). When testing generalizability, the training database is partitioned into two complementary sets, one of which is used for training the model (often called training set), and the other used to assess the performance in unseen data (often called validation/testing set). In our approach, we do a similar partitioning of the acquired data into two sets, which we denote Θ and Λ. The main difference to typical cross-validation is that our partitioning is done for each subject in the training set from the database. But the intuition for partitioning within the network is similar, as the unrolled network only sees Θ for data consistency during training, while Λ is only used to establish the network loss. Indeed, our experiments in Figure 5 show that when Θ and Λ are taken to be the same as Ω, such training leads to poor image quality with insufficient removal of aliasing artifacts and noise amplification, as the DC unit operating on the full Ω inherently matches well with the acquired data at these locations.

Selection of the loss mask Λ plays an important role in the performance of the proposed self-supervised training. One major design advantage is that since it only exists in post-processing, it can be chosen freely among all the acquired measurements retrospectively, without physical constraints that are imposed during acquisition. Thus even though 40% of the acquired indices in Ω were included in Λ, this is not equivalent to training with an ~8-fold accelerated acquisition, especially for the 2D setting, since the points in Λ do not need to constitute fully-sampled readout encoding lines along kx. This point is further illustrated in Supporting Information Figure S10, in the context of supervised training. This advantage is not as clear in the training for the 3D brain dataset in this study, since the data had to be inverse Fourier transformed along the foot-head readout direction and axial slices had to be processed due to memory issues in the GPUs. In this case, the sheared equispaced ky-kz undersampling pattern does not readily include any lines, thus the selection of Λ may affect the DC units more substantially than in the 2D knee MRI experiments. Accordingly, the self-supervised approach is expected to show more gains and better reconstruction quality at higher acceleration rates for 3D imaging if 3D neural networks can be used. Thus memory-efficient 3D neural network designs (73) may warrant further investigation, although this is beyond the scope of the current study.

The data reduction arising from data splitting between Θ and Λ poses more challenges for training and reconstruction at higher acceleration rates, even for 2D acquisitions. This was further investigated to check how the performance of self-supervised and supervised training would change at higher acceleration rates when all the training parameters and datasets are the same as described earlier. The results shown in Supporting Information Figure S11 indicate that both training methods perform similarly at R = 4 and 6 for knee MRI. However, at R = 8, where the supervised training is able to suppress artifacts albeit at the cost of blurring, the self-supervised approach starts suffering from additional residual aliasing artifacts. Thus, at higher acceleration rates, where reconstructions from the supervised training can operate without aliasing artifacts but with quality degradation, the self-supervised approach faces additional challenges including residual aliasing, due to the scarcity of data, especially after splitting into two sets. The problem of data scarcity has been addressed by several important transfer learning methods when using supervised training with fully-sampled datasets (74,75). These approaches pre-train neural networks on fully-sampled large datasets and then fine-tune them on smaller datasets of interest. In such cases, if the smaller dataset of interest is additionally not fully-sampled, then the proposed self-supervised approach may be combined synergistically with transfer learning to tackle this challenging issue of both data scarcity and not having fully-sampled data, though this was beyond the scope of this study. We also note that there are differences between the weights of the networks from supervised and self-supervised training approaches. 
However, a quantitative difference, such as NMSE, between learned weights of these two training approaches does not directly translate to reconstruction performance, as shown by our results. Nonetheless, it is noteworthy that two different trained networks with differences among their weights have similar reconstruction performance during the testing stage, further alluding to the complexity of the parameter space for the neural network.

All experiments in this study were based on Cartesian acquisitions. The proposed self-supervised approach can be extended to non-Cartesian acquisitions. In non-Cartesian acquisitions such as radial or spiral acquisitions, one can choose the subsets for training and loss mask from the acquired radial spokes and spirals, similar to Cartesian acquisitions used in this study, since this amounts to selecting a subset of individual k-space points on the spokes or spirals. We also note that for non-Cartesian acquisitions, the encoding operator also contains the gridding/de-gridding operation to account for non-uniform Fourier transforms. These extensions were not investigated, as they were beyond the scope of the current work.

In this study, we compared uniformly random selection with a variable-density approach based on Gaussian weighting for selecting Λ. In our experiments, the latter selection was favored as it statistically outperformed and visibly improved upon the former. A self-supervised mask selection during the network training may further remove these hyper-parameters and potentially lead to further improvements in reconstruction. However, this is a difficult problem, which warrants further investigation, beyond the scope of the current study. Using different distributions for selecting a number of distinct Θ and Λ pairs per subject may further improve performance, but currently these distributions would need to be empirically chosen. Due to the ad-hoc nature of such a process and the wide range of available distributions, this was not explored in detail, but this idea also warrants more investigation in the context of self-supervised mask selection in future works. We also investigated the reconstruction performance using the same sets, Θ and Λ, across all training slices versus letting these vary across slices as Θi and Λi, as proposed. Although one can choose these sets to be the same for all slices, such an approach bears the risk of a sub-optimal loss mask being used for all slices. Hence, having different sets for each slice in the dataset may provide additional robustness. Supporting Information Figure S12 shows that having different loss and training sets for each slice yields a slight improvement over using the same sets across the entire training dataset. Finally, a heuristic choice was made to keep 4×4 central k-space lines in the Θ set, as the DC units did not work well without these high-energy components. In our experience, use of larger (8×8 or 16×16) or smaller (2×2) regions deteriorated the overall performance.
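A variable-density selection of Λ with a protected central region can be sketched as below. This is only an illustration under several assumptions that are not stated in this section: the k-space center is taken to be at the array center, the Gaussian standard deviation is set through a hypothetical `std_scale` parameter, and the central block is handled by giving it zero selection probability so that it always remains in Θ.

```python
import numpy as np

def gaussian_split(omega, rho=0.4, std_scale=4.0, center=4, seed=0):
    """Draw lam from omega with Gaussian weighting; keep a central block in theta."""
    omega = np.asarray(omega, dtype=bool)
    rng = np.random.default_rng(seed)
    ny, nx = omega.shape
    cy, cx = ny // 2, nx // 2                      # assumed k-space center
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    # Gaussian weights: points near the center are more likely to enter lam.
    weights = np.exp(-0.5 * (((yy - cy) / (ny / std_scale)) ** 2
                             + ((xx - cx) / (nx / std_scale)) ** 2))
    weights = weights * omega                      # only acquired points are candidates
    half = center // 2
    weights[cy - half:cy + half, cx - half:cx + half] = 0.0  # reserve center for theta
    n_loss = int(round(rho * omega.sum()))         # |lam| = rho * |omega|
    prob = (weights / weights.sum()).ravel()
    loss_idx = rng.choice(omega.size, size=n_loss, replace=False, p=prob)
    lam = np.zeros(omega.size, dtype=bool)
    lam[loss_idx] = True
    lam = lam.reshape(omega.shape)
    theta = omega & ~lam
    return theta, lam

# Hypothetical 64x64 mask with every 4th column acquired.
omega = np.zeros((64, 64), dtype=bool)
omega[:, ::4] = True
theta, lam = gaussian_split(omega)
print(lam.sum(), theta[30:34, 30:34].sum())
```

Calling the function with a different `seed` for each slice gives slice-dependent sets Θi and Λi, in the spirit of the varying-mask strategy discussed above.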

The same residual network structure for regularizer and unrolled conjugate gradient for data consistency units were used throughout the study. However, our approach is not restricted to these network and DC unit choices. Alternative approaches, such as a DenseNet, U-Net or variational neural network as a regularizer CNN (27,76,77), or gradient descent for the DC unit are also possible (27,33). However, these were not explored, since such network optimization was not the focus of our study. Instead we fixed one architecture, and used this for both supervised and self-supervised training. In this study, we also shared the regularizer CNN parameters across the unrolled network, similar to (28,33), in order to enable training with a smaller training dataset. However, it is possible to use different parameters for each unrolled regularizer unit, as in (27,31), at the cost of a higher number of trainable parameters. A comparison between supervised training with shared and non-shared parameters in the unrolled network is provided in Supporting Information Figure S13. The results indicate that the two approaches perform similarly in terms of qualitative and quantitative assessments.

Selection of proper loss functions also plays a vital role in network training. The ℓ2 loss is a frequently used metric in DL-MRI with promising results (20,28), but it is sensitive to outliers. On the other hand, ℓ1 loss is more robust to outliers. Hence, we used a normalized ℓ1-ℓ2 loss to take advantage of the superior properties of each loss while minimizing their disadvantages (53). Other choices of losses such as discriminative losses have also been popular for supervised training of DL-MRI methods (33,78). There have also been works to incorporate the conventional loss functions such as ℓ1 or ℓ2 into adversarial losses (25,79–81). To the best of our knowledge, there are no works that use an adversarial loss in k-space, but such an extension may benefit the reconstruction quality when using the proposed self-supervision approach.
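A normalized ℓ1-ℓ2 loss evaluated only on the loss mask Λ can be sketched as follows. The loss equation itself does not appear in this section, so the form ‖u − v‖₂/‖u‖₂ + ‖u − v‖₁/‖u‖₁ with equal weighting of the two terms is an assumption, as are the function names and the toy k-space data.

```python
import numpy as np

def normalized_l1_l2(u, v):
    """Assumed form: ||u - v||_2 / ||u||_2 + ||u - v||_1 / ||u||_1."""
    u = np.ravel(u)
    diff = u - np.ravel(v)
    l2_term = np.linalg.norm(diff, 2) / np.linalg.norm(u, 2)
    l1_term = np.linalg.norm(diff, 1) / np.linalg.norm(u, 1)
    return l2_term + l1_term

def masked_kspace_loss(kspace_acquired, kspace_predicted, lam):
    """Compare predicted and acquired k-space only at the loss locations lam."""
    return normalized_l1_l2(kspace_acquired[lam], kspace_predicted[lam])

# Toy complex k-space: the prediction is wrong only outside the loss mask.
acquired = np.array([[3 + 4j, 1 + 0j], [0 + 2j, 5 + 0j]])
predicted = acquired.copy()
predicted[1, 1] = 0.0                      # error at a point not in lam
lam = np.array([[True, True], [True, False]])
print(masked_kspace_loss(acquired, predicted, lam))
```

Because both terms are divided by the norm of the reference data, the value is insensitive to the overall scaling of the k-space data, which differs across coils and subjects.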

Conclusion

The proposed training framework allows training of physics-guided DL-MRI reconstruction without requiring fully-sampled data, while performing similarly to conventional supervised DL-MRI approaches.

Supplementary Material

Supporting Information

Supporting Information Figure S1. Reconstruction results for the generalization performance of supervised training across different image matrix sizes. The networks are trained by taking the actual k-space, the central ½ of the k-space (i.e. reducing the resolution by 2-fold), and the central ¼ of the k-space (i.e. reducing the resolution by 4-fold). All trained networks are then applied on actual size data to test generalization. The generalization performance of CNNs on actual image size degrades as the training image size gets smaller, with ¼ k-space performing the worst.

Supporting Information Figure S2. Reconstruction results for supervised training with image domain (Equation [7]) and k-space (Equation [8]) losses. When using image domain loss, the reconstruction suffers from residual artifacts (red arrows), whereas using k-space loss suppresses these artifacts. Difference images also show that the supervised training with k-space loss has fewer residual artifacts. Across the dataset, the two approaches perform quantitatively similarly. The median and interquartile range for SSIM values across the test dataset were 0.967 [0.955, 0.978], 0.966 [0.956, 0.977], and for NMSE values were 0.001 [0.001, 0.002], 0.001 [0.001, 0.002] for supervised with image domain and k-space losses, respectively.

Supporting Information Figure S3. Sub-sampling masks used in the brain MRI study. Prospective subsampling was equispaced with R = 2 in ky and 32 ACS lines. Subsampling patterns for R = 4, 6, 8 were obtained by sheared sub-sampling, while keeping the center 32×32 ACS region in the ky-kz plane.

Supporting Information Figure S4. Reconstruction results from self-supervised training with uniform random selection and variable-density Gaussian selection of Λ for ρ ∈ {0.1, 0.2, 0.4}. Gaussian random selection consistently outperforms the uniform random selection at all ρ values in terms of reconstruction quality and suppression of residual artifacts, which is also highlighted in the difference images. For ρ ∈ {0.1, 0.2}, both uniform and Gaussian random selection show visible residual artifacts, marked by red arrows, with the former showing more residual artifacts. For ρ = 0.4, uniform random selection still suffers from visible residual artifacts, whereas Gaussian selection further suppresses those artifacts and achieves artifact-free reconstruction. Difference images further confirm these observations.

Supporting Information Figure S5. a) Training loss for supervised and self-supervised training approaches. In both cases, the loss decreases over epochs. The self-supervised approach achieves a lower loss value, as the loss is only measured on Λ, whereas the supervised loss is measured on the fully-sampled k-space. b) For both supervised and self-supervised training, the outputs of the networks are evaluated on the fully-sampled k-space loss, as defined in Equation [8], for every 10th epoch. Using a similar metric, the two approaches show similar trends over epochs, with the supervised training achieving a slightly lower loss than the self-supervised approach.

Supporting Information Figure S6. Representative reconstructed test slices from fastMRI sagittal PD, sagittal T2 and axial T2 knee sequences for retrospective equispaced undersampling R = 4. In all three sequences, CG-SENSE and TGV suffer from visible residual artifacts, marked by red arrows. Both proposed self-supervised and fully-supervised DL-MRI approaches successfully remove these residual artifacts, while showing similar quantitative and qualitative performance. Note the former does not require any fully-sampled data for training unlike the latter supervised approach.

Supporting Information Figure S7. Reconstruction results for CG-SENSE and proposed self-supervised approach for brain MRI. CG-SENSE suffers from significant noise amplification at high acceleration rates. Proposed self-supervised approach achieves high-quality reconstruction at high acceleration rates, and achieves a lower noise amplification at rate 8 compared to CG-SENSE at acquisition acceleration rate 2.

Supporting Information Figure S8. Average reader scores for all knee sequences for proposed self-supervised training, supervised training with image domain loss and CG-SENSE. Both supervised and self-supervised DL-MRI approaches get comparable scores to the reference image in terms of SNR, blurring, aliasing artifacts and overall image quality. There was no statistical difference between reference and DL-MRI approaches in terms of SNR and blurring in the knee sequences in general, except for blurring between reference and DL-MRI approaches in coronal PD-FS. In terms of aliasing artifacts and overall image quality, there was no statistical difference between reference and the two DL-MRI approaches for coronal PD, coronal PD-FS and sagittal PD sequences. However, for sagittal T2 sequence, supervised DL-MRI was ranked statistically worse than the reference, while for axial T2, it was ranked lower than both the reference and self-supervised DL-MRI. Thus, in general, both DL-MRI approaches performed well, but the self-supervised approach was slightly more favored by the reader, who was blinded to the reconstruction method. CG-SENSE was significantly outperformed by both DL-MRI approaches, while showing statistically significant differences to the reference and both DL-MRI approaches for all knee sequences, except in blurring for coronal PD and PD-FS sequences. Finally, we also note that the supervised training with k-space loss (Figure 10) outperforms supervised training with image domain loss in terms of reader scores for axial T2, coronal PD-FS and sagittal T2 sequences.

Supporting Information Figure S9. Reconstructed images from an 8-fold accelerated snapshot cardiac MRI data with 1.3×1.3 mm² in-plane resolution, acquired using a transient bSSFP sequence. These types of acquisitions are commonly used in cardiac parametric mapping, where the image data for one contrast weighting needs to be acquired within the diastolic quiescence of one heartbeat. A fully-sampled acquisition at this higher resolution would take >700 ms, which is impossible to fit in the diastolic quiescence of a single heartbeat. Training data was acquired on 14 subjects, and testing was performed on a different subject, using the approach described in the manuscript. The proposed self-supervised approach achieves high-quality reconstruction, outperforming CG-SENSE, which suffers from residual artifacts and high noise.

Supporting Information Figure S10. Reconstruction results for proposed self-supervised training at R = 4, supervised training at R = 4, R = 4 with ρ = 0.4, and R = 8. The amount of data used for self-supervised/supervised training at R = 4 (24 ACS lines) with ρ = 0.4 is 21120 k-space points, which is approximately equivalent to training the network with an equispaced undersampling pattern of R = 8 (24 ACS lines) with 21440 k-space points. The results show that supervised training at R = 4 with ρ = 0.4 is visibly similar to supervised and proposed self-supervised training at R = 4, and outperforms supervised training at R = 8. These results are highlighted in the difference images, which show supervised training at R = 8 suffering from residual artifacts, while other approaches show similar performance. Quantitative metrics on the test dataset align with these qualitative assessments. The median and interquartile range for SSIM across the test dataset were 0.961 [0.947, 0.972], 0.966 [0.956, 0.977], 0.966 [0.954, 0.976], 0.929 [0.908, 0.950], and NMSE were 0.002 [0.001, 0.002], 0.001 [0.001, 0.002], 0.002 [0.001, 0.002], 0.004 [0.003, 0.005] for proposed self-supervised at R = 4, supervised at R = 4, supervised at R = 4 with ρ = 0.4, and supervised at R = 8, respectively.

Supporting Information Figure S11. Reconstruction results for the coronal PD-weighted dataset at acceleration rates of 4, 6 and 8. For R = 4 and 6, the proposed self-supervised approach performs similarly with the supervised approach. However, at R = 8, the image quality degrades for both methods with more pronounced blurring, while the self-supervised approach further suffers from visible residual aliasing artifacts.

Supporting Information Figure S12. Reconstruction results for the proposed self-supervised approach when using the same or varying sets, Θ and Λ, across different training slices. The two approaches perform similarly, with the varying mask approach showing slight improvement. The median and interquartile ranges for SSIM across the test dataset were 0.959 [0.945, 0.970], 0.960 [0.947, 0.971], and for NMSE were 0.002 [0.001, 0.002], 0.002 [0.001, 0.002] for varying mask and same mask scenarios, respectively.

Supporting Information Figure S13. Reconstruction results for supervised training when using shared and distinct (non-shared) parameters across the unrolled network. The two approaches perform similarly both visually and quantitatively. The median and interquartile range of SSIM values across the test dataset were 0.967 [0.955, 0.978], 0.964 [0.953, 0.975], and of NMSE values were 0.001 [0.001, 0.002], 0.001 [0.001, 0.002] for shared and non-shared scenarios, respectively. Note that the same training database was used for the two approaches. The non-shared approach has 10 times as many trainable parameters, and its generalization performance may benefit from a larger training database. This was not studied as it is not the focus of our study.

Supporting Information Table S1. Imaging parameters for the knee datasets.

Supporting Information Table S2. Median and interquartile range (25th–75th percentile) of the quantitative evaluation of SSIM and NMSE values for different overlap scenarios between Λ and Θ when ρ = 0.4. Overlap %, defined as |Λ∩Θ|/|Λ|, refers to the amount of data in the loss mask Λ that was also included in the training mask Θ. Performance of the self-supervised training degrades as the amount of overlap increases.
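
The overlap measure defined in the Table S2 caption, |Λ∩Θ|/|Λ|, can be computed directly from two binary sampling masks. The following Python sketch is an illustration only and is not taken from the authors' code: the function name `overlap_fraction` and the toy one-dimensional masks are hypothetical, and the masks are assumed to be boolean NumPy arrays of equal shape.

```python
import numpy as np

def overlap_fraction(loss_mask, train_mask):
    """Return |Lambda ∩ Theta| / |Lambda| for two boolean masks of equal shape.

    loss_mask  : boolean array marking the k-space points in Lambda (loss mask)
    train_mask : boolean array marking the k-space points in Theta (training mask)
    """
    loss_mask = np.asarray(loss_mask, dtype=bool)
    train_mask = np.asarray(train_mask, dtype=bool)
    if loss_mask.shape != train_mask.shape:
        raise ValueError("masks must have the same shape")
    n_loss = int(loss_mask.sum())
    if n_loss == 0:
        raise ValueError("loss mask is empty")
    n_shared = int(np.logical_and(loss_mask, train_mask).sum())
    return n_shared / n_loss

# Hypothetical toy masks: 4 points in Lambda, 2 of which also lie in Theta.
lam = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
theta = np.array([0, 0, 1, 1, 1, 1, 1, 0], dtype=bool)
print(overlap_fraction(lam, theta))  # 2 shared points out of 4
```

A value of 0 corresponds to the disjoint partition used by the proposed method; multiplying the returned fraction by 100 gives the overlap percentage reported in the table.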

Acknowledgements

Knee MRI data were obtained from the NYU fastMRI initiative database (58). The NYU fastMRI database was acquired with the relevant institutional review board approvals as detailed in (58). NYU fastMRI investigators provided data but did not participate in the analysis or writing of this report. A listing of NYU fastMRI investigators, subject to updates, can be found at fastmri.med.nyu.edu.

Funding:

NIH, Grant numbers: U01EB025144, P41EB027061; NSF, Grant number: CAREER CCF-1651825

References

  • 1. Griswold MA, Jakob PM, Heidemann RM, Nittka M, Jellus V, Wang J, Kiefer B, Haase A. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med 2002;47(6):1202–1210.
  • 2. Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: sensitivity encoding for fast MRI. Magn Reson Med 1999;42(5):952–962.
  • 3. Lustig M, Pauly JM. SPIRiT: Iterative self-consistent parallel imaging reconstruction from arbitrary k-space. Magn Reson Med 2010;64(2):457–471.
  • 4. Lustig M, Donoho D, Pauly JM. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn Reson Med 2007;58(6):1182–1195.
  • 5. Haldar JP, Hernando D, Liang ZP. Compressed-sensing MRI with random encoding. IEEE Trans Med Imaging 2011;30(4):893–903.
  • 6. Trzasko J, Manduca A. Highly Undersampled Magnetic Resonance Image Reconstruction via Homotopic $\ell_{0}$-Minimization. IEEE Trans Med Imaging 2009;28:106–121.
  • 7. Ye JC, Tak S, Han Y, Park HW. Projection reconstruction MR imaging using FOCUSS. Magn Reson Med 2007;57(4):764–775.
  • 8. Akcakaya M, Nam S, Hu P, Moghari MH, Ngo LH, Tarokh V, Manning WJ, Nezafat R. Compressed sensing with wavelet domain dependencies for coronary MRI: a retrospective study. IEEE Trans Med Imaging 2011;30(5):1090–1099.
  • 9. Akcakaya M, Basha TA, Goddu B, Goepfert LA, Kissinger KV, Tarokh V, Manning WJ, Nezafat R. Low-dimensional-structure self-learning and thresholding: Regularization beyond compressed sensing for MRI Reconstruction. Magn Reson Med 2011;66(3):756–767.
  • 10. Block KT, Uecker M, Frahm J. Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint. Magn Reson Med 2007;57(6):1086–1098.
  • 11. Liang D, Liu B, Wang J, Ying L. Accelerating SENSE using compressed sensing. Magn Reson Med 2009;62(6):1574–1584.
  • 12. Otazo R, Kim D, Axel L, Sodickson DK. Combination of compressed sensing and parallel imaging for highly accelerated first-pass cardiac perfusion MRI. Magn Reson Med 2010;64(3):767–776.
  • 13. Robson PM, Grant AK, Madhuranthakam AJ, Lattanzi R, Sodickson DK, McKenzie CA. Comprehensive quantification of signal-to-noise ratio and g-factor for image-based and k-space-based parallel imaging reconstructions. Magn Reson Med 2008;60(4):895–907.
  • 14. Chang Y, Liang D, Ying L. Nonlinear GRAPPA: a kernel approach to parallel MRI reconstruction. Magn Reson Med 2012;68(3):730–740.
  • 15. Madore B. UNFOLD-SENSE: a parallel MRI method with self-calibration and artifact suppression. Magn Reson Med 2004;52(2):310–320.
  • 16. Sung K, Hargreaves BA. High-frequency subband compressed sensing MRI using quadruplet sampling. Magn Reson Med 2013;70(5):1306–1318.
  • 17. Yang Y, Sun J, Li H, Xu Z. ADMM-CSNet: A Deep Learning Approach for Image Compressive Sensing. IEEE Trans Pattern Anal Mach Intell 2018.
  • 18. Shahdloo M, Ilicak E, Tofighi M, Saritas EU, Cetin AE, Cukur T. Projection onto Epigraph Sets for Rapid Self-Tuning Compressed Sensing MRI. IEEE Trans Med Imaging 2019;38(7):1677–1689.
  • 19. Ramani S, Liu Z, Rosen J, Nielsen JF, Fessler JA. Regularization parameter selection for nonlinear iterative image restoration and MRI reconstruction using GCV and SURE-based methods. IEEE Trans Image Process 2012;21(8):3659–3672.
  • 20. Liang D, Cheng J, Ke Z, Ying L. Deep MRI Reconstruction: Unrolled Optimization Algorithms Meet Neural Networks. arXiv preprint arXiv:1907.11711; 2019.
  • 21. Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, Feng D, Liang D. Accelerating magnetic resonance imaging via deep learning. IEEE 13th International Symposium on Biomedical Imaging (ISBI); 2016. p 514–517.
  • 22. Lee D, Yoo J, Tak S, Ye JC. Deep Residual Learning for Accelerated MRI Using Magnitude and Phase Networks. IEEE Trans Biomed Eng 2018;65(9):1985–1995.
  • 23. Akcakaya M, Moeller S, Weingartner S, Ugurbil K. Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging. Magn Reson Med 2019;81(1):439–453.
  • 24. Zhu B, Liu JZ, Cauley SF, Rosen BR, Rosen MS. Image reconstruction by domain-transform manifold learning. Nature 2018;555(7697):487–492.
  • 25. Mardani M, Gong E, Cheng JY, Vasanawala SS, Zaharchuk G, Xing L, Pauly JM. Deep Generative Adversarial Neural Networks for Compressive Sensing MRI. IEEE Trans Med Imaging 2019;38(1):167–179.
  • 26. Han Y, Sunwoo L, Ye JC. k-Space Deep Learning for Accelerated MRI. IEEE Trans Med Imaging 2019.
  • 27. Hammernik K, Klatzer T, Kobler E, Recht MP, Sodickson DK, Pock T, Knoll F. Learning a variational network for reconstruction of accelerated MRI data. Magn Reson Med 2018;79(6):3055–3071.
  • 28. Aggarwal HK, Mani MP, Jacob M. MoDL: Model-Based Deep Learning Architecture for Inverse Problems. IEEE Trans Med Imaging 2019;38(2):394–405.
  • 29. Zhang J, Ghanem B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p 1828–1837.
  • 30. Yang Y, Sun J, Li H, Xu Z. Deep ADMM-Net for compressive sensing MRI. Advances in Neural Information Processing Systems; 2016. p 10–18.
  • 31. Schlemper J, Caballero J, Hajnal JV, Price AN, Rueckert D. A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction. IEEE Trans Med Imaging 2018;37(2):491–503.
  • 32. Qin C, Schlemper J, Caballero J, Price AN, Hajnal JV, Rueckert D. Convolutional Recurrent Neural Networks for Dynamic MR Image Reconstruction. IEEE Trans Med Imaging 2019;38(1):280–290.
  • 33. Mardani M, Sun Q, Donoho D, Papyan V, Monajemi H, Vasanawala S, Pauly J. Neural proximal gradient descent for compressive imaging. Advances in Neural Information Processing Systems; 2018. p 9573–9583.
  • 34. Gregor K, LeCun Y. Learning fast approximations of sparse coding. International Conference on Machine Learning; 2010. p 399–406.
  • 35. Hosseini SAH, Yaman B, Moeller S, Hong M, Akcakaya M. Dense Recurrent Neural Networks for Inverse Problems: History-Cognizant Unrolling of Optimization Algorithms. arXiv:1912.07197; 2019.
  • 36. Cheng JY, Pauly JM, Vasanawala SS. Multi-channel Image Reconstruction with Latent Coils and Adversarial Loss. Proceedings of the 27th Annual Meeting of ISMRM; 2019.
  • 37. Wang P, Chen EZ, Chen T, Patel VM, Sun S. Pyramid Convolutional RNN for MRI Reconstruction. Advances in Neural Information Processing Systems Workshops; 2019.
  • 38. Haji-Valizadeh H, Rahsepar AA, Collins JD, Bassett E, Isakova T, Block T, Adluru G, DiBella EVR, Lee DC, Carr JC, Kim D, Group COMwBaNCS. Validation of highly accelerated real-time cardiac cine MRI with radial k-space sampling and compressed sensing in patients at 1.5T and 3T. Magn Reson Med 2018;79(5):2745–2751.
  • 39. Coelho-Filho OR, Rickers C, Kwong RY, Jerosch-Herold M. MR myocardial perfusion imaging. Radiology 2013;266(3):701–715.
  • 40. Kellman P, Hansen MS. T1-mapping in the heart: accuracy and precision. J Cardiovasc Magn Reson 2014;16:2.
  • 41. Uğurbil K, Xu J, Auerbach EJ, Moeller S, Vu AT, Duarte-Carvajalino JM, Lenglet C, Wu X, Schmitter S, Van de Moortele PF, Strupp J, Sapiro G, De Martino F, Wang D, Harel N, Garwood M, Chen L, Feinberg DA, Smith SM, Miller KL, Sotiropoulos SN, Jbabdi S, Andersson JL, Behrens TE, Glasser MF, Van Essen DC, Yacoub E, Consortium W-MH. Pushing spatial and temporal resolution for functional and diffusion MRI in the Human Connectome Project. Neuroimage 2013;80:80–104.
  • 42. Setsompop K, Kimmlingen R, Eberlein E, Witzel T, Cohen-Adad J, McNab JA, Keil B, Tisdall MD, Hoecht P, Dietz P, Cauley SF, Tountcheva V, Matschl V, Lenz VH, Heberlein K, Potthast A, Thein H, Van Horn J, Toga A, Schmitt F, Lehne D, Rosen BR, Wedeen V, Wald LL. Pushing the limits of in vivo diffusion MRI for the Human Connectome Project. Neuroimage 2013;80:220–233.
  • 43. Jung H, Sung K, Nayak KS, Kim EY, Ye JC. k-t FOCUSS: a general compressed sensing framework for high resolution dynamic MRI. Magn Reson Med 2009;61(1):103–116.
  • 44. Gamper U, Boesiger P, Kozerke S. Compressed sensing in dynamic MRI. Magn Reson Med 2008;59(2):365–373.
  • 45. Knoll F, Bredies K, Pock T, Stollberger R. Second order total generalized variation (TGV) for MRI. Magn Reson Med 2011;65(2):480–491.
  • 46. Hu Y, Jacob M. Higher Degree Total Variation (HDTV) Regularization for Image Recovery. IEEE Trans Image Process 2012;21:2559–2571.
  • 47. Doneva M, Bornert P, Eggers H, Stehning C, Senegas J, Mertins A. Compressed sensing reconstruction for magnetic resonance parameter mapping. Magn Reson Med 2010;64(4):1114–1120.
  • 48. Ravishankar S, Bresler Y. MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Trans Med Imaging 2011;30(5):1028–1041.
  • 49. Fessler JA. Optimization methods for MR image reconstruction. arXiv:1903.03510; 2019.
  • 50. Afonso MV, Bioucas-Dias JM, Figueiredo MA. Fast image recovery using variable splitting and constrained optimization. IEEE Trans Image Process 2010;19(9):2345–2356.
  • 51. Hammernik K, Knoll F, Sodickson DK, Pock T. L2 or not L2: impact of loss function design for deep learning MRI reconstruction. ISMRM 25th Annual Meeting; 2017. p 687.
  • 52. Quan TM, Nguyen-Duc T, Jeong WK. Compressed Sensing MRI Reconstruction Using a Generative Adversarial Network With a Cyclic Loss. IEEE Trans Med Imaging 2018;37(6):1488–1497.
  • 53. Knoll F, Hammernik K, Zhang C, Moeller S, Pock T, Sodickson DK, Akcakaya M. Deep-Learning Methods for Parallel Magnetic Resonance Imaging Reconstruction: A Survey of the Current Approaches, Trends, and Issues. IEEE Signal Processing Magazine; 2020. p 128–140.
  • 54. Arlot S, Lerasle M. Choice of V for V-fold cross-validation in least-squares density estimation. J Mach Learn Res 2016;17:7256–7305.
  • 55. Timofte R, Agustsson E, Van Gool L, Yang M-H, Zhang L. Ntire 2017 challenge on single image super-resolution: Methods and results. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2017. p 114–125.
  • 56. Uecker M, Lai P, Murphy MJ, Virtue P, Elad M, Pauly JM, Vasanawala SS, Lustig M. ESPIRiT--an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA. Magn Reson Med 2014;71(3):990–1001.
  • 57. Yaman B, Hosseini SAH, Moeller S, Ellermann J, Ugurbil K, Akcakaya M. Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data. Proceedings of IEEE 17th International Symposium on Biomedical Imaging (ISBI); 2020.
  • 58. Zbontar J, Knoll F, Sriram A, Muckley MJ, Bruno M, Defazio A, Parente M, Geras KJ, Katsnelson J, Chandarana H, others. FastMRI: An open dataset and benchmarks for accelerated MRI. arXiv preprint arXiv:1811.08839; 2018.
  • 59. Knoll F, Hammernik K, Kobler E, Pock T, Recht MP, Sodickson DK. Assessment of the generalization of learned image reconstruction and the potential for transfer learning. Magn Reson Med 2019;81(1):116–128.
  • 60. Pruessmann KP, Weiger M, Bornert P, Boesiger P. Advances in sensitivity encoding with arbitrary k-space trajectories. Magn Reson Med 2001;46(4):638–651.
  • 61. Breuer FA, Blaimer M, Mueller MF, Seiberlich N, Heidemann RM, Griswold MA, Jakob PM. Controlled aliasing in volumetric parallel imaging (2D CAIPIRINHA). Magn Reson Med 2006;55(3):549–556.
  • 62. Senouf O, Vedula S, Weiss T, Bronstein A, Michailovich O, Zibulevsky M. Self-supervised learning of inverse problem solvers in medical imaging. arXiv:1905.09325; 2019.
  • 63. Huang P, Zhang C, Li H, Gaire SK, Liu R, Zhang X, Li X, Ying L. Deep MRI Reconstruction without Ground Truth for Training. Proceedings of the 27th Annual Meeting of ISMRM; 2019.
  • 64. Liu J, Sun Y, Eldeniz C, Gan W, An H, Kamilov US. RARE: Image Reconstruction using Deep Priors Learned without Ground Truth. arXiv:1912.05854; 2019.
  • 65. Lehtinen J, Munkberg J, Hasselgren J, Laine S, Karras T, Aittala M, Aila T. Noise2noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189; 2018.
  • 66. Sim B, Oh G, Lim S, Ye JC. Optimal Transport, CycleGAN, and Penalized LS for Unsupervised Learning in Inverse Problems. arXiv:1909.12116; 2019.
  • 67. Chen F, Taviani V, Malkiel I, Cheng JY, Tamir JI, Shaikh J, Chang ST, Hardy CJ, Pauly JM, Vasanawala SS. Variable-Density Single-Shot Fast Spin-Echo MRI with Deep Learning Reconstruction by Using Variational Networks. Radiology 2018;289(2):366–373.
  • 68. Lei K, Mardani M, Pauly JM, Vasanawala SS. Wasserstein GANs for MR Imaging: from Paired to Unpaired Training. arXiv preprint arXiv:1910.07048; 2019.
  • 69. Tamir JI, Yu SX, Lustig M. Unsupervised Deep Basis Pursuit: Learning Reconstruction without Ground-Truth Data. Proceedings of the 27th Annual Meeting of ISMRM; 2019.
  • 70. Tamir JI, Yu SX, Lustig M. Unsupervised Deep Basis Pursuit: Learning inverse problems without ground-truth data. Advances in Neural Information Processing Systems Workshops; 2019.
  • 71. Hammernik K, Knoll F, Sodickson D, Pock T. On the influence of sampling pattern design on deep learning-based MRI reconstruction. Proceedings of the 25th Annual Meeting of ISMRM; 2017. p 644.
  • 72. Browne MW. Cross-validation methods. J Math Psychol 2000;44:108–132.
  • 73. Kellman M, Zhang K, Tamir J, Bostan E, Lustig M, Waller L. Memory-efficient Learning for Large-scale Computational Imaging. arXiv:2003.05551; 2020.
  • 74. Dar SUH, Özbey M, Çatlı AB, Çukur T. A Transfer-Learning Approach for Accelerated MRI Using Deep Neural Networks. Magn Reson Med 2020.
  • 75. Han Y, Yoo J, Kim HH, Shin HJ, Sung K, Ye JC. Deep learning with domain adaptation for accelerated projection-reconstruction MR. Magn Reson Med 2018;80(3):1189–1205.
  • 76. Hu Y, Shi X, Tian Q, Guo H, Deng M, Yu M, Moran C, McNab JA, Daniel B, Hargreaves B. Reconstruction of multi-shot diffusion-weighted MRI using unrolled network with U-nets as priors. ISMRM 27th Annual Meeting; 2019.
  • 77. Yaman B, Hosseini SAH, Moeller S, Akcakaya M. Comparison of Neural Network Architectures for Physics-Driven Deep Learning MRI Reconstruction. IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON); 2019. p 0155–0159.
  • 78. Sanchez I, Vilaplana V. Brain MRI super-resolution using 3D generative adversarial networks. International Conference on Medical Imaging with Deep Learning; 2018.
  • 79. Lei L, Mardani M. Semi-supervised Super-resolution GANs for MRI Reconstruction. Neural Information Processing Systems; 2017.
  • 80. Yang G, Yu S, Dong H, Slabaugh G, Dragotti PL, Ye X, Liu F, Arridge S, Keegan J, Guo Y, Firmin D. DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction. IEEE Trans Med Imaging 2018;37(6):1310–1321.
  • 81. Dar SUH, Yurt M, Shahdloo M, Ildiz ME, Cukur T. Synergistic reconstruction and synthesis via generative adversarial networks for accelerated multi-contrast MRI. arXiv:1805.10704; 2018.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Supporting Information Figure S1. Reconstruction results for the generalization performance of supervised training across different image matrix sizes. The networks are trained using the actual k-space, the central ½ of the k-space (i.e. reducing the resolution by 2-fold), and the central ¼ of the k-space (i.e. reducing the resolution by 4-fold). All trained networks are then applied to actual size data to test generalization. The generalization performance of CNNs on the actual image size degrades as the training image size gets smaller, with ¼ k-space performing the worst.

Supporting Information Figure S2. Reconstruction results for supervised training with image domain (Equation [7]) and k-space (Equation [8]) losses. When using image domain loss, the reconstruction suffers from residual artifacts (red arrows), whereas using k-space loss suppresses these artifacts. Difference images also show that the supervised training with k-space loss has fewer residual artifacts. Across the dataset, the two approaches perform quantitatively similarly. The median and interquartile range for SSIM values across the test dataset were 0.967 [0.955, 0.978], 0.966 [0.956, 0.977], and for NMSE values were 0.001 [0.001, 0.002], 0.001 [0.001, 0.002] for supervised with image domain and k-space losses, respectively.

Supporting Information Figure S3. Sub-sampling masks used in the brain MRI study. Prospective subsampling was equispaced with R = 2 in ky and 32 ACS lines. Subsampling patterns for R = 4, 6, 8 were obtained by sheared sub-sampling, while keeping the center 32×32 ACS region in the ky-kz plane.

Supporting Information Figure S4. Reconstruction results from self-supervised training with uniform random selection and variable-density Gaussian selection of Λ for ρ ∈ {0.1, 0.2, 0.4}. Gaussian random selection consistently outperforms uniform random selection at all ρ values in terms of reconstruction quality and suppression of residual artifacts, which is also highlighted in the difference images. For ρ ∈ {0.1, 0.2}, both uniform and Gaussian random selection show visible residual artifacts, marked by red arrows, with the former showing more residual artifacts. For ρ = 0.4, uniform random selection still suffers from visible residual artifacts, whereas Gaussian selection further suppresses those artifacts and achieves artifact-free reconstruction. Difference images further confirm these observations.
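
The two selection schemes compared in Figure S4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ρ is taken to be the fraction |Λ|/|Ω| of acquired points assigned to the loss mask, the Gaussian weighting is centered on the middle of the k-space matrix, and the function name `split_mask` and the parameter `sigma_frac` are hypothetical. Any special handling of the central k-space region is not modeled.

```python
import numpy as np

def split_mask(omega, rho, mode="gaussian", sigma_frac=0.25, rng=None):
    """Split acquired locations omega into disjoint masks (theta, lam).

    omega      : 2D boolean array of acquired k-space locations
    rho        : assumed to be |lam| / |omega|
    mode       : "uniform" or "gaussian" (variable-density) selection of lam
    sigma_frac : hypothetical std of the Gaussian, as a fraction of matrix size
    """
    rng = np.random.default_rng(rng)
    omega = np.asarray(omega, dtype=bool)
    idx = np.flatnonzero(omega)              # flat indices of acquired points
    n_loss = int(round(rho * idx.size))      # number of points moved to lam

    if mode == "uniform":
        p = None                             # every acquired point equally likely
    elif mode == "gaussian":
        ky, kx = np.unravel_index(idx, omega.shape)
        cy = (omega.shape[0] - 1) / 2.0
        cx = (omega.shape[1] - 1) / 2.0
        sy = sigma_frac * omega.shape[0]
        sx = sigma_frac * omega.shape[1]
        w = np.exp(-0.5 * (((ky - cy) / sy) ** 2 + ((kx - cx) / sx) ** 2))
        p = w / w.sum()                      # denser selection near the center
    else:
        raise ValueError("mode must be 'uniform' or 'gaussian'")

    chosen = rng.choice(idx, size=n_loss, replace=False, p=p)
    lam = np.zeros(omega.shape, dtype=bool)
    lam[np.unravel_index(chosen, omega.shape)] = True
    theta = omega & ~lam                     # remaining acquired points
    return theta, lam

# Hypothetical example: every 4th ky line acquired on a 64 x 64 grid.
omega = np.zeros((64, 64), dtype=bool)
omega[::4, :] = True
theta, lam = split_mask(omega, rho=0.4, mode="gaussian", rng=0)
```

By construction the two returned masks are disjoint and their union equals the acquired set, which matches the partitioning described for the proposed training.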

Supporting Information Figure S5. a) Training loss for supervised and self-supervised training approaches. In both cases, the loss decreases over epochs. The self-supervised approach achieves a lower loss value, as the loss is only measured on Λ, whereas the supervised loss is measured on the fully-sampled k-space. b) For both supervised and self-supervised training, the outputs of the networks are evaluated using the fully-sampled k-space loss, as defined in Equation [8], at every 10th epoch. Using the same metric, the two approaches show similar trends over epochs, with the supervised training achieving a slightly lower loss than the self-supervised approach.

Supporting Information Figure S6. Representative reconstructed test slices from fastMRI sagittal PD, sagittal T2 and axial T2 knee sequences for retrospective equispaced undersampling at R = 4. In all three sequences, CG-SENSE and TGV suffer from visible residual artifacts, marked by red arrows. Both the proposed self-supervised and the fully-supervised DL-MRI approaches successfully remove these residual artifacts, while showing similar quantitative and qualitative performance. Note that the former, unlike the latter, does not require any fully-sampled data for training.

Supporting Information Figure S7. Reconstruction results for CG-SENSE and the proposed self-supervised approach for brain MRI. CG-SENSE suffers from significant noise amplification at high acceleration rates. The proposed self-supervised approach achieves high-quality reconstruction at high acceleration rates, and achieves lower noise amplification at rate 8 than CG-SENSE at the acquisition acceleration rate of 2.

Supporting Information Figure S8. Average reader scores for all knee sequences for proposed self-supervised training, supervised training with image domain loss and CG-SENSE. Both supervised and self-supervised DL-MRI approaches get comparable scores to the reference image in terms of SNR, blurring, aliasing artifacts and overall image quality. There was no statistically significant difference between the reference and DL-MRI approaches in terms of SNR and blurring in the knee sequences in general, except for blurring between the reference and DL-MRI approaches in coronal PD-FS. In terms of aliasing artifacts and overall image quality, there was no statistically significant difference between the reference and the two DL-MRI approaches for coronal PD, coronal PD-FS and sagittal PD sequences. However, for the sagittal T2 sequence, supervised DL-MRI was ranked statistically worse than the reference, while for axial T2, it was ranked lower than both the reference and self-supervised DL-MRI. Thus, in general, both DL-MRI approaches performed well, but the self-supervised approach was slightly more favored by the reader, who was blinded to the reconstruction method. CG-SENSE was significantly outperformed by both DL-MRI approaches, showing statistically significant differences to the reference and both DL-MRI approaches for all knee sequences, except in blurring for coronal PD and PD-FS sequences. Finally, we also note that the supervised training with k-space loss (Figure 10) outperforms supervised training with image domain loss in terms of reader scores for axial T2, coronal PD-FS and sagittal T2 sequences.
