Bioengineering. 2022 Nov 4;9(11):650. doi: 10.3390/bioengineering9110650

SelfCoLearn: Self-Supervised Collaborative Learning for Accelerating Dynamic MR Imaging

Juan Zou 1,2, Cheng Li 2, Sen Jia 2, Ruoyou Wu 2, Tingrui Pei 1,3,*, Hairong Zheng 2, Shanshan Wang 2,4,*
Editors: Or Perlman, Efrat Shimron, Liang Luo
PMCID: PMC9687509  PMID: 36354561

Abstract

Deep learning has recently been investigated extensively for accelerating dynamic magnetic resonance (MR) imaging, with encouraging progress. However, without fully sampled reference data for training, current approaches may have limited ability to recover fine details and structures. To address this challenge, this paper proposes a self-supervised collaborative learning framework (SelfCoLearn) for accurate dynamic MR image reconstruction directly from undersampled k-space data. SelfCoLearn has three key components: dual-network collaborative learning, reundersampling data augmentation, and a specially designed co-training loss. The framework is flexible and can be integrated into various model-based iterative un-rolled networks. The proposed method was evaluated on an in vivo dataset and compared to four state-of-the-art methods. The results show that the proposed method has a strong capability to capture essential and inherent representations for direct reconstruction from undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.

Keywords: dynamic MR imaging, self-supervised learning, collaborative learning, reundersampling data augmentation, co-training loss

1. Introduction

Deep learning-based dynamic magnetic resonance (MR) imaging has attracted substantial attention in recent years. It learns prior knowledge from large datasets via network training and then uses the trained network to reconstruct a dynamic image from undersampled k-space data. Compared to classical compressed sensing methods [1,2,3,4,5,6,7], deep learning-based methods have achieved encouraging performance.

Depending on whether they rely on fully sampled data, existing methods for dynamic MR imaging can be roughly classified into two types [8,9,10]: fully-supervised methods and unsupervised methods. Fully-supervised methods train neural networks on data pairs consisting of corrupted/undersampled data and the corresponding ground truth/fully sampled data [11,12,13,14,15,16,17,18]. In this category, different network structures and prior knowledge have been explored [19,20,21,22,23,24,25,26]. For example, Schlemper et al. proposed a cascade network architecture composed of an intermediate de-aliasing convolutional neural network (CNN) module and a data consistency layer [22]. Chen et al. applied a bidirectional convolutional recurrent neural network (CRNN) with interleaved data consistency to accelerate MR imaging [23]. Chen et al. designed a parallel framework, including a time-frequency domain CRNN and an image domain CRNN, to simultaneously exploit spatiotemporal correlations [24]. Wang et al. applied both k-space and spatial prior knowledge to accelerate MR imaging [25]. Ke et al. exploited low-rank priors (SLR-Net) [26]. These methods have made great progress in accelerating dynamic MRI reconstruction. However, a major challenge is that, in many practical imaging scenarios, obtaining high-quality fully sampled dynamic MR data is infeasible due to factors such as the physiological motion of patients and imaging speed restrictions. Therefore, the requirement for fully sampled reference data for network training limits the wide application of supervised learning methods.

To address this problem, researchers have developed unsupervised learning methods to train models without fully sampled reference data [27,28,29,30]. For example, Jin et al. extended the framework of deep image prior [31] to dynamic non-Cartesian MRI [29]. Recently, Yaman et al. proposed a classical self-supervised learning strategy (SSDU) for static MR imaging [32], which divides the acquired undersampled data into two parts, of which one is treated as input data, and another is used as the supervisory signals [33]. Subsequently, Acar et al. applied SSDU to reconstruct dynamic MR images [30]. The above-mentioned works have made great contributions to unsupervised dynamic MR image reconstruction. Nevertheless, since the undersampled data have incomplete inherent representation compared to the fully sampled data, these works still have room to improve in recovering fine details or structures.

To boost the performance of accelerated dynamic MR imaging without fully sampled reference data, this paper proposes a self-supervised collaborative learning framework named SelfCoLearn. SelfCoLearn is based on the assumption that the latent representation of network predictions is consistent under different reundersampling data augmentations of the same data. SelfCoLearn trains a dual-network collaboratively using reundersampling data augmentation to exploit more prior knowledge than a single network can. Specifically, reundersampling data augmentation operations are applied to the undersampled k-space data to obtain two reundersampled inputs for the dual-network. The two networks are then trained collaboratively with a specially designed co-training loss in an end-to-end manner. With this collaborative training strategy, the proposed framework can capture essential and inherent representations from the undersampled k-space data in a self-supervised manner. Moreover, the proposed framework is flexible and can be integrated with various model-based iterative un-rolled networks [34] for dynamic MR imaging. In summary, the main contributions of this work are as follows:

  1. We present a self-supervised collaborative learning framework with reundersampling data augmentation for accelerating dynamic MR imaging. The proposed framework is flexible and can be integrated with various model-based iterative un-rolled networks;

  2. A co-training loss, including both an undersampled consistency loss term and a contrastive consistency loss term, is designed to guide the end-to-end framework to capture essential and inherent representations from undersampled k-space data;

  3. Extensive experiments are conducted to evaluate the effectiveness of the proposed SelfCoLearn with different model-based iterative un-rolled networks, with more promising results obtained compared to self-supervised methods.

The remainder of this paper is organized as follows: Section 2 states the dynamic MR imaging problem and the proposed SelfCoLearn with different backbone networks. Section 3 summarizes the comparison experiments and results to demonstrate the effectiveness of SelfCoLearn. Section 4 presents discussions about the impact of different backbone networks and loss functions. Section 5 concludes the work.

2. Methodology

2.1. Dynamic MR Imaging Formulation

Dynamic MR imaging aims to estimate a dynamic MR image sequence $x \in \mathbb{C}^{N}$ from undersampled measurements $y \in \mathbb{C}^{M}$ ($M \ll N$) in k-space. Here $x$ is a vector of length $N = N_h N_w T$, where $N_h$ and $N_w$ are the height and width of each frame, respectively, and $T$ is the number of frames in each sequence. The imaging model is described as follows:

$$y = Ax + e \tag{1}$$

where $e \in \mathbb{C}^{M}$ is noise and $A = PF$ is an undersampled Fourier encoding operator, in which $F$ applies the 2D Fourier transform to each frame in the image sequence and $P$ is the undersampling mask for each frame. In general, the reconstruction problem is formulated as the following unconstrained optimization problem:

$$x^{*} = \arg\min_{x} \frac{1}{2}\|Ax - y\|_{2}^{2} + \lambda R(x) \tag{2}$$

where $R(x)$ is a prior regularization term on $x$, and $\lambda$ is the weight of the regularization. $\frac{1}{2}\|Ax - y\|_{2}^{2}$ is the data fidelity term, which ensures that the reconstruction result is consistent with the raw undersampled measurements.
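As a minimal sketch of the imaging model in Equation (1), the snippet below applies $A = PF$ to a synthetic sequence with NumPy. The array shapes, the random test data, and the every-other-line mask are illustrative assumptions and are not taken from the paper; noise is omitted ($e = 0$).

```python
import numpy as np

def forward_operator(x, mask):
    """Apply A = PF: orthonormal 2D FFT of every frame, then the sampling mask.

    x    : complex array of shape (T, Nh, Nw), the image sequence
    mask : 0/1 array of the same shape, the undersampling mask P
    """
    k = np.fft.fft2(x, axes=(-2, -1), norm="ortho")
    return mask * k

def zero_filled(y):
    """Inverse 2D FFT of zero-filled k-space (a common network input)."""
    return np.fft.ifft2(y, axes=(-2, -1), norm="ortho")

rng = np.random.default_rng(0)
T, Nh, Nw = 4, 8, 8  # hypothetical sizes, for illustration only
x = rng.standard_normal((T, Nh, Nw)) + 1j * rng.standard_normal((T, Nh, Nw))

# Hypothetical Cartesian mask: keep every other phase-encoding line (row).
mask = np.zeros((T, Nh, Nw))
mask[:, ::2, :] = 1

y = forward_operator(x, mask)  # noise-free measurements (e = 0)
x_zf = zero_filled(y)          # aliased zero-filled reconstruction
```

Because the FFT is orthonormal and the mask is binary, re-applying the operator to the zero-filled image returns the same measurements, which is the property the data fidelity term in Equation (2) measures.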

Fully-supervised deep learning methods typically use a CNN $f_{CNN}(y \mid \theta)$ with parameters $\theta$ in place of the regularization term $R(x)$, learning the mapping between corrupted/undersampled data and their corresponding fully sampled data. Mathematically, this can be written as:

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{S} L\left(f_{CNN}(y_{i} \mid \theta),\, x_{i}^{ref}\right) \tag{3}$$

where $i$ is the index of the subject in the training dataset, and $S$ is the total number of subjects. $x_{i}^{ref}$ is the ground truth (fully sampled reference data) of subject $i$. $L(\cdot)$ denotes the loss function between the predicted output and the ground truth, which typically adopts the $\ell_1$-norm or $\ell_2$-norm.

2.2. The Overall Framework

This work proposes a simple but effective self-supervised training framework for dynamic MR imaging, whose paradigm is shown in Figure 1. The proposed framework simultaneously trains two independent reconstruction networks, which have different inputs and different weight parameters. Various iterative un-rolled networks can serve as the backbone, such as CRNN [23], k-t NEXT [21], and SLR-Net [26]. By enforcing consistency between the two networks' predictions, each network provides complementary information to its peer about the dynamic MR images to be reconstructed. Over the course of training, the two networks converge to consistent reconstructions. Specifically, given a raw undersampled k-space data sequence $\Omega = \{y_{\Omega}^{t}\}_{t=1}^{T}$, the original k-space data $y_{\Omega}^{t}$ are reundersampled to construct a sequence of partial data points $\{y_{u}^{t}\}_{t=1}^{T}$ as follows:

$$y_{u}^{t} = P_{u}^{t}\, y_{\Omega}^{t}, \quad t = 1, \ldots, T, \quad u = \Theta, \Lambda \tag{4}$$

where $t$ is the frame index, $u$ denotes the index of the two training sequences, and $P_{u}^{t}$ is the undersampling mask for frame $t$. To make full use of all data points in $y_{\Omega}^{t}$ for representation learning, and to ensure that each network can provide complementary information to its peer network, the training sequences are generated according to the following data augmentation principles: (1) The union of the data points in the two training sequences must equal the data $y_{\Omega}^{t}$, i.e., $y_{\Omega}^{t} = y_{\Theta}^{t} \cup y_{\Lambda}^{t}$. (2) The data points in the two training sequences should be different, i.e., $y_{\Theta}^{t} \neq y_{\Lambda}^{t}$. (3) The training sequences should include most of the low frequency signals and part of the high frequency signals. Low frequency signals correspond to data points at or close to the k-space center, and high frequency signals to the outer parts of k-space. Following these principles, the two training sequences contain different points in the high frequency region and similar data points in the low frequency region. It should be noted that data reundersampling is necessary only during training, whereas the reconstructed images can be inferred from the test data directly.

Figure 1.


An overview of the proposed self-supervised collaborative training framework. A raw undersampled k-space data sequence $y_{\Omega}^{t}$ is retrospectively undersampled from the fully sampled data using an undersampling mask $P^{t}$, and then two k-space data sequences $y_{\Theta}^{t}$ and $y_{\Lambda}^{t}$ are augmented from $y_{\Omega}^{t}$. In the considered scenario, $y_{\Theta}^{t}$ and $y_{\Lambda}^{t}$ are reundersampled from $y_{\Omega}^{t}$ using the reundersampling masks $P_{\Theta}^{t}$ and $P_{\Lambda}^{t}$, respectively. Next, the two networks receive the zero-filled image sequences of $y_{\Theta}^{t}$ and $y_{\Lambda}^{t}$ as inputs. The predicted image sequences of the networks are transformed to the k-space data $f_{\Theta}(y_{\Theta}^{t})$ and $f_{\Lambda}(y_{\Lambda}^{t})$ by the two-dimensional Fourier transform. Afterward, a co-training loss is calculated using $y_{\Omega}^{t}$, $f_{\Theta}(y_{\Theta}^{t})$ and $f_{\Lambda}(y_{\Lambda}^{t})$. The backbone reconstruction network can flexibly adopt different iterative un-rolled networks, such as CRNN, k-t NEXT and SLR-Net. Collaborative network-1 and collaborative network-2 have the same network structure but different weight parameters $\theta_{\Theta}$ and $\theta_{\Lambda}$, respectively.

2.3. Network Architectures

2.3.1. Model-Driven Deep Learning with Image-Domain Regularization

In these settings, the common practice is to decouple Equation (2) into a regularization term and a data fidelity term via utilizing the variable splitting technique [22,23]. By introducing an auxiliary variable z=x, Equation (2) can be re-formulated as a penalty function [23], which can be expressed as follows:

$$\arg\min_{x,z}\ \lambda R(z) + \frac{1}{2}\|Ax - y\|_{2}^{2} + \mu\|x - z\|_{2}^{2} \tag{5}$$

where μ denotes a penalty parameter. Equation (5) can then be solved iteratively via alternating minimization over z and x:

$$z^{n} = \arg\min_{z}\ \lambda R(z) + \mu\|x^{n-1} - z\|_{2}^{2} \tag{6}$$
$$x^{n} = \arg\min_{x}\ \frac{1}{2}\|Ax - y\|_{2}^{2} + \mu\|x - z^{n}\|_{2}^{2} \tag{7}$$

where $n \in \{1, 2, \ldots, N\}$ is the iteration index, $x^{0}$ is the zero-filled image transformed from the original undersampled measurement, $z^{n}$ denotes the intermediate reconstruction sequence, and $x^{n}$ denotes the final reconstruction sequence at each iteration. In Equation (7), the operation on the intermediate reconstruction sequence $z^{n}$ is a data consistency step [22]. The iterative optimization process in Equations (6) and (7) is unrolled into a neural network.
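To make the data consistency step concrete, the subproblem in Equation (7) can be worked out in closed form. The derivation below is a sketch under two assumptions that the text does not state explicitly at this point: the Fourier transform $F$ is unitary ($F^{H}F = I$), and $P$ is a binary mask selecting a set of sampled k-space locations. The symbols $\hat{x} = Fx$, $\hat{z}^{n} = Fz^{n}$, the location index $k$ and the sampled set $\mathcal{S}$ are notation introduced here for illustration only.

```latex
% By Parseval's theorem the objective of Eq. (7) separates over k-space locations k:
%   sum_{k in S} (1/2)|\hat{x}(k) - y(k)|^2  +  mu * sum_{k} |\hat{x}(k) - \hat{z}^n(k)|^2 .
% For k in S, setting the derivative to zero gives
%   (\hat{x}(k) - y(k)) + 2 mu (\hat{x}(k) - \hat{z}^n(k)) = 0 ,
% and for k not in S only the second term remains. Hence
\hat{x}^{n}(k) =
\begin{cases}
  \hat{z}^{n}(k), & k \notin \mathcal{S}, \\[4pt]
  \dfrac{y(k) + 2\mu\, \hat{z}^{n}(k)}{1 + 2\mu}, & k \in \mathcal{S},
\end{cases}
\qquad x^{n} = F^{H}\hat{x}^{n}.
```

In the limit $\mu \to 0$ the sampled locations are simply overwritten by the measured values $y(k)$, which corresponds to a data consistency layer that replaces predicted k-space points with the acquired ones.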

The CRNN [23] is a typical model-driven deep learning method with image-domain regularization for dynamic MR imaging [35]. A single iteration of the CRNN can be expressed as follows:

$$x_{rnn}^{(n)} = x_{rec}^{(n-1)} + \mathrm{CRNN}\left(x_{rec}^{(n-1)}\right) \tag{8}$$
$$x_{rec}^{(n)} = \mathrm{DC}\left(x_{rnn}^{(n)};\, y, \lambda\right) \tag{9}$$

where $x_{rnn}^{(n)}$ is the intermediate reconstruction sequence analogous to $z^{n}$ in Equation (6), and $x_{rec}^{(n)}$ denotes the final predicted result at each iteration analogous to $x^{n}$ in Equation (7). The regularization subproblem in Equation (6) is solved by using a convolutional recurrent neural network. The data consistency subproblem in Equation (7) is treated as a data consistency network layer, which uses the original sampled k-space data points to replace the corresponding data points in the reconstructed k-space data [22]. More details of CRNN layers can be found in Ref. [23].
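The replacement behaviour of the data consistency layer described above can be sketched in a few lines of NumPy. This is an illustrative, noise-free version and not the CRNN implementation from Ref. [23]; the function name `data_consistency`, the array shapes, and the random test data are assumptions made for the example.

```python
import numpy as np

def data_consistency(x_pred, y, mask):
    """Hard data consistency: keep the predicted k-space where nothing was
    sampled and overwrite it with the measured samples everywhere else.

    x_pred : complex image sequence predicted by the network, shape (T, Nh, Nw)
    y      : measured k-space (zero at unsampled locations), same shape
    mask   : 0/1 sampling mask, same shape
    """
    k_pred = np.fft.fft2(x_pred, axes=(-2, -1), norm="ortho")
    k_dc = (1 - mask) * k_pred + mask * y
    return np.fft.ifft2(k_dc, axes=(-2, -1), norm="ortho")

rng = np.random.default_rng(0)
shape = (3, 8, 8)  # hypothetical (T, Nh, Nw)
mask = np.zeros(shape)
mask[:, ::2, :] = 1
y = mask * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
x_pred = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

x_dc = data_consistency(x_pred, y, mask)
k_dc = np.fft.fft2(x_dc, axes=(-2, -1), norm="ortho")
```

After the layer, the sampled k-space locations of the output equal the measurements exactly, while the unsampled locations keep the network's prediction.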

2.3.2. Model-Driven Deep Learning with Complementary Regularization

The complementary regularization is also an effective method for dynamic MR imaging. The k-t NEXT [21] is a typical model-driven deep learning method with complementary regularization [35], which exploits prior information in both combined spatial and temporal Fourier (x-f) domain and spatiotemporal image (x-t) domain. A single iteration of the k-t NEXT can be expressed as the following process:

$$\rho^{(n)} = \mathrm{DC}\left(y_{base} + \text{xf-CNN}\left(y_{rec}^{(n-1)} - y_{base}\right)\right), \tag{10}$$
$$x_{rec}^{(n)} = \mathrm{CRNN}\left(F_{f}^{H}\rho^{(n)};\, y_{0}\right), \quad y_{rec}^{(n)} = F_{xy}\, x_{rec}^{(n)} \tag{11}$$

where $\rho^{(n)}$ denotes the intermediate reconstruction result in the x-f domain from xf-CNN at the $n$th iteration, $x_{rec}^{(n)}$ denotes the reconstructed image sequence in the x-t domain at the $n$th iteration, $y_{base}$ is the corresponding baseline signal, and $F_{xy}$ and $F_{f}^{H}$ denote, respectively, the Fourier transform in the x-t domain and the inverse Fourier transform in the x-f domain.

2.3.3. Model-Driven Deep Learning with Low-Rank Regularization

Another widely-used prior regularization is low-rank based dynamic MR imaging, which applies low-rank priors as regularized terms. The SLR-Net [26] is a typical example of a model-driven deep learning method with low-rank regularization. In the SLR-Net, by introducing an auxiliary variable M, Equation (2) can be decoupled as the fidelity term, sparse regularization term, and the low rank regularization term:

$$\arg\min_{x,M}\ \frac{1}{2}\|Ax - y\|_{2}^{2} + \lambda_{1}\|Dx\|_{1} + \lambda_{2}\|M\|_{*} \tag{12}$$

where $D$ is a sparse transform in a certain sparse domain. $M = Rx$ is a matrix of size $(N_h N_w) \times T$, in which each column corresponds to one frame in the dynamic MR image sequence. $R$ is a reshaping operator. $\|M\|_{*}$ is the nuclear norm. Previous works have proven that nuclear norm minimization is effective in low-rank matrix recovery [36]. More details of the iterative process in SLR-Net can be found in Ref. [26].
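The reshaping operator $R$ and the nuclear norm in Equation (12) can be illustrated with a short NumPy sketch. The helper names below are hypothetical, and the singular value thresholding function is the generic proximal operator of the nuclear norm, shown only to indicate how a low-rank prior acts on $M$; it is not the learned thresholding used inside SLR-Net.

```python
import numpy as np

def casorati(x):
    """Reshape a (T, Nh, Nw) sequence into an (Nh*Nw, T) matrix whose
    columns are the vectorized frames (the operator R in the text)."""
    T = x.shape[0]
    return x.reshape(T, -1).T

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def singular_value_threshold(M, tau):
    """Soft-threshold the singular values of M by tau."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0)) @ Vh

# A perfectly static sequence (the same frame repeated) is rank one.
rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
x_static = np.repeat(frame[None], 5, axis=0)  # shape (5, 8, 8)
M = casorati(x_static)                        # shape (64, 5)
```

For this rank-one example the nuclear norm coincides with the Frobenius norm, since only one singular value is non-zero. Real cardiac cine data are only approximately low-rank, which is why a threshold has to be chosen or learned.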

2.4. The Proposed Co-Training Loss

In this study, a co-training loss is defined to promote accurate dynamic MR image reconstruction in a self-supervised manner. The main idea of the co-training loss is to enforce the consistency not only between the reconstruction results and the original undersampled k-space data, but also between two network predictions. Compared with existing self-supervised methods with single network, the consistency between two network predictions is an additional regularization, which guides the dual-network to narrow the divergence and learn more correct information. Specifically, the co-training loss in SelfCoLearn, including an undersampled consistency loss term and a contrastive consistency loss term, is calculated to optimize the proposed framework.

Let $f_{SelfCoLearn}(y_{\Omega}^{t})$ denote SelfCoLearn, where $y_{\Omega}^{t}$ is the original undersampled k-space data. During training, two training sequences $y_{\Theta}^{t}$ and $y_{\Lambda}^{t}$ are generated from $y_{\Omega}^{t}$ following the data augmentation principles in Section 2.2 as follows:

$$y_{\Theta}^{t} = P_{\Theta}^{t}\, y_{\Omega}^{t}, \quad y_{\Lambda}^{t} = P_{\Lambda}^{t}\, y_{\Omega}^{t}, \tag{13}$$

where $P_{\Theta}^{t}$ and $P_{\Lambda}^{t}$ are the reundersampling masks for $y_{\Omega}^{t}$. The undersampled consistency loss refers to the actually sampled k-space points in $y_{\Omega}^{t}$: it ensures that the corresponding points in the network predictions are consistent with the actually sampled k-space points in $y_{\Omega}^{t}$. The actually sampled points in the two network predictions are denoted as $y_{\Theta\Omega}^{t}$ and $y_{\Lambda\Omega}^{t}$, respectively, and can be written as:

$$y_{\Theta\Omega}^{t} = P^{t} f(y_{\Theta}^{t}), \quad y_{\Lambda\Omega}^{t} = P^{t} f(y_{\Lambda}^{t}), \tag{14}$$

where the k-space data $f(y_{\Theta}^{t})$ and $f(y_{\Lambda}^{t})$ are transformed from the predicted image sequences of the two networks, respectively. $P^{t}$ is the undersampling mask, which is applied to generate the raw undersampled k-space data $y_{\Omega}^{t}$ from the fully sampled data.

The undersampled consistency loss term calculates the MSE between the actually sampled k-space points in $y_{\Omega}^{t}$ and those predicted by the networks as follows:

$$L_{UC} = \|y_{\Theta\Omega}^{t} - y_{\Omega}^{t}\|_{2}^{2} + \|y_{\Lambda\Omega}^{t} - y_{\Omega}^{t}\|_{2}^{2}. \tag{15}$$

In the ideal case, when different reundersampled k-space data from the same data are set as inputs of the two networks, the networks' predictions should approximate the fully sampled reference data after network optimization. However, when fully sampled reference data are unavailable and the two networks are trained using only the undersampled consistency loss, they are likely to generate different predictions, which leads to different reconstruction performance. Therefore, a contrastive consistency loss is defined to compute the MSE between the two network predictions obtained from different reundersampled inputs generated from the same data. Specifically, the proposed contrastive consistency loss term refers to the points in the network predictions that correspond to unsampled k-space points in $y_{\Omega}^{t}$. These points $\bar{y}_{\Theta\Omega}^{t}$ and $\bar{y}_{\Lambda\Omega}^{t}$ in the two network predictions $f(y_{\Theta}^{t})$ and $f(y_{\Lambda}^{t})$ can be expressed as follows:

$$\bar{y}_{\Theta\Omega}^{t} = (I - P^{t}) f(y_{\Theta}^{t}), \quad \bar{y}_{\Lambda\Omega}^{t} = (I - P^{t}) f(y_{\Lambda}^{t}), \tag{16}$$

Therefore, the contrastive consistency loss term is formulated as:

$$L_{CC} = \|\bar{y}_{\Theta\Omega}^{t} - \bar{y}_{\Lambda\Omega}^{t}\|_{2}^{2}. \tag{17}$$

Combining the two loss terms, the final co-training loss function is defined as follows:

$$L_{co} = L_{UC} + \gamma L_{CC}, \tag{18}$$

where $\gamma$ is a weight that balances the undersampled consistency loss and the contrastive consistency loss. During the testing phase, the undersampled data are used as the input of either collaborative network-1 or collaborative network-2 to obtain the final reconstruction result.
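The two loss terms and their combination can be sketched as follows. This is a NumPy illustration of Equations (15)-(18) and not the authors' training code: the function name, the argument names and the toy data are assumptions, and the terms are computed as squared $\ell_2$ norms as written in the equations (a mean over elements would differ only by a constant factor).

```python
import numpy as np

def co_training_loss(k_theta, k_lambda, y_omega, mask, gamma=0.01):
    """Co-training loss L_co = L_UC + gamma * L_CC.

    k_theta, k_lambda : predicted k-space of the two networks
    y_omega           : acquired undersampled k-space (zero where unsampled)
    mask              : 0/1 mask P^t of the acquired locations
    gamma             : weight of the contrastive consistency term
    """
    unsampled = 1 - mask
    # Undersampled consistency: predictions must match the acquired points.
    l_uc = (np.sum(np.abs(mask * k_theta - y_omega) ** 2)
            + np.sum(np.abs(mask * k_lambda - y_omega) ** 2))
    # Contrastive consistency: predictions must agree where nothing was acquired.
    l_cc = np.sum(np.abs(unsampled * (k_theta - k_lambda)) ** 2)
    return l_uc + gamma * l_cc, l_uc, l_cc

rng = np.random.default_rng(0)
shape = (2, 8, 8)  # hypothetical (T, Nh, Nw)
k_full = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = np.zeros(shape)
mask[:, ::4, :] = 1
y_omega = mask * k_full

# Two identical predictions that match the acquired data: both terms vanish.
total_same, l_uc_same, l_cc_same = co_training_loss(k_full, k_full, y_omega, mask)

# Predictions that disagree only at unsampled locations: only L_CC is non-zero.
k_other = k_full + 0.1 * (1 - mask)
total_diff, l_uc_diff, l_cc_diff = co_training_loss(k_full, k_other, y_omega, mask)
```

The second case shows why the contrastive term adds information: the undersampled consistency loss alone cannot distinguish two predictions that differ only in the locations that were never acquired.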

3. Experimental Results

Extensive experiments were performed to evaluate the effectiveness of SelfCoLearn. SelfCoLearn is compared with fully-supervised and self-supervised learning methods at different acceleration factors. In addition, SelfCoLearn is tested with different backbone networks for dynamic MR imaging. Then, the results of the ablation studies are reported to investigate the impacts of the undersampled consistency loss term and the contrastive consistency loss term. Finally, reconstruction results with the co-training loss calculated in different domains are reported to further evaluate the proposed SelfCoLearn.

3.1. Experimental Setup

3.1.1. Dataset

The dataset includes fully sampled 2D+t complex-valued short-axis cardiac cine MR data collected on a 3T Siemens Magnetom Trio scanner from 101 healthy volunteers. A T1-weighted FLASH sequence was used. Each scan is a single-slice FLASH acquisition with retrospective electrocardiogram (ECG) gating. Each volunteer held their breath for 15–20 s per slice. The data acquisition parameters include 24 receiving coils, FOV of 330 mm × 330 mm, acquisition matrix of 192 × 192, slice thickness of 6 mm, repetition time of 50 ms, and echo time of 3 ms. Each single-slice scan covers the entire cardiac cycle with 25 temporal frames. This retrospective study was approved by the local ethics committee, and informed consent was obtained from all of the involved volunteers. In the experiments, the scanned multi-coil MR data for each frame are combined into a single-channel image by the adaptive reconstruction technique [37]. The k-space data corresponding to the single-channel image can be viewed as fully sampled single-coil data. To enlarge the training dataset, we apply data augmentation by shearing the single-channel complex-valued images along the x, y, and t dimensions. After data augmentation, the dataset includes 6214 complex-valued data sequences of size 128 × 128 × 14. A total of 5950 cardiac MR data sequences were selected as the training dataset, 50 cardiac sequences were used as the validation dataset, and the remaining 214 sequences were used for testing.

3.1.2. Reundersampling K-Space Data Augmentation

In the proposed method, the fully sampled data are only used to generate the original undersampled k-space data $y_{\Omega}^{t}$ with a Cartesian retrospective undersampling mask $P^{t}$. Following the principles of training data augmentation in Section 2.2, $y_{\Omega}^{t}$ is augmented into two training sequences $y_{\Theta}^{t}$ and $y_{\Lambda}^{t}$ with two Cartesian reundersampling masks $P_{\Theta}^{t}$ and $P_{\Lambda}^{t}$. $P_{\Theta}^{t}$, with 2-fold acceleration, is used for collaborative network-1, and $P_{\Lambda}^{t}$, which combines the complementary set of $P_{\Theta}^{t}$ with some low-frequency data points of $P^{t}$, is used for collaborative network-2.
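One possible way to build such a pair of masks for a single frame is sketched below. The details are assumptions made for illustration, since the text does not fix them: the masks are one-dimensional over phase-encoding lines with the k-space center in the middle, "2-fold" is interpreted as keeping every other acquired line, and the width of the shared low-frequency band (`n_center`) is a hypothetical parameter.

```python
import numpy as np

def reundersampling_masks(p_omega, n_center=4):
    """Split an acquired Cartesian line mask into two reundersampling masks.

    p_omega  : 1D boolean array over phase-encoding lines (True = acquired)
    n_center : number of central (low-frequency) lines shared with P_Lambda
    """
    p_omega = np.asarray(p_omega, dtype=bool)
    sampled = np.flatnonzero(p_omega)

    # P_Theta: every other acquired line (2-fold selection, an assumption).
    p_theta = np.zeros_like(p_omega)
    p_theta[sampled[::2]] = True

    # Low-frequency band around the k-space center.
    center = p_omega.size // 2
    low = np.zeros_like(p_omega)
    low[center - n_center // 2 : center + (n_center + 1) // 2] = True

    # P_Lambda: complement of P_Theta within the acquired lines,
    # plus the acquired lines inside the low-frequency band.
    p_lambda = (p_omega & ~p_theta) | (p_omega & low)
    return p_theta, p_lambda

# Hypothetical 32-line acquisition: every 4th line plus 4 central lines.
p_omega = np.zeros(32, dtype=bool)
p_omega[::4] = True
p_omega[14:18] = True

p_theta, p_lambda = reundersampling_masks(p_omega)
```

With this construction the union of the two masks equals the acquired mask (principle 1), the masks differ (principle 2), and their overlap lies in the low-frequency band. Principle (3) is only partly reflected here, because the sketch does not add the full central band to the first mask.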

3.1.3. Evaluation Metrics

Reconstruction performance is evaluated by calculating the mean-squared-error (MSE), peak-signal-to-noise ratio (PSNR), and structural similarity index (SSIM) [38] on magnitude images. The evaluation metrics are measured between the reconstructed image sequence $Rec$ and the reference image sequence $Ref$ as follows:

$$MSE = \|Ref - Rec\|_{2}^{2} \tag{19}$$
$$PSNR = 20\log_{10}\frac{MAX_{Ref}}{\sqrt{MSE}} \tag{20}$$
$$SSIM = \frac{\left(2\mu_{Ref}\mu_{Rec} + c_{1}\right)\left(2\sigma_{Ref,Rec} + c_{2}\right)}{\left(\mu_{Ref}^{2} + \mu_{Rec}^{2} + c_{1}\right)\left(\sigma_{Ref}^{2} + \sigma_{Rec}^{2} + c_{2}\right)} \tag{21}$$

where $MAX_{Ref}$ is the maximum possible value in the image, $\mu_{Ref}$ and $\mu_{Rec}$ are the mean intensity values of the corresponding images, $\sigma_{Ref}^{2}$ and $\sigma_{Rec}^{2}$ are the variances, $\sigma_{Ref,Rec}$ is the covariance, and $c_{1}$ and $c_{2}$ are adjustable constants (details of the SSIM index can be found in Ref. [38]).
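The metrics above can be sketched in NumPy as follows. Several choices are assumptions for illustration: the squared norm in Equation (19) is averaged over pixels, as the name MSE suggests; $MAX_{Ref}$ is taken as the maximum of the reference image; and the SSIM is evaluated over the whole image as a single window with the commonly used constants $c_1 = (0.01 L)^2$ and $c_2 = (0.03 L)^2$, whereas standard SSIM implementations average over local windows.

```python
import numpy as np

def mse(ref, rec):
    """Mean squared error between two real-valued magnitude images."""
    return np.mean((ref - rec) ** 2)

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB, following Equation (20)."""
    return 20 * np.log10(ref.max() / np.sqrt(mse(ref, rec)))

def ssim_global(ref, rec, data_range=1.0):
    """Single-window SSIM following Equation (21) (simplified variant)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_ref, mu_rec = ref.mean(), rec.mean()
    var_ref, var_rec = ref.var(), rec.var()
    cov = np.mean((ref - mu_ref) * (rec - mu_rec))
    return ((2 * mu_ref * mu_rec + c1) * (2 * cov + c2)) / (
        (mu_ref ** 2 + mu_rec ** 2 + c1) * (var_ref + var_rec + c2))

# Toy example: a ramp image in [0, 1] and a copy with a constant offset of 0.01.
ref = np.linspace(0.0, 1.0, 64).reshape(8, 8)
rec = ref + 0.01

example_mse = mse(ref, rec)    # 1e-4
example_psnr = psnr(ref, rec)  # 40 dB
example_ssim = ssim_global(ref, rec)
```

For images normalized to [0, 1], an MSE of $1 \times 10^{-4}$ corresponds to a PSNR of 40 dB, which is the same order of magnitude as the best values reported in the result tables.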

3.1.4. Model Configuration and Implementation Details

The proposed framework is flexible and can be integrated with various iterative un-rolled networks, such as CRNN, k-t NEXT and SLR-Net. Most of our experiments adopt CRNN as the backbone network. In detail, the network is composed of a bidirectional CRNN layer, three CRNN layers, a 2D CNN layer, a residual connection and a DC layer. For the bidirectional CRNN and CRNN layers, the number of convolution filters is set to 64 and the kernel size is set to 3. The 2D CNN layer has kernel size k=3 and Nf=2 convolution filters. We use stride=1, and the padding is set to half of the filter size (rounded down). The DC layer follows the 2D CNN layer and forces the actually sampled points in the predicted k-space data to be consistent with those in the input data.

For model training, the number of iteration steps is set to N=5. The batch size is set to 1. All training data and test data are normalized to the range of [0, 1]. The SelfCoLearn framework with CRNN and k-t NEXT is implemented in PyTorch 1.8.1, and that with SLR-Net is implemented in TensorFlow 2.2.0. The experiments are performed on an Nvidia Titan Xp GPU with 12 GB memory. SelfCoLearn is trained by the Adam optimizer [39] with parameters β1=0.5 and β2=0.999. The learning rate is set to $10^{-4}$. The weight parameter γ in the co-training loss is set to 0.01. It takes 52 h to train SelfCoLearn with CRNN, and reconstructing each cardiac MR data sequence takes roughly 0.5 s.

3.2. Comparisons to State-of-the-Art Unsupervised Methods

To evaluate the proposed SelfCoLearn, we compared it with two self-supervised methods, SS-DCCNN and SS-CRNN, at different acceleration factors. It is worth noting that the state-of-the-art self-supervised method SSDU [32] was developed for static MR imaging. Ref. [30] adopted a similar self-supervised training manner as SSDU for dynamic MR imaging. They evaluated several backbone architectures for dynamic MR imaging including DCCNN and CRNN, whereas SSDU adopted ResNet as the backbone network. We choose two self-supervised learning methods SS-DCCNN and SS-CRNN [30] for comparison. In this experiment, the proposed SelfCoLearn selects the CRNN as the backbone network.

Figure 2 shows the reconstruction results of different self-supervised methods at 4-fold, 8-fold, and 12-fold acceleration. The first row and fourth row show the ground truth (fully sampled image) and the reconstructed images of the respective methods in the diastolic and systolic phases, respectively (display range [0, 1]). The second row and fifth row show the corresponding enlarged images of the heart regions. The third row and sixth row show the error images of the corresponding methods (display range [0, 0.2]). The y-t images at the 40th slice along the dimensions of y and t are shown in the seventh row. The corresponding error images of the y-t images are shown in the last row. The visualization results show that the proposed SelfCoLearn generates better reconstruction results than the two self-supervised methods, SS-DCCNN and SS-CRNN, at all acceleration factors. The reconstructed images of SelfCoLearn show finer structural details and more precise heart borders with fewer artifacts.

Figure 2.


Reconstruction results of different self-supervised methods (SS-DCCNN, SS-CRNN, and SelfCoLearn) at 4-fold acceleration, 8-fold acceleration, and 12-fold acceleration. The first row and fourth row show the ground truth (fully sampled image) and the reconstruction images of the respective methods in the diastolic (the 10th frame of image sequence) and systolic (the 5th frame of image sequence), respectively. The second row and fifth row show their corresponding enlarged images in the heart regions. The third row and sixth row plot the error images of corresponding methods. The last two rows show y-t images (the 40th slice along the dimensions of y and t) and the corresponding error images.

The quantitative results of these self-supervised methods are listed in Table 1. Similar conclusions can be obtained, showing that the SelfCoLearn achieves better quantitative performance than these self-supervised learning methods. Therefore, our collaborative learning strategy can effectively capture essential and inherent representations from undersampled k-space data directly.

Table 1.

Quantitative reconstruction results of different self-supervised methods (SS-DCCNN, SS-CRNN, and SelfCoLearn) at 4-fold, 8-fold, and 12-fold acceleration factors (mean ± std).

| AF | Methods | Training Pattern | PSNR (dB) | SSIM | MSE ($\times 10^{-4}$) |
|---|---|---|---|---|---|
| 4-fold | SS-DCCNN | Self-supervised | 25.81 ± 2.86 | 0.6409 ± 0.0739 | 32.81 ± 24.85 |
| 4-fold | SS-CRNN | Self-supervised | 32.49 ± 1.79 | 0.8383 ± 0.0387 | 6.14 ± 2.62 |
| 4-fold | SelfCoLearn | Self-supervised | 40.34 ± 2.69 | 0.9536 ± 0.0239 | 1.11 ± 0.72 |
| 8-fold | SS-DCCNN | Self-supervised | 22.56 ± 2.71 | 0.5615 ± 0.0732 | 67.87 ± 49.27 |
| 8-fold | SS-CRNN | Self-supervised | 30.81 ± 1.77 | 0.8015 ± 0.0427 | 9.02 ± 3.75 |
| 8-fold | SelfCoLearn | Self-supervised | 37.27 ± 2.40 | 0.9243 ± 0.0338 | 2.17 ± 1.22 |
| 12-fold | SS-DCCNN | Self-supervised | 22.17 ± 2.76 | 0.5270 ± 0.0702 | 74.89 ± 54.96 |
| 12-fold | SS-CRNN | Self-supervised | 30.14 ± 1.78 | 0.7943 ± 0.0444 | 10.54 ± 4.40 |
| 12-fold | SelfCoLearn | Self-supervised | 35.19 ± 2.24 | 0.8985 ± 0.0399 | 3.44 ± 1.78 |

Figure 3 shows the box plots displaying the median and interquartile range (25th–75th percentile) of the reconstruction results of different self-supervised methods on the test cardiac cine data at 4-fold acceleration, 8-fold acceleration, and 12-fold acceleration, respectively. The results in Figure 3 show that, for all dynamic cine sequences, the SelfCoLearn outperforms the two self-supervised learning methods (SS-DCCNN and SS-CRNN) at all three acceleration factors.

Figure 3.


Box plots of different methods (SS-DCCNN, SS-CRNN, and SelfCoLearn) at 4-fold, 8-fold, and 12-fold accelerations are presented, which show the median and interquartile range of the PSNR, SSIM, and MSE on the cardiac cine test dataset.

3.3. Comparisons to State-of-the-Art Supervised Methods

We further compare our SelfCoLearn with different supervised methods, including supervised U-Net and supervised CRNN [23], at different acceleration factors. Figure 4 shows the reconstructed images of the different methods at 4-fold, 8-fold, and 12-fold acceleration. The error images of SelfCoLearn show smaller reconstruction errors than those of supervised U-Net.

Figure 4.


Reconstruction results of different methods (Supervised U-Net, SelfCoLearn, and Supervised CRNN) at 4-fold acceleration, 8-fold acceleration, and 12-fold acceleration. The first row and fourth row show the ground truth (fully sampled image) and the reconstruction images of respective methods in the diastolic (the 10th frame of the image sequence) and systolic (the 5th frame of the image sequence), respectively. The second row and fifth row show their corresponding enlarged images in the heart regions. The third row and sixth row plot the error images of the corresponding methods. The last two rows show y-t images (the 40th slice along the dimensions of y and t) and the corresponding error images.

In addition, the reconstruction results generated by SelfCoLearn are close to those of supervised CRNN at low acceleration factors. According to the quantitative results in Table 2, the PSNR and SSIM of SelfCoLearn are 1.3% and 0.17% lower than those of supervised CRNN at 4-fold acceleration, respectively. At higher acceleration factors, such as 12-fold acceleration, the reconstructed images of SelfCoLearn become slightly blurred. Nevertheless, most of the structural details in the heart regions are still successfully restored by SelfCoLearn. The PSNR and SSIM of SelfCoLearn are 3.2% and 0.69% lower than those of supervised CRNN at 12-fold acceleration, respectively. Therefore, SelfCoLearn can achieve reconstruction performance comparable to baseline fully-supervised methods via self-supervised dual-network collaborative learning.

Table 2.

Quantitative reconstruction results of different methods (Supervised U-Net, Supervised CRNN and SelfCoLearn) at 4-fold, 8-fold, and 12-fold acceleration factors (mean ± std).

| AF | Methods | Training Pattern | PSNR (dB) | SSIM | MSE ($\times 10^{-4}$) |
|---|---|---|---|---|---|
| 4-fold | U-Net | Supervised | 33.77 ± 1.96 | 0.8698 ± 0.0391 | 4.66 ± 2.22 |
| 4-fold | SelfCoLearn | Self-supervised | 40.34 ± 2.69 | 0.9536 ± 0.0239 | 1.11 ± 0.72 |
| 4-fold | CRNN | Supervised | 40.89 ± 2.90 | 0.9553 ± 0.0237 | 1.01 ± 0.68 |
| 8-fold | U-Net | Supervised | 32.63 ± 1.97 | 0.8329 ± 0.0456 | 6.06 ± 2.88 |
| 8-fold | SelfCoLearn | Self-supervised | 37.27 ± 2.40 | 0.9243 ± 0.0338 | 2.17 ± 1.22 |
| 8-fold | CRNN | Supervised | 38.09 ± 2.52 | 0.9269 ± 0.0342 | 1.83 ± 1.07 |
| 12-fold | U-Net | Supervised | 31.96 ± 1.88 | 0.8315 ± 0.0478 | 6.99 ± 3.03 |
| 12-fold | SelfCoLearn | Self-supervised | 35.19 ± 2.24 | 0.8985 ± 0.0399 | 3.44 ± 1.78 |
| 12-fold | CRNN | Supervised | 36.32 ± 2.29 | 0.9048 ± 0.0392 | 2.67 ± 1.42 |

4. Discussion

4.1. Network Backbone Architectures

In this section, we explore the reconstruction results of the proposed self-supervised learning strategy with different backbone networks for dynamic MR imaging. The experiments are conducted using SLR-Net [26], k-t NEXT [21], and CRNN [23] at 8-fold acceleration. The reconstruction results with different backbone networks are exhibited in Figure 5 and Table 3. Compared with SS-CRNN [11], the proposed SelfCoLearn can achieve better results regardless of the utilized backbone network. Among the three different backbone networks, SLR-Net generates worse results than k-t NEXT and CRNN. The reason for this phenomenon may be that SLR-Net needs to learn a singular value threshold, and the absence of the fully sampled reference data causes the learned singular value threshold to be suboptimal. However, the proposed self-supervised learning strategy with SLR-Net still obtains acceptable reconstruction results. The qualitative results in Figure 5 clearly show that SelfCoLearn can better restore the structural details and achieve clearer reconstructed MR images (especially in the heart regions around the red and yellow arrows) than SS-CRNN. The quantitative results also indicate more accurate reconstructions achieved by the proposed SelfCoLearn. These results indicate that our proposed self-supervised learning framework is flexible, and it can achieve promising reconstruction results with various iterative un-rolled networks for dynamic MR imaging.

Figure 5.

Reconstruction results of SS-CRNN and the proposed SelfCoLearn with SLR-Net, k-t NEXT, and CRNN backbone networks at 8-fold acceleration. The first row shows the ground truth (fully sampled image) and the reconstructed images of SS-CRNN and of the proposed self-supervised learning strategy with SLR-Net, k-t NEXT, and CRNN (10th frame). The second row shows enlarged views of the heart regions. The third row shows the corresponding error images. The last two rows show the y-t images (the 40th slice along the dimensions of y and t) and the corresponding error images.

Table 3.

Quantitative results of SS-CRNN and SelfCoLearn with different backbone networks at 8-fold acceleration (mean ± std).

Methods | Training Pattern | PSNR (dB) | SSIM | MSE (×10⁴)
SS-CRNN | Self-supervised | 30.81 ± 1.77 | 0.8015 ± 0.0427 | 9.02 ± 3.75
SelfCoLearn with SLR-Net | Self-supervised | 33.58 ± 2.24 | 0.9001 ± 0.0369 | 5.57 ± 10.48
SelfCoLearn with k-t NEXT | Self-supervised | 36.95 ± 2.39 | 0.9226 ± 0.0343 | 2.34 ± 1.32
SelfCoLearn with CRNN | Self-supervised | 37.27 ± 2.40 | 0.9243 ± 0.0338 | 2.17 ± 1.22
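
For readers who want to reproduce metrics of the kind reported in Table 3, the following is a minimal sketch of how MSE and PSNR are commonly computed. It assumes magnitude images normalized to a peak value of 1; the function names `mse` and `psnr` and the `peak` parameter are illustrative choices, not the authors' evaluation code.

```python
import numpy as np


def mse(reference, reconstruction):
    """Mean squared error between two image sequences of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean((reference - reconstruction) ** 2))


def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum intensity."""
    err = mse(reference, reconstruction)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))


# Synthetic stand-in for a (frames, y, x) cine sequence; not real data.
rng = np.random.default_rng(0)
reference = rng.random((8, 32, 32))
noisy = np.clip(reference + 0.01 * rng.standard_normal(reference.shape), 0.0, 1.0)

print("MSE (x10^4):", mse(reference, noisy) * 1e4)
print("PSNR (dB):", psnr(reference, noisy))
```

Because PSNR is a logarithm of MSE, the mean of per-sequence PSNR values generally differs from the PSNR of the mean MSE, so the two columns of the table need not convert exactly into one another.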

4.2. Co-Training Loss Function

In this section, we investigate the utility of the designed co-training loss function. CRNN is adopted as the backbone network in these experiments, and different training strategies are compared at 8-fold acceleration. Strategy B-I: a single reconstruction network is trained in a self-supervised manner, and only the loss between the network output fyΘt and yΛt is used to train the network. Strategy B-II: similar to B-I, but the loss is calculated between the network output fyΘt and the original undersampled k-space data yΩt. SelfCoLearn: two networks are trained collaboratively with L_UC and L_CC, and both networks adopt the same backbone as in Strategy B-I. Reconstructed images obtained with the different training strategies are shown in Figure 6, and quantitative results are listed in Table 4. Both the qualitative and quantitative results show that SelfCoLearn (training two networks collaboratively with both loss terms) achieves the best performance, especially in the heart regions indicated by the red and yellow arrows. In particular, the contrastive consistency loss term yields a large improvement in reconstruction performance: PSNR increases from 31.04 dB (Strategy B-II) to 37.27 dB (SelfCoLearn).

Figure 6.

Ablation studies utilizing different training strategies at 8-fold acceleration. The first row shows the ground truth (fully sampled image) and the reconstructed images of Strategy B-I, Strategy B-II, and the proposed SelfCoLearn (10th frame). The second row shows enlarged views of the heart regions. The third row shows the error images of the respective methods. The last two rows show the y-t images (the 40th slice along the dimensions of y and t) and the corresponding error images.

Table 4.

Quantitative results of reconstruction models utilizing different training strategies at 8-fold acceleration (mean ± std).

Methods | Single-Net | Parallel-Net | L_UC | L_CC | PSNR (dB) | SSIM | MSE (×10⁴)
Strategy B-I | ✓ | × | × | × | 30.81 ± 1.77 | 0.8015 ± 0.0427 | 9.02 ± 3.75
Strategy B-II | ✓ | × | ✓ | × | 31.04 ± 1.74 | 0.8102 ± 0.0411 | 8.53 ± 3.50
SelfCoLearn | × | ✓ | ✓ | ✓ | 37.27 ± 2.40 | 0.9243 ± 0.0338 | 2.17 ± 1.22
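
To make the ablation above more concrete, the sketch below shows one way a two-term co-training loss could be assembled. The text states only that one term compares network outputs with the original undersampled k-space data and that the other promotes consistency between the two networks' predictions. Everything else here is an assumption for illustration: the squared L2 norm, evaluating the consistency term on unsampled k-space locations, the weight `cc_weight`, and the names `fft2c` and `co_training_loss`.

```python
import numpy as np


def fft2c(x):
    """Orthonormal 2D FFT over the last two (spatial) axes."""
    return np.fft.fft2(x, axes=(-2, -1), norm="ortho")


def co_training_loss(pred_a, pred_b, y_omega, mask_omega, cc_weight=1.0):
    """Hypothetical co-training loss for two predicted image sequences.

    pred_a, pred_b : complex arrays (frames, y, x), outputs of the two networks
    y_omega        : acquired undersampled k-space, zero where not sampled
    mask_omega     : boolean array, True where k-space was acquired
    """
    mask_omega = np.asarray(mask_omega, dtype=bool)
    k_a, k_b = fft2c(pred_a), fft2c(pred_b)
    n_sampled = max(int(mask_omega.sum()), 1)
    n_unsampled = max(int((~mask_omega).sum()), 1)

    # Assumed L_UC: both predictions should agree with the acquired samples.
    l_uc = float(
        np.sum(np.abs((k_a - y_omega)[mask_omega]) ** 2)
        + np.sum(np.abs((k_b - y_omega)[mask_omega]) ** 2)
    ) / n_sampled

    # Assumed L_CC: the two predictions should agree where nothing was acquired.
    l_cc = float(np.sum(np.abs((k_a - k_b)[~mask_omega]) ** 2)) / n_unsampled

    return l_uc + cc_weight * l_cc, l_uc, l_cc


# Synthetic example; shapes and sampling rate are arbitrary.
rng = np.random.default_rng(0)
shape = (4, 16, 16)
x_true = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask_omega = rng.random(shape) < 0.25
y_omega = fft2c(x_true) * mask_omega

total, l_uc, l_cc = co_training_loss(x_true, 0.9 * x_true, y_omega, mask_omega)
print(total, l_uc, l_cc)
```

In this reading, Strategy B-II corresponds to keeping only the first term for a single network, while SelfCoLearn adds the second term, which is the only place where one network's prediction influences the other.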

4.3. Loss Functions

In this section, we inspect the effects of the domain in which the loss functions are calculated. CRNN is adopted as the backbone network in these experiments. Reconstruction results at 8-fold acceleration are given in Figure 7 and Table 5. Three strategies utilizing different loss function settings are investigated. In Strategy C-I, two networks are trained collaboratively with L_UC and L_CC, in which L_UC is calculated in the x-t domain and L_CC is calculated in the k-space domain. In Strategy C-II, both L_UC and L_CC are calculated in the x-t domain. In Strategy C-III, both L_UC and L_CC are calculated in the k-space domain. Both the qualitative and quantitative results show that the choice of domain has only a minor influence on reconstruction performance: across the three strategies, PSNR ranges from 37.00 to 37.27 dB and SSIM from 0.9230 to 0.9243. All the other experiments in this work adopt the setting of Strategy C-III.

Figure 7.

Effects of loss functions calculated in different domains on the reconstruction results at 8-fold acceleration. The first row shows the ground truth (fully sampled image) and the reconstruction results of models utilizing Strategy C-I, C-II, and C-III (10th frame). The second row shows enlarged views of the heart regions. The third row shows the error images of the respective strategies. The last two rows show the y-t images (the 40th slice along the dimensions of y and t) and the corresponding error images.

Table 5.

Quantitative results of methods utilizing different loss function strategies at 8-fold acceleration (mean ± std).

Methods | L_UC | L_CC | PSNR (dB) | SSIM | MSE (×10⁴)
Strategy C-I | x-t domain | k-space | 37.00 ± 2.35 | 0.9230 ± 0.0344 | 2.30 ± 1.29
Strategy C-II | x-t domain | x-t domain | 37.20 ± 2.37 | 0.9235 ± 0.0343 | 2.20 ± 1.22
Strategy C-III | k-space | k-space | 37.27 ± 2.40 | 0.9243 ± 0.0338 | 2.17 ± 1.22
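
One general fact that may help in reading Table 5: for a squared L2 loss evaluated over the full k-space with an orthonormal Fourier transform, the loss value is identical in the image (x-t) domain and in k-space by Parseval's theorem. The sketch below checks this numerically. Whether it applies to the losses used in this work is an assumption, since it fails for other norms or for losses restricted to a subset of k-space locations; it is offered only as a possible contributing factor, not as the authors' explanation.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 16, 16)  # (frames, y, x), arbitrary synthetic size
pred = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


def l2_image(a, b):
    """Squared L2 distance computed directly on the image sequences."""
    return float(np.sum(np.abs(a - b) ** 2))


def l2_kspace(a, b):
    """Squared L2 distance computed after an orthonormal 2D FFT per frame."""
    ka = np.fft.fft2(a, axes=(-2, -1), norm="ortho")
    kb = np.fft.fft2(b, axes=(-2, -1), norm="ortho")
    return float(np.sum(np.abs(ka - kb) ** 2))


print(l2_image(pred, target), l2_kspace(pred, target))
print(np.isclose(l2_image(pred, target), l2_kspace(pred, target)))
```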

5. Conclusions

In this work, we propose a self-supervised collaborative training framework to improve image reconstruction performance for accelerated dynamic MR imaging. Specifically, two independent reconstruction networks are trained collaboratively with different inputs, which are augmented from the same k-space data. To guide the two networks in capturing the detailed structural features and spatiotemporal correlations in dynamic image sequences, a co-training loss function is designed to promote consistency between the network predictions, which provides complementary information for the to-be-reconstructed dynamic MR images. The proposed framework is flexible and can be integrated with various iterative un-rolled networks. In addition, the proposed method has been comprehensively evaluated on a cardiac cine dataset. The quantitative and qualitative results indicate that SelfCoLearn possesses strong capabilities in capturing essential and inherent representations directly from the undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.
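
As an illustration of how two different inputs might be derived from the same k-space data, the sketch below re-undersamples an existing sampling mask twice with different random seeds. It is a sketch under stated assumptions rather than the authors' procedure: Cartesian sampling of phase-encoding lines per frame, a centered k-space in which the central lines are always retained, and the hypothetical names `reundersample`, `keep_fraction`, and `center_lines`.

```python
import numpy as np


def reundersample(mask, keep_fraction, center_lines=4, rng=None):
    """Return a random subset of an existing (frames, ny) line-sampling mask.

    Lines absent from `mask` stay absent; the central `center_lines` lines
    keep whatever value they had in `mask` (assumed low-frequency region).
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(mask, dtype=bool)
    ny = mask.shape[-1]
    keep = rng.random(mask.shape) < keep_fraction
    lo = ny // 2 - center_lines // 2
    keep[..., lo:lo + center_lines] = True
    return mask & keep


# Synthetic acquisition mask: random lines plus a fully sampled center.
rng = np.random.default_rng(0)
frames, ny = 8, 32
mask = rng.random((frames, ny)) < 0.25
mask[:, ny // 2 - 2: ny // 2 + 2] = True

# Two different sub-masks of the same acquisition, one per network.
sub_a = reundersample(mask, keep_fraction=0.8, rng=np.random.default_rng(1))
sub_b = reundersample(mask, keep_fraction=0.8, rng=np.random.default_rng(2))
print(mask.sum(), sub_a.sum(), sub_b.sum())
```

Each sub-mask would then be multiplied with the acquired k-space to form the input of one network, so that neither network sees anything beyond what was actually measured.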

Author Contributions

Methodology, S.W. and J.Z.; software, J.Z.; validation, J.Z. and R.W.; investigation, J.Z. and R.W.; data curation, J.Z. and R.W.; writing—original draft preparation, J.Z. and C.L.; writing—review and editing, J.Z., C.L., S.W., T.P. and S.J.; supervision, S.W. and T.P.; project administration, S.W.; funding acquisition, S.W. and H.Z. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

The study was approved by the Institutional Review Board of the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (SIAT-IRB-200315-H0469).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The source code will be available publicly upon publication of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

This research was partly supported by the Scientific and Technical Innovation 2030-“New Generation Artificial Intelligence” Project (2020AAA0104100, 2020AAA0104105), the National Natural Science Foundation of China (61871371, 62222118, U22A2040), the Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (No. 2022B1212010011), the Basic Research Program of Shenzhen (JCYJ20180507182400762), the Shenzhen Science and Technology Program (Grant No. RCYX20210706092104034), the Youth Innovation Promotion Association Program of Chinese Academy of Sciences (2019351), and the Hunan Provincial Innovation Foundation For Postgraduate (CX20200626).

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Gamper U., Boesiger P., Kozerke S. Compressed sensing in dynamic MRI. Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 2008;59:365–373. doi: 10.1002/mrm.21477.
  • 2. Zhao B., Haldar J.P., Christodoulou A.G., Liang Z.P. Image reconstruction from highly undersampled (k, t)-space data with joint partial separability and sparsity constraints. IEEE Trans. Med. Imaging. 2012;31:1809–1820. doi: 10.1109/TMI.2012.2203921.
  • 3. Jung H., Ye J.C., Kim E.Y. Improved k–t BLAST and k–t SENSE using FOCUSS. Phys. Med. Biol. 2007;52:3201. doi: 10.1088/0031-9155/52/11/018.
  • 4. Wang Y., Ying L. Compressed sensing dynamic cardiac cine MRI using learned spatiotemporal dictionary. IEEE Trans. Biomed. Eng. 2013;61:1109–1120. doi: 10.1109/TBME.2013.2294939.
  • 5. Caballero J., Price A.N., Rueckert D., Hajnal J.V. Dictionary learning and time sparsity for dynamic MR data reconstruction. IEEE Trans. Med. Imaging. 2014;33:979–994. doi: 10.1109/TMI.2014.2301271.
  • 6. Jung H., Sung K., Nayak K.S., Kim E.Y., Ye J.C. k-t FOCUSS: A general compressed sensing framework for high resolution dynamic MRI. Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 2009;61:103–116. doi: 10.1002/mrm.21757.
  • 7. Otazo R., Candes E., Sodickson D.K. Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components. Magn. Reson. Med. 2015;73:1125–1136. doi: 10.1002/mrm.25240.
  • 8. Wang S., Xiao T., Liu Q., Zheng H. Deep learning for fast MR imaging: A review for learning reconstruction from incomplete k-space data. Biomed. Signal Process. Control. 2021;68:102579. doi: 10.1016/j.bspc.2021.102579.
  • 9. Wang S., Cao G., Wang Y., Liao S., Wang Q., Shi J., Li C., Shen D. Review and Prospect: Artificial Intelligence in Advanced Medical Imaging. Front. Radiol. 2021;1:781868. doi: 10.3389/fradi.2021.781868.
  • 10. Li C., Li W., Liu C., Zheng H. Artificial intelligence in multiparametric magnetic resonance imaging: A review. Med. Phys. 2022;49:e1024–e1054. doi: 10.1002/mp.15936.
  • 11. Wang S., Su Z., Ying L., Peng X., Zhu S., Liang F., Feng D., Liang D. Accelerating magnetic resonance imaging via deep learning; Proceedings of the 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI); Prague, Czech Republic. 13–16 April 2016; pp. 514–517.
  • 12. Zhang J., Ghanem B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA. 18–22 June 2018; pp. 1828–1837.
  • 13. Eo T., Jun Y., Kim T., Jang J., Lee H.J., Hwang D. KIKI-net: Cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images. Magn. Reson. Med. 2018;80:2188–2201. doi: 10.1002/mrm.27201.
  • 14. Sun J., Li H., Xu Z., Yang Y. Deep ADMM-Net for compressive sensing MRI. Adv. Neural Inf. Process. Syst. 2016;29.
  • 15. Aggarwal H.K., Mani M.P., Jacob M. MoDL: Model-based deep learning architecture for inverse problems. IEEE Trans. Med. Imaging. 2018;38:394–405. doi: 10.1109/TMI.2018.2865356.
  • 16. Hammernik K., Klatzer T., Kobler E., Recht M.P., Sodickson D.K., Pock T., Knoll F. Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 2018;79:3055–3071. doi: 10.1002/mrm.26977.
  • 17. Akçakaya M., Moeller S., Weingärtner S., Uğurbil K. Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging. Magn. Reson. Med. 2019;81:439–453. doi: 10.1002/mrm.27420.
  • 18. Mardani M., Gong E., Cheng J.Y., Vasanawala S.S., Zaharchuk G., Xing L., Pauly J.M. Deep generative adversarial neural networks for compressive sensing MRI. IEEE Trans. Med. Imaging. 2018;38:167–179. doi: 10.1109/TMI.2018.2858752.
  • 19. Huang Q., Xian Y., Yang D., Qu H., Yi J., Wu P., Metaxas D.N. Dynamic MRI reconstruction with end-to-end motion-guided network. Med. Image Anal. 2021;68:101901. doi: 10.1016/j.media.2020.101901.
  • 20. Seegoolam G., Schlemper J., Qin C., Price A., Hajnal J., Rueckert D. Exploiting motion for deep learning reconstruction of extremely-undersampled dynamic MRI; Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Shenzhen, China. 13–17 October 2019; pp. 704–712.
  • 21. Qin C., Schlemper J., Duan J., Seegoolam G., Price A., Hajnal J., Rueckert D. k-t NEXT: Dynamic MR image reconstruction exploiting spatio-temporal correlations; Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Shenzhen, China. 13–17 October 2019; pp. 505–513.
  • 22. Schlemper J., Caballero J., Hajnal J.V., Price A.N., Rueckert D. A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging. 2017;37:491–503. doi: 10.1109/TMI.2017.2760978.
  • 23. Qin C., Schlemper J., Caballero J., Price A.N., Hajnal J.V., Rueckert D. Convolutional recurrent neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging. 2018;38:280–290. doi: 10.1109/TMI.2018.2863670.
  • 24. Qin C., Duan J., Hammernik K., Schlemper J., Küstner T., Botnar R., Prieto C., Price A.N., Hajnal J.V., Rueckert D. Complementary time-frequency domain networks for dynamic parallel MR image reconstruction. Magn. Reson. Med. 2021;86:3274–3291. doi: 10.1002/mrm.28917.
  • 25. Wang S., Ke Z., Cheng H., Jia S., Ying L., Zheng H., Liang D. DIMENSION: Dynamic MR imaging with both k-space and spatial prior knowledge obtained via multi-supervised network training. NMR Biomed. 2022;35:e4131. doi: 10.1002/nbm.4131.
  • 26. Ke Z., Huang W., Cui Z.X., Cheng J., Jia S., Wang H., Liu X., Zheng H., Ying L., Zhu Y., et al. Learned low-rank priors in dynamic MR imaging. IEEE Trans. Med. Imaging. 2021;40:3698–3710. doi: 10.1109/TMI.2021.3096218.
  • 27. Hu C., Li C., Wang H., Liu Q., Zheng H., Wang S. Self-supervised learning for mri reconstruction with a parallel network training framework; Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Strasbourg, France. 27 September–1 October 2021; pp. 382–391.
  • 28. Wang S., Wu R., Li C., Zou J., Zhang Z., Liu Q., Xi Y., Zheng H. PARCEL: Physics-based Unsupervised Contrastive Representation Learning for Multi-coil MR Imaging. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022. doi: 10.1109/TCBB.2022.3213669.
  • 29. Yoo J., Jin K.H., Gupta H., Yerly J., Stuber M., Unser M. Time-dependent deep image prior for dynamic MRI. IEEE Trans. Med. Imaging. 2021;40:3337–3348. doi: 10.1109/TMI.2021.3084288.
  • 30. Acar M., Çukur T., Öksüz İ. Self-supervised Dynamic MRI Reconstruction; Proceedings of the International Workshop on Machine Learning for Medical Image Reconstruction; Strasbourg, France. 1 October 2021; pp. 35–44.
  • 31. Ulyanov D., Vedaldi A., Lempitsky V. Deep image prior; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA. 18–22 June 2018; pp. 9446–9454.
  • 32. Yaman B., Hosseini S.A.H., Moeller S., Ellermann J., Uğurbil K., Akçakaya M. Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magn. Reson. Med. 2020;84:3172–3191. doi: 10.1002/mrm.28378.
  • 33. Akçakaya M., Yaman B., Chung H., Ye J.C. Unsupervised Deep Learning Methods for Biological Image Reconstruction and Enhancement: An overview from a signal processing perspective. IEEE Signal Process. Mag. 2022;39:28–44. doi: 10.1109/MSP.2021.3119273.
  • 34. Liang D., Cheng J., Ke Z., Ying L. Deep Magnetic Resonance Image Reconstruction: Inverse Problems Meet Neural Networks. IEEE Signal Process. Mag. 2020;37:141–151. doi: 10.1109/MSP.2019.2950557.
  • 35. Qin C., Rueckert D. Artificial Intelligence in Cardiothoracic Imaging. Springer; Cham, Switzerland: 2022. Artificial Intelligence-Based Image Reconstruction in Cardiac Magnetic Resonance; pp. 139–147.
  • 36. Candès E.J., Recht B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009;9:717–772. doi: 10.1007/s10208-009-9045-5.
  • 37. Lee K., Bresler Y. Admira: Atomic decomposition for minimum rank approximation. IEEE Trans. Inf. Theory. 2010;56:4402–4416. doi: 10.1109/TIT.2010.2054251.
  • 38. Wang Z., Bovik A.C., Sheikh H.R., Simoncelli E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004;13:600–612. doi: 10.1109/TIP.2003.819861.
  • 39. Kingma D.P., Ba J. Adam: A Method for Stochastic Optimization; Proceedings of the ICLR (Poster); San Diego, CA, USA. 7–9 May 2015.

Articles from Bioengineering are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
