Abstract
Bilinear models such as low-rank and dictionary methods, which decompose dynamic data into spatial and temporal factor matrices, are powerful and memory-efficient tools for the recovery of dynamic MRI data. Current bilinear methods rely on sparsity and energy compaction priors on the factor matrices to regularize the recovery. Motivated by deep image prior, we introduce a novel bilinear model whose factor matrices are generated using convolutional neural networks (CNNs). The CNN parameters, and equivalently the factors, are learned from the undersampled data of the specific subject. Unlike current unrolled deep learning methods that require the storage of all the time frames in the dataset, the proposed approach only requires the storage of the factors, or compressed representation; this allows the scheme to be applied directly to large-scale dynamic applications, including the free-breathing cardiac MRI considered in this work. To reduce the run time and to improve performance, we initialize the CNN parameters using existing factor methods. We use sparsity regularization of the network parameters to minimize the overfitting of the network to measurement noise. Our experiments on free-breathing and ungated cardiac cine data acquired using a navigated golden-angle gradient-echo radial sequence show the ability of our method to provide reduced spatial blurring as compared to classical bilinear methods as well as a recent unsupervised deep-learning approach.
Keywords: bilinear model, cardiac MRI, dynamic imaging, image reconstruction, unsupervised learning
I. Introduction
Dynamic MRI (DMRI) plays an important role in clinical applications such as cardiac cine MRI, which is commonly used by clinicians for the anatomical and functional assessment of organs. The clinical practice is to acquire the cine data using breath holding to achieve good spatial and temporal resolution. However, it is difficult for many subjects, including children, patients with myocardial infarction, and chronic obstructive pulmonary disease patients, to hold their breath [1]. In addition, multiple breath-holds prolong the scan time, adversely impacting patient comfort and compliance.
Several computational approaches have been introduced to reduce the breath-held duration in cardiac cine and to enable free-breathing imaging. Early approaches relied on carefully designed signal models that exploit the structure of the data in x-f space [2]–[4] or sparsity [5], or binned the data into different phases [6], to facilitate signal recovery from undersampled measurements. In recent years, bilinear models [7]–[10], which represent the signal as the product of spatial and temporal factors, have emerged as powerful alternatives for the recovery of large-scale data. The factor-based framework has been combined with other priors, including low-rank and sparsity [7], [9], low-rank + sparse [11], blind compressed sensing that learns a dictionary from the data [12], and motion compensation [13], [14]. Recently, non-linear manifold models that rely on kernel low-rank relations have been shown to outperform the subspace-based models in the context of free-breathing and ungated cardiac MRI [15]–[17]. A major benefit of the factor-based methods is the significantly reduced memory demand of these algorithms, in addition to the good reconstructions they offer. Specifically, the spatial and temporal factors are significantly smaller in dimension than the dynamic dataset, which facilitates the recovery of large 3D volumes [10], [18]. The main distinction between these methods lies in the specific priors applied to the spatial and temporal factors, including energy priors in the low-rank setting [9], unit column norm and sparsity priors in the blind dictionary learning setting [12], and kernel priors in the manifold setting [15], [16].
Model-based deep unrolled learning methods [20]–[22], which interleave convolutional neural network (CNN) blocks with data-consistency enforcing optimization modules, have emerged as promising alternatives to classical methods for image recovery. A challenge with these schemes is their high memory demand during training, which results from the recurrent use of CNN blocks. In particular, the intermediate results at each unroll need to be stored for back-propagation. Several unrolled architectures were introduced for the recovery of breath-held, ECG-gated cardiac cine MRI with factor representations [21], [23]–[26]. The gated approach allows the binning of data from multiple cardiac cycles into a few cardiac phase images (e.g., 20–30), which makes it suitable for memory-hungry unrolled networks. The availability of pre-acquired fully sampled data from multiple subjects, which is readily available in the cine setting, makes it possible to train the above algorithms in a supervised learning mode. Unfortunately, the memory demand makes it challenging to apply these schemes to the free-breathing and ungated imaging application, which involves at least an order of magnitude more image frames (e.g., 300 or more in our setting). While we have recently used an unrolled architecture in the free-breathing and ungated setting, we had to severely restrict the number of frames (≈ 100) and had to use considerably simpler networks, which severely restricted the performance [27]. Another challenge associated with these schemes is the lack of fully sampled training data, especially in the free-breathing and ungated mode.
The main focus of this work is to introduce a memory-efficient model for dynamic MRI called Deep Bi-Linear Unsupervised Representation (DEBLUR). This approach exploits the power of CNNs to further improve the performance, while retaining the memory efficiency of bilinear factor representations. We use the structural bias of CNNs [19] as priors on the factors; this approach is an alternative to current schemes that impose energy [9], sparsity [12], or kernel priors [16], [22] on the factors. In particular, we assume that the spatial and temporal factors are generated by two CNN generators, whose parameters are estimated from the measured data. The CNN parameters are learned such that the k-space measurements of the bilinear representation match the multi-channel measurements. The proposed scheme directly learns the compressed representation (factors) from the undersampled k-t space data of each specific subject. Because the full image data need not be stored or used in the computation, the proposed approach is more memory efficient in multidimensional applications than current CNN approaches that require the image frames to be stored [25]–[27]. However, the improved memory efficiency comes at the expense of compute time during inference, compared to current deep learning solutions. The parameters of the CNN and the latent vectors are learned during inference; the computational complexity of the proposed scheme is comparable to that of classical bilinear methods [7]–[10].
This paper generalizes our earlier conference version [28] in multiple ways. First of all, we follow a completely unsupervised strategy; we pretrain the CNN bilinear generators from SToRM reconstruction of undersampled k-space data rather than using exemplar datasets. In addition, the generator of temporal basis functions is significantly different from [28]; the inputs to the generator are learned during the optimization in the current setting, which offers improved performance compared to keeping them fixed. More importantly, the approach is now validated with several datasets compared to the limited comparisons in [28], along with ablation studies to determine the impact of the regularization terms. We note that an unsupervised approach was recently introduced in [29] for free-breathing dynamic MRI data. Unlike the proposed scheme that uses deep image priors on the factors, the above approach learns the parameters of the generator and assumes the latent vectors to be random. We compare the performance of the proposed scheme with [29].
II. Background
A. Dynamic MRI: Problem setup
We consider the recovery of an image time series x_i, i = 1, …, N, from its noisy undersampled measurements
b_i = A_i(x_i) + n_i.    (1)
Here, A_i is the undersampled multi-channel MRI measurement operator for the i-th frame and n_i is the measurement noise. The undersampling pattern varies from frame to frame. It is a common practice to arrange the frames in a multidimensional Casorati matrix
X = [x_1, x_2, …, x_N],    (2)
where x_i denotes the vectorized version of the i-th frame in the dataset.
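The forward model in (1)–(2) can be sketched numerically. Below is a minimal NumPy illustration; a frame-varying random Cartesian mask stands in for the multi-channel golden-angle radial operator used in this work, and all sizes are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nframes = 32, 32, 8

# Toy dynamic series: a disc whose radius oscillates over the frames,
# loosely mimicking a beating heart.
yy, xx = np.mgrid[:nx, :ny]
frames = [((xx - 16) ** 2 + (yy - 16) ** 2
           < (6 + 3 * np.sin(2 * np.pi * t / nframes)) ** 2).astype(complex)
          for t in range(nframes)]

# Casorati matrix: each column is one vectorized frame, as in Eq. (2).
X = np.stack([f.ravel() for f in frames], axis=1)        # (nx*ny, nframes)

def forward(img, mask):
    """A_i(x_i): 2-D FFT followed by a frame-specific undersampling mask."""
    return mask * np.fft.fft2(img)

# Different mask per frame; noise added on the sampled locations, as in Eq. (1).
masks = rng.random((nframes, nx, ny)) < 0.3
noise = 0.01 * (rng.standard_normal((nframes, nx, ny))
                + 1j * rng.standard_normal((nframes, nx, ny)))
b = np.stack([forward(frames[i], masks[i]) + masks[i] * noise[i]
              for i in range(nframes)])
```

The per-frame masks are what make joint recovery possible: each frame sees different k-space locations, so the shared factors can pool information across frames.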
B. Bilinear factor models for dynamic MRI
Bilinear factor models are widely used in multidimensional applications, including dynamic MRI [7], [8], parameter mapping [30], [31], and MR spectroscopic imaging [32], [33]. These approaches represent the Casorati matrix in (2) in terms of the factors U and V:
X = U V^T.    (3)
The columns of U, denoted by u_i, are identified as the spatial basis functions, while those of V are the temporal basis functions. The main benefit of the above representation is the significant reduction in the number of free parameters that need to be estimated, which translates to reduced data demand [27]. In the context of large multidimensional applications, another key benefit is the reduced memory footprint and computation. In particular, the measurements can be expressed completely in terms of U and V as:
b_i = A_i(U v_i) + n_i.    (4)
This approach eliminates the need for computing and storing the image frames x_i themselves during image reconstruction; post-recovery, the desired frame can be retrieved as x_i = U v_i, where v_i^T denotes the i-th row of V.
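The computational shortcut in (4) can be verified on toy data: applying a frame's measurement operator to the small factor U first and then combining with the temporal weights gives the same measurements as assembling the frame. The sketch below uses a random dense matrix with row subsampling as a stand-in for A_i (illustrative sizes only):

```python
import numpy as np

rng = np.random.default_rng(1)
npix, nframes, r = 64, 10, 4             # pixels, frames, number of factors

U = rng.standard_normal((npix, r))       # spatial factors
V = rng.standard_normal((nframes, r))    # temporal factors (rows are v_i^T)

# Toy per-frame operator: 16 random rows of a fixed "encoding" matrix.
F = rng.standard_normal((npix, npix))
rows = [rng.choice(npix, size=16, replace=False) for _ in range(nframes)]

i = 3
x_i = U @ V[i]                           # assemble the i-th frame, x_i = U v_i
b_full = F[rows[i]] @ x_i                # measure the assembled frame

AU = F[rows[i]] @ U                      # apply A_i to the r small columns of U
b_factor = AU @ V[i]                     # then combine with v_i, as in Eq. (4)

print(np.allclose(b_full, b_factor))     # True: the frame never needs storing
```

Applying the operator to the r columns of U once serves every frame that shares those factors, which is where the memory and compute savings come from.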
Many schemes [7], [8] estimate the temporal basis functions V from k-space navigators, followed by the estimation of the spatial factors U in (3). By contrast, the joint estimation of U and V from the measured data offers improved performance and reduces the need for specialized acquisition schemes with k-space navigators. The joint approaches pose the recovery of the signals from the undersampled measurements as:
{U*, V*} = arg min_{U, V} ||A(U V^T) − B||^2 + λ_1 R_1(U) + λ_2 R_2(V).    (5)
Here, B denotes the measured data and A denotes the overall measurement operator, obtained by applying A_i to the i-th column of its argument. We note that the representation in (3) is linear in U and V independently; the joint optimization in (5) can thus be viewed as a bilinear optimization. R_1 and R_2 are regularization functionals; depending on their specific form, one obtains different flavors of reconstruction algorithms.
Low-rank regularization [9]: Here, one would choose R_1(U) = ||U||_F^2 and R_2(V) = ||V||_F^2, which together promote a low-rank solution.
Blind compressed sensing [12]: Here, one would choose a sparsity prior R_1(U) = ||U||_ℓ1 on the coefficients and constrain the temporal basis functions (columns of V) to unit norm.
Smoothness regularization on manifolds (SToRM): The SToRM scheme also relies on a factorization as in (3), where R_1(U) penalizes the energy of the columns of U and V is obtained as the eigenvectors of the graph Laplacian matrix of the inter-frame similarity graph of the data. Both calibrated [34] and uncalibrated [17] formulations are available.
The performance of the above methods critically depends on the specific choice of the priors R_1 and R_2 used to estimate U and V.
C. Deep Image Prior (DIP)
The deep image prior approach has been introduced in inverse problems to exploit the structural bias of CNNs towards natural image structure rather than noise [19]. The regularized reconstruction of a static image from undersampled and noisy measurements is posed as
θ* = arg min_θ ||A(G_θ(z)) − b||^2,    (6)
where x = G_{θ*}(z) is the recovered image, generated by the CNN generator G_θ whose parameters are denoted by θ.
The constraint that the image is generated by a CNN provides implicit regularization, which facilitates the recovery of x in challenging inverse problems. Here, z is a random latent variable, which may or may not be optimized. The structural bias of untrained CNN generators towards natural images is exploited to yield good recovery. The above problem is often solved using stochastic gradient descent (SGD). A few iterations of SGD are observed to represent natural images reasonably well; however, provided the generator has sufficient capacity, it will eventually learn the noise in the measurements, resulting in poor reconstructions with more iterations. Early termination of SGD is therefore often used to regularize the recovery, although alternate approaches that avoid early-stopping strategies have also been introduced.
III. Deep Bi-Linear Representation (DEBLUR)
The main focus of this work is to develop a multidimensional image reconstruction algorithm that inherits the memory efficiency of bilinear models; we express the data as X = U V^T as in (3). Unlike current bilinear models that use energy and sparsity priors, we use CNN priors on the factors. In particular, we propose to represent the factors as
U = G_θ(U_0),    (7)
V = G_ϕ(Z), with Z = [z_1, z_2, …, z_N].    (8)
Here, G_θ and G_ϕ are CNN generators whose weights are denoted by θ and ϕ, respectively. The columns z_i of Z are time-dependent latent vectors that are also learned from the measured k-space data. U_0 is an initial approximate reconstruction, often obtained using a simple algorithm, which is refined by the generator; see Fig. 1 for details. We observe that feeding an initial reconstruction of the U factors to G_θ offers faster reconstructions compared to the DIP strategy of feeding noise to the image generator. Similar to DIP [19], we expect to capitalize on the structural bias of the CNNs towards smooth natural images.
Fig. 1.
Outline of the DEBLUR model. Two CNN generative networks, G_θ and G_ϕ, are used to generate the spatial (U) and temporal (V) factors, respectively. The Casorati matrix of the dataset is modeled as X = UV^T, where the columns of X correspond to the temporal frames. Rather than using random inputs as in [19], the generator G_θ is fed with the U factor matrix from SToRM, denoted by U_0. The inputs to the network G_ϕ are the latent vectors Z, which are also learned from the data. We expect the joint learning procedure to result in interpretable latent vectors, which capture the temporal motion components (e.g., cardiac and respiratory motion).
We propose to recover the factors by solving the optimization problem:
{θ*, ϕ*, Z*} = arg min_{θ, ϕ, Z} ||A(G_θ(U_0) G_ϕ(Z)^T) − B||^2,    (9)
which is solved using SGD. Note that we also optimize for the latent vectors Z.
Note that the recovery scheme in (9) recovers the factors U and V directly from the multi-channel measurements B. Once U and V are recovered, the ith frame of the time series can be generated as
x_i = U v_i = G_θ(U_0) v_i,    (10)
where v_i^T is the i-th row of V = G_ϕ(Z).
Similarly, one can obtain the measurements of the i-th frame as b_i = A_i(U v_i) during training, without having to evaluate the frames. This approach significantly reduces the memory demand and computational complexity of the algorithm, especially in applications involving multidimensional time series. As an example, in our case with 300 frames, we choose the number of columns in U and V as 30. Thus, our setting only requires the storage of 30 images and 30 1-D vectors. By contrast, unrolled approaches [23], [24], [27] require the storage of the entire time series and its intermediate copies at each unroll. With 10 unrolls, this would translate to the storage of 3000 images, roughly 100 times more than our setting. Unfortunately, these methods are not directly applicable in our setting.
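The memory argument can be made concrete with a quick back-of-the-envelope calculation, using the sizes quoted above (300 frames, rank-30 factors, 10 unrolls) and a hypothetical 256 × 256 complex-valued image grid:

```python
# Hypothetical sizes: 300 frames of 256 x 256 complex64 images, rank-30
# factors, and an unrolled network that stores the frames at each of 10 unrolls.
nframes, nx, ny, rank, unrolls = 300, 256, 256, 30, 10
bytes_per_value = 8                                        # complex64

factors = (nx * ny * rank + nframes * rank) * bytes_per_value   # U and V
unrolled = nframes * nx * ny * unrolls * bytes_per_value        # frames x unrolls

print(factors / 2**20)            # about 15 MiB for the factor representation
print(unrolled / 2**30)           # about 1.5 GiB for the unrolled scheme
print(round(unrolled / factors))  # roughly a 100-fold difference
```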
A. Unsupervised pretraining of the generators
The generators in DIP are usually initialized with random weights. Unlike past convex strategies, the CNN-based algorithm in (9) is not guaranteed to converge to the global minimum of the cost function; the final solution depends on the initialization. To improve performance and to reduce the computational complexity, we propose to initialize the G_θ and G_ϕ networks.
In this work, we chose the SToRM [15] derived basis vectors to initialize the network. In particular, SToRM performs kernel low-rank regularization of the k-space navigators to derive the temporal basis functions V_SToRM. Once the temporal basis functions are available, one can perform a fast and approximate reconstruction of the corresponding spatial factor U_SToRM from the density-compensated data, where D is a diagonal matrix corresponding to density compensation. Note that this approximation is far less computationally expensive (around 20 seconds) than the SToRM algorithm [15], which involves conjugate gradient iterations. We note that reconstructions from other approaches (e.g., low-rank and time-DIP [29]) can also be used to initialize the DEBLUR scheme. We use these fast estimates to pre-train the G_θ and G_ϕ generators independently as
θ_0 = arg min_θ ||G_θ(U_0) − U_SToRM||^2,    (11)
{ϕ_0, Z_0} = arg min_{ϕ, Z} ||G_ϕ(Z) − V_SToRM||^2.    (12)
These initial guesses of the network weights (θ, ϕ) and latent vectors Z are used as initialization in (9). We study the impact of the initialization on the algorithm in the results section. The input latent vectors were initialized randomly at the start of the pre-training phase.
B. Regularization penalties
The DIP approach, as well as our extension to the dynamic setting in (9), is vulnerable to noise overfitting. In particular, if the generator networks have sufficient capacity, they will learn the noise in the measurements when the number of iterations is large [19]. To further improve the robustness of the algorithm to noise, we propose to additionally penalize the ℓ1 norm of the network weights. The regularized recovery is posed as
{θ*, ϕ*, Z*} = arg min_{θ, ϕ, Z} ||A(G_θ(U_0) G_ϕ(Z)^T) − B||^2 + λ_1 ||θ||_ℓ1 + λ_2 ||ϕ||_ℓ1 + λ_3 ||∇_t Z||^2.    (13)
We note that the above optimization is performed for each dataset. We expect the weight-regularization strategy to learn networks with fewer non-zero weights, which translates to smaller capacity; this is expected to reduce the vulnerability of the network to overfitting. The sparsity of the weights of G_θ promotes the learning of local spatial factors similar to [10], which is more efficient than global low-rank minimization. We note that the use of sparsity regularization is qualitatively equivalent to drop-out approaches, which are widely used in CNN training instead of explicit regularization. Unlike the explicit definition of the blocks in [10], the definition of the local neighborhoods in the proposed scheme is more data-driven. The images in the time series often vary smoothly with time in dynamic imaging applications; the norm of the temporal derivatives of the images is a widely used prior. Our goal is to directly recover the compressed representation from the data without using the actual images. We note from (10) that the i-th image in the time series is dependent on the latent vector z_i. The last term in (13) is the squared norm of the temporal gradients of Z, which encourages the temporal smoothness of the time series. The algorithm in (13) is also initialized with the pretraining strategy discussed in the previous section. The impact of the regularization parameters and their ability to minimize overfitting is studied in the results section.
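The penalty terms in (13) are straightforward to compute; the sketch below evaluates them on hypothetical weight arrays and latent vectors (the data-consistency term is set to zero as a placeholder, since it requires the full measurement operator):

```python
import numpy as np

def l1(params):
    """l1 norm of a list of weight arrays (sparsity penalty on a generator)."""
    return float(sum(np.abs(w).sum() for w in params))

def temporal_smoothness(Z):
    """Squared norm of the temporal gradient of the latents, ||grad_t Z||^2."""
    return float(np.sum((Z[:, 1:] - Z[:, :-1]) ** 2))

# Hypothetical generator weights and 2-D latent vectors for 6 frames.
theta = [np.array([[0.5, -0.25], [0.0, 1.0]])]
phi = [np.array([1.0, -2.0])]
Z = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],   # slow, respiratory-like latent
              [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])  # fast, cardiac-like latent

lam1, lam2, lam3 = 1e-3, 1e-4, 1e2
data_term = 0.0  # placeholder for ||A(G_theta(U0) G_phi(Z)^T) - B||^2
loss = (data_term + lam1 * l1(theta) + lam2 * l1(phi)
        + lam3 * temporal_smoothness(Z))
print(loss)
```

In practice these terms are added to the data-consistency loss inside the SGD loop, with subgradients of the ℓ1 terms handled by the optimizer.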
IV. Implementation details
A. Data acquisition and post-processing
The experimental data was obtained using the FLASH sequence on a Siemens 1.5T scanner with 34 coil elements in total (body and spine coil arrays), in the free-breathing and ungated mode, from cardiac MRI patients (five patients with 10–14 slices each) with a scan time of 42 seconds per slice. The patients with cardiac abnormalities were recruited from those who were referred for routine clinical examinations. The protocol was approved by the Institutional Review Board (IRB) at the University of Iowa.
1). Pulse sequence details:
All datasets were acquired using a 34-channel cardiac array on Siemens Aera scanners at the University of Iowa. We used a radial GRE sequence with the following parameters: TR/TE 4.68/2.1 ms, FOV 300 mm, base resolution 256, slice thickness 8 mm. A temporal resolution of 46.8 ms was obtained by sampling 10 k-space spokes per frame. Each temporal frame was sampled by two k-space navigator spokes (out of the 10 spokes/frame), oriented at 0 and 90 degrees, respectively. The remaining spokes were chosen with a golden-angle view ordering. The scan parameters were kept the same across all patients. The subjects were asked to breathe freely, and the data was acquired in an ungated fashion. The complete data acquisition lasted 42 seconds for each slice. We extracted the data corresponding to the first 14 s of the 42 s acquisition from each slice to demonstrate the ability of the algorithm to reduce the data demand; the remaining 28 s of data is discarded in our approach. By contrast, the classical SToRM algorithm requires the data from the full 42 s acquisition to offer good recovery. We note that a cardiac cycle is less than 1 second in duration; a 14 s acquisition often contains 15–20 cardiac cycles and several respiration cycles.
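As a quick consistency check of the stated parameters (a sketch; the frame counts are approximate since they ignore any discarded transient data):

```python
tr_ms, spokes_per_frame = 4.68, 10
frame_ms = tr_ms * spokes_per_frame     # 46.8 ms temporal resolution, as stated
frames_14s = int(14_000 / frame_ms)     # about 299 frames from 14 s of data
frames_42s = int(42_000 / frame_ms)     # about 897 frames from the full 42 s
print(frame_ms, frames_14s, frames_42s)
```

This matches the roughly 300 frames per slice quoted earlier for the 14 s experiments.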
2). Coil selection and compression:
To improve the image reconstruction quality, we excluded the coils with low sensitivities in the region/slice of interest. We used an automatic coil selection algorithm to pick the five coil images that provided the best signal-to-noise ratio (SNR) in the heart region. Our experiments (not included in this paper) show that this coil selection has minimal impact on image quality. The main motivation for the selection was to reduce the memory requirement so that the data fit on our GPU device, which significantly reduced the computational complexity. All the results were generated using a single node of the high-performance Argon cluster at the University of Iowa, equipped with a Titan V GPU with 32 GB of memory.
3). Performance Metrics:
We used the following quantitative metrics to compare our method against the existing schemes:
- Signal to error ratio (SER):

  SER = 20 log10 ( ||x_orig|| / ||x_orig − x_recon|| ),    (14)

  where ||·|| denotes the ℓ2 norm, and x_orig and x_recon denote the original and the reconstructed images, respectively.
- Peak signal to noise ratio (PSNR):

  PSNR = 10 log10 ( max(|x_orig|)^2 / MSE ),    (15)

  where MSE denotes the mean squared error between x_orig and x_recon.
- Normalized high frequency error (HFEN) [35]: This measures the quality of fine features, edges, and spatial blurring in the images and is defined as:

  HFEN = ||LoG(x_recon) − LoG(x_orig)|| / ||LoG(x_orig)||,    (16)

  where LoG is a Laplacian of Gaussian filter that captures edges. We use the same filter specifications as Ravishankar et al. [35]: kernel size of 15 × 15 pixels, with a standard deviation of 1.5 pixels.
- Structural similarity index (SSIM): a perceptual metric introduced in [36], whose implementation is publicly available. We used the default contrast values and a Gaussian kernel of size 11 × 11 pixels with a standard deviation of 1.5 pixels.
BRISQUE is a referenceless measure of image quality, where a smaller score indicates better perceptual quality. BRISQUE estimates the score using a support vector regression (SVR) model with the help of an image database with distorted and undistorted images to derive the differential mean opinion score values. The distortions include compression artifacts, blurring, and additive noise [37].
Higher values of the above-mentioned performance metrics correspond to better reconstruction, except for HFEN and BRISQUE, where lower values are better.
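The first three metrics can be implemented compactly. The sketch below assumes the standard definitions; note that SciPy's `gaussian_laplace` sets the kernel extent through its `truncate` argument rather than an explicit 15 × 15 window, so it only approximates the filter specification quoted above:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def ser(ref, rec):
    """Signal-to-error ratio in dB (ratio of l2 norms, assumed definition)."""
    return 20 * np.log10(np.linalg.norm(ref) / np.linalg.norm(ref - rec))

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    return 10 * np.log10(np.max(np.abs(ref)) ** 2 / mse)

def hfen(ref, rec, sigma=1.5):
    """Normalized high-frequency error via a Laplacian-of-Gaussian filter."""
    log_ref = gaussian_laplace(ref, sigma)
    log_rec = gaussian_laplace(rec, sigma)
    return np.linalg.norm(log_rec - log_ref) / np.linalg.norm(log_ref)

rng = np.random.default_rng(3)
ref = rng.random((64, 64))
rec = ref + 0.01 * rng.standard_normal((64, 64))   # lightly corrupted copy
print(ser(ref, rec), psnr(ref, rec), hfen(ref, rec))
```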
4). State-of-the-art algorithms for comparison:
We compare the proposed scheme against the following algorithms:
SToRM [15]: The manifold Laplacian is estimated from the self-gating navigators acquired in k-space. Once the Laplacian matrix is obtained from the navigators, the high-resolution images are recovered using a kernel low-rank framework. The SToRM basis functions were estimated from the navigator data with 70 iterations. Following the estimation of the basis functions, the algorithm was run until convergence with a minimum of 60 iterations, as described in [15].
Low-Rank [7], [9]: The image time series is recovered by nuclear norm minimization, which models the images as points lying in a low-dimensional subspace.
B. Architecture of the generators
We refer to G_θ as the spatial generator; we use a four-layer CNN with ReLU activation. The number of channels of the input and output is equal to twice the number of basis functions, to account for the real and imaginary parts of the basis. In our work, we use 30 basis functions, and hence the number of input and output channels is 60. We refer to G_ϕ as the temporal generator, where we also use a four-layer CNN with ReLU activation. The inputs to the temporal generator are the latent vectors Z of dimension d × N, where d represents the latent dimension and N denotes the number of frames in the dynamic dataset. In our work, we observe that two-dimensional latent vectors are sufficient to obtain good reconstructions. The outputs of the temporal generator are the temporal basis functions V of dimension N × r, where r is the number of temporal basis functions in (3).
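The temporal generator can be sketched as follows. This is a hedged, real-valued NumPy stand-in for illustration only; the paper's generators are complex-valued CNNs trained with the objectives above, and the layer width and kernel size here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d, N, r, hidden = 2, 300, 30, 16        # latent dim, frames, basis count, width

def conv1d(x, W, b):
    """'Same'-padded 1-D convolution: x is (C_in, N), W is (C_out, C_in, k)."""
    out = np.zeros((W.shape[0], x.shape[1]))
    for o in range(W.shape[0]):
        for c in range(x.shape[0]):
            out[o] += np.convolve(x[c], W[o, c], mode="same")
        out[o] += b[o]
    return out

def make_layer(c_in, c_out, k=3):
    return 0.1 * rng.standard_normal((c_out, c_in, k)), np.zeros(c_out)

# Four convolution layers, with ReLU between them, as described above.
layers = [make_layer(d, hidden), make_layer(hidden, hidden),
          make_layer(hidden, hidden), make_layer(hidden, r)]

def G_phi(Z):
    x = Z
    for idx, (W, b) in enumerate(layers):
        x = conv1d(x, W, b)
        if idx < len(layers) - 1:
            x = np.maximum(x, 0.0)      # ReLU on all but the output layer
    return x.T                          # (N, r): the temporal basis functions V

Z = rng.standard_normal((d, N))         # stand-in for the learned latents
V = G_phi(Z)
print(V.shape)                          # (300, 30)
```

The convolution along the frame dimension is what couples neighboring time points, complementing the explicit temporal-smoothness penalty on Z.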
V. Experiments and Results
We describe the experiments and our observations in this section.
A. Impact of pretraining
To demonstrate the benefit of pre-training, we compare the algorithm described in (9) with random initialization and with the initialization scheme denoted in (11) & (12). Fig. 2(a) shows the plot of SER values with respect to the number of epochs. The colors of the borders of the images in Fig. 2(b)-(e) correspond to the colors of the markers in the plots, which denote the initial and maximum SER values. We note that the initialization with random weights starts with a low SER, as seen from Fig. 2(a), compared to the other initializations. As seen from the plots in (a), the DEBLUR approach with low-rank, SToRM, and time-DIP initializations converged rapidly to a peak value, while the one with random initialization converged to a poor solution with significantly more iterations. While SToRM initialization results in a slightly higher peak SER, we note that the reconstructions shown in (c)-(e) are qualitatively equivalent. The plots of the U and V basis functions shown in (f) and (g) show that the DEBLUR approach yielded similar final basis functions, irrespective of the specific initialization. We note that the computational complexity of the approximate low-rank and SToRM reconstructions is very low compared to the time-DIP approach. We use the initialization by SToRM basis vectors for the rest of the experiments considered in this work.
Fig. 2.
Impact of initialization of basis functions on image quality: (a) shows the SER curves of the DEBLUR scheme with random initialization, low-rank initialization, time-DIP initialization, and SToRM initialization. The images and time profiles corresponding to their peak values are shown in (b)-(e). Three spatial basis vectors of each method, both initial and learned, are shown in (f); the corresponding temporal basis vectors are shown in (g). These results show that the proposed method can offer qualitatively equivalent results when initialized using any of the existing algorithms. In particular, the spatial and temporal basis functions converge to the same ones, irrespective of the initialization. By contrast, the approach converges to a poor local minimum when initialized with random weights.
B. Impact of regularization parameters
We study the impact of the regularization priors in (13) in Figures 3, 4 and 5.
Fig. 3.
shows the SER curves with different λ1 values in the DEBLUR method in (13); λ1 regularizes the generator used to generate U. The un-regularized setting is denoted by the blue curve, which is a zoomed version of the red curve in Fig. 2. We note that higher regularization parameters control the decay of SER with iterations, which indicates improved generalization of the network to unseen k-space samples. The best performance is achieved with λ1 = 10^−3. Larger regularization parameters (e.g., λ1 = 10^−2, denoted by the red curve) translate to slight oversmoothing of the spatial factors.
Fig. 4.
shows the SER curves with different λ2 values in the DEBLUR method; λ2 regularizes the V network parameters. A lower λ2 value allows the network to learn noisy temporal information in the data, as illustrated by the yellow curve. A higher value of λ2 oversmooths the temporal basis, as depicted by the orange curve. Empirical findings show that λ2 = 10^−4 gives better image SER than higher and lower values. Moreover, the performance does not deteriorate if the algorithm runs for more epochs, as shown by the purple curve.
Fig. 5.
shows the latent vectors with different λ3 values in the DEBLUR method; λ3 applies temporal smoothness to the latent vectors to achieve meaningful cardiac and respiratory motion. A lower λ3 gives noisy latent vectors, as illustrated by λ3 = 0. A higher value of λ3 oversmooths the latent vectors, as shown by λ3 = 10^3. Empirical findings show that λ3 = 10^2 gives the best separation of the latent vectors that represent cardiac and respiratory motion. The SER changes between the different values of λ3 are marginal. If no regularization is added, the latent vectors will capture alias artifacts resulting from undersampling. Since cardiac and respiratory motion induced changes occur on a slower time-scale, we impose a temporal regularization to encourage the latent vectors to capture physiologically relevant parameters rather than noise. In our current work, the main focus is on the accurate recovery of the images rather than disentangling the different latent variables.
1). Regularization of U network parameters:
We initialize the network with the SToRM initialization described by (11) & (12). We set λ2 = λ3 = 0 and consider four different λ1 values in this study. We note from Fig. 3 that the network with λ1 = 0 results in the performance peaking after a few iterations, similar to the case in Fig. 2. With more iterations, the SER drops because of the overfitting to noise. By contrast, we observe that λ1 = 10^−3 results in saturating performance, which is around 2.5 dB superior to the peak performance obtained with λ1 = 0. We also observe that λ1 values that are slightly higher or lower than the optimal value result in similar performance, which indicates that the network is not very sensitive to the exact parameter value.
2). Regularization of V network parameters:
In Fig. 4, we study the impact of λ2 on the results. We keep the best value λ1 = 10^−3 from Fig. 3 and set λ3 = 0. We vary λ2 in this study and plot the change in SER with iterations. We note that λ2 = 10^−4 offers the best final performance, resulting in around 0.2 dB improvement over λ2 = 0. We note that the joint optimization of the V and U networks results in around 8–9 dB improvement in performance over the SToRM initialization. The networks learned by DEBLUR result in a bilinear representation that is better suited to the data than the classical bilinear methods.
3). Regularization of latent vectors Z:
In Fig. 5, we study the impact of λ3 on the results. We keep λ1 = 10^−3 and λ2 = 10^−4, which were the best values determined in the previous subsections. We consider different values of λ3 and plot the corresponding latent vectors. We observe that the latent vector regularization had a marginal impact on the SER. We note that the optimal value is λ3 = 10^2; λ3 = 0 resulted in noisy latent vectors, while λ3 = 10^2 yielded good reconstructions.
4). Benefits of using latent vectors to generate temporal basis:
We note from Fig. 5 and Fig. 6 that image quality is not very sensitive to the λ3 parameter. The V basis functions are derived as a non-linear function of Z. When λ3 → 0, G_ϕ learns to generate the basis functions from noise-like latent vectors. We note that cardiac and respiratory motion is expected to be smooth. With smoothness regularization (increasing λ3), the latent vectors capture physiologically relevant parameters. In the datasets we considered, we note that the latent vectors separate into a fast-changing latent vector that captures cardiac motion and a slow one that captures respiratory motion. Post-reconstruction, the latent variables may be used to bin the images into different cardiac and respiratory states. The top row of Fig. 6 corresponds to the latent vectors and reconstructions from 14 s of data. The red box (corresponding to the red lines in the latent vector plot in (a)) corresponds to an image frame in the systole phase and expiration state, while the blue box corresponds to an image in the diastole phase and expiration state. Fig. 6(c) shows the latent vectors estimated from 28 s of data. The results show that a similar decomposition of the latent vectors can be obtained from more data, with a moderate improvement in image quality.
Fig. 6.
Comparison of the DEBLUR method using 14 s and 28 s of acquired data. Fig 6(a) shows the two latent vectors of 14 s data, where the blue vector represents respiratory motion and the red vector represents cardiac motion in the data. Images at different time points, indicated by color dots, are shown in Fig 6(b). It also shows the ability of our latent vector-based approach to capture images at different time instants in a series. For example, the red dot in Fig 6(a) captures the image at the systole phase and at the expiration state. We have also compared 14 s data with 28 s data to show the improvement in the performance of the DEBLUR method. Differences in image quality with less and more data are subtle in the diastole phase but become prominent in the systole phase, as indicated by the red arrows.
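The binning idea mentioned above can be sketched with synthetic latent signals (illustrative frequencies only; the actual latent vectors are learned from the data, not sinusoids):

```python
import numpy as np

nframes = 200
t = np.arange(nframes) * 0.0468                # 46.8 ms per frame
z_resp = np.sin(2 * np.pi * 0.25 * t)          # slow, respiratory-like latent
z_card = np.sin(2 * np.pi * 1.2 * t)           # fast, cardiac-like latent

# Quantize each latent value to bin every frame into a joint
# (cardiac phase, respiratory phase) state.
card_bins = np.digitize(z_card, np.linspace(-1, 1, 5)[1:-1])
resp_bins = np.digitize(z_resp, np.linspace(-1, 1, 3)[1:-1])

# E.g., collect all frames that share one cardiac and one respiratory bin.
selected = np.where((resp_bins == 0) & (card_bins == 0))[0]
print(len(selected), "frames share this cardiac/respiratory state")
```

Frames falling in the same joint bin correspond to the same physiological state and could be grouped post-reconstruction, as suggested above.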
C. Comparison with state-of-the-art methods
We compare the proposed DEBLUR method with the SToRM and low-rank reconstructions. Fig. 7 shows a comparison between the DEBLUR and SToRM methods on five different datasets. We consider recovery from 42 s and from 14 s of data.
Fig. 7.
Comparison of the DEBLUR and SToRM methods using 14 s and 42 s of acquired data. We use five different datasets to compare the performance of the DEBLUR and SToRM methods. With less data (14 s), the images are less sharp than with 42 s of data. However, DEBLUR(14s) gives better image quality than SToRM(14s). DEBLUR(42s) achieves better image contrast with enhanced features, as indicated by the red arrows in the figure.
We observe that DEBLUR(42s) visually offers similar or improved image quality compared to SToRM(42s). However, the SToRM approach results in significant blurring and degradation in image quality when only 14 s of data is available. Both SToRM reconstructions were iterated until convergence, as reported in [15]. We note that the SToRM approach in our earlier study relied on 42 s of data [15]; the reduced data is the main reason for the blurring seen in the SToRM(14s) reconstructions. The DEBLUR reconstructions are observed to minimize this blurring, yielding results comparable to the 42 s reconstructions. The improved performance may be attributed to the spatial and temporal regularization of the factors offered by DEBLUR.
Since ground truth is not available, we use the SToRM(42s) reconstructions as the reference for the SER and SSIM comparisons in Table I. We also compare the methods using the BRISQUE score in Tables I and II, where Table II reports the BRISQUE scores of the individual datasets.
TABLE I.
Quantitative comparison of the DEBLUR method with state-of-the-art methods.
| Metric | Low-Rank | SToRM(14s) | DEBLUR(14s) |
|---|---|---|---|
| SER | 18.93 ± 0.48 | 22.41 ± 0.78 | 35.99 ± 4.98 |
| SSIM | 0.38 ± 0.02 | 0.59 ± 0.04 | 0.96 ± 0.04 |
| HFEN | 1.23 ± 0.02 | 0.82 ± 0.05 | 0.18 ± 0.13 |
| BRISQUE | 40.82 ± 2.1 | 32.88 ± 4.3 | 26.83 ± 5.81 |
TABLE II.
Performance comparison of the DEBLUR method using the BRISQUE score on multiple datasets. We also show the benefit of using more acquired data. The corresponding images are shown in Fig. 7.
| Method | Data1 | Data2 | Data3 | Data4 | Data5 |
|---|---|---|---|---|---|
| DEBLUR(42s) | 28.30 ± 3.7 | 30.33 ± 3.3 | 29.90 ± 3.8 | 28.11 ± 3.7 | 17.53 ± 5.0 |
| SToRM(42s) | 34.44 ± 3.9 | 26.0 ± 2.9 | 29.96 ± 4.3 | 34.0 ± 5.0 | 17.32 ± 5.4 |
| DEBLUR(14s) | 29.5 ± 5.3 | 25.94 ± 3.4 | 28.61 ± 4.1 | 32.87 ± 5.5 | 17.28 ±5.2 |
| SToRM(14s) | 36.26 ± 0.9 | 32.63 ± 1.3 | 35.72 ± 3.5 | 34.57 ± 0.8 | 25.42 ± 2.3 |
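The SER values above measure fidelity to the SToRM(42s) reference. A minimal sketch, assuming the common definition SER = 20 log10(‖ref‖ / ‖ref − rec‖); the exact definition used for Table I may differ:

```python
import numpy as np

def ser_db(ref, rec):
    """Signal-to-error ratio in dB between a reference and a
    reconstructed image series (higher is better)."""
    return 20 * np.log10(np.linalg.norm(ref) / np.linalg.norm(ref - rec))
```

Under this definition, a reconstruction whose residual is one tenth of the reference energy scores 20 dB, which matches the scale of the values reported in Table I.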
In Fig. 8, we compare the methods that recover the images from 14 s of data against SToRM reconstructions from 42 s. Two frames (end-diastole and end-systole) from each method are shown, along with error maps, in Fig. 8(a); time profiles are shown in Fig. 8(b). We note that DEBLUR provides the best spatial and temporal quality and improved details, comparable to the SToRM reconstructions from 42 s.
Fig. 8.
Comparison of the DEBLUR(14s) method with state-of-the-art methods. Since ground truth is not available, SToRM(42s) is used as the reference. (a) Two frames (diastole and systole) from each method, along with their error maps, for spatial comparison. (b) Time profiles.
VI. Discussion
The experiment in Section V-A clearly shows the benefit of initializing the network parameters using (11) and (12), respectively. As seen in Fig. 2, the optimization process offers around an 8–9 dB improvement in performance over the SToRM initialization. We note that the proposed framework of recovering the bilinear representation from 10 spokes/frame is significantly more challenging than the traditional DIP strategy in [19]; a reasonable initialization of the network offers improved performance compared to random initialization. Fig. 2 also shows the need for early stopping in the unregularized setting of (9). In particular, the performance of the algorithm degrades with increasing iterations, indicating overfitting to the noise in the k-space measurements.
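Early stopping of the unregularized setting can be realized by monitoring the fit to a small held-out subset of the measured k-space samples and stopping once it no longer improves. The patience-based scheme below is an illustrative sketch, not the exact stopping rule used in our experiments:

```python
def early_stopping_run(step_fn, holdout_loss_fn, max_iters=5000, patience=50):
    """Iterate step_fn; stop once holdout_loss_fn has not improved for
    `patience` consecutive iterations, signaling the onset of overfitting."""
    best, best_iter = float("inf"), 0
    for it in range(max_iters):
        step_fn()                  # one optimization step on the measured data
        loss = holdout_loss_fn()   # error on held-out k-space samples
        if loss < best:
            best, best_iter = loss, it
        elif it - best_iter >= patience:
            break                  # no recent improvement: stop early
    return best_iter
```

The returned iteration index marks the point of best generalization to the unobserved samples.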
The experiments in Section V-B show the benefit of regularizing the generator parameters while fitting to undersampled data. We observe from Figs. 3 and 4 that regularizing the network parameters improves the generalization of the network to unobserved k-space samples. Specifically, the network parameters are learned from few measured k-space samples; the regularization of these networks reduces overfitting and thus mitigates the degradation in performance with iterations.
In this work, we have only used smoothness regularization on the latent vectors Z. With smoothness regularization, the latent vectors are observed to learn the temporal variations seen in the data, including cardiac and respiratory motion. Interpretable latent vectors can aid in the visualization of the results, as showcased in Fig. 6. In particular, the slow-changing latent vector captures the respiratory motion, while the fast one captures the cardiac motion; the data can then be sorted into the respective phases based on the latent vectors. We note that disentangling the latent variables is an active research area [38], [39]. The use of KL divergence priors to encourage the latent vectors to follow specific probability distributions may offer improved disentanglement of the latent variables. For instance, such a prior may reduce the slight variations in the cardiac latent vectors resulting from respiratory motion seen in Fig. 6(a). This will be a focus of our future work; the main focus of this work is on obtaining good image quality rather than deriving interpretable representations.
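The smoothness prior on the latent vectors penalizes frame-to-frame differences. A sketch of the penalty λ3‖∇t Z‖² and its gradient, for a latent matrix Z of shape (frames, latent_dim); the discretization details are assumptions:

```python
import numpy as np

def smoothness_penalty(Z, lam3):
    """lam3 times the sum of squared temporal differences of the latents."""
    d = np.diff(Z, axis=0)          # d[t] = Z[t+1] - Z[t]
    return lam3 * np.sum(d ** 2)

def smoothness_grad(Z, lam3):
    """Gradient of the penalty w.r.t. Z: a discrete Laplacian in time."""
    g = np.zeros_like(Z)
    d = np.diff(Z, axis=0)
    g[:-1] -= 2 * lam3 * d          # each difference pulls its left endpoint up
    g[1:] += 2 * lam3 * d           # and its right endpoint down
    return g
```

Increasing lam3 pushes adjacent latent vectors together, which is what drives the separation into smooth cardiac and respiratory components described above.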
The comparison of the proposed scheme with the classical approaches shows that the proposed DEBLUR(42s) can offer comparable performance to SToRM(42s), while the proposed scheme from 14 s of data significantly outperforms the SToRM(14s) approach.
The main challenge associated with the proposed scheme is the high computational complexity of the algorithm during inference. Unlike current supervised deep learning strategies that do not perform any training during inference, the network parameters θ and ϕ are estimated during inference. The computational complexity of the algorithm is hence closer to that of traditional factor-based methods, which estimate the factors using iterative optimization for each dataset: the low-rank method takes 58 minutes on a CPU, while on a GPU, SToRM takes 12 minutes and DEBLUR takes roughly 10 minutes. While this approach is ideally suited for applications with limited datasets for supervised training, there is currently no systematic way to learn prior information from other datasets. In the future, we plan to address the unsupervised learning of the network from the measured k-t space data of multiple datasets, as in [40]. If successful, this strategy eliminates the need for fully sampled data and is expected to considerably reduce the reconstruction time. The network parameters may then be fine-tuned using the data of each subject, as considered in this work, to further improve performance.
VII. Conclusion
We introduced a deep unsupervised bilinear algorithm to reconstruct dynamic MRI from undersampled measurements. We represented the spatial and temporal factors using two CNN-based generators, which are learned from the undersampled k-space data of each subject. Initializing the network weights using an existing bilinear model (SToRM) is observed to both reduce the run time and offer improved performance compared to initialization with random weights. The weights of the networks are regularized with an l1 penalty to minimize overfitting of the network to the noise in the measurements; this weight regularization is observed to minimize the degradation in performance with iterations. The implicit and learned regularization offered by the proposed scheme yields improved image quality compared to current methods, especially when recovering the data from shorter acquisitions. The proposed scheme directly learns a compressed image representation from the measured data, making it considerably more memory efficient than current approaches; this efficiency makes it readily applicable to large-scale dynamic imaging (e.g., 3D+time) applications. In addition, the unsupervised strategy eliminates the need for fully sampled training data, which is often not available in large-scale imaging problems.
VIII. COMPLIANCE WITH ETHICAL STANDARDS
This research study was conducted using human subject data. The Institutional Review Board at the local institution approved the acquisition of the data, and written consent was obtained from the subject.
IX. ACKNOWLEDGMENTS
This work was supported by NIH grant 1R01EB019961. The authors declare that there is no conflict of interest.
Contributor Information
Abdul Haseeb Ahmed, Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242 USA.
Qing Zou, Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242 USA.
Prashant Nagpal, Department of Radiology, The University of Iowa, Iowa City, IA 52242 USA.
Mathews Jacob, Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242 USA.
References
- [1].Gay SB, Sistrom CL, Holder CA, and Suratt PM, "Breath-holding capability of adults: implications for spiral computed tomography, fast-acquisition magnetic resonance imaging, and angiography," Investigative Radiology, vol. 29, no. 9, pp. 848–851, 1994.
- [2].Liang Z-P, Jiang H, Hess CP, and Lauterbur PC, "Dynamic imaging by model estimation," International Journal of Imaging Systems and Technology, vol. 8, no. 6, pp. 551–557, 1997.
- [3].Sharif B, Derbyshire JA, Faranesh AZ, and Bresler Y, "Patient-adaptive reconstruction and acquisition in dynamic imaging with sensitivity encoding (paradise)," Magnetic Resonance in Medicine, vol. 64, no. 2, pp. 501–513, 2010.
- [4].Tsao J, Boesiger P, and Pruessmann KP, "k-t blast and k-t sense: Dynamic mri with high frame rate exploiting spatiotemporal correlations," Magnetic Resonance in Medicine, vol. 50, no. 5, pp. 1031–1042, 2003.
- [5].Lustig M, Santos JM, Donoho DL, and Pauly JM, "kt sparse: High frame rate dynamic mri exploiting spatio-temporal sparsity," in Proceedings of the 13th Annual Meeting of ISMRM, Seattle, vol. 2420, 2006.
- [6].Feng L, Axel L, Chandarana H, Block KT, Sodickson DK, and Otazo R, "Xd-grasp: golden-angle radial mri with reconstruction of extra motion-state dimensions using compressed sensing," Magnetic Resonance in Medicine, vol. 75, no. 2, pp. 775–788, 2016.
- [7].Zhao B, Haldar JP, Christodoulou AG, and Liang Z-P, "Image reconstruction from highly undersampled (k, t)-space data with joint partial separability and sparsity constraints," IEEE Transactions on Medical Imaging, vol. 31, no. 9, pp. 1809–1820, 2012.
- [8].Jung H, Sung K, Nayak KS, Kim EY, and Ye JC, "k-t focuss: a general compressed sensing framework for high resolution dynamic mri," Magnetic Resonance in Medicine, vol. 61, no. 1, pp. 103–116, 2009.
- [9].Lingala SG, Hu Y, DiBella E, and Jacob M, "Accelerated dynamic mri exploiting sparsity and low-rank structure: kt slr," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1042–1054, 2011.
- [10].Ong F, Zhu X, Cheng JY, Johnson KM, Larson PE, Vasanawala SS, and Lustig M, "Extreme mri: Large-scale volumetric dynamic imaging from continuous non-gated acquisitions," Magnetic Resonance in Medicine, vol. 84, no. 4, pp. 1763–1780, 2020.
- [11].Otazo R, Candès E, and Sodickson DK, "Low-rank plus sparse matrix decomposition for accelerated dynamic mri with separation of background and dynamic components," Magnetic Resonance in Medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
- [12].Lingala SG and Jacob M, "Blind compressive sensing dynamic mri," IEEE Transactions on Medical Imaging, vol. 32, no. 6, pp. 1132–1145, 2013.
- [13].Asif MS, Hamilton L, Brummer M, and Romberg J, "Motion-adaptive spatio-temporal regularization for accelerated dynamic mri," Magnetic Resonance in Medicine, vol. 70, no. 3, pp. 800–812, 2013.
- [14].Lingala SG, DiBella E, and Jacob M, "Deformation corrected compressed sensing (dc-cs): a novel framework for accelerated dynamic mri," IEEE Transactions on Medical Imaging, vol. 34, no. 1, pp. 72–85, 2014.
- [15].Poddar S, Mohsin Y, Ansah D, Thattaliyath B, Ashwath R, and Jacob M, "Manifold recovery using kernel low-rank regularization: application to dynamic imaging," IEEE Transactions on Computational Imaging, 2019.
- [16].Nakarmi U, Wang Y, Lyu J, Liang D, and Ying L, "A kernel-based low-rank (klr) model for low-dimensional manifold recovery in highly accelerated dynamic mri," IEEE Transactions on Medical Imaging, vol. 36, no. 11, pp. 2297–2307, 2017.
- [17].Ahmed AH, Zhou R, Yang Y, Nagpal P, Salerno M, and Jacob M, "Free-breathing and ungated dynamic mri using navigator-less spiral storm," IEEE Transactions on Medical Imaging, vol. 39, no. 12, pp. 3933–3943, 2020.
- [18].Jiang W, Ong F, Johnson KM, Nagle SK, Hope TA, Lustig M, and Larson PE, "Motion robust high resolution 3d free-breathing pulmonary mri using dynamic 3d image self-navigator," Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 2954–2967, 2018.
- [19].Ulyanov D, Vedaldi A, and Lempitsky V, "Deep image prior," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9446–9454.
- [20].Hammernik K, Klatzer T, Kobler E, Recht MP, Sodickson DK, Pock T, and Knoll F, "Learning a variational network for reconstruction of accelerated mri data," Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071, 2018.
- [21].Schlemper J, Caballero J, Hajnal JV, Price AN, and Rueckert D, "A deep cascade of convolutional neural networks for dynamic mr image reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 491–503, 2017.
- [22].Aggarwal HK, Mani MP, and Jacob M, "Modl: Model-based deep learning architecture for inverse problems," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394–405, 2018.
- [23].Sandino C, Ong F, and Vasanawala S, "Deep subspace learning: Enhancing speed and scalability of deep learning-based reconstruction of dynamic imaging data," in Proceedings of the 2020 Annual Meeting of ISMRM, vol. 0599, 2020.
- [24].Huang W, Ke Z, Cui Z-X, Cheng J, Qiu Z, Jia S, Ying L, Zhu Y, and Liang D, "Deep low-rank plus sparse network for dynamic mr imaging," Medical Image Analysis, vol. 73, p. 102190, 2021.
- [25].Kustner T, Fuin N, Hammernik K, Bustin A, Qi H, Hajhosseiny R, Masci PG, Neji R, Rueckert D, Botnar RM et al., "Cinenet: deep learning-based 3d cardiac cine mri reconstruction with multicoil complex-valued 4d spatio-temporal convolutions," Scientific Reports, vol. 10, no. 1, pp. 1–13, 2020.
- [26].Sandino CM, Lai P, Vasanawala SS, and Cheng JY, "Accelerating cardiac cine mri using a deep learning-based espirit reconstruction," Magnetic Resonance in Medicine, 2020.
- [27].Biswas S, Aggarwal HK, and Jacob M, "Dynamic mri using model-based deep learning and storm priors: Modl-storm," Magnetic Resonance in Medicine, vol. 82, no. 1, pp. 485–494, 2019.
- [28].Ahmed AH, Nagpal P, Kruger S, and Jacob M, "Dynamic imaging using deep bilinear unsupervised learning (deblur)," in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021, pp. 1099–1102.
- [29].Yoo J, Jin KH, Gupta H, Yerly J, Stuber M, and Unser M, "Time-dependent deep image prior for dynamic mri," IEEE Transactions on Medical Imaging, 2021.
- [30].Zhao B, Lam F, and Liang Z-P, "Model-based mr parameter mapping with sparsity constraints: parameter estimation and performance bounds," IEEE Transactions on Medical Imaging, vol. 33, no. 9, pp. 1832–1844, 2014.
- [31].Bhave S, Lingala SG, Johnson CP, Magnotta VA, and Jacob M, "Accelerated whole-brain multi-parameter mapping using blind compressed sensing," Magnetic Resonance in Medicine, vol. 75, no. 3, pp. 1175–1186, 2016.
- [32].Lam F and Liang Z-P, "A subspace approach to high-resolution spectroscopic imaging," Magnetic Resonance in Medicine, vol. 71, no. 4, pp. 1349–1357, 2014.
- [33].Bhattacharya I and Jacob M, "Compartmentalized low-rank recovery for high-resolution lipid unsuppressed mrsi," Magnetic Resonance in Medicine, vol. 78, no. 4, pp. 1267–1280, 2017.
- [34].Poddar S and Jacob M, "Dynamic mri using smoothness regularization on manifolds (storm)," IEEE Transactions on Medical Imaging, vol. 35, no. 4, pp. 1106–1115, 2015.
- [35].Ravishankar S and Bresler Y, "Mr image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
- [36].Wang Z, Bovik AC, Sheikh HR, and Simoncelli EP, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [37].Mittal A, Moorthy AK, and Bovik AC, "No-reference image quality assessment in the spatial domain," IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012.
- [38].Burgess CP, Higgins I, Pal A, Matthey L, Watters N, Desjardins G, and Lerchner A, "Understanding disentangling in β-vae," arXiv preprint arXiv:1804.03599, 2018.
- [39].Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, and Lerchner A, "beta-vae: Learning basic visual concepts with a constrained variational framework," 2016.
- [40].Aggarwal HK, Pramanik A, and Jacob M, "Ensure: A general approach for unsupervised training of deep image reconstruction algorithms," arXiv preprint arXiv:2010.10631, 2020.