Abstract
Model-based methods are widely used for reconstruction in compressed sensing (CS) magnetic resonance imaging (MRI), using regularizers to describe the images of interest. The reconstruction process is equivalent to solving a composite optimization problem. Accelerated proximal methods (APMs) are very popular approaches for such problems. This paper proposes a complex quasi-Newton proximal method (CQNPM) for the wavelet and total variation based CS MRI reconstruction. Compared with APMs, CQNPM requires fewer iterations to converge but needs to compute a more challenging proximal mapping called weighted proximal mapping (WPM). To make CQNPM more practical, we propose efficient methods to solve the related WPM. Numerical experiments on reconstructing non-Cartesian MRI data demonstrate the effectiveness and efficiency of CQNPM.
Index Terms—Compressed sensing, magnetic resonance imaging (MRI), non-Cartesian trajectory, sparsity, wavelets, total variation, second-order
I. Introduction
Magnetic resonance imaging (MRI) scanners acquire samples of the Fourier transform (known as k-space data) of the image of interest. However, MRI is slow since the speed of acquiring k-space data is limited by many constraints, e.g., hardware, physics, and physiology. Improving the acquisition speed is crucial for many MRI applications. Lustig et al. [1] proposed a technique called compressed sensing (CS) MRI that improves the imaging speed significantly. CS MRI allows one to recover an image of interest from undersampled data by solving the following composite optimization problem:
| (1) |
where denotes the forward model describing a mapping from the latent image to the acquired k-space data is the regularizer that provides some prior assumptions about denotes the number of coils, and is a tradeoff parameter to balance and . We note that consists of different submatrices for , where denotes the downsampling mask, represents the nonuniform fast Fourier transform that depends on the sampling trajectory, and is a diagonal matrix involving the sensitivity map for the coil which differs for each scan.
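As an illustration of the multi-coil forward model described above, the sketch below implements a simplified Cartesian version in numpy: per-coil sensitivity weighting, a unitary FFT, and a sampling mask. The paper's actual forward model uses a NUFFT for non-Cartesian trajectories, and the function names here are our own.

```python
import numpy as np

def forward_model(x, maps, mask):
    """Simplified multi-coil CS MRI forward model: per coil c, M F S_c x.

    x    : (H, W) complex image
    maps : (C, H, W) coil sensitivity maps S_c
    mask : (H, W) binary k-space sampling mask M
    Returns (C, H, W) undersampled k-space data.
    Note: a masked Cartesian FFT is used here only as a stand-in for the
    non-Cartesian NUFFT used in the paper.
    """
    coil_imgs = maps * x[None, ...]                # S_c x
    kspace = np.fft.fft2(coil_imgs, norm="ortho")  # F S_c x (unitary FFT)
    return mask[None, ...] * kspace                # M F S_c x

def adjoint_model(y, maps, mask):
    """Adjoint of forward_model: sum_c S_c^H F^H M y, needed for gradient steps."""
    coil_imgs = np.fft.ifft2(mask[None, ...] * y, norm="ortho")
    return np.sum(np.conj(maps) * coil_imgs, axis=0)
```

The adjoint pair is the building block of the data-fidelity gradient; a quick inner-product test verifies the pairing.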
Sparsity plays a key role in the success of CS MRI. In general, MR images are not sparse but they can be sparsely represented under some transforms, e.g., total variation (TV) [1], wavelets [2], and transform-learning [3]. Recently, more advanced priors or frameworks were introduced for CS MRI reconstruction, such as low-rank [4], plug and play [5, 6], model-based deep learning [7], and score-based generative models [8], to name a few. Although deep learning based reconstruction methods have shown better performance than classical priors like TV and wavelets when trained with sufficient data, Gu et al. [9] recently found that suitably trained wavelet regularizers can also achieve comparable performance, demonstrating the power of the classical regularizers. Following the work of Lustig et al. [1], we consider both wavelet and TV regularizers for CS MRI reconstruction, i.e., we address the following composite minimization problem for image reconstruction in CS MRI:
| (2) |
where and denote a general wavelet transform and norm, represents the TV function (see definition in Section II-B), and is used to balance the wavelet and TV terms. For (respectively, ), (2) becomes the wavelet (respectively, TV) based CS MRI reconstruction. Since and TV functions are nonsmooth, accelerated proximal methods (APMs) [10], which have the optimal convergence rate where is the number of iterations, are very popular algorithms for (2). In [11], Beck et al. proposed a fast iterative shrinkage-thresholding algorithm (FISTA) (a specific type of APMs) for wavelet-based image reconstruction and showed a closed-form solution for the related proximal mapping [10]. Beck et al. [12] extended FISTA to solve TV-based image reconstruction and suggested a fast dual gradient descent method to compute the proximal mapping. Primal-dual methods [13] are also appealing methods for composite problems. The work in [14] showed that primal-dual methods can also achieve the optimal convergence rate and showed their connection to proximal methods. However, primal-dual methods have to tune parameters that affect the practical convergence rates and such tuning is nontrivial. For a review of different variants of primal-dual methods, see [15]. For more optimization methods and the use of different regularizers for reconstruction in CS MRI, see [16].
Modern MR images are typically acquired using multiple receiver coils and non-Cartesian trajectories, resulting in an expensive forward process from the image domain to k-space and an ill-conditioned or under-determined [16]. An ill-conditioned can lead to slow reconstruction [17]. To accelerate the recovery process, some preconditioning techniques have been introduced. In [18], Ong et al. proposed a diagonal matrix as a preconditioner such that they solved the following problem instead of (1):
| (3) |
Recently, Iyer et al. [17] developed more effective polynomial preconditioners than , based on Chebyshev polynomials. Although [17] showed promising results for practical reconstruction, adding such a preconditioner changes the incoherence of , which breaks the original theoretical guarantee. For , both wavelet and TV are used as regularizers. When there are two nonsmooth terms, the alternating direction method of multipliers (ADMM) [19] is one of the appealing approaches. However, ADMM attains only a linear convergence rate [20] and its per-iteration cost is high because each iteration requires solving a least-squares problem. In [21–23], the authors proposed several preconditioning methods to solve that least-squares problem quickly, which reduces the computation time of the whole reconstruction significantly.
Similar to quasi-Newton methods for smooth minimization problems [24], the authors in [25, 26] developed quasi-Newton proximal methods (QNPMs) that use second-order information for solving composite problems when . Compared with APMs, QNPMs need fewer iterations to converge, which is appealing when computing the gradient is expensive. Indeed, the authors in [27–29] applied QNPMs to the RED model and to TV-based inverse scattering and X-ray reconstruction, and observed faster convergence than APMs. However, QNPMs require computing a weighted proximal mapping (WPM), defined in (6), which needs more computation than the proximal mapping in APMs, so QNPMs are often impractical for real applications. To compute the WPM, Kadu et al. [28] applied primal-dual methods. Alternatively, Ge et al. [29] treated the WPM as a TV-based image deblurring problem and computed the WPM with APMs [12]. Those methods require inner and outer (i.e., two layers of) iterations to compute the WPM, making them inefficient. Similar to QNPMs, the variable metric operator splitting methods (VMOSMs) [30] introduce new metrics to accelerate proximal methods. For a discussion of the differences between QNPMs and VMOSMs, see the prior work section in [26].
The contribution of this paper is twofold. First, we extend QNPMs to address (1) for complex (recall that reconstructed MRI images are inherently complex [31] and in some applications the image phase itself is useful, e.g., high-field MRI [32] and quantitative susceptibility mapping [33]). This is achieved by introducing a symmetric rank-1 method in the complex plane to approximate the Hessian matrix of , yielding what we call complex quasi-Newton proximal methods (CQNPMs). Second, we propose efficient approaches to compute the WPM. Notably, the computational cost of CQNPMs aligns closely with that of the proximal mapping in APMs for wavelet and/or TV-based reconstructions. Our numerical experiments on wavelet and TV based CS MRI reconstruction show that CQNPMs converge faster than APMs in terms of iterations and CPU time, demonstrating the potential advantage of CQNPMs for practical applications.
The rest of this paper is organized as follows. Section II first defines some notation and then reviews the formulation of the discretized TV function and the definition of WPM. Section III derives our algorithm. Section IV reports numerical experiments on the wavelet and TV based CS MRI reconstruction. Section V presents some conclusions and future work.
II. Preliminaries
This section first defines some notation that simplifies the following discussion and then describes the discretized TV functions. Finally, we define the WPM that generalizes the well-known proximal mapping.
A. Notation
Denote by the matrix form of with relation and where denotes a column-stacking operator and is an operator to reshape a vector to its matrix form.
The (respectively, ) element of a matrix (respectively, vector ) is represented as (respectively, ).
- denotes the set of matrix-pairs where and satisfy
is the set of matrix-pairs where and satisfy .
is the set of vectors such that .
denotes a linear operator that satisfies
-
where we assume that .
The adjoint operator of is
-
where and are the matrix pairs that satisfy
B. Discretized Total Variation
Assuming zero Neumann boundary conditions for an image , i.e.,
the isotropic and anisotropic TV functions are defined as follows
| (4) |
and
| (5) |
respectively. Hereafter, we use to represent either or .
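The two discretizations above can be sketched in numpy as follows; the boundary handling assumes the zero Neumann conditions stated above (differences at the last row/column vanish), and the helper name `tv` is ours.

```python
import numpy as np

def tv(x, isotropic=True):
    """Discrete TV of a (possibly complex) image with zero Neumann
    boundary conditions: differences at the last row/column are zero.
    Standard definitions; the paper's exact discretization may differ
    slightly in boundary handling."""
    dx = np.zeros_like(x)
    dy = np.zeros_like(x)
    dx[:-1, :] = x[1:, :] - x[:-1, :]   # vertical finite differences
    dy[:, :-1] = x[:, 1:] - x[:, :-1]   # horizontal finite differences
    if isotropic:
        return np.sum(np.sqrt(np.abs(dx) ** 2 + np.abs(dy) ** 2))
    return np.sum(np.abs(dx) + np.abs(dy))
```

For a constant image both TV values are zero; for a single horizontal edge the isotropic and anisotropic values coincide.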
C. Weighted Proximal Mapping
Given a proper closed convex function and a Hermitian positive definite matrix , the WPM associated to is defined as
| (6) |
where denotes the -norm defined by . Here denotes Hermitian transpose. Clearly, (6) simplifies to the proximal mapping for where represents the identity matrix. Since is strongly convex, exists and is unique for so that the WPM is well defined.
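For intuition, when the weighting reduces to the identity, the WPM of the complex norm is the familiar complex soft-thresholding operator, which shrinks the magnitude and keeps the phase. A minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def soft_threshold(x, tau):
    """prox_{tau * ||.||_1}(x) for complex x: shrink the magnitude by tau,
    keep the phase. This is the WPM in (6) in the special case of an
    identity weighting."""
    mag = np.abs(x)
    # guard against division by zero at entries where x = 0
    scale = np.maximum(mag - tau, 0.0) / np.where(mag > 0, mag, 1.0)
    return scale * x
```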
III. Complex Quasi-Newton Proximal Methods
This section first describes a complex quasi-Newton proximal method (CQNPM) for solving (1) with regularizer and . Here, we consider to be a wavelet transform. Then, we propose efficient methods to compute the related WPM. Moreover, to avoid applying wavelet transforms when computing the WPM for , we propose a partial smoothing approach. Our numerical experiments show that such a partial smoothing strategy recovers the desired images with less computation.
At each iteration, CQNPM solves (7) for ,
| (7) |
where is the step-size and is a Hermitian positive definite matrix. For clarity, we present the detailed steps of CQNPM in Algorithm 1. Note that Algorithm 1 would be identical to the proximal methods [10] if one chose . In [30, 34], the authors suggested using a diagonal matrix for their application. However, building such a diagonal matrix is nontrivial and its effectiveness is problem dependent. In this paper, we choose to be a more accurate approximation of the Hessian of . Specifically, we select based on the symmetric rank-1 (SR1) method [24], a popular method in quasi-Newton methods for approximating a Hessian matrix. Following the derivation of SR1 for real variables, we derive a complex-plane SR1 that is similar to the real one. Algorithm 2 presents the implementation details of SR1 in the complex plane. We found that using is crucial to ensure that is Hermitian positive definite in our setting because otherwise can become negative, causing to turn indefinite. In our numerical experiments, we found that a fixed worked well.
Algorithm 1.
Proposed complex quasi-Newton proximal method.
| Initialization: x1. |
| Iteration: |
| 1: for k = 1,2, … do |
| 2: pick the step-size ak and the weighting Bk. |
| 3: |
| 4: end for |
Algorithm 2.
SR1 updating.
| Initialization: a fixed real scalar, , , , and . |
| 1: if then |
| 2: . |
| 3: else |
| 4: Set and . |
| 5: Compute . |
| 6: if then |
| 7: . |
| 8: else |
| 9: . |
| 10: . |
| 11: if then |
| 12: . |
| 13: end if |
| 14: . |
| 15: end if |
| 16: end if |
| 17: Return: |
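A minimal sketch of one complex SR1 update in the spirit of Algorithm 2: the initial metric is a scaled identity with a curvature-based scale, and the rank-1 term is skipped when the standard SR1 denominator safeguard fails. The safeguards here follow the common SR1 recipe rather than the paper's exact steps, and the function name is ours.

```python
import numpy as np

def complex_sr1(s, y, delta=1.05, tol=1e-8):
    """One complex SR1 update: approximate the Hessian as
    B = d*I + sigma * u u^H, with d = delta * Re(s^H y) / ||s||^2.
    s = x_k - x_{k-1}, y = grad f(x_k) - grad f(x_{k-1}).
    Returns (d, u, sigma); sigma = 0 means the rank-1 term was skipped.
    A sketch, not the paper's exact Algorithm 2."""
    d = delta * np.real(np.vdot(s, y)) / np.real(np.vdot(s, s))
    if d <= 0:  # no positive curvature: fall back to the identity
        return 1.0, np.zeros_like(s), 0.0
    r = y - d * s                      # residual of the scaled-identity model
    denom = np.real(np.vdot(r, s))
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return d, np.zeros_like(s), 0.0   # standard SR1 skip rule
    sigma = np.sign(denom)
    u = r / np.sqrt(abs(denom))
    return d, u, sigma
```

When the relevant inner products are real (cf. Observation I below for the paper's setting), the update satisfies the secant condition B s = y.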
A. Compute Weighted Proximal Mapping
The dominant computation in Algorithm 1 is computing the WPM at Step 3 which could be as hard as solving (1) for a general . However, we find one can compute as easily as the case when by using the structure of .
To compute the WPM at each iteration, we need to solve the following problem
| (8) |
where and . A difficulty of (8) is the nonsmoothness of and . To address this difficulty, we consider a dual approach for (8) that is similar to Chambolle’s approach for TV-based image reconstruction [35]. Our method only uses one inner iteration to compute the WPM, and the related gradient is computed easily. Proposition 1 describes the dual problem of (8) and the relation between the primal and dual optimal solutions.
Proposition 1. Let
| (9) |
where and or depending on which TV is used. Then the optimal solution of (8) is given by .
Proof. See Section A. □
Using Proposition 1, we can apply the FISTA [11, 36] to solve (9) for computing since (9) is convex and continuously differentiable. Lemma 1 specifies the corresponding gradient and Lipschitz constant of (9).
Lemma 1. The gradient of (19) is
| (10) |
and the corresponding Lipschitz constant is
where is the smallest eigenvalue of .
Proof. See Section B.
According to the formulation of proposed in Algorithm 2, we can obtain easily through1
The value of depends on the choice of wavelets and can be computed in advance, so the computational cost of obtaining the Lipschitz constant of (19) is cheap. For completeness, Algorithm 3 presents the implementation details of FISTA for solving (19). We terminate Algorithm 3 when it reaches a maximal number of iterations or a given accuracy tolerance. The initial value in Algorithm 3 uses the final solution of the previous iteration.
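For reference, the generic FISTA loop that Algorithm 3 instantiates can be sketched as follows; `grad_f` and `prox_g` are placeholders for the dual gradient (10) and the projection steps of the actual algorithm.

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, n_iter=20):
    """Generic FISTA loop (Beck & Teboulle): minimize f(x) + g(x), where
    grad_f is the gradient of the smooth part with Lipschitz constant L
    and prox_g(v, t) computes prox_{t g}(v). A stand-in for Algorithm 3."""
    x = x0.copy()
    z = x0.copy()          # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(z - grad_f(z) / L, 1.0 / L)           # prox-gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0     # momentum schedule
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)        # extrapolation
        x, t = x_new, t_new
    return x
```

As a sanity check, with f(x) = ||x − b||²/2 and g = ||·||₁ the loop returns the soft-thresholded b.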
Remark 1. Compared with APMs for addressing (1), the additional cost of CQNPM is applying in computing and in Algorithms 1 and 3. This inversion can be computed cheaply through the Woodbury matrix identity. Moreover, computing the projectors and is also cheap and identical to the one shown in [12], so we omit the details here. The step-size in Algorithm 1 can simply be set to 1.
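The scaled-identity-plus-rank-1 structure produced by the SR1 update makes this inversion an instance of the Sherman-Morrison identity (the rank-1 special case of Woodbury). A sketch under the assumption B = delta*I + sigma*u u^H with B positive definite (the names are ours):

```python
import numpy as np

def apply_B_inverse(v, delta, u, sigma=1.0):
    """Apply B^{-1} to v for B = delta*I + sigma * u u^H via Sherman-Morrison:
    B^{-1} = (1/delta) I - sigma * u u^H / (delta * (delta + sigma*||u||^2)).
    Costs O(n) per application instead of an O(n^3) factorization."""
    coef = sigma * np.vdot(u, v) / (delta * (delta + sigma * np.vdot(u, u).real))
    return v / delta - coef * u
```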

B. Compute the Weighted Proximal Mapping when
For , running Algorithm 3 to compute the WPM would be inefficient since we would have to apply the wavelet transform many times at each outer iteration. However, if is left invertible, i.e., , we can solve the following problem instead of (2) to avoid using Algorithm 3 to compute the WPM:
| (11) |
Then the recovered image is . Now the corresponding WPM becomes
| (12) |
Note that (2) and (11) represent analysis-based and synthesis-based priors, respectively. For a detailed discussion of their relations and equivalence, see [37].
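As a concrete example of a left-invertible transform, a single level of the orthonormal 1-D Haar transform satisfies W^H W = I, which is the property the synthesis reformulation (11) relies on. A small numpy sketch (our own helper names):

```python
import numpy as np

def haar_1d(x):
    """One level of an orthonormal 1-D Haar transform (even length input).
    Stacks scaling and detail coefficients; the transform matrix W
    satisfies W^T W = I, i.e., W is left invertible."""
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # scaling (average) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (difference) coefficients
    return np.concatenate([s, d])

def haar_1d_adjoint(c):
    """Adjoint of haar_1d; equals the inverse by orthonormality."""
    n = c.size // 2
    s, d = c[:n], c[n:]
    x = np.empty(2 * n)
    x[0::2] = (s + d) / np.sqrt(2.0)
    x[1::2] = (s - d) / np.sqrt(2.0)
    return x
```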
Let where is a diagonal matrix and . Becker et al. proved the following theorem that relates and .
Theorem 1 (Theorem 3.4, [26]2). Let . Then,
where is the unique zero of the following nonlinear equation
Using the notation in Algorithm 2, we have the following observation:
Observation I. and in Algorithm 2 are real.
Proof. Note that . Then we have , so is real. □
Since is real, we rewrite as
| (13) |
where and denotes the sign function such that holds the same structure as in Theorem 1. So, instead of solving (12) directly, we first solve and then use Theorem 1 to obtain . In this paper, we solve using the SciPy library in Python.
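To illustrate the recipe behind Theorem 1 without reproducing its exact formula, the sketch below computes the weighted prox of the real ℓ1 norm for a diagonal-plus-rank-one metric by combining soft-thresholding with a single scalar root-find, solved with SciPy's `brentq` in line with the paper's use of SciPy. All names are ours, and the derivation follows the standard optimality-condition argument rather than the paper's statement of Theorem 1.

```python
import numpy as np
from scipy.optimize import brentq

def soft(x, tau):
    """Elementwise soft-threshold (real case)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def wpm_l1_diag_rank1(x, lam, delta, u, sigma=1.0):
    """Weighted proximal mapping of lam*||.||_1 with metric
    B = delta*I + sigma*u u^T (real case, B positive definite).
    The optimality condition reduces to z = soft(x - alpha*u/delta, lam/delta)
    with alpha = sigma * u^T (z - x), i.e., one monotone scalar equation."""
    def z_of(alpha):
        return soft(x - alpha * u / delta, lam / delta)

    def phi(alpha):  # strictly increasing when B is positive definite
        return alpha - sigma * np.dot(u, z_of(alpha) - x)

    # expand a bracket until phi changes sign, then solve with brentq
    lo, hi = -1.0, 1.0
    while phi(lo) > 0:
        lo *= 2.0
    while phi(hi) < 0:
        hi *= 2.0
    alpha = brentq(phi, lo, hi)
    return z_of(alpha)
```

The result can be verified against the subgradient optimality conditions of the weighted prox problem.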
C. Partial Smoothing
For , Algorithm 3 still requires applying many wavelet transforms, which can dominate the computational cost. An alternative way is to use the idea proposed in [38] where one partially smooths the objective and then applies Algorithm 1. For comparison purposes, we apply Algorithm 1 to the following problem
| (14) |
such that each outer iteration needs only two wavelet transforms. For the comparisons in this paper, we used with so that in (14) is differentiable. Our numerical experiments compare the performance of such a partial smoothing approach to methods based on the original cost function for image reconstruction in CS MRI.3
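A common way to realize the partial smoothing in (14) is to replace each magnitude by a smooth surrogate such as sqrt(|t|² + ε); the paper's exact smoothing function is elided here, so the choice below is an assumption, and both helper names are ours.

```python
import numpy as np

def smoothed_l1(x, eps=1e-6):
    """Smooth surrogate for ||x||_1: sum_i sqrt(|x_i|^2 + eps).
    One common choice for partially smoothing the wavelet term in (14);
    the paper's exact smoothing may differ."""
    return np.sum(np.sqrt(np.abs(x) ** 2 + eps))

def smoothed_l1_grad(x, eps=1e-6):
    """Gradient of smoothed_l1 for real x (the complex case follows the
    Wirtinger convention)."""
    return x / np.sqrt(np.abs(x) ** 2 + eps)
```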
IV. Numerical Experiments
This section studies the performance of our algorithm for image reconstruction in CS MRI with non-Cartesian sampling trajectories. Specifically, we consider the radial and spiral trajectories. Moreover, we also study the robustness of our algorithm to the choice of and Max_Iter in Algorithms 2 and 3, respectively. Similar to [1], we focus on wavelet and TV regularizers. We first present our experimental and algorithmic settings and then show our reconstruction results.
Experimental Settings:
We took complex k-space data from the brain and knee training datasets (one each) in the NYU fastMRI dataset [39] to generate the simulated k-space data. We applied the ESPIRiT algorithm [40] to recover the complex images and then cropped the images to size 256×256 to define the ground-truth images, with maximum magnitude scaled to one. Figure 1 shows the magnitude of the complex-valued ground-truth images. Following [17], we used 32 interleaves, 1688 readout points, and 12 coils (respectively, 96 radial projections, 512 readout points, and 12 coils) for the spiral (respectively, radial) trajectory to define the forward model . Figure 2 presents the trajectories used in this paper. For clarity, we plot only every 4th sample of the trajectories. Applying the forward model to the ground-truth image generated the noiseless multi-coil k-space data. We added complex i.i.d. Gaussian noise with mean zero and variance 10−2 to all coils to form the measurements, . The data input SNR was below 7dB. We also studied a higher data input SNR case of around 30dB. Our implementation used the Python programming language with the SigPy library [41]. The reconstructions ran on a workstation with a 2.3GHz AMD EPYC 7402 CPU. Our code is available at https://github.com/hongtao-argmin/CQNPCS_MRIReco. The supplementary material provides additional experimental results and a comparison with a Plug-and-Play reconstruction method using BM3D and a deep denoiser [42].
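The noise-adding step above can be sketched as follows: circular complex Gaussian noise of a given variance is added to the k-space data and the resulting data input SNR is reported. The function name is ours; the variance values are those stated above.

```python
import numpy as np

def add_complex_noise(kspace, variance, seed=0):
    """Add i.i.d. circular complex Gaussian noise of the given variance to
    k-space data and report the resulting input SNR in dB."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(variance / 2.0)   # split the variance over real/imag parts
    noise = std * (rng.standard_normal(kspace.shape)
                   + 1j * rng.standard_normal(kspace.shape))
    snr_db = 10.0 * np.log10(np.sum(np.abs(kspace) ** 2)
                             / np.sum(np.abs(noise) ** 2))
    return kspace + noise, snr_db
```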
Fig. 1.

The magnitude of the complex-valued ground truth images.
Fig. 2.

The non-Cartesian MRI trajectories used in this paper.
Algorithmic Settings:
For APM, we precomputed the Lipschitz constant for all experiments. For CQNPM, we set and . We write S-APM (respectively, S-CQNPM) for APM (respectively, CQNPM) applied to (14). We chose the step-size in S-APM using a backtracking strategy [43]. Moreover, we also compared our method with primal-dual (PD) methods [44]. The tradeoff parameters and were chosen to reach the highest peak signal-to-noise ratio (PSNR) when running enough iterations of APM. We set in our experiments. The maximal number of iterations and the tolerance in Algorithm 3 were set to 20 and 10−6 for both CQNPM and APM.
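The backtracking strategy for the S-APM step size can be sketched as the standard proximal-gradient backtracking: increase a local Lipschitz estimate until the quadratic upper bound holds at the prox-gradient point. This is the common recipe from the FISTA literature, not necessarily the exact rule of [43]; all names are ours.

```python
import numpy as np

def backtracking_step(f, grad_f, prox_g, x, L0=1.0, eta=2.0, max_iter=50):
    """One proximal-gradient step with backtracking: double the Lipschitz
    estimate L until f(z) <= f(x) + <grad, z-x> + (L/2)||z-x||^2 holds at
    z = prox_g(x - grad/L, 1/L). Returns (new point, accepted L)."""
    L = L0
    g = grad_f(x)
    fx = f(x)
    for _ in range(max_iter):
        z = prox_g(x - g / L, 1.0 / L)
        diff = z - x
        # majorization (sufficient-decrease) test
        if f(z) <= fx + np.real(np.vdot(g, diff)) + 0.5 * L * np.vdot(diff, diff).real:
            return z, L
        L *= eta
    return z, L
```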
A. Radial Acquisition MRI Reconstruction
Figures 3 and 4 show the performance of Algorithm 1 for the wavelet based reconstruction of the brain image and the comparison with APM [11] and PD [44]. Here, we used Theorem 1 to compute the WPM. Clearly, CQNPM converged faster than APM and PD in terms of iterations. Compared with the cost of computing the proximal mapping, the additional cost of computing the WPM with our method is insignificant. Figures 3 and 4 show that the computational costs of CQNPM and APM per iteration are similar. The comparison of PSNR versus CPU time in Figure 4 also shows that CQNPM reached a higher PSNR with less CPU time, illustrating the fast convergence of CQNPM. The reconstructed images at the 3rd, 10th, 13th, and 16th iterations illustrate that CQNPM yielded a clearer image than APM for the same number of iterations. Since PD led to a much lower PSNR than APM, we do not present the reconstructed images of PD. Similar observations apply to the knee image; the related results are provided in the supplemental material.
Fig. 3.

Cost values versus iteration (top) and CPU time (bottom) of the brain image with regularizer and for a left invertible wavelet transform with 5 levels. Acquisition: radial trajectory with 96 projections, 512 readout points, and 12 coils.
Fig. 4.

First row: the ground truth image and PSNR values versus CPU time; second and third rows: the reconstructed brain images at the 3rd, 10th, 13th, and 16th iterations with the Figure 3 setting; fourth row: the zoomed-in regions and the corresponding error maps (×5) of the reconstructed images at the 16th iteration.
We also studied the performance of our algorithm when using both wavelet and TV regularizers. Here, we used Algorithm 3 to compute the proximal mapping and WPM. Since ADMM is a classical method for (2) with , we include a comparison with ADMM. Moreover, we also studied the performance of the partial smoothing technique. Although PD does not require any inner iterations, unlike ADMM, APM, and our method, our method is still faster than PD in terms of iterations and CPU time.
Figures 5 and 6 present the results for the reconstruction of the brain image. CQNPM reduced the cost faster than APM in terms of iterations and CPU time. Although we solved (14) instead of (2) for the partial smoothing method, the cost is still computed with (2). Surprisingly, in this setting, we see that, for the cost values versus iterations, S-APM (respectively, S-CQNPM) converged similar to APM (respectively, CQNPM) in terms of iterations. However, from the cost values versus CPU time plot, S-CQNPM converged faster than CQNPM, as expected since the partial smoothing method requires only two wavelet transforms per outer iteration. However, S-APM converged slower than APM in terms of CPU time because S-APM requires applying a line search to choose the step-size, increasing the computational cost. Although CQNPM/S-CQNPM require an iterative method to solve the WPM, Section IV-D demonstrated that the WPM can be solved inexactly, and the computation for solving the WPM is relatively inexpensive compared to executing in CS MRI reconstruction. Thus, CQNPM/S-CQNPM converged faster than ADMM/PD both in terms of iteration numbers and in CPU time. Note that ADMM requires solving a least-squares problem at each iteration, which involves applying multiple times, leading to significantly slower convergence in terms of CPU time.
Fig. 5.

Cost values versus iteration (top) and CPU time (bottom) of the brain image with regularizer and same acquisition as Figure 3. The parameters were and .
Fig. 6.

First row: the ground truth image and PSNR values versus CPU time; second to sixth rows: the reconstructed brain images at the 3rd, 10th, 13th, and 16th iterations with the Figure 5 setting and the zoomed-in regions of the 16th iteration reconstruction. We did not show the reconstructed image of ADMM since it yielded a much lower PSNR than other methods.
The PSNR versus CPU time plot in Figure 6 also demonstrates the fast convergence of CQNPM and S-CQNPM. Compared with the previous experiments that only used a wavelet regularizer, we see an improved PSNR here, confirming the benefit of using both wavelet and TV regularizers. The reconstructed images at the 3rd, 10th, 13th, and 16th iterations for each method4 illustrate that the partial smoothing method works as well as the nonsmoothed one. In summary, the proposed method converged faster than the other methods in terms of iterations and CPU time, and S-CQNPM is the best algorithm for (2) in this setting. We also tested our algorithm on the knee image and provide the results in the supplementary material.
B. Spiral Acquisition MRI Reconstruction
This part studies the reconstruction with spiral acquisition using 32 interleaves, 1688 readout points, and 12 coils. Figures 9 and 10 show the results for the knee image with wavelet and TV regularizers. The trends are similar to the radial acquisition case. Note that CQNPM reduced the cost values faster than S-CQNPM in terms of iterations and CPU time in this setting. However, S-CQNPM reached a higher PSNR than CQNPM with the same CPU time. We provide the reconstructions of the brain and knee images with the wavelet regularizer and of the brain image with wavelet and TV regularizers in the supplementary material.
Fig. 9.

Cost values versus iteration (top) and CPU time (bottom) of the knee image with the same regularizer as Figure 5 but with spiral acquisition: 32 interleaves, 1688 readout points, and 12 coils. The parameters and were 10−3 and .
Fig. 10.

First row: the ground truth image and PSNR values versus CPU time; second to sixth rows: the reconstructed knee images at the 3rd, 10th, 13th, and 16th iterations with the Figure 9 setting. We did not show the reconstructed image of ADMM since it yielded a much lower PSNR than other methods. The seventh and eighth rows show the zoomed-in regions and the corresponding error maps (×5) of the reconstructed images at the 16th iteration, ordered as PD → APM → S-APM → CQNPM → S-CQNPM.
C. The Choice of
We tried several different values to study how affects the convergence of CQNPM. We reconstructed the brain image with wavelet and TV regularizers and radial acquisition. Figure 7 presents the results that show that CQNPM is quite robust to different values, and worked slightly better than the others. So we simply set for all experiments.
Fig. 7.

Influence of on the convergence of CQNPM. Test on the brain image with wavelet and TV regularizers and radial acquisition. We set Max_Iter= 20.
D. The Choice of Max_Iter in Algorithm 3
Following the setting used in Section IV-C, we studied how the choice of Max_Iter in Algorithm 3 affects the convergence of CQNPM. Figure 8 presents the cost values versus iteration with different values of Max_Iter. Clearly, we see that CQNPM is quite robust to the choice of Max_Iter. However, a small Max_Iter (e.g., Max_Iter= 10) can slightly increase the cost, and Max_Iter= 20,50 converged faster than other values. In our experiments, we found that Max_Iter= 20 is sufficient.
Fig. 8.

Influence of Max_Iter on the convergence of CQNPM. Test on the same problem as Figure 7 with .
E. Reconstruction with High Data Input SNR
This part studies the reconstruction for complex additive Gaussian noise with mean zero and lower variance 4×10−5, yielding around 30dB data input SNR. Figure 11 displays the reconstructed results using spiral acquisition and . The reconstructed images are much clearer than those in the low data input SNR cases. Moreover, the convergence trends of different algorithms are similar to those observed in low data input SNR reconstructions. The supplementary material provides the reconstructed results of the knee image that align with the observations made from the brain image presented here.
Fig. 11.

First row: the ground truth image and PSNR values versus CPU time; second to sixth rows: the reconstructed brain images at the 3rd, 10th, 13th, and 16th iterations with spiral acquisition and and the zoomed-in regions of the 16th iteration reconstruction. The parameters were and .
V. Conclusions and Future Work
This paper proposes complex quasi-Newton proximal methods for solving (2) that converge faster than APMs. By using the structure of , we develop efficient approaches for computing the WPM with wavelet and TV regularizers. Compared with computing the proximal mapping in APMs, i.e., , the increased computational cost of computing the WPM is insignificant, as illustrated by our comparisons in terms of CPU time. CQNPM is appealing for large-scale problems because it requires fewer iterations than APMs to converge, reducing the number of times the forward and adjoint operators must be applied, which is expensive in large-scale settings. Interestingly, in our setting, we found that the partial smoothing method worked well when both wavelet and TV regularizers are used, so partial smoothing may be a good approach for solving problems with two nonsmooth terms. To adapt CQNPM to other regularizers, one must find an efficient approach to address the WPM for the chosen regularizer to preserve the computational efficiency.
Clearly, plays an important role in our algorithm, and a more accurate can accelerate the convergence further. Since the Hessian matrix in CS MRI is known, i.e., , we plan in future work to learn a fixed weighting that approximates accurately. However, must be easy to invert, so should have some special structure, e.g., , and finding such a should be computationally cheap since differs for each acquisition because the sensitivity maps are patient dependent. Moreover, with such a fixed , we can adopt the acceleration used in APMs for Algorithm 1 and obtain an even faster algorithm than the one presented here.
Supplementary Material
VI. Acknowledgements
This work was funded by National Institutes of Health grant R01NS112233.
Appendix A
Proof of Proposition 1
Similar to [12] for the real case, one can prove the following relations for complex numbers
where * denotes the conjugate operator and represents an operator to take the real part. With these relations and the definition of TV functions, we can rewrite and as
where (respectively, ) for (respectively, ). Hence, we represent (8) as
| (15) |
where
Reorganizing (15), we get
| (16) |
where
Since (16) is convex in and concave in , we interchange the minimum and maximum and then get
| (17) |
Note that only appears in the first term of (17) so that the optimal solution of the minimum part is
| (18) |
Substituting (18) into (17), we get the following dual problem that contains only unknown dual variables
| (19) |
After solving (19), the primal variable update is . This completes the proof.
Appendix B
Proof of Lemma 1
Denote by . Applying the chain rule, we get
Now, we compute the Lipschitz constant of . For every two pairs of and , we have
where is the smallest eigenvalue of . With the proof of [12, Lemma 4.2], we know such that the Lipschitz constant of is . This completes the proof.
Footnotes
We note that is real in our setting, see Observation I.
The theorem is proved in the real plane but is also valid in the complex plane.
One could instead partially smooth the TV regularizer. However, in our settings, we found that smoothing led to better quality than TV smoothing.
We do not show the reconstructed image of ADMM since it yielded a much lower PSNR than other methods.
Contributor Information
Tao Hong, Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA.
Luis Hernandez-Garcia, Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA.
Jeffrey A. Fessler, Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, MI 48109, USA.
References
- [1]. Lustig M, Donoho D, and Pauly JM, "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
- [2]. Guerquin-Kern M, Haberlin M, Pruessmann KP, and Unser M, "A fast wavelet-based reconstruction method for magnetic resonance imaging," IEEE Transactions on Medical Imaging, vol. 30, no. 9, pp. 1649–1660, 2011.
- [3]. Ravishankar S and Bresler Y, "MR image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
- [4]. Dong W, Shi G, Li X, Ma Y, and Huang F, "Compressive sensing via nonlocal low-rank regularization," IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3618–3632, 2014.
- [5]. Venkatakrishnan SV, Bouman CA, and Wohlberg B, "Plug-and-play priors for model based reconstruction," in IEEE Global Conference on Signal and Information Processing. IEEE, 2013, pp. 945–948.
- [6]. Ahmad R, Bouman CA, Buzzard GT, Chan S, Liu S, Reehorst ET, and Schniter P, "Plug-and-play methods for magnetic resonance imaging: Using denoisers for image recovery," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 105–116, 2020.
- [7]. Aggarwal HK, Mani MP, and Jacob M, "MoDL: Model-based deep learning architecture for inverse problems," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394–405, 2018.
- [8]. Song Y, Shen L, Xing L, and Ermon S, "Solving inverse problems in medical imaging with score-based generative models," in International Conference on Learning Representations, 2021.
- [9]. Gu H, Yaman B, Moeller S, Ellermann J, Ugurbil K, and Akçakaya M, "Revisiting ℓ1-wavelet compressed-sensing MRI in the era of deep learning," Proceedings of the National Academy of Sciences, vol. 119, no. 33, p. e2201062119, 2022.
- [10]. Parikh N, Boyd S, et al., "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
- [11]. Beck A and Teboulle M, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
- [12]. Beck A and Teboulle M, "Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems," IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2419–2434, 2009.
- [13]. Chambolle A and Pock T, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011.
- [14]. Condat L, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
- [15]. Komodakis N and Pesquet J-C, "Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems," IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 31–54, 2015.
- [16]. Fessler JA, "Optimization methods for magnetic resonance image reconstruction: Key models and optimization algorithms," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 33–40, 2020.
- [17]. Iyer SS, Ong F, Cao X, Liao C, Luca D, Tamir JI, and Setsompop K, "Polynomial preconditioners for regularized linear inverse problems," arXiv preprint arXiv:2204.10252, 2022.
- [18]. Ong F, Uecker M, and Lustig M, "Accelerating non-Cartesian MRI reconstruction convergence using k-space preconditioning," IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1646–1654, 2019.
- [19]. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J, et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
- [20]. He B and Yuan X, "On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method," SIAM Journal on Numerical Analysis, vol. 50, no. 2, pp. 700–709, 2012.
- [21]. Ramani S and Fessler JA, "Parallel MR image reconstruction using augmented Lagrangian methods," IEEE Transactions on Medical Imaging, vol. 30, no. 3, pp. 694–706, 2010.
- [22]. Weller DS, Ramani S, and Fessler JA, "Augmented Lagrangian with variable splitting for faster non-Cartesian ℓ1-SPIRiT MR image reconstruction," IEEE Transactions on Medical Imaging, vol. 33, no. 2, pp. 351–361, 2013.
- [23]. Koolstra K, van Gemert J, Börnert P, Webb A, and Remis R, "Accelerating compressed sensing in parallel imaging reconstructions using an efficient circulant preconditioner for Cartesian trajectories," Magnetic Resonance in Medicine, vol. 81, no. 1, pp. 670–685, 2019.
- [24]. Nocedal J and Wright SJ, Numerical Optimization. Springer, 2006.
- [25]. Lee JD, Sun Y, and Saunders MA, "Proximal Newton-type methods for minimizing composite functions," SIAM Journal on Optimization, vol. 24, no. 3, pp. 1420–1443, 2014.
- [26]. Becker S, Fadili J, and Ochs P, "On quasi-Newton forward-backward splitting: Proximal calculus and convergence," SIAM Journal on Optimization, vol. 29, no. 4, pp. 2445–2481, 2019.
- [27]. Hong T, Yavneh I, and Zibulevsky M, "Solving RED with weighted proximal methods," IEEE Signal Processing Letters, vol. 27, pp. 501–505, 2020.
- [28]. Kadu A, Mansour H, and Boufounos PT, "High-contrast reflection tomography with total-variation constraints," IEEE Transactions on Computational Imaging, vol. 6, pp. 1523–1536, 2020.
- [29]. Ge T, Villa U, Kamilov US, and O'Sullivan JA, "Proximal Newton methods for X-ray imaging with nonsmooth regularization," Electronic Imaging, vol. 2020, no. 14, pp. 7–1, 2020.
- [30]. Chouzenoux E, Pesquet J-C, and Repetti A, "Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function," Journal of Optimization Theory and Applications, vol. 162, no. 1, pp. 107–132, 2014.
- [31]. Fessler JA, "Model-based image reconstruction for MRI," IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 81–89, Jul. 2010.
- [32]. Duyn JH, van Gelderen P, Li T-Q, de Zwart JA, Koretsky AP, and Fukunaga M, "High-field MRI of brain cortical substructure based on signal phase," Proceedings of the National Academy of Sciences, vol. 104, no. 28, pp. 11796–11801, 2007.
- [33]. Wang Y and Liu T, "Quantitative susceptibility mapping (QSM): decoding MRI data for a tissue magnetic biomarker," Magnetic Resonance in Medicine, vol. 73, no. 1, pp. 82–101, 2015.
- [34]. Bonettini S, Loris I, Porta F, and Prato M, "Variable metric inexact line-search-based methods for nonsmooth optimization," SIAM Journal on Optimization, vol. 26, no. 2, pp. 891–921, 2016.
- [35]. Chambolle A, "An algorithm for total variation minimization and applications," Journal of Mathematical Imaging and Vision, vol. 20, no. 1, pp. 89–97, 2004.
- [36]. Nesterov Y, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k²)," in Doklady AN SSSR, vol. 269, no. 3, 1983, pp. 543–547.
- [37]. Elad M, Milanfar P, and Rubinstein R, "Analysis versus synthesis in signal priors," Inverse Problems, vol. 23, no. 3, p. 947, 2007.
- [38]. Beck A and Teboulle M, "Smoothing and first order methods: A unified framework," SIAM Journal on Optimization, vol. 22, no. 2, pp. 557–580, 2012.
- [39]. Zbontar J, Knoll F, Sriram A, Murrell T, Huang Z, Muckley MJ, Defazio A, Stern R, Johnson P, Bruno M, et al., "fastMRI: An open dataset and benchmarks for accelerated MRI," arXiv preprint arXiv:1811.08839, 2018.
- [40]. Uecker M, Lai P, Murphy MJ, Virtue P, Elad M, Pauly JM, Vasanawala SS, and Lustig M, "ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA," Magnetic Resonance in Medicine, vol. 71, no. 3, pp. 990–1001, 2014.
- [41]. Ong F and Lustig M, "SigPy: a python package for high performance iterative reconstruction," in Proceedings of the ISMRM 27th Annual Meeting, Montreal, Quebec, Canada, vol. 4819, 2019.
- [42]. Zhang K, Li Y, Zuo W, Zhang L, Van Gool L, and Timofte R, "Plug-and-play image restoration with deep denoiser prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6360–6376, 2021.
- [43]. Beck A, First-Order Methods in Optimization. SIAM, 2017, vol. 25.
- [44]. Sidky EY, Jørgensen JH, and Pan X, "Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm," Physics in Medicine & Biology, vol. 57, no. 10, p. 3065, 2012.