Author manuscript; available in PMC: 2026 Mar 7.
Published in final edited form as: IEEE Trans Comput Imaging. 2026 Jan 19;12:378–390. doi: 10.1109/tci.2026.3655489

A Convergent Generalized Krylov Subspace Method for Compressed Sensing MRI Reconstruction with Gradient-Driven Denoisers

Tao Hong 1, Umberto Villa 2, Jeffrey A Fessler 3
PMCID: PMC12965199  NIHMSID: NIHMS2144790  PMID: 41799403

Abstract

Model-based reconstruction plays a key role in compressed sensing (CS) MRI, as it incorporates effective image regularizers to improve the quality of reconstruction. The Plug-and-Play and Regularization-by-Denoising frameworks leverage advanced denoisers (e.g., convolutional neural network (CNN)-based denoisers) and have demonstrated strong empirical performance. However, their theoretical guarantees remain limited, as practical CNNs often violate key assumptions. In contrast, gradient-driven denoisers achieve competitive performance, and the required assumptions for theoretical analysis are easily satisfied. However, solving the associated optimization problem remains computationally demanding. To address this challenge, we propose a generalized Krylov subspace method (GKSM) to solve the optimization problem efficiently. We also establish rigorous convergence guarantees for GKSM in nonconvex settings. Numerical experiments on CS MRI reconstruction with spiral and radial acquisitions validate both the computational efficiency of GKSM and the accuracy of the theoretical predictions. The proposed optimization method is applicable to any linear inverse problem.

Index Terms—: CS MRI, gradient-driven denoiser, Krylov subspace, convergence, spiral and radial acquisitions

I. Introduction

MAGNETIC resonance imaging (MRI) scanners acquire k-space data that represents the Fourier coefficients of the image of interest. However, the acquisition process is inherently slow due to physical, hardware, and sampling constraints [1]. This slow acquisition presents several practical challenges, including patient discomfort, motion artifacts, and reduced throughput. Since the seminal work in [2], compressed sensing (CS) MRI has attracted significant attention in the MRI community [3, 4] for accelerating the acquisition process through structured sampling patterns. Modern CS MRI methods incorporate multiple receiver coils (a.k.a. parallel imaging [5, 6]) to further improve acquisition speed. Image reconstruction in CS MRI requires solving the following composite minimization problem:

\hat{x} = \arg\min_{x\in\mathbb{C}^N} F(x) \triangleq \underbrace{\tfrac{1}{2}\|Ax - y\|_2^2}_{h(x)} + \lambda f(x), \qquad (1)

where $A \in \mathbb{C}^{MC \times N}$ denotes the forward operator that maps the image $x \in \mathbb{C}^N$ to the measured k-space data $y \in \mathbb{C}^{MC}$. Here, we consider $C$ receiver coils. The encoding operator $A$ is a stack of $C$ submatrices $A_c \in \mathbb{C}^{M \times N}$, each defined as $A_c = PFS_c$, where $P$ is the sampling mask, $F$ is the (nonuniform) Fourier transform, and $S_c$ is the coil sensitivity map corresponding to the $c$th coil, which is patient-dependent. The trade-off parameter $\lambda > 0$ balances $h(x)$ and $f(x)$.

The data-fidelity term $h(x)$ in (1) promotes consistency with the acquired k-space data. In practice, often $M \ll N$ due to under-sampling, making (1) ill-posed. Therefore, incorporating prior knowledge through the regularizer $f(x)$ is essential for stabilizing the reconstruction. The choice of regularization plays a crucial role in reconstruction quality. Traditional hand-crafted regularizers include wavelets [7], total variation (TV) [8, 9], combinations of wavelets and TV [2, 10], dictionary learning [11, 12], and low-rank models [13], to name a few. For reviews of various choices of $f(x)$, see [4, 14, 15].
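To make the forward model concrete, the following NumPy sketch implements a Cartesian-masked multi-coil operator $A_c = PFS_c$ and its adjoint. This is a simplification: the experiments in this paper use non-Cartesian (spiral and radial) trajectories and a nonuniform FFT, and all names here are illustrative.

```python
import numpy as np

def forward(x, smaps, mask):
    # A x: for each coil, weight by the sensitivity map (S_c), take the
    # orthonormal 2-D FFT (F), and apply the sampling mask (P)
    return np.fft.fft2(smaps * x, norm="ortho") * mask

def adjoint(y, smaps, mask):
    # A^H y: mask, inverse FFT, conjugate sensitivity weighting, coil sum
    return np.sum(np.conj(smaps) * np.fft.ifft2(y * mask, norm="ortho"),
                  axis=0)
```

Because the orthonormal FFT is unitary, `adjoint` is the exact adjoint of `forward`, which can be checked via the inner-product identity $\langle Ax, y\rangle = \langle x, A^Hy\rangle$.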

In the past decade, deep learning (DL) has gained prominence in MRI reconstruction due to its capacity to learn complex image priors directly from large training datasets [16]. DL-based approaches can be broadly categorized into end-to-end networks [17] and physics-driven unrolled algorithms [18–20]. Recently, generative models have emerged as a powerful class of priors in MRI, achieving impressive results across various settings [21].

An alternative to classical DL pipelines is the Plug-and-Play (PnP) [22] and REgularization-by-Denoising (RED) [23] frameworks. PnP and RED integrate powerful denoisers into iterative reconstruction algorithms and have demonstrated competitive performance across various imaging tasks [24–29]. Early PnP/RED frameworks employed classical denoisers such as the median filter [30], non-local means [31], and BM3D [32]. As convolutional neural network (CNN)-based denoisers have shown superior performance to classical ones, modern PnP/RED methods typically integrate CNN-based denoisers. Unlike end-to-end or unrolled DL methods that require retraining for each imaging task, PnP and RED leverage learned image priors to flexibly adapt to changes in the forward model without retraining. This adaptability is particularly beneficial in CS MRI reconstruction, where scan-specific variations (e.g., different sampling trajectories and patient-specific sensitivity maps) are common. See [33] for a review of PnP methods that incorporate both classical and CNN-based denoisers in MRI reconstruction.

Despite the empirical success of PnP and RED, their theoretical convergence guarantees remain an active area of research; see [26, 34–38]. These works typically require that the denoisers either approximate maximum a posteriori or minimum mean squared error estimators, or satisfy a nonexpansiveness condition. However, many successful denoisers, especially those based on CNNs, do not satisfy these assumptions. As a result, PnP and RED with such denoisers cannot be rigorously interpreted as optimization algorithms. Although optimization-free perspectives have been proposed, understanding the behavior of these frameworks remains challenging [26]. One alternative is to train denoisers with additional regularization that enforces a bounded Lipschitz constant [36, 37]. However, guaranteeing strict and tight boundedness in practice remains an open challenge.

Recent efforts have aimed to close the gap between the theoretical foundations and practical effectiveness of PnP and RED by introducing gradient-driven denoisers [39–41]. In this approach, the unknown image $x$ is recovered by solving

\hat{x} = \arg\min_{x\in\mathcal{C}} F(x) := \underbrace{\tfrac{1}{2}\|Ax - y\|_2^2}_{h(x)} + \lambda f_{\theta}(x), \qquad (2)

where $f_{\theta}(x)$ is a scalar-valued energy function parameterized by CNNs that serves as a learned image prior, and $\mathcal{C}$ is a closed convex set in $\mathbb{C}^N$. The parameters $\theta$ are learned by enforcing $x \mapsto x - \nabla_x f_{\theta}(x)$ to act as a denoiser. Thus, the only required assumption is the differentiability of $f_{\theta}$ with respect to $x$, which allows one to integrate deep learning into inverse problems while maintaining a degree of interpretability, an essential requirement in medical imaging, where reconstructions directly influence diagnostic decisions. For notational simplicity, we omit the subscript $\theta$ and write $\nabla f(x)$ instead of $\nabla_x f_{\theta}(x)$. Moreover, we absorb $\lambda$ into $f(x)$ in the following discussion, since $\lambda$ is fixed throughout the minimization once it is selected.
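To make the gradient-driven denoiser idea concrete, here is a minimal NumPy sketch with a toy quadratic energy $f_\theta(x) = \tfrac{\theta}{2}\|x\|^2$, whose gradient is available in closed form; in the actual method $f_\theta$ is a CNN and $\nabla_x f_\theta$ is obtained by automatic differentiation. All names here are illustrative.

```python
import numpy as np

# Toy quadratic energy f_theta(x) = 0.5 * theta * ||x||^2 with gradient
# theta * x; a stand-in for the learned CNN energy of the paper.
def grad_f(x, theta):
    return theta * x

def denoise(x, theta):
    # gradient-driven denoiser: D(x) = x - grad_x f_theta(x)
    return x - grad_f(x, theta)

# "Training": for this toy energy D(x) = (1 - theta) x, so the theta that
# best maps a noisy signal toward the clean one has a closed form.
rng = np.random.default_rng(0)
x_clean = rng.standard_normal(64)
x_noisy = x_clean + 0.1 * rng.standard_normal(64)
theta = 1.0 - np.dot(x_noisy, x_clean) / np.dot(x_noisy, x_noisy)
residual = denoise(x_noisy, theta) - x_clean
```

The fitted denoiser shrinks the noisy input toward the clean signal, reducing the error relative to the identity map.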

Although both $h(x)$ and $f(x)$ in (2) are differentiable, $f(x)$ is generally nonconvex, which poses challenges for designing convergent and efficient algorithms. Cohen et al. [39] applied a projected gradient descent method with a line search to solve (2). Alternatively, Hurault et al. [40] employed the proximal gradient descent method with a line search. Both approaches provide convergence guarantees under the assumption that $\nabla f$ is Lipschitz continuous. However, these methods typically require hundreds of iterations to converge, which limits their practical applicability. Recently, Hong et al. [42] proposed a convergent complex quasi-Newton proximal method (CQNPM) that significantly reduces the computational time required to solve (2). Their convergence is established under the assumptions that $\nabla f$ is Lipschitz continuous and that the proximal Polyak-Łojasiewicz condition holds. Although CQNPM converges faster than existing methods for solving (2), it requires solving a weighted proximal mapping (as defined in [42, equation (3)]) at each iteration. This step requires computing $Ax$ and $A^Hx$ multiple times,¹ which can increase the overall computational complexity. Computing $Ax$ is expensive in MRI reconstruction with many coils, high-resolution images, or many interleaves or spokes in non-Cartesian acquisitions. Drawing inspiration from Krylov subspace methods (KSMs) [43], we propose a generalized Krylov subspace method (GKSM) for efficiently solving (2) that requires computing $Ax$, $A^Hx$, and $\nabla f(x)$ only once per iteration. Our main contributions are summarized as follows:

  • We propose a generalized Krylov subspace method (GKSM) for efficiently solving (2).

  • We present a rigorous convergence analysis of GKSM in nonconvex settings, along with the convergence rate of the cost function values.

  • We extensively evaluate the performance of GKSM on brain (respectively, knee) images from the dataset described in the MoDL paper [18] (respectively, the NYU fastMRI dataset [44]). The k-space data are simulated from the reconstructed complex-valued images using spiral and radial sampling trajectories. We also empirically validate the accuracy of the convergence analysis.

The rest of this paper is organized as follows. Section II reviews preliminaries on KSMs and discusses related work that generalizes KSMs to inverse problems. Section III describes GKSM in detail. Section IV provides a rigorous convergence analysis of GKSM. Section V reports experimental results that evaluate the performance of GKSM and empirically validate the theoretical analysis. Section VI concludes the paper.

II. Preliminaries on Krylov Subspace Methods

This section first introduces KSMs, which were primarily developed for solving linear equations. We then review related generalized Krylov methods for linear inverse problems, along with existing theoretical results. Our main goal in this section is to provide a sketch of the key developments in KSMs, from their origins in solving linear equations to their generalization for inverse problems.

KSMs are a class of iterative algorithms for solving problems of the form

\bar{A}x = b, \qquad (3)

where $\bar{A} \in \mathbb{R}^{N\times N}$ is typically sparse, ill-conditioned, and large-scale. At the $k$th iteration, KSMs construct an approximate solution within the Krylov subspace:

\mathcal{K}_k(\bar{A}, \bar{r}_1) = \operatorname{span}\{\bar{r}_1, \bar{A}\bar{r}_1, \bar{A}^2\bar{r}_1, \ldots, \bar{A}^{k-1}\bar{r}_1\}, \qquad (4)

where $\bar{r}_1 = b - \bar{A}x_1$ is the initial residual. The approximate solution $x_{k+1}$ is obtained by seeking $x_{k+1} \in x_1 + \mathcal{K}_k(\bar{A}, \bar{r}_1)$ that minimizes a chosen norm of the residual. The most widely used KSMs include, but are not limited to, the conjugate gradient method [45], LSQR [46], BiCGSTAB [47], and the generalized minimal residual method [48]. These methods are designed for different types of matrices $\bar{A}$, such as symmetric positive definite, non-symmetric, or indefinite, to name a few [43]. Moreover, KSMs can incorporate preconditioners to further accelerate convergence [49].
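As a concrete instance, the conjugate gradient method builds its iterates in exactly this subspace; a minimal NumPy implementation for a symmetric positive definite $\bar{A}$ (starting from $x_1 = 0$):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    # CG for symmetric positive definite A; the kth iterate lies in
    # x_1 + K_k(A, r_1), the Krylov subspace of (4) (here x_1 = 0)
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual r_1
    p = r.copy()
    rs = r @ r
    for _ in range(b.size):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In exact arithmetic CG reaches the solution in at most $N$ iterations, and in practice far fewer for well-clustered spectra.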

Many inverse problems with variational regularizers [50, 51] can be modeled as the following $\ell_p$-$\ell_q$ optimization problem:

\min_{x\in\mathbb{R}^N} \frac{1}{p}\|Ax - y\|_p^p + \frac{\lambda}{q}\|Wx\|_q^q, \qquad (5)

where $0 < p, q \le 2$ and $W$ represents a transform such as a wavelet transform. KSMs have been generalized to solve problems such as (5). For $p = q = 2$, Lampe et al. [52] presented a generalized KSM for (5) in which the parameter $\lambda$ is adaptively adjusted to keep $\|Ax - y\|_2$ sufficiently close to a prescribed tolerance. Lanza et al. [53] proposed to solve (5) using a KSM along with an iteratively reweighted approach. Moreover, Huang et al. [54] combined a Krylov subspace-based method with majorization-minimization for solving (5). To avoid the inner-outer iterations of the iteratively reweighted approach in [53], several flexible Krylov subspace methods were proposed to improve efficiency [55–57]. See [50, 58] for a review of KSMs for inverse problems.
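A minimal sketch of such an iteratively reweighted scheme for $p = 2$, $q = 1$, and $W = I$, solving each weighted subproblem directly rather than with a KSM (all names are illustrative):

```python
import numpy as np

def irls_l2_l1(A, y, lam, n_iter=50, eps=1e-6):
    # Sketch of an iteratively reweighted scheme for
    #   min_x 0.5 ||A x - y||_2^2 + lam ||x||_1   (p = 2, q = 1, W = I).
    # Each |x_i| is majorized by a quadratic with weight 1 / (|x_i| + eps),
    # and the resulting weighted least-squares subproblem is solved
    # directly (at scale one would use a KSM here).
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = np.abs(x) + eps
        H = A.T @ A + lam * np.diag(1.0 / w)
        x = np.linalg.solve(H, A.T @ y)
    return x
```

The repeated inner solves are exactly the inner-outer structure that the flexible Krylov subspaces of [55–57] were designed to avoid.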

Besides their extension to inverse problems, rigorous convergence analyses of KSMs remain an open research area. Lanza et al. [53] showed that the iterates converge to a minimizer of (5) for $1 \le p, q \le 2$ if $\ker(A^TA) \cap \ker(W^TW) = \{0\}$, where $\ker(\cdot)$ denotes the null space of a matrix, and the constructed Krylov subspace fully represents the entire image domain. Similar convergence results can also be found in [54, 59]. Other works [55–57] proved that the cost function values decrease monotonically and that the iterates converge to a stationary point. For brevity, we present only the convergence results of KSMs for inverse problems. See [60, 61] and the references therein for discussions of KSMs and their convergence in other contexts.

III. Proposed Method

This section provides the details of our GKSM for solving (2). We first discuss the case $\mathcal{C} = \mathbb{C}^N$ and then describe how to incorporate a convex constraint. Lastly, we provide further discussion of GKSM to offer additional insights.

Given a subspace basis $V_k \in \mathbb{C}^{N\times k}$ satisfying $V_k^HV_k = I_k$, where $I_k$ is the $k\times k$ identity matrix, a Hermitian positive definite matrix $B_k \in \mathbb{C}^{N\times N}$, $B_k \succ 0$, and a step size $\alpha_k \in \mathbb{R}$, $\alpha_k > 0$, GKSM solves the following problem at the $k$th iteration to obtain the coefficients $\boldsymbol{\beta}_k \in \mathbb{C}^k$:

\boldsymbol{\beta}_k = \arg\min_{\boldsymbol{\beta}\in\mathbb{C}^k} \underbrace{\frac{1}{2}\|Ax - y\|_2^2 + \bar{f}(x, x_k, B_k, \alpha_k)}_{\bar{F}(x, x_k)}, \qquad (6)

where $x = V_k\boldsymbol{\beta}$ and $\bar{f}(x, x_k, B_k, \alpha_k) \triangleq \Re\langle\nabla f(x_k), x - x_k\rangle + \frac{1}{2\alpha_k}\|x - x_k\|_{B_k}^2$ is a quadratic proximal term with $\|x\|_{B_k}^2 = x^HB_kx$. Then the next image iterate is $x_{k+1} = V_k\boldsymbol{\beta}_k$. Rewriting (6) in terms of $\boldsymbol{\beta}$ and reorganizing yields

\boldsymbol{\beta}_k = \arg\min_{\boldsymbol{\beta}\in\mathbb{C}^k} \left\| \begin{bmatrix} AV_k \\ \bar{B}_k^{1/2}V_k \end{bmatrix}\boldsymbol{\beta} - \begin{bmatrix} y \\ \bar{B}_k^{1/2}w_k \end{bmatrix} \right\|_2^2, \qquad (7)

where $w_k = x_k - \alpha_kB_k^{-1}\nabla f(x_k)$ and $\bar{B}_k^{1/2}$ denotes the principal matrix square root of $\bar{B}_k = B_k/\alpha_k$, which is unique [62]. Note that $AV_k$ is built incrementally during the algorithm. Compared with the image size, the dimension of $\boldsymbol{\beta}_k$ is relatively low because the number of iterations is significantly smaller than the image dimension. Therefore, we solve (7) directly, i.e.,

\boldsymbol{\beta}_k = \left(V_k^HA^HAV_k + V_k^H\bar{B}_kV_k\right)^{-1} V_k^H\left(A^Hy + \bar{B}_kw_k\right). \qquad (8)

Algorithm 1.

Generalized Krylov Subspace Method (GKSM)

Initialization: $x_1$, step sizes $\alpha_k > 0$, $V_1 = A^Hy/\|A^Hy\|$, $AV_1$, maximal number of subspace iterations $K$, and maximal number of total iterations Max_Iter
Iteration:
 1: for $k = 1, 2, \ldots,$ Max_Iter do
 2:  Compute $\nabla f(x_k)$
 3:  Set $B_k$ using Algorithm 2
 4:  Compute $\boldsymbol{\beta}_k$ using (8) (or solve (11) for $\boldsymbol{\beta}_k$ if a convex constraint is enforced)
 5:  Compute $x_{k+1} \leftarrow V_k\boldsymbol{\beta}_k$
 6:  if $k \le K$ then
 7:   Compute $r_k \leftarrow \nabla_x\bar{F}(x_{k+1}, x_k)$
 8:   $\tilde{r}_k \leftarrow (I - V_kV_k^H)r_k$
 9:   if $\tilde{r}_k \ne 0$ then
10:    $v_{k+1} \leftarrow \tilde{r}_k/\|\tilde{r}_k\|$
11:    $V_{k+1} \leftarrow [V_k\ v_{k+1}]$
12:    $AV_{k+1} \leftarrow [AV_k\ Av_{k+1}]$
13:   else
14:    $V_{k+1} \leftarrow V_k$
15:    $AV_{k+1} \leftarrow AV_k$
16:   end if
17:  else
18:   $V_{k+1} \leftarrow I_N$
19:  end if
20: end for

Here the matrix being inverted is only $k\times k$ with $k \ll N$.
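A sketch of this small projected solve, assuming $AV_k$ and $A^Hy$ are precomputed (names are illustrative):

```python
import numpy as np

def gksm_coeffs(AV, V, Bbar, w, AHy):
    # Solve the k x k projected system (8) for the subspace coefficients;
    # AV = A @ V and AHy = A^H y are precomputed, Bbar = B_k / alpha_k
    G = AV.conj().T @ AV + V.conj().T @ (Bbar @ V)   # k x k, k << N
    rhs = V.conj().T @ (AHy + Bbar @ w)
    return np.linalg.solve(G, rhs)
```

The solution zeroes the gradient of the subspace least-squares objective, i.e., $V_k^HA^H(AV_k\boldsymbol{\beta} - y) + V_k^H\bar{B}_k(V_k\boldsymbol{\beta} - w_k) = 0$.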

To enrich the subspace after the kth iteration, we first compute the gradient of the objective function in (6) with respect to x at x=xk+1, i.e.,

r_k = \nabla_x\bar{F}(x_{k+1}, x_k). \qquad (9)

Then we set $v_{k+1} = \tilde{r}_k/\|\tilde{r}_k\|$ with $\tilde{r}_k = (I_N - V_kV_k^H)r_k$. The new subspace basis is $V_{k+1} = [V_k\ v_{k+1}]$. If $\tilde{r}_k = 0$, we simply skip the update of $V_k$. Note that the dimension of $\boldsymbol{\beta}_k$ equals the number of columns of $V_k$; this dimension may be smaller than $k$ if the event $\tilde{r}_k = 0$ occurs, in which case the number of columns of $V_k$ is also smaller than $k$. Algorithm 1 summarizes the detailed steps of GKSM. To establish the convergence rate of the cost function values, we introduce an additional step 18 in Algorithm 1, as the generated $V_k$ does not necessarily span the entire image domain. Note that GKSM reduces to CQNPM [42] when $V_k = I_N$; in this case, we simply apply the accelerated gradient descent method to solve (6). If the regularizer $f$ in (2) is a quadratic function, then the subspace spanned by $V_k$ simplifies to the classical Krylov subspace in (4). To obtain $B_k$, we adopt the algorithm presented in [42, Algorithm 2], so that $B_k$ is an estimate of the Hessian matrix of $f(x)$ that is guaranteed to be Hermitian positive definite. For completeness, Algorithm 2 provides the detailed steps for computing $B_k$. The operator $\Re(\cdot)$ in Algorithm 2 extracts the real part.
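The enrichment step (lines 7–16 of Algorithm 1) can be sketched in NumPy as follows; function names are illustrative:

```python
import numpy as np

def enrich_subspace(V, AV, r, A, tol=1e-12):
    # Append the component of the residual r orthogonal to range(V);
    # return (V, AV) unchanged when r lies (numerically) in the span
    r_tilde = r - V @ (V.conj().T @ r)      # (I - V V^H) r
    nrm = np.linalg.norm(r_tilde)
    if nrm <= tol:
        return V, AV                        # skip the update
    v_new = r_tilde / nrm
    return (np.column_stack([V, v_new]),
            np.column_stack([AV, A @ v_new]))
```

Appending $Av_{k+1}$ alongside $v_{k+1}$ is what keeps the per-iteration cost at a single evaluation of $Ax$.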

Algorithm 2.

Modified Memory Efficient Self-Scaling Hermitian Rank-1 Method

Initialization: $x_{k-1}$, $x_k$, $\nabla f(x_{k-1})$, $\nabla f(x_k)$, $\delta > 0$, $\nu_1 \in (0, 1)$, and $\nu_2 \in (1, \infty)$
 1: Set $s_k \leftarrow x_k - x_{k-1}$ and $m_k \leftarrow \nabla f(x_k) - \nabla f(x_{k-1})$
 2: Compute the smallest $a \in [0, 1]$ such that
\bar{m}_k = as_k + (1-a)m_k \ \text{satisfies}\ \nu_1 \le \frac{\Re\langle s_k, \bar{m}_k\rangle}{\langle s_k, s_k\rangle} \ \text{and}\ \frac{\langle\bar{m}_k, \bar{m}_k\rangle}{\Re\langle s_k, \bar{m}_k\rangle} \le \nu_2 \qquad (10)
 3: Compute $\tau_k \leftarrow \frac{\langle s_k, s_k\rangle}{\Re\langle s_k, \bar{m}_k\rangle} - \sqrt{\left(\frac{\langle s_k, s_k\rangle}{\Re\langle s_k, \bar{m}_k\rangle}\right)^2 - \frac{\langle s_k, s_k\rangle}{\langle\bar{m}_k, \bar{m}_k\rangle}}$
 4: $\rho_k \leftarrow \Re\langle s_k - \tau_k\bar{m}_k, \bar{m}_k\rangle$
 5: if $|\rho_k| \le \delta\|s_k - \tau_k\bar{m}_k\|\|\bar{m}_k\|$ then
 6:  $u_k \leftarrow 0$
 7: else
 8:  $u_k \leftarrow s_k - \tau_k\bar{m}_k$
 9: end if
10: $\rho_k^B \leftarrow \tau_k^2\rho_k + \tau_ku_k^Hu_k$
11: Return: $B_k \leftarrow \tau_k^{-1}I_N - u_ku_k^H/\rho_k^B$

A. Incorporating A Convex Constraint

This part extends GKSM to handle a convex constraint on x with a slight increase in computational cost. To ensure x𝒞, we solve the following problem for 𝛃k instead of (7):

\boldsymbol{\beta}_k = \arg\min_{V_k\boldsymbol{\beta}\in\mathcal{C}} \left\| \begin{bmatrix} AV_k \\ \bar{B}_k^{1/2}V_k \end{bmatrix}\boldsymbol{\beta} - \begin{bmatrix} y \\ \bar{B}_k^{1/2}w_k \end{bmatrix} \right\|_2^2. \qquad (11)

Letting $z = V_k\boldsymbol{\beta}$ and using the fact that $V_k^HV_k = I_k$, we have $\boldsymbol{\beta} = V_k^Hz$. Thus we rewrite (11) as

x_{k+1} = \arg\min_{z\in\mathcal{C}} \left\| \begin{bmatrix} AV_kV_k^H \\ \bar{B}_k^{1/2}V_kV_k^H \end{bmatrix}z - \begin{bmatrix} y \\ \bar{B}_k^{1/2}w_k \end{bmatrix} \right\|_2^2. \qquad (12)

Since the objective function of (12) is differentiable, we simply apply the accelerated projected gradient method [63] to solve (12) efficiently. Although $z$ has the same dimension as $x$, solving (12) requires only simple matrix-vector multiplications, since $AV_k$ and $\bar{B}_k^{1/2}V_k$ are precomputed and stored. In practice, to extend Algorithm 1 to handle the convex constraint, we only need to replace the computation of $\boldsymbol{\beta}_k$ at step 4 with the solution of (11) via (12).
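A sketch of an accelerated projected gradient (FISTA-style) loop for a constraint of the form $\{x : \|x\|_\infty \le 1\}$, where the projection clips the magnitudes of the complex entries. This is a simplified stand-in for the method of [63], and all names are illustrative:

```python
import numpy as np

def project_inf_ball(x):
    # project complex x onto {x : ||x||_inf <= 1} by clipping magnitudes
    mag = np.abs(x)
    return np.where(mag > 1.0, x / np.maximum(mag, 1e-30), x)

def apgm(grad, L, x0, n_iter=200):
    # accelerated projected gradient for a smooth objective with
    # gradient `grad` and Lipschitz constant L over the inf-ball
    x = x0.copy()
    z = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x_new = project_inf_ball(z - grad(z) / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

For the separable quadratic $\tfrac12\|x - c\|^2$ the constrained minimizer is simply the projection of $c$, which gives a quick sanity check of the loop.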

B. Discussion

The dominant computations at each iteration of Algorithm 1 are evaluating $\nabla f(x)$, $Ax$, and $A^Hx$ once, so the overall computational cost per iteration is lower than that of the methods proposed in [40, 42], which require dozens of evaluations of $Ax$ and $A^Hx$ per iteration. Apart from computational efficiency, GKSM requires additional memory to store $V_k$ and $AV_k$. Thus, GKSM may become memory-prohibitive for very large-scale problems. A practical heuristic to address this challenge is a restart strategy, in which we cyclically reset $V_k = x_{k+1}/\|x_{k+1}\|$. This not only reduces memory usage but also lowers the computational cost. Algorithmically, when a restart is triggered, this is equivalent to rerunning Algorithm 1 with $x_{k+1}$ as the new initial value and $V_1 = x_{k+1}/\|x_{k+1}\|$. Moreover, in practice there is no guarantee that a column-orthogonal matrix $V_{k+1}$ can always be constructed from $r_k$, since $\tilde{r}_k$ may be zero; the restart strategy typically helps to escape such a situation. However, in our experimental settings, we never encountered $\tilde{r}_k = 0$. We leave the study of restart strategies to future work.

The following convergence analysis shows that GKSM is guaranteed to monotonically decrease the cost function value every iteration. Thus, it is safe to set K=Max_Iter in practice. However, the convergence rate of the cost function values to a minimum remains unclear since we cannot guarantee that Vk will span the entire image space after a finite number of iterations. To better characterize the convergence rate of the cost values, we introduce step 18 in Algorithm 1. This addition allows us to explicitly quantify the cost convergence after K iterations.

IV. Convergence Analysis

This section provides a rigorous convergence analysis of Algorithm 1 for solving (2). Because the unconstrained problem is a special case of the constrained one, we focus our analysis on GKSM for problems with constraints. We use the notation $F_{\mathcal{C}}(x) = F(x) + \iota_{\mathcal{C}}(x)$ in the following analysis, where $\iota_{\mathcal{C}}(x)$ denotes the indicator function, defined as $\iota_{\mathcal{C}}(x) = 0$ if $x \in \mathcal{C}$ and $\iota_{\mathcal{C}}(x) = +\infty$ otherwise. Here, we assume that $\mathcal{C}$ is convex and that its indicator function $\iota_{\mathcal{C}}$ is lower semicontinuous. Before presenting our main convergence results, we first review the definition of the Kurdyka–Łojasiewicz (KL) inequality and state one assumption, followed by four supporting lemmas.

Definition 1 (Kurdyka–Łojasiewicz inequality [64, 65]). Let $\chi: \mathbb{C}^N \to (-\infty, +\infty]$ be a proper, lower semicontinuous function. We say that $\chi$ satisfies the Kurdyka–Łojasiewicz (KL) inequality at a point $\bar{x} \in \operatorname{dom}(\partial\chi)$ if there exist $\eta > 0$, a neighborhood $\mathcal{U}$ of $\bar{x}$, and a continuous concave function $\varphi: [0, \eta) \to \mathbb{R}_+$ that is continuously differentiable on $(0, \eta)$ and satisfies $\varphi(0) = 0$ and $\varphi'(s) > 0$ for all $s \in (0, \eta)$, such that

\varphi'\big(|\chi(x) - \chi(\bar{x})|\big) \cdot \operatorname{dist}\big(0, \partial\chi(x)\big) \ge 1

holds for all $x \in \mathcal{U} \cap \{x \in \mathbb{C}^N : |\chi(x) - \chi(\bar{x})| < \eta\}$. Here, $\partial\chi(x)$ denotes the subdifferential of $\chi$ at $x$, and $\operatorname{dist}(\cdot,\cdot)$ denotes the Euclidean distance.

In Definition 1, $\varphi(s)$ is called the desingularization function. If $\varphi$ has the form $\varphi(s) = cs^{1-t}$ for $t \in [0, 1)$ and $c > 0$, then we say that $\chi$ has the KL property at $\bar{x}$ with exponent $t$.

Assumption 1 ($L$-smooth $f$). Assume that $f: \mathbb{C}^N \to (-\infty, +\infty]$ is a proper, lower semicontinuous, and lower bounded function. Further assume that the gradient of $f$ is $L$-Lipschitz continuous; that is, there exists $L > 0$ such that for all $x_1, x_2 \in \mathbb{C}^N$,

\|\nabla f(x_1) - \nabla f(x_2)\| \le L\|x_1 - x_2\|. \qquad (13)

Lemma 1 (Majorizer of $f$ [42, Lemma 1]). Let $f: \mathbb{C}^N \to (-\infty, +\infty]$ be an $L$-smooth function. Then for any $x_1, x_2 \in \mathbb{C}^N$, we have

f(x_2) \le f(x_1) + \Re\{\langle\nabla f(x_1), x_2 - x_1\rangle\} + \frac{L}{2}\|x_1 - x_2\|_2^2. \qquad (14)

Lemma 2 (Bounded Hessian [42, Lemma 4]). The approximate Hessian matrices $B_k$ generated by Algorithm 2 satisfy

\underline{\eta}I \preceq B_k \preceq \overline{\eta}I,

where $0 < \underline{\eta} < \overline{\eta} < \infty$.

Lemma 3. By running Algorithm 1 to solve (2), we have the following inequality at the $k$th iteration:

\Re\{\langle\nabla f(x_k), x_{k+1} - x_k\rangle\} \le h(x_k) - h(x_{k+1}) - \frac{1}{2}\|x_k - x_{k+1}\|_{\frac{2}{\alpha_k}B_k - \underline{\eta}I_N}^2. \qquad (15)

Lemma 4. Suppose the elements of the sequence $\{\phi_k > 0\}_{k\ge1}$ satisfy

\phi_{k+1}^{2t} \le \gamma(\phi_k - \phi_{k+1}) \quad\text{and}\quad \phi_{k+1} \le \phi_k,

where $t \in (0, 1)$ and $\gamma > 0$. Then we have the following upper bounds for $\phi_{k+1}$:

\phi_{k+1} \le \begin{cases} \left(1 - \dfrac{\phi_1^{2t-1}}{\gamma + \phi_1^{2t-1}}\right)^k \phi_1, & t \in (0, \tfrac12], \\[4pt] \left(\phi_1^{1-2t} + \dfrac{(2t-1)(1-\sigma)^{2t}}{2\gamma}\,k\right)_+^{\frac{1}{1-2t}}, & t \in (\tfrac12, 1), \end{cases} \qquad (16)

where $\sigma \in (0, 1)$ and $(\cdot)_+ = \max(\cdot, 0)$.
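The linear-rate branch of Lemma 4 can be checked numerically: for $t \in (0, \tfrac12]$, a positive decreasing sequence satisfying $\phi_{k+1}^{2t} \le \gamma(\phi_k - \phi_{k+1})$ obeys $\phi_{k+1} \le (1 - \phi_1^{2t-1}/(\gamma + \phi_1^{2t-1}))^k\phi_1$; this form follows our reconstruction of (16). The sketch below iterates the worst case, where the recursion holds with equality:

```python
import numpy as np

# Worst-case check of the linear-rate branch: phi_{k+1} solves
# phi^{2t} = gamma * (phi_k - phi) exactly (equality in the recursion)
# and must still obey the geometric bound.
t, gamma, phi1 = 0.4, 2.0, 1.0

def next_phi(phi_k):
    # g(phi) = phi^{2t} + gamma*phi - gamma*phi_k is increasing and
    # changes sign on (0, phi_k); bisect for its root
    lo, hi = 0.0, phi_k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** (2 * t) > gamma * (phi_k - mid):
            hi = mid
        else:
            lo = mid
    return lo

rate = 1.0 - phi1 ** (2 * t - 1) / (gamma + phi1 ** (2 * t - 1))
phi, bounds_hold = phi1, True
for k in range(1, 30):
    phi = next_phi(phi)
    bounds_hold = bounds_hold and (phi <= rate ** k * phi1 + 1e-12)
```

The argument behind the bound is one line: since $\phi_{k+1} \le \phi_1$ and $2t - 1 \le 0$, we have $\phi_{k+1}^{2t} \ge \phi_1^{2t-1}\phi_{k+1}$, and substituting this into the recursion gives the geometric contraction.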

Lemmas 1 and 2 were already demonstrated in [42], so we omit their proofs here. The proofs of Lemmas 3 and 4 are provided in Appendices A and B. Lemma 4 is used to establish the convergence rates of the cost function sequence for different values of t in the KL inequality when running Algorithm 1. Theorems 1 and 2 summarize our main convergence results.

Theorem 1 (Descent properties of Algorithm 1, $K = +\infty$). Let $\alpha_k \in \left(0, \frac{2\underline{\eta}}{\underline{\eta}+L}\right)$ and $\Delta_k \triangleq \min_{k' \le k}\|x_{k'+1} - x_{k'}\|_2^2$. Under Assumption 1, by running $k$ iterations of Algorithm 1 to solve (2), we have

  • $\Delta_k \le \dfrac{F(x_1) - F^*}{vk}$ and $F(x_{k+1}) \le F(x_k)$, where $v = \min_k\left(\underline{\eta}/\alpha_k - (\underline{\eta}+L)/2\right)$, $F^*$ denotes the minimum of (2), and $x_1$ is the initial iterate.

  • $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$.

Theorem 2 (Convergence rates, $K < +\infty$). Let $\alpha_k \in \left(0, \frac{2\underline{\eta}}{\underline{\eta}+L}\right)$ and $\mathcal{B}(x^*, \Lambda) = \{x \in \mathbb{C}^N : \|x - x^*\| \le \Lambda\}$. Under Assumption 1, by running Algorithm 1 for $k > K$ iterations to solve (2), we have

  • $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$, and all cluster points of the sequence $\{x_k\}_{k>K}$ are critical points of (2).

  • Assume $x_k$ converges to $x^*$ and that $F_{\mathcal{C}}$ satisfies the KL inequality. Then there exists $\Lambda > 0$ such that $F_{\mathcal{C}}$ satisfies the KL inequality at $\bar{x} = x^*$ in a neighborhood $\mathcal{U}$ containing $\mathcal{B}(x^*, \Lambda)$, and there exists $K' > K$ such that $x_k \in \mathcal{B}(x^*, \Lambda)$ and $F_{\mathcal{C}}(x_k) - F^* < \eta$ for all $k \ge K'$. For $k \ge K'$, we have the following convergence rates of the cost function values for $t \in [0, 1)$:
    1. $F(x_{k+1}) - F^* \le \left(\Delta F_{K'} - \frac{k - K' + 1}{\gamma}\right)_+$, for $t = 0$;
    2. $F(x_{k+1}) - F^* \le \left(1 - \frac{\Delta F_{K'}^{2t-1}}{\Delta F_{K'}^{2t-1} + \gamma}\right)^{k-K'+1}\Delta F_{K'}$, for $t \in (0, \tfrac12]$;
    3. $F(x_{k+1}) - F^* \le \left(\Delta F_{K'}^{1-2t} + q(k - K' + 1)\right)_+^{\frac{1}{1-2t}}$, for $t \in (\tfrac12, 1)$;
    where $\Delta F_{K'} = F(x_{K'}) - F^*$, $\sigma \in (0, 1)$, $\gamma = \max_k \frac{[c(L\alpha_k + \overline{\eta})]^2}{v(1-t)^2\alpha_k^2}$, and $q = (t - \tfrac12)(1-\sigma)^{2t}/\gamma$.

Theorem 1 states that GKSM ensures $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$ for any $K$, either finite or infinite. The first result in Theorem 2 establishes that all cluster points of the sequence $\{x_k\}_{k>K}$ are critical points of (2) for finite $K$. The second result in Theorem 2 provides convergence rates of the cost values for $t \in [0, 1)$ after some iteration $K' > K$. During the algorithm's execution, all iterates remain in $\mathcal{C}$, so $F_{\mathcal{C}}(x_k) = F(x_k)$ for all $k$. For ease of notation, we use $F(x)$ instead of $F_{\mathcal{C}}(x)$ to express the convergence rates in Theorem 2. Note that if $F(x_{k+1}) = F^*$, the left-hand side is identically zero and the convergence rate bounds in Theorem 2 hold trivially.

Note that Algorithm 1 reduces to CQNPM [42] when $K$ is finite and $k > K$. Theorem 2 then extends the theoretical analysis in [42] to a more general class of functions $F_{\mathcal{C}}$. Specifically, our analysis relies on the KL inequality, which is weaker than the Polyak-Łojasiewicz (PL) inequality² used in [42]. Section V-C empirically studies the convergence behavior of Algorithm 1 to validate our theoretical analysis.

V. Numerical Experiments

This section studies the performance of GKSM for CS MRI reconstruction with spiral and radial sampling trajectories. Note that the experimental and algorithmic settings used here are similar to those in our previous work [42]. For convenience, we briefly re-describe them in this paper. Then, we present the reconstruction results and study the convergence behavior of GKSM empirically.

Experimental Settings:

The performance of GKSM is evaluated on both brain and knee MRI datasets. For the brain images, we used the dataset from [18], which contains 360 training and 164 test images. For the knee images, we adopted the multi-coil knee dataset from NYU fastMRI [44]. The ESPIRiT algorithm [66] was used to obtain the complex-valued images from the raw k-space data. All images were then resized to a uniform size of 256×256 and normalized so that the maximum magnitude was one. The network architecture proposed in [39], with the addition of bias terms, is used to construct $f(x)$; see Fig. 1. The number of layers is set to six. Note that other neural network architectures can also be used to train the gradient-driven denoiser; the only requirement is that the nonlinear activation functions be differentiable. For example, Hurault et al. [40] used DRUNet, replacing the ReLU activations with ELU, to construct $f(x)$. The noisy images were generated by adding i.i.d. Gaussian noise with variance 1/255 to the clean images. The network was trained using the mean squared error loss. We employed the ADAM optimizer [67] with an initial learning rate of $10^{-3}$, which was halved every 4,000 iterations. Training was performed for a total of 18,000 iterations with a batch size of 64. Although we trained separate denoisers for the brain and knee datasets, different sampling trajectories used the same denoiser.

Fig. 1.


The neural network architecture used to construct the energy function f𝛉(x) is based on [39]. The convolutional kernels have size 3×3 with a stride of one.

To assess reconstruction quality, we selected six brain and knee test images as ground truth. Fig. 2 displays the magnitudes of these images. For spiral acquisition, we used six interleaves with 1688 readout points and 32 coils. Radial acquisition employed 55 spokes with golden-angle rotation, 1024 readout points, and 32 coils. Fig. 3 describes these sampling trajectories. To simulate k-space data, we applied the forward model to the ground-truth images and then added complex i.i.d. Gaussian noise (zero mean, variance $10^{-4}$), resulting in an input SNR of approximately 21 dB. In the reconstruction, we employed coil compression [68] to reduce the number of coils from 32 to 20 virtual coils, thereby lowering the computational cost. All experiments used simulated k-space data, were implemented in PyTorch, and ran on an NVIDIA A100 GPU.

Fig. 2.


The magnitude of the six brain and knee complex-valued ground truth images.

Fig. 3.


The spiral (a) and radial (b) sampling trajectories.

Algorithmic Settings:

Previous work [42] already showed that the accelerated proximal gradient method (dubbed APG) [69] is faster than the projected gradient descent method [39] and the proximal gradient method [40] for addressing (2). Thus, we mainly compared GKSM with CQNPM [42] and APG in this paper. To test the applicability of GKSM to a constrained problem, we used the constraint set $\mathcal{C} = \{x : \|x\|_\infty \le 1\}$ in all competing methods. However, in practical CS MRI reconstruction, we generally do not impose such a constraint. For plots involving $F^*$, we ran APG for 500 iterations and defined $F^* = F(x_{500}) - \varepsilon$ for a small constant $\varepsilon > 0$. Unless otherwise specified, we set $K$ = Max_Iter in the following experiments. Algorithm 2 used $\delta = 10^{-8}$, $\nu_1 = 2\times10^{-6}$, and $\nu_2 = 200$.

A. Spiral Acquisition Reconstruction

Fig. 4 summarizes the cost and PSNR values versus the number of iterations and wall time for each method on the brain 1 image. Figs. 4(a) and (c) show that CQNPM was the fastest algorithm in terms of the number of iterations. GKSM achieved similar results to CQNPM after enough iterations. Figs. 4(b) and (d) report the cost and PSNR values versus wall time, where GKSM was the fastest algorithm in terms of wall time. In multi-coil CS MRI reconstruction with non-Cartesian sampling, computing $Ax$ and $A^Hx$ typically dominates the computational cost. CQNPM (respectively, APG) requires solving a weighted proximal mapping (respectively, a proximal mapping) at each iteration, which involves multiple evaluations of $Ax$ and $A^Hx$. In contrast, GKSM requires only a single evaluation of $Ax$ and $A^Hx$ per iteration, which significantly reduces the computational cost while maintaining relatively fast convergence in terms of the number of iterations.

Fig. 4.


Comparison of different methods with spiral acquisition on the brain 1 image for $\varepsilon = 5\times10^{-3}$. (a), (b): cost values versus iteration and wall time; (c), (d): PSNR values versus iteration and wall time.

Fig. 5 presents the reconstructed images and the corresponding error maps at the 50th and 100th iterations of each method. GKSM eventually achieved a reconstruction quality similar to that of CQNPM with the same number of iterations, but with significantly less wall time. Table I summarizes the PSNR performance on the remaining five brain images within 150 iterations. Clearly, GKSM was approximately 7× faster than CQNPM in terms of the wall time required to exceed the PSNR of APG within 150 iterations. Moreover, GKSM achieved nearly the same performance as CQNPM at the 150th iteration while requiring approximately 9× less time, illustrating the superior performance of our method. The supplementary material includes additional results on the knee images with spiral acquisition, as well as the structural similarity index measure (SSIM) metrics, which exhibit similar behavior.

Fig. 5.


First row: the reconstructed brain 1 images of each method at the 50th and 100th iterations with spiral acquisition. The PSNR (respectively, SSIM) values are labeled at the bottom-left (respectively, bottom-right) corner of each image. Second row: the associated error maps (8×) of the reconstructed images.

TABLE I.

PSNR performance of each method for reconstructing five additional brain test images with spiral acquisition. For APG, we report the maximum PSNR within 150 iterations, with the corresponding number of iterations and wall time. For CQNPM and GKSM, the first row reports the earliest iteration count that exceeds the APG PSNR, along with its PSNR and wall time; the second row reports the PSNR and wall time at the 150th iteration. GKSM attains both the shortest wall time to exceed the APG PSNR and the shortest wall time at the 150th iteration.

Method                Image 2              Image 3              Image 4              Image 5              Image 6
                      PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓
APG                   42.1  150   86.3     42.8  150   86.4     43.3  150   87.1     42.0  150   86.4     40.5  150   88.7
CQNPM (exceeds APG)   42.1  29    16.9     42.8  30    19.7     43.3  31    19.1     42.1  30    17.6     40.5  29    17.6
CQNPM (150th iter.)   44.2  150   82.0     44.7  150   70.1     45.0  150   65.5     44.0  150   68.9     42.7  150   77.0
GKSM (exceeds APG)    42.2  49    2.4      42.9  50    2.4      43.3  50    2.4      42.1  51    2.5      40.5  49    2.3
GKSM (150th iter.)    44.2  150   8.4      44.7  150   8.2      45.1  150   8.2      43.9  150   8.3      42.5  150   8.2

B. Radial Acquisition Reconstruction

Fig. 6 presents the cost and PSNR values versus the number of iterations and wall time of each method on the knee 1 image with radial acquisition. Figs. 6(a) and (c) show that CQNPM converged faster than APG in terms of the number of iterations. In this experimental setting, we found that CQNPM made faster progress than GKSM in the early iterations, but GKSM surpassed CQNPM in the later iterations. This observation differs slightly from Fig. 4. Although all methods have convergence guarantees under the same assumptions, we cannot guarantee that their iterates follow the same trajectory. One possible explanation is that the subspace in GKSM may sometimes act as an additional constraint, guiding the iterates along a more favorable path toward a minimizer. A detailed study of this behavior is beyond the scope of this paper and is left for future work. Figs. 6(b) and (d) display the cost and PSNR values of each method versus wall time. Evidently, GKSM converged faster than the other methods in terms of wall time, which is consistent with the previous observation.

Fig. 7 reports the reconstructed images of each method at the 50th and 100th iterations. In this experiment, GKSM demonstrated the best visual quality among all methods. Table II reports the PSNR and wall-time performance of each method on the remaining five knee test images. We ran APG for 100 iterations and then compared how many iterations the other methods required to exceed the PSNR value achieved by APG. Table II also reports the PSNR values and wall time of CQNPM and GKSM at 100 iterations. We observed behavior consistent with Table I. The supplementary material includes additional results on the brain images with radial acquisition, as well as SSIM metrics that show similar trends.

C. Effect of K and Convergence Validation

This part empirically studies the effect of $K$ and the convergence behavior of GKSM using the brain 1 image and the spiral acquisition settings of Fig. 4. Fig. 8 reports the PSNR values versus the number of iterations and wall time for GKSM with varying $K$, alongside CQNPM. We observed that all tested values of $K$ converged to PSNR values similar to those of CQNPM. Fig. 8(b) presents the PSNR values versus wall time, where larger values of $K$ led to faster convergence than smaller ones. This observation is consistent with our earlier results, as GKSM avoids solving a weighted proximal mapping for iterations $k \le K$.
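To make the subspace mechanism concrete, the following minimal Python sketch (ours; the names `gks_toy`, `V`, and `beta` are illustrative) grows a generalized-Krylov basis from successive gradient directions and solves the small projected least-squares problem at each iteration. It omits the regularizer $h$, the weighting matrix $B_k$, and the $k > K$ full-space phase of GKSM, so it is a schematic of the subspace step only, not the paper's algorithm.

```python
import numpy as np

def gks_toy(A, y, K=8, iters=20):
    """Schematic generalized-Krylov iteration for min_x ||A x - y||_2^2.

    Each iteration orthogonalizes the current (negative) gradient
    against the basis V, appends it while fewer than K columns exist,
    and solves the small projected least-squares problem for beta.
    """
    n = A.shape[1]
    x = np.zeros(n)
    V = np.zeros((n, 0))                   # orthonormal subspace basis
    for _ in range(iters):
        r = A.T @ (y - A @ x)              # negative gradient direction
        if V.shape[1] < K:
            r_t = r - V @ (V.T @ r)        # orthogonalize against span(V)
            nrm = np.linalg.norm(r_t)
            if nrm > 1e-12:                # expand only if direction is new
                V = np.hstack([V, (r_t / nrm)[:, None]])
        beta, *_ = np.linalg.lstsq(A @ V, y, rcond=None)  # projected problem
        x = V @ beta                       # lift back to the full space
    return x
```

On a toy system this reaches the ordinary least-squares solution once the basis spans enough directions; in GKSM the projected problem additionally contains the weighted proximal term, which is what makes each subspace iteration cheap.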

We now empirically validate our theoretical analysis. Fig. 9 presents the cost values and the values of $\Delta_k/\Delta_1$ for GKSM with spiral acquisition on six brain test images. As expected, the cost values converged to a constant across all test images and $\Delta_k/\Delta_1 \to 0$, consistent with our theoretical analysis.
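The diagnostic $\Delta_k/\Delta_1$ in Fig. 9 is inexpensive to compute from stored iterates; a small helper (ours, for illustration, with $\Delta_k = \min_{j\le k}\|x_j - x_{j+1}\|_2^2$) might look like:

```python
import numpy as np

def delta_ratio(iterates):
    """Return Delta_k / Delta_1 for k = 1..M given iterates x_1..x_{M+1},
    where Delta_k = min_{j <= k} ||x_j - x_{j+1}||_2^2."""
    d = [float(np.linalg.norm(iterates[j + 1] - iterates[j]) ** 2)
         for j in range(len(iterates) - 1)]
    deltas = np.minimum.accumulate(d)   # running minimum gives Delta_k
    return deltas / deltas[0]
```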

VI. Conclusion

A well-established theoretical foundation is especially important for ensuring reliability in medical imaging applications. Compared with the PnP and RED frameworks, gradient-driven denoisers offer a significantly stronger theoretical foundation. In particular, the only required assumptions are the differentiability of f and the Lipschitz continuity of ∇f, which are easy to satisfy in practice. To efficiently solve the associated nonconvex minimization problem, we developed a generalized Krylov subspace method with convergence guarantees in nonconvex settings. Numerical experiments on multi-coil CS MRI reconstruction with non-Cartesian sampling trajectories demonstrate that the proposed method can recover images within seconds on a GPU platform. This significantly improves the efficiency of solving the associated optimization problem and enhances the practical applicability of gradient-driven denoisers.

Supplementary Material

supp1-3655489

Acknowledgments

TH and UV were supported in part by NIH grant R01 EB034261.

Supported in part by NIH grants R01 EB035618 and R21 EB034344.

Appendix A

Proof of Lemma 3

Since $B_k \succ 0$ (cf. Lemma 2), the objective function in (6) is $\underline{\eta}$-strongly convex. Combining this with the fact that $V_k^H V_k = I_k$ and the $\underline{\eta}$-strong convexity inequality, we have the following inequality at the $k$th iteration for $x = V_k\boldsymbol{\beta}$, $x \in \mathcal{C}$:

$$\Big\langle V_k^H\big[\partial h(x_{k+1}) + \nabla\bar f(x_{k+1}, x_k, B_k, \alpha_k) + \tfrac{\underline{\eta}}{2}(V_k\boldsymbol{\beta} - x_{k+1})\big],\ \boldsymbol{\beta} - \boldsymbol{\beta}_k\Big\rangle \ge 0. \tag{17}$$

TABLE II.

PSNR performance of each method for reconstructing five additional knee test images with radial acquisition. For APG, we report the maximum PSNR (within 100 iterations), the corresponding number of iterations, and the wall time. For the other methods, the first row reports the earliest iteration that exceeds the APG PSNR (along with its PSNR and wall time), and the second row reports the PSNR and wall time at the 100th iteration. Bold indicates the shortest wall time at which the PSNR of APG was exceeded. The PSNR values and wall time of CQNPM and GKSM at the 100th iteration are underlined. The blue digits denote the shortest wall time at the 100th iteration.

Method              | Index 2            | Index 3            | Index 4            | Index 5            | Index 6
                    | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓
APG                 | 42.6  100   61.0   | 41.4  100   61.3   | 44.6  100   60.7   | 42.1  100   63.6   | 44.2  100   64.7
CQNPM (exceeds APG) | 42.7   26   15.8   | 41.4   30   19.1   | 44.7   25   16.4   | 42.2   28   17.0   | 44.2   25   16.3
CQNPM (100 iter.)   | 44.1  100   59.9   | 41.9  100   64.3   | 45.5  100   65.4   | 44.1  100   61.9   | 44.9  100   66.3
GKSM (exceeds APG)  | 42.7   41    2.0   | 41.4   33    1.6   | 44.7   38    1.8   | 42.3   36    1.8   | 44.2   39    1.9
GKSM (100 iter.)    | 44.1  100    5.6   | 43.3  100    5.5   | 45.5  100    5.5   | 44.4  100    5.7   | 44.9  100    5.5

Fig. 6.

Comparison of different methods with radial acquisition on the knee 1 image for ε = 6×10³. (a), (b): cost values versus iteration and wall time; (c), (d): PSNR values versus iteration and wall time.

Letting $\boldsymbol{\beta} = \begin{bmatrix}\boldsymbol{\beta}_{k-1} \\ 0\end{bmatrix}$, so that $V_k\boldsymbol{\beta} = x_k$, we rewrite (17) as

$$\Big\langle \partial h(x_{k+1}) + \tfrac{1}{\alpha_k}B_k(x_{k+1} - x_k) + \nabla f(x_k) + \tfrac{\underline{\eta}}{2}(x_k - x_{k+1}),\ x_k - x_{k+1}\Big\rangle \ge 0. \tag{18}$$

Note that if $\tilde r_k = 0$, we choose $\boldsymbol{\beta} = \boldsymbol{\beta}_{k-1}$ and (18) still holds. By reorganizing (18) and using the convexity of $h(x)$, i.e., $h(x_k) \ge h(x_{k+1}) + \langle \partial h(x_{k+1}), x_k - x_{k+1}\rangle$, we get the desired result

$$\langle \nabla f(x_k),\ x_{k+1} - x_k\rangle \le h(x_k) - h(x_{k+1}) - \tfrac12\|x_k - x_{k+1}\|^2_{\frac{2}{\alpha_k}B_k - \underline{\eta} I_N}. \tag{19}$$

Appendix B

Proof of Lemma 4

Our goal is to derive an upper bound for $\phi_{k+1}$ using the facts that $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$ and $0 < \phi_{k+1} \le \phi_k$. Rewrite $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$ as $\phi_k \ge \phi_{k+1}\big(1 + \tfrac{1}{\gamma}\phi_{k+1}^{2t-1}\big)$. For $t \in (0, 1/2)$, the map $s \mapsto s^{2t-1}$ is monotonically decreasing since $2t - 1 < 0$, so $\phi_{k+1}^{2t-1} \ge \phi_1^{2t-1}$, which implies $\phi_k \ge \phi_{k+1}\big(1 + \tfrac{1}{\gamma}\phi_1^{2t-1}\big)$, yielding
$$\phi_{k+1} \le \Big(1 - \frac{\phi_1^{2t-1}}{\gamma + \phi_1^{2t-1}}\Big)^k \phi_1.$$
If $t = \tfrac12$, we have $\phi_{k+1} \le \gamma(\phi_k - \phi_{k+1})$, which yields $\phi_{k+1} \le \frac{\gamma}{1+\gamma}\phi_k$. Therefore, we can establish the desired result immediately:
$$\phi_{k+1} \le \Big(1 - \frac{1}{1+\gamma}\Big)^k \phi_1.$$
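As a quick numerical sanity check (not part of the proof) of the $t = \tfrac12$ case, one can simulate a sequence that satisfies the assumed decrease $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$ with equality and confirm the geometric bound; the value $\gamma = 4$ is an arbitrary choice:

```python
# For t = 1/2, the equality case of phi_k - phi_{k+1} = phi_{k+1}/gamma
# gives phi_{k+1} = (gamma/(1+gamma)) phi_k, so the derived bound
# phi_{k+1} <= (1 - 1/(1+gamma))^k phi_1 should hold (here, tightly).
gamma = 4.0
phi = [1.0]
for _ in range(50):
    phi.append(phi[-1] * gamma / (1.0 + gamma))
for k in range(1, 51):
    bound = (1.0 - 1.0 / (1.0 + gamma)) ** k * phi[0]
    assert phi[k] <= bound + 1e-12
```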

Denote $\psi(x) = x^{1-2t}$ for $x > 0$, and let $\bar t = 2t - 1$. Using the mean value theorem, we have

$$\psi(\phi_{k+1}) - \psi(\phi_k) = -\bar t\,\tilde\phi_k^{-\bar t - 1}\,(\phi_{k+1} - \phi_k), \tag{20}$$

with $\tilde\phi_k \in [\phi_{k+1}, \phi_k]$. Since $s \mapsto s^{-\bar t - 1}$ is monotonically decreasing and $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$, we get the following inequalities from (20) for $t \in (1/2, 1)$:

$$\psi(\phi_{k+1}) - \psi(\phi_k) \ge \bar t\,\phi_k^{-2t}(\phi_k - \phi_{k+1}) \ge \frac{\bar t}{\gamma}\,\phi_k^{-2t}\phi_{k+1}^{2t}. \tag{21}$$

Since $0 < \phi_{k+1} \le \phi_k$, we have $\phi_{k+1}/\phi_k \le 1$. Suppose we run $k$ iterations. For any $\sigma \in (0,1)$, we can split the whole set of iterate indices into two subsets $\mathcal{I}_1$ and $\mathcal{I}_2$ such that $\mathcal{I}_1 = \{k : \phi_{k+1}/\phi_k \le 1 - \sigma\}$ and $\mathcal{I}_2 = \{k : \phi_{k+1}/\phi_k > 1 - \sigma\}$. So we know that either $|\mathcal{I}_1| \ge k/2$ or $|\mathcal{I}_2| \ge k/2$.

If $|\mathcal{I}_1| \ge k/2$, we get

$$\phi_{k+1} \le (1-\sigma)^{k/2}\,\phi_1. \tag{22}$$

Next, we consider $|\mathcal{I}_2| \ge k/2$. By summing up (21) from $1$ to $k$, we reach

$$\psi(\phi_{k+1}) \ge \psi(\phi_1) + \frac{(2t-1)(1-\sigma)^{2t}}{2\gamma}\,k.$$

Using the definition of $\psi(\cdot)$ and the fact that $1 - 2t < 0$, we derive

$$\phi_{k+1} \le \Big[\phi_1^{1-2t} + \frac{(2t-1)(1-\sigma)^{2t}}{2\gamma}\,k\Big]^{\frac{1}{1-2t}}. \tag{23}$$

If k is large enough, the bound in (22) is smaller than that in (23), yielding the desired result.

Fig. 7.

First row: the reconstructed knee 1 images of each method at the 50th and 100th iterations with radial acquisition. The PSNR (respectively, SSIM) values are labeled at the bottom-left (respectively, bottom-right) corner of each image. Second row: the associated error maps (8×) of the reconstructed images.

Fig. 8.

Comparison of varying K with spiral acquisition on the brain 1 image. (a), (b): PSNR values versus iteration and wall time.

Fig. 9.

Averaged cost values (a) and Δk/Δ1 (b) versus iteration for GKSM. The shaded region of each curve represents the range of the cost values and Δk across six brain test images with spiral acquisition.

Appendix C

By using Lemma 1, we have the following inequalities

$$f(x_{k+1}) \le f(x_k) + \tfrac{L}{2}\|x_{k+1} - x_k\|_2^2 + \langle\nabla f(x_k),\ x_{k+1} - x_k\rangle \le f(x_k) + h(x_k) - h(x_{k+1}) - \tfrac12\|x_k - x_{k+1}\|^2_{2B_k/\alpha_k - (\underline{\eta}+L)I_N}. \tag{24}$$

The second inequality comes from Lemma 3. Reorganizing (24), we get

$$\tfrac12\|x_k - x_{k+1}\|^2_{2B_k/\alpha_k - (\underline{\eta}+L)I_N} \le F(x_k) - F(x_{k+1}).$$

Letting $\alpha_k < \frac{2\underline{\eta}}{\underline{\eta}+L}$, $v = \min_k\{\underline{\eta}/\alpha_k - (\underline{\eta}+L)/2\}$, and using Lemma 2, we reach

$$v\|x_k - x_{k+1}\|_2^2 \le F(x_k) - F(x_{k+1}). \tag{25}$$

Since $v > 0$, we have $F(x_{k+1}) \le F(x_k)$. Summing up (25) from $k = 1$ to $K$, we get

$$\sum_{k=1}^{K} v\|x_k - x_{k+1}\|_2^2 \le F(x_1) - F(x_{K+1}) \le F(x_1) - F^*, \tag{26}$$

where $F^*$ denotes the minimal value of $F(x)$. Letting $\Delta_K = \min_{k \le K}\|x_k - x_{k+1}\|_2^2$, we get the desired result

$$\Delta_K \le \frac{F(x_1) - F^*}{vK}. \tag{27}$$

Letting $K \to \infty$, we get $\Delta_K \to 0$. Together with the summation in (26), we obtain $\|x_{k+1} - x_k\|_2 \to 0$ as $k \to \infty$.
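The chain (24)-(27) can be checked numerically on a simple instance. The sketch below (ours) uses plain gradient descent with step $1/L$ on a least-squares cost, for which the standard descent lemma gives the sufficient decrease $F(x_k) - F(x_{k+1}) \ge v\|x_k - x_{k+1}\|_2^2$ with $v = L/2$; this constant stands in for the $B_k/\alpha_k$-weighted version used in the proof, so the check illustrates the argument rather than GKSM itself:

```python
import numpy as np

# Verify Delta_K <= (F(x_1) - F*)/(v K) for gradient descent on
# F(x) = 0.5 ||A x - b||^2 with step 1/L, where v = L/2.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of the gradient
F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

x = np.zeros(10)
xs = [x]
for _ in range(200):
    x = x - (1.0 / L) * (A.T @ (A @ x - b))
    xs.append(x)

Fstar = F(np.linalg.lstsq(A, b, rcond=None)[0])  # minimal cost value
v = L / 2.0
diffs = [np.linalg.norm(xs[k + 1] - xs[k]) ** 2 for k in range(200)]
for K in range(1, 201):
    DeltaK = min(diffs[:K])   # Delta_K = min_{k <= K} ||x_k - x_{k+1}||^2
    assert DeltaK <= (F(xs[0]) - Fstar) / (v * K) + 1e-12
```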

Appendix D

Proof of Theorem 2

For $k > K$, we have $V_k = I_N$, so $V_k^H V_k = I_N$ still holds. Therefore, (26) and (27) remain valid for $k > K$. Consequently, we still have $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$. Next, we prove that all cluster points of the sequence $\{x_k\}_{k>K}$ are critical points of (2).

Let $G(\boldsymbol{\beta}) = \|\bar A_k \boldsymbol{\beta} - \bar y_k\|_2^2$ denote the cost function of (11) with $\bar A_k = \begin{bmatrix} A \\ \bar B_k^{1/2}\end{bmatrix}$ and $\bar y_k = \begin{bmatrix} y \\ \bar B_k^{1/2} w_k\end{bmatrix}$. Then we rewrite (11) as an unconstrained problem, i.e.,

$$\boldsymbol{\beta}_k = \arg\min_{\boldsymbol{\beta}}\ G(\boldsymbol{\beta}) + \iota_{\mathcal{C}}(\boldsymbol{\beta}). \tag{28}$$

From the first-order optimality condition of (28) and using the fact that $x_{k+1} = \boldsymbol{\beta}_k$ (since $V_k = I_N$), we have

$$0 \in \partial h(x_{k+1}) + \partial\iota_{\mathcal{C}}(x_{k+1}) + \nabla f(x_k) + \tfrac{1}{\alpha_k}B_k(x_{k+1} - x_k),$$

which implies

$$\nabla f(x_{k+1}) - \nabla f(x_k) + \tfrac{1}{\alpha_k}B_k(x_k - x_{k+1}) \in \partial h(x_{k+1}) + \nabla f(x_{k+1}) + \partial\iota_{\mathcal{C}}(x_{k+1}). \tag{29}$$

Here, we use the definitions of $\bar f(x, x_k, B_k, \alpha_k)$ and $h(x)$.

Note that $F_{\mathcal{C}}(x) = F(x) + \iota_{\mathcal{C}}(x)$ with $F(x) = h(x) + f(x)$. By using (29), we have

$$\operatorname{dist}\big(0, \partial F_{\mathcal{C}}(x_{k+1})\big) \le \Big\|\nabla f(x_{k+1}) - \nabla f(x_k) + \tfrac{1}{\alpha_k}B_k(x_k - x_{k+1})\Big\| \le \|\nabla f(x_{k+1}) - \nabla f(x_k)\| + \tfrac{\bar\eta}{\alpha_k}\|x_k - x_{k+1}\| \le \tfrac{L\alpha_k + \bar\eta}{\alpha_k}\|x_k - x_{k+1}\|. \tag{30}$$

Notice that $\|x_k - x_{k+1}\| \to 0$ as $k \to \infty$ and that $\frac{L\alpha_k + \bar\eta}{\alpha_k}$ remains finite. So we have $\operatorname{dist}(0, \partial F_{\mathcal{C}}(x_{k+1})) \to 0$ as $k \to \infty$, which implies that all cluster points of $\{x_k\}_{k>K}$ are critical points of (2). This completes the proof of the first part.

Since $F_{\mathcal{C}}$ satisfies the KL inequality and $x_k$ converges to $x^*$, there exist $K' > K$ and $\Lambda > 0$ such that, for all $k \ge K'$, we have $x_k \in \mathcal{B}(x^*, \Lambda)$, where $\mathcal{B}(x^*, \Lambda) = \{x \in \mathbb{C}^N : \|x - x^*\| \le \Lambda\}$, and $F_{\mathcal{C}}(x_k) - F^* < \eta$. By letting $\bar x = x^*$, $\varphi(s) = c\,s^{1-t}$, and using the assumption $\mathcal{B}(x^*, \Lambda) \subset \mathcal{U}$, we get

$$\big(F_{\mathcal{C}}(x_{k+1}) - F^*\big)^{2t} \le c^2(1-t)^2 \operatorname{dist}\big(0, \partial F_{\mathcal{C}}(x_{k+1})\big)^2 \overset{(30)}{\le} \frac{(1-t)^2\big[c(L\alpha_k + \bar\eta)\big]^2}{\alpha_k^2}\,\|x_k - x_{k+1}\|_2^2 \overset{(25)}{\le} \frac{(1-t)^2\big[c(L\alpha_k + \bar\eta)\big]^2}{v\,\alpha_k^2}\,\big(F(x_k) - F(x_{k+1})\big). \tag{31}$$

Note that during the algorithm's progress, all iterates remain in $\mathcal{C}$, so that $F_{\mathcal{C}}(x) = F(x)$. For simplicity, we write $F(x)$ instead of $F_{\mathcal{C}}(x)$ in what follows. Denote $\gamma = \max_k \frac{(1-t)^2[c(L\alpha_k + \bar\eta)]^2}{v\,\alpha_k^2}$. For $t = 0$, we have

$$F(x_{k+1}) - F^* \le F(x_k) - F^* - \frac{1}{\gamma},$$

resulting in

$$F(x_{k+1}) - F^* \le \Big[F(x_{K'}) - F^* - \frac{k - K' + 1}{\gamma}\Big]_+.$$

By using Lemma 4, we get the desired results for $t \in (0, 1)$.

Footnotes

1. $(\cdot)^H$ denotes the Hermitian transpose operator.

2. The PL inequality corresponds to a special case of the KL inequality with $t = \tfrac12$.

Contributor Information

Tao Hong, Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA.

Umberto Villa, Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA.

Jeffrey A. Fessler, Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, MI 48109, USA.

References

  • [1] Brown RW, Cheng Y-CN, Haacke EM, Thompson MR, and Venkatesan R, Magnetic Resonance Imaging: Physical Principles and Sequence Design. John Wiley & Sons, 2014.
  • [2] Lustig M, Donoho D, and Pauly JM, "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
  • [3] Lustig M, Donoho DL, Santos JM, and Pauly JM, "Compressed sensing MRI," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 72–82, 2008.
  • [4] Fessler JA, "Model-based image reconstruction for MRI," IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 81–89, Jul. 2010.
  • [5] Pruessmann KP, Weiger M, Scheidegger MB, and Boesiger P, "SENSE: Sensitivity encoding for fast MRI," Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, 1999.
  • [6] Griswold MA, Jakob PM, Heidemann RM, Nittka M, Jellus V, Wang J, Kiefer B, and Haase A, "Generalized autocalibrating partially parallel acquisitions (GRAPPA)," Magnetic Resonance in Medicine, vol. 47, no. 6, pp. 1202–1210, 2002.
  • [7] Guerquin-Kern M, Haberlin M, Pruessmann KP, and Unser M, "A fast wavelet-based reconstruction method for magnetic resonance imaging," IEEE Transactions on Medical Imaging, vol. 30, no. 9, pp. 1649–1660, 2011.
  • [8] Rudin LI, Osher S, and Fatemi E, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
  • [9] Block KT, Uecker M, and Frahm J, "Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint," Magnetic Resonance in Medicine, vol. 57, no. 6, pp. 1086–1098, 2007.
  • [10] Hong T, Hernandez-Garcia L, and Fessler JA, "A complex quasi-Newton proximal method for image reconstruction in compressed sensing MRI," IEEE Transactions on Computational Imaging, vol. 10, pp. 372–384, 2024.
  • [11] Aharon M, Elad M, and Bruckstein A, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
  • [12] Ravishankar S and Bresler Y, "MR image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
  • [13] Dong W, Shi G, Li X, Ma Y, and Huang F, "Compressive sensing via nonlocal low-rank regularization," IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3618–3632, 2014.
  • [14] Fessler JA, "Optimization methods for magnetic resonance image reconstruction: Key models and optimization algorithms," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 33–40, 2020.
  • [15] Ravishankar S, Ye JC, and Fessler JA, "Image reconstruction: From sparsity to data-adaptive methods and machine learning," Proceedings of the IEEE, vol. 108, no. 1, pp. 86–109, 2019.
  • [16] Heckel R, Jacob M, Chaudhari A, Perlman O, and Shimron E, "Deep learning for accelerated and robust MRI reconstruction," Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 37, no. 3, pp. 335–368, 2024.
  • [17] Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, Feng D, and Liang D, "Accelerating magnetic resonance imaging via deep learning," in IEEE 13th International Symposium on Biomedical Imaging (ISBI), 2016, pp. 514–517.
  • [18] Aggarwal HK, Mani MP, and Jacob M, "MoDL: Model-based deep learning architecture for inverse problems," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394–405, 2018.
  • [19] Gilton D, Ongie G, and Willett R, "Deep equilibrium architectures for inverse problems in imaging," IEEE Transactions on Computational Imaging, vol. 7, pp. 1123–1133, 2021.
  • [20] Ramzi Z, Chaithya G, Starck J-L, and Ciuciu P, "NC-PDNet: A density-compensated unrolled network for 2D and 3D non-Cartesian MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 41, no. 7, pp. 1625–1638, 2022.
  • [21] Chung H and Ye JC, "Score-based diffusion models for accelerated MRI," Medical Image Analysis, vol. 80, p. 102479, 2022.
  • [22] Venkatakrishnan SV, Bouman CA, and Wohlberg B, "Plug-and-play priors for model based reconstruction," in IEEE Global Conference on Signal and Information Processing, 2013, pp. 945–948.
  • [23] Romano Y, Elad M, and Milanfar P, "The little engine that could: Regularization by denoising (RED)," SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
  • [24] Sreehari S, Venkatakrishnan SV, Wohlberg B, Buzzard GT, Drummy LF, Simmons JP, and Bouman CA, "Plug-and-play priors for bright field electron tomography and sparse interpolation," IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
  • [25] Meinhardt T, Moeller M, Hazirbas C, and Cremers D, "Learning proximal operators: Using denoising networks for regularizing inverse imaging problems," in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, Oct. 2017, pp. 1799–1808.
  • [26] Buzzard GT, Chan SH, Sreehari S, and Bouman CA, "Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium," SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 2001–2020, 2018.
  • [27] Hong T, Romano Y, and Elad M, "Acceleration of RED via vector extrapolation," Journal of Visual Communication and Image Representation, p. 102575, 2019.
  • [28] Zhang K, Li Y, Zuo W, Zhang L, Van Gool L, and Timofte R, "Plug-and-play image restoration with deep denoiser prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6360–6376, 2021.
  • [29] Hong T, Xu X, Hu J, and Fessler JA, "Provable preconditioned plug-and-play approach for compressed sensing MRI reconstruction," IEEE Transactions on Computational Imaging, vol. 10, pp. 372–384, 2024.
  • [30] Huang T, Yang G, and Tang G, "A fast two-dimensional median filtering algorithm," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 1, pp. 13–18, 1979.
  • [31] Buades A, Coll B, and Morel J-M, "A non-local algorithm for image denoising," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 60–65.
  • [32] Dabov K, Foi A, Katkovnik V, and Egiazarian K, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [33] Ahmad R, Bouman CA, Buzzard GT, Chan S, Liu S, Reehorst ET, and Schniter P, "Plug-and-play methods for magnetic resonance imaging: Using denoisers for image recovery," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 105–116, 2020.
  • [34] Chan SH, Wang X, and Elgendy OA, "Plug-and-play ADMM for image restoration: Fixed-point convergence and applications," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
  • [35] Reehorst ET and Schniter P, "Regularization by denoising: Clarifications and new interpretations," IEEE Transactions on Computational Imaging, vol. 5, no. 1, pp. 52–67, 2018.
  • [36] Ryu E, Liu J, Wang S, Chen X, Wang Z, and Yin W, "Plug-and-play methods provably converge with properly trained denoisers," in International Conference on Machine Learning, PMLR, 2019, pp. 5546–5557.
  • [37] Terris M, Repetti A, Pesquet J-C, and Wiaux Y, "Building firmly nonexpansive convolutional neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 8658–8662.
  • [38] Kamilov US, Bouman CA, Buzzard GT, and Wohlberg B, "Plug-and-play methods for integrating physical and learned models in computational imaging: Theory, algorithms, and applications," IEEE Signal Processing Magazine, vol. 40, no. 1, pp. 85–97, 2023.
  • [39] Cohen R, Blau Y, Freedman D, and Rivlin E, "It has potential: Gradient-driven denoisers for convergent solutions to inverse problems," Advances in Neural Information Processing Systems, vol. 34, pp. 18152–18164, 2021.
  • [40] Hurault S, Leclaire A, and Papadakis N, "Gradient step denoiser for convergent plug-and-play," arXiv preprint arXiv:2110.03220, 2021.
  • [41] Chaudhari S, Pranav S, and Moura JM, "Gradient networks," IEEE Transactions on Signal Processing, vol. 73, pp. 324–339, 2024.
  • [42] Hong T, Xu Z, Chun SY, Hernandez-Garcia L, and Fessler JA, "Convergent complex quasi-Newton proximal methods for gradient-driven denoisers in compressed sensing MRI reconstruction," IEEE Transactions on Computational Imaging, vol. 11, pp. 1534–1547, 2025.
  • [43] Saad Y, Iterative Methods for Sparse Linear Systems. SIAM, 2003.
  • [44] Zbontar J, Knoll F, Sriram A, Murrell T, Huang Z, Muckley MJ, Defazio A, Stern R, Johnson P, Bruno M, et al., "fastMRI: An open dataset and benchmarks for accelerated MRI," arXiv preprint arXiv:1811.08839, 2018.
  • [45] Hestenes MR and Stiefel E, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, no. 6, pp. 409–436, 1952.
  • [46] Paige CC and Saunders MA, "LSQR: An algorithm for sparse linear equations and sparse least squares," ACM Transactions on Mathematical Software, vol. 8, no. 1, pp. 43–71, 1982.
  • [47] Van der Vorst HA, "Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 13, no. 2, pp. 631–644, 1992.
  • [48] Saad Y and Schultz MH, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 3, pp. 856–869, 1986.
  • [49] Saad Y, "A flexible inner-outer preconditioned GMRES algorithm," SIAM Journal on Scientific Computing, vol. 14, no. 2, pp. 461–469, 1993.
  • [50] Chung J and Gazzola S, "Computational methods for large-scale inverse problems: A survey on hybrid projection methods," SIAM Review, vol. 66, no. 2, pp. 205–284, 2024.
  • [51] Hong T, Xu Z, Hu J, and Fessler JA, "Using randomized Nyström preconditioners to accelerate variational image reconstruction," IEEE Transactions on Computational Imaging, pp. 1630–1643, 2025.
  • [52] Lampe J, Reichel L, and Voss H, "Large-scale Tikhonov regularization via reduction by orthogonal projection," Linear Algebra and Its Applications, vol. 436, no. 8, pp. 2845–2865, 2012.
  • [53] Lanza A, Morigi S, Reichel L, and Sgallari F, "A generalized Krylov subspace method for ℓp-ℓq minimization," SIAM Journal on Scientific Computing, vol. 37, no. 5, pp. S30–S50, 2015.
  • [54] Huang G, Lanza A, Morigi S, Reichel L, and Sgallari F, "Majorization–minimization generalized Krylov subspace methods for ℓp-ℓq optimization applied to image restoration," BIT Numerical Mathematics, vol. 57, no. 2, pp. 351–378, 2017.
  • [55] Gazzola S and Nagy JG, "Generalized Arnoldi–Tikhonov method for sparse reconstruction," SIAM Journal on Scientific Computing, vol. 36, no. 2, pp. B225–B247, 2014.
  • [56] Chung J and Gazzola S, "Flexible Krylov methods for ℓp regularization," SIAM Journal on Scientific Computing, vol. 41, no. 5, pp. S149–S171, 2019.
  • [57] Gazzola S, Nagy JG, and Landman MS, "Iteratively reweighted FGMRES and FLSQR for sparse reconstruction," SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. S47–S69, 2021.
  • [58] Gazzola S and Sabaté Landman M, "Krylov methods for inverse problems: Surveying classical, and introducing new, algorithmic approaches," GAMM-Mitteilungen, vol. 43, no. 4, p. e202000017, 2020.
  • [59] Buccini A, Pasha M, and Reichel L, "Modulus-based iterative methods for constrained ℓp-ℓq minimization," Inverse Problems, vol. 36, no. 8, p. 084001, 2020.
  • [60] Sterck HD, "A nonlinear GMRES optimization algorithm for canonical tensor decomposition," SIAM Journal on Scientific Computing, vol. 34, no. 3, pp. A1351–A1379, 2012.
  • [61] Sterck HD and He Y, "On the asymptotic linear convergence speed of Anderson acceleration, Nesterov acceleration, and nonlinear GMRES," SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. S21–S46, 2021.
  • [62] Fessler JA and Nadakuditi RR, Linear Algebra for Data Science, Machine Learning, and Signal Processing. Cambridge University Press, 2024.
  • [63] Beck A and Teboulle M, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
  • [64] Attouch H, Bolte J, Redont P, and Soubeyran A, "Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality," Mathematics of Operations Research, vol. 35, no. 2, pp. 438–457, 2010.
  • [65] Bolte J, Sabach S, and Teboulle M, "Proximal alternating linearized minimization for nonconvex and nonsmooth problems," Mathematical Programming, vol. 146, no. 1, pp. 459–494, 2014.
  • [66] Uecker M, Lai P, Murphy MJ, Virtue P, Elad M, Pauly JM, Vasanawala SS, and Lustig M, "ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA," Magnetic Resonance in Medicine, vol. 71, no. 3, pp. 990–1001, 2014.
  • [67] Kingma D and Ba J, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
  • [68] Zhang T, Pauly JM, Vasanawala SS, and Lustig M, "Coil compression for accelerated imaging with Cartesian sampling," Magnetic Resonance in Medicine, vol. 69, no. 2, pp. 571–582, 2013.
  • [69] Li H and Lin Z, "Accelerated proximal gradient methods for nonconvex programming," Advances in Neural Information Processing Systems, vol. 28, 2015.
