Author manuscript; available in PMC: 2026 Mar 7.
Published in final edited form as: IEEE Trans Comput Imaging. 2026 Jan 19;12:378–390. doi: 10.1109/tci.2026.3655489

A Convergent Generalized Krylov Subspace Method for Compressed Sensing MRI Reconstruction with Gradient-Driven Denoisers

Tao Hong 1, Umberto Villa 2, Jeffrey A Fessler 3
PMCID: PMC12965199  NIHMSID: NIHMS2144790  PMID: 41799403

Abstract

Model-based reconstruction plays a key role in compressed sensing (CS) MRI, as it incorporates effective image regularizers to improve the quality of reconstruction. The Plug-and-Play and Regularization-by-Denoising frameworks leverage advanced denoisers (e.g., convolutional neural network (CNN)-based denoisers) and have demonstrated strong empirical performance. However, their theoretical guarantees remain limited, as practical CNNs often violate key assumptions. In contrast, gradient-driven denoisers achieve competitive performance, and the required assumptions for theoretical analysis are easily satisfied. However, solving the associated optimization problem remains computationally demanding. To address this challenge, we propose a generalized Krylov subspace method (GKSM) to solve the optimization problem efficiently. We also establish rigorous convergence guarantees for GKSM in nonconvex settings. Numerical experiments on CS MRI reconstruction with spiral and radial acquisitions validate both the computational efficiency of GKSM and the accuracy of the theoretical predictions. The proposed optimization method is applicable to any linear inverse problem.

Index Terms—: CS MRI, gradient-driven denoiser, Krylov subspace, convergence, spiral and radial acquisitions

I. Introduction

MAGNETIC resonance imaging (MRI) scanners acquire k-space data that represents the Fourier coefficients of the image of interest. However, the acquisition process is inherently slow due to physical, hardware, and sampling constraints [1]. This slow acquisition presents several practical challenges, including patient discomfort, motion artifacts, and reduced throughput. Since the seminal work in [2], compressed sensing (CS) MRI has attracted significant attention in the MRI community [3, 4] for accelerating the acquisition process through structured sampling patterns. Modern CS MRI methods incorporate multiple receiver coils (a.k.a. parallel imaging [5, 6]) to further improve acquisition speed. Image reconstruction in CS MRI requires solving the following composite minimization problem:

\hat{x} = \arg\min_{x\in\mathbb{C}^N} F(x) \triangleq \underbrace{\tfrac{1}{2}\|Ax - y\|_2^2}_{h(x)} + \lambda f(x), \qquad (1)

where $A \in \mathbb{C}^{MC \times N}$ denotes the forward operator that maps the image $x \in \mathbb{C}^N$ to the measured k-space data $y \in \mathbb{C}^{MC}$. Here, we consider $C$ receiver coils. The encoding operator $A$ is a stack of $C$ submatrices $A_c \in \mathbb{C}^{M \times N}$, each defined as $A_c = PFS_c$, where $P$ is the sampling mask, $F$ is the (nonuniform) Fourier transform, and $S_c$ is the coil sensitivity map corresponding to the $c$th coil, which is patient-dependent. The trade-off parameter $\lambda > 0$ balances $h(x)$ and $f(x)$.

The data-fidelity term $h(x)$ in (1) promotes consistency with the acquired k-space data. In practice, often $M \ll N$ due to under-sampling, making (1) ill-posed. Therefore, incorporating prior knowledge through the regularizer $f(x)$ is essential for stabilizing the reconstruction. The choice of regularization plays a crucial role in reconstruction quality. Traditional hand-crafted regularizers include wavelets [7], total variation (TV) [8, 9], combinations of wavelets and TV [2, 10], dictionary learning [11, 12], and low-rank models [13], to name a few. For reviews of various choices of $f(x)$, see [4, 14, 15].
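To make the forward model concrete, the following NumPy sketch implements a Cartesian-masked multi-coil operator $A_c = PFS_c$ and its adjoint. This is a simplification: the experiments in this paper use non-Cartesian (spiral and radial) trajectories and a nonuniform FFT, and all names here are illustrative.

```python
import numpy as np

def forward(x, smaps, mask):
    # A x: for each coil, weight by the sensitivity map (S_c), take the
    # orthonormal 2-D FFT (F), and apply the sampling mask (P)
    return np.fft.fft2(smaps * x, norm="ortho") * mask

def adjoint(y, smaps, mask):
    # A^H y: mask, inverse FFT, conjugate sensitivity weighting, coil sum
    return np.sum(np.conj(smaps) * np.fft.ifft2(y * mask, norm="ortho"),
                  axis=0)
```

Because the orthonormal FFT is unitary, `adjoint` is the exact adjoint of `forward`, which can be checked via the inner-product identity $\langle Ax, y\rangle = \langle x, A^Hy\rangle$.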

In the past decade, deep learning (DL) has gained prominence in MRI reconstruction due to its capacity to learn complex image priors directly from large training datasets [16]. DL-based approaches can be broadly categorized into end-to-end networks [17] and physics-driven unrolled algorithms [18–20]. Recently, generative models have emerged as a powerful class of priors in MRI, achieving impressive results across various settings [21].

An alternative to classical DL pipelines is the Plug-and-Play (PnP) [22] and REgularization-by-Denoising (RED) [23] frameworks. PnP and RED integrate powerful denoisers into iterative reconstruction algorithms and have demonstrated competitive performance across various imaging tasks [24–29]. Early PnP/RED frameworks employed classical denoisers such as the median filter [30], non-local means [31], and BM3D [32]. As convolutional neural network (CNN)-based denoisers have shown superior performance to classical ones, modern PnP/RED methods typically integrate CNN-based denoisers. Unlike end-to-end or unrolled DL methods that require retraining for each imaging task, PnP and RED leverage learned image priors to flexibly adapt to changes in the forward model without retraining. This adaptability is particularly beneficial in CS MRI reconstruction, where scan-specific variations (e.g., different sampling trajectories and patient-specific sensitivity maps) are common. See [33] for a review of PnP methods that incorporate both classical and CNN-based denoisers in MRI reconstruction.

Despite the empirical success of PnP and RED, their theoretical convergence guarantees remain an active area of research; see [26, 34–38]. These works typically require that the denoisers either approximate maximum a posteriori or minimum mean squared error estimators, or satisfy a nonexpansiveness condition. However, many successful denoisers, especially those based on CNNs, do not satisfy these assumptions. As a result, PnP and RED with such denoisers cannot be rigorously interpreted as optimization algorithms. Although optimization-free perspectives have been proposed, understanding the behavior of these frameworks remains challenging [26]. One alternative is to train denoisers with additional regularization that enforces a bounded Lipschitz constant [36, 37]. However, guaranteeing strict and tight boundedness in practice remains an open challenge.

Recent efforts have aimed to close the gap between the theoretical foundations and practical effectiveness of PnP and RED by introducing gradient-driven denoisers [39–41]. In this approach, the unknown image $x$ is recovered by solving

\hat{x} = \arg\min_{x\in\mathcal{C}} F(x) := \underbrace{\tfrac{1}{2}\|Ax - y\|_2^2}_{h(x)} + \lambda f_{\theta}(x), \qquad (2)

where $f_{\theta}(x)$ is a scalar-valued energy function parameterized by CNNs that serves as a learned image prior, and $\mathcal{C}$ is a closed convex set in $\mathbb{C}^N$. The parameters $\theta$ are learned by enforcing $x \mapsto x - \nabla_x f_{\theta}(x)$ to act as a denoiser. Thus, the only required assumption is the differentiability of $f_{\theta}$ with respect to $x$, which allows one to integrate deep learning into inverse problems while maintaining a degree of interpretability, an essential requirement in medical imaging, where reconstructions directly influence diagnostic decisions. For notational simplicity, we omit the subscript $\theta$ and write $\nabla f(x)$ instead of $\nabla_x f_{\theta}(x)$. Moreover, we absorb $\lambda$ into $f(x)$ in the following discussion, since $\lambda$ is fixed throughout the minimization once it is selected.
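To make the gradient-driven denoiser idea concrete, here is a minimal NumPy sketch with a toy quadratic energy $f_\theta(x) = \tfrac{\theta}{2}\|x\|^2$, whose gradient is available in closed form; in the actual method $f_\theta$ is a CNN and $\nabla_x f_\theta$ is obtained by automatic differentiation. All names here are illustrative.

```python
import numpy as np

# Toy quadratic energy f_theta(x) = 0.5 * theta * ||x||^2 with gradient
# theta * x; a stand-in for the learned CNN energy of the paper.
def grad_f(x, theta):
    return theta * x

def denoise(x, theta):
    # gradient-driven denoiser: D(x) = x - grad_x f_theta(x)
    return x - grad_f(x, theta)

# "Training": for this toy energy D(x) = (1 - theta) x, so the theta that
# best maps a noisy signal toward the clean one has a closed form.
rng = np.random.default_rng(0)
x_clean = rng.standard_normal(64)
x_noisy = x_clean + 0.1 * rng.standard_normal(64)
theta = 1.0 - np.dot(x_noisy, x_clean) / np.dot(x_noisy, x_noisy)
residual = denoise(x_noisy, theta) - x_clean
```

The fitted denoiser shrinks the noisy input toward the clean signal, reducing the error relative to the identity map.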

Although both $h(x)$ and $f(x)$ in (2) are differentiable, $f(x)$ is generally nonconvex, which poses challenges for designing convergent and efficient algorithms. Cohen et al. [39] applied a projected gradient descent method with a line search to solve (2). Alternatively, Hurault et al. [40] employed the proximal gradient descent method with a line search. Both approaches provide convergence guarantees under the assumption that $\nabla f$ is Lipschitz continuous. However, these methods typically require hundreds of iterations to converge, which limits their practical applicability. Recently, Hong et al. [42] proposed a convergent complex quasi-Newton proximal method (CQNPM) that significantly reduces the computational time required to solve (2). Their convergence is established under the assumptions that $\nabla f$ is Lipschitz continuous and that the proximal Polyak-Łojasiewicz condition holds. Although CQNPM converges faster than existing methods for solving (2), it requires solving a weighted proximal mapping (as defined in [42, equation (3)]) at each iteration. This step requires computing $Ax$ and $A^Hx$ multiple times,¹ which can increase the overall computational complexity. Computing $Ax$ is expensive in MRI reconstruction with many coils, high-resolution images, or many interleaves or spokes in non-Cartesian acquisitions. Drawing inspiration from Krylov subspace methods (KSMs) [43], we propose a generalized Krylov subspace method (GKSM) for efficiently solving (2) that requires computing $Ax$, $A^Hx$, and $\nabla f(x)$ only once per iteration. Our main contributions are summarized as follows:

  • We propose a generalized Krylov subspace method (GKSM) for efficiently solving (2).

  • We present a rigorous convergence analysis of GKSM in nonconvex settings, along with the convergence rate of the cost function values.

  • We extensively evaluate the performance of GKSM on brain (respectively, knee) images from the dataset described in the MoDL paper [18] (respectively, the NYU fastMRI dataset [44]). The k-space data are simulated from the reconstructed complex-valued images using spiral and radial sampling trajectories. We also empirically validate the accuracy of the convergence analysis.

The rest of this paper is organized as follows. Section II reviews preliminaries on KSMs and discusses related work that generalizes KSMs to inverse problems. Section III describes GKSM in detail. Section IV provides a rigorous convergence analysis of GKSM. Section V reports experimental results that evaluate the performance of GKSM and empirically validate the theoretical analysis. Section VI concludes the paper.

II. Preliminaries on Krylov Subspace Methods

This section first introduces KSMs, which were primarily developed for solving linear equations. We then review related generalized Krylov methods for linear inverse problems, along with existing theoretical results. Our main goal in this section is to provide a sketch of the key developments in KSMs, from their origins in solving linear equations to their generalization for inverse problems.

KSMs are a class of iterative algorithms for solving problems of the form

\bar{A}x = b, \qquad (3)

where $\bar{A} \in \mathbb{R}^{N\times N}$ is typically sparse, ill-conditioned, and large-scale. At the $k$th iteration, KSMs construct an approximate solution within the Krylov subspace:

\mathcal{K}_k(\bar{A}, \bar{r}_1) = \operatorname{span}\{\bar{r}_1, \bar{A}\bar{r}_1, \bar{A}^2\bar{r}_1, \ldots, \bar{A}^{k-1}\bar{r}_1\}, \qquad (4)

where $\bar{r}_1 = b - \bar{A}x_1$ is the initial residual. The approximate solution $x_{k+1}$ is obtained by seeking $x_{k+1} \in x_1 + \mathcal{K}_k(\bar{A}, \bar{r}_1)$ that minimizes a chosen norm of the residual. The most widely used KSMs include, but are not limited to, the conjugate gradient method [45], LSQR [46], BiCGSTAB [47], and the generalized minimal residual method [48]. These methods are designed for different types of matrices $\bar{A}$, such as symmetric positive definite, non-symmetric, or indefinite, to name a few [43]. Moreover, KSMs can incorporate preconditioners to further accelerate convergence [49].
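As a concrete instance, the conjugate gradient method builds its iterates in exactly this subspace; a minimal NumPy implementation for a symmetric positive definite $\bar{A}$ (starting from $x_1 = 0$):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    # CG for symmetric positive definite A; the kth iterate lies in
    # x_1 + K_k(A, r_1), the Krylov subspace of (4) (here x_1 = 0)
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual r_1
    p = r.copy()
    rs = r @ r
    for _ in range(b.size):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In exact arithmetic CG reaches the solution in at most $N$ iterations, and in practice far fewer for well-clustered spectra.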

Many inverse problems with variational regularizers [50, 51] can be modeled as the following $\ell_p$-$\ell_q$ optimization problem:

\min_{x\in\mathbb{R}^N} \frac{1}{p}\|Ax - y\|_p^p + \frac{\lambda}{q}\|Wx\|_q^q, \qquad (5)

where $0 < p, q \le 2$ and $W$ represents a transform such as a wavelet transform. KSMs have been generalized to solve problems such as (5). For $p = q = 2$, Lampe et al. [52] presented a generalized KSM for (5) in which the parameter $\lambda$ is adaptively adjusted to keep $\|Ax - y\|_2$ sufficiently close to a prescribed tolerance. Lanza et al. [53] proposed to solve (5) using a KSM along with an iteratively reweighted approach. Moreover, Huang et al. [54] combined a Krylov subspace-based method with majorization-minimization for solving (5). To avoid the inner-outer iterations of the iteratively reweighted approach in [53], several flexible Krylov subspace methods were proposed to improve efficiency [55–57]. See [50, 58] for a review of KSMs for inverse problems.
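A minimal sketch of such an iteratively reweighted scheme for $p = 2$, $q = 1$, and $W = I$, solving each weighted subproblem directly rather than with a KSM (all names are illustrative):

```python
import numpy as np

def irls_l2_l1(A, y, lam, n_iter=50, eps=1e-6):
    # Sketch of an iteratively reweighted scheme for
    #   min_x 0.5 ||A x - y||_2^2 + lam ||x||_1   (p = 2, q = 1, W = I).
    # Each |x_i| is majorized by a quadratic with weight 1 / (|x_i| + eps),
    # and the resulting weighted least-squares subproblem is solved
    # directly (at scale one would use a KSM here).
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = np.abs(x) + eps
        H = A.T @ A + lam * np.diag(1.0 / w)
        x = np.linalg.solve(H, A.T @ y)
    return x
```

The repeated inner solves are exactly the inner-outer structure that the flexible Krylov subspaces of [55–57] were designed to avoid.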

Besides their extension to inverse problems, rigorous convergence analyses of KSMs remain an open research area. Lanza et al. [53] showed that the iterates converge to a minimizer of (5) for $1 \le p, q \le 2$ if $\ker(A^TA) \cap \ker(W^TW) = \{0\}$, where $\ker(\cdot)$ denotes the null space of a matrix, and the constructed Krylov subspace fully represents the entire image domain. Similar convergence results can also be found in [54, 59]. Other works [55–57] proved that the cost function values decrease monotonically and that the iterates converge to a stationary point. For brevity, we present only the convergence results of KSMs for inverse problems. See [60, 61] and the references therein for discussions of KSMs and their convergence in other contexts.

III. Proposed Method

This section provides the details of our GKSM for solving (2). We first discuss the case $\mathcal{C} = \mathbb{C}^N$ and then describe how to incorporate a convex constraint. Lastly, we provide further discussion of GKSM to offer additional insights.

Given a subspace basis $V_k \in \mathbb{C}^{N\times k}$ satisfying $V_k^HV_k = I_k$, where $I_k$ is the $k\times k$ identity matrix, a Hermitian positive definite matrix $B_k \in \mathbb{C}^{N\times N}$, $B_k \succ 0$, and a step size $\alpha_k \in \mathbb{R}$, $\alpha_k > 0$, GKSM solves the following problem at the $k$th iteration to obtain the coefficients $\boldsymbol{\beta}_k \in \mathbb{C}^k$:

\boldsymbol{\beta}_k = \arg\min_{\boldsymbol{\beta}\in\mathbb{C}^k} \underbrace{\frac{1}{2}\|Ax - y\|_2^2 + \bar{f}(x, x_k, B_k, \alpha_k)}_{\bar{F}(x, x_k)}, \qquad (6)

where $x = V_k\boldsymbol{\beta}$ and $\bar{f}(x, x_k, B_k, \alpha_k) \triangleq \Re\langle\nabla f(x_k), x - x_k\rangle + \frac{1}{2\alpha_k}\|x - x_k\|_{B_k}^2$ is a quadratic proximal term with $\|x\|_{B_k}^2 = x^HB_kx$. Then the next image iterate is $x_{k+1} = V_k\boldsymbol{\beta}_k$. Rewriting (6) in terms of $\boldsymbol{\beta}$ and reorganizing yields

\boldsymbol{\beta}_k = \arg\min_{\boldsymbol{\beta}\in\mathbb{C}^k} \left\| \begin{bmatrix} AV_k \\ \bar{B}_k^{1/2}V_k \end{bmatrix}\boldsymbol{\beta} - \begin{bmatrix} y \\ \bar{B}_k^{1/2}w_k \end{bmatrix} \right\|_2^2, \qquad (7)

where $w_k = x_k - \alpha_kB_k^{-1}\nabla f(x_k)$ and $\bar{B}_k^{1/2}$ denotes the principal matrix square root of $\bar{B}_k = B_k/\alpha_k$, which is unique [62]. Note that $AV_k$ is built incrementally during the algorithm. Compared with the image size, the dimension of $\boldsymbol{\beta}_k$ is relatively low because the number of iterations is significantly smaller than the image dimension. Therefore, we solve (7) directly, i.e.,

\boldsymbol{\beta}_k = \left(V_k^HA^HAV_k + V_k^H\bar{B}_kV_k\right)^{-1} V_k^H\left(A^Hy + \bar{B}_kw_k\right). \qquad (8)

Algorithm 1.

Generalized Krylov Subspace Method (GKSM)

Initialization: $x_1$, step sizes $\alpha_k > 0$, $V_1 = A^Hy/\|A^Hy\|$, $AV_1$, maximal number of subspace iterations $K$, and maximal number of total iterations Max_Iter
Iteration:
 1: for $k = 1, 2, \ldots,$ Max_Iter do
 2:  Compute $\nabla f(x_k)$
 3:  Set $B_k$ using Algorithm 2
 4:  Compute $\boldsymbol{\beta}_k$ using (8) (or solve (11) for $\boldsymbol{\beta}_k$ if a convex constraint is enforced)
 5:  Compute $x_{k+1} \leftarrow V_k\boldsymbol{\beta}_k$
 6:  if $k \le K$ then
 7:   Compute $r_k \leftarrow \nabla_x\bar{F}(x_{k+1}, x_k)$
 8:   $\tilde{r}_k \leftarrow (I - V_kV_k^H)r_k$
 9:   if $\tilde{r}_k \ne 0$ then
10:    $v_{k+1} \leftarrow \tilde{r}_k/\|\tilde{r}_k\|$
11:    $V_{k+1} \leftarrow [V_k\ v_{k+1}]$
12:    $AV_{k+1} \leftarrow [AV_k\ Av_{k+1}]$
13:   else
14:    $V_{k+1} \leftarrow V_k$
15:    $AV_{k+1} \leftarrow AV_k$
16:   end if
17:  else
18:   $V_{k+1} \leftarrow I_N$
19:  end if
20: end for

Here the matrix being inverted is only $k\times k$ with $k \ll N$.
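A sketch of this small projected solve, assuming $AV_k$ and $A^Hy$ are precomputed (names are illustrative):

```python
import numpy as np

def gksm_coeffs(AV, V, Bbar, w, AHy):
    # Solve the k x k projected system (8) for the subspace coefficients;
    # AV = A @ V and AHy = A^H y are precomputed, Bbar = B_k / alpha_k
    G = AV.conj().T @ AV + V.conj().T @ (Bbar @ V)   # k x k, k << N
    rhs = V.conj().T @ (AHy + Bbar @ w)
    return np.linalg.solve(G, rhs)
```

The solution zeroes the gradient of the subspace least-squares objective, i.e., $V_k^HA^H(AV_k\boldsymbol{\beta} - y) + V_k^H\bar{B}_k(V_k\boldsymbol{\beta} - w_k) = 0$.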

To enrich the subspace after the kth iteration, we first compute the gradient of the objective function in (6) with respect to x at x=xk+1, i.e.,

r_k = \nabla_x\bar{F}(x_{k+1}, x_k). \qquad (9)

Then we set $v_{k+1} = \tilde{r}_k/\|\tilde{r}_k\|$ with $\tilde{r}_k = (I_N - V_kV_k^H)r_k$. The new subspace basis is $V_{k+1} = [V_k\ v_{k+1}]$. If $\tilde{r}_k = 0$, we simply skip the update of $V_k$. Note that the dimension of $\boldsymbol{\beta}_k$ equals the number of columns of $V_k$; this dimension may be smaller than $k$ if the event $\tilde{r}_k = 0$ occurs, in which case the number of columns of $V_k$ is also smaller than $k$. Algorithm 1 summarizes the detailed steps of GKSM. To establish the convergence rate of the cost function values, we introduce an additional step 18 in Algorithm 1, as the generated $V_k$ does not necessarily span the entire image domain. Note that GKSM reduces to CQNPM [42] when $V_k = I_N$; in this case, we simply apply the accelerated gradient descent method to solve (6). If the regularizer $f$ in (2) is a quadratic function, then the subspace spanned by $V_k$ simplifies to the classical Krylov subspace in (4). To obtain $B_k$, we adopt the algorithm presented in [42, Algorithm 2], so that $B_k$ is an estimate of the Hessian matrix of $f(x)$ that is guaranteed to be Hermitian positive definite. For completeness, Algorithm 2 provides the detailed steps for computing $B_k$. The operator $\Re(\cdot)$ in Algorithm 2 extracts the real part.
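The enrichment step (lines 7–16 of Algorithm 1) can be sketched in NumPy as follows; function names are illustrative:

```python
import numpy as np

def enrich_subspace(V, AV, r, A, tol=1e-12):
    # Append the component of the residual r orthogonal to range(V);
    # return (V, AV) unchanged when r lies (numerically) in the span
    r_tilde = r - V @ (V.conj().T @ r)      # (I - V V^H) r
    nrm = np.linalg.norm(r_tilde)
    if nrm <= tol:
        return V, AV                        # skip the update
    v_new = r_tilde / nrm
    return (np.column_stack([V, v_new]),
            np.column_stack([AV, A @ v_new]))
```

Appending $Av_{k+1}$ alongside $v_{k+1}$ is what keeps the per-iteration cost at a single evaluation of $Ax$.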

Algorithm 2.

Modified Memory Efficient Self-Scaling Hermitian Rank-1 Method

Initialization: $x_{k-1}$, $x_k$, $\nabla f(x_{k-1})$, $\nabla f(x_k)$, $\delta > 0$, $\nu_1 \in (0, 1)$, and $\nu_2 \in (1, \infty)$
 1: Set $s_k \leftarrow x_k - x_{k-1}$ and $m_k \leftarrow \nabla f(x_k) - \nabla f(x_{k-1})$
 2: Compute the smallest $a \in [0, 1]$ such that
\bar{m}_k = as_k + (1-a)m_k \ \text{satisfies}\ \nu_1 \le \frac{\Re\langle s_k, \bar{m}_k\rangle}{\langle s_k, s_k\rangle} \ \text{and}\ \frac{\langle\bar{m}_k, \bar{m}_k\rangle}{\Re\langle s_k, \bar{m}_k\rangle} \le \nu_2 \qquad (10)
 3: Compute $\tau_k \leftarrow \frac{\langle s_k, s_k\rangle}{\Re\langle s_k, \bar{m}_k\rangle} - \sqrt{\left(\frac{\langle s_k, s_k\rangle}{\Re\langle s_k, \bar{m}_k\rangle}\right)^2 - \frac{\langle s_k, s_k\rangle}{\langle\bar{m}_k, \bar{m}_k\rangle}}$
 4: $\rho_k \leftarrow \Re\langle s_k - \tau_k\bar{m}_k, \bar{m}_k\rangle$
 5: if $|\rho_k| \le \delta\|s_k - \tau_k\bar{m}_k\|\|\bar{m}_k\|$ then
 6:  $u_k \leftarrow 0$
 7: else
 8:  $u_k \leftarrow s_k - \tau_k\bar{m}_k$
 9: end if
10: $\rho_k^B \leftarrow \tau_k^2\rho_k + \tau_ku_k^Hu_k$
11: Return: $B_k \leftarrow \tau_k^{-1}I_N - u_ku_k^H/\rho_k^B$

A. Incorporating A Convex Constraint

This part extends GKSM to handle a convex constraint on x with a slight increase in computational cost. To ensure x𝒞, we solve the following problem for 𝛃k instead of (7):

\boldsymbol{\beta}_k = \arg\min_{V_k\boldsymbol{\beta}\in\mathcal{C}} \left\| \begin{bmatrix} AV_k \\ \bar{B}_k^{1/2}V_k \end{bmatrix}\boldsymbol{\beta} - \begin{bmatrix} y \\ \bar{B}_k^{1/2}w_k \end{bmatrix} \right\|_2^2. \qquad (11)

Letting $z = V_k\boldsymbol{\beta}$ and using the fact that $V_k^HV_k = I_k$, we have $\boldsymbol{\beta} = V_k^Hz$. Thus we rewrite (11) as

x_{k+1} = \arg\min_{z\in\mathcal{C}} \left\| \begin{bmatrix} AV_kV_k^H \\ \bar{B}_k^{1/2}V_kV_k^H \end{bmatrix}z - \begin{bmatrix} y \\ \bar{B}_k^{1/2}w_k \end{bmatrix} \right\|_2^2. \qquad (12)

Since the objective function of (12) is differentiable, we simply apply the accelerated projected gradient method [63] to solve (12) efficiently. Although $z$ has the same dimension as $x$, solving (12) requires only simple matrix-vector multiplications, since $AV_k$ and $\bar{B}_k^{1/2}V_k$ are precomputed and stored. In practice, to extend Algorithm 1 to handle the convex constraint, we only need to replace the computation of $\boldsymbol{\beta}_k$ at step 4 with the solution of (11) via (12).
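A sketch of an accelerated projected gradient (FISTA-style) loop for a constraint of the form $\{x : \|x\|_\infty \le 1\}$, where the projection clips the magnitudes of the complex entries. This is a simplified stand-in for the method of [63], and all names are illustrative:

```python
import numpy as np

def project_inf_ball(x):
    # project complex x onto {x : ||x||_inf <= 1} by clipping magnitudes
    mag = np.abs(x)
    return np.where(mag > 1.0, x / np.maximum(mag, 1e-30), x)

def apgm(grad, L, x0, n_iter=200):
    # accelerated projected gradient for a smooth objective with
    # gradient `grad` and Lipschitz constant L over the inf-ball
    x = x0.copy()
    z = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x_new = project_inf_ball(z - grad(z) / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

For the separable quadratic $\tfrac12\|x - c\|^2$ the constrained minimizer is simply the projection of $c$, which gives a quick sanity check of the loop.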

B. Discussion

The dominant computations at each iteration of Algorithm 1 are evaluating $\nabla f(x)$, $Ax$, and $A^Hx$ once, so the overall computational cost per iteration is lower than that of the methods proposed in [40, 42], which require dozens of evaluations of $Ax$ and $A^Hx$ per iteration. Apart from computational efficiency, GKSM requires additional memory to store $V_k$ and $AV_k$. Thus, GKSM may become memory-prohibitive for very large-scale problems. A practical heuristic to address this challenge is a restart strategy, in which we cyclically reset $V_k = x_{k+1}/\|x_{k+1}\|$. This not only reduces memory usage but also lowers the computational cost. Algorithmically, when a restart is triggered, this is equivalent to rerunning Algorithm 1 with $x_{k+1}$ as the new initial value and $V_1 = x_{k+1}/\|x_{k+1}\|$. Moreover, in practice there is no guarantee that a column-orthogonal matrix $V_{k+1}$ can always be constructed from $r_k$, since $\tilde{r}_k$ may be zero; the restart strategy typically helps to escape such a situation. However, in our experimental settings, we never encountered $\tilde{r}_k = 0$. We leave the study of restart strategies to future work.

The following convergence analysis shows that GKSM is guaranteed to monotonically decrease the cost function value every iteration. Thus, it is safe to set K=Max_Iter in practice. However, the convergence rate of the cost function values to a minimum remains unclear since we cannot guarantee that Vk will span the entire image space after a finite number of iterations. To better characterize the convergence rate of the cost values, we introduce step 18 in Algorithm 1. This addition allows us to explicitly quantify the cost convergence after K iterations.

IV. Convergence Analysis

This section provides a rigorous convergence analysis of Algorithm 1 for solving (2). Because the unconstrained problem is a special case of the constrained one, we focus our analysis on GKSM for problems with constraints. We use the notation $F_{\mathcal{C}}(x) = F(x) + \iota_{\mathcal{C}}(x)$ in the following analysis, where $\iota_{\mathcal{C}}(x)$ denotes the indicator function, defined as $\iota_{\mathcal{C}}(x) = 0$ if $x \in \mathcal{C}$ and $\iota_{\mathcal{C}}(x) = +\infty$ otherwise. Here, we assume that $\mathcal{C}$ is convex and that its indicator function $\iota_{\mathcal{C}}$ is lower semicontinuous. Before presenting our main convergence results, we first review the definition of the Kurdyka–Łojasiewicz (KL) inequality and state one assumption, followed by four supporting lemmas.

Definition 1 (Kurdyka–Łojasiewicz inequality [64, 65]). Let $\chi: \mathbb{C}^N \to (-\infty, +\infty]$ be a proper, lower semicontinuous function. We say that $\chi$ satisfies the Kurdyka–Łojasiewicz (KL) inequality at a point $\bar{x} \in \operatorname{dom}(\partial\chi)$ if there exist $\eta > 0$, a neighborhood $\mathcal{U}$ of $\bar{x}$, and a continuous concave function $\varphi: [0, \eta) \to \mathbb{R}_+$ that is continuously differentiable on $(0, \eta)$ and satisfies $\varphi(0) = 0$ and $\varphi'(s) > 0$ for all $s \in (0, \eta)$, such that

\varphi'\big(|\chi(x) - \chi(\bar{x})|\big) \cdot \operatorname{dist}\big(0, \partial\chi(x)\big) \ge 1

holds for all $x \in \mathcal{U} \cap \{x \in \mathbb{C}^N : |\chi(x) - \chi(\bar{x})| < \eta\}$. Here, $\partial\chi(x)$ denotes the subdifferential of $\chi$ at $x$, and $\operatorname{dist}(\cdot,\cdot)$ denotes the Euclidean distance.

In Definition 1, $\varphi(s)$ is called the desingularization function. If $\varphi$ has the form $\varphi(s) = cs^{1-t}$ for $t \in [0, 1)$ and $c > 0$, then we say that $\chi$ has the KL property at $\bar{x}$ with exponent $t$.

Assumption 1 ($L$-smooth $f$). Assume that $f: \mathbb{C}^N \to (-\infty, +\infty]$ is a proper, lower semicontinuous, and lower bounded function. Further assume that the gradient of $f$ is $L$-Lipschitz continuous; that is, there exists $L > 0$ such that for all $x_1, x_2 \in \mathbb{C}^N$,

\|\nabla f(x_1) - \nabla f(x_2)\| \le L\|x_1 - x_2\|. \qquad (13)

Lemma 1 (Majorizer of $f$ [42, Lemma 1]). Let $f: \mathbb{C}^N \to (-\infty, +\infty]$ be an $L$-smooth function. Then for any $x_1, x_2 \in \mathbb{C}^N$, we have

f(x_2) \le f(x_1) + \Re\{\langle\nabla f(x_1), x_2 - x_1\rangle\} + \frac{L}{2}\|x_1 - x_2\|_2^2. \qquad (14)

Lemma 2 (Bounded Hessian [42, Lemma 4]). The approximate Hessian matrices $B_k$ generated by Algorithm 2 satisfy

\underline{\eta}I \preceq B_k \preceq \overline{\eta}I,

where $0 < \underline{\eta} < \overline{\eta} < \infty$.

Lemma 3. By running Algorithm 1 to solve (2), we have the following inequality at the $k$th iteration:

\Re\{\langle\nabla f(x_k), x_{k+1} - x_k\rangle\} \le h(x_k) - h(x_{k+1}) - \frac{1}{2}\|x_k - x_{k+1}\|_{\frac{2}{\alpha_k}B_k - \underline{\eta}I_N}^2. \qquad (15)

Lemma 4. Suppose the elements of the sequence $\{\phi_k > 0\}_{k\ge1}$ satisfy

\phi_{k+1}^{2t} \le \gamma(\phi_k - \phi_{k+1}) \quad\text{and}\quad \phi_{k+1} \le \phi_k,

where $t \in (0, 1)$ and $\gamma > 0$. Then we have the following upper bounds for $\phi_{k+1}$:

\phi_{k+1} \le \begin{cases} \left(1 - \dfrac{\phi_1^{2t-1}}{\gamma + \phi_1^{2t-1}}\right)^k \phi_1, & t \in (0, \tfrac12], \\[4pt] \left(\phi_1^{1-2t} + \dfrac{(2t-1)(1-\sigma)^{2t}}{2\gamma}\,k\right)_+^{\frac{1}{1-2t}}, & t \in (\tfrac12, 1), \end{cases} \qquad (16)

where $\sigma \in (0, 1)$ and $(\cdot)_+ = \max(\cdot, 0)$.
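The linear-rate branch of Lemma 4 can be checked numerically: for $t \in (0, \tfrac12]$, a positive decreasing sequence satisfying $\phi_{k+1}^{2t} \le \gamma(\phi_k - \phi_{k+1})$ obeys $\phi_{k+1} \le (1 - \phi_1^{2t-1}/(\gamma + \phi_1^{2t-1}))^k\phi_1$; this form follows our reconstruction of (16). The sketch below iterates the worst case, where the recursion holds with equality:

```python
import numpy as np

# Worst-case check of the linear-rate branch: phi_{k+1} solves
# phi^{2t} = gamma * (phi_k - phi) exactly (equality in the recursion)
# and must still obey the geometric bound.
t, gamma, phi1 = 0.4, 2.0, 1.0

def next_phi(phi_k):
    # g(phi) = phi^{2t} + gamma*phi - gamma*phi_k is increasing and
    # changes sign on (0, phi_k); bisect for its root
    lo, hi = 0.0, phi_k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** (2 * t) > gamma * (phi_k - mid):
            hi = mid
        else:
            lo = mid
    return lo

rate = 1.0 - phi1 ** (2 * t - 1) / (gamma + phi1 ** (2 * t - 1))
phi, bounds_hold = phi1, True
for k in range(1, 30):
    phi = next_phi(phi)
    bounds_hold = bounds_hold and (phi <= rate ** k * phi1 + 1e-12)
```

The argument behind the bound is one line: since $\phi_{k+1} \le \phi_1$ and $2t - 1 \le 0$, we have $\phi_{k+1}^{2t} \ge \phi_1^{2t-1}\phi_{k+1}$, and substituting this into the recursion gives the geometric contraction.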

Lemmas 1 and 2 were already demonstrated in [42], so we omit their proofs here. The proofs of Lemmas 3 and 4 are provided in Appendices A and B. Lemma 4 is used to establish the convergence rates of the cost function sequence for different values of t in the KL inequality when running Algorithm 1. Theorems 1 and 2 summarize our main convergence results.

Theorem 1 (Descent properties of Algorithm 1, $K = +\infty$). Let $\alpha_k \in \left(0, \frac{2\underline{\eta}}{\underline{\eta}+L}\right)$ and $\Delta_k \triangleq \min_{k' \le k}\|x_{k'+1} - x_{k'}\|_2^2$. Under Assumption 1, by running $k$ iterations of Algorithm 1 to solve (2), we have

  • $\Delta_k \le \dfrac{F(x_1) - F^*}{vk}$ and $F(x_{k+1}) \le F(x_k)$, where $v = \min_k\left(\underline{\eta}/\alpha_k - (\underline{\eta}+L)/2\right)$, $F^*$ denotes the minimum of (2), and $x_1$ is the initial iterate.

  • $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$.

Theorem 2 (Convergence rates, $K < +\infty$). Let $\alpha_k \in \left(0, \frac{2\underline{\eta}}{\underline{\eta}+L}\right)$ and $\mathcal{B}(x^*, \Lambda) = \{x \in \mathbb{C}^N : \|x - x^*\| \le \Lambda\}$. Under Assumption 1, by running Algorithm 1 for $k > K$ iterations to solve (2), we have

  • $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$, and all cluster points of the sequence $\{x_k\}_{k>K}$ are critical points of (2).

  • Assume $x_k$ converges to $x^*$ and that $F_{\mathcal{C}}$ satisfies the KL inequality. Then there exists $\Lambda > 0$ such that $F_{\mathcal{C}}$ satisfies the KL inequality at $\bar{x} = x^*$ in a neighborhood $\mathcal{U}$ containing $\mathcal{B}(x^*, \Lambda)$, and there exists $K' > K$ such that $x_k \in \mathcal{B}(x^*, \Lambda)$ and $F_{\mathcal{C}}(x_k) - F^* < \eta$ for all $k \ge K'$. For $k \ge K'$, we have the following convergence rates of the cost function values for $t \in [0, 1)$:
    1. $F(x_{k+1}) - F^* \le \left(\Delta F_{K'} - \frac{k - K' + 1}{\gamma}\right)_+$, for $t = 0$;
    2. $F(x_{k+1}) - F^* \le \left(1 - \frac{\Delta F_{K'}^{2t-1}}{\Delta F_{K'}^{2t-1} + \gamma}\right)^{k-K'+1}\Delta F_{K'}$, for $t \in (0, \tfrac12]$;
    3. $F(x_{k+1}) - F^* \le \left(\Delta F_{K'}^{1-2t} + q(k - K' + 1)\right)_+^{\frac{1}{1-2t}}$, for $t \in (\tfrac12, 1)$;
    where $\Delta F_{K'} = F(x_{K'}) - F^*$, $\sigma \in (0, 1)$, $\gamma = \max_k \frac{[c(L\alpha_k + \overline{\eta})]^2}{v(1-t)^2\alpha_k^2}$, and $q = (t - \tfrac12)(1-\sigma)^{2t}/\gamma$.

Theorem 1 states that GKSM ensures $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$ for any $K$, either finite or infinite. The first result in Theorem 2 establishes that all cluster points of the sequence $\{x_k\}_{k>K}$ are critical points of (2) for finite $K$. The second result in Theorem 2 provides convergence rates of the cost values for $t \in [0, 1)$ after some iteration $K' > K$. During the algorithm's execution, all iterates remain in $\mathcal{C}$, so $F_{\mathcal{C}}(x_k) = F(x_k)$ for all $k$. For ease of notation, we use $F(x)$ instead of $F_{\mathcal{C}}(x)$ to express the convergence rates in Theorem 2. Note that if $F(x_{k+1}) = F^*$, the left-hand side is identically zero and the convergence rate bounds in Theorem 2 hold trivially.

Note that Algorithm 1 reduces to CQNPM [42] when $K$ is finite and $k > K$. Theorem 2 then extends the theoretical analysis in [42] to a more general class of functions $F_{\mathcal{C}}$. Specifically, our analysis relies on the KL inequality, which is weaker than the Polyak-Łojasiewicz (PL) inequality² used in [42]. Section V-C empirically studies the convergence behavior of Algorithm 1 to validate our theoretical analysis.

V. Numerical Experiments

This section studies the performance of GKSM for CS MRI reconstruction with spiral and radial sampling trajectories. Note that the experimental and algorithmic settings used here are similar to those in our previous work [42]. For convenience, we briefly re-describe them in this paper. Then, we present the reconstruction results and study the convergence behavior of GKSM empirically.

Experimental Settings:

The performance of GKSM is evaluated on both brain and knee MRI datasets. For the brain images, we used the dataset from [18], which contains 360 training and 164 test images. For the knee images, we adopted the multi-coil knee dataset from NYU fastMRI [44]. The ESPIRiT algorithm [66] was used to obtain the complex-valued images from the raw k-space data. All images were then resized to a uniform size of 256×256 and normalized so that the maximum magnitude was one. The network architecture proposed in [39], with the addition of bias terms, is used to construct $f(x)$; see Fig. 1. The number of layers is set to six. Note that other neural network architectures can also be used to train the gradient-driven denoiser; the only requirement is that the nonlinear activation functions be differentiable. For example, Hurault et al. [40] used DRUNet, replacing the ReLU activations with ELU, to construct $f(x)$. The noisy images were generated by adding i.i.d. Gaussian noise with variance 1/255 to the clean images. The network was trained using the mean squared error loss. We employed the ADAM optimizer [67] with an initial learning rate of $10^{-3}$, which was halved every 4,000 iterations. Training was performed for a total of 18,000 iterations with a batch size of 64. Although we trained separate denoisers for the brain and knee datasets, different sampling trajectories used the same denoiser.

Fig. 1.


The neural network architecture used to construct the energy function f𝛉(x) is based on [39]. The convolutional kernels have size 3×3 with a stride of one.

To assess reconstruction quality, we selected six brain and knee test images as ground truth. Fig. 2 displays the magnitudes of these images. For spiral acquisition, we used six interleaves with 1688 readout points and 32 coils. Radial acquisition employed 55 spokes with golden-angle rotation, 1024 readout points, and 32 coils. Fig. 3 describes these sampling trajectories. To simulate k-space data, we applied the forward model to the ground-truth images and then added complex i.i.d. Gaussian noise (zero mean, variance $10^{-4}$), resulting in an input SNR of approximately 21 dB. In the reconstruction, we employed coil compression [68] to reduce the number of coils from 32 to 20 virtual coils, thereby lowering the computational cost. All experiments used simulated k-space data, were implemented in PyTorch, and ran on an NVIDIA A100 GPU.

Fig. 2.


The magnitude of the six brain and knee complex-valued ground truth images.

Fig. 3.


The spiral (a) and radial (b) sampling trajectories.

Algorithmic Settings:

Previous work [42] already showed that the accelerated proximal gradient method (dubbed APG) [69] is faster than the projected gradient descent method [39] and the proximal gradient method [40] for addressing (2). Thus, we mainly compared GKSM with CQNPM [42] and APG in this paper. To test the applicability of GKSM to a constrained problem, we used the constraint set $\mathcal{C} = \{x : \|x\|_\infty \le 1\}$ in all competing methods. However, in practical CS MRI reconstruction, we generally do not impose such a constraint. For plots involving $F^*$, we ran APG for 500 iterations and defined $F^* = F(x_{500}) - \varepsilon$ for a small constant $\varepsilon > 0$. Unless otherwise specified, we set $K$ = Max_Iter in the following experiments. Algorithm 2 used $\delta = 10^{-8}$, $\nu_1 = 2\times10^{-6}$, and $\nu_2 = 200$.

A. Spiral Acquisition Reconstruction

Fig. 4 summarizes the cost and PSNR values versus the number of iterations and wall time for each method on the brain 1 image. Figs. 4(a) and (c) show that CQNPM was the fastest algorithm in terms of the number of iterations. GKSM achieved similar results to CQNPM after enough iterations. Figs. 4(b) and (d) report the cost and PSNR values versus wall time, where GKSM was the fastest algorithm in terms of wall time. In multi-coil CS MRI reconstruction with non-Cartesian sampling, computing $Ax$ and $A^Hx$ typically dominates the computational cost. CQNPM (respectively, APG) requires solving a weighted proximal mapping (respectively, a proximal mapping) at each iteration, which involves multiple evaluations of $Ax$ and $A^Hx$. In contrast, GKSM requires only a single evaluation of $Ax$ and $A^Hx$ per iteration, which significantly reduces the computational cost while maintaining relatively fast convergence in terms of the number of iterations.

Fig. 4.


Comparison of different methods with spiral acquisition on the brain 1 image for $\varepsilon = 5\times10^{-3}$. (a), (b): cost values versus iteration and wall time; (c), (d): PSNR values versus iteration and wall time.

Fig. 5 presents the reconstructed images and the corresponding error maps at the 50th and 100th iterations of each method. GKSM eventually achieved a reconstruction quality similar to that of CQNPM with the same number of iterations, but with significantly less wall time. Table I summarizes the PSNR performance on the remaining five brain images within 150 iterations. Clearly, GKSM was approximately 7× faster than CQNPM in terms of the wall time required to exceed the PSNR of APG within 150 iterations. Moreover, GKSM achieved nearly the same performance as CQNPM at the 150th iteration while requiring approximately 9× less time, illustrating the superior performance of our method. The supplementary material includes additional results on the knee images with spiral acquisition, as well as the structural similarity index measure (SSIM) metrics, which exhibit similar behavior.

Fig. 5.


First row: the reconstructed brain 1 images of each method at the 50th and 100th iterations with spiral acquisition. The PSNR (respectively, SSIM) values are labeled at the bottom-left (respectively, bottom-right) corner of each image. Second row: the associated error maps (8×) of the reconstructed images.

TABLE I.

PSNR performance of each method for reconstructing five additional brain test images with spiral acquisition. For APG, we report the maximum PSNR within 150 iterations, with the corresponding number of iterations and wall time. For CQNPM and GKSM, the first row reports the earliest iteration count that exceeds the APG PSNR, along with its PSNR and wall time; the second row reports the PSNR and wall time at the 150th iteration. GKSM attains both the shortest wall time to exceed the APG PSNR and the shortest wall time at the 150th iteration.

Method                Image 2              Image 3              Image 4              Image 5              Image 6
                      PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓   PSNR↑ iter.↓ sec.↓
APG                   42.1  150   86.3     42.8  150   86.4     43.3  150   87.1     42.0  150   86.4     40.5  150   88.7
CQNPM (exceeds APG)   42.1  29    16.9     42.8  30    19.7     43.3  31    19.1     42.1  30    17.6     40.5  29    17.6
CQNPM (150th iter.)   44.2  150   82.0     44.7  150   70.1     45.0  150   65.5     44.0  150   68.9     42.7  150   77.0
GKSM (exceeds APG)    42.2  49    2.4      42.9  50    2.4      43.3  50    2.4      42.1  51    2.5      40.5  49    2.3
GKSM (150th iter.)    44.2  150   8.4      44.7  150   8.2      45.1  150   8.2      43.9  150   8.3      42.5  150   8.2

B. Radial Acquisition Reconstruction

Fig. 6 presents the cost and PSNR values versus the number of iterations and wall time of each method on the knee 1 image with radial acquisition. Figs. 6(a) and (c) show that CQNPM converged faster than APG in terms of the number of iterations. In this experimental setting, we found that CQNPM made faster progress than GKSM in the early iterations, but GKSM surpassed CQNPM in the later iterations. This observation differs slightly from Fig. 4. Although all methods have convergence guarantees under the same assumptions, we cannot guarantee that their iterates follow the same trajectory. One possible explanation is that the subspace in GKSM may sometimes act as an additional constraint, guiding the iterates along a more favorable path toward a minimizer. A detailed study of this behavior is beyond the scope of this paper and is left for future work. Figs. 6(b) and (d) display the cost and PSNR values of each method versus wall time. Evidently, GKSM converged faster than the other methods in terms of wall time, which is consistent with the previous observation.

Fig. 7 reports the reconstructed images of each method at the 50th and 100th iterations. In this experiment, GKSM demonstrated the best visual quality among all methods. Table II reports the PSNR and wall-time performance of each method on the remaining five knee test images. We ran APG for 100 iterations and then compared how many iterations the other methods required to exceed the PSNR value achieved by APG. Table II also reports the PSNR values and wall time of CQNPM and GKSM at 100 iterations. We observed behavior consistent with Table I. The supplementary material includes additional results on the brain images with radial acquisition, as well as SSIM metrics that show similar trends.

C. Effect of K and Convergence Validation

This part empirically studies the effect of $K$ and the convergence behavior of GKSM using the brain 1 image and the spiral acquisition settings of Fig. 4. Fig. 8 reports the PSNR values versus the number of iterations and wall time for GKSM with varying $K$, alongside CQNPM. We observed that all tested values of $K$ converged to PSNR values similar to those of CQNPM. Fig. 8(b) presents the PSNR values versus wall time, where larger values of $K$ led to faster convergence than smaller ones. This observation is consistent with our earlier results, as GKSM avoids solving a weighted proximal mapping for iterations $k \le K$.
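To make the subspace mechanism concrete, the following minimal Python sketch (ours; the names `gks_toy`, `V`, and `beta` are illustrative) grows a generalized-Krylov basis from successive gradient directions and solves the small projected least-squares problem at each iteration. It omits the regularizer $h$, the weighting matrix $B_k$, and the $k > K$ full-space phase of GKSM, so it is a schematic of the subspace step only, not the paper's algorithm.

```python
import numpy as np

def gks_toy(A, y, K=8, iters=20):
    """Schematic generalized-Krylov iteration for min_x ||A x - y||_2^2.

    Each iteration orthogonalizes the current (negative) gradient
    against the basis V, appends it while fewer than K columns exist,
    and solves the small projected least-squares problem for beta.
    """
    n = A.shape[1]
    x = np.zeros(n)
    V = np.zeros((n, 0))                   # orthonormal subspace basis
    for _ in range(iters):
        r = A.T @ (y - A @ x)              # negative gradient direction
        if V.shape[1] < K:
            r_t = r - V @ (V.T @ r)        # orthogonalize against span(V)
            nrm = np.linalg.norm(r_t)
            if nrm > 1e-12:                # expand only if direction is new
                V = np.hstack([V, (r_t / nrm)[:, None]])
        beta, *_ = np.linalg.lstsq(A @ V, y, rcond=None)  # projected problem
        x = V @ beta                       # lift back to the full space
    return x
```

On a toy system this reaches the ordinary least-squares solution once the basis spans enough directions; in GKSM the projected problem additionally contains the weighted proximal term, which is what makes each subspace iteration cheap.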

We now empirically validate our theoretical analysis. Fig. 9 presents the cost values and the values of $\Delta_k/\Delta_1$ for GKSM with spiral acquisition on six brain test images. As expected, the cost values converged to a constant across all test images and $\Delta_k/\Delta_1 \to 0$, consistent with our theoretical analysis.
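The diagnostic $\Delta_k/\Delta_1$ in Fig. 9 is inexpensive to compute from stored iterates; a small helper (ours, for illustration, with $\Delta_k = \min_{j\le k}\|x_j - x_{j+1}\|_2^2$) might look like:

```python
import numpy as np

def delta_ratio(iterates):
    """Return Delta_k / Delta_1 for k = 1..M given iterates x_1..x_{M+1},
    where Delta_k = min_{j <= k} ||x_j - x_{j+1}||_2^2."""
    d = [float(np.linalg.norm(iterates[j + 1] - iterates[j]) ** 2)
         for j in range(len(iterates) - 1)]
    deltas = np.minimum.accumulate(d)   # running minimum gives Delta_k
    return deltas / deltas[0]
```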

VI. Conclusion

A well-established theoretical foundation is especially important for ensuring reliability in medical imaging applications. Compared with the PnP and RED frameworks, gradient-driven denoisers offer a significantly stronger theoretical foundation. In particular, the only required assumptions are the differentiability of f and the Lipschitz continuity of ∇f, which are easy to satisfy in practice. To efficiently solve the associated nonconvex minimization problem, we developed a generalized Krylov subspace method with convergence guarantees in nonconvex settings. Numerical experiments on multi-coil CS MRI reconstruction with non-Cartesian sampling trajectories demonstrate that the proposed method can recover images within seconds on a GPU platform. This significantly improves the efficiency of solving the associated optimization problem and enhances the practical applicability of gradient-driven denoisers.

Supplementary Material

supp1-3655489

Acknowledgments

TH and UV were supported in part by NIH grant R01 EB034261.

Supported in part by NIH grants R01 EB035618 and R21 EB034344.

Appendix A

Proof of Lemma 3

Since $B_k \succ 0$ (cf. Lemma 2), the objective function in (6) is $\underline{\eta}$-strongly convex. Combining this with the fact that $V_k^H V_k = I_k$ and the $\underline{\eta}$-strong convexity inequality, we have the following inequality at the $k$th iteration for $x = V_k\boldsymbol{\beta}$, $x \in \mathcal{C}$:

$$\Big\langle V_k^H\big[\partial h(x_{k+1}) + \nabla\bar f(x_{k+1}, x_k, B_k, \alpha_k) + \tfrac{\underline{\eta}}{2}(V_k\boldsymbol{\beta} - x_{k+1})\big],\ \boldsymbol{\beta} - \boldsymbol{\beta}_k\Big\rangle \ge 0. \tag{17}$$

TABLE II.

PSNR performance of each method for reconstructing five additional knee test images with radial acquisition. For APG, we report the maximum PSNR (within 100 iterations), the corresponding number of iterations, and the wall time. For the other methods, the first row reports the earliest iteration that exceeds the APG PSNR (along with its PSNR and wall time), and the second row reports the PSNR and wall time at the 100th iteration. Bold indicates the shortest wall time at which the PSNR of APG was exceeded. The PSNR values and wall time of CQNPM and GKSM at the 100th iteration are underlined. The blue digits denote the shortest wall time at the 100th iteration.

Method              | Index 2            | Index 3            | Index 4            | Index 5            | Index 6
                    | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓ | PSNR↑ iter.↓ sec.↓
APG                 | 42.6  100   61.0   | 41.4  100   61.3   | 44.6  100   60.7   | 42.1  100   63.6   | 44.2  100   64.7
CQNPM (exceeds APG) | 42.7   26   15.8   | 41.4   30   19.1   | 44.7   25   16.4   | 42.2   28   17.0   | 44.2   25   16.3
CQNPM (100 iter.)   | 44.1  100   59.9   | 41.9  100   64.3   | 45.5  100   65.4   | 44.1  100   61.9   | 44.9  100   66.3
GKSM (exceeds APG)  | 42.7   41    2.0   | 41.4   33    1.6   | 44.7   38    1.8   | 42.3   36    1.8   | 44.2   39    1.9
GKSM (100 iter.)    | 44.1  100    5.6   | 43.3  100    5.5   | 45.5  100    5.5   | 44.4  100    5.7   | 44.9  100    5.5

Fig. 6.

Comparison of different methods with radial acquisition on the knee 1 image for ε = 6×10³. (a), (b): cost values versus iteration and wall time; (c), (d): PSNR values versus iteration and wall time.

Letting $\boldsymbol{\beta} = \begin{bmatrix}\boldsymbol{\beta}_{k-1} \\ 0\end{bmatrix}$, so that $V_k\boldsymbol{\beta} = x_k$, we rewrite (17) as

$$\Big\langle \partial h(x_{k+1}) + \tfrac{1}{\alpha_k}B_k(x_{k+1} - x_k) + \nabla f(x_k) + \tfrac{\underline{\eta}}{2}(x_k - x_{k+1}),\ x_k - x_{k+1}\Big\rangle \ge 0. \tag{18}$$

Note that if $\tilde r_k = 0$, we choose $\boldsymbol{\beta} = \boldsymbol{\beta}_{k-1}$ and (18) still holds. By reorganizing (18) and using the convexity of $h(x)$, i.e., $h(x_k) \ge h(x_{k+1}) + \langle \partial h(x_{k+1}), x_k - x_{k+1}\rangle$, we get the desired result

$$\langle \nabla f(x_k),\ x_{k+1} - x_k\rangle \le h(x_k) - h(x_{k+1}) - \tfrac12\|x_k - x_{k+1}\|^2_{\frac{2}{\alpha_k}B_k - \underline{\eta} I_N}. \tag{19}$$

Appendix B

Proof of Lemma 4

Our goal is to derive an upper bound for $\phi_{k+1}$ using the facts that $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$ and $0 < \phi_{k+1} \le \phi_k$. Rewrite $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$ as $\phi_k \ge \phi_{k+1}\big(1 + \tfrac{1}{\gamma}\phi_{k+1}^{2t-1}\big)$. For $t \in (0, 1/2)$, the map $s \mapsto s^{2t-1}$ is monotonically decreasing since $2t - 1 < 0$, so $\phi_{k+1}^{2t-1} \ge \phi_1^{2t-1}$, which implies $\phi_k \ge \phi_{k+1}\big(1 + \tfrac{1}{\gamma}\phi_1^{2t-1}\big)$, yielding
$$\phi_{k+1} \le \Big(1 - \frac{\phi_1^{2t-1}}{\gamma + \phi_1^{2t-1}}\Big)^k \phi_1.$$
If $t = \tfrac12$, we have $\phi_{k+1} \le \gamma(\phi_k - \phi_{k+1})$, which yields $\phi_{k+1} \le \frac{\gamma}{1+\gamma}\phi_k$. Therefore, we can establish the desired result immediately:
$$\phi_{k+1} \le \Big(1 - \frac{1}{1+\gamma}\Big)^k \phi_1.$$
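As a quick numerical sanity check (not part of the proof) of the $t = \tfrac12$ case, one can simulate a sequence that satisfies the assumed decrease $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$ with equality and confirm the geometric bound; the value $\gamma = 4$ is an arbitrary choice:

```python
# For t = 1/2, the equality case of phi_k - phi_{k+1} = phi_{k+1}/gamma
# gives phi_{k+1} = (gamma/(1+gamma)) phi_k, so the derived bound
# phi_{k+1} <= (1 - 1/(1+gamma))^k phi_1 should hold (here, tightly).
gamma = 4.0
phi = [1.0]
for _ in range(50):
    phi.append(phi[-1] * gamma / (1.0 + gamma))
for k in range(1, 51):
    bound = (1.0 - 1.0 / (1.0 + gamma)) ** k * phi[0]
    assert phi[k] <= bound + 1e-12
```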

Denote $\psi(x) = x^{1-2t}$ for $x > 0$, and let $\bar t = 2t - 1$. Using the mean value theorem, we have

$$\psi(\phi_{k+1}) - \psi(\phi_k) = -\bar t\,\tilde\phi_k^{-\bar t - 1}\,(\phi_{k+1} - \phi_k), \tag{20}$$

with $\tilde\phi_k \in [\phi_{k+1}, \phi_k]$. Since $s \mapsto s^{-\bar t - 1}$ is monotonically decreasing and $\phi_k - \phi_{k+1} \ge \phi_{k+1}^{2t}/\gamma$, we get the following inequalities from (20) for $t \in (1/2, 1)$:

$$\psi(\phi_{k+1}) - \psi(\phi_k) \ge \bar t\,\phi_k^{-2t}(\phi_k - \phi_{k+1}) \ge \frac{\bar t}{\gamma}\,\phi_k^{-2t}\phi_{k+1}^{2t}. \tag{21}$$

Since $0 < \phi_{k+1} \le \phi_k$, we have $\phi_{k+1}/\phi_k \le 1$. Suppose we run $k$ iterations. For any $\sigma \in (0,1)$, we can split the whole set of iterate indices into two subsets $\mathcal{I}_1$ and $\mathcal{I}_2$ such that $\mathcal{I}_1 = \{k : \phi_{k+1}/\phi_k \le 1 - \sigma\}$ and $\mathcal{I}_2 = \{k : \phi_{k+1}/\phi_k > 1 - \sigma\}$. So we know that either $|\mathcal{I}_1| \ge k/2$ or $|\mathcal{I}_2| \ge k/2$.

If $|\mathcal{I}_1| \ge k/2$, we get

$$\phi_{k+1} \le (1-\sigma)^{k/2}\,\phi_1. \tag{22}$$

Next, we consider $|\mathcal{I}_2| \ge k/2$. By summing up (21) from $1$ to $k$, we reach

$$\psi(\phi_{k+1}) \ge \psi(\phi_1) + \frac{(2t-1)(1-\sigma)^{2t}}{2\gamma}\,k.$$

Using the definition of $\psi(\cdot)$ and the fact that $1 - 2t < 0$, we derive

$$\phi_{k+1} \le \Big[\phi_1^{1-2t} + \frac{(2t-1)(1-\sigma)^{2t}}{2\gamma}\,k\Big]^{\frac{1}{1-2t}}. \tag{23}$$

If k is large enough, the bound in (22) is smaller than that in (23), yielding the desired result.

Fig. 7.

First row: the reconstructed knee 1 images of each method at the 50th and 100th iterations with radial acquisition. The PSNR (respectively, SSIM) values are labeled at the bottom-left (respectively, bottom-right) corner of each image. Second row: the associated error maps (8×) of the reconstructed images.

Fig. 8.

Comparison of varying K with spiral acquisition on the brain 1 image. (a), (b): PSNR values versus iteration and wall time.

Fig. 9.

Averaged cost values (a) and Δk/Δ1 (b) versus iteration for GKSM. The shaded region of each curve represents the range of the cost values and Δk across six brain test images with spiral acquisition.

Appendix C

By using Lemma 1, we have the following inequalities

$$f(x_{k+1}) \le f(x_k) + \tfrac{L}{2}\|x_{k+1} - x_k\|_2^2 + \langle\nabla f(x_k),\ x_{k+1} - x_k\rangle \le f(x_k) + h(x_k) - h(x_{k+1}) - \tfrac12\|x_k - x_{k+1}\|^2_{2B_k/\alpha_k - (\underline{\eta}+L)I_N}. \tag{24}$$

The second inequality comes from Lemma 3. Reorganizing (24), we get

$$\tfrac12\|x_k - x_{k+1}\|^2_{2B_k/\alpha_k - (\underline{\eta}+L)I_N} \le F(x_k) - F(x_{k+1}).$$

Letting $\alpha_k < \frac{2\underline{\eta}}{\underline{\eta}+L}$, $v = \min_k\{\underline{\eta}/\alpha_k - (\underline{\eta}+L)/2\}$, and using Lemma 2, we reach

$$v\|x_k - x_{k+1}\|_2^2 \le F(x_k) - F(x_{k+1}). \tag{25}$$

Since $v > 0$, we have $F(x_{k+1}) \le F(x_k)$. Summing up (25) from $k = 1$ to $K$, we get

$$\sum_{k=1}^{K} v\|x_k - x_{k+1}\|_2^2 \le F(x_1) - F(x_{K+1}) \le F(x_1) - F^*, \tag{26}$$

where $F^*$ denotes the minimal value of $F(x)$. Letting $\Delta_K = \min_{k \le K}\|x_k - x_{k+1}\|_2^2$, we get the desired result

$$\Delta_K \le \frac{F(x_1) - F^*}{vK}. \tag{27}$$

Letting $K \to \infty$, we get $\Delta_K \to 0$. Together with the summation in (26), we obtain $\|x_{k+1} - x_k\|_2 \to 0$ as $k \to \infty$.
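The chain (24)-(27) can be checked numerically on a simple instance. The sketch below (ours) uses plain gradient descent with step $1/L$ on a least-squares cost, for which the standard descent lemma gives the sufficient decrease $F(x_k) - F(x_{k+1}) \ge v\|x_k - x_{k+1}\|_2^2$ with $v = L/2$; this constant stands in for the $B_k/\alpha_k$-weighted version used in the proof, so the check illustrates the argument rather than GKSM itself:

```python
import numpy as np

# Verify Delta_K <= (F(x_1) - F*)/(v K) for gradient descent on
# F(x) = 0.5 ||A x - b||^2 with step 1/L, where v = L/2.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of the gradient
F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

x = np.zeros(10)
xs = [x]
for _ in range(200):
    x = x - (1.0 / L) * (A.T @ (A @ x - b))
    xs.append(x)

Fstar = F(np.linalg.lstsq(A, b, rcond=None)[0])  # minimal cost value
v = L / 2.0
diffs = [np.linalg.norm(xs[k + 1] - xs[k]) ** 2 for k in range(200)]
for K in range(1, 201):
    DeltaK = min(diffs[:K])   # Delta_K = min_{k <= K} ||x_k - x_{k+1}||^2
    assert DeltaK <= (F(xs[0]) - Fstar) / (v * K) + 1e-12
```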

Appendix D

Proof of Theorem 2

For $k > K$, we have $V_k = I_N$, so $V_k^H V_k = I_N$ still holds. Therefore, (26) and (27) remain valid for $k > K$. Consequently, we still have $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$. Next, we prove that all cluster points of the sequence $\{x_k\}_{k>K}$ are critical points of (2).

Let $G(\boldsymbol{\beta}) = \|\bar A_k \boldsymbol{\beta} - \bar y_k\|_2^2$ denote the cost function of (11) with $\bar A_k = \begin{bmatrix} A \\ \bar B_k^{1/2}\end{bmatrix}$ and $\bar y_k = \begin{bmatrix} y \\ \bar B_k^{1/2} w_k\end{bmatrix}$. Then we rewrite (11) as an unconstrained problem, i.e.,

$$\boldsymbol{\beta}_k = \arg\min_{\boldsymbol{\beta}}\ G(\boldsymbol{\beta}) + \iota_{\mathcal{C}}(\boldsymbol{\beta}). \tag{28}$$

From the first-order optimality condition of (28) and using the fact that $x_{k+1} = \boldsymbol{\beta}_k$ (since $V_k = I_N$), we have

$$0 \in \partial h(x_{k+1}) + \partial\iota_{\mathcal{C}}(x_{k+1}) + \nabla f(x_k) + \tfrac{1}{\alpha_k}B_k(x_{k+1} - x_k),$$

which implies

$$\nabla f(x_{k+1}) - \nabla f(x_k) + \tfrac{1}{\alpha_k}B_k(x_k - x_{k+1}) \in \partial h(x_{k+1}) + \nabla f(x_{k+1}) + \partial\iota_{\mathcal{C}}(x_{k+1}). \tag{29}$$

Here, we use the definitions of $\bar f(x, x_k, B_k, \alpha_k)$ and $h(x)$.

Note that $F_{\mathcal{C}}(x) = F(x) + \iota_{\mathcal{C}}(x)$ with $F(x) = h(x) + f(x)$. By using (29), we have

$$\operatorname{dist}\big(0, \partial F_{\mathcal{C}}(x_{k+1})\big) \le \Big\|\nabla f(x_{k+1}) - \nabla f(x_k) + \tfrac{1}{\alpha_k}B_k(x_k - x_{k+1})\Big\| \le \|\nabla f(x_{k+1}) - \nabla f(x_k)\| + \tfrac{\bar\eta}{\alpha_k}\|x_k - x_{k+1}\| \le \tfrac{L\alpha_k + \bar\eta}{\alpha_k}\|x_k - x_{k+1}\|. \tag{30}$$

Notice that $\|x_k - x_{k+1}\| \to 0$ as $k \to \infty$ and that $\frac{L\alpha_k + \bar\eta}{\alpha_k}$ remains finite. So we have $\operatorname{dist}(0, \partial F_{\mathcal{C}}(x_{k+1})) \to 0$ as $k \to \infty$, which implies that all cluster points of $\{x_k\}_{k>K}$ are critical points of (2). This completes the proof of the first part.

Since $F_{\mathcal{C}}$ satisfies the KL inequality and $x_k$ converges to $x^*$, there exist $K' > K$ and $\Lambda > 0$ such that, for all $k \ge K'$, we have $x_k \in \mathcal{B}(x^*, \Lambda)$, where $\mathcal{B}(x^*, \Lambda) = \{x \in \mathbb{C}^N : \|x - x^*\| \le \Lambda\}$, and $F_{\mathcal{C}}(x_k) - F^* < \eta$. By letting $\bar x = x^*$, $\varphi(s) = c\,s^{1-t}$, and using the assumption $\mathcal{B}(x^*, \Lambda) \subset \mathcal{U}$, we get

$$\big(F_{\mathcal{C}}(x_{k+1}) - F^*\big)^{2t} \le c^2(1-t)^2 \operatorname{dist}\big(0, \partial F_{\mathcal{C}}(x_{k+1})\big)^2 \overset{(30)}{\le} \frac{(1-t)^2\big[c(L\alpha_k + \bar\eta)\big]^2}{\alpha_k^2}\,\|x_k - x_{k+1}\|_2^2 \overset{(25)}{\le} \frac{(1-t)^2\big[c(L\alpha_k + \bar\eta)\big]^2}{v\,\alpha_k^2}\,\big(F(x_k) - F(x_{k+1})\big). \tag{31}$$

Note that during the algorithm's progress, all iterates remain in $\mathcal{C}$, so that $F_{\mathcal{C}}(x) = F(x)$. For simplicity, we write $F(x)$ instead of $F_{\mathcal{C}}(x)$ in what follows. Denote $\gamma = \max_k \frac{(1-t)^2[c(L\alpha_k + \bar\eta)]^2}{v\,\alpha_k^2}$. For $t = 0$, we have

$$F(x_{k+1}) - F^* \le F(x_k) - F^* - \frac{1}{\gamma},$$

resulting in

$$F(x_{k+1}) - F^* \le \Big[F(x_{K'}) - F^* - \frac{k - K' + 1}{\gamma}\Big]_+.$$

By using Lemma 4, we get the desired results for $t \in (0, 1)$.

Footnotes

1. $(\cdot)^H$ denotes the Hermitian transpose operator.

2. The PL inequality corresponds to a special case of the KL inequality with $t = \tfrac12$.

Contributor Information

Tao Hong, Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA.

Umberto Villa, Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA.

Jeffrey A. Fessler, Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, MI 48109, USA.

References

  • [1] Brown RW, Cheng Y-CN, Haacke EM, Thompson MR, and Venkatesan R, Magnetic Resonance Imaging: Physical Principles and Sequence Design. John Wiley & Sons, 2014.
  • [2] Lustig M, Donoho D, and Pauly JM, "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
  • [3] Lustig M, Donoho DL, Santos JM, and Pauly JM, "Compressed sensing MRI," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 72–82, 2008.
  • [4] Fessler JA, "Model-based image reconstruction for MRI," IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 81–89, Jul. 2010.
  • [5] Pruessmann KP, Weiger M, Scheidegger MB, and Boesiger P, "SENSE: Sensitivity encoding for fast MRI," Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, 1999.
  • [6] Griswold MA, Jakob PM, Heidemann RM, Nittka M, Jellus V, Wang J, Kiefer B, and Haase A, "Generalized autocalibrating partially parallel acquisitions (GRAPPA)," Magnetic Resonance in Medicine, vol. 47, no. 6, pp. 1202–1210, 2002.
  • [7] Guerquin-Kern M, Haberlin M, Pruessmann KP, and Unser M, "A fast wavelet-based reconstruction method for magnetic resonance imaging," IEEE Transactions on Medical Imaging, vol. 30, no. 9, pp. 1649–1660, 2011.
  • [8] Rudin LI, Osher S, and Fatemi E, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
  • [9] Block KT, Uecker M, and Frahm J, "Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint," Magnetic Resonance in Medicine, vol. 57, no. 6, pp. 1086–1098, 2007.
  • [10] Hong T, Hernandez-Garcia L, and Fessler JA, "A complex quasi-Newton proximal method for image reconstruction in compressed sensing MRI," IEEE Transactions on Computational Imaging, vol. 10, pp. 372–384, 2024.
  • [11] Aharon M, Elad M, and Bruckstein A, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
  • [12] Ravishankar S and Bresler Y, "MR image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
  • [13] Dong W, Shi G, Li X, Ma Y, and Huang F, "Compressive sensing via nonlocal low-rank regularization," IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3618–3632, 2014.
  • [14] Fessler JA, "Optimization methods for magnetic resonance image reconstruction: Key models and optimization algorithms," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 33–40, 2020.
  • [15] Ravishankar S, Ye JC, and Fessler JA, "Image reconstruction: From sparsity to data-adaptive methods and machine learning," Proceedings of the IEEE, vol. 108, no. 1, pp. 86–109, 2019.
  • [16] Heckel R, Jacob M, Chaudhari A, Perlman O, and Shimron E, "Deep learning for accelerated and robust MRI reconstruction," Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 37, no. 3, pp. 335–368, 2024.
  • [17] Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, Feng D, and Liang D, "Accelerating magnetic resonance imaging via deep learning," in IEEE 13th International Symposium on Biomedical Imaging (ISBI), 2016, pp. 514–517.
  • [18] Aggarwal HK, Mani MP, and Jacob M, "MoDL: Model-based deep learning architecture for inverse problems," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394–405, 2018.
  • [19] Gilton D, Ongie G, and Willett R, "Deep equilibrium architectures for inverse problems in imaging," IEEE Transactions on Computational Imaging, vol. 7, pp. 1123–1133, 2021.
  • [20] Ramzi Z, Chaithya G, Starck J-L, and Ciuciu P, "NC-PDNet: A density-compensated unrolled network for 2D and 3D non-Cartesian MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 41, no. 7, pp. 1625–1638, 2022.
  • [21] Chung H and Ye JC, "Score-based diffusion models for accelerated MRI," Medical Image Analysis, vol. 80, p. 102479, 2022.
  • [22] Venkatakrishnan SV, Bouman CA, and Wohlberg B, "Plug-and-play priors for model based reconstruction," in IEEE Global Conference on Signal and Information Processing, 2013, pp. 945–948.
  • [23] Romano Y, Elad M, and Milanfar P, "The little engine that could: Regularization by denoising (RED)," SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
  • [24] Sreehari S, Venkatakrishnan SV, Wohlberg B, Buzzard GT, Drummy LF, Simmons JP, and Bouman CA, "Plug-and-play priors for bright field electron tomography and sparse interpolation," IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
  • [25] Meinhardt T, Moeller M, Hazirbas C, and Cremers D, "Learning proximal operators: Using denoising networks for regularizing inverse imaging problems," in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, Oct. 2017, pp. 1799–1808.
  • [26] Buzzard GT, Chan SH, Sreehari S, and Bouman CA, "Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium," SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 2001–2020, 2018.
  • [27] Hong T, Romano Y, and Elad M, "Acceleration of RED via vector extrapolation," Journal of Visual Communication and Image Representation, p. 102575, 2019.
  • [28] Zhang K, Li Y, Zuo W, Zhang L, Van Gool L, and Timofte R, "Plug-and-play image restoration with deep denoiser prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6360–6376, 2021.
  • [29] Hong T, Xu X, Hu J, and Fessler JA, "Provable preconditioned plug-and-play approach for compressed sensing MRI reconstruction," IEEE Transactions on Computational Imaging, vol. 10, pp. 372–384, 2024.
  • [30] Huang T, Yang G, and Tang G, "A fast two-dimensional median filtering algorithm," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 1, pp. 13–18, 1979.
  • [31] Buades A, Coll B, and Morel J-M, "A non-local algorithm for image denoising," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 60–65.
  • [32] Dabov K, Foi A, Katkovnik V, and Egiazarian K, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [33] Ahmad R, Bouman CA, Buzzard GT, Chan S, Liu S, Reehorst ET, and Schniter P, "Plug-and-play methods for magnetic resonance imaging: Using denoisers for image recovery," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 105–116, 2020.
  • [34] Chan SH, Wang X, and Elgendy OA, "Plug-and-play ADMM for image restoration: Fixed-point convergence and applications," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
  • [35] Reehorst ET and Schniter P, "Regularization by denoising: Clarifications and new interpretations," IEEE Transactions on Computational Imaging, vol. 5, no. 1, pp. 52–67, 2018.
  • [36] Ryu E, Liu J, Wang S, Chen X, Wang Z, and Yin W, "Plug-and-play methods provably converge with properly trained denoisers," in International Conference on Machine Learning, PMLR, 2019, pp. 5546–5557.
  • [37] Terris M, Repetti A, Pesquet J-C, and Wiaux Y, "Building firmly nonexpansive convolutional neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 8658–8662.
  • [38] Kamilov US, Bouman CA, Buzzard GT, and Wohlberg B, "Plug-and-play methods for integrating physical and learned models in computational imaging: Theory, algorithms, and applications," IEEE Signal Processing Magazine, vol. 40, no. 1, pp. 85–97, 2023.
  • [39] Cohen R, Blau Y, Freedman D, and Rivlin E, "It has potential: Gradient-driven denoisers for convergent solutions to inverse problems," Advances in Neural Information Processing Systems, vol. 34, pp. 18152–18164, 2021.
  • [40] Hurault S, Leclaire A, and Papadakis N, "Gradient step denoiser for convergent plug-and-play," arXiv preprint arXiv:2110.03220, 2021.
  • [41] Chaudhari S, Pranav S, and Moura JM, "Gradient networks," IEEE Transactions on Signal Processing, vol. 73, pp. 324–339, 2024.
  • [42] Hong T, Xu Z, Chun SY, Hernandez-Garcia L, and Fessler JA, "Convergent complex quasi-Newton proximal methods for gradient-driven denoisers in compressed sensing MRI reconstruction," IEEE Transactions on Computational Imaging, vol. 11, pp. 1534–1547, 2025.
  • [43] Saad Y, Iterative Methods for Sparse Linear Systems. SIAM, 2003.
  • [44] Zbontar J, Knoll F, Sriram A, Murrell T, Huang Z, Muckley MJ, Defazio A, Stern R, Johnson P, Bruno M, et al., "fastMRI: An open dataset and benchmarks for accelerated MRI," arXiv preprint arXiv:1811.08839, 2018.
  • [45] Hestenes MR and Stiefel E, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, no. 6, pp. 409–436, 1952.
  • [46] Paige CC and Saunders MA, "LSQR: An algorithm for sparse linear equations and sparse least squares," ACM Transactions on Mathematical Software, vol. 8, no. 1, pp. 43–71, 1982.
  • [47] Van der Vorst HA, "Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 13, no. 2, pp. 631–644, 1992.
  • [48] Saad Y and Schultz MH, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 3, pp. 856–869, 1986.
  • [49] Saad Y, "A flexible inner-outer preconditioned GMRES algorithm," SIAM Journal on Scientific Computing, vol. 14, no. 2, pp. 461–469, 1993.
  • [50] Chung J and Gazzola S, "Computational methods for large-scale inverse problems: A survey on hybrid projection methods," SIAM Review, vol. 66, no. 2, pp. 205–284, 2024.
  • [51] Hong T, Xu Z, Hu J, and Fessler JA, "Using randomized Nyström preconditioners to accelerate variational image reconstruction," IEEE Transactions on Computational Imaging, pp. 1630–1643, 2025.
  • [52] Lampe J, Reichel L, and Voss H, "Large-scale Tikhonov regularization via reduction by orthogonal projection," Linear Algebra and Its Applications, vol. 436, no. 8, pp. 2845–2865, 2012.
  • [53] Lanza A, Morigi S, Reichel L, and Sgallari F, "A generalized Krylov subspace method for ℓp-ℓq minimization," SIAM Journal on Scientific Computing, vol. 37, no. 5, pp. S30–S50, 2015.
  • [54] Huang G, Lanza A, Morigi S, Reichel L, and Sgallari F, "Majorization–minimization generalized Krylov subspace methods for ℓp-ℓq optimization applied to image restoration," BIT Numerical Mathematics, vol. 57, no. 2, pp. 351–378, 2017.
  • [55] Gazzola S and Nagy JG, "Generalized Arnoldi–Tikhonov method for sparse reconstruction," SIAM Journal on Scientific Computing, vol. 36, no. 2, pp. B225–B247, 2014.
  • [56] Chung J and Gazzola S, "Flexible Krylov methods for ℓp regularization," SIAM Journal on Scientific Computing, vol. 41, no. 5, pp. S149–S171, 2019.
  • [57] Gazzola S, Nagy JG, and Landman MS, "Iteratively reweighted FGMRES and FLSQR for sparse reconstruction," SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. S47–S69, 2021.
  • [58] Gazzola S and Sabaté Landman M, "Krylov methods for inverse problems: Surveying classical, and introducing new, algorithmic approaches," GAMM-Mitteilungen, vol. 43, no. 4, p. e202000017, 2020.
  • [59] Buccini A, Pasha M, and Reichel L, "Modulus-based iterative methods for constrained ℓp-ℓq minimization," Inverse Problems, vol. 36, no. 8, p. 084001, 2020.
  • [60] Sterck HD, "A nonlinear GMRES optimization algorithm for canonical tensor decomposition," SIAM Journal on Scientific Computing, vol. 34, no. 3, pp. A1351–A1379, 2012.
  • [61] Sterck HD and He Y, "On the asymptotic linear convergence speed of Anderson acceleration, Nesterov acceleration, and nonlinear GMRES," SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. S21–S46, 2021.
  • [62] Fessler JA and Nadakuditi RR, Linear Algebra for Data Science, Machine Learning, and Signal Processing. Cambridge University Press, 2024.
  • [63] Beck A and Teboulle M, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
  • [64] Attouch H, Bolte J, Redont P, and Soubeyran A, "Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality," Mathematics of Operations Research, vol. 35, no. 2, pp. 438–457, 2010.
  • [65] Bolte J, Sabach S, and Teboulle M, "Proximal alternating linearized minimization for nonconvex and nonsmooth problems," Mathematical Programming, vol. 146, no. 1, pp. 459–494, 2014.
  • [66] Uecker M, Lai P, Murphy MJ, Virtue P, Elad M, Pauly JM, Vasanawala SS, and Lustig M, "ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA," Magnetic Resonance in Medicine, vol. 71, no. 3, pp. 990–1001, 2014.
  • [67] Kingma D and Ba J, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
  • [68] Zhang T, Pauly JM, Vasanawala SS, and Lustig M, "Coil compression for accelerated imaging with Cartesian sampling," Magnetic Resonance in Medicine, vol. 69, no. 2, pp. 571–582, 2013.
  • [69] Li H and Lin Z, "Accelerated proximal gradient methods for nonconvex programming," Advances in Neural Information Processing Systems, vol. 28, 2015.
