Author manuscript; available in PMC: 2022 Jan 1.
Published in final edited form as: IEEE Trans Comput Imaging. 2021 May 6;7:518–530. doi: 10.1109/tci.2021.3077806

PALMNUT: An Enhanced Proximal Alternating Linearized Minimization Algorithm with Application to Separate Regularization of Magnitude and Phase

Yunsong Liu 1, Justin P Haldar 1
PMCID: PMC8386764  NIHMSID: NIHMS1713931  PMID: 34458504

Abstract

We introduce a new algorithm for complex image reconstruction with separate regularization of the image magnitude and phase. This optimization problem is of interest in many different image reconstruction contexts, although it is nonconvex and can be difficult to solve. In this work, we first describe a novel implementation of the existing proximal alternating linearized minimization (PALM) algorithm to solve this optimization problem. We then make enhancements to PALM, leading to a new algorithm named PALMNUT that combines PALM with Nesterov’s momentum and a novel approach that relies on uncoupled coordinatewise step sizes derived from coordinatewise Lipschitz-like bounds. Theoretically, we establish that a version of PALMNUT (without Nesterov’s momentum) monotonically decreases the objective function, guaranteeing convergence of the cost function value. Empirical results obtained in the context of magnetic resonance imaging demonstrate that PALMNUT has computational advantages over common existing approaches like alternating minimization. Although our focus is on the application to separate magnitude and phase regularization, we expect that the same approach may also be useful in other nonconvex optimization problems with similar objective function structure.

I. Introduction

Many model-based computational imaging methods assume that data acquisition can be represented as a linear system of equations, such that

$$\mathbf{b} = \mathbf{A}\mathbf{f} + \mathbf{n}, \tag{1}$$

where b is the length-M vector of measured data samples, f is the length-N vector of unknown image voxel values, A is an M × N matrix modeling the data acquisition operator, and n is a length-M vector of additive noise perturbations. Based on this forward model, it is relatively popular to formulate image reconstruction as a regularized linear least-squares optimization problem:

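$$\hat{\mathbf{f}} = \arg\min_{\mathbf{f}} \frac{1}{2}\left\|\mathbf{A}\mathbf{f} - \mathbf{b}\right\|_2^2 + R(\mathbf{f}), \tag{2}$$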

where $\|\cdot\|_2$ is the standard Euclidean ($\ell_2$) norm, and $R(\cdot)$ is a regularization functional that is designed to reduce ill-posedness by encouraging the reconstructed image to have desirable characteristics. For example, it is common to encourage the reconstructed image to be spatially smooth by choosing $R(\cdot)$ as a penalty on the norm of the image gradient or as a penalty on the norm of the high-frequency subbands in a wavelet representation of the image.

Although the image $\mathbf{f}$ may be real-valued in many scenarios (i.e., $\mathbf{f}\in\mathbb{R}^N$), there are several imaging modalities (including certain kinds of optical imaging, ultrasound imaging, magnetic resonance imaging (MRI), synthetic aperture radar, etc. [1]-[10]) for which the image of interest is most naturally modeled as complex-valued (i.e., $\mathbf{f}\in\mathbb{C}^N$). In such cases, the image magnitude and image phase often represent manifestations of different physical phenomena, and have distinct spatial characteristics from one another that are not easily captured by a single unified regularization penalty. This has led several authors, working on image reconstruction across a variety of different modalities and application domains, to consider optimization problems in which the magnitude and phase of the image are regularized separately [1]-[10].

While these approaches were developed independently in many cases, the optimization problems can largely be unified by the following general formulation:

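$$\{\hat{\mathbf{m}},\hat{\mathbf{p}}\} = \arg\min_{\mathbf{m}\in\mathbb{R}^N,\,\mathbf{p}\in\mathbb{R}^N} \tilde{J}(\mathbf{m},\mathbf{p}) + R_1(\mathbf{m}) + \tilde{R}_2(\mathbf{p}), \tag{3}$$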

where

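$$\tilde{J}(\mathbf{m},\mathbf{p}) \triangleq \frac{1}{2}\left\|\mathbf{A}\left(\mathbf{m}\odot e^{i\mathbf{p}}\right) - \mathbf{b}\right\|_2^2. \tag{4}$$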

In this expression, $\mathbf{m}$ and $\mathbf{p}$ respectively represent the magnitude¹ and phase of the complex image $\mathbf{f}$ such that $\mathbf{f} = \mathbf{m}\odot e^{i\mathbf{p}}$, and $R_1(\cdot)$ and $\tilde{R}_2(\cdot)$ respectively represent separate regularization penalties for the magnitude and phase. We have used the notation $\odot$ to represent the Hadamard product (elementwise multiplication) of two vectors, have used the notation $e^{\mathbf{z}}$ to denote the elementwise exponentiation of the vector $\mathbf{z}$, and have used $i$ to denote the unit imaginary number ($i = \sqrt{-1}$).
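To make the notation of Eq. (4) concrete, here is a minimal NumPy sketch that evaluates $\tilde{J}(\mathbf{m},\mathbf{p})$. It assumes that a small dense random matrix stands in for $\mathbf{A}$ (practical MRI operators are often applied implicitly, e.g., via FFTs), and the function name `J_tilde` is a hypothetical choice made for illustration.

```python
import numpy as np

def J_tilde(m, p, A, b):
    """Data-consistency term of Eq. (4): 0.5 * ||A (m ⊙ e^{ip}) - b||_2^2."""
    f = m * np.exp(1j * p)   # complex image f = m ⊙ e^{ip} (elementwise product)
    r = A @ f - b            # residual in the measurement domain
    return 0.5 * np.vdot(r, r).real

# Hypothetical toy data: a small dense random A stands in for the acquisition operator.
rng = np.random.default_rng(0)
M, N = 8, 5
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
m_true = rng.random(N)
p_true = rng.uniform(-np.pi, np.pi, N)
b = A @ (m_true * np.exp(1j * p_true))   # noiseless data (n = 0)

# The cost is unchanged (up to round-off) when every phase value is shifted by 2*pi.
print(J_tilde(m_true, p_true, A, b))
print(J_tilde(m_true, p_true + 2 * np.pi, A, b))
```

Both printed values are numerically zero, which illustrates that $\mathbf{p}$ is only determined up to multiples of $2\pi$ while $e^{i\mathbf{p}}$ is unique.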

On one hand, Eq. (3) is desirable because it allows independent customization of the regularization penalties applied to the magnitude and phase components of the image, which has previously been shown to be beneficial [1]-[10]. On the other hand, this formulation can be substantially more difficult to solve than Eq. (2). For example, while Eq. (2) is convex if R(·) is chosen to be convex (and therefore can be globally optimized from arbitrary initializations using standard optimization methods!), the formulation in Eq. (3) is generally non-convex and is thus more challenging to optimize. As a result, many authors have decided to make use of a simple alternating minimization (AM) strategy that alternates between optimizing m for a fixed value of p and optimizing p for a fixed value of m [2], [4], [5], [7], [8], [10]. However, while AM can successfully decrease the objective function, it is not necessarily computationally efficient.

In this work, we propose and evaluate a new algorithm named PALMNUT to solve Eq. (3). This algorithm is based on combining the proximal alternating linearized minimization (PALM) algorithm [11] with Nesterov’s momentum [12] and a novel approach that uses uncoupled coordinatewise step sizes derived from coordinatewise Lipschitz-like bounds. PALMNUT is evaluated in the context of several different MRI-related inverse problems, where it is demonstrated to outperform popular existing methods. A preliminary account of portions of this work was previously presented in a recent conference [13].

During the review process for this paper, we were made aware of the existence of the Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) algorithm, which was originally developed in the context of dictionary and operator learning [14], [15]. Although PALMNUT and BPEG-M were developed independently from one another, the algorithms share some similar features, including the alternating application of proximal gradient methods to different sets of variables and the use of momentum. In addition, the use of uncoupled step-sizes in PALMNUT has similarities to the use of “majorization matrices” in BPEG-M, and both approaches are derived using similar proof techniques. A nuanced description of the differences between the two algorithms is outside the purview of this paper, although it is available from the authors on request. However, at a high level, PALMNUT’s approach is based on a slightly more general² majorization relationship (based on inner products) than the one used by BPEG-M (based on norms). Conversely, BPEG-M has proposed the use of a positive definite matrix for majorization which is slightly more general than the coordinatewise (diagonal) majorization described herein for PALMNUT, although we note that PALMNUT would be easily generalized to use positive definite majorization if so desired. While a full comparison of PALMNUT and BPEG-M is beyond the scope of this paper, our preliminary analysis and numerical experiments (not shown, but available from the authors on request) suggest that PALMNUT generally uses larger step-sizes and has faster convergence than the most recent and most relevant published BPEG-M variation [15], although this variation of BPEG-M also possesses a strong form of convergence guarantee (stronger than what we have attempted to derive for PALMNUT).

II. Background

The following subsections review the AM approach (one of the dominant existing algorithms for solving Eq. (3)) as well as the related PALM algorithm (which serves as the foundation for our proposed PALMNUT approach).

Since both AM and PALM are generic algorithms that are broadly applicable beyond just the magnitude and phase problem from Eq. (3), we will start by introducing both of these algorithms in a more general setting before specializing to our problem of interest. Specifically, we will consider the generic optimization problem of the form

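$$\{\hat{\mathbf{x}},\hat{\mathbf{y}}\} = \arg\min_{\mathbf{x}\in\mathbb{R}^{N_1},\,\mathbf{y}\in\mathbb{R}^{N_2}} \Psi(\mathbf{x},\mathbf{y}), \tag{5}$$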

where $\mathbf{x}\in\mathbb{R}^{N_1}$ and $\mathbf{y}\in\mathbb{R}^{N_2}$ are optimization variables, and the objective function $\Psi(\cdot,\cdot)$ can be decomposed as

$$\Psi(\mathbf{x},\mathbf{y}) = H(\mathbf{x},\mathbf{y}) + F(\mathbf{x}) + G(\mathbf{y}) \tag{6}$$

for some real-valued or extended-real-valued scalar functions H(·, ·), F(·), and G(·).

A. Alternating Minimization

Given the problem setting described above, the AM algorithm proceeds according to Alg. 1.

Algorithm 1 Alternating Minimization for Eq. (5)
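Initialization: Set $k = 1$ and initialize $\hat{\mathbf{x}}_0$ and $\hat{\mathbf{y}}_0$.
while not converged do
  $\hat{\mathbf{x}}_k = \arg\min_{\mathbf{x}\in\mathbb{R}^{N_1}} \Psi(\mathbf{x},\hat{\mathbf{y}}_{k-1}) = \arg\min_{\mathbf{x}\in\mathbb{R}^{N_1}} H(\mathbf{x},\hat{\mathbf{y}}_{k-1}) + F(\mathbf{x})$
  $\hat{\mathbf{y}}_k = \arg\min_{\mathbf{y}\in\mathbb{R}^{N_2}} \Psi(\hat{\mathbf{x}}_k,\mathbf{y}) = \arg\min_{\mathbf{y}\in\mathbb{R}^{N_2}} H(\hat{\mathbf{x}}_k,\mathbf{y}) + G(\mathbf{y})$
  $k \leftarrow k + 1$
end while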

As can be seen, AM alternates between optimizing the estimate of $\mathbf{x}$ for a fixed estimated value of $\mathbf{y}$ and optimizing the estimate of $\mathbf{y}$ for a fixed estimated value of $\mathbf{x}$. This can be viewed as a block coordinate descent algorithm, and thus has well-studied theoretical characteristics [16]. The algorithm leads to monotonic decrease of the objective function value, and therefore (by the monotone convergence theorem) convergence of the objective function value if $\Psi(\mathbf{x},\mathbf{y})$ is bounded from below. In practice, it is not necessary to solve the optimization subproblems for $\hat{\mathbf{x}}_k$ and $\hat{\mathbf{y}}_k$ exactly in each step, as the overall objective function will still decrease monotonically as long as the subproblem objective function values always decrease in each iteration.

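In the case of optimizing Eq. (3) with AM [2], [4], [5], [7], [8], [10], one should associate $\mathbf{x}$ with $\mathbf{m}$, $\mathbf{y}$ with $\mathbf{p}$, $H(\cdot,\cdot)$ with $\tilde{J}(\cdot,\cdot)$, $F(\cdot)$ with $R_1(\cdot)$, and $G(\cdot)$ with $\tilde{R}_2(\cdot)$.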

When defining $\tilde{J}(\mathbf{m},\mathbf{p})$ as in Eq. (4), the magnitude subproblem has the form of a classical regularized linear least-squares problem. As such, there exist many different algorithms to solve this kind of problem efficiently. However, the phase subproblem is more complicated, as it is generally nonconvex. Existing methods have frequently relied on either the nonlinear conjugate gradient (NCG) algorithm [2], [4], [5], [7], [10] or a phase cycling heuristic [8] to address this subproblem.

B. Proximal Alternating Linearized Minimization (PALM)

The PALM algorithm considers the same setup described previously, but with the additional assumptions that H(·, ·) is a smooth real-valued function and F(·) and G(·) are proper and lower semicontinuous (but potentially nonsmooth) extended-real-valued functions. None of these functions are required to be convex.

The structure of PALM is strongly motivated by the AM algorithm, and similar to AM, the PALM algorithm for Eq. (5) proceeds by updating the x and y variables in alternation. However, rather than directly using the AM update formulas, PALM first applies proximal linearization of H to each subproblem prior to computing each update step. Specifically, PALM uses

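$$\hat{\mathbf{x}}_k = \arg\min_{\mathbf{x}\in\mathbb{R}^{N_1}} H(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1}) + \left\langle \mathbf{x}-\hat{\mathbf{x}}_{k-1}, \nabla_{\mathbf{x}} H(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\right\rangle + \frac{c_k}{2}\left\|\mathbf{x}-\hat{\mathbf{x}}_{k-1}\right\|_2^2 + F(\mathbf{x}) + G(\hat{\mathbf{y}}_{k-1}) \tag{7}$$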

and

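$$\hat{\mathbf{y}}_k = \arg\min_{\mathbf{y}\in\mathbb{R}^{N_2}} H(\hat{\mathbf{x}}_k,\hat{\mathbf{y}}_{k-1}) + \left\langle \mathbf{y}-\hat{\mathbf{y}}_{k-1}, \nabla_{\mathbf{y}} H(\hat{\mathbf{x}}_k,\hat{\mathbf{y}}_{k-1})\right\rangle + \frac{d_k}{2}\left\|\mathbf{y}-\hat{\mathbf{y}}_{k-1}\right\|_2^2 + F(\hat{\mathbf{x}}_k) + G(\mathbf{y}), \tag{8}$$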

where $c_k$ and $d_k$ are real-valued positive scalars, $\nabla_{\mathbf{x}}$ represents the gradient with respect to $\mathbf{x}$ (and similarly for $\nabla_{\mathbf{y}}$), and $\langle\cdot,\cdot\rangle$ denotes the standard dot product. These steps can be viewed as alternately performing proximal gradient descent on each set of variables.

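One way to justify Eqs. (7) and (8) is to invoke the majorize-minimize algorithmic framework [17]. Specifically, for any real-valued objective function $Q(\mathbf{x})$ with $\mathbf{x}\in\mathbb{R}^N$, we say that the function $S_k(\mathbf{x})$ is a “majorant” of $Q(\mathbf{x})$ at the point $\hat{\mathbf{x}}_{k-1}$ if it satisfies two conditions: (i) $S_k(\hat{\mathbf{x}}_{k-1}) = Q(\hat{\mathbf{x}}_{k-1})$; and (ii) $Q(\mathbf{x}) \le S_k(\mathbf{x})$ for all $\mathbf{x}\in\mathbb{R}^N$. An important feature of majorant functions is that if $S_k(\hat{\mathbf{x}}_k) < S_k(\hat{\mathbf{x}}_{k-1})$, then it is guaranteed that $Q(\hat{\mathbf{x}}_k) < Q(\hat{\mathbf{x}}_{k-1})$, because $Q(\hat{\mathbf{x}}_k) \le S_k(\hat{\mathbf{x}}_k) < S_k(\hat{\mathbf{x}}_{k-1}) = Q(\hat{\mathbf{x}}_{k-1})$. This fact allows the majorant $S_k(\mathbf{x})$ to be used as a surrogate for the original objective function $Q(\mathbf{x})$, since descending on $S_k(\mathbf{x})$ guarantees descent on $Q(\mathbf{x})$. The use of a surrogate can be beneficial whenever the surrogate is easier to optimize than the original function [17].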

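With these concepts defined, we return to Eqs. (7) and (8). Assume that for a fixed value of $\mathbf{y}$, $\nabla_{\mathbf{x}}H(\mathbf{x},\mathbf{y})$ is Lipschitz in $\mathbf{x}$ such that

$$\left\|\nabla_{\mathbf{x}}H(\mathbf{x}_1,\mathbf{y}) - \nabla_{\mathbf{x}}H(\mathbf{x}_2,\mathbf{y})\right\|_2 \le L_1(\mathbf{y})\left\|\mathbf{x}_1-\mathbf{x}_2\right\|_2 \tag{9}$$

for all $\mathbf{x}_1,\mathbf{x}_2\in\mathbb{R}^{N_1}$, where $L_1(\mathbf{y})$ is the corresponding Lipschitz constant. Similarly, assume that for a fixed value of $\mathbf{x}$, $\nabla_{\mathbf{y}}H(\mathbf{x},\mathbf{y})$ is Lipschitz in $\mathbf{y}$ such that

$$\left\|\nabla_{\mathbf{y}}H(\mathbf{x},\mathbf{y}_1) - \nabla_{\mathbf{y}}H(\mathbf{x},\mathbf{y}_2)\right\|_2 \le L_2(\mathbf{x})\left\|\mathbf{y}_1-\mathbf{y}_2\right\|_2 \tag{10}$$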

for all $\mathbf{y}_1,\mathbf{y}_2\in\mathbb{R}^{N_2}$, where $L_2(\mathbf{x})$ is the corresponding Lipschitz constant. Under these assumptions, it can be shown that the objective function in Eq. (7) will be a majorant of $\Psi(\mathbf{x},\hat{\mathbf{y}}_{k-1})$ at the point $\mathbf{x}=\hat{\mathbf{x}}_{k-1}$ whenever $c_k \ge L_1(\hat{\mathbf{y}}_{k-1})$, and the objective function in Eq. (8) will be a majorant of $\Psi(\hat{\mathbf{x}}_k,\mathbf{y})$ at the point $\mathbf{y}=\hat{\mathbf{y}}_{k-1}$ whenever $d_k \ge L_2(\hat{\mathbf{x}}_k)$ [11]. This means that if $c_k$ and $d_k$ are chosen large enough, then Eqs. (7) and (8) can be interpreted as majorize-minimize steps. This guarantees monotonic decrease of the objective function value, and therefore convergence of the objective function value for PALM if $\Psi(\mathbf{x},\mathbf{y})$ is bounded from below [17]. Under some additional conditions (e.g., that the cost function satisfies the Kurdyka-Łojasiewicz property), the iterates of PALM are guaranteed to converge to a critical point of $\Psi(\mathbf{x},\mathbf{y})$ [11].
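The majorization argument can be checked numerically in one dimension. The sketch below uses a hypothetical objective $Q(x)=\log(1+x^2)$, whose derivative is 2-Lipschitz (since $|Q''(x)|\le 2$), takes $F\equiv 0$, and verifies conditions (i) and (ii) for the quadratic surrogate built with $c = L$. Minimizing that surrogate is a gradient step of size $1/c$, which decreases $Q$.

```python
import numpy as np

def Q(x):
    """Hypothetical smooth, nonconvex objective Q(x) = log(1 + x^2)."""
    return np.log1p(x ** 2)

def dQ(x):
    return 2 * x / (1 + x ** 2)

L = 2.0  # |Q''(x)| <= 2 for all x, so dQ is L-Lipschitz

def surrogate(x, x_prev, c):
    """Linearization of Q at x_prev plus the proximal term (c/2)(x - x_prev)^2 (here F = 0)."""
    return Q(x_prev) + dQ(x_prev) * (x - x_prev) + 0.5 * c * (x - x_prev) ** 2

x_prev = 3.0
grid = np.linspace(-10.0, 10.0, 2001)
cond_i = np.isclose(surrogate(x_prev, x_prev, L), Q(x_prev))      # S_k(x_prev) = Q(x_prev)
cond_ii = np.all(surrogate(grid, x_prev, L) >= Q(grid) - 1e-12)   # Q <= S_k on the grid
x_next = x_prev - dQ(x_prev) / L   # minimizer of the surrogate: gradient step of size 1/c
print(cond_i, cond_ii, Q(x_prev), Q(x_next))
```

Any $c \ge L$ also yields a valid majorant; larger values simply give shorter steps.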

The solutions to the optimization problems in Eqs. (7) and (8) can be written in a more concise “proximal operator” form [11]. Specifically, Eq. (7) is equivalent to the “proximal operator”

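$$\hat{\mathbf{x}}_k = \arg\min_{\mathbf{x}\in\mathbb{R}^{N_1}} F(\mathbf{x}) + \frac{c_k}{2}\left\|\mathbf{x}-\mathbf{w}_k\right\|_2^2 \triangleq \operatorname{prox}_F(\mathbf{w}_k, c_k) \tag{11}$$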

with

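$$\mathbf{w}_k = \hat{\mathbf{x}}_{k-1} - \frac{1}{c_k}\nabla_{\mathbf{x}}H(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1}), \tag{12}$$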

and Eq. (8) is equivalent to the “proximal operator”

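$$\hat{\mathbf{y}}_k = \arg\min_{\mathbf{y}\in\mathbb{R}^{N_2}} G(\mathbf{y}) + \frac{d_k}{2}\left\|\mathbf{y}-\mathbf{z}_k\right\|_2^2 \triangleq \operatorname{prox}_G(\mathbf{z}_k, d_k) \tag{13}$$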

with

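$$\mathbf{z}_k = \hat{\mathbf{y}}_{k-1} - \frac{1}{d_k}\nabla_{\mathbf{y}}H(\hat{\mathbf{x}}_k,\hat{\mathbf{y}}_{k-1}). \tag{14}$$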

These representations are useful, because for many common regularization penalties, the corresponding proximal operators often have simple closed-form solutions [18], [19].

In summary, the PALM algorithm proceeds according to Alg. 2.

Algorithm 2 PALM for Eq. (5)
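Initialization: Set $k = 1$ and initialize $\hat{\mathbf{x}}_0$ and $\hat{\mathbf{y}}_0$.
while not converged do
  Choose $c_k \ge L_1(\hat{\mathbf{y}}_{k-1})$
  $\mathbf{w}_k = \hat{\mathbf{x}}_{k-1} - \frac{1}{c_k}\nabla_{\mathbf{x}}H(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})$
  $\hat{\mathbf{x}}_k = \operatorname{prox}_F(\mathbf{w}_k, c_k)$
  Choose $d_k \ge L_2(\hat{\mathbf{x}}_k)$
  $\mathbf{z}_k = \hat{\mathbf{y}}_{k-1} - \frac{1}{d_k}\nabla_{\mathbf{y}}H(\hat{\mathbf{x}}_k,\hat{\mathbf{y}}_{k-1})$
  $\hat{\mathbf{y}}_k = \operatorname{prox}_G(\mathbf{z}_k, d_k)$
  $k \leftarrow k + 1$
end while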
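As an illustration of the structure of Alg. 2 (not of the magnitude and phase problem itself), the following sketch applies PALM to a hypothetical toy problem with $H(\mathbf{x},\mathbf{y})=\frac{1}{2}\|\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{y}-\mathbf{b}\|_2^2$, an $\ell_1$ penalty $F$ (whose proximal operator is soft thresholding), and $G$ equal to the indicator of the nonnegative orthant (whose proximal operator is a projection). The toy matrices are unrelated to the acquisition operator of Eq. (1). For this $H$, the Lipschitz constants are the squared spectral norms $\|\mathbf{A}\|^2$ and $\|\mathbf{B}\|^2$, independent of the other block. The names `palm_toy` and `soft_threshold` are our own.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau*||.||_1: elementwise shrinkage toward zero by tau."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def palm_toy(A, B, b, lam, n_iter=200):
    """Alg. 2 applied to the hypothetical toy problem
        min_{x, y} 0.5*||A x + B y - b||_2^2 + lam*||x||_1 + I_{y >= 0}(y)."""
    L1 = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad_x H (independent of y here)
    L2 = np.linalg.norm(B, 2) ** 2   # Lipschitz constant of grad_y H (independent of x here)
    x, y = np.zeros(A.shape[1]), np.zeros(B.shape[1])
    history = []
    for _ in range(n_iter):
        c = L1                                    # choose c_k >= L1(y_{k-1})
        w = x - A.T @ (A @ x + B @ y - b) / c     # Eq. (12)
        x = soft_threshold(w, lam / c)            # x_k = prox_F(w_k, c_k), Eq. (11)
        d = L2                                    # choose d_k >= L2(x_k)
        z = y - B.T @ (A @ x + B @ y - b) / d     # Eq. (14), using the updated x
        y = np.maximum(z, 0.0)                    # y_k = prox_G(z_k, d_k): projection onto y >= 0
        r = A @ x + B @ y - b
        history.append(0.5 * r @ r + lam * np.abs(x).sum())
    return x, y, np.array(history)

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
B = rng.standard_normal((30, 6))
b = rng.standard_normal(30)
x, y, hist = palm_toy(A, B, b, lam=0.5)
print(hist[0], hist[-1])   # the objective value should never increase
```

Because $c_k$ and $d_k$ satisfy the Lipschitz conditions, each half-step minimizes a majorant, so the recorded objective values should be non-increasing up to floating-point round-off.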

Applying PALM to Eq. (3) is nontrivial and, to the best of our knowledge, has never been done before. We will describe our implementation (as well as the enhancements needed for PALMNUT) in the next section.

III. Methods

This section is organized as follows. In Subsection III-A, we first demonstrate how to apply PALM to solve Eq. (3). Next, Subsection III-B will introduce the principles of our novel approach that relies on uncoupled coordinatewise step sizes. Afterwards, Subsection III-C will describe the incorporation of Nesterov’s momentum strategy and describe the full PALMNUT algorithm.

For the sake of concreteness, the remainder of this paper will make the further assumption that the phase regularization term $\tilde{R}_2(\mathbf{p})$ can be written as

$$\tilde{R}_2(\mathbf{p}) = R_2\!\left(e^{i\mathbf{p}}\right) \tag{15}$$

for some appropriate penalty function $R_2(\cdot)$, such that we are regularizing the exponentiated phase. This choice is beneficial because it means that $\mathbf{p}$ appears in exponentiated form wherever it appears in the objective function, which will enable simplifications later on. This choice has also been used in previous work [4], [5], [10] because it offers several other benefits. First, the exponentiated phase $e^{i\mathbf{p}}$ is unique, even though the phase vector $\mathbf{p}$ is never unique due to the $2\pi$-periodicity of phase. Second, the exponentiated phase $e^{i\mathbf{p}}$ can be spatially smooth even if the phase $\mathbf{p}$ is nonsmooth due to issues associated with phase wrapping. Finally, the spatial derivatives of the exponentiated phase image can be shown to have the same magnitude as the spatial derivatives of the optimally-unwrapped non-exponentiated phase image under common regularity conditions [20], which is important because phase derivatives are often used by regularization strategies that are designed to promote smooth phase [2], [4], [5], [8], [10]. However, it should also be noted that while regularizing the exponentiated phase is often beneficial in cases where phase wrapping has no consequence, it might not be the preferred choice in certain situations where the absolute (unwrapped) phase is important, e.g., in experiments where phase serves as a surrogate for an absolute physical quantity like temperature or velocity.

A. Applying PALM to Eq. (3)

In order to apply PALM to Eq. (3), it may be tempting to associate $H(\cdot,\cdot)$ with $\tilde{J}(\cdot,\cdot)$, $F(\cdot)$ with $R_1(\cdot)$, and $G(\cdot)$ with $\tilde{R}_2(\cdot)$, as we had also done for AM. However, it turns out that $\tilde{J}(\cdot,\cdot)$ does not have favorable Lipschitz bounds for the phase subproblem, and a different approach may be preferred.

To see this, note for the magnitude that

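$$\nabla_{\mathbf{m}}\tilde{J}(\mathbf{m},\mathbf{p}) = \operatorname{Re}\left\{e^{-i\mathbf{p}}\odot\left[\mathbf{A}^H\mathbf{A}\left(\mathbf{m}\odot e^{i\mathbf{p}}\right) - \mathbf{A}^H\mathbf{b}\right]\right\} \tag{16}$$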

and that

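$$\left\|\nabla_{\mathbf{m}}\tilde{J}(\mathbf{m}_1,\mathbf{p}) - \nabla_{\mathbf{m}}\tilde{J}(\mathbf{m}_2,\mathbf{p})\right\|_2 \le \left\|\operatorname{Re}\left\{\operatorname{diag}\!\left(e^{-i\mathbf{p}}\right)\mathbf{A}^H\mathbf{A}\operatorname{diag}\!\left(e^{i\mathbf{p}}\right)\right\}\right\|\,\left\|\mathbf{m}_1-\mathbf{m}_2\right\|_2, \tag{17}$$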

where $\operatorname{Re}\{\cdot\}$ denotes the operator that extracts the real part of its input, $\|\cdot\|$ denotes the spectral norm, and $\operatorname{diag}(e^{i\mathbf{p}})$ is the square diagonal matrix whose diagonal entries are equal to the elements of $e^{i\mathbf{p}}$. Due to the characteristics of spectral norms, Eq. (17) provides a tight Lipschitz bound.

However, for the phase, note that

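$$\nabla_{\mathbf{p}}\tilde{J}(\mathbf{m},\mathbf{p}) = \operatorname{Im}\left\{e^{-i\mathbf{p}}\odot\mathbf{m}\odot\left(\mathbf{A}^H\mathbf{A}\left(\mathbf{m}\odot e^{i\mathbf{p}}\right) - \mathbf{A}^H\mathbf{b}\right)\right\}, \tag{18}$$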

where $\operatorname{Im}\{\cdot\}$ denotes the operator that extracts the imaginary part of its input. Because of the form of this gradient expression (i.e., $\mathbf{p}$ appears nonlinearly in Eq. (18)), deriving a good Lipschitz bound for the phase subproblem is nontrivial. We have been partially successful in deriving valid Lipschitz upper bounds for this case (results not shown), but none of the bounds we have derived have been anywhere close to tight. This is problematic for the implementation of PALM, because these loose Lipschitz bounds could cause us to choose $d_k$ values that are much larger than necessary, which would result in smaller-than-necessary step sizes in the phase update problem and ultimately lead to slow convergence.

To avoid this issue, instead of working directly with the original variable $\mathbf{p}$ and its exponentiated version $e^{i\mathbf{p}}$, we will consider an equivalent formulation using the change of variables $\mathbf{q}\triangleq e^{i\mathbf{p}}$ under the constraint that all entries of $\mathbf{q}$ have magnitude one. This allows us to equivalently express Eq. (3) as

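$$\{\hat{\mathbf{m}},\hat{\mathbf{q}}\} = \arg\min_{\mathbf{m}\in\mathbb{R}^N,\,\mathbf{q}\in\mathcal{V}} J(\mathbf{m},\mathbf{q}) + R_1(\mathbf{m}) + R_2(\mathbf{q}), \tag{19}$$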

where

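$$J(\mathbf{m},\mathbf{q}) \triangleq \frac{1}{2}\left\|\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{b}\right\|_2^2 \tag{20}$$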

and

$$\mathcal{V} \triangleq \left\{\mathbf{q}\in\mathbb{C}^N : |q_n| = 1,\ n = 1,2,\ldots,N\right\}. \tag{21}$$

Once $\hat{\mathbf{m}}$ and $\hat{\mathbf{q}}$ are obtained, the corresponding value of $\hat{\mathbf{p}}$ can be obtained, if so desired, by computing the angle of each of the entries of $\hat{\mathbf{q}}$.

This reformulation is beneficial, because the gradients simplify substantially. For the magnitude, we now have that

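$$\nabla_{\mathbf{m}}J(\mathbf{m},\mathbf{q}) = \operatorname{Re}\left\{\bar{\mathbf{q}}\odot\left[\mathbf{A}^H\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{A}^H\mathbf{b}\right]\right\}, \tag{22}$$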

where $\bar{\mathbf{q}}$ denotes the elementwise complex conjugation of $\mathbf{q}$, which gives the tight Lipschitz bound

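$$\left\|\nabla_{\mathbf{m}}J(\mathbf{m}_1,\mathbf{q}) - \nabla_{\mathbf{m}}J(\mathbf{m}_2,\mathbf{q})\right\|_2 \le \left\|\operatorname{Re}\left\{\operatorname{diag}(\bar{\mathbf{q}})\mathbf{A}^H\mathbf{A}\operatorname{diag}(\mathbf{q})\right\}\right\|\,\left\|\mathbf{m}_1-\mathbf{m}_2\right\|_2. \tag{23}$$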

For the phase, we now have that

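$$\nabla_{\mathbf{q}}J(\mathbf{m},\mathbf{q}) = \mathbf{m}\odot\left[\mathbf{A}^H\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{A}^H\mathbf{b}\right], \tag{24}$$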

which gives the tight Lipschitz bound

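$$\left\|\nabla_{\mathbf{q}}J(\mathbf{m},\mathbf{q}_1) - \nabla_{\mathbf{q}}J(\mathbf{m},\mathbf{q}_2)\right\|_2 \le \left\|\mathbf{A}\operatorname{diag}(\mathbf{m})\right\|^2\left\|\mathbf{q}_1-\mathbf{q}_2\right\|_2. \tag{25}$$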
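The gradient expressions in Eqs. (22) and (24) can be sanity-checked with central finite differences. The sketch below assumes small random complex data. For the complex variable $\mathbf{q}$, it interprets Eq. (24) as the gradient satisfying $J(\mathbf{m},\mathbf{q}+\epsilon\boldsymbol{\delta}) \approx J(\mathbf{m},\mathbf{q}) + \epsilon\operatorname{Re}\langle\nabla_{\mathbf{q}}J,\boldsymbol{\delta}\rangle$, which is our assumption about the convention. The perturbed $\mathbf{q}$ leaves $\mathcal{V}$, which is harmless because $J$ is defined on all of $\mathbb{C}^N$.

```python
import numpy as np

def J(m, q, A, b):
    """Eq. (20): 0.5 * ||A (m ⊙ q) - b||_2^2."""
    r = A @ (m * q) - b
    return 0.5 * np.vdot(r, r).real

def grad_m(m, q, A, b):
    """Eq. (22): Re{ conj(q) ⊙ [A^H A (m ⊙ q) - A^H b] }."""
    return np.real(np.conj(q) * (A.conj().T @ (A @ (m * q) - b)))

def grad_q(m, q, A, b):
    """Eq. (24): m ⊙ [A^H A (m ⊙ q) - A^H b]."""
    return m * (A.conj().T @ (A @ (m * q) - b))

rng = np.random.default_rng(2)
M, N = 12, 7
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
m = rng.random(N)
q = np.exp(1j * rng.uniform(-np.pi, np.pi, N))   # a point in V

eps = 1e-6
dm = rng.standard_normal(N)                                 # real direction for m
dq = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # complex direction for q
fd_m = (J(m + eps * dm, q, A, b) - J(m - eps * dm, q, A, b)) / (2 * eps)
fd_q = (J(m, q + eps * dq, A, b) - J(m, q - eps * dq, A, b)) / (2 * eps)

# Directional derivatives: <grad, d> for real m, Re<grad, d> for complex q.
print(fd_m, grad_m(m, q, A, b) @ dm)
print(fd_q, np.vdot(grad_q(m, q, A, b), dq).real)
```

Since $J$ is quadratic in each block, the central differences agree with the analytic directional derivatives up to round-off.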

Although this reformulation simplifies the calculation of Lipschitz constants, the introduction of the constraint set $\mathcal{V}$ makes the optimization problem more complicated. To avoid the need for constrained optimization, we will further rewrite the constrained problem from Eq. (19) in an equivalent unconstrained form using indicator functions [21] as

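$$\{\hat{\mathbf{m}},\hat{\mathbf{q}}\} = \arg\min_{\mathbf{m}\in\mathbb{R}^N,\,\mathbf{q}\in\mathbb{C}^N} J(\mathbf{m},\mathbf{q}) + R_1(\mathbf{m}) + R_2(\mathbf{q}) + I_{\mathcal{V}}(\mathbf{q}), \tag{26}$$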

where

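$$I_{\mathcal{V}}(\mathbf{q}) \triangleq \begin{cases} 0, & \mathbf{q}\in\mathcal{V} \\ +\infty, & \mathbf{q}\notin\mathcal{V}. \end{cases} \tag{27}$$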

The new objective function from Eq. (26) now has four terms in it (i.e., $J(\cdot,\cdot)$, $R_1(\cdot)$, $R_2(\cdot)$, and $I_{\mathcal{V}}(\cdot)$), and to apply PALM, it is necessary to associate these with the PALM terms $H(\cdot,\cdot)$, $F(\cdot)$, and $G(\cdot)$. The function $J(\cdot,\cdot)$ is always smooth and involves both $\mathbf{m}$ and $\mathbf{q}$, so we will associate it with $H(\cdot,\cdot)$. The function $I_{\mathcal{V}}(\cdot)$ is always nonsmooth and only involves $\mathbf{q}$, so we will associate it with $G(\cdot)$. While these associations are straightforward (we have no other options), we potentially have options for $R_1(\cdot)$ and $R_2(\cdot)$ depending on their smoothness characteristics. If $R_1(\cdot)$ is nonsmooth, then it has to be associated with $F(\cdot)$, but if it is smooth (Lipschitz) then we could either choose to associate it with $F(\cdot)$ or incorporate it into $H(\cdot,\cdot)$. Similarly, if $R_2(\cdot)$ is nonsmooth, then it has to be associated with $G(\cdot)$, but if it is smooth (Lipschitz) then we could either choose to associate it with $G(\cdot)$ or incorporate it into $H(\cdot,\cdot)$.

Although we have different options, the remainder of this paper will assume (for simplicity and without loss of generality) that smooth regularization functions will always be incorporated into H(·, ·). This choice leads to function associations that are summarized in Table I.

TABLE I:

Associations between PALM and Eq. (26) depending on whether R1(·) and R2(·) are smooth (Lipschitz).

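| $R_1(\cdot)$ smooth? | $R_2(\cdot)$ smooth? | $H(\cdot,\cdot)$ | $F(\cdot)$ | $G(\cdot)$ |
|---|---|---|---|---|
| Y | Y | $J(\mathbf{m},\mathbf{q}) + R_1(\mathbf{m}) + R_2(\mathbf{q})$ | $0$ | $I_{\mathcal{V}}(\mathbf{q})$ |
| N | Y | $J(\mathbf{m},\mathbf{q}) + R_2(\mathbf{q})$ | $R_1(\mathbf{m})$ | $I_{\mathcal{V}}(\mathbf{q})$ |
| Y | N | $J(\mathbf{m},\mathbf{q}) + R_1(\mathbf{m})$ | $0$ | $R_2(\mathbf{q}) + I_{\mathcal{V}}(\mathbf{q})$ |
| N | N | $J(\mathbf{m},\mathbf{q})$ | $R_1(\mathbf{m})$ | $R_2(\mathbf{q}) + I_{\mathcal{V}}(\mathbf{q})$ |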

With these assignments, the PALM algorithm from Alg. 2 can be directly applied, although it is still necessary to specify the computation of $c_k$, $d_k$, $\operatorname{prox}_F(\cdot,\cdot)$, and $\operatorname{prox}_G(\cdot,\cdot)$. Although these computations necessarily depend on the characteristics of $R_1(\cdot)$ and $R_2(\cdot)$, we provide concrete illustrations of these calculations for two typical choices of regularization penalties (the same choices are used in the validation study presented later in the paper). Of course, these two illustrations do not encompass every possibility, and interested readers are referred to Refs. [18], [19], [22] for further discussion and examples of computing proximal operators. However, it should be noted that the two illustrations below focus on the case where $R_2(\cdot)$ is smooth (the first two rows of Table I), as we have found that it is frequently nontrivial to derive the $\operatorname{prox}_G(\cdot,\cdot)$ operator when $G(\cdot)$ incorporates both $R_2(\cdot)$ and $I_{\mathcal{V}}(\cdot)$ (the last two rows of Table I).

1) Huber-function regularization of $\mathbf{m}$ with Tikhonov regularization of $\mathbf{q}$:

For our first illustration, we will consider the case where magnitude regularization takes the form of either

$$R_1(\mathbf{m}) = \lambda_1\sum_{\ell=1}^{L} h_\xi\!\left(\left|[\mathbf{B}\mathbf{m}]_\ell\right|\right) \tag{28}$$

or

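$$R_1(\mathbf{m}) = \lambda_1\sum_{\ell=1}^{L} h_\xi\!\left(\sqrt{\sum_{t=1}^{T}\left|[\mathbf{B}_t\mathbf{m}]_\ell\right|^2}\right), \tag{29}$$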

and phase regularization takes the form of

$$R_2(\mathbf{q}) = \frac{\lambda_2}{2}\left\|\mathbf{C}\mathbf{q}\right\|_2^2. \tag{30}$$

In these expressions, $\lambda_1$ and $\lambda_2$ are positive scalar regularization parameters that can be adjusted to control the influence of the magnitude and phase regularization terms, respectively, $[\mathbf{z}]_\ell$ denotes the $\ell$th entry of the vector $\mathbf{z}$, and $h_\xi(\cdot)$ is the Huber function (with parameter $\xi > 0$) defined as

$$h_\xi(t) = \begin{cases} \frac{1}{2\xi}t^2, & |t|\le\xi \\ |t| - \frac{1}{2}\xi, & |t| > \xi. \end{cases} \tag{31}$$

The Huber function is a smooth, convex function that is commonly used for both robust statistics (to mitigate the effects of outliers) [23] and for edge/discontinuity/sparsity-preserving image regularization [24], [25]. As can be seen from Eq. (31), the Huber function is similar to the $\ell_1$-norm for large values of its argument, but unlike the $\ell_1$-norm, it is also smooth at the origin because it behaves like a squared $\ell_2$-norm for small values of its argument. The Huber function with a small value of $\xi$ is frequently chosen as a differentiable surrogate for sparsity-promoting $\ell_1$-norm regularization, while choosing larger values of $\xi$ can make the Huber function more tolerant to smoothly-varying image regions, more resilient to noise, and easier to characterize theoretically [24], [26], [27].
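A minimal NumPy sketch of the Huber function in Eq. (31) and its derivative follows. Writing the derivative as $t/\max\{\xi,|t|\}$ is a compact form of our choosing that equals $t/\xi$ in the quadratic region and $\operatorname{sign}(t)$ outside it; the same $1/\max\{\xi,\cdot\}$ pattern appears in the weights of Eq. (35).

```python
import numpy as np

def huber(t, xi):
    """Huber function h_xi(t) of Eq. (31), applied elementwise."""
    a = np.abs(t)
    return np.where(a <= xi, a ** 2 / (2 * xi), a - xi / 2)

def huber_deriv(t, xi):
    """Derivative of h_xi: t/xi for |t| <= xi and sign(t) for |t| > xi,
    written compactly as t / max(xi, |t|)."""
    return t / np.maximum(xi, np.abs(t))

xi = 0.5
t = np.array([-2.0, -0.5, 0.1, 0.5, 2.0])
print(huber(t, xi))        # values 1.75, 0.25, 0.01, 0.25, 1.75
print(huber_deriv(t, xi))  # values -1, -1, 0.2, 1, 1 (magnitude never exceeds 1)
```

Taking a very small $\xi$, e.g., `huber(3.0, 1e-6)`, gives $3 - 5\times 10^{-7}$, which illustrates the behavior as a smooth surrogate for $|t|$.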

The regularization in Eq. (28) is based on applying the Huber function to a single transformation $\mathbf{B}\in\mathbb{C}^{L\times N}$ (e.g., a wavelet transform, a finite-difference transform, etc.) of the magnitude vector $\mathbf{m}$. The more general regularization in Eq. (29) applies the Huber function to a combination of $T$ different transforms $\mathbf{B}_t\in\mathbb{C}^{L\times N}$ of $\mathbf{m}$, which can be useful for imposing additional transform-domain structure. For example, combining a horizontal finite-difference transform with a vertical finite-difference transform within Eq. (29) is a common way to achieve isotropic regularization [28]. In addition, Eq. (29) is related to concepts of simultaneous sparsity [29], and our previous work has used regularization penalties of this form to impose the constraint that multi-contrast images of the same anatomy will frequently have correlated edge characteristics [10], [26], [27], [30]. Since Eq. (28) is a special case of Eq. (29) with $T = 1$, our description below will assume use of the more general form of Eq. (29).

The regularization in Eq. (30) corresponds to standard quadratic (Tikhonov) regularization. If C is chosen as a spatial finite-difference operator, this type of regularization can be good at imposing the constraint that the image phase should be spatially smooth without major discontinuities [2], [4], [5], [7], [10]. While not every MRI image will have smooth phase characteristics, most do, and smooth phase is a common constraint within the image reconstruction literature [31], [32].

For this case, both R1(·) and R2(·) are smooth, corresponding to the situation in the first row of Table I. As such, to implement PALM, we use the assignments:

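$$\begin{aligned} H(\cdot,\cdot) &\triangleq \frac{1}{2}\left\|\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{b}\right\|_2^2 + \lambda_1\sum_{\ell=1}^{L}h_\xi\!\left(\sqrt{\sum_{t=1}^{T}\left|[\mathbf{B}_t\mathbf{m}]_\ell\right|^2}\right) + \frac{\lambda_2}{2}\left\|\mathbf{C}\mathbf{q}\right\|_2^2 \\ F(\cdot) &\triangleq 0 \\ G(\cdot) &\triangleq I_{\mathcal{V}}(\mathbf{q}). \end{aligned} \tag{32}$$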

The gradients of H(·, ·) needed for PALM are

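$$\nabla_{\mathbf{m}}H(\mathbf{m},\mathbf{q}) = \operatorname{Re}\left\{\bar{\mathbf{q}}\odot\left[\mathbf{A}^H\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{A}^H\mathbf{b}\right] + \lambda_1\sum_{t=1}^{T}\mathbf{B}_t^H\mathbf{W}(\mathbf{m})\mathbf{B}_t\mathbf{m}\right\} \tag{33}$$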

and

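$$\nabla_{\mathbf{q}}H(\mathbf{m},\mathbf{q}) = \mathbf{m}\odot\left[\mathbf{A}^H\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{A}^H\mathbf{b}\right] + \lambda_2\mathbf{C}^H\mathbf{C}\mathbf{q}, \tag{34}$$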

where $\mathbf{W}(\mathbf{m})$ is an $L\times L$ diagonal matrix depending on $\mathbf{m}$ with $\ell$th diagonal entry given by

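$$[\mathbf{W}(\mathbf{m})]_{\ell\ell} = \frac{1}{\max\left\{\xi, \sqrt{\sum_{t=1}^{T}\left|[\mathbf{B}_t\mathbf{m}]_\ell\right|^2}\right\}}. \tag{35}$$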
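The regularization part of Eq. (33), $\operatorname{Re}\{\lambda_1\sum_t \mathbf{B}_t^H\mathbf{W}(\mathbf{m})\mathbf{B}_t\mathbf{m}\}$ with $\mathbf{W}(\mathbf{m})$ from Eq. (35), can be checked against finite differences of Eq. (29). This sketch assumes periodic horizontal and vertical finite differences on a small hypothetical $4\times 5$ image (the isotropic setup mentioned earlier in this subsection); all helper names are our own. It also checks, for one random pair of inputs, that the gradient change respects the Huber part of the bound in Eq. (36), $(\lambda_1/\xi)\|\sum_t\mathbf{B}_t^H\mathbf{B}_t\|$.

```python
import numpy as np

def periodic_diff(n):
    """n x n periodic forward-difference matrix: (D x)_i = x_{i+1} - x_i."""
    return np.roll(np.eye(n), 1, axis=1) - np.eye(n)

def group_norms(v, Bs):
    """sqrt(sum_t |[B_t v]_l|^2) for every index l."""
    return np.sqrt(sum(np.abs(B @ v) ** 2 for B in Bs))

def huber_penalty(m, Bs, lam, xi):
    """lam * sum_l h_xi(group norm): the penalty of Eq. (29)."""
    s = group_norms(m, Bs)
    return lam * np.sum(np.where(s <= xi, s ** 2 / (2 * xi), s - xi / 2))

def huber_penalty_grad(m, Bs, lam, xi):
    """Re{ lam * sum_t B_t^H W(m) B_t m }: the regularization part of Eq. (33)."""
    w = 1.0 / np.maximum(xi, group_norms(m, Bs))   # diagonal of W(m), Eq. (35)
    return np.real(lam * sum(B.conj().T @ (w * (B @ m)) for B in Bs))

# Isotropic finite differences on a hypothetical 4 x 5 image (row-major flattening).
n1, n2 = 4, 5
Bs = [np.kron(np.eye(n1), periodic_diff(n2)),   # horizontal differences
      np.kron(periodic_diff(n1), np.eye(n2))]   # vertical differences
rng = np.random.default_rng(4)
m = rng.random(n1 * n2)
lam, xi = 0.7, 0.1

# Directional-derivative check with central differences.
dm = rng.standard_normal(n1 * n2)
eps = 1e-6
fd = (huber_penalty(m + eps * dm, Bs, lam, xi) - huber_penalty(m - eps * dm, Bs, lam, xi)) / (2 * eps)
print(fd, huber_penalty_grad(m, Bs, lam, xi) @ dm)   # should agree closely

# Lipschitz-type check against the Huber term of Eq. (36).
m1, m2 = rng.random(n1 * n2), rng.random(n1 * n2)
lip = lam / xi * np.linalg.norm(sum(B.conj().T @ B for B in Bs), 2)
g_diff = np.linalg.norm(huber_penalty_grad(m1, Bs, lam, xi) - huber_penalty_grad(m2, Bs, lam, xi))
print(g_diff <= lip * np.linalg.norm(m1 - m2))
```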

These gradient expressions give rise to Lipschitz-type upper bounds in the form of Eqs. (9) and (10), such that the majorization and descent conditions will be satisfied whenever

$$c_k \ge \|\mathbf{A}\|^2 + \frac{\lambda_1}{\xi}\left\|\sum_{t=1}^{T}\mathbf{B}_t^H\mathbf{B}_t\right\| \tag{36}$$

and

$$d_k \ge \|\mathbf{A}\|^2\left\|\hat{\mathbf{m}}_k\right\|_\infty^2 + \lambda_2\|\mathbf{C}\|^2, \tag{37}$$

where $\|\cdot\|_\infty$ denotes the $\ell_\infty$-norm. Please refer to [33]-[35] for insight on obtaining Lipschitz constants for this kind of Huber function. The spectral norms in these expressions are not iteration-dependent, and can be precomputed and reused throughout the iterative process. If it is difficult to calculate the spectral norm values analytically, they can also be evaluated using standard computationally efficient numerical methods like Lanczos iteration [36].
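When the spectral norms cannot be computed analytically, a simple alternative to Lanczos iteration is power iteration on $\mathbf{A}^H\mathbf{A}$, which needs only the forward and adjoint operators. The sketch below uses that simpler technique, not the Lanczos method cited above; the function name, iteration count, and the values of $\lambda_2$ and $\|\mathbf{C}\|^2$ used to form a bound like Eq. (37) are hypothetical.

```python
import numpy as np

def estimate_norm_sq(apply_A, apply_AH, n, n_iter=500, seed=0):
    """Power iteration on A^H A: returns an estimate of ||A||^2 (the largest
    eigenvalue of A^H A) using only the maps v -> A v and u -> A^H u."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    est = 0.0
    for _ in range(n_iter):
        u = apply_AH(apply_A(v))   # one application of A^H A
        est = np.linalg.norm(u)    # ||A^H A v|| for unit-norm v; approaches ||A||^2
        v = u / est
    return est

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 10)) + 1j * rng.standard_normal((20, 10))
normA2 = estimate_norm_sq(lambda v: A @ v, lambda u: A.conj().T @ u, n=10)
print(normA2, np.linalg.norm(A, 2) ** 2)   # estimate vs. exact value from a dense SVD

# A bound of the form of Eq. (37), with hypothetical lambda_2 and ||C||^2.
m_hat = rng.random(10)
lam2, normC2 = 0.1, 4.0
d_k = normA2 * np.max(np.abs(m_hat)) ** 2 + lam2 * normC2
```

Since $\|\mathbf{A}^H\mathbf{A}\mathbf{v}\|_2\le\|\mathbf{A}\|^2$ for any unit vector $\mathbf{v}$, the estimate approaches $\|\mathbf{A}\|^2$ from below; slightly inflating it (e.g., by an arbitrary factor such as 1.01) keeps $c_k$ and $d_k$ on the safe side of the majorization conditions.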

Finally, the proximal operators needed for PALM are given by

$$\operatorname{prox}_F(\mathbf{w}_k, c_k) = \mathbf{w}_k \tag{38}$$

and

$$\operatorname{prox}_G(\mathbf{z}_k, d_k) = \frac{\mathbf{z}_k}{|\mathbf{z}_k|}, \tag{39}$$

where the absolute value and division are performed elementwise and we choose the convention that $0/0 = 1$.
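Eq. (39) is an elementwise projection onto the unit circle, so it does not depend on $d_k > 0$. A minimal sketch with the $0/0=1$ convention handled explicitly (the function name is hypothetical):

```python
import numpy as np

def prox_unit_modulus(z):
    """prox_G(z, d) of Eq. (39): elementwise z / |z|, with the convention 0/0 = 1.
    Each entry is moved to the nearest unit-modulus value, whatever d > 0 is."""
    z = np.asarray(z, dtype=complex)
    mag = np.abs(z)
    out = np.ones_like(z)           # entries with z_n = 0 map to 1
    nz = mag > 0
    out[nz] = z[nz] / mag[nz]
    return out

z = np.array([3 + 4j, -2.0, 0.0, 1j])
print(prox_unit_modulus(z))   # entries: 0.6+0.8j, -1, 1 (by convention), 1j
```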

2) $\ell_1$ regularization of $\mathbf{m}$ with Huber-function regularization of $\mathbf{q}$:

For our second illustration, we will consider the case where

$$R_1(\mathbf{m}) = \lambda_1\left\|\mathbf{T}\mathbf{m}\right\|_1 \tag{40}$$

and

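$$R_2(\mathbf{q}) = \lambda_2\sum_{\ell=1}^{L}h_\xi\!\left(\sqrt{\sum_{t=1}^{T}\left|[\mathbf{B}_t\mathbf{q}]_\ell\right|^2}\right). \tag{41}$$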

The $\ell_1$-norm penalty with sparsifying transform matrix $\mathbf{T}$ from Eq. (40) is standard for promoting transform-domain sparsity, and has been previously used to regularize the magnitude vector $\mathbf{m}$ in several applications [1], [3], [5]-[9]. The characteristics of the Huber function from Eq. (41) have been discussed previously. By taking a small value of the parameter $\xi$, the Huber function can be used as a smooth approximation of the $\ell_1$-norm in order to enable sparsity-promoting and/or edge-preserving regularization of the phase image, which can be useful for some applications with more complicated phase characteristics [5], [8]. As indicated earlier, choosing $R_2(\cdot)$ to be a smooth function and including it within $H(\cdot,\cdot)$ enables simplified computation of $\operatorname{prox}_G(\cdot,\cdot)$.

In this case, R1(·) is non-smooth while R2(·) is smooth, corresponding to the second row of Table I. As such, to implement PALM, we use the assignments:

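$$\begin{aligned} H(\cdot,\cdot) &\triangleq \frac{1}{2}\left\|\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{b}\right\|_2^2 + \lambda_2\sum_{\ell=1}^{L}h_\xi\!\left(\sqrt{\sum_{t=1}^{T}\left|[\mathbf{B}_t\mathbf{q}]_\ell\right|^2}\right) \\ F(\cdot) &\triangleq \lambda_1\left\|\mathbf{T}\mathbf{m}\right\|_1 \\ G(\cdot) &\triangleq I_{\mathcal{V}}(\mathbf{q}). \end{aligned} \tag{42}$$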

The gradients of H(·, ·) needed for PALM are

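$$\nabla_{\mathbf{m}}H(\mathbf{m},\mathbf{q}) = \operatorname{Re}\left\{\bar{\mathbf{q}}\odot\left[\mathbf{A}^H\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{A}^H\mathbf{b}\right]\right\} \tag{43}$$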

and

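$$\nabla_{\mathbf{q}}H(\mathbf{m},\mathbf{q}) = \mathbf{m}\odot\left[\mathbf{A}^H\mathbf{A}(\mathbf{m}\odot\mathbf{q}) - \mathbf{A}^H\mathbf{b}\right] + \lambda_2\sum_{t=1}^{T}\mathbf{B}_t^H\mathbf{W}(\mathbf{q})\mathbf{B}_t\mathbf{q}. \tag{44}$$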

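Here, $\mathbf{W}(\mathbf{q})$ is defined as in Eq. (35), with $\mathbf{q}$ in place of $\mathbf{m}$. These gradient expressions give rise to Lipschitz-type upper bounds in the form of Eqs. (9) and (10), such that the majorization and descent conditions will be satisfied whenever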

$$c_k \ge \|\mathbf{A}\|^2 \tag{45}$$

and

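$$d_k \ge \|\mathbf{A}\|^2\left\|\hat{\mathbf{m}}_k\right\|_\infty^2 + \frac{\lambda_2}{\xi}\left\|\sum_{t=1}^{T}\mathbf{B}_t^H\mathbf{B}_t\right\|. \tag{46}$$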

As before, the spectral norms in these expressions are not iteration-dependent, and can be precomputed and reused throughout the iterative process.

Finally, assuming that $\mathbf{T}$ is a unitary transform such that $\mathbf{T}^H = \mathbf{T}^{-1}$, the proximal operator for $F(\cdot)$ is given by [19]

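$$\operatorname{prox}_F(\mathbf{w}_k, c_k) = \mathbf{T}^H\operatorname{diag}\!\left(\frac{\max\left\{|\mathbf{T}\mathbf{w}_k| - \frac{\lambda_1}{c_k},\, 0\right\}}{|\mathbf{T}\mathbf{w}_k|}\right)\mathbf{T}\mathbf{w}_k, \tag{47}$$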

where maximization, absolute value, and division operations are performed elementwise. Note that G(·) is the same as in the previous illustration, and therefore has the same proximal operator (Eq. (39)).
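A sketch of Eq. (47), assuming the unitary $\mathbf{T}$ is available as a dense matrix. For the optimality check we use a hypothetical real orthogonal matrix (from a QR factorization, our choice) so that the output stays real-valued like $\mathbf{m}$. With $\mathbf{T}=\mathbf{I}$, the operator reduces to ordinary soft thresholding with threshold $\lambda_1/c_k$.

```python
import numpy as np

def prox_l1_unitary(w, T, lam1, c):
    """Eq. (47): prox_F(w, c) for F(m) = lam1 * ||T m||_1 with T^H = T^{-1}.
    Shrinks the magnitude of each transform coefficient by lam1/c (soft thresholding)."""
    u = T @ w
    mag = np.abs(u)
    safe = np.where(mag > 0, mag, 1.0)              # avoid 0/0; those entries shrink to 0 anyway
    scale = np.maximum(mag - lam1 / c, 0.0) / safe
    return T.conj().T @ (scale * u)

# With T = I this is ordinary soft thresholding with threshold lam1/c = 0.3.
print(prox_l1_unitary(np.array([1.0, -0.2, 0.5]), np.eye(3), lam1=0.6, c=2.0))   # values 0.7, 0, 0.2

# Optimality check with a hypothetical real orthogonal T.
rng = np.random.default_rng(6)
T, _ = np.linalg.qr(rng.standard_normal((6, 6)))
w = rng.standard_normal(6)
x = prox_l1_unitary(w, T, lam1=0.6, c=2.0)

def obj(v):
    """Proximal objective F(v) + (c/2)||v - w||^2 with lam1 = 0.6 and c = 2."""
    return 0.6 * np.abs(T @ v).sum() + 0.5 * 2.0 * np.sum((v - w) ** 2)
```

Because the proximal objective is strongly convex, the returned point should score no worse than any nearby perturbation, e.g., `obj(x) <= obj(x + 1e-3 * rng.standard_normal(6))`.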

B. Uncoupled Step Sizes

Although the PALM algorithm described in the previous section provides a novel and effective approach for solving Eq. (3), we observe in this section that it may be very conservative and computationally inefficient for PALM to use the same value of $d_k$ (and therefore, the same step size $1/d_k$) for all elements of the phase vector $\mathbf{p}$. This inefficiency stems from the fact that $d_k$ is set based on the global Lipschitz constant (effectively, the worst-case rate of change of the gradient along any possible direction), while we have observed that the rate of change of the gradient can be much smaller than the worst case along specific directions. Concretely, using the global Lipschitz constant means that the step size will depend on the maximum value of $\mathbf{m}$, while we observe that it can be much better for the step size for each coordinate to depend instead on the coordinatewise values of $\mathbf{m}$. This observation motivates us to investigate and utilize coordinatewise bounds on the rate of change of the gradient, enabling uncoupled coordinatewise step sizes. Some readers might view our use of uncoupled step sizes as a form of iteration-dependent preconditioning.

For the sake of generality, we will first describe this approach for the general setting of Section II, where we are given a generic smooth real-valued function $H(\mathbf{x},\mathbf{y})$. The PALM approach utilized majorants of $H(\mathbf{x},\mathbf{y})$ that were derived based on the global (scalar-valued) Lipschitz constants $L_1(\mathbf{y})$ and $L_2(\mathbf{x})$ of Eqs. (9) and (10). In this section, we instead make the assumption that a vector $\mathbf{L}_1(\mathbf{y})\in\mathbb{R}^{N_1}$ can be found such that, for a fixed value of $\mathbf{y}$, we have

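$$\left\langle\nabla_{\mathbf{x}}H(\mathbf{x}_1,\mathbf{y}) - \nabla_{\mathbf{x}}H(\mathbf{x}_2,\mathbf{y}),\, \mathbf{x}_1-\mathbf{x}_2\right\rangle \le \left\|\sqrt{\mathbf{L}_1(\mathbf{y})}\odot(\mathbf{x}_1-\mathbf{x}_2)\right\|_2^2 \tag{48}$$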

for all $\mathbf{x}_1,\mathbf{x}_2\in\mathbb{R}^{N_1}$, where the square-root operation is applied elementwise. Similarly, we assume that a vector $\mathbf{L}_2(\mathbf{x})\in\mathbb{R}^{N_2}$ can be found such that, for a fixed value of $\mathbf{x}$, we have

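$$\left\langle\nabla_{\mathbf{y}}H(\mathbf{x},\mathbf{y}_1) - \nabla_{\mathbf{y}}H(\mathbf{x},\mathbf{y}_2),\, \mathbf{y}_1-\mathbf{y}_2\right\rangle \le \left\|\sqrt{\mathbf{L}_2(\mathbf{x})}\odot(\mathbf{y}_1-\mathbf{y}_2)\right\|_2^2 \tag{49}$$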

for all $\mathbf{y}_1,\mathbf{y}_2\in\mathbb{R}^{N_2}$.

It should be noted that if the global Lipschitz continuity conditions of Eqs. (9) and (10) are known to be satisfied, then a vector $\mathbf{L}_1(\mathbf{y})$ satisfying Eq. (48) can be trivially obtained by setting all of its entries equal to $L_1(\mathbf{y})$, with an analogous argument holding true for $\mathbf{L}_2(\mathbf{x})$. In particular, the Cauchy-Schwarz inequality combined with Eq. (9) implies that

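$$\begin{aligned} \left\langle\nabla_{\mathbf{x}}H(\mathbf{x}_1,\mathbf{y}) - \nabla_{\mathbf{x}}H(\mathbf{x}_2,\mathbf{y}),\, \mathbf{x}_1-\mathbf{x}_2\right\rangle &\le \left\|\nabla_{\mathbf{x}}H(\mathbf{x}_1,\mathbf{y}) - \nabla_{\mathbf{x}}H(\mathbf{x}_2,\mathbf{y})\right\|_2\left\|\mathbf{x}_1-\mathbf{x}_2\right\|_2 \\ &\le L_1(\mathbf{y})\left\|\mathbf{x}_1-\mathbf{x}_2\right\|_2^2 = \left\|\sqrt{\mathbf{L}_1(\mathbf{y})}\odot(\mathbf{x}_1-\mathbf{x}_2)\right\|_2^2. \end{aligned} \tag{50}$$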

However, Eqs. (48) and (49) are more flexible than Eqs. (9) and (10) because a different Lipschitz-like constant can be used for every coordinate, and many of these entries can be potentially much smaller than the global Lipschitz constant (because the function gradient may change much more slowly along these directions).
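To illustrate how Eq. (49) can be much less conservative than Eq. (10), consider the data-consistency term $J(\mathbf{m},\mathbf{q})$ with $\mathbf{m}$ fixed, taking the inner product of complex vectors as $\operatorname{Re}\langle\cdot,\cdot\rangle$ (an assumption of this sketch). With $\boldsymbol{\delta}=\mathbf{q}_1-\mathbf{q}_2$, Eq. (24) gives $\operatorname{Re}\langle\nabla_{\mathbf{q}}J(\mathbf{m},\mathbf{q}_1)-\nabla_{\mathbf{q}}J(\mathbf{m},\mathbf{q}_2),\boldsymbol{\delta}\rangle = \|\mathbf{A}(\mathbf{m}\odot\boldsymbol{\delta})\|_2^2 \le \|\mathbf{A}\|^2\|\mathbf{m}\odot\boldsymbol{\delta}\|_2^2$, so the elementwise vector $\|\mathbf{A}\|^2|\mathbf{m}|^2$ is one valid choice of $\mathbf{L}_2$. This bound is derived here for illustration only and is not necessarily the one adopted by the authors. The NumPy check below compares it with the scalar bound $\|\mathbf{A}\|^2\|\mathbf{m}\|_\infty^2$ for a hypothetical $\mathbf{m}$ with one bright voxel.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 20, 8
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
m = np.full(N, 0.1)
m[0] = 5.0                                    # one bright voxel dominates max|m|

normA2 = np.linalg.norm(A, 2) ** 2
L_global = normA2 * np.max(np.abs(m)) ** 2    # scalar bound, as in the first term of Eq. (37)
L_vec = normA2 * np.abs(m) ** 2               # hypothetical coordinatewise bound

def grad_q(q):
    """Gradient of 0.5*||A(m ⊙ q) - b||^2 in q (Eq. (24)); b cancels in differences, so b = 0."""
    return m * (A.conj().T @ (A @ (m * q)))

q1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
q2 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
delta = q1 - q2
lhs = np.vdot(grad_q(q1) - grad_q(q2), delta).real   # Re<gradient difference, q1 - q2>
rhs = np.sum(L_vec * np.abs(delta) ** 2)              # ||sqrt(L_vec) ⊙ (q1 - q2)||_2^2
print(lhs <= rhs)
print(1 / L_global, 1 / L_vec)   # step sizes: global vs. per coordinate
```

For the dim voxels, the coordinatewise step size is $(5/0.1)^2 = 2500$ times larger than the global one.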

From an optimization perspective, Eqs. (48) and (49) are important because they enable the use of potentially better majorants than were used by PALM, as described by the following theorem.

Theorem 1. Consider the setting described in Section II, and assume that the smooth real-valued function $H(\mathbf{x},\mathbf{y})$ satisfies the conditions of Eq. (48). Then given a vector $\mathbf{c}_k\in\mathbb{R}^{N_1}$ and assuming $\mathbf{y}$ is held fixed at $\mathbf{y}=\hat{\mathbf{y}}_{k-1}$, the function

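$$H(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1}) + \left\langle\mathbf{x}-\hat{\mathbf{x}}_{k-1}, \nabla_{\mathbf{x}}H(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\right\rangle + \frac{1}{2}\left\|\sqrt{\mathbf{c}_k}\odot(\mathbf{x}-\hat{\mathbf{x}}_{k-1})\right\|_2^2 + F(\mathbf{x}) + G(\hat{\mathbf{y}}_{k-1}) \tag{51}$$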

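is a majorant of $\Psi(\mathbf{x},\hat{\mathbf{y}}_{k-1})$ at the point $\mathbf{x}=\hat{\mathbf{x}}_{k-1}$ whenever $\mathbf{c}_k \ge \mathbf{L}_1(\hat{\mathbf{y}}_{k-1})$ (elementwise). Similarly, given a vector $\mathbf{d}_k\in\mathbb{R}^{N_2}$, assuming $H(\mathbf{x},\mathbf{y})$ satisfies the conditions of Eq. (49), and assuming $\mathbf{x}$ is held fixed at $\mathbf{x}=\hat{\mathbf{x}}_k$, the function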

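$$H(\hat{\mathbf{x}}_k,\hat{\mathbf{y}}_{k-1}) + \left\langle\mathbf{y}-\hat{\mathbf{y}}_{k-1}, \nabla_{\mathbf{y}}H(\hat{\mathbf{x}}_k,\hat{\mathbf{y}}_{k-1})\right\rangle + \frac{1}{2}\left\|\sqrt{\mathbf{d}_k}\odot(\mathbf{y}-\hat{\mathbf{y}}_{k-1})\right\|_2^2 + F(\hat{\mathbf{x}}_k) + G(\mathbf{y}) \tag{52}$$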

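is a majorant of $\Psi(\hat{\mathbf{x}}_k,\mathbf{y})$ at the point $\mathbf{y}=\hat{\mathbf{y}}_{k-1}$ whenever $\mathbf{d}_k \ge \mathbf{L}_2(\hat{\mathbf{x}}_k)$ (elementwise).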

The proof of this theorem is given in Appendix A. Although this theorem is stated for real-valued vectors (for consistency with previous descriptions), the same also holds true for complex-valued vectors x and y.

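Following the same approach as PALM, but replacing the original PALM majorants (Eqs. (7) and (8)) with the new majorants from Thm. 1, results in the new PALM algorithm with uncoupled step sizes given in Alg. 3. This algorithm uses proximal operators with vector-valued $\mathbf{c}_k$ and $\mathbf{d}_k$, which we define as

$$\operatorname{prox}_F(w_k, \mathbf{c}_k) \triangleq \arg\min_{x \in \mathbb{R}^{N_1}} F(x) + \frac{1}{2}\left\| \sqrt{\mathbf{c}_k} \odot (x - w_k) \right\|_2^2 \tag{53}$$

and

$$\operatorname{prox}_G(z_k, \mathbf{d}_k) \triangleq \arg\min_{y \in \mathbb{R}^{N_2}} G(y) + \frac{1}{2}\left\| \sqrt{\mathbf{d}_k} \odot (y - z_k) \right\|_2^2. \tag{54}$$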
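As a concrete instance of Eqs. (53) and (54), consider the hypothetical choice $F(x) = \lambda\|x\|_1$ (our choice, for illustration only). Because both the penalty and the weighted quadratic are separable, the vector-valued proximal operator reduces to soft-thresholding with a different threshold $\lambda / c_{k,i}$ for each coordinate. A minimal NumPy sketch, assuming real-valued vectors and strictly positive weights:

```python
import numpy as np

def prox_l1_weighted(w, c, lam):
    """Weighted prox of F(x) = lam * ||x||_1 in the sense of Eq. (53).

    Solves argmin_x lam*||x||_1 + 0.5*||sqrt(c) * (x - w)||_2^2, which
    separates into scalar problems; every entry of c must be > 0.
    """
    w = np.asarray(w, dtype=float)
    c = np.asarray(c, dtype=float)
    thresh = lam / c                      # coordinate i is shrunk by lam / c_i
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

w = np.array([1.0, 1.0, -0.3])
c = np.array([10.0, 1.0, 1.0])
x_new = prox_l1_weighted(w, c, lam=0.5)
# Coordinate 0 (large c) is shrunk by only 0.05; coordinate 2 is set to zero.
```

A large weight $c_{k,i}$ (i.e., a small step) keeps coordinate $i$ close to $w_{k,i}$, while a small weight lets the penalty act more strongly on it.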

Just like for the original PALM algorithm, the use of valid majorants in our new algorithm guarantees that it will monotonically decrease the objective function value, and that the objective function value will converge if $\Psi(x, y)$ is bounded from below. Further, this new algorithm reduces to the original PALM algorithm if $\mathbf{c}_k$ and $\mathbf{d}_k$ are treated as scalars (i.e., all of their entries are always equal) instead of choosing a different value for each entry. We hypothesize that making additional assumptions about the structure of the cost function (e.g., the Kurdyka-Łojasiewicz property that has been used with the PALM algorithm [11]) may enable a proof that this new algorithm has guaranteed convergence to a critical point, although such a proof is beyond the scope of the present work.

Algorithm 3 PALM for Eq. (5) with Uncoupled Step Sizes
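Initialization: Set $k = 1$ and initialize $\hat{x}_0$ and $\hat{y}_0$.
while not converged do
  Choose $\mathbf{c}_k \ge \mathbf{L}_1(\hat{y}_{k-1})$ (elementwise)
  $w_k = \hat{x}_{k-1} - \operatorname{diag}(\mathbf{c}_k)^{-1} \nabla_x H(\hat{x}_{k-1}, \hat{y}_{k-1})$
  $\hat{x}_k = \operatorname{prox}_F(w_k, \mathbf{c}_k)$
  Choose $\mathbf{d}_k \ge \mathbf{L}_2(\hat{x}_k)$ (elementwise)
  $z_k = \hat{y}_{k-1} - \operatorname{diag}(\mathbf{d}_k)^{-1} \nabla_y H(\hat{x}_k, \hat{y}_{k-1})$
  $\hat{y}_k = \operatorname{prox}_G(z_k, \mathbf{d}_k)$
  $k \leftarrow k + 1$
end while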
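To make the update order of Alg. 3 concrete, here is a minimal NumPy sketch on a small real-valued toy problem of our own choosing (not one of the paper's experiments): $H(x, y) = \frac{1}{2}\|A(x \odot y) - b\|_2^2$, $F(x) = \lambda\|x\|_1$, and $G(y) = \frac{\mu}{2}\|y\|_2^2$. For this $H$, the same argument that underlies Eq. (55) gives $\mathbf{L}_1(y) = \|A\|_2^2\, y^2$ and $\mathbf{L}_2(x) = \|A\|_2^2\, x^2$ (elementwise squares). The names `palm_uncoupled`, `soft`, and the tiny `floor` that keeps the weights invertible are our assumptions; the floor only enlarges $\mathbf{c}_k$ and $\mathbf{d}_k$, so the majorization of Thm. 1 still holds.

```python
import numpy as np

def soft(w, t):
    """Elementwise soft-thresholding: the exact weighted prox of t*|x|."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def objective(x, y, A, b, lam, mu):
    """Psi(x, y) = H(x, y) + F(x) + G(y) for the toy problem."""
    r = A @ (x * y) - b
    return 0.5 * r @ r + lam * np.abs(x).sum() + 0.5 * mu * y @ y

def palm_uncoupled(A, b, x0, y0, lam=0.05, mu=0.05, iters=200, floor=1e-12):
    """Sketch of Alg. 3 (PALM with uncoupled step sizes) on the toy problem."""
    s = np.linalg.norm(A, 2) ** 2          # ||A||_2^2, squared spectral norm
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    history = [objective(x, y, A, b, lam, mu)]
    for _ in range(iters):
        # x-update: c_k >= L_1(y_{k-1}) = ||A||^2 * y^2, floored to stay invertible
        c = np.maximum(s * y**2, floor)
        w = x - y * (A.T @ (A @ (x * y) - b)) / c
        x = soft(w, lam / c)                # prox_F(w_k, c_k) for F = lam*||.||_1
        # y-update at the new x: d_k >= L_2(x_k) = ||A||^2 * x^2
        d = np.maximum(s * x**2, floor)
        z = y - x * (A.T @ (A @ (x * y) - b)) / d
        y = d * z / (mu + d)                # prox_G(z_k, d_k) for G = (mu/2)*||.||^2
        history.append(objective(x, y, A, b, lam, mu))
    return x, y, history

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
b = A @ (rng.standard_normal(20) * rng.standard_normal(20))
x_hat, y_hat, hist = palm_uncoupled(A, b, np.ones(20), np.ones(20))
print(hist[0], hist[-1])
```

Since each block update exactly minimizes a valid majorant, `hist` should be non-increasing up to round-off, which is the monotonicity property stated for Alg. 3.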

Now that the general approach has been described, let’s consider the application to the specific magnitude and phase optimization problem of interest from Eq. (3). Without further assumptions about the structure of $A$, we do not observe any special coordinatewise structure for the magnitude subproblem. As such, we can simply utilize the global Lipschitz constant in this case, setting $\mathbf{c}_k = c_k \mathbf{1}$, where $\mathbf{1}$ is a vector whose entries are all equal to 1 and $c_k$ is the value obtained for the PALM algorithm as described in the previous section.

However, for the phase subproblem with a fixed value of m, we observe that

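$$\langle \nabla_q J(m, q_1) - \nabla_q J(m, q_2),\, q_1 - q_2 \rangle \le \left\| \|A\|_2\, |m| \odot (q_1 - q_2) \right\|_2^2, \tag{55}$$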

where the magnitude operation is applied elementwise to the vector m. Thus, for the first illustration from the previous subsection (Huber-function regularization of m with Tikhonov regularization of p), our new algorithm can adopt

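$$\mathbf{d}_k \triangleq \|A\|_2^2\, |\hat{m}_k|^2 + \lambda_2 \|C\|_2^2 \tag{56}$$

instead of the previous expression from Eq. (37). Similarly, for the second illustration from the previous subsection ($\ell_1$-regularization of $m$ with Huber-function regularization of $p$), our new algorithm can adopt

$$\mathbf{d}_k \triangleq \|A\|_2^2\, |\hat{m}_k|^2 + \frac{\lambda_2}{\xi} \left\| \sum_{t=1}^{T} B_t^H B_t \right\|_2 \tag{57}$$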

instead of the previous expression from Eq. (46). Note also that for both of the previous illustrations, the $\operatorname{prox}_G(\cdot, \cdot)$ expressions had no dependence on the actual value of $\mathbf{d}_k$, which allows us to reuse the same proximal operators for the new algorithm without modification. As such, for these illustrations, the main difference between the PALM algorithm and our new algorithm is that PALM takes a uniform step size for the phase update that depends on the maximum value of $|m|$, while the new algorithm can take larger step sizes for coordinates where the corresponding value of $|m|$ is small.
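The effect of the uncoupled phase step sizes can be checked numerically. The sketch below makes several assumptions of our own: a small random complex $A$, a magnitude vector with a wide dynamic range, a hypothetical 1-D finite-difference matrix standing in for $C$, and a data-consistency term $\frac{1}{2}\|A(m \odot q) - b\|_2^2$ with gradient $\bar{m} \odot A^H(A(m \odot q) - b)$ (inner products taken as real parts). It spot-checks Eq. (55) and then compares Eq. (56) against a uniform rule that we assume, for illustration only, replaces $|\hat{m}_k|^2$ by $\max_i |\hat{m}_{k,i}|^2$; the exact form of Eq. (37) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 40, 25
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
m = np.abs(rng.standard_normal(N)) * np.logspace(-2, 0, N)  # wide dynamic range
lam2 = 0.1
C = np.eye(N) - np.eye(N, k=1)          # hypothetical 1-D finite differences

def grad_q(q):
    """Gradient of 0.5*||A(m*q) - b||^2 w.r.t. q (conjugate-Wirtinger form)."""
    return np.conj(m) * (A.conj().T @ (A @ (m * q) - b))

normA2 = np.linalg.norm(A, 2) ** 2      # squared spectral norm of A

# Spot-check Eq. (55): Re<g1 - g2, q1 - q2> <= || ||A|| |m| * (q1 - q2) ||^2
for _ in range(5):
    q1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    q2 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    lhs = np.real(np.vdot(q1 - q2, grad_q(q1) - grad_q(q2)))
    rhs = normA2 * np.sum(np.abs(m) ** 2 * np.abs(q1 - q2) ** 2)
    assert lhs <= rhs * (1 + 1e-12)

normC2 = np.linalg.norm(C, 2) ** 2
d_uncoupled = normA2 * np.abs(m) ** 2 + lam2 * normC2        # Eq. (56)
d_uniform = normA2 * np.max(np.abs(m)) ** 2 + lam2 * normC2  # assumed uniform rule
print("largest step-size gain:", (d_uniform / d_uncoupled).max())
```

Coordinates with small $|m_i|$ get the largest gain, since their step size $1/d_{k,i}$ is no longer tied to the largest magnitude in the image.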

C. Nesterov’s Momentum Acceleration and PALMNUT

Our proposed PALMNUT (PALM with Nesterov’s momentum and Uncoupled sTep sizes) algorithm is obtained by combining Alg. 3 with Nesterov’s momentum technique. The basic idea of Nesterov’s technique is that, instead of computing the next iterate $\hat{x}_k$ based only on values derived from $\hat{x}_{k-1}$, it can be better to find the next iterate using values derived from a combination of the previous iterates [12], [37], which can be interpreted as using “momentum” from previous iterations. For convex optimization problems, this approach can even result in convergence rates of optimal order [12], [37]. Of course, the problem of interest in this work is not convex, although it has been shown empirically (oftentimes without rigorous theoretical justification) that Nesterov’s momentum technique can substantially accelerate the iterative solution of nonconvex optimization problems.

The idea of applying momentum to accelerate the convergence of PALM has been studied in Ref. [38], where the resulting algorithm was called inertial PALM (iPALM). Theoretical convergence results for iPALM were proven with restrictive parameter choices (different from Nesterov’s original parameter choices), although it was also shown empirically that using Nesterov’s original parameter choices generally led to much faster convergence, despite the lack of theoretical guarantees. Our empirical experience is also consistent with these previous observations, so our proposed PALMNUT algorithm similarly utilizes Nesterov’s original parameter choices (following the concise form described by Ref. [39]). The final PALMNUT algorithm, incorporating both uncoupled coordinatewise step sizes and Nesterov’s momentum, is given in Alg. 4.

Algorithm 4 PALMNUT
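Initialization: Set $k = 1$ and initialize $\hat{x}_0$ and $\hat{y}_0$. Set $u_0 = \hat{x}_0$ and $v_0 = \hat{y}_0$.
while not converged do
  Choose $\mathbf{c}_k \ge \mathbf{L}_1(v_{k-1})$ (elementwise)
  $w_k = u_{k-1} - \operatorname{diag}(\mathbf{c}_k)^{-1} \nabla_x H(u_{k-1}, v_{k-1})$
  $\hat{x}_k = \operatorname{prox}_F(w_k, \mathbf{c}_k)$
  $u_k = \hat{x}_k + \frac{k-1}{k+2}(\hat{x}_k - \hat{x}_{k-1})$
  Choose $\mathbf{d}_k \ge \mathbf{L}_2(u_k)$ (elementwise)
  $z_k = v_{k-1} - \operatorname{diag}(\mathbf{d}_k)^{-1} \nabla_y H(u_k, v_{k-1})$
  $\hat{y}_k = \operatorname{prox}_G(z_k, \mathbf{d}_k)$
  $v_k = \hat{y}_k + \frac{k-1}{k+2}(\hat{y}_k - \hat{y}_{k-1})$
  $k \leftarrow k + 1$
end while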

IV. Numerical Experiments

In the following subsections, we evaluate PALMNUT in three different simulations that are representative of a diverse set of real problems in MRI: sparsity-promoting reconstruction of undersampled k-space data [5], [7], [8], regularization-based denoising of a complex image [10], [26], [30], and using phase correction to enable the combination of multiple images acquired in the presence of experimental phase instabilities [10], [40]-[44].

In each of these cases, we compare PALMNUT against AM combined with NCG [2], [4], [5], [7], [10] and AM combined with phase cycling [8]. PALMNUT and AM combined with NCG are directly comparable, because they can both be used to regularize the exponentiated phase $e^{ip}$, and therefore can both be applied to the exact same optimization problem. As a result, for both of these algorithms, the phase was regularized with $R_2(e^{ip})$ as described previously, and we used the cost function value and the computation time to judge algorithm performance. However, the phase cycling heuristic [8] is intended to be used for the direct regularization of $p$. For this algorithm, we therefore had to instead use phase regularization of the form $R_2(p)$. In addition, by its nature, the phase cycling heuristic employs a different phase-cycled cost function at each iteration, which makes it difficult to plot a meaningful cost function value. To allow for comparisons with this different approach, we therefore also computed a normalized root-mean-squared error (NRMSE) metric for each algorithm. If $f$ represents the ground truth vector and $\hat{f}$ represents an estimate, the NRMSE of the estimate is given by

$$\mathrm{NRMSE} \triangleq \frac{\|\hat{f} - f\|_2}{\|f\|_2}. \tag{58}$$

In order to focus on consequential errors, the NRMSE values were always computed after masking out the empty (background) parts of the field-of-view. This was particularly beneficial for AM with phase cycling, which frequently showed higher error levels in the image background compared to the other two approaches.
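Eq. (58) with background masking can be sketched as follows. How the foreground mask was obtained is not specified here, so the sketch simply assumes a boolean mask is supplied; the thresholded mask in the usage example is a hypothetical stand-in.

```python
import numpy as np

def nrmse(f_hat, f, mask=None):
    """NRMSE of Eq. (58), optionally restricted to a boolean foreground mask."""
    f_hat = np.asarray(f_hat)
    f = np.asarray(f)
    if mask is not None:
        f_hat, f = f_hat[mask], f[mask]     # keep only foreground voxels
    return np.linalg.norm(f_hat - f) / np.linalg.norm(f)

# Toy complex image: one object voxel, the rest is empty background.
f = np.array([[3 + 4j, 0], [0, 0]])
f_hat = f.copy()
f_hat[1, 1] = 1.0                           # error located only in the background
full = nrmse(f_hat, f)                      # background error counts: 1/5
masked = nrmse(f_hat, f, mask=np.abs(f) > 0)  # background error ignored: 0
```

The example shows why masking matters for methods whose errors concentrate in empty regions: the same estimate scores 0.2 without the mask and 0 with it.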

Regularization parameters for each cost function in each scenario were empirically optimized to achieve the smallest possible final NRMSE values for the two AM algorithms. Our implementation of AM with NCG used the Polak-Ribiere version of NCG [45]. Our implementation of AM with phase cycling was based on code provided by the authors of Ref. [8] (available from https://mrirecon.github.io/bart/).

A. Undersampled MRI Reconstruction

In the first set of evaluations, we considered the reconstruction of an MR image from 8×-undersampled k-space data. The gold-standard magnitude and phase images, which were obtained from a real fully-sampled in vivo T1-weighted MRI acquisition with 256 × 256 in-plane matrix size, are shown in Fig. 1. This figure also shows the k-space sampling mask (corresponding to 8× undersampling) that we used to simulate an accelerated acquisition.

Fig. 1:

The ground truth (a) magnitude and (b) phase images used for the undersampled MRI reconstruction scenario, along with (c) the 8×-accelerated k-space sampling mask used to simulate the undersampled acquisition.

For this simulation, the original fully-sampled k-space data (acquired with 32 channels) was coil-compressed down to 8 virtual channels to reduce computational complexity, and was then retrospectively undersampled using the aforementioned k-space sampling mask. For reconstruction, the $A$ matrix was chosen according to the standard SENSE model [46], with sensitivity maps estimated using ESPIRiT [47]. The magnitude regularization took the form of an $\ell_1$ penalty as given by Eq. (40), where, following Ref. [8], the sparsifying transform $T$ was chosen to be the unitary Daubechies-4 wavelet transform. The phase regularization took the form of a Huber-function penalty as given by Eq. (41), where the Huber parameter $\xi$ was chosen to be a small number ($\xi = 0.001$) in order to approximate the $\ell_1$-norm. Following Ref. [8], the transform we used for phase regularization was also a unitary Daubechies-4 wavelet transform. All three algorithms were initialized by applying SENSE-based coil combination to the multi-channel images obtained by zero-filling the unmeasured data.
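Why a small $\xi$ approximates the $\ell_1$-norm can be seen numerically. Eq. (41) is not reproduced in this section, so the sketch assumes one common Huber normalization, $h_\xi(t) = t^2/(2\xi)$ for $|t| \le \xi$ and $|t| - \xi/2$ otherwise; the paper's exact scaling may differ. Under this form the gap to $|t|$ never exceeds $\xi/2$, while the derivative is Lipschitz with constant $1/\xi$, so a small $\xi$ also makes the $\lambda_2/\xi$ term of Eq. (57) large.

```python
import numpy as np

def huber(t, xi):
    """Assumed Huber form: quadratic for |t| <= xi, linear (|t| - xi/2) beyond."""
    a = np.abs(t)
    return np.where(a <= xi, a**2 / (2 * xi), a - xi / 2)

t = np.linspace(-1, 1, 2001)
for xi in (0.1, 0.01, 0.001):
    gap = np.max(np.abs(t) - huber(t, xi))
    print(f"xi={xi}: max(|t| - huber) = {gap:.4g}")   # equals xi/2 on this grid
```

This trade-off (closer to $\ell_1$ versus a larger Lipschitz constant and hence smaller steps) is inherent to smoothing with a Huber function.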

Convergence results for all three algorithms are shown in Fig. 2, along with representative image reconstruction results. As can be seen, the cost function and NRMSE values converge to similar levels for both PALMNUT and AM with NCG, although PALMNUT converged much faster. Although AM with phase cycling converged to a result with a reasonably good NRMSE, it was substantially worse than PALMNUT in both convergence speed and the final achieved NRMSE value.

Fig. 2:

Convergence plots and reconstructed images for the undersampled MRI reconstruction scenario. The convergence plots show (a) the cost function value as a function of computation time and (b) the NRMSE value for the complex image as a function of computation time. Also shown are the magnitude and phase images corresponding to (c) zero-filled reconstruction and (d) PALMNUT.

B. Regularization-based MRI Denoising

In the second set of evaluations, we considered regularization-based denoising of a 230×180 single-channel T1-weighted MR image, obtained by applying complex coil-combination to an 8-channel dataset and subsequently adding simulated complex Gaussian noise. The ground truth and noisy images are shown in Fig. 3.

Fig. 3:

The (a) ground truth and (b) noisy magnitude and phase images used for the regularization-based MRI denoising scenario.

For reconstruction, the $A$ matrix was an identity matrix. Following Refs. [10], [26], [27], [30], the magnitude was regularized using a Huber-function penalty as given by Eq. (28), where a finite difference transformation was used to enforce spatial smoothness of the image. Following Refs. [2], [4], [5], [7], [10], the phase was regularized using a Tikhonov penalty as given by Eq. (29), also using a finite difference transformation to enforce spatial smoothness. All algorithms were initialized with the noisy image.
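Step-size rules such as Eq. (56) need $\|C\|_2^2$ for the finite-difference transform. The boundary handling is not stated here, so the sketch assumes first-order differences along both image axes with periodic boundaries, for which $\|C\|_2^2 = 8$ when both matrix dimensions are even. It checks the adjoint and estimates the norm by power iteration on $C^T C$, using only forward and adjoint applications; the function names are hypothetical.

```python
import numpy as np

def C_op(x):
    """Hypothetical 2-D forward differences with periodic boundaries."""
    return np.stack([np.roll(x, -1, axis=0) - x, np.roll(x, -1, axis=1) - x])

def C_adj(g):
    """Adjoint of C_op (backward differences, periodic)."""
    return (np.roll(g[0], 1, axis=0) - g[0]) + (np.roll(g[1], 1, axis=1) - g[1])

def estimate_norm_sq(shape=(32, 32), iters=300, seed=0):
    """Power iteration on C^T C; returns a Rayleigh-quotient estimate of ||C||_2^2."""
    v = np.random.default_rng(seed).standard_normal(shape)
    for _ in range(iters):
        v = C_adj(C_op(v))
        v /= np.linalg.norm(v)
    return float(np.vdot(v, C_adj(C_op(v))).real)

rng = np.random.default_rng(1)
x, g = rng.standard_normal((32, 32)), rng.standard_normal((2, 32, 32))
print(np.vdot(C_op(x), g), np.vdot(x, C_adj(g)))  # adjoint test: should agree
print(estimate_norm_sq())                          # approaches 8
```

The same operator-only power iteration also applies when $C$ is too large to form explicitly, which is typical for images.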

Convergence results are shown in Fig. 4, along with representative reconstruction results. As can be seen, the results in this case are consistent with the previous case: PALMNUT and AM with NCG converged to similar NRMSE values, although PALMNUT was substantially faster, while AM with phase cycling was reasonably successful yet still substantially worse than the others in both speed and NRMSE.

Fig. 4:

Convergence plots and reconstructed images for the regularization-based MRI denoising scenario. The convergence plots show (a) the cost function value as a function of computation time and (b) the NRMSE value for the complex image as a function of computation time. Also shown are the (c) magnitude and phase images corresponding to PALMNUT.

C. Phase-Corrected MR Image Combination

In the third experiment, we simulated a scenario that is common in diffusion MRI, in which multiple measurements are made of the same image to enable averaging to improve SNR, but the phase varies randomly with each measurement due to experimental instabilities. This case was simulated based on actual diffusion MRI magnitude and phase data from Ref. [10]. We simulated a case with four repetitions, based on one ground truth magnitude image and four ground truth phase images, all with matrix size 180 × 332. The magnitude image was combined with each of the different phase images to yield four different complex images, and then complex Gaussian noise was added to each result. The ground truth images and a representative noisy image are shown in Fig. 5.

Fig. 5:

(a) The ground truth magnitude and (b) a representative noisy magnitude image, along with (c) the four different ground truth phase images used for the scenario with phase-corrected combination of multiple images.

For reconstruction, we estimated a single shared magnitude image $m$ and four different phase images $p_j$ corresponding to the four different noisy measured images $b_j$, $j = 1, 2, 3, 4$, by minimizing the cost function

$$R_1(m) + \sum_{j=1}^{4} \left( \frac{1}{2}\left\| m \odot e^{ip_j} - b_j \right\|_2^2 + R_2(e^{ip_j}) \right), \tag{59}$$

where the magnitude and phase regularization penalties $R_1(\cdot)$ and $R_2(\cdot)$ were chosen in exactly the same way as for the denoising scenario from the previous subsection. All optimization algorithms were initialized with the noisy phase images and with the average of the noisy magnitude images.

The convergence plots shown in Fig. 6 show similar characteristics to those observed in the previous scenarios, with PALMNUT having a distinct advantage over the two alternative algorithms.

Fig. 6:

Convergence plots and reconstructed images for the phase-corrected MRI image combination scenario. The convergence plots show (a) the cost function value as a function of computation time and (b) the NRMSE value as a function of computation time. Also shown is (c) the combined magnitude image obtained from PALMNUT.

V. Discussion

The results of the previous section demonstrated that PALMNUT can have major advantages relative to AM methods. However, PALMNUT represents a combination of three different ideas (PALM, Nesterov’s momentum, and uncoupled step sizes), and the previous experiments did not investigate which parts of PALMNUT contribute most to its performance. To gain more insight, we performed another set of experiments in which we compared PALMNUT against the original PALM algorithm (Alg. 2), PALM with Nesterov’s momentum but without uncoupled step sizes (iPALM; see footnote 3), and PALM with uncoupled step sizes but without Nesterov’s momentum (Alg. 3). Results are shown in Fig. 7 for both the undersampled MRI reconstruction (Section IV-A) and the regularization-based MRI denoising (Section IV-B) scenarios described previously. From these plots, we observe that uncoupled step sizes and Nesterov’s momentum each independently improve the convergence speed relative to the original PALM method, with Nesterov’s momentum contributing slightly more than the use of uncoupled step sizes. However, PALMNUT’s combination of all of these elements leads to the best overall results (i.e., reaching small cost function values sooner than the other algorithms and achieving the fastest empirical convergence of the cost function value).

Fig. 7:

The convergence plots show the cost function value and the NRMSE value as a function of computation time for (a) the undersampled MRI reconstruction scenario and (b) the regularization-based MRI denoising scenario.

Interestingly, it was also observed that the NRMSE curves for PALMNUT and iPALM followed similar trends in this set of experiments, and that the NRMSE values appeared to converge sooner than the cost function values did. This is likely due to the well-known fact that NRMSE can be insensitive to some kinds of image features [48], although it should be clear from the cost function behavior that the image estimates are still changing even after the NRMSE curves appear to have converged. In addition, it is important to keep in mind that NRMSE values cannot generally be computed in practical applications, and unlike the cost function behavior, the NRMSE behavior could not be easily used as an algorithm stopping criterion.

An interesting phenomenon we observed with PALMNUT is that the magnitude estimate frequently converged faster than the phase estimate (results not shown). As a result, it could be potentially beneficial to update the phase estimate more frequently than the magnitude estimate, although we believe that such an exploration is beyond the scope of this paper.

One of the key ingredients of PALMNUT is the use of coordinatewise step sizes based on coordinatewise Lipschitz-like bounds in the form of Eqs. (48) and (49). Although these bounds were formulated in the context of PALM, we believe that Lemma 1 (found in the Appendix and used in the proof of Theorem 1) represents a novel coordinatewise majorization relationship, which could also potentially be useful to construct majorants in more general optimization scenarios.

Finally, we should mention that while PALM combined with uncoupled step sizes has guaranteed monotonic convergence properties, the convergence of the PALMNUT approach (which additionally includes Nesterov’s momentum) has not yet been theoretically proven. This represents another potentially interesting topic for future research.

VI. Conclusion

We proposed and evaluated a new algorithm called PALMNUT, which combines the PALM algorithm with Nesterov’s momentum and with uncoupled coordinatewise step sizes derived from coordinatewise Lipschitz-like bounds. Although our approach is general and can be applied to other computational imaging scenarios, our evaluation studies focused on MRI scenarios involving separate regularization of the image magnitude and phase. Applying algorithms like PALM and PALMNUT to this problem also required us to reformulate it in a novel way that is more compatible with these algorithms than the standard formulation. Our empirical results demonstrated that PALMNUT consistently had substantial advantages over previous approaches based on alternating minimization across several different MRI scenarios. As a result, we expect that PALMNUT will be useful for these kinds of MRI scenarios, and may also prove useful for more general computational imaging problems with similar optimization structure.

Acknowledgments

This work was supported in part by research grants NSF CCF-1350563, NIH R21-EB022951, NIH R01-MH116173, NIH R01-NS074980, and NIH R01-NS089212, as well as a USC Viterbi/Graduate School Fellowship.

Appendix A

Proof of Thm. 1

Theorem 1 is an immediate consequence of the following Lemma.

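Lemma 1. Let $Q(x): \mathbb{R}^N \to \mathbb{R}$ be smooth, and assume that a non-negative vector $\mathbf{L} \in \mathbb{R}^N$ exists such that

$$\langle \nabla Q(x_1) - \nabla Q(x_2),\, x_1 - x_2 \rangle \le \left\| \sqrt{\mathbf{L}} \odot (x_1 - x_2) \right\|_2^2 \tag{60}$$

for all $x_1, x_2 \in \mathbb{R}^N$. Then for all $x, \hat{x}_k \in \mathbb{R}^N$ and all $\mathbf{c} \ge \mathbf{L}$ (elementwise), we have

$$Q(x) \le Q(\hat{x}_k) + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle + \frac{1}{2}\left\| \sqrt{\mathbf{c}} \odot (x - \hat{x}_k) \right\|_2^2. \tag{61}$$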

Further, the right hand side of Eq. (61) can be written compactly as

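$$Q(\hat{x}_k) + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle + \frac{1}{2}\left\| \sqrt{\mathbf{c}} \odot (x - \hat{x}_k) \right\|_2^2 = \tau_k + \frac{1}{2}\left\| \sqrt{\mathbf{c}} \odot (x - w_k) \right\|_2^2, \tag{62}$$

where

$$w_k = \hat{x}_k - \operatorname{diag}(\mathbf{c})^{-1} \nabla Q(\hat{x}_k) \tag{63}$$

and $\tau_k$ is a constant that does not depend on the variable $x$, given by

$$\tau_k = Q(\hat{x}_k) - \langle \hat{x}_k, \nabla Q(\hat{x}_k) \rangle + \frac{1}{2}\left\| \sqrt{\mathbf{c}} \odot \hat{x}_k \right\|_2^2 - \frac{1}{2}\left\| \sqrt{\mathbf{c}} \odot w_k \right\|_2^2. \tag{64}$$

Proof. Inspired by the proof of Proposition A.24 from [49], define $\alpha(t) \triangleq Q(tx + (1-t)\hat{x}_k)$. Then $\alpha(0) = Q(\hat{x}_k)$, $\alpha(1) = Q(x)$, and

$$\frac{d}{dt}\alpha(t) = \langle \nabla Q(tx + (1-t)\hat{x}_k),\, x - \hat{x}_k \rangle. \tag{65}$$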

We have that

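$$\begin{aligned}
Q(x) - Q(\hat{x}_k) &= \int_0^1 \frac{d}{dt}\alpha(t)\, dt = \int_0^1 \langle \nabla Q(tx + (1-t)\hat{x}_k),\, x - \hat{x}_k \rangle\, dt \\
&= \int_0^1 \langle \nabla Q(tx + (1-t)\hat{x}_k) - \nabla Q(\hat{x}_k),\, x - \hat{x}_k \rangle\, dt + \langle \nabla Q(\hat{x}_k),\, x - \hat{x}_k \rangle \\
&= \int_0^1 \frac{1}{t} \langle \nabla Q(tx + (1-t)\hat{x}_k) - \nabla Q(\hat{x}_k),\, tx - t\hat{x}_k \rangle\, dt + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle \\
&= \int_0^1 \frac{1}{t} \langle \nabla Q(tx + (1-t)\hat{x}_k) - \nabla Q(\hat{x}_k),\, tx + (1-t)\hat{x}_k - \hat{x}_k \rangle\, dt + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle \\
&\le \int_0^1 \frac{1}{t} \left\| \sqrt{\mathbf{L}} \odot (tx + (1-t)\hat{x}_k - \hat{x}_k) \right\|_2^2 dt + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle \\
&= \int_0^1 t \left\| \sqrt{\mathbf{L}} \odot (x - \hat{x}_k) \right\|_2^2 dt + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle \\
&= \frac{1}{2} \left\| \sqrt{\mathbf{L}} \odot (x - \hat{x}_k) \right\|_2^2 + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle \\
&\le \frac{1}{2} \left\| \sqrt{\mathbf{c}} \odot (x - \hat{x}_k) \right\|_2^2 + \langle x - \hat{x}_k, \nabla Q(\hat{x}_k) \rangle.
\end{aligned} \tag{66}$$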

This derivation proves Eq. (61), while the simplifications leading to Eq. (62) come simply from completing the square. □
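Lemma 1 and the completed-square form of Eqs. (62) to (64) can be spot-checked numerically. The sketch below is our own construction: a quadratic $Q(x) = \frac{1}{2}x^T P x - g^T x$, a valid $\mathbf{L}$ taken as the absolute row sums of $P$ (so that $\operatorname{diag}(\mathbf{L}) - P$ is diagonally dominant and hence positive semidefinite), and the arbitrary choice $\mathbf{c} = 1.5\,\mathbf{L}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 12
B = rng.standard_normal((20, N)) * np.logspace(-2, 0, N)  # columns of varied scale
P = B.T @ B
g = rng.standard_normal(N)

def Q(x):
    """Toy smooth function Q(x) = 0.5 x^T P x - g^T x."""
    return 0.5 * x @ P @ x - g @ x

def gradQ(x):
    return P @ x - g

L = np.abs(P).sum(axis=1)      # satisfies Eq. (60) for this quadratic
c = 1.5 * L                    # any c >= L (elementwise) is allowed

xk = rng.standard_normal(N)
wk = xk - gradQ(xk) / c                                           # Eq. (63)
tau = Q(xk) - xk @ gradQ(xk) + 0.5 * c @ xk**2 - 0.5 * c @ wk**2  # Eq. (64)

for _ in range(100):
    x = xk + rng.standard_normal(N)
    surrogate = Q(xk) + (x - xk) @ gradQ(xk) + 0.5 * c @ (x - xk) ** 2
    assert Q(x) <= surrogate + 1e-10                            # Eq. (61)
    assert np.isclose(surrogate, tau + 0.5 * c @ (x - wk) ** 2)   # Eq. (62)
```

Here $\|\sqrt{\mathbf{c}} \odot v\|_2^2$ is computed as `c @ v**2`, which avoids forming the square root explicitly.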

Footnotes

1

Note that this formulation allows the “magnitude” vector m to have negative entries, which can avoid unnecessary phase discontinuities in p [5].

2

In particular, PALMNUT’s majorization relationship implies the BPEG-M majorization relationship while the converse does not hold, meaning that the PALMNUT assumptions are more broadly applicable.

3

iPALM [38] has two momentum step sizes αk and βk. In our implementation, we set these to be equal with αk = βk = (k − 1)/(k + 2) so that the only difference between iPALM and PALMNUT is the uncoupled step sizes.

References

  • [1]. Cetin M and Karl WC, “Feature-enhanced synthetic aperture radar image formation based on nonquadratic regularization,” IEEE Trans. Image Process, vol. 10, pp. 623–631, 2001.
  • [2]. Fessler JA and Noll DC, “Iterative image reconstruction in MRI with separate magnitude and phase regularization,” in Proc. IEEE Int. Symp. Biomed. Imaging, 2004, pp. 209–212.
  • [3]. Tuysuzoglu A, Kracht JM, Cleveland RO, Cetin M, and Karl WC, “Sparsity driven ultrasound imaging,” J. Acoust. Soc. Am, vol. 131, pp. 1271–1281, 2012.
  • [4]. Haldar JP, Wang Z, Popescu G, and Liang ZP, “Deconvolved spatial light interference microscopy for live cell imaging,” IEEE Trans. Biomed. Eng, vol. 58, pp. 2489–2497, 2011.
  • [5]. Zhao F, Noll DC, Nielsen JF, and Fessler JA, “Separate magnitude and phase regularization via compressed sensing,” IEEE Trans. Med. Imaging, vol. 31, pp. 1713–1723, 2012.
  • [6]. Guven HE, Gungor A, and Cetin M, “An augmented Lagrangian method for complex-valued compressed SAR imaging,” IEEE Trans. Comput. Imaging, vol. 2, pp. 235–250, 2016.
  • [7]. Zibetti MVW and De Pierro AR, “Improving compressive sensing in MRI with separate magnitude and phase priors,” Multidim. Syst. Signal. Process, vol. 28, pp. 1109–1131, 2017.
  • [8]. Ong F, Cheng JY, and Lustig M, “General phase regularized reconstruction using phase cycling,” Magn. Reson. Med, vol. 80, pp. 112–125, 2018.
  • [9]. Moradikia M, Samadi S, and Cetin M, “Joint SAR imaging and multi-feature decomposition from 2-D undersampled data via low-rankness plus sparsity priors,” IEEE Trans. Comput. Imaging, vol. 5, pp. 1–16, 2018.
  • [10]. Haldar JP, Liu Y, Liao C, Fan Q, and Setsompop K, “Fast submillimeter diffusion MRI using gSlider-SMS and SNR-enhancing joint reconstruction,” Magn. Reson. Med, vol. 84, pp. 762–776, 2020.
  • [11]. Bolte J, Sabach S, and Teboulle M, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Math. Program, vol. 146, pp. 459–494, 2014.
  • [12]. Nesterov Y, Introductory Lectures on Convex Optimization: A Basic Course. Boston: Springer, 2004.
  • [13]. Liu Y and Haldar JP, “NAPALM: An algorithm for MRI reconstruction with separate magnitude and phase regularization,” in Proc. Int. Soc. Magn. Reson. Med, 2019, p. 4764.
  • [14]. Chun IY and Fessler JA, “Convolutional dictionary learning: Acceleration and convergence,” IEEE Trans. Image Process, vol. 27, pp. 1697–1712, 2018.
  • [15]. Chun IY and Fessler JA, “Convolutional analysis operator learning: Acceleration and convergence,” IEEE Trans. Image Process, vol. 29, pp. 2108–2122, 2020.
  • [16]. Wright SJ, “Coordinate descent algorithms,” Math. Program, vol. 151, pp. 3–34, 2015.
  • [17]. Hunter DR and Lange K, “A tutorial on MM algorithms,” Am. Stat, vol. 58, pp. 30–37, 2004.
  • [18]. Parikh N and Boyd S, “Proximal algorithms,” Found. Trends Optim, vol. 1, pp. 123–231, 2013.
  • [19]. Beck A, First-Order Methods in Optimization. Philadelphia: SIAM, 2017.
  • [20]. Liang Z-P, “A model-based method for phase unwrapping,” IEEE Trans. Med. Imaging, vol. 15, pp. 893–897, 1996.
  • [21]. Afonso MV, Bioucas-Dias JM, and Figueiredo MAT, “Fast image recovery using variable splitting and constrained optimization,” IEEE Trans. Image Process, vol. 19, pp. 2345–2356, 2010.
  • [22]. Combettes PL and Pesquet J-C, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Bauschke HH, Burachik RS, Combettes PL, Elser V, Luke DR, and Wolkowicz H, Eds. New York, NY: Springer New York, 2011, pp. 185–212.
  • [23]. Huber PJ, Robust Statistics. New York: John Wiley & Sons, Inc., 1981.
  • [24]. Nikolova M and Ng MK, “Analysis of half-quadratic minimization methods for signal and image recovery,” SIAM J. Sci. Comput, vol. 27, pp. 937–966, 2005.
  • [25]. Black MJ and Rangarajan A, “On the unification of line processes, outlier rejection, and robust statistics with applications in early vision,” Int. J. Comput. Vis, vol. 19, pp. 57–91, 1996.
  • [26]. Haldar JP, Wedeen VJ, Nezamzadeh M, Dai G, Weiner MW, Schuff N, and Liang Z-P, “Improved diffusion imaging through SNR-enhancing joint reconstruction,” Magn. Reson. Med, vol. 69, pp. 277–289, 2013.
  • [27]. Haldar JP, “Constrained imaging: Denoising and sparse sampling,” Ph.D. dissertation.
  • [28]. Rudin LI, Osher S, and Fatemi E, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, pp. 259–268, 1992.
  • [29]. Tropp JA, “Algorithms for simultaneous sparse approximation. Part II: Convex relaxation.” Signal Proc., vol. 86, pp. 589–602, 2006.
  • [30]. Haldar JP and Liang Z-P, “Joint reconstruction of noisy high-resolution MR image sequences,” in Proc. IEEE Int. Symp. Biomed. Imaging, 2008, pp. 752–755.
  • [31]. Liang Z-P, Boada FE, Constable RT, Haacke EM, Lauterbur PC, and Smith MR, “Constrained reconstruction methods in MR imaging,” Rev. Magn. Reson. Med, vol. 4, pp. 67–185, 1992.
  • [32]. Haldar JP and Setsompop K, “Linear predictability in magnetic resonance imaging reconstruction: Leveraging shift-invariant fourier structure for faster and better imaging,” IEEE Signal Process. Mag, vol. 37, pp. 69–82, 2020.
  • [33]. Becker S, Bobin J, and Candès EJ, “NESTA: A fast and accurate first-order method for sparse recovery,” SIAM J. Imaging Sci, vol. 4, pp. 1–39, 2011.
  • [34]. Weiss P, Blanc-Feraud L, and Aubert G, “Efficient schemes for total variation minimization under constraints in image processing,” SIAM J. Sci. Comput, vol. 31, pp. 2047–2080, 2009.
  • [35]. Nesterov Y, “Smooth minimization of non-smooth functions,” Math. Program, vol. 103, pp. 127–152, 2005.
  • [36]. Golub GH and Van Loan CF, Matrix Computations, 4th ed. Baltimore: The Johns Hopkins University Press, 2013.
  • [37]. Beck A and Teboulle M, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci, vol. 2, pp. 183–202, 2009.
  • [38]. Pock T and Sabach S, “Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems,” SIAM J. Imaging Sci, vol. 9, no. 4, pp. 1756–1787, 2016.
  • [39]. Su W, Boyd S, and Candès E, “A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights,” in Proc. NeurIPS, 2014, pp. 2510–2518.
  • [40]. Bernstein MA, Thomasson DM, and Perman WH, “Improved detectability in low signal-to-noise ratio magnetic resonance images by means of a phase-corrected real reconstruction,” Med. Phys, vol. 16, pp. 813–817, 1989.
  • [41]. McKinnon GC, Zhou XJ, and Leeds NE, “Phase corrected complex averaging for diffusion weighted spine imaging,” in Proc. Int. Soc. Magn. Reson. Med, 2000, p. 802.
  • [42]. Liu C, Bammer R, Kim D. h., and Moseley ME, “Self-navigated interleaved spiral (SNAILS): Application to high-resolution diffusion tensor imaging,” Magn. Reson. Med, vol. 52, pp. 1388–1396, 2004.
  • [43]. Chen N-K, Guidon A, Chang H-C, and Song AW, “A robust multi-shot scan strategy for high-resolution diffusion weighted MRI enabled by multiplexed sensitivity-encoding (MUSE),” NeuroImage, vol. 72, pp. 41–47, 2013.
  • [44]. Eichner C, Cauley SF, Cohen-Adad J, Moller HE, Turner R, Setsompop K, and Wald LL, “Real diffusion-weighted MRI enabling true signal averaging and increased diffusion contrast,” NeuroImage, vol. 122, pp. 373–384, 2015.
  • [45]. Press WH, Teukolsky SA, Vetterling WT, and Flannery BP, Numerical Recipes in C. Cambridge: Cambridge University Press, 1992.
  • [46]. Pruessmann KP, Weiger M, Bornert P, and Boesiger P, “Advances in sensitivity encoding with arbitrary k-space trajectories,” Magn. Reson. Med, vol. 46, pp. 638–651, 2001.
  • [47]. Uecker M, Lai P, Murphy MJ, Virtue P, Elad M, Pauly JM, Vasanawala SS, and Lustig M, “ESPIRiT – an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA,” Magn. Reson. Med, vol. 71, pp. 990–1001, 2014.
  • [48]. Wang Z and Bovik AC, “Mean squared error: Love it or leave it? a new look at signal fidelity measures,” IEEE Signal Process. Mag, vol. 26, pp. 98–117, 2009.
  • [49]. Bertsekas DP, Nonlinear Programming, 2nd ed. Athena Scientific, 1999.
