Abstract
In this paper, we focus on the application of the Peaceman–Rachford splitting method (PRSM) to a convex minimization model with linear constraints and a separable objective function. Compared to the Douglas–Rachford splitting method (DRSM), another splitting method from which the alternating direction method of multipliers originates, PRSM requires more restrictive assumptions to ensure its convergence, while it is typically faster whenever it is convergent. We first illustrate that the reason for this difference is that the iterative sequence generated by DRSM is strictly contractive with respect to the solution set of the model, while that generated by PRSM is only contractive. With only the convexity assumption on the objective function of the model under consideration, the convergence of PRSM is not guaranteed. But for this case, we show that the first t iterations of PRSM still enable us to find an approximate solution with an accuracy of O(1/t). A worst-case O(1/t) convergence rate of PRSM in the ergodic sense is thus established under mild assumptions. After that, we suggest attaching an underdetermined relaxation factor to PRSM to guarantee the strict contraction of its iterative sequence and thus propose a strictly contractive PRSM. A worst-case O(1/t) convergence rate of this strictly contractive PRSM in a nonergodic sense is established. We show the numerical efficiency of the strictly contractive PRSM by some applications in statistical learning and image processing.
Keywords: convex programming, Peaceman–Rachford splitting method, convergence rate, contraction
1. Introduction
We consider the following convex minimization model with linear constraints and a separable objective function:
min { θ1(x) + θ2(y) | Ax + By = b, x ∈ 𝒳, y ∈ 𝒴 },    (1.1)
where A ∈ ℜm×n1, B ∈ ℜm×n2, b ∈ ℜm, 𝒳 ⊂ ℜn1 and 𝒴 ⊂ ℜn2 are closed convex sets, and θ1 : 𝒳 → ℜ and θ2 : 𝒴 → ℜ are convex functions. Note that both θ1 and θ2 could be nonsmooth functions. Throughout, the solution set of (1.1) (denoted by S*) is assumed to be nonempty.
When A = Im, B = −Im, b = 0, 𝒳 = ℜm, and 𝒴 = ℜm, the model (1.1) reduces to

min { θ1(u) + θ2(u) | u ∈ ℜm },    (1.2)
where θ1 and θ2 can be interpreted, respectively, as data-fidelity and regularization terms for ill-posed inverse problems arising widely in statistical learning and image processing. Because of the separable structure of its objective function, the majority of efficient solvers for (1.1) in the literature belong to the category of splitting methods, which exploit the properties of the functions θ1 and θ2 individually in the algorithmic design. The resulting subproblems are usually simple enough to have closed-form solutions or can be solved easily to high precision; therefore, such an algorithm is easy to implement and its convergence is usually fast. A benchmark is the alternating direction method of multipliers (ADMM) proposed in [24] (see also [9, 21]), which has recently received tremendous attention in a number of areas (see, e.g., [6, 17, 22] for review papers). As analyzed in [20], ADMM is just an application of the Douglas–Rachford splitting method (DRSM) in [15, 39] to the dual problem of (1.1), and its iterative scheme reads as
xk+1 = argmin{ θ1(x) − (λk)T(Ax + Byk − b) + (β/2)||Ax + Byk − b||² | x ∈ 𝒳 },
yk+1 = argmin{ θ2(y) − (λk)T(Axk+1 + By − b) + (β/2)||Axk+1 + By − b||² | y ∈ 𝒴 },
λk+1 = λk − β(Axk+1 + Byk+1 − b),    (1.3)
where λ ∈ ℜm is the Lagrange multiplier associated with the linear constraints in (1.1) and β > 0 is a penalty parameter.
In this paper, we focus on the application of the Peaceman–Rachford splitting method (PRSM) in [39, 46] to (1.1). As elaborated on in [20], applying PRSM to the dual of (1.1), we obtain the iterative scheme of PRSM for (1.1),
xk+1 = argmin{ θ1(x) − (λk)T(Ax + Byk − b) + (β/2)||Ax + Byk − b||² | x ∈ 𝒳 },
λk+1/2 = λk − β(Axk+1 + Byk − b),
yk+1 = argmin{ θ2(y) − (λk+1/2)T(Axk+1 + By − b) + (β/2)||Axk+1 + By − b||² | y ∈ 𝒴 },
λk+1 = λk+1/2 − β(Axk+1 + Byk+1 − b),    (1.4)
where λ ∈ ℜm and β have the same meaning as in (1.3). As analyzed in [20], the PRSM scheme (1.4) differs from ADMM “only through the addition of the intermediate update of the multipliers (i.e., λk+1/2); it thus offers the same set of advantages.” The PRSM scheme (1.4), however, according to [20] again (see also [23]), “is less ‘robust’ in that it converges under more restrictive assumptions than ADMM.” Also, it was remarked in [20] that the PRSM scheme (1.4) with optimal parameters converges at a linear rate if Lipschitz continuity and coercivity of θ2* (where θ2* denotes the conjugate function of θ2) are assumed. We refer the reader to [3, 25] for some numerical verification of the efficiency of PRSM.
We first show that the difference between DRSM and PRSM in convergence can be explained by the contraction properties of their iterative sequences.1 More specifically, the iterative sequence generated by DRSM is strictly contractive with respect to the solution set of (1.1), as proved in [29], while this does not hold for the iterative sequence generated by PRSM (see (3.20)). This difference is also the reason that a worst-case O(1/t) convergence rate of ADMM for (1.1) in a nonergodic sense can be established in [32], while we can only establish the same convergence rate in the ergodic sense for the PRSM scheme (1.4);2 see Theorem 4.1. Also inspired by the failure of strict contraction of PRSM’s iterative sequence, we find that when an underdetermined relaxation factor α ∈ (0, 1) is attached to the penalty parameter β in the Lagrange-multiplier updating steps of (1.4), the resulting sequence becomes strictly contractive with respect to the solution set of (1.1). This strict contraction property makes it possible to establish a worst-case O(1/t) convergence rate in a nonergodic sense for the PRSM (1.4) with an underdetermined relaxation factor, which we henceforth call the strictly contractive PRSM. Let us specify the iterative scheme of the strictly contractive PRSM for (1.1):
xk+1 = argmin{ θ1(x) − (λk)T(Ax + Byk − b) + (β/2)||Ax + Byk − b||² | x ∈ 𝒳 },
λk+1/2 = λk − αβ(Axk+1 + Byk − b),
yk+1 = argmin{ θ2(y) − (λk+1/2)T(Axk+1 + By − b) + (β/2)||Axk+1 + By − b||² | y ∈ 𝒴 },
λk+1 = λk+1/2 − αβ(Axk+1 + Byk+1 − b),    (1.5)
where α ∈ (0, 1). Note that we follow the standard terminology in numerical linear algebra and call α ∈ (0, 1) an underdetermined relaxation factor; see also [18, 26]. As we shall show, this additional relaxation factor in the PRSM scheme (1.5) ensures that the sequence generated by (1.5) is strictly contractive with respect to the solution set of (1.1). Thus we can establish worst-case convergence rates for (1.5) without any further assumption on the model (1.1). Numerically, we can simply choose α close to 1.
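To fix ideas, the following sketch shows one way to implement the scheme (1.5) when the two minimization subproblems are available as black-box solvers. It is only an illustration under the sign conventions used above; the callables `argmin_x` and `argmin_y`, the default parameter values, and all variable names are our own choices, not part of the original code.

```python
import numpy as np

def strictly_contractive_prsm(argmin_x, argmin_y, A, B, b, y0, lam0,
                              beta=1.0, alpha=0.9, max_iter=500):
    """Sketch of scheme (1.5).

    argmin_x(y, lam) should return a minimizer of
        theta1(x) - lam'(Ax + By - b) + (beta/2)*||Ax + By - b||^2  over x in X;
    argmin_y(x, lam) is the analogous y-subproblem over Y.
    alpha in (0, 1) is the underdetermined relaxation factor; alpha = 1
    recovers the original PRSM (1.4).
    """
    y, lam = y0.copy(), lam0.copy()
    x = None
    for _ in range(max_iter):
        x = argmin_x(y, lam)                              # x-subproblem
        lam = lam - alpha * beta * (A @ x + B @ y - b)    # intermediate multiplier update
        y = argmin_y(x, lam)                              # y-subproblem
        lam = lam - alpha * beta * (A @ x + B @ y - b)    # final multiplier update
    return x, y, lam
```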
The rest of this paper is organized as follows. In section 2, we summarize some useful preliminary results and prove some simple assertions for further analysis. Then, we prove some properties for the sequence generated by the strictly contractive PRSM (1.5) in section 3. In section 4, we establish a worst-case O(1/t) convergence rate in the ergodic sense for the PRSM scheme (1.4); and in section 5, we establish a worst-case O(1/t) convergence rate in a nonergodic sense for the strictly contractive PRSM scheme (1.5). Then, we show the numerical efficiency of the strictly contractive PRSM in section 6 by some applications to statistical learning and image processing. Some comparisons with existing efficient methods are also reported. Finally, some conclusions are drawn in section 7.
2. Preliminaries
In this section, we summarize some useful preliminaries known in the literature and prove some simple conclusions for further analysis.
2.1. Variational reformulation of (1.1)
First, as in the works [31, 32] analyzing the convergence rate of ADMM, we need a variational inequality (VI) reformulation of the model (1.1) and a characterization of its solution set. More specifically, solving (1.1) is equivalent to finding w* = (x*, y*, λ*) ∈ Ω := 𝒳 × 𝒴 × ℜm such that

θ(u) − θ(u*) + (w − w*)T F(w*) ≥ 0  ∀ w ∈ Ω,    (2.1a)

where

u = (x, y),  θ(u) = θ1(x) + θ2(y),  w = (x, y, λ),  and  F(w) = (−ATλ, −BTλ, Ax + By − b).    (2.1b)
Since the mapping F(w) defined in (2.1b) is affine with a skew-symmetric matrix, it is monotone. We denote by Ω* the solution set of VI(Ω, F, θ); it is nonempty under our assumption that S* is nonempty.
According to Theorem 2.3.5 in [19], a very useful characterization of Ω* can be summarized in the following theorem. Its proof can be found in [19, 31].
Theorem 2.1
The solution set of VI(Ω, F, θ) is closed and convex, and it can be characterized as

Ω* = ⋂w∈Ω { w̃ ∈ Ω | θ(u) − θ(ũ) + (w − w̃)T F(w) ≥ 0 }.    (2.2)
Theorem 2.1 thus implies that w̃ ∈ Ω is an approximate solution of VI(Ω, F, θ) with an accuracy of O(1/t) if it satisfies

supw∈𝒟 { θ(ũ) − θ(u) + (w̃ − w)T F(w) } ≤ ε    (2.3)

with ε = O(1/t) and 𝒟 ⊂ Ω a compact set.3 In fact, this characterization makes it possible to analyze the convergence rate of ADMM and other splitting methods via the VI approach rather than via the conventional approach based on functional values. In the following, we shall show that the sequence generated by either (1.4) or (1.5) enables us to find an approximate solution of (1.1) in the sense of (2.3) after t iterations.
2.2. Some notation
As mentioned in [6] for ADMM, the variable x is an intermediate variable during the PRSM iteration since only (yk, λk) is essentially required in (1.4) or (1.5) to generate the (k + 1)th iterate. For this reason, we define the notation vk = (yk, λk), 𝒱 := 𝒴 × ℜm, and 𝒱* := {v* = (y*, λ*) | (x*, y*, λ*) ∈ Ω*}; it suffices to analyze the convergence rate of the sequence {vk} to the set 𝒱* in order to study the convergence rate of the sequence {wk} generated by (1.4) or (1.5). Note that 𝒱* is also closed and convex.
Then, we define some matrices in order to present our analysis in a compact way. Let
| (2.4) |
and
| (2.5) |
In (2.5), “0” denotes a zero matrix of appropriate dimensions. We further define
| (2.6) |
as the submatrix of Q0 excluding all the first zero rows. The matrices Q0 and Q are associated with the analysis for the sequences {wk} and {vk}, respectively. Last, for α ∈ (0, 1] we define a symmetric matrix
| (2.7) |
Below we prove some assertions regarding the matrices just defined. These assertions will be used in our theoretical analysis about the convergence rate of (1.4) and (1.5); their role is to make our proof presentable in compact notation.
Lemma 2.2
If B has full column rank, the matrix H defined in (2.7) is positive definite for α ∈ (0, 1) and positive semidefinite for α = 1.
Proof
We have
Note that the matrix
is positive definite if α ∈ (0, 1) and positive semidefinite if α = 1. The assertion of this lemma is thus proved.
Lemma 2.3
The matrices M, Q, and H defined, respectively, in (2.4), (2.6), and (2.7) have the following relationships:
| (2.8) |
and
| (2.9) |
Proof
Using the definitions of the matrices M, Q, and H, by a simple manipulation, we obtain
The first assertion is proved. Consequently, we get
Using (2.6) and the above equation, we have
| (2.10) |
Note that
| (2.11) |
Because
the right-hand side of (2.11) is positive semidefinite. Thus, it follows that
| (2.12) |
Substituting (2.12) into (2.10), we obtain (2.9), and the lemma is proved.
Remark 2.1
When α = 1, the matrices H defined in (2.7) and QT + Q − MT HM are both positive semidefinite. However, in the following analysis we still use ||v − ṽ||H and ||v − ṽ||(QT+Q−MTHM) to denote, respectively, √((v − ṽ)T H(v − ṽ)) and √((v − ṽ)T(QT + Q − MT HM)(v − ṽ)) for v, ṽ ∈ 𝒴 × ℜm. This slight abuse of notation will greatly simplify the presentation of our analysis.
3. Contraction analysis
In this section, we analyze the contraction property of the sequence {vk} generated by the PRSM scheme (1.4) or the strictly contractive PRSM scheme (1.5) with respect to the set 𝒱*. The convergence rate analysis for (1.4) and (1.5) to be presented is based on this contraction property. Since (1.4) is included in the strictly contractive PRSM scheme (1.5) when we allow α = 1, and the convergence analysis of the two schemes follows the same framework, below we present only the contraction analysis for (1.5); the analysis for (1.4) is readily obtained by taking α = 1 throughout.
First, to further simplify the notation in our analysis, we need to define an auxiliary sequence {w̃k} as
| (3.1) |
where (xk+1, yk+1) is generated by (1.4) or (1.5). Note that with the notation of w̃k, we immediately have
| (3.2) |
The strictly contractive PRSM (1.5) can be written as
| (3.3) |
Then, based on (1.5) and (3.2), we immediately get
| (3.4) |
Furthermore, together with yk+1 = ỹk, we have the relationship
which can be rewritten in a compact form by using the notation of vk and ṽk:
| (3.5) |
where M is as defined in (2.4).
Now, we start to prove some properties of the sequence {w̃k} defined in (3.1). Recall that our primary purpose is to analyze the convergence rate of the sequences generated by (1.4) and (1.5) based on the solution characterization (2.2), and that the accuracy of an approximate solution w̃ ∈ Ω is measured by an upper bound on the quantity θ(ũ) − θ(u) + (w̃ − w)T F(w) over all w ∈ Ω (see (2.3)). Hence, we are interested in estimating how close the point w̃k defined in (3.1) is to a solution point of VI(Ω, F, θ). The main result is proved in Theorem 3.4, but before that we first show some lemmas. The first lemma presents an upper bound of θ(ũ) − θ(u) + (w̃ − w)T F(w) for all w ∈ Ω in terms of a quadratic term involving the matrix Q.
Lemma 3.1
For given vk ∈ 𝒴 × ℜm, let wk+1 be generated by the strictly contractive PRSM scheme (1.5), and let w̃k be as defined in (3.1). Then, we have w̃k ∈ Ω and
| (3.6) |
where the matrix Q is as defined in (2.6).
Proof
Since xk+1 = x̃k, by deriving the first-order optimality condition of the x-minimization problem in (3.3), we have
| (3.7) |
According to the definition (3.1), we have
| (3.8) |
Using (3.8), the inequality (3.7) can be written as
| (3.9) |
Similarly, by deriving the first-order optimality condition of the y-minimization problem in (3.3), we get
| (3.10) |
Again, using (3.8), we have
Consequently, it follows from (3.10) that
| (3.11) |
In addition, based on (3.1) we have
| (3.12) |
Combining (3.9), (3.11), and (3.12), we get w̃k = (x̃k, ỹk, λ̃k) ∈ Ω; and for any w = (x, y, λ) ∈ Ω, it holds that
The assertion (3.6) is only a compact form of the above inequality by using the notation of Q in (2.6), w and F in (2.1b), and v. The proof is complete.
Based on the optimality condition (2.1) and Lemma 3.1, we can prove the following lemma, which makes it possible to measure the accuracy of w̃k to a solution point in Ω* by the quantity ||vk − vk+1||H. This is also an important assertion for establishing a nonergodic convergence rate for the proposed strictly contractive PRSM in section 5.
Lemma 3.2
Let {wk} be generated by the strictly contractive PRSM scheme (1.5), and let {w̃k} be as defined in (3.1); let M, Q, and H be as defined in (2.4), (2.6), and (2.7), respectively. Then, w̃k is a solution of VI(Ω, F, θ) if ||vk − vk+1||H = 0.
Proof
By using Q = HM and M(vk − ṽk) = vk − vk+1 (see (2.8) and (3.5)), it follows that
| (3.13) |
Substituting this into (3.6), we get
| (3.14) |
Note that w̃k ∈ Ω. Since H is positive semidefinite, in the case ||vk − vk+1||H = 0 we have H(vk − vk+1) = 0, and thus
According to (2.1), w̃k is a solution of VI(Ω, F, θ).
In the next lemma, we aim at further bounding the term (v − ṽk)TQ(vk − ṽk) found in Lemma 3.1 by the difference of two quadratic terms involving two consecutive iterates of the sequence {vk} and a quadratic term involving vk and the auxiliary iterate ṽk. This refined bound is convenient for the manipulation over the whole sequence {vk} recursively and thus for establishing the convergence rate of {vk} in either the ergodic or a nonergodic sense.
Lemma 3.3
Let {wk} be generated by the strictly contractive PRSM scheme (1.5), and let {w̃k} be as defined in (3.1); let M, Q, and H be as defined in (2.4), (2.6), and (2.7), respectively. Then we have
| (3.15) |
Proof
For vectors a, b, c, d in the same space and a matrix H of appropriate dimensions, we have the identity

(a − b)T H(c − d) = (1/2)(||a − d||H² − ||a − c||H²) + (1/2)(||c − b||H² − ||d − b||H²).

In this identity, we take a = v, b = ṽk, c = vk, and d = vk+1 and substitute into the right-hand side of (3.13). The resulting equation is
| (3.16) |
Now, we deal with the last term of the right-hand side of (3.16). By using (3.5) and (2.8), we get
Substituting into (3.16), we obtain the assertion (3.15). The proof is complete.
Now we are ready to present an inequality where an upper bound of θ(ũk) − θ(u)+ (w̃k − w)T F(w) is found for all w ∈ Ω. This inequality is also crucial for analyzing the contraction property and the convergence rate for the iterative sequence generated by either (1.4) or (1.5).
Theorem 3.4
For given vk ∈ 𝒴 × ℜm, let wk+1 be generated by the strictly contractive PRSM scheme (1.5), and let w̃k be as defined in (3.1); let M and H be as defined in (2.4) and (2.7), respectively. Then, we have w̃k ∈ Ω and
| (3.17) |
Proof
First, because of the monotonicity of F(w), we have
Then, using the above inequality and replacing the right-hand side term in (3.6) with the inequality (3.15), we obtain the assertion (3.17). The proof is complete.
The assertion (3.17) also enables us to study the contraction property of the sequence {vk} generated by (1.4) or (1.5). In fact, setting w = w* in (3.17) where w* is an arbitrary solution point in Ω*, we get
| (3.18) |
Recall the optimality in (2.1). We thus have
| (3.19) |
Therefore, when α = 1, i.e., for the PRSM scheme (1.4), we have
| (3.20) |
which means that the sequence {vk} generated by (1.4) is contractive, but not strictly contractive, with respect to the set 𝒱*. In fact, it is possible that the sequence {vk} stays away from the solution set at a constant distance (i.e., equality holds in (3.20) for all k); hence no convergence of (1.4) is guaranteed under our assumptions on (1.1). Such an example was shown in [12] and [16]. On the other hand, when α ∈ (0, 1), the inequality (3.19) ensures a strict reduction of the proximity to the set 𝒱* (measured by || · − v*||H²) at the (k + 1)th iteration; i.e., the strict contraction of {vk} is guaranteed for the sequence generated by (1.5). Recall Lemma 3.2, which indicates that ||vk − vk+1||H > 0 whenever a solution is not yet found. Thus, the inequality (3.19) implies that the sequence {vk} generated by the proposed strictly contractive PRSM (1.5) converges to 𝒱* with a guaranteed reduction of proximity to the solution set. As we have mentioned, this difference in contraction between (3.19) and (3.20) is also the reason we can establish a nonergodic convergence rate for the strictly contractive PRSM (1.5) in section 5, while only an ergodic convergence rate can be established for the original PRSM (1.4) in section 4.
4. Convergence rate of (1.4) in the ergodic sense
In this section, we show that although the original PRSM (1.4) might not be convergent to a solution point of the model (1.1), it is still possible to find an approximate solution of VI(Ω, F, θ) with an accuracy of O(1/t) based on the first t iterations of the PRSM scheme (1.4). This estimate helps us better understand the convergence property of the original PRSM (1.4).
Theorem 4.1
Let {wk} be generated by PRSM (1.4) and {w̃k} be defined by (3.1). Let w̃t be defined as
w̃t = (1/(t + 1)) Σk=0..t w̃k.    (4.1)
Then, for any integer number t > 0, w̃t ∈ Ω and
θ(ũt) − θ(u) + (w̃t − w)T F(w) ≤ ||v − v0||H² / (2(t + 1))  ∀ w ∈ Ω,    (4.2)
where H is as defined in (2.7).
Proof
First, because of (3.1), it holds that w̃k ∈ Ω for all k ≥ 0. Together with the convexity of 𝒳 and 𝒴, (4.1) implies that w̃t ∈ Ω. Second, by taking α = 1 in (3.17) we have
| (4.3) |
Summing the inequality (4.3) over k = 0, 1, …, t, we obtain
Using the notation of w̃t, it can be written as
| (4.4) |
Since θ(u) is convex and
we have that
Substituting this into (4.4), the assertion of this theorem follows directly.
Let v0 = (y0, λ0) be the initial iterate. For a given compact set 𝒟 ⊂ 𝒴 × ℜm, let d = sup{||v − v0||H | v ∈ 𝒟}. Then, after t iterations of the PRSM (1.4), the point w̃t ∈ Ω defined in (4.1) satisfies

supw∈𝒟 { θ(ũt) − θ(u) + (w̃t − w)T F(w) } ≤ d² / (2(t + 1)) = O(1/t),

which means w̃t is an approximate solution of VI(Ω, F, θ) with an accuracy of O(1/t) (recall (2.3)).
Remark 4.1
In the proof of Theorem 4.1, we take α = 1 in (4.3). Obviously, the proof is still valid if we take α ∈ (0, 1). Thus, a worst-case O(1/t) convergence rate in the ergodic sense can be established easily for the strictly contractive PRSM (1.5). As we shall show in section 5, this is less interesting because a nonergodic worst-case O(1/t) convergence rate can be established for (1.5). We thus omit the details.
5. Convergence rate of (1.5) in a nonergodic sense
In this section, we show that the sequence {vk} generated by the strictly contractive PRSM scheme (1.5) is convergent to a point in 𝒱*, and that its worst-case convergence rate is O(1/t) in a nonergodic sense. Our starting point for the analysis is the inequality (3.19), and a crucial property is the monotonicity of the sequence {||vk − vk+1||H}. That is, we will prove that

||vk+1 − vk+2||H ≤ ||vk − vk+1||H  ∀ k ≥ 0.

We first take a closer look at the assertion (3.6) in Lemma 3.1.
Lemma 5.1
Let {wk} be the sequence generated by the strictly contractive PRSM (1.5), let w̃k be as defined in (3.1), and let the matrix Q be as defined in (2.6). Then, we have
| (5.1) |
Proof
Setting w = w̃k+1 in (3.6), we have
| (5.2) |
Note that (3.6) is also true for k := k + 1, and thus
Setting w = w̃k in the above inequality, we obtain
| (5.3) |
Adding (5.2) and (5.3) and using the monotonicity of F, we get (5.1) immediately.
Lemma 5.2
Let {wk} be the sequence generated by the strictly contractive PRSM (1.5); let the matrices M, Q, and H be as defined in (2.4), (2.6), and (2.7), respectively. Then, we have
| (5.4) |
Proof
Adding the equation
to both sides of (5.1), we get
| (5.5) |
By using Q = HM and M(vk − ṽk) = vk − vk+1 (see (2.8) and (3.5)) in the left-hand side of (5.5), we obtain
| (5.6) |
Due to (2.9) we have
Substituting this into the right-hand side of (5.6) and using M(vk − ṽk) = vk − vk+1 again, we obtain (5.4), and the lemma is proved.
Now, we are ready to prove the monotonicity of the sequence {||vk − vk+1||H}.
Theorem 5.3
Let {wk} be the sequence generated by the strictly contractive PRSM (1.5), let w̃k be as defined in (3.1), and let the matrix H be as defined in (2.7). Then, we have
| (5.7) |
Proof
Setting a = (vk − vk+1) and b = (vk+1 − vk+2) in the identity
we obtain
Inserting (5.4) into the first term of the right-hand side of the last equality, we obtain
The assertion (5.7) follows from the above inequality directly, and the proof is complete.
Now, we can establish a worst-case O(1/t) convergence rate in a nonergodic sense for the strictly contractive PRSM scheme (1.5).
Theorem 5.4
Let {wt} be the sequence generated by the strictly contractive PRSM scheme (1.5). For any v* ∈ 𝒱*, we have
| (5.8) |
Proof
First, it follows from (3.19) that
| (5.9) |
According to Theorem 5.3, the sequence {||vk − vk+1||H} is monotonically nonincreasing. Therefore, we have
| (5.10) |
The assertion (5.8) follows from (5.9) and (5.10) immediately. The proof is complete.
Notice that 𝒱* is convex and closed. Let v0 = (y0, λ0) be the initial iterate and d := inf{||v0 − v*||H | v* ∈ 𝒱*}. Then, for any given ε > 0, Theorem 5.4 shows that the strictly contractive PRSM scheme (1.5) needs at most O(d²/ε) iterations to ensure that ||vk − vk+1||H² ≤ ε. It follows from Lemma 3.2 that wk+1 is a solution of VI(Ω, F, θ) if ||vk − vk+1||H = 0. A worst-case O(1/t) convergence rate in a nonergodic sense for the strictly contractive PRSM scheme (1.5) is thus established in Theorem 5.4.
6. Numerical results
In this section, we verify the theoretical assertions analyzed in previous sections by some numerical experiments. We focus on some applications of the abstract model (1.1) in statistical learning and image processing, including the least absolute shrinkage and selection operator (LASSO) model, the group LASSO model, the sparse logistic regression model, the image deblurring problem, the image inpainting problem, and the magnetic resonance imaging (MRI) problem. We shall verify the following assertions.
- The original PRSM (1.4) is indeed fast if it is convergent; the assertions in [20, 23] are thus further backed up.
- For some scenarios of (1.1), the original PRSM (1.4) might fail to converge while the proposed strictly contractive PRSM (1.5) is convergent; the theoretical significance of the underdetermined relaxation factor α is thus verified.
- For the proposed strictly contractive PRSM (1.5), the underdetermined relaxation factor α can be easily determined. In fact, empirically, α ∈ [0.8, 0.9] is preferred for all tested cases.
- The proposed strictly contractive PRSM (1.5) converges quickly for a wide range of applications, and it numerically outperforms some efficient methods in the literature.
We shall use the previously mentioned statistical learning models to illustrate the first three assertions, and the last assertion will be verified by the previously mentioned imaging models. Our code was written in MATLAB 2012a, and all the numerical experiments were conducted on a laptop with a 2.9GHz i7 processor and 8GB of memory.
6.1. Statistical learning problems
In this subsection, we test some popular sparse learning models in the area of statistical learning. As we have mentioned, via these models our purpose is to justify the advantages of the proposed strictly contractive PRSM (1.5) itself. We thus choose the ADMM (1.3) as the only benchmark for numerical comparison in this subsection. Note that the ADMM (1.3) has been shown to be a widely applicable efficient solver for some popular statistical learning problems; see, e.g., [6] for a review.
6.1.1. Models and iterative schemes
We first introduce the sparse learning models to be studied.
-
The LASSO model proposed in [51],
min { (1/2)||Dx − r||² + γ||x||1 | x ∈ ℜd },    (6.1)
where r ∈ ℜn is the response vector, D ∈ ℜn×d is the design matrix, n is the number of data points, d is the number of features, and γ > 0 is a regularization parameter. The LASSO model provides a sparse estimation of x when there are more features than data points (i.e., d > n), and it has been very influential in several areas (e.g., bioinformatics [53, 40, 41], econometrics [7], and climate analysis [10]).
-
The group LASSO model proposed in [55],
min { (1/2)||Dx − r||² + γ Σi=1..N ||xi||2 | x ∈ ℜd },    (6.2)
where x = (x1, x2, …, xN) with xi ∈ ℜdi, N represents the number of disjoint groups into which the variable x is partitioned, and all other settings are the same as in (6.1). The model (6.2) promotes selecting grouped variables (factors), which are common in multifactor analysis-of-variance (ANOVA) problems or in additive models with polynomial or nonparametric components. In these problems, important factors are groups of variables rather than individual derived variables. Group LASSO reduces to LASSO when di = 1, which means each group contains only one variable.
-
The sparse logistic regression model (see, e.g., [35, 45]),
min { Σi=1..n log(1 + exp(−ri(DiTx + x0))) + γ||x||1 | x ∈ ℜd, x0 ∈ ℜ },    (6.3)
where x ∈ ℜd is the coefficient vector, x0 ∈ ℜ is the intercept scalar, Di ∈ ℜd (i = 1, 2, …, n) are training data points, ri ∈ {−1, 1} (i = 1, 2, …, n) are corresponding labels, n is the number of data points, d is the dimension of the data, and γ > 0 is a regularization parameter. After obtaining the coefficients x and x0 in (6.3), we can predict a binomial categorical label rpred for a given input Din ∈ ℜd by rpred = sign(DinTx + x0).
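As a small illustration of this prediction rule (a sketch only; the function and variable names are ours), given a fitted pair (x, x0) one can label a new data point as follows.

```python
import numpy as np

def predict_label(x, x0, D_in):
    """Predict the binomial label of a new data point D_in from the fitted (x, x0)."""
    return 1 if float(D_in @ x) + x0 >= 0 else -1
```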
Now we illustrate how to implement the ADMM (1.3), the PRSM (1.4), and the strictly contractive PRSM (1.5) for solving these statistical learning models. First, for the LASSO model (6.1), by introducing an auxiliary variable y ∈ ℜd, the model (6.1) can be reformulated as
min { (1/2)||Dx − r||² + γ||y||1 | x − y = 0, x ∈ ℜd, y ∈ ℜd },    (6.4)
which is a special case of (1.1) where θ1(x) = (1/2)||Dx − r||², θ2(y) = γ||y||1, n1 = n2 = d, A = Id, B = −Id, b = 0, 𝒳 = ℜd, and 𝒴 = ℜd. Therefore, the iterative scheme of the ADMM (1.3) for (6.4) is
| (6.5) |
Moreover, the iterative schemes of the PRSM (1.4) and the strictly contractive PRSM (1.5) for (6.4) read, respectively, as
| (6.6) |
and
| (6.7) |
In (6.5)–(6.7), Sκ(a) is the soft-thresholding operator defined componentwise as (Sκ(a))i = sign(ai) max{|ai| − κ, 0} for κ > 0 and a ∈ ℜd; see, e.g., [14].
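For concreteness, here is a minimal NumPy sketch of the soft-thresholding operator and of a strictly contractive PRSM loop for the reformulation (6.4). It is an illustration under our own sign conventions (consistent with the scheme (1.5) as displayed above); the function names, default parameters, and the stopping test are our assumptions rather than the authors' MATLAB code.

```python
import numpy as np

def soft_threshold(a, kappa):
    """Componentwise soft-thresholding S_kappa(a)."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def scprsm_lasso(D, r, gamma, beta=1.0, alpha=0.9, max_iter=1000, tol=1e-4):
    """Strictly contractive PRSM sketch for the LASSO reformulation (6.4):
    min (1/2)||Dx - r||^2 + gamma*||y||_1  subject to  x - y = 0."""
    n, d = D.shape
    y = np.zeros(d)
    lam = np.zeros(d)
    # (D'D + beta*I) and D'r are invariant over the iterations: factorize once.
    chol = np.linalg.cholesky(D.T @ D + beta * np.eye(d))
    Dtr = D.T @ r
    for _ in range(max_iter):
        # x-subproblem: (D'D + beta*I) x = D'r + lam + beta*y
        x = np.linalg.solve(chol.T, np.linalg.solve(chol, Dtr + lam + beta * y))
        lam = lam - alpha * beta * (x - y)                 # intermediate multiplier update
        y_old = y
        y = soft_threshold(x - lam / beta, gamma / beta)   # y-subproblem
        lam = lam - alpha * beta * (x - y)                 # final multiplier update
        # stopping test in the spirit of (6.14)
        if max(np.linalg.norm(x - y), beta * np.linalg.norm(y_old - y)) < tol:
            break
    return x, y
```

Setting `alpha = 1` in this sketch gives the plain PRSM iteration, and dropping the intermediate multiplier update gives the ADMM iteration.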
Similarly, by introducing yi = xi for i = 1, 2, …, N, we can reformulate the group LASSO model (6.2) as
min { (1/2)||Dx − r||² + γ Σi=1..N ||yi||2 | xi − yi = 0, i = 1, 2, …, N },    (6.8)
for which the ADMM (1.3), the PRSM (1.4), and the strictly contractive PRSM (1.5) are implementable. The resulting iterative schemes are, respectively, as follows:
| (6.9) |
| (6.10) |
and
| (6.11) |
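Relative to the LASSO case, the only new ingredient in the schemes for (6.8) is that the shrinkage in the y-update acts groupwise: the y-subproblem decouples over blocks, and each block is handled by the standard ℓ2-norm (block) soft-thresholding. The sketch below is our own illustration, assuming unweighted groups; weighted variants simply scale κ by a group-dependent factor.

```python
import numpy as np

def block_soft_threshold(a, kappa):
    """Prox of kappa*||.||_2 applied to one block: shrink the whole block toward zero."""
    norm_a = np.linalg.norm(a)
    if norm_a <= kappa:
        return np.zeros_like(a)
    return (1.0 - kappa / norm_a) * a

def groupwise_prox(v, groups, kappa):
    """Apply block soft-thresholding to each block of v; `groups` is a list of index arrays."""
    out = v.copy()
    for idx in groups:
        out[idx] = block_soft_threshold(v[idx], kappa)
    return out
```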
Finally, the sparse logistic regression model (6.3) can be reformulated as
min { Σi=1..n log(1 + exp(−ri(DiTx + x0))) + γ||y||1 | x − y = 0 },    (6.12)
where y ∈ ℜd is an auxiliary variable. Therefore, the ADMM (1.3), the PRSM (1.4), and the strictly contractive PRSM (1.5) are all applicable to (6.12). For succinctness, we list only the iterative scheme of the strictly contractive PRSM (1.5) in detail:
| (6.13) |
Note that in (6.13) the x-subproblem has no closed-form solution. We implement Newton’s method (coded in MATLAB) to solve this subproblem with a tolerance of 10−5 and a maximum of 10 iterations.
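To illustrate how such a smooth x-subproblem can be handled, here is a minimal Newton sketch in NumPy. It is only a sketch under our own assumptions: the warm start, the undamped Newton step, the tiny ridge, and all names are our choices, and the multiplier term follows the sign convention used in our reformulation with the constraint x − y = 0; it is not the authors' MATLAB implementation.

```python
import numpy as np

def logistic_x_subproblem(D, r, y, lam, beta, x0_init=0.0, tol=1e-5, max_newton=10):
    """Newton sketch for an x-subproblem of the form
        min over (x, x0) of  sum_i log(1 + exp(-r_i (D_i' x + x0)))
                             - lam'(x - y) + (beta/2)*||x - y||^2 .
    """
    n, d = D.shape
    x = y.copy()          # warm start at the current y iterate
    x0 = x0_init
    for _ in range(max_newton):
        z = D @ x + x0
        p = 1.0 / (1.0 + np.exp(np.clip(r * z, -30.0, 30.0)))   # sigma(-r_i z_i)
        w = p * (1.0 - p)                                       # curvature weights
        # gradient with respect to (x, x0)
        g_x = D.T @ (-r * p) - lam + beta * (x - y)
        g_0 = np.sum(-r * p)
        grad = np.concatenate([g_x, [g_0]])
        if np.linalg.norm(grad) < tol:
            break
        # Hessian blocks of the subproblem
        H_xx = D.T @ (w[:, None] * D) + beta * np.eye(d)
        H_x0 = D.T @ w
        H_00 = np.sum(w) + 1e-12          # tiny ridge keeps the system well posed
        H = np.block([[H_xx, H_x0[:, None]],
                      [H_x0[None, :], np.array([[H_00]])]])
        step = np.linalg.solve(H, grad)
        x = x - step[:d]
        x0 = x0 - step[d]
    return x, x0
```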
6.1.2. Implementation details
Now we specify the settings for the sparse learning models to be tested. For the LASSO model (6.1), we set n = 2000 and d = 4000; each entry of D is drawn from 𝒩(0, 1), and then all columns of D are normalized; we generate a random sparse vector in ℜ4000 with 100 nonzero entries from 𝒩(0, 1) as x; the noise vector ε ~ 𝒩(0, 10−3I) and the vector r = Dx + ε; the regularization parameter is set as γ = 0.1||DTr||∞. For the group LASSO model (6.2), we set n = 1500; D is also drawn from 𝒩(0, 1) and normalized; the noise vector ε ~ 𝒩(0, 10−3I) and the vector r = Dx + ε; we generate N = 200 blocks with sizes ni uniformly distributed between 1 and 50, and d = Σi ni. Among all the blocks, 5% have entries drawn from the standard Gaussian distribution, and the other blocks have entries all equal to zero; the regularization parameter γ is set as γ = 0.1 maxi ||DiTr||2, where {D1, D2, …, DN} with Di ∈ ℜn×di is a disjoint partition of D’s columns in correspondence with the partition of {x1, x2, …, xN}. For the sparse logistic regression model (6.3), we set n = 50 and d = 500; each vector Di ∈ ℜ500 has 10 nonzero entries from 𝒩(0, 1); the vectors ri are generated by ri = sign(DiTx + x0 + εi), where εi is the noise drawn from 𝒩(0, 0.1); the vector x ∈ ℜd contains 100 nonzero entries drawn from 𝒩(0, 1); the intercept x0 is also drawn from 𝒩(0, 1); and γ is set according to [34]: γ = 0.1γmax, where γmax is the maximum regularization parameter above which the solution x has all zero entries.
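For reproducibility, the LASSO test data just described can be generated along the following lines (a sketch; the random seed and the helper name are our own choices).

```python
import numpy as np

def make_lasso_data(n=2000, d=4000, nnz=100, seed=0):
    """Synthetic LASSO instance following the description above."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n, d))
    D /= np.linalg.norm(D, axis=0)                  # normalize the columns of D
    x_true = np.zeros(d)
    support = rng.choice(d, size=nnz, replace=False)
    x_true[support] = rng.standard_normal(nnz)      # 100 nonzero entries from N(0, 1)
    eps = np.sqrt(1e-3) * rng.standard_normal(n)    # noise with variance 1e-3
    r = D @ x_true + eps
    gamma = 0.1 * np.linalg.norm(D.T @ r, np.inf)   # gamma = 0.1 * ||D'r||_inf
    return D, r, x_true, gamma
```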
Then, we provide some details on implementing the methods to be tested. Our code was constructed based on the code available at http://www.stanford.edu/~boyd/papers/admm/. Hence, the ADMM (1.3) can be implemented directly with this code package, while the implementation of the PRSM (1.4) and the strictly contractive PRSM (1.5) requires only a slight modification of the Lagrange multiplier updates in this package. It is worth mentioning that in (6.5)–(6.11) we need to compute (DTD + βI)−1 and DTr, which is quite time consuming if n and d are large. However, since these two terms are invariant over the iterations, we need only compute them once before the iterations start. We define the stopping criterion as
max{ ||xk+1 − yk+1||2, β||yk − yk+1||2 } ≤ ε,    (6.14)
where ||xk+1 − yk+1||2 and β||yk − yk+1||2 measure the primal and dual residuals, respectively, and ε > 0 is a tolerance; see, e.g., [6, 30, 56]. Note that because of Lemma 3.2 it is also reasonable to use this stopping criterion for the PRSM (1.4) and the strictly contractive PRSM (1.5). For LASSO (6.1) and group LASSO (6.2), we set ε = 10−4; for the sparse logistic regression model (6.3), we set ε = 10−3 since its x-subproblem is solved approximately at each iteration. The penalty parameter β is set to 1 for all methods, and we set α = 0.9 for the strictly contractive PRSM (1.5) (the reason will be explained later).
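In code, the criterion (6.14) amounts to checking both residuals against the tolerance at every iteration. The sketch below assumes the unscaled residuals for the constraint x − y = 0; the function and variable names are ours.

```python
import numpy as np

def should_stop(x_new, y_new, y_old, beta, eps):
    """Stopping test in the spirit of (6.14) for the constraint x - y = 0."""
    primal_res = np.linalg.norm(x_new - y_new)        # constraint violation
    dual_res = beta * np.linalg.norm(y_old - y_new)   # change in y, scaled by beta
    return max(primal_res, dual_res) <= eps
```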
6.1.3. Results
In Table 1, we report the computing time in seconds (“time(s)”) and the number of iterations when the ADMM (1.3), the PRSM (1.4), and the strictly contractive PRSM (1.5) are applied to solve the above-mentioned statistical learning models. According to this table, we see that the original PRSM (1.4) is convergent only for the group LASSO model (6.2). For this case, the original PRSM (1.4) is indeed faster than the ADMM (1.3), and it is almost as efficient as the strictly contractive PRSM (1.5). The assertions in [20, 23] are thus verified again. For the other two models, however, the convergence of PRSM is not observed. But the proposed strictly contractive PRSM (1.5) still performs well and is faster than ADMM.
Table 1.
Quantitative comparison among the strictly contractive PRSM, the original PRSM, and the ADMM for statistical learning models.
| Algorithm | LASSO time(s) | LASSO # iterations | Group LASSO time(s) | Group LASSO # iterations | Sparse logistic regression time(s) | Sparse logistic regression # iterations |
|---|---|---|---|---|---|---|
| Strictly contractive PRSM | 1.27 | 53 | 0.53 | 21 | 2.14 | 41 |
| ADMM | 2.13 | 89 | 0.96 | 36 | 3.75 | 72 |
| PRSM | - | - | 0.53 | 23 | - | - |
“-” means that the stopping criterion is not satisfied after 10,000 iterations.
To further observe the convergence of the ADMM and the strictly contractive PRSM, in Figures 1–3 we visualize the evolution of convergence when these two methods are applied to solve these three sparse learning models. The evolution of the objective function value, the reduction of the primal and dual residuals, and ||v(k) − v*||H with respect to the iterations are plotted. The plots in Figure 1 show that the strictly contractive PRSM and ADMM reach the primal tolerance at almost the same time, but the former reaches the dual tolerance faster than the latter. The plots in Figure 2 indicate that the strictly contractive PRSM reaches both the primal and dual tolerances faster than ADMM. Last, the plots in Figure 3 reveal that ADMM reaches the primal tolerance faster (but possibly jumps back above the tolerance again), while the strictly contractive PRSM reaches the dual tolerance much faster than ADMM. The lower right plots in Figures 1–3 illustrate that the sequence ||v(k) − v*||H decreases monotonically, which further backs up our theoretical result in (3.19).
Fig. 1.
LASSO model (6.1): Evolution of the objective function value, primal residual, and dual residual, and ||v(k) − v*||H for ADMM and the strictly contractive PRSM.
Fig. 3.
Sparse logistic regression model (6.3): Evolution of the objective function, primal residual, and dual residual, and ||v(k) − v*||H for ADMM and the strictly contractive PRSM.
Fig. 2.
Group LASSO model (6.2): Evolution of the objective function, primal residual, and dual residual, and ||v(k) − v*||H for ADMM and the strictly contractive PRSM.
6.1.4. Sensitivity to α
As we have analyzed, attaching an underdetermined relaxation factor α ∈ (0, 1) to the original PRSM (1.4) can make the resulting iterative sequence strictly contractive with respect to the solution set of (1.1). Thus, it becomes possible to ensure the convergence and establish a worst-case O(1/t) convergence rate in a nonergodic sense for the strictly contractive PRSM (1.5). Despite its significant theoretical role, we would emphasize that this underdetermined relaxation factor can be chosen easily to implement the strictly contractive PRSM (1.5). This is an important convenience for the implementation of the strictly contractive PRSM (1.5).
In this subsection, we take the LASSO model (6.1) to test the sensitivity of the strictly contractive PRSM (1.5) to α. We fix β = 1 and choose different values of α in the interval [0.05, 0.99]. (More specifically, we choose α = {0.05, 0.10, 0.15, …, 0.85, 0.90, 0.91, 0.92, …, 0.98, 0.99}.) The computing time and number of iterations required by the strictly contractive PRSM (1.5) are recorded for each choice of α. Then, we plot them in Figure 4. For comparison purposes, we also plot the corresponding results for ADMM with β = 1. According to the curves in Figure 4, we see that the underdetermined relaxation factor α works for a wide range of values; thus it can be chosen easily in implementation. In particular, based on our experiments, some aggressive values close to 1 (e.g., in [0.8, 0.9]) are preferred.
Fig. 4.

Sensitivity test on the underdetermined relaxation factor α when β = 1.
As is well known in the literature, the numerical performance of augmented-Lagrangian-based methods, including the ADMM (1.3), is highly dependent on the penalty parameter β. Theoretically, some strategies for adjusting this parameter automatically have been proposed; see, e.g., [29]. But for some concrete applications (especially large-scale problems or models with matrix variables), realizing this kind of self-adaptive strategy might require too much computation. Thus, a more popular way to choose this parameter is to tune it manually and then fix it at the tuned value throughout. To the best of our knowledge, it is not clear so far how to determine an optimal value for β; it is highly possible that it is problem-dependent. This difficulty also occurs for the original PRSM (1.4). In this subsection, we test some fixed values of β and empirically verify that when implementing the strictly contractive PRSM (1.5), it is easy to choose α ∈ (0, 1) to accelerate the convergence.
We take the group LASSO model (6.2) to demonstrate the effectiveness of α for a fixed β. In our experiments, we test a set of values β = 0.25, 0.5, 1, 2, 4, 8, 16, 32. For each β, we choose different values of α = {0.05, 0.10, 0.15, …, 0.85, 0.90, 0.91, 0.92, …, 0.98, 0.99} and plot the computing time in seconds and the number of iterations with respect to the different choices of α in Figure 5. According to the plots in Figure 5, it seems that the original PRSM is very sensitive to the value of β. In fact, for some β such as β = 4, 8, 16, 32, the original PRSM (1.4) (i.e., α = 1) fails to satisfy the stopping criterion within 10,000 iterations. This further emphasizes the importance of choosing β when implementing the original PRSM (1.4). At the same time, we see that for each of the tested β, α ∈ (0, 0.9) tends to accelerate the convergence of PRSM.
Fig. 5.
Acceleration of α for the group LASSO model (6.2).
6.2. Image reconstruction models
In this subsection, we test some digital image reconstruction models. Our aim is to further verify the efficiency of the proposed strictly contractive PRSM (1.5) by comparing it numerically with several well-known algorithms in the imaging literature: SALSA [1], TwIST [4], SpaRSA [52], FISTA [2], and YALL1/TVAL3 [36, 37, 38, 54, 57].
6.2.1. Models and iterative schemes
We first briefly review the background of digital image reconstruction problems; for more details we refer the reader to [28, 47]. The image reconstruction problem, a fundamental task in many areas such as medical and astronomical imaging, film restoration, image or video coding, and synthetic aperture radar (see, e.g., [49, 50, 33]), is to reconstruct the original image p ∈ ℜn from its degraded image p0 ∈ ℜn. Note that we vectorize an N × M-pixel image P into an n-dimensional vector p in lexicographical order with n = NM. The relationship between p and p0 is given by
| (6.15) |
where ε ∈ ℜn is a noise corrupting the original image p, and D ∈ ℜn × n is the matrix representation of a distortion operator such as a blurring (convolution), vignetting, inpainting, or zooming operator.
According to [13], we can classify image reconstruction models into two categories: the synthesis approach and the analysis approach. The synthesis approach defines x ∈ ℜd as the vector of wavelet coefficients of the original image p under a wavelet dictionary. Let W ∈ ℜn×d be the matrix of a wavelet dictionary, e.g., a group of orthogonal bases; we then have p = Wx. Since (6.15) is usually ill-posed, certain regularization techniques are required. Note that the image p possesses a sparse representation under the wavelet dictionary W; that is, x is sparse with many zero entries. Therefore, it is natural to use ||x||1, the l1 norm of x, to regularize the data-fidelity term. We thus have

min { (1/2)||DWx − p0||² + γ||x||1 | x ∈ ℜd }    (6.16)

as the synthesis approach to an image reconstruction model. Note that we consider only the case of additive noise; thus the l2 norm is used for the data-fidelity term in (6.16). Other cases, such as impulsive or uniform noise, can also be considered. On the other hand, the analysis approach reconstructs the image directly rather than in a wavelet domain. Let the image be represented by a vector x ∈ ℜn. Under the additive noise assumption in (6.15), the data-fidelity term is (1/2)||Dx − p0||². For the regularization term, a very popular choice is the total variation (TV) regularization proposed in the seminal work [48], which is well known to be capable of preserving the edges of images. We thus have
min { (1/2)||Dx − p0||² + γ TV(x) | x ∈ ℜn }    (6.17)
as the analysis approach to an image reconstruction model. In (6.17), TV(x) denotes the nonsmooth isotropic TV norm [48]:

TV(x) = Σi,j √( (xi+1,j − xi,j)² + (xi,j+1 − xi,j)² ),

where x ∈ ℜn is the vectorized original N × M two-dimensional image in lexicographic order with n = NM and xi,j denotes the pixel value at position (i, j).
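For reference, a small NumPy sketch of the isotropic TV value of a two-dimensional image is given below. The boundary handling (replicating the last row/column so that the boundary differences vanish) and the function name are our own choices; other conventions are possible.

```python
import numpy as np

def isotropic_tv(image):
    """Isotropic total variation of a 2-D image using forward differences;
    the last row/column is replicated so the boundary differences are zero."""
    dx = np.diff(image, axis=0, append=image[-1:, :])   # vertical differences
    dy = np.diff(image, axis=1, append=image[:, -1:])   # horizontal differences
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))
```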
Now we show how to implement the proposed strictly contractive PRSM (1.5) to solve the models (6.16) and (6.17). For (6.16), by introducing an auxiliary variable y ∈ ℜd, it can be reformulated as
min { (1/2)||DWx − p0||² + γ||y||1 | x − y = 0, x, y ∈ ℜd }.    (6.18)
Similarly, the model (6.17) can be reformulated as
min { (1/2)||Dx − p0||² + γ TV(y) | x − y = 0, x, y ∈ ℜn }.    (6.19)
Therefore, when the strictly contractive PRSM (1.5) is applied to solve the models (6.18) and (6.19), the iterative schemes read, respectively, as
| (6.20) |
and
| (6.21) |
Let us explain how to solve the subproblems in (6.20) and (6.21). For example, the x-subproblem might be computationally expensive due to the high dimensionality of x; for instance, for the analysis approach, x ∈ ℜ262144 for a 512 × 512-pixel image. According to [1], the matrices D and W have special structures (e.g., WTW = I); fast solvers such as the fast Fourier transform (FFT) are thus applicable. For the y-subproblem in (6.21), whose closed-form solution does not exist, we adopt the algorithm proposed in [8] to solve it.
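To illustrate the FFT trick for the quadratic x-subproblem arising from the analysis reformulation, the sketch below assumes D is a convolution with periodic boundary conditions so that it is diagonalized by the 2-D FFT; this assumption, the function name, and the variable names are ours, and the actual SALSA-based code may treat the operators and boundaries differently.

```python
import numpy as np

def solve_x_subproblem_fft(kernel_fft, p0, y, lam, beta):
    """Solve (D'D + beta*I) x = D'p0 + lam + beta*y via the 2-D FFT, assuming
    D is a periodic convolution.  kernel_fft is the 2-D FFT of the (padded,
    suitably centered) blur kernel, with the same shape as the image p0."""
    rhs_hat = np.conj(kernel_fft) * np.fft.fft2(p0) + np.fft.fft2(lam + beta * y)
    x_hat = rhs_hat / (np.abs(kernel_fft) ** 2 + beta)
    return np.real(np.fft.ifft2(x_hat))
```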
We will test three scenarios for the models (6.16) and (6.17):
The synthesis-based (6.16) image deblurring model where D is the matrix representation of the blurring operator [27]. Here we use the 9 × 9 uniform convolution kernel with every element being 1/81 as the blurring operator.
The analysis-based (6.17) image inpainting model where D is the matrix representation of the missing pixel operator [27]. Specifically, D is a highly sparse matrix with only ones and zeros in the diagonal. The zeros in the diagonal correspond to the missing pixels.
The analysis-based (6.17) MRI image reconstruction model where D is the matrix representation of the 22-radial-line mask in the frequency domain, which is visualized in Figure 11.
Fig. 11.

The original and masked images of the Shepp–Logan phantom.
6.2.2. Implementation details
Among the methods to be compared, SALSA in [1] and YALL1/TVAL3 in [38, 57] are ADMM-based algorithms (YALL1 is for l1-norm regularized problems and TVAL3 is for TV-norm regularized problems; thus we apply YALL1 to (6.16) and TVAL3 to (6.17)). The proposed strictly contractive PRSM (1.5) thus can easily be coded based on the source code of SALSA, which is publicly available. We followed the user guides [38, 57] to code YALL1/TVAL3 with tuned parameters. Codes for all other methods were downloaded from the respective authors’ web pages. We terminate the iteration of the strictly contractive PRSM when the relative change of “Objectivek” between consecutive iterations falls below a prescribed tolerance, where “Objectivek” represents the objective function value at the kth iterate for the model under consideration. For comparison, the iterations of all other methods are terminated when their objective function values are less than or equal to the objective function value obtained by the strictly contractive PRSM (1.5). In other words, we compare these methods subject to the criterion that they achieve the same objective function value.
We measure the noise of an image by the signal-to-noise ratio in units of dB,

SNR = 10 log10( ||p||² / ||p − p0||² ),

where p and p0 are the clean image and the distorted image, respectively. The error of reconstruction is measured by the mean squared error (MSE)

MSE = (1/n)||p̂ − p||²,

where p̂ is the reconstructed image. We also define the improved signal-to-noise ratio

ISNR = 10 log10( ||p − p0||² / ||p − p̂||² )

as a uniform measurement of the quality of reconstructed images. For different methods, we compare the speed in terms of computing time and number of iterations to achieve the same quality of reconstruction, as measured by MSE or ISNR.
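For completeness, here are small NumPy sketches of the three measurements. The exact normalizations (per-pixel averaging in the MSE, signal energy in the SNR) are our assumptions of the standard definitions, and the function names are ours.

```python
import numpy as np

def mse(p_hat, p):
    """Mean squared error of the reconstruction p_hat against the clean image p."""
    return float(np.mean((p_hat - p) ** 2))

def snr_db(p, p0):
    """SNR (dB) of the distorted image p0 relative to the clean image p."""
    return 10.0 * np.log10(np.sum(p ** 2) / np.sum((p - p0) ** 2))

def isnr_db(p, p0, p_hat):
    """Improved SNR (dB): error of p0 relative to the error of the reconstruction."""
    return 10.0 * np.log10(np.sum((p - p0) ** 2) / np.sum((p - p_hat) ** 2))
```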
For the synthesis image deblurring application, we set γ = 0.0075 and test the 256 × 256-pixel image of Lena. We choose the four-level redundant Haar wavelet frame as W [11]. To generate the convolution operator D, we choose a 9 × 9 uniform blur kernel in which every element equals 1/81 (with zero padding at the boundary). For the corrupted image, its SNR value is 40dB; the Gaussian noise vector ε is thus generated from 𝒩(0, 0.449). We list the clean and blurred images in Figure 10. To implement the strictly contractive PRSM (1.5), we set β = 0.0075 and α = 0.8.
Fig. 10.

The original, blurred, and masked (with 40% missing pixels) images of Lena.
For the analysis image inpainting application, we set γ = 0.15 and test the 256 × 256-pixel image of Lena. The inpainting operator D contains 40% missing pixels, which are chosen randomly. The masked image is shown in Figure 10. For the corrupted image, its SNR value is also 40dB, which means the noise is generated from 𝒩(0, 0.529). To implement the strictly contractive PRSM (1.5), we set β = 0.05 and α = 0.9. To solve the y-subproblem in (6.21) by the method in [8], we allow a maximum of 20 inner iterations.
For the analysis MRI application, we set γ = 0.0001 and test the 128 × 128-pixel image of the Shepp–Logan phantom. The clean image is masked by 22 radial lines on its discrete Fourier transform, and only the frequency components covered by the radial lines are observed. We contaminate these frequency components using circular complex Gaussian noise; i.e., the real and imaginary parts of the noise are independent Gaussians with standard deviation σε. We list the clean and masked images of the Shepp–Logan phantom in Figure 11. The SNR value is 5.42. To implement the strictly contractive PRSM (1.5), we set β = 0.01 and α = 0.9. To solve the y-subproblem in (6.21) by the method in [8], we allow a maximum of 40 inner iterations.
6.2.3. Results
We first list the comparisons of the different methods for the image reconstruction models in Tables 2–4.4 Then, we visualize the evolution of the objective function when these methods are applied to solve these imaging models in Figures 6–8. The evolution of the MSE is also plotted in Figure 9. In Figure 12, we display the clean, corrupted, and reconstructed images by the strictly contractive PRSM (1.5) for the tested scenarios. These tables and figures clearly show the efficiency of the proposed strictly contractive PRSM (1.5).
Table 2.
Quantitative comparison on image deblurring model.
| Algorithm | time(s) | #iterations | MSE | ISNR(dB) |
|---|---|---|---|---|
| Strictly contractive PRSM | 2.60 | 15 | 97.8 | 6.60 |
| SALSA | 6.86 | 41 | 97.7 | 6.60 |
| TwIST | 7.87 | 60 | 97.5 | 6.61 |
| FISTA | 8.18 | 97 | 103 | 6.63 |
| SpaRSA | 9.40 | 86 | 107 | 6.21 |
| YALL1 | 9.38 | 86 | 107 | 6.21 |
Table 4.
Quantitative comparison on MRI model.
| Algorithm | time(s) | #iterations | MSE | ISNR(dB) |
|---|---|---|---|---|
| Strictly contractive PRSM | 19.81 | 207 | 1.03652e-06 | 42.2073 |
| SALSA | 30.46 | 311 | 1.03435e-06 | 42.2164 |
| TwIST | 53.69 | 451 | 1.10616e-06 | 41.9673 |
| FISTA | 25.23 | 460 | 1.04220e-06 | 42.1836 |
| SpaRSA | 97.74 | 1001 | 1.09543e-06 | 41.9250 |
| TVAL3 | 45.36 | 601 | 1.03762e-06 | 42.2027 |
Fig. 6.

Visualization of the objective functions on image deblurring model.
Fig. 8.
Visualization of the objective functions on MRI image reconstruction.
Fig. 9.
Visualization of MSE on MRI image reconstruction.
Fig. 12.

The deblurred and reconstructed images of Lena, and the reconstructed Shepp–Logan phantom image by strictly contractive PRSM. The SNRs for the reconstructed images are 46.6dB, 57.5dB, and 47.6dB, respectively.
7. Conclusions
As a classical operator splitting method, the Peaceman–Rachford splitting method (PRSM) may fail to converge for a convex optimization problem with linear constraints and a separable objective function. This paper illustrates this possible failure by showing that the iterative sequence generated by PRSM is not strictly contractive with respect to the solution set of the model under consideration. This understanding from the contraction perspective inspires us to tackle the deficiency of PRSM by embedding an underdetermined relaxation factor into the iterative scheme of PRSM, and thus to propose a strictly contractive PRSM. The strictly contractive PRSM is as easy to implement as the alternating direction method of multipliers (ADMM), and it is numerically faster. We verify these advantages by some applications in statistical learning and image processing. We also study the convergence rate of the proposed strictly contractive PRSM, establishing worst-case O(1/t) convergence rates in both the ergodic and nonergodic senses.
Fig. 7.
Visualization of the objective functions on image inpainting model.
Table 3.
Quantitative comparison on image inpainting model.
| Algorithm | time(s) | #iterations | MSE | ISNR(dB) |
|---|---|---|---|---|
| Strictly contractive PRSM | 3.86 | 28 | 88.9 | 17.5 |
| SALSA | 6.82 | 46 | 93.2 | 17.3 |
| TwIST | 14.4 | 83 | 89.7 | 17.4 |
| FISTA | 6.76 | 91 | 90.4 | 17.4 |
| TVAL3 | 6.04 | 117 | 90.2 | 17.4 |
Footnotes
For the definition of a contractive sequence we refer the reader to [5].
We follow [42, 43] and many others to measure the worst-case convergence rate in terms of the iteration complexity. That is, a worst-case O(1/t) convergence rate means the accuracy of a solution under certain criteria is of the order O(1/t) after t iterations of an iterative scheme; or, equivalently, it requires at most O(1/ε) iterations to achieve an approximate solution with an accuracy of ε.
As in [44], the compact set 𝒟 can be chosen as 𝒟(w̃) = {w ∈ Ω | ||w − w̃|| ≤ 1}.
In our experiments, the open source code of SpaRSA does not work for the image inpainting model under testing. Table 3 thus does not include any comparison with SpaRSA.
Contributor Information
HE BINGSHENG, Email: hebma@nju.edu.cn.
HAN LIU, Email: hanliu@princeton.edu.
ZHAORAN WANG, Email: zhaoran@princeton.edu.
XIAOMING YUAN, Email: xmyuan@hkbu.edu.hk.
References
- 1. Afonso MV, Bioucas-Dias JM, Figueiredo MAT. Fast image recovery using variable splitting and constrained optimization. IEEE Trans Imaging Process. 2010;9:2245–2256. doi: 10.1109/TIP.2010.2047910.
- 2. Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2:183–202.
- 3. Bertsekas DP. Constrained Optimization and Lagrange Multiplier Methods. Academic Press; New York: 1982.
- 4. Bioucas-Dias JM, Figueiredo MAT. A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans Imaging Process. 2007;2:2992–3004. doi: 10.1109/tip.2007.909319.
- 5. Blum E, Oettli W. Mathematische Optimierung. Grundlagen und Verfahren. Ökonometrie und Unternehmensforschung. Springer-Verlag; Berlin, Heidelberg, New York: 1975.
- 6. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Machine Learning. 2010;3:1–122.
- 7. Caner M. Lasso-type GMM estimator. Econometric Theory. 2009;25:270–290.
- 8. Chambolle A. An algorithm for total variation minimization and applications. J Math Imaging Vis. 2004;1:89–97.
- 9. Chan TF, Glowinski R. Finite Element Approximation and Iterative Solution of a Class of Mildly Non-linear Elliptic Equations. Technical report. Stanford University; Stanford, CA: 1978.
- 10. Chatterjee S, Steinhaeuser K, Banerjee A, Chatterjee S, Ganguly A. Sparse group Lasso: Consistency and climate applications. Proceedings of the SIAM International Conference on Data Mining; Philadelphia: SIAM; 2012. pp. 47–58.
- 11. Chui CK. An Introduction to Wavelets. Academic Press Professional; New York: 1992.
- 12. Corman E, Yuan XM. A Generalized Proximal Point Algorithm and Its Convergence Rate. 2012. manuscript.
- 13. Demoment G. Image reconstruction and restoration: Overview of common estimation structures and problems. IEEE Trans Acoust Speech Signal Process. 1989;37:2024–2036.
- 14. Donoho DL, Tsaig Y. Fast solution of l1-norm minimization problems when the solution may be sparse. IEEE Trans Inform Theory. 2008;54:4789–4812.
- 15. Douglas J, Rachford HH. On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans Amer Math Soc. 1956;82:420–439.
- 16. Eckstein J. Splitting Methods for Monotone Operators with Applications to Parallel Optimization. PhD Dissertation. MIT; Cambridge, MA: 1989.
- 17. Eckstein J. Augmented Lagrangian and Alternating Direction Methods for Convex Optimization: A Tutorial and Some Illustrative Computational Results. 2012. manuscript.
- 18. Eckstein J, Bertsekas DP. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Program. 1992;55:293–318.
- 19. Facchinei F, Pang JS. Finite-Dimensional Variational Inequalities and Complementarity Problems. Vol. I, Springer Ser. Oper. Res. Springer-Verlag; New York: 2003.
- 20. Gabay D. Applications of the method of multipliers to variational inequalities. In: Fortin M, Glowinski R, editors. Augmented Lagrange Methods: Applications to the Solution of Boundary-Valued Problems. North–Holland; Amsterdam: 1983. pp. 299–331.
- 21. Gabay D, Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput Math Appl. 1976;2:17–40.
- 22. Glowinski R. On Alternating Direction Methods of Multipliers: A Historical Perspective. 2012. manuscript.
- 23. Glowinski R, Kärkkäinen T, Majava K. On the convergence of operator-splitting methods. In: Kuznetsov Y, Neittanmaki P, Pironneau O, editors. Proceedings of CIMNE 2003: Numerical Methods for Scientific Computing, Variational Problems and Applications; Barcelona, Spain. 2003. pp. 67–79.
- 24. Glowinski R, Marrocco A. Approximation par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. RAIRO Anal Numér. 1975;9:41–76.
- 25. Glowinski R, Le Tallec P. Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics. SIAM Stud Appl Math. Vol. 9. SIAM; Philadelphia: 1989.
- 26. Gol’shtein EG, Tret’yakov NV. Modified Lagrangian in convex programming and their generalizations. Math Program Stud. 1979;10:86–97.
- 27. Gonzalez RC, Woods RE. Digital Image Processing. Addison–Wesley Longman; Harlow, UK: 2002.
- 28. Hansen PC, Nagy JG, O’Leary DP. Deblurring Images: Matrices, Spectra, and Filtering. Fund Algorithms. Vol. 3. SIAM; Philadelphia: 2006.
- 29. He BS, Liao LZ, Han DR, Yang H. A new inexact alternating directions method for monotone variational inequalities. Math Program. 2002;92:103–118.
- 30. He BS, Yang H. Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper Res Lett. 1998;23:151–161.
- 31. He BS, Yuan XM. On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J Numer Anal. 2012;50:700–709.
- 32. He BS, Yuan XM. On nonergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numer Math. 2012. submitted.
- 33. Herman GT. Fundamentals of Computerized Tomography: Image Reconstruction from Projections. Springer; New York: 2009.
- 34. Koh K, Kim SJ, Boyd S. An interior-point method for large-scale l1-regularized logistic regression. J Mach Learn Res. 2007;8:1519–1555.
- 35. Krishnapuram B, Carin L, Figueiredo MA, Hartemink AJ. Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Trans Pattern Anal Machine Intell. 2005;27:957–968. doi: 10.1109/TPAMI.2005.127.
- 36. Li C. Compressive Sensing for 3D Data Processing Tasks: Applications, Models and Algorithms. PhD thesis. Rice University; Houston, TX: 2011.
- 37. Li C, Yin W, Jiang H, Zhang Y. An efficient augmented Lagrangian method with applications to total variation minimization. Comput Optim Appl. 2013;56:507–530.
- 38. Li C, Yin W, Zhang Y. User’s Guide for TVAL3: TV Minimization by Augmented Lagrangian and Alternating Direction Algorithms. Rice University CAAM Technical Report. Rice University; Houston, TX: 2009.
- 39. Lions PL, Mercier B. Splitting algorithms for the sum of two nonlinear operators. SIAM J Numer Anal. 1979;16:964–979.
- 40. Liu H, Palatucci M, Zhang J. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. Proceedings of the 26th Annual International Conference on Machine Learning; Montreal, QC, Canada. 2009. pp. 649–656.
- 41. Ma S, Song X, Huang J. Supervised group Lasso with applications to microarray data analysis. BMC Bioinformatics. 2007;8:60–72. doi: 10.1186/1471-2105-8-60.
- 42. Nemirovsky AS, Yudin DB. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley and Sons; New York: 1983.
- 43. Nesterov YE. A method for solving the convex programming problem with convergence rate O(1/k2). Dokl Akad Nauk SSSR. 1983;269:543–547. (in Russian)
- 44. Nesterov Y. Gradient Methods for Minimizing Composite Objective Function. Core Discussion Paper 2007/96. Center for Operations Research and Econometrics, Catholic University of Louvain; Louvain-la-Neuve, Belgium: 2007.
- 45. Park MY, Hastie T. L1-regularization path algorithm for generalized linear models. J R Stat Soc Ser B Methodol. 2007;69:659–677.
- 46. Peaceman DW, Rachford HH, Jr. The numerical solution of parabolic and elliptic differential equations. J Soc Indust Appl Math. 1955;3:28–41.
- 47. Pratt WK. Digital Image Processing: PIKS Inside. 3rd ed. John Wiley and Sons; New York: 2001.
- 48. Rudin L, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Phys D. 1992;60:259–268.
- 49. Soumekh M. Synthetic Aperture Radar Signal Processing. Wiley; New York: 1999.
- 50. Sutton BP, Noll DC, Fessler JA. Fast, iterative image reconstruction for MRI in the presence of field inhomogeneities. IEEE Trans Med Imaging. 2003;17:178–188. doi: 10.1109/tmi.2002.808360.
- 51. Tibshirani R. Regression shrinkage and selection via the lasso. J Roy Statist Soc Ser B. 1996;32:267–288.
- 52. Wright SJ, Nowak RD, Figueiredo MAT. Sparse reconstruction by separable approximation. IEEE Trans Signal Process. 2009;43:2479–2493.
- 53. Wu TT, Chen YF, Hastie T, Sobel E, Lange K. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics. 2009;19:714–720. doi: 10.1093/bioinformatics/btp041.
- 54. Yang J, Zhang Y. Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM J Sci Comput. 2011;33:250–278.
- 55. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol. 2006;68:49–67.
- 56. Yuan XM. Alternating direction methods for covariance selection models. J Sci Comput. 2012;51:261–273.
- 57. Zhang Y. User’s Guide for YALL1: Your Algorithms for L1 Optimization. Rice University CAAM Technical Report TR09-17. Rice University; Houston, TX: 2009.







