Journal of Inequalities and Applications 2018(1):269 (4 October 2018). doi: 10.1186/s13660-018-1863-z

Modified hybrid decomposition of the augmented Lagrangian method with larger step size for three-block separable convex programming

Min Sun, Yiju Wang

Abstract

The Jacobian decomposition and the Gauss–Seidel decomposition of the augmented Lagrangian method (ALM) are two popular methods for separable convex programming. However, their convergence is not guaranteed for three-block separable convex programming. In this paper, we present a modified hybrid decomposition of ALM (MHD-ALM) for three-block separable convex programming, which first updates all variables by a hybrid decomposition of ALM, and then corrects the output by a correction step with constant step size α ∈ (0, 2−√2), a range that is much less restrictive than the step sizes in similar methods. Furthermore, we show that 2−√2 is the optimal upper bound of the constant step size α. The rationality of MHD-ALM is justified by theoretical analysis, including global convergence, an ergodic convergence rate, a nonergodic convergence rate, and a refined ergodic convergence rate. MHD-ALM is applied to the video background extraction problem, and numerical results indicate that it is numerically reliable and requires less computation.

Keywords: The augmented Lagrangian method, Three-block separable convex programming, Step size, Global convergence

Introduction

Many problems encountered in applied mathematics can be formulated as separable convex programming, such as the basis pursuit (BP) problem [1–3], the video background extraction problem [4–7], image decomposition [8–10], and so on. Thus solving separable convex programming plays a fundamental role in applied mathematics and has drawn persistent attention. In the existing literature, several forms of separable convex programming have been investigated [11–15], among which the following three-block separable convex programming has attracted particular interest:

$$\min\Bigl\{\sum_{i=1}^{3}\theta_i(x_i)\ \Bigm|\ \sum_{i=1}^{3}A_ix_i=b,\ x_i\in X_i,\ i=1,2,3\Bigr\},\tag{1}$$

where θ_i : R^{n_i} → (−∞, +∞] (i = 1, 2, 3) are lower semicontinuous proper convex functions, A_i ∈ R^{l×n_i} (i = 1, 2, 3), b ∈ R^l, and X_i (i = 1, 2, 3) are nonempty closed convex sets in R^{n_i}. Throughout this paper, we assume that the solution set of problem (1) is nonempty.

The Lagrangian and augmented Lagrangian functions of problem (1) are defined, respectively, as

$$L(x_1,x_2,x_3,\lambda)=\sum_{i=1}^{3}\theta_i(x_i)-\Bigl\langle\lambda,\ \sum_{i=1}^{3}A_ix_i-b\Bigr\rangle,\tag{2}$$
$$L_\beta(x_1,x_2,x_3,\lambda)=L(x_1,x_2,x_3,\lambda)+\frac{\beta}{2}\Bigl\|\sum_{i=1}^{3}A_ix_i-b\Bigr\|^2,\tag{3}$$

where λ ∈ R^l is the Lagrange multiplier associated with the linear constraint in (1), and β > 0 is a penalty parameter. Applying the augmented Lagrangian method (ALM) [16] to problem (1), we obtain the following iterative scheme:

$$\begin{cases}(x_1^{k+1},x_2^{k+1},x_3^{k+1})=\arg\min\{L_\beta(x_1,x_2,x_3,\lambda^k)\mid x_1\in X_1,\ x_2\in X_2,\ x_3\in X_3\},\\ \lambda^{k+1}=\lambda^k-\beta(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b).\end{cases}\tag{4}$$

Obviously, all three variables x1, x2, x3 are involved in the minimization problem of (4), which often makes the method hard to implement. One technique to handle this is to split the subproblem into several small-scale subproblems. If we split it in a Gauss–Seidel manner and adopt the famous alternating direction method of multipliers (ADMM) [11], we obtain the following iterative scheme:

$$\begin{cases}x_1^{k+1}=\arg\min\{L_\beta(x_1,x_2^k,x_3^k,\lambda^k)\mid x_1\in X_1\},\\ x_2^{k+1}=\arg\min\{L_\beta(x_1^{k+1},x_2,x_3^k,\lambda^k)\mid x_2\in X_2\},\\ x_3^{k+1}=\arg\min\{L_\beta(x_1^{k+1},x_2^{k+1},x_3,\lambda^k)\mid x_3\in X_3\},\\ \lambda^{k+1}=\lambda^k-\beta(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b).\end{cases}\tag{5}$$

On the other hand, if we split it in a Jacobian manner, we get the following fully parallel iterative scheme:

$$\begin{cases}x_1^{k+1}=\arg\min\{L_\beta(x_1,x_2^k,x_3^k,\lambda^k)\mid x_1\in X_1\},\\ x_2^{k+1}=\arg\min\{L_\beta(x_1^k,x_2,x_3^k,\lambda^k)\mid x_2\in X_2\},\\ x_3^{k+1}=\arg\min\{L_\beta(x_1^k,x_2^k,x_3,\lambda^k)\mid x_3\in X_3\},\\ \lambda^{k+1}=\lambda^k-\beta(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b).\end{cases}\tag{6}$$

Compared with the minimization problem in (4), the minimization subproblems in (5) and (6) are of smaller scale and fully exploit the separable structure of the objective function of (1); thus the iterative schemes (5) and (6) are easier to implement. However, their convergence cannot be guaranteed without additional conditions, as shown in [12, 17]. To overcome this drawback, several new techniques have been developed, such as the regularization method with a large proximal parameter [18–23] and the prediction-correction method with a shrunk step size [12, 13, 24–26].

Compared with the regularization method, the prediction-correction method has attracted extensive interest, and during the past decades many scholars have studied this direction. For example, He et al. [24] proposed an ADMM-based contraction-type method for multi-block separable convex programming, which first generates a temporary iterate by (5) and then corrects it with a Gaussian back substitution procedure. Later, He et al. [12] developed a full Jacobian decomposition of the augmented Lagrangian method for multi-block separable convex programming, which first generates a temporary iterate by (6) and then corrects it with a constant or varying step size. Differently, Han et al. [13] proposed a partial splitting augmented Lagrangian method for three-block separable convex programming, which first updates the primal variables x1, x2, x3 in a partially parallel manner and then corrects x3, λ with a constant step size. Later, Wang et al. [25] presented a proximal partially parallel splitting method for multi-block separable convex programming, which first updates all primal variables in a partially parallel manner and then corrects the output with a constant or varying step size. Quite recently, Chang et al. [26] proposed a convergent prediction-correction-based ADMM in which more minimization subproblems are involved. In summary, the above iterative schemes first generate a temporary iterate by (5), (6), or their variants, and then obtain the new iterate by correcting the temporary iterate with a varying or constant step size.

A varying step size needs to be computed at each iteration, which might be computationally demanding for large-scale instances of (1). Hence in this paper we consider the prediction-correction method with a constant step size for solving problem (1). To the best of our knowledge, He et al. [12] first proposed a prediction-correction method with a constant step size for solving (1), and they proved that the upper bound of the constant step size is 0.2679. By taking a hybrid splitting of (4) as the prediction step, Wang et al. [25] relaxed the upper bound of the constant step size to 0.3670, and Han et al. [13] further relaxed it to 0.3820. In practice, to enhance the numerical efficiency of the corresponding iterative method, larger values of the step size are preferred as long as convergence is still guaranteed [26]. In this paper, based on the methods in [12, 13, 25], we propose a modified hybrid decomposition of the augmented Lagrangian method with a constant step size, whose upper bound is relaxed to 2−√2 ≈ 0.5858.

The rest of this paper is organized as follows. Section 2 lists some notations and basic results. In Sect. 3, we present a modified hybrid decomposition of the augmented Lagrangian method with a larger step size for problem (1) and establish its global convergence and refined convergence rate. Furthermore, a simple example is given to illustrate that 2−√2 ≈ 0.5858 is the optimal upper bound of the constant step size in MHD-ALM. In Sect. 4, some numerical results are given to demonstrate the numerical advantage of a larger step size. Finally, a brief conclusion, including some possible future works, is drawn in Sect. 5.

Preliminaries

In this section, we give some notations and basic results about the minimization problem (1), which will be used in the forthcoming discussions.

Throughout this paper, we define the following notations:

$$x=(x_1,x_2,x_3),\quad v=(x_2,x_3,\lambda),\quad w=(x_1,x_2,x_3,\lambda),\quad \theta(x)=\theta_1(x_1)+\theta_2(x_2)+\theta_3(x_3)$$

and

$$A=(A_1,A_2,A_3),\quad X=X_1\times X_2\times X_3,\quad V=X_2\times X_3\times\mathbb{R}^l,\quad W=X\times\mathbb{R}^l.$$

Definition 2.1

A tuple (x^*, λ^*) ∈ W is called a saddle point of the Lagrangian function (2) if it satisfies the inequalities

$$L_{\lambda\in\mathbb{R}^l}(x^*,\lambda)\le L(x^*,\lambda^*)\le L_{x\in X}(x,\lambda^*),\tag{7}$$

that is, L(x^*, λ) ≤ L(x^*, λ^*) ≤ L(x, λ^*) for all λ ∈ R^l and all x ∈ X.

Solving problem (1) is equivalent to finding a saddle point of L(x,λ) [26, 27]. Therefore, to solve (1), we only need to solve the two inequalities in (7), which can be written as the following mixed variational inequality:

$$\theta(x)-\theta(x^*)+(w-w^*)^\top F(w^*)\ge 0,\quad\forall\, w\in W,\tag{8}$$

where

$$F(w)=\begin{pmatrix}-A_1^\top\lambda\\ -A_2^\top\lambda\\ -A_3^\top\lambda\\ \sum_{i=1}^{3}A_ix_i-b\end{pmatrix}=\begin{pmatrix}-A^\top\lambda\\ Ax-b\end{pmatrix}=\begin{pmatrix}0&-A^\top\\ A&0\end{pmatrix}\begin{pmatrix}x\\ \lambda\end{pmatrix}-\begin{pmatrix}0\\ b\end{pmatrix}.\tag{9}$$

Because F(w) is an affine mapping whose coefficient matrix is skew-symmetric, so that (w − w̄)^⊤(F(w) − F(w̄)) = 0 for all w, w̄, it satisfies the following property:

$$(w-\bar{w})^\top F(w)=(w-\bar{w})^\top F(\bar{w}),\quad\forall\, w,\bar{w}\in W.\tag{10}$$

The mixed variational inequality (8) is denoted by MVI(W, F, θ), and its solution set is denoted by W^*, which is nonempty by the assumption on problem (1).

To solve MVI(W,F,θ), He et al. [28] presented the following prototype algorithm:

A prototype algorithm for MVI(W,F,θ) , denoted by ProAlo:

Prediction: For given v^k, find ŵ^k ∈ W and a matrix Q satisfying

$$\theta(x)-\theta(\hat{x}^k)+(w-\hat{w}^k)^\top F(\hat{w}^k)\ge(v-\hat{v}^k)^\top Q(v^k-\hat{v}^k),\quad\forall\, w\in W,\tag{11}$$

where the matrix Q has the property that Q^⊤ + Q is positive definite.

Correction: Determine a nonsingular matrix M, a scalar α > 0, and generate the new iterate v^{k+1} via

$$v^{k+1}=v^k-\alpha M(v^k-\hat{v}^k).\tag{12}$$

Condition 2.1

The matrices Q, M in ProAlo are such that the three matrices Q^⊤ + Q, H := QM^{-1}, and G(α) := Q^⊤ + Q − αM^⊤HM are positive definite.

Under Condition 2.1, He et al. [28] established the convergence results of ProAlo, including the global convergence, the worst-case O(1/t) convergence rate in ergodic or nonergodic sense, where t is the iteration counter. See Theorems 3.3, 4.2, 4.5 in [28].

To end this section, we give the following lemma which will be used in the subsequent section.

Lemma 2.1

([27])

Let X ⊆ R^n be a nonempty closed convex set, and let θ(x) and f(x) be two convex functions. If θ(x) is nondifferentiable, f(x) is differentiable, and the solution set of the problem min{θ(x) + f(x) | x ∈ X} is nonempty, then

$$x^*\in\arg\min\{\theta(x)+f(x)\mid x\in X\}$$

if and only if

$$x^*\in X,\qquad \theta(x)-\theta(x^*)+(x-x^*)^\top\nabla f(x^*)\ge 0,\quad\forall\, x\in X.$$

Algorithm and its convergence

In this section, we give the process of the modified hybrid decomposition of the augmented Lagrangian method (MHD-ALM) for three-block separable convex programming (1) and establish its convergence results, including global convergence, ergodic convergence rate, nonergodic convergence rate, and refined ergodic convergence rate.

Algorithm: MHD-ALM

Step 0.

Let the parameters α ∈ (0, 2−√2), β > 0, and the tolerance ε > 0 be given. Choose an initial point v^0 = (x2^0, x3^0, λ^0) ∈ V. Set k = 0.

Step 1.
Compute the prediction iterate w̃^k = (x̃1^k, x̃2^k, x̃3^k, λ̃^k) via
$$\begin{cases}\tilde{x}_1^k=\arg\min\{L_\beta(x_1,x_2^k,x_3^k,\lambda^k)\mid x_1\in X_1\},\\ \tilde{x}_2^k=\arg\min\{L_\beta(\tilde{x}_1^k,x_2,x_3^k,\lambda^k)\mid x_2\in X_2\},\\ \tilde{x}_3^k=\arg\min\{L_\beta(\tilde{x}_1^k,x_2^k,x_3,\lambda^k)\mid x_3\in X_3\},\\ \tilde{\lambda}^k=\lambda^k-\beta(A_1\tilde{x}_1^k+A_2\tilde{x}_2^k+A_3\tilde{x}_3^k-b).\end{cases}\tag{13}$$
Step 2.

If max{‖A2(x2^k − x̃2^k)‖, ‖A3(x3^k − x̃3^k)‖, ‖λ^k − λ̃^k‖} ≤ ε, then stop; otherwise, go to Step 3.

Step 3.
Generate the new iterate w^{k+1} = (x1^{k+1}, x2^{k+1}, x3^{k+1}, λ^{k+1}) by
$$\begin{cases}x_1^{k+1}=\tilde{x}_1^k,\\ x_2^{k+1}=x_2^k-\alpha(x_2^k-\tilde{x}_2^k),\\ x_3^{k+1}=x_3^k-\alpha(x_3^k-\tilde{x}_3^k),\\ \lambda^{k+1}=\lambda^k-\alpha(\lambda^k-\tilde{\lambda}^k).\end{cases}\tag{14}$$
Set k := k + 1, and go to Step 1.
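For illustration, the following MATLAB script runs MHD-ALM on a toy instance of (1) with θ_i(x_i) = (1/2)‖x_i‖² and X_i = R^{n_i}; this instance, the random data, and all numerical values below are assumptions made only so that every subproblem in (13) has a closed-form solution via a small linear solve. The script is a sketch of the iteration flow, not the implementation used in Sect. 4.

% Toy instance of (1): theta_i(x_i) = 0.5*||x_i||^2, X_i = R^{n_i}, so each
% subproblem in (13) reduces to a symmetric positive definite linear system.
rng(0);
l = 30; n = [10 10 10];
A1 = randn(l,n(1)); A2 = randn(l,n(2)); A3 = randn(l,n(3));
b  = randn(l,1);
beta = 1; alpha = 0.5;                       % alpha in (0, 2 - sqrt(2))
x1 = zeros(n(1),1); x2 = zeros(n(2),1); x3 = zeros(n(3),1);
lambda = zeros(l,1);
for k = 1:500
    % Prediction step (13): x1 first, then x2 and x3 in parallel.
    r1 = A2*x2 + A3*x3 - b;
    x1t = (eye(n(1)) + beta*(A1'*A1)) \ (A1'*(lambda - beta*r1));
    r2 = A1*x1t + A3*x3 - b;
    x2t = (eye(n(2)) + beta*(A2'*A2)) \ (A2'*(lambda - beta*r2));
    r3 = A1*x1t + A2*x2 - b;
    x3t = (eye(n(3)) + beta*(A3'*A3)) \ (A3'*(lambda - beta*r3));
    lambdat = lambda - beta*(A1*x1t + A2*x2t + A3*x3t - b);
    % Stopping test of Step 2.
    if max([norm(A2*(x2-x2t)), norm(A3*(x3-x3t)), norm(lambda-lambdat)]) <= 1e-8
        break;
    end
    % Correction step (14): x1 is accepted directly; x2, x3, lambda move
    % a fraction alpha of the predictor displacement.
    x1 = x1t;
    x2 = x2 - alpha*(x2 - x2t);
    x3 = x3 - alpha*(x3 - x3t);
    lambda = lambda - alpha*(lambda - lambdat);
end
fprintf('iterations: %d, constraint residual: %.2e\n', ...
        k, norm(A1*x1 + A2*x2 + A3*x3 - b));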

Remark 3.1

Different from the iterative schemes (5) and (6), the iterative scheme (13) first updates the primal variable x1 and then updates the primal variables x2, x3 in a parallel manner. Furthermore, the feasible set of the step size α in MHD-ALM is extended from (0, 0.2679) in [12], (0, 0.3820) in [13], and (0, 0.3670) in [25] to (0, 2−√2) ≈ (0, 0.5858).

The methods in [12, 13, 24–26] and MHD-ALM all fall into the algorithmic framework of prediction-correction methods. The main differences among these methods are: (i) in the prediction step, the methods in [24, 26] update all the primal variables in a sequential order; the method in [12] updates all the primal variables in a parallel manner; the methods in [13, 25] and MHD-ALM update all the primal variables in a partially parallel manner, i.e., they first update x1 and then update x2, x3 in a parallel manner; (ii) in the correction step, the method in [13] updates x3, λ; the method in [26] and MHD-ALM update x2, x3, λ; and the methods in [12, 24, 25] update all the variables.

The convergence analysis of MHD-ALM needs the following assumption and auxiliary sequence.

Assumption 3.1

The matrices A2 and A3 in problem (1) both have full column rank.

Define an auxiliary sequence ŵ^k = (x̂1^k, x̂2^k, x̂3^k, λ̂^k) by

$$\hat{x}_i^k=\tilde{x}_i^k\ (i=1,2,3),\qquad \hat{\lambda}^k=\lambda^k-\beta(A_1\tilde{x}_1^k+A_2x_2^k+A_3x_3^k-b).\tag{15}$$

To prove the convergence results of MHD-ALM, we only need to cast it into ProAlo and verify the following two conditions: (i) the generated sequences satisfy (11) and (12); (ii) the resulting matrices Q, M satisfy Condition 2.1 in Sect. 2. We first verify condition (i). Based on Lemma 2.1, we can derive the first-order optimality conditions of the subproblems in (13), which are summarized in the following lemma.

Lemma 3.1

Let {w^k} be the sequence generated by MHD-ALM, and let {ŵ^k} be defined as in (15). Then it holds that

$$\theta(x)-\theta(\hat{x}^k)+(w-\hat{w}^k)^\top F(\hat{w}^k)\ge(v-\hat{v}^k)^\top Q(v^k-\hat{v}^k),\quad\forall\, w\in W,\tag{16}$$

where the matrix Q is defined by

$$Q=\begin{pmatrix}\beta A_2^\top A_2&0&0\\ 0&\beta A_3^\top A_3&0\\ -A_2&-A_3&I_l/\beta\end{pmatrix}.\tag{17}$$

Proof

Based on Lemma 2.1 and the notation ŵ^k in (15), the first-order optimality conditions for the three minimization problems in (13) can be summarized as the following inequalities:

$$\begin{aligned}&\theta_1(x_1)-\theta_1(\hat{x}_1^k)+(x_1-\hat{x}_1^k)^\top(-A_1^\top\hat{\lambda}^k)\ge 0,\quad\forall\, x_1\in X_1,\\ &\theta_2(x_2)-\theta_2(\hat{x}_2^k)+(x_2-\hat{x}_2^k)^\top\bigl(-A_2^\top\hat{\lambda}^k-\beta A_2^\top A_2(x_2^k-\hat{x}_2^k)\bigr)\ge 0,\quad\forall\, x_2\in X_2,\\ &\theta_3(x_3)-\theta_3(\hat{x}_3^k)+(x_3-\hat{x}_3^k)^\top\bigl(-A_3^\top\hat{\lambda}^k-\beta A_3^\top A_3(x_3^k-\hat{x}_3^k)\bigr)\ge 0,\quad\forall\, x_3\in X_3.\end{aligned}$$

Furthermore, the definition of the variable λ̂^k in (15) gives

$$(\lambda-\hat{\lambda}^k)^\top\Bigl(A_1\hat{x}_1^k+A_2\hat{x}_2^k+A_3\hat{x}_3^k-b+A_2(x_2^k-\hat{x}_2^k)+A_3(x_3^k-\hat{x}_3^k)+\frac{1}{\beta}(\hat{\lambda}^k-\lambda^k)\Bigr)=0,\quad\forall\,\lambda\in\mathbb{R}^l.$$

Adding the above four relations, rearranging terms, and using the definitions of the matrix Q and the mapping F(w), we obtain the result (16). This completes the proof. □

Remark 3.2

When max{‖A2(x2^k − x̃2^k)‖, ‖A3(x3^k − x̃3^k)‖, ‖λ^k − λ̃^k‖} = 0, by (15) we get A2x2^k = A2x̂2^k, A3x3^k = A3x̂3^k, and λ^k = λ̂^k. Thus Q(v^k − v̂^k) = 0. This and inequality (16) indicate that

$$\theta(x)-\theta(\hat{x}^k)+(w-\hat{w}^k)^\top F(\hat{w}^k)\ge 0,\quad\forall\, w\in W.$$

Therefore, ŵ^k ∈ W^*, and the stopping criterion of MHD-ALM is reasonable.

By the definition of λ̂^k in (15), the updating formula of λ̃^k can be represented as

$$\tilde{\lambda}^k=\lambda^k-\beta(A_1\tilde{x}_1^k+A_2\tilde{x}_2^k+A_3\tilde{x}_3^k-b)=\lambda^k-\bigl[(\lambda^k-\hat{\lambda}^k)-\beta A_2(x_2^k-\hat{x}_2^k)-\beta A_3(x_3^k-\hat{x}_3^k)\bigr]=\lambda^k-\bigl(-\beta A_2,\ -\beta A_3,\ I_l\bigr)(v^k-\hat{v}^k).$$

This together with (14) and (15) gives

$$\begin{cases}x_1^{k+1}=\tilde{x}_1^k,\\ v^{k+1}=v^k-\alpha M(v^k-\hat{v}^k),\end{cases}\tag{18}$$

where the matrix M is defined as

$$M=\begin{pmatrix}I_{n_2}&0&0\\ 0&I_{n_3}&0\\ -\beta A_2&-\beta A_3&I_l\end{pmatrix}.\tag{19}$$
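As a quick numerical sanity check (not part of the original analysis), the following MATLAB snippet builds Q and M from random full-column-rank A2, A3 and confirms that H = QM^{-1} coincides with the block-diagonal matrix diag(βA2^⊤A2, βA3^⊤A3, I_l/β) appearing in the proof of Lemma 3.2 below and is symmetric positive definite; the dimensions and data are arbitrary assumptions of this check.

% Sanity check of H = Q*M^{-1} with random full-column-rank A2, A3.
rng(1);
l = 8; n2 = 4; n3 = 5; beta = 0.7;
A2 = randn(l,n2); A3 = randn(l,n3);          % full column rank w.p. 1
Q = [beta*(A2'*A2)  zeros(n2,n3)   zeros(n2,l);
     zeros(n3,n2)   beta*(A3'*A3)  zeros(n3,l);
     -A2            -A3            eye(l)/beta];
M = [eye(n2)        zeros(n2,n3)   zeros(n2,l);
     zeros(n3,n2)   eye(n3)        zeros(n3,l);
     -beta*A2       -beta*A3       eye(l)];
H = Q/M;                                      % Q*inv(M)
Hblk = blkdiag(beta*(A2'*A2), beta*(A3'*A3), eye(l)/beta);
fprintf('||H - Hblk|| = %.1e, min eig(H) = %.3f\n', ...
        norm(H - Hblk,'fro'), min(eig((H+H')/2)));

The printed difference should be of the order of machine precision and the smallest eigenvalue positive, in agreement with Lemma 3.2(ii).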

Now to establish the convergence results of MHD-ALM, we only need to verify that the matrices Q, M satisfy Condition 2.1 in Sect. 2.

Lemma 3.2

Let the matrices Q, M be defined as in (17) and (19). If α ∈ (0, 2−√2) and Assumption 3.1 holds, then we have

  • (i) the symmetric matrix Q^⊤ + Q is positive definite;

  • (ii) the matrix H = QM^{-1} is symmetric and positive definite;

  • (iii) the matrix G(α) = Q^⊤ + Q − αM^⊤HM is symmetric and positive definite.

Proof

(i) From the definition of Q, we have

$$Q^\top+Q=\begin{pmatrix}2\beta A_2^\top A_2&0&-A_2^\top\\ 0&2\beta A_3^\top A_3&-A_3^\top\\ -A_2&-A_3&2I_l/\beta\end{pmatrix}.$$

Therefore, for any v = (x2, x3, λ) ≠ 0, we have

$$v^\top(Q^\top+Q)v=2\beta\|A_2x_2\|^2+2\beta\|A_3x_3\|^2-2\lambda^\top(A_2x_2+A_3x_3)+\frac{2}{\beta}\|\lambda\|^2.\tag{20}$$

  • If (x2, x3) = 0, then λ ≠ 0, so by (20) we get v^⊤(Q^⊤+Q)v = (2/β)‖λ‖² > 0.

  • If (x2, x3) ≠ 0, then from (20) we get
    $$v^\top(Q^\top+Q)v\ge\beta\|A_2x_2\|^2+\beta\|A_3x_3\|^2>0,$$
    where the first inequality follows from the elementary inequality 2x^⊤y ≤ β‖x‖² + ‖y‖²/β applied to the cross terms, and the second inequality comes from Assumption 3.1.

(ii) From the definitions of Q and M, we have

$$H=QM^{-1}=\begin{pmatrix}\beta A_2^\top A_2&0&0\\ 0&\beta A_3^\top A_3&0\\ 0&0&I_l/\beta\end{pmatrix},$$

which is obviously symmetric and positive definite by Assumption 3.1.

(iii) Similarly, from the definitions of Q and M, we have

$$G(\alpha)=Q^\top+Q-\alpha M^\top HM=\begin{pmatrix}\beta(2-2\alpha)A_2^\top A_2&-\alpha\beta A_2^\top A_3&-(1-\alpha)A_2^\top\\ -\alpha\beta A_3^\top A_2&\beta(2-2\alpha)A_3^\top A_3&-(1-\alpha)A_3^\top\\ -(1-\alpha)A_2&-(1-\alpha)A_3&(2-\alpha)I_l/\beta\end{pmatrix}=L^\top R(\alpha)L,$$

where

$$L=\begin{pmatrix}\sqrt{\beta}A_2&0&0\\ 0&\sqrt{\beta}A_3&0\\ 0&0&I_l/\sqrt{\beta}\end{pmatrix},\qquad R(\alpha)=\begin{pmatrix}(2-2\alpha)I_l&-\alpha I_l&-(1-\alpha)I_l\\ -\alpha I_l&(2-2\alpha)I_l&-(1-\alpha)I_l\\ -(1-\alpha)I_l&-(1-\alpha)I_l&(2-\alpha)I_l\end{pmatrix}.$$

This together with Assumption 3.1 implies that we only need to prove that the matrix R(α) is positive definite. In fact, it can be written as

$$R(\alpha)=\begin{pmatrix}2-2\alpha&-\alpha&-(1-\alpha)\\ -\alpha&2-2\alpha&-(1-\alpha)\\ -(1-\alpha)&-(1-\alpha)&2-\alpha\end{pmatrix}\otimes I_l,$$

where ⊗ denotes the matrix Kronecker product. Thus, we only need to prove that the third-order matrix

$$\begin{pmatrix}2-2\alpha&-\alpha&-(1-\alpha)\\ -\alpha&2-2\alpha&-(1-\alpha)\\ -(1-\alpha)&-(1-\alpha)&2-\alpha\end{pmatrix}$$

is positive definite. Its three eigenvalues are λ1 = 2 − α, λ2 = 2 − 2α − √(3α² − 4α + 2), and λ3 = 2 − 2α + √(3α² − 4α + 2). Then, solving the following three inequalities simultaneously,

$$\begin{cases}2-\alpha>0,\\ 2-2\alpha-\sqrt{3\alpha^2-4\alpha+2}>0,\\ 2-2\alpha+\sqrt{3\alpha^2-4\alpha+2}>0,\end{cases}$$

we get 0 < α < 2−√2 ≈ 0.5858. Therefore, the matrix G(α) is positive definite for any α ∈ (0, 2−√2). This completes the proof. □
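The threshold 2−√2 can also be verified numerically. The following MATLAB lines (an illustration, not part of the proof) evaluate the smallest eigenvalue of the third-order matrix above for several values of α, including the critical value 2−√2; the sample values are assumptions of this check.

% Smallest eigenvalue of the 3x3 matrix in the proof of Lemma 3.2(iii).
R3 = @(a) [2-2*a, -a, -(1-a); -a, 2-2*a, -(1-a); -(1-a), -(1-a), 2-a];
for a = [0.30 0.50 0.58 2-sqrt(2) 0.59]
    fprintf('alpha = %.4f, smallest eigenvalue = %+.4f\n', a, min(eig(R3(a))));
end
% The smallest eigenvalue equals 2 - 2*a - sqrt(3*a^2 - 4*a + 2): it is
% positive for a = 0.30, 0.50, 0.58, vanishes (up to rounding) at
% a = 2 - sqrt(2), and becomes negative at a = 0.59.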

Lemma 3.2 indicates that the matrices Q, M defined as in (17) and (19) satisfy Condition 2.1 in Sect. 2, and thus we get the following convergence results of MHD-ALM based on Theorems 3.3, 4.2, 4.5 in [28].

Theorem 3.1

(Global convergence)

Let {w^k} be the sequence generated by MHD-ALM. Then it converges to a vector w^∞, which belongs to W^*.

Theorem 3.2

(Ergodic convergence rate)

Let {w^k} be the sequence generated by MHD-ALM, and let {ŵ^k} be the corresponding sequence defined in (15). Set

$$\bar{w}_t=\frac{1}{t+1}\sum_{k=0}^{t}\hat{w}^k.$$

Then, for any integer t ≥ 1, we have

$$\theta(\bar{x}_t)-\theta(x)+(\bar{w}_t-w)^\top F(w)\le\frac{1}{2\alpha t}\|v-v^0\|_H^2,\quad\forall\, w\in W.\tag{21}$$

Theorem 3.3

(Nonergodic convergence rate)

Let {w^k} be the sequence generated by MHD-ALM. Then, for any w^* ∈ W^* and integer t ≥ 1, we have

$$\|M(v^t-\hat{v}^t)\|_H^2\le\frac{1}{c_0 t}\|v^0-v^*\|_H^2,$$

where v^* = (x2^*, x3^*, λ^*) and c_0 > 0 is a constant.

The term

$$\frac{1}{2\alpha t}\|v-v^0\|_H^2$$

on the right-hand side of (21) is used to measure the ergodic convergence rate of MHD-ALM. However, it is independent of the distance between the initial iterate w^0 and the solution set W^*, and it is hard to estimate because of the arbitrary variable v. Therefore, inequality (21) is not a fully satisfactory criterion for measuring the ergodic convergence rate of MHD-ALM. In the following, we give a refined result stated in terms of the objective function and the constraint violation of problem (1), which is more reasonable, accurate, and intuitive.

Lemma 3.3

Let {w^k} be the sequence generated by MHD-ALM. Then, for any w ∈ W, we have

$$\alpha\bigl(\theta(x)-\theta(\hat{x}^k)+(w-\hat{w}^k)^\top F(\hat{w}^k)\bigr)\ge\frac{1}{2}\bigl(\|v-v^{k+1}\|_H^2-\|v-v^k\|_H^2\bigr)+\frac{\alpha}{2}\|v^k-\hat{v}^k\|_H^2.\tag{22}$$

Proof

The proof is similar to that of Lemma 3.1 in [28] and is omitted for brevity. This completes the proof. □

Theorem 3.4

(Refined ergodic convergence rate)

Let {w^k} be the sequence generated by MHD-ALM, and let {ŵ^k} be the sequence defined in (15). Set

$$\bar{w}_t=\frac{1}{t}\sum_{k=0}^{t-1}\hat{w}^k.$$

Then, for any integer t ≥ 1, there exists a constant c > 0 such that

$$|\theta(\bar{x}_t)-\theta(x^*)|\le\frac{c}{2\alpha t},\qquad\|A\bar{x}_t-b\|\le\frac{c}{2\alpha t}.$$

Proof

Choose w^* = (x^*, λ^*) ∈ W^*. Then, for any λ ∈ R^l, we have w̃ := (x^*, λ) ∈ W, with ṽ := (x2^*, x3^*, λ). From the definition of F(w) in (9), we have

$$(\tilde{w}-\hat{w}^k)^\top F(\hat{w}^k)=(\tilde{w}-\hat{w}^k)^\top F(\tilde{w})=\begin{pmatrix}x^*-\hat{x}^k\\ \lambda-\hat{\lambda}^k\end{pmatrix}^{\!\top}\begin{pmatrix}-A^\top\lambda\\ Ax^*-b\end{pmatrix}=-\lambda^\top(Ax^*-A\hat{x}^k)=\lambda^\top(A\hat{x}^k-b),$$

where the first equality follows from (10) and the remaining ones use Ax^* = b. Setting w = w̃ in (22), we get

$$\alpha\bigl(\theta(\hat{x}^k)-\theta(x^*)-(\tilde{w}-\hat{w}^k)^\top F(\hat{w}^k)\bigr)\le\frac{1}{2}\bigl(\|\tilde{v}-v^k\|_H^2-\|\tilde{v}-v^{k+1}\|_H^2\bigr)-\frac{\alpha}{2}\|v^k-\hat{v}^k\|_H^2.$$

Combining the above two relations gives

$$\alpha\bigl(\theta(\hat{x}^k)-\theta(x^*)-\lambda^\top(A\hat{x}^k-b)\bigr)\le\frac{1}{2}\bigl(\|\tilde{v}-v^k\|_H^2-\|\tilde{v}-v^{k+1}\|_H^2\bigr)-\frac{\alpha}{2}\|v^k-\hat{v}^k\|_H^2.$$

Summing the above inequality from k = 0 to t − 1, using the telescoping structure of the right-hand side, and dropping the nonpositive terms yields

$$\sum_{k=0}^{t-1}\theta(\hat{x}^k)-t\,\theta(x^*)-\lambda^\top\Bigl(A\sum_{k=0}^{t-1}\hat{x}^k-tb\Bigr)\le\frac{1}{2\alpha}\|\tilde{v}-v^0\|_H^2.$$

Dividing both sides of the above inequality by t, we get

$$\frac{1}{t}\sum_{k=0}^{t-1}\theta(\hat{x}^k)-\theta(x^*)-\lambda^\top(A\bar{x}_t-b)\le\frac{1}{2\alpha t}\|\tilde{v}-v^0\|_H^2.$$

Then it follows from the convexity of θ_i (i = 1, 2, 3) that

$$\theta(\bar{x}_t)-\theta(x^*)-\lambda^\top(A\bar{x}_t-b)\le\frac{1}{2\alpha t}\left\|\begin{pmatrix}y^0-y^*\\ \lambda^0-\lambda\end{pmatrix}\right\|_H^2,\tag{23}$$

where y^0 = (x2^0, x3^0) and y^* = (x2^*, x3^*). Since (23) holds for any λ, we can set

$$\lambda=-\frac{A\bar{x}_t-b}{\|A\bar{x}_t-b\|},$$

and consequently,

$$\theta(\bar{x}_t)-\theta(x^*)+\|A\bar{x}_t-b\|\le\frac{1}{2\alpha t}\sup_{\|\lambda\|\le 1}\left\|\begin{pmatrix}y^0-y^*\\ \lambda^0-\lambda\end{pmatrix}\right\|_H^2.$$

Set

$$c=\sup_{\|\lambda\|\le 1}\left\|\begin{pmatrix}y^0-y^*\\ \lambda^0-\lambda\end{pmatrix}\right\|_H^2,$$

and we thus get

$$\theta(\bar{x}_t)-\theta(x^*)+\|A\bar{x}_t-b\|\le\frac{c}{2\alpha t}.$$

Since x^* ∈ X^* (here X^* denotes the solution set of problem (1)), we have

$$\theta(\bar{x}_t)-\theta(x^*)\ge 0.$$

Combining the above two inequalities gives

$$|\theta(\bar{x}_t)-\theta(x^*)|\le\frac{c}{2\alpha t},\qquad\|A\bar{x}_t-b\|\le\frac{c}{2\alpha t},$$

which completes the proof. □

As mentioned in Sect. 1, He et al. [12] used a simple example to show that the iterative scheme (6) may diverge for two-block separable convex programming. If we set θ1 = 0 and A1 = 0 in (1) and in MHD-ALM, then MHD-ALM reduces to the method in [12]. In this case, the feasible set of α in [12] is (0, 0.3670), the same as that of the method in [25] for three-block separable convex programming. Now we use the example given in [12] to show that: (i) larger values of α ∈ (0, 2−√2) can enhance the performance of MHD-ALM; (ii) MHD-ALM with α ≥ 2−√2 may diverge.

Example 3.1

Consider the linear equation

$$x_2+x_3=0.\tag{24}$$

Obviously, the linear equation (24) is a special case of problem (1) with the specifications θ1 = θ2 = θ3 = 0, A1 = 0, A2 = A3 = 1, b = 0, and X1 = X2 = X3 = R. Since θ1 = 0 and A1 = 0, in the following we do not consider the variable x1. The solution set of the corresponding mixed variational inequality is

$$W^*=\{(x_2,x_3,\lambda)\mid x_2+x_3=0,\ \lambda=0\}.$$

For MHD-ALM, we set β = 1 and the initial point x2^0 = x3^0 = 0, λ^0 = 1, and choose

$$\alpha\in\{0.20,\ 0.21,\ 0.22,\ \ldots,\ 0.55\}.$$

The stopping criterion is set as

$$\max\{|x_2^k+x_3^k|,\ |\lambda^k|\}\le 10^{-5},$$

or the number of iterations exceeds 10,000.
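A MATLAB script along the following lines reproduces this sensitivity test. The closed-form subproblem solutions used below are obtained by specializing (13) to problem (24) with β = 1, and the plotting commands are illustrative assumptions of this sketch.

% MHD-ALM on Example 3.1 (x2 + x3 = 0, beta = 1): the subproblems in (13)
% reduce to x2t = lambda - x3, x3t = lambda - x2, lambdat = x2 + x3 - lambda.
alphas = 0.20:0.01:0.55;
iters  = zeros(size(alphas));
for j = 1:numel(alphas)
    alpha = alphas(j);
    x2 = 0; x3 = 0; lambda = 1; k = 0;
    while max(abs(x2 + x3), abs(lambda)) > 1e-5 && k < 10000
        x2t = lambda - x3;                 % x2-subproblem of (13)
        x3t = lambda - x2;                 % x3-subproblem of (13)
        lambdat = x2 + x3 - lambda;        % multiplier prediction of (13)
        x2 = x2 - alpha*(x2 - x2t);        % correction step (14)
        x3 = x3 - alpha*(x3 - x3t);
        lambda = lambda - alpha*(lambda - lambdat);
        k = k + 1;
    end
    iters(j) = k;
end
plot(alphas, iters, '-o'); xlabel('\alpha'); ylabel('number of iterations');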

The numerical results are graphically shown in Fig. 1, which illustrates that for α ≤ 0.5 the number of iterations decreases as α increases, while for α ∈ (0.5, 0.55] the number of iterations increases quickly. Therefore, α = 0.5 is optimal for this problem, and larger values of α within its feasible set can indeed enhance the numerical performance of MHD-ALM. Of course, some extreme values, such as those near the upper bound 2−√2 ≈ 0.5858, are not appropriate choices.

Figure 1. Sensitivity test on the step size α

Now we show that MHD-ALM may diverge for α ≥ 2−√2. By some simple manipulations, the iterative scheme (13)–(14) for problem (24) can be written in the following compact form:

$$\begin{pmatrix}x_2^{k+1}\\ x_3^{k+1}\\ \lambda^{k+1}\end{pmatrix}=P(\alpha)\begin{pmatrix}x_2^k\\ x_3^k\\ \lambda^k\end{pmatrix},\tag{25}$$

where

$$P(\alpha)=\begin{pmatrix}1-\alpha&-\alpha&\alpha\\ -\alpha&1-\alpha&\alpha\\ \alpha&\alpha&1-2\alpha\end{pmatrix}.$$

The three eigenvalues of the matrix P(α) are

$$\lambda_1=1,\qquad\lambda_2=1-(2-\sqrt{2})\alpha,\qquad\lambda_3=1-(2+\sqrt{2})\alpha.$$

Now let us consider the following two cases:

(1) For any α > 2−√2, we have

$$\lambda_3=1-(2+\sqrt{2})\alpha<1-(2+\sqrt{2})(2-\sqrt{2})=-1.$$

Then ρ(P(α)) > 1 for α > 2−√2, where ρ(P(α)) is the spectral radius of P(α). Hence, the iterative scheme (25) with α > 2−√2 is divergent for this problem.

(2) For α = 2−√2, by the eigenvalue decomposition, the matrix P(2−√2) can be decomposed as

$$P(2-\sqrt{2})=VDV^\top,$$

where

$$V=\begin{pmatrix}1/2&\sqrt{2}/2&1/2\\ 1/2&-\sqrt{2}/2&1/2\\ -\sqrt{2}/2&0&\sqrt{2}/2\end{pmatrix},\qquad D=\begin{pmatrix}-1&0&0\\ 0&1&0\\ 0&0&4\sqrt{2}-5\end{pmatrix}.$$

Thus, by (25) and the initial point (x2^0, x3^0, λ^0) = (0, 0, 1), we get

$$\begin{pmatrix}x_2^{k}\\ x_3^{k}\\ \lambda^{k}\end{pmatrix}=\begin{pmatrix}-\sqrt{2}\bigl((-1)^{k}-(4\sqrt{2}-5)^{k}\bigr)/4\\ -\sqrt{2}\bigl((-1)^{k}-(4\sqrt{2}-5)^{k}\bigr)/4\\ (-1)^{k}/2+(4\sqrt{2}-5)^{k}/2\end{pmatrix},$$

from which it holds that

$$\begin{pmatrix}x_2^{2k}\\ x_3^{2k}\\ \lambda^{2k}\end{pmatrix}\to\begin{pmatrix}-\sqrt{2}/4\\ -\sqrt{2}/4\\ 1/2\end{pmatrix},\qquad\begin{pmatrix}x_2^{2k+1}\\ x_3^{2k+1}\\ \lambda^{2k+1}\end{pmatrix}\to\begin{pmatrix}\sqrt{2}/4\\ \sqrt{2}/4\\ -1/2\end{pmatrix}\qquad(k\to\infty).$$

Hence, the iterative scheme (25) with α = 2−√2 is also divergent for this problem. Overall, 2−√2 is the optimal upper bound of the step size α in MHD-ALM.
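Both cases can also be checked numerically. The following MATLAB lines (illustration only) confirm that the spectral radius of P(α) exceeds 1 for α slightly above 2−√2 and that, at α = 2−√2, the iterates starting from (0, 0, 1) approach the two-cycle identified above; the sample value 0.59 and the number of iterations are assumptions of this check.

% Divergence check for the iteration matrix P(alpha) of Example 3.1.
P = @(a) [1-a, -a, a; -a, 1-a, a; a, a, 1-2*a];
fprintf('rho(P(0.59)) = %.4f\n', max(abs(eig(P(0.59)))));   % exceeds 1
a0 = 2 - sqrt(2);
v = [0; 0; 1];                       % initial point (x2, x3, lambda)
for k = 1:200
    v = P(a0)*v;
end
disp(v');   % after an even number of steps: close to (-sqrt(2)/4, -sqrt(2)/4, 1/2)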

Now let us consider some special cases and extensions of MHD-ALM:

(1) Problem (1) with θ1 = 0 and A1 = 0 reduces to a two-block separable convex programming problem, which can be solved by MHD-ALM as follows:

$$\begin{cases}\tilde{x}_2^k=\arg\min\{L_\beta(x_2,x_3^k,\lambda^k)\mid x_2\in X_2\},\\ \tilde{x}_3^k=\arg\min\{L_\beta(x_2^k,x_3,\lambda^k)\mid x_3\in X_3\},\\ \tilde{\lambda}^k=\lambda^k-\beta(A_2\tilde{x}_2^k+A_3\tilde{x}_3^k-b),\end{cases}\tag{26}$$

and

$$\begin{cases}x_2^{k+1}=x_2^k-\alpha(x_2^k-\tilde{x}_2^k),\\ x_3^{k+1}=x_3^k-\alpha(x_3^k-\tilde{x}_3^k),\\ \lambda^{k+1}=\lambda^k-\alpha(\lambda^k-\tilde{\lambda}^k).\end{cases}\tag{27}$$

Since the iterative scheme (26)–(27) is a special case of MHD-ALM, it is convergent for any α ∈ (0, 2−√2). Furthermore, by Example 3.1, 2−√2 is the optimal upper bound of the constant step size α in (26)–(27).

(2) Similarly, problem (1) with θ3 = 0 and A3 = 0 also reduces to a two-block separable convex programming problem, which can be solved by MHD-ALM as follows:

$$\begin{cases}\tilde{x}_1^k=\arg\min\{L_\beta(x_1,x_2^k,\lambda^k)\mid x_1\in X_1\},\\ \tilde{x}_2^k=\arg\min\{L_\beta(\tilde{x}_1^k,x_2,\lambda^k)\mid x_2\in X_2\},\\ \tilde{\lambda}^k=\lambda^k-\beta(A_1\tilde{x}_1^k+A_2\tilde{x}_2^k-b),\end{cases}\tag{28}$$

and

$$\begin{cases}x_1^{k+1}=\tilde{x}_1^k,\\ x_2^{k+1}=x_2^k-\alpha(x_2^k-\tilde{x}_2^k),\\ \lambda^{k+1}=\lambda^k-\alpha(\lambda^k-\tilde{\lambda}^k).\end{cases}\tag{29}$$

Following a similar analysis procedure, we can prove that the iterative scheme (28)–(29) is convergent for any α ∈ (0, 1).

(3) Extending MHD-ALM to solve the four-block separable convex programming problem

$$\min\Bigl\{\sum_{i=1}^{4}\theta_i(x_i)\ \Bigm|\ \sum_{i=1}^{4}A_ix_i=b,\ x_i\in X_i,\ i=1,2,3,4\Bigr\},$$

we get the following iterative scheme:

$$\begin{cases}\tilde{x}_1^k=\arg\min\{L_\beta(x_1,x_2^k,x_3^k,x_4^k,\lambda^k)\mid x_1\in X_1\},\\ \tilde{x}_2^k=\arg\min\{L_\beta(\tilde{x}_1^k,x_2,x_3^k,x_4^k,\lambda^k)\mid x_2\in X_2\},\\ \tilde{x}_3^k=\arg\min\{L_\beta(\tilde{x}_1^k,x_2^k,x_3,x_4^k,\lambda^k)\mid x_3\in X_3\},\\ \tilde{x}_4^k=\arg\min\{L_\beta(\tilde{x}_1^k,x_2^k,x_3^k,x_4,\lambda^k)\mid x_4\in X_4\},\\ \tilde{\lambda}^k=\lambda^k-\beta\Bigl(\sum_{i=1}^{4}A_i\tilde{x}_i^k-b\Bigr),\end{cases}\tag{30}$$

and

$$\begin{cases}x_1^{k+1}=\tilde{x}_1^k,\\ x_2^{k+1}=x_2^k-\alpha(x_2^k-\tilde{x}_2^k),\\ x_3^{k+1}=x_3^k-\alpha(x_3^k-\tilde{x}_3^k),\\ x_4^{k+1}=x_4^k-\alpha(x_4^k-\tilde{x}_4^k),\\ \lambda^{k+1}=\lambda^k-\alpha(\lambda^k-\tilde{\lambda}^k).\end{cases}\tag{31}$$

With similar reasoning, we can prove that the iterative scheme (30)–(31) is convergent for any α ∈ (0, 2−√3).

Numerical results

In this section, we demonstrate the practical efficiency of MHD-ALM by applying it to recover low-rank and sparse components of matrices from incomplete and noisy observations. Furthermore, to give more insight into the behavior of MHD-ALM, we compare it with the full Jacobian decomposition of the augmented Lagrangian method (FJD-ALM) [12] and the proximal partially parallel splitting method with constant step size (PPPSM) [25]. All experiments are performed on a PC with a Pentium(R) Dual-Core T4400 CPU (2.2 GHz) and 4 GB of RAM, running a 64-bit Windows operating system.

The mathematical model of recovering low-rank and sparse components of matrices from incomplete and noisy observations is [20]

$$\min_{L,S,U}\Bigl\{\|L\|_*+\tau\|S\|_1+\frac{1}{2\mu}\|P_\Omega(U)\|_F^2\Bigr\}\quad\text{s.t.}\quad L+S+U=D,\tag{32}$$

where D ∈ R^{p×q} is a given matrix, ‖·‖_* denotes the nuclear norm (the sum of the singular values), ‖·‖_1 denotes the sum of the absolute values of the entries, τ > 0 is a balancing parameter, μ > 0 is a penalty parameter, Ω ⊆ {1, 2, …, p} × {1, 2, …, q} is the index set of the observable entries of D, and P_Ω : R^{p×q} → R^{p×q} is the projection operator defined by

$$[P_\Omega(X)]_{ij}=\begin{cases}X_{ij},&\text{if }(i,j)\in\Omega,\\ 0,&\text{otherwise},\end{cases}\qquad 1\le i\le p,\ 1\le j\le q.$$

Problem (32) is a concrete model of the generic problem (1), and MHD-ALM is applicable. For this problem, the three minimization problems in (13) all admit closed-form solutions, which can be found in [20]; a sketch of these closed forms is given below.
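For reference, the following MATLAB sketch spells out such closed forms in the standard way (singular value thresholding for L, soft thresholding for S, and a componentwise scaling for U). The assignment of (L, S, U) to (x1, x2, x3), the toy data, and the parameter values are assumptions of this sketch only; the exact implementation used in the experiments follows [20].

% One prediction step (13) for model (32), assuming x1 = L, x2 = S, x3 = U.
p = 50; q = 40; D = randn(p,q);              % toy data (assumption)
Omega = rand(p,q) < 0.8;                     % logical mask of observed entries
S = zeros(p,q); U = zeros(p,q); Lambda = zeros(p,q);
beta = 0.1; tau = 1/sqrt(p); mu = 0.01;
soft = @(G,t) sign(G).*max(abs(G) - t, 0);   % soft-thresholding operator

% L-subproblem: singular value thresholding with threshold 1/beta.
GL = D - S - U + Lambda/beta;
[Us,Sg,Vs] = svd(GL, 'econ');
Lt = Us*diag(soft(diag(Sg), 1/beta))*Vs';

% S-subproblem (uses the new Lt and the old U): soft thresholding at tau/beta.
St = soft(D - Lt - U + Lambda/beta, tau/beta);

% U-subproblem (uses the new Lt and the old S): componentwise scaling of the
% observed entries, identity on the unobserved ones.
GU = D - Lt - S + Lambda/beta;
Ut = GU;
Ut(Omega) = (beta*mu/(1 + beta*mu))*GU(Omega);

% Multiplier prediction, as in (13) with A1 = A2 = A3 = I and b = D.
Lambdat = Lambda - beta*(Lt + St + Ut - D);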

Simulation example

We generate the synthetic data for (32) in the same way as [5, 20]. Specifically, let L^* and S^* be the low-rank matrix and the sparse matrix, respectively, and let rr, spr, and sr denote the rank ratio of L^* (i.e., r/p), the sparsity ratio of S^* (i.e., ‖S^*‖_0/(pq)), and the sample ratio of observed entries (i.e., |Ω|/(pq)), respectively. The observed part of the matrix D is generated by the following MATLAB script, in which b is the vector of noisy observed entries of D:

X = randn(m, rr*m)*randn(rr*m, n);      % low-rank component L*
Omega = randperm(m*n);
p = round(sr*m*n);                      % number of observed entries
Omega = Omega(1:p); Omega = Omega';
Y = zeros(m, n);
L = round(min(spr, sr)*m*n);            % number of nonzero entries of S*
nzp = Omega(1:L);
Y(nzp) = (rand(L,1)*2 - 1)*500;         % sparse component S*
b = X + Y;
b = b(Omega);                           % keep only the observed entries
sigma = 0.001;
b = b + sigma*randn(p, 1);              % add Gaussian noise

In this experiment, we set τ = 1/√p, μ = (√p + √(8p))σ/10, β = 0.06|Ω|/‖P_Ω(D)‖_1, and the initial iterate (L^0, S^0, U^0, λ^0) = (0, 0, 0, 0), and use the stopping criterion

$$\max\left\{\frac{\|\tilde{L}^k-L^k\|}{1+\|L^k\|},\ \frac{\|\tilde{S}^k-S^k\|}{1+\|S^k\|}\right\}<10^{-4},\tag{33}$$

or the number of iterations exceeds 500.

The parameters in the three tested methods are listed as follows:

  • FJD-ALM: α=0.38.

  • PPPSM: Si=0, H=βI, Q=I, α=0.36.

  • MHD-ALM: α=0.5.

In Tables 1 and 2, we report the numerical results of the three tested methods, namely the number of iterations (denoted by 'Iter.'), the elapsed CPU time in seconds (denoted by 'Time'), the relative error of the recovered low-rank matrix ‖L^k − L^*‖/‖L^*‖, and the relative error of the recovered sparse matrix ‖S^k − S^*‖/‖S^*‖, all recorded when the stopping criterion (33) is satisfied.

Table 1.

Numerical comparisons between different algorithms for p=q=500

rr   spr   sr   Method   Iter.   Time (s)   ‖L^k − L^*‖/‖L^*‖   ‖S^k − S^*‖/‖S^*‖
0.05 0.05 0.9 FJD-ALM 97 7.42 5.7325e−04 9.7830e−05
PPPSM 133 14.70 2.4034e−04 3.6358e−05
MHD-ALM 44 5.17 7.5910e−04 1.5025e−04
0.05 0.05 0.6 FJD-ALM 120 11.03 1.6528e−03 1.3102e−04
PPPSM 181 17.62 5.4329e−04 5.1513e−05
MHD-ALM 75 6.76 1.7148e−03 1.1343e−04
0.1 0.1 0.9 FJD-ALM 127 11.39 2.2566e−03 2.2374e−04
PPPSM 175 17.79 5.2215e−04 6.1755e−05
MHD-ALM 80 6.86 1.5097e−03 1.5994e−04
0.1 0.1 0.8 FJD-ALM 188 17.45 4.2439e−03 2.7536e−04
PPPSM 222 20.78 1.2697e−03 9.5339e−05
MHD-ALM 119 10.15 2.8338e−03 1.9242e−04
0.1 0.15 0.9 FJD-ALM 209 20.18 3.9031e−03 2.3423e−04
PPPSM 222 25.95 1.3444e−03 9.5133e−05
MHD-ALM 120 12.37 3.2420e−03 2.0704e−04

Table 2.

Numerical comparisons between different algorithms for p=q=1000

rr   spr   sr   Method   Iter.   Time (s)   ‖L^k − L^*‖/‖L^*‖   ‖S^k − S^*‖/‖S^*‖
0.05 0.05 0.9 FJD-ALM 105 93.16 2.6808e−04 6.1154e−05
PPPSM 129 131.08 9.9216e−05 2.3683e−05
MHD-ALM 46 46.82 3.5053e−04 9.5144e−05
0.05 0.05 0.6 FJD-ALM 139 117.67 7.1439e−04 6.8845e−05
PPPSM 147 138.08 2.1260e−04 3.4427e−05
MHD-ALM 57 53.31 5.9883e−04 9.8047e−05
0.1 0.1 0.9 FJD-ALM 119 100.13 9.3082e−04 1.6381e−04
PPPSM 188 179.91 2.3296e−04 4.2820e−05
MHD-ALM 61 57.04 8.9591e−04 1.6198e−04
0.1 0.1 0.8 FJD-ALM 133 114.80 1.2760e−03 1.8083e−04
PPPSM 192 182.46 3.7423e−04 5.2624e−05
MHD-ALM 76 67.08 1.3497e−03 1.6979e−04
0.1 0.15 0.9 FJD-ALM 139 117.89 1.9819e−03 2.3040e−04
PPPSM 206 188.47 4.4773e−04 5.7452e−05
MHD-ALM 84 76.34 1.6396e−03 1.8795e−04

Numerical results in Tables 1 and 2 indicate that: (i) all methods successfully solved all the tested cases; (ii) both MHD-ALM and PPPSM perform better than FJD-ALM, and MHD-ALM performed the best. The reason may be that FJD-ALM updates all the primal variables in a parallel manner, while PPPSM and MHD-ALM update x2, x3 based on the newest x1, which accelerates the convergence. Furthermore, the step size α of MHD-ALM is larger than that of PPPSM, which in turn is larger than that of FJD-ALM. Therefore, larger values of α can enhance the efficiency of the corresponding method.

Application example

In this subsection, we apply the proposed method to solve the video background extraction problem with missing and noisy data [29]. The test video, taken in an airport, consists of 200 grayscale frames, each with 144 × 176 pixels, and we need to separate its background and foreground. Vectorizing all frames of the video, we obtain a matrix D ∈ R^{25,344×200}, each column of which represents a frame. Let L^*, S^* ∈ R^{25,344×200} be the matrix representations of the background and the foreground (i.e., the moving objects), respectively. Then the rank of L^* is exactly one, and S^* should be sparse with only a small number of nonzero elements. We consider the case where only a fraction of the entries of D can be observed, with indices collected in the index set Ω. Then the background extraction problem with missing and noisy data can be cast as problem (32). In the experiment, the parameters of MHD-ALM are set as α = 0.5 and β = 0.005|Ω|/‖P_Ω(D)‖_1, the parameters in (32) are set as τ = 1/√p and μ = 0.01, and the initial iterate is (L^0, S^0, U^0, λ^0) = (0, 0, 0, 0). We use the same stopping criterion as (33) with tolerance 10^{-2}.

Figure 2 displays the separation results of the 10th and 125th frames of the video with sr=0.7, which indicate that the proposed MHD-ALM successfully separates the background and foreground of the two frames.

Figure 2. The 10th and 125th frames of the clean video and the corresponding corrupted frames with sr = 0.7 (the first and third rows); the background and foreground extracted by MHD-ALM (the second and fourth rows)

Conclusion

In this paper, a modified hybrid decomposition of the augmented Lagrangian method is proposed for three-block separable convex programming, whose most important characteristic is that its correction step adopts a constant step size. We showed that the optimal upper bound of the constant step size is 2−√2. Preliminary numerical results indicate that the proposed method is more efficient than similar methods in the literature.

The following two issues deserve further research: (i) Since Condition 2.1 is only a sufficient condition for the convergence of ProAlo, is 1 the optimal upper bound of α in the iterative scheme (28)–(29)? Similarly, is 2−√3 the optimal upper bound of α in the iterative scheme (30)–(31)? (ii) If different step sizes are chosen for x2, x3, λ in the correction step of MHD-ALM, the feasible set of these step sizes needs more discussion.

Acknowledgements

The authors gratefully acknowledge the valuable comments of the editor and the anonymous reviewers.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Authors’ contributions

The first author provided the problems and gave the proof of the main results, and the second author finished the numerical experiment. All authors read and approved the final manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (Nos. 11671228, 11601475) and the Natural Science Foundation of Shandong Province (No. ZR2016AL05).

Competing interests

The authors declare that there are no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Min Sun, Email: ziyouxiaodou@163.com.

Yiju Wang, Email: wang-yiju@163.com.

References

  • 1. Chen S.S., Donoho D.L., Saunders M.A. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 1998;20:33–61. doi:10.1137/S1064827596304010.
  • 2. Sun M., Liu J. A proximal Peaceman–Rachford splitting method for compressive sensing. J. Appl. Math. Comput. 2016;50:349–363. doi:10.1007/s12190-015-0874-x.
  • 3. Sun M., Liu J. An accelerated proximal augmented Lagrangian method and its application in compressive sensing. J. Inequal. Appl. 2017;2017:263. doi:10.1186/s13660-017-1539-0.
  • 4. Candès E.J., Li X.D., Ma Y., Wright J. Robust principal component analysis? J. ACM. 2011;58(1):1–37.
  • 5. Tao M., Yuan X.M. Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 2011;21:57–81. doi:10.1137/100781894.
  • 6. Sun M., Wang Y.J., Liu J. Generalized Peaceman–Rachford splitting method for multiple-block separable convex programming with applications to robust PCA. Calcolo. 2017;54(1):77–94. doi:10.1007/s10092-016-0177-0.
  • 7. Sun M., Sun H.C., Wang Y.J. Two proximal splitting methods for multi-block separable programming with applications to stable principal component pursuit. J. Appl. Math. Comput. 2018;56:411–438. doi:10.1007/s12190-017-1080-9.
  • 8. He B.S., Yuan X.M., Zhang W.X. A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 2013;56:559–572. doi:10.1007/s10589-013-9564-5.
  • 9. He B.S., Liu H., Wang Z.R., Yuan X.M. A strictly contractive Peaceman–Rachford splitting method for convex programming. SIAM J. Optim. 2014;24(3):1011–1040. doi:10.1137/13090849X.
  • 10. He B.S., Tao M., Yuan X.M. A splitting method for separable convex programming. IMA J. Numer. Anal. 2015;35(1):394–426. doi:10.1093/imanum/drt060.
  • 11. Gabay D., Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 1976;2:17–40. doi:10.1016/0898-1221(76)90003-1.
  • 12. He B.S., Hou L.S., Yuan X.M. On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 2015;25(4):2274–2312. doi:10.1137/130922793.
  • 13. Han D.R., Kong W.W., Zhang W.X. A partial splitting augmented Lagrangian method for low patch-rank image decomposition. J. Math. Imaging Vis. 2015;51(1):145–160. doi:10.1007/s10851-014-0510-7.
  • 14. Wang Y.J., Zhou G.L., Caccetta L., Liu W.Q. An alternative Lagrange-dual based algorithm for sparse signal reconstruction. IEEE Trans. Signal Process. 2011;59:1895–1901. doi:10.1109/TSP.2010.2103066.
  • 15. Wang Y.J., Liu W.Q., Caccetta L., Zhou G.L. Parameter selection for nonnegative ℓ1 matrix/tensor sparse decomposition. Oper. Res. Lett. 2015;43:423–426. doi:10.1016/j.orl.2015.06.005.
  • 16. Hestenes M. Multiplier and gradient methods. J. Optim. Theory Appl. 1969;4:303–320. doi:10.1007/BF00927673.
  • 17. Chen C.H., He B.S., Ye Y.Y., Yuan X.M. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 2016;155(1):57–79. doi:10.1007/s10107-014-0826-5.
  • 18. Sun D.F., Toh K.C., Yang L. A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type of constraints. SIAM J. Optim. 2015;25:882–915. doi:10.1137/140964357.
  • 19. He B.S., Xu H.K., Yuan X.M. On the proximal Jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM. J. Sci. Comput. 2016;66(3):1204–1217. doi:10.1007/s10915-015-0060-1.
  • 20. Hou L.S., He H.J., Yang J.F. A partially parallel splitting method for multiple-block separable convex programming with applications to robust PCA. Comput. Optim. Appl. 2016;63(1):273–303. doi:10.1007/s10589-015-9770-4.
  • 21. Wang J.J., Song W. An algorithm twisted from generalized ADMM for multi-block separable convex minimization models. J. Comput. Appl. Math. 2017;309:342–358. doi:10.1016/j.cam.2016.02.001.
  • 22. Sun M., Liu J. The convergence rate of the proximal alternating direction method of multipliers with indefinite proximal regularization. J. Inequal. Appl. 2017;2017:19. doi:10.1186/s13660-017-1295-1.
  • 23. Sun M., Sun H.C. Improved proximal ADMM with partially parallel splitting for multi-block separable convex programming. J. Appl. Math. Comput. 2018;58:151–181. doi:10.1007/s12190-017-1138-8.
  • 24. He B.S., Tao M., Yuan X.M. Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 2012;22:313–340. doi:10.1137/110822347.
  • 25. Wang K., Desai J., He H.J. A proximal partially-parallel splitting method for separable convex programs. Optim. Methods Softw. 2017;32(1):39–68. doi:10.1080/10556788.2016.1200044.
  • 26. Chang X.K., Liu S.Y., Zhao P.J., Li X. Convergent prediction-correction-based ADMM for multi-block separable convex programming. J. Comput. Appl. Math. 2018;335:270–288. doi:10.1016/j.cam.2017.11.033.
  • 27. He B.S., Ma F., Yuan X.M. Linearized alternating direction method of multipliers via positive-indefinite proximal regularization for convex programming. Optimization Online, 5569 (2016).
  • 28. He B.S., Yuan X.M. On the direct extension of ADMM for multi-block separable convex programming and beyond: from variational inequality perspective. Optimization Online, 4293 (2014).
  • 29. Bouwmans T. Traditional and recent approaches in background modeling for foreground detection: an overview. Comput. Sci. Rev. 2014;11–12:31–66. doi:10.1016/j.cosrev.2014.04.001.


