2018 Dec 11;68(10):1855–1880. doi: 10.1080/02331934.2018.1556662

A forward–backward penalty scheme with inertial effects for monotone inclusions. Applications to convex bilevel programming

Radu Ioan Boţ, Dang-Khoa Nguyen
PMCID: PMC6817331  PMID: 31708644

ABSTRACT

We investigate a forward–backward splitting algorithm of penalty type with inertial effects for finding the zeros of the sum of a maximally monotone operator, a cocoercive operator and the convex normal cone to the set of zeros of another cocoercive operator. Weak ergodic convergence is obtained for the iterates, provided that a condition expressed via the Fitzpatrick function of the operator describing the underlying set of the normal cone is verified. Under strong monotonicity assumptions, strong convergence of the sequence of generated iterates is proved. As a particular instance we consider a convex bilevel minimization problem with the sum of a non-smooth and a smooth function in the upper level and another smooth function in the lower level. We show that in this context weak non-ergodic and strong convergence can also be achieved under inf-compactness assumptions for the involved functions.

KEYWORDS: Maximally monotone operator, Fitzpatrick function, forward–backward splitting algorithm, convex bilevel optimization

AMS SUBJECT CLASSIFICATIONS: 47H05, 65K05, 90C25

1. Introduction and preliminaries

1.1. Motivation and problem formulation

During the last couple of years one can observe in the optimization community an increasing interest in numerical schemes for solving variational inequalities expressed as monotone inclusion problems of the form

$0\in Ax+N_M(x)$, (1)

where $\mathcal{H}$ is a real Hilbert space, $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ is a maximally monotone operator, $M:=\operatorname{argmin}h$ is the set of global minima of the proper, convex and lower semicontinuous function $h\colon\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$ and $N_M\colon\mathcal{H}\rightrightarrows\mathcal{H}$ is the normal cone operator of the set $M$. The article [1] was the starting point for a series of papers [2–12] addressing this topic or related ones. All these papers share the common feature that the proposed iterative schemes use penalization strategies: they evaluate the penalized function $h$ by its gradient when the function is smooth (see, for instance, [3]) and by its proximal operator when it is non-smooth (see, for instance, [4]).

Weak ergodic convergence has been obtained in [3,4] under the hypothesis:

For all $p\in\operatorname{Ran}N_M$: $\sum_{n\geq1}\lambda_n\beta_n\left[h^*\!\left(\frac{p}{\beta_n}\right)-\sigma_M\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$, (2)

with $(\lambda_n)_{n\geq1}$ the sequence of step sizes, $(\beta_n)_{n\geq1}$ the sequence of penalty parameters, $h^*\colon\mathcal{H}\to\overline{\mathbb{R}}$ the Fenchel conjugate function of $h$, and $\operatorname{Ran}N_M$ the range of the normal cone operator $N_M\colon\mathcal{H}\rightrightarrows\mathcal{H}$. Let us mention that (2) is the discretized counterpart of a condition introduced in [1] for continuous-time non-autonomous differential inclusions.

One motivation for studying numerical algorithms for monotone inclusions of type (1) comes from the fact that, when $A=\partial f$ is the convex subdifferential of a proper, convex and lower semicontinuous function $f\colon\mathcal{H}\to\overline{\mathbb{R}}$, they furnish iterative methods for solving bilevel optimization problems of the form

$\min_{x\in\mathcal{H}}\{f(x):x\in\operatorname{argmin}h\}$. (3)

Among the applications where bilevel programming problems play an important role we mention the modelling of Stackelberg games, the determination of Wardrop equilibria for network flows, convex feasibility problems [13], domain decomposition methods for PDEs [14], image processing problems [6], and optimal control problems [4].

Later on, in [7], the following monotone inclusion problem, which turned out to be more suitable for applications, has been addressed in the same spirit of penalty algorithms:

$0\in Ax+Dx+N_M(x)$, (4)

where $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ is a maximally monotone operator, $D\colon\mathcal{H}\to\mathcal{H}$ is a cocoercive operator and the constraint set $M$ is the set of zeros of another cocoercive operator $B\colon\mathcal{H}\to\mathcal{H}$. The provided algorithm of forward–backward type evaluates the operator $A$ by a backward step and the two single-valued operators by forward steps. For the convergence analysis, (2) has been replaced by a condition formulated in terms of the Fitzpatrick function associated with the operator $B$, which we will also use in this paper. In [5], several particular situations for which this new condition is fulfilled have been provided.

The aim of this work is to endow the forward–backward penalty scheme for solving (4) from [7] with inertial effects, which means that the new iterate is defined in terms of the previous two iterates. Inertial algorithms have their roots in the time discretization of second-order differential systems [15]. They can accelerate the convergence of iterates when minimizing a differentiable function [16] and the convergence of the objective function values when minimizing the sum of a convex non-smooth and a convex smooth function [17,18]. Moreover, as emphasized in [19], see also [20], algorithms with inertial effects may detect optimal solutions of minimization problems which cannot be found by their non-inertial variants. In recent years, a huge interest in inertial algorithms can be noticed (see, for instance, [8,9,15,17,20–32]).

We prove weak ergodic convergence of the sequence generated by the inertial forward–backward penalty algorithm to a solution of the monotone inclusion problem (4), under reasonable assumptions for the sequences of step sizes, penalty and inertial parameters. When the operator A is assumed to be strongly monotone, we also prove strong convergence of the generated iterates to the unique solution of (4).

In Section 3, we address the minimization of the sum of a convex non-smooth and a convex smooth function with respect to the set of minimizers of another convex and smooth function. Besides the convergence results obtained from the general case, we achieve weak non-ergodic and strong convergence statements under inf-compactness assumptions for the involved functions. The weak non-ergodic theorem is a useful alternative to the one in [9], where a similar statement has been obtained for the inertial forward–backward penalty algorithm with constant inertial parameter under assumptions which are quite complicated and hard to verify (see also [11,12]).

1.2. Notations and preliminaries

In this subsection we introduce some notions and basic results which we will use throughout this paper (see [33–35]). Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and associated norm $\|\cdot\|=\sqrt{\langle\cdot,\cdot\rangle}$.

For a function $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$, we denote by $\operatorname{Dom}\Psi=\{x\in\mathcal{H}:\Psi(x)<+\infty\}$ its effective domain and say that $\Psi$ is proper if $\operatorname{Dom}\Psi\neq\emptyset$ and $\Psi(x)>-\infty$ for all $x\in\mathcal{H}$. The conjugate function of $\Psi$ is $\Psi^*\colon\mathcal{H}\to\overline{\mathbb{R}}$, $\Psi^*(u)=\sup_{x\in\mathcal{H}}\{\langle x,u\rangle-\Psi(x)\}$. The convex subdifferential of $\Psi$ at the point $x\in\mathcal{H}$ is the set $\partial\Psi(x)=\{p\in\mathcal{H}:\langle y-x,p\rangle\leq\Psi(y)-\Psi(x)\ \forall y\in\mathcal{H}\}$, whenever $\Psi(x)\in\mathbb{R}$. We take by convention $\partial\Psi(x):=\emptyset$ if $\Psi(x)\in\{\pm\infty\}$.

Let $M$ be a non-empty subset of $\mathcal{H}$. The indicator function of $M$, denoted by $\delta_M\colon\mathcal{H}\to\overline{\mathbb{R}}$, takes the value $0$ on $M$ and $+\infty$ otherwise. The convex subdifferential of the indicator function is the normal cone of $M$, that is, $N_M(x)=\{p\in\mathcal{H}:\langle y-x,p\rangle\leq0\ \forall y\in M\}$ if $x\in M$, and $N_M(x)=\emptyset$ otherwise. Notice that for $x\in M$ we have $p\in N_M(x)$ if and only if $\sigma_M(p)=\langle x,p\rangle$, where $\sigma_M=\delta_M^*$ is the support function of $M$.

For an arbitrary set-valued operator $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ we denote by $\operatorname{Gr}A=\{(x,v)\in\mathcal{H}\times\mathcal{H}:v\in Ax\}$ its graph, by $\operatorname{Dom}A=\{x\in\mathcal{H}:Ax\neq\emptyset\}$ its domain, by $\operatorname{Ran}A=\{v\in\mathcal{H}:\exists x\in\mathcal{H}\text{ with }v\in Ax\}$ its range and by $A^{-1}\colon\mathcal{H}\rightrightarrows\mathcal{H}$ its inverse operator, defined by $(v,x)\in\operatorname{Gr}A^{-1}$ if and only if $(x,v)\in\operatorname{Gr}A$. We also use the notation $\operatorname{Zer}A=\{x\in\mathcal{H}:0\in Ax\}$ for the set of zeros of the operator $A$. We say that $A$ is monotone if $\langle x-y,v-w\rangle\geq0$ for all $(x,v),(y,w)\in\operatorname{Gr}A$. A monotone operator $A$ is said to be maximally monotone if there exists no proper monotone extension of the graph of $A$ on $\mathcal{H}\times\mathcal{H}$. Let us mention that if $A$ is maximally monotone, then $\operatorname{Zer}A$ is a convex and closed set [33, Proposition 23.39]. We refer to [33, Section 23.4] for conditions ensuring that $\operatorname{Zer}A$ is non-empty. If $A$ is maximally monotone, then one has the following characterization for the set of its zeros:

$z\in\operatorname{Zer}A$ if and only if $\langle u-z,y\rangle\geq0$ for all $(u,y)\in\operatorname{Gr}A$. (5)

The operator $A$ is said to be $\gamma$-strongly monotone with $\gamma>0$ if $\langle x-y,v-w\rangle\geq\gamma\|x-y\|^2$ for all $(x,v),(y,w)\in\operatorname{Gr}A$. If $A$ is maximally monotone and strongly monotone, then $\operatorname{Zer}A$ is a singleton, thus non-empty [33, Corollary 23.27].

The resolvent of $A$, $J_A\colon\mathcal{H}\rightrightarrows\mathcal{H}$, is defined by $J_A:=(\operatorname{Id}+A)^{-1}$, where $\operatorname{Id}\colon\mathcal{H}\to\mathcal{H}$ denotes the identity operator on $\mathcal{H}$. If $A$ is maximally monotone, then $J_A\colon\mathcal{H}\to\mathcal{H}$ is single-valued and maximally monotone [33, Proposition 23.7, Corollary 23.10]. For an arbitrary $\gamma>0$ we have the following identity [33, Proposition 23.18]:

$J_{\gamma A}+\gamma J_{\gamma^{-1}A^{-1}}\circ\gamma^{-1}\operatorname{Id}=\operatorname{Id}$.

We denote by $\Gamma(\mathcal{H})$ the family of proper, convex and lower semicontinuous extended real-valued functions defined on $\mathcal{H}$. When $\Psi\in\Gamma(\mathcal{H})$ and $\gamma>0$, we denote by $\operatorname{prox}_{\gamma\Psi}(x)$ the proximal point with parameter $\gamma$ of the function $\Psi$ at the point $x\in\mathcal{H}$, which is the unique optimal solution of the optimization problem

$\inf_{y\in\mathcal{H}}\left\{\Psi(y)+\frac{1}{2\gamma}\|y-x\|^2\right\}$.

Notice that $J_{\gamma\partial\Psi}=(\operatorname{Id}+\gamma\partial\Psi)^{-1}=\operatorname{prox}_{\gamma\Psi}$, thus $\operatorname{prox}_{\gamma\Psi}\colon\mathcal{H}\to\mathcal{H}$ is a single-valued operator fulfilling the so-called Moreau decomposition formula:

$\operatorname{prox}_{\gamma\Psi}+\gamma\operatorname{prox}_{\gamma^{-1}\Psi^*}\circ\gamma^{-1}\operatorname{Id}=\operatorname{Id}$.
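As a quick numerical illustration (a sketch of ours, not part of the paper), Moreau's decomposition can be checked in one dimension for $\Psi=|\cdot|$: its proximal map is the soft-thresholding operator, while $\Psi^*=\delta_{[-1,1]}$, whose proximal map (for any parameter) is the projection onto $[-1,1]$.

```python
# Numerical check of Moreau's decomposition
#   prox_{gamma*Psi}(x) + gamma * prox_{Psi*/gamma}(x/gamma) = x
# for Psi = |.| on the real line. Here prox_{gamma*Psi} is soft-thresholding
# and Psi* is the indicator of [-1, 1], whose prox is the projection.

def soft_threshold(x, gamma):
    # prox of gamma * |.|
    return max(abs(x) - gamma, 0.0) * (1.0 if x >= 0 else -1.0)

def project_interval(x):
    # prox of the indicator of [-1, 1], i.e. the projection onto [-1, 1]
    return min(max(x, -1.0), 1.0)

gamma = 0.7
for x in [-3.0, -0.5, 0.0, 0.2, 1.5]:
    lhs = soft_threshold(x, gamma) + gamma * project_interval(x / gamma)
    assert abs(lhs - x) < 1e-12
```

The identity holds exactly in both regimes: for $|x|\leq\gamma$ the proximal point is $0$ and the conjugate part returns $x$, while for $|x|>\gamma$ the two terms contribute $x\mp\gamma$ and $\pm\gamma$.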

The function $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}$ is said to be $\gamma$-strongly convex with $\gamma>0$ if $\Psi-\frac{\gamma}{2}\|\cdot\|^2$ is a convex function. This property implies that $\partial\Psi$ is $\gamma$-strongly monotone.

The Fitzpatrick function [36] associated with a monotone operator $A$ is defined as

$\varphi_A\colon\mathcal{H}\times\mathcal{H}\to\overline{\mathbb{R}},\quad\varphi_A(x,u):=\sup_{(y,v)\in\operatorname{Gr}A}\{\langle x,v\rangle+\langle y,u\rangle-\langle y,v\rangle\}$,

and it is a convex and lower semicontinuous function. For insights into the outstanding role played by the Fitzpatrick function in relating convex analysis with the theory of monotone operators we refer to [33,34,37–39] and the references therein. If $A$ is maximally monotone, then $\varphi_A$ is proper and it fulfils

$\varphi_A(x,u)\geq\langle x,u\rangle\quad\forall(x,u)\in\mathcal{H}\times\mathcal{H}$,

with equality if and only if $(x,u)\in\operatorname{Gr}A$. Notice that if $\Psi\in\Gamma(\mathcal{H})$, then $\partial\Psi$ is a maximally monotone operator and it holds $(\partial\Psi)^{-1}=\partial\Psi^*$. Furthermore, the following inequality is true (see [37]):

$\varphi_{\partial\Psi}(x,u)\leq\Psi(x)+\Psi^*(u)\quad\forall(x,u)\in\mathcal{H}\times\mathcal{H}$. (6)
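Inequality (6) can be made concrete in one dimension (an illustration of ours, not from the paper): for $\Psi(x)=\frac{x^2}{2}$ one has $\partial\Psi=\operatorname{Id}$, $\Psi^*=\Psi$, and the Fitzpatrick function evaluates in closed form to $\varphi_{\operatorname{Id}}(x,u)=\frac{(x+u)^2}{4}$, so (6) reduces to $\frac{(x+u)^2}{4}\leq\frac{x^2}{2}+\frac{u^2}{2}$, with equality exactly on the graph $u=x$.

```python
# 1D illustration of inequality (6): phi_{dPsi}(x, u) <= Psi(x) + Psi*(u)
# for Psi(x) = x^2 / 2, so dPsi = Id, Psi* = Psi, and
# phi(x, u) = sup_y { x*y + y*u - y^2 } = (x + u)^2 / 4.

def fitzpatrick_id(x, u):
    # approximate the supremum over y on a fine grid of [-10, 10]
    # (the exact maximizer y = (x + u) / 2 lies in this range below)
    grid = [i * 0.001 - 10.0 for i in range(20001)]
    return max(x * y + y * u - y * y for y in grid)

for (x, u) in [(0.0, 0.0), (1.0, 1.0), (2.0, -1.0), (-0.5, 3.0)]:
    phi = fitzpatrick_id(x, u)
    assert phi <= 0.5 * x * x + 0.5 * u * u + 1e-6   # inequality (6)
    assert abs(phi - (x + u) ** 2 / 4) < 1e-3        # closed form
```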

We present in the following some statements that will be essential when carrying out the convergence analysis. Let $(x_n)_{n\geq0}$ be a sequence in $\mathcal{H}$ and $(\lambda_n)_{n\geq1}$ be a sequence of positive real numbers. The sequence of weighted averages $(z_n)_{n\geq1}$ is defined for every $n\geq1$ as

$z_n:=\frac{1}{\tau_n}\sum_{k=1}^n\lambda_kx_k,\quad\text{where }\tau_n:=\sum_{k=1}^n\lambda_k$. (7)
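In an implementation, the ergodic averages in (7) need not be recomputed from scratch: they satisfy the running-average recursion $z_n=z_{n-1}+\frac{\lambda_n}{\tau_n}(x_n-z_{n-1})$, so no past iterates have to be stored. A small sketch (ours, not from the paper), written for scalar iterates:

```python
# Incremental computation of the weighted ergodic averages z_n from (7):
#   z_n = z_{n-1} + (lambda_n / tau_n) * (x_n - z_{n-1}),  tau_n = sum of weights.

def ergodic_averages(xs, lams):
    z, tau, out = 0.0, 0.0, []
    for x, lam in zip(xs, lams):
        tau += lam
        z += (lam / tau) * (x - z)  # running weighted average
        out.append(z)
    return out

xs = [4.0, 2.0, 1.0]
lams = [1.0, 1.0, 2.0]
# direct evaluation of (7): z_3 = (1*4 + 1*2 + 2*1) / 4 = 2.0
assert abs(ergodic_averages(xs, lams)[-1] - 2.0) < 1e-12
```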

Lemma 1.1 Opial–Passty —

Let $Z$ be a non-empty subset of $\mathcal{H}$ and assume that the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists for every element $u\in Z$. If every sequential weak cluster point of $(x_n)_{n\geq0}$, respectively $(z_n)_{n\geq1}$, lies in $Z$, then the sequence $(x_n)_{n\geq0}$, respectively $(z_n)_{n\geq1}$, converges weakly to an element in $Z$ as $n\to+\infty$.

The following two results can be found in [5,7].

Lemma 1.2

Let $(\theta_n)_{n\geq0}$, $(\xi_n)_{n\geq1}$ and $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$. If there exists $n_0\geq1$ such that

$\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})-\xi_n+\delta_n\quad\forall n\geq n_0$

and $\alpha$ such that

$0\leq\alpha_n\leq\alpha<1\quad\forall n\geq1$,

then the following statements are true:

  1. $\sum_{n\geq1}[\theta_n-\theta_{n-1}]_+<+\infty$, where $[s]_+:=\max\{s,0\}$;

  2. the limit $\lim_{n\to+\infty}\theta_n$ exists;

  3. the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

The following result follows from Lemma 1.2, applied in the case $\alpha_n:=0$ and $\theta_n:=\rho_n-\rho$ for all $n\geq1$, where $\rho$ is a lower bound of $(\rho_n)_{n\geq1}$.

Lemma 1.3

Let $(\rho_n)_{n\geq1}$ be a sequence in $\mathbb{R}$ which is bounded from below, and $(\xi_n)_{n\geq1}$, $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$. If there exists $n_0\geq1$ such that

$\rho_{n+1}-\rho_n\leq-\xi_n+\delta_n\quad\forall n\geq n_0$,

then the following statements are true:

  1. the sequence $(\rho_n)_{n\geq1}$ is convergent;

  2. the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

The following result, which will be useful in this work, shows that statement (ii) in Lemma 1.3 can also be obtained when $(\rho_n)_{n\geq1}$ is not bounded from below, but has a particular form.

Lemma 1.4

Let $(\rho_n)_{n\geq1}$ be a sequence in $\mathbb{R}$ and $(\xi_n)_{n\geq1}$, $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$ and

$\rho_n:=\theta_n-\alpha_n\theta_{n-1}+\chi_n\quad\forall n\geq1$,

where $(\theta_n)_{n\geq0}$, $(\chi_n)_{n\geq1}$ are sequences in $\mathbb{R}_+$ and there exists $\alpha$ such that

$0\leq\alpha_n\leq\alpha<1\quad\forall n\geq1$.

If there exists $n_0\geq1$ such that

$\rho_{n+1}-\rho_n\leq-\xi_n+\delta_n\quad\forall n\geq n_0$, (8)

then the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

Proof.

We fix an integer $\bar N\geq n_0$, sum up the inequalities in (8) for $n=n_0,n_0+1,\ldots,\bar N$ and obtain

$\rho_{\bar N+1}-\rho_{n_0}\leq-\sum_{n=n_0}^{\bar N}\xi_n+\sum_{n=n_0}^{\bar N}\delta_n\leq\sum_{n\geq1}\delta_n<+\infty$. (9)

Hence the sequence $(\rho_n)_{n\geq1}$ is bounded from above. Let $\bar\rho>0$ be an upper bound of this sequence. For all $n\geq1$ it holds

$\theta_n-\alpha\theta_{n-1}\leq\theta_n-\alpha_n\theta_{n-1}+\chi_n=\rho_n\leq\bar\rho$,

from which we deduce that

$-\rho_n\leq-\theta_n+\alpha\theta_{n-1}\leq\alpha\theta_{n-1}$. (10)

By induction we obtain for all $n\geq n_0+1$

$\theta_n\leq\alpha\theta_{n-1}+\bar\rho\leq\alpha^{n-n_0}\theta_{n_0}+\bar\rho\sum_{k=1}^{n-n_0}\alpha^{k-1}\leq\alpha^{n-n_0}\theta_{n_0}+\frac{\bar\rho}{1-\alpha}$. (11)

Then inequality (9) combined with (10) and (11) leads to

$\sum_{n=n_0}^{\bar N}\xi_n\leq\rho_{n_0}-\rho_{\bar N+1}+\sum_{n=n_0}^{\bar N}\delta_n\leq\rho_{n_0}+\alpha\theta_{\bar N}+\sum_{n\geq1}\delta_n\leq\rho_{n_0}+\alpha^{\bar N-n_0+1}\theta_{n_0}+\frac{\alpha\bar\rho}{1-\alpha}+\sum_{n\geq1}\delta_n<+\infty$. (12)

We let $\bar N$ converge to $+\infty$ and obtain that $\sum_{n\geq1}\xi_n<+\infty$.

2. The general monotone inclusion problem

In this section we address the following monotone inclusion problem.

Problem 2.1

Let $\mathcal{H}$ be a real Hilbert space, $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ a maximally monotone operator, $D\colon\mathcal{H}\to\mathcal{H}$ an $\eta$-cocoercive operator with $\eta>0$, $B\colon\mathcal{H}\to\mathcal{H}$ a $\mu$-cocoercive operator with $\mu>0$, and assume that $M:=\operatorname{Zer}B\neq\emptyset$. The monotone inclusion problem to solve reads

$0\in Ax+Dx+N_M(x)$.

The following forward–backward penalty algorithm with inertial effects for solving Problem 2.1 will be in the focus of our investigations in this paper.

Algorithm 2.2

Choose $x_0,x_1\in\mathcal{H}$ and set for all $n\geq1$

$x_{n+1}:=J_{\lambda_nA}\left(x_n+\alpha_n(x_n-x_{n-1})-\lambda_n(Dx_n+\beta_nBx_n)\right)$,

where $(\lambda_n)_{n\geq1}$, $(\beta_n)_{n\geq1}$ and $(\alpha_n)_{n\geq1}$ are sequences of positive step sizes, penalty parameters and inertial parameters, respectively, satisfying conditions (C1)–(C3); in particular, $\lim_{n\to+\infty}\lambda_n=0$, $\sum_{n\geq1}\lambda_n=+\infty$, $\sum_{n\geq1}\lambda_n^2<+\infty$, and $(\alpha_n)_{n\geq1}$ is non-decreasing with $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$.

When $D=0$ and $B=\nabla h$, where $h\colon\mathcal{H}\to\mathbb{R}$ is a convex and differentiable function with $\mu^{-1}$-Lipschitz continuous gradient for $\mu>0$ fulfilling $\min h=0$, Problem 2.1 recovers the monotone inclusion problem addressed in [3, Section 3] and Algorithm 2.2 can be seen as an inertial version of the iterative scheme considered in that paper. When $B=0$, we have that $N_M=\{0\}$ and Algorithm 2.2 is nothing else than the inertial version of the classical forward–backward algorithm (see, for instance, [33,40]).
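The iteration of Algorithm 2.2 is easy to prototype. The following sketch (our toy instance, not one of the paper's examples) works on $\mathcal{H}=\mathbb{R}^2$ with $A=\operatorname{Id}$ (maximally and $1$-strongly monotone, with resolvent $J_{\lambda A}(y)=y/(1+\lambda)$), $Dx=x-c$ for $c=(3,1)$ ($1$-cocoercive) and $B=\nabla h$ for $h(x)=\frac12(x_1-x_2)^2$, so that $M=\operatorname{Zer}B=\{x_1=x_2\}$ and $B$ is $\frac12$-cocoercive; a direct computation shows that the unique zero of $A+D+N_M$ is $(1,1)$:

```python
# Sketch of the inertial forward-backward penalty iteration
#   x_{n+1} = J_{lam_n A}( x_n + alpha_n (x_n - x_{n-1}) - lam_n (D x_n + beta_n B x_n) )
# for the toy data described above (our assumptions, not the paper's experiments).

def solve(n_iter=50000, alpha=0.1):
    c = (3.0, 1.0)
    x_prev = x = (0.0, 0.0)
    for n in range(1, n_iter + 1):
        lam = 0.5 / n ** 0.6      # step sizes: not summable, square-summable
        beta = 0.2 / lam          # penalty parameters: lam_n * beta_n = 0.2 < mu = 1/2
        w = x[0] - x[1]           # B x = w * (1, -1)
        y = tuple(
            x[i] + alpha * (x[i] - x_prev[i])
            - lam * ((x[i] - c[i]) + beta * w * (1.0, -1.0)[i])
            for i in range(2)
        )
        x_prev, x = x, tuple(yi / (1.0 + lam) for yi in y)  # resolvent of lam * Id
    return x

x = solve()
assert abs(x[0] - 1.0) < 1e-2 and abs(x[1] - 1.0) < 1e-2
```

Since $A$ is strongly monotone, the iterates converge strongly to the unique solution (this is the situation of Theorem 2.8 (ii) below); the growing penalty $\beta_n$ forces the infeasibility $x_1-x_2$ to vanish along the way.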

Hypotheses 2.2

The convergence analysis will be carried out under the following hypotheses (see also [7]):

  1. $A+N_M$ is maximally monotone and $\operatorname{Zer}(A+D+N_M)\neq\emptyset$;

  2. (H2fitz) for every $p\in\operatorname{Ran}N_M$: $\sum_{n\geq1}\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\!\left(u,\frac{p}{\beta_n}\right)-\sigma_M\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$.

Since $A$ and $N_M$ are maximally monotone operators, the sum $A+N_M$ is maximally monotone provided some specific regularity conditions are fulfilled (see [33–35,38]). Furthermore, since $D$ is also maximally monotone [33, Example 20.28] and $\operatorname{Dom}D=\mathcal{H}$, if $A+N_M$ is maximally monotone, then $A+D+N_M$ is also maximally monotone.

Let us also notice that for $p\in\operatorname{Ran}N_M$ there exists $\hat u\in M$ such that $p\in N_M(\hat u)$, hence, for every $\beta>0$ it holds

$\sup_{u\in M}\varphi_B\!\left(u,\frac{p}{\beta}\right)-\sigma_M\!\left(\frac{p}{\beta}\right)\geq\varphi_B\!\left(\hat u,\frac{p}{\beta}\right)-\sigma_M\!\left(\frac{p}{\beta}\right)\geq\left\langle\hat u,\frac{p}{\beta}\right\rangle-\sigma_M\!\left(\frac{p}{\beta}\right)=0$.

For situations where (H2fitz) is satisfied we refer the reader to [5,8,9,11].

Before formulating the main theorem of this section we will prove some useful technical results.

Lemma 2.3

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2 and $(u,y)$ be an element in $\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$ with $v\in Au$ and $p\in N_M(u)$. Further, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ be such that $1-\varepsilon_3>0$. Then the following inequality holds for all $n\geq1$:

$\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$. (13)

Proof.

Let $n\geq1$ be fixed. According to the definition of the resolvent of the operator $A$ we have

$x_n-x_{n+1}-\lambda_n(Dx_n+\beta_nBx_n)+\alpha_n(x_n-x_{n-1})\in\lambda_nAx_{n+1}$ (14)

and, since $\lambda_nv\in\lambda_nAu$, the monotonicity of $A$ guarantees

$\langle x_{n+1}-u,\ x_n-x_{n+1}-\lambda_n(Dx_n+\beta_nBx_n+v)+\alpha_n(x_n-x_{n-1})\rangle\geq0$ (15)

or, equivalently,

$2\langle u-x_{n+1},x_n-x_{n+1}\rangle\leq2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle$. (16)

For the term on the left-hand side of (16) we have

$2\langle u-x_{n+1},x_n-x_{n+1}\rangle=\|x_{n+1}-u\|^2+\|x_{n+1}-x_n\|^2-\|x_n-u\|^2$. (17)

Since

$-2\alpha_n\langle u-x_n,x_n-x_{n-1}\rangle=-\alpha_n\|u-x_{n-1}\|^2+\alpha_n\|u-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$

and

$2\langle x_{n+1}-x_n,\alpha_n(x_n-x_{n-1})\rangle\leq4\varepsilon_1\|x_{n+1}-x_n\|^2+\frac{\alpha_n^2}{4\varepsilon_1}\|x_n-x_{n-1}\|^2$,

by adding the two relations, we obtain the following estimate for the second term on the right-hand side of (16):

$-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2+4\varepsilon_1\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2$. (18)

We turn now our attention to the first term on the right-hand side of (16), which can be written as

$2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle=2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle+2\lambda_n\beta_n\langle x_n-x_{n+1},Bx_n\rangle+2\lambda_n\langle x_n-x_{n+1},Dx_n+v\rangle$. (19)

We have

$2\lambda_n\beta_n\langle x_n-x_{n+1},Bx_n\rangle\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2\|Bx_n\|^2$ (20)

and

$2\lambda_n\langle x_n-x_{n+1},Dx_n+v\rangle\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{2}{\varepsilon_2}\lambda_n^2\|Dx_n+v\|^2\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2$. (21)

On the other hand, we have

$2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle=2\lambda_n\beta_n\langle u-x_n,Bx_n\rangle+2\lambda_n\langle u-x_n,Dx_n-Du\rangle+2\lambda_n\langle u-x_n,Du+v\rangle$. (22)

Since $0<\varepsilon_3<1$ and $Bu=0$, the cocoercivity of $B$ gives us

$2\lambda_n\beta_n\langle u-x_n,Bx_n\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle$. (23)

Similarly, the cocoercivity of $D$ gives us

$2\lambda_n\langle u-x_n,Dx_n-Du\rangle\leq-2\eta\lambda_n\|Dx_n-Du\|^2$. (24)

Combining (23)–(24) with (22) and using the definition of the Fitzpatrick function together with the fact that $\sigma_M(p/(\varepsilon_3\beta_n))=\langle u,p/(\varepsilon_3\beta_n)\rangle$, we obtain

$2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,Du+v\rangle$
$=-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y-p\rangle$
$=-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y\rangle+2\varepsilon_3\lambda_n\beta_n\left[\langle u,Bx_n\rangle+\left\langle x_n,\frac{p}{\varepsilon_3\beta_n}\right\rangle-\langle x_n,Bx_n\rangle-\left\langle u,\frac{p}{\varepsilon_3\beta_n}\right\rangle\right]$
$\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y\rangle+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$. (25)

The inequalities (20), (21) and (25) lead to

$2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle\leq\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\varepsilon_2\|x_{n+1}-x_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$. (26)

Finally, by combining (17), (18) and (26), we obtain (13).

From now on we will assume that for $0<\alpha<\frac13$ the constants $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ are chosen such that

(C4) $1-\varepsilon_3>0$, $\varepsilon_2<1-4\varepsilon_1-\alpha-\frac{\alpha^2}{4\varepsilon_1}$ and $\sup_{n\geq1}\lambda_n\beta_n<\mu\varepsilon_2(1-\varepsilon_3)$.

As a consequence, there exists $0<s\leq1-3\varepsilon_1-\varepsilon_2-\varepsilon_1\left(1+\frac{\alpha}{2\varepsilon_1}\right)^2$, which means that for all $n\geq1$ it holds

$\alpha_{n+1}+\frac{\alpha_{n+1}^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq\alpha+\frac{\alpha^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq-s$. (27)

On the other hand, there exists $0<t\leq\mu(1-\varepsilon_3)-\frac{1}{\varepsilon_2}\sup_{n\geq1}\lambda_n\beta_n$, which means that for all $n\geq1$ it holds

$\mu(1-\varepsilon_3)-\frac{1}{\varepsilon_2}\lambda_n\beta_n\geq t$. (28)

Remark 2.4

  1. Since $0<\alpha<\frac13$, one can always find $\varepsilon_1,\varepsilon_2>0$ such that $\varepsilon_2<1-4\varepsilon_1-\alpha-\frac{\alpha^2}{4\varepsilon_1}$. One possible choice is $\varepsilon_1=\frac{\alpha}{4}$ and $0<\varepsilon_2<1-3\alpha$. From the second inequality in (C4) it follows that $1-3\varepsilon_1-\varepsilon_2>\varepsilon_1+\alpha+\frac{\alpha^2}{4\varepsilon_1}>0$.

  2. As
    $1-3\varepsilon_1-\varepsilon_2-\varepsilon_1\left(1+\frac{\alpha}{2\varepsilon_1}\right)^2=1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}>0$,
    it is always possible to choose $s$ such that $0<s\leq1-3\varepsilon_1-\varepsilon_2-\varepsilon_1\left(1+\frac{\alpha}{2\varepsilon_1}\right)^2$. Since in this case $s\leq1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}$, one has (27).

The following proposition brings us closer to the convergence result.

Proposition 2.5

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2 and assume that the Hypotheses 2.2 are verified. Then the following statements are true:

  1. the sequence $(x_{n+1}-x_n)_{n\geq0}$ belongs to $\ell^2$ and the sequence $(\lambda_n\beta_n\|Bx_n\|^2)_{n\geq1}$ belongs to $\ell^1$;

  2. if, moreover, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}\|Bx_n\|=0$ and thus every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $M$;

  3. for every $u\in\operatorname{Zer}(A+D+N_M)$, the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists.

Proof.

Since $\lim_{n\to+\infty}\lambda_n=0$, there exists an integer $n_0\geq1$ such that $\lambda_n\leq\frac{\varepsilon_2\eta}{2}$ for all $n\geq n_0$. According to Lemma 2.3, for every $(u,y)\in\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$, with $v\in Au$ and $p\in N_M(u)$, and all $n\geq n_0$ the following inequality holds:

$\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+2\lambda_n\beta_n\left(\frac{\lambda_n\beta_n}{\varepsilon_2}-\mu(1-\varepsilon_3)\right)\|Bx_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$. (29)

We consider $u\in\operatorname{Zer}(A+D+N_M)$, which means that we can take $y=0$ in (29). For all $n\geq1$ we denote

$\theta_n:=\|x_n-u\|^2,\quad\rho_n:=\theta_n-\alpha_n\theta_{n-1}+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2$ (30)

and

$\delta_n:=\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$. (31)

Using that $(\alpha_n)_{n\geq1}$ is non-decreasing, for all $n\geq n_0$ it yields

$\rho_{n+1}-\rho_n\leq\left[\alpha_{n+1}+\frac{\alpha_{n+1}^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\right]\|x_{n+1}-x_n\|^2+2\lambda_n\beta_n\left(\frac{\lambda_n\beta_n}{\varepsilon_2}-\mu(1-\varepsilon_3)\right)\|Bx_n\|^2+\delta_n\leq-s\|x_{n+1}-x_n\|^2-2t\lambda_n\beta_n\|Bx_n\|^2+\delta_n$, (32)

where $s,t>0$ are chosen according to (27) and (28), respectively.

Thanks to (H2fitz) and (C1) it holds

$\sum_{n\geq1}\delta_n=\frac{4}{\varepsilon_2}\|Du+v\|^2\sum_{n\geq1}\lambda_n^2+2\varepsilon_3\sum_{n\geq1}\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]<+\infty$. (33)

Hence, according to Lemma 1.4, we obtain

$\sum_{n\geq0}\|x_{n+1}-x_n\|^2<+\infty\quad\text{and}\quad\sum_{n\geq1}\lambda_n\beta_n\|Bx_n\|^2<+\infty$, (34)

which proves (i). If, in addition, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}\|Bx_n\|=0$, which means that every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $\operatorname{Zer}B=M$.

In order to prove (iii), we consider again the inequality (29) for an arbitrary element $u\in\operatorname{Zer}(A+D+N_M)$ and $y=0$. With the notations in (30) and (31), we get for all $n\geq n_0$

$\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n$. (35)

According to (33) and (34) we have

$\sum_{n\geq1}\left[\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n\right]\leq\left(\alpha+\frac{\alpha^2}{4\varepsilon_1}\right)\sum_{n\geq1}\|x_n-x_{n-1}\|^2+\sum_{n\geq1}\delta_n<+\infty$, (36)

therefore, by Lemma 1.2, the limit $\lim_{n\to+\infty}\theta_n=\lim_{n\to+\infty}\|x_n-u\|^2$ exists, which means that the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists, too.

Remark 2.6

The condition (C3) that we imposed in combination with $0<\alpha<\frac13$ on the sequence of inertial parameters $(\alpha_n)_{n\geq1}$ is the one proposed in [15, Proposition 2.1] when addressing the convergence of the inertial proximal point algorithm. However, the statements in the proposition above and in the following convergence theorem remain valid if one alternatively assumes that there exists $\alpha$ such that $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$ and

$\sum_{n\geq1}\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2<+\infty$.

This can be realized if one chooses, for a fixed $p>1$,

$\alpha_n\leq\min\left\{\alpha,\ 2\varepsilon_1\left(\sqrt{1+\frac{1}{\varepsilon_1n^p\|x_n-x_{n-1}\|^2}}-1\right)\right\}\quad\forall n\geq1$.

Indeed, in this situation we have $\frac{\alpha_n^2}{4\varepsilon_1}+\alpha_n-\frac{1}{n^p\|x_n-x_{n-1}\|^2}\leq0$ for all $n\geq1$, which gives

$\sum_{n\geq1}\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2\leq\sum_{n\geq1}\frac{1}{n^p}<+\infty$.

Now we are ready to prove the main theorem of this section, which addresses the convergence of the sequence generated by Algorithm 2.2.

Theorem 2.8

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2, $(z_n)_{n\geq1}$ be the sequence defined in (7) and assume that the Hypotheses 2.2 are verified. Then the following statements are true:

  1. the sequence $(z_n)_{n\geq1}$ converges weakly to an element in $\operatorname{Zer}(A+D+N_M)$ as $n\to+\infty$;

  2. if $A$ is $\gamma$-strongly monotone with $\gamma>0$, then $(x_n)_{n\geq0}$ converges strongly to the unique element in $\operatorname{Zer}(A+D+N_M)$ as $n\to+\infty$.

Proof.

  1. According to Proposition 2.5 (iii), the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists for every $u\in\operatorname{Zer}(A+D+N_M)$. Let $z$ be a sequential weak cluster point of $(z_n)_{n\geq1}$. We will show that $z\in\operatorname{Zer}(A+D+N_M)$ by using the characterization (5) of maximal monotonicity, and the conclusion will follow by Lemma 1.1. To this end we consider an arbitrary $(u,y)\in\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$, where $v\in Au$ and $p\in N_M(u)$. From (29), with the notations (30) and (31), we have for all $n\geq n_0$
    $\rho_{n+1}-\rho_n\leq-s\|x_{n+1}-x_n\|^2-2t\lambda_n\beta_n\|Bx_n\|^2+\delta_n+2\lambda_n\langle u-x_n,y\rangle\leq\delta_n+2\lambda_n\langle u-x_n,y\rangle$. (37)
    Recall from (33) that $\sum_{n\geq1}\delta_n<+\infty$. Since $(x_n)_{n\geq0}$ is bounded, the sequence $(\rho_n)_{n\geq1}$ is also bounded.
    We fix an arbitrary integer $\bar N\geq n_0$ and sum up the inequalities in (37) for $n=n_0+1,n_0+2,\ldots,\bar N$. This yields
    $\rho_{\bar N+1}-\rho_{n_0+1}\leq\sum_{n\geq1}\delta_n+2\left\langle\sum_{n=1}^{n_0}\lambda_n(x_n-u),y\right\rangle+2\tau_{\bar N}\langle u-z_{\bar N},y\rangle$.
    After dividing this last inequality by $2\tau_{\bar N}=2\sum_{n=1}^{\bar N}\lambda_n$, we obtain
    $\frac{1}{2\tau_{\bar N}}\left(\rho_{\bar N+1}-\rho_{n_0+1}\right)\leq\frac{1}{2\tau_{\bar N}}T+\langle u-z_{\bar N},y\rangle$, (38)
    where $T:=\sum_{n\geq1}\delta_n+2\left\langle\sum_{n=1}^{n_0}\lambda_n(x_n-u),y\right\rangle\in\mathbb{R}$. By passing in (38) to the limit and by using that $\lim_{\bar N\to+\infty}\tau_{\bar N}=\lim_{\bar N\to+\infty}\sum_{n=1}^{\bar N}\lambda_n=+\infty$, we get
    $\liminf_{\bar N\to+\infty}\langle u-z_{\bar N},y\rangle\geq0$.
    As $z$ is a sequential weak cluster point of $(z_n)_{n\geq1}$, the above inequality gives us $\langle u-z,y\rangle\geq0$, which finally means that $z\in\operatorname{Zer}(A+D+N_M)$.
  2. Let $u\in\mathcal{H}$ be the unique element in $\operatorname{Zer}(A+D+N_M)$. Since $A$ is $\gamma$-strongly monotone with $\gamma>0$, the relation (15) becomes for all $n\geq1$
    $\langle x_{n+1}-u,\ x_n-x_{n+1}-\lambda_n(Dx_n+\beta_nBx_n+v)+\alpha_n(x_n-x_{n-1})\rangle\geq\gamma\lambda_n\|x_{n+1}-u\|^2$
    or, equivalently,
    $2\gamma\lambda_n\|x_{n+1}-u\|^2+2\langle u-x_{n+1},x_n-x_{n+1}\rangle\leq2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle$.
    By using again (17), (18) and (26) we obtain for all $n\geq1$
    $2\gamma\lambda_n\|x_{n+1}-u\|^2+\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$.
    By using the notations in (30) and (31), this yields for all $n\geq n_0$
    $2\gamma\lambda_n\|x_{n+1}-u\|^2+\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n$.
    By taking into account (36), from Lemma 1.2 we get
    $2\gamma\sum_{n\geq1}\lambda_n\|x_n-u\|^2<+\infty$.
    According to (C1) we have $\sum_{n\geq1}\lambda_n=+\infty$, which implies that the limit $\lim_{n\to+\infty}\|x_n-u\|$ must be equal to zero. This provides the desired conclusion.

3. Applications to convex bilevel programming

We will employ the results obtained in the previous section, in the context of monotone inclusions, for solving convex bilevel programming problems.

Problem 3.1

Let $\mathcal{H}$ be a real Hilbert space, $f\colon\mathcal{H}\to\overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function and $g,h\colon\mathcal{H}\to\mathbb{R}$ convex differentiable functions with $L_g$-Lipschitz continuous and, respectively, $L_h$-Lipschitz continuous gradients. Suppose that $\operatorname{argmin}h\neq\emptyset$ and $\min h=0$. The bilevel programming problem to solve reads

$\min_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}$.

The assumption $\min h=0$ is not restrictive as, otherwise, one can replace $h$ with $h-\min h$.

Hypotheses 3.2

The convergence analysis will be carried out under the following hypotheses:

  1. $\partial f+N_{\operatorname{argmin}h}$ is maximally monotone and $S:=\operatorname{argmin}_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}\neq\emptyset$;

  2. (H2prog) for every $p\in\operatorname{Ran}N_{\operatorname{argmin}h}$: $\sum_{n\geq1}\lambda_n\beta_n\left[h^*\!\left(\frac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin}h}\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$.

Under the above hypotheses, we have that $\partial f+\nabla g+N_{\operatorname{argmin}h}=\partial(f+g+\delta_{\operatorname{argmin}h})$ and hence $S=\operatorname{Zer}(\partial f+\nabla g+N_{\operatorname{argmin}h})$. Since, according to the Theorem of Baillon–Haddad (see, e.g. [33, Corollary 18.16]), $\nabla g$ and $\nabla h$ are $L_g^{-1}$-cocoercive and, respectively, $L_h^{-1}$-cocoercive, and $\operatorname{argmin}h=\operatorname{Zer}\nabla h$, solving the bilevel programming problem in Problem 3.1 reduces to solving the monotone inclusion

$0\in\partial f(x)+\nabla g(x)+N_{\operatorname{argmin}h}(x)$.

By using to this end Algorithm 2.2, we obtain the following iterative scheme.

Algorithm 3.3

Choose $x_0,x_1\in\mathcal{H}$ and set for all $n\geq1$

$x_{n+1}:=\operatorname{prox}_{\lambda_nf}\left(x_n+\alpha_n(x_n-x_{n-1})-\lambda_n(\nabla g(x_n)+\beta_n\nabla h(x_n))\right)$,

where the sequences $(\lambda_n)_{n\geq1}$, $(\beta_n)_{n\geq1}$ and $(\alpha_n)_{n\geq1}$ are chosen as in Algorithm 2.2.

By using the inequality (6), one can easily notice that (H2prog) implies (H2fitz), which means that the convergence statements for Algorithm 3.3 can be derived as particular instances of the ones obtained in the previous section.

Alternatively, one can use to this end the following lemma and employ the same ideas and techniques as in Section 2. Lemma 3.3 is similar to Lemma 2.3; however, it will allow us to provide convergence statements also for the sequence of function values $(h(x_n))_{n\geq0}$.
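A runnable sketch of the bilevel scheme on a toy instance of ours (the data are assumptions for illustration, not the paper's experiments): minimize $f(x)+g(x)=\|x\|_1+\frac12\|x-c\|^2$ over $\operatorname{argmin}h$ for $h(x)=\frac12(x_1-x_2)^2$, i.e. subject to $x_1=x_2$, with $c=(3,1)$. On the line $x_1=x_2=t$ the objective $2|t|+\frac12[(t-3)^2+(t-1)^2]$ is minimized at $t=1$, so the solution is $(1,1)$.

```python
# Sketch of Algorithm 3.3 on the toy bilevel problem described above:
#   x_{n+1} = prox_{lam_n f}( x_n + alpha_n (x_n - x_{n-1})
#                             - lam_n (grad g(x_n) + beta_n grad h(x_n)) ),
# where prox of lam * ||.||_1 is componentwise soft-thresholding.

def soft(v, t):
    # componentwise prox of t * ||.||_1
    return tuple(max(abs(vi) - t, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v)

def solve_bilevel(n_iter=50000, alpha=0.1):
    c = (3.0, 1.0)
    x_prev = x = (0.0, 0.0)
    for n in range(1, n_iter + 1):
        lam = 0.5 / n ** 0.6      # sum lam_n = +inf, sum lam_n^2 < +inf
        beta = 0.2 / lam          # lam_n * beta_n = 0.2, below 1 / L_h = 1/2
        w = x[0] - x[1]           # grad h(x) = w * (1, -1)
        y = tuple(
            x[i] + alpha * (x[i] - x_prev[i])
            - lam * ((x[i] - c[i]) + beta * w * (1.0, -1.0)[i])
            for i in range(2)
        )
        x_prev, x = x, soft(y, lam)
    return x

x = solve_bilevel()
assert abs(x[0] - 1.0) < 1e-2 and abs(x[1] - 1.0) < 1e-2
```

Since both $f+g$ and $h$ are inf-compact on $\mathbb{R}^2$, this toy instance falls under the non-ergodic convergence regime of Theorem 3.10 below.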

Lemma 3.3

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and $(u,y)$ be an element in $\operatorname{Gr}(\partial f+\nabla g+N_{\operatorname{argmin}h})$ such that $y=v+\nabla g(u)+p$ with $v\in\partial f(u)$ and $p\in N_{\operatorname{argmin}h}(u)$. Further, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ be such that $1-\varepsilon_3>0$. Then the following inequality holds for all $n\geq1$:

$\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|\nabla h(x_n)\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|\nabla g(x_n)-\nabla g(u)\|^2+\lambda_n\beta_n\left[h(u)-h(x_n)\right]+\frac{4}{\varepsilon_2}\lambda_n^2\|\nabla g(u)+v\|^2+\varepsilon_3\lambda_n\beta_n\left[h^*\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)-\sigma_{\operatorname{argmin}h}\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$,

where $\mu:=L_h^{-1}$ and $\eta:=L_g^{-1}$.

Proof.

Let $n\geq1$ be fixed. The proof follows by combining the estimates used in the proof of Lemma 2.3 with some inequalities which better exploit the convexity of $h$. From (23) we have

$2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle$.

Since $h$ is convex, the following relation also holds:

$2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq2\lambda_n\beta_n\left[h(u)-h(x_n)\right]$.

Summing up the two inequalities above and dividing by two gives

$2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq-\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2+\varepsilon_3\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle+\lambda_n\beta_n\left[h(u)-h(x_n)\right]$.

Using the same techniques as in the derivation of (25), we get

$2\lambda_n\langle u-x_n,v+\nabla g(x_n)+\beta_n\nabla h(x_n)\rangle\leq-\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2-2\eta\lambda_n\|\nabla g(x_n)-\nabla g(u)\|^2+\lambda_n\beta_n\left[h(u)-h(x_n)\right]+2\lambda_n\langle u-x_n,y\rangle+\varepsilon_3\lambda_n\beta_n\left[h^*\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)-\sigma_{\operatorname{argmin}h}\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)\right]$.

With these improved estimates, the conclusion follows as in the proof of Lemma 2.3.

By using now Lemma 3.3, one obtains, after slightly adapting the proof of Proposition 2.5, the following result.

Proposition 3.4

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and assume that the Hypotheses 3.2 are verified. Then the following statements are true:

  1. the sequence $(x_{n+1}-x_n)_{n\geq0}$ belongs to $\ell^2$ and the sequences $(\lambda_n\beta_n\|\nabla h(x_n)\|^2)_{n\geq1}$ and $(\lambda_n\beta_nh(x_n))_{n\geq1}$ belong to $\ell^1$;

  2. if, moreover, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}h(x_n)=\lim_{n\to+\infty}\|\nabla h(x_n)\|=0$ and thus every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $\operatorname{argmin}h$;

  3. for every $u\in S$, the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists.

Finally, the above proposition leads to the following convergence result.

Theorem 3.6

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3, $(z_n)_{n\geq1}$ be the sequence defined in (7) and assume that the Hypotheses 3.2 are verified. Then the following statements are true:

  1. the sequence $(z_n)_{n\geq1}$ converges weakly to an element in $S$ as $n\to+\infty$;

  2. if $f$ is $\gamma$-strongly convex with $\gamma>0$, then $(x_n)_{n\geq0}$ converges strongly to the unique element in $S$ as $n\to+\infty$.

In the following we will show that under inf-compactness assumptions one can achieve weak non-ergodic convergence for the sequence $(x_n)_{n\geq0}$. Weak non-ergodic convergence has been obtained for Algorithm 3.3 in [9] when $\alpha_n=\alpha$ for all $n\geq1$ and for restrictive choices of both the sequence of step sizes and the penalty parameters.

We denote by $(f+g)_{\min}:=\min_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}$ the optimal objective value of Problem 3.1. For every element $x\in\mathcal{H}$, we denote by $\operatorname{dist}(x,S)=\inf_{u\in S}\|x-u\|$ the distance from $x$ to $S$. In particular, $\operatorname{dist}(x,S)=\|x-\operatorname{Pr}_Sx\|$, where $\operatorname{Pr}_Sx$ denotes the projection of $x$ onto $S$. The projection operator $\operatorname{Pr}_S$ is firmly non-expansive [33, Proposition 4.8], which means

$\|\operatorname{Pr}_Sx-\operatorname{Pr}_Sy\|^2+\|(\operatorname{Id}-\operatorname{Pr}_S)x-(\operatorname{Id}-\operatorname{Pr}_S)y\|^2\leq\|x-y\|^2\quad\forall x,y\in\mathcal{H}$. (39)

Denoting $d(x):=\frac12\operatorname{dist}(x,S)^2=\frac12\|x-\operatorname{Pr}_Sx\|^2$ for all $x\in\mathcal{H}$, one has that $x\mapsto d(x)$ is differentiable and it holds $\nabla d(x)=x-\operatorname{Pr}_Sx$ for all $x\in\mathcal{H}$.
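The firm non-expansiveness inequality (39) can be checked numerically for a simple set (our illustration, not from the paper): projecting onto the line $S=\{x_1=x_2\}$ in $\mathbb{R}^2$, for which $\operatorname{Pr}_S(x)=\left(\frac{x_1+x_2}{2},\frac{x_1+x_2}{2}\right)$. For a linear orthogonal projection, (39) in fact holds with equality by Pythagoras; firm non-expansiveness in general only asserts the inequality.

```python
# Check of (39) for the projection onto S = {x1 = x2} in R^2:
#   ||Pr x - Pr y||^2 + ||(Id - Pr) x - (Id - Pr) y||^2 <= ||x - y||^2.

def proj(x):
    # orthogonal projection onto the line x1 = x2
    m = (x[0] + x[1]) / 2.0
    return (m, m)

def sq(a, b):
    # squared Euclidean distance
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

for x, y in [((3.0, 1.0), (0.0, 0.0)), ((-1.0, 4.0), (2.0, 2.0))]:
    px, py = proj(x), proj(y)
    rx = (x[0] - px[0], x[1] - px[1])   # (Id - Pr_S) x
    ry = (y[0] - py[0], y[1] - py[1])   # (Id - Pr_S) y
    assert sq(px, py) + sq(rx, ry) <= sq(x, y) + 1e-12
```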

Lemma 3.6

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and assume that the Hypotheses 3.2 are verified. Then the following inequality holds for all $n\geq1$:

$d(x_{n+1})-d(x_n)+\lambda_n\left[(f+g)(x_{n+1})-(f+g)_{\min}\right]\leq\alpha_n\left[d(x_n)-d(x_{n-1})\right]+\left(\frac{L_g}{2}\lambda_n+\frac{L_h}{4}\lambda_n\beta_n+\frac{\alpha_n}{2}\right)\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$. (40)

Proof.

Let $n\geq1$ be fixed. Since $d$ is convex, we have

$d(x_{n+1})-d(x_n)\leq\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_{n+1}-x_n\rangle$. (41)

Further, there exists $v_{n+1}\in\partial f(x_{n+1})$ such that (see (14))

$x_n-x_{n+1}-\lambda_n\left(\nabla g(x_n)+\beta_n\nabla h(x_n)\right)+\alpha_n(x_n-x_{n-1})=\lambda_nv_{n+1}$

and, so,

$\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_{n+1}-x_n\rangle=-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},v_{n+1}\rangle-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla g(x_n)\rangle-\lambda_n\beta_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle+\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_n-x_{n-1}\rangle$. (42)

Since $v_{n+1}\in\partial f(x_{n+1})$, we get

$-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},v_{n+1}\rangle\leq\lambda_n\left[f(\operatorname{Pr}_Sx_{n+1})-f(x_{n+1})\right]$. (43)

Using the convexity of $g$ it follows

$g(x_n)-g(\operatorname{Pr}_Sx_{n+1})\leq\langle\nabla g(x_n),x_n-\operatorname{Pr}_Sx_{n+1}\rangle$. (44)

On the other hand, the Descent Lemma gives

$g(x_{n+1})\leq g(x_n)+\langle\nabla g(x_n),x_{n+1}-x_n\rangle+\frac{L_g}{2}\|x_{n+1}-x_n\|^2$. (45)

By adding (44) and (45), it yields

$-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla g(x_n)\rangle\leq\lambda_n\left[g(\operatorname{Pr}_Sx_{n+1})-g(x_{n+1})\right]+\frac{L_g\lambda_n}{2}\|x_{n+1}-x_n\|^2$. (46)

Using the $\frac{1}{L_h}$-cocoercivity of $\nabla h$ combined with the fact that $\nabla h(\operatorname{Pr}_Sx_{n+1})=0$ (as $\operatorname{Pr}_Sx_{n+1}$ belongs to $S\subseteq\operatorname{argmin}h$), it yields

$\langle x_n-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle\geq\frac{1}{L_h}\|\nabla h(x_n)\|^2$.

Therefore

$-\lambda_n\beta_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle\leq\lambda_n\beta_n\left[\langle x_n-x_{n+1},\nabla h(x_n)\rangle-\frac{1}{L_h}\|\nabla h(x_n)\|^2\right]\leq\frac{\lambda_n\beta_nL_h}{4}\|x_{n+1}-x_n\|^2$. (47)

Further, using (39), we have

$\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1}-(x_n-\operatorname{Pr}_Sx_n),x_n-x_{n-1}\rangle\leq\frac{\alpha_n}{2}\|(\operatorname{Id}-\operatorname{Pr}_S)x_{n+1}-(\operatorname{Id}-\operatorname{Pr}_S)x_n\|^2+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2\leq\frac{\alpha_n}{2}\|x_{n+1}-x_n\|^2+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2$

and

$\alpha_n\langle x_n-\operatorname{Pr}_Sx_n,x_n-x_{n-1}\rangle=\alpha_nd(x_n)+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2-\frac{\alpha_n}{2}\|x_{n-1}-\operatorname{Pr}_Sx_n\|^2\leq\alpha_nd(x_n)+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2-\alpha_nd(x_{n-1})$.

By adding the two relations above, we obtain

$\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_n-x_{n-1}\rangle\leq\frac{\alpha_n}{2}\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2+\alpha_n\left[d(x_n)-d(x_{n-1})\right]$. (48)

By combining (43), (46), (47) and (48) with (42) we obtain the desired conclusion.

Definition 3.7

A function $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}$ is said to be inf-compact if for every $r>0$ and $\kappa\in\mathbb{R}$ the set

$\operatorname{Lev}_\kappa^r\Psi:=\{x\in\mathcal{H}:\|x\|\leq r,\ \Psi(x)\leq\kappa\}$

is relatively compact in $\mathcal{H}$.

A useful property of inf-compact functions follows.

Lemma 3.8

Let $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}$ be inf-compact and $(x_n)_{n\geq0}$ be a bounded sequence in $\mathcal{H}$ such that $(\Psi(x_n))_{n\geq0}$ is bounded as well. If the sequence $(x_n)_{n\geq0}$ converges weakly to an element $\hat x$ as $n\to+\infty$, then it converges strongly to this element.

Proof.

Let $\bar r>0$ and $\bar\kappa\in\mathbb{R}$ be such that for all $n\geq1$

$\|x_n\|\leq\bar r\quad\text{and}\quad\Psi(x_n)\leq\bar\kappa$.

Hence, $(x_n)_{n\geq0}$ belongs to the set $\operatorname{Lev}_{\bar\kappa}^{\bar r}(\Psi)$, which is relatively compact. Then $(x_n)_{n\geq0}$ has at least one strongly convergent subsequence. Since every strongly convergent subsequence $(x_{n_l})_{l\geq0}$ of $(x_n)_{n\geq0}$ has $\hat x$ as its limit, the desired conclusion follows.

We can now formulate the weak non-ergodic convergence result.

Theorem 3.10

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy the condition $0<\liminf_{n\to+\infty}\lambda_n\beta_n\leq\sup_{n\geq1}\lambda_n\beta_n\leq\mu$, let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3, and assume that the Hypotheses 3.2 are verified and that either $f+g$ or $h$ is inf-compact. Then the following statements are true:

  1. $\lim_{n\to+\infty}d(x_n)=0$;

  2. the sequence $(x_n)_{n\geq0}$ converges weakly to an element in $S$ as $n\to+\infty$;

  3. if $h$ is inf-compact, then the sequence $(x_n)_{n\geq0}$ converges strongly to an element in $S$ as $n\to+\infty$.

Proof.

  1. Thanks to Lemma 3.6, for all n1 we have
    dxn+1dxn+λn(f+g)xn+1(f+g)αndxndxn1+ζn, (49)
    where
    ζn:= Lg2λn+Lh4λnβn+αn2xn+1xn2+αnxnxn12.
    From Proposition 3.4 (i), combined with the fact that both sequences (λn)n1 and (βn)n1 are bounded, it follows that n1ζn<+.

    In general, since (xn)n0 is not necessarily included in argminh, we have to treat two different cases.

    Case 1: There exists an integer n11 such that (f+g)(xn)(f+g) for all nn1. In this case, we obtain from Lemma 1.2 that:
    • the limit limn+d(xn) exists.
    • nn2λn[(f+g)(xn+1)(f+g)]<+. Moreover, since (λn)n11, we must have
      lim infn+(f+g)xn(f+g). (50)
    Consider a subsequence (xnk)k0 of (xn)n0 such that
    limk+(f+g)xnk=lim infn+(f+g)xn
    and note that, thanks to (50), the sequence ((f+g)(xnk))k0 is bounded. From Proposition 3.4 (ii) –(iii) we get that also (xnk)k0 and (h(xnk))k0 are bounded. Thus, since either f+g or h is inf-compact, there exists a subsequence (xnl)l0 of (xnk)k0, which converges strongly to an element xˆ as l+. According to Proposition 3.4 (ii) –(iii), xˆ belongs to argminh. On the other hand,
    liml+(f+g)xnl=lim infn+(f+g)xn(f+g)xˆ(f+g). (51)
    We deduce from (50)–(51) that (f+g)(xˆ)=(f+g), or in other words, that xˆS. In conclusion, thanks to the continuity of d,
    limn+dxn=limldxnl=dxˆ=0.
    Case 2: For all n1 there exists some n>n such that (f+g)(xn)<(f+g). We define the set
    V=n1:(f+g)xn<(f+g).
    There exists an integer n22 such that for all nn2 the set {kn:kV} is non-empty. Hence, for all nn2 the number
    tn:=maxkn:kV
    is well-defined. By definition tnn for all nn3 and moreover the sequence {tn}nn2 is non-decreasing and limn+tn=. Indeed, if limntn=tR, then for all n>t it holds (f+g)(xn)(f+g), contradiction. Choose an integer Nn2.
    • If tN<N, then, for all n=tN,,N1, since (f+g)(xn)(f+g), the inequality (49) gives
      dxn+1dxndxn+1dxn+λnFxn+1Fαndxndxn1+ζn. (52)
      Summing (52) for n=tN,,N1 and using tht {αn}n1 is non-decreasing, it yields
      dxNdxtN n=tNN1αndxnαn1dxn1+n=tNN1ζnαdxN1+ntNζn. (53)
    • If tN=N, then d(xN)=d(xtN) and we have
      dxNαdxN1dxtN+ntNζn. (54)
    For all $n\ge 1$ we define $a_n:=d(x_n)-\alpha d(x_{n-1})$. In both cases it yields
    $a_N\le d(x_{t_N})+\sum_{n=t_N}^{N}\zeta_n\le d(x_{t_N})+\sum_{n\ge t_N}\zeta_n$. (55)
    Passing in (55) to the limit as $N\to+\infty$, and recalling that $\sum_{n\ge 1}\zeta_n<+\infty$ and $t_N\to+\infty$, we obtain
    $\limsup_{n\to+\infty}a_n\le\limsup_{n\to+\infty}d(x_{t_n})$. (56)
    Let $u\in S$. For all $n\ge 1$ we have
    $d(x_n)=\tfrac12\operatorname{dist}(x_n,S)^2\le\tfrac12\|x_n-u\|^2$,
    which shows that $(d(x_n))_{n\ge 0}$ is bounded, as $\lim_{n\to+\infty}\|x_n-u\|$ exists. We obtain
    $\limsup_{n\to+\infty}a_n=\limsup_{n\to+\infty}\big[d(x_n)-\alpha d(x_{n-1})\big]\ge(1-\alpha)\limsup_{n\to+\infty}d(x_n)\ge 0$. (57)
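    Estimate (57) combines the boundedness of $(d(x_n))_{n\ge 0}$ with the superadditivity of the upper limit, $\limsup_n(u_n-v_n)\ge\limsup_n u_n-\limsup_n v_n$ for bounded sequences; since shifting the index does not change the upper limit,

    ```latex
    \[
      \limsup_{n\to+\infty}\bigl[d(x_n)-\alpha\, d(x_{n-1})\bigr]
      \;\ge\; \limsup_{n\to+\infty} d(x_n)-\alpha \limsup_{n\to+\infty} d(x_{n-1})
      \;=\; (1-\alpha)\limsup_{n\to+\infty} d(x_n)\;\ge\; 0,
    \]
    ```

    where the last inequality uses $0\le\alpha<1$ and $d(\cdot)\ge 0$.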
    Further, for all $n\ge n_2$ we have $(f+g)(x_{t_n})<(f+g)^*$, which gives
    $\limsup_{n\to+\infty}(f+g)(x_{t_n})\le(f+g)^*$. (58)
    This means that the sequence $((f+g)(x_{t_n}))_{n\ge n_2}$ is bounded from above. Consider a subsequence $(x_{t_{n_k}})_{k\ge 0}$ of $(x_{t_n})_{n\ge n_2}$ such that
    $\lim_{k\to+\infty}d(x_{t_{n_k}})=\limsup_{n\to+\infty}d(x_{t_n})$.
    From Proposition 3.4 (ii)–(iii) we get that $(x_{t_{n_k}})_{k\ge 0}$ and $(h(x_{t_{n_k}}))_{k\ge 0}$ are bounded as well. Thus, since either $f+g$ or $h$ is inf-compact, there exists a subsequence $(x_{t_{n_l}})_{l\ge 0}$ of $(x_{t_{n_k}})_{k\ge 0}$ which converges strongly to an element $\hat x$ as $l\to+\infty$. According to Proposition 3.4 (ii)–(iii), $\hat x$ belongs to $\operatorname{argmin}h$. Furthermore, by the lower semicontinuity of $f+g$ it holds
    $\liminf_{l\to+\infty}(f+g)(x_{t_{n_l}})\ge(f+g)(\hat x)\ge(f+g)^*$. (59)
    We deduce from (58) and (59) that
    $(f+g)^*\le(f+g)(\hat x)\le\liminf_{l\to+\infty}(f+g)(x_{t_{n_l}})\le\limsup_{n\to+\infty}(f+g)(x_{t_n})\le(f+g)^*$,
    which gives $\hat x\in S$. Thanks to the continuity of $d$ we get
    $\limsup_{n\to+\infty}d(x_{t_n})=\lim_{l\to+\infty}d(x_{t_{n_l}})=d(\hat x)=0$. (60)
    By combining (56), (57) and (60), it yields
    $0\le(1-\alpha)\limsup_{n\to+\infty}d(x_n)\le\limsup_{n\to+\infty}a_n\le\limsup_{n\to+\infty}d(x_{t_n})=0$,
    which implies $\limsup_{n\to+\infty}d(x_n)=0$ and thus
    $\lim_{n\to+\infty}d(x_n)=\liminf_{n\to+\infty}d(x_n)=\limsup_{n\to+\infty}d(x_n)=0$.
  2. According to (i) we have $\lim_{n\to+\infty}d(x_n)=0$; thus every weak cluster point of the sequence $(x_n)_{n\ge 0}$ belongs to $S$. From Lemma 1.1 it follows that $(x_n)_{n\ge 0}$ converges weakly to a point in $S$ as $n\to+\infty$.

  3. Since $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, from Proposition 3.4 (ii) we have that
    $\lim_{n\to+\infty}h(x_n)=\lim_{n\to+\infty}\|\nabla h(x_n)\|=0$.
    Since $(x_n)_{n\ge 0}$ is bounded, there exist $\bar r>0$ and $\bar\kappa\in\mathbb{R}$ such that for all $n\ge 1$
    $\|x_n\|\le\bar r$ and $h(x_n)\le\bar\kappa$.
    Thanks to (ii), the sequence $(x_n)_{n\ge 0}$ converges weakly to an element of $S$. Therefore, according to Lemma 3.8, it converges strongly to this element of $S$.
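To make the interplay of the inertial, forward (gradient), backward (proximal) and penalty steps concrete, the following sketch applies an iteration of the type analysed in this paper to a toy bilevel problem in the plane. It is a minimal illustration under our own assumptions, not the authors' exact scheme: the functions and the parameter choices $\alpha_n\equiv 0.1$, $\beta_n=\sqrt{n+1}$, $\lambda_n=0.4/\beta_n$ (so that $\lambda_n\beta_n$ stays bounded away from zero while $\sum_n\lambda_n=+\infty$) are illustrative only.

```python
import numpy as np

# Illustrative instance (our own assumptions, not taken from the paper):
# upper level  f(x) = mu*||x||_1  (non-smooth) and  g(x) = 0.5*||x - c||^2,
# lower level  h(x) = 0.5*(x_1 - x_2)^2, whose minimizers form the diagonal.
mu = 0.02
c = np.array([1.0, 0.0])

def grad_g(x):
    return x - c

def grad_h(x):
    return np.array([x[0] - x[1], x[1] - x[0]])

def h(x):
    return 0.5 * (x[0] - x[1]) ** 2

def prox_f(v, t):
    # proximal map of t*mu*||.||_1 (componentwise soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

alpha = 0.1                       # constant inertial parameter
x_prev = np.zeros(2)
x = np.zeros(2)
for n in range(1, 3001):
    beta = np.sqrt(n + 1.0)       # penalty parameter beta_n -> +infinity
    lam = 0.4 / beta              # step size: lam_n*beta_n = 0.4, sum lam_n diverges
    z = x + alpha * (x - x_prev)                   # inertial extrapolation
    v = z - lam * (grad_g(z) + beta * grad_h(z))   # forward step on g + beta_n*h
    x_prev, x = x, prox_f(v, lam)                  # backward step on f

# The iterates approach lower-level feasibility (the diagonal) while tracking
# the minimizer of f + g over it, which lies near (0.49, 0.49).
print(x, h(x))
```

Note how the growing $\beta_n$ enforces the lower-level constraint only asymptotically, while the decaying $\lambda_n$ keeps the forward step on $g+\beta_n h$ stable.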

Funding Statement

Research partially supported by FWF (Austrian Science Fund), projects I 2419-N32 and W1260-N35.

Acknowledgments

The second author gratefully acknowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO), which is funded by the Austrian Science Fund (FWF, project W1260-N35).

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Radu Ioan Boţ  http://orcid.org/0000-0002-4469-314X

References

  • [1] Attouch H, Czarnecki M-O. Asymptotic behavior of coupled dynamical systems with multiscale aspects. J Differ Equ. 2010;248(6):1315–1344.
  • [2] Attouch H, Cabot A, Czarnecki M-O. Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans Am Math Soc. 2018;370(2):755–790.
  • [3] Attouch H, Czarnecki M-O, Peypouquet J. Coupling forward–backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J Optim. 2011;21(4):1251–1274.
  • [4] Attouch H, Czarnecki M-O, Peypouquet J. Prox-penalization and splitting methods for constrained variational problems. SIAM J Optim. 2011;21(1):149–173.
  • [5] Banert S, Boţ RI. Backward penalty schemes for monotone inclusion problems. J Optim Theory Appl. 2015;166(3):930–948.
  • [6] Boţ RI, Csetnek ER. A Tseng's type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J Math. 2014;42(4):451–465.
  • [7] Boţ RI, Csetnek ER. Forward–backward and Tseng's type penalty schemes for monotone inclusion problems. Set-Valued Var Anal. 2014;22(2):313–331.
  • [8] Boţ RI, Csetnek ER, Nimana N. Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data. Optim Lett. 2018;12(1):17–33.
  • [9] Boţ RI, Csetnek ER, Nimana N. An inertial proximal-gradient penalization scheme for constrained convex optimization problems. Vietnam J Math. 2018;46(1):53–71.
  • [10] Frankel P, Peypouquet J. Lagrangian-penalization algorithm for constrained optimization and variational inequalities. Set-Valued Var Anal. 2012;20(2):169–185.
  • [11] Noun N, Peypouquet J. Forward–backward penalty scheme for constrained convex minimization without inf-compactness. J Optim Theory Appl. 2013;158(3):787–795.
  • [12] Peypouquet J. Coupling the gradient method with a general exterior penalization scheme for convex minimization. J Optim Theory Appl. 2012;153(1):123–138.
  • [13] Attouch H, Briceno-Arias LM, Combettes PL. A parallel splitting method for coupled monotone inclusions. SIAM J Control Optim. 2010;48(5):3246–3270.
  • [14] Attouch H, Bolte J, Redont P, Soubeyran A. Alternating proximal algorithms for weakly coupled convex minimization problems, applications to dynamical games and PDE's. J Convex Anal. 2008;15(3):485–506.
  • [15] Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9(1):3–11.
  • [16] Polyak BT. Introduction to optimization. New York: Publications Division, Optimization Software Inc.; 1987. (Translations Series in Mathematics and Engineering).
  • [17] Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202.
  • [18] Chambolle A, Dossal C. On the convergence of the iterates of the ‘Fast Iterative Shrinkage/Thresholding Algorithm’. J Optim Theory Appl. 2015;166(3):968–982.
  • [19] Bertsekas DP. Nonlinear programming. Cambridge (MA): Athena Scientific; 1999.
  • [20] Boţ RI, Csetnek ER, Laszlo SC. An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J Comput Optim. 2016;4(1):3–25.
  • [21] Alvarez F. On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J Control Optim. 2000;38(4):1102–1119.
  • [22] Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J Optim. 2004;14(3):773–782.
  • [23] Attouch H, Czarnecki M-O. Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J Differ Equ. 2017;262(3):2745–2770.
  • [24] Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J Optim. 2014;24(1):232–256.
  • [25] Boţ RI, Csetnek ER. An inertial forward–backward–forward primal–dual splitting algorithm for solving monotone inclusion problems. Numer Algorithms. 2016;71(3):519–540.
  • [26] Boţ RI, Csetnek ER. Penalty schemes with inertial effects for monotone inclusion problems. Optimization. 2017;66(6):313–331.
  • [27] Boţ RI, Csetnek ER, Hendrich C. Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl Math Comput. 2015;256(1):472–487.
  • [28] Chen C, Chan RH, Ma S, Yang J. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J Imaging Sci. 2015;8(4):2239–2267.
  • [29] Chen C, Ma S, Yang J. A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J Optim. 2015;25(4):2120–2142.
  • [30] Maingé P-E. Convergence theorems for inertial Krasnosel'skiĭ–Mann type algorithms. J Comput Appl Math. 2008;219(1):223–236.
  • [31] Maingé P-E, Moudafi A. Convergence of new inertial proximal methods for DC programming. SIAM J Optim. 2008;19(1):397–413.
  • [32] Moudafi A, Oliny M. Convergence of a splitting inertial proximal method for monotone operators. J Comput Appl Math. 2003;155(2):447–454.
  • [33] Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. New York (NY): Springer; 2011. (CMS Books in Mathematics).
  • [34] Boţ RI. Conjugate duality in convex optimization. Berlin, Heidelberg: Springer; 2010. (Lecture Notes in Economics and Mathematical Systems; vol. 637).
  • [35] Zălinescu C. Convex analysis in general vector spaces. Singapore: World Scientific; 2002.
  • [36] Fitzpatrick S. Representing monotone operators by convex functions. In: Workshop/Miniconference on Functional Analysis and Optimization. Canberra: Australian National University; 1988. (Proceedings of the Centre for Mathematical Analysis; vol. 20).
  • [37] Bauschke HH, McLaren DA, Sendov HS. Fitzpatrick functions: inequalities, examples and remarks on a problem by S. Fitzpatrick. J Convex Anal. 2006;13(3):499–523.
  • [38] Borwein JM. Maximal monotonicity via convex analysis. J Convex Anal. 2006;13(3):561–586.
  • [39] Burachik RS, Svaiter BF. Maximal monotone operators, convex functions and a special family of enlargements. Set-Valued Anal. 2002;10(4):297–316.
  • [40] Combettes PL. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization. 2004;53(5–6):475–504.

Articles from Optimization are provided here courtesy of Taylor & Francis
