2018 Dec 11;68(10):1855–1880. doi: 10.1080/02331934.2018.1556662

A forward–backward penalty scheme with inertial effects for monotone inclusions. Applications to convex bilevel programming

Radu Ioan Boţ, Dang-Khoa Nguyen
PMCID: PMC6817331  PMID: 31708644

ABSTRACT

We investigate a forward–backward splitting algorithm of penalty type with inertial effects for finding the zeros of the sum of a maximally monotone operator, a cocoercive operator and the convex normal cone to the set of zeros of another cocoercive operator. Weak ergodic convergence is obtained for the iterates, provided that a condition expressed via the Fitzpatrick function of the operator describing the underlying set of the normal cone is verified. Under strong monotonicity assumptions, strong convergence of the sequence of generated iterates is proved. As a particular instance we consider a convex bilevel minimization problem with the sum of a non-smooth and a smooth function in the upper level and another smooth function in the lower level. We show that in this context weak non-ergodic and strong convergence can also be achieved under inf-compactness assumptions for the involved functions.

KEYWORDS: Maximally monotone operator, Fitzpatrick function, forward–backward splitting algorithm, convex bilevel optimization

AMS SUBJECT CLASSIFICATIONS: 47H05, 65K05, 90C25

1. Introduction and preliminaries

1.1. Motivation and problem formulation

During the last couple of years one can observe in the optimization community an increasing interest in numerical schemes for solving variational inequalities expressed as monotone inclusion problems of the form

$0\in Ax+N_M(x)$, (1)

where $\mathcal{H}$ is a real Hilbert space, $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ is a maximally monotone operator, $M:=\operatorname{argmin}h$ is the set of global minima of the proper, convex and lower semicontinuous function $h\colon\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$ and $N_M\colon\mathcal{H}\rightrightarrows\mathcal{H}$ is the normal cone operator of the set $M$. The article [1] was the starting point for a series of papers [2–12] addressing this topic or related ones. All these papers share the common feature that the proposed iterative schemes use penalization strategies: they evaluate the penalized function $h$ by its gradient when the function is smooth (see, for instance, [3]) and by its proximal operator when it is non-smooth (see, for instance, [4]).

Weak ergodic convergence has been obtained in [3,4] under the hypothesis:

For all $p\in\operatorname{Ran}N_M$: $\sum_{n\geq1}\lambda_n\beta_n\left[h^*\!\left(\frac{p}{\beta_n}\right)-\sigma_M\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$, (2)

with $(\lambda_n)_{n\geq1}$ the sequence of step sizes, $(\beta_n)_{n\geq1}$ the sequence of penalty parameters, $h^*\colon\mathcal{H}\to\overline{\mathbb{R}}$ the Fenchel conjugate function of $h$, and $\operatorname{Ran}N_M$ the range of the normal cone operator $N_M\colon\mathcal{H}\rightrightarrows\mathcal{H}$. Let us mention that (2) is the discretized counterpart of a condition introduced in [1] for continuous-time non-autonomous differential inclusions.

One motivation for studying numerical algorithms for monotone inclusions of type (1) comes from the fact that, when $A=\partial f$ is the convex subdifferential of a proper, convex and lower semicontinuous function $f\colon\mathcal{H}\to\overline{\mathbb{R}}$, they furnish iterative methods for solving bilevel optimization problems of the form

$\min_{x\in\mathcal{H}}\{f(x):x\in\operatorname{argmin}h\}$. (3)

Among the applications where bilevel programming problems play an important role we mention the modelling of Stackelberg games, the determination of Wardrop equilibria for network flows, convex feasibility problems [13], domain decomposition methods for PDEs [14], image processing problems [6], and optimal control problems [4].

Later on, in [7], the following monotone inclusion problem, which turned out to be more suitable for applications, has been addressed in the same spirit of penalty algorithms:

$0\in Ax+Dx+N_M(x)$, (4)

where $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ is a maximally monotone operator, $D\colon\mathcal{H}\to\mathcal{H}$ is a cocoercive operator and the constraint set $M$ is the set of zeros of another cocoercive operator $B\colon\mathcal{H}\to\mathcal{H}$. The provided algorithm of forward–backward type evaluates the operator $A$ by a backward step and the two single-valued operators by forward steps. For the convergence analysis, (2) has been replaced by a condition formulated in terms of the Fitzpatrick function associated with the operator $B$, which we will also use in this paper. In [5], several particular situations for which this new condition is fulfilled have been provided.

The aim of this work is to endow the forward–backward penalty scheme for solving (4) from [7] with inertial effects, which means that the new iterate is defined in terms of the previous two iterates. Inertial algorithms have their roots in the time discretization of second-order differential systems [15]. They can accelerate the convergence of iterates when minimizing a differentiable function [16] and the convergence of the objective function values when minimizing the sum of a convex non-smooth and a convex smooth function [17,18]. Moreover, as emphasized in [19], see also [20], algorithms with inertial effects may detect optimal solutions of minimization problems which cannot be found by their non-inertial variants. In recent years, a huge interest in inertial algorithms can be noticed (see, for instance, [8,9,15,17,20–32]).

We prove weak ergodic convergence of the sequence generated by the inertial forward–backward penalty algorithm to a solution of the monotone inclusion problem (4), under reasonable assumptions for the sequences of step sizes, penalty and inertial parameters. When the operator A is assumed to be strongly monotone, we also prove strong convergence of the generated iterates to the unique solution of (4).

In Section 3, we address the minimization of the sum of a convex non-smooth and a convex smooth function with respect to the set of minimizers of another convex and smooth function. Besides the convergence results obtained from the general case, we achieve weak non-ergodic and strong convergence statements under inf-compactness assumptions for the involved functions. The weak non-ergodic theorem is a useful alternative to the one in [9], where a similar statement has been obtained for the inertial forward–backward penalty algorithm with constant inertial parameter under assumptions which are quite complicated and hard to verify (see also [11,12]).

1.2. Notations and preliminaries

In this subsection we introduce some notions and basic results which we will use throughout this paper (see [33–35]). Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and associated norm $\|\cdot\|=\sqrt{\langle\cdot,\cdot\rangle}$.

For a function $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$, we denote by $\operatorname{Dom}\Psi=\{x\in\mathcal{H}:\Psi(x)<+\infty\}$ its effective domain and say that $\Psi$ is proper if $\operatorname{Dom}\Psi\neq\emptyset$ and $\Psi(x)>-\infty$ for all $x\in\mathcal{H}$. The conjugate function of $\Psi$ is $\Psi^*\colon\mathcal{H}\to\overline{\mathbb{R}}$, $\Psi^*(u)=\sup_{x\in\mathcal{H}}\{\langle x,u\rangle-\Psi(x)\}$. The convex subdifferential of $\Psi$ at the point $x\in\mathcal{H}$ is the set $\partial\Psi(x)=\{p\in\mathcal{H}:\langle y-x,p\rangle\leq\Psi(y)-\Psi(x)\ \forall y\in\mathcal{H}\}$, whenever $\Psi(x)\in\mathbb{R}$. We take by convention $\partial\Psi(x):=\emptyset$ if $\Psi(x)\in\{\pm\infty\}$.

Let $M$ be a non-empty subset of $\mathcal{H}$. The indicator function of $M$, denoted by $\delta_M\colon\mathcal{H}\to\overline{\mathbb{R}}$, takes the value $0$ on $M$ and $+\infty$ otherwise. The convex subdifferential of the indicator function is the normal cone of $M$, that is, $N_M(x)=\{p\in\mathcal{H}:\langle y-x,p\rangle\leq0\ \forall y\in M\}$ if $x\in M$, and $N_M(x)=\emptyset$ otherwise. Notice that for $x\in M$ we have $p\in N_M(x)$ if and only if $\sigma_M(p)=\langle x,p\rangle$, where $\sigma_M=\delta_M^*$ is the support function of $M$.

For an arbitrary set-valued operator $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ we denote by $\operatorname{Gr}A=\{(x,v)\in\mathcal{H}\times\mathcal{H}:v\in Ax\}$ its graph, by $\operatorname{Dom}A=\{x\in\mathcal{H}:Ax\neq\emptyset\}$ its domain, by $\operatorname{Ran}A=\{v\in\mathcal{H}:\exists x\in\mathcal{H}\text{ with }v\in Ax\}$ its range and by $A^{-1}\colon\mathcal{H}\rightrightarrows\mathcal{H}$ its inverse operator, defined by $(v,x)\in\operatorname{Gr}A^{-1}$ if and only if $(x,v)\in\operatorname{Gr}A$. We also use the notation $\operatorname{Zer}A=\{x\in\mathcal{H}:0\in Ax\}$ for the set of zeros of the operator $A$. We say that $A$ is monotone if $\langle x-y,v-w\rangle\geq0$ for all $(x,v),(y,w)\in\operatorname{Gr}A$. A monotone operator $A$ is said to be maximally monotone if there exists no proper monotone extension of the graph of $A$ on $\mathcal{H}\times\mathcal{H}$. Let us mention that if $A$ is maximally monotone, then $\operatorname{Zer}A$ is a convex and closed set [33, Proposition 23.39]. We refer to [33, Section 23.4] for conditions ensuring that $\operatorname{Zer}A$ is non-empty. If $A$ is maximally monotone, then one has the following characterization for the set of its zeros:

$z\in\operatorname{Zer}A$ if and only if $\langle u-z,y\rangle\geq0$ for all $(u,y)\in\operatorname{Gr}A$. (5)

The operator $A$ is said to be $\gamma$-strongly monotone with $\gamma>0$ if $\langle x-y,v-w\rangle\geq\gamma\|x-y\|^2$ for all $(x,v),(y,w)\in\operatorname{Gr}A$. If $A$ is maximally monotone and strongly monotone, then $\operatorname{Zer}A$ is a singleton, thus non-empty [33, Corollary 23.27].

The resolvent of $A$, $J_A\colon\mathcal{H}\rightrightarrows\mathcal{H}$, is defined by $J_A:=(\operatorname{Id}+A)^{-1}$, where $\operatorname{Id}\colon\mathcal{H}\to\mathcal{H}$ denotes the identity operator on $\mathcal{H}$. If $A$ is maximally monotone, then $J_A\colon\mathcal{H}\to\mathcal{H}$ is single-valued and maximally monotone [33, Proposition 23.7, Corollary 23.10]. For an arbitrary $\gamma>0$ we have the following identity [33, Proposition 23.18]:

$J_{\gamma A}+\gamma J_{\gamma^{-1}A^{-1}}\circ\gamma^{-1}\operatorname{Id}=\operatorname{Id}$.

We denote by $\Gamma(\mathcal{H})$ the family of proper, convex and lower semicontinuous extended real-valued functions defined on $\mathcal{H}$. When $\Psi\in\Gamma(\mathcal{H})$ and $\gamma>0$, we denote by $\operatorname{prox}_{\gamma\Psi}(x)$ the proximal point with parameter $\gamma$ of the function $\Psi$ at the point $x\in\mathcal{H}$, which is the unique optimal solution of the optimization problem

$\inf_{y\in\mathcal{H}}\left\{\Psi(y)+\frac{1}{2\gamma}\|y-x\|^2\right\}$.

Notice that $J_{\gamma\partial\Psi}=(\operatorname{Id}+\gamma\partial\Psi)^{-1}=\operatorname{prox}_{\gamma\Psi}$, thus $\operatorname{prox}_{\gamma\Psi}\colon\mathcal{H}\to\mathcal{H}$ is a single-valued operator fulfilling the so-called Moreau decomposition formula:

$\operatorname{prox}_{\gamma\Psi}+\gamma\operatorname{prox}_{\gamma^{-1}\Psi^*}\circ\gamma^{-1}\operatorname{Id}=\operatorname{Id}$.
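As a quick numerical illustration (a sketch of ours, not part of the paper), Moreau's decomposition can be checked in one dimension for $\Psi=|\cdot|$: its proximal map is the soft-thresholding operator, while $\Psi^*=\delta_{[-1,1]}$, whose proximal map (for any parameter) is the projection onto $[-1,1]$.

```python
# Numerical check of Moreau's decomposition
#   prox_{gamma*Psi}(x) + gamma * prox_{Psi*/gamma}(x/gamma) = x
# for Psi = |.| on the real line. Here prox_{gamma*Psi} is soft-thresholding
# and Psi* is the indicator of [-1, 1], whose prox is the projection.

def soft_threshold(x, gamma):
    # prox of gamma * |.|
    return max(abs(x) - gamma, 0.0) * (1.0 if x >= 0 else -1.0)

def project_interval(x):
    # prox of the indicator of [-1, 1], i.e. the projection onto [-1, 1]
    return min(max(x, -1.0), 1.0)

gamma = 0.7
for x in [-3.0, -0.5, 0.0, 0.2, 1.5]:
    lhs = soft_threshold(x, gamma) + gamma * project_interval(x / gamma)
    assert abs(lhs - x) < 1e-12
```

The identity holds exactly in both regimes: for $|x|\leq\gamma$ the proximal point is $0$ and the conjugate part returns $x$, while for $|x|>\gamma$ the two terms contribute $x\mp\gamma$ and $\pm\gamma$.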

The function $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}$ is said to be $\gamma$-strongly convex with $\gamma>0$ if $\Psi-\frac{\gamma}{2}\|\cdot\|^2$ is a convex function. This property implies that $\partial\Psi$ is $\gamma$-strongly monotone.

The Fitzpatrick function [36] associated with a monotone operator $A$ is defined as

$\varphi_A\colon\mathcal{H}\times\mathcal{H}\to\overline{\mathbb{R}},\quad\varphi_A(x,u):=\sup_{(y,v)\in\operatorname{Gr}A}\{\langle x,v\rangle+\langle y,u\rangle-\langle y,v\rangle\}$,

and it is a convex and lower semicontinuous function. For insights into the outstanding role played by the Fitzpatrick function in relating convex analysis with the theory of monotone operators we refer to [33,34,37–39] and the references therein. If $A$ is maximally monotone, then $\varphi_A$ is proper and it fulfils

$\varphi_A(x,u)\geq\langle x,u\rangle\quad\forall(x,u)\in\mathcal{H}\times\mathcal{H}$,

with equality if and only if $(x,u)\in\operatorname{Gr}A$. Notice that if $\Psi\in\Gamma(\mathcal{H})$, then $\partial\Psi$ is a maximally monotone operator and it holds $(\partial\Psi)^{-1}=\partial\Psi^*$. Furthermore, the following inequality is true (see [37]):

$\varphi_{\partial\Psi}(x,u)\leq\Psi(x)+\Psi^*(u)\quad\forall(x,u)\in\mathcal{H}\times\mathcal{H}$. (6)
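Inequality (6) can be made concrete in one dimension (an illustration of ours, not from the paper): for $\Psi(x)=\frac{x^2}{2}$ one has $\partial\Psi=\operatorname{Id}$, $\Psi^*=\Psi$, and the Fitzpatrick function evaluates in closed form to $\varphi_{\operatorname{Id}}(x,u)=\frac{(x+u)^2}{4}$, so (6) reduces to $\frac{(x+u)^2}{4}\leq\frac{x^2}{2}+\frac{u^2}{2}$, with equality exactly on the graph $u=x$.

```python
# 1D illustration of inequality (6): phi_{dPsi}(x, u) <= Psi(x) + Psi*(u)
# for Psi(x) = x^2 / 2, so dPsi = Id, Psi* = Psi, and
# phi(x, u) = sup_y { x*y + y*u - y^2 } = (x + u)^2 / 4.

def fitzpatrick_id(x, u):
    # approximate the supremum over y on a fine grid of [-10, 10]
    # (the exact maximizer y = (x + u) / 2 lies in this range below)
    grid = [i * 0.001 - 10.0 for i in range(20001)]
    return max(x * y + y * u - y * y for y in grid)

for (x, u) in [(0.0, 0.0), (1.0, 1.0), (2.0, -1.0), (-0.5, 3.0)]:
    phi = fitzpatrick_id(x, u)
    assert phi <= 0.5 * x * x + 0.5 * u * u + 1e-6   # inequality (6)
    assert abs(phi - (x + u) ** 2 / 4) < 1e-3        # closed form
```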

We present in the following some statements that will be essential when carrying out the convergence analysis. Let $(x_n)_{n\geq0}$ be a sequence in $\mathcal{H}$ and $(\lambda_n)_{n\geq1}$ be a sequence of positive real numbers. The sequence of weighted averages $(z_n)_{n\geq1}$ is defined for every $n\geq1$ as

$z_n:=\frac{1}{\tau_n}\sum_{k=1}^n\lambda_kx_k,\quad\text{where }\tau_n:=\sum_{k=1}^n\lambda_k$. (7)
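In an implementation, the ergodic averages in (7) need not be recomputed from scratch: they satisfy the running-average recursion $z_n=z_{n-1}+\frac{\lambda_n}{\tau_n}(x_n-z_{n-1})$, so no past iterates have to be stored. A small sketch (ours, not from the paper), written for scalar iterates:

```python
# Incremental computation of the weighted ergodic averages z_n from (7):
#   z_n = z_{n-1} + (lambda_n / tau_n) * (x_n - z_{n-1}),  tau_n = sum of weights.

def ergodic_averages(xs, lams):
    z, tau, out = 0.0, 0.0, []
    for x, lam in zip(xs, lams):
        tau += lam
        z += (lam / tau) * (x - z)  # running weighted average
        out.append(z)
    return out

xs = [4.0, 2.0, 1.0]
lams = [1.0, 1.0, 2.0]
# direct evaluation of (7): z_3 = (1*4 + 1*2 + 2*1) / 4 = 2.0
assert abs(ergodic_averages(xs, lams)[-1] - 2.0) < 1e-12
```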

Lemma 1.1 Opial–Passty —

Let $Z$ be a non-empty subset of $\mathcal{H}$ and assume that the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists for every element $u\in Z$. If every sequential weak cluster point of $(x_n)_{n\geq0}$, respectively $(z_n)_{n\geq1}$, lies in $Z$, then the sequence $(x_n)_{n\geq0}$, respectively $(z_n)_{n\geq1}$, converges weakly to an element in $Z$ as $n\to+\infty$.

The following two results can be found in [5,7].

Lemma 1.2

Let $(\theta_n)_{n\geq0}$, $(\xi_n)_{n\geq1}$ and $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$. If there exists $n_0\geq1$ such that

$\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})-\xi_n+\delta_n\quad\forall n\geq n_0$

and $\alpha$ such that

$0\leq\alpha_n\leq\alpha<1\quad\forall n\geq1$,

then the following statements are true:

  1. $\sum_{n\geq1}[\theta_n-\theta_{n-1}]_+<+\infty$, where $[s]_+:=\max\{s,0\}$;

  2. the limit $\lim_{n\to+\infty}\theta_n$ exists;

  3. the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

The following result follows from Lemma 1.2, applied in the case $\alpha_n:=0$ and $\theta_n:=\rho_n-\rho$ for all $n\geq1$, where $\rho$ is a lower bound of $(\rho_n)_{n\geq1}$.

Lemma 1.3

Let $(\rho_n)_{n\geq1}$ be a sequence in $\mathbb{R}$ which is bounded from below, and $(\xi_n)_{n\geq1}$, $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$. If there exists $n_0\geq1$ such that

$\rho_{n+1}-\rho_n\leq-\xi_n+\delta_n\quad\forall n\geq n_0$,

then the following statements are true:

  1. the sequence $(\rho_n)_{n\geq1}$ is convergent;

  2. the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

The following result, which will be useful in this work, shows that statement (ii) in Lemma 1.3 can also be obtained when $(\rho_n)_{n\geq1}$ is not bounded from below, but has a particular form.

Lemma 1.4

Let $(\rho_n)_{n\geq1}$ be a sequence in $\mathbb{R}$ and $(\xi_n)_{n\geq1}$, $(\delta_n)_{n\geq1}$ be sequences in $\mathbb{R}_+$ with $(\delta_n)_{n\geq1}\in\ell^1$ and

$\rho_n:=\theta_n-\alpha_n\theta_{n-1}+\chi_n\quad\forall n\geq1$,

where $(\theta_n)_{n\geq0}$, $(\chi_n)_{n\geq1}$ are sequences in $\mathbb{R}_+$ and there exists $\alpha$ such that

$0\leq\alpha_n\leq\alpha<1\quad\forall n\geq1$.

If there exists $n_0\geq1$ such that

$\rho_{n+1}-\rho_n\leq-\xi_n+\delta_n\quad\forall n\geq n_0$, (8)

then the sequence $(\xi_n)_{n\geq1}$ belongs to $\ell^1$.

Proof.

We fix an integer $\bar N\geq n_0$, sum up the inequalities in (8) for $n=n_0,n_0+1,\ldots,\bar N$ and obtain

$\rho_{\bar N+1}-\rho_{n_0}\leq-\sum_{n=n_0}^{\bar N}\xi_n+\sum_{n=n_0}^{\bar N}\delta_n\leq\sum_{n\geq1}\delta_n<+\infty$. (9)

Hence the sequence $(\rho_n)_{n\geq1}$ is bounded from above. Let $\bar\rho>0$ be an upper bound of this sequence. For all $n\geq1$ it holds

$\theta_n-\alpha\theta_{n-1}\leq\theta_n-\alpha_n\theta_{n-1}+\chi_n=\rho_n\leq\bar\rho$,

from which we deduce that

$-\rho_n\leq-\theta_n+\alpha\theta_{n-1}\leq\alpha\theta_{n-1}$. (10)

By induction we obtain for all $n\geq n_0+1$

$\theta_n\leq\alpha\theta_{n-1}+\bar\rho\leq\alpha^{n-n_0}\theta_{n_0}+\bar\rho\sum_{k=1}^{n-n_0}\alpha^{k-1}\leq\alpha^{n-n_0}\theta_{n_0}+\frac{\bar\rho}{1-\alpha}$. (11)

Then inequality (9) combined with (10) and (11) leads to

$\sum_{n=n_0}^{\bar N}\xi_n\leq\rho_{n_0}-\rho_{\bar N+1}+\sum_{n=n_0}^{\bar N}\delta_n\leq\rho_{n_0}+\alpha\theta_{\bar N}+\sum_{n\geq1}\delta_n\leq\rho_{n_0}+\alpha^{\bar N-n_0+1}\theta_{n_0}+\frac{\alpha\bar\rho}{1-\alpha}+\sum_{n\geq1}\delta_n<+\infty$. (12)

We let $\bar N$ converge to $+\infty$ and obtain that $\sum_{n\geq1}\xi_n<+\infty$.

2. The general monotone inclusion problem

In this section we address the following monotone inclusion problem.

Problem 2.1

Let $\mathcal{H}$ be a real Hilbert space, $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ a maximally monotone operator, $D\colon\mathcal{H}\to\mathcal{H}$ an $\eta$-cocoercive operator with $\eta>0$, $B\colon\mathcal{H}\to\mathcal{H}$ a $\mu$-cocoercive operator with $\mu>0$, and assume that $M:=\operatorname{Zer}B\neq\emptyset$. The monotone inclusion problem to solve reads

$0\in Ax+Dx+N_M(x)$.

The following forward–backward penalty algorithm with inertial effects for solving Problem 2.1 will be in the focus of our investigations in this paper.

Algorithm 2.2

Choose $x_0,x_1\in\mathcal{H}$ and set for all $n\geq1$

$x_{n+1}:=J_{\lambda_nA}\left(x_n+\alpha_n(x_n-x_{n-1})-\lambda_n(Dx_n+\beta_nBx_n)\right)$,

where $(\lambda_n)_{n\geq1}$, $(\beta_n)_{n\geq1}$ and $(\alpha_n)_{n\geq1}$ are sequences of positive step sizes, penalty parameters and inertial parameters, respectively, satisfying conditions (C1)–(C3); in particular, $\lim_{n\to+\infty}\lambda_n=0$, $\sum_{n\geq1}\lambda_n=+\infty$, $\sum_{n\geq1}\lambda_n^2<+\infty$, and $(\alpha_n)_{n\geq1}$ is non-decreasing with $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$.

When $D=0$ and $B=\nabla h$, where $h\colon\mathcal{H}\to\mathbb{R}$ is a convex and differentiable function with $\mu^{-1}$-Lipschitz continuous gradient for $\mu>0$ fulfilling $\min h=0$, Problem 2.1 recovers the monotone inclusion problem addressed in [3, Section 3] and Algorithm 2.2 can be seen as an inertial version of the iterative scheme considered in that paper. When $B=0$, we have that $N_M=\{0\}$ and Algorithm 2.2 is nothing else than the inertial version of the classical forward–backward algorithm (see, for instance, [33,40]).
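The iteration of Algorithm 2.2 is easy to prototype. The following sketch (our toy instance, not one of the paper's examples) works on $\mathcal{H}=\mathbb{R}^2$ with $A=\operatorname{Id}$ (maximally and $1$-strongly monotone, with resolvent $J_{\lambda A}(y)=y/(1+\lambda)$), $Dx=x-c$ for $c=(3,1)$ ($1$-cocoercive) and $B=\nabla h$ for $h(x)=\frac12(x_1-x_2)^2$, so that $M=\operatorname{Zer}B=\{x_1=x_2\}$ and $B$ is $\frac12$-cocoercive; a direct computation shows that the unique zero of $A+D+N_M$ is $(1,1)$:

```python
# Sketch of the inertial forward-backward penalty iteration
#   x_{n+1} = J_{lam_n A}( x_n + alpha_n (x_n - x_{n-1}) - lam_n (D x_n + beta_n B x_n) )
# for the toy data described above (our assumptions, not the paper's experiments).

def solve(n_iter=50000, alpha=0.1):
    c = (3.0, 1.0)
    x_prev = x = (0.0, 0.0)
    for n in range(1, n_iter + 1):
        lam = 0.5 / n ** 0.6      # step sizes: not summable, square-summable
        beta = 0.2 / lam          # penalty parameters: lam_n * beta_n = 0.2 < mu = 1/2
        w = x[0] - x[1]           # B x = w * (1, -1)
        y = tuple(
            x[i] + alpha * (x[i] - x_prev[i])
            - lam * ((x[i] - c[i]) + beta * w * (1.0, -1.0)[i])
            for i in range(2)
        )
        x_prev, x = x, tuple(yi / (1.0 + lam) for yi in y)  # resolvent of lam * Id
    return x

x = solve()
assert abs(x[0] - 1.0) < 1e-2 and abs(x[1] - 1.0) < 1e-2
```

Since $A$ is strongly monotone, the iterates converge strongly to the unique solution (this is the situation of Theorem 2.8 (ii) below); the growing penalty $\beta_n$ forces the infeasibility $x_1-x_2$ to vanish along the way.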

Hypotheses 2.2

The convergence analysis will be carried out under the following hypotheses (see also [7]):

  1. $A+N_M$ is maximally monotone and $\operatorname{Zer}(A+D+N_M)\neq\emptyset$;

  2. (H2fitz) for every $p\in\operatorname{Ran}N_M$: $\sum_{n\geq1}\lambda_n\beta_n\left[\sup_{u\in M}\varphi_B\!\left(u,\frac{p}{\beta_n}\right)-\sigma_M\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$.

Since $A$ and $N_M$ are maximally monotone operators, the sum $A+N_M$ is maximally monotone provided some specific regularity conditions are fulfilled (see [33–35,38]). Furthermore, since $D$ is also maximally monotone [33, Example 20.28] and $\operatorname{Dom}D=\mathcal{H}$, if $A+N_M$ is maximally monotone, then $A+D+N_M$ is also maximally monotone.

Let us also notice that for $p\in\operatorname{Ran}N_M$ there exists $\hat u\in M$ such that $p\in N_M(\hat u)$, hence, for every $\beta>0$ it holds

$\sup_{u\in M}\varphi_B\!\left(u,\frac{p}{\beta}\right)-\sigma_M\!\left(\frac{p}{\beta}\right)\geq\varphi_B\!\left(\hat u,\frac{p}{\beta}\right)-\sigma_M\!\left(\frac{p}{\beta}\right)\geq\left\langle\hat u,\frac{p}{\beta}\right\rangle-\sigma_M\!\left(\frac{p}{\beta}\right)=0$.

For situations where (H2fitz) is satisfied we refer the reader to [5,8,9,11].

Before formulating the main theorem of this section we will prove some useful technical results.

Lemma 2.3

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2 and $(u,y)$ be an element in $\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$ with $v\in Au$ and $p\in N_M(u)$. Further, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ be such that $1-\varepsilon_3>0$. Then the following inequality holds for all $n\geq1$:

$\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$. (13)

Proof.

Let $n\geq1$ be fixed. According to the definition of the resolvent of the operator $A$ we have

$x_n-x_{n+1}-\lambda_n(Dx_n+\beta_nBx_n)+\alpha_n(x_n-x_{n-1})\in\lambda_nAx_{n+1}$ (14)

and, since $\lambda_nv\in\lambda_nAu$, the monotonicity of $A$ guarantees

$\langle x_{n+1}-u,\ x_n-x_{n+1}-\lambda_n(Dx_n+\beta_nBx_n+v)+\alpha_n(x_n-x_{n-1})\rangle\geq0$ (15)

or, equivalently,

$2\langle u-x_{n+1},x_n-x_{n+1}\rangle\leq2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle$. (16)

For the term on the left-hand side of (16) we have

$2\langle u-x_{n+1},x_n-x_{n+1}\rangle=\|x_{n+1}-u\|^2+\|x_{n+1}-x_n\|^2-\|x_n-u\|^2$. (17)

Since

$-2\alpha_n\langle u-x_n,x_n-x_{n-1}\rangle=-\alpha_n\|u-x_{n-1}\|^2+\alpha_n\|u-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$

and

$2\langle x_{n+1}-x_n,\alpha_n(x_n-x_{n-1})\rangle\leq4\varepsilon_1\|x_{n+1}-x_n\|^2+\frac{\alpha_n^2}{4\varepsilon_1}\|x_n-x_{n-1}\|^2$,

by adding the two relations, we obtain the following estimate for the second term on the right-hand side of (16):

$-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2+4\varepsilon_1\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2$. (18)

We turn now our attention to the first term on the right-hand side of (16), which can be written as

$2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle=2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle+2\lambda_n\beta_n\langle x_n-x_{n+1},Bx_n\rangle+2\lambda_n\langle x_n-x_{n+1},Dx_n+v\rangle$. (19)

We have

$2\lambda_n\beta_n\langle x_n-x_{n+1},Bx_n\rangle\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2\|Bx_n\|^2$ (20)

and

$2\lambda_n\langle x_n-x_{n+1},Dx_n+v\rangle\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{2}{\varepsilon_2}\lambda_n^2\|Dx_n+v\|^2\leq\frac{\varepsilon_2}{2}\|x_{n+1}-x_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2$. (21)

On the other hand, we have

$2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle=2\lambda_n\beta_n\langle u-x_n,Bx_n\rangle+2\lambda_n\langle u-x_n,Dx_n-Du\rangle+2\lambda_n\langle u-x_n,Du+v\rangle$. (22)

Since $0<\varepsilon_3<1$ and $Bu=0$, the cocoercivity of $B$ gives us

$2\lambda_n\beta_n\langle u-x_n,Bx_n\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle$. (23)

Similarly, the cocoercivity of $D$ gives us

$2\lambda_n\langle u-x_n,Dx_n-Du\rangle\leq-2\eta\lambda_n\|Dx_n-Du\|^2$. (24)

Combining (23)–(24) with (22) and using the definition of the Fitzpatrick function together with the fact that $\sigma_M(p/(\varepsilon_3\beta_n))=\langle u,p/(\varepsilon_3\beta_n)\rangle$, we obtain

$2\lambda_n\langle u-x_n,\beta_nBx_n+Dx_n+v\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,Du+v\rangle$
$=-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,Bx_n\rangle-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y-p\rangle$
$=-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y\rangle+2\varepsilon_3\lambda_n\beta_n\left[\langle u,Bx_n\rangle+\left\langle x_n,\frac{p}{\varepsilon_3\beta_n}\right\rangle-\langle x_n,Bx_n\rangle-\left\langle u,\frac{p}{\varepsilon_3\beta_n}\right\rangle\right]$
$\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|Bx_n\|^2-2\eta\lambda_n\|Dx_n-Du\|^2+2\lambda_n\langle u-x_n,y\rangle+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$. (25)

The inequalities (20), (21) and (25) lead to

$2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle\leq\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\varepsilon_2\|x_{n+1}-x_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$. (26)

Finally, by combining (17), (18) and (26), we obtain (13).

From now on we will assume that for $0<\alpha<\frac13$ the constants $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ are chosen such that

(C4) $1-\varepsilon_3>0$, $\varepsilon_2<1-4\varepsilon_1-\alpha-\frac{\alpha^2}{4\varepsilon_1}$ and $\sup_{n\geq1}\lambda_n\beta_n<\mu\varepsilon_2(1-\varepsilon_3)$.

As a consequence, there exists $0<s\leq1-3\varepsilon_1-\varepsilon_2-\varepsilon_1\left(1+\frac{\alpha}{2\varepsilon_1}\right)^2$, which means that for all $n\geq1$ it holds

$\alpha_{n+1}+\frac{\alpha_{n+1}^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq\alpha+\frac{\alpha^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\leq-s$. (27)

On the other hand, there exists $0<t\leq\mu(1-\varepsilon_3)-\frac{1}{\varepsilon_2}\sup_{n\geq1}\lambda_n\beta_n$, which means that for all $n\geq1$ it holds

$\mu(1-\varepsilon_3)-\frac{1}{\varepsilon_2}\lambda_n\beta_n\geq t$. (28)

Remark 2.4

  1. Since $0<\alpha<\frac13$, one can always find $\varepsilon_1,\varepsilon_2>0$ such that $\varepsilon_2<1-4\varepsilon_1-\alpha-\frac{\alpha^2}{4\varepsilon_1}$. One possible choice is $\varepsilon_1=\frac{\alpha}{4}$ and $0<\varepsilon_2<1-3\alpha$. From the second inequality in (C4) it follows that $1-3\varepsilon_1-\varepsilon_2>\varepsilon_1+\alpha+\frac{\alpha^2}{4\varepsilon_1}>0$.

  2. As
    $1-3\varepsilon_1-\varepsilon_2-\varepsilon_1\left(1+\frac{\alpha}{2\varepsilon_1}\right)^2=1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}>0$,
    it is always possible to choose $s$ such that $0<s\leq1-3\varepsilon_1-\varepsilon_2-\varepsilon_1\left(1+\frac{\alpha}{2\varepsilon_1}\right)^2$. Since in this case $s\leq1-4\varepsilon_1-\varepsilon_2-\alpha-\frac{\alpha^2}{4\varepsilon_1}$, one has (27).

The following proposition brings us closer to the convergence result.

Proposition 2.5

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2 and assume that the Hypotheses 2.2 are verified. Then the following statements are true:

  1. the sequence $(x_{n+1}-x_n)_{n\geq0}$ belongs to $\ell^2$ and the sequence $(\lambda_n\beta_n\|Bx_n\|^2)_{n\geq1}$ belongs to $\ell^1$;

  2. if, moreover, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}\|Bx_n\|=0$ and thus every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $M$;

  3. for every $u\in\operatorname{Zer}(A+D+N_M)$, the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists.

Proof.

Since $\lim_{n\to+\infty}\lambda_n=0$, there exists an integer $n_0\geq1$ such that $\lambda_n\leq\frac{\varepsilon_2\eta}{2}$ for all $n\geq n_0$. According to Lemma 2.3, for every $(u,y)\in\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$, with $v\in Au$ and $p\in N_M(u)$, and all $n\geq n_0$ the following inequality holds:

$\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+2\lambda_n\beta_n\left(\frac{\lambda_n\beta_n}{\varepsilon_2}-\mu(1-\varepsilon_3)\right)\|Bx_n\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$. (29)

We consider $u\in\operatorname{Zer}(A+D+N_M)$, which means that we can take $y=0$ in (29). For all $n\geq1$ we denote

$\theta_n:=\|x_n-u\|^2,\quad\rho_n:=\theta_n-\alpha_n\theta_{n-1}+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2$ (30)

and

$\delta_n:=\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]$. (31)

Using that $(\alpha_n)_{n\geq1}$ is non-decreasing, for all $n\geq n_0$ it yields

$\rho_{n+1}-\rho_n\leq\left[\alpha_{n+1}+\frac{\alpha_{n+1}^2}{4\varepsilon_1}-(1-4\varepsilon_1-\varepsilon_2)\right]\|x_{n+1}-x_n\|^2+2\lambda_n\beta_n\left(\frac{\lambda_n\beta_n}{\varepsilon_2}-\mu(1-\varepsilon_3)\right)\|Bx_n\|^2+\delta_n\leq-s\|x_{n+1}-x_n\|^2-2t\lambda_n\beta_n\|Bx_n\|^2+\delta_n$, (32)

where $s,t>0$ are chosen according to (27) and (28), respectively.

Thanks to (H2fitz) and (C1) it holds

$\sum_{n\geq1}\delta_n=\frac{4}{\varepsilon_2}\|Du+v\|^2\sum_{n\geq1}\lambda_n^2+2\varepsilon_3\sum_{n\geq1}\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]<+\infty$. (33)

Hence, according to Lemma 1.4, we obtain

$\sum_{n\geq0}\|x_{n+1}-x_n\|^2<+\infty\quad\text{and}\quad\sum_{n\geq1}\lambda_n\beta_n\|Bx_n\|^2<+\infty$, (34)

which proves (i). If, in addition, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}\|Bx_n\|=0$, which means that every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $\operatorname{Zer}B=M$.

In order to prove (iii), we consider again the inequality (29) for an arbitrary element $u\in\operatorname{Zer}(A+D+N_M)$ and $y=0$. With the notations in (30) and (31), we get for all $n\geq n_0$

$\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n$. (35)

According to (33) and (34) we have

$\sum_{n\geq1}\left[\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n\right]\leq\left(\alpha+\frac{\alpha^2}{4\varepsilon_1}\right)\sum_{n\geq1}\|x_n-x_{n-1}\|^2+\sum_{n\geq1}\delta_n<+\infty$, (36)

therefore, by Lemma 1.2, the limit $\lim_{n\to+\infty}\theta_n=\lim_{n\to+\infty}\|x_n-u\|^2$ exists, which means that the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists, too.

Remark 2.6

The condition (C3) that we imposed in combination with $0<\alpha<\frac13$ on the sequence of inertial parameters $(\alpha_n)_{n\geq1}$ is the one proposed in [15, Proposition 2.1] when addressing the convergence of the inertial proximal point algorithm. However, the statements in the proposition above and in the following convergence theorem remain valid if one alternatively assumes that there exists $\alpha$ such that $0\leq\alpha_n\leq\alpha<1$ for all $n\geq1$ and

$\sum_{n\geq1}\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2<+\infty$.

This can be realized if one chooses, for a fixed $p>1$,

$\alpha_n\leq\min\left\{\alpha,\ 2\varepsilon_1\left(\sqrt{1+\frac{1}{\varepsilon_1n^p\|x_n-x_{n-1}\|^2}}-1\right)\right\}\quad\forall n\geq1$.

Indeed, in this situation we have $\frac{\alpha_n^2}{4\varepsilon_1}+\alpha_n-\frac{1}{n^p\|x_n-x_{n-1}\|^2}\leq0$ for all $n\geq1$, which gives

$\sum_{n\geq1}\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2\leq\sum_{n\geq1}\frac{1}{n^p}<+\infty$.

Now we are ready to prove the main theorem of this section, which addresses the convergence of the sequence generated by Algorithm 2.2.

Theorem 2.8

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 2.2, $(z_n)_{n\geq1}$ be the sequence defined in (7) and assume that the Hypotheses 2.2 are verified. Then the following statements are true:

  1. the sequence $(z_n)_{n\geq1}$ converges weakly to an element in $\operatorname{Zer}(A+D+N_M)$ as $n\to+\infty$;

  2. if $A$ is $\gamma$-strongly monotone with $\gamma>0$, then $(x_n)_{n\geq0}$ converges strongly to the unique element in $\operatorname{Zer}(A+D+N_M)$ as $n\to+\infty$.

Proof.

  1. According to Proposition 2.5 (iii), the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists for every $u\in\operatorname{Zer}(A+D+N_M)$. Let $z$ be a sequential weak cluster point of $(z_n)_{n\geq1}$. We will show that $z\in\operatorname{Zer}(A+D+N_M)$ by using the characterization (5) of maximal monotonicity, and the conclusion will follow by Lemma 1.1. To this end we consider an arbitrary $(u,y)\in\operatorname{Gr}(A+D+N_M)$ such that $y=v+Du+p$, where $v\in Au$ and $p\in N_M(u)$. From (29), with the notations (30) and (31), we have for all $n\geq n_0$
    $\rho_{n+1}-\rho_n\leq-s\|x_{n+1}-x_n\|^2-2t\lambda_n\beta_n\|Bx_n\|^2+\delta_n+2\lambda_n\langle u-x_n,y\rangle\leq\delta_n+2\lambda_n\langle u-x_n,y\rangle$. (37)
    Recall from (33) that $\sum_{n\geq1}\delta_n<+\infty$. Since $(x_n)_{n\geq0}$ is bounded, the sequence $(\rho_n)_{n\geq1}$ is also bounded.
    We fix an arbitrary integer $\bar N\geq n_0$ and sum up the inequalities in (37) for $n=n_0+1,n_0+2,\ldots,\bar N$. This yields
    $\rho_{\bar N+1}-\rho_{n_0+1}\leq\sum_{n\geq1}\delta_n+2\left\langle\sum_{n=1}^{n_0}\lambda_n(x_n-u),y\right\rangle+2\tau_{\bar N}\langle u-z_{\bar N},y\rangle$.
    After dividing this last inequality by $2\tau_{\bar N}=2\sum_{n=1}^{\bar N}\lambda_n$, we obtain
    $\frac{1}{2\tau_{\bar N}}\left(\rho_{\bar N+1}-\rho_{n_0+1}\right)\leq\frac{1}{2\tau_{\bar N}}T+\langle u-z_{\bar N},y\rangle$, (38)
    where $T:=\sum_{n\geq1}\delta_n+2\left\langle\sum_{n=1}^{n_0}\lambda_n(x_n-u),y\right\rangle\in\mathbb{R}$. By passing in (38) to the limit and by using that $\lim_{\bar N\to+\infty}\tau_{\bar N}=\lim_{\bar N\to+\infty}\sum_{n=1}^{\bar N}\lambda_n=+\infty$, we get
    $\liminf_{\bar N\to+\infty}\langle u-z_{\bar N},y\rangle\geq0$.
    As $z$ is a sequential weak cluster point of $(z_n)_{n\geq1}$, the above inequality gives us $\langle u-z,y\rangle\geq0$, which finally means that $z\in\operatorname{Zer}(A+D+N_M)$.
  2. Let $u\in\mathcal{H}$ be the unique element in $\operatorname{Zer}(A+D+N_M)$. Since $A$ is $\gamma$-strongly monotone with $\gamma>0$, the relation (15) becomes for all $n\geq1$
    $\langle x_{n+1}-u,\ x_n-x_{n+1}-\lambda_n(Dx_n+\beta_nBx_n+v)+\alpha_n(x_n-x_{n-1})\rangle\geq\gamma\lambda_n\|x_{n+1}-u\|^2$
    or, equivalently,
    $2\gamma\lambda_n\|x_{n+1}-u\|^2+2\langle u-x_{n+1},x_n-x_{n+1}\rangle\leq2\lambda_n\langle u-x_{n+1},\beta_nBx_n+Dx_n+v\rangle-2\alpha_n\langle u-x_{n+1},x_n-x_{n-1}\rangle$.
    By using again (17), (18) and (26) we obtain for all $n\geq1$
    $2\gamma\lambda_n\|x_{n+1}-u\|^2+\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-2\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|Bx_n\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|Dx_n-Du\|^2+\frac{4}{\varepsilon_2}\lambda_n^2\|Du+v\|^2+2\varepsilon_3\lambda_n\beta_n\left[\sup_{u'\in M}\varphi_B\!\left(u',\frac{p}{\varepsilon_3\beta_n}\right)-\sigma_M\!\left(\frac{p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$.
    By using the notations in (30) and (31), this yields for all $n\geq n_0$
    $2\gamma\lambda_n\|x_{n+1}-u\|^2+\theta_{n+1}-\theta_n\leq\alpha_n(\theta_n-\theta_{n-1})+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\delta_n$.
    By taking into account (36), from Lemma 1.2 we get
    $2\gamma\sum_{n\geq1}\lambda_n\|x_n-u\|^2<+\infty$.
    According to (C1) we have $\sum_{n\geq1}\lambda_n=+\infty$, which implies that the limit $\lim_{n\to+\infty}\|x_n-u\|$ must be equal to zero. This provides the desired conclusion.

3. Applications to convex bilevel programming

We will employ the results obtained in the previous section, in the context of monotone inclusions, for solving convex bilevel programming problems.

Problem 3.1

Let $\mathcal{H}$ be a real Hilbert space, $f\colon\mathcal{H}\to\overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function and $g,h\colon\mathcal{H}\to\mathbb{R}$ convex differentiable functions with $L_g$-Lipschitz continuous and, respectively, $L_h$-Lipschitz continuous gradients. Suppose that $\operatorname{argmin}h\neq\emptyset$ and $\min h=0$. The bilevel programming problem to solve reads

$\min_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}$.

The assumption $\min h=0$ is not restrictive as, otherwise, one can replace $h$ with $h-\min h$.

Hypotheses 3.2

The convergence analysis will be carried out under the following hypotheses:

  1. $\partial f+N_{\operatorname{argmin}h}$ is maximally monotone and $S:=\operatorname{argmin}_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}\neq\emptyset$;

  2. (H2prog) for every $p\in\operatorname{Ran}N_{\operatorname{argmin}h}$: $\sum_{n\geq1}\lambda_n\beta_n\left[h^*\!\left(\frac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin}h}\!\left(\frac{p}{\beta_n}\right)\right]<+\infty$.

Under the above hypotheses, we have that $\partial f+\nabla g+N_{\operatorname{argmin}h}=\partial(f+g+\delta_{\operatorname{argmin}h})$ and hence $S=\operatorname{Zer}(\partial f+\nabla g+N_{\operatorname{argmin}h})$. Since, according to the Theorem of Baillon–Haddad (see, e.g. [33, Corollary 18.16]), $\nabla g$ and $\nabla h$ are $L_g^{-1}$-cocoercive and, respectively, $L_h^{-1}$-cocoercive, and $\operatorname{argmin}h=\operatorname{Zer}\nabla h$, solving the bilevel programming problem in Problem 3.1 reduces to solving the monotone inclusion

$0\in\partial f(x)+\nabla g(x)+N_{\operatorname{argmin}h}(x)$.

By using to this end Algorithm 2.2, we obtain the following iterative scheme.

Algorithm 3.3

Choose $x_0,x_1\in\mathcal{H}$ and set for all $n\geq1$

$x_{n+1}:=\operatorname{prox}_{\lambda_nf}\left(x_n+\alpha_n(x_n-x_{n-1})-\lambda_n(\nabla g(x_n)+\beta_n\nabla h(x_n))\right)$,

where the sequences $(\lambda_n)_{n\geq1}$, $(\beta_n)_{n\geq1}$ and $(\alpha_n)_{n\geq1}$ are chosen as in Algorithm 2.2.

By using the inequality (6), one can easily notice that (H2prog) implies (H2fitz), which means that the convergence statements for Algorithm 3.3 can be derived as particular instances of the ones obtained in the previous section.

Alternatively, one can use to this end the following lemma and employ the same ideas and techniques as in Section 2. Lemma 3.3 is similar to Lemma 2.3; however, it will allow us to provide convergence statements also for the sequence of function values $(h(x_n))_{n\geq0}$.
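A runnable sketch of the bilevel scheme on a toy instance of ours (the data are assumptions for illustration, not the paper's experiments): minimize $f(x)+g(x)=\|x\|_1+\frac12\|x-c\|^2$ over $\operatorname{argmin}h$ for $h(x)=\frac12(x_1-x_2)^2$, i.e. subject to $x_1=x_2$, with $c=(3,1)$. On the line $x_1=x_2=t$ the objective $2|t|+\frac12[(t-3)^2+(t-1)^2]$ is minimized at $t=1$, so the solution is $(1,1)$.

```python
# Sketch of Algorithm 3.3 on the toy bilevel problem described above:
#   x_{n+1} = prox_{lam_n f}( x_n + alpha_n (x_n - x_{n-1})
#                             - lam_n (grad g(x_n) + beta_n grad h(x_n)) ),
# where prox of lam * ||.||_1 is componentwise soft-thresholding.

def soft(v, t):
    # componentwise prox of t * ||.||_1
    return tuple(max(abs(vi) - t, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v)

def solve_bilevel(n_iter=50000, alpha=0.1):
    c = (3.0, 1.0)
    x_prev = x = (0.0, 0.0)
    for n in range(1, n_iter + 1):
        lam = 0.5 / n ** 0.6      # sum lam_n = +inf, sum lam_n^2 < +inf
        beta = 0.2 / lam          # lam_n * beta_n = 0.2, below 1 / L_h = 1/2
        w = x[0] - x[1]           # grad h(x) = w * (1, -1)
        y = tuple(
            x[i] + alpha * (x[i] - x_prev[i])
            - lam * ((x[i] - c[i]) + beta * w * (1.0, -1.0)[i])
            for i in range(2)
        )
        x_prev, x = x, soft(y, lam)
    return x

x = solve_bilevel()
assert abs(x[0] - 1.0) < 1e-2 and abs(x[1] - 1.0) < 1e-2
```

Since both $f+g$ and $h$ are inf-compact on $\mathbb{R}^2$, this toy instance falls under the non-ergodic convergence regime of Theorem 3.10 below.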

Lemma 3.3

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and $(u,y)$ be an element in $\operatorname{Gr}(\partial f+\nabla g+N_{\operatorname{argmin}h})$ such that $y=v+\nabla g(u)+p$ with $v\in\partial f(u)$ and $p\in N_{\operatorname{argmin}h}(u)$. Further, let $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ be such that $1-\varepsilon_3>0$. Then the following inequality holds for all $n\geq1$:

$\|x_{n+1}-u\|^2-\|x_n-u\|^2\leq\alpha_n\|x_n-u\|^2-\alpha_n\|x_{n-1}-u\|^2-(1-4\varepsilon_1-\varepsilon_2)\|x_{n+1}-x_n\|^2+\left(\alpha_n+\frac{\alpha_n^2}{4\varepsilon_1}\right)\|x_n-x_{n-1}\|^2+\left(\frac{2}{\varepsilon_2}\lambda_n^2\beta_n^2-\mu(1-\varepsilon_3)\lambda_n\beta_n\right)\|\nabla h(x_n)\|^2+\left(\frac{4}{\varepsilon_2}\lambda_n^2-2\eta\lambda_n\right)\|\nabla g(x_n)-\nabla g(u)\|^2+\lambda_n\beta_n\left[h(u)-h(x_n)\right]+\frac{4}{\varepsilon_2}\lambda_n^2\|\nabla g(u)+v\|^2+\varepsilon_3\lambda_n\beta_n\left[h^*\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)-\sigma_{\operatorname{argmin}h}\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)\right]+2\lambda_n\langle u-x_n,y\rangle$,

where $\mu:=L_h^{-1}$ and $\eta:=L_g^{-1}$.

Proof.

Let $n\geq1$ be fixed. The proof follows by combining the estimates used in the proof of Lemma 2.3 with some inequalities which better exploit the convexity of $h$. From (23) we have

$2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq-2\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2+2\varepsilon_3\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle$.

Since $h$ is convex, the following relation also holds:

$2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq2\lambda_n\beta_n\left[h(u)-h(x_n)\right]$.

Summing up the two inequalities above and dividing by two gives

$2\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle\leq-\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2+\varepsilon_3\lambda_n\beta_n\langle u-x_n,\nabla h(x_n)\rangle+\lambda_n\beta_n\left[h(u)-h(x_n)\right]$.

Using the same techniques as in the derivation of (25), we get

$2\lambda_n\langle u-x_n,v+\nabla g(x_n)+\beta_n\nabla h(x_n)\rangle\leq-\mu(1-\varepsilon_3)\lambda_n\beta_n\|\nabla h(x_n)\|^2-2\eta\lambda_n\|\nabla g(x_n)-\nabla g(u)\|^2+\lambda_n\beta_n\left[h(u)-h(x_n)\right]+2\lambda_n\langle u-x_n,y\rangle+\varepsilon_3\lambda_n\beta_n\left[h^*\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)-\sigma_{\operatorname{argmin}h}\!\left(\frac{2p}{\varepsilon_3\beta_n}\right)\right]$.

With these improved estimates, the conclusion follows as in the proof of Lemma 2.3.

By using now Lemma 3.3, one obtains, after slightly adapting the proof of Proposition 2.5, the following result.

Proposition 3.4

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and assume that the Hypotheses 3.2 are verified. Then the following statements are true:

  1. the sequence $(x_{n+1}-x_n)_{n\geq0}$ belongs to $\ell^2$ and the sequences $(\lambda_n\beta_n\|\nabla h(x_n)\|^2)_{n\geq1}$ and $(\lambda_n\beta_nh(x_n))_{n\geq1}$ belong to $\ell^1$;

  2. if, moreover, $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, then $\lim_{n\to+\infty}h(x_n)=\lim_{n\to+\infty}\|\nabla h(x_n)\|=0$ and thus every cluster point of the sequence $(x_n)_{n\geq0}$ lies in $\operatorname{argmin}h$;

  3. for every $u\in S$, the limit $\lim_{n\to+\infty}\|x_n-u\|$ exists.

Finally, the above proposition leads to the following convergence result.

Theorem 3.6

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy condition (C4). Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3, $(z_n)_{n\geq1}$ be the sequence defined in (7) and assume that the Hypotheses 3.2 are verified. Then the following statements are true:

  1. the sequence $(z_n)_{n\geq1}$ converges weakly to an element in $S$ as $n\to+\infty$;

  2. if $f$ is $\gamma$-strongly convex with $\gamma>0$, then $(x_n)_{n\geq0}$ converges strongly to the unique element in $S$ as $n\to+\infty$.

In the following we will show that under inf-compactness assumptions one can achieve weak non-ergodic convergence for the sequence $(x_n)_{n\geq0}$. Weak non-ergodic convergence has been obtained for Algorithm 3.3 in [9] when $\alpha_n=\alpha$ for all $n\geq1$ and for restrictive choices of both the sequence of step sizes and the penalty parameters.

We denote by $(f+g)_{\min}:=\min_{x\in\operatorname{argmin}h}\{f(x)+g(x)\}$ the optimal objective value of Problem 3.1. For every element $x\in\mathcal{H}$, we denote by $\operatorname{dist}(x,S)=\inf_{u\in S}\|x-u\|$ the distance from $x$ to $S$. In particular, $\operatorname{dist}(x,S)=\|x-\operatorname{Pr}_Sx\|$, where $\operatorname{Pr}_Sx$ denotes the projection of $x$ onto $S$. The projection operator $\operatorname{Pr}_S$ is firmly non-expansive [33, Proposition 4.8], which means

$\|\operatorname{Pr}_Sx-\operatorname{Pr}_Sy\|^2+\|(\operatorname{Id}-\operatorname{Pr}_S)x-(\operatorname{Id}-\operatorname{Pr}_S)y\|^2\leq\|x-y\|^2\quad\forall x,y\in\mathcal{H}$. (39)

Denoting $d(x):=\frac12\operatorname{dist}(x,S)^2=\frac12\|x-\operatorname{Pr}_Sx\|^2$ for all $x\in\mathcal{H}$, one has that $x\mapsto d(x)$ is differentiable and it holds $\nabla d(x)=x-\operatorname{Pr}_Sx$ for all $x\in\mathcal{H}$.
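The firm non-expansiveness inequality (39) can be checked numerically for a simple set (our illustration, not from the paper): projecting onto the line $S=\{x_1=x_2\}$ in $\mathbb{R}^2$, for which $\operatorname{Pr}_S(x)=\left(\frac{x_1+x_2}{2},\frac{x_1+x_2}{2}\right)$. For a linear orthogonal projection, (39) in fact holds with equality by Pythagoras; firm non-expansiveness in general only asserts the inequality.

```python
# Check of (39) for the projection onto S = {x1 = x2} in R^2:
#   ||Pr x - Pr y||^2 + ||(Id - Pr) x - (Id - Pr) y||^2 <= ||x - y||^2.

def proj(x):
    # orthogonal projection onto the line x1 = x2
    m = (x[0] + x[1]) / 2.0
    return (m, m)

def sq(a, b):
    # squared Euclidean distance
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

for x, y in [((3.0, 1.0), (0.0, 0.0)), ((-1.0, 4.0), (2.0, 2.0))]:
    px, py = proj(x), proj(y)
    rx = (x[0] - px[0], x[1] - px[1])   # (Id - Pr_S) x
    ry = (y[0] - py[0], y[1] - py[1])   # (Id - Pr_S) y
    assert sq(px, py) + sq(rx, ry) <= sq(x, y) + 1e-12
```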

Lemma 3.6

Let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3 and assume that the Hypotheses 3.2 are verified. Then the following inequality holds for all $n\geq1$:

$d(x_{n+1})-d(x_n)+\lambda_n\left[(f+g)(x_{n+1})-(f+g)_{\min}\right]\leq\alpha_n\left[d(x_n)-d(x_{n-1})\right]+\left(\frac{L_g}{2}\lambda_n+\frac{L_h}{4}\lambda_n\beta_n+\frac{\alpha_n}{2}\right)\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2$. (40)

Proof.

Let $n\geq1$ be fixed. Since $d$ is convex, we have

$d(x_{n+1})-d(x_n)\leq\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_{n+1}-x_n\rangle$. (41)

Further, there exists $v_{n+1}\in\partial f(x_{n+1})$ such that (see (14))

$x_n-x_{n+1}-\lambda_n\left(\nabla g(x_n)+\beta_n\nabla h(x_n)\right)+\alpha_n(x_n-x_{n-1})=\lambda_nv_{n+1}$

and, so,

$\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_{n+1}-x_n\rangle=-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},v_{n+1}\rangle-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla g(x_n)\rangle-\lambda_n\beta_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle+\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_n-x_{n-1}\rangle$. (42)

Since $v_{n+1}\in\partial f(x_{n+1})$, we get

$-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},v_{n+1}\rangle\leq\lambda_n\left[f(\operatorname{Pr}_Sx_{n+1})-f(x_{n+1})\right]$. (43)

Using the convexity of $g$ it follows

$g(x_n)-g(\operatorname{Pr}_Sx_{n+1})\leq\langle\nabla g(x_n),x_n-\operatorname{Pr}_Sx_{n+1}\rangle$. (44)

On the other hand, the Descent Lemma gives

$g(x_{n+1})\leq g(x_n)+\langle\nabla g(x_n),x_{n+1}-x_n\rangle+\frac{L_g}{2}\|x_{n+1}-x_n\|^2$. (45)

By adding (44) and (45), it yields

$-\lambda_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla g(x_n)\rangle\leq\lambda_n\left[g(\operatorname{Pr}_Sx_{n+1})-g(x_{n+1})\right]+\frac{L_g\lambda_n}{2}\|x_{n+1}-x_n\|^2$. (46)

Using the $\frac{1}{L_h}$-cocoercivity of $\nabla h$ combined with the fact that $\nabla h(\operatorname{Pr}_Sx_{n+1})=0$ (as $\operatorname{Pr}_Sx_{n+1}$ belongs to $S\subseteq\operatorname{argmin}h$), it yields

$\langle x_n-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle\geq\frac{1}{L_h}\|\nabla h(x_n)\|^2$.

Therefore

$-\lambda_n\beta_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},\nabla h(x_n)\rangle\leq\lambda_n\beta_n\left[\langle x_n-x_{n+1},\nabla h(x_n)\rangle-\frac{1}{L_h}\|\nabla h(x_n)\|^2\right]\leq\frac{\lambda_n\beta_nL_h}{4}\|x_{n+1}-x_n\|^2$. (47)

Further, using (39), we have

$\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1}-(x_n-\operatorname{Pr}_Sx_n),x_n-x_{n-1}\rangle\leq\frac{\alpha_n}{2}\|(\operatorname{Id}-\operatorname{Pr}_S)x_{n+1}-(\operatorname{Id}-\operatorname{Pr}_S)x_n\|^2+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2\leq\frac{\alpha_n}{2}\|x_{n+1}-x_n\|^2+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2$

and

$\alpha_n\langle x_n-\operatorname{Pr}_Sx_n,x_n-x_{n-1}\rangle=\alpha_nd(x_n)+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2-\frac{\alpha_n}{2}\|x_{n-1}-\operatorname{Pr}_Sx_n\|^2\leq\alpha_nd(x_n)+\frac{\alpha_n}{2}\|x_n-x_{n-1}\|^2-\alpha_nd(x_{n-1})$.

By adding the two relations above, we obtain

$\alpha_n\langle x_{n+1}-\operatorname{Pr}_Sx_{n+1},x_n-x_{n-1}\rangle\leq\frac{\alpha_n}{2}\|x_{n+1}-x_n\|^2+\alpha_n\|x_n-x_{n-1}\|^2+\alpha_n\left[d(x_n)-d(x_{n-1})\right]$. (48)

By combining (43), (46), (47) and (48) with (42) we obtain the desired conclusion.

Definition 3.7

A function $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}$ is said to be inf-compact if for every $r>0$ and $\kappa\in\mathbb{R}$ the set

$\operatorname{Lev}_\kappa^r\Psi:=\{x\in\mathcal{H}:\|x\|\leq r,\ \Psi(x)\leq\kappa\}$

is relatively compact in $\mathcal{H}$.

A useful property of inf-compact functions follows.

Lemma 3.8

Let $\Psi\colon\mathcal{H}\to\overline{\mathbb{R}}$ be inf-compact and $(x_n)_{n\geq0}$ be a bounded sequence in $\mathcal{H}$ such that $(\Psi(x_n))_{n\geq0}$ is bounded as well. If the sequence $(x_n)_{n\geq0}$ converges weakly to an element $\hat x$ as $n\to+\infty$, then it converges strongly to this element.

Proof.

Let $\bar r>0$ and $\bar\kappa\in\mathbb{R}$ be such that for all $n\geq1$

$\|x_n\|\leq\bar r\quad\text{and}\quad\Psi(x_n)\leq\bar\kappa$.

Hence, $(x_n)_{n\geq0}$ belongs to the set $\operatorname{Lev}_{\bar\kappa}^{\bar r}(\Psi)$, which is relatively compact. Then $(x_n)_{n\geq0}$ has at least one strongly convergent subsequence. Since every strongly convergent subsequence $(x_{n_l})_{l\geq0}$ of $(x_n)_{n\geq0}$ has $\hat x$ as its limit, the desired conclusion follows.

We can now formulate the weak non-ergodic convergence result.

Theorem 3.10

Let $0<\alpha<\frac13$, $\varepsilon_1,\varepsilon_2,\varepsilon_3>0$ and the sequences $(\lambda_n)_{n\geq1}$ and $(\beta_n)_{n\geq1}$ satisfy the condition $0<\liminf_{n\to+\infty}\lambda_n\beta_n\leq\sup_{n\geq1}\lambda_n\beta_n\leq\mu$, let $(x_n)_{n\geq0}$ be the sequence generated by Algorithm 3.3, and assume that the Hypotheses 3.2 are verified and that either $f+g$ or $h$ is inf-compact. Then the following statements are true:

  1. $\lim_{n\to+\infty}d(x_n)=0$;

  2. the sequence $(x_n)_{n\geq0}$ converges weakly to an element in $S$ as $n\to+\infty$;

  3. if $h$ is inf-compact, then the sequence $(x_n)_{n\geq0}$ converges strongly to an element in $S$ as $n\to+\infty$.

Proof.

  1. Thanks to Lemma 3.6, for all n1 we have
    dxn+1dxn+λn(f+g)xn+1(f+g)αndxndxn1+ζn, (49)
    where
    ζn:= Lg2λn+Lh4λnβn+αn2xn+1xn2+αnxnxn12.
    From Proposition 3.4 (i), combined with the fact that both sequences (λn)n1 and (βn)n1 are bounded, it follows that n1ζn<+.

    In general, since (xn)n0 is not necessarily included in argminh, we have to treat two different cases.

    Case 1: There exists an integer n11 such that (f+g)(xn)(f+g) for all nn1. In this case, we obtain from Lemma 1.2 that:
    • the limit limn+d(xn) exists.
    • nn2λn[(f+g)(xn+1)(f+g)]<+. Moreover, since (λn)n11, we must have
      lim infn+(f+g)xn(f+g). (50)
    Consider a subsequence (xnk)k0 of (xn)n0 such that
    limk+(f+g)xnk=lim infn+(f+g)xn
    and note that, thanks to (50), the sequence ((f+g)(xnk))k0 is bounded. From Proposition 3.4 (ii) –(iii) we get that also (xnk)k0 and (h(xnk))k0 are bounded. Thus, since either f+g or h is inf-compact, there exists a subsequence (xnl)l0 of (xnk)k0, which converges strongly to an element xˆ as l+. According to Proposition 3.4 (ii) –(iii), xˆ belongs to argminh. On the other hand,
    liml+(f+g)xnl=lim infn+(f+g)xn(f+g)xˆ(f+g). (51)
    We deduce from (50)–(51) that (f+g)(xˆ)=(f+g), or in other words, that xˆS. In conclusion, thanks to the continuity of d,
    limn+dxn=limldxnl=dxˆ=0.
    Case 2: For all n1 there exists some n>n such that (f+g)(xn)<(f+g). We define the set
    V=n1:(f+g)xn<(f+g).
    There exists an integer n22 such that for all nn2 the set {kn:kV} is non-empty. Hence, for all nn2 the number
    tn:=maxkn:kV
    is well-defined. By definition tnn for all nn3 and moreover the sequence {tn}nn2 is non-decreasing and limn+tn=. Indeed, if limntn=tR, then for all n>t it holds (f+g)(xn)(f+g), contradiction. Choose an integer Nn2.
    • If tN<N, then, for all n=tN,,N1, since (f+g)(xn)(f+g), the inequality (49) gives
      dxn+1dxndxn+1dxn+λnFxn+1Fαndxndxn1+ζn. (52)
      Summing (52) for n=tN,,N1 and using tht {αn}n1 is non-decreasing, it yields
      dxNdxtN n=tNN1αndxnαn1dxn1+n=tNN1ζnαdxN1+ntNζn. (53)
    • If tN=N, then d(xN)=d(xtN) and we have
      dxNαdxN1dxtN+ntNζn. (54)
    For all $n\ge 1$ we define $a_n:=d(x_n)-\alpha d(x_{n-1})$. In both cases it yields
    $a_N\le d(x_{t_N})+\sum_{n=t_N}^{N}\zeta_n\le d(x_{t_N})+\sum_{n\ge t_N}\zeta_n$. (55)
    Passing in (55) to the limit as $N\to+\infty$, and recalling that $\sum_{n\ge 1}\zeta_n<+\infty$ and $t_N\to+\infty$, we obtain
    $\limsup_{n\to+\infty}a_n\le\limsup_{n\to+\infty}d(x_{t_n})$. (56)
    Let $u\in S$. For all $n\ge 1$ we have
    $d(x_n)=\tfrac12\operatorname{dist}(x_n,S)^2\le\tfrac12\|x_n-u\|^2$,
    which shows that $(d(x_n))_{n\ge 0}$ is bounded, as $\lim_{n\to+\infty}\|x_n-u\|$ exists. We obtain
    $\limsup_{n\to+\infty}a_n=\limsup_{n\to+\infty}\big[d(x_n)-\alpha d(x_{n-1})\big]\ge(1-\alpha)\limsup_{n\to+\infty}d(x_n)\ge 0$. (57)
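    Estimate (57) combines the boundedness of $(d(x_n))_{n\ge 0}$ with the superadditivity of the upper limit, $\limsup_n(u_n-v_n)\ge\limsup_n u_n-\limsup_n v_n$ for bounded sequences; since shifting the index does not change the upper limit,

    ```latex
    \[
      \limsup_{n\to+\infty}\bigl[d(x_n)-\alpha\, d(x_{n-1})\bigr]
      \;\ge\; \limsup_{n\to+\infty} d(x_n)-\alpha \limsup_{n\to+\infty} d(x_{n-1})
      \;=\; (1-\alpha)\limsup_{n\to+\infty} d(x_n)\;\ge\; 0,
    \]
    ```

    where the last inequality uses $0\le\alpha<1$ and $d(\cdot)\ge 0$.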
    Further, for all $n\ge n_2$ we have $(f+g)(x_{t_n})<(f+g)^*$, which gives
    $\limsup_{n\to+\infty}(f+g)(x_{t_n})\le(f+g)^*$. (58)
    This means that the sequence $((f+g)(x_{t_n}))_{n\ge n_2}$ is bounded from above. Consider a subsequence $(x_{t_{n_k}})_{k\ge 0}$ of $(x_{t_n})_{n\ge n_2}$ such that
    $\lim_{k\to+\infty}d(x_{t_{n_k}})=\limsup_{n\to+\infty}d(x_{t_n})$.
    From Proposition 3.4 (ii)–(iii) we get that $(x_{t_{n_k}})_{k\ge 0}$ and $(h(x_{t_{n_k}}))_{k\ge 0}$ are bounded as well. Thus, since either $f+g$ or $h$ is inf-compact, there exists a subsequence $(x_{t_{n_l}})_{l\ge 0}$ of $(x_{t_{n_k}})_{k\ge 0}$ which converges strongly to an element $\hat x$ as $l\to+\infty$. According to Proposition 3.4 (ii)–(iii), $\hat x$ belongs to $\operatorname{argmin}h$. Furthermore, by the lower semicontinuity of $f+g$ it holds
    $\liminf_{l\to+\infty}(f+g)(x_{t_{n_l}})\ge(f+g)(\hat x)\ge(f+g)^*$. (59)
    We deduce from (58) and (59) that
    $(f+g)^*\le(f+g)(\hat x)\le\liminf_{l\to+\infty}(f+g)(x_{t_{n_l}})\le\limsup_{n\to+\infty}(f+g)(x_{t_n})\le(f+g)^*$,
    which gives $\hat x\in S$. Thanks to the continuity of $d$ we get
    $\limsup_{n\to+\infty}d(x_{t_n})=\lim_{l\to+\infty}d(x_{t_{n_l}})=d(\hat x)=0$. (60)
    By combining (56), (57) and (60), it yields
    $0\le(1-\alpha)\limsup_{n\to+\infty}d(x_n)\le\limsup_{n\to+\infty}a_n\le\limsup_{n\to+\infty}d(x_{t_n})=0$,
    which implies $\limsup_{n\to+\infty}d(x_n)=0$ and thus
    $\lim_{n\to+\infty}d(x_n)=\liminf_{n\to+\infty}d(x_n)=\limsup_{n\to+\infty}d(x_n)=0$.
  2. According to (i) we have $\lim_{n\to+\infty}d(x_n)=0$; thus every weak cluster point of the sequence $(x_n)_{n\ge 0}$ belongs to $S$. From Lemma 1.1 it follows that $(x_n)_{n\ge 0}$ converges weakly to a point in $S$ as $n\to+\infty$.

  3. Since $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, from Proposition 3.4 (ii) we have that
    $\lim_{n\to+\infty}h(x_n)=\lim_{n\to+\infty}\|\nabla h(x_n)\|=0$.
    Since $(x_n)_{n\ge 0}$ is bounded, there exist $\bar r>0$ and $\bar\kappa\in\mathbb{R}$ such that for all $n\ge 1$
    $\|x_n\|\le\bar r$ and $h(x_n)\le\bar\kappa$.
    Thanks to (ii), the sequence $(x_n)_{n\ge 0}$ converges weakly to an element of $S$. Therefore, according to Lemma 3.8, it converges strongly to this element of $S$.
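To make the interplay of the inertial, forward (gradient), backward (proximal) and penalty steps concrete, the following sketch applies an iteration of the type analysed in this paper to a toy bilevel problem in the plane. It is a minimal illustration under our own assumptions, not the authors' exact scheme: the functions and the parameter choices $\alpha_n\equiv 0.1$, $\beta_n=\sqrt{n+1}$, $\lambda_n=0.4/\beta_n$ (so that $\lambda_n\beta_n$ stays bounded away from zero while $\sum_n\lambda_n=+\infty$) are illustrative only.

```python
import numpy as np

# Illustrative instance (our own assumptions, not taken from the paper):
# upper level  f(x) = mu*||x||_1  (non-smooth) and  g(x) = 0.5*||x - c||^2,
# lower level  h(x) = 0.5*(x_1 - x_2)^2, whose minimizers form the diagonal.
mu = 0.02
c = np.array([1.0, 0.0])

def grad_g(x):
    return x - c

def grad_h(x):
    return np.array([x[0] - x[1], x[1] - x[0]])

def h(x):
    return 0.5 * (x[0] - x[1]) ** 2

def prox_f(v, t):
    # proximal map of t*mu*||.||_1 (componentwise soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

alpha = 0.1                       # constant inertial parameter
x_prev = np.zeros(2)
x = np.zeros(2)
for n in range(1, 3001):
    beta = np.sqrt(n + 1.0)       # penalty parameter beta_n -> +infinity
    lam = 0.4 / beta              # step size: lam_n*beta_n = 0.4, sum lam_n diverges
    z = x + alpha * (x - x_prev)                   # inertial extrapolation
    v = z - lam * (grad_g(z) + beta * grad_h(z))   # forward step on g + beta_n*h
    x_prev, x = x, prox_f(v, lam)                  # backward step on f

# The iterates approach lower-level feasibility (the diagonal) while tracking
# the minimizer of f + g over it, which lies near (0.49, 0.49).
print(x, h(x))
```

Note how the growing $\beta_n$ enforces the lower-level constraint only asymptotically, while the decaying $\lambda_n$ keeps the forward step on $g+\beta_n h$ stable.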

Funding Statement

Research partially supported by FWF (Austrian Science Fund), projects I 2419-N32 and W1260-N35.

Acknowledgments

The second author gratefully acknowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO), which is funded by the Austrian Science Fund (FWF, project W1260-N35).

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Radu Ioan Boţ  http://orcid.org/0000-0002-4469-314X

References

  • [1] Attouch H, Czarnecki M-O. Asymptotic behavior of coupled dynamical systems with multiscale aspects. J Differ Equ. 2010;248(6):1315–1344.
  • [2] Attouch H, Cabot A, Czarnecki M-O. Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans Am Math Soc. 2018;370(2):755–790.
  • [3] Attouch H, Czarnecki M-O, Peypouquet J. Coupling forward–backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J Optim. 2011;21(4):1251–1274.
  • [4] Attouch H, Czarnecki M-O, Peypouquet J. Prox-penalization and splitting methods for constrained variational problems. SIAM J Optim. 2011;21(1):149–173.
  • [5] Banert S, Boţ RI. Backward penalty schemes for monotone inclusion problems. J Optim Theory Appl. 2015;166(3):930–948.
  • [6] Boţ RI, Csetnek ER. A Tseng's type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J Math. 2014;42(4):451–465.
  • [7] Boţ RI, Csetnek ER. Forward–backward and Tseng's type penalty schemes for monotone inclusion problems. Set-Valued Var Anal. 2014;22(2):313–331.
  • [8] Boţ RI, Csetnek ER, Nimana N. Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data. Optim Lett. 2018;12(1):17–33.
  • [9] Boţ RI, Csetnek ER, Nimana N. An inertial proximal-gradient penalization scheme for constrained convex optimization problems. Vietnam J Math. 2018;46(1):53–71.
  • [10] Frankel P, Peypouquet J. Lagrangian-penalization algorithm for constrained optimization and variational inequalities. Set-Valued Var Anal. 2012;20(2):169–185.
  • [11] Noun N, Peypouquet J. Forward–backward penalty scheme for constrained convex minimization without inf-compactness. J Optim Theory Appl. 2013;158(3):787–795.
  • [12] Peypouquet J. Coupling the gradient method with a general exterior penalization scheme for convex minimization. J Optim Theory Appl. 2012;153(1):123–138.
  • [13] Attouch H, Briceno-Arias LM, Combettes PL. A parallel splitting method for coupled monotone inclusions. SIAM J Control Optim. 2010;48(5):3246–3270.
  • [14] Attouch H, Bolte J, Redont P, Soubeyran A. Alternating proximal algorithms for weakly coupled convex minimization problems, applications to dynamical games and PDE's. J Convex Anal. 2008;15(3):485–506.
  • [15] Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9(1):3–11.
  • [16] Polyak BT. Introduction to optimization. New York: Publications Division, Optimization Software Inc.; 1987. (Translations Series in Mathematics and Engineering).
  • [17] Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202.
  • [18] Chambolle A, Dossal C. On the convergence of the iterates of the ‘Fast Iterative Shrinkage/Thresholding Algorithm’. J Optim Theory Appl. 2015;166(3):968–982.
  • [19] Bertsekas DP. Nonlinear programming. Cambridge (MA): Athena Scientific; 1999.
  • [20] Boţ RI, Csetnek ER, Laszlo SC. An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J Comput Optim. 2016;4(1):3–25.
  • [21] Alvarez F. On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J Control Optim. 2000;38(4):1102–1119.
  • [22] Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J Optim. 2004;14(3):773–782.
  • [23] Attouch H, Czarnecki M-O. Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J Differ Equ. 2017;262(3):2745–2770.
  • [24] Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J Optim. 2014;24(1):232–256.
  • [25] Boţ RI, Csetnek ER. An inertial forward–backward–forward primal–dual splitting algorithm for solving monotone inclusion problems. Numer Algorithms. 2016;71(3):519–540.
  • [26] Boţ RI, Csetnek ER. Penalty schemes with inertial effects for monotone inclusion problems. Optimization. 2017;66(6):313–331.
  • [27] Boţ RI, Csetnek ER, Hendrich C. Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl Math Comput. 2015;256(1):472–487.
  • [28] Chen C, Chan RH, Ma S, Yang J. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J Imaging Sci. 2015;8(4):2239–2267.
  • [29] Chen C, Ma S, Yang J. A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J Optim. 2015;25(4):2120–2142.
  • [30] Maingé P-E. Convergence theorems for inertial Krasnosel'skiĭ–Mann type algorithms. J Comput Appl Math. 2008;219(1):223–236.
  • [31] Maingé P-E, Moudafi A. Convergence of new inertial proximal methods for DC programming. SIAM J Optim. 2008;19(1):397–413.
  • [32] Moudafi A, Oliny M. Convergence of a splitting inertial proximal method for monotone operators. J Comput Appl Math. 2003;155(2):447–454.
  • [33] Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. New York (NY): Springer; 2011. (CMS Books in Mathematics).
  • [34] Boţ RI. Conjugate duality in convex optimization. Berlin, Heidelberg: Springer; 2010. (Lecture Notes in Economics and Mathematical Systems; vol. 637).
  • [35] Zălinescu C. Convex analysis in general vector spaces. Singapore: World Scientific; 2002.
  • [36] Fitzpatrick S. Representing monotone operators by convex functions. In: Workshop/Miniconference on Functional Analysis and Optimization. Canberra: Australian National University; 1988. (Proceedings of the Centre for Mathematical Analysis; vol. 20).
  • [37] Bauschke HH, McLaren DA, Sendov HS. Fitzpatrick functions: inequalities, examples and remarks on a problem by S. Fitzpatrick. J Convex Anal. 2006;13(3):499–523.
  • [38] Borwein JM. Maximal monotonicity via convex analysis. J Convex Anal. 2006;13(3):561–586.
  • [39] Burachik RS, Svaiter BF. Maximal monotone operators, convex functions and a special family of enlargements. Set-Valued Anal. 2002;10(4):297–316.
  • [40] Combettes PL. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization. 2004;53(5–6):475–504.

Articles from Optimization are provided here courtesy of Taylor & Francis
