. 2022 Apr 19;36(1):727–756. doi: 10.1007/s10884-022-10160-3

Second Order Splitting Dynamics with Vanishing Damping for Additively Structured Monotone Inclusions

Radu Ioan Boţ, David Alexander Hulett
PMCID: PMC10901952  PMID: 38435835

Abstract

In the framework of a real Hilbert space, we address the problem of finding the zeros of the sum of a maximally monotone operator $A$ and a cocoercive operator $B$. We study the asymptotic behaviour of the trajectories generated by a second order equation with vanishing damping, attached to this problem, and governed by a time-dependent forward–backward-type operator. This is a splitting system, as it only requires forward evaluations of $B$ and backward evaluations of $A$. A proper tuning of the system parameters ensures the weak convergence of the trajectories to the set of zeros of $A+B$, as well as fast convergence of the velocities towards zero. A particular case of our system allows us to derive fast convergence rates for the problem of minimizing the sum of a proper, convex and lower semicontinuous function and a smooth and convex function with Lipschitz continuous gradient. We illustrate the theoretical outcomes by numerical experiments.

Keywords: Asymptotic stabilization, Damped inertial dynamics, Lyapunov analysis, Vanishing viscosity, Splitting system, Monotone inclusions

Introduction

Problem Formulation and a Continuous Time Splitting Scheme with Vanishing Damping

Let $H$ be a real Hilbert space, $A: H \to 2^H$ a maximally monotone operator and $B: H \to H$ a $\beta$-cocoercive operator for some $\beta > 0$ such that $\operatorname{zer}(A+B) \neq \emptyset$. Devising fast convergent continuous and discrete time dynamics for solving monotone inclusions of the type

$$\text{find } x \in H \text{ such that } 0 \in (A+B)(x) \tag{1}$$

is of great importance in many fields, including, but not limited to, optimization, equilibrium theory, economics and game theory, partial differential equations, and statistics. One of our main motivations comes from the fact that solving the convex optimization problem

$$\min_{x \in H} f(x) + g(x),$$

where $f: H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and lower semicontinuous and $g: H \to \mathbb{R}$ is convex and Fréchet differentiable with a Lipschitz continuous gradient, is equivalent to solving the monotone inclusion

$$0 \in (\partial f + \nabla g)(x).$$

We want to exploit the additive structure of (1) and approach A and B separately, in the spirit of the splitting paradigm.

For $t \geq t_0 > 0$, $\alpha > 1$, $\xi \geq 0$, and functions $\lambda, \gamma: [t_0, +\infty) \to (0, +\infty)$, we will study the asymptotic behaviour of the trajectories of the second order differential equation

$$\text{(Split-DIN-AVD)} \qquad \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(T_{\lambda(t),\gamma(t)}(x(t))\Big) + T_{\lambda(t),\gamma(t)}(x(t)) = 0, \tag{2}$$

where, for $\lambda, \gamma > 0$, the operator $T_{\lambda,\gamma}: H \to H$ is given by

$$T_{\lambda,\gamma} = \frac{1}{\lambda}\big[\operatorname{Id} - J_{\gamma A}(\operatorname{Id} - \gamma B)\big].$$

The sets of zeros of $A+B$ and of $T_{\lambda,\gamma}$, for $\lambda, \gamma > 0$, coincide. The nomenclature (Split-DIN-AVD) comes from the splitting feature of the continuous time scheme, as well as the link with the (DIN-AVD) system developed by Attouch and László in [9] (Dynamic Inertial Newton - Asymptotic Vanishing Damping), which we will emphasize later. We will discuss the existence and uniqueness of the trajectories generated by (Split-DIN-AVD), and also show their weak convergence to the set of zeros of $A+B$ as well as the fast convergence of the velocities to zero, and convergence rates for $T_{\lambda(t),\gamma(t)}(x(t))$ and $\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))$ as $t \to +\infty$.
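The coincidence of the zeros of $A+B$ and of $T_{\lambda,\gamma}$ can be illustrated on a one-dimensional toy problem. The sketch below is only an illustration under assumed data that do not appear in the paper: $A = \partial|\cdot|$ (whose resolvent is soft-thresholding) and $B(x) = x - b$ with $b = 3$, which is 1-cocoercive. The helper names `soft_threshold` and `make_T` are hypothetical.

```python
def soft_threshold(x, tau):
    """Resolvent J_{tau A}(x) for A = subdifferential of |.| on the real line."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def make_T(resolvent, B):
    """Return T(x, lam, gam) = (1/lam) * (x - J_{gam A}(x - gam * B(x)))."""
    def T(x, lam, gam):
        return (x - resolvent(x - gam * B(x), gam)) / lam
    return T


# Assumed toy data: B(x) = x - b with b = 3 (forward step), A = d|.| (backward step).
b = 3.0
T = make_T(soft_threshold, lambda x: x - b)

# The unique zero of A + B solves 0 in sign(x) + x - b, i.e. x* = soft_threshold(b, 1).
x_star = soft_threshold(b, 1.0)

# T vanishes at x* for any lam > 0 and any gam > 0, and not at a generic point.
residual_at_zero = T(x_star, 4.0, 0.5)
residual_elsewhere = T(0.0, 1.0, 0.5)
```

Only one forward evaluation of `B` and one backward (resolvent) evaluation of `A` are needed per call, which is the splitting feature described above.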

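For the particular case $B = 0$, we are left with the monotone inclusion problem

$$\text{find } x \in H \text{ such that } 0 \in A(x),$$

and the attached system

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(A_{\lambda(t),\gamma(t)}(x(t))\Big) + A_{\lambda(t),\gamma(t)}(x(t)) = 0,$$

where, for $\lambda, \gamma > 0$, the operator $A_{\lambda,\gamma}: H \to H$ can be seen as a generalized Moreau envelope of the operator $A$, i.e.,

$$A_{\lambda,\gamma} = \frac{1}{\lambda}\big[\operatorname{Id} - J_{\gamma A}\big].$$

In particular, we will be able to set $\gamma(t) = \lambda(t)$ for every $t \geq t_0$. Since for $\lambda > 0$, $A_{\lambda,\lambda} = A_\lambda$, this allows us to recover the (DIN-AVD) system

$$\text{(DIN-AVD)} \qquad \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(A_{\lambda(t)}(x(t))\Big) + A_{\lambda(t)}(x(t)) = 0,$$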

addressed by Attouch and László in [9].

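If $A = 0$, and after properly redefining some parameters, we obtain the following system

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(\frac{1}{\eta(t)}Bx(t)\Big) + \frac{1}{\eta(t)}Bx(t) = 0,$$

with $\eta: [t_0, +\infty) \to (0, +\infty)$, which addresses the monotone equation

$$\text{find } x \in H \text{ such that } B(x) = 0.$$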

This dynamical system approaches the cocoercive operator B directly through a forward evaluation, which is more natural, instead of having to resort to its Moreau envelope, as in (DIN-AVD).

Notation and Preliminaries

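In this subsection, we explain the notions mentioned in the previous subsection and introduce some definitions and preliminary results that will be required later. Throughout the paper, we work in a real Hilbert space $H$ with inner product $\langle \cdot, \cdot \rangle$ and corresponding norm $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$.

Let $A: H \to 2^H$ be a set-valued operator, that is, $Ax$ is a subset of $H$ for every $x \in H$. The operator $A$ is completely characterized by its graph $\operatorname{gra} A = \{(x,u) \in H \times H : u \in Ax\}$. The inverse of $A$ is the operator $A^{-1}: H \to 2^H$ defined through the equivalence $x \in A^{-1}u$ if and only if $u \in Ax$. The set of zeros of $A$ is the set $\operatorname{zer} A = \{x \in H : 0 \in Ax\}$. For a subset $C \subseteq H$, we set $A(C) = \bigcup_{x \in C} Ax$. The range of $A$ is the set $\operatorname{ran} A = A(H)$.

A set-valued operator $A$ is said to be monotone if $\langle v - u, y - x \rangle \geq 0$ whenever $(x,u), (y,v) \in \operatorname{gra} A$, and maximally monotone if it is monotone and the following implication holds:

$$\tilde{A} \text{ is monotone}, \ \operatorname{gra} A \subseteq \operatorname{gra} \tilde{A} \implies A = \tilde{A}.$$

Let $\lambda > 0$. The resolvent of index $\lambda$ of $A$ is the operator $J_{\lambda A}: H \to 2^H$ given by

$$J_{\lambda A} = (\operatorname{Id} + \lambda A)^{-1},$$

and the Moreau envelope (or Yosida approximation or Yosida regularization) of index $\lambda$ of $A$ is the operator $A_\lambda: H \to 2^H$ given by

$$A_\lambda = \frac{1}{\lambda}(\operatorname{Id} - J_{\lambda A}),$$

where $\operatorname{Id}: H \to H$, defined by $\operatorname{Id}(x) = x$ for every $x \in H$, is the identity operator of $H$. For $\lambda_1, \lambda_2 > 0$, it holds $(A_{\lambda_1})_{\lambda_2} = A_{\lambda_1 + \lambda_2}$.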

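A single-valued operator $B: H \to H$ is said to be $\beta$-cocoercive for some $\beta > 0$ if for every $x, y \in H$ we have

$$\beta \|Bx - By\|^2 \leq \langle Bx - By, x - y \rangle.$$

In this case, $B$ is $\frac{1}{\beta}$-Lipschitz continuous, namely, for every $x, y \in H$ we have

$$\|Bx - By\| \leq \frac{1}{\beta}\|x - y\|.$$

We say $B$ is nonexpansive if it is 1-Lipschitz continuous, and firmly nonexpansive if it is 1-cocoercive. For $\alpha \in (0,1)$, we say $B$ is $\alpha$-averaged if there exists a nonexpansive operator $R: H \to H$ such that

$$B = (1 - \alpha)\operatorname{Id} + \alpha R.$$

Let $\lambda > 0$ and let $A: H \to 2^H$ be a monotone operator. According to Minty's Theorem, $A$ is maximally monotone if and only if $\operatorname{ran}(\operatorname{Id} + \lambda A) = H$. In this case $J_{\lambda A}$ is single-valued and firmly nonexpansive, $A_\lambda$ is single-valued and $\lambda$-cocoercive, and for every $x \in H$ and every $\lambda_1, \lambda_2 > 0$ we have

$$\|J_{\lambda_1 A}(x) - J_{\lambda_2 A}(x)\| \leq |\lambda_1 - \lambda_2| \, \|A_{\lambda_1}(x)\|.$$

Let $B: H \to H$ be a single-valued operator. If $B$ is $\alpha$-averaged for some $\alpha \in (0,1)$, then $\operatorname{Id} - B$ is $\frac{1}{2\alpha}$-cocoercive. If $B$ is monotone and continuous, then it is maximally monotone.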
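The resolvent and Yosida properties listed above can be checked numerically in a simple case. The following sketch assumes the one-dimensional operator $A = \partial|\cdot|$, for which the resolvent is soft-thresholding; this choice and all function names are illustrative assumptions, not part of the paper.

```python
def resolvent(x, lam):
    """J_{lam A}(x) for the assumed A = subdifferential of |.| (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def yosida(x, lam):
    """Yosida approximation A_lam(x) = (x - J_{lam A}(x)) / lam."""
    return (x - resolvent(x, lam)) / lam


def resolvent_bound_holds(points, lams, tol=1e-12):
    """Check |J_{l1 A}x - J_{l2 A}x| <= |l1 - l2| * |A_{l1}x| on a grid."""
    return all(
        abs(resolvent(x, l1) - resolvent(x, l2))
        <= abs(l1 - l2) * abs(yosida(x, l1)) + tol
        for x in points for l1 in lams for l2 in lams
    )


def yosida_cocoercive(points, lam, tol=1e-12):
    """Check lam * |A_lam x - A_lam y|^2 <= (A_lam x - A_lam y) * (x - y) on a grid."""
    for x in points:
        for y in points:
            d = yosida(x, lam) - yosida(y, lam)
            if lam * d * d > d * (x - y) + tol:
                return False
    return True


points = [-4.0 + 0.25 * k for k in range(33)]  # grid on [-4, 4]
lams = [0.2, 0.7, 1.0, 2.5]
```

Calling `resolvent_bound_holds(points, lams)` and `yosida_cocoercive(points, lam)` for each `lam` in `lams` should return `True`, in line with the statements following Minty's Theorem.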

The following concepts and results show the strong interplay between the theory of monotone operators and the convex analysis.

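Let $f: H \to \mathbb{R} \cup \{+\infty\}$ be a proper, convex and lower semicontinuous function. We denote the infimum of $f$ over $H$ by $\min_H f$ and the set of global minimizers of $f$ by $\operatorname{argmin}_H f$. The subdifferential of $f$ is the operator $\partial f: H \to 2^H$ defined, for every $x \in H$, by

$$\partial f(x) = \{u \in H : \langle u, y - x \rangle + f(x) \leq f(y) \ \ \forall y \in H\}.$$

The subdifferential operator of $f$ is maximally monotone, and $\bar{x} \in \operatorname{zer} \partial f \iff \bar{x}$ is a global minimizer of $f$.

Let $\lambda > 0$. The proximal operator of $f$ of index $\lambda$ is the operator $\operatorname{prox}_{\lambda f}: H \to H$ defined, for every $x \in H$, by

$$\operatorname{prox}_{\lambda f}(x) = J_{\lambda \partial f}(x) = \operatorname*{argmin}_{y \in H}\Big\{f(y) + \frac{1}{2\lambda}\|x - y\|^2\Big\},$$

which also means that $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive. The Moreau envelope of $f$ of index $\lambda$ is the function $f_\lambda: H \to \mathbb{R}$ given, for every $x \in H$, by

$$f_\lambda(x) = f\big(\operatorname{prox}_{\lambda f}(x)\big) + \frac{1}{2\lambda}\|x - \operatorname{prox}_{\lambda f}(x)\|^2.$$

The function $f_\lambda$ is Fréchet differentiable and

$$\nabla f_\lambda(x) = \frac{1}{\lambda}\big(x - \operatorname{prox}_{\lambda f}(x)\big) = (\partial f)_\lambda(x) \quad \forall x \in H.$$

Finally, if $f: H \to \mathbb{R}$ has full domain and is Fréchet differentiable with $\frac{1}{\beta}$-Lipschitz continuous gradient, for $\beta > 0$, then, according to Baillon–Haddad's Theorem, $\nabla f$ is $\beta$-cocoercive.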
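As a small illustration of the gradient formula for the Moreau envelope, the sketch below takes the assumed example $f = |\cdot|$ on the real line, whose proximal operator is soft-thresholding, and compares $\frac{1}{\lambda}(x - \operatorname{prox}_{\lambda f}(x))$ with a central finite difference of $f_\lambda$. The function names are hypothetical.

```python
def prox_abs(x, lam):
    """prox_{lam f}(x) for the assumed f = |.| (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_env_abs(x, lam):
    """Moreau envelope f_lam(x) = f(prox(x)) + |x - prox(x)|^2 / (2 lam)."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)


def grad_moreau_env_abs(x, lam):
    """Gradient formula: (x - prox_{lam f}(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam


def finite_difference(x, lam, h=1e-6):
    """Central difference approximation of the derivative of f_lam at x."""
    return (moreau_env_abs(x + h, lam) - moreau_env_abs(x - h, lam)) / (2.0 * h)


lam = 0.7
samples = [-2.0, -0.3, 0.3, 2.0]
max_gap = max(abs(grad_moreau_env_abs(x, lam) - finite_difference(x, lam)) for x in samples)
```

For this choice of $f$, the envelope is the Huber function: quadratic on $[-\lambda, \lambda]$ and affine outside, which is why it is differentiable even though $f$ is not.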

A Brief History of Inertial Systems Attached to Optimization Problems and Monotone Inclusions

In the last years there have been many advances in the study of continuous time inertial systems with vanishing damping attached to monotone inclusion problems. We briefly visit them in the following paragraphs.

The Heavy Ball Method with Friction

Consider a convex and continuously differentiable function $f: H \to \mathbb{R}$ with at least one minimizer. The heavy ball with friction system

$$\text{(HBF)} \qquad \ddot{x}(t) + \mu \dot{x}(t) + \nabla f(x(t)) = 0 \tag{3}$$

was introduced by Álvarez in [2] as a suitable continuous time scheme to approach the minimization of the function $f$. This system can be seen as the equation of the horizontal position $x(t)$ of an object that moves, under the force of gravity, along the graph of the function $f$, subject to a kinetic friction represented by the term $\mu \dot{x}(t)$ (a nice derivation can be seen in the work done by Attouch-Goudou-Redont in [8]). It is known that, if $x$ is a solution of (HBF), then $x$ converges weakly to a minimizer of $f$ and $f(x(t)) - \min_H f = O\big(\frac{1}{t}\big)$ as $t \to +\infty$.

In recent times, the question was raised whether the damping coefficient $\mu$ could be chosen to be time-dependent. An important contribution was made by Su–Boyd–Candès (in [20]) who studied the case of an Asymptotic Vanishing Damping coefficient $\mu(t) = \frac{\alpha}{t}$, namely,

$$\text{(AVD)} \qquad \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla f(x(t)) = 0, \tag{4}$$

and proved, when $\alpha \geq 3$, the rate of convergence for the functional values $f(x(t)) - \min_H f = O\big(\frac{1}{t^2}\big)$ as $t \to +\infty$. This second order system can be seen as a continuous counterpart to Nesterov's accelerated gradient method from [19]. Weak convergence of the trajectories generated by (AVD) when $\alpha > 3$ has been shown by Attouch-Chbani-Peypouquet-Redont [6] and May [18], with the improved rate of convergence for the functional values $f(x(t)) - \min_H f = o\big(\frac{1}{t^2}\big)$ as $t \to +\infty$. For $\alpha = 3$, the convergence of the trajectories remains an open question, except for the one dimensional case (see [7]). In the subcritical case $\alpha \leq 3$, it has been shown by Apidopoulos-Aujol-Dossal [5] and Attouch-Chbani-Riahi [7] that the objective values converge at a rate $O\big(t^{-\frac{2\alpha}{3}}\big)$ as $t \to +\infty$.

Heavy Ball Dynamics and Cocoercive Operators

If $f: H \to \mathbb{R} \cup \{+\infty\}$ is a proper, convex and lower semicontinuous function which is not necessarily differentiable, then we cannot make direct use of (3). However, since for $\lambda > 0$ we have $\operatorname{argmin} f = \operatorname{argmin} f_\lambda$, we can replace $f$ by its Moreau envelope $f_\lambda$, and the system now becomes

$$\ddot{x}(t) + \mu \dot{x}(t) + \nabla f_\lambda(x(t)) = 0.$$

In line with this idea, and in analogy with (3), Álvarez and Attouch [3] and Attouch and Maingé [11] studied the dynamics

$$\ddot{x}(t) + \mu \dot{x}(t) + B(x(t)) = 0, \tag{5}$$

where $B: H \to H$ is a $\beta$-cocoercive operator. They were able to prove that the solutions of this system weakly converge to elements of $\operatorname{zer} B$ provided that the cocoercivity parameter $\beta$ and the damping coefficient $\mu$ satisfy $\beta \mu^2 > 1$. For a maximally monotone operator $A: H \to 2^H$, we know that its Moreau envelope $A_\lambda$ is $\lambda$-cocoercive and thus, under the condition $\lambda \mu^2 > 1$, the trajectories of

$$\ddot{x}(t) + \mu \dot{x}(t) + A_\lambda(x(t)) = 0$$

converge weakly to elements of $\operatorname{zer} A_\lambda = \operatorname{zer} A$.

Also related to (5), Boţ-Csetnek [16] considered the system

$$\ddot{x}(t) + \mu(t)\dot{x}(t) + \nu(t)Bx(t) = 0, \tag{6}$$

where $B: H \to H$ is again $\beta$-cocoercive. Under the assumption that $\mu$ and $\nu$ are locally absolutely continuous, $\dot{\mu}(t) \leq 0 \leq \dot{\nu}(t)$ for almost every $t \in [0, +\infty)$ and $\inf_{t \geq 0} \frac{\mu^2(t)}{\nu(t)} > \frac{1}{\beta}$, the authors were able to prove that the solutions to this system converge weakly to zeros of $B$.

In [12], Attouch and Peypouquet addressed the system

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + A_{\lambda(t)}(x(t)) = 0, \tag{7}$$

where $\alpha > 1$ and the time-dependent regularizing parameter $\lambda(t)$ satisfies $\lambda(t)\frac{\alpha^2}{t^2} > 1$ for every $t \geq t_0 > 0$. As well as ensuring the weak convergence of the trajectories towards elements of $\operatorname{zer} A$, choosing the regularizing parameter in such a fashion allowed the authors to obtain fast convergence of the velocities and accelerations towards zero.

Inertial Dynamics with Hessian Damping

Let us return briefly to the (AVD) system (4). In addition to the viscous vanishing damping term $\frac{\alpha}{t}\dot{x}(t)$, the following system with Hessian-driven damping was considered by Attouch-Peypouquet-Redont in [13]

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \nabla^2 f(x(t))\dot{x}(t) + \nabla f(x(t)) = 0,$$

where $\xi \geq 0$. While preserving the fast convergence properties of the Nesterov accelerated method, the Hessian-driven damping term reduces the oscillatory aspect of the trajectories. In [9], Attouch and László studied a version of (7) with an added Hessian-driven damping term:

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(A_{\lambda(t)}(x(t))\Big) + A_{\lambda(t)}(x(t)) = 0.$$

While preserving the convergence results of (7), the main benefit of the introduction of this damping term is the fast convergence rates that can be obtained for $A_{\lambda(t)}(x(t))$ and $\frac{d}{dt}A_{\lambda(t)}(x(t))$ as $t \to +\infty$. The regularizing parameter $\lambda(t)$ is again chosen to be time-dependent; in the general case, the authors take $\lambda(t) = \lambda t^2$, and in [12] it is shown that taking $\lambda(t)$ this way is critical. However, in the case where $A = \partial f$ for a proper, convex and lower semicontinuous function $f$, one may also take $\lambda(t) = \lambda t^r$ with $r \geq 0$.

Layout of the Paper

In Sect. 2, we give the proof for the existence and uniqueness of strong global solutions to (Split-DIN-AVD) by means of a Cauchy–Lipschitz–Picard argument. In Sect. 3 we state the main theorem of this work, and we show the weak convergence of the solutions of (2) to elements of $\operatorname{zer}(A+B)$, as well as the fast convergence of the velocities and accelerations to zero. We also provide convergence rates for $T_{\lambda(t),\gamma(t)}(x(t))$ and $\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))$ as $t \to +\infty$. We explore the particular cases $A = 0$ and $B = 0$, and show improvements with respect to previous works. In Sect. 4, we address the convex minimization case, namely, when $A = \partial f$ and $B = \nabla g$, where $f: H \to \mathbb{R} \cup \{+\infty\}$ is a proper, convex and lower semicontinuous function and $g: H \to \mathbb{R}$ is a convex and Fréchet differentiable function with Lipschitz continuous gradient, and derive, in addition, a fast convergence rate for the function values. In Sect. 5, we illustrate the theoretical results by numerical experiments. In Sect. 6, we provide an algorithm that arises from a time discretization of (Split-DIN-AVD) and discuss its convergence properties.

Existence and Uniqueness of Trajectories

In this section, we show the existence and uniqueness of strong global solutions to (Split-DIN-AVD). For the sake of clarity, first we state the definition of a strong global solution.

Definition 2.1

We say that $x: [t_0, +\infty) \to H$ is a strong global solution of (Split-DIN-AVD) with Cauchy data $(x_0, u_0) \in H \times H$ if

  • (i)

    $x, \dot{x}: [t_0, +\infty) \to H$ are locally absolutely continuous;

  • (ii)

    $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(T_{\lambda(t),\gamma(t)}(x(t))\Big) + T_{\lambda(t),\gamma(t)}(x(t)) = 0$ for almost every $t \in [t_0, +\infty)$;

  • (iii)

    $x(t_0) = x_0$, $\dot{x}(t_0) = u_0$.

A classic solution is just a strong global solution which is $C^2$. Sometimes we will mention the terms strong global solution or classic global solution without explicit mention of the Cauchy data.

The following lemma will be used to prove the existence of strong global solutions of our system, and we will need it in the proof of the main theorem as well.

Lemma 2.2

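Let $A: H \to 2^H$ be a maximally monotone operator and $B: H \to H$ a $\beta$-cocoercive operator for some $\beta > 0$. Then, the following statements hold:

  • (i)

    For $\lambda > 0$ and $\gamma \in (0, 2\beta)$, $T_{\lambda,\gamma}$ is a $\lambda\frac{4\beta - \gamma}{4\beta}$-cocoercive operator. In particular, this also implies that $T_{\lambda,\gamma}$ is $\frac{\lambda}{2}$-cocoercive.

  • (ii)
    Choose $\lambda_1, \lambda_2 > 0$, $\gamma_1, \gamma_2 \in (0, 2\beta)$ and $x, y \in H$. Then, for $\bar{x} \in \operatorname{zer}(A+B)$ it holds
    $$\begin{aligned} \|\lambda_1 T_{\lambda_1,\gamma_1}(x) - \lambda_2 T_{\lambda_2,\gamma_2}(y)\| &\leq 4\|x - y\| + 4\beta\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|B(x)\| + 2\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|x - \bar{x}\|, \\ \|T_{\lambda_1,\gamma_1}(x) - T_{\lambda_2,\gamma_2}(y)\| &\leq \frac{1}{\lambda_1}\Big(4\|x - y\| + 4\beta\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|Bx\| + 2\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|x - \bar{x}\|\Big) + \frac{2|\lambda_2 - \lambda_1|}{\lambda_1\lambda_2}\|y - \bar{x}\|. \end{aligned}$$
  • (iii)
    If $x$ is a classic global solution to (2) and $\bar{x} \in \operatorname{zer}(A+B)$, then, for every $t \geq t_0$, we have
    $$\Big\|\frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\| \leq 4\|\dot{x}(t)\| + 4\beta\frac{|\dot{\gamma}(t)|}{\gamma(t)}\|B(x(t))\| + 2\frac{|\dot{\gamma}(t)|}{\gamma(t)}\|x(t) - \bar{x}\|.$$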
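The cocoercivity constants in part (i) can be probed numerically. The sketch below is a sanity check under assumed toy data that are not taken from the paper: $H = \mathbb{R}$, $A = \partial|\cdot|$, and $B(x) = x - 3$, so that $\beta = 1$. It tests the inequality $\langle Tx - Ty, x - y\rangle \geq c\,|Tx - Ty|^2$ on a grid for both $c = \lambda\frac{4\beta - \gamma}{4\beta}$ and $c = \frac{\lambda}{2}$; all names are hypothetical.

```python
def soft_threshold(x, tau):
    """Resolvent J_{tau A}(x) for the assumed A = subdifferential of |.|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def T(x, lam, gam, b=3.0):
    """T_{lam,gam}(x) = (x - J_{gam A}(x - gam * B(x))) / lam with B(x) = x - b."""
    return (x - soft_threshold(x - gam * (x - b), gam)) / lam


def cocoercivity_gap(x, y, lam, gam, const):
    """(Tx - Ty)(x - y) - const * (Tx - Ty)^2; nonnegative when the bound holds at (x, y)."""
    d = T(x, lam, gam) - T(y, lam, gam)
    return d * (x - y) - const * d * d


BETA = 1.0  # cocoercivity constant of the assumed B
GRID = [-6.0 + 0.5 * k for k in range(25)]  # grid on [-6, 6]


def lemma_bounds_hold(lam, gam, tol=1e-12):
    """Check both cocoercivity constants of Lemma 2.2(i) on the grid."""
    sharp = lam * (4.0 * BETA - gam) / (4.0 * BETA)
    return all(
        cocoercivity_gap(x, y, lam, gam, sharp) >= -tol
        and cocoercivity_gap(x, y, lam, gam, lam / 2.0) >= -tol
        for x in GRID for y in GRID
    )
```

For instance, `lemma_bounds_hold(2.0, 0.5)` and `lemma_bounds_hold(0.3, 1.9)` should both return `True`; a grid check of this kind cannot replace the proof, it only guards against misreading the constants.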

Proof

  • (i)
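    From [14, Proposition 26.1(iv)(d)] we know that the operator $J_{\gamma A}(\operatorname{Id} - \gamma B)$ is $\alpha$-averaged with $\alpha = \frac{2\beta}{4\beta - \gamma}$. From [14, Proposition 4.39], we obtain that $\operatorname{Id} - J_{\gamma A}(\operatorname{Id} - \gamma B)$ is $\frac{1}{2\alpha}$-cocoercive, namely, it is $\frac{4\beta - \gamma}{4\beta}$-cocoercive. Since $\gamma \in (0, 2\beta)$, we have $\frac{4\beta - \gamma}{4\beta} > \frac{2\beta}{4\beta} = \frac{1}{2}$, which implies that $\operatorname{Id} - J_{\gamma A}(\operatorname{Id} - \gamma B)$ is $\frac{1}{2}$-cocoercive and thus
    $$T_{\lambda,\gamma} \text{ is } \lambda\frac{4\beta - \gamma}{4\beta}\text{-cocoercive and } T_{\lambda,\gamma} \text{ is } \frac{\lambda}{2}\text{-cocoercive}.$$
  • (ii)
    We have
    $$\begin{aligned} \|\lambda_1 T_{\lambda_1,\gamma_1}(x) - \lambda_2 T_{\lambda_2,\gamma_2}(y)\| &\leq \|x - y\| + \|J_{\gamma_1 A}(x - \gamma_1 B(x)) - J_{\gamma_2 A}(y - \gamma_2 B(y))\| \\ &\leq \|x - y\| + \|J_{\gamma_1 A}(x - \gamma_1 B(x)) - J_{\gamma_2 A}(x - \gamma_1 B(x))\| \\ &\quad + \|J_{\gamma_2 A}(x - \gamma_1 B(x)) - J_{\gamma_2 A}(y - \gamma_2 B(y))\| \\ &\leq 2\|x - y\| + |\gamma_1 - \gamma_2| \, \|A_{\gamma_1}(x - \gamma_1 B(x))\| + \|\gamma_1 B(x) - \gamma_2 B(y)\| \\ &\leq 2\|x - y\| + |\gamma_1 - \gamma_2| \, \|A_{\gamma_1}(x - \gamma_1 B(x))\| + \|\gamma_1 B(x) - \gamma_2 B(x)\| + \|\gamma_2 B(x) - \gamma_2 B(y)\| \\ &= 2\|x - y\| + |\gamma_1 - \gamma_2| \, \|A_{\gamma_1}(x - \gamma_1 B(x))\| + |\gamma_1 - \gamma_2| \, \|B(x)\| + \gamma_2\|B(x) - B(y)\|. \end{aligned}$$

Now, notice that

$$A_{\gamma_1}(x - \gamma_1 B(x)) = \frac{1}{\gamma_1}(\operatorname{Id} - J_{\gamma_1 A})(x - \gamma_1 B(x)) = -B(x) + \frac{1}{\gamma_1}\big(x - J_{\gamma_1 A}(x - \gamma_1 B(x))\big) = -B(x) + T_{\gamma_1,\gamma_1}(x),$$

so using (i) and the fact that $T_{\gamma_1,\gamma_1}(\bar{x}) = 0$, we obtain

$$\|A_{\gamma_1}(x - \gamma_1 B(x))\| = \|-B(x) + T_{\gamma_1,\gamma_1}(x)\| \leq \|B(x)\| + \|T_{\gamma_1,\gamma_1}(x) - T_{\gamma_1,\gamma_1}(\bar{x})\| \leq \|B(x)\| + \frac{2}{\gamma_1}\|x - \bar{x}\|. \tag{8}$$

Altogether, plugging (8) into our initial inequality and using $\gamma_1, \gamma_2 < 2\beta$ yields

$$\begin{aligned} \|\lambda_1 T_{\lambda_1,\gamma_1}(x) - \lambda_2 T_{\lambda_2,\gamma_2}(y)\| &\leq 2\|x - y\| + 2|\gamma_1 - \gamma_2| \, \|B(x)\| + 2\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|x - \bar{x}\| + \gamma_2\|B(x) - B(y)\| \\ &\leq 2\|x - y\| + 4\beta\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|B(x)\| + 2\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|x - \bar{x}\| + 2\beta\frac{1}{\beta}\|x - y\|. \end{aligned}$$

To show the second inequality, we use the previous one. We have

$$\begin{aligned} \|T_{\lambda_1,\gamma_1}(x) - T_{\lambda_2,\gamma_2}(y)\| &= \frac{1}{\lambda_1}\big\|\lambda_1 T_{\lambda_1,\gamma_1}(x) - \lambda_2 T_{\lambda_2,\gamma_2}(y) + (\lambda_2 - \lambda_1)T_{\lambda_2,\gamma_2}(y)\big\| \\ &\leq \frac{1}{\lambda_1}\Big(4\|x - y\| + 4\beta\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|Bx\| + 2\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|x - \bar{x}\|\Big) + \frac{|\lambda_2 - \lambda_1|}{\lambda_1}\|T_{\lambda_2,\gamma_2}(y)\| \\ &\leq \frac{1}{\lambda_1}\Big(4\|x - y\| + 4\beta\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|Bx\| + 2\frac{|\gamma_1 - \gamma_2|}{\gamma_1}\|x - \bar{x}\|\Big) + \frac{2|\lambda_2 - \lambda_1|}{\lambda_1\lambda_2}\|y - \bar{x}\|, \end{aligned}$$

where the last line is a consequence of $T_{\lambda_2,\gamma_2}$ being $\frac{\lambda_2}{2}$-cocoercive, and hence $\frac{2}{\lambda_2}$-Lipschitz continuous (see (i)), together with $T_{\lambda_2,\gamma_2}(\bar{x}) = 0$.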

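(iii) For $t, s \geq t_0$ with $s \neq t$, set

$$x = x(t), \quad y = x(s), \quad \lambda_1 = \lambda(t), \quad \gamma_1 = \gamma(t), \quad \lambda_2 = \lambda(s), \quad \gamma_2 = \gamma(s)$$

and use (ii) to obtain

$$\frac{\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t)) - \lambda(s)T_{\lambda(s),\gamma(s)}(x(s))\|}{|t - s|} \leq 4\frac{\|x(t) - x(s)\|}{|t - s|} + \frac{4\beta}{\gamma(t)}\frac{|\gamma(t) - \gamma(s)|}{|t - s|}\|B(x(t))\| + \frac{2}{\gamma(t)}\frac{|\gamma(t) - \gamma(s)|}{|t - s|}\|x(t) - \bar{x}\|.$$

Hence, by taking the limit as $s \to t$ we get, for any $t \geq t_0$,

$$\Big\|\frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\| \leq 4\|\dot{x}(t)\| + 4\beta\frac{|\dot{\gamma}(t)|}{\gamma(t)}\|B(x(t))\| + 2\frac{|\dot{\gamma}(t)|}{\gamma(t)}\|x(t) - \bar{x}\|.$$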

The next theorem concerns the existence and uniqueness of strong global solutions to (Split-DIN-AVD).

Theorem 2.3

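Assume that $\lambda, \gamma: [t_0, +\infty) \to (0, +\infty)$ are Lebesgue measurable functions and that $\inf_{t \geq t_0}\lambda(t) > 0$. Then, for any $(x_0, u_0) \in H \times H$ there exists a unique strong global solution $x: [t_0, +\infty) \to H$ of the system (2) that satisfies $x(t_0) = x_0$ and $\dot{x}(t_0) = u_0$.

Proof

We will rely on [17, Proposition 6.2.1] and distinguish between the cases $\xi > 0$ and $\xi = 0$. For each case, we will check that the conditions of the aforementioned proposition are fulfilled. We will be working in the real Hilbert space $H \times H$ endowed with the norm $\|(x,y)\| = \|x\| + \|y\|$. Let $\bar{x} \in \operatorname{zer}(A+B)$ be fixed.

The Case $\xi > 0$. First, it can be easily checked (see also [4, 9, 13]) that for all $t \geq t_0$ the following dynamical systems are equivalent:

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(T_{\lambda(t),\gamma(t)}(x(t))\Big) + T_{\lambda(t),\gamma(t)}(x(t)) = 0$$

and

$$\begin{cases} \dot{x}(t) + \xi T_{\lambda(t),\gamma(t)}(x(t)) - \Big(\dfrac{1}{\xi} - \dfrac{\alpha}{t}\Big)x(t) + \dfrac{1}{\xi}y(t) = 0, \\[2mm] \dot{y}(t) - \Big(\dfrac{1}{\xi} - \dfrac{\alpha}{t} + \dfrac{\alpha\xi}{t^2}\Big)x(t) + \dfrac{1}{\xi}y(t) = 0. \end{cases}$$

In other words, (2) with Cauchy data $(x_0, u_0) = (x(t_0), \dot{x}(t_0))$ is equivalent to the first order system

$$\dot{z}(t) = F(t, z(t)), \qquad z(t_0) = (x_0, y_0),$$

where $z(t) = (x(t), y(t))$, $F$ is given, for every $t \geq t_0$, by

$$F(t, (x,y)) = \Big(-\xi T_{\lambda(t),\gamma(t)}(x) + \Big(\frac{1}{\xi} - \frac{\alpha}{t}\Big)x - \frac{1}{\xi}y, \ \Big(\frac{1}{\xi} - \frac{\alpha}{t} + \frac{\alpha\xi}{t^2}\Big)x - \frac{1}{\xi}y\Big)$$

and the Cauchy data is $x_0 = x(t_0)$, $y_0 = -\xi\Big(u_0 + \xi T_{\lambda(t_0),\gamma(t_0)}(x_0) - \Big(\frac{1}{\xi} - \frac{\alpha}{t_0}\Big)x_0\Big)$.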
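The first order reformulation for $\xi > 0$ lends itself to a direct numerical sketch. The code below is only an illustration under assumptions not made in the paper: $H = \mathbb{R}$, $A = \partial|\cdot|$, $B(x) = x - b$, a classical fixed-step Runge-Kutta integrator, and toy parameter values. The names `make_first_order_system` and `rk4` are hypothetical.

```python
def soft_threshold(x, tau):
    """Resolvent J_{tau A}(x) for the assumed A = subdifferential of |.|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def make_first_order_system(alpha, xi, lam, gam, b):
    """Build T, the right-hand side F and the y-initialisation (requires xi > 0).

    lam, gam are callables t -> lambda(t), gamma(t); B(x) = x - b is a toy choice.
    """
    def T(t, x):
        g = gam(t)
        return (x - soft_threshold(x - g * (x - b), g)) / lam(t)

    def F(t, x, y):
        c = 1.0 / xi - alpha / t
        dx = -xi * T(t, x) + c * x - y / xi
        dy = (c + alpha * xi / t**2) * x - y / xi
        return dx, dy

    def initial_y(t0, x0, u0):
        return -xi * (u0 + xi * T(t0, x0) - (1.0 / xi - alpha / t0) * x0)

    return T, F, initial_y


def rk4(F, t0, x0, y0, t_end, n_steps):
    """Classical fourth order Runge-Kutta scheme for z' = F(t, z), z = (x, y)."""
    h = (t_end - t0) / n_steps
    t, x, y = t0, x0, y0
    for _ in range(n_steps):
        k1x, k1y = F(t, x, y)
        k2x, k2y = F(t + h / 2, x + h / 2 * k1x, y + h / 2 * k1y)
        k3x, k3y = F(t + h / 2, x + h / 2 * k2x, y + h / 2 * k2y)
        k4x, k4y = F(t + h, x + h * k3x, y + h * k3y)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        t += h
    return x, y


# Assumed toy parameters: alpha = 3, xi = 1, lambda(t) = t^2, gamma(t) = 0.5, b = 3.
T, F, initial_y = make_first_order_system(3.0, 1.0, lambda t: t**2, lambda t: 0.5, 3.0)
t0, x0, u0 = 1.0, 5.0, 0.0
y0 = initial_y(t0, x0, u0)
x_end, y_end = rk4(F, t0, x0, y0, 20.0, 4000)
```

A quick consistency check of the change of variables is that the first component of `F(t0, x0, y0)` reproduces the prescribed initial velocity `u0`. The integrator and step size are arbitrary choices for the sketch and say nothing about the discretization studied later in the paper.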

  • (i)
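    Let $t \in [t_0, +\infty)$ be fixed. We need to verify the Lipschitz continuity of $F$ in the $z$ variable. Set $z = (x,y)$, $w = (u,v)$. We have
    $$\|F(t,z) - F(t,w)\| = \Big\|-\xi\big(T_{\lambda(t),\gamma(t)}(x) - T_{\lambda(t),\gamma(t)}(u)\big) + \Big(\frac{1}{\xi} - \frac{\alpha}{t}\Big)(x - u) - \frac{1}{\xi}(y - v)\Big\| + \Big\|\Big(\frac{1}{\xi} - \frac{\alpha}{t} + \frac{\alpha\xi}{t^2}\Big)(x - u) - \frac{1}{\xi}(y - v)\Big\|.$$
    Set $\underline{\lambda} := \inf_{t \geq t_0}\lambda(t) > 0$. According to Lemma 2.2(i), the term involving the operator $T_{\lambda(t),\gamma(t)}$ satisfies
    $$\|T_{\lambda(t),\gamma(t)}(x) - T_{\lambda(t),\gamma(t)}(u)\| \leq \frac{2}{\lambda(t)}\|x - u\| \leq \frac{2}{\underline{\lambda}}\|x - u\|.$$
    It follows that, if we take
    $$K(t) := \max\Big\{\frac{2\xi}{\underline{\lambda}} + \Big|\frac{1}{\xi} - \frac{\alpha}{t}\Big| + \Big|\frac{1}{\xi} - \frac{\alpha}{t} + \frac{\alpha\xi}{t^2}\Big|, \ \frac{2}{\xi}\Big\} \quad \forall t \geq t_0,$$
    then we have $K \in L^1_{\mathrm{loc}}([t_0, +\infty), \mathbb{R})$ and
    $$\|F(t,z) - F(t,w)\| \leq K(t)\|z - w\| \quad \forall t \geq t_0.$$
  • (ii)
    Now, we claim that $F$ fulfills a boundedness condition. For $t \in [t_0, +\infty)$ and $z = (x,y) \in H \times H$ we have
    $$\|F(t,z)\| = \Big\|-\xi T_{\lambda(t),\gamma(t)}(x) + \Big(\frac{1}{\xi} - \frac{\alpha}{t}\Big)x - \frac{1}{\xi}y\Big\| + \Big\|\Big(\frac{1}{\xi} - \frac{\alpha}{t} + \frac{\alpha\xi}{t^2}\Big)x - \frac{1}{\xi}y\Big\|.$$
    By Lemma 2.2(i), we have, for every $t \geq t_0$,
    $$\|T_{\lambda(t),\gamma(t)}(x)\| = \|T_{\lambda(t),\gamma(t)}(x) - T_{\lambda(t),\gamma(t)}(\bar{x})\| \leq \frac{2}{\lambda(t)}\|x - \bar{x}\|.$$
    Hence, if we take
    $$P(t) = \max\Big\{\frac{2\xi}{\lambda(t)} + \Big|\frac{1}{\xi} - \frac{\alpha}{t}\Big| + \Big|\frac{1}{\xi} - \frac{\alpha}{t} + \frac{\alpha\xi}{t^2}\Big|, \ \frac{2\xi}{\lambda(t)}\|\bar{x}\|, \ \frac{2}{\xi}\Big\} \quad \forall t \geq t_0,$$
    then we have $P \in L^1_{\mathrm{loc}}([t_0, +\infty), \mathbb{R})$ and
    $$\|F(t,z)\| \leq P(t)(1 + \|z\|).$$

We have checked that the conditions of [17, Proposition 6.2.1] hold. Therefore, there exists a unique locally absolutely continuous solution $t \mapsto x(t)$ of (2) that satisfies $x(t_0) = x_0$ and $\dot{x}(t_0) = u_0$.

The Case $\xi = 0$. Now, (2) is easily seen to be equivalent to

$$\dot{z}(t) = F(t, z(t)), \qquad z(t_0) = (x_0, u_0),$$

where $z(t) = (x(t), y(t))$ and $F$ is given, for every $t \geq t_0$, by

$$F(t, (x,y)) = \Big(y, \ -\frac{\alpha}{t}y - T_{\lambda(t),\gamma(t)}(x)\Big).$$

Showing that $F$ fulfills the required properties is straightforward.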

The Convergence Properties of the Trajectories

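In this section, we will study the asymptotic behaviour of the trajectories of the system

$$\text{(Split-DIN-AVD)} \qquad \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi \frac{d}{dt}\Big(T_{\lambda(t),\gamma(t)}(x(t))\Big) + T_{\lambda(t),\gamma(t)}(x(t)) = 0,$$

where

$$T_{\lambda,\gamma} = \frac{1}{\lambda}\big[\operatorname{Id} - J_{\gamma A}(\operatorname{Id} - \gamma B)\big].$$

We will show weak convergence of the trajectories generated by (2) to elements of $\operatorname{zer}(A+B)$, as well as the fast convergence of the velocities and accelerations to zero. Additionally, we will provide convergence rates for $T_{\lambda(t),\gamma(t)}(x(t))$ and $\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))$ as $t \to +\infty$. To avoid repetition of the statement "for almost every $t$", in the following theorem we will assume we are working with a classic global solution of our system.

Theorem 3.1

Let $A: H \to 2^H$ be a maximally monotone operator and $B: H \to H$ a $\beta$-cocoercive operator for some $\beta > 0$ such that $\operatorname{zer}(A+B) \neq \emptyset$. Assume that $\alpha > 1$, $\xi \geq 0$, $\lambda(t) = \lambda t^2$ for $\lambda > \frac{2}{(\alpha - 1)^2}$ and all $t \geq t_0$, and that $\gamma: [t_0, +\infty) \to (0, 2\beta)$ is a differentiable function that satisfies $\frac{\dot{\gamma}(t)}{\gamma(t)} = O\big(\frac{1}{t}\big)$ as $t \to +\infty$. Then, for a solution $x: [t_0, +\infty) \to H$ to (Split-DIN-AVD), the following statements hold: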

  • (i)

    x is bounded.

  • (ii)
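    We have the estimates
    $$\int_{t_0}^{+\infty} t\|\dot{x}(t)\|^2 dt < +\infty, \quad \int_{t_0}^{+\infty} t^3\|\ddot{x}(t)\|^2 dt < +\infty, \quad \int_{t_0}^{+\infty} \frac{\gamma^2(t)}{t}\big\|A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\big\|^2 dt < +\infty.$$
  • (iii)
    We have the convergence rates
    $$\|\dot{x}(t)\| = o\Big(\frac{1}{t}\Big), \quad \|\ddot{x}(t)\| = O\Big(\frac{1}{t^2}\Big), \quad \big\|A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\big\| = o\Big(\frac{1}{\gamma(t)}\Big),$$
    $$\Big\|\frac{d}{dt}\Big(A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\Big)\Big\| = O\Big(\frac{1}{t\gamma(t)}\Big) + o\Big(\frac{t^2\big|\frac{d}{dt}\frac{\gamma(t)}{\lambda(t)}\big|}{\gamma^2(t)}\Big)$$
    as $t \to +\infty$.
  • (iv)

    If $0 < \inf_{t \geq t_0}\gamma(t) \leq \sup_{t \geq t_0}\gamma(t) < 2\beta$, then $x(t)$ converges weakly to an element of $\operatorname{zer}(A+B)$ as $t \to +\infty$.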

Proof

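Integral Estimates and Rates. To develop the analysis, we will fix $\bar{x} \in \operatorname{zer}(A+B)$ and make use of the Lyapunov function $E: [t_0, +\infty) \to \mathbb{R} \cup \{+\infty\}$ given by

$$E(t) := \frac{1}{2}\Big\|\frac{\alpha - 1}{2}(x(t) - \bar{x}) + t\big(\dot{x}(t) + \xi T_{\lambda(t),\gamma(t)}(x(t))\big)\Big\|^2 + \frac{(\alpha - 1)^2}{8}\|x(t) - \bar{x}\|^2. \tag{9}$$

Differentiation of $E$ with respect to time yields, for every $t \geq t_0$,

$$\begin{aligned} \dot{E}(t) &= \Big\langle \frac{\alpha - 1}{2}(x(t) - \bar{x}) + t\big(\dot{x}(t) + \xi T_{\lambda(t),\gamma(t)}(x(t))\big), \ \frac{\alpha + 1}{2}\dot{x}(t) + \xi T_{\lambda(t),\gamma(t)}(x(t)) + t\Big(\ddot{x}(t) + \xi\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\rangle \\ &\quad + \frac{(\alpha - 1)^2}{4}\langle x(t) - \bar{x}, \dot{x}(t)\rangle. \end{aligned}$$

After reduction and employing (2), we get, for every $t \geq t_0$,

$$\begin{aligned} \dot{E}(t) &= \frac{(\alpha - 1)(\xi - t)}{2}\langle x(t) - \bar{x}, T_{\lambda(t),\gamma(t)}(x(t))\rangle + \frac{(1 - \alpha)t}{2}\|\dot{x}(t)\|^2 \\ &\quad + \Big(-t^2 + \frac{\xi(3 - \alpha)t}{2}\Big)\langle T_{\lambda(t),\gamma(t)}(x(t)), \dot{x}(t)\rangle + \xi(\xi - t)t\|T_{\lambda(t),\gamma(t)}(x(t))\|^2. \end{aligned}$$

Now, by Lemma 2.2(i), we know that $T_{\lambda(t),\gamma(t)}$ is $\frac{\lambda(t)}{2}$-cocoercive for every $t \geq t_0$. Using this on the first summand of the right hand side of the previous equality yields, for $t \geq t_1 = \max\{\xi, t_0\}$,

$$\dot{E}(t) \leq \frac{(1 - \alpha)t}{2}\|\dot{x}(t)\|^2 + \Big(-t^2 + \frac{\xi(3 - \alpha)t}{2}\Big)\langle T_{\lambda(t),\gamma(t)}(x(t)), \dot{x}(t)\rangle + \Big(\frac{(\alpha - 1)(\xi - t)\lambda(t)}{4} + \xi(\xi - t)t\Big)\|T_{\lambda(t),\gamma(t)}(x(t))\|^2. \tag{10}$$

Now, since $\lambda > \frac{2}{(\alpha - 1)^2}$, we can choose $\epsilon > 0$ such that

$$0 < \epsilon < \alpha - 1 - \sqrt{\frac{2}{\lambda}} < \alpha - 1. \tag{11}$$

From (10) we get, for every $t \geq t_1$,

$$\begin{aligned} \dot{E}(t) + \frac{\epsilon}{2}t\|\dot{x}(t)\|^2 + \frac{\epsilon}{4}t\lambda(t)\|T_{\lambda(t),\gamma(t)}(x(t))\|^2 &\leq \Big(\frac{1 - \alpha}{2} + \frac{\epsilon}{2}\Big)t\|\dot{x}(t)\|^2 + \Big(-t^2 + \frac{\xi(3 - \alpha)t}{2}\Big)\langle T_{\lambda(t),\gamma(t)}(x(t)), \dot{x}(t)\rangle \\ &\quad + \Big[\Big(\frac{(\alpha - 1)(\xi - t)}{2} + \frac{\epsilon}{2}t\Big)\frac{\lambda(t)}{2} + \xi(\xi - t)t\Big]\|T_{\lambda(t),\gamma(t)}(x(t))\|^2. \end{aligned} \tag{12}$$

By (11) and the definition of $\lambda(t)$, we know that $\frac{1 - \alpha}{2} + \frac{\epsilon}{2} < 0$, and

$$\Big(\frac{(\alpha - 1)(\xi - t)}{2} + \frac{\epsilon}{2}t\Big)\frac{\lambda(t)}{2} + \xi(\xi - t)t = \underbrace{\Big(\frac{1 - \alpha}{2} + \frac{\epsilon}{2}\Big)}_{<0}\frac{\lambda}{2}t^3 + O(t^2),$$

so we can find $t_2 \geq t_1$ such that for every $t \geq t_2$ the previous expression becomes nonpositive. According to Lemma A.2, the right hand side of (12) is nonpositive whenever

$$R(t) := \Big(-t^2 + \frac{\xi(3 - \alpha)t}{2}\Big)^2 - 4\Big(\frac{1 - \alpha}{2} + \frac{\epsilon}{2}\Big)t\Big[\Big(\frac{(\alpha - 1)(\xi - t)}{2} + \frac{\epsilon}{2}t\Big)\frac{\lambda(t)}{2} + \xi(\xi - t)t\Big] \leq 0.$$

This quantity can be rewritten as

$$R(t) = \Big[1 + 4\Big(\frac{1 - \alpha}{2} + \frac{\epsilon}{2}\Big)\Big(\frac{\alpha - 1}{2} - \frac{\epsilon}{2}\Big)\frac{\lambda}{2}\Big]t^4 + O(t^3) \quad \text{as } t \to +\infty.$$

Since $\epsilon < \alpha - 1 - \sqrt{\frac{2}{\lambda}}$, we have $\frac{\lambda}{2} > \frac{1}{(\alpha - 1 - \epsilon)^2}$. Hence,

$$1 + 4\Big(\frac{1 - \alpha}{2} + \frac{\epsilon}{2}\Big)\Big(\frac{\alpha - 1}{2} - \frac{\epsilon}{2}\Big)\frac{\lambda}{2} = 1 - (\alpha - 1 - \epsilon)^2\frac{\lambda}{2} < 0.$$

This means we can find $t_3 \geq t_2$ such that for every $t \geq t_3$ we have $R(t) \leq 0$, that is, for every $t \geq t_3$ we have

$$\dot{E}(t) + \frac{\epsilon}{2}t\|\dot{x}(t)\|^2 + \frac{\epsilon}{4}t\lambda(t)\|T_{\lambda(t),\gamma(t)}(x(t))\|^2 \leq 0. \tag{13}$$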

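Now, integrating (13) from $t_3$ to $t$ we obtain

$$E(t) + \frac{\epsilon}{2}\int_{t_3}^{t} s\|\dot{x}(s)\|^2 ds + \frac{\epsilon\lambda}{4}\int_{t_3}^{t} s^3\|T_{\lambda(s),\gamma(s)}(x(s))\|^2 ds \leq E(t_3). \tag{14}$$

From (13) and the form of $E$ we immediately obtain

$$t \mapsto \|x(t) - \bar{x}\| \text{ is bounded}, \tag{15}$$
$$\int_{t_0}^{+\infty} t\|\dot{x}(t)\|^2 dt < +\infty, \tag{16}$$
$$\int_{t_0}^{+\infty} t^3\|T_{\lambda(t),\gamma(t)}(x(t))\|^2 dt < +\infty, \tag{17}$$
$$\sup_{t \geq t_0}\Big\|\frac{\alpha - 1}{2}(x(t) - \bar{x}) + t\big(\dot{x}(t) + \xi T_{\lambda(t),\gamma(t)}(x(t))\big)\Big\| < +\infty. \tag{18}$$

From Lemma 2.2(i), we know that for every $t \geq t_0$ the operator $T_{\lambda(t),\gamma(t)}$ is $\frac{2}{\lambda(t)}$-Lipschitz continuous, which gives, for every $t \geq t_0$,

$$\|T_{\lambda(t),\gamma(t)}(x(t))\| = \|T_{\lambda(t),\gamma(t)}(x(t)) - T_{\lambda(t),\gamma(t)}(\bar{x})\| \leq \frac{2}{\lambda(t)}\|x(t) - \bar{x}\|.$$

Thus, from (15) and recalling that $\lambda(t) = \lambda t^2$ we arrive at

$$\|T_{\lambda(t),\gamma(t)}(x(t))\| = O\Big(\frac{1}{t^2}\Big) \quad \text{as } t \to +\infty. \tag{19}$$

By combining (15), (18) and (19) we obtain $\sup_{t \geq t_0} t\|\dot{x}(t)\| < +\infty$ and therefore

$$\|\dot{x}(t)\| = O\Big(\frac{1}{t}\Big) \quad \text{as } t \to +\infty. \tag{20}$$

From Lemma 2.2(iii), (15), (20), the assumption on $\gamma$ and the fact that $B$ is $\frac{1}{\beta}$-Lipschitz continuous we deduce that, as $t \to +\infty$,

$$\Big\|\frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\| \leq 4\|\dot{x}(t)\| + 4\beta\frac{|\dot{\gamma}(t)|}{\gamma(t)}\|B(x(t))\| + 2\frac{|\dot{\gamma}(t)|}{\gamma(t)}\|x(t) - \bar{x}\| = O\Big(\frac{1}{t}\Big). \tag{21}$$

On the other hand, for every $t \geq t_0$ we have

$$\frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big) = \dot{\lambda}(t)T_{\lambda(t),\gamma(t)}(x(t)) + \lambda(t)\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t)), \tag{22}$$

so by combining (19), (21), (22) and the fact that $\dot{\lambda}(t) = 2\lambda t$ we arrive at

$$\lambda(t)\Big\|\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big\| \leq \underbrace{\Big\|\frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\|}_{O(1/t)} + |\dot{\lambda}(t)|\underbrace{\|T_{\lambda(t),\gamma(t)}(x(t))\|}_{O(1/t^2)} = O\Big(\frac{1}{t}\Big) \quad \text{as } t \to +\infty,$$

which yields

$$\Big\|\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big\| = \frac{1}{\lambda(t)}O\Big(\frac{1}{t}\Big) = O\Big(\frac{1}{t^3}\Big) \quad \text{as } t \to +\infty. \tag{23}$$

Let us now improve (19) and show that

$$\|T_{\lambda(t),\gamma(t)}(x(t))\| = o\Big(\frac{1}{t^2}\Big) \quad \text{as } t \to +\infty. \tag{24}$$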

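According to (19) and (21) there exists a constant $K > 0$ such that for every $t \geq t_0$ it holds

$$\begin{aligned} \frac{d}{dt}\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^4 &= 4\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^2 \times \Big\langle \lambda(t)T_{\lambda(t),\gamma(t)}(x(t)), \frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\rangle \\ &\leq 4\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^2 \, \|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\| \times \Big\|\frac{d}{dt}\Big(\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\Big)\Big\| \\ &\leq \frac{4K}{t}\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^2. \end{aligned}$$

By (17), the right hand side belongs to $L^1([t_0, +\infty), \mathbb{R})$, so we get

$$\frac{d}{dt}\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^4 \in L^1([t_0, +\infty), \mathbb{R}),$$

hence the limit

$$\lim_{t \to +\infty}\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^4$$

exists. Obviously, this implies the existence of $L := \lim_{t \to +\infty}\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^2$. By using (17) again we come to

$$\int_{t_0}^{+\infty}\frac{1}{t}\|\lambda(t)T_{\lambda(t),\gamma(t)}(x(t))\|^2 dt = \lambda^2\int_{t_0}^{+\infty} t^3\|T_{\lambda(t),\gamma(t)}(x(t))\|^2 dt < +\infty,$$

and so we must have $L = 0$, which gives

$$\|T_{\lambda(t),\gamma(t)}(x(t))\| = o\Big(\frac{1}{t^2}\Big) \quad \text{as } t \to +\infty. \tag{25}$$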

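By combining (2), (19), (20) and (23) we obtain, as $t \to +\infty$,

$$\|\ddot{x}(t)\| = \Big\|-\frac{\alpha}{t}\dot{x}(t) - \xi\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t)) - T_{\lambda(t),\gamma(t)}(x(t))\Big\| \leq \frac{\alpha}{t}\underbrace{\|\dot{x}(t)\|}_{O(1/t)} + \xi\underbrace{\Big\|\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big\|}_{O(1/t^3)} + \underbrace{\|T_{\lambda(t),\gamma(t)}(x(t))\|}_{O(1/t^2)} = O\Big(\frac{1}{t^2}\Big).$$

Moreover, by using the well-known inequality $\|a + b + c\|^2 \leq 3\|a\|^2 + 3\|b\|^2 + 3\|c\|^2$ for every $a, b, c \in H$, for every $t \geq t_0$ it holds

$$t^3\|\ddot{x}(t)\|^2 = t^3\Big\|-\frac{\alpha}{t}\dot{x}(t) - \xi\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t)) - T_{\lambda(t),\gamma(t)}(x(t))\Big\|^2 \leq 3\alpha^2 t\|\dot{x}(t)\|^2 + 3\xi^2 t^3\Big\|\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big\|^2 + 3t^3\|T_{\lambda(t),\gamma(t)}(x(t))\|^2.$$

From (16), (23) and (17) it follows

$$\int_{t_0}^{+\infty} t^3\|\ddot{x}(t)\|^2 dt < +\infty. \tag{26}$$

To see that $\|\dot{x}(t)\| = o\big(\frac{1}{t}\big)$ as $t \to +\infty$, we write, for every $t \geq t_0$,

$$\frac{d}{dt}\big(t^2\|\dot{x}(t)\|^2\big) = 2t\|\dot{x}(t)\|^2 + 2t^2\langle \dot{x}(t), \ddot{x}(t)\rangle \leq 3t\|\dot{x}(t)\|^2 + t^3\|\ddot{x}(t)\|^2.$$

From (16) and (26) we deduce that the left hand side belongs to $L^1([t_0, +\infty), \mathbb{R})$, from which we infer that the limit $\lim_{t \to +\infty} t^2\|\dot{x}(t)\|^2$ exists. Using (16) again, we get

$$\int_{t_0}^{+\infty}\frac{1}{t}\,t^2\|\dot{x}(t)\|^2 dt = \int_{t_0}^{+\infty} t\|\dot{x}(t)\|^2 dt < +\infty,$$

from which we finally deduce $\lim_{t \to +\infty} t^2\|\dot{x}(t)\|^2 = 0$, therefore

$$\|\dot{x}(t)\| = o\Big(\frac{1}{t}\Big) \quad \text{as } t \to +\infty. \tag{27}$$

Notice that we can write for every $t \geq t_0$

$$T_{\lambda(t),\gamma(t)}(x(t)) = \frac{1}{\lambda(t)}\big[\operatorname{Id} - J_{\gamma(t)A}(\operatorname{Id} - \gamma(t)B)\big](x(t)) = \frac{\gamma(t)}{\lambda(t)}\Big(A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\Big).$$

Hence, multiplying both sides of (25) by $\frac{\lambda(t)}{\gamma(t)}$ and remembering the definition of $\lambda(t)$ we obtain

$$\big\|A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\big\| = o\Big(\frac{1}{\gamma(t)}\Big) \quad \text{as } t \to +\infty. \tag{28}$$

For every $t \geq t_0$, we have

$$\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t)) = \Big(\frac{d}{dt}\frac{\gamma(t)}{\lambda(t)}\Big)\Big(A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\Big) + \frac{\gamma(t)}{\lambda(t)}\frac{d}{dt}\Big(A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\Big).$$

Therefore, by using (23) and (28), and recalling that $\lambda(t) = \lambda t^2$, we obtain

$$\Big\|\frac{d}{dt}\Big(A_{\gamma(t)}[x(t) - \gamma(t)Bx(t)] + Bx(t)\Big)\Big\| = O\Big(\frac{1}{t\gamma(t)}\Big) + o\Big(\frac{t^2\big|\frac{d}{dt}\frac{\gamma(t)}{\lambda(t)}\big|}{\gamma^2(t)}\Big) \quad \text{as } t \to +\infty.$$

The fact that $\|\ddot{x}(t)\| = O\big(\frac{1}{t^2}\big)$ as $t \to +\infty$ comes from (2), (27), (23) and (24).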

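Weak Convergence of the Trajectories. Let $\bar{x} \in \operatorname{zer}(A+B)$. We will work with the energy function $h: [t_0, +\infty) \to \mathbb{R}$ given by

$$h(t) := \frac{1}{2}\|x(t) - \bar{x}\|^2.$$

For every $t \geq t_0$, we have

$$\dot{h}(t) = \langle x(t) - \bar{x}, \dot{x}(t)\rangle, \qquad \ddot{h}(t) = \langle x(t) - \bar{x}, \ddot{x}(t)\rangle + \|\dot{x}(t)\|^2. \tag{29}$$

Combining (2) and (29) gives us, for every $t \geq t_0$,

$$\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + \langle T_{\lambda(t),\gamma(t)}(x(t)), x(t) - \bar{x}\rangle = \|\dot{x}(t)\|^2 + \Big\langle -\xi\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t)), x(t) - \bar{x}\Big\rangle.$$

By using the $\frac{\lambda(t)}{2}$-cocoercivity of $T_{\lambda(t),\gamma(t)}$ on the left hand side, Cauchy–Schwarz on the right hand side and multiplying both sides by $t$, the previous equality entails

$$t\ddot{h}(t) + \alpha\dot{h}(t) + \frac{t\lambda(t)}{2}\|T_{\lambda(t),\gamma(t)}(x(t))\|^2 \leq t\|\dot{x}(t)\|^2 + \xi t\Big\|\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big\|\,\|x(t) - \bar{x}\| \quad \forall t \geq t_0.$$

Now, putting the previous estimates together results in

$$k(t) := t\|\dot{x}(t)\|^2 + \xi t\Big\|\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\Big\|\,\|x(t) - \bar{x}\| \in L^1([t_0, +\infty), \mathbb{R}).$$

Now apply Lemma A.1 with $\theta(t) := \frac{t\lambda(t)}{2}\|T_{\lambda(t),\gamma(t)}(x(t))\|^2$ for every $t \geq t_0$ to deduce that the limit

$$\lim_{t \to +\infty} h(t)$$

exists, which fulfills the first condition of Opial's Lemma A.3.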

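Let us now move on to the second condition. Suppose $\hat{x}$ is a weak sequential cluster point of $t \mapsto x(t)$, that is, there exists a sequence $(t_n)_{n \in \mathbb{N}} \subseteq [t_0, +\infty)$ such that $t_n \to +\infty$ and $x_n := x(t_n)$ converges weakly to $\hat{x}$ as $n \to +\infty$. Define

$$U_\gamma := \operatorname{Id} - J_{\gamma A}(\operatorname{Id} - \gamma B).$$

According to (25), we have $U_{\gamma(t)}(x(t)) = \lambda(t)T_{\lambda(t),\gamma(t)}(x(t)) \to 0$ as $t \to +\infty$. Now, since $\gamma(t) \in [\delta, 2\beta - \delta]$ for all $t \geq t_0$ for some $\delta > 0$, we can extract a subsequence $(\gamma(t_{n_k}))_{k \in \mathbb{N}}$ such that $\gamma(t_{n_k}) \to \bar{\gamma} \in (0, 2\beta)$ as $k \to +\infty$. We may assume without loss of generality then that $\gamma_n := \gamma(t_n) \to \bar{\gamma}$ as $n \to +\infty$. We now have for every $n \in \mathbb{N}$

$$\begin{aligned} \|U_{\gamma_n}(x_n) - U_{\bar{\gamma}}(x_n)\| &= \|J_{\gamma_n A}(x_n - \gamma_n B(x_n)) - J_{\bar{\gamma}A}(x_n - \bar{\gamma}B(x_n))\| \\ &\leq \|J_{\gamma_n A}(x_n - \gamma_n B(x_n)) - J_{\gamma_n A}(x_n - \bar{\gamma}B(x_n))\| + \|J_{\gamma_n A}(x_n - \bar{\gamma}B(x_n)) - J_{\bar{\gamma}A}(x_n - \bar{\gamma}B(x_n))\| \\ &\leq |\bar{\gamma} - \gamma_n|\,\|B(x_n)\| + |\bar{\gamma} - \gamma_n|\,\|A_{\bar{\gamma}}(x_n - \bar{\gamma}B(x_n))\|. \end{aligned}$$

Now, since every weakly convergent sequence is bounded and the operators $B$ and $A_{\bar{\gamma}}$ are Lipschitz continuous, we deduce that the right-hand side of the previous inequality approaches zero as $n \to +\infty$, therefore getting

$$U_{\bar{\gamma}}(x_n) = U_{\gamma_n}(x_n) + \big(U_{\bar{\gamma}}(x_n) - U_{\gamma_n}(x_n)\big) \to 0$$

as $n \to +\infty$. Now, from the proof of part (i) of Lemma 2.2, we know that $U_{\bar{\gamma}}$ is $\frac{4\beta - \bar{\gamma}}{4\beta}$-cocoercive, thus monotone and Lipschitz continuous and therefore maximally monotone. Summarizing, we have

  1. $U_{\bar{\gamma}}$ is maximally monotone and thus its graph is closed in the weak $\times$ strong topology of $H \times H$ (see [14, Proposition 20.38(ii)]),

  2. $x_n$ converges weakly to $\hat{x}$ and $U_{\bar{\gamma}}(x_n) \to 0$ as $n \to +\infty$,

which allows us to conclude that $U_{\bar{\gamma}}(\hat{x}) = 0$, and gives finally $\hat{x} \in \operatorname{zer}(A+B)$. Now we just invoke Opial's Lemma to conclude that $x(t)$ converges weakly to an element of $\operatorname{zer}(A+B)$ as $t \to +\infty$.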

In the following subsections, we explore the particular cases B=0 and A=0, and we will show improvements with respect to previous results from the literature addressing continuous time approaches to monotone inclusions.

The Case B=0

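If we let $B = 0$ in the (Split-DIN-AVD) system (2), then, attached to the monotone inclusion problem

$$\text{find } x \in H \text{ such that } 0 \in A(x),$$

we obtain the dynamics

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi\frac{d}{dt}\Big(A_{\lambda(t),\gamma(t)}(x(t))\Big) + A_{\lambda(t),\gamma(t)}(x(t)) = 0, \tag{30}$$

where

$$A_{\lambda,\gamma} = \frac{1}{\lambda}(\operatorname{Id} - J_{\gamma A}).$$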

We can state the following theorem.

Theorem 3.2

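Let $A: H \to 2^H$ be a maximally monotone operator such that $\operatorname{zer} A \neq \emptyset$. Assume that $\alpha > 1$, $\xi \geq 0$, $\lambda(t) = \lambda t^2$ for $\lambda > \frac{1}{(\alpha - 1)^2}$ and all $t \geq t_0$, and that $\gamma: [t_0, +\infty) \to (0, +\infty)$ is a differentiable function that satisfies $\frac{|\dot{\gamma}(t)|}{\gamma(t)} = O\big(\frac{1}{t}\big)$ as $t \to +\infty$. Then, for a solution $x: [t_0, +\infty) \to H$ to (30), the following statements hold:

  • (i)

    $x$ is bounded.

  • (ii)
    We have the estimates
    $$\int_{t_0}^{+\infty} t\|\dot{x}(t)\|^2 dt < +\infty, \quad \int_{t_0}^{+\infty} t^3\|\ddot{x}(t)\|^2 dt < +\infty, \quad \int_{t_0}^{+\infty}\frac{\gamma^2(t)}{t}\|A_{\gamma(t)}(x(t))\|^2 dt < +\infty.$$
  • (iii)
    We have the convergence rates
    $$\|\dot{x}(t)\| = o\Big(\frac{1}{t}\Big), \quad \|\ddot{x}(t)\| = O\Big(\frac{1}{t^2}\Big), \quad \|A_{\gamma(t)}(x(t))\| = o\Big(\frac{1}{\gamma(t)}\Big), \quad \Big\|\frac{d}{dt}A_{\gamma(t)}(x(t))\Big\| = O\Big(\frac{1}{t\gamma(t)}\Big) + o\Big(\frac{t^2\big|\frac{d}{dt}\frac{\gamma(t)}{\lambda(t)}\big|}{\gamma^2(t)}\Big)$$
    as $t \to +\infty$.
  • (iv)

    If $0 < \inf_{t \geq t_0}\gamma(t)$, then $x(t)$ converges weakly to an element of $\operatorname{zer} A$ as $t \to +\infty$.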

Proof

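The proof proceeds in exactly the same way as the proof of Theorem 3.1. However, a few comments are in order. First of all, now we have $T_{\lambda,\gamma} = \frac{1}{\lambda}(\operatorname{Id} - J_{\gamma A}) = A_{\lambda,\gamma}$. Since $J_{\gamma A}$ is firmly nonexpansive, by [14, Proposition 4.4] so is $\operatorname{Id} - J_{\gamma A}$. In other words, $\operatorname{Id} - J_{\gamma A}$ is 1-cocoercive, therefore $A_{\lambda,\gamma} = \frac{1}{\lambda}(\operatorname{Id} - J_{\gamma A})$ is $\lambda$-cocoercive, so now the condition on $\lambda$ becomes $\lambda > \frac{1}{(\alpha - 1)^2}$.

The proof also changes when we verify the second condition of Opial's Lemma, to get weak convergence of the trajectories $t \mapsto x(t)$. This is in order to allow for $\gamma(t)$ not to be necessarily bounded. We do need, however, the assumption $0 < \inf_{t \geq t_0}\gamma(t)$. Indeed, from $\|A_{\lambda(t),\gamma(t)}(x(t))\| = o\big(\frac{1}{t^2}\big)$ as $t \to +\infty$, we obtain

$$y(t) := x(t) - J_{\gamma(t)A}x(t) = \lambda(t)A_{\lambda(t),\gamma(t)}(x(t)) \to 0$$

as $t \to +\infty$. Using the definition of the resolvent, we come to

$$J_{\gamma(t)A}x(t) = x(t) - y(t) \iff y(t) \in \gamma(t)A(x(t) - y(t)) \iff \frac{1}{\gamma(t)}y(t) \in A(x(t) - y(t))$$

for all $t \geq t_0$. If $(t_n)_{n \in \mathbb{N}} \subseteq [t_0, +\infty)$ is such that $t_n \to +\infty$ and $x(t_n)$ converges weakly to $\hat{x}$ as $n \to +\infty$, then the previous inclusion, together with the assumption on $\gamma$, gives

$$x(t_n) - y(t_n) \text{ converges weakly to } \hat{x} \quad \text{and} \quad \frac{1}{\gamma(t_n)}y(t_n) \to 0 \quad \text{as } n \to +\infty,$$

and by the closedness of the graph of $A$ in the weak $\times$ strong topology of $H \times H$, we deduce that $\hat{x} \in \operatorname{zer} A$.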

Remark 3.3

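The hypotheses required for $\gamma$ are fulfilled by at least two families of functions. First, take $r \geq 0$ and set $\gamma(t) = e^{t^{-r}}$. Then, we have

$$\frac{\dot{\gamma}(t)}{\gamma(t)} = \frac{-r t^{-(r+1)}e^{t^{-r}}}{e^{t^{-r}}} = -\frac{r}{t^{r+1}} = O\Big(\frac{1}{t}\Big) \quad \text{as } t \to +\infty,$$

and

$$\gamma(t) = e^{t^{-r}} \geq e^0 = 1 \quad \forall t \geq t_0.$$

If $\gamma$ is a polynomial of degree $n$ for some $n \in \mathbb{N}$, the conditions are also fulfilled. Assume $\gamma(t) = a_n t^n + a_{n-1}t^{n-1} + \cdots + a_0$ for all $t \geq t_0$, for some $a_i \in \mathbb{R}$ for $i \in \{0, \ldots, n\}$ and $a_n > 0$. Then, we have

$$t \cdot \frac{\dot{\gamma}(t)}{\gamma(t)} = t \cdot \frac{n a_n t^{n-1} + (n-1)a_{n-1}t^{n-2} + \cdots + a_1}{a_n t^n + a_{n-1}t^{n-1} + \cdots + a_0} \to \frac{n a_n}{a_n} = n \quad \text{as } t \to +\infty,$$

so $\frac{\dot{\gamma}(t)}{\gamma(t)} = O\big(\frac{1}{t}\big)$ as $t \to +\infty$. Since we also have $\gamma(t) \to +\infty$ as $t \to +\infty$, the condition $\inf_{t \geq t_0}\gamma(t) > 0$ is fulfilled for large enough $t_0$.

In particular, we can choose $\gamma(t) = \lambda(t) = \lambda t^2$, which fulfills $\gamma(t) \geq \lambda t_0^2 > 0$ for any $t \geq t_0$ and any $t_0$. Since $A_{\lambda,\lambda} = A_\lambda$ for $\lambda > 0$, this choice of $\gamma$ allows us to recover the (DIN-AVD) system studied by Attouch and László in [9]. Notice the way the convergence rates for $A_{\gamma(t)}(x(t))$ and $\frac{d}{dt}A_{\gamma(t)}(x(t))$ exhibited in part (iii) of Theorem 3.2 depend on $\gamma(t)$. If we set $\gamma(t) = t^n$ for every $t \geq t_0$ for any natural number $n > 2$, (Split-DIN-AVD) performs from this point of view better than (DIN-AVD) without increasing the complexity of the governing operator.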
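The two families of admissible $\gamma$ in this remark can be checked with a few lines of code. The sketch below evaluates $t \cdot \dot{\gamma}(t)/\gamma(t)$ for $\gamma(t) = e^{t^{-r}}$ and for the monomial $\gamma(t) = t^n$, comparing the closed-form derivative with a central finite difference; the function names and the sample values of $r$, $n$ and $t$ are illustrative assumptions.

```python
import math


def gamma_exp(t, r):
    """gamma(t) = exp(t^{-r}), bounded below by 1 for t > 0 and r >= 0."""
    return math.exp(t ** (-r))


def gamma_exp_dot(t, r):
    """Closed-form derivative: -r * t^{-(r+1)} * exp(t^{-r})."""
    return -r * t ** (-(r + 1)) * math.exp(t ** (-r))


def scaled_log_derivative(gamma, gamma_dot, t):
    """t * gamma'(t) / gamma(t); boundedness for large t means gamma'/gamma = O(1/t)."""
    return t * gamma_dot(t) / gamma(t)


def central_difference(gamma, t, h=1e-6):
    """Finite difference approximation of gamma'(t), used only as a cross-check."""
    return (gamma(t + h) - gamma(t - h)) / (2.0 * h)


r, n = 2.0, 3
times = [1.0, 2.0, 10.0, 100.0]

# Exponential family: t * gamma'/gamma = -r * t^{-r}, so its modulus is at most r for t >= 1.
exp_values = [
    scaled_log_derivative(lambda s: gamma_exp(s, r), lambda s: gamma_exp_dot(s, r), t)
    for t in times
]

# Monomial gamma(t) = t^n: t * gamma'/gamma is identically n.
poly_values = [
    scaled_log_derivative(lambda s: s ** n, lambda s: n * s ** (n - 1), t)
    for t in times
]
```

With these sample values, `exp_values` stays within $[-r, 0]$ and `poly_values` equals $n$ up to rounding, matching the two computations in the remark.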

The Case A=0

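Let us return to the (Split-DIN-AVD) dynamics (2). Set $A = 0$, and for every $t \geq t_0$ take $\gamma(t) \equiv \gamma \in (0, 2\beta)$ and $\eta(t) = \eta t^2$ with $\eta = \frac{\lambda}{\gamma}$. Then, associated to the problem

$$\text{find } x \in H \text{ such that } B(x) = 0,$$

we obtain the system

$$\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \xi\frac{d}{dt}\Big(\frac{1}{\eta(t)}Bx(t)\Big) + \frac{1}{\eta(t)}Bx(t) = 0. \tag{31}$$

The conditions $\lambda > \frac{2}{(\alpha - 1)^2}$ and $\gamma \in (0, 2\beta)$ imply

$$\eta = \frac{\lambda}{\gamma} > \frac{2}{\gamma(\alpha - 1)^2} > \frac{2}{2\beta(\alpha - 1)^2} = \frac{1}{\beta(\alpha - 1)^2}.$$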

With the previous observation, we are able to state the following theorem.

Theorem 3.4

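Let $B:\mathcal{H}\to\mathcal{H}$ be a $\beta$-cocoercive operator for some $\beta>0$ such that $\operatorname{zer}B\neq\emptyset$. Assume that $\alpha>1$, $\xi\geq 0$ and $\eta(t)=\eta t^2$ for $\eta>\frac{1}{\beta(\alpha-1)^2}$ and all $t\geq t_0$. Take $x:[t_0,+\infty)\to\mathcal{H}$ a solution to (31). Then, the following hold:

  • (i) $x$ is bounded, and $x(t)$ converges weakly to an element of $\operatorname{zer}B$ as $t\to+\infty$.

  • (ii) We have the estimates
    $$\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^2dt<+\infty,\quad\int_{t_0}^{+\infty}t^3\|\ddot{x}(t)\|^2dt<+\infty,\quad\int_{t_0}^{+\infty}\frac{1}{t}\|Bx(t)\|^2dt<+\infty.$$

  • (iii) We have the convergence rates
    $$\|\dot{x}(t)\|=o\left(\frac{1}{t}\right),\quad\|\ddot{x}(t)\|=O\left(\frac{1}{t^2}\right)$$
    as well as the limit
    $$\|Bx(t)\|\to 0$$
    as $t\to+\infty$.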

Proof

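Since $\eta>\frac{1}{\beta(\alpha-1)^2}$, we can find $\epsilon\in(0,\beta)$ such that $\eta>\frac{1}{(\beta-\epsilon)(\alpha-1)^2}$, equivalently, $2(\beta-\epsilon)\eta>\frac{2}{(\alpha-1)^2}$. Since (31) is equivalent to (Split-DIN-AVD) with $A=0$ and parameters $\lambda=2(\beta-\epsilon)\eta>\frac{2}{(\alpha-1)^2}$ and $\gamma(t)\equiv 2(\beta-\epsilon)\in(0,2\beta)$, the conclusion follows from Theorem 3.1.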
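The parameter choice in this proof can be made concrete. The sketch below is only an illustration under stated assumptions: the function name `split_parameters` and the `shrink` factor (which picks $\epsilon$ strictly inside the admissible interval) are hypothetical and do not appear in the paper.

```python
def split_parameters(beta, alpha, eta, shrink=0.5):
    """Pick eps in (0, beta) with eta > 1 / ((beta - eps) * (alpha - 1)**2) and
    return (eps, lam, gamma), where lam = 2 * (beta - eps) * eta and
    gamma = 2 * (beta - eps), as in the proof above."""
    if beta <= 0 or alpha <= 1 or not 0 < shrink < 1:
        raise ValueError("need beta > 0, alpha > 1 and 0 < shrink < 1")
    if eta <= 1.0 / (beta * (alpha - 1) ** 2):
        raise ValueError("need eta > 1 / (beta * (alpha - 1)**2)")
    # Admissible eps form the open interval (0, eps_max).
    eps_max = beta - 1.0 / (eta * (alpha - 1) ** 2)
    eps = shrink * eps_max
    gamma = 2.0 * (beta - eps)
    lam = gamma * eta
    return eps, lam, gamma

eps, lam, gamma = split_parameters(beta=0.01, alpha=20.0, eta=0.278)
print(eps, lam, gamma, 2.0 / (20.0 - 1.0) ** 2)
```

With these sample values the admissible interval for $\epsilon$ is very short, because $\eta=0.278$ exceeds the threshold $\frac{1}{\beta(\alpha-1)^2}=\frac{100}{361}$ only slightly.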

Remark 3.5

  1. As we mentioned in the introduction, the dynamical system (31) provides a way of finding the zeros of a cocoercive operator directly through forward evaluations, instead of having to resort to its Moreau envelope when following the approach in [9].

  2. The dynamics (31) bear some resemblance to the system (6) (see also [16]) with $\mu(t)=\frac{\alpha}{t}$ and $\nu(t)=\frac{1}{\eta(t)}$, with an additional Hessian-driven damping term. In our case, since $\eta>\frac{1}{\beta(\alpha-1)^2}$, the parameters satisfy
    $$\dot{\mu}(t)=-\frac{\alpha}{t^2}\leq 0,\quad\frac{\mu^2(t)}{\nu(t)}=\frac{\alpha^2\eta t^2}{t^2}=\alpha^2\eta>\frac{1}{\beta}\quad\forall t\geq t_0.$$
    However, we have
    $$\dot{\nu}(t)=-\frac{2}{\eta t^3}\leq 0\quad\forall t\geq t_0,$$
    so one of the hypotheses needed in (6) is not fulfilled, which shows that the dynamical system (31) cannot be treated as a particular case of it; indeed, for (6) a vanishing damping is not allowed. With our system, we obtain convergence rates for $\dot{x}(t)$ and $\ddot{x}(t)$ as $t\to+\infty$, which are not obtained in [16].

Structured Convex Minimization

We can specialize the previous results to the case of convex minimization, and show additionally the convergence of functional values along the generated trajectories to the optimal objective value at a rate that will depend on the choice of $\gamma$. Let $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ be a proper, convex and lower semicontinuous function, and let $g:\mathcal{H}\to\mathbb{R}$ be a convex and Fréchet differentiable function with $L_g$-Lipschitz continuous gradient. Assume that $\operatorname{argmin}_{\mathcal{H}}(f+g)\neq\emptyset$, and consider the minimization problem

$$\min_{x\in\mathcal{H}}f(x)+g(x). \tag{32}$$

Fermat’s rule tells us that $\bar{x}$ is a global minimum of $f+g$ if and only if

$$0\in\partial(f+g)(\bar{x})=\partial f(\bar{x})+\nabla g(\bar{x}).$$

Therefore, solving (32) is equivalent to solving the monotone inclusion $0\in(A+B)(x)$ addressed in the first section, with $A=\partial f$ and $B=\nabla g$. Moreover, recall that if $\nabla g$ is $L_g$-Lipschitz then it is $\frac{1}{L_g}$-cocoercive (Baillon–Haddad’s Theorem, see [14, Corollary 18.17]). Therefore, associated to the problem (32) we have the dynamics

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\left(\frac{\gamma(t)}{\lambda(t)}\Big[\nabla f_{\gamma(t)}(u(t))+\nabla g(x(t))\Big]\right)+\frac{\gamma(t)}{\lambda(t)}\Big[\nabla f_{\gamma(t)}(u(t))+\nabla g(x(t))\Big]=0, \tag{33}$$

where, for convenience, we have denoted $u(t)=x(t)-\gamma(t)\nabla g(x(t))$ for all $t\geq t_0$.
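The cocoercivity property invoked through the Baillon–Haddad Theorem can be checked on a small example. This is a minimal sketch under assumptions: the diagonal quadratic $g(x_1,x_2)=\frac{1}{2}(x_1^2+100x_2^2)$ with $L_g=100$ and the helper names are chosen here only for illustration. It evaluates the gap $\langle\nabla g(x)-\nabla g(y),x-y\rangle-\beta\|\nabla g(x)-\nabla g(y)\|^2$, which should be nonnegative for $\beta=1/L_g$.

```python
import random

def grad_g(x, diag=(1.0, 100.0)):
    """Gradient of g(x) = 0.5 * sum(d_i * x_i**2); its Lipschitz constant is max(diag)."""
    return tuple(d * xi for d, xi in zip(diag, x))

def cocoercivity_gap(x, y, beta):
    """<grad g(x) - grad g(y), x - y> - beta * ||grad g(x) - grad g(y)||^2."""
    gx, gy = grad_g(x), grad_g(y)
    inner = sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))
    sq_norm = sum((a - b) ** 2 for a, b in zip(gx, gy))
    return inner - beta * sq_norm

rng = random.Random(0)
pairs = [((rng.uniform(-5, 5), rng.uniform(-5, 5)),
          (rng.uniform(-5, 5), rng.uniform(-5, 5))) for _ in range(1000)]
print(min(cocoercivity_gap(x, y, beta=1.0 / 100.0) for x, y in pairs))
```

For a modulus larger than $1/L_g$ the inequality can fail, for example at $x=(0,1)$, $y=(0,0)$ with $\beta=1/50$.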

Theorem 4.1

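Let $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ be a proper, convex and lower semicontinuous function, and let $g:\mathcal{H}\to\mathbb{R}$ be a convex and Fréchet differentiable function with an $L_g$-Lipschitz continuous gradient such that $\operatorname{argmin}_{\mathcal{H}}(f+g)\neq\emptyset$. Assume that $\alpha>1$, $\xi\geq 0$, $\lambda(t)=\lambda t^2$ for $\lambda>\frac{2}{(\alpha-1)^2}$ and all $t\geq t_0$, and that $\gamma:[t_0,+\infty)\to\left(0,\frac{2}{L_g}\right)$ is a differentiable function that satisfies $\frac{\dot{\gamma}(t)}{\gamma(t)}=O(1/t)$ as $t\to+\infty$. Then, for a solution $x:[t_0,+\infty)\to\mathcal{H}$ to (33), the following statements hold:

  • (i) $x$ is bounded.

  • (ii) We have the estimates
    $$\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^2dt<+\infty,\quad\int_{t_0}^{+\infty}t^3\|\ddot{x}(t)\|^2dt<+\infty,\quad\int_{t_0}^{+\infty}\frac{\gamma^2(t)}{t}\Big\|\nabla f_{\gamma(t)}\big[x(t)-\gamma(t)\nabla g(x(t))\big]+\nabla g(x(t))\Big\|^2dt<+\infty.$$

  • (iii) We have the convergence rates
    $$\|\dot{x}(t)\|=o\left(\frac{1}{t}\right),\quad\|\ddot{x}(t)\|=O\left(\frac{1}{t^2}\right),\quad\Big\|\nabla f_{\gamma(t)}\big[x(t)-\gamma(t)\nabla g(x(t))\big]+\nabla g(x(t))\Big\|=o\left(\frac{1}{\gamma(t)}\right),$$
    $$\left\|\frac{d}{dt}\Big(\nabla f_{\gamma(t)}\big[x(t)-\gamma(t)\nabla g(x(t))\big]+\nabla g(x(t))\Big)\right\|=O\left(\frac{1}{t\gamma(t)}\right)+o\left(\frac{t^2\left|\frac{d}{dt}\frac{\gamma(t)}{\lambda(t)}\right|}{\gamma^2(t)}\right)$$
    as $t\to+\infty$.

  • (iv) If $0<\inf_{t\geq t_0}\gamma(t)\leq\sup_{t\geq t_0}\gamma(t)<\frac{2}{L_g}$, then $x(t)$ converges weakly to a minimizer of $f+g$ as $t\to+\infty$.

  • (v) Additionally, if $0<\gamma(t)\leq\frac{1}{L_g}$ for every $t\geq t_0$ and we set $u(t):=x(t)-\gamma(t)\nabla g(x(t))$, then
    $$f\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)+g\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)-\min_{\mathcal{H}}(f+g)=o\left(\frac{1}{\gamma(t)}\right)$$
    as $t\to+\infty$. Moreover, $\|\operatorname{prox}_{\gamma(t)f}(u(t))-x(t)\|\to 0$ as $t\to+\infty$.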

Proof

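Parts (i)–(iv) are a direct consequence of Theorem 3.1. For checking (v), first notice that for all $t\geq t_0$ we have

$$T_{\lambda(t),\gamma(t)}(x(t))=\frac{1}{\lambda(t)}\Big[\mathrm{Id}-J_{\gamma(t)\partial f}\circ(\mathrm{Id}-\gamma(t)\nabla g)\Big](x(t))=\frac{1}{\lambda(t)}\Big[x(t)-\operatorname{prox}_{\gamma(t)f}(u(t))\Big]. \tag{34}$$

Now, let $\bar{x}\in\operatorname{argmin}_{\mathcal{H}}(f+g)$. According to [15, Lemma 2.3], for every $t\geq t_0$, we have the inequality

$$\begin{aligned}f\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)+g\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)-\min_{\mathcal{H}}(f+g)&=f\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)+g\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)-f(\bar{x})-g(\bar{x})\\&\leq-\frac{1}{2\gamma(t)}\big\|\operatorname{prox}_{\gamma(t)f}(u(t))-x(t)\big\|^2+\frac{1}{\gamma(t)}\big\langle x(t)-\bar{x},\,x(t)-\operatorname{prox}_{\gamma(t)f}(u(t))\big\rangle.\end{aligned}$$

After adding the squared norm term to both sides and using the Cauchy–Schwarz inequality, for every $t\geq t_0$ we obtain

$$\begin{aligned}\frac{1}{2\gamma(t)}\big\|\operatorname{prox}_{\gamma(t)f}(u(t))-x(t)\big\|^2&\leq f\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)+g\big(\operatorname{prox}_{\gamma(t)f}(u(t))\big)+\frac{1}{2\gamma(t)}\big\|\operatorname{prox}_{\gamma(t)f}(u(t))-x(t)\big\|^2-\min_{\mathcal{H}}(f+g)\\&\leq\left\langle\frac{1}{\gamma(t)}\big(x(t)-\operatorname{prox}_{\gamma(t)f}(u(t))\big),\,x(t)-\bar{x}\right\rangle\\&\leq\left\|\frac{1}{\gamma(t)}\big(x(t)-\operatorname{prox}_{\gamma(t)f}(u(t))\big)\right\|\|x(t)-\bar{x}\|\\&=\frac{\lambda(t)}{\gamma(t)}\big\|T_{\lambda(t),\gamma(t)}(x(t))\big\|\|x(t)-\bar{x}\|=o\left(\frac{1}{\gamma(t)}\right)\quad\text{as}\ t\to+\infty,\end{aligned}$$

which follows as a consequence of $x$ being bounded and $\|T_{\lambda(t),\gamma(t)}(x(t))\|=o\left(\frac{1}{t^2}\right)$ as $t\to+\infty$.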
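Identity (34) expresses the governing operator through a single proximal step. As a minimal one-dimensional sketch, assume (purely for illustration, not from the paper) $f(x)=|x|$ and $g(x)=\frac{1}{2}(x-3)^2$, so that $L_g=1$ and the unique minimizer of $f+g$ is $x=2$; the function names are hypothetical.

```python
def soft_threshold(u, tau):
    """prox of tau * |.| evaluated at u."""
    if u > tau:
        return u - tau
    if u < -tau:
        return u + tau
    return 0.0

def grad_g(x):
    """Gradient of the assumed smooth part g(x) = 0.5 * (x - 3)**2."""
    return x - 3.0

def T(lam, gamma, x):
    """Forward-backward-type operator as in (34): (x - prox_{gamma f}(x - gamma * g'(x))) / lam."""
    u = x - gamma * grad_g(x)
    return (x - soft_threshold(u, gamma)) / lam

lam, gamma = 2.0, 0.5
for x in (0.0, 1.0, 2.0, 4.0):
    print(x, T(lam, gamma, x))
```

The operator vanishes exactly at the minimizer $x=2$ and is nonzero elsewhere, which is why $\|T_{\lambda(t),\gamma(t)}(x(t))\|$ serves as a residual along the trajectory.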

Remark 4.2

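It is also worth mentioning the system we obtain in the case where $g\equiv 0$, since we also get some improved rates for the objective functional values when we compare (Split-DIN-AVD) to (DIN-AVD) [9]. In this case, we have the system

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\left(\frac{\gamma(t)}{\lambda(t)}\nabla f_{\gamma(t)}(x(t))\right)+\frac{\gamma(t)}{\lambda(t)}\nabla f_{\gamma(t)}(x(t))=0 \tag{35}$$

attached to the convex optimization problem

$$\min_{x\in\mathcal{H}}f(x).$$

If we assume $\lambda>\frac{1}{(\alpha-1)^2}$, allow $\gamma:[t_0,+\infty)\to(0,+\infty)$ to be unbounded from above and otherwise keep the hypotheses of Theorem 4.1, then for a solution $x:[t_0,+\infty)\to\mathcal{H}$ to (35), the following statements hold:

  • (i) $x$ is bounded.

  • (ii) We have the estimates
    $$\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^2dt<+\infty,\quad\int_{t_0}^{+\infty}t^3\|\ddot{x}(t)\|^2dt<+\infty,\quad\int_{t_0}^{+\infty}\frac{\gamma^2(t)}{t}\big\|\nabla f_{\gamma(t)}(x(t))\big\|^2dt<+\infty.$$

  • (iii) We have the convergence rates
    $$\|\dot{x}(t)\|=o\left(\frac{1}{t}\right),\quad\|\ddot{x}(t)\|=O\left(\frac{1}{t^2}\right),\quad\big\|\nabla f_{\gamma(t)}(x(t))\big\|=o\left(\frac{1}{\gamma(t)}\right),\quad\left\|\frac{d}{dt}\nabla f_{\gamma(t)}(x(t))\right\|=O\left(\frac{1}{t\gamma(t)}\right)+o\left(\frac{t^2\left|\frac{d}{dt}\frac{\gamma(t)}{\lambda(t)}\right|}{\gamma^2(t)}\right)$$
    as $t\to+\infty$.

  • (iv) If $0<\inf_{t\geq t_0}\gamma(t)$, then $x(t)$ converges weakly to a minimizer of $f$ as $t\to+\infty$.

  • (v) We also obtain the rate
    $$f_{\gamma(t)}(x(t))-\min_{\mathcal{H}}f=o\left(\frac{1}{\gamma(t)}\right)\quad\text{as}\ t\to+\infty,$$
    which entails
    $$f\big(\operatorname{prox}_{\gamma(t)f}(x(t))\big)-\min_{\mathcal{H}}f=o\left(\frac{1}{\gamma(t)}\right)\quad\text{and}\quad\big\|\operatorname{prox}_{\gamma(t)f}(x(t))-x(t)\big\|\to 0$$
    as $t\to+\infty$.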

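Parts (i)–(iv) are a direct consequence of Theorem 3.2 for the case $A=\partial f$. For showing part (v), first notice that for $\lambda>0$ and $u\in\mathcal{H}$ we have, according to the definition of $f_{\lambda}$ and $\operatorname{prox}_{\lambda f}$,

$$f_{\lambda}(u)=f\big(\operatorname{prox}_{\lambda f}(u)\big)+\frac{1}{2\lambda}\big\|\operatorname{prox}_{\lambda f}(u)-u\big\|^2\leq f(u).$$

Let $\bar{x}\in\mathcal{H}$ be a minimizer of $f$. We apply the gradient inequality to $f_{\gamma(t)}$, from which we obtain, for every $t\geq t_0$,

$$f_{\gamma(t)}(x(t))-\min_{\mathcal{H}}f=f_{\gamma(t)}(x(t))-f(\bar{x})\leq f_{\gamma(t)}(x(t))-f_{\gamma(t)}(\bar{x})\leq\big\langle\nabla f_{\gamma(t)}(x(t)),\,x(t)-\bar{x}\big\rangle\leq\big\|\nabla f_{\gamma(t)}(x(t))\big\|\|x(t)-\bar{x}\|,$$

where the last inequality follows from the Cauchy–Schwarz inequality. Since $\|\nabla f_{\gamma(t)}(x(t))\|=o\left(\frac{1}{\gamma(t)}\right)$ as $t\to+\infty$ and $x$ is bounded, the previous inequality entails the first statement of (v). Again recalling the definition of the Moreau envelope of $f$, this finally gives

$$f\big(\operatorname{prox}_{\gamma(t)f}(x(t))\big)+\frac{1}{2\gamma(t)}\big\|\operatorname{prox}_{\gamma(t)f}(x(t))-x(t)\big\|^2-\min_{\mathcal{H}}f=f_{\gamma(t)}(x(t))-\min_{\mathcal{H}}f=o\left(\frac{1}{\gamma(t)}\right)$$

as $t\to+\infty$, which implies the last two statements and concludes the proof.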
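The Moreau envelope formula used in this argument can be illustrated for a concrete function. The sketch below assumes $f(x)=|x|$ on the real line (an illustrative choice, as are the function names) and compares the gradient formula $\nabla f_{\lambda}(u)=\frac{1}{\lambda}(u-\operatorname{prox}_{\lambda f}(u))$ with a central finite difference of the envelope.

```python
def prox_abs(u, lam):
    """prox_{lam f}(u) for f = |.| (soft thresholding)."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0

def moreau_envelope_abs(u, lam):
    """f_lam(u) = f(prox_{lam f}(u)) + |prox_{lam f}(u) - u|**2 / (2 * lam)."""
    p = prox_abs(u, lam)
    return abs(p) + (p - u) ** 2 / (2.0 * lam)

def grad_envelope_abs(u, lam):
    """Gradient of the envelope: (u - prox_{lam f}(u)) / lam."""
    return (u - prox_abs(u, lam)) / lam

lam, h = 1.0, 1e-6
for u in (-2.0, 0.3, 2.0):
    finite_diff = (moreau_envelope_abs(u + h, lam) - moreau_envelope_abs(u - h, lam)) / (2 * h)
    print(u, moreau_envelope_abs(u, lam), abs(u), grad_envelope_abs(u, lam), finite_diff)
```

In every printed row the envelope value stays below $f(u)=|u|$, matching the inequality $f_{\lambda}(u)\leq f(u)$ above.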

As pointed out in Remark 3.3, we can choose $\gamma(t)=\lambda t^2$ for every $t\geq t_0$ and recover the (DIN-AVD) system for nonsmooth convex minimization problems studied in [9]. Moreover, we can also set $\gamma(t)=t^n$ for a natural number $n>3$ and all $t\geq t_0$. Now, not only are the convergence rates for $\nabla f_{\gamma(t)}(x(t))$ and $\frac{d}{dt}\nabla f_{\gamma(t)}(x(t))$ as $t\to+\infty$ improved with respect to the system in [9], but (Split-DIN-AVD) also provides a better rate for the convergence of $f_{\gamma(t)}(x(t))$ to $\min_{\mathcal{H}}f$ as $t\to+\infty$.

Numerical Experiments

In the following paragraphs we describe some numerical experiments that portray some aspects of the theory.

Minimizing a Smooth and Convex Function

As an example of a continuous time scheme minimizing a convex and Fréchet differentiable function $g:\mathcal{H}\to\mathbb{R}$ with $L_g$-Lipschitz continuous gradient via (Split-DIN-AVD), we consider the system

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\left(\frac{1}{\eta(t)}\nabla g(x(t))\right)+\frac{1}{\eta(t)}\nabla g(x(t))=0, \tag{36}$$

where for $(x_1,x_2)\in\mathbb{R}^2$ we set $g(x_1,x_2)=\frac{1}{2}(x_1^2+100x_2^2)$ and therefore $\nabla g(x_1,x_2)=(x_1,100x_2)$. A trajectory generated by (36) is a pair $x(t)=(x_1(t),x_2(t))$. Figure 1 plots both components of the solution to (36) with initial Cauchy data $x_0=(1,1)$, $u_0=(1,1)$. Notice that the Lipschitz constant of $\nabla g$ is $L_g=100$, which means that the cocoercivity modulus of $\nabla g$ is $\beta=\frac{1}{L_g}=\frac{1}{100}$. To fulfill $\eta>\frac{1}{\beta(\alpha-1)^2}=\frac{100}{(\alpha-1)^2}$, we choose $\alpha=20$, $\eta=0.278$. Figure 1a corresponds to the case with no Hessian damping, that is, $\xi=0$. Figure 1b corresponds to a Hessian damping parameter $\xi=0.2$.
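A rough way to reproduce trajectories of this kind is to rewrite (36) as a first-order system and integrate it with a classical Runge–Kutta scheme. The sketch below is not the authors' code: the starting time $t_0=1$, the step size, the final time and all function names are assumptions, and $u_0$ is read as the initial velocity. It uses $\frac{d}{dt}\big(\frac{1}{\eta t^2}\nabla g(x)\big)=-\frac{2}{\eta t^3}\nabla g(x)+\frac{1}{\eta t^2}\nabla g(\dot{x})$, which holds because $\nabla g$ is linear here.

```python
def make_rhs(alpha, eta, xi, diag=(1.0, 100.0)):
    """Right-hand side of (36) as a first-order system in (x1, x2, v1, v2),
    with grad g(x) = (diag[0] * x1, diag[1] * x2) and eta(t) = eta * t**2."""
    def rhs(t, s):
        c = 1.0 / (eta * t * t)      # 1 / eta(t)
        dc = -2.0 / (eta * t ** 3)   # d/dt [1 / eta(t)]
        acc = []
        for d, pos, vel in zip(diag, s[:2], s[2:]):
            hessian_term = dc * d * pos + c * d * vel  # d/dt [(1/eta(t)) * grad g(x(t))]
            acc.append(-(alpha / t) * vel - xi * hessian_term - c * d * pos)
        return (s[2], s[3], acc[0], acc[1])
    return rhs

def rk4(rhs, t0, s0, t_end, h):
    """Classical fourth-order Runge-Kutta with fixed step h; returns the final state."""
    steps = int(round((t_end - t0) / h))
    s = tuple(s0)
    for i in range(steps):
        t = t0 + i * h
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k1)))
        k3 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k2)))
        k4 = rhs(t + h, tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6 * (p + 2 * q + 2 * r + w)
                  for a, p, q, r, w in zip(s, k1, k2, k3, k4))
    return s

def solve(xi, t0=1.0, t_end=50.0, h=0.005):
    """Integrate (36) with alpha = 20, eta = 0.278, x(t0) = (1, 1), x'(t0) = (1, 1)."""
    rhs = make_rhs(alpha=20.0, eta=0.278, xi=xi)
    return rk4(rhs, t0, (1.0, 1.0, 1.0, 1.0), t_end, h)

if __name__ == "__main__":
    for xi in (0.0, 0.2):
        print(xi, solve(xi))
```

Under these assumptions the second component, which carries the large curvature, decays much faster than the first one; for $\xi=0$ each component solves an Euler-type equation, so the output can be compared against a closed-form solution.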

Fig. 1. Trajectories of (Split-DIN-AVD) for $B=\nabla g$

Figure 2 depicts the fast convergence of the velocities to zero for the cases ξ=0 (Fig. 2a) and ξ=0.2 (Fig. 2b). In both figures, notice the effect of the damping parameter ξ>0, which attenuates the oscillations of the second component of the trajectories, as well as the oscillations present in the velocities.

Fig. 2. Fast convergence of the velocities

Minimizing a Nonsmooth and Convex Function

As an example of a continuous time scheme minimizing a proper, convex and lower semicontinuous function $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ via (Split-DIN-AVD), we consider the system

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\left(\frac{\gamma(t)}{\lambda(t)}\nabla f_{\gamma(t)}(x(t))\right)+\frac{\gamma(t)}{\lambda(t)}\nabla f_{\gamma(t)}(x(t))=0. \tag{37}$$

We will consider three options for $f$ and plot for each of them the trajectories, the objective function values and the gradients of the Moreau envelopes as follows:

  • $f(x)=\frac{1}{2}x^2$ (Figs. 3a and 4a),

  • $f(x)=|x|$ (Figs. 3b and 4b),

  • $f(x)=|x|+\frac{1}{2}x^2$ (Figs. 3c and 4c).

In order to fulfill $\alpha>1$ and $\lambda>\frac{1}{(\alpha-1)^2}$, we choose the parameters $\alpha=2$, $\lambda=1.1$, and we take $\xi=0$ and $\gamma(t)=t^8$. We compare the results given by (DIN-AVD) (that is, when $\gamma(t)=\lambda t^2$) and the ones given by our system (Split-DIN-AVD). The choice of $\xi$ does not seem to change the plots in a significant way for the examples we have chosen.
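For the three test functions the proximal maps, and hence the gradients of the Moreau envelopes entering (37), are available in closed form. The following sketch is an illustration only: the dictionary keys, the helper names and the brute-force `prox_numeric` check are assumptions introduced here, not taken from the paper.

```python
def soft(u, tau):
    """Soft thresholding: prox of tau * |.| at u."""
    return max(abs(u) - tau, 0.0) * (1.0 if u >= 0 else -1.0)

# The three objective functions and their closed-form proximal maps prox_{g f}(u).
F = {
    "0.5*x^2": lambda x: 0.5 * x * x,
    "|x|": abs,
    "|x| + 0.5*x^2": lambda x: abs(x) + 0.5 * x * x,
}
PROX = {
    "0.5*x^2": lambda u, g: u / (1.0 + g),
    "|x|": lambda u, g: soft(u, g),
    "|x| + 0.5*x^2": lambda u, g: soft(u, g) / (1.0 + g),
}

def prox_numeric(f, u, g, iters=200):
    """Ternary search for argmin_z f(z) + (z - u)**2 / (2 * g) on [-|u| - 1, |u| + 1]."""
    lo, hi = -abs(u) - 1.0, abs(u) + 1.0
    obj = lambda z: f(z) + (z - u) ** 2 / (2.0 * g)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def grad_envelope(name, x, g):
    """Gradient of the Moreau envelope f_g at x: (x - prox_{g f}(x)) / g."""
    return (x - PROX[name](x, g)) / g

t, x, lam = 3.0, 3.0, 1.1
for name in F:
    print(name, grad_envelope(name, x, t ** 8), grad_envelope(name, x, lam * t ** 2))
```

The printed pairs compare the envelope gradient at the same point for $\gamma(t)=t^8$ and for the (DIN-AVD) choice $\gamma(t)=\lambda t^2$.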

Fig. 3. Trajectories and objective function values in the case $A=\partial f$

Fig. 4. Gradients of the Moreau envelopes of $f$

Figure 3 depicts the trajectories $x(t)$ of (37) and the function values $f\big(\operatorname{prox}_{\gamma(t)f}(x(t))\big)$ for our choices of $f$ as $t\to+\infty$. Figure 4 portrays the fast convergence to zero of $\|\nabla f_{\gamma(t)}(x(t))\|$ as $t\to+\infty$. Notice the big improvement over (DIN-AVD) for nonsmooth convex minimization in [9] when choosing $\gamma(t)=t^8$, a result which we already knew theoretically. Polynomials of high degree seem to be the ones which give the biggest improvements in terms of rates.

An Example with Operator Splitting

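Now we consider the monotone inclusion problem (1) for $A(x_1,x_2)=(-x_2,x_1)$ and $B(x_1,x_2)=(x_1,x_2)$ for every $(x_1,x_2)\in\mathbb{R}^2$. For every $(x_1,x_2)\in\mathbb{R}^2$, an easy calculation gives

$$J_{\gamma A}\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\frac{1}{1+\gamma^2}&\frac{\gamma}{1+\gamma^2}\\-\frac{\gamma}{1+\gamma^2}&\frac{1}{1+\gamma^2}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix},$$

and so

$$\big(\mathrm{Id}-J_{\gamma A}\circ(\mathrm{Id}-\gamma\,\mathrm{Id})\big)\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}x_1\\x_2\end{pmatrix}-(1-\gamma)\begin{pmatrix}\frac{1}{1+\gamma^2}&\frac{\gamma}{1+\gamma^2}\\-\frac{\gamma}{1+\gamma^2}&\frac{1}{1+\gamma^2}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\frac{\gamma^2+\gamma}{1+\gamma^2}&\frac{\gamma(\gamma-1)}{1+\gamma^2}\\\frac{\gamma(1-\gamma)}{1+\gamma^2}&\frac{\gamma^2+\gamma}{1+\gamma^2}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix},$$

and

$$T_{\lambda,\gamma}\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\frac{\gamma^2+\gamma}{\lambda(1+\gamma^2)}&\frac{\gamma(\gamma-1)}{\lambda(1+\gamma^2)}\\\frac{\gamma(1-\gamma)}{\lambda(1+\gamma^2)}&\frac{\gamma^2+\gamma}{\lambda(1+\gamma^2)}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}.$$

(Split-DIN-AVD) now reads

$$\begin{pmatrix}\ddot{x}_1(t)\\\ddot{x}_2(t)\end{pmatrix}+\frac{\alpha}{t}\begin{pmatrix}\dot{x}_1(t)\\\dot{x}_2(t)\end{pmatrix}+\xi\frac{d}{dt}\left[\begin{pmatrix}\frac{\gamma^2(t)+\gamma(t)}{\lambda(t)(1+\gamma^2(t))}&\frac{\gamma(t)(\gamma(t)-1)}{\lambda(t)(1+\gamma^2(t))}\\\frac{\gamma(t)(1-\gamma(t))}{\lambda(t)(1+\gamma^2(t))}&\frac{\gamma^2(t)+\gamma(t)}{\lambda(t)(1+\gamma^2(t))}\end{pmatrix}\begin{pmatrix}x_1(t)\\x_2(t)\end{pmatrix}\right]+\begin{pmatrix}\frac{\gamma^2(t)+\gamma(t)}{\lambda(t)(1+\gamma^2(t))}&\frac{\gamma(t)(\gamma(t)-1)}{\lambda(t)(1+\gamma^2(t))}\\\frac{\gamma(t)(1-\gamma(t))}{\lambda(t)(1+\gamma^2(t))}&\frac{\gamma^2(t)+\gamma(t)}{\lambda(t)(1+\gamma^2(t))}\end{pmatrix}\begin{pmatrix}x_1(t)\\x_2(t)\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}.$$

We choose the parameters $\alpha=7$, $\lambda=0.056$, $\gamma(t)\equiv 1.5$, and the Cauchy data $x_0=(1,2)$ and $u_0=(-1,-1)$. Figure 5a corresponds to the case $\xi=0$, and Fig. 5b depicts the trajectory when the Hessian damping parameter is $\xi=0.8$. Again, notice how the presence of $\xi$ seems to attenuate the oscillations in the trajectories, not only for optimization problems but also for monotone inclusions which cannot be reduced to them.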
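The operators of this example are simple enough to evaluate directly from their definitions. The sketch below (function names are illustrative assumptions) implements $A$, $B$, the resolvent $J_{\gamma A}$ and $T_{\lambda,\gamma}=\frac{1}{\lambda}\big[\mathrm{Id}-J_{\gamma A}\circ(\mathrm{Id}-\gamma B)\big]$, and checks that the resolvent really inverts $\mathrm{Id}+\gamma A$.

```python
def A(x):
    """Skew operator A(x1, x2) = (-x2, x1)."""
    return (-x[1], x[0])

def B(x):
    """Identity operator, which is 1-cocoercive."""
    return (x[0], x[1])

def resolvent_A(gamma, x):
    """J_{gamma A}(x) = (Id + gamma A)^{-1}(x), from the explicit 2x2 inverse."""
    d = 1.0 + gamma * gamma
    return ((x[0] + gamma * x[1]) / d, (x[1] - gamma * x[0]) / d)

def T(lam, gamma, x):
    """T_{lam,gamma}(x) = (x - J_{gamma A}(x - gamma B(x))) / lam."""
    bx = B(x)
    u = (x[0] - gamma * bx[0], x[1] - gamma * bx[1])
    j = resolvent_A(gamma, u)
    return ((x[0] - j[0]) / lam, (x[1] - j[1]) / lam)

gamma, lam = 1.5, 0.056
z = resolvent_A(gamma, (1.0, 2.0))
az = A(z)
print((z[0] + gamma * az[0], z[1] + gamma * az[1]))  # applies Id + gamma A to z
print(T(lam, gamma, (0.0, 0.0)), T(lam, gamma, (1.0, 2.0)))
```

Since $A+B$ vanishes only at the origin, $T_{\lambda,\gamma}$ is zero there and nonzero at the initial point $(1,2)$.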

Fig. 5. Trajectories of (Split-DIN-AVD) for finding the zeros of $A+B$

A Numerical Algorithm

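In the following we will derive via time discretization of (Split-DIN-AVD) a numerical algorithm for solving the monotone inclusion problem (1). We perform a discretization of (Split-DIN-AVD) with stepsize 1 and set, for an integer $k\geq 1$, $x(k):=x_k$, $\lambda(k):=\lambda_k$, $\gamma(k):=\gamma_k$. We make the approximations

$$\ddot{x}(t)\approx x_{k+1}-2x_k+x_{k-1},\qquad\frac{\alpha}{t}\dot{x}(t)\approx\frac{\alpha}{k}(x_k-x_{k-1}),$$
$$\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))\approx T_{\lambda_k,\gamma_k}(x_k)-T_{\lambda_{k-1},\gamma_{k-1}}(x_{k-1}),\qquad T_{\lambda(t),\gamma(t)}(x(t))\approx T_{\lambda_{k+1},\gamma_{k+1}}(x_{k+1}),$$

so we get, for every $k\geq 1$,

$$x_{k+1}-2x_k+x_{k-1}+\frac{\alpha}{k}(x_k-x_{k-1})+\xi\Big[T_{\lambda_k,\gamma_k}(x_k)-T_{\lambda_{k-1},\gamma_{k-1}}(x_{k-1})\Big]+T_{\lambda_{k+1},\gamma_{k+1}}(x_{k+1})=0. \tag{38}$$

After rearranging the terms of (38), for every $k\geq 1$ we obtain

$$x_{k+1}+T_{\lambda_{k+1},\gamma_{k+1}}(x_{k+1})=x_k+\left(1-\frac{\alpha}{k}\right)(x_k-x_{k-1})-\xi\Big[T_{\lambda_k,\gamma_k}(x_k)-T_{\lambda_{k-1},\gamma_{k-1}}(x_{k-1})\Big]. \tag{39}$$

In other words, after setting $\alpha_k=1-\frac{\alpha}{k}$ and denoting the right hand side of (39) by $y_k$ for every $k\geq 1$, we obtain the following iterative scheme

$$(\forall k\geq 1)\quad\begin{cases}y_k=x_k+\alpha_k(x_k-x_{k-1})-\xi\Big[T_{\lambda_k,\gamma_k}(x_k)-T_{\lambda_{k-1},\gamma_{k-1}}(x_{k-1})\Big],\\x_{k+1}=\big(\mathrm{Id}+T_{\lambda_{k+1},\gamma_{k+1}}\big)^{-1}(y_k).\end{cases} \tag{40}$$

Observe that the second step in (40) is always well-defined. Indeed, for $\lambda,\gamma>0$, $T_{\lambda,\gamma}$ is $\frac{\lambda}{2}$-cocoercive, hence monotone (see Lemma 2.2(i)). This also implies that $T_{\lambda,\gamma}$ is $\frac{2}{\lambda}$-Lipschitz continuous, and a monotone and continuous operator is maximally monotone, according to [14, Corollary 20.28]. Hence, by Minty’s Theorem (see [14, Theorem 21.1]), we know that $\mathrm{Id}+T_{\lambda,\gamma}:\mathcal{H}\to\mathcal{H}$ is surjective.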
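To see scheme (40) in action, one can run it on a linear toy problem where the backward step reduces to a $2\times 2$ linear solve. The sketch below is a hedged illustration, not the authors' implementation: it assumes $A(x_1,x_2)=(-x_2,x_1)$, $B=\mathrm{Id}$, a constant $\gamma_k\equiv\gamma$, the convention $\lambda_0:=\lambda$ (needed because $\lambda k^2$ vanishes at $k=0$), and hypothetical function names.

```python
import math

def T_apply(lam, gamma, x):
    """T_{lam,gamma}(x) for A(x1, x2) = (-x2, x1) and B = Id (assumed toy operators)."""
    d = 1.0 + gamma * gamma
    u = ((1.0 - gamma) * x[0], (1.0 - gamma) * x[1])             # forward step x - gamma * B(x)
    j = ((u[0] + gamma * u[1]) / d, (u[1] - gamma * u[0]) / d)   # backward step J_{gamma A}(u)
    return ((x[0] - j[0]) / lam, (x[1] - j[1]) / lam)

def solve_backward(lam, gamma, y):
    """Solve x + T_{lam,gamma}(x) = y; T is linear here, so invert the 2x2 matrix Id + T."""
    c0 = T_apply(lam, gamma, (1.0, 0.0))   # first column of T
    c1 = T_apply(lam, gamma, (0.0, 1.0))   # second column of T
    a, b, c, d = 1.0 + c0[0], c1[0], c0[1], 1.0 + c1[1]
    det = a * d - b * c
    return ((d * y[0] - b * y[1]) / det, (-c * y[0] + a * y[1]) / det)

def run(alpha, xi, lam, gamma, x0, x1, iters):
    """Iterate scheme (40) with lam_k = lam * k**2 and constant gamma_k = gamma."""
    lam_k = lambda k: lam * max(k, 1) ** 2   # assumption: lam_0 := lam
    xs = [x0, x1]
    for k in range(1, iters + 1):
        xk, xkm1 = xs[k], xs[k - 1]
        a_k = 1.0 - alpha / k
        tk = T_apply(lam_k(k), gamma, xk)
        tkm1 = T_apply(lam_k(k - 1), gamma, xkm1)
        y = tuple(xk[i] + a_k * (xk[i] - xkm1[i]) - xi * (tk[i] - tkm1[i]) for i in range(2))
        xs.append(solve_backward(lam_k(k + 1), gamma, y))
    return xs

def norm(x):
    return math.hypot(x[0], x[1])

xs = run(alpha=7.0, xi=0.0, lam=0.056, gamma=1.5, x0=(1.0, 2.0), x1=(1.0, 2.0), iters=200)
print(norm(xs[0]), norm(xs[-1]))
```

For nonlinear or set-valued $A$ the backward step would instead require an inner solver, which is the difficulty addressed for $B=0$ in Remark 6.2.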

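We are now in a position to state the main theorem concerning the algorithm above.

Theorem 6.1

Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a maximally monotone operator and $B:\mathcal{H}\to\mathcal{H}$ a $\beta$-cocoercive operator for some $\beta>0$ such that $\operatorname{zer}(A+B)\neq\emptyset$. Let $x_0,x_1\in\mathcal{H}$ be arbitrary initial points. Let $\alpha>1$, $\xi\geq 0$, and $(\lambda_k)_{k\geq 0}$, $(\gamma_k)_{k\geq 0}$ be sequences of positive numbers that fulfill

$$\lambda_k=\lambda k^2\quad\forall k\geq 1,\ \text{with}\ \lambda>\frac{4\xi+2}{(\alpha-1)^2},\qquad 0<\inf_{k\geq 0}\gamma_k\leq\sup_{k\geq 0}\gamma_k<2\beta\qquad\text{and}\qquad\frac{\gamma_k-\gamma_{k-1}}{\gamma_k}=O\left(\frac{1}{k}\right)\ \text{as}\ k\to+\infty.$$

Now, consider the sequences $(y_k)_{k\geq 1}$ and $(x_k)_{k\geq 0}$ generated by algorithm (40). The following properties are satisfied:

  • (i) We have the estimates
    $$\|x_{k+1}-x_k\|=O\left(\frac{1}{k}\right)\quad\text{and}\quad\big\|A_{\gamma_k}(x_k-\gamma_kBx_k)+Bx_k\big\|=o\left(\frac{1}{\gamma_k}\right)\quad\text{as}\ k\to+\infty.$$

  • (ii) The sequence $(x_k)_{k\geq 0}$ converges weakly to an element of $\operatorname{zer}(A+B)$.

  • (iii) The sequence $(y_k)_{k\geq 1}$ converges weakly to an element of $\operatorname{zer}(A+B)$. Precisely, we have $\|x_k-y_k\|=O\left(\frac{1}{k}\right)$ as $k\to+\infty$.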

The proof can be done by transposing the techniques used in the continuous time case to the discrete time case. Algorithm (40) can be seen as a splitting version of the (PRINAM) algorithm studied by Attouch and László in [10].

Remark 6.2

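The second step in (40) can be quite complicated to compute. However, if $B=0$, we can resort to the fact that $(A_{\lambda_1})_{\lambda_2}=A_{\lambda_1+\lambda_2}$ for $\lambda_1,\lambda_2>0$. We now have, for $\lambda,\gamma>0$,

$$T_{\lambda,\gamma}=\frac{1}{\lambda}\big[\mathrm{Id}-J_{\gamma A}\big]=\frac{\gamma}{\lambda}A_{\gamma},$$

which gives

$$\big(\mathrm{Id}+T_{\lambda,\gamma}\big)^{-1}=J_{\frac{\gamma}{\lambda}A_{\gamma}}=\mathrm{Id}-\frac{\gamma}{\lambda}(A_{\gamma})_{\frac{\gamma}{\lambda}}=\mathrm{Id}-\frac{\gamma}{\lambda}A_{\gamma+\frac{\gamma}{\lambda}}.$$

It is now possible to write (40) in terms of the resolvents of $A$. We have, for every $k\geq 1$,

$$\begin{aligned}T_{\lambda_k,\gamma_k}(x_k)-T_{\lambda_{k-1},\gamma_{k-1}}(x_{k-1})&=\frac{1}{\lambda_k}\big[x_k-J_{\gamma_kA}(x_k)\big]-\frac{1}{\lambda_{k-1}}\big[x_{k-1}-J_{\gamma_{k-1}A}(x_{k-1})\big]\\&=\left(\frac{1}{\lambda_k}-\frac{1}{\lambda_{k-1}}\right)x_k+\frac{1}{\lambda_{k-1}}(x_k-x_{k-1})-\left[\frac{1}{\lambda_k}J_{\gamma_kA}(x_k)-\frac{1}{\lambda_{k-1}}J_{\gamma_{k-1}A}(x_{k-1})\right],\end{aligned}$$

and, writing $\mu_{k+1}:=\gamma_{k+1}+\frac{\gamma_{k+1}}{\lambda_{k+1}}$,

$$y_k-\frac{\gamma_{k+1}}{\lambda_{k+1}}A_{\mu_{k+1}}(y_k)=y_k-\frac{\gamma_{k+1}}{\lambda_{k+1}}\cdot\frac{1}{\mu_{k+1}}\big[y_k-J_{\mu_{k+1}A}(y_k)\big]=\frac{\lambda_{k+1}}{\lambda_{k+1}+1}y_k+\frac{1}{\lambda_{k+1}+1}J_{\mu_{k+1}A}(y_k).$$

So now (40) becomes

$$(\forall k\geq 1)\quad\begin{cases}y_k=\left[1-\xi\left(\frac{1}{\lambda_k}-\frac{1}{\lambda_{k-1}}\right)\right]x_k+\left(\alpha_k-\frac{\xi}{\lambda_{k-1}}\right)(x_k-x_{k-1})+\xi\left[\frac{1}{\lambda_k}J_{\gamma_kA}(x_k)-\frac{1}{\lambda_{k-1}}J_{\gamma_{k-1}A}(x_{k-1})\right],\\x_{k+1}=\frac{\lambda_{k+1}}{\lambda_{k+1}+1}y_k+\frac{1}{\lambda_{k+1}+1}J_{\mu_{k+1}A}(y_k),\quad\mu_{k+1}=\gamma_{k+1}+\frac{\gamma_{k+1}}{\lambda_{k+1}}.\end{cases} \tag{41}$$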

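Now, if we assume $0<\inf_{k\geq 0}\gamma_k$ and $\lambda>\frac{2\xi+1}{(\alpha-1)^2}$ and otherwise keep the hypotheses of Theorem 6.1, then for the sequences $(x_k)_{k\geq 0}$ and $(y_k)_{k\geq 1}$ generated by (41), the following statements hold:

  • (i) We have the estimates
    $$\|x_{k+1}-x_k\|=O\left(\frac{1}{k}\right)\quad\text{and}\quad\|A_{\gamma_k}(x_k)\|=o\left(\frac{1}{\gamma_k}\right)\quad\text{as}\ k\to+\infty.$$

  • (ii) The sequence $(x_k)_{k\geq 0}$ converges weakly to an element of $\operatorname{zer}A$.

  • (iii) The sequence $(y_k)_{k\geq 1}$ converges weakly to an element of $\operatorname{zer}A$ as well. Precisely, we have $\|x_k-y_k\|=O\left(\frac{1}{k}\right)$ as $k\to+\infty$.

Notice that the condition required for $(\gamma_k)_{k\geq 0}$ is fulfilled in particular for $\gamma_k=k^n$ for every $k\geq 1$ and a natural number $n\geq 1$. Thus, by choosing large $n$, we obtain a fast convergence rate for $A_{\gamma_k}(x_k)$ as $k\to+\infty$.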
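The semigroup property $(A_{\lambda_1})_{\lambda_2}=A_{\lambda_1+\lambda_2}$ used in this remark can be verified numerically on a scalar example. The sketch below assumes $A=\partial|\cdot|$ on the real line, whose Yosida approximation is $A_{\lambda}(x)=\max\{-1,\min\{1,x/\lambda\}\}$; the resolvent is found by plain bisection, and all names and bracket bounds are illustrative assumptions.

```python
def yosida_abs(lam, x):
    """Yosida approximation of A = subdifferential of |.|: A_lam(x) = clip(x / lam, -1, 1)."""
    return max(-1.0, min(1.0, x / lam))

def resolvent_of(op, mu, x, lo=-1e3, hi=1e3, iters=200):
    """J_{mu * op}(x): solve z + mu * op(z) = x by bisection.
    Assumes op is single-valued and monotone and the solution lies in [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + mu * op(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def yosida_of(op, mu, x):
    """Yosida approximation of index mu of a generic operator: (x - J_{mu * op}(x)) / mu."""
    return (x - resolvent_of(op, mu, x)) / mu

lam1, lam2 = 0.5, 1.5
for x in (-3.0, -0.4, 0.0, 0.7, 5.0):
    nested = yosida_of(lambda z: yosida_abs(lam1, z), lam2, x)
    print(x, nested, yosida_abs(lam1 + lam2, x))
```

The two printed columns agree up to the bisection tolerance; the same property is what allows the inverse $(\mathrm{Id}+T_{\lambda,\gamma})^{-1}$ to be written with one resolvent of $A$.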

A Appendix

The following are three auxiliary lemmas that are used in the proof of Theorem 3.1. The proof for Lemma A.1 can be found in [12], while the proof of Lemma A.2 is straightforward. For the proof of Opial’s Lemma, we refer the reader to [1, Lemma 1.10].

Lemma A.1

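Let $t_0>0$, and let $u:[t_0,+\infty)\to\mathbb{R}$ be a continuously differentiable function which is bounded from below. Given $\alpha>1$, a nonnegative function $\theta:[t_0,+\infty)\to\mathbb{R}$ and a nonnegative function $k\in L^1([t_0,+\infty),\mathbb{R})$, let us assume that

$$t\ddot{u}(t)+\alpha\dot{u}(t)+\theta(t)\leq k(t)$$

for almost every $t\geq t_0$. Then, the positive part $[\dot{u}]_+$ of $\dot{u}$ belongs to $L^1([t_0,+\infty),\mathbb{R})$ and $\lim_{t\to+\infty}u(t)$ exists. Moreover, we have $\int_{t_0}^{+\infty}\theta(t)dt<+\infty$.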

Lemma A.2

Let $A,B,C\in\mathbb{R}$ and $\mathcal{H}$ a real Hilbert space. Then the inequality

$$A\|X\|^2+2C\langle X,Y\rangle+B\|Y\|^2\geq 0$$

holds for every $X,Y\in\mathcal{H}$ if and only if $A,B\geq 0$ and $C^2-AB\leq 0$.
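A quick numerical sanity check of this criterion is sketched below for $\mathcal{H}=\mathbb{R}^3$; the dimension, the sampled coefficients and the function names are assumptions made only for illustration.

```python
import random

def quad_form(A, B, C, X, Y):
    """A * ||X||^2 + 2 * C * <X, Y> + B * ||Y||^2 for vectors given as tuples."""
    nx = sum(x * x for x in X)
    ny = sum(y * y for y in Y)
    ip = sum(x * y for x, y in zip(X, Y))
    return A * nx + 2.0 * C * ip + B * ny

def lemma_condition(A, B, C):
    """The criterion of the lemma: A, B >= 0 and C^2 - A * B <= 0."""
    return A >= 0 and B >= 0 and C * C - A * B <= 0

rng = random.Random(0)
samples = [(tuple(rng.uniform(-1, 1) for _ in range(3)),
            tuple(rng.uniform(-1, 1) for _ in range(3))) for _ in range(2000)]
for coeffs in ((1.0, 1.0, 1.0), (2.0, 0.5, -1.0), (1.0, 1.0, 2.0)):
    smallest = min(quad_form(*coeffs, X, Y) for X, Y in samples)
    print(coeffs, lemma_condition(*coeffs), smallest)
```

When the criterion fails, as for $(A,B,C)=(1,1,2)$, a violating pair is easy to exhibit: $X=(1,0,0)$ and $Y=-X$ give the value $-2$.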

Lemma A.3

(Opial’s Lemma) Let $S\subseteq\mathcal{H}$ be a nonempty set and $x:[t_0,+\infty)\to\mathcal{H}$ a given map, where $t_0>0$. Assume that

  • (i) for every $x^{*}\in S$, $\lim_{t\to+\infty}\|x(t)-x^{*}\|$ exists;

  • (ii) every weak sequential cluster point of the map $x$ belongs to $S$.

Then, there exists $x_{\infty}\in S$ such that $x(t)$ converges weakly to $x_{\infty}$ as $t\to+\infty$.

Funding

Open access funding provided by University of Vienna.

Footnotes

Research partially supported by the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO) which is funded by FWF (Austrian Science Fund), project W1260-N35.


References

  • 1. Abbas B, Attouch H. Dynamical systems and forward–backward algorithms associated with the sum of a convex subdifferential and a monotone cocoercive operator. Optimization. 2015;64(10):2223–2252. doi: 10.1080/02331934.2014.971412.
  • 2. Álvarez F. On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 2000;38(4):1102–1119. doi: 10.1137/S0363012998335802.
  • 3. Álvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9(1–2):3–11. doi: 10.1023/A:1011253113155.
  • 4. Álvarez F, Attouch H, Bolte J, Redont P. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 2002;81(8):747–779. doi: 10.1016/S0021-7824(01)01253-3.
  • 5. Apidopoulos V, Aujol J-F, Dossal C. Convergence rate of inertial forward–backward algorithm beyond Nesterov’s rule. Math. Program. 2020;180:137–156.
  • 6. Attouch H, Chbani Z, Peypouquet J, Redont P. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 2018;168:123–175. doi: 10.1007/s10107-016-0992-8.
  • 7. Attouch H, Chbani Z, Riahi H. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3. ESAIM: COCV. 2019;25: Article number 2.
  • 8. Attouch H, Goudou X, Redont P. The heavy ball with friction method. The continuous dynamical system, global exploration of the local minima of a real-valued function by asymptotical analysis of a dissipative dynamical system. Commun. Contemp. Math. 2000;2(1):1–34. doi: 10.1142/S0219199700000025.
  • 9. Attouch H, László SC. Continuous Newton-like inertial dynamics for monotone inclusions. Set-Valued Var. Anal. 2021;29:555–581. doi: 10.1007/s11228-020-00564-y.
  • 10. Attouch H, László SC. Newton-like inertial dynamics and proximal algorithms governed by maximally monotone operators. SIAM J. Optim. 2020;30(4):3252–3283. doi: 10.1137/20M1333316.
  • 11. Attouch H, Maingé PE. Asymptotic behavior of second order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calculus Var. 2011;17(3):836–857. doi: 10.1051/cocv/2010024.
  • 12. Attouch H, Peypouquet J. Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. 2019;174(1–2):391–432. doi: 10.1007/s10107-018-1252-x.
  • 13. Attouch H, Peypouquet J, Redont P. Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 2016;210(10):5734–5783. doi: 10.1016/j.jde.2016.08.020.
  • 14. Bauschke HH, Combettes PL. Convex Analysis and Monotone Operator Theory. CMS Books in Mathematics. 2nd edn. Berlin: Springer; 2017.
  • 15. Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009;2(1):183–202.
  • 16. Boţ RI, Csetnek ER. Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 2016;54:1423–1443. doi: 10.1137/15M1012657.
  • 17. Haraux A. Systèmes Dynamiques Dissipatifs et Applications. Masson; 1991.
  • 18. May R. Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 2017;41(3):681–685. doi: 10.3906/mat-1512-28.
  • 19. Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k^2). Dokl. Akad. Nauk SSSR. 1983;269(3):543–547.
  • 20. Su WJ, Boyd S, Candès EJ. A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 2014;27:2510–2518.

Articles from Journal of Dynamics and Differential Equations are provided here courtesy of Springer
