Abstract
In the framework of a real Hilbert space, we address the problem of finding the zeros of the sum of a maximally monotone operator A and a cocoercive operator B. We study the asymptotic behaviour of the trajectories generated by a second order equation with vanishing damping, attached to this problem, and governed by a time-dependent forward–backward-type operator. This is a splitting system, as it only requires forward evaluations of B and backward evaluations of A. A proper tuning of the system parameters ensures the weak convergence of the trajectories to the set of zeros of $A+B$, as well as fast convergence of the velocities towards zero. A particular case of our system allows us to derive fast convergence rates for the problem of minimizing the sum of a proper, convex and lower semicontinuous function and a smooth and convex function with Lipschitz continuous gradient. We illustrate the theoretical outcomes by numerical experiments.
Keywords: Asymptotic stabilization, Damped inertial dynamics, Lyapunov analysis, Vanishing viscosity, Splitting system, Monotone inclusions
Introduction
Problem Formulation and a Continuous Time Splitting Scheme with Vanishing Damping
Let $\mathcal{H}$ be a real Hilbert space, $A:\mathcal{H}\to 2^{\mathcal{H}}$ a maximally monotone operator and $B:\mathcal{H}\to\mathcal{H}$ a $\beta$-cocoercive operator for some $\beta>0$ such that $\operatorname{zer}(A+B)\neq\emptyset$. Devising fast convergent continuous and discrete time dynamics for solving monotone inclusions of the type
$$\text{find } x\in\mathcal{H} \text{ such that } 0\in Ax+Bx \tag{1}$$
is of great importance in many fields, including, but not limited to, optimization, equilibrium theory, economics and game theory, partial differential equations, and statistics. One of our main motivations comes from the fact that solving the convex optimization problem
$$\min_{x\in\mathcal{H}} f(x)+g(x),$$
where $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ is proper, convex and lower semicontinuous and $g:\mathcal{H}\to\mathbb{R}$ is convex and Fréchet differentiable with a Lipschitz continuous gradient, is equivalent to solving the monotone inclusion
$$0\in\partial f(x)+\nabla g(x).$$
We want to exploit the additive structure of (1) and approach A and B separately, in the spirit of the splitting paradigm.
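For $t_0>0$, $\alpha>1$, $\xi\geq 0$ and functions $\lambda,\gamma:[t_0,+\infty)\to(0,+\infty)$, we will study the asymptotic behaviour of the trajectories of the second order differential equation
$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\Big(T_{\lambda(t),\gamma(t)}(x(t))\Big)+T_{\lambda(t),\gamma(t)}(x(t))=0, \tag{2}$$
where, for $\lambda,\gamma>0$, the operator $T_{\lambda,\gamma}:\mathcal{H}\to\mathcal{H}$ is given by
$$T_{\lambda,\gamma}=\frac{1}{\lambda}\Big[\mathrm{Id}-J_{\gamma A}\circ(\mathrm{Id}-\gamma B)\Big].$$
The sets of zeros of $A+B$ and of $T_{\lambda,\gamma}$, for $\lambda,\gamma>0$, coincide. We refer to (2) as (Split-DIN-AVD). The nomenclature comes from the splitting feature of the continuous time scheme, as well as the link with the (DIN-AVD) system developed by Attouch and László in [9] (Dynamic Inertial Newton, Asymptotic Vanishing Damping), which we will emphasize later. We will discuss the existence and uniqueness of the trajectories generated by (Split-DIN-AVD), and also show their weak convergence to the set of zeros of $A+B$ as well as the fast convergence of the velocities to zero, and convergence rates for $T_{\lambda(t),\gamma(t)}(x(t))$ and $\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))$ as $t\to+\infty$.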
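To make the forward–backward-type operator concrete, the following Python sketch evaluates it on the real line. It assumes the form $T_{\lambda,\gamma}=\frac{1}{\lambda}[\mathrm{Id}-J_{\gamma A}\circ(\mathrm{Id}-\gamma B)]$; the data $A=\partial|\cdot|$ and $B(x)=x-2$ (so that $\beta=1$ and the unique zero of $A+B$ is $x^*=1$) are hypothetical choices made only for illustration, and the helper names `make_T` and `cocoercivity_gap` are not taken from the paper. The sketch checks that $x^*$ is a zero of $T_{\lambda,\gamma}$ and samples the cocoercivity inequality with modulus $\lambda/2$ for $\gamma\in(0,2\beta)$.

```python
import random


def soft_threshold(v, tau):
    """Resolvent J_{tau A}(v) for A = subdifferential of |.| on the real line."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def B(x):
    """Illustrative 1-cocoercive operator (gradient of 0.5*(x-2)^2)."""
    return x - 2.0


def make_T(lam, gamma, resolvent, forward_op):
    """Return x -> (1/lam) * [x - J_{gamma A}(x - gamma * B(x))]."""
    def T(x):
        return (x - resolvent(x - gamma * forward_op(x), gamma)) / lam
    return T


def cocoercivity_gap(T, modulus, trials=2000, seed=0):
    """Smallest sampled value of <Tx-Ty, x-y> - modulus*|Tx-Ty|^2."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x, y = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        d = T(x) - T(y)
        worst = min(worst, d * (x - y) - modulus * d * d)
    return worst


x_star = 1.0  # 0 lies in sign(1) + (1 - 2), so x_star is a zero of A + B
for lam, gamma in [(1.0, 0.5), (4.0, 1.0), (0.3, 1.9)]:
    T = make_T(lam, gamma, soft_threshold, B)
    # T(x_star) should vanish; the gap should be nonnegative up to rounding.
    print(lam, gamma, T(x_star), cocoercivity_gap(T, lam / 2.0))
```

A sampled nonnegative gap is only numerical evidence, not a proof; the inequality itself follows from standard results on compositions of averaged operators.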
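For the particular case $B=0$, we are left with the monotone inclusion problem
$$\text{find } x\in\mathcal{H} \text{ such that } 0\in Ax$$
and the attached system
$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\Big(T_{\lambda(t),\gamma(t)}(x(t))\Big)+T_{\lambda(t),\gamma(t)}(x(t))=0,$$
where, for $\lambda,\gamma>0$, the operator $T_{\lambda,\gamma}$ can be seen as a generalized Moreau envelope of the operator A, i.e.,
$$T_{\lambda,\gamma}=\frac{1}{\lambda}\big(\mathrm{Id}-J_{\gamma A}\big)=\frac{\gamma}{\lambda}A_{\gamma}.$$
In particular, we will be able to set $\gamma(t)=\lambda(t)$ for every $t\geq t_0$. Since for $\lambda>0$, $T_{\lambda,\lambda}=A_{\lambda}$, this allows us to recover the (DIN-AVD) system
$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\xi\frac{d}{dt}\Big(A_{\lambda(t)}(x(t))\Big)+A_{\lambda(t)}(x(t))=0$$
addressed by Attouch and László in [9].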
If , and after properly redefining some parameters, we obtain the following system
with , which addresses the monotone equation
This dynamical system approaches the cocoercive operator B directly through a forward evaluation, which is more natural, instead of having to resort to its Moreau envelope, as in (DIN-AVD).
Notation and Preliminaries
In this subsection, we will explain the notions which were mentioned in the previous subsection, and we will introduce some definitions and preliminary results that will be required later. Throughout the paper, we will be working in a real Hilbert space $\mathcal{H}$ with inner product $\langle\cdot,\cdot\rangle$ and corresponding norm $\|\cdot\|$.
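Let $A:\mathcal{H}\to 2^{\mathcal{H}}$ be a set-valued operator, that is, Ax is a subset of $\mathcal{H}$ for every $x\in\mathcal{H}$. The operator A is totally characterized by its graph $\operatorname{gra}A=\{(x,u)\in\mathcal{H}\times\mathcal{H} : u\in Ax\}$. The inverse of A is the operator $A^{-1}:\mathcal{H}\to 2^{\mathcal{H}}$ well-defined through the equivalence $x\in A^{-1}u$ if and only if $u\in Ax$. The set of zeros of A is the set $\operatorname{zer}A=\{x\in\mathcal{H} : 0\in Ax\}$. For a subset $C\subseteq\mathcal{H}$, we say that $A(C)=\bigcup_{x\in C}Ax$. The range of A is the set $\operatorname{ran}A=A(\mathcal{H})$.
A set-valued operator A is said to be monotone if $\langle v-u,y-x\rangle\geq 0$ whenever $(x,u),(y,v)\in\operatorname{gra}A$, and maximally monotone if it is monotone and the following implication holds:
$$\langle v-u,y-x\rangle\geq 0 \quad \forall (y,v)\in\operatorname{gra}A \;\Longrightarrow\; (x,u)\in\operatorname{gra}A.$$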
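Let $\lambda>0$. The resolvent of index $\lambda$ of A is the operator $J_{\lambda A}:\mathcal{H}\to 2^{\mathcal{H}}$ given by
$$J_{\lambda A}=(\mathrm{Id}+\lambda A)^{-1},$$
and the Moreau envelope (or Yosida approximation or Yosida regularization) of index $\lambda$ of A is the operator $A_{\lambda}:\mathcal{H}\to 2^{\mathcal{H}}$ given by
$$A_{\lambda}=\frac{1}{\lambda}\big(\mathrm{Id}-J_{\lambda A}\big),$$
where $\mathrm{Id}:\mathcal{H}\to\mathcal{H}$, defined by $\mathrm{Id}(x)=x$ for every $x\in\mathcal{H}$, is the identity operator of $\mathcal{H}$. For $\lambda_1,\lambda_2>0$, it holds $(A_{\lambda_1})_{\lambda_2}=A_{\lambda_1+\lambda_2}$.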
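A single-valued operator $B:\mathcal{H}\to\mathcal{H}$ is said to be $\beta$-cocoercive for some $\beta>0$ if for every $x,y\in\mathcal{H}$ we have
$$\langle Bx-By,x-y\rangle\geq\beta\|Bx-By\|^2.$$
In this case, B is $\frac{1}{\beta}$-Lipschitz continuous, namely, for every $x,y\in\mathcal{H}$ we have
$$\|Bx-By\|\leq\frac{1}{\beta}\|x-y\|.$$
We say B is nonexpansive if it is 1-Lipschitz continuous, and firmly nonexpansive if it is 1-cocoercive. For $\alpha\in(0,1)$, we say B is $\alpha$-averaged if there exists a nonexpansive operator $R:\mathcal{H}\to\mathcal{H}$ such that
$$B=(1-\alpha)\mathrm{Id}+\alpha R.$$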
Let and . According to Minty’s Theorem, A is maximally monotone if and only if . In this case is single-valued and firmly nonexpansive, is single-valued, -cocoercive, and for every and every we have
Let $B:\mathcal{H}\to\mathcal{H}$ be a single-valued operator. If B is $\alpha$-averaged for some $\alpha\in(0,1)$, then $\mathrm{Id}-B$ is $\frac{1}{2\alpha}$-cocoercive. If B is monotone and continuous, then it is maximally monotone.
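The resolvent and the Yosida approximation can be computed in closed form for simple operators. The sketch below uses, purely as an illustration not taken from the paper, the skew operator $A(x_1,x_2)=(x_2,-x_1)$ on $\mathbb{R}^2$, which is maximally monotone but not a subdifferential. It verifies the defining relation $J_{\lambda A}z+\lambda A(J_{\lambda A}z)=z$, the identity $A_{\lambda}z=A(J_{\lambda A}z)$, and samples the $\lambda$-cocoercivity inequality for $A_{\lambda}$. All function names are hypothetical.

```python
def skew(z):
    """A(x1, x2) = (x2, -x1): monotone, linear, and not a gradient."""
    return (z[1], -z[0])


def resolvent_skew(z, lam):
    """J_{lam A} z = (Id + lam A)^{-1} z, from solving the 2x2 linear system."""
    d = 1.0 + lam * lam
    return ((z[0] - lam * z[1]) / d, (lam * z[0] + z[1]) / d)


def yosida_skew(z, lam):
    """A_lam z = (z - J_{lam A} z) / lam."""
    j = resolvent_skew(z, lam)
    return ((z[0] - j[0]) / lam, (z[1] - j[1]) / lam)


def cocoercivity_gap_2d(op, modulus, points):
    """Smallest value of <op x - op y, x - y> - modulus*|op x - op y|^2 on a grid."""
    worst = float("inf")
    for x in points:
        for y in points:
            ox, oy = op(x), op(y)
            d0, d1 = ox[0] - oy[0], ox[1] - oy[1]
            inner = d0 * (x[0] - y[0]) + d1 * (x[1] - y[1])
            worst = min(worst, inner - modulus * (d0 * d0 + d1 * d1))
    return worst


lam = 0.5
z = (1.0, 2.0)
j = resolvent_skew(z, lam)
a = skew(j)
# Residual of the resolvent equation j + lam * A(j) = z (should be ~0).
print(j[0] + lam * a[0] - z[0], j[1] + lam * a[1] - z[1])
grid = [(i / 2.0, k / 2.0) for i in range(-4, 5) for k in range(-4, 5)]
print(cocoercivity_gap_2d(lambda w: yosida_skew(w, lam), lam, grid))
```

For this particular operator the cocoercivity inequality holds with equality, so the printed gap is zero up to rounding.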
The following concepts and results show the strong interplay between the theory of monotone operators and convex analysis.
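Let $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ be a proper, convex and lower semicontinuous function. We denote the infimum of f over $\mathcal{H}$ by $\inf_{\mathcal{H}}f$ and the set of global minimizers of f by $\operatorname{argmin}_{\mathcal{H}}f$. The subdifferential of f is the operator $\partial f:\mathcal{H}\to 2^{\mathcal{H}}$ defined, for every $x\in\mathcal{H}$, by
$$\partial f(x)=\{u\in\mathcal{H} : f(y)\geq f(x)+\langle u,y-x\rangle \ \forall y\in\mathcal{H}\}.$$
The subdifferential operator of f is maximally monotone, and $x\in\mathcal{H}$ is a global minimizer of f if and only if $0\in\partial f(x)$.
Let $\lambda>0$. The proximal operator of f of index $\lambda$ is the operator $\operatorname{prox}_{\lambda f}:\mathcal{H}\to\mathcal{H}$ defined, for every $x\in\mathcal{H}$, by
$$\operatorname{prox}_{\lambda f}(x)=J_{\lambda\partial f}(x)=\operatorname*{argmin}_{y\in\mathcal{H}}\Big\{f(y)+\frac{1}{2\lambda}\|x-y\|^2\Big\},$$
which also means that $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive. The Moreau envelope of f of index $\lambda$ is the function $f_{\lambda}:\mathcal{H}\to\mathbb{R}$ given, for every $x\in\mathcal{H}$, by
$$f_{\lambda}(x)=\min_{y\in\mathcal{H}}\Big\{f(y)+\frac{1}{2\lambda}\|x-y\|^2\Big\}.$$
The function $f_{\lambda}$ is Fréchet differentiable and
$$\nabla f_{\lambda}=\frac{1}{\lambda}\big(\mathrm{Id}-\operatorname{prox}_{\lambda f}\big)=(\partial f)_{\lambda}.$$
Finally, if $f:\mathcal{H}\to\mathbb{R}$ has full domain and is Fréchet differentiable with $\frac{1}{\beta}$-Lipschitz continuous gradient, for $\beta>0$, then, according to Baillon–Haddad’s Theorem, $\nabla f$ is $\beta$-cocoercive.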
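As a quick illustration of the proximal operator and the Moreau envelope, the sketch below takes $f=|\cdot|$ on the real line (an example chosen here, not one from the paper). In this case the proximal operator is soft-thresholding and the envelope is the Huber function. The code compares a central finite difference of the envelope with the formula $\nabla f_{\lambda}=\frac{1}{\lambda}(\mathrm{Id}-\operatorname{prox}_{\lambda f})$; the function names are hypothetical.

```python
import math


def prox_abs(x, lam):
    """prox_{lam f}(x) for f = |.|: shrink x towards 0 by lam."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def moreau_env_abs(x, lam):
    """f_lam(x) = f(p) + (x - p)^2 / (2 lam) evaluated at p = prox_{lam f}(x)."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)


def grad_moreau_env_abs(x, lam):
    """Gradient of the envelope via (x - prox_{lam f}(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam


lam = 2.0
h = 1e-6
for x in (3.0, 0.5, -4.0):
    finite_diff = (moreau_env_abs(x + h, lam) - moreau_env_abs(x - h, lam)) / (2.0 * h)
    # The two numbers on each line should agree up to discretization error.
    print(x, finite_diff, grad_moreau_env_abs(x, lam))
```

The envelope is differentiable everywhere even though $f$ has a kink at the origin, which is what makes it usable inside a second order dynamic.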
A Brief History of Inertial Systems Attached to Optimization Problems and Monotone Inclusions
In recent years there have been many advances in the study of continuous time inertial systems with vanishing damping attached to monotone inclusion problems. We briefly visit them in the following paragraphs.
The Heavy Ball Method with Friction
Consider a convex and continuously differentiable function $f:\mathcal{H}\to\mathbb{R}$ with at least one minimizer. The heavy ball with friction system (HBF)
$$\ddot{x}(t)+\gamma\dot{x}(t)+\nabla f(x(t))=0 \tag{3}$$
was introduced by Álvarez in [2] as a suitable continuous time scheme to approach the minimization of the function f. This system can be seen as the equation of the horizontal position x(t) of an object that moves, under the force of gravity, along the graph of the function f, subject to a kinetic friction represented by the term $\gamma\dot{x}(t)$ (a nice derivation can be seen in the work done by Attouch–Goudou–Redont in [8]). It is known that, if x is a solution of (HBF), then x converges weakly to a minimizer of f and $f(x(t))\to\min_{\mathcal{H}}f$ as $t\to+\infty$.
In recent times, the question was raised whether the damping coefficient $\gamma$ could be chosen to be time-dependent. An important contribution was made by Su–Boyd–Candès (in [20]) who studied the case of an Asymptotic Vanishing Damping coefficient $\gamma(t)=\frac{\alpha}{t}$, namely,
$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla f(x(t))=0, \tag{4}$$
and proved, when $\alpha\geq 3$, the rate of convergence $f(x(t))-\min_{\mathcal{H}}f=O\big(\frac{1}{t^2}\big)$ for the functional values as $t\to+\infty$. This second order system can be seen as a continuous counterpart to Nesterov’s accelerated gradient method from [19]. Weak convergence of the trajectories generated by (4) when $\alpha>3$ has been shown by Attouch–Chbani–Peypouquet–Redont [6] and May [18], with the improved rate of convergence $f(x(t))-\min_{\mathcal{H}}f=o\big(\frac{1}{t^2}\big)$ for the functional values as $t\to+\infty$. For $\alpha=3$, the convergence of the trajectories remains an open question, except for the one dimensional case (see [7]). In the subcritical case $\alpha\leq 3$, it has been shown by Apidopoulos–Aujol–Dossal [5] and Attouch–Chbani–Riahi [7] that the objective values converge at a rate $O\big(t^{-\frac{2\alpha}{3}}\big)$ as $t\to+\infty$.
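The $O(1/t^2)$ rate for the vanishing damping system $\ddot{x}+\frac{\alpha}{t}\dot{x}+\nabla f(x)=0$ can be observed numerically. The following sketch integrates this equation with a hand-written classical Runge–Kutta scheme for the illustrative choices $f(x)=\frac{1}{2}x^2$, $\alpha=3$, $t_0=1$, $x(t_0)=1$, $\dot{x}(t_0)=0$; none of these values, nor the function names, come from the paper. For these data the standard energy $t^2(f(x)-\min f)+2|x+\frac{t}{2}\dot{x}-x^*|^2$ is nonincreasing and equals $2.5$ at $t_0$, so $t^2 f(x(t))$ should stay below that value.

```python
def simulate_avd(alpha, grad, x0, v0, t0, t_end, h):
    """Integrate x'' + (alpha/t) x' + grad(x) = 0 (scalar x) with classical RK4.

    Returns a list of (t, x, v) samples, one per step.
    """
    def rhs(t, x, v):
        return v, -(alpha / t) * v - grad(x)

    t, x, v = t0, x0, v0
    path = [(t, x, v)]
    for _ in range(int(round((t_end - t0) / h))):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
        path.append((t, x, v))
    return path


# Illustrative data: f(x) = 0.5 * x^2, so grad f(x) = x and min f = 0.
path = simulate_avd(alpha=3.0, grad=lambda x: x, x0=1.0, v0=0.0,
                    t0=1.0, t_end=50.0, h=0.005)
scaled_values = [t * t * 0.5 * x * x for t, x, _ in path]
# Largest value of t^2 * (f(x(t)) - min f) along the trajectory, and its final value.
print(max(scaled_values), scaled_values[-1])
```

Boundedness of the first printed number reflects the $O(1/t^2)$ rate; for this strongly convex example the scaled values in fact decay further.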
Heavy Ball Dynamics and Cocoercive Operators
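If $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ is a proper, convex and lower semicontinuous function which is not necessarily differentiable, then we cannot make direct use of (3). However, since for $\lambda>0$ we have $\operatorname{argmin}_{\mathcal{H}}f=\operatorname{argmin}_{\mathcal{H}}f_{\lambda}$, we can replace f by its Moreau envelope $f_{\lambda}$, and the system now becomes
$$\ddot{x}(t)+\gamma\dot{x}(t)+\nabla f_{\lambda}(x(t))=0.$$
In line with this idea, and in analogy with (3), Álvarez and Attouch [3] and Attouch and Maingé [11] studied the dynamics
$$\ddot{x}(t)+\gamma\dot{x}(t)+B(x(t))=0, \tag{5}$$
where $B:\mathcal{H}\to\mathcal{H}$ is a $\beta$-cocoercive operator. They were able to prove that the solutions of this system weakly converge to elements of $\operatorname{zer}B$ provided that the cocoercivity parameter $\beta$ and the damping coefficient $\gamma$ satisfy $\beta\gamma^2>1$. For a maximally monotone operator $A:\mathcal{H}\to 2^{\mathcal{H}}$, we know that its Moreau envelope $A_{\lambda}$ is $\lambda$-cocoercive and thus, under the condition $\lambda\gamma^2>1$, the trajectories of
$$\ddot{x}(t)+\gamma\dot{x}(t)+A_{\lambda}(x(t))=0$$
converge weakly to elements of $\operatorname{zer}A$.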
Also related to (5), Boţ-Csetnek [16] considered the system
| 6 |
where is again -cocoercive. Under the assumption that and are locally absolutely continuous, for almost every and , the authors were able to prove that the solutions to this system converge weakly to zeros of B.
In [12], Attouch and Peypouquet addressed the system
| 7 |
where and the time-dependent regularizing parameter satisfies for every . As well as ensuring the weak convergence of the trajectories towards elements of , choosing the regularizing parameter in such a fashion allowed the authors to obtain fast convergence of the velocities and accelerations towards zero.
Inertial Dynamics with Hessian Damping
Let us return briefly to the system (4). In addition to the viscous vanishing damping term , the following system with Hessian-driven damping was considered by Attouch-Peypouquet-Redont in [13]
where . While preserving the fast convergence properties of the Nesterov accelerated method, the Hessian-driven damping term reduces the oscillatory aspect of the trajectories. In [9], Attouch and László studied a version of (7) with an added Hessian-driven damping term:
While preserving the convergence results of (7), the main benefit of the introduction of this damping term is the fast convergence rates that can be obtained for and as . The regularizing parameter is again chosen to be time-dependent; in the general case, the authors take , and in [12] it is shown that taking this way is critical. However, in the case where for a proper, convex and lower semicontinuous function f, it is also allowed to take with .
Layout of the Paper
In Sect. 2, we give the proof for the existence and uniqueness of strong global solutions to (Split-DIN-AVD) by means of a Cauchy–Lipschitz–Picard argument. In Sect. 3 we state the main theorem of this work, and we show the weak convergence of the solutions of (2) to elements of $\operatorname{zer}(A+B)$, as well as the fast convergence of the velocities and accelerations to zero. We also provide convergence rates for $T_{\lambda(t),\gamma(t)}(x(t))$ and $\frac{d}{dt}T_{\lambda(t),\gamma(t)}(x(t))$ as $t\to+\infty$. We explore the particular cases $B=0$ and $A=0$, and show improvements with respect to previous works. In Sect. 4, we address the convex minimization case, namely, when $A=\partial f$ and $B=\nabla g$, where $f:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ is a proper, convex and lower semicontinuous function and $g:\mathcal{H}\to\mathbb{R}$ is a convex and Fréchet differentiable function with Lipschitz continuous gradient, and derive, in addition, a fast convergence rate for the function values. In Sect. 5, we illustrate the theoretical results by numerical experiments. In Sect. 6, we provide an algorithm that arises from a time discretization of (Split-DIN-AVD) and discuss its convergence properties.
Existence and Uniqueness of Trajectories
In this section, we show the existence and uniqueness of strong global solutions to (Split-DIN-AVD). For the sake of clarity, first we state the definition of a strong global solution.
Definition 2.1
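We say that $x:[t_0,+\infty)\to\mathcal{H}$ is a strong global solution of (Split-DIN-AVD) with Cauchy data $(x_0,u_0)\in\mathcal{H}\times\mathcal{H}$ if
- (i) $x,\dot{x}:[t_0,+\infty)\to\mathcal{H}$ are locally absolutely continuous;
- (ii) $x$ satisfies (2) for almost every $t\geq t_0$;
- (iii) $x(t_0)=x_0$, $\dot{x}(t_0)=u_0$.
A classic global solution is just a strong global solution which is $C^2$. Sometimes we will mention the terms strong global solution or classic global solution without explicit mention of the Cauchy data.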
The following lemma will be used to prove the existence of strong global solutions of our system, and we will need it in the proof of the main theorem as well.
Lemma 2.2
Let be a maximally monotone operator and a -cocoercive operator for some . Then, the following statements hold:
-
(i)
For and , is a -cocoercive operator. In particular, this also implies that is -cocoercive.
-
(ii)Choose , and . Then, for it holds
-
(iii)If x is a classic global solution to (2) and , then, for every , we have
Proof
- (i)
-
(ii)We have
Now, notice that
so using (i) and the fact that , we obtain
| 8 |
Altogether, plugging (8) into our initial inequality yields
To show the second inequality, we use the previous one. We have
where the last line is a consequence of being -cocoercive, and hence -Lipschitz continuous (see (i)).
(iii) For set
and use (ii) to obtain, for every ,
Hence, by taking the limit as we get, for any ,
The next theorem concerns the existence and uniqueness of strong global solutions to (Split-DIN-AVD).
Theorem 2.3
Assume that are Lebesgue measurable functions and that . Then, for any there exists a unique strong global solution of the system (2) that satisfies and .
Proof
We will rely on [17, Proposition 6.2.1] and distinguish between the cases and . For each case, we will check that the conditions of the aforementioned proposition are fulfilled. We will be working in the real Hilbert space endowed with the norm . Let be fixed.
The Case . First, it can be easily checked (see also [4, 9, 13]) that for all the following dynamical systems are equivalent
.
In other words, (2) with Cauchy data is equivalent to the first order system
where , F is given, for every , by
and the Cauchy data is , .
-
(i)Let be fixed. We need to verify the Lipschitz continuity of F on the z variable. Set , . We have
Set . According to Lemma 2.2(i), the term involving the operator satisfies
It follows that, if we take
then we have and -
(ii)Now, we claim that F fulfills a boundedness condition. For and we have
By Lemma 2.2(i), we have, for every ,
Hence, if we take
then we have and
We have checked that the conditions of [17, Proposition 6.2.1] hold. Therefore, there exists a unique locally absolutely continuous solution of (2) that satisfies and .
The Case . Now, (2) is easily seen to be equivalent to
where and F is given, for every , by
Showing that F fulfills the required properties is straightforward.
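The reduction to a first order system also gives a practical way to simulate the dynamics. The sketch below is one possible reformulation, derived here and not necessarily the change of variables used in the proof: assuming the system reads $\ddot{x}+\frac{\alpha}{t}\dot{x}+\xi\frac{d}{dt}T(t,x)+T(t,x)=0$ with $T(t,x)=T_{\lambda(t),\gamma(t)}(x)$, setting $y=\dot{x}+\xi T(t,x)$ yields $\dot{x}=y-\xi T(t,x)$ and $\dot{y}=-\frac{\alpha}{t}y+\big(\frac{\alpha\xi}{t}-1\big)T(t,x)$, which involves no derivative of $T$. The data $A=\partial|\cdot|$, $B(x)=x-2$, $\lambda(t)=t^2$, $\gamma\equiv 1$, $\alpha=3$, $\xi=0.25$ and all function names are hypothetical illustrative choices; the zero of $A+B$ is $x^*=1$.

```python
def soft_threshold(v, tau):
    """Resolvent of tau * d|.| on the real line."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def T(t, x, lam0=1.0, gamma=1.0):
    """T_{lam(t),gamma}(x) with lam(t) = lam0 * t^2, A = d|.|, B(x) = x - 2."""
    forward = x - gamma * (x - 2.0)
    return (x - soft_threshold(forward, gamma)) / (lam0 * t * t)


def rhs(t, x, y, alpha, xi):
    """Right hand side of the first order system in the variables (x, y)."""
    Tx = T(t, x)
    return y - xi * Tx, -(alpha / t) * y + (alpha * xi / t - 1.0) * Tx


def simulate(x0, v0, t0=1.0, t_end=200.0, h=0.01, alpha=3.0, xi=0.25):
    """Classical RK4 on the first order system; returns (x(t_end), x'(t_end))."""
    t, x = t0, x0
    y = v0 + xi * T(t0, x0)  # y = x' + xi * T(t, x)
    for _ in range(int(round((t_end - t0) / h))):
        k1x, k1y = rhs(t, x, y, alpha, xi)
        k2x, k2y = rhs(t + h / 2, x + h / 2 * k1x, y + h / 2 * k1y, alpha, xi)
        k3x, k3y = rhs(t + h / 2, x + h / 2 * k2x, y + h / 2 * k2y, alpha, xi)
        k4x, k4y = rhs(t + h, x + h * k3x, y + h * k3y, alpha, xi)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        t += h
    return x, y - xi * T(t, x)


x_end, v_end = simulate(x0=3.0, v0=0.0)
# Position should approach the zero x* = 1 and the velocity should be small.
print(x_end, v_end)
```

The trajectory approaches $x^*$ slowly in this example, since the operator is scaled by $1/t^2$; the velocity decays considerably faster than the distance to the solution.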
The Convergence Properties of the Trajectories
In this section, we will study the asymptotic behaviour of the trajectories of the system
where
We will show weak convergence of the trajectories generated by (2) to elements of , as well as the fast convergence of the velocities and accelerations to zero. Additionally, we will provide convergence rates for and as . To avoid repetition of the statement “for almost every t”, in the following theorem we will assume we are working with a classic global solution of our system.
Theorem 3.1
Let be a maximally monotone operator and a -cocoercive operator for some such that . Assume that , , for and all , and that is a differentiable function that satisfies as . Then, for a solution to (Split-DIN-AVD), the following statements hold:
-
(i)
x is bounded.
-
(ii)We have the estimates
-
(iii)We have the convergence rates
as . -
(iv)
If , then x(t) converges weakly to an element of as .
Proof
Integral Estimates and Rates. To develop the analysis, we will fix and make use of the Lyapunov function given by
| 9 |
Differentiation of with respect to time yields, for every ,
After reduction and employing (2), we get, for every ,
Now, by Lemma 2.2(i), we know that is -cocoercive for every . Using this on the first summand of the right hand side of the previous inequality yields, for ,
| 10 |
Now, since , we can choose such that
| 11 |
From (10) we get, for every ,
| 12 |
By (11) and the definition of , we know that , and
so we can find such that for every the previous expression becomes nonpositive. According to Lemma A.2, the right hand side of (12) is nonpositive whenever
This quantity can be rewritten as
Since , we have . Hence,
This means we can find such that for every we have , that is, for every we have
| 13 |
Now, integrating (13) from to t we obtain
| 14 |
From (13) and the form of we immediately obtain
| 15 |
| 16 |
| 17 |
| 18 |
From Lemma 2.2(i), we know that for every the operator is -Lipschitz continuous, which gives, for every ,
Thus, from (15) and recalling that we arrive at
| 19 |
By combining (15), (18) and (19) we obtain and therefore
| 20 |
From Lemma 2.2, (15), (20) and the fact that B is -Lipschitz continuous we deduce that, as ,
| 21 |
On the other hand, for every we have
| 22 |
so by combining (19), (21), (22) and the fact that we arrive at
which yields
| 23 |
Let us now improve (19) and show that
| 24 |
According to (19) and (21) there exists a constant such that for every it holds
By (17), the right hand side belongs to , so we get
hence the limit
exists. Obviously, this implies the existence of . By using (17) again we come to
and so we must have , which gives
| 25 |
By combining (2), (19), (20) and (23) we obtain, as ,
Moreover, by using the well-known inequality for every , for every it holds
From (16), (23) and (17) it follows
| 26 |
To see that as , we write, for every ,
From (16) and (26) we deduce that the left hand side belongs to , from which we infer that the limit exists. Using (16) again, we get
from which we finally deduce , therefore
| 27 |
Notice that we can write for every
Hence, multiplying both sides of (25) by and remembering the definition of we obtain
| 28 |
For every , we have
Therefore, by using (23) and (28), and recalling that , we obtain
The fact that as comes from (2), (27), (23) and (24).
Weak Convergence of the Trajectories. Let . We will work with the energy function given by
For every , we have
| 29 |
Combining (2) and (29) gives us, for every ,
By using the -cocoercivity of on the left hand side, Cauchy–Schwarz on the right hand side and multiplying both sides by t, the previous inequality entails, for every ,
Now, putting together results in
Now apply Lemma A.1 with for every to deduce that the limit
exists, which fulfills the first condition of Opial’s Lemma A.3.
Let us now move on to the second condition. Suppose is a weak sequential cluster point of , that is, there exists a sequence such that and converges weakly to as . Define
According to (25), we have as . Now, since for all for some , we can extract a subsequence such that as . We may assume without loss of generality then that as . We now have for every
Now, since every weakly convergent sequence is bounded and the operators B and are Lipschitz-continuous we deduce that the right-hand side of the previous inequality approaches zero as , therefore getting
as . Now, from the proof of part (i) of Lemma 2.2, we know that is -cocoercive, thus monotone and Lipschitz continuous and therefore maximally monotone. Summarizing, we have
is maximally monotone and thus its graph is closed in the weak $\times$ strong topology of (see [14, Proposition 20.38(ii)]),
converges weakly to and as ,
which allows us to conclude that , and gives finally . Now we just invoke Opial’s Lemma to achieve that x(t) converges weakly to as for some .
In the following subsections, we explore the particular cases and , and we will show improvements with respect to previous results from the literature addressing continuous time approaches to monotone inclusions.
The Case
If we let in the (Split-DIN-AVD) system (2), then, attached to the monotone inclusion problem
we obtain the dynamics
| 30 |
where
We can state the following theorem.
Theorem 3.2
Let be a maximally monotone operator such that . Assume that , , for and all , and that is a differentiable function that satisfies as . Then, for a solution to (30), the following statements hold:
-
(i)
x is bounded.
-
(ii)We have the estimates
-
(iii)We have the convergence rates
as . -
(iv)
If , then x(t) converges weakly to an element of as .
Proof
The proof proceeds in the exact same way as the proof of Theorem 3.1. However, a few comments are in order: first of all, now we have . Since is firmly nonexpansive, by [14, Proposition 4.4] so is . In other words, is 1-cocoercive, therefore is -cocoercive, so now the condition on becomes .
The proof also changes when we verify the second condition of Opial’s Lemma, to get weak convergence of the trajectories . This is in order to allow for not to be necessarily bounded. We do need, however, the assumption . Indeed, from as , we obtain
as . Using the definition of the resolvent, we come to
for all . If is such that and converges weakly to as , then the previous inclusion, together with the assumption on gives
and by the closedness of the graph of A in the weak $\times$ strong topology of , we deduce that .
Remark 3.3
The hypotheses required for are fulfilled at least by two families of functions. First, take and set . Then, we have
and
If is a polynomial of degree n for some , the conditions are also fulfilled. Assume for all , for some for and . Then, we have
so as . Since we also have as , the condition is fulfilled for large enough .
In particular, we can choose , which fulfills for any and any . Since for , this choice of allows us to recover the (DIN-AVD) system studied by Attouch and László in [9]. Notice the way the convergence rates for and exhibited in part (iii) of Theorem 3.2 depend on . If we set for every for any natural number , (Split-DIN-AVD) performs from this point of view better than (DIN-AVD) without increasing the complexity of the governing operator.
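The growth condition discussed in this remark can be sanity-checked numerically. Assuming the hypothesis on $\gamma$ is of the form $|\dot{\gamma}(t)|/\gamma(t)=O(1/t)$, it is enough that $t\,|\dot{\gamma}(t)|/\gamma(t)$ stays bounded, and for a polynomial of degree $n$ with positive leading coefficient this quantity tends to $n$. The sketch below evaluates it for hypothetical coefficient lists; the helper names are not from the paper.

```python
def poly_eval(coeffs, t):
    """Horner evaluation; coeffs[k] is the coefficient of t**k."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def poly_derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def scaled_log_derivative(coeffs, t):
    """t * |gamma'(t)| / gamma(t); boundedness in t gives |gamma'|/gamma = O(1/t)."""
    return t * abs(poly_eval(poly_derivative(coeffs), t)) / poly_eval(coeffs, t)


# Hypothetical example: gamma(t) = 1 + 2 t^2 + 0.5 t^3, a polynomial of degree 3.
cubic = [1.0, 0.0, 2.0, 0.5]
for t in (1.0, 10.0, 1e3, 1e6):
    # The printed ratio should approach the degree of the polynomial.
    print(t, scaled_log_derivative(cubic, t))
```

For a monomial $\gamma(t)=t^n$ the ratio equals $n$ for every $t$, which is the simplest instance of the condition.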
The Case
Let us return to (Split-DIN-AVD) dynamics (2). Set , and for every take and with . Then, associated to the problem
we obtain the system
| 31 |
The conditions and imply
With the previous observation, we are able to state the following theorem.
Theorem 3.4
Let be a -cocoercive operator for some such that . Assume that , and for and all . Take a solution to (31). Then, the following hold:
-
(i)
x is bounded, and x(t) converges weakly to an element of as .
-
(ii)We have the estimates
-
(iii)We have the convergence rates
as well as the limit
as .
Proof
Since , we can find such that , equivalently, . Since (31) is equivalent to (Split-DIN-AVD) with and parameters and , the conclusion follows from Theorem 3.1.
Remark 3.5
As we mentioned in the introduction, the dynamical system (31) provides a way of finding the zeros of a cocoercive operator directly through forward evaluations, instead of having to resort to its Moreau envelope when following the approach in [9].
- The dynamics (31) bear some resemblance to the system (6) (see also [16]) with and , with an additional Hessian-driven damping term. In our case, since , the parameters satisfy
However, we have
so one of the hypotheses which is needed in (6) is not fulfilled, which shows that one cannot address the dynamical system (31) as a particular case of it; indeed, for (6) a vanishing damping is not allowed. With our system, we obtain convergence rates for and as , which are not obtained in [16].
Structured Convex Minimization
We can specialize the previous results to the case of convex minimization, and show additionally the convergence of functional values along the generated trajectories to the optimal objective value at a rate that will depend on the choice of . Let be a proper, convex and lower semicontinuous function, and let be a convex and Fréchet differentiable function with -Lipschitz continuous gradient. Assume that , and consider the minimization problem
| 32 |
Fermat’s rule tells us that is a global minimum of if and only if
Therefore, solving (32) is equivalent to solving the monotone inclusion addressed in the first section, with and . Moreover, recall that if is -Lipschitz then it is -cocoercive (Baillon–Haddad’s Theorem, see [14, Corollary 18.17]). Therefore, associated to the problem (32) we have the dynamics
| 33 |
where we have denoted for all for convenience.
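Fermat’s rule and the Baillon–Haddad property used above can be illustrated on a one-dimensional toy problem. The sketch assumes the standard fixed-point characterization: $x$ minimizes $f+g$ if and only if $x=\operatorname{prox}_{\gamma f}(x-\gamma\nabla g(x))$ for $\gamma>0$. The data are hypothetical: $f$ is the indicator function of the interval $[-1,2]$ (so its proximal operator is the projection onto the interval) and $g(x)=\log(1+e^x)$, whose gradient, the sigmoid, is $\frac{1}{4}$-Lipschitz and hence $4$-cocoercive. Function names are chosen for this example only.

```python
import math


def softplus(x):
    """g(x) = log(1 + exp(x)), convex and smooth."""
    return math.log1p(math.exp(x))


def sigmoid(x):
    """Gradient of softplus; Lipschitz with constant 1/4."""
    return 1.0 / (1.0 + math.exp(-x))


def project_interval(v, lo=-1.0, hi=2.0):
    """Proximal operator of the indicator function of [lo, hi]."""
    return min(max(v, lo), hi)


def fb_residual(x, gamma):
    """x - prox_{gamma f}(x - gamma * grad g(x)); zero exactly at minimizers of f + g."""
    return x - project_interval(x - gamma * sigmoid(x))


def is_cocoercive(op, beta, points, tol=1e-12):
    """Check <op x - op y, x - y> >= beta * |op x - op y|^2 on all pairs of points."""
    return all((op(x) - op(y)) * (x - y) >= beta * (op(x) - op(y)) ** 2 - tol
               for x in points for y in points)


grid = [-1.0 + 3.0 * k / 300 for k in range(301)]  # grid on [-1, 2]
x_best = min(grid, key=softplus)                   # brute-force minimizer on the grid
print(x_best, fb_residual(x_best, gamma=3.0))      # residual vanishes at the minimizer
print(fb_residual(0.5, gamma=3.0))                 # nonzero away from the minimizer
print(is_cocoercive(sigmoid, 4.0, [k / 2.0 for k in range(-20, 21)]))
```

Since $g$ is increasing, the constrained minimizer is the left endpoint of the interval, and the residual detects it for any admissible step size.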
Theorem 4.1
Let be a proper, convex and lower semicontinuous function, and let be a convex and Fréchet differentiable function with a -Lipschitz continuous gradient such that . Assume that , , for and all , and that is a differentiable function that satisfies as . Then, for a solution to (33), the following statements hold:
-
(i)
x is bounded.
-
(ii)We have the estimates
-
(iii)We have the convergence rates
as . -
(iv)
If , then x(t) converges weakly to a minimizer of as .
-
(v)Additionally, if for every and we set , then
as . Moreover, as .
Proof
Parts (i)–(iv) are a direct consequence of Theorem 3.1. For checking (v), first notice that for all we have
| 34 |
Now, let . According to [15, Lemma 2.3], for every , we have the inequality
After summing the norm squared term and using the Cauchy–Schwarz inequality, for every we obtain
which follows as a consequence of x being bounded and as .
Remark 4.2
It is also worth mentioning the system we obtain in the case where , since we also get some improved rates for the objective functional values when we compare (Split-DIN-AVD) to (DIN-AVD) [9]. In this case, we have the system
| 35 |
attached to the convex optimization problem
If we assume , allow to be unbounded from above and otherwise keep the hypotheses of Theorem 4.1, for a solution to (35), the following statements hold:
-
(i)
x is bounded,
-
(ii)We have the estimates
-
(iii)We have the convergence rates
as . -
(iv)
If , then x(t) converges weakly to a minimizer of f as .
-
(v)We also obtain the rate
which entails
as .
Parts (i)–(iv) are a direct consequence of Theorem 3.2 for the case . For showing part (v), first notice that for and we have, according to the definition of and ,
Let be a minimizer of f. We apply the gradient inequality to , from which we obtain, for every
where the last inequality follows from the Cauchy–Schwarz inequality. Since as and x is bounded, the previous inequality entails the first statement of (v). Again recalling the definition of the Moreau envelope of f, this finally gives
as , which implies the last two statements and concludes the proof.
As pointed out in Remark 3.3, we can choose for every and recover the (DIN-AVD) system for nonsmooth convex minimization problems studied in [9]. Moreover, we can also set for a natural number and all . Now, not only are the convergence rates for and as improved with respect to the system in [9], but (Split-DIN-AVD) also provides a better rate for the convergence of to as .
Numerical Experiments
In the following paragraphs we describe some numerical experiments that portray some aspects of the theory.
Minimizing a Smooth and Convex Function
As an example of a continuous time scheme minimizing a convex and Fréchet differentiable function with -Lipschitz continuous gradient via (Split-DIN-AVD), we consider the system
| 36 |
where for we set and therefore . A trajectory generated by (36) is a pair . Figure 1 plots both components of the solution to (36) with initial Cauchy data , . Notice that the Lipschitz constant of is , which means that the cocoercivity modulus of is . To fulfill , we choose , . Figure 1a corresponds to the case with no Hessian damping, that is, . Figure 1b corresponds to a Hessian damping parameter .
Fig. 1. Trajectories of (Split-DIN-AVD) for
Figure 2 depicts the fast convergence of the velocities to zero for the cases (Fig. 2a) and (Fig. 2b). In both figures, notice the effect of the damping parameter , which attenuates the oscillations of the second component of the trajectories, as well as the oscillations present in the velocities.
Fig. 2. Fast convergence of the velocities
Minimizing a Nonsmooth and Convex Function
As an example of a continuous time scheme minimizing a proper, convex and lower semicontinuous function via (Split-DIN-AVD), we consider the system
| 37 |
We will consider three options for f and plot for each of them the trajectories, the objective function values and the gradients of the Moreau envelopes as follows:
In order to fulfill and , we choose the parameters , , and we take and . We compare the results given by (DIN-AVD) (that is, when ) and the ones given by our system (Split-DIN-AVD). The choice of does not seem to change the plots in a significant way for the examples we have chosen.
Fig. 3. Trajectories and objective function values in the case
Fig. 4. Gradients of the Moreau envelopes of f
Figure 3 depicts the trajectories x(t) of (37) and the function values for our choices of f as . Figure 4 portrays the fast convergence to zero of as . Notice the big improvement over (DIN-AVD) for nonsmooth convex minimization in [9] when choosing , a result which we already knew theoretically. Polynomials of high degree seem to be the ones which give the biggest improvements in terms of rates.
An Example with Operator Splitting
Now we consider the monotone inclusion problem (1) for and for every . For every , an easy calculation gives
and so
and
(Split-DIN-AVD) now reads
We choose the parameters , , , and the Cauchy data and . Figure 5a corresponds to the case , and Fig. 5b depicts the trajectory when the Hessian damping parameter is . Again, notice how, not only for optimization problems, but also for monotone inclusions which cannot be reduced to the former, the presence of seems to attenuate the oscillations present in the trajectories.
Fig. 5. Trajectories of (Split-DIN-AVD) for finding the zeros of
A Numerical Algorithm
In the following we will derive via time discretization of (Split-DIN-AVD) a numerical algorithm for solving the monotone inclusion problem (1). We perform a discretization of (Split-DIN-AVD) with stepsize 1 and set, for an integer , , , . We make the approximations
so we get, for every ,
| 38 |
After rearranging the terms of (38), for every we obtain
| 39 |
In other words, after setting and denoting the right hand side of (39) by for every , we obtain the following iterative scheme
| 40 |
Observe that the second step in (40) is always well-defined. Indeed, for , is -cocoercive, hence monotone (see Lemma 2.2(i)). This also implies that is -Lipschitz continuous, and a monotone and continuous operator is maximally monotone, according to [14, Corollary 20.28]. Hence, by Minty’s Theorem (see [14, Theorem 21.1]), we know that is surjective.
We are now in a position to state the main theorem concerning the algorithm above.
Theorem 6.1
Let be a maximally monotone operator and a -cocoercive operator for some such that . Choose any initial points. Let , , and , sequences of positive numbers that fulfill
Now, consider the sequences and generated by algorithm (40). The following properties are satisfied:
-
(i)We have the estimates
-
(ii)
The sequence converges weakly to an element of .
-
(iii)
The sequence converges weakly to an element of . Precisely, we have as .
The proof can be done by transposing the techniques used in the continuous time case to the discrete time case. Algorithm (40) can be seen as a splitting version of the (PRINAM) algorithm studied by Attouch and László in [10].
Remark 6.2
The second step in (40) can be quite complicated to compute. However, if , we can resort to the fact that for . We now have, for ,
which gives
It is now possible to write (40) in terms of the resolvents of A. We have, for every ,
So now (40) becomes
| 41 |
Now, if we assume and and otherwise keep the hypotheses of Theorem 6.1, then for the sequences and generated by (41), the following statements hold:
-
(i)We have the estimates
-
(ii)
The sequence converges weakly to an element of .
-
(iii)
The sequence converges weakly to an element of as well. Precisely, we have as .
Notice that the condition required for is fulfilled in particular for for every and a natural number . Thus, by choosing large n, we obtain a fast convergence rate for as .
A Appendix
The following are three auxiliary lemmas that are used in the proof of Theorem 3.1. The proof for Lemma A.1 can be found in [12], while the proof of Lemma A.2 is straightforward. For the proof of Opial’s Lemma, we refer the reader to [1, Lemma 1.10].
Lemma A.1
Let , and let be a continuously differentiable function which is bounded from below. Given , a nonnegative function and a nonnegative function , let us assume that
for almost every . Then, the positive part of belongs to and exists. Moreover, we have .
Lemma A.2
Let and a real Hilbert space. Then the inequality
holds for every if and only if and .
Lemma A.3
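(Opial’s Lemma) Let $S \subseteq \mathcal{H}$ be a nonempty set and $x : [t_0, +\infty) \to \mathcal{H}$ a given map, where $t_0 > 0$. Assume that
- (i) for every $z \in S$, $\lim_{t \to +\infty} \|x(t) - z\|$ exists;
- (ii) every weak sequential cluster point of the map x belongs to S.
Then, there exists $x_{\infty} \in S$ such that x(t) converges weakly to $x_{\infty}$ as $t \to +\infty$.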
Funding
Open access funding provided by University of Vienna.
Footnotes
Research partially supported by the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO) which is funded by FWF (Austrian Science Fund), project W1260-N35.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Abbas B, Attouch H. Dynamical systems and forward–backward algorithms associated with the sum of a convex subdifferential and a monotone cocoercive operator. Optimization. 2015;64(10):2223–2252. doi: 10.1080/02331934.2014.971412.
- 2. Álvarez F. On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 2000;38(4):1102–1119. doi: 10.1137/S0363012998335802.
- 3. Álvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9(1–2):3–11. doi: 10.1023/A:1011253113155.
- 4. Álvarez F, Attouch H, Bolte J, Redont P. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 2002;81(8):747–779. doi: 10.1016/S0021-7824(01)01253-3.
- 5. Apidopoulos V, Aujol J-F, Dossal C. Convergence rate of inertial forward–backward algorithm beyond Nesterov’s rule. Math. Program. 2020;180:137–156.
- 6. Attouch H, Chbani Z, Peypouquet J, Redont P. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 2018;168:123–175. doi: 10.1007/s10107-016-0992-8.
- 7. Attouch H, Chbani Z, Riahi H. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case $\alpha \le 3$. ESAIM: COCV. 2019;25:Article number 2.
- 8. Attouch H, Goudou X, Redont P. The heavy ball with friction method. The continuous dynamical system, global exploration of the local minima of a real-valued function by asymptotical analysis of a dissipative dynamical system. Commun. Contemp. Math. 2000;2(1):1–34. doi: 10.1142/S0219199700000025.
- 9. Attouch H, László SC. Continuous Newton-like inertial dynamics for monotone inclusions. Set-Valued Var. Anal. 2021;29:555–581. doi: 10.1007/s11228-020-00564-y.
- 10. Attouch H, László SC. Newton-like inertial dynamics and proximal algorithms governed by maximally monotone operators. SIAM J. Optim. 2020;30(4):3252–3283. doi: 10.1137/20M1333316.
- 11. Attouch H, Maingé PE. Asymptotic behavior of second order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calculus Var. 2011;17(3):836–857. doi: 10.1051/cocv/2010024.
- 12. Attouch H, Peypouquet J. Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. 2019;174(1–2):391–432. doi: 10.1007/s10107-018-1252-x.
- 13. Attouch H, Peypouquet J, Redont P. Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 2016;261(10):5734–5783. doi: 10.1016/j.jde.2016.08.020.
- 14. Bauschke HH, Combettes PL. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. 2nd ed. CMS Books in Mathematics. Berlin: Springer; 2017.
- 15. Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009;2(1):183–202.
- 16. Boţ RI, Csetnek ER. Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 2016;54:1423–1443. doi: 10.1137/15M1012657.
- 17. Haraux A. Systèmes Dynamiques Dissipatifs et Applications. Masson; 1991.
- 18. May R. Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 2017;41(3):681–685. doi: 10.3906/mat-1512-28.
- 19. Nesterov Y. A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Dokl. Akad. Nauk SSSR. 1983;269(3):543–547.
- 20. Su WJ, Boyd S, Candès EJ. A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 2014;27:2510–2518.





