ABSTRACT
We investigate a second-order dynamical system with variable damping in connection with the minimization of a nonconvex differentiable function. The dynamical system is formulated in the spirit of the differential equation which models Nesterov's accelerated convex gradient method. We show that the generated trajectory converges to a critical point, if a regularization of the objective function satisfies the Kurdyka- Lojasiewicz property. We also provide convergence rates for the trajectory formulated in terms of the Lojasiewicz exponent.
COMMUNICATED BY: Boris Mordukhovich
KEYWORDS: Second-order dynamical system, nonconvex optimization, Kurdyka–Łojasiewicz inequality, convergence rate
AMS SUBJECT CLASSIFICATIONS: 90C26, 90C30, 65K10
1. Introduction
Consider the (not necessarily convex) optimization problem
| (1) |
where is a Fréchet differentiable function with -Lipschitz continuous gradient, i.e. there exists such that for all . We associate to (1) the second-order dynamical system (for )
| (2) |
where and .
The study of the system (2) is motivated by the recent developments related to the approaching of the solving of convex optimization problems from a continuous perspective.
In [1], Su, Boyd and Candes proposed the following dynamical system
| (3) |
as the continuous counterpart of the Nesterov's accelerated gradient method [see [2]] for minimizing g in the convex case. This research has been deepened by Attouch and his co-authors [see [3,4]], who proved that, if , then the generated trajectory converges to a minimizer of g as , while the convergence rate of the objective function along the trajectory is . The convergence of the trajectory is actually the continuous counterpart of a result due to Chambolle and Dossal [see [5]], which proves the convergence of the iterates of the modified FISTA algorithm [see [6]].
Recently, in [7], investigations have been performed concerning the convergence rate of the objective function along the trajectory in the subcritical case , while some open questions related to the asymptotic properties of the trajectory have been formulated.
In this manuscript, we carry out, in the nonconvex setting, an asymptotic analysis of the dynamical system (2), which can be seen as a perturbation of the dynamical system (3) that models Nesterov's accelerated gradient method in the convex case. To the best of our knowledge, this is the first contribution addressing second-order dynamical systems with variable damping associated to nonconvex optimization problems. We show that the generated trajectory converges to a critical point of g as , provided the following regularization of g,
satisfies the Kurdyka–Łojasiewicz inequality. Moreover, we derive convergence rates in the terms of Łojasiewicz exponent, for the trajectory, its velocity and its acceleration. One of the major future goals is to study the asymptotic properties of the system (2) in case . For other investigations of the asymptotic analysis of second-order dynamical systems with time-dependent damping, we refer to the papers of Haraux and Jendoubi [8] and Balti [9].
For , the convergence of the trajectory generated by (2) to a critical point of g has been shown by Bégout, Bolte and Jendoubi in [10] in the hypothesis that g is of class and it satisfies the Kurdyka–Łojasiewicz property with a desingularizing function satisfying a restrictive condition [see also the papers of Haraux and Jendoubi [11] and Chill and Jendoubi [12]]. On the other hand, the dynamical system (2) is, for , a particular instance of the second-order dynamical system of proximal-gradient type studied in [13].
The following numerical scheme, with starting points ,
| (4) |
where is the step size, can be seen as a discrete counterpart of (2). One can notice that for , this iterative scheme algorithm is similar to Nesterov's accelerated convex gradient method.
In the following, we prove that (2) can be seen in an informal way as the exact limit of (4)). We take to this end in (4) small step sizes and follow the same approach as Su, Boyd and Candes in [1, Section 2]. For this purpose, we rewrite (4) in the form
| (5) |
and introduce the Ansatz for some twice continuously differentiable function . We let and get Then, as the step size s goes to zero, from the Taylor expansion of x, we obtain
and
Furthermore, since
it follows . Consequently, (5) can be written as
or, equivalently
Hence,
After dividing by and letting , we obtain
which, after division by t, gives (2), namely
2. Existence and uniqueness of the trajectory
We consider on the finite-dimensional space the Euclidean topology. If is a local minimizer of g, then . We denote by
the set of critical points of g.
We are considering in the asymptotic analysis of the dynamical system (2) strong global solutions.
Definition 2.1
We say that is a strong global solution of (2), if the following properties are satisfied:
are locally absolutely continuous, in other words, absolutely continuous on each interval for ;
for almost every ;
and .
Recall that a function is absolutely continuous on an interval , if there exists an integrable function such that
It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative by the integration formula above. On the other hand, if (where ) is absolutely continuous and is L-Lipschitz continuous (where ), then the function is absolutely continuous, too. Moreover, is almost everywhere differentiable and the inequality holds for almost every [see [14,15]].
We prove existence and uniqueness of a strong global solution of (2) by making use of the Cauchy–Lipschitz–Picard Theorem for absolutely continues trajectories [see for example [16, Proposition 6.2.1], [17, Theorem 54]]. The key argument is that one can rewrite (2) as a particular first-order dynamical system in a suitably chosen product space [see also [18]].
Theorem 2.1
For every starting points there exists a unique strong global solution of the dynamical system (2).
Proof.
By making use of the notation , the system (2) can be rewritten as a first-order dynamical system:
(6) where
First we show that is -Lipschitz continuous for every and that the Lipschitz constant is a function of time with the property that . Indeed, for every , we have
Obviously, the Lipschitz constant function is continuous, hence integrable, on for all , consequently, .
Next we show that for all . Let be fixed. For , one has
and the conclusion follows by the continuity of on .
The Cauchy–Lipschitz–Picard Theorem guarantees existence and uniqueness of the trajectory of the first-order dynamical system (6) and thus of the second-order dynamical system (2).
The next result shows that the acceleration of the trajectory generated by (2) is also locally absolutely continuous on .
Proposition 2.1
For the starting points let x be the unique strong global solution of (2). Then is locally absolutely continuous on hence the third-order derivative exists almost everywhere on
Proof.
Let T>0 be fixed. According to Theorem 2.1, is absolutely continuous on . We endow the product space with the 1-norm. For arbitrary , we have
where
Let be . Since the functions and are absolutely continuous on , there exists such that for any finite family of intervals , the implication
holds. Consequently,
hence is absolutely continuous on , which shows that is absolutely continuous . This proves that is locally absolutely continuous on , which means that the third-order derivative exists almost everywhere on
The following results provides an estimate for the third-order derivative of the strong global solution of the dynamical system (2) in terms its first- and second-order derivatives.
Lemma 2.1
For the starting points let x be the unique strong global solution of (2). Then, for almost every it holds
(7)
Proof.
Let be such that . We have for almost every h>0 that
Hence,
By taking the limit as , we obtain
Since , we conclude
Remark 2.1
For
we have that
for almost every .
3. Convergence of trajectories
In this section, we study the convergence of the trajectory of the dynamical system (2). We denote by
the set of limit points of the trajectory x.
Before proving a first result in this sense, we recall two technical lemmas which will play an essential role in the asymptotic analysis.
Lemma 3.1
Suppose that is locally absolutely continuous and bounded below and that there exists such that for almost every
Then there exists .
Lemma 3.2
Suppose that is locally absolutely continuous and and that there exists such that for almost every
Then it holds .
Theorem 3.1
Assume that g is bounded from below and, for let x be the unique strong global solution of the dynamical system (2). Then the following statements are true:
there exists such that the limit exists and is finite;
and
Proof.
Choose . By using the -Lipschitz continuity of , for almost every it holds
Taking into account that
and
we obtain for almost every
(8) Since , there exists such that for every it holds
(9) For simplicity, we denote
Let be Since , we have Furthermore, by the -Lipschitz property of , it holds By integrating (8) on , we obtain
(10) Taking into account that g is bounded from bellow, by letting , we obtain
Consequently , hence
On the other hand, (8) and Lemma 3.1 ensure that the limit
(11) exists and is finite.
As for almost every
and , according to Lemma 3.2, it follows that
As for almost every
and (see Remark 2.1 and (i)), according to Lemma 3.2, it follows that
Finally, by using that , (11) becomes
(12) Let Then there exists a sequence such that as By using the continuity of , we have
which shows that
In the following result, we use the distance function to a set, defined for as for all . The following result is a direct consequence of Theorem 3.1.
Lemma 3.3
Assume that g is bounded from below and, for let x be the unique strong global solution of the dynamical system (2). Define
Let be and such that for every the inequalities (9) hold. For every define
Then the following statements are true:
the limit exists and is finite;
H is finite and constant on
If x is bounded, then
is nonempty and compact;
Proof.
(i) According to Theorem 3.1(iii),
hence
(ii) and (iii) are nothing else than (8) and (12), respectively.
(iv) follows directly from (iii).
(v) Since for every , by using (2), we have for almost every
(vi) Since
and (see (i))
by Theorem 3.1(iv) one obtains
Assume that x is bounded.
(vii) Since we obtain that u and v are bounded, too. Thus, the set of limit points is nonempty. Furthermore, since and is bounded, it is enough to show that is closed.
Let be and assume that We show that Obviously, for every , there exists a sequence , such that
Let be . Since there exists such that for every it holds
Let be fixed. Since there exists such that for every it holds
Let be such that . Obviously as and for every
Hence,
thus
(viii) follows from the definition of the set .
Remark 3.1
Combining (iii) and (iv) in Lemma 3.3, it follows that for every and such that as we have
The convergence of the trajectory generated by the dynamical system (2) will be shown in the framework of functions satisfying the Kurdyka–Łojasiewicz property. For , we denote by the class of concave and continuous functions such that , ϕ is continuously differentiable on , continuous at 0 and for all .
Definition 3.1 Kurdyka–Łojasiewicz property —
Let be a differentiable function. We say that f satisfies the Kurdyka–Łojasiewicz (KL) property at if there exist , a neighbourhood U of and a function such that for all x in the intersection
the following inequality holds
If f satisfies the KL property at each point in , then f is called a KL function.
The origins of this notion go back to the pioneering work of Łojasiewicz [19], where it is proved that for a real-analytic function and a critical point (that is ), there exists such that the function is bounded around . This corresponds to the situation when . The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [20] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [21–24].
To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [21–27] and the references therein for more details regarding all the classes mentioned above and illustrating examples.
An important role in our convergence analysis will be played by the following uniformized KL property given in [27, Lemma 6].
Lemma 3.4
Let be a compact set and let be a differentiable function. Assume that f is constant on Ω and f satisfies the KL property at each point of Ω. Then there exist and such that for all and for all x in the intersection
(13) the following inequality holds
(14)
The following theorem is the main result of the paper and concerns the global asymptotic convergence of the trajectory generated by (2).
Theorem 3.2
Assume that g is bounded from below and, for let x be the unique strong global solution of (2). Suppose that
is a KL function and x is bounded. Then the following statements are true:
there exists such that
Proof.
Let be and such that for every the inequalities (9) hold. Furthermore, we will use the notations made in Lemma 3.3, according to which we can choose . It holds
Case I. There exists such that From Lemma 3.3(ii), we have
hence
On the other hand,
hence
Hence, , which leads to
Since A<0 and for every , the latter inequality can hold only if
Consequently, and x is constant on
Case II. We assume that holds for every . Let According to Lemma 3.3, Ω is nonempty and compact and H is constant on Ω. Since H is a KL function, according to Lemma 3.4, there exist and such that for every in the intersection
one has
Since , there exists such that for every . Since , there exists such that for every . Hence, for every , we have
According to Lemma 3.3 (ii) and (v), we have and hence
for almost every
By integrating on the interval , for , we obtain
Since ϕ is bounded from below, and for every and was arbitrarily chosen, we obtain that
which leads to
and further to
Since p is bounded from above on and
we obtain that
Finally, since , the limit exists and it is finite. In conclusion, there exists such that
Remark 3.2
According to Remark 2.1, there exists N>0 such that for almost every hence, under the hypotheses of Theorem 3.2, one has
Remark 3.3
Since the class of semi-algebraic functions is closed under addition [see, for example, [27]] and is semi-algebraic, the conclusion of the previous theorem holds, if, instead of asking that H is a KL function, we ask that g is semi-algebraic.
Remark 3.4
Assume that g is coercive, that is
For , let be the unique global solution of (2). Then x is bounded.
Indeed, notice that g is bounded from below, being a continuous and coercive function [see [28]]. From (10), it follows that is contained for every in a lower level set of g, which is bounded. According to Theorem 3.1, , which implies that x is bounded.
4. Convergence rates
In this section, we will assume that the regularized function H satisfies the Lojasiewicz property, which, as noted in the previous section, corresponds to a particular choice of the desingularizing function ϕ [see [19,22,25]].
Definition 4.1
Let be a differentiable function. The function f is said to fulfil the Łojasiewicz property, if for every there exist and such that
The number θ is called the Łojasiewicz exponent of f at the critical point
In the following theorem, we provide convergence rates for the trajectory generated by (2), its velocity and acceleration in terms of the Łojasiewicz exponent of H [see, also, [22,25]].
Theorem 4.1
Assume that g is bounded from below and, for let x be the unique strong global solution of (2). Suppose that x is bounded, let be such that and suppose that
fulfils the Łojasiewicz property at with Łojasiewicz exponent θ. Let be (see Remark 2.1)
Then, there exist and T>0 such that for almost every , the following statements are true:
if then x converges in finite time;
if then and
if then and
Proof.
Let be and such that for every the inequalities (9) hold. We define for every
Let be fixed. For , we have
By taking the limit as , we obtain
(15) Furthermore, for T>t, we have
By taking the limit as , we obtain
(16) According to Remark 2.1, it holds for almost every ,
By taking the limit as , we obtain
(17) Next, we show that there exists m<0 such that
(18) Indeed,
where
It is an easy verification that
As we have seen in the proof of Theorem 3.2, if there exists such that , then x is constant on and the conclusion follows.
On the other hand, if for every one has that , then according to the proof of Theorem 3.2, there exist and such that
and
Busing (18), we obtain that
where
For , we integrate the last relation on the interval , where , which yields
By taking the limits as , we get
On the other hand, according to the KL property for H and Lemma 3.3 (v), we have
hence
By denoting , one can easily see that a>1 and so
Taking into account that , the previous inequality is nothing else than
(19) where
If , then (19) becomes By multiplying with and integrating on , it follows that there exists such that
where Using (15)–(17), we obtain
which proves (b).
Assume now that . In this case, (19) leads to
By integrating on we obtain
where Then there exists such that for every , thus, x is constant on and (a) follows.
Assume now that In this case, (19) leads to
By integrating on we obtain
where or, equivalently,
Using again (15)–(17), we obtain
which proves (c).
Funding Statement
Radu Ioan Bo's research was partially supported by FWF (Austrian Science Fund) [project I 2419-N32] ; Ernö Robert Csetnek's research was supported by FWF (Austrian Science Fund) [project P 29809-N32] and Szilárd Csaba László's research was supported by a grant of Ministry of Research and Innovation - Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii (CNCS-UEFISCDI) [project number PN-III-P1-1.1-TE-2016-0266] within PNCDI III.
Acknowledgements
The authors are thankful to an anonymous reviewer for comments and remarks which were helpful to gain a better insight into the asymptotic behaviour of the trajectories of the studied dynamical system.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Radu Ioan Bo http://orcid.org/0000-0002-4469-314X
References
- [1].Su W, Boyd S, Candes EJ.. A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. J Mach Learn Res. 2016;17:1–43. [Google Scholar]
- [2].Nesterov YE. A method for solving the convex programming problem with convergence rate . (Russian) Dokl Akad Nauk SSSR. 1983;269(3):543–547. [Google Scholar]
- [3].Attouch H, Chbani Z, Peypouquet J, et al. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math Program. 2018;168(1–2, Ser. B):123–175. doi: 10.1007/s10107-016-0992-8 [DOI] [Google Scholar]
- [4].Attouch H, Peypouquet J, Redont P.. Fast convex optimization via inertial dynamics with Hessian driven damping. J Differ Equat. 2016;261(10):5734–5783. doi: 10.1016/j.jde.2016.08.020 [DOI] [Google Scholar]
- [5].Chambolle A, Dossal C.. On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J Optim Theory Appl. 2015;166(3):968–982. doi: 10.1007/s10957-015-0746-4 [DOI] [Google Scholar]
- [6].Beck A, Teboulle M.. A fast iterative shrinkage- thresholding algorithm for linear inverse problems. SIAM J Imag Sci. 2009;2(1):183–202. doi: 10.1137/080716542 [DOI] [Google Scholar]
- [7].Attouch H, Chbani Z, Riahi H.. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case . 2017. arXiv:1706.05671.
- [8].Haraux A, Jendoubi MA.. Asymptotics for a second order differential equation with a linear, slowly time-decaying damping term. Evol Equat Contr Theor. 2013;2(3):461–470. doi: 10.3934/eect.2013.2.461 [DOI] [Google Scholar]
- [9].Balti M. Asymptotic behavior for second-order differential equations with nonlinear slowly time-decaying damping and integrable source. Electron J Differ Equat. 2015;302:11pp. [Google Scholar]
- [10].Bégout P, Bolte J, Jendoubi MA.. On damped second-order gradient systems. J Differ Equat. 2015;259:3115–3143. doi: 10.1016/j.jde.2015.04.016 [DOI] [Google Scholar]
- [11].Haraux A, Jendoubi MA.. Convergence of solutions of second-order gradient-like systems with analytic nonliniearities. J Differ Equat. 1998;144(2):313–320. doi: 10.1006/jdeq.1997.3393 [DOI] [Google Scholar]
- [12].Chill R, Jendoubi MA.. Asymptotics for a second order differential equation with a linear, slowly time-decaying damping term. Evol Equat Contr Theor. 2013;2(3):461–470. doi: 10.3934/eect.2013.2.461 [DOI] [Google Scholar]
- [13].Boţ RI, Csetnek ER, László SC.. Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems. J Evol Equat. 2018. doi: 10.1007/s00028-018-0441-7. [DOI] [Google Scholar]
- [14].Abbas B, Attouch H, Svaiter BF.. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J Optim Theory Appl. 2014;161(2):331–360. doi: 10.1007/s10957-013-0414-5 [DOI] [Google Scholar]
- [15].Attouch H, Svaiter BF.. A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J Control Optim. 2011;49(2):574–598. doi: 10.1137/100784114 [DOI] [Google Scholar]
- [16].Haraux A. Systèmes Dynamiques Dissipatifs et Applications, Recherches en Mathématiques Appliquées 17, Masson, Paris, 1991.
- [17].Sontag ED. Mathematical control theory. Deterministic finite-dimensional systems. 2nd ed. New York: Springer-Verlag; 1998. (Texts in Applied Mathematics 6). [Google Scholar]
- [18].Alvarez F, Attouch H, Bolte J, et al. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J Math Pures et Appl. 2002;81(8):747–779. doi: 10.1016/S0021-7824(01)01253-3 [DOI] [Google Scholar]
- [19].Łojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, 87–89, 1963.
- [20].Kuryka K. On gradients of functions definable in o-minimal structures. Ann l Fourier (Grenoble). 1998;48(3):769–783. doi: 10.5802/aif.1638 [DOI] [Google Scholar]
- [21].Attouch H, Bolte J, Redont P, et al. Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math Oper Res. 2010;35(2):438–457. doi: 10.1287/moor.1100.0449 [DOI] [Google Scholar]
- [22].Bolte J, Daniilidis A, Lewis A.. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J Optim. 2006;17(4):1205–1223. doi: 10.1137/050644641 [DOI] [Google Scholar]
- [23].Bolte J, Daniilidis A, Lewis A, et al. Clarke subgradients of stratifiable functions. SIAM J Optim. 2007;18(2):556–572. doi: 10.1137/060670080 [DOI] [Google Scholar]
- [24].Bolte J, Daniilidis A, Ley O, et al. Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans Am Math Soc. 2010;362(6):3319–3363. doi: 10.1090/S0002-9947-09-05048-X [DOI] [Google Scholar]
- [25].Attouch H, Bolte J.. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math Program. 2009;116(1–2, Series B):5–16. doi: 10.1007/s10107-007-0133-5 [DOI] [Google Scholar]
- [26].Attouch H, Bolte J, Svaiter BF.. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math Program. 2013;137(1–2, Series A):91–129. doi: 10.1007/s10107-011-0484-9 [DOI] [Google Scholar]
- [27].Bolte J, Sabach S, Teboulle M.. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program Ser A. 2014;146(1–2):459–494. doi: 10.1007/s10107-013-0701-9 [DOI] [Google Scholar]
- [28].Rockafellar RT, Wets RJ-B.. Variational analysis, fundamental principles of mathematical sciences, 317. Berlin: Springer-Verlag; 1998.
