. 2020 Jun 11;189(1-2):151–186. doi: 10.1007/s10107-020-01528-8

Tikhonov regularization of a second order dynamical system with Hessian driven damping

Radu Ioan Boţ 1, Ernö Robert Csetnek 1, Szilárd Csaba László 2
PMCID: PMC8550339  PMID: 34720194

Abstract

We investigate the asymptotic properties of the trajectories generated by a second-order dynamical system with Hessian driven damping and a Tikhonov regularization term in connection with the minimization of a smooth convex function in Hilbert spaces. We obtain fast convergence results for the function values along the trajectories. The Tikhonov regularization term enables the derivation of strong convergence results of the trajectory to the minimizer of the objective function of minimum norm.

Keywords: Second order dynamical system, Convex optimization, Tikhonov regularization, Fast convergence methods, Hessian-driven damping

Introduction

The paper of Su et al. [20] was the starting point of intensive research on second order dynamical systems with an asymptotically vanishing damping term of the form

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g(x(t))=0,\quad t\geq t_0>0, \tag{1}$$

where $g:\mathcal{H}\to\mathbb{R}$ is a convex and continuously Fréchet differentiable function defined on a real Hilbert space $\mathcal{H}$ fulfilling $\operatorname{argmin} g\neq\emptyset$. The aim is to approach, by the trajectories generated by this system, the solution set of the optimization problem

$$\min_{x\in\mathcal{H}} g(x). \tag{2}$$

The convergence rate of the objective function along the trajectory is, in case $\alpha>3$,

$$g(x(t))-\min g=o\left(\frac{1}{t^2}\right),$$

while in case $\alpha=3$ it is

$$g(x(t))-\min g=O\left(\frac{1}{t^2}\right),$$

where $\min g\in\mathbb{R}$ denotes the minimal value of $g$. Also in view of this fact, system (1) is seen as a continuous version of the celebrated Nesterov accelerated gradient scheme (see [16]). In what concerns the asymptotic properties of the generated trajectories, weak convergence to a minimizer of $g$ as the time goes to infinity has been proved by Attouch et al. [7] (see also [6]) for $\alpha>3$. Without any further geometrical assumption on $g$, the convergence of the trajectories in the case $\alpha\leq 3$ is still an open problem.

Second order dynamical systems with a geometrical Hessian driven damping term have aroused the interest of researchers, due to both their applications in optimization and mechanics and their natural relations to Newton and Levenberg-Marquardt iterative methods (see [2]). Furthermore, it has been observed for some classes of optimization problems that a geometrical damping term governed by the Hessian can induce a stabilization of the trajectories. In [11] the dynamical system with Hessian driven damping term

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\beta\nabla^2 g(x(t))\dot{x}(t)+\nabla g(x(t))=0,\quad t\geq t_0>0, \tag{3}$$

where $\alpha\geq 3$ and $\beta>0$, has been investigated in relation with the optimization problem (2). Fast convergence rates for the values and the gradient of the objective function along the trajectories are obtained and the weak convergence of the trajectories to a minimizer of $g$ is shown. We would also like to mention that iterative schemes which result via (symplectic) discretizations of dynamical systems with Hessian driven damping terms have been recently formulated and investigated from the point of view of their convergence properties in [5, 18, 19].

Another development having as a starting point (1) is the investigation of dynamical systems involving a Tikhonov regularization term. Attouch, Chbani and Riahi investigated in this context in [8] the system

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g(x(t))+\epsilon(t)x(t)=0,\quad t\geq t_0>0, \tag{4}$$

where $\alpha\geq 3$ and $\epsilon:[t_0,+\infty)\to[0,+\infty)$. One of the main benefits of considering such a regularized dynamical system is that it generates trajectories which converge strongly to the minimum norm solution of (2). Besides that, in [8] it was proved that the fast convergence rate of the objective function values along the trajectories remains unaltered. For more insights into the role played by the Tikhonov regularization for optimization problems and, more generally, for monotone inclusion problems, we refer the reader to [3, 4, 9, 15].

This being said, it is natural to investigate a second order dynamical system which combines a Hessian driven damping and a Tikhonov regularization term and to examine if it inherits the properties of the dynamical systems (3) and (4). This is the aim of the manuscript, namely the analysis, in the framework of the general assumption stated below, of the dynamical system

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\beta\nabla^2 g(x(t))\dot{x}(t)+\nabla g(x(t))+\epsilon(t)x(t)=0,\quad t\geq t_0>0,\quad x(t_0)=u_0,\ \dot{x}(t_0)=v_0, \tag{5}$$

where $\alpha\geq 3$ and $\beta\geq 0$, and $u_0,v_0\in\mathcal{H}$.

General assumption:

  • $g:\mathcal{H}\to\mathbb{R}$ is a convex and twice Fréchet differentiable function with Lipschitz continuous gradient on bounded sets and $\operatorname{argmin} g\neq\emptyset$;

  • $\epsilon:[t_0,+\infty)\to[0,+\infty)$ is a nonincreasing function of class $C^1$ fulfilling $\lim_{t\to+\infty}\epsilon(t)=0$.

The fact that the starting time $t_0$ is taken as strictly greater than zero comes from the singularity of the damping coefficient $\frac{\alpha}{t}$. This is not a limitation of the generality of the proposed approach, since we will focus on the asymptotic behaviour of the generated trajectories. Notice that if $\mathcal{H}$ is finite-dimensional, then the Lipschitz continuity of $\nabla g$ on bounded sets follows from the continuity of $\nabla^2 g$.

To which extent the Tikhonov regularization influences the convergence behaviour of the trajectories generated by (5) can be seen even when minimizing a one dimensional function. Consider the convex and twice continuously differentiable function

$$g:\mathbb{R}\to\mathbb{R},\quad g(x)=\begin{cases}-(x+1)^3, & \text{if } x<-1,\\ 0, & \text{if } -1\leq x\leq 1,\\ (x-1)^3, & \text{if } x>1.\end{cases} \tag{6}$$

It holds that $\operatorname{argmin} g=[-1,1]$ and $x^*=0$ is its minimum norm solution. In the second column of Fig. 1 we can see the behaviour of the trajectories generated by the dynamical system without Tikhonov regularization (which corresponds to the case when $\epsilon$ is identically $0$) for $\beta=1$ and $\alpha=3$ and, respectively, $\alpha=4$. In both cases the trajectories are approaching the optimal solution $1$, which is a minimizer of $g$, however, not the minimum norm solution.

Fig. 1. First column: the trajectories of the dynamical system with Tikhonov regularization $\epsilon(t)=t^{-\gamma}$ are approaching the minimum norm solution $x^*=0$. Second column: the trajectories of the dynamical system without Tikhonov regularization are approaching the optimal solution $1$.

In the first column of Fig. 1 we can see the behaviour of the trajectories generated by the dynamical system with Tikhonov parametrizations of the form $t\mapsto\epsilon(t)=t^{-\gamma}$, for different values of $\gamma\in(1,2)$, which is in accordance with the conditions in Theorem 4.4, $\beta=1$ and $\alpha=3$ and, respectively, $\alpha=4$. The trajectories are approaching the minimum norm solution $x^*=0$.

The organization of the paper is as follows. We start the analysis of the dynamical system (5) by proving the existence and uniqueness of a global $C^2$-solution. In the third section we provide two different settings for the Tikhonov parametrization $t\mapsto\epsilon(t)$ in both of which $g(x(t))$ converges to $\min g$, the minimal value of $g$, with a convergence rate of $O\left(\frac{1}{t^2}\right)$ for $\alpha=3$ and of $o\left(\frac{1}{t^2}\right)$ for $\alpha>3$. The proof relies on Lyapunov theory; the choice of the right energy functional plays a decisive role in this context. Weak convergence of the trajectory is also derived for $\alpha>3$. In the last section we focus on the proof of strong convergence to a minimum norm solution: firstly, in a general setting, for the ergodic trajectory, and, secondly, in a slightly more restrictive setting, for the trajectory $x(t)$ itself.

Existence and uniqueness

In this section we will prove the existence and uniqueness of a global $C^2$-solution of the dynamical system (5). The proof of the existence and uniqueness theorem is based on the idea of reformulating (5) as a particular first order dynamical system in a suitably chosen product space (see also [11]).

Theorem 2.1

For every initial value $(u_0,v_0)\in\mathcal{H}\times\mathcal{H}$, there exists a unique global $C^2$-solution $x:[t_0,+\infty)\to\mathcal{H}$ to (5).

Proof

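Let $(u_0,v_0)\in\mathcal{H}\times\mathcal{H}$. First we assume that $\beta=0$, which gives the dynamical system (4) investigated in [8]. The statement follows from [14, Proposition 2.2(b)] (see also the discussion in [8, Section 2]).

Assume now that $\beta>0$. We notice that $x:[t_0,+\infty)\to\mathcal{H}$ is a solution of the dynamical system (5), that is

$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\beta\nabla^2 g(x(t))\dot{x}(t)+\nabla g(x(t))+\epsilon(t)x(t)=0,\quad x(t_0)=u_0,\ \dot{x}(t_0)=v_0,$$

if and only if $(x,y):[t_0,+\infty)\to\mathcal{H}\times\mathcal{H}$ is a solution of the dynamical system

$$\begin{cases}\dot{x}(t)+\beta\nabla g(x(t))-y(t)=0\\ \dot{y}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g(x(t))+\epsilon(t)x(t)=0\\ x(t_0)=u_0,\ y(t_0)=v_0+\beta\nabla g(u_0),\end{cases}$$

which is further equivalent to

$$\begin{cases}\dot{x}(t)+\beta\nabla g(x(t))-y(t)=0\\ \dot{y}(t)+\frac{\alpha}{t}y(t)+\left(1-\frac{\alpha\beta}{t}\right)\nabla g(x(t))+\epsilon(t)x(t)=0\\ x(t_0)=u_0,\ y(t_0)=v_0+\beta\nabla g(u_0).\end{cases} \tag{7}$$

We define $F:[t_0,+\infty)\times\mathcal{H}\times\mathcal{H}\to\mathcal{H}\times\mathcal{H}$ by

$$F(t,u,v)=\left(-\beta\nabla g(u)+v,\ -\frac{\alpha}{t}v-\left(1-\frac{\alpha\beta}{t}\right)\nabla g(u)-\epsilon(t)u\right),$$

and write (7) as

$$\begin{cases}(\dot{x}(t),\dot{y}(t))=F(t,x(t),y(t))\\ (x(t_0),y(t_0))=(u_0,v_0+\beta\nabla g(u_0)).\end{cases} \tag{8}$$

Since $\nabla g$ is Lipschitz continuous on bounded sets and continuously differentiable, the local existence and uniqueness theorem (see [17, Theorems 46.2 and 46.3]) guarantees the existence of a unique solution $(x,y)$ of (8) defined on a maximal interval $[t_0,T_{\max})$, where $t_0<T_{\max}\leq+\infty$. Furthermore, either $T_{\max}=+\infty$ or $\lim_{t\to T_{\max}}\|x(t)\|+\|y(t)\|=+\infty$. We will prove that $T_{\max}=+\infty$, which will imply that $x$ is the unique global $C^2$-solution of (5).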

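Consider the energy functional (see [10])

$$E:[t_0,T_{\max})\to\mathbb{R},\quad E(t)=\frac{1}{2}\|\dot{x}(t)\|^2+g(x(t))+\frac{1}{2}\epsilon(t)\|x(t)\|^2.$$

By using (5) we get

$$\frac{d}{dt}E(t)=-\frac{\alpha}{t}\|\dot{x}(t)\|^2-\beta\langle\nabla^2 g(x(t))\dot{x}(t),\dot{x}(t)\rangle+\frac{1}{2}\dot{\epsilon}(t)\|x(t)\|^2,$$

and, since $\epsilon$ is nonincreasing and $\nabla^2 g(x(t))$ is positive semidefinite, we obtain that

$$\frac{d}{dt}E(t)\leq 0\quad\forall t\in[t_0,T_{\max}).$$

Consequently, $E$ is nonincreasing, hence

$$\frac{1}{2}\|\dot{x}(t)\|^2+g(x(t))+\frac{1}{2}\epsilon(t)\|x(t)\|^2\leq\frac{1}{2}\|\dot{x}(t_0)\|^2+g(x(t_0))+\frac{1}{2}\epsilon(t_0)\|x(t_0)\|^2\quad\forall t\in[t_0,T_{\max}).$$

From the fact that $g$ is bounded from below we obtain that $\dot{x}$ is bounded on $[t_0,T_{\max})$. Let $\|\dot{x}\|_\infty:=\sup_{t\in[t_0,T_{\max})}\|\dot{x}(t)\|<+\infty$.

Suppose that $T_{\max}<+\infty$. Since $\|x(t)-x(t')\|\leq\|\dot{x}\|_\infty|t-t'|$ for all $t,t'\in[t_0,T_{\max})$, there exists $\lim_{t\to T_{\max}}x(t)$, which shows that $x$ is bounded on $[t_0,T_{\max})$. Since $\dot{x}(t)+\beta\nabla g(x(t))=y(t)$ for all $t\in[t_0,T_{\max})$ and $\nabla g$ is Lipschitz continuous on bounded sets, it follows that $y$ is also bounded on $[t_0,T_{\max})$. Hence $\lim_{t\to T_{\max}}\|x(t)\|+\|y(t)\|$ cannot be $+\infty$, thus $T_{\max}=+\infty$, which completes the proof.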
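The first order reformulation (7) used in the proof is also convenient for numerical experiments. The following is a minimal sketch, not the authors' code: it integrates (7) for the one dimensional example (6) with a classical fourth order Runge-Kutta scheme. The function names, the initial data $u_0=2$, $v_0=0$, $t_0=1$, the horizon and the step size are all assumptions chosen for illustration; the source does not state the settings used for Fig. 1.

```python
def g(x):
    """Example objective (6): zero on [-1, 1], cubic outside."""
    if x < -1.0:
        return -(x + 1.0) ** 3
    if x > 1.0:
        return (x - 1.0) ** 3
    return 0.0


def grad_g(x):
    """Derivative of the example objective (6)."""
    if x < -1.0:
        return -3.0 * (x + 1.0) ** 2
    if x > 1.0:
        return 3.0 * (x - 1.0) ** 2
    return 0.0


def rhs(t, x, y, alpha, beta, eps):
    """Right-hand side F(t, x, y) of the first order system (7)."""
    gx = grad_g(x)
    dx = y - beta * gx
    dy = -(alpha / t) * y - (1.0 - alpha * beta / t) * gx - eps(t) * x
    return dx, dy


def energy(t, x, y, beta, eps):
    """E(t) = 1/2 |x'|^2 + g(x) + 1/2 eps(t) |x|^2, with x' = y - beta * g'(x)."""
    xdot = y - beta * grad_g(x)
    return 0.5 * xdot ** 2 + g(x) + 0.5 * eps(t) * x ** 2


def simulate(alpha, beta, eps, u0=2.0, v0=0.0, t0=1.0, t_end=30.0, h=0.01):
    """Integrate (7) with classical RK4 (illustrative defaults); return final (t, x, y)."""
    t, x, y = t0, u0, v0 + beta * grad_g(u0)  # y(t0) = v0 + beta * grad g(u0)
    for _ in range(int(round((t_end - t0) / h))):
        k1 = rhs(t, x, y, alpha, beta, eps)
        k2 = rhs(t + h / 2, x + h / 2 * k1[0], y + h / 2 * k1[1], alpha, beta, eps)
        k3 = rhs(t + h / 2, x + h / 2 * k2[0], y + h / 2 * k2[1], alpha, beta, eps)
        k4 = rhs(t + h, x + h * k3[0], y + h * k3[1], alpha, beta, eps)
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return t, x, y


if __name__ == "__main__":
    tikhonov = lambda t: t ** -1.5  # eps(t) = t^(-gamma), gamma = 1.5 (assumed)
    no_reg = lambda t: 0.0          # eps identically zero
    for name, eps in (("with Tikhonov", tikhonov), ("without Tikhonov", no_reg)):
        t, x, y = simulate(alpha=3.0, beta=1.0, eps=eps)
        print(name, "x(T) =", x, "E(T) =", energy(t, x, y, 1.0, eps))
```

Working with the pair $(x,y)$ avoids evaluating the Hessian of $g$: only $\nabla g$ appears in (7). The `energy` helper evaluates the functional $E$ from the proof, so one can monitor numerically that it does not increase along the computed trajectory.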

Asymptotic analysis

In this section we will show to which extent different assumptions imposed on the Tikhonov parametrization $t\mapsto\epsilon(t)$ influence the asymptotic behaviour of the trajectory $x$ generated by the dynamical system (5). In particular, we are looking at the convergence of the function $g$ along the trajectory and the weak convergence of the trajectory.

We recall that the asymptotic analysis of the system (5) is carried out in the framework of the general assumptions stated in the introduction.

We start with a result which provides a setting that guarantees the convergence of $g(x(t))$ to $\min g$ as $t\to+\infty$.

Theorem 3.1

Let $x$ be the unique global $C^2$-solution of (5). Assume that one of the following conditions is fulfilled:

  (a) $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt<+\infty$ and there exist $a>1$ and $t_1\geq t_0$ such that
    $\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)$ for every $t\geq t_1$;
  (b) there exist $a>0$ and $t_1\geq t_0$ such that
    $\epsilon(t)\leq\frac{a}{t}$ for every $t\geq t_1$.

If $\alpha\geq 3$, then

$$\lim_{t\to+\infty}g(x(t))=\min g.$$

Proof

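Let $x^*\in\operatorname{argmin} g$ and $2\leq b\leq\alpha-1$ be fixed. We introduce the following energy functional $E_b:[t_0,+\infty)\to\mathbb{R}$,

$$E_b(t)=(t^2-\beta(b+2-\alpha)t)\left(g(x(t))-\min g\right)+\frac{t^2\epsilon(t)}{2}\|x(t)\|^2+\frac{1}{2}\|b(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\|^2+\frac{b(\alpha-1-b)}{2}\|x(t)-x^*\|^2. \tag{9}$$

For every $t\geq t_0$ it holds

$$\begin{aligned}\dot{E}_b(t)={}&(2t-\beta(b+2-\alpha))\left(g(x(t))-\min g\right)+(t^2-\beta(b+2-\alpha)t)\langle\nabla g(x(t)),\dot{x}(t)\rangle\\&+\frac{t^2\dot{\epsilon}(t)+2t\epsilon(t)}{2}\|x(t)\|^2+t^2\epsilon(t)\langle\dot{x}(t),x(t)\rangle\\&+\left\langle(b+1)\dot{x}(t)+\beta\nabla g(x(t))+t\left(\ddot{x}(t)+\beta\nabla^2 g(x(t))\dot{x}(t)\right),\ b(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\right\rangle\\&+b(\alpha-1-b)\langle\dot{x}(t),x(t)-x^*\rangle.\end{aligned} \tag{10}$$

Now, by using (5), we get for every $t\geq t_0$

$$\begin{aligned}&\left\langle(b+1)\dot{x}(t)+\beta\nabla g(x(t))+t\left(\ddot{x}(t)+\beta\nabla^2 g(x(t))\dot{x}(t)\right),\ b(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\right\rangle\\&=\left\langle(b+1-\alpha)\dot{x}(t)+(\beta-t)\nabla g(x(t))-t\epsilon(t)x(t),\ b(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\right\rangle\\&=b(b+1-\alpha)\langle\dot{x}(t),x(t)-x^*\rangle+(b+1-\alpha)t\|\dot{x}(t)\|^2+(-t^2+\beta(b+2-\alpha)t)\langle\dot{x}(t),\nabla g(x(t))\rangle\\&\quad+(\beta^2t-\beta t^2)\|\nabla g(x(t))\|^2-\epsilon(t)t^2\langle\dot{x}(t),x(t)\rangle-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle\\&\quad-bt\left\langle\left(1-\frac{\beta}{t}\right)\nabla g(x(t))+\epsilon(t)x(t),\ x(t)-x^*\right\rangle.\end{aligned} \tag{11}$$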

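Let $t_0':=\max(\beta,t_0)$. For all $t\geq t_0'$ the function $g_t:\mathcal{H}\to\mathbb{R}$, $g_t(x)=\left(1-\frac{\beta}{t}\right)g(x)+\frac{\epsilon(t)}{2}\|x\|^2$, is strongly convex, thus, one has

$$g_t(y)-g_t(x)\geq\langle\nabla g_t(x),y-x\rangle+\frac{\epsilon(t)}{2}\|y-x\|^2\quad\forall x,y\in\mathcal{H}.$$

By taking $x:=x(t)$ and $y:=x^*$ we get for every $t\geq t_0'$

$$-bt\left\langle\left(1-\frac{\beta}{t}\right)\nabla g(x(t))+\epsilon(t)x(t),\ x(t)-x^*\right\rangle\leq-bt\left(1-\frac{\beta}{t}\right)\left(g(x(t))-\min g\right)-\frac{bt\epsilon(t)}{2}\|x(t)\|^2-\frac{bt\epsilon(t)}{2}\|x(t)-x^*\|^2+\frac{bt\epsilon(t)}{2}\|x^*\|^2. \tag{12}$$

From (10), (11) and (12) it follows that for every $t\geq t_0'$ it holds

$$\begin{aligned}\dot{E}_b(t)\leq{}&((2-b)t-\beta(2-\alpha))\left(g(x(t))-\min g\right)+\frac{bt\epsilon(t)}{2}\|x^*\|^2+\left(\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}\right)\|x(t)\|^2\\&-\frac{bt\epsilon(t)}{2}\|x(t)-x^*\|^2+(b+1-\alpha)t\|\dot{x}(t)\|^2+(\beta^2t-\beta t^2)\|\nabla g(x(t))\|^2-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle.\end{aligned} \tag{13}$$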

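At this point we treat the situations $\alpha>3$ and $\alpha=3$ separately.

The case $\alpha>3$ and $2<b<\alpha-1$. We will carry out the analysis by addressing the settings provided by the conditions (a) and (b) separately.

Condition (a) holds: Assuming that condition (a) holds, there exist $a>1$ and $t_1\geq t_0$ such that

$$\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)\quad\text{for every } t\geq t_1.$$

Using that

$$-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle\leq\frac{\beta t^2}{a}\|\nabla g(x(t))\|^2+\frac{a\beta\epsilon^2(t)t^2}{4}\|x(t)\|^2, \tag{14}$$

(13) leads to the following estimate

$$\begin{aligned}\dot{E}_b(t)\leq{}&((2-b)t-\beta(2-\alpha))\left(g(x(t))-\min g\right)+\frac{bt\epsilon(t)}{2}\|x^*\|^2+\left(\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}+\frac{a\beta\epsilon^2(t)t^2}{4}\right)\|x(t)\|^2\\&-\frac{bt\epsilon(t)}{2}\|x(t)-x^*\|^2+(b+1-\alpha)t\|\dot{x}(t)\|^2+\left(\beta^2t-\beta\left(1-\frac{1}{a}\right)t^2\right)\|\nabla g(x(t))\|^2,\end{aligned} \tag{15}$$

which holds for every $t\geq t_1$.

Since $a>1$ and $b>2$, we notice that for every $t\geq t_1$ it holds

$$\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}+\frac{a\beta\epsilon^2(t)t^2}{4}\leq 0.$$

On the other hand, we have that

$$\beta^2t-\beta\left(1-\frac{1}{a}\right)t^2\leq-\beta\frac{a-1}{2a}t^2\quad\text{for every } t\geq\frac{2a\beta}{a-1}$$

and

$$(2-b)t-\beta(2-\alpha)\leq 0\quad\text{for every } t\geq\frac{\beta(\alpha-2)}{b-2}.$$

We define $t_2:=\max\left(t_1,\frac{2a\beta}{a-1},\frac{\beta(\alpha-2)}{b-2}\right)$. According to (15), it holds for every $t\geq t_2$

$$\begin{aligned}&\dot{E}_b(t)-((2-b)t-\beta(2-\alpha))\left(g(x(t))-\min g\right)-\left(\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}+\frac{a\beta\epsilon^2(t)t^2}{4}\right)\|x(t)\|^2\\&+\frac{bt\epsilon(t)}{2}\|x(t)-x^*\|^2+(\alpha-1-b)t\|\dot{x}(t)\|^2+\beta\frac{a-1}{2a}t^2\|\nabla g(x(t))\|^2\leq\frac{bt\epsilon(t)}{2}\|x^*\|^2.\end{aligned} \tag{16}$$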

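Condition (b) holds: Assuming now that condition (b) holds, there exist $a>0$ and $t_1\geq t_0$ such that

$$\epsilon(t)\leq\frac{a}{t}\quad\text{for every } t\geq t_1.$$

Further, the monotonicity of $\nabla g$ and the fact that $\nabla g(x^*)=0$ imply that

$$\langle\nabla g(x(t)),x(t)-x^*\rangle\geq 0\quad\text{for every } t\geq t_0.$$

Using that

$$-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle\leq-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x^*\rangle\leq\frac{\beta t^3\epsilon(t)}{2a}\|\nabla g(x(t))\|^2+\frac{a\beta\epsilon(t)t}{2}\|x^*\|^2, \tag{17}$$

(13) leads to the following estimate

$$\begin{aligned}\dot{E}_b(t)\leq{}&((2-b)t-\beta(2-\alpha))\left(g(x(t))-\min g\right)+\frac{(b+a\beta)t\epsilon(t)}{2}\|x^*\|^2+\left(\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}\right)\|x(t)\|^2\\&-\frac{bt\epsilon(t)}{2}\|x(t)-x^*\|^2+(b+1-\alpha)t\|\dot{x}(t)\|^2+\left(\beta^2t-\beta t^2+\frac{\beta t^3\epsilon(t)}{2a}\right)\|\nabla g(x(t))\|^2\end{aligned} \tag{18}$$

for every $t\geq t_1$.

Since $b>2$, we have that for every $t\geq t_1$ it holds

$$\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}\leq 0.$$

On the other hand, since

$$-\beta t^2+\frac{\beta t^3\epsilon(t)}{2a}\leq-\frac{\beta}{2}t^2$$

holds for every $t\geq t_1$, it follows that

$$\beta^2t-\beta t^2+\frac{\beta t^3\epsilon(t)}{2a}\leq-\frac{\beta}{4}t^2\quad\text{for every } t\geq\max(t_1,4\beta). \tag{19}$$

We recall that

$$(2-b)t-\beta(2-\alpha)\leq 0\quad\text{for every } t\geq\frac{\beta(\alpha-2)}{b-2}.$$

We define $t_2:=\max\left(t_1,4\beta,\frac{\beta(\alpha-2)}{b-2}\right)$. According to (18), it holds for every $t\geq t_2$

$$\begin{aligned}&\dot{E}_b(t)-((2-b)t-\beta(2-\alpha))\left(g(x(t))-\min g\right)-\left(\frac{t^2\dot{\epsilon}(t)}{2}+\frac{(2-b)t\epsilon(t)}{2}\right)\|x(t)\|^2\\&+\frac{bt\epsilon(t)}{2}\|x(t)-x^*\|^2+(\alpha-1-b)t\|\dot{x}(t)\|^2+\frac{\beta}{4}t^2\|\nabla g(x(t))\|^2\leq\frac{(b+a\beta)t\epsilon(t)}{2}\|x^*\|^2.\end{aligned} \tag{20}$$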

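From now on we will treat the two cases together. According to (16), in case (a), and to (20), in case (b), we obtain

$$\dot{E}_b(t)\leq l\frac{t\epsilon(t)}{2}\|x^*\|^2$$

for every $t\geq t_2$, where $l:=b$ and $t_2=\max\left(t_1,\frac{2a\beta}{a-1},\frac{\beta(\alpha-2)}{b-2}\right)$ in case (a), and $l:=b+a\beta$ and $t_2=\max\left(t_1,4\beta,\frac{\beta(\alpha-2)}{b-2}\right)$ in case (b).

By integrating the latter inequality on the interval $[t_2,T]$, where $T\geq t_2$ is arbitrarily chosen, we obtain

$$E_b(T)\leq E_b(t_2)+\frac{l\|x^*\|^2}{2}\int_{t_2}^{T}t\epsilon(t)dt.$$

On the other hand,

$$E_b(t)\geq(t^2-\beta(b+2-\alpha)t)\left(g(x(t))-\min g\right)\quad\forall t\geq t_0,$$

hence, for every $T>\max(\beta(b+2-\alpha),t_2)$ we get

$$0\leq g(x(T))-\min g\leq\frac{E_b(t_2)}{T^2-\beta(b+2-\alpha)T}+\frac{l\|x^*\|^2}{2}\cdot\frac{1}{T^2-\beta(b+2-\alpha)T}\int_{t_2}^{T}t\epsilon(t)dt.$$

Obviously,

$$\lim_{T\to+\infty}\frac{E_b(t_2)}{T^2-\beta(b+2-\alpha)T}=0.$$

Further, Lemma A.1 applied to the functions $\varphi(t)=t^2$ and $f(t)=\frac{\epsilon(t)}{t}$ provides

$$\lim_{T\to+\infty}\frac{1}{T^2}\int_{t_2}^{T}t^2\frac{\epsilon(t)}{t}dt=0,$$

hence,

$$\lim_{T\to+\infty}\frac{1}{T^2-\beta(b+2-\alpha)T}\int_{t_2}^{T}t\epsilon(t)dt=0$$

and, consequently,

$$\lim_{T\to+\infty}g(x(T))=\min g.$$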

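The case $\alpha=3$ and $b=2$. In this case the energy functional reads

$$E_2(t)=(t^2-\beta t)\left(g(x(t))-\min g\right)+\frac{t^2\epsilon(t)}{2}\|x(t)\|^2+\frac{1}{2}\|2(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\|^2$$

for every $t\geq t_0$. We will address again the settings provided by the conditions (a) and (b) separately.

Condition (a) holds: Relation (15) becomes

$$\dot{E}_2(t)\leq\beta\left(g(x(t))-\min g\right)+t\epsilon(t)\|x^*\|^2+\left(\frac{t^2\dot{\epsilon}(t)}{2}+\frac{a\beta\epsilon^2(t)t^2}{4}\right)\|x(t)\|^2-t\epsilon(t)\|x(t)-x^*\|^2+\left(\beta^2t-\beta\left(1-\frac{1}{a}\right)t^2\right)\|\nabla g(x(t))\|^2$$

for every $t\geq t_1$. Consequently, for $t_3:=\max\left(t_1,\frac{\beta a}{a-1}\right)$, we have

$$\dot{E}_2(t)\leq\beta\left(g(x(t))-\min g\right)+t\epsilon(t)\|x^*\|^2 \tag{21}$$

for every $t\geq t_3$. After multiplication with $t(t-\beta)$, it yields

$$t(t-\beta)\dot{E}_2(t)\leq\beta t(t-\beta)\left(g(x(t))-\min g\right)+t^2(t-\beta)\epsilon(t)\|x^*\|^2\leq\beta E_2(t)+t^2(t-\beta)\epsilon(t)\|x^*\|^2$$

for every $t\geq t_3$. Dividing by $(t-\beta)^2$ we obtain

$$\frac{t}{t-\beta}\dot{E}_2(t)\leq\frac{\beta}{(t-\beta)^2}E_2(t)+\frac{t^2}{t-\beta}\epsilon(t)\|x^*\|^2$$

or, equivalently,

$$\frac{d}{dt}\left(\frac{t}{t-\beta}E_2(t)\right)\leq\frac{t^2}{t-\beta}\epsilon(t)\|x^*\|^2\quad\text{for every } t\geq t_3. \tag{22}$$

Condition (b) holds: We define $t_3:=\max(t_1,4\beta)$. Relation (18) becomes

$$\dot{E}_2(t)\leq\beta\left(g(x(t))-\min g\right)+\frac{2+a\beta}{2}t\epsilon(t)\|x^*\|^2, \tag{23}$$

for every $t\geq t_3$. Repeating the above steps for the inequality (23) we obtain

$$\frac{d}{dt}\left(\frac{t}{t-\beta}E_2(t)\right)\leq\frac{2+a\beta}{2}\cdot\frac{t^2}{t-\beta}\epsilon(t)\|x^*\|^2\quad\text{for every } t\geq t_3. \tag{24}$$

From now on we will treat the two cases together. According to (22), in case (a), and to (24), in case (b), we obtain

$$\frac{d}{dt}\left(\frac{t}{t-\beta}E_2(t)\right)\leq l\frac{t^2}{t-\beta}\epsilon(t)\|x^*\|^2$$

for every $t\geq t_3$, where $l:=1$ and $t_3=\max\left(t_1,\frac{\beta a}{a-1}\right)$ in case (a), and $l:=\frac{2+a\beta}{2}$ and $t_3=\max(t_1,4\beta)$ in case (b).

By integrating the latter inequality on an interval $[t_3,T]$, where $T\geq t_3$ is arbitrarily chosen, we obtain

$$\frac{T}{T-\beta}E_2(T)\leq\frac{t_3}{t_3-\beta}E_2(t_3)+l\|x^*\|^2\int_{t_3}^{T}\frac{t^2}{t-\beta}\epsilon(t)dt.$$

On the other hand,

$$E_2(t)\geq(t^2-\beta t)\left(g(x(t))-\min g\right)$$

for every $t\geq t_0$, hence, for every $T\geq\max(\beta,t_3)=t_3$ we get

$$0\leq g(x(T))-\min g\leq\frac{1}{T^2}\cdot\frac{t_3}{t_3-\beta}E_2(t_3)+l\|x^*\|^2\frac{1}{T^2}\int_{t_3}^{T}\frac{t^2}{t-\beta}\epsilon(t)dt.$$

Obviously,

$$\lim_{T\to+\infty}\frac{1}{T^2}\cdot\frac{t_3}{t_3-\beta}E_2(t_3)=0.$$

Lemma A.1, applied this time to the functions $\varphi(t)=\frac{t^3}{t-\beta}$ and $f(t)=\frac{\epsilon(t)}{t}$, yields

$$\lim_{T\to+\infty}\frac{T-\beta}{T^3}\int_{t_3}^{T}\frac{t^3}{t-\beta}\cdot\frac{\epsilon(t)}{t}dt=0.$$

Consequently,

$$\lim_{T\to+\infty}\frac{1}{T^2}\int_{t_3}^{T}\frac{t^2}{t-\beta}\epsilon(t)dt=0,$$

hence

$$\lim_{T\to+\infty}g(x(T))=\min g.$$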

Remark 3.2

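One can easily notice that, in case $\beta>0$, the fact that there exist $a>1$ and $t_1\geq t_0$ such that $\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)$ for every $t\geq t_1$ implies that $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt<+\infty$.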
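The implication in Remark 3.2 can be sketched as follows. This computation is not spelled out in the source; it assumes $\beta>0$ and, for the division, that $\epsilon(t)>0$ for every $t\geq t_1$. If $\epsilon$ vanishes at some point, then, being nonnegative and nonincreasing, it vanishes from that point on and the integrability is immediate.

```latex
\begin{align*}
\dot{\epsilon}(t)\le-\frac{a\beta}{2}\epsilon^2(t)
 &\;\Longrightarrow\; \frac{d}{dt}\left(\frac{1}{\epsilon(t)}\right)
   =-\frac{\dot{\epsilon}(t)}{\epsilon^2(t)}\ge\frac{a\beta}{2}
   && (t\ge t_1)\\
 &\;\Longrightarrow\; \frac{1}{\epsilon(t)}\ge\frac{1}{\epsilon(t_1)}+\frac{a\beta}{2}(t-t_1)
   && (t\ge t_1)\\
 &\;\Longrightarrow\; \epsilon(t)\le\frac{2}{a\beta(t-t_1)}\le\frac{4}{a\beta t}
   && (t\ge 2t_1)\\
 &\;\Longrightarrow\; \int_{2t_1}^{+\infty}\frac{\epsilon(t)}{t}\,dt
   \le\frac{4}{a\beta}\int_{2t_1}^{+\infty}\frac{dt}{t^2}=\frac{2}{a\beta t_1}<+\infty .
\end{align*}
```

On the compact interval $[t_0,2t_1]$ the function $t\mapsto\frac{\epsilon(t)}{t}$ is continuous, hence integrable, which gives the claim.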

The next theorem shows that, by strengthening the integrability condition $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt<+\infty$ (which is actually required in both settings (a) and (b) of Theorem 3.1), a rate of $O(1/t^2)$ can be guaranteed for the convergence of $g(x(t))$ to $\min g$.

Theorem 3.3

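Let $x$ be the unique global $C^2$-solution of (5). Assume that

$$\int_{t_0}^{+\infty}t\epsilon(t)dt<+\infty$$

and that one of the following conditions is fulfilled:

  (a) there exist $a>1$ and $t_1\geq t_0$ such that
    $\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)$ for every $t\geq t_1$;
  (b) there exist $a>0$ and $t_1\geq t_0$ such that
    $\epsilon(t)\leq\frac{a}{t}$ for every $t\geq t_1$.

If $\alpha\geq 3$, then

$$g(x(t))-\min g=O\left(\frac{1}{t^2}\right).$$

In addition, if $\alpha>3$, then the trajectory $x$ is bounded and

$$t\left(g(x(t))-\min g\right),\ t\|\dot{x}(t)\|^2,\ t\epsilon(t)\|x(t)-x^*\|^2,\ t\epsilon(t)\|x(t)\|^2,\ t^2\|\nabla g(x(t))\|^2\in L^1([t_0,+\infty),\mathbb{R})$$

for every arbitrary $x^*\in\operatorname{argmin} g$.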
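As an illustration, the hypotheses can be checked for the power-law parametrization $\epsilon(t)=t^{-\gamma}$ that appears in Fig. 1. The computation below is a sketch added for orientation and is not taken from the source; the choice $a=t_0^{1-\gamma}$ in condition (b) is an assumption made only to have an explicit constant.

```latex
\begin{align*}
&\epsilon(t)=t^{-\gamma},\quad \gamma>0,\quad t\ge t_0>0:\\[2pt]
&\int_{t_0}^{+\infty}t\,\epsilon(t)\,dt=\int_{t_0}^{+\infty}t^{1-\gamma}\,dt<+\infty
   \iff \gamma>2,\\
&\epsilon(t)\le\frac{a}{t}\iff t^{1-\gamma}\le a,
   \quad\text{which holds for all } t\ge t_0 \text{ with } a=t_0^{1-\gamma}
   \text{ when } \gamma\ge 1,\\
&\dot{\epsilon}(t)=-\gamma t^{-\gamma-1}\le-\frac{a\beta}{2}t^{-2\gamma}
   \iff \gamma t^{\gamma-1}\ge\frac{a\beta}{2},
   \quad\text{which holds for large } t \text{ when } \gamma>1.
\end{align*}
```

Under these computations, Theorem 3.3 covers $\epsilon(t)=t^{-\gamma}$ exactly when $\gamma>2$. The values $\gamma\in(1,2)$ used in Fig. 1 fail the integrability condition $\int_{t_0}^{+\infty}t\epsilon(t)dt<+\infty$, but they do satisfy $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt<+\infty$ and both conditions (a) and (b), so Theorem 3.1 still gives $g(x(t))\to\min g$.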

Proof

Let $x^*\in\operatorname{argmin} g$ and $2\leq b\leq\alpha-1$ be fixed. We will use the energy functional introduced in the proof of the previous theorem and some of the estimates we derived for it. We will treat again the situations $\alpha>3$ and $\alpha=3$ separately.

The case $\alpha>3$ and $2<b<\alpha-1$. As we already noticed in the proof of Theorem 3.1, according to (16), in case (a), and to (20), in case (b), we have

$$\dot{E}_b(t)\leq l\frac{t\epsilon(t)}{2}\|x^*\|^2\quad\text{for every } t\geq t_2,$$

where $l:=b$ and $t_2=\max\left(t_1,\frac{2a\beta}{a-1},\frac{\beta(\alpha-2)}{b-2}\right)$ in case (a), and $l:=b+a\beta$ and $t_2=\max\left(t_1,4\beta,\frac{\beta(\alpha-2)}{b-2}\right)$ in case (b).

Using that $t\epsilon(t)\in L^1([t_0,+\infty),\mathbb{R})$ and that $t\mapsto E_b(t)$ is bounded from below, from Lemma A.2 it follows that the limit $\lim_{t\to+\infty}E_b(t)$ exists. Consequently, $t\mapsto E_b(t)$ is bounded, which implies that there exist $K>0$ and $t'\geq t_0$ such that

$$0\leq g(x(t))-\min g\leq\frac{K}{t^2}\quad\text{for every } t\geq t'.$$

In addition, the function $t\mapsto\|x(t)-x^*\|^2$ is bounded, hence the trajectory $x$ is bounded. Since $t\mapsto\|b(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\|^2$ is also bounded, the inequality

$$\|t(\dot{x}(t)+\beta\nabla g(x(t)))\|^2\leq 2\|b(x(t)-x^*)+t(\dot{x}(t)+\beta\nabla g(x(t)))\|^2+2b^2\|x(t)-x^*\|^2,$$

which is true for every $t\geq t_0$, leads to

$$\|\dot{x}(t)+\beta\nabla g(x(t))\|=O\left(\frac{1}{t}\right).$$

By integrating relation (16), in case (a), and relation (20), in case (b), on an interval $[t_2,s]$, where $s\geq t_2$ is arbitrarily chosen, and by letting afterwards $s$ converge to $+\infty$, we obtain

$$t\left(g(x(t))-\min g\right),\ t\|\dot{x}(t)\|^2,\ t\epsilon(t)\|x(t)-x^*\|^2,\ t^2\|\nabla g(x(t))\|^2\in L^1([t_0,+\infty),\mathbb{R}).$$

The boundedness of the trajectory and the condition on the Tikhonov parametrization guarantee that

$$t\epsilon(t)\|x(t)\|^2\in L^1([t_0,+\infty),\mathbb{R}).$$

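The case $\alpha=3$ and $b=2$. As we already noticed in the proof of Theorem 3.1, according to (22), in case (a), and to (24), in case (b), we obtain

$$\frac{d}{dt}\left(\frac{t}{t-\beta}E_2(t)\right)\leq l\frac{t^2}{t-\beta}\epsilon(t)\|x^*\|^2\quad\text{for every } t\geq t_3,$$

where $l=1$ and $t_3=\max\left(t_1,\frac{\beta a}{a-1}\right)$ in case (a), and $l=\frac{2+a\beta}{2}$ and $t_3=\max(t_1,4\beta)$ in case (b).

Since $t\epsilon(t)\in L^1([t_0,+\infty),\mathbb{R})$ and $\epsilon(t)$ is nonnegative, obviously $\frac{t^2}{t-\beta}\epsilon(t)\|x^*\|^2\in L^1([t_3,+\infty),\mathbb{R})$. Using that $t\mapsto\frac{t}{t-\beta}E_2(t)$ is bounded from below, from Lemma A.2 it follows that the limit $\lim_{t\to+\infty}\frac{t}{t-\beta}E_2(t)$ exists. Consequently, the limit $\lim_{t\to+\infty}E_2(t)$ also exists and $t\mapsto E_2(t)$ is bounded. This implies that there exist $K>0$ and $t'\geq t_0$ such that

$$0\leq g(x(t))-\min g\leq\frac{K}{t^2}\quad\text{for every } t\geq t'.$$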

The next result shows that the statements of Theorem 3.3 can be strengthened in case α>3.

Theorem 3.4

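Let $x$ be the unique global $C^2$-solution of (5). Assume that

$$\int_{t_0}^{+\infty}t\epsilon(t)dt<+\infty$$

and that one of the following conditions is fulfilled:

  (a) there exist $a>1$ and $t_1\geq t_0$ such that
    $\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)$ for every $t\geq t_1$;
  (b) there exist $a>0$ and $t_1\geq t_0$ such that
    $\epsilon(t)\leq\frac{a}{t}$ for every $t\geq t_1$.

Let $x^*\in\operatorname{argmin} g$ be arbitrary. If $\alpha>3$, then

$$t\langle\nabla g(x(t)),x(t)-x^*\rangle\in L^1([t_0,+\infty),\mathbb{R})$$

and the limits

$$\lim_{t\to+\infty}\|x(t)-x^*\|\in\mathbb{R}\quad\text{and}\quad\lim_{t\to+\infty}t\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle\in\mathbb{R}$$

exist. In addition,

$$g(x(t))-\min g=o\left(\frac{1}{t^2}\right),\quad\|\dot{x}(t)+\beta\nabla g(x(t))\|=o\left(\frac{1}{t}\right)\quad\text{and}\quad\lim_{t\to+\infty}t^2\epsilon(t)\|x(t)\|^2=0.$$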

Proof

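Since $\alpha>3$ we can choose $2<b<\alpha-1$. From (10) and (11) we have that

$$\begin{aligned}\dot{E}_b(t)={}&(2t-\beta(b+2-\alpha))\left(g(x(t))-\min g\right)+\left(\frac{t^2\dot{\epsilon}(t)}{2}+t\epsilon(t)\right)\|x(t)\|^2+(b+1-\alpha)t\|\dot{x}(t)\|^2\\&+(\beta^2t-\beta t^2)\|\nabla g(x(t))\|^2-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle\\&-bt\left\langle\left(1-\frac{\beta}{t}\right)\nabla g(x(t))+\epsilon(t)x(t),\ x(t)-x^*\right\rangle\quad\text{for every } t\geq t_0.\end{aligned} \tag{25}$$

We will address the settings provided by the conditions (a) and (b) separately.

Condition (a) holds: In this case we estimate $-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle$ just as in (14) and from (25) we obtain

$$\begin{aligned}\dot{E}_b(t)\leq{}&(2t-\beta(b+2-\alpha))\left(g(x(t))-\min g\right)+\left(\frac{t^2\dot{\epsilon}(t)}{2}+t\epsilon(t)+\frac{a\beta\epsilon^2(t)t^2}{4}\right)\|x(t)\|^2+(b+1-\alpha)t\|\dot{x}(t)\|^2\\&+\left(\beta^2t-\beta\left(1-\frac{1}{a}\right)t^2\right)\|\nabla g(x(t))\|^2-bt\left\langle\left(1-\frac{\beta}{t}\right)\nabla g(x(t))+\epsilon(t)x(t),\ x(t)-x^*\right\rangle\quad\text{for every } t\geq t_0.\end{aligned} \tag{26}$$

We define $t_2:=\max\left(\beta,t_1,\frac{\beta a}{a-1}\right)$. By using condition (a), neglecting the nonpositive terms and afterwards integrating on the interval $[t_2,t]$, with arbitrary $t\geq t_2$, we obtain

$$\begin{aligned}\int_{t_2}^{t}bs\left(1-\frac{\beta}{s}\right)\langle\nabla g(x(s)),x(s)-x^*\rangle ds\leq{}&E_b(t_2)-E_b(t)+\int_{t_2}^{t}(2s-\beta(b+2-\alpha))\left(g(x(s))-\min g\right)ds\\&-\int_{t_2}^{t}bs\left(1-\frac{\beta}{s}\right)\epsilon(s)\langle x(s),x(s)-x^*\rangle ds+\int_{t_2}^{t}s\epsilon(s)\|x(s)\|^2ds.\end{aligned} \tag{27}$$

For every $s\geq t_2$, by the monotonicity of $\nabla g$, we have $\langle\nabla g(x(s)),x(s)-x^*\rangle\geq 0$. Further, it holds

$$\left|bs\left(1-\frac{\beta}{s}\right)\epsilon(s)\langle x(s),x(s)-x^*\rangle\right|\leq\left(1-\frac{\beta}{s}\right)\frac{bs\epsilon(s)}{2}\left(\|x(s)\|^2+\|x(s)-x^*\|^2\right).$$

By letting $t$ converge to $+\infty$ in (27) and by taking into account that, according to Theorem 3.3,

$$t\epsilon(t)\|x(t)\|^2,\ t\epsilon(t)\|x(t)-x^*\|^2,\ (2t-\beta(b+2-\alpha))\left(g(x(t))-\min g\right)\in L^1([t_0,+\infty),\mathbb{R}),$$

we obtain that

$$t\langle\nabla g(x(t)),x(t)-x^*\rangle\in L^1([t_0,+\infty),\mathbb{R}). \tag{28}$$

Condition (b) holds: In this case we estimate $-\beta\epsilon(t)t^2\langle\nabla g(x(t)),x(t)\rangle$ just as in (17) and from (25) we obtain

$$\begin{aligned}\dot{E}_b(t)\leq{}&(2t-\beta(b+2-\alpha))\left(g(x(t))-\min g\right)+\left(\frac{t^2\dot{\epsilon}(t)}{2}+t\epsilon(t)\right)\|x(t)\|^2+(b+1-\alpha)t\|\dot{x}(t)\|^2\\&+\left(\beta^2t-\beta t^2+\frac{\beta\epsilon(t)t^3}{2a}\right)\|\nabla g(x(t))\|^2+\frac{a\beta\epsilon(t)t}{2}\|x^*\|^2\\&-bt\left\langle\left(1-\frac{\beta}{t}\right)\nabla g(x(t))+\epsilon(t)x(t),\ x(t)-x^*\right\rangle\quad\text{for every } t\geq t_0.\end{aligned} \tag{29}$$

We define $t_2:=\max(4\beta,t_1)$. According to (19) we have that $\beta^2t-\beta t^2+\frac{\beta\epsilon(t)t^3}{2a}\leq 0$ for every $t\geq t_2$. By using condition (b), neglecting the nonpositive terms and afterwards integrating on the interval $[t_2,t]$, with arbitrary $t\geq t_2$, we obtain

$$\begin{aligned}\int_{t_2}^{t}bs\left(1-\frac{\beta}{s}\right)\langle\nabla g(x(s)),x(s)-x^*\rangle ds\leq{}&E_b(t_2)-E_b(t)+\int_{t_2}^{t}(2s-\beta(b+2-\alpha))\left(g(x(s))-\min g\right)ds\\&-\int_{t_2}^{t}bs\left(1-\frac{\beta}{s}\right)\epsilon(s)\langle x(s),x(s)-x^*\rangle ds+\int_{t_2}^{t}s\epsilon(s)\|x(s)\|^2ds+\frac{a\beta}{2}\|x^*\|^2\int_{t_2}^{t}s\epsilon(s)ds.\end{aligned} \tag{30}$$

From here, by using similar arguments as in case (a), we obtain (28).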

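Consider now $b_1,b_2\in(2,\alpha-1)$, $b_1\neq b_2$. Then for every $t\geq t_0$ we have

$$E_{b_1}(t)-E_{b_2}(t)=(b_1-b_2)\left(-\beta t\left(g(x(t))-\min g\right)+t\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle+\frac{\alpha-1}{2}\|x(t)-x^*\|^2\right).$$

According to Theorem 3.3, the limits

$$\lim_{t\to+\infty}\left(E_{b_1}(t)-E_{b_2}(t)\right)\in\mathbb{R}\quad\text{and}\quad\lim_{t\to+\infty}t\left(g(x(t))-\min g\right)\in\mathbb{R}$$

exist, consequently, the limit

$$\lim_{t\to+\infty}\left(t\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle+\frac{\alpha-1}{2}\|x(t)-x^*\|^2\right)$$

also exists. For every $t\geq t_0$ we define

$$k(t)=t\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle+\frac{\alpha-1}{2}\|x(t)-x^*\|^2$$

and

$$q(t)=\frac{1}{2}\|x(t)-x^*\|^2+\beta\int_{t_0}^{t}\langle\nabla g(x(s)),x(s)-x^*\rangle ds.$$

Then

$$(\alpha-1)q(t)+t\dot{q}(t)=k(t)+\beta(\alpha-1)\int_{t_0}^{t}\langle\nabla g(x(s)),x(s)-x^*\rangle ds\quad\text{for every } t\geq t_0.$$

From (28) and the fact that $k(t)$ has a limit as $t\to+\infty$, we obtain that $(\alpha-1)q(t)+t\dot{q}(t)$ has a limit as $t\to+\infty$. According to Lemma 4.6, $q(t)$ has a limit as $t\to+\infty$. By using (28) again we obtain that the limit

$$\lim_{t\to+\infty}\|x(t)-x^*\|\in\mathbb{R}$$

exists and, consequently, the limit

$$\lim_{t\to+\infty}t\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle\in\mathbb{R}$$

also exists. On the other hand, we notice that for every $t\geq t_0$ the energy functional can be written as

$$\begin{aligned}E_b(t)={}&(t^2-\beta(b+2-\alpha)t)\left(g(x(t))-\min g\right)+\frac{t^2\epsilon(t)}{2}\|x(t)\|^2+\frac{t^2}{2}\|\dot{x}(t)+\beta\nabla g(x(t))\|^2\\&+bt\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle+\frac{b(\alpha-1)}{2}\|x(t)-x^*\|^2.\end{aligned} \tag{31}$$

Since the limits

$$\lim_{t\to+\infty}E_b(t)\in\mathbb{R}\quad\text{and}\quad\lim_{t\to+\infty}\left(bt\langle\dot{x}(t)+\beta\nabla g(x(t)),x(t)-x^*\rangle+\frac{b(\alpha-1)}{2}\|x(t)-x^*\|^2\right)\in\mathbb{R}$$

exist, it follows that the limit

$$\lim_{t\to+\infty}\left((t^2-\beta(b+2-\alpha)t)\left(g(x(t))-\min g\right)+\frac{t^2\epsilon(t)}{2}\|x(t)\|^2+\frac{t^2}{2}\|\dot{x}(t)+\beta\nabla g(x(t))\|^2\right)\in\mathbb{R}$$

exists, too.

We define

$$\varphi:[t_0,+\infty)\to\mathbb{R},\quad\varphi(t)=(t^2-\beta(b+2-\alpha)t)\left(g(x(t))-\min g\right)+\frac{t^2\epsilon(t)}{2}\|x(t)\|^2+\frac{t^2}{2}\|\dot{x}(t)+\beta\nabla g(x(t))\|^2,$$

and notice that for sufficiently large $t$ it holds

$$0\leq\frac{\varphi(t)}{t}\leq 2t\left(g(x(t))-\min g\right)+\frac{t\epsilon(t)}{2}\|x(t)\|^2+\frac{t}{2}\|\dot{x}(t)+\beta\nabla g(x(t))\|^2.$$

According to Theorem 3.3 the right hand side of the above inequality is of class $L^1([t_0,+\infty),\mathbb{R})$.

Hence,

$$\frac{\varphi(t)}{t}\in L^1([t_0,+\infty),\mathbb{R}).$$

Since $\frac{1}{t}\notin L^1([t_0,+\infty),\mathbb{R})$ and the limit $\lim_{t\to+\infty}\varphi(t)\in\mathbb{R}$ exists, it must hold that $\lim_{t\to+\infty}\varphi(t)=0$. Consequently,

$$\lim_{t\to+\infty}(t^2-\beta(b+2-\alpha)t)\left(g(x(t))-\min g\right)=\lim_{t\to+\infty}\frac{t^2\epsilon(t)}{2}\|x(t)\|^2=\lim_{t\to+\infty}\frac{t^2}{2}\|\dot{x}(t)+\beta\nabla g(x(t))\|^2=0$$

and the proof is complete.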

Working in the hypotheses of Theorem 3.4 we can prove also the weak convergence of the trajectories generated by (5) to a minimizer of the objective function g.

Theorem 3.5

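Let $x$ be the unique global $C^2$-solution of (5). Assume that

$$\int_{t_0}^{+\infty}t\epsilon(t)dt<+\infty$$

and that one of the following conditions is fulfilled:

  (a) there exist $a>1$ and $t_1\geq t_0$ such that
    $\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)$ for every $t\geq t_1$;
  (b) there exist $a>0$ and $t_1\geq t_0$ such that
    $\epsilon(t)\leq\frac{a}{t}$ for every $t\geq t_1$.

If $\alpha>3$, then $x(t)$ converges weakly to an element in $\operatorname{argmin} g$ as $t\to+\infty$.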

Proof

We will apply the continuous version of the Opial Lemma (Lemma A.3) for $S=\operatorname{argmin} g$. According to Theorem 3.4, the limit

$$\lim_{t\to+\infty}\|x(t)-x^*\|\in\mathbb{R}$$

exists for every $x^*\in\operatorname{argmin} g$.

Further, let $\bar{x}\in\mathcal{H}$ be a weak sequential limit point of $x(t)$. This means that there exists a sequence $(t_n)_{n\in\mathbb{N}}\subseteq[t_0,+\infty)$ such that $\lim_{n\to+\infty}t_n=+\infty$ and $x(t_n)$ converges weakly to $\bar{x}$ as $n\to+\infty$. Since $g$ is weakly lower semicontinuous, we have that

$$g(\bar{x})\leq\liminf_{n\to+\infty}g(x(t_n)).$$

On the other hand, according to Theorem 3.3,

$$\lim_{t\to+\infty}g(x(t))=\min g,$$

consequently one has $g(\bar{x})\leq\min g$, which shows that $\bar{x}\in\operatorname{argmin} g$.

The convergence of the trajectory is a consequence of Lemma A.3.

Remark 3.6

We proved in this section that the convergence rate of $o\left(\frac{1}{t^2}\right)$ for $g(x(t))-\min g$, the convergence rate of $o\left(\frac{1}{t}\right)$ for $\|\dot{x}(t)+\beta\nabla g(x(t))\|$ and the weak convergence of the trajectory to a minimizer of $g$ that have been obtained in [11] for the dynamical system with Hessian driven damping (3) are preserved when this system is enhanced with a Tikhonov regularization term. In addition, in the case when the Hessian driven damping term is removed, which is the case when $\beta=0$, we recover the results provided in [8] for the dynamical system (4) with Tikhonov regularization term. In this setting, we have to assume in Theorem 3.1 just that $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt<+\infty$, and in Theorems 3.3 to 3.5 just that $\int_{t_0}^{+\infty}t\epsilon(t)dt<+\infty$, since condition (a) is automatically fulfilled.

Strong convergence to the minimum norm solution

In this section we will continue the investigations started at the end of Section 3, working in the same setting, on the behaviour of the trajectory of the dynamical system (5), now concentrating on strong convergence. In particular, we will provide conditions on the Tikhonov parametrization $t\mapsto\epsilon(t)$ which guarantee that the trajectory converges to a minimum norm solution of $g$, which is the element of minimum norm of the nonempty convex closed set $\operatorname{argmin} g$. We start with the following result.

Lemma 4.1

Let $x$ be the unique global $C^2$-solution of (5). For $x^*\in\operatorname{argmin} g$ we introduce the function

$$h_{x^*}:[t_0,+\infty)\to\mathbb{R},\quad h_{x^*}(t)=\frac{1}{2}\|x(t)-x^*\|^2.$$

If $\alpha>0$ and $\beta\geq 0$, then

$$\sup_{t\geq t_0}\|\dot{x}(t)\|<+\infty\quad\text{and}\quad\frac{1}{t}\|\dot{x}(t)\|^2\in L^1([t_0,+\infty),\mathbb{R}).$$

In addition,

$$\sup_{t\geq t_0}\frac{1}{t}|\dot{h}_{x^*}(t)|<+\infty.$$

Proof

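We consider the following energy functional

$$W:[t_0,+\infty)\to\mathbb{R},\quad W(t)=g(x(t))+\frac{1}{2}\|\dot{x}(t)\|^2+\frac{\epsilon(t)}{2}\|x(t)\|^2. \tag{32}$$

By using (5) we have for every $t\geq t_0$

$$\begin{aligned}\dot{W}(t)&=\langle\nabla g(x(t)),\dot{x}(t)\rangle+\langle\ddot{x}(t),\dot{x}(t)\rangle+\frac{\dot{\epsilon}(t)}{2}\|x(t)\|^2+\epsilon(t)\langle\dot{x}(t),x(t)\rangle\\&=\langle\nabla g(x(t)),\dot{x}(t)\rangle+\frac{\dot{\epsilon}(t)}{2}\|x(t)\|^2+\epsilon(t)\langle\dot{x}(t),x(t)\rangle+\left\langle-\frac{\alpha}{t}\dot{x}(t)-\beta\nabla^2 g(x(t))\dot{x}(t)-\nabla g(x(t))-\epsilon(t)x(t),\ \dot{x}(t)\right\rangle\\&=-\frac{\alpha}{t}\|\dot{x}(t)\|^2+\frac{\dot{\epsilon}(t)}{2}\|x(t)\|^2-\beta\langle\nabla^2 g(x(t))\dot{x}(t),\dot{x}(t)\rangle.\end{aligned}$$

From here, invoking the convexity of $g$, it follows

$$\dot{W}(t)\leq-\frac{\alpha}{t}\|\dot{x}(t)\|^2+\frac{\dot{\epsilon}(t)}{2}\|x(t)\|^2, \tag{33}$$

for every $t\geq t_0$. Since $\epsilon$ is nonincreasing this leads further to

$$\dot{W}(t)\leq-\frac{\alpha}{t}\|\dot{x}(t)\|^2\quad\text{for every } t\geq t_0, \tag{34}$$

therefore the energy $W$ is nonincreasing. Since $W$ is bounded from below, there exists $\lim_{t\to+\infty}W(t)\in\mathbb{R}$. Consequently, $t\mapsto W(t)$ is bounded on $[t_0,+\infty)$ from which, since $g$ is bounded from below, we obtain that

$$\sup_{t\geq t_0}\|\dot{x}(t)\|=K<+\infty.$$

By integrating (34) on an interval $[t_0,t]$ for arbitrary $t>t_0$ it yields

$$\int_{t_0}^{t}\frac{\alpha}{s}\|\dot{x}(s)\|^2ds\leq W(t_0)-W(t),$$

which, by letting $t\to+\infty$, leads to

$$\frac{1}{t}\|\dot{x}(t)\|^2\in L^1([t_0,+\infty),\mathbb{R}).$$

Further, for every $t\geq t_0$ we have that

$$|\dot{h}_{x^*}(t)|=|\langle\dot{x}(t),x(t)-x^*\rangle|\leq\|\dot{x}(t)\|\,\|x(t)-x^*\|$$

and

$$\|x(t)-x^*\|\leq\|x(t)-x(t_0)\|+\|x(t_0)-x^*\|\leq\sup_{t\geq t_0}\|\dot{x}(t)\|(t-t_0)+\|x(t_0)-x^*\|,$$

hence,

$$\frac{1}{t}|\dot{h}_{x^*}(t)|\leq\sup_{t\geq t_0}\|\dot{x}(t)\|\left(\sup_{t\geq t_0}\|\dot{x}(t)\|\left(1-\frac{t_0}{t}\right)+\frac{1}{t}\|x(t_0)-x^*\|\right)\leq\sup_{t\geq t_0}\|\dot{x}(t)\|\left(\sup_{t\geq t_0}\|\dot{x}(t)\|+\frac{1}{t_0}\|x(t_0)-x^*\|\right)<+\infty.$$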

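For each $\epsilon>0$, we denote by $x_\epsilon$ the unique solution of the strongly convex minimization problem

$$x_\epsilon=\operatorname{argmin}_{x\in\mathcal{H}}\left(g(x)+\frac{\epsilon}{2}\|x\|^2\right).$$

In virtue of the Fermat rule, this is equivalent to

$$\nabla g(x_\epsilon)+\epsilon x_\epsilon=0.$$

It is well known that the Tikhonov approximation curve $\epsilon\mapsto x_\epsilon$ satisfies $\lim_{\epsilon\to 0}x_\epsilon=x^*$, where $x^*=\operatorname{argmin}\{\|x\|:x\in\operatorname{argmin} g\}$ is the element of minimum norm of the nonempty convex closed set $\operatorname{argmin} g$. Since $\nabla g$ is monotone, for every $\epsilon>0$ it holds $\langle\nabla g(x_\epsilon)-\nabla g(x^*),x_\epsilon-x^*\rangle\geq 0$, that is $\langle-\epsilon x_\epsilon,x_\epsilon-x^*\rangle\geq 0$. Hence, $-\|x_\epsilon\|^2+\langle x_\epsilon,x^*\rangle\geq 0$, which, by using the Cauchy-Schwarz inequality, implies

$$\|x_\epsilon\|\leq\|x^*\|\quad\text{for every } \epsilon>0.$$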

Strong ergodic convergence

We will start by proving a strong ergodic convergence result for the trajectory of (5).

Theorem 4.2

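Let $x$ be the unique global $C^2$-solution of (5). Assume that

$$\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt=+\infty.$$

Let $x^*=\operatorname{argmin}\{\|x\|:x\in\operatorname{argmin} g\}$ be the element of minimum norm of the nonempty convex closed set $\operatorname{argmin} g$. If $\alpha>0$, then

$$\lim_{t\to+\infty}\frac{1}{\int_{t_0}^{t}\frac{\epsilon(s)}{s}ds}\int_{t_0}^{t}\frac{\epsilon(s)}{s}\|x(s)-x^*\|^2ds=0\quad\text{and}\quad\liminf_{t\to+\infty}\|x(t)-x^*\|=0.$$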

Proof

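We introduce the function

$$h_{x^*}:[t_0,+\infty)\to\mathbb{R},\quad h_{x^*}(t)=\frac{1}{2}\|x(t)-x^*\|^2.$$

For every $t\geq t_0$ we have

$$\ddot{h}_{x^*}(t)+\frac{\alpha}{t}\dot{h}_{x^*}(t)=\|\dot{x}(t)\|^2+\left\langle\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t),\ x(t)-x^*\right\rangle. \tag{35}$$

Further, for every $t\geq t_0$, the function $g_t:\mathcal{H}\to\mathbb{R}$, $g_t(x)=g(x)+\frac{\epsilon(t)}{2}\|x\|^2$, is strongly convex with modulus $\epsilon(t)$, hence

$$g_t(x^*)-g_t(x(t))\geq\langle\nabla g_t(x(t)),x^*-x(t)\rangle+\frac{\epsilon(t)}{2}\|x(t)-x^*\|^2. \tag{36}$$

But $\nabla g_t(x(t))=\nabla g(x(t))+\epsilon(t)x(t)$ and by using (5) we get

$$\nabla g_t(x(t))=-\ddot{x}(t)-\frac{\alpha}{t}\dot{x}(t)-\beta\nabla^2 g(x(t))\dot{x}(t)\quad\text{for every } t\geq t_0.$$

Consequently, (36) becomes

$$g_t(x^*)-g_t(x(t))\geq\left\langle\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\beta\nabla^2 g(x(t))\dot{x}(t),\ x(t)-x^*\right\rangle+\frac{\epsilon(t)}{2}\|x(t)-x^*\|^2\quad\text{for every } t\geq t_0. \tag{37}$$

By using (35), the latter relation leads to

$$g_t(x^*)-g_t(x(t))\geq\ddot{h}_{x^*}(t)+\frac{\alpha}{t}\dot{h}_{x^*}(t)+\epsilon(t)h_{x^*}(t)+\beta\langle\nabla^2 g(x(t))\dot{x}(t),x(t)-x^*\rangle-\|\dot{x}(t)\|^2 \tag{38}$$

for every $t\geq t_0$.

For every $t\geq t_0$, let $x_{\epsilon(t)}$ be the unique solution of the strongly convex minimization problem

$$\min_{x\in\mathcal{H}}\left(g(x)+\frac{\epsilon(t)}{2}\|x\|^2\right).$$

Then

$$g_t(x^*)-g_t(x(t))\leq g_t(x^*)-g_t(x_{\epsilon(t)})=g(x^*)+\frac{\epsilon(t)}{2}\|x^*\|^2-g(x_{\epsilon(t)})-\frac{\epsilon(t)}{2}\|x_{\epsilon(t)}\|^2\leq\frac{\epsilon(t)}{2}\left(\|x^*\|^2-\|x_{\epsilon(t)}\|^2\right)$$

for every $t\geq t_0$ and taking into account (38) we get

$$\frac{\epsilon(t)}{2}\left(\|x^*\|^2-\|x_{\epsilon(t)}\|^2\right)\geq\ddot{h}_{x^*}(t)+\frac{\alpha}{t}\dot{h}_{x^*}(t)+\epsilon(t)h_{x^*}(t)+\beta\langle\nabla^2 g(x(t))\dot{x}(t),x(t)-x^*\rangle-\|\dot{x}(t)\|^2 \tag{39}$$

for every $t\geq t_0$. We have

$$\ddot{h}_{x^*}(t)+\frac{\alpha}{t}\dot{h}_{x^*}(t)=\frac{1}{t^\alpha}\frac{d}{dt}\left(t^\alpha\dot{h}_{x^*}(t)\right)$$

and

$$\langle\nabla^2 g(x(t))\dot{x}(t),x(t)-x^*\rangle=\frac{d}{dt}\left(\langle\nabla g(x(t)),x(t)-x^*\rangle-g(x(t))\right),$$

hence (39) is equivalent to

$$\frac{\epsilon(t)}{t}\left(h_{x^*}(t)-\frac{1}{2}\left(\|x^*\|^2-\|x_{\epsilon(t)}\|^2\right)\right)\leq\frac{1}{t}\|\dot{x}(t)\|^2-\frac{1}{t^{\alpha+1}}\frac{d}{dt}\left(t^\alpha\dot{h}_{x^*}(t)\right)-\frac{\beta}{t}\frac{d}{dt}\left(\langle\nabla g(x(t)),x(t)-x^*\rangle-g(x(t))\right), \tag{40}$$

for every $t\geq t_0$.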

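After integrating (40) on $[t_0,t]$, for arbitrary $t>t_0$, it yields

$$\begin{aligned}\int_{t_0}^{t}\frac{\epsilon(s)}{s}\left(h_{x^*}(s)-\frac{1}{2}\left(\|x^*\|^2-\|x_{\epsilon(s)}\|^2\right)\right)ds\leq{}&\int_{t_0}^{t}\left(\frac{1}{s}\|\dot{x}(s)\|^2-\frac{1}{s^{\alpha+1}}\frac{d}{ds}\left(s^\alpha\dot{h}_{x^*}(s)\right)\right)ds\\&+\int_{t_0}^{t}\frac{\beta}{s}\frac{d}{ds}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds.\end{aligned} \tag{41}$$

We show that the right-hand side of the above inequality is bounded from above. Indeed, according to Lemma 4.1, one has

$$\frac{1}{t}\|\dot{x}(t)\|^2\in L^1([t_0,+\infty),\mathbb{R}),$$

hence there exists $C_1\geq 0$ such that $\int_{t_0}^{t}\frac{1}{s}\|\dot{x}(s)\|^2ds\leq C_1$ for every $t\geq t_0$. Further, for every $t\geq t_0$,

$$\begin{aligned}\int_{t_0}^{t}\frac{1}{s^{\alpha+1}}\frac{d}{ds}\left(s^\alpha\dot{h}_{x^*}(s)\right)ds&=\frac{\dot{h}_{x^*}(t)}{t}-\frac{\dot{h}_{x^*}(t_0)}{t_0}+(\alpha+1)\int_{t_0}^{t}\frac{\dot{h}_{x^*}(s)}{s^2}ds\\&=\frac{\dot{h}_{x^*}(t)}{t}-\frac{\dot{h}_{x^*}(t_0)}{t_0}+(\alpha+1)\left(\frac{h_{x^*}(t)}{t^2}-\frac{h_{x^*}(t_0)}{t_0^2}\right)+2(\alpha+1)\int_{t_0}^{t}\frac{h_{x^*}(s)}{s^3}ds\\&\geq\frac{\dot{h}_{x^*}(t)}{t}-C_2,\end{aligned}$$

where $C_2=\frac{\dot{h}_{x^*}(t_0)}{t_0}+(\alpha+1)\frac{h_{x^*}(t_0)}{t_0^2}$. Consequently,

$$\int_{t_0}^{t}\frac{\epsilon(s)}{s}\left(h_{x^*}(s)-\frac{1}{2}\left(\|x^*\|^2-\|x_{\epsilon(s)}\|^2\right)\right)ds\leq C_1+C_2-\frac{\dot{h}_{x^*}(t)}{t}+\int_{t_0}^{t}\frac{\beta}{s}\frac{d}{ds}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds, \tag{42}$$

for every $t\geq t_0$. According to Lemma 4.1, there exists $C_3$ such that $\frac{1}{t}|\dot{h}_{x^*}(t)|\leq C_3$ for all $t\geq t_0$, which combined with (42) guarantees the existence of $C_4\geq 0$ such that

$$\int_{t_0}^{t}\frac{\epsilon(s)}{s}\left(h_{x^*}(s)-\frac{1}{2}\left(\|x^*\|^2-\|x_{\epsilon(s)}\|^2\right)\right)ds\leq C_4+\int_{t_0}^{t}\frac{\beta}{s}\frac{d}{ds}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds \tag{43}$$

for every $t\geq t_0$.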

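On the other hand, for every $t\geq t_0$,

$$\begin{aligned}\int_{t_0}^{t}\frac{\beta}{s}\frac{d}{ds}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds={}&\int_{t_0}^{t}\frac{\beta}{s^2}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds\\&+\frac{\beta}{t}\left(\langle\nabla g(x(t)),x^*-x(t)\rangle+g(x(t))\right)-\frac{\beta}{t_0}\left(\langle\nabla g(x(t_0)),x^*-x(t_0)\rangle+g(x(t_0))\right).\end{aligned}$$

From the gradient inequality of the convex function $g$ we have

$$\langle\nabla g(x(t)),x^*-x(t)\rangle+g(x(t))\leq g(x^*),$$

hence

$$\int_{t_0}^{t}\frac{\beta}{s}\frac{d}{ds}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds\leq\frac{\beta}{t}g(x^*)+\int_{t_0}^{t}\frac{\beta}{s^2}g(x^*)ds-\frac{\beta}{t_0}\left(\langle\nabla g(x(t_0)),x^*-x(t_0)\rangle+g(x(t_0))\right), \tag{44}$$

for all $t\geq t_0$. Obviously the right-hand side of (44) is bounded from above, hence there exists $C_5>0$ such that

$$\int_{t_0}^{t}\frac{\beta}{s}\frac{d}{ds}\left(\langle\nabla g(x(s)),x^*-x(s)\rangle+g(x(s))\right)ds\leq C_5\quad\text{for every } t\geq t_0. \tag{45}$$

Combining (43) and (45) we obtain that there exists $C>0$ such that

$$\int_{t_0}^{t}\frac{\epsilon(s)}{s}\left(h_{x^*}(s)-\frac{1}{2}\left(\|x^*\|^2-\|x_{\epsilon(s)}\|^2\right)\right)ds\leq C\quad\text{for every } t\geq t_0. \tag{46}$$

Since $\lim_{t\to+\infty}\epsilon(t)=0$ we have $\lim_{t\to+\infty}x_{\epsilon(t)}=x^*$, hence $\lim_{t\to+\infty}\left(\|x^*\|^2-\|x_{\epsilon(t)}\|^2\right)=0$. Consequently, by using the l'Hospital rule and the fact that $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt=+\infty$, we get

$$\lim_{t\to+\infty}\frac{1}{\int_{t_0}^{t}\frac{\epsilon(s)}{s}ds}\int_{t_0}^{t}\frac{\epsilon(s)}{s}\left(\|x^*\|^2-\|x_{\epsilon(s)}\|^2\right)ds=\lim_{t\to+\infty}\frac{\frac{\epsilon(t)}{t}\left(\|x^*\|^2-\|x_{\epsilon(t)}\|^2\right)}{\frac{\epsilon(t)}{t}}=\lim_{t\to+\infty}\left(\|x^*\|^2-\|x_{\epsilon(t)}\|^2\right)=0.$$

Dividing (46) by $\int_{t_0}^{t}\frac{\epsilon(s)}{s}ds$ and taking into account that $\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt=+\infty$, we obtain that

$$\lim_{t\to+\infty}\frac{1}{\int_{t_0}^{t}\frac{\epsilon(s)}{s}ds}\int_{t_0}^{t}\frac{\epsilon(s)}{s}\|x(s)-x^*\|^2ds=0.$$

The last equality immediately implies that

$$\liminf_{t\to+\infty}\|x(t)-x^*\|=0.$$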

Remark 4.3

The strong ergodic convergence obtained in [8] for the dynamical system (4) is extended to the dynamical system with Hessian driven damping and Tikhonov regularization term (5) under the same hypotheses concerning the Tikhonov parametrization $t\mapsto\epsilon(t)$.

Strong convergence

In order to prove strong convergence for the trajectory generated by the dynamical system (5) to an element of minimum norm of $\operatorname{argmin} g$ we have to strengthen the conditions on the Tikhonov parametrization. This is done in the following result.

Theorem 4.4

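Let $\alpha\geq 3$ and let $x$ be the unique global $C^2$-solution of (5). Assume that

$$\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}dt<+\infty\quad\text{and}\quad\lim_{t\to+\infty}\frac{\beta}{\epsilon(t)t^{\frac{\alpha}{3}+1}}\int_{t_0}^{t}\epsilon^2(s)s^{\frac{\alpha}{3}+1}ds=0,$$

and that there exist $a>1$ and $t_1\geq t_0$ such that

$$\dot{\epsilon}(t)\leq-\frac{a\beta}{2}\epsilon^2(t)\quad\text{for every } t\geq t_1.$$

In addition, assume that

  • in case $\alpha=3$: $\lim_{t\to+\infty}t^2\epsilon(t)=+\infty$;

  • in case $\alpha>3$: there exists $c>0$ such that $t^2\epsilon(t)\geq\frac{2}{3}\alpha\left(\frac{1}{3}\alpha-1+\beta c^2\right)$ for $t$ large enough.

If $x^*=\operatorname{argmin}\{\|x\|:x\in\operatorname{argmin} g\}$ is the element of minimum norm of the nonempty convex closed set $\operatorname{argmin} g$, then

$$\liminf_{t\to+\infty}\|x(t)-x^*\|=0.$$

In addition,

$$\lim_{t\to+\infty}\|x(t)-x^*\|=0,$$

if there exists $T\geq t_0$ such that the trajectory $\{x(t):t\geq T\}$ stays either in the ball $B(0,\|x^*\|)$, or in its complement.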

Proof

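Case I Assume that there exists $T\ge t_0$ such that the trajectory $\{x(t):t\ge T\}$ stays in the complement of the ball $B(0,\|x^*\|)$.

In other words, $\|x(t)\|\ge\|x^*\|$ for every $t\ge T$. For $p\ge 0$, we consider the energy functional

$$E_b^p(t)=t^{p+1}(t+\alpha-\beta-\beta p-b-1)(g(x(t))-\min g)+t^{p+2}\frac{\epsilon(t)}{2}\left(\|x(t)\|^2-\|x^*\|^2\right)+\frac{t^p}{2}\left\|b(x(t)-x^*)+t\big(\dot x(t)+\beta\nabla g(x(t))\big)\right\|^2\quad\text{for every } t\ge t_0. \tag{47}$$

We define $t_2:=\max\{t_1,\,2(\beta+\beta p+b+1-\alpha)\}$, so that $t+\alpha-\beta-\beta p-b-1\ge\frac{t}{2}$ for every $t\ge t_2$. We have that

$$E_b^p(t)\ge t^{p+1}(t+\alpha-\beta-\beta p-b-1)(g(x(t))-\min g)+t^{p+2}\frac{\epsilon(t)}{2}\left(\|x(t)\|^2-\|x^*\|^2\right)\ge t^{p+2}\,\frac12(g(x(t))-\min g)+t^{p+2}\frac{\epsilon(t)}{2}\left(\|x(t)\|^2-\|x^*\|^2\right)\quad\text{for every } t\ge t_2. \tag{48}$$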

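For every $t\ge t_0$ consider the strongly convex function

$$g_t:H\to\mathbb{R},\quad g_t(x)=\frac12 g(x)+\frac{\epsilon(t)}{2}\|x\|^2,$$

and denote

$$x_{\epsilon(t)}:=\operatorname{argmin}_{x\in H}g_t(x).$$

Since $x^*$ is the element of minimum norm in $\operatorname{argmin}\frac12 g=\operatorname{argmin} g$, it holds $\|x_{\epsilon(t)}\|\le\|x^*\|$. Using the gradient inequality we have

$$g_t(x)-g_t(x_{\epsilon(t)})\ge\frac{\epsilon(t)}{2}\|x-x_{\epsilon(t)}\|^2\quad\text{for every } x\in H.$$

On the other hand,

$$g_t(x_{\epsilon(t)})-g_t(x^*)=\frac12\left(g(x_{\epsilon(t)})-\min g\right)+\frac{\epsilon(t)}{2}\left(\|x_{\epsilon(t)}\|^2-\|x^*\|^2\right)\ge\frac{\epsilon(t)}{2}\left(\|x_{\epsilon(t)}\|^2-\|x^*\|^2\right).$$

By adding the last two inequalities we obtain

$$g_t(x)-g_t(x^*)\ge\frac{\epsilon(t)}{2}\left(\|x-x_{\epsilon(t)}\|^2+\|x_{\epsilon(t)}\|^2-\|x^*\|^2\right)\quad\text{for every } x\in H. \tag{49}$$

From (48) and (49) we have that for every $t\ge t_2$ it holds

$$E_b^p(t)\ge t^{p+2}\left(g_t(x(t))-g_t(x^*)\right)\ge\frac{\epsilon(t)}{2}t^{p+2}\left(\|x(t)-x_{\epsilon(t)}\|^2+\|x_{\epsilon(t)}\|^2-\|x^*\|^2\right). \tag{50}$$

The next step is to obtain an upper bound for $t\mapsto E_b^p(t)$, and to this end we will evaluate its time derivative. For every $t\ge t_0$ we have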

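$$\begin{aligned}
\frac{d}{dt}E_b^p(t)={}&t^p\big((p+2)t+(p+1)(\alpha-\beta-\beta p-b-1)\big)(g(x(t))-\min g)\\
&+t^{p+1}(t+\alpha-\beta-\beta p-b-1)\langle\nabla g(x(t)),\dot x(t)\rangle\\
&+\left((p+2)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}\right)\left(\|x(t)\|^2-\|x^*\|^2\right)+t^{p+2}\epsilon(t)\langle\dot x(t),x(t)\rangle\\
&+\frac{p\,t^{p-1}}{2}\left\|b(x(t)-x^*)+t\big(\dot x(t)+\beta\nabla g(x(t))\big)\right\|^2\\
&+t^p\Big\langle(b+1)\dot x(t)+\beta\nabla g(x(t))+t\big(\ddot x(t)+\beta\nabla^2g(x(t))\dot x(t)\big),\ b(x(t)-x^*)+t\big(\dot x(t)+\beta\nabla g(x(t))\big)\Big\rangle.
\end{aligned} \tag{51}$$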

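By using (5) we have

$$\ddot x(t)+\beta\nabla^2g(x(t))\dot x(t)=-\frac{\alpha}{t}\dot x(t)-\nabla g(x(t))-\epsilon(t)x(t),$$

hence

$$\begin{aligned}
&\Big\langle(b+1)\dot x(t)+\beta\nabla g(x(t))+t\big(\ddot x(t)+\beta\nabla^2g(x(t))\dot x(t)\big),\ b(x(t)-x^*)+t\big(\dot x(t)+\beta\nabla g(x(t))\big)\Big\rangle\\
&\quad=\Big\langle(b+1-\alpha)\dot x(t)+\beta\nabla g(x(t))-t\big(\nabla g(x(t))+\epsilon(t)x(t)\big),\ b(x(t)-x^*)+t\big(\dot x(t)+\beta\nabla g(x(t))\big)\Big\rangle\\
&\quad=b(b+1-\alpha)\langle\dot x(t),x(t)-x^*\rangle+(b+1-\alpha)t\big(\|\dot x(t)\|^2+\langle\nabla g(x(t)),\dot x(t)\rangle\big)+\beta b\langle\nabla g(x(t)),x(t)-x^*\rangle\\
&\qquad+\beta t\langle\nabla g(x(t)),\dot x(t)\rangle+\beta^2t\|\nabla g(x(t))\|^2-bt\langle\nabla g(x(t))+\epsilon(t)x(t),x(t)-x^*\rangle\\
&\qquad-t^2\langle\nabla g(x(t))+\epsilon(t)x(t),\dot x(t)\rangle-\beta t^2\langle\nabla g(x(t))+\epsilon(t)x(t),\nabla g(x(t))\rangle
\end{aligned} \tag{52}$$

for every $t\ge t_0$. Further, for every $t\ge t_0$,

$$\left\|b(x(t)-x^*)+t\big(\dot x(t)+\beta\nabla g(x(t))\big)\right\|^2=b^2\|x(t)-x^*\|^2+2bt\langle\dot x(t),x(t)-x^*\rangle+2b\beta t\langle\nabla g(x(t)),x(t)-x^*\rangle+t^2\|\dot x(t)\|^2+2\beta t^2\langle\nabla g(x(t)),\dot x(t)\rangle+\beta^2t^2\|\nabla g(x(t))\|^2, \tag{53}$$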

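which means that (51) becomes

$$\begin{aligned}
\frac{d}{dt}E_b^p(t)={}&t^p\big((p+2)t+(p+1)(\alpha-\beta-\beta p-b-1)\big)(g(x(t))-\min g)\\
&+\left((p+2)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}\right)\left(\|x(t)\|^2-\|x^*\|^2\right)\\
&+\frac{b^2p\,t^{p-1}}{2}\|x(t)-x^*\|^2+\frac{(p+2)\beta^2t^{p+1}}{2}\|\nabla g(x(t))\|^2+\left(b+1-\alpha+\frac p2\right)t^{p+1}\|\dot x(t)\|^2\\
&+b(b+1-\alpha+p)t^p\langle\dot x(t),x(t)-x^*\rangle+b\beta(p+1)t^p\langle\nabla g(x(t)),x(t)-x^*\rangle\\
&-bt^{p+1}\langle\nabla g(x(t))+\epsilon(t)x(t),x(t)-x^*\rangle-\beta t^{p+2}\langle\nabla g(x(t))+\epsilon(t)x(t),\nabla g(x(t))\rangle.
\end{aligned} \tag{54}$$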

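The gradient inequality for the strongly convex function $x\mapsto g(x)+\frac{\epsilon(t)}{2}\|x\|^2$ gives

$$\langle\nabla g(x(t))+\epsilon(t)x(t),x^*-x(t)\rangle+\frac{\epsilon(t)}{2}\|x(t)-x^*\|^2\le g(x^*)+\frac{\epsilon(t)}{2}\|x^*\|^2-\left(g(x(t))+\frac{\epsilon(t)}{2}\|x(t)\|^2\right),$$

hence

$$-bt^{p+1}\langle\nabla g(x(t))+\epsilon(t)x(t),x(t)-x^*\rangle\le-bt^{p+1}\big(g(x(t))-g(x^*)\big)-bt^{p+1}\frac{\epsilon(t)}{2}\left(\|x(t)\|^2-\|x^*\|^2\right)-bt^{p+1}\frac{\epsilon(t)}{2}\|x(t)-x^*\|^2$$

for every $t\ge t_0$. Plugging this inequality into (54) gives

$$\begin{aligned}
\frac{d}{dt}E_b^p(t)\le{}&t^p\big((p+2-b)t+(p+1)(\alpha-\beta-\beta p-b-1)\big)(g(x(t))-\min g)\\
&+\left((p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}\right)\left(\|x(t)\|^2-\|x^*\|^2\right)\\
&+\left(\frac{b^2p\,t^{p-1}}{2}-bt^{p+1}\frac{\epsilon(t)}{2}\right)\|x(t)-x^*\|^2+\left(\frac{(p+2)\beta^2t^{p+1}}{2}-\beta t^{p+2}\right)\|\nabla g(x(t))\|^2\\
&+\left(b+1-\alpha+\frac p2\right)t^{p+1}\|\dot x(t)\|^2+b(b+1-\alpha+p)t^p\langle\dot x(t),x(t)-x^*\rangle\\
&+b\beta(p+1)t^p\langle\nabla g(x(t)),x(t)-x^*\rangle-\beta t^{p+2}\epsilon(t)\langle\nabla g(x(t)),x(t)\rangle
\end{aligned} \tag{55}$$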

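for every $t\ge t_0$. Further we have for every $t\ge t_0$

$$b\beta(p+1)t^p\langle\nabla g(x(t)),x(t)-x^*\rangle\le\frac{b\beta(p+1)}{4c^2}t^{p+1}\|\nabla g(x(t))\|^2+b\beta(p+1)c^2t^{p-1}\|x(t)-x^*\|^2 \tag{56}$$

and

$$-\beta t^{p+2}\epsilon(t)\langle\nabla g(x(t)),x(t)\rangle\le\frac{\beta}{a}t^{p+2}\|\nabla g(x(t))\|^2+\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\|x(t)\|^2, \tag{57}$$

where a>1 and c>0 are the constants which are assumed to exist in the hypotheses of the theorem, whereby in case α=3 we will take c=1.

Combining (55), (56) and (57) and neglecting the nonpositive terms we derive

$$\begin{aligned}
\frac{d}{dt}E_b^p(t)\le{}&t^p\big((p+2-b)t+(p+1)(\alpha-\beta-\beta p-b-1)\big)(g(x(t))-\min g)\\
&+\left((p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}+\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\right)\|x(t)\|^2-\left((p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}\right)\|x^*\|^2\\
&+\left(\frac{b^2p\,t^{p-1}}{2}+b\beta(p+1)c^2t^{p-1}-bt^{p+1}\frac{\epsilon(t)}{2}\right)\|x(t)-x^*\|^2\\
&+\left(\frac{(p+2)\beta^2t^{p+1}}{2}+\frac{b\beta(p+1)}{4c^2}t^{p+1}-\beta\left(1-\frac1a\right)t^{p+2}\right)\|\nabla g(x(t))\|^2\\
&+\left(b+1-\alpha+\frac p2\right)t^{p+1}\|\dot x(t)\|^2+b(b+1-\alpha+p)t^p\langle\dot x(t),x(t)-x^*\rangle
\end{aligned} \tag{58}$$

for every $t\ge t_0$.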

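For the remainder of the proof we choose the parameters appearing in the definition of the energy functional as

$$b:=\frac23\alpha\quad\text{and}\quad p:=\frac13(\alpha-3).$$

Since $\alpha\ge 3$, we have

$$p+2-b=1-\frac{\alpha}{3}\le 0,\quad b+1+p-\alpha=0\quad\text{and}\quad b+1+\frac p2-\alpha=-\frac p2\le 0.$$

Notice that, if $\alpha=3$, then $(p+2-b)t+(p+1)(\alpha-\beta-\beta p-b-1)=-\beta\le 0$ and, if $\alpha>3$, then $p+2-b<0$. This means that there exists $t_3\ge t_2$ such that $(p+2-b)t+(p+1)(\alpha-\beta-\beta p-b-1)\le 0$ for every $t\ge t_3$. This implies that the term

$$t^p\big((p+2-b)t+(p+1)(\alpha-\beta-\beta p-b-1)\big)(g(x(t))-\min g)$$

in (58) is nonpositive for every $t\ge t_3$ and therefore we will omit it. Further, using that $\lim_{t\to+\infty}t^2\epsilon(t)=+\infty$, if $\alpha=3$, and that $t^2\epsilon(t)\ge\frac23\alpha\left(\frac13\alpha-1+\beta c^2\right)$ for $t$ large enough, if $\alpha>3$, we immediately see that there exists $t_4\ge t_3$ such that

$$\frac{b^2p\,t^{p-1}}{2}+b\beta(p+1)c^2t^{p-1}-bt^{p+1}\frac{\epsilon(t)}{2}\le 0\quad\text{for every } t\ge t_4.$$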
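
To make the link with the hypotheses of the theorem explicit, here is a short check that uses only the choices $b=\frac23\alpha$ and $p=\frac13(\alpha-3)$ made above. Dividing the last inequality by $\frac b2t^{p-1}>0$ shows that it is equivalent to

$$t^2\epsilon(t)\ge bp+2\beta(p+1)c^2=\frac23\alpha\left(\frac13\alpha-1\right)+\frac23\alpha\,\beta c^2=\frac23\alpha\left(\frac13\alpha-1+\beta c^2\right),$$

which is exactly the lower bound assumed in case $\alpha>3$. In case $\alpha=3$ (so $p=0$, $b=2$, $c=1$) it reduces to $t^2\epsilon(t)\ge 2\beta$, which holds for $t$ large enough because $t^2\epsilon(t)\to+\infty$.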

Finally, since $a>1$, it is obvious that there exists $t_5\ge t_4$ such that

$$\frac{(p+2)\beta^2t^{p+1}}{2}+\frac{b\beta(p+1)}{4c^2}t^{p+1}-\beta\left(1-\frac1a\right)t^{p+2}\le 0\quad\text{for every } t\ge t_5.$$

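Thus, (58) yields

$$\begin{aligned}
\frac{d}{dt}E_b^p(t)&\le\left((p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}+\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\right)\|x(t)\|^2-\left((p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}\right)\|x^*\|^2\\
&=\left((p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}+\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\right)\left(\|x(t)\|^2-\|x^*\|^2\right)+\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\|x^*\|^2,
\end{aligned} \tag{59}$$

for every $t\ge t_5$. By the hypotheses, we have that

$$(p+2-b)t^{p+1}\frac{\epsilon(t)}{2}+t^{p+2}\frac{\dot\epsilon(t)}{2}+\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\le 0,$$

for every $t\ge t_5$ and, taking into account the setting considered in this first case, it follows that there exists $t_6\ge t_5$ such that

$$\|x(t)\|^2-\|x^*\|^2\ge 0$$

for every $t\ge t_6$. Hence, (59) leads to

$$\frac{d}{dt}E_b^p(t)\le\frac{a\beta}{4}\epsilon^2(t)t^{p+2}\|x^*\|^2\quad\text{for every } t\ge t_6. \tag{60}$$

By integrating (60) on the interval $[t_6,t]$, for arbitrary $t\ge t_6$, we get

$$E_b^p(t)\le E_b^p(t_6)+\frac{a\beta}{4}\|x^*\|^2\int_{t_6}^{t}\epsilon^2(s)s^{p+2}\,ds. \tag{61}$$

Recall that from (50) we have

$$E_b^p(t)\ge\frac{\epsilon(t)}{2}t^{p+2}\left(\|x(t)-x_{\epsilon(t)}\|^2+\|x_{\epsilon(t)}\|^2-\|x^*\|^2\right),$$

which, combined with (61) and $p+2=\frac13\alpha+1$, gives for every $t\ge t_6$ that

$$\|x(t)-x_{\epsilon(t)}\|^2\le\|x^*\|^2-\|x_{\epsilon(t)}\|^2+\frac{2E_b^p(t_6)}{\epsilon(t)\,t^{\frac13\alpha+1}}+\frac{a\beta}{2\epsilon(t)\,t^{\frac13\alpha+1}}\|x^*\|^2\int_{t_6}^{t}\epsilon^2(s)s^{\frac13\alpha+1}\,ds. \tag{62}$$

Using that $\lim_{t\to+\infty}\epsilon(t)t^{\frac13\alpha+1}=+\infty$, $\lim_{t\to+\infty}x_{\epsilon(t)}=x^*$ and taking into account the hypotheses of the theorem, we get that the right-hand side of (62) converges to 0 as $t\to+\infty$. This yields

$$\lim_{t\to+\infty}x(t)=x^*.$$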

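Case II Assume that there exists $T\ge t_0$ such that the trajectory $\{x(t):t\ge T\}$ stays in the ball $B(0,\|x^*\|)$.

In other words, $\|x(t)\|<\|x^*\|$ for every $t\ge T$. Since

$$\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}\,dt<+\infty,$$

according to Theorem 3.1, we have

$$\lim_{t\to+\infty}g(x(t))=\min g.$$

Consider $\bar x\in H$ a weak sequential cluster point of the trajectory $x$, which exists since the trajectory is bounded. This means that there exists a sequence $(t_n)_{n\in\mathbb{N}}\subseteq[T,+\infty)$ such that $t_n\to+\infty$ and $x(t_n)$ converges weakly to $\bar x$ as $n\to+\infty$.

Since $g$ is weakly lower semicontinuous, it holds

$$g(\bar x)\le\liminf_{n\to+\infty}g(x(t_n))=\min g,\quad\text{thus}\quad\bar x\in\operatorname{argmin} g.$$

Since the norm is weakly lower semicontinuous, it holds

$$\|\bar x\|\le\liminf_{n\to+\infty}\|x(t_n)\|\le\|x^*\|,$$

which, by taking into account that $x^*$ is the unique element of minimum norm in $\operatorname{argmin} g$, implies $\bar x=x^*$. This shows that the whole trajectory $x$ converges weakly to $x^*$.

Thus,

$$\|x^*\|\le\liminf_{t\to+\infty}\|x(t)\|\le\limsup_{t\to+\infty}\|x(t)\|\le\|x^*\|,\quad\text{hence}\quad\lim_{t\to+\infty}\|x(t)\|=\|x^*\|.$$

But by taking into account that $x(t)\rightharpoonup x^*$ as $t\to+\infty$, we obtain that the convergence is strong, that is

$$\lim_{t\to+\infty}x(t)=x^*.$$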
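
The passage from weak convergence together with convergence of the norms to strong convergence is the standard Hilbert space argument. Written out as a brief sketch:

$$\|x(t)-x^*\|^2=\|x(t)\|^2-2\langle x(t),x^*\rangle+\|x^*\|^2\longrightarrow\|x^*\|^2-2\|x^*\|^2+\|x^*\|^2=0\quad\text{as } t\to+\infty,$$

where $\langle x(t),x^*\rangle\to\|x^*\|^2$ follows from the weak convergence of $x(t)$ to $x^*$.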

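Case III Assume that for every $T\ge t_0$ there exists $t\ge T$ such that $\|x^*\|>\|x(t)\|$ and there exists $s\ge T$ such that $\|x^*\|\le\|x(s)\|$.

By the continuity of $x$ it follows that there exists a sequence $(t_n)_{n\in\mathbb{N}}\subseteq[t_0,+\infty)$ such that $t_n\to+\infty$ as $n\to+\infty$ and

$$\|x(t_n)\|=\|x^*\|\quad\text{for every } n\in\mathbb{N}.$$

We will show that $x(t_n)\to x^*$ as $n\to+\infty$. To this end we consider $\bar x\in H$ a weak sequential cluster point of the sequence $(x(t_n))_{n\in\mathbb{N}}$. By repeating the arguments used in the previous case (notice that the sequence is bounded) it follows that $(x(t_n))_{n\in\mathbb{N}}$ converges weakly to $x^*$ as $n\to+\infty$. Since $\|x(t_n)\|\to\|x^*\|$ as $n\to+\infty$, it yields $\|x(t_n)-x^*\|\to 0$ as $n\to+\infty$. This shows that

$$\liminf_{t\to+\infty}\|x(t)-x^*\|=0.$$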

Remark 4.5

Theorem 4.4 can be seen as an extension of a result given in [8] for the dynamical system (4) to the dynamical system with Hessian driven damping and Tikhonov regularization term (5). One can notice that for the choice $\beta=0$, which means that the Hessian driven damping is removed, the lower bound we impose for $t\mapsto t^2\epsilon(t)$ in case $\alpha>3$ is less tight than the one considered in [8, Theorem 4.1] for the system (4). As we will see later, this lower bound influences the asymptotic behaviour of the trajectory.

In case $\beta>0$, in order to guarantee that

$$\lim_{t\to+\infty}\frac{\beta}{\epsilon(t)\,t^{\frac{\alpha}{3}+1}}\int_{t_0}^{t}\epsilon^2(s)\,s^{\frac{\alpha}{3}+1}\,ds=0,$$

one just has to additionally assume that

$$\int_{t_0}^{+\infty}\epsilon(t)\,dt<+\infty$$

and that the function

$$t\mapsto t^{\frac13\alpha+1}\epsilon(t)\quad\text{is nondecreasing for } t \text{ large enough}.$$

This follows from Lemma A.1, applied with $\varphi(t)=t^{\frac13\alpha+1}\epsilon(t)$ and $f=\epsilon$, by also taking into account that $\lim_{t\to+\infty}\epsilon(t)t^{\frac{\alpha}{3}+1}=+\infty$.

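Combining the main results in the last two sections, one can see that if

$$\int_{t_0}^{+\infty}t\,\epsilon(t)\,dt<+\infty,$$

the function

$$t\mapsto t^{\frac13\alpha+1}\epsilon(t)\quad\text{is nondecreasing for } t \text{ large enough},$$

there exist $a>1$ and $t_1\ge t_0$ such that

$$\dot\epsilon(t)\le-\frac{a\beta}{2}\epsilon^2(t)\quad\text{for every } t\ge t_1,$$

and

  • in case $\alpha=3$: $\lim_{t\to+\infty}t^2\epsilon(t)=+\infty$;

  • in case $\alpha>3$: there exists $c>0$ such that $t^2\epsilon(t)\ge\frac23\alpha\left(\frac13\alpha-1+\beta c^2\right)$ for $t$ large enough,

then one obtains both fast convergence of the function values and strong convergence of the trajectory to the minimal norm solution. This is for instance the case when $\epsilon(t)=t^{-\gamma}$ for all $\gamma\in(1,2)$.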
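
As a quick sanity check of the hypotheses of Theorem 4.4, in the form discussed in this remark, consider $\epsilon(t)=t^{-\gamma}$ with $\gamma\in(1,2)$. The computation below is only a sketch and does not replace the statements above:

$$\begin{aligned}
&\int_{t_0}^{+\infty}\frac{\epsilon(t)}{t}\,dt=\frac{t_0^{-\gamma}}{\gamma}<+\infty,\qquad\int_{t_0}^{+\infty}\epsilon(t)\,dt=\frac{t_0^{1-\gamma}}{\gamma-1}<+\infty\quad(\gamma>1),\\
&t^{\frac13\alpha+1}\epsilon(t)=t^{\frac13\alpha+1-\gamma}\ \text{is nondecreasing and tends to } +\infty,\ \text{since } \tfrac13\alpha+1-\gamma\ge 2-\gamma>0,\\
&\dot\epsilon(t)=-\gamma t^{-\gamma-1}\le-\frac{a\beta}{2}t^{-2\gamma}\iff\gamma\,t^{\gamma-1}\ge\frac{a\beta}{2},\ \text{which holds for } t \text{ large enough, since } \gamma>1,\\
&t^2\epsilon(t)=t^{2-\gamma}\to+\infty\quad(\gamma<2),
\end{aligned}$$

so the lower bounds on $t^2\epsilon(t)$ hold both in case $\alpha=3$ and, for any fixed $c>0$, in case $\alpha>3$.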

In the following, we would like to comment on the role of the condition in Theorem 4.4 which asks, in case $\alpha>3$, for the existence of a positive constant $c$ such that $t^2\epsilon(t)\ge\frac23\alpha\left(\frac13\alpha-1+\beta c^2\right)$ for $t$ large enough. To this end it is very helpful to visualize the trajectories generated by the dynamical system (5) in relation with the minimization of the function given in (6) for a fixed large value of $\alpha$ and Tikhonov parametrizations of the form $t\mapsto\epsilon(t)=t^{-\gamma}$, for different values of $\gamma\in(1,2)$. The trajectories in the plot in Fig. 2 have been generated for $\alpha=200$ and $\beta=1$ and are all approaching the minimum norm solution $x^*=0$. The norm of the difference between the trajectory and the minimum norm solution is guaranteed to be bounded from above by a function which converges to zero, after the time point $t$ is reached at which the inequality $t^2\epsilon(t)\ge\frac23\alpha\left(\frac13\alpha-1+\beta c^2\right)$ “starts” being fulfilled. For large $\alpha$ and the Tikhonov parametrizations considered in our experiment, the closer $\gamma$ is to 1, the sooner this inequality is fulfilled. This is reflected by the behaviour of the trajectories plotted in Fig. 2.
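
To get a feeling for how strongly this time point depends on $\gamma$, the following minimal sketch computes the smallest $t$ with $t^2\epsilon(t)\ge\frac23\alpha(\frac13\alpha-1+\beta c^2)$ for $\epsilon(t)=t^{-\gamma}$. The function name and the choice $c=1$ in the usage example are assumptions made for illustration only; the theorem merely requires some $c>0$.

```python
def tikhonov_threshold_time(alpha, beta, c, gamma):
    """Smallest t with t**2 * eps(t) >= (2/3)*alpha*(alpha/3 - 1 + beta*c**2),
    for the parametrization eps(t) = t**(-gamma) with 1 < gamma < 2.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not (1.0 < gamma < 2.0):
        raise ValueError("gamma must lie in (1, 2)")
    if alpha <= 3.0:
        raise ValueError("the lower bound is only imposed in case alpha > 3")
    bound = (2.0 / 3.0) * alpha * (alpha / 3.0 - 1.0 + beta * c ** 2)
    # t**2 * t**(-gamma) = t**(2 - gamma) is increasing because 2 - gamma > 0,
    # so the inequality holds exactly for t >= bound**(1 / (2 - gamma)).
    return bound ** (1.0 / (2.0 - gamma))


if __name__ == "__main__":
    # alpha and beta as in the experiment described above; c = 1 is an assumption.
    for gamma in (1.1, 1.3, 1.5, 1.7, 1.9):
        t_star = tikhonov_threshold_time(200.0, 1.0, 1.0, gamma)
        print(f"gamma = {gamma}: inequality holds for t >= {t_star:.3e}")
```

Under these assumptions the threshold grows very quickly as $\gamma$ approaches 2, which is in line with the observation that values of $\gamma$ close to 1 fulfil the inequality sooner.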

Fig. 2. The behaviour of the trajectories generated by the dynamical system (5) in relation with the minimization of the function given in (6) for $\alpha=200$, $\beta=1$, $\epsilon(t)=t^{-\gamma}$ and different values for $\gamma\in(1,2)$
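
An experiment of this kind can be sketched in a few lines. The code below integrates $\ddot x(t)+\frac{\alpha}{t}\dot x(t)+\beta\nabla^2g(x(t))\dot x(t)+\nabla g(x(t))+\epsilon(t)x(t)=0$ with a classical fourth-order Runge-Kutta scheme. Everything specific in it is an assumption made for illustration: the objective $g(x_1,x_2)=\frac12(x_1+x_2-1)^2$ is a hypothetical stand-in (it is not the function (6)), whose minimizers form the line $x_1+x_2=1$ with minimum norm element $(0.5,0.5)$, and the parameter values, starting point, and step size are arbitrary choices.

```python
def grad_g(x):
    # Hypothetical objective g(x1, x2) = 0.5 * (x1 + x2 - 1)**2 (NOT the function (6)).
    r = x[0] + x[1] - 1.0
    return (r, r)


def hess_g_times(v):
    # The Hessian of the hypothetical g is [[1, 1], [1, 1]], so Hess * v = (v1 + v2, v1 + v2).
    s = v[0] + v[1]
    return (s, s)


def rhs(t, state, alpha, beta, gamma):
    # First-order form of the system: state = (x1, x2, v1, v2) with v = dx/dt.
    x, v = state[:2], state[2:]
    eps = t ** (-gamma)  # Tikhonov parametrization eps(t) = t**(-gamma)
    g, hv = grad_g(x), hess_g_times(v)
    acc = [-(alpha / t) * v[i] - beta * hv[i] - g[i] - eps * x[i] for i in range(2)]
    return (v[0], v[1], acc[0], acc[1])


def rk4_step(t, state, h, alpha, beta, gamma):
    def shifted(k, w):
        return tuple(s + w * ki for s, ki in zip(state, k))

    k1 = rhs(t, state, alpha, beta, gamma)
    k2 = rhs(t + h / 2, shifted(k1, h / 2), alpha, beta, gamma)
    k3 = rhs(t + h / 2, shifted(k2, h / 2), alpha, beta, gamma)
    k4 = rhs(t + h, shifted(k3, h), alpha, beta, gamma)
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(alpha=3.0, beta=1.0, gamma=1.5, x0=(1.5, -0.5),
             t0=1.0, t_end=100.0, h=0.01):
    # Start at x0 with zero velocity and return x(t_end).
    state = (x0[0], x0[1], 0.0, 0.0)
    t = t0
    for _ in range(int(round((t_end - t0) / h))):
        state = rk4_step(t, state, h, alpha, beta, gamma)
        t += h
    return state[:2]


def dist_to_min_norm(x):
    # Distance to (0.5, 0.5), the minimum norm minimizer of the hypothetical g.
    return ((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2) ** 0.5


if __name__ == "__main__":
    x_end = simulate()
    print("x(t_end) =", x_end, " distance to x* =", dist_to_min_norm(x_end))
```

The starting point is chosen on the solution line but away from its minimum norm element, so any movement towards $(0.5,0.5)$ is due to the Tikhonov term; without it the component along the solution line would have no restoring force.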

Finally, we would like to formulate some possible questions of future research related to the dynamical system (5):

  • In [7, Theorem 3.4] it has been proved for the dynamical system (1) that, when $g$ is strongly convex, the rates of convergence of the function values and the trajectory are both of $O(t^{-\frac23\alpha})$, thus they can be made arbitrarily fast by taking $\alpha$ large. It is natural to ask if similar rates of convergence can be obtained in a similar setting for the dynamical system (5) (see, also, [8, Section 5.4]).

  • In the literature, in the context of dynamical systems, regularization terms have been considered not only in open-loop, but also in closed-loop form (see, for instance, [12]). It is an interesting question if one can obtain for the dynamical system (5) similar results if the Tikhonov regularization term is taken in closed-loop form.

  • A natural question is to formulate proper numerical algorithms via time discretization of (5), to investigate their theoretical convergence properties, and to validate them with numerical experiments.

Acknowledgements

Open access funding provided by Austrian Science Fund (FWF). The authors are thankful to an anonymous reviewer for comments and remarks which improved the quality of the paper.

Appendix

In this appendix, we collect some lemmas and technical results which we will use in the analysis of the dynamical system (5). The following lemma was stated for instance in [8, Lemma A.3] and is used to prove the convergence of the objective function along the trajectory to its minimal value.

Lemma A.1

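Let $\delta>0$ and $f\in L^1((\delta,+\infty),\mathbb{R})$ be a nonnegative and continuous function. Let $\varphi:[\delta,+\infty)\to[0,+\infty)$ be a nondecreasing function such that $\lim_{t\to+\infty}\varphi(t)=+\infty$. Then it holds

$$\lim_{t\to+\infty}\frac{1}{\varphi(t)}\int_{\delta}^{t}\varphi(s)f(s)\,ds=0.$$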
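
As a simple illustration (the choices $\delta=1$, $f(s)=s^{-2}$ and $\varphi(s)=s$ are ours and serve only as an example): $f$ is nonnegative, continuous and integrable on $(1,+\infty)$, $\varphi$ is nondecreasing with $\varphi(t)\to+\infty$, and indeed

$$\frac{1}{\varphi(t)}\int_{1}^{t}\varphi(s)f(s)\,ds=\frac1t\int_{1}^{t}\frac{ds}{s}=\frac{\ln t}{t}\longrightarrow 0\quad\text{as } t\to+\infty.$$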

The following statement is the continuous counterpart of a convergence result of quasi-Fejér monotone sequences. For its proof we refer to [1, Lemma 5.1].

Lemma A.2

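Suppose that $F:[t_0,+\infty)\to\mathbb{R}$ is locally absolutely continuous and bounded from below and that there exists $G\in L^1([t_0,+\infty),\mathbb{R})$ such that

$$\frac{d}{dt}F(t)\le G(t)$$

for almost every $t\in[t_0,+\infty)$. Then there exists $\lim_{t\to+\infty}F(t)\in\mathbb{R}$.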

The following technical result is [11, Lemma 2].

Lemma 4.6

Let $u:[t_0,+\infty)\to H$ be a continuously differentiable function satisfying $u(t)+\frac{t}{\alpha}\dot u(t)\to u\in H$ as $t\to+\infty$, where $\alpha>0$. Then $u(t)\to u$ as $t\to+\infty$.

The continuous version of the Opial Lemma (see [7]) is the main tool for proving weak convergence for the generated trajectory.

Lemma A.3

Let $S\subseteq H$ be a nonempty set and $x:[t_0,+\infty)\to H$ a given map such that:

(i) for every $z\in S$ the limit $\lim_{t\to+\infty}\|x(t)-z\|$ exists;
(ii) every weak sequential limit point of $x(t)$ belongs to the set $S$.

Then the trajectory $x(t)$ converges weakly to an element in $S$ as $t\to+\infty$.

Footnotes

Radu Ioan Boţ: Research partially supported by FWF (Austrian Science Fund), Project I 2419-N32. Ernö Robert Csetnek: Research supported by FWF (Austrian Science Fund), Project P 29809-N32. Szilárd Csaba László: This work was supported by a grant of Ministry of Research and Innovation, CNCS—UEFISCDI, Project Number PN-III-P1-1.1-TE-2016-0266 and by a Grant of Ministry of Research and Innovation, CNCS—UEFISCDI, Project Number PN-III-P4-ID-PCE-2016-0190, within PNCDI III.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Radu Ioan Boţ, Email: radu.bot@univie.ac.at.

Ernö Robert Csetnek, Email: ernoe.robert.csetnek@univie.ac.at.

Szilárd Csaba László, Email: szilard.laszlo@math.utcluj.ro.

References

  • 1. Abbas B, Attouch H, Svaiter BF. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 2014;161(2):331–360. doi: 10.1007/s10957-013-0414-5.
  • 2. Alvarez F, Attouch H, Bolte J, Redont P. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. Journal de Mathématiques Pures et Appliquées. 2002;81(8):747–779. doi: 10.1016/S0021-7824(01)01253-3.
  • 3. Alvarez F, Cabot A. Asymptotic selection of viscosity equilibria of semilinear evolution equations by the introduction of a slowly vanishing term. Discrete Contin. Dyn. Syst. 2006;15:921–938. doi: 10.3934/dcds.2006.15.921.
  • 4. Attouch H. Viscosity solutions of minimization problems. SIAM J. Optim. 1996;6(3):769–806. doi: 10.1137/S1052623493259616.
  • 5. Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: First-order optimization algorithms via inertial systems with Hessian driven damping (2019). arXiv:1907.10536v1
  • 6. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3. ESAIM: Control Optim. Calc. Var. 25(2) (2019)
  • 7. Attouch H, Chbani Z, Peypouquet J, Redont P. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B. 2018;168(1–2):123–175. doi: 10.1007/s10107-016-0992-8.
  • 8. Attouch H, Chbani Z, Riahi H. Combining fast inertial dynamics for convex optimization with Tikhonov regularization. J. Math. Anal. Appl. 2018;457(2):1065–1094. doi: 10.1016/j.jmaa.2016.12.017.
  • 9. Attouch H, Cominetti R. A dynamical approach to convex minimization coupling approximation with the steepest descent method. J. Differ. Equ. 1996;128(2):519–540. doi: 10.1006/jdeq.1996.0104.
  • 10. Attouch H, Czarnecki M-O. Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria. J. Differ. Equ. 2002;197:278–310. doi: 10.1006/jdeq.2001.4034.
  • 11. Attouch H, Peypouquet J, Redont P. Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 2016;261(10):5734–5783. doi: 10.1016/j.jde.2016.08.020.
  • 12. Attouch H, Redont P, Svaiter BF. Global convergence of a closed-loop regularized Newton method for solving monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 2013;157:624–650. doi: 10.1007/s10957-012-0222-3.
  • 13. Attouch H, Svaiter BF. A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 2011;49(2):574–598. doi: 10.1137/100784114.
  • 14. Cabot A, Engler H, Gadat S. On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 2009;361:5983–6017. doi: 10.1090/S0002-9947-09-04785-0.
  • 15. Cominetti R, Peypouquet J, Sorin S. Strong asymptotic convergence of evolution equations governed by maximal monotone operators with Tikhonov regularization. J. Differ. Equ. 2008;245:3753–3763. doi: 10.1016/j.jde.2008.08.007.
  • 16. Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Doklady. 1983;27:372–376.
  • 17. Sell GR, You Y. Dynamics of Evolutionary Equations. New York: Springer; 2002.
  • 18. Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations (2018). arXiv:1810.08907v3
  • 19. Shi B, Du SS, Su WJ, Jordan MI. Acceleration via symplectic discretization of high-resolution differential equations. Adv. Neural Inf. Process. Syst. 2019;32(NIPS 2019):5745–5753.
  • 20. Su W, Boyd S, Candès EJ. A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 2016;17(153):1–43.

Articles from Mathematical Programming are provided here courtesy of Springer
