Optimization Letters. 2017 Jun 14;12(1):17–33. doi: 10.1007/s11590-017-1158-1

Gradient-type penalty method with inertial effects for solving constrained convex optimization problems with smooth data

Radu Ioan Boţ 1,2, Ernö Robert Csetnek 1, Nimit Nimana 3

Abstract

We consider the problem of minimizing a smooth convex objective function over the set of minima of another differentiable convex function. In order to solve this problem, we propose an algorithm which combines the gradient method with a penalization technique. Moreover, we insert into our algorithm an inertial term, which is able to take advantage of the history of the iterates. We show weak convergence of the generated sequence of iterates to an optimal solution of the optimization problem, provided a condition expressed via the Fenchel conjugate of the constraint function is fulfilled. We also prove convergence of the objective function values to the optimal objective value. The convergence analysis carried out in this paper relies on the celebrated Opial Lemma and generalized Fejér monotonicity techniques. We illustrate the functionality of the method via a numerical experiment addressing image classification via support vector machines.

Keywords: Gradient method, Penalization, Fenchel conjugate, Inertial algorithm

Introduction and preliminaries

Let $H$ be a real Hilbert space with norm and inner product denoted by $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$, respectively, and let $f$ and $g$ be convex functions acting on $H$, which we assume for simplicity to be everywhere defined and (Fréchet) differentiable. The object of our investigation is the optimization problem

$\min_{x\in\operatorname{argmin} g} f(x).$  (1)

We assume that

$S:=\operatorname{argmin}\{f(x):x\in\operatorname{argmin} g\}\neq\emptyset$

and that the gradients $\nabla f$ and $\nabla g$ are Lipschitz continuous operators with constants $L_{\nabla f}$ and $L_{\nabla g}$, respectively.

The work [5] of Attouch and Czarnecki has attracted a huge amount of interest from the research community since its appearance, as it undertakes a qualitative analysis of the optimal solutions of (1) from the perspective of a penalty-term-based dynamical system. It represented the starting point for the design and development of numerical algorithms for solving the minimization problem (1) and several of its variants, involving also nonsmooth data and even monotone inclusions related to the optimality systems of constrained optimization problems. We refer the reader to [4–8, 10, 11, 13–15, 20–23, 33, 35] and the references therein for more insights into this research topic.

A key assumption used in this context in order to guarantee the convergence properties of the numerical algorithms is the condition

$\sum_{n=1}^{\infty}\lambda_n\beta_n\left[g^*\!\left(\tfrac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{p}{\beta_n}\right)\right]<+\infty\quad\forall p\in\operatorname{ran}(N_{\operatorname{argmin} g}),$

where $\{\lambda_n\}_{n=1}^{\infty}$ and $\{\beta_n\}_{n=1}^{\infty}$ are positive sequences, $g^*:H\to\mathbb{R}\cup\{+\infty\}$ is the Fenchel conjugate of $g$:

$g^*(p)=\sup_{x\in H}\{\langle p,x\rangle-g(x)\}\quad\forall p\in H;$

$\sigma_{\operatorname{argmin} g}:H\to\mathbb{R}\cup\{+\infty\}$ is the support function of the set $\operatorname{argmin} g$:

$\sigma_{\operatorname{argmin} g}(p)=\sup_{x\in\operatorname{argmin} g}\langle p,x\rangle\quad\forall p\in H;$

and $N_{\operatorname{argmin} g}$ is the normal cone to the set $\operatorname{argmin} g$, defined by

$N_{\operatorname{argmin} g}(x)=\{p\in H:\langle p,y-x\rangle\le 0\ \forall y\in\operatorname{argmin} g\}$

for $x\in\operatorname{argmin} g$ and $N_{\operatorname{argmin} g}(x)=\emptyset$ for $x\notin\operatorname{argmin} g$. Finally, $\operatorname{ran}(N_{\operatorname{argmin} g})$ denotes the range of the normal cone $N_{\operatorname{argmin} g}$, that is, $p\in\operatorname{ran}(N_{\operatorname{argmin} g})$ if and only if there exists $x\in\operatorname{argmin} g$ such that $p\in N_{\operatorname{argmin} g}(x)$. Let us notice that for $x\in\operatorname{argmin} g$ one has $p\in N_{\operatorname{argmin} g}(x)$ if and only if $\sigma_{\operatorname{argmin} g}(p)=\langle p,x\rangle$. We also assume without loss of generality that $\min g=0$.
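As a standard illustration of these quantities (this example is not part of the original hypotheses, but it anticipates Remark 12(vi) below): if $C\subseteq H$ is nonempty, closed and convex and $g=\frac{a}{2}\operatorname{dist}^2(\cdot,C)$ for some $a>0$, then $\operatorname{argmin} g=C$, $\min g=0$, and, since $g$ is the infimal convolution of $\frac{a}{2}\|\cdot\|^2$ and the indicator function of $C$,

$g^*(p)=\sigma_C(p)+\frac{1}{2a}\|p\|^2,\qquad\text{so}\qquad g^*\!\left(\tfrac{p}{\beta_n}\right)-\sigma_C\!\left(\tfrac{p}{\beta_n}\right)=\frac{\|p\|^2}{2a\beta_n^{2}}.$

In this particular situation the key condition above therefore reduces to the summability of $\sum_{n=1}^{\infty}\frac{\lambda_n}{\beta_n}$.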

In this paper we propose a numerical algorithm for solving (1) that combines the gradient method with penalization strategies, while also employing inertial and memory effects. Algorithms of inertial type result from the time discretization of second-order differential inclusions (see [1, 3]) and were first investigated in the context of the minimization of a differentiable function by Polyak [36] and Bertsekas [12]. The resulting iterative schemes share the feature that the next iterate is defined by means of the last two iterates, a fact which induces the inertial effect in the algorithm. Since the works [1, 3], one can notice an increasing number of research efforts dedicated to algorithms of inertial type (see [1–3, 9, 16–19, 24–28, 30–32, 34]).

In this paper we consider the following inertial algorithm for solving (1):

Algorithm 1

Initialization: Choose the positive sequences $\{\lambda_n\}_{n=1}^{\infty}$ and $\{\beta_n\}_{n=1}^{\infty}$, and a positive constant parameter $\alpha\in(0,1)$. Take arbitrary $x_0,x_1\in H$.

Iterative step: For given current iterates $x_{n-1},x_n\in H$ ($n\ge 1$), define $x_{n+1}\in H$ by

$x_{n+1}:=x_n+\alpha(x_n-x_{n-1})-\lambda_n\nabla f(x_n)-\lambda_n\beta_n\nabla g(x_n).$

We notice that in the above iterative scheme $\{\lambda_n\}_{n=1}^{\infty}$ represents the sequence of step sizes, $\{\beta_n\}_{n=1}^{\infty}$ the sequence of penalty parameters, while $\alpha$ controls the influence of the inertial term.
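A minimal computational sketch of Algorithm 1 may help fix ideas. The following Python fragment is only an illustration: the callables grad_f, grad_g and the parameter rules lam, beta are assumptions to be supplied by the user (a concrete admissible choice is given in Remark 12 below).

```python
import numpy as np

def inertial_gradient_penalty(grad_f, grad_g, x0, x1, lam, beta, alpha, n_iter):
    """Sketch of Algorithm 1:
    x_{n+1} = x_n + alpha*(x_n - x_{n-1}) - lam(n)*grad_f(x_n) - lam(n)*beta(n)*grad_g(x_n).

    grad_f, grad_g : gradients of f and g (callables on numpy arrays)
    x0, x1         : arbitrary starting points
    lam, beta      : callables n -> lambda_n (step size) and n -> beta_n (penalty)
    alpha          : inertial parameter in (0, 1)
    """
    x_prev, x = np.asarray(x0, dtype=float), np.asarray(x1, dtype=float)
    for n in range(1, n_iter + 1):
        y = x + alpha * (x - x_prev)                          # inertial extrapolation y_n
        x_next = y - lam(n) * (grad_f(x) + beta(n) * grad_g(x))
        x_prev, x = x, x_next
    return x
```

Note that both gradients are evaluated at the current iterate $x_n$ and not at the extrapolated point $y_n$; the previous iterate enters only through the inertial term $\alpha(x_n-x_{n-1})$.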

For every $n\ge 1$ we set $\Omega_n:=f+\beta_n g$, which is also a (Fréchet) differentiable function, and notice that $\nabla\Omega_n$ is $L_n:=L_{\nabla f}+\beta_n L_{\nabla g}$-Lipschitz continuous.

In case $\alpha=0$, Algorithm 1 collapses into the algorithm considered in [35] for solving (1). We prove weak convergence of the generated iterates to an optimal solution of (1) by making use of generalized Fejér monotonicity techniques and the Opial Lemma, and by imposing the key assumption mentioned above as well as some mild conditions on the involved parameters. Moreover, the performed analysis also allows us to show the convergence of the objective function values to the optimal objective value of (1). As an illustration of the theoretical results, we present in the last section an application addressing image classification via support vector machines.

Convergence analysis

This section is devoted to the asymptotic analysis of Algorithm 1.

Assumption 2

Assume that the following statements hold:

  • (I)

    The function f is bounded from below;

  • (II)

    There exist positive constants $c>1$ and $K>0$ such that $\frac{L_n}{2}+\frac{\alpha-1}{\lambda_n}\le-\big(c+(1+\alpha)K\big)$ and $\beta_{n+1}-\beta_n\le K\lambda_{n+1}\beta_{n+1}$ for all $n\ge 1$;

  • (III)

    For every $p\in\operatorname{ran}(N_{\operatorname{argmin} g})$, we have $\sum_{n=1}^{\infty}\lambda_n\beta_n\left[g^*\!\left(\tfrac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{p}{\beta_n}\right)\right]<+\infty$;

  • (IV)

    $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, $\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_n}\le\frac{2}{\alpha}$ for all $n\ge 1$ and $\sum_{n=1}^{\infty}\lambda_n=+\infty$.

We would like to mention that in [21] we proposed a forward-backward-forward algorithm of penalty type, endowed with inertial and memory effects, for solving monotone inclusion problems, which gave rise to a primal-dual iterative scheme for solving convex optimization problems with complex structures. However, we succeeded in proving only weak ergodic convergence for the generated iterates, while with the specific choice of the sequences $\{\lambda_n\}_{n=1}^{\infty}$ and $\{\beta_n\}_{n=1}^{\infty}$ in Assumption 2 we will be able to prove weak convergence of the iterates generated by Algorithm 1 to an optimal solution of (1).

Remark 3

The conditions in Assumption 2 slightly extend the ones considered in [35] in the noninertial case. The only differences are given by the first inequality in (II), which here involves the constant $\alpha$ which controls the inertial terms (for the corresponding condition in [35] one only has to take $\alpha=0$), and by the inequality $\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_n}\le\frac{2}{\alpha}$ for all $n\ge 1$.

We refer to Remark 12 for situations where the fulfillment of the conditions in Assumption 2 is guaranteed.

We start the convergence analysis with three technical results.

Lemma 4

Let $\bar x\in S$ and set $\bar p:=-\nabla f(\bar x)$. We have for all $n\ge 1$

$\varphi_{n+1}-\varphi_n-\alpha(\varphi_n-\varphi_{n-1})+\lambda_n\beta_n g(x_n)\le\|x_{n+1}-x_n\|^2+\alpha\|x_n-x_{n-1}\|^2+\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right],$  (2)

where $\varphi_n:=\|x_n-\bar x\|^2$.

Proof

Since $\bar x\in S$, we have according to the first-order optimality conditions that $0\in\nabla f(\bar x)+N_{\operatorname{argmin} g}(\bar x)$, thus $\bar p=-\nabla f(\bar x)\in N_{\operatorname{argmin} g}(\bar x)$. Notice that for all $n\ge 1$

$\nabla f(x_n)=\frac{y_n-x_{n+1}}{\lambda_n}-\beta_n\nabla g(x_n),$

where $y_n:=x_n+\alpha(x_n-x_{n-1})$. This, together with the monotonicity of $\nabla f$, implies that

$\left\langle\frac{y_n-x_{n+1}}{\lambda_n}-\beta_n\nabla g(x_n)+\bar p,\ x_n-\bar x\right\rangle=\langle\nabla f(x_n)-\nabla f(\bar x),x_n-\bar x\rangle\ge 0\quad\forall n\ge 1,$  (3)

so

$2\langle y_n-x_{n+1},x_n-\bar x\rangle\ge 2\lambda_n\beta_n\langle\nabla g(x_n),x_n-\bar x\rangle-2\lambda_n\langle\bar p,x_n-\bar x\rangle\quad\forall n\ge 1.$  (4)

On the other hand, since g is convex and differentiable, we have for all n1

$0=g(\bar x)\ge g(x_n)+\langle\nabla g(x_n),\bar x-x_n\rangle,$

which means that

$2\lambda_n\beta_n\langle\nabla g(x_n),x_n-\bar x\rangle\ge 2\lambda_n\beta_n g(x_n).$  (5)

As for all $n\ge 1$

$2\langle x_n-x_{n+1},x_n-\bar x\rangle=\|x_{n+1}-x_n\|^2+\varphi_n-\varphi_{n+1}$

and

$2\alpha\langle x_n-x_{n-1},x_n-\bar x\rangle=\alpha\|x_n-x_{n-1}\|^2+\alpha(\varphi_n-\varphi_{n-1}),$

it follows

$2\langle y_n-x_{n+1},x_n-\bar x\rangle=2\langle x_n-x_{n+1},x_n-\bar x\rangle+2\alpha\langle x_n-x_{n-1},x_n-\bar x\rangle=\|x_{n+1}-x_n\|^2+\alpha\|x_n-x_{n-1}\|^2+\varphi_n-\varphi_{n+1}+\alpha(\varphi_n-\varphi_{n-1}).$  (6)

Combining (4), (5) and (6), we obtain that for each $n\ge 1$

$\varphi_{n+1}-\varphi_n-\alpha(\varphi_n-\varphi_{n-1})+\lambda_n\beta_n g(x_n)\le\|x_{n+1}-x_n\|^2+\alpha\|x_n-x_{n-1}\|^2-\lambda_n\beta_n g(x_n)+2\lambda_n\langle\bar p,x_n\rangle-2\lambda_n\langle\bar p,\bar x\rangle.$  (7)

Finally, since $\bar x\in\operatorname{argmin} g$, we have that for all $n\ge 1$

$2\lambda_n\langle\bar p,x_n\rangle-\lambda_n\beta_n g(x_n)-2\lambda_n\langle\bar p,\bar x\rangle=\lambda_n\beta_n\left[\left\langle\tfrac{2\bar p}{\beta_n},x_n\right\rangle-g(x_n)-\left\langle\tfrac{2\bar p}{\beta_n},\bar x\right\rangle\right]\le\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\left\langle\tfrac{2\bar p}{\beta_n},\bar x\right\rangle\right]=\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right],$

which completes the proof.

Lemma 5

We have for all $n\ge 1$

$\Omega_{n+1}(x_{n+1})\le\Omega_n(x_n)+(\beta_{n+1}-\beta_n)g(x_{n+1})+\left(\frac{L_n}{2}+\frac{\alpha}{2\lambda_n}-\frac{1}{\lambda_n}\right)\|x_{n+1}-x_n\|^2+\frac{\alpha}{2\lambda_n}\|x_n-x_{n-1}\|^2.$  (8)

Proof

From the descent lemma and the fact that $\nabla\Omega_n$ is $L_n$-Lipschitz continuous, we get that

$\Omega_n(x_{n+1})\le\Omega_n(x_n)+\langle\nabla\Omega_n(x_n),x_{n+1}-x_n\rangle+\frac{L_n}{2}\|x_{n+1}-x_n\|^2\quad\forall n\ge 1.$

Since $\nabla\Omega_n(x_n)=-\frac{x_{n+1}-y_n}{\lambda_n}$, it holds for all $n\ge 1$

$f(x_{n+1})+\beta_n g(x_{n+1})\le f(x_n)+\beta_n g(x_n)-\left\langle\frac{x_{n+1}-y_n}{\lambda_n},x_{n+1}-x_n\right\rangle+\frac{L_n}{2}\|x_{n+1}-x_n\|^2$

and then

$f(x_{n+1})+\beta_{n+1}g(x_{n+1})\le f(x_n)+\beta_n g(x_n)+(\beta_{n+1}-\beta_n)g(x_{n+1})-\frac{1}{\lambda_n}\|x_{n+1}-x_n\|^2+\frac{\alpha}{\lambda_n}\langle x_n-x_{n-1},x_{n+1}-x_n\rangle+\frac{L_n}{2}\|x_{n+1}-x_n\|^2,$

which is nothing else than

$\Omega_{n+1}(x_{n+1})\le\Omega_n(x_n)+(\beta_{n+1}-\beta_n)g(x_{n+1})+\left(\frac{L_n}{2}-\frac{1}{\lambda_n}\right)\|x_{n+1}-x_n\|^2+\frac{\alpha}{\lambda_n}\langle x_n-x_{n-1},x_{n+1}-x_n\rangle.$  (9)

By the Cauchy–Schwarz inequality it holds that

$\langle x_n-x_{n-1},x_{n+1}-x_n\rangle\le\frac{1}{2}\|x_{n-1}-x_n\|^2+\frac{1}{2}\|x_{n+1}-x_n\|^2,$

hence, (9) becomes

$\Omega_{n+1}(x_{n+1})\le\Omega_n(x_n)+(\beta_{n+1}-\beta_n)g(x_{n+1})+\frac{\alpha}{2\lambda_n}\|x_{n-1}-x_n\|^2+\left(\frac{L_n}{2}-\frac{1}{\lambda_n}+\frac{\alpha}{2\lambda_n}\right)\|x_{n+1}-x_n\|^2\quad\forall n\ge 1.$

For $\bar x\in S$ and all $n\ge 1$, we set

$\Gamma_n:=f(x_n)+(1-K\lambda_n)\beta_n g(x_n)+K\varphi_n=\Omega_n(x_n)-K\lambda_n\beta_n g(x_n)+K\varphi_n,$

and, for simplicity, we denote

$\delta_n:=\left(\frac{1}{2\lambda_n}+K\right)\alpha+c.$

Lemma 6

Let $\bar x\in S$ and set $\bar p:=-\nabla f(\bar x)$. We have for all $n\ge 2$

$\Gamma_{n+1}-\Gamma_n-\alpha(\Gamma_n-\Gamma_{n-1})\le-\delta_n\|x_{n+1}-x_n\|^2+\alpha\left(\frac{1}{2\lambda_n}+K\right)\|x_n-x_{n-1}\|^2+K\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha\big(\Omega_{n-1}(x_{n-1})-\Omega_n(x_n)\big)+\alpha K\big(\lambda_n\beta_n g(x_n)-\lambda_{n-1}\beta_{n-1}g(x_{n-1})\big).$  (10)

Proof

According to Lemma 5 and Assumption 2(II), (8) becomes for all $n\ge 1$

$\Omega_{n+1}(x_{n+1})-\Omega_n(x_n)-K\lambda_{n+1}\beta_{n+1}g(x_{n+1})\le-(K+\delta_n)\|x_{n+1}-x_n\|^2+\frac{\alpha}{2\lambda_n}\|x_n-x_{n-1}\|^2.$  (11)

On the other hand, after multiplying (2) by $K$, we obtain for all $n\ge 1$

$K\varphi_{n+1}-K\varphi_n-\alpha(K\varphi_n-K\varphi_{n-1})+K\lambda_n\beta_n g(x_n)\le K\|x_{n+1}-x_n\|^2+K\alpha\|x_n-x_{n-1}\|^2+K\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right].$  (12)

After summing up relations (11) and (12) and adding on both sides of the resulting inequality the expressions $\alpha\big(\Omega_{n-1}(x_{n-1})-\Omega_n(x_n)\big)$ and $\alpha\big(K\lambda_n\beta_n g(x_n)-K\lambda_{n-1}\beta_{n-1}g(x_{n-1})\big)$ for all $n\ge 2$, we obtain the required statement.

The following proposition will play an essential role in the convergence analysis (see also [13, 16]).

Proposition 7

Let $\{a_n\}_{n=1}^{\infty}$, $\{b_n\}_{n=1}^{\infty}$ and $\{c_n\}_{n=1}^{\infty}$ be real sequences and let $\alpha\in[0,1)$ be given. Assume that $\{a_n\}_{n=1}^{\infty}$ is bounded from below, that $\{b_n\}_{n=1}^{\infty}$ is nonnegative, that $\sum_{n=1}^{\infty}c_n<+\infty$, and that

$a_{n+1}-a_n-\alpha(a_n-a_{n-1})+b_n\le c_n\quad\forall n\ge 1.$

Then the following statements hold:

  • (i)

    $\sum_{n=1}^{\infty}[a_n-a_{n-1}]_+<+\infty$, where $[t]_+:=\max\{t,0\}$;

  • (ii)

    $\{a_n\}_{n=1}^{\infty}$ converges and $\sum_{n=1}^{\infty}b_n<+\infty$.

The following lemma collects some convergence properties of the sequences involved in our analysis.

Lemma 8

Let $\bar x\in S$. Then the following statements are true:

  • (i)

    The sequence $\{\Gamma_n\}_{n=1}^{\infty}$ is bounded from below.

  • (ii)

    $\sum_{n=1}^{\infty}\|x_{n+1}-x_n\|^2<+\infty$ and $\lim_{n\to+\infty}\Gamma_n$ exists.

  • (iii)

    $\lim_{n\to+\infty}\|x_n-\bar x\|$ exists and $\sum_{n=1}^{\infty}\lambda_n\beta_n g(x_n)<+\infty$.

  • (iv)

    $\lim_{n\to+\infty}\Omega_n(x_n)$ exists.

  • (v)

    $\lim_{n\to+\infty}g(x_n)=0$ and every sequential weak cluster point of the sequence $\{x_n\}_{n=1}^{\infty}$ lies in $\operatorname{argmin} g$.

Proof

We set $\bar p:=-\nabla f(\bar x)$ and recall that $g(\bar x)=\min g=0$.

(i) Since $f$ is convex and differentiable, it holds for all $n\ge 1$

$\Gamma_n=f(x_n)+(1-K\lambda_n)\beta_n g(x_n)+K\varphi_n\ge f(x_n)+K\|x_n-\bar x\|^2\ge f(\bar x)+\langle\nabla f(\bar x),x_n-\bar x\rangle+K\|x_n-\bar x\|^2\ge f(\bar x)-\frac{\|\bar p\|^2}{4K},$

which means that $\{\Gamma_n\}_{n=1}^{\infty}$ is bounded from below. Notice that the first inequality in the above relation is a consequence of Assumption 2(II), since $\frac{1-\alpha}{\lambda_n}\ge c+(1+\alpha)K\ge K$, thus $\lambda_n K\le 1-\alpha\le 1$ for all $n\ge 1$.

(ii) For all $n\ge 2$, we may set

$\mu_n:=\Gamma_n-\alpha\Gamma_{n-1}+\alpha\left(\frac{1}{2\lambda_n}+K\right)\|x_n-x_{n-1}\|^2$

and

$u_n:=\Omega_{n-1}(x_{n-1})-\Omega_n(x_n)+K\lambda_n\beta_n g(x_n)-K\lambda_{n-1}\beta_{n-1}g(x_{n-1}).$

We fix a natural number $N_0\ge 2$. Then

$\sum_{n=2}^{N_0}u_n=f(x_1)+(1-K\lambda_1)\beta_1 g(x_1)-f(x_{N_0})-(1-K\lambda_{N_0})\beta_{N_0}g(x_{N_0}).$

Since $f$ is bounded from below and $g(x_{N_0})\ge g(\bar x)=0$, it follows that $\sum_{n=2}^{\infty}u_n<+\infty$.

We notice that $-\delta_n+\alpha\left(\frac{1}{2\lambda_{n+1}}+K\right)=\frac{\alpha}{2}\left(\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_n}\right)-c$ and, since $\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_n}\le\frac{2}{\alpha}$, we have for all $n\ge 1$

$-\delta_n+\alpha\left(\frac{1}{2\lambda_{n+1}}+K\right)\le 1-c.$  (13)

Thus, according to Lemma 6, we get for all $n\ge 2$

$\mu_{n+1}-\mu_n=\Gamma_{n+1}-\Gamma_n-\alpha(\Gamma_n-\Gamma_{n-1})+\alpha\left(\frac{1}{2\lambda_{n+1}}+K\right)\|x_{n+1}-x_n\|^2-\alpha\left(\frac{1}{2\lambda_n}+K\right)\|x_n-x_{n-1}\|^2\le-\delta_n\|x_{n+1}-x_n\|^2+K\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha u_n+\alpha\left(\frac{1}{2\lambda_{n+1}}+K\right)\|x_{n+1}-x_n\|^2\le(1-c)\|x_{n+1}-x_n\|^2+K\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha u_n.$

We fix another natural number $N_1\ge 2$ and sum up the last inequality for $n=2,\ldots,N_1$. We obtain

$\mu_{N_1+1}-\mu_2\le(1-c)\sum_{n=2}^{N_1}\|x_{n+1}-x_n\|^2+K\sum_{n=2}^{N_1}\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha\sum_{n=2}^{N_1}u_n,$  (14)

which, by taking into account Assumption 2(III), means that $\{\mu_n\}_{n=2}^{\infty}$ is bounded from above by a positive number that we denote by $M$. Consequently, for all $n\ge 2$ we have

$\Gamma_{n+1}-\alpha\Gamma_n\le\mu_{n+1}\le M,$

so

$\Gamma_{n+1}\le\alpha\Gamma_n+M,$

which further implies that

$\Gamma_n\le\alpha^{n-2}\Gamma_2+M\sum_{k=1}^{n-2}\alpha^{k-1}\le\alpha^{n-2}\Gamma_2+\frac{M}{1-\alpha}\quad\forall n\ge 3.$

We have for all $n\ge 2$

$\mu_{n+1}\ge f(\bar x)-\frac{\|\bar p\|^2}{4K}-\alpha\Gamma_n,$

hence

$-\mu_{n+1}\le\alpha\Gamma_n-f(\bar x)+\frac{\|\bar p\|^2}{4K}\le\alpha^{n-1}\Gamma_2+\frac{\alpha M}{1-\alpha}-f(\bar x)+\frac{\|\bar p\|^2}{4K}.$  (15)

Consequently, for the arbitrarily chosen natural number $N_1\ge 2$, we have [see (14)]

$(c-1)\sum_{n=2}^{N_1}\|x_{n+1}-x_n\|^2\le-\mu_{N_1+1}+\mu_2+K\sum_{n=2}^{N_1}\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha\sum_{n=2}^{N_1}u_n,$

which together with (15) and the fact that c>1 implies that

$\sum_{n=1}^{\infty}\|x_{n+1}-x_n\|^2<+\infty.$

On the other hand, due to (13) we have $\delta_{n+1}\le\delta_n+1$ for all $n\ge 1$. Consequently, using also that $c>1$, (10) implies that

$\Gamma_{n+1}-\Gamma_n-\alpha(\Gamma_n-\Gamma_{n-1})\le-\delta_n\|x_{n+1}-x_n\|^2+(\delta_n-c)\|x_n-x_{n-1}\|^2+K\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha u_n\le-\delta_n\|x_{n+1}-x_n\|^2+\delta_{n-1}\|x_n-x_{n-1}\|^2+K\lambda_n\beta_n\left[g^*\!\left(\tfrac{2\bar p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{2\bar p}{\beta_n}\right)\right]+\alpha u_n\quad\forall n\ge 2.$

According to Proposition 7 and by taking into account that $\{\Gamma_n\}_{n=1}^{\infty}$ is bounded from below, we obtain that $\lim_{n\to+\infty}\Gamma_n$ exists.

(iii) By Lemma 4 and Proposition 7, $\lim_{n\to+\infty}\varphi_n$ exists and $\sum_{n=1}^{\infty}\lambda_n\beta_n g(x_n)<+\infty$.

(iv) Since $\Omega_n(x_n)=\Gamma_n-K\varphi_n+K\lambda_n\beta_n g(x_n)$ for all $n\ge 1$, by using (ii) and (iii), we get that $\lim_{n\to+\infty}\Omega_n(x_n)$ exists.

(v) Since $\liminf_{n\to+\infty}\lambda_n\beta_n>0$, we also obtain that $\lim_{n\to+\infty}g(x_n)=0$. Let $w$ be a sequential weak cluster point of $\{x_n\}_{n=1}^{\infty}$ and assume that the subsequence $\{x_{n_j}\}_{j=1}^{\infty}$ converges weakly to $w$. Since $g$ is weakly lower semicontinuous, we have

$g(w)\le\liminf_{j\to+\infty}g(x_{n_j})=\lim_{n\to+\infty}g(x_n)=0,$

which implies that $w\in\operatorname{argmin} g$. This completes the proof.

In order to also show the convergence of the sequence $(f(x_n))_{n=1}^{\infty}$, we first prove the following result.

Lemma 9

Let $\bar x\in S$ be given. We have

$\sum_{n=1}^{\infty}\lambda_n\big[\Omega_n(x_n)-f(\bar x)\big]<+\infty.$

Proof

Since $f$ is convex and differentiable, we have for all $n\ge 1$

$f(\bar x)\ge f(x_n)+\langle\nabla f(x_n),\bar x-x_n\rangle.$

Since $g$ is convex and differentiable, we have for all $n\ge 1$

$0\ge\beta_n g(x_n)+\beta_n\langle\nabla g(x_n),\bar x-x_n\rangle,$

which together imply that

$f(\bar x)\ge\Omega_n(x_n)+\langle\nabla\Omega_n(x_n),\bar x-x_n\rangle=\Omega_n(x_n)+\left\langle\frac{y_n-x_{n+1}}{\lambda_n},\bar x-x_n\right\rangle\quad\forall n\ge 1.$

From here we obtain for all $n\ge 1$ [see (6)]

$2\lambda_n\big[\Omega_n(x_n)-f(\bar x)\big]\le 2\langle y_n-x_{n+1},x_n-\bar x\rangle=\|x_{n+1}-x_n\|^2+\varphi_n-\varphi_{n+1}+\alpha(\varphi_n-\varphi_{n-1})+\alpha\|x_n-x_{n-1}\|^2.$

Hence, by using the previous lemma, the required result holds.

The Opial Lemma that we recall below will play an important role in the proof of the main result of this paper.

Proposition 10

(Opial Lemma) Let $H$ be a real Hilbert space, $C\subseteq H$ a nonempty set and $\{x_n\}_{n=1}^{\infty}$ a given sequence such that:

  • (i)

    For every $z\in C$, $\lim_{n\to+\infty}\|x_n-z\|$ exists.

  • (ii)

    Every sequential weak cluster point of $\{x_n\}_{n=1}^{\infty}$ lies in $C$.

Then the sequence $\{x_n\}_{n=1}^{\infty}$ converges weakly to a point in $C$.

Theorem 11

  • (i)

    The sequence $\{x_n\}_{n=1}^{\infty}$ converges weakly to a point in $S$.

  • (ii)

    The sequence $(f(x_n))_{n=1}^{\infty}$ converges to the optimal objective value of the optimization problem (1).

Proof

(i) According to Lemma 8, $\lim_{n\to+\infty}\|x_n-\bar x\|$ exists for all $\bar x\in S$. Let $w$ be a sequential weak cluster point of $\{x_n\}_{n=1}^{\infty}$. Then there exists a subsequence $\{x_{n_j}\}_{j=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ such that $x_{n_j}$ converges weakly to $w$ as $j\to+\infty$. By Lemma 8, we have that $w\in\operatorname{argmin} g$. This means that in order to come to the conclusion it suffices to show that $f(w)\le f(x)$ for all $x\in\operatorname{argmin} g$. From Lemma 9, Lemma 8 and the fact that $\sum_{n=1}^{\infty}\lambda_n=+\infty$, it follows that $\lim_{n\to+\infty}\big[\Omega_n(x_n)-f(\bar x)\big]\le 0$ for all $\bar x\in S$. Thus,

$f(w)\le\liminf_{j\to+\infty}f(x_{n_j})\le\lim_{n\to+\infty}\Omega_n(x_n)\le f(\bar x)\quad\forall\bar x\in S,$

which shows that $w\in S$. Hence, thanks to the Opial Lemma, $\{x_n\}_{n=1}^{\infty}$ converges weakly to a point in $S$.

(ii) The statement follows easily from the above considerations.

We close this section by presenting some situations in which Assumption 2 is verified.

Remark 12

Let $\alpha\in(0,1)$, $c\in(1,+\infty)$, $q\in(0,1)$ and $\gamma\in\left(0,\frac{2}{L_{\nabla g}}\right)$ be arbitrarily chosen. We set

$K:=\frac{2}{\alpha}>0,\qquad\beta_n:=\frac{\gamma\big[L_{\nabla f}+2((1+\alpha)K+c)\big]}{2-\gamma L_{\nabla g}}+(1-\alpha)\gamma K n^{q},$

and

$\lambda_n:=\frac{(1-\alpha)\gamma}{\beta_n},$

for all $n\ge 1$.

  • (i)

    Since $\beta_n\ge\frac{\gamma\big[L_{\nabla f}+2((1+\alpha)K+c)\big]}{2-\gamma L_{\nabla g}}$, we have $\beta_n(2-\gamma L_{\nabla g})\ge\gamma\big[L_{\nabla f}+2((1+\alpha)K+c)\big]$, which implies that $\frac{L_n}{2}+\frac{\alpha-1}{\lambda_n}\le-\big(c+(1+\alpha)K\big)$ for all $n\ge 1$.

  • (ii)
    For all $n\ge 1$ it holds
    $\beta_{n+1}-\beta_n=(1-\alpha)\gamma K\big[(n+1)^{q}-n^{q}\big]\le(1-\alpha)\gamma K=K\lambda_{n+1}\beta_{n+1}.$
  • (iii)

    It holds $\liminf_{n\to+\infty}\lambda_n\beta_n=\liminf_{n\to+\infty}(1-\alpha)\gamma>0$.

  • (iv)
    For all $n\ge 1$ we have
    $\frac{1}{\lambda_{n+1}}-\frac{1}{\lambda_n}=\frac{1}{(1-\alpha)\gamma}\big(\beta_{n+1}-\beta_n\big)=K\big[(n+1)^{q}-n^{q}\big]\le K=\frac{2}{\alpha}.$
  • (v)

    Since $q\in(0,1)$, we have $\sum_{n=1}^{\infty}\frac{1}{\beta_n}=+\infty$, which implies that $\sum_{n=1}^{\infty}\lambda_n=+\infty$.

  • (vi)
    Finally, as $g\le\delta_{\operatorname{argmin} g}$, we have $g^*\ge\big(\delta_{\operatorname{argmin} g}\big)^*=\sigma_{\operatorname{argmin} g}$, and this implies that $g^*-\sigma_{\operatorname{argmin} g}\ge 0$. We present a situation where Assumption 2(III) holds and refer to [10] for further examples. For instance, if $g(x)\ge\frac{a}{2}\operatorname{dist}^2(x,\operatorname{argmin} g)$ for some $a>0$, then $g^*(x)-\sigma_{\operatorname{argmin} g}(x)\le\frac{1}{2a}\|x\|^2$ for every $x\in H$. Thus, for $p\in\operatorname{ran}(N_{\operatorname{argmin} g})$, we have
    $\lambda_n\beta_n\left[g^*\!\left(\tfrac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{p}{\beta_n}\right)\right]\le\frac{\lambda_n}{2a\beta_n}\|p\|^2.$
    Hence $\sum_{n=1}^{\infty}\lambda_n\beta_n\left[g^*\!\left(\tfrac{p}{\beta_n}\right)-\sigma_{\operatorname{argmin} g}\!\left(\tfrac{p}{\beta_n}\right)\right]$ converges if $\sum_{n=1}^{\infty}\frac{\lambda_n}{\beta_n}$ converges or, equivalently, if $\sum_{n=1}^{\infty}\frac{1}{\beta_n^{2}}$ converges. This holds for the above choices of $\{\beta_n\}_{n=1}^{\infty}$ and $\{\lambda_n\}_{n=1}^{\infty}$ when $q\in\left(\tfrac{1}{2},1\right)$.
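The parameter rules of Remark 12 are straightforward to generate numerically. The following Python sketch is illustrative only (the function name and the convention of passing the Lipschitz constants explicitly are our own choices); it returns $K$ together with the sequences $\{\beta_n\}$ and $\{\lambda_n\}$ defined above.

```python
def remark12_parameters(alpha, c, q, gamma, L_f, L_g):
    """Parameter choices of Remark 12 (sketch).

    alpha in (0,1), c > 1, q in (0,1), gamma in (0, 2/L_g);
    L_f, L_g : Lipschitz constants of grad f and grad g.
    """
    assert 0 < alpha < 1 and c > 1 and 0 < q < 1 and 0 < gamma < 2.0 / L_g
    K = 2.0 / alpha
    beta_const = gamma * (L_f + 2 * ((1 + alpha) * K + c)) / (2 - gamma * L_g)

    def beta(n):                       # beta_n = const + (1 - alpha)*gamma*K*n^q
        return beta_const + (1 - alpha) * gamma * K * n ** q

    def lam(n):                        # lambda_n = (1 - alpha)*gamma / beta_n
        return (1 - alpha) * gamma / beta(n)

    return K, beta, lam
```

For $q\in(1/2,1)$ these choices satisfy all conditions of Assumption 2, provided $g$ has the quadratic growth property described in item (vi) above.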

Numerical example: image classification via support vector machines

In this section we employ the algorithm proposed in this paper in the context of image classification via support vector machines.

Given a set of training data $a_i\in\mathbb{R}^n$, $i=1,\ldots,k$, each belonging to one of two given classes denoted by "$-1$" and "$+1$", the aim is to use this information to construct a decision function, given in the form of a separating hyperplane, which assigns every new data point to one of the two classes with a misclassification rate as low as possible. In order to be able to handle the situation when a full separation is not possible, we make use of non-negative slack variables $\xi_i\ge 0$, $i=1,\ldots,k$; thus the goal will be to find $(s,r,\xi)\in\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}_+^k$ as an optimal solution of the following optimization problem

$\begin{array}{ll}\text{minimize} & \frac{1}{2}\|s\|^2+\frac{C}{2}\|\xi\|^2\\[2pt] \text{subject to} & d_i(\langle a_i,s\rangle+r)\ge 1-\xi_i,\quad i=1,\ldots,k,\\[2pt] & \xi_i\ge 0,\quad i=1,\ldots,k,\end{array}$

where, for $i=1,\ldots,k$, $d_i$ is equal to $-1$ if $a_i$ belongs to the class "$-1$" and is equal to $+1$ otherwise. Each new data point $a\in\mathbb{R}^n$ will be assigned to one of the two classes by means of the resulting decision function $z(a)=\langle a,s\rangle+r$, namely, $a$ will be assigned to the class "$-1$" if $z(a)<0$, and to the class "$+1$" otherwise. For more theoretical insights into support vector machines we refer the reader to [29].
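As a small illustration (a hypothetical helper, not taken from the paper), once $(s,r)$ has been computed the decision rule above amounts to a single line of code:

```python
import numpy as np

def classify(a, s, r):
    """Assign each row of a to class -1 if z(a) = <a, s> + r < 0, and to class +1 otherwise."""
    return np.where(a @ s + r < 0.0, -1, 1)
```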

By making use of the matrix

$A=\begin{pmatrix} d_1a_1^{\top} & d_1 & 1 & 0 & \cdots & 0\\ d_2a_2^{\top} & d_2 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \ddots & \vdots\\ d_ka_k^{\top} & d_k & 0 & 0 & \cdots & 1\\ 0_{\mathbb{R}^n}^{\top} & 0 & 1 & 0 & \cdots & 0\\ 0_{\mathbb{R}^n}^{\top} & 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \ddots & \vdots\\ 0_{\mathbb{R}^n}^{\top} & 0 & 0 & 0 & \cdots & 1\end{pmatrix}\in\mathbb{R}^{2k\times(n+1+k)},$

the problem under investigation can be written as

$\begin{array}{ll}\text{minimize} & \frac{1}{2}\|s\|^2+\frac{C}{2}\|\xi\|^2\\[2pt] \text{subject to} & A\begin{pmatrix}s\\ r\\ \xi\end{pmatrix}-\begin{pmatrix}1_{\mathbb{R}^k}\\ 0_{\mathbb{R}^k}\end{pmatrix}\in\mathbb{R}_+^{2k},\end{array}$

or, equivalently,

$\begin{array}{ll}\text{minimize} & \frac{1}{2}\|s\|^2+\frac{C}{2}\|\xi\|^2\\[2pt] \text{subject to} & \begin{pmatrix}s\\ r\\ \xi\end{pmatrix}\in\operatorname{argmin}\ \frac{1}{2}\operatorname{dist}^2\!\left(A(\cdot)-\begin{pmatrix}1_{\mathbb{R}^k}\\ 0_{\mathbb{R}^k}\end{pmatrix},\ \mathbb{R}_+^{2k}\right).\end{array}$

By considering $f:\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}^k\to\mathbb{R}$, $f(s,r,\xi):=\frac{1}{2}\|s\|^2+\frac{C}{2}\|\xi\|^2$, we have $\nabla f(s,r,\xi)=\begin{pmatrix}s\\ 0\\ C\xi\end{pmatrix}$ and notice that $\nabla f$ is $\max\{1,C\}$-Lipschitz continuous.

Further, for $g(s,r,\xi):=\frac{1}{2}\operatorname{dist}^2\!\left(A\begin{pmatrix}s\\ r\\ \xi\end{pmatrix}-\begin{pmatrix}1_{\mathbb{R}^k}\\ 0_{\mathbb{R}^k}\end{pmatrix},\ \mathbb{R}_+^{2k}\right)$, we have $\nabla g(s,r,\xi)=A^{\top}\big(\operatorname{Id}-\operatorname{proj}_{\mathbb{R}_+^{2k}}\big)\!\left(A\begin{pmatrix}s\\ r\\ \xi\end{pmatrix}-\begin{pmatrix}1_{\mathbb{R}^k}\\ 0_{\mathbb{R}^k}\end{pmatrix}\right)$ and notice that $\nabla g$ is $\|A\|^2$-Lipschitz continuous, where $\operatorname{proj}_{\mathbb{R}_+^{2k}}$ denotes the projection operator onto the set $\mathbb{R}_+^{2k}$.
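The following Python sketch assembles these objects for a given training set. It is an illustrative reimplementation under our own naming conventions (the reported experiments were run in MATLAB) and can be combined with the remark12_parameters and inertial_gradient_penalty sketches given earlier.

```python
import numpy as np

def build_svm_problem(a, d, C):
    """Set up the gradients and Lipschitz constants for the SVM reformulation of (1).

    a : (k, n) array of training points, d : (k,) array of labels in {-1, +1},
    C : regularization parameter.  Sketch only.
    """
    k, n = a.shape
    # Matrix A in R^{2k x (n+1+k)} as defined in the text
    A = np.zeros((2 * k, n + 1 + k))
    A[:k, :n] = d[:, None] * a          # rows d_i * a_i
    A[:k, n] = d                        # column multiplying the offset r
    A[:k, n + 1:] = np.eye(k)           # slack variables xi_i in the margin constraints
    A[k:, n + 1:] = np.eye(k)           # constraints xi_i >= 0
    b = np.concatenate([np.ones(k), np.zeros(k)])

    def grad_f(x):                      # f(s, r, xi) = 1/2 ||s||^2 + C/2 ||xi||^2
        grad = x.copy()
        grad[n] = 0.0                   # f does not depend on r
        grad[n + 1:] *= C
        return grad

    def grad_g(x):                      # g = 1/2 dist^2(Ax - b, R_+^{2k})
        z = A @ x - b
        return A.T @ (z - np.maximum(z, 0.0))   # A^T (Id - proj_{R_+^{2k}})(Ax - b)

    L_f = max(1.0, C)                   # Lipschitz constant of grad f
    L_g = np.linalg.norm(A, 2) ** 2     # Lipschitz constant of grad g: ||A||^2
    return grad_f, grad_g, L_f, L_g
```

With illustrative values one could then call grad_f, grad_g, L_f, L_g = build_svm_problem(a, d, C=5.0), K, beta, lam = remark12_parameters(alpha=0.1, c=2.0, q=0.9, gamma=1.0/L_g, L_f=L_f, L_g=L_g) and x = inertial_gradient_penalty(grad_f, grad_g, x0, x1, lam, beta, alpha=0.1, n_iter=3000), starting from x0 = x1 = np.zeros(n + 1 + k); the first n entries of x give s and the next entry gives r for the decision function.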

For the numerical experiments we used a data set consisting of 6000 training images and 2060 test images of size $28\times 28$, taken from the website http://www.cs.nyu.edu/~roweis/data.html, representing the handwritten digits 2 and 7, labeled by $-1$ and $+1$, respectively (see Fig. 1). We evaluated the quality of the resulting decision function on the test data set by computing the percentage of misclassified images.

Fig. 1 A sample of images belonging to the classes $-1$ and $+1$, respectively

We denote by $D=\{(X_i,Y_i),\ i=1,\ldots,6000\}\subseteq\mathbb{R}^{784}\times\{-1,+1\}$ the set of available training data, consisting of 3000 images in the class $-1$ and 3000 images in the class $+1$. For numerical reasons each image has been vectorized and normalized. We tested in MATLAB different combinations of parameters chosen as in Remark 12, running the algorithm for 3000 iterations. A sample of misclassified images is shown in Fig. 2.

Fig. 2 A sample of misclassified images

In Table 1 we present the misclassification rate (in percentage) for different choices of the parameters $\alpha\in(0,1)$ (we recall that in this case we take $K:=2/\alpha$) and $C>0$, while for $\alpha=0$, which corresponds to the noninertial version of the algorithm, we consider different choices of the parameters $K>0$ and $C>0$. We observe that combining $\alpha=0.1$ with each of the regularization parameters $C=5,10,100$ leads to the lowest misclassification rate of 2.1845%.

Table 1.

Misclassification rate in percentage for different choices of the parameters $\alpha$ and $C$ when $c=2$ and $q=0.9$

α C=0.1 C=1 C=2 C=5 C=10 C=100
0.1 2.2330 2.2330 2.2330 2.1845 2.1845 2.1845
0.3 2.2330 2.2816 2.2816 2.2816 2.2816 2.2816
0.5 2.2330 2.2330 2.2330 2.2816 2.2816 2.3301
0.7 2.3786 2.3786 2.3786 2.3786 2.3786 2.3786
0.9 2.9126 2.9126 2.9126 2.9126 2.8641 2.8155
0 (K=0.1) 3.1068 3.0583 3.0583 2.9612 2.9612 2.7184
0 (K=1) 2.2816 2.2330 2.2330 2.2330 2.2330 2.2330
0 (K=10) 2.2816 2.2330 2.2330 2.2330 2.2330 2.2330
0 (K=100) 2.2330 2.2330 2.2330 2.2330 2.2330 2.2330
0 (K=1000) 2.2330 2.2330 2.2330 2.2330 2.2330 2.2330

In Table 2 we present the misclassification rate in percentage for different choices of the parameters $C>0$ and $c>1$. The lowest misclassification rate of 2.1845% is obtained for each of the regularization parameters $C=5,10,100$.

Table 2.

Misclassification rate in percentage for different choices of the parameters $C$ and $c>1$ when $\alpha=0.1$ and $q=0.9$

C c=1.1 c=2 c=5 c=10 c=100
0.1 2.2330 2.2330 2.2330 2.2330 2.2330
1 2.2330 2.2330 2.2330 2.2330 2.2330
2 2.2330 2.2330 2.2330 2.2330 2.2330
5 2.1845 2.1845 2.1845 2.1845 2.1845
10 2.1845 2.1845 2.1845 2.1845 2.1845
100 2.1845 2.1845 2.1845 2.1845 2.1845

Finally, Table 3 shows the misclassification rate in percentage for different choices of the parameters $C>0$ and $q\in(1/2,1)$. The lowest misclassification rate of 2.1845% is obtained when combining the value $q=0.9$ with each of the regularization parameters $C=5,10,100$.

Table 3.

Misclassification rate in percentage for different choices of the parameters $C$ and $q\in(1/2,1)$ when $\alpha=0.1$ and $c=2$

C q=0.6 q=0.75 q=0.9
0.1 2.2816 2.3301 2.2330
1 2.2330 2.2816 2.2330
2 2.2816 2.2816 2.2330
5 2.2330 2.2816 2.1845
10 2.2330 2.2816 2.1845
100 2.2330 2.2330 2.1845

Acknowledgements

Ernö Robert Csetnek's research was supported by FWF (Austrian Science Fund), Lise Meitner Programme, project M 1682-N25. Nimit Nimana is thankful to the Royal Golden Jubilee Ph.D. Program for financial support. Part of this research was carried out during a two-month stay of the third author at the Faculty of Mathematics of the University of Vienna in Spring 2016. The authors are thankful to two anonymous reviewers for hints and comments which improved the quality of the paper.

Contributor Information

Radu Ioan Boţ, Email: radu.bot@univie.ac.at.

Ernö Robert Csetnek, Email: ernoe.robert.csetnek@univie.ac.at.

Nimit Nimana, Email: nimitn@hotmail.com.

References

  • 1.Alvarez F. On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J. Control Optim. 2000;38(4):1102–1119. doi: 10.1137/S0363012998335802. [DOI] [Google Scholar]
  • 2.Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 2004;14(3):773–782. doi: 10.1137/S1052623403427859. [DOI] [Google Scholar]
  • 3.Alvarez F, Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001;9:3–11. doi: 10.1023/A:1011253113155. [DOI] [Google Scholar]
  • 4.Attouch, H., Cabot, A., Czarnecki, M.-O.: Asymptotic behavior of nonautonomous monotone and subgradient evolution equations. Trans. Am. Math. Soc. (to appear) (2016). arXiv:1601.00767
  • 5.Attouch H, Czarnecki M-O. Asymptotic behavior of coupled dynamical systems with multiscale aspects. J. Differ. Equ. 2010;248(6):1315–1344. doi: 10.1016/j.jde.2009.06.014. [DOI] [Google Scholar]
  • 6.Attouch H, Czarnecki M-O. Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J. Differ. Equ. 2017;262(3):2745–2770. doi: 10.1016/j.jde.2016.11.009. [DOI] [Google Scholar]
  • 7.Attouch H, Czarnecki M-O, Peypouquet J. Prox-penalization and splitting methods for constrained variational problems. SIAM J. Optim. 2011;21(1):149–173. doi: 10.1137/100789464. [DOI] [Google Scholar]
  • 8.Attouch H, Czarnecki M-O, Peypouquet J. Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J. Optim. 2011;21(4):1251–1274. doi: 10.1137/110820300. [DOI] [Google Scholar]
  • 9.Attouch H, Peypouquet J, Redont P. A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 2014;24(1):232–256. doi: 10.1137/130910294. [DOI] [Google Scholar]
  • 10.Banert S, Boţ RI. Backward penalty schemes for monotone inclusion problems. J. Optim. Theory Appl. 2015;166(3):930–948. doi: 10.1007/s10957-014-0700-x. [DOI] [Google Scholar]
  • 11.Bauschke HH, Combettes PL. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. New York: Springer; 2011. [Google Scholar]
  • 12.Bertsekas DP. Nonlinear Programming. 2. Cambridge: Athena Scientific; 1999. [Google Scholar]
  • 13.Boţ RI, Csetnek ER. Forward-backward and Tseng’s type penalty schemes for monotone inclusion problems. Set-Valued Var. Anal. 2014;22:313–331. doi: 10.1007/s11228-014-0274-7. [DOI] [Google Scholar]
  • 14.Boţ RI, Csetnek ER. A Tseng’s type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J. Math. 2014;42(4):451–465. doi: 10.1007/s10013-013-0050-2. [DOI] [Google Scholar]
  • 15.Boţ, R.I., Csetnek, E.R.: Levenberg–Marquardt dynamics associated to variational inequalities. Set-Valued Var. Anal. (2017). doi:10.1007/s11228-017-0409-8
  • 16.Boţ RI, Csetnek ER. An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algorithms. 2016;71:519–540. doi: 10.1007/s11075-015-0007-5. [DOI] [Google Scholar]
  • 17.Boţ RI, Csetnek ER. An inertial alternating direction method of multipliers. Minimax Theory Appl. 2016;1(1):29–49. [Google Scholar]
  • 18.Boţ RI, Csetnek ER. A hybrid proximal-extragradient algorithm with inertial effects. Numer. Funct. Anal. Optim. 2015;36(8):951–963. doi: 10.1080/01630563.2015.1042113. [DOI] [Google Scholar]
  • 19.Boţ RI, Csetnek ER. An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 2016;171(2):600–616. doi: 10.1007/s10957-015-0730-z. [DOI] [Google Scholar]
  • 20.Boţ RI, Csetnek ER. Approaching the solving of constrained variational inequalities via penalty term-based dynamical systems. J. Math. Anal. Appl. 2016;435:1688–1700. doi: 10.1016/j.jmaa.2015.11.032. [DOI] [Google Scholar]
  • 21.Boţ RI, Csetnek ER. Penalty schemes with inertial effects for monotone inclusion problems. Optimization. 2017;66(6):965–982. doi: 10.1080/02331934.2016.1181759. [DOI] [Google Scholar]
  • 22.Boţ RI, Csetnek ER. Second order dynamical systems associated to variational inequalities. Appl. Anal. 2017;96(5):799–809. doi: 10.1080/00036811.2016.1157589. [DOI] [Google Scholar]
  • 23.Boţ, R.I., Csetnek, E.R.: A second order dynamical system with Hessian-driven damping and penalty term associated to variational inequalities (2016). arXiv:1608.04137 [DOI] [PMC free article] [PubMed]
  • 24.Boţ RI, Csetnek ER, Hendrich C. Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 2015;256:472–487. [Google Scholar]
  • 25.Boţ RI, Csetnek ER, László S. An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 2016;4:3–25. doi: 10.1007/s13675-015-0045-8. [DOI] [Google Scholar]
  • 26.Cabot A, Frankel P. Asymptotics for some proximal-like method involving inertia and memory aspects. Set-Valued Var. Anal. 2011;19:59–74. doi: 10.1007/s11228-010-0140-1. [DOI] [Google Scholar]
  • 27.Chen C, Chan RH, Ma S, Yang J. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imaging Sci. 2015;8(4):2239–2267. doi: 10.1137/15100463X. [DOI] [Google Scholar]
  • 28.Chen C, Ma S, Yang J. A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J. Optim. 2015;25(4):2120–2142. doi: 10.1137/140980910. [DOI] [Google Scholar]
  • 29.Cristianini N, Taylor JS. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press; 2000. [Google Scholar]
  • 30.Maingé P-E. Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 2008;219:223–236. doi: 10.1016/j.cam.2007.07.021. [DOI] [Google Scholar]
  • 31.Maingé P-E, Moudafi A. Convergence of new inertial proximal methods for dc programming. SIAM J. Optim. 2008;19(1):397–413. doi: 10.1137/060655183. [DOI] [Google Scholar]
  • 32.Moudafi A, Oliny M. Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 2003;155:447–454. doi: 10.1016/S0377-0427(02)00906-8. [DOI] [Google Scholar]
  • 33.Noun N, Peypouquet J. Forward-backward penalty scheme for constrained convex minimization without inf-compactness. J. Optim. Theory Appl. 2013;158(3):787–795. doi: 10.1007/s10957-013-0296-6. [DOI] [Google Scholar]
  • 34.Ochs P, Chen Y, Brox T, Pock T. iPiano: Inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 2014;7(2):1388–1419. doi: 10.1137/130942954. [DOI] [Google Scholar]
  • 35.Peypouquet J. Coupling the gradient method with a general exterior penalization scheme for convex minimization. J. Optim. Theory Appl. 2012;153(1):123–138. doi: 10.1007/s10957-011-9936-x. [DOI] [Google Scholar]
  • 36.Polyak BT. Introduction to Optimization, (Translated from the Russian) Translations Series in Mathematics and Engineering. New York: Optimization Software Inc., Publications Division; 1987. [Google Scholar]
