. 2017 Sep 29;2017(1):239. doi: 10.1186/s13660-017-1500-2

Primal-dual interior point QP-free algorithm for nonlinear constrained optimization

Jinbao Jian 1, Hanjun Zeng 2, Guodong Ma 1, Zhibin Zhu 3
PMCID: PMC5622232  PMID: 29033531

Abstract

In this paper, a class of nonlinear constrained optimization problems with both inequality and equality constraints is discussed. Based on a simple and effective penalty parameter and the idea of primal-dual interior point methods, a QP-free algorithm for solving the discussed problems is presented. At each iteration, the algorithm needs to solve two or three reduced systems of linear equations with a common coefficient matrix, where a slightly new working set technique for judging the active set is used to construct the coefficient matrix, and the positive definiteness restriction on the Lagrangian Hessian estimate is relaxed. Under reasonable conditions, the proposed algorithm is globally and superlinearly convergent. During the numerical experiments, by modifying the technique in Section 5 of (SIAM J. Optim. 14(1): 173-199, 2003), we introduce a slightly new computation measure for the Lagrangian Hessian estimate based on second order derivative information, which can satisfy the associated assumptions. Then, the proposed algorithm is tested and compared on 59 typical test problems, which shows that the proposed algorithm is promising.

Keywords: inequality and equality constraints, optimization, primal-dual interior method, working set, global and superlinear convergence

Introduction

In this paper, we consider nonlinear constrained optimization problems with inequality and equality constraints

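$$(\mathrm{P})\qquad \min f(x),\quad \text{s.t.}\ g_i(x)=0,\ i\in I_\ell;\quad g_j(x)\le 0,\ j\in I_\imath, \tag{1}$$

where $I_\ell=\{1,2,\dots,m_\ell\}$, $I_\imath=\{m_\ell+1,m_\ell+2,\dots,m_\ell+m_\imath\}$, and the functions $f$ and $g_j$ map $\mathbb{R}^n\to\mathbb{R}$. Nonlinear equality constraints are known to be difficult to handle when designing algorithms for (P), especially methods of feasible directions (MFD). In 1976, Mayne and Polak [2] proposed a simple scheme that converts (P) into a sequence of smooth inequality constrained optimization problems

$$(\mathrm{P}_\rho)\qquad \min f_\rho(x):=f(x)-\rho\sum_{j\in I_\ell}g_j(x),\quad \text{s.t.}\ g_j(x)\le 0,\ j\in I_\ell\cup I_\imath, \tag{2}$$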

where $\rho>0$ is a penalty parameter. Under suitable constraint qualifications (CQ), e.g., linear independence, it has been shown that $(\mathrm{P}_\rho)$ is equivalent to (P) when $\rho$ is large enough. So, based on $(\mathrm{P}_\rho)$, one can study and present effective algorithms for the original problem (P), e.g., Refs. [1, 3-6].
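As an illustration of the reformulation (2), the following minimal Python sketch builds the penalized objective from user-supplied callables. The helper name `make_penalized_objective`, the toy problem and the value of `rho` are assumptions made for this example only; they do not come from the paper.

```python
def make_penalized_objective(f, g_list, eq_idx, rho):
    """Return f_rho(x) = f(x) - rho * sum_{j in eq_idx} g_j(x), as in (2).

    eq_idx lists the positions in g_list that hold the original equality
    constraints; in (P_rho) they are treated as inequalities g_j(x) <= 0.
    """
    def f_rho(x):
        return f(x) - rho * sum(g_list[j](x) for j in eq_idx)
    return f_rho


# Hypothetical toy problem: min x0^2 + x1^2  s.t.  x0 + x1 - 1 = 0,  -x0 <= 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
g_list = [lambda x: x[0] + x[1] - 1.0, lambda x: -x[0]]

f_rho = make_penalized_objective(f, g_list, eq_idx=[0], rho=2.0)
value = f_rho([0.25, 0.25])  # strictly feasible point for (P_rho)
```

At a point with $g_j(x)<0$ for an equality index $j$, the penalty term is positive, so minimizing $f_\rho$ pushes such a constraint toward zero from the feasible side.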

In addition, with the help of the nonsmooth inequality constrained optimization problem

$$\min f(x)+\sum_{j\in I_\ell}c_j|g_j(x)|,\quad \text{s.t.}\ g_j(x)\le 0,\ j\in I_\ell\cup I_\imath,$$

one can also design an algorithm for solving the original problem (P), e.g., [7], where $c_j>0$ is the penalty parameter that needs to be updated.

It is known that the sequential quadratic programming (SQP) method is one of the efficient methods for constrained optimization due to its fast convergence, and it has been widely studied by many authors, see Refs. [8-17]. However, the quadratic program (QP) subproblems solved in the SQP methods may be inconsistent, and the computational cost for the QPs is high. Therefore, motivated by the KKT condition of the QPs and/or the quasi-Newton method, QP-free methods are put forward, in which the QPs are replaced by suitable systems of linear equations (SLEs), see Refs. [18-26].

Now we briefly review the primal-dual interior point (PDIP) QP-free algorithms related to our work. First, for problem (P) with no equality constraints, i.e., $I_\ell=\emptyset$, Panier et al. [22] presented in 1987 a QP-free algorithm, denoted by PTH. At iterate $k$, two SLEs are solved to yield a master search direction, and then a least squares problem (LSP) needs to be solved to avoid the so-called Maratos effect [27]. However, the SLEs solved in [22] may become ill-conditioned, and the PTH algorithm may be unstable. Furthermore, the initial point must lie in the strict interior of the feasible set, and an additional assumption that ‘the number of stationary points is finite’ is used to ensure global convergence. Later, under the assumption that the multiplier approximation sequence remains bounded, the PTH algorithm was improved by Gao et al. [3] by solving an extra SLE. The PTH algorithm was also improved by Qi and Qi [23], Zhu [26] and Cai [28].

To improve the PTH algorithm [22], by using the idea of PDIP and choosing different barrier parameters for each constraint, Bakhtiari and Tits [18] proposed a new PDIP QP-free algorithm. The algorithm can start from a feasible point on the boundary of the feasible set, and it possesses global convergence without either the additional assumption of isolatedness of the stationary points or the positive definiteness restriction on the matrix $H_k$. Almost at the same time, Tits et al. [1] extended and improved the PTH algorithm to problem (P) with both inequality and equality constraints. The algorithm in [1] has two remarkable features. One is that a new and simple rule for updating the penalty parameter $\rho$ in $(\mathrm{P}_\rho)$ is derived; the other is that, as in [18], the uniformly positive definite restriction on the Lagrangian Hessian estimate is relaxed.

More recently, for inequality constrained optimization, Jian et al. [21] proposed a strongly sub-feasible primal-dual quasi interior-point algorithm with superlinear convergence. In that algorithm, the initial point can be chosen arbitrarily, the number of feasible constraints is nondecreasing, and the iteration points all enter the interior of the feasible region after finitely many iterations; a new kind of working set was introduced, which further reduced the computational cost; the uniformly positive definite restriction on the sequence $\{H_k\}$ was relaxed; and at each iteration, only two or three SLEs with the same coefficient matrix needed to be solved.

However, there are still some problems worthy of research on the PDIP-type algorithms [1, 18, 22]. First, the coefficient matrix of the Karush-Kuhn-Tucker (KKT) system of the LSP is not the same as the two previous SLEs, and this further increases the computational cost. Second, the coefficient matrices of the SLEs include all the constraints and their gradients, and this leads to a large increase in the scale of the SLEs. Third, the global convergence of the two algorithms [1, 18] relies on an additional assumption that the stationary points are finite or isolated.

On the other hand, to design more effective algorithms with small computational cost for solving constrained optimization, Facchinei et al. [29] first introduced the active set identifying technique (also called working set technique). And then this technique has been popularized and applied in many works, e.g., [17, 24, 25, 30, 31]. Particularly, the algorithm [30] needs to solve four SLEs at each iteration.

The goal of this paper is to improve and extend the algorithms [18, 21] to nonlinear constrained optimization (P) and, at the same time, to overcome the three problems mentioned above. As a result, by means of problem $(\mathrm{P}_\rho)$, we propose a PDIP-type algorithm for problem (P). Compared with the previous PDIP-type algorithms, the proposed algorithm possesses the following features.

  1. A slightly new identifying technique for the active set different from [17, 25] is introduced. The multiplier yielded at the previous iteration is used to compute the working set, and no additional computational cost is needed, so the computational cost is expected to be reduced.

  2. At each iteration, to yield the search directions, only two or three SLEs with the same coefficient matrix need to be solved. Furthermore, the coefficient matrix has smaller scale than the ones in [1, 18, 22].

  3. For a strict interior point $x^k$ of the feasible set of $(\mathrm{P}_\rho)$, the iteration at $x^k$ is well defined without any other constraint qualification (CQ).

  4. Under suitable CQ and assumptions including a relaxed positive definite restriction on the Lagrangian Hessian estimate $H_k$, but without the isolatedness of the stationary points, the proposed algorithm is globally and superlinearly convergent.

  5. A slightly new computation technique for $H_k$ based on second order derivative information is introduced, which is a modification of the one in [1], Section 5.1, and satisfies the relaxed positive definite restriction.

Throughout this paper, for simplicity, we denote the vector $(x^T,y^T,z^T,\dots)^T$ by $(x,y,z,\dots)$ for column vectors $x$, $y$ and $z$, and $\|\cdot\|$ denotes the Euclidean norm.

Construction of algorithm

To analyze our algorithm, the following notations are used:

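$$\begin{aligned}
&I=I_\ell\cup I_\imath,\qquad \hat e=(\underbrace{1,\dots,1}_{m_\ell},\underbrace{0,\dots,0}_{m_\imath})^T,\qquad e_J=(1,\dots,1)^T\in\mathbb{R}^{|J|},\\
&X=\{x\in\mathbb{R}^n: g_i(x)=0,\ i\in I_\ell;\ g_j(x)\le 0,\ j\in I_\imath\},\\
&\tilde X=\{x\in\mathbb{R}^n: g_j(x)\le 0,\ j\in I\},\qquad \tilde X^0=\{x\in\mathbb{R}^n: g_j(x)<0,\ j\in I\},\\
&I_\ell(x)=\{j\in I_\ell: g_j(x)=0\},\qquad I_\imath(x)=\{j\in I_\imath: g_j(x)=0\},\qquad I(x)=I_\ell(x)\cup I_\imath(x),\\
&g_\ell(x)=(g_j(x),\ j\in I_\ell),\qquad g_\imath(x)=(g_j(x),\ j\in I_\imath),\qquad g(x)=(g_j(x),\ j\in I),\\
&g_J(x)=(g_j(x),\ j\in J\subseteq I),\qquad \nabla g_J(x)=(\nabla g_j(x),\ j\in J),\\
&g_j^k=g_j(x^k),\qquad g_J^k=g_J(x^k),\qquad \nabla g_j^k=\nabla g_j(x^k),\qquad \nabla g_j^{kT}=(\nabla g_j^k)^T.
\end{aligned}$$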

First, the following basic hypothesis is necessary.

H1

The inner set $\tilde X^0$ is nonempty, and the functions $f$ and $g_j$ ($j\in I$) are all continuously differentiable.

Remark 1

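Note that if there exists a point belonging to the set $\tilde X$, namely, $\hat x\in\tilde X$, and the active constraint gradient vectors $\{\nabla g_j(\hat x),\ j\in I(\hat x)\}$ are linearly independent, then one can yield a point $x^0\in\tilde X^0$ by simple computation, e.g., by executing a line search on $g$ starting from $\hat x$ along the direction $\hat d=-\hat N(\hat N^T\hat N)^{-1}e$, where $\hat N=\nabla g_{I(\hat x)}(\hat x)$ and $e=(1,\dots,1)^T$. Indeed, $\hat N^T\hat d=-e$, so every active constraint decreases to first order along $\hat d$.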

Before proposing our algorithm, we give a proposition to show the equivalence between (P) and $(\mathrm{P}_\rho)$.

Proposition 1

If $(x,\lambda)$ is a KKT pair for problem $(\mathrm{P}_\rho)$ and $g_\ell(x)=0$, then $(x,\lambda^\rho)$ with multiplier $\lambda^\rho=\lambda-\rho\hat e$ is a KKT pair for the original problem (P).

Based on Proposition 1, if one can construct an effective algorithm for problem $(\mathrm{P}_\rho)$ and adjust the parameter $\rho$ to force the iterates to asymptotically satisfy $g_\ell(x)=0$, then a solution to (P) can be obtained.
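The identity behind Proposition 1 can be checked directly. The following short derivation is a sketch added for the reader; it writes $I_\ell$ for the equality index set, $I$ for the full index set, and $\hat e$ for the 0-1 vector whose ones mark the indices in $I_\ell$.

```latex
\begin{aligned}
\nabla f_\rho(x)+\sum_{j\in I}\lambda_j\nabla g_j(x)
 &= \nabla f(x)-\rho\sum_{j\in I_\ell}\nabla g_j(x)+\sum_{j\in I}\lambda_j\nabla g_j(x)\\
 &= \nabla f(x)+\sum_{j\in I}\bigl(\lambda_j-\rho\,\hat e_j\bigr)\nabla g_j(x)
  = \nabla f(x)+\sum_{j\in I}\lambda^\rho_j\nabla g_j(x).
\end{aligned}
```

Hence stationarity for $(\mathrm{P}_\rho)$ with $\lambda$ is stationarity for (P) with $\lambda^\rho$. For the inequality indices, $\lambda^\rho_j=\lambda_j$, so nonnegativity and complementarity carry over unchanged; for the equality indices, no sign condition is required in (P), and $g_\ell(x)=0$ supplies feasibility.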

Now, referring to [29] and [24], we introduce the optimal identification functions $\Phi$ and $\delta$ as follows:

$$\Phi(x,\lambda)=\begin{pmatrix}\nabla_xL(x,\lambda)\\ g_\ell(x)\\ \min\{-g_\imath(x),\lambda_\imath\}\end{pmatrix},\qquad \delta(x,\lambda)=\|\Phi(x,\lambda)\|^r, \tag{3}$$

where $\lambda=(\lambda_\ell,\lambda_\imath)$, parameter $r\in(0,1)$, and the Lagrangian function

$$L(x,\lambda)=f(x)+\sum_{j\in I}\lambda_jg_j(x). \tag{4}$$

It is clear that $(x,\lambda)$ is a KKT pair of (P) if and only if $\delta(x,\lambda)=0$. In particular, from [29] and/or [24], Definition 4.1, Theorems 4.1, 4.2 and 4.3, one can see that $\{j\in I: g_j(x)+\delta(x,\lambda)\ge 0\}$ is an exact identification set for the active constraint set $I(x^*)$ if $(x,\lambda)$ converges to a KKT pair $(x^*,\lambda^*)$ of problem (P), and the Mangasarian-Fromovitz constraint qualification (MFCQ) and the second order sufficient conditions are satisfied at $(x^*,\lambda^*)$.

In this paper, similarly to the techniques in [21, 30], for the current iterate $x^k\in\tilde X^0$, we yield the corresponding multiplier vector $\lambda^k=(\lambda^k_\ell,\lambda^k_\imath)$ in (3)-(4) as follows:

$$\lambda^0=z^0,\qquad \lambda^k=\bar\lambda^{k-1}-\rho_{k-1}\hat e,\quad k>0, \tag{5}$$

where $z^0>0$, and $(\bar\lambda^{k-1},\rho_{k-1})$ is computed in the previous, $(k-1)$th, iteration. Then, similarly to [29], we construct our working set by

$$I_k^\imath=\{j\in I_\imath: g_j(x^k)+\delta(x^k,\lambda^k)\ge 0\},\qquad I_k=I_\ell\cup I_k^\imath. \tag{6}$$

The reason why the equality indices $I_\ell$ are kept whole in $I_k$, rather than being screened in the same way as $I_k^\imath$, is to force $g_\ell(x^k)\to 0$, see the analysis of Theorem 1 in Section 3. The set $I_k^\imath$ equals the exact active set $I_\imath(x^*)$ when $(x^k,\lambda^k)$ is sufficiently close to a KKT pair $(x^*,\lambda^*)$ of (P) and the second order sufficient conditions as well as the MFCQ hold at $(x^*,\lambda^*)$. This important property allows us to construct the direction finding subproblems by considering only the constraints in the working set $I_k$.
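The identification rule (3) and (6) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the argument layout (gradient of the Lagrangian and constraint values passed in as plain lists) and the numerical values are hypothetical and not taken from the paper.

```python
import math


def identification(grad_L, g_eq, g_ineq, lam_ineq, r=0.5):
    """Return (||Phi||, delta) with Phi = (grad_x L, g_eq, min{-g_ineq, lam_ineq}), see (3)."""
    comp = [min(-gj, lj) for gj, lj in zip(g_ineq, lam_ineq)]
    phi_norm = math.sqrt(sum(v * v for v in list(grad_L) + list(g_eq) + comp))
    return phi_norm, phi_norm ** r


def working_set(g_ineq, delta):
    """Positions j (within the inequality block) with g_j + delta >= 0, see (6)."""
    return [j for j, gj in enumerate(g_ineq) if gj + delta >= 0.0]


# Hypothetical data at a strictly feasible iterate (all constraint values negative).
grad_L = [0.03, 0.04]
g_eq = [-0.12]
g_ineq = [-0.01, -2.0]
lam_ineq = [0.5, 0.0]

phi_norm, delta = identification(grad_L, g_eq, g_ineq, lam_ineq, r=0.5)
I_k_ineq = working_set(g_ineq, delta)  # the equality indices are always kept in I_k
```

In this toy data the nearly active first inequality is selected, while the second, which is far from its boundary, is dropped from the working set.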

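Taking into account that the iterates always remain within the feasible set $\tilde X$, let us consider the first order optimality (KKT) conditions for problem $(\mathrm{P}_{\rho_k})$ near the current iterate $x^k$:

$$\nabla f_{\rho_k}(x)+\sum_{j\in I_k}\lambda_j\nabla g_j(x)=0,\qquad \lambda_jg_j(x)=0,\ j\in I_k,\qquad \lambda_{I_k}\ge 0.$$

Furthermore, if we ignore the non-negativity requirement ‘$\lambda_{I_k}\ge 0$’ and simultaneously introduce a suitable perturbation $((1-\zeta_k)\nabla f_{\rho_k}(x^k),\mu^k)\in\mathbb{R}^{n+|I_k|}$ on the right-hand side of the above system, then it reduces to a system of nonlinear equations in the variables $(x,\lambda_{I_k})$

$$\begin{pmatrix}\nabla f_{\rho_k}(x)+\sum_{j\in I_k}\lambda_j\nabla g_j(x)\\ (\lambda_jg_j(x),\ j\in I_k)\end{pmatrix}=\begin{pmatrix}(1-\zeta_k)\nabla f_{\rho_k}(x^k)\\ \mu^k\end{pmatrix}. \tag{7}$$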

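Applying the Newton method to system (7) starting from the current iterate $(x^k,\lambda^k_{I_k})$ yields an SLE as follows:

$$\begin{pmatrix}\nabla^2_{xx}L_{\rho_k}(x^k,\lambda^k_{I_k}) & \nabla g_{I_k}(x^k)\\ \Lambda^k\nabla g_{I_k}(x^k)^T & \operatorname{diag}(g^k_{I_k})\end{pmatrix}\begin{pmatrix}x-x^k\\ \lambda_{I_k}\end{pmatrix}=\begin{pmatrix}-\zeta_k\nabla f_{\rho_k}(x^k)\\ \mu^k\end{pmatrix}, \tag{8}$$

where the diagonal matrix $\Lambda^k=\operatorname{diag}(\lambda^k_{I_k})$, and the Lagrangian Hessian

$$\nabla^2_{xx}L_{\rho_k}(x^k,\lambda^k_{I_k})=\nabla^2f_{\rho_k}(x^k)+\sum_{j\in I_k}\lambda^k_j\nabla^2g_j(x^k).$$

Subsequently, to give the coefficient matrix in SLE (8) good properties and a low computational cost, we modify it as follows. First, replace the Lagrangian Hessian by a suitable approximate symmetric matrix $H_k$, and denote $x-x^k$ by the direction $d$. Second, replace the diagonal matrix $\Lambda^k$ by the positive diagonal matrix $Z_k=\operatorname{diag}(z^k_{I_k})$, where the vector $z^k_{I_k}$ is an approximation of $\lambda^k_{I_k}$.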

As a result, from system (8), the coefficient matrix and the form of the SLEs that need to be solved in our algorithm are as follows:

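$$V_k:=\begin{pmatrix}H_k & \nabla g_{I_k}(x^k)\\ Z_k\nabla g_{I_k}(x^k)^T & \operatorname{diag}(g^k_{I_k})\end{pmatrix}, \tag{9}$$

$$\mathrm{SLE}(V_k;\zeta_k,\mu^k):\qquad V_k\begin{pmatrix}d\\ \lambda_{I_k}\end{pmatrix}=\begin{pmatrix}-\zeta_k\nabla f_{\rho_k}(x^k)\\ \mu^k\end{pmatrix}. \tag{10}$$

To yield improved search directions with superlinear convergence, our algorithm solves two or three SLEs of the form (10) with different perturbation pairs $(\zeta_k,\mu^k)$.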
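To make the structure of (9)-(10) concrete, the sketch below assembles $V_k$ and solves SLE$(V_k;\zeta,\mu)$ for a tiny hypothetical instance. All names (`solve_linear`, `build_V`, `solve_sle`) and the data are assumptions for illustration; the hand-written Gaussian elimination is only a stand-in for whatever dense solver (or shared factorization of $V_k$) an implementation would actually use.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def build_V(H, grads, z, g):
    """V = [[H, G], [Z G^T, diag(g)]]; grads[j] is the gradient of the j-th working-set constraint."""
    n, m = len(H), len(g)
    V = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for k in range(n):
            V[i][k] = H[i][k]
        for j in range(m):
            V[i][n + j] = grads[j][i]
    for j in range(m):
        for i in range(n):
            V[n + j][i] = z[j] * grads[j][i]
        V[n + j][n + j] = g[j]
    return V


def solve_sle(H, grads, z, g, grad_f_rho, zeta, mu):
    """Solve V (d, lam) = (-zeta * grad_f_rho, mu), i.e. SLE(V; zeta, mu) in (10)."""
    n = len(H)
    rhs = [-zeta * v for v in grad_f_rho] + list(mu)
    sol = solve_linear(build_V(H, grads, z, g), rhs)
    return sol[:n], sol[n:]


# Hypothetical data: n = 2, one working-set constraint with g = -1 < 0.
H = [[1.0, 0.0], [0.0, 1.0]]
grads = [[1.0, 0.0]]
z, g = [1.0], [-1.0]
grad_f_rho = [1.0, 1.0]
d_bar, lam_bar = solve_sle(H, grads, z, g, grad_f_rho, zeta=1.0, mu=[0.0])
```

Because the two or three systems solved per iteration differ only in their right-hand sides, a practical code would factor $V_k$ once and reuse the factors; the sketch re-solves from scratch purely for brevity.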

Subsequently, it is necessary to analyze the singularities of the coefficient matrix Vk above, i.e., the solvability of SLE (10).

Lemma 1

For iterate $x^k\in\tilde X^0$ and $z^k_{I_k}>0$, if the matrix $H_k$ satisfies

$$H_k\succ\sum_{j\in I_k}\frac{z_j^k}{g_j^k}\nabla g_j^k\nabla g_j^{kT}, \tag{11}$$

then the coefficient matrix $V_k$ defined by (9) is invertible, where the matrix order $A\succ B$ means that $(A-B)$ is positive definite on $\mathbb{R}^n$.

Proof

It is sufficient to show that the SLE $V_ku=0$ has only the zero solution; this is elementary and omitted here. □

Remark 2

Obviously, the positive definiteness requirement (11) on $H_k$ is weaker than the positive definiteness of $H_k$ itself on $\mathbb{R}^n$. But it is stronger than the positive definiteness of $H_k$ on the null space of the gradients of the approximate active constraints, i.e., on $\Omega_k:=\{d\in\mathbb{R}^n: \nabla g_{I_k}(x^k)^Td=0\}$. However, the latter cannot ensure the invertibility of $V_k$.
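The first claim of Remark 2 can be illustrated numerically: an indefinite $H_k$ may still satisfy (11). The sketch below forms $H_k-\sum_j (z_j^k/g_j^k)\nabla g_j^k\nabla g_j^{kT}$ and tests positive definiteness by attempting a Cholesky factorization. The function names, the tolerance and the data are assumptions chosen for this example.

```python
import math


def q_matrix(H, grads, z, g):
    """Return H - sum_j (z_j / g_j) * a_j a_j^T, where a_j = grads[j]."""
    n = len(H)
    Q = [row[:] for row in H]
    for a, zj, gj in zip(grads, z, g):
        w = zj / gj  # negative at strictly feasible points (g_j < 0, z_j > 0)
        for i in range(n):
            for k in range(n):
                Q[i][k] -= w * a[i] * a[k]
    return Q


def is_positive_definite(Q, tol=1e-12):
    """Attempt a Cholesky factorization; fail as soon as a pivot is not safely positive."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = Q[i][k] - sum(L[i][m] * L[k][m] for m in range(k))
            if i == k:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][k] = s / L[k][k]
    return True


# Hypothetical data: H is indefinite, yet requirement (11) holds.
H_indef = [[-0.5, 0.0], [0.0, 1.0]]
Q = q_matrix(H_indef, grads=[[1.0, 0.0]], z=[1.0], g=[-0.25])
h_is_pd = is_positive_definite(H_indef)
q_is_pd = is_positive_definite(Q)
```

Here the constraint term contributes $+4$ to the first diagonal entry, which compensates for the negative curvature of `H_indef` in that direction.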

Based on the above analysis and preparation, now we can describe the steps of our algorithm solving (P) as follows.

Algorithm A

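Parameters: $\alpha\in(0,\tfrac12)$; $\sigma,\beta,\theta,r\in(0,1)$; $\xi\in(2,3)$, $\nu>2$, $\vartheta>1$, $M,p>0$; suitably small positive parameters $\gamma_1$, $\gamma_2$ and $\gamma_3$; a sufficiently small lower bound $\underline\varepsilon>0$ and a sufficiently large upper bound $\overline\varepsilon>0$; termination accuracy $\epsilon>0$.

Data: $x^0\in\tilde X^0$, $\rho_0>0$, and a vector $z^0$ with components $z^0_j\in[\underline\varepsilon,\overline\varepsilon]$, $j\in I$. Set $k:=0$.

Step 1 Compute working set. Compute $\lambda^k$ by (5), and $\Phi(x^k,\lambda^k)$ and $\delta(x^k,\lambda^k)$ by (3)-(4). If $\|\Phi(x^k,\lambda^k)\|\le\epsilon$ or another suitable termination rule is satisfied, then $(x^k,\lambda^k)$ is an approximate KKT pair of problem (P) and stop; otherwise, generate the working sets $I_k^\imath$ and $I_k$ by (6).

Step 2 Yield matrix $H_k$. Yield a matrix $H_k$ such that it approximates the Hessian of the Lagrangian associated with $(\mathrm{P}_{\rho_k})$ and satisfies requirement (11).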

Step 3 Compute the main search directions.

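(i) Compute $(\bar d^k,\bar\lambda^k_{I_k})$ by solving SLE$(V_k;1,0)$, see (10), then set $\bar\lambda^k=(\bar\lambda^k_{I_k},0_{I\setminus I_k})=(\bar\lambda^k_\ell,\bar\lambda^k_\imath)$ with $\bar\lambda^k_\imath=(\bar\lambda^k_{I_k^\imath},0_{I_\imath\setminus I_k^\imath})$.

(ii) Check conditions: (a) $\|\bar d^k\|\le\gamma_1$, (b) $\bar\lambda^k\ge-\gamma_2e_I$, (c) $\bar\lambda^k_\ell\not>\gamma_3e_{I_\ell}$. If all three conditions above hold, then increase the penalty parameter $\rho$ by $\rho_{k+1}=\vartheta\rho_k$, set $x^{k+1}=x^k$, $z^{k+1}=z^k$, $H_{k+1}=H_k$, $I^\imath_{k+1}=I^\imath_k$, $I_{k+1}=I_k$, $k:=k+1$, and go back to Step 3(i). Otherwise, set $\rho_{k+1}=\rho_k$ and proceed to Step 3(iii) as follows.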

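(iii) Yield the components of the vector $\phi^k$ by

$$\phi_j^k=\min\{0,\ -(\max\{-\bar\lambda_j^k,0\})^p-Mg_j^k\},\quad j\in I_k. \tag{12}$$

Then compute

$$\xi_k=\nabla f_{\rho_k}(x^k)^T\bar d^k-\sum_{j\in I_k}\frac{\bar\lambda_j^k\phi_j^k}{z_j^k}, \tag{13}$$

$$b_k=(\|\bar d^k\|^\nu+\|\phi^k\|)\Big(\sum_{j\in I_k}\bar\lambda_j^k\Big)+\sum_{j\in I_k}\frac{\bar\lambda_j^k}{z_j^k}\phi_j^k, \tag{14}$$

$$\varphi_k=\begin{cases}1, & \text{if } b_k\le 0;\\ \min\Big\{\dfrac{(1-\theta)|\xi_k|}{b_k},1\Big\}, & \text{if } b_k>0,\end{cases} \tag{15}$$

and yield the perturbation vector via the convex combination

$$\mu^k=(1-\varphi_k)\phi^k+\varphi_k(-\|\bar d^k\|^\nu-\|\phi^k\|)z^k_{I_k}. \tag{16}$$

(iv) Compute $(d^k,\lambda^k_{I_k})$ by solving SLE$(V_k;1,\mu^k)$, see (10), then set $\lambda^k=(\lambda^k_{I_k},0_{I\setminus I_k})=(\lambda^k_\ell,\lambda^k_\imath)$ with $\lambda^k_\imath=(\lambda^k_{I_k^\imath},0_{I_\imath\setminus I_k^\imath})$.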

Step 4 Trial of unit step. If

$$f_{\rho_k}(x^k+d^k)\le f_{\rho_k}(x^k)+\alpha\nabla f_{\rho_k}(x^k)^Td^k,\qquad g_j(x^k+d^k)<0,\ \forall j\in I,$$

then let the step size $t_k=1$ and the high order correction direction $\tilde d^k=0$, and enter Step 7. Otherwise, proceed to Step 5.

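Step 5 Generate high order correction direction. Compute $(\tilde d^k,\tilde\lambda^k_{I_k})$ by solving SLE$(V_k;0,\tilde\mu^k)$, where

$$\tilde\mu^k=-\omega_ke_{I_k}-Z_kg_{I_k}(x^k+d^k), \tag{17}$$

$$\omega_k=\max\Big\{\|d^k\|^\xi;\ \|d^k\|^2\max\Big\{\Big|1-\frac{z_j^k}{\lambda_j^k}\Big|^\sigma,\ j\in I_k,\ \lambda_j^k\ne 0\Big\}\Big\}. \tag{18}$$

If $\|\tilde d^k\|>\|d^k\|$, reset $\tilde d^k=0$.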

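Step 6 Perform arc search. Compute the step size $t_k$ as the largest number $t$ in the sequence $\{1,\beta,\beta^2,\dots\}$ satisfying

$$f_{\rho_k}(x^k+td^k+t^2\tilde d^k)\le f_{\rho_k}(x^k)+\alpha t\nabla f_{\rho_k}(x^k)^Td^k, \tag{19}$$

$$g_j(x^k+td^k+t^2\tilde d^k)<0,\quad \forall j\in I. \tag{20}$$

Step 7 Update. Yield a new iterate by $x^{k+1}=x^k+t_kd^k+t_k^2\tilde d^k$ and compute

$$z_j^{k+1}=\min\{\max\{\|d^k\|^2+\underline\varepsilon,\ \lambda_j^k\},\ \overline\varepsilon\},\quad j\in I. \tag{21}$$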

Set k:=k+1, go back to Step 1.
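The safeguarded multiplier update (21) in Step 7 amounts to clipping each component of $\lambda^k$ into an interval. A minimal sketch follows; the function name and the default bounds `eps_lo`, `eps_hi` are hypothetical values chosen for illustration, not settings reported in the paper.

```python
def update_z(lam, d_norm, eps_lo=1e-4, eps_hi=1e4):
    """z_j^{k+1} = min{ max{ ||d^k||^2 + eps_lo, lam_j }, eps_hi }, see (21)."""
    lower = d_norm ** 2 + eps_lo
    return [min(max(lower, lj), eps_hi) for lj in lam]


# A negative estimate is lifted to the lower bound, a huge one is capped.
z_next = update_z(lam=[-0.3, 2.0, 5.0e4], d_norm=0.1)
```

The lower bound shrinks with $\|d^k\|^2$ but never below `eps_lo`, which keeps $Z_k$ positive definite as Lemma 1 requires.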

Subsequently, we analyze and describe some properties of Algorithm A by the following lemma and several remarks. For convenience, denote the matrix

$$Q_k:=H_k-\sum_{j\in I_k}\frac{z_j^k}{g_j^k}\nabla g_j^k\nabla g_j^{kT}. \tag{22}$$

Then requirement (11) means that the matrix $Q_k$ is positive definite.

Lemma 2

For the directions $\bar d^k$ and $d^k$ yielded in Step 3(i), (iv), the following two relations hold:

$$\nabla f_{\rho_k}(x^k)^T\bar d^k=-(\bar d^k)^TQ_k\bar d^k\le 0,\quad \forall k\ge 0, \tag{23}$$

$$\nabla f_{\rho_k}(x^k)^Td^k\le\theta\xi_k\le 0,\quad \forall k\ge 0. \tag{24}$$

Furthermore, when the iterative process goes into Step 3(iii), (iv), one has $\bar d^k\ne 0$ and $\xi_k<0$, so $d^k$ is a feasible descent direction of problem $(\mathrm{P}_{\rho_k})$ at the point $x^k$, and the arc search in Step 6 terminates after finitely many trials. Therefore, Algorithm A is well defined.

Proof

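First, from (9) and SLE$(V_k;1,0)$ in (10), we have

$$\nabla f_{\rho_k}(x^k)^T\bar d^k=-(\bar d^k)^T\Big(H_k\bar d^k+\sum_{j\in I_k}\nabla g_j^k\bar\lambda_j^k\Big)=-(\bar d^k)^T\Big(H_k-\sum_{j\in I_k}\frac{z_j^k}{g_j^k}\nabla g_j^k\nabla g_j^{kT}\Big)\bar d^k=-(\bar d^k)^TQ_k\bar d^k\le 0,$$

where the second equality uses $\bar\lambda_j^k=-(z_j^k/g_j^k)\nabla g_j^{kT}\bar d^k$, which is the second block row of SLE$(V_k;1,0)$. So conclusion (23) holds. Second, from (12)-(13), one gets

$$\phi_j^k\bar\lambda_j^k\ge 0,\ \forall j\in I_k;\qquad \xi_k\le\nabla f_{\rho_k}(x^k)^T\bar d^k\le 0. \tag{25}$$

On the other hand, taking into account SLE$(V_k;1,0)$ and SLE$(V_k;1,\mu^k)$ as well as (13)-(14), it is not difficult to show that

$$\nabla f_{\rho_k}(x^k)^Td^k=\nabla f_{\rho_k}(x^k)^T\bar d^k-\sum_{j\in I_k}\frac{\bar\lambda_j^k\mu_j^k}{z_j^k}=\xi_k+\varphi_kb_k. \tag{26}$$

Again, in view of (15), it follows that $\varphi_kb_k=b_k\le 0$ if $b_k\le 0$; hence the relations $\xi_k+\varphi_kb_k\le\xi_k\le\theta\xi_k$ hold since $\xi_k\le 0$. If $b_k>0$, then $\xi_k+\varphi_kb_k\le\xi_k+(\theta-1)\xi_k=\theta\xi_k$. In all, one gets $\xi_k+\varphi_kb_k\le\theta\xi_k$. This, together with (26) and (25), shows that $\nabla f_{\rho_k}(x^k)^Td^k\le\theta\xi_k\le 0$.

Third, if $\bar d^k=0$, then, from SLE$(V_k;1,0)$ in (10), $g(x^k)<0$ and (9), it follows that $\bar\lambda^k_{I_k}=0$. So, by the structure of Step 3, iteration $k$ does not go into Step 3(iii), (iv). Thus, $\bar d^k\ne 0$ when the iterative process goes into Step 3(iii), (iv).

Finally, $\xi_k<0$ follows from (25), (23) and $\bar d^k\ne 0$. The remaining claims in Lemma 2 follow from $\xi_k<0$ and $g(x^k)<0$. □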

As an end of this section, to help the readers understand our algorithm, we further analyze the steps/structure of Algorithm A with three remarks below.

Remark 3

Analysis for Step 3

  • (i)

    The role of solving SLE$(V_k;1,0)$ with no perturbation in Step 3(i) is to check whether the current iterate $x^k$ is an approximate KKT point of $(\mathrm{P}_{\rho_k})$ and to yield a direction $\bar d^k$ that is ‘improved’ to a certain extent.

  • (ii)

    If conditions (a) and (b) in Step 3(ii) are satisfied, and the parameters $\gamma_1$ and $\gamma_2$ are small enough, then SLE$(V_k;1,0)$ implies that $x^k$ is an approximate KKT point of $(\mathrm{P}_{\rho_k})$. However, if condition (c) is also satisfied, one cannot estimate $g_\ell(x^k)$. So, we increase the penalty parameter $\rho$. In practical computation, if conditions (a) and (b) are satisfied and $\|g_\ell(x^k)\|$ is small enough, we can terminate the algorithm.

  • (iii)

    From result (23), one knows that $\bar d^k$ is a descent direction of the merit function $f_{\rho_k}(x)$ at $x^k$ when $\bar d^k\ne 0$. However, since primal feasibility and dual feasibility are relaxed to a large extent in SLE$(V_k;1,0)$, $\bar d^k$ cannot be used as an effective search direction. So, generally, the first direction $\bar d^k$ should be corrected by another SLE. For this purpose, following [21], we construct and solve SLE$(V_k;1,\mu^k)$ in Step 3(iii), (iv). Lemma 2 and the global convergence analysis in the next section show that the algorithm with search direction $d^k$ is well defined and globally convergent.
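The construction of the perturbation in Step 3(iii), formulas (12)-(16), can be sketched as follows. The function name, the argument layout and all numerical values (including the defaults for `M`, `p`, `nu`, `theta`) are assumptions for illustration; only the formulas themselves follow the text.

```python
import math


def perturbation(slope_bar, d_bar_norm, lam_bar, z, g, M=2.0, p=1.0, nu=3.0, theta=0.8):
    """Return (phi, xi, b, varphi, mu) from (12)-(16); slope_bar = grad f_rho(x)^T d_bar."""
    phi = [min(0.0, -(max(-l, 0.0)) ** p - M * gj) for l, gj in zip(lam_bar, g)]      # (12)
    xi = slope_bar - sum(l * ph / zj for l, ph, zj in zip(lam_bar, phi, z))            # (13)
    scale = d_bar_norm ** nu + math.sqrt(sum(v * v for v in phi))
    b = scale * sum(lam_bar) + sum(l / zj * ph for l, ph, zj in zip(lam_bar, phi, z))  # (14)
    varphi = 1.0 if b <= 0.0 else min((1.0 - theta) * abs(xi) / b, 1.0)                # (15)
    mu = [(1.0 - varphi) * ph - varphi * scale * zj for ph, zj in zip(phi, z)]         # (16)
    return phi, xi, b, varphi, mu


# Hypothetical data: two working-set constraints, one with a negative multiplier estimate.
theta = 0.8
phi, xi, b, varphi, mu = perturbation(
    slope_bar=-1.0, d_bar_norm=1.0, lam_bar=[2.0, -1.0], z=[1.0, 1.0], g=[-0.5, -0.1],
    theta=theta,
)
predicted_slope = xi + varphi * b  # equals grad f_rho(x)^T d by (26)
```

In this example $b_k>0$, so $\varphi_k<1$ and the predicted slope $\xi_k+\varphi_kb_k$ equals $\theta\xi_k$, which is exactly the bound (24) holding with equality.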

Remark 4

Explanation for Steps 4 and 5

Usually, the search direction $d^k$ cannot avoid the Maratos effect, i.e., the unit step may fail to be accepted by the associated line search for all sufficiently large $k$. So, to overcome the Maratos effect and obtain superlinear convergence, one needs to compute an additional high order correction direction. Here, we generate it by solving SLE$(V_k;0,\tilde\mu^k)$ in Step 5. Solving SLE$(V_k;0,\tilde\mu^k)$ adds some computational cost. On the other hand, numerical testing shows that $d^k$ alone can still avoid the Maratos effect at some iterates. Therefore, to save computational cost as much as possible, the trial of the unit step in Step 4 is added.
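The acceptance tests (19)-(20) of the arc search can be sketched as a simple backtracking loop. The function name `arc_search`, the cap `max_trials` and the one-dimensional toy data are assumptions for this illustration. The unit-step trial of Step 4 corresponds to the first pass of the same loop with a zero correction direction.

```python
def arc_search(f_rho, g_list, x, d, d_tilde, slope, alpha=0.1, beta=0.5, max_trials=50):
    """Largest t in {1, beta, beta^2, ...} satisfying (19)-(20); slope = grad f_rho(x)^T d."""
    f0 = f_rho(x)
    t = 1.0
    for _ in range(max_trials):
        trial = [xi + t * di + t * t * ci for xi, di, ci in zip(x, d, d_tilde)]
        decrease_ok = f_rho(trial) <= f0 + alpha * t * slope   # (19)
        interior_ok = all(g(trial) < 0.0 for g in g_list)      # (20), strict feasibility
        if decrease_ok and interior_ok:
            return t, trial
        t *= beta
    raise RuntimeError("no acceptable step found within max_trials")


# Hypothetical 1-D example: minimize (x - 2)^2 subject to x - 1 < 0, starting at x = 0.
f_rho = lambda x: (x[0] - 2.0) ** 2
g_list = [lambda x: x[0] - 1.0]
x0, d, d_tilde = [0.0], [2.0], [0.0]
slope = 2.0 * (x0[0] - 2.0) * d[0]  # directional derivative, equals -8
t_k, x_next = arc_search(f_rho, g_list, x0, d, d_tilde, slope)
```

The steps $t=1$ and $t=0.5$ are rejected here because they land outside or on the boundary of the feasible set, which shows why (20) is enforced with a strict inequality.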

Remark 5

With the help of the working set technique, the three SLEs solved in Algorithm A have a common coefficient matrix $V_k$, which saves computational cost and differs from those in Refs. [18, 26], etc. Furthermore, owing to its interior point nature and the construction of $V_k$, Algorithm A is well defined at each iterate without any CQ other than the nonemptiness of the strict interior $\tilde X^0$, see Lemmas 1 and 2. In many existing QP-free type algorithms, see Refs. [1, 3, 21-24], the linear independence constraint qualification (LICQ) is necessary to ensure that the iteration itself is well defined. Of course, as we see in Assumption H3, to obtain the global and superlinear convergence of Algorithm A, a suitable CQ on the boundary of $\tilde X$ is still necessary.

Analysis of global convergence

In this section, we assume that the proposed algorithm (Algorithm A) generates an infinite iteration sequence $\{x^k\}$ of points. First, we show that the penalty parameter $\rho_k$ is fixed after finitely many iterations. Then, we prove that Algorithm A is globally convergent. For this purpose, the following hypotheses are necessary.

H2
Suppose that both sequences $\{x^k\}$ and $\{H_k\}$ yielded by Algorithm A are bounded, and assume that there exists a positive constant $a$ such that
$$d^TH_kd\ge a\|d\|^2-\sum_{j\in I_k}\frac{z_j^k}{|g_j^k|}\|\nabla g_j^{kT}d\|^2, \tag{27}$$
i.e., $d^TQ_kd\ge a\|d\|^2$, $\forall k$, $\forall d\in\mathbb{R}^n$.
H3
For each $x\in\tilde X$, suppose that
  • (i)
    the gradient vectors $\{\nabla g_j(x),\ j\in I(x)\}$ are linearly independent; and
  • (ii)
    if $x\notin X$, i.e., $g_\ell(x)\ne 0$, then there exist no scalars $\lambda_j\ge 0$, $j\in I(x)$, such that $\sum_{j\in I_\ell}\nabla g_j(x)=\sum_{j\in I(x)}\lambda_j\nabla g_j(x)$.

Remark 6

Analysis for H2

The uniform ‘positive-definiteness’ requirement (27) on $\{H_k\}$ is weaker than the usual uniform positive-definiteness of $\{H_k\}$ itself on $\mathbb{R}^n$, namely, $d^TH_kd\ge a\|d\|^2$, $\forall k$, $\forall d\in\mathbb{R}^n$. However, it is stronger than the uniform positive-definiteness of $H_k$ on the null space $\Omega_k$. It is encouraging that, based on the Lagrangian Hessian, we can design an alternative computational technique for $H_k$ such that $\{H_k\}$ is bounded and satisfies requirement (27), which implies (11), whenever $\{x^k\}$ is bounded, see formulas (52), (54) and (55) as well as Theorem 5 in Section 5.

Remark 7

Analysis for H3

  • (i)

    Hypothesis H3 was introduced by Tits et al. in [1], Assumption 3. In our work, it plays two roles in the convergence analysis of Algorithm A. One is to ensure that the correction of the penalty parameter $\rho$ finishes in a finite number of iterations; the other is to ensure that the sequence $\{V_k\}$ of coefficient matrices is uniformly invertible, see Lemmas 3 and 4. Furthermore, H3 is considerably milder than the linear independence of the gradients $\{\nabla g_i(x),\ i\in I_\ell;\ \nabla g_j(x),\ j\in I_\imath(x)\}$; a detailed analysis of this assumption can be found in [1, 32].

  • (ii)

    First, H3 automatically holds at each interior point $x\in\tilde X^0$. Second, H3 can be restricted to each accumulation point $x^*$ of the iterate sequence $\{x^k\}$ that satisfies $x^*\notin\tilde X^0$. However, the latter is difficult to verify.

Lemma 3

Suppose that H1, H2 and H3 hold. Then the penalty parameter $\rho_k$ in Algorithm A is increased at most finitely many times.

The proof of Lemma 3 is similar to that of [1], Lemma 4.1, and is omitted here. In what follows, $\bar\rho$ denotes the final value of $\rho_k$, i.e., $\rho_k\equiv\bar\rho$ when $k$ is sufficiently large.

Lemma 4

Suppose that H1, H2 and H3 hold. Then

  • (i)

    the sequence $\{V_k\}$ of coefficient matrices is uniformly invertible, i.e., there exists a positive constant $\bar M$ such that $\|V_k^{-1}\|\le\bar M$, $\forall k\ge 0$, and

  • (ii)

    both sequences $\{(\bar d^k,\bar\lambda^k)\}$ and $\{(d^k,\lambda^k)\}$ are bounded.

Proof

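(i) By contradiction, suppose that there exists an infinite subset $K$ such that $\|V_k^{-1}\|\to\infty$, $k\in K$. In view of the boundedness of $\{x^k\}$ and $\{H_k\}$, Step 6 and the finite choice of $I_k^\imath$, without loss of generality, for $k\in K$, assume that

$$I_k^\imath\equiv I',\qquad x^k\to x^*,\qquad H_k\to H_*,\qquad z^k\to z^*\ge\underline\varepsilon e_I>0.$$

Denote $\hat I=I_\ell\cup I'$ and $Z_*=\operatorname{diag}(z^*_{\hat I})$, then

$$V_k\xrightarrow{K}V_*:=\begin{pmatrix}H_* & \nabla g_{\hat I}(x^*)\\ Z_*\nabla g_{\hat I}(x^*)^T & \operatorname{diag}(g_{\hat I}(x^*))\end{pmatrix}. \tag{28}$$

Consequently, under H1-H3, following the proof of [21], Lemma 3.1(i), one can show that $V_*$ is nonsingular. So $\|V_k^{-1}\|\xrightarrow{K}\|V_*^{-1}\|<\infty$, which contradicts $\|V_k^{-1}\|\xrightarrow{K}\infty$.

(ii) First, the boundedness of $\{(\bar d^k,\bar\lambda^k)\}$ follows from SLE$(V_k;1,0)$ and conclusion (i) as well as $\rho_k\equiv\bar\rho$. Second, the boundedness of $\{\mu^k\}$ follows from formulas (12)-(16) and the boundedness of $\{(\bar d^k,\bar\lambda^k)\}$ as well as the positive lower bound of $\{z^k\}$. Therefore, the boundedness of $\{(d^k,\lambda^k)\}$ also follows from SLE$(V_k;1,\mu^k)$. □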

Lemma 5

Suppose that H1, H2 and H3 hold. Let $x^*$ be an accumulation point of the sequence $\{x^k\}$ generated by Algorithm A, and suppose that $\{x^k\}_K\to x^*$ for some infinite index set $K$. If $\{\xi_k\}_K\to 0$, then $x^*$ is a KKT point of problem $(\mathrm{P}_{\bar\rho})$, and both $\{\bar\lambda^k\}_K$ and $\{\lambda^k\}_K$ converge to the unique multiplier vector $\lambda^*_{\bar\rho}$ of $(\mathrm{P}_{\bar\rho})$ associated with $x^*$.

Proof

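Let $(\bar\lambda^*;\hat\lambda^*)$ be any given limit point of $\{(\bar\lambda^k;\lambda^k)\}_K$. We first show that $(x^*,\bar\lambda^*)$ is a KKT pair of $(\mathrm{P}_{\bar\rho})$. In view of H2, Lemma 4 and the finite choice of $I_k^\imath$, there is an infinite index set $K'\subseteq K$ such that

$$I_k^\imath\equiv I',\quad (\bar\lambda^k;\lambda^k)\to(\bar\lambda^*;\hat\lambda^*),\quad H_k\to H_*,\quad \bar d^k\to\bar d^*,\quad z^k\to z^*\ge\underline\varepsilon e_I,\quad k\in K'. \tag{29}$$

Therefore, from (25), (23) and H2, one can easily get $\bar d^*=0$ by $\{\xi_k\}_K\to 0$. Further, taking the limit in SLE$(V_k;1,0)$ for $k\in K'$, we have, with $\hat I=I_\ell\cup I'$,

$$\nabla f_{\bar\rho}(x^*)+\sum_{j\in\hat I}\bar\lambda_j^*\nabla g_j(x^*)=0;\qquad \bar\lambda_j^*g_j(x^*)=0,\ \forall j\in\hat I. \tag{30}$$

Next, we show that $\bar\lambda^*\ge 0$. It is obvious that $\bar\lambda_j^*=0$ follows from $\bar\lambda_j^*g_j(x^*)=0$ for $j\in\hat I\setminus I(x^*)$. Moreover, from the definition of $\xi_k$, i.e., (13), and $(\xi_k,\bar d^k)\xrightarrow{K'}(0,0)$, we can deduce that $\sum_{j\in\hat I}\bar\lambda_j^k\phi_j^k/z_j^k\to 0$, $k\in K'$. Further, in view of (25), we know that each term $\bar\lambda_j^k\phi_j^k/z_j^k\ge 0$, which together with (29) implies that $\bar\lambda_j^k\phi_j^k\xrightarrow{K'}0$. This, plus (12), shows that $\bar\lambda_j^*\min\{0,-(\max\{-\bar\lambda_j^*,0\})^p-Mg_j(x^*)\}=0$ for $j\in\hat I$, and this includes $\bar\lambda_j^*\ge 0$ for $j\in\hat I\cap I(x^*)$. Therefore, $\bar\lambda^*_{\hat I}\ge 0$. Obviously, $\bar\lambda^*_{I\setminus\hat I}=0$. So $\bar\lambda^*\ge 0$ holds.

Hence, taking into account $x^*\in\tilde X$, we can conclude from (30) that $(x^*,\bar\lambda^*)$ is a KKT pair and $x^*$ is a KKT point of $(\mathrm{P}_{\bar\rho})$. Furthermore, the analysis above shows that the sequence $\{\bar\lambda^k\}_K$ possesses a unique limit point, i.e., the unique KKT multiplier vector $\lambda^*_{\bar\rho}$. So $\lim_{k\in K}\bar\lambda^k=\lambda^*_{\bar\rho}$.

Finally, taking into account $(\bar d^k,\bar\lambda^k)\xrightarrow{K}(0,\lambda^*_{\bar\rho})$ with $\lambda^*_{\bar\rho}\ge 0$, from (12) and (16), we have $(\phi^k,\mu^k)\xrightarrow{K}0$. Therefore, SLE$(V_k;1,\mu^k)$ minus SLE$(V_k;1,0)$ gives

$$V_k\begin{pmatrix}d^k-\bar d^k\\ \lambda^k_{\hat I}-\bar\lambda^k_{\hat I}\end{pmatrix}=\begin{pmatrix}0\\ \mu^k\end{pmatrix}\xrightarrow{K}\begin{pmatrix}0\\ 0\end{pmatrix}. \tag{31}$$

This, along with Lemma 4(i), shows that $\hat\lambda^*=\lim_{k\in K}\lambda^k=\lim_{k\in K}\bar\lambda^k=\lambda^*_{\bar\rho}$. □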

Theorem 1

Suppose that H1, H2 and H3 hold. Then each accumulation point $x^*$ of the sequence $\{x^k\}$ generated by Algorithm A is a KKT point of the original problem (P), i.e., problem (1).

Proof

First, there exists an infinite index set K such that xkx,kK, and relation (29) holds. By contradiction, suppose that x is not a KKT point of (P). Then, from Lemma 4, without loss of generality, one can suppose that λk=λ¯k1ρ¯eˆλ¯,kK. Therefore, it follows that (x,λ¯) is not a KKT pair of (P), which further implies that δ(x,λ¯)>0 and Iı(x)Ikı,kK large enough. There are two cases as follows to be considered.

Case I: Assume that x is a KKT point of (Pρ¯). Then there exists a multiplier λ¯0 such that the KKT condition of (Pρ¯) is satisfied at (x,λ¯). In view of Iı(x)IkıI holds for kK large enough, it is easy to know, from the KKT condition of (Pρ¯), that (0,λ¯I) is a solution to SLE in (u,v)

V(uv)=(fρ¯(x)0), 32

where matrix V is defined by (28). On the other hand, passing to the limit in SLE(Vk;1,0) for kK and k, one knows that (d¯,λ¯I) also solves system (32) above. Taking into account the nonsingularity of matrix V (by Lemma 4(i)), one knows that the solution of (32) is unique. So d¯=0 and λ¯I=λ¯I0, which implies λ¯=λ¯0. Thus, conditions (a) and (b) in Step 3(ii) are always satisfied for kK large enough. Therefore, in view of ρkρ¯< for k large enough, Step 3(ii) implies λ¯Ik>γ3eI for kK large enough, which further implies that λ¯Iγ3eI>0. Hence, it follows from the complementary slackness at KKT pair (x,λ¯), 0=λ¯jgj(x)=λ¯jgj(x)=0 (jI). So g(x)=0, which together with Proposition 1 implies that x is also a KKT point of (P), which contradicts the assumption that x is not a KKT point of (P).

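Case II: Suppose that $x^*$ is not a KKT point of $(\mathrm{P}_{\bar\rho})$. Then, by Lemma 5 and $\xi_k\le 0$, one can deduce that $\xi_k\to\bar\xi<0$, $k\in K$. Further, this along with (13) and (23) as well as H2, shows that $\lim_{k\in K}(\|\bar d^k\|^\nu+\|\phi^k\|)>0$. So there exist a subset $K'\subseteq K$ and a positive constant $\varpi$ such that

$$\xi_k\le\bar\xi/2<0,\qquad (\|\bar d^k\|^\nu+\|\phi^k\|)\ge\varpi>0,\quad \forall k\in K'.$$

The remaining proof is divided into two steps.

Step A: Show that there exists a constant $\bar t>0$ such that the step length $t_k\ge\bar t$ holds for all $k\in K'$.

(A1) Analyze inequality (20). First, for $j\notin I(x^*)$, i.e., $g_j(x^*)<0$, from the boundedness of $\{(d^k,\tilde d^k)\}_{K'}$ and the continuity of $g_j$, one gets that $g_j(x^k+td^k+t^2\tilde d^k)<0$ holds for $k\in K'$ large enough and $t>0$ sufficiently small. Second, consider an index $j\in I(x^*)$, i.e., $g_j(x^*)=0$. In view of $I_\imath(x^*)\subseteq I_k^\imath$, which implies $j\in I_k$, from Taylor expansion, formulas (9), (16) and SLE$(V_k;1,\mu^k)$ as well as $\|\tilde d^k\|\le\|d^k\|$ (Step 5), for $t>0$ small enough, we obtain that

$$\begin{aligned}
g_j(x^k+td^k+t^2\tilde d^k)&=g_j^k+t\nabla g_j^{kT}d^k+o(t)=g_j^k+t\,\frac{\mu_j^k-\lambda_j^kg_j^k}{z_j^k}+o(t)\\
&=\Big(1-t\frac{\lambda_j^k}{z_j^k}\Big)g_j^k+t\,\frac{1-\varphi_k}{z_j^k}\phi_j^k-t\varphi_k(\|\bar d^k\|^\nu+\|\phi^k\|)+o(t)\\
&\le-t\varphi_k(\|\bar d^k\|^\nu+\|\phi^k\|)+o(t),
\end{aligned}$$

where the last inequality follows from Lemma 4(ii), $z_j^k\ge\underline\varepsilon$, $\varphi_k\le 1$ and $\phi_j^k\le 0$.

On the other hand, taking into account $\xi_k\le\bar\xi/2<0$ and the boundedness of $b_k$ in (14) (by Lemma 4) as well as (15), we know that there exists a constant $\underline\varphi>0$ such that $\varphi_k\ge\underline\varphi>0$, $\forall k\in K'$. So $g_j(x^k+td^k+t^2\tilde d^k)\le-\underline\varphi\varpi t+o(t)<0$ holds for $k\in K'$ large enough and $t>0$ sufficiently small. Therefore, inequality (20) holds for $t>0$ sufficiently small and $k\in K'$ large enough.

(A2)
Analyze inequality (19). From Taylor expansion and (24), one gets
$$\begin{aligned}
f_{\bar\rho}(x^k+td^k+t^2\tilde d^k)-f_{\bar\rho}(x^k)-\alpha t\nabla f_{\bar\rho}(x^k)^Td^k&=(1-\alpha)t\nabla f_{\bar\rho}(x^k)^Td^k+o(t)\\
&\le(1-\alpha)t\theta\xi_k+o(t)\le(1-\alpha)t\theta\bar\xi/2+o(t)\le 0.
\end{aligned}$$

Hence, inequality (19) holds for $k\in K'$ large enough and $t>0$ sufficiently small. Up to now, one can conclude that there exists a constant $\bar t>0$ such that $t_k\ge\bar t$ for each $k\in K'$.

Step B: Use $t_k\ge\bar t>0$ ($k\in K'$) to derive a contradiction. Because of $\lim_{k\in K'}f_{\bar\rho}(x^k)=f_{\bar\rho}(x^*)$ and the monotonicity of $\{f_{\bar\rho}(x^k)\}$, one knows that $\lim_{k\to\infty}f_{\bar\rho}(x^k)=f_{\bar\rho}(x^*)$. Further, in view of (19) and (24), it follows that for $k\in K'$ large enough

$$f_{\bar\rho}(x^{k+1})-f_{\bar\rho}(x^k)\le\alpha t_k\nabla f_{\bar\rho}(x^k)^Td^k\le\alpha t_k\theta\xi_k\le\alpha\theta\bar\xi\,\bar t/2.$$

Passing to the limit for $k\in K'$ and $k\to\infty$ in the inequality above gives $0\le\alpha\theta\bar\xi\,\bar t/2<0$, a contradiction. Summarizing the discussions above, the whole proof of Theorem 1 is completed. □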

Analysis of strong and superlinear convergence

In this part, under some additional mild assumptions, we first show that the proposed algorithm is strongly convergent, that is, the whole sequence {xk} is convergent. Then the unit step can be accepted and the Maratos effect can be avoided for all k large enough. At last, we prove that Algorithm A achieves superlinear convergence.

H4
  • (i)
    The functions $f(x)$ and $g(x)$ are all twice continuously differentiable over $\tilde X$; and
  • (ii)
    there exists an accumulation point $x^*$ of the sequence $\{x^k\}$ of iterative points with (unique) KKT multiplier $\lambda^*$ associated with (P) such that the second order sufficiency conditions (SOSC) and strict complementarity hold, i.e., the KKT pair $(x^*,\lambda^*)$ of (P) satisfies $\lambda^*_{I_\imath(x^*)}>0$ and
    $$d^T\nabla^2_{xx}L(x^*,\lambda^*)d>0,\quad \forall d\in\{d\in\mathbb{R}^n: d\ne 0,\ \nabla g_{I(x^*)}(x^*)^Td=0\}.$$

Remark 8

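Denote the Lagrangian function of problem $(\mathrm{P}_{\bar\rho})$ by $L_{\bar\rho}(x,\lambda)=f_{\bar\rho}(x)+\sum_{j\in I}\lambda_jg_j(x)$. Then, with the relation $\lambda^{\bar\rho}=\lambda-\bar\rho\hat e$, we have $L(x,\lambda^{\bar\rho})=L_{\bar\rho}(x,\lambda)$. Therefore, taking into account Lemma 6(iv), it is readily checked that the SOSC with strict complementarity for $(\mathrm{P}_{\bar\rho})$ is identical with that for (P).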

Lemma 6

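Suppose that $\tilde X\ne\emptyset$ and assumptions H2, H3 and H4 are satisfied (by Remark 1, $\tilde X\ne\emptyset$ plus H3(i) implies $\tilde X^0\ne\emptyset$). Then, for any subset $K$ such that $\{x^k\}_K$ converges to the limit point $x^*$ stated in H4, there exists an infinite subset $K'\subseteq K$ such that

  • (i)

    $I_\imath(x^*)\subseteq I_k^\imath$ for $k\in K'$ sufficiently large;

  • (ii)

    $x^*$ is a KKT point of problem $(\mathrm{P}_{\bar\rho})$;

  • (iii)

    $\{(\bar d^k,\bar\lambda^k)\}_{K'}\to(0,\lambda^*_{\bar\rho})$ and $\{(d^k,\lambda^k)\}_{K'}\to(0,\lambda^*_{\bar\rho})$, where $\lambda^*_{\bar\rho}$ together with $x^*$ is a KKT pair of problem $(\mathrm{P}_{\bar\rho})$; and

  • (iv)

    the KKT multiplier $\lambda^*$ of (P) and $\lambda^*_{\bar\rho}$ of $(\mathrm{P}_{\bar\rho})$ associated with the KKT point $x^*$ satisfy $\lambda^*=\lambda^*_{\bar\rho}-\bar\rho\hat e$ and $(\lambda^*_{\bar\rho})_{I(x^*)}>0$.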

Proof

(i) From Lemma 4(ii), there exists an infinite subset KK such that

xkx,λk=(λ¯k1ρ¯eˆ)λ¯,kK.

If (x,λ¯) is a KKT pair of (P), then λ¯=λ¯. Further, under H4, by [24, 29], one knows that IkıIı(x) for kK large enough. Otherwise, we have 0<δ(x,λ¯)Kδ(xk,λk). So, from (6), Iı(x)Ikı also holds for kK large enough.

(ii) By contradiction, suppose that x is not a KKT point of (Pρ¯). Then, taking into account conclusion Iı(x)Ikı (kK large enough), by Case II of the proof of Theorem 1, we can bring a contradiction.

(iii) To show {(d¯k,λ¯k)}K(0,λ), it is sufficient to show that (0,λ) is a unique accumulation point of {(d¯k,λ¯k)}K. Let (d¯,λ¯) be any given accumulation point of {(d¯k,λ¯k)}K. Since the sequences {(d¯k,λ¯k)} and {zk} are all bounded, in view of H2, H3 and IkıIı, there exists an infinite subset KK such that

IkıI,HkH,(d¯k,λ¯k)(d¯,λ¯),zkz,kK. 33

Now, passing to the limit for kK and k in SLE(Vk;1,0), we deduce that (d¯,λ¯Iˆ) (Iˆ:=II) solves SLE (32). Further, it follows from Lemma 4(i) that the coefficient matrix of SLE (32) is nonsingular. Thus the solution of (32) is unique. On the other hand, in view of Iı(x)I,I(x)=I and (x,λ) being a KKT pair of (Pρ¯), we know that (0,λIˆ) is also a solution to system (32). Therefore (d¯,λ¯Iˆ)=(0,λIˆ), this further implies that (d¯,λ¯)=(0,λ) and (0,λ) is a unique limit point of {(d¯k,λ¯k)}K.

Finally, conclusion {(dk,λk)}K(0,λ) follows from {(d¯k,λ¯k)}K(0,λ) and (31).

(iv) By Proposition 1 and g(x)=0, we have λ=λρ¯eˆ, and λIı(x)=λIı(x)>0 by H4(ii). Further, in view of d¯k0,λ¯kλ0,kK, one knows that conditions (a) and (b) in Step 3(ii) hold for k large enough. Therefore, taking into account ρkρ¯ for k large enough, it follows that λ¯k>γeI by Step 3(ii), so λγ3eI>0. Therefore λI(x)>0 holds. □

Remark 9

In view of I(x)=I, from H3, H4 and Lemma 6(ii), (iv), the following conclusion holds: The LICQ, SOSC and strict complementarity of problem (P) and problem (Pρ¯) are satisfied at their KKT pair (x,λ) and (x,λ), respectively.

In view of Remark 9, similarly to the proof of [21], Theorem 4.1, Lemma 4.2, we have the following result.

Theorem 2

Suppose that $\tilde X\neq\emptyset$ and assumptions H2, H3 and H4 are satisfied. Then

  • (i)

    xkx, i.e., Algorithm  A is strongly convergent;

  • (ii)

    (d¯k,λ¯k)(0,λ),(dk,λk)(0,λ),zkmin{max{ε_eI,λ},εeI}, and

  • (iii)

    ϕk=0,μk=φkd¯kνzIkk, IkıIı(x) and IkI:=I(x) if k is sufficiently large.

Lemma 7

Suppose that the hypotheses in Lemma 6 hold, and assume that the boundary parameters ε_ and ε̅ satisfy

ε_min{λj,jI},εmax{λj,jI}. 34

Then

zIkλI,ωk=o(dk2), 35

and the solution (d˜k,λ˜Ik) of SLE(Vk;0,μ˜k) satisfies

(d˜k,λ˜Ik)=O(ωˆk)=o(dk),ωˆkdk=o(ωk), 36

where

ωˆk=max{|zjk/λjk1|dk,jI;dk2}. 37

Furthermore, the correction direction d˜k in Step 6 is always yielded by the solution of SLE(Vk;0,μ˜k).

Proof

First, from the given conditions and Theorem 2(iv), relation zIkλI is at hand. Further, this, together with Theorem 2(ii), shows that zjk/λjk1 for jI. So, it follows that ωk=o(dk2) from (18).

Second, we prove relation (36). From Theorem 2(ii), (iii), we know that μk=φkd¯kνzIk0,k. This, along with SLE(Vk;1,0) and SLE(Vk;1,μk) as well as Lemma 4(i), implies that there exists a positive constant c such that

dkd¯kcd¯kν,λkλ¯kcd¯kν,dkd¯k. 38

Therefore, from definition (17) of μ˜k, Taylor expansion and SLE(Vk;1,μk), one has for jI

μ˜jk=ωkzjkgj(xk+dk)=ωkzjk(gjk+gjkTdk)+O(dk2)=ωkzjk(1zjk/λjk)gjkTdk+O(dk2)=ωk+O(max{|zik/λik1|dk,iI})+O(dk2)=ωk+O(ωˆk)+O(dk2)=o(dk2)+O(ωˆk)+O(dk2).

Obviously, definition (37) implies dk2=O(ωˆk). Thus μ˜k=O(ωˆk)=o(dk). Therefore, from SLE(Vk;0,μ˜k) and Lemma 4, it is clear that the first relation of (36) holds. Finally, relation ωˆkdk=o(ωk) follows from definitions (37) of ωˆk and (18) of ωk. □

To ensure that the step size satisfies $t_k\equiv 1$ for k large enough, which is needed to obtain superlinear convergence, the following second order approximation condition is required; see also [8, 9].

H5

Assume that the relation Pk(xx2Lρ¯(xk,λk)Hk)Pkd¯k=o(d¯k) holds, where the projective matrix Pk is defined by Pk=EnNk(NkTNk)1NkT with Nk=gI(xk) and n-order unit matrix En.

Remark 10

About H5

  • (i)

    Due to I=I(x)=IIı(x), one knows from H3(i) that matrix NkgI(x) which is column full rank, and matrix Pk is well defined when k is large enough.

  • (ii)
    The 2-sided projection second order approximation H5 above, also used in [1, 8, 18, 22], is milder than the 1-sided projection second order approximation:
    H5+
    Pk(xx2Lρ¯(xk,λk)Hk)d¯k=o(d¯k).
    Both conditions ensure that the unit step size is eventually accepted. However, the associated algorithms attain (one-step) q-superlinear convergence under the latter, but only two-step superlinear convergence under the former.
  • (iii)

    In view of relation (38), assumptions H5 and H5+ are equivalent to Pk(xx2Lρ¯(xk,λk)Hk)Pkdk=o(dk) and Pk(xx2Lρ¯(xk,λk)Hk)dk=o(dk), respectively.
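As an illustration of the projection matrix appearing in H5, the following minimal Python sketch builds $P = E_n - N(N^TN)^{-1}N^T$ for a small made-up matrix N with full column rank and checks the properties the analysis relies on: P annihilates the columns of N, P is idempotent, and the remainder d − Pd lies in the range of N. The matrix `N`, the vector `d` and the helper name `projector` are illustrative assumptions, not data from the paper.

```python
import numpy as np

def projector(N):
    """Return P = E_n - N (N^T N)^{-1} N^T for a matrix N with full column rank."""
    n = N.shape[0]
    # Solve (N^T N) X = N^T rather than forming the inverse explicitly.
    return np.eye(n) - N @ np.linalg.solve(N.T @ N, N.T)

# Hypothetical constraint gradients stored as columns (linearly independent).
N = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
d = np.array([1.0, -2.0, 0.5])

P = projector(N)
d0 = d - P @ d  # component of d in the range of N

print(np.allclose(N.T @ P, 0.0))  # P maps into the null space of N^T
print(np.allclose(P @ P, P))      # P is idempotent
print(np.allclose(P @ d0, 0.0))   # the remainder is removed by P
```

Solving the small system with `numpy.linalg.solve` avoids an explicit inverse; this is a numerical convenience of the sketch and not something the paper prescribes.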

Theorem 3

Suppose that $\tilde X\neq\emptyset$ and hypotheses H2-H5 hold, and assume that the boundary parameters $\underline{\varepsilon}$ and $\bar{\varepsilon}$ satisfy (34). Then the step size $t_k$ of Algorithm A equals one for k large enough, i.e., $t_k\equiv 1$.

Proof

(i) Discuss (20). For jI=I(x), gj(x)<0, using the continuity of gj and (xk,dk,d˜k)(x,0,0),k, we know that (20) holds for t=1 and k large enough.

For jI=I(x)=Ik, in view of SLE(Vk;1,μk), Theorem 2(iii) and dkd¯k as well as λjkλj>0, we have

zjkgjkTdk+gjkλjk=μjk=o(dk2),gjk=O(dk). 39

Again, taking into account SLE(Vk;0,μ˜k), one has

zjkgjkTd˜k+zjkgj(xk+dk)+λ˜jkgjk=ωk.

This, together with (39), (36) and ωk=o(dk2), shows that

gj(xk+dk)+gjkTd˜k=ωkzjk+λ˜jkO(dk)=ωkzjk+O(ωˆkdk)=ωkzjk+o(ωk)=O(ωk)=o(dk2). 40

Further, using Taylor expansion and (36), one has

gj(xk+dk+d˜k)=gj(xk+dk)+gj(xk+dk)Td˜k+O(d˜k2)=gj(xk+dk)+gjkTd˜k+O(dkd˜k)+O(d˜k2)=ωkzjk+o(ωk)+O(ωˆkdk)=ωkzjk+o(ωk)=o(dk2).

Hence, we can conclude from the fourth equality above that inequality (20) holds for jI, t=1 and k large enough since zjkλj>0.

(ii) Analyze (19). From Taylor expansion and (36), it follows that

wk:=fρ¯(xk+dk+d˜k)fρ¯(xk)αfρ¯(xk)Tdk=fρ¯(xk)T(dk+d˜k)+12(dk)T2fρ¯(xk)dkαfρ¯(xk)Tdk+o(dk2). 41

On the other hand, from SLE(Vk;1,μk), we have

Hkdk+fρ¯(xk)+jIλjkgjk=0, 42

which, together with (36), gives

fρ¯(xk)Tdk=(dk)THkdkjIλjkgjkTdk, 43
fρ¯(xk)T(dk+d˜k)=(dk)THkdkjIλjkgjkT(dk+d˜k)+o(dk2). 44

Therefore, by (40) and Taylor expansion for gj(xk+dk) at point xk, one yields

gjkT(dk+d˜k)=gjk12(dk)T2gj(xk)dk+o(dk2),jI.

This, together with (44), shows that

fρ¯(xk)T(dk+d˜k)=(dk)T(Hk+12jIλjk2gj(xk))dk+jIλjkgjk+o(dk2). 45

On the other hand, the first relation of (39) gives

λjkgjkTdk=((λjk)2/zjk)gjk+o(dk2),jI.

This, along with (43), shows that

(dk)THkdk=fρ¯(xk)Tdk+jI(λjk)2zjkgjk+o(dk2). 46

Again, substituting (45) into (41), one has

wk=jIλjkgjk+12(dk)T(xx2Lρ¯(xk,λk)Hk)dk12(dk)THkdkαfρ¯(xk)Tdk+o(dk2).

Therefore, substituting (46) into the relation above, we have

wk=(12α)fρ¯(xk)Tdk+12(dk)T(xx2Lρ¯(xk,λk)Hk)dk+jIλjk(1λjk2zjk)gjk+o(dk2). 47

On the other hand, from the definition of the projection matrix Pk, we get

$$d^k=P_kd^k+d_0^k,\qquad d_0^k=N_k(N_k^TN_k)^{-1}N_k^Td^k.$$

Furthermore, in view of SLE(Vk;1,μk), Theorem 2(iii) and the above division, one has

NkTdk=Zk1(μkdiag(gIk)λIk),d0k=o(dk2)+O(gIk). 48

Thus, relation (47), together with the relations above and H5, implies that

wk=jIλjk(1λjk2zjk)gjk+o(dk2)+(12α)fρ¯(xk)Tdk+12(d0k+Pkdk)T(xx2Lρ¯(xk,λk)Hk)(d0k+Pkdk)=(12α)fρ¯(xk)Tdk+jIλjk(1λjk2zjk)gjk+O(gIk)+o(dk2). 49

On the other hand, taking into account Lemma 6(ii), Lemma 7 and Theorem 2, one has (when k)

λjkλj>0,λjk(1λjk2zjk)λj2>0,jI. 50

Further, relations (24), (13), (25), (23) and dkd¯k as well as H2 yield

fρ¯(xk)Tdkθξkθfρ¯(xk)Td¯k=θ(d¯k)TQkd¯kθad¯k2=θadk2+o(dk2). 51

Therefore, for k large enough, relations (49)-(51) show that wk(α12)θadk2+o(dk2)0. Thus, inequality (19) holds for t=1 and k large enough, and the entire proof of Theorem 3 is finished. □

Finally, based on Theorem 3, by similar analysis in [1, 18, 22] (for two-step superlinear convergence) and [21], Appendix A (for one-step superlinear convergence), we can prove the following rate of superlinear convergence.

Theorem 4

Suppose that $\tilde X\neq\emptyset$ and the hypotheses H2-H5 hold. If the boundary parameters $\underline{\varepsilon}$ and $\bar{\varepsilon}$ satisfy (34), then the proposed Algorithm A is two-step superlinearly convergent, i.e., $\|x^{k+2}-x^*\|=o(\|x^k-x^*\|)$. Moreover, if H5 is strengthened to H5+, then Algorithm A is one-step superlinearly convergent, i.e., $\|x^{k+1}-x^*\|=o(\|x^k-x^*\|)$.
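The difference between the two rates in Theorem 4 can be seen on a synthetic error sequence. The sketch below is only an illustration: the sequence is made up (each error value is repeated once and then squared) and is not produced by Algorithm A. Its one-step ratios keep returning to 1, while its two-step ratios tend to zero, which is exactly the situation that two-step superlinear convergence allows.

```python
def step_ratios(errors, step):
    """Ratios e_{k+step} / e_k for an error sequence e_k = ||x^k - x^*||."""
    return [errors[k + step] / errors[k] for k in range(len(errors) - step)]

# Synthetic errors: a_j = 0.5 ** (2 ** j), each value used for two consecutive iterates.
a = [0.5 ** (2 ** j) for j in range(5)]
errors = [value for a_j in a for value in (a_j, a_j)]

one_step = step_ratios(errors, 1)  # every other ratio equals 1, so no one-step rate
two_step = step_ratios(errors, 2)  # ratios shrink towards 0

print(one_step)
print(two_step)
```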

Numerical experiments

In this section, to show the practical effectiveness of Algorithm A, we test 59 typical problems from [33]. The numerical experiments are implemented in MATLAB R2013a on a PC with an Intel(R) Core(TM) i5-4590 3.30 GHz CPU and 4.00 GB RAM. The details of the implementation are described as follows.

Computing matrix Hk

To ensure the boundedness of {Hk} during the iteration, we modify the technique in [1] for approximating the Lagrangian Hessian and introduce a slightly new method, based on second order derivative information, for computing the approximate Hessian matrix Hk in Step 2. Denote vector zˆk and matrix Mk by

zˆk=(zIk,zIkık,0IıIkı), 52
Mk=xx2Lρk(xk,zˆk)jIzˆjkgjkgjkgjkT. 53

Then compute the smallest eigenvalue ϑmink of matrix Mk, and yield

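$$\theta_k=\begin{cases}0, & \text{if } \vartheta_{\min}^k>\underline{\varepsilon};\\ -\vartheta_{\min}^k+\underline{\varepsilon}, & \text{if } |\vartheta_{\min}^k|\le\underline{\varepsilon};\\ 2|\vartheta_{\min}^k|, & \text{otherwise.}\end{cases}\tag{54}$$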

Subsequently, compute matrix Hk in Step 2 by

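$$H_k=\begin{cases}\nabla_{xx}^2L_{\rho_k}(x^k,\hat z^k)+\theta_kE_n, & \text{if } \rho_k\le\bar{\varepsilon} \text{ and } \theta_k\le\bar{\varepsilon};\\ E_n, & \text{otherwise,}\end{cases}\tag{55}$$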

where the positive parameters $\underline{\varepsilon}$ and $\bar{\varepsilon}$, the same as those in Algorithm A, are sufficiently small and sufficiently large, respectively.
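The rules (54)-(55) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the middle case of (54) is read as $-\vartheta_{\min}^k+\underline{\varepsilon}$ (the reading under which the shifted matrix has smallest eigenvalue at least $\underline{\varepsilon}$), the thresholds in (55) are read as upper bounds by $\bar{\varepsilon}$, and the caller is assumed to have assembled both the Lagrangian Hessian and the matrix $M_k$ of (53). The function name `hessian_estimate` and its argument names are hypothetical.

```python
import numpy as np

def hessian_estimate(hess_lagrangian, M, rho, eps_low=1e-5, eps_up=1e5):
    """Sketch of (54)-(55): shift the Lagrangian Hessian by theta * E_n.

    hess_lagrangian: symmetric (n, n) array, Hessian of the Lagrangian at (x^k, z_hat^k)
    M:               symmetric (n, n) array, the matrix M_k of (53)
    rho:             current penalty parameter rho_k
    eps_low, eps_up: assumed values of the lower and upper bound parameters
    """
    n = M.shape[0]
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    vartheta_min = np.linalg.eigvalsh(M)[0]
    if vartheta_min > eps_low:
        theta = 0.0
    elif abs(vartheta_min) <= eps_low:
        theta = -vartheta_min + eps_low
    else:
        theta = 2.0 * abs(vartheta_min)
    # Fall back to the identity when rho_k or theta_k exceeds the upper bound.
    if rho <= eps_up and theta <= eps_up:
        return hess_lagrangian + theta * np.eye(n), theta
    return np.eye(n), theta

# Made-up indefinite example: the smallest eigenvalue of M is -3.
M = np.diag([-3.0, 1.0])
H, theta = hessian_estimate(M, M, rho=1.0)
shifted_min = np.linalg.eigvalsh(M + theta * np.eye(2))[0]
print(theta, shifted_min)
```

In the example the shift is 2·|−3| = 6, so the smallest eigenvalue of the shifted matrix is 3, well above the lower bound.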

The sequence {Hk} of matrices defined above possesses nice properties as follows.

Theorem 5

Suppose that $\tilde X\neq\emptyset$ and assumptions H3 and H4(i) hold. Let matrix $H_k$ in Step 2 be generated by (52)-(55). If the sequence $\{x^k\}$ generated by Algorithm A is bounded, then the following results hold.

  • (i)

    The sequence {Hk} is bounded and satisfies the positive definite restriction (27) with constant a=ε_, so H2 holds.

  • (ii)
    In addition, assume that H4(ii) and (34) are satisfied. Then, for k large enough, matrix $M_k$ is positive definite, $\vartheta_{\min}^k>0$ and $\theta_k<\underline{\varepsilon}$. Therefore, $H_k$ is always given by the first case in (55), i.e.,
    $$H_k=\nabla_{xx}^2L_{\rho_k}(x^k,\hat z^k)+\theta_kE_n,\quad \text{when } k \text{ is large enough.}\tag{56}$$

Further, it follows that

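$$\lim_{k\to\infty}\|\nabla_{xx}^2L_{\bar\rho}(x^k,\lambda^k)-H_k\|=\lim_{k\to\infty}\theta_k\le\underline{\varepsilon}\approx 0$$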

when ε_ is sufficiently small. In this sense, we say assumption H5+ is almost satisfied.

Proof

(i) By the boundedness of {(xk,zˆk)}, in view of H4(i), the boundedness of {Hk} follows immediately from (55). To show the second claim of part (i), it is sufficient to discuss the case HkEn. For any dRn, from (22) and (52)-(55), one has

dTQkd=dT(HkjIzˆjkgjkgjkgjkT)d=dT(Mk+θkEn)d. 57

On the other hand, due to the symmetry of matrix Mk, there exists a real orthogonal matrix Uk such that Mk=Ukdiag(ϑk)UkT, where ϑk=(ϑik) is the eigenvalue vector of Mk. Therefore,

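$$d^TM_kd=d^TU_k\,\mathrm{diag}(\vartheta^k)\,U_k^Td=(U_k^Td)^T\mathrm{diag}(\vartheta^k)\,U_k^Td=\sum_{i=1}^n(u_i^k)^2\vartheta_i^k\ge\vartheta_{\min}^k\sum_{i=1}^n(u_i^k)^2=\vartheta_{\min}^k(U_k^Td)^TU_k^Td=\vartheta_{\min}^k\|d\|^2,$$

where $u^k=(u_i^k,\ i=1,\ldots,n)=U_k^Td$. This, along with (57) and (54), shows that

$$d^TQ_kd=d^TM_kd+\theta_k\|d\|^2\ge(\vartheta_{\min}^k+\theta_k)\|d\|^2\ge\underline{\varepsilon}\|d\|^2.$$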

So request (27) is satisfied with a=ε_.

(ii) First, under the given conditions, one knows that all the assumptions required in Theorem 2 and Lemma 7 are satisfied. So, by Theorem 2 and Lemma 7, it follows that

IkıIı(x),ρkρ¯,(xk,zˆk)(x,λ),xx2Lρ¯(xk,zˆk)xx2Lρ¯(x,λ)=xx2L(x,λ). 58

Therefore, taking the above results and the SOSC in H4(ii) into account, it is not difficult to show that matrix Mk is positive definite when k is large enough, and this together with (54)-(55) and (58) further implies that the remaining claims in part (ii) hold. □

Based on Theorem 5, comparing with [1], the following remark is given.

Remark 11

The technique (52)-(55) for generating matrix $H_k$ is a modification of the one in [1], Section 5.1, and the two differ in two respects. First, the technique introduced in this work ensures the boundedness of $\{H_k\}$ (see Theorem 5(i)), which plays a key role in the analysis of global and superlinear convergence, in particular in ensuring that the penalty parameter $\rho_k$ is increased at most finitely many times. By contrast, the technique in [1], Section 5.1, cannot ensure the boundedness of the sequence $\{W_k\}$ it generates (which corresponds to $\{H_k\}$ in this paper), since this strictly relies on the boundedness of $\{(\rho_k,\theta_k)\}$, and one of the necessary conditions for the boundedness of $\{(\rho_k,\theta_k)\}$ is precisely the boundedness of $\{W_k\}$ (see the proof of [1], Lemma 4.1). Second, by using $\hat z^k$ in the computation technique (52)-(55) rather than $z^k$ (which corresponds to the one used in [1], Section 5.1), the assumption H5+ is almost satisfied (see Theorem 5(iii)). If one still uses $z^k$ rather than $\hat z^k$ in (52)-(55), then the second order approximation condition H5+, and even H5, would be difficult to satisfy, since $z^k_{I_{\imath}\setminus I_{\imath}(x^*)}\to\underline{\varepsilon}\,e_{I_{\imath}\setminus I_{\imath}(x^*)}>0=\lambda^*_{I_{\imath}\setminus I_{\imath}(x^*)}$ (by Theorem 2(iv)). Of course, in view of $\lim_{k\to\infty}\|z^k-\hat z^k\|=\underline{\varepsilon}$, which is small enough, one can expect that the numerical performance with $z^k$ and with $\hat z^k$ shows no distinct difference.

Choices of parameters

The parameters in our numerical testing are chosen as follows:

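$$r=0.5,\ \alpha=0.45,\ \theta=0.99,\ \beta=0.5,\ \sigma=0.8,\ \xi=2.5,\ \bar{\varepsilon}=10^{5},\ \underline{\varepsilon}=10^{-5},\ \nu=3,\ \rho_0=p=1,\ \vartheta=2,\ M=100,\ \gamma_1=0.1,\ \gamma_2=\gamma_3=0.01,\ z^0=(1,\ldots,1).$$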

Remark 12

Analysis for lower bound ε_ and upper bound ε̅

First, by Theorems 1 and 2, the global and strong convergence of Algorithm A imposes no additional requirement on the lower bound $\underline{\varepsilon}$ and the upper bound $\bar{\varepsilon}$, i.e., any two positive constants are suitable. Second, as far as the rate of convergence of Algorithm A is concerned, by Theorem 4, parameters $\underline{\varepsilon}$ and $\bar{\varepsilon}$ should be sufficiently small and sufficiently large, respectively. However, if the initial values of $\underline{\varepsilon}$ and $\bar{\varepsilon}$ are chosen too small and/or too large, the numerical performance may become unstable. An ideal approach is to decrease $\underline{\varepsilon}$ and increase $\bar{\varepsilon}$ based on the values $\min\{z_i^k,\ i\in I\}$ and $\max\{z_i^k,\ i\in I\}$, respectively.

Termination rules

During the process of iteration, the implementation is terminated successfully if one of the following two conditions is satisfied:

(i) $\Phi(x^k,\lambda^k)<10^{-5}$; (ii) $\|\bar d^k\|<10^{-5}$ and $\max\{-\bar\lambda_j^k,\ j\in I_{\imath}\}<10^{-5}$.
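The two termination rules can be sketched as a small Python predicate. This is an illustration under assumptions: the value of Φ is supplied by the caller (its definition is given earlier in the paper and is not reproduced here), the tolerance is read as $10^{-5}$, the norm is taken to be Euclidean, and the multiplier test is read as bounding the largest value of $-\bar\lambda_j^k$ over the inequality constraints. The function name `should_stop` and its argument names are hypothetical.

```python
import math

def should_stop(phi_value, d_bar, lam_bar_ineq, tol=1e-5):
    """Return True when termination rule (i) or rule (ii) holds.

    phi_value:    value of Phi(x^k, lambda^k), computed by the caller
    d_bar:        sequence of floats, the direction d_bar^k
    lam_bar_ineq: multipliers lambda_bar_j^k of the inequality constraints
    """
    rule_i = phi_value < tol
    d_norm = math.sqrt(sum(v * v for v in d_bar))  # Euclidean norm (assumed)
    # Largest violation of nonnegativity; -inf when there are no inequalities.
    worst = max((-lam for lam in lam_bar_ineq), default=float("-inf"))
    rule_ii = d_norm < tol and worst < tol
    return rule_i or rule_ii

print(should_stop(1e-6, [1.0, 0.0], [0.3]))    # rule (i) fires
print(should_stop(1.0, [1e-7, 0.0], [-0.1]))   # tiny step but a clearly negative multiplier
```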

Numerical reports

For a fair comparison, the same initial points as in [33] would ideally be used. However, Algorithm A must start from a feasible interior point, namely, $x^0\in\tilde X_0$, and some initial points given in [33] do not satisfy this requirement. So, other initial points for these problems are selected and listed in Table 1.

Table 1.

Feasible initial interior points for testing problems

| Prob. | x0 | Prob. | x0 | Prob. | x0 |
|---|---|---|---|---|---|
| HS6 | (2,2) | HS32 | (0.1,0.7,0.1) | HS63 | (1,1,1) |
| HS7 | (0,1) | HS39 | (−1,−1,0,0) | HS73 | (1,1,1,1) |
| HS8 | (4,2) | HS40 | (2,−1,0,1) | HS78 | (−2,1.5,1,−1,−1) |
| HS25 | (50,25,1.5) | HS42 | (1,1,1,1) | HS79 | (0,0,0,0,0) |
| HS26 | (0.2,0.2,0.2) | HS52 | (1,−0.5,−1,0,1) | HS80 | (−2,−1.5,1,1,1) |
| HS27 | (0,0,0) | HS53 | (−6,2,2,2,2) | HS81 | (−1.7,1,1.5,−0.8,−0.8) |
| HS28 | (0,0,0) | HS60 | (0,0,0) | HS107 | (0.8,0.8,10,10,1,1,1,1,1) |
| HS111 | (−1,−1,−1,−1,−1,−1,−1,−1,−1,−1) | HS114 | (1745,12000,110,3048,1974,89.2,92.8,8,3.6,145) | | |

The numerical results are reported and compared with the ones from [1] in Table 2, where the columns have the following meanings:

Prob.:

the problem number given in [33];

Itr:

the number of iterations;

Nf:

the number of function evaluations for f;

N:

the total number of function evaluations for gj;

ρ̄:

the final value of ρk;

Tcpu:

the CPU time (seconds);

ffinal:

the objective function value at the final iterate.

Table 2.

Numerical experiment compared reports

Prob. n me mi | Algorithm A in this paper: Itr Nf N ρ̄ ffinal Tcpu | Algorithm from [1]: Itr ρ̄ ffinal
HS1 2 0 1 28 73 70 1 1.7825e − 18 0.02 24 1 6.5782e − 27
HS3 2 0 1 6 7 8 1 2.3501e − 06 0.01 4 1 8.5023e − 09
HS4 2 0 2 7 13 29 1 2.6667e + 00 0.01 4 1 2.6667e + 00
HS5 2 0 4 5 13 47 1 −1.9132e + 00 0.01 6 1 −1.9132e + 00
HS6 2 1 0 9 364 718 1 2.4199e − 07 0.03 7 2 0.0000e + 00
HS7 2 1 0 8 15 28 32 −1.7320e + 00 0.01 9 2 −1.7321e + 00
HS8 2 2 0 9 16 59 8,192 −1.0000e + 00 0.01 14 1 −1.0000e + 00
HS9 2 1 0 18 34 66 8,192 −4.9985e − 01 0.02 10 1 −5.0000e + 01
HS12 2 0 1 9 19 39 1 −3.0000e + 01 0.01 5 1 −3.0000e + 01
HS24 2 0 5 16 29 179 1 −1.0000e + 00 0.02 14 1 −1.0000e + 00
HS25 3 0 6 1 1 6 1 9.4934e − 31 0.01 62 1 1.8185e − 16
HS26 3 1 0 16 76 142 2 1.6085e − 04 0.02 19 2 2.8430e − 12
HS27 3 1 0 28 484 939 4 3.9958e − 02 0.05 14 32 4.0000e − 02
HS28 3 1 0 11 38 71 1,024 7.5674e − 08 0.01 6 1 0.0000e + 00
HS29 3 0 1 11 24 53 1 −2.2627e + 01 0.01 8 1 −2.2627e + 01
HS30 3 0 7 7 10 63 1 1.0000e + 00 0.02 7 1 1.0000e + 00
HS32 3 1 4 19 33 166 128 9.8818e − 01 0.02 24 4 1.0000e + 00
HS33 3 0 6 15 20 189 1 −4.5178e + 00 0.02 29 1 −4.5858e + 00
HS34 3 0 8 10 15 104 1 −8.3403e − 01 0.02 30 1 −0.8340e + 00
HS36 3 0 7 10 15 144 1 −3.3000e + 03 0.02 10 1 −3.3000e + 03
HS37 3 0 8 12 19 200 1 −3.4560e + 03 0.02 7 1 −3.4560e + 03
HS38 4 0 8 73 153 1,218 1 1.9761e − 11 0.06 37 1 3.1594e − 24
HS39 4 2 0 11 19 63 1 2.5328e − 04 0.02 19 4 −1.0000e + 00
HS40 4 3 0 49 108 726 2 −2.5000e − 01 0.05 4 2 −2.500e + 00
HS42 4 2 0 36 70 290 1,024 1.3883e + 01 0.03 6 4 1.3858e + 01
HS43 4 0 3 12 29 73 1 −4.4000e + 01 0.02 9 1 −4.4000e + 01
HS46 5 2 0 101 234 735 1 1.3088e − 04 0.05 25 2 6.6616e − 12
HS47 5 3 0 21 54 276 1 2.0468e − 04 0.04 25 16 8.0322e − 14
HS48 5 2 0 21 55 202 2,048 3.1361e − 09 0.02 6 4 0.0000e + 00
HS49 5 2 0 51 87 276 64 1.1761e − 02 0.03 69 64 3.5161e − 12
HS50 5 3 0 50 200 1,065 128 9.3190e − 05 0.04 11 512 4.0725e − 17
HS51 5 3 0 29 132 722 256 2.2808e − 05 0.03 8 4 0.0000e + 00
HS52 5 3 0 31 45 225 256 5.2930e + 00 0.03 4 8 5.3266e + 00
HS53 5 3 10 36 69 1,694 256 4.0734e + 00 0.06 5 8 4.0930e + 00
HS56 7 4 0 21 43 2,482 4 −2.6183e + 00 0.06 12 4 −3.4560e + 00
HS57 2 0 3 34 53 141 1 2.8461e − 02 0.03 15 18 2.8460e − 02
HS60 3 1 6 18 43 574 1 3.2650e − 02 0.04 7 1 3.2568e − 02
HS61 3 2 0 16 255 986 256 −1.7195e + 02 0.03 44 128 −1.4365e + 02
HS62 3 1 6 8 19 153 1 −2.6273e + 04 0.02 5 1 −2.6273e + 04
HS63 3 2 3 15 27 200 1 9.6232e + 02 0.02 5 2 9.6172e + 02
HS66 3 0 8 15 42 249 1 5.1816e − 01 0.02 1,000+ 1 5.1817e − 01
HS70 4 0 9 16 22 214 1 1.0085e − 02 0.03 22 1 1.7981e − 01
HS73 4 1 6 17 35 213 1 2.9896e + 01 0.03 16 1 2.9894e + 01
HS77 5 2 0 21 141 587 1 4.5981e − 01 0.06 13 1 2.4151e − 01
HS78 5 3 0 23 66 329 1 −2.9197e + 00 0.03 4 4 −2.9197e + 00
HS79 5 3 0 16 26 123 128 7.8681e − 02 0.02 7 2 7.8777e − 02
HS80 5 3 10 66 196 3,975 4 6.0149e − 02 0.14 6 2 5.3950e − 02
HS81 5 3 10 19 37 708 8 6.4109e − 02 0.05 9 8 5.3950e − 02
HS84 5 0 16 30 57 1,252 1 −5.2803e + 06 0.06 30 1 −5.2803e + 06
HS93 6 0 8 21 43 1,387 1 1.3629e + 02 0.04 12 1 1.3508e + 02
HS99 7 2 14 18 31 57 1 −8.3108e + 08 0.02 8 4 0.0000e + 00
HS100 7 0 4 8 22 86 1 6.8063e + 02 0.02 9 1 6.8063e + 02
HS107 9 6 8 41 67 1,086 1 1.3748e − 08 0.06 1,000+ 8,192 5.0545e + 38
HS110 10 0 20 11 510 10,146 1 −4.3134e + 01 0.13 6 1 −4.5778e + 01
HS111 10 3 20 26 264 6,542 1,024 −5.8531e + 01 0.14 1,000+ 1 −4.7760e + 01
HS112 10 3 10 6 11 199 2 −5.3197e + 01 0.02 11 1 −4.7761e + 01
HS113 10 0 8 21 48 519 1 2.4306e + 01 0.03 10 1 2.4306e + 01
HS114 10 3 28 11 136 4,474 16 −1.3407e + 03 0.13 39 256 −1.7688e + 03
HS118 15 0 59 34 51 2,554 1 6.6482e + 02 0.12 - - -

As in [1], the loop between Step 3(i) and Step 3(ii) is not counted in the total number of iterations Itr, since it changes only the right-hand side vector of SLE (10) slightly and therefore has a low computational cost.

From Table 2 it is clear that, for almost all test problems, the two algorithms (Algorithm A and the one in [1]) reach the same optimal objective value. Relatively speaking, it also shows that Algorithm A is promising in terms of the CPU time, the number of function evaluations Nf and the total number of function evaluations N.

In particular, the following four observations are worth mentioning. First, for HS66, HS107 and HS111, the algorithm in [1] yields the associated ffinal after 1000 iterations for each problem, while Algorithm A needs only 15, 41 and 26 iterations, respectively. Second, for HS107, the two algorithms yield two very different final objective function values ffinal, namely, 1.3748e−08 and 5.0545e+38. Third, for HS118 with the same dimension as HS117, Algorithm A has a good numerical performance, while it is not reported in [1]. Fourth, for HS54, HS75, HS85 and HS117, Algorithm A fails to produce an invertible coefficient matrix after some iterations and therefore cannot obtain the optimal objective value, so these problems are not listed in Table 2.

For more clarity, we also give the output of Algorithm A for problem HS8 in Table 3. It is found from ρk-column of Table 3 that the penalty parameter needs to be increased one, two, four and six times at 2nd, 3rd, 4th and 5th iterations, respectively; and it can be fixed in the subsequent iterations.

Table 3.

Output of Algorithm  A for problem HS8

k ρk xk f(xk) d¯k gk
0 1 (4, 2) −1.00000e+00 5.09902e+00
1 1 (4.56235, 1.91528) −1.00000e+00 5.70399e−01 5.09902e+00
2 2 (4.60502, 1.91721) −1.00000e+00 1.02139e−01 5.79231e−01
3 8 (4.60222, 1.94248) −1.00000e+00 1.07010e−01 2.08008e−01
4 128 (4.60158, 1.95367) −1.00000e+00 1.83034e−01 7.60755e−02
5 8,192 (4.60159, 1.95516) −1.00000e+00 1.93245e−01 1.32456e−02
6 8,192 (4.60159, 1.95575) −1.00000e+00 5.86992e−04 4.14795e−03
7 8,192 (4.60159, 1.95581) −1.00000e+00 1.24840e−04 5.82513e−04
8 8,192 (4.60159, 1.95583) −1.00000e+00 2.60839e−05 2.36077e−04
9 8,192 (4.60159, 1.95584) −1.00000e+00 1.21999e−05 7.55776e−05
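The ρk column of Table 3 can be checked against the stated numbers of increases with a few lines of Python. The sketch assumes that each increase multiplies ρk by the factor ϑ = 2 listed among the parameters; reading ϑ as the growth factor is an assumption made here for illustration.

```python
import math

# rho_k for k = 0, ..., 9, copied from Table 3.
rho = [1, 1, 2, 8, 128, 8192, 8192, 8192, 8192, 8192]

# Number of doublings between consecutive iterations, assuming growth factor 2.
doublings = [round(math.log2(rho[k] / rho[k - 1])) for k in range(1, len(rho))]

print(doublings)       # [0, 1, 2, 4, 6, 0, 0, 0, 0]
print(sum(doublings))  # 13, and 2 ** 13 == 8192
```

The nonzero entries 1, 2, 4 and 6 fall at iterations 2 to 5, and their sum matches the final value 8,192 = 2^13 reported for HS8.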

Conclusions

In this paper, based on a simple and effective penalty parameter update rule and the idea of primal-dual interior point methods, a primal-dual interior point QP-free algorithm for nonlinear constrained optimization is proposed and analyzed. A ‘working set’ technique for estimating the active set is used in this work, so that only two or three reduced systems of linear equations with the same coefficient matrix need to be solved at each iteration. Under a suitable CQ and assumptions including a relaxed positive definite restriction on the Lagrangian Hessian estimate Hk, but without the isolatedness of the stationary points, the proposed algorithm is globally and superlinearly convergent. Moreover, a slightly new computation technique for Hk based on second order derivative information is introduced such that the associated assumptions, i.e., the boundedness of {Hk}, the relaxed positive definiteness and the 1-sided projection second order approximation H5+, are all (almost) satisfied. The numerical experiments based on the proposed computation technique for Hk show that the proposed algorithm is promising.

Results and discussion

In this work, a new primal-dual interior point QP-free algorithm for nonlinear optimization with equality and inequality constraints is proposed. Its global and superlinear convergence are analyzed, and numerical results showing its effectiveness are reported. Several interesting problems remain for further work. First, following [21], the algorithm could be improved so that it can start from an arbitrary initial point. Second, one could try to remove the strict complementarity condition. Third, the ideas in the paper could be applied to minimax optimization problems, engineering problems and so on.

Acknowledgements

Project supported by the Natural Science Foundation of Guangxi Province (Nos. 2016GXNSFDA380019 and 2014GXNSFFA118001) and the Natural Science Foundation of China (Nos. 11771383 and 11561005).

Footnotes

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JBJ carried out the idea of this paper, conceived of the description of Algorithm A and drafted the manuscript. HJZ carried out the convergence analysis of Algorithm A. GDM participated in the numerical experiments and helped to draft the manuscript. ZBZ participated in the convergence analysis. All authors read and approved the final manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Jinbao Jian, Email: jianjb@gxu.edu.cn.

Hanjun Zeng, Email: hanjunk@sohu.com.

Guodong Ma, Email: mgd2006@163.com.

Zhibin Zhu, Email: zhuzb@guet.edu.cn.

References

  • 1.Tits AL, Wächter A, Bakhtiari S, Urban TJ, Lawrence CT. A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM J. Optim. 2003;14(1):173–199. doi: 10.1137/S1052623401392123.
  • 2.Mayne DQ, Polak E. Feasible direction algorithms for optimization problems with equality and inequality constraints. Math. Program. 1976;11:67–80. doi: 10.1007/BF01580371.
  • 3.Gao ZY, He GP, Wu F. Sequential systems of linear equations algorithm for nonlinear optimization problems with general constraints. J. Optim. Theory Appl. 1997;95:371–397. doi: 10.1023/A:1022639306130.
  • 4.Lian SJ, Duan YQ. Smoothing of the lower-order exact penalty function for inequality constrained optimization. J. Inequal. Appl. 2016;2016:185. doi: 10.1186/s13660-016-1126-9.
  • 5.Jian JB, Xu QJ, Han DL. A strongly convergent norm-relaxed method of strongly sub-feasible direction for optimization with nonlinear equality and inequality constraints. Appl. Math. Comput. 2006;182:854–870.
  • 6.Jian JB, Tang CM, Hu QJ, Zheng HY. A feasible descent SQP algorithm for general constrained optimization without strict complementarity. J. Comput. Appl. Math. 2006;180(2):391–412. doi: 10.1016/j.cam.2004.11.008.
  • 7.Herskovits J. Feasible directions interior-point technique for nonlinear optimization. J. Optim. Theory Appl. 1998;99:121–146. doi: 10.1023/A:1021752227797.
  • 8.Boggs PT, Tolle JW, Wang P. On the local convergence of quasi-Newton methods for constrained optimization. SIAM J. Control Optim. 1982;20:161–171. doi: 10.1137/0320014.
  • 9.Boggs PT, Tolle JW. Sequential Quadratic Programming. Cambridge: Cambridge University Press; 1995. pp. 1–51.
  • 10.Gill PE, Murray W, Saunders MA. SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev. 2005;47:99–131. doi: 10.1137/S0036144504446096.
  • 11.Jian JB, Tang CM. An SQP feasible descent algorithm for nonlinear inequality constrained optimization without strict complementarity. Comput. Math. Appl. 2005;49:223–238. doi: 10.1016/j.camwa.2004.09.004.
  • 12.Jian JB, Zheng HY, Hu QJ, Tang CM. A new norm-relaxed method of strongly sub-feasible direction for inequality constrained optimization. Appl. Math. Comput. 2005;168:1–28.
  • 13.Jian JB, Zheng HY, Tang CM, Hu QJ. A new superlinearly convergent norm-relaxed method of strong sub-feasible direction for inequality constrained optimization. Appl. Math. Comput. 2006;182:955–976.
  • 14.Lawrence CT, Tits AL. A computationally efficient feasible sequential quadratic programming algorithm. SIAM J. Optim. 2001;11:1092–1118. doi: 10.1137/S1052623498344562.
  • 15.Panier ER, Tits AL. A superlinearly convergent feasible method for the solution of inequality constrained optimization problems. SIAM J. Control Optim. 1987;25:934–950. doi: 10.1137/0325051.
  • 16.Panier ER, Tits AL. On combining feasibility, descent and superlinear convergence in inequality constrained optimization. Math. Program. 1993;59:261–276. doi: 10.1007/BF01581247.
  • 17.Spellucci P. An SQP method for general nonlinear programs using only equality constrained subproblem. Math. Program. 1998;82:413–448.
  • 18.Bakhtiari S, Tits AL. A simple primal-dual feasible interior-point method for nonlinear programming with monotone descent. Comput. Optim. Appl. 2003;25:17–38. doi: 10.1023/A:1022944802542.
  • 19.El-bakry AS, Tapia RA, Tsuchiya T, Zhang Y. On the formulation and theory of the Newton interior-point method for nonlinear programming. J. Optim. Theory Appl. 1996;89:507–541. doi: 10.1007/BF02275347.
  • 20.Forsgren A, Gill PE, Wright MH. Interior methods for nonlinear optimization. SIAM Rev. 2002;44:525–597. doi: 10.1137/S0036144502414942.
  • 21.Jian JB, Pan HQ, Tang CM, Li JL. A strongly sub-feasible primal-dual quasi interior-point algorithm for nonlinear inequality constrained optimization. Appl. Math. Comput. 2015;266:560–578.
  • 22.Panier ER, Tits AL, Herskovits JN. A QP-free, globally convergent, locally superlinearly convergent algorithm for inequality constrained optimization. SIAM J. Control Optim. 1988;26:788–811. doi: 10.1137/0326046.
  • 23.Qi HD, Qi LQ. A new QP-free, globally convergent, locally superlinearly convergent algorithm for inequality constrained optimization. SIAM J. Optim. 2000;11:113–132. doi: 10.1137/S1052623499353935.
  • 24.Wang YL, Chen L, He GP. Sequential systems of linear equations method for general constrained optimization problems without strict complementarity. J. Comput. Appl. Math. 2005;182:447–471. doi: 10.1016/j.cam.2004.12.023.
  • 25.Yang YF, Li DH, Qi LQ. A feasible sequential linear equation method for inequality constrained optimization. SIAM J. Optim. 2003;13:1222–1244. doi: 10.1137/S1052623401383881.
  • 26.Zhu ZB. An interior point type QP-free algorithm with superlinear convergence for inequality constrained optimization. Appl. Math. Model. 2007;31:1201–1212. doi: 10.1016/j.apm.2006.04.019.
  • 27.Maratos N. Exact penalty function algorithm for finite dimensional and control optimization problems. Dissertation. Imperial College of Science and Technology, University of London; 1978.
  • 28.Cai XZ, Wu L, Yue YJ, Li MM, Wang GQ. Kernel-function-based primal-dual interior-point methods for convex quadratic optimization over symmetric cone. J. Inequal. Appl. 2014;2014:308. doi: 10.1186/1029-242X-2014-308.
  • 29.Facchinei F, Fischer A, Kanzow C. On the accurate identification of active constraints. SIAM J. Optim. 1998;9:14–32. doi: 10.1137/S1052623496305882.
  • 30.Chen L, Wang YL, He GP. A feasible active set QP-free method for nonlinear programming. SIAM J. Optim. 2006;17:401–429. doi: 10.1137/040605904.
  • 31.Liu Y, Jian JB, Zhu ZB. New active set identification for general constrained optimization and minimax problems. J. Math. Anal. Appl. 2015;421:1405–1416. doi: 10.1016/j.jmaa.2014.07.041.
  • 32.Wächter A, Biegler LT. Failure of global convergence for a class of interior point methods for nonlinear programming. Math. Program. 2000;88:565–574. doi: 10.1007/PL00011386.
  • 33.Hock W, Schittkowski K. Test Examples for Nonlinear Programming Codes. Heidelberg: Springer; 1981.

Articles from Journal of Inequalities and Applications are provided here courtesy of Springer
