A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations

Zengru Cui; Gonglin Yuan; Zhou Sheng; Wenjie Liu; Xiaoliang Wang; Xiabin Duan

doi:10.1371/journal.pone.0140606

. 2015 Oct 26;10(10):e0140606. doi: 10.1371/journal.pone.0140606

A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations

Zengru Cui ¹, Gonglin Yuan ^1,^2,^*, Zhou Sheng ¹, Wenjie Liu ^2,³, Xiaoliang Wang ¹, Xiabin Duan ¹

Editor: Lixiang Li⁴

PMCID: PMC4621044 PMID: 26501775

Abstract

This paper proposes a modified BFGS formula using a trust region model for solving nonsmooth convex minimizations by using the Moreau-Yosida regularization (smoothing) approach and a new secant equation with a BFGS update formula. Our algorithm uses the function value information and gradient value information to compute the Hessian. The Hessian matrix is updated by the BFGS formula rather than using second-order information of the function, thus decreasing the workload and time involved in the computation. Under suitable conditions, the algorithm converges globally to an optimal solution. Numerical results show that this algorithm can successfully solve nonsmooth unconstrained convex problems.

Introduction

Consider the following convex problem:

\begin{matrix} min_{x \in R^{n}} f (x), \end{matrix}

(1)

where f : ℝⁿ → ℝ is a possibly nonsmooth convex function. In general, this problem has been well studied for several decades when f is continuously differentiable, and a number of different methods have been developed for its solution Eq (1) (for example, numerical optimization method [1–3] etc, heuristic algorithm [4–6] etc). However, when f is a nondifferentiable function, the difficulty of solving this problem increases. Recently, such problems have arisen in many medical, image restoration and optimal control applications (see [7–13] etc). Some authors have previously studied nonsmooth convex problems (see [14–18] etc).

Let F : ℝⁿ → ℝ be the so-called Moreau-Yosida regularization of f, which is defined by

\begin{matrix} F (x) : = min_{z \in R^{n}} {f (z) + \frac{1}{2 λ} {∥ z - x ∥}^{2}}, \end{matrix}

(2)

where λ is a positive parameter and ‖ ⋅ ‖ denotes the Euclidean norm. The problem Eq (1) is equivalent to the following problem

\begin{matrix} min_{x \in R^{n}} F (x) . \end{matrix}

(3)

It is well known that the problems Eqs (1) and (3) of the solution sets are the same. As we know, one of the most effective methods for problems Eq (3) is the trust region method.

The trust region method plays an important role in the area of nonlinear optimization, and it has been proven to be a very efficient method. Levenberg [19] and Marquardt [20] first applied this method to nonlinear least-squares problems, and Powell [21] established a convergence result for this method for unconstrained problems. Fletcher [22] first proposed a trust region method for composite nondifferentiable optimization problems. Over the past decades, many authors have studied the trust region algorithm to minimize nonsmooth objective function problems. For example, Sampaio, Yuan and Sun [23] used the trust region algorithm for nonsmooth optimization problems; Sun, Sampaio and Yuan [24] proposed a quasi-Newton trust region algorithm for nonsmooth least-squares problems; Zhang [25] used a new trust region algorithm for nonsmooth convex minimization; and Yuan, Wei and Wang [26] proposed a gradient trust region algorithm with a limited memory BFGS update for nonsmooth convex minimization problems. For other references on trust region methods, see [27–35], among others. In particular, for the problem we address in this study, as we can compute the exact Hessian, the trust region method could be very efficient. However, it is difficult to compute the Hessian at every iteration, which increases the computational workload and time.

The purpose of this paper is to present an efficient trust region algorithm to solve Eq (3). With the use of the Moreau-Yosida regularization (smoothing) and the new quasi-Newton equation, the given method has the following good properties: (i) the Hessian makes use of not only the gradient value but also the function value and (ii) the subproblem of the proposed method, which possesses the form of an unconstrained trust region subproblem, can be solved using existing methods.

The remainder of this paper is organized as follows. In the next section, we briefly review some basic results in convex analysis and nonsmooth analysis and state a new quasi-Newton secant equation. In section 3, we present a new algorithm for solving problem Eq (3). In section 4, we prove the global convergence of the proposed method. In section 5, we report numerical results and present some comparisons for the existing methods to solve problem Eq (1). We conclude our paper in Section 6.

Throughout this paper, unless otherwise specified, ‖ ⋅ ‖ denotes the Euclidean norm of vectors or matrices.

Initial results

In this section, we first state some basic results in convex analysis and nonsmooth analysis. Let

\begin{matrix} θ (z, x) = f (z) + \frac{1}{2 λ} {∥ z - x ∥}^{2}, \end{matrix}

and denote p(x): = argmin _{z ∈ ℝⁿ} θ(z, x). Then, p(x) is well defined and unique, as θ(z, x) is strongly convex. By Eq (2), F can be rewritten as

\begin{matrix} F (x) = f (p (x)) + \frac{1}{2 λ} {∥ p (x) - x ∥}^{2} . \end{matrix}

In the following, we denote g(x) = ∇F(x). Some important properties of F are given as follows:

F is finite-valued, convex and everywhere differentiable with
$\begin{matrix} g (x) = \nabla F (x) = \frac{x - p (x)}{λ} . \end{matrix}$ (4)
The gradient mapping g : ℝⁿ → ℝ is globally Lipschitz continuous with modulus λ, i.e.,
$\begin{matrix} ∥ g (x) - g (y) ∥ \leq \frac{1}{λ} ∥ x - y ∥, \forall x, y \in R^{n} . \end{matrix}$ (5)
x solves Eq (1) if and only if ∇F(x) = 0, namely, p(x) = x.

It is obvious that F(x) and g(x) can be obtained through the optimal solution of argmin _{z ∈ ℝⁿ} θ(z, x). However, the minimizer of θ(z, x), p(x) is difficult or even impossible to solve for exactly. Thus, we cannot compute the exact value of p(x) to define F(x) and g(x). Fortunately, for each x ∈ ℝⁿ and any ϵ > 0, there exists a vector p ^α(x, ϵ) ∈ ℝⁿ such that

\begin{matrix} f (p^{α} (x, ϵ)) + \frac{1}{2 λ} {∥ p^{α} (x, ϵ) - x ∥}^{2} \leq F (x) + ϵ . \end{matrix}

(6)

Thus, we can use p ^α(x, ϵ) to define respective approximations of F(x) and g(x) as follows, when ϵ is small,

\begin{matrix} F^{α} (x, ϵ) : = f (p^{α} (x, ϵ)) + \frac{1}{2 λ} {∥ p^{α} (x, ϵ) - x ∥}^{2} \end{matrix}

(7)

and

\begin{matrix} g^{α} (x, ϵ) : = \frac{x - p^{α} (x, ϵ)}{λ}, \end{matrix}

(8)

The papers [36, 37] describe some algorithms to calculate p ^α(x, ϵ). The following remarkable feature of F ^α(x, ϵ) and g ^α(x, ϵ) is obtained from [38].

Proposition 2.1 Let p ^α(x, ϵ) be a vector satisfying Eq (6), and F ^α(x, ϵ) and g ^α(x, ϵ) are defined by Eqs (7) and (8), respectively. Then, we obtain

\begin{matrix} F (x) \leq F^{α} (x, ϵ) \leq F (x) + ϵ, \end{matrix}

(9)

\begin{matrix} ∥ p^{α} (x, ϵ) - p (x) ∥ \leq \sqrt{2 λ ϵ}, \end{matrix}

(10)

and

\begin{matrix} ∥ g^{α} (x, ϵ) - g (x) ∥ \leq \sqrt{\frac{2 ϵ}{λ}} . \end{matrix}

(11)

The relations Eqs (9), (10) and (11) imply that F ^α(x, ϵ) and g ^α(x, ϵ) may be made arbitrarily close to F(x) and g(x), respectively, by choosing the parameter ϵ to be small enough.

Second, recall that when f is smooth, the quasi-Newton secant method is used to solve problem Eq (1). The iterate x _k satisfies ∇f _k + B _k(x _k+1 − x _k) = 0, where ∇f _k = ∇f(x _k), B _k is an approximation Hessian of f at x _k, and the sequence of matrix {B _k} satisfies the secant equation as follows.

\begin{matrix} B_{k + 1} s_{k} = y_{k}, \end{matrix}

(12)

where y _k = ∇f _k+1− ∇f _k and s _k = x _k+1 − x _k. However, the function values are not exploited in Eq (12), which the method solves by only using the gradient information. Motivated by the above observations, we hope to develop a method that uses both the gradient information and function information. This problem has been studied by several authors. In particular, Wei, Li and Qi [39] proposed an important modified secant equation by using not only the gradient values but also the function values, and the modified secant is defined as

\begin{matrix} B_{k + 1} s_{k} = ν_{k}, \end{matrix}

(13)

where ν _k = y _k + β _k s _k, f _k = f(x _k), ∇f _k = ∇f(x _k), and $β_{k} = \frac{{(\nabla f_{k + 1} + \nabla f_{k})}^{T} s_{k} + 2 (f_{k} - f_{k + 1})}{{‖ s_{k} ‖}^{2}}$ . When f is twice continuously differentiable and B _k+1 is updated by the BFGS formula [40–43], where B _k = I is a unit matrix if k = 0, this secant Eq (13) possesses the following remarkable property:

\begin{matrix} f_{k} = f_{k + 1} + \nabla f_{k + 1}^{T} s_{k} + \frac{1}{2} s_{k}^{T} B_{k + 1} s_{k} \end{matrix}

This property holds for all k. Based on the result of Theorem 2.1 [39], Eq (13) has an advantage over Eq (12) in this approximate relation.

The new model

In this section, we present a modified BFGS formula using trust region model for solving Eq (1), which is motivated by the Moreau-Yosida regularization (smoothing), general trust region method and the new secant Eq (13). First, we describe the trust region method. In each iteration, a trial step d _k is generated by solving an adaptive trust region subproblem, in which the values of the gradient of F(x) at x _k and Eq (13) are used:

\begin{matrix} min q_{k} (d) = g^{α} {(x_{k}, ϵ_{k})}^{T} d + \frac{1}{2} d^{T} B_{k} d, \\ s . t . ∥ d ∥ \leq Δ_{k}, \end{matrix}

(14)

where the scalar ϵ _k > 0 and Δ_k describe the trust region radius.

Let d _k be the optimal solution of Eq (14). The actual reduction is defined by

\begin{matrix} A r e d_{k} : = F^{α} (x_{k}, ϵ_{k}) - F^{α} (x_{k} + d_{k}, ϵ_{k + 1}), \end{matrix}

(15)

and we define the predict reduction as

\begin{matrix} P r e d_{k} : = - g^{α} {(x_{k}, ϵ_{k})}^{T} - \frac{1}{2} d_{k}^{T} B_{k} d_{k} . \end{matrix}

(16)

Then, we define r _k to be the ratio between Are d _k and Pre d _k

\begin{matrix} r_{k} : = \frac{A r e d_{k}}{P r e d_{k}} . \end{matrix}

(17)

Based on the new secant Eq (13) and with B _k+1 being updated by the BFGS formula, we propose a modified BFGS formula. The B _k+1 is defined by

\begin{matrix} B_{k + 1} : = {\begin{matrix} B_{k}, & if s_{k}^{T} ν_{k} \leq 0, \\ B_{k} - \frac{B_{k} s_{k} s_{k}^{T} B_{k}}{s_{k}^{T} B_{k} s_{k}} + \frac{ν_{k} {ν_{k}}^{T}}{{ν_{k}}^{T} s_{k}}, & if s_{k}^{T} ν_{k} > 0, \end{matrix} \end{matrix}

(18)

where s _k = x _k+1 − x _k, y _k = g ^α(x _k+1, ϵ _k+1) − g ^α(x _k, ϵ _k), ν _k = y _k + β _k s _k and

\begin{matrix} β_{k} = \frac{(g^{α} (x_{k + 1}, ϵ_{k + 1}) + g^{α} {(x_{k}, ϵ_{k}))}^{T} s_{k} + 2 (F^{α} (x_{k}, ϵ_{k}) - F^{α} (x_{k + 1}, ϵ_{k + 1}))}{{∥ s_{k} ∥}^{2}}, \end{matrix}

if k = 0, then B _k = I, and I is a unit matrix.

We now list the steps of the modified trust region algorithm as follows.

Algorithm 1.

Step 0. Choose x ₀ ∈ ℝⁿ, 0 < σ ₁ < σ ₂ < 1, 0 < η ₁ < 1 < η ₂, λ > 0, 0 ≤ ɛ ≪ 1, Δ_max ≥ Δ₀ > 0 is called the maximum value of trust region radius, B ₀ = I, and I is the unit matrix. Let k: = 0.

Step 1. Choose a scalar ϵ _k+1 satisfying 0 < ϵ _k+1 < ϵ _k, and calculate p ^α(x _k, ϵ _k), $g^{α} (x_{k}, ϵ_{k}) = \frac{x_{k} - p^{α} (x_{k}, ϵ_{k})}{λ}$ . If x _k satisfies the termination criterion ‖g ^α(x _k, ϵ _k)‖ ≤ ɛ, then stop. Otherwise, go to Step 2.

Step 2. d _k solves the trust region subproblem Eq (14).

Step 3. Compute Are d _k, Pre d _k, r _k using Eqs (15), (16) and (17).

Step 4. Regulate the trust region radius. Let

\begin{matrix} Δ_{k + 1} : = {\begin{matrix} η_{1} Δ_{k}, & if r_{k} < σ_{1}, \\ Δ_{k}, & if σ_{1} \leq r_{k} \leq σ_{2}, \\ min {η_{2} Δ_{k}, Δ_{m a x}}, & if r_{k} > σ_{2} . \end{matrix} \end{matrix}

Step 5. If the condition r _k ≥ σ ₁ holds, then let x _{k + 1} = x _k + d _k, update B _{k + 1} by Eq (18), and let k: = k + 1; go back to Step 1. Otherwise, let x _k+1: = x _k and k: = k + 1; return to Step 2.

Similar to Dennis and Moré [44] or Yuan and Sun [45], we have the following result.

Lemma 1 If and only if the condition $s_{k}^{T} ν_{k} > 0$ holds, B _k+1 will inherit the positive property of B _k.

Proof “ ⇒ ” If B _k+1 is symmetric and positive definite, then

\begin{matrix} s_{k}^{T} B_{k + 1} s_{k} & = & s_{k}^{T} [B_{k} - \frac{B_{k} s_{k} s_{k}^{T} B_{k}}{s_{k}^{T} B_{k} s_{k}} + \frac{ν_{k} {ν_{k}}^{T}}{{ν_{k}}^{T} s_{k}}] s_{k} \\ = & s_{k}^{T} B_{k} s_{k} - \frac{s_{k}^{T} B_{k} s_{k} s_{k}^{T} B_{k} s_{k}}{s_{k}^{T} B_{k} s_{k}} + \frac{s_{k}^{T} ν_{k} {ν_{k}}^{T} s_{k}}{{ν_{k}}^{T} s_{k}} \\ = & s_{k}^{T} ν_{k} \\ > & 0 . \end{matrix}

“⇐” For the proof of the converse, suppose that $s_{k}^{T} ν_{k} > 0$ and B _k is symmetric and positive definite for all k ≥ 0. We shall prove that x ^T B _k+1 x > 0 holds for arbitrary x ≠ 0 and x ∈ ℝⁿ by induction. It is easy to see that B ₀ = I is symmetric and positive definite. Thus, we have

\begin{matrix} x^{T} B_{k + 1} x & = & x^{T} B_{k} x - \frac{x^{T} B_{k} s_{k} s_{k}^{T} B_{k} x}{s_{k}^{T} b_{k} s_{k}} + \frac{x^{T} ν_{k} {ν_{k}}^{T} x}{{ν_{k}}^{T} s_{k}} \\ = & x^{T} B_{k} x - \frac{{(x^{T} B_{k} s_{k})}^{2}}{s_{k}^{T} B_{k} s_{k}} + \frac{{(x^{T} ν_{k})}^{2}}{{ν_{k}}^{T} s_{k}} . \end{matrix}

(19)

Because B _k is symmetric and positive definite for all k ≥ 0, there exists a symmetric and positive definite matrix $B_{k}^{\frac{1}{2}}$ such that $B_{k} = B_{k}^{\frac{1}{2}} B_{k}^{\frac{1}{2}}$ . Thus, by using the Cauchy-Schwartz inequality, we obtain

\begin{matrix} {(x^{T} B_{k} s_{k})}^{2} & = & {[x^{T} B_{k}^{\frac{1}{2}} B_{k}^{\frac{1}{2}} s_{k}]}^{2} = {[{(B_{k}^{\frac{1}{2}} x)}^{T} (B_{k}^{\frac{1}{2}} s_{k})]}^{2} \\ \leq & {∥ B_{k}^{\frac{1}{2}} x ∥}^{2} ∥ B_{k}^{\frac{1}{2}} s_{k} ∥ \\ = & {(B_{k}^{\frac{1}{2}} x)}^{T} (B_{k}^{\frac{1}{2}} x) {(B_{k}^{\frac{1}{2}} s_{k})}^{T} (B_{k}^{\frac{1}{2}} s_{k}) \\ = & (x^{T} B_{k} x) (s_{k}^{T} B_{k} s_{k}) . \end{matrix}

(20)

It is not difficult to prove that the above inequality holds true if and only if there exists a real number γ _k ≠ 0 such that $B_{k}^{\frac{1}{2}} x = γ_{k} B_{k}^{\frac{1}{2}} s_{k}$ , namely, x = γ _k s _k.

Hence, if Eq (20) strictly holds (and note that $s_{k} {ν_{k}}^{T} > 0$ ), then from Eq (19), we have

\begin{matrix} x^{T} B_{k + 1} x & > & x^{T} B_{k} x - \frac{{(x^{T} B_{k} s_{k})}^{2}}{s_{k}^{T} B_{k} s_{k}} + \frac{{(x^{T} ν_{k})}^{2}}{{ν_{k}}^{T} s_{k}} \\ = & \frac{{(x^{T} ν_{k})}^{2}}{{ν_{k}}^{T} s_{k}} > 0 . \end{matrix}

Otherwise, ${(x^{T} B_{k} s_{k})}^{2} = (x^{T} B_{k} x) (s_{k}^{T} B_{k} s_{k})$ ; then, there exists γ _k such that x = γ _k s _k. Thus,

\begin{matrix} x^{T} B_{k + 1} x & = & \frac{{[{(γ_{k} s_{k})}^{T} ν_{k}]}^{2}}{{ν_{k}}^{T} s_{k}} \\ = & γ_{k}^{2} s_{k}^{T} ν_{k} > 0 . \end{matrix}

Therefore, for each 0 ≠ x ∈ ℝⁿ, we have x ^T B _k+1 x > 0. This completes the proof.

Lemma 1 states that if $s_{k}^{T} ν_{k} > 0$ , then the matrix sequence {B _k} is symmetric and positive definite, which is updated by the BFGS formula of Eq (18).

Convergence analysis

In this section, the global convergence of Algorithm 1 is established under the assumption that the following conditions are required.

Assumption A.

Let the level set Ω
$\begin{matrix} Ω = {x \in R^{n} | F^{α} (x, ϵ) \leq F^{α} (x_{0}, ϵ), \forall x_{0} \in R^{n}} . \end{matrix}$
F is bounded from below.
The matrix sequence {B _k} is bounded on Ω, which means that there exists a positive constant M such that
$\begin{matrix} ∥ B_{k} ∥ \leq M \forall k . \end{matrix}$
The sequence {ϵ _k} converges to zero.

Now, we present the following lemma.

Lemma 2 If d _k is the solution of Eq (14), then

\begin{matrix} P r e d_{k} = q_{k} (0) - q_{k} (d_{k}) \geq \frac{1}{2} ∥ g^{α} (x_{k}, ϵ_{k}) ∥ min {Δ_{k}, \frac{∥ g^{α} (x_{k}, ϵ_{k}) ∥}{∥ B_{k} ∥}} . \end{matrix}

(21)

Proof Similar to the proof of Lemma 7(6.2) in Ma [46]. Note that the matrix sequence {B _k} is symmetric and positive definite; then, we present $d_{k}^{c}$ to be a Cauchy point at iteration point x _k, which is defined by

\begin{matrix} d_{k}^{c} = - μ_{k} \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k}) ∥} g^{α} (x_{k}, ϵ_{k}), \end{matrix}

where $μ_{k} = min {\frac{{‖ g^{α} (x_{k}, ϵ_{k}) ‖}^{3}}{Δ_{k} {g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})}, 1}$ . It is easy to verify that the Cauchy point is a feasible point, i.e., $‖ d_{k}^{c} ‖ \leq Δ_{k}$ .

If $\frac{{‖ g^{α} (x_{k}, ϵ_{k}) ‖}^{3}}{Δ_{k} {g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})} > 1$ , then

\begin{matrix} {∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{3} > Δ_{k} {g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k}), \end{matrix}

and

\begin{matrix} d_{k}^{c} = - \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k})} g^{α} (x_{k}, ϵ_{k}) . \end{matrix}

Thus, we obtain

\begin{matrix} P r e d_{k}^{c} & = & - q_{k} (- \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k}) ∥} g^{α} (x_{k}, ϵ_{k})) \\ = & - {g^{α} (x_{k}, ϵ_{k})}^{T} (- \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k}) ∥} g^{α} (x_{k}, ϵ_{k})) \\ - \frac{1}{2} {(- \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k}) ∥} g^{α} (x_{k}, ϵ_{k}))}^{T} B_{k} (- \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k}) ∥} g^{α} (x_{k}, ϵ_{k})) \\ = & \frac{Δ_{k}}{∥ g^{α} (x_{k}, ϵ_{k}) ∥} {∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{2} - \frac{1}{2} \frac{Δ_{k}^{2}}{{∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{2}} {g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k}) \\ \geq & \frac{1}{2} Δ_{k} ∥ g^{α} (x_{k}, ϵ_{k}) ∥ \\ \geq & \frac{1}{2} ∥ g^{α} (x_{k}, ϵ_{k}) ∥ min {Δ_{k}, \frac{∥ g^{α} (x_{k}, ϵ_{k}) ∥}{∥ B_{k} ∥}} . \end{matrix}

Otherwise, we have $d_{k}^{c} = - \frac{{‖ g^{α} (x_{k}, ϵ_{k}) ‖}^{2}}{{g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})} g^{α} (x_{k}, ϵ_{k})$ . Thus, we obtain

\begin{matrix} P r e d_{k}^{c} & = & - g^{α} (x_{k}, ϵ_{k}) (- \frac{{∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{2}}{{g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})} g^{α} (x_{k}, ϵ_{k})) \\ - \frac{1}{2} {(- \frac{{∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{2}}{{g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})} g^{α} (x_{k}, ϵ_{k}))}^{T} B_{k} (- \frac{{∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{2}}{{g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})} g^{α} (x_{k}, ϵ_{k})) \\ = & \frac{1}{2} \frac{{∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{4}}{{g^{α} (x_{k}, ϵ_{k})}^{T} B_{k} g^{α} (x_{k}, ϵ_{k})} \\ \geq & \frac{1}{2} \frac{{∥ g^{α} (x_{k}, ϵ_{k}) ∥}^{2}}{∥ B_{k} ∥} \\ \geq & \frac{1}{2} ∥ g^{α} (x_{k}, ϵ_{k}) ∥ min {Δ_{k}, \frac{∥ g^{α} (x_{k}, ϵ_{k}) ∥}{∥ B_{k} ∥}} . \end{matrix}

Let d _k be the solution of Eq (14). Because $q_{k} (d_{k}^{c}) \geq q_{k} (d_{k})$ , we have

\begin{matrix} P r e d_{k} = q_{k} (0) - q_{k} (d_{k}) \geq \frac{1}{2} ∥ g^{α} (x_{k}, ϵ_{k}) ∥ min {Δ_{k}, \frac{∥ g^{α} (x_{k}, ϵ_{k}) ∥}{∥ B_{k} ∥}} . \end{matrix}

This completes the proof.

Lemma 3 Let Assumption A hold true and the sequence {x _k} be generated by Algorithm 1. If d _k is the solution of Eq (14), then

\begin{matrix} | A r e d_{k} - P r e d_{k} | = o ({∥ d_{k} ∥}^{2}) . \end{matrix}

(22)

Proof Let d _k be the solution of Eq (14). By using Taylor expansion, F ^α(x _k + d _k, ϵ _k+1) can be expressed by

\begin{matrix} F^{α} (x_{k} + d_{k}, ϵ_{k + 1}) = F^{α} (x_{k}, ϵ_{k}) + {g^{α} (x_{k}, ϵ_{k})}^{T} d_{k} + \frac{1}{2} d_{k}^{T} B_{k} d_{k} + o ({∥ d_{k} ∥}^{2}), \end{matrix}

(23)

Note that with the definitions of Are d _k and Pre d _k and by using Eq (23), we have

\begin{matrix} | A r e d_{k} - P r e d_{k} | & = & | F^{α} (x_{k}, ϵ_{k}) - F^{α} (x_{k} + d_{k}, ϵ_{k + 1}) + q_{k} (d_{k}) | \\ = & o ({∥ d_{k} ∥}^{2}) . \end{matrix}

The proof is complete.

Lemma 4 Let Assumption A hold. Then, Algorithm 1 does not circle in the inner cycle infinitely.

Proof Suppose, by contradiction to the conclusion of the lemma, that Algorithm 1 cycles between Steps 2 and 5 infinitely at iteration point x _k, i.e., r _k < σ ₁ and that there exists a scalar ρ > 0 such that ‖g ^α(x _k, ϵ _k)‖ ≥ ρ. Thus, noting that 0 < η ₁ < 1, we have

\begin{matrix} ∥ d_{k} ∥ \leq Δ_{k} = η_{1}^{k} Δ_{0} \to 0, f o r k \to \infty . \end{matrix}

By using the result Eq (22) of Lemma 3 and the definition of r _k, we obtain

\begin{matrix} | r_{k} - 1 | & = & \frac{| A r e d_{k} - P r e d_{k} |}{| P r e d_{k} |} \\ \leq & \frac{2 o ({∥ d_{k} ∥}^{2})}{∥ g^{α} (x_{k}, ϵ_{k}) ∥ min {Δ_{k}, \frac{∥ g^{α} (x_{k}, ϵ_{k}) ∥}{∥ B_{k} ∥}}} \to 0, f o r k \to \infty . \end{matrix}

which means that we must have r _k ≥ σ ₁; this contradicts the assumption that r _k < σ ₁, and the proof is complete.

Based on the above lemmas, we can now demonstrate the global convergence of Algorithm 1 under suitable conditions.

Theorem 1 (Global Convergence). Suppose that Assumption A holds and that the sequence {x _k} is generated by Algorithm 1. Let d_k be the solution of Eq (14). Then, $lim_{k \to \infty} inf ‖ g_{k} ‖ = 0$ holds, and any accumulation point of x_k is an optimal solution of Eq (1).

Proof We first prove that

\begin{matrix} lim_{k \to \infty} inf ∥ g^{α} (x_{k}, ϵ_{k}) ∥ = 0 . \end{matrix}

(24)

Suppose that g ^α(x _k, ϵ _k) ≠ 0. Without loss of generality, by the definition of r _k, we have

\begin{matrix} | r_{k} - 1 | = | \frac{F^{α} (x_{k} + d_{k}, ϵ_{k + 1}) - F^{α} (x_{k}, ϵ_{k}) - q_{k} (d_{k})}{q_{k} (d_{k})} | . \end{matrix}

(25)

Using Taylor expansion, we obtain

\begin{matrix} F^{α} (x_{k} + d_{k}, ϵ_{k + 1}) = F^{α} (x_{k}, ϵ_{k}) + {g^{α} (x_{k}, ϵ_{k})}^{T} d_{k} + \int_{0}^{1} d_{k}^{T} [g^{α} (x_{k} + t d_{k}, ϵ_{k + 1}) - g^{α} (x_{k}, ϵ_{k})] d t . \end{matrix}

When Δ_k > 0 and small enough, we have

\begin{matrix} | F^{α} (x_{k} + d_{k}, ϵ_{k + 1}) - F^{α} (x_{k}, ϵ_{k}) - q_{k} (d_{k}) | \\ = & | \frac{1}{2} d_{k}^{T} B_{k} d_{k} - \int_{0}^{1} d_{k}^{T} [g^{α} (x_{k} + t d_{k}, ϵ_{k + 1}) - g^{α} (x_{k}, ϵ_{k})] d t | \\ \leq & \frac{1}{2} M {∥ d_{k} ∥}^{2} + o (∥ d_{k} ∥) . \end{matrix}

(26)

Suppose that there exists ω ₀ > 0 such that ‖g ^α(x _k, ϵ _k)‖ ≥ ω ₀. By contradiction, using Eqs (25) and (26) and Lemma 2, we have

\begin{matrix} | r_{k} - 1 | & \leq & \frac{\frac{1}{2} M {∥ d_{k} ∥}^{2} + o (∥ d_{k} ∥)}{\frac{1}{2} ∥ g^{α} (x_{k}, ϵ_{k}) min {Δ_{k}, \frac{∥ g^{α} (x_{k}, ϵ_{k}) ∥}{∥ B_{k} ∥}}} \\ \leq & \frac{M Δ_{k}^{2} + o (Δ_{k})}{ω_{0} min {Δ_{k}, \frac{ω_{0}}{M}}} \\ = & O (Δ_{k}) . \end{matrix}

(27)

which means that there exists sufficiently small $\hat{Δ} > 0$ such that $Δ_{k} \leq \hat{Δ}$ for each k, and we have ∣r _k − 1∣ < 1 − σ ₂, i.e., r _k > σ ₂. Then, according to the Algorithm 1, we have Δ_k+1 ≥ Δ_k.

Thus, there exists a positive integer k ₀ and a constant ρ ₀ for arbitrary k ≥ k ₀ and satisfying $Δ_{k} \leq \hat{Δ}$ , for which we have

\begin{matrix} Δ_{k} \neq ρ_{o} \hat{Δ} . \end{matrix}

(28)

On the other hand, because F is bounded from below, and supposing that there exists an infinite number k such that r _k > σ ₁, by the definition of r _k and Lemma 2, for each k ≥ k ₀,

\begin{matrix} F^{α} (x_{k}, ϵ_{k}) - F^{α} (x_{k} + d_{k}, ϵ_{k + 1}) \\ > σ_{1} [q_{k} (0) - q_{k} (d_{k})] \\ \geq \frac{σ}{2} ω_{0} min {Δ_{k}, \frac{ω_{0}}{M}} . \end{matrix}

which means that Δ_k → 0 for k → ∞; this is a contradiction to Eq (28).

Moreover, suppose that for sufficiently large k, we have r _k < σ ₁. Then, $Δ_{k} = η_{1}^{k} Δ_{0}$ , and we can see that Δ_k → 0 for k → ∞; this is also a contradiction to Eq (28). The contradiction shows that Eq (24) holds.

We now show that $lim_{k \to \infty} inf ‖ g_{k} ‖ = 0$ holds. By using Eq (11), we have

\begin{matrix} ∥ g^{α} (x_{k}, ϵ_{k}) - g (x_{k}) ∥ \leq \sqrt{\frac{2 ϵ_{k}}{λ}} . \end{matrix}

Together with Assumption A(iv), this implies that

\begin{matrix} lim_{k \to \infty} inf ∥ g_{k} ∥ = 0 . \end{matrix}

(29)

Finally, we make a final assertion. Let x* be an accumulation point of {x _k}. Then, without loss of generality, there exists a subsequence {x _k}_K satisfying

\begin{matrix} lim_{k \to \infty, k \in K} x_{k} = x^{*} . \end{matrix}

(30)

From the properties of F, we have

\begin{matrix} g (x_{k}) = \frac{x_{k} - p (x_{k})}{λ} . \end{matrix}

Thus, by using Eqs (29) and (30), we have x* = p(x*). Therefore, x* is an optimal solution of Eq (1). The proof is complete.

Similar to Theorem 3.7 in [25], we can show that the rate of convergence of Algorithm 1 is Q-superlinear. We omit this proof here (the proof of the Q-superlinear convergence can be found in [25]).

Theorem 2 (Q-superlinear Convergence) [25] Suppose that Assumption A(ii) holds, that the sequence {x _k} is generated by Algorithm 1, which has a limit point x*, and that g is BD-regular and semismooth at x*. Furthermore, suppose that ϵ _k = o(‖g(x _k)‖²). Then,

x* is the unique solution of Eq (1);
the entire sequence {x _k} converges to x* Q-superlinearly, i.e.,
$\begin{matrix} lim_{k \to \infty} \frac{∥ x_{k + 1} - x^{*} ∥}{∥ x_{k} - x^{*} ∥} = 0 . \end{matrix}$

Results

In this section, we test our modified BFGS formula using a trust region model for solving nonsmooth problems. The type of nonsmooth problems addressed in Table 1 can be found in [47–53]. The problem dimensions and optimum function values are listed in Table 1, where “No.” is the number of the test problem, “Dim” is the dimension of the test problem, “Problem” is the name of the test problem, “x ₀” is the initial point, and “f _ops(x)” is the optimization function evaluation. Here, the modified algorithm was implemented using MATLAB 7.0.4, and all numerical experiments were run on a PC with CPU Intel CORE(TM) 2 Duo T6600 2.20 GHZ, with 2.00 GB of RAM and with the Windows 7 operating system.

Table 1. Problem descriptions for test problems.

No.	Dim	Problem	x ₀	f _ops(x)
1	2	Rosenbrock [47]	(-1.2, 1.0)	0
2	2	Crescent [47]	(-1.5, 2.0)	0
3	2	CB2 [48]	(1.0, -0.1)	1.9522245
4	2	CB3 [48]	(2.0, 2.0)	2.0
5	2	DEM [49]	(1.0, 1.0)	-3.0
6	2	QL [50]	(-1.0, 5.0)	7.20
7	2	LQ [50]	(-0.5, -0.5)	-1.4142136
8	2	Mifflin 2 [51]	(-1.0, -1.0)	-1.0
9	5	Shor [52]	(0.0, 0.0, 0.0, 0.0, 1.0)	22.600162
10	50	MXHILB [53]	ones(50, 1)	0
11	50	LIHILB [53]	ones(50, 1)	0

Open in a new tab

To test the performance of the given algorithm for the problems listed in Table 1, we compared our method with the trust region concept (BT) of paper [15], the proximal bundle method (PBL) of paper [17] and the gradient trust region algorithm with limited memory BFGS update (LGTR) described in [26]. The parameters were chosen as follows: σ ₁ = 0.45, σ ₂ = 0.75, η ₁ = 0.5, η ₂ = 4, λ = 1, Δ₀ = 0.5 < Δ_max = 100 and $ϵ_{k} = \frac{1}{{(2 + k)}^{2}}$ (where k is the iterate number). We stopped the algorithm when the condition ‖g ^α(x, ϵ)‖ ≤ 10^{− 6} was satisfied. Based on the idea of [26], we use the function fminsearch in MATLAB for solving min θ(z, x). Then, we obtained the solution p(x); moreover, we obtained g ^α(x, ϵ), which is computed using Eq (8). Meanwhile, we also listed the results of PBL, LGTR, BT and our modified algorithm in Table 2. The numerical results of PBL and BT can be found in [17], and the numerical results of LGTR can be found in [26]. The following notations are used in Table 2: “NI” is the number of iterations; “NF” is the number of the function evaluations; “f(x)” is the function value at final iteration; “——” indicates that the algorithm fails to solve the problem; and “Total” denotes the sum of the NI/NF.

Table 2. Test results.

No.	PBL NI/NF/f(x)	LGTR NI/NF/f(x)	BT NI/NF/f(x)	Algorithm 1 NI/NF/f(x)
1	42/45/3.81 × 10⁻⁵	——	79/88/1.30 × 10⁻¹⁰	26/66/4.247136 × 10⁻⁶
2	18/20/6.79 × 10⁻⁵	10/10/3.156719 × 10⁻⁵	24/27/9.44 × 10⁻⁵	13/13/2.521899 × 10⁻⁵
3	32/34/1.9522245	10/11/1.952225	13/16/1.952225	4/6/1.952262
4	14/16/2.0	2/3/2.000217	13/21/2.0	3/4/2.000040
5	17/19/-3.0	3/3/-2.999700	9/13/-3.0	4/24/-2.999922
6	13/15/7.2000015	19/119/7.200001	12/17/7.200009	9/9/7.200043
7	11/12/-1.4142136	1/1/-1.207068	10/11/-1.414214	2/2/-1.414214
8	66/68/-0.99999941	3/3/-0.9283527	6/13/-1.0	4/4/-0.9978547
9	27/29/22.600162	42/443/22.62826	29/30/22.600160	8/9/22.600470
10	19/20/4.24 × 10⁻⁷	12/12/9.793119 × 10⁻³	——	23/108/5.228012 × 10⁻³
11	19/20/9.90 × 10⁻⁸	20/63/9.661137 × 10⁻³	——	7/7/2.632534 × 10⁻³
Total	278/298	164/1111	353/412	103/252

Open in a new tab

The numerical results show that the performance of our algorithm is superior to those of the methods in Table 2. It can be seen clearly that the sum of our algorithm relative to NI and NF is less than the other three algorithms. The paper [54] provides a new tool for analyzing the efficiency of these four algorithms. Figs 1 and 2 show the performances of these four methods relative to NI and NF of Table 2, respectively. These two figures prove that Algorithm 1 provides a good performance for all the problems tested compared to PBL, LGTR and BT. In sum, the preliminary numerical results indicate that the modified method is efficient for solving nonsmooth convex minimizations.

Conclusion

The trust region method is one of the most efficient optimization methods. In this paper, by using the Moreau-Yosida regularization (smoothing) and a new secant equation with the BFGS formula, we present a modified BFGS formula using a trust region model for solving nonsmooth convex minimizations. Our algorithm does not compute the Hessian of the objective function at every iteration, which decrease the computational workload and time, and it uses the function information and the gradient information. Under suitable conditions, global convergence is established, and we show that the rate of convergence of our algorithm is Q-superlinear. Numerical results show that this algorithm is efficient. We believe that this algorithm can be used in future applications to solve non smooth convex minimizations.

Acknowledgments

This work is supported by China NSF (Grant No. 11261006 and 11161003), the Guangxi Science Fund for Distinguished Young Scholars (No. 2015GXNSFGA139001), NSFC No. 61232016, NSFC No. U1405254, and PAPD issue of Jiangsu advantages discipline. The authors wish to thank the editor and the referees for their useful suggestions and comments which greatly improve this paper.

Data Availability

All data are available and they are listed in the paper.

Funding Statement

This work is supported by the Program for Excellent Talents in Guangxi Higher Education Institutions (Grant No. 201261), Guangxi NSF (Grant No. 2012GXNSFAA053002), China NSF (Grant No. 11261006 and 11161003), the Guangxi Science Fund for Distinguished Young Scholars (No. 2015GXNSFGA139001), NSFC No. 61232016, NSFC No. U1405254, and PAPD issue of Jiangsu advantages discipline.

References

1. Steihaug T. The conjugate gradient method and trust regions in lagre scale optimization, SIAM Journal on Numerical Analysis, 20, 626–637 (1983) 10.1137/0720042 [DOI] [Google Scholar]
2. Dai Y, Yuan Y. A nonlinear conjugate gradient with a strong alobal convergence properties, SIAM Journal on Optimization, 10, 177–182 (2000) 10.1137/S1052623497318992 [DOI] [Google Scholar]
3. Wei Z, Li G and Qi L. New nonlinear conjugate gradient formulas for large-scale unconstrained optimization problems, Appiled Mathematics and Computation, 179, 407–430 (2006) 10.1016/j.amc.2005.11.150 [DOI] [Google Scholar]
4. Li L, Peng H, Kurths J, Yang Y and Schellnhuber H.J. Chaos-order transition in foraging behavior of ants, PNAS, 111, 8392–8397 (2014) 10.1073/pnas.1407083111 [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Peng H, Li L, Yang Y and Liu F. Parameter estimation of dynamical systems via a chaotic ant swarm, Physical Review E, 81, 016207, (2010) 10.1103/PhysRevE.81.016207 [DOI] [PubMed] [Google Scholar]
6. Wan M, Li L, Xiao J, Wang C and Yang Y. Data clustering using bacterial foraging optimization, Journal of Intelligent Information Systems, 38, 321–341 (2012) 10.1007/s10844-011-0158-3 [DOI] [Google Scholar]
7. Chan C, Katsaggelos A.K and Sahakian A.V. Image sequence filtering in quantum noise with applications to low-dose fluoroscopy, IEEE Transactions on Medical Imaging, 12, 610–621 (1993) 10.1109/42.241890 [DOI] [PubMed] [Google Scholar]
8. Banham M.R, Katsaggelos A.K. Digital image restoration, IEEE Signal Processing Magazine, 14, 24–41 (1997) 10.1109/79.581363 [DOI] [Google Scholar]
9. Gu B, Sheng V. Feasibility and finite convergence analysis for accurate on-line v-support vector learning, IEEE Transactions on Neural Networks and Learning Systems, 24, 1304–1315 (2013) 10.1109/TNNLS.2013.2250300 [DOI] [PubMed] [Google Scholar]
10. Li J, Li X, Yang B and Sun X. Segmentation-based image copy-move forgery detection scheme, IEEE Transactions on Information Forensics and Security, 10, 507–518 (2015) 10.1109/TIFS.2014.2381872 [DOI] [Google Scholar]
11. Wen X, Shao L, Fang W, and Xue Y. Efficient feature selection and classification for vehicle detection, IEEE Transactions on Circuits and Systems for Video Technology, (2015) 10.1109/TCSVT.2014.2358031 [DOI] [Google Scholar]
12. Zhang H, Wu J, Nguyen T and Sun M. Synthetic aperture radar image segmentation by modified student’s t-mixture model, IEEE Transaction on Geoscience and Remote Sensing, 52, 4391–4403 (2014) 10.1109/TGRS.2013.2281854 [DOI] [Google Scholar]
13. Fu Z. Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing, IEICE Transactions on Communications, E98-B, 190–200 (2015) 10.1587/transcom.E98.B.190 [DOI] [Google Scholar]
14. Yuan G, Wei Z and Li G, A modified Polak-Ribière-Polyak conjugate gradient algorithm for nonsmooth convex programs, Journal of Computational and Appiled Mathematics, 255, 86–96 (2014) 10.1016/j.cam.2013.04.032 [DOI] [Google Scholar]
15. Schramm H, Zowe J. A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numercial results, SIAM Journal on Optimization, 2, 121–152 (1992) 10.1137/0802008 [DOI] [Google Scholar]
16. Haarala M. Miettinen K and Mäkelä M.M, New limited memory bundle method for lagre-scale nonsmooth optimization, Optimization Methods and Software, 19, 673–692 (2004) 10.1080/10556780410001689225 [DOI] [Google Scholar]
17. Lukšan L, Vlček J. A bundle-Newton method for nonsmooth unconstrained minimization, Mathematical Programming, 83, 373–391 (1998) 10.1016/S0025-5610(97)00108-1 [DOI] [Google Scholar]
18. Wei Z, Qi L and Birge J.R. A new methods for nonsmooth convex optimization, Journal of Inequalities and Applications, 2, 157–179 (1998) [Google Scholar]
19. Levenberg K. A method for the solution of certain nonlinear problem in least squares, Quarterly Journal of Mechanics and Applied Mathematics, 2, 164–166 (1944) [Google Scholar]
20. Martinet B. Régularisation d’inéquations variationelles par approxiamations succcessives, Rev. Fr. Inform. Rech. Oper, 4, 154–159 (1970) [Google Scholar]
21. Powell M.J.D. Convergence properties of a class of minimization algorithms In: Mangasarian Q.L., Meyer R.R., Robinson S.M. (eds.) Nonlinear Programming, vol. 2 Academic Press, New York: (1975) [Google Scholar]
22. Fletcher R. A model algorithm for composite nondifferentiable optimization problems, Math. Program. Stud, 17, 67–76 (1982) 10.1007/BFb0120959 [DOI] [Google Scholar]
23. Sampaio R.J.B. Yuan J and Sun W. Trust region algorithm for nonsmooth optimization, Appiled Mathematics and Computation, 85, 109–116 (1997) 10.1016/S0096-3003(96)00112-9 [DOI] [Google Scholar]
24. Sun W, Sampaio R.J.B and Yuan J. Quasi-Newton trust region algorithm for non-smooth least squares problems, Appiled Mathematics and Computation, 105, 183–194 (1999) 10.1016/S0096-3003(98)10103-0 [DOI] [Google Scholar]
25. Zhang L. A new trust region algorithm for nonsmooth convex minimization, Appiled Mathematics and Computation 193, 135–142 (2007) 10.1016/j.amc.2007.03.059 [DOI] [Google Scholar]
26. Yuan G, Wei Z and Wang Z. Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minization, Computational Optimization and Application, 54, 45–64 (2013) 10.1007/s10589-012-9485-8 [DOI] [Google Scholar]
27. Yuan G, Lu X and Wei Z. BFGS trust-region method for symmetric nonlinear equations, Journal of Computational and Appiled Mathematics, 230, 44–58 (2009) 10.1016/j.cam.2008.10.062 [DOI] [Google Scholar]
28. Qi L, Sun J. A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming, 66, 25–43 (1994) 10.1007/BF01581136 [DOI] [Google Scholar]
29. Bellavia S, Maccini M and Morini B. An affine scaling trust-region approach to bound-constrained nonlinear systems, Appiled Numerical Mathematics, 44, 257–280 (2003) 10.1016/S0168-9274(02)00170-8 [DOI] [Google Scholar]
30. Akbari Z, Yousefpour R and Reza Peyghami M. A new nonsmooth trust region algorithm for locally Lipschitz unconstrained optimization problems, Journal of Optimization Theory and Applications, 164, 733–754 (2015) 10.1007/s10957-014-0534-6 [DOI] [Google Scholar]
31. Bannert T. A trust region algorithm for nonsmooth optimization, Mathematical Programming, 67, 247–264 (1994) 10.1007/BF01582223 [DOI] [Google Scholar]
32. Amini K, Ahookhosh M. A hybrid of adjustable trust-region and nonmonotone algorithms for unconstrained optimization, Applied Mathematical Modelling, 38, 2601–2612 (2014) 10.1016/j.apm.2013.10.062 [DOI] [Google Scholar]
33. Zhou Q, Hang D. Nonmonotone adaptive trust region method with line search based on new diagonal updating, Appiled Numerical Mathematics, 91, 75–88 (2015) 10.1016/j.apnum.2014.12.009 [DOI] [Google Scholar]
34. Yuan G, Wei Z and Lu X. A BFGS trust-region method for nonlinear equations, Computing, 92, 317–333 (2011) 10.1007/s00607-011-0146-z [DOI] [Google Scholar]
35. Lu S, Wei Z and Li L. A trust region algorithm with adaptive cubic regularization methods for nonsmooth convex minization, Computational Optimization and Application, 51, 551–573 (2012) 10.1007/s10589-010-9363-1 [DOI] [Google Scholar]
36. Correa R, Lemaréchal C. Convergence of some algorithms for convex minimization, Mathematical Programming, 62, 261–273 (1993) 10.1007/BF01585170 [DOI] [Google Scholar]
37. Dennis J.E. Jr, Li S.B and Tapia R.A. A unified approach to global convergence of trust region methods for nonsmooth optimization, Mathematical Programming, 68, 319–346 (1995) 10.1016/0025-5610(94)00054-W [DOI] [Google Scholar]
38. Fukushima M, Qi L. A global and superlinearly convergent algorithm for nonsmooth convex minimization, SIAM Journal on Optimization, 6, 1106–1120 (1996) 10.1137/S1052623494278839 [DOI] [Google Scholar]
39. Wei Z, Li G and Qi L. New quasi-Newton methods for unconstrained optimization problems, Appiled Mathematics and Computation, 175, 1156–1188 (2006) 10.1016/j.amc.2005.08.027 [DOI] [Google Scholar]
40. Broyden C.G. The convergence of a class of double rank minimization algorithms: the new algorithm, Journal of the Institute of Mathematics and its Applications, 6, 222–131 (1970) 10.1093/imamat/6.3.222 [DOI] [Google Scholar]
41. Fletcher R. A new approach to variable metric algorithms, Computer Journal, 13, 317–322 (1970) 10.1093/comjnl/13.3.317 [DOI] [Google Scholar]
42. Goldfarb D. A family of variable metric methods derived by variational means, Mathematics of Computation, 24, 23–26 (1970) 10.1090/S0025-5718-1970-0258249-6 [DOI] [Google Scholar]
43. Shanno D.F. Conditioning of quadi-Newton methods for function minization, Mathematics of Computation, 24, 647–650 (1970) [Google Scholar]
44. Dennis J.E, Moré J.J. A characterization of superlinear convergence and its application to quasi-Newton methods, Mathematics of Computation, 28, 549–560 (1974) 10.1090/S0025-5718-1974-0343581-1 [DOI] [Google Scholar]
45. Yuan Y, Sun W. Optimization theory and methods, Science Press, Beijing: (1997) [Google Scholar]
46. Ma C. Optimization method and the Matlab programming, Science Press, Beijing: (2010) [Google Scholar]
47. Mäkelä M.M, Neittaanmäki P. Nonsmooth Optimization. World Scientific, London: (1992) [Google Scholar]
48. Charalambous J, Conn A.R. An efficient method to solve the minimax problem directly, SIAM Journal on Numerical Analysis, 15, 162–187 (1978) 10.1137/0715011 [DOI] [Google Scholar]
49. Demyanov V.F, Malozemov V.N. Introduction to Minimax. Wiley, New York: (1974) [Google Scholar]
50.Womersley J. Numerical methods for structured problems in nonsmooth optimization. Ph.D. thesis. Mathematics Department, University of Dundee, Dundee, Scotland (1981)
51.Gupta N. A higher than first order algorithm for nonsmooth constrained optimization. P.h. thesis, Department of Philosophy, Washington State University, Pullman, WA (1985)
52. Short N.Z. Minimization methods for nondifferentiable function. Springer, Berlin: (1985) [Google Scholar]
53. Kiwiel K.C. An ellipsoid trust region bundle method for nonsmooth convex minimization, SIAM Journal on Control and Optimization, 27, 737–757 (1989) 10.1137/0327039 [DOI] [Google Scholar]
54. Dolan E.D, Moré J.J. Benchmarking optimization software with performance profiles, Mathematical Programming, 91, 201–213 (2002) 10.1007/s101070100263 [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

All data are available and they are listed in the paper.

[pone.0140606.ref001] 1. Steihaug T. The conjugate gradient method and trust regions in lagre scale optimization, SIAM Journal on Numerical Analysis, 20, 626–637 (1983) 10.1137/0720042 [DOI] [Google Scholar]

[pone.0140606.ref002] 2. Dai Y, Yuan Y. A nonlinear conjugate gradient with a strong alobal convergence properties, SIAM Journal on Optimization, 10, 177–182 (2000) 10.1137/S1052623497318992 [DOI] [Google Scholar]

[pone.0140606.ref003] 3. Wei Z, Li G and Qi L. New nonlinear conjugate gradient formulas for large-scale unconstrained optimization problems, Appiled Mathematics and Computation, 179, 407–430 (2006) 10.1016/j.amc.2005.11.150 [DOI] [Google Scholar]

[pone.0140606.ref004] 4. Li L, Peng H, Kurths J, Yang Y and Schellnhuber H.J. Chaos-order transition in foraging behavior of ants, PNAS, 111, 8392–8397 (2014) 10.1073/pnas.1407083111 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0140606.ref005] 5. Peng H, Li L, Yang Y and Liu F. Parameter estimation of dynamical systems via a chaotic ant swarm, Physical Review E, 81, 016207, (2010) 10.1103/PhysRevE.81.016207 [DOI] [PubMed] [Google Scholar]

[pone.0140606.ref006] 6. Wan M, Li L, Xiao J, Wang C and Yang Y. Data clustering using bacterial foraging optimization, Journal of Intelligent Information Systems, 38, 321–341 (2012) 10.1007/s10844-011-0158-3 [DOI] [Google Scholar]

[pone.0140606.ref007] 7. Chan C, Katsaggelos A.K and Sahakian A.V. Image sequence filtering in quantum noise with applications to low-dose fluoroscopy, IEEE Transactions on Medical Imaging, 12, 610–621 (1993) 10.1109/42.241890 [DOI] [PubMed] [Google Scholar]

[pone.0140606.ref008] 8. Banham M.R, Katsaggelos A.K. Digital image restoration, IEEE Signal Processing Magazine, 14, 24–41 (1997) 10.1109/79.581363 [DOI] [Google Scholar]

[pone.0140606.ref009] 9. Gu B, Sheng V. Feasibility and finite convergence analysis for accurate on-line v-support vector learning, IEEE Transactions on Neural Networks and Learning Systems, 24, 1304–1315 (2013) 10.1109/TNNLS.2013.2250300 [DOI] [PubMed] [Google Scholar]

[pone.0140606.ref010] 10. Li J, Li X, Yang B and Sun X. Segmentation-based image copy-move forgery detection scheme, IEEE Transactions on Information Forensics and Security, 10, 507–518 (2015) 10.1109/TIFS.2014.2381872 [DOI] [Google Scholar]

[pone.0140606.ref011] 11. Wen X, Shao L, Fang W, and Xue Y. Efficient feature selection and classification for vehicle detection, IEEE Transactions on Circuits and Systems for Video Technology, (2015) 10.1109/TCSVT.2014.2358031 [DOI] [Google Scholar]

[pone.0140606.ref012] 12. Zhang H, Wu J, Nguyen T and Sun M. Synthetic aperture radar image segmentation by modified student’s t-mixture model, IEEE Transaction on Geoscience and Remote Sensing, 52, 4391–4403 (2014) 10.1109/TGRS.2013.2281854 [DOI] [Google Scholar]

[pone.0140606.ref013] 13. Fu Z. Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing, IEICE Transactions on Communications, E98-B, 190–200 (2015) 10.1587/transcom.E98.B.190 [DOI] [Google Scholar]

[pone.0140606.ref014] 14. Yuan G, Wei Z and Li G, A modified Polak-Ribière-Polyak conjugate gradient algorithm for nonsmooth convex programs, Journal of Computational and Appiled Mathematics, 255, 86–96 (2014) 10.1016/j.cam.2013.04.032 [DOI] [Google Scholar]

[pone.0140606.ref015] 15. Schramm H, Zowe J. A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numercial results, SIAM Journal on Optimization, 2, 121–152 (1992) 10.1137/0802008 [DOI] [Google Scholar]

[pone.0140606.ref016] 16. Haarala M. Miettinen K and Mäkelä M.M, New limited memory bundle method for lagre-scale nonsmooth optimization, Optimization Methods and Software, 19, 673–692 (2004) 10.1080/10556780410001689225 [DOI] [Google Scholar]

[pone.0140606.ref017] 17. Lukšan L, Vlček J. A bundle-Newton method for nonsmooth unconstrained minimization, Mathematical Programming, 83, 373–391 (1998) 10.1016/S0025-5610(97)00108-1 [DOI] [Google Scholar]

[pone.0140606.ref018] 18. Wei Z, Qi L and Birge J.R. A new methods for nonsmooth convex optimization, Journal of Inequalities and Applications, 2, 157–179 (1998) [Google Scholar]

[pone.0140606.ref019] 19. Levenberg K. A method for the solution of certain nonlinear problem in least squares, Quarterly Journal of Mechanics and Applied Mathematics, 2, 164–166 (1944) [Google Scholar]

[pone.0140606.ref020] 20. Martinet B. Régularisation d’inéquations variationelles par approxiamations succcessives, Rev. Fr. Inform. Rech. Oper, 4, 154–159 (1970) [Google Scholar]

[pone.0140606.ref021] 21. Powell M.J.D. Convergence properties of a class of minimization algorithms In: Mangasarian Q.L., Meyer R.R., Robinson S.M. (eds.) Nonlinear Programming, vol. 2 Academic Press, New York: (1975) [Google Scholar]

[pone.0140606.ref022] 22. Fletcher R. A model algorithm for composite nondifferentiable optimization problems, Math. Program. Stud, 17, 67–76 (1982) 10.1007/BFb0120959 [DOI] [Google Scholar]

[pone.0140606.ref023] 23. Sampaio R.J.B. Yuan J and Sun W. Trust region algorithm for nonsmooth optimization, Appiled Mathematics and Computation, 85, 109–116 (1997) 10.1016/S0096-3003(96)00112-9 [DOI] [Google Scholar]

[pone.0140606.ref024] 24. Sun W, Sampaio R.J.B and Yuan J. Quasi-Newton trust region algorithm for non-smooth least squares problems, Appiled Mathematics and Computation, 105, 183–194 (1999) 10.1016/S0096-3003(98)10103-0 [DOI] [Google Scholar]

[pone.0140606.ref025] 25. Zhang L. A new trust region algorithm for nonsmooth convex minimization, Appiled Mathematics and Computation 193, 135–142 (2007) 10.1016/j.amc.2007.03.059 [DOI] [Google Scholar]

[pone.0140606.ref026] 26. Yuan G, Wei Z and Wang Z. Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minization, Computational Optimization and Application, 54, 45–64 (2013) 10.1007/s10589-012-9485-8 [DOI] [Google Scholar]

[pone.0140606.ref027] 27. Yuan G, Lu X and Wei Z. BFGS trust-region method for symmetric nonlinear equations, Journal of Computational and Appiled Mathematics, 230, 44–58 (2009) 10.1016/j.cam.2008.10.062 [DOI] [Google Scholar]

[pone.0140606.ref028] 28. Qi L, Sun J. A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming, 66, 25–43 (1994) 10.1007/BF01581136 [DOI] [Google Scholar]

[pone.0140606.ref029] 29. Bellavia S, Maccini M and Morini B. An affine scaling trust-region approach to bound-constrained nonlinear systems, Appiled Numerical Mathematics, 44, 257–280 (2003) 10.1016/S0168-9274(02)00170-8 [DOI] [Google Scholar]

[pone.0140606.ref030] 30. Akbari Z, Yousefpour R and Reza Peyghami M. A new nonsmooth trust region algorithm for locally Lipschitz unconstrained optimization problems, Journal of Optimization Theory and Applications, 164, 733–754 (2015) 10.1007/s10957-014-0534-6 [DOI] [Google Scholar]

[pone.0140606.ref031] 31. Bannert T. A trust region algorithm for nonsmooth optimization, Mathematical Programming, 67, 247–264 (1994) 10.1007/BF01582223 [DOI] [Google Scholar]

[pone.0140606.ref032] 32. Amini K, Ahookhosh M. A hybrid of adjustable trust-region and nonmonotone algorithms for unconstrained optimization, Applied Mathematical Modelling, 38, 2601–2612 (2014) 10.1016/j.apm.2013.10.062 [DOI] [Google Scholar]

[pone.0140606.ref033] 33. Zhou Q, Hang D. Nonmonotone adaptive trust region method with line search based on new diagonal updating, Appiled Numerical Mathematics, 91, 75–88 (2015) 10.1016/j.apnum.2014.12.009 [DOI] [Google Scholar]

[pone.0140606.ref034] 34. Yuan G, Wei Z and Lu X. A BFGS trust-region method for nonlinear equations, Computing, 92, 317–333 (2011) 10.1007/s00607-011-0146-z [DOI] [Google Scholar]

[pone.0140606.ref035] 35. Lu S, Wei Z and Li L. A trust region algorithm with adaptive cubic regularization methods for nonsmooth convex minization, Computational Optimization and Application, 51, 551–573 (2012) 10.1007/s10589-010-9363-1 [DOI] [Google Scholar]

[pone.0140606.ref036] 36. Correa R, Lemaréchal C. Convergence of some algorithms for convex minimization, Mathematical Programming, 62, 261–273 (1993) 10.1007/BF01585170 [DOI] [Google Scholar]

[pone.0140606.ref037] 37. Dennis J.E. Jr, Li S.B and Tapia R.A. A unified approach to global convergence of trust region methods for nonsmooth optimization, Mathematical Programming, 68, 319–346 (1995) 10.1016/0025-5610(94)00054-W [DOI] [Google Scholar]

[pone.0140606.ref038] 38. Fukushima M, Qi L. A global and superlinearly convergent algorithm for nonsmooth convex minimization, SIAM Journal on Optimization, 6, 1106–1120 (1996) 10.1137/S1052623494278839 [DOI] [Google Scholar]

[pone.0140606.ref039] 39. Wei Z, Li G and Qi L. New quasi-Newton methods for unconstrained optimization problems, Appiled Mathematics and Computation, 175, 1156–1188 (2006) 10.1016/j.amc.2005.08.027 [DOI] [Google Scholar]

[pone.0140606.ref040] 40. Broyden C.G. The convergence of a class of double rank minimization algorithms: the new algorithm, Journal of the Institute of Mathematics and its Applications, 6, 222–131 (1970) 10.1093/imamat/6.3.222 [DOI] [Google Scholar]

[pone.0140606.ref041] 41. Fletcher R. A new approach to variable metric algorithms, Computer Journal, 13, 317–322 (1970) 10.1093/comjnl/13.3.317 [DOI] [Google Scholar]

[pone.0140606.ref042] 42. Goldfarb D. A family of variable metric methods derived by variational means, Mathematics of Computation, 24, 23–26 (1970) 10.1090/S0025-5718-1970-0258249-6 [DOI] [Google Scholar]

[pone.0140606.ref043] 43. Shanno D.F. Conditioning of quadi-Newton methods for function minization, Mathematics of Computation, 24, 647–650 (1970) [Google Scholar]

[pone.0140606.ref044] 44. Dennis J.E, Moré J.J. A characterization of superlinear convergence and its application to quasi-Newton methods, Mathematics of Computation, 28, 549–560 (1974) 10.1090/S0025-5718-1974-0343581-1 [DOI] [Google Scholar]

[pone.0140606.ref045] 45. Yuan Y, Sun W. Optimization theory and methods, Science Press, Beijing: (1997) [Google Scholar]

[pone.0140606.ref046] 46. Ma C. Optimization method and the Matlab programming, Science Press, Beijing: (2010) [Google Scholar]

[pone.0140606.ref047] 47. Mäkelä M.M, Neittaanmäki P. Nonsmooth Optimization. World Scientific, London: (1992) [Google Scholar]

[pone.0140606.ref048] 48. Charalambous J, Conn A.R. An efficient method to solve the minimax problem directly, SIAM Journal on Numerical Analysis, 15, 162–187 (1978) 10.1137/0715011 [DOI] [Google Scholar]

[pone.0140606.ref049] 49. Demyanov V.F, Malozemov V.N. Introduction to Minimax. Wiley, New York: (1974) [Google Scholar]

[pone.0140606.ref050] 50.Womersley J. Numerical methods for structured problems in nonsmooth optimization. Ph.D. thesis. Mathematics Department, University of Dundee, Dundee, Scotland (1981)

[pone.0140606.ref051] 51.Gupta N. A higher than first order algorithm for nonsmooth constrained optimization. P.h. thesis, Department of Philosophy, Washington State University, Pullman, WA (1985)

[pone.0140606.ref052] 52. Short N.Z. Minimization methods for nondifferentiable function. Springer, Berlin: (1985) [Google Scholar]

[pone.0140606.ref053] 53. Kiwiel K.C. An ellipsoid trust region bundle method for nonsmooth convex minimization, SIAM Journal on Control and Optimization, 27, 737–757 (1989) 10.1137/0327039 [DOI] [Google Scholar]

[pone.0140606.ref054] 54. Dolan E.D, Moré J.J. Benchmarking optimization software with performance profiles, Mathematical Programming, 91, 201–213 (2002) 10.1007/s101070100263 [DOI] [Google Scholar]

PERMALINK

A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations

Zengru Cui

Gonglin Yuan

Zhou Sheng

Wenjie Liu

Xiaoliang Wang

Xiabin Duan

Roles

Abstract

Introduction

Initial results

The new model

Convergence analysis

Results

Table 1. Problem descriptions for test problems.

Table 2. Test results.

Fig 1. Performance profiles of these methods (NI).

Fig 2. Performance profiles of these methods (NF).

Conclusion

Acknowledgments

Data Availability

Funding Statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations

Zengru Cui

Gonglin Yuan

Zhou Sheng

Wenjie Liu

Xiaoliang Wang

Xiabin Duan

Roles

Abstract

Introduction

Initial results

The new model

Convergence analysis

Results

Table 1. Problem descriptions for test problems.

Table 2. Test results.

Fig 1. Performance profiles of these methods (NI).

Fig 2. Performance profiles of these methods (NF).

Conclusion

Acknowledgments

Data Availability

Funding Statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases