An incomplete Hessian Newton minimization method and its application in a chemical database problem

Dexuan Xie; Qin Ni

doi:10.1007/s10589-008-9164-y

Author manuscript; available in PMC: 2010 Aug 10.

Published in final edited form as: Comput Optim Appl. 2009 Dec;44(3):467–485. doi: 10.1007/s10589-008-9164-y

PMCID: PMC2863154 NIHMSID: NIHMS171515 PMID: 20445822

Abstract

To efficiently solve a large scale unconstrained minimization problem with a dense Hessian matrix, this paper proposes to use an incomplete Hessian matrix to define a new modified Newton method, called the incomplete Hessian Newton method (IHN). A theoretical analysis shows that IHN is convergent globally, and has a linear rate of convergence with a properly selected symmetric, positive definite incomplete Hessian matrix. It also shows that the Wolfe conditions hold in IHN with a line search step length of one. As an important application, an effective IHN and a modified IHN, called the truncated-IHN method (T-IHN), are constructed for solving a large scale chemical database optimal projection mapping problem. T-IHN is shown to work well even with indefinite incomplete Hessian matrices. Numerical results confirm the theoretical results of IHN, and demonstrate the promising potential of T-IHN as an efficient minimization algorithm.

Keywords: Unconstrained minimization, Incomplete Hessian matrix, Convergence analysis, Truncated Newton, Chemical database analysis

1 Introduction

Let f be a twice continuously differentiable multivariable function defined on a bounded neighboring domain, $D$ , of the n-dimensional Euclidean real vector space Rⁿ. We consider a large scale unconstrained optimization problem:

Find x^{*} \in D \subset R^{n} such that f (x^{*}) = \min {f (x) ∣ x \in D},

(1)

where x* lies in the interior of $D$, f(x) and the gradient vector g(x) (i.e., the first derivative of f at x) are expensive to evaluate, and the Hessian matrix H(x) (i.e., the second derivative of f at x) is dense but can be evaluated analytically. In practice, however, when n is large, evaluating the whole Hessian H(x) is infeasible because of the computing and storage costs. Typical examples of (1) arise from biomolecular energy function minimization problems and chemical database optimal projection problems, as well as from many other scientific and engineering applications. Minimizing a biomolecular potential energy function is one of the fundamental tasks in biomolecular simulations [10], while solving the optimal projection mapping problem is a key step in large scale chemical database analysis [10, 13, 14]. Developing efficient numerical minimization algorithms is essential in these application fields.

Currently, typical algorithms for solving (1) include the steepest descent method (SD), the nonlinear conjugate gradient method (CG), the limited-memory BFGS method (L-BFGS) [4-6], and the discrete truncated Newton method (D-TN) [6, 7]. None of them requires any evaluation of the Hessian matrix; they use only gradient vectors. In SD and CG, the gradient vectors are employed to construct descent search directions. To obtain faster minimization algorithms, L-BFGS and D-TN use the gradient vectors to construct approximate Hessian matrices, which makes them modified Newton type algorithms [6]. For example, in D-TN, an approximate Hessian matrix is generated implicitly by using gradient vectors to construct a finite difference approximation to the Hessian-vector product, which is the only place where H(x) occurs in the truncated Newton method (TN) [1]. However, such a finite difference formula is inherently numerically unstable, which may significantly disturb the numerical behavior of D-TN.

In this paper, we study the idea of constructing an approximate Hessian, M(x), directly from the Hessian matrix H(x), which is available in Problem (1) and should be used, even partially, in developing fast minimization algorithms. With a properly selected sparse pattern and sparse matrix techniques, we can simply construct M(x) as an incomplete Hessian matrix, and evaluate it quickly within the available computing and storage capacity. We then substitute it for the Hessian matrix H(x) in a classic modified Newton method to yield an incomplete Hessian Newton method (IHN). With a properly selected M(x), IHN is expected to be numerically stable, easy to implement, and to have high computational performance and a fast convergence rate.

The focus of this paper is to study some basic convergence properties of IHN. For this purpose, we simply assume that M(x) is symmetric, positive definite in domain $D$ . Following the classic quasi-Newton theory, we prove that IHN converges globally, and has both an R-linear rate of convergence and a Q-linear rate of convergence when M(x) is properly selected. We also prove that the Wolfe conditions hold for IHN with a line search step length of one when the number of IHN iterations is large enough.

As an application, we construct a particular IHN for solving the chemical database optimal projection mapping problem as described in [10, 13, 14]. In this application, the entries of H(x) that correspond to the pairwise distances within a certain short range become a dominating part of H(x). Thus, they can be selected as the nonzero entries of M(x), yielding a good approximation of H(x). In this paper, we construct M(x) by a distance cut-off strategy, and express both H(x) and M(x) in terms of Kronecker products to display the dense and sparse matrix structures of H(x) and M(x). Such a sparse expression of M(x) is valuable in programming M(x) by sparse matrix techniques. To confirm our theoretical results, we carry out numerical experiments on IHN with a real chemical dataset. Numerical results show that the rate of convergence of IHN can be close to that of the classic modified Newton method when the incomplete Hessian is properly selected. Even with a very sparse incomplete Hessian (a block diagonal matrix with each block being a 2 by 2 matrix), IHN was still found to have a much faster rate of convergence than SD.

However, assuming all the incomplete Hessian matrices to be positive definite is often too strong to be satisfied in practice. To relax this assumption and further reduce the computing cost, we use the truncated Newton strategy given in [12] to modify IHN as a descent search direction method, and call it the truncated IHN method (T-IHN) for clarity. T-IHN is shown to converge globally even with indefinite incomplete Hessian matrices. To numerically study the convergence rate and computer performance, we develop a MATLAB program package for T-IHN for solving the chemical database problem using sparse matrix techniques. We then compare the convergence and performance of T-IHN with those of SD, BFGS, and D-TN. Here SD and BFGS were implemented by calling the minimization solver routine fminunc from the MATLAB library, and the MATLAB program of D-TN is the same as that of T-IHN except that it uses the Euler forward finite difference formula to approximate the Hessian-vector product.

Numerical results show that T-IHN using an incomplete Hessian with about 60 percent zero entries has a faster rate of convergence and better performance than BFGS. For a dataset of 300 members, T-IHN took less CPU time than BFGS by a factor of about 2.09. T-IHN was also found to have a rate of convergence close to that of D-TN and better performance than D-TN: in this test, T-IHN took less CPU time than D-TN by a factor of about 2.6. We did not compare T-IHN with L-BFGS since the MATLAB library does not contain an L-BFGS routine. Note that L-BFGS usually has a slower rate of convergence than BFGS. Hence, T-IHN can be expected to perform better than L-BFGS in both convergence and CPU time.

We also tested T-IHN with two very sparse incomplete Hessian matrices: one has about 97.43% zero entries, and the other has about 99.67% zero entries. Even so, T-IHN was still found to perform much better than SD in both convergence rate and CPU time. It took less CPU time than SD by a factor of up to 7.68. These numerical results demonstrate the promising potential of T-IHN as an efficient solver of minimization problem (1).

The remainder of the paper is organized as follows. We define IHN in Sect. 2, and present its basic convergence properties in Sect. 3. We then describe the IHN and T-IHN methods for solving the database problem in Sect. 4. Finally, the numerical results on IHN and T-IHN are presented in Sect. 5.

2 The IHN method

Let g(x), H(x) and M(x) denote the gradient vector, Hessian matrix, and incomplete Hessian matrix of f at $x \in D$, respectively. We assume that both H(x) and M(x) are symmetric, positive definite in $D$. The IHN iterative sequence {x_k} for solving (1) is defined in the form

x_{k + 1} = x_{k} + α_{k} p_{k}, k = 0, 1, 2, \dots,

(2)

where x₀ is a given initial iterate in $D$ , p_k is a search direction satisfying

M (x_{k}) p_{k} = - g (x_{k}),

(3)

and α_k is a step length satisfying the Wolfe conditions

f (x_{k + 1}) \leq f (x_{k}) + c_{1} α_{k} g {(x_{k})}^{T} p_{k} and g {(x_{k + 1})}^{T} p_{k} \geq c_{2} g {(x_{k})}^{T} p_{k}

(4)

for $0 < c_{1} < \frac{1}{2} < c_{2} < 1$. Clearly, p_k = −M(x_k)⁻¹g(x_k), which is a descent search direction in the sense that

g {(x_{k})}^{T} p_{k} < 0 for k = 0, 1, 2, \dots .

(5)
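As an illustration of the two inequalities in (4), the following Python sketch checks whether a given step length satisfies them. The test function, the function name `satisfies_wolfe`, and the default constants are assumptions chosen for this example; they are not taken from the paper's MATLAB programs.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def satisfies_wolfe(f, grad, x, p, alpha, c1=0.01, c2=0.9):
    """Return True if step length alpha satisfies both inequalities of (4)."""
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
    slope = dot(grad(x), p)  # g(x_k)^T p_k, negative for a descent direction
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope
    curvature = dot(grad(x_new), p) >= c2 * slope
    return sufficient_decrease and curvature

# Hypothetical test function f(x) = x_1^2 + 2 x_2^2 with a steepest descent direction
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]
x0 = [1.0, 1.0]
p0 = [-gi for gi in grad(x0)]

print(satisfies_wolfe(f, grad, x0, p0, 0.25))   # moderate step
print(satisfies_wolfe(f, grad, x0, p0, 1.0))    # too long: f increases
print(satisfies_wolfe(f, grad, x0, p0, 1e-4))   # too short: curvature condition fails
```

The first inequality rejects steps that are too long, and the second rejects steps that are too short, so both are needed to bracket acceptable step lengths.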

Let m_ij and h_ij denote the (i, j)th entries of M(x) and H(x), respectively. The sparse pattern P of M(x) is a set of index pairs (i, j) at which m_ij ≠ 0. With a given P, the incomplete Hessian matrix M(x) is defined by

m_{i j} (x) = \begin{cases} h_{i j} (x), & \text{for } (i, j) \in P, \\ 0, & \text{otherwise} . \end{cases}

Clearly, the selection of P depends on the problem to be solved and the capacity of the computer used for the implementation. In the extreme cases, we can set P = P_f or P = P_d, where

P_{d} = {(i, i) ∣ i = 1, 2, \dots, n} and P_{f} = {(i, j) ∣ i, j = 1, 2, \dots, n} .

(6)

Obviously, the matrices M(x) with P_f and P_d are respectively the original Hessian matrix H(x) and the diagonal matrix with h₁₁(x), h₂₂(x), …, and h_nn(x) as the diagonal entries. Hence, the IHN with P_f returns to the classic Newton method.
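The definition of M(x) and the iteration (2)-(3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it uses the diagonal pattern P_d so that (3) can be solved entrywise, it takes the full step α_k = 1 instead of a Wolfe line search, and the quadratic test problem (matrix `A`, vector `b`) is a hypothetical example.

```python
def incomplete_hessian(H, pattern):
    """Keep h_ij for (i, j) in the sparse pattern P and zero out the rest."""
    n = len(H)
    return [[H[i][j] if (i, j) in pattern else 0.0 for j in range(n)]
            for i in range(n)]

def ihn_diagonal(grad, hess, x, tol=1e-10, max_iter=200):
    """IHN with pattern P_d and step length 1 (no line search, for brevity)."""
    n = len(x)
    P_d = {(i, i) for i in range(n)}
    for k in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            return x, k
        M = incomplete_hessian(hess(x), P_d)
        p = [-g[i] / M[i][i] for i in range(n)]  # solves M p = -g for diagonal M
        x = [x[i] + p[i] for i in range(n)]
    return x, max_iter

# Hypothetical strictly convex quadratic f(x) = 0.5 x^T A x - b^T x
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
grad = lambda x: [sum(A[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
hess = lambda x: A
x_star, iters = ihn_diagonal(grad, hess, [0.0, 0.0, 0.0])
```

For a quadratic objective this special case coincides with the Jacobi iteration for the linear system Ax = b, which is why the splitting theory cited in Sect. 3 applies to it.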

3 The convergence analysis of IHN

Let ∥ · ∥ be the 2-norm of a vector/matrix. For clarity, we sometimes write f(x_k), g(x_k), H(x_k), M(x_k), g(x*), H(x*), and M(x*) as f_k, g_k, H_k, M_k, g*, $H_{*}$ , and $M_{*}$ , respectively. Following the general quasi-Newton theory (e.g., see pp. 43–45 in [6]), we have the global convergence theorem for IHN as below.

Theorem 1

Let {x_k} be a sequence of IHN iterates defined in (2). If there exists a positive constant η such that

‖ M_{k} ‖ ‖ M_{k}^{- 1} ‖ \leq η for all k,

(7)

then $\lim_{k \to \infty} ‖ g (x_{k}) ‖ = 0$ for any $x_{0} \in D$.

To discuss the convergence rates of IHN, we make Assumptions 1 and 2.

Assumption 1

The IHN iterative sequence {x_k} converges to $x^{*} \in D$ for any $x_{0} \in D$ . Here g(x*) = 0 and both M(x*) and H(x*) are positive definite.

Assumption 2

There exists a positive integer, k₀, such that Wolfe condition (4) is satisfied with α_k = 1 for all k ≥ k₀.

The following two definitions will be used in our IHN analysis.

Definition 1

The convergence rate of {x_k} is R-linear if there exists a number r between 0 and 1 such that $\limsup_{k \to \infty} {‖ x_{k} - x^{*} ‖}^{1 ∕ k} = r$.

Definition 2

The convergence rate of {x_k} is Q-linear if there exist a constant 0 < c < 1 and a positive integer k₀ such that $‖ x_{k + 1} - x^{*} ‖ \leq c ‖ x_{k} - x^{*} ‖$ whenever k ≥ k₀.

Using arguments similar to those in [9] and in the proof of Theorem 6.1 in [4], we can prove the R-linear convergence of IHN.

Theorem 2

If Assumption 1 and condition (7) hold, then the IHN iterative sequence {x_k} is R-linearly convergent.

The following corollary gives an easy-to-check sufficient condition for the IHN using the sparse pattern P_d defined in (6).

Corollary 1

Let Assumption 1 hold and the sparse pattern of incomplete Hessian matrix M_k be given with P_d in (6). If the diagonal elements of H_k are positive and bounded, then IHN is convergent both globally and R-linearly.

Proof

We only need to prove that (7) is satisfied. Let $h_{i i}^{(k)}$ for i = 1, …, n be the diagonal elements of H_k. By the assumption, there exist two positive constants b₁ and b₂ such that $b_{1} \leq h_{i i}^{(k)} \leq b_{2}$ for all i and k. Thus,

‖ M_{k} ‖ ‖ M_{k}^{- 1} ‖ = \frac{\max_{1 \leq i \leq n} ∣ h_{i i}^{(k)} ∣}{\min_{1 \leq i \leq n} ∣ h_{i i}^{(k)} ∣} \leq \frac{b_{2}}{b_{1}} .

Hence, the proof follows from Theorems 1 and 2.

To discuss the Q-linear rate of convergence, we need Lemmas 1 and 2.

Lemma 1

If Assumption 1 holds, then for any η > 0, there exists ∊ > 0 such that H(x) and M(x) are positive definite, ∥M(x)⁻¹ − M(x*)⁻¹∥ < η, and ∥g(x) − g(x*) − H(x*)(x − x*)∥ < η ∥x − x*∥ whenever ∥x − x*∥ ≤ ∊.

The above lemma can be proved by similar arguments to the ones from Sects. 2.3.3 and 3.1.6 in [8].

Lemma 2

If H and M are symmetric, positive definite, then

λ_{\min} (M^{- 1} H) - 1 \leq \frac{x^{T} (H - M) x}{x^{T} M x} \leq λ_{\max} (M^{- 1} H) - 1, \forall x \neq 0,

(8)

where λ_min(M⁻¹H) and λ_max(M⁻¹H) denote the smallest and largest eigenvalues of M⁻¹H, respectively.

Proof

Let $M^{1 ∕ 2}$ be the square root matrix of M. Set $\overset{‒}{H} = M^{- 1 ∕ 2} H M^{- 1 ∕ 2}$ and $y = M^{1 ∕ 2} x$ for any nonzero $x \in D$. It is easy to show that $x^{T} H x = y^{T} \overset{‒}{H} y$, $y^{T} y = x^{T} M x$, and $\overset{‒}{H}$ is similar to M⁻¹H. Hence,

λ_{\min} (M^{- 1} H) y^{T} y \leq y^{T} \overset{‒}{H} y \leq λ_{\max} (M^{- 1} H) y^{T} y,

from which it is then easy to obtain (8).

Since $M_{*}$ is assumed to be symmetric, positive definite, its square root matrix $M_{*}^{1 ∕ 2}$ exists and satisfies that $M_{*} = M_{*}^{1 ∕ 2} M_{*}^{1 ∕ 2}$ . Thus, a new norm, ${‖ \cdot ‖}_{*}$ , can be well defined by

{‖ x ‖}_{*} = ‖ M_{*}^{1 ∕ 2} x ‖, \forall x \neq 0 .

(9)

Under this new norm, we obtain the Q-linear convergence rate for IHN in Theorem 3. Here the spectral radius of a matrix, say A, is denoted by ρ(A), which is defined as the largest of the moduli of the eigenvalues of A.

Theorem 3

Let Assumptions 1 and 2 hold. If

ρ (M_{*}^{- 1} H_{*}) < 2,

(10)

then the IHN iterative sequence {x_k} has a Q-linear rate of convergence under the norm ${‖ \cdot ‖}_{*}$ defined in (9).

Proof

Set ${\overset{‒}{H}}_{*} = M_{*}^{- 1 ∕ 2} H_{*} M_{*}^{- 1 ∕ 2}$, and denote by λ_j the jth eigenvalue of ${\overset{‒}{H}}_{*}$ for j = 1, 2, …, n. Clearly, ${\overset{‒}{H}}_{*}$ is symmetric, positive definite because of Assumption 1, so all its eigenvalues are positive. Further, ${\overset{‒}{H}}_{*}$ is similar to $M_{*}^{- 1} H_{*}$; thus, $M_{*}^{- 1} H_{*}$ and ${\overset{‒}{H}}_{*}$ have the same eigenvalues. Hence, $I - M_{*}^{- 1} H_{*}$ has eigenvalues 1 − λ_j for j = 1, 2, …, n, and from (10) it follows that 0 < λ_j < 2, that is, ∣1 − λ_j∣ < 1 for j = 1, 2, …, n. Therefore, $ρ (I - M_{*}^{- 1} H_{*}) = \max_{1 \leq i \leq n} ∣ 1 - λ_{i} ∣ < 1$, and we can select a positive real number t < 1 such that $ρ (I - M_{*}^{- 1} H_{*}) \leq t$.

We next set $E^{*} = M_{*} - H_{*}$ and ${\overset{‒}{E}}^{*} = M_{*}^{- 1 ∕ 2} E^{*} M_{*}^{- 1 ∕ 2}$. It is easy to see that ${\overset{‒}{E}}^{*}$ is symmetric, and similar to $I - M_{*}^{- 1} H_{*}$. Hence, $‖ {\overset{‒}{E}}^{*} ‖ = ρ ({\overset{‒}{E}}^{*}) = ρ (I - M_{*}^{- 1} H_{*})$.

With (2) and Assumption 1, we can get that

\begin{aligned} M_{*}^{1 ∕ 2} (x_{k + 1} - x^{*}) &= M_{*}^{1 ∕ 2} (x_{k} - M_{k}^{- 1} g_{k} - x^{*}) = M_{*}^{1 ∕ 2} (- M_{k}^{- 1} (g_{k} - g^{*}) + (x_{k} - x^{*})) \\ &= - M_{*}^{1 ∕ 2} M_{k}^{- 1} (g_{k} - g^{*} - H_{*} (x_{k} - x^{*})) - M_{*}^{1 ∕ 2} (M_{k}^{- 1} - M_{*}^{- 1}) H_{*} (x_{k} - x^{*}) \\ &\quad + M_{*}^{- 1 ∕ 2} E^{*} (x_{k} - x^{*}) . \end{aligned}

(11)

The last term of the above expression can then be estimated as below:

\begin{aligned} ‖ M_{*}^{- 1 ∕ 2} E^{*} (x^{*} - x_{k}) ‖ &= ‖ {\overset{‒}{E}}^{*} M_{*}^{1 ∕ 2} (x^{*} - x_{k}) ‖ \leq ‖ {\overset{‒}{E}}^{*} ‖ {‖ x^{*} - x_{k} ‖}_{*} = ρ ({\overset{‒}{E}}^{*}) {‖ x^{*} - x_{k} ‖}_{*} \\ &= ρ (I - M_{*}^{- 1} H_{*}) {‖ x^{*} - x_{k} ‖}_{*} \leq t {‖ x^{*} - x_{k} ‖}_{*} . \end{aligned}

(12)

To estimate the other two terms of (11), we set $η = \frac{1}{2} (1 + t)$ , and

μ = \max {‖ H_{*} ‖, ‖ M_{*} ‖, ‖ M_{*}^{- 1} ‖, ‖ M_{*}^{1 ∕ 2} ‖, ‖ M_{*}^{- 1 ∕ 2} ‖} .

(13)

Obviously, t < η < 1, and there exists a sufficiently small positive number, γ, satisfying

t + μ^{2} (2 μ + γ) γ \leq η .

(14)

For the above γ > 0, using the continuity of H(x) and M(x) at x* and similar arguments in the proof of Lemma 1, we can show that there exists positive number ∊ such that

‖ M {(y)}^{- 1} - M {(x^{*})}^{- 1} ‖ \leq γ,

(15)

and

‖ M_{*}^{1 ∕ 2} (g (y) - g (x^{*}) - H (x^{*}) (y - x^{*})) ‖ \leq γ ‖ M_{*}^{1 ∕ 2} (y - x^{*}) ‖

(16)

whenever ∥y − x*∥ ≤ ∊.

Furthermore, from the definition of the convergence of the sequence {x_k} it follows that there exists an integer k₁ > 0 such that ∥x_k − x*∥ ≤ ∊ for k ≥ k₁. Hence, by (13) and (15), we obtain

‖ M_{*}^{1 ∕ 2} (M_{k}^{- 1} - M_{*}^{- 1}) H_{*} M_{*}^{- 1 ∕ 2} ‖ \leq γ μ^{3},

and

‖ M_{k}^{- 1} ‖ \leq ‖ M_{*}^{- 1} ‖ + ‖ M_{k}^{- 1} - M_{*}^{- 1} ‖ \leq μ + γ for k \geq k_{1} .

Finally, applying the above two expressions, (12), (14) and (16) to (11) gives

\begin{aligned} {‖ x_{k + 1} - x^{*} ‖}_{*} &= ‖ M_{*}^{1 ∕ 2} (x_{k + 1} - x^{*}) ‖ \\ &\leq ‖ M_{*}^{1 ∕ 2} M_{k}^{- 1} (g_{k} - g^{*} - H_{*} (x_{k} - x^{*})) ‖ + ‖ M_{*}^{1 ∕ 2} (M_{k}^{- 1} - M_{*}^{- 1}) H_{*} (x_{k} - x^{*}) ‖ \\ &\quad + ‖ M_{*}^{- 1 ∕ 2} E^{*} (x_{k} - x^{*}) ‖ \\ &\leq (γ μ^{2} (2 μ + γ) + t) ‖ M_{*}^{1 ∕ 2} (x_{k} - x^{*}) ‖ \\ &\leq η ‖ M_{*}^{1 ∕ 2} (x_{k} - x^{*}) ‖ = η {‖ x_{k} - x^{*} ‖}_{*} \end{aligned}

(17)

whenever k ≥ k₁. This completes the proof of Theorem 3.

From the above proof we see that the condition $ρ (I - M_{*}^{- 1} H_{*}) < 1$ follows from (10), and is the only consequence of (10) used in the remaining part of the proof. Thus, we have the following corollary.

Corollary 2

Let Assumptions 1 and 2 hold. If

ρ (I - M_{*}^{- 1} H_{*}) < 1,

(18)

then IHN has a Q-linear rate of convergence under the norm ${‖ \cdot ‖}_{*}$ .

According to the theory of linear stationary iterative methods [15], (18) gives a sufficient and necessary condition to guarantee the convergence of an iterative method constructed by the matrix splitting $H_{*} = M_{*} - N_{*}$ with $N_{*} = M_{*} - H_{*}$ for solving the linear system with coefficient matrix $H_{*}$ . Thus, (18) is a natural condition to construct an efficient IHN.
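Condition (18) is easy to evaluate for small matrices. The Python sketch below computes ρ(I − M⁻¹H) for the diagonal incomplete Hessian M = diag(H) of two hypothetical 2 × 2 symmetric positive definite matrices; the matrices and function names are assumptions made for this example only.

```python
import cmath

def spectral_radius_2x2(B):
    """Largest modulus of the eigenvalues of a 2x2 matrix B."""
    tr = B[0][0] + B[1][1]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))

def iteration_matrix_diag(H):
    """I - M^{-1} H for the diagonal incomplete Hessian M = diag(H), 2x2 case."""
    return [[(1.0 if i == j else 0.0) - H[i][j] / H[i][i] for j in range(2)]
            for i in range(2)]

H_weak = [[2.0, 1.0], [1.0, 2.0]]      # weaker off-diagonal coupling
H_strong = [[1.0, 1.5], [1.5, 4.0]]    # stronger off-diagonal coupling
rho_weak = spectral_radius_2x2(iteration_matrix_diag(H_weak))      # 0.5
rho_strong = spectral_radius_2x2(iteration_matrix_diag(H_strong))  # 0.75
```

Both values are below one, so (18) holds in both cases. The stronger the discarded off-diagonal coupling, the closer the spectral radius is to one, and the slower the linear rate that can be expected from the estimate (17).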

Condition (10) in Theorem 3 depends on the solution x*, which makes it difficult to verify. To improve on it, we propose in the following corollary a sufficient condition that depends only on the current iterate.

Corollary 3

Let Assumptions 1 and 2 hold. If H_k is positive definite and there exist positive constant η < 2 and positive integer k₁ such that

‖ M_{k}^{- 1} H_{k} ‖ \leq η, \forall k > k_{1},

(19)

then IHN has a Q-linear rate of convergence under the norm ${‖ \cdot ‖}_{*}$ .

Proof

From the property of matrix norm and the assumptions it follows that

ρ (M_{k}^{- 1} H_{k}) \leq ‖ M_{k}^{- 1} H_{k} ‖ \leq η for all k > k_{1} .

Letting k → ∞ in the above expression immediately gives condition (10). Thus, the result follows from Theorem 3.

A sufficient condition is given in the following theorem to guarantee that IHN satisfies Assumption 2.

Theorem 4

Let Assumption 1 hold. If the eigenvalues $λ_{i}^{(k)}$ for i = 1, …, n of $M_{k}^{- 1} H_{k}$ have lower and upper bounds λ_l and λ_u and there exists positive integer k₁ such that

1 - c_{2} < λ_{l} \leq λ_{i}^{(k)} \leq λ_{u} < 2 - 2 c_{1} for i = 1, \dots, n and k \geq k_{1},

(20)

where c₁ and c₂ are the two constants in the Wolfe conditions of (4), then Assumption 2 holds for IHN.

Proof

By Taylor expansion, we can get that

\begin{aligned} & f (x_{k} + p_{k}) - f (x_{k}) - c_{1} g_{k}^{T} p_{k} \\ &\quad = (\frac{1}{2} - c_{1}) g_{k}^{T} p_{k} + \frac{1}{2} p_{k}^{T} (g_{k} + H_{k} p_{k}) + \frac{1}{2} p_{k}^{T} (H (u_{k}) - H_{k}) p_{k}, \end{aligned}

(21)

where u_k = x_k + t_kp_k with 0 ≤ t_k ≤ 1. Set

r = \frac{1}{2} λ_{\min} (M (x^{*})), and ω = \frac{r}{2} \min {2 - 2 c_{1} - λ_{u}, λ_{l} + c_{2} - 1} .

(22)

From Assumption 1 and (20) it can be shown that both r and ω are positive. Thus, an upper bound for λ_u − 1 and a lower bound for λ_l − 1 can be estimated in terms of r and ω as below:

\begin{aligned} λ_{u} - 1 &\leq \frac{λ_{u} + 2 - 2 c_{1}}{2} - 1 = 1 - 2 c_{1} - \frac{1}{2} (2 - 2 c_{1} - λ_{u}) \\ &\leq 1 - 2 c_{1} - \frac{ω}{r}, \end{aligned}

(23)

and

\begin{aligned} λ_{l} - 1 &\geq \frac{λ_{l} + 1 - c_{2}}{2} - 1 = - c_{2} + \frac{1}{2} (λ_{l} + c_{2} - 1) \\ &\geq - c_{2} + \frac{ω}{r} . \end{aligned}

(24)

Further, by the continuity of H, we can select ∊ > 0 such that

‖ H (z) - H (x) ‖ \leq ω

(25)

whenever ∥x − x*∥ ≤ ∊ and ∥z − x*∥ ≤ ∊. Based on Assumption 1, we can also find positive integer k₂ ≥ k₁ such that

λ_{\min} (M_{k}) \geq r, ‖ x_{k} - x^{*} ‖ \leq ∊, ‖ x_{k} + p_{k} - x^{*} ‖ \leq ∊,

(26)

and H(x_k) is positive definite for all k ≥ k₂. As a result of the last two inequalities of (26),

‖ u_{k} - x^{*} ‖ = ‖ t_{k} (x_{k} + p_{k} - x^{*}) + (1 - t_{k}) (x_{k} - x^{*}) ‖ \leq ∊ .

Hence, with (25) and the second inequality of (26), we can get that

∣ p_{k}^{T} (H (u_{k}) - H_{k}) p_{k} ∣ \leq ω {‖ p_{k} ‖}^{2}, \forall k \geq k_{2} .

(27)

Also, by (3), (8) and (23), the term $p_{k}^{T} (g_{k} + H_{k} p_{k})$ of (21) is estimated as

\begin{aligned} p_{k}^{T} (g_{k} + H_{k} p_{k}) &= p_{k}^{T} (H_{k} - M_{k}) p_{k} \leq (λ_{u} - 1) p_{k}^{T} M_{k} p_{k} \\ &\leq (1 - 2 c_{1} - \frac{ω}{r}) p_{k}^{T} M_{k} p_{k} . \end{aligned}

(28)

Therefore, applying (3), the first inequality of (26), (27), and (28) to (21) gives

\begin{aligned} & f (x_{k} + p_{k}) - f (x_{k}) - c_{1} g_{k}^{T} p_{k} \\ &\quad \leq - (\frac{1}{2} - c_{1}) p_{k}^{T} M_{k} p_{k} + (\frac{1}{2} - c_{1} - \frac{ω}{2 r}) p_{k}^{T} M_{k} p_{k} + \frac{1}{2} ω {‖ p_{k} ‖}^{2} \\ &\quad = - \frac{ω}{2 r} p_{k}^{T} M_{k} p_{k} + \frac{1}{2} ω {‖ p_{k} ‖}^{2} \\ &\quad \leq - \frac{1}{2} ω {‖ p_{k} ‖}^{2} + \frac{1}{2} ω {‖ p_{k} ‖}^{2} = 0, \quad \forall k \geq k_{2} . \end{aligned}

(29)

This completes the proof of the first inequality of (4) for Assumption 2.

We next prove the second inequality of (4) for Assumption 2.

By the mean value theorem and (3),

\begin{aligned} g {(x_{k} + p_{k})}^{T} p_{k} &= g_{k}^{T} p_{k} + p_{k}^{T} H_{k} p_{k} + p_{k}^{T} (H (v_{k}) - H_{k}) p_{k} \\ &= p_{k}^{T} (H_{k} - M_{k}) p_{k} + p_{k}^{T} (H (v_{k}) - H_{k}) p_{k}, \end{aligned}

(30)

where $v_{k} = x_{k} + t_{k}^{'} p_{k}$ with $0 \leq t_{k}^{'} \leq 1$ . Similar to (27), we obtain

∣ p_{k}^{T} (H (v_{k}) - H_{k}) p_{k} ∣ \leq ω {‖ p_{k} ‖}^{2}, \forall k \geq k_{2} .

(31)

A combination of (8) with (20) and (24) gives

p_{k}^{T} (H_{k} - M_{k}) p_{k} \geq (λ_{l} - 1) p_{k}^{T} M_{k} p_{k} \geq (- c_{2} + \frac{ω}{r}) p_{k}^{T} M_{k} p_{k} .

(32)

Thus, by (3), (26), (30), (31), and (32),

\begin{aligned} & g {(x_{k} + p_{k})}^{T} p_{k} - c_{2} g_{k}^{T} p_{k} \\ &\quad = c_{2} p_{k}^{T} M_{k} p_{k} + p_{k}^{T} (H_{k} - M_{k}) p_{k} + p_{k}^{T} (H (v_{k}) - H_{k}) p_{k} \\ &\quad \geq c_{2} p_{k}^{T} M_{k} p_{k} + (- c_{2} + \frac{ω}{r}) p_{k}^{T} M_{k} p_{k} - ω {‖ p_{k} ‖}^{2} \\ &\quad = \frac{ω}{r} p_{k}^{T} M_{k} p_{k} - ω {‖ p_{k} ‖}^{2} \geq 0 \quad \text{for } k \geq k_{2} . \end{aligned}

This completes the proof of Theorem 4.
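To make condition (20) concrete: with c₁ = 0.01 and c₂ = 0.9, the values used for the line search in Sect. 5, it requires the eigenvalues of $M_{k}^{- 1} H_{k}$ to lie strictly inside the interval (0.1, 1.98). The small Python check below is only an illustration; the function name and the sample eigenvalue lists are hypothetical.

```python
def unit_step_condition(eigs, c1=0.01, c2=0.9):
    """True if all eigenvalues of M_k^{-1} H_k lie strictly inside (1 - c2, 2 - 2*c1)."""
    return all(1.0 - c2 < lam < 2.0 - 2.0 * c1 for lam in eigs)

print(unit_step_condition([0.5, 1.5]))    # inside (0.1, 1.98)
print(unit_step_condition([0.05, 1.0]))   # smallest eigenvalue too close to zero
print(unit_step_condition([1.0, 1.99]))   # largest eigenvalue too close to two
```

Since the eigenvalues of $M_{k}^{- 1} H_{k}$ all equal one when M_k = H_k, the interval shows how far the incomplete Hessian may deviate from the full Hessian before the unit step is no longer guaranteed by Theorem 4.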

4 The IHN and T-IHN methods for chemical database problem

As an application, we construct IHN and T-IHN for solving the chemical database optimal projection mapping problem described in [10, 13, 14]. The key step is the construction of the incomplete Hessian matrix M_k, which is presented in detail in this section.

With the notation of [13], we recall the database problem as below.

Suppose a chemical database of n members is characterized by an n × m matrix X:

X = {(X_{1}, X_{2}, \dots, X_{n})}^{T} = [\begin{matrix} x_{11} & x_{12} & \dots & x_{1 m} \\ x_{21} & x_{22} & \dots & x_{2 m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ x_{n 1} & x_{n 2} & \dots & x_{n m} \end{matrix}],

(33)

where X_i = (x_i1, x_i2, …, x_im)^T stands for the ith member of the database, x_ij denotes the value of the jth chemical descriptor for the ith member, and the distance δ_ij = ∥X_i − X_j∥ measures the similarity of the ith member with the jth member. A key step in the efficient visualization protocol proposed in [13, 14] is to solve the following unconstrained minimization problem:

E (Y_{1}^{*}, Y_{2}^{*}, \dots, Y_{n}^{*}) = \min_{\forall Y_{i} \in R^{l}} E (Y_{1}, Y_{2}, \dots, Y_{n}),

(34)

where Y_i = (y_i1, y_i2, …, y_il)^T, l is a positive integer less than m, and the objective function E is defined by

E (Y_{1}, Y_{2}, \dots, Y_{n}) = \frac{1}{4} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} ω_{i j} {∣ d {(Y_{i}, Y_{j})}^{2} - δ_{i j}^{2} ∣}^{2} .

(35)

Here $d {(Y_{i}, Y_{j})}^{2} = \sum_{k = 1}^{l} {(y_{i k} - y_{j k})}^{2}$ , and ω_ij is the weight constant, which is set as $ω_{i j} = 1 ∕ δ_{i j}^{4}$ if $δ_{i j}^{4} \geq η$ and ω_ij = 1 if $δ_{i j}^{4} < η$ . The parameter η is a small positive number such as 10⁻¹². In [13, 14], the above minimization problem was solved by D-TN with an initial guess generated from SVD/PCA (singular value decomposition or principal component analysis) [2, 3].
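For illustration, the objective (35) with the weights described above can be evaluated as in the following Python sketch. The function names and the three-member toy dataset are assumptions made for this example and do not come from the paper's programs.

```python
import math

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def weight(delta, eta=1e-12):
    """Weight from the text: 1 / delta^4 if delta^4 >= eta, else 1."""
    d4 = delta ** 4
    return 1.0 / d4 if d4 >= eta else 1.0

def objective(Y, X, eta=1e-12):
    """E(Y_1, ..., Y_n) of (35); Y holds the l-dimensional images, X the m-dimensional members."""
    n = len(X)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            delta = math.sqrt(sq_dist(X[i], X[j]))
            r = sq_dist(Y[i], Y[j]) - delta ** 2
            total += weight(delta, eta) * r * r
    return 0.25 * total

# Hypothetical toy data: three members with m = 3 descriptors, projected to l = 2
X = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 2.0]]
Y_exact = [[0.0, 0.0], [5.0, 0.0], [0.0, 2.0]]   # preserves all pairwise distances
Y_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # collapses all members to one point
```

With these data, `objective(Y_exact, X)` vanishes up to rounding, while `objective(Y_zero, X)` equals 3/4 because each of the three pairs contributes a weighted squared residual of one.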

Set Y = (Y₁, Y₂, …, Y_n)^T. To construct the incomplete Hessian matrix M(Y), we first find the Hessian H(Y) of E(Y) as below:

H (Y) = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} H_{i j} (Y),

(36)

where H_ij(Y) denotes the second derivative of the term $\frac{ω_{i j}}{4} {[d {(Y_{i}, Y_{j})}^{2} - δ_{i j}^{2}]}^{2}$ with respect to Y. We express H_ij(Y) as an n × n block matrix with each block entry, h_μν, being an l × l matrix. It is easy to find that

h_{μ ν} = \begin{cases} Π_{i j} & \text{if } μ = ν = i \text{ or } μ = ν = j, \\ - Π_{i j} & \text{if } μ = i, ν = j \text{ or } μ = j, ν = i, \\ 0 & \text{otherwise}, \end{cases}

where Π_ij denotes an l × l matrix defined by

Π_{i j} = ω_{i j} (r_{i j} I_{l} + 2 R_{i j} R_{i j}^{T}) .

Here $r_{i j} = d {(Y_{i}, Y_{j})}^{2} - δ_{i j}^{2}$ , R_ij = Y_i − Y_j, and I_l is the l × l identity matrix. Clearly, H_ij is sparse with only four nonzero block entries, h_ii, h_jj, h_ij, and h_ji, where h_ii = h_jj and h_ij = h_ji. In terms of Kronecker product ⊗,¹ H_ij(Y) can be expressed as

H_{i j} (Y) = (e_{i} e_{i}^{T} - e_{i} e_{j}^{T} - e_{j} e_{i}^{T} + e_{j} e_{j}^{T}) \otimes Π_{i j},

where e_i denotes the ith standard unit vector of Rⁿ. Thus, if we define Π_ii = 0, then the Hessian H(Y) can be written as

\begin{aligned} H (Y) &= \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} (e_{i} e_{i}^{T} - e_{i} e_{j}^{T} - e_{j} e_{i}^{T} + e_{j} e_{j}^{T}) \otimes Π_{i j} \\ &= \sum_{i = 1}^{n} (e_{i} e_{i}^{T} \otimes \sum_{j = 1}^{n} Π_{i j}) - \sum_{i < j} (e_{i} e_{j}^{T} + e_{j} e_{i}^{T}) \otimes Π_{i j} . \end{aligned}

(37)

Clearly, the first and second terms of (37) give the main block diagonal part and the off-diagonal part of H(Y), respectively. From the above expression it is easy to see that H(Y) is a dense N × N matrix with N = nl.
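The block structure in (37) can be assembled directly from the l × l matrices Π_ij. The Python sketch below stores H(Y) as a dense list of lists, which is only practical for small n; the function names and the tables `delta` and `w` of pairwise distances and weights are assumptions made for this illustration.

```python
def pi_block(Yi, Yj, delta, w):
    """Pi_ij = w_ij (r_ij I_l + 2 R_ij R_ij^T) with R_ij = Y_i - Y_j."""
    l = len(Yi)
    R = [a - b for a, b in zip(Yi, Yj)]
    r = sum(c * c for c in R) - delta ** 2
    return [[w * ((r if a == b else 0.0) + 2.0 * R[a] * R[b]) for b in range(l)]
            for a in range(l)]

def assemble_hessian(Y, delta, w):
    """Dense N x N Hessian (N = n*l) following (37); delta and w are n x n tables."""
    n, l = len(Y), len(Y[0])
    H = [[0.0] * (n * l) for _ in range(n * l)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            P = pi_block(Y[i], Y[j], delta[i][j], w[i][j])
            for a in range(l):
                for b in range(l):
                    H[i * l + a][i * l + b] += P[a][b]   # diagonal block (i, i)
                    H[j * l + a][j * l + b] += P[a][b]   # diagonal block (j, j)
                    H[i * l + a][j * l + b] -= P[a][b]   # off-diagonal block (i, j)
                    H[j * l + a][i * l + b] -= P[a][b]   # off-diagonal block (j, i)
    return H

# Hypothetical toy configuration with n = 3 members and l = 2
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
delta = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
w = [[1.0] * 3 for _ in range(3)]
H = assemble_hessian(Y, delta, w)
```

A useful sanity check is that the blocks in every block row sum to zero, which reflects the fact that E depends only on the differences Y_i − Y_j.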

Using a distance cutoff strategy, we define the incomplete Hessian matrix M(Y) as follows: With a given cutoff radius, τ > 0, we construct the sparse pattern P by

P = {(i, j) ∣ ‖ X_{i} - X_{j} ‖ \leq τ for i, j = 1, 2, \dots, n} .

We then define M(Y) by

M (Y) = \sum_{i = 1}^{n} (e_{i} e_{i}^{T} \otimes \sum_{j = 1}^{n} Π_{i j}) - \sum_{(i, j) \in P} (e_{i} e_{j}^{T}) \otimes Π_{i j} .

(38)

Clearly, the above matrix M(Y) is symmetric due to the symmetry of the submatrix Π_ij and the definition of P. Hence, in a computer implementation, we need only evaluate and store the upper triangular part of M(Y), which reduces the costs of computing and storage.

To properly control the sparsity of M(Y), we propose to select a value of the cutoff radius τ by the formula

τ = ξ {[\frac{2}{n (n - 1)} \sum_{i = 1}^{n} \sum_{j = i + 1}^{n} δ_{i j}^{2}]}^{1 ∕ 2},

(39)

where ξ is an adjusting factor for the sparsity of M(Y). If ξ = 1, τ gives a mean value of the distances between each pair of the database members. In other words, about half of the entries of M(Y) may be zero. As the value of ξ is reduced to zero, M(Y) becomes the block diagonal matrix M_d(Y):

M_{d} (Y) = \sum_{i = 1}^{n} (e_{i} e_{i}^{T} \otimes \sum_{j = 1}^{n} Π_{i j}) .

(40)
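The cutoff radius (39) and the resulting sparse pattern P can be computed as in the following Python sketch. As written in (39), τ is ξ times the square root of the mean squared pairwise distance. The function names and the small distance table are hypothetical and serve only as an example.

```python
import math

def cutoff_radius(delta, xi):
    """tau of (39): xi times the root of the mean squared pairwise distance."""
    n = len(delta)
    s = sum(delta[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
    return xi * math.sqrt(2.0 * s / (n * (n - 1)))

def sparse_pattern(delta, tau):
    """Index pairs (i, j) with delta_ij <= tau, following the definition of P."""
    n = len(delta)
    return {(i, j) for i in range(n) for j in range(n) if delta[i][j] <= tau}

# Hypothetical pairwise distances for n = 3 members
delta = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
tau = cutoff_radius(delta, 1.0)        # sqrt(3), about 1.732
P = sparse_pattern(delta, tau)         # keeps (0, 1) and (1, 0) plus the diagonal pairs
P_zero = sparse_pattern(delta, 0.0)    # only the diagonal pairs remain
```

The pairs (i, i) always belong to P, but they contribute nothing to the second sum of (38) because Π_ii = 0, so a cutoff radius of zero leaves exactly the block diagonal matrix M_d(Y) of (40).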

From the block Jacobi convergence theory (see p. 111 in [15], for example) it follows that ρ(I − M_d(Y*)⁻¹H(Y*)) < 1 if H(Y*) and 2M_d(Y*) − H(Y*) are positive definite and all the main diagonal block entries of H(Y*) are symmetric, positive definite. Hence, by Corollary 2, it is claimed that the IHN method with the incomplete Hessian of (40) is Q-linearly convergent.

With the incomplete Hessian M(Y) of (38), we obtain IHN for solving the database problem (34).

We next describe T-IHN for solving the database problem (34).

The T-IHN iterative sequence {x_k} is defined in the same form as the one in (2) except that the descent search direction p_k is selected as either an iterate of the preconditioned conjugate gradient method (PCG) [2] for solving (3) or −g_k (in the worst case) according to the truncated Newton strategy given in [12]. For clarity, the scheme for generating p_k is presented in Algorithm 1. The initial iterate x₀ of T-IHN is generated by using the SVD/PCA scheme given in [13].

Algorithm 1

(Defining the descent search direction p_k for T-IHN)

Let B_k be a preconditioner for M_k, and w_j represent the jth PCG iterate for solving (3). Set η_k = min{c_k/k, ∥g_k∥}, and give ∊ > 0 and IT_PCG > 0 (e.g., c_k = 0.5, ∊ = 10⁻⁶ and IT_PCG = 80). The kth descent search direction p_k of the T-IHN method is selected by the following steps:

1. [INITIALIZATION] Set j = 1, w₁ = 0, r₁ = −g_k, and d₁ = z₁, where z₁ solves the linear system B_kz₁ = r₁.
2. [SINGULARITY TEST] If either $∣ r_{j}^{T} z_{j} ∣ \leq δ$ or $∣ d_{j}^{T} M_{k} d_{j} ∣ \leq δ$ (e.g., δ = 10⁻¹⁰), exit the algorithm with p_k = w_j (for j = 1, set p_k = −g_k). Otherwise, compute $α_{j} = r_{j}^{T} z_{j} ∕ d_{j}^{T} M_{k} d_{j}$ and $w_{j + 1} = w_{j} + α_{j} d_{j}$.
3. [DESCENT DIRECTION TEST] If $g_{k}^{T} w_{j + 1} \geq g_{k}^{T} w_{j} + δ$, exit the algorithm with p_k = w_j (for j = 1, set p_k = −g_k). Otherwise, compute $r_{j + 1} = r_{j} - α_{j} M_{k} d_{j}$.
4. [TRUNCATION TEST] If $‖ r_{j + 1} ‖ \leq η_{k} ‖ g_{k} ‖$ or j + 1 > IT_PCG, exit the algorithm with $p_{k} = w_{j + 1}$. Otherwise, compute $β_{j} = r_{j + 1}^{T} z_{j + 1} ∕ r_{j}^{T} z_{j}$ and $d_{j + 1} = z_{j + 1} + β_{j} d_{j}$, where $z_{j + 1}$ solves the linear system $B_{k} z_{j + 1} = r_{j + 1}$.
5. Increase j to j + 1 and go to Step 2.
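The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB package: it takes B_k to be the identity matrix, so that z_j = r_j, stores M_k as a dense list of lists, and assumes the outer iteration index satisfies k ≥ 1 so that c_k/k is defined. The function name `tihn_direction` and its parameters are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def tihn_direction(M, g, k, c=0.5, it_pcg=80, delta=1e-10):
    """Sketch of Algorithm 1 with B_k = I, so z_j = r_j. Returns p_k."""
    n = len(g)
    gnorm = math.sqrt(dot(g, g))
    eta = min(c / k, gnorm)          # eta_k = min{c_k / k, ||g_k||}, assuming k >= 1
    neg_g = [-gi for gi in g]
    w = [0.0] * n                    # w_1 = 0
    r = neg_g[:]                     # r_1 = -g_k
    d = r[:]                         # d_1 = z_1 = r_1
    for j in range(1, it_pcg + 1):
        Md = matvec(M, d)
        rz, dMd = dot(r, r), dot(d, Md)
        # Singularity test
        if abs(rz) <= delta or abs(dMd) <= delta:
            return neg_g if j == 1 else w
        alpha = rz / dMd
        w_next = [wi + alpha * di for wi, di in zip(w, d)]
        # Descent direction test
        if dot(g, w_next) >= dot(g, w) + delta:
            return neg_g if j == 1 else w
        r_next = [ri - alpha * mi for ri, mi in zip(r, Md)]
        # Truncation test
        if math.sqrt(dot(r_next, r_next)) <= eta * gnorm or j + 1 > it_pcg:
            return w_next
        beta = dot(r_next, r_next) / rz
        d = [ri + beta * di for ri, di in zip(r_next, d)]
        w, r = w_next, r_next
    return w

# Hypothetical 2 x 2 examples: one positive definite and one indefinite matrix
p_spd = tihn_direction([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], k=100)
p_indef = tihn_direction([[1.0, 0.0], [0.0, -2.0]], [1.0, 1.0], k=1)
```

In the positive definite example the tight truncation tolerance lets the inner iteration run to the solution of (3), whereas in the indefinite example the descent direction test fails at j = 1 and the sketch falls back to −g_k, as prescribed by the algorithm.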

As shown in [12], a descent search direction is produced from Algorithm 1 even with indefinite M_k or B_k. Thus, T-IHN is a descent search direction method, whose global convergence follows immediately from the general theory of descent methods [6].

The performance of T-IHN can be improved significantly if a good preconditioner B_k can be selected. However, since M_k is often indefinite for this database problem, its preconditioning causes difficulties in both theory and practice. We avoid such difficulties in this paper by simply setting B_k to be an identity matrix.

5 Numerical results

We developed two MATLAB program packages for IHN and T-IHN for solving the database problem (34), respectively. The IHN program package is only used for numerically studying the convergence behaviors of IHN. Thus, we simply store the incomplete Hessian M_k into a full matrix array and solve the related linear systems by a direct solver. In the T-IHN package, we evaluated and stored only the nonzero block entries of the upper triangular part of M_k, and evaluated the product of M_k with a nonzero vector using only the nonzero block entries of M_k. In both IHN and T-IHN packages, the step length α_k was calculated by calling the line search program from the MATLAB library, where we used c₁ = 0.01, c₂ = 0.9, and an initial guess of one (based on the result of Theorem 4). The test data sets were selected from a large chemical database provided by the Medical College of Wisconsin, in which each member consists of one rat’s renal reactions to different physiological and medical experiments.

For comparison, we also solved the database problem (34) by the classic Newton method, SD, BFGS, and D-TN. Here BFGS and SD were implemented by calling the minimization program routine, fminunc, from the MATLAB library with the option HessUpdate set to ‘bfgs’ and ‘steepdesc’, respectively. The other options for BFGS and SD included the scaled-identity matrix as the initial Hessian approximation, and the default mixed cubic and quadratic polynomial line search method, which is the same as the one used in the IHN and T-IHN packages. The D-TN program package was the same as the T-IHN package except that the Hessian-vector product H_kd was approximated by the Euler forward finite difference approximation:

H_{k} d \approx \frac{g (x_{k} + h d) - g (x_{k})}{h},

(41)

where x_k is the kth D-TN iterate, d is a vector, and h is set as

h = \max \left\{ \frac{ς}{\max \{ 10 ς, ‖ d ‖ \}}, 0.1 ς \right\}

with $ς = 2 \sqrt{∊_{m}} (1 + ‖ x_{k} ‖ \sqrt{N})$ , ∊_m = 10⁻¹⁰, and N = l n. The above h is the same as the default setting for D-TN within TNPACK [11]. With (41), each evaluation of H_kd requires one new gradient evaluation. All the tests on SD, BFGS, D-TN, IHN and T-IHN used the same MATLAB program routine we wrote for computing function values E and gradient vectors g_k. They also used the same iteration stopping rule (i.e., ∥g(x_k)∥ < 10⁻⁶), the same initial iterate generated from SVD (except IHN), and the same datasets with m = 9 and l = 2. The numerical experiments were run in MATLAB version R2006a on a laptop computer (Latitude D610 with an Intel Pentium(R) M 1.86 GHz processor and 1 GB RAM) at the University of Wisconsin-Milwaukee.
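The finite difference product (41) and the step size h can be sketched in Python as follows. The sketch follows the formulas for h and ς as printed above. The function names `fd_step` and `hess_vec_fd` are hypothetical, and the gradient at x_k is passed in as `g_x` so that each product costs one new gradient evaluation, as noted in the text.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def fd_step(x, d, eps_m=1e-10):
    """Step size h built from sigma = 2 sqrt(eps_m) (1 + ||x|| sqrt(N))."""
    N = len(x)
    sigma = 2.0 * math.sqrt(eps_m) * (1.0 + norm(x) * math.sqrt(N))
    return max(sigma / max(10.0 * sigma, norm(d)), 0.1 * sigma)

def hess_vec_fd(grad, x, g_x, d):
    """Forward difference approximation of H(x) d, reusing g_x = grad(x)."""
    h = fd_step(x, d)
    g_shift = grad([xi + h * di for xi, di in zip(x, d)])  # one new gradient
    return [(a - b) / h for a, b in zip(g_shift, g_x)]
```

For a quadratic function the gradient is linear, so the forward difference reproduces H d up to rounding error, which gives a simple sanity check of an implementation.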

In the numerical tests on IHN, we used a dataset with n = 80. As required by the IHN analysis, an initial guess x₀ was selected such that all the Hessian and incomplete Hessian matrices H_k and M_k were positive definite (we checked them by evaluating their eigenvalues). Three incomplete Hessian matrices were constructed by using ξ = 0.5, 0.2235 and 0, which gave the cutoff radius τ = 185.26, 82.81 and 0, respectively, according to (39). The three resulting incomplete Hessian matrices had ρ = 44.46%, 23.84% and 1.25%, respectively, where ρ is the percentage of nonzero entries of an N × N sparse matrix defined by

ρ = \frac{100 \, (\text{total number of nonzero entries})}{N^{2}} \% .

Their sparsity patterns are plotted in Fig. 1. From this figure we see that the incomplete Hessian with ρ = 1.25% is a block diagonal matrix with each block being a 2 by 2 matrix.
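As a quick check of the definition of ρ, the block diagonal case can be reproduced in Python. The sketch assumes that every entry of each 2 by 2 diagonal block is nonzero, in which case ρ = 100 · 4n ∕ (2n)² = 100 ∕ n, so that n = 80 gives 1.25%. The function names are hypothetical.

```python
def block_diagonal_pattern(n, l=2):
    """0/1 pattern of an N x N block diagonal matrix with n dense l x l blocks."""
    N = n * l
    pattern = [[0] * N for _ in range(N)]
    for b in range(n):
        for i in range(l):
            for j in range(l):
                pattern[b * l + i][b * l + j] = 1
    return pattern

def sparsity_percentage(pattern):
    """rho = 100 * (number of nonzero entries) / N^2."""
    N = len(pattern)
    nonzeros = sum(1 for row in pattern for entry in row if entry != 0)
    return 100.0 * nonzeros / (N * N)

rho_80 = sparsity_percentage(block_diagonal_pattern(80))    # N = 160
rho_300 = sparsity_percentage(block_diagonal_pattern(300))  # N = 600
```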

Figures 2 and 3 and Table 1 compare the convergence behavior of IHN using the above three sparse incomplete Hessian matrices with that of the Newton and SD methods. Here the minimum point x* used in computing the errors ∥x_k − x*∥ for each method was found beforehand by that same method. From the figures we see that the convergence speed of IHN decreases as the sparsity percentage ρ is reduced. With ρ = 44.46%, the convergence rate of IHN was close to that of the classic Newton method. In the case of ρ = 1.25%, where the incomplete Hessian M(x_k) is a block diagonal matrix with each block being a 2 by 2 matrix, IHN was still found to converge much faster than SD.

Fig. 2 — Comparisons of the absolute errors of IHN with that of the classic Newton method for solving the database problem (34) with n = 80 and l = 2

Fig. 3 — Comparisons of the gradient norms of IHN with that of the Newton and SD methods for solving the database problem (34) with n = 80 and l = 2

Table 1.

Comparisons of the convergence of IHN with that of Newton and SD for solving the database problem (34) with n = 80 and l = 2

Minimizer	Iterations	Final E	Final ∥g∥
Classic Newton	53	20.663	6.74×10⁻⁷
IHN ρ = 44.46%	68	21.334	9.43×10⁻⁷
IHN ρ = 23.84%	105	20.776	9.62×10⁻⁷
IHN ρ = 1.25%	279	20.538	9.76×10⁻⁷
SD	7184	20.633	9.75×10⁻⁷


In the numerical experiments on T-IHN, we selected a dataset of 300 members (n = 300) from the large database. We also constructed three incomplete Hessian matrices using ξ = 0.5, 0.1 and 0, which resulted in the cutoff radius τ = 179.89, 35.978 and 0, and the sparsity percentage ρ = 40.22%, 2.58% and 0.33%, respectively. We plotted the sparse patterns of the incomplete matrices with ρ = 40.22% and 2.58% in Figs. 4 and 5. It is interesting to note that the nonzero entries of M_k are distributed across the whole matrix. The one with ρ = 0.33% is a block diagonal matrix with each block being a 2 by 2 matrix.

Fig. 4 — The sparse pattern of the 600 × 600 incomplete Hessian matrix with the sparsity ratio ρ = 40.2%

Fig. 5 — The sparse pattern of the 600 × 600 incomplete Hessian matrix with ρ = 2.58%

Table 2 gives the performance data on the T-IHN, D-TN, BFGS, and SD methods for solving the database problem (34) with n = 300 and l = 2. Here, funcCount denotes the number of calls to the program routine for evaluating E and g as well as M (T-IHN only), the number in parentheses is the total number of CG iterations within the D-TN and T-IHN methods, and the CPU time is measured by the MATLAB time functions tic and toc, where tic saves the current time that toc uses later to measure the elapsed time in seconds. Comparisons of the convergence processes of T-IHN, D-TN, BFGS, and SD in terms of gradient norms are displayed in Figs. 6 and 7.

Table 2.

Comparisons of the convergence and performance of T-IHN with that of D-TN, BFGS, and SD for solving the database problem (34) with n = 300 and l = 2. Here the CPU time is measured in seconds

Minimizer	Iterations	Final E	Final ∥g∥	funcCount	CPU
BFGS	196	330.67	1.19×10⁻⁶	199	25.04
D-TN	19 (846)	330.67	4.45×10⁻⁶	909	27.26
SD	3531	330.67	2.76×10⁻⁵	6683	373.15
T-IHN ρ = 40.22%	21 (714)	330.67	9.14×10⁻⁶	83	11.28
T-IHN ρ = 2.58%	186 (1964)	330.67	9.46×10⁻⁶	440	48.56
T-IHN ρ = 0.33%	246 (9403)	330.67	8.85×10⁻⁶	553	61.25


Fig. 7 — Comparisons of the gradient norms of T-IHN with that of SD for solving the database problem (34) with n = 300 and l = 2. Even with very sparse incomplete Hessian matrices, T-IHN is shown to have a much faster convergence speed than SD

From Table 2 and Fig. 6 we see that T-IHN with ρ = 40.22% (i.e., about 60 percent of the entries are zero) not only had a rate of convergence close to that of D-TN and faster than that of BFGS, but also performed better than both. In these tests, T-IHN reduced the CPU time by a factor of about 2.4 relative to D-TN and about 2.2 relative to BFGS.

As shown in Table 2, even with a very sparse incomplete Hessian matrix, T-IHN still converged much faster and performed better than SD. T-IHN using the incomplete Hessian with ρ = 2.58% and 0.33% reduced the total CPU time of SD by factors of about 7.68 and 6.09, respectively.
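The quoted factors follow directly from the CPU column of Table 2. A short Python sketch of the arithmetic is given here, where the dictionary keys and the function name `speedup` are hypothetical labels and the timings are copied from the table.

```python
# CPU times in seconds, copied from Table 2.
cpu_seconds = {
    "BFGS": 25.04,
    "D-TN": 27.26,
    "SD": 373.15,
    "T-IHN 40.22%": 11.28,
    "T-IHN 2.58%": 48.56,
    "T-IHN 0.33%": 61.25,
}

def speedup(slower, faster):
    """Ratio of CPU times: how many times faster `faster` ran than `slower`."""
    return cpu_seconds[slower] / cpu_seconds[faster]

factor_vs_dtn = speedup("D-TN", "T-IHN 40.22%")
factor_vs_bfgs = speedup("BFGS", "T-IHN 40.22%")
factor_sd_258 = speedup("SD", "T-IHN 2.58%")
factor_sd_033 = speedup("SD", "T-IHN 0.33%")
```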

These numerical results demonstrate the promising potential of T-IHN as an efficient algorithm for solving the minimization problem (1). In subsequent work, we intend to improve the convergence rate of T-IHN and to study preconditioning for T-IHN, so as to further improve its performance on very large database problems.

Acknowledgements

The authors are grateful to Professor Dan Beard from the Medical College of Wisconsin for providing the chemical database. They are also indebted to the two anonymous referees for valuable comments and suggestions.

This project was supported by the National Science Foundation (DMS-0241236, USA), the National Institutes of Health (PHS R01 EB005825-01, USA), the Natural Science Foundation of China (10471062), and Jiangsu Province (BK2006184, China).

Footnotes

The Kronecker product ⊗ of an m × n matrix, say A = (a_ij)_{m×n}, with a μ × ν matrix B is defined as the μm × νn matrix given in block form by A ⊗ B = (a_ij B)_{m×n}.

Contributor Information

Dexuan Xie, Department of Mathematical Sciences, University of Wisconsin, 3200 N Cramer Street, EMS Building, Room E403, Milwaukee, WI 53211, USA.

Qin Ni, Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, People’s Republic of China.

References

1. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math. Program. 1983;26:190–212.
2. Golub GH, van Loan CF. Matrix Computations. 2nd edn. Johns Hopkins University Press; Baltimore: 1986.
3. Jolliffe IT. Principal Component Analysis. Springer; New York: 1986.
4. Liu DC, Nocedal J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989;45:503–528.
5. Nash SG, Nocedal J. A numerical study of the limited memory BFGS method and the truncated Newton method for large scale optimization. SIAM J. Optim. 1991;1:358–371.
6. Nocedal J, Wright SJ. Numerical Optimization. 2nd edn. Springer; New York: 2006.
7. O’Leary DP. A discrete Newton algorithm for minimizing a function of many variables. Math. Program. 1982;23:20–33.
8. Ortega JM, Rheinboldt WC. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press; New York: 1970.
9. Powell MJD. Some global convergence of a variable metric algorithm for minimization without exact linear search. In: Nonlinear Programming. SIAM-AMS Proceedings, vol. IX. SIAM; Philadelphia: 1976.
10. Schlick T. Molecular Modeling and Simulation, an Interdisciplinary Guide. Springer; New York: 2002.
11. Schlick T, Fogelson A. TNPACK - A truncated Newton minimization package for large-scale problems: I. Algorithm and usage. ACM Trans. Math. Softw. 1992;18:46–70.
12. Xie D, Schlick T. Efficient implementation of the truncated-Newton algorithm for large-scale chemistry applications. SIAM J. Optim. 1999;9:132–154.
13. Xie D, Singh SB, Fluder EM, Schlick T. Principal component analysis combined with truncated-Newton minimization for dimensionality reduction of chemical databases. Math. Program. 2003;95:161–185.
14. Xie D, Tropsha A, Schlick T. An efficient projection protocol for chemical databases: the singular value decomposition combined with truncated-Newton minimization. J. Chem. Inf. Comput. Sci. 2000;40:167–177. doi: 10.1021/ci990333j.
15. Young DM. Iterative Solution of Large Linear System. Academic Press; New York: 1971.
