Author manuscript; available in PMC: 2010 Aug 10.
Published in final edited form as: Comput Optim Appl. 2009 Dec;44(3):467–485. doi: 10.1007/s10589-008-9164-y

An incomplete Hessian Newton minimization method and its application in a chemical database problem

Dexuan Xie 1, Qin Ni 2
PMCID: PMC2863154  NIHMSID: NIHMS171515  PMID: 20445822

Abstract

To efficiently solve a large scale unconstrained minimization problem with a dense Hessian matrix, this paper proposes to use an incomplete Hessian matrix to define a new modified Newton method, called the incomplete Hessian Newton method (IHN). A theoretical analysis shows that IHN is convergent globally, and has a linear rate of convergence with a properly selected symmetric, positive definite incomplete Hessian matrix. It also shows that the Wolfe conditions hold in IHN with a line search step length of one. As an important application, an effective IHN and a modified IHN, called the truncated-IHN method (T-IHN), are constructed for solving a large scale chemical database optimal projection mapping problem. T-IHN is shown to work well even with indefinite incomplete Hessian matrices. Numerical results confirm the theoretical results of IHN, and demonstrate the promising potential of T-IHN as an efficient minimization algorithm.

Keywords: Unconstrained minimization, Incomplete Hessian matrix, Convergence analysis, Truncated Newton, Chemical database analysis

1 Introduction

Let $f$ be a twice continuously differentiable multivariable function defined on a bounded neighboring domain, $D$, of the $n$-dimensional Euclidean real vector space $\mathbb{R}^n$. We consider a large scale unconstrained optimization problem:

$$\text{Find } x^* \in D \subset \mathbb{R}^n \text{ such that } f(x^*) = \min\{ f(x) \mid x \in D \}, \tag{1}$$

where $x^*$ lies in the interior of $D$, $f(x)$ and the gradient vector $g(x)$ (i.e., the first derivative of $f$ at $x$) are expensive to evaluate, and the Hessian matrix $H(x)$ (i.e., the second derivative of $f$ at $x$) is dense and can be evaluated analytically. In practice, however, when $n$ is large enough, it is infeasible to evaluate the whole Hessian $H(x)$ because of the cost of computing and storing it. Typical examples of (1) often arise from biomolecular energy function minimization problems and chemical database optimal projection problems as well as many other scientific and engineering applications. The minimization of a biomolecular potential energy function is one of the fundamental tasks in biomolecular simulations [10], while solving the optimal projection mapping problem is a key step in large scale chemical database analysis [10, 13, 14]. Developing efficient numerical minimization algorithms is essential in these application fields.

Currently, typical algorithms for solving (1) include the steepest descent method (SD), the nonlinear conjugate gradient method (CG), the limited-memory BFGS method (L-BFGS) [4-6], and the discrete truncated Newton method (D-TN) [6, 7]. None of them requires any evaluation of the Hessian matrix; they use only gradient vectors. In SD and CG, the gradient vectors are employed to construct descent search directions. To obtain faster minimization algorithms, L-BFGS and D-TN use the gradient vectors to construct approximate Hessian matrices, which makes them modified Newton type algorithms [6]. For example, in D-TN, an approximate Hessian matrix is generated implicitly by using gradient vectors to construct a finite difference approximation to the Hessian-vector product, which is the only place where $H(x)$ occurs in the truncated Newton method (TN) [1]. However, such a finite difference formula is inherently numerically unstable, which may significantly degrade the numerical behavior of D-TN.

In this paper, we study the idea of constructing an approximate Hessian, $M(x)$, directly from the Hessian matrix $H(x)$, which is available in Problem (1) and should be used, even partially, in developing fast minimization algorithms. With a properly selected sparse pattern and sparse matrix techniques, we can simply construct $M(x)$ as an incomplete Hessian matrix, and evaluate it quickly within the available computing and storage capacity. We then substitute it for the Hessian matrix $H(x)$ in a classic modified Newton method to yield an incomplete Hessian Newton method (IHN). With a properly selected $M(x)$, IHN is expected to be numerically stable, easy to implement, and to have high computational performance and a fast convergence rate.

The focus of this paper is to study some basic convergence properties of IHN. For this purpose, we simply assume that M(x) is symmetric, positive definite in domain D. Following the classic quasi-Newton theory, we prove that IHN converges globally, and has both an R-linear rate of convergence and a Q-linear rate of convergence when M(x) is properly selected. We also prove that the Wolfe conditions hold for IHN with a line search step length of one when the number of IHN iterations is large enough.

As an application, we construct a particular IHN for solving the chemical database optimal projection mapping problem as described in [10, 13, 14]. In this application, the entries of H(x) that correspond to the pairwise distances within a certain short range become a dominating part of H(x). Thus, they can be selected as the nonzero entries of M(x), yielding a good approximation of H(x). In this paper, we construct M(x) by a distance cut-off strategy, and express both H(x) and M(x) in terms of Kronecker products to display the dense and sparse matrix structures of H(x) and M(x). Such a sparse expression of M(x) is valuable in programming M(x) by sparse matrix techniques. To confirm our theoretical results, we carry out numerical experiments on IHN with a real chemical dataset. Numerical results show that the rate of convergence of IHN can be close to that of the classic modified Newton method when the incomplete Hessian is properly selected. Even with a very sparse incomplete Hessian (a block diagonal matrix with each block being a 2 by 2 matrix), IHN was still found to have a much faster rate of convergence than SD.

However, the assumption that all the incomplete Hessian matrices are positive definite is often too strong to be satisfied in practice. To relax this assumption and further reduce the computing cost, we use the truncated Newton strategy given in [12] to modify IHN as a descent search direction method, and call it the truncated IHN method (T-IHN) for clarity. T-IHN is shown to converge globally even with indefinite incomplete Hessian matrices. To numerically study the convergence rate and computer performance, we develop a MATLAB program package for T-IHN for solving the chemical database problem using sparse matrix techniques. We then compare the convergence and performance of T-IHN with that of SD, BFGS, and D-TN. Here SD and BFGS were implemented by calling the minimization solver routine fminunc from the MATLAB library, and the MATLAB program of D-TN is the same as that of T-IHN except that it uses the Euler forward finite difference formula to approximate the Hessian-vector product.

Numerical results show that T-IHN using an incomplete Hessian with about 60 percent zero entries has a faster rate of convergence and better performance than BFGS. For a dataset of 300 members, T-IHN took less CPU time than BFGS by a factor of about 2.2 (see Table 2). T-IHN was also found to have a rate of convergence close to that of D-TN and better performance than D-TN: in this test, T-IHN took less CPU time than D-TN by a factor of about 2.4. Here we did not compare T-IHN with L-BFGS since the MATLAB library does not contain any L-BFGS program routine. Note that L-BFGS usually has a slower rate of convergence than BFGS. Hence, it can be expected that T-IHN performs better than L-BFGS in both convergence and CPU time.

We also made tests on T-IHN with two very sparse incomplete Hessian matrices: one has about 97.43% of zero entries, and the other has about 99.67% of zero entries. Even so, T-IHN was still found to have much better performances in both convergence rate and CPU time than SD. It took less CPU time by a factor of up to 7.68 than SD. These numerical results demonstrate the promising potential of T-IHN as an efficient solver of minimization problem (1).

The remainder of the paper is organized as follows. We define IHN in Sect. 2, and present its basic convergence properties in Sect. 3. We then describe the IHN and T-IHN methods for solving the database problem in Sect. 4. Finally, the numerical results on IHN and T-IHN are presented in Sect. 5.

2 The IHN method

Let $g(x)$, $H(x)$ and $M(x)$ denote the gradient vector, Hessian matrix, and incomplete Hessian matrix of $f$ at $x \in D$, respectively. We assume that both $H(x)$ and $M(x)$ are symmetric, positive definite in $D$. The IHN iterative sequence $\{x_k\}$ for solving (1) is defined in the form

$$x_{k+1} = x_k + \alpha_k p_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$

where $x_0$ is a given initial iterate in $D$, $p_k$ is a search direction satisfying

$$M(x_k)\, p_k = -g(x_k), \tag{3}$$

and $\alpha_k$ is a step length satisfying the Wolfe conditions

$$f(x_{k+1}) \le f(x_k) + c_1 \alpha_k\, g(x_k)^T p_k \quad \text{and} \quad g(x_{k+1})^T p_k \ge c_2\, g(x_k)^T p_k \tag{4}$$

for $0 < c_1 < \frac{1}{2} < c_2 < 1$. Clearly, $p_k = -M(x_k)^{-1} g_k$, which is a descent search direction in the sense that

$$g(x_k)^T p_k < 0 \quad \text{for } k = 0, 1, 2, \ldots. \tag{5}$$
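The two Wolfe inequalities in (4) are easy to test numerically for a trial step. The following is a minimal Python sketch, not the authors' MATLAB line search; the helper name `wolfe_conditions_hold` and the toy quadratic are illustrative assumptions, and the default constants are the values c1 = 0.01 and c2 = 0.9 reported in Sect. 5.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def wolfe_conditions_hold(f, grad, x, p, alpha, c1=0.01, c2=0.9):
    """Check both inequalities of (4) for the trial point x + alpha * p."""
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
    slope = dot(grad(x), p)  # g(x_k)^T p_k, negative for a descent direction
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope
    curvature = dot(grad(x_new), p) >= c2 * slope
    return sufficient_decrease and curvature

# Toy problem (an assumption for illustration): f(x) = x1^2 + 2*x2^2.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]

x0 = [1.0, 1.0]
M_diag = [2.0, 4.0]  # diagonal incomplete Hessian (equal to H for this toy f)
p0 = [-gi / mi for gi, mi in zip(grad(x0), M_diag)]  # p = -M^{-1} g

unit_step_ok = wolfe_conditions_hold(f, grad, x0, p0, alpha=1.0)
tiny_step_ok = wolfe_conditions_hold(f, grad, x0, p0, alpha=1e-3)
```

In this toy case the unit step passes both tests, while a very short step fails the curvature condition, which is the role the second inequality plays in ruling out unproductively small steps.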

Denote by $m_{ij}$ and $h_{ij}$ the $(i, j)$th entries of $M(x)$ and $H(x)$, respectively. The sparse pattern $P$ of $M(x)$ is the set of index pairs $(i, j)$ at which $m_{ij} \ne 0$. With a given $P$, the incomplete Hessian matrix $M(x)$ is defined by

$$m_{ij}(x) = \begin{cases} h_{ij}(x) & \text{for } (i, j) \in P, \\ 0 & \text{otherwise.} \end{cases}$$

Clearly, the selection of $P$ depends on the problem to be solved and on the capacity of the computer used for the implementation. In the extreme cases, we can set $P = P_f$ or $P = P_d$, where

$$P_d = \{(i, i) \mid i = 1, 2, \ldots, n\} \quad \text{and} \quad P_f = \{(i, j) \mid i, j = 1, 2, \ldots, n\}. \tag{6}$$

Obviously, the matrices $M(x)$ with $P_f$ and $P_d$ are respectively the original Hessian matrix $H(x)$ and the diagonal matrix with $h_{11}(x), h_{22}(x), \ldots, h_{nn}(x)$ as the diagonal entries. Hence, the IHN with $P_f$ reduces to the classic Newton method.
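The definition of the incomplete Hessian from a sparse pattern can be sketched in a few lines of Python. This is only an illustration on a small dense matrix with hypothetical helper names (`incomplete_hessian`, `diagonal_pattern`, `full_pattern`) and 0-based indices; a practical implementation would use sparse storage.

```python
def incomplete_hessian(H, pattern):
    """Keep h_ij where (i, j) is in the sparse pattern, set all other entries to zero."""
    n = len(H)
    return [[H[i][j] if (i, j) in pattern else 0.0 for j in range(n)]
            for i in range(n)]

def diagonal_pattern(n):
    return {(i, i) for i in range(n)}                     # P_d in (6)

def full_pattern(n):
    return {(i, j) for i in range(n) for j in range(n)}   # P_f in (6)

# Illustrative symmetric positive definite Hessian and gradient (assumed values).
H = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
g = [1.0, -2.0, 0.5]

M_d = incomplete_hessian(H, diagonal_pattern(3))
M_f = incomplete_hessian(H, full_pattern(3))

# With P_d the system M p = -g decouples into n scalar equations.
p = [-g[i] / M_d[i][i] for i in range(3)]
```

With the pattern P_f the function returns H itself, so the resulting direction is the Newton direction; with P_d each component of the direction costs one division.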

3 The convergence analysis of IHN

Let $\|\cdot\|$ be the 2-norm of a vector/matrix. For clarity, we sometimes write $f(x_k)$, $g(x_k)$, $H(x_k)$, $M(x_k)$, $g(x^*)$, $H(x^*)$, and $M(x^*)$ as $f_k$, $g_k$, $H_k$, $M_k$, $g_*$, $H_*$, and $M_*$, respectively. Following the general quasi-Newton theory (e.g., see pp. 43–45 in [6]), we have the global convergence theorem for IHN as below.

Theorem 1

Let $\{x_k\}$ be a sequence of IHN iterates defined in (2). If there exists a positive constant $\eta$ such that

$$\|M_k\|\,\|M_k^{-1}\| \le \eta \quad \text{for all } k, \tag{7}$$

then $\lim_{k\to\infty} \|g(x_k)\| = 0$ for any $x_0 \in D$.

To discuss the convergence rates of IHN, we make Assumptions 1 and 2.

Assumption 1

The IHN iterative sequence $\{x_k\}$ converges to $x^* \in D$ for any $x_0 \in D$. Here $g(x^*) = 0$ and both $M(x^*)$ and $H(x^*)$ are positive definite.

Assumption 2

There exists a positive integer, $k_0$, such that Wolfe condition (4) is satisfied with $\alpha_k = 1$ for all $k \ge k_0$.

The following two definitions will be used in our IHN analysis.

Definition 1

The convergence rate of $\{x_k\}$ is R-linear if there exists a number $r$ between 0 and 1 such that $\limsup_{k\to\infty} \|x_k - x^*\|^{1/k} = r$.

Definition 2

The convergence rate of $\{x_k\}$ is Q-linear if there exist a constant $0 < c < 1$ and a positive integer $k_0$ such that $\|x_{k+1} - x^*\| \le c\,\|x_k - x^*\|$ whenever $k \ge k_0$.

Using arguments similar to those in [9] and in the proof of Theorem 6.1 in [4], we can prove the R-linear convergence of IHN.

Theorem 2

If Assumption 1 and condition (7) hold, then the IHN iterative sequence {xk} is R-linearly convergent.

The following corollary gives an easy-to-check sufficient condition for the IHN using the sparse pattern Pd defined in (6).

Corollary 1

Let Assumption 1 hold and the sparse pattern of incomplete Hessian matrix Mk be given with Pd in (6). If the diagonal elements of Hk are positive and bounded, then IHN is convergent both globally and R-linearly.

Proof

We only need to prove that (7) is satisfied. Let $h_{ii}^{(k)}$ for $i = 1, \ldots, n$ be the diagonal elements of $H_k$. By the assumption, there exist two positive constants $b_1$ and $b_2$ such that $b_1 \le h_{ii}^{(k)} \le b_2$ for all $i$ and $k$. Thus,

$$\|M_k\|\,\|M_k^{-1}\| = \frac{\max_{1 \le i \le n} h_{ii}^{(k)}}{\min_{1 \le i \le n} h_{ii}^{(k)}} \le \frac{b_2}{b_1}.$$

Hence, the proof follows from Theorems 1 and 2.

To discuss the Q-linear rate of convergence, we need Lemmas 1 and 2.

Lemma 1

If Assumption 1 holds, then for any $\eta > 0$, there exists $\epsilon > 0$ such that $H(x)$ and $M(x)$ are positive definite, $\|M(x)^{-1} - M(x^*)^{-1}\| < \eta$, and $\|g(x) - g(x^*) - H(x^*)(x - x^*)\| < \eta\,\|x - x^*\|$ whenever $\|x - x^*\| \le \epsilon$.

The above lemma can be proved by similar arguments to the ones from Sects. 2.3.3 and 3.1.6 in [8].

Lemma 2

If H and M are symmetric, positive definite, then

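$$\lambda_{\min}(M^{-1}H) - 1 \le \frac{x^T (H - M)\, x}{x^T M x} \le \lambda_{\max}(M^{-1}H) - 1, \quad x \ne 0, \tag{8}$$

where $\lambda_{\min}(M^{-1}H)$ and $\lambda_{\max}(M^{-1}H)$ denote the smallest and largest eigenvalues of $M^{-1}H$, respectively.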

Proof

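Let $M^{1/2}$ be the square root matrix of $M$. Set $\bar{H} = M^{-1/2} H M^{-1/2}$ and $y = M^{1/2} x$ for any nonzero $x \in D$. It is easy to show that $x^T H x = y^T \bar{H} y$, $y^T y = x^T M x$, and $\bar{H}$ is similar to $M^{-1}H$. Hence,

$$\lambda_{\min}(M^{-1}H)\, y^T y \le y^T \bar{H} y \le \lambda_{\max}(M^{-1}H)\, y^T y,$$

from which it is then easy to obtain (8).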

Since $M_*$ is assumed to be symmetric, positive definite, its square root matrix $M_*^{1/2}$ exists and satisfies $M_* = M_*^{1/2} M_*^{1/2}$. Thus, a new norm, $\|\cdot\|_*$, is well defined by

$$\|x\|_* = \|M_*^{1/2} x\|, \quad x \ne 0. \tag{9}$$

Under this new norm, we obtain the Q-linear convergence rate for IHN in Theorem 3. Here the spectral radius of a matrix, say $A$, is denoted by $\rho(A)$, which is defined as the largest of the moduli of the eigenvalues of $A$.

Theorem 3

Let Assumptions 1 and 2 hold. If

$$\rho(M_*^{-1} H_*) < 2, \tag{10}$$

then the IHN iterative sequence $\{x_k\}$ has a Q-linear rate of convergence under the norm $\|\cdot\|_*$ defined in (9).

Proof

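Set $\bar{H}_* = M_*^{-1/2} H_* M_*^{-1/2}$, and denote by $\lambda_j$ the $j$th eigenvalue of $\bar{H}_*$ for $j = 1, 2, \ldots, n$. Clearly, $\bar{H}_*$ is symmetric, positive definite because of Assumption 1, so all its eigenvalues are positive. Further, $\bar{H}_*$ is similar to $M_*^{-1}H_*$; thus, $M_*^{-1}H_*$ and $\bar{H}_*$ have the same eigenvalues. Hence, $I - M_*^{-1}H_*$ has eigenvalues $1 - \lambda_j$ for $j = 1, 2, \ldots, n$, and from (10) it follows that $0 < \lambda_j < 2$, or $|1 - \lambda_j| < 1$, for $j = 1, 2, \ldots, n$. Therefore, $\rho(I - M_*^{-1}H_*) = \max_{1 \le i \le n} |1 - \lambda_i| < 1$, and we can select a positive real number $t < 1$ such that $\rho(I - M_*^{-1}H_*) \le t$.

We next set $E_* = M_* - H_*$ and $\bar{E}_* = M_*^{-1/2} E_* M_*^{-1/2}$. It is easy to see that $\bar{E}_*$ is symmetric, and similar to $I - M_*^{-1}H_*$. Hence, $\|\bar{E}_*\| = \rho(\bar{E}_*) = \rho(I - M_*^{-1}H_*)$.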

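With (2) and Assumption 1, we get

$$\begin{aligned} M_*^{1/2}(x_{k+1} - x^*) &= M_*^{1/2}\left(x_k - M_k^{-1} g_k - x^*\right) \\ &= M_*^{1/2}\left(-M_k^{-1}(g_k - g_*) + (x_k - x^*)\right) \\ &= -M_*^{1/2} M_k^{-1}\left(g_k - g_* - H_*(x_k - x^*)\right) - M_*^{1/2}\left(M_k^{-1} - M_*^{-1}\right) H_* (x_k - x^*) + M_*^{-1/2} E_* (x_k - x^*). \end{aligned} \tag{11}$$

The last term of the above expression can then be estimated as below:

$$\|M_*^{-1/2} E_* (x^* - x_k)\| = \|\bar{E}_* M_*^{1/2} (x^* - x_k)\| \le \|\bar{E}_*\|\,\|x^* - x_k\|_* = \rho(\bar{E}_*)\,\|x^* - x_k\|_* = \rho(I - M_*^{-1}H_*)\,\|x^* - x_k\|_* \le t\,\|x^* - x_k\|_*. \tag{12}$$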

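To estimate the other two terms of (11), we set $\eta = \frac{1}{2}(1 + t)$, and

$$\mu = \max\left\{\|H_*\|, \|M_*\|, \|M_*^{-1}\|, \|M_*^{1/2}\|, \|M_*^{-1/2}\|\right\}. \tag{13}$$

Obviously, $t < \eta < 1$, and there exists a sufficiently small positive number, $\gamma$, satisfying

$$t + \mu^2 (2\mu + \gamma)\gamma \le \eta. \tag{14}$$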

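For the above $\gamma > 0$, using the continuity of $H(x)$ and $M(x)$ at $x^*$ and arguments similar to those in the proof of Lemma 1, we can show that there exists a positive number $\epsilon$ such that

$$\|M(y)^{-1} - M(x^*)^{-1}\| \le \gamma, \tag{15}$$

and

$$\|M_*^{1/2}\left(g(y) - g(x^*) - H(x^*)(y - x^*)\right)\| \le \gamma\,\|M_*^{1/2}(y - x^*)\| \tag{16}$$

whenever $\|y - x^*\| \le \epsilon$.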

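Furthermore, from the definition of the convergence of the sequence $\{x_k\}$ it follows that there exists an integer $k_1 > 0$ such that $\|x_k - x^*\| \le \epsilon$ for $k \ge k_1$. Hence, by (13) and (15), we obtain

$$\|M_*^{1/2}\left(M_k^{-1} - M_*^{-1}\right) H_* M_*^{-1/2}\| \le \gamma \mu^3,$$

and

$$\|M_k^{-1}\| \le \|M_*^{-1}\| + \|M_k^{-1} - M_*^{-1}\| \le \mu + \gamma \quad \text{for } k \ge k_1.$$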

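Finally, applying the above two expressions, (12), (14) and (16) to (11) gives

$$\begin{aligned} \|x_{k+1} - x^*\|_* &= \|M_*^{1/2}(x_{k+1} - x^*)\| \\ &\le \|M_*^{1/2} M_k^{-1}\left(g_k - g_* - H_*(x_k - x^*)\right)\| + \|M_*^{1/2}\left(M_k^{-1} - M_*^{-1}\right) H_* (x_k - x^*)\| + \|M_*^{-1/2} E_* (x_k - x^*)\| \\ &\le \left(\gamma \mu^2 (2\mu + \gamma) + t\right) \|M_*^{1/2}(x_k - x^*)\| \\ &\le \eta\,\|M_*^{1/2}(x_k - x^*)\| = \eta\,\|x_k - x^*\|_* \end{aligned} \tag{17}$$

whenever $k \ge k_1$. This completes the proof of Theorem 3.

From the above proof we see that the condition $\rho(I - M_*^{-1}H_*) < 1$ follows from (10), and is then used in the remaining part of the proof. Thus, we have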

Corollary 2

Let Assumptions 1 and 2 hold. If

$$\rho(I - M_*^{-1} H_*) < 1, \tag{18}$$

then IHN has a Q-linear rate of convergence under the norm $\|\cdot\|_*$.

According to the theory of linear stationary iterative methods [15], (18) gives a necessary and sufficient condition for the convergence of an iterative method constructed from the matrix splitting $H_* = M_* - N_*$ with $N_* = M_* - H_*$ for solving the linear system with coefficient matrix $H_*$. Thus, (18) is a natural condition for constructing an efficient IHN.
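Condition (18) can be observed directly on a small quadratic. The sketch below is an illustrative assumption, not an experiment from the paper: it takes f(x) = ½ xᵀHx with a 2 × 2 matrix H, uses the diagonal pattern P_d, and applies the IHN iteration with unit step length. Here M⁻¹H has eigenvalues 0.5 and 1.5, so ρ(I − M⁻¹H) = 0.5, and the error measured in the norm (9) should contract by about that factor.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

H = [[2.0, 1.0],
     [1.0, 2.0]]          # Hessian of f(x) = 0.5 * x^T H x, minimizer x* = 0
M_diag = [2.0, 2.0]       # incomplete Hessian with the diagonal pattern P_d

def ihn_unit_step(x):
    """One IHN step with alpha_k = 1: x - M^{-1} g(x), where g(x) = H x."""
    g = matvec(H, x)
    return [xi - gi / mi for xi, gi, mi in zip(x, g, M_diag)]

def m_norm(x):
    """The norm (9) for a diagonal M: sqrt(x^T M x)."""
    return sum(mi * xi * xi for mi, xi in zip(M_diag, x)) ** 0.5

x = [1.0, 0.0]
ratios = []
for _ in range(5):
    x_next = ihn_unit_step(x)
    ratios.append(m_norm(x_next) / m_norm(x))  # contraction factor per step
    x = x_next
```

For a quadratic the error obeys e_{k+1} = (I − M⁻¹H) e_k exactly, which is the stationary iteration associated with the splitting H = M − N mentioned above.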

The condition (10) in Theorem 3 depends on the solution $x^*$, and is therefore difficult to verify. To improve on it, we propose in the following corollary a sufficient condition that depends only on the current iterate.

Corollary 3

Let Assumptions 1 and 2 hold. If $H_k$ is positive definite and there exist a positive constant $\eta < 2$ and a positive integer $k_1$ such that

$$\|M_k^{-1} H_k\| \le \eta, \quad k > k_1, \tag{19}$$

then IHN has a Q-linear rate of convergence under the norm $\|\cdot\|_*$.

Proof

From the properties of matrix norms and the assumptions it follows that

$$\rho(M_k^{-1} H_k) \le \|M_k^{-1} H_k\| \le \eta \quad \text{for all } k > k_1.$$

Letting $k \to \infty$ in the above expression gives $\rho(M_*^{-1} H_*) \le \eta < 2$, which is condition (10). Thus, the result follows from Theorem 3.

A sufficient condition is given in the following theorem to guarantee that IHN satisfies Assumption 2.

Theorem 4

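Let Assumption 1 hold. If the eigenvalues $\lambda_i^{(k)}$ for $i = 1, \ldots, n$ of $M_k^{-1} H_k$ have lower and upper bounds $\lambda_l$ and $\lambda_u$ and there exists a positive integer $k_1$ such that

$$1 - c_2 < \lambda_l \le \lambda_i^{(k)} \le \lambda_u < 2 - 2c_1 \quad \text{for } i = 1, \ldots, n \text{ and } k \ge k_1, \tag{20}$$

where $c_1$ and $c_2$ are the two constants in the Wolfe conditions of (4), then Assumption 2 holds for IHN.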

Proof

By Taylor expansion, we can get that

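$$f(x_k + p_k) - f(x_k) - c_1 g_k^T p_k = \left(\tfrac{1}{2} - c_1\right) g_k^T p_k + \tfrac{1}{2}\, p_k^T (g_k + H_k p_k) + \tfrac{1}{2}\, p_k^T \left(H(u_k) - H_k\right) p_k, \tag{21}$$

where $u_k = x_k + t_k p_k$ with $0 \le t_k \le 1$. Set

$$r = \tfrac{1}{2}\, \lambda_{\min}(M(x^*)), \quad \text{and} \quad \omega = \frac{r}{2} \min\{2 - 2c_1 - \lambda_u,\; \lambda_l + c_2 - 1\}. \tag{22}$$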

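From Assumption 1 and (20) it can be shown that both $r$ and $\omega$ are positive. Thus, an upper bound for $\lambda_u - 1$ and a lower bound for $\lambda_l - 1$ can be estimated in terms of $r$ and $\omega$ as below:

$$\lambda_u - 1 \le \frac{\lambda_u + 2 - 2c_1}{2} - 1 = 1 - 2c_1 - \tfrac{1}{2}(2 - 2c_1 - \lambda_u) \le 1 - 2c_1 - \frac{\omega}{r}, \tag{23}$$

and

$$\lambda_l - 1 \ge \frac{\lambda_l + 1 - c_2}{2} - 1 = -c_2 + \tfrac{1}{2}(\lambda_l + c_2 - 1) \ge -c_2 + \frac{\omega}{r}. \tag{24}$$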

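Further, by the continuity of $H$, we can select $\epsilon > 0$ such that

$$\|H(z) - H(x)\| \le \omega \tag{25}$$

whenever $\|x - x^*\| \le \epsilon$ and $\|z - x^*\| \le \epsilon$. Based on Assumption 1, we can also find a positive integer $k_2 \ge k_1$ such that

$$\lambda_{\min}(M_k) \ge r, \quad \|x_k - x^*\| \le \epsilon, \quad \|x_k + p_k - x^*\| \le \epsilon, \tag{26}$$

and $H(x_k)$ is positive definite for all $k \ge k_2$. As a result of the last two inequalities of (26),

$$\|u_k - x^*\| = \|t_k (x_k + p_k - x^*) + (1 - t_k)(x_k - x^*)\| \le \epsilon.$$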

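Hence, with (25) and the second inequality of (26), we get

$$p_k^T \left(H(u_k) - H_k\right) p_k \le \omega\,\|p_k\|^2, \quad k \ge k_2. \tag{27}$$

Also, by (3), (8) and (23), the term $p_k^T (g_k + H_k p_k)$ of (21) is estimated as

$$p_k^T (g_k + H_k p_k) = p_k^T (H_k - M_k)\, p_k \le (\lambda_u - 1)\, p_k^T M_k p_k \le \left(1 - 2c_1 - \frac{\omega}{r}\right) p_k^T M_k p_k. \tag{28}$$

Therefore, applying (3), the first inequality of (26), (27), and (28) to (21) gives

$$\begin{aligned} f(x_k + p_k) - f(x_k) - c_1 g_k^T p_k &\le -\left(\tfrac{1}{2} - c_1\right) p_k^T M_k p_k + \left(\tfrac{1}{2} - c_1 - \frac{\omega}{2r}\right) p_k^T M_k p_k + \tfrac{1}{2}\,\omega\,\|p_k\|^2 \\ &= -\frac{\omega}{2r}\, p_k^T M_k p_k + \tfrac{1}{2}\,\omega\,\|p_k\|^2 \\ &\le -\tfrac{1}{2}\,\omega\,\|p_k\|^2 + \tfrac{1}{2}\,\omega\,\|p_k\|^2 = 0, \quad k \ge k_2. \end{aligned} \tag{29}$$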

This completes the proof of the first inequality of (4) for Assumption 2.

We next prove the second inequality of (4) for Assumption 2.

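By the mean value theorem and (3),

$$g(x_k + p_k)^T p_k = g_k^T p_k + p_k^T H_k p_k + p_k^T \left(H(v_k) - H_k\right) p_k = p_k^T (H_k - M_k)\, p_k + p_k^T \left(H(v_k) - H_k\right) p_k, \tag{30}$$

where $v_k = x_k + \bar{t}_k p_k$ with $0 \le \bar{t}_k \le 1$. Similar to (27), we obtain

$$p_k^T \left(H(v_k) - H_k\right) p_k \ge -\omega\,\|p_k\|^2, \quad k \ge k_2. \tag{31}$$

A combination of (8) with (20) and (24) gives

$$p_k^T (H_k - M_k)\, p_k \ge (\lambda_l - 1)\, p_k^T M_k p_k \ge \left(-c_2 + \frac{\omega}{r}\right) p_k^T M_k p_k. \tag{32}$$

Thus, by (3), (26), (30), (31), and (32),

$$\begin{aligned} g(x_k + p_k)^T p_k - c_2 g_k^T p_k &= c_2\, p_k^T M_k p_k + p_k^T (H_k - M_k)\, p_k + p_k^T \left(H(v_k) - H_k\right) p_k \\ &\ge c_2\, p_k^T M_k p_k + \left(-c_2 + \frac{\omega}{r}\right) p_k^T M_k p_k - \omega\,\|p_k\|^2 \\ &= \frac{\omega}{r}\, p_k^T M_k p_k - \omega\,\|p_k\|^2 \ge 0 \quad \text{for } k \ge k_2. \end{aligned}$$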

This completes the proof of Theorem 4.
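To make condition (20) concrete, the short Python sketch below evaluates the admissible eigenvalue window for given Wolfe constants. The function names are hypothetical, the eigenvalues of $M_k^{-1}H_k$ are assumed to be supplied by the caller, and the constants c1 = 0.01 and c2 = 0.9 are the ones reported in Sect. 5.

```python
def unit_step_window(c1, c2):
    """Open interval (1 - c2, 2 - 2*c1) that the eigenvalues must lie in, per (20)."""
    return 1.0 - c2, 2.0 - 2.0 * c1

def satisfies_condition_20(eigenvalues, c1=0.01, c2=0.9):
    """True if every supplied eigenvalue lies strictly inside the window."""
    lower, upper = unit_step_window(c1, c2)
    return lower < min(eigenvalues) and max(eigenvalues) < upper

window = unit_step_window(0.01, 0.9)   # roughly (0.1, 1.98)
```

With these constants the window is wide, so under (20) a unit step is accepted for any incomplete Hessian whose preconditioned spectrum stays roughly between 0.1 and 1.98 near the solution.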

4 The IHN and T-IHN methods for chemical database problem

As an application, we construct IHN and T-IHN for solving the chemical database optimal projection mapping problem described in [10, 13, 14]. The key step is the construction of the incomplete Hessian matrix $M_k$, which is presented in detail in this section.

With the notation of [13], we recall the database problem as below.

Suppose that a chemical database of $n$ members has been characterized as an $n \times m$ matrix $X$:

$$X = (X_1, X_2, \ldots, X_n)^T = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}, \tag{33}$$

where $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$ stands for the $i$th member of the database, $x_{ij}$ denotes the value of the $j$th chemical descriptor for the $i$th member, and the distance $\delta_{ij} = \|X_i - X_j\|$ measures the similarity of the $i$th member with the $j$th member. A key step in the efficient visualization protocol proposed in [13, 14] is to solve the following unconstrained minimization problem:

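$$E(Y_1^*, Y_2^*, \ldots, Y_n^*) = \min_{Y_i \in \mathbb{R}^l} E(Y_1, Y_2, \ldots, Y_n), \tag{34}$$

where $Y_i = (y_{i1}, y_{i2}, \ldots, y_{il})^T$, $l$ is a positive integer less than $m$, and the objective function $E$ is defined by

$$E(Y_1, Y_2, \ldots, Y_n) = \frac{1}{4} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \omega_{ij} \left[ d(Y_i, Y_j)^2 - \delta_{ij}^2 \right]^2. \tag{35}$$

Here $d(Y_i, Y_j)^2 = \sum_{k=1}^{l} (y_{ik} - y_{jk})^2$, and $\omega_{ij}$ is the weight constant, which is set as $\omega_{ij} = 1/\delta_{ij}^4$ if $\delta_{ij}^4 \ge \eta$ and $\omega_{ij} = 1$ if $\delta_{ij}^4 < \eta$. The parameter $\eta$ is a small positive number such as $10^{-12}$. In [13, 14], the above minimization problem was solved by D-TN with an initial guess generated from SVD/PCA (singular value decomposition or principal component analysis) [2, 3].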
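The objective function (35) translates almost line by line into code. The following Python sketch is a plain double loop for illustration only; the function name `projection_energy` and the tiny test data are assumptions, and the authors' MATLAB routine is not reproduced here.

```python
def projection_energy(X, Y, eta=1e-12):
    """Objective E of (35): X holds the n original points, Y their l-dimensional images."""
    n = len(X)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            delta2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))  # delta_ij^2
            d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))      # d(Y_i, Y_j)^2
            delta4 = delta2 ** 2
            weight = 1.0 / delta4 if delta4 >= eta else 1.0         # omega_ij
            total += weight * (d2 - delta2) ** 2
    return 0.25 * total

# Two members at distance 1 in the original space (illustrative data).
X = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
E_exact = projection_energy(X, [[0.0, 0.0], [1.0, 0.0]])    # distances preserved
E_stretch = projection_energy(X, [[0.0, 0.0], [2.0, 0.0]])  # projected distance doubled
```

A projection that preserves every pairwise distance gives E = 0, while stretching the single pair from distance 1 to 2 gives E = (1/4)(4 − 1)² = 2.25.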

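Set $Y = (Y_1, Y_2, \ldots, Y_n)^T$. To construct the incomplete Hessian matrix $M(Y)$, we first find the Hessian $H(Y)$ of $E(Y)$ as below:

$$H(Y) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} H_{ij}(Y), \tag{36}$$

where $H_{ij}(Y)$ denotes the second derivative of the term $\frac{\omega_{ij}}{4} \left[ d(Y_i, Y_j)^2 - \delta_{ij}^2 \right]^2$ with respect to $Y$. We express $H_{ij}(Y)$ as an $n \times n$ block matrix with each block entry, $h_{\mu\nu}$, being an $l \times l$ matrix. It is easy to find that

$$h_{\mu\nu} = \begin{cases} \Pi_{ij} & \text{if } \mu = \nu = i \text{ or } \mu = \nu = j, \\ -\Pi_{ij} & \text{if } \mu = i, \nu = j \text{ or } \mu = j, \nu = i, \\ 0 & \text{otherwise,} \end{cases}$$

where $\Pi_{ij}$ denotes an $l \times l$ matrix defined by

$$\Pi_{ij} = \omega_{ij} \left( r_{ij} I_l + 2 R_{ij} R_{ij}^T \right).$$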

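Here $r_{ij} = d(Y_i, Y_j)^2 - \delta_{ij}^2$, $R_{ij} = Y_i - Y_j$, and $I_l$ is the $l \times l$ identity matrix. Clearly, $H_{ij}$ is sparse with only four nonzero block entries, $h_{ii}$, $h_{jj}$, $h_{ij}$, and $h_{ji}$, where $h_{ii} = h_{jj}$ and $h_{ij} = h_{ji}$. In terms of the Kronecker product $\otimes$, $H_{ij}(Y)$ can be expressed as

$$H_{ij}(Y) = \left( e_i e_i^T - e_i e_j^T - e_j e_i^T + e_j e_j^T \right) \otimes \Pi_{ij},$$

where $e_i$ denotes the $i$th standard unit vector of $\mathbb{R}^n$. Thus, if we define $\Pi_{ii} = 0$, then the Hessian $H(Y)$ can be written as

$$H(Y) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( e_i e_i^T - e_i e_j^T - e_j e_i^T + e_j e_j^T \right) \otimes \Pi_{ij} = \sum_{i=1}^{n} \left( e_i e_i^T \otimes \sum_{j=1}^{n} \Pi_{ij} \right) - \sum_{i<j} \left( e_i e_j^T + e_j e_i^T \right) \otimes \Pi_{ij}. \tag{37}$$

Clearly, the first and second terms of (37) give the main block diagonal part and the off-diagonal part of $H(Y)$, respectively. From the above expression it is easy to see that $H(Y)$ is a fully dense $N \times N$ matrix with $N = nl$.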

Using a distance cutoff strategy, we define the incomplete Hessian matrix $M(Y)$ as follows: With a given cutoff radius, $\tau > 0$, we construct the sparse pattern $P$ by

$$P = \{ (i, j) \mid \|X_i - X_j\| \le \tau \text{ for } i, j = 1, 2, \ldots, n \}.$$

We then define $M(Y)$ by

$$M(Y) = \sum_{i=1}^{n} \left( e_i e_i^T \otimes \sum_{j=1}^{n} \Pi_{ij} \right) - \sum_{(i,j) \in P} \left( e_i e_j^T \right) \otimes \Pi_{ij}. \tag{38}$$

Clearly, $M(Y)$ is symmetric due to the symmetry of the submatrix $\Pi_{ij}$ and the definition of $P$. Hence, in a computer implementation, we need to evaluate and store only the upper triangular part of $M(Y)$ to reduce the costs of computing and storage.
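The block structure of (38) can be assembled directly. The Python sketch below builds M(Y) as a dense list of lists purely for illustration (the paper's T-IHN package stores only the nonzero upper triangular blocks); the function names and the small data set are assumptions, and indices are 0-based.

```python
import math

def pi_block(Xi, Xj, Yi, Yj, eta=1e-12):
    """l-by-l block Pi_ij = w_ij * (r_ij * I + 2 * R_ij R_ij^T)."""
    delta2 = sum((a - b) ** 2 for a, b in zip(Xi, Xj))    # delta_ij^2
    w = 1.0 / delta2 ** 2 if delta2 ** 2 >= eta else 1.0  # weight omega_ij
    R = [a - b for a, b in zip(Yi, Yj)]                   # R_ij = Y_i - Y_j
    r = sum(c * c for c in R) - delta2                    # r_ij
    l = len(R)
    return [[w * ((r if a == b else 0.0) + 2.0 * R[a] * R[b]) for b in range(l)]
            for a in range(l)]

def incomplete_hessian(X, Y, tau):
    """Dense assembly of M(Y) in (38) for cutoff radius tau."""
    n, l = len(Y), len(Y[0])
    M = [[0.0] * (n * l) for _ in range(n * l)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # Pi_ii is defined as zero
            block = pi_block(X[i], X[j], Y[i], Y[j])
            within_cutoff = math.dist(X[i], X[j]) <= tau
            for a in range(l):
                for b in range(l):
                    M[i * l + a][i * l + b] += block[a][b]      # diagonal block: sum_j Pi_ij
                    if within_cutoff:
                        M[i * l + a][j * l + b] -= block[a][b]  # off-diagonal block kept by P
    return M

X = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
Y = [[0.0, 0.1], [0.9, 0.0], [0.2, 2.5]]
M_full = incomplete_hessian(X, Y, tau=10.0)  # every pair kept: the full Hessian H(Y)
M_cut = incomplete_hessian(X, Y, tau=1.5)    # only the pair (0, 1) is within the cutoff
M_diag = incomplete_hessian(X, Y, tau=0.0)   # block diagonal part only
```

One way to sanity-check the assembly: when every pair is kept, each block row of the result sums to the zero block, which reflects the invariance of E under a common translation of all Y_i.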

To properly control the sparsity of $M(Y)$, we propose to select a value of the cutoff radius $\tau$ by the formula

$$\tau = \xi \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \delta_{ij}^2 \right]^{1/2}, \tag{39}$$

where $\xi$ is an adjusting factor for the sparsity of $M(Y)$. If $\xi = 1$, $\tau$ is the root mean square of the distances between all pairs of database members. In other words, about half of the entries of $M(Y)$ may be zero. As the value of $\xi$ is reduced to zero, $M(Y)$ becomes the block diagonal matrix $M_d(Y)$:

$$M_d(Y) = \sum_{i=1}^{n} \left( e_i e_i^T \otimes \sum_{j=1}^{n} \Pi_{ij} \right). \tag{40}$$
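Formula (39) is a scaled root mean square of the pairwise distances, which the following Python sketch computes directly. The function name `cutoff_radius` and the three-point data set are illustrative assumptions.

```python
import math

def cutoff_radius(X, xi):
    """Cutoff radius tau of (39): xi times the root mean square pairwise distance."""
    n = len(X)
    sum_sq = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sum_sq += sum((a - b) ** 2 for a, b in zip(X[i], X[j]))  # delta_ij^2
    return xi * math.sqrt(2.0 * sum_sq / (n * (n - 1)))

# Three collinear points: squared distances 1, 4, 1, so the mean square is 2.
X = [[0.0], [1.0], [2.0]]
tau_one = cutoff_radius(X, 1.0)
tau_half = cutoff_radius(X, 0.5)
tau_zero = cutoff_radius(X, 0.0)
```

Because τ is linear in ξ, halving ξ halves the radius, and ξ = 0 leaves only the block diagonal part (40).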

From the block Jacobi convergence theory (see p. 111 in [15], for example) it follows that $\rho(I - M_d(Y^*)^{-1} H(Y^*)) < 1$ if $H(Y^*)$ and $2M_d(Y^*) - H(Y^*)$ are positive definite and all the main diagonal block entries of $H(Y^*)$ are symmetric, positive definite. Hence, by Corollary 2, the IHN method with the incomplete Hessian of (40) is Q-linearly convergent.

With the incomplete Hessian M(Y) of (38), we obtain IHN for solving the database problem (34).

We next describe T-IHN for solving the database problem (34).

The T-IHN iterative sequence {xk} is defined in the same form as the one in (2) except that the descent search direction pk is selected as either an iterate of the preconditioned conjugate gradient method (PCG) [2] for solving (3) or −gk (in the worst case) according to the truncated Newton strategy given in [12]. For clarity, the scheme for generating pk is presented in Algorithm 1. The initial iterate x0 of T-IHN is generated by using the SVD/PCA scheme given in [13].

Algorithm 1

(Defining the descent search direction pk for T-IHN)

Let $B_k$ be a preconditioner for $M_k$, and let $w_j$ represent the $j$th PCG iterate for solving (3). Set $\eta_k = \min\{c_k/k, \|g_k\|\}$, and give $\epsilon > 0$ and ITPCG > 0 (e.g., $c_k = 0.5$, $\epsilon = 10^{-6}$ and ITPCG = 80). The $k$th descent search direction $p_k$ of the T-IHN method is selected by the following steps:

  1. [INITIALIZATION]
    • Set $j = 1$, $w_1 = 0$, $r_1 = -g_k$, and $d_1 = z_1$,
    • where $z_1$ solves the linear system $B_k z_1 = r_1$.
  2. [SINGULARITY TEST]
    • If either $|r_j^T z_j| \le \delta$ or $|d_j^T M_k d_j| \le \delta$ (e.g., $\delta = 10^{-10}$),
    • exit the algorithm with $p_k = w_j$ (for $j = 1$, set $p_k = -g_k$).
  3. Compute $\alpha_j = r_j^T z_j / (d_j^T M_k d_j)$ and $w_{j+1} = w_j + \alpha_j d_j$.

  4. [DESCENT DIRECTION TEST]
    • If $g_k^T w_{j+1} \ge g_k^T w_j + \delta$,
    • exit the algorithm with $p_k = w_j$ (for $j = 1$, set $p_k = -g_k$).
  5. Compute $r_{j+1} = r_j - \alpha_j M_k d_j$.

  6. [TRUNCATION TEST]
    • If $\|r_{j+1}\| \le \eta_k \|g_k\|$ or $j + 1 >$ ITPCG,
    • exit the algorithm with $p_k = w_{j+1}$.
  7. Compute $\beta_j = r_{j+1}^T z_{j+1} / (r_j^T z_j)$, and $d_{j+1} = z_{j+1} + \beta_j d_j$,
    • where $z_{j+1}$ solves the linear system $B_k z_{j+1} = r_{j+1}$.
  8. Increase $j$ to $j + 1$ and go to Step 2.

As shown in [12], a descent search direction is produced from Algorithm 1 even with indefinite Mk or Bk. Thus, T-IHN is a descent search direction method, whose global convergence follows immediately from the general theory of descent methods [6].
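The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative transcription, not the authors' MATLAB package: it assumes the identity preconditioner B_k = I (the choice stated for this paper), a dense list-of-lists M_k, an outer iteration index k ≥ 1, and the hypothetical function name `tihn_direction`.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def tihn_direction(M, g, k, c=0.5, delta=1e-10, itpcg=80):
    """Search direction p_k from Algorithm 1 with B_k = I (requires k >= 1)."""
    g_norm = dot(g, g) ** 0.5
    eta_k = min(c / k, g_norm)
    steepest = [-gi for gi in g]
    w = [0.0] * len(g)      # w_1 = 0
    r = steepest[:]         # r_1 = -g_k
    z = r[:]                # z_1 solves B_k z_1 = r_1 with B_k = I
    d = z[:]                # d_1 = z_1
    j = 1
    while True:
        Md = matvec(M, d)
        rz, dMd = dot(r, z), dot(d, Md)
        if abs(rz) <= delta or abs(dMd) <= delta:            # Step 2: singularity test
            return steepest if j == 1 else w
        alpha = rz / dMd                                     # Step 3
        w_next = [wi + alpha * di for wi, di in zip(w, d)]
        if dot(g, w_next) >= dot(g, w) + delta:              # Step 4: descent test
            return steepest if j == 1 else w
        r_next = [ri - alpha * mi for ri, mi in zip(r, Md)]  # Step 5
        if dot(r_next, r_next) ** 0.5 <= eta_k * g_norm or j + 1 > itpcg:
            return w_next                                    # Step 6: truncation test
        z_next = r_next[:]                                   # Step 7 with B_k = I
        beta = dot(r_next, z_next) / rz
        d = [zi + beta * di for zi, di in zip(z_next, d)]
        w, r, z = w_next, r_next, z_next
        j += 1                                               # Step 8

# Positive definite case: a tight tolerance (large k) recovers the solution of M p = -g.
p_spd = tihn_direction([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], k=10**6)
# Indefinite case: the descent test rejects the first CG step and -g is returned.
p_indef = tihn_direction([[1.0, 0.0], [0.0, -1.0]], [0.0, 1.0], k=1)
```

In the indefinite example the first conjugate gradient step would move uphill along a direction of negative curvature, so the descent direction test falls back to the steepest descent direction, which is the safeguard that keeps T-IHN a descent method.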

The performance of T-IHN can be improved significantly if a good preconditioner Bk can be selected. However, since Mk is often indefinite for this database problem, its preconditioning causes difficulties in both theory and practice. We avoid such difficulties in this paper by simply setting Bk to be an identity matrix.

5 Numerical results

We developed two MATLAB program packages for IHN and T-IHN for solving the database problem (34), respectively. The IHN program package is only used for numerically studying the convergence behaviors of IHN. Thus, we simply store the incomplete Hessian Mk into a full matrix array and solve the related linear systems by a direct solver. In the T-IHN package, we evaluated and stored only the nonzero block entries of the upper triangular part of Mk, and evaluated the product of Mk with a nonzero vector using only the nonzero block entries of Mk. In both IHN and T-IHN packages, the step length αk was calculated by calling the line search program from the MATLAB library, where we used c1 = 0.01, c2 = 0.9, and an initial guess of one (based on the result of Theorem 4). The test data sets were selected from a large chemical database provided by the Medical College of Wisconsin, in which each member consists of one rat’s renal reactions to different physiological and medical experiments.

As comparisons, we also solved the database problem (34) by the classic Newton method, SD, BFGS, and D-TN. Here BFGS and SD were implemented by calling the minimization program routine, fminunc, from the MATLAB library with the option of HessUpdate as ‘bfgs’ and ‘steepdesc’, respectively. The other options for BFGS and SD included the scaled-identity matrix as the initial Hessian approximation, and the default mixed cubic and quadratic polynomial line search method, which is the same as the one used in the IHN and T-IHN packages. The D-TN program package was the same as the T-IHN package except that the Hessian-vector product Hkd was approximated by the Euler forward finite difference approximation:

$$H_k d \approx \frac{g(x_k + h d) - g(x_k)}{h}, \tag{41}$$

where xk is the kth D-TN iterate, d is a vector, and h is set as

h=max{ςmax{10ς,d},0.1ς}

with ς=2m(1+xkN), = 10−10, and N = ln. The above h is the same as the default setting for D-TN within TNPACK [11]. With (41), each evaluation of $H_k d$ requires one new gradient evaluation. All the tests on SD, BFGS, D-TN, IHN and T-IHN used the same MATLAB program routine we wrote for computing function values $E$ and gradient vectors $g_k$. They also used the same iteration stopping rule (i.e., $\|g(x_k)\| < 10^{-6}$), the same initial iterate generated from SVD (except IHN), and the same datasets with $m = 9$ and $l = 2$. The numerical experiments were made via MATLAB version R2006a on a laptop computer (Latitude D610 with Intel Pentium(R) M 1.86 GHz processor, and 1 GB RAM) at the University of Wisconsin-Milwaukee.
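The forward difference approximation (41) needs only two gradient evaluations, one of which (at the current iterate) is normally already available. The Python sketch below illustrates the formula with a fixed step h = 1e-6, which is an assumption made for simplicity and is not the TNPACK step rule quoted above; the function name and the test quadratic are also illustrative.

```python
def fd_hessian_vector(grad, x, d, h=1e-6):
    """Forward difference approximation of H(x) d, as in (41)."""
    g0 = grad(x)                                       # usually already known
    g1 = grad([xi + h * di for xi, di in zip(x, d)])   # the one new gradient evaluation
    return [(a - b) / h for a, b in zip(g1, g0)]

# Quadratic test function f(x) = x1^2 + x1*x2 + 2*x2^2 with Hessian [[2, 1], [1, 4]].
grad = lambda x: [2.0 * x[0] + x[1], x[0] + 4.0 * x[1]]
Hd = fd_hessian_vector(grad, [0.3, -0.7], [1.0, -1.0])  # exact product is [1, -3]
```

For a quadratic the difference quotient is exact up to rounding; for a general f the error grows with h while the rounding error grows as h shrinks, which is the instability the paper refers to.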

In the numerical tests on IHN, we used a dataset with $n = 80$. As required by the IHN analysis, an initial guess $x_0$ was selected such that all the Hessian and incomplete Hessian matrices $H_k$ and $M_k$ were positive definite (we checked them by evaluating their eigenvalues). Three incomplete Hessian matrices were constructed by using $\xi$ = 0.5, 0.2235 and 0, which gave the cutoff radius $\tau$ = 185.26, 82.81 and 0, respectively, according to (39). The resulting three incomplete Hessian matrices were found to have $\rho$ = 44.46%, 23.84% and 1.25%, respectively, where $\rho$ is the percentage of nonzero entries of an $N \times N$ sparse matrix defined by

$$\rho = \frac{100 \times (\text{total number of nonzero entries})}{N^2}\,\%.$$

Their sparse patterns were plotted in Fig. 1. From this figure we see that the incomplete Hessian with ρ = 1.25% is a block diagonal matrix with each block being a 2 by 2 matrix.
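The sparsity percentage of the block diagonal case can be checked by hand or with a few lines of Python. The sketch below uses hypothetical function names and assumes that every entry of each l × l diagonal block is nonzero.

```python
def nonzero_percentage(M):
    """Percentage rho of nonzero entries of a square matrix given as a list of lists."""
    N = len(M)
    nonzeros = sum(1 for row in M for value in row if value != 0)
    return 100.0 * nonzeros / (N * N)

def block_diagonal_percentage(n, l):
    """rho for n full l-by-l diagonal blocks in an (n*l)-by-(n*l) matrix."""
    return 100.0 * n * l * l / (n * l) ** 2

rho_80 = block_diagonal_percentage(80, 2)    # 320 nonzeros out of 160^2 entries
rho_300 = block_diagonal_percentage(300, 2)  # 1200 nonzeros out of 600^2 entries
```

These values agree with the 1.25% reported for n = 80 and, after rounding, the 0.33% reported for n = 300.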

Fig. 1. The sparse patterns of the three incomplete Hessian matrices of 160 × 160 with the sparsity ratio ρ = 44.46%, 23.84% and 1.25% (from left to right)

Figures 2 and 3 and Table 1 compare the convergence behaviors of the IHN using the above three sparse incomplete Hessian matrices with that of the Newton and SD methods. Here the minimum point $x^*$ used in computing the errors $\|x_k - x^*\|$ for each method was computed beforehand by that same method. From the figures we see that the convergence speed of IHN decreases as the sparsity percentage $\rho$ is reduced. With $\rho$ = 44.46%, the convergence rate of IHN was close to that of the classic Newton method. In the case of $\rho$ = 1.25%, where the incomplete Hessian $M(x_k)$ is a block diagonal matrix with each block being a 2 by 2 matrix, IHN was still found to have a much faster convergence speed than SD.

Fig. 2. Comparisons of the absolute errors of IHN with that of the classic Newton method for solving the database problem (34) with n = 80 and l = 2

Fig. 3. Comparisons of the gradient norms of IHN with that of the Newton and SD methods for solving the database problem (34) with n = 80 and l = 2

Table 1. Comparisons of the convergence of IHN with that of Newton and SD for solving the database problem (34) with n = 80 and l = 2

| Minimizer | Iterations | Final E | Final ‖g‖ |
|---|---|---|---|
| Classic Newton | 53 | 20.663 | 6.74 × 10^−7 |
| IHN, ρ = 44.46% | 68 | 21.334 | 9.43 × 10^−7 |
| IHN, ρ = 23.84% | 105 | 20.776 | 9.62 × 10^−7 |
| IHN, ρ = 1.25% | 279 | 20.538 | 9.76 × 10^−7 |
| SD | 7184 | 20.633 | 9.75 × 10^−7 |

In the numerical experiments on T-IHN, we selected a dataset of 300 members (n = 300) from the large database. We also constructed three incomplete Hessian matrices using ξ = 0.5, 0.1 and 0, which resulted in the cutoff radius τ = 179.89, 35.978 and 0, and the sparsity percentage ρ = 40.22%, 2.58% and 0.33%, respectively. We plotted the sparse patterns of the incomplete matrices with ρ = 40.22% and 2.58% in Figs. 4 and 5. It is interesting to note that the nonzero entries of Mk are distributed across the whole matrix. The one with ρ = 0.33% is a block diagonal matrix with each block being a 2 by 2 matrix.

Fig. 4. The sparse pattern of the 600 × 600 incomplete Hessian matrix with the sparsity ratio ρ = 40.22%

Fig. 5. The sparse pattern of the 600 × 600 incomplete Hessian matrix with ρ = 2.58%

Table 2 gives the performance data on the T-IHN, D-TN, BFGS, and SD methods for solving the database problem (34) with n = 300 and l = 2. Here, funcCount denotes the number of calls to the program routine for evaluating E and g as well as M (T-IHN only), the number in parentheses is the total number of CG iterations within the D-TN and T-IHN methods, and the CPU time is measured by the MATLAB functions tic and toc, where tic saves the current time and toc later uses it to report the elapsed time in seconds. Comparisons of the convergence processes of T-IHN, D-TN, BFGS, and SD in terms of gradient norms are displayed in Figs. 6 and 7.

Table 2. Comparisons of the convergence and performance of T-IHN with that of D-TN, BFGS, and SD for solving the database problem (34) with n = 300 and l = 2. Here the CPU time is measured in seconds

| Minimizer | Iterations (CG iterations) | Final E | Final ‖g‖ | funcCount | CPU (s) |
|---|---|---|---|---|---|
| BFGS | 196 | 330.67 | 1.19 × 10^−6 | 199 | 25.04 |
| D-TN | 19 (846) | 330.67 | 4.45 × 10^−6 | 909 | 27.26 |
| SD | 3531 | 330.67 | 2.76 × 10^−5 | 6683 | 373.15 |
| T-IHN, ρ = 40.22% | 21 (714) | 330.67 | 9.14 × 10^−6 | 83 | 11.28 |
| T-IHN, ρ = 2.58% | 186 (1964) | 330.67 | 9.46 × 10^−6 | 440 | 48.56 |
| T-IHN, ρ = 0.33% | 246 (9403) | 330.67 | 8.85 × 10^−6 | 553 | 61.25 |

Fig. 6. Comparisons of the gradient norms of T-IHN with those of D-TN and BFGS for solving the database problem (34) with n = 300 and l = 2. The convergence rate of T-IHN can be close to that of D-TN and faster than that of BFGS

Fig. 7. Comparisons of the gradient norms of T-IHN with those of SD for solving the database problem (34) with n = 300 and l = 2. Even with very sparse incomplete Hessian matrices, T-IHN converges much faster than SD

From Table 2 and Fig. 6 we see that T-IHN with ρ = 40.22% (i.e., about 60 percent of the entries are zero) not only had a rate of convergence close to that of D-TN and faster than that of BFGS, but also performed better than both D-TN and BFGS. In these tests, T-IHN took less CPU time than D-TN by a factor of about 2.4 and less than BFGS by a factor of about 2.2.

As shown in Table 2, even with a very sparse incomplete Hessian matrix, T-IHN still converged much faster and performed better than SD. T-IHN using the incomplete Hessian matrices with ρ = 2.58% and 0.33% reduced the total CPU time of SD by factors of about 7.68 and 6.09, respectively.
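The CPU-time factors quoted above follow directly from the CPU column of Table 2. The short Python sketch below recomputes them; the dictionary keys and the helper name `speedup` are illustrative labels introduced here, and only the timings themselves come from the table.

```python
# CPU times in seconds, copied from the CPU column of Table 2.
cpu_seconds = {
    "BFGS": 25.04,
    "D-TN": 27.26,
    "SD": 373.15,
    "T-IHN 40.22%": 11.28,
    "T-IHN 2.58%": 48.56,
    "T-IHN 0.33%": 61.25,
}


def speedup(baseline: str, method: str) -> float:
    """Factor by which `method` reduces the CPU time of `baseline`."""
    return cpu_seconds[baseline] / cpu_seconds[method]


for baseline, method in [
    ("D-TN", "T-IHN 40.22%"),
    ("BFGS", "T-IHN 40.22%"),
    ("SD", "T-IHN 2.58%"),
    ("SD", "T-IHN 0.33%"),
]:
    print(f"{method} vs {baseline}: {speedup(baseline, method):.2f}x")
```

The four ratios come out at roughly 2.42, 2.22, 7.68, and 6.09, in agreement with the factors stated in the text.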

These numerical results demonstrate the promising potential of T-IHN as an efficient algorithm for solving the minimization problem (1). In subsequent work, we intend to improve the convergence rate of T-IHN and to study preconditioning for T-IHN, with the aim of further improving its performance on very large database problems.

Acknowledgements

The authors are grateful to Professor Dan Beard from the Medical College of Wisconsin for providing the chemical database. They are also indebted to the two anonymous referees for valuable comments and suggestions.

This project was supported by the National Science Foundation (DMS-0241236, USA), the National Institutes of Health (PHS R01 EB005825-01, USA), the Natural Science Foundation of China (10471062), and Jiangsu Province (BK2006184, China).

Footnotes

1

The Kronecker product ⊗ of an m × n matrix A = (a_ij)_{m×n} with a μ × ν matrix B is defined as the μm × νn block matrix A ⊗ B = (a_ij B)_{m×n}.
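To make the block structure in this definition concrete, here is a minimal pure-Python sketch of the Kronecker product for matrices stored as lists of rows. The function name `kron` and the sample matrices are illustrative choices, not part of the paper.

```python
def kron(A, B):
    """Kronecker product of A (m x n) and B (mu x nu), both lists of rows.

    The result has m*mu rows and n*nu columns; its (i, j) block is a_ij * B.
    """
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

for row in kron(A, B):
    print(row)
# [0, 1, 0, 2]
# [1, 0, 2, 0]
# [0, 3, 0, 4]
# [3, 0, 4, 0]
```

Each 2 × 2 block of the output is the corresponding entry of A multiplied by B, which is the pattern (a_ij B) in the definition.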

Contributor Information

Dexuan Xie, Department of Mathematical Sciences, University of Wisconsin, 3200 N Cramer Street, EMS Building, Room E403, Milwaukee, WI 53211, USA.

Qin Ni, Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, People’s Republic of China.

References

  • 1. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math. Program. 1983;26:190–212.
  • 2. Golub GH, van Loan CF. Matrix Computations. 2nd edn. Johns Hopkins University Press; Baltimore: 1986.
  • 3. Jolliffe IT. Principal Component Analysis. Springer; New York: 1986.
  • 4. Liu DC, Nocedal J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989;45:503–528.
  • 5. Nash SG, Nocedal J. A numerical study of the limited memory BFGS method and the truncated Newton method for large scale optimization. SIAM J. Optim. 1991;1:358–371.
  • 6. Nocedal J, Wright SJ. Numerical Optimization. 2nd edn. Springer; New York: 2006.
  • 7. O’Leary DP. A discrete Newton algorithm for minimizing a function of many variables. Math. Program. 1982;23:20–33.
  • 8. Ortega JM, Rheinboldt WC. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press; New York: 1970.
  • 9. Powell MJD. Some global convergence of a variable metric algorithm for minimization without exact linear search. In: Nonlinear Programming. SIAM-AMS Proceedings, vol. IX. SIAM; Philadelphia: 1976.
  • 10. Schlick T. Molecular Modeling and Simulation, an Interdisciplinary Guide. Springer; New York: 2002.
  • 11. Schlick T, Fogelson A. TNPACK—A truncated Newton minimization package for large-scale problems: I. Algorithm and usage. ACM Trans. Math. Softw. 1992;18:46–70.
  • 12. Xie D, Schlick T. Efficient implementation of the truncated-Newton algorithm for large-scale chemistry applications. SIAM J. Optim. 1999;9:132–154.
  • 13. Xie D, Singh SB, Fluder EM, Schlick T. Principal component analysis combined with truncated-Newton minimization for dimensionality reduction of chemical databases. Math. Program. 2003;95:161–185.
  • 14. Xie D, Tropsha A, Schlick T. An efficient projection protocol for chemical databases: the singular value decomposition combined with truncated-Newton minimization. J. Chem. Inf. Comput. Sci. 2000;40:167–177. doi: 10.1021/ci990333j.
  • 15. Young DM. Iterative Solution of Large Linear Systems. Academic Press; New York: 1971.
