Abstract
To efficiently solve a large scale unconstrained minimization problem with a dense Hessian matrix, this paper proposes to use an incomplete Hessian matrix to define a new modified Newton method, called the incomplete Hessian Newton method (IHN). A theoretical analysis shows that IHN is convergent globally, and has a linear rate of convergence with a properly selected symmetric, positive definite incomplete Hessian matrix. It also shows that the Wolfe conditions hold in IHN with a line search step length of one. As an important application, an effective IHN and a modified IHN, called the truncated-IHN method (T-IHN), are constructed for solving a large scale chemical database optimal projection mapping problem. T-IHN is shown to work well even with indefinite incomplete Hessian matrices. Numerical results confirm the theoretical results of IHN, and demonstrate the promising potential of T-IHN as an efficient minimization algorithm.
Keywords: Unconstrained minimization, Incomplete Hessian matrix, Convergence analysis, Truncated Newton, Chemical database analysis
1 Introduction
Let f be a twice continuously differentiable multivariable function defined on a bounded domain, D, of the n-dimensional Euclidean real vector space Rn. We consider a large scale unconstrained optimization problem:
$$f(x^*) = \min_{x \in D} f(x), \tag{1}$$
where x* lies in the interior of D, f(x) and the gradient vector g(x) (i.e., the first derivative of f at x) are expensive to evaluate, and the Hessian matrix H(x) (i.e., the second derivative of f at x) is dense and can be evaluated analytically. In practice, however, when n is large enough, it is infeasible to evaluate the whole Hessian H(x) because of the cost of computation and storage. Typical examples of (1) often arise from biomolecular energy function minimization problems and chemical database optimal projection problems as well as many other scientific and engineering applications. The minimization of a biomolecular potential energy function is one of the fundamental tasks in biomolecular simulations [10], while solving the optimal projection mapping problem is a key step in large scale chemical database analysis [10, 13, 14]. Developing efficient numerical minimization algorithms is essential in these application fields.
Currently, typical algorithms for solving (1) include the steepest descent method (SD), the nonlinear conjugate gradient method (CG), the limited-memory BFGS method (L-BFGS) [4-6], and the discrete truncated Newton method (D-TN) [6, 7]. None of them requires any evaluation of the Hessian matrix; they use only gradient vectors. In SD and CG, the gradient vectors are employed to construct descent search directions. To obtain faster minimization algorithms, L-BFGS and D-TN use the gradient vectors to construct approximate Hessian matrices, which makes them modified Newton type algorithms [6]. For example, in D-TN, an approximate Hessian matrix is generated implicitly by using gradient vectors to construct a finite difference approximation to the Hessian-vector product, which is the only place where H(x) occurs in the truncated Newton method (TN) [1]. However, such a finite difference formula is inherently numerically unstable, which may significantly disturb the numerical behavior of D-TN.
In this paper, we study the idea of constructing an approximate Hessian, M(x), directly from the Hessian matrix H(x), which is available in Problem (1) and should be used, even partially, in developing fast minimization algorithms. With a properly selected sparse pattern and sparse matrix techniques, we can simply construct M(x) as an incomplete Hessian matrix, and evaluate it quickly within the available computing and storage capacity. We then substitute it for the Hessian matrix H(x) in a classic modified Newton method to yield an incomplete Hessian Newton method (IHN). With a properly selected M(x), IHN is expected to be numerically stable, easy to implement, and to have high computational performance and a fast convergence rate.
The focus of this paper is to study some basic convergence properties of IHN. For this purpose, we simply assume that M(x) is symmetric, positive definite throughout the domain of f. Following the classic quasi-Newton theory, we prove that IHN converges globally, and has both an R-linear rate of convergence and a Q-linear rate of convergence when M(x) is properly selected. We also prove that the Wolfe conditions hold for IHN with a line search step length of one when the number of IHN iterations is large enough.
As an application, we construct a particular IHN for solving the chemical database optimal projection mapping problem as described in [10, 13, 14]. In this application, the entries of H(x) that correspond to the pairwise distances within a certain short range become a dominating part of H(x). Thus, they can be selected as the nonzero entries of M(x), yielding a good approximation of H(x). In this paper, we construct M(x) by a distance cut-off strategy, and express both H(x) and M(x) in terms of Kronecker products to display the dense and sparse matrix structures of H(x) and M(x). Such a sparse expression of M(x) is valuable in programming M(x) by sparse matrix techniques. To confirm our theoretical results, we carry out numerical experiments on IHN with a real chemical dataset. Numerical results show that the rate of convergence of IHN can be close to that of the classic modified Newton method when the incomplete Hessian is properly selected. Even with a very sparse incomplete Hessian (a block diagonal matrix with each block being a 2 by 2 matrix), IHN was still found to have a much faster rate of convergence than SD.
However, assuming all the incomplete Hessian matrices to be positive definite is often too strong to be satisfied in practice. To relax this assumption and further reduce the computing cost, we use the truncated Newton strategy given in [12] to modify IHN as a descent search direction method, and call it the truncated IHN method (T-IHN) for clarity. T-IHN is shown to converge globally even with indefinite incomplete Hessian matrices. To numerically study the convergence rate and computer performance, we develop a MATLAB program package for T-IHN for solving the chemical database problem using sparse matrix techniques. We then compare the convergence and performance of T-IHN with those of SD, BFGS, and D-TN. Here SD and BFGS were implemented by calling the minimization solver routine fminunc from the MATLAB library, and the MATLAB program of D-TN is the same as that of T-IHN except that it uses the Euler forward finite difference formula to approximate the Hessian-vector product.
Numerical results show that T-IHN using an incomplete Hessian with about 60 percent zero entries has a faster rate of convergence and better performance than BFGS. For a dataset of 300 members, T-IHN took less CPU time than BFGS by a factor of about 2.2. T-IHN was also found to have a rate of convergence close to that of D-TN and better performance than D-TN: in this test, T-IHN took less CPU time than D-TN by a factor of about 2.4. Here we did not compare T-IHN with L-BFGS since the MATLAB library does not contain any L-BFGS program routine. Note that L-BFGS usually has a slower rate of convergence than BFGS. Hence, it can be expected that T-IHN performs better than L-BFGS in both convergence and CPU time.
We also made tests on T-IHN with two very sparse incomplete Hessian matrices: one has about 97.43% of zero entries, and the other has about 99.67% of zero entries. Even so, T-IHN was still found to have much better performances in both convergence rate and CPU time than SD. It took less CPU time by a factor of up to 7.68 than SD. These numerical results demonstrate the promising potential of T-IHN as an efficient solver of minimization problem (1).
The remainder of the paper is organized as follows. We define IHN in Sect. 2, and present its basic convergence properties in Sect. 3. We then describe the IHN and T-IHN methods for solving the database problem in Sect. 4. Finally, the numerical results on IHN and T-IHN are presented in Sect. 5.
2 The IHN method
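Let g(x), H(x) and M(x) denote the gradient vector, Hessian matrix, and incomplete Hessian matrix of f at x, respectively. We assume that both H(x) and M(x) are symmetric, positive definite in the domain of f. The IHN iterative sequence {xk} for solving (1) is defined in the form
$$x_{k+1} = x_k + \alpha_k p_k, \qquad k = 0, 1, 2, \ldots, \tag{2}$$
where x0 is a given initial iterate in the domain, pk is a search direction satisfying
$$M(x_k)\, p_k = -g_k, \tag{3}$$
and αk is a step length satisfying the Wolfe conditions
$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k\, g_k^T p_k, \qquad g(x_k + \alpha_k p_k)^T p_k \ge c_2\, g_k^T p_k \tag{4}$$
for 0 < c1 < c2 < 1. Clearly, pk = −M(xk)−1gk, which is a descent search direction in the sense that
$$g_k^T p_k = -g_k^T M(x_k)^{-1} g_k < 0 \quad \text{whenever } g_k \ne 0. \tag{5}$$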
Denote by mij and hij the (i, j)th entries of M(x) and H(x), respectively. The sparse pattern P of M(x) is a set of index pairs (i, j) at which mij ≠ 0. With a given P, the incomplete Hessian matrix M(x) is defined by
$$m_{ij}(x) = \begin{cases} h_{ij}(x), & (i, j) \in P, \\ 0, & \text{otherwise.} \end{cases}$$
Clearly, the selection of P depends on the problem to be solved and the capacity of the computer used for the implementation. In the two extreme cases, we can set P = Pf or P = Pd, where
$$P_f = \{(i, j) \mid i, j = 1, 2, \ldots, n\}, \qquad P_d = \{(i, i) \mid i = 1, 2, \ldots, n\}. \tag{6}$$
Obviously, the matrices M(x) with Pf and Pd are respectively the original Hessian matrix H(x) and the diagonal matrix with h11(x), h22(x), …, and hnn(x) as the diagonal entries. Hence, the IHN with Pf reduces to the classic Newton method.
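The construction of M(x) from a sparse pattern can be sketched in a few lines. The following is only an illustrative Python sketch: the function names and the 3 × 3 matrix are hypothetical, indices are 0-based, and a dense list-of-lists is used for readability where a real implementation would use sparse storage.

```python
def full_pattern(n):
    """P_f: every index pair (i, j), which reproduces the full Hessian."""
    return {(i, j) for i in range(n) for j in range(n)}


def diagonal_pattern(n):
    """P_d: only the diagonal pairs (i, i)."""
    return {(i, i) for i in range(n)}


def incomplete_hessian(H, pattern):
    """Return M with m_ij = h_ij if (i, j) is in the pattern, else 0."""
    n = len(H)
    return [[H[i][j] if (i, j) in pattern else 0.0 for j in range(n)]
            for i in range(n)]


# Small symmetric example (values chosen only for illustration).
H = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
M_full = incomplete_hessian(H, full_pattern(3))      # equals H
M_diag = incomplete_hessian(H, diagonal_pattern(3))  # diag(4, 3, 2)
```

Any symmetric pattern between these two extremes gives an intermediate trade-off between the cost of forming M(x) and its closeness to H(x).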
3 The convergence analysis of IHN
Let ∥ · ∥ be the 2-norm of a vector/matrix. For clarity, we sometimes write f(xk), g(xk), H(xk), M(xk), g(x*), H(x*), and M(x*) as fk, gk, Hk, Mk, g*, H*, and M*, respectively. Following the general quasi-Newton theory (e.g., see pp. 43–45 in [6]), we have the global convergence theorem for IHN as below.
Theorem 1
Let {xk} be a sequence of IHN iterates defined in (2). If there exists a positive constant η such that
$$\lVert M_k \rVert\, \lVert M_k^{-1} \rVert \le \eta \quad \text{for all } k, \tag{7}$$
then limk→∞∥g(xk)∥ = 0 for any initial iterate x0.
To discuss the convergence rates of IHN, we make Assumptions 1 and 2.
Assumption 1
The IHN iterative sequence {xk} converges to x* for any initial iterate x0. Here g(x*) = 0 and both M(x*) and H(x*) are positive definite.
Assumption 2
There exists a positive integer, k0, such that Wolfe condition (4) is satisfied with αk = 1 for all k ≥ k0.
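Whether the Wolfe conditions hold for a particular step length can be checked directly from their definition (sufficient decrease plus the curvature condition). The sketch below is illustrative only: the function name and the quadratic test function are assumptions made for this example, and the constants c1 = 0.01 and c2 = 0.9 are the values quoted in Sect. 5.

```python
def wolfe_conditions_hold(f, grad, x, p, alpha=1.0, c1=0.01, c2=0.9):
    """Check the sufficient-decrease and curvature conditions at step alpha."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
    slope = dot(grad(x), p)  # directional derivative g_k^T p_k
    decrease = f(x_new) <= f(x) + c1 * alpha * slope
    curvature = dot(grad(x_new), p) >= c2 * slope
    return decrease and curvature


# Illustrative quadratic with Hessian H = [[2, 1], [1, 4]] (not from the paper).
f = lambda x: x[0] ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0] + x[1], x[0] + 4.0 * x[1]]
x = [1.0, 1.0]
# Search direction from the diagonal incomplete Hessian M = diag(2, 4).
p = [-g / m for g, m in zip(grad(x), [2.0, 4.0])]
ok = wolfe_conditions_hold(f, grad, x, p)  # unit step alpha = 1
```

For this example the unit step satisfies both conditions, whereas an overly long step such as alpha = 3 violates the sufficient-decrease condition.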
The following two definitions will be used in our IHN analysis.
Definition 1
The convergence rate of {xk} is R-linear if there exists a number r between 0 and 1 such that $\limsup_{k\to\infty} \lVert x_k - x^* \rVert^{1/k} = r$.
Definition 2
The convergence rate of {xk} is Q-linear if there exists a constant 0 < c < 1 and a positive integer k0 such that ∥xk+1 − x*∥ ≤ c∥xk − x*∥ whenever k ≥ k0.
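Both rates can be estimated numerically from a sequence of errors ∥xk − x*∥. The sketch below is a hypothetical helper for that purpose; the error sequence is synthetic and is not data from the paper.

```python
def q_ratios(errors):
    """Successive ratios e_{k+1} / e_k; Q-linear if eventually <= c < 1."""
    return [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]


def r_roots(errors):
    """k-th roots e_k^(1/k) for k >= 1; their lim sup is the R-linear rate."""
    return [errors[k] ** (1.0 / k) for k in range(1, len(errors))]


# Synthetic error sequence e_k = 0.5^k, used only for illustration.
errors = [0.5 ** k for k in range(8)]
ratios = q_ratios(errors)  # all equal to 0.5
roots = r_roots(errors)    # all approximately 0.5
```

For a sequence that is only R-linear, the successive ratios may exceed one at some steps even though the k-th roots settle below one.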
Using arguments similar to those in [9] and in the proof of Theorem 6.1 in [4], we can prove the R-linear convergence of IHN.
Theorem 2
If Assumption 1 and condition (7) hold, then the IHN iterative sequence {xk} is R-linearly convergent.
The following corollary gives an easy-to-check sufficient condition for the IHN using the sparse pattern Pd defined in (6).
Corollary 1
Let Assumption 1 hold and the sparse pattern of incomplete Hessian matrix Mk be given with Pd in (6). If the diagonal elements of Hk are positive and bounded, then IHN is convergent both globally and R-linearly.
Proof
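We only need to prove that (7) is satisfied. Let $h_{ii}^{(k)}$ for i = 1, …, n be the diagonal elements of Hk. By the assumption, there exist two positive constants b1 and b2 such that $b_1 \le h_{ii}^{(k)} \le b_2$ for all i and k. Thus, since Mk is the diagonal matrix with these entries,
$$\lVert M_k \rVert\, \lVert M_k^{-1} \rVert = \frac{\max_i h_{ii}^{(k)}}{\min_i h_{ii}^{(k)}} \le \frac{b_2}{b_1}.$$
Hence, the proof follows from Theorems 1 and 2.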
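With the diagonal pattern Pd, the linear system for the search direction is trivial to solve, so the whole IHN iteration fits in a short loop. The following is a minimal sketch rather than the authors' implementation: the function name and test problem are hypothetical, and a simple backtracking test on the sufficient-decrease condition stands in for a full Wolfe line search.

```python
def ihn_diagonal(f, grad, hess_diag, x0, c1=0.01, tol=1e-6, max_iter=500):
    """IHN with pattern P_d: M_k = diag(h_11, ..., h_nn)."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            return x, k
        # Solve M_k p = -g componentwise (M_k is diagonal).
        p = [-gi / hi for gi, hi in zip(g, hess_diag(x))]
        slope = sum(gi * pi for gi, pi in zip(g, p))
        alpha = 1.0  # try the unit step first
        # Simplification: backtrack on sufficient decrease only.
        while f([xi + alpha * pi for xi, pi in zip(x, p)]) > f(x) + c1 * alpha * slope:
            alpha *= 0.5
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x, max_iter


# Illustrative quadratic with Hessian [[2, 1], [1, 4]] (not from the paper).
f = lambda x: x[0] ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0] + x[1], x[0] + 4.0 * x[1]]
hess_diag = lambda x: [2.0, 4.0]
x_min, iters = ihn_diagonal(f, grad, hess_diag, [1.0, 1.0])
```

On this small problem the unit step is accepted at every iteration and the iterates approach the minimizer at the origin at a linear rate.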
To discuss the Q-linear rate of convergence, we need Lemmas 1 and 2.
Lemma 1
If Assumption 1 holds, then for any η > 0, there exists ∊ > 0 such that H(x) and M(x) are positive definite, ∥M(x)−1 − M(x*)−1∥ < η, and ∥g(x) − g(x*) − H(x*)(x − x*)∥ < η ∥x − x*∥ whenever ∥x − x*∥ ≤ ∊.
The above lemma can be proved by similar arguments to the ones from Sects. 2.3.3 and 3.1.6 in [8].
Lemma 2
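If H and M are symmetric, positive definite, then
$$\lambda_{\min}(M^{-1}H)\, x^T M x \le x^T H x \le \lambda_{\max}(M^{-1}H)\, x^T M x \quad \text{for all } x \in \mathbb{R}^n, \tag{8}$$
where λmin(M−1H) and λmax(M−1H) denote the smallest and largest eigenvalues of M−1H, respectively.
Proof
Let M1/2 be the square root matrix of M. Set $\tilde{H} = M^{-1/2} H M^{-1/2}$ and y = M1/2x for any nonzero $x \in \mathbb{R}^n$. It is easy to show that $x^T H x = y^T \tilde{H} y$, yT y = xT Mx, and $\tilde{H}$ is similar to M−1H. Hence,
$$\lambda_{\min}(\tilde{H}) \le \frac{y^T \tilde{H} y}{y^T y} \le \lambda_{\max}(\tilde{H}),$$
from which it is then easy to obtain (8).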
Since M(x*) is assumed to be symmetric, positive definite, its square root matrix $M(x^*)^{1/2}$ exists and satisfies $M(x^*) = M(x^*)^{1/2} M(x^*)^{1/2}$. Thus, a new norm, $\lVert \cdot \rVert_*$, can be well defined by
$$\lVert x \rVert_* = \lVert M(x^*)^{1/2} x \rVert \quad \text{for } x \in \mathbb{R}^n. \tag{9}$$
Under this new norm, we obtain the Q-linear convergence rate for IHN in Theorem 3. Here the spectral radius of a matrix, say A, is denoted by ρ(A), which is defined as the largest of the moduli of the eigenvalues of A.
Theorem 3
Let Assumptions 1 and 2 hold. If
| (10) |
then the IHN iterative sequence {xk} has a Q-linear rate of convergence under the norm defined in (9).
Proof
Set $\hat{H} = M(x^*)^{-1/2} H(x^*) M(x^*)^{-1/2}$, and denote by λj the jth eigenvalue of $\hat{H}$ for j = 1, 2, …, n. Clearly, $\hat{H}$ is symmetric, positive definite because of Assumption 1 so that all its eigenvalues are positive. Further, $\hat{H}$ is similar to $M(x^*)^{-1} H(x^*)$; thus, both matrices have the same eigenvalues. Hence, $I - \hat{H}$ has eigenvalues 1 − λj for j = 1, 2, …, n, and from (10) it follows that 0 < λj < 2 or ∣1 − λj∣ < 1 for j = 1, 2, …, n. Therefore, $\rho(I - \hat{H}) < 1$, and we can select a positive real number, t, such that $\rho(I - \hat{H}) < t < 1$.
We next set and . It is easy to see that is symmetric, and similar to . Hence, .
With (2) and Assumption 1, we can get that
| (11) |
The last term of the above expression can then be estimated as below:
| (12) |
To estimate the other two terms of (11), we set , and
| (13) |
Obviously, t < η < 1, and there exists a sufficiently small positive number, γ, satisfying
| (14) |
For the above γ > 0, using the continuity of H(x) and M(x) at x* and similar arguments in the proof of Lemma 1, we can show that there exists positive number ∊ such that
| (15) |
and
| (16) |
whenever ∥y − x*∥ ≤ ∊.
Furthermore, from the definition of the convergence of sequence {xk} it can follow that there exists integer k1 > 0 such that ∥xk − x*∥ ≤ ∊ for k ≥ k1. Hence, by (13) and (15), we can obtain that
and
Finally, applying the above two expressions, (12), (14) and (16) to (11) gives
| (17) |
whenever k ≥ k1. This completes the proof of Theorem 3.
From the above proof we see that the condition $\rho(I - M(x^*)^{-1} H(x^*)) < 1$ follows from (10), and is then used in the remaining part of the proof. Thus, we have
Corollary 2
Let Assumptions 1 and 2 hold. If
$$\rho\bigl(I - M(x^*)^{-1} H(x^*)\bigr) < 1, \tag{18}$$
then IHN has a Q-linear rate of convergence under the norm defined in (9).
According to the theory of linear stationary iterative methods [15], (18) gives a necessary and sufficient condition to guarantee the convergence of an iterative method constructed by the matrix splitting $H(x^*) = M(x^*) - (M(x^*) - H(x^*))$ for solving a linear system with coefficient matrix H(x*). Thus, (18) is a natural condition to construct an efficient IHN.
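As a small worked illustration of the spectral radius condition ρ(I − M(x*)−1H(x*)) < 1 (the form in which Corollary 2 is applied in Sect. 4), consider a 2 × 2 example whose matrices are chosen here only for illustration and do not come from the paper: H(x*) with diagonal entries 2 and off-diagonal entries 1, and M(x*) = diag(2, 2), i.e., the diagonal pattern Pd.

```latex
\[
M(x^*)^{-1} H(x^*) =
\begin{pmatrix} 1 & \tfrac12 \\ \tfrac12 & 1 \end{pmatrix},
\qquad
\lambda\bigl(M(x^*)^{-1} H(x^*)\bigr) = \bigl\{\tfrac12,\ \tfrac32\bigr\},
\]
\[
\rho\bigl(I - M(x^*)^{-1} H(x^*)\bigr)
= \max\bigl\{\,|1 - \tfrac12|,\ |1 - \tfrac32|\,\bigr\}
= \tfrac12 < 1 .
\]
```

Both eigenvalues lie strictly between 0 and 2, so the condition holds and the asymptotic contraction factor in this example is about one half.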
The condition (10) in Theorem 3 depends on the solution x*, which makes it difficult to verify. To improve it, we propose a sufficient condition that only depends on the current iterate in the following corollary.
Corollary 3
Let Assumptions 1 and 2 hold. If Hk is positive definite and there exist positive constant η < 2 and positive integer k1 such that
| (19) |
then IHN has a Q-linear rate of convergence under the norm defined in (9).
Proof
From the property of matrix norm and the assumptions it follows that
Letting k → ∞ in the above expression immediately gives condition (10). Thus, the result follows from Theorem 3.
A sufficient condition is given in the following theorem to guarantee that IHN satisfies Assumption 2.
Theorem 4
Let Assumption 1 hold. If the eigenvalues for i = 1, …, n of have lower and upper bounds λl and λu and there exists positive integer k1 such that
| (20) |
where c1 and c2 are the two constants in the Wolfe conditions of (4), then Assumption 2 holds for IHN.
Proof
By Taylor expansion, we can get that
| (21) |
where uk = xk + tkpk with 0 ≤ tk ≤ 1. Set
| (22) |
From Assumption 1 and (20) it can be shown that both r and ω are positive. Thus, an upper bound for λu − 1 and a lower bound for λl − 1 can be estimated in terms of r and ω as below:
| (23) |
and
| (24) |
Further, by the continuity of H, we can select ∊ > 0 such that
| (25) |
whenever ∥x − x*∥ ≤ ∊ and ∥z − x*∥ ≤ ∊. Based on Assumption 1, we can also find positive integer k2 ≥ k1 such that
| (26) |
and H(xk) is positive definite for all k ≥ k2. As a result of the last two inequalities of (26),
Hence, with (25) and the second inequality of (26), we can get that
| (27) |
Also, by (3), (8) and (23), the term of (21) is estimated as
| (28) |
Therefore, applying (3), the first one of (26), (27), and (28) into (21) gives
| (29) |
This completes the proof of the first inequality of (4) for Assumption 2.
We next prove the second inequality of (4) for Assumption 2.
By the mean value theorem and (3),
| (30) |
where with . Similar to (27), we obtain
| (31) |
A combination of (8) with (20) and (24) gives
| (32) |
Thus, by (3), (26), (30), (31), and (32),
This completes the proof of Theorem 4.
4 The IHN and T-IHN methods for chemical database problem
As an application, we construct IHN and T-IHN for solving the chemical database optimal projection mapping problem described in [10, 13, 14]. The key step is the construction of the incomplete Hessian matrix Mk, which is presented in detail in this section.
With the notation of [13], we recall the database problem as below.
Suppose that a chemical database of n members is characterized by an n × m matrix X:
$$X = (X_1, X_2, \ldots, X_n)^T, \tag{33}$$
where Xi = (xi1, xi2, …, xim)T stands for the ith member of the database, xij denotes the value of the jth chemical descriptor for the ith member, and the distance δij = ∥Xi − Xj∥ measures the similarity of the ith member with the jth member. A key step in the efficient visualization protocol proposed in [13, 14] is to solve the following unconstrained minimization problem:
$$\min_{Y_1, Y_2, \ldots, Y_n \in \mathbb{R}^l} E(Y_1, Y_2, \ldots, Y_n), \tag{34}$$
where Yi = (yi1, yi2, …, yil)T, l is a positive integer less than m, and the objective function E is defined by
| (35) |
Here , and ωij is the weight constant, which is set as if and ωij = 1 if . The parameter η is a small positive number such as 10−12. In [13, 14], the above minimization problem was solved by D-TN with an initial guess generated from SVD/PCA (singular value decomposition or principal component analysis) [2, 3].
Set Y = (Y1, Y2, …, Yn)T. To construct the incomplete Hessian matrix M(Y), we first find the Hessian H(Y) of E(Y) as below:
| (36) |
where Hij(Y) denotes the second derivative of the term with respect to Y. We express Hij(Y) as an n × n block matrix with each block entry, hμν, being an l × l matrix. It is easy to find that
where Πij denotes an l × l matrix defined by
Here , Rij = Yi − Yj, and Il is the l × l identity matrix. Clearly, Hij is sparse with only four nonzero block entries, hii, hjj, hij, and hji, where hii = hjj and hij = hji. In terms of Kronecker product ⊗,1 Hij(Y) can be expressed as
where ei denotes the ith standard unit vector of Rn. Thus, if we define Πii = 0, then the Hessian H(Y) can be written as
| (37) |
Clearly, the first and second terms of (37) give the main block diagonal part and the off-diagonal part of H(Y), respectively. From the above expression it is easy to see that H(Y) is a full dense matrix of N × N with N = nl.
Using distance cutoff strategy, we define the incomplete Hessian matrix M(Y) as follows: With a given cutoff radius, τ > 0, we construct the sparse pattern P by
We then define M(Y) by
| (38) |
Clearly, the above matrix of M(Y) is symmetric due to the symmetry of the submatrix Πij and the definition of P. Hence, in computer implementation, we can only evaluate and store the upper triangular part of M(Y) to reduce the costs of computing and storage.
To properly control the sparsity of M(Y), we propose to select a value of the cutoff radius τ by the formula
$$\tau = \xi\, \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \delta_{ij}, \tag{39}$$
where ξ is an adjusting factor for the sparsity of M(Y). If ξ = 1, τ is the mean of the distances between all pairs of database members, so that roughly half of the entries of M(Y) may be zero. As the value of ξ is reduced to zero, M(Y) becomes the block diagonal matrix Md(Y):
| (40) |
From the block Jacobi convergence theory (see p. 111 in [15], for example) it follows that ρ(I − Md(Y*)−1H(Y*)) < 1 if H(Y*) and 2Md(Y*) − H(Y*) are positive definite and all the main diagonal block entries of H(Y*) are symmetric, positive definite. Hence, by Corollary 2, the IHN method with the incomplete Hessian of (40) is Q-linearly convergent under these conditions.
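The distance cutoff strategy can be sketched as follows. This is only an illustrative Python sketch with hypothetical function names and 0-based indices. It takes τ as ξ times the mean pairwise distance, and it assumes that a block pair (i, j) is kept when δij ≤ τ, which is an assumption made for this example rather than a definition taken from the text.

```python
import math


def pairwise_distances(X):
    """delta_ij = ||X_i - X_j|| for the rows X_i of the data matrix."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def cutoff_radius(delta, xi):
    """tau = xi * (mean of delta_ij over all pairs i < j)."""
    n = len(delta)
    total = sum(delta[i][j] for i in range(n) for j in range(i + 1, n))
    return xi * total / (n * (n - 1) / 2)


def sparse_pattern(delta, tau):
    """ASSUMPTION: keep block (i, j) when delta_ij <= tau (i == j is always kept)."""
    n = len(delta)
    return {(i, j) for i in range(n) for j in range(n) if delta[i][j] <= tau}


# Three illustrative database members with m = 2 descriptors each.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
delta = pairwise_distances(X)
pattern = sparse_pattern(delta, cutoff_radius(delta, 1.0))
```

With ξ = 0 the radius is zero, so only the diagonal blocks survive as long as distinct members have positive distance.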
With the incomplete Hessian M(Y) of (38), we obtain IHN for solving the database problem (34).
We next describe T-IHN for solving the database problem (34).
The T-IHN iterative sequence {xk} is defined in the same form as the one in (2) except that the descent search direction pk is selected as either an iterate of the preconditioned conjugate gradient method (PCG) [2] for solving (3) or −gk (in the worst case) according to the truncated Newton strategy given in [12]. For clarity, the scheme for generating pk is presented in Algorithm 1. The initial iterate x0 of T-IHN is generated by using the SVD/PCA scheme given in [13].
Algorithm 1
(Defining the descent search direction pk for T-IHN)
Let Bk be a preconditioner for Mk, and wj represent the jth PCG iterate for solving (3). Set ηk = min{ck/k, ∥gk∥}, and give ∊ > 0 and ITPCG > 0 (e.g., ck = 0.5, ∊ = 10−6 and ITPCG = 80). The kth descent search direction pk of the T-IHN method is selected by the following steps:
- [INITIALIZATION]
- Set j = 1, w1 = 0, r1 = −gk, and d1 = z1,
- where z1 solves the linear system Bkz1 = −r1.
- [SINGULARITY TEST]
- If either or (e.g., δ = 10−10),
- exit the algorithm with pk = wj (for j = 1, set pk = −gk).
Compute and wj+1 = wj + αjdj.
- [DESCENT DIRECTION TEST]
- If ,
- exit the algorithm with pk = wj (for j = 1, set pk = −gk).
Compute rj+1 = rj −αjMkdj.
- [TRUNCATION TEST]
- If ∥rj+1∥ ≤ ηk ∥gk∥ or j + 1 > ITPCG,
- exit the algorithm with pk = wj+1.
- Compute , and dj+1 = zj+1 + βjdj,
- where zj+1 solves the linear system Bkzj+1 = rj+1.
Increase j to j + 1 and go to Step 2.
As shown in [12], a descent search direction is produced from Algorithm 1 even with indefinite Mk or Bk. Thus, T-IHN is a descent search direction method, whose global convergence follows immediately from the general theory of descent methods [6].
The performance of T-IHN can be improved significantly if a good preconditioner Bk can be selected. However, since Mk is often indefinite for this database problem, its preconditioning causes difficulties in both theory and practice. We avoid such difficulties in this paper by simply setting Bk to be an identity matrix.
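The structure of Algorithm 1 with Bk set to the identity can be sketched as a truncated conjugate gradient loop. This is a hedged sketch, not the authors' code: the function names are hypothetical, the step formulas are the standard conjugate gradient ones, and the exact forms of the singularity and descent direction tests are assumptions labeled in the comments. Only the forcing term ηk = min{ck/k, ∥gk∥} and the truncation test are taken directly from the text.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def tihn_direction(M, g, k=1, c_k=0.5, delta=1e-10, it_pcg=80):
    """Truncated CG on M p = -g with B_k = I; returns a descent direction."""
    gnorm = math.sqrt(dot(g, g))
    eta = min(c_k / k, gnorm)      # forcing term eta_k
    w = [0.0] * len(g)             # w_1 = 0
    r = [-gi for gi in g]          # r_1 = -g_k
    d = r[:]                       # d_1 = z_1 = r_1 because B_k = I
    for j in range(1, it_pcg + 1):
        Md = matvec(M, d)
        rz, curv = dot(r, r), dot(d, Md)
        # Singularity test (assumed form): tiny residual or tiny curvature.
        if abs(rz) <= delta or abs(curv) <= delta:
            return w if j > 1 else [-gi for gi in g]
        alpha = rz / curv
        w_next = [wi + alpha * di for wi, di in zip(w, d)]
        # Descent direction test (assumed form): g^T w must keep decreasing.
        if dot(g, w_next) >= dot(g, w) + delta:
            return w if j > 1 else [-gi for gi in g]
        r_next = [ri - alpha * mi for ri, mi in zip(r, Md)]
        # Truncation test: residual small enough or iteration limit reached.
        if math.sqrt(dot(r_next, r_next)) <= eta * gnorm or j + 1 > it_pcg:
            return w_next
        beta = dot(r_next, r_next) / rz
        d = [zi + beta * di for zi, di in zip(r_next, d)]
        w, r = w_next, r_next
    return w


# Indefinite illustrative matrix: the loop falls back to steepest descent.
p_indef = tihn_direction([[1.0, 0.0], [0.0, -2.0]], [1.0, 1.0])
```

In this sketch an indefinite Mk triggers one of the two early exits, so the returned vector is either a previously accepted iterate or −gk, which keeps the direction a descent direction.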
5 Numerical results
We developed two MATLAB program packages, one for IHN and one for T-IHN, for solving the database problem (34). The IHN program package is used only for numerically studying the convergence behavior of IHN. Thus, we simply store the incomplete Hessian Mk in a full matrix array and solve the related linear systems by a direct solver. In the T-IHN package, we evaluate and store only the nonzero block entries of the upper triangular part of Mk, and evaluate the product of Mk with a vector using only these nonzero block entries. In both packages, the step length αk was calculated by calling the line search program from the MATLAB library, where we used c1 = 0.01, c2 = 0.9, and an initial guess of one (based on the result of Theorem 4). The test data sets were selected from a large chemical database provided by the Medical College of Wisconsin, in which each member consists of one rat’s renal reactions to different physiological and medical experiments.
For comparison, we also solved the database problem (34) by the classic Newton method, SD, BFGS, and D-TN. Here BFGS and SD were implemented by calling the minimization program routine, fminunc, from the MATLAB library with the option of HessUpdate as ‘bfgs’ and ‘steepdesc’, respectively. The other options for BFGS and SD included the scaled-identity matrix as the initial Hessian approximation, and the default mixed cubic and quadratic polynomial line search method, which is the same as the one used in the IHN and T-IHN packages. The D-TN program package was the same as the T-IHN package except that the Hessian-vector product Hkd was approximated by the Euler forward finite difference approximation:
$$H_k d \approx \frac{g(x_k + h d) - g(x_k)}{h}, \tag{41}$$
where xk is the kth D-TN iterate, d is a vector, and h is set as
with , ∊ = 10−10, and N = ln. The above h is the same as the default setting for D-TN within TNPACK [11]. With (41), each evaluation of Hkd requires one new gradient evaluation. All the tests on SD, BFGS, D-TN, IHN and T-IHN used the same MATLAB program routine we wrote for computing function values E and gradient vectors gk. They also used the same iteration stopping rule (i.e., ∥g(xk)∥ < 10−6), the same initial iterate generated from SVD (except IHN), and the same datasets with m = 9 and l = 2. The numerical experiments were carried out with MATLAB version R2006a on a laptop computer (Latitude D610 with Intel Pentium(R) M 1.86 GHz processor, and 1 GB RAM) at the University of Wisconsin-Milwaukee.
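The forward finite difference approximation of a Hessian-vector product used by D-TN can be sketched as below. The function name, the test gradient, and the default step h = 1e-6 are hypothetical choices for illustration; they are not the TNPACK setting of h referred to in the text.

```python
def hessian_vector_fd(grad, x, d, h=1e-6):
    """Approximate H(x) d by (g(x + h d) - g(x)) / h."""
    g0 = grad(x)
    g1 = grad([xi + h * di for xi, di in zip(x, d)])  # one extra gradient call
    return [(a - b) / h for a, b in zip(g1, g0)]


# Illustrative gradient of f(x) = x1^4 + x1*x2 + x2^2 (not from the paper).
grad = lambda x: [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * x[1]]
approx = hessian_vector_fd(grad, [1.0, 2.0], [1.0, 0.0])
# Exact product: H = [[12*x1^2, 1], [1, 2]] at x1 = 1, times d = (1, 0).
exact = [12.0, 1.0]
```

The approximation carries a truncation error proportional to h and a rounding error proportional to 1/h, which is the source of the sensitivity that motivates using entries of the analytic Hessian instead.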
In the numerical tests on IHN, we used a dataset with n = 80. As required by the IHN analysis, an initial guess x0 was selected such that all the Hessian and incomplete Hessian matrices Hk and Mk were positive definite (we checked this by evaluating their eigenvalues). Three incomplete Hessian matrices were constructed by using ξ = 0.5, 0.2235 and 0, which gave the cutoff radius τ = 185.26, 82.81 and 0, respectively, according to (39). The resulting three incomplete Hessian matrices had ρ = 44.46%, 23.84% and 1.25%, respectively, where ρ is the percentage of nonzero entries of an N × N sparse matrix defined by
$$\rho = \frac{\text{number of nonzero entries}}{N^2} \times 100\%.$$
Their sparse patterns are plotted in Fig. 1. From this figure we see that the incomplete Hessian with ρ = 1.25% is a block diagonal matrix with each block being a 2 by 2 matrix.
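The smallest sparsity percentages quoted in this section can be checked by hand. For a block diagonal matrix with n diagonal blocks of size 2 × 2 (so N = 2n with l = 2), and assuming that every entry of each block is nonzero:

```latex
\[
\rho = \frac{4n}{(2n)^2} \times 100\% = \frac{100\%}{n},
\qquad
n = 80:\ \rho = 1.25\%,
\qquad
n = 300:\ \rho \approx 0.33\% .
\]
```

These values agree with the percentages reported for the block diagonal incomplete Hessians with n = 80 and n = 300.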
Fig. 1.
The sparse patterns of the three incomplete Hessian matrices of 160 × 160 with the sparsity ratio ρ = 44.46%, 23.84% and 1.25% (from left to right)
Figures 2 and 3 and Table 1 compare the convergence behavior of IHN using the above three sparse incomplete Hessian matrices with that of the Newton and SD methods. Here the minimum point x* used in computing the errors ∥xk − x*∥ for each method was found beforehand by the same method. From the figures we see that the convergence speed of IHN decreases as the sparsity percentage ρ is reduced. With ρ = 44.46%, the convergence rate of IHN was close to that of the classic Newton method. In the case of ρ = 1.25%, where the incomplete Hessian M(xk) is a block diagonal matrix with each block being a 2 by 2 matrix, IHN was still found to have a much faster convergence speed than SD.
Fig. 2.

Comparisons of the absolute errors of IHN with that of the classic Newton method for solving the database problem (34) with n = 80 and l = 2
Fig. 3.

Comparisons of the gradient norms of IHN with that of the Newton and SD methods for solving the database problem (34) with n = 80 and l = 2
Table 1.
Comparisons of the convergence of IHN with that of Newton and SD for solving the database problem (34) with n = 80 and l = 2
| Minimizer | Iterations | Final E | Final ∥g∥ |
|---|---|---|---|
| Classic Newton | 53 | 20.663 | $6.74 \times 10^{-7}$ |
| IHN ρ = 44.46% | 68 | 21.334 | $9.43 \times 10^{-7}$ |
| IHN ρ = 23.84% | 105 | 20.776 | $9.62 \times 10^{-7}$ |
| IHN ρ = 1.25% | 279 | 20.538 | $9.76 \times 10^{-7}$ |
| SD | 7184 | 20.633 | $9.75 \times 10^{-7}$ |
In the numerical experiments on T-IHN, we selected a dataset of 300 members (n = 300) from the large database. We also constructed three incomplete Hessian matrices using ξ = 0.5, 0.1 and 0, which resulted in the cutoff radius τ = 179.89, 35.978 and 0, and the sparsity percentage ρ = 40.22%, 2.58% and 0.33%, respectively. We plotted the sparse patterns of the incomplete matrices with ρ = 40.22% and 2.58% in Figs. 4 and 5. It is interesting to note that the nonzero entries of Mk are distributed across the whole matrix. The one with ρ = 0.33% is a block diagonal matrix with each block being a 2 by 2 matrix.
Fig. 4.

The sparse pattern of the 600 × 600 incomplete Hessian matrix with the sparsity ratio ρ = 40.22%
Fig. 5.

The sparse pattern of the 600 × 600 incomplete Hessian matrix with ρ = 2.58%
Table 2 gives the performance data on the T-IHN, D-TN, BFGS, and SD methods for solving the database problem (34) with n = 300 and l = 2. Here, funcCount denotes the number of calls to the program routine for evaluating E and g as well as M (T-IHN only), the number in parentheses is the total number of CG iterations within the D-TN and T-IHN methods, and the CPU time is measured by the MATLAB time functions tic and toc, where tic saves the current time that toc uses later to measure the elapsed time in seconds. Comparisons of the convergence processes of T-IHN, D-TN, BFGS, and SD in terms of gradient norms are displayed in Figs. 6 and 7.
Table 2.
Comparisons of the convergence and performance of T-IHN with that of D-TN, BFGS, and SD for solving the database problem (34) with n = 300 and l = 2. Here the CPU time is measured in seconds
| Minimizer | Iterations | Final E | Final ∥g∥ | funcCount | CPU |
|---|---|---|---|---|---|
| BFGS | 196 | 330.67 | $1.19 \times 10^{-6}$ | 199 | 25.04 |
| D-TN | 19 (846) | 330.67 | $4.45 \times 10^{-6}$ | 909 | 27.26 |
| SD | 3531 | 330.67 | $2.76 \times 10^{-5}$ | 6683 | 373.15 |
| T-IHN ρ = 40.22% | 21 (714) | 330.67 | $9.14 \times 10^{-6}$ | 83 | 11.28 |
| T-IHN ρ = 2.58% | 186 (1964) | 330.67 | $9.46 \times 10^{-6}$ | 440 | 48.56 |
| T-IHN ρ = 0.33% | 246 (9403) | 330.67 | $8.85 \times 10^{-6}$ | 553 | 61.25 |
Fig. 6.

Comparisons of the gradient norms of T-IHN with those of D-TN and BFGS for solving the database problem (34) with n = 300 and l = 2. The convergence rate of T-IHN can be close to that of D-TN and faster than that of BFGS
Fig. 7.

Comparisons of the gradient norms of T-IHN with that of SD for solving the database problem (34) with n = 300 and l = 2. Even with very sparse incomplete Hessian matrices, T-IHN is shown to have a much faster convergence speed than SD
From Table 2 and Fig. 6 we see that T-IHN with ρ = 40.22% (i.e., about 60 percent of the entries are zero) not only had a rate of convergence that is close to that of D-TN and faster than that of BFGS, but also performed better than both D-TN and BFGS. In these tests, T-IHN took less CPU time than D-TN by a factor of 2.4 and less than BFGS by a factor of 2.2.
As shown in Table 2, even with a very sparse incomplete Hessian matrix, T-IHN still had a much faster convergence speed and better performance than SD. The T-IHN using the incomplete Hessian with ρ = 2.58% and 0.33% reduced the total CPU time of SD by the factors of about 7.68 and 6.09, respectively.
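The speedup factors quoted in this section follow directly from the CPU times in Table 2, as the following arithmetic check shows (ratios rounded to two decimals):

```latex
\[
\frac{t_{\text{D-TN}}}{t_{\text{T-IHN},\,40.22\%}} = \frac{27.26}{11.28} \approx 2.42,
\qquad
\frac{t_{\text{BFGS}}}{t_{\text{T-IHN},\,40.22\%}} = \frac{25.04}{11.28} \approx 2.22,
\]
\[
\frac{t_{\text{SD}}}{t_{\text{T-IHN},\,2.58\%}} = \frac{373.15}{48.56} \approx 7.68,
\qquad
\frac{t_{\text{SD}}}{t_{\text{T-IHN},\,0.33\%}} = \frac{373.15}{61.25} \approx 6.09 .
\]
```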
These numerical results demonstrate the promising potential of T-IHN as an efficient algorithm for solving the minimization problem (1). In our subsequent work, we intend to further improve the convergence rate of T-IHN and to study the preconditioning issue for T-IHN in order to improve its performance on very large database problems.
Acknowledgements
The authors are grateful to Professor Dan Beard from the Medical College of Wisconsin for providing the chemical database. They are also indebted to the two anonymous referees for valuable comments and suggestions.
This project was supported by the National Science Foundation (DMS-0241236, USA), the National Institutes of Health (PHS R01 EB005825-01, USA), the Natural Science Foundation of China (10471062), and Jiangsu Province (BK2006184, China).
Footnotes
The Kronecker product ⊗ of an m × n matrix, say A = (aij)m×n, with a μ × ν matrix B is defined as the mμ × nν block matrix A ⊗ B = (aijB)m×n.
Contributor Information
Dexuan Xie, Department of Mathematical Sciences, University of Wisconsin, 3200 N Cramer Street, EMS Building, Room E403, Milwaukee, WI 53211, USA.
Qin Ni, Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, People’s Republic of China.
References
- 1. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math. Program. 1983;26:190–212.
- 2. Golub GH, van Loan CF. Matrix Computations. 2nd edn. Johns Hopkins University Press; Baltimore: 1986.
- 3. Jolliffe IT. Principal Component Analysis. Springer; New York: 1986.
- 4. Liu DC, Nocedal J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989;45:503–528.
- 5. Nash SG, Nocedal J. A numerical study of the limited memory BFGS method and the truncated Newton method for large scale optimization. SIAM J. Optim. 1991;1:358–371.
- 6. Nocedal J, Wright SJ. Numerical Optimization. 2nd edn. Springer; New York: 2006.
- 7. O’Leary DP. A discrete Newton algorithm for minimizing a function of many variables. Math. Program. 1982;23:20–33.
- 8. Ortega JM, Rheinboldt WC. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press; New York: 1970.
- 9. Powell MJD. Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In: Nonlinear Programming, SIAM-AMS Proceedings, vol. IX. SIAM; Philadelphia: 1976.
- 10. Schlick T. Molecular Modeling and Simulation, an Interdisciplinary Guide. Springer; New York: 2002.
- 11. Schlick T, Fogelson A. TNPACK: A truncated Newton minimization package for large-scale problems: I. Algorithm and usage. ACM Trans. Math. Softw. 1992;18:46–70.
- 12. Xie D, Schlick T. Efficient implementation of the truncated-Newton algorithm for large-scale chemistry applications. SIAM J. Optim. 1999;9:132–154.
- 13. Xie D, Singh SB, Fluder EM, Schlick T. Principal component analysis combined with truncated-Newton minimization for dimensionality reduction of chemical databases. Math. Program. 2003;95:161–185.
- 14. Xie D, Tropsha A, Schlick T. An efficient projection protocol for chemical databases: the singular value decomposition combined with truncated-Newton minimization. J. Chem. Inf. Comput. Sci. 2000;40:167–177. doi: 10.1021/ci990333j.
- 15. Young DM. Iterative Solution of Large Linear Systems. Academic Press; New York: 1971.

