Abstract
GGOPT is a derivative-free non-linear optimizer for smooth functions with added noise. If the function values arise from observations or from extensive computations, these errors can be considerable. GGOPT uses an adjustable mesh together with linear least squares to find smoothed values of the function, gradient and Hessian at the center of the mesh. These values drive a descent method that estimates optimal parameters. The smoothed values usually result in increased accuracy.
Keywords: Optimization, Non-linear, Noisy function
1. Introduction
This modular software routine is designed to generate minimizing sequences for C³ functions with noisy values. It uses a variant of the method in [2] and is implemented in ANSI standard FORTRAN 77.
The object of GGOPT is to increase the accuracy of optimization at the expense of additional computation. The algorithm uses an n-dimensional mesh of 1 + n + n² points to estimate the function value, gradient and Hessian at the center of the mesh by least squares. The accuracy of the fit depends on the location of the mesh points.
GGOPT consists of two phases. Phase I uses an ad hoc method for mesh spacing. On the test problems, and on other problems as well, GGOPT is more efficient than ZXMIN (an IMSL routine) and OPTIF when n ≤ 4, and less efficient otherwise. The approach used in OPTIF is described by Dennis and Schnabel [1]. Efficiency here means the number of function evaluations required to achieve a given accuracy. The single precision version of ZXMIN can handle only very small errors (a relative error of approximately 10⁻⁵), whereas the single precision version of GGOPT handles larger errors. The algorithm of Phase II [3], which is not described here, provides mesh adjustments that yield still greater accuracy.
The noise or error in the data for GGOPT is assumed to be bounded in magnitude, with a distribution whose covariance is a multiple of the identity matrix. The vector of parameter values, x, can be estimated to an accuracy that depends on the magnitude of the error. The method was developed under the assumption that the model function has bounded third derivatives at all points where f is less than its value at the initial parameter vector x. The routine may be particularly useful for improving on the minimum attained by another method, and the Phase II algorithm may in turn be used to improve on the accuracy of GGOPT.
2. Computational method
The algorithm estimates the function value f, the gradient vector G and the symmetric Hessian matrix H by generating a mesh around the current solution x0 (the mesh is generated by perturbing the current solution x0, as described in the algorithm below) and applying linear least squares. It then uses the approximated gradient vector G and Hessian matrix H to compute the step increments for the parameters.
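A quadratic function Ψ(x) is used to approximate f(x) in a neighborhood of x0. Let y = (f(x1), f(x2), …, f(xN))ᵀ be the function values obtained at the points x1, x2, …, xN (points perturbed around x0), where N = 1 + n + n². We use second subscripts to denote components of the parameter vectors, i.e. xij is the jth component of the ith parameter vector, and x0j is the jth component of x0. Let

$$\Psi(x) = f + \sum_{j=1}^{n} G_j\,(x_j - x_{0j}) + \frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n} H_{jk}\,(x_j - x_{0j})(x_k - x_{0k}),$$

and let θ = (f, G1, …, Gn, H11, H12, …, H1n, H22, …, Hnn)ᵀ contain the 1 + n + n(n + 1)/2 distinct unknowns (H is symmetric, so only Hjk with j ≤ k appear). With dij = xij − x0j, the ith row of the matrix A, of dimension N × (1 + n + n(n + 1)/2), is

$$\left(1,\; d_{i1}, \dots, d_{in},\; \tfrac{1}{2}d_{i1}^2,\; d_{i1}d_{i2}, \dots, d_{i1}d_{in},\; \tfrac{1}{2}d_{i2}^2,\; \dots,\; \tfrac{1}{2}d_{in}^2\right),$$

so that Ψ(xi) is the ith component of A · θ.

The problem of obtaining a least squares fit of Ψ(x) to f(x1), f(x2), …, f(xN) can then be formulated as the minimization of ‖A · θ − y‖₂². If Aᵀ · A is non-singular, the solution

$$\theta = (A^T A)^{-1} A^T y$$

gives the smoothed function value f, gradient vector G and Hessian matrix H.

One way of generating a non-singular A is to take the following mesh points:

$$x_0; \qquad x_0 + h_j I_j,\;\; x_0 - h_j I_j \quad (j = 1, \dots, n); \qquad x_0 + h_j I_j - h_k I_k \quad (j, k = 1, \dots, n,\; j \neq k),$$

where Ij denotes the jth column of the identity matrix, and hj and hk are the mesh lengths in the jth and kth components of the parameter vector, respectively. This gives 1 + 2n + n(n − 1) = 1 + n + n² mesh points.

For example, let n = 2 and x0 = (0, 0). There are 7 mesh points: x1 = x0 = (0, 0), x2 = (h1, 0), x3 = (0, h2), x4 = (−h1, 0), x5 = (0, −h2), x6 = (h1, −h2), x7 = (−h1, h2). With θ = (f, G1, G2, H11, H12, H22)ᵀ, the 7 × 6 matrix A is:

$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & h_1 & 0 & h_1^2/2 & 0 & 0 \\ 1 & 0 & h_2 & 0 & 0 & h_2^2/2 \\ 1 & -h_1 & 0 & h_1^2/2 & 0 & 0 \\ 1 & 0 & -h_2 & 0 & 0 & h_2^2/2 \\ 1 & h_1 & -h_2 & h_1^2/2 & -h_1 h_2 & h_2^2/2 \\ 1 & -h_1 & h_2 & h_1^2/2 & -h_1 h_2 & h_2^2/2 \end{pmatrix}$$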
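The mesh construction and least-squares fit can be sketched in a few lines of Python. This is a minimal illustration, not GGOPT itself; it assumes the quadratic model Ψ(x) = f + Gᵀd + ½ dᵀHd with d = x − x0, and orders θ as (f, G1, …, Gn, then the upper triangle of H row by row). The names `mesh_points`, `design_row`, `solve_linear` and `fit_quadratic` are hypothetical, and the normal equations θ = (AᵀA)⁻¹Aᵀy are solved directly as in the text.

```python
# Sketch (assumed parameterization): psi(x) = f + G.d + 0.5 * d^T H d, with d = x - x0.

def mesh_points(x0, h):
    """1 + n + n^2 points: x0, x0 + h_j I_j, x0 - h_j I_j, then x0 + h_j I_j - h_k I_k (j != k)."""
    n = len(x0)
    points = [list(x0)]
    for sign in (1.0, -1.0):
        for j in range(n):
            p = list(x0)
            p[j] += sign * h[j]
            points.append(p)
    for j in range(n):
        for k in range(n):
            if j != k:
                p = list(x0)
                p[j] += h[j]
                p[k] -= h[k]
                points.append(p)
    return points


def design_row(x, x0):
    """Row of A: 1, d_1..d_n, then 0.5*d_j^2 (j == k) or d_j*d_k (j < k), upper triangle row by row."""
    n = len(x0)
    d = [x[j] - x0[j] for j in range(n)]
    row = [1.0] + d
    for j in range(n):
        for k in range(j, n):
            row.append(0.5 * d[j] * d[j] if j == k else d[j] * d[k])
    return row


def solve_linear(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M z = b."""
    m = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, m))
        z[r] = (aug[r][m] - s) / aug[r][r]
    return z


def fit_quadratic(f, x0, h):
    """Smoothed (f, G, H) at x0 from the normal equations (A^T A) theta = A^T y."""
    n = len(x0)
    points = mesh_points(x0, h)
    A = [design_row(p, x0) for p in points]
    y = [f(p) for p in points]
    cols = len(A[0])
    AtA = [[sum(row[r] * row[c] for row in A) for c in range(cols)] for r in range(cols)]
    Aty = [sum(row[r] * yi for row, yi in zip(A, y)) for r in range(cols)]
    theta = solve_linear(AtA, Aty)
    H = [[0.0] * n for _ in range(n)]
    idx = 1 + n
    for j in range(n):
        for k in range(j, n):
            H[j][k] = H[k][j] = theta[idx]
            idx += 1
    return theta[0], theta[1:1 + n], H
```

For an exact quadratic such as 3 + x1 − 2x2 + x1² + 0.5·x1x2 + 2x2², `fit_quadratic` applied at x0 = (0.5, −0.25) with h = (0.1, 0.2) recovers f ≈ 4.3125, G ≈ (1.875, −2.75) and H ≈ [[2, 0.5], [0.5, 4]] up to rounding. For badly scaled meshes a QR-based least-squares solve would be more robust than forming AᵀA.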
The algorithm is defined as follows:
1. Let the iteration index k = 0.
2. Find the right mesh size, h, in terms of ε, the relative error in f(xk).
3. Generate the mesh matrix A.
4. Evaluate f at each of the mesh points and store the values in a vector y.
5. Solve the overdetermined system A · θ = y by least squares, where θ contains the smoothed function value, gradient and Hessian.
6. Extract the Hessian matrix, H, and the gradient vector, G, from θ.
7. Attempt a Cholesky decomposition of H; if H is positive definite, take a Newton step, Δx = H⁻¹ · G; otherwise take a gradient step, Δx = G/‖G‖.
8. Find a step length, γk, by a line search such that f(xk − γk · Δx) < f(xk).
9. Update: xk+1 = xk − γk · Δx.
10. Check the convergence criteria; exit if any one of them is satisfied.
11. Set k = k + 1 and go to (2).
In step (2) above, hk is estimated by a secant update that aims to keep the differences in f(xk) over a step hk at 1/4 of the significance of f(xk) itself.
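Steps (7) to (9) can be sketched as follows. This is an illustration under stated assumptions, not the GGOPT source: the Cholesky factorization doubles as the positive-definiteness test, and since the text does not specify the line search, a simple step-halving (backtracking) search is assumed. The function names `cholesky`, `search_direction` and `backtracking_step` are hypothetical.

```python
import math


def cholesky(H):
    """Lower-triangular L with H = L L^T, or None if H is not positive definite."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None  # factorization fails: H is not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def search_direction(G, H):
    """Newton direction H^{-1} G if H is positive definite, else the normalized gradient G/||G||."""
    n = len(G)
    L = cholesky(H)
    if L is None:
        norm = math.sqrt(sum(g * g for g in G))
        return [g / norm for g in G] if norm > 0.0 else [0.0] * n
    z = [0.0] * n  # forward substitution: L z = G
    for i in range(n):
        z[i] = (G[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    dx = [0.0] * n  # back substitution: L^T dx = z
    for i in range(n - 1, -1, -1):
        dx[i] = (z[i] - sum(L[k][i] * dx[k] for k in range(i + 1, n))) / L[i][i]
    return dx


def backtracking_step(f, x, dx, gamma=1.0, shrink=0.5, max_tries=30):
    """Assumed line search: shrink gamma until f(x - gamma*dx) < f(x); returns (gamma, new_x) or (None, x)."""
    fx = f(x)
    for _ in range(max_tries):
        trial = [xi - gamma * di for xi, di in zip(x, dx)]
        if f(trial) < fx:
            return gamma, trial
        gamma *= shrink
    return None, x
```

Because the update is xk+1 = xk − γk · Δx, the direction is computed from +G; a failed line search (`None`) corresponds to the case where no better point than the current solution is found.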
3. Program instructions
To use the FORTRAN subroutine GGOPT, the user needs a FORTRAN program that calls GGOPT and a function subprogram. The calling program must include the following statement:

```fortran
      call GGOPT(f, n, x, maxit, grdtl, stptl, iout, fmin, epsilon,
     &           istop)
```
3.1. Subroutine arguments
Input
f: a function subprogram supplied by the user. It must be declared external in the calling program. f defines the objective function to be minimized and must have the form f(x).
n: the number of parameters to be estimated.
x: vector of length n containing the initial guess of the parameters at entry.
maxit: maximum number of iterations allowed.
grdtl: convergence tolerance; the convergence condition is satisfied if the Euclidean norm of the gradient is less than or equal to grdtl.
stptl: convergence tolerance; the convergence condition is satisfied if the Euclidean norm of the relative parameter changes is less than or equal to stptl.
iout: logical unit number for printed output.
fmin: convergence tolerance; the convergence condition is satisfied if the objective function value is less than or equal to fmin; set to 0.0 if it is not known.
epsilon (ε): the relative error in the objective function.
Output
x: vector of length n containing the optimized parameters at exit.
grdtl: the Euclidean norm of the gradient at exit.
fmin: the minimized objective function value at exit.
istop: the reason for termination: istop = 0, abnormal termination; istop = 1, the Euclidean norm of the gradient is less than or equal to grdtl; istop = 2, the Euclidean norm of the relative parameter changes is less than or equal to stptl; istop = 3, the number of iterations exceeds maxit; istop = 4, unable to locate a better solution than the current one (the current solution is the best approximation); istop = 5, the objective function value is less than or equal to fmin.
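The stopping rules can be summarized as a small decision function. This is a hypothetical Python sketch of the istop codes 1 to 5 listed above (code 0, abnormal termination, is not modeled). Two details are assumptions not stated in the text: the order in which the tests are applied, and the definition of the relative parameter change, taken here as the change in each component divided by max(|xi|, 1) to avoid division by zero.

```python
import math


def relative_step_norm(x_old, x_new):
    """Assumed definition: Euclidean norm of (x_new_i - x_old_i) / max(|x_old_i|, 1)."""
    return math.sqrt(sum(((b - a) / max(abs(a), 1.0)) ** 2 for a, b in zip(x_old, x_new)))


def stop_code(iteration, maxit, grad_norm, grdtl, step_norm, stptl, fval, fmin, improved=True):
    """Return an istop-style code (1-5), or None to keep iterating; test order is an assumption."""
    if not improved:
        return 4  # no better point than the current solution was found
    if grad_norm <= grdtl:
        return 1  # gradient norm small enough
    if step_norm <= stptl:
        return 2  # relative parameter change small enough
    if fval <= fmin:
        return 5  # objective at or below the target value
    if iteration > maxit:
        return 3  # iteration limit exceeded
    return None
```

With the typical values maxit = 20, grdtl = 0.0001, stptl = 0.00001 and fmin = 0, an iterate with a gradient norm of 10⁻⁵ would stop with code 1 under this ordering.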
3.2. User-defined function
The user must supply a FORTRAN function, f(x), to be minimized; its values may contain round-off and truncation errors. For example, when fitting a complicated mathematical model to a set of biological data, f(x) is the sum of the squares of the fitted residuals. Note that in our formulation the model function can have significant errors. The format of the function is:
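```fortran
      function f(x)
      dimension x(*)
C     ... FORTRAN statements that compute the objective function
C     ... value from the parameters x and assign it to f ...
      return
      end
```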
An important application of GGOPT is the optimization of a physical device. Consider a device whose output voltage depends on parameters that are themselves voltages. The problem is to find the parameter values that maximize the output voltage, i.e. minimize the negative of the output voltage. The device thus acts as an analog computer, and the user-supplied function serves as the interface between the analog computer and the FORTRAN subroutine.
3.3. Typical values for convergence criteria
Typical convergence criteria for the single precision version of GGOPT, with errors at the machine round-off level, are maxit = 20, grdtl = 0.0001, stptl = 0.00001 and fmin = 0.
3.4. Control of output
The standard printed output of this routine consists of the norm of ∇f(x), the function value f(x) and the parameter values x at each iteration, followed by the reason for stopping. The logical unit for output is set by the input argument iout.
4. Sample runs
Three standard test problems with standard starting values [5] and an application to fitting tracer data were used for the sample runs. The standard test problems are the Helical Valley function in three dimensions and the Rosenbrock and Jennrich-Sampson functions in two dimensions. Five levels of relative random noise (ε = 10⁻⁷, 10⁻⁴, 10⁻³, 10⁻² and 5 × 10⁻²) were added to these three functions to test GGOPT in the presence of noise. The tracer data were an indicator dilution curve of D-[³H]glucose from an isolated rabbit heart, obtained at the right ventricular outflow after injection into the coronary arteries. A 3-region (capillary, interstitial fluid space and parenchymal cell), 2-barrier blood-tissue exchange model, BTEX30, was used to fit the data.
4.1. Helical Valley function
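The Helical Valley function with error added is defined as:

$$\tilde f(x) = \left\{100\left[(x_3 - 10\,\theta)^2 + \left(\sqrt{x_1^2 + x_2^2} - 1\right)^2\right] + x_3^2\right\}\left(1 + U[-\varepsilon, \varepsilon]\right),$$

where θ = 1/(2 · π) · tan⁻¹(x2/x1) if x1 > 0, θ = 1/(2 · π) · tan⁻¹(x2/x1) + 0.5 if x1 < 0, and U[−ε, ε] is a uniform random number generated between −ε and ε.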
The minimal function value is 0.0, at x = (1, 0, 0). Table 1 summarizes sample runs for the five relative errors (ε), all from the standard starting value (−1, 0, 0); ITN is the number of iterations.
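A noisy test function of this kind is easy to reproduce. The sketch below uses the standard Helical Valley form from Moré, Garbow and Hillstrom [5]; the multiplicative noise factor (1 + U[−ε, ε]) is an assumption consistent with ε being a relative error. The name `helical_valley_noisy` and the `rng` argument (any object with a `uniform(a, b)` method, such as Python's `random` module) are hypothetical.

```python
import math
import random


def helical_valley_noisy(x, eps, rng=random):
    """Helical Valley function times (1 + U[-eps, eps]); the noise model is assumed."""
    x1, x2, x3 = x
    if x1 == 0.0:
        raise ValueError("theta is defined here only for x1 != 0")
    theta = math.atan(x2 / x1) / (2.0 * math.pi)
    if x1 < 0.0:
        theta += 0.5
    r = math.sqrt(x1 * x1 + x2 * x2)
    f = 100.0 * ((x3 - 10.0 * theta) ** 2 + (r - 1.0) ** 2) + x3 * x3
    return f * (1.0 + rng.uniform(-eps, eps))
```

Without noise the function equals 2500 at the starting value (−1, 0, 0) and 0 at the minimizer (1, 0, 0); with multiplicative noise the minimum value stays exactly 0.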
TABLE 1.
Results of Helical Valley function runs
| ε | ITN | f(x) | x1 | x2 | x3 |
|---|---|---|---|---|---|
| 0.1E−6 | 10 | 0.109671E−18 | 0.100000E+01 | 0.208957E−09 | 0.327986E−09 |
| 0.1E−3 | 13 | 0.110068E−13 | 0.100000E+01 | −0.65673E−07 | −0.10486E−06 |
| 0.1E−2 | 22 | 0.369896E−09 | 0.100000E+01 | 0.121430E−04 | 0.191907E−04 |
| 0.1E−1 | 27 | 0.242588E−07 | 0.999986E+00 | −0.24414E−04 | −0.42822E−04 |
| 0.5E−1 | 29 | 0.215065E−04 | 0.100003E+01 | −0.29952E−02 | −0.47317E−02 |
4.2. Rosenbrock function
The error-added Rosenbrock function is defined as:

$$\tilde f(x) = \left[100\,(x_2 - x_1^2)^2 + (1 - x_1)^2\right]\left(1 + U[-\varepsilon, \varepsilon]\right).$$

The minimal function value is 0.0, at x = (1, 1). Table 2 summarizes the sample runs for the different relative errors (ε), using the standard starting value (−1.2, 1).
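For reproducing such runs, a minimal sketch of the noisy Rosenbrock function follows. The base function is the standard one from [5]; the multiplicative noise (1 + U[−ε, ε]) is an assumption matching the relative-error interpretation of ε, and `rosenbrock_noisy` is a hypothetical name.

```python
import random


def rosenbrock_noisy(x, eps, rng=random):
    """Rosenbrock function 100 (x2 - x1^2)^2 + (1 - x1)^2 times (1 + U[-eps, eps]); noise model assumed."""
    x1, x2 = x
    f = 100.0 * (x2 - x1 * x1) ** 2 + (1.0 - x1) ** 2
    return f * (1.0 + rng.uniform(-eps, eps))
```

At the standard starting value (−1.2, 1) the noise-free value is 24.2.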
TABLE 2.
Results of Rosenbrock function runs
| ε | ITN | f(x) | x1 | x2 |
|---|---|---|---|---|
| 0.1E−06 | 22 | 0.000000E+00 | 0.100000E+01 | 0.100000E+01 |
| 0.1E−03 | 31 | 0.327410E−10 | 0.100001E+01 | 0.100001E+01 |
| 0.1E−02 | 26 | 0.336418E−08 | 0.100006E+01 | 0.100011E+01 |
| 0.1E−01 | 47 | 0.354608E−06 | 0.999406E+00 | 0.998808E+00 |
| 0.5E−01 | 44 | 0.428770E−02 | 0.932850E+00 | 0.870257E+00 |
4.3. Jennrich-Sampson function
The noise-added Jennrich-Sampson function is

$$\tilde f(x) = \left[\sum_{i=1}^{10}\left(2 + 2i - e^{i x_1} - e^{i x_2}\right)^2\right]\left(1 + U[-\varepsilon, \varepsilon]\right).$$

The minimal function value is 124.362, at x = (0.2578, 0.2578), when the relative error ε is at or below the computer round-off level. When the relative error is above the round-off level, the minimal function value is 124.362 ± 124.362 · ε. These function values should therefore not be used to compare runs with different relative errors, but the solutions x can be compared across error levels. Table 3 summarizes the test results for the five relative errors, using the standard starting value (0.3, 0.4). For this starting value, ZXMIN failed at a relative error of 10⁻⁴.
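The Jennrich-Sampson test can be sketched the same way. The sum with m = 10 terms is the standard form from [5]; the multiplicative noise factor is again an assumption consistent with the stated range 124.362 ± 124.362 · ε, and `jennrich_sampson_noisy` is a hypothetical name.

```python
import math
import random


def jennrich_sampson_noisy(x, eps, rng=random, m=10):
    """sum_{i=1}^{m} (2 + 2i - exp(i*x1) - exp(i*x2))^2 times (1 + U[-eps, eps]); noise model assumed."""
    x1, x2 = x
    f = sum((2.0 + 2.0 * i - math.exp(i * x1) - math.exp(i * x2)) ** 2 for i in range(1, m + 1))
    return f * (1.0 + rng.uniform(-eps, eps))
```

Evaluated without noise at (0.2578, 0.2578), the function is close to 124.362; with noise, each call returns a value scaled by a fresh factor in [1 − ε, 1 + ε], which is why function values from runs at different ε are not directly comparable.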
TABLE 3.
Results of Jennrich and Sampson function runs
| ε | ITN | f(x) | x1 | x2 |
|---|---|---|---|---|
| 0.1E−06 | 10 | 124.366 | 0.258766 | 0.256845 |
| 0.1E−03 | 7 | 124.355 | 0.257960 | 0.257597 |
| 0.1E−02 | 8 | 124.282 | 0.258399 | 0.256323 |
| 0.1E−01 | 36 | 124.283 | 0.251413 | 0.258590 |
| 0.5E−01 | 22 | 124.213 | 0.230784 | 0.272401 |
4.4. Fitting tracer data with blood tissue exchange model
Consider the application of GGOPT to fitting a blood–tissue exchange model (BTEX30) to recorded indicator dilution curves. In Fig. 1, the open circles show a deoxyglucose outflow dilution curve (normalized CD(t) curve) and the plus symbols show the albumin outflow dilution curve, which was used as an intravascular reference for the deoxyglucose curve. These outflow dilution curves were obtained by Kuikka et al. [4] from an isolated rabbit heart by injection of tracer-labeled albumin and deoxyglucose into the aortic root. The model is the solution of the 3-region convection–diffusion partial differential equations required to describe the concentration gradients along the length of the capillary. Regions are denoted by subscripts: p for plasma in the capillary, isf for interstitial fluid space and pc for parenchymal cells. The concentration inside the capillary, Cp(x, t), is then given by the equation:
where is the volume of distribution for the solute in the capillary (ml · g−1), is the velocity (cm ·s−1) in the capillary, and L is an arbitrary capillary length.
Fig. 1.
Fitting of the blood–tissue exchange model to the deoxyglucose outflow dilution curve of an isolated rabbit heart. The albumin dilution curve defines transport through the intravascular region without exchange with the tissue. From the initial solution (dotted line) using guessed parameter values, 11 iterations were used to reach the final solution (solid line).
The equations for the concentration in the interstitium, Cisf, and cell region, Cpc, are similar except that there are no flow terms. The equations:
where Gpc is an intracellular clearance (ml · g−1 · s−1) and is equivalent to Kseq, a first order rate constant (s−1) for intracellular sequestration or binding which is irreversible. and are interstitial and intracellular volumes of distribution.
The parameters, PSg, PSpc, Gpc, and were obtained by fitting the solution Cp(x = L, t) to the dilution curve. The data were obtained at discrete time points, CD(ti) for i=1, …, m, and the objective function to be minimized is defined as the coefficient of variation, with m − 5 degrees of freedom (the number of data points minus the number of free parameters):
The testing of various relative errors, ε, added to the model function Cp(x = L, ti), is of less relevance in this test because of the relatively large noise levels of the data, CD(ti). Model solutions will usually have errors that are orders of magnitude less than the residual differences between model function and data, in this case represented by a coefficient of variation of 0.072 in the final fitting.
The objective function, the coefficient of variation measuring the deviation between model and data, was reduced from 0.96 to 0.072. We chose a set of arbitrary but physiologically reasonable starting parameter values, PSg = 0.02, PSpc = 0.02, Gpc = 0.02 ml · g⁻¹ · s⁻¹, , and . The estimates of these transport parameters obtained by GGOPT were PSg = 0.0338, PSpc = 0.0185, Gpc = 0.00675 ml · g⁻¹ · s⁻¹, , and . In Fig. 1, the open circles are the deoxyglucose outflow dilution curve and the solid line is the model solution optimized by GGOPT; the dotted line is the model solution at the initial guessed parameter values. The model fit the data consistently over the whole curve.
Acknowledgments
This work was supported by grants RR 01243 and EB08407 from the National Institutes of Health.
References
1. Dennis JE, Schnabel RB. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall; Englewood Cliffs, NJ: 1983.
2. Glad T, Goldstein AA. Optimization of functions whose values are subject to small errors. BIT. 1977;17:160–169.
3. Goldstein AA, Chan IS, Bassingthwaighte JB, Russak IB. On the minimization of functions with errors. Descriptive manuscript and code available on request.
4. Kuikka J, Levin M, Bassingthwaighte JB. Multiple tracer dilution estimates of D- and 2-deoxy-D-glucose uptake by the heart. Am J Physiol (Heart Circ Physiol 19). 1986;250:H29–H42. doi: 10.1152/ajpheart.1986.250.1.H29.
5. Moré JJ, Garbow BS, Hillstrom KE. Testing unconstrained optimization software. ACM Trans Math Softw. 1981;7(1):17–41.

