Abstract
This paper derives new algorithms for signomial programming, a generalization of geometric programming. The algorithms are based on a generic principle for optimization called the MM algorithm. In this setting, one can apply the geometric-arithmetic mean inequality and a supporting hyperplane inequality to create a surrogate function with parameters separated. Thus, unconstrained signomial programming reduces to a sequence of one-dimensional minimization problems. Simple examples demonstrate that the MM algorithm derived can converge to a boundary point or to one point of a continuum of minimum points. Conditions under which the minimum point is unique or occurs in the interior of parameter space are proved for geometric programming. Convergence to an interior point occurs at a linear rate. Finally, the MM framework easily accommodates equality and inequality constraints of signomial type. For the most important special case, constrained quadratic programming, the MM algorithm involves very simple updates.
Keywords: arithmetic-geometric mean inequality, geometric programming, global convergence, MM algorithm, linearly constrained quadratic programming, parameter separation, penalty method, signomial programming
1 Introduction
As a branch of convex optimization theory, geometric programming is next in line to linear and quadratic programming in importance [4,5,15,16]. It has applications in chemical equilibrium problems [14], structural mechanics [5], integrated circuit design [7], maximum likelihood estimation [12], stochastic processes [6], and a host of other subjects [5]. Geometric programming deals with posynomials, which are functions of the form
$$f(x) \;=\; \sum_{\alpha \in S} c_\alpha\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \;=\; \sum_{\alpha \in S} c_\alpha\, x^\alpha. \tag{1}$$
Here the index set S ⊂ ℝn is finite, and all coefficients cα and all components x1, …, xn of the argument x of f(x) are positive. The possibly fractional powers αi corresponding to a particular α may be positive, negative, or zero. For instance, $2x_1^{1/2} + 3x_1^{-1}x_2^{2}$ is a posynomial on the positive orthant of ℝ². In geometric programming we minimize a posynomial f(x) subject to posynomial inequality constraints of the form uj(x) ≤ 1 for 1 ≤ j ≤ q, where the uj(x) are again posynomials. In some versions of geometric programming, equality constraints of posynomial type are permitted [3].
A signomial function has the same form as the posynomial (1), but the coefficients cα are allowed to be negative. A signomial program is a generalization of a geometric program, where the objective and constraint functions can be signomials. From a computational point of view, signomial programming problems are significantly harder to solve than geometric programming problems. After suitable change of variables, a geometric program can be transformed into a convex optimization problem and globally solved by standard methods. In contrast, signomials may have many local minima. Wang et al. [20] recently derived a path algorithm for solving unconstrained signomial programs.
The theory and practice of geometric programming has been stable for a generation, so it is hard to imagine saying anything novel about either. The attractions of geometric programming include its beautiful duality theory and its connections with the arithmetic-geometric mean inequality. The present paper derives new algorithms for both geometric and signomial programming based on a generic device for iterative optimization called the MM algorithm [9,11]. The MM perspective possesses several advantages. First it provides a unified framework for solving both geometric and signomial programs. The algorithms derived here operate by separating parameters and reducing minimization of the objective function to a sequence of one-dimensional minimization problems. Separation of parameters is apt to be an advantage in high-dimensional problems. Another advantage is ease of implementation compared to competing methods of unconstrained geometric and signomial programming [20]. Finally, straightforward generalizations of our MM algorithms extend beyond signomial programming.
We conclude this introduction by sketching a roadmap to the rest of the paper. Section 2 reviews the MM algorithm. Section 3 derives an MM algorithm for unconstrained signomial programs from two simple inequalities. The behavior of the MM algorithm is illustrated on a few numerical examples in Section 4. Section 5 extends the MM algorithm from unconstrained problems to constrained ones via the penalty method. Section 6 specializes to linearly constrained quadratic programming on the positive orthant. Convergence results are discussed in Section 7.
2 Background on the MM Algorithm
The MM principle involves majorizing the objective function f(x) by a surrogate function g(x | xm) around the current iterate xm (with ith component xmi) of a search. Majorization is defined by the two conditions
$$g(x_m \mid x_m) \;=\; f(x_m), \qquad g(x \mid x_m) \;\ge\; f(x) \quad \text{for all } x. \tag{2}$$
In other words, the surface x ↦ g(x | xm) lies above the surface x ↦ f(x) and is tangent to it at the point x = xm. Construction of the majorizing function g(x | xm) constitutes the first M of the MM algorithm.
The second M of the algorithm minimizes the surrogate g(x | xm) rather than f(x). If xm+1 denotes the minimizer of g(x | xm), then this action forces the descent property f(xm+1) ≤ f(xm). This fact follows from the inequalities
$$f(x_{m+1}) \;\le\; g(x_{m+1} \mid x_m) \;\le\; g(x_m \mid x_m) \;=\; f(x_m),$$
reflecting the definition of xm+1 and the tangency conditions (2). The descent property makes the MM algorithm remarkably stable. Strictly speaking, the validity of the descent property depends only on decreasing g(x | xm), not on minimizing g(x | xm).
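In code, the two M's amount to a short driver loop. The sketch below is ours (the routine names and the relative-change stopping rule are assumptions, not part of the algorithm's specification); the caller supplies the objective f and a routine that minimizes, or at least decreases, the surrogate around the current iterate.

```python
# A minimal sketch of a generic MM driver (names and stopping rule are our own).
# argmin_surrogate(x) should return the minimizer (or any improving point) of
# the surrogate g(. | x) built around the current iterate x.

def mm_minimize(f, argmin_surrogate, x0, tol=1e-9, max_iter=1000):
    x = x0
    fx = f(x)
    for _ in range(max_iter):
        x_new = argmin_surrogate(x)       # second M: minimize g(. | x)
        fx_new = f(x_new)                 # descent property: fx_new <= fx
        if abs(fx - fx_new) <= tol * (abs(fx) + 1.0):
            return x_new
        x, fx = x_new, fx_new
    return x
```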
3 Unconstrained Signomial Programming
The art in devising an MM algorithm revolves around intelligent choice of the majorizing function. For signomial programming problems, fortunately one can invoke two simple inequalities. For terms with positive coefficients cα, we use the arithmetic-geometric mean inequality
$$\prod_{i=1}^n z_i^{\alpha_i} \;\le\; \sum_{i=1}^n \frac{\alpha_i}{\|\alpha\|_1}\, z_i^{\|\alpha\|_1} \tag{3}$$
for nonnegative numbers zi and αi and ℓ1 norm ‖α‖1 = Σi αi [19]. If we make the choice zi = xi/xmi in inequality (3), then the majorization
$$x^\alpha \;\le\; \sum_{i=1}^n \frac{\alpha_i}{\|\alpha\|_1}\, x_m^\alpha \left(\frac{x_i}{x_{mi}}\right)^{\|\alpha\|_1} \tag{4}$$
emerges, with equality when x = xm. We can broaden the scope of the majorization (4) to cases with αi < 0 by replacing zi by the reciprocal ratio xmi/xi whenever αi < 0. Thus, for terms with cα > 0, we have the majorization
$$c_\alpha\, x^\alpha \;\le\; c_\alpha \sum_{i=1}^n \frac{|\alpha_i|}{\|\alpha\|_1}\, x_m^\alpha \left(\frac{x_i}{x_{mi}}\right)^{\operatorname{sgn}(\alpha_i)\|\alpha\|_1},$$
where sgn(αi) is the sign function and the ℓ1 norm is now computed as ‖α‖1 = Σi |αi|.
The terms with cα < 0 are handled by a different majorization. Our point of departure is the minorization
$$z \;\ge\; 1 + \ln z, \qquad z > 0,$$
with equality at the point z = 1; it is just the supporting hyperplane inequality $e^u \ge 1 + u$ written in terms of $z = e^u$. If we let $z = \prod_{i=1}^n (x_i/x_{mi})^{\alpha_i}$, then it follows that
$$x^\alpha \;\ge\; x_m^\alpha \Bigl[\,1 + \sum_{i=1}^n \alpha_i\,(\ln x_i - \ln x_{mi})\Bigr] \tag{5}$$
is a valid minorization in x around the point xm. Multiplication by the negative coefficient cα now gives the desired majorization. The surrogate function separates parameters and is convex when all of the αi are positive.
In summary, the objective function (1) is majorized up to an irrelevant additive constant by the sum
$$g(x \mid x_m) \;=\; \sum_{i=1}^n g_i(x_i \mid x_m), \qquad
g_i(x_i \mid x_m) \;=\; \sum_{\alpha \in S_+} c_\alpha\, \frac{|\alpha_i|}{\|\alpha\|_1}\, x_m^\alpha \left(\frac{x_i}{x_{mi}}\right)^{\operatorname{sgn}(\alpha_i)\|\alpha\|_1}
\;+\; \sum_{\alpha \in S_-} c_\alpha\, x_m^\alpha\, \alpha_i \ln x_i, \tag{6}$$
where S+ = {α : cα > 0}, and S− = {α : cα < 0}. To guarantee that the next iterate is well defined and occurs on the interior of the parameter domain, it is helpful to assume for each i that at least one α ∈ S+ has αi positive and at least one α ∈ S+ has αi negative. Under these conditions each gi(xi | xm) is coercive and attains its minimum on the open interval (0, ∞).
Minimization of the majorizing function is straightforward because the surrogate functions gi(xi | xm) are univariate functions. The derivative of gi(xi | xm) with respect to its left argument equals
$$\frac{d}{dx_i}\, g_i(x_i \mid x_m) \;=\; \sum_{\alpha \in S_+} c_\alpha\, \alpha_i\, \frac{x_m^\alpha}{x_i} \left(\frac{x_i}{x_{mi}}\right)^{\operatorname{sgn}(\alpha_i)\|\alpha\|_1}
\;+\; \sum_{\alpha \in S_-} c_\alpha\, \alpha_i\, \frac{x_m^\alpha}{x_i}.$$
Assuming that the exponents αi are integers, this is a rational function of xi, and once we equate it to 0, we are faced with solving a polynomial equation. This task can be accomplished by bisection or by Newton’s method. In practice, just a few steps of either algorithm suffice since the MM principle merely requires decreasing the surrogate functions gi(xi | xm).
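As a concrete sketch (the routine names, the bracketing interval, and the geometric-midpoint bisection are our own choices), one MM sweep for a signomial stored as a coefficient vector c and an exponent matrix P might look as follows:

```python
import numpy as np

# A minimal sketch of one separated MM sweep for a signomial
# f(x) = sum_k c[k] * prod_i x_i**P[k, i].  Each coordinate solves
# d/dx_i g_i(x_i | x_m) = 0 by bisection; a few steps suffice because the MM
# principle only requires decreasing the surrogate.

def mm_sweep(c, P, xm, lo=1e-8, hi=1e8, iters=60):
    c, P, xm = np.asarray(c, float), np.asarray(P, float), np.asarray(xm, float)
    fm = c * np.prod(xm ** P, axis=1)          # c_alpha * xm**alpha per term
    norms = np.abs(P).sum(axis=1)              # ||alpha||_1 per term
    pos = c > 0                                # S_+ terms; the rest form S_-
    x_new = xm.copy()
    for i in range(len(xm)):
        s = np.sign(P[:, i])
        def dgi(xi):                           # derivative of the surrogate g_i
            ratio = (xi / xm[i]) ** (s[pos] * norms[pos])
            return (np.sum(fm[pos] * P[pos, i] * ratio)
                    + np.sum(fm[~pos] * P[~pos, i])) / xi
        a, b = lo, hi                          # assumed to bracket the root
        for _ in range(iters):
            mid = np.sqrt(a * b)               # geometric midpoint on (0, inf)
            if dgi(mid) > 0:
                b = mid
            else:
                a = mid
        x_new[i] = np.sqrt(a * b)
    return x_new
```

Repeated calls x = mm_sweep(c, P, x) drive the objective downhill; a handful of bisection steps per coordinate already secures the descent property.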
In a geometric program, the derivative of gi(xi | xm) has a single root on the interval (0, ∞). For a proof of this fact, note that making the standard change of variables xi = eyi eliminates the positivity constraint xi > 0. Because |αi| sgn(αi)² = |αi|, the second derivative of the transformed function hi(yi | xm) = gi(xi | xm) equals
$$\frac{d^2}{dy_i^2}\, h_i(y_i \mid x_m) \;=\; \sum_{\alpha \in S_+} c_\alpha\, |\alpha_i|\, \|\alpha\|_1\, x_m^\alpha \left(\frac{x_i}{x_{mi}}\right)^{\operatorname{sgn}(\alpha_i)\|\alpha\|_1}, \qquad x_i = e^{y_i},$$
and is positive. Hence, hi(yi | xm) is strictly convex and possesses a unique minimum point. These arguments yield the even sweeter dividend that the MM iteration map is continuously differentiable. From the vantage point of the implicit function theorem [8], the stationary condition $\frac{d}{dy_i} h_i(y_{m+1,i} \mid x_m) = 0$ determines ym+1,i, and consequently xm+1,i, in terms of xm. Observe here that $\frac{d^2}{dy_i^2} h_i(y_{m+1,i} \mid x_m) > 0$, as required by the implicit function theorem.
It is also worth pointing out that even more functions can be brought under the umbrella of signomial programming. For instance, majorization of the two related functions − ln f(x) and ln f(x) is possible for any posynomial f(x) of the form (1). In the first case, the bound
$$-\ln f(x) \;\le\; -\ln f(x_m) \;-\; \sum_{i=1}^n b_{mi}\,(\ln x_i - \ln x_{mi}) \tag{7}$$
holds for $a_{m\alpha} = c_\alpha x_m^\alpha / f(x_m)$ and $b_m = \sum_{\alpha \in S} \alpha\, a_{m\alpha}$ because Jensen's inequality applies to the convex function − ln t. In the second case, the supporting hyperplane inequality applied to the convex function − ln t implies
$$\ln f(x) \;\le\; \ln f(x_m) + \frac{f(x)}{f(x_m)} - 1.$$
This puts us back in the position of needing to majorize a posynomial, a problem we have already discussed in detail. By our previous remarks, the coefficients cα can be negative as well as positive in this case. Similar majorizations apply to any composition ϕ ◦ f(x) of a posynomial f(x) with an arbitrary concave function ϕ(y).
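Spelled out, the supporting hyperplane inequality for a differentiable concave ϕ gives
$$\phi\bigl(f(x)\bigr) \;\le\; \phi\bigl(f(x_m)\bigr) + \phi'\bigl(f(x_m)\bigr)\,\bigl[f(x) - f(x_m)\bigr],$$
so the composite objective is majorized once f(x) itself is majorized (when ϕ′(f(xm)) ≥ 0) or minorized (when ϕ′(f(xm)) < 0), and inequality (5) supplies the required minorization term by term.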
4 Examples of Unconstrained Minimization
Our first examples demonstrate the robustness of the MM algorithms in minimization and illustrate some of the complications that occur. In each case we can explicitly calculate the MM updates. To start, consider the posynomial
$$f_1(x) \;=\; x_1^{-3} + 3\, x_1^{-1} x_2^{-2} + x_1 x_2$$
with the implied constraints x1 > 0 and x2 > 0. The majorization (4) applied to the third term of f1(x) yields
$$x_1 x_2 \;\le\; \frac{x_{m2}}{2 x_{m1}}\, x_1^2 + \frac{x_{m1}}{2 x_{m2}}\, x_2^2.$$
Applied to the second term of f1(x) using the reciprocal ratios, it gives
$$3\, x_1^{-1} x_2^{-2} \;\le\; \frac{x_{m1}^2}{x_{m2}^2}\, x_1^{-3} + \frac{2\, x_{m2}}{x_{m1}}\, x_2^{-3}.$$
The sum g(x | xm) of the two surrogate functions
$$g_1(x_1 \mid x_m) \;=\; \Bigl(1 + \frac{x_{m1}^2}{x_{m2}^2}\Bigr) x_1^{-3} + \frac{x_{m2}}{2 x_{m1}}\, x_1^2, \qquad
g_2(x_2 \mid x_m) \;=\; \frac{2\, x_{m2}}{x_{m1}}\, x_2^{-3} + \frac{x_{m1}}{2 x_{m2}}\, x_2^2$$
majorizes f1(x). If we set the derivatives
$$\frac{d}{dx_1} g_1(x_1 \mid x_m) \;=\; -3\Bigl(1 + \frac{x_{m1}^2}{x_{m2}^2}\Bigr) x_1^{-4} + \frac{x_{m2}}{x_{m1}}\, x_1, \qquad
\frac{d}{dx_2} g_2(x_2 \mid x_m) \;=\; -\frac{6\, x_{m2}}{x_{m1}}\, x_2^{-4} + \frac{x_{m1}}{x_{m2}}\, x_2$$
of each of these equal to 0, then the updates
$$x_{m+1,1} \;=\; \Bigl[\frac{3\, x_{m1}\,(x_{m1}^2 + x_{m2}^2)}{x_{m2}^3}\Bigr]^{1/5}, \qquad
x_{m+1,2} \;=\; \Bigl(\frac{6\, x_{m2}^2}{x_{m1}^2}\Bigr)^{1/5}$$
solve the minimization step of the MM algorithm. It is also obvious that the point x1 = x2 = 6^{1/5} ≈ 1.4310 is a fixed point of the updates, and the reader can check that it minimizes f1(x).
It is instructive to consider the slight variations
$$f_2(x) \;=\; x_1^{-1} x_2^{-2} + x_1 x_2^2, \qquad f_3(x) \;=\; x_1^{-1} x_2^{-2} + x_1 x_2$$
of this objective function. In the first case, the reader can check that the MM algorithm iterates according to
$$x_{m+1,1} \;=\; \frac{x_{m1}}{(x_{m1} x_{m2}^2)^{1/3}}, \qquad x_{m+1,2} \;=\; \frac{x_{m2}}{(x_{m1} x_{m2}^2)^{1/3}}.$$
In the second case, it iterates according to
$$x_{m+1,1} \;=\; \Bigl(\frac{x_{m1}}{x_{m2}}\Bigr)^{3/5}, \qquad x_{m+1,2} \;=\; 2^{1/5} \Bigl(\frac{x_{m2}}{x_{m1}}\Bigr)^{2/5}.$$
The objective function f2(x) attains its minimum value whenever x1 x2² = 1. The MM algorithm for f2(x) converges after a single iteration to the value 2, but the converged point depends on the initial point x0. The infimum of f3(x) is 0. This value is attained asymptotically by the MM algorithm, which, started from the point x0 = (1, 1) of Table 1, satisfies the identities x_{m+1,1} = 2^{−3/25} x_{m1} and x_{m+1,2} = 2^{2/25} x_{m2} for all m ≥ 1. These results imply that xm1 tends to 0 and xm2 to ∞ in such a manner that f3(xm) tends to 0. One could not hope for much better behavior of the MM algorithm in these two examples.
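A few lines of Python (ours, iterating the f3 updates displayed above from x0 = (1, 1)) confirm the geometric decay numerically:

```python
# Numerical check: starting from (1, 1), the per-coordinate ratios settle at
# 2**(-3/25) and 2**(2/25), and f3(x_m) decays toward its infimum of 0.

def f3(x1, x2):
    return 1.0 / (x1 * x2**2) + x1 * x2

x1, x2 = 1.0, 1.0
for m in range(5):
    # simultaneous update; the right-hand side uses the old (x1, x2)
    x1, x2 = (x1 / x2) ** 0.6, 2 ** 0.2 * (x2 / x1) ** 0.4
    print(m + 1, x1, x2, f3(x1, x2))
```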
The function
is a signomial but not a posynomial. The surrogate function (6) reduces to
with all variables separated. The MM updates
converge in a single iteration to a solution of f4(x) = 0. Again the limit depends on the initial point.
The function
$$f_5(x) \;=\; x_1 x_2 + x_1 x_3 + x_2 x_3 - \ln(x_1 + x_2 + x_3)$$
is more complicated than a signomial. It also is unbounded below because the point x = (m, 1/m, 1/m)t satisfies f5(x) = 2 + m^{−2} − ln(m + 2/m). According to the majorization (7), an appropriate surrogate is
$$g(x \mid x_m) \;=\; \sum_{i=1}^3 \Bigl[\frac{s_m - x_{mi}}{2\, x_{mi}}\, x_i^2 \;-\; \frac{x_{mi}}{s_m}\, \ln x_i\Bigr], \qquad s_m = x_{m1} + x_{m2} + x_{m3},$$
up to an irrelevant constant. The MM updates are
$$x_{m+1,i} \;=\; \frac{x_{mi}}{\sqrt{s_m\,(s_m - x_{mi})}}, \qquad i = 1, 2, 3.$$
If the components of the initial point coincide, then the iterates converge in a single iteration to the saddle point with all components equal to 6^{−1/2} ≈ 0.4082. Otherwise, it appears that f5(xm) tends to −∞.
The following objective functions
from reference [20] are intended for numerical illustration. Table 1 lists initial conditions, minimum points, minimum values, and number of iterations until convergence under the MM algorithm. Convergence is declared when the relative change in the objective function is less than a pre-specified value ε, in other words, when
$$\frac{|f(x_{m+1}) - f(x_m)|}{|f(x_m)| + 1} \;\le\; \varepsilon.$$
Optimization of the univariate surrogate functions easily succumbs to Newton’s method. The MM algorithm takes fewer iterations to converge than the path algorithm for all of the test functions mentioned in [20] except f6(x). Furthermore, the MM algorithm avoids calculation of the gradient and Hessian and requires no matrix decompositions or selection of tuning constants.
Table 1.
Fun | Type | Initial Point x0 | Min Point | Min Value | Iters (ε = 10^{−9})
---|---|---|---|---|---
f1 | P | (1,2) | (1.4310,1.4310) | 3.4128 | 38 |
f2 | P | (1,2) | (0.6300,1.2599) | 2.0000 | 2 |
f3 | P | (1,1) | diverges | 0.0000 | |
f4 | S | (0.1,0.2,0.3,0.4) | (0.1596,0.3191,0.1954,0.2606) | 0.0000 | 3 |
f5 | G | (1,1,1) | (0.4082,0.4082,0.4082) | 0.2973 | 2 |
f5 | G | (1,2,3) | diverges | −∞ | |
f6 | S | (1,1) | (2.9978,0.4994) | −14.2031 | 558 |
f7 | S | (1, …, 10) | 0.0255x0 | 0.0000 | 18 |
f8 | P | (1, …, 7) | diverges | 0.0000 | |
f9 | P | (1,2,3,4) | (0.3969,0.0000,0.0000,1.5874) | 2.0000 | 7 |
As Section 7 observes, MM algorithms typically converge at a linear rate. Although slow convergence can occur for functions such as the test function f6(x), there are several ways to accelerate an MM algorithm. For example, our published quasi-Newton acceleration [21] often reduces the necessary number of iterations by one or two orders of magnitude. Figure 1 shows the progress of the MM iterates for the test function f6(x) with and without quasi-Newton acceleration. Under a convergence criterion of ε = 10^{−9} and q = 1 secant condition, the required number of iterations falls to 30; under the same convergence criterion and q = 2 secant conditions, the required number of iterations falls to 12. It is also worth emphasizing that separation of parameters enables parallel processing in high-dimensional problems. We have recently argued [25] that the best approach to parallel processing is through graphics processing units (GPUs). These cheap hardware devices offer one to two orders of magnitude acceleration in many MM algorithms with parameters separated.
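To make the acceleration concrete, here is a generic secant-style extrapolation in the SQUAREM family — an off-the-shelf stand-in rather than the exact q-secant scheme of [21] — with our own function names:

```python
import numpy as np

# One SQUAREM-style acceleration step: extrapolate along two successive MM
# steps, then fall back to the plain MM iterate if the extrapolation fails to
# descend.  M is the MM map, f the objective, x0 the current iterate.

def accelerated_step(M, f, x0, floor=1e-9):
    x1 = M(x0)
    x2 = M(x1)
    r = x1 - x0                                # first MM step
    v = (x2 - x1) - r                          # curvature of the MM path
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return x2
    alpha = -np.linalg.norm(r) / norm_v        # SQUAREM step length
    x_acc = x0 - 2.0 * alpha * r + alpha**2 * v
    x_acc = np.maximum(x_acc, floor)           # stay in the positive orthant
    x_acc = M(x_acc)                           # stabilize with one MM step
    return x_acc if f(x_acc) <= f(x2) else x2
```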
5 Constrained Signomial Programming
Extending the MM algorithm to constrained geometric and signomial programming is challenging. Box constraints ai ≤ xi ≤ bi are consistent with parameter separation as just developed, but more complicated posynomial constraints that couple parameters are not. Posynomial inequality constraints take the form
$$h(x) \;=\; \sum_{\beta} d_\beta\, x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n} \;\le\; 1$$
with positive coefficients dβ.
The corresponding equality constraint sets h(x) = 1. We propose handling both constraints by penalty methods. Before we treat these matters in more depth, let us relax the positivity restrictions on the dβ but enforce the restriction βi ≥ 0. The latter objective can be achieved by multiplying h(x) by a sufficiently high power xi^{pi} of each xi. If we subtract the two sides of the resulting equality, then the equality constraint h(x) = 1 can be rephrased as r(x) = 0 for the signomial r(x) = Σγ eγ x^γ, with no restriction on the signs of the eγ but with the requirement γi ≥ 0 in effect. A concrete conversion of this type is sketched below.
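For instance, under this recipe a hypothetical constraint of our own choosing (not drawn from the test problems of this paper) converts as follows:
$$x_1^{-1} x_2 + x_1^{1/2} \;=\; 1
\;\;\xrightarrow{\;\times\, x_1\;}\;\;
x_2 + x_1^{3/2} \;=\; x_1
\;\;\Longleftrightarrow\;\;
r(x) \;=\; x_2 + x_1^{3/2} - x_1 \;=\; 0.$$
All exponents of r(x) are nonnegative, while its coefficients now carry both signs, exactly as the general description requires.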
In the quadratic penalty method [1,13,17] with objective function f(x), a single equality constraint r(x) = 0, and a single inequality constraint s(x) ≤ 0, one minimizes the sum
$$f_\lambda(x) \;=\; f(x) + \lambda\, r(x)^2 + \lambda\, s(x)_+^2,$$
where s(x)+ = max{s(x), 0}. As the penalty constant λ tends to ∞, the solution vector xλ typically converges to the constrained minimum. In the revised objective function, the term r(x)² is a signomial whenever r(x) is a signomial. For example, squaring the illustrative constraint function r(x) displayed above again produces a signomial, now with terms of both signs.
Of course, the powers in r(x) can be fractional here as well as integer. The term s(x)+² is not a signomial and must be subjected to the majorization
$$s(x)_+^2 \;\le\;
\begin{cases}
s(x)^2, & s(x_m) \ge 0,\\[2pt]
[\,s(x) - s(x_m)\,]^2, & s(x_m) < 0,
\end{cases}$$
to achieve this status. In practice, one does not need to fully minimize fλ(x) for any fixed λ. If one increases λ slowly enough, then it usually suffices to merely decrease fλ(x) at each iteration. The MM algorithm is designed to achieve precisely this goal. Our exposition so far suggests that we majorize r(x)², s(x)², and [s(x) − s(xm)]² in exactly the same manner that we majorize f(x). Separation of parameters generalizes, and the resulting MM algorithm keeps all parameters positive while permitting pertinent parameters to converge to 0. Section 7 summarizes some of the convergence properties of this hybrid procedure.
The quadratic penalty method traditionally relies on Newton’s method to minimize the unconstrained functions fλ(x). Unfortunately, this tactic suffers from roundoff errors and numerical instability. Some of these problems disappear with the MM algorithm. No matrix inversions are involved, and iterates enjoy the descent property. Ill-conditioning does cause harm in the form of slow convergence, but the previously mentioned quasi-Newton acceleration largely remedies the situation [21]. As an alternative to quadratic penalties, exact penalties take the form λ|r(x)| + λs(x)+. Remarkably, the exact penalty method produces the constrained minimum, not just in the limit, but for all finite λ beyond a certain point. Although this desirable property avoids the numerical instability encountered in the quadratic penalty method, the kinks in the objective functions f(x) + λ|r(x)| + λs(x)+ are a nuisance. Our recent paper [24] on the exact penalty method shows how to circumvent this annoyance.
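In outline (the loop structure and constants below are our own choices, mirroring the λ = 2^k schedules reported in Section 6), penalty continuation pairs naturally with MM sweeps:

```python
# A minimal sketch of quadratic-penalty continuation: lambda grows geometrically,
# and each lambda receives only a few MM descent sweeps rather than a full
# minimization of f_lambda.

def penalty_continuation(mm_sweep, x0, lam=1.0, growth=2.0, n_outer=20, n_inner=5):
    x = x0
    for _ in range(n_outer):
        for _ in range(n_inner):
            x = mm_sweep(x, lam)    # one majorize-then-decrease step on f_lambda
        lam *= growth               # tighten the constraints slowly
    return x
```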
6 Nonnegative Quadratic Programming
As an illustration of constrained signomial programming, consider quadratic programming over the positive orthant. Let
$$f(x) \;=\; \tfrac{1}{2}\, x^t Q x + c^t x$$
be the objective function, Ex = d the linear equality constraints, and Ax ≤ b the linear inequality constraints. The symmetric matrix Q can be negative definite, indefinite, or positive definite. The quadratic penalty method involves minimizing the sequence of penalized objective functions
$$f_\lambda(x) \;=\; f(x) + \lambda\, \|Ex - d\|_2^2 + \lambda\, \|(Ax - b)_+\|_2^2$$
as λ tends to ∞. Based on the obvious majorization
$$(a_j^t x - b_j)_+^2 \;\le\; (a_j^t x - \hat b_{mj})^2, \qquad \hat b_{mj} \;=\; \min\{b_j,\; a_j^t x_m\},$$
the term ‖(Ax − b)₊‖₂² is majorized by ‖Ax − \hat b_m‖₂², where a_j^t denotes the jth row of A and \hat b_m is the vector with components \hat b_{mj}. A brief calculation shows that fλ(x) is majorized by the surrogate function
$$g_\lambda(x \mid x_m) \;=\; \tfrac{1}{2}\, x^t H_\lambda x + \upsilon_{\lambda m}^t x$$
up to an irrelevant constant, where Hλ and υλm are defined by
$$H_\lambda \;=\; Q + 2\lambda\, E^t E + 2\lambda\, A^t A, \qquad \upsilon_{\lambda m} \;=\; c - 2\lambda\, E^t d - 2\lambda\, A^t \hat b_m.$$
It is convenient to assume that the diagonal entries of Hλ appearing in the quadratic form are positive. This is generally the case for large λ. One can handle the off-diagonal term hλij xi xj by either the majorization (4) or the majorization (5) according to the sign of hλij. The reader can check that the MM updates reduce to
$$x_{m+1,i} \;=\; x_{mi}\; \frac{-\upsilon_{\lambda m, i} + \sqrt{\upsilon_{\lambda m, i}^2 + 4\, (H_\lambda^+ x_m)_i\, (H_\lambda^- x_m)_i}}{2\, (H_\lambda^+ x_m)_i}, \tag{8}$$
where the matrices Hλ+ and Hλ− denote the positive and negative parts of Hλ, with entries max{hλij, 0} and max{−hλij, 0}, respectively. When (Hλ− xm)i = 0, the update (8) collapses to
$$x_{m+1,i} \;=\; \frac{x_{mi}}{(H_\lambda^+ x_m)_i}\, \max\{-\upsilon_{\lambda m, i},\, 0\}. \tag{9}$$
To avoid sticky boundaries, we replace 0 in equation (9) by a small positive constant ε such as 10^{−9}. Sha et al. [18] derived the update (8) for λ = 0 ignoring the constraints Ex = d and Ax ≤ b.
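For concreteness, here is a sketch of the update (8) in code; the helper names, the ε floor, and the penalty bookkeeping in the trailing comments follow the definitions above but are otherwise our own:

```python
import numpy as np

# One MM sweep of the multiplicative update (8) for the surrogate
# 0.5 x'Hx + v'x over x > 0, assuming the diagonal of H is positive so that
# (H^+ x)_i > 0; the iterate is floored at eps to avoid the sticky boundary.

def nqp_update(H, v, x, eps=1e-9):
    Hp = np.maximum(H, 0.0)                    # positive part H^+
    Hm = np.maximum(-H, 0.0)                   # negative part H^-
    a, b = Hp @ x, Hm @ x                      # (H^+ x)_i and (H^- x)_i
    x_new = x * (-v + np.sqrt(v**2 + 4.0 * a * b)) / (2.0 * a)
    return np.maximum(x_new, eps)

# Penalty bookkeeping following the definitions of H_lambda and v_lambda_m above:
# H = Q + 2*lam*(E.T @ E) + 2*lam*(A.T @ A)
# bhat = np.minimum(b, A @ x)                  # majorizes the hinge penalty
# v = c - 2*lam*(E.T @ d) - 2*lam*(A.T @ bhat)
```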
For a numerical example without equality constraints take
The minimum occurs at the point (2/3, 4/3)t. Table 2 lists the number of iterations until convergence and the converged point xλ for the sequence of penalty constants λ = 2k. The quadratic program
converges much more slowly. Its minimum occurs at the point (2.4, 1.6)t. Table 3 lists the numbers of iterations until convergence with (q = 1) and without (q = 0) acceleration and the converged point xλ for the same sequence of penalty constants λ = 2k. Fortunately, quasi-Newton acceleration compensates for ill conditioning in this test problem.
Table 2.
log2 λ | Iters | xλ |
---|---|---|
0 | 8 | (0.9503,1.6464) |
1 | 6 | (0.8580,1.5164) |
2 | 5 | (0.8138,1.4461) |
3 | 23 | (0.7853,1.4067) |
4 | 32 | (0.7264,1.3702) |
5 | 31 | (0.6967,1.3518) |
6 | 30 | (0.6817,1.3426) |
7 | 29 | (0.6742,1.3380) |
8 | 28 | (0.6704,1.3356) |
9 | 26 | (0.6686,1.3345) |
10 | 25 | (0.6676,1.3339) |
11 | 23 | (0.6671,1.3336) |
12 | 22 | (0.6669,1.3335) |
13 | 21 | (0.6668,1.3334) |
14 | 19 | (0.6667,1.3334) |
15 | 18 | (0.6667,1.3334) |
16 | 16 | (0.6667,1.3333) |
17 | 15 | (0.6667,1.3333) |
Table 3.
log2 λ | Iters (q = 0) | Iters (q = 1) | xλ |
---|---|---|---|
0 | 18 | 5 | (3.0000,1.8000) |
1 | 2 | 2 | (2.8571,1.7143) |
2 | 56 | 6 | (2.6667,1.6667) |
3 | 97 | 5 | (2.5455,1.6364) |
4 | 167 | 5 | (2.4762,1.6190) |
5 | 312 | 5 | (2.4390,1.6098) |
6 | 541 | 6 | (2.4198,1.6049) |
7 | 955 | 5 | (2.4099,1.6025) |
8 | 1674 | 4 | (2.4050,1.6012) |
9 | 2924 | 3 | (2.4025,1.6006) |
10 | 4839 | 3 | (2.4013,1.6003) |
11 | 7959 | 4 | (2.4006,1.6002) |
12 | 12220 | 4 | (2.4003,1.6001) |
13 | 17674 | 4 | (2.4002,1.6000) |
14 | 21739 | 3 | (2.4001,1.6000) |
15 | 20736 | 3 | (2.4000,1.6000) |
16 | 8073 | 3 | (2.4000,1.6000) |
17 | 111 | 3 | (2.4000,1.6000) |
18 | 6 | 4 | (2.4000,1.6000) |
19 | 5 | 2 | (2.4000,1.6000) |
20 | 3 | 2 | (2.4000,1.6000) |
21 | 2 | 2 | (2.4000,1.6000) |
7 Convergence
As we have seen, the behavior of the MM algorithm is intimately tied to the behavior of the objective function f(x). For the sake of simplicity, we now restrict attention to unconstrained minimization of posynomials and investigate conditions guaranteeing that f(x) possesses a unique minimum on its domain. Uniqueness is related to the strict convexity of the reparameterization
$$h(y) \;=\; \sum_{\alpha \in S} c_\alpha\, e^{\alpha^t y}$$
of f(x), where αty = Σi αi yi is the inner product of α and y and xi = eyi for each i. The Hessian matrix
$$d^2 h(y) \;=\; \sum_{\alpha \in S} c_\alpha\, e^{\alpha^t y}\, \alpha\, \alpha^t$$
of h(y) is positive semidefinite, so h(y) is convex. If we let T be the subspace of ℝn spanned by {α}α∈S, then h(y) is strictly convex if and only if T = ℝn. Indeed, suppose the condition holds. For any υ ≠ 0, it then must be true that αtυ ≠ 0 for some α ∈ S. As a consequence,
$$\upsilon^t\, d^2 h(y)\, \upsilon \;=\; \sum_{\alpha \in S} c_\alpha\, e^{\alpha^t y}\, (\alpha^t \upsilon)^2 \;>\; 0,$$
and d²h(y) is positive definite. Conversely, suppose T ≠ ℝn, and take υ ≠ 0 with αtυ = 0 for every α ∈ S. Then h(y + tυ) = h(y) for every scalar t, which is incompatible with h(y) being strictly convex.
Strict convexity guarantees uniqueness, not existence, of a minimum point. Coerciveness ensures existence. The objective function f(x) is coercive if f(x) tends to ∞ whenever any component of x tends to 0 or ∞. Under the reparameterization xi = eyi, this is equivalent to h(y) = f(x) tending to ∞ as ‖y‖₂ tends to ∞. A necessary and sufficient condition for this to occur is that maxα∈S αtυ > 0 for every υ ≠ 0. For a proof, suppose the contrary condition holds for some υ ≠ 0. Then it is clear that h(tυ) remains bounded above by h(0) as the scalar t tends to ∞. Conversely, if the stated condition is true, then the function q(y) = maxα∈S αty is continuous and achieves its minimum d > 0 on the sphere {y ∈ ℝn : ‖y‖₂ = 1}. It follows that q(y) ≥ d‖y‖₂ and that
$$h(y) \;=\; \sum_{\alpha \in S} c_\alpha\, e^{\alpha^t y} \;\ge\; \Bigl(\min_{\alpha \in S} c_\alpha\Bigr) e^{q(y)} \;\ge\; \Bigl(\min_{\alpha \in S} c_\alpha\Bigr) e^{d\,\|y\|_2}.$$
This lower bound shows that h(y) is coercive.
The coerciveness condition is hard to apply in practice. An equivalent condition is that the origin 0 belongs to the interior of the convex hull of the set {α}α∈S. It is straightforward to show that the negations of these two conditions are logically equivalent. Thus, suppose q(υ) = maxα∈S αtυ ≤ 0 for some vector υ ≠ 0. Every convex combination Σα pαα then satisfies (Σα pαα)t υ ≤ 0. If the origin is in the interior of the convex hull, then ευ is also in the convex hull for every sufficiently small ε > 0. But this leads to the contradiction 0 < ε‖υ‖₂² = (ευ)tυ ≤ 0. Conversely, suppose 0 is not in the interior of the convex hull. According to the separating hyperplane theorem for convex sets, there exists a unit vector υ with υtα ≤ 0 = υt0 for every α ∈ S. In other words, q(υ) ≤ 0. The convex hull criterion is easier to check, but it is not constructive. In simple cases such as the objective function f1(x), where the power vectors are α = (−3, 0)t, α = (−1, −2)t, and α = (1, 1)t, it is visually obvious that the origin is in the interior of their convex hull.
One can also check the criterion q(υ) > 0 for all υ ≠ 0 by solving a related programming problem. This problem consists in minimizing the scalar t subject to the inequality constraints αty ≤ t for all α ∈ S and the nonlinear equality constraint ‖y‖₂² = 1. If the minimum value tmin ≤ 0, then the original criterion fails.
In some cases, the objective function f(x) does not attain its minimum on the open domain (0, ∞)ⁿ. This condition is equivalent to the corresponding function ln h(y) being unbounded below on ℝn. According to Gordan's theorem [2,10], this can happen if and only if 0 is not in the convex hull of the set {α}α∈S. Alternatively, both conditions are equivalent to the existence of a vector υ with αtυ < 0 for all α ∈ S. For the objective function f3(x), the power vectors are α = (−1, −2)t and α = (1, 1)t. The origin (0, 0)t does not lie on the line segment between them, and the vector (−3/2, 1)t forms a strictly obtuse angle with each. As predicted, f3(x) does not attain its infimum on (0, ∞)².
The theoretical development in reference [10] demonstrates that the MM algorithm converges at a linear rate to the unique minimum point of the objective function f(x) when f(x) is coercive and its convex reparameterization h(y) is strictly convex. The theory does not cover other cases, and it would be interesting to investigate them. The general convergence theory of MM algorithms [10] states that five properties of the objective function f(x) and MM algorithmic map x ↦ M(x) guarantee convergence to a stationary point of f(x): (a) f(x) is coercive on its open domain; (b) f(x) has only isolated stationary points; (c) M(x) is continuous; (d) x* is a fixed point of M(x) if and only if x* is a stationary point of f(x); and (e) f[M(x*)] ≥ f(x*), with equality if and only if x* is a fixed point of M(x). For a general signomial program, items (a) and (b) are the hardest to check. Our examples provide some clues.
The standard convergence results for the quadratic penalty method are covered in the references [1,10,13,17]. To summarize the principal finding, suppose that the objective function f(x) and the constraint functions ri(x) and si(x) are continuous and that f(x) is coercive on its open domain. If xλ minimizes the penalized objective function
$$f_\lambda(x) \;=\; f(x) + \lambda \sum_i r_i(x)^2 + \lambda \sum_j s_j(x)_+^2$$
and x∞ is a cluster point of xλ as λ tends to ∞, then x∞ minimizes f(x) subject to the constraints. In this regard observe that the coerciveness assumption on f(x) implies that the solution set {xλ}λ is bounded and possesses at least one cluster point. Of course, if the solution set consists of a single point, then xλ tends to that point.
8 Discussion
The current paper presents novel algorithms for both geometric and signomial programming. Although our examples are low dimensional, the previous experience of Sha et al. [18] offers convincing evidence that the MM algorithm works well for high-dimensional quadratic programming with nonnegativity constraints. The ideas pursued here – the MM principle, separation of variables, quasi-Newton acceleration, and penalized optimization – are surprisingly potent in large-scale optimization. The MM algorithm deals with the objective function directly and reduces multivariate minimization to a sequence of one-dimensional minimizations. The MM updates are simple to code and enjoy the crucial descent property. Treating constrained signomial programming by the penalty method extends the MM algorithm even further. Quadratic programming with linear equality and inequality constraints is the most important special case of constrained signomial programming. Our new MM algorithm for constrained quadratic programming deserves consideration in high-dimensional problems. Even though MM algorithms can be notoriously slow to converge, quasi-Newton acceleration can dramatically improve matters. Acceleration involves no matrix inversion, only matrix times vector multiplication. In our limited experiments with large-scale problems [22,23], MM algorithms with quasi-Newton acceleration can achieve comparable or better performance than limited-memory BFGS algorithms. Finally, it is worth keeping in mind that parameter separated algorithms are ideal candidates for parallel processing.
Because geometric programs are convex after reparameterization, it is relatively easy to pose and check sufficient conditions for global convergence of the MM algorithm. In contrast it is far more difficult to analyze the behavior of the MM algorithm for signomial programs. Theoretical progress will probably be piecemeal and require problem-specific information. A major difficulty is understanding the asymptotic nature of the objective function as parameters approach 0 or ∞. Even in the absence of theoretical guarantees, the descent property of the MM algorithm makes it an attractive solution technique and a diagnostic tool for finding counterexamples. Some of our test problems expose the behavior of the MM algorithm in non-standard situations. We welcome the help of the optimization community in unraveling the mysteries of the MM algorithm in signomial programming.
Acknowledgments
Research was supported by United States Public Health Service grants GM53275, MH59490, and R01HG006139.
Contributor Information
Kenneth Lange, Departments of Biomathematics, Human Genetics, and Statistics, University of California, Los Angeles, CA 90095-1766, USA. klange@ucla.edu.
Hua Zhou, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Campus Box 8203, Raleigh, NC 27695-8203, USA. hua_zhou@ncsu.edu.
References
- 1. Bertsekas DP. Nonlinear Programming. Athena Scientific; 1999.
- 2. Borwein JM, Lewis AS. Convex Analysis and Nonlinear Optimization: Theory and Examples. New York: Springer-Verlag; 2000.
- 3. Boyd S, Kim S-J, Vandenberghe L, Hassibi A. A tutorial on geometric programming. Optimization and Engineering. 2007;8:67–127.
- 4. Boyd S, Vandenberghe L. Convex Optimization. Cambridge: Cambridge University Press; 2004.
- 5. Ecker JG. Geometric programming: methods, computations and applications. SIAM Review. 1980;22:338–362.
- 6. Feigin PD, Passy U. The geometric programming dual to the extinction probability problem in simple branching processes. Annals Prob. 1981;9:498–503.
- 7. del Mar Hershenson M, Boyd SP, Lee TH. Optimal design of a CMOS op-amp via geometric programming. IEEE Trans Computer-Aided Design. 2001;20:1–21.
- 8. Hoffman K. Analysis in Euclidean Space. Englewood Cliffs, NJ: Prentice-Hall; 1975.
- 9. Hunter DR, Lange K. A tutorial on MM algorithms. Amer Statistician. 2004;58:30–37.
- 10. Lange K. Optimization. New York: Springer-Verlag; 2004.
- 11. Lange K, Hunter DR, Yang I. Optimization transfer using surrogate objective functions (with discussion). J Comput Graphical Stat. 2000;9:1–59.
- 12. Mazumdar M, Jefferson TR. Maximum likelihood estimates for multinomial probabilities via geometric programming. Biometrika. 1983;70:257–261.
- 13. Nocedal J, Wright SJ. Numerical Optimization. Springer; 1999.
- 14. Passy U, Wilde DJ. A geometric programming algorithm for solving chemical equilibrium problems. SIAM J Appl Math. 1968;16:363–373.
- 15. Peressini AL, Sullivan FE, Uhl JJ Jr. The Mathematics of Nonlinear Programming. New York: Springer-Verlag; 1988.
- 16. Peterson EL. Geometric programming. SIAM Review. 1976;18:338–362.
- 17. Ruszczynski A. Optimization. Princeton University Press; 2006.
- 18. Sha F, Saul LK, Lee DD. Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Becker S, Thrun S, Obermayer K, editors. Advances in Neural Information Processing Systems. Vol. 15. Cambridge, MA: MIT Press; pp. 1065–1073.
- 19. Steele JM. The Cauchy-Schwarz Master Class: An Introduction to the Art of Inequalities. Cambridge: Cambridge University Press and the Mathematical Association of America; 2004.
- 20. Wang Y, Zhang K, Shen P. A new type of condensation curvilinear path algorithm for unconstrained generalized geometric programming. Math Comput Modelling. 2002;35:1209–1219.
- 21. Zhou H, Alexander D, Lange KL. A quasi-Newton acceleration method for high-dimensional optimization algorithms. Statistics and Computing. 2011;21:261–273. doi: 10.1007/s11222-009-9166-3.
- 22. Zhou H, Lange KL. MM algorithms for some discrete multivariate distributions. Journal of Computational and Graphical Statistics. 2010;19(3):645–665. doi: 10.1198/jcgs.2010.09014.
- 23. Zhou H, Lange KL. A fast procedure for calculating importance weights in bootstrap sampling. Computational Statistics and Data Analysis. 2011;55:26–33. doi: 10.1016/j.csda.2010.04.019.
- 24. Zhou H, Lange KL. Path following in the exact penalty method of convex programming. arXiv:1201.3593. 2011. doi: 10.1007/s10589-015-9732-x.
- 25. Zhou H, Lange KL, Suchard MA. Graphical processing units and high-dimensional optimization. Statistical Science. 2010;25(3):311–324. doi: 10.1214/10-STS336.