Abstract
We use mathematical programming tools, such as Semidefinite Programming (SDP) and Nonlinear Programming (NLP)-based formulations, to find optimal designs for models used in chemistry and chemical engineering. In particular, we employ local design-based setups for linear models and a Bayesian setup for nonlinear models. In the latter case, Gaussian Quadrature Formulas (GQFs) are used to evaluate the optimality criterion averaged over the prior distribution for the model parameters. Mathematical programming techniques are then applied to solve the optimization problems. Because such methods require that the design space be discretized, we also evaluate the impact of the discretization scheme on the generated design. We demonstrate the techniques for finding D–, A– and E–optimal designs using design problems in biochemical engineering and show that the methods can also be directly applied to tackle additional issues, such as heteroscedasticity in the model. Our results show that the NLP formulation produces highly efficient D–optimal designs but requires more computation time than the SDP formulation. Because the efficiencies of the designs generated by the two methods are generally very close, we recommend the SDP formulation in practice.
Keywords: Approximate Design, Bayesian Optimal Design, Global Optimization, Gaussian Quadrature Formula, Information Matrix
1. Introduction
We consider finding model-based optimal designs of experiments (M-bODE) for models that describe constitutive relations, commonly used to represent physical properties or kinetic data. For M-bODE problems, we have a given parametric model defined on a given design space and a given design criterion; our task is to find the number of design points required, their locations in the design space, and the number of replicates at each design point that optimally meet the criterion. These design issues can be difficult to answer even for some relatively simple models. A general observation is that while there have been important advances in solving estimation problems, innovation in techniques for finding efficient designs has not kept pace. In particular, it is helpful to explore the applicability of the increasing array of numerical optimization techniques used in other disciplines to solve statistical design problems where analytical approaches are no longer feasible. Continuing advances in algorithmic development are crucial to tackling more complex and high-dimensional design problems.
In the subfield of optimal design of experiments in Statistics, various algorithms have been developed and continually improved for generating different types of optimal designs for algebraic models. Some examples are those proposed by Fedorov (1972) [1], Wynn (1972) [2], Mitchell (1974) [3], and Gail and Kiefer (1980) [4]. Recently, multiplicative algorithms seem to be gaining in popularity [5, 6]. Some of these algorithms are reviewed, compared and discussed in Cook and Nachtsheim (1982) [7] and Pronzato (2008) [8], among others. A common issue is how to confirm the global optimality of the design found by an algorithm. In selected situations, verification can be accomplished using an equivalence theorem [9]. These algorithms typically require a starting design and a stopping criterion to terminate the search for the optimal design. A common stopping rule comes from the general equivalence theorem, which we will use in this paper. Some algorithms also require that the space be discretized, and so the generated optimal design depends on the size of the grid used in the search.
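To make the ideas concrete, the multiplicative algorithm mentioned above can be sketched in a few lines. The following is a minimal illustration (not the implementation used in this paper), assuming a D–optimal design on a fixed grid of candidate points, with the general equivalence theorem supplying the stopping rule:

```python
import numpy as np

def multiplicative_d_optimal(H, max_iter=20000, tol=1e-5):
    """Multiplicative algorithm for a D-optimal approximate design.

    H is a (q, p) array whose rows are the regression vectors of the q
    candidate design points; the function returns the weight vector w.
    """
    q, p = H.shape
    w = np.full(q, 1.0 / q)                 # uniform starting design
    for _ in range(max_iter):
        M = H.T @ (w[:, None] * H)          # FIM: sum_i w_i h_i h_i^T
        d = np.einsum('ij,jk,ik->i', H, np.linalg.inv(M), H)
        # equivalence-theorem stopping rule: at the optimum, max_i d_i = p
        if d.max() <= p * (1.0 + tol):
            break
        w *= d / p                          # update preserves sum(w) = 1
    return w

# quadratic regression y = b0 + b1 x + b2 x^2 on a grid of [-1, 1]; the
# D-optimal design puts weight 1/3 at each of x = -1, 0, 1
x = np.linspace(-1.0, 1.0, 201)
H = np.column_stack([np.ones_like(x), x, x**2])
w = multiplicative_d_optimal(H)
```

The update w_i ← w_i d_i/p leaves the weights summing to one because Σ_i w_i d_i = tr[ℳ⁻¹(ξ)ℳ(ξ)] = p, which is why no renormalization step is needed.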
Mathematical programming algorithms and solvers have been used and continue to be widely used outside the field of statistics. These tools have improved substantially over the last two decades and they can solve complex high-dimensional optimization problems accurately and efficiently. In particular, mathematical programming approaches have been successfully employed to solve M-bODE problems. Some examples of such tools are Semidefinite Programming (SDP) [10, 11, 12], Semi Infinite Programming (SIP) [13], Nonlinear Programming (NLP) [14, 15], NLP combined with stochastic procedures such as Genetic Algorithms [16, 17], and Global Optimization [18]. This paper describes and compares a few mathematical programming tools for finding a variety of optimal designs used in chemistry and chemical engineering problems.
Section 2 presents background for SDP and NLP formulations for solving selected design problems, including Bayesian optimal design problems. Section 3 describes SDP formulations for linear and nonlinear models with applications to chemical engineering problems. Section 4 introduces the NLP formulations for finding D–optimal designs and compares results with those from the SDP formulations in Section 3. A conclusion is offered in Section 5.
2. Background
2.1. Preliminaries
Throughout we assume that we have a regression model with a given mean function f(x, θ) with differentiable components. The vector of regressors is x ∈ X ⊂ ℝnx and X is a user-selected compact design space. The continuous response is y and its mean response at x is modeled by
𝔼[y | x] = f(x, θ)    (1)
where the notation 𝔼[•] denotes the expectation of the argument in [•]. The np × 1 vector of unknown model parameters θ is assumed to belong to a known np-dimensional Cartesian box, with each interval [lj, uj] representing the known plausible range of values for the jth parameter. We assume the errors are homoscedastic; however, the methods discussed here also apply when the response variances depend on where in the design space the x’s are selected, and some brief results for such situations are presented. Given a design criterion and a predetermined sample size N, the research question is how to select the N combinations of covariate values at which to observe the responses so that the information gained is maximized in some optimal way.
A common goal of the M-bODE problem is to find an optimal design to maximize the information of the design of experiments carried out. Optimality depends on the objective of the study. For example, if predicting the responses at a few user-selected points in the design space is the primary goal, then one chooses a set of values of covariates in the design that will minimize the variances of the predicted responses at those points.
We focus on approximate design problems, which require the determination of a probability measure over the given design space X. Such a design ξ is characterized by the number of support points, their locations in the design space and the proportions of observations to be taken at these points. If the sample size for the experiment is fixed at N, the approximate design ξ is implemented by taking roughly N × wi observations at the design point xi, i = 1, …, k, subject to each N × wi being a positive integer and N × w1 + … + N × wk = N. In what is to follow, we represent such a design by rows, where each row shows one of the design points and the last component in the row is the weight at that support point. If there are nx covariates in the model, the ith design point is xi = (x_{i,1}, …, x_{i,nx})^T, and if there are k of them, the design can be represented by k rows (x_i^T, w_i), i ∈ {1, ···, k}, with w_1 + ··· + w_k = 1. We also let [k] = {1, ···, k}.
An optimal approximate design optimizes a given criterion over Ξ, the space of all approximate designs on X. The key advantages of working with approximate designs are that there is a unified framework for finding optimal continuous designs for M-bODE problems and that, when the design criterion is a convex or concave functional of the information matrix, equivalence theorems are available to provide a practical way to check the optimality of any design among all continuous designs. If a design is not optimal, the equivalence theorem also provides a lower bound on its efficiency relative to the optimum, without the need to find the optimum itself. In addition, there are algorithms for finding several types of optimal approximate designs.
To fix ideas, we assume that all N responses have constant variance, are independently, identically and normally distributed, and that there are ri replicates at each of the k points xi, i ∈ [k], with xi = (x_{i,1}, …, x_{i,nx})^T. If y_{i,j} is the jth observation at xi, the total log-likelihood function is
log L(θ, σ²) = −(N/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{k} ∑_{j=1}^{r_i} (y_{i,j} − f(x_i, θ))²    (2)
The maximum likelihood estimator (MLE) for θ maximizes (2), or equivalently,

θ̂ = arg min_θ ∑_{i=1}^{k} ∑_{j=1}^{r_i} (y_{i,j} − f(x_i, θ))².
For an approximate k-point design with support at x1, x2, …, xk and weights w1, w2, …, wk, the elements of the normalized FIM are the negative expectations of the second order derivatives of the total log-likelihood with respect to the parameters, given by
ℳ(ξ, θ) = ∑_{i=1}^{k} w_i ℳ(δ_{x_i}, θ) = ∫_X ℳ(δ_x, θ) ξ(dx)    (3)
where ℳ(δ_{x_i}, θ) is the FIM from the design δ_{x_i} that puts all its weight at x_i. Let 𝕏 denote the discretized version of X using q points equally spaced in each dimension. The above information matrix is now approximated by

ℳ(χ, θ) = ∑_{i=1}^{q} χ_i ℳ(δ_{x_i}, θ),

where χ = (χ_1, …, χ_q) is the selected probability measure on 𝕏, chosen so that the above sum matches the integral in (3) as closely as possible. We denote the index set of the q points in 𝕏 by [q] = {1, ···, q}.
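As a minimal numpy sketch of this discretized approximation (assuming, for illustration, a linear model so that the local FIM at each grid point is the outer product of the regression vector with itself):

```python
import numpy as np

def fim(H, w):
    """Discretized FIM: sum_i w_i h(x_i) h(x_i)^T over the q grid points.

    H is a (q, np) array of regression vectors; w is a probability
    vector over the grid (the measure chi in the text).
    """
    H = np.asarray(H)
    w = np.asarray(w)
    return H.T @ (w[:, None] * H)

# simple check: 3-point design with weight 1/3 each for y = b0 + b1 x
H = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
M = fim(H, np.full(3, 1.0 / 3.0))   # -> [[1, 0], [0, 2/3]]
```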
The volume of the asymptotic confidence region of θ is proportional to det[ℳ^{−1/2}(ξ, θ)], so maximizing the determinant by choice of the design provides the smallest possible volume. Maximizing the information matrix in other ways leads to other criteria, the most common of which are represented by concave functions of the information matrix. For example, the D–, A– and E–optimal designs maximize, respectively, the following criteria:
ξ_D = arg max_{ξ∈Ξ} (det[ℳ(ξ, θ)])^{1/np}    (4)

ξ_A = arg max_{ξ∈Ξ} (tr[ℳ^{−1}(ξ, θ)])^{−1}    (5)

ξ_E = arg max_{ξ∈Ξ} λ_min[ℳ(ξ, θ)]    (6)
where λ_min is the minimum eigenvalue of the FIM. The efficiency of a design ξ is its worth relative to the optimum, and the D–, A– and E–efficiencies are defined, respectively, by
Eff_D = (det[ℳ(ξ, θ)] / det[ℳ(ξ_D, θ)])^{1/np}    (7)

Eff_A = tr[ℳ^{−1}(ξ_A, θ)] / tr[ℳ^{−1}(ξ, θ)]    (8)

Eff_E = λ_min[ℳ(ξ, θ)] / λ_min[ℳ(ξ_E, θ)]    (9)
Since the optimality criteria are concave functions of the FIM, we can use convex analysis theory [9, 19] or systematic mathematical convex programming solvers to obtain globally optimal designs [20, 21]. To verify the optimality of a design, Kiefer-Wolfowitz equivalence theorems [9, 1, 22] can be used. Pukelsheim (1993) [23] provides details on optimality criteria, equivalence theorems and the interpretation of design efficiency.
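The three criteria (4–6) and the D–efficiency (7) are straightforward to evaluate numerically; the short sketch below uses hypothetical helper names for illustration:

```python
import numpy as np

def d_crit(M):
    """D-criterion (4): det(M)^(1/np)."""
    return np.linalg.det(M) ** (1.0 / M.shape[0])

def a_crit(M):
    """A-criterion (5): (tr M^{-1})^{-1}."""
    return 1.0 / np.trace(np.linalg.inv(M))

def e_crit(M):
    """E-criterion (6): smallest eigenvalue of M."""
    return np.linalg.eigvalsh(M)[0]

def d_eff(M, M_opt):
    """D-efficiency (7) of a design with FIM M relative to the optimum."""
    return (np.linalg.det(M) / np.linalg.det(M_opt)) ** (1.0 / M.shape[0])

M = np.diag([1.0, 4.0])   # d_crit 2.0, a_crit 0.8, e_crit 1.0
```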
2.2. Pseudo-Bayesian designs
Nonlinear models are common in chemistry, thermodynamics, and chemical engineering, with typical applications ranging from modeling kinetic reaction rates to physical properties [24, 25]. For such models, the information matrix depends on the parameters, and so all design criteria formulated in terms of the information matrix depend on the unknown parameters that we want to estimate. When nominal values are assumed for these parameters, the resulting designs are termed locally optimal. Design strategies to handle this dependence include the use of: (i) a series of locally optimal designs, each computed using the most recent estimate θ̂ of θ [26, Chap. 17]; (ii) Bayesian designs that optimize the expectation of the optimality criterion averaged over a prior distribution of the model parameters θ on Θ [14]; and (iii) minimax designs that maximize the design efficiency under the worst-case combination of parameter values in Θ [27]. Here we focus on finding Bayesian optimal designs, or perhaps more correctly, pseudo-Bayesian optimal designs; for the purpose of this paper, we use the two terms interchangeably.
The Bayesian approach assumes that the uncertainty in the parameters can be adequately captured by the prior distribution. This prior density averages out the parameter values so that the design criterion no longer depends on the unknown parameters. The Bayesian optimal designs are then found by optimizing the expectation of the design criterion, see Chaloner and Verdinelli (1995) [28]. Specifically, given a prior density π(θ) for θ, the Bayesian D–optimal design ξ_BayesD is defined by

ξ_BayesD = arg max_{ξ∈Ξ} ∫_Θ log(det[ℳ(ξ, θ)]) π(θ) dθ.
Similar representations apply for Bayesian A–, and E–optimal designs from equations (5–6). Gaussian Quadrature Formulas (GQF) can be used to approximate the expectation integral of the optimality criterion by first discretizing the parameter space Θ. For each dimension, the integration points are the roots of the (κ − 1)th order Legendre polynomials and κ is the number of points used to approximate the integral. The roots and weights of the integration scheme are presented in Atkinson (1989) [29] and for simplicity, we use the same number of points for all dimensions of Θ. Other discretization schemes may also be employed.
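A small numpy sketch of the tensor-product Gauss–Legendre scheme on a Cartesian box (using numpy's `leggauss`, which returns the nodes and weights on [−1, 1]; the function name and interface are illustrative only):

```python
import numpy as np

def gq_box(bounds, kappa):
    """Tensor-product Gauss-Legendre nodes and weights on a Cartesian box.

    bounds is a list of (l_j, u_j) intervals, one per parameter, and
    kappa is the number of nodes per dimension.  The returned weights
    integrate over the box; divide by its volume to average.
    """
    rho, gamma = np.polynomial.legendre.leggauss(kappa)     # nodes on [-1, 1]
    pts_1d, wts_1d = [], []
    for (l, u) in bounds:
        pts_1d.append(0.5 * (u + l) + 0.5 * (u - l) * rho)  # affine map
        wts_1d.append(0.5 * (u - l) * gamma)
    P = np.meshgrid(*pts_1d, indexing='ij')
    W = np.meshgrid(*wts_1d, indexing='ij')
    points = np.column_stack([g.ravel() for g in P])
    weights = np.prod(np.stack([g.ravel() for g in W]), axis=0)
    return points, weights

# exactness check: integrate t1^2 * t2 over [0, 2] x [1, 3] (= 32/3)
pts, wts = gq_box([(0.0, 2.0), (1.0, 3.0)], kappa=4)
integral = np.sum(wts * pts[:, 0] ** 2 * pts[:, 1])
```

With κ nodes per dimension the scheme is exact for polynomial integrands of degree up to 2κ − 1 in each variable, which is why four nodes suffice in the check above.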
All computations in this paper were carried out on an Intel Core i7 machine (Intel Corporation, Santa Clara, CA) with a 2.80 GHz processor, running the 64-bit Windows 7 operating system.
2.3. Semidefinite programming
SDP is a subfield of mathematical programming for solving a class of optimization problems in which, in addition to a set of linear constraints, one can specify semidefinite constraints, a special form of nonlinear constraints [30]. Generally, the formulation seeks to minimize a linear combination of decision variables aggregated in a matrix that must lie in the (closed convex) cone of positive semidefinite symmetric matrices, subject to a set of linear matrix inequalities. Because both the objective function and the constraints are convex, a semidefinite program is a convex optimization problem, and efficient numerical algorithms implemented in dedicated solvers are available; among the most efficient methods in current use are interior point methods, see [31]. Applications of SDP to various optimization problems in different disciplines are given in Vandenberghe and Boyd (1996) [32], including details on implementing SDP to search for optimal designs. Specific applications to optimal design problems include finding (i) D–optimal designs for multi-response linear models [33], (ii) c-optimal designs for single-response trigonometric regression models [34] and (iii) D–optimal designs for polynomial models and rational functions [12]. Second order conic programming (SOCP) formulations share conic properties with SDP representations, and this feature was recently exploited to find c-optimal designs for linear models with multiple responses [11]. Collectively, these papers emphasize the simplicity and efficiency of the SDP-based approach for finding optimal designs.
The SDP-based approach requires the design space to be discretized into a user-specified grid of points 𝕏. Given the fully parametrized model, we compute the Fisher Information Matrix (FIM) at each discretized point and sum them to obtain the total information matrix. The design criterion is formulated as a convex function of this matrix, and SDP is applied to solve the optimal design problem using an appropriate SDP solver. The essential ingredients of solving an optimal design problem using SDP are as follows:
Let 𝕊m be the space of m × m symmetric matrices and let ζ = (ζ1, …, ζm1)^T ∈ ℝ^{m1} be the vector of variables to be optimized in the semidefinite program. A function φ : ℝ^{m1} ↦ ℝ is called semidefinite representable (SDr) if and only if inequalities of the form u ≤ φ(ζ) can be expressed by linear matrix inequalities (LMIs) [12, 35]. That is, φ(ζ) is SDr if and only if u ≤ φ(ζ) is equivalent to the existence of m × m symmetric matrices M0, ···, M_{m1+m2} ∈ 𝕊m and a vector v = (v1, …, vm2)^T such that
u M_0 + ∑_{i=1}^{m1} ζ_i M_i + ∑_{j=1}^{m2} v_j M_{m1+j} ⪰ 0    (10)
Here, the notation ⪰ means that the matrix on its left must be positive semidefinite. Given the optimality criterion and the design problem, real numbers c1, …, cm1 to be used in a linear combination of ζ are internally generated, and the optimal values of ζ for the SDr functions are determined from semidefinite programs of the form:
max_{ζ∈ℝ^{m1}} ∑_{i=1}^{m1} c_i ζ_i    (11a)

s.t.  u M_0 + ∑_{i=1}^{m1} ζ_i M_i + ∑_{j=1}^{m2} v_j M_{m1+j} ⪰ 0    (11b)
In our design context, m = np, the known number of parameters in the model. However, the integers m1 and m2 are not user-specified; they are set by the SDP solver and depend on the number of points used in the discretization of the design space, which is known, and on the operators used to codify the LMIs in the SDP formulations. The vector c = (c1, ···, cm1)^T also depends on the design problem, the operators and the discretization scheme, and is generated internally before the optimization problem is solved. The matrices Mi, i = 0, …, m1, are the local FIMs and other matrices used to reformulate SDr functions, and the vector ζ includes the weights wi, i ∈ [k], of the optimal design. The matrices Mi, i = m1 + 1, ···, m1 + m2, are required to represent the SDr functions but are not included in the optimization problem. Optimal designs are found from formulation (11), which frequently contains many semidefinite constraints similar to (11b), along with the obvious linear constraints on w, i.e., its components are nonnegative and sum to unity.
A list of SDr functions was compiled by Ben-Tal and Nemirovski (2001) [36, Chap. 2–3], and used for deriving SDP formulations for the M-bODE problem, see Boyd and Vandenberghe (2004) [30, Sec. 7.3]. Sagnol (2013) showed that each criterion in Kiefer’s class of optimality criteria, defined by

Φ_p[ℳ(ξ, θ)] = ( (1/np) tr[ℳ^p(ξ, θ)] )^{1/p},
is SDr for all rational values of p ∈ (−∞, 1], and general SDP formulations exist [35]. This result also covers the limiting case p → 0, whereupon Φp[ℳ(ξ, θ)] = det[ℳ(ξ, θ)]^{1/np} and D–optimality obtains [37]. The maximization of this SDr function is clearly equivalent to maximizing the geometric mean of the eigenvalues of the FIM [36, Chap. 3].
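Kiefer's class is easy to render numerically via the eigenvalues of the FIM; a short sketch (the p = 0 branch implements the geometric-mean limit just discussed):

```python
import numpy as np

def phi_p(M, p):
    """Kiefer's Phi_p criterion ((1/np) tr[M^p])^(1/p) via eigenvalues.

    p -> 0 gives det(M)^(1/np) (D-optimality), p = -1 is proportional
    to the A-criterion, and p -> -inf approaches the E-criterion.
    """
    lam = np.linalg.eigvalsh(M)        # M symmetric positive definite
    if p == 0:                          # limiting case: geometric mean
        return np.prod(lam) ** (1.0 / lam.size)
    return np.mean(lam ** p) ** (1.0 / p)

M = np.diag([1.0, 4.0])
```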
2.4. Global optimization
NLP formulations for Bayesian D–optimal designs were used by Chaloner (1989), who reported difficulty in applying the Fedorov-Wynn algorithm to find the optimal designs and had to resort to a Nelder-Mead simplex based method to solve several design problems for the logistic model. Boer and Hendrix (2000) observed that the D–optimality NLP formulation can have multiple optimal solutions, so that global optimization tools are required. To make the problem numerically tractable, it is helpful to search for the optimal design by fixing the number of support points and letting the design evolve until the general equivalence theorem is satisfied.
Global optimization (GO) seeks the global optimum of a nonconvex function f : X ↦ ℝ over a compact domain X. The general structure of GO problems is:

min_{ξ∈X} f(ξ)  s.t.  r(ξ) = 0,  g(ξ) ≤ 0    (12)
where r is a set of me equality constraints and g is a set of mi inequality constraints [38]. If all the decision variables ξ are continuous, as we assume here for wi and xi, the problem is a Nonlinear Program. When problem (12) is nonconvex, locally optimal solutions might not be global. Algorithms for GO problems fall into two classes: deterministic methods [39] and stochastic methods [40]. The former partition the original domain into sub-domains, determine the local optima and use theoretical bounds to discard sub-domains. Some examples of deterministic algorithms are branch-and-bound, inner and outer approximation, and interval algebra-based methods; a complete overview of GO algorithms is available in [41, 39].
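As an illustration of the stochastic branch, the sketch below uses scipy's differential evolution (not one of the deterministic solvers discussed in this paper) to find a 3-point D–optimal design for a quadratic model on [−1, 1], with the simplex constraint on the weights handled implicitly by normalization:

```python
import numpy as np
from scipy.optimize import differential_evolution

def neg_log_det(z):
    """z = (x1, x2, x3, w1, w2, w3): support points and raw weights of a
    3-point design for the quadratic model b0 + b1*x + b2*x^2."""
    x, w = z[:3], z[3:]
    w = w / w.sum()                        # enforce sum(w) = 1 implicitly
    H = np.column_stack([np.ones(3), x, x ** 2])
    M = H.T @ (w[:, None] * H)             # information matrix
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

bounds = [(-1.0, 1.0)] * 3 + [(1e-3, 1.0)] * 3
result = differential_evolution(neg_log_det, bounds, seed=1, tol=1e-10,
                                maxiter=500)
# the known optimum puts weight 1/3 at x = -1, 0, 1, where the objective
# equals -log(4/27)
```

Fixing the number of support points, as recommended above, keeps the decision vector small; the design is then checked against the equivalence theorem and the count increased if needed.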
3. SDP-generated Optimal Designs
In this section, we use SDP formulations to solve M-bODE problems: subsection 3.1 treats linear models and subsection 3.2 nonlinear models. We demonstrate the two strategies using D–optimality and omit the details for A– and E–optimality. We present and compare the D–, A– and E–optimal designs from the two methods and the effect of the discretization scheme on the search and the quality of the optimal designs.
3.1. Linear models
The SDP formulations used to find optimal designs for linear models are based on the representations of Boyd and Vandenberghe (2004) [30]. Recalling that the FIM of a linear model does not depend on the model parameters, we write the FIM simply as ℳ(ξ) and drop θ from the argument. As an illustration, consider using the SDP formulation for finding a D–optimal design for a linear model. Recalling that np is the number of parameters in θ, the LMI τ ≤ (det[ℳ(ξ)])^{1/np} holds if and only if there exists an np × np lower triangular matrix 𝒞 such that

[ ℳ(ξ)  𝒞 ; 𝒞^T  Diag(𝒞) ] ⪰ 0  and  τ ≤ ( ∏_{j=1}^{np} 𝒞_{j,j} )^{1/np},
where Diag(𝒞) is the diagonal matrix with diagonal entries 𝒞j,j and the geometric mean of the 𝒞j,j on the extreme right can, in turn, be expressed as a series of 2 × 2 LMIs [36].
To handle SDP problems we can use user-friendly interfaces, such as cvx [42] or Picos [43], that automatically transform constraints of the form τ ≤ φ(ζ) into a series of LMIs and pass them to SDP solvers such as SeDuMi [44] or Mosek [45]. In what is to follow, we present the SDP formulations for finding A–, E– and D–optimal designs in a compact form, so that they can be used directly in high-level interfaces. This means that instead of writing out the LMIs generated by reformulating the optimization problem, we use the operators themselves, with additional constraints to ensure that the matrix ℳ(ξ) is positive semidefinite.
Consider the SDP formulation for the D–optimal design problem (4) with Φp[ℳ(ξ)] = det[ℳ(ξ)]1/np as the criterion. The optimization problem is now compactly represented by
max_{w} (det[ℳ(ξ)])^{1/np}    (13a)

s.t.  ℳ(ξ) = ∑_{i∈[q]} w_i ℳ(δ_{x_i})    (13b)

ℳ(ξ) ⪰ 0    (13c)

∑_{i∈[q]} w_i = 1,  w_i ≥ 0,  i ∈ [q]    (13d)
Similar formulations also apply for the A– and E–optimal design problems (5–6) using the criteria [tr(ℳ−1(ξ))]−1, and λmin(ℳ(ξ)), respectively. These SDP problems are then solved using the cvx environment combined with the SDP solver Mosek. The relative and absolute tolerances were set to 10−5 in all problems.
3.1.1. Example 1: Optimal Designs for a Linear Mixture Model
We evaluate the SDP formulation for finding optimal designs for a linear model using an empirical model representing the influence of the composition of a water/acetone/ethanol mixture on the size of amphiphilic β-cyclodextrin nanoparticles [46]. The experiment is in part motivated by the known effect of the composition of the solvent mixture on the size of the nanoparticles produced by precipitation. We use a quadratic mixture model with mean function given by
𝔼[y | x] = β1 x1 + β2 x2 + β3 x3 + β1,2 x1 x2 + β1,3 x1 x3 + β2,3 x2 x3    (14a)

x1 + x2 + x3 = 1    (14b)

0 ≤ xi ≤ 1,  i = 1, 2, 3    (14c)
Here x1 is the fraction of water, x2 the fraction of ethanol, and x3 the fraction of acetone in the solvent, and the response y is the average size of the nanoparticles. The inequalities (14c) are physical constraints that the fraction of each component has to satisfy. Using the equality constraint (14b) to eliminate x3, we obtain
𝔼[y | x] = b0 + b1 x1 + b2 x2 + b1,2 x1 x2 + b1,1 x1² + b2,2 x2²    (15a)

x1 + x2 ≤ 1,  x1 ≥ 0,  x2 ≥ 0    (15b)
where all the parameters bi and bi,j’s are functions of the original βi’s. Our goal is to find D–, A– and E-optimal designs for estimating all the parameters in the re-parameterized model (15a), which includes the parameters bi and bi,j.
The design space is X ≡ [0.4, 0.7] × [0.0, 0.6], and we discretize this two-dimensional space using equally spaced grid points with Δx1 = Δx2 = 0.01. The discretization produces a discrete design space, denoted by 𝕏, with 1426 candidate points. From equation (3), we note that the regression vector is h(x_i) = [1, x_{1,i}, x_{2,i}, x_{1,i} x_{2,i}, x_{1,i}², x_{2,i}²]^T, and because the model is linear, the FIM does not depend on the model parameters. The solution of the SDP problem (13) determines the optimal set of k support points of ξ and their weights.
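The candidate set can be reproduced with a few lines of numpy. The sketch below builds the grid under the mixture constraint x3 = 1 − x1 − x2 ≥ 0, which is what yields the 1426 points quoted above, together with an (illustratively named) regression-vector helper for the model (15a):

```python
import numpy as np

# grid over X = [0.4, 0.7] x [0.0, 0.6] with step 0.01, keeping only the
# points that leave a nonnegative acetone fraction x3 = 1 - x1 - x2
grid = [(0.40 + 0.01 * i, 0.01 * j)
        for i in range(31) for j in range(61)
        if (0.40 + 0.01 * i) + 0.01 * j <= 1.0 + 1e-9]
X_grid = np.array(grid)                 # 1426 candidate design points

def h(x1, x2):
    """Regression vector of the reparameterized quadratic model (15a)."""
    return np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
```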
Table 1 displays the D–, A– and E–optimal designs obtained for all the optimality criteria. The D– and A–optimal designs have 9 support points, and the E–optimal design requires 35 points. Table 1 also reports the CPU time required to solve each problem. In all cases, our CPU times are relatively short.
Table 1.
SDP-generated D–, A– and E–optimal designs for the linear mixture model with Δx1 = Δx2 = 0.01 in Example 1.
| D–optimal design | A–optimal design | E–optimal design | |
|---|---|---|---|
| (0.40,0.00,0.60,0.1611) | (0.40,0.00,0.60,0.1542) | (0.40,0.00,0.60,0.1980) | |
| (0.40,0.30,0.30,0.1527) | (0.40,0.32,0.28,0.0623) | (0.55,0.00,0.45,0.0929) | |
| (0.40,0.60,0.00,0.1610) | (0.40,0.33,0.27,0.0674) | (0.55,0.01,0.44,0.0555) | |
| (0.53,0.23,0.24,0.0183) | (0.40,0.60,0.00,0.0322) | (0.55,0.02,0.43,0.0408) | |
| (0.53,0.24,0.23,0.0270) | (0.54,0.46,0.00,0.0083) | (0.55,0.03,0.42,0.0338) | |
| (0.56,0.00,0.44,0.0961) | (0.55,0.00,0.45,0.1390) | (0.55,0.45,0.00,0.0568) | |
| (0.56,0.44,0.00,0.0957) | (0.55,0.45,0.00,0.1214) | (0.70,0.29,0.01,0.0436) | |
| (0.70,0.00,0.30,0.1439) | (0.70,0.00,0.30,0.0818) | (0.70,0.30,0.00,0.2078) | |
| (0.70,0.30,0.00,0.1439) | (0.70,0.30,0.00,0.1480) | Additional 26 points | |
| CPU (s) | 1.7316 | 1.4040 | 1.6068 |
| Optimum | 0.00569874 | 4.0727e-005 | 5.5149e-005 |
(x1.xx, x2.xx, x3.xx,w.wwww) ≡(design point, weight)
To analyze the impact of the discretization scheme on the generated design, we now use two coarser equally spaced grids to discretize the design space, with Δx1 = Δx2 = 0.02 and Δx1 = Δx2 = 0.1. These choices are somewhat arbitrary but seem adequate for the purpose of the comparison here. The designs generated using these coarser grids are shown in Table 2 and are slightly different from those found with the finer grid in Table 1. Since the support points of the optimal design depend on the grid used to discretize the design space, finer grids may produce more efficient designs, as they are closer to the optimal designs obtained for the continuous domain X. In Section 4 we investigate the efficiency differences assuming that the design space is continuous and global optimization tools are used.
Table 2.
SDP-generated A–, E– and D–optimal designs for the linear mixture model in Example 1 using different discretization schemes.
| | Grid: Δx1 = Δx2 = 0.02 | | | Grid: Δx1 = Δx2 = 0.1 | | |
|---|---|---|---|---|---|---|
| | D–optimal design | A–optimal design | E–optimal design | D–optimal design | A–optimal design | E–optimal design |
| (0.40,0.00,0.60,0.1604) | (0.40,0.00,0.60,0.1531) | (0.56,0.14,0.30,0.0190) | (0.40,0.00,0.60,0.1603) | (0.40,0.00,0.60,0.1534) | (0.50,0.50,0.00,0.0487) | |
| (0.40,0.30,0.30,0.1537) | (0.40,0.32,0.28,0.0240) | (0.56,0.16,0.28,0.0181) | (0.40,0.30,0.30,0.1487) | (0.40,0.30,0.30,0.1165) | (0.60,0.00,0.40,0.0701) | |
| (0.40,0.60,0.00,0.1605) | (0.40,0.34,0.26,0.1091) | (0.70,0.18,0.12,0.0186) | (0.40,0.60,0.00,0.1604) | (0.40,0.40,0.20,0.0200) | (0.60,0.10,0.30,0.0490) | |
| (0.52,0.24,0.24,0.0102) | (0.40,0.60,0.00,0.0329) | (0.70,0.20,0.10,0.0206) | (0.50,0.00,0.50,0.0202) | (0.40,0.60,0.00,0.0329) | (0.60,0.20,0.20,0.0387) | |
| (0.54,0.22,0.24,0.0173) | (0.54,0.18,0.28,0.1822) | (0.70,0.22,0.08,0.0233) | (0.60,0.30,0.10,0.0310) | (0.50,0.20,0.30,0.0263) | (0.50,0.00,0.50,0.0648) | |
| (0.54,0.24,0.22,0.0187) | (0.54,0.46,0.00,0.1303) | (0.70,0.24,0.06,0.0275) | (0.50,0.30,0.20,0.0264) | (0.50,0.20,0.30,0.1723) | (0.60,0.40,0.00,0.0264) | |
| (0.56,0.00,0.44,0.0962) | (0.56,0.00,0.46,0.1431) | (0.70,0.26,0.04,0.0345) | (0.50,0.50,0.00,0.0197) | (0.50,0.50,0.00,0.0711) | (0.70,0.00,0.30,0.0285) | |
| (0.56,0.44,0.00,0.0961) | (0.70,0.00,0.30,0.0820) | (0.70,0.28,0.02,0.0497) | (0.60,0.00,0.40,0.0816) | (0.60,0.00,0.40,0.1069) | (0.70,0.10,0.20,0.0357) | |
| (0.70,0.00,0.30,0.1434) | (0.70,0.30,0.00,0.1433) | (0.70,0.30,0.00,0.1133) | (0.60,0.40,0.00,0.0821) | (0.60,0.40,0.00,0.0537) | (0.70,0.10,0.10,0.0488) | |
| (0.70,0.30,0.00,0.1434) | Additional 26 points | (0.70,0.00,0.30,0.1372) | (0.70,0.00,0.30,0.0838) | (0.70,0.30,0.00,0.1077) | ||
| CPU (s) | 1.4352 | 0.8268 | 1.2636 | 1.2168 | 0.7644 | 0.6084 |
| Optimum | 0.00569745 | 4.0621e-005 | 5.4655e-005 | 0.0055639 | 3.4523e-005 | 4.35901e-5 |
| eff† | 0.9989 | 0.9983 | 0.9910 | 0.8647 | 0.8486 | 0.7904 |
(x1.xx, x2.xx, x3.xx, w.wwww) ≡(design point, weight).
† Determined with (7–9) relative to the optimal designs generated with the finer grid (cf. Table 1).
The efficiencies obtained for both discretization schemes are listed in Table 2 and were determined with equations (7–9), taking the optimal designs ξD, ξA, and ξE generated with the finer grid (cf. Table 1) as references. We note that the grid generated with Δx1 = Δx2 = 0.02 has four times fewer nodes than the original (Δx1 = Δx2 = 0.01), but the decrease in efficiency is below 1%. The grid produced with Δx1 = Δx2 = 0.1 has 100 times fewer nodes, and the reduction in efficiency is between 10 and 20%. The differences in design efficiency due to support-point placement are partly compensated by the optimal choice of the weights, wi. From a practical point of view, a discrete grid is realistic since the design space may not allow, due to physical and economic limitations, all arbitrary combinations of regressors, and implementation is in any case constrained to rational values of N × wi.
3.2. Nonlinear models
We now extend the SDP-based formulation to find optimal designs for estimating the model parameters θ in a nonlinear model. Specifically, we find Bayesian optimal designs by first eliciting a prior distribution π(θ) for the model parameters and then average the criterion over the prior distribution. The resulting expectation integral is then approximated using GQF based on (κ − 1)th degree Legendre polynomials, see Duarte and Wong (2014) [47].
Let ℳ(ξ, θ) be the FIM from an approximate design ξ. The Bayesian optimal design problem is to find a design that satisfies
ξ* = arg max_{ξ∈Ξ} ∫_Θ Φ_p[ℳ(ξ, θ)] π(θ) dθ    (16)
Here we assume the design criterion is one of Kiefer’s Φp optimality criteria, but other criteria may be used. For D–optimality, Φp[ℳ(ξ, θ)] = (det[ℳ(ξ, θ)])^{1/np}, whose maximization is equivalent to that of log(det[ℳ(ξ, θ)]). For A–optimality, we have (tr[ℳ^{−1}(ξ, θ)])^{−1}, and for E–optimality, we have λ_min[ℳ(ξ, θ)].
Let ι be the number of points used in the integral approximation over Θ and let [ι] = {1, ···, ι} be the corresponding index set. Because we use Legendre polynomials of the same degree to approximate the expectation in each dimension of Θ, we have ι = κ^{np}. The discrete set Θ̄ ⊂ Θ contains ι parameter combinations θi, i ∈ [ι], each element θi ∈ ℝ^{np} being formed from the Cartesian product of the sets of GQF points for the individual dimensions of Θ. If ρ^T ≡ [ρ1, ···, ρκ] is the vector of roots of the (κ − 1)th order Legendre polynomial on the interval [−1, 1], then the jth coordinate of the point θi constructed from root ρ_{kj} of dimension j is

θ_{i,j} = (l_j + u_j)/2 + (u_j − l_j) ρ_{kj}/2,  j ∈ {1, ···, np}.
Let (k1, ···, k_{np}) be the tuple of indices in [κ] = {1, ···, κ} identifying the roots used to form θi, i ∈ [ι], and let γ^T = [γ1, ···, γκ] ∈ ℝ^κ be the vector of weights of the Legendre polynomials on the interval [−1, 1]. The weight of the ith point θi ∈ Θ̄ in the GQF is

ω_i = ∏_{j=1}^{np} γ_{kj},
and the expectation in (16) is now approximated using the GQF. The sought Bayesian optimal design for the prior π(θ) is
ξ* = arg max_{ξ∈Ξ} ∑_{i∈[ι]} ω_i π(θ_i) Φ_p[ℳ(ξ, θ_i)]    (17)
We observe that the Bayesian optimal design in (17) is obtained by optimizing a linear combination of the criteria Φp with nonnegative coefficients, since 0 ≤ π(θi) ≤ 1 and 0 ≤ ωi ≤ 1, i ∈ [ι]. Because each atomic element Φp[ℳ(ξ, θi)] is SDr by definition, the sum is itself an SDr function. Consequently, the SDP formulations for finding Bayesian A– and E–optimal designs follow directly. For A–optimality, we note that Φp[ℳ(ξ, θi)] = (tr[ℳ^{−1}(ξ, θi)])^{−1} is SDr for all i ∈ [ι], and the optimization problem is
max_{w} ∑_{i∈[ι]} ω_i π(θ_i) (tr[ℳ^{−1}(ξ, θ_i)])^{−1}    (18a)

s.t.  ℳ(ξ, θ_i) = ∑_{j∈[q]} w_j ℳ(δ_{x_j}, θ_i),  i ∈ [ι]    (18b)

ℳ(ξ, θ_i) ⪰ 0,  i ∈ [ι]    (18c)

∑_{j∈[q]} w_j = 1,  w_j ≥ 0,  j ∈ [q]    (18d)
For E–optimality, the SDP problem may be similarly formulated as follows:
max_{w} ∑_{i∈[ι]} ω_i π(θ_i) λ_min[ℳ(ξ, θ_i)]    (19a)

s.t.  ℳ(ξ, θ_i) = ∑_{j∈[q]} w_j ℳ(δ_{x_j}, θ_i),  i ∈ [ι]    (19b)

ℳ(ξ, θ_i) ⪰ 0,  i ∈ [ι]    (19c)

∑_{j∈[q]} w_j = 1,  w_j ≥ 0,  j ∈ [q]    (19d)
The SDP formulation for the D–optimal design problem is more complicated because, as noted earlier, Φp[ℳ(ξ, θi)] = log(det[ℳ(ξ, θi)]) is not SDr. However, exponentiating the weighted sum of logarithmic terms produces the equivalent problem

max_{ξ∈Ξ} ∏_{i∈[ι]} (det[ℳ(ξ, θ_i)])^{π(θ_i) ω_i},
where the function to optimize is now SDr. Specifically, the terms (det[ℳ(ξ, θi)])^{1/np} are SDr by construction, and their product has the form of a concave monomial, which is SDr if the power terms αi = np π(θi) ωi, ∀i ∈ [ι], are rational; see Ben-Tal and Nemirovski (2001) [36, Chap. 3]. If a power αi is irrational, a nearby rational value is used instead [42]. The upshot is that the Bayesian D–optimal design formulation is
max_{w} ∏_{i∈[ι]} (det[ℳ(ξ, θ_i)])^{π(θ_i) ω_i}    (20a)

s.t.  ℳ(ξ, θ_i) = ∑_{j∈[q]} w_j ℳ(δ_{x_j}, θ_i),  i ∈ [ι]    (20b)

ℳ(ξ, θ_i) ⪰ 0,  i ∈ [ι]    (20c)

∑_{j∈[q]} w_j = 1    (20d)

w_j ≥ 0,  j ∈ [q]    (20e)
3.2.1. Example 2: Optimal Designs for Estimating the Kinetics of Alcohol Dehydration
We now present two examples of SDP formulations of design problems for nonlinear models. The first considers the case where the FIM is obtained from the mean function f directly. In the second case, we consider a more complex model where the mean response function is only implicitly defined.
The first example is taken from Box and Hunter (1965) [24], where the interest was in fitting the kinetics of the catalytic dehydration of n-hexyl alcohol using the model:
| (21) |
Here y is the reaction rate, b1, b2, b3 are the parameters to estimate, and x1 and x2 are the partial pressures of alcohol and olefin used in the experiment, respectively. The plausible range of values for the regressors x = (x1, x2)T is the set X ≡ [0.0, 2.0] × [0.0, 2.0], and the vector of possible values for the parameters is contained in Θ. The FIM for a “single observation” xi is constructed by first differentiating f(x, θ) with respect to θ to obtain the vector hT(xi, θ).
Our goal is to find A–, D– and E–optimal designs for estimating the model parameters. To this end, we first discretized the design space using a grid of equally spaced points with Δx1 = Δx2 = 0.1. This results in 441 grid points as candidate support points for the design. The expectation is computed using a 6-point GQF in each dimension of Θ, resulting in 6³ = 216 points. We used the nominal values for the parameters from Box and Hunter (1965) [24] and set the plausible region Θ ≡ [1.9, 3.9] × [9.2, 15.2] × [1.14, 2.34]. Our interest is to determine various optimal designs for estimating the model parameters assuming a three-dimensional uniform prior distribution on Θ.
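The tensor-product GQF just described can be sketched in a few lines with NumPy's Gauss–Legendre routine. This is a minimal illustration: `tensor_gqf` is a hypothetical helper, and for the uniform prior the combined prior-times-quadrature weights reduce to normalized products of the one-dimensional Legendre weights:

```python
import numpy as np

def tensor_gqf(bounds, n=6):
    """Tensor-product Gauss-Legendre quadrature for averaging a design
    criterion over a uniform prior on the box `bounds` (hypothetical helper)."""
    nodes, gamma = np.polynomial.legendre.leggauss(n)  # nodes/weights on [-1, 1]
    grids, wts = [], []
    for a, b in bounds:
        grids.append(0.5 * (b - a) * nodes + 0.5 * (a + b))  # map nodes to [a, b]
        wts.append(0.5 * gamma)  # the 1/2 turns integration into averaging per dimension
    theta = np.array(np.meshgrid(*grids, indexing="ij")).reshape(len(bounds), -1).T
    w = np.prod(np.array(np.meshgrid(*wts, indexing="ij")).reshape(len(bounds), -1), axis=0)
    return theta, w  # quadrature points in Theta and their combined weights

bounds = [(1.9, 3.9), (9.2, 15.2), (1.14, 2.34)]
theta, w = tensor_gqf(bounds)   # 6**3 = 216 points; weights sum to 1
```

The criterion averaged over the prior is then simply the weighted sum of the criterion evaluated at the 216 points θi.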
Table 3 presents the Bayesian optimal designs obtained with the SDP formulations using different grid sets. We observe that all optimal designs obtained with the grid constructed with Δx1 = Δx2 = 0.1 have 4 support points, and 3 support points when a coarser grid with twice the original spacing is employed. The CPU time required is much longer than in Example 1 because the size of the SDP problem to be solved here has increased by a few orders of magnitude. We also notice that (i) the CPU time increases significantly when the grid set used to discretize X is finer and (ii) the Bayesian D–optimal design problem is computationally more challenging because of the number of LMIs in the problem.
Table 3.
SDP generated optimal designs for the kinetics of the catalytic dehydration of n-hexyl alcohol with the uniform prior, for different discretization schemes.
| | Grid: Δx1 = Δx2 = 0.1 | | | Grid: Δx1 = Δx2 = 0.2 | | |
|---|---|---|---|---|---|---|
| | D–optimal design | A–optimal design | E–optimal design | D–optimal design | A–optimal design | E–optimal design |
| | (0.30,0.00,0.3335) | (0.20,0.00,0.4432) | (0.20,0.00,0.0896) | (0.20,0.00,0.3333) | (0.20,0.00,0.4804) | (0.20,0.00,0.4257) |
| | (2.00,0.00,0.3331) | (0.30,0.00,0.0373) | (0.30,0.00,0.4125) | (2.00,0.00,0.3333) | (2.00,0.00,0.0663) | (0.40,0.00,0.0859) |
| | (2.00,0.50,0.0490) | (2.00,0.00,0.0661) | (2.00,0.00,0.0219) | (2.00,0.60,0.3333) | (2.00,0.60,0.4533) | (2.00,0.00,0.0151) |
| | (2.00,0.60,0.2843) | (2.00,0.60,0.4534) | (2.00,0.50,0.4760) | | | (2.00,0.50,0.4734) |
| CPU (s) | 168.5435 | 51.6987 | 16.4581 | 42.5103 | 23.93055 | 5.5848 |
| Optimum | 0.967486 | 54041.8 | 3.02988E-5 | 0.966133 | 54033.0 | 2.97844E-5 |
(x1.xx, x2.xx,w.wwww) ≡(design point, weight).
To investigate the impact of the choice of the prior distribution π(θ) on the generated design, we now suppose we have a three-dimensional uncorrelated normal distribution with mean μ = (2.9, 12.2, 1.74)T, where the ordered diagonal elements in the covariance matrix ϒ1 are (1/3)², 1.0² and 0.2², respectively. Table 4 presents the A–, D– and E–optimal designs under this normal prior. The number of support points of the A–optimal design is the same as under the uniform prior, but the E–optimal design has one additional support point compared with the design found under the uniform prior distribution. Comparing Tables 3 and 4 reveals that the prior distribution seems to have little influence on the generated designs. Of course, this observation cannot be generalized to other design problems, but the techniques discussed here can be similarly applied to ascertain the impact of the choice of the prior on the generated design.
Table 4.
SDP generated optimal designs for the kinetics of the catalytic dehydration of n-hexyl alcohol with the normal prior, for Example 2 with a grid width of Δx1 = Δx2 = 0.1.
| | D–optimal design | A–optimal design | E–optimal design |
|---|---|---|---|
| | (0.30,0.00,0.3333) | (0.20,0.00,0.4089) | (0.20,0.00,0.2001) |
| | (2.00,0.00,0.3333) | (0.30,0.00,0.0747) | (0.30,0.00,0.3077) |
| | (2.00,0.50,0.0411) | (2.00,0.00,0.0619) | (2.00,0.00,0.0134) |
| | (2.00,0.60,0.2923) | (2.00,0.60,0.4545) | (2.00,0.50,0.3967) |
| | | | (2.00,0.60,0.0819) |
| CPU (s) | 133.7084 | 61.32399 | 16.9729 |
| Optimum | 0.986152 | 46486.9 | 2.69544E-5 |
(x1.xx, x2.xx,w.wwww) ≡(design point, weight).
3.2.2. Example 3: Optimal Designs for Estimating Activity Coefficients in a UNIFAC Model
This application finds optimal designs for estimating group interaction parameters in the UNIFAC model. Recalling that UNIQUAC (short for UNIversal QUAsiChemical) is an activity coefficient model used to describe phase equilibria, UNIFAC stands for UNIQUAC Functional-group Activity Coefficients and the method is a semi-empirical system for estimating non-electrolyte activity in non-ideal mixtures. For example, [48] used the model for estimating the activity coefficient in the liquid-vapor equilibrium. Our model comprises a binary mixture of n-pentane and acetone where the mean response is given by
| (22) |
Here y is the activity coefficient and ζ is experimentally measured as the ratio γl Pv/(γv P), where γl is the molar fraction of a mixture component in the liquid phase, Pv is the vapor pressure estimated by the Antoine equation, γv is the molar fraction of that component in the vapor phase in equilibrium, P is the pressure at which the experiment is carried out, and both b1 and b2 are group interaction parameters. The Antoine equation is a simple 3-parameter regression model commonly used to fit experimental vapor pressures measured over a restricted temperature range, and we assume its parameters are known. The function f, formalized by the UNIFAC model, is continuous and differentiable, and can be found in several textbooks. For a complete overview of the model and its technicalities, the reader is referred to [48, pages 8.75–8.77]. We assume that the regressors in the design of experiments are the molar fraction of one of the components in the liquid phase, here called x1, and the temperature of the experiment (expressed in K), designated by x2. Because the mixture is binary, the composition of the second component depends only on that of the first and is not considered a factor in the design of experiments.
Let us consider a binary mixture formed by n-pentane/acetone, with three different functional groups: CH3–, –CH2–, and CH3CO–. The group interaction parameters between the groups CH3– and –CH2– are 0, since both belong to the same main group, see Hansen et al. (1991) [49]. An optimal design is sought to estimate, as accurately as possible, the interaction parameter between the groups CH3CO– and CH3–, b1, and the interaction parameter between the groups CH3– and CH3CO–, b2. The domain Θ is [426.40, 526.40] × [20.76, 32.76], the design space is X ≡ [0.0, 1.0] × [298.0, 318.0], and the nominal values for b1 and b2 are 476.40 and 26.76, respectively, see Poling et al. (2001) [48]. The grid employed to discretize the design space X is equally spaced in each dimension with Δx1 = 0.01 and Δx2 = 1.0. The information matrix is found in the same way, except that the vector of derivatives of f, hT(xi, θ), is now determined numerically using a central-difference approximation with a step size equal to 10⁻⁵.
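The numerical construction of the FIM just described can be sketched as follows. This is a minimal illustration: `grad_central` and `fim` are hypothetical helper names, and a toy linear mean function stands in for the UNIFAC response so the result is easy to verify:

```python
import numpy as np

def grad_central(f, x, theta, h=1e-5):
    """Central-difference approximation of h(x, theta) = df/dtheta (step 1e-5)."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for j in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += h
        tm[j] -= h
        g[j] = (f(x, tp) - f(x, tm)) / (2.0 * h)
    return g

def fim(points, weights, f, theta):
    """FIM of a design: M = sum_i w_i h(x_i, theta) h(x_i, theta)^T."""
    M = np.zeros((len(theta), len(theta)))
    for x, w in zip(points, weights):
        hv = grad_central(f, x, theta)
        M += w * np.outer(hv, hv)
    return M

# Toy mean function standing in for the UNIFAC response (linear in theta, so
# the central-difference gradient is exact and the FIM is easy to check)
f_toy = lambda x, t: t[0] * x[0] + t[1] * x[1]
M = fim([(1.0, 0.0), (0.0, 1.0)], [0.5, 0.5], f_toy, np.array([476.40, 26.76]))
```

In the actual problem f would be the implicit UNIFAC response evaluated at the nominal (or quadrature) parameter values.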
Table 5 shows the optimal designs found when the uniform prior is used. The A– and E–optimal designs have 2 support points, and the D–optimal design has 3 points. In all the designs, the optimal temperature settings at which to carry out the experiment are at the upper extreme of the design space X. Table 5 also presents the designs obtained for a coarser discretization grid with Δx1 = 0.02 and Δx2 = 1.0. Both grids produce similar designs with about the same efficiency, except for A–optimality, where both discretization schemes produce the same design.
Table 5.
SDP generated A–, E– and D–optimal designs for the activity coefficient of a binary mixture of n-pentane/acetone with the uniform prior, for different discretization schemes.
| | Grid: Δx1 = 0.01, Δx2 = 1.0 | | | Grid: Δx1 = 0.02, Δx2 = 1.0 | | |
|---|---|---|---|---|---|---|
| | D–optimal design | A–optimal design | E–optimal design | D–optimal design | A–optimal design | E–optimal design |
| | (0.14,318.0,0.1207) | (0.12,318.0,0.5971) | (0.11,318.0,0.6058) | (0.14,318.0,0.0633) | (0.12,318.0,0.5971) | (0.12,318.0,0.5978) |
| | (0.48,318.0,0.3844) | (0.90,318.0,0.4029) | (0.90,318.0,0.3942) | (0.48,318.0,0.4398) | (0.90,318.0,0.4029) | (0.90,318.0,0.4022) |
| | (0.87,318.0,0.4949) | | | (0.88,318.0,0.4969) | | |
| CPU (s) | 257.0740 | 252.6124 | 250.3660 | 123.3812 | 121.8523 | 121.5872 |
| Optimum | 0.996786 | 32.9628 | 0.0324719 | 0.9967833 | 32.9628 | 0.0323780 |
(x1.xx, x2xx.x,w.wwww) ≡(design point, weight).
Table 6 displays the optimal designs determined for the normal prior with μ = (476.40, 26.76)T and ϒ2 = Diag(16.67², 2.00²). The results in Tables 5 and 6 follow the trends observed in Example 2, that is, the prior distribution seems to only marginally affect the generated design. In this example, there are differences only in the weights of the support points, and even then, they are small.
Table 6.
SDP generated A–, E– and D–optimal designs for the activity coefficient of a binary mixture of n-pentane/acetone with the normal prior, for Δx1 = 0.01 and Δx2 = 1.0.
| | D–optimal design | A–optimal design | E–optimal design |
|---|---|---|---|
| | (0.1400,318.0,0.1192) | (0.1200,318.0,0.5962) | (0.1100,318.0,0.6068) |
| | (0.4800,318.0,0.3857) | (0.9000,318.0,0.4038) | (0.9000,318.0,0.3932) |
| | (0.8700,318.0,0.4951) | | |
| CPU (s) | 258.9304 | 254.4220 | 253.5640 |
| Optimum | 0.9967869 | 32.4097 | 0.032287 |
(x1.xx, x2xx.x,w.wwww) ≡(design point, weight).
4. Global Optimization-generated Optimal Designs
In this section we provide NLP formulations for the D–optimal design problem for linear and nonlinear models and note that, because the function to be optimized is no longer convex, there may be multiple local optima. Accordingly, we require Global Optimization (GO) solvers. For space considerations, we illustrate the procedure using the D–optimality criterion. The interest of this methodology is that we can search for the globally optimal design over a continuous design space X and use it as a reference to assess the efficiency of SDP-generated designs.
The D–optimality criterion seeks a design that minimizes the determinant of the inverse of the FIM. This problem is equivalent to finding a design that minimizes the product of the inverses of its eigenvalues [26]. The number of support points of the optimal design is not known a priori, so we use an iterative procedure to find it. Following Dette and Titoff (2009) [50], we set the number of support points, k, in the starting design equal to np and solve the problem; if necessary, we update the value of k and re-solve the problem until there is no improvement in the objective function for two consecutive values of k, or one of the k support points of the design has a null weight. The latter strategy is inspired by the cutting plane algorithm for determining optimal designs of experiments [51].
Using the Cholesky decomposition, we write the information matrix ℳ(ξ, θ) as the product of a unique lower triangular matrix 𝒟(ξ, θ) ∈ ℝnp×np and its transpose, ℳ(ξ, θ) = 𝒟(ξ, θ) 𝒟T(ξ, θ).
Let 𝒟i,i be the ith diagonal entry of 𝒟(ξ, θ). It follows that the determinant of ℳ(ξ, θ) is det[ℳ(ξ, θ)] = ∏i 𝒟i,i²,
and the D–optimality criterion becomes
| (23) |
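The identity behind this reformulation — that the determinant of the FIM equals the squared product of the diagonal entries of its Cholesky factor — can be checked numerically. The matrix below is an arbitrary positive definite stand-in for ℳ(ξ, θ):

```python
import numpy as np

# Check det[M] = prod_i D_{i,i}^2, where M = D D^T is the Cholesky factorization
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
M = A @ A.T + 3.0 * np.eye(3)       # positive definite by construction
D = np.linalg.cholesky(M)           # unique lower-triangular factor
assert np.isclose(np.prod(np.diag(D)) ** 2, np.linalg.det(M))
```

In an NLP objective it is usually numerically safer to maximize 2 Σi log 𝒟i,i than to form the determinant itself.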
4.1. Linear models
We now introduce the NLP formulation to find D–optimal designs for linear models. The FIM does not depend on the model parameters, and we denote the derivative of the mean function with respect to θ by h(x) with elements hj(x), j = 1, ···, np. If [np] = {1, ···, np} is the index set of the parameters, the formulation of the optimization problem is:
| (24a) |
| (24b) |
| (24c) |
| (24d) |
| (24e) |
| (24f) |
| (24g) |
Other constraints, such as symmetry of the design or additional equalities that the design should satisfy, can also be explicitly included in the mathematical program. As always, in our code here and elsewhere, we stipulate a small positive constant ε, say 10⁻⁸, to ensure the positive semidefiniteness of the FIM during the iteration process.
The problem (24) may have multiple optima even when the number of support points, k, is imposed at the onset. To determine a globally optimal design we codified the problem in GAMS [52] and used OQNLP, a solver based on a multistart heuristic algorithm, to find the global optimum. The algorithm calls an NLP solver from multiple starting points, keeps all the feasible solutions found, and picks the best as the optimum of the problem [53]. The starting points are computed with a random sampling driver that uses independent normal probability distribution functions for each decision variable. OQNLP does not guarantee that the final solution is a global optimum, but it has been successfully tested on a large set of problems. To build the initial sampling points the variables need to be bounded, which holds here since the design space and the region of plausible parameter values are compact by assumption. The NLP solver called by OQNLP is CONOPT, which in turn uses the Generalized Reduced Gradient (GRG) algorithm [54]. The maximum number of starting points allowed is set to 1000, and the procedure terminates when 100 consecutive NLP solver calls result in only a tiny improvement in the criterion value, say less than 10⁻⁴. The absolute and relative tolerances of the solvers were set to 10⁻⁸ and 10⁻⁷, respectively, in all our problems.
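The multistart workflow can be mimicked in a few lines of Python. The sketch below substitutes a simple multiplicative weight-update rule for the CONOPT local solves and optimizes only the weights on a fixed candidate grid (OQNLP/GRG also move the support points), so it illustrates the multistart shell rather than reimplementing the solver; `local_ascent` and `multistart` are hypothetical helper names:

```python
import numpy as np

def local_ascent(H, w0, iters=300):
    """Local solve stand-in: multiplicative weight update for D-optimality on a
    fixed candidate set (rows of H are the gradient vectors h(x_i))."""
    w = w0.copy()
    for _ in range(iters):
        M = H.T @ (w[:, None] * H)                            # information matrix
        d = np.einsum("ij,jk,ik->i", H, np.linalg.inv(M), H)  # variance function d(x_i)
        w *= d / H.shape[1]                                   # w_i <- w_i d(x_i)/np
        w /= w.sum()
    return w, np.linalg.slogdet(H.T @ (w[:, None] * H))[1]

def multistart(H, n_starts=20, seed=0):
    """Multistart shell: random starting weights, keep the best local solution."""
    rng = np.random.default_rng(seed)
    best_w, best_obj = None, -np.inf
    for _ in range(n_starts):
        w, obj = local_ascent(H, rng.dirichlet(np.ones(H.shape[0])))
        if obj > best_obj:
            best_w, best_obj = w, obj
    return best_w, best_obj

# Example: quadratic regression y = b0 + b1 x + b2 x^2 on 21 candidates in [-1, 1];
# the weight mass concentrates on x = -1, 0, 1, the known D-optimal support
x = np.linspace(-1.0, 1.0, 21)
H = np.column_stack([np.ones_like(x), x, x ** 2])
w_best, obj_best = multistart(H, n_starts=5)
```

For this concave subproblem every start reaches the same optimum; the multistart shell matters for the genuinely multimodal NLPs treated in the text.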
We now apply the procedure to Example 1 in §3.1.1, previously handled with the SDP-based strategy, and use it to test the NLP formulation (24). The NLP formulation produces a design with 8 support points, one point fewer than the D–optimal design resulting from the SDP-based approach; see Table 7. We observe that two of the support points obtained with the SDP formulation, the discrete points (0.5300, 0.2300, 0.2400) and (0.5300, 0.2400, 0.2300), are collapsed into a single point in the NLP-based design. The weight of the support point that replaces the former two equals the sum of the weights of the collapsed points in the SDP design; see the fourth point in the right column of Table 7.
Table 7.
SDP (based on a grid with Δx1 = Δx2 = 0.01) and NLP-generated D–optimal designs for the linear mixture model for Example 1.
| | SDP-generated design | NLP-generated design |
|---|---|---|
| | (0.4000,0.0000,0.6000,0.1605) | (0.4000,0.0000,0.6000,0.1601) |
| | (0.4000,0.3000,0.3000,0.1528) | (0.4000,0.3000,0.3000,0.1529) |
| | (0.4000,0.6000,0.0000,0.1605) | (0.4000,0.6000,0.0000,0.1601) |
| | (0.5300,0.2300,0.2400,0.0234) | (0.5313,0.2343,0.2344,0.0475) |
| | (0.5300,0.2400,0.2300,0.0236) | (0.5569,0.0000,0.4431,0.0955) |
| | (0.5600,0.0000,0.4400,0.0961) | (0.5569,0.4431,0.0000,0.0955) |
| | (0.5600,0.4400,0.0000,0.0961) | (0.7000,0.0000,0.3000,0.1442) |
| | (0.7000,0.0000,0.3000,0.1435) | (0.7000,0.3000,0.0000,0.1442) |
| | (0.7000,0.3000,0.0000,0.1435) | |
| CPU (s) | 2.2152 | 334.6080 |
| Optimum | 0.00569874 | 0.00574001 |
(x1.xxxx, x2.xxxx, x3.xxxx,w.wwww) ≡(design point, weight).
From (7), a direct calculation shows that the D–efficiency of the design found from the SDP formulation with Δx1 = Δx2 = 0.01, relative to the global optimal design found from the NLP method, is 0.9949. This suggests that the SDP-generated design is very close to the optimal design found with the NLP formulation and should be adequate for practical purposes. Another aspect to mention is that the NLP formulation is computationally intensive; the CPU time is more than 100 times greater than that required by the SDP formulation.
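Such efficiency computations follow the usual definition of D-efficiency, (det ℳ(ξ)/det ℳ(ξD))^(1/np); the small helper below (hypothetical name `d_efficiency`) computes it stably via log-determinants:

```python
import numpy as np

def d_efficiency(M, M_ref, n_params):
    """D-efficiency of a design with FIM M relative to a reference design with
    FIM M_ref, computed via log-determinants for numerical stability."""
    _, logdet = np.linalg.slogdet(M)
    _, logdet_ref = np.linalg.slogdet(M_ref)
    return float(np.exp((logdet - logdet_ref) / n_params))

# Shrinking every eigenvalue of the FIM by 10% gives efficiency 0.9 for any np
eff = d_efficiency(0.9 * np.eye(3), np.eye(3), 3)
```

An efficiency of 0.9949 thus means the SDP design needs fewer than 1% more observations to match the information content of the NLP reference design.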
4.2. Nonlinear models
To extend the NLP formulation for finding D–optimal designs to nonlinear models, we use the nomenclature of §3.2 and apply the Bayesian framework with a GQF to compute the expectation. The roots θℓ ∈ Θ and weights ωℓ, ℓ ∈ [ι], included in the integration are computed in a similar way, and different types of prior distributions are considered. A main difference from the linear model case discussed in §4.1 is that the determinants of the FIMs at all quadrature points θℓ ∈ Θ need to be computed, and consequently we need to factorize the ι FIMs via Cholesky decomposition. The NLP formulation is as follows:
| (25a) |
| (25b) |
| (25c) |
| (25d) |
| (25e) |
| (25f) |
| (25g) |
To test the formulation (25) we use the nonlinear model (21) in §3.2.1. The expectation is computed with a six-point GQF per dimension, and we adopt both priors used earlier: (i) the three-dimensional uniform distribution on Θ ≡ [1.9, 3.9] × [9.2, 15.2] × [1.14, 2.34]; and (ii) the multivariate normal distribution with μ = (2.9, 12.2, 1.74)T and ϒ1 = Diag((1/3)², 1.0², 0.2²).
Table 8 displays the D–optimal designs resulting from the NLP formulation for the uniform and normal prior distributions. The support points of the two designs are close even though the priors are of different types. We also observe that two neighboring support points of the SDP-generated design, i.e. (2.00, 0.50) and (2.00, 0.60), are collapsed into one in the design found by the NLP formulation. The NLP-generated design is equally weighted at 3 support points, the minimum required for the kinetic rate model.
Table 8.
NLP-generated Bayesian D–optimal designs for the kinetics of the catalytic dehydration of n-hexyl alcohol for different priors.
| | Uniform prior | Normal prior |
|---|---|---|
| | (0.2597,0.0000,0.3333) | (0.2575,0.0000,0.3333) |
| | (2.0000,0.0000,0.3333) | (2.0000,0.0000,0.3333) |
| | (2.0000,0.5549,0.3333) | (2.0000,0.5566,0.3333) |
| CPU (s) | 3002.334 | 2668.5240 |
| Optimum | 0.968842 | 0.989318 |
(x1.xxxx, x2.xxxx,w.wwww) ≡(design point, weight).
As noted before, the average CPU time required by the NLP formulation is about 12 times greater than that of the SDP-based setup; compare the first column of Tables 3 and 4 with the results in Table 8. Taking the NLP-generated design as the reference ξD in (7), the efficiencies of the SDP-generated Bayesian designs for the uniform and normal priors, presented in Tables 3 and 4, are 0.9986 and 0.9968, respectively. These results suggest that the SDP-generated Bayesian D–optimal designs, though sub-optimal, have efficiencies high enough for most practical implementations, and require considerably less computational effort than the NLP formulation.
5. Conclusions
This paper discusses a systematic approach based on mathematical programming to find M-bODE using SDP and NLP formulations. The latter method is capable of solving nonconvex optimization problems with multiple local optima, although a GO solver is required to find a global optimum. For nonlinear models, we adopted a Bayesian approach and evaluated the expectation using a GQF. Unlike the SDP formulations, which require the design space to be discretized, the NLP formulation is based on the Cholesky decomposition of the FIM and is capable of solving problems over a continuous domain.
We demonstrated the two procedures by applying them to find D–, A– and E–optimal designs for some chemical engineering problems. For D–optimality, the most common design criterion, our results consistently show that the NLP formulation produces more efficient designs than those obtained via SDP. However, the differences in their efficiencies are typically negligible, and the computational effort required by the SDP formulation is one to two orders of magnitude lower. Consequently, we recommend the SDP formulation for practical purposes. We also observed that designs obtained from the SDP formulations typically contain more points than those from the NLP formulation, where the extra points tend to collapse onto the support points of the NLP designs. This is not surprising because one method assumes a discrete design space and the other does not. A cautionary note is that appropriate and reliable solvers are required to solve the optimization problems efficiently.
HIGHLIGHTS.
SDP-based formulations for optimal design of experiments;
NLP-based formulation for D-optimal design of experiments;
Formulations to handle both linear and nonlinear algebraic models;
Examples from the areas of Chemistry and Chemical Engineering;
SDP-based formulation is computationally competitive and accurate.
Acknowledgments
The research of Wong reported in this paper was partially supported by the National Institute Of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639.
Footnotes
The contents in this paper are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Contributor Information
Belmiro P.M. Duarte, Email: bduarte@isec.pt.
Weng Kee Wong, Email: wkwong@ucla.edu.
Nuno M.C. Oliveira, Email: nuno@eq.uc.pt.
References
- 1.Fedorov VV. Theory of Optimal Experiments. Academic Press; 1972.
- 2.Wynn HP. Results in the theory and construction of D-optimum experimental designs. Journal of the Royal Statistical Society, Series B. 1972;34:133–147.
- 3.Mitchell TJ. An algorithm for the construction of D-optimal experimental designs. Technometrics. 1974;16:203–210.
- 4.Galil Z, Kiefer J. Time- and space-saving computer methods, related to Mitchell’s DETMAX for finding D-optimum designs. Technometrics. 1980;22:301–313.
- 5.Torsney B, Mandal S. Two classes of multiplicative algorithms for constructing optimizing distributions. Computational Statistics & Data Analysis. 2006;51(3):1591–1601.
- 6.Dette H, Pepelyshev A, Zhigljavsky AA. Improving updating rules in multiplicative algorithms for computing D-optimal designs. Computational Statistics & Data Analysis. 2008;53(2):312–320.
- 7.Cook RD, Nachtsheim CJ. Model robust, linear-optimal designs. Technometrics. 1982;24:49–54.
- 8.Pronzato L. Optimal experimental design and some related control problems. Automatica. 2008;44:303–325.
- 9.Kiefer J, Wolfowitz J. The equivalence of two extremum problems. Canadian Journal of Mathematics. 1960;12:363–366.
- 10.Vandenberghe L, Boyd S. Applications of semidefinite programming. Applied Numerical Mathematics. 1999;29:283–299.
- 11.Sagnol G. Computing optimal designs of multiresponse experiments reduces to second-order cone programming. Journal of Statistical Planning and Inference. 2011;141(5):1684–1708.
- 12.Papp D. Optimal designs for rational function regression. Journal of the American Statistical Association. 2012;107:400–411.
- 13.Duarte BP, Wong W-K. A semi-infinite programming based algorithm for finding minimax optimal designs for nonlinear models. Statistics and Computing. 2014;24(6):1063–1080.
- 14.Chaloner K, Larntz K. Optimal Bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference. 1989;21:191–208.
- 15.Molchanov I, Zuyev S. Steepest descent algorithm in a space of measures. Statistics and Computing. 2002;12:115–123.
- 16.Heredia-Langner A, Montgomery DC, Carlyle WM, Borror CM. Model-robust optimal designs: A Genetic Algorithm approach. Journal of Quality Technology. 2004;36:263–279.
- 17.Zhang Y. Bayesian D-Optimal Design for Generalized Linear Models. PhD thesis, Virginia Polytechnic Institute and State University; 2006.
- 18.Boer EPJ, Hendrix EMT. Global optimization problems in optimal design of experiments in regression models. Journal of Global Optimization. 2000;18:385–398.
- 19.Pazman A. Foundations of Optimum Experimental Design. Reidel Publishing Company; New York: 1986.
- 20.Whittle P. Some general points in the theory of optimal experimental design. Journal of the Royal Statistical Society, Series B. 1973;35:123–130.
- 21.Kiefer J. General equivalence theory for optimum design (approximate theory). Annals of Statistics. 1974;2:849–879.
- 22.Silvey S. Optimal Design. Chapman & Hall; 1980.
- 23.Pukelsheim F. Optimal Design of Experiments. SIAM; Philadelphia: 1993.
- 24.Box GEP, Hunter WG. The experimental study of physical mechanisms. Technometrics. 1965;7(1):23–42.
- 25.Dette H, Melas VB, Strigul N. Design of experiments for microbiological models. In: Applied Optimal Designs. John Wiley & Sons; 2005. pp. 137–180.
- 26.Atkinson AC, Donev AN, Tobias RD. Optimum Experimental Designs, with SAS. Oxford University Press; Oxford: 2007.
- 27.Wong W. A unified approach to the construction of minimax designs. Biometrika. 1992;79:611–620.
- 28.Chaloner K, Verdinelli I. Bayesian experimental design: A review. Statistical Science. 1995;10:273–304.
- 29.Atkinson KE. An Introduction to Numerical Analysis. 2nd ed. John Wiley & Sons; New York: 1989.
- 30.Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge: 2004.
- 31.Ye Y. Interior Point Algorithms: Theory and Analysis. John Wiley & Sons; New York: 1997.
- 32.Vandenberghe L, Boyd S. Semidefinite programming. SIAM Review. 1996;38:49–95.
- 33.Filová L, Trnovská M, Harman R. Computing maximin efficient experimental designs using the methods of semidefinite programming. Metrika. 2011;64(1):109–119.
- 34.Qi H. A semidefinite programming study of the Elfving theorem. Journal of Statistical Planning and Inference. 2011;141:3117–3130.
- 35.Sagnol G. On the semidefinite representation of real functions applied to symmetric matrices. Linear Algebra and its Applications. 2013;439(10):2829–2843.
- 36.Ben-Tal A, Nemirovski AS. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Society for Industrial and Applied Mathematics; Philadelphia: 2001.
- 37.Pronzato L. A delimitation of the support of optimal designs for Kiefer’s ϕp-class of criteria. Statistics & Probability Letters. 2013;83(12):2721–2728.
- 38.Horst R, Pardalos PM, Thoai NV. Introduction to Global Optimization. 2nd ed. Springer; Dordrecht: 2000.
- 39.Tawarmalani M, Sahinidis NV. Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming. Kluwer Academic Publishers; Dordrecht: 2002.
- 40.Zhigljavsky A, Žilinskas A. Stochastic Global Optimization. Springer; New York: 2007.
- 41.Floudas CA. Deterministic Global Optimization: Theory, Methods and Applications. Springer; Dordrecht: 1999.
- 42.Grant M, Boyd S, Ye Y. CVX Users’ Guide for CVX version 1.22. Austin, TX; 2012.
- 43.Sagnol G. PICOS, a Python interface to conic optimization solvers. Technical Report 12-48. ZIB; 2012. http://picos.zib.de.
- 44.Sturm J. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optimization Methods and Software. 1999;11:625–653.
- 45.Andersen E, Jensen B, Jensen J, Sandvik R, Worsøe U. MOSEK version 6. Technical Report TR-2009-3. MOSEK; 2009.
- 46.Choisnard L, Géze A, Bigan M, Putaux J, Wouessidjewe D. Efficient size control of amphiphilic cyclodextrin nanoparticles through a statistical mixture design methodology. Journal of Pharmacy and Pharmaceutical Sciences. 2005;8:593–600.
- 47.Duarte BPM, Wong WK. Finding Bayesian optimal designs for nonlinear models: A semidefinite programming-based approach. International Statistical Review. 2015;83(2):239–262. doi: 10.1111/insr.12073.
- 48.Poling BE, Prausnitz JM, O’Connell JP. The Properties of Gases and Liquids. 5th ed. McGraw-Hill; New York: 2001.
- 49.Hansen HK, Rasmussen P, Fredenslund A, Schiller M, Gmehling J. Vapor-liquid equilibria by UNIFAC group contribution. 5. Revision and extension. Industrial & Engineering Chemistry Research. 1991;30:2352–2355.
- 50.Dette H, Titoff S. Optimal discriminating designs. Annals of Statistics. 2009;37:2056–2081.
- 51.Gribik PR, Kortanek KO. Equivalence theorems and cutting plane algorithms for a class of experimental design problems. SIAM Journal on Applied Mathematics. 1977;32:232–259.
- 52.Brooke A, Kendrick D, Meeraus A, Raman R. GAMS - A User’s Guide. GAMS Development Corporation; Washington: 1998.
- 53.Ugray Z, Lasdon L, Plummer J, Glover F, Kelly J, Martí R. A multistart scatter search heuristic for smooth NLP and MINLP problems. In: Metaheuristic Optimization via Memory and Evolution. Springer; 2005. pp. 25–51.
- 54.Drud A. CONOPT: A GRG code for large sparse dynamic nonlinear optimization problems. Mathematical Programming. 1985;31:153–191.
