Bayesian T-optimal discriminating designs

Holger Dette; Viatcheslav B Melas; Roman Guchenko

doi:10.1214/15-AOS1333

. Author manuscript; available in PMC: 2016 Oct 1.

Published in final edited form as: Ann Stat. 2015 Oct;43(5):1959–1985. doi: 10.1214/15-AOS1333

Bayesian T-optimal discriminating designs

Holger Dette ¹, Viatcheslav B Melas ², Roman Guchenko ³

PMCID: PMC4793413 NIHMSID: NIHMS765881 PMID: 26997684

Abstract

The problem of constructing Bayesian optimal discriminating designs for a class of regression models with respect to the T-optimality criterion introduced by Atkinson and Fedorov (1975a) is considered. It is demonstrated that the discretization of the integral with respect to the prior distribution leads to locally T-optimal discriminating design problems with a large number of model comparisons. Current methodology for the numerical construction of discrimination designs can only deal with a few comparisons, but the discretization of the Bayesian prior easily yields to discrimination design problems for more than 100 competing models. A new efficient method is developed to deal with problems of this type. It combines some features of the classical exchange type algorithm with the gradient methods. Convergence is proved and it is demonstrated that the new method can find Bayesian optimal discriminating designs in situations where all currently available procedures fail.

Keywords: Design of experiment, Bayesian optimal design; model discrimination; gradient methods; model uncertainty

1 Introduction

Optimal design theory provides useful tools to improve the accuracy of statistical inference without any additional costs by carefully planning experiments before they are conducted. Numerous authors have worked on the construction of optimal designs in various situations. For many models optimal designs have been developed explicitly [see the monographs of Pukelsheim (2006); Atkinson et al. (2007)] and several algorithms have been developed for their numerical construction if the optimal designs are not available in explicit form [see Yu (2010); Yang et al. (2013) among others]. On the other hand the construction of such designs depends sensitively on the model assumptions and an optimal design for a particular model might be inefficient if it is used in a different model. Moreover, in many experiments it is often not obvious which model should be finally fitted to the data and model building is an important part of data analysis. A typical and very important example are Phase II dose-finding studies, where various nonlinear regression models of the form

Y = η (x, θ) + ε .

(1.1)

have been developed for describing the dose-response relation [see Pinheiro et al. (2006)], but the problem of model uncertainty arises in nearly any other statistical application. As a consequence, the construction of efficient designs for model identification has become an important field in optimal design theory. Early work can be found in Stigler (1971), who determined designs for discriminating between two nested univariate polynomials by minimizing the volume of the confidence ellipsoid for the parameters corresponding to the extension of the smaller model. Several authors have worked on this approach in various other classes of nested models [see for example Dette and Haller (1998) or Song and Wong (1999) among others].

A different approach to the problem of constructing optimal designs for model discrimination is given in a pioneering paper by Atkinson and Fedorov (1975a), who proposed the T-optimality criterion to construct designs for discriminating between two competing regression models. Roughly speaking their approach provides a design such that the sum of squares for a lack of fit test is large. Atkinson and Fedorov (1975b) extended this method for discriminating a selected model η₁ from a class of other regression models, say {η₂, . . . , η_k}, k ≥ 2. In contrast to the work Stigler (1971) and followers the T-optimality criterion does not require competing nested models and has found considerable attention in the statistical literature with numerous applications including such important fields as chemistry or pharmacokinetics [see e.g. Atkinson et al. (1998), Ucinski and Bogacka (2005), López-Fidalgo et al. (2007), Atkinson (2008), Tommasi (2009) or Foo and Duffull (2011) for some more recent references]. A drawback of the T-optimality criterion consists of the fact that – even in the case of linear models – the criterion depends on the parameters of the model η₁. This means that T-optimality is a local optimality criterion in the sense of Chernoff (1953), and that it requires some preliminary knowledge regarding the parameters. Consequently, most of the cited papers refer to locally T-optimal designs. Although there exist applications where such information is available [for example in the analysis of dose response studies as considered in Pinheiro et al. (2006)], in most situations such knowledge can be rarely provided. Several authors have introduced robust versions of the classical optimality criteria such as Bayesian or minimax D-optimality criteria in order to determine efficient designs for model discrimination, which are less sensitive with respect to the choice of parameters [see Pronzato and Walter (1985); Chaloner and Verdinelli (1995); Dette (1997)]. The robustness problem of the T-optimality criterion has been already mentioned in Atkinson and Fedorov (1975a), who proposed a Bayesian approach to address the problem of parameter uncertainty in the T-optimality criterion. Wiens (2009) imposed (linear) neighbourhoud structures on each regression response and determined least favorable points in these neighbourhouds in order to robustify the locally T-optimal design problem. Dette et al. (2012) considered polynomial regression models and determined explicitly Bayesian T-optimal discriminating designs for the criterion introduced by Atkinson and Fedorov (1975a). Their results indicate the difficulties arising in Bayesian T-optimal design problems.

The scarcity of literature on Bayesian T-optimal discriminating designs can be explained by the fact that in nearly all cases of practical interest these designs have to be found numerically, and even this is a very hard problem. These numerical difficulties become even apparent in the case of locally T-optimal designs. Atkinson and Fedorov (1975a) proposed an exchange type algorithm, which has a rather slow rate of convergence and has been used by several authors. Braess and Dette (2013) pointed out that, besides its slow convergence, this algorithm does not yield the solution of the optimal discriminating design problem, if more than 5 model comparisons are under consideration. These authors developed a more efficient algorithm for the determination of locally T-optimal discriminating designs for several competing regression models by exploring relations between optimal design problems and (nonlinear) vector-valued approximation theory. Although the resulting algorithm provides a substantial improvement of the exchange type methods it cannot deal with Bayesian optimality criteria in general, and the development of an efficient procedure for this purpose is a very challenging and open problem.

The goal of the present paper is to fill this gap. We utilize the fact that in applications the integral with respect to the prior distribution has to be determined by a discrete approximation and we show that the discrete Bayesian T-optimal design problem is a special case of the local T-optimality criterion for a very large number of competing models considered as in Braess and Dette (2013). The competing models arise from the different support points used for the approximation of the prior distribution by a discrete measure, and the number of model comparisons in the resulting criterion easily exceeds the 200. Therefore the algorithm in Braess and Dette (2013) does not provide a solution of the corresponding optimization problem, and we propose a new method for the numerical construction of Bayesian T-optimal designs with substantial computational advantages. Roughly speaking, the support points of the design in each iteration are determined in a similar manner as proposed in Atkinson and Fedorov (1975a) but for the calculation of the corresponding weights we use a gradient approach. It turns out that the new procedure is extremely efficient and is able to find Bayesian T-optimal designs with a few number of iterations.

The remaining part of this paper is organized as follows. In Section 2 we give an introduction into the problem of designing experiments for discriminating between competing regression models and also derive some basic properties of locally T-optimal discriminating designs. In particular we show how the Bayesian T-optimal design problem is related to a local one with a large number of model comparisons [see Section 2.2]. Section 3 is devoted to the construction of new numerical procedures (in particular Algorithm 3.2), for which we prove convergence to a T-optimal discriminating design. Our approach consists of two steps consecutively optimizing with respect to the support points (Step 1) and weights of the design (Step 2). For the second step we also discuss two procedures to speed up the convergence of the algorithm. The results are illustrated in Section 4 calculating several Bayesian T-optimal discriminating designs in examples, where all other available procedure do not provide a numerical solution of the optimal design problem. For example, the new procedure is able to solve locally T-optimal designs with more than 240 model comparisons as they are arising frequently in Bayesian T-optimal design problems. In particular we illustrate the methodology calculating Bayesian T-optimal discriminating designs for a dose finding clinical trial which has recently been discussed in Pinheiro et al. (2006). The corresponding R-package will be provided in the CRAN library. Finally all proof are deferred to an appendix in Section 5.

2 T-optimal discriminating designs

Consider the regression model (1.1), where x belongs to some compact set $X$ and observations at different experimental conditions are independent. For the sake of transparency and a clear representation we assume that the error ε is normally distributed. The methodology developed in the following discussion can be extended to more general error structures following the line of research in López-Fidalgo et al. (2007), but details are omitted for the sake of brevity.

Throughout this paper we consider the situation where ν different models, say

η_{i} (x, θ_{i}), i = 1, \dots, ν,

(2.1)

are available to describe the dependency of Y on the predictor x. In (2.1) the quantity θ_i denotes a d_i-dimensional parameter, which varies in a compact space, say Θ_i (i = 1, . . . , ν). Following Kiefer (1974) we consider approximate designs that are defined as probability measures, say ξ, with finite support. The support points x₁, . . . , x_k of a design ξ give the locations where observations are taken, while the weights ω₁, . . . , ω_k describe the relative proportions of observations at these points. If an approximate design is given and n observations can be taken, a rounding procedure is applied to obtain integers n_i (i = 1, . . . , k) from the not necessarily integer valued quantities ω_in such that $\sum_{i = 1}^{k} n_{i} = n$ . We are interested in designing an experiment, such that a most appropriate model can be chosen from the given class {η₁, . . . , η_ν} of competing models.

2.1 T-optimal designs

In the case of ν = 2 competing models Atkinson and Fedorov (1975a) proposed to fix one model, say η₁(·, θ₁), with corresponding parameter ${\bar{θ}}_{1}$ and to maximize the functional

T_{12} (ξ) = \inf_{θ_{2} \in ϴ_{2}} \int_{X} {[η_{1} (x, {\bar{θ}}_{1}) - η_{2} (x, θ_{2})]}^{2} ξ (d x),

(2.2)

in the class of all (approximate) designs. Roughly speaking, these designs maximize the power of the test of the hypothesis “η₁ versus η₂”. Note that the resulting optimal design depends on the parameter ${\bar{θ}}_{1}$ for the first model, which has to be fixed by the experimenter. This means that these designs are local in the sense of Chernoff (1953). It was pointed out by Dette et al. (2013) that locally T-optimal designs may be very sensitive with respect to misspecification of ${\bar{θ}}_{1}$ . In a further paper Atkinson and Fedorov (1975b) generalized their approach to construct optimal discriminating designs for more than 2 competing regression models and suggested the criterion

T (ξ) = \min_{2 \leq j \leq ν} T_{1 j} (ξ) = \min_{2 \leq j \leq ν} \inf_{θ_{j} \in ϴ_{j}} \int_{X} {[η_{1} (x, {\bar{θ}}_{1}) - η_{j} (x, θ_{j})]}^{2} ξ (d x) .

(2.3)

This criterion determines a “good” design for discriminating the model η₁ against η₂, . . . , η_ν, where the parameter ${\bar{θ}}_{1}$ has the same meaning as before. As pointed out by Tommasi and López-Fidalgo (2010) and Braess and Dette (2013) there are many situations, where it is not clear which model should be considered as fixed and these authors proposed a symmetrized Bayesian (instead of minimax) version of the T-optimality criterion, that is

T_{P} (ξ) = \sum_{i, j = 1}^{ν} p_{i, j} T_{i, j} (ξ) = \sum_{i, j = 1}^{ν} p_{i, j} \inf_{θ_{i, j} \in ϴ_{j}} \int_{X} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} ξ (d x),

(2.4)

where the quantities p_i,j denote nonnegative weights reflecting the importance of the comparison between the the model η_i and η_j. We note again that this criterion requires the specification of the parameter ${\bar{θ}}_{i}$ , whenever the corresponding weight p_i,j is positive. Throughout this paper we will call a design maximizing one of the criteria (2.2) - (2.4) locally T-optimal discriminating design, where the specific criterion under consideration is always clear from the context. For some recent references discussing locally T-optimal discriminating designs we refer to Ucinski and Bogacka (2005), López-Fidalgo et al. (2007), Atkinson (2008), Tommasi (2009) or Braess and Dette (2013) among many others. For the formulation of the first results we require the following assumptions.

Assumption 2.1

For each i = 1, . . . , ν the functions η_i(·, θ_i) is continuously differentiable with respect to the parameter θ_i ∈ Θ_i,.

Assumption 2.2

For any design ξ such that T_P(ξ) > 0 and weight p_i,j ≠ 0 the infima in (2.4) are attained at a unique points ${\hat{θ}}_{i, j} = {\hat{θ}}_{i, j} (ξ)$ in the interior of the set Θ_j.

For a design ξ we also introduce the notation

ϴ_{i, j}^{*} (ξ) = \underset{θ_{i, j} \in ϴ_{j}}{\arg \inf} \int_{X} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} ξ (d x),

(2.5)

which is used in the formulation of the following result.

Theorem 2.1

If Assumption 2.1 is satisfied, then the design ξ* is a locally T_P-optimal discriminating design, if and only if there exist distributions $μ_{i j}^{*}$ on the sets $ϴ_{i, j}^{*} (ξ^{*})$ defined in (2.5) such that the inequality

\sum_{i, j = 1}^{ν} p_{i, j} \int_{ϴ_{i, j}^{*} (ξ^{*})} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} μ_{i j}^{*} (d θ_{i, j}) \leq T_{P} (ξ^{*})

(2.6)

is satisfied for all $x \in X$ . Moreover, there is equality in (2.6) for all support points of the the locally T_P-optimal discriminating design ξ*.

Theorem 2.1 provides an extension of the corresponding theorem in Braess and Dette (2013), and the proof is similar and therefore omitted. For designs ξ, ζ on $X$ we introduce the function

Q (ζ, ξ) = \int_{X} \sum_{i, j = 1}^{ν} p_{i, j} \inf_{θ_{i, j} \in ϴ_{i j}^{*} (ξ)} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} ζ (d x),

(2.7)

where ζ is an experimental design and the set $ϴ_{i j}^{*} (ξ)$ is defined in (2.5). Using Lemma 5.1 from the appendix it is easy to check that

{\frac{\partial T_{P} (ξ (α))}{\partial α} ∣}_{α = 0} = Q (ζ, ξ) - T_{P} (ξ)

where ξ(α) = (1 – α)ξ + αζ denotes the convex combination of the designs ξ and ζ. If Assumption 2.2 is satisfied, the function Q simplifies to

Q (ζ, ξ) = \int_{X} \sum_{i, j = 1}^{ν} p_{i, j} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, {\hat{θ}}_{i, j})]}^{2} ζ (d x),

which plays an important role in the subsequent discussion. In particular we need also the following extension of Theorem 2.1.

Theorem 2.2

If Assumption 2.1 is satisfied and the design ξ is not T_P-optimal, then there exists a design ζ*, such that the inequality Q(ζ*, ξ) > T_P(ξ) holds.

In order to obtain a more manageable condition of this result let ${\hat{μ}}_{i, j} (ξ)$ denote a measure on the set $ϴ_{i, j}^{*} (ξ) (i, j = 1, \dots, ν)$ for which the function

\max_{x \in X} \sum_{i, j = 1}^{ν} p_{i, j} \int_{ϴ_{i, j}^{*} (ξ)} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} μ_{i, j} (d θ_{i, j})

attains its minimal value, and define

Ψ (x, ξ) = \sum_{i, j = 1}^{ν} p_{i, j} \int_{ϴ_{i, j}^{*} (ξ)} {[η_{i} (x, {\bar{θ}}_{i}) - η_{i} (x, θ_{i, j})]}^{2} {\hat{μ}}_{i, j} (d θ_{i, j}) .

(2.8)

Note that the function in (2.8) simplifies to

Ψ (x, ξ) = \sum_{i, j = 1}^{ν} p_{i, j} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, {\hat{θ}}_{i, j})]}^{2},

(2.9)

if both Assumptions 2.1 and 2.2 are satisfied.

Corollary 2.3

If Assumption 2.1 is satisfied and the design ξ is not T_P-optimal then there exists a point $\bar{x} \in X$ such that

Ψ (\bar{x}, ξ) > T_{P} (ξ) .

2.2 Bayesian T-optimal designs

As pointed out by Dette et al. (2012) locally T-optimal designs are rather sensitive with respect to misspecification of the unknown parameters ${\bar{θ}}_{i}$ , and it might be appropriate to construct more robust designs for model discrimination. The problem of robustness was already mentioned in Atkinson and Fedorov (1975a) and these authors proposed a Bayesian version of the T-optimality criterion which reads in the situation of the criterion (2.4) as follows

T_{P}^{B} (ξ) = \sum_{i, j = 1}^{ν} p_{i, j} \int_{ϴ_{i}} \inf_{θ_{i, j} \in ϴ_{j}} \int_{X} {[η_{i} (x, λ_{i}) - η_{j} (x, θ_{i, j})]}^{2} ξ (d x) P_{i} (d λ_{i}) .

(2.10)

Here for each i = 1, . . . , ν the measure $P_{i}$ denotes a prior distribution for the parameter θ_i in model η_i, such that all integrals in (2.10) are well defined. Throughout this paper we will call any design maximizing the criterion (2.10) a Bayesian T-optimal discriminating design. For (two) polynomial regression models Bayesian T-optimal discriminating designs have been explicitly determined by Dette et al. (2013), and their results indicate the intrinsic difficulties in the construction of optimal designs with respect to this criterion.

In the following we will link the criterion (2.10) to the locally T-optimality criterion (2.4) for large number of competing models. For this purpose we note that in nearly all situations of practical interest an explicit evaluation of the integral in (2.10) is not possible and the criterion has to be evaluated by numerical integration approximating the prior distribution by a measure with finite support. Therefore we assume that the prior distribution $P_{i}$ in the criterion is given by a discrete measure with masses $τ_{i 1}, \dots τ_{i ℓ_{i}}$ at the points $λ_{i 1}, \dots, λ_{i ℓ_{i}}$ . The criterion in (2.10) can then be rewritten as

T_{P}^{B} (ξ) = \sum_{i, j = 1}^{ν} \sum_{k = 1}^{ℓ_{i}} p_{i, j} τ_{i k} \inf_{θ_{i, j} \in ϴ_{j}} \int_{X} {[η_{i} (x, λ_{i k}) - η_{j} (x, θ_{i, j})]}^{2} ξ (d x),

(2.11)

which is a locally T-optimality criterion of the from (2.4). The only difference between the criterion obtained form the Bayesian approach and (2.4) consists in the fact that the criterion (2.11) involves substantially more comparisons of the functions η_i and η_j. For example, if this approach is used for a Bayesian version of the criterion (2.2) we obtain

T_{12}^{B} (ξ) = \sum_{k = 1}^{ℓ} τ_{k} \inf_{θ_{2} \in ϴ_{2}} \int_{X} {[η_{1} (x, λ_{k}) - η_{2} (x, θ_{2})]}^{2} ξ (d x) .

(2.12)

This is the locally T-optimality criterion (2.4) with $ν = ℓ + 1$ , $p_{i, ℓ + 1} = τ_{i} (i = 1, \dots, ℓ)$ and p_i,j = 0 otherwise. Thus, instead of making only one comparison as required for the locally T-optimality criterion, the Bayesian approach (with a discrete approximation of the prior) yields a criterion with $ℓ$ comparisons, where $ℓ$ denotes the number of support points used for the approximation of the prior distribution. Moreover, for each support point of the prior distribution in the criterion (2.11) (or (2.12)) the infimum has to be calculated numerically, which is computationally expensive. Consequently, the computation of Bayesian T-optimal discriminating design problems is particularly challenging. In the following sections we provide an efficient solution of this problem.

3 Calculating locally T-optimal designs

Braess and Dette (2013) proposed an algorithm for the numerical construction of locally T-optimal designs, which is based on vector-valued Chebyshev approximation. This algorithm is quite difficult both in terms of description and implementation. Moreover, it requires substantial computational resources and is therefore only able to deal with a small number of comparisons in the T-optimality criterion. The purpose of this section is to develop a more efficient method which is able to deal with a large number of comparisons in the the criterion and avoids the drawbacks of the procedures in Atkinson and Fedorov (1975a) and Braess and Dette (2013). As pointed out in Section 2.2 methods solving this problem are required for the calculation of Bayesian T-optimal discriminating designs. Recall the definition of the function Ψ in (2.8) and note that under Assumption 2.1 it follows from Corollary 2.3 that there exists a point $\bar{x} \in X$ , such that the inequality

Ψ (\bar{x}, ξ) > T_{P} (ξ)

holds, whenever ξ is not a locally T-optimal discriminating design. The algorithm of Atkinson and Fedorov (1975a) uses this property to construct a sequence of designs which converges to the locally T-optimal discriminating design. For further reference it is stated here.

Algorithm 3.1 (Atkinson and Fedorov (1975a))

Let ξ₀ denote a given (starting) design and let ${(α_{s})}_{s = 0}^{\infty}$ be a sequence of positive numbers, such that $\lim_{s \to \infty} α_{s} = 0, \sum_{s = 0}^{\infty} α_{s} = \infty, \sum_{s = 0}^{\infty} α_{s}^{2} < \infty$ . For s = 0, 1, . . . define

ξ_{s + 1} = (1 - α_{s}) ξ_{s} + α_{s} ξ (x_{s + 1}),

where $x_{s + 1} = \arg \max_{e \in X} Ψ (x, ξ_{s})$ .

It can be shown that this algorithm converges in the sense that $\lim_{s \to \infty} T_{P} (ξ_{s}) = T_{P} (ξ^{*})$ , where ξ* denotes a locally T-optimal discriminating design. However, a major problem of Algorithm 3.1 is that it yields a sequence of designs with an increasing number of support points. As a consequence the resulting design (after applying some stopping criterion) is concentrated on a large set of points. Even if this problem can be solved by clustering or by determining the extrema of the final function Ψ(x, ξ_s), it is much more difficult to deal with the accumulation of support points during the iteration. Moreover, Braess and Dette (2013) demonstrated that in many cases the iteration process may take several hundred iterations for obtaining a locally T- optimal discriminating design with a required precision, resulting in a high computational complexity for the recalculation of the optimum values

{\hat{θ}}_{i, j} \in \underset{θ_{i, j} \in ϴ_{j}}{\arg \inf} \int_{X} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} ξ (d x)

(3.1)

in the optimality criterion (2.4). These authors also showed that Algorithm 3.1 may not find the optimal design if there are too many model comparisons involved in the T-optimality criterion (2.4).

Therefore, we propose the following alternative basic procedure for the calculation of locally T-optimal discriminating designs as an alternative to Algorithm 3.1. Roughly speaking, it consists of two steps treating the maximization with respect to support points (Step 1) and weights (Step 2) separately, where two methods implementing the second step will be given below [see Section 3.1 and 3.2 for details].

Algorithm 3.2

Let ξ₀ denote a starting design such that T_P(ξ₀) > 0 and define recursively a sequence of designs (ξ_s)_s=0,1,... as follows:

(1)
Let $S_{[s]}$ denote the support of the design ξ_s. Determine the set $E_{[s]}$ of all local maxima of the function Ψ(x, ξ_s) on the design space $X$ and define $S_{[s + 1]} = S_{[s]} \cup E_{[s]}$ .
(2)
We define $ξ = {S_{[s + 1]}, ω}$ as the design supported at $S_{[s + 1]}$ (with a vector w of weights) and determine the locally T_P-optimal design in the class of all designs supported at $S_{[s + 1]}$ , that is we determine the vector ω_[s+1] maximizing the function
$g (ω) = T_{P} ({S_{[s + 1]}, ω}) = \sum_{i, j = 1}^{ν} p_{i, j} \inf_{θ_{i, j} \in ϴ_{j}} \sum_{x \in S_{[s + 1]}} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2} w_{x}$
(here w_x denotes the weights at the point $x \in S_{s + 1}$ ). All points in $S_{[s + 1]}$ with vanishing components in the vector of weights ω_[s+1] will be be removed and the new set of support points will also be denoted by $S_{[s + 1]}$ . Finally the design ξ_s+1 is defined as the design with the set of support points $S_{[s + 1]}$ and the corresponding nonzero weights.

Theorem 3.3

Let Assumption 2.1 be satisfied and let (ξ_s)_s=0,1,... denote the sequence of designs obtained by Algorithm 3.2, then

\lim_{s \to \infty} T_{P} (ξ_{s + 1}) = T_{P} (ξ^{*}),

where ξ* denotes a locally T-optimal discriminating design.

A proof of Theorem 3.3 is deferred to Section 5. Note that the algorithm adds all local maxima of the function Ψ(x, ξ_s) as possible support points of the design in the next iteration. Consequently, in the current form Algorithm 3.2 also accumulates too many support points. To avoid this problem, it is suggested to remove at each step those points from the support, whenever their weight is smaller than m^0.25, where m denote the working precision of the software used in the implementation (which is 2.2 × 10⁻¹⁶ for R). Note also that this refinement does not affect the convergence of the algorithm from a practical point of view. A more important question is the implementation of the second step of the procedure, that is the maximization of function g(ω). Before we discuss two computationally efficient procedures for this purpose in the following sections, we state an important property of the function Ψ(x, ξ_s+1) obtained in each iteration.

Lemma 3.4

At the end of each iteration of Algorithm 3.2 the function Ψ(x, ξ_s+1) attains one and the same value for all support points of the design ξ_s+1.

3.1 Quadratic programming

Let $S_{[s + 1]} = {x_{1}, \dots, x_{n}}$ denote the set obtained in the first step of Algorithm 3.2 and define ξ as a design supported at $S_{[s + 1]}$ with corresponding weights ω₁, . . . , ω_n (which have to be determined in Step 2 of the algorithm by maximizing the function

g (ω) = \sum_{i, j = 1}^{ν} p_{i, j} \sum_{k = 1}^{n} ω_{k} {[η_{i} (x_{k}, {\bar{θ}}_{i}) - η_{j} (x_{k}, {\hat{θ}}_{i, j})]}^{2},

where ${\hat{θ}}_{i, j} = {\hat{θ}}_{i, j} (ω)$ is defined in (3.1). For this purpose we suggest to linearize the functions $η_{j} (x_{k}, θ_{i, j})$ in the neighborhood of point ${\hat{θ}}_{i, j}$ . More precisely, we consider the function

\begin{matrix} \bar{g} (ω) = & \sum_{i, j = 1}^{ν} p_{i, j} \min_{α_{i, j} \in R^{d_{j}}} \sum_{k = 1}^{n} ω_{k} {[η_{i} (x_{k}, {\bar{θ}}_{i}) - η_{j} (x_{k}, {\hat{θ}}_{i, j}) - α_{i, j}^{T} {\frac{\partial η_{j} (x_{k}, θ_{i, j})}{\partial θ_{i, j}} ∣}_{θ_{i, j} = {\hat{θ}}_{i, j}}]}^{2} . \\ = & \sum_{i, j = 1}^{ν} p_{i, j} \min_{α_{i, j} \in R^{d_{j}}} [α_{i, j}^{T} J_{i, j}^{T} Ω J_{i, j} α_{i, j} - 2 ω^{T} R_{i, j} α_{i, j} + b_{i, j}^{T} ω], \end{matrix}

where d_j is the dimension of the parameter space Θ_j, Ω = diag(ω₁, . . . , ω_n) and the matrices $J_{i, j} \in R^{n \times d_{j}}$ , $R_{i, j} \in R^{n \times d_{j}}$ and the vectors $b_{i, j} \in R^{n}$ are defined by

\begin{matrix} J_{i, j} = & {({\frac{\partial η_{j} (x_{r}, θ_{i, j})}{\partial θ_{i, j}} ∣}_{θ_{i, j} = {\hat{θ}}_{i, j}})}_{r = 1, \dots, n}, \\ R_{i, j} = & {([η_{i} (x_{r}, {\bar{θ}}_{i}) - η_{j} (x_{r}, {\hat{θ}}_{i, j})] {\frac{\partial η_{j} (x_{r}, θ_{i, j})}{\partial θ_{i, j}} ∣}_{θ_{i, j} = {\hat{θ}}_{i, j}})}_{r = 1, \dots, n}, \\ b_{i, j} = & {({[η_{i} (x_{r}, {\bar{θ}}_{i}) - η_{j} (x_{r}, {\hat{θ}}_{i, j})]}^{2})}_{r = 1, \dots, n}, \end{matrix}

respectively. Obviously the minimum with respect to α_i,j is achieved by $α_{i, j} = {(J_{i, j}^{T} Ω J_{i, j})}^{- 1} R_{i, j}^{T} ω$ which gives

\bar{g} (ω) = - ω^{T} Q (ω) ω + b^{T} ω,

where

Q (ω) = \sum_{i, j = 1}^{ν} p_{i, j} R_{i, j} {(J_{i, j}^{T} Ω J_{i, j})}^{- 1} R_{i, j}^{T} .

The matrix Q(ω) depends on ω, but if we ignore this dependence and take the matrix $Ω = diag ({\bar{ω}}_{1}, \dots, {\bar{ω}}_{n})$ as fixed, then we end up with a quadratic programming problem, that is

\begin{matrix} ϕ (ω, \bar{ω}) = - ω^{T} Q (\bar{ω}) ω + b^{T} ω \to \max_{ω}, \\ \sum_{k = 1}^{n} ω_{k} = 1; ω_{k} \geq 0, k = 1, \dots, n . \end{matrix}

(3.2)

This problem is solved iteratively until convergence, substituting each time the solution obtained in the previous iteration instead of $\bar{ω}$ . We note that a similar idea has also been proposed by Braess and Dette (2013).

Remark 3.5

In the practical implementation of the procedure it is recommended to perform only a few iterations of this step such that an improvement in the difference between the value of the criterion of the starting design in Step 2 and the design obtained in the iteration of (3.2) is observed. This will speed up the convergence of the procedure substantially. In this case equality of the function Ψ at the support points of the calculated design (as stated in Lemma 3.4) is only achieved approximately.

Formally, the convergence of the algorithm is only proved if the iteration (3.2) is performed until convergence. However, in all examples considered so far, we observed convergence of the procedure, even if only a few iterations of (3.2) are used. In our R program the user can specify the number of iterations used in this part of the algorithm. Thus, if any problem regarding convergence is observed, the number of iterations should be increased (of course at a cost speed of the algorithm).

3.2 A gradient method

A further option for the second step in Algorithm 3.2 is a specialized gradient method, which is used for the function

g (ω) = \sum_{i, j = 1}^{ν} p_{i, j} \sum_{k = 1}^{n} ω_{k} {[η_{i} (x_{k}, {\bar{θ}}_{i}) - η_{j} (x_{k}, {\hat{θ}}_{i, j})]}^{2}

(3.3)

where ${\hat{θ}}_{i, j} = {\hat{θ}}_{i, j} (ω)$ is defined in (3.1). For it s description we define the functions

v_{k} (ω) = \sum_{i, j = 1}^{ν} p_{i, j} {[η_{i} (x_{k}, θ_{i}) - η_{j} (x_{k}, {\hat{θ}}_{i, j} (ω))]}^{2}, k = 1, \dots, n,

and iteratively calculate a sequence of vectors (ω_(γ))_γ=0,1,.... At the beginning we choose $ω_{(0)} = \bar{ω}$ (for example equal weights). If ω_(γ) = (ω_(γ),1, . . . , ω_(γ),n) is given, we proceed for γ = 0, 1, . . . as follows. We determine indices k̄ and ḵ corresponding to max_1≤k≤n v_k(ω_(γ)) and min_1≤k≤n v_k(ω_(γ)), respectively, and define

α^{*} = \arg \max_{0 \leq α \leq ω_{(γ), \underline{k}}} g ({\bar{ω}}_{(γ)} (α)),

(3.4)

where the vector ${\bar{ω}}_{(γ)} (α) = ({\bar{ω}}_{(γ), 1} (α), \dots, {\bar{ω}}_{(γ), n} (α))$ is given by

{\bar{ω}}_{(γ), i} (α) = {\begin{matrix} ω_{(γ), i} + α & if i = \bar{k} \\ ω_{(γ), i} - α & if i = \underline{k} \\ ω_{(γ), i} & else \end{matrix}

The vector ω_(γ+1) of the next iteration is then defined by $ω_{(γ + 1)} = {\bar{ω}}_{(γ)} (α^{*})$ . The following theorem shows that the generated sequence of vectors converges to a maximizer of the function g in (3.3) and is proved in the Appendix.

Theorem 3.6

The sequence ${(ω_{(γ)})}_{γ \in N}$ converges to a vector ω* ∈ arg max g(ω).

Remark 3.7

It is worthwhile to mention that the one dimensional optimization problem (3.4) is computationally rather expensive. In the implementation we use a linearization of the optimization problem, which is obtained in a similar way a described in Section 3.1.

4 Implementation and numerical examples

We have implemented the procedure for the calculation of the locally T-optimal discriminating design in R, where the user has to specify the weights p_i,j and the corresponding preliminary information regarding the parameters ${\bar{θ}}_{i}$ . To be precise, we call

P = [\begin{matrix} p_{1, 1} & p_{1, 2} & \dots & p_{1, ν - 1} & p_{1, ν} \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ p_{ν, 1} & p_{ν, 2} & \dots & p_{ν, ν - 1} & p_{ν, ν} \end{matrix}]

the comparison table for the locally T-optimal discriminating design problem under consideration. This table has to be specified by the experimenter. Because the Bayesian T-optimal design problem with a discrete prior can be reduced to a locally T-optimal one with a large number of model comparisons, we now describe the corresponding table for the Bayesian T-optimality criterion. For illustration purposes we consider the case ν = 2. The Bayesian T-optimality criterion is given in (2.12), where the prior for the parameter θ₁ puts masses $τ_{1}, \dots τ_{ℓ}$ at the points $λ_{1}, \dots, λ_{ℓ}$ . This criterion can be rewritten as a local T-optimality criterion of the form (2.4), i.e.

T_{P} (ξ) = \sum_{i, j = 1}^{ℓ + 1} p_{i, j} \inf_{θ_{i, j} \in ϴ_{j}} \int_{X} {[η_{i} (x, θ_{i}) - η_{j} (x, θ_{i, j})]}^{2} ξ (d x),

(4.1)

where comparison table is given by

P = {(p_{i, j})}_{i, j = 1, \dots, ℓ + 1} = [\begin{matrix} 0 & 0 & \dots & 0 & τ_{1} \\ 0 & 0 & \dots & 0 & τ_{2} \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ 0 & 0 & \dots & 0 & τ_{ℓ} \\ 0 & 0 & \dots & 0 & 0 \end{matrix}] \in R^{ℓ + 1 \times ℓ + 1},

(4.2)

$η_{i} (x, {\bar{θ}}_{i}) = η_{1} (x, λ_{i})$ , $i = 1, \dots, ℓ$ and $η_{ℓ + 1} (x, θ_{i, j}) = η_{2} (x, θ_{i, ℓ + 1})$ . The extension of this approach to more than two models is easy and left to the reader. We now illustrate the new method in two examples calculating Bayesian T-optimal discriminating designs. We have implemented both procedures described in Section 3.1 and 3.2 and the results were similar. For this reason we only represent the Bayesian T-optimal discriminating designs calculated by Algorithm 3.2, where the quadratic programming method was used in Step 2 [see Section 3.1 for details].

4.1 Bayesian T-optimal discriminating designs for exponential models

Consider the problem of discriminating between the two regression models

\begin{matrix} η_{1} (x, θ_{1}) = & θ_{1, 1} - θ_{1, 2} \exp (- θ_{1, 3} x^{θ_{1, 4}}), \\ η_{2} (x, θ_{2}) = & θ_{2, 1} - θ_{2, 2} \exp (- θ_{2, 3} x), \end{matrix}

(4.3)

where the design space is given by the interval [0, 10]. Exponential models of the form (4.3) are widely used in applications. For example, the model η₂ is frequently fitted in agricultural sciences, where it is called Mitscherlichs growth law and used for describing the relation between the yield of a crop and the amount of fertilizer. In fisheries research this model is called Bertalanffy growth curve and used for the description of the length of a fish in dependence of its age [see Ratkowsky (1990)]. Optimal designs for exponential regression models have been determined by Han and Chaloner (2003) among others. In the following we will demonstrate the performance of the new algorithm in calculating Bayesian T-optimal discriminating designs for the two exponential models. Note that it make only sense to consider the Bayesian version of T₁₂, because the model η₂ is obtained as a special case of η₁ for θ_1,4 = 1. It is easy to see that the locally T-optimal discriminating designs do not depend on the linear parameters of η₁ and we have chosen ${\bar{θ}}_{1, 1} = 2$ and ${\bar{θ}}_{2, 2} = 1$ for these parameters. For the parameters ${\bar{θ}}_{1, 3}$ and ${\bar{θ}}_{1, 4}$ we considered independent prior distributions supported at the points

μ_{j} + \frac{σ (i - 3)}{2} i = 1, \dots, 5; j = 3, 4,

(4.4)

where μ₃ = 0.8, μ₄ = 1.5 and different values of the variance σ² are investigated. The corresponding weights at these points are proportional (in both cases) to

\frac{1}{\sqrt{2 π σ^{2}}} \exp (- \frac{{(i - 3)}^{2}}{8}); i = 1, \dots, 5 .

(4.5)

We note that this yields 25 terms in the Bayesian optimality criterion (2.12). Bayesian T-optimal discriminating designs are depicted in Table 1 for various values of σ², where an equidistant design at 11 points 0, 1, . . . , 10 was used as starting design.

Table 1.

Bayesian T-optimal discriminating designs for the two exponential models in (4.3). The support points and weights of the independent prior distributions for the parameters ${\bar{θ}}_{1, 3}$ and ${\bar{θ}}_{1, 4}$ are given by (4.4) and (4.5), respectively.

σ ²	Optimal design				σ ²	Optimal design

0.0	0.000	0.441	1.952	10.000	0.285	0.000	0.453	1.758	10.000
0.0	0.209	0.385	0.291	0.115	0.285	0.207	0.396	0.292	0.105

0.1	0.000	0.452	1.877	10.000	0.3	0.000	0.452	1.747	4.951	10.000
0.1	0.209	0.391	0.290	0.110	0.3	0.207	0.396	0.292	0.003	0.102

0.2	0.000	0.455	1.811	10.000	0.4	0.000	0.446	1.651	4.699	10.000
0.2	0.208	0.394	0.291	0.107	0.4	0.200	0.384	0.290	0.060	0.066

Open in a new tab

A typical determination of the optimal design takes between 0.03 seconds (in the case σ² = 0) and 1.4 seconds (in the case σ² = 0.4) CPU time on a standard PC (with an intel core i7-4790K processor). The algorithm using the procedure described in Section 3.2 in step 2 requires between 0.11 seconds (in the case σ² = 0) and 11.6 seconds (in the case σ² = 0.4) CPU time. We observe that for small values of σ² the optimal designs are supported at 4 points, while for σ² ≥ 0.285 the Bayesian T-optimal discriminating design is supported at 5 points. The corresponding function Ψ from the equivalence Theorem 2.1. is shown in Figure 1.

*The function on the left hand side of inequality* (2.6) in the equivalence Theorem 2.1 for the numerically calculated Bayesian T-optimal discriminating designs. The competing regression models are given in (4.3).

4.2 Bayesian T-optimal discrimination designs for dose finding studies

Non-linear regression models have also numerous applications in dose response studies, where they are used to describe the dose response relationship. In these and similar situations the first step of the data analysis consists in the identification of an appropriate model, and the design of experiment should take this task into account. For example, for modeling the dose response relationship of a Phase II clinical trial Pinheiro et al. (2006) proposed the following plausible models

\begin{matrix} η_{1} (x, θ_{1}) = & θ_{1, 1} + θ_{1, 2} x; \\ η_{2} (x, θ_{2}) = & θ_{2, 1} + θ_{2, 2} x (θ_{2, 3} - x); \\ η_{3} (x, θ_{3}) = & θ_{3, 1} +_{3, 2} x ∕ (θ_{3, 3} + x); \\ η_{4} (x, θ_{4}) = & θ_{4, 1} + θ_{4, 2} ∕ (1 + \exp (θ_{4, 3} - x) ∕ θ_{4, 4}); \end{matrix}

(4.6)

where the designs space (dose range) is given by the interval $X = [0, 500]$ . In this reference some prior information regarding the parameters for the models is also provided, that is

{\bar{θ}}_{1} = (60, 0.56), {\bar{θ}}_{2} = (60, 7 ∕ 2250, 600), {\bar{θ}}_{3} = (60, 294, 25), {\bar{θ}}_{4} = (49.62, 290.51, 150, 45.51) .

Locally optimal discrimination designs for the models in (4.6) have been determined by Braess and Dette (2013) in the case p_i,j = 1/6, (1 ≤ j < i ≤ 4), which means that the resulting local T-optimality criterion (2.4) consists of 6 model comparisons.

We begin with an illustration of the new methodology developed in Section 3 calculating again the locally T-optimal discriminating design for this scenario. The proposed algorithm needs only four iterations for the calculation of a design, say ξ₄, which has at least efficiency

{Eff}_{T_{P}} (ξ_{4}) = \frac{T_{P} ({\tilde{ξ}}_{4})}{\sup_{ζ} T_{P} (ζ)} \geq 0.999 .

The function Ψ(·, ξ₁) after the first iteration is displayed in Figure 2, where we used the same starting design as in Braess and Dette (2013). The support points of ξ₁ are shown as circles and we can see that function Ψ(x, ξ₁) attains one and the same value, which is represented with dotted line, for all support points. We finally note that the algorithm proposed in Braess and Dette (2013) needs 9 iterations to find a design with the same efficiency.

*The function* Ψ(·, ξ₁) *after the first iteration of Algorithm 3.2*

We now investigate Bayesian T-optimal discriminating designs for a similar situation. For the sake of a transparent representation we only specify a prior distribution of the four-dimensional parameter ${\bar{θ}}_{4}$ for the calculation of the discriminating design, while ${\bar{θ}}_{2}$ and ${\bar{θ}}_{3}$ are considered as fixed. In order to obtain a design which is robust with respect to model misspecification we chose a prior discrete prior with 81 points in $R^{4}$ . More precisely, the support points of the prior distribution are given by the points

{μ_{e_{1}, e_{2}, e_{3}, e_{4}} ∣ e_{1}, e_{2}, e_{3}, e_{4} \in {- 1, 0, 1}},

(4.7)

where

\begin{matrix} μ_{e_{1}, e_{2}, e_{3}, e_{4}} = & (μ_{1} + e_{1} σ, μ_{2} + e_{2} σ, μ_{3} + e_{3} σ, μ_{4} + e_{4} σ), \\ μ = & (μ_{1}, μ_{2}, μ_{3}, μ_{4}) = (49.62, 290.51, 150, 45.51), \end{matrix}

and different values for σ² are considered. The weights at the corresponding points are proportional (normalized such that their sum is 1) to

\frac{1}{{(2 π σ^{2})}^{2}} \exp (\frac{{‖ μ_{e_{1}, e_{2}, e_{3}, e_{4}} - μ ‖}_{2}^{2}}{2 σ^{2}}), e_{1}, e_{2}, e_{3}, e_{4} \in {- 1, 0, 1},

(4.8)

where ∥·∥₂ denotes the Euclidean norm. The resulting Bayesian optimality criterion (2.11) consist of 246 model comparisons. In this case the method of Braess and Dette (2013) fails to find the Bayesian T-optimal discriminating design. Bayesian T-optimal discriminating designs have been calculated by the new Algorithm 3.2 for various values of σ² and the results are shown in Table 2. A typical determination of the optimal design takes between 0.09 seconds (in the case σ² = 0) and 7.8 seconds (in the case σ² = 37²) CPU time on a standard PC. The algorithm using the procedure described in Section 3.2 in Step 2 requires between 0.75 seconds (in the case σ² = 0) and 37.1 seconds (in the case σ² = 37²) CPU time. For small values the Bayesian T-optimal discriminating designs are supported at 4 points including the boundary of the design space. The smaller (larger) interior support point is increasing (decreasing) if σ² is increasing. For larger values of σ² even the number of support points of the optimal design increases. For example, if σ² = 35² or 37² the Bayesian T-optimal discriminating design has 5 or 6 points (including the boundary points of the design space). These observations are in line with the theoretical finding of Braess and Dette (2007) who showed that the number of support points of Bayesian D-optimal designs can become arbitrarily large with an increasing variability in the prior distribution. The corresponding functions from the equivalence Theorem 2.1 are shown in Figure 3.

Table 2.

Bayesian T-optimal discriminating designs for the models in (4.6). The weights in the criterion (2.10) are given by p_i,j = 1/6; 1 ≤ i < j ≤ 4 and the support and masses of the prior distribution are defined by (4.7) and (4.8), respectively.

σ ²	optimal design				σ ²	optimal design

0	0.000	78.783	241.036	500.0	33²		0 000	92.692	222.735	500.0
0	0.255	0.213	0.357	0.175	33²		0.260	0.240	0.344	0.156

20²	0.000	84.467	234.134	500.0	35²		0.000	91.743	129.322	221.118	500.0
20²	0.257	0.225	0.351	0.167	35²		0.260	0.214	0.036	0.336	0.154

30²	0.000	91.029	225.713	500.0	37²	0.000	89.881	129.590	170.306	220.191	500.0
30²	0.259	0.237	0.345	0.159	37²	0.260	0.170	0.091	0.019	0.310	0.150

Open in a new tab

5 Proofs

5.1 An auxiliary result

Lemma 5.1

Let φ(v, y) be a twice continuously differentiable function of two variables $v \in V \subset R^{k}$ and $y \in Y$ , where $Y$ is a compact set. Denote by $Y^{*}$ the set of all points where the minimum $\min_{y \in Y} φ (v, y)$ is attained and let $q \in R^{k}$ be an arbitrary direction. Then

\frac{\partial \min_{y \in Y^{*}} φ (v, y)}{\partial q} = \min_{y \in Y^{*}} \frac{\partial φ (v, y)}{\partial q} .

(5.1)

Proof

See Pshenichny (1971), p. 75.

5.2 Proofs

Proof of Theorem 2.2

Assume without loss of generality that p_i,j > 0 for all i, j = 1, . . . , ν. Let ξ* denote any locally T-optimal discriminating design and let θ = (θ_i,j)_i,j=1,...,ν denote the vector consisting of all θ_i,j ∈ Θ_i,j(ξ*). We introduce the function

φ (x, θ) = \sum_{i, j = 1}^{ν} p_{i, j} {[η_{i} (x, {\bar{θ}}_{i}) - η_{j} (x, θ_{i, j})]}^{2}

(5.2)

and consider the product measure

μ (d θ) = \prod_{i, j = 1, \dots, ν} μ_{i, j} (d θ_{i, j}),

(5.3)

where μ_ij are measures on the sets $ϴ_{i, j}^{*} (ξ^{*})$ defined by (2.5). Similarly, we define $μ^{*} (d θ) = \prod_{i, j = 1, \dots, ν} μ_{i, j}^{*} (d θ_{i, j})$ as the product measure of the measures $μ_{i, j}^{*}$ in Theorem 2.1. From this result we have

\begin{matrix} T_{P} (ξ^{*}) & \geq \sup_{ζ} \int_{X} \int_{ϴ^{*} (ξ^{*})} φ (x, θ) μ^{*} (d θ) ζ (d x) \\ \geq \inf_{μ} \sup_{ζ} \int_{X} \int_{ϴ^{*} (ξ^{*})} φ (x, θ) μ (d θ) ζ (d x) = \sup_{ζ} \inf_{μ} \int_{X} \int_{ϴ^{*} (ξ^{*})} φ (x, θ) μ (d θ) ζ (d x), \end{matrix}

where the sup and inf are calculated in the class of designs ζ on $X$ and product measures μ on $ϴ^{*} (ξ^{*}) = \otimes_{i, j = 1}^{ν} ϴ_{i, j (ξ^{*})}^{*}$ , respectively. It now follows that the characterizing inequality (2.6) in Theorem 2.1 is equivalent to the inequality

\sup_{ζ} Q (ζ, ξ^{*}) \leq T_{P} (ξ^{*}) .

Consequently, any non-optimal design must satisfy the opposite inequality.

Proof of Corollary 2.3

Let ξ denote a design such that T_P(ξ) > 0 and recall the definition of the set $ϴ_{i j}^{*} (ξ)$ in (2.5). We consider for a vector $θ = {(θ_{i, j})}_{i, j = 1, \dots, ν} \in ϴ^{*} (ξ) = \otimes_{i, j = 1, \dots, ν} ϴ_{i, j}^{*} (ξ)$ , the function φ is defined in (5.2) and product measures μ(dθ) of the form (5.3) on Θ*(ξ). Now the well known minimax theorem and the definition of the function Q in (2.7) yields

\begin{matrix} \max_{x \in X} Ψ (x, ξ) = & \inf_{μ} \max_{x \in X} \int_{ϴ^{*} (ξ)} φ (x, θ) μ (d θ) = \inf_{μ} \sup_{ζ} \int_{X} \int_{ϴ^{*} (ξ)} φ (x, θ) μ (d θ) ζ (d x) \\ = & \sup_{ζ} \inf_{μ} \int_{X} \int_{ϴ^{*} (ξ)} φ (x, θ) μ (d θ) ζ (d x) = \sup_{ζ} \inf_{θ \in ϴ^{*} (ξ)} \int φ (x, θ) ζ (d x) = \sup_{ζ} Q (ζ, ξ), \end{matrix}

where the infimum is calculated with respect to all measures μ of the form (5.3) and the supremum is calculated with respect to all experimental designs ζ on $X$ . Note that $X$ is compact by assumption and it can be checked that the set Θ*(ξ) is also compact as a closed subset of a compact set. Consequently all suprema and infima are achieved and there exists a design ζ* supported at the set of local maxima of the function Ψ(x, ξ), such that

Q (ζ^{*}, ξ) = \sup_{ζ} Q (ζ, ξ) = \max_{x \in X} Ψ (x, ξ) .

The assertion of Corollary 2.3 now follows from Theorem 2.2.

Proof of Theorem 3.3

Obviously, the inequality

T_{P} ({S_{[s]}, ω_{[s]}}) \leq T_{P} ({S_{[s + 1]}, ω_{[s + 1]}})

holds for all s as optimization with respect to ω occurs on a larger set. Moreover, the sequence T_P(ξ_s) is bounded from above by T_P(ξ*) and has a limit, which is denoted by $T_{P}^{* *}$ . Consequently, there exists a subsequence of designs, say ξ_sj, j = 1, 2, . . . converging to a design, say ξ**. Note that T_P is upper semi-continuous as the infimum of continuous functions, which implies $T_{P} (ξ^{* *}) = T_{P}^{* *}$ . Now, assume that T_P(ξ**) < T_P(ξ*), then ξ** is not locally T-optimal and by Theorem 2.2 there exists a constant δ > 0 such that

\sup_{ζ} Q (ζ, ξ^{* *}) - T_{P} (ξ^{* *}) = 2 δ,

where the function Q is defined in (2.7). Therefore for sufficiently large j, say, j ≥ N we obtain (using again the lower semi-continuity of sup_ζ Q(ζ, ξ)) that

\sup_{ζ} Q (ζ, ξ_{s_{j}}) - T_{P} (ξ_{s_{j}}) > δ,

whenever j ≥ N. Note that by construction the sequence ${(T_{P} (ξ_{s}))}_{s \in N}$ is increasing and therefore

T_{P} (ξ_{s_{j + 1}}) - T_{P} (ξ_{s_{j}}) \geq T_{P} (ξ_{s_{j} + 1}) - T_{P} (ξ_{s_{j}}) .

(5.4)

In order to estimate the right hand side we consider for j ≥ N and α ∈ [0, 1] the design

{\tilde{ξ}}_{s_{j + 1}} (α) = (1 - α) ξ_{s_{j}} + α ζ_{j},

where ζ_j is the measure for which the function Q(ζ, ξ_sj) attains its maximal value in the class of all experimental designs supported at the local maxima of the function Ψ(x, ξ_sj), and define

α_{s_{j + 1}} = \underset{0 \leq α \leq 1}{\arg \max} T_{P} ({\tilde{ξ}}_{s_{j + 1}} (α)) .

By construction of ξ_sj+1 is the best design supported at $supp (ξ_{s_{j}}) \cup supp (ζ_{j})$ , and (5.4) yields

T_{P} (ξ_{s_{j + 1}}) \geq T_{P} (ξ_{s_{j} + 1}) \geq T_{P} ({\tilde{ξ}}_{s_{j + 1}} (α_{s_{j + 1}})) .

(5.5)

We introduce the notations $h (j, α) = T_{P} ({\tilde{ξ}}_{s_{j}} (α))$ , and note that

{\frac{\partial T_{P} ({\tilde{ξ}}_{s_{j + 1}} (α))}{\partial α} ∣}_{α = 0} = Q (ζ_{j}, ξ_{s_{j}}) - T_{P} (ξ_{s_{j}}) = \sup_{ζ} Q (ζ, ξ_{s_{j}}) - T_{P} (ξ_{s_{j}}) > δ .

A Taylor expansion gives

\begin{matrix} h (j + 1, α_{s_{j + 1}}) - h (j + 1, 0) & = \max_{α \in [0, 1]} [T_{P} ({\tilde{ξ}}_{s_{j + 1}} (α)) - T_{P} ({\tilde{ξ}}_{s_{j + 1}} (0))] \\ \geq \max_{α \in [0, 1]} [α {\frac{\partial T_{P} ({\tilde{ξ}}_{s_{j + 1}} (α))}{\partial α} ∣}_{α = 1} - \frac{1}{2} α^{2} K] > \max_{α \in [0, 1]} [α δ - \frac{1}{2} α^{2} K] = \frac{δ^{2}}{2 K}, \end{matrix}

where K is an absolute upper bound of the second derivative. Therefore it follows from (5.5) that

\begin{matrix} T_{P} (ξ_{s_{j + 1}}) - T_{P} (ξ_{s_{j}}) & \geq T_{P} (ξ_{s_{j} + 1}) - T_{P} (ξ_{s_{j}}) \\ \geq T_{P} ({\tilde{ξ}}_{s_{j + 1}} (α_{s_{j + 1}})) - T_{P} (ξ_{s_{j}}) = h (j + 1, α_{s_{j + 1}}) - h (j + 1, 0) \geq \frac{δ^{2}}{2 K} . \end{matrix}

which gives for L > N + 1

T_{P} (ξ_{s_{L}}) - T_{P} (ξ_{s_{N}}) = \sum_{j = N}^{L - 1} [T_{P} (ξ_{s_{j + 1}}) - T_{P} (ξ_{s_{j}})] \geq [L - N] \frac{δ^{2}}{2 K} .

The left hand side of this inequality converges to the finite value T(ξ**) – T(ξ_{s_N}) as L → ∞, while the right hand side converges to infinity. Therefore we obtain a contradiction to our assumption T_P(ξ**) < T_P(ξ*), which proves the assertion of Theorem 3.3.

Proof of Lemma 3.4

Fix t ∈ {1, . . . , n} and note that $w_{t} = 1 - \sum_{ℓ = 1, ℓ \neq t}^{n} w_{ℓ}$ . Under Assumptions 2.1 and 2.2 we obtain by formula (5.1)

\frac{\partial g (ω)}{\partial ω_{k}} = \sum_{i, j = 1}^{ν} p_{i, j} {[η_{i} (x_{k}, {\bar{θ}}_{i}) - η_{i} (x_{k}, {\hat{θ}}_{i, j} (ω))]}^{2} - \sum_{i, j = 1}^{ν} p_{i, j} {[η_{i} (x_{t}, {\bar{θ}}_{i}) - η_{j} (x_{t}, {\hat{θ}}_{i, j} (ω))]}^{2}

The condition $\frac{\partial g (ω)}{\partial ω_{k}} = 0, k = 1, \dots, n, k \neq t$ is the necessary condition for weight optimality and consequently it follows from the definition of the function $Ψ (x, {\bar{ξ}}_{s + 1})$ that this function attains one and the same value for all support points of the design ${\bar{ξ}}_{s + 1}$ .

Proof of Theorem 3.6

The proof is similar to the proof of Theorem 3.3. Denote

h (γ, α) = g ({\bar{ω}}_{(γ)} (α)),

where the vector ${\bar{ω}}_{(γ)} (α^{*})$ is calculated at the γth iteration. Since the sequence g(ω_(γ)) is bounded and increasing (by construction) it converges to some limit, say g**. Consequently there exists a subsequence of vector of weights, say ${\bar{ω}}_{(γ_{j})}, j = 1, 2, \dots$ converging to a vector, say ${\bar{ω}}^{* *}$ . Note that g is upper semi-continuous as the infimum of continuous functions, which implies $g ({\bar{ω}}^{* *}) = g^{* *}$ . Now, assume that $g ({\bar{ω}}^{* *}) < g (ω^{*})$ , then it follows by an application of Theorem 2.1 with $X = {x_{1}, \dots, x_{n}}$ that there exists a constant δ > 0 such that

{\frac{\partial g (\bar{ω} (α))}{\partial α} ∣}_{α = 0} = 2 δ > 0 .

Here the vector $\bar{ω} (α)$ is defined in the same way as ${\bar{ω}}_{(γ)} (α)$ , where ω_(γ) is replaced by ω = ω**. Therefore for sufficiently large j, say, j ≥ N we obtain (using the lower semi-continuity of g) that h(γ_j, 0) > δ and a Taylor expansion yields

h (γ_{j + 1}, α_{(γ_{j + 1})}^{*}) - h (s_{j}, α_{(γ_{j})}^{*})) \geq \max_{α} (α \frac{\partial g (\bar{ω} (α))}{\partial α} - \frac{1}{2} α^{2} K) = \frac{δ^{2}}{2 K},

where $α_{(γ_{j})}^{*}$ is the value α* from the γ_jth iteration and K is an absolute upper bound of the second derivative. Using the same arguments as in the proof of Theorem 3.3 we obtain a contradiction, which proves the assertion of the theorem.

Acknowledgements

Parts of this work were done during a visit of the second author at the Department of Mathematics, Ruhr-Universität Bochum, Germany. The authors would like to thank M. Stein who typed this manuscript with considerable technical expertise. The work of H. Dette and V. Melas was supported by the Deutsche Forschungsgemeinschaft (SFB 823: Statistik nichtlinearer dynamischer Prozesse, Teilprojekt C2). The research of H. Dette reported in this publication was also partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. V. Melas was also partially supported by Russian Foundation of Basic Research (Project 12.01.00747a).

Contributor Information

Holger Dette, Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany, holger.dette@rub.de.

Viatcheslav B. Melas, St. Petersburg State University, Department of Mathematics, St. Petersburg, Russia, vbmelas@post.ru

Roman Guchenko, St. Petersburg State University, Department of Mathematics, St. Petersburg, Russia, romanguchenko@ya.ru.

References

Atkinson A, Donev A, Tobias R. Optimum Experimental Designs, with SAS (Oxford Statistical Science Series) 2nd edition Oxford University Press; USA: 2007. [Google Scholar]
Atkinson AC. Examples of the use of an equivalence theorem in constructing optimum experimental designs for random-effects nonlinear regression models. Journal of Statistical Planning and Inference. 2008;138(9):2595–2606. [Google Scholar]
Atkinson AC, Bogacka B, Bogacki MB. D- and T -optimum designs for the kinetics of a reversible chemical reaction. Chemometrics and Intelligent Laboratory Systems. 1998;43:185–198. [Google Scholar]
Atkinson AC, Fedorov VV. The designs of experiments for discriminating between two rival models. Biometrika. 1975a;62:57–70. [Google Scholar]
Atkinson AC, Fedorov VV. Optimal design: Experiments for discriminating between several models. Biometrika. 1975b;62:289–303. [Google Scholar]
Braess D, Dette H. On the number of support points of maximin and Bayesian D-optimal designs in nonlinear regression models. Annals of Statistics. 2007;35:772–792. [Google Scholar]
Braess D, Dette H. Optimal discriminating designs for several competing regression models. Annals of Statistics. 2013;41(2):897–922. [Google Scholar]
Chaloner K, Verdinelli I. Bayesian experimental design: A review. Statistical Science. 1995;10(3):273–304. [Google Scholar]
Chernoff H. Locally optimal designs for estimating parameters. Annals of Mathematical Statistics. 1953;24:586–602. [Google Scholar]
Dette H. Designing experiments with respect to “standardized” optimality criteria. Journal of the Royal Statistical Society, Ser. B. 1997;59:97–110. [Google Scholar]
Dette H, Haller G. Optimal designs for the identification of the order of a Fourier regression. Annals of Statistics. 1998;26:1496–1521. [Google Scholar]
Dette H, Melas VB, Shpilev P. T-optimal designs for discrimination between two polynomial models. Annals of Statistics. 2012;40(1):188–205. [Google Scholar]
Dette H, Melas VB, Shpilev P. Robust T-optimal discriminating designs. Annals of Statistics. 2013;41(4):1693–1715. doi: 10.1214/15-AOS1333. [DOI] [PMC free article] [PubMed] [Google Scholar]
Foo LK, Duffull S. Optimal design of pharmacokinetic-pharmacodynamic studies. In: Bonate PL, Howard DR, editors. Pharmacokinetics in Drug Development, Advances and Applications. Springer; 2011. [Google Scholar]
Han C, Chaloner K. D- and c-optimal designs for exponential regression models used in pharmacokinetics and viral dynamics. Journal of Statistical Planning and Inference. 2003;115:585–601. [Google Scholar]
Kiefer J. General equivalence theory for optimum designs (approximate theory). Annals of Statistics. 1974;2(5):849–879. [Google Scholar]
López-Fidalgo J, Tommasi C, Trandafir PC. An optimal experimental design criterion for discriminating between non-normal models. Journal of the Royal Statistical Society, Series B. 2007;69:231–242. [Google Scholar]
Pinheiro J, Bretz F, Branson M. Analysis of dose-response studies: Modeling approaches. In: Ting N, editor. Dose Finding in Drug Development. Springer-Verlag; New York: 2006. pp. 146–171. [Google Scholar]
Pronzato L, Walter E. Robust experimental design via stochastic approximation. Mathematical Biosciences. 1985;75:103–120. [Google Scholar]
Pshenichny BN. Necessary Conditions of an Extremum. Marcel Dekker; New York: 1971. [Google Scholar]
Pukelsheim F. Optimal Design of Experiments. SIAM; Philadelphia: 2006. [Google Scholar]
Ratkowsky D. Handbook of Nonlinear Regression Models. Dekker; New York: 1990. [Google Scholar]
Song D, Wong WK. On the construction of grm-optimal designs. Statistica Sinica. 1999;9:263–272. [Google Scholar]
Stigler S. Optimal experimental design for polynomial regression. Journal of the American Statistical Association. 1971;66:311–318. [Google Scholar]
Tommasi C. Optimal designs for both model discrimination and parameter estimation. Journal of Statistical Planning and Inference. 2009;139:4123–4132. [Google Scholar]
Tommasi C, López-Fidalgo J. Bayesian optimum designs for discriminating between models with any distribution. Computational Statistics & Data Analysis. 2010;54(1):143–150. [Google Scholar]
Ucinski D, Bogacka B. T -optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society, Ser. B. 2005;67:3–18. [Google Scholar]
Wiens DP. Robust discrimination designs, with Matlab code. Journal of the Royal Statistical Society, Ser. B. 2009;71:805–829. [Google Scholar]
Yang M, Biedermann S, Tang E. On optimal designs for nonlinear models: A general and efficient algorithm. Journal of the American Statistical Association. 2013;108:1411–1420. [Google Scholar]
Yu Y. Monotonic convergence of a general algorithm for computing optimal designs. The Annals of Statistics. 2010;38(3):1593–1606. [Google Scholar]

[R1] Atkinson A, Donev A, Tobias R. Optimum Experimental Designs, with SAS (Oxford Statistical Science Series) 2nd edition Oxford University Press; USA: 2007. [Google Scholar]

[R2] Atkinson AC. Examples of the use of an equivalence theorem in constructing optimum experimental designs for random-effects nonlinear regression models. Journal of Statistical Planning and Inference. 2008;138(9):2595–2606. [Google Scholar]

[R3] Atkinson AC, Bogacka B, Bogacki MB. D- and T -optimum designs for the kinetics of a reversible chemical reaction. Chemometrics and Intelligent Laboratory Systems. 1998;43:185–198. [Google Scholar]

[R4] Atkinson AC, Fedorov VV. The designs of experiments for discriminating between two rival models. Biometrika. 1975a;62:57–70. [Google Scholar]

[R5] Atkinson AC, Fedorov VV. Optimal design: Experiments for discriminating between several models. Biometrika. 1975b;62:289–303. [Google Scholar]

[R6] Braess D, Dette H. On the number of support points of maximin and Bayesian D-optimal designs in nonlinear regression models. Annals of Statistics. 2007;35:772–792. [Google Scholar]

[R7] Braess D, Dette H. Optimal discriminating designs for several competing regression models. Annals of Statistics. 2013;41(2):897–922. [Google Scholar]

[R8] Chaloner K, Verdinelli I. Bayesian experimental design: A review. Statistical Science. 1995;10(3):273–304. [Google Scholar]

[R9] Chernoff H. Locally optimal designs for estimating parameters. Annals of Mathematical Statistics. 1953;24:586–602. [Google Scholar]

[R10] Dette H. Designing experiments with respect to “standardized” optimality criteria. Journal of the Royal Statistical Society, Ser. B. 1997;59:97–110. [Google Scholar]

[R11] Dette H, Haller G. Optimal designs for the identification of the order of a Fourier regression. Annals of Statistics. 1998;26:1496–1521. [Google Scholar]

[R12] Dette H, Melas VB, Shpilev P. T-optimal designs for discrimination between two polynomial models. Annals of Statistics. 2012;40(1):188–205. [Google Scholar]

[R13] Dette H, Melas VB, Shpilev P. Robust T-optimal discriminating designs. Annals of Statistics. 2013;41(4):1693–1715. doi: 10.1214/15-AOS1333. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Foo LK, Duffull S. Optimal design of pharmacokinetic-pharmacodynamic studies. In: Bonate PL, Howard DR, editors. Pharmacokinetics in Drug Development, Advances and Applications. Springer; 2011. [Google Scholar]

[R15] Han C, Chaloner K. D- and c-optimal designs for exponential regression models used in pharmacokinetics and viral dynamics. Journal of Statistical Planning and Inference. 2003;115:585–601. [Google Scholar]

[R16] Kiefer J. General equivalence theory for optimum designs (approximate theory). Annals of Statistics. 1974;2(5):849–879. [Google Scholar]

[R17] López-Fidalgo J, Tommasi C, Trandafir PC. An optimal experimental design criterion for discriminating between non-normal models. Journal of the Royal Statistical Society, Series B. 2007;69:231–242. [Google Scholar]

[R18] Pinheiro J, Bretz F, Branson M. Analysis of dose-response studies: Modeling approaches. In: Ting N, editor. Dose Finding in Drug Development. Springer-Verlag; New York: 2006. pp. 146–171. [Google Scholar]

[R19] Pronzato L, Walter E. Robust experimental design via stochastic approximation. Mathematical Biosciences. 1985;75:103–120. [Google Scholar]

[R20] Pshenichny BN. Necessary Conditions of an Extremum. Marcel Dekker; New York: 1971. [Google Scholar]

[R21] Pukelsheim F. Optimal Design of Experiments. SIAM; Philadelphia: 2006. [Google Scholar]

[R22] Ratkowsky D. Handbook of Nonlinear Regression Models. Dekker; New York: 1990. [Google Scholar]

[R23] Song D, Wong WK. On the construction of grm-optimal designs. Statistica Sinica. 1999;9:263–272. [Google Scholar]

[R24] Stigler S. Optimal experimental design for polynomial regression. Journal of the American Statistical Association. 1971;66:311–318. [Google Scholar]

[R25] Tommasi C. Optimal designs for both model discrimination and parameter estimation. Journal of Statistical Planning and Inference. 2009;139:4123–4132. [Google Scholar]

[R26] Tommasi C, López-Fidalgo J. Bayesian optimum designs for discriminating between models with any distribution. Computational Statistics & Data Analysis. 2010;54(1):143–150. [Google Scholar]

[R27] Ucinski D, Bogacka B. T -optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society, Ser. B. 2005;67:3–18. [Google Scholar]

[R28] Wiens DP. Robust discrimination designs, with Matlab code. Journal of the Royal Statistical Society, Ser. B. 2009;71:805–829. [Google Scholar]

[R29] Yang M, Biedermann S, Tang E. On optimal designs for nonlinear models: A general and efficient algorithm. Journal of the American Statistical Association. 2013;108:1411–1420. [Google Scholar]

[R30] Yu Y. Monotonic convergence of a general algorithm for computing optimal designs. The Annals of Statistics. 2010;38(3):1593–1606. [Google Scholar]

PERMALINK

Bayesian T-optimal discriminating designs

Holger Dette

Viatcheslav B Melas

Roman Guchenko

Abstract

1 Introduction

2 T-optimal discriminating designs

2.1 T-optimal designs

Assumption 2.1

Assumption 2.2

Theorem 2.1

Theorem 2.2

Corollary 2.3

2.2 Bayesian T-optimal designs

3 Calculating locally T-optimal designs

Algorithm 3.1 (Atkinson and Fedorov (1975a))

Algorithm 3.2

Theorem 3.3

Lemma 3.4

3.1 Quadratic programming

Remark 3.5

3.2 A gradient method

Theorem 3.6

Remark 3.7

4 Implementation and numerical examples

4.1 Bayesian T-optimal discriminating designs for exponential models

Table 1.

Figure 1.

4.2 Bayesian T-optimal discrimination designs for dose finding studies

Figure 2.

Table 2.

Figure 3.

5 Proofs

5.1 An auxiliary result

Lemma 5.1

Proof

5.2 Proofs

Proof of Theorem 2.2

Proof of Corollary 2.3

Proof of Theorem 3.3

Proof of Lemma 3.4

Proof of Theorem 3.6

Acknowledgements

Contributor Information

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases