Biometrical Journal. 2020 Jan 20;62(4):1090–1104. doi: 10.1002/bimj.201900074

A design criterion for symmetric model discrimination based on flexible nominal sets

Radoslav Harman, Werner G. Müller
PMCID: PMC9328432  PMID: 31957085

Abstract

Experimental design applications for discriminating between models have been hampered by the assumption that the true model is known beforehand, which runs counter to the very aim of the experiment. Previous approaches to alleviating this requirement were either symmetrizations of asymmetric techniques, or Bayesian, minimax, and sequential approaches. Here we present a genuinely symmetric criterion based on a linearized distance between mean-value surfaces and the newly introduced tool of flexible nominal sets. We demonstrate the computational efficiency of the approach using the proposed criterion and provide a Monte-Carlo evaluation of its discrimination performance on the basis of the likelihood ratio. An application to a pair of competing models in enzyme kinetics is given.

Keywords: discrimination experiments, exact designs, flexible nominal sets, nonlinear regression

1. INTRODUCTION

Besides optimization and parameter estimation, discrimination between rival models has always been an important objective of an experiment, and, therefore, of the optimization of experimental design. The crucial problem is that one typically cannot construct an optimal model-discrimination design without already knowing which model is the true one, and what the true values of its parameters are. In this respect, the situation is analogous to the problem of optimal experimental design for parameter estimation in nonlinear statistical models (e.g., Pronzato & Pázman, 2014), and many standard techniques can be used to tackle the dependence on the unknown characteristics: local, Bayesian, minimax, and sequential approaches, as well as their various combinations.

A big leap from the initial ad hoc methods for model discrimination (see Hill, 1978, for a review) was made by Atkinson and Fedorov (1975), who introduced T-optimality, derived from the likelihood-ratio test under the assumption that one model is true and that its parameters are fixed at nominal values chosen by the experimenter. There, maximization of the noncentrality parameter is equivalent to maximizing the power of the likelihood-ratio test for the least favorable parameter of the model assumed to be wrong. Thus, T-optimality can be considered a combination of a localization and a minimax approach.

When the models are nested and (partly) linear, T-optimality can be shown to be equivalent to Ds-optimality for a single parameter that embodies the deviation from the smaller model (see, e.g., Dette & Titoff, 2009; Stigler, 1971). For this setting the optimal design questions are essentially solved, and everything hinges on the asymmetric nature of the Neyman–Pearson lemma with respect to the null and alternative hypotheses. However, in the nonnested case the design problem itself is often inherently symmetric with respect to the exchangeability of the compared models, and it is the purpose of the experiment to decide which of the two different models is true.

The aim of this paper is to solve the discrimination design problem in a symmetric way, focusing on nonnested models. Thus, standard methods that are inherently asymmetric, like T-optimality, albeit feasible, are not a natural choice. We further suppose that we do not use the full prior distribution of the unknown parameters of the models, which rules out Bayesian approaches such as Felsenstein (1992) and Tommasi and López-Fidalgo (2010). Nevertheless, as we will make more precise in the next section, we will utilize what can be perceived as a specific kind of prior knowledge about the unknown parameters, extending the approach of local optimality. Our goal is to provide a lean, computationally efficient, and scalable method, as opposed to the heavy machinery recently employed in the computational statistics literature, for example, Hainy, Price, Restif, and Drovandi (2018). Furthermore, we strive for practical simplicity, which at first prohibits sequential (see Buzzi-Ferraris & Forzatti, 1983; Müller & Ponce De Leon, 1996; Schwaab et al., 2006) or sequentially generated (see Vajjah & Duffull, 2012) designs.

A standard solution to the symmetric discrimination design problem is to employ symmetrizations of asymmetric criteria, such as compound T-optimality, which usually depend on some weighting chosen by the experimenter. The minimax strategy recently presented in Tommasi, Martín-Martín, and López-Fidalgo (2016) is also essentially a symmetrization. Moreover, the usual minimax approaches lead to designs that completely depend on the possibly unrealistic extreme values of the parameter space, and their calculation again demands enormous computational effort.

The closest in spirit to our approach is perhaps a proposal for linear models in Section 4.4 of Atkinson and Fedorov (1975) and its extension in Fedorov and Khabarov (1986), which, however, was not taken up in the literature. The probable reason is that it involves some rather arbitrary restrictions on the parameters, as well as an artificial lower bound needed to convert it into a computationally feasible optimization problem.

For expositional purposes we will now restrict ourselves to a rather specific design task but will discuss possible extensions at the end of the paper.

Let $X$ be a finite design space and let $D$ be a design on $X$, that is, a vector of design points $x_1,\ldots,x_n \in X$, where $n$ is the chosen size of the experiment. Hence, in the terminology of the theory of optimal experimental design, we will work with exact designs. We will consider discrimination between a pair of nonlinear regression models

$$y_i=\eta_0(\theta_0,x_i)+\varepsilon_i,\quad i=1,\ldots,n,\qquad\text{and}\qquad y_i=\eta_1(\theta_1,x_i)+\varepsilon_i,\quad i=1,\ldots,n,$$

where $y_1,\ldots,y_n$ are observations, $\eta_0:\Theta_0\times X\to\mathbb{R}$ and $\eta_1:\Theta_1\times X\to\mathbb{R}$ are the mean-value functions, $\Theta_0\subseteq\mathbb{R}^{m_0}$ and $\Theta_1\subseteq\mathbb{R}^{m_1}$ are parameter spaces with nonempty interiors $\mathrm{int}(\Theta_0)$, $\mathrm{int}(\Theta_1)$, and $\varepsilon_1,\ldots,\varepsilon_n$ are unobservable random errors. For both $k=0,1$ and any $x\in X$, we will assume that the function $\eta_k(\cdot,x)$ is differentiable on $\mathrm{int}(\Theta_k)$; the gradient of $\eta_k(\cdot,x)$ at $\theta_k\in\mathrm{int}(\Theta_k)$ will be denoted by $\nabla\eta_k(\theta_k,x)$. Our principal assumption is that one of the models is true but we do not know which; that is, for $k=0$ or for $k=1$ there exists $\bar\theta_k\in\Theta_k$ such that $y_i=\eta_k(\bar\theta_k,x_i)+\varepsilon_i$ for all $i$.

Let the random errors be i.i.d. $N(0,\sigma^2)$, where $\sigma^2\in(0,\infty)$. The assumption of equal error variances for both models is plausible if, for instance, the errors are due to the measurement device and hence do not significantly depend on the value being measured. The situation with different error variances requires a more elaborate approach; compare Fedorov and Pázman (1968).

Ultimately, we are aiming not just at achieving high design efficiencies with respect to our newly proposed criterion; we also want to test its usefulness in concrete discrimination experiments, that is, the probability that using our design we arrive at the correct decision about which model is the true one. So, to justify our approach numerically, we require a model-discrimination rule that will be applied, for evaluational purposes, after all observations based on the design $D$ have been collected.

The choice of the best discrimination rule based on the observations is generally a nontrivial problem. However, it is natural to compute the maximum likelihood estimates $\hat\theta_0$ and $\hat\theta_1$ of the parameters under the assumption of the first and the second model, respectively, and then base the decision on whether

$$\frac{L\big(\hat\theta_0\mid(y_i)_{i=1}^n\big)}{L\big(\hat\theta_1\mid(y_i)_{i=1}^n\big)}\lessgtr 1, \tag{1}$$

that is, the likelihood ratio being smaller or greater than 1, or, perhaps more simply, whether $\log L(\hat\theta_0)-\log L(\hat\theta_1)\lessgtr 0$. Under the normality, homoskedasticity, and independence assumptions, this decision is equivalent to a decision based on the proximity of the vector $(y_i)_{i=1}^n$ of observations to the vectors of estimated mean values $(\eta_0(\hat\theta_0,x_i))_{i=1}^n$ and $(\eta_1(\hat\theta_1,x_i))_{i=1}^n$.
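To make the rule concrete, the following minimal sketch implements decision (1) by comparing the minimum residual sums of squares of the two fitted models, which is equivalent to the likelihood-ratio comparison under the assumptions just stated. The model functions, starting values, and data are illustrative assumptions of ours, not part of the original study.

```python
import numpy as np
from scipy.optimize import least_squares

def rss(eta, theta_start, x, y):
    """Minimum residual sum of squares when fitting eta(theta, x) to y."""
    fit = least_squares(lambda th: eta(th, x) - y, theta_start)
    return float(np.sum(fit.fun ** 2))

def decide(eta0, eta1, start0, start1, x, y):
    """Return 0 if model eta0 wins comparison (1), else 1 (equal error variances)."""
    return 0 if rss(eta0, start0, x, y) <= rss(eta1, start1, x, y) else 1

# Illustration with the two models of Example 1 below and made-up data:
eta0 = lambda th, x: th[0] * x          # linear model
eta1 = lambda th, x: np.exp(th[0] * x)  # exponential model
x = np.array([-1.0, 1.0])
y = np.array([0.4, 2.9])
print(decide(eta0, eta1, np.array([1.0]), np.array([1.0]), x, y))
```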

For the case $m_0\neq m_1$, to counterbalance the favoring of models with a greater number of parameters, Cox (2013) recommends instead the use of $L(\hat\theta_0)/L(\hat\theta_1)\lessgtr (e^{m_1}/e^{m_0})^{n/n'}$, which corresponds to the Bayesian information criterion; see Schwarz (1978). Here $n'$ corresponds to the number of observations in a real or fictitious prior experiment. For the sake of simplicity, however, we will from now on restrict ourselves to the case $m:=m_0=m_1$. Note that for the evaluational purposes we take a purely model-selection-based standpoint. More sophisticated testing procedures, for instance those allowing both models to be rejected, based on the pioneering work of Cox (1961), are reviewed and outlined in Pesaran and Weeks (2007).

Let $x_1,\ldots,x_n\in X$, let $D=(x_1,\ldots,x_n)$ be the design used for the collection of the data prior to the decision, and assume that model $\eta_0$ is true, with the corresponding parameter value $\bar\theta_0$. Note that this entails no loss of generality or symmetry, as we could equivalently assume model $\eta_1$ to be true. Then the probability of the correct decision based on the likelihood ratio is equal to

$$P\left[\min_{\theta_0\in\Theta_0}\sum_{i=1}^n\big(\eta_0(\theta_0,x_i)-y_i\big)^2\le\min_{\theta_1\in\Theta_1}\sum_{i=1}^n\big(\eta_1(\theta_1,x_i)-y_i\big)^2\right], \tag{2}$$

where $(y_i)_{i=1}^n$ follows the normal distribution with mean $(\eta_0(\bar\theta_0,x_i))_{i=1}^n$ and covariance matrix $\sigma^2 I_n$.

Clearly, probability (2) depends on the true model, the unknown true parameter, and also on the unknown variance of the errors. Even if these were known, the probability of correct classification would be very difficult to compute for a given design, because this requires a combination of high-dimensional integration and multivariate nonconvex optimization. Therefore, it is practically impossible to optimize the design directly on the basis of formula (2). However, we can simplify the problem by constructing a lower bound on (2) that does not depend on the unknown parameters and is much simpler to maximize with respect to the choice of the design. The bound is based on the distance $d(E_0,E_1)$, where $E_j$ is the set of all possible mean-value vectors of the observations under model $j$, $j=0,1$, and $d(\cdot,\cdot)$ denotes the infimum of the distances between pairs of elements of the two sets; it is developed as follows.

Consider a fixed experimental design $(x_1,\ldots,x_n)$, and denote $y:=(y_i)_{i=1}^n$ and $\eta_j(\theta_j):=(\eta_j(\theta_j,x_i))_{i=1}^n$ for $j=0,1$. Note that we can express (2) as $P[d(E_0,y)\le d(E_1,y)]$. Now, let $R=\|\varepsilon\|$, where $\varepsilon=y-\eta_0(\bar\theta_0)$, be the norm of the vector of errors. Assuming $R\le d(E_0,E_1)/2$ we obtain

$$d(E_0,E_1)\le d\big(\eta_0(\hat\theta_0),\eta_1(\hat\theta_1)\big)\le d\big(y,\eta_0(\hat\theta_0)\big)+d\big(y,\eta_1(\hat\theta_1)\big)\le d\big(y,\eta_0(\bar\theta_0)\big)+d\big(y,\eta_1(\hat\theta_1)\big)=R+d\big(y,\eta_1(\hat\theta_1)\big)\le d(E_0,E_1)/2+d\big(y,\eta_1(\hat\theta_1)\big),$$

which implies $d(E_0,E_1)/2\le d\big(y,\eta_1(\hat\theta_1)\big)$ and consequently

$$d(E_0,y)=d\big(y,\eta_0(\hat\theta_0)\big)\le d\big(y,\eta_0(\bar\theta_0)\big)=R\le d(E_0,E_1)/2\le d\big(y,\eta_1(\hat\theta_1)\big)=d(E_1,y).$$

Thus, the event $[R\le d(E_0,E_1)/2]$ implies the event $[d(E_0,y)\le d(E_1,y)]$; that is, (2) can be bounded from below by

$$P\big[R\le d(E_0,E_1)/2\big]. \tag{3}$$

To make (2) as high as possible, it makes sense to maximize (3), that is, to maximize $d(E_0,E_1)$, which depends on the underlying experimental design. Although this maximization is much simpler than maximizing (2) directly, it still generally requires a nonconvex multidimensional optimization at each iteration of the maximization procedure, which is impractical for computing exact optimal designs. A realistic approach must be numerically feasible and must address the dependence of the design on the unknown true model parameters; we achieve both by a rapidly computable approximation of $d(E_0,E_1)$ through linearization, as will be explained in the following section.

1.1. Example 1: A motivating example

Let $\eta_0(\theta_0,x)=\theta_0 x$ and $\eta_1(\theta_1,x)=e^{\theta_1 x}$. Furthermore, for the moment we assume just two observations $y_1,y_2$ at the fixed design points $x_1=-1$ and $x_2=1$, respectively. In this case, evidently, $\hat\theta_0=(y_2-y_1)/2$, and $\hat\theta_1$ is the solution of $2e^{-\theta}(y_1-e^{-\theta})-2e^{\theta}(y_2-e^{\theta})=0$, which for $-2\le y_1\le 2$ is the logarithm of a root of the polynomial $\gamma^4-\gamma^3 y_2+\gamma y_1-1$. Figure 1 displays the log-likelihood-ratio contours for the original and linearized models: the former are nonconvex and complex, while the latter are much simpler, convex, and approximate the former fairly well over a wide range of responses. Note that while this example is for a fixed design, it motivates why the linearizations can serve as the cornerstones of our design method, as will become clearer in the following sections.
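As a quick numerical cross-check of these closed forms (with made-up observations $y_1,y_2$, an assumption of ours), one can compare the root-based computation of $\hat\theta_1$ with a direct minimization of the sum of squares:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y1, y2 = 0.4, 2.9                                  # illustrative observations

theta0_hat = (y2 - y1) / 2.0                       # closed form for model 0

# Roots of g^4 - y2*g^3 + y1*g - 1 (coefficients, highest degree first):
roots = np.roots([1.0, -y2, 0.0, y1, -1.0])
real_pos = roots[(np.abs(roots.imag) < 1e-9) & (roots.real > 0)].real
sse = lambda t: (y1 - np.exp(-t)) ** 2 + (y2 - np.exp(t)) ** 2
theta1_hat = min(np.log(real_pos), key=sse)        # best log-root

# Cross-check against direct minimization of the sum of squares:
direct = minimize_scalar(sse, bounds=(-5.0, 5.0), method="bounded").x
print(theta0_hat, theta1_hat, direct)              # theta1_hat should match direct
```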

Figure 1. Left panel: contour plot of $\log L(\hat\theta_0)-\log L(\hat\theta_1)$ for Example 1 (the solid line corresponds to 0; horizontal axis $y_1$, vertical axis $y_2$). Right panel: the corresponding contour plot for the model $\eta_1$ linearized at $\theta_1=1$.

2. THE LINEARIZED DISTANCE CRITERION

We suggest an extension of the idea of local optimality used for nonlinear experimental design. Let $\tilde\theta_0\in\mathrm{int}(\Theta_0)$ and $\tilde\theta_1\in\mathrm{int}(\Theta_1)$ be nominal parameter values that satisfy the basic discriminability condition $\eta_0(\tilde\theta_0,x)\neq\eta_1(\tilde\theta_1,x)$ for some $x\in X$. Let us introduce regions $\tilde\Theta_0\subseteq\mathrm{int}(\Theta_0)\subseteq\mathbb{R}^m$ and $\tilde\Theta_1\subseteq\mathrm{int}(\Theta_1)\subseteq\mathbb{R}^m$ containing $\tilde\theta_0$ and $\tilde\theta_1$; we will consequently call $\tilde\Theta_0$ and $\tilde\Theta_1$ flexible nominal sets. It is evident that optimal designs depend on the flexible nominal sets in the same way as minimax-type designs depend on the parameter spaces (cf. Dette, Melas, & Shpilev, 2013), but the flexible nominal sets, unlike the parameter spaces $\Theta_0$ and $\Theta_1$, will not be considered fixed. A novelty of our procedure is that we use these sets as a tuning device.

Let $D=(x_1,\ldots,x_n)$ be a design. Let us perform the following particular linearization of model $\eta_k$, $k=0,1$, at $\tilde\theta_k$:

$$(y_i)_{i=1}^n \approx F_k(D)\,\theta_k+a_k(D)+\varepsilon,$$

where $F_k(D)$ is the $n\times m$ matrix given by

$$F_k(D)=\big(\nabla\eta_k(\tilde\theta_k,x_1),\ldots,\nabla\eta_k(\tilde\theta_k,x_n)\big)^T,$$

$a_k(D)$ is the $n$-dimensional vector

$$a_k(D)=\big(\eta_k(\tilde\theta_k,x_i)\big)_{i=1}^n-F_k(D)\,\tilde\theta_k,$$

and $\varepsilon=(\varepsilon_1,\ldots,\varepsilon_n)^T$ is a vector of independent $N(0,\sigma^2)$ errors.

Note that the vector $a_k(D)$ plays an important role in the proposed method: although it is known, we cannot simply subtract it from the vector of observations, as is usual when we linearize a single nonlinear regression model. However, if $\eta_k$ corresponds to a standard linear model, then $a_k(D)=0_n$ for any $D$.
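The following sketch shows how $F_k(D)$ and $a_k(D)$ can be assembled in practice. The finite-difference gradient is a generic stand-in for the analytic gradient $\nabla\eta_k$, and the model function and nominal value in the example are taken from Example 1; all names are our own conventions.

```python
import numpy as np

def grad(eta, theta, x, h=1e-6):
    """Central finite-difference gradient of eta(., x) at theta."""
    g = np.zeros(len(theta))
    for j in range(len(theta)):
        e = np.zeros(len(theta)); e[j] = h
        g[j] = (eta(theta + e, x) - eta(theta - e, x)) / (2 * h)
    return g

def linearize(eta, theta_nom, design):
    """Return F_k(D) (n x m) and a_k(D) (n-vector) for D = (x_1, ..., x_n)."""
    F = np.array([grad(eta, theta_nom, x) for x in design])
    eta_nom = np.array([eta(theta_nom, x) for x in design])
    return F, eta_nom - F @ theta_nom   # a_k(D) = eta_k(nominal) - F_k(D) theta_nom

# Example: the exponential model of Example 1 at the nominal value (1,):
eta1 = lambda th, x: np.exp(th[0] * x)
F1, a1 = linearize(eta1, np.array([1.0]), [1.0, 1.5, 2.0])
```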

2.1. Definition of the δ criterion

Consider the design criterion

$$\delta(D)=\inf_{\theta_0\in\tilde\Theta_0,\,\theta_1\in\tilde\Theta_1}\delta(D\mid\theta_0,\theta_1),\quad\text{where} \tag{4}$$
$$\delta(D\mid\theta_0,\theta_1)=\big\|a_0(D)+F_0(D)\,\theta_0-\{a_1(D)+F_1(D)\,\theta_1\}\big\|, \tag{5}$$

for $\theta_0\in\tilde\Theta_0$, $\theta_1\in\tilde\Theta_1$. The criterion $\delta$ can be viewed as an approximation of the nearest distance $d$ between the mean-value surfaces of the models in the neighborhoods of the vectors $(\eta_0(\tilde\theta_0,x_i))_{i=1}^n$ and $(\eta_1(\tilde\theta_1,x_i))_{i=1}^n$; see the illustrative Figure 2.

Figure 2. Illustrative graph for the definition of $\delta(D)$ for two one-parameter models ($\tilde\Theta_0,\tilde\Theta_1\subset\mathbb{R}$) and a design of size two ($D=(x_1,x_2)$). The line segments correspond to the sets $\{a_0(D)+F_0(D)\theta_0:\theta_0\in\tilde\Theta_0\}$ and $\{a_1(D)+F_1(D)\theta_1:\theta_1\in\tilde\Theta_1\}$ for some flexible nominal sets $\tilde\Theta_0$ and $\tilde\Theta_1$.

We will now express the $\delta$-criterion as a function of the design $D=(x_1,\ldots,x_n)$ represented by the counting measure $\xi$ on $X$ defined as

$$\xi(\{x\}):=\#\{i\in\{1,\ldots,n\}:x_i=x\},\quad x\in X,$$

where $\#$ denotes the size of a set. Let $\theta=(\theta_0^T,\theta_1^T)^T$. For all $x\in X$ let

$$\Delta\eta(\theta,x):=\eta_0(\theta_0,x)-\eta_1(\theta_1,x),\qquad \nabla\eta(\theta,x):=\big(\nabla\eta_0^T(\theta_0,x),\,-\nabla\eta_1^T(\theta_1,x)\big)^T.$$

For any $\theta_0\in\tilde\Theta_0$, $\theta_1\in\tilde\Theta_1$, and $\theta=(\theta_0^T,\theta_1^T)^T$, writing $\tilde\theta=(\tilde\theta_0^T,\tilde\theta_1^T)^T$, we have

$$\delta^2(D\mid\theta_0,\theta_1)=\big\|a_0(D)+F_0(D)\,\theta_0-\{a_1(D)+F_1(D)\,\theta_1\}\big\|^2=\sum_{i=1}^n\big[\nabla\eta^T(\tilde\theta,x_i)(\theta-\tilde\theta)+\Delta\eta(\tilde\theta,x_i)\big]^2=\int_X\big[\nabla\eta^T(\tilde\theta,x)(\theta-\tilde\theta)+\Delta\eta(\tilde\theta,x)\big]^2\,d\xi(x). \tag{6}$$

Therefore

$$\delta^2(D\mid\theta_0,\theta_1)=(\theta-\tilde\theta)^T M(\xi,\tilde\theta)\,(\theta-\tilde\theta)+2\,b^T(\xi,\tilde\theta)\,(\theta-\tilde\theta)+c(\xi,\tilde\theta), \tag{7}$$

where

$$M(\xi,\tilde\theta)=\int_X \nabla\eta(\tilde\theta,x)\,\nabla\eta^T(\tilde\theta,x)\,d\xi(x), \tag{8}$$
$$b(\xi,\tilde\theta)=\int_X \Delta\eta(\tilde\theta,x)\,\nabla\eta(\tilde\theta,x)\,d\xi(x), \tag{9}$$
$$c(\xi,\tilde\theta)=\int_X \big[\Delta\eta(\tilde\theta,x)\big]^2\,d\xi(x). \tag{10}$$

The matrix $M(\xi,\tilde\theta)$ in Equations (7) and (8) can be recognized as the information matrix for the parameter $\theta$ in the linear regression model

$$z_i=\nabla\eta^T(\tilde\theta,x_i)\,\theta+\varepsilon_i=[F_0(D),-F_1(D)]_{i\cdot}\,\theta+\varepsilon_i,\quad i=1,\ldots,n, \tag{11}$$

where $[F_0(D),-F_1(D)]_{i\cdot}$ is the $i$th row of the matrix $[F_0(D),-F_1(D)]$, with parameter $\theta$ and independent, homoskedastic errors $\varepsilon_1,\ldots,\varepsilon_n$ with mean 0; we will call (11) a response difference model.

2.2. Computation of the δ criterion value for a fixed design

For a fixed design $D$, expression (5) shows that $\delta^2(D\mid\theta)$ is a quadratic function of $\theta=(\theta_0^T,\theta_1^T)^T$. Moreover, both $\delta(D\mid\theta)$ and $\delta^2(D\mid\theta)$ are convex, because they are compositions of an affine function of $\theta$ with the convex functions $\|\cdot\|$ and $\|\cdot\|^2$, respectively. Clearly, if the flexible nominal sets are compact, convex, and polyhedral, the optimization in (4) can be performed efficiently by specialized solvers for linearly constrained quadratic programming.

Alternatively, we can view the computation of $\delta(D\mid\theta)$ as follows. As

$$\delta^2(D\mid\theta_0,\theta_1)=\big\|\{a_0(D)-a_1(D)\}+[F_0(D),-F_1(D)]\,\theta\big\|^2,$$

the minimization in (4) is equivalent to computing the minimum sum of squares for a least squares estimate of $\theta$ restricted to $\tilde\Theta:=\tilde\Theta_0\times\tilde\Theta_1$ in the response difference model with the artificial observations

$$z_i=\{a_1(D)-a_0(D)\}_i,\quad i=1,\ldots,n.$$

Thus, if $\tilde\Theta_0=\tilde\Theta_1=\mathbb{R}^m$, the infimum in (4) is attained, and it can be computed using the standard formulas of linear regression in the response difference model. If the flexible nominal sets are compact cuboids, (4) can be evaluated by the very rapid and stable method for bounded-variable least squares implemented in the R package bvls; see Stark and Parker (1995) and Mullen (2013).
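In Python, the same computation can be sketched with SciPy's bounded-variable least squares solver (lsq_linear), standing in for the R package bvls. The models, nominal values, and unit half-widths below are those of Example 1 (continued in Section 2.4), with the closed-form gradients inlined so that the sketch is self-contained; the function and argument names are our own conventions.

```python
import numpy as np
from scipy.optimize import lsq_linear

def delta_r(design, r, th0_nom=np.e, th1_nom=1.0, w0=1.0, w1=1.0):
    """delta_r(D) for cuboid flexible nominal sets, via the response
    difference model: minimize ||z - [F0, -F1] theta|| under box constraints."""
    x = np.asarray(design, dtype=float)
    F0 = x.reshape(-1, 1)                          # gradient of th0 * x
    a0 = th0_nom * x - F0[:, 0] * th0_nom          # zero: model 0 is linear
    F1 = (x * np.exp(th1_nom * x)).reshape(-1, 1)  # gradient of exp(th1 * x)
    a1 = np.exp(th1_nom * x) - F1[:, 0] * th1_nom
    A = np.hstack([F0, -F1])                       # design matrix of model (11)
    z = a1 - a0                                    # artificial observations
    lo = np.array([th0_nom - r * w0, th1_nom - r * w1])
    hi = np.array([th0_nom + r * w0, th1_nom + r * w1])
    res = lsq_linear(A, z, bounds=(lo, hi))
    return np.sqrt(2.0 * res.cost)                 # cost = 0.5 * ||A theta - z||^2

print(delta_r([2.0, 2.0, 2.0, 2.0, 2.0, 2.0], r=0.3))
```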

The following simple proposition collects the analytic properties of a natural analogue of δ defined on the linear vector space Ξ of all finite signed measures on X.

Proposition 2.1

For $\theta_0\in\tilde\Theta_0$, $\theta_1\in\tilde\Theta_1$, and a finite signed measure $\xi$ on $X$, let $\delta^2_{\mathrm{app}}(\xi\mid\theta_0,\theta_1)$ be defined via formula (6). Then $\delta^2_{\mathrm{app}}(\cdot\mid\theta_0,\theta_1)$ is linear on $\Xi$. Moreover, let

$$\delta^2_{\mathrm{app}}(\xi):=\inf_{\theta_0\in\tilde\Theta_0,\,\theta_1\in\tilde\Theta_1}\delta^2_{\mathrm{app}}(\xi\mid\theta_0,\theta_1).$$

Then $\delta^2_{\mathrm{app}}$ is positively homogeneous and concave on $\Xi$.

Positive homogeneity of $\delta^2_{\mathrm{app}}$ implies that an $s$-fold replication of an exact design leads to an $s$-fold increase of its $\delta^2$ value. Consequently, a natural and statistically interpretable definition of the relative $\delta$-efficiency of two designs $D_1$ and $D_2$ is given by $\delta^2(D_1)/\delta^2(D_2)$, provided that $\delta^2(D_2)>0$.

Let $\mathcal{D}$ be the set of all $n$-point designs. A design $D^*\in\mathcal{D}$ will be called $\delta$-optimal if

$$D^*\in\operatorname*{arg\,max}_{D\in\mathcal{D}}\,\delta(D).$$

Note that the basic discriminability condition implies that if $\tilde\Theta_0=\{\tilde\theta_0\}$ and $\tilde\Theta_1=\{\tilde\theta_1\}$, then $\delta(D^*)$ is strictly positive. However, for larger flexible nominal sets it can happen that $\delta(D^*)=0$.

As the evaluation of the $\delta$-criterion is generally very rapid, the calculation of a $\delta$-optimal, or nearly $\delta$-optimal, design is similar to that for standard design criteria. For instance, in small problems we can use complete enumeration, and in larger problems we can employ an exchange heuristic, such as the KL exchange algorithm (see, e.g., Atkinson, Donev, & Tobias, 2007); a stripped-down variant is sketched below.
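The following sketch is not the full KL exchange algorithm but a simplified single-point exchange with random restarts, searching for a $\delta_r$-optimal exact design on a finite grid. It reuses the delta_r evaluator sketched above, and the grid anticipates Example 1 continued below; all tuning values are illustrative assumptions.

```python
import numpy as np

def exchange(X_grid, n, r, n_restarts=10, seed=1):
    """Greedy single-point exchange heuristic maximizing delta_r over
    n-point designs supported on X_grid (a simplification of KL exchange)."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_restarts):
        D = list(rng.choice(X_grid, size=n))       # random initial design
        cur = delta_r(D, r)
        improved = True
        while improved:
            improved = False
            for i in range(n):                     # try all single exchanges
                for cand in X_grid:
                    trial = D.copy()
                    trial[i] = cand
                    val = delta_r(trial, r)
                    if val > cur + 1e-12:
                        D, cur, improved = trial, val, True
        if cur > best_val:
            best, best_val = D, cur
    return sorted(best), best_val

X_grid = np.round(np.arange(1.0, 2.0001, 0.01), 2)   # the grid of Example 1 continued
D_opt, val = exchange(X_grid, n=6, r=0.3)
```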

Note that the $\delta$-optimal designs depend not only on $\eta_0$, $\eta_1$, $X$, $n$, $\tilde\theta_0$, and $\tilde\theta_1$, but also on $\tilde\Theta_0$ and $\tilde\Theta_1$.

2.3. Parametrization of flexible nominal sets

For simplicity, we will focus on cuboid flexible nominal sets centered at the nominal parameter values. This choice can be justified by the results of Sidak (1967), in particular if we already have confidence intervals for the individual parameters; see the further discussion in Section 4. Specifically, we will employ the homogeneous dilations

$$\tilde\Theta_k(r):=r\big(\tilde\Theta_k(1)-\tilde\theta_k\big)+\tilde\theta_k,\quad r\in[0,\infty),\ k=0,1, \tag{12}$$

and $\tilde\Theta_0(\infty):=\mathbb{R}^m$, $\tilde\Theta_1(\infty):=\mathbb{R}^m$, such that $r$ can be considered a tuning (set) parameter governing the size of the flexible nominal sets. In (12), $\tilde\Theta_0(1)$ and $\tilde\Theta_1(1)$ are “unit” nondegenerate compact cuboids centered at the respective nominal parameter values. For any design $D$ and $r\in[0,\infty]$, we define

$$\delta_r(D):=\inf_{\theta_0\in\tilde\Theta_0(r),\,\theta_1\in\tilde\Theta_1(r)}\delta(D\mid\theta_0,\theta_1). \tag{13}$$

Note that for our choice of flexible nominal sets the infimum in (13) is attained. The $\delta_r$-optimal values of the problem will be denoted by

$$o(r):=\max_{D\in\mathcal{D}}\,\delta_r(D).$$

Proposition 2.2

(a) Let $D$ be a design. The functions $\delta_r^2(D)$, $\delta_r(D)$, $o^2(r)$, and $o(r)$ are nonincreasing and convex in $r$ on the entire interval $[0,\infty]$. (b) There exists $r^*<\infty$ such that for all $r\ge r^*$: (i) $o(r)=o(\infty)$; (ii) any $\delta_\infty$-optimal design is also a $\delta_r$-optimal design.

Proof. (a) Let $D$ be an $n$-point design and let $0\le r_1\le r_2\le\infty$. The inequality $\delta_{r_1}^2(D)\ge\delta_{r_2}^2(D)$ follows from definitions (12) and (13), and the inequality $o^2(r_1)\ge o^2(r_2)$ follows from the fact that a maximum of nonincreasing functions is a nonincreasing function. The monotonicity of $\delta_r(D)$ and $o(r)$ in $r$ can be shown analogously.

To prove the convexity of $\delta_r^2(D)$ in $r$, let $\alpha\in(0,1)$ and let $r_\alpha=\alpha r_1+(1-\alpha)r_2$. For all $r\in[0,\infty]$, let $\hat\theta_r$ denote a minimizer of $\delta^2(D\mid\cdot)$ on $\tilde\Theta(r):=\tilde\Theta_0(r)\times\tilde\Theta_1(r)$. The convexity of $\delta^2(D\mid\theta)$ in $\theta$ and the simple fact that $\alpha\hat\theta_{r_1}+(1-\alpha)\hat\theta_{r_2}\in\tilde\Theta(r_\alpha)$ yield

$$\alpha\,\delta_{r_1}^2(D)+(1-\alpha)\,\delta_{r_2}^2(D)=\alpha\,\delta^2(D\mid\hat\theta_{r_1})+(1-\alpha)\,\delta^2(D\mid\hat\theta_{r_2})\ge\delta^2\big(D\mid\alpha\hat\theta_{r_1}+(1-\alpha)\hat\theta_{r_2}\big)\ge\delta^2(D\mid\hat\theta_{r_\alpha})=\delta_{r_\alpha}^2(D),$$

which proves that $\delta_r^2(D)$ is convex in $r$. The convexity of $\delta_r(D)$ in $r$ can be shown analogously. The functions $o^2$ and $o$, as pointwise maxima of systems of convex functions, are also convex.

(b) For any design $D$ of size $n$, the function $\delta^2(D\mid\cdot)$ is nonnegative and quadratic on $\mathbb{R}^{2m}$; therefore, its minimum is attained at some $\theta_D\in\mathbb{R}^{2m}$. There is only a finite number of exact designs of size $n$, and $\tilde\Theta(r)\nearrow\mathbb{R}^{2m}$ as $r\to\infty$, which means that there exists $r^*<\infty$ such that $\theta_D\in\tilde\Theta(r^*)$ for all designs $D$ of size $n$. Let $r\ge r^*$. We have

$$o(\infty)=\max_{D\in\mathcal{D}}\,\min_{\theta\in\mathbb{R}^{2m}}\delta(D\mid\theta)=\max_{D\in\mathcal{D}}\,\min_{\theta\in\tilde\Theta(r)}\delta(D\mid\theta)=\max_{D\in\mathcal{D}}\,\delta_r(D)=o(r),$$

proving (i). Let $D(\infty)$ be any $\delta_\infty$-optimal $n$-trial design. The equality (i) and the fact that $\delta_r(D(\infty))$ and $o(r)$ are nonincreasing with respect to $r$ give

$$\delta_r(D(\infty))\ge\delta_\infty(D(\infty))=o(\infty)=o(r^*)\ge o(r),$$

which, together with the trivial bound $\delta_r(D(\infty))\le o(r)$, proves (ii).

The second part of Proposition 2.2 implies the existence of a finite interval $[0,r^*]$ of relevant set parameters; increasing the set parameter beyond $r^*$ leaves the optimal designs as well as the optimal value of the $\delta$-criterion unchanged. We will call any such $r^*$ a set upper bound.

Algorithm 1 provides a simple iterative method of computing $r^*$. Our experience shows that it usually requires only a small number of recomputations of the $\delta_r$-optimal design, even if $r_{\mathrm{ini}}$ is small and $q$ is close to 1, resulting in a good set upper bound $r^*$ (see the metacode of Algorithm 1 for details). Due to the high speed and stability of the computation of the values of $\delta_r$ for candidate designs, it is possible to use an adaptation of the standard KL exchange heuristic to compute the input value $o(\infty)$, as well as to obtain the $\delta_r$-optimal designs in Steps 2 and 9 of the algorithm itself.

Algorithm 1. A simple algorithm for computing a set upper bound.

[The metacode of Algorithm 1 is rendered as an image in the original article and is not reproduced here.]
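In its place, the following Python sketch gives one plausible reconstruction from the surrounding description (inputs $r_{\mathrm{ini}}$, $q$, and $o(\infty)$; recomputation of a $\delta_r$-optimal design at each trial value of $r$). The secant-type extrapolation is our own assumption, justified by the convexity and monotonicity of $o(\cdot)$ from Proposition 2.2; this is emphatically not the authors' exact metacode.

```python
import numpy as np

def set_upper_bound(o, o_inf, r_ini=0.3, q=1.0 + 1e-6, tol=1e-9):
    """o(r): optimal delta_r value, obtained by recomputing a delta_r-optimal
    design (Steps 2 and 9). Because o is convex and nonincreasing, the secant
    through the last two points, extrapolated down to the level o_inf, never
    overshoots r*; multiplying by q > 1 guarantees progress."""
    r_prev, v_prev = 0.0, o(0.0)           # anchor the first secant at r = 0
    r, v = r_ini, o(r_ini)
    while v > o_inf + tol:                 # o(r) still above the limiting value
        if v < v_prev:
            r_next = r + (o_inf - v) * (r - r_prev) / (v - v_prev)
        else:
            r_next = q * r                 # fallback if o is locally flat
        r_prev, v_prev = r, v
        r = max(q * r, r_next)
        v = o(r)
    return r

# Hypothetical usage with the sketches above (Example 1 continued):
# o = lambda r: exchange(X_grid, 6, r)[1]
# r_star = set_upper_bound(o, o_inf=np.sqrt(0.02614), r_ini=0.3)
```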

2.4. Example 1 continued

Consider the models from the motivating example. Let $X=\{1.00,1.01,\ldots,2.00\}$, $\tilde\theta_0=e$, and $\tilde\theta_1=1$. Note that these nominal values satisfy $\eta_0(\tilde\theta_0,1)=\eta_1(\tilde\theta_1,1)$. Moreover, let us set $\tilde\Theta_0(1)=[e-1,e+1]$ and $\tilde\Theta_1(1)=[0,2]$, and let the required size of the experiment be $n=6$. First, we computed the value $o^2(\infty)\approx 0.02614$. Next, we used Algorithm 1 with $r_{\mathrm{ini}}=0.3$ and $q=1+10^{-6}$, which returned a set upper bound $r^*\approx 0.6787$ after as few as seven computations of $\delta_r$-optimal designs. Informed by $r^*$, we computed $\delta_r$-optimal designs for $r=0.01,0.1,0.2,\ldots,0.7$. The resulting $\delta_r$-optimal designs are displayed in Figure 3. Note that if the sets $\tilde\Theta(r)$ are very narrow, the $\delta_r$-optimal design is concentrated at the design point $x=2$, effectively maximizing the difference between $\eta_0(\tilde\theta_0,x)$ and $\eta_1(\tilde\theta_1,x)$. For larger values of $r$, the $\delta_r$-optimal design has a 2-point and ultimately a 3-point support.

Figure 3. $\delta_r$-optimal designs of size $n=6$ for different values of $r$; see the second part of the motivating example. The horizontal axis corresponds to the design space, and the vertical axis corresponds to different spans $r$ of the flexible nominal sets. For each $r$, the figure displays the number of repeated observations at the design points of the $\delta_r$-optimal design.

For some pairs of competing models there exists a set upper bound $\bar r$ beyond which the values of $\delta_r$ are constantly 0 for all designs. These cases can be identified by solving a linear programming (LP) problem, as we show next.

Proposition 2.3

Let $\bar D$ be the design that performs exactly one trial in each point of $X$. Consider the following LP problem with variables $r\in\mathbb{R}$, $\theta_0\in\mathbb{R}^m$, $\theta_1\in\mathbb{R}^m$:

$$\min\,r\quad\text{s.t.}\quad F_0(\bar D)\,\theta_0+a_0(\bar D)=F_1(\bar D)\,\theta_1+a_1(\bar D),\quad \theta_0\in\tilde\Theta_0(r),\ \theta_1\in\tilde\Theta_1(r),\ r\ge 0. \tag{14}$$

Assume that (14) has some solution, and denote one solution of (14) by $(\bar r,\theta_a^T,\theta_b^T)^T$. Then $\bar r$ is a finite set upper bound. Moreover, $o(r)=0$ for all $r\in[\bar r,\infty]$.

Proof. From expression (7) we see that for any design $D$ and its nonreplication version $D_{\mathrm{nr}}$, the equality $\delta_r(D_{\mathrm{nr}})=0$ implies $\delta_r(D)=0$. Moreover, if $D_2\supseteq D_1$ in the sense that $D_2$ is an augmentation of $D_1$, then $\delta_r(D_2)=0$ implies $\delta_r(D_1)=0$. Now let $(\bar r,\theta_a^T,\theta_b^T)^T$ be a solution of (14), let $r\ge\bar r$, and let $D$ be any design. The definition of $\delta_r$ and the form of (14) imply $\delta_r(\bar D)=0$. From $\bar D\supseteq D_{\mathrm{nr}}$ we then see that $\delta_r(D_{\mathrm{nr}})=0$, hence $\delta_r(D)=0$. The proposition follows.

Note that the bound $\bar r$ obtained using Proposition 2.3 does not depend on $n$; that is, it is a set upper bound simultaneously valid for all design sizes. The basic discriminability condition implies that $\bar r>0$.

If the competing models are linear, the vectors $a_0(\bar D)$ and $a_1(\bar D)$ are zero. Therefore, (14) has a feasible solution $(r,0_m^T,0_m^T)^T$ for any $r\ge 0$ such that both $\tilde\Theta_0(r)$ and $\tilde\Theta_1(r)$ cover $0_m$. That is, for the case of linear models there is a finite set upper bound $\bar r$ beyond which the $\delta_r$-values of all designs vanish. However, the same holds for specific nonlinear models, including the ones from Section 3.
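For cuboid flexible nominal sets with half-width vectors, the constraints $\theta_k\in\tilde\Theta_k(r)$ read $|\theta_{kj}-\tilde\theta_{kj}|\le r\,w_{kj}$, so (14) is an ordinary LP. The following sketch solves it with SciPy's linprog; the argument names and array shapes are our own conventions, with th_nom and w stacking the nominal values and half-widths of both models.

```python
import numpy as np
from scipy.optimize import linprog

def set_upper_bound_lp(F0, a0, F1, a1, th_nom, w):
    """Solve LP (14). Shapes: F0, F1 (d, m); a0, a1 (d,); th_nom, w (2m,).
    Returns r-bar, or infinity if (14) is infeasible."""
    d, m = F0.shape
    c = np.zeros(1 + 2 * m)
    c[0] = 1.0                                     # objective: minimize r
    A_eq = np.hstack([np.zeros((d, 1)), F0, -F1])  # F0 th0 - F1 th1 = a1 - a0
    b_eq = a1 - a0
    # |th_j - th_nom_j| <= r * w_j as two banks of linear inequalities:
    I = np.eye(2 * m)
    A_ub = np.vstack([np.hstack([-w.reshape(-1, 1), I]),
                      np.hstack([-w.reshape(-1, 1), -I])])
    b_ub = np.concatenate([th_nom, -th_nom])
    bounds = [(0, None)] + [(None, None)] * (2 * m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0] if res.success else np.inf
```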

Proposition 2.4

Assume that both competing regression models are linear provided that we consider a proper subset of their parameters as known constants. Then (14) has a finite feasible solution; that is, there exists a finite set upper bound $\bar r$ such that $o(r)=0$ for all $r\in[\bar r,\infty]$.

Proof. Without loss of generality, assume that fixing the first $k_0<m$ components of $\theta_0$ converts Model 0 into a linear model. More precisely, let $\theta_{01},\ldots,\theta_{0m}$ denote the components of $\theta_0$ and assume that

$$\eta_0(\theta_0,x)=\sum_{j=k_0+1}^{m}\gamma_j^{(0)}(\theta_{01},\ldots,\theta_{0k_0},x)\,\theta_{0j}$$

for some functions $\gamma_j^{(0)}$, $j=k_0+1,\ldots,m$. Choose $\hat\theta_0$ such that $\hat\theta_{0j}=\tilde\theta_{0j}$ for $j=1,\ldots,k_0$ and $\hat\theta_{0j}=0$ for $j=k_0+1,\ldots,m$. Make an analogous assumption for Model 1 and also define $\hat\theta_1$ analogously. It is then straightforward to verify that for the design $\bar D$ from Proposition 2.3 we have $F_k(\bar D)\,\hat\theta_k+a_k(\bar D)=0_d$, where $d=\#X$, for both $k=0,1$. Therefore, any $(r,\hat\theta_0^T,\hat\theta_1^T)^T$ such that $\hat\theta_0\in\tilde\Theta_0(r)$ and $\hat\theta_1\in\tilde\Theta_1(r)$ is a feasible solution of (14) in Proposition 2.3.

In the following, we numerically demonstrate that the δ design criterion leads to designs which yield a high probability of correct discrimination.

3. AN APPLICATION IN ENZYME KINETICS

This real applied example is taken from Bogacka, Patan, Johnson, Youdim, and Atkinson (2011) and was already used in Atkinson (2012) to illustrate model-discrimination designs. There, two types of enzyme kinetic reactions are considered, where the reaction velocity $y$ is alternatively modeled as

$$y=\frac{\theta_{01}\,x_1}{\theta_{02}\left(1+\dfrac{x_2}{\theta_{03}}\right)+x_1}+\varepsilon \tag{15}$$

and

$$y=\frac{\theta_{11}\,x_1}{(\theta_{12}+x_1)\left(1+\dfrac{x_2}{\theta_{13}}\right)}+\varepsilon, \tag{16}$$

which represent competitive and noncompetitive inhibition, respectively.

which represent competitive and noncompetitive inhibition, respectively. Here x 1 denotes the concentration of the substrate and x 2 the concentration of an inhibitor. The data used in Bogacka et al. (2011) from an initial experiment of 120 observations are on Dextrometorphan–Sertraline and yields the estimates displayed in Table 1, where Gaussian errors were assumed. Assumed parameter spaces were not explicitly given there, but can be inferred from their figures as θ0,1,θ1,1(0,), θ0,2,θ1,2(0,60], and θ0,3,θ1,3(0,30], respectively. Designs for parameter estimation in these models were recently given in Schorning, Dette, Kettelhake, and Möller (2017).

Table 1. Parameter estimates and corresponding standard errors for models (15) and (16), respectively

Model (15)                        Model (16)
θ01: estimate 7.298, SE 0.114     θ11: estimate 8.696, SE 0.222
θ02: estimate 4.386, SE 0.233     θ12: estimate 8.066, SE 0.488
θ03: estimate 2.582, SE 0.145     θ13: estimate 12.057, SE 0.671

In Atkinson (2012), the two models are combined into an encompassing model:

$$y=\frac{\theta_{21}\,x_1}{\theta_{22}\left(1+\dfrac{x_2}{\theta_{23}}\right)+x_1\left(1+\dfrac{(1-\lambda)\,x_2}{\theta_{23}}\right)}+\varepsilon, \tag{17}$$

where $\lambda=1$ corresponds to (15) and $\lambda=0$ to (16), respectively. Following the ideas of Atkinson (1972) as used, for example, in Atkinson (2008) or Perrone, Rappold, and Müller (2017), one can then proceed to find so-called Ds-optimal (i.e., D-optimal for only a subset of parameters) designs for $\lambda$ and employ them for model discrimination. Note that this method, too, is not fully symmetric, as it requires a nominal value of $\lambda$ for the linearization of (17), which induces some kind of weighting.

The nominal values used in Atkinson (2012), obviously motivated by the estimates for (15), were $\theta_{01}=\theta_{11}=\theta_{21}=10$, $\theta_{02}=\theta_{12}=\theta_{22}=4.36$, $\theta_{03}=2.58$, $\theta_{13}=5.16$, and $\theta_{23}=3.096$. However, note that particularly for model (16) the estimates in Table 1 give considerably different values, and nonlinear least squares applied directly to (17) also yields the deviating estimates given in Table 2. The design region used was the rectangle $X=X_1\times X_2=[0,30]\times[0,40]$.

Table 2. Parameter estimates and corresponding standard errors for the encompassing model (17)

θ21: estimate 7.425, SE 0.130
θ22: estimate 4.681, SE 0.272
θ23: estimate 3.058, SE 0.281
λ:   estimate 0.964, SE 0.019

In Table 2 of Atkinson (2012), four approximate optimal designs (we will denote them A1–A4) were presented: the T-optimal designs assuming $\lambda=0$ (A1) and $\lambda=1$ (A4), a compound T-optimal design (A3), and a Ds-optimal design for the encompassing model (A2); for the latter, note that Atkinson assumed $\lambda=0.8$, whereas the estimates suggest a much higher value. We will compare our $\delta$-optimal designs against exact versions of these designs, properly rounded by the method of Pukelsheim and Rieder (1992).

3.1. Confirmatory experiment n=6, normal errors

Let us first assume that we want to complement the knowledge from our initial experiment by another experiment for which, however, only limited resources are available, for example, a sample size of only $n=6$ observations. Note that the aim is not to augment the previous 120 observations but to make a confirmatory decision using the new observations alone. That is, we use the data from the initial experiment only to provide nominal values for the parameter estimates and noise variances for the simulation, respectively. This is a realistic scenario if, for instance, the original data had to be deleted for legal reasons and only summary information is available.

As we are assuming equal variances for the two models, we use the estimate of the error standard deviation $\hat\sigma=0.1526$ from the encompassing model as a base value for the simulation error standard deviation. However, using $\hat\sigma$ itself was not very revealing, as the discriminatory performance was consistently high for all designs. Thus, to accentuate the differences, the actual standard deviation used was $2\times\hat\sigma$ instead (unfortunately, an even higher inflation is not feasible, as it would result in frequent negative observations, leading to faulty ML estimates). We then simulated the data-generating process under each model $N=10000$ times and calculated the total percentages of correct discrimination (hit rates) when using the likelihood ratio as the decision rule.
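A sketch of this hit-rate simulation follows. The 6-point design is a hypothetical placeholder (not one of A1–A4 or δ1–δ3, whose support points are not listed here), and starting the optimizer at the nominal values is a simplification of full maximum likelihood estimation.

```python
import numpy as np
from scipy.optimize import least_squares

def eta0(th, x1, x2):   # competitive inhibition, model (15)
    return th[0] * x1 / (th[1] * (1.0 + x2 / th[2]) + x1)

def eta1(th, x1, x2):   # noncompetitive inhibition, model (16)
    return th[0] * x1 / ((th[1] + x1) * (1.0 + x2 / th[2]))

def rss(eta, y, x1, x2, start):
    lb = np.full(3, 1e-6)   # keep the kinetic parameters positive
    fit = least_squares(lambda t: eta(t, x1, x2) - y, start,
                        bounds=(lb, np.full(3, np.inf)))
    return float(np.sum(fit.fun ** 2))

def hit_rate(design, th0, th1, true_model=0, sd=2 * 0.1526, N=10000, seed=1):
    """Fraction of correct likelihood-ratio decisions (reduce N for a quick run)."""
    rng = np.random.default_rng(seed)
    x1 = np.array([p[0] for p in design], dtype=float)
    x2 = np.array([p[1] for p in design], dtype=float)
    mean = eta0(th0, x1, x2) if true_model == 0 else eta1(th1, x1, x2)
    hits = 0
    for _ in range(N):
        y = mean + rng.normal(0.0, sd, size=mean.shape)
        win = 0 if rss(eta0, y, x1, x2, th0) <= rss(eta1, y, x1, x2, th1) else 1
        hits += int(win == true_model)
    return hits / N

# Truth = model (15) with the Table 1 estimates, at a hypothetical 6-point design:
D = [(30, 0), (30, 40), (3, 0), (5, 10), (30, 0), (3, 20)]
th15 = np.array([7.298, 4.386, 2.582])
th16 = np.array([8.696, 8.066, 12.057])
print(hit_rate(D, th15, th16, true_model=0))
```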

We compare the designs A1–A4 to three specific $\delta$ designs, δ1, δ2, and δ3, which represent a range of different nominal intervals. Specifically, we chose $\tilde\Theta_k=[\tilde\theta_{k1}\pm r\sigma_{k1}]\times[\tilde\theta_{k2}\pm r\sigma_{k2}]\times[\tilde\theta_{k3}\pm r\sigma_{k3}]$, $k=0,1$, where $\tilde\theta_{kj}=\hat\theta_{kj}$ and $\sigma_{kj}=\hat\sigma_{kj}$ for $k=0,1$ and $j=1,2,3$. The tuning parameter $r$ was set to three levels: $r=1$ (close to the lower bound that still provides a regular design), $r=5$, and $r=15$ (sufficiently close to the theoretical upper bound to yield a stable design). To make the latter more precise: the models under consideration are such that if we fix the last two of the three parameters, they become one-parameter linear models. Therefore, using Proposition 2.4, we know that there exists a finite set upper bound $\bar r$. Solving (14) provides the numerical value $\bar r\approx 64.02$. Note that the same bound is valid for all design sizes $n$. Designs A1–A4 and δ1 all contain four support points, while δ2 has six and δ3 has five, respectively. A graphical depiction of the designs is given in Figure 4.

Figure 4. Compared designs: first row A1–A4, second row δ1–δ3.

Robustness study: As we would like to avoid comparing the designs only when the data are generated from the nominal values (although this favors all designs equally), we perturbed the data-generating process by drawing the parameters from uniform distributions on $\theta\pm c\times\sigma_\theta$, where $c$ acts as a perturbation parameter. Under these settings all designs fare quite well, as can be seen from Table 3. However, A4 and δ2 seem to outperform the other competing designs, usually by narrow margins, except perhaps for A1, which consistently does worst. Note that in a real situation the true competitors of the $\delta$-optimal designs are just A2 and A3, as it is unknown beforehand which model is true.

Table 3. Total hit rates (in %) for N=10000 simulations under each true model and perturbation level c

            c = 0            c = 1            c = 5
True model  η0      η1       η0      η1       η0      η1
A1          91.11   94.45    91.35   93.95    90.44   93.24
A2          97.11   96.75    97.47   96.64    96.74   96.27
A3          96.60   96.51    96.47   96.40    95.69   96.06
A4          97.94   96.57    97.73   96.29    97.62   96.07
δ1          97.59   95.11    97.43   94.90    97.71   94.56
δ2          97.93   97.03    97.77   96.67    97.20   96.54
δ3          96.50   95.29    96.42   95.36    96.19   95.64

3.2. A second large‐scale experiment n=60, log‐normal errors

As the discriminatory power of all the designs for $n=60$ is nearly perfect, we are required to inflate the error variance. However, using additive normal errors in the data-generating process and inflating the variance by a large enough factor would generate a large number of negative observations, which renders likelihood estimation invalid. The data-generating process was therefore adapted to use multiplicative log-normal errors. The observations were then rescaled to match the means of the original process. This way we are at liberty to inflate the error variance by any factor without producing faulty observations. Note that now the data-generating process does not fully match the assumptions under which the designs were generated, but this can be considered an extended robustness study, as it affects all compared designs equally. We could of course also have calculated the designs under the new data-generating process, but as the fit of the model to the original data is not greatly improved, and models (15) and (16) seem firmly established in the pharmacological literature, we refrained from doing so.

Perturbation of the parameters did not exhibit a discernible effect here, while the error inflation still does. For brevity, we report only the results for an error standard deviation of $5\times\hat\sigma$ (and $c=0$). The respective designs δ1–δ3 were qualitatively similar to those given in Figure 4, albeit with more diverse weights. In this simulation we generated 100 instances of $n=60$ observations from these designs, and repeated this a thousand times.

The corresponding boxplots of the correct classification rates are given in Figure 5. In this setting A4 seems slightly superior even under η1 (recall that it is the T-optimal design assuming η0 to be true), while δ1 and δ2 come close (and beat the true competitors A2 and A3), with A1 again being clearly the worst.

Figure 5. Boxplots of the total correct classification rates for all designs using the nominal values and an error standard deviation of $5\times\hat\sigma$; white under η0, grey under η1.

4. CONCLUSIONS AND DIRECTIONS FOR FURTHER RESEARCH

We have presented a novel design criterion for symmetric model discrimination. Its main advantage is that the design computations, unlike those for T-optimality, can be undertaken with efficient routines for quadratic optimization, which in general speeds up the computations by an order of magnitude. An optimal exact design problem is a problem of discrete optimization, and the efficiency of its solution critically depends on the speed of evaluation of the design criterion. By a series of approximations, we substituted the theoretically ideal but numerically infeasible computation of the probability of correct discrimination with a simple convex optimization, which can be solved rapidly and reliably. Combined with the proposed methodology of flexible nominal sets, we can construct an entire sequence of exact experimental designs efficient for discrimination between models. It was also shown in an example that the resulting designs are competitive in their actual discriminatory abilities.

The notion of flexible nominal sets may have independent merit. Note again the distinction between parameter spaces and flexible nominal sets (and thus the principal distinction from “rigid” minimax approaches). Parameter spaces usually encompass all theoretically possible values of the parameters, while flexible nominal sets can contain the unknown parameters with very high likelihood and still be significantly smaller than the original parameter spaces. In this paper, we do not completely specify the process of constructing the flexible nominal sets, but if we perform a two-stage experiment with a second, discriminatory phase, their potential specification through confidence intervals is an important problem.

As the approach suggested offers a fundamentally new way of constructing discriminatory designs, many properties are yet unexplored. A nonexhaustive list of questions follows.

Sequential procedure. The proposed method lends itself naturally to a two-stage procedure, where parameter estimates and confidence intervals from the first stage are employed as nominal values and flexible nominal sets in the second stage. Even a sequential generation of design points can be straightforwardly implemented.

Approximate designs. Proposition 2.1 is a possible gateway for the development of the standard approximate design theory for $\delta$-optimality, because the criterion $\delta^2_{\mathrm{app}}$ is concave on the set of all approximate designs. Therefore, it is possible to work out a minimax-type equivalence theorem for $\delta$-optimal approximate designs, and to use specific convex optimization methods to find $\delta$-optimal approximate designs numerically. For instance, it would be possible to employ methods analogous to Burclová and Pázman (2016) or Yue, Vandenberghe, and Wong (2018).

Utilization of the δ‐optimal designs for related criteria. For a design D=(x1,,xn), a natural criterion closely related to δr‐optimality can be defined as

$$\tilde\delta_r(D)=\inf_{\theta_0\in\tilde\Theta_0(r),\,\theta_1\in\tilde\Theta_1(r)}\tilde\delta(D\mid\theta_0,\theta_1),\quad\text{where}\quad \tilde\delta(D\mid\theta_0,\theta_1)=\big\|(\eta_0(\theta_0,x_i))_{i=1}^n-(\eta_1(\theta_1,x_i))_{i=1}^n\big\|.$$

The criterion $\tilde\delta_r$ requires a multivariate nonconvex optimization for its evaluation at each design $D$, which entails possible numerical difficulties and a long time to compute an optimal design. However, the $\delta_r$-optimal design, which can be computed rapidly and reliably, can serve as an efficient initial design for the optimization of $\tilde\delta_r$. Note that if $\tilde\Theta_0$ is a singleton containing only the nominal parameter value for Model 0, the $\delta_r$-optimal designs could potentially be used as efficient initial designs for computing the exact version of the criterion of T-optimality.

Selection of the best design from a finite set of possible candidates. Like most proposals for the construction of optimal experimental designs, the method depends on the choice of some tuning parameters, or even on entire prior distributions (in the Bayesian approach), which always results in a set of possible designs. It would be interesting to develop a comprehensive Monte-Carlo methodology for the choice of the best design out of this preselected small set of candidate designs. A useful generalization of the decision rule would take into account possibly unequal losses for wrong classification.

Noncuboid sets. The methodology could certainly be extended to other types of flexible nominal sets, particularly when we are interested in functional relations among the parameters. However, the particularly efficient box-constrained quadratic programming algorithm could then no longer be utilized.

Higher-order approximations. As a referee remarked, it is possible to employ tighter approximations of the sets of mean values of responses than the one we suggest. For instance, it would be possible to use the local curvature of the mean-value function. However, this may also lead to a loss of the numerical efficiency of the method.

More than two rival models. Another referee remark leads us to point out the natural extension of investigating a weighted sum, or the minimum, of $\delta$ over all paired comparisons. The implications of these suggestions, however, require deeper investigation.

Different error variances. Yet another referee requested a clarification of how to proceed in the case of unequal error variances for the two models. If the functional forms of these variances are known, simple standardizations of the models will suffice. All other cases, including dependencies among the errors, will require more elaborate strategies.

Combination with other criteria. The proposed method can produce designs that are poor or even singular for estimating the model parameters. Because of this problem, already mentioned in Atkinson and Fedorov (1975), Atkinson (2008) used a compound criterion called DT-optimality. The same approach is possible for $\delta$-optimality. However, our numerical experience suggests that for a large enough size of the flexible nominal sets, the $\delta$-optimal designs tend to be supported on a set that is large enough for the estimability of the parameters, without any combination with an auxiliary criterion. A detailed analysis goes beyond the scope of this paper.

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest.

Open Research Badges

This article has earned an Open Data badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section.

This article has also earned the open data badge “Reproducible Research” for making publicly available the code necessary to reproduce the reported results. The results reported in this article could be fully reproduced.

Supporting information

SUPPORTING INFORMATION

ACKNOWLEDGMENTS

We are very grateful to Stefanie Biedermann from the University of Southampton for intensive discussions on earlier versions of the paper. We also thank Stephen Duffull from the University of Otago for sharing his code and Barbara Bogacka for sharing the data. We further thank various participants of the design workshop in Banff, August 2017, and Valerii Fedorov for many helpful comments. Sincerest gratitude goes to the associate editor and the referees of the paper, whose remarks led to considerable improvements. The work of RH was supported by the VEGA 1/0341/19 grant of the Slovak Scientific Grant Agency; the research of WGM was partially supported by project grant LIT-2017-4-SEE-001 funded by the Upper Austrian Government, and by the Austrian Science Fund (FWF): I 3903-N32.

Harman R, Müller WG. A design criterion for symmetric model discrimination based on flexible nominal sets. Biometrical Journal. 2020;62:1090–1104. 10.1002/bimj.201900074

REFERENCES

1. Atkinson, A. C. (1972). Planning experiments to detect inadequate regression models. Biometrika, 59(2), 275–293.
2. Atkinson, A. C. (2008). DT-optimum designs for model discrimination and parameter estimation. Journal of Statistical Planning and Inference, 138(1), 56–64.
3. Atkinson, A. C. (2012). Optimum experimental designs for choosing between competitive and non competitive models of enzyme inhibition. Communications in Statistics—Theory and Methods, 41(13–14), 2283–2296.
4. Atkinson, A. C., Donev, A., & Tobias, R. (2007). Optimum experimental designs, with SAS. Oxford Statistical Science Series. Oxford: Oxford University Press.
5. Atkinson, A. C., & Fedorov, V. V. (1975). The design of experiments for discriminating between two rival models. Biometrika, 62(1), 57–70.
6. Bogacka, B., Patan, M., Johnson, P. J., Youdim, K., & Atkinson, A. C. (2011). Optimum design of experiments for enzyme inhibition kinetic models. Journal of Biopharmaceutical Statistics, 21(3), 555–572.
7. Burclová, K., & Pázman, A. (2016). Optimal design of experiments via linear programming. Statistical Papers, 57(4), 893–910.
8. Buzzi-Ferraris, G., & Forzatti, P. (1983). A new sequential experimental design procedure for discriminating among rival models. Chemical Engineering Science, 38(2), 225–232.
9. Cox, D. R. (1961). Tests of separate families of hypotheses. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability: Contributions to the Theory of Statistics (Vol. 1, pp. 105–123). Berkeley, CA: University of California Press.
10. Cox, D. R. (2013). A return to an old paper: “Tests of separate families of hypotheses.” Journal of the Royal Statistical Society, Series B, 75(2), 207–215.
11. Dette, H., Melas, V. B., & Shpilev, P. (2013). Robust T-optimal discriminating designs. Annals of Statistics, 41(4), 1693–1715.
12. Dette, H., & Titoff, S. (2009). Optimal discrimination designs. Annals of Statistics, 37(4), 2056–2082.
13. Fedorov, V. V., & Khabarov, V. (1986). Duality of optimal designs for model discrimination and parameter estimation. Biometrika, 73(1), 183–190.
14. Fedorov, V. V., & Pázman, A. (1968). Design of physical experiments (statistical methods). Fortschritte der Physik, 16, 325–355.
15. Felsenstein, K. (1992). Optimal Bayesian design for discrimination among rival models. Computational Statistics & Data Analysis, 14(4), 427–436.
16. Hainy, M., Price, D. J., Restif, O., & Drovandi, C. (2018). Optimal Bayesian design for model discrimination via classification. Retrieved from arXiv:1809.05301.
17. Hill, P. D. H. (1978). A review of experimental design procedures for regression model discrimination. Technometrics, 20(1), 15–21.
18. Mullen, K. M. (2013). R-package bvls: The Stark–Parker algorithm for bounded-variable least squares. CRAN.
19. Müller, W. G., & Ponce De Leon, A. C. M. (1996). Discrimination between two binary data models: Sequentially designed experiments. Journal of Statistical Computation and Simulation, 55(1–2), 87–100.
20. Perrone, E., Rappold, A., & Müller, W. G. (2017). Ds-optimality in copula models. Statistical Methods & Applications, 26(3), 403–418.
21. Pesaran, M. H., & Weeks, M. (2007). Nonnested hypothesis testing: An overview. In B. H. Baltagi (Ed.), A companion to theoretical econometrics (pp. 279–309). Hoboken, NJ: Wiley.
22. Pronzato, L., & Pázman, A. (2014). Design of experiments in nonlinear models: Asymptotic normality, optimality criteria and small-sample properties. Lecture Notes in Statistics. Berlin: Springer.
23. Pukelsheim, F., & Rieder, S. (1992). Efficient rounding of approximate designs. Biometrika, 79(4), 763–770.
24. Schorning, K., Dette, H., Kettelhake, K., & Möller, T. (2017). Optimal designs for enzyme inhibition kinetic models. Statistics: A Journal of Theoretical and Applied Statistics, 52(12), 1–20.
25. Schwaab, M., Silva, F. M., Queipo, C. A., Barreto, A. G., Nele, M., & Pinto, J. C. (2006). A new approach for sequential experimental design for model discrimination. Chemical Engineering Science, 61(17), 5791–5806.
26. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
27. Sidak, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633.
28. Stark, P. B., & Parker, R. L. (1995). Bounded-variable least-squares: An algorithm and applications. Computational Statistics, 10(2), 129–141.
29. Stigler, S. M. (1971). Optimal experimental design for polynomial regression. Journal of the American Statistical Association, 66(334), 311–318.
30. Tommasi, C., & López-Fidalgo, J. (2010). Bayesian optimum designs for discriminating between models with any distribution. Computational Statistics & Data Analysis, 54(1), 143–150.
31. Tommasi, C., Martín-Martín, R., & López-Fidalgo, J. (2016). Max–min optimal discriminating designs for several statistical models. Statistics and Computing, 26(6), 1163–1172.
32. Vajjah, P., & Duffull, S. B. (2012). A generalisation of T-optimality for discriminating between competing models with an application to pharmacokinetic studies. Pharmaceutical Statistics, 11(6), 503–510.
33. Yue, Y., Vandenberghe, L., & Wong, W. K. (2018). T-optimal designs for multi-factor polynomial regression models via a semidefinite relaxation method. Statistics and Computing, 29(4), 725–738.


