Summary
We consider the problem of designing experiments for estimating a target parameter in regression analysis when there is uncertainty about the parametric form of the regression function. A new optimality criterion is proposed that chooses the experimental design to minimize the asymptotic mean squared error of the frequentist model averaging estimate. Necessary conditions for the optimal solution of a locally and Bayesian optimal design problem are established. The results are illustrated in several examples, and it is demonstrated that Bayesian optimal designs can yield a reduction of the mean squared error of the model averaging estimator by up to 45%.
Keywords: Bayesian optimal design, Local misspecification, Model averaging, Model selection, Model uncertainty, Optimal design
1. Introduction
It is well known that a carefully designed experiment can substantially improve statistical inference in regression analysis. Optimal design of experiments becomes more efficient as more knowledge about the underlying regression model becomes available, and an impressive theory has been developed to construct optimal designs under the assumption of a given regression model; see, for example, Pukelsheim (2006), Atkinson et al. (2007) and Fedorov & Leonov (2013). On the other hand, model selection is an important step in any data analysis, and these references also partially discuss the problem of constructing efficient designs to address model uncertainty in the design of experiments. Because of its importance this problem has a long history. Early work dates back to Atkinson & Fedorov (1975), who determined optimal designs for model discrimination by maximizing the power of a test between competing regression models; see also Ucinski & Bogacka (2005), López-Fidalgo et al. (2007), Dette & Titoff (2009), Wiens (2009) or Tommasi & López-Fidalgo (2010) for some more recent references. A different line of research in this context was initiated by Läuter (1974), who proposed a criterion based on a product of the determinants of the information matrices in the various models under consideration, which yields efficient designs for all of these models. This criterion has been used successfully by Dette (1990) to determine efficient designs for a class of polynomial regression models, and by Biedermann et al. (2006) to construct efficient designs for binary response models, when there is uncertainty about the form of the link function. As these criteria do not reflect model discrimination, Zen & Tsai (2002), Atkinson (2008) and Tommasi (2009) considered a mixture of Läuter-type and discrimination criteria to construct efficient designs for model discrimination and parameter estimation.
An alternative approach to constructing designs that are robust with respect to misspecified models is to minimize, with respect to the design under consideration, the maximal mean squared error calculated over a class of misspecified models; see Wiens (2015) for an overview. Several authors have worked on this problem, for example Wiens & Xu (2008), who derived robust prediction and extrapolation designs, or Konstantinou et al. (2017), who analysed robust designs under local alternatives for survival trials. This list of references is by no means complete. A common feature of most of this literature is that either the designs are constructed under the assumption, at least implicitly, that model selection is performed by hypothesis testing, or designs with good properties for a class of competing models are determined.
On the other hand there is an enormous amount of literature on performing statistical inference under model uncertainty, which to the best of our knowledge has not been discussed in the context of optimal experimental design. One possibility is to select an adequate model from a set of candidate models, and numerous model selection criteria have been developed for this purpose (Burnham & Anderson, 2002; Claeskens & Hjort, 2008; Konishi & Kitagawa, 2008). These procedures are widely used and have the advantage of delivering a single model for the statistical analysis, which makes them very attractive for practitioners. However, there exists a well-known post-selection problem in this approach because estimators chosen after model selection usually behave like mixtures of many potential estimators. For example, if
is a parameter of interest in a regression model, such as a prediction at a particular point, the area under the curve, or a specific quantile of the regression model, it is known that selecting a single model and ignoring the uncertainty resulting from the selection process may give confidence intervals for
with coverage probability smaller than the nominal value; see, for example, Claeskens & Hjort (2008, Ch. 7) for a mathematical treatment or Bornkamp (2015) for a high-level discussion of this phenomenon.
As an alternative, several authors proposed smoothing estimators for the parameter
across several models, rather than choosing a specific model from the class under consideration and performing the estimation in the selected model. This approach takes the additional estimator variability caused by model uncertainty adequately into account and has been discussed intensively in the Bayesian community, where it is known as Bayesian model averaging; see Hoeting et al. (1999), among many others. Hjort & Claeskens (2003) pointed out several problems with this approach. In particular, they mentioned the difficulties in specifying the prior probabilities for a class of models and the problem of mixing together many conflicting prior opinions in the statistical analysis. They proposed an alternative non-Bayesian approach, which they call frequentist model averaging, and developed asymptotic theory for their method. Evidence exists that model averaging improves the estimation efficiency (Breiman, 1996; Raftery & Zheng, 2003), and Schorning et al. (2016) recently demonstrated the superiority of model averaging over estimation after model selection by an information criterion in the context of dose response models. These results have recently been confirmed by Aoki et al. (2017) and Buatois et al. (2018) in the context of nonlinear mixed effect models.
The present paper is devoted to the construction of optimal designs if parameters of interest are estimated under model uncertainty via frequentist model averaging. An optimal design for model averaging estimation minimizes the asymptotic mean squared error of the model averaging estimator under local alternatives, and we show that model averaging estimators can be improved if the experiments are well designed.
2. Model averaging under local misspecification
Model averaging is a common technique for estimating a parameter of interest, say
, under model uncertainty. Using this technique, an estimate for a parameter of interest is a weighted average of the estimates of this parameter in the competing models under consideration, where different choices for the weights have been proposed in the literature; see, for example, Wassermann (2000) or Hansen (2007) for Bayesian and non-Bayesian model averaging methods. In this section we briefly describe this concept and the corresponding asymptotic theory in the present context, so that the results can be used to construct optimal designs for model averaging estimation. The results follow largely from the statements in Hjort & Claeskens (2003) and Claeskens & Hjort (2008); details regarding their derivation are omitted for the sake of brevity.
We assume that
different experimental conditions, say
, are chosen in the design space
, and that at each experimental condition
one can observe
responses, say
,
. We also assume that for each
the responses
at experimental condition
are realizations of independent and identically distributed, real-valued, random variables
with unknown density
. Therefore the total sample size is
and the experimental design problem consists of the choice of the number
of different experimental conditions, the experimental conditions
themselves and the number
of observations taken at each
,
, such that the model averaging estimate is most efficient.
To measure efficiency and to compare different experimental designs we will use asymptotic arguments and consider the case
for
. As is common in optimal design theory we collect this information in the matrix
![]() |
(1) |
Following Claeskens & Hjort (2008) we assume that
is contained in a set, say
, of parametric candidate densities which is constructed as follows. The first candidate density in
is a parametric density
, where the form of
is assumed to be known,
and
denote the unknown parameters, which vary in a compact parameter space, say
. The second candidate density is the parametric density
, which is obtained by fixing the parameter value
to a prespecified known value
. Throughout, we will call
the wide density and
the narrow density, respectively. We use the analogous expressions wide model and narrow model for the corresponding candidate models. Additional candidate models are obtained by choosing certain submodels between the wide density
and the narrow density
. More precisely, for a chosen subset
of indices with cardinality
, we introduce the projection matrices
and
which map a
-dimensional vector to its components corresponding to indices in
and
, respectively. Using the abbreviations
and
, we define the candidate density
by
![]() |
(2) |
Consequently, for the density
the components of
with indices in
are fixed to the corresponding components of
, while the components with indices in
are considered as unknown parameters. Note that
,
and that in the most general case there are
possible candidate densities. As we might not be interested in all possible submodels, we assume that the competing models are defined by different sets
for some
. Thus the class
of candidate models is given by
![]() |
(3) |
Following Hjort & Claeskens (2003), we consider local deviations throughout and assume that the true density is given by
![]() |
(4) |
where the true parameter values are
and
. The true density is given by the wide density with a varying value of
, which differs from
through the perturbation term
. Thus, for
tending to infinity, it approximates the narrow density
.
Example 1.
Consider the case where
is a normal density with variance
and mean function
(5) where
,
and the explanatory variable
varies in an interval, say
. This model is the sigmoid Emax model, and has numerous applications in modelling the dependence of biochemical or pharmacological responses on concentration (Goutelle et al., 2008). The sigmoid Emax model is especially popular for describing dose-response relationships in drug development (MacDougall, 2006). The parameters in (5) have a concrete interpretation:
is used to model a placebo effect,
denotes the maximum effect of
relative to the placebo, and
is the value of
which produces half of the maximum effect. The so-called Hill parameter
characterizes the slope of the mean function
. The parameter
is included in every candidate model, whereas for the narrow model the components are fixed as
. Consequently, the narrow candidate model is a normal density with mean
(6) and variance
. In this case,
is the Michaelis–Menten function, which is widely utilized to represent an enzyme kinetics reaction, where enzymes bind substrates and turn them into products; see, for example, Cornish-Bowden (2012). The two other candidate models are obtained by either fixing
or
, and the corresponding densities are normal densities with mean functions
(7) respectively. The latter model is the Emax model, sometimes also referred to as the hyperbolic Emax model (Holford & Sheiner, 1981; MacDougall, 2006). Finally, under the local misspecification assumption (4) the true density
corresponds to a normal distribution with mean
and variance
. Typical functionals
of interest are the area under the curve
(8) calculated for a given region
or, for a given
, the quantile defined by
(9) The value defined in (9) is well known as
, that is, the effective dose at which
of the maximum effect is achieved; see MacDougall (2006) or Bretz et al. (2008).
We are interested in the estimation of a quantity, say
, where
is a differentiable function of the parameter
. For this purpose we fix one model
in the set of candidate models defined in (3) and use the estimator
, where
is the maximum-likelihood estimator in model
. Under the assumption (4) of a local misspecification and the common conditions of regularity (Lehmann & Casella, 1998, Ch. 6), it can be shown by adapting the arguments in Hjort & Claeskens (2003) and Claeskens & Hjort (2008) to the current situation that the resulting estimator
satisfies
![]() |
(10) |
Here,
denotes weak convergence and
is a normal distribution with variance
![]() |
where
is the gradient of
with respect to
, that is,
![]() |
(11) |
and
the information matrix in the candidate model
, that is,
![]() |
(12) |
The mean
in (10) is of the form
![]() |
where
![]() |
is the gradient in the wide model with respect to the parameters, the matrix
is defined by
![]() |
(13) |
the matrices
and
are given by (12) and
![]() |
(14) |
respectively, and
denotes the information matrix in the wide model
.
The frequentist model averaging estimator is now defined by assigning weights
, with
, to the different candidate models
and defining
![]() |
(15) |
where
are the estimators in the different candidate models
. The asymptotic behaviour of the model averaging estimator
can be derived from Hjort & Claeskens (2003). In particular, it can be shown that under assumption (4) and the standard regularity conditions a standardized version of
is asymptotically normally distributed, that is,
![]() |
(16) |
Here, the mean and variance are
![]() |
(17) |
![]() |
(18) |
respectively,
is the information matrix of the wide model
, and the vector
is given by
![]() |
(19) |
where we used the notation of
,
and
introduced in (11), (12) and (14). The optimal design criterion for model averaging proposed here is based on an asymptotic representation of the mean squared error of the estimate
derived from (16), and will be carefully defined in the following section.
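As a minimal illustration of the fixed-weight model averaging estimator in (15), the following Python sketch forms the weighted average of the estimates obtained in the candidate models. The function name `model_average` and all numerical values are hypothetical and not taken from the paper; the only property used is that the weights sum to one.

```python
import math


def model_average(estimates, weights):
    """Weighted average of the per-model estimates of the target parameter."""
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per candidate model")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to one")
    return sum(g * m for g, m in zip(weights, estimates))


# Hypothetical estimates of the target in four candidate models.
estimates = [1.9, 2.1, 2.4, 2.0]
equal_weights = [0.25] * 4
print(model_average(estimates, equal_weights))
```

With fixed weights the estimator is a linear combination of the individual estimators, which is what makes its asymptotic distribution tractable.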
3. An optimality criterion for model averaging estimation
Following Kiefer (1974), we call the quantity
in (1) an approximate design on the design space
. This means that the support points
define the distinct dose levels where observations are to be taken, and the weights
represent the relative proportion of responses at the corresponding support point
,
. For an approximate design
and given total sample size
, a rounding procedure is applied to obtain integers
,
, from the not necessarily integer-valued quantities
(Pukelsheim, 2006, Ch. 12), which define the number of observations at
,
.
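The rounding step can be sketched as follows. This is one common choice, the efficient rounding procedure, written under the assumption that all design weights are strictly positive; the text only states that a rounding procedure from Pukelsheim (2006, Ch. 12) is applied, so whether exactly this variant was used is an assumption. The function name `efficient_rounding` is hypothetical.

```python
import math


def efficient_rounding(weights, n):
    """Turn approximate design weights into integer group sizes that sum to n."""
    k = len(weights)
    if any(w <= 0 for w in weights):
        raise ValueError("all design weights must be strictly positive")
    if n < k:
        raise ValueError("need at least one observation per support point")
    # Initial apportionment: round (n - k/2) * w_i up to the next integer.
    counts = [math.ceil((n - k / 2) * w) for w in weights]
    # Add observations where the ratio count/weight is smallest.
    while sum(counts) < n:
        j = min(range(k), key=lambda i: counts[i] / weights[i])
        counts[j] += 1
    # Remove observations where the ratio (count - 1)/weight is largest.
    while sum(counts) > n:
        j = max(range(k), key=lambda i: (counts[i] - 1) / weights[i])
        counts[j] -= 1
    return counts


# Hypothetical three-point design with a total of ten observations.
print(efficient_rounding([0.5, 0.3, 0.2], 10))
```

The initial apportionment guarantees at least one observation at every support point, so no support point of the approximate design is lost in the implemented design.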
If the observations are taken according to an approximate design
and an appropriate rounding procedure is used such that
as
, the asymptotic mean squared error of the model averaging estimate
of the parameter of interest
can be obtained from the discussion in § 2, that is
![]() |
(20) |
where the variance
and the bias
are defined in equations (18) and (17), respectively. Consequently, a good design for the model averaging estimate (15) should give small values of
. Therefore, for a given finite set
of candidate models
of the form (2) and weights
, a design
is called a locally optimal design for model averaging estimation of the parameter
if it minimizes the function
in (20) in the class of all approximate designs on
.
Locally model averaging optimal designs address uncertainty only with respect to the model
, but require prior information for the parameters
and
. While such knowledge might be available in some circumstances (see, e.g., Dette et al., 2008; Bretz et al., 2010), sophisticated design strategies have been proposed in the literature that require less precise knowledge about the model parameters, such as sequential, Bayesian or standardized maximin optimality criteria (Pronzato & Walter, 1985; Chaloner & Verdinelli, 1995; Dette, 1997). Any of these methodologies can be used to construct efficient robust designs for model averaging, and for the sake of brevity we restrict ourselves to Bayesian optimality criteria.
Here we address the uncertainty with respect to the unknown model parameters by a prior distribution, say
, on
; we call a design
Bayesian optimal for model averaging estimation of the parameter
with respect to the prior
if it minimizes the function
![]() |
(21) |
where the function
is defined in (20). We assume throughout that the integral exists. For the sake of brevity we restrict ourselves to a prior distribution for the parameters
and
. In applications one could also use a prior distribution for
to address uncertainty about this parameter.
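The relation between the local criterion in (20) and the Bayesian criterion in (21) can be sketched numerically. The Python code below is a minimal illustration under several assumptions: the local criterion is taken as variance plus squared bias with any normalizing factor omitted, the prior is replaced by a finite set of points with probabilities, and the functions `variance_fn` and `bias_fn` are hypothetical stand-ins for the asymptotic variance and bias of the model averaging estimator.

```python
def local_criterion(variance, bias):
    """Asymptotic mean squared error up to normalization: variance + bias^2."""
    return variance + bias ** 2


def bayesian_criterion(design, prior_points, prior_probs, variance_fn, bias_fn):
    """Average the local criterion over a discretized prior distribution."""
    total = 0.0
    for params, prob in zip(prior_points, prior_probs):
        total += prob * local_criterion(variance_fn(design, params),
                                        bias_fn(design, params))
    return total


# Toy stand-ins (hypothetical): variance grows with the parameter value,
# bias is constant; the design argument is ignored here.
toy_variance = lambda design, params: params
toy_bias = lambda design, params: 1.0

value = bayesian_criterion(None, [1.0, 3.0], [0.5, 0.5], toy_variance, toy_bias)
print(value)
```

A locally optimal design corresponds to a prior concentrated on a single parameter value, which is why it can outperform the Bayesian design only when that value is close to the truth.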
The Bayesian optimal design still depends on the choice of the prior distribution
. In some applications there is prior knowledge available that can be used for modelling the prior distribution. A typical example is dose finding studies, where information from pharmaceutical development or from a phase I trial can be used (Dette et al., 2008). Alternatively, noninformative priors can be applied to obtain efficient designs which are robust against misspecification of the parameters. The performance of the resulting optimal designs depends on how the choice of the prior distribution reflects the truth. Numerical investigations show that locally optimal designs have smaller mean squared errors than Bayesian optimal designs for model averaging estimation of the parameter
, as long as the prior guesses of the parameters are very precise. Under misspecification of the parameters, Bayesian optimal designs generally show better performance than locally optimal designs.
Locally and Bayesian optimal designs for model averaging have to be calculated numerically in all cases of interest, and we present several examples in § 4. Next, we state necessary conditions for
and
optimality. The proofs are given in the Supplementary Material.
Theorem 1.
If the design
is a locally optimal design for model averaging estimation of the parameter
, then the inequality
(22) holds for all
, where
is defined by (17) and the functions
and
are given by
(23)
(24) where
the vector
is defined by (19) and the information matrix
by (12), respectively. The design
denotes the Dirac measure at the point
.
Moreover, there is equality in (22) for every point
in the support of
.
Theorem 2.
If a design
is Bayesian optimal for model averaging estimation of the parameter
with respect to the prior
, then
(25) holds for all
, where the derivatives
and
are given by (23) and (24), respectively. Moreover, there is equality in (25) for every point
in the support of
.
The derived conditions of Theorems 1 and 2 can be used in the following way: if a numerically calculated design does not satisfy inequality (22), it will not be locally optimal for model averaging estimation of the parameter
and the search for the optimal design has to be continued. The functions
and
are not convex, and therefore sufficient conditions for optimality are not available.
Remark 1.
Hjort & Claeskens (2003) also considered model averaging using weights
depending on the data
in the definition of the estimator
in (15). Because of this dependency they are called random weights in the literature. Typical examples are smooth AIC weights
(26) based on the AIC scores
where
denotes the loglikelihood function of model
evaluated in the maximum likelihood estimator
, and
is the number of parameters to be estimated in model
,
(Claeskens & Hjort, 2008, Ch. 2). Moreover, the estimator of a target
which is based on model selection by AIC can also be rewritten in terms of a model averaging estimator by using random weights of the form
(27) where
is the indicator function of the set
and
denotes the model with the greatest AIC score among the candidate models. For further choices of model averaging weights see, for example, Buckland et al. (1997), Hjort & Claeskens (2003) or Hansen (2007). In general, the case of random weights in model averaging estimation is more difficult to handle and the asymptotic distribution is not normal (Claeskens & Hjort, 2008, p. 196). As a consequence, an explicit calculation of the asymptotic bias and variance is not available.
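The two kinds of random weights discussed in this remark can be sketched in Python. The sketch assumes the convention of Claeskens & Hjort (2008), in which the AIC score equals twice the maximized loglikelihood minus twice the number of parameters, so that larger scores are better, and the smooth weights are proportional to the exponential of half the score. The function names and the numerical values are hypothetical.

```python
import math


def aic_score(loglik, n_params):
    """AIC in the 'larger is better' convention: 2 * loglik - 2 * n_params."""
    return 2.0 * loglik - 2.0 * n_params


def smooth_aic_weights(scores):
    """Weights proportional to exp(score / 2), normalized to sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    raw = [math.exp((s - top) / 2.0) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def selection_weights(scores):
    """Indicator weights: 1 for the model with the greatest score, else 0."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


# Hypothetical maximized loglikelihoods and parameter counts of four models.
scores = [aic_score(ll, p) for ll, p in [(-50.2, 4), (-51.0, 2), (-50.6, 3), (-50.9, 3)]]
print(smooth_aic_weights(scores))
print(selection_weights(scores))
```

Both weight vectors depend on the data through the maximized loglikelihoods, which is the source of the non-normal limit distribution mentioned above.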
From the design perspective it therefore seems reasonable to consider the case of fixed weights, for which the asymptotic properties of model averaging estimation under local misspecification are well understood, and to determine efficient designs for this estimation technique. Moreover, we also demonstrate in § 4 and in the Supplementary Material that model averaging estimation with fixed weights often shows better performance than model averaging with smooth AIC weights, and that the optimal designs derived under the assumption of fixed weights also improve the current state of the art for model averaging using random weights.
4. Optimal designs for model averaging
4.1. Numerical calculation of optimal designs
We investigate the performance of optimal designs for model averaging estimation of a parameter
, considering several examples from the literature and comparing the Bayesian optimal designs for model averaging estimation with commonly used uniform designs by means of a simulation study.
In our examples we calculate the derivatives in the matrices
and
and the vector
analytically. This keeps the numerical errors in the computation of the criterion
in (21) and the directional derivative
in (25) as small as possible. Claeskens & Hjort (2008, § 6.5) give recommendations for estimating the matrices
and
when such an analytical expression is not available, but this might result in considerable numerical errors. Computing the inverse
can yield nonnegligible numerical errors when the information matrix is ill-conditioned. Therefore we recommend examining the condition numbers of the matrices
and
thoroughly before determining the optimal designs.
Many algorithms for the calculation of optimal designs have been developed (see, e.g., Fedorov & Leonov, 2013), and many of them build upon the directional derivative. These algorithms can be computationally less demanding than direct optimization of the criterion. Since computation of the directional derivative
in (25) is more complicated and numerically more unstable than calculating the criterion
in (21), we seek a direct minimization of the criterion
with respect to the support points
and the weights
,
, without using the directional derivative
in the optimization step. Throughout, we use the COBYLA algorithm for the minimization of the criterion
defined in (21); see Powell (1994) for details.
4.2. Estimation of ED
in the sigmoid Emax model
We consider the situation introduced in Example 1, where the underlying density is a normal distribution with variance
and different regression functions are under consideration for the mean. More precisely, the set
contains
candidate models which are defined by the different mean functions (5), (6) and (7), respectively. The parameter of interest
is
as defined in (9), which is estimated by an appropriate model averaging estimator. The design space is the interval
, and we assume that
observations can be taken in the experiment.
We determine a Bayesian optimal design for model averaging estimation of
. As the Emax model is linear in the parameters
and
, the optimality criterion does not depend on
or
and no prior information is required for these parameters. For the parameters
we choose independent uniform priors
and
on the sets
and
, respectively, and the variance
is fixed as
. One can also choose a prior for
. Finally, under the local misspecification assumption we set
such that
.
We first consider equal weights for the model averaging estimator, that is,
,
. The Bayesian optimal design for model averaging estimation of
is
![]() |
(28) |
and satisfies the necessary condition of optimality in Theorem 2, as shown in Fig. 1(a). The design
defined by (28) would not be optimal if the inequality was not satisfied.
Fig. 1.
The function
in (25) evaluated for (a) the design
in (28) and (b) the design
in (32).
In order to investigate the properties of the different designs for model averaging estimation, we have conducted a simulation study where we compare the Bayesian optimal design (28) for model averaging estimation of
with two uniform designs,
![]() |
(29) |
![]() |
(30) |
which are quite popular in the presence of model uncertainty (Bornkamp et al., 2007; Schorning et al., 2016). The design
is a uniform design with the same number of support points as the optimal design in (28), whereas the design
is a uniform design with more support points. Moreover, we also provide a comparison with two estimators commonly used in practice, namely the model averaging estimator based on smooth AIC weights defined in (26) and the estimator in the model chosen by AIC model selection, which is obtained as a model averaging estimator (15) using the weights in (27). Additionally, we investigate the estimator in the wide model (5), as proposed by a referee. For these estimators we also used observations taken according to the designs
,
and
. As the approximate designs cannot be implemented directly for
observations, a rounding procedure (see, e.g., Pukelsheim, 2006, Ch. 12) is applied to determine the number
of observations taken at
such that we have in total
observations. For example, the implemented design obtained from the Bayesian optimal design
for model averaging estimation of
uses
,
,
,
and
observations at the points
,
,
,
and
, respectively, and implementable versions of the designs
and
are obtained similarly.
All the results presented here are based on
simulation runs each generating
observations of the form
![]() |
(31) |
for the different designs, where the
are independent standard normal random variables; different combinations of the true parameters
in (31) are investigated, whereas
is fixed. In the following discussion we will restrict ourselves to presenting the results for the parameters
,
. This is the parameter combination under the local misspecification assumption for
,
and
. Further simulation results for other parameter combinations can be found in the Supplementary Material.
In each simulation run the parameter
is estimated by model averaging using the different designs, and the mean squared error is calculated from all
simulation runs. More precisely, if
is the model averaging estimator for the parameter of interest
based on the observations
from model (31) with the design
, its mean squared error is
where
is the
in the true sigmoid Emax model (31) with parameters
. The simulated mean squared error of the model averaging estimator with equal weights
,
, for the different designs
,
and
is shown in the left column of Table 1. The second column of this table shows the mean squared error of the model averaging estimator with the smooth AIC weights in (26), while the third column gives the corresponding results for the weights in (27), that is, estimation of
in the model identified by the AIC for the different designs. Finally, the right column presents the results using the estimator in the wide model (5). The numbers marked with an asterisk in each column correspond to the smallest mean squared error obtained from the three designs.
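The simulated mean squared error described here is the average squared deviation of the estimates from the true value of the target over all simulation runs. A minimal Python sketch follows; the function name `simulated_mse` and the numbers in the example are hypothetical and serve only to show the computation.

```python
def simulated_mse(estimates, true_value):
    """Average squared deviation of the simulated estimates from the truth."""
    if not estimates:
        raise ValueError("need at least one simulation run")
    return sum((est - true_value) ** 2 for est in estimates) / len(estimates)


# Hypothetical estimates from three simulation runs and a true value of 2.
print(simulated_mse([1.0, 2.0, 3.0], 2.0))
```

Because the true value is known in a simulation, this quantity captures both the variance of the estimator and its squared bias.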
Table 1.
The mean squared error of the model averaging estimators of
for the Bayesian optimal design (28), the uniform design defined in (29), and the uniform design defined in (30)
| | Estimation method | | | |
|---|---|---|---|---|
| Design | Fixed weights | Smooth AIC weights | Model selection | Wide model |
| Bayesian optimal design (28) | 0.355* | 0.508* | 0.596* | 0.675* |
| Uniform design (29) | 0.810 | 0.913 | 1.017 | 1.267 |
| Uniform design (30) | 0.637 | 0.846 | 0.994 | 1.121 |

Fixed weights: equal weights for the four candidate models.
We observe that model averaging always yields a smaller mean squared error than estimation in the model identified by the AIC. For example, as shown in the first row of Table 1, if the Bayesian optimal design in (28) is used then the mean squared error of the estimator based on model selection is 0.596, whereas it is 0.355 and 0.508 for the model averaging estimator using equal weights and smooth AIC weights, respectively. The situation for the non-optimal uniform designs is similar. These results, and also further simulation results in the Supplementary Material, agree with the findings of Schorning et al. (2016), Aoki et al. (2017) and Buatois et al. (2018), and indicate that model averaging usually yields more precise estimates of the target than estimators based on model selection. Moreover, model averaging estimation with equal weights shows substantially better performance than the model averaging estimator with data-driven weights. Wagner & Hlouskova (2015) observed a similar effect in the context of principal components augmented regressions. A potential explanation of this observation is that the smooth AIC weights might not be optimal for model averaging estimation in this situation. There is a vast amount of literature on the optimal choice of model weights such that the model averaging estimator has minimum mean squared error (Hjort & Claeskens, 2003; Hansen, 2007; Liang et al., 2011). These weights depend on the models and target parameters under consideration, and are in general data driven. Thus, considering the discussion in Remark 1, determining optimal weights prior to experimental design is infeasible. Finally, using the wide model for estimating
yields the largest mean squared error. Even though this estimator is unbiased, it exhibits a larger variance than estimation in a smaller model so that using this estimator results in a larger mean squared error; see also Claeskens & Hjort (2008, Ch. 5) for a discussion of this issue.
Compared to the uniform designs
and
, the optimal design
in (28) yields a reduction of the mean squared error by
and
for model averaging estimation with equal weights. Moreover, this design also reduces the mean squared error of model averaging estimation with smooth AIC weights by
and
; for estimation in the model identified by the AIC by
and
; and for estimation in the wide model by
and
.
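The relative reductions discussed in this paragraph can be recomputed from the entries of Table 1. The Python sketch below does this under the assumption that the rows of Table 1 appear in the order given in its caption, that is, the Bayesian optimal design (28) first, followed by the uniform designs (29) and (30); the names `TABLE_1` and `relative_reduction` are hypothetical.

```python
# Mean squared errors from Table 1, per estimation method, in the assumed
# row order: optimal design (28), uniform design (29), uniform design (30).
TABLE_1 = {
    "Fixed weights": (0.355, 0.810, 0.637),
    "Smooth AIC weights": (0.508, 0.913, 0.846),
    "Model selection": (0.596, 1.017, 0.994),
    "Wide model": (0.675, 1.267, 1.121),
}


def relative_reduction(optimal, reference):
    """Percentage by which `optimal` is smaller than `reference`."""
    return 100.0 * (reference - optimal) / reference


for method, (opt, unif_29, unif_30) in TABLE_1.items():
    print(f"{method}: {relative_reduction(opt, unif_29):.1f}% relative to (29), "
          f"{relative_reduction(opt, unif_30):.1f}% relative to (30)")
```

The same function applies to the entries of any other table of simulated mean squared errors.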
As a further example we consider the model averaging estimator (15) of the parameter
for the four models in Example 1 with unequal weights, that is,
,
,
and
. The Bayesian optimal design for model averaging estimation of
is then
![]() |
(32) |
The necessary condition is depicted in Fig. 1(b).
A comparison of the designs
and
in (28) and (32) shows that the support points are similar, but that there are substantial differences in the weights.
In the simulation study of this model averaging estimator, we consider the same parameters as in the previous example. The corresponding results can be found in Table 2 and show a similar, though less pronounced, picture to that for the model averaging estimator with uniform weights. Model averaging always shows better performance than estimation in the model selected by the AIC. The improvement varies between
and
using fixed weights, and between
and
using smooth AIC weights. Moreover, for the designs
and
we observe an improvement when using fixed weights instead of smooth AIC weights for model averaging, but for the design
there is in fact no improvement. In contrast to Table 1, estimation in the wide model is more precise than estimation after performing model selection via the AIC when using the Bayesian optimal design
. However, the model averaging techniques still yield a smaller mean squared error than estimation in the wide model. A comparison of the results in Tables 1 and 2 shows that for all designs nonuniform weights for model averaging estimation yield a larger mean squared error than uniform weights.
Table 2.
The mean squared error of the model averaging estimators of
for the Bayesian optimal design defined in (32), the uniform design defined in (29) and the uniform design defined in (30)
| | Estimation method | | | |
|---|---|---|---|---|
| Design | Fixed weights | Smooth AIC weights | Model selection | Wide model |
| Bayesian optimal design (32) | 0.476* | 0.502* | 0.582* | 0.562* |
| Uniform design (29) | 0.915 | 0.900 | 1.014 | 1.135 |
| Uniform design (30) | 0.869 | 0.949 | 1.067 | 1.103 |

Fixed weights:
,
,
and
.
The Bayesian optimal design
for model averaging estimation of
improves on the designs
and
by
and
, respectively, if model averaging with fixed, nonuniform, weights is used, and by
–
for model averaging estimation with smooth aic weights and estimation in the model selected by the aic or in the wide model, respectively.
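The comparisons between model averaging and model selection reported for Table 2 can be recomputed from the tabulated mean squared errors. The following is a minimal sketch, assuming that "improvement" means the relative reduction of the mean squared error, (baseline − improved)/baseline; this definition and the helper name `relative_reduction` are assumptions made for illustration and are not taken from the paper.

```python
def relative_reduction(baseline_mse, improved_mse):
    """Fraction by which improved_mse falls below baseline_mse (assumed definition)."""
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return (baseline_mse - improved_mse) / baseline_mse


# Rows of Table 2 in printed order:
# (fixed weights, smooth aic weights, model selection, wide model)
table2 = [
    (0.476, 0.502, 0.582, 0.562),
    (0.915, 0.900, 1.014, 1.135),
    (0.869, 0.949, 1.067, 1.103),
]

for fixed, smooth, selection, wide in table2:
    # Compare each model averaging estimator with estimation after aic selection.
    gain_fixed = relative_reduction(selection, fixed)
    gain_smooth = relative_reduction(selection, smooth)
    print(f"fixed: {100 * gain_fixed:.1f}%  smooth aic: {100 * gain_smooth:.1f}%")
```

The same helper can be applied across rows, for example to compare a uniform design with the Bayesian optimal design within one column.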
Simulation results for further parameter combinations in the sigmoid Emax model can be found in the Supplementary Material, and show a very similar picture. We observe that in all the scenarios considered, model averaging estimation yields a smaller simulated mean squared error than estimation in a model identified by the aic, independently of the design and parameters under consideration. Bayesian optimal designs for model averaging estimation of
yield a substantially more precise estimation than the uniform designs in almost all cases.
4.3. Estimation of the area under the curve in the logistic regression model
We now consider the case where
is a normal density with variance
and mean function
![]() |
(33) |
This is a logistic regression model which is frequently used in dose-response or population growth modelling (Zwietering et al., 1990). The design space is
and we are interested in estimating the area under the curve defined in (8), where the region
and the design space
coincide. In model (33) the value
is the placebo effect,
denotes the maximum effect of the drug, relative to placebo, and
is the dose which produces half of the maximum effect. The parameter
characterizes the slope of the mean function
. We assume that the parameter
is included in every candidate model, whereas the components of the parameter
can be fixed to the corresponding components of
such that there are
competing models in the candidate set
defined by different mean functions, that is,
![]() |
(34) |
![]() |
(35) |
![]() |
(36) |
and
is defined by (33). As the parameters
and
appear linearly in the model, only the prior distributions for
and
have to be specified; these are chosen as independent uniform priors supported on the sets
and
, respectively. The variance
is fixed as
and
is chosen such that
.
The Bayesian optimal design for model averaging estimation of the area under the curve with equal weights
has been calculated numerically and is
![]() |
(37) |
The performance of the different designs is again evaluated by means of a simulation study generating data from the model
![]() |
(38) |
where
are standard normal random variables and
observations can be taken. We focus on the case
,
and
, which corresponds to a local misspecification, where
,
and
. Further results for other parameter choices show a similar picture and are discussed in the Supplementary Material.
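The structure of such a simulation study can be sketched in a few lines. The example below is not the logistic model (33); it swaps in a toy pair of nested linear models (constant mean versus straight line on [0, 1]) so that the code is self-contained. The slope is set to δ/√n to mimic a local misspecification, the target is the area under the mean curve over [0, 1], and all function names, parameter values and the equally spaced design are assumptions chosen for illustration.

```python
import numpy as np


def fit_auc_estimates(x, y):
    """Return AUC estimates over [0, 1] from the narrow and the wide toy model."""
    # Narrow model: constant mean, so the AUC over [0, 1] equals the sample mean.
    auc_narrow = y.mean()
    # Wide model: straight line b0 + b1 * x, whose AUC over [0, 1] is b0 + b1 / 2.
    design = np.column_stack([np.ones_like(x), x])
    b0, b1 = np.linalg.lstsq(design, y, rcond=None)[0]
    auc_wide = b0 + b1 / 2.0
    return np.array([auc_narrow, auc_wide])


def simulated_mse(x, b0, delta, sigma, weights, n_rep=2000, seed=0):
    """Monte Carlo MSE of the narrow, wide and fixed-weight averaged estimators."""
    rng = np.random.default_rng(seed)
    n = x.size
    b1 = delta / np.sqrt(n)          # local misspecification of the narrow model
    truth = b0 + b1 / 2.0            # true AUC of the toy mean function
    weights = np.asarray(weights, dtype=float)
    sq_err = np.zeros(3)
    for _ in range(n_rep):
        y = b0 + b1 * x + sigma * rng.standard_normal(n)
        est = fit_auc_estimates(x, y)
        averaged = weights @ est     # model averaging with fixed weights
        sq_err += (np.array([est[0], est[1], averaged]) - truth) ** 2
    return dict(zip(("narrow", "wide", "averaged"), sq_err / n_rep))


x = np.linspace(0.0, 1.0, 50)        # hypothetical equally spaced design
print(simulated_mse(x, b0=1.0, delta=1.0, sigma=1.0, weights=[0.5, 0.5]))
```

Replacing the design points `x` by the support points of a candidate design, replicated according to its weights, gives the kind of design comparison reported in the tables, here only for the toy model.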
The mean squared error of the model averaging estimator with equal weights
,
, for the different designs is given in the first column of Table 3, while the second and third columns show the corresponding results for the model averaging estimator with smooth aic weights and the estimator based on model selection, respectively. The fourth column presents the mean squared errors of the estimator of
in the wide model (33). We observe again that model averaging improves the estimation of the target area under the curve in the model chosen by the aic in all cases under consideration. For equal weights this improvement varies between
and
, depending on the design, while the improvement achieved by model averaging with smooth aic weights varies between
and
. The model averaging estimator with equal weights performs substantially better than the procedure with smooth aic weights and the estimation in the wide model (33).
Table 3.
The mean squared error of the model averaging estimators of the area under the curve with equal weights for the Bayesian optimal design (37), the uniform design defined in (29) and the uniform design defined in (30)
| | Estimation method | | | |
|---|---|---|---|---|
| Design | Fixed weights | Smooth aic weights | Model selection | Wide model |
| Bayesian optimal design (37) | 1.659* | 1.880 | 2.074 | 2.071 |
| Uniform design (29) | 1.961 | 2.080 | 2.196 | 2.059 |
| Uniform design (30) | 1.687 | 1.763* | 1.838* | 1.847* |
Fixed weights:
.
In the case of equal weights the Bayesian optimal design
for model averaging estimation of the area under the curve yields a
improvement over the uniform design
, but only a
improvement over the design
. On the other hand, if model averaging estimates with smooth aic weights and model selection weights or the estimator in the wide model are used, the uniform design
shows the best performance. This observation can be explained by the fact that the design
has not been constructed for this purpose. Consequently, although this design performs very well in many cases, it cannot be guaranteed that the design
is close to the optimal design for model averaging estimation of the area under the curve with smooth aic weights, or for estimation in the wide model or in a model selected by the aic. Nevertheless, model averaging with equal weights combined with the corresponding Bayesian optimal design yields the smallest mean squared error in all scenarios considered.
Next, we consider a model averaging estimator with nonuniform weights
,
,
and
for the models (34), (35), (36) and (33), respectively. The corresponding Bayesian optimal design for model averaging estimation of the area under the curve with these weights is
![]() |
(39) |
The simulated mean squared errors of the various estimators of the area under the curve for the different designs are given in Table 4, where we use the same parameters as in the previous example. We observe similar behaviour to § 4.2: model averaging performs better than model selection, but in this situation model averaging based on smooth aic weights results in a slightly smaller mean squared error than model averaging based on fixed weights; the estimator with fixed weights yields an increase of the mean squared error of about
. In Table 3 the estimator in the wide model shows better performance than the estimators based on random weights for some designs, whereas in Table 4 the estimator in the wide model exhibits the largest mean squared error, independent of the design under consideration. For all model averaging estimators, including the estimator after model selection, the mean squared error from the Bayesian optimal design
defined in (39) is smaller than those obtained from the designs
and
.
Table 4.
The mean squared error of the model averaging estimators of the area under the curve for the Bayesian optimal design defined in (39), the uniform design defined in (29) and the uniform design defined in (30)
| | Estimation method | | | |
|---|---|---|---|---|
| Design | Fixed weights | Smooth aic weights | Model selection | Wide model |
| Bayesian optimal design (39) | 1.764* | 1.723* | 1.835* | 1.978 |
| Uniform design (29) | 2.059 | 2.041 | 2.129 | 2.185 |
| Uniform design (30) | 1.841 | 1.801 | 1.883 | 1.886* |
Fixed weights:
,
,
and
.
Further simulations using other parameter combinations can be found in the Supplementary Material and lead to similar conclusions. For example, model averaging performs better than estimation in a model identified by the aic, independent of the design under consideration. In most cases the Bayesian optimal design for model averaging estimation of the area under the curve yields a substantial improvement compared to the uniform designs, even when it is used for model averaging with smooth aic weights or for estimation after model selection.
As pointed out by a reviewer, bic weights might be a reasonable alternative to aic weights. We therefore conducted additional simulations with the same set-up as in the preceding sections using the Bayesian information criterion. See, for example, Claeskens & Hjort (2008, Ch. 3) for details regarding the calculation of smooth bic weights and for model selection by the Bayesian information criterion. The results are very similar to those presented here for the aic and are therefore not shown. Most importantly, the mean squared errors of the model averaging estimators using smooth bic weights and of the estimators after model selection via the bic do not differ substantially from those based on the aic.
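For readers who want to reproduce smooth information-criterion weights, the following minimal sketch uses the common "smaller is better" convention, aic = −2 log L + 2p and bic = −2 log L + p log n, with weights proportional to exp(−ic/2). Claeskens & Hjort (2008) work with the opposite sign convention, which yields the same weights. The function names and the log-likelihood values in the usage lines are hypothetical.

```python
import numpy as np


def aic(loglik, n_params):
    """Akaike information criterion, smaller-is-better convention."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion, smaller-is-better convention."""
    return -2.0 * loglik + n_params * np.log(n_obs)


def smooth_weights(ic_values):
    """Weights proportional to exp(-ic / 2), normalized to sum to one."""
    ic = np.asarray(ic_values, dtype=float)
    # Subtract the minimum before exponentiating to avoid numerical underflow.
    unnormalized = np.exp(-0.5 * (ic - ic.min()))
    return unnormalized / unnormalized.sum()


# Hypothetical maximized log-likelihoods and parameter counts of four nested models.
logliks = np.array([-152.3, -150.9, -150.7, -150.1])
n_params = np.array([2, 3, 3, 4])
n_obs = 100

print(smooth_weights(aic(logliks, n_params)))
print(smooth_weights(bic(logliks, n_params, n_obs)))
```

Model selection corresponds to the degenerate case in which the model with the smallest criterion value receives weight one.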
For the sake of brevity we have restricted ourselves to two examples. The performance of the proposed designs depends on many variables, such as the model under consideration, the respective prior distributions and the target parameter
. In particular, there might be other parameter combinations for which the Bayesian optimal design yields a less pronounced improvement of the mean squared error of the model averaging estimator. Such an example can be found in the Supplementary Material. In most scenarios that we have investigated, the Bayesian optimal designs improved the estimation precision of the model averaging estimator based on fixed weights.
5. Discussion
In our simulation study, model averaging estimators based on fixed weights usually achieve smaller mean squared errors than the estimators using data-dependent weights. Most likely, estimation precision can be further improved using optimal data-dependent model weights that minimize the mean squared error of the estimator. However, it remains an open and very challenging question for future research to determine optimal designs for estimation methods of this type. The asymptotic distribution of these estimators is complicated and has to be simulated in general for each design under consideration, which is computationally very demanding. A further interesting direction of future research in this context is the construction and investigation of adaptive designs, which proceed in several steps, updating the information about the models and their parameters sequentially.
Finally, the extension of the methodology to nonnested models is of particular importance and is an interesting topic for future research. As one cannot work with local alternatives to derive the asymptotic mean squared error in this setting, the theory of model averaging estimation developed in Claeskens & Hjort (2008) must first be extended before it can be used to develop corresponding optimality criteria. Research in this direction is currently in progress. A further interesting question, concerning model averaging estimation in general, is whether a minimax approach can be used to choose the weight
in the model averaging estimator.
Acknowledgement
This work was supported in part by the Deutsche Forschungsgemeinschaft and the National Institute of General Medical Sciences of the National Institutes of Health. The authors thank the referees and associate editor for their constructive comments on an earlier version of this paper.
Supplementary material
Supplementary material available at Biometrika online includes proofs of the theorems in § 3 and additional simulation results for the examples in § 4.
References
- Aoki, Y., Röshammar, D., Hamrén, B. & Hooker, A. C. (2017). Model selection and averaging of nonlinear mixed-effect models for robust phase III dose selection. J. Pharmacokin. Pharmacodynam. 44, 581–97.
- Atkinson, A. C. (2008). -optimum designs for model discrimination and parameter estimation. J. Statist. Plan. Infer. 138, 56–64.
- Atkinson, A. C., Donev, A. & Randall, T. (2007). Optimum Experimental Designs, with SAS. Oxford: Oxford University Press.
- Atkinson, A. C. & Fedorov, V. V. (1975). The designs of experiments for discriminating between two rival models. Biometrika 62, 57–70.
- Biedermann, S., Dette, H. & Pepelyshev, A. (2006). Some robust design strategies for percentile estimation in binary response models. Can. J. Statist. 34, 603–22.
- Bornkamp, B. (2015). Viewpoint: Model selection uncertainty, pre-specification and model averaging. Pharmaceut. Statist. 14, 79–81.
- Bornkamp, B., Bretz, F., Dmitrienko, A., Enas, G., Gaydos, B., Hsu, C.-H., König, F., Krams, M., Liu, Q., Neuenschwander, B., Parke, T. & Pinheiro, J. (2007). Innovative approaches for designing and analyzing adaptive dose-ranging trials. J. Biopharm. Statist. 17, 965–95.
- Breiman, L. (1996). Bagging predictors. Mach. Learn. 24, 123–40.
- Bretz, F., Dette, H. & Pinheiro, J. (2010). Practical considerations for optimal designs in clinical dose finding studies. Statist. Med. 29, 731–42.
- Bretz, F., Hsu, J. & Pinheiro, J. (2008). Dose finding – a challenge in statistics. Biomet. J. 50, 480–504.
- Buatois, S., Ueckert, S., Frey, N., Retout, S. & Mentré, F. (2018). Comparison of model averaging and model selection in dose finding trials analyzed by nonlinear mixed effect models. AAPS J. 20, 56.
- Buckland, S. T., Burnham, K. P. & Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics 53, 603–18.
- Burnham, K. P. & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York: Springer.
- Chaloner, K. & Verdinelli, I. (1995). Bayesian experimental design: A review. Statist. Sci. 10, 273–304.
- Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press.
- Cornish-Bowden, A. (2012). Fundamentals of Enzyme Kinetics. Weinheim: Wiley-Blackwell.
- Dette, H. (1990). A generalization of - and -optimal designs in polynomial regression. Ann. Statist. 18, 1784–805.
- Dette, H. (1997). Designing experiments with respect to ‘standardized’ optimality criteria. J. R. Statist. Soc. B 59, 97–110.
- Dette, H., Bretz, F., Pepelyshev, A. & Pinheiro, J. C. (2008). Optimal designs for dose finding studies. J. Am. Statist. Assoc. 103, 1225–37.
- Dette, H. & Titoff, S. (2009). Optimal discrimination designs. Ann. Statist. 37, 2056–82.
- Fedorov, V. V. & Leonov, S. L. (2013). Optimal Design for Nonlinear Response Models. Boca Raton, FL: CRC Press.
- Goutelle, S., Maurin, M., Rougier, F., Barbaut, X., Bourguignon, L., Ducher, M. & Maire, P. (2008). The Hill equation: A review of its capabilities in pharmacological modelling. Fundamental Clin. Pharmacol. 22, 633–48.
- Hansen, B. E. (2007). Least squares model averaging. Econometrica 75, 1175–89.
- Hjort, N. L. & Claeskens, G. (2003). Frequentist model average estimators. J. Am. Statist. Assoc. 98, 879–99.
- Hoeting, J. A., Madigan, D., Raftery, A. E. & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Statist. Sci. 14, 382–417.
- Holford, N. H. & Sheiner, L. B. (1981). Understanding the dose–effect relationship: Clinical application of pharmacokinetic–pharmacodynamic models. Clin. Pharmacokin. 6, 429–53.
- Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849–79.
- Konishi, S. & Kitagawa, G. (2008). Information Criteria and Statistical Modeling. New York: John Wiley & Sons.
- Konstantinou, M., Biedermann, S. & Kimber, A. (2017). Model robust designs for survival trials. Comp. Statist. Data Anal. 113, 239–50.
- Läuter, E. (1974). Experimental design in a class of models. Math. Operationsforsch. Statist. 5, 379–98.
- Lehmann, E. & Casella, G. (1998). Theory of Point Estimation. Springer Texts in Statistics. New York: Springer.
- Liang, H., Zou, G., Wan, A. T. K. & Zhang, X. (2011). Optimal weight choice for frequentist model average estimators. J. Am. Statist. Assoc. 106, 1053–66.
- López-Fidalgo, J., Tommasi, C. & Trandafir, P. C. (2007). An optimal experimental design criterion for discriminating between non-normal models. J. R. Statist. Soc. B 69, 231–42.
- MacDougall, J. (2006). Analysis of dose-response studies – model. In Dose Finding in Drug Development. Ed. Ting N., pp. 127–45. New York: Springer.
- Powell, M. J. D. (1994). A direct search optimization method that models the objective and constraint functions by linear interpolation. In Advances in Optimization and Numerical Analysis. Ed. Hennart J.-P. & Gomez S., pp. 51–67. Dordrecht: Kluwer.
- Pronzato, L. & Walter, E. (1985). Robust experimental design via stochastic approximation. Math. Biosci. 75, 103–20.
- Pukelsheim, F. (2006). Optimal Design of Experiments. Classics in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics.
- Raftery, A. & Zheng, Y. (2003). Discussion: Performance of Bayesian model averaging. J. Am. Statist. Assoc. 98, 931–8.
- Schorning, K., Bornkamp, B., Bretz, F. & Dette, H. (2016). Model selection versus model averaging in dose finding studies. Statist. Med. 35, 4021–40.
- Tommasi, C. (2009). Optimal designs for both model discrimination and parameter estimation. J. Statist. Plan. Infer. 139, 4123–32.
- Tommasi, C. & López-Fidalgo, J. (2010). Bayesian optimum designs for discriminating between models with any distribution. Comp. Statist. Data Anal. 54, 143–50.
- Ucinski, D. & Bogacka, B. (2005). -optimum designs for discrimination between two multiresponse dynamic models. J. R. Statist. Soc. B 67, 3–18.
- Wagner, M. & Hlouskova, J. (2015). Growth regressions, principal components augmented regressions and frequentist model averaging. Jahrbücher für Nationalökonomie und Statistik 235, 642–62.
- Wassermann, L. (2000). Bayesian model selection and model averaging. J. Math. Psychol. 44, 92–107.
- Wiens, D. (2015). Robustness of design. In Handbook of Design and Analysis of Experiments. Ed. Dean A., Morris M., Stufken J. & Bingham D. Boca Raton: CRC Press.
- Wiens, D. P. (2009). Robust discrimination designs. J. R. Statist. Soc. B 71, 805–29.
- Wiens, D. P. & Xu, X. (2008). Robust prediction and extrapolation designs for misspecified generalized linear regression models. J. Statist. Plan. Infer. 138, 30–46.
- Zen, M.-M. & Tsai, M.-H. (2002). Some criterion-robust optimal designs for the dual problem of model discrimination and parameter estimation. Sankhya 64, 322–38.
- Zwietering, M. H., Jöngenburger, I., Rombouts, F. M. & Ried, K. V. (1990). Modeling of the bacterial growth curve. Appl. Envir. Microbiol. 56, 1875–81.