Biometrika. 2019 Jul 13;106(3):665–682. doi: 10.1093/biomet/asz036

Optimal designs for frequentist model averaging

K Alhorn 1, K Schorning 2, H Dette 2
PMCID: PMC6690170  PMID: 31427825

Summary

We consider the problem of designing experiments for estimating a target parameter in regression analysis when there is uncertainty about the parametric form of the regression function. A new optimality criterion is proposed that chooses the experimental design to minimize the asymptotic mean squared error of the frequentist model averaging estimate. Necessary conditions for the optimal solution of a locally and Bayesian optimal design problem are established. The results are illustrated in several examples, and it is demonstrated that Bayesian optimal designs can yield a reduction of the mean squared error of the model averaging estimator by up to 45%.

Keywords: Bayesian optimal design, Local misspecification, Model averaging, Model selection, Model uncertainty, Optimal design

1. Introduction

It is well known that a carefully designed experiment can substantially improve statistical inference in regression analysis. Optimal design of experiments is more efficient the more knowledge about the underlying regression model is available, and an impressive theory has been developed to construct optimal designs under the assumption of a given regression model; see, for example, Pukelsheim (2006), Atkinson et al. (2007) and Fedorov & Leonov (2013). On the other hand, model selection is an important step in any data analysis, and these references also partially discuss the problem of constructing efficient designs to address model uncertainty in the design of experiments. Because of its importance this problem has a long history. Early work dates back to Atkinson & Fedorov (1975), who determined optimal designs for model discrimination by maximizing the power of a test between competing regression models; see also Ucinski & Bogacka (2005), López-Fidalgo et al. (2007), Dette & Titoff (2009), Wiens (2009) or Tommasi & López-Fidalgo (2010) for some more recent references. A different line of research in this context was initiated by Läuter (1974), who proposed a criterion based on a product of the determinants of the information matrices in the various models under consideration, which yields efficient designs for all of these models. This criterion has been used successfully by Dette (1990) to determine efficient designs for a class of polynomial regression models, and by Biedermann et al. (2006) to construct efficient designs for binary response models when there is uncertainty about the form of the link function. As these criteria do not reflect model discrimination, Zen & Tsai (2002), Atkinson (2008) and Tommasi (2009) considered a mixture of Läuter-type and discrimination criteria to construct efficient designs for model discrimination and parameter estimation.

An alternative approach to designs that are robust against model misspecification is to minimize, with respect to the design, the maximal mean squared error calculated over a class of misspecified models; see Wiens (2015) for an overview. Several authors have worked on this problem, for example Wiens & Xu (2008), who derived robust prediction and extrapolation designs, or Konstantinou et al. (2017), who analysed robust designs under local alternatives for survival trials. This list of references is by no means complete. A common feature in most of the literature is that either the designs are constructed under the assumption, at least implicitly, that model selection is performed by hypothesis testing, or designs are determined that have good properties for a class of competing models.

On the other hand there is an enormous amount of literature on performing statistical inference under model uncertainty, which to the best of our knowledge has not been discussed in the context of optimal experimental design. One possibility is to select an adequate model from a set of candidate models, and numerous model selection criteria have been developed for this purpose (Burnham & Anderson, 2002; Claeskens & Hjort, 2008; Konishi & Kitagawa, 2008). These procedures are widely used and have the advantage of delivering a single model for the statistical analysis, which makes them very attractive for practitioners. However, there exists a well-known post-selection problem in this approach because estimators chosen after model selection usually behave like mixtures of many potential estimators. For example, if Inline graphic is a parameter of interest in a regression model, such as a prediction at a particular point, the area under the curve, or a specific quantile of the regression model, it is known that selecting a single model and ignoring the uncertainty resulting from the selection process may give confidence intervals for Inline graphic with coverage probability smaller than the nominal value; see, for example, Claeskens & Hjort (2008, Ch. 7) for a mathematical treatment or Bornkamp (2015) for a high-level discussion of this phenomenon.

As an alternative, several authors proposed smoothing estimators for the parameter Inline graphic across several models, rather than choosing a specific model from the class under consideration and performing the estimation in the selected model. This approach takes the additional estimator variability caused by model uncertainty adequately into account and has been discussed intensively in the Bayesian community, where it is known as Bayesian model averaging; see Hoeting et al. (1999), among many others. Hjort & Claeskens (2003) pointed out several problems with this approach. In particular, they mentioned the difficulties in specifying the prior probabilities for a class of models and the problem of mixing together many conflicting prior opinions in the statistical analysis. They proposed an alternative non-Bayesian approach, which they call frequentist model averaging, and developed asymptotic theory for their method. Evidence exists that model averaging improves the estimation efficiency (Breiman, 1996; Raftery & Zheng, 2003), and Schorning et al. (2016) recently demonstrated the superiority of model averaging over estimation after model selection by an information criterion in the context of dose response models. These results have recently been confirmed by Aoki et al. (2017) and Buatois et al. (2018) in the context of nonlinear mixed effect models.

The present paper is devoted to the construction of optimal designs if parameters of interest are estimated under model uncertainty via frequentist model averaging. An optimal design for model averaging estimation minimizes the asymptotic mean squared error of the model averaging estimator under local alternatives, and we show that model averaging estimators can be improved if the experiments are well designed.

2. Model averaging under local misspecification

Model averaging is a common technique for estimating a parameter of interest, say Inline graphic, under model uncertainty. Using this technique, an estimate for a parameter of interest is a weighted average of estimates of this parameter in the competing models under consideration, where different choices for the weights have been proposed in the literature; see, for example, Wassermann (2000) or Hansen (2007) for Bayesian and non-Bayesian model averaging methods. In this section we briefly describe this concept and the corresponding asymptotic theory in the present context, such that the results can be used to construct optimal designs for model averaging estimation. The results follow more or less from the statements in Hjort & Claeskens (2003) and Claeskens & Hjort (2008); details regarding their derivation are omitted for the sake of brevity.

We assume that Inline graphic different experimental conditions, say Inline graphic, are chosen in the design space Inline graphic, and that at each experimental condition Inline graphic one can observe Inline graphic responses, say Inline graphic, Inline graphic. We also assume that for each Inline graphic the responses Inline graphic at experimental condition Inline graphic are realizations of independent and identically distributed, real-valued, random variables Inline graphic with unknown density Inline graphic. Therefore the total sample size is Inline graphic and the experimental design problem consists of the choice of the number Inline graphic of different experimental conditions, the experimental conditions Inline graphic themselves and the number Inline graphic of observations taken at each Inline graphic, Inline graphic, such that the model averaging estimate is most efficient.

To measure efficiency and to compare different experimental designs we will use asymptotic arguments and consider the case Inline graphic for Inline graphic. As is common in optimal design theory we collect this information in the matrix

graphic file with name Equation1.gif (1)

Following Claeskens & Hjort (2008) we assume that Inline graphic is contained in a set, say Inline graphic, of parametric candidate densities which is constructed as follows. The first candidate density in Inline graphic is a parametric density Inline graphic, where the form of Inline graphic is assumed to be known, Inline graphic and Inline graphic denote the unknown parameters, which vary in a compact parameter space, say Inline graphic. The second candidate density is the parametric density Inline graphic, which is obtained by fixing the parameter value Inline graphic to a prespecified known value Inline graphic. Throughout, we will call Inline graphic the wide density and Inline graphic the narrow density, respectively. We use the analogous expressions wide model and narrow model for the corresponding candidate models. Additional candidate models are obtained by choosing certain submodels between the wide density Inline graphic and the narrow density Inline graphic. More precisely, for a chosen subset Inline graphic of indices with cardinality Inline graphic, we introduce the projection matrices Inline graphic and Inline graphic which map a Inline graphic-dimensional vector to its components corresponding to indices in Inline graphic and Inline graphic, respectively. Using the abbreviations Inline graphic and Inline graphic, we define the candidate density Inline graphic by

graphic file with name Equation2.gif (2)

Consequently, for the density Inline graphic the components of Inline graphic with indices in Inline graphic are fixed to the corresponding components of Inline graphic, while the components with indices in Inline graphic are considered as unknown parameters. Note that Inline graphic, Inline graphic and that in the most general case there are Inline graphic possible candidate densities. As we might not be interested in all possible submodels, we assume that the competing models are defined by different sets Inline graphic for some Inline graphic. Thus the class Inline graphic of candidate models is given by

graphic file with name Equation3.gif (3)

Following Hjort & Claeskens (2003), we consider local deviations throughout and assume that the true density is given by

graphic file with name Equation4.gif (4)

where the true parameter values are Inline graphic and Inline graphic. The true density is given by the wide density with a varying value of Inline graphic, which differs from Inline graphic through the perturbation term Inline graphic. Thus, for Inline graphic tending to infinity, it approximates the narrow density Inline graphic.

Example 1.

Consider the case where Inline graphic is a normal density with variance Inline graphic and mean function

Example 1. (5)

where Inline graphic, Inline graphic and the explanatory variable Inline graphic varies in an interval, say Inline graphic. This model is the sigmoid Emax model, and has numerous applications in modelling the dependence of biochemical or pharmacological responses on concentration (Goutelle et al., 2008). The sigmoid Emax model is especially popular for describing dose-response relationships in drug development (MacDougall, 2006). The parameters in (5) have a concrete interpretation: Inline graphic is used to model a placebo effect, Inline graphic denotes the maximum effect of Inline graphic relative to the placebo, and Inline graphic is the value of Inline graphic which produces half of the maximum effect. The so-called Hill parameter Inline graphic characterizes the slope of the mean function Inline graphic. The parameter Inline graphic is included in every candidate model, whereas for the narrow model the components are fixed as Inline graphic. Consequently, the narrow candidate model is a normal density with mean

Example 1. (6)

and variance Inline graphic. In this case, Inline graphic is the Michaelis–Menten function, which is widely utilized to represent an enzyme kinetics reaction, where enzymes bind substrates and turn them into products; see, for example, Cornish-Bowden (2012). The two other candidate models are obtained by either fixing Inline graphic or Inline graphic, and the corresponding densities are normal densities with mean functions

Example 1. (7)

respectively. The latter model is the Emax model, sometimes also referred to as the hyperbolic Emax model (Holford & Sheiner, 1981; MacDougall, 2006). Finally, under the local misspecification assumption (4) the true density Inline graphic corresponds to a normal distribution with mean

Example 1.

and variance Inline graphic. Typical functionals Inline graphic of interest are the area under the curve

Example 1. (8)

calculated for a given region Inline graphic or, for a given Inline graphic, the quantile defined by

Example 1. (9)

The value defined in (9) is well known as Inline graphic, that is, the effective dose at which Inline graphic of the maximum effect is achieved; see MacDougall (2006) or Bretz et al. (2008).
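The nesting of the four candidate mean functions in Example 1 can be illustrated with a short Python sketch. It assumes the standard sigmoid Emax parameterization (placebo effect plus maximum effect times a Hill-type term); the argument names `e0`, `emax`, `ed50` and `hill` are illustrative labels chosen here and are not the paper's notation.

```python
def sigmoid_emax(x, e0, emax, ed50, hill):
    """Wide model: placebo effect plus a sigmoid dose-response term."""
    return e0 + emax * x**hill / (ed50**hill + x**hill)


def emax_model(x, e0, emax, ed50):
    """Submodel with the Hill parameter fixed at 1 (hyperbolic Emax model)."""
    return sigmoid_emax(x, e0, emax, ed50, hill=1.0)


def sigmoid_no_placebo(x, emax, ed50, hill):
    """Submodel with the placebo effect fixed at 0."""
    return sigmoid_emax(x, 0.0, emax, ed50, hill)


def michaelis_menten(x, emax, ed50):
    """Narrow model: placebo effect fixed at 0 and Hill parameter fixed at 1."""
    return sigmoid_emax(x, 0.0, emax, ed50, 1.0)


# At x = ed50 the dose-dependent term equals half of emax in every model.
half_effect = michaelis_menten(2.0, emax=1.0, ed50=2.0)
```

Each submodel is obtained by fixing one or both of the additional parameters of the wide model, which mirrors the construction of the candidate densities in (2).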

We are interested in the estimation of a quantity, say Inline graphic, where Inline graphic is a differentiable function of the parameter Inline graphic. For this purpose we fix one model Inline graphic in the set of candidate models defined in (3) and use the estimator Inline graphic, where Inline graphic is the maximum-likelihood estimator in model Inline graphic. Under the assumption (4) of a local misspecification and the common conditions of regularity (Lehmann & Casella, 1998, Ch. 6), it can be shown by adapting the arguments in Hjort & Claeskens (2003) and Claeskens & Hjort (2008) to the current situation that the resulting estimator Inline graphic satisfies

graphic file with name Equation11.gif (10)

Here, Inline graphic denotes weak convergence and Inline graphic is a normal distribution with variance

graphic file with name Equation12.gif

where Inline graphic is the gradient of Inline graphic with respect to Inline graphic, that is,

graphic file with name Equation13.gif (11)

and Inline graphic the information matrix in the candidate model Inline graphic, that is,

graphic file with name Equation14.gif (12)

The mean Inline graphic in (10) is of the form

graphic file with name Equation15.gif

where

graphic file with name Equation16.gif

is the gradient in the wide model with respect to the parameters, the matrix Inline graphic is defined by

graphic file with name Equation17.gif (13)

the matrices Inline graphic and Inline graphic are given by (12) and

graphic file with name Equation18.gif (14)

respectively, and Inline graphic denotes the information matrix in the wide model Inline graphic.

The frequentist model averaging estimator is now defined by assigning weights Inline graphic, with Inline graphic, to the different candidate models Inline graphic and defining

graphic file with name Equation19.gif (15)

where Inline graphic are the estimators in the different candidate models Inline graphic. The asymptotic behaviour of the model averaging estimator Inline graphic can be derived from Hjort & Claeskens (2003). In particular, it can be shown that under assumption (4) and the standard regularity conditions a standardized version of Inline graphic is asymptotically normally distributed, that is,

graphic file with name Equation20.gif (16)

Here, the mean and variance are

graphic file with name Equation21.gif (17)
graphic file with name Equation22.gif (18)

respectively, Inline graphic is the information matrix of the wide model Inline graphic, and the vector Inline graphic is given by

graphic file with name Equation23.gif (19)

where we used the notation of Inline graphic, Inline graphic and Inline graphic introduced in (11), (12) and (14). The optimal design criterion for model averaging proposed here is based on an asymptotic representation of the mean squared error of the estimate Inline graphic derived from (16), and will be carefully defined in the following section.
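As a minimal sketch of the estimator in (15), the following Python function forms a fixed-weight average of the target estimates obtained in the individual candidate models. The function name and the check that the weights sum to one are assumptions made for illustration; the individual estimates would come from maximum likelihood fits that are not shown here.

```python
def model_average(estimates, weights, tol=1e-9):
    """Weighted average of per-model estimates of the target parameter.

    estimates: one estimate of the target per candidate model
    weights:   fixed model weights, assumed here to sum to one
    """
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per candidate model")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights are assumed to sum to one")
    return sum(g * m for g, m in zip(weights, estimates))


# Illustrative numbers only: three candidate models with fixed weights.
averaged = model_average([1.0, 2.0, 3.0], [0.5, 0.25, 0.25])
```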

3. An optimality criterion for model averaging estimation

Following Kiefer (1974), we call the quantity Inline graphic in (1) an approximate design on the design space Inline graphic. This means that the support points Inline graphic define the distinct dose levels at which observations are to be taken, and the weights Inline graphic represent the relative proportions of responses at the corresponding support points Inline graphic, Inline graphic. For an approximate design Inline graphic and given total sample size Inline graphic, a rounding procedure is applied to obtain integers Inline graphic, Inline graphic, from the not necessarily integer-valued quantities Inline graphic (Pukelsheim, 2006, Ch. 12), which define the number of observations at Inline graphic, Inline graphic.
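The rounding step can be sketched in Python as follows. The text cites Pukelsheim (2006, Ch. 12) without naming a specific rule; the efficient rounding (apportionment) procedure below is one rule from that literature and is shown as an assumption, not as the procedure the authors necessarily used. The function name is illustrative.

```python
import math


def efficient_rounding(weights, n):
    """Turn design weights into integer sample sizes that sum to n.

    Start from ceil((n - k/2) * w_i) for k support points, then add or
    remove single observations until the total equals n.
    """
    k = len(weights)
    if n < k or any(w <= 0 for w in weights):
        raise ValueError("need n >= number of support points and positive weights")
    counts = [math.ceil((n - k / 2) * w) for w in weights]
    while sum(counts) > n:
        # Remove one observation where (n_i - 1) / w_i is largest.
        j = max(range(k), key=lambda i: (counts[i] - 1) / weights[i])
        counts[j] -= 1
    while sum(counts) < n:
        # Add one observation where n_i / w_i is smallest.
        j = min(range(k), key=lambda i: counts[i] / weights[i])
        counts[j] += 1
    return counts


# Illustrative weights, not a design from the paper.
counts = efficient_rounding([0.5, 0.3, 0.2], 10)
```

Every support point keeps at least one observation under this rule, so the implemented design has the same support as the approximate design.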

If the observations are taken according to an approximate design Inline graphic and an appropriate rounding procedure is used such that Inline graphic as Inline graphic, the asymptotic mean squared error of the model averaging estimate Inline graphic of the parameter of interest Inline graphic can be obtained from the discussion in § 2, that is

graphic file with name Equation24.gif (20)

where the variance Inline graphic and the bias Inline graphic are defined in equations (18) and (17), respectively. Consequently, a good design for the model averaging estimate (15) should give small values of Inline graphic. Therefore, for a given finite set Inline graphic of candidate models Inline graphic of the form (2) and weights Inline graphic, a design Inline graphic is called a locally optimal design for model averaging estimation of the parameter Inline graphic if it minimizes the function Inline graphic in (20) in the class of all approximate designs on Inline graphic.

Locally model averaging optimal designs address uncertainty only with respect to the model Inline graphic, but require prior information for the parameters Inline graphic and Inline graphic. While such knowledge might be available in some circumstances (see, e.g., Dette et al., 2008; Bretz et al., 2010), sophisticated design strategies have been proposed in the literature that require less precise knowledge about the model parameters, such as sequential, Bayesian or standardized maximin optimality criteria (Pronzato & Walter, 1985; Chaloner & Verdinelli, 1995; Dette, 1997). Any of these methodologies can be used to construct efficient robust designs for model averaging, and for the sake of brevity we restrict ourselves to Bayesian optimality criteria.

Here we address the uncertainty with respect to the unknown model parameters by a prior distribution, say Inline graphic, on Inline graphic; we call a design Inline graphic Bayesian optimal for model averaging estimation of the parameter Inline graphic with respect to the prior Inline graphic if it minimizes the function

graphic file with name Equation25.gif (21)

where the function Inline graphic is defined in (20). We assume throughout that the integral exists. For the sake of brevity we restrict ourselves to a prior distribution for the parameters Inline graphic and Inline graphic. In applications one could also use a prior distribution for Inline graphic to address uncertainty about this parameter.
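The relation between the local criterion (20) and the Bayesian criterion (21) can be sketched in Python as follows. The local criterion is the asymptotic variance plus the squared asymptotic bias, and the Bayesian criterion averages it over the prior. The callables `bias_fn` and `variance_fn` are hypothetical stand-ins for the expressions in (17) and (18), and replacing the integral by an average over prior draws or grid points is an approximation assumed here.

```python
def local_criterion(bias, variance):
    """Asymptotic mean squared error: variance plus squared bias."""
    return variance + bias**2


def bayesian_criterion(design, prior_draws, bias_fn, variance_fn):
    """Average the local criterion over parameter values drawn from the prior.

    bias_fn(design, params) and variance_fn(design, params) are hypothetical
    callables returning the asymptotic bias and variance for one parameter value.
    """
    values = [
        local_criterion(bias_fn(design, p), variance_fn(design, p))
        for p in prior_draws
    ]
    return sum(values) / len(values)


# Toy stand-ins: the bias equals the parameter, the variance is constant.
toy_value = bayesian_criterion(
    design=None,
    prior_draws=[0.0, 1.0, 2.0],
    bias_fn=lambda design, p: p,
    variance_fn=lambda design, p: 1.0,
)
```

For a uniform prior, equally spaced grid points or independent uniform draws can serve as `prior_draws`.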

The Bayesian optimal design still depends on the choice of the prior distribution Inline graphic. In some applications there is prior knowledge available that can be used for modelling the prior distribution. A typical example is dose finding studies, where information from pharmaceutical development or from a phase I trial can be used (Dette et al., 2008). Alternatively, noninformative priors can be applied to obtain efficient designs which are robust against misspecification of the parameters. The performance of the resulting optimal designs depends on how the choice of the prior distribution reflects the truth. Numerical investigations show that locally optimal designs have smaller mean squared errors than Bayesian optimal designs for model averaging estimation of the parameter Inline graphic, as long as the prior guesses of the parameters are very precise. Under misspecification of the parameters, Bayesian optimal designs generally show better performance than locally optimal designs.

Locally and Bayesian optimal designs for model averaging have to be calculated numerically in all cases of interest, and we present several examples in § 4. Next, we state necessary conditions for Inline graphic and Inline graphic optimality. The proofs are given in the Supplementary Material.

Theorem 1.

If the design Inline graphic is a locally optimal design for model averaging estimation of the parameter Inline graphic, then the inequality

Theorem 1. (22)

holds for all Inline graphic, where Inline graphic is defined by (17) and the functions Inline graphic and Inline graphic are given by

Theorem 1. (23)
Theorem 1. (24)

where Inline graphic the vector Inline graphic is defined by (19) and the information matrix Inline graphic by (12), respectively. The design Inline graphic denotes the Dirac measure at the point Inline graphic.

Moreover, there is equality in (22) for every point Inline graphic in the support of Inline graphic.

Theorem 2.

If a design Inline graphic is Bayesian optimal for model averaging estimation of the parameter Inline graphic with respect to the prior Inline graphic, then

Theorem 2. (25)

holds for all Inline graphic, where the derivatives Inline graphic and Inline graphic are given by (23) and (24), respectively. Moreover, there is equality in (25) for every point Inline graphic in the support of Inline graphic.

The derived conditions of Theorems 1 and 2 can be used in the following way: if a numerically calculated design does not satisfy inequality (22), it will not be locally optimal for model averaging estimation of the parameter Inline graphic and the search for the optimal design has to be continued. The functions Inline graphic and Inline graphic are not convex, and therefore sufficient conditions for optimality are not available.

Remark 1.

Hjort & Claeskens (2003) also considered model averaging using weights Inline graphic depending on the data Inline graphic in the definition of the estimator Inline graphic in (15). Because of this dependency they are called random weights in the literature. Typical examples are smooth AIC weights

Remark 1. (26)

based on the AIC scores Inline graphic where Inline graphic denotes the loglikelihood function of model Inline graphic evaluated at the maximum likelihood estimator Inline graphic, and Inline graphic is the number of parameters to be estimated in model Inline graphic, Inline graphic (Claeskens & Hjort, 2008, Ch. 2). Moreover, the estimator of a target Inline graphic which is based on model selection by AIC can also be rewritten in terms of a model averaging estimator by using random weights of the form

Remark 1. (27)

where Inline graphic is the indicator function of the set Inline graphic and Inline graphic denotes the model with the greatest AIC score among the candidate models. For further choices of model averaging weights see, for example, Buckland et al. (1997), Hjort & Claeskens (2003) or Hansen (2007). In general, the case of random weights in model averaging estimation is more difficult to handle and the asymptotic distribution is not normal (Claeskens & Hjort, 2008, p. 196). As a consequence, an explicit calculation of the asymptotic bias and variance is not available.
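The two kinds of random weights described in Remark 1 can be sketched in Python as follows. The sketch assumes the convention in which the AIC score is twice the maximized loglikelihood minus twice the number of parameters, so that larger scores are better, and it assumes smooth weights proportional to the exponential of half the score. The displays (26) and (27) are not legible in this copy, so both forms are assumptions based on the cited literature; the function names are illustrative.

```python
import math


def aic_scores(logliks, n_params):
    """AIC score per model, in the convention where larger is better."""
    return [2.0 * ll - 2.0 * p for ll, p in zip(logliks, n_params)]


def smooth_aic_weights(scores):
    """Weights proportional to exp(score / 2), normalised to sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(0.5 * (s - top)) for s in scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def selection_weights(scores):
    """Indicator weights: all weight on the model with the greatest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


# Illustrative loglikelihoods and parameter counts for two models.
scores = aic_scores([-10.0, -9.0], [2, 4])
smooth = smooth_aic_weights(scores)
hard = selection_weights(scores)
```

Both weight vectors depend on the data through the maximized loglikelihoods, which is why they are random weights.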

From the design perspective it therefore seems reasonable to consider the case of fixed weights, for which the asymptotic properties of model averaging estimation under local misspecification are well understood, and to determine efficient designs for this estimation technique. Moreover, we also demonstrate in § 4 and in the Supplementary Material that model averaging estimation with fixed weights often shows better performance than model averaging with smooth AIC weights, and that the optimal designs derived under the assumption of fixed weights also improve the current state of the art for model averaging using random weights.

4. Optimal designs for model averaging

4.1. Numerical calculation of optimal designs

We investigate the performance of optimal designs for model averaging estimation of a parameter Inline graphic, considering several examples from the literature and comparing the Bayesian optimal designs for model averaging estimation with commonly used uniform designs by means of a simulation study.

In our examples we calculate the derivatives in the matrices Inline graphic and Inline graphic and the vector Inline graphic analytically. This keeps the numerical errors in the computation of the criterion Inline graphic in (21) and the directional derivative Inline graphic in (25) as small as possible. Claeskens & Hjort (2008, § 6.5) give recommendations for estimating the matrices Inline graphic and Inline graphic when such an analytical expression is not available, but this might result in considerable numerical errors. Computing the inverse Inline graphic can yield nonnegligible numerical errors when the information matrix is ill-conditioned. Therefore we recommend examining the condition numbers of the matrices Inline graphic and Inline graphic thoroughly before determining the optimal designs.
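The recommended check of the condition number can be sketched with NumPy as follows. The information matrix below is the usual one for the mean parameters of a normal regression model with known variance, built from an analytic gradient of the Michaelis–Menten mean function; this form, the parameter values and the design points are illustrative assumptions and do not reproduce the matrices in (12) and (14).

```python
import numpy as np


def information_matrix(points, weights, gradient, sigma=1.0):
    """Sum of w_i * g(x_i) g(x_i)^T / sigma^2 over the support points."""
    dim = len(gradient(points[0]))
    info = np.zeros((dim, dim))
    for x, w in zip(points, weights):
        g = np.asarray(gradient(x), dtype=float)
        info += w * np.outer(g, g) / sigma**2
    return info


def mm_gradient(x, emax=1.0, ed50=2.0):
    """Analytic gradient of emax * x / (ed50 + x) with respect to (emax, ed50)."""
    return [x / (ed50 + x), -emax * x / (ed50 + x) ** 2]


# Three equally weighted support points (illustrative values).
info = information_matrix([1.0, 4.0, 8.0], [1 / 3, 1 / 3, 1 / 3], mm_gradient)
cond = np.linalg.cond(info)

# A one-point design cannot identify two parameters: the matrix is singular.
singular = information_matrix([4.0], [1.0], mm_gradient)
cond_singular = np.linalg.cond(singular)
```

A very large condition number signals that inverting the matrix may amplify rounding errors, so the criterion value computed from that inverse should be treated with caution.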

Many algorithms for the calculation of optimal designs have been developed (see, e.g., Fedorov & Leonov, 2013), and many of them build upon the directional derivative. These algorithms can be computationally less demanding than direct optimization of the criterion. Since computation of the directional derivative Inline graphic in (25) is more complicated and numerically more unstable than calculating the criterion Inline graphic in (21), we seek a direct minimization of the criterion Inline graphic with respect to the support points Inline graphic and the weights Inline graphic, Inline graphic, without using the directional derivative Inline graphic in the optimization step. Throughout, we use the COBYLA algorithm for the minimization of the criterion Inline graphic defined in (21); see Powell (1994) for details.

4.2. Estimation of EDInline graphic in the sigmoid Emax model

We consider the situation introduced in Example 1, where the underlying density is a normal distribution with variance Inline graphic and different regression functions are under consideration for the mean. More precisely, the set Inline graphic contains Inline graphic candidate models which are defined by the different mean functions (5), (6) and (7), respectively. The parameter of interest Inline graphic is Inline graphic as defined in (9), which is estimated by an appropriate model averaging estimator. The design space is the interval Inline graphic, and we assume that Inline graphic observations can be taken in the experiment.

We determine a Bayesian optimal design for model averaging estimation of Inline graphic. As the Emax model is linear in the parameters Inline graphic and Inline graphic, the optimality criterion does not depend on Inline graphic or Inline graphic and no prior information is required for these parameters. For the parameters Inline graphic we choose independent uniform priors Inline graphic and Inline graphic on the sets Inline graphic and Inline graphic, respectively, and the variance Inline graphic is fixed as Inline graphic. One can also choose a prior for Inline graphic. Finally, under the local misspecification assumption we set Inline graphic such that Inline graphic.

We first consider equal weights for the model averaging estimator, that is, Inline graphic, Inline graphic. The Bayesian optimal design for model averaging estimation of Inline graphic is

graphic file with name Equation32.gif (28)

and satisfies the necessary condition of optimality in Theorem 2, as shown in Fig. 1(a). The design Inline graphic defined by (28) would not be optimal if the inequality were not satisfied.

Fig. 1.

The function Inline graphic in (25) evaluated for (a) the design Inline graphic in (28) and (b) the design Inline graphic in (32).

In order to investigate the properties of the different designs for model averaging estimation, we have conducted a simulation study where we compare the Bayesian optimal design (28) for model averaging estimation of Inline graphic with two uniform designs,

graphic file with name Equation33.gif (29)
graphic file with name Equation34.gif (30)

which are quite popular in the presence of model uncertainty (Bornkamp et al., 2007; Schorning et al., 2016). The design Inline graphic is a uniform design with the same number of support points as the optimal design in (28), whereas the design Inline graphic is a uniform design with more support points. We also provide a comparison with two estimators commonly used in practice, namely the model averaging estimator based on smooth AIC weights defined in (26) and the estimator in the model chosen by AIC model selection, which is obtained as a model averaging estimator (15) using the weights in (27). Additionally, we investigate the estimator in the wide model (5), as proposed by a referee. For these estimators we also used observations taken according to the designs Inline graphic, Inline graphic and Inline graphic. As the approximate designs cannot be implemented directly for Inline graphic observations, a rounding procedure (see, e.g., Pukelsheim, 2006, Ch. 12) is applied to determine the number Inline graphic of observations taken at Inline graphic such that we have in total Inline graphic observations. For example, the implemented design obtained from the Bayesian optimal design Inline graphic for model averaging estimation of Inline graphic uses Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic observations at the points Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic, respectively, and implementable versions of the designs Inline graphic and Inline graphic are obtained similarly.

All the results presented here are based on Inline graphic simulation runs each generating Inline graphic observations of the form

graphic file with name Equation35.gif (31)

for the different designs, where the Inline graphic are independent standard normal random variables; different combinations of the true parameters Inline graphic in (31) are investigated, whereas Inline graphic is fixed. In the following discussion we will restrict ourselves to presenting the results for the parameters Inline graphic, Inline graphic. This is the parameter combination under the local misspecification assumption for Inline graphic, Inline graphic and Inline graphic. Further simulation results for other parameter combinations can be found in the Supplementary Material.

In each simulation run the parameter Inline graphic is estimated by model averaging using the different designs, and the mean squared error is calculated from all Inline graphic simulation runs. More precisely, if Inline graphic is the model averaging estimator for the parameter of interest Inline graphic based on the observations Inline graphic from model (31) with the design Inline graphic, its mean squared error is Inline graphic where Inline graphic is the Inline graphic in the true sigmoid Emax model (31) with parameters Inline graphic. The simulated mean squared error of the model averaging estimator with equal weights Inline graphic, Inline graphic, for the different designs Inline graphic, Inline graphic and Inline graphic is shown in the left column of Table 1. The second column of this table shows the mean squared error of the model averaging estimator with the smooth AIC weights in (26), while the third column gives the corresponding results for the weights in (27), that is, estimation of Inline graphic in the model identified by the AIC for the different designs. Finally, the right column presents the results using the estimator in the wide model (5). The numbers marked with an asterisk in each column correspond to the smallest mean squared error obtained from the three designs.
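The structure of such a simulation can be sketched in Python as follows. Responses are generated as the mean function at each design point plus independent normal noise, an estimator is applied to every simulated data set, and the squared errors are averaged. The `estimator` argument is a hypothetical callable; the toy estimator in the usage lines is a plain sample mean and stands in for the model averaging estimator, whose maximum likelihood fits are not shown.

```python
import random


def simulate_responses(points, counts, mean_fn, sigma, rng):
    """Draw (x_i, y_ij) pairs with y_ij = mean_fn(x_i) + sigma * eps_ij."""
    return [
        (x, mean_fn(x) + sigma * rng.gauss(0.0, 1.0))
        for x, n_i in zip(points, counts)
        for _ in range(n_i)
    ]


def simulated_mse(estimator, true_value, points, counts, mean_fn, sigma,
                  runs, seed=0):
    """Average squared error of estimator(data) over repeated simulations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        data = simulate_responses(points, counts, mean_fn, sigma, rng)
        total += (estimator(data) - true_value) ** 2
    return total / runs


# Toy usage: estimate the mean response at a single point by the sample mean.
def mean_fn(x):
    return x / (2.0 + x)


def sample_mean(data):
    return sum(y for _, y in data) / len(data)


mse = simulated_mse(sample_mean, mean_fn(4.0), [4.0], [25], mean_fn,
                    sigma=1.0, runs=2000)
```

In the toy usage the simulated value should be close to the variance of a sample mean of 25 observations with unit variance, that is, about 0.04.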

Table 1.

The mean squared error of the model averaging estimators of Inline graphic for the Bayesian optimal design (28), the uniform design defined in (29), and the uniform design defined in (30)

                  Estimation method
Design            Fixed weights   Smooth aic weights   Model selection   Wide model
Inline graphic    0.355*          0.508*               0.596*            0.675*
Inline graphic    0.810           0.913                1.017             1.267
Inline graphic    0.637           0.846                0.994             1.121

Fixed weights: Inline graphic

We observe that model averaging always yields a smaller mean squared error than estimation in the model identified by the aic. For example, as shown in the first row in Table 1, if the design Inline graphic is used then the mean squared error of the estimator based on model selection is Inline graphic, whereas it is Inline graphic and Inline graphic for the model averaging estimator using equal weights and smooth aic weights, respectively. The situation for the non-optimal uniform designs is similar. These results, and also further simulation results in the Supplementary Material, coincide with the findings of Schorning et al. (2016), Aoki et al. (2017) and Buatois et al. (2018), and indicate that model averaging usually yields more precise estimates of the target than estimators based on model selection. Moreover, model averaging estimation with equal weights shows substantially better performance than the model averaging estimator with data-driven weights. Wagner & Hlouskova (2015) observed a similar effect in the context of principal components augmented regressions. A potential explanation of this observation is that the smooth aic weights might not be optimal for model averaging estimation in this situation. There is a vast amount of literature on the optimal choice of model weights such that the model averaging estimator has minimum mean squared error (Hjort & Claeskens, 2003; Hansen, 2007; Liang et al., 2011). These weights depend on the models and target parameters under consideration, and are in general data driven. Thus, considering the discussion in Remark 1, determining optimal weights prior to experimental design is infeasible. Finally, using the wide model for estimating Inline graphic yields the largest mean squared error. Even though this estimator is unbiased, it exhibits a larger variance than estimation in a smaller model so that using this estimator results in a larger mean squared error; see also Claeskens & Hjort (2008, Ch. 5) for a discussion of this issue.
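The two data-driven weighting schemes compared above can be written down compactly. The sketch below assumes that the smooth aic weights in (26) take the common exponential form and that the weights in (27) put all mass on the aic-selected model; the equations themselves are not reproduced here, so this is an assumption. It also assumes the sign convention in which a smaller aic value is better; under the opposite convention used in parts of the literature the signs flip. The function names are hypothetical.

```python
import numpy as np

def smooth_aic_weights(aic):
    """Weights proportional to exp(-aic/2), assuming smaller aic is better."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()          # shift for numerical stability; weights unchanged
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def selection_weights(aic):
    """Degenerate weights: all mass on the model with the smallest aic."""
    aic = np.asarray(aic, dtype=float)
    w = np.zeros_like(aic)
    w[np.argmin(aic)] = 1.0
    return w

# Hypothetical aic values for four candidate models.
aic_values = [102.3, 100.1, 101.0, 104.8]
print(smooth_aic_weights(aic_values))
print(selection_weights(aic_values))
```

Both schemes depend on the observed data through the aic values, which is why they cannot be fixed before the experiment is designed.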

Compared to the uniform designs Inline graphic and Inline graphic, the optimal design Inline graphic in (28) yields a reduction of the mean squared error by Inline graphic and Inline graphic for model averaging estimation with equal weights. Moreover, this design also reduces the mean squared error of model averaging estimation with smooth aic weights by Inline graphic and Inline graphic; for estimation in the model identified by the aic by Inline graphic and Inline graphic; and for estimation in the wide model by Inline graphic and Inline graphic.
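The relative reductions quoted in this paragraph can be recomputed directly from the entries of Table 1. The following sketch does the arithmetic; it assumes that the table rows follow the order of the caption (Bayesian optimal design (28), then the uniform designs (29) and (30)), and the helper name `relative_reduction` is hypothetical.

```python
# Simulated mean squared errors copied from Table 1. Each tuple is assumed to
# follow the caption order: (optimal design (28), uniform (29), uniform (30)).
table1 = {
    "fixed weights":      (0.355, 0.810, 0.637),
    "smooth aic weights": (0.508, 0.913, 0.846),
    "model selection":    (0.596, 1.017, 0.994),
    "wide model":         (0.675, 1.267, 1.121),
}

def relative_reduction(mse_new, mse_ref):
    """Percentage by which mse_new falls below the reference mse_ref."""
    return 100.0 * (mse_ref - mse_new) / mse_ref

for method, (opt, unif_a, unif_b) in table1.items():
    print(f"{method:18s} "
          f"{relative_reduction(opt, unif_a):5.1f}% "
          f"{relative_reduction(opt, unif_b):5.1f}%")
```

For model averaging with equal weights this gives reductions of about 56% and 44% relative to the two uniform designs; the rounding may differ slightly from the figures in the original text.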

As a further example we consider the model averaging estimator (15) of the parameter Inline graphic for the four models in Example 1 with unequal weights, that is, Inline graphic, Inline graphic, Inline graphic and Inline graphic. The Bayesian optimal design for model averaging estimation of Inline graphic is then

graphic file with name Equation36.gif (32)

The necessary condition is depicted in Fig. 4.2(b).

A comparison of the designs Inline graphic and Inline graphic in (28) and (32) shows that the support points are similar, but that there are substantial differences in the weights.

In the simulation study of this model averaging estimator, we consider the same parameters as in the previous example. The corresponding results can be found in Table 2 and show a similar, though less pronounced, picture to that for the model averaging estimator with uniform weights. Model averaging always shows better performance than estimation in the model selected by the aic. The improvement varies between Inline graphic and Inline graphic using fixed weights, and between Inline graphic and Inline graphic using smooth aic weights. Moreover, for the designs Inline graphic and Inline graphic we observe an improvement when using fixed weights instead of smooth aic weights for model averaging, but for the design Inline graphic there is in fact no improvement. In contrast to Table 1, estimation in the wide model is more precise than estimation after model selection via the aic when the Bayesian optimal design Inline graphic is used. However, the model averaging techniques still yield a smaller mean squared error than estimation in the wide model. A comparison of the results in Tables 1 and 2 shows that, for all designs, nonuniform weights for model averaging estimation yield a larger mean squared error than uniform weights.

Table 2.

The mean squared error of the model averaging estimators of Inline graphic for the Bayesian optimal design defined in (32), the uniform design defined in (29) and the uniform design defined in (30)

                  Estimation method
Design            Fixed weights   Smooth aic weights   Model selection   Wide model
Inline graphic    0.476*          0.502*               0.582*            0.562*
Inline graphic    0.915           0.900                1.014             1.135
Inline graphic    0.869           0.949                1.067             1.103

Fixed weights: Inline graphic, Inline graphic, Inline graphic and Inline graphic.

The Bayesian optimal design Inline graphic for model averaging estimation of Inline graphic improves on the designs Inline graphic and Inline graphic by Inline graphic and Inline graphic, respectively, if model averaging with fixed, nonuniform weights is used, and by Inline graphicInline graphic for model averaging estimation with smooth aic weights and estimation in the model selected by the aic or in the wide model, respectively.

Simulation results for further parameter combinations in the sigmoid Emax model can be found in the Supplementary Material, and show a very similar picture. We observe that in all the scenarios considered, model averaging estimation yields a smaller simulated mean squared error than estimation in a model identified by the aic, independently of the design and parameters under consideration. Bayesian optimal designs for model averaging estimation of Inline graphic yield a substantially more precise estimation than the uniform designs in almost all cases.

4.3. Estimation of the area under the curve in the logistic regression model

We now consider the case where Inline graphic is a normal density with variance Inline graphic and mean function

graphic file with name Equation37.gif (33)

This is a logistic regression model which is frequently used in dose-response or population growth modelling (Zwietering et al., 1990). The design space is Inline graphic and we are interested in estimating the area under the curve defined in (8), where the region Inline graphic and the design space Inline graphic coincide. In model (33) the value Inline graphic is the placebo effect, Inline graphic denotes the maximum effect of the drug, relative to placebo, and Inline graphic is the dose which produces half of the maximum effect. The parameter Inline graphic characterizes the slope of the mean function Inline graphic. We assume that the parameter Inline graphic is included in every candidate model, whereas the components of the parameter Inline graphic can be fixed to the corresponding components of Inline graphic such that there are Inline graphic competing models in the candidate set Inline graphic defined by different mean functions, that is,

graphic file with name Equation38.gif (34)
graphic file with name Equation39.gif (35)
graphic file with name Equation40.gif (36)

and Inline graphic is defined by (33). As the parameters Inline graphic and Inline graphic appear linear in the model, only the prior distributions for Inline graphic and Inline graphic have to be specified; these are chosen as independent uniform priors supported on the sets Inline graphic and Inline graphic, respectively. The variance Inline graphic is fixed as Inline graphic and Inline graphic is chosen such that Inline graphic.
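For a given parameter vector, the target of this subsection can be evaluated numerically. The sketch below assumes that the area under the curve in (8) is the plain integral of the mean function over the region, and that (33) is a four-parameter logistic curve of the form θ1 + θ2 / (1 + exp((θ3 − x)/θ4)). This form matches the parameter roles described above (placebo effect, maximum effect, half-effect dose, slope) but is not confirmed as the exact parameterization of (33). The function names, interval and parameter values are hypothetical.

```python
import numpy as np

def logistic_mean(x, theta):
    """Assumed four-parameter logistic mean function (illustrative form only)."""
    t1, t2, t3, t4 = theta
    return t1 + t2 / (1.0 + np.exp((t3 - x) / t4))

def auc_trapezoid(theta, a, b, n_grid=2001):
    """Area under the mean curve on [a, b] by the composite trapezoidal rule."""
    x = np.linspace(a, b, n_grid)
    y = logistic_mean(x, theta)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def auc_closed_form(theta, a, b):
    """Same integral via the antiderivative t4 * log(1 + exp((x - t3) / t4))."""
    t1, t2, t3, t4 = theta

    def antiderivative(x):
        return t4 * np.logaddexp(0.0, (x - t3) / t4)

    return float(t1 * (b - a) + t2 * (antiderivative(b) - antiderivative(a)))

# Hypothetical parameter values and interval, chosen only for illustration.
theta = (0.0, 1.0, 5.0, 1.0)
print(auc_trapezoid(theta, 0.0, 10.0), auc_closed_form(theta, 0.0, 10.0))
```

In a model averaging analysis each candidate model would supply its own fitted parameter vector, and the resulting areas would be combined with the model weights.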

The Bayesian optimal design for model averaging estimation of the area under the curve with equal weights Inline graphic has been calculated numerically and is

graphic file with name Equation41.gif (37)

The performance of the different designs is again evaluated by means of a simulation study generating data from the model

graphic file with name Equation42.gif (38)

where Inline graphic are standard normal distributed random variables and Inline graphic observations can be taken. We focus on the case Inline graphic, Inline graphic and Inline graphic, which corresponds to a local misspecification, where Inline graphic, Inline graphic and Inline graphic. Further results for other parameter choices show a similar picture and are discussed in the Supplementary Material.

The mean squared error of the model averaging estimator with equal weights Inline graphic, Inline graphic, for the different designs is given in the left column of Table 3, while the second and third columns show the corresponding results for the model averaging estimator with smooth aic weights and the estimator based on model selection, respectively. The fourth column presents the mean squared errors of the estimator of Inline graphic in the wide model (33). We observe again that model averaging improves the estimation of the target area under the curve in the model chosen by the aic in all cases under consideration. For equal weights this improvement varies between Inline graphic and Inline graphic, depending on the design, while the improvement achieved by model averaging with smooth aic weights varies between Inline graphic and Inline graphic. The model averaging estimator with equal weights performs substantially better than the procedure with smooth aic weights and the estimation in the wide model (33).

Table 3.

The mean squared error of the model averaging estimators of the area under the curve with equal weights for the Bayesian optimal design (37), the uniform design defined in (29) and the uniform design defined in (30)

                  Estimation method
Design            Fixed weights   Smooth aic weights   Model selection   Wide model
Inline graphic    1.659*          1.880                2.074             2.071
Inline graphic    1.961           2.080                2.196             2.059
Inline graphic    1.687           1.763*               1.838*            1.847*

Fixed weights: Inline graphic.

In the case of equal weights the Bayesian optimal design Inline graphic for model averaging estimation of the area under the curve yields a Inline graphic improvement over the uniform design Inline graphic, but only a Inline graphic improvement over the design Inline graphic. On the other hand, if model averaging estimates with smooth aic weights and model selection weights or the estimator in the wide model are used, the uniform design Inline graphic shows the best performance. This observation can be explained by the fact that the design Inline graphic has not been constructed for this purpose. Consequently, although this design performs very well in many cases, it cannot be guaranteed that the design Inline graphic is close to the optimal design for model averaging estimation of the area under the curve with smooth aic weights or for estimation in the wide model or a model selected by the aic. Nevertheless, model averaging with equal weights and the corresponding Bayesian optimal design yields the smallest mean squared error in all considered scenarios.

Next, we consider a model averaging estimator with nonuniform weights Inline graphic, Inline graphic, Inline graphic and Inline graphic for the models (34), (35), (36) and (33), respectively. The corresponding Bayesian optimal design for model averaging estimation of the area under the curve with these weights is

graphic file with name Equation43.gif (39)

The simulated mean squared errors of the various estimators of the area under the curve for the different designs are given in Table 4, where we use the same parameters as in the previous example. We observe similar behaviour to § 4.2: model averaging performs better than model selection, but in this situation model averaging based on smooth aic weights results in a slightly smaller mean squared error than model averaging based on fixed weights; the estimator with fixed weights yields an increase of the mean squared error of about Inline graphic. In Table 3 the estimator in the wide model shows better performance than the estimators based on random weights for some designs, whereas in Table 4 the estimator in the wide model exhibits the largest mean squared error, independent of the design under consideration. For all model averaging estimators, including the estimator after model selection, the mean squared error from the Bayesian optimal design Inline graphic defined in (39) is smaller than those obtained from the designs Inline graphic and Inline graphic.

Table 4.

The mean squared error of the model averaging estimators of the area under the curve for the Bayesian optimal design defined in (39), the uniform design defined in (29) and the uniform design defined in (30)

                  Estimation method
Design            Fixed weights   Smooth aic weights   Model selection   Wide model
Inline graphic    1.764*          1.723*               1.835*            1.978
Inline graphic    2.059           2.041                2.129             2.185
Inline graphic    1.841           1.801                1.883             1.886*

Fixed weights: Inline graphic, Inline graphic, Inline graphic and Inline graphic.

Further simulation results using other parameter combinations can be found in the Supplementary Material and show similar results. For example, model averaging shows better performance than estimation in a model identified by the aic, independent of the design under consideration. In most cases the Bayesian optimal design for model averaging estimation of the area under the curve yields a substantial improvement compared to the uniform designs, even when it is used for model averaging with smooth aic weights or for estimation after model selection.

As pointed out by a reviewer, bic weights might be a reasonable alternative to aic weights. Therefore we conducted additional simulations with the same set-up as in the preceding sections using the Bayesian information criterion. See, for example, Claeskens & Hjort (2008, Ch. 3) for details regarding the calculation of smooth bic weights and for model selection by the Bayesian information criterion. The results are very similar to those presented here for the aic and are therefore not shown. Most importantly, we do not observe any substantial differences between the mean squared errors of the model averaging estimators using smooth bic weights and of the estimators after model selection via the bic, and those of their counterparts based on the aic.

For the sake of brevity we have restricted ourselves to two examples. The performance of the proposed designs depends on many variables, such as the model under consideration, the respective prior distributions and the target parameter Inline graphic. In particular, there might be other parameter combinations resulting in less visible improvement of the mean squared error of the model averaging estimator by the Bayesian optimal design. Such an example can be found in the Supplementary Material. In most scenarios that we have investigated, the Bayesian optimal designs improved the estimation precision of the model averaging estimator based on fixed weights.

5. Discussion

In our simulation study, model averaging estimators based on fixed weights usually achieve smaller mean squared errors than the estimators using data-dependent weights. Most likely, estimation precision can be further improved using optimal data-dependent model weights that minimize the mean squared error of the estimator. However, it remains an open and very challenging question for future research to determine optimal designs for estimation methods of this type. The asymptotic distribution of these estimators is complicated and has to be simulated in general for each design under consideration, which is computationally very demanding. A further interesting direction of future research in this context is the construction and investigation of adaptive designs, which proceed in several steps, updating the information about the models and their parameters sequentially.

Finally, the extension of the methodology to nonnested models is of particular importance and seems to be an interesting future research topic. As one cannot work with local alternatives to derive the asymptotic mean squared error, the theory of model averaging estimation developed in Claeskens & Hjort (2008) has first to be extended before it can be used to develop corresponding optimality criteria. Research in this direction is currently in progress. A further interesting question addressing a general aspect of model averaging estimation is to use a minimax approach for the choice of the weight Inline graphic in the model averaging estimator.

Acknowledgement

This work was supported in part by the Deutsche Forschungsgemeinschaft and the National Institute of General Medical Sciences of the National Institutes of Health. The authors thank the referees and associate editor for their constructive comments on an earlier version of this paper.

Supplementary material

Supplementary material available at Biometrika online includes proofs of the theorems in § 3 and additional simulation results for the examples in § 4.

References

  1. Aoki, Y., Röshammar, D., Hamrén, B. & Hooker, A. C. (2017). Model selection and averaging of nonlinear mixed-effect models for robust phase III dose selection. J. Pharmacokin. Pharmacodynam. 44, 581–97.
  2. Atkinson, A. C. (2008). DT-optimum designs for model discrimination and parameter estimation. J. Statist. Plan. Infer. 138, 56–64.
  3. Atkinson, A. C., Donev, A. & Tobias, R. (2007). Optimum Experimental Designs, with SAS. Oxford: Oxford University Press.
  4. Atkinson, A. C. & Fedorov, V. V. (1975). The designs of experiments for discriminating between two rival models. Biometrika 62, 57–70.
  5. Biedermann, S., Dette, H. & Pepelyshev, A. (2006). Some robust design strategies for percentile estimation in binary response models. Can. J. Statist. 34, 603–22.
  6. Bornkamp, B. (2015). Viewpoint: Model selection uncertainty, pre-specification and model averaging. Pharmaceut. Statist. 14, 79–81.
  7. Bornkamp, B., Bretz, F., Dmitrienko, A., Enas, G., Gaydos, B., Hsu, C.-H., König, F., Krams, M., Liu, Q., Neuenschwander, B., Parke, T. & Pinheiro, J. (2007). Innovative approaches for designing and analyzing adaptive dose-ranging trials. J. Biopharm. Statist. 17, 965–95.
  8. Breiman, L. (1996). Bagging predictors. Mach. Learn. 24, 123–40.
  9. Bretz, F., Dette, H. & Pinheiro, J. (2010). Practical considerations for optimal designs in clinical dose finding studies. Statist. Med. 29, 731–42.
  10. Bretz, F., Hsu, J. & Pinheiro, J. (2008). Dose finding – a challenge in statistics. Biomet. J. 50, 480–504.
  11. Buatois, S., Ueckert, S., Frey, N., Retout, S. & Mentré, F. (2018). Comparison of model averaging and model selection in dose finding trials analyzed by nonlinear mixed effect models. AAPS J. 20, 56.
  12. Buckland, S. T., Burnham, K. P. & Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics 53, 603–18.
  13. Burnham, K. P. & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York: Springer.
  14. Chaloner, K. & Verdinelli, I. (1995). Bayesian experimental design: A review. Statist. Sci. 10, 273–304.
  15. Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press.
  16. Cornish-Bowden, A. (2012). Fundamentals of Enzyme Kinetics. Weinheim: Wiley-Blackwell.
  17. Dette, H. (1990). A generalization of D- and D1-optimal designs in polynomial regression. Ann. Statist. 18, 1784–805.
  18. Dette, H. (1997). Designing experiments with respect to ‘standardized’ optimality criteria. J. R. Statist. Soc. B 59, 97–110.
  19. Dette, H., Bretz, F., Pepelyshev, A. & Pinheiro, J. C. (2008). Optimal designs for dose finding studies. J. Am. Statist. Assoc. 103, 1225–37.
  20. Dette, H. & Titoff, S. (2009). Optimal discrimination designs. Ann. Statist. 37, 2056–82.
  21. Fedorov, V. V. & Leonov, S. L. (2013). Optimal Design for Nonlinear Response Models. Boca Raton, FL: CRC Press.
  22. Goutelle, S., Maurin, M., Rougier, F., Barbaut, X., Bourguignon, L., Ducher, M. & Maire, P. (2008). The Hill equation: A review of its capabilities in pharmacological modelling. Fundamental Clin. Pharmacol. 22, 633–48.
  23. Hansen, B. E. (2007). Least squares model averaging. Econometrica 75, 1175–89.
  24. Hjort, N. L. & Claeskens, G. (2003). Frequentist model average estimators. J. Am. Statist. Assoc. 98, 879–99.
  25. Hoeting, J. A., Madigan, D., Raftery, A. E. & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Statist. Sci. 14, 382–417.
  26. Holford, N. H. & Sheiner, L. B. (1981). Understanding the dose–effect relationship: Clinical application of pharmacokinetic–pharmacodynamic models. Clin. Pharmacokin. 6, 429–53.
  27. Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849–79.
  28. Konishi, S. & Kitagawa, G. (2008). Information Criteria and Statistical Modeling. New York: John Wiley & Sons.
  29. Konstantinou, M., Biedermann, S. & Kimber, A. (2017). Model robust designs for survival trials. Comp. Statist. Data Anal. 113, 239–50.
  30. Läuter, E. (1974). Experimental design in a class of models. Math. Operationsforsch. Statist. 5, 379–98.
  31. Lehmann, E. & Casella, G. (1998). Theory of Point Estimation. Springer Texts in Statistics. New York: Springer.
  32. Liang, H., Zou, G., Wan, A. T. K. & Zhang, X. (2011). Optimal weight choice for frequentist model average estimators. J. Am. Statist. Assoc. 106, 1053–66.
  33. López-Fidalgo, J., Tommasi, C. & Trandafir, P. C. (2007). An optimal experimental design criterion for discriminating between non-normal models. J. R. Statist. Soc. B 69, 231–42.
  34. MacDougall, J. (2006). Analysis of dose-response studies – Emax model. In Dose Finding in Drug Development. Ed. N. Ting, pp. 127–45. New York: Springer.
  35. Powell, M. J. D. (1994). A direct search optimization method that models the objective and constraint functions by linear interpolation. In Advances in Optimization and Numerical Analysis. Ed. J.-P. Hennart & S. Gomez, pp. 51–67. Dordrecht: Kluwer.
  36. Pronzato, L. & Walter, E. (1985). Robust experimental design via stochastic approximation. Math. Biosci. 75, 103–20.
  37. Pukelsheim, F. (2006). Optimal Design of Experiments. Classics in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics.
  38. Raftery, A. & Zheng, Y. (2003). Discussion: Performance of Bayesian model averaging. J. Am. Statist. Assoc. 98, 931–8.
  39. Schorning, K., Bornkamp, B., Bretz, F. & Dette, H. (2016). Model selection versus model averaging in dose finding studies. Statist. Med. 35, 4021–40.
  40. Tommasi, C. (2009). Optimal designs for both model discrimination and parameter estimation. J. Statist. Plan. Infer. 139, 4123–32.
  41. Tommasi, C. & López-Fidalgo, J. (2010). Bayesian optimum designs for discriminating between models with any distribution. Comp. Statist. Data Anal. 54, 143–50.
  42. Ucinski, D. & Bogacka, B. (2005). T-optimum designs for discrimination between two multiresponse dynamic models. J. R. Statist. Soc. B 67, 3–18.
  43. Wagner, M. & Hlouskova, J. (2015). Growth regressions, principal components augmented regressions and frequentist model averaging. Jahrbücher für Nationalökonomie und Statistik 235, 642–62.
  44. Wasserman, L. (2000). Bayesian model selection and model averaging. J. Math. Psychol. 44, 92–107.
  45. Wiens, D. (2015). Robustness of design. In Handbook of Design and Analysis of Experiments. Ed. A. Dean, M. Morris, J. Stufken & D. Bingham. Boca Raton: CRC Press.
  46. Wiens, D. P. (2009). Robust discrimination designs. J. R. Statist. Soc. B 71, 805–29.
  47. Wiens, D. P. & Xu, X. (2008). Robust prediction and extrapolation designs for misspecified generalized linear regression models. J. Statist. Plan. Infer. 138, 30–46.
  48. Zen, M.-M. & Tsai, M.-H. (2002). Some criterion-robust optimal designs for the dual problem of model discrimination and parameter estimation. Sankhya 64, 322–38.
  49. Zwietering, M. H., Jongenburger, I., Rombouts, F. M. & van 't Riet, K. (1990). Modeling of the bacterial growth curve. Appl. Envir. Microbiol. 56, 1875–81.

Articles from Biometrika are provided here courtesy of Oxford University Press