Abstract
This paper discusses the problem of determining optimal designs for regression models when the observations are dependent and taken on an interval. A complete solution of this challenging optimal design problem is given for a broad class of regression models and covariance kernels.
We propose a class of estimators which are only slightly more complicated than the ordinary least squares estimators. We then demonstrate that we can design the experiments such that asymptotically the new estimators achieve the same precision as the best linear unbiased estimator computed for the whole trajectory of the process. As a by-product we derive explicit expressions for the BLUE in the continuous time model and analytic expressions for the optimal designs in a wide class of regression models. We also demonstrate that for a finite number of observations the precision of the proposed procedure, which includes the estimator and design, is very close to the best achievable. The results are illustrated by a few numerical examples.
Keywords and Phrases: linear regression, correlated observations, signed measures, optimal design, BLUE, Gaussian processes, Doob representation
1 Introduction
Optimal design theory is a classical field of mathematical statistics with numerous applications in life sciences, physics and engineering. In many cases the use of optimal or efficient designs yields a reduction of costs, allowing statistical inference with a minimal number of experiments without losing accuracy. Most work on optimal design theory concentrates on experiments with independent observations. Under this assumption the field is very well developed and a powerful methodology for the construction of optimal designs has been established [see for example the monograph of Pukelsheim (2006)]. While important and elegant results have been derived in the case of independence, there exist numerous situations where correlation between different observations is present and these classical optimal designs are not applicable.
The theory of optimal design for correlated observations is much less developed and explicit results are only available in rare circumstances. The main difficulty lies in the fact that - in contrast to the independent case - correlations lead to non-convex optimization problems, and classical tools of convex optimization theory are not applicable. Some exact optimal designs for specific linear models have been studied in Dette et al. (2008); Kiselak and Stehlík (2008); Harman and Štulajter (2010). Because explicit solutions of optimal design problems for correlated observations are rarely available, several authors have proposed to determine optimal designs based on asymptotic arguments [see for example Sacks and Ylvisaker (1966, 1968), Bickel and Herzberg (1979), Näther (1985a), Zhigljavsky et al. (2010)], where the references differ in the asymptotic arguments used to embed the discrete (non-convex) optimization problem in a continuous (or approximate) one. However, in contrast to the uncorrelated case, this approach does not simplify the problem substantially, and due to the lack of convexity the resulting approximate optimal design problems are still extremely difficult to solve. As a consequence, optimal designs have mainly been determined analytically for the location model (in this case the optimization problems are in fact convex) and for a few one-parameter linear models [see Boltze and Näther (1982), Näther (1985a), Ch. 4, Näther (1985b), Pázman and Müller (2001) and Müller and Pázman (2003) among others]. Only recently, Dette et al. (2013) determined (asymptotic) optimal designs for least squares estimation in models with more parameters under the additional assumption that the regression functions are eigenfunctions of an integral operator associated with the covariance kernel of the error process. However, due to this assumption, the class of models for which approximate optimal designs can be determined explicitly is rather small.
The present paper provides a complete solution of this challenging optimal design problem for a broad class of regression models and covariance kernels. Roughly speaking, we determine (asymptotic) optimal designs for a slightly modified ordinary least squares estimator (OLSE), such that the new estimate and the corresponding optimal design achieve the same accuracy as the best unbiased linear estimate (BLUE) with corresponding optimal designs.
To be more precise, consider a general regression observation scheme given by
(1.1) y(tj) = θTf(tj) + ε(tj), j = 1, . . ., N,
where 𝔼[ε(tj)] = 0 and K(ti, tj) = 𝔼[ε(ti)ε(tj)] denotes the covariance between observations at the points ti and tj (i, j = 1, . . ., N), θ = (θ1, . . ., θm)T is a vector of unknown parameters, f(t) = (f1(t), . . ., fm(t))T is a vector of linearly independent functions, and the explanatory variables t1, . . ., tN vary in a compact interval, say [a, b]. Parallel to model (1.1) we also consider its continuous time version
(1.2) y(t) = θTf(t) + ε(t), t ∈ [a, b],
where the full trajectory of the process {y(t)|t ∈ [a, b]} can be observed and {ε(t)|t ∈ [a, b]} is a centered Gaussian process with covariance kernel K, i.e. K(s, t) = 𝔼[ε(s)ε(t)]. This kernel is assumed to be continuous throughout this paper.
We pay much attention to the one-parameter case and develop a general method for solving the optimal design problem in model (1.2) explicitly for the OLSE, perhaps slightly modified. The new estimate and the corresponding optimal design achieve the minimal variance among all linear estimates (obtained by the BLUE). In particular, our approach allows us to calculate this optimal variance explicitly. As a by-product we also identify the BLUE in the continuous time model (1.2). Based on these asymptotic considerations, we consider the finite sample case and suggest designs for a new estimation procedure (which is very similar to OLSE) with an efficiency very close to the best possible (obtained by the BLUE and the corresponding optimal design), for any number of observations. In doing this, we show how to implement the optimal strategies from the continuous time model in practice and demonstrate that even for very small sample sizes the loss of efficiency with respect to the best strategies based on the use of BLUE with a corresponding optimal design can be considered as negligible. We emphasize at this point that - even in the one-dimensional case - the problem of numerically calculating optimal designs for the BLUE for a fixed sample size is an extremely challenging one due to the lack of convexity of the optimization problem.
In our approach, the importance of the one-parameter design problem is also related to the fact that the optimal design problem for multi-parameter models can be reduced component-wise to problems in one-parameter models. This gives us a way to generate analytically constructed universally optimal designs for a wide range of continuous time multi-parameter models of the form (1.2). Our technique is based on the observation that for a finite number of observations we can always emulate the BLUE in model (1.1) by a different linear estimator. To achieve that, in the one-parameter models we assign not only weights but also signs to the support points of a discrete design, while in the multi-parameter case we use matrix weights. We then determine "optimal" signs and weights and consider the weak convergence of these "designs" and estimators as the sample size converges to infinity. Finally, we prove the (universal) optimality of the limits in the continuous time model (1.2).
Theoretically, we construct a sequence of designs for either the pure or a modified OLSE, say θ̂N, such that its variance or covariance matrix satisfies Var(θ̂N) → D* as the sample size N converges to infinity, where D* is the variance (if m = 1) or covariance matrix (if m > 1) of the BLUE in the continuous time model (1.2). In other words, D* is the smallest possible variance (or covariance matrix with respect to the Loewner ordering) over all unbiased linear estimators and all designs. This makes the designs derived in this paper very competitive in applications with the designs proposed by Sacks and Ylvisaker (1966) and with optimal designs constructed numerically for the BLUE (using the Brimkulov-Krug-Savanov algorithm, for example). We emphasize once again that due to non-convexity the numerical construction of optimal designs for the BLUE is extremely difficult. An additional advantage of our approach is that we can analytically compute the BLUE with the corresponding optimal variance (covariance matrix) D* in the continuous time model (1.2) and therefore monitor the proximity of different approximations to the optimal variance D* obtained by the BLUE.
The methodology developed in this paper results in a non-standard estimation and optimal design theory and consists in a delicate interplay between new linear estimators and designs in the models (1.1) and (1.2). For this reason let us briefly introduce various estimators, which we will often refer to in the following discussion. Consider the model (1.1) and suppose that N observations are taken at experimental conditions t1, . . ., tN. For the corresponding vector of observations Y = (y(t1), . . ., y(tN))T, a general weighted least squares estimator (WLSE) of θ is defined by
(1.3) θ̂WLSE = (XTWX)−1XTWY,
where X = (fj(ti))i=1,...,N, j=1,...,m is the N × m design matrix and W is some N × N matrix such that (XTWX)−1 exists. For any such W the estimator (1.3) is obviously unbiased. The covariance matrix of the estimator (1.3) is given by
(1.4) Var(θ̂WLSE) = (XTWX)−1XTWΣWTX(XTWTX)−1,
where Σ = (K(ti, tj))i,j=1,...,N is an N × N matrix of variances/covariances. For the standard WLSE the matrix W is symmetric non-negative definite; in this case θ̂WLSE minimizes the weighted sum of squares SSW(θ) = (Y − Xθ)TW(Y − Xθ) with respect to θ. Important particular cases of estimators of the form (1.3) are the OLSE, the best unbiased linear estimate (BLUE) and the signed least squares estimate (SLSE):
(1.5) θ̂OLSE = (XTX)−1XTY,
(1.6) θ̂BLUE = (XTΣ−1X)−1XTΣ−1Y,
(1.7) θ̂SLSE = (XTSX)−1XTSY.
Here S is an N×N diagonal matrix with entries +1 and −1 on the diagonal; note that if S ≠ IN then the SLSE is not a standard WLSE. While the use of BLUE and OLSE is standard, the SLSE is less common. It was introduced in Boltze and Näther (1982) and further studied in Chapter 5.3 of Näther (1985a). In the context of the present paper, the SLSE will turn out to be very useful for constructing optimal designs for OLSE and the BLUE in the model (1.2) with one parameter, where the full trajectory can be observed. Another estimate of θ, which is not a special case of the WLSE, will be introduced in Section 3 and used in the multi-parameter models.
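To fix ideas, the following minimal sketch (our Python/NumPy illustration, not code from the paper; the choices f(t) = t, the Brownian-motion kernel K(t, s) = min(t, s) on [1, 2] and the sign pattern are illustrative assumptions) evaluates the covariance formula (1.4) for the three weight matrices corresponding to (1.5)–(1.7).

```python
import numpy as np

N = 10
t = np.linspace(1.0, 2.0, N)           # design points t_1 < ... < t_N
X = t.reshape(-1, 1)                   # N x 1 design matrix for f(t) = t
Sigma = np.minimum.outer(t, t)         # Brownian motion: K(t_i, t_j) = min(t_i, t_j)

def wlse_cov(W):
    """Covariance matrix (1.4) of the WLSE (1.3) with weight matrix W."""
    A = np.linalg.inv(X.T @ W @ X) @ X.T @ W
    return A @ Sigma @ A.T

signs = np.ones(N); signs[0] = -1.0            # an illustrative +/-1 pattern
print(wlse_cov(np.eye(N)))                     # OLSE (1.5): W = I_N
print(wlse_cov(np.linalg.inv(Sigma)))          # BLUE (1.6): W = Sigma^{-1}
print(wlse_cov(np.diag(signs)))                # SLSE (1.7): W = S
```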
The remaining structure of the paper is as follows. In Section 2 we derive optimal designs for continuous time one-parameter models and discuss how to implement the designs in practice. In Section 3 we extend the results of Section 2 to multi-parameter models. In Appendix B we discuss transformations of regression models and associated designs, which are a main tool in the proofs of our results but are also of independent interest. In particular, we provide an extension of the famous Doob representation for Gaussian processes [see Doob (1949) and Mehr and McFadden (1965)], which turns out to be a very important ingredient in proving the design optimality results of Sections 2 and 3. Finally, in Appendix A we collect some auxiliary statements and proofs for the main results of this paper.
2 Optimal designs for one-parameter models
In this section we concentrate on the one-parameter model
(2.1) y(t) = θf(t) + ε(t)
on the interval [a, b] and its continuous time analogue, where 𝔼[ε(t)] = 0 and 𝔼[ε(t)ε(t′)] = K(t, t′). Our approach uses some non-standard ideas and estimators in linear models and therefore we begin this section with a careful explanation of the logic of the material.
Sect. 2.1. Under the assumption that the design space is finite we show in Lemma 2.1 that by assigning weights and signs to the observation points {t1, . . ., tN} we can construct a WLSE which is equivalent to the BLUE. Then, we derive in Corollary 2.1 an explicit form for the optimal weights for a broad class of covariance kernels, which are called triangular covariance kernels.
Sect. 2.2. We demonstrate in Theorem 2.1 that the optimal designs derived in Sect. 2.1 converge weakly to a signed measure, if the cardinality of the design space converges to infinity.
Sect. 2.3. We consider model (2.1) under the assumption that the full trajectory of the process {y(t)|t ∈ [a, b]} can be observed. For the specific case of Brownian motion, that is K(t, t′) = min{t, t′}, we prove analytically the optimality of the signed measure derived in Theorem 2.1 for the OLSE. Then, in Theorem 2.3 we establish optimality of the asymptotic measures from Theorem 2.1 for general covariance kernels. As a by-product we also identify the BLUE in the continuous time model (1.2) (in the one-dimensional case). For this purpose, we introduce a transformation which maps any regression model with a triangular covariance kernel into another model with a different triangular kernel. These transformations allow us to reduce any optimization problem to the situation considered in Theorem 2.2, which refers to the case of Brownian motion. The construction of this map is based on an extension of the celebrated Doob representation, which is developed in Appendix B.
Sect. 2.4. We provide some examples of asymptotic optimal measures for specific models.
Sect. 2.5. We introduce a practical implementation of the asymptotic theory derived in the previous sections. For a finite sample size we construct WLSE with corresponding designs which can achieve very high efficiency compared to the BLUE with corresponding optimal design. It turns out that these estimators are slightly modified OLSE, where only observations at the end-points obtain a weight (and in some cases also a sign).
Sect. 2.6. We illustrate the new methodology in several examples. In particular, we give a comparison with the best known procedures based on BLUE and show that the loss in precision for the procedures derived in this paper is negligible with our procedures being much simpler and more robust than the procedures based on BLUE.
2.1 Optimal designs for SLSE on a finite design space
In this section, we suppose that the design space for model (2.1) is finite, say 𝒯 = {t1, . . ., tN}, and demonstrate that in this case the approximate optimal designs for the SLSE (1.7) can be found explicitly. Since we consider the SLSE (1.7) rather than the OLSE (1.5), a generic approximate design on the design space 𝒯 = {t1, . . ., tN} is an arbitrary discrete signed measure ξ = {t1, . . ., tN; w1, . . ., wN}, where wi = sipi, si ∈ {−1, 1}, pi ≥ 0 (i = 1, . . ., N) and ∑i=1N pi = 1. We assume that the support t1, . . ., tN of the design is fixed but the weights p1, . . ., pN and signs s1, . . ., sN, or equivalently the signed weights wi, will be chosen to minimize the variance of the SLSE (1.7). In view of (1.4), this variance is given by
(2.2) D(ξ) = ∑i,j=1N wiwjf(ti)f(tj)K(ti, tj) / (∑i=1N wif2(ti))2.
Note that this expression coincides with the variance of the WLSE (1.3), where the matrix W is defined by W = diag(w1, . . ., wN).
We assume that f(ti) ≠ 0 for all i = 1, . . ., N. If f(tj) = 0 for some j then the point tj can be removed from the design space 𝒯 without changing the SLSE, its variance and the corresponding value D(ξ). In the above definition of the weights wi, we have ∑i=1N |wi| = ∑i=1N pi = 1. Note, however, that the value of the criterion (2.2) does not change if we change all the weights from wi to cwi (i = 1, . . ., N) for arbitrary c ≠ 0.
Despite the fact that the functional D in (2.2) is not convex as a function of (w1, . . ., wN), the problem of determining the optimal design can be easily solved by a simple application of the Cauchy-Schwarz inequality. The proof of the following lemma is given in Appendix A [see also Theorem 5.3 in Näther (1985a), where this result was proved in a slightly different form].
Lemma 2.1
Assume that the matrix Σ = (K(ti, tj))i,j=1,...,N is positive definite and f(ti) ≠ 0 for all i = 1, . . ., N. Then the optimal weights minimizing (2.2) subject to the constraint ∑i=1N |wi| = 1 are given by
(2.3) wi* = c eiTΣ−1f / f(ti), i = 1, . . ., N,
where f = (f(t1), . . ., f(tN))T, ei = (0, 0, . . ., 0, 1, 0, . . ., 0)T ∈ ℝN is the i-th unit vector, and c ≠ 0 is a normalizing constant chosen such that ∑i=1N |wi*| = 1.
Moreover, for the design ξ* with weights (2.3) we have D(ξ*) = D*, where D* = 1/(fTΣ−1f) is the variance of the BLUE (1.6) based on all observations t1, . . ., tN.
Lemma 2.1 shows, in particular, that the pair {SLS estimate, corresponding optimal design ξ*} provides an unbiased estimator with the best possible variance for the one-parameter model (2.1). This results in a WLSE of the form (1.3) with W = diag(w1*, . . ., wN*) which is BLUE. In other words, by a slight modification of the OLSE we are able to emulate the BLUE using the appropriate design or WLSE.
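The following sketch (our own illustration; f(t) = t2 + 1 and the Brownian-motion kernel are assumptions) verifies Lemma 2.1 numerically: with the signed weights (2.3), the criterion (2.2) attains the BLUE variance D* = 1/(fTΣ−1f).

```python
import numpy as np

t = np.linspace(1.0, 2.0, 10)
f = t**2 + 1.0                          # one-parameter regression function
Sigma = np.minimum.outer(t, t)          # Brownian-motion kernel
Sf = np.linalg.solve(Sigma, f)          # Sigma^{-1} f
w = Sf / f                              # optimal weights (2.3), up to scaling
w /= np.abs(w).sum()                    # normalization: sum_i |w_i| = 1
num = np.einsum('i,j,i,j,ij->', w, w, f, f, Sigma)
D = num / (w @ f**2)**2                 # criterion (2.2)
print(D, 1.0 / (f @ Sf))                # equal up to rounding: D(xi*) = D*
```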
While the statement of Lemma 2.1 holds for arbitrary kernels, we are able to determine the optimal weights more explicitly for a broad class of so-called triangular kernels, which are of the form
(2.4) K(t, t′) = u(t)v(t′) for t ≤ t′ and K(t, t′) = u(t′)v(t) for t > t′,
where u(·) and v(·) are some functions on the interval [a, b]. Note that the majority of covariance kernels considered in the literature belong to this class; see for example Näther (1985a), Zhigljavsky et al. (2010) or Harman and Štulajter (2011). The following result is a direct consequence of Lemma A.1 from Appendix A.
Corollary 2.1
Assume that the covariance kernel K(·, ·) has the form (2.4) so that the matrix Σ = (K(ti, tj))i,j=1,...,N is positive definite and has the entries K(ti, tj) = uivj for i ≤ j, where for k = 1, . . ., N we denote uk = u(tk), vk = v(tk), and also fk = f(tk), qk = uk/vk. If fi ≠ 0 (i = 1, . . ., N), the weights in (2.3) can be represented explicitly as follows:
(2.5) w1* = c (σ̃1,1f1 + σ̃1,2f2) / f1,
(2.6) wN* = c (σ̃N−1,NfN−1 + σ̃N,NfN) / fN,
(2.7) wi* = c (σ̃i,i−1fi−1 + σ̃i,ifi + σ̃i,i+1fi+1) / fi
for i = 2, . . ., N − 1. In formulas (2.5), (2.6) and (2.7), the quantity σ̃ij denotes the element in the position (i, j) of the matrix Σ−1 = (σ̃ij)i,j=1,...,N.
2.2 Weak convergence of designs
In this section, we consider the asymptotic properties of designs with weights (2.5) – (2.7). Recall that the design space is an interval, say [a, b], and that we assume a triangular covariance function of the form (2.4). According to the discussion of triangular covariance kernels provided in Section 4.1 of Appendix B, the functions u(·) and v(·) are continuous and strictly positive on the interval (a, b) and the function q(·) = u(·)/v(·) is positive, continuous and strictly increasing on (a, b). We also assume that the regression function f in (2.1) is continuous and strictly positive on the interval (a, b). We define the transformation
(2.8) Q(t) = (q(t) − q(a)) / (q(b) − q(a)), t ∈ [a, b],
and note that the function Q : [a, b] → [0, 1] is increasing on the interval [a, b] with Q(a) = 0 and Q(b) = 1, that is Q(·) is a cumulative distribution function (c.d.f.). For fixed N and i = 1, . . ., N, define zi,N = (i − 1/2)/N and the design points
(2.9) ti,N = Q−1(zi,N), i = 1, . . ., N.
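A short sketch (ours; Brownian-motion kernel assumed, so that q(t) = t) of the construction (2.8) and (2.9), inverting Q numerically on a grid:

```python
import numpy as np

a, b, N = 1.0, 2.0, 7
q = lambda s: s                          # u(t) = t, v(t) = 1, hence q(t) = t
grid = np.linspace(a, b, 10001)
Q = (q(grid) - q(a)) / (q(b) - q(a))     # c.d.f. (2.8), evaluated on the grid
z = (np.arange(1, N + 1) - 0.5) / N      # z_{i,N} = (i - 1/2)/N
t_pts = np.interp(z, Q, grid)            # design points (2.9): Q^{-1}(z_{i,N})
print(t_pts)                             # here simply a + (b - a) * z
```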
Theorem 2.1
Consider the optimal design problem for the model (2.1), where the error process ε(t) has the covariance kernel K(t, s) of the form (2.4). Assume that u(·), v(·), f(·) and q(·) are strictly positive, twice continuously differentiable functions on the interval [a, b]. Consider the sequence of signed measures
ξN = {t1,N, . . ., tN,N; w1,N, . . ., wN,N}, N ∈ ℕ,
where the support points ti,N are defined in (2.9) and the weights wi,N are assigned to these points according to the rule (2.3) of Lemma 2.1. Then the sequence of measures {ξN}N∈ℕ converges in distribution to a signed measure ξ*, which has masses
(2.10) Pa = c (h(a)/q(a) − h′(a)/q′(a)) / (f(a)v(a)) and Pb = c h′(b) / (q′(b)f(b)v(b))
at the points a and b, respectively, and the signed density
(2.11) p(t) = −c (h′(t)/q′(t))′ / (f(t)v(t)), t ∈ (a, b)
(that is, the Radon-Nikodym derivative of ξ* with respect to the Lebesgue measure) on the interval (a, b), where the function h(·) is defined by h(t) = f(t)/v(t).
The proof of Theorem 2.1 is technically complicated and therefore given in Appendix A. The constant c ≠ 0 in (2.10) and (2.11) is arbitrary. If a normalization |ξ*|([a, b]) = 1 is required, then c can be found from the normalizing condition |Pa| + |Pb| + ∫ab |p(t)| dt = 1.
Throughout this paper we write the limiting designs of Theorem 2.1 in the form
(2.12) ξ*(dt) = Pa δa(dt) + p(t)dt + Pb δb(dt),
where δa(dt) and δb(dt) are the Dirac-measures concentrated at the points a and b, respectively, and the function p(·) is defined by (2.11). Note also that under the assumptions of Theorem 2.1, the function p(·) is continuous on the interval [a, b]. In the case of Brownian motion, the limiting design of Theorem 2.1 is particularly simple.
Example 2.1
If the error process ε in model (2.1) is the Brownian motion on the interval [a, b] with 0 < a < b < ∞, then K(t, s) = min(t, s) and hence u(t) = t, v(t) = 1, q(t) = t. This implies that the limiting design of Theorem 2.1 is given by (2.12) with
(2.13) Pa = c (f(a)/a − f′(a)) / f(a), Pb = c f′(b)/f(b), p(t) = −c f″(t)/f(t), t ∈ (a, b).
2.3 Optimal designs and the BLUE
In this section we consider the continuous time model (1.2) in the case m = 1 and demonstrate that the limiting designs derived in Theorem 2.1 are in fact optimal. A linear estimator for the parameter θ in model (1.2) is defined by θ̂μ = ∫ab y(t) μ(dt), where μ is a signed measure on the interval [a, b]. Special cases include the OLSE and the SLSE, which are of the form θ̂ξ = ∫ab f(t)y(t) ξ(dt) / ∫ab f2(t) ξ(dt), where ξ is a measure or a signed measure on the interval [a, b], respectively. Note that θ̂μ is unbiased if and only if ∫ab f(t) μ(dt) = 1, while θ̂ξ is unbiased by construction. The BLUE (in the continuous time model (1.2)) minimizes
Φ(μ) = ∫ab ∫ab K(t, s) μ(dt) μ(ds)
in the class of all signed measures μ satisfying ∫ab f(t) μ(dt) = 1, and
(2.14) D* = min {Φ(μ) | ∫ab f(t) μ(dt) = 1}
denotes the best possible variance of all linear unbiased estimators in the continuous time model (1.2).
Similarly, a signed measure ξ* on the interval [a, b] is called optimal for least squares estimation in the one-parameter model (1.2), if it minimizes the functional
(2.15) D(ξ) = ∫ab ∫ab K(t, s)f(t)f(s) ξ(dt)ξ(ds) / (∫ab f2(t) ξ(dt))2
in the set of all signed measures ξ on the interval [a, b] such that ∫ab f2(t) ξ(dt) ≠ 0. In the case of a Brownian motion, we are able to establish the optimality of the design of Example 2.1. A proof of the following result is given in Appendix A.
Theorem 2.2
Let {ε(t) | t ∈ [a, b]} be a Brownian motion, so that K(t, t′) = min{t, t′}, and let f be a positive, twice continuously differentiable function on the interval [a, b] ⊂ ℝ+. Then the signed measure ξ*, defined by (2.12) and (2.13) with arbitrary c ≠ 0, minimizes the functional (2.15). The minimal value in (2.15) is obtained as
D(ξ*) = (f2(a)/a + ∫ab (f′(t))2 dt)−1.
Moreover, the BLUE in model (1.2) is given by θ̂μ*, where μ*(dt) = f(t)ξ**(dt) and ξ** is the signed measure defined by (2.12) and (2.13) with constant c* = D(ξ*). This further implies D* = D(ξ*) = Φ(μ*).
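As a numerical sanity check (our computation, not from the paper), the discrete BLUE variance 1/(fTΣ−1f) on a fine grid should approach the value D(ξ*) of Theorem 2.2; for the assumed choice f(t) = t2 + 1 on [1, 2] both quantities are close to 3/40 = 0.075.

```python
import numpy as np

a, b, N = 1.0, 2.0, 2000
t = np.linspace(a, b, N)
f = t**2 + 1.0                                  # illustrative choice of f
Sigma = np.minimum.outer(t, t)                  # K(t, s) = min(t, s)
D_N = 1.0 / (f @ np.linalg.solve(Sigma, f))     # BLUE variance for N points
D_star = 1.0 / (f[0]**2 / a + np.trapz((2.0 * t)**2, t))  # Theorem 2.2
print(D_N, D_star)                              # both close to 3/40 = 0.075
```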
Based on the design optimality established in Theorem 2.2 for the special case of Brownian motion and the technique of transformation of regression models described in Appendix B, we can establish the optimality of the asymptotic designs derived in Theorem 2.1 for more general covariance kernels; see Appendix A for the proof.
Theorem 2.3
Under the conditions of Theorem 2.1, the optimal design ξ* minimizing the functional (2.15) is defined by the formulas (2.10) – (2.12) with arbitrary c ≠ 0. The minimal value in (2.15) is obtained as
(2.16) D(ξ*) = (f̃2(q(a))/q(a) + ∫q(a)q(b) (f̃′(s))2 ds)−1,
where f̃(s) = f(q−1(s))/v(q−1(s)). Moreover, the BLUE in model (1.2) is given by θ̂μ*, where μ*(dt) = f(t)ξ**(dt), ξ** is the signed measure defined in (2.10) – (2.12) with constant c* = D(ξ*), and D* = Φ(μ*) = D(ξ*).
2.4 Examples of optimal designs
In this section, we provide the values of Pa, Pb and the function p(·) in the general expression (2.12) for the optimal designs in a number of important special cases for the one-parameter continuous time model (1.2), where the design space is 𝒯 = [a, b]. Specifically, optimal designs are given in Table 1 for the location model, in Table 2 for the linear model, in Table 3 for a quadratic model and in Table 4 for a trigonometric model. The last named model was especially chosen to demonstrate the existence of optimal designs with a density p which changes sign in the interval (a, b). In the tables several triangular covariance kernels are considered. The parameters of these covariance kernels satisfy the constraints c2 > ±c1, ∓ c2 ∉ [a, b], γ > ω, λ > 0. For the sake of a transparent presentation, we use the factor c = 1 in all tables, but we emphasize once again that the optimal designs do not depend on the scaling factor.
Table 1.
Optimal designs for the location model: f(t) = 1, t ∈ [a, b].

| u(t) | v(t) | Pa | Pb | p(t) |
|---|---|---|---|---|
| any | 1 | 1 | 0 | 0 |
| c1 + t | c2 ± t | | | 0 |
| tγ | tω | −γa−γ−ω | ωb−γ−ω | γωt−1−γ−ω |
| eλt | e−γt | λea(γ−λ) | γeb(γ−λ) | λγet(γ−λ) |
Table 2.
Optimal designs for the linear regression model through the origin: f(t) = t, t ∈ [a, b].
| u(t) | v(t) | Pa | Pb | p(t) |
|---|---|---|---|---|
| t | 1 | 0 | 1 | 0 |
| c1 + t | c2 ± t | | | 0 |
| tγ | tω | −(γ−1)a−γ−ω | (ω−1)b−γ−ω | (1−γ)(1−ω)t−1−γ−ω |
| eλt | 1 | | | |
| eλt | e−γt | | | |
Table 3.
Optimal designs for the quadratic regression model: f(t) = t2 + ν, t ∈ [a, b].
| u(t) | v(t) | Paf(a) | Pbf(b) | p(t)f(t) |
|---|---|---|---|---|
| t | 1 | (a2 − ν)/a | −2b | 2 |
| c1 + t | c2 ± t | | | 2 |
| tγ | tω | ((2−γ)a2 − γν)a−γ−ω | ((ω−2)b2 + ων)b−γ−ω | ((2−γ)(2−ω)t2 + νγω)t−1−γ−ω |
| eλt | 1 | (2a − (a2 + ν)λ)e−aλ | −2be−bλ | 2(1 − tλ)e−tλ |
| eλt | e−λt | 2a − (a2 + ν)λ | −((b2 + ν)λ + 2b) | 2 − λ2(t2 + ν) |
Table 4.
Optimal designs for the trigonometric regression model: f(t) = 1 + 0.5 sin(2πt), t ∈ [1, 2].

| u(t) | v(t) | Pa | Pb | p(t)f(t) |
|---|---|---|---|---|
| t | 1 | 1 − π | π | 2π2 sin(2πt) |
| c1 + t | c2 ± t | | | 2π2 sin(2πt) |
| t2 | t | 2 − π | (2π − 1)/8 | 2t−4((π2t2 − 1/2) sin(2πt) + πt cos(2πt) − 1) |
| eλt | 1 | (λ − π)e−λ | πe−2λ | (2π2 sin(2πt) + πλ cos(2πt))e−λt |
| eλt | e−λt | λ − π | λ + π | (2π2 + λ2/2) sin(2πt) + λ2 |
As an example, if K(t, t′) = e−λ|t−t′| for some λ > 0, then the kernel is of the form (2.4) with u(t) = eλt and v(t) = e−λt, and the optimal design for the continuous time model {θt + ε(t)|t ∈ [1, 2]} can be read off from the last row of Table 2 with γ = λ; the corresponding optimal variance D* is then obtained from (2.16).
2.5 Practical implementation: designs for finite sample size
In practice, efficient designs and corresponding estimators for the model (1.1) have to be derived from the optimal solutions in the continuous time model (1.2), and in this section a procedure with a good finite sample performance is proposed. Roughly speaking, it consists of a slight modification of the ordinary least squares estimator and a discretization of a continuous signed measure with the asymptotic optimal density in (2.11).
We assume that the experimenter can take N +2 observations with N observations inside the interval [a, b]. In principle, any probability measure on the interval can be approximated by an (N +2)-point measure with weights 1/(N +2) and similarly any finite signed measure can be approximated by an (N +2)-point signed measure with equal weights (in absolute value). We hence could use a direct approximation of the optimal signed measures of the form (2.12) by a sequence of (N +2)-point signed measures with equal weights (in absolute value). For an increasing sample size this sequence will eventually converge to the optimal measure of Theorem 2.3. However, this convergence will typically be very slow, where we measure the speed of convergence by the differences between the variances D(ξ) of the corresponding estimates and the optimal value D* defined in (2.16). The main difficulty lies in the fact that a typical optimal measure has masses at the boundary points a and b, in addition to some density on the interval (a, b). The convergence of discrete measures with equal (in absolute value) weights to such a measure will be very slow, especially in view of the fact that in our approximating measures the points cannot be repeated. Summarizing, approximation of the optimal signed measures by measures with equal weights is possible but cannot be accurate for small N.
In order to improve the rate of convergence we propose a slight modification of the ordinary least squares procedure. In particular, we propose a WLSE with weights at the points a and b (the end-points of the interval [a, b]), which correspond to the masses Pa and Pb of the asymptotic optimal design. We thus only need to approximate the continuous part of the optimal signed measure, which has a density on (a, b), by an N-point design with equal masses. To be precise, consider an optimal measure of the form (2.12). We assume that the density p(·) is not identically zero on the interval (a, b) and choose the constant c such that |Pa| + |Pb| + ∫ab |p(t)| dt = 1. Note that unless p(·) changes sign in (a, b), we can choose p(t) ≥ 0 for all t ∈ (a, b). Define φ(t) = |p(t)| for t ∈ (a, b) and denote by F(t) = ∫at φ(s) ds / ∫ab φ(s) ds the corresponding distribution function. The N-point design we use as an N-point approximation to the measure with density φ(t) is ξ̄N = {t1,N, . . ., tN,N; 1/N, . . ., 1/N}, where ti,N = F−1(zi,N) with zi,N = i/(N + 1), i = 1, 2, . . ., N. If p(t) = 0 on a sub-interval of [a, b] and F−1(zi,N) is not uniquely defined then we choose the smallest element from the set F−1(zi,N) as ti,N. Finally, the design we suggest as an (N+2)-point approximation to the optimal measure in (2.12) is
ξN+2 = Pa δa + P ξ̄N + Pb δb,
where P = 1 − |Pa| − |Pb|, ξ̄N = {t1,N, . . ., tN,N; s1,N/N, . . ., sN,N/N} and si,N = sign(p(ti,N)), i = 1, . . ., N.
The matrix W, which corresponds to the design ξN+2 and is used in the corresponding WLSE (1.3), is a diagonal matrix WN = diag(NPa, s1,N P, s2,NP, . . ., sN,NP, NPb) of size (N +2) × (N +2). The set of N +2 design points, where the observations should be taken, is given by {a, t1,N, t2,N, . . ., tN,N, b} and the resulting estimate is defined by
(2.17) θ̂ = (XTWNX)−1XTWNY.
It follows from (1.4), (2.15) and the discussion in the previous paragraph that
Var(θ̂) = D(ξN+2) → D* as N → ∞,
where D* is defined in (2.14).
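The sketch below (ours) implements this construction for one concrete case, assuming Brownian motion and f(t) = t2 + 1 on [1, 2] (this anticipates the example of Section 2.6); the masses and the density are taken from (2.13), and the reported variance of the WLSE (2.17) should be close to D* = 3/40.

```python
import numpy as np

a, b, N = 1.0, 2.0, 10
f = lambda s: s**2 + 1.0                     # f, with f' = 2s and f'' = 2

# masses and density of the limiting design (2.12)-(2.13); c < 0 is chosen
# so that |Pa| + |Pb| + int_a^b |p(t)| dt = 1
c = -1.0 / (2 * b / f(b) + 2 * (np.arctan(b) - np.arctan(a)))
Pa = c * (f(a) / a - 2 * a) / f(a)           # = 0 for this f
Pb = c * 2 * b / f(b)                        # approximately -0.55
P = 1.0 - abs(Pa) - abs(Pb)

# N interior points: quantiles of phi = |p|, p(t) = -c f''(t)/f(t) > 0 here
grid = np.linspace(a, b, 10001)
phi = -c * 2.0 / f(grid)
F = np.cumsum(phi); F = (F - F[0]) / (F[-1] - F[0])
t_in = np.interp(np.arange(1, N + 1) / (N + 1.0), F, grid)

# WLSE (2.17) with W_N = diag(N*Pa, P, ..., P, N*Pb), all signs s_i = +1
t_all = np.concatenate(([a], t_in, [b]))
X = f(t_all).reshape(-1, 1)
W = np.diag(np.concatenate(([N * Pa], P * np.ones(N), [N * Pb])))
Sigma = np.minimum.outer(t_all, t_all)
A = np.linalg.inv(X.T @ W @ X) @ X.T @ W
print((A @ Sigma @ A.T).item())              # close to D* = 3/40 = 0.075
```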
2.6 Some numerical results
Consider the regression model (2.1) with f(t) = t2 + 1, t ∈ [1, 2], where the error process is given by the Brownian motion. The optimal design for this model can be obtained from Table 3, and we have Pa = 0, Pb = −0.55, P = 0.45 and p(t) = 1.38/(t2 + 1). By computing the quantiles from the c.d.f. corresponding to p we can easily obtain the support points of the (N+2)-point designs; for example, for N = 2 the two interior support points are the 1/3- and 2/3-quantiles of this distribution.
In Figure 1 we display the variance of various linear unbiased estimators for different sample sizes. We observe that the variance of the WLSE defined by (2.17) for the proposed (N+2)-point design is slightly larger than the variance of the BLUE for the proposed (N+2)-point design, which in turn is very close to the variance of the BLUE with corresponding optimal (N+2)-point design. The calculation of these optimal designs is complicated and has been performed numerically by the Nelder-Mead algorithm in MATLAB. We also note that due to the non-convexity of the optimization problem it is not clear that the algorithm finds the optimal design. However, by Theorems 2.2 and 2.3 we determined the optimal value (2.14), which is D* ≃ 0.075004. This means that for the proposed designs the WLSE has almost the same precision as the BLUE.
Figure 1.
The variance of the WLSE defined in (2.17) for the proposed (N+2)-point designs, of the BLUE for the proposed (N+2)-point designs (grey circles) and of the BLUE with corresponding optimal (N+2)-point designs (line). The error process in model (2.1) is given by the Brownian motion and the regression function is f(t) = t2 + 1, t ∈ [1, 2].
In our second example we compare the proposed optimal designs with the designs from Sacks and Ylvisaker (1966), which are constructed for the BLUE. For this purpose we consider the model (2.1) with regression function f(t) = 1 + 0.5 sin(2πt), t ∈ [1, 2], and triangular covariance kernel of the form (2.4) with u(t) = t2 and v(t) = t. The optimal design in the continuous time model can be obtained from Table 4 and its density is depicted in Figure 2.
Figure 2.
The density of the optimal design for continuous time model (2.1) with regression function f(t) = 1+0.5 sin(2πt), t ∈ [1, 2], and covariance kernel of the form (2.4) with u(t) = t2 and v(t) = t.
By computing quantiles using this optimal design, we obtain that the 4-point design is supported at the points 1, 1.27, 1.68 and 2. For this design, the variance of the BLUE is ≃ 0.6129. Using the optimal density from Sacks and Ylvisaker (1966), we obtain the 4-point design supported at 1, 1.25, 1.63 and 2. For that design, the variance of the BLUE is ≃ 0.6200. For N = 2, 3, . . ., 20, the variances of the BLUE for the proposed (N + 2)-point designs, the (N + 2)-point designs from Sacks and Ylvisaker (1966) and the optimal (N + 2)-point designs for the BLUE are depicted in Figure 3. We observe that for N = 2, 3, 4 the new designs yield a smaller variance of the BLUE, while for N = 5 the design of Sacks and Ylvisaker (1966) shows a better performance. In all other cases the results for both designs are very similar. In particular, for N ≥ 6 the variances from the optimal (N + 2)-point designs proposed in this paper and in the paper of Sacks and Ylvisaker (1966) are only slightly worse than the variances of the BLUE with corresponding best (N + 2)-point designs (which is computed by direct optimization).
Figure 3.
The variance of BLUE for the proposed (N+2)-point designs (grey circles), the (N+2)-point designs from Sacks and Ylvisaker (1966) (crosses) and the BLUE with corresponding optimal (N + 2)-point designs (line) for the model f(t) = 1 + 0.5 sin(2πt), t ∈ [1, 2], and the covariance kernel with u(t) = t2 and v(t) = t; N = 2, . . ., 20.
3 Multi-parameter models
In this section we discuss optimal design problems for models with more than one parameter. The structure of this section is somewhat similar to the structure of Section 2. In Section 3.1 we introduce a new class of linear estimators of the parameters in model (1.1), which we call matrix-weighted estimators (MWE), and show in Lemma 3.3 that for some special choices of the matrix weights the MWE can always emulate the BLUE. In Section 3.2 matrix-weighted designs associated with the MWE are defined. Then, for the case of triangular kernels, in Corollary 3.1 we derive the asymptotic forms for the sequence of designs that are associated with the version of the MWE which emulates the BLUE. In Section 3.3 we prove optimality of the asymptotic matrix-weighted measure derived in Corollary 3.1 in the continuous time model (1.2) (see Theorem 3.1), while some examples of asymptotically optimal measures are provided in Section 3.4. Finally, the practical implementation of the asymptotic measures is discussed in Section 3.5 and numerical examples are provided in Section 3.6.
The proofs of many statements in this section use the results of Section 2. This is possible as there is a lot of freedom in choosing the form of the MWE to emulate the BLUE and we choose a special form which could be considered as component-wise SLSE. Correspondingly, the resulting matrix-weighted designs (including the asymptotic ones) become combinations of designs for one-parameter models.
3.1 Matrix-weighted estimators and designs
Consider the regression model (1.1) and assume that N observations at points tj (j = 1, . . ., N) have been made. Let Oj be an m × m matrix associated with the observation point tj, j = 1, . . ., N. Recall the definition of the N × m design matrix X = (fj(ti))i=1,...,N, j=1,...,m and of the vector Y = (y(t1), . . ., y(tN))T. We introduce the m × N matrix C = (O1f(t1), . . ., ONf(tN)), whose j-th column is Ojf(tj). Assuming that the m × m matrix
(3.1) M = CX = ∑j=1N Ojf(tj)fT(tj)
is non-singular we define the linear estimator
(3.2) θ̂MWE = M−1CY = (CX)−1CY
for the vector θ in model (1.1). We call this estimator the matrix-weighted estimator (MWE), because each column of the matrix X is multiplied by a matrix weight. It is easy to see that for any C the MWE θ̂MWE is unbiased and its covariance matrix is given by
(3.3) Var(θ̂MWE) = M−1CΣCT(MT)−1,
where Σ = (K(ti, tj))i,j=1,...,N is the N ×N matrix of covariances of the errors. Note that the matrix M defined in (3.1) generalizes the standard information matrix XTX and that M is not necessarily a symmetric matrix. The following result shows that different matrices O1, . . ., ON may yield the same matrix-weighted estimator θ̂MWE. Its proof is obvious and therefore omitted.
Lemma 3.1
Consider the regression model (1.1) and assume that the matrix M defined in (3.1) is non-singular. Then the estimator θ̂MWE defined in (3.2) coincides with the estimator θ̂MWE,Λ = (CΛX)−1CΛY, where CΛ = ΛC and Λ is an arbitrary non-singular m × m matrix.
The estimator θ̂MWE,Λ introduced in Lemma 3.1 is the MWE defined by the matrix weights ΛO1, . . ., ΛON. Lemma 3.1 implies that the θ̂MWE is exactly the same for any set of matrices {ΛO1, . . ., ΛON} as long as Λ is non-singular. In the asymptotic considerations below it will be convenient to interpret the combination of the set of experimental conditions {t1, . . ., tN} and the set of corresponding matrices {O1, . . ., ON} in the MWE as an N-point matrix-weighted design.
Definition 3.1
Any combination of N points {t1, . . ., tN} and m × m matrices {O1, . . ., ON} will be called N-point matrix-weighted design and denoted by
(3.4) ξN = {t1, . . ., tN; O1, . . ., ON}.
The covariance matrix D(ξN) of a matrix-weighted design ξN is defined as the covariance matrix Var(θ̂MWE) in (3.3) of the corresponding estimate θ̂MWE.
The estimator θ̂MWE is not necessarily a least-squares type estimator; that is, it may not be representable in the form (1.3) for some N × N weight matrix W and hence there may be no associated weighted sum of squares which is minimized by the MWE. However, for any given W, we can always find matrices Oj such that
(3.5) C = (O1f(t1), . . ., ONf(tN)) = XTW
and therefore achieve θ̂MWE = θ̂WLSE. The following result gives a constructive solution to the matrix equations (3.5).
Lemma 3.2
Assume that f1(t) ≠ 0 for all t ∈ [a, b]. Define Oj = ωje1T (j = 1, . . ., N) with
(3.6) ωj = (XTW)j / f1(tj), j = 1, . . ., N,
where e1 = (1, 0, . . ., 0)T ∈ ℝm is the first unit vector and (XTW)j denotes the j-th column of the m × N matrix XTW. Then the corresponding matrix-weighted estimator satisfies θ̂MWE = θ̂WLSE.
Proof
The matrix equation (3.5) can be written as N vector equations
(3.7) Ojf(tj) = (XTW)j, j = 1, . . ., N,
with respect to the matrices Oj. Assume that Oj = ωje1T for some ωj ∈ ℝm. Then
Ojf(tj) = ωje1Tf(tj) = f1(tj)ωj,
and equation (3.7) has the unique solutions (3.6).
The form for the matrices Oj considered in Lemma 3.2 means that the matrix Oj has the vector ωj as its first column while all other entries in this matrix are zero. We shall refer to this form as the one-column form. We can choose other forms for the matrices Oj, but then we would require different, somewhat stronger, assumptions regarding the vector f(t). For example, if f(t) ≠ (0, . . ., 0)T for all t ∈ [a, b], then we can always choose diagonal matrices Oj to satisfy (3.5) (see Lemma 3.5 below).
The following choices for Oj ensure coincidence of θ̂MWE with the three popular estimators defined in the Introduction.
If Oj = Im for all j, then θ̂MWE = θ̂OLSE.
If Oj = sjIm for all j, then θ̂MWE = θ̂SLSE.
If W = Σ−1 and Oj = ωje1T with ωj = (XTΣ−1)j/f1(tj), then θ̂MWE = θ̂BLUE.
We shall call any MWE θ̂MWE optimal if it coincides with the BLUE. In view of the importance of the last case, the corresponding result is summarized in the following lemma.
Lemma 3.3
Consider the regression model (1.1) and let f1(t) ≠ 0 for all t ∈ [a, b]. For a given set of N observation points {t1, . . ., tN} the MWE θ̂MWE defines a BLUE if Oj = ωj*e1T with ωj* = (XTΣ−1)j/f1(tj), j = 1, . . ., N.
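A numerical illustration (our code; f(t) = (1, t, t2)T and the Brownian-motion kernel are assumptions) of Lemma 3.3: building C from the one-column weights Oj = ωj*e1T reproduces the covariance matrix of the BLUE.

```python
import numpy as np

t = np.linspace(1.0, 2.0, 12)
X = np.column_stack([np.ones_like(t), t, t**2])   # rows f(t_j)^T, with f_1 = 1
Sigma = np.minimum.outer(t, t)                    # K(t_i, t_j) = min(t_i, t_j)
B = X.T @ np.linalg.inv(Sigma)                    # m x N matrix X^T Sigma^{-1}
m, N = B.shape
e1 = np.zeros(m); e1[0] = 1.0

cols = []
for j in range(N):
    omega = B[:, j] / X[j, 0]                     # omega_j* = (X^T Sigma^{-1})_j / f_1(t_j)
    O_j = np.outer(omega, e1)                     # one-column matrix weight
    cols.append(O_j @ X[j, :])                    # j-th column O_j f(t_j) of C
C = np.column_stack(cols)

M = C @ X                                         # generalized information matrix (3.1)
cov_mwe = np.linalg.inv(M) @ C @ Sigma @ C.T @ np.linalg.inv(M).T
cov_blue = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))
print(np.allclose(cov_mwe, cov_blue))             # True: the MWE emulates the BLUE
```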
If the covariance kernel of the error process has triangular form (2.4) then we can derive the explicit form for the optimal MWE. The result follows by a direct application of Lemma A.1.
Lemma 3.4
Assume that the covariance kernel K(·, ·) has the form (2.4) and that the matrix Σ = (K(ti, tj))i,j=1,...,N is positive definite with entries K(ti, tj) = uivj for i ≤ j, where for k = 1, . . ., N we denote uk = u(tk), vk = v(tk) and qk = uk/vk. Then we have the following representation for the optimal vectors introduced in Lemma 3.3:
(3.8) ω1* = (σ̃1,1f(t1) + σ̃1,2f(t2)) / f1(t1),
(3.9) ωN* = (σ̃N−1,Nf(tN−1) + σ̃N,Nf(tN)) / f1(tN),
(3.10) ωi* = (σ̃i,i−1f(ti−1) + σ̃i,if(ti) + σ̃i,i+1f(ti+1)) / f1(ti)
for i = 2, . . ., N − 1. Here in formulas (3.8), (3.9) and (3.10) σ̃ij denote the elements of the matrix Σ−1 = (σ̃ij)i,j=1,...,N.
The following provides a result similar to Lemmas 3.2 and 3.3 in the case where the matrices Oj are diagonal. An extension of Lemma 3.4 to the matrices Oj of the diagonal form is straightforward and omitted for the sake of brevity.
Lemma 3.5
Consider the regression model (1.1) and let fk(t) ≠ 0 for all t ∈ [a, b] and all k = 1, . . ., m. For each j = 1, . . ., N, define the diagonal matrix Oj by its diagonal elements
(Oj)k,k = (XTW)k,j / fk(tj), k = 1, . . ., m,
where (XTW)k,j denotes the (k, j)-th element of the matrix XTW. Then θ̂MWE = θ̂WLSE. If additionally W = Σ−1 so that (Oj)k,k = (XTΣ−1)k,j/fk(tj), then θ̂MWE = θ̂BLUE.
3.2 Weak convergence of matrix-weighted designs
Let Q : [a, b] → [0, 1] be an increasing function on the interval [a, b] with Q(a) = 0 and Q(b) = 1 so that Q(·) is a c.d.f. For a fixed N and j = 1, . . ., N, define the points t1,N, . . ., tN,N by (2.9). Suppose that with each t ∈ [a, b] we can associate an m × m matrix O(t) and consider an N-point matrix-weighted design ξN of the form (3.4) with tj = tj,N and Oj = O(tj,N). In view of (3.1) and (3.3) this design has the covariance matrix
D(ξN) = M−1(ξN)B(ξN)(M−1(ξN))T,
where the matrices M(ξN) and B(ξN) are defined by
M(ξN) = (1/N) ∑j=1N O(tj,N)f(tj,N)fT(tj,N) and B(ξN) = (1/N2) ∑i=1N ∑j=1N O(ti,N)f(ti,N)K(ti,N, tj,N)fT(tj,N)OT(tj,N).
In addition to the sequence of matrix-weighted designs ξN consider the sequence of uniform distributions on the set {t1,N, . . ., tN,N}. As N → ∞, this sequence converges weakly to the design (probability measure) ζ on the interval [a, b] with distribution function Q. This implies
M(ξN) → M(ξ) = ∫ab O(t)f(t)fT(t) ζ(dt), B(ξN) → B(ξ) = ∫ab ∫ab O(t)f(t)K(t, s)fT(s)OT(s) ζ(dt)ζ(ds),
and
(3.11) D(ξN) → D(ξ) = M−1(ξ)B(ξ)(M−1(ξ))T
under the assumptions that the vector-valued function f, the matrix-valued function O and the kernel K are continuous on the interval [a, b] and the generalized information matrix M(ξ) is non-singular. Moreover, the sequence of estimators (3.2) converges (almost surely as N → ∞) to
(3.12) θ̂MWE(ξ) = M−1(ξ) ∫ab O(t)f(t)y(t) ζ(dt),
where {y(t) | t ∈ [a, b]} is the stochastic process in the continuous time model (1.2). Bearing these limiting expressions in mind we say that the sequence of matrix-weighted designs ξN defined by (3.4) converges to the limiting matrix-weighted design ξ(dt) = O(t)ζ(dt) as N → ∞. This relation justifies the notation M(ξ), B(ξ) and D(ξ) of the previous paragraph.
The (optimal) limiting matrix-weighted designs which will be constructed below will have a similar structure as the signed measures in (2.12). They will assign matrix weights Oa and Ob to the end-points of the interval [a, b] and a ‘matrix density’ O(t) to the points t ∈ (a, b); that is, these designs will have the form
(3.13) ξ(dt) = Oa δa(dt) + O(t)dt + Ob δb(dt).
In view of (3.12), the MWE in the continuous time model (1.2) associated with any design of the form (3.13) can be written as
(3.14) θ̂MWE(ξ) = M−1(ξ)(Oaf(a)y(a) + ∫ab O(t)f(t)y(t) dt + Obf(b)y(b)),
where M(ξ) = Oaf(a)fT(a) + ∫ab O(t)f(t)fT(t) dt + Obf(b)fT(b). In the particular case associated with Lemma 3.4, we have the following structure of the matrices Oa and Ob and the matrix function O(t) in (3.13):
(3.15) Oa = ωae1T, Ob = ωbe1T, O(t) = ω(t)e1T, t ∈ (a, b),
where ωa and ωb are some m-dimensional vectors and ω(t) ∈ ℝm is some vector-valued function defined on the interval (a, b). Note that ω(t) does not have to approach ωa and ωb as t → a and t → b, respectively.
When the sequence of matrix-weighted designs is defined by the formulas of Lemma 3.3 we can compute the limiting matrix-weighted design. The proof follows by similar arguments as given in the proof of Theorem 2.1 and is therefore omitted.
Corollary 3.1
Consider model (1.1), where the error process {ε(t)| t ∈ [a, b]} has a covariance kernel K of the form (2.4). Assume that u(·), v(·), q(·) are strictly positive, twice continuously differentiable functions on the interval [a, b] and that the vector-valued function f(·) is twice continuously differentiable with f1(t) ≠ 0 for all t ∈ [a, b]. Consider the matrix-weighted design ξN of the form (3.4), where the support points tj = tj,N are generated by (2.9) and the matrix weights Oj = Oj,N are defined in Lemma 3.3. The sequence {ξN}N∈N converges (in the sense defined above in the previous paragraph) to a matrix-weighted design ξ defined by (3.13) and (3.15) with
(3.16) ωa = c (h(a)/q(a) − h′(a)/q′(a)) / (f1(a)v(a)), ωb = c h′(b) / (q′(b)f1(b)v(b)), ω(t) = −c (h′(t)/q′(t))′ / (f1(t)v(t)),
where h(t) = f(t)/v(t) and the constant c ≠ 0 is arbitrary.
In Corollary 3.1, the one-column representation of the matrices Oj is used. The following statement contains a similar result for the case where the matrices Oj are diagonal.
Corollary 3.2
Let the conditions of Corollary 3.1 hold and assume additionally that fk(t) ≠ 0 for all t ∈ [a, b] and all k = 1, . . ., m. Consider the matrix-weighted design ξN of the form (3.4), where the support points tj = tj,N are generated by (2.9) and the matrices Oj = Oj,N are defined in Lemma 3.5 with diagonal elements given by (Oj)k,k = (XTΣ−1)k,j/fk(tj). Then the sequence {ξN}N∈ℕ converges to the optimal matrix-weighted design ξ* of the form (3.13), where the diagonal elements of the matrices Oa = diag(Oa,11, . . ., Oa,mm), Ob = diag(Ob,11, . . ., Ob,mm) and O(t) = diag(O11(t), . . ., Omm(t)) are given by
Oa,kk = c (hk(a)/q(a) − hk′(a)/q′(a)) / (fk(a)v(a)), Ob,kk = c hk′(b) / (q′(b)fk(b)v(b)), Okk(t) = −c (hk′(t)/q′(t))′ / (fk(t)v(t)),
respectively, where hk(t) = fk(t)/v(t), k = 1, . . ., m, and the constant c ≠ 0 is arbitrary.
3.3 Optimal designs and best linear estimators
In this section we consider again the continuous time model (1.2), where the full trajectory of the process {y(t)|t ∈ [a, b]} can be observed. We start by recalling some known facts concerning best linear unbiased estimation. For details we refer the interested reader to the work of Grenander (1950) or Section 2.2 in Näther (1985a). Any linear estimator of θ can be written in the form of the integral
(3.17) θ̂μ = ∫ab y(t) μ(dt),
where μ(t) = (μ1(t), . . ., μm(t))T is a vector of signed measures on the interval [a, b]. For given μ, the estimator θ̂μ is unbiased if and only if ∫ab f(t) μT(dt) = Im, where Im denotes the m-dimensional identity matrix. Theorem 2.3 in Näther (1985a) states that the estimator θ̂μ* is BLUE if and only if θ̂μ* is unbiased and the identity
∫ab K(t, u) μ*(dt) = Af(u)
holds for all u ∈ [a, b], where A is some m × m matrix. The matrix A is uniquely defined and coincides with the matrix
(3.18) D* = Var(θ̂μ*) = ∫ab ∫ab K(t, s) μ*(dt)(μ*(ds))T.
The Gauss-Markov theorem further implies that D* ≤ Var(θ̂), where θ̂ is any other linear unbiased estimator of θ.
Definition 3.2
A matrix-weighted design ξ* is called optimal if D(ξ*) = D*, where D(ξ) is defined in (3.11) and D* is defined in (3.18).
The designs we consider have the form (3.13) and the corresponding MWE are expressed by (3.14). The estimator (3.14) can be expressed in the form (3.17), that is θ̂MWE(ξ) = θ̂μ with
μ(dt) = M−1(ξ)(Oaf(a)δa(dt) + O(t)f(t)dt + Obf(b)δb(dt)).
The estimators defined in (3.14) are always unbiased and the following result provides the matrix-weighted optimal design and the BLUE in the continuous time model (1.2). The proof follows by similar arguments as given in the proof of Theorem 2.2 and 2.3 and is therefore omitted.
Theorem 3.1
Let K(t, s) be a covariance kernel of the form (2.4) and the vector-function f(·) be twice continuously differentiable with f1(t) ≠ 0 for all t ∈ [a, b]. Under the assumptions of Corollary 3.1 the matrix-weighted design ξ* defined by the formulas (3.13) and (3.16) with c = 1 is optimal in the sense of Definition 3.2. Moreover, if
μ*(dt) = M−1(ξ*)(Oaf(a)δa(dt) + O(t)f(t)dt + Obf(b)δb(dt)),
then θ̂μ* defines the BLUE in model (1.2). Additionally, we have
D* = D(ξ*) = M−1(ξ*),
where the matrix M(ξ*) is given by
M(ξ*) = f̃(q(a))f̃T(q(a))/q(a) + ∫q(a)q(b) f̃′(s)(f̃′(s))T ds,
and f̃(s) = f(q−1(s))/v(q−1(s)).
In Theorem 3.1 we have used the one-column representation for the matrices O(t). Similar arguments establish the optimality of the matrix-weighted designs ξ* defined in Corollary 3.2 where the diagonal representation for the matrices O(t) is used. The details are omitted for the sake of brevity.
3.4 Examples of optimal matrix-weighted designs
Consider the polynomial regression model with f(t) = (1, t, t2, . . . , tm−1)T, t ∈ [a, b] and the covariance kernel of the Brownian motion K(t, s) = min(t, s). For the construction of matrix-weighted designs we use matrices O(t) in the one-column and diagonal representations.
For the one-column representation we have from Corollary 3.1 and Theorem 3.1 that the optimal matrix-weighted design has masses Oa = ωae1T and Ob = ωbe1T at the points a and b, respectively, and the density O(t) = ω(t)e1T. Here (with c = 1) the vectors ωa, ωb and ω(t) are given by
ωa = f(a)/a − f′(a), ωb = f′(b), ω(t) = −f″(t),
respectively. For the diagonal representation we have from Corollary 3.2 (and an analogue of Theorem 3.1) that the optimal matrix-weighted design has masses Oa and Ob at the points a and b, respectively, and the density O(t), where
Oa = a−1 diag(1, 0, −1, . . ., 2 − m), Ob = b−1 diag(0, 1, 2, . . ., m − 1), O(t) = −t−2 diag(0, 0, 2, 6, . . ., (m − 1)(m − 2)).
Note that in this case all non-vanishing diagonal elements of the matrix O(t) are proportional to the function 1/t2. According to Lemma 3.1, we can use ΛO(t) instead of O(t), for any non-singular m × m matrix Λ. By taking the matrix
Λ = −diag(1, 1, 1/2, 1/6, . . ., 1/((m − 1)(m − 2)))
we obtain
ΛO(t) = t−2 diag(0, 0, 1, 1, . . ., 1), t ∈ (a, b).
As another example, consider the polynomial regression model with f(t) = (1, t, t2, . . ., tm−1)T, t ∈ [a, b], and the triangular covariance kernel of the form (2.4) with u(t) = tγ and v(t) = tω. For the diagonal representation we have from Corollary 3.2 that the optimal matrix-weighted design has masses Oa and Ob at the points a and b, respectively, and the density O(t), where
Oa = a−γ−ω diag(−γ, 1 − γ, . . ., m − 1 − γ), Ob = b−γ−ω diag(ω, ω − 1, . . ., ω − m + 1), O(t) = t−1−γ−ω diag(τ1, τ2, . . ., τm),
with τi = (i − 1 − γ)(i − 1 − ω), i = 1, . . ., m. If we further use Λ = diag(1/τ1, 1/τ2, . . ., 1/τm) then we obtain ΛO(t) = t−1−γ−ω diag(1, 1, . . ., 1), t ∈ (a, b); that is, all components of the matrix ΛO(t) have exactly the same density.
3.5 Practical implementation
Here we only consider the diagonal representation of the matrices Oa, Ob and O(t); the case of the one-column representation of the matrices O can be treated similarly. We assign matrix weights Oa and Ob to the boundary points a and b and use an N-point approximation to an absolutely continuous probability measure on (a, b) with some density φ(t). The density φ(t) is defined to be either the uniform density on (a, b) (if nonzero elements of different components of O(t) are not proportional to each other) or φ(t) = c|Ol,l(t)| for some l ∈ {1, . . ., m} (if nonzero elements of different components of O(t) are proportional to each other), where c is the normalization constant and l is such that the density φ(t) is not identically zero on the interval (a, b). Denote by F(t) = ∫at φ(s) ds the corresponding c.d.f. For given N, we calculate an N-point approximation {t1,N, . . ., tN,N; 1/N, . . ., 1/N}, where ti,N = F−1(zi,N) with zi,N = i/(N + 1), i = 1, 2, . . ., N, to the probability measure with density φ(t).
To each point tj,N we assign a vector of weights sj = (sj,N,1, . . ., sj,N,m)T such that sj,N,k ∈ {−1, 0, 1} (k = 1, . . ., m). The values sj,N,k = sign(Ok,k(tj)) = ±1 correspond to the sign of the point tj,N in the estimation of θk exactly as in the procedure for one-parameter models described in Section 2.5. Some of the values sj,N,k could be 0. If sj,N,k = 0 for some k then the point tj,N is not used for the estimation of θk. By assigning zero weight to a point tj,N in the k-th estimation direction, we perform a thinning of the sample of points t1,N, . . ., tN,N in the k-th direction and thus achieve the required density in each estimation direction. This is a deterministic version of the well-known 'rejection method' widely used to generate samples from various probability distributions; a small sketch of this thinning rule is given at the end of this subsection. If the nonzero components of the matrix weight O(t) are proportional to each other then for these components sj,N,k = 1 for all j and N.
The resulting estimator θ̂ has the form (3.2), where the matrix C is built from the matrix weights NOa and NOb at the boundary points and the matrices P diag(sj,N,1, . . ., sj,N,m) at the interior points tj,N, and P is the diagonal m × m matrix whose diagonal elements Pk,k = 1 − |Oa,kk| − |Ob,kk| give the mass available for the interior of the interval in the k-th estimation direction.
If nonzero elements of different components of the matrix weight O(t) are proportional to each other (as was the case in the examples of Section 3.4) then the (N+2)-point approximations to the limiting design are very similar to the approximations in the one-parameter case considered in Section 2.5; their accuracy is also very high. Otherwise, when the diagonal elements of O(t) are possibly non-proportional, the accuracy of approximations will depend on the degree of non-homogeneity of components of the matrix weight O(t).
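The following sketch (ours) shows one concrete version of this deterministic thinning for the diagonal element O3,3(t) = 1 − 2/t2 appearing in the example of Section 3.6. The acceptance rule (keep a point whenever the cumulative target mass crosses a new integer) is our implementation choice and is not taken verbatim from the paper.

```python
import numpy as np

a, b, N = 1.0, 2.0, 30
t = a + np.arange(1, N + 1) / (N + 1.0)       # equidistant interior points
O33 = 1.0 - 2.0 / t**2                        # a non-constant diagonal element
ratio = np.abs(O33) / np.abs(O33).max()       # relative target density, direction k = 3
crossed = np.floor(np.cumsum(ratio))
keep = np.diff(np.concatenate(([0.0], crossed))) > 0
s3 = np.sign(O33) * keep                      # sign-vector entries in {-1, 0, 1}
print(s3.astype(int))
```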
3.6 Some numerical results
For the comparison of competing matrix-weighted designs for multi-parameter models it is convenient to consider a functional of the covariance matrix. As an example, we investigate in this section the classical D-optimality criterion defined as Ψ(D(ξ)) = (det D(ξ))1/m, which has to be minimized.
As an example where all nonzero elements of the matrix O(t) are proportional to each other, let us consider the cubic regression model with f(t) = (1, t, t2, t3)T and the Brownian motion error process. The optimal value in the continuous time model (1.2) is Ψ (D*) ≃ 2.7927. In Figure 4 we display the D-criterion of the covariance matrices of the MWE and the BLUE for the proposed (N+2)-point designs and the covariance matrix of the BLUE with corresponding optimal (N+2)-point designs. We can see that the D-efficiency of the proposed matrix-weighted design is very high, even for small N.
Figure 4.
The D-optimality criterion of the covariance matrix of the MWE for the proposed (N+2)-point designs (crosses), of the covariance matrix of the BLUE for the proposed (N+2)-point designs (line) and of the covariance matrix of the BLUE with corresponding D-optimal (N +2)-point designs (grey circles). The error process in model (1.1) is the Brownian motion and the vector of regression functions is given by f(t) = (1, t, t2, t3), t ∈ [1, 2].
The second example of this section considers a situation where the nonzero elements of the matrix O(t) are not proportional to each other. For this purpose we consider the model (1.1) with f(t) = (1, t, t2)T, t ∈ [1, 2], and covariance kernel K(t, t′) = e−|t−t′| of the form (2.4) with u(t) = et and v(t) = e−t. Using the diagonal representation (with c = 2), we obtain for the optimal matrix-weighted design
Oa = diag(1, 0, −1), Ob = diag(1, 3/2, 2), O(t) = diag(1, 1, 1 − 2/t2).
The optimal value in the continuous time model (1.2) is given by Ψ(D*) ≃ 1.6779. Since some diagonal elements of O(t) are constant functions, we take the support points of the design ξN+2 to be equidistant: ti,N = 1 + i/(N + 1) for i = 1, . . ., N. Then we have sj,N,k = 1 for all j = 1, . . ., N and k = 1, 2. However, some elements of (s1,N,3, . . ., sN,N,3) should be zero because O3,3(t) is not proportional to O1,1(t). For example, for N = 10 the vector of signs (s1,N,3, . . ., sN,N,3) is (−1, −1, 0, 0, 0, 1, 0, 0, 1, 0) and for N = 30 it is (−1, 0, −1, −1, 0, −1, 0, 0, −1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1).
In Figure 5 we depict the D-optimality criterion of the covariance matrices for various estimators. We observe that in this example for all N the D-optimality criterion of the covariance matrices of the MWE is slightly larger than the D-optimality criterion of the covariance matrices of the BLUE. However, we can also see that the proposed (N +2)-point designs are very efficient compared to the BLUE with corresponding D-optimal (N +2)-point designs even for small N.
Figure 5.
The D-optimality criterion of the covariance matrix of the MWE for the proposed (N+2)-point designs (crosses), of the covariance matrix of the BLUE for the proposed (N+2)-point designs (line) and of the BLUE with corresponding D-optimal (N+2)-point designs (grey circles). The covariance kernel in model (1.1) is K(t, t′) = e−|t−t′| and the vector of regression functions is f(t) = (1, t, t2), t ∈ [1, 2].
Acknowledgments
This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt C2) of the German Research Foundation (DFG). The research of H. Dette reported in this publication was also partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would also like to thank Martina Stein who typed parts of this paper with considerable technical expertise. The work of Andrey Pepelyshev was partly supported by Russian Foundation of Basic Research, project 12-01-00747.
A Proof of main results
A.1 Explicit form of the inverse of the covariance matrix of errors
Here we state an auxiliary result, which gives an explicit form for the inverse of the matrix Σ = (K(ti, tj))i,j=1,...,N with a triangular covariance kernel K. We did not find this result (as formulated below) in the literature. Versions of Lemma A.1, however, have been derived independently by different authors; see, for example, Lemma 7.3.2 in Zhigljavsky (1991) and formula (8) in Harman and Štulajter (2011). The proof follows by straightforwardly checking the condition Σ−1Σ = ΣΣ−1 = I.
Lemma A.1
Consider a symmetric N × N matrix Σ = (σi,j)i,j=1,...,N whose elements are defined by the formula σi,j = uivj for 1 ≤ i ≤ j ≤ N. Assume that q1 < q2 < . . . < qN, where qi = ui/vi. Then the inverse matrix Σ̃ = Σ−1 is a symmetric tri-diagonal matrix and its elements σ̃i,j with i ≤ j can be computed as follows:
σ̃i,i+1 = −1 / (vivi+1(qi+1 − qi)), i = 1, . . ., N − 1,
σ̃1,1 = q2 / (v12q1(q2 − q1)),
σ̃i,i = (qi+1 − qi−1) / (vi2(qi − qi−1)(qi+1 − qi)), i = 2, . . ., N − 1,
σ̃N,N = 1 / (vN2(qN − qN−1)),
and σ̃i,j = 0 whenever j > i + 1.
In our applications of Lemma A.1 we assume that σi,j = K(ti, tj) with the covariance kernel K having the form (2.4).
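The closed form of Lemma A.1 is easy to check numerically; the sketch below (our code) does so for the exponential kernel K(t, t′) = e−λ|t−t′| with λ = 1, that is u(t) = et, v(t) = e−t and q(t) = e2t.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(1.0, 2.0, 8))
u, v = np.exp(t), np.exp(-t)
q = u / v
N = len(t)
Sigma = np.exp(-np.abs(np.subtract.outer(t, t)))   # sigma_ij = u_i v_j for i <= j

Tinv = np.zeros((N, N))
for i in range(N - 1):                             # off-diagonal entries
    Tinv[i, i + 1] = Tinv[i + 1, i] = -1.0 / (v[i] * v[i + 1] * (q[i + 1] - q[i]))
Tinv[0, 0] = q[1] / (v[0]**2 * q[0] * (q[1] - q[0]))
Tinv[N - 1, N - 1] = 1.0 / (v[N - 1]**2 * (q[N - 1] - q[N - 2]))
for i in range(1, N - 1):                          # interior diagonal entries
    Tinv[i, i] = (q[i + 1] - q[i - 1]) / (v[i]**2 * (q[i] - q[i - 1]) * (q[i + 1] - q[i]))

print(np.allclose(Tinv, np.linalg.inv(Sigma)))     # True
```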
A.2 Proof of Lemma 2.1
Denote Kij = K(ti, tj), f(ti) = fi, ai = fiwi, i, j = 1, . . ., N, a = (a1, . . ., aN)T. Then for any signed measure ξ = {t1, . . ., tN; w1, . . ., wN} we have
D(ξ) = ∑i,j=1N aiajKij / (∑i=1N aifi)2 = aTΣa / (aTf)2.
Since Σ is symmetric and Σ > 0, there exist Σ−1 and a symmetric matrix Σ1/2 > 0 such that Σ = Σ1/2Σ1/2. Denote b = Σ1/2a and d = Σ−1/2f. Then we can write the design optimality criterion D(ξ) as D(ξ) = bTb/(bTd)2. The Cauchy-Schwarz inequality gives for any two vectors b and d the inequality (bTd)2 ≤ (bTb)(dTd), that is, bTb/(bTd)2 ≥ 1/(dTd). This inequality with b and d as above is equivalent to D(ξ) ≥ 1/(fTΣ−1f) for all ξ. Equality is attained if the vector b is proportional to the vector d; that is, if bi = cdi for all i and some c ≠ 0. Finally, the equality bi = cdi can be rewritten in the form wi = c(Σ−1f)i/f(ti).
A.3 Proof of Theorem 2.1
Before starting the main proof we recall the definition of the design points (2.9) and prove the following auxiliary result.
Lemma A.2
Assume that q(·) = u(·)/v(·) is a twice continuously differentiable function on the interval [a, b]. Then for all i = 1, . . . , N − 1, we have
(A.1)
(A.2)
Proof of Lemma A.2
Recall the definition zi,N = (i − 1/2)/N (i = 1, . . . , N) and set
From the definition of the function Q in (2.8) we have
(A.3)
for all i = 1, . . . , N − 1. Observing Taylor’s formula yields for any z
In this formula, set z = zi,N and δN = 1/N so that z + δ = zi+1,N. We thus obtain
By using (2.9) and the relation (Q−1)′(z) = 1/Q′(Q−1(z)) we can rewrite this in the form (A.1). The second statement (A.2) then follows immediately from (A.1).
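The expansion can be visualized with a small Python experiment (illustrative only; the function Q below is an arbitrary choice, not the function defined in (2.8)): the spacings ti+1,N − ti,N of the points ti,N = Q−1((i − 1/2)/N) are approximated by 1/(N Q′(ti,N)) with an error of order 1/N².

import numpy as np

Q = lambda t: (t**3 - 1.0) / 7.0                      # strictly increasing with Q(1) = 0, Q(2) = 1
Qinv = lambda z: (7.0 * z + 1.0) ** (1.0 / 3.0)       # inverse of Q
dQ = lambda t: 3.0 * t**2 / 7.0                       # derivative Q'

for N in (10, 100, 1000):
    z = (np.arange(1, N + 1) - 0.5) / N               # z_{i,N} = (i - 1/2)/N
    t = Qinv(z)                                       # design points t_{i,N}
    err = np.abs(np.diff(t) - 1.0 / (N * dQ(t[:-1])))
    print(N, err.max())                               # decreases at the rate 1/N^2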
Proof of Theorem 2.1
In view of Lemma 2.1 and (2.5) – (2.7) we have
where we have used the relations (A.3). Here cN denotes the corresponding normalization constant, and we use the notation ui = u(ti,N), vi = v(ti,N) and fi = f(ti,N).
Consider first w1,N. Set g(t) = f(t)/u(t); then
(A.4)
which gives
(A.5)
as N → ∞. Similarly
yielding
(A.6)
Combining (A.4), (A.5) and (A.6) we obtain
(A.7)
as N → ∞. Similarly to (A.7) we get the asymptotic expression for wN,N :
(A.8)
as N → ∞. Consider now the weights
(A.9)
Assume that i = i(N) is such that i(N)/N = z + O(1/N) as N → ∞ for some z ∈ (0, 1), and set t = Q−1(z).
We are going to prove that
(A.10)
(A.11)
First, in view of (2.9) we have and hence
Consider the numerator in (A.10) and rewrite it as follows:
where t̃i,N = (ti−1,N + ti+1,N)/2. We obviously have ti+1,N = t̃i,N + ΔN and ti−1,N = t̃i,N − ΔN, where ΔN = (ti+1,N − ti−1,N)/2 is defined in (A.2). This yields
(A.12)
Next we consider
For the first factor we have
while the second factor gives
where we have used the relation (Q−1)″(z) = −Q″(Q−1(z))/(Q′(Q−1(z)))3 in the last equation. This gives, as N → ∞,
(A.13)
Combining the expressions (A.2), (A.10), (A.12) and (A.13) yields the asymptotic expression (A.11) for wi,N/cN.
By noting that the asymptotic density of the points ti,N (i = 1, . . . , N) is Q′(t) on the interval [a, b], we deduce the statement of the theorem as a consequence of the asymptotic formulas (A.7), (A.8) and (A.11) for w1,N/cN, wN,N/cN and wi,N/cN, respectively.
A.4 Proof of Theorem 2.2
By Theorem 3.3 in Dette et al. (2013) a design minimizes the functional (2.15) if the identity
(A.14)
holds ξ-a.e., where λ is some constant. We consider the design ξ = ξ* defined by (2.12) and (2.13) and verify condition (A.14) for this design. To this end we calculate, using integration by parts,
Recalling the definition of the masses in (2.13), the identity (A.14) follows with λ = c. This proves the first part of Theorem 2.2.
For a proof of the second statement, consider a linear unbiased estimator θ̂μ* in model (2.1) based on the full trajectory, where μ*(dt) = f(t)ξ*(dt) and ξ* is the design in (2.12) and (2.13), with the constant c chosen such that θ̂μ* is unbiased; that is,
Standard arguments of optimal design theory show that μ* minimizes Φ (that is, θ̂μ* is BLUE in model (2.1) where the full trajectory can be observed) if and only if the inequality
(A.15)
holds for all signed measures ν satisfying the corresponding unbiasedness condition. Observing this condition and the identity (A.14) we obtain
for all signed measures ν on [a, b] with this property. By (A.15), μ* minimizes Φ. Consequently, the corresponding estimator θ̂μ* is BLUE with minimal variance
A.5 Proof of Theorem 2.3
Let {ε̃(s)| s ∈ [ã, b̃]} be a Brownian motion on the interval [ã, b̃] and consider the regression model (2.1) with some regression function f̃(s) and this error process. By Theorem 2.2 the optimal design is given by
with
We shall now use Theorem B.1 to derive the optimal design ξ*(dt) for the original regression model (2.1) with regression function f(t) and covariance kernel K(t, t′) from the design ξ̃*(ds) for the function f̃(s) = h(q−1(s)), where h(t) = f(t)/v(t).
For the Brownian motion, the covariance function is given by (B.4) with ṽ(t) = 1 and q̃(t) = t, so that by (B.6) we have β(t) = q(t), α(t) = v(t) and α̃(t) = 1/v(q−1(t)). According to (B.14), the optimal design dξ̃*(s) transforms to dξ*(t) = α̃²(β(t)) dξ̃*(β(t)) = dξ̃*(q(t))/v²(t).
Consider first the mass at b. We have P̃b̃ = c f̃′(b̃)/f̃(b̃). Using the transformation s = q(t), we obtain
as required. From the representation of P̃ã we obtain, by similar arguments,
Let us now consider the density p̃(s), s ∈ [ã, b̃], that is, the absolutely continuous part of the measure ξ̃*. The transformation of the variable s into t = q−1(s) ∈ [a, b] induces the density
(A.16)
Differentiating the equality f̃(s) = h(q−1(s)), we have
Now we obtain
Inserting this into (A.16) and taking into account that f̃(q(t)) = h(t), we obtain the density
(A.17)
In view of the relation dξ*(t) = α̃²(β(t)) dξ̃*(β(t)), we need to divide the right-hand side of (A.17) by v²(t), which gives the expression for the density in (2.11). This completes the proof of Theorem 2.3.
B Gaussian processes with triangular covariance kernels
B.1 Extended Doob’s representation
Assume that {ε(t)| t ∈ [a, b]} is a Gaussian process with covariance kernel K of the form (2.4); that is, K(t, t′) = u(t)v(t′) for t ≤ t′, where u(·) and v(·) are functions defined on the interval [a, b]. Following the terminology introduced in Mehr and McFadden (1965), kernels of the form (2.4) are called triangular. For example, the exponential kernel K(t, t′) = e−|t−t′| is triangular with u(t) = exp(t) and v(t) = exp(−t), so that q(t) = u(t)/v(t) = exp(2t). An alternative way of writing these covariance kernels is
K(t, t′) = v(t) v(t′) q(min(t, t′)),   (B.1)
where q(t) = u(t)/v(t). We assume that ε(t) is non-degenerate on the open interval (a, b), which implies that the function q is strictly increasing and continuous on the interval [a, b] [see Mehr and McFadden (1965), Remark 2]. Moreover, this function is positive on the interval (a, b) [see Remark 1 in Mehr and McFadden (1965)], which implies that the functions u and v have the same sign and can be assumed to be positive on (a, b) without loss of generality. We repeatedly use the following extension of the celebrated representation of Doob [see Doob (1949)], which relates two Gaussian processes (on compact intervals) by a time-space transformation.
Lemma B.1
Let {ε(t)| t ∈ [a, b]} be a non-degenerate Gaussian process with zero mean and covariance function (B.1) and let ṽ and q̃ be continuous positive functions on [ã,b̃], such that q̃ is strictly increasing and q̃ ([ã,b̃]) = q([a, b]). Define the transformations β̃ : [ã,b̃] → [a, b] and α̃ : [ã,b̃] → ℝ+ by
β̃(s) = q−1(q̃(s)),   α̃(s) = ṽ(s)/v(β̃(s)),   s ∈ [ã, b̃].   (B.2)
Then the Gaussian process {ε̃(t)| t ∈ [ã,b̃]} defined by
ε̃(t) = α̃(t) ε(β̃(t)),   t ∈ [ã, b̃],   (B.3)
has zero mean and the covariance function is given by
K̃(t, t′) = ṽ(t) ṽ(t′) q̃(min(t, t′)).   (B.4)
Conversely, the Gaussian process ε(t) can be expressed via ε̃(s) by the transformation
ε(t) = α(t) ε̃(β(t)),   t ∈ [a, b],   (B.5)
where
β(t) = q̃−1(q(t)),   α(t) = v(t)/ṽ(β(t)),   t ∈ [a, b].   (B.6)
Proof
Since {ε(t)|t ∈ [a, b]} is Gaussian and has zero mean, the process defined by (B.3) is also Gaussian and has zero mean. For the covariance function of the process (B.3) we have
The second part of the proof follows by the same arguments and the details are therefore omitted.
Remark B.1
(a) The classical result of Doob is the particular case of (B.5) where ε̃(t) = W(t) is a Brownian motion with covariance function K̃(t, s) = min(t, s). In this case we have ṽ(t) = 1, q̃(t) = t, α(t) = v(t) and β(t) = q(t), and Doob’s representation reads ε(t) = v(t)W(q(t)) [see Doob (1949)]; a simulation sketch of this representation is given after this remark.
(b) Both functions β : [a, b] → [ã, b̃] and β̃ : [ã, b̃] → [a, b] are positive strictly increasing functions and are inverses of each other; that is,
β̃(β(t)) = t for all t ∈ [a, b]   and   β(β̃(s)) = s for all s ∈ [ã, b̃].   (B.7)
(c) The functions α(·) and α̃(·) are positive and satisfy the relation
α(t) α̃(β(t)) = 1 for all t ∈ [a, b].   (B.8)
The properties (b) and (c) imply that the transformation ε̃ → ε defined by (B.5) is the inverse of the transformation ε → ε̃ defined in (B.3).
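To illustrate Doob’s representation from item (a), the following Python sketch (illustrative only; grid and sample size are arbitrary) simulates ε(t) = v(t)W(q(t)) for the exponential kernel, where u(t) = exp(t), v(t) = exp(−t) and q(t) = exp(2t), and compares the empirical covariance matrix with K(t, t′) = exp(−|t − t′|).

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(1.0, 2.0, 6)                          # illustrative grid on [1, 2]
v, q = np.exp(-t), np.exp(2.0 * t)

n_rep = 200_000
# Brownian motion evaluated at the increasing times q(t): cumulative sums of
# independent Gaussian increments with variances q(t_1), q(t_2) - q(t_1), ...
incr = rng.standard_normal((n_rep, len(t))) * np.sqrt(np.diff(q, prepend=0.0))
W = np.cumsum(incr, axis=1)
eps = v * W                                           # Doob representation eps(t) = v(t) W(q(t))

emp = (eps.T @ eps) / n_rep                           # empirical covariance matrix
K = np.exp(-np.abs(t[:, None] - t[None, :]))          # target kernel
print(np.max(np.abs(emp - K)))                        # small Monte Carlo error (about 0.01)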
B.2 Transformation of regression models
Associated with the transformation of the triangular covariance kernels there exists a canonical transformation of the corresponding regression models. To be precise, consider the regression model (1.1) or its continuous-time version (1.2), where the covariance kernel K(·, ·) has the form (B.1). Recall the transformation β : [a, b] → [ã, b̃] defined in (B.6), which maps the observation points tj to t̃j = β(tj), j = 1, . . . , N, and define
f̃(s) = α̃(s) f(β̃(s)),   (B.9)
where s ∈ [ã, b̃] so that β̃ (s) ∈ [a, b]. The regression model (1.1) can now be rewritten in the form
(B.10)
The errors ε̃ (t̃j) in (B.10) have zero mean and, by Lemma B.1 and the identity (B.8), their covariances are given by
𝔼[ε̃(t̃i) ε̃(t̃j)] = K̃(t̃i, t̃j) = ṽ(t̃i) ṽ(t̃j) q̃(min(t̃i, t̃j)).   (B.11)
Hence we have transformed the regression observation scheme (1.1) with error covariances 𝔼[ε(ti) ε (tj)] = K(ti, tj) to the scheme (B.10) with covariances (B.11). Conversely, we can transform the model (B.10) with covariances (B.11) to the model (1.1) using the transformations
f(t) = α(t) f̃(β(t)),   t ∈ [a, b].   (B.12)
Lemma B.2
The transformation f → f̃ defined in (B.9) is the inverse of the transformation f̃ → f defined in (B.12).
Proof
Inserting the expression for f̃ from (B.9) into (B.12), we have
α(t) f̃(β(t)) = α(t) α̃(β(t)) f(β̃(β(t))) = f(t),
where we have used the identities β̃ (β(t)) = t, see (B.7), and α (t) α̃ (β(t)) = 1, see (B.8).
B.3 Transformation of designs
In this section we consider a transformation of matrix-weighted designs under a given transformation of the regression models. In the one-parameter case m = 1, these matrix-weighted designs become signed measures, that is, signed designs as considered in Section 2. Throughout this section it is convenient to define all integrals as Lebesgue-Stieltjes integrals with respect to the distribution functions of the measures ζ and ζ̃.
To be precise, let dξ(t) = Oξ(t)dζ(t) be a matrix-weighted design on the interval [a, b]. Recalling the definitions of α, α̃ and β, β̃ in (B.2) and (B.6), we define a matrix-weighted design dξ̃(s) = Õξ̃(s)dζ̃(s) by
dξ̃(s) = α²(β̃(s)) dξ(β̃(s)),   s ∈ [ã, b̃].   (B.13)
Note that ζ̃ and ζ are probability measures on the intervals [ã, b̃] and [a, b], respectively. Similarly, for a given matrix-weighted design dξ̃(s) = Õξ̃ (s)dζ̃(s) on the interval [ã, b̃] we define a matrix-weighted design dξ(t) = Oξ(t)dζ(t) on the interval [a, b] by
dξ(t) = α̃²(β(t)) dξ̃(β(t)),   t ∈ [a, b].   (B.14)
Similarly to Lemma B.2, one can verify that the transformation ξ̃ → ξ defined by (B.14) is the inverse of the transformation ξ → ξ̃ defined by (B.13); a one-line verification is given below.
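Indeed, substituting (B.13) into (B.14) and using the identities (B.7) and (B.8), we obtain

α̃²(β(t)) dξ̃(β(t)) = α̃²(β(t)) α²(β̃(β(t))) dξ(β̃(β(t))) = (α(t) α̃(β(t)))² dξ(t) = dξ(t)

for all t ∈ [a, b].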
For the following discussion we recall the definition of the covariance matrix D(ξ) in (3.11). For the model (B.10), the covariance matrix of the design dξ̃(s) = Õξ̃ (s)dζ̃(s), defined by (B.13), is given by
(B.15)
where
and the kernel K̃ is defined by (B.4).
Theorem B.1
For any matrix-weighted design dξ(t) = Oξ(t)dζ(t) and the corresponding matrix-weighted design ξ̃ defined by (B.13), we have D(ξ) = D̃(ξ̃). In particular, D* = D̃*, where D* and D̃* are the covariance matrices of the BLUE in the continuous time model (1.2) and in the model {θTf̃(s) + ε̃(s)| s ∈ [ã, b̃]}, respectively.
Proof
Using the variable transformation β̃ (s) = t and (B.9), we have
Next, we calculate the corresponding expression for B̃ (ξ̃), that is
Define s = β̃ (x) and t = β̃ (y) so that x = β̃−1(s) = β(s) and similarly y = β(t). Changing the variables in the integrals above we obtain
Using the definition of β in (B.6) yields q̃ (β(t)) = q̃ (q̃−1(q(t))) = q(t) and by the definition of α in (B.6) we finally get
The result D(ξ) = D̃(ξ̃) now follows from the definitions (3.11) and (B.15).
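For illustration, the identity D(ξ) = D̃(ξ̃) can also be checked numerically. The Python sketch below does this in the one-parameter case m = 1, assuming the signed-design criterion D(ξ) = aTΣa/(aTf)² from the proof of Lemma 2.1; the kernel, regression function, points and signed weights are our illustrative choices.

import numpy as np

t = np.linspace(1.0, 2.0, 6)                          # illustrative design points
w = np.array([0.3, -0.1, 0.2, 0.25, -0.05, 0.4])      # arbitrary signed weights
f = 1.0 + t                                           # illustrative regression function
v, q = np.exp(-t), np.exp(2.0 * t)                    # exponential kernel: u = e^t, v = e^-t, q = e^{2t}

def crit(points, weights, fvals, K):
    a = fvals * weights                               # a_i = f(t_i) w_i
    Sigma = K(points[:, None], points[None, :])
    return (a @ Sigma @ a) / (a @ fvals) ** 2

K = lambda s, t2: np.exp(-np.abs(s - t2))             # original kernel K(t, t')
Kbm = lambda s, t2: np.minimum(s, t2)                 # Brownian-motion kernel (v~ = 1, q~(s) = s)

s = q                                                 # transformed points beta(t_i) = q(t_i)
f_new = f / v                                         # f~(s_i) = f(t_i)/v(t_i), cf. (B.9)
w_new = v**2 * w                                      # transformed weights, cf. (B.13)
print(crit(t, w, f, K), crit(s, w_new, f_new, Kbm))   # the two criterion values coincide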
Contributor Information
Holger Dette, Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany.
Andrey Pepelyshev, School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK.
Anatoly Zhigljavsky, School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK.
References
- Bickel PJ, Herzberg AM. Robustness of design against autocorrelation in time I: Asymptotic theory, optimality for location and linear regression. Annals of Statistics. 1979;7(1):77–95.
- Boltze L, Näther W. On effective observation methods in regression models with correlated errors. Math Operationsforsch Statist Ser Statist. 1982;13:507–519.
- Dette H, Kunert J, Pepelyshev A. Exact optimal designs for weighted least squares analysis with correlated errors. Statistica Sinica. 2008;18(1):135–154.
- Dette H, Pepelyshev A, Zhigljavsky A. Optimal design for linear models with correlated observations. The Annals of Statistics. 2013;41(1):143–176.
- Doob JL. Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics. 1949;20(3):393–403.
- Grenander U. Stochastic processes and statistical inference. Ark Mat. 1950;1:195–277.
- Harman R, Štulajter F. Optimal prediction designs in finite discrete spectrum linear regression models. Metrika. 2010;72(2):281–294.
- Harman R, Štulajter F. Optimality of equidistant sampling designs for the Brownian motion with a quadratic drift. Journal of Statistical Planning and Inference. 2011;141(8):2750–2758.
- Kiselak J, Stehlík M. Equidistant D-optimal designs for parameters of Ornstein-Uhlenbeck process. Statistics and Probability Letters. 2008;78:1388–1396.
- Mehr C, McFadden J. Certain properties of Gaussian processes and their first-passage times. Journal of the Royal Statistical Society, Series B (Methodological). 1965:505–522.
- Müller WG, Pázman A. Measures for designs in experiments with correlated errors. Biometrika. 2003;90:423–434.
- Näther W. Effective Observation of Random Fields. Teubner Verlagsgesellschaft; Leipzig: 1985a.
- Näther W. Exact design for regression models with correlated errors. Statistics. 1985b;16:479–484.
- Pázman A, Müller WG. Optimal design of experiments subject to correlated errors. Statist Probab Lett. 2001;52(1):29–34.
- Pukelsheim F. Optimal Design of Experiments. SIAM; Philadelphia: 2006.
- Sacks J, Ylvisaker ND. Designs for regression problems with correlated errors. Annals of Mathematical Statistics. 1966;37:66–89.
- Sacks J, Ylvisaker ND. Designs for regression problems with correlated errors; many parameters. Annals of Mathematical Statistics. 1968;39:49–69.
- Zhigljavsky A, Dette H, Pepelyshev A. A new approach to optimal design for linear models with correlated observations. Journal of the American Statistical Association. 2010;105:1093–1103.
- Zhigljavsky AA. Theory of Global Random Search. Springer; Netherlands: 1991.