Published in final edited form as: Ann Stat. 2015 Dec 10;44(1):113–152. doi: 10.1214/15-AOS1361

Optimal designs in regression with correlated errors

Holger Dette 1, Andrey Pepelyshev 2, Anatoly Zhigljavsky 3
PMCID: PMC4914140  NIHMSID: NIHMS765884  PMID: 27340304

Abstract

This paper discusses the problem of determining optimal designs for regression models, when the observations are dependent and taken on an interval. A complete solution of this challenging optimal design problem is given for a broad class of regression models and covariance kernels.

We propose a class of estimators which are only slightly more complicated than the ordinary least squares estimators. We then demonstrate that the experiments can be designed such that asymptotically the new estimators achieve the same precision as the best linear unbiased estimator computed for the whole trajectory of the process. As a by-product we derive explicit expressions for the BLUE in the continuous time model and analytic expressions for the optimal designs in a wide class of regression models. We also demonstrate that for a finite number of observations the precision of the proposed procedure, which includes the estimator and design, is very close to the best achievable. The results are illustrated by a few numerical examples.

Keywords and Phrases: linear regression, correlated observations, signed measures, optimal design, BLUE, Gaussian processes, Doob representation

1 Introduction

Optimal design theory is a classical field of mathematical statistics with numerous applications in life sciences, physics and engineering. In many cases the use of optimal or efficient designs reduces costs by allowing statistical inference with a minimal number of experiments without losing accuracy. Most work on optimal design theory concentrates on experiments with independent observations. Under this assumption the field is very well developed and a powerful methodology for the construction of optimal designs has been established [see for example the monograph of Pukelsheim (2006)]. While important and elegant results have been derived in the case of independence, there exist numerous situations where correlation between different observations is present and these classical optimal designs are not applicable.

The theory of optimal design for correlated observations is much less developed and explicit results are available only in rare circumstances. The main difficulty lies in the fact that, in contrast to the independent case, correlations lead to non-convex optimization problems, so the classical tools of convex optimization theory are not applicable. Some exact optimal designs for specific linear models have been studied in Dette et al. (2008); Kiselak and Stehlík (2008); Harman and Štulajter (2010). Because explicit solutions of optimal design problems for correlated observations are rarely available, several authors have proposed to determine optimal designs based on asymptotic arguments [see for example Sacks and Ylvisaker (1966, 1968), Bickel and Herzberg (1979), Näther (1985a), Zhigljavsky et al. (2010)], where the references differ in the asymptotic arguments used to embed the discrete (non-convex) optimization problem in a continuous (or approximate) one. However, in contrast to the uncorrelated case, this approach does not simplify the problem substantially, and due to the lack of convexity the resulting approximate optimal design problems are still extremely difficult to solve. As a consequence, optimal designs have mainly been determined analytically for the location model (in this case the optimization problems are in fact convex) and for a few one-parameter linear models [see Boltze and Näther (1982), Näther (1985a), Ch. 4, Näther (1985b), Pázman and Müller (2001) and Müller and Pázman (2003) among others]. Only recently, Dette et al. (2013) determined (asymptotic) optimal designs for least squares estimation in models with more parameters under the additional assumption that the regression functions are eigenfunctions of an integral operator associated with the covariance kernel of the error process. However, due to this assumption, the class of models for which approximate optimal designs can be determined explicitly is rather small.

The present paper provides a complete solution of this challenging optimal design problem for a broad class of regression models and covariance kernels. Roughly speaking, we determine (asymptotic) optimal designs for a slightly modified ordinary least squares estimator (OLSE), such that the new estimator and the corresponding optimal design achieve the same accuracy as the best linear unbiased estimator (BLUE) with its corresponding optimal design.

To be more precise, consider a general regression observation scheme given by

y(t_j) = θ^T f(t_j) + ε(t_j),   j = 1, . . ., N,   (1.1)

where 𝔼[ε(t_j)] = 0 and K(t_i, t_j) = 𝔼[ε(t_i)ε(t_j)] denotes the covariance between observations at the points t_i and t_j (i, j = 1, . . ., N), θ = (θ_1, . . ., θ_m)^T is a vector of unknown parameters, f(t) = (f_1(t), . . ., f_m(t))^T is a vector of linearly independent functions, and the explanatory variables t_1, . . ., t_N vary in a compact interval, say [a, b]. Parallel to model (1.1) we also consider its continuous time version

y(t) = θ^T f(t) + ε(t),   t ∈ [a, b],   (1.2)

where the full trajectory of the process {y(t)|t ∈ [a, b]} can be observed and {ε(t)|t ∈ [a, b]} is a centered Gaussian process with covariance kernel K, i.e. K(s, t) = 𝔼[ε(s)ε(t)]. This kernel is assumed to be continuous throughout this paper.

We pay much attention to the one-parameter case and develop a general method for solving the optimal design problem in model (1.2) explicitly for the OLSE, perhaps slightly modified. The new estimator and the corresponding optimal design achieve the minimal variance among all linear estimators (attained by the BLUE). In particular, our approach allows us to calculate this optimal variance explicitly. As a by-product we also identify the BLUE in the continuous time model (1.2). Based on these asymptotic considerations, we consider the finite sample case and suggest designs for a new estimation procedure (which is very similar to the OLSE) with an efficiency very close to the best possible (obtained by the BLUE and the corresponding optimal design), for any number of observations. In doing this, we show how to implement the optimal strategies from the continuous time model in practice and demonstrate that even for very small sample sizes the loss of efficiency with respect to the best strategies, based on the BLUE with a corresponding optimal design, can be considered negligible. We point out that, even in the one-dimensional case, the problem of numerically calculating optimal designs for the BLUE for a fixed sample size is extremely challenging due to the lack of convexity of the optimization problem.

In our approach, the importance of the one-parameter design problem is also related to the fact that the optimal design problem for multi-parameter models can be reduced component-wise to problems for one-parameter models. This gives us a way to generate analytically constructed universally optimal designs for a wide range of continuous time multi-parameter models of the form (1.2). Our technique is based on the observation that for a finite number of observations we can always emulate the BLUE in model (1.1) by a different linear estimator. To achieve this, in the one-parameter models we assign not only weights but also signs to the support points of a discrete design, while in the multi-parameter case we use matrix weights. We then determine "optimal" signs and weights and consider the weak convergence of these "designs" and estimators as the sample size converges to infinity. Finally, we prove the (universal) optimality of the limits in the continuous time model (1.2).

Theoretically, we construct a sequence of designs for either the pure or a modified OLSE, say θ̂_N, such that its variance or covariance matrix satisfies Var(θ̂_N) → D* as the sample size N converges to infinity, where D* is the variance (if m = 1) or covariance matrix (if m > 1) of the BLUE in the continuous time model (1.2). In other words, D* is the smallest possible variance (or covariance matrix with respect to the Loewner ordering) over all unbiased linear estimators and all designs. This makes the designs derived in this paper very competitive in applications compared with the designs proposed by Sacks and Ylvisaker (1966) and with optimal designs constructed numerically for the BLUE (using the Brimkulov-Krug-Savanov algorithm, for example). We emphasize once again that due to non-convexity the numerical construction of optimal designs for the BLUE is extremely difficult. An additional advantage of our approach is that we can analytically compute the BLUE with the corresponding optimal variance (covariance matrix) D* in the continuous time model (1.2) and therefore monitor the proximity of different approximations to the optimal variance D* obtained by the BLUE.

The methodology developed in this paper results in a non-standard estimation and optimal design theory and consists in a delicate interplay between new linear estimators and designs in the models (1.1) and (1.2). For this reason let us briefly introduce various estimators, which we will often refer to in the following discussion. Consider the model (1.1) and suppose that N observations are taken at experimental conditions t1, . . ., tN. For the corresponding vector of observations Y = (y(t1), . . ., y(tN))T, a general weighted least squares estimator (WLSE) of θ is defined by

WLSE:   θ̂_WLSE = (X^T W X)^{-1} X^T W Y,   (1.3)

where X = (f_i(t_j))_{j=1,. . .,N}^{i=1,. . .,m} is the N × m design matrix and W is some N × N matrix such that (X^T W X)^{-1} exists. For any such W the estimator (1.3) is obviously unbiased. The covariance matrix of the estimator (1.3) is given by

Var(θ̂_WLSE) = (X^T W X)^{-1} X^T W Σ W^T X (X^T W^T X)^{-1},   (1.4)

where Σ = (K(t_i, t_j))_{i,j=1,. . .,N} is the N × N matrix of variances/covariances. For the standard WLSE the matrix W is symmetric and non-negative definite; in this case θ̂_WLSE minimizes the weighted sum of squares SS_W(θ) = (Y − Xθ)^T W (Y − Xθ) with respect to θ. Important particular cases of estimators of the form (1.3) are the OLSE, the best linear unbiased estimator (BLUE) and the signed least squares estimator (SLSE):

OLSE:   θ̂_OLSE = (X^T X)^{-1} X^T Y,   (1.5)
BLUE:   θ̂_BLUE = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} Y,   (1.6)
SLSE:   θ̂_SLSE = (X^T S X)^{-1} X^T S Y.   (1.7)

Here S is an N × N diagonal matrix with entries +1 and −1 on the diagonal; note that if S ≠ I_N then the SLSE is not a standard WLSE. While the use of the BLUE and OLSE is standard, the SLSE is less common. It was introduced in Boltze and Näther (1982) and further studied in Chapter 5.3 of Näther (1985a). In the context of the present paper, the SLSE will turn out to be very useful for constructing optimal designs for the OLSE and the BLUE in the model (1.2) with one parameter, where the full trajectory can be observed. Another estimator of θ, which is not a special case of the WLSE, will be introduced in Section 3 and used in the multi-parameter models.
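For readers who wish to experiment with these estimators, the following minimal Python sketch (not part of the original analysis; the grid, kernel and regression functions are arbitrary illustrative choices) computes the WLSE (1.3) and its covariance matrix (1.4), and specializes it to the OLSE, BLUE and SLSE:

```python
import numpy as np

def wlse(X, Y, W):
    """General weighted least squares estimator (1.3): (X'WX)^{-1} X'WY."""
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

def wlse_cov(X, W, Sigma):
    """Covariance matrix (1.4) of the WLSE for error covariance Sigma."""
    A = np.linalg.inv(X.T @ W @ X) @ X.T @ W
    return A @ Sigma @ A.T

# Illustration: f(t) = (1, t)' observed at N points with Brownian-motion
# errors, K(s, t) = min(s, t).
t = np.linspace(1.0, 2.0, 25)
X = np.column_stack([np.ones_like(t), t])
Sigma = np.minimum.outer(t, t)
rng = np.random.default_rng(0)
Y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(len(t)), Sigma)

I = np.eye(len(t))
theta_ols  = wlse(X, Y, I)                      # OLSE (1.5)
theta_blue = wlse(X, Y, np.linalg.inv(Sigma))   # BLUE (1.6)
S = np.diag(np.where(t < 1.5, -1.0, 1.0))       # an arbitrary sign matrix
theta_slse = wlse(X, Y, S)                      # SLSE (1.7)
```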

The remaining structure of the paper is as follows. In Section 2 we derive optimal designs for continuous time one-parameter models and discuss how to implement the designs in practice. In Section 3 we extend the results of Section 2 to multi-parameter models. In Appendix B we discuss transformations of regression models and associated designs, which are a main tool in the proofs of our results but also of independent interest. In particular, we provide an extension of the famous Doob representation for Gaussian processes [see Doob (1949) and Mehr and McFadden (1965)], which turns out to be a very important ingredient in proving the design optimality results of Sections 2 and 3. Finally, in Appendix A we collect some auxiliary statements and proofs for the main results of this paper.

2 Optimal designs for one-parameter models

In this section we concentrate on the one-parameter model

y(t_j) = θ f(t_j) + ε(t_j),   j = 1, . . ., N,   (2.1)

on the interval [a, b] and its continuous time analogue, where 𝔼[ε(t)] = 0 and 𝔼[ε(t)ε(t′)] = K(t, t′). Our approach uses some non-standard ideas and estimators in linear models and therefore we begin this section with a careful explanation of the logic of the material.

  • Sect. 2.1. Under the assumption that the design space is finite we show in Lemma 2.1 that by assigning weights and signs to the observation points {t1, . . ., tN} we can construct a WLSE which is equivalent to the BLUE. Then, we derive in Corollary 2.1 an explicit form for the optimal weights for a broad class of covariance kernels, which are called triangular covariance kernels.

  • Sect. 2.2. We demonstrate in Theorem 2.1 that the optimal designs derived in Sect. 2.1 converge weakly to a signed measure, if the cardinality of the design space converges to infinity.

  • Sect. 2.3. We consider model (2.1) under the assumption that the full trajectory of the process {y(t)|t ∈ [a, b]} can be observed. For the specific case of Brownian motion, that is K(t, t′) = min{t, t′}, we prove analytically the optimality of the signed measure derived in Theorem 2.1 for the OLSE. Then, in Theorem 2.3 we establish optimality of the asymptotic measures from Theorem 2.1 for general covariance kernels. As a by-product we also identify the BLUE in the continuous time model (1.2) (in the one-dimensional case). For this purpose, we introduce a transformation which maps any regression model with a triangular covariance kernel into another model with a different triangular kernel. These transformations allow us to reduce any optimization problem to the situation considered in Theorem 2.2, which refers to the case of Brownian motion. The construction of this map is based on an extension of the celebrated Doob representation, which will be developed in Appendix B.

  • Sect. 2.4. We provide some examples of asymptotic optimal measures for specific models.

  • Sect. 2.5. We introduce a practical implementation of the asymptotic theory derived in the previous sections. For a finite sample size we construct WLSE with corresponding designs which can achieve very high efficiency compared to the BLUE with corresponding optimal design. It turns out that these estimators are slightly modified OLSE, where only observations at the end-points obtain a weight (and in some cases also a sign).

  • Sect. 2.6. We illustrate the new methodology in several examples. In particular, we give a comparison with the best known procedures based on BLUE and show that the loss in precision for the procedures derived in this paper is negligible with our procedures being much simpler and more robust than the procedures based on BLUE.

2.1 Optimal designs for SLSE on a finite design space

In this section, we suppose that the design space for model (2.1) is finite, say 𝒯 = {t_1, . . ., t_N}, and demonstrate that in this case the approximate optimal designs for the SLSE (1.7) can be found explicitly. Since we consider the SLSE (1.7) rather than the OLSE (1.5), a generic approximate design on the design space 𝒯 = {t_1, . . ., t_N} is an arbitrary discrete signed measure ξ = {t_1, . . ., t_N; w_1, . . ., w_N}, where w_i = s_i p_i, s_i ∈ {−1, 1}, p_i ≥ 0 (i = 1, . . ., N) and ∑_{i=1}^N p_i = 1. We assume that the support t_1, . . ., t_N of the design is fixed but the weights p_1, . . ., p_N and signs s_1, . . ., s_N, or equivalently the signed weights w_i, will be chosen to minimize the variance of the SLSE (1.7). In view of (1.4), this variance is given by

D(ξ) = ∑_{i=1}^N ∑_{j=1}^N K(t_i, t_j) w_i w_j f(t_i) f(t_j) / (∑_{i=1}^N w_i f²(t_i))².   (2.2)

Note that this expression coincides with the variance of the WLSE (1.3), where the matrix W is defined by W = diag(w_1, . . ., w_N).

We assume that f(t_i) ≠ 0 for all i = 1, . . ., N. If f(t_j) = 0 for some j, then the point t_j can be removed from the design space 𝒯 without changing the SLSE, its variance and the corresponding value D(ξ). In the above definition of the weights w_i, we have ∑_{i=1}^N |w_i| = ∑_{i=1}^N p_i = 1. Note, however, that the value of the criterion (2.2) does not change if we change all the weights from w_i to cw_i (i = 1, . . ., N) for arbitrary c ≠ 0.

Despite the fact that the functional D in (2.2) is not convex as a function of (w1, . . ., wN), the problem of determining the optimal design can be easily solved by a simple application of the Cauchy-Schwarz inequality. The proof of the following lemma is given in Appendix A [see also Theorem 5.3 in Näther (1985a), where this result was proved in a slightly different form].

Lemma 2.1

Assume that the matrix Σ = (K(t_i, t_j))_{i,j=1,. . .,N} is positive definite and f(t_i) ≠ 0 for all i = 1, . . ., N. Then the optimal weights w_1^*, . . ., w_N^* minimizing (2.2) subject to the constraint ∑_{i=1}^N w_i = 1 are given by

w_i^* = c e_i^T Σ^{-1} f / f(t_i),   i = 1, . . ., N,   (2.3)

where f = (f(t_1), . . ., f(t_N))^T, e_i = (0, 0, . . ., 0, 1, 0, . . ., 0)^T ∈ ℝ^N is the i-th unit vector, and

c = (∑_{i=1}^N e_i^T Σ^{-1} f / f(t_i))^{-1}.

Moreover, for the design ξ* = {t_1, . . ., t_N; w_1^*, . . ., w_N^*} with weights (2.3) we have D(ξ*) = D*, where D* = 1/(f^T Σ^{-1} f) is the variance of the BLUE defined in (1.6) using all observations t_1, . . ., t_N.

Lemma 2.1 shows, in particular, that the pair {SLSE, corresponding optimal design ξ*} provides an unbiased estimator with the best possible variance for the one-parameter model (2.1). This results in a WLSE (1.3) with W = diag(w_1^*, . . ., w_N^*) which is the BLUE. In other words, by a slight modification of the OLSE we are able to emulate the BLUE using the appropriate design or WLSE.
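The following short numerical illustration (our own sketch; the grid, kernel and regression function are arbitrary choices) verifies this conclusion of Lemma 2.1: the weights (2.3) make the variance (2.2) equal to the BLUE variance 1/(f^T Σ^{-1} f):

```python
import numpy as np

t = np.linspace(1.0, 2.0, 10)
f = t**2 + 1.0                      # one-parameter regression function
Sigma = np.minimum.outer(t, t)      # Brownian-motion kernel K = min(s, t)

Sf = np.linalg.solve(Sigma, f)      # Sigma^{-1} f
w = Sf / f                          # optimal weights (2.3) up to the constant c
w *= 1.0 / w.sum()                  # normalize so that sum(w_i) = 1

# variance (2.2) of the SLSE with these weights ...
D = w @ (Sigma * np.outer(f, f)) @ w / (w @ f**2) ** 2
# ... equals the BLUE variance D* = 1/(f' Sigma^{-1} f)
D_star = 1.0 / (f @ Sf)
assert np.isclose(D, D_star)
```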

While the statement of Lemma 2.1 holds for arbitrary kernels, we are able to determine the optimal weights w_i^* more explicitly for a broad class of kernels, called triangular kernels, which are of the form

K(t, t′) = u(t) v(t′)   for t ≤ t′,   (2.4)

where u(·) and v(·) are some functions on the interval [a, b]. Note that the majority of covariance kernels considered in the literature belong to this class; see, for example, Näther (1985a), Zhigljavsky et al. (2010) or Harman and Štulajter (2011). The following result is a direct consequence of Lemma A.1 from Appendix A.

Corollary 2.1

Assume that the covariance kernel K(·, ·) has the form (2.4) so that the matrix Σ = (K(t_i, t_j))_{i,j=1,. . .,N} is positive definite and has the entries K(t_i, t_j) = u_i v_j for i ≤ j, where for k = 1, . . ., N we denote u_k = u(t_k), v_k = v(t_k), and also f_k = f(t_k), q_k = u_k/v_k. If f_i ≠ 0 (i = 1, . . ., N), the weights in (2.3) can be represented explicitly as follows:

w_1^* = (c/f_1)(σ̃_{11} f_1 + σ̃_{12} f_2) = c u_2 / (f_1 v_1 v_2 (q_2 − q_1)) · (f_1/u_1 − f_2/u_2),   (2.5)
w_N^* = (c/f_N)(σ̃_{N,N} f_N + σ̃_{N−1,N} f_{N−1}) = c / (f_N v_N (q_N − q_{N−1})) · (f_N/v_N − f_{N−1}/v_{N−1}),   (2.6)
w_i^* = (c/f_i)(σ̃_{i,i} f_i + σ̃_{i−1,i} f_{i−1} + σ̃_{i,i+1} f_{i+1}) = c/(f_i v_i) · ( (q_{i+1} − q_{i−1}) f_i / (v_i (q_{i+1} − q_i)(q_i − q_{i−1})) − f_{i−1} / (v_{i−1} (q_i − q_{i−1})) − f_{i+1} / (v_{i+1} (q_{i+1} − q_i)) ),   (2.7)

for i = 2, . . ., N − 1. In formulas (2.5), (2.6) and (2.7), σ̃_{ij} denotes the element in position (i, j) of the matrix Σ^{-1} = (σ̃_{ij})_{i,j=1,. . .,N}.

2.2 Weak convergence of designs

In this section, we consider the asymptotic properties of designs with weights (2.5) – (2.7). Recall that the design space is an interval, say [a, b], and that we assume a triangular covariance function of the form (2.4). According to the discussion of triangular covariance kernels provided in Section 4.1 of Appendix B, the functions u(·) and v(·) are continuous and strictly positive on the interval (a, b) and the function q(·) = u(·)/v(·) is positive, continuous and strictly increasing on (a, b). We also assume that the regression function f in (2.1) is continuous and strictly positive on the interval (a, b). We define the transformation

Q(t) = (q(t) − q(a)) / (q(b) − q(a))   (2.8)

and note that the function Q : [a, b] → [0, 1] is increasing on the interval [a, b] with Q(a) = 0 and Q(b) = 1, that is, Q(·) is a cumulative distribution function (c.d.f.). For fixed N and i = 1, . . ., N, define z_{i,N} = (i − 1/2)/N and the design points

t_{i,N} = Q^{-1}(z_{i,N}),   i = 1, . . ., N.   (2.9)

Theorem 2.1

Consider the optimal design problem for the model (2.1), where the error process ε(t) has the covariance kernel K(t, s) of the form (2.4). Assume that u(·), v(·), f(·) and q(·) are strictly positive, twice continuously differentiable functions on the interval [a, b]. Consider the sequence of signed measures

ξ_N = {t_{1,N}, . . ., t_{N,N}; w_{1,N}, . . ., w_{N,N}},

where the support points t_{i,N} are defined in (2.9) and the weights w_{i,N} are assigned to these points according to the rule (2.3) of Lemma 2.1. Then the sequence of measures {ξ_N}_{N∈ℕ} converges in distribution to a signed measure ξ*, which has masses

P_a = c / (f(a) v²(a) q′(a)) · [f(a) u′(a)/u(a) − f′(a)],   P_b = c h′(b) / (f(b) v(b) q′(b))   (2.10)

at the points a and b, respectively, and the signed density

p(t) = −c / (f(t) v(t)) · [h′(t)/q′(t)]′   (2.11)

(that is, the Radon-Nikodym derivative of ξ* with respect to the Lebesgue measure) on the interval (a, b), where the function h(·) is defined by h(t) = f(t)/v(t).

The proof of Theorem 2.1 is technically complicated and therefore given in Appendix A. The constant c ≠ 0 in (2.10) and (2.11) is arbitrary. If a normalization |ξ*|([a, b]) = 1 is required, then c can be found from the normalizing condition

∫_a^b |ξ*|(dt) = |P_a| + |P_b| + ∫_a^b |p(t)| dt = 1.

Throughout this paper we write the limiting designs of Theorem 2.1 in the form

ξ*(dt) = P_a δ_a(dt) + P_b δ_b(dt) + p(t) dt,   (2.12)

where δa(dt) and δb(dt) are the Dirac-measures concentrated at the points a and b, respectively, and the function p(·) is defined by (2.11). Note also that under the assumptions of Theorem 2.1, the function p(·) is continuous on the interval [a, b]. In the case of Brownian motion, the limiting design of Theorem 2.1 is particularly simple.

Example 2.1

If the error process ε in model (2.1) is the Brownian motion on the interval [a, b] with 0 < a < b < ∞, then K(t, s) = min(t, s) and hence u(t) = t, v(t) = 1, q(t) = t. This implies that the limiting design of Theorem 2.1 is given by (2.12) with

P_a = c (f(a) − f′(a) a) / (a f(a)),   P_b = c f′(b)/f(b)   and   p(t) = −c f″(t)/f(t).   (2.13)
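As a quick symbolic sanity check (our own sketch, with the hypothetical choice f(t) = t² + 1), one can verify that substituting u(t) = t, v(t) = 1 and q(t) = t into the general formulas (2.10) and (2.11) indeed yields (2.13):

```python
import sympy as sp

t, a, b, c = sp.symbols('t a b c', positive=True)
f = t**2 + 1                          # any twice-differentiable f works here
u, v = t, sp.Integer(1)               # Brownian motion: u = t, v = 1
q = u / v
h = f / v

# general formulas (2.10) and (2.11)
Pa = (c / (f * v**2 * sp.diff(q, t))).subs(t, a) * \
     (f * sp.diff(u, t) / u - sp.diff(f, t)).subs(t, a)
Pb = c * sp.diff(h, t).subs(t, b) / (f * v * sp.diff(q, t)).subs(t, b)
p  = -c / (f * v) * sp.diff(sp.diff(h, t) / sp.diff(q, t), t)

# each difference with the corresponding expression in (2.13) simplifies to 0
print(sp.simplify(Pa - c * (f.subs(t, a) - sp.diff(f, t).subs(t, a) * a)
                  / (a * f.subs(t, a))))
print(sp.simplify(Pb - c * sp.diff(f, t).subs(t, b) / f.subs(t, b)))
print(sp.simplify(p + c * sp.diff(f, t, 2) / f))
```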

2.3 Optimal designs and the BLUE

In this section we consider the continuous time model (1.2) in the case m = 1 and demonstrate that the limiting designs derived in Theorem 2.1 are in fact optimal. A linear estimator for the parameter θ in model (1.2) is defined by θ̂_μ = ∫_a^b y(t) μ(dt), where μ is a signed measure on the interval [a, b]. Special cases include the OLSE and SLSE θ̂_ξ = ∫_a^b y(t) f(t) ξ(dt) / ∫_a^b f²(t) ξ(dt), where ξ is a measure or a signed measure on the interval [a, b], respectively. Note that θ̂_μ is unbiased if and only if ∫_a^b f(t) μ(dt) = 1, and θ̂_ξ is unbiased by construction. The BLUE (in the continuous time model (1.2)) minimizes

Φ(μ) = Var(θ̂_μ) = ∫_a^b ∫_a^b K(x, y) μ(dx) μ(dy)

in the class of all signed measures μ satisfying ∫_a^b f(t) μ(dt) = 1, and

D* = inf{Φ(μ) | μ a signed measure on [a, b] with ∫_a^b f(t) μ(dt) = 1}   (2.14)

denotes the best possible variance of all linear unbiased estimators in the continuous time model (1.2).

Similarly, a signed measure ξ* on the interval [a, b] is called optimal for least squares estimation in the one-parameter model (1.2), if it minimizes the functional

D(ξ) = Var(θ̂_ξ) = ∫_a^b ∫_a^b K(t, s) f(t) f(s) ξ(dt) ξ(ds) / (∫_a^b f²(t) ξ(dt))²,   (2.15)

in the set of all signed measures ξ on the interval [a, b] such that ∫_a^b f²(t) ξ(dt) ≠ 0. In the case of a Brownian motion, we are able to establish the optimality of the design of Example 2.1. A proof of the following result is given in Appendix A.

Theorem 2.2

Let {ε(t) | t ∈ [a, b]} be a Brownian motion, so that K(t, t′) = min{t, t′}, and f be a positive, twice continuously differentiable function on the interval [a, b] ⊂ ℝ+. Then the signed measure ξ*, defined by (2.12) and (2.13) with arbitrary c ≠ 0, minimizes the functional (2.15). The minimal value in (2.15) is obtained as

D(ξ*) = min_ξ D(ξ) = [f²(a)/a + ∫_a^b (f′(t))² dt]^{-1}.

Moreover, the BLUE in model (1.2) is given by θ̂μ*, where μ*(dt) = f(t)ξ**(dt) and ξ** is the signed measure defined by (2.12) and (2.13) with constant c* = D(ξ*). This further implies D* = D(ξ*) = Φ(μ*).
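A simple numerical check (our own sketch, with the hypothetical choice f(t) = t² + 1 on [1, 2]) compares the value of Theorem 2.2 with the BLUE variance computed on a fine grid:

```python
import numpy as np

a, b = 1.0, 2.0
f = lambda t: t**2 + 1.0

# D* = [f(a)^2/a + int_a^b f'(t)^2 dt]^{-1}; here int_1^2 (2t)^2 dt = 28/3
D_star = 1.0 / (f(a)**2 / a + 28.0 / 3.0)     # = 3/40 = 0.075

# BLUE variance 1/(f' Sigma^{-1} f) on a fine grid should approach D*
t = np.linspace(a, b, 2000)
fv = f(t)
Sigma = np.minimum.outer(t, t)                # K(s, t) = min(s, t)
D_grid = 1.0 / (fv @ np.linalg.solve(Sigma, fv))
print(D_star, D_grid)                         # both close to 0.075
```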

Based on the design optimality established in Theorem 2.2 for the special case of Brownian motion and the technique of transformation of regression models described in Appendix B, we can establish the optimality of the asymptotic designs derived in Theorem 2.1 for more general covariance kernels; see Appendix A for the proof.

Theorem 2.3

Under the conditions of Theorem 2.1, the optimal design ξ* minimizing the functional (2.15) is defined by the formulas (2.10) – (2.12) with arbitrary c ≠ 0. The minimal value in (2.15) is obtained as

D(ξ*) = [f̃²(q(a))/q(a) + ∫_{q(a)}^{q(b)} (f̃′(s))² ds]^{-1},   (2.16)

where f̃(s) = f(q^{-1}(s))/v(q^{-1}(s)). Moreover, the BLUE in model (1.2) is given by θ̂_{μ*}, where μ*(dt) = f(t) ξ**(dt), ξ** is the signed measure defined in (2.10) – (2.12) with constant c* = D(ξ*), and D* = Φ(μ*) = D(ξ*).

2.4 Examples of optimal designs

In this section, we provide the values of P_a, P_b and the function p(·) in the general expression (2.12) for the optimal designs in a number of important special cases of the one-parameter continuous time model (1.2), where the design space is 𝒯 = [a, b]. Specifically, optimal designs are given in Table 1 for the location model, in Table 2 for the linear model, in Table 3 for a quadratic model and in Table 4 for a trigonometric model. The last-named model was especially chosen to demonstrate the existence of optimal designs with a density p which changes sign in the interval (a, b). In the tables several triangular covariance kernels are considered. The parameters of these covariance kernels satisfy the constraints c_2 > ±c_1, ∓c_2 ∉ [a, b], γ > ω and λ > 0. For the sake of a transparent presentation, we use the factor c = 1 in all tables, but we emphasize once again that the optimal designs do not depend on the scaling factor.

Table 1.

Optimal designs for the location model: f(t) = 1, t ∈ [a, b].

| u(t) | v(t) | P_a | P_b | p(t) |
|------|------|-----|-----|------|
| any | 1 | 1 | 0 | 0 |
| c_1 + t | c_2 ± t | 1/(a + c_1) | −1/(b ± c_2) | 0 |
| t^γ | t^ω | γ a^{−γ−ω} | ω b^{−γ−ω} | γω t^{−1−γ−ω} |
| e^{λt} | e^{−γt} | λ e^{a(γ−λ)} | γ e^{b(γ−λ)} | λγ e^{t(γ−λ)} |

Table 2.

Optimal designs for the linear regression model through the origin: f(t) = t, t ∈ [a, b].

| u(t) | v(t) | P_a | P_b | p(t) |
|------|------|-----|-----|------|
| t | 1 | 0 | 1 | 0 |
| c_1 + t | c_2 ± t | −c_1/((a + c_1)a) | ±c_2/((b ± c_2)b) | 0 |
| t^γ | t^ω | −(γ − 1) a^{−γ−ω} | (ω − 1) b^{−γ−ω} | (1 − γ)(1 − ω) t^{−1−γ−ω} |
| e^{λt} | 1 | (aλ − 1) e^{−aλ}/a | e^{−bλ}/b | λ e^{−λt}/t |
| e^{λt} | e^{−γt} | ((aλ − 1)/a) e^{a(γ−λ)} | ((bγ + 1)/b) e^{b(γ−λ)} | ((λγt − γ + λ)/t) e^{t(γ−λ)} |

Table 3.

Optimal designs for the quadratic regression model: f(t) = t² + ν, t ∈ [a, b].

| u(t) | v(t) | P_a f(a) | P_b f(b) | p(t) f(t) |
|------|------|----------|----------|-----------|
| t | 1 | (a² − ν)/a | −2b | 2 |
| c_1 + t | c_2 ± t | (a² − ν + 2ac_1)/(a + c_1) | (b² − ν ± 2bc_2)/(b ± c_2) | 2 |
| t^γ | t^ω | ((2 − γ)a² − γν) a^{−γ−ω} | ((ω − 2)b² + ων) b^{−γ−ω} | ((2 − γ)(2 − ω)t² + νγω) t^{−1−γ−ω} |
| e^{λt} | 1 | (2a − (a² + ν)λ) e^{−aλ} | −2b e^{−bλ} | 2(1 − tλ) e^{−tλ} |
| e^{λt} | e^{−λt} | 2a − (a² + ν)λ | −((b² + ν)λ + 2b) | 2 − λ²(t² + ν) |

Table 4.

Optimal designs for the trigonometric regression model: f(t) = 1 + (1/2) sin(2πt), t ∈ [1, 2].

| u(t) | v(t) | P_a | P_b | p(t) f(t) |
|------|------|-----|-----|-----------|
| t | 1 | 1 − π | π | 2π² sin(2πt) |
| c_1 + t | c_2 ± t | (1 − π − πc_1)/(c_1 + 1) | (πc_2 ± 2π ∓ 1)/(c_2 ± 2) | 2π² sin(2πt) |
| t² | t | 2 − π | (2π − 1)/8 | 2t^{−4}((π²t² − 1) sin(2πt) + πt cos(2πt) − 1) |
| e^{λt} | 1 | (λ − π) e^{−λ} | π e^{−2λ} | (2π² sin(2πt) + πλ cos(2πt)) e^{−λt} |
| e^{λt} | e^{−λt} | λ − π | λ + π | (2π² + λ²/2) sin(2πt) + λ² |

As an example, if K(t, t′) = e^{−λ|t−t′|} for some λ > 0, we obtain from the last row of Table 2 that the optimal design for the continuous time model {θt + ε(t) | t ∈ [1, 2]} is ξ*(dt) = (λ − 1) δ_1(dt) + (λ + 1/2) δ_2(dt) + λ² dt, and as a consequence, D* = (5/2 + 1/(2λ) + 7λ/6)^{-1}.
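This value is easy to check numerically; the sketch below (our own, with the arbitrary choice λ = 1) compares D* with the BLUE variance on a fine grid:

```python
import numpy as np

lam = 1.0
t = np.linspace(1.0, 2.0, 2000)
Sigma = np.exp(-lam * np.abs(np.subtract.outer(t, t)))   # K = exp(-lam|t-t'|)
D_grid = 1.0 / (t @ np.linalg.solve(Sigma, t))           # BLUE variance, f(t) = t
D_star = 1.0 / (2.5 + 1.0 / (2.0 * lam) + 7.0 * lam / 6.0)
print(D_grid, D_star)   # both approximately 0.24 for lam = 1
```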

2.5 Practical implementation: designs for finite sample size

In practice, efficient designs and corresponding estimators for the model (1.1) have to be derived from the optimal solutions in the continuous time model (1.2), and in this section a procedure with a good finite sample performance is proposed. Roughly speaking, it consists of a slight modification of the ordinary least squares estimator and a discretization of a continuous signed measure with the asymptotic optimal density in (2.11).

We assume that the experimenter can take N +2 observations with N observations inside the interval [a, b]. In principle, any probability measure on the interval can be approximated by an (N +2)-point measure with weights 1/(N +2) and similarly any finite signed measure can be approximated by an (N +2)-point signed measure with equal weights (in absolute value). We hence could use a direct approximation of the optimal signed measures of the form (2.12) by a sequence of (N +2)-point signed measures with equal weights (in absolute value). For an increasing sample size this sequence will eventually converge to the optimal measure of Theorem 2.3. However, this convergence will typically be very slow, where we measure the speed of convergence by the differences between the variances D(ξ) of the corresponding estimates and the optimal value D* defined in (2.16). The main difficulty lies in the fact that a typical optimal measure has masses at the boundary points a and b, in addition to some density on the interval (a, b). The convergence of discrete measures with equal (in absolute value) weights to such a measure will be very slow, especially in view of the fact that in our approximating measures the points cannot be repeated. Summarizing, approximation of the optimal signed measures by measures with equal weights is possible but cannot be accurate for small N.

In order to improve the rate of convergence we propose a slight modification of the ordinary least squares procedure. In particular, we propose a WLSE with weights at the points a and b (the end-points of the interval [a, b]), which correspond to the masses P_a and P_b of the asymptotic optimal design. We thus only need to approximate the continuous part of the optimal signed measure, which has a density on (a, b), by an N-point design with equal masses. To be precise, consider an optimal measure of the form (2.12). We assume that the density p(·) is not identically zero on the interval (a, b) and choose the constant c such that ∫_a^b |p(t)| dt = 1. Note that unless p(·) changes sign in (a, b), we can choose p(t) ≥ 0 for all t ∈ (a, b). Define φ(t) = |p(t)| for t ∈ (a, b) and denote by F(t) = ∫_a^t φ(s) ds the corresponding distribution function. The N-point design we use as an approximation to the measure with density φ(t) is ξ̂_N = {t_{1,N}, . . ., t_{N,N}; 1/N, . . ., 1/N}, where t_{i,N} = F^{-1}(z_{i,N}) with z_{i,N} = i/(N + 1), i = 1, 2, . . ., N. If p(t) = 0 on a sub-interval of [a, b] and F^{-1}(z_{i,N}) is not uniquely defined, then we choose the smallest element of the set F^{-1}(z_{i,N}) as t_{i,N}. Finally, the design we suggest as an (N + 2)-point approximation to the optimal measure in (2.12) is

ξ_{N+2} = P_a δ_a + P_b δ_b + P ξ̄_N,

where P = 1 − |P_a| − |P_b|, ξ̄_N = {t_{1,N}, . . ., t_{N,N}; s_{1,N}/N, . . ., s_{N,N}/N} and s_{i,N} = sign(p(t_{i,N})), i = 1, . . ., N.

The matrix W, which corresponds to the design ξ_{N+2} and is used in the corresponding WLSE (1.3), is the diagonal matrix W_N = diag(N P_a, s_{1,N} P, s_{2,N} P, . . ., s_{N,N} P, N P_b) of size (N + 2) × (N + 2). The set of N + 2 design points, where the observations should be taken, is given by {a, t_{1,N}, t_{2,N}, . . ., t_{N,N}, b}, and the resulting estimator is defined by

θ̂_{WLSE,N} = (X^T W_N X)^{-1} X^T W_N Y.   (2.17)

It follows from (1.4), (2.15) and the discussion of the previous paragraph that

lim_{N→∞} Var(θ̂_{WLSE,N}) = lim_{N→∞} D(ξ_{N+2}) = D*,

where D* is defined in (2.14).

2.6 Some numerical results

Consider the regression model (2.1) with f(t) = t² + 1, t ∈ [1, 2], where the error process is given by the Brownian motion. The optimal design for this model can be obtained from Table 3, and we have P_a = 0, P_b = −0.55, P = 0.45 and p(t) = 1.38/(t² + 1). By computing the quantiles of the c.d.f. corresponding to p we can easily obtain the support points of (N + 2)-point designs. For example, supp(ξ_4) = {1, 1.24, 1.56, 2}, supp(ξ_5) = {1, 1.18, 1.39, 1.65, 2} and supp(ξ_6) = {1, 1.14, 1.30, 1.49, 1.71, 2}.
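The following Python sketch (our own illustration of the recipe from Section 2.5) reproduces these numbers for the smallest case N = 2, i.e. the design ξ_4, and evaluates the variance of the corresponding WLSE:

```python
import numpy as np

# Ingredients from Table 3, first row, for f(t) = t^2 + 1 on [1, 2]:
# P_a = (a^2 - 1)/a * const = 0, P_b = 4c/5, p(t) = -2c/(t^2 + 1),
# with c chosen so that |P_a| + |P_b| + int_a^b |p(t)| dt = 1.
a, b, N = 1.0, 2.0, 2
f = lambda t: t**2 + 1.0

I = 2.0 * (np.arctan(b) - np.arctan(a))   # int_a^b 2/(t^2+1) dt ~ 0.6435
c = -1.0 / (0.8 + I)                      # c ~ -0.693
P_a = 0.0
P_b = 0.8 * c                             # ~ -0.55
P = 1.0 - abs(P_a) - abs(P_b)             # ~ 0.45

# interior support points: quantiles of phi(t) = |p(t)|; the c.d.f. F is
# inverted in closed form via the arctan integral
z = np.arange(1, N + 1) / (N + 1.0)
t_int = np.tan(np.arctan(a) + z * I / 2.0)
print(np.round(t_int, 2))                 # [1.24 1.56] -> supp = {1, 1.24, 1.56, 2}

# WLSE (2.17) with W_N = diag(N*P_a, P, ..., P, N*P_b); all signs are +1 here
t_all = np.concatenate([[a], t_int, [b]])
w = np.concatenate([[N * P_a], np.full(N, P), [N * P_b]])
fv = f(t_all)
Sigma = np.minimum.outer(t_all, t_all)
var = (w * fv) @ Sigma @ (w * fv) / (w @ fv**2) ** 2
print(var)                                # ~ 0.0755, already close to D* ~ 0.075
```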

In Figure 1 we display the variance of various linear unbiased estimators for different sample sizes. We observe that the variance of the WLSE defined by (2.17) for the proposed (N + 2)-point design ξ_{N+2} is only slightly larger than the variance of the BLUE for the proposed (N + 2)-point design, which in turn is very close to the variance of the BLUE with the corresponding optimal (N + 2)-point design. The calculation of the latter designs is complicated and has been performed numerically by the Nelder-Mead algorithm in MATLAB. We also note that due to the non-convexity of the optimization problem it is not clear that the algorithm finds the optimal design. However, by Theorems 2.2 and 2.3 we determined the optimal value (2.14), which is D* ≃ 0.075004. This means that for the proposed designs the WLSE has almost the same precision as the BLUE.

Figure 1.


The variance of the WLSE defined in (2.17) for the proposed (N + 2)-point designs ξ_{N+2} (crosses), of the BLUE for the proposed (N + 2)-point designs (grey circles) and of the BLUE with corresponding optimal (N + 2)-point designs (line). The error process in model (2.1) is given by the Brownian motion and the regression function is f(t) = t² + 1, t ∈ [1, 2].

In our second example we compare the proposed optimal designs with the designs from Sacks and Ylvisaker (1966), which are constructed for the BLUE. For this purpose we consider the model (2.1) with regression function f(t) = 1 + 0.5 sin(2πt), t ∈ [1, 2], and triangular covariance kernel of the form (2.4) with u(t) = t2 and v(t) = t. The optimal design in the continuous time model can be obtained from Table 4 and its density is depicted in Figure 2.

Figure 2.


The density of the optimal design for continuous time model (2.1) with regression function f(t) = 1+0.5 sin(2πt), t ∈ [1, 2], and covariance kernel of the form (2.4) with u(t) = t2 and v(t) = t.

By computing quantiles of this optimal design, we obtain that the 4-point design ξ_4 is supported at the points 1, 1.27, 1.68 and 2. For ξ_4, the variance of the BLUE is ≃ 0.6129. Using the optimal density from Sacks and Ylvisaker (1966), we obtain the 4-point design ξ_4^{SY} supported at 1, 1.25, 1.63 and 2. For ξ_4^{SY}, the variance of the BLUE is ≃ 0.6200. For N = 2, 3, . . ., 20, the variances of the BLUE for the proposed (N + 2)-point designs, the (N + 2)-point designs from Sacks and Ylvisaker (1966) and the optimal (N + 2)-point designs for the BLUE are depicted in Figure 3. We observe that for N = 2, 3, 4 the new designs yield a smaller variance of the BLUE, while for N = 5 the design of Sacks and Ylvisaker (1966) shows a better performance. In all other cases the results for both designs are very similar. In particular, for N ≥ 6 the variances for the designs proposed in this paper and in the paper of Sacks and Ylvisaker (1966) are only slightly worse than the variances of the BLUE with the corresponding best (N + 2)-point designs (computed by direct optimization).

Figure 3.


The variance of BLUE for the proposed (N+2)-point designs (grey circles), the (N+2)-point designs from Sacks and Ylvisaker (1966) (crosses) and the BLUE with corresponding optimal (N + 2)-point designs (line) for the model f(t) = 1 + 0.5 sin(2πt), t ∈ [1, 2], and the covariance kernel with u(t) = t2 and v(t) = t; N = 2, . . ., 20.

3 Multi-parameter models

In this section we discuss optimal design problems for models with more than one parameter. The structure of this section is similar to that of Section 2. In Section 3.1 we introduce a new class of linear estimators of the parameters in model (1.1), which we call matrix-weighted estimators (MWE), and show in Lemma 3.3 that for some special choices of the matrix weights the MWE can always emulate the BLUE. In Section 3.2 matrix-weighted designs associated with the MWE are defined. Then, for the case of triangular kernels, in Corollary 3.1 we derive the asymptotic form of the sequence of designs associated with the version of the MWE which emulates the BLUE. In Section 3.3 we prove optimality of the asymptotic matrix-weighted measure derived in Corollary 3.1 in the continuous time model (1.2) (see Theorem 3.1), while some examples of asymptotically optimal measures are provided in Section 3.4. Finally, the practical implementation of the asymptotic measures is discussed in Section 3.5 and numerical examples are provided in Section 3.6.

The proofs of many statements in this section use the results of Section 2. This is possible as there is a lot of freedom in choosing the form of the MWE to emulate the BLUE, and we choose a special form which can be considered as a component-wise SLSE. Correspondingly, the resulting matrix-weighted designs (including the asymptotic ones) become combinations of designs for one-parameter models.

3.1 Matrix-weighted estimators and designs

Consider the regression model (1.1) and assume that N observations at points t_j (j = 1, . . ., N) have been made. Let O_j be an m × m matrix associated with the observation point t_j, j = 1, . . ., N. Recall the definition of the design matrix X = (f_i(t_j))_{j=1,. . .,N}^{i=1,. . .,m} and of the vector Y = (y(t_1), . . ., y(t_N))^T. We introduce the m × N matrix C = (O_1 f(t_1), . . ., O_N f(t_N)), whose j-th column is O_j f(t_j). Assuming that the m × m matrix

M = CX = ∑_{j=1}^N O_j f(t_j) f^T(t_j)   (3.1)

is non-singular we define the linear estimator

θ̂_MWE = (CX)^{-1} C Y   (3.2)

for the vector θ in model (1.1). We call this estimator the matrix-weighted estimator (MWE), because each column of the matrix X is multiplied by a matrix weight. It is easy to see that for any C the MWE θ̂MWE is unbiased and its covariance matrix is given by

Var(θ̂_MWE) = M^{-1} C Σ C^T (M^{-1})^T,   (3.3)

where Σ = (K(t_i, t_j))_{i,j=1,. . .,N} is the N × N matrix of covariances of the errors. Note that the matrix M defined in (3.1) generalizes the standard information matrix X^T X and that M is not necessarily symmetric. The following result shows that different matrices O_1, . . ., O_N may yield the same matrix-weighted estimator θ̂_MWE. Its proof is obvious and therefore omitted.

Lemma 3.1

Consider the regression model (1.1) and assume that the matrix M defined in (3.1) is non-singular. Then the estimator θ̂_MWE defined in (3.2) coincides with the estimator θ̂_{MWE,Λ} = (C_Λ X)^{-1} C_Λ Y, where C_Λ = ΛC and Λ is an arbitrary non-singular m × m matrix.

The estimator θ̂_{MWE,Λ} introduced in Lemma 3.1 is the MWE defined by the matrix weights ΛO_1, . . ., ΛO_N. Lemma 3.1 implies that the MWE is exactly the same for any set of matrices {ΛO_1, . . ., ΛO_N} as long as Λ is non-singular. In the asymptotic considerations below it will be convenient to interpret the combination of the set of experimental conditions {t_1, . . ., t_N} and the set of corresponding matrices {O_1, . . ., O_N} in the MWE as an N-point matrix-weighted design.

Definition 3.1

Any combination of N points {t_1, . . ., t_N} and m × m matrices {O_1, . . ., O_N} will be called an N-point matrix-weighted design and denoted by

ξ_N = {t_1, . . ., t_N; (1/N) O_1, . . ., (1/N) O_N}.   (3.4)

The covariance matrix D(ξN) of a matrix-weighted design ξN is defined as the covariance matrix Var(θ̂MWE) in (3.3) of the corresponding estimate θ̂MWE.

The estimator θ̂MWE is not necessarily a least-squares type estimator; that is, it may not be representable in the form (1.3) for some N × N weight matrix W and hence there may be no associated weighted sum of squares which is minimized by the MWE. However, for any given W, we can always find matrices Oj such that

C = X^T W   (3.5)

and therefore achieve θ̂MWE = θ̂WLSE. The following result gives a constructive solution to the matrix equations (3.5).

Lemma 3.2

Assume that f_1(t) ≠ 0 for all t ∈ [a, b]. Define O_j = ω_j e_1^T,

ω_j = (X^T W)_j / f_1(t_j) ∈ ℝ^m,   (3.6)

where e_1 = (1, 0, . . ., 0)^T ∈ ℝ^m is the first unit vector and (X^T W)_j denotes the j-th column of the m × N matrix X^T W. Then the corresponding matrix-weighted estimator satisfies θ̂_MWE = θ̂_WLSE.

Proof

The matrix equation (3.5) can be written as N vector equations

O_j f(t_j) = (X^T W)_j,   j = 1, . . ., N,   (3.7)

with respect to the matrices O_j. Assume that O_j = ω_j e_1^T for some ω_j ∈ ℝ^m. Then

O_j f(t_j) = ω_j e_1^T f(t_j) = ω_j f_1(t_j)

and equation (3.7) has the unique solution (3.6).

The form O_j = ω_j e_1^T for the matrices O_j considered in Lemma 3.2 means that the matrix O_j has the vector ω_j as its first column while all other entries of this matrix are zero. We shall refer to this form as the one-column form. We can choose other forms for the matrices O_j, but then we would require different, somewhat stronger, assumptions on the vector f(t). For example, if f(t) ≠ (0, . . ., 0)^T for all t ∈ [a, b], then we can always choose diagonal matrices O_j satisfying (3.5) (see Lemma 3.5 below).

The following choices for Oj ensure coincidence of θ̂MWE with the three popular estimators defined in the Introduction.

If O_j = I_m for all j, then θ̂_MWE = θ̂_OLSE.

If O_j = s_j I_m for all j, then θ̂_MWE = θ̂_SLSE.

If W = Σ^{-1} and O_j = ω_j e_1^T with ω_j = (X^T Σ^{-1})_j / f_1(t_j), then θ̂_MWE = θ̂_BLUE.

We shall call any MWE θ̂MWE optimal if it coincides with the BLUE. In view of the importance of the last case, the corresponding result is summarized in the following lemma.

Lemma 3.3

Consider the regression model (1.1) and let f_1(t) ≠ 0 for all t ∈ [a, b]. For a given set of N observation points {t_1, . . ., t_N} the MWE θ̂_MWE defines a BLUE if O_j = ω_j e_1^T with ω_j = (X^T Σ^{-1})_j / f_1(t_j).

If the covariance kernel of the error process has triangular form (2.4) then we can derive the explicit form for the optimal MWE. The result follows by a direct application of Lemma A.1.

Lemma 3.4

Assume that the covariance kernel K(·, ·) has the form (2.4) and that the matrix Σ = (K(t_i, t_j))_{i,j=1,. . .,N} is positive definite with entries K(t_i, t_j) = u_i v_j for i ≤ j, where for k = 1, . . ., N we denote u_k = u(t_k), v_k = v(t_k) and q_k = u_k/v_k. Then we have the following representation for the optimal vectors ω_j = (X^T Σ^{-1})_j / f_1(t_j) ∈ ℝ^m introduced in Lemma 3.3:

ω_1 = (c/f_1(t_1))(σ̃_{11} f(t_1) + σ̃_{12} f(t_2)) = c u_2 / (f_1(t_1) v_1 v_2 (q_2 − q_1)) · (f(t_1)/u_1 − f(t_2)/u_2),   (3.8)
ω_N = (c/f_1(t_N))(σ̃_{N,N} f(t_N) + σ̃_{N−1,N} f(t_{N−1})) = c / (f_1(t_N) v_N (q_N − q_{N−1})) · (f(t_N)/v_N − f(t_{N−1})/v_{N−1}),   (3.9)
ω_i = (c/f_1(t_i))(σ̃_{i,i} f(t_i) + σ̃_{i−1,i} f(t_{i−1}) + σ̃_{i,i+1} f(t_{i+1})) = c/(f_1(t_i) v_i) · ( (q_{i+1} − q_{i−1}) f(t_i) / (v_i (q_{i+1} − q_i)(q_i − q_{i−1})) − f(t_{i−1}) / (v_{i−1} (q_i − q_{i−1})) − f(t_{i+1}) / (v_{i+1} (q_{i+1} − q_i)) ),   (3.10)

for i = 2, . . ., N − 1. Here, in formulas (3.8), (3.9) and (3.10), σ̃_{ij} denote the elements of the matrix Σ^{-1} = (σ̃_{ij})_{i,j=1,. . .,N}.

The following lemma provides a result similar to Lemmas 3.2 and 3.3 for the case where the matrices O_j are diagonal. An extension of Lemma 3.4 to matrices O_j of diagonal form is straightforward and omitted for the sake of brevity.

Lemma 3.5

Consider the regression model (1.1) and let f_k(t) ≠ 0 for all t ∈ [a, b] and all k = 1, . . ., m. For each j = 1, . . ., N, define the diagonal matrix O_j by its diagonal elements

(O_j)_{k,k} = (X^T W)_{k,j} / f_k(t_j),   k = 1, . . ., m,

where (X^T W)_{k,j} denotes the (k, j)-th element of the matrix X^T W. Then θ̂_MWE = θ̂_WLSE. If additionally W = Σ^{-1}, so that (O_j)_{k,k} = (X^T Σ^{-1})_{k,j} / f_k(t_j), then θ̂_MWE = θ̂_BLUE.
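The construction of Lemma 3.5 is easy to verify numerically. In the sketch below (our own; the grid, kernel and regression functions are illustrative choices) the diagonal weights yield C = X^T Σ^{-1}, so the MWE coincides with the BLUE:

```python
import numpy as np

t = np.linspace(1.0, 2.0, 15)
X = np.column_stack([np.ones_like(t), t, t**2])   # f(t) = (1, t, t^2)', all nonzero
Sigma = np.minimum.outer(t, t)                    # Brownian-motion kernel

XtSinv = X.T @ np.linalg.inv(Sigma)
# assemble C column by column, C[:, j] = O_j f(t_j) with diagonal O_j
C = np.empty_like(XtSinv)
for j in range(len(t)):
    O_j = np.diag(XtSinv[:, j] / X[j, :])         # (O_j)_kk = (X'S^{-1})_kj / f_k(t_j)
    C[:, j] = O_j @ X[j, :]
assert np.allclose(C, XtSinv)                     # hence theta_MWE = theta_BLUE

M = C @ X
cov_mwe  = np.linalg.inv(M) @ C @ Sigma @ C.T @ np.linalg.inv(M).T   # (3.3)
cov_blue = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))
assert np.allclose(cov_mwe, cov_blue)
```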

3.2 Weak convergence of matrix-weighted designs

Let Q : [a, b] → [0, 1] be an increasing function on the interval [a, b] with Q(a) = 0 and Q(b) = 1, so that Q(·) is a c.d.f. For fixed N and j = 1, . . ., N, define the points t_{1,N}, . . ., t_{N,N} by (2.9). Suppose that with each t ∈ [a, b] we can associate an m × m matrix O(t), and consider an N-point matrix-weighted design ξ_N of the form (3.4) with t_j = t_{j,N} and O_j = O(t_{j,N}). In view of (3.1) and (3.3) this design has the covariance matrix

D(ξ_N) = M^{-1}(ξ_N) B(ξ_N) (M^{-1}(ξ_N))^T,

where the matrices M(ξN) and B(ξN) are defined by

M(ξ_N) = (1/N) ∑_{j=1}^N O(t_{j,N}) f(t_{j,N}) f^T(t_{j,N}),   B(ξ_N) = (1/N²) ∑_{i=1}^N ∑_{j=1}^N K(t_{i,N}, t_{j,N}) O(t_{i,N}) f(t_{i,N}) f^T(t_{j,N}) O^T(t_{j,N}).

In addition to the sequence of matrix-weighted designs ξ_N, consider the sequence of uniform distributions on the set {t_{1,N}, . . ., t_{N,N}}. As N → ∞, this sequence converges weakly to the design (probability measure) ζ on the interval [a, b] with distribution function Q. This implies

lim_{N→∞} M(ξ_N) = M(ξ) = ∫_a^b O(t) f(t) f^T(t) ζ(dt),   lim_{N→∞} B(ξ_N) = B(ξ) = ∫_a^b ∫_a^b K(t, s) O(t) f(t) f^T(s) O^T(s) ζ(dt) ζ(ds),

and

lim_{N→∞} D(ξ_N) = D(ξ) = M^{-1}(ξ) B(ξ) (M^{-1}(ξ))^T   (3.11)

under the assumptions that the vector-valued function f, the matrix-valued function O and the kernel K are continuous on the interval [a, b] and that the generalized information matrix M(ξ) is non-singular. Moreover, the sequence of estimators (3.2) converges (almost surely as N → ∞) to

θ̂_{MWE,∞} = M^{-1}(ξ) ∫_a^b O(t) f(t) y(t) ζ(dt),   (3.12)

where {y(t) | t ∈ [a, b]} is the stochastic process in the continuous time model (1.2). Bearing these limiting expressions in mind we say that the sequence of matrix-weighted designs ξN defined by (3.4) converges to the limiting matrix-weighted design ξ(dt) = O(t)ζ(dt) as N → ∞. This relation justifies the notation M(ξ), B(ξ) and D(ξ) of the previous paragraph.

The (optimal) limiting matrix-weighted designs constructed below have a structure similar to that of the signed measures in (2.12). They assign matrix weights O_a and O_b to the end-points of the interval [a, b] and a 'matrix density' O(t) to the points t ∈ (a, b); that is, these designs have the form

ξ(dt) = O_a δ_a(dt) + O_b δ_b(dt) + O(t) dt.   (3.13)

In view of (3.12), the MWE in the continuous time model (1.2) associated with any design of the form (3.13) can be written as

θ̂_MWE(ξ) = M^{-1}(ξ) [O_a f(a) y(a) + O_b f(b) y(b) + ∫_a^b O(t) f(t) y(t) dt],   (3.14)

where M(ξ) = O_a f(a) f^T(a) + O_b f(b) f^T(b) + ∫_a^b O(t) f(t) f^T(t) dt. In the particular case associated with Lemma 3.4, the matrices O_a and O_b and the matrix function O(t) in (3.13) have the following structure:

O_a = ω_a e_1^T,   O_b = ω_b e_1^T,   O(t) = ω(t) e_1^T   for t ∈ (a, b),   (3.15)

where ω_a and ω_b are some m-dimensional vectors and ω(t) ∈ ℝ^m is a vector-valued function defined on the interval (a, b). Note that ω(t) does not have to approach ω_a and ω_b as t → a and t → b, respectively.

When the sequence of matrix-weighted designs is defined by the formulas of Lemma 3.3 we can compute the limiting matrix-weighted design. The proof follows by similar arguments as given in the proof of Theorem 2.1 and is therefore omitted.

Corollary 3.1

Consider model (1.1), where the error process {ε(t) | t ∈ [a, b]} has a covariance kernel K of the form (2.4). Assume that u(·), v(·), q(·) are strictly positive, twice continuously differentiable functions on the interval [a, b] and that the vector-valued function f(·) is twice continuously differentiable with f_1(t) ≠ 0 for all t ∈ [a, b]. Consider the matrix-weighted design ξ_N of the form (3.4), where the support points t_j = t_{j,N} are generated by (2.9) and the matrix weights O_j = O_{j,N} are defined in Lemma 3.3. The sequence {ξ_N}_{N∈ℕ} converges (in the sense defined in the previous paragraph) to a matrix-weighted design ξ* defined by (3.13) and (3.15) with

ω_a = c / (f_1(a) v²(a) q′(a)) · [f(a) u′(a)/u(a) − f′(a)],   ω_b = c h′(b) / (f_1(b) v(b) q′(b)),   ω(t) = −c / (f_1(t) v(t)) · [h′(t)/q′(t)]′,   (3.16)

where h(t) = f(t)/v(t) and the constant c ≠ 0 is arbitrary.

In Corollary 3.1, the one-column representation of the matrices Oj is used. The following statement contains a similar result for the case where the matrices Oj are diagonal.

Corollary 3.2

Let the conditions of Corollary 3.1 hold and assume additionally that f_k(t) ≠ 0 for all t ∈ [a, b] and all k = 1, . . ., m. Consider the matrix-weighted design ξ_N of the form (3.4), where the support points t_j = t_{j,N} are generated by (2.9) and the matrices O_j = O_{j,N} are defined in Lemma 3.5 with diagonal elements given by (O_j)_{k,k} = (X^T Σ^{-1})_{k,j} / f_k(t_j). Then the sequence {ξ_N}_{N∈ℕ} converges to the optimal matrix-weighted design ξ* of the form (3.13), where the diagonal elements of the matrices O_a = diag(O_{a,11}, . . ., O_{a,mm}), O_b = diag(O_{b,11}, . . ., O_{b,mm}) and O(t) = diag(O_{11}(t), . . ., O_{mm}(t)) are given by

O_{a,jj} = c / (f_j(a) v²(a) q′(a)) · [f_j(a) u′(a)/u(a) − f_j′(a)],   O_{b,jj} = c h_j′(b) / (f_j(b) v(b) q′(b)),   O_{jj}(t) = −c / (f_j(t) v(t)) · [h_j′(t)/q′(t)]′,

respectively, where h_j(t) = f_j(t)/v(t), j = 1, . . ., m, and the constant c ≠ 0 is arbitrary.

3.3 Optimal designs and best linear estimators

In this section we consider again the continuous time model (1.2), where the full trajectory of the process {y(t)|t ∈ [a, b]} can be observed. We start by recalling some known facts concerning best linear unbiased estimation. For details we refer the interested reader to the work of Grenander (1950) or Section 2.2 in Näther (1985a). Any linear estimator of θ can be written in the form of the integral

θ̂_μ = ∫_a^b y(t) μ(dt),   (3.17)

where μ(t) = (μ_1(t), . . ., μ_m(t))^T is a vector of signed measures on the interval [a, b]. For given μ, the estimator θ̂_μ is unbiased if and only if ∫_a^b f(t) μ^T(dt) = I_m, where I_m denotes the m-dimensional identity matrix. Theorem 2.3 in Näther (1985a) states that the estimator θ̂_{μ*} is the BLUE if and only if ∫_a^b f(t) μ*^T(dt) = I_m and the identity

∫_a^b K(u, v) μ*(dv) = A f(u)

holds for all u ∈ [a, b], where A is some m × m matrix. The matrix A is uniquely defined and coincides with the matrix

D* = Var(θ̂_{μ*}) = inf{∫_a^b ∫_a^b K(u, v) μ(du) μ^T(dv) | μ a vector of signed measures satisfying ∫_a^b f(t) μ^T(dt) = I_m}.   (3.18)

The Gauss-Markov theorem further implies that D* ≤ Var(θ̂), where θ̂ is any other linear unbiased estimator of θ.

Definition 3.2

A matrix-weighted design ξ* is called optimal if D(ξ*) = D*, where D(ξ) is defined in (3.11) and D* is defined in (3.18).

The designs we consider have the form (3.13) and the corresponding MWEs are expressed by (3.14). The estimator (3.14) can be expressed in the form (3.17), that is, θ̂_MWE(ξ) = θ̂_μ with

μ(dt) = M^{-1}(ξ) [O_a f(a) δ_a(dt) + O_b f(b) δ_b(dt) + O(t) f(t) dt].

The estimators defined in (3.14) are always unbiased, and the following result provides the matrix-weighted optimal design and the BLUE in the continuous time model (1.2). The proof follows by similar arguments as given in the proofs of Theorems 2.2 and 2.3 and is therefore omitted.

Theorem 3.1

Let K(t, s) be a covariance kernel of the form (2.4) and let the vector-valued function f(·) be twice continuously differentiable with f_1(t) ≠ 0 for all t ∈ [a, b]. Under the assumptions of Corollary 3.1 the matrix-weighted design ξ* defined by the formulas (3.13) and (3.16) with c = 1 is optimal in the sense of Definition 3.2. Moreover, if

μ*(dt) = M^{-1}(ξ*) [ω_a e_1^T f(a) δ_a(dt) + ω_b e_1^T f(b) δ_b(dt) + ω(t) e_1^T f(t) dt],

then θ̂μ* defines the BLUE in model (1.2). Additionally, we have

D(ξ*) = D* = ∫_a^b ∫_a^b K(s, t) μ*(ds) μ*^T(dt) = M^{-1}(ξ*),

where the matrix M(ξ*) is given by

M(ξ*) = f̃(q(a)) f̃^T(q(a)) / q(a) + ∫_{q(a)}^{q(b)} f̃′(s) f̃′^T(s) ds

and f̃(s) = f(q^{-1}(s))/v(q^{-1}(s)).
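For Brownian motion (q(t) = t, v(t) = 1, hence f̃ = f) the matrix M(ξ*) can be evaluated directly. The following sketch (our own, for the hypothetical cubic model on [1, 2]) compares D* = M^{-1}(ξ*) with the BLUE covariance matrix on a fine grid:

```python
import numpy as np

a, b = 1.0, 2.0
f  = lambda t: np.array([1.0, t, t**2, t**3])     # cubic model, m = 4
df = lambda t: np.array([0.0, 1.0, 2*t, 3*t**2])

# M(xi*) = f(a)f(a)'/a + int_a^b f'(s) f'(s)' ds, via the trapezoid rule
s = np.linspace(a, b, 5001)
F = np.array([np.outer(df(u), df(u)) for u in s])
M = np.outer(f(a), f(a)) / a + np.trapz(F, x=s, axis=0)
D_star = np.linalg.inv(M)

# BLUE covariance (X' Sigma^{-1} X)^{-1} on a fine grid approaches D_star
t = np.linspace(a, b, 2000)
X = np.column_stack([np.ones_like(t), t, t**2, t**3])
Sigma = np.minimum.outer(t, t)
D_grid = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))
print(np.abs(D_grid - D_star).max())   # small: the two matrices agree
```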

In Theorem 3.1 we have used the one-column representation for the matrices O(t). Similar arguments establish the optimality of the matrix-weighted designs ξ* defined in Corollary 3.2 where the diagonal representation for the matrices O(t) is used. The details are omitted for the sake of brevity.

3.4 Examples of optimal matrix-weighted designs

Consider the polynomial regression model with f(t) = (1, t, t², . . ., t^{m−1})^T, t ∈ [a, b], and the covariance kernel of the Brownian motion, K(t, s) = min(t, s). For the construction of matrix-weighted designs we use matrices O(t) in the one-column and diagonal representations.

For the one-column representation we obtain from Corollary 3.1 and Theorem 3.1 that the optimal matrix-weighted design has masses O_a = ω_a e_1^T and O_b = ω_b e_1^T at the points a and b, respectively, and the density O(t) = ω(t) e_1^T. Here the vectors ω_a, ω_b and ω(t) are given by

ω_a = (1/a, 0, −a, . . ., (2 − m) a^{m−2})^T,   ω_b = (0, 1, 2b, 3b², . . ., (m − 1) b^{m−2})^T,   ω(t) = (0, 0, −2, −3·2t, . . ., −(m − 1)(m − 2) t^{m−3})^T,   t ∈ (a, b),

respectively. For the diagonal representation we obtain from Corollary 3.2 (and an analogue of Theorem 3.1) that the optimal matrix-weighted design has masses O_a and O_b at the points a and b, respectively, and the density O(t), where

O_a = diag(1/a, 0, −1/a, . . ., (2 − m)/a),   O_b = diag(0, 1/b, 2/b, . . ., (m − 1)/b),   O(t) = diag(0, 0, −2/t², . . ., −(m − 1)(m − 2)/t²),   t ∈ (a, b).

Note that in this case all non-vanishing diagonal elements of the matrix O(t) are proportional to the function 1/t². According to Lemma 3.1, we can use ΛO(t) instead of O(t) for any non-singular m × m matrix Λ. By taking the matrix

Λ = diag(1, 1, −1/2, . . ., −1/[(m − 1)(m − 2)]),

we obtain

ΛO_a = diag(1/a, 0, 1/(2a), . . ., 1/[(m − 1)a]),   ΛO_b = diag(0, 1/b, −1/b, . . ., −1/[(m − 2)b]),   ΛO(t) = diag(0, 0, 1/t², . . ., 1/t²),   t ∈ (a, b).

As another example, consider the polynomial regression model with f(t) = (1, t, t², . . ., t^{m−1})^T, t ∈ [a, b], and the triangular covariance kernel of the form (2.4) with u(t) = t^γ and v(t) = t^ω. For the diagonal representation we obtain from Corollary 3.2 that the optimal matrix-weighted design has masses O_a and O_b at the points a and b, respectively, and the density O(t), where

O_a = a^{−γ−ω} diag(−γ, 1 − γ, 2 − γ, . . ., m − 1 − γ),   O_b = b^{−γ−ω} diag(ω, ω − 1, ω − 2, . . ., ω + 1 − m),   O(t) = t^{−1−γ−ω} diag(τ_1, τ_2, . . ., τ_m),   t ∈ (a, b),

with τ_i = (i − 1 − γ)(i − 1 − ω), i = 1, . . ., m. If we further use Λ = diag(1/τ_1, 1/τ_2, . . ., 1/τ_m), then we obtain ΛO(t) = t^{−1−γ−ω} diag(1, 1, . . ., 1), t ∈ (a, b); that is, all components of the matrix ΛO(t) have exactly the same density.

3.5 Practical implementation

Here we only consider the diagonal representation of the matrices O_a, O_b and O(t); the case of the one-column representation of the matrices O can be treated similarly. We assign the matrix weights O_a and O_b to the boundary points a and b and use an N-point approximation to an absolutely continuous probability measure on (a, b) with some density φ(t). The density φ(t) is defined to be either the uniform density on (a, b) (if the nonzero elements of different components of O(t) are not proportional to each other) or φ(t) = c|O_{l,l}(t)| for some l ∈ {1, . . ., m} (if the nonzero elements of different components of O(t) are proportional to each other), where c is the normalization constant and l is such that the density φ(t) is not identically zero on the interval (a, b). Denote by F(t) = ∫_a^t φ(s) ds the corresponding c.d.f. For given N, we calculate an N-point approximation {t_{1,N}, . . ., t_{N,N}; 1/N, . . ., 1/N} to the probability measure with density φ(t), where t_{i,N} = F^{-1}(z_{i,N}) with z_{i,N} = i/(N + 1), i = 1, 2, . . ., N.

To each point t_{j,N} we assign a vector of weights s_j = (s_{j,N,1}, . . ., s_{j,N,m})^T such that s_{j,N,k} ∈ {−1, 0, 1} (k = 1, . . ., m). The values s_{j,N,k} = sign(O_{k,k}(t_j)) = ±1 correspond to the sign of the point t_{j,N} in the estimation of θ_k, exactly as in the procedure for one-parameter models described in Section 2.5. Some of the values s_{j,N,k} may be 0. If s_{j,N,k} = 0 for some k, then the point t_{j,N} is not used for the estimation of θ_k. By assigning zero weight to a point t_{j,N} in the k-th estimation direction, we perform a thinning of the sample of points t_{1,N}, . . ., t_{N,N} in the k-th direction and thus achieve the required density in each estimation direction. This is a deterministic version of the well-known 'rejection method' widely used to generate samples from various probability distributions. If the nonzero components of the matrix weight O(t) are proportional to each other, then for these components s_{j,N,k} = 1 for all j and N.

The resulting estimator θ̂ has the form (3.2) where

C = (N O_a f(a), S_1 P f(t_1), . . ., S_N P f(t_N), N O_b f(b)),   S_j = diag(s_{j,N,1}, . . ., s_{j,N,m}) ∈ ℝ^{m×m},

and P is the diagonal m × m matrix whose diagonal elements are given by

P_{k,k} = N ∫_a^b O_{k,k}(t) dt / ∑_{j=1}^N s_{j,N,k}.

If the nonzero elements of different components of the matrix weight O(t) are proportional to each other (as was the case in the examples of Section 3.4), then the (N + 2)-point approximations to the limiting design are very similar to the approximations in the one-parameter case considered in Section 2.5; their accuracy is also very high. Otherwise, when the diagonal elements of O(t) are non-proportional, the accuracy of the approximations will depend on the degree of non-homogeneity of the components of the matrix weight O(t).

3.6 Some numerical results

For the comparison of competing matrix-weighted designs for multi-parameter models it is convenient to consider a functional of the covariance matrix. As an example, we investigate in this section the classical D-optimality criterion defined as Ψ(D(ξ)) = (det D(ξ))^{1/m}, which has to be minimized.

As an example where all nonzero elements of the matrix O(t) are proportional to each other, let us consider the cubic regression model with f(t) = (1, t, t², t³)^T and the Brownian motion error process. The optimal value in the continuous time model (1.2) is Ψ(D*) ≃ 2.7927. In Figure 4 we display the D-criterion of the covariance matrices of the MWE and the BLUE for the proposed (N + 2)-point designs and of the covariance matrix of the BLUE with corresponding optimal (N + 2)-point designs. We can see that the D-efficiency of the proposed matrix-weighted design is very high, even for small N.

Figure 4.


The D-optimality criterion of the covariance matrix of the MWE for the proposed (N + 2)-point designs (crosses), of the covariance matrix of the BLUE for the proposed (N + 2)-point designs (line) and of the covariance matrix of the BLUE with corresponding D-optimal (N + 2)-point designs (grey circles). The error process in model (1.1) is the Brownian motion and the vector of regression functions is given by f(t) = (1, t, t², t³)^T, t ∈ [1, 2].

The second example of this section considers a situation where the nonzero elements of the matrix O(t) are not proportional to each other. For this purpose we consider the model (1.1) with f(t) = (1, t, t²)^T, t ∈ [1, 2], and covariance kernel K(t, t′) = e^{−|t−t′|}, that is, u(t) = e^t and v(t) = e^{−t}. Using the diagonal representation, we obtain for the optimal matrix-weighted design

O_a = diag(1, 0, −1),   O_b = diag(1, 1.5, 2),   O(t) = diag(1, 1, 1 − 2/t²),   t ∈ (1, 2).

The optimal value in the continuous time model (1.2) is given by Ψ(D*) ≃ 1.6779. Since some diagonal elements of O(t) are constant functions, we take the support points of the design ξ_{N+2} to be equidistant: t_{i,N} = 1 + i/(N + 1) for i = 1, . . ., N. Then we have s_{j,N,k} = 1 for all j = 1, . . ., N and k = 1, 2. However, some elements of (s_{1,N,3}, . . ., s_{N,N,3}) should be zero because O_{3,3}(t) is not proportional to O_{1,1}(t). For example, for N = 10 the vector of signs (s_{1,N,3}, . . ., s_{N,N,3}) is (−1, −1, 0, 0, 0, 1, 0, 0, 1, 0) and for N = 30 it is (−1, 0, −1, −1, 0, −1, 0, 0, −1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1).

In Figure 5 we depict the D-optimality criterion of the covariance matrices of the various estimators. We observe that in this example, for all N, the D-optimality criterion of the covariance matrix of the MWE is slightly larger than that of the BLUE. However, the proposed (N+2)-point designs remain very efficient compared to the BLUE with the corresponding D-optimal (N+2)-point designs, even for small N.

Figure 5. The D-optimality criterion of the covariance matrix of the MWE for the proposed (N+2)-point designs (crosses), of the covariance matrix of the BLUE for the proposed (N+2)-point designs (line), and of the BLUE with the corresponding D-optimal (N+2)-point designs (grey circles). The covariance kernel in model (1.1) is K(t, t′) = e^{−|t−t′|} and the vector of regression functions is f(t) = (1, t, t²)ᵀ, t ∈ [1, 2].

Acknowledgments

This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt C2) of the German Research Foundation (DFG). The research of H. Dette reported in this publication was also partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would also like to thank Martina Stein, who typed parts of this paper with considerable technical expertise. The work of Andrey Pepelyshev was partly supported by the Russian Foundation for Basic Research, project 12-01-00747.

A Proof of main results

A.1 Explicit form of the inverse of the covariance matrix of errors

Here we state an auxiliary result, which gives an explicit form for the inverse of the matrix Σ = (K(t_i, t_j))_{i,j=1}^N with a triangular covariance kernel K. We did not find this result (as formulated below) in the literature. Versions of Lemma A.1, however, have been derived independently by different authors; see, for example, Lemma 7.3.2 in Zhigljavsky (1991) and formula (8) in Harman and Štulajter (2011). The proof follows by directly checking the condition Σ⁻¹Σ = ΣΣ⁻¹ = I.

Lemma A.1

Consider a symmetric N × N matrix Σ = (σ_{i,j})_{i,j=1,...,N} whose elements are defined by the formula σ_{i,j} = u_i v_j for 1 ≤ i ≤ j ≤ N. Assume that q_1 < q_2 < . . . < q_N, where q_i = u_i/v_i. Then the inverse matrix Σ̃ = Σ⁻¹ is a symmetric tri-diagonal matrix whose elements σ̃_{i,j} with i ≤ j can be computed as follows:

$$\tilde\sigma_{1,1} = \frac{u_2}{u_1 v_1 v_2 (q_2 - q_1)}, \qquad \tilde\sigma_{N,N} = \frac{1}{v_N^2 (q_N - q_{N-1})},$$
$$\tilde\sigma_{i,i} = \frac{q_{i+1} - q_{i-1}}{v_i^2 (q_i - q_{i-1})(q_{i+1} - q_i)} \quad (i = 2, \ldots, N-1),$$
$$\tilde\sigma_{i,i+1} = \frac{-1}{v_i v_{i+1}(q_{i+1} - q_i)} \quad (i = 1, \ldots, N-1), \qquad \tilde\sigma_{i,i+k} = 0 \quad (i = 1, \ldots, N-2,\; k \geq 2).$$

In our applications of Lemma A.1 we take σ_{i,j} = K(t_i, t_j) with the covariance kernel K having the form (2.4).
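Lemma A.1 is easy to verify numerically. The sketch below (our illustration, not part of the paper) checks the formulas against a direct inversion for the Brownian-motion kernel K(t, t′) = min(t, t′), for which u(t) = t, v(t) = 1 and q(t) = t:

```python
import numpy as np

# Numerical check of Lemma A.1 for the Brownian-motion kernel.
N = 6
t = np.linspace(0.5, 2.0, N)
u, v = t.copy(), np.ones(N)
q = u / v                                # strictly increasing, as required
Sigma = np.minimum.outer(t, t)           # sigma_{ij} = u_i v_j for i <= j

inv = np.zeros((N, N))
inv[0, 0] = u[1] / (u[0] * v[0] * v[1] * (q[1] - q[0]))
inv[N-1, N-1] = 1.0 / (v[N-1]**2 * (q[N-1] - q[N-2]))
for i in range(1, N - 1):
    inv[i, i] = (q[i+1] - q[i-1]) / (v[i]**2 * (q[i] - q[i-1]) * (q[i+1] - q[i]))
for i in range(N - 1):
    inv[i, i+1] = inv[i+1, i] = -1.0 / (v[i] * v[i+1] * (q[i+1] - q[i]))

assert np.allclose(inv, np.linalg.inv(Sigma))   # tri-diagonal inverse matches
```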

A.2 Proof of Lemma 2.1

Denote K_{ij} = K(t_i, t_j), f_i = f(t_i), a_i = f_i w_i (i, j = 1, . . . , N), and a = (a_1, . . . , a_N)ᵀ. Then for any signed measure ξ = {t_1, . . . , t_N; w_1, . . . , w_N} we have

$$D(\xi) = \frac{\sum_i \sum_j K_{ij} f_i f_j w_i w_j}{\big(\sum_i f_i^2 w_i\big)^2} = \frac{\sum_i \sum_j K_{ij} a_i a_j}{\big(\sum_i f_i a_i\big)^2} = \frac{a^T \Sigma a}{(a^T f)^2}.$$

Since Σ is symmetric and Σ > 0, there exist Σ⁻¹ and a symmetric matrix Σ^{1/2} > 0 such that Σ = Σ^{1/2}Σ^{1/2}. Denote b = Σ^{1/2}a and d = Σ^{−1/2}f. Then we can write the design optimality criterion as D(ξ) = bᵀb/(bᵀd)². The Cauchy–Schwarz inequality gives, for any two vectors b and d, the inequality (bᵀd)² ≤ (bᵀb)(dᵀd), that is, bᵀb/(bᵀd)² ≥ 1/(dᵀd). With b and d as above, this is equivalent to D(ξ) ≥ 1/(fᵀΣ⁻¹f) for all ξ. Equality is attained if the vector b is proportional to the vector d, that is, if b_i = c d_i for all i and some c ≠ 0. Finally, the equality b_i = c d_i can be rewritten in the form w_i = c(Σ⁻¹f)_i/f(t_i).
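The argument can be illustrated numerically. The following sketch (ours, with the Brownian-motion kernel and f(t) = t as assumed choices) confirms the lower bound 1/(fᵀΣ⁻¹f) and that the weights of Lemma 2.1 attain it:

```python
import numpy as np

# Check D(xi) >= 1 / (f^T Sigma^{-1} f) and that w_i = c (Sigma^{-1} f)_i / f(t_i)
# attains the bound, for illustrative choices of kernel and f.
rng = np.random.default_rng(0)
N = 8
t = np.sort(rng.uniform(0.5, 2.0, N))
f = t.copy()                                  # f(t) = t
Sigma = np.minimum.outer(t, t)                # K(t, t') = min(t, t')
Sinv_f = np.linalg.solve(Sigma, f)

def D(w):
    a = f * w                                 # a_i = f_i w_i, as in the proof
    return a @ Sigma @ a / (f @ a) ** 2

w_opt = Sinv_f / f                            # optimal weights (up to scaling)
lower = 1.0 / (f @ Sinv_f)
assert np.isclose(D(w_opt), lower)
for _ in range(100):                          # random designs never beat the bound
    assert D(rng.normal(size=N)) >= lower - 1e-12
```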

A.3 Proof of Theorem 2.1

Before starting the main proof we recall the definition of the design points (2.9) and prove the following auxiliary result.

Lemma A.2

Assume that q(·) = u(·)/v(·) is a twice continuously differentiable function on the interval [a, b]. Then for all i = 1, . . . , N − 1, we have

$$t_{i+1,N} - t_{i,N} = \frac{1}{N\,Q'(t_{i,N})} + O\!\Big(\frac{1}{N^2}\Big) \quad \text{as } N \to \infty, \tag{A.1}$$
$$\Delta_N = \frac{t_{i+1,N} - t_{i-1,N}}{2} = \frac{1}{N\,Q'(t_{i,N})}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) \quad \text{as } N \to \infty. \tag{A.2}$$
Proof of Lemma A.2

Recall the definition z_{i,N} = (i − 1/2)/N (i = 1, . . . , N) and set

$$m = q(a) = \min_{t \in [a,b]} q(t), \qquad M = q(b) = \max_{t \in [a,b]} q(t).$$

From the definition of the function Q in (2.8) we have

$$q(t_{i+1,N}) - q(t_{i,N}) = (M - m)(z_{i+1,N} - z_{i,N}) = \frac{M - m}{N} \tag{A.3}$$

for all i = 1, . . . , N − 1. Taylor's formula yields, for any z,

$$Q^{-1}(z + \delta) = Q^{-1}(z) + \delta \cdot (Q^{-1})'(z) + O(\delta^2) \quad \text{as } \delta \to 0.$$

In this formula, set z = z_{i,N} and δ = 1/N, so that z + δ = z_{i+1,N}. We thus obtain

$$t_{i+1,N} - t_{i,N} = Q^{-1}(z_{i+1,N}) - Q^{-1}(z_{i,N}) = \frac{1}{N}\,(Q^{-1})'(z_{i,N}) + O\!\Big(\frac{1}{N^2}\Big) \quad \text{as } N \to \infty.$$

By using (2.9) and the relation (Q⁻¹)′(z) = 1/Q′(Q⁻¹(z)) we can rewrite this in the form (A.1). The second statement follows directly from (A.1).
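For a numerical illustration of Lemma A.2 (ours, not from the paper), take q(t) = e^{2t} on [a, b] = [0, 1], so that Q(t) = (q(t) − q(a))/(q(b) − q(a)), as suggested by (A.3) and (2.9). The spacings of the points t_{i,N} = Q⁻¹(z_{i,N}) then agree with 1/(N Q′(t_{i,N})) up to O(1/N²):

```python
import numpy as np

# Design points t_{i,N} = Q^{-1}(z_{i,N}) with z_{i,N} = (i - 1/2)/N for
# q(t) = exp(2t) on [0, 1]; compare spacings with 1 / (N Q'(t_{i,N})).
a, b, N = 0.0, 1.0, 200
q = lambda t: np.exp(2 * t)
Qinv = lambda z: 0.5 * np.log(q(a) + z * (q(b) - q(a)))   # inverse of Q
Qprime = lambda t: 2 * np.exp(2 * t) / (q(b) - q(a))       # Q'(t)

z = (np.arange(1, N + 1) - 0.5) / N
t = Qinv(z)
spacings = np.diff(t)
approx = 1.0 / (N * Qprime(t[:-1]))
print(np.max(np.abs(spacings - approx)))   # O(1/N^2), tiny for N = 200
```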

Proof of Theorem 2.1

In view of Lemma 2.1 and (2.5) – (2.7) we have

$$w_{1,N} = c_N\,\frac{u_2}{f_1 v_1 v_2}\Big(\frac{f_1}{u_1} - \frac{f_2}{u_2}\Big), \qquad w_{N,N} = \frac{c_N}{f_N v_N}\Big(\frac{f_N}{v_N} - \frac{f_{N-1}}{v_{N-1}}\Big),$$
$$w_{i,N} = \frac{c_N}{f_i v_i}\Big(\frac{2 f_i}{v_i} - \frac{f_{i-1}}{v_{i-1}} - \frac{f_{i+1}}{v_{i+1}}\Big) \quad \text{for } i = 2, \ldots, N-1,$$

where we have used the relations (A.3). Here c_N is the normalization constant ensuring Σ_{i=1}^N w_{i,N} = 1, and we use the notation u_i = u(t_{i,N}), v_i = v(t_{i,N}) and f_i = f(t_{i,N}).

Consider first w_{1,N}. Denoting g(t) = f(t)/u(t), we have

$$\frac{w_{1,N}}{c_N} = \frac{u(t_{2,N})}{f(t_{1,N})\,v(t_{1,N})\,v(t_{2,N})}\,\big(g(t_{1,N}) - g(t_{2,N})\big), \tag{A.4}$$

which gives

$$\frac{u(t_{2,N})}{f(t_{1,N})\,v(t_{1,N})\,v(t_{2,N})} = \frac{u(a)}{f(a)\,v^2(a)}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big),$$
$$g(t_{1,N}) - g(a) = g'(a)(t_{1,N} - a) + O\big((t_{1,N} - a)^2\big) = g'(a) \cdot \frac{1}{2N\,Q'(a)} + O\!\Big(\frac{1}{N^2}\Big) \tag{A.5}$$

as N → ∞. Similarly

$$g(t_{2,N}) - g(a) = g'(a)\,\frac{3}{2N\,Q'(a)} + O\!\Big(\frac{1}{N^2}\Big),$$

yielding

$$g(t_{1,N}) - g(t_{2,N}) = -g'(a)\,\frac{1}{N\,Q'(a)} + O\!\Big(\frac{1}{N^2}\Big). \tag{A.6}$$

Combining (A.4), (A.5) and (A.6) we obtain

$$\frac{w_{1,N}}{c_N} = -\frac{1}{N} \cdot \frac{u(a)\,g'(a)}{f(a)\,v^2(a)\,Q'(a)}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) = \frac{1}{N} \cdot \frac{1}{v^2(a)\,Q'(a)}\Big[\frac{u'(a)}{u(a)} - \frac{f'(a)}{f(a)}\Big]\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) \tag{A.7}$$

as N → ∞. Similarly to (A.7), we obtain the asymptotic expression for w_{N,N}:

$$\frac{w_{N,N}}{c_N} = \frac{1}{N} \cdot \frac{h'(b)}{f(b)\,v(b)\,Q'(b)}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) \tag{A.8}$$

as N → ∞. Consider now the weights

$$w_{i,N} = c_N\,\frac{2h(t_{i,N}) - h(t_{i-1,N}) - h(t_{i+1,N})}{f(t_{i,N})\,v(t_{i,N})} \quad (i = 2, \ldots, N-1). \tag{A.9}$$

Assume that N → ∞ and that i = i(N) is such that i(N)/N = z + O(1/N) for some z ∈ (0, 1), and set t = Q⁻¹(z).

We are going to prove that

$$\frac{w_{i,N}}{c_N} = \frac{2h(t_{i,N}) - h(t_{i-1,N}) - h(t_{i+1,N})}{f(t_{i,N})\,v(t_{i,N})} \tag{A.10}$$
$$= \frac{1}{N^2} \cdot \frac{1}{(Q'(t))^2\,f(t)\,v(t)}\Big[\frac{h'(t)\,Q''(t)}{Q'(t)} - h''(t)\Big]\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) = -\frac{1}{N^2} \cdot \frac{1}{Q'(t)\,f(t)\,v(t)}\Big[\frac{h'(t)}{Q'(t)}\Big]'\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big). \tag{A.11}$$

First, in view of (2.9) we have t_{i,N} = t + O(1/N) and hence

$$f(t_{i,N})\,v(t_{i,N}) = f(t)\,v(t)\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) \quad \text{as } N \to \infty.$$

Consider the numerator in (A.10) and rewrite it as follows:

$$2h(t_{i,N}) - h(t_{i-1,N}) - h(t_{i+1,N}) = \big[2h(\bar t_{i,N}) - h(t_{i-1,N}) - h(t_{i+1,N})\big] + 2\big[h(t_{i,N}) - h(\bar t_{i,N})\big],$$

where t̄_{i,N} = (t_{i−1,N} + t_{i+1,N})/2. Clearly, t_{i+1,N} = t̄_{i,N} + Δ_N and t_{i−1,N} = t̄_{i,N} − Δ_N, where Δ_N = (t_{i+1,N} − t_{i−1,N})/2 is defined in (A.2). This yields

$$\frac{2h(\bar t_{i,N}) - h(t_{i-1,N}) - h(t_{i+1,N})}{\Delta_N^2} = -h''(t)\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big). \tag{A.12}$$

Next we consider

$$\frac{h(t_{i,N}) - h(\bar t_{i,N})}{\Delta_N^2} = \frac{h(t_{i,N}) - h(\bar t_{i,N})}{t_{i,N} - \bar t_{i,N}} \cdot \frac{t_{i,N} - \bar t_{i,N}}{\Delta_N^2}.$$

For the first factor we have

$$\frac{h(t_{i,N}) - h(\bar t_{i,N})}{t_{i,N} - \bar t_{i,N}} = h'(t)\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big),$$

while the second factor gives

$$\frac{t_{i,N} - \bar t_{i,N}}{\Delta_N^2} = \frac{2\,\big(2t_{i,N} - t_{i+1,N} - t_{i-1,N}\big)}{(t_{i+1,N} - t_{i-1,N})^2} = 2\,\frac{\big(2Q^{-1}(z_{i,N}) - Q^{-1}(z_{i+1,N}) - Q^{-1}(z_{i-1,N})\big)/N^{-2}}{\big(N\,\big(Q^{-1}(z_{i+1,N}) - Q^{-1}(z_{i-1,N})\big)\big)^2}$$
$$= -\frac{2\,(Q^{-1})''(z)}{\big(2\,(Q^{-1})'(z)\big)^2}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big) = \frac{Q''(t)}{2\,Q'(t)}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big),$$

where we have used the relation (Q⁻¹)″(z) = −Q″(Q⁻¹(z))/(Q′(Q⁻¹(z)))³ in the last equation. This gives, as N → ∞,

$$\frac{2\big(h(t_{i,N}) - h(\bar t_{i,N})\big)}{\Delta_N^2} = 2\,\frac{h(t_{i,N}) - h(\bar t_{i,N})}{t_{i,N} - \bar t_{i,N}} \cdot \frac{t_{i,N} - \bar t_{i,N}}{\Delta_N^2} = \frac{h'(t)\,Q''(t)}{Q'(t)}\Big(1 + O\!\Big(\frac{1}{N}\Big)\Big). \tag{A.13}$$

Combining the expressions (A.2), (A.10), (A.12) and (A.13) yields the asymptotic expression (A.11) for w_{i,N}/c_N.

By noting that c_N = NC(1 + O(1/N)) as N → ∞ for some constant C, and that the asymptotic density of the points t_{i,N} (i = 1, . . . , N) is Q′(t) on the interval [a, b], we deduce the statement of the theorem from the asymptotic formulas (A.7), (A.8) and (A.11) for w_{1,N}/c_N, w_{N,N}/c_N and w_{i,N}/c_N, respectively.

A.4 Proof of Theorem 2.2

By Theorem 3.3 in Dette et al. (2013) a design minimizes the functional (2.15) if the identity

$$\int_a^b \min(s, t)\,f(t)\,d\xi(t) = \lambda f(s) \tag{A.14}$$

holds ξ-almost everywhere, where λ is some constant. We consider the design ξ = ξ* defined by (2.12) and (2.13) and verify the condition (A.14) for this design. To this end we calculate, using integration by parts,

$$\frac{1}{c}\int_a^b \min(s,t)\,f(t)\,p(t)\,dt = \int_a^s t\,(-f''(t))\,dt + \int_s^b s\,(-f''(t))\,dt = -\Big\{t f'(t)\Big|_a^s - \int_a^s f'(t)\,dt\Big\} - s\,\big\{f'(b) - f'(s)\big\} = \big(a f'(a) - f(a)\big) - s f'(b) + f(s).$$

Observing the definition of the masses in (2.13), the identity (A.14) follows with λ = c. This proves the first part of Theorem 2.2.

For a proof of the second statement, consider the linear unbiased estimator θ̂_{μ*} in model (2.1) based on the full trajectory, where μ*(dt) = f(t)ξ*(dt) and ξ* is the design in (2.12), (2.13) with the constant c chosen such that θ̂_{μ*} is unbiased; that is,

$$c = \Big[\frac{f^2(a)}{a} + \int_a^b (f'(t))^2\,dt\Big]^{-1}.$$

Standard arguments of optimal design theory show that μ* minimizes Φ (that is, θ̂_{μ*} is the BLUE in model (2.1) when the full trajectory can be observed) if and only if the inequality

$$\Phi(\mu^*, \nu) = \int_a^b\!\!\int_a^b K(x, y)\,\mu^*(dx)\,\nu(dy) \;\geq\; \Phi(\mu^*) \tag{A.15}$$

holds for all signed measures ν satisfying ∫_a^b f(t) ν(dt) = 1. Observing this condition and the identity (A.14), we obtain

$$\Phi(\mu^*, \nu) = c\int_a^b f(s)\,\nu(ds) = c = \Big[\frac{f^2(a)}{a} + \int_a^b (f'(t))^2\,dt\Big]^{-1} = \Phi(\mu^*)$$

for all signed measures ν on [a, b] with ∫_a^b f(t) ν(dt) = 1. By (A.15), μ* minimizes Φ. Consequently, the corresponding estimator θ̂_{μ*} is the BLUE, with minimal variance

$$D^* = c = \Big[\frac{f^2(a)}{a} + \int_a^b (f'(t))^2\,dt\Big]^{-1}.$$
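The limit can be checked numerically. The sketch below (our illustration, with f(t) = t² on [1, 2] as an assumed example) compares the variance of the discrete BLUE on a fine grid, (fᵀΣ⁻¹f)⁻¹, with the continuous-time value D* above:

```python
import numpy as np

# Discrete BLUE variance for Brownian motion vs. the continuous-time
# value D* = [f(a)^2/a + int_a^b f'(t)^2 dt]^{-1}, with f(t) = t^2 on [1, 2].
a, b, N = 1.0, 2.0, 2000
t = np.linspace(a, b, N)
f = t**2
Sigma = np.minimum.outer(t, t)               # K(t, t') = min(t, t')
var_blue = 1.0 / (f @ np.linalg.solve(Sigma, f))

# f(a)^2/a = 1 and int_1^2 (2t)^2 dt = 4*(8 - 1)/3 = 28/3
D_star = 1.0 / (1.0 + 28.0 / 3.0)
print(var_blue, D_star)                      # close for large N
```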

A.5 Proof of Theorem 2.3

Let {ε̃(s) | s ∈ [ã, b̃]} be a Brownian motion on the interval [ã, b̃] and consider the regression model (2.1) with a regression function f̃(s) and this error process. By Theorem 2.2 the optimal design is given by

$$\tilde\xi^*(ds) = \tilde P_{\tilde a}\,\delta_{\tilde a}(ds) + \tilde P_{\tilde b}\,\delta_{\tilde b}(ds) + \tilde p(s)\,ds$$

with

$$\tilde P_{\tilde a} = c\,\frac{\tilde f(\tilde a) - \tilde f'(\tilde a)\,\tilde a}{\tilde a\,\tilde f(\tilde a)}, \qquad \tilde P_{\tilde b} = c\,\frac{\tilde f'(\tilde b)}{\tilde f(\tilde b)} \qquad \text{and} \qquad \tilde p(s) = -c\,\frac{\tilde f''(s)}{\tilde f(s)}.$$

We shall now use Theorem B.1 to derive the optimal design ξ*(dt) for the original regression model (2.1), with regression function f(t) and covariance kernel K(t, t′), from the design ξ̃*(ds) for the function f̃(s) = h(q⁻¹(s)), where h(t) = f(t)/v(t).

For the Brownian motion the covariance function is of the form (B.4) with ṽ(t) = 1 and q̃(t) = t, so that by (B.6) we have β(t) = q(t), α(t) = v(t) and α̃(s) = 1/v(q⁻¹(s)). According to (B.14), the optimal design dξ̃*(s) transforms to dξ*(t) = α̃²(β(t)) dξ̃*(β(t)) = dξ̃*(q(t))/v²(t).

Consider first the mass at b. We have P̃_b̃ = c f̃′(b̃)/f̃(b̃). Substituting s = q(t), so that t = q⁻¹(s), we obtain

$$P_b = \frac{\tilde P_{\tilde b}}{v^2(b)} = \frac{c\,\tilde f'(\tilde b)}{\tilde f(\tilde b)\,v^2(b)} = \frac{c\,h'(b)}{q'(b)\,v^2(b)\,h(b)} = \frac{c\,h'(b)}{q'(b)\,v(b)\,f(b)},$$

as required. From the representation of P̃_ã we obtain by similar arguments

$$P_a = \frac{\tilde P_{\tilde a}}{v^2(a)} = c\,\frac{h(a) - \tilde a\,h'(a)/q'(a)}{q(a)\,v^2(a)\,h(a)},$$

with ã = q(a). Let us now consider the density p̃(s), s ∈ [ã, b̃], and rewrite dξ̃_p(β(t)), the absolutely continuous part of the measure ξ̃*. The transformation of the variable s into t = q⁻¹(s) ∈ [a, b] induces the density

$$d\tilde\xi_p(\beta(t)) = \tilde p(q(t))\,q'(t) = -c\,q'(t)\,\frac{\tilde f''(q(t))}{\tilde f(q(t))}. \tag{A.16}$$

Differentiating the equality f̃(s) = h(q⁻¹(s)) twice, we have

$$\tilde f''(s) = \big(h'(q^{-1}(s)) \cdot (q^{-1})'(s)\big)' = h''(q^{-1}(s)) \cdot \big((q^{-1})'(s)\big)^2 + h'(q^{-1}(s)) \cdot (q^{-1})''(s).$$

Now we obtain

$$\tilde f''(q(t)) = \frac{h''(t)}{(q'(t))^2} - \frac{h'(t)\,q''(t)}{(q'(t))^3}.$$

Inserting this into (A.16) and taking into account that f̃(q(t)) = h(t), we obtain the density

$$d\tilde\xi_p(\beta(t)) = c\,\frac{1}{h(t)\,q'(t)}\Big(\frac{h'(t)\,q''(t)}{q'(t)} - h''(t)\Big) = -\frac{c}{h(t)}\Big[\frac{h'(t)}{q'(t)}\Big]'. \tag{A.17}$$

In view of the relation dξ*(t) = α̃²(β(t)) dξ̃*(β(t)), we divide the right-hand side of (A.17) by v²(t) and obtain the expression (2.11) for the density. This completes the proof of Theorem 2.3.

B Gaussian processes with triangular covariance kernels

B.1 Extended Doob’s representation

Assume that {ε (t)| t ∈ [a, b]} is a Gaussian process with covariance kernel K of the form (2.4); that is, K(t, t′) = u(t)v(t′) for tt′, where u(·) and v(·) are functions defined on the interval [a, b]. According to the terminology introduced in Mehr and McFadden (1965) kernels of the form (2.4) are called triangular. An alternative way of writing these covariance kernels is

$$K(t, t') = v(t)\,v(t')\,\min\{q(t), q(t')\} \quad \text{for } t, t' \in [a, b], \tag{B.1}$$

where q(t) = u(t)/v(t). We assume that ε(t) is non-degenerate on the open interval (a, b), which implies that the function q is strictly increasing and continuous on the interval [a, b] [see Mehr and McFadden (1965), Remark 2]. Moreover, this function is also positive on the interval (a, b) [see Remark 1 in Mehr and McFadden (1965)], which implies that the functions u and v have the same sign and can be assumed to be positive on (a, b) without loss of generality. We repeatedly use the following extension of Doob's celebrated representation [see Doob (1949)], which relates two Gaussian processes (on compact intervals) by a time-space transformation.

Lemma B.1

Let {ε(t) | t ∈ [a, b]} be a non-degenerate Gaussian process with zero mean and covariance function (B.1), and let ṽ and q̃ be continuous positive functions on [ã, b̃] such that q̃ is strictly increasing and q̃([ã, b̃]) = q([a, b]). Define the transformations β̃ : [ã, b̃] → [a, b] and α̃ : [ã, b̃] → ℝ₊ by

$$\tilde\beta(s) = q^{-1}(\tilde q(s)), \qquad \tilde\alpha(s) = \tilde v(s)/v(\tilde\beta(s)). \tag{B.2}$$

Then the Gaussian process {ε̃(s) | s ∈ [ã, b̃]} defined by

$$\tilde\varepsilon(s) = \tilde\alpha(s)\,\varepsilon(\tilde\beta(s)) \tag{B.3}$$

has zero mean and the covariance function is given by

$$\tilde K(s, s') = E[\tilde\varepsilon(s)\,\tilde\varepsilon(s')] = \tilde v(s)\,\tilde v(s')\,\min\big(\tilde q(s), \tilde q(s')\big). \tag{B.4}$$

Conversely, the Gaussian process ε(t) can be expressed via ε̃(s) by the transformation

$$\varepsilon(t) = \alpha(t)\,\tilde\varepsilon(\beta(t)), \tag{B.5}$$

where

$$\beta(t) = \tilde q^{-1}(q(t)), \qquad \alpha(t) = v(t)/\tilde v(\beta(t)). \tag{B.6}$$
Proof

Since {ε(t)|t ∈ [a, b]} is Gaussian and has zero mean, the process defined by (B.3) is also Gaussian and has zero mean. For the covariance function of the process (B.3) we have

$$E[\tilde\varepsilon(s)\,\tilde\varepsilon(s')] = \tilde\alpha(s)\,\tilde\alpha(s')\,E\big[\varepsilon(\tilde\beta(s))\,\varepsilon(\tilde\beta(s'))\big] = \tilde\alpha(s)\,\tilde\alpha(s')\,v(\tilde\beta(s))\,v(\tilde\beta(s'))\,\min\big[q(\tilde\beta(s)), q(\tilde\beta(s'))\big] = \tilde v(s)\,\tilde v(s')\,\min\big[\tilde q(s), \tilde q(s')\big] = \tilde K(s, s').$$

The second part of the proof follows by the same arguments and the details are therefore omitted.

Remark B.1

  1. The classical result of Doob is a particular case of (B.5) when ε̃(t) = W(t) is the Brownian motion with covariance function K̃(t, s) = min(t, s). In this case we have ṽ(t) = 1, q̃(t) = t, α(t) = v(t) and β(t) = q(t). Specifically, Doob's representation is given by ε(t) = v(t)W(q(t)) [see Doob (1949)]; a small simulation sketch illustrating this representation is given after this remark.

  2. Both functions β : [a, b] → [ã, b̃] and β̃ : [ã, b̃] → [a, b] are positive, strictly increasing and inverses of each other; that is,
    $$\beta(t) = \tilde\beta^{-1}(t), \quad t \in [a, b]. \tag{B.7}$$
  3. The functions α(·) and α̃(·) are positive and satisfy the relation
    $$\alpha(t) \cdot \tilde\alpha(\beta(t)) = \frac{v(t)}{\tilde v(\beta(t))} \cdot \frac{\tilde v(\beta(t))}{v(\tilde\beta(\beta(t)))} = 1, \quad t \in [a, b]. \tag{B.8}$$
  4. Properties 2 and 3 imply that the transformation ε̃ ↦ ε defined by (B.5) is the inverse of the transformation ε ↦ ε̃ defined in (B.3).
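As a concrete instance of part 1 of this remark (a sketch of ours, not from the paper): the kernel K(t, t′) = e^{−|t−t′|} from Section 3.6 is triangular with u(t) = e^t, v(t) = e^{−t} and q(t) = e^{2t}, so a trajectory can be simulated as ε(t) = e^{−t} W(e^{2t}):

```python
import numpy as np

# Simulate a process with triangular kernel K(t,t') = exp(-|t-t'|) via
# Doob's representation eps(t) = v(t) W(q(t)), v(t) = exp(-t), q(t) = exp(2t).
rng = np.random.default_rng(1)
t = np.linspace(1.0, 2.0, 400)
q = np.exp(2 * t)
# Brownian motion evaluated at the increasing times q(t):
W = np.cumsum(np.concatenate(([np.sqrt(q[0]) * rng.normal()],
                              np.sqrt(np.diff(q)) * rng.normal(size=q.size - 1))))
eps = np.exp(-t) * W                          # one sample path of the process

# Verify the kernel identity v(t)v(t')min(q(t),q(t')) = exp(-|t-t'|):
K_doob = np.exp(-t)[:, None] * np.exp(-t)[None, :] * np.minimum.outer(q, q)
assert np.allclose(K_doob, np.exp(-np.abs(t[:, None] - t[None, :])))
```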

B.2 Transformation of regression models

Associated with the transformation of the triangular covariance kernels there exists a canonical transformation of the corresponding regression models. To be precise, consider the regression model (1.1) or its continuous time version (1.2), where the covariance kernel K(·, ·) has the form (B.1). Recall the transformation β : [a, b] → [ã, b̃] defined in (B.6), which maps the observation points t_j to t̃_j = β(t_j), j = 1, . . . , N, and define

$$\tilde f(s) = \frac{f(\tilde\beta(s))}{\alpha(\tilde\beta(s))}, \qquad \tilde\varepsilon(s) = \frac{\varepsilon(\tilde\beta(s))}{\alpha(\tilde\beta(s))}, \qquad \tilde y(\tilde t_j) = \frac{y(t_j)}{\alpha(t_j)}, \tag{B.9}$$

where s ∈ [ã, b̃] so that β̃ (s) ∈ [a, b]. The regression model (1.1) can now be rewritten in the form

$$\tilde y(\tilde t_j) = \theta^T \tilde f(\tilde t_j) + \tilde\varepsilon(\tilde t_j), \qquad \tilde t_j \in [\tilde a, \tilde b], \quad j = 1, \ldots, N. \tag{B.10}$$

The errors ε̃(t̃_j) in (B.10) have zero mean and, by Lemma B.1 and the identity (B.8), their covariances are given by

$$E[\tilde\varepsilon(\tilde t_i)\,\tilde\varepsilon(\tilde t_j)] = \tilde K(\tilde t_i, \tilde t_j). \tag{B.11}$$

Hence we have transformed the regression observation scheme (1.1) with error covariances E[ε(t_i)ε(t_j)] = K(t_i, t_j) into the scheme (B.10) with covariances (B.11). Conversely, we can transform the model (B.10) with covariances (B.11) into the model (1.1) using the transformations

$$f(t) = \frac{\tilde f(\beta(t))}{\tilde\alpha(\beta(t))}, \qquad \varepsilon(t) = \frac{\tilde\varepsilon(\beta(t))}{\tilde\alpha(\beta(t))}, \qquad t \in [a, b]. \tag{B.12}$$

Lemma B.2

The transformation f ↦ f̃ defined in (B.9) is the inverse of the transformation f̃ ↦ f defined in (B.12).

Proof

Inserting the expression for f̃ from (B.9) into (B.12), we have

$$f(t) = \frac{\tilde f(\beta(t))}{\tilde\alpha(\beta(t))} = \frac{f(\tilde\beta(\beta(t)))}{\alpha(\tilde\beta(\beta(t)))\,\tilde\alpha(\beta(t))} = \frac{f(t)}{\alpha(t)\,\tilde\alpha(\beta(t))} = f(t),$$

where we have used the identities β̃ (β(t)) = t, see (B.7), and α (t) α̃ (β(t)) = 1, see (B.8).

B.3 Transformation of designs

In this section we consider a transformation of matrix-weighted designs under a given transformation of the regression models. In the one-parameter case m = 1 these matrix-weighted designs become signed measures, that is, signed designs as considered in Section 2. It is convenient to define all integrals as Lebesgue–Stieltjes integrals with respect to the distribution functions of the measures ζ and ζ̃.

To be precise, let dξ(t) = O_ξ(t) dζ(t) be a matrix-weighted design on the interval [a, b]. Recalling the definitions of α, α̃ and β, β̃ in (B.2) and (B.6), we define a matrix-weighted design dξ̃(s) = Õ_ξ̃(s) dζ̃(s) by

$$d\tilde\zeta(s) = d\zeta(\tilde\beta(s)) \qquad \text{and} \qquad \tilde O_{\tilde\xi}(s) = \alpha^2(\tilde\beta(s))\,O_\xi(\tilde\beta(s)). \tag{B.13}$$

Note that ζ̃ and ζ are probability measures on the intervals [ã, b̃] and [a, b], respectively. Similarly, for a given matrix-weighted design dξ̃(s) = Õ_ξ̃(s) dζ̃(s) on the interval [ã, b̃] we define a matrix-weighted design dξ(t) = O_ξ(t) dζ(t) on the interval [a, b] by

$$d\zeta(t) = d\tilde\zeta(\beta(t)) \qquad \text{and} \qquad O_\xi(t) = \tilde\alpha^2(\beta(t))\,\tilde O_{\tilde\xi}(\beta(t)). \tag{B.14}$$

Similarly to Lemma B.2, one can see that the transformation ξ̃ ↦ ξ defined by (B.14) is the inverse of the transformation ξ ↦ ξ̃ defined by (B.13).

For the following discussion we recall the definition of the covariance matrix D(ξ) in (3.11). For the model (B.10), the covariance matrix of the design dξ̃(s) = Õ_ξ̃(s) dζ̃(s) defined by (B.13) is given by

$$\tilde D(\tilde\xi) = \tilde M^{-1}(\tilde\xi)\,\tilde B(\tilde\xi)\,\big(\tilde M^{-1}(\tilde\xi)\big)^T, \tag{B.15}$$

where

$$\tilde B(\tilde\xi) = \int_{\tilde a}^{\tilde b}\!\!\int_{\tilde a}^{\tilde b} \tilde K(t, s)\,\tilde O_{\tilde\xi}(t)\,\tilde f(t)\,\big(\tilde O_{\tilde\xi}(s)\,\tilde f(s)\big)^T d\tilde\zeta(t)\,d\tilde\zeta(s), \qquad \tilde M(\tilde\xi) = \int_{\tilde a}^{\tilde b} \tilde O_{\tilde\xi}(t)\,\tilde f(t)\,\tilde f^T(t)\,d\tilde\zeta(t),$$

and the kernel K̃ is defined by (B.4).

Theorem B.1

For any matrix-weighted design dξ(t) = O_ξ(t) dζ(t) and the corresponding matrix-weighted design ξ̃ defined by (B.13), we have D(ξ) = D̃(ξ̃). In particular, D* = D̃*, where D* and D̃* are the covariance matrices of the BLUE in the continuous time model (1.2) and in the model {θᵀf̃(s) + ε̃(s) | s ∈ [ã, b̃]}, respectively.

Proof

Using the variable transformation β̃ (s) = t and (B.9), we have

$$\tilde M(\tilde\xi) = \int \tilde O_{\tilde\xi}(s)\,\tilde f(s)\,\tilde f^T(s)\,d\tilde\zeta(s) = \int \alpha^2(\tilde\beta(s))\,O_\xi(\tilde\beta(s))\,\frac{f(\tilde\beta(s))}{\alpha(\tilde\beta(s))}\,\frac{f^T(\tilde\beta(s))}{\alpha(\tilde\beta(s))}\,d\zeta(\tilde\beta(s)) = \int O_\xi(t)\,f(t)\,f^T(t)\,d\zeta(t) = M(\xi).$$

Next we calculate the corresponding expression for B̃(ξ̃), that is,

$$\tilde B(\tilde\xi) = \int\!\!\int \tilde K(x, y)\,\tilde O_{\tilde\xi}(x)\,\tilde f(x)\,\big(\tilde O_{\tilde\xi}(y)\,\tilde f(y)\big)^T d\tilde\zeta(x)\,d\tilde\zeta(y) = \int\!\!\int \tilde v(x)\,\tilde v(y)\,\min\big(\tilde q(x), \tilde q(y)\big)\,\tilde O_{\tilde\xi}(x)\,\tilde f(x)\,\big(\tilde O_{\tilde\xi}(y)\,\tilde f(y)\big)^T d\tilde\zeta(x)\,d\tilde\zeta(y)$$
$$= \int\!\!\int \tilde v(x)\,\tilde v(y)\,\min\big(\tilde q(x), \tilde q(y)\big)\,O_\xi(\tilde\beta(x))\,\frac{f(\tilde\beta(x))}{\alpha(\tilde\beta(x))}\,\Big(O_\xi(\tilde\beta(y))\,\frac{f(\tilde\beta(y))}{\alpha(\tilde\beta(y))}\Big)^T \times\, \alpha^2(\tilde\beta(x))\,d\zeta(\tilde\beta(x))\;\alpha^2(\tilde\beta(y))\,d\zeta(\tilde\beta(y)).$$

Define s = β̃ (x) and t = β̃ (y) so that x = β̃−1(s) = β(s) and similarly y = β(t). Changing the variables in the integrals above we obtain

$$\tilde B(\tilde\xi) = \int\!\!\int \tilde v(\beta(s))\,\tilde v(\beta(t))\,\min\big(\tilde q(\beta(s)), \tilde q(\beta(t))\big)\,O_\xi(s)\,f(s)\,\big(O_\xi(t)\,f(t)\big)^T\,\alpha(s)\,\alpha(t)\,d\zeta(s)\,d\zeta(t).$$

Using the definition of β in (B.6) yields q̃(β(t)) = q̃(q̃⁻¹(q(t))) = q(t), and by the definition of α in (B.6) we finally get

$$\tilde B(\tilde\xi) = \int\!\!\int \tilde v(\beta(s))\,\tilde v(\beta(t))\,\min\big(q(s), q(t)\big)\,O_\xi(s)\,f(s)\,\big(O_\xi(t)\,f(t)\big)^T\,\frac{v(s)}{\tilde v(\beta(s))}\,\frac{v(t)}{\tilde v(\beta(t))}\,d\zeta(s)\,d\zeta(t) = \int\!\!\int \min\big(q(s), q(t)\big)\,v(s)\,v(t)\,O_\xi(s)\,f(s)\,\big(O_\xi(t)\,f(t)\big)^T\,d\zeta(s)\,d\zeta(t) = B(\xi).$$

The result D(ξ) = D̃(ξ̃) now follows from the definitions (3.11) and (B.15).

Contributor Information

Holger Dette, Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany.

Andrey Pepelyshev, School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK.

Anatoly Zhigljavsky, School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK.

References

  1. Bickel PJ, Herzberg AM. Robustness of design against autocorrelation in time I: Asymptotic theory, optimality for location and linear regression. Annals of Statistics. 1979;7(1):77–95.
  2. Boltze L, Näther W. On effective observation methods in regression models with correlated errors. Math Operationsforsch Statist Ser Statist. 1982;13:507–519.
  3. Dette H, Kunert J, Pepelyshev A. Exact optimal designs for weighted least squares analysis with correlated errors. Statistica Sinica. 2008;18(1):135–154.
  4. Dette H, Pepelyshev A, Zhigljavsky A. Optimal design for linear models with correlated observations. The Annals of Statistics. 2013;41(1):143–176.
  5. Doob JL. Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics. 1949;20(3):393–403.
  6. Grenander U. Stochastic processes and statistical inference. Ark Mat. 1950;1:195–277.
  7. Harman R, Štulajter F. Optimal prediction designs in finite discrete spectrum linear regression models. Metrika. 2010;72(2):281–294.
  8. Harman R, Štulajter F. Optimality of equidistant sampling designs for the Brownian motion with a quadratic drift. Journal of Statistical Planning and Inference. 2011;141(8):2750–2758.
  9. Kiselak J, Stehlík M. Equidistant D-optimal designs for parameters of Ornstein-Uhlenbeck process. Statistics and Probability Letters. 2008;78:1388–1396.
  10. Mehr C, McFadden J. Certain properties of Gaussian processes and their first-passage times. Journal of the Royal Statistical Society, Series B (Methodological). 1965:505–522.
  11. Müller WG, Pázman A. Measures for designs in experiments with correlated errors. Biometrika. 2003;90:423–434.
  12. Näther W. Effective Observation of Random Fields. Teubner Verlagsgesellschaft; Leipzig: 1985a.
  13. Näther W. Exact design for regression models with correlated errors. Statistics. 1985b;16:479–484.
  14. Pázman A, Müller WG. Optimal design of experiments subject to correlated errors. Statist Probab Lett. 2001;52(1):29–34.
  15. Pukelsheim F. Optimal Design of Experiments. SIAM; Philadelphia: 2006.
  16. Sacks J, Ylvisaker ND. Designs for regression problems with correlated errors. Annals of Mathematical Statistics. 1966;37:66–89.
  17. Sacks J, Ylvisaker ND. Designs for regression problems with correlated errors; many parameters. Annals of Mathematical Statistics. 1968;39:49–69.
  18. Zhigljavsky A, Dette H, Pepelyshev A. A new approach to optimal design for linear models with correlated observations. Journal of the American Statistical Association. 2010;105:1093–1103.
  19. Zhigljavsky AA. Theory of Global Random Search. Springer; Netherlands: 1991.
