Author manuscript; available in PMC: 2017 Sep 19.
Published in final edited form as: Comput Stat Data Anal. 2009 Apr 8;53(11):3773–3786. doi: 10.1016/j.csda.2009.03.026

A Bayesian regression model for multivariate functional data

Ori Rosen a,*, Wesley K Thompson b
PMCID: PMC5604261  NIHMSID: NIHMS125546  PMID: 28936016

Abstract

In this paper we present a model for the analysis of multivariate functional data with unequally spaced observation times that may differ among subjects. Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.

1. Introduction

The term functional data analysis describes non-parametric analyses of longitudinal data which focus on the curves themselves as the basic unit of data. Some of the goals of functional data analysis include exploring individual variation of curves from an overall mean function, and modeling the dependence of the curves on covariates. The mean function, as well as the subject-specific functions, is estimated non-parametrically. In this paper, we propose a method for analyzing multivariate functional data with unequally spaced observation times that may differ among subjects. It is assumed that all variables are observed at the same time points. Fitting a regression model with a multivariate response may be done either by fitting a separate regression for each of the response variables or by fitting a single regression with all response variables simultaneously. The latter may be advantageous if the error terms corresponding to each variable are correlated. Thus, fewer observations may be required to obtain reliable non-parametric function estimates compared to fitting each regression separately and ignoring the correlation. This has been shown to be the case in seemingly unrelated regression (see for example Smith and Kohn, 2000).

Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a first order continuous-time multivariate autoregression, also known as a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.

Our model is closest in spirit to the functional mixed effects model of Guo (2002), where the fixed and random effects are modeled by cubic smoothing splines. However, Guo’s model accommodates only a univariate response variable, and does not allow correlated error terms. It can be fit either via standard mixed effects software or by Kalman filtering. Inference and model selection are based on a generalized maximum likelihood ratio test. Baladandayuthapani et al. (2008) have proposed a Bayesian model for spatially correlated functional data analysis. The smoothing technique they use is similar to ours, but their emphasis is on spatial correlation rather than on temporal correlation. Smith and Kohn (2000) consider multivariate non-parametric regression using the seemingly unrelated regression approach. They show that if the error terms of the regression equations are correlated, better non-parametric estimates of the regression functions are obtained by accounting for this correlation compared to fitting separate regressions ignoring the correlation. It is noted that Smith and Kohn (2000) consider multivariate non-parametric regression, not functional data analysis. In functional data analysis, each individual subject has its own function which needs to be estimated for each variable. Smith and Kohn (2000) only estimate a single function for each variable.

The Ornstein–Uhlenbeck process has been used before in various contexts. Unlike most diffusion processes, its transition density is available in closed form, which results in a closed-form expression for the likelihood function. Jones (1993), Chapter 8, uses a state-space approach to parameter estimation. Sy et al. (1997) present a model for multivariate repeated measures which allows unequally spaced observations by using the multivariate integrated Ornstein–Uhlenbeck process. The fixed and random effects in their model have parametric forms. Markov chain Monte Carlo methods for inference on the Ornstein–Uhlenbeck process (univariate or multivariate) parameters have also been proposed. A recent review of estimation for discretely observed diffusion processes is given in Beskos et al. (2006). Golightly and Wilkinson (2006) discuss Bayesian inference for non-linear multivariate diffusions. A number of authors have assumed a common spacing between the observed times. Blackwell (2003) takes this common spacing to be the most frequently occurring interval between observations. De la Cruz-Mesía and Marshall (2003, 2006) discuss the univariate Ornstein–Uhlenbeck process and take the common spacing to be the average time difference between two consecutive observations.

The example for application of our methodology is taken from a recent psychiatric study comparing psychotherapy to pharmacotherapy carried out at the University of Pittsburgh and the University of Pisa, Italy (Frank et al., 2008). This study sought differential baseline predictors of response to these two forms of treatment of major depression. Here, we examine the interaction effect of treatment group with Lifetime Depressive Spectrum symptoms (LDS; Cassano et al., 1997) in 252 patients entering the study in an acutely depressive episode. Levels of depression are determined by the clinician-administered Hamilton Rating Scale for Depression (HRSD) and the Quick Inventory for Depression Self-report (QIDS). These two scales were given to patients at baseline and again roughly weekly over the course of each subject’s acutely depressive episode.

Our main contribution in this paper is a model that accommodates multivariate functional data with covariates, accounting for correlation across both variables and time by combining smoothing techniques with a multivariate Ornstein–Uhlenbeck model for the error term.

The rest of the paper is organized as follows. In Section 2, we describe our model, the prior distributions and the sampling scheme. Section 3 provides results of a simulation study. Section 4 discusses an application, and Section 5 ends with a brief discussion.

2. The model, priors and sampling scheme

2.1. The model

Suppose yi(tij) is a p × 1 vector of response variables on subject i at time tij, i = 1, …, n, j = 1, …, mi, and consider the model

$$y_i(t_{ij}) = X_{ij}\,\mu(t_{ij}) + Z_{ij}\,g_i(t_{ij}) + \delta_i(t_{ij}). \tag{1}$$

In (1), μ(t) = (μ1(t)′, …, μp(t)′)′ and gi(t) = (gi1(t)′, …, gip(t)′)′, where μk(t) = (μk1(t), …, μkr(t))′ and gik(t) = (gik1(t), …, giks(t))′ are an r × 1 vector of fixed functions and an s × 1 vector of random functions, respectively, for k = 1, …, p. Associated with μ(tij) is an r × 1 covariate vector xij, and with gi(tij) an s × 1 covariate vector zij, such that Xij = Ip ⊗ x′ij and Zij = Ip ⊗ z′ij, where Ip is a p × p identity matrix, and ⊗ denotes the Kronecker product. We have assumed here that the p response variables share the same covariates.

Before proceeding to specify the p × 1 vector of random errors, δi(tij), we give an example which is a special case of model (1). Suppose p = 2, r = 2 and s = 1, with xij taking values in {(1 0)′, (1 1)′}, and zij = 1, with corresponding functions μ1(t) = (μ11(t), μ12(t))′, μ2(t) = (μ21(t), μ22(t))′, gi1(t) and gi2(t). In this case, model (1) reduces to

$$\begin{aligned}
y_{i1}(t_{ij}) &= \mu_{11}(t_{ij}) + x_{ij2}\,\mu_{12}(t_{ij}) + g_{i1}(t_{ij}) + \delta_{i1}(t_{ij})\\
y_{i2}(t_{ij}) &= \mu_{21}(t_{ij}) + x_{ij2}\,\mu_{22}(t_{ij}) + g_{i2}(t_{ij}) + \delta_{i2}(t_{ij}),
\end{aligned}$$

where xij2, the second entry of xij, can take the values 0 or 1. In this example, there are two groups of subjects – control (xij2 = 0) and treatment (xij2 = 1). Each group has its own mean curve for each of the two variables, and individual deviations from these curves are accommodated by the random functions gi1(t) and gi2(t), i = 1, …, n.
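As a concrete check on this special case, the Kronecker construction of Xij and Zij can be sketched numerically. This is a minimal illustration with arbitrary function values, not the paper's code:

```python
import numpy as np

# Special case of model (1) with p = 2, r = 2, s = 1 for a treatment-group
# subject (x_ij2 = 1).  The function values below are arbitrary illustrations.
p = 2
x_ij = np.array([1.0, 1.0])   # (1, 1)': intercept plus treatment indicator
z_ij = np.array([1.0])        # random part has an intercept only

X_ij = np.kron(np.eye(p), x_ij.reshape(1, -1))  # p x (p*r) design for mu
Z_ij = np.kron(np.eye(p), z_ij.reshape(1, -1))  # p x (p*s) design for g_i

# mu(t) stacks (mu_11, mu_12, mu_21, mu_22)' and g_i(t) stacks (g_i1, g_i2)'.
mu_t = np.array([0.5, -0.2, 1.0, 0.3])
g_t = np.array([0.1, -0.4])

mean_i = X_ij @ mu_t + Z_ij @ g_t
# First response:  mu_11 + x_ij2*mu_12 + g_i1 = 0.5 - 0.2 + 0.1 = 0.4
# Second response: mu_21 + x_ij2*mu_22 + g_i2 = 1.0 + 0.3 - 0.4 = 0.9
```

The Kronecker products reproduce exactly the two scalar equations displayed above.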

The error term in model (1) is assumed to follow a multivariate Ornstein–Uhlenbeck process. More specifically, δi(t) satisfies the stochastic differential equation

$$d\delta_i(t) = -A\,\delta_i(t)\,dt + B\,dW_i(t),$$

where A and B are p × p matrices of full rank common to all i = 1, …, n, and Wi(t) is the p-dimensional Wiener process. Three properties of the Ornstein–Uhlenbeck process (Gardiner, 1983, pp. 110–111) which will be useful in what follows are

  1. The Ornstein–Uhlenbeck process will be stationary provided the eigenvalues of A have positive real parts.

  2. The solution Σ to the matrix equation AΣ + ΣA′ = BB′ is the stationary variance–covariance matrix of the process.

  3. In the stationary state, the covariance of δi(t) and δi(s), for s < t, is

$$\operatorname{Cov}(\delta_i(t), \delta_i(s)) = \exp\{-A(t-s)\}\,\Sigma. \tag{2}$$

Let Δtij = tij − ti,j−1 for j = 1, …, mi, where ti0 = 0. The transition density of the Ornstein–Uhlenbeck process is given by

$$p(\delta_i(t_{ij}) \mid \delta_i(t_{i,j-1}), \Delta t_{ij}) \propto |\Omega_{\Delta t_{ij}}|^{-1/2} \exp\left\{-\tfrac{1}{2}\,\gamma_{t_{ij}}'\,\Omega_{\Delta t_{ij}}^{-1}\,\gamma_{t_{ij}}\right\}, \tag{3}$$

where γtij = δi(tij) − exp(−AΔtij) δi(ti,j−1) and ΩΔtij = Σ − exp(−AΔtij) Σ exp(−A′Δtij).
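The three properties above, together with the transition covariance ΩΔt, can be illustrated numerically. The following sketch uses illustrative values for A and B (not taken from the paper) and relies on SciPy's matrix exponential and continuous Lyapunov solver:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative OU parameters: eigenvalues of A are 1.4 and 2.6, so the
# stationarity condition (positive real parts) holds.
A = np.array([[2.0, 0.6],
              [0.6, 2.0]])
B = np.array([[1.0, 0.0],
              [0.3, 1.0]])

# Property 2: the stationary covariance solves A Sigma + Sigma A' = B B'.
Sigma = solve_continuous_lyapunov(A, B @ B.T)

# Property 3: lagged covariance Cov(delta(t), delta(s)) = exp{-A(t-s)} Sigma.
lag_cov = expm(-A * 0.5) @ Sigma

# Transition covariance appearing in (3):
dt = 0.5
Omega = Sigma - expm(-A * dt) @ Sigma @ expm(-A.T * dt)
```

As Δt grows, exp(−AΔt) decays and ΩΔt approaches the stationary covariance Σ, which is the expected long-lag behavior.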

The functions μkl(t) and gikm (t), k = 1, …, p, l = 1, …, r, m = 1, …, s, i = 1, …, n, are modeled as cubic splines using low-rank radial basis functions (French et al., 2001; Ruppert et al., 2003). In Section 2.2 we review briefly non-parametric function estimation.

2.2. Non-parametric function estimation

The functions μkl(t) and gikm(t) are estimated non-parametrically. In this section we explain the basis function approach which is used in turn in Section 2.3 to estimate these functions. For simplicity, we focus on scatterplot smoothing, with observations (xi, yi), i = 1, …, n. The description in this section is based on French et al. (2001), Ruppert et al. (2003) and Crainiceanu et al. (2005). Consider the model

yi=f(xi)+εi,

where E (εi) = 0, i = 1, …, n, and f is an unknown smooth function. A linear spline basis function can be expressed as (x − κ)+ = max (0, x − κ), where κ is a knot. Any linear combination of linear spline basis functions 1, x, (x − κ1)+, …, (x − κK)+ is a piecewise linear function with knots at κ1, …, κK. The function f may thus be expressed as

$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k (x - \kappa_k)_+, \tag{4}$$

where the uks are the coefficients of the basis functions. We comment later in this section on the value of K.
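Fitting the linear-spline representation (4) amounts to ordinary least squares on the basis 1, x, (x − κ1)+, …, (x − κK)+. A minimal sketch on synthetic data (not the paper's whip-shaped example):

```python
import numpy as np

# Synthetic scatterplot data: a smooth trend plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

# Basis of model (4): intercept, linear term, and K truncated lines.
knots = np.linspace(0.1, 0.9, 9)
basis = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(0.0, x - k) for k in knots])

# Unpenalized least-squares fit of the spline coefficients.
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
fitted = basis @ coef
```

With this many knots an unpenalized fit can overfit noisier data, which motivates the penalized formulation in (6) below.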

To understand how the spline model (4) can be used for fitting a non-parametric curve to data, consider Fig. 1, which displays in the top panel whip-shaped data similar to the example in Ruppert et al. (2003). The left half of the data exhibits linear behavior while curvature is apparent on the right-hand side. The bottom panel presents the basis functions used in the spline model. In particular, the equally-spaced knots are 0.50, 0.55, 0.60, …, 0.95. Comparing the two panels of Fig. 1, it is quite easy to see that a linear combination of the basis functions in the bottom panel should be able to capture the data structure in the top panel.

Fig. 1.

Fig. 1

Top: data with fitted curve. Bottom: the basis functions.

In general, any structure can be accommodated by placing basis functions at additional knots. To automate the process, two main approaches are commonly taken. One approach is automatic knot selection which can be carried out via Bayesian variable selection. Specifically, a large number of knots are placed at either equally spaced locations or at specific percentiles of the covariate, and an indicator variable is attached to each knot (see for example Thompson and Rosen, 2008). The indicator value is 1 if a knot is to be retained at a given location or 0 if the knot should be removed from that location. In a Bayesian MCMC procedure, the indicator variables are sampled from at each iteration. The other approach is to retain all the knots but to constrain their influence. This can be accomplished by penalized spline regression or equivalently by using a linear mixed effects model formulation. In this paper, we use the latter approach in a Bayesian framework. In both approaches, the value of K is not crucial, as long as it is not too small. Typically, 30–40 knots are sufficient for medium-sized datasets. Instead of the linear spline representation (4), we use in this paper the low rank thin-plate spline representation

$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k \lvert x - \kappa_k\rvert^3. \tag{5}$$

Using cubic radial basis functions tends to result in a more aesthetically appealing fit, compared to that of the truncated-line basis (Fig. 2), and may lead to faster convergence of the MCMC algorithm. The penalized spline approach prevents overfitting by adding a roughness penalty. Specifically, the minimization criterion is

$$\sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 + \frac{1}{\lambda}\,\theta' D \theta, \tag{6}$$

where θ = (β0, β1, u1, …, uK)′, λ is the smoothing parameter and D is a known penalty matrix. For thin-plate splines, the matrix D is given by

$$D = \begin{pmatrix} 0_{2\times 2} & 0_{2\times K} \\ 0_{K\times 2} & \Omega_K \end{pmatrix},$$

where the (k, l)th element of ΩK is |κk − κl|^3. From the structure of the matrix D it is clear that only the uks are penalized. Let y = (y1, …, yn)′, X = [1 xi]_{1≤i≤n}, and ZK = [|xi − κ1|^3 ⋯ |xi − κK|^3]_{1≤i≤n}. Dividing (6) by σε^2 and expressing the penalty term explicitly as a function of ΩK results in

$$\frac{1}{\sigma_\varepsilon^2}\left\lVert y - X\beta - Z_K u \right\rVert^2 + \frac{1}{\lambda\sigma_\varepsilon^2}\, u' \Omega_K u, \tag{7}$$

where β = (β0, β1)′ and u = (u1, …, uK)′ are considered fixed and random parameters, respectively. The solution to (7) is equal to the best linear unbiased predictor (BLUP) in the linear mixed model

$$y = X\beta + Z_K u + \varepsilon, \qquad \operatorname{Cov}(u) = \sigma_u^2\,(\Omega_K^{-1/2})(\Omega_K^{-1/2})', \tag{8}$$

where ΩK^{1/2} is based on the singular value decomposition. Note that ΩK is not a positive definite matrix, so it is not a proper covariance matrix; however, French et al. (2001) show that the smooth fit is not affected by this fact. Let b = ΩK^{1/2} u and Z = ZK ΩK^{−1/2}; the mixed model (8) is then equivalent to

$$y = X\beta + Zb + \varepsilon, \qquad \operatorname{Cov}\begin{pmatrix} b \\ \varepsilon \end{pmatrix} = \begin{pmatrix} \sigma_b^2 I_K & 0 \\ 0 & \sigma_\varepsilon^2 I_n \end{pmatrix}. \tag{9}$$
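The reparameterization leading from (8) to (9) can be verified numerically: with b = ΩK^{1/2} u and Z = ZK ΩK^{−1/2}, the fitted smooth Zb equals ZK u. A sketch with illustrative knots and data, using the SVD-based square roots as in the text:

```python
import numpy as np

# Illustrative design: 50 points, 8 knots on [0, 1] (not the paper's values).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
knots = np.linspace(0.1, 0.9, 8)

# Omega_K[k, l] = |kappa_k - kappa_l|^3 and Z_K[i, k] = |x_i - kappa_k|^3.
Omega = np.abs(knots[:, None] - knots[None, :]) ** 3
Z_K = np.abs(x[:, None] - knots[None, :]) ** 3

# Omega_K is symmetric but indefinite, so the square root and its inverse
# are formed from the singular value decomposition, as described above.
U, svals, Vt = np.linalg.svd(Omega)
Omega_half = U @ np.diag(np.sqrt(svals)) @ Vt
Omega_neg_half = U @ np.diag(1.0 / np.sqrt(svals)) @ Vt

u = rng.normal(size=len(knots))
b = Omega_half @ u          # b = Omega_K^{1/2} u
Z = Z_K @ Omega_neg_half    # Z = Z_K Omega_K^{-1/2}
```

The point of the transformation is that b has the simple covariance σb² IK in (9) while the fitted values are unchanged.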

Fig. 2.

Fig. 2

The data with a fitted curve. Left: truncated lines basis. Right: Cubic radial basis functions.

In a Bayesian framework, prior distributions need to be placed on all the model parameters.

2.3. Estimating μkl(t) and gikm(t)

To estimate the functions μkl(t) and gikm(t), k = 1, …, p, l = 1, …, r, m = 1, …, s, i = 1, …, n, we use the basis function approach described in Section 2.2. In particular, let κ1, …, κK be K knots obtained as sample quantiles of tij, i = 1, …, n, j = 1, …, mi, and let ΛK = [|κk − κk′|^3]_{1≤k,k′≤K} be a K × K matrix. Let ϕij = (1, tij)′, ξij = (|tij − κ1|^3, …, |tij − κK|^3)′, and ψ′ij = ξ′ij ΛK^{−1/2}, where ΛK^{−1/2} is obtained via the singular value decomposition. The vectors ϕij and ψij are basis functions evaluated at tij and are used to model the linear part and the non-linear part, respectively, of the fixed and random functions. In particular, μkl(t) and gikm(t) can be evaluated at tij by

$$\mu_{kl}(t_{ij}) = \phi_{ij}'\beta_{kl} + \psi_{ij}'\nu_{kl} \qquad\text{and}\qquad g_{ikm}(t_{ij}) = \phi_{ij}'w_{ikm} + \psi_{ij}'u_{ikm} \tag{10}$$

for k = 1, …, p, l = 1, …, r, m = 1, …, s and i = 1, …, n. In (10), βkl, νkl,wikm and uikm are unknown parameter vectors.
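Building the quantile knots and evaluating one function via (10) can be sketched as follows; the pooled times, the number of knots, and the coefficient values are all illustrative assumptions:

```python
import numpy as np

# Knots as sample quantiles of the pooled observation times t_ij.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 10, 300))       # stand-in for pooled t_ij
K = 10
knots = np.quantile(t, np.linspace(0.05, 0.95, K))

# Lambda_K and its SVD-based inverse square root, as in the text.
Lambda = np.abs(knots[:, None] - knots[None, :]) ** 3
U, svals, Vt = np.linalg.svd(Lambda)
Lambda_neg_half = U @ np.diag(1.0 / np.sqrt(svals)) @ Vt

def design_row(t_ij):
    """Return (phi_ij, psi_ij) of (10) at a single time t_ij."""
    phi = np.array([1.0, t_ij])
    xi = np.abs(t_ij - knots) ** 3
    psi = xi @ Lambda_neg_half
    return phi, psi

# Evaluate mu_kl(t_ij) = phi'beta + psi'nu for illustrative coefficients.
beta = np.array([0.2, 0.1])
nu = rng.normal(0, 0.05, K)
phi, psi = design_row(5.0)
mu_val = phi @ beta + psi @ nu
```

The same pair (ϕij, ψij) is reused for every fixed and random function, so the design matrices differ only through their coefficient vectors.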

2.4. Priors on the basis function coefficients and the variance components

We place the following prior distributions on βkl, νkl, wikm and uikm, k = 1, …, p, l = 1, …, r, m = 1, …, s and i = 1, …, n.

  1. βkl ~ind N(0, σ_{βkl}^2 I2), where I2 is a 2 × 2 identity matrix, and σ_{βkl}^2 is a large known value.

  2. νkl ~ind N(0, σ_{νkl}^2 IK), where IK is a K × K identity matrix, and K is the number of knots.

  3. wikm ~ind N(0, diag(σ_{wkm0}^2, σ_{wkm1}^2)).

  4. uikm ~ind N(0, σ_{ukm}^2 IK).

Similar prior distributions on the coefficients of the basis functions were used by Durbán et al. (2005). Note that the variances of the elements of wikm differ, while those of the elements of uikm are all the same. This is merely for computational convenience, to avoid an additional K − 1 parameters. The priors on the variance components σ_{νkl}^2, k = 1, …, p, l = 1, …, r, are independent inverse gamma distributions with densities

$$p(\sigma_{\nu_{kl}}^2) \propto (\sigma_{\nu_{kl}}^2)^{-(a_1+1)} \exp\left(-b_1/\sigma_{\nu_{kl}}^2\right),$$

where a1 and b1 are known small values reflecting vague knowledge of σ_{νkl}^2. The priors on σ_{wkm0}^2, σ_{wkm1}^2 and σ_{ukm}^2 are similar inverse gamma distributions. Recently, a number of authors (see for example Gelman, 2006) have proposed alternative prior distributions for variance components which may exhibit superior behavior to that of inverse gamma distributions. However, Zhao et al. (2006) reported good performance of inverse gamma priors in the case of non-parametric regression, provided the hyperparameters are not too small. Specifically, hyperparameter values of 0.01 worked well, whereas values of 0.001 behaved erratically.
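A normal likelihood with an inverse gamma prior yields the standard conjugate full conditional for the variance (this update is a textbook result rather than something spelled out in this section): with νkl ~ N(0, σ² IK) and σ² ~ IG(a1, b1), the full conditional is IG(a1 + K/2, b1 + ν′ν/2). A sketch of one draw, with illustrative values:

```python
import numpy as np

# Conjugate inverse gamma update for a spline variance component.
rng = np.random.default_rng(3)
a1 = b1 = 0.01                  # hyperparameters of the size recommended above
K = 30
nu = rng.normal(0, 0.5, K)      # stand-in for the current coefficient vector

shape = a1 + K / 2.0            # posterior shape
rate = b1 + 0.5 * nu @ nu       # posterior rate
# An IG(shape, rate) draw is the reciprocal of a Gamma(shape, scale=1/rate).
sigma2_draw = 1.0 / rng.gamma(shape, 1.0 / rate)
```

This is the form of the variance-component updates used in step 4 of the sampling scheme in Section 2.6.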

2.5. Priors on the Ornstein–Uhlenbeck parameters

The Ornstein–Uhlenbeck process parameters are the matrix A and the matrix C = BB′. Both matrices consist of parameters which are constrained to satisfy certain conditions. In particular, as mentioned in Section 2, the stationarity condition requires the real parts of the eigenvalues of A to be positive. Also, the matrix C is required to be symmetric and positive definite. Imposing the constraints directly on the elements of these matrices would be difficult. Instead, we first express each of these matrices in an appropriate decomposition and then place prior distributions on the parameters of the decomposition factors. This is a much easier task, as the factor parameters are either unconstrained or constrained to be non-negative. To place a prior on A, we express it as A = SΨS^{−1}, where S is a matrix of linearly independent eigenvectors, and Ψ is a diagonal matrix of real positive eigenvalues. This parameterization, used also by Sy et al. (1997) for the bivariate Ornstein–Uhlenbeck process, satisfies the stationarity condition mentioned above for the Ornstein–Uhlenbeck process. Aït-Sahalia (2008) discusses identifiability related to A and expresses it as a lower triangular matrix with positive diagonal elements. Kessler and Rahbek (2004) discuss identifiability issues in the case of equidistant observation times. The matrix S is parameterized as S = (sij), i, j = 1, …, p, with unit diagonal elements. Independent N(0, σ_a^2) priors are placed on the off-diagonal elements of S, and on the logarithms of the diagonal elements of Ψ.

The matrix C is symmetric and positive definite. To place priors on its elements which satisfy the symmetry and positive definiteness, we first express the matrix C as a modified Cholesky factorization, C = LDL′, where L is unit lower triangular, and D is diagonal. This approach was used for example in Smith and Kohn (2002) and in Rosen and Stoffer (2007). The emphasis of Rosen and Stoffer (2007) is on estimation in the frequency domain for multivariate time series observed at equally spaced time points. The priors on the off-diagonal elements of L are taken to be independent N(0, σ_L^2) with a fixed large value of σ_L^2. The priors placed on log(Di), where Di is the ith diagonal element of D, are independent N(0, σ_D^2) with a fixed large value of σ_D^2.
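The point of both decompositions is that unconstrained real numbers map to matrices satisfying the constraints automatically. A minimal sketch for p = 2, with hypothetical parameter values:

```python
import numpy as np

def make_A(s21, s12, log_psi):
    """A = S Psi S^{-1}: S has unit diagonal and free off-diagonals,
    Psi holds eigenvalues forced positive via the exponential."""
    S = np.array([[1.0, s12],
                  [s21, 1.0]])
    Psi = np.diag(np.exp(log_psi))
    return S @ Psi @ np.linalg.inv(S)

def make_C(l21, log_d):
    """C = L D L': L unit lower triangular, D positive diagonal,
    so C is symmetric positive definite by construction."""
    L = np.array([[1.0, 0.0],
                  [l21, 1.0]])
    D = np.diag(np.exp(log_d))
    return L @ D @ L.T

# Arbitrary unconstrained inputs always yield valid A and C.
A = make_A(0.3, -0.2, np.array([0.1, 0.7]))
C = make_C(0.5, np.array([-0.3, 0.2]))
```

Priors are then placed on the unconstrained factor parameters (s21, s12, the log eigenvalues, l21, and the log diagonal of D) rather than on the elements of A and C directly.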

2.6. The sampling scheme

Let θk = (β′k1, ν′k1, …, β′kr, ν′kr)′ for k = 1, …, p, and let θ = (θ′1, …, θ′p)′. Similarly, let ηik = (w′ik1, u′ik1, …, w′iks, u′iks)′ for k = 1, …, p, and ηi = (η′i1, …, η′ip)′ for i = 1, …, n. The sampling scheme consists of the following stages. More details are given in the Appendix.

  1. Initialize θ, ηi, i = 1, …, n, and the variance components by fitting p separate mixed effects models, one for each k = 1, …, p. Initialize A and C by numerically maximizing the log conditional joint posterior distribution of A and C.

  2. Generate θ from its full conditional posterior distribution, which is multivariate normal.

  3. For each i, i = 1, …, n, generate ηi from its full conditional posterior distribution, which is multivariate normal.

  4. For k = 1, …, p, l = 1, …, r, m = 1, …, s, generate the variance components συkl2, σwkm02, σwkm12 and σukm2 from their full conditional posterior distributions, which are inverse gamma.

  5. Generate A from its full conditional posterior distribution. Since this distribution is not standard, we use a Metropolis step with a multivariate normal proposal density centered at the current value of A. The variance–covariance matrix of this normal proposal is based on the inverse of the estimated negative Hessian of the log conditional posterior distribution.

  6. Generate C from its full conditional posterior distribution using a Metropolis step.
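The Metropolis updates in steps 5 and 6 can be sketched generically. Here `log_post` is a placeholder target (a standard normal), standing in for the actual conditional posterior of the unconstrained parameters behind A; the proposal covariance is fixed rather than derived from a Hessian:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_post(theta):
    # Placeholder log target; the paper's conditional posterior goes here.
    return -0.5 * theta @ theta

def metropolis_step(theta, prop_cov):
    """One Metropolis update with a normal proposal centered at theta."""
    prop = rng.multivariate_normal(theta, prop_cov)
    log_ratio = log_post(prop) - log_post(theta)
    if np.log(rng.uniform()) < log_ratio:
        return prop, True      # accept the proposal
    return theta, False        # stay at the current value

theta = np.zeros(3)
prop_cov = 0.5 * np.eye(3)     # in the paper, from the inverse neg. Hessian
accepts = 0
for _ in range(2000):
    theta, acc = metropolis_step(theta, prop_cov)
    accepts += acc
```

Because the proposal is symmetric, the acceptance ratio reduces to the difference of log posteriors, as used above.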

3. Simulations

In this section, we explore by simulation the potential improvement in curve fitting when modeling the correlation structure of multivariate functional data rather than ignoring it. Specifically, we examine improvements in mean squared error for the individual subject-level functions. For this purpose we generated 100 datasets, with each dataset consisting of observations, without covariates, on n = 50 subjects. The number of observations per subject is mi = 2 + wi, where wi is a Poisson random variable with expectation 5, giving an average of 7 observation times per subject. The observation times themselves were independently generated from a uniform distribution on the interval [0, mi]. For each subject, there are p = 3 response variables with overall subject mean functions chosen to represent a variety of possible relationships. The first true mean function is μ1(t) = 7 sin(−.5t), which exhibits low-frequency variation on the range of t. Note that the second subscript on μ1(t) was dropped, since there are no covariates in our simulation setting. The second true mean function is μ2(t) = 10ϕ(t; 1.5, .3) + 6ϕ(t; 4, .6), where ϕ(t; a, b) is a univariate normal density with mean a and standard deviation b. The third true mean function is μ3(t) = 2 sin(−t), which has higher-frequency oscillations on the range of t. Let fik(t) = μk(t) + gik(t), k = 1, 2, 3, be the individual subject functions, where we have again dropped the covariate subscript. In particular,

$$\begin{aligned}
f_{i1}(t) &= a_{i1} \sin(-.5t) + a_{i2}\\
f_{i2}(t) &= b_{i1}\,\phi(t; 1.5, .3) + b_{i2}\,\phi(t; 4, .6)\\
f_{i3}(t) &= c_{i1} \sin(-t) + c_{i2},
\end{aligned} \tag{11}$$

where ai1 ~ N (7, .5), ai2 ~ N (0, .2), bi1 ~ N(10, .25), bi2 ~ N (6, .25), ci1 ~ N (2, .5) and ci2 ~ N (0, .2). Here, N (a, b) indicates the univariate normal distribution with mean a and standard deviation b. The observations yi(tij) were obtained by drawing the random coefficients ai1, ai2, bi1, bi2, ci1, ci2, evaluating the equations in (11) at time tij and adding δi(tij), which was in turn generated according to a multivariate Ornstein−Uhlenbeck error process with parameter values

$$A = \begin{pmatrix} 2 & 0.6 & 0.6 \\ 0.6 & 2 & 0.6 \\ 0 & 0 & 2 \end{pmatrix} \qquad\text{and}\qquad C = \begin{pmatrix} 15 & 0 & 0 \\ 0 & 15 & 0 \\ 0 & 0 & 15 \end{pmatrix}.$$

These settings result in fairly noisy data with cross-correlations ranging from .25 to .47 among the three variables when evaluated at the average spacing Δ̄ti = (1/mi) Σ_{j=1}^{mi} (tij − ti,j−1) ≈ 1. The cross-correlation matrix is obtained from the cross-covariance (2), evaluated at Δ̄ti. Plots of these mean functions along with one randomly generated dataset can be seen in Fig. 3.
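One subject's data under this design can be generated exactly, stepping the OU error through its transition density (3); the sketch below follows the stated parameter values but is our reconstruction, not the authors' code:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(5)
A = np.array([[2.0, 0.6, 0.6],
              [0.6, 2.0, 0.6],
              [0.0, 0.0, 2.0]])
C = 15.0 * np.eye(3)
Sigma = solve_continuous_lyapunov(A, C)   # stationary covariance

# Observation times: m_i = 2 + Poisson(5), uniform on [0, m_i].
m_i = 2 + rng.poisson(5)
t = np.sort(rng.uniform(0, m_i, m_i))

# Exact OU sampling: stationary start, then one transition per time gap.
delta = np.zeros((m_i, 3))
delta[0] = rng.multivariate_normal(np.zeros(3), Sigma)
for j in range(1, m_i):
    dt = t[j] - t[j - 1]
    Phi = expm(-A * dt)
    Omega = Sigma - Phi @ Sigma @ Phi.T
    Omega = (Omega + Omega.T) / 2         # guard against numerical asymmetry
    delta[j] = rng.multivariate_normal(Phi @ delta[j - 1], Omega)

def phi_dens(t, a, b):
    # Normal density with mean a and standard deviation b, as in the text.
    return np.exp(-0.5 * ((t - a) / b) ** 2) / (b * np.sqrt(2 * np.pi))

# Subject functions (11) with random coefficients, plus the OU errors.
a1, a2 = rng.normal(7, .5), rng.normal(0, .2)
b1, b2 = rng.normal(10, .25), rng.normal(6, .25)
c1, c2 = rng.normal(2, .5), rng.normal(0, .2)
f = np.column_stack([a1 * np.sin(-.5 * t) + a2,
                     b1 * phi_dens(t, 1.5, .3) + b2 * phi_dens(t, 4, .6),
                     c1 * np.sin(-t) + c2])
y = f + delta
```

Repeating this over n = 50 subjects and 100 replicates reproduces the simulation design described above.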

Fig. 3.

Fig. 3

Mean functions (heavy lines) and data from one randomly generated dataset (light lines) with 50 individual subjects.

Our model was fitted four times for each dataset, once for each univariate outcome separately (thereby ignoring across-variable correlation) and then to all three outcomes simultaneously. The sampling scheme was run for 10,000 iterations per dataset, with a burn-in period of 5000. Median estimates Â and Ĉ of the A and C matrices across all 100 multivariate fits were given by

$$\hat{A} = \begin{pmatrix} 2.91 & 0.72 & 1.31 \\ 0.38 & 2.29 & 0.63 \\ 0.02 & 0.07 & 3.03 \end{pmatrix} \qquad\text{and}\qquad \hat{C} = \begin{pmatrix} 18.82 & 0.81 & 1.67 \\ 0.81 & 16.81 & 0.19 \\ 1.67 & 0.19 & 19.59 \end{pmatrix}.$$

To assess the quality of the resulting estimates of the three mean functions, we calculated the average squared difference between the function estimates and the true mean functions at the unique observation times, t1 < ⋯ < tM. For the kth function this was computed by

$$\mathrm{MSE}_k^{(1)} = \frac{1}{M} \sum_{m=1}^{M} \left(\hat{\mu}_k(t_m) - \mu_k(t_m)\right)^2,$$

where μ̂k(·) is the fitted mean function for the kth response variable. This was done for all three univariate fits, as well as for the joint trivariate fit. Boxplots of the resulting MSE_k^{(1)}, k = 1, 2, 3, are displayed in Fig. 4. These boxplots show that the separate univariate fits and the joint multivariate fit resulted in little difference in the mean squared error for the first variable, but that the multivariate fit gave lower mean squared error for the other two. For the univariate fits, the median estimates of MSE_k^{(1)} were .244, 1.19, and .206 for k = 1, 2, 3, respectively. For the multivariate fits, the corresponding median estimates were .225, 1.03, and .188. Paired t-tests of the log MSE_k^{(1)} values showed no significant difference in log MSE_1^{(1)} between the univariate and multivariate fits (t = 1.0, p = 0.16), but log MSE_2^{(1)} and log MSE_3^{(1)} were significantly lower for the multivariate fits (t = 2.3, p = 0.01 and t = 3.4, p < .0005, respectively). To assess the quality of the fitted individual subject functional estimates, we calculated the average squared difference between the true individual subject functions and their estimates from the model at the measured observation times. For the kth variable, this mean squared error for the individual subject functions was computed by

$$\mathrm{MSE}_k^{(2)} = \frac{1}{m} \sum_{i=1}^{n} \sum_{j=1}^{m_i} \left(\hat{f}_{ik}(t_{ij}) - f_{ik}(t_{ij})\right)^2,$$

where m = Σ_{i=1}^{n} mi and f̂ik(·) is the fitted function for the ith subject’s kth response. Boxplots of the resulting mean squared errors for the individual subject functions are displayed in Fig. 5. The mean squared errors for the subject functions show a pattern similar to that for the overall mean. The median mean squared errors for each of the three outcome variables for the univariate fits were .702, 3.02, and .445, respectively. The median values for the corresponding multivariate fits were .676, 2.584, and .351. Thus, there was a 15%–20% reduction in mean squared error for the individual functions for the last two variables when accounting for the multivariate covariance among them. Again, there was no significant difference in log MSE_1^{(2)} between the multivariate and univariate fits (t = 0.33, p = 0.39), but the multivariate fits had significantly lower log MSE_k^{(2)} for k = 2, 3 (t = 4.6, p < 0.0005 and t = 9.2, p < 0.0005, respectively). One possible reason why the first variable exhibits no improvement in mean squared error is that the low-frequency variation of the corresponding mean function renders it easier to fit, so borrowing information across variables is less important.
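The two error summaries can be sketched directly from their definitions; the inputs below are synthetic stand-ins for the true and fitted functions, not the simulation output:

```python
import numpy as np

rng = np.random.default_rng(6)

# MSE^(1): mean function vs estimate on the unique time grid t_1 < ... < t_M.
t_grid = np.linspace(0, 7, 40)
mu_true = 7 * np.sin(-.5 * t_grid)
mu_hat = mu_true + rng.normal(0, 0.3, t_grid.size)   # stand-in estimate
mse1 = np.mean((mu_hat - mu_true) ** 2)

# MSE^(2): subject-level fits pooled over all m = sum_i m_i observations.
n = 5
m_per_subject = [rng.integers(4, 9) for _ in range(n)]
sq_errs = []
for m_i in m_per_subject:
    f_true = rng.normal(0, 1, m_i)                   # stand-in true values
    f_hat = f_true + rng.normal(0, 0.3, m_i)         # stand-in fitted values
    sq_errs.append((f_hat - f_true) ** 2)
mse2 = np.concatenate(sq_errs).mean()                # divides by m
```

Pooling the squared errors before averaging is what makes the denominator m = Σ mi rather than n, matching the definition of MSE^(2).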

Fig. 4.

Fig. 4

Boxplots of MSE_k^{(1)} for the posterior means μ̂k(·), k = 1, 2, 3, based on 100 simulated samples. Univariate and multivariate fits are denoted by U and M, respectively.

Fig. 5.

Fig. 5

Boxplots of MSE_k^{(2)} for the posterior means f̂ik(·), k = 1, 2, 3, i = 1, …, 25, based on 100 simulated samples. Univariate and multivariate fits are denoted by U and M, respectively.

4. Application

As described in Section 1, we apply our methodology to the results of a randomized clinical trial conducted at the University of Pittsburgh and the University of Pisa, Italy (Frank et al., 2008). Despite decades of clinical trial experience in major depression, there is only limited understanding of which patients with major depressive disorder respond better to psychotherapy or to pharmacotherapy. This clinical trial compares the effects of psychotherapy (129 subjects) vs. pharmacotherapy (123 subjects). For clarity, Fig. 6 shows the trajectories corresponding to 25 subjects only. We limit the current analysis to the first 12 weeks after baseline, at which point about 95% of the subjects were still on study. Our methodology, which allows for non-linear estimation of time courses, can accommodate the subject trajectories, which are clearly non-linear. In addition, our methodology accounts for the possibility of non-linear effects of baseline covariates over time. Of particular interest is the identification of baseline subject characteristics which differentially predict treatment response in the two groups. The treatment response was change over time in two depression scales, the clinician-administered Hamilton Rating Scale for Depression (HRSD) and the Quick Inventory for Depression Self-report (QIDS). These measures were collected more than once per week on average, though there was variation both within and between patients in the actual timing and number of measurements, with a mean of 11.2 measurement times per subject over the course of treatment. The HRSD scores ranged from 0 to 31 with a median of 10, and the QIDS scores ranged from 0 to 26 with a median of 6. In both measures, higher values indicate more depressive symptoms. Both measures were log transformed and standardized before running the analyses.
A Lifetime Depression Spectrum (LDS) score was assessed on each patient at baseline; this gives an omnibus measure of depressive symptomatology over a patient’s lifetime (Cassano et al., 1997). In this example, we considered the LDS score to be a pre-treatment covariate with potentially differential effects on treatment outcomes for the two treatment groups. To explore this possibility, treatment group, LDS score, and their interaction were entered as time-varying fixed effects into our model with responses HRSD and QIDS entered as bivariate dependent variables. A time-varying random intercept was also included in the model. In the notation of Section 2, xij = (xij1, xij2, xij3, xij4)′ and zij = 1, where xij1 = 1, xij2 is a group indicator (equal to 1 if subject i received psychotherapy and zero otherwise), xij3 is the ith subject’s LDS score, and xij4 = xij2xij3. The sampling scheme described in Section 2.6 was run for 10,000 iterations with a burn-in period of 5000 iterations. The estimated parameters of the Ornstein–Uhlenbeck process are

$$\hat{A} = \begin{pmatrix} 5.75 & 3.88 \\ 4.40 & 7.04 \end{pmatrix} \qquad\text{and}\qquad \hat{C} = \begin{pmatrix} 3.03 & 0.05 \\ 0.05 & 3.50 \end{pmatrix}.$$

Fig. 6.

Fig. 6

HRSD subject trajectories (left panel) and QIDS subject trajectories (right panel) for acutely depressed subjects. For clarity, only 25 subject trajectories are displayed. Scores are standardized to have zero mean and unit variance. Trajectories were truncated at 12 weeks.

The estimated time-varying functional coefficients μ̂k(t) = (μ̂k1(t), μ̂k2(t), μ̂k3(t), μ̂k4(t))′, k = 1, 2, for the HRSD and QIDS responses are plotted in Figs. 7 and 8, respectively. Solid lines correspond to the multivariate fits; for comparison, the univariate fits appear as dashed lines. As can be seen in these plots, the multivariate fits show little evidence for a treatment group effect on HRSD, but evidence for a slight difference in the QIDS at around 3 weeks. However, there is a significant effect of LDS score on both outcomes, such that a higher lifetime depression spectrum score predicts worse outcomes over roughly the first eight weeks. The interaction term is insignificant for HRSD and marginally significant in the 2–8 week time period for the QIDS responses. The effect of the interaction is that LDS score is less predictive of poor QIDS response in the psychotherapy group than in the pharmacotherapy group. In general, the pointwise 95% credible intervals are wider for the univariate fits. While the functional coefficient estimates were substantially similar, the interaction coefficients in both univariate fits were not significant, i.e., their pointwise 95% credible intervals contained zero for the entire time course.

Fig. 7.

Fig. 7

Time-varying functional coefficients for HRSD responses. The solid lines are μ̂1l(t), 1 ≤ l ≤ 4, and their corresponding pointwise 95% credible intervals. The dashed lines are the analogous estimates and credible intervals corresponding to the univariate fits. Upper left panel: μ̂11(t). Upper right panel: μ̂12(t). Lower left panel: μ̂13(t). Lower right panel: μ̂14(t).

Fig. 8. Time-varying functional coefficients for QIDS responses. The solid lines are μ̂2l(t), 1 ≤ l ≤ 4, and their corresponding pointwise 95% credible intervals. The dashed lines are the analogous estimates and credible intervals from the univariate fits. Upper left panel: μ̂21(t). Upper right panel: μ̂22(t). Lower left panel: μ̂23(t). Lower right panel: μ̂24(t).

5. Discussion

In this paper we have devised a regression model appropriate for multivariate functional responses with unequally-spaced observation times. Efficiency may be gained by fitting a single regression with all response variables simultaneously, as opposed to fitting regression models for each functional response separately. This is especially true if the error terms corresponding to each variable are correlated. In our formulation, the random error terms of the model were assumed to follow a multivariate Ornstein–Uhlenbeck process. Through this formulation we were able to extend the seemingly unrelated regression framework to the unequally-spaced multivariate functional data context.

The model we proposed uses a Bayesian mixed-effects approach, where the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates were allowed as either fixed or random effects. For each of the response variables, both the mean and the subject-specific deviations were estimated via low-rank cubic splines using radial basis functions. Thus both the mean and the subject-specific deviations from the mean were allowed to vary smoothly as functions of time. Inference was performed via Markov chain Monte Carlo methods.

We demonstrated the improvement in efficiency that is possible with this model in simulations showing that the mean squared error is lower for the full multivariate algorithm than for fitting each of the functional responses univariately, thereby ignoring the across-variable correlation. This is especially important when the mean functions are wiggly, so that borrowing information across multiple responses matters more.

Finally, the utility of this methodology was demonstrated by application to a real psychiatric dataset examining the relationship among multiple depression measures over time in a clinical trial. Here, the multivariate approach resulted in narrower pointwise posterior credible bands.

We plan future research to extend the multivariate functional model to mixed discrete and continuous functional outcome data. We also plan to develop methods for the joint analysis of multivariate functional data and time-to-event data.

Acknowledgments

We thank the referees for their helpful comments, which greatly improved the paper. We also thank Dr. Ellen Frank, University of Pittsburgh, Department of Psychiatry, for use of the example data. The first author was supported in part by RCMI grant 5G12 RR008124 from the NIH and by NSF grants DMS-0706752 and DMS-0804140. The second author was supported by NIH grant K25 MH076981-01 and NSF grant DMS-0904825.

Appendix

Starting values for θ, the ηi and σ2

Let $D^{r}_{\phi_{ij}} = I_r \otimes \phi_{ij}'$, where $\otimes$ denotes the Kronecker product and $I_r$ is an $r \times r$ identity matrix. Define $D^{r}_{\psi_{ij}}$, $D^{s}_{\phi_{ij}}$ and $D^{s}_{\psi_{ij}}$ similarly.

Let

$$X_0=\begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix}\quad\text{and}\quad Z_0=\left(\begin{matrix}Z_{\upsilon_1}\\ Z_{\upsilon_2}\\ \vdots\\ Z_{\upsilon_n}\end{matrix}\;\middle|\;\operatorname{blockdiag}(Z_{w_i})_{i=1,\dots,n}\;\middle|\;\operatorname{blockdiag}(Z_{u_i})_{i=1,\dots,n}\right),$$

where

$$X_i=\begin{pmatrix}x_{i1}'\otimes D^{r}_{\phi_{i1}}\\ \vdots\\ x_{im_i}'\otimes D^{r}_{\phi_{im_i}}\end{pmatrix},\quad Z_{\upsilon_i}=\begin{pmatrix}x_{i1}'\otimes D^{r}_{\psi_{i1}}\\ \vdots\\ x_{im_i}'\otimes D^{r}_{\psi_{im_i}}\end{pmatrix},\quad Z_{w_i}=\begin{pmatrix}z_{i1}'\otimes D^{s}_{\phi_{i1}}\\ \vdots\\ z_{im_i}'\otimes D^{s}_{\phi_{im_i}}\end{pmatrix}\quad\text{and}\quad Z_{u_i}=\begin{pmatrix}z_{i1}'\otimes D^{s}_{\psi_{i1}}\\ \vdots\\ z_{im_i}'\otimes D^{s}_{\psi_{im_i}}\end{pmatrix}.$$
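These design matrices are assembled from Kronecker products and are mechanical to build. As a small illustration (the dimensions and numerical values below are ours, not from the paper), the following sketch constructs one row block $x_{ij}'\otimes D^{r}_{\phi_{ij}}$ of $X_i$:

```python
import numpy as np

r, q = 2, 4                     # r covariates; q basis evaluations (illustrative)
x_ij = np.array([1.0, 0.5])     # covariate vector for one observation
phi_ij = np.array([1.0, 0.2, 0.04, 0.008])  # basis evaluated at t_ij

# D^r_{phi_ij} = I_r (kron) phi_ij', of shape (r, r*q).
D_r_phi = np.kron(np.eye(r), phi_ij)

# One row block of X_i: x_ij' (kron) D^r_{phi_ij}, of shape (r, r^2 * q).
row_block = np.kron(x_ij, D_r_phi)
```

Horizontally, the row block is just $(x_{ij1} D^{r}_{\phi_{ij}} \mid x_{ij2} D^{r}_{\phi_{ij}})$, which makes the structure easy to verify.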

To obtain starting values for θ, {ηi}i=1,…,n and σ2, we fit the mixed effects model

$$y_k=X_0\beta_k^0+Z_0\upsilon_k^0+\varepsilon_k,$$

for k = 1, …, p, where $\beta_k^0=(\beta_{k1}',\dots,\beta_{kr}')'$, $\upsilon_k^0=(\upsilon_{k1}',\dots,\upsilon_{kr}',\{w_{ik1}',\dots,w_{iks}'\}_{i=1,\dots,n},\{u_{ik1}',\dots,u_{iks}'\}_{i=1,\dots,n})'$ and

$$\operatorname{cov}\left\{\begin{pmatrix}\upsilon_k^0\\ \varepsilon_k\end{pmatrix}\right\}=\begin{pmatrix}\sigma_0^2 I_{Kr} & 0_{Kr\times 2sn} & 0_{Kr\times Ksn} & 0_{Kr\times N}\\ 0_{2sn\times Kr} & \sigma_{kw0}^2 I_{2sn} & 0_{2sn\times Ksn} & 0_{2sn\times N}\\ 0_{Ksn\times Kr} & 0_{Ksn\times 2sn} & \sigma_{ku0}^2 I_{Ksn} & 0_{Ksn\times N}\\ 0_{N\times Kr} & 0_{N\times 2sn} & 0_{N\times Ksn} & \sigma^2 I_{N}\end{pmatrix},$$

where $N=\sum_{i=1}^n m_i$ is the total number of observations.

Generating θ

Let $\Gamma^{r}_{ij}=I_r\otimes(\phi_{ij}',\psi_{ij}')$, $\Gamma^{s}_{ij}=I_s\otimes(\phi_{ij}',\psi_{ij}')$, $\chi_{ij}=I_p\otimes(x_{ij}'\otimes\Gamma^{r}_{ij})$ and $E_{ij}=I_p\otimes(z_{ij}'\otimes\Gamma^{s}_{ij})$. The error term in model (1) can be expressed as

$$\delta_i(t_{ij})=y_i(t_{ij})-\chi_{ij}\theta-E_{ij}\eta_i, \tag{A.1}$$

where the vectors θ and ηi are as defined at the beginning of Section 2.6. Plugging (A.1) into $\gamma_{t_{ij}}=\delta_i(t_{ij})-\exp(-A\Delta t_{ij})\,\delta_i(t_{i,j-1})$ gives

$$\gamma_{t_{ij}}=\zeta_i(t_{ij},t_{i,j-1})-\chi_i(t_{ij},t_{i,j-1})\,\theta,$$

where

$$\zeta_i(t_{ij},t_{i,j-1})=y_i(t_{ij})-E_{ij}\eta_i-\exp(-A\Delta t_{ij})\left[y_i(t_{i,j-1})-E_{i,j-1}\eta_i\right]$$

and

$$\chi_i(t_{ij},t_{i,j-1})=\chi_{ij}-\exp(-A\Delta t_{ij})\,\chi_{i,j-1}.$$

Let $G=\operatorname{blockdiag}(\sigma_{\beta_{11}}^{-2}I_2,\sigma_{\upsilon_{11}}^{-2}I_K,\dots,\sigma_{\beta_{1r}}^{-2}I_2,\sigma_{\upsilon_{1r}}^{-2}I_K,\dots,\sigma_{\beta_{p1}}^{-2}I_2,\sigma_{\upsilon_{p1}}^{-2}I_K,\dots,\sigma_{\beta_{pr}}^{-2}I_2,\sigma_{\upsilon_{pr}}^{-2}I_K)$. Then,

$$[\theta\mid\eta,A,C,\sigma^2,y]\sim N(\mu_\theta,\Sigma_\theta),$$

where $\sigma^2=(\sigma_{\beta_{kl}}^2,\sigma_{\upsilon_{kl}}^2,\sigma_{w_{km0}}^2,\sigma_{w_{km1}}^2,\sigma_{u_{km}}^2)$ for k = 1, …, p, l = 1, …, r, m = 1, …, s,

$$\Sigma_\theta=\left[G+\sum_{i=1}^n\sum_{j=1}^{m_i}\chi_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,\chi_i(t_{ij},t_{i,j-1})\right]^{-1}$$

and

$$\mu_\theta'=\left[\sum_{i=1}^n\sum_{j=1}^{m_i}\zeta_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,\chi_i(t_{ij},t_{i,j-1})\right]\Sigma_\theta.$$

In the expression for $\Omega_{\Delta t_{ij}}$, when p = 2, the stationary variance of the Ornstein–Uhlenbeck process is given by

$$\Sigma=\frac{\det(A)\,C+[A-\operatorname{tr}(A)I]\,C\,[A-\operatorname{tr}(A)I]'}{2\operatorname{tr}(A)\det(A)};$$

see Gardiner (1983). For p = 1, this reduces to Σ = C/(2A). For p > 2, Σ can be obtained numerically by Matlab’s lyap function, for example.
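As a cross-check of the two routes to Σ, the following sketch (in Python rather than Matlab; it assumes, as lyap does, that the stationary covariance solves the continuous Lyapunov equation AΣ + ΣA′ = C) compares the p = 2 closed form above with a numerical solution at the estimates Â and Ĉ from Section 4:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def ou_stationary_cov_closed_form(A, C):
    """Closed-form stationary covariance of a bivariate OU process
    (Gardiner, 1983); valid for p = 2."""
    trA, detA = np.trace(A), np.linalg.det(A)
    B = A - trA * np.eye(2)
    return (detA * C + B @ C @ B.T) / (2.0 * trA * detA)

# Estimated OU parameters from the depression example (Section 4).
A_hat = np.array([[5.75, 3.88], [4.40, 7.04]])
C_hat = np.array([[3.03, 0.05], [0.05, 3.50]])

# Numerical solution of A @ S + S @ A.T = C (the role lyap plays in Matlab).
Sigma_num = solve_continuous_lyapunov(A_hat, C_hat)
Sigma_cf = ou_stationary_cov_closed_form(A_hat, C_hat)
```

The numerical route generalizes directly to p > 2, where no simple closed form is available.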

Generating ηi

Let

$$q_i(t_{ij},t_{i,j-1})=y_i(t_{ij})-\chi_{ij}\theta-\exp(-A\Delta t_{ij})\left[y_i(t_{i,j-1})-\chi_{i,j-1}\theta\right]$$

and

$$G_{wu}=\operatorname{blockdiag}(\Sigma_{w_{11}}^{-1},\sigma_{u_{11}}^{-2}I_K,\dots,\Sigma_{w_{1s}}^{-1},\sigma_{u_{1s}}^{-2}I_K,\dots,\Sigma_{w_{p1}}^{-1},\sigma_{u_{p1}}^{-2}I_K,\dots,\Sigma_{w_{ps}}^{-1},\sigma_{u_{ps}}^{-2}I_K),$$

where $\Sigma_{w_{km}}=\operatorname{diag}(\sigma_{w_{km0}}^2,\sigma_{w_{km1}}^2)$. Then

$$[\eta_i\mid\theta,A,C,\sigma^2,y]\sim N(\mu_{\eta_i},\Sigma_{\eta_i}),$$

where

$$\Sigma_{\eta_i}=\left[\sum_{j=1}^{m_i}E_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,E_i(t_{ij},t_{i,j-1})+G_{wu}\right]^{-1}$$

and

$$\mu_{\eta_i}'=\left[\sum_{j=1}^{m_i}q_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,E_i(t_{ij},t_{i,j-1})\right]\Sigma_{\eta_i}.$$

Generating σ2

$$\sigma_{\upsilon_{kl}}^2\mid\upsilon_{kl}\sim \mathrm{IG}\!\left(\frac{K}{2}+a_1,\; b_1+\frac12\,\upsilon_{kl}'\upsilon_{kl}\right)$$

for k = 1, …, p, l = 1, …, r.

$$\sigma_{w_{km0}}^2\mid\{w_{ikm0}\}_{i=1,\dots,n}\sim \mathrm{IG}\!\left(\frac{n}{2}+a_2,\; b_2+\frac12\sum_{i=1}^n w_{ikm0}^2\right)$$

$$\sigma_{w_{km1}}^2\mid\{w_{ikm1}\}_{i=1,\dots,n}\sim \mathrm{IG}\!\left(\frac{n}{2}+a_3,\; b_3+\frac12\sum_{i=1}^n w_{ikm1}^2\right)$$

$$\sigma_{u_{km}}^2\mid\{u_{ikm}\}_{i=1,\dots,n}\sim \mathrm{IG}\!\left(\frac{nK}{2}+a_4,\; b_4+\frac12\sum_{i=1}^n u_{ikm}'u_{ikm}\right)$$

for k = 1, …, p, m = 1, …, s.
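Each of these inverse-gamma conditionals can be sampled directly by inverting a gamma draw. A minimal sketch in Python, using the first conditional as an example (the helper name, the basis dimension and the hyperparameter values are illustrative choices of ours):

```python
import numpy as np

def draw_inverse_gamma(rng, shape, rate):
    """Draw from IG(shape, rate): if X ~ Gamma(shape, scale = 1/rate),
    then 1/X ~ IG(shape, rate), with mean rate/(shape - 1) for shape > 1."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(0)

# Example update for sigma^2_{upsilon_kl} given the current spline
# coefficients upsilon_kl (placeholder values for illustration).
K, a1, b1 = 15, 0.001, 0.001
upsilon_kl = rng.normal(size=K)
sigma2_upsilon = draw_inverse_gamma(rng, K / 2 + a1,
                                    b1 + 0.5 * upsilon_kl @ upsilon_kl)
```

The remaining variance components are updated the same way with their respective shapes and rates.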

Starting values for A and C

Starting values for A and C are obtained by numerically maximizing the conditional posterior

$$p(A,C\mid\theta^0,\{\eta_i^0\}_{i=1,\dots,n},y)\propto \prod_{i=1}^n\prod_{j=1}^{m_i}|\Omega_{\Delta t_{ij}}|^{-1/2}\exp\!\left\{-\frac12\,\gamma_{t_{ij}}'\Omega_{\Delta t_{ij}}^{-1}\gamma_{t_{ij}}\right\}\times p(A)\times p(C),$$

where $\theta^0$ and $\eta_i^0$, i = 1, …, n, are the starting values for the basis function coefficients. Note that $\gamma_{t_{ij}}$ depends on $\theta^0$ and the $\eta_i^0$, i = 1, …, n, through $\delta_i(t_{ij})$ (expression (A.1)).

Generating the Ornstein–Uhlenbeck process parameters

To generate A, note that

$$p(A\mid C,\theta,\{\eta_i\}_{i=1,\dots,n},y)\propto \prod_{i=1}^n\prod_{j=1}^{m_i}|\Omega_{\Delta t_{ij}}|^{-1/2}\exp\!\left\{-\frac12\,\gamma_{t_{ij}}'\Omega_{\Delta t_{ij}}^{-1}\gamma_{t_{ij}}\right\}\times p(A). \tag{A.2}$$

Since (A.2) is not a standard distribution, we use a Metropolis step to generate A. The proposal distribution is multivariate normal, centered at the current value of A, with variance–covariance matrix equal to the inverse of the negative Hessian of the log of (A.2), evaluated numerically at the mode. This variance–covariance matrix is computed once, conditional on the starting values of the other parameters, and is then held fixed throughout the sampling scheme. To increase the acceptance rate, this variance–covariance matrix is multiplied by 5.76/p, as proposed in Gelman et al. (2004, page 306). More generally, when using a normal proposal distribution centered at the current point, Gelman et al. (2004) suggest using $c^2\Sigma$ as the covariance matrix of that proposal distribution. Among this class of proposal densities, the most efficient one has scale $c \approx 2.4/\sqrt{d}$, where d is the dimension of the parameter being updated. The acceptance probability is

$$\min\left\{1,\;\frac{p(A^{p}\mid C,\theta,\{\eta_i\}_{i=1,\dots,n},y)}{p(A^{c}\mid C,\theta,\{\eta_i\}_{i=1,\dots,n},y)}\right\},$$

where $A^{p}$ denotes the proposed value and $A^{c}$ the current value.
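As a generic illustration of this step (not the authors' code; the function names are ours, and a toy log-density stands in for (A.2)), the following sketch runs a random-walk Metropolis sampler with a fixed normal proposal scaled by $2.4^2/d$:

```python
import numpy as np

def metropolis(log_post, x0, prop_cov, n_iter, rng):
    """Random-walk Metropolis with a fixed multivariate normal proposal,
    its covariance scaled by 2.4^2 / d as in Gelman et al. (2004)."""
    d = len(x0)
    chol = np.linalg.cholesky((2.4 ** 2 / d) * prop_cov)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    draws = np.empty((n_iter, d))
    accepted = 0
    for t in range(n_iter):
        x_prop = x + chol @ rng.standard_normal(d)
        lp_prop = log_post(x_prop)
        # Accept with probability min{1, posterior ratio}.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
            accepted += 1
        draws[t] = x
    return draws, accepted / n_iter

# Toy target: a standard bivariate normal in place of (A.2).
rng = np.random.default_rng(0)
draws, acc_rate = metropolis(lambda v: -0.5 * v @ v, np.zeros(2),
                             np.eye(2), 20000, rng)
```

In the actual sampler, the log of (A.2) would replace the toy log-density and the proposal covariance would be the fixed negative inverse Hessian described above.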

The matrix C is generated by first generating the matrices L and D via a Metropolis step based on

$$p(L,D\mid A,\theta,\{\eta_i\}_{i=1,\dots,n},y)\propto \prod_{i=1}^n\prod_{j=1}^{m_i}|\Omega_{\Delta t_{ij}}|^{-1/2}\exp\!\left\{-\frac12\,\gamma_{t_{ij}}'\Omega_{\Delta t_{ij}}^{-1}\gamma_{t_{ij}}\right\}\times p(L)\times p(D).$$

An iterate for C is then given by C = LDL′.
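The point of working through L and D is that any unit lower-triangular L and positive diagonal D yield a symmetric positive-definite C = LDL′, so their free elements can be updated without constraints. A minimal sketch of the reconstruction for p = 2 (names and values are ours):

```python
import numpy as np

def c_from_ldl(l21, d1, d2):
    """Build a 2x2 covariance C = L D L' from the free off-diagonal entry
    l21 of a unit lower-triangular L and a positive diagonal D = diag(d1, d2)."""
    L = np.array([[1.0, 0.0], [l21, 1.0]])
    D = np.diag([d1, d2])
    return L @ D @ L.T

C = c_from_ldl(0.4, 1.5, 2.0)
```

Positive-definiteness of the result holds for any real l21 as long as d1, d2 > 0.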

References

  1. Aït-Sahalia Y. Closed-form likelihood expansions for multivariate diffusions. Annals of Statistics. 2008;36:906–937. [Google Scholar]
  2. Baladandayuthapani V, Mallick BK, Young Hong M, Lupton JR, Turner ND, Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics. 2008;64:64–73. doi: 10.1111/j.1541-0420.2007.00846.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Beskos A, Papaspiliopoulos O, Roberts GO, Fearnhead P. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. Journal of the Royal Statistical Society B. 2006;68:333–382. [Google Scholar]
  4. Blackwell PG. Bayesian inference for Markov processes with diffusion and discrete components. Biometrika. 2003;90:613–627. [Google Scholar]
  5. Cassano GB, Michelini S, Shear MK, Coli E, Maser JD, Frank E. The panic-agoraphobic spectrum: A descriptive approach to the assessment and treatment of subtle symptoms. American Journal of Psychiatry. 1997;154:27–38. doi: 10.1176/ajp.154.6.27. [DOI] [PubMed] [Google Scholar]
  6. De la Cruz-Mesía R, Marshall G. A Bayesian approach for nonlinear regression models with continuous errors. Communications in Statistics, Theory and Methods. 2003;32:1631–1646. [Google Scholar]
  7. De la Cruz-Mesía R, Marshall G. Nonlinear random effects models with continuous time autoregressive errors. Statistics in Medicine. 2006;25:1471–1484. doi: 10.1002/sim.2290. [DOI] [PubMed] [Google Scholar]
  8. Durbán M, Harezlak J, Wand MP, Carroll RJ. Simple fitting of subject-specific curves for longitudinal data. Statistics in Medicine. 2005;24:1153–1167. doi: 10.1002/sim.1991. [DOI] [PubMed] [Google Scholar]
  9. Frank E, Cassano GB, Rucci P, Fagiolini A, Maggi L, Kraemer HC, Kupfer DJ, Pollock B, Bies R, Nimgaonkar V, Pilkonis P, Shear MK, Thompson WK, Grochocinski VJ, Scocco P, Buttenfield J, Forgione RN. Addressing the challenges of a cross-national investigation: lessons from the Pittsburgh-Pisa study of treatment-relevant phenotypes of unipolar depression. Clinical Trials. 2008 doi: 10.1177/1740774508091965. under revision. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. French JL, Kammann EE, Wand MP. Comment on “Semiparametric Nonlinear Mixed-effects Models and Their Application” by C. Ke and Y. Wang. Journal of the American Statistical Association. 2001;96:1285–1288. [Google Scholar]
  11. Gardiner CW. Handbook of Stochastic Methods for Physics, Chemistry and Natural Sciences. Springer-Verlag; Berlin: 1983. [Google Scholar]
  12. Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper) Bayesian Analysis. 2006;1:515–534. [Google Scholar]
  13. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Second ed. Chapman & Hall/CRC; Boca Raton: 2004. [Google Scholar]
  14. Golightly A, Wilkinson DJ. Bayesian sequential inference for nonlinear multivariate diffusions. Statistics and Computing. 2006;16:323–338. [Google Scholar]
  15. Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi: 10.1111/j.0006-341x.2002.00121.x. [DOI] [PubMed] [Google Scholar]
  16. Jones RH. Longitudinal Data with Serial Correlation: A State-Space Approach. Chapman & Hall/CRC; Boca Raton: 1993. [Google Scholar]
  17. Kessler M, Rahbek A. Identification and inference for multivariate cointegrated and ergodic gaussian diffusions. Statistical Inference for stochastic Processes. 2004;7:137–151. [Google Scholar]
  18. Rosen O, Stoffer DS. Automatic estimation of multivariate spectra via smoothing splines. Biometrika. 2007;94:335–345. [Google Scholar]
  19. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; Cambridge: 2003. [Google Scholar]
  20. Smith M, Kohn R. Nonparametric seemingly unrelated regression. Journal of Econometrics. 2000;98:257–281. [Google Scholar]
  21. Smith M, Kohn R. Parsimonious covariance matrix estimation for longitudinal data. Journal of the American Statistical Association. 2002;97:1141–1153. [Google Scholar]
  22. Sy JP, Taylor JMG, Cumberland WG. A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics. 1997;53:542–555. [PubMed] [Google Scholar]
  23. Thompson WK, Rosen O. A Bayesian model for sparse functional data. Biometrics. 2008;64:54–63. doi: 10.1111/j.1541-0420.2007.00829.x. Web appendices are available at http://www.tibs.org/biometrics. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Zhao Y, Staudenmayer J, Coull BA, Wand MP. General design Bayesian generalized linear mixed models. Statistical Science. 2006;21:35–51. [Google Scholar]
