Abstract
In this paper we present a model for the analysis of multivariate functional data with unequally spaced observation times that may differ among subjects. Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.
1. Introduction
The term functional data analysis describes non-parametric analyses of longitudinal data which focus on the curves themselves as the basic unit of data. Some of the goals of functional data analysis include exploring individual variation of curves around an overall mean function, and modeling the dependence of the curves on covariates. Both the mean function and the subject-specific functions are estimated non-parametrically. In this paper, we propose a method for analyzing multivariate functional data with unequally spaced observation times that may differ among subjects. It is assumed, however, that all variables for a given subject are observed at the same time points. A regression model with a multivariate response may be fit either by running a separate regression for each of the response variables or by fitting a single regression with all response variables simultaneously. The latter may be advantageous if the error terms corresponding to the different variables are correlated: fewer observations may then be required to obtain reliable non-parametric function estimates than when each regression is fit separately and the correlation is ignored. This has been shown to be the case in seemingly unrelated regression (see, for example, Smith and Kohn, 2000).
Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a first order continuous-time multivariate autoregression, also known as a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.
Our model is closest in spirit to the functional mixed effects model of Guo (2002), where the fixed and random effects are modeled by cubic smoothing splines. However, Guo’s model accommodates only a univariate response variable, and does not allow correlated error terms. It can be fit either via standard mixed effects software or by Kalman filtering. Inference and model selection are based on a generalized maximum likelihood ratio test. Baladandayuthapani et al. (2008) have proposed a Bayesian model for spatially correlated functional data analysis. The smoothing technique they use is similar to ours, but their emphasis is on spatial correlation rather than on temporal correlation. Smith and Kohn (2000) consider multivariate non-parametric regression using the seemingly unrelated regression approach. They show that if the error terms of the regression equations are correlated, better non-parametric estimates of the regression functions are obtained by accounting for this correlation compared to fitting separate regressions ignoring the correlation. It is noted that Smith and Kohn (2000) consider multivariate non-parametric regression, not functional data analysis. In functional data analysis, each individual subject has its own function which needs to be estimated for each variable. Smith and Kohn (2000) only estimate a single function for each variable.
The Ornstein–Uhlenbeck process has been used before in various contexts. Unlike most diffusion processes, its transition density is available in closed form, which results in a closed-form expression for the likelihood function. Jones (1993), Chapter 8, uses a state-space approach to parameter estimation. Sy et al. (1997) present a model for multivariate repeated measures which allows unequally spaced observations by using the multivariate integrated Ornstein–Uhlenbeck process. The fixed and random effects in their model have parametric forms. Markov chain Monte Carlo methods for inference on the Ornstein–Uhlenbeck process (univariate or multivariate) parameters have also been proposed. A recent review of estimation for discretely observed diffusion processes is given in Beskos et al. (2006). Golightly and Wilkinson (2006) discuss Bayesian inference for non-linear multivariate diffusions. A number of authors have assumed a common spacing between the observed times. Blackwell (2003) takes this common spacing to be the most frequently occurring interval between observations. De la Cruz-Mesía and Marshall (2003, 2006) discuss the univariate Ornstein–Uhlenbeck process and take the common spacing to be the average time difference between two consecutive observations.
Our application example is taken from a recent psychiatric study comparing psychotherapy to pharmacotherapy, carried out at the University of Pittsburgh and the University of Pisa, Italy (Frank et al., 2008). This study sought differential baseline predictors of response to these two forms of treatment of major depression. Here, we examine the interaction effect of treatment group with Lifetime Depressive Spectrum symptoms (LDS; Cassano et al., 1997) in 252 patients entering the study in an acute depressive episode. Levels of depression are determined by the clinician-administered Hamilton Rating Scale for Depression (HRSD) and the Quick Inventory for Depression Self-report (QIDS). These two scales were given to patients at baseline and again roughly weekly over the course of each subject’s acute depressive episode.
Our main contribution in this paper is accommodating multivariate functional data including covariates and accounting for correlation across variables and time using smoothing techniques in combination with modeling the error term via the multivariate Ornstein–Uhlenbeck process.
The rest of the paper is organized as follows. In Section 2, we describe our model, the prior distributions and the sampling scheme. Section 3 provides results of a simulation study. Section 4 discusses an application, and Section 5 ends with a brief discussion.
2. The model, priors and sampling scheme
2.1. The model
Suppose yi(tij) is a p × 1 vector of response variables on subject i at time tij, i = 1, …, n, j = 1, …, mi, and consider the model
yi(tij) = Xijμ(tij) + Zijgi(tij) + δi(tij).  (1)
In (1), μ(t) = (μ1(t)′, …, μp(t)′)′ and gi(t) = (gi1(t)′, …, gip(t)′)′, where μk(t) = (μk1(t), …, μkr(t))′ and gik(t) = (gik1(t), …, giks(t))′ are an r × 1 vector of fixed functions, and an s × 1 vector of random functions, respectively, for k = 1, …, p. Associated with μ(tij) is an r × 1 covariate vector xij, and associated with gi(tij) is an s × 1 covariate vector zij, such that Xij = Ip ⊗ x′ij and Zij = Ip ⊗ z′ij, where Ip is a p × p identity matrix, and ⊗ denotes the Kronecker product. We have assumed here that the p response variables share the same covariates.
Before proceeding to specify the p × 1 vector of random errors, δi(tij), we give an example which is a special case of model (1). Suppose p = 2, r = 2 and s = 1, with xij taking values in {(1 0)′, (1 1)′}, and zij = 1, with corresponding functions μ1(t) = (μ11(t), μ12(t))′, μ2(t) = (μ21(t), μ22(t))′, gi1(t) and gi2(t). In this case, model (1) reduces to

yi1(tij) = μ11(tij) + xij2μ12(tij) + gi1(tij) + δi1(tij),
yi2(tij) = μ21(tij) + xij2μ22(tij) + gi2(tij) + δi2(tij),

where xij2, the second entry of xij, can take the values 0 or 1. In this example, there are two groups of subjects, control (xij2 = 0) and treatment (xij2 = 1). Each group has its own mean curve for each of the two variables, and individual deviations from these curves are accommodated by the random functions gi1(t) and gi2(t), i = 1, …, n.
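As a quick numeric illustration of the Kronecker design in this example, here is a small numpy sketch with made-up values; it assumes the variable-major stacking of μ(t), which is our reading of the notation, so that the fixed-effects design is Ip ⊗ x′ij.

```python
import numpy as np

p, r = 2, 2                              # two responses, two covariates
x_ij = np.array([1.0, 1.0])              # a treatment subject: (1, x_ij2)' with x_ij2 = 1
X_ij = np.kron(np.eye(p), x_ij)          # p x (p*r) fixed-effects design I_p (kron) x'_ij
mu_t = np.array([1.0, 2.0, 3.0, 4.0])    # illustrative values of (mu_11, mu_12, mu_21, mu_22)' at time t
mean_t = X_ij @ mu_t                     # kth entry is mu_k1 + x_ij2 * mu_k2
```

For the control group, xij = (1, 0)′ would instead pick out only the baseline functions μk1(t).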
The error term in model (1) is assumed to follow a multivariate Ornstein–Uhlenbeck process. More specifically, δi(t) satisfies the stochastic differential equation

dδi(t) = −Aδi(t)dt + BdWi(t),

where A and B are p × p matrices of full rank common to all i = 1, …, n, and Wi(t) is the p-dimensional Wiener process. Three properties of the Ornstein–Uhlenbeck process (Gardiner, 1983, pp. 110–111) which will be useful in what follows are
The Ornstein–Uhlenbeck process will be stationary provided the eigenvalues of A have positive real parts.
The solution Σ to the matrix equation AΣ + ΣA′ = BB′ is the stationary variance–covariance matrix of the process.
In the stationary state, the covariance of δi(t) and δi(s), for s < t, is

cov{δi(t), δi(s)} = exp{−A(t − s)}Σ.  (2)
Let Δtij = tij − ti,j−1 for j = 1, …, mi, where ti0 = 0. The transition density of the Ornstein–Uhlenbeck process is given by

f{δi(tij) | δi(ti,j−1)} = (2π)^{−p/2} |ΩΔtij|^{−1/2} exp{−(1/2)γ′tij ΩΔtij^{−1} γtij},  (3)

where γtij = δi(tij) − exp(−AΔtij)δi(ti,j−1) and ΩΔtij = Σ − exp(−AΔtij) Σ exp(−A′Δtij).
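To make the role of Σ and of the transition density concrete, the following Python sketch (our own illustration, assuming numpy and scipy are available) computes the stationary covariance from AΣ + ΣA′ = BB′ by vectorization and simulates the process exactly at arbitrary, unequally spaced times.

```python
import numpy as np
from scipy.linalg import expm

def ou_stationary_cov(A, B):
    # Stationary covariance: solve A Sigma + Sigma A' = B B' by
    # vectorization, (I kron A + A kron I) vec(Sigma) = vec(B B')
    p = A.shape[0]
    I = np.eye(p)
    lhs = np.kron(I, A) + np.kron(A, I)
    vec_sigma = np.linalg.solve(lhs, (B @ B.T).reshape(-1, order="F"))
    return vec_sigma.reshape(p, p, order="F")

def simulate_ou(A, B, times, rng):
    # Exact simulation through the Gaussian transition density (3):
    # delta(t_j) | delta(t_{j-1}) ~ N(exp(-A dt) delta(t_{j-1}), Omega_dt),
    # with Omega_dt = Sigma - exp(-A dt) Sigma exp(-A' dt)
    p = A.shape[0]
    Sigma = ou_stationary_cov(A, B)
    path = np.empty((len(times), p))
    path[0] = rng.multivariate_normal(np.zeros(p), Sigma)  # start in stationarity
    for j in range(1, len(times)):
        dt = times[j] - times[j - 1]
        M = expm(-A * dt)
        Omega = Sigma - M @ Sigma @ M.T
        path[j] = rng.multivariate_normal(M @ path[j - 1], Omega)
    return path
```

Because the transition density is exact, no fine time discretization is needed; the spacings Δtij can vary freely across subjects.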
The functions μkl(t) and gikm (t), k = 1, …, p, l = 1, …, r, m = 1, …, s, i = 1, …, n, are modeled as cubic splines using low-rank radial basis functions (French et al., 2001; Ruppert et al., 2003). In Section 2.2 we review briefly non-parametric function estimation.
2.2. Non-parametric function estimation
The functions μkl(t) and gikm(t) are estimated non-parametrically. In this section we explain the basis function approach which is used in turn in Section 2.3 to estimate these functions. For simplicity, we focus on scatterplot smoothing, with observations (xi, yi), i = 1, …, n. The description in this section is based on French et al. (2001), Ruppert et al. (2003) and Crainiceanu et al. (2005). Consider the model
yi = f(xi) + εi,  i = 1, …, n,

where E(εi) = 0 and f is an unknown smooth function. A linear spline basis function can be expressed as (x − κ)+ = max(0, x − κ), where κ is a knot. Any linear combination of the linear spline basis functions 1, x, (x − κ1)+, …, (x − κK)+ is a piecewise linear function with knots at κ1, …, κK. The function f may thus be expressed as

f(x) = β0 + β1x + ∑_{k=1}^{K} uk(x − κk)+,  (4)
where the uks are the coefficients of the basis functions. We comment later in this section on the value of K.
To understand how the spline model (4) can be used for fitting a non-parametric curve to data, consider Fig. 1, which displays in the top panel whip-shaped data similar to the example in Ruppert et al. (2003). The left half of the data exhibits linear behavior while curvature is apparent on the right-hand side. The bottom panel presents the basis functions used in the spline model. In particular, the equally-spaced knots are 0.50, 0.55, 0.60, …, 0.95. Comparing the two panels of Fig. 1, it is quite easy to see that a linear combination of the basis functions in the bottom panel should be able to capture the data structure in the top panel.
Fig. 1. Top: data with fitted curve. Bottom: the basis functions.
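To illustrate the truncated-line basis, here is a small Python sketch that fits synthetic whip-shaped data (our own stand-in for the data in Fig. 1, not the authors' data) by unpenalized least squares on the basis 1, x, (x − κ1)+, …, (x − κK)+ with knots 0.50, 0.55, …, 0.95.

```python
import numpy as np

def linear_spline_basis(x, knots):
    # Design matrix with columns 1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
# Synthetic whip-shaped signal: linear on the left, oscillating on the right
f_true = x + np.where(x > 0.5, np.sin(15.0 * (x - 0.5)) * (x - 0.5), 0.0)
y = f_true + rng.normal(0.0, 0.05, x.size)

knots = np.linspace(0.50, 0.95, 10)        # 0.50, 0.55, ..., 0.95, as in Fig. 1
X = linear_spline_basis(x, knots)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fit = X @ coef                             # piecewise linear fit with knots at the kappas
```

The fitted curve is linear to the left of the first knot and changes slope at each κk, which is exactly the flexibility needed for the right-hand curvature.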
In general, any structure can be accommodated by placing basis functions at additional knots. To automate the process, two main approaches are commonly taken. One approach is automatic knot selection, which can be carried out via Bayesian variable selection. Specifically, a large number of knots are placed at either equally spaced locations or at specific percentiles of the covariate, and an indicator variable is attached to each knot (see for example Thompson and Rosen, 2008). The indicator value is 1 if a knot is to be retained at a given location or 0 if the knot should be removed from that location. In a Bayesian MCMC procedure, the indicator variables are sampled at each iteration. The other approach is to retain all the knots but to constrain their influence. This can be accomplished by penalized spline regression or, equivalently, by using a linear mixed effects model formulation. In this paper, we use the latter approach in a Bayesian framework. In both approaches, the value of K is not crucial, as long as it is not too small; typically, 30–40 knots are sufficient for medium-sized datasets. Instead of the linear spline representation (4), we use in this paper the low-rank thin-plate spline representation
f(x) = β0 + β1x + ∑_{k=1}^{K} uk|x − κk|³.  (5)
Using cubic radial basis functions tends to result in a more aesthetically appealing fit, compared to that of the truncated-line basis (Fig. 2), and may lead to faster convergence of the MCMC algorithm. The penalized spline approach prevents overfitting by adding a roughness penalty. Specifically, the minimization criterion is
∑_{i=1}^{n} {yi − f(xi)}² + λθ′Dθ,  (6)

where θ = (β0, β1, u1, …, uK)′, λ is the smoothing parameter and D is a known positive semi-definite penalty matrix. For thin-plate splines, the matrix D is given by

D = blockdiag(02×2, ΩK),

where the (k, l)th element of ΩK is |κk − κl|³. From the structure of the matrix D it is clear that only the uks are penalized. Let y = (y1, …, yn)′, X = [1 xi]1≤i≤n and ZK = [|xi − κ1|³ … |xi − κK|³]1≤i≤n. Dividing (6) by σε² and expressing the penalty term explicitly as a function of ΩK results in

(1/σε²)‖y − Xβ − ZKu‖² + (λ/σε²)u′ΩKu,  (7)
where β = (β0, β1)′ and u = (u1, …, uK)′ are considered fixed and random parameters, respectively. The solution to (7) is equal to the best linear unbiased predictor (BLUP) in the linear mixed model

y = Xβ + ZKu + ε,  u ~ N(0, σu²ΩK^{−1}),  ε ~ N(0, σε²In).  (8)

Note that ΩK is not a positive definite matrix, so it does not yield a proper covariance matrix; however, French et al. (2001) show that the smooth fit is not affected by this fact. Let Z = ZKΩK^{−1/2} and b = ΩK^{1/2}u, where ΩK^{−1/2} is based on the singular value decomposition; then the mixed model (8) is equivalent to

y = Xβ + Zb + ε,  b ~ N(0, σb²IK).  (9)
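This mixed-model representation can be sketched as a ridge regression in Python: Z is formed with an SVD-based inverse square root of ΩK, clipping tiny singular values since ΩK need not be positive definite (the function names and the tolerance are our own choices).

```python
import numpy as np

def radial_design(x, knots):
    # Z_K with (i, k) entry |x_i - kappa_k|^3
    return np.abs(x[:, None] - knots[None, :]) ** 3

def omega_inv_sqrt(knots, tol=1e-8):
    # SVD-based inverse square root of Omega_K = [|kappa_k - kappa_l|^3];
    # small singular values are clipped since Omega_K need not be PD
    Omega = np.abs(knots[:, None] - knots[None, :]) ** 3
    U, s, Vt = np.linalg.svd(Omega)
    return U @ np.diag(np.maximum(s, tol) ** -0.5) @ Vt

def fit_penalized_spline(x, y, knots, lam):
    # Ridge form of the mixed model: y = X beta + Z b + eps, only b is shrunk
    X = np.column_stack([np.ones_like(x), x])
    Z = radial_design(x, knots) @ omega_inv_sqrt(knots)
    C = np.hstack([X, Z])
    P = np.diag([0.0, 0.0] + [lam] * len(knots))  # zero penalty on beta
    theta = np.linalg.solve(C.T @ C + P, C.T @ y)
    return C @ theta
```

In a full Bayesian fit, lam would not be fixed but would correspond to the variance ratio σε²/σb², sampled along with the other parameters.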
Fig. 2. The data with a fitted curve. Left: truncated lines basis. Right: cubic radial basis functions.
In a Bayesian framework, prior distributions need to be placed on all the model parameters.
2.3. Estimating μkl(t) and gikm(t)
To estimate the functions μkl(t) and gikm(t), k = 1, …, p, l = 1, …, r, m = 1, …, s, i = 1, …, n, we use the basis function approach described in Section 2.2. In particular, let κ1, …, κK be K knots obtained as sample quantiles of tij, i = 1, …, n, j = 1, …, mi, and let ΛK = [|κk − κk′|³]1≤k,k′≤K be a K × K matrix. Let ϕij = (1, tij)′ and ψij = ΛK^{−1/2}(|tij − κ1|³, …, |tij − κK|³)′, where ΛK^{−1/2} is obtained via the singular value decomposition. The vectors ϕij and ψij are basis functions evaluated at tij and are used to model the linear part and the non-linear part, respectively, of the fixed and random functions. In particular, μkl(t) and gikm(t) can be evaluated at tij by

μkl(tij) = ϕ′ijβkl + ψ′ijνkl and gikm(tij) = ϕ′ijwikm + ψ′ijuikm,  (10)
for k = 1, …, p, l = 1, …, r, m = 1, …, s and i = 1, …, n. In (10), βkl, νkl,wikm and uikm are unknown parameter vectors.
2.4. Priors on the basis function coefficients and the variance components
We place the following prior distributions on βkl, νkl, wikm and uikm, k = 1, …, p, l = 1, …, r, m = 1, …, s and i = 1, …, n.
βkl ~ N(0, σβ²I2), where I2 is a 2 × 2 identity matrix, and σβ² is a large known value.
νkl ~ N(0, σνkl²IK), where IK is a K × K identity matrix, and K is the number of knots.
wikm ~ N(0, diag(σwkm1², σwkm2²)).
uikm ~ N(0, σukm²IK).
Similar prior distributions on the coefficients of the basis functions were used by Durbán et al. (2005). Note that the variances of the elements of wikm are different, while those of the elements of uikm are all the same. This is merely for computational convenience, to avoid an additional (K − 1) parameters. The priors on the variance components σνkl², k = 1, …, p, l = 1, …, r, are independent inverse gamma distributions with densities

p(σνkl²) ∝ (σνkl²)^{−(a1+1)} exp(−b1/σνkl²),

where a1 and b1 are known small values reflecting vague knowledge on σνkl². The priors on σwkm1², σwkm2² and σukm² are similar inverse gamma distributions. Recently, a number of authors (see for example Gelman, 2006) have proposed alternative prior distributions for variance components which may exhibit superior behavior to that of inverse gamma distributions. However, Zhao et al. (2006) reported good performance of inverse gamma priors in the case of non-parametric regression, provided the hyperparameters are not too small. Specifically, hyperparameter values of 0.01 worked well, whereas values of 0.001 behaved erratically.
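These priors are conditionally conjugate: if, for instance, νkl | σνkl² ~ N(0, σνkl²IK), then σνkl² given νkl is again inverse gamma, with shape a1 + K/2 and rate b1 + ν′klνkl/2. A minimal Python sketch of such an update (the function name is our own):

```python
import numpy as np

def draw_variance(u, a1=0.01, b1=0.01, rng=None):
    # Conjugate update: u | sigma2 ~ N(0, sigma2 I_K), sigma2 ~ IG(a1, b1)
    # gives sigma2 | u ~ IG(a1 + K/2, b1 + u'u/2)
    rng = np.random.default_rng() if rng is None else rng
    shape = a1 + 0.5 * u.size
    rate = b1 + 0.5 * float(u @ u)
    return rate / rng.gamma(shape)  # rate / Gamma(shape, 1) is an IG(shape, rate) draw
```

This is the form of the variance-component draws in stage four of the sampling scheme of Section 2.6.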
2.5. Priors on the Ornstein–Uhlenbeck parameters
The Ornstein–Uhlenbeck process parameters are the matrix A and the matrix C = BB′. Both matrices consist of parameters which are constrained to satisfy certain conditions. In particular, as mentioned in Section 2.1, the stationarity condition requires the real parts of the eigenvalues of A to be positive. Also, the matrix C is required to be symmetric and positive definite. Imposing the constraints directly on the elements of these matrices would be difficult. Instead, we first express each of these matrices in an appropriate decomposition and then place prior distributions on the parameters of the decomposition factors. This is a much easier task, as the factor parameters are either unconstrained or constrained to be non-negative. To place a prior on A, we express it as A = SΨS−1, where S is a matrix of linearly independent eigenvectors, and Ψ is a diagonal matrix of real positive eigenvalues. This parameterization, used also by Sy et al. (1997) for the bivariate Ornstein–Uhlenbeck process, satisfies the stationarity condition mentioned above. Aït-Sahalia (2008) discusses identifiability related to A and expresses it as a lower triangular matrix with positive diagonal elements. Kessler and Rahbek (2004) discuss identifiability issues in the case of equidistant observation times. The matrix S is parameterized as S = (sij), i, j = 1, …, p, with unit diagonal elements. Independent zero-mean normal priors with a large known variance are placed on the off-diagonal elements of S, and on the logarithms of the diagonal elements of Ψ.
The matrix C is symmetric and positive definite. To place priors on its elements which respect the symmetry and positive definiteness, we first express C as a modified Cholesky factorization, C = LDL′, where L is unit lower triangular, and D is diagonal with positive diagonal elements. This approach was used, for example, in Smith and Kohn (2002) and in Rosen and Stoffer (2007); the emphasis of Rosen and Stoffer (2007) is on estimation in the frequency domain for multivariate time series observed at equally spaced time points. The priors on the off-diagonal elements of L are taken to be independent zero-mean normal distributions with a fixed large variance. The priors placed on log(Di), where Di is the ith diagonal element of D, are likewise independent zero-mean normal distributions with a fixed large variance.
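Both decompositions are easy to work with computationally. The following Python sketch (our own, with a row-by-row filling convention for the off-diagonal entries) maps unconstrained real parameters to a stationary A and a symmetric positive definite C:

```python
import numpy as np

def make_A(s_offdiag, log_psi):
    # A = S Psi S^{-1}: S has unit diagonal and free off-diagonal entries
    # (filled row by row), Psi = diag(exp(log_psi)) has positive entries,
    # so the eigenvalues of A are real and positive (stationarity)
    p = log_psi.size
    S = np.eye(p)
    S[~np.eye(p, dtype=bool)] = s_offdiag
    return S @ np.diag(np.exp(log_psi)) @ np.linalg.inv(S)

def make_C(l_offdiag, log_d):
    # C = L D L': L unit lower triangular, D = diag(exp(log_d)) positive,
    # hence C is symmetric positive definite by construction
    p = log_d.size
    L = np.eye(p)
    L[np.tril_indices(p, -1)] = l_offdiag
    return L @ np.diag(np.exp(log_d)) @ L.T
```

Priors can then be placed directly on the unconstrained arguments, as described above, and any draw maps back to valid matrices.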
2.6. The sampling scheme
Let θk = (β′k1, ν′k1, …, β′kr, ν′kr)′, for k = 1, …, p, and let θ = (θ′1, …, θ′p)′. Similarly, let ηik = (w′ik1, u′ik1, …, w′iks, u′iks)′ and ηi = (η′i1, …, η′ip)′, for k = 1, …, p and for i = 1, …, n. The sampling scheme consists of the following stages. More details are given in the Appendix.
Initialize θ, ηi, i = 1, …, n, and the variance components by fitting p separate mixed effects models, one for each response variable k = 1, …, p. Initialize A and C by numerically maximizing the log conditional joint posterior distribution of A and C.
Generate θ from its full conditional posterior distribution, which is multivariate normal.
For each i, i = 1, …, n, generate ηi from its full conditional posterior distribution, which is multivariate normal.
For k = 1, …, p, l = 1, …, r, m = 1, …, s, generate the variance components , , and from their full conditional posterior distributions, which are inverse gamma.
Generate A from its full conditional posterior distribution. Since this distribution is not standard, we use a Metropolis step with a multivariate normal proposal density centered at the current value of A. The variance–covariance matrix of this normal proposal is based on the inverse of the estimated negative Hessian of the log conditional posterior distribution.
Generate C from its full conditional posterior distribution using a Metropolis step.
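The Metropolis steps for A and C share the same generic shape. A minimal Python sketch of one random-walk update (the caller supplies the log posterior and the proposal covariance; the names are our own):

```python
import numpy as np

def metropolis_step(current, log_post, proposal_cov, rng):
    # Propose from a multivariate normal centred at the current value and
    # accept with probability min(1, posterior ratio)
    proposal = rng.multivariate_normal(current, proposal_cov)
    log_ratio = log_post(proposal) - log_post(current)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return current, False
```

In our setting, `current` would hold the unconstrained parameters of the decomposition of A (or C), and `log_post` the corresponding log conditional posterior.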
3. Simulations
In this section, we explore by simulation the potential improvement in curve fitting when modeling the correlation structure of multivariate functional data rather than ignoring it. Specifically, we examine improvements in mean squared error for the individual subject-level functions. For this purpose we generated 100 datasets, with each dataset consisting of observations, without covariates, on n = 50 subjects. The number of observations per subject is mi = 2 + wi, where wi is a Poisson random variable with expectation 5, giving an average of 7 observation times per subject. The observation times themselves were independently generated from a uniform distribution on the interval [0, mi]. For each subject, there are p = 3 response variables with overall subject mean functions chosen to represent a variety of possible relationships. The first true mean function is μ1(t) = 7 sin(−.5t), which exhibits low frequency variation on the range of t. Note that the second subscript on μ1(t) was dropped, since there are no covariates in our simulation setting. The second true mean function is μ2(t) = 10ϕ(t; 1.5, .3) + 6ϕ(t; 4, .6), where ϕ(t; a, b) is a univariate normal density with mean a and standard deviation b. The third true mean function is μ3(t) = 2 sin(−t), which has higher frequency oscillations on the range of t. Let fik(t) = μk(t) + gik(t), k = 1, 2, 3, be the individual subject functions, where we have again dropped the covariate subscript. In particular,
fi1(t) = ai1 sin(−.5t) + ai2,  fi2(t) = bi1ϕ(t; 1.5, .3) + bi2ϕ(t; 4, .6),  fi3(t) = ci1 sin(−t) + ci2,  (11)

where ai1 ~ N(7, .5), ai2 ~ N(0, .2), bi1 ~ N(10, .25), bi2 ~ N(6, .25), ci1 ~ N(2, .5) and ci2 ~ N(0, .2). Here, N(a, b) indicates the univariate normal distribution with mean a and standard deviation b. The observations yi(tij) were obtained by drawing the random coefficients ai1, ai2, bi1, bi2, ci1, ci2, evaluating the equations in (11) at time tij, and adding δi(tij), which was in turn generated according to a multivariate Ornstein–Uhlenbeck error process with fixed values of the parameters A and C.
These settings result in fairly noisy data with cross-correlations ranging from .25 to .47 among the three variables when evaluated at the average time spacing Δ̄ti. The cross-correlation matrix is obtained from the cross-covariance (2), evaluated at Δ̄ti. Plots of these mean functions along with one randomly generated dataset can be seen in Fig. 3.
Fig. 3. Mean functions (heavy lines) and data from one randomly generated dataset (light lines) with 50 individual subjects.
Our model was fitted four times for each dataset: once for each univariate outcome separately (thereby ignoring across-variable correlation), and then once to all three outcomes simultaneously. The sampling scheme was run for 10,000 iterations per dataset, with a burn-in period of 5000. Median estimates Â and Ĉ of the A and C matrices were computed across all 100 multivariate fits.
To assess the quality of the resulting estimates of the three mean functions, we calculated the average squared difference between the function estimates and the true mean functions at the unique observation times, t1 < … < tM. For the kth function this was computed by

MSEμ,k = (1/M) ∑_{m=1}^{M} {μ̂k(tm) − μk(tm)}²,

where μ̂k(·) is the fitted mean function for the kth response variable. This was done for all three univariate fits, as well as for the joint trivariate fit. Boxplots of the resulting MSEμ,k, k = 1, 2, 3, are displayed in Fig. 4. These boxplots show that the separate univariate fittings and the joint multivariate fitting resulted in little difference in the mean squared error for the first variable, but lower mean squared error for the other two. For the univariate fits, the median estimates of MSEμ,k were .244, 1.19, and .206 for k = 1, 2, 3, respectively. For the multivariate fits, the corresponding median estimates were .225, 1.03, and .188 for k = 1, 2, 3, respectively. Paired t-tests of log MSEμ,k showed no significant difference between the univariate and multivariate fits for the first variable (t = 1.0, p = 0.16), but log MSEμ,2 and log MSEμ,3 were significantly lower for the multivariate fits (t = 2.3, p = 0.01 and t = 3.4, p < .0005, respectively). To assess the quality of the fitted individual subject functional estimates, we calculated the average squared difference between the true individual subject functions and their estimates from the model at the measured observation times. For the kth variable, this mean squared error for the individual subject functions was computed by

MSEf,k = (1/N) ∑_{i=1}^{n} ∑_{j=1}^{mi} {f̂ik(tij) − fik(tij)}²,

where N = ∑_{i=1}^{n} mi and f̂ik(·) is the fitted function for the ith subject’s kth response. Boxplots of the resulting mean squared errors for the individual subject functions are displayed in Fig. 5. The mean squared errors for the subject functions show a pattern similar to that for the overall means. The median mean squared errors for each of the three outcome variables for the univariate fits were .702, 3.02, and .445, respectively. The median values for the corresponding multivariate fits were .676, 2.584, and .351. Thus, there was a 15%–20% reduction in mean squared error for the individual functions for the last two variables when accounting for the multivariate covariance among them. Again, there was no significant difference in log MSEf,1 between the multivariate and univariate fits (t = 0.33, p = 0.39), but the multivariate fits had significantly lower log MSEf,k for k = 2, 3 (t = 4.6, p < 0.0005 and t = 9.2, p < 0.0005, respectively). One possible reason why the first variable exhibits no improvement in mean squared error is that the low-frequency variation of the corresponding mean function renders it easier to fit, and hence it is less important to borrow information across variables.
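Both error summaries are straightforward to compute; here is a short Python sketch with our own function names (the inputs are curves already evaluated at the relevant time points):

```python
import numpy as np

def mse_mean(mu_hat_vals, mu_true_vals):
    # Average squared difference between the fitted and true mean function
    # over the unique observation times t_1 < ... < t_M
    return float(np.mean((np.asarray(mu_hat_vals) - np.asarray(mu_true_vals)) ** 2))

def mse_subject(f_hat_list, f_true_list):
    # Pooled average squared difference over every subject's own times;
    # each list entry holds one subject's curve evaluated at that
    # subject's m_i observation times
    sq = sum(float(np.sum((fh - ft) ** 2)) for fh, ft in zip(f_hat_list, f_true_list))
    N = sum(np.asarray(fh).size for fh in f_hat_list)
    return sq / N
```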
Fig. 4. Boxplots of the mean squared errors for the posterior means μ̂k(·), k = 1, 2, 3, based on 100 simulated samples. Univariate and multivariate fits are denoted by U and M, respectively.
Fig. 5. Boxplots of the mean squared errors for the posterior means f̂ik(·), k = 1, 2, 3, i = 1, …, 25, based on 100 simulated samples. Univariate and multivariate fits are denoted by U and M, respectively.
4. Application
As described in Section 1, we apply our methodology to the results of a randomized clinical trial conducted at the University of Pittsburgh and the University of Pisa, Italy (Frank et al., 2008). Despite decades of clinical trial experience in major depression, there is only limited understanding of which patients with major depressive disorder respond better to psychotherapy or to pharmacotherapy. This clinical trial compares the effects of psychotherapy (129 subjects) vs. pharmacotherapy (123 subjects). For clarity, Fig. 6 shows the trajectories corresponding to 25 subjects only. We limit the current analysis to the first 12 weeks after baseline, at which point about 95% of the subjects were still on study. Our methodology, which allows for non-linear estimation of time courses, can accommodate the subject trajectories, which are clearly non-linear. In addition, our methodology accounts for the possibility of non-linear effects of baseline covariates over time. Of particular interest is the identification of baseline subject characteristics which differentially predict treatment response in the two groups. The treatment response was change over time in two depression scales, the clinician-administered Hamilton Rating Scale for Depression (HRSD) and the Quick Inventory for Depression Self-report (QIDS). These measures were collected more than once per week on average, though there was variation both within and between patients in the actual timing and number of measurements, with a mean of 11.2 measurement times per subject over the course of treatment. The HRSD scores ranged from 0 to 31 with a median of 10, and the QIDS scores ranged from 0 to 26 with a median of 6. On both measures, higher values indicate more depressive symptoms. Both measures were log transformed and standardized before running the analyses.
A Lifetime Depression Spectrum (LDS) score was assessed on each patient at baseline; this gives an omnibus measure of depressive symptomatology over a patient’s lifetime (Cassano et al., 1997). In this example, we considered the LDS score to be a pre-treatment covariate with potentially differential effects on treatment outcomes for the two treatment groups. To explore this possibility, treatment group, LDS score, and their interaction were entered as time-varying fixed effects into our model, with the HRSD and QIDS responses entered as bivariate dependent variables. A time-varying random intercept was also included in the model. In the notation of Section 2, xij = (xij1, xij2, xij3, xij4)′ and zij = 1, where xij1 = 1, xij2 is a group indicator (equal to 1 if subject i received psychotherapy and zero otherwise), xij3 is the ith subject’s LDS score, and xij4 = xij2xij3. The sampling scheme described in Section 2.6 was run for 10,000 iterations with a burn-in period of 5000 iterations. The parameters of the Ornstein–Uhlenbeck process were estimated as part of the model fit.
Fig. 6. HRSD subject trajectories (left panel) and QIDS subject trajectories (right panel) for acutely depressed subjects. For clarity, only 25 subject trajectories are displayed. Scores are standardized to have zero mean and unit variance. Trajectories were truncated at 12 weeks.
The estimated time-varying functional coefficients μ̂k(t) = (μ̂k1(t), μ̂k2(t), μ̂k3(t), μ̂k4(t))′, k = 1, 2, for the HRSD and QIDS responses are plotted in Figs. 7 and 8, respectively. Solid lines correspond to the multivariate fits; for comparison, the univariate fits appear as dashed lines. As can be seen in these plots, the multivariate fits show little evidence for a treatment group effect on HRSD, but some evidence for a slight difference in QIDS at around 3 weeks. However, there is a significant effect of LDS score on both outcomes, such that a higher lifetime depression spectrum score predicts worse outcomes over roughly the first eight weeks. The interaction term is not significant for HRSD and marginally significant in the 2–8 week period for the QIDS responses. The effect of the interaction is that the LDS score is less predictive of poor QIDS response in the psychotherapy group than in the pharmacotherapy group. In general, the pointwise 95% credible intervals are wider for the univariate fits. While the functional coefficient estimates were substantially similar, in the univariate fits the interaction coefficients for both outcomes were not significant, i.e., the pointwise 95% credible intervals contained zero over the entire time course.
Fig. 7. Time-varying functional coefficients for HRSD responses. The solid lines are μ̂1l(t), 1 ≤ l ≤ 4, and their corresponding pointwise 95% credible intervals. The dashed lines are the analogous estimates and credible intervals corresponding to the univariate fits. Upper left panel: μ̂11(t). Upper right panel: μ̂12(t). Lower left panel: μ̂13(t). Lower right panel: μ̂14(t).
Fig. 8. Time-varying functional coefficients for QIDS responses. The solid lines are μ̂2l(t), 1 ≤ l ≤ 4, and their corresponding pointwise 95% credible intervals. The dashed lines are the analogous estimates and credible intervals corresponding to the univariate fits. Upper left panel: μ̂21(t). Upper right panel: μ̂22(t). Lower left panel: μ̂23(t). Lower right panel: μ̂24(t).
5. Discussion
In this paper we have devised a regression model appropriate for multivariate functional responses with unequally spaced observation times. Efficiency may be gained by fitting a single regression with all response variables simultaneously, as opposed to fitting a separate regression model for each functional response. This is especially true if the error terms corresponding to the different variables are correlated. In our formulation, the random error terms of the model were assumed to follow a multivariate Ornstein–Uhlenbeck process. Through this formulation we were able to extend the seemingly unrelated regression framework to the unequally spaced multivariate functional data context.
The model we proposed uses a Bayesian mixed-effects approach, where the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates were allowed as either fixed or random effects. For each of the response variables, both the mean and the subject-specific deviations were estimated via low-rank cubic splines using radial basis functions. Thus mean and subject-specific deviation from the mean were allowed to vary smoothly as a function of time. Inference was performed via Markov chain Monte Carlo methods.
We demonstrated the improvement in efficiency that is possible with this model in simulations, which show that the mean squared error is lower for the full multivariate algorithm than when fitting each of the functional responses univariately, thereby ignoring the across-variable correlation. This seems especially important when the mean functions are wiggly, so that borrowing information across multiple responses becomes more important.
Finally, the utility of this methodology was demonstrated by application to a real-life psychiatric dataset examining the relationship of multiple depression measures over time in a clinical trial. Here, using a multivariate approach resulted in narrower posterior credible bands.
We plan future research to extend the multivariate functional model to mixed discrete and continuous functional outcome data. We also plan to develop methods for the joint analysis of multivariate functional data and time-to-event data.
Acknowledgments
We thank the referees for their helpful comments, which greatly improved the paper. We also thank Dr. Ellen Frank, University of Pittsburgh, Department of Psychiatry, for use of the example data. The first author was supported in part by RCMI grant 5G12 RR008124 from the NIH and by NSF grants DMS-0706752 and DMS-0804140. The second author was supported by NIH grant K25 MH076981-01 and NSF grant DMS-0904825.
Appendix
Starting values for θ, the ηi and σ2
Let , where ⊗ denotes the Kronecker product, and Ir is an r × r identity matrix. Define , and similarly.
Let
where
To obtain starting values for θ, {ηi}i=1,…,n and σ2, we fit the mixed effects model
for k = 1, …, p, where and
Generating θ
Let and . The error term in model (1) can be expressed as
(A.1)
where the vectors θ and ηi are as defined at the beginning of Section 2.6. Plugging (A.1) into γtij = δi(tij) − exp(−AΔtij) δi(ti,j−1) gives
where
and
Let . Then,
where for k = 1, …, p, l = 1, …, r, m = 1, …, s,
and
In the expression for ΩΔtij, when p = 2, the stationary variance of the Ornstein–Uhlenbeck process is given by
see Gardiner (1983). For p = 1, this reduces to Σ = C/(2A). For p > 2, Σ can be obtained numerically, for example with Matlab’s lyap function.
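As an alternative to Matlab's lyap, the stationary variance can be obtained in a few lines through the vec/Kronecker form of the Lyapunov equation. The sketch below assumes the usual continuous Lyapunov form AΣ + ΣA′ = C, which is consistent with the p = 1 reduction Σ = C/(2A); the matrices A and C are illustrative.

```python
import numpy as np

def ou_stationary_cov(A, C):
    """Solve A @ S + S @ A.T = C for the stationary covariance S of the
    Ornstein-Uhlenbeck process, via vec(S) = (I (x) A + A (x) I)^{-1} vec(C),
    using column-major (Fortran-order) vectorization."""
    p = A.shape[0]
    I = np.eye(p)
    K = np.kron(I, A) + np.kron(A, I)
    S = np.linalg.solve(K, C.reshape(-1, order="F")).reshape(p, p, order="F")
    return 0.5 * (S + S.T)  # symmetrize against round-off

A = np.array([[1.0, 0.2], [0.1, 1.5]])
C = np.array([[1.0, 0.3], [0.3, 2.0]])
S = ou_stationary_cov(A, C)

# The solution satisfies the Lyapunov equation, and the scalar case
# recovers Sigma = C / (2A): for A = 2, C = 3 this gives 0.75.
assert np.allclose(A @ S + S @ A.T, C)
assert np.isclose(ou_stationary_cov(np.array([[2.0]]), np.array([[3.0]]))[0, 0], 0.75)
```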
Generating ηi
Let
and
where . Then
where
and
Generating σ2
for k = 1, …, p, l = 1, …, r.
for k = 1, …, p, m = 1, …, s.
Starting values for A and C
Starting values for A and C are obtained by numerically maximizing the conditional posterior
where θ0 and ηi0, i = 1, …, n, are the starting values for the basis function coefficients. Note that γtij depends on θ0 and ηi0, i = 1, …, n, through δi(tij) (expression (A.1)).
Generating the Ornstein–Uhlenbeck process parameters
To generate A, note that
(A.2)
Since (A.2) is not a standard distribution, we use a Metropolis step to generate A. The proposal distribution is multivariate normal, centered at the current value of A, with a variance–covariance matrix equal to the inverse of the negative Hessian of the log of (A.2) evaluated numerically at the mode. This variance–covariance matrix is computed once, conditional on the starting values for the other parameters, and is then fixed throughout the sampling scheme. To increase the acceptance rate, this variance–covariance matrix is multiplied by 5.76/p, as proposed in Gelman et al. (2004), page 306. More generally, when using a normal proposal distribution centered at the current point, Gelman et al. (2004) suggest using c²Σ as the covariance matrix of that proposal distribution. Among this class of proposal densities, the most efficient one has scale c ≈ 2.4/√d, where d is the dimension of the parameter being updated (note that 5.76 = 2.4²). The acceptance probability is
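A generic version of this Metropolis step, with a normal proposal whose covariance is scaled by 2.4²/d, can be sketched as follows. The log posterior here is a stand-in toy target, not the density (A.2), and Sigma plays the role of the inverse negative Hessian at the mode.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(a):
    """Stand-in log posterior (a standard 2-d Gaussian) for illustration;
    the log of (A.2) would replace this in the actual sampler."""
    return -0.5 * a @ a

d = 2
Sigma = np.eye(d)            # inverse negative Hessian at the mode (here: identity)
c2 = 2.4 ** 2 / d            # the Gelman et al. (2004) scaling, 5.76 / d
L = np.linalg.cholesky(c2 * Sigma)

def metropolis_step(a_cur):
    """Random-walk proposal N(a_cur, c2 * Sigma), accept with prob min(1, ratio)."""
    a_prop = a_cur + L @ rng.standard_normal(d)
    log_alpha = log_post(a_prop) - log_post(a_cur)
    if np.log(rng.uniform()) < log_alpha:
        return a_prop, True
    return a_cur, False

a = np.zeros(d)
accepted = 0
for _ in range(2000):
    a, acc = metropolis_step(a)
    accepted += acc
print(accepted / 2000)  # acceptance rate; this scaling typically lands near 0.2-0.5
```

The scaled proposal covariance is fixed throughout, matching the scheme described above where the Hessian is evaluated once at the starting values.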
The matrix C is generated by first generating the matrices L and D via a Metropolis step based on
An iterate for C is then given by C = LDL′.
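The modified Cholesky construction guarantees a symmetric positive-definite C for any unconstrained values of the free parameters, since L is unit lower triangular and D has positive diagonal entries. A minimal sketch (the parameter names are hypothetical):

```python
import numpy as np

def c_from_ldl(l_free, log_d):
    """Rebuild C = L D L' from unconstrained parameters:
    l_free fills the strictly lower triangle of a unit lower-triangular L,
    log_d holds the logs of the positive diagonal entries of D."""
    p = log_d.size
    L = np.eye(p)
    L[np.tril_indices(p, k=-1)] = l_free
    D = np.diag(np.exp(log_d))
    return L @ D @ L.T

# Example: p = 2, one free lower-triangular entry, D = diag(1, 2).
C = c_from_ldl(np.array([0.5]), np.array([0.0, np.log(2.0)]))

# C is symmetric positive definite by construction.
assert np.allclose(C, C.T)
assert np.all(np.linalg.eigvalsh(C) > 0)
```

Sampling L and D on these unconstrained scales avoids having to enforce positive definiteness of C within the Metropolis step.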
References
- Aït-Sahalia Y. Closed-form likelihood expansions for multivariate diffusions. Annals of Statistics. 2008;36:906–937.
- Baladandayuthapani V, Mallick BK, Young Hong M, Lupton JR, Turner ND, Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics. 2008;64:64–73. doi:10.1111/j.1541-0420.2007.00846.x.
- Beskos A, Papaspiliopoulos O, Roberts GO, Fearnhead P. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. Journal of the Royal Statistical Society B. 2006;68:333–382.
- Blackwell PG. Bayesian inference for Markov processes with diffusion and discrete components. Biometrika. 2003;90:613–627.
- Cassano GB, Michelini S, Shear MK, Coli E, Maser JD, Frank E. The panic-agoraphobic spectrum: A descriptive approach to the assessment and treatment of subtle symptoms. American Journal of Psychiatry. 1997;154:27–38. doi:10.1176/ajp.154.6.27.
- De la Cruz-Mesía R, Marshall G. A Bayesian approach for nonlinear regression models with continuous errors. Communications in Statistics, Theory and Methods. 2003;32:1631–1646.
- De la Cruz-Mesía R, Marshall G. Nonlinear random effects models with continuous time autoregressive errors. Statistics in Medicine. 2006;25:1471–1484. doi:10.1002/sim.2290.
- Durbán M, Harezlak J, Wand MP, Carroll RJ. Simple fitting of subject-specific curves for longitudinal data. Statistics in Medicine. 2005;24:1153–1167. doi:10.1002/sim.1991.
- Frank E, Cassano GB, Rucci P, Fagiolini A, Maggi L, Kraemer HC, Kupfer DJ, Pollock B, Bies R, Nimgaonkar V, Pilkonis P, Shear MK, Thompson WK, Grochocinski VJ, Scocco P, Buttenfield J, Forgione RN. Addressing the challenges of a cross-national investigation: lessons from the Pittsburgh–Pisa study of treatment-relevant phenotypes of unipolar depression. Clinical Trials. 2008; doi:10.1177/1740774508091965 (under revision).
- French JL, Kammann EE, Wand MP. Comment on “Semiparametric Nonlinear Mixed-effects Models and Their Application” by C. Ke and Y. Wang. Journal of the American Statistical Association. 2001;96:1285–1288.
- Gardiner CW. Handbook of Stochastic Methods for Physics, Chemistry and Natural Sciences. Springer-Verlag; Berlin: 1983.
- Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis. 2006;1:515–534.
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Second ed. Chapman & Hall/CRC; Boca Raton: 2004.
- Golightly A, Wilkinson DJ. Bayesian sequential inference for nonlinear multivariate diffusions. Statistics and Computing. 2006;16:323–338.
- Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi:10.1111/j.0006-341x.2002.00121.x.
- Jones RH. Longitudinal Data with Serial Correlation: A State-Space Approach. Chapman & Hall/CRC; Boca Raton: 1993.
- Kessler M, Rahbek A. Identification and inference for multivariate cointegrated and ergodic Gaussian diffusions. Statistical Inference for Stochastic Processes. 2004;7:137–151.
- Rosen O, Stoffer DS. Automatic estimation of multivariate spectra via smoothing splines. Biometrika. 2007;94:335–345.
- Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; Cambridge: 2003.
- Smith M, Kohn R. Nonparametric seemingly unrelated regression. Journal of Econometrics. 2000;98:257–281.
- Smith M, Kohn R. Parsimonious covariance matrix estimation for longitudinal data. Journal of the American Statistical Association. 2002;97:1141–1153.
- Sy JP, Taylor JMG, Cumberland WG. A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics. 1997;53:542–555.
- Thompson WK, Rosen O. A Bayesian model for sparse functional data. Biometrics. 2008;64:54–63. doi:10.1111/j.1541-0420.2007.00829.x. Web appendices are available at http://www.tibs.org/biometrics.
- Zhao Y, Staudenmayer J, Coull BA, Wand MP. General design Bayesian generalized linear mixed models. Statistical Science. 2006;21:35–51.
