Author manuscript; available in PMC: 2017 Sep 19.
Published in final edited form as: Comput Stat Data Anal. 2009 Apr 8;53(11):3773–3786. doi: 10.1016/j.csda.2009.03.026

A Bayesian regression model for multivariate functional data

Ori Rosen a,*, Wesley K Thompson b
PMCID: PMC5604261  NIHMSID: NIHMS125546  PMID: 28936016

Abstract

In this paper we present a model for the analysis of multivariate functional data with unequally spaced observation times that may differ among subjects. Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.

1. Introduction

The term functional data analysis describes non-parametric analyses of longitudinal data which focus on the curves themselves as the basic unit of data. Some of the goals of functional data analysis include exploring individual variation of curves from an overall mean function, and modeling the dependence of the curves on covariates. The mean function, as well as the subject-specific functions, is estimated non-parametrically. In this paper, we propose a method for analyzing multivariate functional data with unequally spaced observation times that may differ among subjects. It is assumed that all variables are observed at the same time points. Fitting a regression model with a multivariate response may be done either by fitting a separate regression for each of the response variables or by fitting a single regression with all response variables simultaneously. The latter may be advantageous if the error terms corresponding to each variable are correlated. Thus, fewer observations may be required to obtain reliable non-parametric function estimates compared to fitting each regression separately and ignoring the correlation. This has been shown to be the case in seemingly unrelated regression (see for example Smith and Kohn, 2000).

Our method is formulated as a Bayesian mixed-effects model in which the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates can be incorporated into both the fixed and the random effects. The random error term of the model is assumed to follow a first order continuous-time multivariate autoregression, also known as a multivariate Ornstein–Uhlenbeck process. For each of the response variables, both the mean and the subject-specific deviations are estimated via low-rank cubic splines using radial basis functions. Inference is performed via Markov chain Monte Carlo methods.

Our model is closest in spirit to the functional mixed effects model of Guo (2002), where the fixed and random effects are modeled by cubic smoothing splines. However, Guo’s model accommodates only a univariate response variable, and does not allow correlated error terms. It can be fit either via standard mixed effects software or by Kalman filtering. Inference and model selection are based on a generalized maximum likelihood ratio test. Baladandayuthapani et al. (2008) have proposed a Bayesian model for spatially correlated functional data analysis. The smoothing technique they use is similar to ours, but their emphasis is on spatial correlation rather than on temporal correlation. Smith and Kohn (2000) consider multivariate non-parametric regression using the seemingly unrelated regression approach. They show that if the error terms of the regression equations are correlated, better non-parametric estimates of the regression functions are obtained by accounting for this correlation compared to fitting separate regressions ignoring the correlation. It is noted that Smith and Kohn (2000) consider multivariate non-parametric regression, not functional data analysis. In functional data analysis, each individual subject has its own function which needs to be estimated for each variable. Smith and Kohn (2000) only estimate a single function for each variable.

The Ornstein–Uhlenbeck process has been used before in various contexts. Unlike most diffusion processes, its transition density is available in closed form, which results in a closed-form expression for the likelihood function. Jones (1993), Chapter 8, uses a state-space approach to parameter estimation. Sy et al. (1997) present a model for multivariate repeated measures which allows unequally spaced observations by using the multivariate integrated Ornstein–Uhlenbeck process. The fixed and random effects in their model have parametric forms. Markov chain Monte Carlo methods for inference on the Ornstein–Uhlenbeck process (univariate or multivariate) parameters have also been proposed. A recent review of estimation for discretely observed diffusion processes is given in Beskos et al. (2006). Golightly and Wilkinson (2006) discuss Bayesian inference for non-linear multivariate diffusions. A number of authors have assumed a common spacing between the observed times. Blackwell (2003) takes this common spacing to be the most frequently occurring interval between observations. De la Cruz-Mesía and Marshall (2003, 2006) discuss the univariate Ornstein–Uhlenbeck process and take the common spacing to be the average time difference between two consecutive observations.

The example for application of our methodology is taken from a recent psychiatric study comparing psychotherapy to pharmacotherapy carried out at the University of Pittsburgh and the University of Pisa, Italy (Frank et al., 2008). This study sought differential baseline predictors of response to these two forms of treatment of major depression. Here, we examine the interaction effect of treatment group with Lifetime Depressive Spectrum symptoms (LDS; Cassano et al., 1997) in 252 patients entering the study in an acutely depressive episode. Levels of depression are determined by the clinician-administered Hamilton Rating Scale for Depression (HRSD) and the Quick Inventory for Depression Self-report (QIDS). These two scales were given to patients at baseline and again roughly weekly over the course of each subject’s acutely depressive episode.

Our main contribution in this paper is a model that accommodates multivariate functional data with covariates, accounting for correlation across both variables and time by combining smoothing techniques with a multivariate Ornstein–Uhlenbeck model for the error term.

The rest of the paper is organized as follows. In Section 2, we describe our model, the prior distributions and the sampling scheme. Section 3 provides results of a simulation study. Section 4 discusses an application, and Section 5 ends with a brief discussion.

2. The model, priors and sampling scheme

2.1. The model

Suppose yi(tij) is a p × 1 vector of response variables on subject i at time tij, i = 1, …, n, j = 1, …, mi, and consider the model

$$y_i(t_{ij}) = X_{ij}\,\mu(t_{ij}) + Z_{ij}\,g_i(t_{ij}) + \delta_i(t_{ij}). \tag{1}$$

In (1), μ(t) = (μ1(t)′, …, μp(t)′)′ and gi(t) = (gi1(t)′, …, gip(t)′)′, where μk(t) = (μk1(t), …, μkr(t))′ and gik(t) = (gik1(t), …, giks(t))′ are an r × 1 vector of fixed functions and an s × 1 vector of random functions, respectively, for k = 1, …, p. Associated with μ(tij) is an r × 1 covariate vector xij, and with gi(tij) an s × 1 covariate vector zij, such that Xij = Ip ⊗ x′ij and Zij = Ip ⊗ z′ij, where Ip is a p × p identity matrix, and ⊗ denotes the Kronecker product. We have assumed here that the p response variables share the same covariates.

Before proceeding to specify the p × 1 vector of random errors, δi(tij), we give an example which is a special case of model (1). Suppose p = 2, r = 2 and s = 1, with xij taking values in {(1 0)′, (1 1)′}, and zij = 1, with corresponding functions μ1(t) = (μ11(t), μ12(t))′, μ2(t) = (μ21(t), μ22(t))′, gi1(t) and gi2(t). In this case, model (1) reduces to

$$\begin{aligned}
y_{i1}(t_{ij}) &= \mu_{11}(t_{ij}) + x_{ij2}\,\mu_{12}(t_{ij}) + g_{i1}(t_{ij}) + \delta_{i1}(t_{ij})\\
y_{i2}(t_{ij}) &= \mu_{21}(t_{ij}) + x_{ij2}\,\mu_{22}(t_{ij}) + g_{i2}(t_{ij}) + \delta_{i2}(t_{ij}),
\end{aligned}$$

where xij2, the second entry of xij, can take the values 0 or 1. In this example, there are two groups of subjects – control (xij2 = 0) and treatment (xij2 = 1). Each group has its own mean curve for each of the two variables, and individual deviations from these curves are accommodated by the random functions gi1(t) and gi2(t), i = 1, …, n.
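As a concrete check on this special case, the Kronecker construction of Xij and Zij can be sketched numerically. This is a minimal illustration with arbitrary function values, not the paper's code:

```python
import numpy as np

# Special case of model (1) with p = 2, r = 2, s = 1 for a treatment-group
# subject (x_ij2 = 1).  The function values below are arbitrary illustrations.
p = 2
x_ij = np.array([1.0, 1.0])   # (1, 1)': intercept plus treatment indicator
z_ij = np.array([1.0])        # random part has an intercept only

X_ij = np.kron(np.eye(p), x_ij.reshape(1, -1))  # p x (p*r) design for mu
Z_ij = np.kron(np.eye(p), z_ij.reshape(1, -1))  # p x (p*s) design for g_i

# mu(t) stacks (mu_11, mu_12, mu_21, mu_22)' and g_i(t) stacks (g_i1, g_i2)'.
mu_t = np.array([0.5, -0.2, 1.0, 0.3])
g_t = np.array([0.1, -0.4])

mean_i = X_ij @ mu_t + Z_ij @ g_t
# First response:  mu_11 + x_ij2*mu_12 + g_i1 = 0.5 - 0.2 + 0.1 = 0.4
# Second response: mu_21 + x_ij2*mu_22 + g_i2 = 1.0 + 0.3 - 0.4 = 0.9
```

The Kronecker products reproduce exactly the two scalar equations displayed above.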

The error term in model (1) is assumed to follow a multivariate Ornstein–Uhlenbeck process. More specifically, δi(t) satisfies the stochastic differential equation

$$d\delta_i(t) = -A\,\delta_i(t)\,dt + B\,dW_i(t),$$

where A and B are p × p matrices of full rank common to all i = 1, …, n, and Wi(t) is the p-dimensional Wiener process. Three properties of the Ornstein–Uhlenbeck process (Gardiner, 1983, pp. 110–111) which will be useful in what follows are

  1. The Ornstein–Uhlenbeck process will be stationary provided the eigenvalues of A have positive real parts.

  2. The solution Σ to the matrix equation AΣ + ΣA′ = BB′ is the stationary variance–covariance matrix of the process.

  3. In the stationary state, the covariance of δi(t) and δi(s), for s < t, is

$$\operatorname{Cov}(\delta_i(t), \delta_i(s)) = \exp\{-A(t-s)\}\,\Sigma. \tag{2}$$

Let Δtij = tij − ti,j−1 for j = 1, …, mi, where ti0 = 0. The transition density of the Ornstein–Uhlenbeck process is given by

$$p(\delta_i(t_{ij}) \mid \delta_i(t_{i,j-1}), \Delta t_{ij}) \propto |\Omega_{\Delta t_{ij}}|^{-1/2} \exp\left\{-\tfrac{1}{2}\,\gamma_{t_{ij}}'\,\Omega_{\Delta t_{ij}}^{-1}\,\gamma_{t_{ij}}\right\}, \tag{3}$$

where γtij = δi(tij) − exp(−AΔtij) δi(ti,j−1) and ΩΔtij = Σ − exp(−AΔtij) Σ exp(−A′Δtij).
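The three properties above, together with the transition covariance ΩΔt, can be illustrated numerically. The following sketch uses illustrative values for A and B (not taken from the paper) and relies on SciPy's matrix exponential and continuous Lyapunov solver:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative OU parameters: eigenvalues of A are 1.4 and 2.6, so the
# stationarity condition (positive real parts) holds.
A = np.array([[2.0, 0.6],
              [0.6, 2.0]])
B = np.array([[1.0, 0.0],
              [0.3, 1.0]])

# Property 2: the stationary covariance solves A Sigma + Sigma A' = B B'.
Sigma = solve_continuous_lyapunov(A, B @ B.T)

# Property 3: lagged covariance Cov(delta(t), delta(s)) = exp{-A(t-s)} Sigma.
lag_cov = expm(-A * 0.5) @ Sigma

# Transition covariance appearing in (3):
dt = 0.5
Omega = Sigma - expm(-A * dt) @ Sigma @ expm(-A.T * dt)
```

As Δt grows, exp(−AΔt) decays and ΩΔt approaches the stationary covariance Σ, which is the expected long-lag behavior.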

The functions μkl(t) and gikm (t), k = 1, …, p, l = 1, …, r, m = 1, …, s, i = 1, …, n, are modeled as cubic splines using low-rank radial basis functions (French et al., 2001; Ruppert et al., 2003). In Section 2.2 we review briefly non-parametric function estimation.

2.2. Non-parametric function estimation

The functions μkl(t) and gikm(t) are estimated non-parametrically. In this section we explain the basis function approach which is used in turn in Section 2.3 to estimate these functions. For simplicity, we focus on scatterplot smoothing, with observations (xi, yi), i = 1, …, n. The description in this section is based on French et al. (2001), Ruppert et al. (2003) and Crainiceanu et al. (2005). Consider the model

yi=f(xi)+εi,

where E (εi) = 0, i = 1, …, n, and f is an unknown smooth function. A linear spline basis function can be expressed as (x − κ)+ = max (0, x − κ), where κ is a knot. Any linear combination of linear spline basis functions 1, x, (x − κ1)+, …, (x − κK)+ is a piecewise linear function with knots at κ1, …, κK. The function f may thus be expressed as

$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k (x - \kappa_k)_+, \tag{4}$$

where the uks are the coefficients of the basis functions. We comment later in this section on the value of K.
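Fitting the linear-spline representation (4) amounts to ordinary least squares on the basis 1, x, (x − κ1)+, …, (x − κK)+. A minimal sketch on synthetic data (not the paper's whip-shaped example):

```python
import numpy as np

# Synthetic scatterplot data: a smooth trend plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

# Basis of model (4): intercept, linear term, and K truncated lines.
knots = np.linspace(0.1, 0.9, 9)
basis = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(0.0, x - k) for k in knots])

# Unpenalized least-squares fit of the spline coefficients.
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
fitted = basis @ coef
```

With this many knots an unpenalized fit can overfit noisier data, which motivates the penalized formulation in (6) below.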

To understand how the spline model (4) can be used for fitting a non-parametric curve to data, consider Fig. 1, which displays in the top panel whip-shaped data similar to the example in Ruppert et al. (2003). The left half of the data exhibits linear behavior while curvature is apparent on the right-hand side. The bottom panel presents the basis functions used in the spline model. In particular, the equally-spaced knots are 0.50, 0.55, 0.60, …, 0.95. Comparing the two panels of Fig. 1, it is quite easy to see that a linear combination of the basis functions in the bottom panel should be able to capture the data structure in the top panel.

Fig. 1.

Fig. 1

Top: data with fitted curve. Bottom: the basis functions.

In general, any structure can be accommodated by placing basis functions at additional knots. To automate the process, two main approaches are commonly taken. One approach is automatic knot selection which can be carried out via Bayesian variable selection. Specifically, a large number of knots are placed at either equally spaced locations or at specific percentiles of the covariate, and an indicator variable is attached to each knot (see for example Thompson and Rosen, 2008). The indicator value is 1 if a knot is to be retained at a given location or 0 if the knot should be removed from that location. In a Bayesian MCMC procedure, the indicator variables are sampled from at each iteration. The other approach is to retain all the knots but to constrain their influence. This can be accomplished by penalized spline regression or equivalently by using a linear mixed effects model formulation. In this paper, we use the latter approach in a Bayesian framework. In both approaches, the value of K is not crucial, as long as it is not too small. Typically, 30–40 knots are sufficient for medium-sized datasets. Instead of the linear spline representation (4), we use in this paper the low rank thin-plate spline representation

$$f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k \lvert x - \kappa_k\rvert^3. \tag{5}$$

Using cubic radial basis functions tends to result in a more aesthetically appealing fit, compared to that of the truncated-line basis (Fig. 2), and may lead to faster convergence of the MCMC algorithm. The penalized spline approach prevents overfitting by adding a roughness penalty. Specifically, the minimization criterion is

$$\sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 + \frac{1}{\lambda}\,\theta' D \theta, \tag{6}$$

where θ = (β0, β1, u1, …, uK)′, λ is the smoothing parameter and D is a known penalty matrix. For thin-plate splines, the matrix D is given by

$$D = \begin{pmatrix} 0_{2\times 2} & 0_{2\times K} \\ 0_{K\times 2} & \Omega_K \end{pmatrix},$$

where the (k, l)th element of ΩK is |κk − κl|^3. From the structure of the matrix D it is clear that only the uks are penalized. Let y = (y1, …, yn)′, X = [1 xi]_{1≤i≤n}, and ZK = [|xi − κ1|^3 ⋯ |xi − κK|^3]_{1≤i≤n}. Dividing (6) by σε^2 and expressing the penalty term explicitly as a function of ΩK results in

$$\frac{1}{\sigma_\varepsilon^2}\left\lVert y - X\beta - Z_K u \right\rVert^2 + \frac{1}{\lambda\sigma_\varepsilon^2}\, u' \Omega_K u, \tag{7}$$

where β = (β0, β1)′ and u = (u1, …, uK)′ are considered fixed and random parameters, respectively. The solution to (7) is equal to the best linear unbiased predictor (BLUP) in the linear mixed model

$$y = X\beta + Z_K u + \varepsilon, \qquad \operatorname{Cov}(u) = \sigma_u^2\,(\Omega_K^{-1/2})(\Omega_K^{-1/2})', \tag{8}$$

where ΩK^{1/2} is based on the singular value decomposition. Note that ΩK is not a positive definite matrix, so it is not a proper covariance matrix; however, French et al. (2001) show that the smooth fit is not affected by this fact. Let b = ΩK^{1/2} u and Z = ZK ΩK^{−1/2}; the mixed model (8) is then equivalent to

$$y = X\beta + Zb + \varepsilon, \qquad \operatorname{Cov}\begin{pmatrix} b \\ \varepsilon \end{pmatrix} = \begin{pmatrix} \sigma_b^2 I_K & 0 \\ 0 & \sigma_\varepsilon^2 I_n \end{pmatrix}. \tag{9}$$
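The reparameterization leading from (8) to (9) can be verified numerically: with b = ΩK^{1/2} u and Z = ZK ΩK^{−1/2}, the fitted smooth Zb equals ZK u. A sketch with illustrative knots and data, using the SVD-based square roots as in the text:

```python
import numpy as np

# Illustrative design: 50 points, 8 knots on [0, 1] (not the paper's values).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
knots = np.linspace(0.1, 0.9, 8)

# Omega_K[k, l] = |kappa_k - kappa_l|^3 and Z_K[i, k] = |x_i - kappa_k|^3.
Omega = np.abs(knots[:, None] - knots[None, :]) ** 3
Z_K = np.abs(x[:, None] - knots[None, :]) ** 3

# Omega_K is symmetric but indefinite, so the square root and its inverse
# are formed from the singular value decomposition, as described above.
U, svals, Vt = np.linalg.svd(Omega)
Omega_half = U @ np.diag(np.sqrt(svals)) @ Vt
Omega_neg_half = U @ np.diag(1.0 / np.sqrt(svals)) @ Vt

u = rng.normal(size=len(knots))
b = Omega_half @ u          # b = Omega_K^{1/2} u
Z = Z_K @ Omega_neg_half    # Z = Z_K Omega_K^{-1/2}
```

The point of the transformation is that b has the simple covariance σb² IK in (9) while the fitted values are unchanged.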

Fig. 2.

Fig. 2

The data with a fitted curve. Left: truncated lines basis. Right: Cubic radial basis functions.

In a Bayesian framework, prior distributions need to be placed on all the model parameters.

2.3. Estimating μkl(t) and gikm(t)

To estimate the functions μkl(t) and gikm(t), k = 1, …, p, l = 1, …, r, m = 1, …, s, i = 1, …, n, we use the basis function approach described in Section 2.2. In particular, let κ1, …, κK be K knots obtained as sample quantiles of tij, i = 1, …, n, j = 1, …, mi, and let ΛK = [|κk − κk′|^3]_{1≤k,k′≤K} be a K × K matrix. Let ϕij = (1, tij)′, ξij = (|tij − κ1|^3, …, |tij − κK|^3)′, and ψ′ij = ξ′ij ΛK^{−1/2}, where ΛK^{−1/2} is obtained via the singular value decomposition. The vectors ϕij and ψij are basis functions evaluated at tij and are used to model the linear part and the non-linear part, respectively, of the fixed and random functions. In particular, μkl(t) and gikm(t) can be evaluated at tij by

$$\mu_{kl}(t_{ij}) = \phi_{ij}'\beta_{kl} + \psi_{ij}'\nu_{kl} \qquad\text{and}\qquad g_{ikm}(t_{ij}) = \phi_{ij}'w_{ikm} + \psi_{ij}'u_{ikm} \tag{10}$$

for k = 1, …, p, l = 1, …, r, m = 1, …, s and i = 1, …, n. In (10), βkl, νkl,wikm and uikm are unknown parameter vectors.
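Building the quantile knots and evaluating one function via (10) can be sketched as follows; the pooled times, the number of knots, and the coefficient values are all illustrative assumptions:

```python
import numpy as np

# Knots as sample quantiles of the pooled observation times t_ij.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 10, 300))       # stand-in for pooled t_ij
K = 10
knots = np.quantile(t, np.linspace(0.05, 0.95, K))

# Lambda_K and its SVD-based inverse square root, as in the text.
Lambda = np.abs(knots[:, None] - knots[None, :]) ** 3
U, svals, Vt = np.linalg.svd(Lambda)
Lambda_neg_half = U @ np.diag(1.0 / np.sqrt(svals)) @ Vt

def design_row(t_ij):
    """Return (phi_ij, psi_ij) of (10) at a single time t_ij."""
    phi = np.array([1.0, t_ij])
    xi = np.abs(t_ij - knots) ** 3
    psi = xi @ Lambda_neg_half
    return phi, psi

# Evaluate mu_kl(t_ij) = phi'beta + psi'nu for illustrative coefficients.
beta = np.array([0.2, 0.1])
nu = rng.normal(0, 0.05, K)
phi, psi = design_row(5.0)
mu_val = phi @ beta + psi @ nu
```

The same pair (ϕij, ψij) is reused for every fixed and random function, so the design matrices differ only through their coefficient vectors.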

2.4. Priors on the basis function coefficients and the variance components

We place the following prior distributions on βkl, νkl, wikm and uikm, k = 1, …, p, l = 1, …, r, m = 1, …, s and i = 1, …, n.

  1. βkl ~ind N(0, σ_{βkl}^2 I2), where I2 is a 2 × 2 identity matrix, and σ_{βkl}^2 is a large known value.

  2. νkl ~ind N(0, σ_{νkl}^2 IK), where IK is a K × K identity matrix, and K is the number of knots.

  3. wikm ~ind N(0, diag(σ_{wkm0}^2, σ_{wkm1}^2)).

  4. uikm ~ind N(0, σ_{ukm}^2 IK).

Similar prior distributions on the coefficients of the basis functions were used by Durbán et al. (2005). Note that the variances of the elements of wikm differ, while those of the elements of uikm are all the same. This is merely for computational convenience, to avoid an additional K − 1 parameters. The priors on the variance components σ_{νkl}^2, k = 1, …, p, l = 1, …, r, are independent inverse gamma distributions with densities

$$p(\sigma_{\nu_{kl}}^2) \propto (\sigma_{\nu_{kl}}^2)^{-(a_1+1)} \exp\left(-b_1/\sigma_{\nu_{kl}}^2\right),$$

where a1 and b1 are known small values reflecting vague knowledge of σ_{νkl}^2. The priors on σ_{wkm0}^2, σ_{wkm1}^2 and σ_{ukm}^2 are similar inverse gamma distributions. Recently, a number of authors (see for example Gelman, 2006) have proposed alternative prior distributions for variance components which may exhibit superior behavior to that of inverse gamma distributions. However, Zhao et al. (2006) reported good performance of inverse gamma priors in the case of non-parametric regression, provided the hyperparameters are not too small. Specifically, hyperparameter values of 0.01 worked well, whereas values of 0.001 behaved erratically.
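A normal likelihood with an inverse gamma prior yields the standard conjugate full conditional for the variance (this update is a textbook result rather than something spelled out in this section): with νkl ~ N(0, σ² IK) and σ² ~ IG(a1, b1), the full conditional is IG(a1 + K/2, b1 + ν′ν/2). A sketch of one draw, with illustrative values:

```python
import numpy as np

# Conjugate inverse gamma update for a spline variance component.
rng = np.random.default_rng(3)
a1 = b1 = 0.01                  # hyperparameters of the size recommended above
K = 30
nu = rng.normal(0, 0.5, K)      # stand-in for the current coefficient vector

shape = a1 + K / 2.0            # posterior shape
rate = b1 + 0.5 * nu @ nu       # posterior rate
# An IG(shape, rate) draw is the reciprocal of a Gamma(shape, scale=1/rate).
sigma2_draw = 1.0 / rng.gamma(shape, 1.0 / rate)
```

This is the form of the variance-component updates used in step 4 of the sampling scheme in Section 2.6.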

2.5. Priors on the Ornstein–Uhlenbeck parameters

The Ornstein–Uhlenbeck process parameters are the matrix A and the matrix C = BB′. Both matrices consist of parameters which are constrained to satisfy certain conditions. In particular, as mentioned in Section 2, the stationarity condition requires the real parts of the eigenvalues of A to be positive. Also, the matrix C is required to be symmetric and positive definite. Imposing the constraints directly on the elements of these matrices would be difficult. Instead, we first express each of these matrices in an appropriate decomposition and then place prior distributions on the parameters of the decomposition factors. This is a much easier task, as the factor parameters are either unconstrained or constrained to be non-negative. To place a prior on A, we express it as A = SΨS^{−1}, where S is a matrix of linearly independent eigenvectors, and Ψ is a diagonal matrix of real positive eigenvalues. This parameterization, used also by Sy et al. (1997) for the bivariate Ornstein–Uhlenbeck process, satisfies the stationarity condition mentioned above for the Ornstein–Uhlenbeck process. Aït-Sahalia (2008) discusses identifiability related to A and expresses it as a lower triangular matrix with positive diagonal elements. Kessler and Rahbek (2004) discuss identifiability issues in the case of equidistant observation times. The matrix S is parameterized as S = (sij), i, j = 1, …, p, with unit diagonal elements. Independent N(0, σ_a^2) priors are placed on the off-diagonal elements of S, and on the logarithms of the diagonal elements of Ψ.

The matrix C is symmetric and positive definite. To place priors on its elements which satisfy the symmetry and positive definiteness, we first express the matrix C as a modified Cholesky factorization, C = LDL′, where L is unit lower triangular, and D is diagonal. This approach was used for example in Smith and Kohn (2002) and in Rosen and Stoffer (2007). The emphasis of Rosen and Stoffer (2007) is on estimation in the frequency domain for multivariate time series observed at equally spaced time points. The priors on the off-diagonal elements of L are taken to be independent N(0, σ_L^2) with a fixed large value of σ_L^2. The priors placed on log(Di), where Di is the ith diagonal element of D, are independent N(0, σ_D^2) with a fixed large value of σ_D^2.
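The point of both decompositions is that unconstrained real numbers map to matrices satisfying the constraints automatically. A minimal sketch for p = 2, with hypothetical parameter values:

```python
import numpy as np

def make_A(s21, s12, log_psi):
    """A = S Psi S^{-1}: S has unit diagonal and free off-diagonals,
    Psi holds eigenvalues forced positive via the exponential."""
    S = np.array([[1.0, s12],
                  [s21, 1.0]])
    Psi = np.diag(np.exp(log_psi))
    return S @ Psi @ np.linalg.inv(S)

def make_C(l21, log_d):
    """C = L D L': L unit lower triangular, D positive diagonal,
    so C is symmetric positive definite by construction."""
    L = np.array([[1.0, 0.0],
                  [l21, 1.0]])
    D = np.diag(np.exp(log_d))
    return L @ D @ L.T

# Arbitrary unconstrained inputs always yield valid A and C.
A = make_A(0.3, -0.2, np.array([0.1, 0.7]))
C = make_C(0.5, np.array([-0.3, 0.2]))
```

Priors are then placed on the unconstrained factor parameters (s21, s12, the log eigenvalues, l21, and the log diagonal of D) rather than on the elements of A and C directly.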

2.6. The sampling scheme

Let θk = (β′k1, ν′k1, …, β′kr, ν′kr)′ for k = 1, …, p, and let θ = (θ′1, …, θ′p)′. Similarly, let ηik = (w′ik1, u′ik1, …, w′iks, u′iks)′ for k = 1, …, p, and ηi = (η′i1, …, η′ip)′ for i = 1, …, n. The sampling scheme consists of the following stages. More details are given in the Appendix.

  1. Initialize θ, ηi, i = 1, …, n, and the variance components by fitting p separate mixed effects models, one for each k = 1, …, p. Initialize A and C by numerically maximizing the log conditional joint posterior distribution of A and C.

  2. Generate θ from its full conditional posterior distribution, which is multivariate normal.

  3. For each i, i = 1, …, n, generate ηi from its full conditional posterior distribution, which is multivariate normal.

  4. For k = 1, …, p, l = 1, …, r, m = 1, …, s, generate the variance components συkl2, σwkm02, σwkm12 and σukm2 from their full conditional posterior distributions, which are inverse gamma.

  5. Generate A from its full conditional posterior distribution. Since this distribution is not standard, we use a Metropolis step with a multivariate normal proposal density centered at the current value of A. The variance–covariance matrix of this normal proposal is based on the inverse of the estimated negative Hessian of the log conditional posterior distribution.

  6. Generate C from its full conditional posterior distribution using a Metropolis step.
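The Metropolis updates in steps 5 and 6 can be sketched generically. Here `log_post` is a placeholder target (a standard normal), standing in for the actual conditional posterior of the unconstrained parameters behind A; the proposal covariance is fixed rather than derived from a Hessian:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_post(theta):
    # Placeholder log target; the paper's conditional posterior goes here.
    return -0.5 * theta @ theta

def metropolis_step(theta, prop_cov):
    """One Metropolis update with a normal proposal centered at theta."""
    prop = rng.multivariate_normal(theta, prop_cov)
    log_ratio = log_post(prop) - log_post(theta)
    if np.log(rng.uniform()) < log_ratio:
        return prop, True      # accept the proposal
    return theta, False        # stay at the current value

theta = np.zeros(3)
prop_cov = 0.5 * np.eye(3)     # in the paper, from the inverse neg. Hessian
accepts = 0
for _ in range(2000):
    theta, acc = metropolis_step(theta, prop_cov)
    accepts += acc
```

Because the proposal is symmetric, the acceptance ratio reduces to the difference of log posteriors, as used above.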

3. Simulations

In this section, we explore by simulation the potential improvement in curve fitting when modeling the correlation structure of multivariate functional data rather than ignoring it. Specifically, we examine improvements in mean squared error for the individual subject-level functions. For this purpose we generated 100 datasets, with each dataset consisting of observations, without covariates, on n = 50 subjects. The number of observations per subject is mi = 2 + wi, where wi is a Poisson random variable with expectation 5, giving an average of 7 observation times per subject. The observation times themselves were independently generated from a uniform distribution on the interval [0, mi]. For each subject, there are p = 3 response variables with overall subject mean functions chosen to represent a variety of possible relationships. The first true mean function is μ1(t) = 7 sin(−.5t), which exhibits low-frequency variation on the range of t. Note that the second subscript on μ1(t) was dropped, since there are no covariates in our simulation setting. The second true mean function is μ2(t) = 10ϕ(t; 1.5, .3) + 6ϕ(t; 4, .6), where ϕ(t; a, b) is a univariate normal density with mean a and standard deviation b. The third true mean function is μ3(t) = 2 sin(−t), which has higher-frequency oscillations on the range of t. Let fik(t) = μk(t) + gik(t), k = 1, 2, 3, be the individual subject functions, where we have again dropped the covariate subscript. In particular,

$$\begin{aligned}
f_{i1}(t) &= a_{i1} \sin(-.5t) + a_{i2}\\
f_{i2}(t) &= b_{i1}\,\phi(t; 1.5, .3) + b_{i2}\,\phi(t; 4, .6)\\
f_{i3}(t) &= c_{i1} \sin(-t) + c_{i2},
\end{aligned} \tag{11}$$

where ai1 ~ N (7, .5), ai2 ~ N (0, .2), bi1 ~ N(10, .25), bi2 ~ N (6, .25), ci1 ~ N (2, .5) and ci2 ~ N (0, .2). Here, N (a, b) indicates the univariate normal distribution with mean a and standard deviation b. The observations yi(tij) were obtained by drawing the random coefficients ai1, ai2, bi1, bi2, ci1, ci2, evaluating the equations in (11) at time tij and adding δi(tij), which was in turn generated according to a multivariate Ornstein−Uhlenbeck error process with parameter values

$$A = \begin{pmatrix} 2 & 0.6 & 0.6 \\ 0.6 & 2 & 0.6 \\ 0 & 0 & 2 \end{pmatrix} \qquad\text{and}\qquad C = \begin{pmatrix} 15 & 0 & 0 \\ 0 & 15 & 0 \\ 0 & 0 & 15 \end{pmatrix}.$$

These settings result in fairly noisy data with cross-correlations ranging from .25 to .47 among the three variables when evaluated at the average spacing Δ̄ti = (1/mi) Σ_{j=1}^{mi} (tij − ti,j−1) ≈ 1. The cross-correlation matrix is obtained from the cross-covariance (2), evaluated at Δ̄ti. Plots of these mean functions along with one randomly generated dataset can be seen in Fig. 3.
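One subject's data under this design can be generated exactly, stepping the OU error through its transition density (3); the sketch below follows the stated parameter values but is our reconstruction, not the authors' code:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(5)
A = np.array([[2.0, 0.6, 0.6],
              [0.6, 2.0, 0.6],
              [0.0, 0.0, 2.0]])
C = 15.0 * np.eye(3)
Sigma = solve_continuous_lyapunov(A, C)   # stationary covariance

# Observation times: m_i = 2 + Poisson(5), uniform on [0, m_i].
m_i = 2 + rng.poisson(5)
t = np.sort(rng.uniform(0, m_i, m_i))

# Exact OU sampling: stationary start, then one transition per time gap.
delta = np.zeros((m_i, 3))
delta[0] = rng.multivariate_normal(np.zeros(3), Sigma)
for j in range(1, m_i):
    dt = t[j] - t[j - 1]
    Phi = expm(-A * dt)
    Omega = Sigma - Phi @ Sigma @ Phi.T
    Omega = (Omega + Omega.T) / 2         # guard against numerical asymmetry
    delta[j] = rng.multivariate_normal(Phi @ delta[j - 1], Omega)

def phi_dens(t, a, b):
    # Normal density with mean a and standard deviation b, as in the text.
    return np.exp(-0.5 * ((t - a) / b) ** 2) / (b * np.sqrt(2 * np.pi))

# Subject functions (11) with random coefficients, plus the OU errors.
a1, a2 = rng.normal(7, .5), rng.normal(0, .2)
b1, b2 = rng.normal(10, .25), rng.normal(6, .25)
c1, c2 = rng.normal(2, .5), rng.normal(0, .2)
f = np.column_stack([a1 * np.sin(-.5 * t) + a2,
                     b1 * phi_dens(t, 1.5, .3) + b2 * phi_dens(t, 4, .6),
                     c1 * np.sin(-t) + c2])
y = f + delta
```

Repeating this over n = 50 subjects and 100 replicates reproduces the simulation design described above.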

Fig. 3.

Fig. 3

Mean functions (heavy lines) and data from one randomly generated dataset (light lines) with 50 individual subjects.

Our model was fitted four times for each dataset, once for each univariate outcome separately (thereby ignoring across-variable correlation) and then to all three outcomes simultaneously. The sampling scheme was run for 10,000 iterations per dataset, with a burn-in period of 5000. Median estimates Â and Ĉ of the A and C matrices across all 100 multivariate fits were given by

$$\hat{A} = \begin{pmatrix} 2.91 & 0.72 & 1.31 \\ 0.38 & 2.29 & 0.63 \\ 0.02 & 0.07 & 3.03 \end{pmatrix} \qquad\text{and}\qquad \hat{C} = \begin{pmatrix} 18.82 & 0.81 & 1.67 \\ 0.81 & 16.81 & 0.19 \\ 1.67 & 0.19 & 19.59 \end{pmatrix}.$$

To assess the quality of the resulting estimates of the three mean functions, we calculated the average squared difference between the function estimates and the true mean functions at the unique observation times, t1 < ⋯ < tM. For the kth function this was computed by

$$\mathrm{MSE}_k^{(1)} = \frac{1}{M} \sum_{m=1}^{M} \left(\hat{\mu}_k(t_m) - \mu_k(t_m)\right)^2,$$

where μ̂k(·) is the fitted mean function for the kth response variable. This was done for all three univariate fits, as well as for the joint trivariate fit. Boxplots of the resulting MSE_k^{(1)}, k = 1, 2, 3, are displayed in Fig. 4. These boxplots show that the separate univariate fits and the joint multivariate fit resulted in little difference in the mean squared error for the first variable, but that the multivariate fit gave lower mean squared error for the other two. For the univariate fits, the median estimates of MSE_k^{(1)} were .244, 1.19, and .206 for k = 1, 2, 3, respectively. For the multivariate fits, the corresponding median estimates were .225, 1.03, and .188. Paired t-tests of the log MSE_k^{(1)} values showed no significant difference in log MSE_1^{(1)} between the univariate and multivariate fits (t = 1.0, p = 0.16), but log MSE_2^{(1)} and log MSE_3^{(1)} were significantly lower for the multivariate fits (t = 2.3, p = 0.01 and t = 3.4, p < .0005, respectively). To assess the quality of the fitted individual subject functional estimates, we calculated the average squared difference between the true individual subject functions and their estimates from the model at the measured observation times. For the kth variable, this mean squared error for the individual subject functions was computed by

$$\mathrm{MSE}_k^{(2)} = \frac{1}{m} \sum_{i=1}^{n} \sum_{j=1}^{m_i} \left(\hat{f}_{ik}(t_{ij}) - f_{ik}(t_{ij})\right)^2,$$

where m = Σ_{i=1}^{n} mi and f̂ik(·) is the fitted function for the ith subject’s kth response. Boxplots of the resulting mean squared errors for the individual subject functions are displayed in Fig. 5. The mean squared errors for the subject functions show a pattern similar to that for the overall mean. The median mean squared errors for each of the three outcome variables for the univariate fits were .702, 3.02, and .445, respectively. The median values for the corresponding multivariate fits were .676, 2.584, and .351. Thus, there was a 15%–20% reduction in mean squared error for the individual functions for the last two variables when accounting for the multivariate covariance among them. Again, there was no significant difference in log MSE_1^{(2)} between the multivariate and univariate fits (t = 0.33, p = 0.39), but the multivariate fits had significantly lower log MSE_k^{(2)} for k = 2, 3 (t = 4.6, p < 0.0005 and t = 9.2, p < 0.0005, respectively). One possible reason why the first variable exhibits no improvement in mean squared error is that the low-frequency variation of the corresponding mean function renders it easier to fit, so borrowing information across variables is less important.
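The two error summaries can be sketched directly from their definitions; the inputs below are synthetic stand-ins for the true and fitted functions, not the simulation output:

```python
import numpy as np

rng = np.random.default_rng(6)

# MSE^(1): mean function vs estimate on the unique time grid t_1 < ... < t_M.
t_grid = np.linspace(0, 7, 40)
mu_true = 7 * np.sin(-.5 * t_grid)
mu_hat = mu_true + rng.normal(0, 0.3, t_grid.size)   # stand-in estimate
mse1 = np.mean((mu_hat - mu_true) ** 2)

# MSE^(2): subject-level fits pooled over all m = sum_i m_i observations.
n = 5
m_per_subject = [rng.integers(4, 9) for _ in range(n)]
sq_errs = []
for m_i in m_per_subject:
    f_true = rng.normal(0, 1, m_i)                   # stand-in true values
    f_hat = f_true + rng.normal(0, 0.3, m_i)         # stand-in fitted values
    sq_errs.append((f_hat - f_true) ** 2)
mse2 = np.concatenate(sq_errs).mean()                # divides by m
```

Pooling the squared errors before averaging is what makes the denominator m = Σ mi rather than n, matching the definition of MSE^(2).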

Fig. 4.

Fig. 4

Boxplots of MSE_k^{(1)} for the posterior means μ̂k(·), k = 1, 2, 3, based on 100 simulated samples. Univariate and multivariate fits are denoted by U and M, respectively.

Fig. 5.

Fig. 5

Boxplots of MSE_k^{(2)} for the posterior means f̂ik(·), k = 1, 2, 3, i = 1, …, 25, based on 100 simulated samples. Univariate and multivariate fits are denoted by U and M, respectively.

4. Application

As described in Section 1, we apply our methodology to the results of a randomized clinical trial conducted at the University of Pittsburgh and the University of Pisa, Italy (Frank et al., 2008). Despite decades of clinical trial experience in major depression, there is only limited understanding of which patients with major depressive disorder respond better to psychotherapy or to pharmacotherapy. This clinical trial compares the effects of psychotherapy (129 subjects) vs. pharmacotherapy (123 subjects). For clarity, Fig. 6 shows the trajectories corresponding to 25 subjects only. We limit the current analysis to the first 12 weeks after baseline, at which point about 95% of the subjects were still on study. Our methodology, which allows for non-linear estimation of time courses, can accommodate the subject trajectories, which are clearly non-linear. In addition, our methodology accounts for the possibility of non-linear effects of baseline covariates over time. Of particular interest is the identification of baseline subject characteristics which differentially predict treatment response in the two groups. The treatment response was change over time in two depression scales, the clinician-administered Hamilton Rating Scale for Depression (HRSD) and the Quick Inventory for Depression Self-report (QIDS). These measures were collected more than once per week on average, though there was variation both within and between patients in the actual timing and number of measurements, with a mean of 11.2 measurement times per subject over the course of treatment. The HRSD scores ranged from 0 to 31 with a median of 10, and the QIDS scores ranged from 0 to 26 with a median of 6. In both measures, higher values indicate more depressive symptoms. Both measures were log transformed and standardized before running the analyses.
A Lifetime Depression Spectrum (LDS) score was assessed on each patient at baseline; this gives an omnibus measure of depressive symptomatology over a patient’s lifetime (Cassano et al., 1997). In this example, we considered the LDS score to be a pre-treatment covariate with potentially differential effects on treatment outcomes for the two treatment groups. To explore this possibility, treatment group, LDS score, and their interaction were entered as time-varying fixed effects into our model with responses HRSD and QIDS entered as bivariate dependent variables. A time-varying random intercept was also included in the model. In the notation of Section 2, xij = (xij1, xij2, xij3, xij4)′ and zij = 1, where xij1 = 1, xij2 is a group indicator (equal to 1 if subject i received psychotherapy and zero otherwise), xij3 is the ith subject’s LDS score, and xij4 = xij2xij3. The sampling scheme described in Section 2.6 was run for 10,000 iterations with a burn-in period of 5000 iterations. The estimated parameters of the Ornstein–Uhlenbeck process are

$$\hat{A} = \begin{pmatrix} 5.75 & 3.88 \\ 4.40 & 7.04 \end{pmatrix} \qquad\text{and}\qquad \hat{C} = \begin{pmatrix} 3.03 & 0.05 \\ 0.05 & 3.50 \end{pmatrix}.$$

Fig. 6.

Fig. 6

HRSD subject trajectories (left panel) and QIDS subject trajectories (right panel) for acutely depressed subjects. For clarity, only 25 subject trajectories are displayed. Scores are standardized to have zero mean and unit variance. Trajectories were truncated at 12 weeks.

The estimated time-varying functional coefficients μ̂k(t) = (μ̂k1(t), μ̂k2(t), μ̂k3(t), μ̂k4(t))′, k = 1, 2, for the HRSD and QIDS responses are plotted in Figs. 7 and 8, respectively. Solid lines correspond to the multivariate fits; for comparison, the univariate fits appear as dashed lines. As can be seen in these plots, the multivariate fits show little evidence for a treatment group effect on HRSD, but evidence for a slight difference in the QIDS at around 3 weeks. However, there is a significant effect of LDS score on both outcomes, such that a higher lifetime depression spectrum score predicts worse outcomes over roughly the first eight weeks. The interaction term is insignificant for HRSD and marginally significant in the 2–8 week time period for the QIDS responses. The effect of the interaction is that LDS score is less predictive of poor QIDS response in the psychotherapy group than in the pharmacotherapy group. In general, the pointwise 95% credible intervals are wider for the univariate fits. While the functional coefficient estimates were substantially similar, the interaction coefficients in both univariate fits were not significant, i.e., their pointwise 95% credible intervals contained zero for the entire time course.

Fig. 7.

Fig. 7

Time-varying functional coefficients for HRSD responses. The solid lines are μ̂1l(t), 1 ≤ l ≤ 4, and their corresponding pointwise 95% credible intervals. The dashed lines are the analogous estimates and credible intervals corresponding to the univariate fits. Upper left panel: μ̂11(t). Upper right panel: μ̂12(t). Lower left panel: μ̂13(t). Lower right panel: μ̂14(t).

Fig. 8. Time-varying functional coefficients for QIDS responses. The solid lines are μ̂2l(t), 1 ≤ l ≤ 4, and their corresponding pointwise 95% credible intervals. The dashed lines are the analogous estimates and credible intervals from the univariate fits. Upper left panel: μ̂21(t). Upper right panel: μ̂22(t). Lower left panel: μ̂23(t). Lower right panel: μ̂24(t).

5. Discussion

In this paper we have devised a regression model appropriate for multivariate functional responses with unequally-spaced observation times. Efficiency may be gained by fitting a single regression with all response variables simultaneously, as opposed to fitting regression models for each functional response separately. This is especially true if the error terms corresponding to each variable are correlated. In our formulation, the random error terms of the model were assumed to follow a multivariate Ornstein–Uhlenbeck process. Through this formulation we were able to extend the seemingly unrelated regression framework to the unequally-spaced multivariate functional data context.

The model we proposed uses a Bayesian mixed-effects approach, where the fixed part corresponds to the mean functions, and the random part corresponds to individual deviations from these mean functions. Covariates were allowed as either fixed or random effects. For each of the response variables, both the mean and the subject-specific deviations were estimated via low-rank cubic splines using radial basis functions. Thus both the mean and the subject-specific deviations from the mean were allowed to vary smoothly as functions of time. Inference was performed via Markov chain Monte Carlo methods.

We demonstrated the improvement in efficiency that is possible with this model in simulations showing that the mean squared error is lower for the full multivariate algorithm than for fitting each of the functional responses univariately, thereby ignoring the across-variable correlation. This is especially important when the mean functions are wiggly, so that borrowing information across multiple responses matters more.

Finally, the utility of this methodology was demonstrated by application to a real psychiatric dataset examining the relationship among multiple depression measures over time in a clinical trial. Here, the multivariate approach resulted in narrower pointwise posterior credible bands.

We plan future research to extend the multivariate functional model to mixed discrete and continuous functional outcome data. We also plan to develop methods for the joint analysis of multivariate functional data and time-to-event data.

Acknowledgments

We thank the referees for their helpful comments, which greatly improved the paper. We also thank Dr. Ellen Frank, University of Pittsburgh, Department of Psychiatry, for use of the example data. The first author was supported in part by RCMI grant 5G12 RR008124 from the NIH and by NSF grants DMS-0706752 and DMS-0804140. The second author was supported by NIH grant K25 MH076981-01 and NSF grant DMS-0904825.

Appendix

Starting values for θ, the ηi and σ2

Let $D^{r}_{\phi_{ij}} = I_r \otimes \phi_{ij}'$, where $\otimes$ denotes the Kronecker product and $I_r$ is an $r \times r$ identity matrix. Define $D^{r}_{\psi_{ij}}$, $D^{s}_{\phi_{ij}}$ and $D^{s}_{\psi_{ij}}$ similarly.

Let

$$X_0=\begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix}\quad\text{and}\quad Z_0=\left(\begin{matrix}Z_{\upsilon_1}\\ Z_{\upsilon_2}\\ \vdots\\ Z_{\upsilon_n}\end{matrix}\;\middle|\;\operatorname{blockdiag}(Z_{w_i})_{i=1,\dots,n}\;\middle|\;\operatorname{blockdiag}(Z_{u_i})_{i=1,\dots,n}\right),$$

where

$$X_i=\begin{pmatrix}x_{i1}'\otimes D^{r}_{\phi_{i1}}\\ \vdots\\ x_{im_i}'\otimes D^{r}_{\phi_{im_i}}\end{pmatrix},\quad Z_{\upsilon_i}=\begin{pmatrix}x_{i1}'\otimes D^{r}_{\psi_{i1}}\\ \vdots\\ x_{im_i}'\otimes D^{r}_{\psi_{im_i}}\end{pmatrix},\quad Z_{w_i}=\begin{pmatrix}z_{i1}'\otimes D^{s}_{\phi_{i1}}\\ \vdots\\ z_{im_i}'\otimes D^{s}_{\phi_{im_i}}\end{pmatrix}\quad\text{and}\quad Z_{u_i}=\begin{pmatrix}z_{i1}'\otimes D^{s}_{\psi_{i1}}\\ \vdots\\ z_{im_i}'\otimes D^{s}_{\psi_{im_i}}\end{pmatrix}.$$
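These design matrices are assembled from Kronecker products and are mechanical to build. As a small illustration (the dimensions and numerical values below are ours, not from the paper), the following sketch constructs one row block $x_{ij}'\otimes D^{r}_{\phi_{ij}}$ of $X_i$:

```python
import numpy as np

r, q = 2, 4                     # r covariates; q basis evaluations (illustrative)
x_ij = np.array([1.0, 0.5])     # covariate vector for one observation
phi_ij = np.array([1.0, 0.2, 0.04, 0.008])  # basis evaluated at t_ij

# D^r_{phi_ij} = I_r (kron) phi_ij', of shape (r, r*q).
D_r_phi = np.kron(np.eye(r), phi_ij)

# One row block of X_i: x_ij' (kron) D^r_{phi_ij}, of shape (r, r^2 * q).
row_block = np.kron(x_ij, D_r_phi)
```

Horizontally, the row block is just $(x_{ij1} D^{r}_{\phi_{ij}} \mid x_{ij2} D^{r}_{\phi_{ij}})$, which makes the structure easy to verify.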

To obtain starting values for θ, {ηi}i=1,…,n and σ2, we fit the mixed effects model

$$y_k=X_0\beta_k^0+Z_0\upsilon_k^0+\varepsilon_k,$$

for k = 1, …, p, where $\beta_k^0=(\beta_{k1}',\dots,\beta_{kr}')'$, $\upsilon_k^0=(\upsilon_{k1}',\dots,\upsilon_{kr}',\{w_{ik1}',\dots,w_{iks}'\}_{i=1,\dots,n},\{u_{ik1}',\dots,u_{iks}'\}_{i=1,\dots,n})'$ and

$$\operatorname{cov}\left\{\begin{pmatrix}\upsilon_k^0\\ \varepsilon_k\end{pmatrix}\right\}=\begin{pmatrix}\sigma_0^2 I_{Kr} & 0_{Kr\times 2sn} & 0_{Kr\times Ksn} & 0_{Kr\times N}\\ 0_{2sn\times Kr} & \sigma_{kw0}^2 I_{2sn} & 0_{2sn\times Ksn} & 0_{2sn\times N}\\ 0_{Ksn\times Kr} & 0_{Ksn\times 2sn} & \sigma_{ku0}^2 I_{Ksn} & 0_{Ksn\times N}\\ 0_{N\times Kr} & 0_{N\times 2sn} & 0_{N\times Ksn} & \sigma^2 I_{N}\end{pmatrix},$$

where $N=\sum_{i=1}^n m_i$ is the total number of observations.

Generating θ

Let $\Gamma^{r}_{ij}=I_r\otimes(\phi_{ij}',\psi_{ij}')$, $\Gamma^{s}_{ij}=I_s\otimes(\phi_{ij}',\psi_{ij}')$, $\chi_{ij}=I_p\otimes(x_{ij}'\otimes\Gamma^{r}_{ij})$ and $E_{ij}=I_p\otimes(z_{ij}'\otimes\Gamma^{s}_{ij})$. The error term in model (1) can be expressed as

$$\delta_i(t_{ij})=y_i(t_{ij})-\chi_{ij}\theta-E_{ij}\eta_i, \tag{A.1}$$

where the vectors θ and ηi are as defined at the beginning of Section 2.6. Plugging (A.1) into $\gamma_{t_{ij}}=\delta_i(t_{ij})-\exp(-A\Delta t_{ij})\,\delta_i(t_{i,j-1})$ gives

$$\gamma_{t_{ij}}=\zeta_i(t_{ij},t_{i,j-1})-\chi_i(t_{ij},t_{i,j-1})\,\theta,$$

where

$$\zeta_i(t_{ij},t_{i,j-1})=y_i(t_{ij})-E_{ij}\eta_i-\exp(-A\Delta t_{ij})\left[y_i(t_{i,j-1})-E_{i,j-1}\eta_i\right]$$

and

$$\chi_i(t_{ij},t_{i,j-1})=\chi_{ij}-\exp(-A\Delta t_{ij})\,\chi_{i,j-1}.$$

Let $G=\operatorname{blockdiag}(\sigma_{\beta_{11}}^{-2}I_2,\sigma_{\upsilon_{11}}^{-2}I_K,\dots,\sigma_{\beta_{1r}}^{-2}I_2,\sigma_{\upsilon_{1r}}^{-2}I_K,\dots,\sigma_{\beta_{p1}}^{-2}I_2,\sigma_{\upsilon_{p1}}^{-2}I_K,\dots,\sigma_{\beta_{pr}}^{-2}I_2,\sigma_{\upsilon_{pr}}^{-2}I_K)$. Then,

$$[\theta\mid\eta,A,C,\sigma^2,y]\sim N(\mu_\theta,\Sigma_\theta),$$

where $\sigma^2=(\sigma_{\beta_{kl}}^2,\sigma_{\upsilon_{kl}}^2,\sigma_{w_{km0}}^2,\sigma_{w_{km1}}^2,\sigma_{u_{km}}^2)$ for k = 1, …, p, l = 1, …, r, m = 1, …, s,

$$\Sigma_\theta=\left[G+\sum_{i=1}^n\sum_{j=1}^{m_i}\chi_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,\chi_i(t_{ij},t_{i,j-1})\right]^{-1}$$

and

$$\mu_\theta'=\left[\sum_{i=1}^n\sum_{j=1}^{m_i}\zeta_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,\chi_i(t_{ij},t_{i,j-1})\right]\Sigma_\theta.$$

In the expression for $\Omega_{\Delta t_{ij}}$, when p = 2, the stationary variance of the Ornstein–Uhlenbeck process is given by

$$\Sigma=\frac{\det(A)\,C+[A-\operatorname{tr}(A)I]\,C\,[A-\operatorname{tr}(A)I]'}{2\operatorname{tr}(A)\det(A)};$$

see Gardiner (1983). For p = 1, this reduces to Σ = C/(2A). For p > 2, Σ can be obtained numerically by Matlab’s lyap function, for example.
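As a cross-check of the two routes to Σ, the following sketch (in Python rather than Matlab; it assumes, as lyap does, that the stationary covariance solves the continuous Lyapunov equation AΣ + ΣA′ = C) compares the p = 2 closed form above with a numerical solution at the estimates Â and Ĉ from Section 4:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def ou_stationary_cov_closed_form(A, C):
    """Closed-form stationary covariance of a bivariate OU process
    (Gardiner, 1983); valid for p = 2."""
    trA, detA = np.trace(A), np.linalg.det(A)
    B = A - trA * np.eye(2)
    return (detA * C + B @ C @ B.T) / (2.0 * trA * detA)

# Estimated OU parameters from the depression example (Section 4).
A_hat = np.array([[5.75, 3.88], [4.40, 7.04]])
C_hat = np.array([[3.03, 0.05], [0.05, 3.50]])

# Numerical solution of A @ S + S @ A.T = C (the role lyap plays in Matlab).
Sigma_num = solve_continuous_lyapunov(A_hat, C_hat)
Sigma_cf = ou_stationary_cov_closed_form(A_hat, C_hat)
```

The numerical route generalizes directly to p > 2, where no simple closed form is available.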

Generating ηi

Let

$$q_i(t_{ij},t_{i,j-1})=y_i(t_{ij})-\chi_{ij}\theta-\exp(-A\Delta t_{ij})\left[y_i(t_{i,j-1})-\chi_{i,j-1}\theta\right]$$

and

$$G_{wu}=\operatorname{blockdiag}(\Sigma_{w_{11}}^{-1},\sigma_{u_{11}}^{-2}I_K,\dots,\Sigma_{w_{1s}}^{-1},\sigma_{u_{1s}}^{-2}I_K,\dots,\Sigma_{w_{p1}}^{-1},\sigma_{u_{p1}}^{-2}I_K,\dots,\Sigma_{w_{ps}}^{-1},\sigma_{u_{ps}}^{-2}I_K),$$

where $\Sigma_{w_{km}}=\operatorname{diag}(\sigma_{w_{km0}}^2,\sigma_{w_{km1}}^2)$. Then

$$[\eta_i\mid\theta,A,C,\sigma^2,y]\sim N(\mu_{\eta_i},\Sigma_{\eta_i}),$$

where

$$\Sigma_{\eta_i}=\left[\sum_{j=1}^{m_i}E_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,E_i(t_{ij},t_{i,j-1})+G_{wu}\right]^{-1}$$

and

$$\mu_{\eta_i}'=\left[\sum_{j=1}^{m_i}q_i(t_{ij},t_{i,j-1})'\,\Omega_{\Delta t_{ij}}^{-1}\,E_i(t_{ij},t_{i,j-1})\right]\Sigma_{\eta_i}.$$

Generating σ2

$$\sigma_{\upsilon_{kl}}^2\mid\upsilon_{kl}\sim \mathrm{IG}\!\left(\frac{K}{2}+a_1,\; b_1+\frac12\,\upsilon_{kl}'\upsilon_{kl}\right)$$

for k = 1, …, p, l = 1, …, r.

$$\sigma_{w_{km0}}^2\mid\{w_{ikm0}\}_{i=1,\dots,n}\sim \mathrm{IG}\!\left(\frac{n}{2}+a_2,\; b_2+\frac12\sum_{i=1}^n w_{ikm0}^2\right)$$

$$\sigma_{w_{km1}}^2\mid\{w_{ikm1}\}_{i=1,\dots,n}\sim \mathrm{IG}\!\left(\frac{n}{2}+a_3,\; b_3+\frac12\sum_{i=1}^n w_{ikm1}^2\right)$$

$$\sigma_{u_{km}}^2\mid\{u_{ikm}\}_{i=1,\dots,n}\sim \mathrm{IG}\!\left(\frac{nK}{2}+a_4,\; b_4+\frac12\sum_{i=1}^n u_{ikm}'u_{ikm}\right)$$

for k = 1, …, p, m = 1, …, s.
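Each of these inverse-gamma conditionals can be sampled directly by inverting a gamma draw. A minimal sketch in Python, using the first conditional as an example (the helper name, the basis dimension and the hyperparameter values are illustrative choices of ours):

```python
import numpy as np

def draw_inverse_gamma(rng, shape, rate):
    """Draw from IG(shape, rate): if X ~ Gamma(shape, scale = 1/rate),
    then 1/X ~ IG(shape, rate), with mean rate/(shape - 1) for shape > 1."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(0)

# Example update for sigma^2_{upsilon_kl} given the current spline
# coefficients upsilon_kl (placeholder values for illustration).
K, a1, b1 = 15, 0.001, 0.001
upsilon_kl = rng.normal(size=K)
sigma2_upsilon = draw_inverse_gamma(rng, K / 2 + a1,
                                    b1 + 0.5 * upsilon_kl @ upsilon_kl)
```

The remaining variance components are updated the same way with their respective shapes and rates.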

Starting values for A and C

Starting values for A and C are obtained by numerically maximizing the conditional posterior

$$p(A,C\mid\theta^0,\{\eta_i^0\}_{i=1,\dots,n},y)\propto \prod_{i=1}^n\prod_{j=1}^{m_i}|\Omega_{\Delta t_{ij}}|^{-1/2}\exp\!\left\{-\frac12\,\gamma_{t_{ij}}'\Omega_{\Delta t_{ij}}^{-1}\gamma_{t_{ij}}\right\}\times p(A)\times p(C),$$

where $\theta^0$ and $\eta_i^0$, i = 1, …, n, are the starting values for the basis function coefficients. Note that $\gamma_{t_{ij}}$ depends on $\theta^0$ and the $\eta_i^0$, i = 1, …, n, through $\delta_i(t_{ij})$ (expression (A.1)).

Generating the Ornstein–Uhlenbeck process parameters

To generate A, note that

$$p(A\mid C,\theta,\{\eta_i\}_{i=1,\dots,n},y)\propto \prod_{i=1}^n\prod_{j=1}^{m_i}|\Omega_{\Delta t_{ij}}|^{-1/2}\exp\!\left\{-\frac12\,\gamma_{t_{ij}}'\Omega_{\Delta t_{ij}}^{-1}\gamma_{t_{ij}}\right\}\times p(A). \tag{A.2}$$

Since (A.2) is not a standard distribution, we use a Metropolis step to generate A. The proposal distribution is multivariate normal, centered at the current value of A, with variance–covariance matrix equal to the inverse of the negative Hessian of the log of (A.2), evaluated numerically at the mode. This variance–covariance matrix is computed once, conditional on the starting values of the other parameters, and is then held fixed throughout the sampling scheme. To increase the acceptance rate, this variance–covariance matrix is multiplied by 5.76/p, as proposed in Gelman et al. (2004, page 306). More generally, when using a normal proposal distribution centered at the current point, Gelman et al. (2004) suggest using $c^2\Sigma$ as the covariance matrix of that proposal distribution. Among this class of proposal densities, the most efficient one has scale $c \approx 2.4/\sqrt{d}$, where d is the dimension of the parameter being updated. The acceptance probability is

$$\min\left\{1,\;\frac{p(A^{p}\mid C,\theta,\{\eta_i\}_{i=1,\dots,n},y)}{p(A^{c}\mid C,\theta,\{\eta_i\}_{i=1,\dots,n},y)}\right\},$$

where $A^{p}$ denotes the proposed value and $A^{c}$ the current value.
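As a generic illustration of this step (not the authors' code; the function names are ours, and a toy log-density stands in for (A.2)), the following sketch runs a random-walk Metropolis sampler with a fixed normal proposal scaled by $2.4^2/d$:

```python
import numpy as np

def metropolis(log_post, x0, prop_cov, n_iter, rng):
    """Random-walk Metropolis with a fixed multivariate normal proposal,
    its covariance scaled by 2.4^2 / d as in Gelman et al. (2004)."""
    d = len(x0)
    chol = np.linalg.cholesky((2.4 ** 2 / d) * prop_cov)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    draws = np.empty((n_iter, d))
    accepted = 0
    for t in range(n_iter):
        x_prop = x + chol @ rng.standard_normal(d)
        lp_prop = log_post(x_prop)
        # Accept with probability min{1, posterior ratio}.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
            accepted += 1
        draws[t] = x
    return draws, accepted / n_iter

# Toy target: a standard bivariate normal in place of (A.2).
rng = np.random.default_rng(0)
draws, acc_rate = metropolis(lambda v: -0.5 * v @ v, np.zeros(2),
                             np.eye(2), 20000, rng)
```

In the actual sampler, the log of (A.2) would replace the toy log-density and the proposal covariance would be the fixed negative inverse Hessian described above.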

The matrix C is generated by first generating the matrices L and D via a Metropolis step based on

$$p(L,D\mid A,\theta,\{\eta_i\}_{i=1,\dots,n},y)\propto \prod_{i=1}^n\prod_{j=1}^{m_i}|\Omega_{\Delta t_{ij}}|^{-1/2}\exp\!\left\{-\frac12\,\gamma_{t_{ij}}'\Omega_{\Delta t_{ij}}^{-1}\gamma_{t_{ij}}\right\}\times p(L)\times p(D).$$

An iterate for C is then given by C = LDL′.
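The point of working through L and D is that any unit lower-triangular L and positive diagonal D yield a symmetric positive-definite C = LDL′, so their free elements can be updated without constraints. A minimal sketch of the reconstruction for p = 2 (names and values are ours):

```python
import numpy as np

def c_from_ldl(l21, d1, d2):
    """Build a 2x2 covariance C = L D L' from the free off-diagonal entry
    l21 of a unit lower-triangular L and a positive diagonal D = diag(d1, d2)."""
    L = np.array([[1.0, 0.0], [l21, 1.0]])
    D = np.diag([d1, d2])
    return L @ D @ L.T

C = c_from_ldl(0.4, 1.5, 2.0)
```

Positive-definiteness of the result holds for any real l21 as long as d1, d2 > 0.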

References

  1. Aït-Sahalia Y. Closed-form likelihood expansions for multivariate diffusions. Annals of Statistics. 2008;36:906–937. [Google Scholar]
  2. Baladandayuthapani V, Mallick BK, Young Hong M, Lupton JR, Turner ND, Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics. 2008;64:64–73. doi: 10.1111/j.1541-0420.2007.00846.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Beskos A, Papaspiliopoulos O, Roberts GO, Fearnhead P. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. Journal of the Royal Statistical Society B. 2006;68:333–382. [Google Scholar]
  4. Blackwell PG. Bayesian inference for Markov processes with diffusion and discrete components. Biometrika. 2003;90:613–627. [Google Scholar]
  5. Cassano GB, Michelini S, Shear MK, Coli E, Maser JD, Frank E. The panic-agoraphobic spectrum: A descriptive approach to the assessment and treatment of subtle symptoms. American Journal of Psychiatry. 1997;154:27–38. doi: 10.1176/ajp.154.6.27. [DOI] [PubMed] [Google Scholar]
  6. De la Cruz-Mesía R, Marshall G. A Bayesian approach for nonlinear regression models with continuous errors. Communications in Statistics, Theory and Methods. 2003;32:1631–1646. [Google Scholar]
  7. De la Cruz-Mesía R, Marshall G. Nonlinear random effects models with continuous time autoregressive errors. Statistics in Medicine. 2006;25:1471–1484. doi: 10.1002/sim.2290. [DOI] [PubMed] [Google Scholar]
  8. Durbán M, Harezlak J, Wand MP, Carroll RJ. Simple fitting of subject-specific curves for longitudinal data. Statistics in Medicine. 2005;24:1153–1167. doi: 10.1002/sim.1991. [DOI] [PubMed] [Google Scholar]
  9. Frank E, Cassano GB, Rucci P, Fagiolini A, Maggi L, Kraemer HC, Kupfer DJ, Pollock B, Bies R, Nimgaonkar V, Pilkonis P, Shear MK, Thompson WK, Grochocinski VJ, Scocco P, Buttenfield J, Forgione RN. Addressing the challenges of a cross-national investigation: lessons from the Pittsburgh-Pisa study of treatment-relevant phenotypes of unipolar depression. Clinical Trials. 2008 doi: 10.1177/1740774508091965. under revision. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. French JL, Kammann EE, Wand MP. Comment on “Semiparametric Nonlinear Mixed-effects Models and Their Application” by C. Ke and Y. Wang. Journal of the American Statistical Association. 2001;96:1285–1288. [Google Scholar]
  11. Gardiner CW. Handbook of Stochastic Methods for Physics, Chemistry and Natural Sciences. Springer-Verlag; Berlin: 1983. [Google Scholar]
  12. Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper) Bayesian Analysis. 2006;1:515–534. [Google Scholar]
  13. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Second ed. Chapman & Hall/CRC; Boca Raton: 2004. [Google Scholar]
  14. Golightly A, Wilkinson DJ. Bayesian sequential inference for nonlinear multivariate diffusions. Statistics and Computing. 2006;16:323–338. [Google Scholar]
  15. Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi: 10.1111/j.0006-341x.2002.00121.x. [DOI] [PubMed] [Google Scholar]
  16. Jones RH. Longitudinal Data with Serial Correlation: A State-Space Approach. Chapman & Hall/CRC; Boca Raton: 1993. [Google Scholar]
  17. Kessler M, Rahbek A. Identification and inference for multivariate cointegrated and ergodic gaussian diffusions. Statistical Inference for stochastic Processes. 2004;7:137–151. [Google Scholar]
  18. Rosen O, Stoffer DS. Automatic estimation of multivariate spectra via smoothing splines. Biometrika. 2007;94:335–345. [Google Scholar]
  19. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; Cambridge: 2003. [Google Scholar]
  20. Smith M, Kohn R. Nonparametric seemingly unrelated regression. Journal of Econometrics. 2000;98:257–281. [Google Scholar]
  21. Smith M, Kohn R. Parsimonious covariance matrix estimation for longitudinal data. Journal of the American Statistical Association. 2002;97:1141–1153. [Google Scholar]
  22. Sy JP, Taylor JMG, Cumberland WG. A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics. 1997;53:542–555. [PubMed] [Google Scholar]
  23. Thompson WK, Rosen O. A Bayesian model for sparse functional data. Biometrics. 2008;64:54–63. doi: 10.1111/j.1541-0420.2007.00829.x. Web appendices are available at http://www.tibs.org/biometrics. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Zhao Y, Staudenmayer J, Coull BA, Wand MP. General design Bayesian generalized linear mixed models. Statistical Science. 2006;21:35–51. [Google Scholar]
