Longitudinal functional principal component analysis

Sonja Greven; Ciprian Crainiceanu; Brian Caffo; Daniel Reich

doi:10.1214/10-EJS575

Electron J Stat. Author manuscript; available in PMC: 2011 Jul 7.

Published in final edited form as: Electron J Stat. 2010;4:1022–1054. doi: 10.1214/10-EJS575


PMCID: PMC3131008 NIHMSID: NIHMS299819 PMID: 21743825

Abstract

We introduce models for the analysis of functional data observed at multiple time points. The dynamic behavior of functional data is decomposed into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability and measurement error. The model can be viewed as the functional analog of the classical longitudinal mixed effects model where random effects are replaced by random processes. Methods have wide applicability and are computationally feasible for moderate and large data sets. Computational feasibility is assured by using principal component bases for the functional processes. The methodology is motivated by and applied to a diffusion tensor imaging (DTI) study designed to analyze differences and changes in brain connectivity in healthy volunteers and multiple sclerosis (MS) patients. An R implementation is provided.

Keywords and phrases: Diffusion tensor imaging, functional data analysis, Karhunen-Loève expansion, longitudinal data analysis, mixed effects model

1. Introduction

Scientific studies now commonly collect functional or imaging data at multiple visits over time. In this paper we introduce a class of models and inferential methods for the analysis of longitudinal data where each repeated observation is functional.

Our motivating data set comes from a diffusion tensor imaging (DTI) study, which was designed to analyze cross-sectional and longitudinal differences in brain connectivity in healthy volunteers and multiple sclerosis (MS) patients. For each of 112 subjects and each visit, we have fractional anisotropy (FA) measurements along the corpus callosum in the brain. Figure 1 shows an image of the corpus callosum with 7 biological landmarks (denoted 1, 20, 40, 60, 80, 100, 120) used for registration of measurements across subjects. Each visit’s data for a subject is a finely sampled function across the corpus callosum, with the argument of the function being the spatial distance along the tract. For illustration, the FA data is displayed for 2 subjects, one with 5 and one with 6 complete visits. Although change over time may be subtle in comparison with measurement error, accurate quantification of that change is crucial for applications ranging from powering clinical trials to understanding brain development. This data structure is not unique to this study. In fact, the tractography data is an example of many new data sets containing functions or images that are observed repeatedly over time (longitudinal functional data).

Fig 1 — Top: Sagittal image of the corpus callosum in one of the study subjects, a healthy 33-year-old man, showing the segmentation used [following 46] for construction of the tract profile. Values denote the bin number at the boundary point from the splenium (back of the head) to the genu/rostrum (closer to the eyes). Bottom: Two example subjects (both MS patients) from the tractography data with 5 and 6 complete visits, respectively. Shown are the fractional anisotropy along the corpus callosum, measured at the 120 sample points. Different visits for the same subject are indicated by color and overlaid.

The common structure of these studies can be understood using an analogy with classical longitudinal data [8]. Longitudinal data is commonly analyzed using the very flexible class of linear mixed models [22, 44], which explicitly decompose the variation in the data into between- and within-subject variability. Similarly, we decompose the dynamic behavior of functional data into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability and measurement error. Technically this is achieved by replacing random effects with random functional effects.

We propose an estimation procedure that is based on an eigenanalysis and extends functional principal component analysis (FPCA) to the longitudinal setting. Computation is very efficient, even for very large data sets. The estimation procedure performed well both in an extensive simulation study and in the DTI application, where it uncovered subtle but potentially important subject-specific changes over time in a specific region of the corpus callosum, the isthmus. The character of these changes could conceivably be used as an early gauge of disease progression or response to neuroprotective therapies.

Our approach is different from functional mixed models based on the smoothing of fixed and random curves using splines or wavelets [3, 15, 16, 29]. In contrast to these methods, which focus on the estimation of fixed and random curves, our approach is based on functional principal component analysis. In addition to the computational advantages of such an approach [compare also 19], we are able to extract the main differences between subjects in their average profiles and in how their profiles evolve over time. Such signal extraction, which is not possible using smoothing methods alone, allows subject-specific scores to be related to other variables such as disease status, age or disease progression. Our approach can be seen as an extension of functional principal component analysis for multilevel functional data [7]. Our methods apply to longitudinal data where each observation is functional, and should thus not be confused with nonparametric methods for the longitudinal profiles of scalar variables [17, 30, 31, 37, 41, 48, 50, 51]. For good introductions to functional data analysis in general, please see [10, 34].

The remainder of the paper is organized as follows. Section 2 introduces the longitudinal functional model and explains how dimension reduction via longitudinal functional principal component analysis (LFPCA) is achieved. Section 3 develops our estimation procedure and provides computational efficiency results. Section 4 shows the performance of our procedure in an extensive simulation study. Section 5 provides the application of LFPCA methods to the tractography data, while Section 6 concludes with a discussion. Theoretical results and proofs are given in the appendix. Supplementary material [13] providing an R function implementing LFPCA, simulation code and additional graphs is available in the archive that the Electronic Journal of Statistics maintains on Project Euclid.

2. The longitudinal functional model

In this section we introduce models for data sets where functional data are recorded at multiple time points or visits for the same observational unit or subject. The observed data are {Y_ij(d), d ∈ 𝒟, T_ij, Z_ij, V_ij}, where Y_ij(·) is a random function in L²[0, 1] observed at arguments d in some set 𝒟, T_ij is the time of visit j for subject i, and Z_ij and V_ij are vectors of covariates for subject i = 1, …, I at visit j = 1, …, J_i, where the number of visits J_i can vary with the subject i. We assume that at least some subjects i have at least 3 visits, that is, J_i ≥ 3. The multilevel case with J_i ≤ 2 for all i was fully addressed by [7] and [6].

2.1. The functional random intercept and random slope model

The data structure in this paper is similar to that of standard longitudinal data, with the exception that instead of observing scalars, Y_ij, one observes functions, Y_ij(d), over time. We use this analogy to build up intuition and to introduce the functional equivalent of the standard longitudinal model. For simplicity we first extend the random intercept and slope model [35]. The functional analog is

Y_{i j} (d) = η (d, T_{i j}) + X_{i, 0} (d) + X_{i, 1} (d) T_{i j} + U_{i j} (d) + ε_{i j} (d),

(2.1)

where η(d, T_ij) is a fixed main effect surface, X_{i,0}(d) is the random functional intercept for subject i, X_{i,1}(d) is the random functional slope for subject i, T_ij is the time of visit j for subject i, U_ij(d) is the random subject- and visit-specific functional deviation, and ε_ij(d) is random homoscedastic white noise. We make the following assumptions:

A.1 X_i(d) = {X_{i,0}(d), X_{i,1}(d)}, U_ij(d) and ε_ij(d) are zero-mean, square-integrable, mutually uncorrelated random processes on [0, 1],
A.2 X_{i,0}(d) and X_{i,1}(d) have auto-covariance functions K₀(d, d′) and K₁(d, d′), respectively, and cross-covariance function K₀₁(d, d′) = Cov{X_{i,0}(d), X_{i,1}(d′)},
A.3 U_ij(d) has covariance function K_U(d, d′), and
A.4 ε_ij(d) is white noise measurement error with variance σ².

There are several parallels between the scalar random intercept-random slope model and model (2.1). First, Y_ij(d) is now a functional observation. Second, X_{i,0}(d) and X_{i,1}(d) replace the scalar random effects b_i₀ and b_i₁ as functional random intercept and random slope, respectively, capturing subject-to-subject variation. Third, the cross-covariance function K₀₁(d, d′) replaces the covariance between b_i₀ and b_i₁. Fourth, the subject- and visit-specific deviation now consists of two parts: U_ij(d) is a visit-specific functional deviation from the subject-specific functional trend, capturing visit-to-visit functional variation on the same subject, and ε_ij(d) is additional white noise measurement error, capturing random uncorrelated variation within each curve. The overall mean trend is allowed to be a smooth surface η(d, T_ij), which generalizes the linear mean β₀ + β₁T_ij often assumed in the scalar model.

Model (2.1) encompasses several simpler models as special cases. For example, visits may be equal in number, J_i = J for every subject, or equally spaced, T_ij = j for all i and j. The mean function η(d, T_ij) may be constant in time, η(d); additive, η(d, T_ij) = η₁(d) + η₂(T_ij); or linear in T, η(d, T_ij) = η₀(d) + T_ij η₁(d). The latter formulation is a direct extension of the linear population trend typically assumed in the scalar model.

Model (2.1) allows the decomposition of the variation in the observed curves into a) differences in subjects’ baseline functions; b) differences in subjects’ average changes over time; c) visit-specific variation around these average trends; and d) measurement error. This decomposition is of interest in many applications. For example, in the tractography application described in Section 5, it is of interest to study both the population cross-sectional and the dynamic behavior of various measurements along neuronal tracts.

2.2. The general functional mixed model

While model (2.1) is rich enough for our application, it lends itself well to generalization, which could be useful in other applications. A more general form of the longitudinal functional model is

Y_{i j} (d) = η (d, Z_{i j}) + V_{i j}^{'} X_{i} (d) + U_{i j} (d) + ε_{i j} (d),

(2.2)

where we assume that

B.1 X_i(d), U_ij(d) and ε_ij(d) are zero-mean, square-integrable, mutually uncorrelated random processes on [0, 1],
B.2 X_i(d) is a p-dimensional vector-valued random process with auto-covariance functions K₁₁(d, d′), …, K_pp(d, d′) for its p components and cross-covariance functions K₁₂(d, d′), …, K_{1p}(d, d′), …, K_{p−1,p}(d, d′),
B.3 U_ij(d) is a random process with covariance function K_U(d, d′), and
B.4 ε_ij(d) is white noise measurement error with variance σ².

Model (2.1) and its assumptions A.1 to A.4 are obtained as a special case of (2.2) and B.1 to B.4 by setting p = 2, Z_ij = T_ij and V_ij = (1, T_ij)′. Note that model (2.1) indexes the components of X by 0 and 1 rather than by 1 and 2 to stress the analogy with the scalar random intercept-random slope model. The functional mixed-effects ANOVA model [7] results if we set p = 1, Z_ij = j, V_ij = 1, and η(d, Z_ij) = μ(d) + η_j(d). More generally, Z_ij and V_ij are vectors of known covariates for subject i at time T_ij. η(d, Z_ij) is the fixed main effect surface, which can depend parametrically, semi-parametrically or non-parametrically on the covariates Z_ij = (Z_{ij,1}, …, Z_{ij,m}). The simplest parametric form is a linear mean $η (d, Z_{i j}) = α_{0} + α_{1} d + Z_{i j}^{'} β$, while the most complex nonparametric form is an (m + 1)-dimensional smooth function η(d, Z_ij). Intermediate semi-parametric models such as η(d, Z_ij) = η₁(d, Z_{ij,1}) + ··· + η_m(d, Z_{ij,m}) or $η (d, Z_{i j}) = η_{0} (d) + Z_{i j}^{'} β$ could also be useful in particular applications.

Model (2.2) is the functional analog of the linear mixed model for longitudinal data [22]. It is similar to models used by [15, 29], but we do not assume Gaussianity and we allow for more general fixed effects. The model of [15] also does not admit correlated random functional effects, such as those present in (2.1). In addition, we follow quite a different modeling approach, using longitudinal functional principal component analysis instead of smoothing splines or wavelets for nested curves. This yields large computational advantages compared to [29], especially when the number of random effects is large; see Section 4.3 and [19]. At the same time, we are able to extract the main differences between subjects in how their profiles evolve over time, something not possible in these other approaches.

For notational simplicity in the remainder of the paper we will focus on model (2.1). In Appendix B we point out the small technical differences for fitting the more general model (2.2).

2.3. Dimension reduction via longitudinal FPCA

While models (2.1) and (2.2) are intuitive generalizations of linear mixed effects models, their computational feasibility is not obvious, especially for large numbers of subjects, visits and observations. Here we propose an efficient modeling approach, Longitudinal Functional Principal Component Analysis (LFPCA). LFPCA is the longitudinal generalization of functional principal component analysis (FPCA) [34] and multilevel functional principal component analysis (MFPCA) [7]. The main idea of LFPCA is to extract the main directions of variation of the X and U processes using an eigendecomposition of their respective covariance operators on the basis of Mercer’s theorem [27]. The Karhunen-Loève expansion [20, 26] is then used to obtain parsimonious expansions of X and U.

In the notation of model (2.1), we expand the covariance operator of the bivariate process X_i(d) = {X_{i,0}(d), X_{i,1}(d)} as

K_{X}(d, d') = \begin{pmatrix} K_{0}(d, d') & K_{01}(d, d') \\ K_{01}(d', d) & K_{1}(d, d') \end{pmatrix} = \sum_{k=1}^{\infty} λ_{k}\, φ_{k}^{X}(d) \{φ_{k}^{X}(d')\}^{'},

where $φ_{k}^{X}(d) = \{φ_{k}^{0}(d), φ_{k}^{1}(d)\}^{'}$ are the ordered eigenfunctions of K_X(d, d′) corresponding to the eigenvalues λ₁ ≥ λ₂ ≥ ··· ≥ 0. Similarly, let $K_{U}(d, d') = \sum_{k=1}^{\infty} ν_{k} φ_{k}^{U}(d) φ_{k}^{U}(d')$, where $φ_{k}^{U}(d)$ are the ordered eigenfunctions of K_U(d, d′) corresponding to the eigenvalues ν₁ ≥ ν₂ ≥ ··· ≥ 0. The eigenfunctions {$φ_{k}^{U}$, k ∈ ℕ} form an orthonormal basis of L²[0, 1] with respect to the usual L²[0, 1] scalar product. The eigenfunctions {$φ_{k}^{X}$, k ∈ ℕ} form an orthonormal basis of L²[0, 1] × L²[0, 1] with respect to the additive scalar product

\langle (f_{0}, f_{1}), (g_{0}, g_{1}) \rangle = \int_{0}^{1} f_{0}(s) g_{0}(s)\, ds + \int_{0}^{1} f_{1}(s) g_{1}(s)\, ds .
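To make the additive scalar product concrete, the following NumPy sketch approximates both integrals by a Riemann sum on the midpoint grid {(k − 0.5)/D} (an assumption for illustration) and uses it to compute a score ξ_ik, that is, the additive scalar product of X_i with φ_k^X. The pair (sin 2πd, cos 2πd) is a hypothetical eigenfunction chosen so that each component has squared L² norm 1/2 while the pair has unit norm, illustrating that φ_k^0 and φ_k^1 need not be normalized individually. The function names are hypothetical.

```python
import numpy as np

D = 120
grid = (np.arange(1, D + 1) - 0.5) / D   # midpoint grid {(k - 0.5)/D} on [0, 1]
w = 1.0 / D                              # Riemann-sum weight approximating ds

def additive_inner(f0, f1, g0, g1):
    """<(f0, f1), (g0, g1)> = int f0 g0 ds + int f1 g1 ds, approximated on the grid."""
    return w * (np.dot(f0, g0) + np.dot(f1, g1))

def score_xi(x0, x1, phi0, phi1):
    """Score of X_i = (x0, x1) on the bivariate eigenfunction (phi0, phi1)."""
    return additive_inner(x0, x1, phi0, phi1)

# Hypothetical eigenfunction: each component has squared norm 1/2,
# so neither is normalized on its own, but the pair has unit norm.
phi0 = np.sin(2 * np.pi * grid)
phi1 = np.cos(2 * np.pi * grid)
```

For example, score_xi(2 * phi0, 2 * phi1, phi0, phi1) recovers the score 2 of the curve pair 2φ^X.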

The function pairs $(φ_{k}^{0}, φ_{j}^{1}), (φ_{k}^{0}, φ_{j}^{U})$ or ( $φ_{k}^{1}, φ_{j}^{U}$ ) are not required to be orthogonal in L²[0, 1], nor will ( $φ_{k}^{0}, φ_{j}^{0}$ ) or ( $φ_{k}^{1}, φ_{j}^{1}$ ) be orthogonal in general for k ≠ j. The Karhunen-Loève expansions of the random processes are

X_{i} (d) = \sum_{k = 1}^{\infty} ξ_{i k} φ_{k}^{X} (d) and U_{i j} (d) = \sum_{k = 1}^{\infty} ζ_{ijk} φ_{k}^{U} (d),

where the principal components scores

ξ_{i k} = \int_{0}^{1} X_{i, 0} (s) φ_{k}^{0} (s) d s + \int_{0}^{1} X_{i, 1} (s) φ_{k}^{1} (s) d s and ζ_{ijk} = \int_{0}^{1} U_{i j} (s) φ_{k}^{U} (s) d s

are uncorrelated random variables with mean zero and variances λ_k and ν_k, respectively. Assumption A.1 is ensured by assuming that {ξ_ik, i = 1, …, I, k ∈ ℕ} and {ζ_ijk, j = 1, …, J_i, i = 1, …, I, k ∈ ℕ} are mutually uncorrelated. Because working with infinite expansions is impractical, we consider the finite-dimensional approximations of the X and U processes

X_{i} (d) = \sum_{k = 1}^{N_{X}} ξ_{i k} φ_{k}^{X} (d) and U_{i j} (d) = \sum_{k = 1}^{N_{U}} ζ_{ijk} φ_{k}^{U} (d),

where N_X and N_U will be estimated, as described in Section 3.4. Conditional on N_X and N_U the finite approximation to model (2.1) is

\begin{aligned} Y_{ij}(d) &= η(d, T_{ij}) + \sum_{k=1}^{N_{X}} ξ_{ik} V_{ij}^{'} φ_{k}^{X}(d) + \sum_{l=1}^{N_{U}} ζ_{ijl} φ_{l}^{U}(d) + ε_{ij}(d), \\ ξ_{ik} &\overset{unc}{\sim} (0, λ_{k}), \quad ζ_{ijl} \overset{unc}{\sim} (0, ν_{l}), \quad ε_{ij}(d) \overset{unc}{\sim} (0, σ^{2}), \end{aligned}

(2.3)

where V_ij = (1, T_ij)′, so that V_ij′φ_k^X(d) = φ_k^0(d) + T_ij φ_k^1(d). This is a linear mixed model [see also 7]. Here, $x_{l} \overset{unc}{\sim} (0, a)$ denotes uncorrelated variables with mean 0 and variance a. We assume normality neither for the processes in (2.1) nor for the scores in (2.3). LFPCA extends similarly to the more general model (2.2).
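The following sketch draws curves from the finite model (2.3) for given eigenfunctions and variances, which may help when checking an implementation. Drawing the scores and errors from normal distributions is our assumption for illustration only; as stated above, (2.3) specifies only means, variances and uncorrelatedness. The function name simulate_model_23 and its argument layout are hypothetical.

```python
import numpy as np

def simulate_model_23(T, grid, eta, phi0, phi1, phiU, lam, nu, sigma, seed=0):
    """Draw curves Y_ij(d) from the finite model (2.3) with V_ij = (1, T_ij)'.

    T          : list of 1-D arrays; T[i] holds the J_i visit times of subject i
    grid       : (D,) array of grid points d
    eta        : callable eta(d, t) returning the mean on the grid
    phi0, phi1 : (D, N_X) arrays, the two components of phi_k^X on the grid
    phiU       : (D, N_U) array of phi_l^U on the grid
    lam, nu    : (N_X,) and (N_U,) arrays of score variances
    Normal draws are an illustrative assumption; (2.3) fixes only means and variances.
    """
    rng = np.random.default_rng(seed)
    Y = []
    for Ti in T:
        xi = rng.normal(0.0, np.sqrt(lam))          # subject scores xi_ik, shared by all visits
        curves = np.empty((len(Ti), grid.size))
        for j, t in enumerate(Ti):
            zeta = rng.normal(0.0, np.sqrt(nu))     # visit scores zeta_ijl
            x_part = (phi0 + t * phi1) @ xi         # sum_k xi_ik V_ij' phi_k^X(d)
            u_part = phiU @ zeta                    # sum_l zeta_ijl phi_l^U(d)
            eps = rng.normal(0.0, sigma, size=grid.size)
            curves[j] = eta(grid, t) + x_part + u_part + eps
        Y.append(curves)
    return Y
```

The subject-level scores ξ_ik are drawn once per subject and reused at every visit, which is what induces the within-subject correlation that the covariance estimation below exploits.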

3. Estimation

For simplicity, we focus the presentation on model (2.1); estimation for model (2.2) is similar, and the minor adjustments are described in Appendix B. We assume that the mean, covariance operators and eigenfunctions are smooth. For presentation purposes, we assume that all functions Y_ij(d) are measured at a finite number, D, of grid points 𝒟 ⊂ [0, 1]. However, the method can easily handle missing data, both in terms of visits per subject and observations per visit. Estimation proceeds in a few simple steps, which are described in more detail in the following.

Step 1
The fixed effect surface η is estimated using the working independence model
$Y_{i j} (d) = η (d, T_{i j}) + ε_{i j} (d) .$

Smoothness selection is by REML, which is more robust to neglecting the correlations in the errors than prediction error methods [21].
Step 2
The autocovariance functions for the random processes X_i = (X_{i,0}, X_{i,1}) and U_ij are estimated from the residuals Y_ij(d) − η̂(d, T_ij), using a linear regression step.
Step 3
The ‘raw’ autocovariance function estimates from step 2 are subjected to bivariate smoothing, yielding also an estimate for σ².
Step 4
Eigendecompositions of the smoothed autocovariance functions provide bases for representing X_i = (X_{i,0}, X_{i,1}) and U_ij, which are truncated to achieve parsimony.
Step 5
Estimated BLUPs then provide estimates for the subject- and visit-specific scores, which summarize the main differences in the dynamics of functions over time.

3.1. Estimation of the mean

The fixed effect population mean surface η(d, T) can be estimated using a bivariate smoother in d and T under a working independence assumption. For discussions of smoothing for correlated data, see [21, 24]. Possibilities for smoothers include penalized splines [39], smoothing splines [11] and local polynomials [9]. Choice of a smoother and of the smoothing parameter or bandwidth is discussed extensively in the literature and is not the main focus here. It is our experience that most reasonable smoothers used judiciously will provide similar results. For simplicity and efficiency of the implementation for large data sets, we use penalized spline smoothing with REML estimation of the smoothing parameter. This choice has also been found to be relatively robust to misspecification of the error correlation structure in [21].

A bivariate smoother is appropriate when the collection of observations across visits and subjects is relatively dense. This need not be the case in general, and simpler choices might be more sensible. For example, η(d, T_ij) = η₀(d) + T_ijβ might be more appropriate if the T_ij form a sparser collection. In the case of equally spaced visits, T_ij = T_j, [7] used η(d, T_ij) = η_j(d). Choices will depend on the particular application, available data and scientific problem. In most applications, estimating the mean function is quite easy, even routine. Once a consistent estimator of the mean function is available, the data can be centered as Y_ij(d) − η̂(d, T_ij) for all i, j and d. In the following we assume that the Y_ij(d) have mean zero.
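A minimal sketch of the mean step and the centering, assuming the simpler form η(d, T_ij) = η₀(d) + T_ij η₁(d) from Section 2.1 instead of the penalized bivariate spline with REML described above. Under working independence this form reduces to one ordinary least squares regression on (1, T_ij) per grid point, and all of these regressions share the same design matrix. No smoothing in d is applied, and the helper name fit_linear_mean is hypothetical.

```python
import numpy as np

def fit_linear_mean(Y, t):
    """Working-independence OLS fit of eta(d, T) = eta0(d) + T * eta1(d).

    Y : (n, D) array, one row per observed curve (all subjects and visits stacked)
    t : (n,) float array of visit times T_ij in the same row order
    Returns eta0 (D,), eta1 (D,) and the centered data Y - eta_hat, shape (n, D).
    """
    X = np.column_stack([np.ones_like(t), t])      # design shared by every grid point d
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (2, D): one regression per column d
    return coef[0], coef[1], Y - X @ coef
```

Replacing this with a smoother in d (or in d and T jointly) changes only how the fitted values are produced; the centering line stays the same.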

3.2. Estimation of the covariance operators

A crucial point of our proposed methodology is estimating the covariance operators K_X(·, ·) and K_U(·, ·). To estimate the covariance functions, we focus on the cross-products Y_ij(d)Y_ik(d′). Because Y_ij(d) has zero mean, each product Y_ij(d)Y_ik(d′) is an estimator of the covariance between the function observed at time T_ij evaluated at location d and the function observed at time T_ik evaluated at location d′. Each subject thus contributes one such estimator for every available pair of observations, one at time T_ij and location d and the other at time T_ik and location d′. The available pairs may differ between subjects in their (d, d′) and (T_ij, T_ik) combinations. The method described in the following can therefore easily handle missing data, both in terms of visits per subject and observations per visit.

Under the assumptions of model (2.1),

E {Y_{i j} (d) Y_{i k} (d^{'})} = Cov {Y_{i j} (d), Y_{i k} (d^{'})} = K_{0} (d, d^{'}) + T_{i k} K_{01} (d, d^{'}) + T_{i j} K_{01} (d^{'}, d) + T_{i j} T_{i k} K_{1} (d, d^{'}) + [K_{U} (d, d^{'}) + σ^{2} δ_{d d^{'}}] δ_{j k},

(3.1)

for all d, d′, i, j and k, where δ_jk is Kronecker’s delta. Equation (3.1) suggests a straightforward solution for estimating the covariance operators: regress linearly the “outcome” Y_ij(d)Y_ik(d′) on the “covariates” (1, T_ik, T_ij, T_ikT_ij, δ_jk), where the “parameters” are {K₀(d, d′), K₀₁(d, d′), K₀₁(d′, d), K₁(d, d′), K_U(d, d′); d, d′ ∈ 𝒟; σ²}.

While the intuition behind the method is simple, there are two potential pitfalls that should be carefully avoided. First, σ² is identifiable only under the assumption that K_U(d, d′) is a bivariate smooth function in d and d′. Second, a straightforward implementation of the linear regression based on (3.1) has $D^{2} \sum_{i = 1}^{I} J_{i}^{2}$ observations and 4D² + 1 variables. In our moderately sized tract data (D = 120), this would correspond to about 19 million observations and 57,601 variables. In larger data sets the problem would be even more serious. Thus, careful implementation is required to ensure computational feasibility. We propose the following 3-step estimation procedure, which avoids these problems.

Step A
{K₀(d, d′), K₀₁(d, d′), K₀₁(d′, d), K₁(d, d′), K_U(d, d′) + σ²δ_dd′} is estimated for each pair d ≤ d′ ∈ 𝒟 using least squares estimation based on (3.1). Symmetry constraints yield K₀(d, d′) = K₀(d′, d), K₁(d, d′) = K₁(d′, d) and K_U(d, d′) = K_U(d′, d) for d > d′. Denote the estimates by K̃₀(d, d′), K̃₀₁(d, d′), K̃₁(d, d′) and K̃_U(d, d′), where K̃_U(d, d′) estimates K_U(d, d′) + σ²δ_dd′.
Step B
Bivariate smoothing in d and d′ over K̃₀(d, d′), K̃₀₁(d, d′) and K̃₁(d, d′) yields smooth estimates K̂₀(d, d′), K̂₀₁(d, d′) and K̂₁(d, d′). Bivariate smoothing over K̃_U (d, d′), leaving out the diagonal elements as proposed by [41, 50], also yields estimates K̂_U (d, d′).

Please see Section 3.1 for a discussion of bivariate smoothing.
Step C
σ² can be estimated as ${\hat{σ}}^{2} = \frac{1}{D} \sum_{d \in \mathcal{D}} \{{\tilde{K}}_{U}(d, d) - {\hat{K}}_{U}(d, d)\}$ if this quantity is positive, and as zero otherwise.
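The Step C formula can be sketched as follows. The raw estimate K̃_U and the smoothed estimate K̂_U (fitted without the diagonal, as in Step B) are taken as given inputs on the grid; the bivariate smoother itself is not shown, and the function name is hypothetical.

```python
import numpy as np

def estimate_sigma2(KU_raw, KU_smooth):
    """Step C: mean of diag(K~_U) - diag(K^_U) over the D grid points, floored at zero.

    KU_raw    : (D, D) raw Step A estimate of K_U(d, d') + sigma^2 * delta_dd'
    KU_smooth : (D, D) Step B smooth of KU_raw, fitted without its diagonal
    """
    gap = np.diag(KU_raw) - np.diag(KU_smooth)   # the nugget sigma^2 sits on the diagonal
    return max(float(np.mean(gap)), 0.0)
```

Because the smoother never sees the diagonal, the diagonal gap isolates the white-noise contribution that a smooth K_U cannot absorb.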

Estimation in Step A can be done using efficient matrix-vector computations as detailed in Theorem 1 in Appendix A. The following is a consequence of that theorem.

Corollary 1

The computational effort for estimation of the covariance functions in Step A for the general model (2.2) is of the order O{max(p⁶, p²D²g)}, where $g = \sum J_{i}^{2}$ and p is the dimension of the vector-valued random process X_i(d) in (2.2).

All proofs can be found in Appendix A. For model (2.1), p = 2 is small. Note that p²D² is the order of the number of unknown parameters in the covariance functions, and g is the number of observation pairs contributing to the estimation. The effort thus is linear in both. Our software implementation is so efficient that the computational effort is dominated by the bivariate smoothing of the mean and covariance functions; see Section 4.3 for a detailed investigation of efficiency.
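One way to see why the Step A effort can be linear in both D² and g: the covariates in (3.1) do not depend on (d, d′), so all D² regressions share one design. The NumPy sketch below, which assumes every curve is observed on the full grid and treats model (2.1) only, accumulates a single 5 × 5 cross-product matrix and one right-hand side per (d, d′) pair, then solves all regressions at once. It is our illustration, not necessarily the construction of Theorem 1, and the function name is hypothetical.

```python
import numpy as np

def step_a_raw_covariances(Y, T):
    """Raw Step A estimates for model (2.1), assuming complete data on the grid.

    Y : list over subjects; Y[i] is a (J_i, D) array of centered curves
    T : list over subjects; T[i] is a (J_i,) array of visit times
    Returns K0, K01, K1 and KU_raw (each D x D); KU_raw estimates K_U + sigma^2
    on its diagonal.
    """
    D = Y[0].shape[1]
    XtX = np.zeros((5, 5))
    XtP = np.zeros((5, D, D))
    for Yi, Ti in zip(Y, T):
        J = len(Ti)
        for j in range(J):
            for k in range(J):
                # Covariates of (3.1): (1, T_ik, T_ij, T_ij*T_ik, delta_jk). They do not
                # depend on (d, d'), so one 5 x 5 system serves all D^2 regressions.
                x = np.array([1.0, Ti[k], Ti[j], Ti[j] * Ti[k], float(j == k)])
                XtX += np.outer(x, x)
                # O(D^2) work per visit pair, hence O(g D^2) overall with g = sum J_i^2.
                XtP += x[:, None, None] * np.outer(Yi[j], Yi[k])[None]
    B = np.linalg.solve(XtX, XtP.reshape(5, -1)).reshape(5, D, D)
    K0 = (B[0] + B[0].T) / 2
    K01 = (B[1] + B[2].T) / 2     # the coefficient of T_ij estimates K01(d', d)
    K1 = (B[3] + B[3].T) / 2
    KU_raw = (B[4] + B[4].T) / 2
    return K0, K01, K1, KU_raw
```

Because all ordered visit pairs (j, k) and (k, j) enter, the fitted coefficients already satisfy the symmetry constraints up to rounding; the averaging only removes floating-point asymmetry. With missing grid points the design would vary with (d, d′) and this shortcut would no longer apply directly.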

Our procedure does not guarantee that K̂_X(·, ·) and K̂_U (·, ·) are positive definite. We correct this problem by trimming the eigenvalue-eigenvector pairs corresponding to negative eigenvalues, a method that has been found to increase the L² accuracy [17] and has been shown to work well in practice [51].
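A minimal NumPy version of this trimming step, applicable to any symmetric covariance estimate on the grid (for example the discretized K̂_X or K̂_U). The symmetrization line is a numerical safeguard we add, and the function name is hypothetical.

```python
import numpy as np

def trim_negative_eigen(K):
    """Keep only eigenpairs with positive eigenvalues and rebuild the matrix."""
    K = (K + K.T) / 2                         # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(K)            # symmetric eigendecomposition
    keep = vals > 0
    # sum over kept k of vals[k] * v_k v_k'
    return (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T
```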

3.3. Estimation of the eigenfunctions and scores

In the previous section we showed how to obtain the estimated covariance matrices K̂₀ = {K̂₀(d, d′)}_{d,d′∈𝒟}, K̂₀₁ = {K̂₀₁(d, d′)}_{d,d′∈𝒟}, K̂₁ = {K̂₁(d, d′)}_{d,d′∈𝒟} and K̂_U = {K̂_U(d, d′)}_{d,d′∈𝒟}. Arranging the first three as the 2D × 2D block matrix K̂_X with diagonal blocks K̂₀ and K̂₁, upper-right block K̂₀₁ and lower-left block K̂₀₁′, estimates of the eigenvalues and of the eigenfunctions $φ_{k}^{X}(\cdot)$, k = 1, …, 2D, and $φ_{k}^{U}(\cdot)$, k = 1, …, D, at the grid points 𝒟 can then be obtained from the spectral decompositions ${\hat{K}}_{X} = \sum_{k=1}^{2D} {\hat{λ}}_{k} {\hat{φ}}_{k}^{X} {{\hat{φ}}_{k}^{X}}^{'}$ and ${\hat{K}}_{U} = \sum_{k=1}^{D} {\hat{ν}}_{k} {\hat{φ}}_{k}^{U} {{\hat{φ}}_{k}^{U}}^{'}$, where ${\hat{φ}}_{k}^{X}$ stacks $\{{\hat{φ}}_{k}^{0}(d); d \in \mathcal{D}\}$ on top of $\{{\hat{φ}}_{k}^{1}(d); d \in \mathcal{D}\}$ and ${\hat{φ}}_{k}^{U} = \{{\hat{φ}}_{k}^{U}(d); d \in \mathcal{D}\}$; these are orthonormal vectors in ℝ^{2D} and ℝ^D, respectively. The estimation of the number of eigenfunctions retained for further analysis, N_X and N_U, is described in Section 3.4.
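These spectral decompositions can be sketched with numpy.linalg.eigh, which returns eigenvalues in ascending order, so the sketch reverses them. Assumptions: the lower-left block of K̂_X holds K̂₀₁(d′, d), i.e. the transpose of K̂₀₁, and the function name is hypothetical. On an equally spaced grid with spacing 1/D, multiplying the eigenvectors by √D and dividing the eigenvalues by D approximates L²-normalized eigenfunctions and the operator eigenvalues (a standard quadrature argument); the sketch leaves that rescaling to the caller.

```python
import numpy as np

def lfpca_eigen(K0, K01, K1, KU):
    """Spectral decompositions of the discretized K_X (2D x 2D) and K_U (D x D).

    Returns eigenvalues in decreasing order, the two D-row halves of each
    eigenvector of K_X (phi_k^0 and phi_k^1 on the grid), then nu_k and phi_k^U.
    """
    D = K0.shape[0]
    KX = np.block([[K0, K01], [K01.T, K1]])   # lower-left block holds K01(d', d)
    lam, vX = np.linalg.eigh(KX)              # eigh sorts eigenvalues ascending
    nu, vU = np.linalg.eigh(KU)
    lam, vX = lam[::-1], vX[:, ::-1]          # reorder to lambda_1 >= lambda_2 >= ...
    nu, vU = nu[::-1], vU[:, ::-1]
    return lam, vX[:D], vX[D:], nu, vU
```

Note that the returned halves vX[:D] and vX[D:] are generally not orthonormal on their own; only the stacked 2D-vectors are, which mirrors the additive scalar product of Section 2.3.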

In Section 2.3, equation (2.3), we showed that for fixed N_X and N_U, model (2.1) is a linear mixed model. Thus, we can use best linear unbiased prediction (BLUP) to obtain predictions of the subject- and subject/visit-specific scores, ξ_ik and ζ_ijk, respectively. BLUP calculation does not require a normality assumption and is a generalization of the conditional expectations used by [50].

For given eigenfunctions, mean function η(d, T), and variances λ_k, k = 1, …, N_X, and ν_k, k = 1, …, N_U, the BLUP for b = (ξ₁₁, …, ξ_{1N_X}, …, ξ_{I1}, …, ξ_{IN_X}, ζ₁₁₁, …, ζ_{11N_U}, …, ζ_{IJ_I1}, …, ζ_{IJ_IN_U}) in model (2.3) is given in the usual form by

\hat{b} = {DZ}^{'} {(σ^{2} I + {ZDZ}^{'})}^{- 1} (Y - η),

(3.2)

where Z = [Z_X | Z_U], Z_X = E_I ⊗ Φ⁰ + T ⊗ Φ¹, Z_U = I_n ⊗ Φ^U, E_I = (δ_ih)_{ij = 11, …, IJ_I; h = 1, …, I}, T = (T_ij δ_ih)_{ij = 11, …, IJ_I; h = 1, …, I}, $Φ^{0} = \{φ_{k}^{0}(d)\}_{d \in \mathcal{D}; k = 1, \dots, N_{X}}$, $Φ^{1} = \{φ_{k}^{1}(d)\}_{d \in \mathcal{D}; k = 1, \dots, N_{X}}$, $Φ^{U} = \{φ_{k}^{U}(d)\}_{d \in \mathcal{D}; k = 1, \dots, N_{U}}$, D = blockdiag(D_X, D_U) = blockdiag{I_I ⊗ diag(λ₁, …, λ_{N_X}), I_n ⊗ diag(ν₁, …, ν_{N_U})}, Y = {Y₁₁(1), …, Y₁₁(D), …, Y_{1J₁}(1), …, Y_{1J₁}(D), …, Y_{IJ_I}(1), …, Y_{IJ_I}(D)}′ and η = {η(1, T₁₁), …, η(D, T₁₁), …, η(1, T_{1J₁}), …, η(D, T_{1J₁}), …, η(1, T_{IJ_I}), …, η(D, T_{IJ_I})}′; the arguments 1, …, D index the grid points in 𝒟, and n = Σ_i J_i is the total number of observed curves. Here, ⊗ denotes the Kronecker product of matrices, and (a_ijh)_{ij = 11, …, IJ_I; h = 1, …, I} denotes a matrix with entries a_ijh, rows ij, j = 1, …, J_i, i = 1, …, I, and columns h = 1, …, I.
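For small problems, (3.2) can be evaluated literally, which is useful as a reference when testing a faster implementation. The sketch builds Z_X, Z_U and D with numpy.kron as defined above, assuming the rows of the centered data are ordered by subject and then by visit, and that subjects are labeled 0, …, I − 1. It forms the nD × nD matrix explicitly, so its cost grows like O(n³D³) and it is only meant for toy sizes. The function name is hypothetical.

```python
import numpy as np

def blup_direct(Yc, subj, t, Phi0, Phi1, PhiU, lam, nu, sigma2):
    """Evaluate (3.2) literally.

    Yc   : (n, D) centered curves Y - eta, rows ordered by subject then visit
    subj : (n,) integer subject labels 0, ..., I-1 (one per row of Yc)
    t    : (n,) visit times T_ij
    Phi0, Phi1 : (D, N_X) arrays; PhiU : (D, N_U); lam : (N_X,); nu : (N_U,)
    Returns xi (I, N_X) and zeta (n, N_U).
    """
    n, D = Yc.shape
    I = int(subj.max()) + 1
    E = (subj[:, None] == np.arange(I)[None, :]).astype(float)   # E_I, (n, I)
    Tm = E * t[:, None]                                           # T = (T_ij delta_ih)
    ZX = np.kron(E, Phi0) + np.kron(Tm, Phi1)                     # (nD, I N_X)
    ZU = np.kron(np.eye(n), PhiU)                                 # (nD, n N_U)
    Z = np.hstack([ZX, ZU])
    Dm = np.diag(np.concatenate([np.tile(lam, I), np.tile(nu, n)]))
    V = sigma2 * np.eye(n * D) + Z @ Dm @ Z.T                     # the nD x nD matrix
    b = Dm @ Z.T @ np.linalg.solve(V, Yc.reshape(-1))
    NX = Phi0.shape[1]
    return b[: I * NX].reshape(I, NX), b[I * NX:].reshape(n, -1)
```

Using np.linalg.solve rather than an explicit inverse is a numerical choice; the result is the same expression (3.2).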

We can obtain estimated BLUPs (EBLUPs) using the estimated functions and variances η̂(·, ·), σ̂², ${\hat{φ}}_{k}^{0} (\cdot), {\hat{φ}}_{k}^{1} (\cdot)$ , λ̂_k, k = 1, …, N_X, and ${\hat{φ}}_{k}^{U} (\cdot)$ , ν̂_k, k =1, …, N_U, from Sections 3.1 and 3.2. This does not require fitting the model (2.3), which greatly increases computational efficiency. While straightforward implementation of (3.2) requires inverting nD × nD matrices, which would result in computational effort of the order O(n³D³), we make use of common matrix rules and of the model structure to obtain a more efficient representation, as detailed in Theorem 2 in Appendix A. The following result of Theorem 2 confirms the manageable computational effort even for very large data sets.

Corollary 2

Computational effort for calculation of the estimated BLUPs in (3.2) is of the order $O\{\max(nDf, I N_{X}^{3})\}$, where $n = \sum_{i = 1}^{I} J_{i}$ and f = N_U + N_X I/n.

The proofs are provided in Appendix A. N_X and N_U are typically small, and much smaller than either D or the number of observed curves n. These results and efficient block matrix manipulation make the models proposed here feasible even for very large data sets. For example, one of the simulation examples in Section 4 uses 1,000 subjects, who were observed at 8 visits and had 200 observations per visit.
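One way to see why the BLUPs stay cheap: under model (2.3), curves from different subjects are uncorrelated, so σ²I + ZDZ′ is block diagonal over subjects and (3.2) splits into one small problem per subject. The sketch below uses the identity DZ′(σ²I + ZDZ′)⁻¹ = (Z′Z/σ² + D⁻¹)⁻¹Z′/σ², so each subject needs only an (N_X + J_i N_U)-dimensional linear solve instead of an nD-dimensional one. It assumes σ² > 0 and strictly positive λ_k and ν_k, it is our illustration rather than the construction used in Theorem 2, and the function name is hypothetical.

```python
import numpy as np

def blup_by_subject(Y, T, Phi0, Phi1, PhiU, lam, nu, sigma2):
    """Scores subject by subject; assumes sigma2 > 0 and all lam, nu > 0.

    Y : list of (J_i, D) centered curves; T : list of (J_i,) visit times.
    Returns lists of xi_i with shape (N_X,) and zeta_i with shape (J_i, N_U).
    """
    NX, NU = Phi0.shape[1], PhiU.shape[1]
    xis, zetas = [], []
    for Yi, Ti in zip(Y, T):
        J, D = Yi.shape
        # This subject's block of Z: N_X columns for xi_i, J * N_U columns for zeta_ij.
        Zi = np.zeros((J * D, NX + J * NU))
        for j, t in enumerate(Ti):
            rows = slice(j * D, (j + 1) * D)
            Zi[rows, :NX] = Phi0 + t * Phi1
            Zi[rows, NX + j * NU: NX + (j + 1) * NU] = PhiU
        prior_prec = np.diag(1.0 / np.concatenate([lam, np.tile(nu, J)]))   # D_i^{-1}
        # Push-through form of (3.2): (Z'Z / sigma2 + D^{-1})^{-1} Z'y / sigma2.
        A = Zi.T @ Zi / sigma2 + prior_prec
        b = np.linalg.solve(A, Zi.T @ Yi.reshape(-1) / sigma2)
        xis.append(b[:NX])
        zetas.append(b[NX:].reshape(J, NU))
    return xis, zetas
```

The loop over subjects is embarrassingly parallel, and the Z_i′Z_i products could be formed blockwise to exploit the sparsity of Z_i further.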

3.4. Decomposition of variance and choice of the number of components

There are several possible ways to choose the numbers of eigenfunctions N_X and N_U. Two alternatives that have been used before are leave-one-curve-out cross validation [38] and an AIC-type criterion [50]. Alternatively, one can make use of the fact that (2.3) is a linear mixed model, with N_X and N_U corresponding to the number of random effects. The conditional Akaike information criterion (cAIC), proposed for the selection of random effects in linear mixed models [12, 23, 43], could thus be employed. [40] and [6] point out that choosing the number of eigenfunctions corresponds to step-wise testing for zero variance components. They propose using a Restricted Likelihood Ratio Test (RLRT) for these zero variance components. The null distribution can be easily approximated using methods introduced by [14], based on the null distribution derived in [5].

Here we follow a simpler approach based on the proportion of variance explained. This approach has several advantages: a) popularity; b) simplicity and interpretability; c) quantification of the contribution of the different processes to the variability in Y_ij(d).

To better understand variance partitioning, we give the following result.

Lemma 1

Let Y_ij(d) ∈ L²[0, 1] be a process that follows model (2.1) with zero mean, η(d, T_ij) ≡ 0. Let T_ij be independently distributed as T for all i and j, where E(T²) < ∞, and let T_ij be independent of X_i, U_ij and ε_ij(d), d ∈ 𝒟. Then the average variance of Y_ij(d) can be written as

\int_{0}^{1} Var {Y_{i j} (s)} d s = \int_{0}^{1} (\sum_{k = 1}^{\infty} λ_{k} [{φ_{k}^{0} (s)}^{2} + 2 E (T_{i j}) φ_{k}^{0} (s) φ_{k}^{1} (s) + E (T_{i j}^{2}) {φ_{k}^{1} (s)}^{2}] + \sum_{k = 1}^{\infty} ν_{k} {φ_{k}^{U} (s)}^{2} + σ^{2}) d s .

The proof can be found in Appendix A. Given the usual interpretation of eigenvalues as variance explained in FPCA, one could be tempted to interpret λ_k similarly in the longitudinal context. The variance decomposition that we just described indicates that in LFPCA, λ_k can be interpreted as a variance component only if the time variable is standardized to have zero mean and unit variance. In this case, the two components of the $φ_{k}^{X}$ eigenfunction, $φ_{k}^{0}$ and $φ_{k}^{1}$ , will be on the same scale. We can then directly discuss λ_k as the “variance explained” by the eigenfunction $φ_{k}^{X}$ of K_X. Thus, we recommend standardizing the time variable. The variation in Y_ij(d) then has the following simple additive decomposition.

Corollary 3

In the case when E(T_ij) = 0 and Var(T_ij) = 1, the expression in Lemma 1 reduces to

\int_{0}^{1} Var {Y_{i j} (s)} d s = \sum_{k = 1}^{\infty} λ_{k} + \sum_{k = 1}^{\infty} ν_{k} + σ^{2} .
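The reduction follows from Lemma 1 in a short derivation (ours, not reproduced from Appendix A), using E(T_ij) = 0, E(T_ij²) = Var(T_ij) + {E(T_ij)}² = 1 and the unit norm of each φ_k^X and φ_k^U in their respective scalar products. Once the cross term vanishes, every summand is nonnegative, so the sums and the integral can be interchanged:

```latex
\begin{aligned}
\int_0^1 \operatorname{Var}\{Y_{ij}(s)\}\,ds
  &= \sum_{k=1}^{\infty} \lambda_k \int_0^1 \big[\{\phi_k^0(s)\}^2 + \{\phi_k^1(s)\}^2\big]\,ds
   + \sum_{k=1}^{\infty} \nu_k \int_0^1 \{\phi_k^U(s)\}^2\,ds + \sigma^2 \\
  &= \sum_{k=1}^{\infty} \lambda_k \,\langle \phi_k^X, \phi_k^X \rangle
   + \sum_{k=1}^{\infty} \nu_k \,\lVert \phi_k^U \rVert^2 + \sigma^2
   = \sum_{k=1}^{\infty} \lambda_k + \sum_{k=1}^{\infty} \nu_k + \sigma^2 .
\end{aligned}
```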

Thus, for standardized T_ij, the variation in Y_ij(d) can be decomposed additively into the contributions from the random intercept and random slope process, $\sum_{k = 1}^{\infty} λ_{k}$ , from the visit-specific deviation process, $\sum_{k = 1}^{\infty} ν_{k}$ , and from the additional random noise, σ². This leads to a simple decision rule for N_X and N_U: choose $φ_{k}^{X}$ and $φ_{k}^{U}$ corresponding to λ_k and ν_k in decreasing order, until

{\sum_{k = 1}^{N_{X}} λ_{k} + \sum_{k = 1}^{N_{U}} ν_{k} + σ^{2}} / {\sum_{k = 1}^{\infty} λ_{k} + \sum_{k = 1}^{\infty} ν_{k} + σ^{2}} \geq L,

where L is a pre-specified proportion of explained variation, such as L = 0.95. The sums $\sum_{k = 1}^{\infty} λ_{k}$ and $\sum_{k = 1}^{\infty} ν_{k}$ quantify the relative importance of the X and U processes.
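The decision rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `choose_components` is a hypothetical helper, the infinite sums in the denominator are approximated by whatever (finite) estimated spectra are passed in, σ² is counted in the numerator as in the criterion above, and a tie between a λ_k and a ν_k is broken in favour of X (our choice).

```python
def choose_components(lam, nu, sigma2, L=0.95):
    """Return (N_X, N_U) from the variance-explained rule.

    lam, nu: eigenvalues for X and U (ideally the full estimated spectra,
    standing in for the infinite sums); sigma2: error variance.
    """
    total = sum(lam) + sum(nu) + sigma2
    # Pool both spectra and sort in decreasing order; sorted() is stable,
    # so a tie between lambda_k and nu_k is resolved in favour of X.
    pooled = sorted([(v, "X") for v in lam] + [(v, "U") for v in nu],
                    key=lambda pair: -pair[0])
    n_x = n_u = 0
    explained = sigma2                        # sigma^2 is in the numerator too
    for value, source in pooled:
        if explained / total >= L:
            break
        explained += value
        if source == "X":
            n_x += 1
        else:
            n_u += 1
    return n_x, n_u


# Simulation values from Section 4.1: lambda_k = nu_k = 2^(1-k), sigma = 0.05.
lam = [2.0 ** (1 - k) for k in range(1, 5)]
print(choose_components(lam, lam, 0.05 ** 2, L=0.95))   # (4, 3)
print(choose_components(lam, lam, 0.05 ** 2, L=0.90))   # (3, 3)
```

Pooling the two spectra before sorting is what lets the rule trade off X and U components against each other rather than fixing N_X and N_U separately.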

4. Simulations

4.1. Simulation design

To investigate the performance of our estimation procedure, we conduct an extensive simulation study. The design combines and extends scenarios used by [7] and [50]. For all settings, we generate 1000 data sets from model (2.3), where N_X = N_U = 4. We set the mean function to η(d, T) = 0.5(T/4 − d)². The unequally spaced time points T_ij are simulated so that the increments T_ij − T_{i,j−1} are independent draws from a uniform distribution on [0, 1] and the mean for each subject is zero. The time variable is then standardized to have unit variance. The curves Y_ij(d) are taken to be observed for d ∈ 𝒟 = {(k − 0.5)/D, k = 1, …, D}, D = 120, as in the tract data. We set the variances to $λ_{k} = ν_{k} = 2^{1 - k}$, k = 1, …, 4, and σ = 0.05. With this choice, the error variance σ² accounts for 0.07% of the overall average variance, more than in the tract data (0.02%; see Table 2).
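The time-point mechanism and the quoted error-variance shares can be reproduced with a short sketch. The helpers `simulate_times` and `error_share` are hypothetical, and we assume that "standardized to have unit variance" means dividing all centred times by their pooled standard deviation. By Corollary 3, the share of σ² is σ²/(Σλ_k + Σν_k + σ²); the same formula also gives the 6.25% and 21.05% quoted below for σ = 0.5 and σ = 1.

```python
import random
import statistics


def simulate_times(visits, seed=0):
    """Hypothetical helper: one list of visit times per subject.

    Increments between consecutive visits are Uniform(0, 1); times are centred
    within each subject and then divided by the pooled standard deviation
    (our reading of 'standardized to have unit variance').
    """
    rng = random.Random(seed)
    times = []
    for J in visits:
        t = [0.0]
        for _ in range(J - 1):
            t.append(t[-1] + rng.uniform(0.0, 1.0))
        centre = sum(t) / J
        times.append([x - centre for x in t])
    sd = statistics.pstdev([x for t in times for x in t])
    return [[x / sd for x in t] for t in times]


def error_share(sigma, K=4):
    """Percent of the average variance due to sigma^2 when lambda_k = nu_k = 2^(1-k)."""
    signal = 2 * sum(2.0 ** (1 - k) for k in range(1, K + 1))
    return 100 * sigma**2 / (signal + sigma**2)


# Unbalanced design for I = 50: J_i = 1, ..., 9 used 8, 8, 9, 6, 5, 5, 4, 3, 2 times.
counts = [8, 8, 9, 6, 5, 5, 4, 3, 2]
visits = [J for J, n in zip(range(1, 10), counts) for _ in range(n)]
times = simulate_times(visits)
print(len(visits), sum(visits) / len(visits))              # 50 4.0
print([round(error_share(s), 2) for s in (0.05, 0.5, 1.0)])  # [0.07, 6.25, 21.05]
```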

Table 2.

Average variance $\int_{0}^{1} Var {Y_{i j} (s)} d s$ explained by the different model components in percent.

To obtain the variance explained by $φ_{k}^{0}$ and $φ_{k}^{1}$, the corresponding λ_k is multiplied by $\int {(φ_{k}^{0} (s))}^{2} d s$ and $\int {(φ_{k}^{1} (s))}^{2} d s$, respectively. The cumulative value in row k is the sum of all entries in rows 1 to k. The last row gives the total for each column.

k	φ_k^0	φ_k^1	φ_k^U	σ²	Cumulative
1	37.97	0.21	22.81	0.02	61.01
2	6.76	0.55	5.41	–	73.73
3	3.33	0.32	2.96	–	80.34
4	2.11	0.43	2.07	–	84.95
5	1.50	0.28	1.51	–	88.24
6	0.98	0.19	1.32	–	90.73
Total	52.65	1.98	36.08	0.02	90.73

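The bookkeeping behind Table 2 is simple enough to check directly. The sketch below (helper names are ours) re-adds the published entries to reproduce the cumulative column and the column totals, and shows how a single entry, 100 · λ_k ∫{φ_k^0(s)}² ds divided by the total average variance, would be computed with the integral approximated by a grid average. The final example uses the simulation setting 4(a), not the tract data.

```python
import math

# Entries of Table 2 (percent of average variance); sigma^2 enters once, in row 1.
phi0_pct = [37.97, 6.76, 3.33, 2.11, 1.50, 0.98]
phi1_pct = [0.21, 0.55, 0.32, 0.43, 0.28, 0.19]
phiU_pct = [22.81, 5.41, 2.96, 2.07, 1.51, 1.32]
sigma2_pct = 0.02


def cumulative_column(a, b, c, noise):
    """Running total of all entries in rows 1..k (noise counted in row 1)."""
    running, out = noise, []
    for x, y, z in zip(a, b, c):
        running += x + y + z
        out.append(round(running, 2))
    return out


def explained_pct(lam_k, phi_values, total):
    """100 * lambda_k * (grid average of phi^2) / total average variance."""
    return 100 * lam_k * sum(v * v for v in phi_values) / len(phi_values) / total


print(cumulative_column(phi0_pct, phi1_pct, phiU_pct, sigma2_pct))
print([round(sum(col), 2) for col in (phi0_pct, phi1_pct, phiU_pct)])

# Setting 4(a): lambda_1 = 1, phi_1^0(s) = sin(2 pi s), total average variance 3.7525.
grid = [(k - 0.5) / 120 for k in range(1, 121)]
print(round(explained_pct(1.0, [math.sin(2 * math.pi * s) for s in grid], 3.7525), 2))
```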

We consider all possible combinations of the following scenarios:

1. Number of subjects: (a) I = 50, (b) I = 100, (c) I = 200 and (d) I = 500, including both smaller and larger numbers than in the tract data.
2. Design:
   (a) balanced design with J_i = 4 for all i;
   (b) unbalanced design with J_i ∈ {1, …, 9}, where the values 1, …, 9 occur 8, 8, 9, 6, 5, 5, 4, 3 and 2 times, respectively, for I = 50 (with the counts multiplied accordingly for larger I), giving 4 observations per subject on average.
3. Scores:
   (a) normal scores ξ_ik ~ N(0, λ_k) and ζ_ijk ~ N(0, ν_k) for all i, j and k;
   (b) non-normal scores: ξ_ik drawn with equal probability from either $N (\sqrt{λ_{k} / 2}, λ_{k} / 2)$ or $N (- \sqrt{λ_{k} / 2}, λ_{k} / 2)$, and ζ_ijk drawn with equal probability from either $N (\sqrt{ν_{k} / 2}, ν_{k} / 2)$ or $N (- \sqrt{ν_{k} / 2}, ν_{k} / 2)$.
4. Eigenfunctions:
   (a) $φ_{k}^{X} = (φ_{k}^{0}, φ_{k}^{1})$ with $φ_{k}^{0}$ and $φ_{k}^{1}$ orthogonal and of equal norm $\sqrt{1 / 2}$; the $φ_{k}^{U}$ are not orthogonal to either $φ_{k}^{0}$ or $φ_{k}^{1}$;
   (b) $φ_{k}^{X} = (φ_{k}^{0}, φ_{k}^{1})$ with $φ_{k}^{0}$ and $φ_{j}^{1}$ non-orthogonal and with unequal norms $\sqrt{3 / 4}$ and $\sqrt{1 / 4}$ for $φ_{k}^{0}$ and $φ_{k}^{1}$; for each k, $φ_{k}^{U}$ is equal, up to scaling, to $φ_{j}^{0}$ or $φ_{j}^{1}$ for some j.
5. Smoothing:
   (a) estimation does not include bivariate smoothing of the covariance functions; in this case, smoothing is only used to obtain an estimate of the diagonal K_U(d, d), d ∈ 𝒟, and of σ²;
   (b) estimation includes bivariate smoothing of the covariance functions.
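The two score distributions in scenario 3 can be sketched as follows (function names are ours; the second argument of N(·, ·) is read as a variance, as in the text). The mixture has mean 0 and variance λ_k/2 + λ_k/2 = λ_k, so only its shape differs from the normal: its kurtosis is 2.5 rather than 3, i.e. it is flatter with lighter tails.

```python
import random
import statistics


def normal_score(rng, var):
    """Scenario 3(a): a draw from N(0, var)."""
    return rng.gauss(0.0, var ** 0.5)


def mixture_score(rng, var):
    """Scenario 3(b): with probability 1/2 each, N(+sqrt(var/2), var/2) or N(-sqrt(var/2), var/2)."""
    half_sd = (var / 2) ** 0.5          # both the component mean and its standard deviation
    centre = half_sd if rng.random() < 0.5 else -half_sd
    return rng.gauss(centre, half_sd)


rng = random.Random(42)
draws = [mixture_score(rng, 1.0) for _ in range(200_000)]      # lambda_1 = 1
print(round(statistics.fmean(draws), 3), round(statistics.pvariance(draws), 3))
```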

This gives 4 × 2 × 2 × 2 × 2 = 64 different combinations overall. The eigenfunctions for setting 4(a) are

\begin{array}{lll} φ_{1}^{0} (d) = \sin (2 π d) & φ_{1}^{1} (d) = 1 / \sqrt{2} & φ_{1}^{U} (d) = 1 \\ φ_{2}^{0} (d) = \cos (2 π d) & φ_{2}^{1} (d) = \sin (6 π d) & φ_{2}^{U} (d) = \sqrt{3} (2 d - 1) \\ φ_{3}^{0} (d) = \sin (4 π d) & φ_{3}^{1} (d) = \cos (6 π d) & φ_{3}^{U} (d) = \sqrt{5} (6 d^{2} - 6 d + 1) \\ φ_{4}^{0} (d) = \cos (4 π d) & φ_{4}^{1} (d) = \sin (8 π d) & φ_{4}^{U} (d) = \sqrt{7} (20 d^{3} - 30 d^{2} + 12 d - 1) . \end{array}

Note that while $φ_{k}^{0}$ and $φ_{k}^{1}$ are orthogonal, they are in general not orthogonal to the $φ_{j}^{U}$, j ≠ 1. The eigenfunctions for setting 4(b) are

\begin{array}{ll} φ_{1}^{0} (d) = \sin (2 π d) \sqrt{3 / 2} & φ_{1}^{1} (d) = 1 / 2 \\ φ_{2}^{0} (d) = \cos (2 π d) \sqrt{3 / 2} & φ_{2}^{1} (d) = \sqrt{3} (2 d - 1) / 2 \\ φ_{3}^{0} (d) = \sin (4 π d) \sqrt{3 / 2} & φ_{3}^{1} (d) = \sqrt{5} (6 d^{2} - 6 d + 1) / 2 \\ φ_{4}^{0} (d) = \cos (4 π d) \sqrt{3 / 2} & φ_{4}^{1} (d) = \sqrt{7} (20 d^{3} - 30 d^{2} + 12 d - 1) / 2 \\ φ_{1}^{U} (d) = \sqrt{4} φ_{1}^{1} (d) & \\ φ_{2}^{U} (d) = \sqrt{4 / 3} φ_{1}^{0} (d) & \\ φ_{3}^{U} (d) = \sqrt{4 / 3} φ_{2}^{1} (d) & \\ φ_{4}^{U} (d) = \sqrt{4 / 3} φ_{3}^{1} (d) . & \end{array}

Note that now $φ_{k}^{0}$ and $φ_{j}^{1}$ are non-orthogonal, and $φ_{k}^{0}$ has a larger norm than $φ_{k}^{1}$. Also, for each k, $φ_{k}^{U}$ is equal (up to scaling) to one of the $φ_{j}^{0}$ or $φ_{j}^{1}$, j = 1, …, 4, which makes separating the two processes X and U much more difficult. For bivariate smoothing of the mean and covariance functions, we use tensor-product penalized cubic regression splines with 10 knots per dimension, with smoothing parameters estimated by REML, as implemented in the R package mgcv [47].
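The orthogonality properties stated for setting 4(a) can be checked numerically on the sampling grid 𝒟. The sketch below (ours) approximates L²[0, 1] inner products by grid averages; for the pairs φ_k^X = (φ_k^0, φ_k^1) the inner product is the sum of the inner products of the two parts.

```python
import numpy as np

D = 120
d = (np.arange(1, D + 1) - 0.5) / D            # sampling grid of Section 4.1

# Setting 4(a) eigenfunctions; column k-1 holds the k-th function.
phi0 = np.column_stack([np.sin(2 * np.pi * d), np.cos(2 * np.pi * d),
                        np.sin(4 * np.pi * d), np.cos(4 * np.pi * d)])
phi1 = np.column_stack([np.full(D, 1 / np.sqrt(2)), np.sin(6 * np.pi * d),
                        np.cos(6 * np.pi * d), np.sin(8 * np.pi * d)])
phiU = np.column_stack([np.ones(D), np.sqrt(3) * (2 * d - 1),
                        np.sqrt(5) * (6 * d**2 - 6 * d + 1),
                        np.sqrt(7) * (20 * d**3 - 30 * d**2 + 12 * d - 1)])


def gram(a, b):
    """Matrix of inner products of the columns of a and b, as grid averages."""
    return a.T @ b / D


gram_X = gram(phi0, phi0) + gram(phi1, phi1)    # inner products of the pairs (phi_k^0, phi_k^1)
print(np.round(np.diag(gram(phi0, phi0)), 3))   # each phi_k^0 has squared norm 1/2
print(np.round(gram_X, 3))                      # identity: the phi_k^X are orthonormal
print(np.round(gram(phiU, phiU), 3))            # approximately the identity
print(np.round(gram(phi0, phiU), 3))            # zero first column, non-zero elsewhere
```

The small departures of the φ_k^U Gram matrix from the identity come from the midpoint rule, which integrates the Legendre products only approximately.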

To investigate the sensitivity of our results to our choices for η and σ, we also consider four variations of the setting with a balanced design with I = 100 and J_i = 4 for all i (1b and 2a), non-orthogonal $φ_{k}^{0}, φ_{k}^{1}$ and $φ_{k}^{U}$ with unequal weight on $φ_{k}^{1}$ and $φ_{k}^{0}$ (4b), a mixture distribution for the scores ξ_ik and ζ_ijk (3b), and bivariate smoothing of the covariance functions (5b). In the first two variations we replace the mean function by η(d, T) = (T/4 − d/D + 1/2)(T/4 + d/D − 1/2) and η(d, T) = sin(πT/2)d/D, respectively; in the other two we increase the error standard deviation to σ = 0.5 (the error variance σ² then accounts for 6.25% of the overall average variance) and σ = 1 (21.05%), respectively.

For each of the 1000 replications and for each of the 68 settings, our estimation procedure from Section 3 with N_X = N_U = 4 is used to obtain estimates of the mean function, the covariance functions, the eigenfunctions, the scores, and all variances.

4.2. Simulation results

We now discuss results for one of the 68 settings in detail, and point out differences across settings. The complete simulation results can be found in the supplementary material.

Figure 2 and Table 1 show the main results of simulations based on a balanced design with I = 100 and J_i = 4 for all i (1b and 2a), non-orthogonal $φ_{k}^{0}, φ_{k}^{1}$ and $φ_{k}^{U}$ with unequal weight on $φ_{k}^{1}$ and $φ_{k}^{0}$ (4b), a mixture distribution for the scores ξ_ik and ζ_ijk (3b), and no bivariate smoothing of the covariance functions (5a). A plot of the true and estimated mean functions, given in the supplementary material, shows that the mean is estimated well and without noticeable bias.

Fig 2 — True and estimated eigenfunctions $φ_{k}^{X} = (φ_{k}^{0}, φ_{k}^{1})$ and $φ_{k}^{U}$, k = 1, …, 4. The left column gives results for the part $φ_{k}^{0}$ corresponding to the random functional intercept X_i,0, the middle column for the part $φ_{k}^{1}$ corresponding to the random functional slope X_i,1, and the right column for the component $φ_{k}^{U}$ corresponding to the visit-specific functional deviation U_ij. Shown are the true function (thick black line), the mean of the estimated functions over 1000 simulations (dashed red line), the pointwise 5th and 95th percentiles of the estimated functions (blue), and the estimated functions from the first 50 simulations (grey). Simulations were based on model (2.3) with N_X = N_U = 4, a balanced design with I = 100 and J_i = 4 for all i, non-orthogonal $φ_{k}^{0}, φ_{k}^{1}$ and $φ_{k}^{U}$ with unequal weight on $φ_{k}^{1}$ and $φ_{k}^{0}$, a mixture distribution for the scores ξ_ik and ζ_ijk, and no bivariate smoothing of the covariance functions.

Table 1.

True and estimated subject-specific and visit-specific scores ξ_ik and ζ_ijk. Given are summary statistics of the scaled differences $({\hat{ξ}}_{i k} - ξ_{i k}) / \sqrt{λ_{k}}$ (top) and $({\hat{ζ}}_{ijk} - ζ_{ijk}) / \sqrt{ν_{k}}$ (bottom), k = 1, …, 4. Simulations were based on model (2.3) with N_X = N_U = 4, a balanced design with I = 100 and J_i = 4 for all i, non-orthogonal $φ_{k}^{0}, φ_{k}^{1}$ and $φ_{k}^{U}$ with unequal weight on $φ_{k}^{1}$ and $φ_{k}^{0}$, a mixture distribution for the scores ξ_ik and ζ_ijk, and no bivariate smoothing of the covariance functions.

Score	Minimum	1st Quartile	Median	Mean	3rd Quartile	Maximum
ξ_i1	−2.39	−0.31	0.00	0.00	0.30	2.56
ξ_i2	−2.52	−0.24	0.00	0.00	0.23	3.13
ξ_i3	−3.42	−0.31	0.00	0.00	0.31	3.39
ξ_i4	−4.17	−0.20	0.00	0.00	0.20	4.58

ζ_ij1	−2.80	−0.18	0.00	0.00	0.18	3.11
ζ_ij2	−3.68	−0.39	0.00	0.00	0.39	4.26
ζ_ij3	−2.77	−0.30	0.01	0.01	0.31	2.54
ζ_ij4	−3.38	−0.34	0.00	0.00	0.34	4.16


Figure 2 shows the true and estimated eigenfunctions $φ_{k}^{X} = (φ_{k}^{0}, φ_{k}^{1})$ and $φ_{k}^{U}$ , k = 1, …, 4. Results for $φ_{k}^{0}, φ_{k}^{1}$ and $φ_{k}^{U}$ are displayed in the left, middle and right panels, respectively. Shown are the true function (thick black line), the mean of the estimated functions over 1000 simulations (dashed red line), the pointwise 5th and 95th percentiles of the estimated functions (blue), and the estimated functions from the first 100 simulations (grey). Note that the covariance functions, and thus the eigenfunctions, are not smoothed in this setting.

For all functions, the mean of the estimated functions is very close to the true function, and the variability around it is small. The $φ_{k}^{U}$ are estimated slightly better, because estimation of the covariance function K_U(d, d′) is based on all n = ΣJ_i visits, while estimation of K_X(d, d′) is based on only the I subjects, with I = n/4 in this setting. Here, $φ_{k}^{0}$ also has a larger norm than $φ_{k}^{1}$, which makes this component easier to estimate; this is visible as a smaller variance for $φ_{k}^{0}$. Nevertheless, the $φ_{k}^{1}$ are also estimated remarkably well. Overall, all functions are estimated very well, even though a) the $φ_{k}^{0}$ and $φ_{j}^{1}$ are not mutually orthogonal, and b) each $φ_{k}^{U}$ is equal, up to scaling, to either $φ_{j}^{0}$ or $φ_{j}^{1}$ for some j. Our estimation procedure thus effectively separates the X and U processes, even in this difficult setting and with a moderate sample size.

Table 1 provides results for the scores ξ_ik and ζ_ijk, k = 1, …, 4, in the form of summary statistics for the scaled differences between estimated and true scores, $({\hat{ξ}}_{i k} - ξ_{i k}) / \sqrt{λ_{k}}$ and $({\hat{ζ}}_{ijk} - ζ_{ijk}) / \sqrt{ν_{k}}$. Most estimates lie close to the true scores relative to the standard deviation of the respective score, although the distribution of the differences has heavier tails than a normal distribution. This might be expected, because the principal components $φ_{k}^{X}$ and $φ_{k}^{U}$ in model (2.3) are estimated rather than observed.
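To show what a score prediction involves once the components are available, the sketch below computes the standard best linear unbiased predictor b̂ = G Z′ V⁻¹ y for one subject, with V = Z G Z′ + σ² I. This is our illustration, not the authors' code: unlike the actual procedure, which first estimates the eigenfunctions and variances, it treats the setting 4(a) eigenfunctions, λ_k = ν_k = 2^(1−k) and σ = 0.05 as known and assumes the mean has already been removed. The helper `blup_scores` is hypothetical.

```python
import numpy as np

D, K = 120, 4
d = (np.arange(1, D + 1) - 0.5) / D

# Setting 4(a) eigenfunctions and variances (Section 4.1), treated as known here.
phi0 = np.column_stack([np.sin(2 * np.pi * d), np.cos(2 * np.pi * d),
                        np.sin(4 * np.pi * d), np.cos(4 * np.pi * d)])
phi1 = np.column_stack([np.full(D, 1 / np.sqrt(2)), np.sin(6 * np.pi * d),
                        np.cos(6 * np.pi * d), np.sin(8 * np.pi * d)])
phiU = np.column_stack([np.ones(D), np.sqrt(3) * (2 * d - 1),
                        np.sqrt(5) * (6 * d**2 - 6 * d + 1),
                        np.sqrt(7) * (20 * d**3 - 30 * d**2 + 12 * d - 1)])
lam = 2.0 ** -np.arange(K)
nu = lam.copy()
sigma2 = 0.05 ** 2


def blup_scores(y, t):
    """Predict (xi_i, zeta_i) for one subject from centred curves y (J x D) at times t."""
    J = len(t)
    Z = np.zeros((J * D, K + J * K))
    for j in range(J):
        rows = slice(j * D, (j + 1) * D)
        Z[rows, :K] = phi0 + t[j] * phi1                  # xi_ik enters every visit
        Z[rows, K + j * K:K + (j + 1) * K] = phiU          # zeta_ijk enters visit j only
    G = np.diag(np.concatenate([lam, np.tile(nu, J)]))    # covariance of the scores
    V = Z @ G @ Z.T + sigma2 * np.eye(J * D)              # marginal covariance of y
    b = G @ Z.T @ np.linalg.solve(V, y.ravel())
    return b[:K], b[K:].reshape(J, K)


rng = np.random.default_rng(0)
t = np.array([-1.5, -0.5, 0.5, 1.5]) / np.sqrt(1.25)      # mean 0, variance 1
xi = rng.normal(0.0, np.sqrt(lam))
zeta = rng.normal(0.0, np.sqrt(nu), size=(len(t), K))
y = np.array([(phi0 + tj * phi1) @ xi + phiU @ zeta[j] for j, tj in enumerate(t)])
y += rng.normal(0.0, np.sqrt(sigma2), size=y.shape)

xi_hat, zeta_hat = blup_scores(y, t)
print(np.round((xi_hat - xi) / np.sqrt(lam), 3))           # scaled errors as in Table 1
```

The predictor uses only first and second moments, which is consistent with the robustness to non-normal scores reported below.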

Further figures in the supplementary material show results for the estimation of the variances, σ², λ_k and ν_k, k = 1, …, 4. The ν̂_k are centered at the true values ν_k, with about 70% within 10% and more than 95% within 20% of the true value. The λ̂_k show a slight downward bias and somewhat larger variation, reflecting the smaller effective sample size for estimation of these variance components. σ² is estimated almost as well as the ν_k.

Overall, the estimation procedure performed very well in a wide range of scenarios. Across simulations, we found the following similarities and differences. First, results improve as the number of subjects I increases. As expected, a larger I decreases the variability of the estimated eigenfunctions, mean function, scores and variances, and the slight downward bias in the λ̂_k disappears. Second, a balanced design (2a) improves results compared with an unbalanced design (2b) with the same number of subjects and visits. A balanced design leads to a) decreased variability in the estimated mean η̂(d, T), as we estimate the mean under a working independence assumption before estimating the complex covariance structure; and b) decreased variability in the estimated eigenfunctions ${\hat{φ}}_{k}^{X}$ and decreased variability and small-sample bias in the variances λ̂_k, k = 1, …, N_X. This is similarly due to the fact that we estimate the covariance functions using least squares under a working independence assumption. Estimation of the $φ_{k}^{U}$ is not much affected by how balanced the design is, although there is some indication that small estimates λ̂_k are compensated for by a slight increase in the ν̂_k. Third, results for normal scores (3a) and non-normal scores (3b) were virtually identical. This is expected, as BLUPs do not rely on a normality assumption and should therefore be robust to departures from normality; still, it is reassuring to see this confirmed in the simulations. Fourth, non-orthogonality of the $φ_{k}^{0}$ and $φ_{j}^{1}$ (4b) does not affect results compared with orthogonality (4a). Even though in (4b) each $φ_{k}^{U}$ is equal to either $φ_{j}^{0}$ or $φ_{j}^{1}$ for some j, estimation of the $φ_{k}^{U}$ is equally good in both cases. The only consistent difference between the two settings is that, because $φ_{k}^{0}$ accounts for a larger share of the norm of $φ_{k}^{X}$ in (4b), estimation of $φ_{k}^{0}$ improves somewhat, while estimation of $φ_{k}^{1}$ deteriorates slightly. Fifth, results excluding (5a) and including (5b) bivariate smoothing of the covariance functions were very similar, with the smoothed version more effective at filtering out the measurement errors ε_ij(d) and producing smooth eigenfunctions $φ_{k}^{X}$ and $φ_{k}^{U}$.

Our sensitivity analyses indicate that results are not very sensitive to the choice of the mean function: all three mean functions considered were estimated well and without noticeable bias. Large error variances increase the variability of all estimates. When signal-to-noise ratios become small because of very large error variances and small variances λ_k or ν_k, the magnitude (but not the shape) of the eigenfunctions is somewhat underestimated. This is because the covariance functions, which are quadratic in the eigenfunctions, are estimated without bias, whereas for any random variable X, $∣ E (X) ∣ = \sqrt{E (X^{2}) - Var (X)} < \sqrt{E (X^{2})}$ whenever Var(X) > 0; large variability of the estimated eigenfunctions therefore attenuates their average magnitude.

4.3. Computational efficiency

To investigate computation time, we considered different combinations of the number of subjects I ∈ {25, 50, 100, 200, 500, 1000}, the number of observations per subject J ∈ {4, 8}, and the number of sample points per curve D ∈ {50, 100, 200, 500}. All other parameters were chosen as for the simulations described in Section 4.1, with setting combination 2(a), 3(a), 4(a), 5(a).

Figure 3 provides the computation times. System times were (for practical reasons) measured on three different cluster nodes running 64-bit Red Hat Linux, with 2.3/2.6/3.0 GHz AMD Opteron processors and 32 GB random access memory. Figures 3 a) and c) display computation time versus the number of subjects I for J = 4 and J = 8, respectively, stratified by D. For example, computation for I = 100 subjects with J = 4 visits and D = 100 points per curve took just 1.4 minutes, while computation for I = 1000, J = 8 and D = 200 took 72 minutes. Figures 3 b) and d) display computation times versus the number of sample points per curve, D, stratified by I. As suggested by Corollaries 1 and 2, computation time is roughly linear in I and between linear and quadratic in D. A linear regression of the log computation time log(C) on log(I), log(J) and log(D) yields $\widehat{\log (C)} = - 7.12 + 0.93 \log (I) + 1.38 \log (D) + 0.82 \log (J)$. The coefficient for log(D) is indeed between 1 and 2, as expected from Corollaries 1 and 2. The coefficients for log(I) and log(J) are close to, but somewhat below, 1, reflecting that some parts of the estimation procedure do not depend on I and J. The adjusted R² of the model is 0.98, suggesting that this regression equation can give good estimates of computation time on comparable machines also for parameter combinations not considered here. Note, however, that for very large I and, especially, D, memory might be more of a concern than computational efficiency. In that case, our efficient matrix computations can be replaced by less efficient methods that optimize memory usage.
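The fitted regression can be turned into a rough run-time predictor. The text states neither the base of the logarithm nor the unit of C; assuming natural logarithms and seconds (our assumption), the equation gives about 1.75 minutes for I = 100, J = 4, D = 100 and about 68.5 minutes for I = 1000, J = 8, D = 200, close to the measured 1.4 and 72 minutes. `predicted_seconds` is a hypothetical helper.

```python
import math


def predicted_seconds(I, J, D):
    """Computation time implied by the fitted log-log regression.

    Assumes natural logarithms and C in seconds; neither is stated in the text.
    """
    log_c = -7.12 + 0.93 * math.log(I) + 1.38 * math.log(D) + 0.82 * math.log(J)
    return math.exp(log_c)


for I, J, D in [(100, 4, 100), (1000, 8, 200)]:
    print(I, J, D, round(predicted_seconds(I, J, D) / 60, 1), "minutes")
```

On the log scale the coefficients act as elasticities: under this fit, doubling D multiplies the predicted time by 2^1.38 ≈ 2.6, whereas doubling I multiplies it by only about 2^0.93 ≈ 1.9.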

Fig 3 — Computation time for LFPCA for a simulated data set with the given number of subjects I and number of observations per subject J, and with D sample points per curve. Specifics of how computation time was measured are given in Section 4.3.

5. Application

In this section, we use LFPCA to decompose the variability in the tractography data. We first provide the scientific background.

5.1. Background and scientific questions

Multiple sclerosis (MS) is a disorder of the central nervous system (CNS) [e.g. 4]. MS causes typical abnormalities on magnetic resonance imaging (MRI) scans of the brain and spinal cord, and consequently MRI has become the primary diagnostic tool for MS. MRI scanning is increasingly used to monitor disease progression and response to therapy and has become an important surrogate outcome measure in clinical trials.

Diffusion tensor imaging (DTI), in contrast to conventional MRI techniques, is able to resolve individual functional tracts within the CNS white matter, the primary target of MS. DTI is sensitive to diffusion anisotropy, which, in the brain and spinal cord, corresponds to the tendency of water to diffuse along axonal tracts [1]. A focus on one or several tracts with specific functional correlates can then help in understanding the neuroanatomical basis of disability in MS. Quantitative measures derived from DTI data include fractional anisotropy (FA), measuring the degree of anisotropy between 0 and 1 [2]. FA can be decreased in MS due to lesions, loss of myelin and axon damage [42, 45].

Measurement of tract-specific MRI indices has traditionally worked with averages along tracts, ignoring the spatial variation of those indices within tracts [25, 32, 36]. However, that spatial variation can be considerable. The extent to which accounting for this spatial variation can improve detection of abnormality, correlation with disability, or sensitivity to change across time, remains uncertain. The last of these is particularly relevant for monitoring individual patients in the clinic and for the design and powering of clinical trials of new drugs.

We are interested in using the full spatial information to gain a better understanding of differences between subjects both with respect to their mean tract profiles over time (static behavior) and to the changes in those tract profiles over time (dynamic behavior). Our data set includes measurements for 84 MS patients and 28 controls with 1 to 8 complete visits, giving 308 visits overall. At each visit, we have measurements of FA and the diffusivities along several tracts in the brain, which were reconstructed using the tract finding algorithm of [28]. We will focus here on the corpus callosum, a tract connecting the two hemispheres of the brain. The 120 sample points - from the splenium (back of the head) to the genu/rostrum (closer to the eyes) - were chosen equidistantly between certain landmarks on the tract used for registration of curves between subjects [33]. The corpus callosum and its segmentation are illustrated in Figure 1 (top). Figure 1 (bottom) shows example profiles from two MS patients, illustrating the variability of profiles between subjects and within subjects over time. Visual examination of these tract profiles reveals variation within subjects across both space and time but no clear and consistent trend over time.

5.2. Application of LFPCA to the tractography data

Because changes in MRI hardware over the five years of the study caused some variation in the measured MRI indices, we use a preprocessing step to remove differences due to variation in scanning technique. For each of the five scanning epochs, we estimate a mean profile for cases alone, using one visit per subject. This avoids confounding of disease status with epoch, caused by the uneven distribution of cases and controls among epochs, as well as confounding by disease progression. We then subtract, from all functional observations in each epoch, the difference between that epoch's mean profile and the overall mean profile.
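The epoch correction can be sketched as follows. Two details are not fixed by the text and are our assumptions: the "one visit per subject" is taken to be the first case visit of each subject within an epoch, and the overall mean profile is the mean of all visits selected in this way. `remove_epoch_effects` is a hypothetical helper, and the toy data below are random numbers with an artificial epoch shift, not study data.

```python
import numpy as np


def remove_epoch_effects(Y, epoch, subject, is_case):
    """Subtract (epoch mean profile - overall mean profile) from every visit of each epoch.

    Y: (n_visits, D) profiles; epoch, subject, is_case: length-n_visits sequences.
    """
    epoch = np.asarray(epoch)
    selected = {}                                  # (epoch, subject) -> first case visit
    for row, (e, s, c) in enumerate(zip(epoch, subject, is_case)):
        if c and (e, s) not in selected:
            selected[(e, s)] = row
    overall = Y[list(selected.values())].mean(axis=0)
    corrected = Y.astype(float)                    # astype returns a copy
    for e in np.unique(epoch):
        rows = [r for (ep, _), r in selected.items() if ep == e]
        if rows:                                   # epochs without cases stay unchanged
            corrected[epoch == e] -= Y[rows].mean(axis=0) - overall
    return corrected


rng = np.random.default_rng(0)
n, D = 60, 10
epoch = np.arange(n) // 12                         # 5 toy epochs with 12 visits each
subject = np.arange(n) // 2                        # 2 visits per subject, same epoch
is_case = subject % 3 != 0
Y = rng.normal(size=(n, D)) + 0.5 * epoch[:, None]  # toy profiles with an epoch shift
corrected_Y = remove_epoch_effects(Y, epoch, subject, is_case)
```

After the correction, the selected case visits have the same mean profile in every epoch, and each visit is shifted by a single epoch-specific profile, so within-epoch differences between subjects are untouched.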

We obtain a decomposition of the variance using LFPCA. The time variable is centered by subject and standardized. For bivariate smoothing of the mean and covariance functions, we use tensor product penalized cubic regression splines with 30 knots per dimension, with smoothing parameters estimated using REML. A graph of the mean function η(d, T) is given in the supplementary material. The mean profile is roughly constant over time, with some variation near areas of high curvature.

For a pre-specified level L = 90% of explained average variance $\int_{0}^{1} Var {Y_{i j} (s)} d s$, LFPCA gives N_X = N_U = 6 principal components $φ_{k}^{X}$ and $φ_{k}^{U}$ for the X and U processes. The decomposition of the average variance is given in Table 2. The first principal component for X, $φ_{1}^{X}$, explains 38% of the variation, and the first principal component for U, $φ_{1}^{U}$, another 23%. Overall, the first six components $φ_{k}^{X}$, k = 1, …, 6, explain 55% of the average variance, indicating that the X process captures most of the variation in the data. Within X, most of the variation is explained by the random functional intercept X_i,0, but the variance due to the subject-specific random slope is still large compared with the measurement error. Note also that the study period is much shorter than the disease duration for some of the patients, so X_i,1 might still be of large practical relevance over many years. Within-curve measurement error is negligible (0.02%) because of a smoothing step during profile construction. Estimated variances were λ̂_k = 0.316, 0.060, 0.030, 0.021, 0.015, 0.010, k = 1, …, 6, ν̂_k = 0.189, 0.045, 0.025, 0.017, 0.012, 0.011, k = 1, …, 6, and σ̂² = 0.0002.
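Because the time variable was standardized, Corollary 3 says each λ̂_k should itself be the share of variance explained by φ_k^X. A short consistency check (ours) illustrates this: assuming the 90.73% in Table 2 equals (sum of the twelve reported eigenvalues + σ̂²) divided by the total average variance, we back out that total and convert each eigenvalue to a percentage. The results agree with the φ_k^0 + φ_k^1 and φ_k^U entries of Table 2 to within about 0.1 percentage points, consistent with the three-decimal rounding of the reported estimates.

```python
lam_hat = [0.316, 0.060, 0.030, 0.021, 0.015, 0.010]
nu_hat = [0.189, 0.045, 0.025, 0.017, 0.012, 0.011]
sigma2_hat = 0.0002

# Assumption: Table 2's cumulative 90.73% = (these eigenvalues + sigma^2) / total.
total = (sum(lam_hat) + sum(nu_hat) + sigma2_hat) / 0.9073

lam_pct = [100 * v / total for v in lam_hat]   # compare with phi^0 + phi^1 columns
nu_pct = [100 * v / total for v in nu_hat]     # compare with the phi^U column
print([round(p, 2) for p in lam_pct])
print([round(p, 2) for p in nu_pct])
```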

Figure 4 shows the first three estimated principal components for the random intercept and slope process X. The left column gives estimates for the $φ_{k}^{0}$, corresponding to the random functional intercept X_i,0. Depicted are estimates for the overall mean η(d) (solid line), and for $η (d) \pm 2 \sqrt{λ_{k}} φ_{k}^{0}$, k = 1, 2, 3 (+ and −, respectively). The middle column gives the corresponding results for the random functional slope X_i,1. The right column shows boxplots for the estimates of the scores ξ_ik corresponding to ($φ_{k}^{0}, φ_{k}^{1}$), k = 1, 2, 3, by case/control group. Estimated scores for the two example patients with tract profiles shown in Figure 1 are indicated by A and B, respectively.

Fig 4 — The first three estimated principal components for the random intercept and slope process X. The left column gives estimates for the $φ_{k}^{0}$ , corresponding to the random functional intercept X_i,0. Depicted are estimates for the overall mean η(d) (solid line), and for $η (d) \pm 2 \sqrt{λ_{k}} φ_{k}^{0}$ , k = 1, 2, 3 (+ and −, respectively). The middle column gives the corresponding results for the random functional slope X_i,1. The right column shows boxplots for the estimates of the scores ξ_ik corresponding to ( $φ_{k}^{0}, φ_{k}^{1}$ ), k = 1, 2, 3, by case/control group. Estimated scores for the two example patients with tract profiles shown in Figure 1 are indicated by A and B, respectively.

Positive loadings ξ̂_i₁ > 0 on the first component correspond to a lower mean function with a particularly deep FA dip in the isthmus (around 20), but only to small changes of profiles over time. For example, patient A, whose dip is much lower than B's, loads positively on this component, while B’s loading is roughly zero. The second component is a mean contrast, with positive scores corresponding to a lower dip around 20 and a higher plateau around 60. The corresponding change over time is similar, if smaller in magnitude, and could explain how the differences in mean profiles evolved over time. The large positive score ξ̂_i₂ of patient A corresponds to the large contrast between low dip and high plateau in this patient, which is barely present in patient B (roughly zero score). The third component shows a shift of the location of the dip, which might reflect differences in anatomy that affect the tractography. This goes hand in hand with a slight further shifting and deepening (for negative scores) of the dip over time. For example, patient A, in contrast to patient B, exhibits more of a deepening of the dip and a shift to the left, with a correspondingly negative score ξ̂_i₃. As mentioned above, these consistent changes over time are not immediately apparent from an examination of the tract profiles in Figure 1 but are clearly revealed by loadings on the principal components derived from the LFPCA analysis.

Figure 5 shows the corresponding results for the visit-specific functional deviation U. ${\hat{φ}}_{1}^{U}$ is similar in shape to ${\hat{φ}}_{1}^{0}$ . Patient A at visit 8, for example, shows a lower profile than would be expected from the average evolution in this patient over time, and consequently has a positive score ζ̂_ij₁, with the converse being true for patient B at visit 2. Components ${\hat{φ}}_{2}^{U}$ and ${\hat{φ}}_{3}^{U}$ seem to pick up variation at the ends of the tract as well as visit-to-visit shifts of the location of the dip, which might be due to measurement error. Note that the U process captures both measurement error and true biological fluctuations, which are impossible to separate without additional subject-matter insight. Filtering out these processes allows us to study the systematic trends modeled by X.

Fig 5 — The first three estimated principal components for the visit-specific deviation process U. The left column gives results for the principal components $φ_{k}^{U}$, depicting estimates for the overall mean η(d) (solid line), and for $η (d) \pm 2 \sqrt{ν_{k}} φ_{k}^{U}$, k = 1, 2, 3 (+ and −, respectively). The right column shows boxplots for the estimates of the scores ζ_ijk corresponding to $φ_{k}^{U}$, k = 1, 2, 3, by case/control group. Estimated scores for example visits of the two patients with tract profiles shown in Figure 1 are indicated by A (visit 8) and B (visit 2), respectively.

Our model allows straightforward inclusion of additional covariates such as case/control status, disease severity, medication or age in the mean function η. In this study, however, we were interested in how the main variations in tract profiles and their changes over time differed by case/control group, i.e. in the covariance part of the model. When group-specific fixed-effect means are the target of inference, our approach could be used to improve confidence bands for the group-specific mean difference. For example, the group-specific means could first be estimated under independence, the covariances could then be estimated using LFPCA, and the estimates of the group-specific mean differences could finally be refined using the estimated covariance structure. This process can even be iterated until convergence.

Focusing on the covariances, we find a statistically significant difference in the distribution of the estimated scores ξ̂_i₁ between MS patients and controls (p = 0.0056 in a Mann-Whitney-Wilcoxon test; the difference is also significant in a linear regression adjusting for age and sex). The patient group in particular seems to have a higher mean and a heavier right tail. This could indicate that the patient group is a mixture of patients who are more or less affected by MS along this particular tract. Potential loading-based clustering into patient subgroups will be of interest in future work. Interestingly, FA for this component is not decreased uniformly along the tract, but only posterior to the genu (ca. 1–100), with the decrease being especially pronounced in the area of the isthmus (ca. 20). Our results thus identify the region of the corpus callosum (the isthmus) where MS seems to take its greatest toll and also define the ways in which that portion of the tract changes from one year to the next. In future work, we plan to examine whether these changes can portend disease course. This result could not have been obtained by using the tract-averaged FA instead of our functional approach.

6. Discussion

We have introduced methods for functional data that is observed at multiple time points for the same subject. Our methods can be viewed as extending longitudinal mixed effects models by replacing the random effects with random processes. Models are designed to decompose the longitudinal functional data into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject/visit-specific variability and measurement error. We propose an estimation procedure based on an eigen expansion that is highly computationally efficient and performs well in a wide range of simulations and in our application.

Our work is different from functional data methodology applied to the analysis of longitudinally observed scalar data [41, among others], but builds on methods from both functional and longitudinal data analysis [8, 34]. While the considered model shares similarities with models used by [15, 29], we do not assume Gaussianity and our approach is based on functional principal component analysis. In addition to computational advantages (compare Section 4.3 and [19]), this allows the extraction of the main differences between subjects in the dynamics of their profiles over time, something of interest in many applications including our tractography study. Also, our work is different from methods for the 3-D analysis of subject-specific DTI studies [see for example 18]. It takes a functional data approach to tract data, as has recently been done for non-longitudinal DTI tractography data in [52].

Our approach can serve as a stepping-stone for further developments in the field of longitudinally observed functional data, and lends itself well to extensions. As our estimation procedure performs best when the number of time points per subject is balanced, it might be interesting to investigate further improvements, such as via iterations between mean and covariance estimation. Using an iterative approach, [49] find improvements to the integrated mean squared errors that are most pronounced for sparse functional data, where the number of sample points per curve is small. We will pursue such an approach in the future, in particular if dealing with sparse longitudinal functional data. While the functional random intercept-random slope model was sufficient for our application, it would also be interesting to apply our general model in more complex settings. And as our methods extract the main modes of variation in longitudinal functional data, including differences in mean curves and changes in curves over time, the associated scores could be of interest for further use in regression or classification.

Supplementary Material

Supplementary materials to “Longitudinal functional principal component analysis” by S. Greven, C. Crainiceanu, B. Caffo and D. Reich (doi: 10.1214/10-EJS575SUPP).


Web supplement

NIHMS299819-supplement-Web_supplement.zip (15.1 MB, zip)

Acknowledgments

We thank the editor, David Ruppert, for his helpful comments which led to an improved version of the manuscript. The MRI scans used in Section 5 were obtained through a generous grant from the National Multiple Sclerosis Society (TR3760A3) to Dr. Peter Calabresi, whom we gratefully acknowledge.

Appendix A: Theoretical results and proofs

Theorem 1

Estimates of the covariance functions in Step A can be expressed as $\hat{β}_{1} = (X_{1}' X_{1})^{-1} X_{1}' c_{1}$ and $\hat{β}_{2} = (X_{2}' X_{2})^{-1} X_{2}' c_{2}$. Here, β₁ is the 5 × {D(D − 1)/2} matrix with columns {K₀(d, d′), K₀₁(d, d′), K₀₁(d′, d), K₁(d, d′), K_U(d, d′)}′ corresponding to grid points d < d′, and β₂ is the 4 × D matrix with columns {K₀(d, d), K₀₁(d, d), K₁(d, d), K_U(d, d) + σ²}′ corresponding to grid points d. X₁ is the $(\sum_{i=1}^{I} J_{i}^{2}) \times 5$ matrix with rows (1, T_ik, T_ij, T_ijT_ik, δ_jk), j, k = 1, …, J_i, i = 1, …, I, and X₂ is the $\{\sum_{i=1}^{I} J_{i}(J_{i}+1)/2\} \times 4$ matrix with rows (1, T_ij + T_ik, T_ijT_ik, δ_jk), j ≤ k = 1, …, J_i, i = 1, …, I. c₁ is the $(\sum_{i=1}^{I} J_{i}^{2}) \times \{D(D-1)/2\}$ matrix with columns (Y_ij(d)Y_ik(d′), j, k = 1, …, J_i, i = 1, …, I) corresponding to grid points d < d′, and c₂ is the $\{\sum_{i=1}^{I} J_{i}(J_{i}+1)/2\} \times D$ matrix with columns (Y_ij(d)Y_ik(d), j ≤ k = 1, …, J_i, i = 1, …, I) corresponding to grid points d.

Proof

Consider least squares estimation of (K₀(d, d′), K₀₁(d, d′), K₀₁(d′, d), K₁(d, d′), K_U(d, d′) + σ²$δ_{dd'}$) for all grid points d ≤ d′ on the basis of (3.1). First, note that the design matrix in the corresponding linear regression is block diagonal: the blocks corresponding to (d, d′), d < d′, contain the entries (1, T_ik, T_ij, T_ijT_ik, δ_jk) in the row corresponding to Y_ij(d)Y_ik(d′), and the blocks corresponding to (d, d) contain the entries (1, T_ij + T_ik, T_ijT_ik, δ_jk) in the row corresponding to Y_ij(d)Y_ik(d). Second, note that the blocks are identical across all pairs (d, d′) with d < d′, and likewise across all pairs (d, d). The normal equations therefore decouple into one small system per block, all sharing the same coefficient matrix, so the least squares estimates can be expressed as $\hat{β}_{1} = (X_{1}' X_{1})^{-1} X_{1}' c_{1}$ and $\hat{β}_{2} = (X_{2}' X_{2})^{-1} X_{2}' c_{2}$, where $X_{1}' X_{1}$ and $X_{2}' X_{2}$ are 5 × 5 and 4 × 4 matrices, respectively.
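To make Theorem 1 concrete, the following minimal NumPy sketch builds $X_1$ and $c_1$ for placeholder data and solves the normal equations once for all off-diagonal grid pairs. It assumes the curves have already been centered by the estimated mean; the sizes, the random data (so the estimates themselves carry no meaning), and the helper names `build_X1` and `build_c1` are hypothetical and not part of the authors' R implementation. The diagonal case ($X_2$, $c_2$) follows the same pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: I subjects, J_i visits each, D grid points along the tract.
I, D = 50, 8
J = rng.integers(2, 5, size=I)                           # J_i in {2, 3, 4}
T = [np.sort(rng.uniform(-1, 1, size=j)) for j in J]     # visit times T_ij
Y = [rng.normal(size=(j, D)) for j in J]                 # placeholder centred curves, one row per visit


def build_X1(T):
    """Rows (1, T_ik, T_ij, T_ij*T_ik, delta_jk) for j, k = 1..J_i and i = 1..I."""
    rows = []
    for Ti in T:
        for j in range(len(Ti)):
            for k in range(len(Ti)):
                rows.append([1.0, Ti[k], Ti[j], Ti[j] * Ti[k], float(j == k)])
    return np.array(rows)


def build_c1(Y, pairs):
    """Products Y_ij(d) * Y_ik(d'), one column per grid pair (d, d') with d < d'.

    Rows are ordered exactly like the rows of build_X1.
    """
    rows = []
    for Yi in Y:
        for j in range(Yi.shape[0]):
            for k in range(Yi.shape[0]):
                rows.append([Yi[j, d] * Yi[k, dp] for d, dp in pairs])
    return np.array(rows)


pairs = [(d, dp) for d in range(D) for dp in range(d + 1, D)]
X1 = build_X1(T)
c1 = build_c1(Y, pairs)

# A single 5 x 5 system serves every column of c1 (Theorem 1).
beta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ c1)       # shape (5, D(D-1)/2)
```

Because $X_1'X_1$ is only 5 × 5, the cost of the solve does not grow with the number of grid pairs; only the product $X_1'c_1$ does.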

Proof of Corollary 1

From Theorem 1 and the analogous argument for the general model, only (p² + 1) × (p² + 1) matrices need to be inverted, so matrix inversion is of order O(p⁶). Matrix multiplication is of order O(p²D²g), giving an overall computational effort of order O(max(p⁶, p²D²g)).

Theorem 2

The estimated BLUPs in (3.2) can be expressed as

\hat{b} = \begin{pmatrix} I_{N_{X} I} & -B C^{'} \\ 0 & I_{N_{U} n} \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & R \end{pmatrix} \begin{pmatrix} I_{N_{X} I} & 0 \\ -C B & I_{N_{U} n} \end{pmatrix} Z^{'} (Y - η),

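with $C = E_{I} \otimes {Φ^{U}}^{'} Φ^{0} + T \otimes {Φ^{U}}^{'} Φ^{1}$, $R = I_{n} \otimes diag(ν_{k}/(ν_{k} + σ^{2})) + G F G^{'}$, $G = E_{I} \otimes diag(ν_{k}/(ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{0} + T \otimes diag(ν_{k}/(ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{1}$, and where B and F are block-diagonal with blocks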

\begin{array}{l} B_{i} = {[J_{i} {Φ^{0}}^{'} Φ^{0} + T_{i •} ({Φ^{0}}^{'} Φ^{1} + {Φ^{1}}^{'} Φ^{0}) + T_{i •}^{2} {Φ^{1}}^{'} Φ^{1} + diag (σ^{2} / λ_{1}, \dots, σ^{2} / λ_{N_{X}})]}^{- 1}, \\ F_{i} = {[J_{i} {Φ^{0}}^{'} L Φ^{0} + T_{i •} ({Φ^{0}}^{'} L Φ^{1} + {Φ^{1}}^{'} L Φ^{0}) + T_{i •}^{2} {Φ^{1}}^{'} L Φ^{1} + diag (σ^{2} / λ_{1}, \dots, σ^{2} / λ_{N_{X}})]}^{- 1}, \end{array}

i = 1, …, I, where $T_{i•} = \sum_{j=1}^{J_{i}} T_{ij}$ and $T_{i•}^{2} = \sum_{j=1}^{J_{i}} T_{ij}^{2}$ (a sum of squares, not the square of $T_{i•}$), and $L = I_{D} - Φ^{U} diag(ν_{k}/(ν_{k} + σ^{2})) {Φ^{U}}^{'}$. Here, diag(a_k) denotes a diagonal matrix with entries a_k, k = 1, …, N_U, and for simplicity we suppress the hat notation for estimated quantities.

Proof

For simplicity, we suppress hat notation that indicates estimated quantities in the computation of the EBLUPs. Using the Woodbury formula, we obtain

\hat{b} = {(Z^{'} Z + σ^{2} D^{- 1})}^{- 1} Z^{'} (Y - η) .

Using the Schur complement S, write

\begin{aligned} {(Z^{'} Z + σ^{2} D^{-1})}^{-1} &= \begin{pmatrix} Z_{X}^{'} Z_{X} + σ^{2} D_{X}^{-1} & Z_{X}^{'} Z_{U} \\ Z_{U}^{'} Z_{X} & Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1} \end{pmatrix}^{-1} \\ &= \begin{pmatrix} I_{N_{X} I} & -A^{-1} Z_{X}^{'} Z_{U} \\ 0 & I_{N_{U} n} \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix} \begin{pmatrix} I_{N_{X} I} & 0 \\ -Z_{U}^{'} Z_{X} A^{-1} & I_{N_{U} n} \end{pmatrix}, \end{aligned}

where $A = Z_{X}^{'} Z_{X} + σ^{2} D_{X}^{- 1}$ and $S = (Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{- 1}) - Z_{U}^{'} Z_{X} A^{- 1} Z_{X}^{'} Z_{U}$ .
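The block factorization above is the standard Schur-complement form of a partitioned inverse. As a quick numerical sanity check, the sketch below applies it to a random symmetric positive definite matrix partitioned like $Z'Z + σ^2 D^{-1}$; the block sizes p and q are hypothetical stand-ins for $N_X I$ and $N_U n$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small block sizes standing in for N_X * I and N_U * n.
p, q = 3, 4
M0 = rng.normal(size=(p + q, p + q))
M = M0 @ M0.T + (p + q) * np.eye(p + q)      # symmetric positive definite, like Z'Z + sigma^2 D^{-1}

A = M[:p, :p]      # plays the role of Z_X'Z_X + sigma^2 D_X^{-1}
P = M[:p, p:]      # plays the role of Z_X'Z_U (so P.T is Z_U'Z_X)
Q = M[p:, p:]      # plays the role of Z_U'Z_U + sigma^2 D_U^{-1}

A_inv = np.linalg.inv(A)
S = Q - P.T @ A_inv @ P                      # Schur complement of A

upper = np.block([[np.eye(p), -A_inv @ P], [np.zeros((q, p)), np.eye(q)]])
middle = np.block([[A_inv, np.zeros((p, q))], [np.zeros((q, p)), np.linalg.inv(S)]])
lower = np.block([[np.eye(p), np.zeros((p, q))], [-P.T @ A_inv, np.eye(q)]])

# Product of the three factors should reproduce the inverse of the full matrix.
M_inv_factored = upper @ middle @ lower
```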

Using properties of the Kronecker product, we have

A = diag (J_{1}, \dots, J_{I}) \otimes {Φ^{0}}^{'} Φ^{0} + diag (T_{1 •}, \dots, T_{I •}) \otimes ({Φ^{0}}^{'} Φ^{1} + {Φ^{1}}^{'} Φ^{0}) + diag (T_{1 •}^{2}, \dots, T_{I •}^{2}) \otimes {Φ^{1}}^{'} Φ^{1} + I_{I} \otimes diag (σ^{2} / λ_{1}, \dots, σ^{2} / λ_{N_{X}}) .

Thus, A is a block-diagonal matrix with I blocks A_i of size N_X × N_X, and B = A⁻¹ can be computed as a block-diagonal matrix with the ith block of size N_X × N_X of the form

B_{i} = {[J_{i} {Φ^{0}}^{'} Φ^{0} + T_{i •} ({Φ^{0}}^{'} Φ^{1} + {Φ^{1}}^{'} Φ^{0}) + T_{i •}^{2} {Φ^{1}}^{'} Φ^{1} + diag (σ^{2} / λ_{1}, \dots, σ^{2} / λ_{N_{X}})]}^{-1} .

Analogously,

\begin{array}{l} {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{- 1})}^{- 1} = {(I_{n} \otimes {Φ^{U}}^{'} Φ^{U} + I_{n} \otimes diag (σ^{2} / ν_{1}, \dots, σ^{2} / ν_{N_{U}}))}^{- 1} \\ = I_{n} \otimes diag (ν_{1} / (ν_{1} + σ^{2}), \dots, ν_{N_{U}} / (ν_{N_{U}} + σ^{2})) \end{array}

can be computed explicitly, since the columns of $Φ^{U}$ are orthonormal by construction, so that ${Φ^{U}}^{'} Φ^{U} = I_{N_{U}}$. Finally, applying the Woodbury formula again, the Schur complement S can be inverted as

\begin{aligned} R = S^{-1} &= {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1})}^{-1} + {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1})}^{-1} Z_{U}^{'} Z_{X} \\ &\quad \times {[A - Z_{X}^{'} Z_{U} {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1})}^{-1} Z_{U}^{'} Z_{X}]}^{-1} Z_{X}^{'} Z_{U} {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1})}^{-1} \\ &= I_{n} \otimes diag (ν_{k} / (ν_{k} + σ^{2})) + G F G^{'}, \end{aligned}

where

\begin{aligned} C &= Z_{U}^{'} Z_{X} = E_{I} \otimes {Φ^{U}}^{'} Φ^{0} + T \otimes {Φ^{U}}^{'} Φ^{1}, \\ G &= {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1})}^{-1} C \\ &= E_{I} \otimes diag (ν_{k} / (ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{0} + T \otimes diag (ν_{k} / (ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{1}, \end{aligned}

and

\begin{aligned} H &= Z_{X}^{'} Z_{U} {(Z_{U}^{'} Z_{U} + σ^{2} D_{U}^{-1})}^{-1} Z_{U}^{'} Z_{X} \\ &= (E_{I}^{'} \otimes {Φ^{0}}^{'} Φ^{U} + T^{'} \otimes {Φ^{1}}^{'} Φ^{U}) (I_{n} \otimes diag (ν_{k} / (ν_{k} + σ^{2}))) (E_{I} \otimes {Φ^{U}}^{'} Φ^{0} + T \otimes {Φ^{U}}^{'} Φ^{1}) \\ &= diag (J_{i}) \otimes {Φ^{0}}^{'} Φ^{U} diag (ν_{k} / (ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{0} + diag (T_{i •}) \otimes {Φ^{0}}^{'} Φ^{U} diag (ν_{k} / (ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{1} \\ &\quad + diag (T_{i •}) \otimes {Φ^{1}}^{'} Φ^{U} diag (ν_{k} / (ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{0} + diag (T_{i •}^{2}) \otimes {Φ^{1}}^{'} Φ^{U} diag (ν_{k} / (ν_{k} + σ^{2})) {Φ^{U}}^{'} Φ^{1} \end{aligned}

is again block-diagonal with N_X × N_X blocks, as is A, so that F = [A − H]⁻¹ can be computed by inverting each block

A_{i} - H_{i} = J_{i} {Φ^{0}}^{'} L Φ^{0} + T_{i •} ({Φ^{0}}^{'} L Φ^{1} + {Φ^{1}}^{'} L Φ^{0}) + T_{i •}^{2} {Φ^{1}}^{'} L Φ^{1} + diag (σ^{2} / λ_{1}, \dots, σ^{2} / λ_{N_{X}})

separately, with $L = I_{D} - Φ^{U} diag(ν_{k}/(ν_{k} + σ^{2})) {Φ^{U}}^{'}$.
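The following sketch checks the closed-form blocks $B_i$ and $F_i$ of Theorem 2 for a single subject against a dense computation of $A^{-1}$ and $(A - H)^{-1}$. Everything here is illustrative: the eigenfunction matrices, eigenvalues, visit times, and σ² are made up, and $Φ^U$ is orthonormalized with a QR decomposition so that ${Φ^U}'Φ^U = I_{N_U}$, as the proof requires.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dimensions and variance parameters for one subject i.
D, N_X, N_U = 20, 3, 2
sigma2 = 0.5
lam = np.array([3.0, 2.0, 1.0])                 # lambda_k, k = 1..N_X
nu = np.array([1.5, 0.7])                       # nu_k, k = 1..N_U
T_i = np.array([-1.0, 0.0, 0.5, 1.2])           # visit times T_ij, j = 1..J_i
J_i = len(T_i)

Phi0 = rng.normal(size=(D, N_X))
Phi1 = rng.normal(size=(D, N_X))
PhiU, _ = np.linalg.qr(rng.normal(size=(D, N_U)))   # orthonormal columns

shrink = np.diag(nu / (nu + sigma2))            # diag(nu_k / (nu_k + sigma^2))
L = np.eye(D) - PhiU @ shrink @ PhiU.T
prior = np.diag(sigma2 / lam)                   # diag(sigma^2 / lambda_k)
T_dot, T2_dot = T_i.sum(), (T_i ** 2).sum()     # T_i. and T^2_i.

# Closed-form N_X x N_X blocks from Theorem 2.
B_i = np.linalg.inv(J_i * Phi0.T @ Phi0 + T_dot * (Phi0.T @ Phi1 + Phi1.T @ Phi0)
                    + T2_dot * Phi1.T @ Phi1 + prior)
F_i = np.linalg.inv(J_i * Phi0.T @ L @ Phi0 + T_dot * (Phi0.T @ L @ Phi1 + Phi1.T @ L @ Phi0)
                    + T2_dot * Phi1.T @ L @ Phi1 + prior)

# Dense reference: build this subject's Z_X and Z_U explicitly.
Z_X = np.vstack([Phi0 + t * Phi1 for t in T_i])          # (J_i D) x N_X
Z_U = np.kron(np.eye(J_i), PhiU)                         # (J_i D) x (J_i N_U)
A = Z_X.T @ Z_X + prior
Q = Z_U.T @ Z_U + sigma2 * np.kron(np.eye(J_i), np.diag(1.0 / nu))
H = Z_X.T @ Z_U @ np.linalg.inv(Q) @ Z_U.T @ Z_X
```

Only N_X × N_X matrices are inverted in the closed form, whereas the dense check inverts the (J_i N_U) × (J_i N_U) matrix Q; with many visits and components this difference drives the savings in Corollary 2.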

Proof of Corollary 2

From Theorem 2, only matrices of size N_X × N_X need to be inverted to compute the EBLUPs, giving a computational effort of order $O(I N_{X}^{3})$ for the inversions. Exploiting the block structure of all matrices also reduces the cost of the matrix multiplications. For example, multiplying the (nN_U + IN_X) × nD matrix Z′ by the nD × 1 vector (Y − η), usually an O(nD(nN_U + IN_X)) operation, here reduces to I multiplications of N_X × D with D × 1 matrices and n multiplications of N_U × D with D × 1 matrices. Similar bookkeeping for the other operations leads to an overall effort of order O(nD(N_U + N_XI/n)).
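The saving in Corollary 2 comes from aggregating over visits before projecting: for subject i, the $Z_X$ part of $Z'(Y - η)$ equals ${Φ^0}'\sum_j r_{ij} + {Φ^1}'\sum_j T_{ij} r_{ij}$ with residual curves $r_{ij} = Y_{ij} - η_{ij}$, and the $Z_U$ part is ${Φ^U}' r_{ij}$ for each visit. The sketch below, with hypothetical sizes and random inputs, compares this structured computation with the dense product, which is only affordable for toy dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes: I subjects with J_i visits, n = sum of J_i visits in total.
I, D, N_X, N_U = 4, 15, 3, 2
J = [2, 3, 1, 4]
n = sum(J)
T = [rng.uniform(-1, 1, size=j) for j in J]
R = [rng.normal(size=(j, D)) for j in J]      # residual curves Y_ij - eta_ij, one row per visit
Phi0, Phi1 = rng.normal(size=(D, N_X)), rng.normal(size=(D, N_X))
PhiU = rng.normal(size=(D, N_U))

# Structured computation: per subject, aggregate over visits, then project once per basis.
zx_blocks = [Phi0.T @ Ri.sum(axis=0) + Phi1.T @ (Ti @ Ri) for Ti, Ri in zip(T, R)]
zu_blocks = [PhiU.T @ r for Ri in R for r in Ri]
Zt_resid = np.concatenate(zx_blocks + zu_blocks)        # length I*N_X + n*N_U

# Dense reference: assemble Z = [Z_X, Z_U] explicitly.
Z_X = np.zeros((n * D, I * N_X))
Z_U = np.zeros((n * D, n * N_U))
row = 0
for i, Ti in enumerate(T):
    for t in Ti:
        visit = row // D                                # running visit index
        Z_X[row:row + D, i * N_X:(i + 1) * N_X] = Phi0 + t * Phi1
        Z_U[row:row + D, visit * N_U:(visit + 1) * N_U] = PhiU
        row += D
resid = np.concatenate([Ri.ravel() for Ri in R])
Zt_resid_dense = np.hstack([Z_X, Z_U]).T @ resid
```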

Proof of Lemma 1 and Corollary 3

The law of total variance gives

Var\{Y_{ij}(s)\} = E\{Var[Y_{ij}(s) \mid T_{ij}]\} + Var\{E[Y_{ij}(s) \mid T_{ij}]\} .

As E{Y_ij(s)|T_ij} = 0,

\begin{aligned} \int_{0}^{1} Var\{Y_{ij}(s)\} \, ds &= \int_{0}^{1} E\Big\{\sum_{k=1}^{\infty} λ_{k} [φ_{k}^{0}(s) + T_{ij} φ_{k}^{1}(s)]^{2} + \sum_{k=1}^{\infty} ν_{k} (φ_{k}^{U}(s))^{2} + σ^{2}\Big\} \, ds \\ &= \int_{0}^{1} \Big\{\sum_{k=1}^{\infty} λ_{k} [(φ_{k}^{0}(s))^{2} + 2E(T_{ij}) φ_{k}^{0}(s) φ_{k}^{1}(s) + E(T_{ij}^{2}) (φ_{k}^{1}(s))^{2}] + \sum_{k=1}^{\infty} ν_{k} (φ_{k}^{U}(s))^{2} + σ^{2}\Big\} \, ds . \end{aligned}

Now consider the case where E(T_ij) = 0 and Var(T_ij) = 1. In this case, we have

\begin{aligned} \int_{0}^{1} Var\{Y_{ij}(s)\} \, ds &= \sum_{k=1}^{\infty} λ_{k} \int_{0}^{1} \{(φ_{k}^{0}(s))^{2} + (φ_{k}^{1}(s))^{2}\} \, ds + \sum_{k=1}^{\infty} ν_{k} \int_{0}^{1} (φ_{k}^{U}(s))^{2} \, ds + σ^{2} \\ &= \sum_{k=1}^{\infty} λ_{k} + \sum_{k=1}^{\infty} ν_{k} + σ^{2} \end{aligned}

due to the orthonormality of the eigenfunctions.
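Corollary 3 can be checked numerically by discretizing [0, 1]. The sketch below uses hypothetical trigonometric eigenfunctions, chosen so that $(φ_k^0, φ_k^1)$, k = 1, 2, are orthonormal in $(L^2[0,1])^2$ and $φ_1^U$ has unit norm, and lets $T_{ij}$ take the values −1 and 1 with equal probability, so that E(T_ij) = 0 and Var(T_ij) = 1. The integrated variance, obtained by averaging the conditional variance over T and applying a midpoint rule, should match λ₁ + λ₂ + ν₁ + σ² = 3.6.

```python
import numpy as np

# Hypothetical example on a midpoint grid of [0, 1].
M = 1000
s = (np.arange(M) + 0.5) / M
ds = 1.0 / M

# (phi^0_k, phi^1_k), k = 1, 2, orthonormal in (L^2[0,1])^2; phi^U_1 with unit L^2 norm.
phi0 = np.array([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s)])
phi1 = np.array([np.cos(2 * np.pi * s), -np.sin(2 * np.pi * s)])
phiU = np.array([np.sqrt(2) * np.sin(4 * np.pi * s)])
lam, nu, sigma2 = np.array([2.0, 0.5]), np.array([0.8]), 0.3

# T takes -1 or +1 with probability 1/2 each, so E(T) = 0 and Var(T) = 1.
t_values, t_probs = np.array([-1.0, 1.0]), np.array([0.5, 0.5])

# Var{Y(s) | T = t} for each t; E{Y(s) | T} = 0, so Var{Y(s)} is the average over T.
cond_var = [
    (lam[:, None] * (phi0 + t * phi1) ** 2).sum(axis=0)
    + (nu[:, None] * phiU ** 2).sum(axis=0) + sigma2
    for t in t_values
]
var_s = np.tensordot(t_probs, np.array(cond_var), axes=1)   # pointwise Var{Y(s)}
total = var_s.sum() * ds                                    # midpoint-rule integral over [0, 1]
```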

Appendix B: Estimation of the general functional mixed model

Estimation for model (2.2) proceeds in the same way as for model (2.1). In this section, we briefly point out the necessary minor adjustments. An estimate of the mean function η(d, Z_ij) can again be obtained under a working independence assumption and under the specified model. For example, under the specification $η(d, Z_{ij}) = η_{1}(d, Z_{ij,1}) + \dots + η_{m}(d, Z_{ij,m})$, bivariate smoothing in an additive model can be used.

In the estimation of the covariance functions, equation (3.1) now becomes

\begin{aligned} E\{Y_{ij}(d) Y_{ik}(d^{'})\} &= V_{ij}^{'} K_{X}(d, d^{'}) V_{ik} + [K_{U}(d, d^{'}) + σ^{2} δ_{dd^{'}}] δ_{jk} \\ &= \sum_{l=1}^{p} \sum_{m=1}^{p} V_{ijl} V_{ikm} K_{lm}(d, d^{'}) + [K_{U}(d, d^{'}) + σ^{2} δ_{dd^{'}}] δ_{jk}, \end{aligned}

where $V_{ij} = (V_{ij1}, \dots, V_{ijp})'$, and the three-step estimation procedure for the covariance functions extends straightforwardly. The size of the matrix to be inverted during step 1 increases from 5 × 5 to (p² + 1) × (p² + 1).
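In the general model, each row of the covariance regression for an off-diagonal pair d ≠ d′ combines the Kronecker product $V_{ij} \otimes V_{ik}$, whose entries $V_{ijl}V_{ikm}$ multiply $K_{lm}(d, d')$, with the indicator δ_jk for $K_U(d, d')$, which gives p² + 1 columns. The helper `design_row` below is a hypothetical illustration, not part of the authors' code; for V_ij = (1, T_ij)′ it reproduces the rows of X₁ from Theorem 1. Diagonal pairs d = d′ need the symmetric terms merged, as in X₂, and are not covered by this sketch.

```python
import numpy as np


def design_row(V_ij, V_ik, same_visit):
    """Hypothetical helper: one row of the (p^2 + 1)-column design for
    E{Y_ij(d) Y_ik(d')}, d != d'.

    The first p^2 entries are V_ijl * V_ikm in l-major order (matching K_lm);
    the last entry is the indicator delta_jk that multiplies K_U(d, d').
    """
    return np.append(np.kron(V_ij, V_ik), float(same_visit))


# Random intercept-random slope model: V_ij = (1, T_ij), so p = 2 and the row is
# (1, T_ik, T_ij, T_ij * T_ik, delta_jk), i.e. the five columns of X_1.
T_ij, T_ik = 0.4, 1.5
row = design_row(np.array([1.0, T_ij]), np.array([1.0, T_ik]), same_visit=False)
```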

Similarly, estimation of the eigenfunctions using the spectral decomposition and estimation of the scores using best linear unbiased prediction proceed completely analogously, keeping in mind that $\{φ_{k}^{X}(\cdot) = (φ_{k}^{1}(\cdot), \dots, φ_{k}^{p}(\cdot)), k = 1, 2, \dots\}$ now form an orthonormal basis for $(L^{2}[0, 1])^{p}$.
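One common way to carry out the spectral decomposition on a grid, sketched here under our own assumptions, is to stack the p × p blocks $K_{lm}(d, d')$ into a pD × pD matrix, eigen-decompose it after multiplying by the grid spacing h (a Riemann approximation of the integral operator), and rescale the eigenvectors by $1/\sqrt{h}$ so that they are orthonormal in $(L^2[0,1])^p$. The example builds the covariance from two known orthonormal elements so the recovered eigenvalues can be checked; the grid and eigenfunctions are hypothetical.

```python
import numpy as np

# Hypothetical p = 2 example on a midpoint grid of [0, 1].
D = 200
s = (np.arange(D) + 0.5) / D
h = 1.0 / D                                              # grid spacing (quadrature weight)
lam_true = np.array([2.0, 0.5])
phi_true = np.array([
    np.concatenate([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s)]),
    np.concatenate([np.cos(2 * np.pi * s), -np.sin(2 * np.pi * s)]),
])                                                       # each row stacks (phi^1_k, phi^2_k) on the grid

# Stacked covariance: block (l, m) holds K_lm(d, d') on the grid, a (pD) x (pD) matrix.
K_X = (phi_true.T * lam_true) @ phi_true

# Discretized integral operator: eigen-decompose K_X * h, keep the largest eigenvalues,
# and rescale eigenvectors so that sum_l int (phi^l_k)^2 ds = 1.
evals, evecs = np.linalg.eigh(K_X * h)
order = np.argsort(evals)[::-1]
lam_hat = evals[order][:2]
phi_hat = evecs[:, order[:2]].T / np.sqrt(h)            # rows: stacked phi_k^X on the grid
```

Eigenvectors are determined only up to sign, so any comparison with known eigenfunctions should use absolute inner products.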

Choice of N_X and N_U can again proceed using the proportion of variance explained. Standardization of variables in V_ij is recommended. Note, however, an additional complication in the higher-dimensional case. If some of the covariates in V_ij are correlated, corresponding additional terms will appear in the expansion of ∫Var{Y_ij(d)}. The eigenvalues λ_k might then somewhat over- or underrepresent the relative importance of the corresponding component $φ_{k}^{X}$ in explaining the variation in Y_ij(d). If strong correlations are a concern, additional measures, such as the use of orthogonal polynomials in the case of polynomial V_ij, should be taken.
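A minimal sketch of one possible selection rule follows. It is a hypothetical rule, not necessarily the criterion used in the paper: the estimated eigenvalues of both processes are pooled and ranked, and components are retained until their cumulative share of Σλ_k + Σν_k reaches a chosen threshold. The function name `choose_components` and the default threshold of 0.9 are assumptions.

```python
import numpy as np


def choose_components(lam, nu, threshold=0.9):
    """Hypothetical rule: pool the level-X and level-U eigenvalues, keep the largest
    ones until their cumulative share of sum(lam) + sum(nu) reaches `threshold`,
    and report how many came from each process as (N_X, N_U)."""
    lam, nu = np.asarray(lam, float), np.asarray(nu, float)
    pooled = np.concatenate([lam, nu])
    source = np.concatenate([np.zeros(lam.size, int), np.ones(nu.size, int)])  # 0 = X, 1 = U
    order = np.argsort(pooled)[::-1]                    # largest eigenvalues first
    share = np.cumsum(pooled[order]) / pooled.sum()
    n_keep = int(np.searchsorted(share, threshold) + 1)   # first index where share >= threshold
    kept = source[order[:n_keep]]
    return int((kept == 0).sum()), int((kept == 1).sum())
```

For eigenvalues (5, 2, 1) and (1.5, 0.5), the cumulative shares are 0.5, 0.7, 0.85, 0.95 and 1.0, so a threshold of 0.9 keeps three components at level X and one at level U.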

Contributor Information

Sonja Greven, Email: sonja.greven@stat.uni-muenchen.de, Department of Statistics, Ludwig-Maximilians-University Munich, Ludwigstr. 33, 80539 Munich, Germany.

Ciprian Crainiceanu, Email: ccrainic@jhsph.edu, Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205, USA.

Brian Caffo, Email: bcaffo@jhsph.edu, Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205, USA.

Daniel Reich, Email: reichds@ninds.nih.gov, Translational Neuroradiology Unit, Neuroimmunology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA. Departments of Radiology and Neurology, Johns Hopkins Hospital, 600 N. Wolfe Street, Baltimore, MD 21287, USA.

References

1. Basser P, Mattiello J, LeBihan D. MR diffusion tensor spectroscopy and imaging. Biophysical Journal. 1994;66:259–267. doi: 10.1016/S0006-3495(94)80775-1.
2. Basser PJ, Pierpaoli C. Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI. Journal of Magnetic Resonance, Series B. 1996;111:209–219. doi: 10.1006/jmrb.1996.0086.
3. Brumback BA, Rice JA. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association. 1998:961–976.
4. Calabresi PA. Multiple sclerosis and demyelinating conditions of the central nervous system. In: Goldman L, Ausiello DA, editors. Cecil Medicine. 23rd ed. Saunders/Elsevier; 2008.
5. Crainiceanu C, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B. 2004;66:165–185.
6. Crainiceanu CM, Staicu AM, Di CZ. Generalized Multilevel Functional Regression. Journal of the American Statistical Association. 2009;104:1550–1561. doi: 10.1198/jasa.2009.tm08564.
7. Di CZ, Crainiceanu CM, Caffo BS, Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics. 2008;3:458–488. doi: 10.1214/08-AOAS206SUPP.
8. Diggle P, Heagerty P, Liang KY, Zeger S. Analysis of longitudinal data. Oxford University Press; 2002.
9. Fan J, Gijbels I. Local polynomial modelling and its applications. CRC Press; 1996.
10. Ferraty F, Vieu P. Nonparametric functional data analysis: theory and practice. Springer Verlag; 2006.
11. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach. Chapman & Hall Ltd; 1994.
12. Greven S, Kneib T. On the Behaviour of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models. Biometrika. 2010. To appear.
13. Greven S, Crainiceanu C, Caffo B, Reich D. Supplement to “Longitudinal functional principal component analysis.” 2010. doi: 10.1214/10-EJS575SUPP.
14. Greven S, Crainiceanu CM, Küchenhoff H, Peters A. Restricted Likelihood Ratio Testing for Zero Variance Components in Linear Mixed Models. Journal of Computational and Graphical Statistics. 2008;17:870–891.
15. Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi: 10.1111/j.0006-341x.2002.00121.x.
16. Guo W. Functional data analysis in longitudinal settings using smoothing splines. Statistical Methods in Medical Research. 2004;13:49. doi: 10.1191/0962280204sm352ra.
17. Hall P, Müller HG, Yao F. Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society, Series B. 2008;70:703–723.
18. Heim S, Fahrmeir L, Eilers P, Marx B. 3D space-varying coefficient models with application to diffusion tensor imaging. Computational Statistics & Data Analysis. 2007;51:6212–6228.
19. Herrick RC, Morris JS. Wavelet-Based Functional Mixed Model Analysis: Computation Considerations. In: Proceedings, Joint Statistical Meetings, ASA Section on Statistical Computing; 2006.
20. Karhunen K. Über Lineare Methoden in der Wahrscheinlichkeitsrechnung. Annales Academiae Scientiarum Fennicae. 1947;37:1–79.
21. Krivobokova T, Kauermann G. A note on penalized spline smoothing with correlated errors. Journal of the American Statistical Association. 2007;102:1328–1337.
22. Laird N, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
23. Liang H, Wu H, Zou G. A note on conditional AIC for linear mixed-effects models. Biometrika. 2008;95:773–778. doi: 10.1093/biomet/asn023.
24. Lin X, Carroll RJ. Nonparametric Function Estimation for Clustered Data When the Predictor Is Measured Without/With Error. Journal of the American Statistical Association. 2000;95:520–534.
25. Lin F, Yu C, Jiang T, Li K, Li X, Qin W, Sun H, Chan P. Quantitative analysis along the pyramidal tract by length-normalized parameterization based on diffusion tensor tractography: application to patients with relapsing neuromyelitis optica. NeuroImage. 2006;33:154–160. doi: 10.1016/j.neuroimage.2006.03.055.
26. Loeve M. Fonctions aléatoires du second ordre. Comptes Rendus Académie des Sciences. 1945;220:380.
27. Mercer J. Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, Series A. 1909:415–446.
28. Mori S, Crain BJ, Chacko V, Van Zijl PCM. Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Annals of Neurology. 1999;45:265–269. doi: 10.1002/1531-8249(199902)45:2<265::aid-ana21>3.0.co;2-3.
29. Morris JS, Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B. 2006;68:179–199. doi: 10.1111/j.1467-9868.2006.00539.x.
30. Müller HG. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics. 2005;32:223–240.
31. Müller HG, Zhang Y. Time-varying functional regression for predicting remaining lifetime distributions from longitudinal trajectories. Biometrics. 2005;61:1064–1075. doi: 10.1111/j.1541-0420.2005.00378.x.
32. Oh JS, Song IC, Lee JS, Kang H, Park KS, Kang E, Lee DS. Tractography-guided statistics (TGIS) in diffusion tensor imaging for the detection of gender difference of fiber integrity in the midsagittal and parasagittal corpora callosa. NeuroImage. 2007;36:606–616. doi: 10.1016/j.neuroimage.2007.03.020.
33. Ozturk A, Smith S, Gordon-Lipkin E, Harrison D, Shiee N, Pham D, Caffo B, Calabresi P, Reich D. MRI of the corpus callosum in multiple sclerosis: association with disability. Multiple Sclerosis. 2009. To appear. doi: 10.1177/1352458509353649.
34. Ramsay JO, Silverman B. Functional data analysis. 2nd ed. Springer; 2005.
35. Rao CR. The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika. 1965;52:447–458.
36. Reich DS, Smith SA, Zackowski KM, Gordon-Lipkin EM, Jones CK, Farrell JAD, Mori S, van Zijl PCM, Calabresi PA. Multiparametric magnetic resonance imaging analysis of the corticospinal tract in multiple sclerosis. NeuroImage. 2007;38:271–279. doi: 10.1016/j.neuroimage.2007.07.049.
37. Rice JA. Functional and longitudinal data analysis: Perspectives on smoothing. Statistica Sinica. 2004;14:631–647.
38. Rice JA, Silverman B. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B. 1991;53:233–243.
39. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; 2003.
40. Staicu AM, Crainiceanu CM, Carroll RJ. Fast Methods for Spatially Correlated Multilevel Functional Data. Biostatistics. 2010. To appear. doi: 10.1093/biostatistics/kxp058.
41. Staniswalis JG, Lee JJ. Nonparametric Regression Analysis of Longitudinal Data. Journal of the American Statistical Association. 1998;93:1403–1404.
42. Tievsky AL, Ptak T, Farkas J. Investigation of apparent diffusion coefficient and diffusion tensor anisotropy in acute and chronic multiple sclerosis lesions. American Journal of Neuroradiology. 1999;20:1491–1499.
43. Vaida F, Blanchard S. Conditional Akaike information for mixed-effects models. Biometrika. 2005;92:351–370.
44. Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. Springer; 2009.
45. Werring D, Clark C, Barker G, Thompson A, Miller D. Diffusion tensor imaging of lesions and normal-appearing white matter in multiple sclerosis. Neurology. 1999;52:1626–1632. doi: 10.1212/wnl.52.8.1626.
46. Witelson SF. Hand and sex differences in the isthmus and genu of the human corpus callosum: a postmortem morphological study. Brain. 1989;112:799–835. doi: 10.1093/brain/112.3.799.
47. Wood SN. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC; 2006.
48. Wu H, Zhang JT. Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches. Wiley-Blackwell; 2006.
49. Yao F, Lee TCM. Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society, Series B. 2006;68:3–25.
50. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100:577–590.
51. Yao F, Clifford AJ, Dueker SR, Follett J, Lin Y, Buchholz BA, Vogel JS. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics. 2003;59:676–685. doi: 10.1111/1541-0420.00078.
52. Zhu H, Styner M, Tang N, Liu Z, Lin W, Gilmore J. FRATS: Functional Regression Analysis of DTI Tract Statistics. IEEE Transactions on Medical Imaging. 2010;29:1039–1049. doi: 10.1109/TMI.2010.2040625.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Web supplement

NIHMS299819-supplement-Web_supplement.zip^{(15.1MB, zip)}

[R1] 1.Basser P, Mattiello J, LeBihan D. MR diffusion tensor spectroscopy and imaging. Biophysical Journal. 1994;66:259–267. doi: 10.1016/S0006-3495(94)80775-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Basser PJ, Pierpaoli C. Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI. Journal of Magnetic Resonance, Series B. 1996;111:209–219. doi: 10.1006/jmrb.1996.0086. [DOI] [PubMed] [Google Scholar]

[R3] 3.Brumback BA, Rice JA. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association. 1998:961–976. [Google Scholar]

[R4] 4.Calabresi PA. Multiple sclerosis and demyelinating conditions of the central nervous system. In: Goldman L, Ausiello DA, editors. Cecil Medicine. 23. Saunders: Elsevier; 2008. [Google Scholar]

[R5] 5.Crainiceanu C, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B. 2004;66:165–185. [Google Scholar]

[R6] 6.Crainiceanu CM, Staicu AM, Di CZ. Generalized Multilevel Functional Regression. Journal of the American Statistical Association. 2009;104:1550–1561. doi: 10.1198/jasa.2009.tm08564. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Di CZ, Crainiceanu CM, Caffo BS, Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics. 2008;3:458–488. doi: 10.1214/08-AOAS206SUPP. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Diggle P, Heagerty P, Liang KY, Zeger S. Analysis of longitudinal data. Oxford University Press; USA: 2002. [Google Scholar]

[R9] 9.Fan J, Gijbels I. Local polynomial modelling and its applications. CRC Press; 1996. [Google Scholar]

[R10] 10.Ferraty F, Vieu P. Nonparametric functional data analysis: theory and practice. Springer Verlag; 2006. [Google Scholar]

[R11] 11.Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach. Chapman & Hall Ltd; 1994. [Google Scholar]

[R12] 12.Greven S, Kneib T. On the Behaviour of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models. Biometrika. 2010 to appear. [Google Scholar]

[R13] 13.Greven S, Crainiceanu C, Caffo B, Reich D. Supplement to “Longitudinal functional principal component analysis. 2010 doi: 10.1214/10-EJS575SUPP. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Greven S, Crainiceanu CM, Küchenhoff H, Peters A. Restricted Likelihood Ratio Testing for Zero Variance Components in Linear Mixed Models. Journal of Computational and Graphical Statistics. 2008;17:870–891. [Google Scholar]

[R15] 15.Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi: 10.1111/j.0006-341x.2002.00121.x. [DOI] [PubMed] [Google Scholar]

[R16] 16.Guo W. Functional data analysis in longitudinal settings using smoothing splines. Statistical methods in medical research. 2004;13:49. doi: 10.1191/0962280204sm352ra. [DOI] [PubMed] [Google Scholar]

[R17] 17.Hall P, Müller HG, Yao F. Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B. 2008;70:703–723. [Google Scholar]

[R18] 18.Heim S, Fahrmeir L, Eilers P, Marx B. 3D space-varying coefficient models with application to diffusion tensor imaging. Computational Statistics & Data Analysis. 2007;51:6212–6228. [Google Scholar]

[R19] 19.Herrick RC, Morris JS. Wavelet-Based Functional Mixed Model Analysis: Computation Considerations. In. Proceedings, Joint Statistical Meetings, ASA Section on Statistical Computing 2006 [Google Scholar]

[R20] 20.Karhunen K. Über Lineare Methoden in der Wahrscheinlichkeit-srechnung. Annales Academiae Scientiarum Fennicae. 1947;37:1–79. [Google Scholar]

[R21] 21.Krivobokova T, Kauermann G. A note on penalized spline smoothing with correlated errors. Journal of the American Statistical Association. 2007;102:1328–1337. [Google Scholar]

[R22] 22.Laird N, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974. [PubMed] [Google Scholar]

[R23] 23.Liang H, Wu H, Zou G. A note on conditional AIC for linear mixed-effects models. Biometrika. 2008;95:773–778. doi: 10.1093/biomet/asn023. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Lin X, Carroll RJ. Nonparametric Function Estimation for Clustered Data When the Predictor Is Measured Without/With Error. Journal of the American Statistical Association. 2000;95:520–534. [Google Scholar]

[R25] 25.Lin F, Yu C, Jiang T, Li K, Li X, Qin W, Sun H, Chan P. Quantitative analysis along the pyramidal tract by length-normalized parameterization based on diffusion tensor tractography: application to patients with relapsing neuromyelitis optica. NeuroImage. 2006;33:154–160. doi: 10.1016/j.neuroimage.2006.03.055. [DOI] [PubMed] [Google Scholar]

[R26] 26.Loeve M. Fonctions aléatoires du second ordre. Comptes Rendus Académie des Sciences. 1945;220:380. [Google Scholar]

[R27] 27.Mercer J. Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London Series A. 1909:415–446. [Google Scholar]

[R28] 28.Mori S, Crain BJ, Chacko V, Van Zijl PCM. Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Annals of Neurology. 1999;45:265–269. doi: 10.1002/1531-8249(199902)45:2<265::aid-ana21>3.0.co;2-3. [DOI] [PubMed] [Google Scholar]

[R29] 29.Morris JS, Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B. 2006;68:179–199. doi: 10.1111/j.1467-9868.2006.00539.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] 30.Müller HG. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics. 2005;32:223–240. [Google Scholar]

[R31] 31.Müller HG, Zhang Y. Time-varying functional regression for predicting remaining lifetime distributions from longitudinal trajectories. Biometrics. 2005;61:1064–1075. doi: 10.1111/j.1541-0420.2005.00378.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] 32.Oh JS, Song IC, Lee JS, Kang H, Park KS, Kang E, Lee DS. Tractography-guided statistics (TGIS) in diffusion tensor imaging for the detection of gender difference of fiber integrity in the midsagittal and parasagittal corpora callosa. Neuroimage. 2007;36:606–616. doi: 10.1016/j.neuroimage.2007.03.020. [DOI] [PubMed] [Google Scholar]

[R33] 33.Ozturk A, Smith S, Gordon-Lipkin E, Harrison D, Shiee N, Pham D, Caffo B, Calabresi P, Reich D. MRI of the corpus callosum in multiple sclerosis: association with disability. Multiple Sclerosis. 2009 doi: 10.1177/1352458509353649. to appear. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Ramsay JO, Silverman B. Functional data analysis. 2. Springer; 2005. [Google Scholar]

[R35] 35.Rao CR. The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika. 1965;52:447–458. [PubMed] [Google Scholar]

[R36] 36.Reich DS, Smith SA, Zackowski KM, Gordon-Lipkin EM, Jones CK, Farrell JAD, Mori S, van Zijl PCM, Calabresi PA. Multiparametric magnetic resonance imaging analysis of the corticospinal tract in multiple sclerosis. Neuroimage. 2007;38:271–279. doi: 10.1016/j.neuroimage.2007.07.049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] 37.Rice JA. Functional and longitudinal data analysis: Perspectives on smoothing. Statistica Sinica. 2004;14:631–647. [Google Scholar]

[R38] 38.Rice JA, Silverman B. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society Series B. 1991;53:233–243. [Google Scholar]

[R39] 39.Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; 2003. [Google Scholar]

[R40] 40.Staicu AM, Crainiceanu CM, Carroll RJ. Fast Methods for Spatially Correlated Multilevel Functional Data. Biostatistics. 2010 doi: 10.1093/biostatistics/kxp058. to appear. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] 41.Staniswalis JG, Lee JJ. Nonparametric Regression Analysis of Longitudinal Data. Journal of the American Statistical Association. 1998;93:1403–1404. [Google Scholar]

[R42] 42.Tievsky AL, Ptak T, Farkas J. Investigation of apparent diffusion coefficient and diffusion tensor anisotropy in acute and chronic multiple sclerosis lesions. American Journal of Neuroradiology. 1999;20:1491–1499. [PMC free article] [PubMed] [Google Scholar]

[R43] 43.Vaida F, Blanchard S. Conditional Akaike information for mixed-effects models. Biometrika. 2005;92:351–370. [Google Scholar]

[R44] 44.Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. Springer; 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R45] 45.Werring D, Clark C, Barker G, Thompson A, Miller D. Diffusion tensor imaging of lesions and normal-appearing white matter in multiple sclerosis. Neurology. 1999;52:1626–1632. doi: 10.1212/wnl.52.8.1626. [DOI] [PubMed] [Google Scholar]

[R46] 46.Witelson SF. Hand and sex differences in the isthmus and genu of the human corpus callosum: a postmortem morphological study. Brain. 1989;112:799–835. doi: 10.1093/brain/112.3.799. [DOI] [PubMed] [Google Scholar]

[R47] 47.Wood SN. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC; 2006. [Google Scholar]

[R48] 48.Wu H, Zhang JT. Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches. Wiley-Blackwell; 2006. [Google Scholar]

[R49] 49.Yao F, Lee TCM. Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society, Series B. 2006;68:3–25. [Google Scholar]

50. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100:577–590.

51. Yao F, Clifford AJ, Dueker SR, Follett J, Lin Y, Buchholz BA, Vogel JS. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics. 2003;59:676–685. doi: 10.1111/1541-0420.00078.

52. Zhu H, Styner M, Tang N, Liu Z, Lin W, Gilmore J. FRATS: Functional Regression Analysis of DTI Tract Statistics. IEEE Transactions on Medical Imaging. 2010;29:1039–1049. doi: 10.1109/TMI.2010.2040625.
