Summary
Motivated by modern observational studies, we introduce a class of functional models that expand nested and crossed designs. These models account for the natural inheritance of the correlation structures from sampling designs in studies where the fundamental unit is a function or image. Inference is based on functional quadratics and their relationship with the underlying covariance structure of the latent processes. A computationally fast and scalable estimation procedure is developed for high-dimensional data. Methods are used in applications including high-frequency accelerometer data for daily activity, pitch linguistic data for phonetic analysis, and EEG data for studying electrical brain activity during sleep.
Keywords: Multilevel correlation structure, Functional linear mixed model, Functional principal component analysis, Latent process, Variance component
1. Introduction
In many current studies, functional measurements have well-defined stochastic structures induced either by the experimental design or by the scientific meaning of the data. For example, the Sleep Heart Health Study (SHHS) (Quan et al., 1997; Crainiceanu et al., 2009; Di et al., 2009) collected electroencephalogram (EEG) data for thousands of subjects at two visits, roughly 5 years apart. At every visit, EEG data were recorded at a frequency of 125 Hz during sleep. Thus, for each subject and visit, data consist of 125 observations per second. Crainiceanu et al. (2009) applied a Fourier transformation to the original data and obtained the normalized δ-power as a densely sampled stationary time series. These data have a natural hierarchical structure induced by the replicated visits within each subject. More precisely, one can denote the δ-power function for visit j of subject i at time t after sleep onset by Yij(t), which can be decomposed into a subject-specific process Xi(t) and a visit-within-subject process Uij(t) that quantifies the deviation from the subject-specific mean. A second example is provided by Bai et al. (2012) in a recent study of physical activity in an elderly population. In this study, each subject wears an accelerometer that records three-axis accelerations during in-home activities at a sampling frequency of 10 Hz. Bai et al. (2012) introduced activity intensity, a measure of activity expressed in multiples of signal standard deviations of inactive periods. Activity intensity is calculated for every tenth-of-a-second interval. Figure 1 displays the activity intensity for five subjects during a 5-day period, averaged over 15-minute windows for improved display clarity. One possibility for analyzing these data is to focus on activity intensity in non-overlapping one-hour intervals. Thus, the data for each subject on each day contain 36,000 activity measurements per hour for 24 hours.
This can be viewed as a three-level hierarchical structure: hour within day within subject. More specifically, let Yijk(t) be the activity intensity at time t within hour k on day j for subject i. In addition to the subject-specific process Xi(t) and the day-within-subject process Uij(t), the remaining part of the variation in Yijk(t) can be explained by the hour-specific process Wijk(t) that quantifies the deviation of hour k from the average of day j for subject i.
Aston, Chiou, and Evans (2010) described a different study of phonetic analysis where the authors were interested in studying the fundamental frequency (F0, “pitch”) of spoken languages. In particular, they recorded the F0-contours of syllables from 19 nouns pronounced by 8 native speakers of the Luobuzhai Qiang dialect in China. Suppose that we use Yijk(t) to denote the pitch of syllables within the jth word that are pronounced by subject i. Each (i, j) contains more than one curve, indexed by k, because there are multiple syllables within a word and every word was spoken under three different contexts. Each curve was normalized by the total duration of the corresponding vowel and was sampled at 11 equidistant time points. Figure 2 displays an example of F0-contours for vowels that compose three different words spoken by three speakers. We observe that: (a) the shapes of the curves are strongly associated with the vowels; (b) there are substantial variations across the speakers and words. For example, speaker “a” has, on average, a lower pitch than the other two. Given vowel “i”, curves from word 3 display a steep rising pattern and decay at the end of the vowel, whereas curves from word 2 (labeled by the triangle symbol) are all arch-shaped. Yijk(t) is jointly affected by at least two random components: the speaker-inherent effect Xi(t) and the word-generic effect Zj(t). Unlike in the hierarchical models, the two random components are mutually independent, yet interact on the pitch contours.
Although these three studies have different designs, they share some common features: (1) the fundamental observational unit is a function that can be high-dimensional; (2) data have a known structure induced by the sampling design; and (3) analysis of individual levels of variability is of interest. One goal of this article is to define a wide class of structured functional models with explicit functional effect components; in particular, the model class will contain the observed structures in the three examples. We will focus on the common structures and provide a consistent statistical framework for all these models. A second goal is to characterize the observed variability by uncorrelated latent processes. By estimating and diagonalizing the covariance operators of these processes, we will achieve both dimensionality reduction of the original data and statistical modeling on the induced linear spaces. From an intuitive perspective, this article shows how to conduct principal component analysis (PCA) when data have a particular known and common latent correlation structure.
The structured functional models in this article fall into the functional linear mixed model (FLMM) framework. Early work (Guo, 2002; Herrick and Morris, 2006; Morris and Carroll, 2006) mainly uses spline or wavelet smoothing in model fitting. Brumback and Rice (1998) and Guo (2004) have specifically studied functional nested and crossed designs. More recent work, such as Staicu, Crainiceanu, and Carroll (2010) and Zhou et al. (2010), considers spatial correlation in the nested model. While all these models can be viewed as particular cases of FLMM, model fitting and inference remain difficult and are currently done on a model-by-model basis. We conclude that none of these previous articles has addressed the class of complex functional structures discussed here. Moreover, very fast algorithms for high-dimensional data have not been available. We aim to introduce a data-driven approach that applies to both nested and crossed designs, but is generalizable to a much broader model space. We introduce latent processes to capture explicit levels of variability using the same concept as in standard mixed effects models. The only difference is that random effects are now replaced with random processes. Computational feasibility is achieved via principal component decomposition of covariance operators for latent processes, and by lossless projections of high-dimensional data. The approaches are methodologically related to PCA decomposition (Staniswalis and Lee, 1998; Yao et al., 2003; Yao, Müller, and Wang, 2005; Di et al., 2009; Aston et al., 2010; Greven et al., 2010). Among those, Aston et al. (2010) project the whole function onto a vector space, where the vector entries are the first few principal scores of the function. Through multiple linear mixed effects models that link principal scores with the covariates, they are able to assess the effect of covariates on the outcome function. Alternatively, multilevel functional PCA (MFPCA, Di et al. [2009]) decomposes the intra-subject and inter-subject covariance operators in the two-way nested model, while inference is based on the scores separated by levels of variability. Longitudinal functional PCA (LFPCA, Greven et al. [2010]) uses a similar approach to model the longitudinal dynamics of functional observations at multiple visits. In this article, we generalize these ideas to analyze functional observations collected under the most common nested and crossed designs, and expand the number and type of models for functional data. We propose structured functional principal component analysis (SFPCA) as a method to decompose the variability via PCA for any functional model with a particular linear structure. We claim SFPCA to be the first algorithm for FLMM to efficiently handle dense and high-frequency measurements.
We organize the article as follows: in Section 2, we provide a list of structured functional models that SFPCA is applied to and connect them with the symmetric sum method of moments (MoM) estimators described in Koch (1968); Section 3 discusses SFPCA and its implementation, with extension to high-dimensional settings; Section 4 describes simulation studies for low-dimensional, high-dimensional and noisy settings; Section 5 applies SFPCA to the scientific questions described in Section 1.
2. Structured Functional Models
Koch (1967) provides a comprehensive list of linear models for scalar data that emerge from various experimental designs. We contend that these models have natural extensions to functional data and that the models may be analyzed by decomposing the corresponding covariance operators. Table 1 lists the proposed designs that are grouped based on sampling schemes (Brumback and Rice, 1998; Guo, 2002; Yao et al., 2005; Morris and Carroll, 2006; Baladandayuthapani et al., 2008; Di et al., 2009; Staicu et al., 2010; Zhou et al., 2010; Liu and Guo, 2012).
Table 1.
Design | Model | Formula
---|---|---
Nested | (N1) One-way | Yi(t) = μ(t) + Xi(t) + εit
Nested | (N2) Two-way | Yij(t) = μ(t) + Xi(t) + Uij(t) + εijt
Nested | (N3) Three-way | Yijk(t) = μ(t) + Xi(t) + Uij(t) + Wijk(t) + εijkt
Nested | (NM) Multi-way |
Crossed | (C2) Two-way | Yij(t) = μ(t) + Xi(t) + Zj(t) + Wij(t) + εijt
Crossed | (C2s) Two-way sub | Yijk(t) = μ(t) + Xi(t) + Zj(t) + Wij(t) + Uijk(t) + εijkt
Crossed | (CM) Multi-way |
Let Y(t) indicate the observed outcome function. The most general model format is Y(t) = μ(t) + latent processes + εt, where μ(t) is the fixed-effect mean curve and εt ~ (0, σ²) is white noise. The latent processes are assumed to be zero-mean and square integrable, so that they are identifiable and the standard statistical assumptions for scalar outcomes carry over to functional data. Consequently, the total variability of a functional outcome is decomposed into a sum of process-specific variations plus σ². These models capture a wide variety of correlation structures in modern functional data studies. In the following, we build up the intuition behind the functional nested and crossed designs, and connect them to the data examples that are discussed in the Introduction. For presentation purposes, we first assume “noise-free” models where σ = 0. Our methods are extended to “noisy” scenarios in Section 3.4.
2.1. Nested Designs
A one-way nested model (N1) is the simplest variance component model for functional data. In (N1), the observed outcome Yi(t) is represented as a sum of a deterministic mean function, μ(t), and a level-specific stochastic process Xi(t). Xi(t) are assumed to be i.i.d., with mean zero and covariance operator KX(t, s) = Cov{Xi(t), Xi(s)}; KX may be thought of as the functional counterpart of scalar covariance. The variability of Yi(t) is completely determined by that of Xi(t), that is, KY = KX. In conventional functional data analysis (Ramsay and Silverman, 2005), Xi(t) would be expressed via a set of spline or wavelet basis, or data-driven principal components (Ramsay and Silverman, 2005; Di et al., 2009; Greven et al., 2010). Irrespective of the basis functions, KX is determined by the first two moments of the representation coefficients and a quadratic form of the basis functions.
The two-way functional nested design (N2) is the functional equivalent of a one-way analysis of variance (ANOVA) model. Originally motivated by the two-way sampling design of EEG data in SHHS (Di et al., 2009), the model expands (N1) with a subject-visit specific process Uij(t) that has covariance KU(t, s) = Cov{Uij(t), Uij(s)}. Thus, the observed total variability of Yij(t) is decomposed into subject-specific and subject-visit specific variability. These two parts are modeled through KX and KU, the functional covariance operators of Xi(t) and Uij(t). To ensure identifiability, the random processes Xi(t) and Uij(t) are assumed to have mean zero and be uncorrelated. This assumption also guarantees that KY = KX + KU.
Additional levels of nesting can be included in the model to accommodate higher hierarchies. For example, the three-way nested model (N3) provides an appropriate framework for modeling the activity intensity data described in Section 1. In addition to the subject-specific process Xi(t) and the subject-day specific process Uij(t), the remaining variation in Yijk(t) is modeled through Wijk(t), which quantifies the hourly deviation from the average activity intensity level of day j for subject i. The most general functional nested model (NM) admits arbitrarily many levels of nesting. If the activity intensity is followed for weeks or months, a four-way or five-way model may be more descriptive, given the possible repeated patterns of activity from week to week or from month to month. As in the preceding models, mutual independence is imposed for model identifiability. The total variability is decomposable into level-specific functional variance components as KY = K1 + K2 + ··· + Kr, where Kl denotes the covariance operator of the latent process at level l, l = 1, . . . , r. Here we use the notation from Table 1 for the multilevel hierarchical model with an arbitrary number of levels (NM).
2.2. Crossed Designs
Another group of designs admits crossing between levels. For example, the two-way crossed design (C2) is a functional analog of two-way ANOVA with an interaction term. It emphasizes a joint effect of two uncorrelated processes Xi(t) and Zj(t), as well as their interaction Wij(t), on the outcome Yij(t). The two-way crossed model with sub-sampling (C2s) applies to experimental designs where repeated measurements occur within each combination (i, j) induced by the first-level processes Xi(t) and Zj(t). In addition to the first-level crossing Wij(t) as in (C2), Uijk(t) accounts for variation in the replicates. For the phonetic example, Xi(t) and Zj(t) model the main effects of speakers and words, while Wij(t) models their interaction. Since multiple F0-contours may fall in category (i, j), we use Uijk(t) to capture the residual variation.
In general, we can consider a multi-way crossed functional model (CM) with an arbitrary number of crossings. In this model, r (r > 2) uncorrelated latent processes have exchangeable first-level effects on Y(t). Any subset of s (s ≤ r) processes out of r may have interactions, resulting in d functional additive terms in the model. For notational convenience, we express this model using d sub-index sets that define the model structure. For example, (C2s) with four terms can be written with d = 4 and the sub-index sets {i}, {j}, {i, j}, and {i, j, k}, which correspond to Xi(t), Zj(t), Wij(t), and Uijk(t), respectively. The assumptions on correlation structures stay the same as in the previous designs. We now show how to efficiently estimate these models.
3. Structured Functional PCA
We develop SFPCA to efficiently reduce dimensionality and extract signals for the class of functional models introduced in Section 2. This approach models latent processes parsimoniously via principal components (PCs) using the Karhunen–Loève expansion. SFPCA starts with estimating the covariance operators of latent processes. Following Koch (1968), we employ the MoM approach based on symmetric sums. By extending his approach to functional settings, we construct unbiased estimators of covariance matrices on a grid of p points {t1, . . . , tp}. After estimating the covariance operators, we conduct spectral decomposition to obtain eigenfunctions and principal scores that serve as coordinates in the space spanned by eigenfunctions. Note that the fixed effect is not of main interest here and can be estimated using existing methods. Without loss of generality, we assume that the data are already demeaned and we mainly focus on the random effects.
We use the two-way crossed design (C2) as the main example. Details for other models in Table 1 can be found in Appendix B. Let Xi(t), Zj(t), and Wij(t) be mutually uncorrelated mean-zero random processes as described in Section 2. Their covariance operators are KX, KZ, and KW, respectively, where KX(t, s) = E{Xi(t)Xi(s)}, KZ(t, s) = E{Zj(t)Zj(s)}, and KW(t, s) = E{Wij(t)Wij(s)}. Using the Karhunen–Loève expansion for Xi(t), Zj(t), and Wij(t), model (C2) becomes

$$Y_{ij}(t) = \sum_{k=1}^{\infty} \phi^X_k(t)\,\xi_{ik} + \sum_{l=1}^{\infty} \phi^Z_l(t)\,\zeta_{jl} + \sum_{m=1}^{\infty} \phi^W_m(t)\,\eta_{ijm}, \qquad (1)$$

where $\phi^X_k(t)$, $\phi^Z_l(t)$, and $\phi^W_m(t)$ are the eigenfunctions of the covariance operators KX, KZ, and KW. The scores $\xi_{ik}$, $\zeta_{jl}$, and $\eta_{ijm}$ are mutually independent random variables with mean 0 and variance $\lambda^X_k$, $\lambda^Z_l$, and $\lambda^W_m$, respectively, where $\lambda^X_k \ge \lambda^X_{k+1}$, $\lambda^Z_l \ge \lambda^Z_{l+1}$, and $\lambda^W_m \ge \lambda^W_{m+1}$ for every k, l, and m. Normality of the scores is not necessary for the results in this article, but may be a convenient mild assumption.
3.1. Level-Specific Spectral Decomposition
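Consider the case when most of the variability of each latent process is captured by the first N1, N2, and N3 principal components of Xi(t), Zj(t), and Wij(t), respectively. Model (1) can then be approximated as $Y_{ij}(t) \approx \sum_{k=1}^{N_1} \phi^X_k(t)\,\xi_{ik} + \sum_{l=1}^{N_2} \phi^Z_l(t)\,\zeta_{jl} + \sum_{m=1}^{N_3} \phi^W_m(t)\,\eta_{ijm}$. We vectorize the functional outcome on the discrete sampling points $\{t_1, \ldots, t_p\}$, and define Y = (Y11, . . . , Y1J1, . . . , YI1, . . . , YIJI) to be a p × n matrix with Yij := {Yij(t1), Yij(t2), . . . , Yij(tp)}T and $n = \sum_{i=1}^{I} J_i$. For notational simplicity we assume a balanced design where Ji = J, though this assumption is not necessary. Let $\phi^X_k = \{\phi^X_k(t_1), \ldots, \phi^X_k(t_p)\}^T$ and $\Phi^X = (\phi^X_1, \ldots, \phi^X_{N_1})$ be the first N1 principal components observed on the time grid $\{t_1, \ldots, t_p\}$. Similar definitions apply to $\Phi^Z$ and $\Phi^W$. Hence the truncated model is further expressed in matrix form as $Y_{ij} = \Phi^X \xi_i + \Phi^Z \zeta_j + \Phi^W \eta_{ij}$, where $\xi_i$, $\zeta_j$, and $\eta_{ij}$ collect the corresponding first N1, N2, and N3 scores.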
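We will show in the next section how to obtain K̂X, K̂Z, and K̂W. Given the availability of such estimators, we obtain $\hat\Phi^X$, $\hat\Phi^Z$, and $\hat\Phi^W$ as their first N1, N2, and N3 eigenvectors, where Nk (k = 1, 2, 3) is selected so that $\sum_{l=1}^{N_k} \hat\lambda_l \big/ \sum_{l} \hat\lambda_l \ge q$ and q is a threshold in (0, 1). Here $\hat\lambda_l$ denotes the estimated eigenvalues of the corresponding covariance matrix. Let $\hat\Lambda^X$, $\hat\Lambda^Z$, and $\hat\Lambda^W$ be the diagonal matrices of the first N1, N2, and N3 eigenvalues. We can estimate the truncated set of principal scores as the best linear unbiased predictor (BLUP) of the mixed effect model $Y_{ij} = \hat\Phi^X \xi_i + \hat\Phi^Z \zeta_j + \hat\Phi^W \eta_{ij}$, where $\xi_i \sim (0, \hat\Lambda^X)$, $\zeta_j \sim (0, \hat\Lambda^Z)$, and $\eta_{ij} \sim (0, \hat\Lambda^W)$. The BLUP estimators for the two-way crossed model (C2) and the three-way nested model (N3) are provided in Appendix A.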
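As a small illustration of the truncation step, the sketch below picks the number of components from a vector of estimated eigenvalues. It assumes the threshold rule is the usual cumulative proportion of variance explained (the smallest N whose leading eigenvalues account for at least a fraction q of the total). The function name `select_num_components` and the handling of non-positive eigenvalues are illustrative assumptions, not part of the method as published.

```python
def select_num_components(eigenvalues, q):
    """Smallest N whose leading eigenvalues explain a fraction >= q of the variance.

    Hypothetical helper: assumes a cumulative-proportion threshold rule.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Moment-based covariance estimates need not be positive semi-definite,
    # so non-positive eigenvalues are dropped first (an assumption of this sketch).
    lam = sorted((float(v) for v in eigenvalues if v > 0), reverse=True)
    if not lam:
        raise ValueError("no positive eigenvalues to select from")
    total = sum(lam)
    running = 0.0
    for n, value in enumerate(lam, start=1):
        running += value
        if running / total >= q:
            return n
    return len(lam)
```

With eigenvalues (5, 3, 1.5, 0.5), the cumulative proportions are 0.50, 0.80, 0.95, and 1.00, so a threshold of q = 0.9 keeps three components.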
3.2. MoM Covariance Operator Estimation
By extending the idea of symmetric sum MoM estimators in Koch (1968), we show that our estimated covariance matrices will be of the form K̂X = YGXYT, K̂Z = YGZYT, and K̂W = YGWYT, where GX, GZ, and GW are design-specific matrices of dimension n × n. In fact, for all the structured functional models, MoM estimators of the covariance operators are representable in the “sandwich” form YGYT. We illustrate the detailed calculation of the covariance operators for the two-way crossed design (C2) and the three-way nested design (N3). Results for other design schemes are provided in Appendix B.
3.2.1. Two-way crossed design (C2)
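For model (C2), we have

$$E\{(Y_{ij} - Y_{ik})(Y_{ij} - Y_{ik})^T\} = 2(K_W + K_Z), \quad j \neq k,$$
$$E\{(Y_{ij} - Y_{lj})(Y_{ij} - Y_{lj})^T\} = 2(K_W + K_X), \quad i \neq l,$$
$$E\{(Y_{ij} - Y_{lk})(Y_{ij} - Y_{lk})^T\} = 2(K_W + K_Z + K_X), \quad i \neq l,\ j \neq k.$$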
Let nij = 1 if Yij is observed and 0 otherwise; , , , , and . Define Dn×n = diag{N1, N2, . . . , NI} with Ni = ni0Ini0, , and . Pn×n = diag{P1, . . . , PI} with Pi = diag{n01, . . . , n0ni0} of dimension ni0 × ni0, FJ×n = (f1, . . . , fJ)T is the second-level analogy E, where fj is a vector with value 1 on observations with second-level process Zj(t) and 0 otherwise. If HZ = 2(KW + KZ), HX = 2(KW + KX), and HXZ = 2(KW + KZ + KX), then the results above indicate the following explicit MoM estimators
Thus, the covariance operators can be estimated as K̂Z = (ĤXZ – ĤX)/2 =: YGZYT, K̂X = (ĤXZ – ĤZ)/2 =: YGXYT, and K̂W = (ĤX + ĤZ – ĤXZ)/2 =: YGWYT.
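The symmetric-sum idea can be checked numerically without the sandwich matrices. The following sketch averages outer products of pairwise differences directly, which is far slower than the YGYT form but makes the moment identities explicit. It assumes a balanced, fully observed design stored as an array of shape (I, J, p); the function name `mom_c2` and this storage layout are illustrative assumptions.

```python
import numpy as np

def mom_c2(Y):
    """Brute-force symmetric-sum MoM estimates for a balanced (C2) design.

    Y has shape (I, J, p): Y[i, j] is the demeaned curve of cell (i, j)
    on a grid of p points. Requires I >= 2 and J >= 2.
    """
    I, J, p = Y.shape
    sums = {name: np.zeros((p, p)) for name in ("Z", "X", "XZ")}
    counts = {name: 0 for name in sums}
    for i in range(I):
        for j in range(J):
            for l in range(I):
                for k in range(J):
                    if i == l and j == k:
                        continue
                    d = Y[i, j] - Y[l, k]
                    if i == l:        # same i, different j: estimates 2(K_W + K_Z)
                        name = "Z"
                    elif j == k:      # same j, different i: estimates 2(K_W + K_X)
                        name = "X"
                    else:             # both differ: estimates 2(K_W + K_Z + K_X)
                        name = "XZ"
                    sums[name] += np.outer(d, d)
                    counts[name] += 1
    H = {name: sums[name] / counts[name] for name in sums}
    K_X = (H["XZ"] - H["Z"]) / 2
    K_Z = (H["XZ"] - H["X"]) / 2
    K_W = (H["X"] + H["Z"] - H["XZ"]) / 2
    return K_X, K_Z, K_W
```

For purely additive data Yij = xi + zj in a balanced design, the estimate of KW is zero up to rounding and the estimates of KX and KZ reduce to the usual unbiased sample covariances of the xi and the zj, which gives a convenient sanity check.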
3.2.2. Three-way nested model
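Consider model (N3), where Yijk(t) = Xi(t) + Uij(t) + Wijk(t), i = 1, 2, . . . , I; j = 1, 2, . . . , Ji; k = 1, 2, . . . , nij, and W, U, and X are the three latent processes nested in order. Similar to the approach for (C2), we have

$$E\{(Y_{ijk} - Y_{ijl})(Y_{ijk} - Y_{ijl})^T\} = 2K_W, \quad k \neq l,$$
$$E\{(Y_{ijk} - Y_{iml})(Y_{ijk} - Y_{iml})^T\} = 2(K_W + K_U), \quad j \neq m,$$
$$E\{(Y_{ijk} - Y_{lmq})(Y_{ijk} - Y_{lmq})^T\} = 2(K_W + K_U + K_X), \quad i \neq l.$$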
Let Yijk = {Yijk(t1), . . . , Yijk(tp)}T, , , , , . D1 = diag{N11, . . . , NIJI}, where Nij = nijInij, and D2 = diag{N1, . . . , NI}, where Ni = ni·Ini·; , . If HW = 2KW, HU = 2(KW + KU), and HX = 2(KW + KU + KX), we obtain
(2) |
Hence, K̂W = ĤW/2, K̂U = (ĤU – ĤW)/2, and K̂X = (ĤX – ĤU)/2 all have the form YGYT. In general, multi-way nested and crossed designs can be estimated through a similar workflow (see Appendix B for details).
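The nested case can be sketched in the same brute-force way. The code below assumes a balanced, fully observed design stored as an array of shape (I, J, K, p) and classifies each ordered pair of curves by the first level at which their indices differ. The name `mom_n3` and the array layout are illustrative assumptions, and the loop over all pairs is only practical for small examples.

```python
import itertools
import numpy as np

def mom_n3(Y):
    """Brute-force symmetric-sum MoM estimates for a balanced (N3) design.

    Y has shape (I, J, K, p): Y[i, j, k] is the demeaned curve for
    replicate k within unit j within subject i. Requires I, J, K >= 2.
    """
    I, J, K, p = Y.shape
    sums = {name: np.zeros((p, p)) for name in ("W", "U", "X")}
    counts = {name: 0 for name in sums}
    cells = list(itertools.product(range(I), range(J), range(K)))
    for a, b in itertools.permutations(cells, 2):
        d = Y[a] - Y[b]
        if a[0] != b[0]:      # different subject: estimates 2(K_W + K_U + K_X)
            name = "X"
        elif a[1] != b[1]:    # same subject, different j: estimates 2(K_W + K_U)
            name = "U"
        else:                 # same (i, j), different k: estimates 2 K_W
            name = "W"
        sums[name] += np.outer(d, d)
        counts[name] += 1
    H = {name: sums[name] / counts[name] for name in sums}
    K_W = H["W"] / 2
    K_U = (H["U"] - H["W"]) / 2
    K_X = (H["X"] - H["U"]) / 2
    return K_X, K_U, K_W
```

When the replicates within each (i, j) are identical, the estimate of KW is exactly zero and the estimate of KU equals the average over subjects of the within-subject sample covariance of the Uij.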
3.3. Structured High-Dimensional Data
Given the current research emphasis on high-dimensional data, linear models are still difficult to fit. Here we show that the entire model class described in Table 1 can be fitted using fast approaches. Note that the estimation procedures in the previous sections assume that the MoM estimators of the covariance operators can be constructed and decomposed. When the dimension of observations, p, is moderate, the methods described in Section 3 are straightforward. However, if the observations are high-dimensional, such as p > 10,000, the approach is no longer feasible. Calculating and storing a p-dimensional covariance operator K̂p×p is computationally expensive, and conducting spectral decomposition will become prohibitive. One could possibly smooth and down-sample the data assuming that the data are generated from low-rank intrinsic features. But in many scenarios, data are densely sampled for us to explore finer information and we would like to preserve the high resolution. Thus, we propose an alternative approach based on a rank-preserving transformation. This algorithm allows efficient calculation of the eigenfunctions and eigenvalues without requiring either storing or diagonalizing the estimated covariance matrices in high-dimensional space.
We outline the algorithm as follows. Throughout this section, we assume that n < p. Hence, the induced covariance matrix is at most of rank n. Zipunnikov et al. (2011) propose an approach that avoids calculating the covariance operators in the original p-dimensional space. Consider (C2) as an example: the idea is to map the model onto a lower-dimensional space and obtain $\tilde Y_{ij} = C Y_{ij} = C X_i + C Z_j + C W_{ij}$, where the matrix C should be of dimension m × p and $m \ll p$. An arbitrarily chosen C will lose information from the p-dimensional data. However, we can find a C such that the Ỹij span a space that preserves the ordering and important features of the original data space. One possible choice is to start with the whole data matrix, Y, which is obtained by column binding the individual data vectors Yij. Suppose that Y = VS1/2UT is the singular value decomposition (SVD) of Y, and let C = VT. Given that Yij = VỸij, the data in the reduced-dimensional space, Ỹij, retain the information in the original space.
The model becomes $\tilde Y_{ij} = V^T X_i + V^T Z_j + V^T W_{ij}$. Theorem 1 in Zipunnikov et al. (2011) shows that this transformation preserves full information for the linear PCA model. The eigenfunctions for the original model can be recovered by left multiplying the eigenfunctions obtained in the new model by V, and the eigenvalues remain unchanged. This is straightforward to implement, as the number of operations involved in the SVD of Y is linear in p. After obtaining the SVD of Y, each column Yij can be represented as Yij = VS1/2Uij, where Uij is the corresponding column of the matrix UT. Therefore, the vectors Yij differ only via the factors Uij of length n, which are much lower-dimensional. Comparing this SVD representation of Yij with the original model (C2), it follows that the structured separation of the variability modeled by the high-dimensional latent processes Xi, Zj, and Wij is identical to the structured separation of the low-dimensional vectors Uij. This is the key observation that motivates our approach. This model has an “intrinsic” dimensionality that is induced by the sample size n. The low-dimensional model is estimable using SFPCA as in Section 3 and requires only O(n³) calculations.
We obtain $\hat\xi_i$, $\hat\zeta_j$, and $\hat\eta_{ij}$ as the induced BLUP in the lower-dimensional model, with the matrices AX, AZ, and AW replaced by their corresponding estimates ÂX, ÂZ, and ÂW. Furthermore, the eigenvectors $\hat\Phi^X$, $\hat\Phi^Z$, and $\hat\Phi^W$ in the original space may be recovered by left multiplying ÂX, ÂZ, and ÂW by V. We provide the formulas for the final estimates and their detailed derivation for the two-way crossed model (C2) and the three-way nested model (N3) in Appendix A. Up until the last step, all the calculations can be conducted in O(n³) complexity. Therefore, fitting the model in a reduced-dimensional space yields the high-dimensional principal components in time that is linear in p. This means that complex statistical models for high-dimensional data sets can be fitted quickly.
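The projection step can be sketched numerically as follows. The code takes a p × n data matrix Y and any symmetric n × n matrix G, diagonalizes the n × n matrix built from the projected data, and maps the eigenvectors back to the original space, so that the p × p matrix YGYT is never formed. The function name `reduced_eig` is an illustrative assumption, and the centering matrix used in the usage note is a stand-in for the design-specific G matrices, which are not reproduced here.

```python
import numpy as np

def reduced_eig(Y, G):
    """Eigen-decomposition of Y G Y^T computed in the reduced n-dimensional space.

    Y is p x n (columns are curves) and G is a symmetric n x n matrix.
    Returns eigenvalues in decreasing order and the matching p-dimensional
    eigenvectors as columns.
    """
    # Thin SVD: Y = V diag(s) Ut, where s holds the diagonal of S^{1/2}.
    V, s, Ut = np.linalg.svd(Y, full_matrices=False)
    Y_tilde = s[:, None] * Ut              # equals V^T Y, of size n x n when p > n
    K_tilde = Y_tilde @ G @ Y_tilde.T      # reduced-space covariance estimate
    lam, A = np.linalg.eigh(K_tilde)
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    Phi = V @ A                            # map eigenvectors back to p dimensions
    return lam, Phi
```

For example, with G = (I − 11T/n)/(n − 1) the product YGYT is the ordinary sample covariance of the columns of Y, and `reduced_eig` returns its nonzero eigenvalues and eigenvectors while only ever decomposing an n × n matrix after the initial SVD.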
3.4. Model with Noise
So far we have assumed that the data are measured without noise. However, the algorithm can be naturally extended to “noisy models.” When the noise component has a smooth covariance structure on the functional domain and can be expressed as another latent process such as Uij(t) in model (N2) and Wijk(t) in model (N3), SFPCA directly applies. When there is white noise ε ~ (0, σ2) along the function and σ2 > 0, we propose several approaches to smooth either the raw data or the covariance matrix estimators.
Taking model (N3) as an example, suppose that the observed data are Ỹijk(t) = Yijk(t) + εijkt. The symmetric sum MoM estimators as in equation (2) become H̃ = ỸGỸT, with EH̃ = EĤ + σ²I, where Ĥ = YGYT. For low-dimensional data, where the rank-preserving projection in Section 3.3 is not necessary, we estimate EĤ by smoothing the off-diagonal surface of H̃ as in Staniswalis and Lee (1998), and proceed with the SFPCA algorithm as in the “noise-free” scenarios. However, we encounter multiple difficulties when applying this approach to high-dimensional functional data. First, it is computationally infeasible to conduct bivariate smoothing on the p × p covariance matrix when, say, p ≥ 10,000. Second, although the white noise remains i.i.d. when projected onto the lower-dimensional space, the one-to-one mapping of the eigenvalues and principal scores between the p-dimensional model and the reduced n-dimensional model no longer holds after smoothing the covariance matrix in the reduced-dimensional space.
Therefore, we recommend pre-processing the raw data by smoothing before conducting SFPCA. There is a trade-off between the signal retained from the raw data and the smoothness of the pre-processed data. As we have observed in our simulation settings, the first eigenvalues from the smoothed data are usually under-estimated. An alternative approach for high-dimensional functional data is to apply a “structured” twist of the fast covariance estimation (FACE) algorithm (Xiao, Li, and Ruppert, 2013; Xiao et al., 2014). Their algorithm implements a computationally fast sandwich smoother on the sample covariance matrix YYT, and directly provides the eigenvalues and eigenfunctions without explicitly constructing the smoothed covariance matrix. Since the covariance matrix estimator for each latent process in SFPCA has the same sandwich expression YGYT, we are able to define the new data matrix Ỹ := YG1/2 and directly apply FACE to Ỹ. We refer to those articles for more details.
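The construction Ỹ := YG1/2 can be sketched as below for the case where G is symmetric positive semi-definite, so that a real symmetric square root exists. This is only an assumption-laden illustration: the design-specific G matrices are differences of symmetric sums and may not be positive semi-definite, in which case this sketch raises an error rather than guessing at a workaround. The function name `sandwich_to_data` and the tolerance are hypothetical, and the FACE smoother itself is not implemented here.

```python
import numpy as np

def sandwich_to_data(Y, G, tol=1e-10):
    """Return Y_tilde = Y G^{1/2}, so that Y_tilde Y_tilde^T = Y G Y^T.

    Assumes G is symmetric positive semi-definite (checked up to `tol`).
    """
    G = (G + G.T) / 2                      # symmetrize against rounding error
    w, Q = np.linalg.eigh(G)
    if w.min() < -tol * max(1.0, np.abs(w).max()):
        raise ValueError("G is not positive semi-definite; no real square root")
    w = np.clip(w, 0.0, None)              # remove tiny negative rounding errors
    G_half = (Q * np.sqrt(w)) @ Q.T        # symmetric square root of G
    return Y @ G_half
```

The resulting matrix has the same dimensions as Y, so any routine that expects a data matrix and works with its cross-product can be applied to it unchanged.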
4. Simulations
To better understand how SFPCA performs in practice, we conduct simulation studies for both low- and high-dimensional functional data, under various experimental designs and signal-to-noise ratios.
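For the three-way nested model (N3), we generate the high-dimensional data based on the true model

$$Y_{ijk}(t) = \sum_{l=1}^{N_1} \phi^X_l(t)\,\xi_{il} + \sum_{l=1}^{N_2} \phi^U_l(t)\,\theta_{ijl} + \sum_{l=1}^{N_3} \phi^W_l(t)\,\zeta_{ijkl} + \varepsilon_{ijkt}, \qquad (3)$$

where i = 1, . . . , I; j = 1, . . . , J; k = 1, . . . , K; N1 = N2 = N3 = 4, , 1, 2, 3, 4; ; p = 50,000, I = 50, J = 5, and K = 5. The eigenfunctions are specified as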
sin(2πt) | l | |
cos(2πt) | sin(6πt) | |
sin(4πt) | cos(6πt) | |
cos(4πt) | sin(8πt) |
We vary the standard deviation of the white noise σ to be 0, 0.1, 0.5, and 1, and conduct 100 simulations under each scenario. To compare the estimation accuracy, the numbers of PCs N1, N2, and N3 are treated as known. Figure 3 shows the estimated eigenfunctions when σ = 0.5. Overall, the shapes of the functions are well recovered. As we go from lower (Xi) to higher (Wijk) hierarchies, the estimation gets better because the level-specific sample size increases. Within each latent process, the first few eigenfunctions with larger eigenvalues are better estimated than the later ones. Table 2 lists the mean squared errors (MSEs) of the estimated eigenvalues and of ξ, θ, and ζ under different signal-to-noise ratios. More results for this simulation can be found in Appendix C.
Table 2.
σ | MSEλX (10−2) | | | | MSEλU (10−2) | | | | MSEλW (10−2) | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 3.2 | 0.9 | 0.2 | 0.1 | 1.2 | 0.3 | 0.1 | 0.02 | 0.2 | 0.1 | 0.01 | 2E-3
0.1 | 4.5 | 0.9 | 0.2 | 0.1 | 1.3 | 0.7 | 0.2 | 0.2 | 0.2 | 0.1 | 0.02 | 8E-3
0.5 | 5 | 1.6 | 1.4 | 0.4 | 1.3 | 9.3 | 3.2 | 1.0 | 6.5 | 6.7 | 3.5 | 2.7
1 | 8 | 5 | 2.8 | 0.6 | 2 | 15.5 | 4.1 | 0.8 | 102 | 101 | 33.6 | 21.7

 | MSEξ (10−2) | | | | MSEθ (10−2) | | | | MSEζ (10−2) | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
 | ξ 1 | ξ 2 | ξ 3 | ξ 4 | θ 1 | θ 2 | θ 3 | θ 4 | ζ 1 | ζ 2 | ζ 3 | ζ 4
2E-17 | 14.7 | 11.2 | 7.6 | 136 | 120 | 4.5 | 2.5 | 2 | 20.2 | 8.1 | 0.5 | 2.6
3E-8 | 42.7 | 18.2 | 5.5 | 34.7 | 305 | 5.6 | 3.2 | 2 | 280 | 30.5 | 9.9 | 15.8
4E-7 | 267 | 129.2 | 17.7 | 22.2 | 1660 | 22.8 | 12 | 19.1 | 1623 | 226 | 161 | 102
1E-6 | 370 | 159 | 36.1 | 22.5 | 1657 | 50.1 | 58.3 | 31.7 | 1677 | 466 | 245 | 153
We have also conducted simulation studies for (C2) model under different sample sizes and by smoothing the off-diagonal matrix. The results are also presented in Appendix C.
5. Data Applications
SFPCA can be applied to various types of structured data, including the three examples discussed in the Introduction. The SHHS data were analyzed in detail in Di et al. (2009) with MFPCA, which is a special case of the methodology considered in this article. Here we provide results for the phonetic study and the accelerometer data.
5.1. Phonetic Study
The phonetic study of the Luobuzhai Qiang dialect consists of F0-contours from 8 subjects speaking 19 words under 3 contexts. Every word contains up to 4 syllables, each corresponding to one of the five vowels: “ə”, “a”, “e”, “i”, and “u”. The pitch values of the contours are measured at 11 equidistant time points that are standardized based on the total duration of the vowel. As previously observed, given the balanced study design, the marginal shapes of the contours are correlated with the associated vowels. In addition, each curve demonstrates speaker-specific and word-generic variations. To assess the effect of these covariates with a relatively simple specification, Aston et al. (2010) assume that all the latent processes lie in the same space spanned by a common set of eigenfunctions, and that covariates are associated with pitch levels through the principal scores, that is, the weights of the eigenfunctions. Here we relax these assumptions and attempt to fully evaluate the variability of each latent process as indicated by the data structure. We fit a two-way crossed model with subsampling (C2s) as in Table 1, but absorb the speaker-word interaction Wij(t) into Uijk(t). More specifically, the observed pitch contour Yijk(t) is modeled as μ(t, vijk) + Xi(t) + Zj(t) + Uijk(t), where μ(t, vijk) is the fixed effect determined by vowel vijk ∈ {“ə”, “a”, “e”, “i”, “u”}, and Xi(t) and Zj(t) are two independent first-level random effects for speaker i = 1, 2, . . . , 8 and word j = 1, 2, . . . , 19, respectively. Uijk(t) accounts for all the remaining variability, such as the tone, stress, and intonation of the syllables. By applying the SFPCA algorithm, we extract the PCs as shown in Figure 4.
The speaker-specific deviation from the population average explains 45% of the total variation in the data, among which the majority (99.86%) is captured by the first PC, which assigns equal weights over time. Similarly, PC 1 for the word-specific process Zj(t) also stays constant over time. This is consistent with the findings in Aston et al. (2010): most of the variation across speakers or words arises from the “shift” in the average pitch level. However, instead of further modeling the overall principal scores to determine whether the “shift” is speaker- or word-dependent as in Aston et al. (2010), we can claim that the first PC of Xi(t) corresponds to speaker heterogeneity and the first PC of Zj(t) accounts for word difference. Under the threshold of 99%, we only keep one PC for Xi(t), two for Zj(t), and three for Uijk(t). The fact that more PCs are selected to represent the features of Zj(t) and Uijk(t) implies greater complexity induced by the inherent word and syllable effects. To further evaluate the effects of speaker- or word-related covariates, we can conduct regression analysis specifically on the principal scores of each latent process.
Furthermore, with SFPCA we can quantify the relative effect size (Shou et al., 2013) of speakers versus words based on the proportion of variation explained by Xi(t) and Zj(t) (45% vs. 12% in Figure 4), indicating that speaker heterogeneity is more than 3 times as large as word-to-word differences. This also helps us to select the current model over the model μ(t, vijk) + Xi(t) + Zj(t) + Wij(t) + Uijk(t), because the estimated variation explained by Wij(t) is negligible compared to that of the other latent processes. Such an assessment cannot be obtained using the very interesting analysis of Aston et al. (2010), as it would require an explicit modeling of the functional space. The two approaches are complementary, and both should be considered in particular applications.
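The comparison of speakers and words can be reproduced with a few lines of arithmetic. The sketch below reads the relative effect size as a simple ratio of explained-variance shares, which is an assumption made here for illustration and not necessarily the definition in Shou et al. (2013). The 45% and 12% shares come from the text; assigning the remaining 43% to Uijk(t) is also an assumption, and the function names are hypothetical.

```python
def variance_shares(level_variances):
    """Convert level-specific total variances into proportions of the overall variance."""
    total = sum(level_variances.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {level: value / total for level, value in level_variances.items()}


def share_ratio(shares, level_a, level_b):
    """Ratio of the variance share of level_a to that of level_b."""
    return shares[level_a] / shares[level_b]


# 45 and 12 are the reported percentages; 43 for the residual level is assumed.
level_variances = {"speaker X": 45.0, "word Z": 12.0, "residual U": 43.0}
shares = variance_shares(level_variances)
print(round(share_ratio(shares, "speaker X", "word Z"), 2))  # 3.75
```

The same calculation applied to a model that includes Wij(t) would show a share close to zero for that level, which is the basis for dropping the interaction term.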
5.2. Accelerometer Data
In the accelerometer study, each participant has activity intensity values recorded for 5 days during active periods (after waking up and before bedtime), which are identified using methods developed by Bai et al. (2012). Bai et al. (2012) mainly focus on predicting movement type from the three-axis accelerometer records. Here we are more interested in using the same data set to assess the variability of energy expenditure across the population and from day to day. As Figure 1 indicates a periodic pattern every hour, we organize the observed curves into three nested levels: hours within days within subjects.
The three-way nested model (N3) is applied to decompose the variance of the data. For the original data set, which contains 36,000 measurements per hour, we can implement SFPCA using the methods described in Section 3.3 for high-dimensional data. However, in the interest of understanding the circadian patterns of daily activity, it is more informative to smooth the data by averaging energy expenditure within every minute and to conduct SFPCA on the summarized data. For simplicity, we also discard the observations at the end of the study that do not complete an entire hour. Therefore, there are 60 measurements for every curve, with a maximum of 19 curves per day for each subject. The first four principal components for the three levels of latent processes are displayed in Figure 5. The first component for the subject-specific process Xi(t) accounts for the heterogeneity of the average activity level in the population, while the remaining components show either a one-peak or a double-peak energy expenditure pattern within one hour. Compared to the subject-specific and hour-specific effects, the day-to-day variation (8.3%) accounts for a much smaller portion of the total variability. The majority (about 76%) of the total variability is contained in the hour-to-hour heterogeneity. This indicates quantitatively that people follow a similar routine every day, but that their energy expenditure changes dramatically within a day, depending on the type of activity they are involved in during a particular hour. The relative effect size of the different processes can also be evaluated as in the previous example.
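The preprocessing step that turns 36,000 measurements per hour into 60 values per curve can be sketched as follows. This is an illustrative reconstruction under the stated 10 Hz sampling rate, not the authors' code. The function names are hypothetical, and the input is assumed to be a plain list of activity intensity values for one subject-day, ordered in time.

```python
def minute_means(samples, hz=10, seconds_per_bin=60):
    """Average consecutive samples within each one-minute bin, dropping a partial last bin."""
    bin_size = hz * seconds_per_bin  # 600 samples per minute at 10 Hz
    n_bins = len(samples) // bin_size
    return [
        sum(samples[b * bin_size:(b + 1) * bin_size]) / bin_size
        for b in range(n_bins)
    ]


def hourly_curves(day_samples, hz=10):
    """Split one day of samples into complete hours and summarize each as 60 minute means."""
    per_hour = hz * 3600  # 36,000 samples per hour at 10 Hz
    n_hours = len(day_samples) // per_hour  # an incomplete final hour is discarded
    return [
        minute_means(day_samples[h * per_hour:(h + 1) * per_hour], hz)
        for h in range(n_hours)
    ]


# Toy input: two full hours plus 500 leftover samples; values encode the minute index.
day = [float((n // 600) % 60) for n in range(2 * 36000 + 500)]
curves = hourly_curves(day)
print(len(curves), len(curves[0]))  # 2 hours, 60 values per hour
```

Each returned list is one hour-level curve with 60 points, which is the unit entering the three-way nested model as an hour within a day within a subject.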
6. Discussion
The defining characteristic of many functional studies is the existence of a specific correlation structure induced by the experimental design, which can directly affect inference. Thus, there is an increasing demand for methods that (1) respect the study design; (2) model multiple levels of variation; and (3) are computationally feasible in high dimensions. In response to this demand, we have introduced a class of structured functional models that include nested and crossed designs, and proposed a statistical framework, SFPCA, for analyzing these models. Given the independence assumption on the latent processes, the covariance structure of the observed outcome is fully captured by the covariance operators of the random processes. SFPCA is a set of efficient tools that estimate and analyze the covariance structures using a uniform protocol for all the models. It uses functional PCA for dimensionality reduction and feature extraction.
The extensive simulation studies demonstrate the potential of the methodology to recover level-specific features of the latent processes. When we apply SFPCA to two studies that collected accelerometer and phonetic data, we are able to distinguish the various layers of effects that are inherent in the data. Similar to Section 5 in Koch (1967), our methods can be extended to cases in which the covariance matrices differ across levels.
Future work should focus on developing more efficient unbiased method-of-moments estimators that are adaptable to unbalanced designs. The development of a combined methodology that fuses “naked” (nested/crossed) design-induced structures with covariate-driven parts, such as the one proposed in Greven et al. (2010), is an important, although challenging, step in generalizing this framework. Our methodology has a few potential limitations. The two most important are the lack of a more rigorous treatment of noise (Di et al., 2009) and the lack of accommodation for sparsity in the functional observations (Di, Crainiceanu and Jank, 2014).
Acknowledgements
The project described was supported by NIH grant R01 EB012547 from the National Institute of Biomedical Imaging and Bioengineering, NIH grants R01 NS060910 and R01 NS085211 from the National Institute of Neurological Disorders and Stroke, NIH grants R01 MH095836 and R01 HL123407 from the National Institute of Mental Health, and the Emmy Noether grant GR 3793/1-1 from the German Research Foundation.
We thank Dr John Aston for kindly providing us with the phonetic study data and for his inspiring thoughts on the application of SFPCA.
7. Supplementary Materials
Web Appendices A, B, and C, referenced in Sections 3 and 4, are available with this paper at the Biometrics website on Wiley Online Library. The corresponding R code for our method is also available at the Biometrics website.
References
- Aston JAD, Chiou JM, Evans JP. Linguistic pitch analysis using functional principal component mixed effect models. Journal of the Royal Statistical Society, Series C. 2010;59:297–317.
- Bai J, Goldsmith J, Caffo BS, Glass T, Crainiceanu CM. Movelets: A dictionary of movement. Electronic Journal of Statistics. 2012;6:559–578. doi: 10.1214/12-EJS684.
- Baladandayuthapani V, Mallick BK, Young Hong M, Lupton JR, Turner ND, Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics. 2008;64:64–73. doi: 10.1111/j.1541-0420.2007.00846.x.
- Brumback BA, Rice JA. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association. 1998;93:961–976.
- Crainiceanu CM, Caffo BS, Di CZ, Punjabi NM. Nonparametric signal extraction and measurement error in the analysis of electroencephalographic activity during sleep. Journal of the American Statistical Association. 2009;104:541–555. doi: 10.1198/jasa.2009.0020.
- Di CZ, Crainiceanu CM, Caffo BS, Punjabi NM. Multilevel functional principal component analysis. The Annals of Applied Statistics. 2009;3:458–488. doi: 10.1214/08-AOAS206SUPP.
- Di CZ, Crainiceanu CM, Jank WS. Multilevel sparse functional principal component analysis. Stat. 2014;29:126–143. doi: 10.1002/sta4.50.
- Greven S, Crainiceanu CM, Caffo BS, Reich D. Longitudinal functional principal component analysis. Electronic Journal of Statistics. 2010;4:1022–1054. doi: 10.1214/10-EJS575.
- Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi: 10.1111/j.0006-341x.2002.00121.x.
- Guo W. Functional data analysis in longitudinal settings using smoothing splines. Statistical Methods in Medical Research. 2004;13:49–62. doi: 10.1191/0962280204sm352ra.
- Herrick RC, Morris JS. Wavelet-based functional mixed model analysis: Computation considerations. In Proceedings, Joint Statistical Meetings. ASA Section on Statistical Computing. 2006.
- Liu Z, Guo W. Functional mixed effects models. Wiley Interdisciplinary Reviews: Computational Statistics. 2012;4:527–534.
- Koch GG. A general approach to the estimation of variance components. Technometrics. 1967;9:93–118.
- Koch GG. Some further remarks concerning “A general approach to the estimation of variance components”. Technometrics. 1968;10:551–558.
- Morris JS, Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B. 2006;68:179–199. doi: 10.1111/j.1467-9868.2006.00539.x.
- Quan SF, Howard BV, Iber C, Kiley JP, Nieto FJ, O'Connor GT, Rapoport DM, Redline S, Robbins J, Samet JM, Wahl PW. The sleep heart health study: Design, rationale, and methods. Sleep. 1997;20:1077–1085.
- Ramsay JO, Silverman B. Functional Data Analysis. 2nd edition. Springer; New York: 2005.
- Shou H, Eloyan A, Lee S, Zipunnikov V, Caffo BS, Lindquist M, Crainiceanu CM. Quantifying the reliability of image replication studies: The image intra-class correlation coefficient (I2C2). Cognitive, Affective, and Behavioral Neuroscience. 2013;13:714–724. doi: 10.3758/s13415-013-0196-0.
- Staicu AM, Crainiceanu CM, Carroll RJ. Fast methods for spatially correlated multilevel functional data. Biostatistics. 2010;11:177–194. doi: 10.1093/biostatistics/kxp058.
- Staniswalis JG, Lee JJ. Nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association. 1998;93:1403–1418.
- Xiao L, Li Y, Ruppert D. Fast bivariate P-splines: The sandwich smoother. Journal of the Royal Statistical Society, Series B. 2013;75:577–599.
- Xiao L, Ruppert D, Zipunnikov V, Crainiceanu C. Fast covariance estimation for high-dimensional functional data. Statistics and Computing. 2014; in press. doi: 10.1007/s11222-014-9485-x.
- Yao F, Clifford AJ, Dueker SR, Follett J, Lin Y, Buchholz BA, Vogel JS. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics. 2003;59:676–685. doi: 10.1111/1541-0420.00078.
- Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100:577–590.
- Zhou L, Huang JZ, Martinez JG, Maity A, Baladandayuthapani V, Carroll RJ. Reduced rank mixed effects models for spatially correlated hierarchical functional data. Journal of the American Statistical Association. 2010;105:390–400. doi: 10.1198/jasa.2010.tm08737.
- Zipunnikov V, Caffo BS, Yousem DM, Davatzikos C, Schwartz BS, Crainiceanu CM. Multilevel functional principal component analysis for high-dimensional data. Journal of Computational and Graphical Statistics. 2011;20:852–873. doi: 10.1198/jcgs.2011.10122.