Author manuscript; available in PMC: 2022 Jun 4.
Published in final edited form as: Stat Interface. 2022 Jan 11;15(2):209–223. doi: 10.4310/21-sii712

Covariate-adjusted hybrid principal components analysis for region-referenced functional EEG data

Aaron Wolfe Scheffler 1,*, Abigail Dickinson 2, Charlotte DiStefano 3, Shafali Jeste 4, Damla Şentürk 5
PMCID: PMC9165697  NIHMSID: NIHMS1759541  PMID: 35664510

Abstract

Electroencephalography (EEG) studies produce region-referenced functional data via EEG signals recorded across scalp electrodes. The high-dimensional data can be used to contrast neurodevelopmental trajectories between diagnostic groups, for example between typically developing (TD) children and children with autism spectrum disorder (ASD). Valid inference requires characterization of the complex EEG dependency structure as well as covariate-dependent heteroscedasticity, such as changes in variation over developmental age. In our motivating study, EEG data is collected on TD and ASD children aged two to twelve years old. The peak alpha frequency, a prominent peak in the alpha spectrum, is a biomarker linked to neurodevelopment that shifts as children age. To retain information, we model patterns of alpha spectral variation, rather than just the peak location, regionally across the scalp and chronologically across development. We propose a covariate-adjusted hybrid principal components analysis (CA-HPCA) for EEG data, which utilizes both vector and functional principal components analysis while simultaneously adjusting for covariate-dependent heteroscedasticity. CA-HPCA assumes the covariance process is weakly separable conditional on observed covariates, allowing for covariate-adjustments to be made on the marginal covariances rather than the full covariance, leading to stable and computationally efficient estimation. The proposed methodology provides novel insights into neurodevelopmental differences between TD and ASD children.

Keywords: Autism spectrum disorder, Covariate-adjustments, Electroencephalography, Functional data analysis, Heteroscedasticity

1. INTRODUCTION

Despite the numerous developmental delays observed in children with autism spectrum disorder (ASD) compared to their typically developing peers (TD), the neural mechanisms underpinning these delays are not well characterized. To address this gap, our motivating study collected resting-state electroencephalograms (EEG) on TD and ASD children aged two to twelve years old, making it possible to contrast neural processes between the two diagnostic groups over a wide developmental range. EEG and magnetoencephalography (MEG) characterize cortical and intracortical brain activity, respectively, via the measurement of electrical potentials and their corresponding oscillatory dynamics (i.e. spectral characteristics). Recent studies in cognitive development using both EEG and MEG highlight the peak alpha frequency (PAF), defined as the location of a single prominent peak in the spectral density within the alpha frequency band (6-14 Hz), as a potential biomarker associated with autism diagnosis [16, 14, 15]. Specifically, the location of the PAF tends to shift from lower to higher frequencies as TD children age but this chronological shift is notably delayed or absent in ASD children [38, 30, 14, 15]. This trend is observed in our motivating data from a temporal electrode (T8) where the PAF, identifiable as ‘humps’ in age-specific slices of the group-specific bivariate mean alpha spectral density (across age and frequency), increases in frequency with age for TD children but not for ASD children (Figure 1(a)).

Figure 1.

(a) Slices of the group-specific bivariate mean alpha spectral density (across age and frequency (6-14 Hz)) at ages 50, 70, 90 and 110 months from the T8 electrode. Darker lines correspond to older children. (b) A schematic diagram of the 25-electrode 10-20 montage observed in the EEG data.

Although the PAF holds promise as a biomarker of neural development in TD and ASD children, emphasis on the identification of a single peak carries considerable drawbacks. Not only is estimation of the subject-specific PAF error-prone due to noise and multiple local maxima [11], but identification of a single peak frequency also reduces the alpha spectral band to a single scalar summary, resulting in a loss of information. To avoid these limitations, we follow Scheffler et al [36] and consider the entire spectral density across the alpha band as a functional measurement of neural activity.

In our motivating data, EEG signals are recorded from a high-density electrode array for several minutes and the time series at each channel is divided into overlapping two-second segments prior to Fast Fourier Transform (FFT) to the frequency domain. Spectral information is averaged across segments to boost the signal-to-noise ratio and the resulting data form region-referenced functional data with electrodes and spectral densities referred to as the regional and functional dimensions, respectively.
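The segmentation-and-averaging step described above can be sketched as a Welch-style average of segment periodograms. The following minimal Python illustration is a sketch only: the sampling rate, overlap, and window choice are illustrative assumptions, not the study's actual preprocessing pipeline.

```python
import numpy as np

def avg_power_spectrum(signal, fs, seg_sec=2.0, overlap=0.5):
    """Average periodograms over overlapping segments (Welch-style).

    Splits `signal` into overlapping segments of `seg_sec` seconds,
    applies an FFT to each windowed segment, and averages the resulting
    power spectra to boost the signal-to-noise ratio.
    """
    seg_len = int(seg_sec * fs)
    step = int(seg_len * (1 - overlap))
    n_segs = (len(signal) - seg_len) // step + 1
    window = np.hanning(seg_len)
    spectra = []
    for s in range(n_segs):
        seg = signal[s * step : s * step + seg_len] * window
        spectra.append(np.abs(np.fft.rfft(seg)) ** 2)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return freqs, np.mean(spectra, axis=0)

# Toy signal with an alpha-band oscillation; restrict to 6-14 Hz to obtain
# the functional measurement used in the paper
fs = 250.0  # assumed sampling rate
t = np.arange(0, 60, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + np.random.default_rng(0).normal(size=t.size)
freqs, spec = avg_power_spectrum(x, fs)
alpha = (freqs >= 6) & (freqs <= 14)
peak = freqs[alpha][np.argmax(spec[alpha])]  # location of the toy PAF
```

For this simulated signal the averaged spectrum recovers a clear alpha peak near 10 Hz, while any single two-second periodogram is far noisier.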

We focus on modeling and contrasting patterns of alpha spectral variation regionally across the scalp and chronologically across development for both the ASD and TD diagnostic groups. Previous research clearly shows that alpha spectral dynamics differ as a function of age between TD and ASD children, and assuming a constant covariance structure across development risks missing important findings. To preserve developmental information, we propose a covariate-adjusted hybrid principal components analysis (CA-HPCA) that models variation in region-referenced functional data while simultaneously allowing the patterns of variation to change as a function of subject-specific covariates. CA-HPCA assumes the covariance process is weakly separable conditional on observed covariates, allowing for covariate-adjustments to be made on the marginal covariances rather than the full covariance, leading to stable and computationally efficient estimation.

Since the introduction of the functional principal components analysis (FPCA) expansion (i.e. Karhunen-Loève expansion; [24, 28]), a detailed literature has developed around the estimation of functional principal scores and components for both densely and sparsely observed functional data along a single dimension (see Wang et al [39] for a thorough review). In recent years, the literature surrounding FPCA has shifted to consider functional data with more complex dependency structures, including repeatedly measured functional data [12, 13, 26, 32, 31, 45], longitudinally observed functional data [19, 8, 33, 7, 29], spatially correlated functional data [2, 18, 44, 37, 27], both spatially and longitudinally observed functional data [21, 35], and multivariate functional data [22, 9, 20]. While these methods permit modeling of high dimensional functional covariances, they are unable to adjust for covariates in the analysis of higher dimensional functional data where the covariate may introduce heteroscedasticity to the functional dependency structure (e.g. due to chronological age).

In the simplified context of one-dimensional functional data, existing methods allow for covariate-adjustments to both the functional mean and covariance. Generally, the functional mean is smoothed across the covariate-domain, or calculated for each class in the case of discrete covariates. Covariate-adjustments are made to the functional covariance in two ways: either both the eigenvalues and eigenfunctions of the functional covariance are allowed to change as a function of observed covariates, or the eigenfunctions are assumed to be constant across the covariate dimension but their corresponding eigenvalues (as well as principal scores) are covariate-dependent. In the former class, Cardot [5] proposed a non-parametric covariate-adjusted FPCA in the context of dense functional data and Jiang and Wang [23] extended covariate-adjusted FPCA to noisy or sparse settings by estimating subject-specific scores using conditional expectation. In both cases, covariance estimation is performed non-parametrically by simultaneous smoothing across the covariate and functional domains via kernel methods. By fixing eigenfunctions across the covariate domain, Chiou et al [10] introduced a semi-parametric functional regression model that estimates covariate-dependent principal scores using a single-index model and Backenroth et al [1] developed a heteroscedastic FPCA for repeatedly measured curves that models eigenvalues as an exponential function of covariate and subject-dependent effects. We note that a parallel but distinct time series literature exists which focuses on estimation of covariate modulated spectral densities [17, 25, 4], but these works primarily focus on directly modeling non-stationary spectra as opposed to functional data more generally and do not embed covariate-adjustments into simplifying assumptions of the high-dimensional covariance structure (i.e. weak separability).

Our proposed covariate-adjusted hybrid principal components analysis (CA-HPCA) combines existing one-dimensional methods for covariate-dependent functional heteroscedasticity with recent advances in multi-dimensional FPCA to allow covariate-adjustments in the context of region-referenced functional data. We briefly explore the methodological contributions of our proposed model and the resulting computational gains. A central theme in FPCA decompositions for multi-dimensional functional data is the use of simplifying assumptions regarding the covariance structure to ease estimation. A flexible approach in modeling two-dimensional functional data is to assume weak separability of the covariance process [7, 29] in which the marginal covariances along each dimension are targeted and the full covariance is projected onto a tensor basis formed from the corresponding marginal eigenfunctions. Thus, estimation is reduced from that of the total covariance in four-dimensions to the marginal covariances in two-dimensions for which efficient two-dimensional smoothers exist. Scheffler et al [35] extended weak separability to region-referenced functional EEG data by proposing a hybrid principal components analysis (HPCA) that includes a discrete regional dimension but this model does not allow for the mean or covariance to change across development as needed in our application. We leverage the simplifying assumptions and computational efficiency of HPCA by introducing covariate-dependence to the functional mean and covariance which allows the marginal eigenvalues and eigenfunctions to change across the covariate domain.

In addition to its flexible modeling framework, CA-HPCA also introduces major computational savings. These savings relate to the addition of a covariate dimension to estimation of the marginal covariances, which for a scalar covariate requires smoothing across three dimensions. Previous methods such as Cardot [5] and Jiang and Wang [23] utilized kernel methods to estimate covariate-dependent marginal covariances, but these approaches are computationally intensive and scale poorly with the introduction of additional covariates. To address this challenge, we extend the fast functional covariance smoothing proposed by Cederbaum et al [6] to allow for covariate-adjustments by including an additional basis along the covariate dimension. Thus, CA-HPCA generalizes covariate-adjustments to high-dimensional functional covariances and achieves a substantial reduction in computational burden by applying covariate-adjustments to the marginal covariances, with subsequent estimation performed via cutting-edge fast covariance smoothers.

A mixed effects framework is proposed to estimate the subject-specific scores with variance components that are a function of observed covariates. The estimated model components can be coupled with a parametric bootstrap resampling procedure to allow inference in the form of hypothesis testing and point-wise confidence intervals. We apply the proposed procedure to assess differences in alpha spectral dynamics between the TD and ASD groups across development. The remaining sections are organized as follows. Section 2 introduces the proposed CA-HPCA and Section 3 describes the corresponding estimation procedure. Application of the proposed method to our motivating EEG data follows in Section 4. Section 5 studies the finite-sample properties of CA-HPCA via extensive simulations. Section 6 concludes with a brief summary and discussion.

2. COVARIATE-ADJUSTED HYBRID PRINCIPAL COMPONENTS ANALYSIS (CA-HPCA)

Consider a region-referenced functional process observed in the presence of a continuous non-functional covariate $a_i \in \mathcal{A}$, denoted $Y_{di}(a_i, r, \omega)$, for subject i, i = 1, …, n_d, from group d, d = 1, …, D, in region r, r = 1, …, R, and at frequency ω, ω ∈ Ω. We decompose $Y_{di}(a_i, r, \omega)$ additively such that both the expectation and the covariance of the process depend on the covariate $a_i$,

Y_{di}(a_i, r, \omega) = \eta_d(a_i, r, \omega) + Z_{di}(a_i, r, \omega) + \epsilon_{di}(a_i, r, \omega),

where $\eta_d(a_i, r, \omega) = E\{Y_{di}(a_i, r, \omega) \mid a_i\}$ denotes the group-region mean function, $Z_{di}(a_i, r, \omega)$ denotes a mean zero region-referenced stochastic process with total covariance $\Sigma_{d,T}(a_i; r, \omega; r', \omega') = \mathrm{cov}\{Z_{di}(a_i, r, \omega), Z_{di}(a_i, r', \omega') \mid a_i\}$, and $\epsilon_{di}(a_i, r, \omega)$ denotes measurement error with mean zero and variance $\sigma_d^2$ that is independent across the regional, functional, and covariate domains. We assume the group-region mean functions $\eta_d(a, r, \omega)$ are smooth in both the functional domain Ω and the non-functional domain $\mathcal{A}$, though we place no restrictions across the regional domain $\mathcal{R}$, which in EEG data can lack the ordering provided by continuity.

In the proposed CA-HPCA model, we assume that the total covariance $\Sigma_{d,T}(a; r, \omega; r', \omega')$ is weakly separable for each $a \in \mathcal{A}$. Weak separability, a concept recently proposed by Lynch and Chen [29] for two-dimensional functional data and adapted by Scheffler et al [35] to region-referenced functional EEG data, implies that a covariance can be expressed as a weighted sum of separable covariance components and that the direction of variation (i.e. eigenvectors or eigenfunctions) along one dimension of the EEG data is the same across fixed slices of the other dimension. Specifically, this assumption allows a multi-dimensional functional process to be decomposed parsimoniously via tensors of one-dimensional eigencomponents obtained from marginal covariances along each dimension. Note that weak separability is more flexible than strong separability (i.e. separability) commonly utilized in spatiotemporal modeling, which requires that the total covariance, not just the directions of variation, be the same up to a constant across fixed slices of the other dimensions. Unlike previous applications of weak separability, we introduce covariate-dependent heteroscedasticity by assuming the total covariance is weakly separable conditional on observed covariates and that marginal covariances in each dimension vary smoothly along the covariate domain. Let the covariate-dependent regional and functional marginal covariances be defined as,

\{\Sigma_{d,R}(a)\}_{r,r'} = \int_\Omega \mathrm{cov}\{Z_{di}(a, r, \omega), Z_{di}(a, r', \omega)\}\, d\omega = \sum_{k=1}^{R} \tau_{dk,R}(a)\, v_{dk}(a, r)\, v_{dk}(a, r'),
\Sigma_{d,\Omega}(a, \omega, \omega') = \sum_{r=1}^{R} \mathrm{cov}\{Z_{di}(a, r, \omega), Z_{di}(a, r, \omega')\} = \sum_{\ell=1}^{\infty} \tau_{d\ell,\Omega}(a)\, \phi_{d\ell}(a, \omega)\, \phi_{d\ell}(a, \omega'),

where $v_{dk}(a, r)$ are the covariate-dependent eigenvectors of the regional marginal covariance matrix $\{\Sigma_{d,R}(a)\}_{r,r'}$, $\phi_{d\ell}(a, \omega)$ are the covariate-dependent eigenfunctions of the functional marginal covariance surface $\Sigma_{d,\Omega}(a, \omega, \omega')$, and $\tau_{dk,R}(a)$ and $\tau_{d\ell,\Omega}(a)$ are the regional and functional covariate-dependent marginal eigenvalues, respectively. Thus, there exists an orthogonal expansion of the covariate-dependent marginal covariances in terms of covariate-dependent marginal eigenvectors and eigenfunctions.
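To make the marginal-covariance construction concrete, the following sketch computes empirical regional and functional marginal covariances and their eigencomponents at a single fixed covariate value from simulated demeaned data. It is an illustration under simplifying assumptions: the integral over ω is replaced by a grid average, and the additional smoothing along the covariate domain used by the actual method is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R, W = 40, 5, 25                 # subjects, regions, functional grid points
Z = rng.normal(size=(n, R, W))      # demeaned data at a fixed covariate value a

# Regional marginal covariance: integrate (here: average, up to grid scaling)
# the cross-covariance over the functional dimension ω
Sigma_R = np.zeros((R, R))
for r in range(R):
    for rp in range(R):
        Sigma_R[r, rp] = np.mean(Z[:, r, :] * Z[:, rp, :])

# Functional marginal covariance: sum the cross-covariance over regions
Sigma_W = np.einsum('irw,irv->wv', Z, Z) / n

# Marginal eigencomponents at this covariate value (reordered descending)
tau_R, v = np.linalg.eigh(Sigma_R)
tau_W, phi = np.linalg.eigh(Sigma_W)
tau_R, v = tau_R[::-1], v[:, ::-1]
tau_W, phi = tau_W[::-1], phi[:, ::-1]

# Initial component choice by fraction of variance explained (90% guideline)
fve_R = np.cumsum(tau_R) / tau_R.sum()
Kd = int(np.searchsorted(fve_R, 0.9) + 1)
```

In the full method these decompositions are repeated at each observed covariate value, with the marginal covariances first smoothed along the covariate domain.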

Utilizing the covariate-dependent eigenvectors and eigenfunctions, the covariate-adjusted hybrid principal components decomposition (CA-HPCA) of Ydi(ai, r, ω) is given as,

Y_{di}(a_i, r, \omega) = \eta_d(a_i, r, \omega) + Z_{di}(a_i, r, \omega) + \epsilon_{di}(a_i, r, \omega) = \eta_d(a_i, r, \omega) + \sum_{k=1}^{R} \sum_{\ell=1}^{\infty} \xi_{di,k\ell}(a_i)\, v_{dk}(a_i, r)\, \phi_{d\ell}(a_i, \omega) + \epsilon_{di}(a_i, r, \omega),

where $\xi_{di,k\ell}(a_i)$ are uncorrelated subject-specific scores defined through the projection of the region-referenced stochastic process onto the covariate-dependent tensor basis, $\langle Z_{di}(a_i, \cdot, \cdot), v_{dk}(a_i, \cdot)\phi_{d\ell}(a_i, \cdot) \rangle = \sum_{r=1}^{R} \int_\Omega Z_{di}(a_i, r, \omega)\, v_{dk}(a_i, r)\, \phi_{d\ell}(a_i, \omega)\, d\omega$. Note that under the assumption of weak separability the subject-specific scores are uncorrelated over regions and frequencies. The CA-HPCA decomposition leads to the following decomposition of the total covariance $\Sigma_{d,T}(a; r, \omega; r', \omega')$,

\Sigma_{d,T}(a; r, \omega; r', \omega') = \mathrm{cov}\{Z_{di}(a, r, \omega), Z_{di}(a, r', \omega') \mid a\} + \sigma_d^2\, \delta(a; r, \omega; r', \omega') = \sum_{k=1}^{R} \sum_{\ell=1}^{\infty} \tau_{d,k\ell}(a)\, v_{dk}(a, r)\, v_{dk}(a, r')\, \phi_{d\ell}(a, \omega)\, \phi_{d\ell}(a, \omega') + \sigma_d^2\, \delta(a; r, \omega; r', \omega'),

where $\tau_{d,k\ell}(a) = \mathrm{var}\{\xi_{di,k\ell}(a)\}$ are covariate-dependent variance components and $\delta(a; r, \omega; r', \omega')$ denotes the indicator for $\{(r, \omega) = (r', \omega')\}$ at covariate value a. Because the variance components $\tau_{d,k\ell}(a)$ are allowed to vary along the covariate domain, covariate-dependent heteroscedasticity is introduced not only through the covariate-dependent marginal eigencomponents but also through their relative contribution to the total variance.
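The weak-separability structure above can be checked numerically at a fixed covariate value: build a total covariance as a weighted sum of separable tensor components, then verify that integrating out the functional dimension recovers the marginal eigenvectors. All quantities in this numpy sketch are simulated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(6)
R, W, K, L = 4, 10, 2, 3
v = np.linalg.qr(rng.normal(size=(R, K)))[0]     # marginal eigenvectors v_dk(a, r)
phi = np.linalg.qr(rng.normal(size=(W, L)))[0]   # marginal eigenfunctions phi_dl(a, w)
tau = np.abs(rng.normal(size=(K, L)))            # variance components tau_dkl(a)

# Weakly separable total covariance: weighted sum of separable components
Sigma = np.zeros((R, W, R, W))
for k in range(K):
    for l in range(L):
        Sigma += tau[k, l] * np.einsum('r,w,s,v->rwsv',
                                       v[:, k], phi[:, l], v[:, k], phi[:, l])

# Regional marginal covariance: trace out the functional dimension (w = w')
Sigma_R = np.einsum('rwsw->rs', Sigma)
evals, evecs = np.linalg.eigh(Sigma_R)
```

Because the functional eigenfunctions are orthonormal, the marginalized covariance reduces to $\sum_k (\sum_\ell \tau_{k\ell})\, v_k v_k^T$, so its eigenvectors are exactly the regional marginal eigenvectors, with eigenvalues given by the row sums of the variance components.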

In practice, the CA-HPCA decomposition is truncated to include only K_d and L_d covariate-dependent marginal eigencomponents for the regional and functional domains, respectively, with the number of components initially selected by the fraction of variance explained (FVE). One guideline is to include the minimum number of covariate-dependent marginal eigencomponents in the CA-HPCA expansion that explain at least 90% of variation in their respective covariate-dependent marginal covariances for each observed covariate value, though this may need to be relaxed when the number of eigencomponents is excessively high for a few covariate values. The final number of components can be fixed after the subject-specific scores and their associated variance components are estimated via the mixed effects model proposed in Section 3.1, which allows enumeration of the total FVE in the observed data, not just for the marginal covariances but for the total covariance. Further details on the selection of the number of covariate-dependent marginal eigencomponents are presented in Section 3.1. As mentioned, the above model allows both the marginal directions of variation and the associated variance components of the subject-specific scores to vary across the covariate domain. If instead we allow only the variance components to be covariate-dependent but restrict the marginal eigenfunctions and eigenvectors to be constant across the covariate domain (i.e. the marginal directions of variation are common across the covariate domain), we obtain a reduced model in which the marginal covariance may be pooled across observed covariates. We defer specifics of this useful reduced CA-HPCA to Web Appendix A of the Supporting Information. While the reduced CA-HPCA can lead to major computational savings, the assumption of covariate-independent eigenfunctions may not be satisfied in every application. For example, it was not found plausible in the context of our motivating EEG data, where the directions of marginal variation were not constant across development.

3. ESTIMATION OF MODEL COMPONENTS AND INFERENCE

The following section details estimation of all CA-HPCA model components, including group-region mean functions, covariate-dependent marginal covariances, subject-specific decomposition scores and their associated variance components as well as procedures for inference made available through the proposed linear mixed effects model. In addition, guidance is provided for the selection of the number of eigencomponents included in the proposed decomposition.

Algorithm:

CA-HPCA Estimation Procedure

1. Estimation of group-region mean functions
(a) Calculate $\hat{\eta}_d(a_i, r, \omega)$ by applying a bivariate penalized spline smoother to all observed data $\{a_i, \omega, Y_{di}(a_i, r, \omega) : i = 1, \ldots, n_d;\ a_i \in \mathcal{A};\ \omega \in \Omega\}$.
(b) Mean center each observation, $\hat{Y}^c_{di}(a_i, r, \omega) = Y_{di}(a_i, r, \omega) - \hat{\eta}_d(a_i, r, \omega)$.
2. Estimation of covariate-dependent marginal covariances and measurement error variance
(a) Calculate $\hat{\Sigma}_{d,\Omega}(a, \omega, \omega')$ and $\hat{\sigma}^2_{d,\Omega}$ by applying a trivariate penalized spline smoother to the cross products $\{a_i, \omega, \omega', \hat{Y}^c_{di}(a_i, r, \omega)\hat{Y}^c_{di}(a_i, r, \omega') : i = 1, \ldots, n_d;\ a_i \in \mathcal{A};\ r = 1, \ldots, R;\ \omega, \omega' \in \Omega\}$.
(b) Calculate $\hat{\Sigma}_{d,R}(a)$ by smoothing each $(r, r')$ entry across $\mathcal{A}$. For $r \neq r'$, estimate $\{\hat{\Sigma}_{d,R}(a)\}_{(r,r')}$ by applying a univariate kernel smoother to $\{a_i, \hat{Y}^c_{di}(a_i, r, \omega)\hat{Y}^c_{di}(a_i, r', \omega) : i = 1, \ldots, n_d;\ a_i \in \mathcal{A};\ \omega \in \Omega\}$. For $r = r'$, estimate $\{\hat{\Sigma}_{d,R}(a)\}_{(r,r)}$ by applying a univariate kernel smoother to $\{a_i, \hat{Y}^c_{di}(a_i, r, \omega)^2 - \hat{\sigma}^2_{d,\Omega} : i = 1, \ldots, n_d;\ a_i \in \mathcal{A};\ \omega \in \Omega\}$.
3. Estimation of covariate-dependent marginal eigencomponents
(a) For each unique observed value of $a$, employ FPCA on $\hat{\Sigma}_{d,\Omega}(a, \omega, \omega')$ to estimate the covariate-dependent eigenvalue, eigenfunction pairs $\{\tau_{d\ell,\Omega}(a), \phi_{d\ell}(a, \omega) : \ell = 1, \ldots, L_d\}$.
(b) For each unique observed value of $a$, employ PCA on $\hat{\Sigma}_{d,R}(a)$ to estimate the covariate-dependent eigenvalue, eigenvector pairs $\{\tau_{dk,R}(a), v_{dk}(a, r) : k = 1, \ldots, K_d\}$.
4. Estimation of covariate-dependent variance components and subject-specific scores via linear mixed effects models
(a) Calculate $\hat{\tau}_{dg}(a_i)$ and $\hat{\sigma}^2_d$ by fitting the proposed linear mixed effects model.
(b) Select $G'_d$ such that $\mathrm{FVE}_{dG'_d} > 0.8$ for $d = 1, \ldots, D$.
(c) Calculate $\hat{\xi}_{dig}(a_i)$ as the BLUP $\hat{\xi}_{dig}(a_i) = E\{\xi_{dig}(a_i) \mid Y_{di}\}$ and form predictions $\hat{Y}_{di}(a_i, r, \omega)$.

3.1. Estimation of CA-HPCA model components

We begin by introducing the CA-HPCA algorithm above, then focus our discussion in this section on the novel estimation procedures used to target the covariate-dependent marginal covariances and variance components in steps 2 and 4, respectively.

(1) Estimation of group-region mean functions: We calculate the estimated group-region mean function $\hat{\eta}_d(a_i, r, \omega)$ for each region via smoothing performed by projection onto a tensor basis formed by penalized marginal B-splines in the covariate and functional domains. Smoothing parameter selection is performed using restricted maximum likelihood (REML) methods. Assuming the group-region mean functions lie in the space spanned by the marginal B-splines, the estimated group-region mean functions enjoy asymptotic consistency as discussed in [41]. (2) Estimation of covariate-dependent marginal covariances and measurement error variance: We estimate the covariate-dependent marginal covariances by assuming each two-dimensional marginal covariance varies smoothly over the covariate dimension. For the functional marginal covariance, $\Sigma_{d,\Omega}(a, \omega, \omega')$, we extend the fast bivariate covariance smoother of Cederbaum et al [6] to include a third covariate dimension $a \in \mathcal{A}$. To briefly review, Cederbaum et al [6] proposed a smooth method of moments approach that estimates covariance functions via fast bivariate penalized splines. To achieve computational efficiency, their method leverages the symmetry of the covariance function to reduce the data used in estimation by targeting the upper triangle of the covariance surface (including the diagonal) and enforces symmetry constraints that reduce the number of spline coefficients needed for estimation.

We extend their approach via the development of a fast trivariate penalized spline smoother which incorporates covariate information through the introduction of a marginal spline basis along the covariate dimension. The resulting smoother maintains the computational efficiency of Cederbaum et al [6] while simultaneously allowing the marginal functional covariance to vary smoothly along the covariate dimension. In the process of estimating the covariate-dependent marginal functional covariance, we also obtain an initial estimate of the measurement error variance $\hat{\sigma}^2_{d,\Omega}$ by modeling the diagonal elements additively as a function of the marginal covariance and measurement error variance. Smoothing parameter selection is performed using REML methods. Note, the smooth method of moments estimator assumes independence of the cross products and homoscedastic Gaussian measurement error, common assumptions in the estimation of functional curves. Smooth covariance estimators which allow for heteroscedasticity are explored in Xiao et al [42] but are more computationally demanding. However, Cederbaum et al [6] showed that estimates based on the working assumptions of independence and homoscedastic Gaussian measurement error are robust when these assumptions are violated and well worth the computational savings.
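A toy version of this covariate-dependent smooth method of moments can be sketched with an unpenalized tensor-product basis. In this sketch, simple polynomial marginal bases stand in for the penalized B-splines, no roughness penalty or coefficient reduction is applied, and the measurement-error adjustment on the diagonal is omitted; only the core ideas survive: regress cross-products on a trivariate basis over (ω, ω′, a), use only the upper triangle, and symmetrize the basis rows.

```python
import numpy as np

def poly_basis(x, deg=3):
    # Stand-in marginal basis; the actual smoother uses penalized B-splines
    return np.vander(np.atleast_1d(x), deg + 1, increasing=True)

rng = np.random.default_rng(7)
n, W = 80, 12
a = rng.uniform(0, 1, size=n)              # subject covariate values
omega = np.linspace(0, 1, W)               # functional grid
# Toy demeaned curves whose covariance scale grows with the covariate
Yc = (0.5 + a[:, None]) * np.sin(2 * np.pi * omega) * rng.normal(size=(n, 1)) \
    + 0.1 * rng.normal(size=(n, W))

Ba, Bw = poly_basis(a), poly_basis(omega)

def row(ba, j, jp):
    # Symmetrized tensor-product basis row over (omega, omega', a); averaging
    # the two outer products enforces C(omega, omega') = C(omega', omega)
    bw = (np.outer(Bw[j], Bw[jp]) + np.outer(Bw[jp], Bw[j])) / 2
    return np.kron(ba, bw.ravel())

rows, targets = [], []
for i in range(n):
    for j in range(W):
        for jp in range(j, W):             # symmetry: upper triangle only
            rows.append(row(Ba[i], j, jp))
            targets.append(Yc[i, j] * Yc[i, jp])
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)

def cov_hat(a0, j, jp):
    # Fitted covariate-dependent covariance surface at covariate value a0
    return row(poly_basis(a0)[0], j, jp) @ coef
```

For this simulated process the fitted surface is symmetric by construction, and the estimated variance at a fixed frequency grows with the covariate, reflecting the heteroscedasticity built into the toy data.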

The regional marginal covariance $\{\Sigma_{d,R}(a)\}_{r,r'}$ is discrete in the regional dimension and thus not amenable to the trivariate smoother applied to the marginal functional covariance above. Therefore, we estimate the raw regional marginal covariance at each covariate value, removing the measurement error variance from the diagonal as in Scheffler et al [35], and smooth the resulting matrices entry-by-entry along the covariate domain. To ensure a positive definite regional marginal covariance, we utilize a kernel function with a common bandwidth in smoothing each entry across the raw covariate-dependent regional marginal covariances. The optimal bandwidth is selected via leave-one-subject-out cross-validation (LOSOCV). Our kernel smoother is the Nadaraya-Watson kernel weighted average,

\{\hat{\Sigma}_{d,R}(a_o)\}_{(r,r')} = \frac{\sum_{i=1}^{n_d} \sum_{\omega \in \Omega} K_\lambda(a_i - a_o)\, \hat{Y}^c_{di}(a_i, r, \omega)\, \hat{Y}^c_{di}(a_i, r', \omega)}{|\Omega| \sum_{i=1}^{n_d} K_\lambda(a_i - a_o)},

where $K_\lambda(\cdot)$ is a kernel with bandwidth parameter λ, $\hat{Y}^c_{di}(a_i, r, \omega) = Y_{di}(a_i, r, \omega) - \hat{\eta}_d(a_i, r, \omega)$ is the demeaned subject-level data, and ∣Ω∣ is the number of observed functional grid points. For example, in our data application and simulation study we make use of a Gaussian kernel, $K_\lambda(a_i - a_o) = \exp\{-(a_i - a_o)^2 / (2\lambda)\}$. The parameter λ is selected to minimize the LOSOCV(λ) statistic across all entries $(r, r')$,

\mathrm{LOSOCV}(\lambda) = \sum_{r=1}^{R} \sum_{r'=1}^{r} \mathrm{LOSOCV}(\lambda, r, r'),
\mathrm{LOSOCV}(\lambda, r, r') = \frac{1}{|\Omega|\, n_d} \sum_{i=1}^{n_d} \sum_{\omega \in \Omega} \left[ \hat{Y}^c_{di}(a_i, r, \omega)\, \hat{Y}^c_{di}(a_i, r', \omega) - \{\hat{\Sigma}^{(-i)}_{d,R}(a_i)\}_{(r,r')} \right]^2,

where $\{\hat{\Sigma}^{(-i)}_{d,R}(a_i)\}$ is the estimated smoothed regional marginal covariance matrix computed with the ith subject left out. As with any Nadaraya-Watson estimator, concerns about data sparsity can arise when the covariate space grows to dimensions higher than that observed in our motivating data.
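The Nadaraya-Watson smoother and its LOSOCV bandwidth selection can be illustrated for a single covariance entry. This sketch simplifies the setup: the cross-products are pre-averaged over the functional grid, only one (r, r′) pair is smoothed, and the covariate trend and candidate bandwidths are illustrative assumptions (in the paper a single bandwidth is shared across all entries).

```python
import numpy as np

def nw_cov_entry(a0, a, cross, lam):
    """Nadaraya-Watson estimate of one (r, r') covariance entry at a0,
    using Gaussian kernel weights with bandwidth parameter lam."""
    w = np.exp(-(a - a0) ** 2 / (2 * lam))
    return np.sum(w * cross) / np.sum(w)

def losocv(a, cross, lams):
    """Pick the bandwidth minimizing leave-one-subject-out squared error."""
    errs = []
    for lam in lams:
        e = 0.0
        for i in range(len(a)):
            keep = np.arange(len(a)) != i
            fit = nw_cov_entry(a[i], a[keep], cross[keep], lam)
            e += (cross[i] - fit) ** 2
        errs.append(e)
    return lams[int(np.argmin(errs))]

rng = np.random.default_rng(2)
a = rng.uniform(24, 144, size=60)                     # ages in months
cross = 0.02 * a + rng.normal(scale=0.1, size=60)     # toy raw cross-products
lam = losocv(a, cross, np.array([1.0, 25.0, 400.0]))  # candidate bandwidths
smooth = [nw_cov_entry(a0, a, cross, lam) for a0 in (50.0, 110.0)]
```

Because the toy cross-products grow with age, the smoothed entry is larger at 110 months than at 50 months, mimicking covariate-dependent heteroscedasticity in a single covariance entry.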

Thus, we introduce two novel covariate-dependent smoothers that yield the regional and functional marginal covariances used in the subsequent covariate-dependent eigendecompositions. (3) Estimation of covariate-dependent marginal eigencomponents: To estimate the covariate-dependent marginal eigencomponents, we perform eigendecompositions at each fixed covariate value as described in Scheffler et al [35], retaining a common number of K_d and L_d covariate-dependent eigencomponents across covariate values. We initially include the K_d and L_d components that explain at least 90% of variation in their respective covariate-dependent marginal covariances for each observed covariate value.

(4) Estimation of covariate-dependent variance components and subject-specific scores via linear mixed effects models: We make use of the estimated group-region mean functions and covariate-dependent marginal eigencomponents to propose a linear mixed effects framework for estimation of the covariate-dependent variance components and measurement error variance. Under the assumption of joint normality of the covariate-dependent subject-specific scores and measurement error, the proposed mixed effects framework induces regularization and stability in modeling the data by enforcing a low-rank structure on the covariate-dependent variance components through projection of the corresponding precision components onto a smooth basis. The resulting variance components can be used to select the number of eigencomponents to include in the CA-HPCA decomposition by quantifying the proportion of variance explained, leading to parametric bootstrap based inference in the form of hypothesis testing and point-wise confidence intervals. We present the linear mixed effects modeling framework below.

To make our notation more compact, we replace the double index kℓ in the CA-HPCA decomposition truncated at K_d and L_d with a single index $g = (k - 1) + K_d(\ell - 1) + 1$,

Y_{di}(a_i, r, \omega) = \eta_d(a_i, r, \omega) + \sum_{g=1}^{G_d} \xi_{dig}(a_i)\, \varphi_{dg}(a_i, r, \omega) + \epsilon_{di}(a_i, r, \omega),

where $\varphi_{dg}(a_i, r, \omega) = v_{dk}(a_i, r)\phi_{d\ell}(a_i, \omega)$ is a covariate-dependent tensor basis formed from the marginal eigencomponents, $\xi_{dig}(a_i) = \langle Z_{di}(a_i, r, \omega), \varphi_{dg}(a_i, r, \omega) \rangle$, $\tau_{dg}(a_i) = \mathrm{var}\{\xi_{dig}(a_i)\}$, and $G_d = K_dL_d$. Let $Y_{di}(a_i)$ represent the vectorized form of $Y_{di}(a_i, r, \omega)$ for subject i, i = 1, …, n_d, observed along with covariate value $a_i$. Analogous vectorized forms of the group-region mean function $\eta_d(a_i, r, \omega)$, the region-referenced stochastic process $Z_{di}(a_i, r, \omega)$, the covariate-dependent tensor basis $\varphi_{dg}(a_i, r, \omega)$, and the measurement error $\epsilon_{di}(a_i, r, \omega)$ are denoted by $\eta_{di}(a_i)$, $Z_{di}(a_i)$, $\varphi_{dg}(a_i)$, and $\epsilon_{di}(a_i)$, respectively. Under the assumption that $\xi_{di}(a_i) = \{\xi_{di1}(a_i), \ldots, \xi_{diG_d}(a_i)\}$ and $\epsilon_{di}(a_i)$ are jointly Gaussian with $\mathrm{cov}\{\xi_{di}(a_i), \epsilon_{di}(a_i)\} = 0$ at a fixed value of $a_i$, the proposed linear mixed effects model is given as

Y_{di}(a_i) = \eta_{di}(a_i) + Z_{di}(a_i) + \epsilon_{di}(a_i) = \eta_{di}(a_i) + \sum_{g=1}^{G_d} \xi_{dig}(a_i)\, \varphi_{dg}(a_i) + \epsilon_{di}(a_i), (1)

for i = 1, …, n_d. The model is fit separately in each group, d = 1, …, D, and the regional and functional dependencies in $Y_{di}(a_i)$ are induced through the subject-specific random effects $\xi_{dig}(a_i)$ in (1). Given that the total covariance is weakly separable for fixed values of a, $\mathrm{cov}\{\xi_{dig}(a_i), \xi_{dig'}(a_i)\} = 0$ for $g \neq g'$, and thus the covariance matrix of the subject-specific scores possesses a diagonal structure, $\mathrm{cov}\{\xi_{di}(a_i)\} = T_d(a_i) = \mathrm{diag}\{\tau_d(a_i)\}$, where $\tau_d(a_i) = \{\tau_{d1}(a_i), \ldots, \tau_{dG_d}(a_i)\}$. We further assume that $T_d(a)$ evolves smoothly along the covariate domain, which allows the amount of variation attributed to each component $\varphi_{dg}(a, r, \omega)$ to vary smoothly as well. For covariance estimation, we target the smooth variance components through their corresponding precision matrix, $T_d^{-1}(a) = \Gamma_d(a) = \mathrm{diag}\{\gamma_d(a)\}$, where $\gamma_d(a) = \{1/\tau_{d1}(a), \ldots, 1/\tau_{dG_d}(a)\} = \{\gamma_{d1}(a), \ldots, \gamma_{dG_d}(a)\}$. To estimate the smooth precision components, we project $\gamma_{dg}(a)$ onto a suitable basis, $\gamma_{dg}(a) = \sum_{p=1}^{P} v_{dgp}\, \psi_p(a)$, where $v_{dgp}$ are scalar precision components that act as basis weights and $\{\psi_p(a) : p = 1, \ldots, P\}$ are basis functions spanning the covariate domain (e.g. B-splines). The dimension P of the basis is chosen to be sufficiently large to capture changes in the variance components along the covariate dimension. Given previous estimates for $\eta_{di}(a)$ and $\varphi_{dg}(a)$, estimates of the covariate-dependent variance components $\tau_d(a)$ and measurement error variance $\sigma_d^2$ are obtained using REML methods [41].

The assumption that the variance components evolve smoothly over the covariate domain resolves several challenges that emerge in modeling covariate-dependent variance components. First, the estimation procedure borrows strength across the covariate domain when modeling variation, a necessity when specific covariate values may be observed only once, as in our motivating data. Second, we are able to project the precision components onto a low-rank basis of smooth functions, which induces regularization and control over the speed at which $\tau_d(a)$ is allowed to vary. Alternatively, a projection-based approach would be less computationally burdensome, with estimates of the subject-specific scores obtained directly by numerical integration, $\hat{\xi}_{dig}(a_i) = \langle Z_{di}(a_i, r, \omega), \hat{\varphi}_{dg}(a_i, r, \omega) \rangle$, and their corresponding variance components calculated empirically, $\hat{\tau}_{dg}(a) = \mathrm{var}\{\hat{\xi}_{dig}(a)\}$, but the resulting estimates are unstable due to the limited number of observations at each point along the covariate domain. Therefore, despite the added compute time, our proposed linear mixed effects framework is better suited for providing covariate-adjustments to the region-referenced functional process in a controlled and principled manner.

The estimated covariate-dependent variance components are used to choose the number of eigencomponents included in the CA-HPCA decomposition, where $G'_d$ denotes a set of eigencomponents such that the total fraction of variance explained ($\mathrm{FVE}_{dG'_d}$) is greater than 0.8 for each group d = 1, …, D. We recommend starting with a larger number $G_d = K_dL_d$ of covariate-dependent tensor components, $\{\varphi_{dg}(a_i, r, \omega) : g = 1, \ldots, G_d\}$, in the mixed effects modeling used for the estimation of the covariate-dependent variance components, $\{\tau_{dg}(a_i) : g = 1, \ldots, G_d\}$, and then reducing or adding components as appropriate to determine the final value of $G'_d$. To estimate the group-specific total fraction of variance explained by the $G'_d$ covariate-dependent tensor components, we consider the quantity,

$$\mathrm{FVE}_{dG_d'} = \frac{\sum_{i=1}^{n_d} \sum_{g=1}^{G_d'} \hat{\tau}_{dg}(a_i)}{\sum_{i=1}^{n_d} \left\{ \left\| Y_{di}^{c}(a_i, r, \omega) \right\|^{2} - R\, \hat{\sigma}_d^{2}\, |\Omega| \right\}},$$

where $\|f(a_i, r, \omega)\|^2 = \sum_{r=1}^{R} \int_{\Omega} f(a_i, r, \omega)^2\, d\omega$. Note that the above formulation utilizes the covariate-dependent variance component estimates $\hat{\tau}_{dg}(a)$ and $\hat{\sigma}_d^2$ obtained from the proposed mixed effects model to calculate the ratio of the variance in the G′d eigencomponents to the total variation in the observed data Ydi(ai, r, ω) without measurement error. The denominator of FVEdG′d does not use variation in a large number of tensor components to estimate the total variation in the observed data, due to the computational cost of fitting the proposed mixed effects model, but instead uses the two-dimensional norm of the demeaned data minus the measurement error variance, similar to the approach of Chen et al. [7]. Consequently, when the measurement error variance is overestimated and scaled by $R\hat{\sigma}_d^2|\Omega|$, FVEdG′d may exceed 1. Once G′d is defined, the subject-specific scores can be obtained using their best linear unbiased predictors (BLUPs),
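The FVE calculation above can be sketched numerically. In this hypothetical Python/NumPy example (an illustration, not the authors' implementation), each subject's demeaned squared norm is set to its model-implied expectation (total score variance plus the error contribution R σ² |Ω|), so the ratio recovers the full variance exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R = 50, 4                                  # subjects and regions (hypothetical)
omega_len = 8.0                               # |Omega|, e.g. the 6-14 Hz band
sigma2 = 0.25                                 # measurement error variance
tau = rng.uniform(0.5, 2.0, size=(n, 3))      # tau_hat_dg(a_i) for G'_d = 3

# For illustration, set each subject's demeaned squared norm to its
# expectation: total score variance plus the measurement error term.
y_norm2 = tau.sum(axis=1) + R * sigma2 * omega_len

# FVE ratio: variance in the retained components over total variation
# in the observed data net of measurement error.
fve = tau.sum() / np.sum(y_norm2 - R * sigma2 * omega_len)
```

In practice the norms are computed from data, so an overestimated σ̂² shrinks the denominator and can push the ratio above 1, as noted in the text.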

$$\hat{\xi}_{dig}(a_i) = E\{\xi_{dig}(a_i) \mid Y_{di}\} = \hat{\tau}_{dg}(a_i)\, \hat{\varphi}_{dg}(a_i)^T\, \hat{\Sigma}_{Y_{di}}^{-1} \left\{ Y_{di}(a_i) - \hat{\eta}_{di}(a_i) \right\},$$

where $\hat{\Sigma}_{Y_{di}} = \sum_{g=1}^{G_d'} \hat{\tau}_{dg}(a_i)\, \hat{\varphi}_{dg}(a_i)\, \hat{\varphi}_{dg}(a_i)^T + \hat{\sigma}_d^2 I$. Predictions of subject-specific trajectories $\hat{Y}_{di}(a_i)$ may be formed as in (1) using the estimated components. Asymptotic theory supporting consistent estimation of the variance components and subject-specific scores is discussed in [43]. For CA-HPCA, this estimation relies on the assumption of weak separability, which ensures that the subject-specific scores are uncorrelated and thus the variance components form a diagonal matrix.
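The BLUP formula can be checked on a small synthetic example. The following Python/NumPy sketch (toy dimensions, not the authors' implementation) stacks the region-by-frequency grid into a vector, builds the model covariance from two hypothetical orthonormal eigencomponents, and verifies the shrinkage form τ̂g/(τ̂g + σ̂²) times the projection, which the BLUP reduces to under orthonormality.

```python
import numpy as np

rng = np.random.default_rng(2)
p, G = 100, 2                       # stacked grid size R*|Omega| and G'_d (toy)
phi = np.linalg.qr(rng.standard_normal((p, G)))[0]   # orthonormal columns
tau = np.array([4.0, 1.0])          # tau_hat_dg(a_i)
sigma2 = 0.5                        # sigma_hat^2_d

# Model covariance: Sigma_Ydi = Phi diag(tau) Phi^T + sigma2 * I
Sigma = phi @ np.diag(tau) @ phi.T + sigma2 * np.eye(p)

# One simulated demeaned observation Y_di - eta_di:
xi = np.sqrt(tau) * rng.standard_normal(G)
y = phi @ xi + np.sqrt(sigma2) * rng.standard_normal(p)

# BLUP: diag(tau) Phi^T Sigma^{-1} (Y - eta)
xi_blup = np.diag(tau) @ phi.T @ np.linalg.solve(Sigma, y)
```

Because the columns of Φ are orthonormal, Φᵀ Σ⁻¹ = (diag(τ) + σ²I)⁻¹ Φᵀ, so each BLUP score is the raw projection φgᵀy shrunk toward zero by τg/(τg + σ²).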

3.2. Inference via parametric bootstrap

Inference, in the form of hypothesis testing and point-wise confidence intervals, can be performed via a parametric bootstrap that resamples from the estimated CA-HPCA model components. To test the null hypothesis that all groups have equal means in region r across the entire covariate domain, i.e. H0 : ηd(a, r, ω) = η(a, r, ω) for d = 1, …, D, we propose a parametric bootstrap procedure based on the CA-HPCA decomposition. The proposed parametric bootstrap generates outcomes based on the estimated model components under the null hypothesis for region r as $Y_{di}^b(a_i, r, \omega) = \hat{\eta}(a_i, r, \omega) + \sum_{g=1}^{G_d'} \xi_{dig}^b(a_i)\, \hat{\varphi}_{dg}(a_i, r, \omega) + \epsilon_{di}^b(a_i, r, \omega)$, and as $Y_{di}^b(a_i, r', \omega) = \hat{\eta}_d(a_i, r', \omega) + \sum_{g=1}^{G_d'} \xi_{dig}^b(a_i)\, \hat{\varphi}_{dg}(a_i, r', \omega) + \epsilon_{di}^b(a_i, r', \omega)$ in the other regions r′ ≠ r not considered under the null, where the subject-specific scores and measurement error are sampled as $\xi_{dig}^b(a_i) \sim N\{0, \hat{\tau}_{dg}(a_i)\}$ and $\epsilon_{di}^b(a_i, r, \omega) \sim N(0, \hat{\sigma}_d^2)$, respectively. The proposed test statistic $T_r = [\sum_{d=1}^{D} \int\!\!\int \{\hat{\eta}_d(a, r, \omega) - \hat{\eta}(a, r, \omega)\}^2\, da\, d\omega]^{1/2}$ is based on the norm of the sum of square-integrated departures of the estimated group-region shifts $\hat{\eta}_d(a, r, \omega)$ from the estimate of the common shift across groups, $\hat{\eta}(a, r, \omega)$. The common region shift estimate $\hat{\eta}(a, r, \omega)$ under the null is set to the point-wise average of the group-region shift estimates $\hat{\eta}_d(a, r, \omega)$, d = 1, …, D. We utilize the proposed parametric bootstrap to estimate the distribution of the test statistic Tr, which can be used to evaluate the null hypothesis across the covariate domain. The algorithm for the proposed bootstrap test is presented below.

Algorithm:

Bootstrap Test

For a fixed region $r \in \{1, \ldots, R\}$, perform the following:

1. Generate $B$ parametric bootstrap samples with sample size and age distribution in each group identical to the observed data.

2. For the $b$th parametric bootstrap sample, calculate the test statistic $T_r^b = [\sum_{d=1}^{D} \int\!\!\int \{\hat{\eta}_d^b(a, r, \omega) - \hat{\eta}^b(a, r, \omega)\}^2\, da\, d\omega]^{1/2}$, where $\hat{\eta}_d^b(a, r, \omega)$ and $\hat{\eta}^b(a, r, \omega)$ are both estimated from the $b$th bootstrap sample.

3. Use $(1/B)\sum_{b=1}^{B} I(T_r^b > T_r)$ to estimate the p-value, where $I(\cdot)$ denotes the indicator function and $T_r$ is the test statistic from the original sample.
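The three steps above can be sketched in simplified form. The following Python/NumPy stand-in (not the authors' R implementation) uses Gaussian resampling around a pooled mean in place of the CA-HPCA score model, and replaces the integrals in the test statistic by sums over a discrete grid.

```python
import numpy as np

rng = np.random.default_rng(3)

def boot_pvalue(y1, y2, B=200):
    """Parametric bootstrap test of equal group mean functions on a grid.

    y1, y2: (n_d, n_grid) arrays of curves. The statistic mirrors T_r:
    the root of summed squared departures of each group mean from the
    common (pooled) mean. Gaussian resampling around the pooled mean is
    a simplifying stand-in for sampling scores and errors from the
    estimated CA-HPCA components.
    """
    def stat(m1, m2):
        m0 = 0.5 * (m1 + m2)                      # common mean under H0
        return np.sqrt(np.sum((m1 - m0) ** 2) + np.sum((m2 - m0) ** 2))

    t_obs = stat(y1.mean(axis=0), y2.mean(axis=0))
    m0 = 0.5 * (y1.mean(axis=0) + y2.mean(axis=0))
    s1, s2 = y1.std(axis=0, ddof=1), y2.std(axis=0, ddof=1)
    exceed = 0
    for _ in range(B):                            # steps 1-2: B null samples
        b1 = m0 + s1 * rng.standard_normal(y1.shape)
        b2 = m0 + s2 * rng.standard_normal(y2.shape)
        exceed += stat(b1.mean(axis=0), b2.mean(axis=0)) > t_obs
    return exceed / B                             # step 3: p-value estimate

grid = np.linspace(0.0, 1.0, 33)
g1 = rng.standard_normal((40, 33))                # group 1: mean zero
g2_null = rng.standard_normal((40, 33))           # group 2 under H0
g2_alt = 2.0 * np.sin(2 * np.pi * grid) + rng.standard_normal((40, 33))

p_null = boot_pvalue(g1, g2_null)
p_alt = boot_pvalue(g1, g2_alt)
```

With a genuine mean shift the observed statistic dwarfs the null resamples and the estimated p-value is near zero, while under the null the p-value is roughly uniform.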

The bootstrap test described above can be extended to test the null hypothesis that all groups have equal means in region r for a fixed covariate value $a^* \in \mathcal{A}$, i.e. H0 : ηd(a*, r, ω) = η(a*, r, ω) for d = 1, …, D. This extension can be used to test for group differences at particular covariate values, for example at earlier or later developmental stages. Outcomes are generated as described above, but the test statistic $T_r(a^*)$ is calculated at the fixed covariate value $a^* \in \mathcal{A}$ rather than integrated across the covariate domain. To generate point-wise confidence intervals for the estimates $\hat{\eta}_d(a, r, \omega)$, repeat the above parametric bootstrap procedure but instead generate outcomes from the model $Y_{di}^b(a_i, r, \omega) = \hat{\eta}_d(a_i, r, \omega) + \sum_{g=1}^{G_d'} \xi_{dig}^b(a_i)\, \hat{\varphi}_{dg}(a_i, r, \omega) + \epsilon_{di}^b(a_i, r, \omega)$. At each iteration of the bootstrap, estimate $\hat{\eta}_d^b(a, r, \omega)$ from the simulated data, and then form point-wise confidence intervals from the percentiles, as a function of a, r, and ω, of the estimated bootstrap group-region mean functions across iterations, $\{\hat{\eta}_d^b(a, r, \omega) : b = 1, \ldots, B\}$. The two versions of the bootstrap test, as well as the point-wise confidence intervals, are utilized for data analysis in Section 4 and evaluated via simulations in Section 5.
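The percentile construction of the point-wise intervals can be sketched as follows; this is an illustrative Python/NumPy stand-in in which Gaussian resampling around the estimated mean curve replaces generation from the fitted CA-HPCA components.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_grid, B = 60, 33, 500
truth = np.sin(np.linspace(0.0, np.pi, n_grid))
y = truth + 0.3 * rng.standard_normal((n, n_grid))

# Parametric bootstrap around the estimated mean curve, then point-wise
# percentiles of the bootstrap mean estimates across iterations:
m_hat, s_hat = y.mean(axis=0), y.std(axis=0, ddof=1)
boot_means = np.empty((B, n_grid))
for b in range(B):
    yb = m_hat + s_hat * rng.standard_normal((n, n_grid))
    boot_means[b] = yb.mean(axis=0)
lo, hi = np.percentile(boot_means, [2.5, 97.5], axis=0)
```

The resulting (lo, hi) band brackets the estimated mean curve point by point, analogous to the 95% intervals shaded in Figure 4.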

4. APPLICATION TO THE ‘EYES-OPEN’ PARADIGM DATA

4.1. Data structure and methods

In our motivating data application, EEG signals were sampled at 500 Hz for two minutes from a 128-channel HydroCel Geodesic Sensor Net on 58 ASD and 39 TD children aged 25 to 146 months old (diagnostic groups were age-matched). EEG recordings were collected during an ‘eyes-open’ paradigm in which bubbles were displayed on a screen in a sound-attenuated room to subjects at rest [14]. The dataset is described in our previous work; we present an abbreviated description here and direct the reader to Scheffler et al. [36] for technical details related to pre-processing and data acquisition. EEG data for each subject is interpolated down to a standard 10-20 system 25-electrode montage (R = 25, Figure 1(b)) using spherical interpolation as detailed in Perrin et al. [34], producing 25 electrodes with continuous EEG signal. Spectral density estimates for each electrode were obtained on the first 38 seconds of artifact-free EEG data using Welch’s method with two-second Hanning windows and 50 percent overlap [40], where 38 seconds constitutes the minimum amount of artifact-free data across subjects. Thus, for each subject the electrode-specific spectral estimates form an instance of region-referenced functional data. Given that our primary interest is to model the alpha spectrum as a form of functional data, we restrict our analysis to the alpha spectral band (Ω = (6 Hz, 14 Hz)), which due to the sampling scheme has a frequency resolution of 0.25 Hz, resulting in ∣Ω∣ = 33 functional grid points. The spectral density within this band is normalized to unit area (through division by its integral over Ω) to better facilitate comparisons across electrodes and subjects.
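As a concrete illustration of this spectral pre-processing, the following Python/SciPy sketch (a stand-in for the authors' pipeline, with white noise in place of real EEG) applies Welch's method with two-second Hann windows and 50% overlap at 500 Hz; the `nfft` value is an assumption chosen so the frequency grid matches the stated 0.25 Hz resolution, and the band-restricted density is normalized to unit area.

```python
import numpy as np
from scipy.signal import welch

fs = 500                                     # sampling rate (Hz)
rng = np.random.default_rng(5)
x = rng.standard_normal(38 * fs)             # 38 s of white-noise "EEG"

# Welch's method: 2-s Hann windows, 50% overlap; nfft = 4 * fs is an
# assumed zero-padding level matching the stated 0.25 Hz resolution.
f, pxx = welch(x, fs=fs, window="hann", nperseg=2 * fs,
               noverlap=fs, nfft=4 * fs)

# Restrict to the alpha band Omega = [6, 14] Hz and normalize to unit area:
band = (f >= 6.0) & (f <= 14.0)
f_alpha, p_alpha = f[band], pxx[band]
area = np.sum(0.5 * (p_alpha[1:] + p_alpha[:-1]) * np.diff(f_alpha))
p_alpha = p_alpha / area                     # trapezoidal normalization
```

The 0.25 Hz grid over [6, 14] Hz yields the 33 functional grid points described above, one normalized density per electrode.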

We employ the CA-HPCA decomposition to model the alpha spectrum, which allows both the group-region mean functions and the total variation to change across development. Estimation for the CA-HPCA procedure is carried out as described in Section 3.1. Smooths of the group-region mean functions ηd(a, r, ω) and covariate-dependent functional marginal covariances $\Sigma_{d,\Omega}(a, \omega, \omega')$ are obtained using tensor bases formed from marginal penalized cubic B-splines (with 10 and 4 degrees of freedom in the functional and covariate domains, respectively) and second-degree difference penalties along each dimension. Smooths of the precision components γdg(a) for the linear mixed effects model are estimated by projection onto cubic B-splines with 4 degrees of freedom in the covariate domain. Penalty parameters and variance components for the group-region mean functions, covariate-dependent functional marginal covariances, and linear mixed effects model are selected via REML, and models are fit using the gam and bam functions from mgcv (version 1.8-28). Smooths of the covariate-dependent regional marginal covariance $\{\Sigma_{d,R}(a)\}_{r,r'}$ are estimated as in Section 3.1 using a Gaussian kernel, with Nadaraya-Watson estimates obtained using the ksmooth function from stats (version 3.6.1) and bandwidth selection performed via leave-one-subject-out cross-validation (LOSOCV). The parametric bootstrap procedure utilized 200 bootstrap samples. All estimation was performed on a 2.8 GHz 6-Core Intel Core i7 processor running the R software environment (version 3.6.1).
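The Nadaraya-Watson step can be illustrated with a minimal Gaussian-kernel smoother; this is a Python sketch of what R's `ksmooth` does (here the bandwidth is simply the kernel standard deviation, a simplifying assumption, and LOSOCV bandwidth selection is omitted).

```python
import numpy as np

def nw_smooth(x, y, x_out, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel.

    An illustrative analogue of the ksmooth-based step: at each output
    point, a kernel-weighted local average of the responses.
    """
    d = (x_out[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)                 # Gaussian kernel weights
    return (w @ y) / w.sum(axis=1)            # weighted local average

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0.0, 1.0, 200))       # covariate values, e.g. age
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
x_out = np.linspace(0.1, 0.9, 9)
y_hat = nw_smooth(x, y, x_out, bandwidth=0.05)
```

In the paper's setting the role of `y` is played by covariance entries between region pairs, smoothed along the covariate (age) domain.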

4.2. Data analysis results

We present the results from our application of the CA-HPCA decomposition to the EEG data. While the main focus of our analysis is to characterize differences in alpha spectral dynamics between TD and ASD children over the course of development via inference on the group-region mean functions, we begin by briefly discussing the covariate-dependent marginal eigencomponents produced by the decomposition. Figure S1 displays the marginal FVE of the ordered eigencomponents for the covariate-dependent regional and functional marginal covariances that explain at least 90% of the marginal FVE across the covariate domain. The marginal FVE attributed to each component is relatively constant over development. The leading (five and six) covariate-dependent regional marginal eigenvectors and (four and four) covariate-dependent functional marginal eigenfunctions are collectively found to explain (1.006 and 0.895) of the total FVE (FVEdG′d) in the (TD and ASD) groups, respectively. Recall that the total FVE may exceed 1 due to slight overestimation of the measurement error as described in Section 3.1.

In the functional dimension along the covariate domain, the leading covariate-dependent marginal eigenfunctions ϕd1(a, ω) (Figure 2(a), top row) display patterns of variation that capture individual differences between initial alpha power at 6 Hz and intermediate alpha power at 10 Hz. Note that the frequency location of maximal alpha variation decreases as age increases in the TD group but remains relatively constant across development in the ASD group. The second leading covariate-dependent marginal eigenfunctions ϕd2(a, ω) (Figure 2(a), bottom row) identify different sources of alpha variation in the TD and ASD groups. Referring back to the mean alpha spectral densities displayed in Figure 1(a), both diagnostic groups display a dip before 10 Hz that is followed by a peak after 10 Hz, though this is much more pronounced in the TD group. For the TD group, the second eigenfunction can be interpreted as differences in alpha power below and above the prominent peak at 10 Hz, whereas in the ASD group the second eigenfunction captures differences between initial alpha power at 6 Hz and intermediate alpha power between 7 and 9 Hz. In the TD group, the location of variation in the second eigenfunction shifts horizontally from higher to lower frequencies as age increases, but no covariate trend is detectable in the ASD group. The first two leading covariate-dependent marginal eigenfunctions together explain at least 65% of the variation in the covariate-dependent functional marginal covariances in each diagnostic group.

Figure 2.

Figure 2.

(a) Estimated first and second leading covariate-dependent eigenfunctions ϕd1(a, ω) and ϕd2(a, ω) at a = 50, 70, 90, 110 months (darker lines correspond to older children). (b) Estimated first and second leading covariate-dependent eigenvectors vd1(a, r) and vd2(a, r) at a = 50, 70, 90, 110 months. Shading corresponds to the weight of each element in the eigenvector.

In the regional dimension along the covariate domain, the first leading covariate-dependent marginal eigenvectors vd1(a, r) (Figure 2(b), top row) display maximal variation in the (central and posterior regions) at younger ages with a shift to the (posterior and central regions) at older ages in the (TD and ASD) groups, respectively. In the second leading covariate-dependent marginal eigenvectors vd2(a, r) (Figure 2(b), bottom row), the TD group shows maximal variation in the frontal region at younger ages with a shift to the posterior region at older ages, while the ASD group exhibits maximal variation that alternates between the temporal and central regions over development. The first two covariate-dependent marginal eigenvectors together explain at least 70% of the variation in the covariate-dependent regional marginal covariances in each diagnostic group.

In the covariate dimension, the four leading covariate-dependent variance components τdg(a) (Figure S2) fluctuate as a function of age where the leading component corresponding to the covariate-dependent tensor component φd1(a, r, ω) = vd1(a, r)ϕd1(a, ω) is responsible for the vast majority of variation in the ASD group but not the TD group. The variance components in the TD group vary along the covariate domain whereas the variance components in the ASD group are relatively constant apart from edge effects at the boundary of the covariate domain. In summary, the covariate-dependent eigencomponents and their associated variance components for the two diagnostic groups show differing patterns of variation that modulate distinctly along the covariate domain, confirming our need to model covariate-dependent heteroscedasticity in each group separately.

To test for differences between diagnostic groups in the alpha spectrum across development, we utilize the parametric bootstrap procedures described in Section 3.2. For each electrode r, we test the null hypothesis that the TD and ASD group-region mean functions are equal across the entire covariate domain from 25 to 145 months, which takes the form H0 : ηd(r, ω, a) = η(r, ω, a), d = 1, 2. To address the issue of multiple testing across electrodes, we utilize the procedure of Benjamini and Yekutieli [3], a less conservative alternative to Bonferroni correction which transforms p-values into q-values to control the false discovery rate (FDR). We define q-values less than 0.05 to be statistically significant and find significant differences in the frontal (Fp1, q=0.036; F3, q<0.001), temporal (T8, q=0.025; T10, q=0.036), and posterior regions (P4, q=0.025; P8, q=0.025; P10, q<0.001) between the TD and ASD groups across development (results for each electrode can be found in Table S2). To investigate which stages of development are driving the observed differences in these seven electrodes (Fp1, F3, T8, T10, P4, P8, and P10), we perform a secondary analysis and test the null hypothesis that the TD and ASD group-region mean functions are equal for a fixed age a* = 25, 26, …, 144, 145 months within each electrode, which takes the form H0 : ηd(r, ω, a*) = η(r, ω, a*), d = 1, 2. Because this is a secondary analysis, rather than formally adjusting for multiple comparisons we examine unadjusted p-values and avoid emphasizing findings with nominal significance, instead highlighting regions of development where electrodes consistently violate the null hypothesis. Figure 3 displays the results of the hypothesis tests for the seven electrodes with p-values transformed to the −log10 scale to better stratify results, where values greater than −log10(0.05) = 1.30 denote significance at level α = 0.05. All seven electrodes display strong differences between diagnostic groups in the alpha spectrum between approximately 100 and 130 months, with the two temporal electrodes showing differences at earlier ages between 30 and 50 months as well.
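The Benjamini-Yekutieli correction used above can be sketched in a few lines; this plain-NumPy version (an illustration, not the implementation used in the paper) converts p-values to q-values with the harmonic-number penalty that makes FDR control valid under arbitrary dependence.

```python
import numpy as np

def by_qvalues(pvals):
    """Benjamini-Yekutieli adjusted p-values (q-values).

    adj_(k) = p_(k) * m * c(m) / k for sorted p-values, with
    c(m) = sum_{k=1}^m 1/k, then made monotone and capped at 1.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))       # harmonic-number factor
    order = np.argsort(p)
    adj = p[order] * m * c_m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward, cap at 1:
    q = np.minimum(np.minimum.accumulate(adj[::-1])[::-1], 1.0)
    out = np.empty(m)
    out[order] = q
    return out

q = by_qvalues([0.001, 0.02, 0.04, 0.5])
```

In the analysis above, one such q-value is produced per electrode, and electrodes with q < 0.05 are flagged as significant.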

Figure 3.

Figure 3.

The results from the secondary analysis of seven electrodes identified as showing significant differences at some point across development. A parametric bootstrap test is conducted for each month between 25 and 145 months from the CA-HPCA model of the alpha spectrum. P-values are transformed to the −log10 scale to better stratify results where darker colors correspond to more significant differences (−log10(p) > 1.30 denote significance at level α = .05).

The greatest visual differences in the group-region mean functions are observed in the T8, T10, P8, and P10 electrodes, displayed in Figure 4 along with their 95% point-wise confidence intervals generated as described in Section 3.2. At all four electrodes, the TD group displays a well-defined peak in the alpha spectrum that shifts from 9 Hz to 11 Hz moving from 50 to 110 months, whereas the ASD group generally has flatter, less clearly defined peaks that tend to center around 9 Hz throughout development. Differences in the estimated group-region mean functions mirror the results of the secondary analysis, which examined group differences month by month. For the T8 electrode, the point-wise confidence intervals for the two diagnostic groups separate at 50, 90, and 110 months, while all four electrodes display separation in the point-wise confidence intervals at 110 months.

Figure 4.

Figure 4.

The estimated group-region mean functions ηd(a, r, ω) at ages a = 50, 70, 90, 110 months from the T8, T10, P8, and P10 electrodes from the CA-HPCA model of the alpha spectrum. Grey shading denotes 95% point-wise confidence intervals for estimates.

To assess the sensitivity of our results to developmental information, we include a naive analysis examining group differences in the alpha spectrum using the hybrid principal components analysis decomposition of Scheffler et al. [35], which ignores covariate information. Full details of the naive analysis are included in Web Appendix B of the Supporting Information and are summarized here. The naive analysis finds six regions that display differential alpha spectral dynamics between the two diagnostic groups over the course of development, four of which are not found among the seven electrodes identified by the CA-HPCA decomposition. Collectively, this suggests that omitting covariate information reduces power in our motivating analysis and may lead to misleading results due to model misspecification. In addition, by omitting covariate information there is no way to quantify at what point in development these particular regions differ significantly. When aggregated, the observations and inferences obtained from the CA-HPCA model components provide evidence for differences in both the mean structure and patterns of covariation between the two diagnostic groups that shift and change over development, highlighting the need to provide covariate-adjustments in modeling region-referenced EEG data across a broad age range.

5. SIMULATION

We studied the finite sample properties of the proposed CA-HPCA model, as well as the associated bootstrap-derived group-level inference, via extensive simulations. We summarize the results of the simulation study here and defer details of data generation and simulation evaluation to Web Appendix C of the Supporting Information. We conducted 500 Monte Carlo runs for two sample sizes (nd = 50 and 100) and two signal-to-noise ratios (SNRs = 4 and 10) for a total of four settings. The lower sample size is similar to the group sample sizes in our observed EEG data. To assess the performance of the proposed estimation algorithm in targeting the functional and vector components of CA-HPCA, we utilize normalized mean squared errors (MSE) and relative squared errors (RSE) based on norms of deviations of the estimated quantities from their target quantities. In addition, we report the total fraction of variance explained (FVE), coverage properties of 95% confidence intervals for the group-region mean functions, and power of the proposed bootstrap procedure for testing differences both across the covariate domain and at fixed locations of the covariate domain.
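As a small illustration of the evaluation metric, a relative squared error of the kind reported can be computed as the squared norm of the deviation over the squared norm of the target (a generic Python sketch; the paper's exact normalizations are in Web Appendix C).

```python
import numpy as np

def rse(est, truth):
    """Relative squared error: ||est - truth||^2 / ||truth||^2."""
    return np.sum((est - truth) ** 2) / np.sum(truth ** 2)

# A hypothetical target curve and a slightly distorted estimate:
truth = np.sin(np.linspace(0.0, np.pi, 50))
est = truth + 0.05
err = rse(est, truth)
```

A perfect estimate gives RSE of 0, and small distortions give values near 0, matching the magnitudes reported in Table 1.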

Figure 5 displays estimated model components based on 500 Monte Carlo runs from the CA-HPCA simulation setup under the most challenging settings, nd = 50 and c = 4 (low SNR). The estimated group-region mean functions with the 10th, 50th, and 90th percentile RSE for d = 1 and r = 5 (Figure 5(a)) closely match the true curves across the covariate domain. The estimated covariate-dependent regional and functional marginal eigencomponents (Figure 5(b, c)) are displayed from runs with RSE values at the 10th, 50th, and 90th percentiles, overlaid by their true quantities. Even at small sample sizes and low SNR, CA-HPCA captures the periodicity, phase, and magnitude of the true components. Occasionally, estimates of model components at the edge of the covariate domain do not capture phase shifts, likely due to the relative sparsity of observations in the covariate domain when nd = 50.

Figure 5.

Figure 5.

The true and estimated (a) group-region mean functions ηd(a, r, ω) for d = 1 and r = 5, (b) two leading covariate-dependent functional marginal eigenfunctions ϕd1(a, ω) and ϕd2(a, ω), and (c) two leading covariate-dependent regional marginal eigenvectors vd1(a, r) and vd2(a, r) for a = 0.211, 0.474, 0.737, corresponding to the 10th, 50th, and 90th percentile relative squared error (RSE) values based on 500 Monte Carlo runs from the CA-HPCA simulation design at nd = 50 and low signal-to-noise ratio (SNR).

Table 1 displays median, 10th, and 90th percentile RSE and normalized MSE values based on 500 Monte Carlo runs corresponding to the estimated CA-HPCA components from all four simulation settings. Given that normalized measures of RSE and MSE were used, we report percentiles for model components over all Monte Carlo runs combined across groups (and subjects in the case of subject-level predictions). More specifically, while performance measures for vdk(a, r), ϕdℓ(a, ω), σd2, and FVEdG′d are reported over D × 500 Monte Carlo runs, measures for ηd(a, r, ω), Ydi(ai, r, ω), τd,kℓ(a), and coverage are reported over D × R × 500, D × nd × 500, D × Kd × Ld × 500, and D × R × A × 500 Monte Carlo runs, respectively.

Table 1.

Percentiles 50% (10%, 90%) of the relative squared errors (RSE), normalized mean squared errors (MSE), total fraction of variance explained (FVE), and coverage across groups for model components based on 500 Monte Carlo runs from the design at nd = 50, 100 for low and high signal-to-noise ratio (SNR) from the CA-HPCA simulation study. Due to their small magnitude, MSE values are scaled by a factor of 103 for presentation.

Low SNR High SNR
nd = 50 nd = 100 nd = 50 nd = 100
ηd(a, r, ω) 0.017 (0.007, 0.035) 0.009 (0.004, 0.018) 0.016 (0.006, 0.034) 0.008 (0.003, 0.017)
Ydi(ai, r, ω) 0.173 (0.140, 0.219) 0.173 (0.139, 0.220) 0.079 (0.062, 0.102) 0.078 (0.062, 0.102)
vd1(a, r) 0.086 (0.024, 0.233) 0.045 (0.017, 0.130) 0.082 (0.022, 0.236) 0.042 (0.014, 0.097)
vd2(a, r) 0.153 (0.069, 0.278) 0.074 (0.04, 0.159) 0.137 (0.061, 0.268) 0.067 (0.035, 0.128)
ϕd1(a, ω) 0.073 (0.026, 0.151) 0.048 (0.03, 0.097) 0.066 (0.031, 0.144) 0.048 (0.028, 0.088)
ϕd2(a, ω) 0.075 (0.031, 0.148) 0.052 (0.032, 0.098) 0.065 (0.033, 0.140) 0.049 (0.03, 0.091)
τd,kℓ(a) 0.150 (0.031, 0.989) 0.061 (0.021, 0.975) 0.105 (0.032, 0.243) 0.053 (0.019, 0.141)
σd2 0.056 (0.002, 0.320) 0.040 (0.002, 0.205) 0.067 (0.002, 0.430) 0.042 (0.001, 0.238)
FVEdG′d 0.982 (0.962, 1.005) 0.992 (0.974, 1.006) 0.971 (0.947, 0.994) 0.985 (0.971, 1.001)
coverage 0.892 (0.758, 0.988) 0.952 (0.802, 0.998) 0.940 (0.823, 0.998) 0.958 (0.828, 0.998)

Overall, the RSEs for all model components decrease with higher sample size and SNR. The predicted subject-level curves Ydi(ai, r, ω) are most sensitive to changes in SNR, as expected, while the RSEs for the eigencomponents, vdk(a, r), ϕdℓ(a, ω), and τd,kℓ(a), are more sensitive to changes in sample size than to SNR, suggesting that the estimation procedure effectively corrects for measurement error when obtaining the marginal covariances. The MSE for σd2 was extremely small and did not follow a trend with respect to sample size or SNR. Across simulation designs, the total fraction of variance explained, FVEdG′d, is almost always close to 1.00 due to the small number of marginal eigencomponents used to generate the data. Given that the calculation of FVEdG′d depends on estimates of the variance components and the two-dimensional norm of the demeaned observed data, the calculated values of FVEdG′d may exceed 1.00 in some instances. For all simulation settings except the lowest sample size and SNR, the median coverage probabilities for the point-wise confidence intervals of the group-region mean functions approach their nominal level of 95%. For the hypothesis test defined across the covariate domain, the level of the parametric bootstrap test was approximately 0.05 for nd = 100, and the power of the test increases more quickly at larger sample sizes (Table S1). For the hypothesis test at fixed locations of the covariate domain, the level of the parametric bootstrap test was slightly above 0.05 across the covariate domain, particularly at the smaller sample size nd = 50. The power across fixed locations of the covariate domain also increased with sample size (Figure S4). Further discussion of the power analysis can be found in Web Appendix C of the Supporting Information.

6. DISCUSSION

We proposed a covariate-adjusted hybrid principal components analysis (CA-HPCA) which decomposes region-referenced functional data and accounts for covariate-dependent heteroscedasticity by assuming the high-dimensional covariance structure is weakly separable conditional on observed covariates. The proposed estimation procedure develops computationally efficient fast covariance smoothers that incorporate covariate-dependence when estimating the marginal covariances, as well as a mixed effects framework which admits inference along the covariate domain via parametric bootstrap sampling of estimated model components. As with any model, verifying key assumptions is necessary for principled inference, namely validating the assumption of weak separability conditional on observed covariates as well as joint normality of the subject-specific scores and measurement error in the linear mixed effects model. Application of CA-HPCA to region-referenced EEG data collected on TD and ASD children revealed that the alpha spectrum changes over development both in terms of mean structure and patterns of covariation. Further, inference based on the CA-HPCA decomposition revealed significant differences in alpha spectral dynamics between the two diagnostic groups, particularly at younger and older ages. While the CA-HPCA decomposition was developed to model EEG data over a broad developmental range, the procedure may be applied to other settings where high-dimensional data are expected to exhibit differential covariation as a function of observed covariates.

Supplementary Material

Supplementary Material

ACKNOWLEDGEMENTS

This work was supported by the National Institute of Mental Health [R01 MH122428 (DS)].

Footnotes

SUPPORTING INFORMATION

Web Appendices A, B, and C for the proposed CA-HPCA decomposition may be found online in the Supporting Information section at the end of the article. R code implementing the proposed methodology can be found at https://github.com/aaron-scheffler.

Contributor Information

Aaron Wolfe Scheffler, Department of Epidemiology & Biostatistics, University of California, San Francisco, USA.

Abigail Dickinson, Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA.

Charlotte DiStefano, Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA.

Shafali Jeste, Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA.

Damla Şentürk, Department of Biostatistics, University of California, Los Angeles, USA.

REFERENCES

  • [1].Backenroth D, Goldsmith J, Harran MD, Cortes JC, Krakauer JW and Kitago T (2018). Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis. Journal of the American Statistical Association 113 1003–1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Baladandayuthapani V, Mallick BK, Hong MY, Lupton JR, Turner ND and Carroll RJ (2007). Bayesian Hierarchical Spatially Correlated Functional Data Analysis with Application to Colon Carcinogenesis. Biometrics 64 64–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist 29 1165–1188. [Google Scholar]
  • [4].Bruce SA, Hall MH, Buysse DJ and Krafty RT (2018). Conditional adaptive Bayesian spectral analysis of non-stationary biomedical time series. Biometrics 74 260–269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Cardot H (2007). Conditional Functional Principal Components Analysis. Scandinavian Journal of Statistics 34 317–335. [Google Scholar]
  • [6].Cederbaum J, Scheipl F and Greven S (2018). Fast symmetric additive covariance smoothing. Computational Statistics & Data Analysis 120 25–41. [Google Scholar]
  • [7].Chen K, Delicado P and Müller HG (2016). Modelling function-valued stochastic processes, with applications to fertility dynamics. Journal of the Royal Statistical Society. Series B (Methodological) 79 177–196. [Google Scholar]
  • [8].Chen K and Müller HG (2012). Modeling Repeated Functional Observations. Journal of the American Statistical Association 107 1599–1609. [Google Scholar]
  • [9].Chiou J-M, Chen Y-T and Yang Y-F (2014). Multivariate functional principal components analysis: a normalization approach. Statistica Sinica 24 1571–1596. [Google Scholar]
  • [10].Chiou J-M, Müller H-G and Wang J-L (2003). Functional Quasi-Likelihood Regression Models with Smooth Random Effects. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 65 405–423. [Google Scholar]
  • [11].Corcoran AW, Alday PM, Schlesewsky M and Bornkessel-Schlesewsky I (2018). Toward a reliable, automated method of individual alpha frequency (IAF) quantification. Psychophysiology 55 e13064. [DOI] [PubMed] [Google Scholar]
  • [12].Crainiceanu C, Staicu AM and Di CZ (2009). Generalized Multilevel Functional Regression. Journal of the American Statistical Association 104 1550–1561. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Di CZ, Crainiceanu CM, Caffo BS and Punjabi NM (2009). Multilevel functional principal component analysis. The Annals of Applied Statistics 3 458–488. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Dickinson A, DiStefano C, Senturk D and Jeste SS (2018). Peak alpha frequency is a neural marker of cognitive function across the autism spectrum. European Journal of Neuroscience 47 643–651. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Edgar JC, Dipiero M, McBride E, Green HL, Berman J, Ku M, Liu S, Blaskey L, Kuschner E, Airey M, Ross JL, Bloy L, Kim M, Koppers S, Gaetz W, Schultz RT and Roberts TPL (2019). Abnormal maturation of the resting-state peak alpha frequency in children with autism spectrum disorder. Human Brain Mapping 40 3288–3298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Edgar JC, Heiken K, Chen Y-H, Herrington JD, Chow V, Liu S, Bloy L, Huang M, Pandey J, Cannon KM, Qasmieh S, Levy SE, Schultz RT and Roberts TPL (2015). Resting-State Alpha in Autism Spectrum Disorder and Alpha Associations with Thalamic Volume. Journal of Autism and Developmental Disorders 45 795–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Fiecas M and Ombao H (2016). Modeling the Evolution of Dynamic Brain Processes During an Associative Learning Experiment. Journal of the American Statistical Association 111 1440–1453. [Google Scholar]
  • [18].Giraldo R, Delicado P and Mateu J (2010). Ordinary kriging for function-valued spatial data. Environmental and Ecological Statistics 18 411–426. [Google Scholar]
  • [19].Greven S, Crainiceanu CM, Caffo BS and Reich DS (2010). Longitudinal functional principal component analysis. Electronic Journal of Statistics 4 1022–1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Happ C and Greven S (2018). Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association 113 649–659. [Google Scholar]
  • [21].Hasenstab K, Scheffler A, Telesca D, Sugar CA, Jeste S, DiStefano C and Şentürk D (2017). A multi-dimensional functional principal components analysis of EEG data. Biometrics. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Jaques J and Preda C (2014). Model-based Clustering for Multivariate Functional Data. Computational Statistics and Data Analysis 71 92–106. [Google Scholar]
  • [23] Jiang C-R and Wang J-L (2010). Covariate adjusted functional principal components analysis for longitudinal data. Annals of Statistics 38 1194–1226.
  • [24] Karhunen K (1946). Zur Spektraltheorie stochastischer Prozesse. Ann. Acad. Sci. Fennicae, AI 37 1–37.
  • [25] Krafty RT, Rosen O, Stoffer DS, Buysse DJ and Hall MH (2017). Conditional spectral analysis of replicated multiple time series with application to nocturnal physiology. Journal of the American Statistical Association 112 1405–1416.
  • [26] Kundu MG, Harezlak J and Randolph TW (2016). Longitudinal functional models with structured penalties. Statistical Modelling 16 114–139.
  • [27] Liu C, Ray S and Hooker G (2016). Functional principal component analysis of spatially correlated data. Statistics and Computing 1–16.
  • [28] Loève M (1946). Fonctions aléatoires à décomposition orthogonale exponentielle. La Revue Scientifique 84 159–162.
  • [29] Lynch B and Chen K (2018). A test of weak separability for multi-way functional data, with application to brain connectivity studies. Biometrika 105 815–831.
  • [30] Miskovic V, Ma X, Chou CA, Fan M, Owens M, Sayama H and Gibb BE (2015). Developmental changes in spontaneous electrocortical activity and network organization from early to late childhood. NeuroImage 118 237–247.
  • [31] Morris JS, Vannucci M, Brown PJ and Carroll RJ (2003). Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis. Journal of the American Statistical Association 98 573–583.
  • [32] Morris JS and Carroll RJ (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 68 179–199.
  • [33] Park SY and Staicu AM (2015). Longitudinal functional data analysis. Stat 4 212–226.
  • [34] Perrin F, Pernier J, Bertrand O and Echallier JF (1989). Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology 72 184–187.
  • [35] Scheffler A, Telesca D, Li Q, Şentürk D, Sugar CA, DiStefano C and Jeste S (2018). Hybrid principal components analysis for region-referenced longitudinal functional EEG data. Biostatistics.
  • [36] Scheffler AW, Telesca D, Sugar CA, Jeste S, Dickinson A, DiStefano C and Şentürk D (2019). Covariate-adjusted region-referenced generalized functional linear model for EEG data. Statistics in Medicine 38 5587–5602.
  • [37] Staicu AM, Crainiceanu CM and Carroll RJ (2010). Fast methods for spatially correlated multilevel functional data. Biostatistics 11 177–194.
  • [38] Valdés-Hernández PA, Ojeda-González A, Martínez-Montes E, Lage-Castellanos A, Virués-Alba T, Valdés-Urrutia L and Valdes-Sosa PA (2010). White matter architecture rather than cortical surface area correlates with the EEG alpha rhythm. NeuroImage 49 2328–2339.
  • [39] Wang J-L, Chiou J-M and Müller HG (2016). Functional data analysis. Annual Review of Statistics and Its Application 3 257–295.
  • [40] Welch PD (1967). The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics 15 70–73.
  • [41] Wood S (2017). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.
  • [42] Xiao L, Li C, Checkley W and Crainiceanu C (2018). Fast covariance estimation for sparse functional data. Statistics and Computing 28 511–522.
  • [43] Yao F, Müller HG and Wang JL (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100 577–590.
  • [44] Zhou L, Huang JZ, Martinez JG, Maity A, Baladandayuthapani V and Carroll RJ (2010). Reduced rank mixed effects models for spatially correlated hierarchical functional data. Journal of the American Statistical Association 105 390–400.
  • [45] Zipunnikov V, Greven S, Shou H, Caffo BS, Reich DS and Crainiceanu CM (2014). Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis. The Annals of Applied Statistics 8 2175–2202.
