Author manuscript; available in PMC 2020 Nov 1.
Published in final edited form as: Neuroimage. 2019 Jul 15;201:116019. doi: 10.1016/j.neuroimage.2019.116019

Exploring individual and group differences in latent brain networks using cross-validated simultaneous component analysis

Nathaniel E Helwig a,b,*, Matthew A Snodgress a
PMCID: PMC6765442  NIHMSID: NIHMS1535865  PMID: 31319181

Abstract

Component models such as PCA and ICA are often used to reduce neuroimaging data into a smaller number of components, which are thought to reflect latent brain networks. When data from multiple subjects are available, the components are typically estimated simultaneously (i.e., for all subjects combined) using either tensor ICA or group ICA. As we demonstrate in this paper, neither of these approaches is ideal if one hopes to find latent brain networks that cross-validate to new samples of data. Specifically, we note that the tensor ICA model is too rigid to capture real-world heterogeneity in the component time courses, whereas the group ICA approach is too flexible to uniquely identify latent brain networks. For multi-subject component analysis, we recommend comparing a hierarchy of simultaneous component analysis (SCA) models. Our proposed model hierarchy includes a flexible variant of the SCA framework (the Parafac2 model), which is able to both (i) model heterogeneity in the component time courses, and (ii) uniquely identify latent brain networks. Furthermore, we propose cross-validation methods to tune the relevant model parameters, which reduces the potential of over-fitting the observed data. Using simulated and real data examples, we demonstrate the benefits of the proposed approach for finding credible components that reveal interpretable individual and group differences in latent brain networks.

Keywords: Group Component Analysis, Multi-Subject Analysis, Multiway Analysis, Parallel Factor Analysis, Parafac2, Tensor Decomposition

1. Introduction

1.1. Background

Event-related (or task-based) neuroimaging studies record some brain signal in response to an event (or task) that is designed to elicit a particular neural response. For example, an event-related potential study records electroencephalography (EEG) data in response to some stimulus to understand the brain’s processing (auditory and/or visual) of the stimulus. Unlike resting-state neuroimaging studies, event-related (or task-based) studies reveal brain networks that relate to the cognitive processes underlying the event/task of interest. This implies that such studies can be useful for establishing norms—and identifying abnormalities—in brain responses to different types of stimuli, which can improve our understanding of various psychological disorders, e.g., addiction, depression, bipolar disorder, schizophrenia, etc.

Typical neuroimaging studies collect data across time from multiple spatial locations (e.g., electrodes or voxels), and most studies collect data from several subjects who participate in multiple conditions of some experimental task. Furthermore, the subjects often participate in multiple trials of each condition. In many cases, the multiple trials are treated as independent replications of the task, and the data are averaged across the repeated trials within each subject. However, in some cases there is interest in modeling the evolution of the brain response across the trials (e.g., see Williams et al., 2018). Regardless of whether the data are aggregated across trials, a typical neuroimaging study produces vast amounts of data—so some dimension reduction approach is typically needed to extract meaning from the signal.

1.2. Dimension Reduction

Component analysis models such as Principal Components Analysis (PCA) and Independent Components Analysis (ICA) are frequently used to obtain lower-dimensional representations of neuroimaging data, which can be useful for identifying brain networks (McKeown et al., 1998; Calhoun et al., 2001). However, the PCA and ICA models are forms of bilinear models, which are designed to model data with two modes of variation, e.g., a time × space data matrix from a single subject. To apply a bilinear model such as PCA or ICA to multi-subject neuroimaging data, it is necessary to either (i) fit a separate model to each subject’s data, or (ii) fit a single model to the concatenated data (see Beckmann et al., 2005; Calhoun et al., 2009). The first approach is undesirable because there is no clear way to connect the solutions across the subjects, whereas the second approach is undesirable because the multiway structure of the data is not fully leveraged.

Tensor analysis models (see Kolda and Bader, 2009, for an overview), such as the Tucker Factor Analysis model (Tucker, 1966) and the Parallel Factors (or Canonical Polyadic Decomposition) model (Hitchcock, 1927; Harshman, 1970; Carroll and Chang, 1970), provide powerful extensions of bilinear models such as PCA and ICA. As a result, tensor models have the potential to leverage the multiway structure of neuroimaging data to extract brain networks that display systematic variation across time points, spatial locations (electrodes or voxels), and subjects. Such models could also be extended to higher-dimensional data (e.g., 4-mode or 5-mode data) if the subjects participate in multiple experimental conditions and/or trials. Note that tensor ICA (TICA) models have also been proposed (see Beckmann and Smith, 2005; De Vos et al., 2012). However, such extensions are only appropriate if the data have trilinear structure and the underlying components are statistically independent—which is a rather strict assumption (see Helwig and Hong, 2013; Stegeman, 2007).

1.3. Individual Differences

Applications of tensor analysis models are on the rise in neuroimaging research, but the focus is typically on the Parafac/CPD model or the TICA model (e.g., see Andersen and Rayens, 2004; Beckmann and Smith, 2005; Cichocki, 2013; Cong et al., 2015; Mørup et al., 2006; Miwakeichi et al., 2004; Martínez-Montes et al., 2008). Note that these models implicitly leverage Cattell’s (1944) principle of parallel proportional profiles, such that the neuroimaging data are assumed to display proportional variation across all three data modes (i.e., time, locations, and subjects). This implies that the underlying brain networks have the exact same time courses and spatial maps across subjects, which is a rather strict assumption. Note that assuming common spatial maps across subjects may be reasonable, given the spatial organization of the brain. However, assuming the same time courses for all subjects is only appropriate if the event/task elicits the exact same response—at the exact same time points—in all of the subjects.

With real data, there may exist differences in the timing and nature of different subjects’ neural responses to the event/task. For example, individual differences in the cognitive processing of the stimuli would result in differences in the timing and shape of features in EEG waveforms. As another example, individual differences in haemodynamic response functions would result in differences in the timing and shape of recorded functional magnetic resonance imaging (fMRI) time courses. Typical applications of the Parafac/CPD model, as well as the TICA model, are too rigid to capture these sorts of individual differences—which are expected to exist in most real neuroimaging applications. Consequently, a more flexible framework is needed to capture the heterogeneity in the timing and nature of recorded brain signals from different subjects.

1.4. Proposed Approach

In this paper, we discuss how the Parafac2 model (Harshman, 1972) and the Simultaneous Component Analysis (SCA) framework (Timmerman and Kiers, 2003) can be used to solve such problems. Specifically, we leverage recent advances in Parafac2 modeling with smoothness constraints (Helwig, 2017) to flexibly model individual differences in event-related neuroimaging data. We also extend these recent developments to create novel (smoothness constrained) extensions of the SCA framework for analyzing event-related neuroimaging data. The benefits of the Parafac2 model for neuroimaging research have been recently demonstrated (see Ferdowsi et al., 2015; Madsen et al., 2017; Makkiabadi et al., 2011). However, unlike these recent papers, our approach combines compression (through smoothness constraints) and cross-validation methods to automatically find the appropriate model dimensionality and degree of smoothness of the data.

Using both simulated and real data, we illustrate the benefits of using cross-validated tensor models, particularly the Parafac2 model, for uniquely identifying brain networks in multi-subject neuroimaging data. Our results clearly demonstrate the dangers of ignoring individual differences in brain imaging data, as well as the interpretational issues inherent to decompositions with a rotational indeterminacy. Specifically, we show that (i) the Parafac/CPD model produces biased estimates of the spatial maps when individual differences exist in the underlying time courses, and (ii) tensor models with a rotational freedom can recover the data-generating tensor but cannot necessarily recover the true data-generating spatial maps (due to the rotational indeterminacy). We also show the robustness of the smoothness-constrained Parafac2 model (with CV tuning), which performed well across a wide variety of data-generating conditions.

2. Theory

2.1. Overview of Component Models

2.1.1. Bilinear Models

Let X = {xij}I×J denote a time-by-space data matrix recorded from a single subject, where xij denotes the data recorded at the i-th time point (i = 1, …, I) from the j-th spatial location (j = 1, …, J). A bilinear model such as PCA (Pearson, 1901; Hotelling, 1933; Tipping and Bishop, 1999) or ICA (Comon, 1994; Bell and Sejnowski, 1995; McKeown et al., 1998; Hyvärinen et al., 2001) assumes that the data matrix can be decomposed into R ≤ min(I, J) underlying factors (or components), which consist of an outer product of time courses and spatial maps (see Figure 1a). The bilinear model depicted in Figure 1a has a rotational indeterminacy, such that for any nonsingular R × R transformation matrix T, we have that AB′ = ÃB̃′, where Ã = AT are the rotated time courses and B̃ = B(T⁻¹)′ are the rotated spatial maps. This implies that, given any bilinear model solution, there are an infinite number of other possible solutions that fit the observed data equally well—but may produce different interpretations of the underlying factors. To obtain unique solutions, bilinear models must impose additional assumptions on the solution, such as orthogonality (PCA) or independence (ICA) of the factors/components. However, in practice, choosing the appropriate rotation can be difficult (see Browne, 2001, for an overview), and there is no guarantee that any standard rotation will find the true components.
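
To make this indeterminacy concrete, the base R sketch below (arbitrary illustrative dimensions, not the authors' code) verifies numerically that rotating the time courses by T and counter-rotating the spatial maps by (T⁻¹)′ leaves the reconstructed matrix AB′ unchanged.

```r
set.seed(1)
I <- 100; J <- 20; R <- 2                    # time points, locations, components
A <- matrix(rnorm(I * R), I, R)              # time courses
B <- matrix(rnorm(J * R), J, R)              # spatial maps
Tmat <- matrix(c(1, 0.5, -0.3, 1), R, R)     # any nonsingular R x R transformation
Atil <- A %*% Tmat                           # rotated time courses: A T
Btil <- B %*% t(solve(Tmat))                 # rotated spatial maps: B (T^-1)'
max(abs(A %*% t(B) - Atil %*% t(Btil)))      # essentially zero: both solutions fit identically
```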

Figure 1: Component Models.

Visualizations of various component models for neuroimaging data. PCA = Principal Component Analysis, ICA = Independent Component Analysis, SCA = Simultaneous Component Analysis, GICA = Group ICA, TICA = Tensor ICA, PARAFAC = Parallel Factor Analysis, and PARAFAC2 = Parallel Factor Analysis-2.

2.1.2. Simultaneous Component Analysis

Let Xk = {xij(k)}I×J denote a time-by-space data matrix recorded from the k-th subject for k = 1, …, K. The Simultaneous Component Analysis (SCA) framework consists of a hierarchy of four models that can be used to factor analyze multivariate time series data collected from K > 1 subjects (Timmerman and Kiers, 2003). For multi-subject neuroimaging data, the SCA models assume that each subject has a unique time course, but the subjects share common spatial maps (see Figure 1b). The four SCA models differ in terms of the assumed crossproduct structure of the subject-specific time courses Dk (see the Supplementary Online Materials, SOM). In all four cases, we can write Dk = AkCk, where the Ck are diagonal matrices (similar to Figure 1d), and the constraints on the Ak matrices depend on the assumed version of the SCA model. In this paper, we explore the most flexible version of the SCA model (i.e., SCA-P), which makes no assumptions about the crossproduct structure of the Dk weights. Note that in this case the SCA model is equivalent to the bilinear model in Figure 1a applied to the stacked (i.e., row-wise concatenated) data matrices.
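
A minimal base R sketch of the SCA-P idea (hypothetical dimensions and random placeholder data, not the authors' code): row-wise stack the subject matrices, take a truncated SVD to obtain the common spatial maps B, and recover each subject's time-course weights Dk by projecting that subject's data onto B.

```r
set.seed(1)
I <- 256; J <- 61; K <- 5; R <- 2
Xlist <- lapply(1:K, function(k) matrix(rnorm(I * J), I, J))  # K time-by-space matrices
Xstack <- do.call(rbind, Xlist)              # (I*K) x J row-wise concatenation
sv <- svd(Xstack, nu = R, nv = R)
B <- sv$v                                    # common spatial maps (J x R), orthonormal columns
Dall <- Xstack %*% B                         # stacked subject-specific time-course weights
Dk <- lapply(1:K, function(k) Dall[((k - 1) * I + 1):(k * I), , drop = FALSE])
```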

2.1.3. Parallel Factors-1 Model

The Parallel Factors (Parafac) model was proposed by Harshman (1970) for factor analysis of multiway data, i.e., data collected across more than two modes of variation. Note that neuroimaging data from a single subject has two modes of variation (time by space), and the subjects add a third mode of variation to the data. For multi-subject neuroimaging data, the Parafac model assumes that each subject has the same time courses and spatial maps, which are weighted by subject specific scores (see Figure 1c). Unlike the bilinear model in Figure 1a, the Parafac model in Figure 1c can produce solutions that are essentially unique, i.e., do not have a rotational indeterminacy (see Harshman, 1970; Harshman and Lundy, 1994; Kruskal, 1977; Sidiropoulos and Bro, 2000). So, if the assumed form of the Parafac model is correct, it is possible to uniquely find latent brain components using this approach (e.g., see Helwig and Hong, 2013; Mørup et al., 2006; Miwakeichi et al., 2004; Williams et al., 2018). However, the assumption that each subject has the same time courses will not be reasonable if between-subjects temporal heterogeneity exists in the underlying components—which may be the case in many real data applications.

2.1.4. Parallel Factors-2 Model

The Parafac2 model (Harshman, 1972) is an extension of the Parafac model that has the flexibility to model data with heterogeneity in the factor weights. For multi-subject neuroimaging data, the Parafac2 model allows each subject to have unique time courses Ak, with the constraint that the time courses share a common crossproduct matrix Ak′Ak = Φ (see Figure 1d). Similar to the Parafac model, the Parafac2 model can produce essentially unique solutions, i.e., solutions without a rotational indeterminacy (see Harshman, 1972; Harshman and Lundy, 1996; Ten Berge and Kiers, 1996; Kiers et al., 1999; Timmerman and Kiers, 2003; Helwig, 2013). Note that the Parafac2 model has the same form as the SCA model in Figure 1b, with the additional assumption that Dk = AkCk where the Ak matrices satisfy Ak′Ak = Φ. Thus, by adding the “common crossproducts” constraint to the SCA model, it is possible to solve the rotational indeterminacy problem. Of course, in practice, the appropriateness of the common crossproduct constraint needs to be considered, e.g., by comparing the fit of the SCA-P and Parafac2 models.
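
The constraint can be made concrete with the standard Parafac2 parameterization Ak = PkH, where each Pk has orthonormal columns and H is an R × R matrix common to all subjects. The small base R sketch below (illustrative dimensions only) shows that every subject then shares the same crossproduct matrix.

```r
set.seed(1)
I <- 256; R <- 2; K <- 3
H <- matrix(rnorm(R * R), R, R)              # common R x R matrix
Alist <- lapply(1:K, function(k) {
  Pk <- qr.Q(qr(matrix(rnorm(I * R), I, R))) # I x R matrix with orthonormal columns
  Pk %*% H                                   # subject-specific time courses: Ak = Pk H
})
# every subject has the same crossproduct: Ak'Ak = H' Pk'Pk H = H'H = Phi
lapply(Alist, crossprod)                     # identical (up to rounding) across subjects
```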

2.1.5. Tensor and Group ICA Models

The ICA model can be extended to the multi-subject situation in (at least) two different ways: using tensor ICA (Beckmann and Smith, 2005) or group ICA (Beckmann et al., 2005). The TICA approach assumes the same model form as the Parafac model in Figure 1c, but adds the assumption that the spatial maps are statistically independent of one another. Note that this is a strict assumption that is not actually needed, given that the Parafac model already provides a unique decomposition (see Helwig and Hong, 2013). The GICA approach assumes the same model form as the SCA-P model in Figure 1b, and solves the rotational indeterminacy using an ICA rotation of the spatial maps.

Note that both the TICA and GICA approaches assume the subjects share common spatial maps, which are assumed to be statistically independent of one another. The two approaches differ in their assumptions about the time courses, such that the GICA approach is more flexible. For even further flexibility, the “dual regression” (DR) approach can be applied to estimate subject specific spatial maps, which are based on the original group maps (see Beckmann et al., 2009). Note that because GICA is a form of SCA-P, the approach does not provide a unique solution, i.e., there is a rotational indeterminacy to the solution. As a result, the quality of the GICA solution (with or without DR) will depend on the appropriateness of the ICA rotation used to solve the indeterminacy.
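
One rough way to express the GICA idea in code (a sketch assuming the icafast() interface from the ica package that the authors use later; the data here are random placeholders) is to compute the SCA-P spatial maps from the stacked SVD and then apply a FastICA rotation to those maps.

```r
library(ica)                                 # for icafast()
set.seed(1)
I <- 256; J <- 61; K <- 5; R <- 2
Xstack <- do.call(rbind, lapply(1:K, function(k) matrix(rnorm(I * J), I, J)))
sv <- svd(Xstack, nu = R, nv = R)
Bpca <- sv$v                                 # unrotated SCA-P spatial maps (J x R)
irot <- icafast(Bpca, nc = R)                # FastICA rotation of the spatial maps
Bica <- irot$S                               # GICA-style spatial maps (same subspace, different rotation)
```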

2.2. Component Models in Practice

2.2.1. Overview

Given a chosen component model and number of factors R, the parameters are typically estimated using a least squares approach (Krijnen, 2006), see the SOM for details. For bilinear models (including SCA-P), a closed form solution exists in terms of the singular value decomposition (Eckart and Young, 1936) of the stacked matrix (Timmerman and Kiers, 2003), and then a post-hoc rotation can be applied to the spatial maps. For the Parafac and Parafac2 models, the parameters are typically estimated using an alternating least squares (ALS) algorithm with multiple random starts (Kiers et al., 1999; Faber et al., 2003; Tomasi and Bro, 2006). In default cases, the parameters are estimated using unconstrained least squares. However, imposing constraints (e.g., non-negativity or smoothness) on a mode’s weights can be useful, particularly when the data are noisy (Harshman and Lundy, 1994; Bro and De Jong, 1997; Timmerman and Kiers, 2002). Later in this subsection, we discuss how to incorporate smoothness constraints on the time courses, which can greatly improve the recovery of the underlying factors. But we first address the issue of choosing the appropriate dimensionality R for the given model.

2.2.2. Choosing the Number of Components

Selecting the appropriate dimensionality R in bilinear models has been a topic of research for decades (e.g., see Kaiser, 1960; Cattell, 1966; Horn, 1965; Revelle and Rocklin, 1979; Akaike, 1987). Although many different approaches have been proposed over the years, there does not exist a “perfect solution” that is ideal for all circumstances. The parametric approaches (e.g., likelihood ratio tests and information criteria) can perform well for data that meet their particular parametric assumptions, but will fail to provide correct results in general cases. In contrast, the scree line tests can provide useful heuristics, but fail to provide a theoretically rigorous rule for determining the number of components. As a result, selecting the appropriate number of components in a bilinear model is still an open topic of research (e.g., see Hong et al., 2006; Larsen and Warne, 2010; Warne and Larsen, 2014).

To choose the number of components for Parafac/Parafac2 models, it is typical to use the CORe CONsistency DIAgnostic (CORCONDIA) proposed by Bro and Kiers (2003). This criterion leverages the fact that the Parafac model (or Parafac2 model) is a special case of the 3-way Tucker factor analysis model (Tucker, 1966) applied to the data (or projected data for Parafac2, see Kamstrup-Nielsen et al., 2013). The CORCONDIA quantifies the difference between the least squares estimate of the Tucker “core” array and the “super-diagonal” core array implied by the Parafac model. If the two are similar, the CORCONDIA value will be close to 100, which indicates that the assumed model is well supported by the observed sample of data. In contrast, if the CORCONDIA value is below 100, this indicates deviations from the assumed model structure. In practice, different CORCONDIA thresholds have been proposed for choosing the dimensionality, e.g., CORCONDIA ≥ 80 or ≥ 90.
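
The multiway package provides a corcondia() function for this diagnostic. The call below is a sketch based on our reading of that package's interface (argument names should be checked against the documentation); X is a toy time × electrode × subject array.

```r
library(multiway)
set.seed(1)
X <- array(rnorm(256 * 61 * 30), dim = c(256, 61, 30))   # placeholder data tensor
fit <- parafac(X, nfac = 2, nstart = 10)                  # 2-factor Parafac fit via ALS
cc <- corcondia(X, fit)                                   # values near 100 support the assumed model
```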

2.2.3. Incorporating Smoothness Constraints

Typical applications of component analysis models to spatiotemporal data do not take into consideration the spatial or temporal properties of the observed data. More specifically, the models do not take into account any information about the interrelationships between the levels of a given mode, e.g., that rows i and i + 1 of the A matrix (in Figure 1a) correspond to sequential time points. If the true time courses change as a smooth function across time, then including this information in the model can greatly improve the parameter estimates—particularly when the signal-to-noise ratio is small (see Alsberg and Kvalheim, 1993; Reis and Ferreira, 2002; Timmerman and Kiers, 2002; Helwig, 2017). To incorporate smoothness information into the model, we can assume that the time course weights can be written as a linear combination of known basis functions with unknown coefficients. For example, a Parafac model with smoothness constraints on the time course weights assumes that A = Fα, where F = {fℓ(i)}I×ν is a known basis function matrix with ν denoting the degrees of freedom, and α = {αℓr}ν×R is an unknown matrix of coefficients.
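
A minimal base R illustration of this functional constraint (the ERP-like target below is made up, and splines::bs is one of several possible ways to build the basis): the I × ν basis matrix F converts estimation of I time-course weights into estimation of ν basis coefficients.

```r
library(splines)
set.seed(1)
I <- 256; nu <- 30
tsec <- seq(0, 1, length.out = I)                     # one second sampled at 256 Hz
Fmat <- bs(tsec, df = nu, intercept = TRUE)           # I x nu B-spline basis (the matrix F)
a_true <- sin(8 * pi * tsec) * exp(-3 * tsec)         # a smooth ERP-like time course
a_noisy <- a_true + rnorm(I, sd = 0.25)
alpha <- solve(crossprod(Fmat), crossprod(Fmat, a_noisy))  # least squares basis coefficients
a_smooth <- Fmat %*% alpha                            # smoothness-constrained estimate: A = F alpha
```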

The degrees of freedom parameter ν controls the smoothness of the A estimates, such that smaller values of ν produce smoother estimates. Note that setting ν = I (the number of time points) produces the same solution as would be obtained without the smoothness constraints imposed. As a result, the classic (unconstrained) solution can be considered a special case of this “smoothness constrained” approach (see the SOM for details). This implies that, in practice, one must determine a reasonable value of ν for the smoothing. Note that setting ν too small will introduce too much bias to the solution, whereas setting ν too large will introduce too much noise to the estimates (assuming noisy data). To find a balance between fitting and smoothing the data, it is typical to use ordinary cross-validation (OCV) applied to the tensor (see Timmerman and Kiers, 2002; Helwig, 2017). However, the OCV criterion is designed for situations where the error terms are “iid” (independent and identically distributed), which is not likely to be true with real neuroimaging data.

2.2.4. Cross-Validation for SCA

In typical applications, both the number of components R and the degree of smoothness ν are determined from a single sample of K subjects. This has the potential to lead to over-fitting the data, i.e., choosing a value of R or ν that fails to cross-validate for new samples of data. To avoid over-fitting, we propose a novel k-fold cross-validation (CV) procedure to select both the model dimensionality R and the degree of smoothness ν. Specifically, we propose using k-fold CV of the CORCONDIA to determine the appropriate number of components R, and k-fold CV of the tensor mean-squared error (MSE) to select the degree of smoothness ν. We note that k-fold CV has been used to tune R in multiway models (Louwerse et al., 1999); however, previous discussions have neglected the Parafac2 model, and have not used the cross-validated CORCONDIA for choosing R. The details of our proposed CV approach are discussed in the SOM, and the utility of the method is explored in the Simulation Study.
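
The subject-level CV loop can be sketched schematically as follows (pseudocode-style base R; fit_model() and heldout_corcondia() are hypothetical placeholders for the fitting and diagnostic steps detailed in the SOM).

```r
set.seed(1)
K <- 30; nfolds <- 5
folds <- sample(rep(1:nfolds, length.out = K))        # assign subjects to folds
Rgrid <- 1:4
cv_cc <- matrix(NA, nfolds, length(Rgrid))            # CV-CORCONDIA per fold and candidate R
for (f in 1:nfolds) {
  train <- which(folds != f)
  test  <- which(folds == f)
  for (r in seq_along(Rgrid)) {
    fit <- fit_model(subjects = train, nfac = Rgrid[r])       # hypothetical fitting call
    cv_cc[f, r] <- heldout_corcondia(fit, subjects = test)    # hypothetical held-out diagnostic
  }
}
colMeans(cv_cc)   # choose the largest R whose mean CV-CORCONDIA exceeds the chosen threshold
```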

3. Methods

3.1. Simulation Study

3.1.1. Design

We designed a Simulation Study to compare the various multi-subject component analysis models discussed in Section 2.1, as well as the k-fold CV tuning methods proposed in Section 2.2.4. Our simulation data-generating components are motivated by our real data application to EEG data. Specifically, throughout the simulation, we assumed that K = 30 subjects participated in an event-related potential (ERP) experiment where the EEG data were sampled at I = 256 Hz from J = 61 electrodes. To ensure that the data-generating conditions have practical relevance, the group time courses and spatial maps for the R = 2 data-generating components (displayed in Figure 2a) are derived from the real data results. As a part of the Simulation Study, we compared the methods under 24 different combinations of conditions, which represent all combinations of two factors: (i) the heterogeneity level, i.e., the amount of individual differences in the components (four levels: σ ∈ {0, 0.5, 1, 1.5}), and (ii) the signal-to-noise ratio (SNR; six levels: 0.1, 0.15, 0.25, 0.4, 0.6, 1).

Figure 2: Simulation Design Parameters.

(a) The top row displays a prototypical event-related potential (ERP) waveform with five features (P1, N1, P2, N2, and P3), as well as the 61 channel electroencephalography (EEG) cap used in the simulation and real data examples. The middle and bottom rows show the data generating parameters for the simulation: the average Mode A (time) weights (i.e., Fμr) and the Mode B (electrode) weights. (b) Simulated Mode A (time) weights corresponding to five random samples of the coefficients (i.e., Fαk,r) at the three different levels of heterogeneity. Note that when σ = 0 all of the subjects’ ERPs are equal to their expected values, which are denoted with black dashed lines. Figures were created using the eegkit package (Helwig, 2018a) in R (R Core Team, 2019).

Following Stegeman (2007) and Helwig and Hong (2013), we define the SNR as the root mean square of the signal divided by the root mean square of the noise (see the SOM). To simulate realistic ERP components with systematically different levels of heterogeneity, we used the following approach. Letting F = {fℓ(i)}I×ν denote the B-spline basis function matrix with ν = 30 degrees of freedom, each subject’s ERP components were defined as Ak = Fαk, where αk = {αℓr}ν×R is the k-th subject’s matrix of basis function coefficients. Letting αk,r denote the r-th column of αk, we generated the αk,r vectors independently from a multivariate normal distribution. More specifically, we assumed that αk,r ~ Nν(μr, σ²Σr), where μr and Σr denote the mean vector and covariance matrix, and σ ∈ {0, 0.5, 1, 1.5} controls the heterogeneity of the ERPs (see Figure 2b). Note that σ = 0 corresponds to Ak = A for all k, so the generated data follow the Parafac model form in this case. When σ > 0 the generated data meet the assumptions of the SCA-P model, but they do not meet the assumptions of the Parafac2 model—given that the crossproduct constraint Ak′Ak = Φ is not enforced as a part of the data generation.
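
A sketch of this generator in R (using splines::bs and MASS::mvrnorm; the mean vectors and covariance matrices below are arbitrary stand-ins for the μr and Σr derived from the real data results):

```r
library(splines)
library(MASS)                                          # for mvrnorm()
set.seed(1)
I <- 256; nu <- 30; R <- 2; K <- 30; sigma <- 1
Fmat <- bs(seq(0, 1, length.out = I), df = nu, intercept = TRUE)   # I x nu B-spline basis (F)
mu  <- replicate(R, rnorm(nu), simplify = FALSE)       # stand-in mean coefficient vectors mu_r
Sig <- replicate(R, diag(nu), simplify = FALSE)        # stand-in covariance matrices Sigma_r
Alist <- lapply(1:K, function(k) {
  alpha_k <- sapply(1:R, function(r) mvrnorm(1, mu[[r]], sigma^2 * Sig[[r]]))
  Fmat %*% alpha_k                                     # subject k's time courses: Ak = F alpha_k
})
```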

3.1.2. Analyses

Data Generation.

For each of the 24 (6 SNR × 4 σ) cells of the simulation design, we generated 100 independent samples of data. The Mode A weights were generated as previously described, i.e., Ak = Fαk with αk,r ~ Nν(μr, σ²Σr), the Mode B weights were fixed at the values displayed in Figure 2a, and the Mode C weights were independently sampled from a uniform distribution on the interval [0.5, 1.5]. The errors were sampled independently from a standard normal distribution, and then were rescaled (i.e., multiplied by a constant) to achieve the needed SNR.
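
The error-rescaling step is simple to express; a minimal sketch using the root-mean-square SNR definition given above (`signal` is a placeholder for whatever noise-free tensor was generated):

```r
set.seed(1)
signal <- array(rnorm(256 * 61 * 30), dim = c(256, 61, 30))   # placeholder noise-free tensor
snr <- 0.25
noise <- array(rnorm(length(signal)), dim = dim(signal))      # standard normal errors
scale <- sqrt(mean(signal^2)) / (snr * sqrt(mean(noise^2)))   # so RMS(signal)/RMS(scaled noise) = snr
X <- signal + scale * noise                                   # observed data tensor
```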

Model Comparison.

For each sample of generated data, we fit six models: (i) the Parafac model, (ii) the Parafac2 model, (iii) the SCA-P model with Varimax rotation (Kaiser, 1958), (iv) the SCA-P model with no rotation, which is the PCA solution, (v) the GICA model using a FastICA rotation (Hyvärinen, 1999), and (vi) the GICA model (using FastICA) with dual regression. For the model comparison, the number of factors R and the degree of smoothness ν were fixed at the data generating values, i.e., R = 2 and ν = 30, which makes it possible to compare the quality of the parameter estimates. The Parafac and Parafac2 models were fit to the data using a non-negativity constraint on the Mode C weights, and the Ck weights for the SCA-P models were defined as the standard deviations of the columns of Dk. For the Parafac and Parafac2 models, we used 10 random starts of the ALS algorithm. The models were fit using the multiway (Helwig, 2019) and ica (Helwig, 2018b) packages in R (R Core Team, 2019).
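
For readers who want to reproduce this kind of comparison, the sketch below shows plausible calls to the multiway and ica packages; the argument names and return values reflect our understanding of those interfaces and should be checked against the package documentation (e.g., a non-negativity constraint on the Mode C weights can be requested via the const argument).

```r
library(multiway)
library(ica)
set.seed(1)
K <- 30; R <- 2
Xlist <- lapply(1:K, function(k) matrix(rnorm(256 * 61), 256, 61))  # placeholder data
Xarr  <- array(unlist(Xlist), dim = c(256, 61, K))                  # time x electrode x subject
pfac  <- parafac(Xarr, nfac = R, nstart = 10)                       # (i) Parafac via ALS
pfac2 <- parafac2(Xlist, nfac = R, nstart = 10)                     # (ii) Parafac2 via ALS
scap  <- sca(Xlist, nfac = R, type = "sca-p")                       # (iii)-(iv) SCA-P
Bvmax <- varimax(scap$B)$loadings                                   # Varimax-rotated spatial maps
Bgica <- icafast(scap$B, nc = R)$S                                  # (v) FastICA rotation (GICA)
```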

K-Fold CV Tuning.

For each sample of generated data, we also explored the k-fold CV tuning methods proposed in Section 2.2.4. To select the model dimensionality, we evaluated the CV-CORCONDIA using R = 1, …, 4, and we compared three different thresholds for determining if the CORCONDIA is “close enough” to the optimal value of 100, i.e., thresholds of >80, >85, and >90. To select the degree of smoothness ν, we evaluated the CV-MSE using 15 ν values ranging from 10 to 80 in increments of 5. We compared the results of our proposed selection rule (i.e., a threshold of δ = 0.2 for the standardized CV-MSE change) to the results that would have been obtained using the optimal degree of smoothing (i.e., ν = 30) and using no smoothing (i.e., ν = 256).

Performance Metrics.

The quality of each solution was quantified using various metrics to determine how well the method performed. We used the root mean squared error (RMSE) between the true and estimated tensors to quantify the quality of the tensor recovery, and we used the observed data R-squared to quantify the quality of the data recovery. Note that the R-squared value quantifies the difference between the estimated tensor and the observed data, whereas the RMSE value quantifies the difference between the estimated tensor and the true model tensor (i.e., without the error). Finally, to quantify the quality of the parameter recovery, we calculated the Tucker Congruence Coefficient (TCC) (Tucker, 1951) between the true and estimated weights, i.e., time courses and spatial maps. Values of the TCC closer to 1 indicate better agreement between the true and estimated parameters. See the SOM for further details on the performance metrics.
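
These metrics are straightforward to compute; a base R sketch (Xhat is the model-implied tensor, Xtrue the noise-free tensor, and X the observed data; the exact R-squared convention and the component matching/ordering details are given in the SOM, so the versions below are only illustrative):

```r
rmse <- function(Xhat, Xtrue) sqrt(mean((Xhat - Xtrue)^2))               # tensor recovery
rsq  <- function(Xhat, X) 1 - sum((X - Xhat)^2) / sum((X - mean(X))^2)   # observed-data fit
tcc  <- function(b, bhat) sum(b * bhat) / sqrt(sum(b^2) * sum(bhat^2))   # Tucker congruence (per component)
```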

3.2. Application to EEG Data

3.2.1. Data

To compare the performance of the various methods with real data, we use open-source EEG data collected from control and alcoholic subjects participating in a visual stimulus ERP study (Zhang et al., 1995). The data were collected in the Henri Begleiter Neurodynamics Laboratory at SUNY Downstate under the approval of the authors’ home institutions. The dataset is freely available—thanks to Ingber (1997, 1998)—from the UCI Machine Learning repository (Dua and Graff, 2019). Our use of the open-source data adheres to the ethical standards of the authors’ institution. The data consist of 61-channel ERPs recorded at 256 Hz for one second following the presentation of a visual stimulus from the Snodgrass and Vanderwart (1980) stimulus set. The data were collected from a total of 122 subjects (77 alcoholics and 45 controls) who were shown a single stimulus (S1), which was followed by either a matching stimulus (S2m) or a non-matching stimulus (S2n). For our analyses, we focus on the subjects’ initial response to the single stimulus, i.e., the S1 condition.

For each subject, multiple trials of the task were collected during each condition. Our analyses are based on a subset of 115 subjects (73 alcoholics and 42 controls) who have artifact free data for all electrodes on at least 20 trials of the S1 task. Note that we identified trials as having artifact (e.g., due to movement) if the absolute recorded voltage was greater than 100 μV, which resulted in the exclusion of about 5% of the trials in the full dataset. The raw data were bandpass filtered using a window of 1 to 50 Hz, and then averaged across trials within each subject. To ensure an equal number of trials for each subject’s mean data, we averaged the data across the first 20 trials for each subject. This resulted in a data tensor of dimension 256 time points × 61 electrodes × 115 subjects.
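
As an illustration of the artifact screening and trial averaging (a base R sketch with simulated placeholder trials for a single subject and condition; the bandpass-filtering step is omitted here), one might proceed as follows.

```r
set.seed(1)
# trials: list of time-by-electrode matrices (in microvolts) for one subject, one condition
trials <- replicate(25, matrix(rnorm(256 * 61, sd = 20), 256, 61), simplify = FALSE)
clean  <- Filter(function(tr) max(abs(tr)) <= 100, trials)   # drop trials with |voltage| > 100 microvolts
stopifnot(length(clean) >= 20)                               # require at least 20 artifact-free trials
avg <- Reduce(`+`, clean[1:20]) / 20                         # average the first 20 artifact-free trials
```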

3.2.2. Analyses

The real data were analyzed using a similar procedure as was used in the Simulation Study. As a first step, the k-fold CV methods proposed in Section 2.2.4 were used to determine a reasonable number of components R and degree of smoothness ν for the data. The k-fold CV approach was applied in a similar fashion as before: (i) the CV-CORCONDIA was tuned with respect to R = 1, …, 4 to determine the number of components, and then (ii) the CV-MSE was tuned with respect to ν = (10, 15, …, 75, 80) to determine the degree of smoothness. Given the estimates of R and ν, we fit the six component models that were explored in the Simulation Study. Each model was fit using the same specifications that were used in the Simulation Study (see Section 3.1 for details).

4. Results

4.1. Simulation Study

4.1.1. Comparison of Component Models

Overview:

The simulation results reveal noteworthy differences between the compared methods on the examined performance metrics, which are plotted in Figure 3. We only plot the results for two (of the four) examined levels of heterogeneity, i.e., no heterogeneity (σ = 0) versus moderate heterogeneity (σ = 1), which is enough to make our key points.

Figure 3: Simulation Performance Results.

The subplots display different performance metrics (rows) at different heterogeneity levels (columns). Within each subplot, the x-axis displays the results at the six different signal-to-noise ratios (SNRs). The six boxplots within each SNR denote the results from the six different methods.

Observed Data Recovery:

For all heterogeneity conditions, we found that the observed R-squared values tended to be ordered such that Parafac < Parafac2 < SCA-P = GICA < GICA-DR. This ordering was expected, given that it represents the ordering of the methods with respect to the flexibility of the assumed model structure (from least to most flexible). With no heterogeneity present in the component time courses (Figure 3 left), the differences between the R-squared values for the different methods were small. This was anticipated given that all of the methods are flexible enough to model the data when σ = 0. In contrast, when heterogeneity is present in the latent time courses (Figure 3 right), the R-squared difference between Parafac and the other methods is more substantial. This is not surprising, given that the Parafac model is not flexible enough to capture the individual differences that exist in the time courses when σ = 1.

Latent Tensor Recovery:

Unlike the R-squared (which measures observed data fit), the RMSE measures how well the estimated tensor fits the unobserved data-generating tensor. As a result, the RMSE patterns in Figure 3 reveal some interesting insights that are not discernible from the observed data R-squared values. Focusing first on the σ = 0 condition, we see that the RMSE values are approximately ordered such that Parafac < Parafac2 = SCA-P = GICA < GICA-DR. For all methods, the RMSE values tend to decrease as the SNR increases, which is expected in this case. For small SNR values, the Parafac solution produces noticeably smaller (i.e., better) RMSE values than the other solutions, whereas the GICA-DR solution produces noticeably larger (i.e., worse) RMSE values than the Parafac2, SCA-P, and GICA methods.

The results show a somewhat different pattern when heterogeneity is present in the component time courses. In the σ = 1 case, the Parafac RMSE does not decrease as the SNR increases, given that the Parafac model is not flexible enough to model the heterogeneity in the data. Also, when σ = 1 we see that the Parafac2 model shows a slightly larger RMSE than the SCA-P and GICA models. This is not surprising given that the generated data follow the SCA-P model form—not the Parafac2 model form. What is surprising is that the Parafac2 model is able to perform nearly as well as the SCA-P model, even though the data-generating tensor does not meet the Parafac2 model assumptions.

Parameter Recovery:

The bottom two rows of Figure 3 show the TCC between the true parameters (i.e., time courses and spatial maps) and their estimates obtained using the six different component analysis methods. As a reminder, TCC values closer to 1 indicate better agreement between the parameters and the estimates. With no heterogeneity present in the component time courses (σ = 0), the Parafac model performs noticeably better than the other methods, particularly at the small SNRs. In contrast, when there is heterogeneity (σ = 1), the Parafac model shows a noteworthy reduction in performance, particularly with respect to the time course estimation. These results were expected, given that the Parafac model is appropriate for the σ = 0 condition, but not the σ = 1 condition. Unlike the Parafac results, the Parafac2 parameter recovery improves as the SNR increases for both the homogeneous and heterogeneous conditions. It is interesting to note that the Parafac2 model can well-recover the parameters when σ = 1, even though these data do not meet the Parafac2 model assumptions.

Comparing the results for the other methods provides a clear example of the important implications of the rotational indeterminacy problem. As a reminder, the two SCA-P models and the GICA model have the exact same fit measures (i.e., R-squared and RMSE), given that these three solutions are different rotations of the model in Figure 1b. As the SNR increases, the Varimax rotation (coincidentally) produces rotated time courses and spatial maps that find the true data-generating parameters. In contrast, the PCA orientation (i.e., unrotated SCA-P) and the FastICA orientation (i.e., GICA) both fail to find the true data-generating parameters. This is because the true data generating spatial maps are moderately correlated (≈ 0.33), so the orthogonality and independence assumptions are not reasonable. Finally, it is worth noting that the GICA-DR approach resulted in worse spatial map estimates than using GICA without DR, but the difference between the approaches diminished as the SNR increased. This result was expected given that the true data-generating spatial maps were common across subjects.

Algorithm Runtimes:

The model fitting times are ordered such that the SCA variants are the most efficient (≈ 0.02 sec), the Parafac model is next (≈ 0.1 sec), and the Parafac2 model is least efficient (≈ 0.2 sec). Note that these runtimes are in the expected order, given that SCA has a closed-form solution, whereas the Parafac and Parafac2 models rely on iterative algorithms. Tuning the Parafac2 model (using k-fold CV) for the number of factors R (≈ 2.9–3 sec) and the degree of smoothness ν (≈ 4.5–5 sec) was the major computational burden of the Parafac2 analysis. However, in practice, the tuning and fitting can be accomplished rather efficiently, given that the k-fold CV can be implemented in parallel on a cluster.

4.1.2. Choosing the Number of Factors

The results of our k-fold CV-CORCONDIA approach (for selecting the number of factors) are plotted in the top row of Figure 4. As a reminder, we evaluated the CV-CORCONDIA for R ∈ {1, …, 4} factors with thresholds of >80, >85, and >90. Across all 24 cells of the simulation design, the dimensionality of R = 4 was never selected. As a result, Figure 4 only displays the selection results for R ∈ {1, 2, 3}. As expected, the results reveal that the number of selected factors tends to decrease as the CORCONDIA threshold increases. Furthermore, as expected, the selection results improve as the SNR increases. For all SNRs, the correct dimensionality (R = 2) is chosen most frequently. For the smaller SNRs, the threshold has a more noteworthy effect, such that over-selection of R is more common with a threshold of >80. However, for larger SNRs, the different thresholds provide rather consistent results, and select the correct dimensionality of R = 2 for a majority of simulation replications.

Figure 4: Simulation Tuning Results.

Top: The top row shows the results from our k-fold cross validation (CV) method to choose the number of components. Within each SNR, the three bars denote the three different CORCONDIA selection rules (>80, >85, >90), and the colors within each bar denote the percentage of times a given number of components (R = 1, 2, 3) was selected. Others: Rows 2–4 show the results from our k-fold CV method to choose the degree of smoothness. These plots display the tensor and parameter recovery using three different methods: (i) no smoothing, (ii) smoothing using the degrees of freedom ν estimated from our proposed approach, and (iii) smoothing using the optimal (i.e., data-generating) degrees of freedom ν.

4.1.3. Choosing the Degree of Smoothness

The results of our k-fold CV-MSE approach (for selecting the degree of smoothness) are plotted in the bottom three rows of Figure 4. As a reminder, we evaluated the CV-MSE for ν ∈ {10, 15, …, 75, 80} degrees of freedom with a threshold of δ = 0.2 standardized units for the CV-MSE change. We compared the performance metrics using our estimated ν to those obtained from the optimal (data-generating) ν and those obtained without smoothness constraints. The results depicted in Figure 4 clearly demonstrate the effectiveness of our proposed k-fold CV-MSE approach. In particular, note that the performance metrics for the estimated ν value are (i) very similar to those of the optimal ν value, and (ii) substantially better than those obtained without including the smoothness constraints—particularly for small SNRs. Thus, these results highlight the benefits of including smoothness constraints in component models, and illustrate the potential of data-driven methods for automatic tuning of the degree of smoothness.

4.2. EEG Application

4.2.1. Dimensionality and Smoothness

The k-fold CV results for tuning the number of factors R and the degree of smoothness ν for the real EEG data are plotted in Figure 5. As in the Simulation Study, we first tuned the Parafac2 model with respect to R using the k-fold CV-CORCONDIA proposed in Section 2.2.4. The results in Figure 5 (left) clearly reveal that R = 2 factors should be preferred: note that R = 2 would be selected using any of the thresholds explored in the Simulation Study. Fixing R = 2, we tuned the model with respect to ν using the k-fold CV-MSE proposed in Section 2.2.4. The results in Figure 5 (right) illustrate that the CV-MSE standardized differences seem to stabilize for ν ≥ 30, which is the value of ν selected by our heuristic. Consequently, throughout the remainder of the example, the results are explored using R = 2 factors and ν = 30 degrees of freedom.

Figure 5: Real Data Tuning Results.

The cross-validation values of the CORCONDIA (left) and the MSE standardized difference (right) for the real EEG data application. The results suggest that R = 2 factors and ν = 30 degrees of freedom are reasonable.

4.2.2. Model Fit Comparison

The Parafac model explained about 30% of the variation in the data, which was substantially smaller than the R-squared values for the other methods. This suggests that the Parafac model is not flexible enough to capture the heterogeneity that exists in the component time courses. The Parafac2, SCA-P, and GICA models all displayed similar R-squared values (0.633 for Parafac2 vs. 0.641 for others), which were more than double the R-squared value for the Parafac solution. The small difference in R-squared between the Parafac2 solution and the SCA-P/GICA solutions suggests that the Parafac2 model assumptions are reasonable for these data. In other words, constraining the time courses to have a common crossproduct matrix did not noticeably reduce the model’s ability to fit the observed data. But the constraints have practical implications for interpreting the underlying components, as will be discussed in the next subsection.

4.2.3. Interpreting the Components

The meaning of the components is typically inferred from the patterns of the estimated spatial maps and/or time courses. When applying the Parafac2 or SCA model, each subject has a unique time course matrix, so the interpretation of the components is typically inferred from the spatial maps. In Figure 6 we plot the estimated spatial maps for the Parafac2 solution, the Varimax rotated SCA-P solution, the unrotated SCA-P solution, and the GICA solution. Note that the Parafac2 and Varimax rotated solutions produce a similar interpretation: Factor 1 shows voltage differences between the temporal-parietal versus frontal-central electrodes, whereas Factor 2 shows voltage differences between the central-parietal versus frontal electrodes. Furthermore, note that the unrotated SCA-P and GICA solutions produce a similar interpretation as one another, but this interpretation differs from the previous one: Factor 1 is a parietal versus frontal component, and Factor 2 is a central versus peripheral component. Which interpretation should we choose? This conundrum demonstrates the practical consequences of the rotational indeterminacy problem—as well as the power of Parafac2’s intrinsic axis property (i.e., lack of a rotational indeterminacy).

Figure 6: Real Data Spatial Maps.

The estimated spatial maps for factors 1 and 2 (rows) obtained via four different methods (columns). All four solutions have similar R-squared values (0.633 for Parafac2 vs. 0.641 for others), which suggests that the Parafac2 model is reasonable. Figures were created using the eegkit package (Helwig, 2018a) in R (R Core Team, 2019).

Without considering the Parafac2 solution, one might (incorrectly) conclude that the second interpretation should be preferred, e.g., because two of the three SCA-P rotations resulted in this solution. However, in light of the Parafac2 solution, it is clear that the Varimax rotated SCA-P solution (or the Parafac2 solution) should be preferred for interpretational purposes. This is because the Parafac2 model fits nearly as well as the SCA-P model, and produces a unique interpretation of the underlying components—due to Parafac2’s intrinsic axis property. Based on the voltage difference patterns in the Parafac2 solution, we refer to Factors 1 and 2 as the Ventral and Dorsal factors, respectively. Historically, the ventral and dorsal streams have been distinguished as the “what” (i.e., object processing) versus the “where” (i.e., spatial processing) parts of the visual system (see Schneider, 1969; Mishkin et al., 1983). More recently, the distinction between the ventral and dorsal streams is often described as the “what” (i.e., vision for perception) versus the “how” (i.e., vision for action) parts of the visual system (see Goodale and Milner, 1992; Milner and Goodale, 2008).

4.2.4. Alcoholic versus Control Differences

The R-squared values previously reported were calculated ignoring group membership. Conditioning on the group, we find that the model fits the control subjects’ data (67.0% variance accounted for, VAF) slightly better than the alcoholic subjects’ data (59.6% VAF). Interestingly, if we calculate the factorwise R-squared separately for each group, we find some noteworthy differences. For the alcoholic subjects, the two factors have nearly equal influence such that the Dorsal Factor (30.5% VAF) is slightly more influential than the Ventral Factor (29.1% VAF). In contrast, for the control subjects we find that the Ventral Factor (39.3% VAF) is more influential than the Dorsal Factor (27.7% VAF). It is interesting that the alcoholic subjects have less coherence in the Ventral Factor (as evidenced by the reduced R-squared), which seems to imply an abnormality in the “what” processing of the information.

Examining the features of the ERP waveforms in Figure 7, we observe some interesting differences in the timing and forms of the expected ERP trajectories for the alcoholic and control subjects. For the Ventral Factor, both groups have an obvious P1 feature that peaks about 105 ms after the stimulus; however, for the Dorsal Factor the P1 feature peaks about 19 ms later for the alcoholic subjects (109 ms) compared to the control subjects (90 ms). Note that the Dorsal P1 peak occurs slightly after the Ventral P1 peak for the alcoholic subjects, whereas the opposite is true for the controls. The P1 feature of the ERP is typically thought to reflect the “cost of attention” (Luck et al., 1994), so the delay of the Dorsal P1 peak in the alcoholic subjects may reflect a deficit in early spatial (or action) processing related to the cost of shifting one’s attention to a new stimulus.

Figure 7: Real Data Parafac2 Solution.

The top row is identical to the top row in Figure 2a. The middle and bottom rows show the estimated parameters: the average time courses (i.e., AkCk) for each group, as well as the Mode B (spatial) weights common to both groups. Figures were created using the eegkit package (Helwig, 2018a) in R (R Core Team, 2019).

The most prominent peak in each factor is the N1 feature, which is noticeably delayed in the alcoholic subjects. For the control subjects, the Dorsal N1 peak (172 ms) and the Ventral N1 peak (176 ms) occur slightly before the corresponding N1 peaks in the alcoholic subjects (Dorsal and Ventral: 180 ms). It is interesting to note that the alcoholics have a noteworthy delay in the N1 peak of both factors, even though the P1 peak was only delayed for the Dorsal Factor. Furthermore, the magnitude of the N1 peak in the alcoholic subjects is reduced for both factors, providing further evidence that the alcoholics have an information processing abnormality early in the visual processing stream. Note that the amplitude of the N1 peak has been found to be moderated by attention (e.g., see Haider et al., 1964; Eason et al., 1969; Luck et al., 2000; Wascher et al., 2009). Thus, these differences provide further evidence that the visual processing abnormalities in alcoholics may relate to attentional deficits/costs early in the visual processing stream.

For both factors, the most noteworthy difference between the ERPs of the alcoholics and controls occurs about 200–450 ms after the stimulus. During this time window, the alcoholics display an overall reduction in the magnitude of the ERP, as well as a difference in the shape of the ERP waveform. For the Ventral Factor, the alcoholic ERP seems to lack the typical N2 and P3 features that are evident in the control ERP. Note that the reduction in amplitude of the P3 component of alcoholic subjects has been well-documented in the literature (e.g., see Porjesz et al., 1980, 1987; Porjesz and Begleiter, 1990a,b). Intriguingly, the Dorsal Factor of the alcoholic subjects seems to have prolonged N2 and P3 components, which peak substantially later than in the control subjects (≈ 80–180 ms delay). Note that delays in the N2 and P3 features have been noted in previous visual stimulus ERP studies of alcoholics (see Emmerson et al., 1987; Fein and Chang, 2006; Fein and Andrew, 2011). This suggests that the “missing” N2 and P3 in the Ventral component may be related to a delayed/prolonged Dorsal stream processing of the visual stimulus.

5. Discussion

5.1. Synthesis of Findings

The simulation results clearly demonstrate the potential of cross-validated tensor models, particularly Parafac2, for component analysis of multi-subject data. Note that the Parafac2 model performed well (with respect to both tensor and parameter recovery) across a variety of SNRs and heterogeneity levels. In contrast, the Parafac model only performed well when the data contained no heterogeneity in the component time courses. The SCA-P and GICA models were able to well-recover the tensor but not necessarily the parameters—due to the rotational indeterminacy. The SCA-P model with Varimax rotation worked well in this case, but cannot be expected to perform similarly well for other configurations of the spatial maps (i.e., Varimax has no guarantees to find the “true” components). The important point to take away is that the Parafac2 model was able to well-recover the Mode B (space) weights even when the data were generated from the SCA-P model. This suggests that the Parafac2 model can be useful for finding an ideal orientation (i.e., rotation) of the SCA-P solution—even when the Parafac2 model is not the correct model for the data.

The real data example offered a novel look at a classic dataset by comparing different multi-subject component analysis models (Parafac, Parafac2, SCA-P, GICA). The real data results reveal that the Parafac2 model seems appropriate for the data, given that (i) the Parafac model—which is less flexible—explained substantially less of the data variation, and (ii) the SCA-P model—which is more flexible—explained a similar amount of the data variation. Furthermore, the real data results reveal the practical significance of Parafac2’s uniqueness property. In particular, the SCA-P solution with R = 2 factors produced a nearly identical fit as the Parafac2 solution, but the SCA-P solution could produce several possible interpretations of the underlying brain networks (see Figure 6). In contrast, the Parafac2 solution produced a unique result, where the R = 2 factors could be interpreted as the classic Ventral and Dorsal components of the visual processing stream.

The rotational indeterminacy of the SCA-P model poses a serious dilemma, given that different rotations—including different ICA algorithms—can produce different interpretations of the results. Although ICA inspired rotations are often considered in neuroimaging research, many other possible factor analytic rotations could be considered (see Browne, 2001; Bernaards and Jennrich, 2005). For any particular application, it is unclear which rotation should be preferred. This implies that two researchers (using two different rotations) could arrive at different interpretations of the underlying brain components, which is unsatisfactory. In contrast, we have demonstrated that the Parafac2 model can produce unique solutions that are useful for interpreting underlying brain components. Thus, when the Parafac2 and SCA-P models produce a similar fit to the data, the Parafac2 model should be preferred due to its intrinsic axis property.

5.2. Implications for Task-Based fMRI Studies

Although we used EEG data in our application, our results have important implications for component analysis of other types of functional neuroimaging data, such as fMRI data. In past applications, the TICA model has been the preferred approach for component analysis of multi-subject (task-based) fMRI data (Beckmann and Smith, 2005). Previous studies have noted that the TICA model can produce biased estimates of spatial maps when the true components are spatially dependent, and have recommended the Parafac model for component analysis of multi-subject task-based fMRI data (Stegeman, 2007; Helwig and Hong, 2013). However, these past studies failed to consider how un-modeled heterogeneity in the component time courses can negatively affect estimates of spatial maps. Our results clearly reveal that ignoring subject-specific heterogeneity in the component time courses can produce biased estimates of the latent time courses and spatial maps.

This implies that applying the Parafac (or TICA) model to data with temporal heterogeneity can be expected to produce misleading results. In particular, one should expect to over-factor the solution because “extra” factors will be needed to capture the un-modeled temporal heterogeneity. Substantial evidence suggests that there exist non-ignorable individual differences in haemodynamic response functions (e.g., see Handwerker et al., 2004). Consequently, the Parafac (or TICA) model may be too rigid for the analysis of typical task-based fMRI data, even when the task has the exact same stimulus function for each subject. To accommodate temporal heterogeneity in task-based fMRI studies, one might consider using the GICA model instead of the TICA model. However, our results suggest that the GICA model may produce misleading interpretations of the results, due to the rotational indeterminacy, and that the Parafac2 model has the potential to produce more valid results.

To illustrate this point, we used the fmri package (Tabelow and Polzehl, 2011) in R to simulate hypothetical fMRI data from a visual stimulus task. Like the EEG data example, the simulated fMRI data are composed of R = 2 factors that are spatially dependent (see Figure 8a). To introduce heterogeneity into the data, we randomly jittered the HRF parameters that define the expected BOLD signal for each factor (see Figure 8b), which produces individual differences that do not conform to the Parafac2 model. We then analyzed the resulting noise-free tensor to determine how well the various methods can find the data generating components. The entire data generating process was repeated 100 times to examine the sensitivity of the solution for each method. Note that we did not include the smoothness constraints for the models, as there is no need to smooth the noiseless data.
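
As an indication of what such jittering might look like (a base R sketch using a common double-gamma HRF parameterization, not necessarily the one implemented in the fmri package):

```r
# double-gamma HRF with SPM-style default shape parameters (illustrative only)
hrf <- function(t, p1 = 6, p2 = 16, ratio = 1/6) {
  dgamma(t, shape = p1) - ratio * dgamma(t, shape = p2)
}
set.seed(1)
tsec <- seq(0, 30, by = 0.5)                                 # seconds after stimulus onset
h_group <- hrf(tsec)                                         # group-level expected BOLD response
h_subj  <- replicate(5, hrf(tsec, p1 = 6 + rnorm(1, sd = 0.5),
                                  p2 = 16 + rnorm(1, sd = 1)))  # jittered peak/undershoot timing
```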

Figure 8: Simulated fMRI Example.

(a) Data generating time courses and voxel maps. (b) Example of simulated heterogeneity in latent time courses. (c) Recovery of data (R-squared) and tensor (RMSE) across 100 replications. (d) Recovery of parameters across 100 replications. (e) Examples of estimated voxel maps.

The results reveal that the SCA-P and GICA models fit the data perfectly (as expected), such that they produce R-squared values of one and RMSE values of zero (see Figure 8c). Compared to the SCA-P and GICA models, the Parafac2 model shows a slight reduction in fit, whereas the Parafac model shows a more noteworthy reduction in fit (see Figure 8c). Note that to obtain a similar fit as the other methods, the Parafac model would require extracting “extra” factors, which would result in a misinterpretation of the data. Focusing on the parameter recovery (see Figure 8d), we see that the Parafac model struggles to correctly recover the time courses and voxel maps, which demonstrates the interpretational consequences of ignoring the temporal heterogeneity. The Parafac2 model performs best at recovering the voxel maps, with the Varimax rotated SCA-P model also performing reasonably well.

The unrotated SCA-P model (i.e., the PCA solution) produces a systematic bias that results in a substantial misinterpretation of the results: the first factor has bipolar (positive and negative) activations, and the second factor is a blend of the two data-generating factors (see Figure 8e). The GICA solution is rather unstable, producing results that range between the unrotated and Varimax-rotated SCA-P solutions. In approximately two-thirds of the replications, the GICA approach produces a solution similar to the Varimax-rotated SCA-P model, whereas the remaining replications produce a solution closer to the unrotated SCA-P model (see Figure 8e). This highlights the dangers of relying solely on the GICA rotation of the SCA-P solution, which can produce misleading results when the factors are spatially dependent. In particular, the bipolar spatial maps produced by the unrotated SCA-P and GICA models (see Figure 8e) might be misclassified as “noise components” (Griffanti et al., 2017) or misinterpreted as genuine components.
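The model hierarchy compared above can be reproduced, in outline, with the multiway and ica packages cited in this paper. The sketch below is a rough illustration rather than the authors' analysis code: the toy data, the rank R = 2, and the stacked-SVD shortcut used to obtain the unrotated SCA-P voxel maps (SCA-P is equivalent to a PCA of the subject-wise stacked data) are all assumptions made for the example.

```r
## Minimal sketch of the model hierarchy: Parafac, Parafac2, SCA-P, and
## rotated versions of the SCA-P voxel maps (Varimax and an ICA rotation).
library(multiway)   # parafac(), parafac2(), sca()
library(ica)        # icafast()

## toy multi-subject data: K subjects, time x voxel matrices (noise only)
set.seed(2)
K <- 8; ntime <- 100; nvox <- 50; R <- 2
Xlist <- lapply(1:K, function(k) matrix(rnorm(ntime * nvox), ntime, nvox))
Xarr  <- array(unlist(Xlist), dim = c(ntime, nvox, K))  # time x voxel x subject

## (1) Parafac: common time courses and voxel maps for all subjects
fit_pf  <- parafac(Xarr, nfac = R, nstart = 3)

## (2) Parafac2: subject-specific time courses with a common cross-product structure
fit_pf2 <- parafac2(Xlist, nfac = R, nstart = 3)

## (3) SCA-P: unconstrained subject-specific time courses (also available via sca())
fit_scp <- sca(Xlist, nfac = R, type = "sca-p")
Xstack  <- do.call(rbind, Xlist)                    # stacked data (all time points x voxels)
sv      <- svd(scale(Xstack, scale = FALSE), nu = R, nv = R)
Bhat    <- sv$v %*% diag(sv$d[1:R])                 # unrotated SCA-P voxel maps

## (4) Rotations of the SCA-P voxel maps: Varimax and a "GICA-style" spatial ICA
B_varimax <- varimax(Bhat)$loadings                 # Varimax-rotated maps
B_gica    <- icafast(Bhat, nc = R)$S                # spatially independent maps
```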

5.3. Applications to Functional Connectivity

Our results also have important implications for the application of component analysis models to resting-state fMRI data. In resting-state fMRI studies, component models are often used to provide lower-dimensional representations of the data, which can be used to understand functional connectivity patterns. In a recent study, Madsen et al. (2017) demonstrated that the GICA model and the Parafac2 model outperformed other component analysis strategies in terms of predicting between-voxel covariances for new data. Interestingly, these authors noted that the GICA model often produces more interpretable components, whereas the Parafac2 model produces more stable results. The better interpretability of the GICA solution seems to reflect the fact that the GICA model produces sparser components, which are more comparable to past ICA-based results. However, the authors note that both the GICA and Parafac2 models “identify several components that are restricted within certain brain regions, allowing easier interpretation” (Madsen et al., 2017, p. 895).

When interpreting the results of Madsen et al. (2017), it is important to note that their study focuses on how well the various component models predict between-voxel covariances for new data. These authors did not address how well the component models recover the true latent brain networks that generate the observed functional connectivity patterns between brain regions. The fact that the GICA and Parafac2 models both performed well with respect to predicting voxelwise covariances suggests that the Parafac2 model structure may be reasonable to assume for resting-state fMRI. However, this result does not imply that the two methods perform similarly in terms of recovering the true voxel maps. This idea is analogous to our finding that the two methods can produce similar fit statistics (i.e., R-squared and RMSE) while producing rather different estimates of the model parameters (e.g., time courses and voxel maps). Note that this point is also made by Madsen et al., who discuss the interpretational differences between the GICA and Parafac2 results.

In practice, the crucial question is whether the Parafac2 model or the GICA model (or some other rotation of the SCA-P solution) is most appropriate for the given data. Due to the longstanding tradition of using ICA-based methods in neuroimaging, the GICA approach will likely produce components that look more familiar. However, this is an indication of the stability (or reliability) of the ICA algorithm as applied to neuroimaging data, and not necessarily an indication of the validity of the resulting components. To determine which approach provides a more faithful view of the functional organization of the brain, it is first necessary to define what it means for a component to be “valid” in the first place. One reasonable possibility is to validate components via their ability to explain meaningful individual differences in behavioral constructs (Smith et al., 2015). Consequently, future functional connectivity studies should consider comparing the Parafac2 and GICA (and other SCA) models to determine which approach yields components with better predictive validity for external constructs of interest.
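One simple way to operationalize such a comparison is to regress an external behavioral measure on the subject (Mode C) weights and evaluate the prediction out of sample. The sketch below is a minimal illustration of that idea, not a procedure used in this paper; the inputs Cmat (a hypothetical subjects-by-R weight matrix) and behavior (a hypothetical behavioral score) are placeholders.

```r
## Minimal sketch: k-fold cross-validated prediction of an external behavioral
## score from subject component weights; larger out-of-sample R-squared would
## indicate better predictive validity for that construct.
predictive_r2 <- function(Cmat, behavior, nfolds = 5) {
  K    <- length(behavior)
  fold <- sample(rep(seq_len(nfolds), length.out = K))  # random fold labels
  pred <- numeric(K)
  dat  <- data.frame(y = behavior, Cmat)
  for (f in seq_len(nfolds)) {
    train <- fold != f
    fit   <- lm(y ~ ., data = dat[train, ])             # train on K - K/nfolds subjects
    pred[!train] <- predict(fit, newdata = dat[!train, ])
  }
  cor(pred, behavior)^2                                  # out-of-sample R-squared
}

## e.g., compare weights from a Parafac2 fit versus a GICA-style solution:
## predictive_r2(C_parafac2, behavior); predictive_r2(C_gica, behavior)
```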

5.4. Extensions and Future Directions

In this paper, we applied smoothness constraints to a single mode of the data (i.e., the time mode). However, in some applications it may be desirable to smooth more than one mode of the data (e.g., see Leibovici, 2010; Choi et al., 2018). For example, in the application discussed in this paper, we could have assumed that the spatial maps were smooth functions of the 3D electrode coordinates. Such an extension could be quite useful in situations with a large number of spatial locations (e.g., high-density EEG or fMRI). Note that if the smooth functions can be represented with a relatively small ν, the smoothness constraints can greatly improve the computational efficiency of the model fitting.
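To make the spatial extension concrete, the sketch below builds a tensor-product cubic B-spline basis over two spatial coordinates, a simplified stand-in for 3D electrode or voxel coordinates; the grid size and basis dimensions are illustrative assumptions, not values used in the paper.

```r
## Minimal sketch: a spatial map defined over many locations is constrained to
## be a smooth function of the coordinates, so it is described by only a few
## basis coefficients (here 36 coefficients for 400 locations).
library(splines)

coords <- expand.grid(x = seq(0, 1, length.out = 20),
                      y = seq(0, 1, length.out = 20))     # 400 "voxels" on a grid
Fx <- bs(coords$x, df = 6, intercept = TRUE)              # marginal basis in x
Fy <- bs(coords$y, df = 6, intercept = TRUE)              # marginal basis in y

## row-wise tensor product: one row per location, 6 * 6 = 36 basis functions
Fxy <- t(sapply(seq_len(nrow(coords)), function(i) kronecker(Fx[i, ], Fy[i, ])))

## a smooth spatial map b(x, y) = Fxy %*% beta for some coefficient vector beta
beta <- rnorm(ncol(Fxy))
bmap <- Fxy %*% beta
```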

As another possibility, we could consider a single-subject application where the K levels of Mode C correspond to K trials from the same subject (instead of data from K different subjects). In this case, it may be reasonable to assume that the Mode C (trial) weights are unknown smooth functions of the (sequential) trial index. For example, in a learning experiment it may be interesting to model the evolution of the subject’s spatiotemporal response across the trials of the experimental task. The smoothness constraints that we applied to the time courses could be easily extended to these scenarios, or other higher-way scenarios that involve smoothing multiple data modes.
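As a minimal illustration of this idea, the sketch below fits a smooth trend to one component's Mode C weights as a function of trial number; the toy "learning curve" and the basis dimension are purely illustrative.

```r
## Minimal sketch: Mode C (trial) weights modeled as a smooth function of the
## sequential trial index, e.g., to describe learning across a task.
library(splines)

ntrial   <- 60
trial    <- seq_len(ntrial)
cweights <- 1 - exp(-trial / 15) + rnorm(ntrial, sd = 0.1)  # toy learning curve

fit   <- lm(cweights ~ bs(trial, df = 5))  # low-dimensional cubic B-spline trend
trend <- fitted(fit)
plot(trial, cweights, xlab = "trial", ylab = "Mode C weight")
lines(trial, trend, lwd = 2)
```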

In the current study, as well as in the extensions mentioned above, the fundamental difficulty is tuning the model, i.e., choosing the appropriate degrees of freedom for the latent functions and choosing the model dimensionality. Our results reveal that data-driven metrics, such as k-fold cross-validation of the CORCONDIA and MSE, can provide useful heuristics for model selection. However, real-world model selection often requires careful comparison of competing models, with thought given to balancing each model's fit, complexity, and interpretational value. Future work is needed to develop and explore robust model selection methods for tensor decompositions.
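The sketch below shows one simple cross-validation scheme of this flavor: leave-one-subject-out reconstruction error for an SCA-P-type fit as a function of the number of factors. It is not the exact procedure used in this paper (which cross-validated the CORCONDIA and MSE), and the input Xlist is a hypothetical list of time-by-voxel matrices.

```r
## Minimal sketch: leave-one-subject-out cross-validation for choosing the
## number of factors R. Common voxel maps are estimated from the training
## subjects (via SVD of the stacked data), then scored by how well they
## reconstruct the held-out subject's data.
cv_mse <- function(Xlist, R) {
  K   <- length(Xlist)
  mse <- numeric(K)
  for (k in seq_len(K)) {
    Xtrain <- do.call(rbind, Xlist[-k])           # stack training subjects
    Bhat   <- svd(Xtrain, nu = 0, nv = R)$v       # common voxel maps (voxels x R)
    Xtest  <- Xlist[[k]]
    Ahat   <- Xtest %*% Bhat                      # least-squares time courses for held-out subject
    mse[k] <- mean((Xtest - Ahat %*% t(Bhat))^2)  # held-out reconstruction error
  }
  mean(mse)
}

## e.g., compare candidate ranks: sapply(1:6, function(R) cv_mse(Xlist, R))
```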

6. Conclusions

This paper discusses and compares various multi-subject component analysis models for analyzing task-based (or event-related) neuroimaging data. We recommend comparing a hierarchy of simultaneous component analysis models to determine a reasonable model for the data. Furthermore, we recommend using cross-validation to tune the number of factors and the degree of smoothness of the latent time courses. Our Monte Carlo simulation results highlight the benefits of the Parafac2 model, which outperformed the competing methods even though the data were generated from the more general SCA-P model. Also, our real data results demonstrate that the Parafac2 model can produce interpretable solutions that reveal informative individual and group differences in latent brain networks.

Supplementary Material


Acknowledgements

NEH was supported by NIH grants 1U01MH108150-01A1 and 1R01MH112583-01A1 and a Single Semester Leave award from the College of Liberal Arts at the University of Minnesota. MAS was supported by a fellowship from the Graduate Research Partnership Program in the Department of Psychology at the University of Minnesota.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest

The authors declare no conflicts of interest.

1

The assumption that the true time courses are “smooth functions” is reasonable for applications of component models to spatiotemporal neuroimaging data. This is because, as will be described in the following paragraph, the unconstrained solution can be considered a special (“rough”) case of this assumption.

2

The notation f_v(i) denotes the v-th basis function evaluated at the i-th time point. In this paper, we use a cubic B-spline basis (de Boor, 2001), but other splines are possible (see Gu, 2013; Wahba, 1990).

3

The μr and Σr parameters were created based on the real data results to ensure that the mean ERPs (see Figure 2a) and individual differences in the ERPs (see Figure 2b) resembled those of the real data.

4

The σ = 1 condition corresponds to the level of individual differences present in the real-data example.

5

The non-negativity constraint ensures that the Mode C weights have a comparable interpretation in the Parafac, Parafac2, and SCA-P models, and has the added benefit of resolving the special sign indeterminacy of the Parafac2 model (Helwig, 2013).

6

As in the real data example, the data generating components have a similar influence on the solution (in terms of the explained variation), which is why the Varimax rotation performs well.

References

1. Akaike H, 1987. Factor analysis and AIC. Psychometrika 52, 317–332. doi: 10.1007/BF02294359.
2. Alsberg B, Kvalheim O, 1993. Compression of nth-order data arrays by B-splines. Part 1: Theory. Journal of Chemometrics 7, 61–73.
3. Andersen AH, Rayens WS, 2004. Structure-seeking multilinear methods for the analysis of fMRI data. NeuroImage 22, 728–739. doi: 10.1016/j.neuroimage.2004.02.026.
4. Beckmann CF, DeLuca M, Devlin J, Smith S, 2005. Investigations into resting-state connectivity using independent component analysis. Philosophical Transactions of the Royal Society B: Biological Sciences 360, 1001–1013.
5. Beckmann CF, Mackay C, Filippini N, Smith SM, 2009. Group comparison of resting-state fMRI data using multi-subject ICA and dual regression. NeuroImage 47, S148.
6. Beckmann CF, Smith SM, 2005. Tensorial extensions of independent component analysis for multisubject FMRI analysis. NeuroImage 25, 294–311.
7. Bell AJ, Sejnowski TJ, 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7, 1129–1159.
8. Bernaards CA, Jennrich RI, 2005. Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis. Educational and Psychological Measurement 65, 676–696.
9. Bro R, De Jong S, 1997. A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics 11, 393–401.
10. Bro R, Kiers HAL, 2003. A new efficient method for determining the number of components in PARAFAC models. Journal of Chemometrics 17, 274–286.
11. Browne MW, 2001. An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research 36, 111–150.
12. Calhoun V, Adali T, Pearlson G, Pekar J, 2001. Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Human Brain Mapping 13, 43–53.
13. Calhoun V, Liu J, Adali T, 2009. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. NeuroImage 45, S163–S172.
14. Carroll JD, Chang JJ, 1970. Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition. Psychometrika 35, 283–319.
15. Cattell RB, 1944. “Parallel proportional profiles” and other principles for determining the choice of factors by rotation. Psychometrika 9, 267–283.
16. Cattell RB, 1966. The scree test for the number of factors. Multivariate Behavioral Research 1, 245–276. doi: 10.1207/s15327906mbr0102_10.
17. Choi JY, Hwang H, Timmerman ME, 2018. Functional parallel factor analysis for functions of one- and two-dimensional arguments. Psychometrika 83, 1–20. doi: 10.1007/s11336-017-9558-9.
18. Cichocki A, 2013. Tensor decompositions: A new concept in brain data analysis? CoRR abs/1305.0395. URL: http://arxiv.org/abs/1305.0395.
19. Comon P, 1994. Independent component analysis, a new concept? Signal Processing 36, 287–314.
20. Cong F, Lin QH, Kuang LD, Gong XF, Astikainen P, Ristaniemi T, 2015. Tensor decomposition of EEG signals: A brief review. Journal of Neuroscience Methods 248, 59–69. doi: 10.1016/j.jneumeth.2015.03.018.
21. de Boor C, 2001. A Practical Guide to Splines. Revised ed., Springer-Verlag, New York.
22. De Vos M, Nion D, Van Huffel S, De Lathauwer L, 2012. A combination of parallel factor and independent component analysis. Signal Processing 92, 2990–2999.
23. Dua D, Graff C, 2019. UCI Machine Learning Repository. URL: http://archive.ics.uci.edu/ml.
24. Eason RG, Harter MR, White CT, 1969. Effects of attention and arousal on visually evoked cortical potentials and reaction time in man. Physiology & Behavior 4, 283–289. doi: 10.1016/0031-9384(69)90176-0.
25. Eckart C, Young G, 1936. The approximation of one matrix by another of lower rank. Psychometrika 1, 211–218.
26. Emmerson RY, Dustman RE, Shearer DE, Chamberlin HM, 1987. EEG, visually evoked and event related potentials in young abstinent alcoholics. Alcohol 4, 241–248. doi: 10.1016/0741-8329(87)90018-8.
27. Faber NKM, Bro R, Hopke PK, 2003. Recent developments in CANDECOMP/PARAFAC algorithms: a critical review. Chemometrics and Intelligent Laboratory Systems 65, 119–137.
28. Fein G, Andrew C, 2011. Event-related potentials during visual target detection in treatment-naïve active alcoholics. Alcoholism: Clinical and Experimental Research 35, 1171–1179. doi: 10.1111/j.1530-0277.2011.01450.x.
29. Fein G, Chang M, 2006. Visual P300s in long-term abstinent chronic alcoholics. Alcoholism: Clinical and Experimental Research 30, 2000–2007. doi: 10.1111/j.1530-0277.2006.00246.x.
30. Ferdowsi S, Abolghasemi V, Sanei S, 2015. A new informed tensor factorization approach to EEG-fMRI fusion. Journal of Neuroscience Methods 254, 27–35. doi: 10.1016/j.jneumeth.2015.07.018.
31. Goodale MA, Milner AD, 1992. Separate visual pathways for perception and action. Trends in Neurosciences 15, 20–25.
32. Griffanti L, Douaud G, Bijsterbosch J, Evangelisti S, Alfaro-Almagro F, Glasser MF, Duff EP, Fitzgibbon S, Westphal R, Carone D, Beckmann CF, Smith SM, 2017. Hand classification of fMRI ICA noise components. NeuroImage 154, 188–205. doi: 10.1016/j.neuroimage.2016.12.036.
33. Gu C, 2013. Smoothing Spline ANOVA Models. Second ed., Springer-Verlag, New York. doi: 10.1007/978-1-4614-5369-7.
34. Haider M, Spong P, Lindsley DB, 1964. Attention, vigilance, and cortical evoked-potentials in humans. Science 145, 180–182. doi: 10.1126/science.145.3628.180.
35. Handwerker DA, Ollinger JM, D’Esposito M, 2004. Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. NeuroImage 21, 1639–1651. doi: 10.1016/j.neuroimage.2003.11.029.
36. Harshman RA, 1970. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis. UCLA Working Papers in Phonetics 16, 1–84.
37. Harshman RA, 1972. PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics 22, 30–44.
38. Harshman RA, Lundy ME, 1994. PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis 18, 39–72.
39. Harshman RA, Lundy ME, 1996. Uniqueness proof for a family of models sharing features of Tucker’s three-mode factor analysis and PARAFAC/CANDECOMP. Psychometrika 61, 133–154.
40. Helwig NE, 2013. The special sign indeterminacy of the direct-fitting Parafac2 model: Some implications, cautions, and recommendations for Simultaneous Component Analysis. Psychometrika 78, 725–739.
41. Helwig NE, 2017. Estimating latent trends in multivariate longitudinal data via Parafac2 with functional and structural constraints. Biometrical Journal 15, 783–803. doi: 10.1002/bimj.201600045.
42. Helwig NE, 2018a. eegkit: Toolkit for Electroencephalography Data. R package version 1.0-4. URL: http://CRAN.R-project.org/package=eegkit.
43. Helwig NE, 2018b. ica: Independent Component Analysis. R package version 1.0-2. URL: http://CRAN.R-project.org/package=ica.
44. Helwig NE, 2019. multiway: Component Models for Multi-Way Data. R package version 1.0-6. URL: http://CRAN.R-project.org/package=multiway.
45. Helwig NE, Hong S, 2013. A critique of Tensor Probabilistic Independent Component Analysis: Implications and recommendations for multi-subject fMRI data analysis. Journal of Neuroscience Methods 213, 263–273.
46. Hitchcock FL, 1927. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6, 164–189. doi: 10.1002/sapm192761164.
47. Hong S, Mitchell SK, Harshman R, 2006. Bootstrap scree tests: A Monte Carlo simulation and applications to published data. British Journal of Mathematical and Statistical Psychology 59, 35–57.
48. Horn JL, 1965. A rationale and test for the number of factors in factor analysis. Psychometrika 30, 179–185. doi: 10.1007/BF02289447.
49. Hotelling H, 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 417–441, 498–520.
50. Hyvärinen A, 1999. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 10, 626–634.
51. Hyvärinen A, Karhunen J, Oja E, 2001. Independent Component Analysis. John Wiley & Sons, Inc., New York.
52. Ingber L, 1997. Statistical mechanics of neocortical interactions: Canonical momenta indicators of electroencephalography. Physical Review E 55, 4578–4593.
53. Ingber L, 1998. Statistical mechanics of neocortical interactions: Training and testing canonical momenta indicators of EEG. Mathematical Computer Modelling 27, 33–64.
54. Kaiser HF, 1958. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200.
55. Kaiser HF, 1960. The application of electronic computers to factor analysis. Educational and Psychological Measurement 20, 141–151. doi: 10.1177/001316446002000116.
56. Kamstrup-Nielsen MH, Johnsen LG, Bro R, 2013. Core consistency diagnostic in PARAFAC2. Journal of Chemometrics 27, 99–105. doi: 10.1002/cem.2497.
57. Kiers HAL, Ten Berge JMF, Bro R, 1999. PARAFAC2-Part I. A direct fitting algorithm for the PARAFAC2 model. Journal of Chemometrics 13, 275–294.
58. Kolda TG, Bader BW, 2009. Tensor decompositions and applications. SIAM Review 51, 455–500.
59. Krijnen WP, 2006. Convergence of the sequence of parameters generated by alternating least squares algorithms. Computational Statistics & Data Analysis 51, 481–489. doi: 10.1016/j.csda.2005.09.003.
60. Kruskal JB, 1977. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and Its Applications 18, 95–138.
61. Larsen R, Warne RT, 2010. Estimating confidence intervals for eigenvalues in exploratory factor analysis. Behavior Research Methods 42, 871–876. doi: 10.3758/BRM.42.3.871.
62. Leibovici DG, 2010. Spatio-temporal multiway decompositions using principal tensor analysis on k-modes: The R package PTAk. Journal of Statistical Software 34, 1–34. URL: http://www.jstatsoft.org/v34/i10/.
63. Louwerse DJ, Smilde AK, Kiers HAL, 1999. Cross-validation of multiway component models. Journal of Chemometrics 13, 491–510.
64. Luck SJ, Hillyard SA, Mouloua M, Woldorff MG, Clark VP, Hawkins HL, 1994. Effects of spatial cuing on luminance detectability: Psychophysical and electrophysiological evidence for early selection. Journal of Experimental Psychology: Human Perception and Performance 20, 887–904. doi: 10.1037/0096-1523.20.4.887.
65. Luck SJ, Woodman GF, Vogel EK, 2000. Event-related potential studies of attention. Trends in Cognitive Sciences 4, 432–440. doi: 10.1016/S1364-6613(00)01545-X.
66. Madsen KH, Churchill NW, Mørup M, 2017. Quantifying functional connectivity in multi-subject fMRI data using component models. Human Brain Mapping 38, 882–899. doi: 10.1002/hbm.23425.
67. Makkiabadi B, Jarchi D, Sanei S, 2011. Blind separation and localization of correlated P300 subcomponents from single trial recordings using extended PARAFAC2 tensor model, in: 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 6955–6958. doi: 10.1109/IEMBS.2011.6091758.
68. Martínez-Montes E, Sánchez-Bornot JM, Valdés-Sosa PA, 2008. Penalized PARAFAC analysis of spontaneous EEG recordings. Statistica Sinica 18, 1449–1464.
69. McKeown MJ, Makeig S, Brown GG, Jung TP, Kindermann SS, Bell AJ, Sejnowski TJ, 1998. Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping 6, 160–188.
70. Milner AD, Goodale MA, 2008. Two visual systems re-viewed. Neuropsychologia 46, 774–785. doi: 10.1016/j.neuropsychologia.2007.10.005.
71. Mishkin M, Ungerleider LG, Macko KA, 1983. Object vision and spatial vision: two cortical pathways. Trends in Neurosciences 6, 414–417. doi: 10.1016/0166-2236(83)90190-X.
72. Miwakeichi F, Martínez-Montes E, Valdés-Sosa PA, Nishiyama N, Mizuhara H, Yamaguchi Y, 2004. Decomposing EEG data into space–time–frequency components using parallel factor analysis. NeuroImage 22, 1035–1045.
73. Mørup M, Hansen L, Herrmann C, Parnas J, Arnfred S, 2006. Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG. NeuroImage 29, 938–947.
74. Pearson K, 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2, 559–572.
75. Porjesz B, Begleiter H, 1990a. Event-related potentials for individuals at risk for alcoholism. Alcohol 7, 465–469.
76. Porjesz B, Begleiter H, 1990b. Neuroelectric processes in individuals at risk for alcoholism. Alcohol & Alcoholism 25, 251–256.
77. Porjesz B, Begleiter H, Bihari B, Kissin B, 1987. The N2 component of the event-related brain potential in abstinent alcoholics. Electroencephalography and Clinical Neurophysiology 66, 121–131.
78. Porjesz B, Begleiter H, Garozzo R, 1980. Visual evoked potential correlates of information deficits in chronic alcoholics, in: Begleiter H (Ed.), Biological Effects of Alcohol. Plenum Press, pp. 603–623.
79. R Core Team, 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. R version 3.6.0. URL: http://www.R-project.org/.
80. Reis MM, Ferreira M, 2002. PARAFAC with splines: A case study. Journal of Chemometrics 18, 444–450.
81. Revelle W, Rocklin T, 1979. Very simple structure: An alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research 14, 403–414. doi: 10.1207/s15327906mbr1404_2.
82. Schneider GE, 1969. Two visual systems. Science 163, 895–902. doi: 10.1126/science.163.3870.895.
83. Sidiropoulos ND, Bro R, 2000. On the uniqueness of multilinear decomposition of n-way arrays. Journal of Chemometrics 14, 229–239.
84. Smith SM, Nichols TE, Vidaurre D, Winkler AM, Behrens TEJ, Glasser MF, Ugurbil K, Barch DM, Van Essen DC, Miller KL, 2015. A positive-negative mode of population covariation links brain connectivity, demographics and behavior. Nature Neuroscience 18, 1565–1567. doi: 10.1038/nn.4125.
85. Snodgrass JG, Vanderwart M, 1980. A standardized set of 260 pictures: Norms for the naming agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory 6, 174–215.
86. Stegeman A, 2007. Degeneracy in Candecomp/Parafac and Indscal explained for several three-sliced arrays with a two-valued typical rank. Psychometrika 72, 601–619.
87. Tabelow K, Polzehl J, 2011. Statistical parametric maps for functional MRI experiments in R: The package fmri. Journal of Statistical Software 44, 1–21. doi: 10.18637/jss.v044.i11.
88. Ten Berge JMF, Kiers HAL, 1996. Some uniqueness results for PARAFAC2. Psychometrika 61, 123–132.
89. Timmerman ME, Kiers HAL, 2002. Three-way component analysis with smoothness constraints. Computational Statistics & Data Analysis 40, 447–470.
90. Timmerman ME, Kiers HAL, 2003. Four simultaneous component analysis models for the analysis of multivariate time series from more than one subject to model intraindividual and interindividual differences. Psychometrika 68, 105–121.
91. Tipping ME, Bishop CM, 1999. Mixtures of probabilistic principal component analysers. Neural Computation 11, 443–482.
92. Tomasi G, Bro R, 2006. A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis 50, 1700–1734.
93. Tucker LR, 1951. A method for synthesis of factor analysis studies. Technical Report 984. Educational Testing Service, Princeton, NJ.
94. Tucker LR, 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279–311.
95. Wahba G, 1990. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia.
96. Warne RT, Larsen R, 2014. Evaluating a proposed modification of the Guttman rule for determining the number of factors in an exploratory factor analysis. Psychological Test and Assessment Modeling 56, 104–123.
97. Wascher E, Hoffmann S, Sänger J, Grosjean M, 2009. Visuo-spatial processing and the N1 component of the ERP. Psychophysiology 46, 1270–1277. doi: 10.1111/j.1469-8986.2009.00874.x.
98. Williams AH, Kim TH, Wang F, Vyas S, Ryu SI, Shenoy KV, Schnitzer M, Kolda TG, Ganguli S, 2018. Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis. Neuron 98, 1099–1115.e8. doi: 10.1016/j.neuron.2018.05.015.
99. Zhang XL, Begleiter H, Porjesz B, Wang W, Litke A, 1995. Event related potentials during object recognition tasks. Brain Research Bulletin 38, 531–538.
