Published in final edited form as: J Comput Graph Stat. 2020 Feb 7;29(3):675–680. doi: 10.1080/10618600.2019.1704296

A Matrix-free Likelihood Method for Exploratory Factor Analysis of High-dimensional Gaussian Data

Fan Dai 1, Somak Dutta 1, Ranjan Maitra 1

Abstract

This paper proposes a novel profile likelihood method for estimating the covariance parameters in exploratory factor analysis of high-dimensional Gaussian datasets with fewer observations than variables. An implicitly restarted Lanczos algorithm and a limited-memory quasi-Newton method are implemented in a matrix-free framework for likelihood maximization. Simulation results show that our method is substantially faster than the expectation-maximization solution without sacrificing accuracy. Our method is applied to fit factor models to data from suicide attempters, suicide ideators and a control group.

Keywords: EM algorithm, fMRI, Lanczos algorithm, L-BFGS-B, Profile likelihood

1. Introduction

Factor analysis (Anderson, 2003) is a multivariate statistical technique that characterizes dependence among variables using a small number of latent factors. Suppose that we have a sample $Y_1, Y_2, \ldots, Y_n$ from the $p$-variate Gaussian distribution $\mathcal{N}_p(\mu, \Sigma)$ with mean vector $\mu$ and covariance matrix $\Sigma$. We assume that $\Sigma = \Lambda\Lambda^\top + \Psi$, where $\Lambda$ is a $p \times q$ matrix of rank $q$ that describes the amount of variance shared among the $p$ coordinates and $\Psi$ is a diagonal matrix with positive diagonal entries representing the unique variance specific to each coordinate. Factor analysis of Gaussian data for $p < n$ was first formalized by Lawley (1940), with efficient maximum likelihood (ML) estimation methods proposed by Jöreskog (1967), Lawley and Maxwell (1962), Mardia et al. (2006), Anderson (2003) and others. These methods, however, do not apply to datasets with $p > n$ that occur in applications such as the processing of microarray data (Sundberg and Feldmann, 2016), sequencing data (Leek and Storey, 2007; Leek, 2014; Buettner et al., 2017), the analysis of transcription factor activity profiles of gene regulatory networks using massive gene expression datasets (Pournara and Wernisch, 2007), portfolio analysis of stock returns (Ng et al., 2014) and others (Trendafilov and Unkel, 2011). Necessary and sufficient conditions for the existence of the MLE when $p > n$ have been obtained by Robertson and Symons (2007). In such cases, the available computer memory may be inadequate to store the sample covariance matrix $S$ or to make the multiple copies of the dataset needed during the computation.

The expectation-maximization (EM) approach of Rubin and Thayer (1982) can be applied to datasets with $p > n$ but is computationally slow. So, here we develop a profile likelihood method for high-dimensional Gaussian data. Our method allows us to compute the gradient of the profile likelihood function at negligible additional computational cost and to check first-order optimality, guaranteeing high accuracy. We develop a fast, sophisticated computational framework called FAD (Factor Analysis of Data) to compute the ML estimates of $\Lambda$ and $\Psi$. Our framework is implemented in an R (R Core Team, 2019) package called fad.

The remainder of this paper is organized as follows. Section 2 describes the factor model for Gaussian data and an ML solution using the EM algorithm, and then proposes the profile likelihood and FAD. The performance of FAD relative to EM is evaluated in Section 3. Section 4 applies our methodology to a functional magnetic resonance imaging (fMRI) dataset related to suicidal behavior (Just et al., 2017). Section 5 concludes with some discussion. An online supplement, with sections, tables and figures referenced here with the prefix "S", is available.

2. Methodology

2.1. Background and Preliminaries

Let $Y$ be the $n \times p$ data matrix with $Y_i^\top$ as its $i$th row. Then, in the setup of Section 1, the ML method profiles out $\mu$ using the sample mean vector and then maximizes the log-likelihood

$$ l(\Lambda, \Psi) = -\frac{n}{2}\left\{ p\log(2\pi) + \log\det\Sigma + \operatorname{Tr}\,\Sigma^{-1}S \right\} \qquad (1) $$

where $\bar{Y} = Y^\top\mathbf{1}/n$ and $S = (Y - \mathbf{1}\bar{Y}^\top)^\top(Y - \mathbf{1}\bar{Y}^\top)/n$, with $\mathbf{1}$ the $n$-vector of 1s. The matrix $S$ is almost surely singular, with rank $n - 1$, when $p > n$. The factor model is not identifiable because the matrices $\Lambda$ and $\Lambda Q$ give rise to the same likelihood for any orthogonal matrix $Q$, so additional constraints (see Anderson, 2003; Mardia et al., 2006, for more details) are imposed.

2.1.1. EM Algorithms for parameter estimation

The EM algorithm (Rubin and Thayer, 1982; McLachlan and Krishnan, 1996) exploits the structure of the factor covariance matrix by assuming $q$-variate standard normal latent factors and writing the factor model as $Y_i = \mu + \Lambda Z_i + \epsilon_i$, where the $\epsilon_i$'s are i.i.d. $\mathcal{N}_p(0, \Psi)$ and the $Z_i$'s are independent of the $\epsilon_i$'s. The EM algorithm is easily developed, with analytical expressions for both the expectation (E) and maximization (M) steps that can be speedily computed (see Section S1.1).
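To make the E- and M-step expressions concrete, the following is a minimal R sketch of one EM iteration in its textbook form, written in terms of the sample covariance matrix of the centered data. It is for illustration only and is not the implementation of Section S1.1, which avoids forming and inverting $p \times p$ matrices when $p$ is large; the function name is ours.

```r
# One EM iteration for the factor model (cf. Rubin and Thayer, 1982), textbook form.
# S: p x p sample covariance; Lambda: p x q loadings; psi: length-p uniquenesses.
em_step <- function(S, Lambda, psi) {
  q <- ncol(Lambda)
  Sigma <- tcrossprod(Lambda) + diag(psi)             # Sigma = Lambda Lambda' + Psi
  B     <- crossprod(Lambda, solve(Sigma))            # E-step: B = Lambda' Sigma^{-1}
  Ezz   <- diag(q) - B %*% Lambda + B %*% S %*% t(B)  # sample average of E(Z Z' | Y)
  Lambda_new <- S %*% t(B) %*% solve(Ezz)             # M-step updates
  psi_new    <- diag(S - Lambda_new %*% B %*% S)
  list(Lambda = Lambda_new, psi = psi_new)
}
```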

Although EM algorithms are guaranteed to increase the likelihood at each iteration and converge to a local maximum, they are well known for their slow convergence. Further, the iterations run in a $(pq + p)$-dimensional space, which can be slow for very large $p$. Accelerated variants (Liu and Rubin, 2002; Varadhan and Roland, 2008) show superior performance in low-dimensional problems but come with additional computational overhead that dominates the gain in the rate of convergence in high dimensions. EM algorithms also compromise numerical accuracy by not checking first-order optimality, in order to enhance speed. So, we next develop a fast and accurate method for exploratory factor analysis (EFA) that is applicable in high dimensions.

2.2. Profile likelihood for parameter estimation

We start with the common and computationally useful identifiability restriction on $\Lambda$ that constrains $\Gamma = \Lambda^\top \Psi^{-1} \Lambda$ to be diagonal with decreasing diagonal entries. Under this scale-invariant constraint, $\Lambda$ is completely determined up to changes of sign in its columns. Under this constraint, $\Lambda$ can be profiled out for a given $\Psi$, as described in the following lemma.

Lemma 1. Suppose that $\Psi$ is a given p.d. diagonal matrix. Suppose that the squares of the $q$ largest singular values of $W = n^{-1/2}(Y - \mathbf{1}\bar{Y}^\top)\Psi^{-1/2}$ are $\theta_1 \geq \theta_2 \geq \cdots \geq \theta_q$ and that the corresponding $p$-dimensional right-singular vectors are the columns of $V_q$. Then the function $\Lambda \mapsto l(\Lambda, \Psi)$ is maximized at $\hat{\Lambda} = \Psi^{1/2} V_q \Delta$, where $\Delta$ is a $q \times q$ diagonal matrix with $i$th diagonal entry $[\max(\theta_i - 1, 0)]^{1/2}$. The profile log-likelihood equals

$$ l_p(\Psi) = c - \frac{n}{2}\left\{ \log\det\Psi + \operatorname{Tr}\,\Psi^{-1}S + \sum_{i=1}^{q}\left(\log\theta_i - \theta_i + 1\right) \right\} \qquad (2) $$

where $c$ is a constant that depends only on $Y$, $n$, $p$ and $q$, but not on $\Psi$.

Furthermore, the gradient of $l_p(\Psi)$ is given by
$$ \nabla l_p(\Psi) = -\frac{n}{2}\,\operatorname{diag}\left(\hat{\Lambda}\hat{\Lambda}^\top + \Psi - S\right). $$

Proof. See Section S1.2. □

The profile log-likelihood $l_p(\Psi)$ in (2) depends on $Y$ only through the $q$ largest singular values of $W$. So, in order to compute $l_p(\Psi)$ and $\nabla l_p(\Psi)$, we only need to compute the $q$ largest singular values of $W$ and the corresponding right singular vectors. For $q \ll \min(n, p)$, as is usually the case, these largest singular values and singular vectors can be computed very fast using the Lanczos algorithm.

Further constraints on $\Psi$ (e.g., $\Psi = \sigma^2 I_p$ with $\sigma^2 > 0$) can be easily incorporated. Also, $\nabla l_p(\Psi)$ is available in closed form, which enables us to check first-order optimality and ensure high accuracy.

Finally, $l_p(\Psi)$ is expressed in terms of $S$. However, ML estimators are scale-equivariant, so we can estimate $\Lambda$ and $\Psi$ using the sample correlation matrix and then scale back to $S$. A particular advantage of using the sample correlation matrix is that $l_p(\Psi)$ then needs to be optimized over the fixed bounded rectangle $(0,1)^p$, which does not depend on the data and is conceivably more robust numerically.
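As an illustration of Lemma 1 and (2), the R sketch below computes $\hat{\Lambda}$ and $l_p(\Psi)$ for a given vector of uniquenesses, reading $\theta_i$ as the squared singular values of $W$ so that it is consistent with (2). It uses a full svd() call and an explicit $W$ purely for clarity; in fad only the $q$ largest singular triplets are computed, via the matrix-free Lanczos scheme of Section 2.3.1. The function name and the additive constant are illustrative choices.

```r
# Illustrative computation of Lemma 1 and the profile log-likelihood (2).
# Y: n x p data matrix; psi: length-p uniquenesses; q: number of factors.
profile_loglik <- function(Y, psi, q) {
  n <- nrow(Y); p <- ncol(Y)
  Yc <- sweep(Y, 2, colMeans(Y))                  # center the columns
  W  <- sweep(Yc, 2, sqrt(psi), "/") / sqrt(n)    # W = n^{-1/2} (Y - 1 ybar') Psi^{-1/2}
  sv <- svd(W, nu = 0, nv = q)                    # full SVD here; Lanczos in practice
  theta  <- sv$d[1:q]^2                           # squared singular values of W
  Lambda <- sqrt(psi) * sv$v %*% diag(sqrt(pmax(theta - 1, 0)), q)  # Psi^{1/2} V_q Delta
  s_diag <- colMeans(Yc^2)                        # diagonal of S; the full S is never formed
  lp <- -n / 2 * (p * log(2 * pi) + sum(log(psi)) + sum(s_diag / psi) +
                    sum(log(theta) - theta + 1))  # as in (2), assuming theta_i > 1 for i <= q
  list(lp = lp, Lambda = Lambda)
}
```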

2.3. Matrix-free computations

2.3.1. A Lanczos algorithm for calculating partial singular values and vectors

In order to compute the profile likelihood and its gradient, we need the $q$ largest singular values and right singular vectors of $W$. We use the Lanczos algorithm (Baglama and Reichel, 2005; Dutta and Mondal, 2014) with reorthogonalization and implicit restart. Suppose that $m = \max\{2q+1, 20\}$ and that $f_1 \in \mathbb{R}^n$ is any random vector with $\|f_1\| = 1$. Let $g_1 = W^\top f_1$, $\alpha_1 = \|g_1\|$, $g_1 = g_1/\alpha_1$, $F_1 = [f_1]$ and $G_1 = [g_1]$. For $j = 1, \ldots, m$, let $r_j = W g_j - \alpha_j f_j$, reorthogonalize $r_j = r_j - F_j F_j^\top r_j$ and set $\beta_j = \|r_j\|$; if $j < m$, update $f_{j+1} = r_j/\beta_j$, $F_{j+1} = [F_j, f_{j+1}]$, $g_{j+1} = W^\top f_{j+1} - \beta_j g_j$, reorthogonalize $g_{j+1} = g_{j+1} - G_j G_j^\top g_{j+1}$, set $\alpha_{j+1} = \|g_{j+1}\|$, $g_{j+1} = g_{j+1}/\alpha_{j+1}$, and $G_{j+1} = [G_j, g_{j+1}]$.

Next, consider the $m \times m$ bidiagonal matrix $B_m$ with diagonal entries $\alpha_1, \alpha_2, \ldots, \alpha_m$, with $(j, j+1)$ entry $\beta_j$ for $j = 1, 2, \ldots, m-1$, and all other entries 0. Now suppose that $h_1 \geq h_2 \geq \cdots \geq h_m$ are the singular values of $B_m$ and that the $u_j$'s and $v_j$'s are the corresponding right and left singular vectors, which can be computed via a Sturm sequencing algorithm (Wilkinson, 1958). Also, let $\tilde{u}_j = F_m u_j$ and $\tilde{v}_j = G_m v_j$ ($1 \leq j \leq m$). Then it can be shown that, for all $j$, $W^\top \tilde{u}_j = h_j \tilde{v}_j$ and $W \tilde{v}_j = h_j \tilde{u}_j + v_{j,m} r_m$, where $v_{j,m}$ is the last entry of $v_j$. Because $\|r_m\| = \beta_m$ and $h_1$ is approximately the largest singular value of $W$, the algorithm stops if $\beta_m |v_{j,m}| \leq h_1 \delta$ holds for $j = 1, 2, \ldots, q$, where $\delta$ is some prespecified tolerance level; then $h_1, h_2, \ldots, h_q$ and $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_q$ are accurate approximations of the $q$ largest singular values and corresponding right singular vectors of $W$.
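A minimal R sketch of these recursions (without the implicit restart discussed next) is given below; it forms $W$ explicitly and omits the convergence test purely for illustration, and the function name is ours, not the package's internal code. For a small matrix, its output can be checked against svd(W).

```r
# Golub-Kahan-Lanczos bidiagonalization with reorthogonalization (no restart).
# Assumes m < min(n, p); in FAD, W is never formed and only the products W g and W' f are used.
lanczos_bidiag <- function(W, q, m = max(2 * q + 1, 20)) {
  n <- nrow(W); p <- ncol(W)
  Fm <- matrix(0, n, m); Gm <- matrix(0, p, m)
  alpha <- beta <- numeric(m)
  f1 <- rnorm(n); Fm[, 1] <- f1 / sqrt(sum(f1^2))       # random unit start f_1
  g1 <- crossprod(W, Fm[, 1])                           # g_1 = W' f_1
  alpha[1] <- sqrt(sum(g1^2)); Gm[, 1] <- g1 / alpha[1]
  for (j in 1:m) {
    r <- W %*% Gm[, j] - alpha[j] * Fm[, j]             # r_j = W g_j - alpha_j f_j
    r <- r - Fm[, 1:j, drop = FALSE] %*% crossprod(Fm[, 1:j, drop = FALSE], r)  # reorthogonalize
    beta[j] <- sqrt(sum(r^2))
    if (j < m) {
      Fm[, j + 1] <- r / beta[j]
      g <- crossprod(W, Fm[, j + 1]) - beta[j] * Gm[, j]
      g <- g - Gm[, 1:j, drop = FALSE] %*% crossprod(Gm[, 1:j, drop = FALSE], g)
      alpha[j + 1] <- sqrt(sum(g^2)); Gm[, j + 1] <- g / alpha[j + 1]
    }
  }
  B <- diag(alpha); B[cbind(1:(m - 1), 2:m)] <- beta[1:(m - 1)]   # upper bidiagonal B_m
  sv <- svd(B)
  list(d = sv$d[1:q],            # approximate q largest singular values of W
       v = Gm %*% sv$u[, 1:q])   # G_m times the left singular vectors of B_m (the v_j above):
                                 # approximate right singular vectors of W
}
```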

The reorthogonalized Lanczos algorithm can still suffer from numerical instability that slows down convergence. To resolve this, Baglama and Reichel (2005) proposed restarting the Lanczos algorithm, but instead of starting from scratch, they initialize it with the first $q$ singular vectors. To that end, let $f_{m+1} = r_m/\beta_m$ and reset $F_{q+1} = [\tilde{u}_1, \ldots, \tilde{u}_q, f_{m+1}]$. For $j = 1, 2, \ldots, q$, let $\rho_j = \beta_m v_{j,m}$, and reset $r_q = W^\top f_{m+1} - \sum_{j=1}^{q} \rho_j \tilde{v}_j$, $\alpha_{q+1} = \|r_q\|$, $g_{q+1} = r_q/\alpha_{q+1}$, and $G_{q+1} = [\tilde{v}_1, \ldots, \tilde{v}_q, g_{q+1}]$. Define $\gamma = f_{m+1}^\top W g_{q+1}$ and $r_{q+1} = W g_{q+1} - \gamma f_{m+1}$. For $j = 1, 2, \ldots, m-q-1$, compute $\beta_{q+j} = \|r_{q+j}\|$, $f_{q+j+1} = r_{q+j}/\beta_{q+j}$, $F_{q+j+1} = [F_{q+j}, f_{q+j+1}]$, $g_{q+j+1} = (I - G_{q+j}G_{q+j}^\top) W^\top f_{q+j+1}$, $\alpha_{q+j+1} = \|g_{q+j+1}\|$, $g_{q+j+1} = g_{q+j+1}/\alpha_{q+j+1}$, and $r_{q+j+1} = (I - F_{q+j+1}F_{q+j+1}^\top) W g_{q+j+1}$. This yields an $m \times m$ matrix $B_m$ with entries $b_{j,j} = h_j$ and $b_{j,q+1} = \rho_j$ for $j = 1, 2, \ldots, q$, $b_{i,i} = \alpha_i$ for $q+1 \leq i \leq m$, $b_{i,i+1} = \beta_i$ for $q+1 \leq i \leq m-1$, and all other entries 0. The matrix $B_m$ is no longer bidiagonal but is still a small matrix whose singular value decomposition can be computed very fast. Convergence of the Lanczos algorithm can be checked as before.

The only way that $W$ enters this algorithm is through matrix-vector products of the forms $Wg$ and $W^\top f$, both of which can be computed without explicitly storing $W$. Overall, the algorithm yields the $q$ largest singular values and vectors at $O(qnp)$ computational cost using only $O(qp)$ additional memory, resulting in substantial gains over the traditional methods (Jöreskog, 1967; Lawley and Maxwell, 1962), which require a full eigenvalue decomposition of $W^\top W$ with $O(p^3)$ computational complexity and $O(p^2)$ memory. Having described a scalable algorithm for computing $l_p(\Psi)$ and $\nabla l_p(\Psi)$, we next detail a numerical algorithm for computing the ML estimators.
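Before moving on, we note that the matrix-free products themselves are simple to express: the sketch below (with illustrative function names) computes $Wg$ and $W^\top f$ for $W = n^{-1/2}(Y - \mathbf{1}\bar{Y}^\top)\Psi^{-1/2}$ using only $Y$, its column means and the vector of uniquenesses, so neither the centered matrix nor $W$ is ever stored, and each product costs $O(np)$ operations and $O(p)$ extra memory.

```r
# W g and W' f without forming W = n^{-1/2} (Y - 1 ybar') Psi^{-1/2}.
W_times_g <- function(Y, ybar, psi, g) {
  gs <- g / sqrt(psi)                               # Psi^{-1/2} g
  (Y %*% gs - sum(ybar * gs)) / sqrt(nrow(Y))       # centering enters only as a scalar shift
}
Wt_times_f <- function(Y, ybar, psi, f) {
  (crossprod(Y, f) - ybar * sum(f)) / (sqrt(psi) * sqrt(nrow(Y)))
}

# Check against the explicit construction on a small example:
set.seed(1)
Y <- matrix(rnorm(20 * 50), 20, 50); psi <- runif(50, 0.2, 0.8); ybar <- colMeans(Y)
W <- sweep(Y, 2, ybar) %*% diag(1 / sqrt(psi)) / sqrt(nrow(Y))
g <- rnorm(50); f <- rnorm(20)
max(abs(W %*% g - W_times_g(Y, ybar, psi, g)))        # numerically zero
max(abs(crossprod(W, f) - Wt_times_f(Y, ybar, psi, f)))
```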

2.3.2. Numerical optimization of the profile log-likelihood

On the correlation scale, the $\psi_{ii}$'s lie between 0 and 1. Under this box constraint, the factanal function in R and the factoran function in MATLAB® employ the limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm with box constraints (L-BFGS-B; Byrd et al., 1995) to obtain the ML estimator of $\Psi$. In high dimensions, the advantages of the L-BFGS-B algorithm are particularly prominent. Because Newton methods require the search direction $H(\Psi)^{-1}\nabla l_p(\Psi)$, where $H(\Psi)$ is the $p \times p$ Hessian matrix of $l_p(\Psi)$, they are computationally prohibitive in high dimensions in terms of both storage and numerical complexity. The quasi-Newton BFGS algorithm replaces the exact search direction with an iterative approximation built from already computed values of $l_p(\Psi)$ and $\nabla l_p(\Psi)$. The limited-memory implementation, moreover, uses only the last few (typically fewer than 10) values of $l_p(\Psi)$ and $\nabla l_p(\Psi)$ instead of all past values. Overall, L-BFGS-B reduces the storage cost from $O(p^2)$ to $O(np)$ and the computational complexity from $O(p^3)$ to $O(np)$. Interested readers are referred to Byrd et al. (1995) and Zhu et al. (1997) for more details on the L-BFGS-B algorithm. The L-BFGS-B algorithm requires both $l_p$ and $\nabla l_p$ at each iteration. Because $\nabla l_p$ is available as a byproduct of computing $l_p$ (see Sections 2.2 and 2.3.1), we modify the implementation to jointly compute both quantities with a single call to the Lanczos algorithm at each L-BFGS-B iteration. In comparison to R's default implementation (factanal), which calls $l_p$ and $\nabla l_p$ separately in its optimization routines, this tweak halves the computation time.
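A sketch of this joint computation follows. stats::optim() with method "L-BFGS-B" minimizes, so the negated profile log-likelihood and gradient are supplied, and a closure caches the last evaluation so that a single Lanczos call per iterate serves both the function and gradient callbacks. The helper profile_loglik_grad(), which would return both $l_p$ and $\nabla l_p$ at a given $\psi$, is hypothetical (it could extend the profile_loglik() sketch of Section 2.2); this is a sketch of the idea, not the fad implementation.

```r
# L-BFGS-B over psi in (0, 1)^p, reusing one (l_p, gradient) evaluation per iterate.
make_objective <- function(Y, q) {
  cache <- list(psi = NULL, value = NULL, grad = NULL)
  refresh <- function(psi) {
    if (is.null(cache$psi) || any(psi != cache$psi)) {
      out <- profile_loglik_grad(Y, psi, q)      # hypothetical helper: list(lp = ..., grad = ...)
      cache <<- list(psi = psi, value = out$lp, grad = out$grad)
    }
  }
  list(fn = function(psi) { refresh(psi); -cache$value },    # optim() minimizes
       gr = function(psi) { refresh(psi); -cache$grad })
}

obj <- make_objective(Y, q = 3)                  # Y: data matrix, analyzed on the correlation scale
fit <- optim(par = rep(0.5, ncol(Y)), fn = obj$fn, gr = obj$gr,
             method = "L-BFGS-B", lower = 1e-6, upper = 1 - 1e-6)
psi_hat <- fit$par
```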

3. Performance evaluations

3.1. Experimental setup

The performance of FAD was compared with that of EM using 100 simulated datasets with true $q = 3$ or $5$ and $(n, p) \in \{(100, 1000), (225, 3375), (400, 8000)\}$. For each setting, we simulated $\psi_{ii} \overset{\text{i.i.d.}}{\sim} U(0.2, 0.8)$ and $\lambda_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1)$, and set $\mu = 0$. We also evaluated performance with $(n, p, q) \in \{(160, 24547, 2), (180, 24547, 2), (340, 24547, 4)\}$ to match the settings of our application in Section 4: the true $\Psi$, $\Lambda$ and $\mu$ were set to the ML estimates from that dataset.

For the EM algorithm, $\Lambda$ was initialized at the first $q$ principal components (PCs) of the scaled data matrix, computed via the Lanczos algorithm, while $\Psi$ was initialized at $I_p - \operatorname{diag}(\Lambda\Lambda^\top)$. FAD requires only $\Psi$ to be initialized, which was done in the same way as for EM. We stopped FAD when the relative increase in $l_p(\Psi)$ was below $100\epsilon_0$ and $\|\nabla l_p\| < \epsilon_0$, where $\epsilon_0$ is the machine tolerance, approximately $2.2 \times 10^{-16}$ in our case. The EM algorithm was terminated when the relative change in $l(\Lambda, \Psi)$ was less than $10^{-16}$ and $\|\nabla l_p\| < \epsilon_0$, or when the number of iterations reached 5000. Therefore, FAD and EM had comparable stopping criteria. For each simulated dataset, we fit models with $k = 1, 2, \ldots, 2q$ factors and chose the number of factors by the Bayesian Information Criterion (BIC), $-2\hat{l}_k + p_k \log n$ (Schwarz, 1978), where $\hat{l}_k$ is the maximized log-likelihood with $k$ factors and $p_k$ is the number of free parameters in the $k$-factor model. All experiments were done using R (R Core Team, 2019) on a workstation with an Intel E5-2640 v3 CPU clocked at 2.60 GHz and 64GB RAM.
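For concreteness, a short R sketch of one simulated replicate and of the BIC computation is given below. The parameter count $p_k = pk + p - k(k-1)/2$ used here is the standard free-parameter count for a $k$-factor model and is our reading of $p_k$, not a formula quoted from the paper.

```r
# One simulated dataset: psi_ii ~ U(0.2, 0.8), lambda_ij ~ N(0, 1), mu = 0, with Y
# generated through the factor representation so the p x p covariance is never formed.
set.seed(2020)
n <- 100; p <- 1000; q <- 3
psi    <- runif(p, 0.2, 0.8)
Lambda <- matrix(rnorm(p * q), p, q)
Z   <- matrix(rnorm(n * q), n, q)
eps <- sweep(matrix(rnorm(n * p), n, p), 2, sqrt(psi), "*")
Y   <- Z %*% t(Lambda) + eps

# BIC for a k-factor fit with maximized log-likelihood lhat_k.
bic <- function(lhat_k, k, n, p) -2 * lhat_k + (p * k + p - k * (k - 1) / 2) * log(n)
```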

3.2. Results

Because BIC always correctly picked $q$, we evaluated model fit for each method in terms of $l(\hat{\Lambda}, \hat{\Psi})$, $d_R = \|\hat{R} - R\|_F / \|R\|_F$ and $d_\Gamma = \|\hat{\Gamma} - \Gamma\|_F / \|\Gamma\|_F$, where $\Gamma = \Lambda^\top \Psi^{-1} \Lambda$ and $R$ and $\hat{R}$ are the correlation matrices corresponding to $\Sigma$ and $\hat{\Sigma} = \hat{\Lambda}\hat{\Lambda}^\top + \hat{\Psi}$.
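Both discrepancies are simple relative Frobenius errors; a small sketch follows, assuming estimates Lambda_hat and psi_hat together with the true Lambda and psi of the previous sketch. For very large $p$ one would avoid forming the $p \times p$ matrices explicitly.

```r
# Relative Frobenius errors d_R and d_Gamma for true (Lambda, psi) and estimates
# (Lambda_hat, psi_hat); Lambda_hat and psi_hat are assumed to be available.
rel_frob <- function(A_hat, A) norm(A_hat - A, "F") / norm(A, "F")

Sigma     <- tcrossprod(Lambda) + diag(psi)
Sigma_hat <- tcrossprod(Lambda_hat) + diag(psi_hat)
d_R     <- rel_frob(cov2cor(Sigma_hat), cov2cor(Sigma))            # correlation-matrix error
d_Gamma <- rel_frob(crossprod(Lambda_hat / psi_hat, Lambda_hat),   # Gamma = Lambda' Psi^{-1} Lambda
                    crossprod(Lambda / psi, Lambda))
```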

3.2.1. CPU time

Figure 1 presents the relative speed of FAD to EM. Our compute times include the common initialization times. Specifically, for datasets of size $(n, p) \in \{(100, 1000), (225, 3375), (400, 8000)\}$, FAD was 10 to 70 times faster than EM, with the maximum speedup at the true $q$. EM did not converge within 5000 iterations for any of the overfitted models; in contrast, FAD always converged, although it took longer there than in the other cases, so the speedup for the overfitted models is underestimated because of the censoring of EM. The speedup is even more pronounced (see Section S2.1) in the data-driven simulations, where $p$ is much larger.

Fig. 1.

Relative speed of FAD to EM on (left) randomly simulated and (right) data-driven cases. Lighter shades correspond to true q = 3 and darker shades to true q = 5.

3.2.2. Parameter estimation and model fit.

Under the best-fitting models, FAD and EM yield identical values of $l(\hat{\Lambda}, \hat{\Psi})$, $\hat{\Psi}$, $\hat{\Gamma}$, and $\hat{\Lambda}\hat{\Lambda}^\top$. Thus the relative errors (see Figure S2) in estimating these parameters are also identical.

3.3. Additional experiments in high-noise scenarios

We conclude this section by evaluating performance in situations where, ostensibly, weak factors are hard to distinguish from high noise by SVD-based methods and where EM may be preferable (Owen and Wang, 2015). We applied FAD and EM to the simulation setup of Owen and Wang (2015): the uniquenesses were sampled from three inverse-Gamma distributions with unit means and variances of 0, 1 and 10, and $(n, p) \in \{(200, 1000), (100, 5000)\}$. Figure S3 shows that our algorithm was substantially faster while having accuracy similar to that of EM.

4. Suicide ideation study

We applied EFA to data from Just et al. (2017) on an fMRI study conducted while 20 words connoting negative affect were shown to 9 suicide attempters, 8 suicide non-attempter ideators and 17 non-ideator control subjects. For each subject-word combination, Just et al. (2017) provided voxel-wise per cent changes in activation relative to the baseline in $50 \times 61 \times 23$ image volumes. Restricting attention to the 24547 in-brain voxels yields datasets for the attempters, ideators and controls of sizes $(n, p) \in \{(180, 24547), (160, 24547), (340, 24547)\}$, respectively. We assumed each dataset to be a random sample from a multivariate normal distribution. Our interest was in determining whether the variation in the per cent relative change in activation for each subject type can be explained by a few latent factors and whether these factors differ between the three groups of subjects.

For each dataset, we performed EFA with $q = 0, 1, 2, \ldots, 10$ factors using both FAD and EM. Table 1 demonstrates the computational benefits of using FAD over EM. We used BIC to decide on the optimal number of factors $q_o$ and obtained 2-factor models for both the suicide attempters and the ideators, and a 4-factor model for the control subjects. Figure 2 provides voxel-wise displays of the $q_o$ factor loadings, obtained using the quartimax criterion (Costello and Osborne, 2005), for each type of subject. All the factor loadings are non-negligible only in voxels around the ventral attention network (VAN), which is one of two sensory orienting systems that reorient attention towards notable stimuli and is closely related to involuntary actions (Vossel et al., 2014). However, the factor loadings differ both within and between the three groups.

Table 1:

CPU times (in seconds, rounded to the nearest second) for FAD and EM applied to the suicide ideation study datasets.

q                 1    2    3    4    5    6    7    8    9   10
Attempters  FAD   3    3    4    5    5    6    6    7    9    9
            EM  146  173  207  198  229  236  228  250  239  254
Ideators    FAD   4    4    5    6    6    6    6    9    9   10
            EM  118  197  207  200  222  244  241  226  258  258
Controls    FAD   5    5    8    7    8    8    9   10   12   13
            EM  300  451  456  407  426  461  483  438  566  519

Fig. 2.

Loading values of fitted factors for suicide attempters, ideators and controls.

For instance, for the suicide attempters, each factor is a contrast between different areas of the VAN, but the contrasts themselves differ between the two factors. The first factor for the ideators is a weighted mean of the voxel values, while the second factor is a contrast of the values at the VAN voxels. For the controls, the first three factors are different contrasts of the values at different voxels, while the fourth factor is more or less a mean of the values at these voxels. Further, the factor loadings in the control group are more attenuated than those for either the suicide attempters or the ideators. While a detailed analysis of our results is outside the purview of this paper, we note that EFA has provided us with distinct factor loadings that potentially explain the variation in suicide attempters, non-attempter ideators and controls. However, our analysis assumed that the image volumes are independent and Gaussian; approaches relaxing these assumptions may be appropriate.

5. Discussion

In this paper, we propose a new ML-based EFA method called FAD that uses a sophisticated computational framework to achieve both high accuracy in parameter estimation and fast convergence via matrix-free algorithms. We implement a Lanczos method for computing partial singular values and vectors and a limited-memory quasi-Newton method for ML estimation. This implementation alleviates the computational limitations of current state-of-the-art algorithms and makes EFA feasible for datasets with $p \gg n$. In our experiments, FAD always converged, whereas EM struggled with overfitted models. Although not demonstrated in this paper, FAD is also well suited to distributed computing systems because it uses the data matrix only for computing matrix-vector products. FAD paves the way for fast methods for mixtures of factor analyzers and for factor models for non-Gaussian data in high-dimensional clustering and classification problems.

Supplementary Material

Supp 1

Acknowledgements

We thank the editor and the associate editor for their helpful comments that improved the presentation of the article. An earlier version of this article won the first author a Student Poster Competition award at the 2nd Midwest Statistical Machine Learning Colloquium and a 2020 Student Paper Competition award from the ASA Section on Statistical Computing. This research was supported in part by the United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) Hatch project IOW03617. The research of the third author was also supported in part by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health (NIH) under Grant R21EB016212. The content of this paper is, however, solely the responsibility of the authors and does not represent the official views of the NIBIB, the NIH, the NIFA or the USDA.

Footnotes

Supplementary Materials

Supplement: Provides additional details on the algorithms and on the performance evaluations of the methods discussed in this paper. Documented code for reproducing the results is also included.

fad: An R package implementing the algorithm in this article available at CRAN (https://cran.r-project.org/web/packages/fad/index.html).

References

  1. Anderson TW (2003), An Introduction to Multivariate Statistical Analysis, Wiley Series in Probability and Statistics, Wiley.
  2. Baglama J and Reichel L (2005), "Augmented Implicitly Restarted Lanczos Bidiagonalization Methods," SIAM Journal on Scientific Computing, 27, 19–42.
  3. Buettner F, Pratanwanich N, McCarthy DJ, Marioni JC, and Stegle O (2017), "f-scLVM: Scalable and Versatile Factor Analysis for Single-cell RNA-seq," Genome Biology, 18, 212.
  4. Byrd RH, Lu P, Nocedal J, and Zhu C (1995), "A Limited Memory Algorithm for Bound Constrained Optimization," SIAM Journal on Scientific Computing, 16, 1190–1208.
  5. Costello AB and Osborne J (2005), "Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis," Practical Assessment, Research & Evaluation, 10, 1–9.
  6. Dutta S and Mondal D (2014), "An h-likelihood Method for Spatial Mixed Linear Models Based on Intrinsic Autoregressions," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77, 699–726.
  7. Jöreskog KG (1967), "Some Contributions to Maximum Likelihood Factor Analysis," Psychometrika, 32, 443–482.
  8. Just MA, Pan LA, Cherkassky VL, McMakin DL, Cha CB, Nock MK, and Brent DP (2017), "Machine Learning of Neural Representations of Suicide and Emotion Concepts Identifies Suicidal Youth," Nature Human Behaviour, 1, 911–919. [Retracted]
  9. Lawley DN (1940), "The Estimation of Factor Loadings by the Method of Maximum Likelihood," Proceedings of the Royal Society of Edinburgh, 60, 64–82.
  10. Lawley DN and Maxwell AE (1962), "Factor Analysis as a Statistical Method," Journal of the Royal Statistical Society. Series D (The Statistician), 12, 209–229.
  11. Leek JT (2014), "svaseq: Removing Batch Effects and Other Unwanted Noise from Sequencing Data," Nucleic Acids Research, 42, e161.
  12. Leek JT and Storey JD (2007), "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis," PLoS Genetics, 3, e161.
  13. Liu C and Rubin DB (2002), "Maximum Likelihood Estimation of Factor Analysis Using the ECME Algorithm with Complete and Incomplete Data," Statistica Sinica, 8, 729–747.
  14. Mardia KV, Kent JT, and Bibby JM (2006), Multivariate Analysis, Elsevier, Amsterdam.
  15. McLachlan G and Krishnan T (1996), The EM Algorithm and Extensions, Wiley Series in Probability and Statistics, Wiley.
  16. Ng CT, Yau CY, and Chan NH (2014), "Likelihood Inferences for High-Dimensional Factor Analysis of Time Series With Applications in Finance," Journal of Computational and Graphical Statistics, 24, 866–884.
  17. Owen AB and Wang J (2015), "Bi-Cross-Validation for Factor Analysis," Statistical Science, 31, 119–139.
  18. Pournara I and Wernisch L (2007), "Factor Analysis for Gene Regulatory Networks and Transcription Factor Activity Profiles," BMC Bioinformatics, 8, 61.
  19. R Core Team (2019), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  20. Robertson D and Symons J (2007), "Maximum Likelihood Factor Analysis with Rank-deficient Sample Covariance Matrices," Journal of Multivariate Analysis, 98, 813–828.
  21. Rubin DB and Thayer DT (1982), "EM Algorithms for ML Factor Analysis," Psychometrika, 47, 69–76.
  22. Schwarz GE (1978), "Estimating the Dimension of a Model," The Annals of Statistics, 6, 461–464.
  23. Sundberg R and Feldmann U (2016), "Exploratory Factor Analysis - Parameter Estimation and Scores Prediction with High-dimensional Data," Journal of Multivariate Analysis, 148, 49–59.
  24. Trendafilov NT and Unkel S (2011), "Exploratory Factor Analysis of Data Matrices with More Variables than Observations," Journal of Computational and Graphical Statistics, 20, 874–891.
  25. Varadhan R and Roland C (2008), "Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm," Scandinavian Journal of Statistics, 35, 335–353.
  26. Vossel S, Geng JJ, and Fink GR (2014), "Dorsal and Ventral Attention Systems: Distinct Neural Circuits but Collaborative Roles," The Neuroscientist, 20, 150–159.
  27. Wilkinson JH (1958), "The Calculation of the Eigenvectors of Codiagonal Matrices," Computer Journal, 1, 90–96.
  28. Zhu C, Byrd RH, Lu P, and Nocedal J (1997), "Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-Scale Bound-Constrained Optimization," ACM Transactions on Mathematical Software, 23, 550–560.
