Abstract
This article is concerned with standard errors (SEs) and confidence intervals (CIs) for exploratory factor analysis (EFA) in different situations. The authors adapt a sandwich SE estimator for EFA parameters to accommodate nonnormal data and imperfect models, factor extraction with maximum likelihood and ordinary least squares, and factor rotation with CF-varimax, CF-quartimax, geomin, or target rotation. They illustrate the sandwich SEs and CIs using nonnormal continuous data and ordinal data. They also compare SE estimates and CIs of the conventional information method, the sandwich method, and the bootstrap method using simulated data. The sandwich method and the bootstrap method are more satisfactory than the information method for EFA with nonnormal data and model approximation error.
Keywords: factor analysis, latent variable models, factor rotation, standard errors
Introduction
Exploratory factor analysis (EFA) is a widely used statistical procedure in the social and behavioral sciences. It is a data-driven approach for understanding correlations among manifest variables with fewer latent factors. Cudeck and O’Dell (1994) encouraged factor analysts to examine both point estimates and standard error (SE) estimates (and confidence intervals [CIs]) when interpreting EFA results. If factor analysts examine only point estimates, results and conclusions can be misleading because these point estimates may be associated with SEs of different sizes or CIs of different widths.
EFA SEs were originally derived for maximum likelihood (ML) estimates under the assumptions that (a) the EFA model fits perfectly in the population and (b) manifest variables are normally distributed (Jennrich, 1973). Because the SE involves the information matrix, the authors refer to it as the information SE.1 The two assumptions for the information SE do not hold in many EFA applications.
The authors present a sandwich method for estimating SEs for rotated factor loadings and factor correlations in EFA. They first describe a general form of the sandwich SE estimator and then adapt it to accommodate different features of EFA models: three data distributions (normal data, nonnormal continuous data, and ordinal data), two estimation methods (ML and ordinary least squares [OLS]), four rotation criteria (CF-varimax, CF-quartimax, geomin, and target), and imperfect models. A sandwich SE estimator was first developed for estimating SEs in nonlinear regression with model error (White, 1981); it has been adapted to estimate EFA SEs with nonnormal variables and/or model error (Asparouhov & Muthén, 2009; Lee, Zhang, & Edwards, 2012; Yuan, Marshall, & Bentler, 2002). Previous adaptations focused on deriving EFA SEs for a particular combination of the estimation method and data distribution; the authors’ goal is to fully exploit the versatility of the sandwich SE estimator in EFA. The authors explain how the current adaptation includes previous adaptations as special cases. In addition, they implement the sandwich SE estimator in an R package EFAutilities (Zhang, Jiang, Hattori, & Trichtinger, 2018) to make it accessible to applied researchers.
The rest of the article is organized as follows. The authors first describe the estimation and interpretation of EFA. They then present the concept of imperfect EFA models and their consequences in model estimation and model interpretation. They next describe a sandwich SE estimator, which provides appropriate SEs for imperfect EFA models in a variety of conditions. In particular, they explain how components of the sandwich SE estimator are changed to accommodate different features of the EFA model. They demonstrate the versatility of the sandwich SE estimator with two empirical data sets and explore its statistical properties with simulated data. Finally, they provide several remarks on its theoretical and practical implications.
The Estimation and Interpretation of EFA
The EFA model is often estimated using a two-step procedure. The first step is factor extraction, in which an unrotated factor loading matrix is obtained by minimizing a discrepancy function between the sample correlation matrix and the model-implied correlation matrix. The model-implied correlation matrix combines the correlations reproduced by the factor loadings with a diagonal matrix of unique variances. Two widely used discrepancy functions are ML and OLS. Except in one-factor models, the unrotated factor loading matrix is rarely interpretable. The second step of EFA is to rotate the unrotated factor loading matrix with the aim of improving its interpretability. One can conduct factor rotation obliquely or orthogonally. Factors are allowed to be correlated in oblique rotation, but they are uncorrelated in orthogonal rotation. Oblique rotation tends to produce clearer factor loading matrices than orthogonal rotation. Examples of factor rotation methods are CF-varimax, CF-quartimax, geomin, and target rotation (Browne, 2001).
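The extraction step can be illustrated with a minimal sketch of the two discrepancy functions. The function names are hypothetical, the ML form is the usual normal-theory discrepancy, and the factor of one half in the OLS form is one common scaling convention rather than something stated in the text.

```python
import numpy as np

def implied_corr(loadings, factor_corr=None):
    """Model-implied correlation matrix: common part plus diagonal unique variances."""
    loadings = np.asarray(loadings, dtype=float)
    m = loadings.shape[1]
    phi = np.eye(m) if factor_corr is None else np.asarray(factor_corr, dtype=float)
    common = loadings @ phi @ loadings.T
    unique = 1.0 - np.diag(common)  # unique variances that make the diagonal equal 1
    return common + np.diag(unique)

def f_ml(R, P):
    """Normal-theory ML discrepancy: log|P| - log|R| + tr(R P^{-1}) - p."""
    p = R.shape[0]
    _, logdet_P = np.linalg.slogdet(P)
    _, logdet_R = np.linalg.slogdet(R)
    return logdet_P - logdet_R + np.trace(R @ np.linalg.inv(P)) - p

def f_ols(R, P):
    """OLS discrepancy: half the sum of squared residual correlations (assumed scaling)."""
    resid = R - P
    return 0.5 * np.sum(resid ** 2)
```

Both functions are zero when the model reproduces the correlation matrix exactly and positive otherwise; an extraction routine would minimize one of them over the loadings.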
The authors propose to interpret factor loadings with the aid of their CIs. We can divide factor loadings into four types according to their CIs: a strongly salient loading, a salient loading, a small factor loading, and a noninformative factor loading. Let the criterion value be a positive number chosen by a factor analyst according to her substantive knowledge. If the lower end of the CI of a positive loading is larger than the criterion value (or the upper end of the CI of a negative loading is less than the negative of the criterion value), the factor loading is a strongly salient loading. A strongly salient loading indicates a manifest variable that defines the corresponding factor. If the CI includes the criterion value (or its negative) but does not include zero, the factor loading is a salient loading. A salient loading indicates some relation between the manifest variable and the factor, but the relation tends to be weaker than that of a strongly salient loading. If the whole CI lies between the negative of the criterion value and the criterion value, the factor loading is a small loading. A small loading indicates that a factor does not importantly influence the corresponding manifest variable. In particular, if the CI includes zero, the relation between the factor and the manifest variable is not detectable at the current sample size. If the CI contains both zero and the criterion value (or its negative), the factor loading is a noninformative loading. A noninformative loading provides little information about the strength or the direction of the relation between the manifest variable and the factor. Of course, the criterion value is still subjectively chosen. Because EFA is an exploratory procedure and it should be followed by a confirmatory procedure, sound human judgment often aids rather than hinders the extraction of information from data using EFA. Nevertheless, the authors feel that the use of CIs will improve decisions regarding interpreting loadings regardless of how the criterion value is chosen.
For example, let two factor loadings have the same point estimate of 0.40, but the CI for the first is [0.35, 0.45] and the CI for the second is [–0.21, 1.0]. Let the criterion value be 0.3. According to the CIs, the first is a strongly salient factor loading, but the second is a noninformative loading. One would regard both factor loadings as salient if one does not consider CIs.
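The four-way classification can be written as a short decision rule. This is a minimal sketch: the function name is hypothetical, the handling of CIs whose ends fall exactly on the criterion value is an assumption, and the criterion of 0.3 in the usage note is an illustrative choice.

```python
def classify_loading(lower, upper, criterion):
    """Classify a factor loading from its CI [lower, upper] and a positive criterion value."""
    if lower > upper or criterion <= 0:
        raise ValueError("need lower <= upper and a positive criterion")
    # Whole CI strictly inside (-criterion, criterion): small loading.
    if -criterion < lower and upper < criterion:
        return "small"
    # CI reaches the criterion (or its negative) and also covers zero: noninformative.
    if lower <= 0.0 <= upper:
        return "noninformative"
    # CI lies entirely beyond the criterion on one side: strongly salient.
    if lower > criterion or upper < -criterion:
        return "strongly salient"
    # CI excludes zero but straddles the criterion (or its negative): salient.
    return "salient"
```

With a criterion of 0.3, `classify_loading(0.35, 0.45, 0.3)` gives a strongly salient loading and `classify_loading(-0.21, 1.0, 0.3)` gives a noninformative one, which matches the two-loading example above.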
Imperfect EFA Models
A parsimonious model can never capture the full richness of real-world phenomena. The best researchers can hope for is that a model “approximately” holds in the population. MacCallum (2003) examined consequences of imperfect EFA models. He argued that imperfect EFA models are unavoidable; for example, the influence of factors on manifest variables could be nonlinear, or there could be too many minor factors to be included in the model.
The unavoidability of imperfect EFA models has profound implications for the estimation and interpretation of EFA models. Factor analyzing the population correlation matrix does not produce a perfectly fitting EFA model. Nevertheless, minimizing a discrepancy function with regard to the population correlation matrix produces a set of parameter values, which includes the factor loading matrix and the factor correlation matrix. The minimum discrepancy function value is referred to as the error of approximation. Although the EFA model with these parameter values does not account for the population correlation matrix completely, the EFA model can still be useful if it helps us understand the population correlations with common factors. Of course, the EFA model is not helpful if the error of approximation is large. A commonly used measure of the error of approximation is the root mean square error of approximation (RMSEA); RMSEA values of 0, 0.05, 0.08, and 0.10 correspond to perfect fit, close fit, acceptable fit, and unacceptable fit, respectively (Browne & Cudeck, 1993).
Note that these parameter values are population values rather than sample estimates because the EFA model is fitted to the population correlation matrix. Factor analyzing a sample correlation matrix drawn from the population produces sample estimates of the factor loadings and factor correlations. The sample estimates are consistent for the population values even with model approximation error and nonnormal data.
Estimating SEs for the parameter estimates with the information method assumes that there is no model error in the population. The authors next describe a sandwich SE method, which provides consistent SE estimates for the parameter estimates even with model approximation error. In addition, it can be adapted to accommodate different estimation methods, different factor rotation methods, and different data distributions.
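The link between the error of approximation and the RMSEA can be sketched as follows. The sketch assumes the usual population definition, the square root of the minimum discrepancy value divided by the model degrees of freedom, and the standard degrees-of-freedom count for an EFA model; the function names are hypothetical.

```python
import math

def efa_df(p, m):
    """Degrees of freedom of an EFA model with p manifest variables and m factors."""
    return ((p - m) ** 2 - (p + m)) // 2

def rmsea(f_min, df):
    """Population RMSEA from the minimum discrepancy function value and the df."""
    return math.sqrt(max(f_min, 0.0) / df)
```

For 28 manifest variables and four factors, `efa_df(28, 4)` is 272, so a minimum discrepancy value of 272 × 0.05² = 0.68 corresponds to an RMSEA of 0.05. Sample-based RMSEA estimates involve an additional bias correction that is not shown here.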
A Sandwich Method for Estimating SEs in EFA
The large-sample distribution of the vector of parameter estimates is multivariate normal, with a mean vector equal to the population parameter values. Its covariance matrix is computed using a sandwich method2
| (1) |
The middle part of the sandwich method involves two terms: the asymptotic covariance matrix of the vector of sample correlations, and a matrix of the partial derivatives of the discrepancy function with regard to model parameters and manifest variable correlations. The outer parts of the sandwich method are both the same matrix, which is obtained from a matrix inversion
| (2) |
One matrix in Equation 2 contains the second derivatives of the discrepancy function with regard to model parameters; a vector in Equation 2 contains the constraint functions imposed on rotated factor loadings and factor correlations to deal with different factor rotation methods.
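As a reading aid, the verbal description of the sandwich estimator can be written in a generic constrained-estimation form. The symbols below are assumptions chosen for this sketch and may differ from the authors' notation; scaling by the sample size is taken to be absorbed into the asymptotic covariance matrix of the sample correlations.

```latex
% Assumed notation (not taken from the source):
%   \hat{\theta} : rotated factor loadings and factor correlations
%   F(r, \theta) : discrepancy function of sample correlations r and parameters \theta
%   c(\theta)    : vector of rotation constraint functions
%   \Upsilon     : asymptotic covariance matrix of the sample correlations
\begin{align*}
\operatorname{acov}(\hat{\theta})
  &= B_{11}\,\Delta\,\Upsilon\,\Delta'\,B_{11}',
  &
  \Delta &= \frac{\partial^{2} F}{\partial\theta\,\partial r'},
\\[4pt]
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
  &= \begin{pmatrix} H & L' \\ L & 0 \end{pmatrix}^{-1},
  &
  H &= \frac{\partial^{2} F}{\partial\theta\,\partial\theta'},
  \quad
  L = \frac{\partial c(\theta)}{\partial\theta'} .
\end{align*}
```

In this form the middle of the sandwich is the product of the mixed derivatives with the asymptotic covariance matrix of the correlations, and the bread is the upper-left block of the inverse of the bordered matrix built from the second derivatives and the constraint derivatives.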
Although similar SE estimates have been described previously for EFA with particular combinations of data types, estimation methods, and rotation methods (Yuan et al., 2002, for ML estimates, orthogonal rotation, and continuous data; Lee et al., 2012, for OLS estimates, oblique rotation, and ordinal variables; Asparouhov & Muthén, 2009, for exploratory structural equation models), the authors’ goal here is to exploit the versatility of the sandwich SE estimator to its full extent and to show how to adapt it to accommodate different features of EFA models.
Nonnormal Data
We can readily adapt the sandwich SE estimator to accommodate nonnormal data because we factor analyze the correlation matrix rather than scaling the results after factor analyzing the covariance matrix. Note that factor analyzing the correlation matrix is much more common than factor analyzing the covariance matrix in EFA. The key adaptation is to properly specify the asymptotic covariance matrix of manifest variable correlations (the middle matrix in Equation 1) for different types of data. Browne and Shapiro (1986) derived a matrix expression of this asymptotic covariance matrix for continuous but nonnormal data. When data are ordinal, their polychoric correlations are factor analyzed. The polychoric correlations are estimated using a two-stage method (Olsson, 1979), and their asymptotic covariance matrix is estimated using an estimating equation method (Yuan & Schuster, 2013).
Different Levels of Model Approximation Error
The sandwich SE estimator allows any level of model approximation error. When no model approximation error is present in the population, we can simplify the sandwich SE estimator in two ways. First, the second derivatives in Equation 2 are no longer necessary. We can replace the matrix of second derivatives by a product of first derivatives. Second, the asymptotic covariance matrix of the correlations can be estimated using the model-implied correlations instead of the sample correlations.
We can greatly simplify the SE estimation in EFA if the absence of model approximation error is combined with normal data and ML estimation. A submatrix of the inverse in Equation 2 alone provides SEs (Jennrich, 1974); they are information SEs. The information SEs are commonly computed in EFA software (CEFA, Browne, Cudeck, Tateneni, & Mels, 2010; PROC FACTOR, SAS Institute, 2006).
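The simplification to a single submatrix can be checked numerically on toy matrices. This sketch assumes the generic bordered-matrix form of a constrained sandwich estimator; the function and argument names are hypothetical and the matrices are arbitrary illustrative values, not quantities from an EFA model.

```python
import numpy as np

def outer_block(H, L):
    """Upper-left block of the inverse of the bordered matrix [[H, L'], [L, 0]]."""
    q, k = H.shape[0], L.shape[0]
    bordered = np.block([[H, L.T], [L, np.zeros((k, k))]])
    return np.linalg.inv(bordered)[:q, :q]

def sandwich_cov(H, L, Delta, Upsilon):
    """Bread @ meat @ bread', with meat = Delta @ Upsilon @ Delta'."""
    B = outer_block(H, L)
    return B @ Delta @ Upsilon @ Delta.T @ B.T

# Toy inputs: second derivatives H (positive definite) and one constraint row L.
H = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
L = np.array([[1.0, 1.0, 0.0]])

# When the middle part equals H, the sandwich collapses to the bread alone.
collapsed = sandwich_cov(H, L, np.eye(3), H)
```

Here `collapsed` equals `outer_block(H, L)`, which is the algebraic reason a single submatrix suffices in the ideal case; with a different middle part the full product is needed.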
ML Estimation and OLS Estimation
Different factor estimation methods require modifications to the partial derivatives in the middle part and the second derivatives in the outer part of the sandwich SE estimator. Such derivatives for ML estimation and OLS estimation were described by Zhang, Preacher, and Jennrich (2012).
Factor Rotation Methods
Different rotation methods require modification to the constraint function in Equation 2. The constraint functions corresponding to the Crawford-Ferguson family were derived by Jennrich (1973), and constraint functions corresponding to CF-varimax, CF-quartimax, geomin, and target rotation are documented in Tateneni (1998).
An approximate 95% CI for a rotated factor loading is constructed as the point estimate plus or minus 1.96 times its SE estimate. The SE estimate is obtained by computing the square root of a diagonal element of the covariance matrix in Equation 1.
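The interval construction is a standard normal-theory (Wald) interval, sketched below with only the Python standard library. The function name and the `level` parameter are illustrative.

```python
from statistics import NormalDist

def wald_ci(estimate, se, level=0.95):
    """Normal-theory CI: estimate plus or minus z times the SE estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 when level is 0.95
    return estimate - z * se, estimate + z * se
```

For a loading of 0.40 with an SE of 0.05, the interval is roughly 0.30 to 0.50.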
Empirical Illustrations
To illustrate the versatility of the sandwich SE estimator of Equation 1, the authors compute SEs and CIs for two empirical data sets. They present point estimates and SE estimates to save space. They include the tables for CIs and the R code for empirical illustrations in an online supporting file.3
EFA With Nonnormal Continuous Variables
Luo et al. (2008) reported a study on marital satisfaction of urban Chinese couples. Their participants were 537 couples in the first 3 years of their first marriage. The current illustration includes 28 facet scores of the Chinese Personality Assessment Inventory (Cheung et al., 1996) from the 537 wives. The authors extracted four factors from the sample correlation matrix using ML. The 90% CI for the RMSEA is [0.038, 0.049], which indicates close fit for the four-factor EFA model (Browne & Cudeck, 1993). The test of perfect fit is rejected, however. The factor rotation method was oblique CF-varimax.
The authors estimate SEs for rotated factor loadings and factor correlations using three methods: the sandwich method, the bootstrap method, and the information method. The information method assumes normal variables and a perfect EFA model, but the sandwich method and the bootstrap method do not make such assumptions. The number of bootstrap samples was .
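The bootstrap idea can be illustrated with a generic nonparametric bootstrap of a simple statistic. This is not the authors' EFA procedure: in the EFA setting the statistic would be a rotated loading or factor correlation re-estimated in each bootstrap sample, which also requires aligning factors across samples. The function name and the default number of bootstrap samples are arbitrary choices for the sketch.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=500, seed=0):
    """SE of `statistic` estimated by resampling the rows of `data` with replacement."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        reps[b] = statistic(data[idx])
    return reps.std(ddof=1)

def first_correlation(d):
    """Example statistic: correlation between the first two columns."""
    return np.corrcoef(d, rowvar=False)[0, 1]
```

Calling `bootstrap_se(data, first_correlation)` on an n-by-2 array returns the bootstrap SE of the sample correlation.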
Table 1 reports point estimates and two types of SE estimates (sandwich SE estimates, bootstrap SE estimates). These two types of SE estimates agree with each other up to the second decimal place for most parameters; the largest difference is about .4 According to the substantive theory, the authors expect some factor loadings to be large (shown in bold font) and other factor loadings to be small (shown in regular font). We can construct CIs using the point estimates and SE estimates to assess these expectations. CIs constructed with these two types of SE estimates are essentially the same. Let the criterion value be 0.3. Most of the expected large loadings are strongly salient loadings or salient loadings. The CI for the loading of “novelty” on “social potency” is [0.60, 0.75]; it is interpreted as a strongly salient loading because its lower end is larger than the criterion value; it indicates a strong association between the manifest variable and the factor. The CI for the loading of “aesthetics” on “social potency” is [0.22, 0.47]; it is interpreted as a salient loading because it contains the criterion value but not zero; it indicates a weak to moderate level of association between the manifest variable and the factor. Most expected small loadings are small. The CI for the loading of “responsibility” on “social potency” is [–0.08, 0.08]; it is interpreted as a small loading because the whole CI lies between the negative of the criterion value and the criterion value; it indicates a negligible association between the manifest variable and the factor. Several factor loadings deviate from the expected pattern, however. The variable “traditionalism-modernity” was expected to load highly on “interpersonal relatedness,” but the CI is [–0.02, 0.17]; it is interpreted as a small loading. The loading of the same variable on “accommodation” was expected to be low, but the CI is [–0.72, –0.53]; it is interpreted as a strongly salient loading.
Table 1.
Factor Analysis of Luo et al.’s (2008) Personality Data.
Rotated factor loadings

| | Socpot | Depend | Accom | Interper |
|---|---|---|---|---|
| Novelty | .67 [.04, .04] | .02 [.05, .05] | .12 [.04, .04] | .15 [.05, .06] |
| Diversity | .53 [.06, .06] | .15 [.05, .05] | .19 [.04, .04] | .39 [.06, .07] |
| Diverse-Thinking | .46 [.06, .06] | −.01 [.05, .05] | −.14 [.05, .05] | .31 [.07, .07] |
| Leadership | .67 [.04, .04] | .04 [.04, .05] | −.30 [.05, .05] | −.08 [.06, .06] |
| Logical-affective | .47 [.05, .05] | −.16 [.05, .05] | −.07 [.04, .04] | .18 [.06, .06] |
| Aesthetics | .34 [.07, .07] | .09 [.05, .05] | −.13 [.05, .05] | .22 [.07, .07] |
| Extroversion-introversion | .57 [.05, .05] | −.10 [.07, .07] | .00 [.04, .04] | −.07 [.05, .06] |
| Enterprise | .54 [.06, .07] | −.43 [.06, .07] | .10 [.04, .04] | −.27 [.04, .05] |
| Responsibility | .00 [.04, .04] | −.62 [.05, .05] | −.06 [.04, .04] | .12 [.05, .05] |
| Emotionality | .03 [.04, .05] | .75 [.04, .04] | .05 [.04, .04] | .06 [.04, .04] |
| Inferiority-self-acceptance | −.22 [.04, .04] | .46 [.04, .04] | −.44 [.05, .05] | −.19 [.04, .05] |
| Practical-mindedness | .03 [.04, .04] | −.51 [.05, .05] | −.03 [.05, .06] | .26 [.05, .05] |
| Optimism-pessimism | .28 [.05, .05] | −.48 [.05, .05] | .16 [.05, .05] | −.01 [.05, .05] |
| Meticulousness | .01 [.05, .05] | −.54 [.05, .06] | −.18 [.04, .04] | .05 [.05, .05] |
| Face | .06 [.06, .06] | .28 [.06, .06] | −.31 [.05, .05] | .24 [.06, .06] |
| Internal-external-control | .02 [.05, .05] | −.22 [.05, .05] | .37 [.04, .04] | −.07 [.05, .06] |
| Family-orientation | −.02 [.04, .04] | −.38 [.05, .05] | .28 [.06, .06] | .38 [.05, .05] |
| Defensiveness | .14 [.04, .04] | .26 [.04, .04] | −.64 [.05, .05] | −.17 [.06, .06] |
| Graciousness-meanness | −.05 [.04, .04] | −.24 [.05, .05] | .62 [.05, .05] | .23 [.06, .06] |
| Interpersonal-tolerance | .20 [.05, .05] | −.08 [.05, .05] | .54 [.04, .04] | .16 [.05, .05] |
| Self-social-orientation | .21 [.06, .06] | .13 [.06, .06] | −.59 [.05, .05] | −.02 [.06, .07] |
| Veraciousness-slickness | −.12 [.04, .04] | −.25 [.05, .05] | .48 [.06, .07] | .38 [.05, .06] |
| Traditionalism-modernity | −.16 [.05, .05] | −.26 [.06, .06] | −.62 [.05, .05] | .08 [.05, .05] |
| Relationship-orientation | .04 [.05, .06] | .07 [.05, .05] | −.09 [.05, .05] | .72 [.03, .03] |
| Social-sensitivity | .29 [.05, .06] | .06 [.04, .04] | −.14 [.04, .04] | .60 [.04, .05] |
| Discipline | .06 [.04, .04] | −.18 [.05, .05] | −.73 [.04, .04] | .30 [.05, .05] |
| Harmony | −.03 [.04, .04] | −.25 [.04, .04] | .17 [.05, .06] | .65 [.04, .04] |
| Thrift-extravagance | −.14 [.06, .06] | −.07 [.07, .07] | −.19 [.06, .06] | .36 [.06, .06] |

Rotated factor correlations

| | Socpot | Depend | Accom | Interper |
|---|---|---|---|---|
| Socpot | 1 | | | |
| Depend | −.10 [.04, .04] | 1 | | |
| Accom | −.06 [.04, .04] | −.31 [.03, .03] | 1 | |
| Interper | .21 [.04, .04] | −.30 [.04, .04] | .14 [.03, .03] | 1 |

Note. The table presents point estimates and two types of SE estimates (sandwich and bootstrap, in that order, in brackets). Socpot = social potency; depend = dependability; accom = accommodation; interper = interpersonal relatedness.
EFA With Ordinal Variables
The second empirical data set involves participants and ordinal variables (Luo, 2005).5 These variables are items of the Big Five Inventory (John, Donahue, & Kentle, 1991). The variables are five-point Likert-type scales: disagree strongly, disagree a little, neither agree nor disagree, agree a little, and agree strongly. Because the data are ordinal variables, the polychoric correlation matrix is factor analyzed instead of the Pearson correlation matrix. The factor estimation method is OLS estimation. A five-factor model fits the data well but not perfectly: a 90% CI for the RMSEA is [0.043, 0.054]. Model error is present in the EFA model. The authors illustrate the sandwich SE estimates for four oblique rotation methods: CF-varimax, CF-quartimax, geomin, and target rotation. The sandwich method involves the nontrivial task of estimating the asymptotic covariance matrix of polychoric correlations (the middle matrix in Equation 1). The polychoric correlation matrix is of order 44 by 44, so it contains 946 distinct correlations; their asymptotic covariance matrix is of order 946 by 946; the number of nonduplicated elements in that matrix is 447,931.
The point estimates for rotated factor loadings and factor correlations are very close under the four rotation methods. The congruence coefficients (Gorsuch, 1983, p. 285) of “extraversion” among the four rotation methods range from to ; the congruence coefficient ranges for “agreeableness,” “conscientiousness,” “neuroticism,” and “openness” are to , to , to , and to , respectively. Figure 1 displays the comparisons of SE estimates under the four rotation methods. SE estimates under CF-varimax, geomin, and target rotation are similar. SE estimates under CF-quartimax rotation differ from those of the other three rotation methods.
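The size of the asymptotic covariance matrix follows directly from the number of manifest variables. A small sketch of the counting, with a hypothetical function name:

```python
def acov_dimensions(p):
    """Number of distinct correlations among p variables, and the number of
    nonduplicated elements in their (symmetric) asymptotic covariance matrix."""
    n_corr = p * (p - 1) // 2
    n_unique = n_corr * (n_corr + 1) // 2
    return n_corr, n_unique
```

For a 44 by 44 polychoric correlation matrix, `acov_dimensions(44)` returns 946 distinct correlations and 447,931 nonduplicated covariance elements.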
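The congruence coefficient between two columns of factor loadings is commonly computed as a normalized inner product (Tucker's coefficient). A minimal sketch, assuming that definition and a hypothetical function name:

```python
import numpy as np

def congruence(x, y):
    """Congruence coefficient between two loading vectors for the same variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))
```

The coefficient is 1 for proportional loading patterns, 0 for patterns with no overlap, and changes sign when one factor is reflected.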
Figure 1.

Comparisons of SE estimates under four rotation methods with ordinal data (Luo, 2005).
Note. CF = Crawford-Ferguson.
Table 2 presents point estimates and SE estimates with geomin rotation. Results of CF-varimax, CF-quartimax, and target rotation are presented in the online supporting file. According to the substantive theory, the authors expect some factor loadings to be large (shown in bold font) and other factor loadings to be small (shown in regular font). We can construct CIs using the point estimates and SE estimates to assess these expectations. Most of the expected large loadings are strongly salient loadings or salient loadings. The CI for the loading of “talkative” on “extraversion” is [0.67, 0.90]; it is interpreted as a strongly salient loading because its lower end is larger than the criterion value; it indicates a strong association between the manifest variable and the factor. Most expected small loadings are small. The CI for the loading of “talkative” on “agreeableness” is [–0.09, 0.26]; it is interpreted as a small loading; it indicates a negligible association between the manifest variable and the factor. Several CIs are wide. The CI of the factor loading of “plans” on “agreeableness” is [–0.13, 0.39]; it is interpreted as noninformative because it includes both zero and the criterion value; the inference on the loading is inconclusive due to the wide CI.
Table 2.
Factor Analysis of Ordinal Data (Luo, 2005).
Factor loadings

| | | E | A | C | N | O |
|---|---|---|---|---|---|---|
| E-items | Talkative | .785 (.057) | .087 (.088) | −.028 (.052) | .197 (.080) | .087 (.065) |
| | Reserved (R) | −.570 (.067) | −.037 (.068) | .067 (.075) | .146 (.080) | .028 (.061) |
| | Full Energy | .499 (.083) | .361 (.118) | .072 (.072) | −.169 (.081) | .094 (.080) |
| | Enthusiastic | .579 (.088) | .422 (.097) | −.069 (.065) | −.023 (.053) | .245 (.068) |
| | Quiet (R) | −.853 (.047) | .006 (.067) | −.036 (.063) | −.041 (.064) | .012 (.052) |
| | Assertive | .539 (.073) | −.107 (.093) | .148 (.078) | −.091 (.072) | .303 (.080) |
| | Shy (R) | −.711 (.056) | .118 (.067) | −.096 (.077) | .126 (.074) | .099 (.066) |
| | Outgoing | .775 (.058) | .261 (.101) | .022 (.050) | .019 (.050) | −.039 (.055) |
| A-items | Find Fault (R) | .007 (.079) | −.395 (.091) | −.018 (.082) | .296 (.099) | .009 (.072) |
| | Helpful | −.054 (.085) | .536 (.087) | .088 (.093) | −.031 (.082) | .086 (.080) |
| | Quarrels (R) | .205 (.109) | −.589 (.099) | −.056 (.088) | .337 (.116) | .038 (.058) |
| | Forgiving | .011 (.075) | .629 (.063) | −.081 (.079) | −.054 (.076) | .055 (.081) |
| | Trusting | .118 (.098) | .667 (.094) | .025 (.098) | −.017 (.066) | −.070 (.083) |
| | Cold (R) | −.083 (.112) | −.742 (.072) | .067 (.086) | .213 (.112) | .162 (.072) |
| | Considerate | −.210 (.112) | .741 (.061) | .019 (.066) | .040 (.055) | .271 (.075) |
| | Rude (R) | .168 (.104) | −.531 (.093) | −.062 (.088) | .388 (.107) | −.027 (.054) |
| | Cooperative | .167 (.112) | .663 (.081) | .178 (.095) | .087 (.072) | .000 (.053) |
| C-items | Thorough | −.009 (.057) | .069 (.099) | .762 (.082) | −.033 (.058) | .091 (.067) |
| | Careless (R) | .047 (.072) | .014 (.062) | −.496 (.099) | .250 (.094) | .220 (.083) |
| | Reliable | .046 (.071) | .278 (.132) | .574 (.099) | .091 (.091) | .043 (.064) |
| | Disorganized (R) | .079 (.057) | .026 (.054) | −.729 (.079) | .014 (.085) | .193 (.101) |
| | Lazy (R) | −.025 (.083) | −.114 (.070) | −.538 (.079) | .121 (.085) | .037 (.072) |
| | Persevere | −.097 (.074) | −.014 (.071) | .590 (.083) | −.031 (.065) | .234 (.085) |
| | Efficient | .025 (.070) | .148 (.126) | .634 (.093) | −.026 (.062) | .145 (.084) |
| | Plans | .109 (.081) | .129 (.132) | .664 (.089) | .093 (.071) | .003 (.054) |
| | Distracted (R) | −.024 (.071) | .153 (.074) | −.382 (.080) | .491 (.065) | .031 (.054) |
| N-items | Blue | −.123 (.076) | −.190 (.092) | −.213 (.087) | .563 (.071) | .126 (.071) |
| | Relaxed (R) | −.027 (.068) | .037 (.069) | .029 (.094) | −.718 (.071) | .023 (.075) |
| | Tense | −.050 (.059) | −.043 (.052) | .083 (.072) | .777 (.054) | .159 (.078) |
| | Worries | −.033 (.047) | .096 (.066) | −.049 (.065) | .773 (.064) | −.116 (.082) |
| | Emotionally Stable (R) | −.019 (.051) | .141 (.084) | −.047 (.078) | −.649 (.077) | .145 (.079) |
| | Moody | .086 (.079) | −.074 (.086) | .017 (.071) | .751 (.059) | −.010 (.060) |
| | Calm (R) | .061 (.083) | .059 (.091) | .123 (.106) | −.544 (.079) | .075 (.073) |
| | Nervous | −.242 (.074) | .209 (.074) | .040 (.057) | .666 (.062) | −.118 (.065) |
| O-items | Ideas | .157 (.077) | −.048 (.058) | .090 (.102) | −.041 (.072) | .700 (.057) |
| | Curious | .191 (.111) | .165 (.089) | .010 (.086) | −.030 (.076) | .468 (.076) |
| | Ingenious | .054 (.075) | −.061 (.076) | .183 (.096) | .030 (.065) | .601 (.061) |
| | Imaginative | .122 (.089) | .052 (.069) | −.075 (.093) | .049 (.061) | .712 (.051) |
| | Inventive | −.007 (.042) | −.066 (.053) | .026 (.078) | −.159 (.076) | .793 (.044) |
| | Artistic | −.145 (.130) | .111 (.093) | −.018 (.079) | .027 (.066) | .666 (.075) |
| | Routine (R) | .057 (.086) | .025 (.085) | .196 (.091) | .141 (.086) | −.233 (.071) |
| | Reflect | .173 (.089) | .064 (.068) | −.075 (.096) | .060 (.072) | .686 (.058) |
| | Nonartistic (R) | .052 (.100) | .059 (.083) | −.083 (.082) | .093 (.087) | −.417 (.091) |
| | Sophisticated | −.264 (.115) | .057 (.090) | .001 (.072) | −.034 (.065) | .635 (.069) |

Factor correlations

| | E | A | C | N | O |
|---|---|---|---|---|---|
| E | 1 | | | | |
| A | .138 (.059) | 1 | | | |
| C | .128 (.092) | .288 (.084) | 1 | | |
| N | −.249 (.071) | −.193 (.088) | −.302 (.070) | 1 | |
| O | .196 (.076) | .146 (.085) | .090 (.089) | −.121 (.080) | 1 |

Note. OLS was used to extract five factors from polychoric correlations of ordinal variables; the factor rotation method is oblique geomin. The table presents point estimates and SEs (in parentheses). (R) = reverse-coded items. OLS = ordinary least squares.
A Simulation Study
The Design of the Simulation Study
The goal of the simulation study is to assess the influence of model error and data distributions on SE estimates and CIs for factor loadings and factor correlations. The authors consider four levels of model error (RMSEA = 0, 0.05, 0.08, and 0.10), three distributions (a normal distribution, an elliptical distribution, and a skewed distribution), and five levels of sample size (100, 200, 537, 800, and 2,000). One thousand random samples are generated in each of the 60 conditions. Note that the four levels of the RMSEA correspond to perfect fit, close fit, acceptable fit, and unacceptable fit (Browne & Cudeck, 1993). The population parameters of the simulation study are chosen to be the parameter estimates of an empirical illustration reported earlier (Luo et al., 2008). The sample size of the empirical study was 537; the authors include four other sample sizes so they can generalize the results to a wider range of conditions.
Consider the model-implied correlation matrix computed using the rotated factor loadings and factor correlations of Table 1. We can construct population correlation matrices according to a method described by Yuan and Hayashi (2003):
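The factorial design can be enumerated in a few lines. The levels below are the ones listed in Table 3; the variable names are illustrative.

```python
from itertools import product

RMSEA_LEVELS = (0.00, 0.05, 0.08, 0.10)            # four levels of model error
DISTRIBUTIONS = ("normal", "elliptical", "skewed")  # three data distributions
SAMPLE_SIZES = (100, 200, 537, 800, 2000)           # five sample sizes
N_REPLICATIONS = 1000                               # random samples per condition

# Fully crossed design: one tuple (rmsea, distribution, n) per condition.
conditions = list(product(RMSEA_LEVELS, DISTRIBUTIONS, SAMPLE_SIZES))
total_samples = len(conditions) * N_REPLICATIONS
```

The crossing yields 60 conditions and 60,000 simulated samples in total.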
| (3) |
Equation 3 involves the sample correlation matrix and a number that controls the amount of model error. When this number is zero, the population correlation matrix equals the model-implied correlation matrix; the RMSEA is 0 and no model approximation error is present in the population. Adjusting this number produces three levels of model error: RMSEA values of 0.05, 0.08, and 0.10. Although the population correlation matrices are different at these four levels of model error,6 factor analysis of these correlation matrices produces the same parameter values.
The three distributions are a normal distribution, an elliptical distribution, and a skewed distribution. In the normal distribution condition, manifest variables are generated from a multivariate normal distribution with a null mean vector and a covariance matrix equal to the population correlation matrix. In the elliptical distribution condition, manifest variables are generated from a mixture of two multivariate normal distributions (Ichikawa & Konishi, 1995). Both the normal distribution and the elliptical distribution are symmetric, but the elliptical distribution has heavier tails than the normal distribution. In the skewed distribution, manifest variables are generated using a method described in Yuan and Bentler (1997). The skewed distribution differs from the normal distribution in two ways: it is no longer a symmetric distribution, and the marginal kurtoses of the different manifest variables differ. Note that the population correlation matrix is the same in all three distribution conditions.
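One plausible reading of this construction, assumed here rather than taken from the text, is that the population matrix adds a scaled copy of the residual matrix (sample correlations minus model-implied correlations) to the model-implied matrix. The function and argument names are hypothetical.

```python
import numpy as np

def population_corr(P, R, kappa):
    """Assumed form: model-implied matrix P plus kappa times the residuals R - P."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    return P + kappa * (R - P)
```

Under this form, `kappa = 0` returns the model-implied matrix (no model error), larger values add more misfit, and the unit diagonal is preserved because both input matrices have ones on the diagonal.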
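A heavy-tailed symmetric distribution of this kind can be sketched as a scale mixture of two multivariate normal distributions. The mixing proportion and variance inflation factor below are arbitrary illustrative values, not the settings of the simulation study, and the function name is hypothetical.

```python
import numpy as np

def sample_normal_mixture(corr, n, eps=0.1, scale=9.0, seed=0):
    """Draw n rows from a two-component scale mixture of normals whose
    covariance matrix equals `corr` (assumed to be positive definite)."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(p), corr, size=n)
    # With probability eps a row comes from the inflated-variance component.
    w = np.where(rng.random(n) < eps, scale, 1.0)
    # Rescale so that the mixture has covariance `corr` rather than E[w] * corr.
    w = w / ((1.0 - eps) + eps * scale)
    return z * np.sqrt(w)[:, None]
```

The resulting variables are symmetric with heavier tails than the normal distribution while keeping the same population correlation matrix.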
Results of the Simulation Study
In each simulated sample, the authors extract four factors using ML and conduct oblique rotation using both CF-varimax and target rotation. They compute SEs for factor loadings and factor correlations using three methods: the information method, the sandwich method, and the bootstrap method. They construct CIs using point estimates and SE estimates.
Consider the CI for a parameter. Its empirical coverage rate over 1,000 simulation samples is the proportion of samples whose CI includes the population parameter value. Table 3 reports the mean empirical coverage rates (averaged across all factor loadings and factor correlations) of three types of CIs (sandwich CIs, information CIs, and bootstrap CIs) for oblique target rotation.7
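The empirical coverage rate of one parameter can be computed from the CI limits collected across simulated samples. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def coverage_rate(lower, upper, true_value):
    """Proportion of intervals [lower_i, upper_i] that contain the true value."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= true_value) & (true_value <= upper)
    return float(np.mean(covered))
```

Averaging this quantity over all factor loadings and factor correlations gives the kind of mean coverage rate reported in the tables.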
Table 3.
Average Empirical Coverage Rates of CIs, Target Rotation.
The mean empirical coverage rate across all parameters, by RMSEA level (.00, .05, .08, .10)

| Dist | Sample size | .00 Info | .00 Sand | .00 Boot | .05 Info | .05 Sand | .05 Boot | .08 Info | .08 Sand | .08 Boot | .10 Info | .10 Sand | .10 Boot |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 100 | 92.7 | 95.3 | 96.2 | 90.8 | 95.7 | 96.6 | 86.3 | 94.2 | 95.8 | 80.7 | 92.1 | 94.8 |
| | 200 | 94.2 | 95.1 | 95.6 | 92.8 | 95.2 | 96.0 | 89.2 | 95.2 | 96.5 | 82.8 | 93.7 | 95.4 |
| | 537 | 94.8 | 95.0 | 95.2 | 93.3 | 95.1 | 95.1 | 90.7 | 95.0 | 95.5 | 85.5 | 95.0 | 96.3 |
| | 800 | 94.8 | 95.1 | 95.1 | 93.5 | 95.0 | 95.1 | 90.6 | 95.1 | 95.3 | 86.7 | 95.0 | 96.3 |
| | 2,000 | 95.0 | 95.1 | 95.0 | 93.6 | 94.9 | 95.1 | 91.0 | 95.1 | 95.2 | 87.1 | 95.1 | 95.5 |
| E | 100 | 86.8 | 94.7 | 96.2 | 84.0 | 94.7 | 95.9 | 79.5 | 93.2 | 95.2 | 74.4 | 91.8 | 93.9 |
| | 200 | 89.6 | 95.0 | 95.8 | 87.6 | 95.2 | 96.4 | 83.0 | 94.5 | 96.2 | 76.6 | 92.2 | 95.2 |
| | 537 | 90.4 | 95.0 | 95.1 | 88.6 | 95.1 | 95.3 | 85.0 | 95.1 | 95.8 | 79.5 | 94.5 | 96.1 |
| | 800 | 90.7 | 95.0 | 95.2 | 88.9 | 94.9 | 95.2 | 85.7 | 95.2 | 95.6 | 80.3 | 94.8 | 96.4 |
| | 2,000 | 91.0 | 95.1 | 95.1 | 88.9 | 95.0 | 95.1 | 85.9 | 95.2 | 95.0 | 81.8 | 95.2 | 95.7 |
| S | 100 | 86.5 | 94.1 | 96.5 | 81.2 | 93.4 | 95.3 | 76.0 | 91.6 | 93.7 | 71.7 | 89.8 | 92.2 |
| | 200 | 89.2 | 94.8 | 96.2 | 85.6 | 94.8 | 97.0 | 78.3 | 93.3 | 95.7 | 72.0 | 90.9 | 93.8 |
| | 537 | 90.2 | 94.7 | 95.2 | 89.6 | 94.9 | 96.4 | 83.7 | 94.3 | 96.9 | 74.2 | 91.9 | 95.1 |
| | 800 | 90.3 | 94.6 | 95.0 | 90.0 | 94.9 | 95.9 | 84.8 | 95.0 | 97.0 | 77.5 | 93.2 | 96.2 |
| | 2,000 | 90.4 | 94.8 | 94.9 | 90.1 | 94.8 | 95.2 | 86.8 | 95.2 | 96.3 | 81.0 | 95.0 | 97.2 |

Note. CI = confidence interval; RMSEA = the root mean square error of approximation; Emp = empirical SEs; Info = SEs with the information matrix; Sand = sandwich SEs; Boot = bootstrap SEs; N = normal distribution; E = elliptical distribution; S = skewed distribution.
Four observations can be made on the mean empirical coverage rates of the three types of CIs. First, all three types of CIs have satisfactory empirical coverage rates under the ideal condition of normally distributed variables, no model error (RMSEA = .00), and moderately large samples. Second, as the amount of model error increases, the empirical coverage rates of information CIs decrease. Such decreases are more pronounced when RMSEA = .08 and RMSEA = .10. The empirical coverage rates of sandwich CIs and bootstrap CIs are closer to 95% regardless of the amount of model error. Third, the empirical coverage rates of information CIs for nonnormal data are lower than those for normal data. The empirical coverage rates of sandwich CIs and bootstrap CIs are close to 95% regardless of data distributions. Fourth, increasing sample size brings the empirical coverage rates of sandwich CIs and bootstrap CIs close to 95% in model error and nonnormal data conditions, but increasing sample size does not bring the empirical coverage rates of information CIs to that level in such conditions.
Table 4 reports the mean empirical coverage rates (averaged across all factor loadings and factor correlations) of three types of CIs (sandwich CIs, information CIs, and bootstrap CIs) for oblique CF-varimax rotation. Although the four observations made on target rotation apply to CF-varimax rotation, the sandwich CIs and the bootstrap CIs perform less satisfactorily for CF-varimax rotation than for target rotation. The empirical coverage rates of sandwich CIs are lower than 95% for small samples and move closer to 95% at larger samples; the phenomenon is particularly noticeable when the amounts of model error are larger (RMSEA = .08 and RMSEA = .10). The empirical coverage rates of bootstrap CIs are higher than 95% in most conditions.
Table 4.
Average Empirical Coverage Rates of CIs, CF-Varimax.
The mean empirical coverage rate (%) across all parameters. The value in parentheses in each column header is the RMSEA.

| Dist | Sample size | Info (.00) | Sand (.00) | Boot (.00) | Info (.05) | Sand (.05) | Boot (.05) | Info (.08) | Sand (.08) | Boot (.08) | Info (.10) | Sand (.10) | Boot (.10) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 100 | 92.2 | 94.5 | 97.6 | 87.9 | 93.5 | 96.8 | 82.9 | 91.9 | 96.2 | 78.1 | 89.8 | 94.9 |
| | 200 | 93.7 | 94.8 | 97.3 | 92.6 | 95.5 | 96.9 | 85.6 | 93.4 | 97.5 | 80.8 | 91.9 | 96.3 |
| | 537 | 94.5 | 95.4 | 96.1 | 93.1 | 95.2 | 96.3 | 89.8 | 95.2 | 96.7 | 83.5 | 94.2 | 96.7 |
| | 800 | 94.6 | 94.9 | 95.7 | 93.5 | 95.5 | 96.2 | 90.3 | 95.6 | 96.4 | 84.5 | 95.3 | 97.3 |
| | 2,000 | 94.8 | 95.1 | 95.1 | 93.2 | 95.0 | 95.5 | 90.3 | 95.1 | 95.3 | 85.1 | 95.0 | 96.1 |
| E | 100 | 85.2 | 93.8 | 96.5 | 82.3 | 93.5 | 96.4 | 76.2 | 91.5 | 94.8 | 73.4 | 90.1 | 93.6 |
| | 200 | 89.9 | 95.3 | 96.8 | 86.4 | 94.7 | 97.3 | 79.9 | 93.3 | 96.6 | 73.9 | 91.4 | 94.9 |
| | 537 | 90.5 | 95.1 | 96.4 | 88.5 | 95.0 | 96.5 | 84.0 | 95.2 | 96.9 | 77.0 | 93.2 | 96.6 |
| | 800 | 91.4 | 95.4 | 95.8 | 89.1 | 95.7 | 95.9 | 84.3 | 95.4 | 96.8 | 77.3 | 94.7 | 96.9 |
| | 2,000 | 91.2 | 95.1 | 95.1 | 88.6 | 95.1 | 95.3 | 85.0 | 95.1 | 95.4 | 79.7 | 95.6 | 96.5 |
| S | 100 | 85.2 | 93.3 | 93.5 | 79.7 | 92.3 | 95.1 | 75.0 | 91.1 | 92.7 | 70.7 | 88.4 | 92.2 |
| | 200 | 88.9 | 94.1 | 97.4 | 84.1 | 93.4 | 97.2 | 76.2 | 90.8 | 94.6 | 69.9 | 89.0 | 93.0 |
| | 537 | 90.4 | 94.8 | 96.0 | 88.7 | 94.9 | 97.4 | 82.2 | 94.5 | 97.7 | 73.8 | 90.3 | 96.0 |
| | 800 | 90.5 | 95.2 | 95.6 | 88.9 | 94.8 | 96.8 | 83.8 | 94.9 | 97.7 | 73.2 | 91.6 | 96.8 |
| | 2,000 | 90.2 | 94.9 | 95.0 | 89.7 | 95.2 | 95.5 | 86.1 | 95.0 | 96.4 | 78.9 | 94.4 | 97.5 |
Note. CI = confidence interval; CF = Crawford-Ferguson; RMSEA = the root mean square error of approximation; Emp = empirical SEs; Info = SEs with the information matrix; Sand = sandwich SEs; Boot = bootstrap SEs; N = normal distribution; E = elliptical distribution; S = skewed distribution.
The relative advantage of target rotation over CF-varimax rotation is expected. CF-varimax rotation is an automatic rotation method, but target rotation requires the factor analyst to provide a target matrix that reflects substantive knowledge about the factor loading pattern. Regardless of factor rotation methods, data distributions, and levels of model error, the sandwich method provides useful SE estimates and CIs at a moderately large sample size.
Concluding Comments
A parsimonious EFA model is unlikely to perfectly represent complicated real-world phenomena, and model error is always present in EFA (MacCallum, 2003). Let us consider a hypothetical scenario in which 10 manifest variables are affected by two major factors and 30 minor factors. Only the two major factors have large factor loadings and the 30 minor factors have only small loadings. A useful factor analysis model does not fit data perfectly, but it captures the influence of major common factors with the presence of minor factors that are like background noise. The information SEs and CIs may be invalid with model error, but sandwich SEs and CIs are still valid. Factor analysts can interpret rotated factor loadings and factor correlations by examining their sandwich CIs.
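The hypothetical scenario above can be made concrete with a small numerical sketch. Everything in it is an assumption chosen for illustration: the function names, the major loadings of 0.7, the range of the minor loadings, and the use of orthogonal factors. It builds a population correlation matrix from two major and 30 minor factors and compares it with the part implied by the two major factors alone. That part is not necessarily the best-fitting two-factor model; the comparison only shows that minor factors leave small, nonzero discrepancies.

```python
import random


def cross_products(loadings):
    """Return L L' for a loading matrix stored as a list of rows."""
    p = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(p)] for i in range(p)]


def build_example(p=10, n_minor=30, minor_scale=0.05, seed=7):
    """Population correlations from major plus minor (orthogonal) factors."""
    rng = random.Random(seed)
    # Hypothetical major loadings: first half on factor 1, second half on factor 2
    major = [[0.7, 0.0] if i < p // 2 else [0.0, 0.7] for i in range(p)]
    # Hypothetical minor loadings: many small values, like background noise
    minor = [[rng.uniform(-minor_scale, minor_scale) for _ in range(n_minor)]
             for _ in range(p)]
    major_part = cross_products(major)
    minor_part = cross_products(minor)
    full = [[major_part[i][j] + minor_part[i][j] for j in range(p)]
            for i in range(p)]
    for i in range(p):
        unique_variance = 1.0 - full[i][i]  # positive for these loadings
        full[i][i] += unique_variance       # unit variance for every variable
    return major_part, full


def max_offdiag_gap(a, b):
    """Largest absolute off-diagonal difference between two square matrices."""
    p = len(a)
    return max(abs(a[i][j] - b[i][j])
               for i in range(p) for j in range(p) if i != j)


if __name__ == "__main__":
    major_part, full = build_example()
    print(f"largest discrepancy: {max_offdiag_gap(major_part, full):.4f}")
```

The discrepancy is small but not zero, which is the sense in which a useful two-factor model captures the major factors without fitting the population correlations perfectly.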
A common reason for nonnormal distributions is the use of ordinal variables. To accommodate ordinal variables, the polychoric correlation matrix is factor analyzed. Because polychoric correlation matrices are often not positive definite, ML estimation is infeasible. The authors consider OLS estimation for its computational robustness. Although estimating factor loadings and factor correlations involves only polychoric correlations, estimating SEs and CIs involves the asymptotic covariance matrix of polychoric correlations. Estimating such a large matrix is a nontrivial task. For example, there are 44 ordinal variables in the second empirical study, and the corresponding asymptotic covariance matrix has nearly half a million nonduplicated elements. The authors implemented an algorithm that uses an estimating equation approach (Yuan & Schuster, 2013) to estimate the asymptotic covariances of polychoric correlations.
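The size of the asymptotic covariance matrix mentioned above follows from simple counting, sketched below. The function names are hypothetical; the only input taken from the text is the number of ordinal variables (44). With p variables there are p(p - 1)/2 distinct correlations, and a symmetric matrix of that order has k(k + 1)/2 nonduplicated elements.

```python
def n_correlations(p):
    """Number of distinct off-diagonal correlations among p variables."""
    return p * (p - 1) // 2


def n_unique_acm_elements(p):
    """Nonduplicated elements of the symmetric asymptotic covariance matrix."""
    k = n_correlations(p)
    return k * (k + 1) // 2


if __name__ == "__main__":
    p = 44
    print(n_correlations(p))         # 946
    print(n_unique_acm_elements(p))  # 447931
```

For 44 variables this gives 946 polychoric correlations and 447,931 nonduplicated elements, which matches the "nearly half a million" figure in the text.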
The sandwich SE estimator is more versatile than the bootstrap method (Ichikawa & Konishi, 1995) and the infinitesimal jackknife method (Zhang et al., 2012). The bootstrap method is inappropriate for geomin rotation, which tends to produce multiple local solutions (Browne, 2001; Hattori, Zhang, & Preacher, 2017). The infinitesimal jackknife method is equivalent to the sandwich SE estimator when manifest variables are continuous, but it is inappropriate for ordinal variables. The sandwich SE estimator can be adapted in both situations.
The R package EFAutilities (Zhang et al., 2018) implements the sandwich SE estimator and the corresponding CIs. It computes SEs and CIs for EFA parameters with normal and nonnormal data, two types of estimation method (ML and OLS), and oblique rotation and orthogonal rotation with four rotation criteria (CF-varimax, CF-quartimax, geomin, or target), with any level of model approximation error.
Supplemental Material
Supplemental material, Online_supplement for A Sandwich Standard Error Estimator for Exploratory Factor Analysis With Nonnormal Data and Imperfect Models by Guangjian Zhang, Kristopher J. Preacher, Minami Hattori, Ge Jiang and Lauren A. Trichtinger in Applied Psychological Measurement
Notes
1. The information SE is commonly referred to as the normal theory based SE. The authors avoid this name because it ignores the assumption of a perfectly fitting model in the population.
2. Most EFA models are estimated with sample correlation matrices. If a factor analyst is interested in estimating EFA models with a sample covariance matrix, the sandwich method can be easily adapted. The adaptation involves replacing the asymptotic covariance matrix of the sample correlations with the asymptotic covariance matrix of the unique elements of a sample covariance matrix. All other components remain the same.
3. The address for the online file is https://www3.nd.edu/~gzhang3/Papers/SandwichEFA/SandwichEFA.html
4. The authors include comparisons between information SEs and sandwich SEs in the online supporting file. For nearly all parameters, sandwich SE estimates are larger than the corresponding information SE estimates.
5. The authors thank Shanhong Luo for making the data set available to them.
6. The four population correlation matrices are included in the online supporting file.
7. The results for CIs and SE estimates for single parameters are included in the online file.
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material: Supplemental material is available for this article online.
References
- Asparouhov T., Muthén B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397-438. doi: 10.1080/10705510903008204
- Browne M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36, 111-150. doi: 10.1207/s15327906mbr3601_05
- Browne M. W., Cudeck R. (1993). Alternative ways of assessing model fit. In Bollen K. A., Long J. S. (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
- Browne M. W., Cudeck R., Tateneni K., Mels G. (2010). CEFA 3.04: Comprehensive Exploratory Factor Analysis. Retrieved from http://faculty.psy.ohio-state.edu/browne/programs.htm
- Browne M. W., Shapiro A. (1986). The asymptotic covariance matrix of sample correlation coefficients under general conditions. Linear Algebra and Its Applications, 82, 169-176. doi: 10.1016/0024-3795(86)90150-3
- Cheung F. M., Leung K., Fan R., Song W., Zhang J., Zhang J. (1996). Development of the Chinese Personality Assessment Inventory. Journal of Cross-Cultural Psychology, 27, 181-199. doi: 10.1177/0022022196272003
- Cudeck R., O’Dell L. L. (1994). Applications of standard error estimates in unrestricted factor analysis: Significance tests for factor loadings and correlations. Psychological Bulletin, 115, 475-487. doi: 10.1037//0033-2909.115.3.475
- Gorsuch R. L. (1983). Factor analysis (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. doi: 10.4324/9780203781098
- Hattori M., Zhang G., Preacher K. J. (2017). Multiple local solutions and geomin rotation. Multivariate Behavioral Research, 52, 720-731. doi: 10.1080/00273171.2017.1361312
- Ichikawa M., Konishi S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77-93. doi: 10.1007/bf02294430
- Jennrich R. I. (1973). Standard errors for obliquely rotated factor loadings. Psychometrika, 38, 593-604. doi: 10.1007/bf02291497
- Jennrich R. I. (1974). Simplified formulae for standard errors in maximum-likelihood factor analysis. British Journal of Mathematical and Statistical Psychology, 27, 122-131. doi: 10.1111/j.2044-8317.1974.tb00533.x
- John O. P., Donahue E. M., Kentle R. L. (1991). The Big Five Inventory—Versions 4a and 54. Berkeley, CA: University of California, Berkeley, Institute of Personality and Social Research. doi: 10.1037/t07550-000
- Lee C.-T., Zhang G., Edwards M. C. (2012). Ordinary least squares estimation of parameters in exploratory factor analysis with ordinal data. Multivariate Behavioral Research, 47, 314-339. doi: 10.1080/00273171.2012.658340
- Luo S. (2005). Personality and relationship satisfaction. Unpublished studies.
- Luo S., Chen H., Yue G., Zhang G., Zhaoyang R., Xu D. (2008). Predicting marital satisfaction from self, partner, and couple characteristics: Is it me, you, or us? Journal of Personality, 76, 1231-1266. doi: 10.1111/j.1467-6494.2008.00520.x
- MacCallum R. C. (2003). Working with imperfect models. Multivariate Behavioral Research, 38, 113-139. doi: 10.1207/s15327906mbr3801_5
- Olsson U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443-460. doi: 10.1007/bf02296207
- SAS Institute. (2006). SAS/STAT 9.2 user’s guide. Cary, NC: Author.
- Tateneni K. (1998). Use of automatic and numerical differentiation in the estimation of asymptotic standard errors in exploratory factor analysis (Doctoral dissertation). The Ohio State University, Columbus.
- White H. (1981). Consequences and detection of misspecified nonlinear regression models. Journal of the American Statistical Association, 76, 419-443. doi: 10.1080/01621459.1981.10477663
- Yuan K.-H., Bentler P. M. (1997). Generating multivariate distributions with specified marginal skewness and kurtosis. In Bandilla W., Faulbaum F. (Eds.), Softstat’97 advances in statistical software 6 (pp. 385-391). Stuttgart, Germany: Lucius & Lucius.
- Yuan K.-H., Hayashi K. (2003). Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models. British Journal of Mathematical and Statistical Psychology, 56, 93-110. doi: 10.1348/000711003321645368
- Yuan K.-H., Marshall L. L., Bentler P. M. (2002). A unified approach to exploratory factor analysis with missing data, nonnormal data, and in the presence of outliers. Psychometrika, 67, 95-121. doi: 10.1007/bf02294711
- Yuan K.-H., Schuster C. (2013). Overview of statistical estimation methods. In Little T. D. (Ed.), The Oxford handbook of quantitative methods (pp. 361-387). New York, NY: Oxford University Press.
- Zhang G., Jiang G., Hattori M., Trichtinger L. (2018). Utility functions for exploratory factor analysis (Version 1.2.2) [Computer software manual]. Retrieved from https://cran.r-project.org/web/packages/EFAutilities/EFAutilities.pdf
- Zhang G., Preacher K. J., Jennrich R. I. (2012). The infinitesimal jackknife with exploratory factor analysis. Psychometrika, 77, 634-648. doi: 10.1007/s11336-012-9281-5
