Abstract
Exploratory factor analysis (EFA) is widely used by researchers in the social sciences to characterize the latent structure underlying a set of observed indicator variables. One of the primary issues that must be resolved when conducting an EFA is determining the number of factors to retain. A large number of statistical tools have been designed to address this question, none of which is universally optimal across applications. Recently, researchers have investigated whether model fit indices commonly used in confirmatory factor analysis can determine the number of factors to retain in EFA. This research has yielded mixed results: the indices appear to be effective when used with normally distributed indicators, but less so with categorical indicators. The purpose of this simulation study was to compare the performance of difference values of several fit indices, as a method for identifying the optimal number of factors to retain in an EFA, with parallel analysis, one of the most reliable extant methods. Results of the simulation demonstrated that the use of fit index difference values outperformed parallel analysis for categorical indicators, and for normally distributed indicators when factor loadings were small. Implications of these findings are discussed.
Keywords: exploratory factor analysis, model fit statistics, root mean square error of approximation, RMSEA, parallel analysis, comparative fit index, CFI
One of the most popular statistical methods used in educational and psychological research is exploratory factor analysis (EFA). A review of the PsycINFO database reveals that between January 1, 2008 and December 31, 2018 a total of 9,264 citations with “exploratory factor analysis” as a subject were published. This method is used in a variety of ways, including as a step in the development of validity evidence for scales (Ratti, Vickerstaff, Crabtree, & Hassiotis, 2017), to explore the latent structure of psychological and educational constructs (Coker, Catlin, Ray-Griffith, Knight, & Stowe, 2018), and as a precursor to confirmatory factor analysis (CFA; Canivez, Watkins, & McGill, 2019). Because EFA is inherently exploratory in nature, meaning that the number of underlying factors is not known a priori, one of the key aspects in the successful application of this methodology is the determination of the number of factors to retain. There are a large number of techniques available to researchers for this purpose, with no one approach having been determined to be optimal under all situations. In addition, research has demonstrated that some of the more promising techniques for determining the number of factors to retain in an EFA may perform differently for different types of indicator variables (e.g., Wirth & Edwards, 2007; Yang & Xia, 2015).
Recently, Clark and Bowles (2018) investigated the performance of several statistics traditionally used to characterize model fit in CFA for determining the number of factors to retain in the context of EFA. Their results revealed mixed performance for several of these statistics, with a recommendation that future research investigate the use of fit indices in the context of EFA. The current study was designed to build on these earlier results by using differences in fit index values to ascertain the number of factors to retain in an EFA. The remainder of this article is organized as follows. First, the EFA model is introduced briefly, followed by a brief review of traditional methods for determining the number of factors to retain, after which the fit indices used in the current study are described. Previous work on using these indices to determine the number of factors to retain in EFA is then reviewed, followed by the goals and method of the current simulation study. The results of these simulations are then described, and the article concludes with a discussion of these results and their implications for practice.
Exploratory Factor Analysis
The standard EFA model can be expressed as
x = Λξ + δ (1)

where
x = matrix of observed indicator variables
ξ = matrix of factor(s)
Λ = matrix of factor loadings relating indicators to factor(s)
δ = matrix of unique random errors associated with the observed indicators.
Of particular interest in the context of EFA is the factor loading matrix, Λ, linking the indicators to the factors. There exist a number of methods available for estimating the model parameters in Equation (1), with perhaps the most widely used being maximum likelihood estimation and principal axis factoring. After the initial factors are extracted, they are typically rotated in order to improve interpretability of the results through the mathematical encouragement of simple structure in Λ, that is, each indicator will be associated with a single factor.
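As a concrete illustration of one of the estimation methods named above, the following is a minimal, pure-NumPy sketch of principal axis factoring: communality estimates are placed on the diagonal of the correlation matrix, loadings are taken from the leading eigenpairs of that reduced matrix, and the communalities are updated iteratively. This is an illustrative sketch under simplified assumptions (no convergence check, no rotation), not a production implementation.

```python
import numpy as np

def principal_axis(R, n_factors, n_iter=50):
    """Sketch of principal axis factoring on a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communality estimates: squared multiple correlations
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)              # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1][:n_factors]
        # Loadings from the leading eigenvalues/eigenvectors
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        h2 = (L ** 2).sum(axis=1)             # updated communalities
    return L

# Example: recover loadings from a correlation matrix implied by one factor
L_true = np.array([[0.7], [0.6], [0.7]])
R = L_true @ L_true.T
np.fill_diagonal(R, 1.0)
print(principal_axis(R, 1).round(2))  # approx. +/-(0.7, 0.6, 0.7)
```

Note that the sign of an extracted factor is arbitrary, so the recovered loadings may all be negated.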
As noted above, several methods have been suggested for determining the optimal number of factors to retain. However, given that EFA is exploratory in nature, determining the number of factors to retain is an inherently uncertain task. Unlike with confirmatory factor analysis, the researcher using EFA has no firm notion regarding the number of factors that might underlie the observed indicators. Furthermore, including additional factors will always yield a better statistical fit to the data. In other words, if the only goal of the researcher is to more fully explain the covariance matrix for the observed variables, then including more factors will always be the preferred strategy. However, this instinct runs counter to the desire of researchers to explain the observed data with the most parsimonious model possible. Stated differently, it is advantageous from a conceptual perspective to retain the fewest number of factors that provide a satisfactory statistical explanation of the observed variable covariance matrix, and that also yield a theoretically meaningful solution. The effort to achieve this balance, and to do so using as objective a set of tools as possible, has continued to prove challenging in many situations. For this reason, statisticians have continued to work on the development of techniques to assist researchers in determining the optimal number of factors to retain. The next section of the article is devoted to a review of these approaches.
Traditional Methods for Determining the Number of Factors to Retain
There is a wide array of methods available to researchers for determining the optimal number of factors to retain in the context of EFA. Among the earliest of these approaches are the scree plot and the eigenvalue-greater-than-1 rule. With the latter rule, the researcher retains all factors that have an eigenvalue of 1 or more, under the theory that such factors account for more variance in the observed data than does any single variable, which has a variance of 1 when standardized. For the scree plot, eigenvalues are plotted against their associated factors, and the researcher examines the plot to determine at what point there is a marked change in the trajectory of the line connecting these points. The number of factors at which that change occurs is taken to be the appropriate number of factors to retain. Both the eigenvalue-greater-than-1 rule and the scree plot have been shown to be relatively inaccurate for the purpose of determining the number of factors to retain (Crawford & Koopman, 1979; Pett, Lackey, & Sullivan, 2003; Raiche, Wall, Magis, Riopel, & Blais, 2013). The use of the residual correlation matrix (i.e., the matrix reflecting the difference between the observed correlations among indicators and the model-predicted correlations) is a more effective tool for determining the number of factors to retain, but can be unwieldy in practice, particularly when there are a large number of indicators (e.g., Gorsuch, 1983). When researchers use maximum likelihood estimation to fit the factor model, the difference in chi-square goodness-of-fit test values for models with different numbers of factors can be calculated and compared with the chi-square distribution. The chi-square statistic itself tests the null hypothesis that the model-implied covariance matrix is identical to the covariance matrix observed in the data.
However, researchers have demonstrated that this approach is very sensitive to sample size, and in many cases may not yield accurate findings (Tong & Bentler, 2013). Other methods that have been suggested for use in determining the number of factors to retain in an EFA are the minimum average partial (MAP; Velicer, 1976), very simple structure (VSS; Revelle & Rocklin, 1979), and a set of objective measures based on the scree plot (see Raiche et al., 2013, for a description of these). MAP involves calculating the average squared partial correlation among the observed variables after the factors have been partialed out; the appropriate number of factors to retain is associated with the smallest of these average squared partial correlations. VSS is based on the ratio of the sum of squares of the residual correlations to the sum of squares of the observed correlations. The objective approaches based on the scree plot each use a different technique to fit a line to the plot of eigenvalues against the number of factors, in an effort to objectively identify where the relationship changes direction most markedly.
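The eigenvalue-greater-than-1 rule described above can be sketched in a few lines: count the eigenvalues of the correlation matrix that exceed 1. The correlation matrix below is illustrative, not taken from the study.

```python
import numpy as np

# Illustrative correlation matrix for four observed indicators
R = np.array([[1.0, 0.5, 0.4, 0.1],
              [0.5, 1.0, 0.5, 0.1],
              [0.4, 0.5, 1.0, 0.2],
              [0.1, 0.1, 0.2, 1.0]])

# Eigenvalues of a correlation matrix sum to the number of variables
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order

# Kaiser rule: retain as many factors as eigenvalues above 1
n_retain = int((eigenvalues > 1).sum())
print(eigenvalues.round(3), n_retain)
```

The same eigenvalue vector, plotted against factor number, is what the scree plot displays.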
Parallel Analysis
Perhaps the most consistent positive performer in terms of determining the number of factors to retain is parallel analysis (PA). This method was first described by Horn (1965), and involves the generation of synthetic data that have the same marginal properties (i.e., means and variances) as the actual observed data, but which have no underlying latent structure, that is, 0 factors. Typically, a large number (e.g., 1,000) of these synthetic data sets are generated, and factor analysis is conducted for each. The eigenvalues from these analyses on the synthetic data are then used to create distributions of eigenvalues that would be expected if no underlying factor structure is present. The eigenvalues obtained from a factor analysis of the observed data are then compared with these null distributions in order to determine the number of factors to retain. A factor is retained if the eigenvalue from the observed data is larger than the 95th percentile of the distribution of null factor eigenvalues generated from the synthetic data. The synthetic data used in PA can be generated parametrically from a known distribution, such as multivariate normal with means and variances equal to the means and variances of the observed data, and covariances of 0 among the generated variables. Alternatively, it can be created nonparametrically through the random mixing of indicator variable values within variables across observations.
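The nonparametric variant just described can be sketched as follows. This is a minimal illustration of the classic Horn-style procedure using eigenvalues of the full correlation matrix; the study itself used principal axis factoring on reduced (polychoric) correlation matrices, so treat this as a simplified sketch rather than the authors' exact implementation.

```python
import numpy as np

def parallel_analysis(X, n_sets=1000, percentile=95, seed=0):
    """Nonparametric PA: permute columns to build null eigenvalue distributions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    null_eigs = np.empty((n_sets, p))
    for i in range(n_sets):
        # Independently shuffle each variable across observations,
        # destroying any latent structure while keeping the marginals
        Xs = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        null_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Xs, rowvar=False)))[::-1]
    thresholds = np.percentile(null_eigs, percentile, axis=0)
    exceeds = observed > thresholds
    # Retain factors up to the first that fails to beat its null threshold
    return int(np.argmax(~exceeds)) if not exceeds.all() else p

# Example with a true two-factor structure and strong loadings
rng = np.random.default_rng(1)
L = np.array([[0.7, 0], [0.6, 0], [0.7, 0], [0, 0.7], [0, 0.6], [0, 0.7]])
F = rng.normal(size=(300, 2))
E = rng.normal(scale=np.sqrt(1 - (L ** 2).sum(axis=1)), size=(300, 6))
X = F @ L.T + E
print(parallel_analysis(X, n_sets=200))
```

Using the 50th percentile of the null distributions instead of the 95th corresponds to the PA_50 variant examined later in the study.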
Recently, two alternative approaches for conducting PA have been proposed. Green, Levy, Thompson, Lu, and Lo (2012) proposed the use of what they called revised PA (R-PA), in which the comparison data are generated assuming that k − 1 latent variables underlie the observed data, rather than 0 factors as is the case with traditional PA. Thus, when testing for four factors, R-PA would generate data from a three-factor model, holding the marginal distributions of the variables equal to those in the observed sample. Concurrent with the work by Green et al., Ruscio and Roche (2012) described the comparative data method for determining the number of factors to retain. The comparative data method generates a sample of 10,000 cases based on a correlation matrix associated with k − 1 latent variables. Next, 500 random samples of n observations (where n is equal to the observed data sample size) are drawn from the 10,000 cases, and principal components analysis is conducted in order to obtain eigenvalues. The root mean square residual (RMSR) is calculated between the eigenvalues obtained from each of the 500 comparison data sets and the eigenvalues obtained from the observed data. Subsequently, the same set of steps is conducted for k latent variables. The RMSR values for the k and k − 1 solutions are compared with one another using the Mann–Whitney U test with an alpha of .30. The number of factors to retain is identified when the null hypothesis of the test is not rejected, in which case k − 1 factors are retained. As an example, if the Mann–Whitney U test is not statistically significant for k = 3 versus k − 1 = 2 factors, then the researcher would retain 3 − 1, or 2, factors.
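The decision step of the comparative data method can be sketched as below. The RMSR arrays are illustrative stand-ins for values that would actually be computed from comparison data sets; only the Mann–Whitney comparison logic is shown.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def factors_to_retain(rmsr_by_k, alpha=0.30):
    """rmsr_by_k[k] holds the RMSR values for the k-factor comparison data."""
    ks = sorted(rmsr_by_k)
    for k_prev, k_next in zip(ks, ks[1:]):
        p = mannwhitneyu(rmsr_by_k[k_prev], rmsr_by_k[k_next]).pvalue
        if p > alpha:          # no reliable improvement from the extra factor
            return k_prev
    return ks[-1]

# Illustrative RMSR distributions (500 comparison samples per solution)
rng = np.random.default_rng(0)
rmsr = {
    1: rng.normal(0.30, 0.02, 500),   # one factor reproduces eigenvalues poorly
    2: rng.normal(0.05, 0.01, 500),   # two factors reproduce them well
}
rmsr[3] = rmsr[2].copy()              # a third factor adds nothing
print(factors_to_retain(rmsr))        # retains 2 factors
```

The relatively liberal alpha of .30 is the value recommended by Ruscio and Roche for this comparison.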
Research has demonstrated that PA and its variants are among the most effective methods for identifying the number of factors to retain in the context of EFA (Fabrigar & Wegener, 2011; Green, Redell, Thompson, & Levy, 2016; Green, Thompson, Levy, & Lo, 2015; Green, Xu, & Thompson, 2018; Preacher & MacCallum, 2003). Early simulation results consistently demonstrated that PA tended to identify the correct number of factors underlying a set of observed indicators more frequently than did most other alternative approaches, such as the scree plot, the eigenvalue-greater-than-1 rule, and proportion of variance explained by the factors. More recent work has shown that R-PA and the comparative data method may yield somewhat more accurate results than standard PA, and that R-PA is perhaps the most consistently accurate performer across a variety of conditions (Green et al., 2016). It is also true, however, that when the underlying latent traits are highly correlated with one another, PA has difficulty correctly identifying the number of factors to retain (Caron, 2018).
Model Fit Statistics
Recently, researchers have explored the use of several model fit indices that are most commonly associated with use in the context of CFA to determine the number of factors to retain in EFA (Clark & Bowles, 2018; Garrido, Abad, & Ponsoda, 2016; Preacher, Zhang, Kim, & Mels, 2013). A great deal of work has been conducted examining the performance of common fit indices, such as the comparative fit index (CFI), the Tucker–Lewis index (TLI), the standardized root mean square residual (SRMR), and the root mean square error of approximation (RMSEA), in the context of identifying correct CFA models. Each of these indices places a somewhat different emphasis on different aspects of model-data fit. For example, RMSEA is an absolute fit index that assesses the extent to which the model chi-square goodness-of-fit statistic departs from the degrees of freedom. Under the null hypothesis of perfect fit in the population, the expected value of the chi-square statistic equals the model degrees of freedom. The further chi-square departs from the degrees of freedom, the poorer the fit, and the larger the value of RMSEA. Statistical simulation work has shown that RMSEA typically indicates good fit when there are more degrees of freedom and larger samples (F. Chen, Curran, Bollen, Kirby, & Paxton, 2008). A common recommendation for practice is that RMSEA values of 0.05 or less indicate good fit, although F. Chen et al. found that this heuristic is not universally correct in terms of identifying good-fitting models. CFI and TLI are also global model fit indices, each of which compares the fit of the target model with that of a baseline model, in which no underlying factor structure is posited. Larger values of the CFI indicate improved fit of the target model vis-à-vis the baseline, with values of 0.9 or higher typically being used to identify models that fit the data well (Hu & Bentler, 1999).
The TLI is similar to the CFI, but rewards more parsimonious models through inclusion of the baseline model degrees of freedom in its calculation. In other words, of the two statistics, TLI is more likely than CFI to favor factor models with fewer loadings. As with CFI, values of TLI greater than 0.9 are typically taken as indicators of good model fit. Finally, SRMR is the square root of the mean of the squared residuals for the correlations among the indicator variables. As with RMSEA, SRMR can be seen as a badness-of-fit index in that larger values indicate a greater departure of the model predictions from the data, that is, larger residual correlations. Values of SRMR less than 0.08 are typically taken as indicators of good model fit (Hu & Bentler, 1999).
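The definitions above can be made concrete with small helper functions. These follow standard SEM formulas for the four indices; the chi-square values, degrees of freedom, and residual matrix in the example are illustrative numbers, not results from the study.

```python
import math
import numpy as np

def rmsea(chisq, df, n):
    # Departure of chi-square from its degrees of freedom, scaled by df and n
    return math.sqrt(max(chisq - df, 0) / (df * (n - 1)))

def cfi(chisq_t, df_t, chisq_b, df_b):
    # Improvement of the target model (t) over the zero-factor baseline (b)
    num = max(chisq_t - df_t, 0)
    den = max(chisq_t - df_t, chisq_b - df_b, 0)
    return 1 - num / den if den > 0 else 1.0

def tli(chisq_t, df_t, chisq_b, df_b):
    # Like CFI, but the df ratios reward more parsimonious models
    return ((chisq_b / df_b) - (chisq_t / df_t)) / ((chisq_b / df_b) - 1)

def srmr(residual_corr):
    # One common variant: RMS of the off-diagonal correlation residuals
    tri = residual_corr[np.triu_indices_from(residual_corr, k=1)]
    return float(np.sqrt(np.mean(tri ** 2)))

# Example: a well-fitting target model relative to a poorly fitting baseline
print(round(rmsea(30.0, 25, 400), 3))        # 0.022 -> good fit
print(round(cfi(30.0, 25, 500.0, 36), 3))    # 0.989 -> good fit
```

These functions return the point estimates only; software such as Mplus also reports confidence intervals (e.g., for RMSEA) that the sketch omits.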
Results from this work have demonstrated that these fit indices are sensitive to a wide variety of factors, including sample size, number of indicators, factor loading values, the number of factors present in the model, and the type of indicators (e.g., Beauducel & Wittmann, 2005; F. Chen et al., 2008; Kenny, Kaniskan, & McCoach, 2015; Marsh, Hau, & Wen, 2004; Yuan, 2005). In addition, the thresholds that are commonly recommended for use with these indices were originally developed in the context of normally distributed continuous indicators, and their performance in the context of factor analysis with categorical indicator variables has been mixed. With respect to the performance of CFI and TLI with a 0.95 threshold value, accuracy at identifying model fit has been shown to degrade when discrete indicator variables have a small number of categories (Beauducel & Herzberg, 2006). Research on the performance of RMSEA in the context of categorical indicators has been more mixed, with some studies finding that the number of categories has very little impact on its performance (DiStefano & Morgan, 2014), whereas other work has shown that the commonly used thresholds employed with continuous data do not work well with categorical indicators (Monroe & Cai, 2015). This earlier work was extended by Clark and Bowles (2018), who found that the commonly used thresholds employed with continuous indicator variable models are often not applicable when the indicators are categorical. Despite the potential weaknesses of these indices in some contexts, they remain highly popular tools for researchers in determining the extent to which a particular factor model fits a sample of data, as is evidenced by their ubiquity in the applied factor analysis literature.
Determining the Number of Factors to Retain Using Fit Indices With EFA
Despite their popularity among researchers in the context of CFA, fit indices are used much less commonly to determine the number of factors to retain in EFA, though some work in this regard has been done. For example, Preacher et al. (2013) conducted a simulation study involving normally distributed indicators in which they compared the performance of RMSEA, and the lower bound of the RMSEA confidence interval, with that of two relative fit indices, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Their results showed that AIC may be the optimal tool for selecting consistently replicable models, whereas the lower bound of RMSEA may be most appropriate for identifying the actual data generating model. Based on results of an applied study but not simulations, Frazier and Youngstrom (2007) suggested that CFI and TLI could be good alternatives for determining the number of factors to retain in the context of EFA. Recent work by Barendse, Oort, and Timmerman (2015) and Yang and Xia (2015) has further developed this literature for both continuous and categorical indicator variables by showing that RMSEA appears to provide accurate results with respect to the number of factors to retain when samples are large and interfactor correlations are small. Yang and Xia also found that CFI performed similarly to RMSEA under many, though not all, conditions.
With respect to factor extraction for EFA when indicator variables were categorical in nature, Garrido et al. (2016) investigated the performance of fit indices for determining the number of factors to retain for ordinal variables (2, 3, 4, and 5 categories), as well as continuous indicators. They found that CFI and TLI, used in conjunction with a robust-weighted least squares estimator (WLSMV), yielded slightly more accurate results for determining the number of factors to retain than did RMSEA or SRMR, and that these results were comparable to those produced by PA. Garrido et al. also reported that the performance of specific cut values interacted with conditions such as the number of variables per factor, sample size, and skewness of the categorical variable, and that CFI/TLI tended to underfactor for high interfactor correlations, whereas RMSEA was more negatively influenced by weak factor loadings than were the other approaches. Clark and Bowles (2018) further extended work in this area by focusing on dichotomous and normally distributed indicator variables, and comparing the performance of CFI, TLI, RMSEA, and the chi-square test for determining the number of factors to retain. Their study expanded on several study conditions that were used in prior studies, including the values of the factor loadings, interfactor correlations, and the number of items per factor. Clark and Bowles found that the fit indices were most accurate in terms of determining the number of factors to retain when the indicator variables were continuous and that when the indicators were dichotomous, CFI and TLI with a 0.95 cut-value were the most accurate of the fit indices studied. They also found that these approaches were most accurate when the factor loadings were relatively large, and when the interfactor correlations were smaller. 
Based on their findings, Clark and Bowles concluded that use of CFI and TLI with the 0.95 cut-value may be recommended for use, particularly when researchers want to avoid underfactoring their model, and that RMSEA may not be useful for this purpose. Clark and Bowles also called for future work in which differences in fit statistic values serve as the mechanism by which the number of factors to retain is determined. The current study is designed to further this research area by investigating the accuracy of fit index differences in determining the number of factors to retain.
Study Goals
The primary goal of this study was to extend prior work in the area of using fit statistics to determine the number of factors to retain in the context of EFA, by examining how well differences in several fit statistics identify the correct number of factors underlying a set of indicators. Prior research has demonstrated that the use of these fit indices with commonly recommended threshold values is effective for identifying the correct number of factors to retain in some instances, but not others, and particularly not when the indicators are categorical in nature with a small number of categories. Therefore, the current study was designed to ascertain whether the use of differences in these fit indices might be better able to identify the number of factors to retain for EFA. The calculation and application of these difference statistics are described in more detail below. The indicator variables in the current study were simulated to be normally distributed, ordinal with four categories, and dichotomous. Comparative fit of different factor solutions was assessed using the difference in several statistics, including CFI, TLI, RMSEA, and SRMR. The difference was calculated for solutions with adjacent numbers of factors (i.e., 1 vs. 2, 2 vs. 3, and 3 vs. 4). Several cut-values for indicating a difference in fit were compared with one another. In addition, these fit statistic difference values were compared with the performance of PA and R-PA, which have been shown to be effective tools for determining the number of factors to retain for both categorical and continuous indicator variables (Garrido et al., 2016; Green et al., 2018; Reilly & Eaves, 2000; Velicer, Eaton, & Fava, 2000; Wang, 2001).
As outlined above, previous work in this area has demonstrated a mixed pattern of performance in terms of the ability of fit indices to accurately indicate the number of factors to retain when indicator variables are dichotomous (Clark & Bowles, 2018). On the other hand, for continuous indicators the use of such fit indices appears to yield more accurate findings (Preacher et al., 2013). Given this mixed set of findings, the purpose of this study was to ascertain whether the use of differences in fit indices could provide useful information regarding the number of factors to retain in EFA, under a variety of conditions.
Method
In order to address the goals of the study as outlined above, a Monte Carlo simulation study was conducted, with 1,000 replications per combination of study conditions. This study is an extension of the simulation described by Clark and Bowles (2018). Data were generated using Mplus Version 8 (Muthén & Muthén, 2018), and several conditions were varied, as described below. In addition, Mplus was used to fit the EFA models and obtain the model fit statistics, whereas the R software package Version 3.5.1 (R Core Team, 2017) was used to conduct PA through the fa.parallel function in the psych library (Revelle, 2018), and to conduct revised PA through functions downloaded from https://web.asu.edu/samgreen/software. For PA and revised PA, principal axis factoring was used for estimating the model parameters. The reduced correlation matrix was used for normally distributed indicator variables, whereas the reduced polychoric correlation matrices were used for the ordinal and dichotomous indicators. For the model fit index approaches, maximum likelihood was used for normal indicator variables, and robust weighted least squares (WLSMV in Mplus) was used for the dichotomous and ordinal indicator variable conditions. This is the same estimation approach that has been used in earlier research examining the use of fit indices with EFA (e.g., Clark & Bowles, 2018). All factor models were generated with a simple structure such that each indicator was associated with only a single factor, and all cross-loadings were set to 0. This simple structure model was simulated in order to allow for an investigation of the fit statistic difference values in the simplest case, and so as to keep the scope of the study within reasonable bounds. However, it is recognized that future research should include more complex factor models.
Several variables were manipulated in this study in order to assess how well the methods under study could identify the appropriate number of factors under a variety of realistic conditions. These conditions are outlined below, and were selected based on prior work in this area (e.g., Clark & Bowles, 2018; Garrido et al., 2016).
Number of Factors
Data were simulated from a two- or three-factor model, in order to determine whether the methods performed differentially for different numbers of factors. For each replication, EFA models for one, two, three, and four factors were fit to the data, and the relevant statistics were retained for calculating the difference values.
Number and Type of Indicator Variables
The simulation included conditions with both 5 and 10 indicator variables per factor. Thus, in the two-factor, five-indicator condition, 5 variables were associated with Factor 1 and 5 with Factor 2, yielding a total of 10 indicator variables. For the corresponding three-factor condition, there were 5 indicators for each latent variable, resulting in 15 indicators altogether. The indicators were simulated to come from one of three distributions: standard normal, dichotomous, and ordinal with four categories. For a given set of replications, all indicators were of the same type. For the dichotomous items, threshold values were taken from a standardized eighth-grade reading examination and varied between −1.96 and 1.88. For the ordinal indicators with four categories, threshold values from a motivation assessment were used, such that all thresholds fell between −2.40 and 2.80. The specific thresholds for both the ordinal and dichotomous variables appear in Table 1. For each replication, the requisite number of indicators (5 or 10 per factor) was randomly selected from this set of threshold values.
Table 1.
Threshold Values Used to Generate Categorical Indicator Variables.
Indicator | Ordinal thresholds | Dichotomous threshold
---|---|---
1 | −1.06, 0.52, 1.00 | −1.96
2 | 0.05, 0.57, 1.38 | −0.28
3 | 0.86, −0.13, 0.77 | 0.86
4 | −2.40, −1.61, −0.39 | 0.75
5 | −0.75, 0.11, 1.27 | −0.72
6 | −0.72, 0.00, 0.94 | 1.39
7 | −0.06, 0.90, 1.88 | −0.91
8 | −1.29, −0.17, 0.65 | −0.43
9 | 0.33, 1.41, 2.80 | −0.30
10 | −0.48, 0.16, 0.73 | −1.19
11 | −0.87, −0.04, 1.12 | 1.88
12 | −2.01, −1.30, −0.27 | 0.98
13 | −0.56, 0.13, 0.97 | −0.67
14 | −1.29, −0.36, 0.28 | 0.25
15 | −0.27, 0.31, 0.86 | −0.40
Sample Size
Sample sizes of 100, 200, 300, 400, or 500 were simulated. These values were selected to represent cases ranging from a relatively small sample (100) to a large sample (500), given what is seen in practice.
Factor Loadings and Interfactor Correlations
The factor loadings were manipulated to be 0.35, 0.50, or 0.70. The data were generated so that these loadings were standardized, meaning that the error variance for each indicator was 1 − λ², where λ is the indicator's standardized loading. These values were selected because they replicate those used by Clark and Bowles (2018), and represent common values seen in practice. The correlations among the factors were 0.2, 0.45, and 0.7, again in order to replicate Clark and Bowles. These values represent small, medium, and large relationships among the latent variables.
Methods for Determining the Number of Factors to Retain
Several approaches were used to determine the number of factors to retain, including PA using the 95th percentile (PA_95), PA using the 50th percentile (PA_50), and revised PA using the 95th percentile (RPA_95). In addition, for the PA methods, both the parametric and nonparametric approaches were included in the study. In terms of the fit indices, differences in CFI, TLI, RMSEA, and SRMR values were all examined. Multiple cut-values for the differences were used. For CFI and TLI, these cut-values were 0.01, 0.005, and 0.001. Thus, for example, if the difference in CFI values between the two- and three-factor solutions for a given replication was 0.02, then three factors would be retained, because the improvement in fit from adding the third factor would exceed the cut-value. These cut-values were selected based on earlier work in the context of factor invariance assessment (F. F. Chen, 2007; Cheung & Rensvold, 2002; Meade, Johnson, & Braddy, 2008). In addition, for RMSEA, differences of 0.01, 0.015, and 0.02 were used to ascertain the number of factors to retain. These values were also selected based on earlier work, specifically by Meade et al. and F. F. Chen, the latter of whom found an RMSEA difference of 0.015 to be particularly useful for invariance assessment. For the SRMR, cut-values of 0.01, 0.02, and 0.03 were used, again based on earlier work in the context of invariance assessment (Chen). Finally, Chen's recommendation to use a combination of differences in fit index values was also employed in the current study, with the following heuristic for determining the number of factors to retain.
If the change in CFI was 0.01 or greater, and either the change in RMSEA was 0.015 or greater or the change in SRMR was 0.03 or greater, then the fit of the two models was concluded to differ, and the model including more factors was taken to be the better of the two. This process was repeated, comparing a model with fewer factors to the model containing one additional factor, until differences in model fit based on Chen's change criteria were no longer achieved.
Comparisons in model fit were made between adjacent factor solutions (i.e., 1 vs. 2, 2 vs. 3, 3 vs. 4). The number of factors to be retained corresponded to the number for which the difference in fit statistic values exceeded the threshold while the comparison involving the next larger number of factors did not. For example, when using the CFI difference standard of 0.01, consider the situation in which the one- versus two-factor comparison yielded a difference in CFI values of 0.021. We would conclude that the two-factor solution fit the data better than the one-factor solution, and that at minimum two factors should be retained. If the difference in CFI values for the two- versus three-factor comparison is then calculated to be 0.008, we would conclude that adding the third factor did not improve fit sufficiently for it to be retained. Therefore, in this example, the two-factor solution would be deemed optimal. At this point, the comparisons would stop, and two factors would be retained. A separate set of such comparisons using the differences in fit values was made for each statistic and each cut-value.
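The sequential decision rule just described can be sketched as a short function. The CFI values below are illustrative; the `higher_is_better` flag handles badness-of-fit indices such as RMSEA and SRMR, for which an improvement is a decrease.

```python
def factors_by_difference(fit_values, cutoff=0.01, higher_is_better=True):
    """fit_values[k-1] is the fit index for the k-factor solution."""
    k = 1
    for prev, nxt in zip(fit_values, fit_values[1:]):
        delta = (nxt - prev) if higher_is_better else (prev - nxt)
        if delta >= cutoff:
            k += 1          # the extra factor improved fit enough to keep it
        else:
            break           # stop at the first non-improvement
    return k

cfi_values = [0.90, 0.921, 0.929, 0.930]   # 1-, 2-, 3-, and 4-factor solutions
print(factors_by_difference(cfi_values))   # retains 2 factors
```

With these values, adding the second factor improves CFI by 0.021 (above the 0.01 cut-value), but adding the third improves it by only 0.008, so the comparisons stop and two factors are retained, mirroring the worked example above.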
Study Outcomes
The primary study outcome of interest was the proportion of cases for which the correct number of factors was retained for each of the methods described above. In addition, the number of factors that each method indicated to be optimal was also recorded, as were convergence rates for each method. It should be noted that when convergence was not attained for one of the methods for a given replication, additional replications were conducted until 1,000 were successfully completed.
Results
As described above, a number of different methods for determining the optimal number of factors to retain in an EFA were investigated in the current study. Several of these approaches yielded very low accuracy rates and are therefore not included in the description of results presented below, namely all of those based on SRMR difference values and those based on combinations of difference values. Specifically, use of the SRMR statistic consistently led to selection of a less complex factor model (underfactoring) than the data generating model. This tendency to underfactor was most pronounced for the larger threshold values used with SRMR, but was present even for the 0.01 value. Accuracy rates for SRMR never exceeded 0.53 under any of the conditions included here, and were typically less than 0.3. The combination criterion likewise tended to identify models with too few factors, with accuracy rates never exceeding 0.45 and generally less than 0.25.
With regard to the CFI, TLI, and RMSEA difference approaches, one heuristic was consistently more accurate than the others within the same family: the 0.01 criterion for CFI (CFI_01) and TLI (TLI_01), and the 0.015 criterion for RMSEA (RMSEA_015). Given that these were consistently the most accurate approaches for each family of difference statistics, they are the only ones presented in the following sections; limiting the results in this way keeps them clear and understandable. For the CFI and TLI statistics, the smaller 0.005 and 0.001 criteria tended to lead to overfactoring, whereas for RMSEA, the 0.01 criterion was associated with overfactoring and the 0.02 criterion with underfactoring. Finally, the parametric and nonparametric approaches to PA yielded virtually identical results. Therefore, only those for the nonparametric approach are reported below, to avoid redundancy and to keep the article at a reasonable length.
Convergence Rates
Convergence rates for all estimation methods were 100% across all conditions involving the normal indicator variables. For the ordinal indicators convergence rates were 100% for all conditions except for a sample size of 100, factor loadings of 0.35, and 30 indicators, for which the convergence rate was 77% for the WLSMV estimator. Convergence rates were also 100% for the dichotomous indicator variables for samples of 300 or more across all conditions. When the sample size was 200, the factor loadings were 0.35, and there were 30 indicators, the convergence rate was 83%. For a sample size of 100, factor loadings of 0.50, and 30 indicators, the convergence rate was 72%, whereas for a sample size of 100, factor loadings of 0.35, and 30 indicators, the convergence rate fell to 54%.
Normal Indicators
Figure 1 presents the proportion of cases in which the correct number of factors was identified, by method, number of indicators, and factor loading value, for the normally distributed indicator variables. These results demonstrate that PA and the RMSEA_015 difference statistic performed best across the methods studied here, and that these techniques yielded more accurate results for larger factor loadings. When the factor loadings were 0.35 and the factors had five indicators each, RMSEA_015 yielded more accurate results than did RPA_95 (the best performer among the PA-based methods), with accuracy rates between 0.39 and 0.45; RPA_95 exhibited accuracy rates between 0.34 and 0.39 in this condition. When the factor loadings were either 0.50 or 0.70, RPA_95 yielded the highest accuracy rates (between 0.77 and 0.95), followed by PA_95 (0.73 to 0.94) and RMSEA_015 (0.68 to 0.88). The PA_50, CFI, and TLI difference statistics were consistently the least accurate in identifying the correct number of factors. For all of the methods, inaccurate results were associated with a tendency to overfactor.
Figure 1.
Proportion of cases for which the correct number of factors was retained, by method, number of indicators per factor, and factor loading values for normally distributed indicator variables.
Table 2 presents the proportion of cases in which the correct number of factors was identified, by method, interfactor correlation, and sample size. The accuracy rates of all approaches were lowest for the highest interfactor correlation. RPA_95 was the most accurate method, and this advantage over the other approaches was most pronounced for lower interfactor correlation values. For the largest interfactor correlation value (0.7), RPA_95 had accuracy rates very comparable to those of RMSEA_015 (approximately 0.67), and much lower than its peak accuracy of 0.88 for the smallest correlation. The impact of interfactor correlation on the performance of RMSEA_015 was much more muted, with a decline from 0.70 to 0.66 between the lowest and highest correlation values. In addition, the results for PA_95 and RMSEA_015 were generally comparable across interfactor correlation conditions.
Table 2.
Proportion of Cases for Which the Correct Number of Factors Was Retained, by Method, Interfactor Correlation, and Sample Size for Normal Indicator Variables.
| | PA_95 | PA_50 | RPA_95 | CFI_01 | TLI_01 | RMSEA_015 |
|---|---|---|---|---|---|---|
| Correlation^a | | | | | | |
| 0.2 | 0.72 | 0.58 | 0.88 | 0.66 | 0.62 | 0.70 |
| 0.45 | 0.74 | 0.56 | 0.82 | 0.58 | 0.57 | 0.70 |
| 0.7 | 0.61 | 0.47 | 0.67 | 0.56 | 0.54 | 0.66 |
| Sample size^b | | | | | | |
| 100 | 0.58 | 0.29 | 0.61 | 0.54 | 0.52 | 0.64 |
| 200 | 0.65 | 0.40 | 0.70 | 0.61 | 0.60 | 0.73 |
| 300 | 0.75 | 0.51 | 0.89 | 0.73 | 0.73 | 0.87 |
| 400 | 0.89 | 0.58 | 0.95 | 0.85 | 0.84 | 0.92 |
| 500 | 0.95 | 0.66 | 0.97 | 0.95 | 0.95 | 0.96 |

^a Results collapsed across all manipulated factors other than interfactor correlation. ^b Results collapsed across all manipulated factors other than sample size.
Results in Table 2 demonstrate that larger samples were associated with greater accuracy for all of the methods studied here. In addition, for samples of 100 or 200, RMSEA_015 displayed the highest accuracy rates, correctly identifying the number of factors in the underlying model between 64% and 73% of the time. RPA_95 yielded the second most accurate results for the smallest sample sizes, with rates between 61% and 70%. For samples of 300 or more, the RPA_95 technique yielded the most accurate findings (between 0.89 and 0.97), followed by RMSEA_015 (0.87 to 0.96). As was demonstrated in Figure 1, CFI_01, TLI_01, and PA_50 were the least accurate methods. Finally, for a sample size of 500, all of the methods, with the exception of PA_50, were very accurate in terms of correctly identifying the number of underlying factors, with rates of 0.95 or higher.
Ordinal Indicators
As was the case with the normal indicators, when each of the methods erred in identifying the number of factors to retain, it tended to overfactor rather than underfactor. Figure 2 displays the accuracy rates for each method by the number of indicators and factor loading values; to make interpretation of the graphs more straightforward, only the most accurate condition for each of the difference statistics is included. As with the normal indicator variables, the methods studied here displayed greater accuracy when the factor loading values were larger. In addition, the RMSEA_015 technique again performed better than RPA_95 in correctly identifying the number of factors to retain for factor loadings of 0.35, with accuracy rates between 0.39 and 0.42; in this condition, CFI_01 was the second most accurate method, with rates of 0.38 to 0.40. For factor loadings of 0.50, RMSEA_015 was also the best performer, exhibiting accuracy rates between 0.65 and 0.69, with RPA_95 second, at rates between 0.60 and 0.65. When the factor loadings were 0.7, RPA_95 yielded the most accurate results of the approaches studied here, with accuracy rates between 0.89 and 0.93; the accuracy rates of PA_95 were very close to those of the revised method, falling between 0.85 and 0.90, and RMSEA_015 accurately identified the number of underlying factors approximately 82% of the time. Finally, all of the methods performed slightly better when there were 10 indicator variables per factor, as compared with 5.
Figure 2.
Proportion of cases for which the correct number of factors was retained, by method, number of indicators per factor, and factor loading values for ordinal indicator variables.
Table 3 presents the accuracy rates for each of the methods by sample size and by interfactor correlation. All methods were more accurate in identifying the correct number of factors for larger sample sizes and for lower interfactor correlations. In addition, these results demonstrate that the RMSEA_015 approach had the highest accuracy rates for the two smallest sample size conditions (ranging between 0.54 and 0.60), followed by RPA_95 (0.51 to 0.58). For larger samples, RPA_95 had the highest accuracy rates (0.66 to 0.93), with RMSEA_015 being the second most accurate method (0.62 to 0.92). PA_50, CFI_01, and TLI_01 had the lowest accuracy rates, with values between 0.25 and 0.62. PA_95 was consistently the third most accurate method, with accuracy rates between 0.48 and 0.86, making its performance much more similar to that of RPA_95 and RMSEA_015 than to the other three approaches studied here. In terms of the interfactor correlation, RMSEA_015 was somewhat more accurate than RPA_95 at the lower correlation values, whereas both methods were accurate in 50% of cases when the correlation among the factors was 0.7. As was evident when examining the relationship of sample size to accuracy rates, PA_50, CFI_01, and TLI_01 were the least accurate methods studied here.
Table 3.
Proportion of Cases for Which the Correct Number of Factors Was Retained, by Method, Interfactor Correlation, and Sample Size for Ordinal Indicator Variables.
| | PA_95 | PA_50 | RPA_95 | CFI_01 | TLI_01 | RMSEA_015 |
|---|---|---|---|---|---|---|
| Correlation^a | | | | | | |
| 0.2 | 0.53 | 0.42 | 0.56 | 0.47 | 0.46 | 0.58 |
| 0.45 | 0.46 | 0.37 | 0.50 | 0.40 | 0.36 | 0.53 |
| 0.7 | 0.47 | 0.36 | 0.50 | 0.37 | 0.34 | 0.50 |
| Sample size^b | | | | | | |
| 100 | 0.48 | 0.25 | 0.51 | 0.29 | 0.25 | 0.54 |
| 200 | 0.56 | 0.37 | 0.58 | 0.39 | 0.32 | 0.60 |
| 300 | 0.60 | 0.44 | 0.66 | 0.45 | 0.43 | 0.62 |
| 400 | 0.72 | 0.52 | 0.79 | 0.58 | 0.57 | 0.74 |
| 500 | 0.86 | 0.60 | 0.93 | 0.60 | 0.62 | 0.92 |

^a Results collapsed across all manipulated factors other than interfactor correlation. ^b Results collapsed across all manipulated factors other than sample size.
Dichotomous Indicator Variables
In keeping with the results for both ordinal and normal indicator variables, when any of the methods studied here yielded inaccurate results, it was more likely to indicate the need to retain more factors than were actually used to generate the data. The accuracy rates by method, number of indicators per factor, and factor loading values appear in Figure 3. As was true for the other indicator variable types, accuracy for all of the methods increased with larger factor loading values for the dichotomous indicators. For factor loading values of 0.35 or 0.50 coupled with five indicator variables per factor, the RMSEA_015 criterion exhibited the highest accuracy rates of the methods studied here, with values between 0.37 and 0.58. The next best performer in these conditions was RPA_95, which exhibited accuracy rates between 0.36 and 0.45. For factor loadings of 0.50 and 10 indicators per factor, RMSEA_015 and RPA_95 both had accuracy rates of approximately 0.63. When the factor loadings were 0.70, RPA_95 had the higher accuracy rates for determining the number of factors, with values between 0.79 and 0.82, compared with 0.78 to 0.80 for RMSEA_015. As was the case for both ordinal and normal indicators, PA_95 was the third best performer across factor loading conditions, with accuracy rates between 0.20 (5 indicators with 0.35 loadings) and 0.80 (10 indicators with 0.70 loadings). PA_50, CFI_01, and TLI_01 yielded the lowest accuracy rates across all conditions, as was true for the ordinal and normal indicator conditions.
Figure 3.
Proportion of cases for which the correct number of factors was retained, by method, number of indicators per factor, and factor loading values for dichotomous indicator variables.
Table 4 contains the accuracy rates for each method by interfactor correlation and sample size. As was the case for both ordinal and normal indicator variables, each of the methods displayed greater accuracy for larger samples. In addition, for samples of 100 and 200, the RMSEA_015 technique had higher accuracy rates than did the other methods, with rates of 0.41 for a sample size of 100 and 0.49 for a sample size of 200. For sample sizes of 300 or more, RPA_95 yielded the most accurate results, with values between 0.61 and 0.79, compared with RMSEA_015, which had accuracy rates of 0.57 to 0.75. PA_95 was the third best performer across sample sizes, with CFI_01 and TLI_01 consistently exhibiting the lowest accuracy rates, with values between 0.24 and 0.68. Finally, results in Table 4 reinforce the findings for ordinal and normal indicators, namely that higher interfactor correlation values were associated with lower accuracy across methods in determining the number of underlying factors.
Table 4.
Proportion of Cases for Which the Correct Number of Factors Was Retained, by Method, Interfactor Correlation, and Sample Size for Dichotomous Indicator Variables.
| | PA_95 | PA_50 | RPA_95 | CFI_01 | TLI_01 | RMSEA_015 |
|---|---|---|---|---|---|---|
| Correlation^a | | | | | | |
| 0.2 | 0.48 | 0.36 | 0.50 | 0.45 | 0.39 | 0.52 |
| 0.45 | 0.41 | 0.29 | 0.45 | 0.36 | 0.31 | 0.46 |
| 0.7 | 0.40 | 0.27 | 0.43 | 0.35 | 0.30 | 0.45 |
| Sample size^b | | | | | | |
| 100 | 0.35 | 0.21 | 0.37 | 0.24 | 0.24 | 0.41 |
| 200 | 0.44 | 0.27 | 0.46 | 0.26 | 0.25 | 0.49 |
| 300 | 0.55 | 0.38 | 0.61 | 0.49 | 0.45 | 0.57 |
| 400 | 0.67 | 0.44 | 0.75 | 0.62 | 0.60 | 0.73 |
| 500 | 0.72 | 0.51 | 0.79 | 0.68 | 0.66 | 0.75 |

^a Results collapsed across all manipulated factors other than interfactor correlation. ^b Results collapsed across all manipulated factors other than sample size.
Discussion
The goal of this study was to extend work examining the application of model fit statistics, commonly employed in the context of CFA, to the problem of determining the number of factors to retain in EFA. Prior research in this area focused on the use of fit index values to determine whether models consisting of a given number of factors fit the data well or not (e.g., Clark & Bowles, 2018). Given the increased popularity and demonstrated utility of fit statistic difference values in assessing model invariance, it was of some interest to ascertain whether such an approach might have applicability in determining the number of factors to retain for an EFA model. The widely used and generally effective PA approach served as the baseline against which the fit statistic differences were compared.
Results of the study demonstrated that the fit statistic difference approach does hold some promise for determining the number of factors to retain in the context of EFA, particularly when the indicator variables are categorical and the factor loadings are 0.5 or 0.35; the RMSEA_015 cut-value criterion was demonstrated to be especially effective in those cases. The methods studied here were all less accurate in conjunction with fewer indicator variables, a higher interfactor correlation, smaller samples, and weaker factor loadings. In addition to relative performance, it is also important to consider the absolute accuracy of the methods studied here. The results described above revealed that in the best case scenarios, with samples of 400 or more, factor loadings of 0.7, normally distributed indicators, and relatively low interfactor correlations, RPA_95, PA_95, and RMSEA_015 all yielded accuracy rates of approximately 0.9 or higher, suggesting that in such cases any of these three approaches will work well. However, in the most difficult situations for these methods, including samples of 200 or fewer, dichotomous indicators, and low loadings, none of the methods had accuracy rates in excess of 0.50. In these worst case scenarios, RMSEA_015 performed the best, but was still correct less than half of the time. Thus, although this approach would certainly be recommended for use in such situations based on the current study, the resulting analysis would not be very accurate in an absolute sense.
The results of this study with respect to the relative performance of the goodness-of-fit indices appear to differ somewhat from those presented in earlier research (e.g., Clark & Bowles, 2018; Garrido et al., 2016), which found that CFI and TLI were generally more accurate than RMSEA in identifying the optimal number of factors to retain in EFA for both categorical and normally distributed indicator variables. In the current study, the difference statistics associated with CFI and TLI, using thresholds of 0.001, 0.005, or 0.01, did not perform particularly well when compared with the difference statistic based on RMSEA and a threshold value of 0.015. Both the CFI and TLI difference statistics with the thresholds used in this study had a strong tendency to overfactor, leading to relatively poor accuracy when compared with RMSEA_015.
Another major finding of this study was that the RMSEA_015 criterion worked as well as the parallel analysis methods in many situations, and better in several specific cases. This is an interesting result given that RPA and PA have been found to be among the best approaches for determining the number of factors to retain in an EFA (Fabrigar & Wegener, 2011; Green et al., 2015; Green et al., 2016; Green et al., 2018; Preacher & MacCallum, 2003). The simulation results described in this article show that for categorical indicator variables the RMSEA_015 criterion is always at least as accurate as either RPA_95 or PA_95, and is more accurate when the factor loading is 0.5 or less and/or the sample size is 200 or fewer. In addition, RMSEA_015 was more accurate than either RPA_95 or PA_95 for identifying the number of factors to retain when indicator variables were normally distributed and the factor loadings were 0.35. Again, it is important to note that in these conditions no method, including RMSEA_015, was particularly accurate. However, when faced with such situations in practice, researchers should strongly consider RMSEA_015 for assistance in determining the number of factors to retain in an EFA, as it was the most accurate of the approaches studied here.
In addition to comparing the relative performance of the various approaches in terms of determining the number of factors to retain in EFA, it is also important to gain some insights into why the results came out as they did. For example, SRMR was consistently a poor performer across conditions. An examination of the SRMR results revealed that it had a tendency to underfactor, that is, indicate that fewer factors should be retained than were actually present in the data. This result is in keeping with other findings (e.g., Kim, Yoon, & Lee, 2012) that SRMR tends to be insensitive to model misspecification. In other words, this statistic does not decrease in value a great deal for an increased number of factors. In the context of using difference values to determine optimal model fit, this lack of sensitivity would be manifested as a tendency to find all models yielding similar fit (i.e., small differences in fit statistic values across different numbers of factors), thereby leading to the conclusion that models with fewer factors fit as well as models with more factors. These results are also in keeping with findings by Garrido et al. (2016) that SRMR was not as effective as other approaches for determining the number of factors to retain in the context of EFA.
With respect to CFI and TLI difference values, a somewhat similar result was in evidence for the normally distributed indicator variables. In this context, CFI and TLI both tended to exhibit relatively large values across most number of factor conditions, indicating good fit for these models. In addition, the difference between these statistics for differing numbers of factors tended to be quite small, leading to the conclusion that including additional factors to the model did not result in meaningful improvement (as defined by the cut-values) to fit. This lack of sensitivity, as with SRMR, though not as severely, resulted in a lower accuracy rate due to underfactoring. It should also be noted that this lack of sensitivity to model difference has also been reported in the context of invariance assessment (e.g., French & Finch, 2006).
Recommendations for Practice
The results of the current study hold several implications for research practice. First, PA was again demonstrated to be an effective tool for identifying the number of factors to retain in an EFA under several conditions. More specifically, PA appears to be particularly useful when indicator variables are normally distributed and the factor loadings are moderate (0.5) to large (0.7) in value. On the other hand, when the indicators are categorical and/or the factor loadings are small (0.35), PA is not as effective. Second, the RMSEA difference statistic with a cut-value of 0.015 shows promise as a method for identifying the number of factors to retain, particularly when the indicators are categorical and when factor loadings are small. Although its accuracy does degrade with smaller factor loadings, this decline in performance is not as severe as for the other methods studied here. Therefore, researchers faced with relatively low factor loadings and categorical indicator variables may want to consider using the RMSEA difference with a 0.015 cut-value as a method for determining the number of factors to retain. A third implication of this study is that the CFI, TLI, and SRMR difference statistics may not be particularly useful for deciding how many factors to retain in an EFA. Certainly, further research needs to be conducted before any definitive conclusions in this regard can be reached; however, the results reported here would appear to suggest, at least tentatively, that such is the case. A final implication of this research is that researchers cannot rely on only one approach to determine the number of factors to retain. As noted above, when the indicator variables are normally distributed and the factor loadings are 0.5 or 0.7, PA is a very effective tool for making this decision.
However, when the indicators are categorical, and/or the loadings are relatively small, an alternative approach, such as RMSEA_015, needs to be considered.
Directions for Future Research
There are several limitations to the current study that future work should address. First, all of the data used here were generated from simple structure models; however, in many real-world applications factor models do not demonstrate simple structure. Therefore, future work should expand on the current study by examining the performance of these methods, particularly the difference statistic approaches, with nonsimple structure factor models. In addition, future work should include a wider array of latent structures, including a single-factor model and a model with more than three factors. Likewise, future research should investigate a wider array of indicator conditions, such as unequal numbers of indicators per factor, differing factor loadings for indicators on the same factor, and different indicator variable distributions (e.g., ordinal with five categories, continuous nonnormal). Further study should also examine different strategies for employing the difference statistics, other than the strictly sequential method used here. For example, researchers could check all possible differences up to a predetermined number of factors (e.g., the number of indicators − 1), or combine the difference statistic approach used here with an absolute criterion (e.g., employing a threshold for ΔRMSEA such as 0.015 in conjunction with a threshold for the raw RMSEA value, such as 0.08). Finally, future work could also investigate a wider array of difference statistics and thresholds for determining the number of factors to retain, including information indices such as the Akaike information criterion and Bayesian information criterion, the incremental fit index, and additional cut-values for RMSEA.
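The suggested hybrid of a ΔRMSEA threshold with an absolute RMSEA criterion could be sketched as follows. This is strictly a hypothetical illustration of the future-research idea, not a procedure evaluated in the study; the function name and the 0.08 default are assumptions.

```python
def retain_with_absolute_check(rmsea_values, delta_cut=0.015, abs_cut=0.08):
    """Hypothetical hybrid rule: add a factor while the RMSEA improvement
    meets the delta threshold, then also require the retained solution
    itself to show acceptable absolute fit (RMSEA <= abs_cut).
    rmsea_values[k] is the RMSEA for the (k + 1)-factor solution."""
    n_factors = 1
    for smaller, larger in zip(rmsea_values, rmsea_values[1:]):
        # RMSEA improves by decreasing
        if smaller - larger >= delta_cut:
            n_factors += 1
        else:
            break
    acceptable = rmsea_values[n_factors - 1] <= abs_cut
    return n_factors, acceptable
```

For instance, RMSEA values of 0.10, 0.06, and 0.055 for the one-, two-, and three-factor solutions would yield two factors with acceptable absolute fit, whereas a sequence that never drops below 0.08 would flag the retained solution as a poor absolute fit even if the sequential rule terminates normally.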
It is also possible that future work could examine different cut-values for differences in CFI and TLI, as well as SRMR, although results from the current study would appear to suggest that these statistics may not be particularly promising with regard to determining the number of factors to retain in an EFA.
Conclusions
The goal of this study was to ascertain whether using differences in fit statistics might be a worthwhile approach for determining the number of factors to retain in the context of EFA. Such an approach has been shown to be effective in the context of invariance testing for CFA models (Meade et al., 2008). The results of the current study demonstrate that such an approach, particularly using RMSEA, does hold promise for this type of use. Although further work is clearly needed, these results are promising, and suggest that including such an approach in the toolbox of methods used to determine the number of factors to retain in an EFA would likely be helpful, particularly when indicator variables are categorical in nature and/or the relationships between factors and indicators are not particularly strong.
Footnotes
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.
ORCID iD: W. Holmes Finch
https://orcid.org/0000-0003-0393-2906
References
- Barendse M. T., Oort F. J., Timmeran M. E. (2015). Using exploratory factor analysis to determine the dimensionality of discrete responses. Structural Equation Modeling, 22, 87-101. [Google Scholar]
- Beauducel A., Herzberg P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13, 186-203. [Google Scholar]
- Beauducel A., Wittmann W. W. (2005). Simulation study on fit indexes in CFA based on data with slightly distorted simple structure. Structural Equation Modeling, 12, 41-75. [Google Scholar]
- Canivez G. L., Watkins M. W., McGill R. J. (2019). Construct validity of the Wechsler Intelligence Scale for Children–fifth UK edition: Exploratory and confirmatory factor analyses of the 16 primary and secondary subtests. British Journal of Educational Psychology, 89(2), 195-224. doi: 10.1111/bjep.12230 [DOI] [PubMed] [Google Scholar]
- Caron P.-O. (2018). Minimum average partial correlation and parallel analysis: The influence of oblique structures. Communications in Statistics–Simulation and Computation, 48(7), 2110-2117. doi: 10.1080/03610918.2018.1433843 [DOI] [Google Scholar]
- Chen F., Curran P. J., Bollen K. A., Kirby J., Paxton P. (2008). An empirical evaluation for the use of fixed cut-off points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36, 462-494. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464-504. [Google Scholar]
- Cheung G. W., Rensvold R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233-255. [Google Scholar]
- Clark D. A., Bowles R. P. (2018). Model fit and item factor analysis: Overfactoring, underfactoring, and a program to guide interpretation. Multivariate Behavioral Research, 53, 544-558. [DOI] [PubMed] [Google Scholar]
- Coker J. L., Catlin D., Ray-Griffith S., Knight B., Stowe Z. N. (2018). Buprenophrine medication-assisted treatment during pregnancy: An exploratory factor analysis associated with adherence. Drug and Alcohol Dependence, 192, 146-149. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crawford C. B., Koopman P. (1979). Note: Inter-rater reliability of scree test and mean square ratio test of number of factors. Perceptual and Motor Skills, 49, 223-226. [Google Scholar]
- DiStefano C., Morgan G. B. (2014). A comparison of diagonal weighted least squares robust estimation techniques for ordinal data. Structural Equation Modeling, 21, 425-438.
- Fabrigar L. R., Wegener D. T. (2011). Exploratory factor analysis. Oxford, England: Oxford University Press.
- Frazier T. W., Youngstrom E. A. (2007). Historical increase in the number of factors measured by commercial tests of cognitive ability: Are we overfactoring? Intelligence, 35, 169-182.
- French B. F., Finch W. H. (2006). Confirmatory factor analytic procedures for the determination of measurement invariance. Structural Equation Modeling, 13, 378-402.
- Garrido L. E., Abad F. J., Ponsoda V. (2016). Are fit indices really fit to estimate the number of factors with categorical variables? Some cautionary findings via Monte Carlo simulation. Psychological Methods, 21, 93-111.
- Gorsuch R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
- Green S. B., Levy R., Thompson M. S., Lu M., Lo W.-J. (2012). A proposed solution to the problem with using completely random data to assess the number of factors with parallel analysis. Educational and Psychological Measurement, 72, 357-374.
- Green S. B., Redell N., Thompson M. S., Levy R. (2016). Accuracy of revised and traditional parallel analyses for assessing dimensionality with binary data. Educational and Psychological Measurement, 76, 5-21.
- Green S. B., Thompson M. S., Levy R., Lo W.-J. (2015). Type I and II error rates and overall accuracy of the revised parallel analysis method for determining the number of factors. Educational and Psychological Measurement, 75, 428-457.
- Green S. B., Xu Y., Thompson M. (2018). Relative accuracy of two parallel analysis methods that use the proper reference distribution. Educational and Psychological Measurement, 78, 589-604.
- Horn J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
- Hu L.-T., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
- Kenny D. A., Kaniskan B., McCoach D. B. (2015). The performance of RMSEA in models with small degrees of freedom. Sociological Methods & Research, 44, 486-507.
- Kim E. S., Yoon M., Lee T. (2012). Testing measurement invariance using MIMIC likelihood ratio test with a critical value adjustment. Educational and Psychological Measurement, 72, 469-492.
- Marsh H. W., Hau K., Wen Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing. Structural Equation Modeling, 11, 320-341.
- Meade A. W., Johnson E. C., Braddy P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568-592.
- Monroe S., Cai L. (2015). Examining the reliability of student growth percentiles using multidimensional IRT. Educational Measurement: Issues and Practice, 34, 21-30.
- Muthén L. K., Muthén B. O. (2018). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
- Pett M. A., Lackey N. R., Sullivan J. J. (2003). Making sense of factor analysis: The use of factor analysis for instrument development in health care research. Thousand Oaks, CA: Sage.
- Preacher K. J., MacCallum R. C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding Statistics, 2, 13-43.
- Preacher K. J., Zhang G., Kim C., Mels G. (2013). Choosing the optimal number of factors in exploratory factor analysis: A model selection perspective. Multivariate Behavioral Research, 48, 28-56.
- R Core Team. (2017). R: A language and environment for statistical computing (Version 3.5.1). Vienna, Austria: R Foundation for Statistical Computing.
- Raîche G., Walls T. A., Magis D., Riopel M., Blais J.-G. (2013). Non-graphical solutions for Cattell’s scree test. Methodology, 9, 23-29.
- Ratti V., Vickerstaff V., Crabtree J., Hassiotis A. (2017). An exploratory factor analysis and construct validity of the Resident Choice Assessment Scale with paid carers of adults with intellectual disabilities and challenging behavior in community settings. Journal of Mental Health Research in Intellectual Disabilities, 10, 198-216.
- Reilly A., Eaves R. C. (2000). Factor analysis of the Minnesota Infant Development Inventory based on a Hispanic migrant population. Educational and Psychological Measurement, 60, 271-285.
- Revelle W. (2018). psych: Procedures for personality and psychological research (Version 1.8.12). Evanston, IL: Northwestern University. Retrieved from https://CRAN.R-project.org/package=psych
- Revelle W., Rocklin T. (1979). Very simple structure: An alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14, 403-414.
- Ruscio J., Roche B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24, 282-292.
- Tong X., Bentler P. M. (2013). Evaluation of a new mean scaled and moment adjusted test statistic for SEM. Structural Equation Modeling, 20, 148-156.
- Velicer W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321-327.
- Velicer W. F., Eaton C. A., Fava J. L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In Goffin R. D., Helmes E. (Eds.), Problems and solutions in human assessment (pp. 41-71). New York, NY: Springer Science.
- Wang J. Z. (2001). Illegal Chinese immigration in the United States: A preliminary factor analysis. International Journal of Offender Therapy and Comparative Criminology, 45, 345-355.
- Wirth R. J., Edwards M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58-70.
- Yang Y., Xia Y. (2015). On the number of factors to retain in exploratory factor analysis for ordered categorical data. Behavior Research Methods, 47, 756-772.
- Yuan K.-H. (2005). Fit indices versus test statistics. Multivariate Behavioral Research, 40, 115-148.