Abstract
Factor analysis is commonly used in behavioral sciences to measure latent constructs, and researchers routinely consider approximate fit indices to ensure adequate model fit and to provide important validity evidence. Due to a lack of generalizable fit index cutoffs, methodologists suggest simulation-based methods to create customized cutoffs that allow researchers to assess model fit more accurately. However, simulation-based methods are computationally intensive. An open question is: How many simulation replications are needed for these custom cutoffs to stabilize? This Monte Carlo simulation study focuses on one such simulation-based method—dynamic fit index (DFI) cutoffs—to determine the optimal number of replications for obtaining stable cutoffs. Results indicated that the DFI approach generates stable cutoffs with 500 replications (the currently recommended number), but the process can be made more efficient with fewer replications, especially in simulations with categorical data. Using fewer replications substantially reduces the computational time for determining cutoff values with minimal impact on the results. For one-factor or three-factor models, results suggested that in most conditions 200 DFI replications were optimal for balancing fit index cutoff stability and computational efficiency.
Keywords: model fit, confirmatory factor analysis, structural equation modeling
Psychologists are often interested in studying latent constructs such as mental health conditions, cognitive abilities, and personality traits. Unlike physical or biometric variables, latent constructs are not directly observable, so they can be difficult to define and measure. Confirmatory factor analysis (CFA) is a structural equation modeling (SEM) technique commonly used to measure latent constructs (Kline, 2016). Model fit is a crucial part of applying CFA in research because obtaining an adequate fit of a CFA model provides important validity evidence (Rios & Wells, 2014).
There are two types of global model fit assessment: exact fit and approximate fit. The likelihood ratio test (i.e., the χ² test; Jöreskog, 1969) is a common test of exact fit. The χ² test evaluates whether the model-implied covariance matrix is equal to the observed covariance matrix. The χ² test is useful because it is a null hypothesis test that provides a straightforward conclusion regarding model fit. However, despite the clear hypothesis and interpretation of an exact fit test, some methodologists have argued that the results of an exact fit test are not interesting because it is unreasonable to expect any model to fit perfectly (Jöreskog & Sörbom, 1982), and researchers should not rely exclusively on the results of a χ² test (Millsap, 2007). As a null hypothesis test, the χ² test does not necessarily provide evidence for the pragmatic usefulness of a model because failing to reject the null hypothesis does not necessarily indicate that the model fits well (Browne & Cudeck, 1992). In essence, retaining the null hypothesis of a χ² test does not actively endorse the model as well fitting; it simply indicates a lack of evidence to reject the model. Furthermore, the power of the χ² test can become excessively high with large sample sizes or non-normally distributed data, which may lead to an overly sensitive test that flags possibly trivial misspecification (Hu et al., 1992). Thus, researchers often use approximate fit indices along with the χ² test to assess model fit.
Approximate fit indices are similar to effect size measures in that they capture the extent of misspecification, instead of simply providing a dichotomous determination of fit or misfit. Widely used fit indices include the root mean square error of approximation (RMSEA; Steiger & Lind, 1980), the standardized root mean squared residual (SRMR; Bentler, 1995), and the comparative fit index (CFI; Bentler & Bonett, 1980). However, these fit indices are not without flaws: an ongoing debate over the last few decades has centered on their usefulness, given the lack of well-defined, appropriate cutoffs (Marsh et al., 2004). Hu and Bentler (1999) initially addressed this issue with a simulation study based on a three-factor CFA model and suggested cutoff values for several approximate fit indices (e.g., SRMR ≤ 0.08, RMSEA ≤ 0.06, and CFI ≥ 0.95). Although the authors clearly indicated that these cutoffs should not be overgeneralized, the values remain very popular today; the paper has received over 111,000 citations on Google Scholar.
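As a concrete illustration (not part of the original studies), both the exact-fit test and these approximate fit indices can be obtained from standard SEM software; a minimal R sketch using lavaan's built-in Holzinger–Swineford data:

```r
library(lavaan)

# Classic three-factor CFA on lavaan's built-in example data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Exact-fit chi-square test plus the approximate fit indices discussed here
fitMeasures(fit, c("chisq", "df", "pvalue", "srmr", "rmsea", "cfi"))
```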
The cutoffs suggested by Hu and Bentler (1999) were derived from simulations of a specific CFA model with specific types of misspecification, so they may lose their utility as researchers deviate further from the three-factor model in the original simulation study, and the reliability and usefulness of conclusions drawn under such circumstances may become questionable. Methodologists have criticized these rules of thumb and examined how the cutoffs change as sample and model characteristics vary, such as sample size, number of items or factors, magnitude of factor loadings, factor reliability, and model type (Chen et al., 2008; Heene et al., 2011; Marsh et al., 2004; McNeish et al., 2018; Shi et al., 2019). Groskurth et al. (2023) recently conducted a comprehensive simulation study involving various sample and model characteristics to replicate and extend previous findings on the sensitivity and susceptibility of fit indices, emphasizing the perils of overgeneralizing fixed cutoffs.
To address the limitations of these commonly used cutoffs, methodologists have suggested that it can be beneficial for empirical researchers to follow Hu and Bentler’s (1999) logic to conduct their own simulation and create custom cutoffs based on their specific model characteristics (Kim & Millsap, 2014; Millsap, 2007). Recently, there have been computational alternatives that derive approximate fit index cutoffs and provide empirical researchers with accurate cutoffs that are tailored to their specific CFA model. These methods include SIMulated SEM (simsem; Pornprasertmanit et al., 2021), flexible cutoffs (Niemand & Mai, 2018), and dynamic fit index (DFI; McNeish & Wolf, 2023a). These approaches allow researchers to more accurately assess the fit of their model, addressing the challenge of relying on non-generalizable cutoffs in Hu and Bentler (1999).
Despite their differences, these methods rest on a similar conceptual foundation: they are all simulation-based. As a result, implementing them demands time and computational power, which can pose a barrier for researchers facing complexity and time constraints. It is therefore important to optimize the number of replications used to derive fit index cutoffs, as doing so reduces simulation time and enhances accessibility for researchers.
In this paper, we focus on McNeish and Wolf’s (2023a) DFI method and use a Monte Carlo simulation study to determine the optimal number of replications for obtaining stable fit index cutoffs for SRMR, RMSEA, and CFI using the DFI method. The structure of this paper is as follows. We first provide a brief overview of DFI and outline its functionality as a simulation-based method. We follow with a simulation study to investigate the number of replications required for the fit index cutoffs to stabilize. We then present the results and conclude with a discussion to offer recommendations to researchers and address the limitations of this study.
Overview of DFI
McNeish and Wolf (2023a) developed DFI cutoffs as a generalization of the approach used by Hu and Bentler (1999). The DFI method uses simulations and generates approximate fit index cutoffs that are tailored to reflect sensitivity to misspecification in the researcher’s fitted model. While the concept of customized simulations is not novel (Kim & Millsap, 2014; Millsap, 2007), relatively few researchers have adopted simulation-based methods. This is partially due to the limited training in statistical programming received by empirical researchers, hindering their ability to implement custom simulations from scratch. Furthermore, the process of devising and executing such simulations can be time-consuming, and selecting plausible misspecifications for a CFA model poses an additional challenge (West et al., 2023). To overcome these barriers, the DFI method was designed to be an automated approach that is easily accessible to researchers, and DFI allows researchers to modify the conditions used in the simulation based on their data and model characteristics.
The DFI method generalizes the simulation approach in Hu and Bentler (1999) and determines hypothetical misspecifications similar to the ones used in that study. The DFI algorithm follows five steps (McNeish & Wolf, 2023a, 2023b). In step 1, the researcher's empirical model is fit to their data, and the standardized parameter estimates are obtained. In step 2, the algorithm creates a set of misspecified models to serve as data-generation models. For one-factor models, the misspecifications are created by adding residual correlations, since common misspecifications in one-factor CFA models tend to involve multidimensionality and local dependence. Three levels of misspecification are considered: levels 1, 2, and 3 correspond to a .30 residual correlation between one-third, two-thirds, and all of the items, respectively. These misspecifications provide a reasonably large discrepancy between the empirical model and the misspecified model, and the level 1 misspecification yields results similar to the traditional cutoffs. For multidimensional models with two or more factors, the DFI method adds potentially omitted cross-loadings to create misspecifications, following a procedure similar to Hu and Bentler (1999). The number of levels of misspecification is one fewer than the number of factors; for example, a three-factor model has two levels of misspecification. A cross-loading is added to the item with the weakest standardized loading, from the factor with the highest factor reliability among those on which the item did not originally load, and the magnitude of the cross-loading equals that weakest standardized loading. For models with high standardized loadings across items, the algorithm caps the cross-loading based on the lowest standardized loading in the empirical model to ensure that the generated cross-loading maintains a nonnegative population residual variance.
In step 3, the algorithm generates 500 simulated datasets from the model-implied covariance matrix of the misspecified model created in the previous step. The empirical model is then fit to each dataset, and the SRMR, RMSEA, and CFI values are recorded; the distribution of these values captures the degree of misspecification in the model. In step 4, the algorithm pools these values and takes the 5th percentile of the distribution for SRMR and RMSEA and the 95th percentile for CFI as cutoffs, following the same procedure as Hu and Bentler (1999). In step 5, the algorithm repeats steps 2 through 4 using the model-implied covariance matrix from the original empirical model instead of the misspecified model, to verify that the cutoffs from step 4 can distinguish between misspecified and correct models. If not, the algorithm is repeated using the 10th percentile for SRMR and RMSEA (or the 90th percentile for CFI) to generate cutoffs that better discriminate between correct and misspecified models.
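To make steps 2 through 4 concrete, the following R sketch mimics the logic for a hypothetical two-factor model with one omitted cross-loading as the level 1 misspecification. The model syntax, loadings, and loop are our illustrative assumptions, not the dynamic package's internal code (DFI automates all of this for the user):

```r
library(lavaan)
library(simstandard)

# Step 2 (illustrative): the researcher's two-factor model plus one
# omitted .30 cross-loading serves as the misspecified data-generating model
dgm <- '
  F1 =~ .8*y1 + .8*y2 + .7*y3 + .3*y6
  F2 =~ .8*y4 + .8*y5 + .7*y6
  F1 ~~ .4*F2
'
# The empirical model omits the cross-loading on y6
empirical <- '
  F1 =~ y1 + y2 + y3
  F2 =~ y4 + y5 + y6
'

# Steps 3-4: simulate datasets from the misspecified model, fit the
# empirical model to each, and collect the fit index values
reps <- 500
indices <- t(replicate(reps, {
  d   <- sim_standardized(dgm, n = 400, latent = FALSE, errors = FALSE)
  fit <- cfa(empirical, data = d)
  fitMeasures(fit, c("srmr", "rmsea", "cfi"))
}))

# Cutoffs: 5th percentile for SRMR and RMSEA, 95th percentile for CFI
quantile(indices[, "srmr"],  .05)
quantile(indices[, "rmsea"], .05)
quantile(indices[, "cfi"],   .95)
```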
In sum, the DFI method uses simulation to generate approximate fit index cutoffs customized to a researcher's specific model. By providing more accurate cutoffs, this approach addresses challenges pertaining to (a) researchers' limited training in statistical programming and simulation studies, (b) the difficulty of selecting plausible misspecifications, and (c) time constraints.
Current Study
The DFI method is simulation-based, so it is important to evaluate how many replications are necessary for the generated fit index cutoffs to stabilize. The existing DFI algorithm sets a default of 500 replications, but whether this default is optimal for generating stable fit index cutoffs has not been studied.
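The replication count is the user-facing setting this study targets. A minimal sketch of how it might be varied with the dynamic R package, assuming the `reps` argument described in the package documentation (`mydata` is a placeholder for the researcher's dataset):

```r
library(lavaan)
library(dynamic)

# mydata: hypothetical placeholder for the researcher's item-level data
fit <- cfa('F =~ y1 + y2 + y3 + y4 + y5 + y6', data = mydata)

cfaOne(fit)              # current default: 500 DFI replications
cfaOne(fit, reps = 200)  # fewer replications, if cutoffs remain stable
```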
In the current study, we use a Monte Carlo simulation focusing on a one-factor model and a three-factor model with either continuous or ordered categorical data. Our aims in this study were twofold. First, we assessed the optimal number of replications for obtaining stable DFI cutoffs. The goal of this aim was to determine if DFI can be made more efficient by arriving at essentially the same cutoffs but using fewer simulation replications. Second, we examined the impact of model and data characteristics on the number of replications for stable DFI cutoffs. The goal of this aim was to determine if the optimal number of replications was relatively constant across conditions or whether the optimal number of replications was idiosyncratic and depended on the characteristics of the model or data.
Simulation Design
We generated multivariate normal, continuous data for two empirical models, a one-factor model and a three-factor model. For each model, we varied the number of DFI replications and drew 500 random samples in each condition, choosing realistic conditions that are common in psychological research, to examine the number of DFI replications needed to obtain stable fit index cutoffs. The five manipulated factors in this simulation study were the number of DFI replications, the structure of the model (i.e., number of factors and items), the magnitude of factor loadings, the sample size, and the type of data (i.e., continuous or ordered categorical).
One-Factor Model
In the one-factor model, we created misspecification by generating data that were truly two-dimensional, following the same misspecifications used by McNeish and Wolf (2023b) and McNeish (2023). Figure 1 presents the structure of the data-generating models for the one-factor model (illustrative model syntax follows Figure 1). The misspecified model had two factors: 75% of the items loaded on factor 1 and 25% of the items loaded on factor 2, with no cross-loadings. We held the magnitude of misspecification constant by fixing the standardized loadings at .60 for items loading on factor 2 and the correlation between factors 1 and 2 at .50. This misspecification ensured that the factor 2 items had standardized loadings of around .30 when all items were fit in the misspecified one-factor model.
Figure 1.
Data Generating Models and Fitted Models for the One-Factor Model.
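As a hedged reconstruction of this design, the data-generating model for the 8-item condition with .60 loadings might be written in simstandard syntax as follows (item labels are ours; see Figure 1 for the exact layout):

```r
library(simstandard)

# Two-dimensional population model: 6 of 8 items on factor 1 (75%),
# 2 items on factor 2 with .60 loadings, factor correlation .50
dgm_1f <- '
  F1 =~ .6*y1 + .6*y2 + .6*y3 + .6*y4 + .6*y5 + .6*y6
  F2 =~ .6*y7 + .6*y8
  F1 ~~ .5*F2
'
continuous_data <- sim_standardized(dgm_1f, n = 400,
                                    latent = FALSE, errors = FALSE)

# The fitted (misspecified) one-factor empirical model
one_factor <- 'F =~ y1 + y2 + y3 + y4 + y5 + y6 + y7 + y8'
```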
The manipulated factors in the one-factor model are as follows. First, as the key objective of this study, we varied the number of DFI replications (25, 50, 100, 200, 300, 400, 500, and 1,000) to cover a plausible range of replication counts. We compared the fit index cutoffs generated in conditions with fewer replications to those from the condition with 1,000 replications and assessed the cost-benefit trade-off of each DFI replication condition. Second, per McNeish and Wolf (2023b) and McNeish (2023), we varied the number of items and the magnitude of factor loadings: there were either 8 or 12 items, and the standardized factor loadings of the factor 1 items were .60, .75, or .90. In addition, we considered two levels of sample size: N = 400 and N = 1,000. Jackson et al. (2009) reviewed 194 studies using CFA in APA journals and found an average sample size of 389, so we used N = 400 as the lower bound of the sample size levels.
Regarding data type, multivariate normal, continuous data were generated in each condition based on the structure of the model, magnitude of factor loadings, and sample size. The model was fitted using maximum likelihood estimation in conditions with continuous data.
Flake et al. (2017) randomly sampled 39 articles published in the Journal of Personality and Social Psychology in 2014 and found that, of the 351 scales reported in these articles, 81% used Likert-type response scales. Examining ordered categorical data is therefore crucial because models for such data are widely used. The ordered categorical data in this study were created by categorizing the initially generated continuous data into bins, yielding data on a scale of 1 to 5. To create an approximately normal response distribution, the first 10% of the data were labeled 1, the next 20% (up to 30%) labeled 2, the next 40% (up to 70%) labeled 3, the next 20% (up to 90%) labeled 4, and the last 10% labeled 5. The models in conditions with ordered categorical data were fitted using weighted least squares estimation with mean and variance corrections (often referred to as WLSMV in latent variable software).
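Continuing the sketch following Figure 1, one way to implement this binning and the WLSMV fit in R (the helper name `to_ordinal` is ours):

```r
library(lavaan)

# Bin each continuous item at its empirical 10th/30th/70th/90th
# percentiles, yielding the 10/20/40/20/10 five-category pattern
to_ordinal <- function(x) {
  findInterval(x, quantile(x, c(.10, .30, .70, .90))) + 1  # integers 1-5
}
ordinal_data <- as.data.frame(lapply(continuous_data, to_ordinal))

# Weighted least squares with mean and variance corrections (WLSMV)
fit_cat <- cfa(one_factor, data = ordinal_data,
               ordered = names(ordinal_data), estimator = "WLSMV")
```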
Three-Factor Model
In the three-factor model, as shown in Figure 2, we created misspecification by allowing 25% of the items to have cross-loadings, similar in extent to the misspecification in the one-factor model (illustrative syntax follows Figure 2). The cross-loadings in the three-factor model were standardized and fixed at .30, and the factor correlations were fixed at .40 (McNeish, 2023). The number of DFI replications and sample size conditions were identical to those used in the one-factor model. As for the number of items and the magnitude of factor loadings, the three-factor model had either 12 or 24 items (4 or 8 items per factor), and the standardized factor loadings were .60, .70, or .80. These loadings differed from those in the one-factor model for two reasons: first, with more items in the three-factor model, more moderate loadings are more realistic; second, creating meaningful misspecifications requires smaller standardized loadings to allow for larger cross-loadings. The same process was used to generate both the continuous and ordered categorical data, and maximum likelihood estimation and WLSMV estimation were used to fit the models with continuous and ordered categorical data, respectively.
Figure 2.
Data Generating Models and Fitted Models for the Three-Factor Model.
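A comparable sketch of the three-factor data-generating model for the 12-item condition with .70 loadings; the placement of the cross-loadings below is one plausible rendering, and the exact pattern follows Figure 2:

```r
library(simstandard)

# 12 items, 4 per factor; 3 of 12 items (25%) carry a .30 cross-loading;
# all factor correlations fixed at .40
dgm_3f <- '
  F1 =~ .7*y1 + .7*y2  + .7*y3  + .7*y4  + .3*y5
  F2 =~ .7*y5 + .7*y6  + .7*y7  + .7*y8  + .3*y9
  F3 =~ .7*y9 + .7*y10 + .7*y11 + .7*y12 + .3*y1
  F1 ~~ .4*F2
  F1 ~~ .4*F3
  F2 ~~ .4*F3
'
# The fitted empirical model omits all cross-loadings
three_factor <- '
  F1 =~ y1 + y2  + y3  + y4
  F2 =~ y5 + y6  + y7  + y8
  F3 =~ y9 + y10 + y11 + y12
'
```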
For both the one-factor model and the three-factor model, as shown in Figures 1 and 2, we generated data using misspecified models, and the empirical models were used in the DFI algorithm to determine cutoffs for SRMR, RMSEA, and CFI. For each generated dataset, SRMR, RMSEA, and CFI cutoffs were saved, and the average values across the 500 random samples were evaluated in each condition. We considered the cutoffs from the conditions with 1,000 DFI replications as the reference cutoffs since these cutoffs yielded the least Monte Carlo error.
We assessed the cost-benefit trade-off of each DFI replication condition (i.e., 25, 50, 100, 200, 300, 400, and 500) relative to 1,000 DFI replications. To investigate the optimal number of DFI replications, in each of the 500 random samples we calculated the difference between the cutoff in each DFI replication condition and the cutoff from 1,000 DFI replications, across model structure, factor loading, and sample size conditions. Because most software rounds fit indices to the third decimal place, we considered an average difference of .001 or less negligible (i.e., accuracy to the third decimal place). Data were generated in R version 4.3.1 (R Core Team, 2023) using the simstandard R package (Schneider, 2021). Models were fit with the lavaan R package (Rosseel, 2012) using default settings. DFI cutoffs for SRMR, RMSEA, and CFI were obtained with the dynamic R package (Wolf & McNeish, 2023).
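In R, this stability criterion amounts to the following (the `cutoffs` matrix is hypothetical: one row per random sample, one column per DFI replication condition):

```r
# Average difference from the 1,000-replication reference, per condition;
# |average difference| <= .001 is treated as three-decimal accuracy
reps_grid  <- c("25", "50", "100", "200", "300", "400", "500")
avg_diff   <- sapply(reps_grid,
                     function(r) mean(cutoffs[, r] - cutoffs[, "1000"]))
negligible <- abs(avg_diff) <= .001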
Simulation Results
Overall Results
Table 1 shows the SRMR, RMSEA, and CFI cutoffs at various numbers of DFI replications with continuous data, compared with the reference cutoffs from 1,000 DFI replications, averaging over the number of items, factor loading, and sample size conditions. Table 2 shows the results with categorical data in the same format. The optimal number of replications varied with model and data characteristics. Overall, 200 DFI replications tended to be sufficient, and models with more items and a larger sample size often required even fewer.
Table 1.
Average Difference in SRMR, RMSEA, and CFI Cutoffs Compared With 1,000 DFI Replications With Continuous Data.
| Model Type | DFI Reps | SRMR: 1,000 Reps | SRMR: Estimated | SRMR: Difference | RMSEA: 1,000 Reps | RMSEA: Estimated | RMSEA: Difference | CFI: 1,000 Reps | CFI: Estimated | CFI: Difference |
|---|---|---|---|---|---|---|---|---|---|---|
| One-factor model | 25 | 0.036 | 0.037 | 0.0003 | 0.048 | 0.050 | **0.0013** | 0.978 | 0.977 | −0.0005 |
| | 50 | 0.036 | 0.037 | 0.0007 | 0.048 | 0.049 | 0.0006 | 0.978 | 0.978 | −0.0003 |
| | 100 | 0.036 | 0.038 | **0.0012** | 0.048 | 0.050 | **0.0014** | 0.978 | 0.977 | **−0.0012** |
| | 200 | 0.036 | 0.037 | 0.0002 | 0.048 | 0.049 | 0.0002 | 0.978 | 0.978 | −0.0002 |
| | 300 | 0.036 | 0.036 | 0.0000 | 0.048 | 0.048 | −0.0002 | 0.978 | 0.978 | 0.0001 |
| | 400 | 0.036 | 0.036 | 0.0000 | 0.048 | 0.048 | −0.0002 | 0.978 | 0.978 | 0.0002 |
| | 500 | 0.036 | 0.036 | 0.0001 | 0.048 | 0.048 | −0.0002 | 0.978 | 0.978 | 0.0002 |
| Three-factor model | 25 | 0.038 | 0.039 | 0.0008 | 0.054 | 0.055 | 0.0009 | 0.967 | 0.966 | **−0.0013** |
| | 50 | 0.038 | 0.038 | 0.0000 | 0.054 | 0.054 | 0.0002 | 0.967 | 0.967 | −0.0001 |
| | 100 | 0.038 | 0.038 | 0.0001 | 0.054 | 0.054 | 0.0000 | 0.967 | 0.967 | −0.0002 |
| | 200 | 0.038 | 0.039 | 0.0001 | 0.054 | 0.054 | −0.0001 | 0.967 | 0.967 | 0.0000 |
| | 300 | 0.038 | 0.038 | −0.0001 | 0.054 | 0.054 | −0.0004 | 0.967 | 0.968 | 0.0003 |
| | 400 | 0.038 | 0.038 | −0.0001 | 0.054 | 0.054 | −0.0002 | 0.967 | 0.967 | 0.0001 |
| | 500 | 0.038 | 0.038 | 0.0000 | 0.054 | 0.054 | −0.0001 | 0.967 | 0.967 | 0.0001 |
Note. This table displays the average difference between the estimated fit index cutoffs in each DFI replication condition ("Estimated" columns) and the cutoffs from 1,000 DFI replications ("1,000 Reps" columns). Bolded numbers indicate average differences larger than 0.001 in absolute value. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Table 2.
Average Difference in SRMR, RMSEA, and CFI Cutoffs Compared With 1,000 DFI Replications With Categorical Data.
| Model Type | DFI Reps | SRMR: 1,000 Reps | SRMR: Estimated | SRMR: Difference | RMSEA: 1,000 Reps | RMSEA: Estimated | RMSEA: Difference | CFI: 1,000 Reps | CFI: Estimated | CFI: Difference |
|---|---|---|---|---|---|---|---|---|---|---|
| One-factor model | 25 | 0.025 | 0.026 | **−0.00943** | 0.027 | 0.030 | **−0.00293** | 0.996 | 0.995 | **0.00108** |
| | 50 | 0.025 | 0.026 | **−0.00350** | 0.027 | 0.027 | −0.00031 | 0.996 | 0.995 | 0.00046 |
| | 100 | 0.025 | 0.026 | **−0.00266** | 0.027 | 0.027 | 0.00009 | 0.996 | 0.996 | 0.00011 |
| | 200 | 0.025 | 0.025 | −0.00010 | 0.027 | 0.027 | −0.00001 | 0.996 | 0.995 | 0.00014 |
| | 300 | 0.025 | 0.025 | 0.00032 | 0.027 | 0.026 | 0.00089 | 0.996 | 0.996 | −0.00032 |
| | 400 | 0.025 | 0.025 | −0.00004 | 0.027 | 0.027 | 0.00025 | 0.996 | 0.996 | −0.00011 |
| | 500 | 0.025 | 0.025 | 0.00002 | 0.027 | 0.027 | 0.00029 | 0.996 | 0.996 | −0.00012 |
| Three-factor model | 25 | 0.030 | 0.029 | 0.00033 | 0.020 | 0.019 | 0.00077 | 0.996 | 0.996 | −0.00068 |
| | 50 | 0.030 | 0.030 | −0.00037 | 0.020 | 0.020 | −0.00038 | 0.996 | 0.995 | 0.00019 |
| | 100 | 0.030 | 0.029 | 0.00019 | 0.020 | 0.019 | 0.00038 | 0.996 | 0.996 | −0.00028 |
| | 200 | 0.030 | 0.030 | 0.00006 | 0.020 | 0.020 | 0.00013 | 0.996 | 0.996 | −0.00010 |
| | 300 | 0.030 | 0.030 | −0.00006 | 0.020 | 0.020 | −0.00033 | 0.996 | 0.996 | −0.00002 |
| | 400 | 0.030 | 0.029 | 0.00022 | 0.020 | 0.019 | 0.00049 | 0.996 | 0.996 | −0.00034 |
| | 500 | 0.030 | 0.030 | 0.00012 | 0.020 | 0.019 | 0.00028 | 0.996 | 0.996 | −0.00034 |
Note. This table displays the average difference between the estimated fit index cutoffs in each DFI replication condition ("Estimated" columns) and the cutoffs from 1,000 DFI replications ("1,000 Reps" columns). Bolded numbers indicate average differences larger than 0.001 in absolute value. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
For the one-factor model, with 200 or more DFI replications, the SRMR, RMSEA, and CFI cutoffs were all, on average, within a negligible difference of 0.001 of the reference cutoffs from 1,000 DFI replications. For the three-factor model, 50 or more DFI replications sufficed for all three cutoffs to be within a negligible difference of the reference cutoffs. Under the less strict criterion of two-decimal accuracy, the average difference was below 0.01 for all SRMR, RMSEA, and CFI cutoffs regardless of the number of DFI replications, for both the one-factor and three-factor models. The variability of the fit index cutoffs remained essentially constant regardless of the number of DFI replications in most conditions, with the exception that, in models with categorical data, the variability of the RMSEA cutoffs decreased as the number of DFI replications increased. A full table of standard deviations of the cutoffs across conditions is reported in the supplemental materials (see Online Supplementary Materials, Table S1).
Thus, overall, three-decimal accuracy can be obtained with 200 DFI replications, and two-decimal accuracy with 25 DFI replications. Practically speaking, because runtime scales approximately linearly with the number of DFI replications, running 200 DFI replications reduces the time required by approximately 60% compared with the default of 500, and running 25 DFI replications cuts the time by about 95%. On average, running 200 DFI replications instead of 500 saves 23 seconds with continuous data and 289 seconds with categorical data, and running 25 DFI replications instead of 500 saves 38 seconds with continuous data and 455 seconds with categorical data (see Figure 3).
Figure 3.
Average Time Spent to Run the DFI Function in Seconds.
Note. The top panel displays results for one-factor models. The bottom panel displays results for three-factor models. In each panel, the column facets represent the number of items and sample size conditions. The shapes and colors represent the data type conditions.
This substantial reduction in time alleviates potential constraints and demonstrates that the DFI approach can be made more accessible and more efficient. However, as we discuss in the following sections, model structure, factor loading, and sample size conditions may affect the DFI cutoffs as well as the number of DFI replications needed to obtain stable cutoffs.
Results for the One-Factor Model
Figure 4 shows the average cutoffs for SRMR, RMSEA, and CFI across the number-of-items and sample size conditions in the one-factor model with continuous data. Figure 5 shows the results with categorical data. These results were averaged over factor loading conditions because the cutoffs followed a similar pattern across levels of factor loadings (see Supplemental Materials for complete results). For models with continuous data, on average, a larger sample size led to larger RMSEA cutoffs and smaller SRMR and CFI cutoffs, whereas a larger number of items led to smaller cutoffs for SRMR, RMSEA, and CFI. For models with categorical data, on average, a larger sample size led to larger CFI cutoffs and smaller SRMR and RMSEA cutoffs, whereas a larger number of items led to smaller cutoffs for SRMR and RMSEA and larger cutoffs for CFI.
Figure 4.
Average SRMR, RMSEA, and CFI Cutoffs in the One-Factor Model With Continuous Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Figure 5.
Average SRMR, RMSEA, and CFI Cutoffs in the One-Factor Model With Categorical Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
As shown in Figures 6 and 7, we examined the average difference in SRMR, RMSEA, and CFI cutoffs in the one-factor model with continuous data (Figure 6) and categorical data (Figure 7) at various numbers of DFI replications, compared with the reference cutoffs from 1,000 DFI replications. We considered a difference of 0.001 or less negligible, as annotated by the dashed lines in Figures 6 and 7. We present the results averaged across factor loading levels because the differences followed a similar pattern in the different loading conditions (see Online Supplemental Materials for complete results).
Figure 6.
Average Difference in SRMR, RMSEA, and CFI Cutoffs Compared With 1,000 DFI Replications in the One-Factor Model With Continuous Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. The dashed lines show a difference of ±0.001. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Figure 7.
Average Difference in SRMR, RMSEA, and CFI Cutoffs Compared With 1,000 DFI Replications in the One-Factor Model With Categorical Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. The dashed lines show a difference of ±0.001. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Both a larger sample size and a larger number of items led to more stable cutoffs, even with a small number of DFI replications. For SRMR, the DFI cutoffs tended to stabilize within a difference of 0.001 of the 1,000-replication reference cutoffs at 200 replications; with 12 items and N = 1,000, all SRMR cutoffs were within a difference of 0.001 regardless of the number of DFI replications. That is, three-decimal accuracy was achieved, on average, with as few as 25 DFI replications. For RMSEA, the cutoffs similarly stabilized within a difference of 0.001 at 200 replications, except when the data were categorical with 8 items and N = 1,000, where three-decimal accuracy was not achieved until 500 DFI replications. For CFI, there were fluctuations with a small number of items and a small sample size when the data were continuous: with 8 items and N = 400, the CFI cutoffs for models with continuous data were within a difference of 0.001 at 50, 200, and 500 replications, but the average difference exceeded 0.001 at 25, 100, 300, and 400 DFI replications. With a larger number of items and sample size, the CFI cutoffs for models with continuous data tended to stabilize at 200 DFI replications. For models with categorical data, the CFI cutoffs stabilized within a difference of 0.001 of the 1,000-replication reference cutoffs with as few as 50 replications.
Results for the Three-Factor Model
Figure 8 displays the average cutoffs for SRMR, RMSEA, and CFI across the number-of-items and sample size conditions in the three-factor model with continuous data, averaging over factor loading conditions (see Supplemental Materials for complete results). Figure 9 shows the results with categorical data. Consistent with the one-factor model results, a larger sample size and a larger number of items led to more stable cutoffs, and there were fewer fluctuations in the DFI cutoffs of the three-factor model overall.
Figure 8.
Average SRMR, RMSEA, and CFI Cutoffs in the Three-Factor Model With Continuous Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Figure 9.
Average SRMR, RMSEA, and CFI Cutoffs in the Three-Factor Model With Categorical Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Figure 10 shows the difference in SRMR, RMSEA, and CFI cutoffs compared with the reference cutoffs from 1,000 DFI replications in the three-factor model with continuous data, averaging over factor loading conditions (see Supplemental Materials for complete results). Figure 11 shows the results with categorical data. Compared with the one-factor model, fewer DFI replications were needed to obtain stable DFI cutoffs in the three-factor model. For models with continuous data, all SRMR cutoffs were within a difference of 0.001 except with 12 items, N = 400, and 25 DFI replications. For both RMSEA and CFI, all DFI cutoffs were within a difference of 0.001 with 50 or more DFI replications, except with 12 items, N = 400, and 300 DFI replications.
Figure 10.
Average Difference in SRMR, RMSEA, and CFI Cutoffs Compared With 1,000 DFI Replications in the Three-Factor Model With Continuous Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. The dashed lines show a difference of 0.001. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
Figure 11.
Average Difference in SRMR, RMSEA, and CFI Cutoffs Compared With 1,000 DFI Replications in the Three-Factor Model With Categorical Data.
Note. The top panel displays results for SRMR. The middle panel displays results for RMSEA. The bottom panel displays results for CFI. In each panel, the column facets represent the number of items and sample size conditions. The dashed lines show a difference of 0.001. SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation; CFI = comparative fit index.
For models with categorical data, all SRMR DFI cutoffs were within a difference of 0.001 regardless of the number of DFI replications, number of items, or sample size. For RMSEA, all DFI cutoffs were within a difference of 0.001 except with 12 items, N = 400, and 400 or 500 DFI replications, and with 24 items, N = 400, and 50 DFI replications. For CFI, all DFI cutoffs were within a difference of 0.001 except with 12 items, N = 400, and 25, 400, or 500 DFI replications, and with 24 items, N = 400, and 25 DFI replications. Thus, 500 DFI replications may not be enough for RMSEA and CFI in 12-item three-factor models with categorical data and N = 400. With a larger number of items and a larger sample size, the SRMR, RMSEA, and CFI cutoffs tended to be within a difference of 0.001 of the 1,000-replication reference cutoffs.
The results for the three-factor model were similar to those for the one-factor model, with two major distinctions. First, the DFI cutoffs in the three-factor model were more stable with a small number of DFI replications: whereas the one-factor model often required 200 DFI replications to obtain stable cutoffs, 50 DFI replications were sufficient in many three-factor conditions. Second, the magnitude of factor loadings had a larger effect on the DFI cutoffs in some three-factor conditions; overall, smaller factor loadings tended to produce larger differences and less stable DFI cutoffs.
Discussion
Determining model fit is a crucial aspect of CFA. While the fit index cutoffs proposed by Hu and Bentler (1999) continue to be widely used today, these rules of thumb have been criticized as overgeneralized, since fit index cutoffs depend on sample and model characteristics. To address these concerns, methodologists have recommended simulation-based methods that give empirical researchers customized fit index cutoffs based on their specific models (Kim & Millsap, 2014; Millsap, 2007). Nonetheless, these methods can pose challenges, including the complexities of statistical programming and simulation studies and, importantly, time constraints. The DFI approach is an accessible method that offers customized fit index cutoffs for model fit evaluation; its current algorithm uses a default of 500 replications in its simulation to generate fit index cutoffs. In this study, we examined the optimal number of replications for generating stable fit index cutoffs.
We conducted a simulation study and determined that, in general, 200 DFI replications were sufficient for obtaining stable DFI cutoffs with three-decimal accuracy, and 25 DFI replications were sufficient for two-decimal accuracy. Running 200 DFI replications reduces the required time by approximately 60% compared with the default of 500 DFI replications, while running 25 DFI replications reduces the time by about 95%. The findings suggest that the DFI method provides stable cutoff values with 500 replications, the current recommendation, but efficiency can be improved by using fewer replications, particularly in categorical data simulations, as this substantially decreases computational time with little impact on the generated cutoff values. However, the optimal number may vary with specific sample and model characteristics. Overall, fewer replications were needed in the three-factor model than in the one-factor model, and a larger number of items and a larger sample size both led to more stable DFI cutoffs that required fewer replications.
While the magnitude of factor loadings did not affect the stability of the cutoffs in the one-factor model, in the three-factor model larger factor loadings led to slightly more stable cutoffs. Although the overall results showed that 200 DFI replications were sufficient when averaging over all conditions, there was one case in which even 500 DFI replications may not be enough to obtain stable DFI cutoffs: the DFI cutoffs for RMSEA and CFI did not stabilize at 500 DFI replications for 12-item three-factor models with categorical data and N = 400. Additional investigation is needed to determine the number of DFI replications required for these types of models.
In addition, our findings confirmed previous research showing that CFA fit index cutoffs vary with model structure, factor loadings, and sample size (Groskurth et al., 2023; Heene et al., 2011; Marsh et al., 2004; McNeish et al., 2018). Our results, along with previous findings (e.g., Wolf & McNeish, 2023), showed that the Hu and Bentler cutoffs are too lenient for certain model characteristics. For example, similar to Fan and Sivo (2005), our results showed that the Hu and Bentler SRMR cutoff (SRMR < 0.08) is sensitive to omitted factor correlations in multifactor models, making it overly lenient for one-factor models: the appropriate SRMR cutoffs were below 0.04 in most one-factor conditions.
All simulation studies have limitations because their results depend on the specific design and conditions used. In this study, we examined a one-factor model and a three-factor model with a limited range of number-of-items, factor loading, and sample size conditions. It is unknown whether our findings generalize to other types of CFA models or to other structural equation models such as mediation models and growth models. In addition, this simulation was based on a previously studied misspecification, and different types of misspecification may affect the results. Future research is needed to investigate the optimal number of DFI replications for other models with different data and model characteristics. Moreover, we defined stability in terms of the cost-benefit of each replication condition relative to 1,000 DFI replications; other definitions of "stable cutoffs" may also be reasonable, such as requiring a minimum tolerable change in the cutoff for each additional 100 replications.
Overall, our results demonstrated that the DFI approach generates stable cutoffs with its current default of 500 replications but that it can be made more efficient with fewer replications, particularly in time-intensive simulations involving categorical data; in many of the cases we examined, far fewer DFI replications sufficed. Using fewer replications reduces computational time with minimal impact on the results. In most situations with a one-factor or three-factor model, 200 DFI replications were optimal for obtaining stable fit index cutoffs for SRMR, RMSEA, and CFI.
Supplemental Material
Supplemental material, sj-docx-1-epm-10.1177_00131644241290172 for Optimal Number of Replications for Obtaining Stable Dynamic Fit Index Cutoffs by Xinran Liu and Daniel McNeish in Educational and Psychological Measurement
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Institute of Education Sciences, award number R305D220003. The content is solely the responsibility of the authors and does not represent the official views of the US Department of Education or the Institute of Education Sciences.
ORCID iDs: Xinran Liu https://orcid.org/0000-0002-4784-0790; Daniel McNeish https://orcid.org/0000-0003-1643-9408
Supplemental Material: Supplemental material for this article is available online.
References
- Bentler P. M. (1995). EQS structural equations program manual. Multivariate Software.
- Bentler P. M., Bonett D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588–606. 10.1037/0033-2909.88.3.588
- Browne M. W., Cudeck R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230–258. 10.1177/0049124192021002005
- Chen F., Curran P. J., Bollen K. A., Kirby J., Paxton P. (2008). An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36(4), 462–494. 10.1177/0049124108314720
- Fan X., Sivo S. A. (2005). Sensitivity of fit indexes to misspecified structural or measurement model components: Rationale of two-index strategy revisited. Structural Equation Modeling: A Multidisciplinary Journal, 12(3), 343–367. 10.1207/s15328007sem1203_1
- Flake J. K., Pek J., Hehman E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378. 10.1177/1948550617693063
- Groskurth K., Bluemke M., Lechner C. M. (2023). Why we need to abandon fixed cutoffs for goodness-of-fit indices: An extensive simulation and possible solutions. Behavior Research Methods, 56, 3891–3914. 10.3758/s13428-023-02193-3
- Heene M., Hilbert S., Draxler C., Ziegler M., Buehner M. (2011). Masking misfit in confirmatory factor analysis by increasing unique variances: A cautionary note on the usefulness of cutoff values of fit indices. Psychological Methods, 16, 319–336. 10.1037/a0024917
- Hu L., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. 10.1080/10705519909540118
- Hu L., Bentler P. M., Kano Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112(2), 351–362. 10.1037/0033-2909.112.2.351
- Jackson D. L., Gillaspy J. A., Purc-Stephenson R. (2009). Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods, 14(1), 6–23. 10.1037/a0014694
- Jöreskog K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183–202. 10.1007/BF02289343
- Jöreskog K. G., Sörbom D. (1982). Recent developments in structural equation modeling. Journal of Marketing Research, 19(4), 404–416. 10.2307/3151714
- Kim H., Millsap R. (2014). Using the Bollen-Stine bootstrapping method for evaluating approximate fit indices. Multivariate Behavioral Research, 49(6), 581–596. 10.1080/00273171.2014.947352
- Kline R. B. (2016). Principles and practice of structural equation modeling (4th ed.). The Guilford Press.
- Marsh H. W., Hau K.-T., Wen Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling: A Multidisciplinary Journal, 11(3), 320–341. 10.1207/s15328007sem1103_2
- McNeish D. (2023). Generalizability of dynamic fit index, equivalence testing, and Hu & Bentler cutoffs for evaluating fit in factor analysis. Multivariate Behavioral Research, 58(1), 195–219. 10.1080/00273171.2022.2163477
- McNeish D., An J., Hancock G. R. (2018). The thorny relation between measurement quality and fit index cutoffs in latent variable models. Journal of Personality Assessment, 100(1), 43–52. 10.1080/00223891.2017.1281286
- McNeish D., Wolf M. G. (2023a). Dynamic fit index cutoffs for confirmatory factor analysis models. Psychological Methods, 28(1), 61–88. 10.1037/met0000425
- McNeish D., Wolf M. G. (2023b). Dynamic fit index cutoffs for one-factor models. Behavior Research Methods, 55(3), 1157–1174. 10.3758/s13428-022-01847-y
- Millsap R. E. (2007). Structural equation modeling made difficult. Personality and Individual Differences, 42(5), 875–881. 10.1016/j.paid.2006.09.021
- Niemand T., Mai R. (2018). Flexible cutoff values for fit indices in the evaluation of structural equation models. Journal of the Academy of Marketing Science, 46(6), 1148–1172. 10.1007/s11747-018-0602-9
- Pornprasertmanit S., Miller P., Schoemann A., Jorgensen T. D., Quick C. (2021). simsem: SIMulated structural equation modeling (Version 0.5-16) [Computer software]. https://github.com/simsem
- R Core Team. (2023). R: A language and environment for statistical computing (Version 4.3.1) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
- Rios J., Wells C. (2014). Validity evidence based on internal structure. Psicothema, 26(1), 108–116. 10.7334/psicothema2013.260
- Rosseel Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. 10.18637/jss.v048.i02
- Schneider W. J. (2021). simstandard: Generate standardized data (Version 0.6.3) [Computer software]. https://wjschne.github.io/simstandard
- Shi D., Lee T., Maydeu-Olivares A. (2019). Understanding the model size effect on SEM fit indices. Educational and Psychological Measurement, 79(2), 310–334. 10.1177/0013164418783530
- Steiger J. H., Lind J. C. (1980, May). Statistically based tests for the number of common factors [Paper presentation]. Psychometric Society, Iowa City, IA, United States.
- West S. G., Wu W., McNeish D., Savord A. (2023). Model fit in structural equation modeling. In Hoyle R. H. (Ed.), Handbook of structural equation modeling (2nd ed., pp. 184–205). The Guilford Press.
- Wolf M. G., McNeish D. (2023). Dynamic: An R package for deriving dynamic fit index cutoffs for factor analysis. Multivariate Behavioral Research, 58(1), 189–194. 10.1080/00273171.2022.2163476