Abstract
Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been systematically investigated. Therefore, we evaluated the accuracy of traditional and revised parallel analyses for determining the number of underlying dimensions in the IRT framework by conducting simulation studies. Six data generation factors were manipulated: number of observations, test length, type of generation models, number of dimensions, correlations between dimensions, and item discrimination. Results indicated that (a) when the generated IRT model is unidimensional, across all simulation conditions, traditional parallel analysis using principal component analysis and tetrachoric correlation performs best; (b) when the generated IRT model is multidimensional, traditional parallel analysis using principal component analysis and tetrachoric correlation yields the highest proportion of accurately identified underlying dimensions across all factors, except when the correlation between dimensions is 0.8 or the item discrimination is low; and (c) under a few combinations of simulated factors, none of the eight methods performed well (e.g., when the generation model is three-dimensional 3PL, the item discrimination is low, and the correlation between dimensions is 0.8).
Keywords: item response theory, dimensionality, parallel analysis, revised parallel analysis
Item response theory (IRT) has been widely applied to educational and psychological assessment because it has several advantages over classical test theory (CTT), such as invariance of person and item parameters (Baker & Kim, 2017). However, these advantages only apply when IRT’s assumptions are met, one of which is unidimensionality. When this assumption is violated, the estimations of item and ability parameters may be inaccurate (Finch & Monahan, 2008). In this case, multidimensional IRT (MIRT) may be used for analysis. Before applying MIRT, researchers should examine the dimensions underlying their data to decide which model they should use. Thus, determining the number of dimensions is important when applying IRT models to data.
Exploratory factor analysis (EFA) is a commonly used method to explore parsimonious and meaningful latent factors that could explain covariation between measures. Many empirical criteria have been applied to determine the number of factors that should be retained. The eigenvalue-greater-than-one criterion is one popular method. It is easy to understand and use, but many simulation studies have shown that this criterion tends to overfactor (e.g., Horn, 1965; Linn, 1968). Another commonly used method is the scree test proposed by Cattell (1966). Some researchers have argued that “although this scree test may work well with strong factors, it suffers from subjectivity and ambiguity” (Hayton et al., 2004, p. 198). Horn (1965) proposed parallel analysis (PA) to transcend the limitations of the eigenvalue-greater-than-one criterion. We refer to Horn’s PA as traditional PA (TPA). Since its proposal, many researchers have suggested modifications to TPA to improve its performance in terms of correctly identifying the number of underlying dimensions (e.g., Buja & Eyuboglu, 1992; Drasgow & Lissak, 1983; Finch & Monahan, 2008). However, Green et al. (2012) have argued that these modified PA methods have one common problem: They “include the generation of comparison data sets containing completely random data to assess the number of factors” (p. 358). To correct this problem, Green et al. (2012) proposed a revised PA (RPA) method, which considers k−1 factors in generated comparison data sets when researchers assess the need for the kth factor. Green et al. (2016) extended this RPA to dichotomous items. Although the two-parameter logistic (2PL) IRT model can be considered equivalent to the factor analysis of binary variables (Takane & De Leeuw, 1987), the three-parameter logistic (3PL) IRT includes a guessing parameter and thus is not equivalent to factor analysis with dichotomous variables. 
Some researchers (e.g., Tran & Formann, 2009) have evaluated TPA’s performance within IRT frameworks; however, their work has been restricted to unidimensional 2PL IRT. Therefore, the purpose of this study is to assess the accuracy of TPA and RPA in identifying underlying dimensions in both unidimensional and MIRT.
Brief Review of TPA and RPA
Horn's (1965) TPA procedure comprises five steps. (a) Apply principal component analysis (PCA) to the observed data and record the eigenvalues. (b) Generate 100 comparison data sets, each of which has the same number of variables (M) and sample size (N) as the observed data. The data in each comparison data set are generated assuming the variables are multivariate normally distributed and uncorrelated in the population. (c) Conduct PCA on each comparison data set and record the eigenvalues. (d) Calculate the means of the first, second, . . ., Mth eigenvalues across the 100 comparison data sets. (e) Sequentially compare the eigenvalues for the observed data with the mean eigenvalues of the comparison data sets, and count how many leading eigenvalues for the observed data exceed the corresponding means. This count is the number of factors that should be extracted.
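The five steps above can be sketched in code. The following Python sketch (function and parameter names are ours, not from the original study) implements TPA with PCA eigenvalues computed from a Pearson correlation matrix; the `percentile` argument anticipates the common 95th-percentile variant of the reference criterion:

```python
import numpy as np

def tpa_num_factors(data, n_sets=100, percentile=None, seed=0):
    """Traditional parallel analysis (Horn, 1965) with PCA eigenvalues.

    A minimal sketch: `percentile=None` uses the mean-eigenvalue criterion,
    `percentile=95` the 95th-percentile variant.
    """
    rng = np.random.default_rng(seed)
    n, m = data.shape
    # (a) eigenvalues of the observed correlation matrix (PCA eigenvalues)
    obs_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # (b)-(c) eigenvalues for comparison data sets of uncorrelated normal variables
    comp_eigs = np.empty((n_sets, m))
    for s in range(n_sets):
        comp = rng.standard_normal((n, m))
        comp_eigs[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(comp, rowvar=False)))[::-1]
    # (d) reference eigenvalues: mean (or chosen percentile) across comparison sets
    if percentile is None:
        ref = comp_eigs.mean(axis=0)
    else:
        ref = np.percentile(comp_eigs, percentile, axis=0)
    # (e) count leading observed eigenvalues that exceed the reference values
    k = 0
    while k < m and obs_eigs[k] > ref[k]:
        k += 1
    return k
```

For data generated from a single strong factor, both criteria recover one dimension; the only design choice is the reference statistic in step (d).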
To improve TPA's performance, researchers have argued that the eigenvalues for the observed data should be compared with the 95th percentile rather than the mean of the eigenvalues for the comparison data sets (e.g., Crawford et al., 2010; Glorfeld, 1995). The 95th percentile has been the most popular choice to date (Timmerman & Lorenzo-Seva, 2011). In addition, some researchers (e.g., Green et al., 2016; Humphreys & Ilgen, 1969) have suggested that the extraction method in TPA, PCA, could be replaced by principal axis factoring (PAF). One problem with PCA is that "it accounts for all of the variance in the correlation matrix to which it is applied" (Pett et al., 2003, p. 74), whereas common factor analysis methods (e.g., PAF) separate the common variance shared among variables from unique variance. Most measures in education and psychology contain random error, and PAF "procedures reflect a recognition of this fact, whereas PCA does not, [and] the common factor model is a more realistic model of the structure or correlations" (Fabrigar et al., 1999, p. 276). However, Crawford et al. (2010) compared TPA's performance under the PAF and PCA extraction methods and concluded that neither method uniformly outperformed the other. Therefore, in this study, we compared performance using both extraction methods.
Recently, researchers have argued that the rationale underlying TPA is questionable for evaluating all eigenvalues except for the first one because “the empirical distribution of eigenvalues beyond the first one is conditioned on the presence of zero factors rather than K factors” (Green et al., 2012, p. 360). To overcome this limitation, RPA was proposed to determine whether there are K+1 factors generated based on the model with K factors using comparison data sets. The eigenvalue of the (K+1)th factor in an observed data set is compared with those in comparison data sets. Specifically, the steps of RPA are as follows:
1. Apply PAF to the observed data and record eigenvalues and standardized loadings for unrotated factors.
2. Generate 100 comparison data sets, each with the same number of variables (M) and sample size (N) as the observed data. The data assuming K factors are generated using the following formula:

Y = FΛ′ + E,

where Y is an N × M matrix containing the responses of N participants to M variables; F is an N × K matrix containing generated scores on the K underlying dimensions (the generated scores are independent and sampled from a normal distribution with a mean of 0 and a variance of 1); Λ is an M × K matrix containing the standardized loadings of the M variables on the K underlying factors, calculated from the observed data; and E is an N × M matrix containing residual scores for the M variables. The residual scores are assumed to be independent and normally distributed, with a mean of 0 and a variance equal to 1 minus the sum of the K squared loadings for that variable. When K = 0, Y = E.
3. Conduct a PAF for each comparison data set and record eigenvalues for the (K+1)th factor.
4. Calculate the 95th percentile of the (K+1)th eigenvalues for the 100 comparison data sets.
5. If the (K+1)th eigenvalue in the observed data set is less than the 95th percentile of the (K+1)th eigenvalues for the 100 comparison data sets, the RPA is complete. The number of underlying factors is K. Otherwise, the number of underlying factors is at least K+1 unless the situation in Step 6 occurs.
6. For PAF, if the (K+1)th eigenvalue in the observed data set is less than or equal to 0, the RPA is complete. The number of factors is K. For PCA, if the (K+1)th eigenvalue in the observed data set is less than or equal to 1, the RPA is complete. The number of factors is K.
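The steps above can be sketched in Python. This is a simplified illustration of the PAF variant: the basic PAF routine, the SMC starting communalities, the iteration count, and all names are ours, not Green et al.'s exact algorithm:

```python
import numpy as np

def paf(R, k, n_iter=50):
    """Basic principal axis factoring: iterate communalities on the diagonal
    of the reduced correlation matrix; returns its eigenvalues (descending)
    and the loadings of the first k factors (None when k = 0)."""
    m = R.shape[0]
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))       # SMC starting communalities
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1]
        vals, vecs = vals[idx], vecs[:, idx]
        if k == 0:
            break
        L = vecs[:, :k] * np.sqrt(np.clip(vals[:k], 0, None))
        h2 = np.clip((L ** 2).sum(axis=1), 0, 0.999)
    return vals, (None if k == 0 else L)

def rpa_num_factors(data, n_sets=100, max_k=5, seed=0):
    """Revised parallel analysis sketch (after Green et al., 2012), PAF variant."""
    rng = np.random.default_rng(seed)
    n, m = data.shape
    R = np.corrcoef(data, rowvar=False)
    for k in range(max_k + 1):
        obs_vals, loadings = paf(R, k)           # steps 1 and 3 for observed data
        if obs_vals[k] <= 0:                     # step 6: non-positive (k+1)th eigenvalue
            return k
        comp_kp1 = np.empty(n_sets)
        for s in range(n_sets):                  # step 2: comparison data with k factors
            if k == 0:
                comp = rng.standard_normal((n, m))        # Y = E when K = 0
            else:
                F = rng.standard_normal((n, k))           # generated factor scores
                resid_sd = np.sqrt(np.clip(1 - (loadings ** 2).sum(axis=1), 1e-6, None))
                comp = F @ loadings.T + rng.standard_normal((n, m)) * resid_sd
            Rc = np.corrcoef(comp, rowvar=False)
            comp_kp1[s] = paf(Rc, k)[0][k]       # step 3: (k+1)th eigenvalue
        # steps 4-5: compare with the 95th percentile of comparison eigenvalues
        if obs_vals[k] < np.percentile(comp_kp1, 95):
            return k
    return max_k
```

The key difference from TPA is visible in the inner loop: once k factors are under consideration, the comparison data are generated from a k-factor model built from the observed loadings rather than from pure noise.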
Green et al. (2016) applied RPA to binary data to identify the number of underlying dimensions. Based on their simulation studies, they found that RPA with PAF using the 95th percentile criterion tended to yield relatively high accuracy compared with TPA with PAF (also using the 95th percentile criterion), although this was not uniform across all conditions. The steps of RPA with binary data are similar to those when analyzing continuous data, with two exceptions. First, for binary data, tetrachoric correlations are used in RPA instead of Pearson product–moment correlation (labeled Pearson correlation). Second, when generating comparison data sets, thresholds are imposed to create binary data.
Pearson Correlation Versus Tetrachoric Correlation of Binary Data
In the case of dichotomous items, tetrachoric correlation (a special case of polychoric correlations) should be used instead of Pearson correlation to conduct either TPA or RPA. First, tetrachoric correlation can produce an unbiased estimation of relationships between binary observed variables (Olsson, 1979), while Pearson correlation tends to underestimate these relationships (Bollen & Barb, 1981). Second, factor analysis of Pearson correlation may produce spurious factors (referred to as difficulty factors for binary variables) when two variables are skewed in opposite directions or two items have noticeably different difficulties (Embretson & Reise, 2000); in contrast, variables’ skewness or different item difficulties do not influence the estimation of tetrachoric correlation (Carroll, 1945).
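The attenuation of the Pearson correlation under dichotomization is easy to demonstrate by simulation. The sketch below uses an arbitrary latent correlation of .6 and illustrative thresholds of our choosing; it is not from the original study:

```python
import numpy as np

rng = np.random.default_rng(42)
rho = 0.6                                        # latent correlation between two normals
x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000).T

# Median splits: two items of equal, moderate difficulty
phi_equal = np.corrcoef((x > 0).astype(float), (y > 0).astype(float))[0, 1]

# One very easy and one very hard item (thresholds at -1.5 and +1.5):
# skewness in opposite directions attenuates the Pearson phi even further
phi_skewed = np.corrcoef((x > -1.5).astype(float), (y > 1.5).astype(float))[0, 1]

print(phi_equal, phi_skewed)   # both fall well below the latent rho of 0.6
```

A tetrachoric correlation estimated from the same binary tables would target the latent ρ = .6 itself, which is the motivation for preferring it with dichotomous items.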
However, in practice, estimation of tetrachoric correlation may introduce some problems, such as nonpositive definite matrices, which have serious implications for factor analysis. Some researchers (e.g., Debelak & Tran, 2016; Timmerman & Lorenzo-Seva, 2011) suggest using smoothing algorithms to transform nonpositive definite matrices to positive ones. Debelak and Tran (2013) found that applying smoothing algorithms leads to a more accurate assessment of dimensionality compared with not doing so. In addition, both Carroll (1945) and DeMars (2019) have pointed out that a nonzero lower asymptote may influence the calculation of tetrachoric correlation, and thus researchers may take “guessing” into consideration.
Apart from these theoretical issues, empirical findings on the accuracy of TPA with Pearson correlation versus tetrachoric correlation have been mixed. For example, Cho et al. (2009) found that TPA with Pearson correlation performed at least as well as TPA with polychoric correlation under almost all conditions, while Garrido et al. (2012) showed that TPA with polychoric correlation performed better. Therefore, in this study, we compare both types of correlations’ performance.
Purpose of the Present Study
The purpose of this study is to evaluate the performance of TPA and RPA regarding identifying the number of factors that should be extracted within the IRT framework. Specifically, based on Pearson correlation, we compared the performances of TPA using PAF, TPA using PCA, RPA using PAF, and RPA using PCA. Furthermore, based on tetrachoric correlation, we compared the performances of TPA using PAF, TPA using PCA, RPA using PAF, and RPA using PCA.
Method
Data Generation
As shown in Table 1, 324 conditions were included in the data generation design. In each condition, 200 data sets were generated. For both unidimensional IRT (UIRT) and MIRT, three independent factors were manipulated. Sample sizes were N = 500 or 1,000. Test lengths were J = 30 or 60. As suggested by Baker and Kim (2017), the item discrimination parameters were drawn from a distribution determined by A, where A = 0.5, 1.0, or 1.5. The purpose of generating low, medium, and high discrimination parameters was to reflect low, medium, and high factor loadings in factor analysis (Tran & Formann, 2009). We used the Rasch, 2PL, and 3PL generating models.
Table 1.
Data Generation Design Factors.
| Design factors | | UIRT | MIRT |
|---|---|---|---|
| Manipulated | Sample size (N) | 500, 1,000 | 500, 1,000 |
| | Test length (J) | 30, 60 | 30, 60 |
| | Item discrimination (A) | Low, medium, high | Low, medium, high |
| | Generating model (M) | Rasch, 2PL, 3PL | Rasch, 2PL, 3PL |
| | Dimension (D) | 1 | 2, 3 |
| | Individual ability parameters (θ) | N(0, 1) | MVN(0, Σ) |
| | Correlation between dimensions (r) | — | 0, 0.3, 0.5, 0.8 |
| Held constant | Item difficulty parameters (b) | N(0, 1) | N(0, 1) |
| | Item guessing parameters (c) | Beta(41, 161) | Beta(41, 161) |
Note. UIRT = unidimensional item response theory; MIRT = multidimensional item response theory; 2PL = two-parameter logistic; 3PL = three-parameter logistic; MVN = multivariate normal distribution.
In UIRT, the 3PL model can be expressed as follows:

P_j(θ_i) = c_j + (1 − c_j) exp[a_j(θ_i − b_j)] / {1 + exp[a_j(θ_i − b_j)]},

where P_j(θ_i) is the probability that person i with ability θ_i answers item j correctly, c_j is the pseudo-chance parameter for item j, a_j is the discrimination parameter for item j, and b_j is the difficulty parameter for item j. Neither the 2PL model nor the Rasch model contains a pseudo-chance parameter. In addition, in the Rasch model, the discrimination parameter is equal to 1.
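The model family can be illustrated with a one-line response function (a sketch; the function name is ours):

```python
import numpy as np

def p_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.
    Setting c = 0 gives the 2PL model; c = 0 and a = 1 gives the Rasch model."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))
```

At θ = b the probability equals (1 + c)/2, halfway between the guessing floor c and 1; as θ decreases, the curve approaches c rather than 0, which is what distinguishes 3PL from the 2PL and Rasch models.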
In MIRT, the number of dimensions was set to two levels: D = 2 or 3. The correlations between dimensions were 0, 0.3, 0.5, or 0.8. Each item measured only one dimension, which is referred to as a simple structure, and all dimensions were measured simultaneously.
One common form of the MIRT model proposed by Ackerman (1994) is the following:

P_j(θ_i) = c_j + (1 − c_j) exp(a_j′θ_i + d_j) / [1 + exp(a_j′θ_i + d_j)],

where P_j(θ_i) is the probability that person i with latent trait vector θ_i answers item j correctly, c_j is the pseudo-chance parameter for item j, a_jk (the kth element of a_j) is the discrimination parameter for item j on latent trait k, and d_j is the difficulty (intercept) parameter for item j. Similarly, the 2PL MIRT model does not contain a pseudo-chance parameter. In the Rasch MIRT model, the discrimination parameters are equal to 1, and no pseudo-chance parameter exists.
In UIRT, θ was normally distributed with a mean of 0 and a standard deviation of 1. In the two-dimensional IRT, θ was bivariate normally distributed with mean vector (0, 0)′ and correlation matrix Σ, and in the three-dimensional IRT, θ was multivariate normally distributed with mean vector (0, 0, 0)′ and correlation matrix Σ. As suggested by Baker and Kim (2017), the distribution of the item difficulty parameters was assumed to be standard normal, and the pseudo-chance parameters were distributed Beta(41, 161) (see Table 1). We generated data using the simdata function of the mirt R package (Chalmers, 2012).
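For readers who want a self-contained illustration of the generation design, the following Python sketch mirrors it (the study itself used mirt::simdata in R; the function name, the item-to-dimension assignment rule, and the spread of discrimination values around A are our assumptions):

```python
import numpy as np

def simulate_mirt_3pl(n=500, j=30, d=2, r=0.3, a_mean=1.0, seed=0):
    """Simple-structure multidimensional 3PL data, mirroring the design:
    MVN(0, Sigma) abilities, N(0, 1) difficulties, Beta(41, 161) guessing.
    Illustrative sketch only; the discrimination spread (SD = 0.2) is a
    hypothetical choice, not taken from the study."""
    rng = np.random.default_rng(seed)
    sigma = np.full((d, d), r) + (1 - r) * np.eye(d)   # equicorrelated dimensions
    theta = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    dims = np.arange(j) % d                             # each item loads on one dimension
    a = np.abs(rng.normal(a_mean, 0.2, size=j))         # discriminations around A
    b = rng.normal(0.0, 1.0, size=j)                    # difficulties ~ N(0, 1)
    c = rng.beta(41, 161, size=j)                       # guessing ~ Beta(41, 161)
    p = c + (1 - c) / (1 + np.exp(-a * (theta[:, dims] - b)))
    return (rng.random((n, j)) < p).astype(int)
```

Because each item loads on exactly one dimension, the design corresponds to the simple structure described above; setting c to zeros and a to ones would recover the multidimensional Rasch case.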
Data Analysis
We used the eight methods shown in Table 2 to analyze the generated data sets (324 conditions × 200 replications). For each simulated factor and combination of these factors, we calculated the proportion of replications in which each method accurately identified the number of dimensions. According to the calculated proportions, we ranked all eight methods under each combination of simulated factors and calculated the proportion of combinations in which each method ranked first, separately for each generating model. Moreover, we calculated the proportion of replications in which each method underestimated or overestimated the number of dimensions for each simulated factor. As mentioned previously, estimation of tetrachoric correlations may lead to a nonpositive definite correlation matrix. In this case, we used a smoothing algorithm to make the matrix positive definite. Specifically, we used the cor.smooth function in the psych R package (Revelle, 2022), which replaces the negative eigenvalues to yield a positive definite matrix. In addition, when the generating model was 3PL, before calculating the tetrachoric correlations, we used the method proposed by Carroll (1945) to correct the two-way tables between item pairs to account for the effect of guessing.
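The smoothing step can be illustrated with a generic eigenvalue-clipping sketch. This mirrors the idea behind psych::cor.smooth but is not its exact algorithm; the function name and the eps floor are ours:

```python
import numpy as np

def smooth_corr(R, eps=1e-6):
    """Eigenvalue smoothing for a non-positive definite correlation matrix:
    clip negative eigenvalues, rebuild the matrix, and rescale so the
    diagonal is 1 again."""
    vals, vecs = np.linalg.eigh(R)
    if vals.min() > 0:
        return R                               # already positive definite
    vals = np.clip(vals, eps, None)            # replace negative eigenvalues
    S = vecs @ np.diag(vals) @ vecs.T          # rebuild from clipped spectrum
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)                  # restore unit diagonal
```

Rescaling by the diagonal is a congruence transformation, so the smoothed matrix remains positive definite while again being a valid correlation matrix.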
Table 2.
Eight Parallel Analysis Methods.
| No. | Method | PA | Correlation | Factor extraction |
|---|---|---|---|---|
| 1 | TPA–PEA–PCA | Traditional | Pearson | Principal components analysis |
| 2 | TPA–TET–PCA | Traditional | Tetrachoric | Principal components analysis |
| 3 | RPA–PEA–PCA | Revised | Pearson | Principal components analysis |
| 4 | RPA–TET–PCA | Revised | Tetrachoric | Principal components analysis |
| 5 | TPA–PEA–PAF | Traditional | Pearson | Principal axis factoring |
| 6 | TPA–TET–PAF | Traditional | Tetrachoric | Principal axis factoring |
| 7 | RPA–PEA–PAF | Revised | Pearson | Principal axis factoring |
| 8 | RPA–TET–PAF | Revised | Tetrachoric | Principal axis factoring |
Note. TPA = traditional parallel analysis; PEA = Pearson correlation; PCA = principal components analysis; TET = tetrachoric correlation; RPA = revised parallel analysis; PAF = principal axis factoring.
Results
Unidimensional Data
Accurately Identifying the Number of Dimensions
Table 3 displays the proportion of replications in which each method correctly detected the number of dimensions when the number of underlying dimensions was 1. From this table, we observed that TPA–TET–PCA (TET refers to tetrachoric correlation) was the best performing of the eight methods. Specifically, the proportion for TPA–TET–PCA ranged from 0.82 to 1.00. Generally, regardless of PA method and factor extraction method, tetrachoric correlation outperformed Pearson correlation, and regardless of PA method and type of correlation, PCA was superior to PAF.
Table 3.
Proportion of Each Method Accurately Identifying the Number of Dimensions for UIRT-Generated Data.
| Factors | Levels | TPA–TET–PAF | TPA–TET–PCA | RPA–TET–PAF | RPA–TET–PCA | TPA–PEA–PAF | TPA–PEA–PCA | RPA–PEA–PAF | RPA–PEA–PCA |
|---|---|---|---|---|---|---|---|---|---|
| M | Rasch | 0.77 | 1.00 | 0.80 | 0.79 | 0.00 | 0.06 | 0.00 | 0.00 |
| | 2PL | 0.71 | 1.00 | 0.92 | 0.92 | 0.17 | 0.36 | 0.13 | 0.18 |
| | 3PL | 0.56 | 0.82 | 0.54 | 0.50 | 0.27 | 0.52 | 0.25 | 0.29 |
| N | 500 | 0.83 | 0.97 | 0.82 | 0.80 | 0.18 | 0.36 | 0.15 | 0.18 |
| | 1,000 | 0.53 | 0.91 | 0.69 | 0.68 | 0.12 | 0.26 | 0.10 | 0.13 |
| A | Low | 0.88 | 0.97 | 0.95 | 0.94 | 0.62 | 0.90 | 0.55 | 0.66 |
| | Medium | 0.70 | 0.97 | 0.77 | 0.74 | 0.02 | 0.15 | 0.01 | 0.02 |
| | High | 0.43 | 0.83 | 0.53 | 0.51 | 0.00 | 0.13 | 0.00 | 0.00 |
| J | 30 | 0.64 | 0.98 | 0.85 | 0.82 | 0.19 | 0.43 | 0.17 | 0.21 |
| | 60 | 0.72 | 0.90 | 0.66 | 0.65 | 0.10 | 0.20 | 0.08 | 0.11 |
Note. UIRT = unidimensional item response theory; TPA = traditional parallel analysis; TET = tetrachoric correlation; PAF = principal axis factoring; PCA = principal components analysis; RPA = revised parallel analysis; PEA = Pearson correlation; 2PL = two-parameter logistic; 3PL = three-parameter logistic.
Apart from investigating the performance of each method under each simulated factor, we further examined their performance under each combination of factors. When the unidimensional Rasch model was used for data generation, TPA–TET–PCA ranked highest for all combinations. That is, for all combinations, TPA–TET–PCA demonstrated the highest proportion of correctly identifying the number of underlying dimensions. As presented in Figure 1, TPA–TET–PCA considerably outperformed the other seven methods when the test length was 60 and the sample size was 1,000. Specifically, with this combination of factors, TPA–TET–PCA identified the number of dimensions with 100% accuracy, while the other seven methods' proportions were below 50%. In addition, across all combinations, the performances of TPA–PEA–PAF, TPA–PEA–PCA, RPA–PEA–PAF, and RPA–PEA–PCA were not acceptable, as their identification accuracy was less than 25%.
Figure 1.
Proportion of Each Method Accurately Identifying the Number of Dimensions Under Each Combination of Factors When the Generating Model Is the Unidimensional Rasch Model.
When the unidimensional 2PL model was used for data generation, for all combinations, TPA–TET–PCA ranked highest. TPA–TET–PAF performed equally well when the test length was 60 and the sample size was 500. As shown in Figure 2, generally, when discrimination was medium or high, TPA–PEA–PAF, TPA–PEA–PCA, RPA–PEA–PAF, and RPA–PEA–PCA did not perform well, as their identification accuracy was less than 25%.
Figure 2.
Proportion of Each Method Accurately Identifying the Number of Dimensions Under Each Combination of Factors When the Generating Model Is the Unidimensional 2PL Model.
Note. 2PL = two-parameter logistic.
When the unidimensional 3PL model was used for data generation, TPA–TET–PCA typically ranked highest, followed by TPA–PEA–PCA. Specifically, TPA–TET–PCA outperformed the other seven methods for 75% of the combinations of factors, while TPA–PEA–PCA performed best in just under 25% of the combinations. As Figure 3 shows, TPA–TET–PCA far surpassed the other seven methods when discrimination was high, test length was 30, and the sample size was 1,000. Specifically, TPA–TET–PCA identified the number of underlying dimensions with 89.5% accuracy, while the accuracy of each of the other seven methods was lower than 25%. In addition, when the sample size was 1,000, test length was 60, and item discrimination was high, none of the eight methods performed well, with each method's identification accuracy being lower than 10%.
Figure 3.
Proportion of Each Method Accurately Identifying the Number of Dimensions Under Each Combination of Factors When the Generating Model Is the Unidimensional 3PL Model.
Note. 3PL = three-parameter logistic.
Inaccurately Identifying the Number of Dimensions
Table 4 shows each method's proportion of underestimating the number of dimensions when there was one underlying dimension. Of the eight methods, only TPA–TET–PAF underestimated the number of dimensions, and it did so across all 10 conditions. That is, TPA–TET–PAF was the method most likely to treat unidimensional data as having no factor structure. The remaining seven methods did not underestimate the number of dimensions. From Table 5, we can observe that all eight methods tended to overestimate the number of dimensions when there was a single underlying dimension. However, RPA–PEA–PAF (PEA refers to Pearson correlation) exhibited the highest proportions, ranging from 0.45 to 1.00, across all 10 conditions. That is, RPA–PEA–PAF was the method most likely to overestimate the number of dimensions. TPA–PEA–PAF exhibited the second-highest proportion of overestimation. Generally, regardless of PA method and factor extraction method, Pearson correlation tended to overestimate the number of dimensions.
Table 4.
Proportion of Each Method’s Underestimation of the Number of Dimensions for UIRT-Generated Data.
| Factors | Levels | TPA–TET–PAF | TPA–TET–PCA | RPA–TET–PAF | RPA–TET–PCA | TPA–PEA–PAF | TPA–PEA–PCA | RPA–PEA–PAF | RPA–PEA–PCA |
|---|---|---|---|---|---|---|---|---|---|
| M | Rasch | 0.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 2PL | 0.15 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 3PL | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| N | 500 | 0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 1,000 | 0.14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| A | Low | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Medium | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | High | 0.16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| J | 30 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 60 | 0.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Note. UIRT = unidimensional item response theory; TPA = traditional parallel analysis; TET = tetrachoric correlation; PAF = principal axis factoring; PCA = principal components analysis; RPA = revised parallel analysis; PEA = Pearson correlation; 2PL = two-parameter logistic; 3PL = three-parameter logistic.
Table 5.
Proportion of Each Method’s Overestimation of the Number of Dimensions for UIRT-Generated Data.
| Factors | Levels | TPA–TET–PAF | TPA–TET–PCA | RPA–TET–PAF | RPA–TET–PCA | TPA–PEA–PAF | TPA–PEA–PCA | RPA–PEA–PAF | RPA–PEA–PCA |
|---|---|---|---|---|---|---|---|---|---|
| M | Rasch | 0.15 | 0.00 | 0.20 | 0.21 | 1.00 | 0.94 | 1.00 | 1.00 |
| | 2PL | 0.14 | 0.00 | 0.08 | 0.08 | 0.83 | 0.64 | 0.87 | 0.82 |
| | 3PL | 0.36 | 0.18 | 0.46 | 0.50 | 0.73 | 0.48 | 0.75 | 0.71 |
| N | 500 | 0.10 | 0.03 | 0.18 | 0.20 | 0.82 | 0.64 | 0.85 | 0.82 |
| | 1,000 | 0.32 | 0.09 | 0.31 | 0.32 | 0.88 | 0.74 | 0.90 | 0.87 |
| A | Low | 0.11 | 0.03 | 0.05 | 0.06 | 0.38 | 0.10 | 0.45 | 0.34 |
| | Medium | 0.18 | 0.03 | 0.23 | 0.26 | 0.98 | 0.85 | 0.99 | 0.98 |
| | High | 0.41 | 0.17 | 0.47 | 0.49 | 1.00 | 0.87 | 1.00 | 1.00 |
| J | 30 | 0.24 | 0.02 | 0.15 | 0.18 | 0.81 | 0.57 | 0.83 | 0.79 |
| | 60 | 0.19 | 0.10 | 0.34 | 0.35 | 0.90 | 0.80 | 0.92 | 0.89 |
Note. UIRT = unidimensional item response theory; TPA = traditional parallel analysis; TET = tetrachoric correlation; PAF = principal axis factoring; PCA = principal components analysis; RPA = revised parallel analysis; PEA = Pearson correlation; 2PL = two-parameter logistic; 3PL = three-parameter logistic.
Multidimensional Data
Accurately Identifying the Number of Dimensions
Table 6 displays the accuracy of each method in identifying the number of dimensions when either the two-dimensional IRT or three-dimensional IRT model was used to generate data. TPA–TET–PCA identified the number of underlying dimensions with the highest accuracy in all conditions except when correlations between dimensions were 0.8 or item discrimination was low. RPA–TET–PCA performed best when the correlation between dimensions was 0.8. TPA–PEA–PCA performed best when item discrimination was low. When the test length was 30 or when the multidimensional 2PL model was used to generate data, RPA–TET–PCA performed as well as TPA–TET–PCA. Generally, regardless of PA methods and factor extraction methods, tetrachoric correlation outperformed Pearson correlation. Regardless of PA methods and type of correlation, PCA was superior to PAF.
Table 6.
Proportion of Each Method Accurately Identifying the Number of Dimensions for MIRT-Generated Data.
| Factors | Levels | TPA–TET–PAF | TPA–TET–PCA | RPA–TET–PAF | RPA–TET–PCA | TPA–PEA–PAF | TPA–PEA–PCA | RPA–PEA–PAF | RPA–PEA–PCA |
|---|---|---|---|---|---|---|---|---|---|
| M | Rasch | 0.84 | 0.93 | 0.89 | 0.88 | 0.11 | 0.51 | 0.03 | 0.03 |
| | 2PL | 0.69 | 0.85 | 0.83 | 0.85 | 0.28 | 0.65 | 0.25 | 0.33 |
| | 3PL | 0.46 | 0.74 | 0.61 | 0.60 | 0.36 | 0.71 | 0.36 | 0.42 |
| N | 500 | 0.77 | 0.82 | 0.78 | 0.79 | 0.31 | 0.67 | 0.24 | 0.29 |
| | 1,000 | 0.56 | 0.86 | 0.78 | 0.75 | 0.20 | 0.57 | 0.18 | 0.23 |
| r | 0 | 0.71 | 0.93 | 0.86 | 0.84 | 0.32 | 0.78 | 0.27 | 0.33 |
| | 0.3 | 0.71 | 0.94 | 0.85 | 0.84 | 0.30 | 0.77 | 0.25 | 0.32 |
| | 0.5 | 0.70 | 0.93 | 0.81 | 0.81 | 0.26 | 0.67 | 0.22 | 0.27 |
| | 0.8 | 0.54 | 0.56 | 0.59 | 0.60 | 0.12 | 0.27 | 0.10 | 0.12 |
| A | Low | 0.47 | 0.57 | 0.61 | 0.66 | 0.56 | 0.73 | 0.69 | 0.70 |
| | Medium | 0.75 | 0.91 | 0.86 | 0.85 | 0.18 | 0.59 | 0.09 | 0.14 |
| | High | 0.65 | 0.93 | 0.72 | 0.70 | 0.13 | 0.61 | 0.04 | 0.12 |
| D | 2 | 0.72 | 0.91 | 0.80 | 0.79 | 0.22 | 0.58 | 0.20 | 0.24 |
| | 3 | 0.61 | 0.77 | 0.76 | 0.75 | 0.28 | 0.67 | 0.23 | 0.29 |
| J | 30 | 0.55 | 0.79 | 0.77 | 0.79 | 0.34 | 0.78 | 0.26 | 0.34 |
| | 60 | 0.78 | 0.89 | 0.78 | 0.76 | 0.17 | 0.47 | 0.16 | 0.18 |
Note. MIRT = multidimensional item response theory; TPA = traditional parallel analysis; TET = tetrachoric correlation; PAF = principal axis factoring; PCA = principal components analysis; RPA = revised parallel analysis; PEA = Pearson correlation; 2PL = two-parameter logistic; 3PL = three-parameter logistic.
To gain greater insights into these methods, we also explored their performance under combinations of factors. Figure 4 shows the eight methods’ accuracy in identifying the number of underlying dimensions for each combination of simulated factors when the multidimensional Rasch model was used to generate data. From this figure, we observed that RPA–PEA–PAF and RPA–PEA–PCA performed worse than other methods across all combinations. In contrast, TPA–TET–PCA typically ranked highest. Specifically, for 87.50% of the combinations, TPA–TET–PCA most accurately identified the number of dimensions across 200 replications. However, when the correlations between dimensions were 0.8, the number of underlying dimensions was 3, the test length was 30, and the sample size was 500, TPA–TET–PCA performed worst among the eight methods. In this case, RPA–TET–PCA performed best, identifying the number of dimensions with 83% accuracy.
Figure 4.
Proportion of Each Method Accurately Identifying the Number of Dimensions Under Each Combination of Factors When the Generating Model Is the Multidimensional Rasch Model.
When the multidimensional 2PL model was used to generate data, TPA–TET–PCA was typically top-ranking, and TPA–PEA–PCA demonstrated the second-best performance. Specifically, TPA–TET–PCA performed best for 59.37% of the combinations of factors, while TPA–PEA–PCA performed best for 29.17% of the combinations. As shown in Figure 5, when item discrimination was low, the number of dimensions was 3, the correlations between dimensions were 0.8, and the test length was 30, all eight methods performed unsatisfactorily (accuracy below 25%). Another interesting observation was that RPA–PEA–PAF and RPA–PEA–PCA generally performed worse when item discrimination was medium or high than when it was low.
Figure 5.
Proportion of Each Method Accurately Identifying the Number of Dimensions Under Each Combination of Factors When the Generating Model Is the Multidimensional 2PL Model.
Note. 2PL = two-parameter logistic.
When the multidimensional 3PL model was used to generate data, TPA–TET–PCA most frequently performed the best, followed by TPA–PEA–PCA. Specifically, for 46.88% of the combinations of factors, TPA–TET–PCA produced the highest proportion of correctly detecting the number of dimensions, while for 23.96% of the combinations, TPA–PEA–PCA performed best. As shown in Figure 6, TPA–TET–PCA substantially outperformed the other seven methods when item discrimination was high, test length was 60, correlations between dimensions were 0.8, and the sample size was 1,000. When item discrimination was low, the number of dimensions was 3, and the correlations between dimensions were 0.8, none of the eight methods performed well (accuracy below 25%).
Figure 6.
Proportion of Each Method Accurately Identifying the Number of Dimensions Under Each Combination of Factors When the Generating Model Is the Multidimensional 3PL Model.
Note. 3PL = three-parameter logistic.
Inaccurately Identifying the Number of Dimensions
As shown in Table 7, TPA–TET–PCA was most likely to underestimate the number of dimensions when MIRT was used to generate data, followed by TPA–TET–PAF and RPA–TET–PAF. When the correlations between dimensions were 0.8, all eight methods tended to underestimate the number of dimensions, although TPA–TET–PCA produced the highest proportion of underestimation. As shown in Table 8, TPA–PEA–PAF overestimated the number of dimensions to a greater degree than any other method. RPA–PEA–PAF tended to overestimate the number of dimensions when the underlying dimensions were 2 or 3, with two exceptions: when the multidimensional 3PL model was used to generate data or when item discrimination was low. When the test length was 60, TPA–PEA–PAF overestimated the number of dimensions to the same degree as RPA–PEA–PAF did. When the multidimensional Rasch model was used to generate data, RPA–PEA–PCA overestimated the number of dimensions to the same degree as RPA–PEA–PAF did.
Table 7.
Proportion of Each Method’s Underestimation of the Number of Dimensions for MIRT-Generated Data.
Factors | Levels | TPA–TET–PAF | TPA–TET–PCA | RPA–TET–PAF | RPA–TET–PCA | TPA–PEA–PAF | TPA–PEA–PCA | RPA–PEA–PAF | RPA–PEA–PCA
---|---|---|---|---|---|---|---|---|---
M | Rasch | 0.04 | 0.07 | 0.01 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00
M | 2PL | 0.10 | 0.11 | 0.09 | 0.07 | 0.03 | 0.08 | 0.03 | 0.04
M | 3PL | 0.15 | 0.16 | 0.16 | 0.13 | 0.06 | 0.12 | 0.07 | 0.08
N | 500 | 0.12 | 0.13 | 0.10 | 0.08 | 0.04 | 0.10 | 0.04 | 0.05
N | 1,000 | 0.07 | 0.09 | 0.07 | 0.05 | 0.02 | 0.06 | 0.02 | 0.03
r | 0 | 0.03 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
r | 0.3 | 0.04 | 0.00 | 0.03 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00
r | 0.5 | 0.06 | 0.03 | 0.06 | 0.04 | 0.01 | 0.02 | 0.02 | 0.02
r | 0.8 | 0.26 | 0.42 | 0.25 | 0.22 | 0.10 | 0.29 | 0.11 | 0.13
A | Low | 0.26 | 0.24 | 0.32 | 0.26 | 0.12 | 0.21 | 0.14 | 0.16
A | Medium | 0.05 | 0.08 | 0.02 | 0.02 | 0.00 | 0.05 | 0.00 | 0.01
A | High | 0.04 | 0.06 | 0.01 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00
D | 2 | 0.06 | 0.05 | 0.05 | 0.04 | 0.01 | 0.03 | 0.01 | 0.02
D | 3 | 0.13 | 0.17 | 0.12 | 0.10 | 0.04 | 0.12 | 0.05 | 0.06
J | 30 | 0.11 | 0.16 | 0.12 | 0.09 | 0.04 | 0.12 | 0.05 | 0.06
J | 60 | 0.08 | 0.06 | 0.05 | 0.05 | 0.01 | 0.04 | 0.02 | 0.02
Note. MIRT = multidimensional item response theory; TPA = traditional parallel analysis; TET = tetrachoric correlation; PAF = principal axis factoring; PCA = principal components analysis; RPA = revised parallel analysis; PEA = Pearson correlation; 2PL = two-parameter logistic; 3PL = three-parameter logistic; M = generating model; N = sample size; r = correlation between dimensions; A = item discrimination; D = number of dimensions; J = test length.
Table 8.
Proportion of Each Method’s Overestimation of the Number of Dimensions for MIRT-Generated Data.
Factors | Levels | TPA–TET–PAF | TPA–TET–PCA | RPA–TET–PAF | RPA–TET–PCA | TPA–PEA–PAF | TPA–PEA–PCA | RPA–PEA–PAF | RPA–PEA–PCA
---|---|---|---|---|---|---|---|---|---
M | Rasch | 0.12 | 0.00 | 0.10 | 0.12 | 0.89 | 0.45 | 0.97 | 0.97
M | 2PL | 0.21 | 0.04 | 0.08 | 0.08 | 0.69 | 0.27 | 0.72 | 0.64
M | 3PL | 0.39 | 0.10 | 0.23 | 0.27 | 0.58 | 0.17 | 0.57 | 0.50
N | 500 | 0.11 | 0.04 | 0.12 | 0.12 | 0.65 | 0.23 | 0.72 | 0.66
N | 1,000 | 0.37 | 0.05 | 0.16 | 0.19 | 0.79 | 0.36 | 0.80 | 0.74
r | 0 | 0.26 | 0.07 | 0.13 | 0.15 | 0.68 | 0.22 | 0.73 | 0.67
r | 0.3 | 0.25 | 0.06 | 0.13 | 0.15 | 0.70 | 0.23 | 0.74 | 0.68
r | 0.5 | 0.24 | 0.04 | 0.13 | 0.15 | 0.73 | 0.31 | 0.76 | 0.71
r | 0.8 | 0.20 | 0.02 | 0.16 | 0.18 | 0.77 | 0.43 | 0.79 | 0.74
A | Low | 0.27 | 0.18 | 0.07 | 0.08 | 0.32 | 0.06 | 0.17 | 0.14
A | Medium | 0.20 | 0.01 | 0.11 | 0.13 | 0.82 | 0.37 | 0.91 | 0.86
A | High | 0.32 | 0.02 | 0.27 | 0.30 | 0.87 | 0.37 | 0.96 | 0.88
D | 2 | 0.22 | 0.04 | 0.15 | 0.17 | 0.76 | 0.39 | 0.79 | 0.74
D | 3 | 0.26 | 0.06 | 0.12 | 0.15 | 0.68 | 0.21 | 0.72 | 0.65
J | 30 | 0.33 | 0.04 | 0.11 | 0.12 | 0.62 | 0.10 | 0.69 | 0.60
J | 60 | 0.14 | 0.05 | 0.16 | 0.20 | 0.82 | 0.50 | 0.82 | 0.79
Note. MIRT = multidimensional item response theory; TPA = traditional parallel analysis; TET = tetrachoric correlation; PAF = principal axis factoring; PCA = principal components analysis; RPA = revised parallel analysis; PEA = Pearson correlation; 2PL = two-parameter logistic; 3PL = three-parameter logistic; M = generating model; N = sample size; r = correlation between dimensions; A = item discrimination; D = number of dimensions; J = test length.
Discussion
There is a large body of research assessing dimensionality of data using TPA and RPA, most of which focuses on evaluating performance in the factor analysis framework (e.g., Green et al., 2016). IRT is a distinct form of factor analysis, especially with respect to multiple-choice items where a guessing parameter is introduced. In addition, the tetrachoric correlation has theoretical advantages when analyzing binary data, but estimation of the correlation may present some problems. In terms of the performance of the two extraction methods—PCA and PAF—results have been mixed (Crawford et al., 2010; Green et al., 2016). Therefore, our study provides important insight into the performance of TPA and RPA using either tetrachoric correlation or Pearson correlation with either the PCA or PAF extraction method. Overall, our findings indicate that (a) when the unidimensional IRT model is used to generate data, TPA using PCA and tetrachoric correlation performs best across all conditions; (b) when the MIRT model is used to generate data, TPA using PCA and tetrachoric correlation most accurately identifies the number of underlying dimensions in all conditions, except when the correlations between dimensions are high or item discrimination is low; and (c) under some combinations of simulated factors, none of the eight methods perform well (e.g., when the unidimensional 3PL was used to generate data, the sample size was 1,000, the test length was 60, and item discrimination was high).
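Traditional parallel analysis retains a component as long as its observed eigenvalue exceeds the corresponding eigenvalue quantile obtained from comparison data with no inter-item structure. The sketch below illustrates this logic for TPA with PCA on a Pearson correlation matrix of binary responses; it is our own minimal illustration, not the code used in the study (which relied on R packages such as psych and mirt), and the function name, settings, and toy data-generating setup are all assumptions for demonstration only.

```python
import numpy as np

def traditional_parallel_analysis(data, n_sims=100, quantile=0.95, seed=0):
    """Minimal sketch of TPA with PCA on a Pearson correlation matrix.

    A component is retained while the observed eigenvalue exceeds the
    chosen quantile of eigenvalues from column-wise permuted data of the
    same size (the resampling variant of Buja & Eyuboglu, 1992).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    ref = np.empty((n_sims, p))
    for s in range(n_sims):
        # Permute each column independently: marginals are preserved,
        # but any correlation between items is destroyed.
        perm = np.column_stack([rng.permutation(data[:, j]) for j in range(p)])
        ref[s] = np.linalg.eigvalsh(np.corrcoef(perm, rowvar=False))[::-1]
    threshold = np.quantile(ref, quantile, axis=0)

    # Count leading components whose observed eigenvalue beats the reference.
    n_dims = 0
    for k in range(p):
        if obs_eig[k] > threshold[k]:
            n_dims += 1
        else:
            break
    return n_dims

# Toy example: 1,000 respondents, 12 binary items driven by one latent trait.
rng = np.random.default_rng(42)
theta = rng.normal(size=(1000, 1))                    # latent abilities
loadings = rng.uniform(0.9, 1.3, size=(1, 12))        # item discriminations
intercepts = rng.normal(0, 0.2, size=(1, 12))         # item easiness
probs = 1 / (1 + np.exp(-(theta @ loadings + intercepts)))
responses = (rng.uniform(size=(1000, 12)) < probs).astype(int)
print(traditional_parallel_analysis(responses))
```

Swapping the Pearson matrix for a tetrachoric matrix (the TPA–TET variants above), or extracting eigenvalues after reduction of the correlation matrix (PAF rather than PCA), changes only the matrix that feeds the same eigenvalue comparison.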
Our findings provide researchers and practitioners with the following guidance. First, generally speaking, TPA using PCA and tetrachoric correlation performs best. Second, TPA with tetrachoric correlation and PAF is more likely to underestimate the number of dimensions when there is a single underlying dimension. Third, RPA with Pearson correlation and PAF tends to overestimate the number of dimensions. Finally, under certain conditions, all eight methods perform unsatisfactorily; researchers and practitioners should therefore use these methods with caution.
Our study has some limitations that warrant attention in future studies. First, the characteristics of our simulation design do not reflect the full scope of educational measurement; thus, we encourage researchers and practitioners to consider the characteristics of the data sets that they analyze before generalizing our findings to other contexts that may have different characteristics. Second, we evaluated the performance of two popular techniques for assessing the dimensionality of data. Recently, other techniques have been proposed, such as exploratory graph analysis (e.g., Golino et al., 2020). In future studies, researchers may evaluate the performance of these newly proposed methods in identifying dimensions in the IRT framework.
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
ORCID iD: Wenjing Guo https://orcid.org/0000-0001-8271-9374
References
- Ackerman T. A. (1994). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7(4), 255–278.
- Baker F. B., Kim S. H. (2017). The basics of item response theory using R. Springer.
- Bollen K. A., Barb K. H. (1981). Pearson’s R and coarsely categorized data. American Sociological Review, 46, 232–239.
- Buja A., Eyuboglu N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27, 509–540.
- Carroll J. B. (1945). The effect of difficulty and chance success on correlations between items or between tests. Psychometrika, 10(1), 1–19.
- Cattell R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.
- Chalmers R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
- Cho S. J., Li F., Bandalos D. (2009). Accuracy of the parallel analysis procedure with polychoric correlations. Educational and Psychological Measurement, 69(5), 748–759.
- Crawford A., Green S. B., Levy R., Lo W.-J., Scott L., Svetina D. S., Thompson M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70, 885–901.
- Debelak R., Tran U. S. (2013). Principal component analysis of smoothed tetrachoric correlation matrices as a measure of dimensionality. Educational and Psychological Measurement, 73(1), 63–77.
- Debelak R., Tran U. S. (2016). Comparing the effects of different smoothing algorithms on the assessment of dimensionality of ordered categorical items with parallel analysis. PLOS ONE, 11(2), 1–18.
- DeMars C. E. (2019). Revised parallel analysis with nonnormal ability and a guessing parameter. Educational and Psychological Measurement, 79(1), 151–169.
- Drasgow F., Lissak R. I. (1983). Modified parallel analysis: A procedure for examining the latent dimensionality of dichotomously scored item responses. Journal of Applied Psychology, 68, 363–373.
- Embretson S. E., Reise S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum.
- Fabrigar L. R., Wegener D. T., MacCallum R. C., Strahan E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299.
- Finch H., Monahan P. (2008). A bootstrap generalization of modified parallel analysis for IRT dimensionality assessment. Applied Measurement in Education, 21(2), 119–140.
- Garrido L. E., Abad F. J., Ponsoda V. (2012). A new look at Horn’s parallel analysis with ordinal variables. Psychological Methods, 18, 454–474.
- Glorfeld L. W. (1995). An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, 377–393.
- Golino H., Shi D., Christensen A. P., Garrido L. E., Nieto M. D., Sadana R., Thiyagarajan J. A., Martinez-Molina A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25(3), 292–320.
- Green S. B., Levy R., Thompson M. S., Lu M., Lo W. J. (2012). A proposed solution to the problem with using completely random data to assess the number of factors with parallel analysis. Educational and Psychological Measurement, 72(3), 357–374.
- Green S. B., Redell N., Thompson M. S., Levy R. (2016). Accuracy of revised and traditional parallel analyses for assessing dimensionality with binary data. Educational and Psychological Measurement, 76, 5–21.
- Hayton J. C., Allen D. G., Scarpello V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7(2), 191–205.
- Horn J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185.
- Humphreys L. G., Ilgen D. R. (1969). Note on a criterion for the number of common factors. Educational and Psychological Measurement, 29, 571–578.
- Linn R. L. (1968). A Monte Carlo approach to the number of factors problem. Psychometrika, 33, 37–71.
- Olsson U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460.
- Pett M. A., Lackey N. R., Sullivan J. J. (2003). Making sense of factor analysis: The use of factor analysis for instrument development in health care research. SAGE.
- Revelle W. (2022). psych: Procedures for psychological, psychometric, and personality research (R package version 2.2.5). Northwestern University. https://CRAN.R-project.org/package=psych
- Takane Y., De Leeuw J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52(3), 393–408.
- Timmerman M. E., Lorenzo-Seva U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16(2), 209–220.
- Tran U. S., Formann A. K. (2009). Performance of parallel analysis in retrieving unidimensionality in the presence of binary data. Educational and Psychological Measurement, 69(1), 50–61.