Abstract
Analytic bifactor rotations (Jennrich & Bentler, 2011, 2012) were recently developed and made generally available, but they are not well understood. The Jennrich-Bentler analytic bifactor rotations (bi-quartimin and bi-geomin) are an alternative to, and arguably an improvement upon, the less technically sophisticated Schmid-Leiman orthogonalization (Schmid & Leiman, 1957). We review the technical details that underlie the Schmid-Leiman and Jennrich-Bentler bifactor rotations, using simulated data structures to illustrate important features and limitations. For the Schmid-Leiman, we review the problem of inaccurate parameter estimates caused by the linear dependencies, sometimes called “proportionality constraints,” that are required to expand a p correlated factors solution into a (p+1) (bi)factor space. We also review the complexities involved when the data depart from perfect cluster structure (e.g., items cross-loading on group factors). For the Jennrich-Bentler rotations, we describe problems in parameter estimation caused by departures from perfect cluster structure. In addition, we illustrate two related problems: (a) solutions that are not invariant under different starting values (i.e., local minima problems), and (b) group factors collapsing onto the general factor. We make recommendations for substantive researchers, including examining all local minima and applying multiple exploratory techniques in an effort to identify an accurate model.
Keywords: exploratory bifactor, Schmid-Leiman, local minima
The development of exploratory structural equation modeling (ESEM; Asparouhov & Muthén, 2009; Marsh, Morin, Parker, & Kaur, 2014) has led to renewed interest in exploratory factor analysis. Accordingly, exploratory bifactor modeling is an increasingly common tool for understanding the latent structure of psychological measures (Canivez, 2015; Dombrowski, 2014; Morin, Arens, & Marsh, 2015). A chief purpose of exploratory bifactor modeling is to correctly partition item variance such that the general factor represents what is common among all the items and the group factors represent systematic variation unrelated to the general factor. When accurate, this partitioning allows researchers to better understand the systematic sources of variance underlying item responses (e.g., Simms et al., 2008), and to calculate useful indices to judge the relative strength of the general factor and its influence on unit-weighted scale scores (Zinbarg, Revelle, Yovel, & Li, 2005), or the viability of scoring global and subscale domains (Rodriguez, Reise, & Haviland, 2015a, 2015b).
Nevertheless, the tools currently available for performing exploratory bifactor modeling possess shortcomings that may prevent them from realizing these goals. The Schmid-Leiman orthogonalization (SL; Schmid & Leiman, 1957), for example, imposes “proportionality constraints,” leading to inaccurate solutions when these constraints fail to hold (Brunner, Nagy, & Wilhelm, 2012; Jennrich & Bentler, 2011, 2012; Yung, Thissen, & McLeod, 1999). SL solutions are also problematic when the data depart from perfect cluster structure (i.e., when items load on more than one group factor; see also Reise, Cook, & Moore, 2015).
Most recently, Jennrich and Bentler (2011, 2012) developed a technically superior approach to exploratory bifactor analysis, describing two analytic bifactor rotation criteria called bi-quartimin and bi-geomin. We refer to such rotations as JB rotations. JB rotations are available in Mplus (Muthén & Muthén, 1998–2012), EQS (Bentler & Wu, 2002), and the GPArotation package in R (Bernaards & Jennrich, 2005; R Core Team, 2015). JB analytic rotations do not impose proportionality constraints and are thus potentially superior to the SL. However, JB rotations: (a) can also be inaccurate when the data depart from perfect cluster structure, (b) can produce different solutions when different starting values are used, and (c) can improperly collapse group factors onto the general factor (Jennrich & Bentler, 2011; Asparouhov & Muthén, 2012).
The goal of this tutorial is therefore to demonstrate and elucidate the causes of these limitations in JB and SL exploratory bifactor solutions and, ultimately, to suggest some remedies for applied researchers. To accomplish this, we must first introduce two technical topics that are critical to understanding the limitations of exploratory bifactor solutions: (a) Crawford-Ferguson (CF; Crawford & Ferguson, 1970; Crawford, 1975) rotations, and (b) the gradient projection algorithm (GPA; Bernaards & Jennrich, 2005). The CF rotation criteria are mathematical functions that a rotation seeks to minimize, while the GPA is a numerical method for minimizing a given rotation criterion.
Crawford-Ferguson Rotations and the Gradient Projection Algorithm
We assume without loss of generality that all observed variables and latent factors have zero means and unit variances. A factor model with n items and p common factors can then be written as
$$\mathbf{S} \approx \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}^{2} \qquad (1)$$
where S is the n by n sample correlation matrix; Σ is the n by n model-reproduced correlation matrix that approximates S; Λ is the n by p matrix of factor loadings, which contains regression coefficients describing the relationship between the observed variables and latent factors; Φ is a p by p factor correlation matrix; and Θ² is the n by n diagonal matrix of unique variance estimates.
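As a small numerical sketch of equation (1), the function below builds Σ for standardized variables. It assumes, as stated above, unit variances, so the unique variances are Θ² = I − diag(ΛΦΛᵀ). The function name and the loading and factor correlation values are hypothetical, chosen only for illustration.

```python
import numpy as np

def implied_correlation(Lambda, Phi):
    """Equation (1): Sigma = Lambda Phi Lambda^T + Theta^2 for standardized variables."""
    common = Lambda @ Phi @ Lambda.T
    # Unique variances are whatever is left of each unit variance.
    Theta2 = np.diag(1.0 - np.diag(common))
    return common + Theta2, Theta2

# Hypothetical two-factor example: items 1-2 on factor 1, items 3-4 on factor 2.
Lambda = np.array([[.7, 0.], [.6, 0.], [0., .8], [0., .5]])
Phi = np.array([[1., .3], [.3, 1.]])
Sigma, Theta2 = implied_correlation(Lambda, Phi)
print(np.round(Sigma, 3))
```

Items on different factors correlate only through Φ; for example, Σ₁₃ = .7 × .3 × .8.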
In a bifactor model with (p+1) factors, Λ has a specific “ideal” form, namely, all items should load on a “general” factor and on one, and only one, of the p remaining “group” factors. In addition, in a bifactor model with orthogonal factors, Φ is a diagonal matrix. The SL and JB exploratory bifactor methods both attempt to achieve this “ideal” bifactor form, but do so in very different ways. In the SL, a researcher begins by extracting p orthogonal factors, performs an oblique rotation of those factors, and then expands that rotated solution via a second-order model into an orthogonal solution with (p + 1) factors. Such an expansion from a p space to a (p + 1) space must impose some form of constraint, as described below. JB bifactor solutions, on the other hand, begin with an extraction of p + 1 orthogonal factors followed by a rotation (bi-quartimin or bi-geomin) to bifactor structure with orthogonal dimensions.¹
Given the above, we argue that the key to understanding the SL and JB lies not in the method used for the initial extraction of p or (p + 1) orthogonal factors², but in how those factors are rotated to a substantively interpretable solution. Specifically, factors are rotated to minimize a rotation criterion function, usually a polynomial function of the elements of the factor loading matrix that is designed to attain its minimum when the loadings have the desired simple structure.
We choose to discuss the CF rotations in this initial exposition on EFA for two reasons. First, many of the most commonly used rotation criteria are subsumed under the Crawford-Ferguson family, including the quartimax (quartimin), varimax, equamax, parsimax, and factor parsimony criteria in the orthogonal case (Browne, 2001). Second, the CF rotations are transparent in how they quantify two key aspects of a parsimonious factor solution: row complexity and column complexity. Although the CF rotations were designed as orthogonal rotations, which produce orthogonal factors (Φ diagonal), the distinction between oblique and orthogonal rotation lies largely in the optimization procedure rather than in the choice of rotation criterion, and the CF criteria have also been fruitfully applied in oblique rotations, which produce correlated factors (Φ unrestricted; Crawford, 1975).
The CF rotation criteria are parameterized by a single parameter κ, and can be written as:
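$$Q(\boldsymbol{\Lambda}) = (1-\kappa)\sum_{a=1}^{n}\sum_{b=1}^{p}\sum_{\substack{c=1\\ c\neq b}}^{p}\lambda_{ab}^{2}\lambda_{ac}^{2} \;+\; \kappa\sum_{b=1}^{p}\sum_{a=1}^{n}\sum_{\substack{d=1\\ d\neq a}}^{n}\lambda_{ab}^{2}\lambda_{db}^{2} \qquad (2)$$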
where 0 ≤ κ ≤ 1 and λ_ab is the (a, b)th element of the factor loading matrix Λ. The goal of the CF rotations is to minimize two types of complexity: row (or variable) complexity, represented by the first term, and column (or factor) complexity, represented by the second term. The terms “row” and “column” refer to the rows and columns of Λ. The motivations behind minimizing row and column complexity stem from the primary goal of factor rotation: to produce an interpretable solution (Thurstone, 1954).
The tuning parameter κ represents the relative importance of row and column complexity in the factor solution. A researcher interested only in minimizing row complexity would set κ = 0; the second term then vanishes and the CF criterion becomes the quartimax criterion (quartimin in the oblique case). Likewise, a researcher interested only in minimizing column complexity would set κ = 1; the first term then vanishes and the CF criterion becomes the factor parsimony (facparsim) criterion, which is seldom used in practice. Intermediate values of κ yield varimax (κ = 1/n), equamax (κ = p/(2n)), and parsimax (κ = (p − 1)/(n + p − 2)) rotations (Browne, 2001; Sass & Schmitt, 2010).
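The CF criterion in equation (2) and the named κ values above can be written compactly with NumPy. This is a minimal sketch; the function names `cf_criterion` and `cf_kappa` are ours, not taken from any package. It uses the identity Σ_{b≠c} x_b x_c = (Σ_b x_b)² − Σ_b x_b² to avoid explicit loops over factor or item pairs.

```python
import numpy as np

def cf_criterion(L, kappa):
    """Crawford-Ferguson criterion Q(Lambda) of equation (2)."""
    L2 = L**2
    # Row complexity: for each item, sum over factor pairs b != c of lambda_ab^2 lambda_ac^2.
    row = np.sum(L2.sum(axis=1) ** 2 - (L2**2).sum(axis=1))
    # Column complexity: for each factor, sum over item pairs a != d of lambda_ab^2 lambda_db^2.
    col = np.sum(L2.sum(axis=0) ** 2 - (L2**2).sum(axis=0))
    return (1 - kappa) * row + kappa * col

def cf_kappa(name, n, p):
    """kappa for named CF members (Browne, 2001); n items, p factors."""
    return {
        "quartimax": 0.0,              # quartimin when used obliquely
        "varimax": 1 / n,
        "equamax": p / (2 * n),
        "parsimax": (p - 1) / (n + p - 2),
        "facparsim": 1.0,
    }[name]

# A perfect cluster structure has zero row complexity.
L = np.array([[.7, 0.], [.6, 0.], [0., .8], [0., .5]])
print(cf_criterion(L, 0.0), round(cf_criterion(L, 1.0), 4))
print(round(cf_kappa("parsimax", 16, 4), 3))
```

For n = 16 items and p = 4 factors, parsimax gives κ = 3/18 ≈ .167, the value used for Table 3 below.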
Once the criterion is defined, an algorithm is needed to find a rotation of the initially extracted solution that minimizes the rotation criterion. Bernaards and Jennrich (2005) described a general algorithm, called the gradient projection algorithm (GPA), for finding rotations of factor loading matrices that minimize an arbitrary rotation criterion. The algorithm proceeds in an iterative fashion, with each iteration involving a gradient descent step and a projection step. Specifically, suppose A is the n by p initially extracted factor loading matrix, Λ is the n by p rotated factor loading matrix, and T is the p by p rotation matrix used to rotate A to Λ according to the formula,
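$$\boldsymbol{\Lambda} = \mathbf{A}\left(\mathbf{T}^{\top}\right)^{-1} \qquad \left(\text{which reduces to } \boldsymbol{\Lambda} = \mathbf{A}\mathbf{T} \text{ when } \mathbf{T} \text{ is orthogonal}\right) \qquad (3)$$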
Consider a rotation criterion function Q(Λ) of the rotated factor loading matrix Λ. Examples include the CF rotations discussed previously and the bifactor rotations to be discussed later. Then the goal of the gradient projection algorithm is to identify a rotation matrix T which minimizes
$$Q(\boldsymbol{\Lambda}) \qquad (4)$$
Equivalently, consider a rotation criterion function f(T) of the rotation matrix defined as
$$f(\mathbf{T}) = Q\!\left(\mathbf{A}\left(\mathbf{T}^{\top}\right)^{-1}\right) \qquad (5)$$
Equation (5) is the same as (4) but reframes the rotation criterion as a function of a rotation matrix T instead of a factor loading matrix Λ. Framed in this way, factor rotation involves finding a rotation matrix T that minimizes f(T), and factor rotation can be seen as an optimization problem for T that can be solved using GPA.
The first step of the GPA is gradient descent on f(T): at each iteration, a small constant multiple of the gradient of f(T), evaluated at the current T, is subtracted from T. This is a first-order method: it uses only the first derivatives of f(T) to produce a new matrix with a lower value of f(T) (Newton's method, by contrast, also uses second derivatives). The gradient G_T of f(T) is obtained from the gradient G_Λ of Q(Λ) by the chain rule (Jennrich, 2001, 2002),
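$$\mathbf{G}_{T} = \begin{cases} \mathbf{A}^{\top}\mathbf{G}_{\Lambda} & \text{orthogonal rotation},\\[4pt] -\left(\boldsymbol{\Lambda}^{\top}\mathbf{G}_{\Lambda}\mathbf{T}^{-1}\right)^{\top} & \text{oblique rotation}. \end{cases} \qquad (6)$$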
G_Λ can be derived analytically for most rotation criteria; see Bernaards and Jennrich (2005, p. 683) for the derivatives for the CF rotations and Jennrich and Bentler (2011, pp. 14–15) for the derivatives for the bifactor rotations. Importantly, for the bifactor rotations the column of G_Λ corresponding to the general factor is zero, because the rotation criterion itself does not involve the general factor.
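Differentiating equation (2) gives G_Λ in closed form. The sketch below is our own derivation for the scaling of equation (2) as written here, not code from Bernaards and Jennrich (2005); it checks the formula against central finite differences. It also illustrates the point just made, using a bifactor-type criterion that (as a simplifying assumption in the spirit of bi-quartimin) applies quartimin only to the group-factor columns: its gradient with respect to the general-factor column is zero.

```python
import numpy as np

def cf_criterion(L, kappa):
    """CF criterion of equation (2)."""
    L2 = L**2
    row = np.sum(L2.sum(axis=1) ** 2 - (L2**2).sum(axis=1))
    col = np.sum(L2.sum(axis=0) ** 2 - (L2**2).sum(axis=0))
    return (1 - kappa) * row + kappa * col

def cf_gradient(L, kappa):
    """Analytic gradient G_Lambda of equation (2) with respect to each loading."""
    L2 = L**2
    row_other = L2.sum(axis=1, keepdims=True) - L2   # sum over c != b of lambda_ac^2
    col_other = L2.sum(axis=0, keepdims=True) - L2   # sum over d != a of lambda_db^2
    return 4.0 * L * ((1 - kappa) * row_other + kappa * col_other)

def numerical_gradient(f, L, h=1e-6):
    """Central finite-difference gradient, used only to check the analytic one."""
    G = np.zeros_like(L)
    for idx in np.ndindex(*L.shape):
        E = np.zeros_like(L)
        E[idx] = h
        G[idx] = (f(L + E) - f(L - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
L = rng.uniform(-0.8, 0.8, size=(6, 3))   # column 0 plays the role of a general factor
kappa = 1 / 6
G_analytic = cf_gradient(L, kappa)
G_numeric = numerical_gradient(lambda M: cf_criterion(M, kappa), L)

# Bifactor-type criterion: quartimin on the group columns only.
G_bifactor = numerical_gradient(lambda M: cf_criterion(M[:, 1:], 0.0), L)
print(np.max(np.abs(G_analytic - G_numeric)), G_bifactor[:, 0])
```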
The gradient descent step proceeds as follows for the (i+1)’th iteration of the algorithm, where α is a small number that can vary across iterations and controls the size of each step; see Jennrich (2001, 2002) for details on the specification of α.
$$\tilde{\mathbf{T}}_{i+1} = \mathbf{T}_{i} - \alpha\,\mathbf{G}_{T}(\mathbf{T}_{i}) \qquad (7)$$
The gradient descent step of the GPA yields a matrix that is no longer a proper rotation matrix. In order to produce a proper rotation Λ of A, the rotation matrix T must be constrained during the GPA; these constraints are imposed during the second “projection” step of GPA, in which the new rotation matrix is projected onto the space of permissible rotation matrices. If T is constrained to be an orthogonal matrix, with all columns mutually orthogonal and the magnitude of each column constrained to unity, then Λ is an orthogonal rotation of A and the resulting factors will be uncorrelated. If T is constrained to be a normal matrix, with the magnitude of each column constrained to unity but no restriction on the orthogonality of the columns, then Λ is considered an oblique rotation of A and the resulting factors will be correlated. These constraints can be applied to any rotation criterion to produce orthogonal or oblique versions of that rotation criterion.
For orthogonal rotation, the projection step can be performed by computing the singular value decomposition of the matrix T̃ᵢ₊₁ produced by the gradient step and setting all singular values equal to one. For oblique rotation, a function that projects T̃ᵢ₊₁ onto the manifold of normal matrices is defined as
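$$\mathbf{T}_{i+1} = \tilde{\mathbf{T}}_{i+1}\left[\operatorname{diag}\!\left(\tilde{\mathbf{T}}_{i+1}^{\top}\tilde{\mathbf{T}}_{i+1}\right)\right]^{-1/2} \qquad (8)$$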
These projections are computationally trivial and enable the algorithm to proceed very efficiently. However, note that they are simply Procrustes solutions that force T to be either orthogonal or normal, essentially finding the “closest” orthogonal or normal matrix to T̃ᵢ₊₁. This will become critically important in the discussion of bifactor rotations: because the gradient of bifactor rotation criteria does not involve the general factor, variance can shift among the group factors during the gradient descent step, but it can shift to or from the general factor only during the projection (Procrustes) step.
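Both projections are one-liners in NumPy. In this sketch, the hypothetically named `project_orthogonal` returns the polar factor UVᵀ from the SVD (the closest orthogonal matrix in the least-squares sense), and `project_normal` implements equation (8) by rescaling each column to unit length. The perturbed matrix `X` simply stands in for the output of a gradient step.

```python
import numpy as np

def project_orthogonal(X):
    """Closest orthogonal matrix to X: take the SVD and set all singular values to one."""
    U, _, Vt = np.linalg.svd(X)
    return U @ Vt

def project_normal(X):
    """Equation (8): rescale each column of X to unit length (a 'normal' matrix)."""
    return X / np.sqrt(np.sum(X**2, axis=0, keepdims=True))

rng = np.random.default_rng(1)
X = np.eye(3) + 0.2 * rng.standard_normal((3, 3))   # stands in for T - alpha * G
T_orth = project_orthogonal(X)
T_obl = project_normal(X)
print(np.round(T_orth.T @ T_orth, 10))   # identity: uncorrelated factors
print(np.round(T_obl.T @ T_obl, 3))      # unit diagonal; off-diagonals are factor correlations
```

In the oblique case the factor correlation matrix is Φ = TᵀT, which is why only the column lengths, and not their angles, are constrained.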
The gradient projection algorithm proceeds by alternating the gradient descent and projection steps, adjusting α as necessary to ensure
$$f(\mathbf{T}_{i+1}) < f(\mathbf{T}_{i}) \qquad (9)$$
and proceeding until convergence.
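Putting equations (6) through (9) together, a compact orthogonal GPA can be sketched as follows. It follows the general scheme of Bernaards and Jennrich (2005), but two details are choices made here for stable convergence rather than a transcription of any published code: the step uses the gradient projected onto the tangent space of orthogonal matrices, and α is halved until an Armijo-type sufficient-decrease condition holds. The criterion is quartimax (CF with κ = 0), and the test problem is hypothetical.

```python
import numpy as np

def quartimax(L):
    """CF criterion with kappa = 0 (row complexity only) and its gradient G_Lambda."""
    L2 = L**2
    row_other = L2.sum(axis=1, keepdims=True) - L2   # sum over the other factors in each row
    return np.sum(L2 * row_other), 4.0 * L * row_other

def gpa_orthogonal(A, criterion, T=None, tol=1e-8, max_iter=500):
    """Orthogonal gradient projection rotation of A; returns (Lambda, T, f)."""
    T = np.eye(A.shape[1]) if T is None else T
    f, G_L = criterion(A @ T)
    alpha = 1.0
    for _ in range(max_iter):
        G = A.T @ G_L                          # gradient with respect to T, equation (6)
        S = T.T @ G
        Gp = G - T @ (S + S.T) / 2             # component of G tangent to the orthogonal group
        s = np.linalg.norm(Gp)
        if s < tol:                            # stationary point reached
            break
        alpha *= 2.0
        for _ in range(30):                    # halve alpha until f decreases enough, eq. (9)
            U, _, Vt = np.linalg.svd(T - alpha * Gp)
            T_new = U @ Vt                     # projection onto orthogonal matrices
            f_new, G_L_new = criterion(A @ T_new)
            if f_new < f - 0.5 * s**2 * alpha:
                break
            alpha /= 2.0
        else:
            break                              # no further decrease at this precision
        T, f, G_L = T_new, f_new, G_L_new
    return A @ T, T, f

# Hypothetical test: rotate a perfect cluster structure by 0.5 radians, then recover it.
L0 = np.array([[.7, 0], [.7, 0], [.7, 0], [0, .6], [0, .6], [0, .6]])
c, s0 = np.cos(0.5), np.sin(0.5)
A = L0 @ np.array([[c, -s0], [s0, c]])
Lam, T, f = gpa_orthogonal(A, quartimax)
print(np.round(Lam, 3), f)
```

Because the criterion for a perfect cluster structure can be driven to zero, the rotation recovers the original loadings up to column order and sign.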
Before moving on to our discussion of SL and JB rotations, it is critical to note a major potential obstacle to obtaining rotated factor solutions using GPA. Namely, many authors have observed that the rotation criteria used in exploratory factor analysis can produce different solutions when different starting values are used (Rozeboom, 1992; Browne, 2001). Generally, these solutions have different values of the rotation criterion function; the solution with the lowest function value is called the global minimum, and the other solutions are called local minima. In the presence of local minima, a single run of the GPA, starting from an initial rotation and iterating until convergence, will identify only one of these solutions, which may not be the global minimum. GPA shares this property with other “hill descent” algorithms such as Newton's method; these algorithms converge to a minimum that can be reached by descending from the starting value, which could be a local or the global minimum. As described below, SL solutions appear to be much less prone to local minima problems than JB solutions, a distinct advantage of the former.
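In practice, the usual response to local minima is to rerun the rotation from many random starting rotations and compare the criterion values reached. The sketch below shows only that bookkeeping. The `rotate` argument is a hypothetical user-supplied routine (for example, an orthogonal GPA) with signature `rotate(A, T0) -> (loadings, T, criterion_value)`, and all function names are ours. Equal criterion values are treated as the same minimum, which is an assumption: solutions can also differ by column order or sign.

```python
import numpy as np

def random_orthogonal(p, rng):
    """Random p x p orthogonal start matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((p, p)))
    return Q * np.sign(np.diag(R))   # sign fix makes the starts uniform (Haar) over rotations

def multistart(rotate, A, n_starts=100, seed=0):
    """Run rotate(A, T0) from n_starts random orthogonal starting rotations."""
    rng = np.random.default_rng(seed)
    p = A.shape[1]
    return [rotate(A, random_orthogonal(p, rng)) for _ in range(n_starts)]

def summarize_minima(results, decimals=6):
    """Distinct (rounded) criterion values and how often each was reached, lowest first."""
    values = np.round([r[2] for r in results], decimals)
    uniq, counts = np.unique(values, return_counts=True)
    return list(zip(uniq.tolist(), counts.tolist()))

# Hypothetical usage with some rotation routine my_rotate(A, T0):
# results = multistart(my_rotate, A, n_starts=200)
# print(summarize_minima(results))   # more than one entry signals local minima
```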
The Schmid-Leiman Orthogonalization
The Schmid-Leiman (SL) orthogonalization proceeds by converting a p correlated factors solution into a second-order solution, which is then orthogonalized into a (p + 1)-factor exploratory bifactor solution with proportionality constraints. The discussion to follow is a simplified version of that used in Yung et al. (1999)³. Let R be an n by n correlation matrix of n measured variables. First, an oblique factor analysis is performed on R, yielding a model-reproduced correlation matrix R̂, where
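$$\hat{\mathbf{R}} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}^{2} \qquad (10)$$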
and Λ is an n by p matrix of (standardized) factor loadings (n > p, p > 2), Φ is a p by p correlation matrix for the first-order factors, and Θ² is an n by n diagonal matrix of unique variances for the observed variables. Next, Φ is itself factor analyzed to identify a single second-order factor, yielding a model-reproduced factor correlation matrix Φ̂ such that
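$$\hat{\boldsymbol{\Phi}} = \boldsymbol{\Lambda}_{2}\boldsymbol{\Lambda}_{2}^{\top} + \boldsymbol{\Theta}_{2}^{2} \qquad (11)$$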
where Λ₂ is a p by 1 matrix of factor loadings and Θ₂² is a p by p diagonal matrix with the unique variances of the first-order factors on the diagonal. Schmid and Leiman (1957) assume that Λ and Λ₂ have simple cluster structures; that is, only a single non-zero value is permissible in each row.
The factor model for the manifest variables can be written as z = Λf₁ + Θu₁, where z is an n by 1 random vector of standardized manifest variables, f₁ is a p by 1 random vector of first-order factors, and u₁ is an n by 1 random vector of unique factors. The factor model for the first-order factors can be written as
$$\mathbf{f}_{1} = \boldsymbol{\Lambda}_{2} f_{2} + \boldsymbol{\Theta}_{2}\mathbf{u}_{2} \qquad (12)$$
where, similarly, f₂ is the second-order factor and u₂ is a p by 1 random vector of unique factors for the first-order factors. Combining the previous two equations, the factor model for the manifest variables can be equivalently written as:
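$$\mathbf{z} = \boldsymbol{\Lambda}\left(\boldsymbol{\Lambda}_{2} f_{2} + \boldsymbol{\Theta}_{2}\mathbf{u}_{2}\right) + \boldsymbol{\Theta}\mathbf{u}_{1} \qquad (13)$$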
which can be rewritten as
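$$\mathbf{z} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}_{2} f_{2} + \boldsymbol{\Lambda}\boldsymbol{\Theta}_{2}\mathbf{u}_{2} + \boldsymbol{\Theta}\mathbf{u}_{1} = \mathbf{B}\begin{pmatrix} f_{2} \\ \mathbf{u}_{2} \end{pmatrix} + \boldsymbol{\Theta}\mathbf{u}_{1} \qquad (14)$$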
where
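$$\mathbf{B} = \begin{pmatrix} \boldsymbol{\Lambda}\boldsymbol{\Lambda}_{2} & \boldsymbol{\Lambda}\boldsymbol{\Theta}_{2} \end{pmatrix} \qquad (15)$$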
B above is the factor loading matrix produced by the Schmid-Leiman orthogonalization, and the model-implied correlation matrix Σ can be written as
$$\boldsymbol{\Sigma} = \mathbf{B}\mathbf{B}^{\top} + \boldsymbol{\Theta}^{2} \qquad (16)$$
The factor loading matrix B is highly structured: ΛΛ₂ is the n by 1 submatrix containing the loadings of the items on the general factor, and ΛΘ₂ is the n by p submatrix containing the loadings of the items on the group factors.
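Equation (15) can be sketched in a few lines of NumPy using the rounded Table 1 values (correlated factors loadings, Λ₂, and the square roots of the first-order unique variances). Because these inputs are rounded, B matches the Schmid-Leiman panel of Table 1 only to about two decimals. The rank check makes the source of the SL constraints concrete: B = [ΛΛ₂ | ΛΘ₂] is Λ times a p × (p + 1) matrix, so its p + 1 columns span at most p dimensions.

```python
import numpy as np

# Table 1, correlated factors solution: each item's single nonzero loading and the
# (0-based) first-order factor it loads on.
loading = np.array([.64, .54, .57, .61, .75, .81, .77, .68,
                    .73, .91, .92, .86, .83, .74, .61, .66])
factor = np.repeat([3, 1, 0, 2], 4)
n, p = loading.size, 4
Lam = np.zeros((n, p))
Lam[np.arange(n), factor] = loading

lam2 = np.array([.63, .74, .71, .78])     # second-order loadings, Lambda_2
theta2 = np.array([.77, .68, .71, .63])   # square roots of first-order unique variances

# [Lambda_2 | Theta_2] is p x (p + 1); B = Lam [Lambda_2 | Theta_2] is equation (15).
L = np.hstack([lam2[:, None], np.diag(theta2)])
B = Lam @ L
print(np.round(B, 2))
print("rank of B:", np.linalg.matrix_rank(B))   # p, even though B has p + 1 columns
```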
We now demonstrate key properties of the SL orthogonalization with an illustration of the “proportionality constraints” (Yung et al., 1999; Jennrich & Bentler, 2011, 2012). In basic terms, transforming a second-order solution into a bifactor solution imposes n − p “proportionality constraints” (the number of items minus the number of group factors) on an SL solution. These constraints are not intrinsic to bifactor models per se; they are unique to the SL. In a confirmatory bifactor model, whether factor analytic or item response theory, proportionality constraints are not imposed when moving from a second-order to a bifactor model, so the confirmatory second-order model is nested within the confirmatory bifactor model and the two can be compared statistically (Chen, West, & Sousa, 2007; Yung et al., 1999). In an SL orthogonalization, the second-order and bifactor solutions are not nested and cannot be distinguished by standard methods for comparing nested models.
The upper left panel of Table 1 shows a factor loading matrix for a bifactor model with one general factor and four group factors. For purposes of this demonstration, this matrix was generated to satisfy proportionality constraints. For items within each group factor, the ratio of the variance explained by the group factor to the variance explained by the general factor is a constant; these values are shown under the PC column. So for item 1, the ratio is .40²/.50² ≈ .65, for item 5 it is .51²/.55² ≈ .85, and so on (small discrepancies reflect rounding of the displayed loadings). Observe that when this true population factor loading pattern is converted to a correlation matrix R, the first five eigenvalues are 5.85, 1.83, 1.48, 1.26, and 0.69. This pattern suggests that p = 4 factors, not (p + 1) = 5 factors, can account for the common variance shared among the items. The consequences of this will become apparent shortly. Finally, observe that there are no cross-loadings of items on group factors.
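The PC ratios and the eigenvalue pattern can be checked directly from the rounded loadings in the true population panel of Table 1. In this sketch the correlation matrix is formed as R = BBᵀ with a unit diagonal (the unique variances fill the remainder of each diagonal entry). Because the displayed loadings are rounded to two decimals, the computed PC values and eigenvalues agree with the text only approximately.

```python
import numpy as np

# Table 1, true population model: general loading and the single nonzero group
# loading for each item (items 1-4 on F1, 5-8 on F2, 9-12 on F3, 13-16 on F4).
general = np.array([.50, .42, .44, .47, .55, .60, .56, .50,
                    .46, .57, .58, .54, .58, .52, .43, .47])
group = np.array([.40, .34, .36, .38, .51, .55, .52, .46,
                  .56, .70, .72, .66, .58, .52, .43, .47])
membership = np.repeat(np.arange(4), 4)   # group factor index for each item

# PC: variance explained by the group factor over variance explained by G.
pc = group**2 / general**2

# Bifactor loading matrix B = [G | F1 ... F4].
n, p = general.size, 4
B = np.zeros((n, p + 1))
B[:, 0] = general
B[np.arange(n), membership + 1] = group

# Population correlation matrix: common part off the diagonal, ones on the diagonal.
R = B @ B.T
np.fill_diagonal(R, 1.0)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
print(np.round(pc, 2))
print(np.round(eigenvalues[:5], 2))
```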
Table 1.
Schmid-Leiman Solution When Factor Loadings are Proportional in the True Population Model
True Population Model (upper left panel) | | | | | | | Schmid-Leiman (upper right panel) | | | | | 
---|---|---|---|---|---|---|---|---|---|---|---|---
Item | G | F1 | F2 | F3 | F4 | PC | G | F1 | F2 | F3 | F4 | PC
1 | .50 | .40 | 0 | 0 | 0 | .65 | .50 | 0 | 0 | 0 | .40 | .65
2 | .42 | .34 | 0 | 0 | 0 | .65 | .42 | 0 | 0 | 0 | .34 | .65
3 | .44 | .36 | 0 | 0 | 0 | .65 | .44 | 0 | 0 | 0 | .36 | .65
4 | .47 | .38 | 0 | 0 | 0 | .65 | .47 | 0 | 0 | 0 | .38 | .65
5 | .55 | 0 | .51 | 0 | 0 | .85 | .55 | 0 | .51 | 0 | 0 | .85
6 | .60 | 0 | .55 | 0 | 0 | .85 | .60 | 0 | .55 | 0 | 0 | .85
7 | .56 | 0 | .52 | 0 | 0 | .85 | .56 | 0 | .52 | 0 | 0 | .85
8 | .50 | 0 | .46 | 0 | 0 | .85 | .50 | 0 | .46 | 0 | 0 | .85
9 | .46 | 0 | 0 | .56 | 0 | 1.5 | .46 | .56 | 0 | 0 | 0 | 1.5
10 | .57 | 0 | 0 | .70 | 0 | 1.5 | .57 | .70 | 0 | 0 | 0 | 1.5
11 | .58 | 0 | 0 | .72 | 0 | 1.5 | .58 | .72 | 0 | 0 | 0 | 1.5
12 | .54 | 0 | 0 | .66 | 0 | 1.5 | .54 | .66 | 0 | 0 | 0 | 1.5
13 | .58 | 0 | 0 | 0 | .58 | 1.0 | .58 | 0 | 0 | .58 | 0 | 1.0
14 | .52 | 0 | 0 | 0 | .52 | 1.0 | .52 | 0 | 0 | .52 | 0 | 1.0
15 | .43 | 0 | 0 | 0 | .43 | 1.0 | .43 | 0 | 0 | .43 | 0 | 1.0
16 | .47 | 0 | 0 | 0 | .47 | 1.0 | .47 | 0 | 0 | .47 | 0 | 1.0

Bottom left panel: Correlated Factors (Λ)

Item | F1 | F2 | F3 | F4
---|---|---|---|---
1 | 0 | 0 | 0 | .64
2 | 0 | 0 | 0 | .54
3 | 0 | 0 | 0 | .57
4 | 0 | 0 | 0 | .61
5 | 0 | .75 | 0 | 0
6 | 0 | .81 | 0 | 0
7 | 0 | .77 | 0 | 0
8 | 0 | .68 | 0 | 0
9 | .73 | 0 | 0 | 0
10 | .91 | 0 | 0 | 0
11 | .92 | 0 | 0 | 0
12 | .86 | 0 | 0 | 0
13 | 0 | 0 | .83 | 0
14 | 0 | 0 | .74 | 0
15 | 0 | 0 | .61 | 0
16 | 0 | 0 | .66 | 0

Middle right panel: Factor Intercorrelations (Φ)

Factor | F1 | F2 | F3 | F4
---|---|---|---|---
F1 | 1 | .46 | .45 | .49
F2 | .46 | 1 | .52 | .57
F3 | .45 | .52 | 1 | .55
F4 | .49 | .57 | .55 | 1

Lower right panel: Second-Order Loadings (Λ₂), Unique Variances (Θ₂²), and Their Square Roots (Θ₂)

Factor | Λ₂ | Θ₂² | Θ₂
---|---|---|---
F1 | .63 | .59 | .77
F2 | .74 | .46 | .68
F3 | .71 | .50 | .71
F4 | .78 | .40 | .63

Bottom right panel: L

Factor | G | F1 | F2 | F3 | F4
---|---|---|---|---|---
F1 | .63 | .77 | 0 | 0 | 0
F2 | .74 | 0 | .68 | 0 | 0
F3 | .71 | 0 | 0 | .71 | 0
F4 | .78 | 0 | 0 | 0 | .63
Note. G is the general factor, F1–F4 are group factors, and PC is the ratio of percent of variance explained by group factor over the percent of variance explained by general factor.
When R is factor analyzed, extracting four factors (maximum likelihood) and rotating to an oblique solution using CF κ = 0 (quartimin), the results are those shown in the bottom left panel of Table 1. This solution has perfect independent cluster structure (cross-loadings are exactly zero) because, when the proportionality constraints hold, the bifactor model is equivalent to a second-order model, which can be represented perfectly by a correlated-factors model. The resulting factor intercorrelations are shown in the middle right panel of Table 1. When this matrix is factored (maximum likelihood, one factor), the resulting loadings of the four primary factors on the second-order factor are those shown under the Λ₂ label in the lower right panel of Table 1. Also shown are the unique variances (Θ₂²) of the primary factors and their square roots. Observe that in these data, Φ̂ = Λ₂Λ₂ᵀ + Θ₂² equals Φ exactly, again because the bifactor model in Table 1 is equivalent to a second-order model.
Finally, the heart and soul of the SL is the transformation of this second-order model into an orthogonal bifactor model with (p + 1) factors. This can be accomplished by defining two matrices. Let Λ denote the original factor loadings in the correlated factors model. Then define a matrix L of dimension p × (p + 1), in this case 4 × 5. The first column of L contains the loadings of the first-order factors on the second-order factor (Λ₂). The remaining columns form a diagonal matrix with the square roots of the primary factor unique variances on the diagonal (Θ₂). This matrix, L, is shown in the bottom right panel of Table 1.
The SL solution is then simply ΛL, shown in the upper right panel of Table 1. The resulting solution perfectly recovers the true population matrix. Obviously, the solution also reflects the proportionality constraints in the population matrix perfectly. Although this problem was contrived to work perfectly, it is instructive to review where the SL loadings come from and why they are proportional within group factors.
The proportionality of general to group factor loadings arises because an item's loading on the general factor equals its loading in the correlated factors solution times the loading of its primary factor on the second-order factor. This latter value is a constant for all items within a group factor. In turn, an item's loading on a group factor is its loading in the correlated factors solution times the square root of that primary factor's unique variance; again, this is a constant for items within a group factor. Each set of loadings, general and group, is formed by multiplying the same set of factor loadings from the correlated factors model (Λ) by a different constant; thus, the ratio of the general and group factor loadings is always the ratio of these two constants and is therefore the same for all items within a group factor.
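The argument above can be verified numerically. For items that share primary factor j, the item's own correlated factors loading cancels from the ratio, leaving PC = (θ₂ⱼ/λ₂ⱼ)², where θ₂ⱼ is the jth diagonal element of Θ₂. The sketch uses the rounded Table 1 values for four illustrative items; the dictionary layout is just a convenience.

```python
import numpy as np

lam2 = np.array([.63, .74, .71, .78])     # Table 1: loadings of factors 1-4 on the second-order factor
theta2 = np.array([.77, .68, .71, .63])   # Table 1: square roots of first-order unique variances

# item number -> (0-based primary factor, correlated factors loading), from Table 1
items = {9: (0, .73), 10: (0, .91), 5: (1, .75), 6: (1, .81)}

pc = {}
for item, (j, lam) in items.items():
    general = lam * lam2[j]      # loading on the general factor
    group = lam * theta2[j]      # loading on group factor j
    pc[item] = (group / general) ** 2

# lam cancels, so every item on factor j has PC = (theta2[j] / lam2[j])**2.
for item, value in pc.items():
    print(item, round(value, 3))
```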
In Table 1, the constraints of the SL are proportionality constraints. In bifactor EFA, however, the constraints of the SL are proportionality constraints only when Λ has perfect independent cluster (IC) structure, with each item loading on one and only one group factor. Otherwise, they are “hidden” linear dependencies, as the following example illustrates. Consider Table 2, whose upper left panel shows a “true bifactor” structure in the population. No proportionality constraints were used to generate these data, and items within group factors vary widely in their ratios of variance explained by the group factor over variance explained by the general factor, as shown in the column labeled PC. The first five eigenvalues are 5.04, 1.68, 1.57, 1.47, and 0.73, again suggesting four factors, not five.
Table 2.
Schmid-Leiman Solution When Factor Loadings are not Proportional
True Population Model (upper left panel) | | | | | | | Schmid-Leiman (upper right panel) | | | | | 
---|---|---|---|---|---|---|---|---|---|---|---|---
Item | G | F1 | F2 | F3 | F4 | PC | G | F1 | F2 | F3 | F4 | PC
1 | .48 | .40 | 0 | 0 | 0 | .70 | .46 | .43 | .02 | .02 | .02 | .87
2 | .51 | .35 | 0 | 0 | 0 | .48 | .47 | .40 | .03 | .04 | .04 | .72
3 | .67 | .62 | 0 | 0 | 0 | .87 | .65 | .65 | .01 | .02 | .02 | .99
4 | .34 | .55 | 0 | 0 | 0 | 2.64 | .38 | .49 | −.03 | −.05 | −.04 | 1.68
5 | .44 | 0 | .45 | 0 | 0 | 1.03 | .41 | .03 | .47 | .03 | .02 | 1.29
6 | .40 | 0 | .48 | 0 | 0 | 1.43 | .39 | .01 | .49 | .01 | .01 | 1.59
7 | .32 | 0 | .70 | 0 | 0 | 4.76 | .36 | −.05 | .65 | −.05 | −.04 | 3.18
8 | .45 | 0 | .54 | 0 | 0 | 1.45 | .44 | .01 | .55 | .01 | .01 | 1.61
9 | .55 | 0 | 0 | .43 | 0 | .61 | .50 | .05 | .03 | .05 | .47 | .89
10 | .33 | 0 | 0 | .33 | 0 | .98 | .32 | .01 | .01 | .01 | .34 | 1.17
11 | .52 | 0 | 0 | .51 | 0 | .95 | .50 | .02 | .02 | .02 | .54 | 1.16
12 | .35 | 0 | 0 | .69 | 0 | 4.01 | .40 | −.06 | −.04 | −.05 | .62 | 2.41
13 | .32 | 0 | 0 | 0 | .65 | 4.14 | .35 | −.04 | −.03 | .59 | −.03 | 2.83
14 | .66 | 0 | 0 | 0 | .51 | .60 | .59 | .07 | .05 | .59 | .06 | .99
15 | .68 | 0 | 0 | 0 | .39 | .33 | .58 | .11 | .07 | .49 | .09 | .69
16 | .32 | 0 | 0 | 0 | .56 | 3.07 | .34 | −.02 | −.02 | .53 | −.02 | 2.47

Bottom left panel: Correlated Factors (Λ)

Item | F1 | F2 | F3 | F4
---|---|---|---|---
1 | .59 | .02 | .03 | .03
2 | .55 | .04 | .05 | .05
3 | .89 | .02 | .02 | .02
4 | .68 | −.04 | −.06 | −.05
5 | .04 | .58 | .03 | .03
6 | .02 | .61 | .01 | .01
7 | −.07 | .80 | −.06 | −.05
8 | .02 | .69 | .02 | .01
9 | .07 | .04 | .06 | .62
10 | .02 | .01 | .02 | .45
11 | .03 | .02 | .03 | .70
12 | −.08 | −.05 | −.07 | .80
13 | −.05 | −.03 | .74 | −.04
14 | .10 | .06 | .73 | .08
15 | .15 | .09 | .60 | .11
16 | −.03 | −.02 | .66 | −.03

Middle right panel: Factor Intercorrelations (Φ)

Factor | F1 | F2 | F3 | F4
---|---|---|---|---
F1 | 1 | .41 | .41 | .44
F2 | .41 | 1 | .35 | .38
F3 | .41 | .35 | 1 | .38
F4 | .44 | .38 | .38 | 1

Lower right panel: Second-Order Loadings (Λ₂), Unique Variances (Θ₂²), and Their Square Roots (Θ₂)

Factor | Λ₂ | Θ₂² | Θ₂
---|---|---|---
F1 | .69 | .52 | .72
F2 | .59 | .66 | .81
F3 | .59 | .66 | .81
F4 | .64 | .59 | .77

Bottom right panel: L

Factor | G | F1 | F2 | F3 | F4
---|---|---|---|---|---
F1 | .69 | .72 | 0 | 0 | 0
F2 | .59 | 0 | .81 | 0 | 0
F3 | .59 | 0 | 0 | .81 | 0
F4 | .64 | 0 | 0 | 0 | .77
Note. G is the general factor, F1–F4 are group factors, and PC is the ratio of percent of variance explained by group factor over the percent of variance explained by general factor.
When the SL is applied, observe first that the correlated factors (CF rotation, κ = 0) solution does not have perfect independent cluster structure: small cross-loadings remain that are not exactly zero. As a consequence, it is not possible for the SL to recover the true bifactor pattern, in which items load on one and only one group factor and have zero loadings on the other group factors. As shown in the upper right panel of Table 2, although the SL loadings are close to the true population pattern, they are not exactly equal to it, even though the analysis is based on the population correlation matrix, from which an accurate method should recover the true loadings exactly. This is but one example of a major shortcoming of the SL: the SL can only be expected to work perfectly if the population pattern has proportionality, because only under that circumstance can a correlation matrix generated from a (p + 1)-factor bifactor model be perfectly represented as a p correlated factors model. Under any other circumstance, imposing the constraints of the SL introduces distortion.
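Most importantly, in Table 2 it appears that the proportionality constraints discussed widely in the literature have disappeared, because items within group factors now vary widely in PC in the SL solution. This is illusory: the SL solution in this example has the same constraints or dependencies as before; they are only more difficult to see. For example, consider Items 1 and 4. For these items, the loadings on the general factor are linear combinations of their correlated factors loadings in Table 2, weighted by the second-order loadings Λ₂ = (.69, .59, .59, .64):

$$b_{1G} = .59(.69) + .02(.59) + .03(.59) + .03(.64), \qquad b_{4G} = .68(.69) - .04(.59) - .06(.59) - .05(.64),$$

which reproduce, up to rounding, the SL general factor loadings of .46 and .38. The loadings on the relevant group factor (F1) are each item's loading on the first correlated factor multiplied by the square root of that factor's unique variance (.72):

$$b_{1,F1} = .59(.72), \qquad b_{4,F1} = .68(.72),$$

which reproduce, up to rounding, the SL loadings of .43 and .49.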
Thus, although there appear to be no proportionality constraints here, the general and group factor loadings are still both formed from the same correlated factors loadings, now weighted by a vector rather than multiplied by a single scalar. Perhaps a better description would be to say that the SL orthogonalization imposes “linear dependencies” in going from a second-order to a bifactor solution. As before, the vector of general factor loadings G in the upper right panel of Table 2 is an exact linear combination of the SL group factor loading vectors F1–F4 in the same panel, with weights λ₂ⱼ/θ₂ⱼ (θ₂ⱼ being the jth diagonal element of Θ₂). In turn, these linear dependencies distort the SL, leading to inaccurate parameter estimates. The size of these distortions will be commented on in the Discussion.
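The same point can be made numerically with the rounded values from Table 2: regressing the SL general factor loadings on the four SL group factor loading vectors gives an exact fit, with weights λ₂ⱼ/θ₂ⱼ. This sketch uses the displayed (rounded) matrices, so the recomputed loadings match Table 2 only to about two decimals.

```python
import numpy as np

# Table 2, correlated factors loadings (rows = items 1-16, columns = factors 1-4).
Lam = np.array([
    [.59, .02, .03, .03], [.55, .04, .05, .05], [.89, .02, .02, .02], [.68, -.04, -.06, -.05],
    [.04, .58, .03, .03], [.02, .61, .01, .01], [-.07, .80, -.06, -.05], [.02, .69, .02, .01],
    [.07, .04, .06, .62], [.02, .01, .02, .45], [.03, .02, .03, .70], [-.08, -.05, -.07, .80],
    [-.05, -.03, .74, -.04], [.10, .06, .73, .08], [.15, .09, .60, .11], [-.03, -.02, .66, -.03],
])
lam2 = np.array([.69, .59, .59, .64])     # second-order loadings
theta2 = np.array([.72, .81, .81, .77])   # square roots of first-order unique variances

general = Lam @ lam2        # SL general factor loadings, Lam Lambda_2
groups = Lam * theta2       # SL group factor loadings, Lam Theta_2 (column-wise scaling)

# Regress G on the group columns: the fit is exact because G = groups @ (lam2 / theta2).
coef, *_ = np.linalg.lstsq(groups, general, rcond=None)
residual = general - groups @ coef
print(np.round(coef, 3), np.abs(residual).max())
```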
Given the above, it is fair to ask: why, then, does nearly every paper on the SL describe and illustrate proportionality (e.g., Brunner, Nagy, & Wilhelm, 2010) when one is unlikely to “see it” in any realistic dataset? We believe that authors are likely simplifying in order to make a point, or are considering only confirmatory models with perfect IC structure. Regardless, consider what happens when we compute group and general factor loadings as typically described in the literature, that is, ignoring the small cross-loadings in Table 2. Now the proportionality constraints within group factors are crystal clear.
Finally, we consider the distorting effects of cross-loadings. Recall that earlier we introduced the CF family of rotations, which is indexed by a single parameter, κ, that controls the solution's preference for minimizing column complexity versus row complexity. Every SL must begin with a correlated factors solution, whose Φ matrix is then “structured” through a second-order model. When the population bifactor structure has no cross-loadings, the value of κ used in the initial oblique rotation of the p factors likely matters little. However, when there are cross-loadings, SL solutions can differ depending on the value of κ selected, because the initial correlated factors solution can differ depending on κ. Also note that, with few exceptions, the presence of cross-loadings violates the linear dependencies of the SL, and any attempt to perform an SL in the presence of such cross-loadings will lead to biased estimates of all loadings.
In the upper left panel of Table 3 we display a true population bifactor loading matrix with modest cross-loadings for items 1, 5, 9, and 10. F2 now has 7 items loading on it, and F3 and F4 have 5 each. In Table 3 we do not display the intermediate matrices, but simply display the SL results based on three values of κ in the CF rotation: κ = 0, .167, and 1. The first solution seeks to minimize row complexity only (quartimin; κ = 0), the second balances row with column complexity (parsimax; κ = .167), and the third seeks to minimize column complexity only (facparsim; κ = 1).
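Table 3 also reports omega hierarchical (ωh; Zinbarg, Revelle, Yovel, & Li, 2005) for each solution. As a sketch, assuming standardized items and an orthogonal bifactor loading matrix B whose first column is the general factor, ωh can be computed as the squared sum of the general factor loadings divided by the variance of the unit-weighted total score, 1ᵀΣ1 with Σ = BBᵀ + Θ². Applied to the population matrix in the upper left panel of Table 3, it prints 0.693, the value reported there. The function name is ours.

```python
import numpy as np

def omega_h(B):
    """Omega hierarchical for an orthogonal bifactor loading matrix B = [G | F1 ... Fp],
    assuming standardized items."""
    common = B @ B.T
    unique = 1.0 - np.diag(common)                   # unique variances (diagonal of Theta^2)
    total_variance = common.sum() + unique.sum()     # 1' Sigma 1, variance of the sum score
    return B[:, 0].sum() ** 2 / total_variance

# Table 3, population loading matrix (rows = items 1-16; columns = G, F1-F4).
B = np.zeros((16, 5))
B[:, 0] = .50
B[[0, 1, 2, 3], 1] = [.44, .36, .44, .31]
B[[0, 4, 5, 6, 7, 8, 9], 2] = [.50, .37, .43, .65, .33, .30, .30]
B[[4, 8, 9, 10, 11], 3] = [.40, .30, .49, .32, .52]
B[[4, 12, 13, 14, 15], 4] = [.30, .54, .50, .31, .68]
print(round(omega_h(B), 3))
```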
Table 3.
Schmid-Leiman when there are cross-loadings, κ = 0, .167, and 1
Population Loading Matrix, ωh=.693 | SL, κ=0 (quartimin), RMSE=.053, ωh=.702 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Item | G | F1 | F2 | F3 | F4 | Item | G | F1 | F2 | F3 | F4 |
1 | .50 | .44 | .50 | .00 | .00 | 1 | .59 | .32 | .42 | −.04 | −.04 |
2 | .50 | .36 | .00 | .00 | .00 | 2 | .45 | .43 | .01 | .03 | .03 |
3 | .50 | .44 | .00 | .00 | .00 | 3 | .46 | .49 | .01 | .01 | .01 |
4 | .50 | .31 | .00 | .00 | .00 | 4 | .44 | .39 | .02 | .04 | .04 |
5 | .50 | .00 | .37 | .40 | .30 | 5 | .63 | −.10 | .23 | .29 | .23 |
6 | .50 | .00 | .43 | .00 | .00 | 6 | .50 | .04 | .41 | .04 | .05 |
7 | .50 | .00 | .65 | .00 | .00 | 7 | .56 | −.02 | .61 | .00 | .02 |
8 | .50 | .00 | .33 | .00 | .00 | 8 | .47 | .07 | .33 | .05 | .06 |
9 | .50 | .00 | .30 | .30 | .00 | 9 | .54 | .01 | .23 | .29 | .01 |
10 | .50 | .00 | .30 | .49 | .00 | 10 | .58 | −.03 | .18 | .44 | −.03 |
11 | .50 | .00 | .00 | .32 | .00 | 11 | .46 | .09 | −.03 | .35 | .05 |
12 | .50 | .00 | .00 | .52 | .00 | 12 | .51 | .05 | −.09 | .51 | .00 |
13 | .50 | .00 | .00 | .00 | .54 | 13 | .48 | .01 | .00 | −.01 | .56 |
14 | .50 | .00 | .00 | .00 | .50 | 14 | .47 | .02 | .00 | .00 | .53 |
15 | .50 | .00 | .00 | .00 | .31 | 15 | .43 | .07 | .02 | .04 | .37 |
16 | .50 | .00 | .00 | .00 | .68 | 16 | .50 | −.03 | −.01 | −.03 | .67 |
SL, κ=.167 (parsimax), RMSE=.058, ωh=.627 | SL, κ=1 (facparsim), RMSE=.065, ωh=.587 | ||||||||||
Item | G | F1 | F2 | F3 | F4 | Item | G | F1 | F2 | F3 | F4 |
1 | .57 | .35 | .42 | −.02 | −.03 | 1 | .55 | .37 | .43 | −.01 | −.04 |
2 | .42 | .45 | .02 | .04 | .04 | 2 | .40 | .47 | .03 | .05 | .04 |
3 | .43 | .51 | .01 | .03 | .03 | 3 | .41 | .53 | .02 | .03 | .03 |
4 | .41 | .41 | .03 | .05 | .06 | 4 | .39 | .43 | .03 | .06 | .06 |
5 | .60 | −.05 | .26 | .33 | .25 | 5 | .58 | −.03 | .27 | .35 | .25 |
6 | .48 | .07 | .42 | .06 | .06 | 6 | .47 | .09 | .43 | .07 | .06 |
7 | .54 | .01 | .62 | .03 | .03 | 7 | .53 | .03 | .63 | .04 | .02 |
8 | .45 | .10 | .34 | .07 | .07 | 8 | .44 | .11 | .34 | .09 | .07 |
9 | .51 | .04 | .24 | .32 | .02 | 9 | .50 | .06 | .25 | .33 | .02 |
10 | .56 | .01 | .20 | .47 | −.02 | 10 | .54 | .02 | .21 | .49 | −.01 |
11 | .43 | .12 | −.02 | .37 | .06 | 11 | .41 | .13 | −.01 | .38 | .06 |
12 | .48 | .08 | −.07 | .54 | .02 | 12 | .46 | .09 | −.06 | .55 | .02 |
13 | .44 | .06 | .03 | .03 | .59 | 13 | .42 | .08 | .04 | .05 | .59 |
14 | .43 | .07 | .03 | .04 | .55 | 14 | .41 | .09 | .04 | .06 | .56 |
15 | .40 | .11 | .04 | .07 | .39 | 15 | .39 | .13 | .05 | .08 | .39 |
16 | .46 | .02 | .02 | .01 | .70 | 16 | .44 | .05 | .03 | .03 | .71 |
Note. G is the general factor, F1–F4 are group factors, RMSE is the root mean square error, and ωh is coefficient omega hierarchical.
We hypothesized that using an initial factor rotation that accommodates cross-loadings (parsimax or facparsim) would produce a more accurate solution than a rotation criterion that is less tolerant of cross-loadings (quartimin); however, this was not the case: the SL solution based on quartimin (κ = 0) yielded the lowest RMSE and the most accurate estimate of the omega-hierarchical “general factor strength” statistic (ωh; the proportion of unit-weighted score variance attributable to the general factor). Nevertheless, all three solutions are clearly inaccurate, especially for items with cross-loadings. For items 1, 5, 9, and 10, the general factor loadings are always over-estimated. Because these items have higher communalities, the SL is “fooled” into treating them as the best markers of the general factor, when under the true model they are not.
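The ωh values reported in Table 3 can be checked with a few lines of Python/NumPy. This sketch assumes Equation 1 has the usual form Σ = ΛΛ′ + Θ, with Θ chosen so that Σ has a unit diagonal, and computes ωh as the squared sum of the general-factor loadings divided by the variance of the unit-weighted total score, 1′Σ1. The helper names are ours.

```python
import numpy as np

def implied_correlation(L):
    """Assumed form of Equation 1: Sigma = L L' + Theta, with Theta giving a unit diagonal."""
    common = L @ L.T
    return common + np.diag(1.0 - np.diag(common))

def omega_h(L):
    """Omega hierarchical: (sum of general loadings)^2 / variance of the unit-weighted total."""
    ones = np.ones(L.shape[0])
    return (ones @ L[:, 0]) ** 2 / (ones @ implied_correlation(L) @ ones)

# Population loadings with cross-loadings (upper left panel of Table 3); columns G, F1-F4.
F = np.zeros((16, 4))
F[0:4, 0] = [.44, .36, .44, .31]
F[[0, 4, 5, 6, 7, 8, 9], 1] = [.50, .37, .43, .65, .33, .30, .30]
F[[4, 8, 9, 10, 11], 2] = [.40, .30, .49, .32, .52]
F[[4, 12, 13, 14, 15], 3] = [.30, .54, .50, .31, .68]
L = np.column_stack([np.full(16, .50), F])

print(round(omega_h(L), 3))  # 0.693
```

Passing any of the three SL loading matrices in Table 3 to `omega_h` in the same way gives the corresponding model-based ωh.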
The JB Bifactor Rotation Criteria
Jennrich and Bentler (2011, 2012) describe a new method of exploratory bifactor modeling that addresses the shortcomings of the SL by not imposing any constraints on the patterns of loadings across the general and group factors. Instead, they propose bifactor rotation criteria that can be used directly in the GPA. In the present article, we restrict our investigation to orthogonal bifactor rotations, in which the general factor and all group factors are mutually orthogonal; these are by far the most commonly applied in the exploratory and confirmatory literatures.
Analytic bifactor rotation using the JB rotations proceeds as follows. First, a set of common factors is extracted from the correlation matrix of the indicators. For a bifactor model with p group factors, p+1 factors are extracted in this step; by contrast, in the SL only p group factors are extracted for the correlated factors solution. Thus, an analytic bifactor rotation with p group factors will generally represent more common variance than the SL output with p group factors, because an additional factor is extracted in the JB rotation. The (p+1)th extracted factor is important in explaining some of the technical issues with analytic bifactor rotation, as this factor can be vanishingly small.
Next, the extracted orthogonal (p + 1)-factor loading matrix is rotated according to a bifactor rotation criterion using the gradient projection algorithm. Bifactor rotation criteria take the form (Jennrich & Bentler, 2011)
$$ f(\Lambda) = Q(\Lambda_2) \qquad (17) $$
where Λ is the factor loading matrix, Λ2 is the submatrix of Λ containing all but the first column of Λ (i.e., containing only the group factors), and Q(·) denotes a simple-structure EFA rotation criterion. Jennrich and Bentler (2011, 2012) presented the quartimin and geomin rotation criteria as examples of Q(·), and named the resulting bifactor rotation criteria bi-quartimin and bi-geomin, respectively. The former is a member of the CF family with κ = 0; the latter is not part of the CF family. Following the notation above, the equations for these rotation criteria are
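$$ f_{BQ}(\Lambda) = \frac{1}{4}\sum_{i}\sum_{r \neq s}\lambda_{ir}^{2}\lambda_{is}^{2} \qquad (18) $$
$$ f_{BG}(\Lambda) = \sum_{i}\left[\prod_{r=2}^{k+1}\left(\lambda_{ir}^{2} + \varepsilon\right)\right]^{1/k} \qquad (19) $$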
where r, s ≥ 2 index the columns of Λ2 (i.e., the group factors), k is the number of group factors, and ε is a small positive value, usually 0.1. Note that the geomin and quartimin rotation criteria contain penalties only for variable (row) complexity, not for factor (column) complexity (Sass & Schmitt, 2010; Browne, 2001).
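A minimal Python/NumPy sketch of Equations 18 and 19 follows; both functions simply drop the first (general) column before applying the criterion. The default ε = 0.1 follows the text, but software defaults for ε differ, so treat it as an assumption. Applied to the population matrix of Table 3 (the unrotated matrix in the left panel of Table 5), bi-quartimin returns .063, matching the value of fBQ reported there; only items 1, 5, 9, and 10, which load on more than one group factor, contribute to it. The function names are ours.

```python
import numpy as np

def bi_quartimin(L):
    """Equation 18: quartimin applied to the group-factor columns only (column 0 is ignored)."""
    S = L[:, 1:] ** 2
    return np.sum(S * (S.sum(axis=1, keepdims=True) - S)) / 4.0

def bi_geomin(L, eps=0.1):
    """Equation 19: geomin applied to the group-factor columns only; eps is an assumed value."""
    S = L[:, 1:] ** 2
    k = S.shape[1]  # number of group factors
    return np.sum(np.prod(S + eps, axis=1) ** (1.0 / k))

# Population loadings with cross-loadings (upper left panel of Table 3); columns G, F1-F4.
F = np.zeros((16, 4))
F[0:4, 0] = [.44, .36, .44, .31]
F[[0, 4, 5, 6, 7, 8, 9], 1] = [.50, .37, .43, .65, .33, .30, .30]
F[[4, 8, 9, 10, 11], 2] = [.40, .30, .49, .32, .52]
F[[4, 12, 13, 14, 15], 3] = [.30, .54, .50, .31, .68]
L = np.column_stack([np.full(16, .50), F])

print(round(bi_quartimin(L), 3))  # 0.063
```

Because neither function reads column 0, any change to the general-factor loadings leaves both criterion values unchanged.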
Rank deficiency and the JB rotations
One of the major differences between the JB and SL is in how many factors are extracted: the SL requires extracting p common factors, while the JB requires extracting p+1 common factors, where p is the number of group factors. As noted, the SL produces (p+1)-factor solutions in which the general factor loadings are linearly dependent on the other p factors, while the JB produces an unconstrained (p+1)-factor solution. As is well known in EFA, over-extraction of factors can result in single-item factors, which represent the variance in only one item, and in so-called Heywood cases, in which nearly or more than 100% of the variance in an item is explained by the factor model. These phenomena apply directly to the JB rotations: when the JB is performed on a correlation matrix that originates from a p-factor model, or from a bifactor model in which the constraints of the SL hold, the resulting communality estimates can be grossly inflated and the factor structure distorted. When the constraints of the SL nearly hold, the (p+1)th factor can be vanishingly small.
This phenomenon is illustrated in Table 4. For Table 4, we first converted the true bifactor loading matrix in the upper left panel of Table 1 (i.e., the bifactor matrix with proportional, linearly dependent loadings) to a correlation matrix according to Equation 1. Next, we extracted (p+1) = 5 factors using maximum likelihood factor analysis; the resulting unrotated loadings are in the left panel of Table 4. Note that the first factor is defined almost entirely by the first item⁴; in this model the first item has a communality estimate of .995, indicating that too many factors were extracted. In the JB rotated solution (bi-quartimin) in the right panel of Table 4, all factor loadings exhibit some bias and numerous non-zero cross-loadings appear throughout.
Table 4.
JB solution (bi-quartimin) when factor loadings are proportional
Extracted Factor Loading Matrix | JB Rotated Factor Loading Matrix | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Item | V1 | V2 | V3 | V4 | V5 | G | F1 | F2 | F3 | F4 |
1 | 1.00 | −.02 | .00 | .00 | −.01 | .55 | .83 | .03 | .05 | .06 |
2 | .78 | .04 | .04 | −.05 | .26 | .61 | .55 | −.06 | −.08 | −.08 |
3 | .80 | .04 | .04 | −.05 | .27 | .62 | .56 | −.07 | −.08 | −.08 |
4 | .71 | .03 | .04 | −.05 | .24 | .55 | .50 | −.06 | −.08 | −.07 |
5 | .26 | .68 | −.18 | −.02 | −.01 | .44 | −.01 | .61 | .01 | .01 |
6 | .23 | .61 | −.16 | −.01 | .00 | .39 | −.01 | .55 | .01 | .01 |
7 | .30 | .80 | −.21 | −.02 | −.01 | .51 | −.02 | .72 | .02 | .01 |
8 | .23 | .61 | −.16 | −.01 | .00 | .39 | −.01 | .55 | .01 | .01 |
9 | .28 | .37 | .44 | −.13 | −.07 | .49 | −.02 | .05 | .44 | .03 |
10 | .25 | .33 | .39 | −.12 | −.06 | .44 | −.02 | .04 | .39 | .03 |
11 | .21 | .28 | .33 | −.10 | −.05 | .37 | −.02 | .03 | .33 | .02 |
12 | .30 | .39 | .47 | −.14 | −.07 | .52 | −.03 | .05 | .46 | .03 |
13 | .25 | .29 | .23 | .25 | .04 | .41 | −.01 | .06 | .07 | .30 |
14 | .27 | .32 | .26 | .28 | .05 | .45 | −.01 | .06 | .08 | .33 |
15 | .24 | .29 | .23 | .25 | .04 | .40 | −.01 | .05 | .07 | .29 |
16 | .25 | .30 | .24 | .26 | .05 | .42 | −.01 | .06 | .07 | .30 |
Note. V1–V5 are the unrotated factors, G is the general factor, and F1–F4 are group factors.
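The rank deficiency behind Table 4 can be seen without any sampling or extraction. The Python/NumPy sketch below builds (a) a bifactor matrix that satisfies the SL proportionality constraints, from hypothetical first-order loadings and second-order loadings γ, and (b) the unconstrained bifactor matrix of Table 3 with its cross-loadings removed, and prints the five largest eigenvalues of each reduced correlation matrix ΛΛ′. Under the SL constraints the fifth eigenvalue is zero, because the general-factor column is a linear combination of the group-factor columns, so a (p+1)th factor has no common variance left to explain. Without the constraints the fifth eigenvalue is positive, though still small. This is a population-level illustration under our assumed loadings, not the maximum likelihood extraction used for Table 4; the function names are ours.

```python
import numpy as np

def schmid_leiman_loadings(first_order, gamma, items_per_factor=4):
    """Bifactor loadings implied by a second-order model (hypothetical values).

    General loading = first-order loading * gamma_k
    Group loading   = first-order loading * sqrt(1 - gamma_k**2)
    """
    k = len(gamma)
    L = np.zeros((items_per_factor * k, k + 1))
    for j, g in enumerate(gamma):
        rows = slice(items_per_factor * j, items_per_factor * (j + 1))
        L[rows, 0] = first_order * g
        L[rows, j + 1] = first_order * np.sqrt(1.0 - g ** 2)
    return L

def reduced_eigenvalues(L):
    """Eigenvalues of the reduced correlation matrix L L' (communalities on the diagonal), descending."""
    return np.linalg.eigvalsh(L @ L.T)[::-1]

# (a) SL-constrained model: general and group loadings are proportional within each group factor.
L_sl = schmid_leiman_loadings(np.array([.7, .6, .5, .6]), np.array([.8, .7, .6, .7]))

# (b) Unconstrained bifactor model: Table 3 population with the cross-loadings removed.
L_free = np.zeros((16, 5))
L_free[:, 0] = .50
for j, col in enumerate([[.44, .36, .44, .31], [.37, .43, .65, .33],
                         [.30, .49, .32, .52], [.54, .50, .31, .68]]):
    L_free[4 * j:4 * j + 4, j + 1] = col

print(np.round(reduced_eigenvalues(L_sl)[:5], 4))    # fifth eigenvalue is zero: rank p = 4
print(np.round(reduced_eigenvalues(L_free)[:5], 4))  # fifth eigenvalue is small but nonzero
```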
Rotation of the general factor in the JB rotations
Jennrich and Bentler (2011, p. 3) state that although the bifactor rotation criteria do not depend on the first column of Λ, all columns of Λ, including the first, are rotated. How is the general factor rotated in the JB rotations, and what implications does this have for the resulting solutions? The second question can be answered briefly here and is elaborated in the next section: the implicit rotation of the general factor leads to local minima in the JB rotations and to solutions that depart from bifactor structure.
To answer the first question, we started with the population solution in the upper left panel of Table 3 (i.e., the bifactor pattern with cross-loadings) and performed a single iteration of the GPA, consisting of a gradient descent step and a projection step. This process is illustrated in Table 5, which shows that in the JB rotations the general factor is rotated only during the projection step, not during the gradient descent step. The gradient of the rotation criterion involves only the elements of Λ2, which excludes the general factor; as a result, the gradient descent step leaves the general factor completely unchanged. The general factor loadings in the left panel of Table 5 are exactly the same as those in the center panel, because only the group factors are rotated during the gradient descent step. However, the solution that results from the gradient descent step is not a proper solution: the rotation matrix is no longer orthogonal, and as a result, the factor loading matrix implies distorted estimates of the items’ common variance and a generally distorted model.
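The gradient descent step in Table 5 can be sketched in Python/NumPy as below, assuming the orthogonal GPA parameterization Λ = AT, in which the gradient with respect to T is A′ times the gradient of Equation 18 with respect to Λ. The general-factor column of that gradient is zero, so the first column of T, and with it the general factor, cannot change in this step. The step size α = 0.5 is our assumption; with it, the resulting matrix agrees with the center panel of Table 5 to within rounding. The function names are hypothetical.

```python
import numpy as np

def table3_population():
    """Population bifactor loadings with cross-loadings (upper left panel of Table 3)."""
    F = np.zeros((16, 4))
    F[0:4, 0] = [.44, .36, .44, .31]
    F[[0, 4, 5, 6, 7, 8, 9], 1] = [.50, .37, .43, .65, .33, .30, .30]
    F[[4, 8, 9, 10, 11], 2] = [.40, .30, .49, .32, .52]
    F[[4, 12, 13, 14, 15], 3] = [.30, .54, .50, .31, .68]
    return np.column_stack([np.full(16, .50), F])

def biquartimin_gradient(L):
    """Gradient of Equation 18 with respect to the loadings; the general-factor column is zero."""
    S = L[:, 1:] ** 2
    Gq = np.zeros_like(L)
    Gq[:, 1:] = L[:, 1:] * (S.sum(axis=1, keepdims=True) - S)
    return Gq

A = table3_population()                # unrotated loadings (left panel of Table 5)
T = np.eye(5)                          # initial rotation matrix: no rotation
alpha = 0.5                            # step size (assumed)
G = A.T @ biquartimin_gradient(A @ T)  # gradient with respect to T, for Lambda = A T
T_grad = T - alpha * G                 # gradient descent step only, no projection
L_grad = A @ T_grad

print(np.round(T_grad, 2))
print("general factor unchanged:", np.allclose(L_grad[:, 0], A[:, 0]))  # True
print("T_grad orthogonal:", np.allclose(T_grad.T @ T_grad, np.eye(5)))  # False
```

The last two lines show the two issues noted above: the general-factor loadings are untouched, and the updated rotation matrix is no longer orthogonal, which the projection step must repair.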
Table 5.
The two steps of the gradient projection algorithm in JB (bi-quartimin) rotation
Unrotated Factor Loading Matrix, fBQ=0.063 | Factor Loading Matrix¹ After Gradient Descent Step, fBQ=0.039 | Factor Loading Matrix After Projection Step, fBQ=0.055 | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
1 | .50 | .44 | .50 | .00 | .00 | .50 | .40 | .43 | −.03 | −.02 | .52 | .43 | .48 | −.01 | −.01 |
2 | .50 | .36 | .00 | .00 | .00 | .50 | .34 | −.04 | −.02 | −.01 | .50 | .35 | −.02 | −.01 | −.01 |
3 | .50 | .44 | .00 | .00 | .00 | .50 | .42 | −.05 | −.02 | −.01 | .51 | .43 | −.02 | −.01 | −.01 |
4 | .50 | .31 | .00 | .00 | .00 | .50 | .29 | −.04 | −.02 | −.01 | .50 | .30 | −.02 | −.01 | −.01 |
5 | .50 | .00 | .37 | .40 | .30 | .50 | −.02 | .29 | .35 | .27 | .53 | −.01 | .35 | .39 | .29 |
6 | .50 | .00 | .43 | .00 | .00 | .50 | −.03 | .37 | −.03 | −.02 | .52 | −.01 | .41 | −.01 | −.01 |
7 | .50 | .00 | .65 | .00 | .00 | .50 | −.03 | .58 | −.04 | −.02 | .52 | −.01 | .63 | −.01 | −.01 |
8 | .50 | .00 | .33 | .00 | .00 | .50 | −.02 | .28 | −.03 | −.02 | .51 | −.01 | .31 | −.01 | −.01 |
9 | .50 | .00 | .30 | .30 | .00 | .50 | −.02 | .24 | .26 | −.02 | .52 | −.01 | .28 | .29 | −.01 |
10 | .50 | .00 | .30 | .49 | .00 | .50 | −.02 | .23 | .45 | −.02 | .52 | −.01 | .28 | .48 | −.01 |
11 | .50 | .00 | .00 | .32 | .00 | .50 | −.01 | −.05 | .29 | −.02 | .51 | −.01 | −.02 | .31 | −.01 |
12 | .50 | .00 | .00 | .52 | .00 | .50 | −.01 | −.06 | .48 | −.02 | .51 | −.01 | −.02 | .51 | −.01 |
13 | .50 | .00 | .00 | .00 | .54 | .50 | −.01 | −.04 | −.03 | .52 | .51 | −.01 | −.02 | −.01 | .53 |
14 | .50 | .00 | .00 | .00 | .50 | .50 | −.01 | −.04 | −.03 | .48 | .51 | −.01 | −.02 | −.01 | .49 |
15 | .50 | .00 | .00 | .00 | .31 | .50 | −.01 | −.04 | −.02 | .29 | .50 | −.01 | −.02 | −.01 | .30 |
16 | .50 | .00 | .00 | .00 | .68 | .50 | −.01 | −.05 | −.03 | .66 | .51 | −.01 | −.02 | −.01 | .67 |
| |||||||||||||||
Initial Factor Rotation Matrix (No Rotation) | Factor Rotation Matrix¹ After Gradient Descent Step | Factor Rotation Matrix After Projection Step | |||||||||||||
Factor | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
G | 1.00 | .00 | .00 | .00 | .00 | 1.00 | −.03 | −.07 | −.04 | −.02 | 1.00 | −.01 | −.04 | −.02 | −.01 |
F1 | .00 | 1.00 | .00 | .00 | .00 | .00 | .98 | −.02 | .00 | .00 | .01 | 1.00 | .00 | .00 | .00 |
F2 | .00 | .00 | 1.00 | .00 | .00 | .00 | −.03 | .94 | −.03 | −.02 | .04 | .00 | 1.00 | .01 | .00 |
F3 | .00 | .00 | .00 | 1.00 | .00 | .00 | .00 | −.04 | .97 | −.02 | .02 | .00 | −.01 | 1.00 | .00 |
F4 | .00 | .00 | .00 | .00 | 1.00 | .00 | .00 | −.01 | −.01 | .99 | .01 | .00 | .00 | .00 | 1.00 |
Note. G is the general factor, F1–F4 are group factors. fBQ is the bi-quartimin criterion function.
¹ The “rotation matrix” produced by the gradient descent step is not a proper rotation matrix; it is projected onto the manifold of admissible rotation matrices during the projection step.
These two issues necessitate the projection step, which projects the rotation matrix onto the desired manifold of orthogonal matrices. The resulting solution is in the right panel of Table 5, and the factor loading matrix and rotation matrix are once again permissible. This projection step changes the factor loadings for the general factor; in the JB rotations, the general factor loadings can only change during this projection step. In this example, the general factor loadings become over-estimated for items with cross-loadings on group factors.
In the section on GPA, we mentioned that the projection step is simply a Procrustes transformation of the rotation matrix produced by the gradient descent step, which is geometrically necessary to optimize the rotation criterion on the nonlinear manifold of rotation matrices (Jennrich, 2001). We therefore say that the group factors are rotated explicitly in the JB rotations, because the gradient descent step directly minimizes the rotation criterion to produce a more psychometrically interpretable solution, whereas the general factor is rotated only implicitly, because no psychometric principle (e.g., simple structure) governs the projection step; it simply produces an allowable, or proper, rotation matrix.
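The Procrustes projection itself is short. For an orthogonal rotation, the admissible matrix closest to the gradient-step matrix M in the least-squares sense is UV′, where M = UDV′ is the singular value decomposition. The Python/NumPy sketch below applies it to the rounded center-panel rotation matrix of Table 5. Because the first row of that matrix has nonzero off-diagonal entries, the projection mixes the group factors into the first column, so the general factor is rotated even though the criterion never looks at it; given the rounded input, the result agrees with the right panel of Table 5 to within about .01. The function name is ours.

```python
import numpy as np

def project_to_orthogonal(M):
    """Nearest orthogonal matrix to M in the least-squares sense (orthogonal Procrustes / polar factor)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

# Rotation matrix after the gradient descent step (center panel of Table 5, rounded).
T_grad = np.array([
    [1.00, -.03, -.07, -.04, -.02],
    [ .00,  .98, -.02,  .00,  .00],
    [ .00, -.03,  .94, -.03, -.02],
    [ .00,  .00, -.04,  .97, -.02],
    [ .00,  .00, -.01, -.01,  .99],
])
T_proj = project_to_orthogonal(T_grad)

print(np.round(T_proj, 2))
print("orthogonal:", np.allclose(T_proj.T @ T_proj, np.eye(5)))  # True
print("general-factor column:", np.round(T_proj[:, 0], 2))
```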
Local minima in the JB rotations
The chief consequence of the implicit rotation of the general factor is that the JB rotations are prone to local minima. All rotation criteria optimized using descent algorithms are prone to local minima, but this is especially true of the JB rotations. To see why, we can conceptualize the JB rotations as a mixture of two factor models: a one-factor model defined by the general factor and a p-factor model, where p is the number of group factors. Assuming that the general factor is orthogonal to the group factors, this model can be written as
$$ \Sigma = \lambda_1\lambda_1' + \Lambda_2\Lambda_2' + \Theta \qquad (20) $$
where Σ is the model-implied correlation matrix, λ1 is the first column of the factor loading matrix, Λ2 contains the remaining columns of the factor loading matrix (Λ2 from Equations 17–19), and Θ is a diagonal matrix of unique variances. Thus, the model-implied correlation matrix is the sum of the model-implied reduced correlation matrix for the one-factor model defined by λ1, the model-implied reduced correlation matrix for the group factors defined by Λ2, and the unique variances in Θ. Casting the JB rotations as a mixture model might already suggest to some readers that local minima problems may result (e.g., Hipp & Bauer, 2006); depending on the starting values for each component of a mixture, the components may adapt to different features of the data and produce different solutions.
In this case, the starting values of the algorithm provide initial values of λ1 and Λ2, which partition the common variance in the items into these two sources. The use of random starting values, which are implemented in most software, makes this partitioning random. The rotation algorithm then minimizes the rotation criterion for Λ2 given the starting value of λ1, and any transfer of variance between λ1 and Λ2 during optimization is simply a side effect of the projection step of GPA (i.e., making the transformation proper). In general, high loadings in Λ2 result in a positive gradient and a corresponding decrease in the variance accounted for by the group factors during the gradient descent step of GPA. To compensate for this reduction, the general factor tends to absorb some of this variance during the projection step, which implicitly leads the general factor to accommodate as much variance as possible. This can be seen in the rotation matrix in the right panel of Table 5, whose first column re-defines the general factor as

$$ G^{*} = 1.00\,G + .01\,F_1 + .04\,F_2 + .02\,F_3 + .01\,F_4 $$

where G* is the updated general factor and (G, F1–F4) are the factors defined in the left panel of Table 5.
While the JB rotations tend to shift as much variance onto the general factor as possible, where that variance comes from is not governed by any psychometric principles because it results from the Procrustes projection step. This can lead the rotation algorithm to definitions of the general factor that correspond to distorted group factor structures. Most commonly, the variance contained in one of the group factors is improperly shifted to the general factor, a phenomenon known as factor collapse. Factor collapse in the JB rotations produces the following symptoms: inflated general factor loadings for items that load on the collapsed factor; decreased, or even negative, group factor loadings for items that load on the collapsed group factor; small cross-loadings for the items in the collapsed factor on all other group factors; and biased loadings throughout the solution.
Paradoxically, these “junk” solutions often produce values of the JB rotation criteria comparable to those of solutions that are closer to a “true” bifactor structure, for two reasons. First, when variance shifts away from the group factors into the general factor, the rotation criterion quantifies the complexity of a smaller amount of variance; all else being equal, the rotation criterion should be lowest when the general factor explains as much variance as possible.
The second reason is more subtle. Observe that the bi-quartimin and bi-geomin rotation criteria in Equations 18 and 19 only contain penalties for row complexity, not column complexity. Here, row complexity is operationalized as a function of products of factor loadings in each row of the group factor loading matrix; in bi-quartimin, the products of all pairs of loadings in each row are summed as in the ordinary quartimin rotation criterion, while in bi-geomin, the squared loadings in each row are multiplied as in the ordinary geomin rotation criterion. Mathematically, these functions achieve their lowest values for each variable under one of two conditions: (a) one factor loading is large and the others are very small, or (b) all factor loadings are small. The second case (b) is relatively rare in typical EFA; generally, small factor loadings correspond to small communality estimates which preclude the use of factor analysis. However, in the JB rotations, when a group factor collapses onto the general factor, the remaining common variance for the items on the collapsed factor can become very small, and because the rotation criterion is only a function of the group factor loadings, the rotation criterion can be minimized according to (b).
As a result of these two properties, collapsed factor solutions in the JB rotations can have even lower values of the rotation criteria than solutions that more accurately capture a bifactor structure. In these cases, the rotation criterion function fails as a measure of the interpretability of a solution, as the numerous cross-loadings in collapsed solutions render interpretation of the factors difficult.
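The two ways a row of Λ2 can reach a low criterion value can be seen directly. The Python/NumPy sketch below evaluates a single item's contribution to Equations 18 and 19 for three hypothetical rows of group loadings, using ε = 0.1 as stated in the text. Under these assumptions the all-small row (b) gets a bi-quartimin term of only .0003, compared with .0128 for a genuine cross-loading, and it gets the lowest bi-geomin term of the three rows, lower even than a row with a single salient loading (a). The function names are ours.

```python
import numpy as np

def quartimin_row(row):
    """One item's contribution to Equation 18: (1/4) * sum over r != s of s_r * s_s."""
    s = np.asarray(row, dtype=float) ** 2
    return (s.sum() ** 2 - np.sum(s ** 2)) / 4.0

def geomin_row(row, eps=0.1):
    """One item's contribution to Equation 19: geometric mean of (squared loading + eps)."""
    s = np.asarray(row, dtype=float) ** 2
    return np.prod(s + eps) ** (1.0 / len(s))

rows = {
    "(a) one salient loading": [.60, .00, .00, .00],
    "(b) all loadings small": [.10, .10, .10, .10],
    "cross-loading": [.40, .40, .00, .00],
}
for label, r in rows.items():
    print(f"{label:26s} quartimin={quartimin_row(r):.4f}  geomin={geomin_row(r):.4f}")
```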
To illustrate these phenomena, we removed the cross-loadings from Table 3, yielding a model with perfect IC bifactor structure, and performed both JB rotations on this model using 1,000 random starting values for each rotation, derived by performing the projection step of GPA on 1,000 5×5 matrices of random numbers. Duplicate solutions were removed for both rotations⁵, yielding five unique solutions for bi-quartimin and 42 unique solutions for bi-geomin. Table 6 displays the two solutions with the lowest function values for each of the JB rotations: the solutions in the left panels match the structure of the population model perfectly and had the lowest function values, while the solutions in the right panels display factor collapse. Note that the values of the rotation criterion in the right panels are 0.022 and 0.144 for bi-quartimin and bi-geomin, respectively, compared to .000 and .072 for the accurate solutions. The function values of the improper solutions are still fairly small, indicating that the JB rotations can assign low row complexity to these solutions despite the factor collapse.
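The multiple-start procedure described above can be sketched as follows for bi-quartimin. The sketch assumes standard normal entries for the random 5 × 5 matrices; a simplified orthogonal GPA with the tangent-space gradient, a backtracking step size, and SVD projection; and a rule that treats two solutions as duplicates when their loadings agree to within .001 after reflecting each column to a non-negative sum and permuting the group factors. These choices, the tolerances, and all function names are ours and are not necessarily those used to produce Table 6.

```python
import itertools
import numpy as np

def biquartimin(L):
    """Bi-quartimin value (Equation 18) and its gradient with respect to L."""
    S = L[:, 1:] ** 2
    others = S.sum(axis=1, keepdims=True) - S  # sum of the other squared group loadings
    Gq = np.zeros_like(L)
    Gq[:, 1:] = L[:, 1:] * others
    return np.sum(S * others) / 4.0, Gq

def gpa_orth(A, T, max_iter=500, tol=1e-5):
    """Simplified orthogonal GPA (after Jennrich, 2001): backtracking step plus SVD projection."""
    f, Gq = biquartimin(A @ T)
    G = A.T @ Gq
    alpha = 1.0
    for _ in range(max_iter):
        M = T.T @ G
        Gp = G - T @ ((M + M.T) / 2.0)  # gradient projected onto the tangent space
        s = np.linalg.norm(Gp)
        if s < tol:
            break
        alpha *= 2.0
        for _ in range(11):  # halve the step until f decreases enough
            U, _, Vt = np.linalg.svd(T - alpha * Gp)
            T_new = U @ Vt  # projection onto orthogonal matrices
            f_new, Gq_new = biquartimin(A @ T_new)
            if f_new < f - 0.5 * alpha * s ** 2:
                break
            alpha /= 2.0
        T, f, G = T_new, f_new, A.T @ Gq_new
    return A @ T, T, f

def reflect(L):
    """Reflect columns so each has a non-negative sum (removes arbitrary sign flips)."""
    return L * np.where(L.sum(axis=0) < 0, -1.0, 1.0)

def same_solution(L1, L2, tol=1e-3):
    """True if L1 and L2 agree up to column reflection and a permutation of the group factors."""
    a, b = reflect(L1), reflect(L2)
    if not np.allclose(a[:, 0], b[:, 0], atol=tol):
        return False
    return any(np.allclose(a[:, 1:], b[:, list(p)], atol=tol)
               for p in itertools.permutations(range(1, L1.shape[1])))

def multistart(A, n_starts=100, seed=0):
    """Rotate A from random orthogonal starts; return unique solutions sorted by criterion value."""
    rng = np.random.default_rng(seed)
    m = A.shape[1]
    unique = []
    for _ in range(n_starts):
        U, _, Vt = np.linalg.svd(rng.standard_normal((m, m)))  # random start: project a random matrix
        L, _, f = gpa_orth(A, U @ Vt)
        if not any(same_solution(L, L0) for L0, _ in unique):
            unique.append((reflect(L), f))
    return sorted(unique, key=lambda sol: sol[1])

# Population model of Table 6: Table 3 with the cross-loadings removed.
A = np.zeros((16, 5))
A[:, 0] = .50
for j, col in enumerate([[.44, .36, .44, .31], [.37, .43, .65, .33],
                         [.30, .49, .32, .52], [.54, .50, .31, .68]]):
    A[4 * j:4 * j + 4, j + 1] = col

for L, f in multistart(A, n_starts=20)[:2]:
    print(f"f_BQ = {f:.3f}")
    print(np.round(L, 2))
```

Inspecting every retained solution, not only the one with the lowest criterion value, is the point of the exercise.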
Table 6.
Local minima in the JB rotations (bi-quartimin and bi-geomin), using the model in Table 3 with cross-loadings removed
Bi−Quartimin Global Minimum, fBQ=0.000 | Bi−Quartimin Local Minimum, fBQ=0.022 | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
1 | .50 | .44 | .00 | .00 | .00 | .49 | .45 | .01 | .02 | .00 |
2 | .50 | .36 | .00 | .00 | .00 | .49 | .38 | .01 | −.02 | .01 |
3 | .50 | .44 | .00 | .00 | .00 | .49 | .45 | .01 | .02 | .00 |
4 | .50 | .31 | .00 | .00 | .00 | .48 | .33 | .02 | −.04 | .02 |
5 | .50 | .00 | .37 | .00 | .00 | .48 | .02 | .39 | −.04 | .01 |
6 | .50 | .00 | .43 | .00 | .00 | .49 | .01 | .44 | −.02 | .01 |
7 | .50 | .00 | .65 | .00 | .00 | .51 | −.01 | .64 | .07 | −.01 |
8 | .50 | .00 | .33 | .00 | .00 | .48 | .03 | .35 | −.05 | .02 |
9 | .50 | .00 | .00 | .30 | .00 | .57 | −.07 | −.06 | .00 | −.06 |
10 | .50 | .00 | .00 | .49 | .00 | .65 | −.16 | −.14 | .12 | −.12 |
11 | .50 | .00 | .00 | .32 | .00 | .58 | −.08 | −.07 | .01 | −.06 |
12 | .50 | .00 | .00 | .52 | .00 | .66 | −.17 | −.15 | .14 | −.13 |
13 | .50 | .00 | .00 | .00 | .54 | .49 | .01 | .01 | .00 | .55 |
14 | .50 | .00 | .00 | .00 | .50 | .49 | .01 | .01 | −.01 | .51 |
15 | .50 | .00 | .00 | .00 | .31 | .48 | .03 | .03 | −.08 | .33 |
16 | .50 | .00 | .00 | .00 | .68 | .50 | −.01 | −.01 | .05 | .68 |
| ||||||||||
Bi−Geomin Global Minimum, fBG=0.072 | Bi−Geomin Local Minimum, fBG=0.144 | |||||||||
Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
1 | .50 | .44 | .00 | .00 | .00 | .58 | −.01 | −.04 | −.31 | −.03 |
2 | .50 | .36 | .00 | .00 | .00 | .56 | .03 | −.01 | −.26 | −.01 |
3 | .50 | .44 | .00 | .00 | .00 | .58 | −.01 | −.04 | −.31 | −.03 |
4 | .50 | .31 | .00 | .00 | .00 | .54 | .05 | .00 | −.23 | .01 |
5 | .50 | .00 | .37 | .00 | .00 | .46 | .01 | .42 | .00 | .04 |
6 | .50 | .00 | .43 | .00 | .00 | .46 | −.03 | .47 | .00 | .04 |
7 | .50 | .00 | .65 | .00 | .00 | .47 | −.14 | .65 | .01 | .01 |
8 | .50 | .00 | .33 | .00 | .00 | .46 | .03 | .39 | .00 | .05 |
9 | .50 | .00 | .00 | .30 | .00 | .54 | .08 | .01 | .21 | .01 |
10 | .50 | .00 | .00 | .49 | .00 | .61 | .00 | −.06 | .34 | −.04 |
11 | .50 | .00 | .00 | .32 | .00 | .55 | .07 | .00 | .22 | .00 |
12 | .50 | .00 | .00 | .52 | .00 | .62 | −.01 | −.07 | .36 | −.05 |
13 | .50 | .00 | .00 | .00 | .54 | .46 | −.01 | .04 | .00 | .58 |
14 | .50 | .00 | .00 | .00 | .50 | .46 | .01 | .04 | .00 | .54 |
15 | .50 | .00 | .00 | .00 | .31 | .45 | .08 | .07 | .00 | .37 |
16 | .50 | .00 | .00 | .00 | .68 | .46 | −.07 | .02 | .01 | .70 |
Note. G is the general factor, F1–F4 are group factors. fBQ is the bi-quartimin criterion function, fBG is the bi-geomin criterion function.
To illustrate additional features of the local minima problem, we performed both JB rotations on the factor loading matrix in Table 2, also with perfect IC bifactor structure, using 1,000 random starting values for each rotation. Duplicate solutions were removed for both rotations, yielding one unique solution for bi-quartimin and 28 unique solutions for bi-geomin. Table 7 displays the solution for bi-quartimin and the three solutions with the lowest function values for bi-geomin. As before, the bi-quartimin solution and the best bi-geomin solution recovered the population model perfectly, while the local minima in bi-geomin displayed factor collapse. Two differences are worth noting in this analysis. First, unlike in the previous case, bi-quartimin rotation worked perfectly, with all 1,000 random starts converging to the same, correct solution; thus, the JB rotations do not have local minima in every case, although local minima appear to be common. Second, the second-best bi-geomin solution contains two collapsed factors instead of one; this demonstrates that these local minima can take many forms, not necessarily restricted to the collapse of a single factor.
Table 7.
Local minima in the JB rotations (bi-quartimin and bi-geomin), using the model in Table 2
Bi−Quartimin Global Minimum, fBQ=0.000 | Bi−Geomin Global Minimum, fBG=0.077 | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
1 | .48 | .40 | .00 | .00 | .00 | .48 | .40 | .00 | .00 | .00 |
2 | .51 | .35 | .00 | .00 | .00 | .51 | .35 | .00 | .00 | .00 |
3 | .67 | .62 | .00 | .00 | .00 | .67 | .62 | .00 | .00 | .00 |
4 | .34 | .55 | .00 | .00 | .00 | .34 | .55 | .00 | .00 | .00 |
5 | .44 | .00 | .45 | .00 | .00 | .44 | .00 | .45 | .00 | .00 |
6 | .40 | .00 | .48 | .00 | .00 | .40 | .00 | .48 | .00 | .00 |
7 | .32 | .00 | .70 | .00 | .00 | .32 | .00 | .70 | .00 | .00 |
8 | .45 | .00 | .54 | .00 | .00 | .45 | .00 | .54 | .00 | .00 |
9 | .55 | .00 | .00 | .43 | .00 | .55 | .00 | .00 | .43 | .00 |
10 | .33 | .00 | .00 | .33 | .00 | .33 | .00 | .00 | .33 | .00 |
11 | .52 | .00 | .00 | .51 | .00 | .52 | .00 | .00 | .51 | .00 |
12 | .35 | .00 | .00 | .69 | .00 | .35 | .00 | .00 | .69 | .00 |
13 | .32 | .00 | .00 | .00 | .65 | .32 | .00 | .00 | .00 | .65 |
14 | .66 | .00 | .00 | .00 | .51 | .66 | .00 | .00 | .00 | .51 |
15 | .68 | .00 | .00 | .00 | .39 | .68 | .00 | .00 | .00 | .39 |
16 | .32 | .00 | .00 | .00 | .56 | .32 | .00 | .00 | .00 | .56 |
| ||||||||||
Bi−Geomin Local Minimum 1, fBG=0.205 | Bi−Geomin Local Minimum 2, fBG=0.212 | |||||||||
Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
1 | .36 | .40 | .31 | .00 | .00 | .57 | .00 | .01 | −.25 | .00 |
2 | .38 | .35 | .33 | .00 | .00 | .58 | −.04 | .03 | −.22 | .02 |
3 | .51 | .62 | .44 | .00 | .00 | .82 | .03 | .00 | −.39 | −.03 |
4 | .26 | .55 | .22 | .00 | .00 | .52 | .13 | −.05 | −.35 | −.10 |
5 | .59 | .00 | .00 | .00 | −.23 | .36 | −.03 | .51 | .00 | .07 |
6 | .57 | .00 | −.05 | .00 | −.24 | .33 | .00 | .53 | .00 | .06 |
7 | .64 | .00 | −.24 | .00 | −.35 | .26 | .11 | .72 | .00 | .01 |
8 | .65 | .00 | −.05 | .00 | −.27 | .37 | .00 | .59 | .00 | .07 |
9 | .42 | .00 | .36 | .43 | .00 | .61 | −.05 | .03 | .33 | .03 |
10 | .25 | .00 | .21 | .33 | .00 | .39 | .00 | .01 | .26 | .00 |
11 | .39 | .00 | .34 | .51 | .00 | .61 | .00 | .01 | .39 | .00 |
12 | .27 | .00 | .22 | .69 | .00 | .53 | .15 | −.05 | .53 | −.10 |
13 | .46 | .00 | −.03 | .00 | .56 | .27 | .21 | .00 | .00 | .64 |
14 | .67 | .00 | .24 | .00 | .44 | .56 | .00 | .08 | .00 | .61 |
15 | .64 | .00 | .30 | .00 | .34 | .57 | −.07 | .10 | .00 | .52 |
16 | .43 | .00 | .00 | .00 | .48 | .27 | .17 | .01 | .00 | .56 |
Note. G is the general factor, F1–F4 are group factors. fBQ is the bi-quartimin criterion function, fBG is the bi-geomin criterion function.
These examples illustrate the local minima problem in the JB rotations and motivate the use of multiple random starting values in any application of the JB. If GPA is run with only a single starting value, any of the possible solutions may be identified, which could lead researchers to use these distorted solutions in their analyses or simply to conclude that the JB rotations are inadmissible. However, in both models and with both rotation criteria, the correct solution was identified upon examination of the local minima. In short, the JB rotations work, but are vulnerable to local minima. In all of these examples, the global minimum corresponded to the correct solution; however, these are contrived examples with all cross-loadings set to zero in the population model, and we do not expect the global minimum to identify the best model so consistently in real data.
Bias in cross-loadings in the bi-quartimin rotation criterion
In the previous sections, the two JB rotations, bi-quartimin and bi-geomin, were discussed interchangeably because the same issues (rank deficiency and local minima) affect both. In this section we compare the relative performance of the two JB rotations in the estimation of cross-loadings. In general, the performance of EFA rotation criteria suffers in the presence of cross-loadings because most rotation criteria, including quartimin, are designed to identify solutions in which each variable loads strongly on one and only one factor, a design feature that penalizes cross-loadings. Generally, any rotation criterion that minimizes row complexity will break down in the presence of cross-loadings. One notable exception is the geomin criterion (Yates, 1987, pp. 67–74), which is near its minimum whenever each row of the loading matrix has at least one near-zero loading.
It is straightforward to show that the bi-quartimin rotation criterion is not a measure of departure from bifactor structure in the presence of cross-loadings. In Equation 18 for the bi-quartimin rotation criterion, note that each squared loading is multiplied by every other squared loading in the corresponding row of the group factor loading matrix. If the loading matrix exhibits perfect bifactor structure, only one element of each row of Λ2 is nonzero and all products are zero; however, if two elements in a row are nonzero, some products are nonzero, and the optimization procedure will search for a loading matrix that minimizes these cross-loadings at the expense of the integrity of the rest of the loading matrix. In contrast, the geomin rotation criterion in Equation 19 multiplies all of the squared loadings (each offset by ε) in a row, so a row contributes little whenever any one of its loadings is near zero, regardless of how many of the others are large. The geomin rotation criterion therefore does not penalize cross-loadings explicitly, and has been recommended in cases where cross-loadings are expected (Browne, 2001). Thus, we expect the bi-geomin rotation criterion to be more tolerant of cross-loadings than the bi-quartimin rotation criterion.
We used the bifactor model with cross-loadings in Table 3 to compare the relative performance of the two JB rotations in the presence of cross-loadings. We transformed the population model into a correlation matrix using Equation 1, and from this correlation matrix an initial unrotated factor loading matrix with five factors was extracted using maximum likelihood factor analysis. Next, 1,000 random rotation matrices were used as starting rotations for each of the two analytic bifactor rotation methods. As previously discussed, the JB rotations have problems with local minima and factor collapse, and a discussion of parameter bias is not complete without also addressing these issues.
Duplicate solutions were removed from the 1,000 resulting solutions for each of the JB bifactor rotations. The bi-quartimin solution, and the three bi-geomin solutions with the lowest function values, are displayed in Table 8. Bi-quartimin yielded one unique solution, whereas bi-geomin yielded 47 unique solutions.
Table 8.
Bi-quartimin and bi-geomin when there are cross-loadings
Top left: Bi-Quartimin Global Minimum, fBQ = 0.055; top right: Bi-Geomin Global Minimum, fBG = 0.155

Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
---|---|---|---|---|---|---|---|---|---|---|
1 | .72 | .20 | −.14 | −.29 | −.17 | .50 | .44 | .50 | .00 | .00 |
2 | .47 | .40 | .01 | .01 | .04 | .50 | .36 | .00 | .00 | .00 |
3 | .48 | .46 | −.04 | −.01 | .03 | .50 | .44 | .00 | .00 | .00 |
4 | .46 | .36 | .04 | .02 | .05 | .50 | .31 | .00 | .00 | .00 |
5 | .72 | −.23 | −.18 | .14 | .13 | .51 | .00 | .36 | .40 | .30 |
6 | .62 | −.09 | .13 | −.15 | −.07 | .50 | .00 | .43 | .00 | .00 |
7 | .72 | −.20 | .09 | −.27 | −.15 | .51 | .00 | .65 | .00 | .00 |
8 | .57 | −.04 | .15 | −.10 | −.03 | .50 | .00 | .33 | .00 | .00 |
9 | .63 | −.10 | .01 | .15 | −.08 | .50 | .00 | .29 | .30 | .00 |
10 | .67 | −.15 | −.08 | .30 | −.12 | .51 | .00 | .29 | .49 | .00 |
11 | .48 | .05 | .06 | .33 | .03 | .50 | .00 | −.01 | .32 | .00 |
12 | .53 | .00 | −.04 | .49 | −.01 | .50 | .00 | −.01 | .52 | .00 |
13 | .47 | .02 | −.01 | .00 | .56 | .50 | .00 | .00 | .00 | .54 |
14 | .47 | .03 | .01 | .01 | .53 | .50 | .00 | .00 | .00 | .50 |
15 | .45 | .07 | .09 | .04 | .37 | .50 | .00 | .00 | .00 | .31 |
16 | .49 | −.01 | −.06 | −.02 | .68 | .50 | .00 | .00 | .00 | .68 |

Bottom left: Bi-Geomin Local Minimum, fBG = 0.180; bottom right: Bi-Geomin Local Minimum, fBG = 0.189

Item | G | F1 | F2 | F3 | F4 | G | F1 | F2 | F3 | F4 |
---|---|---|---|---|---|---|---|---|---|---|
1 | .60 | .44 | .37 | −.01 | −.01 | .69 | .44 | −.17 | .00 | .00 |
2 | .43 | .35 | .00 | .27 | .00 | .43 | .36 | .26 | .00 | .00 |
3 | .43 | .43 | .00 | .27 | .00 | .43 | .44 | .26 | .00 | .00 |
4 | .43 | .30 | .00 | .27 | .00 | .43 | .31 | .26 | .00 | .00 |
5 | .71 | .01 | .01 | −.22 | .29 | .62 | .00 | −.06 | .40 | .30 |
6 | .58 | .00 | .32 | .01 | .00 | .65 | .00 | −.11 | .00 | .00 |
7 | .65 | .00 | .48 | −.12 | .00 | .76 | .00 | −.30 | .00 | .00 |
8 | .54 | −.01 | .24 | .07 | .00 | .60 | .00 | −.02 | .00 | .00 |
9 | .65 | .00 | .02 | −.11 | .00 | .58 | .00 | .00 | .30 | .00 |
10 | .72 | .00 | −.11 | −.23 | −.01 | .58 | .00 | .01 | .49 | .00 |
11 | .55 | .00 | −.21 | .05 | .00 | .43 | .00 | .26 | .32 | .00 |
12 | .63 | .00 | −.35 | −.07 | .00 | .43 | .00 | .26 | .52 | .00 |
13 | .43 | .00 | .00 | .24 | .55 | .43 | .00 | .26 | .00 | .54 |
14 | .43 | .00 | .00 | .24 | .51 | .43 | .00 | .26 | .00 | .50 |
15 | .43 | .00 | .00 | .25 | .32 | .43 | .00 | .26 | .00 | .31 |
16 | .43 | .00 | .00 | .23 | .69 | .43 | .00 | .26 | .00 | .68 |
Note. G is the general factor, F1–F4 are group factors. fBQ is the bi-quartimin criterion function, fBG is the bi-geomin criterion function.
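The screening for duplicate solutions mentioned above (see the footnote on how unique solutions were identified) can be sketched as follows. This is a minimal version under stated assumptions: column 0 is taken to be the general factor and only the group-factor columns are permuted (a narrowing of the footnote's "all permutations of the columns"), and two aligned solutions are treated as identical when no loading differs by more than an assumed tolerance of .001. The function names and matrices are hypothetical, and the authors additionally checked their results by hand.

```python
import itertools
import numpy as np

def align(solution, population):
    """Canonical form of a rotated bifactor solution (column 0 = general factor):
    make each column's mean loading nonnegative, then reorder the group-factor
    columns to minimize MSE against the population loading matrix."""
    signs = np.where(solution.mean(axis=0) < 0, -1.0, 1.0)
    sol = solution * signs
    best, best_mse = sol, np.inf
    for perm in itertools.permutations(range(1, sol.shape[1])):
        cand = sol[:, [0, *perm]]
        mse = np.mean((cand - population) ** 2)
        if mse < best_mse:
            best, best_mse = cand, mse
    return best

def unique_solutions(solutions, population, tol=1e-3):
    """Keep one aligned representative per group of solutions that agree within tol."""
    kept = []
    for s in solutions:
        a = align(s, population)
        if not any(np.max(np.abs(a - k)) < tol for k in kept):
            kept.append(a)
    return kept

# Hypothetical example: a population matrix, a sign-flipped and reordered copy,
# and a genuinely different solution.
P = np.array([[.6, .5, 0], [.6, .4, 0], [.5, 0, .5], [.5, 0, .4]])
same = P[:, [0, 2, 1]] * np.array([1, -1, 1])
other = P.copy()
other[0, 1] = .3
print(len(unique_solutions([P, same, other], P)))  # 2
```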
The bi-quartimin solution in Table 8 clearly demonstrates factor collapse; here, bi-quartimin produced the same collapsed solution for all 1,000 sets of starting values. The results for bi-geomin are harder to interpret. We attribute no psychometric meaning to the presence of 47 unique solutions, except that this example reemphasizes the vulnerability of bifactor rotation criteria to local minima. The solution with the lowest function value, displayed in the top right panel of Table 8, reproduces the population loading matrix in the first column of Table 3 almost perfectly, with a maximum bias of .01 after rounding. This is an example of the JB rotations at their finest, able to recover a true bifactor structure almost perfectly when it exists (see Footnote 6). The problem, of course, lies in the 46 other solutions. Of the 1,000 random starts, only 137 (13.7%) converged to this near-perfect solution. In contrast, the next-best local minimum, in the bottom left panel of Table 8, demonstrates factor collapse, yet 247 of the 1,000 random starts converged to it. In other words, a researcher who uses only a single set of starting values is almost twice as likely (247/137 ≈ 1.8) to identify the collapsed solution as the best solution!
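The practical cost of relying on few starts can be gauged with simple arithmetic. Treating the observed shares (137 and 247 of 1,000 starts) as per-start probabilities, and assuming that starts are independent, the chance that at least one of n starts reaches the best solution is 1 − (1 − 0.137)^n. This is an illustration built from this one example, not a general property of bi-geomin; the function name is hypothetical.

```python
def prob_at_least_one(p_hit, n_starts):
    """Chance that at least one of n independent starts reaches a solution that a
    single start reaches with probability p_hit."""
    return 1.0 - (1.0 - p_hit) ** n_starts

p_best = 137 / 1000       # share of starts reaching the best bi-geomin solution
p_collapsed = 247 / 1000  # share reaching the collapsed local minimum

print(round(p_collapsed / p_best, 2))  # 1.8
for n in (1, 10, 50, 1000):
    print(n, round(prob_at_least_one(p_best, n), 4))
```

Under these assumptions, ten starts reach the best solution only about three times in four, which is one way to see why many starts, and an inspection of several of the best solutions, are worthwhile.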
Discussion
Exploratory factor analysis continues to play an important role in understanding the structure of psychological measures, and we believe that, with the rising popularity of exploratory structural equation modeling, this will continue to be the case in future research. A contemporary trend in psychometrics is the rising popularity of bifactor modeling applications (Chen, West, & Sousa, 2007; Reise, 2012). From an exploratory standpoint, however, the only available bifactor approach until recently was the Schmid-Leiman (1957) orthogonalization of a second-order model. Recently, Jennrich and Bentler (2011, 2012) introduced true analytic bifactor rotations to the field. In this tutorial, we reviewed each of these methods, highlighting particular limitations that we believe applied researchers should be aware of.
If a researcher believes that item response data have a bifactor structure in the population, then they are implicitly assuming that there is one general factor that explains some of the common variance among all indicators, and p so-called “group factors” that explain some of the remaining common variance shared only by items within the same group factor. One interesting feature of this bifactor structure, however, is that although the model contains p + 1 latent variables, we would not expect the dimensionality of the corresponding reduced correlation matrix (with communality estimates on the diagonal) to be exactly p + 1, but rather somewhere between p and p + 1 (see Footnote 7), depending on the degree to which the constraints of the SL are violated. Where a matrix falls along this continuum has important implications for both the SL and JB, as we review below.
We begin our discussion by considering the SL orthogonalization. The SL starts with the assumption of a (p + 1)-factor bifactor structure, extracts p orthogonal factors from the data, and rotates those factors obliquely. Next, the SL fits a single-factor model to the first-order factor correlations, thus turning the model into a second-order model, and finally transforms the second-order model into a bifactor model with p + 1 factors. To fully appreciate the limitations of the SL, and the conditions under which one expects it to excel, it is important to recognize the challenges in going from p + 1 orthogonal factors, to p correlated factors, to p correlated factors explained by a second-order factor, and then ultimately to a (p + 1)-factor bifactor model.
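The final steps of this sequence can be written compactly. In the sketch below, the oblique pattern matrix Λ and the factor correlation matrix Φ are taken as given (the extraction and oblique rotation steps are omitted), the second-order loadings γ are obtained by fitting one factor to Φ with iterated principal axis factoring (an assumed estimator; others could be used), and the SL matrix is formed with general loadings Λγ and group loadings Λ·diag(√(1 − γ²)). The function names and input values are hypothetical; this is not the psych package's implementation.

```python
import numpy as np

def one_factor_loadings(phi, n_iter=5000, tol=1e-12):
    """Second-order loadings: one factor fitted to the factor correlation matrix phi
    by iterated principal axis factoring (an assumed estimator)."""
    r = np.array(phi, dtype=float)
    h2 = np.abs(r - np.diag(np.diag(r))).max(axis=1)   # starting communalities
    for _ in range(n_iter):
        np.fill_diagonal(r, h2)
        vals, vecs = np.linalg.eigh(r)
        g = vecs[:, -1] * np.sqrt(vals[-1])             # loadings on the largest factor
        if np.max(np.abs(g ** 2 - h2)) < tol:
            break
        h2 = g ** 2
    return np.abs(g)   # assumes all second-order loadings are positive

def schmid_leiman(pattern, phi):
    """Oblique pattern (items x p) and factor correlations (p x p) -> items x (p + 1)
    bifactor matrix; column 0 is the general factor."""
    gamma = one_factor_loadings(phi)
    general = pattern @ gamma                      # loading on the second-order factor
    group = pattern * np.sqrt(1.0 - gamma ** 2)    # column k scaled by sqrt(1 - gamma_k^2)
    return np.column_stack([general, group])

# Hypothetical perfect-cluster input with a known second-order structure.
gamma_true = np.array([.8, .7, .6])
pattern = np.zeros((12, 3))
for k in range(3):
    pattern[4 * k:4 * k + 4, k] = [.7, .6, .5, .6]
phi = np.outer(gamma_true, gamma_true)
np.fill_diagonal(phi, 1.0)
print(np.round(schmid_leiman(pattern, phi), 3))
```

Within each group factor, the general and group columns are the same first-order loadings scaled by γ_k and √(1 − γ_k²), so their ratio is constant; this is the proportionality constraint discussed here.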
In our first demonstration, we illustrated the “proportionality constraints” of the SL and demonstrated the conditions under which the SL works optimally. Specifically, when each item loads on the general factor and on one and only one group factor, and the ratio of variance explained by the group factor to variance explained by the general factor is constant within group factors, the SL recovers the population structure exactly. The reason is that under these conditions the reduced correlation matrix has dimensionality p; that is, bifactor data consistent with proportionality have a complete linear dependence in their factor structure. Thus, when proportionality holds and the bifactor model has no cross-loadings on group factors, the data can be perfectly modeled by a p-correlated-factors model with perfect cluster structure. When that is the case, the transition to a second-order representation, and the subsequent bifactor orthogonalization, work perfectly. They work perfectly because the second-order and bifactor models are equivalent only under proportionality (Yung et al., 1999); in other words, for any bifactor model with proportionality, there is an equivalent second-order model.
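The rank argument can be checked numerically. The sketch below builds a population bifactor matrix from a hypothetical second-order model (so proportionality holds by construction) and counts the nonzero eigenvalues of the reduced correlation matrix ΛΛ′, that is, the common part with exact communalities on the diagonal. It then replaces the general loadings with equal values, which breaks proportionality within group factors. The loadings, the eigenvalue tolerance, and the function names are illustrative assumptions.

```python
import numpy as np

def bifactor_from_second_order(first_order, gamma):
    """Population bifactor loadings (column 0 = general) implied by a second-order
    model with first-order loadings first_order[k] and second-order loadings gamma[k]."""
    p = len(first_order)
    rows = []
    for k, (loadings, g) in enumerate(zip(first_order, gamma)):
        for loading in loadings:
            row = np.zeros(p + 1)
            row[0] = loading * g                          # general-factor loading
            row[k + 1] = loading * np.sqrt(1.0 - g ** 2)  # group-factor loading
            rows.append(row)
    return np.array(rows)

def reduced_dimensionality(lam, tol=1e-8):
    """Number of nonzero eigenvalues of the reduced correlation matrix lam @ lam.T."""
    eigenvalues = np.linalg.eigvalsh(lam @ lam.T)
    return int(np.sum(eigenvalues > tol))

first_order = [[.7, .6, .5, .6], [.8, .5, .6, .7], [.6, .6, .7, .5]]
lam = bifactor_from_second_order(first_order, gamma=[.8, .7, .6])
print(lam.shape[1], reduced_dimensionality(lam))   # 4 3: four factors, but rank p = 3

lam_free = lam.copy()
lam_free[:, 0] = 0.5   # equal general loadings: group/general ratio now varies within groups
print(reduced_dimensionality(lam_free))            # 4: rank p + 1
```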
In the second demonstration, we showed what happens when, more realistically, the population model does not satisfy the proportionality constraints imposed by the SL. In this demonstration we proposed a model in which the ratio of variance explained by the group factor to variance explained by the general factor varied substantially both between and within group factors. Under this condition, the SL cannot recover the true population structure. The reason is that there is less linear dependence between the general and group factors in this case; stated differently, it is impossible to take a (p + 1)-factor bifactor structure with independent cluster structure and reduce it to a p-correlated-factors model with independent cluster structure. Put simply, a structure of higher dimensionality cannot be squeezed into a lower rank without constraints, and if those constraints do not hold in the data, some inaccuracy in the SL is inevitable.
A second point raised in the second demonstration is that if the constraints do not hold in the data, the SL no longer produces results with an obvious proportionality constraint, as it did in the first demonstration. Instead, the PC values in the SL were simply regressed toward the group-factor mean PC values rather than being exactly equal within group factors. We then showed that this disappearance of the proportionality constraint is illusory. When cross-loadings in the correlated-factors model are taken into account and used in the SL formula, there are still (hidden) linear dependencies affecting the SL parameter values, leading to solutions that must be inaccurate. These dependencies arise from multiplying a vector (loadings in the correlated-factors model) by two common vectors: the loadings of the first-order factors on the second-order factor, and the vector of the square roots of the unique first-order factor variances. Moreover, we showed that if all cross-loadings in the correlated-factors model are set to zero, the resulting SL again has obvious proportionality constraints.
Finally, in our third demonstration, we considered the inaccuracies caused by cross-loadings. We are in no way claiming to have exhaustively covered all possible violations of perfect cluster structure, or that our contrived demonstration corresponds to any “realistic” data set. Our point, rather, was to show that when there are cross-loadings on group factors, it is nearly impossible to represent the true bifactor structure as a correlated-factors model and, in turn, to derive correct parameter estimates. First, in the presence of cross-loadings, loadings on the general factor tend to be positively biased. Second, and drawing on our description of the CF family of rotations, in the presence of cross-loadings the particular rotation selected will yield somewhat different SL results, including model-derived indices such as coefficient omega hierarchical and the percentage of common variance due to the general versus group factors (see Footnote 8).
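Because indices such as omega hierarchical are computed directly from the bifactor loadings, any distortion in the SL loadings carries over to them. The sketch below uses commonly cited formulas (e.g., Zinbarg, Revelle, Yovel, & Li, 2005; Rodriguez, Reise, & Haviland, 2015), assuming standardized items, an orthogonal bifactor model, and the model-implied variance of the unit-weighted total score in the denominator; other implementations may use the observed variance of the total score instead. The loadings and the function name are hypothetical.

```python
import numpy as np

def bifactor_indices(lam):
    """Omega hierarchical, omega total, and explained common variance (ECV) for an
    orthogonal bifactor loading matrix of standardized items (column 0 = general)."""
    general, group = lam[:, 0], lam[:, 1:]
    communality = (lam ** 2).sum(axis=1)
    var_general = general.sum() ** 2
    var_group = (group.sum(axis=0) ** 2).sum()
    var_unique = (1.0 - communality).sum()
    var_total = var_general + var_group + var_unique   # model-implied variance of the unit-weighted sum
    return {
        "omega_h": var_general / var_total,
        "omega_total": (var_general + var_group) / var_total,
        "ecv": (general ** 2).sum() / communality.sum(),
    }

# Hypothetical loadings: four items, one general and two group factors.
lam = np.array([[.6, .4, 0], [.6, .4, 0], [.6, 0, .4], [.6, 0, .4]])
print({name: round(value, 3) for name, value in bifactor_indices(lam).items()})
# {'omega_h': 0.643, 'omega_total': 0.786, 'ecv': 0.692}
```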
Next, we discussed the recently developed Jennrich and Bentler (JB) rotations. These rotations do not involve a transformation of other solutions, do not impose any sort of constraints on the factor structure, and are “true” analytic bifactor rotation criteria, designed to be minimized when the factor structure is IC bifactor. These theoretical advantages make the JB rotations appealing because they appear to overcome the limitations of the SL. However, these new rotations come with limitations of their own, which must be considered in any application.
In the first example, we showed that the JB rotations break down when the constraints of the SL are satisfied. In other words, the JB rotations fail in exactly those situations in which the SL excels. If the constraints of the SL hold, the “bifactor” structure with p + 1 factors can be represented perfectly using only p factors, and attempting to extract p + 1 factors could produce Heywood cases or other problems in the resulting solutions. This is not a technical issue with the JB rotations per se; any bifactor rotation that attempts to exactly recover a (p + 1)-factor bifactor structure will break down in the extraction step if there are truly only p common factors in the data. Similarly, we expect the JB solutions to perform poorly and produce factor collapse more often when the (p + 1)th extracted factor is small, as when the constraints of the SL nearly hold.
Next, we discussed how the general factor is rotated only implicitly during the projection step in JB. This results in a loose “definition” of the general factor, which produces local minima and factor collapse. Most worrisome, factor collapse can occur even when a bifactor structure with perfect IC group factors exists in the data. These results suggest that researchers should always use multiple random starting values when using the JB rotations. Here, we arbitrarily chose 1,000 random starting values, which seems to have worked well. Also worrisome, the number of solutions produced by the JB rotations can be very large; in the last example, bi-geomin yielded 47 solutions, raising the possibility that more starting values would have uncovered still more. We also observed that bi-quartimin produced fewer solutions than bi-geomin in all examples, possibly because of the latter's tolerance for cross-loadings.
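A multi-start analysis of the kind recommended here can be sketched as follows. This is a minimal sketch, assuming the orthogonal gradient projection algorithm described by Jennrich (2001) and Bernaards and Jennrich (2005), and taking bi-quartimin to be quartimin applied to the group-factor columns only; it is not the GPArotation or Mplus code, and all names, the population matrix, the seed, and the convergence settings are hypothetical. Because the criterion ignores column 0, the general factor is determined only through the rotation matrix T, which loosely mirrors the implicit definition of the general factor described above.

```python
import numpy as np

def bi_quartimin(lam):
    """Quartimin on group-factor columns 1..p (column 0 = general factor).
    Returns (criterion value, gradient with respect to lam)."""
    g2 = lam[:, 1:] ** 2
    other = g2.sum(axis=1, keepdims=True) - g2   # squared loadings on the other group factors
    grad = np.zeros_like(lam)
    grad[:, 1:] = 4.0 * lam[:, 1:] * other
    return float(np.sum(g2 * other)), grad

def random_rotation(k, rng):
    q, r = np.linalg.qr(rng.standard_normal((k, k)))
    return q * np.sign(np.diag(r))

def gpa_orthogonal(A, T, crit, max_iter=1000, tol=1e-5):
    """Orthogonal gradient projection: minimize crit(A @ T) over orthogonal T."""
    f, G = crit(A @ T)
    Gt = A.T @ G                                 # gradient with respect to T
    alpha = 1.0
    for _ in range(max_iter):
        M = T.T @ Gt
        Gp = Gt - T @ ((M + M.T) / 2.0)          # projected gradient
        s = np.linalg.norm(Gp)
        if s < tol:
            break
        alpha *= 2.0
        for _ in range(11):                      # backtracking step-size search
            U, _sv, Vt = np.linalg.svd(T - alpha * Gp)
            T_new = U @ Vt                       # nearest orthogonal matrix
            f_new, G_new = crit(A @ T_new)
            if f_new < f - 0.5 * s ** 2 * alpha:
                break
            alpha /= 2.0
        T, f, Gt = T_new, f_new, A.T @ G_new
    return A @ T, f

def multistart(A, crit, n_starts, rng):
    """Rotate from n_starts random orthogonal starts; returns (loadings, f) pairs, best first."""
    runs = [gpa_orthogonal(A, random_rotation(A.shape[1], rng), crit) for _ in range(n_starts)]
    return sorted(runs, key=lambda run: run[1])

# Hypothetical population: general factor + 3 group factors, no cross-loadings,
# with group/general ratios that vary within groups (so proportionality fails).
P = np.zeros((12, 4))
P[:, 0] = [.6, .5, .7, .4, .6, .5, .7, .6, .5, .4, .6, .7]
for k in range(3):
    P[4 * k:4 * k + 4, k + 1] = [.5, .4, .3, .6]

rng = np.random.default_rng(1)
A = P @ random_rotation(4, rng)   # an arbitrary "unrotated" loading matrix
runs = multistart(A, bi_quartimin, n_starts=20, rng=rng)
print([round(f, 4) for _, f in runs])   # criterion values, best first
```

Listing the criterion values for all starts, rather than keeping only the first solution found, makes it easy to see whether several distinct minima were reached and how often each one occurred.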
For the sake of completeness, we also used 1,000 random starting values for all oblique rotations used in the SL demonstrations, and only one unique solution was identified in each case. Thus, although all EFA rotation criteria are theoretically prone to local minima (Browne, 2001; Rozeboom, 1992), the CF rotations in these examples produced only one solution each. We expect that, in general, the SL will be much less vulnerable to local minima than the JB, a potential advantage of the former. Lastly, we illustrated the theoretical superiority of bi-geomin over bi-quartimin rotation in the presence of cross-loadings. This advantage follows from the well-known superior performance of geomin over quartimin in the presence of cross-loadings, and it has also been demonstrated in simulation (Bandalos & Kopp, 2013), even without the use of multiple random starts. The caveat is that, again, many random starting values are needed to identify the correct solution, and in our example, more starting values converged to a collapsed solution than to the correct solution.
Consequences for Applied Research
At this point we cannot offer any new approaches to exploratory bifactor modeling, but we can offer some suggestions for using the SL and JB in applied contexts. First, however, a researcher who believes that a bifactor structure exists in the population for a given data set should conduct some preliminary explorations before any exploratory bifactor analysis. Specifically, we recommend two methods that we have found highly informative. The first is ICLUST (Revelle, 1979), a hierarchical clustering technique available in the psych package in R (Revelle, 2015). The second is DETECT and its derivatives (Stout, Habing, Douglas, & Kim, 1996; Zhang, 2007), a nonparametric approach to evaluating dimensionality that uses a genetic algorithm to assign items to clusters; the DETECT index is implemented in the sirt package in R (Robitzsch, 2016). If these analyses do not support the a priori hypothesized content clusters, in terms of both the number of content domains and where items cluster, then a researcher may want to re-evaluate their theory before estimating a bifactor model.
If the researcher then proceeds to an SL, we believe it is important to treat this method as a descriptive technique only, and not as an estimator of a bifactor structure in the population. The chief reason for this caution is that the SL can be accurate only when certain highly unlikely conditions exist (perfect cluster structure, proportionality) and the sample is large enough that the correlation matrix reflects the population. In most applied circumstances, the SL loadings are distorted in complex ways by the imposed linear dependencies, and can yield very different results under different values of κ, or under different rotations in general. This is not to say that SL loadings will be wildly inaccurate, completely invalid, or not worth estimating; some information is always better than no information. Indeed, in the present examples, the solutions are, arguably, not that far off. Moreover, we believe that SL results can be useful as starting values for iterated target bifactor rotations (Reise, Moore, & Maydeu-Olivares, 2011), for suggesting priors for Bayesian factor analysis (Moore, Reise, Depaoli, & Haviland, 2015), or for suggesting a confirmatory model more generally.
If the researcher instead decides to use the JB, we strongly recommend multiple random starting values to address the local minima problem. Mplus users can specify both the number of random starting values and the number of solutions to report. We recommend using at least 1,000 random starting values and examining at least the 10 best solutions. If cross-loadings are hypothesized, we recommend the bi-geomin rotation criterion because of its theoretical (Jennrich & Bentler, 2012) and empirical (Bandalos & Kopp, 2013) superiority in estimating cross-loadings.
The local minima problems in the JB do not preclude its use in research; the theoretical justification for these rotation criteria still stands (Jennrich & Bentler, 2011, 2012) and in only one case (bi-quartimin with cross-loadings) did the JB rotations ultimately fail to recover the population model. However, in only two cases did all 1,000 random starting values converge to the same solution. In short, the JB rotations “work,” but require multiple random starting values to work consistently.
If multiple solutions are found, the ensuing problem of model selection becomes tricky. All rotations of a given EFA solution are mathematically equivalent (they reproduce the same correlation matrix) and differ only in interpretability; therefore, when the rotation criterion fails to properly quantify departure from interpretability, the researcher must use their own judgment in selecting a model for subsequent analysis. That being said, solutions with obvious factor collapse and numerous cross-loadings on the collapsed factor, such as those included in Tables 6, 7, and 8, can probably be safely discarded. If more than one solution with IC structure is found, and the solutions suggest different substantive interpretations, the researcher should report all of those solutions and acknowledge the implications of such a finding. For example, if a hypothesized group factor collapses cleanly onto the general factor, with few or no salient cross-loadings for items in the collapsed factor, this may indicate that these items can act as “pure” indicators of the general factor, unadulterated by residual variance from a group factor. Such findings are especially relevant to intelligence research, in which “pure” indicators of g are highly valued. However, if another solution places those indicators on a group factor together, then both models are supported by the data. Fixing all cross-loadings to zero and comparing the relative fit of the two models using CFA may provide insight into which model is better supported by the data.
Conclusion
Exploratory bifactor modeling is becoming increasingly relevant in psychometrics as a way to represent multidimensionality in psychological constructs without the restrictive assumptions of CFA. However, bifactor EFA can only be as valid as the procedures used to perform it. This research describes the limitations of the SL and JB procedures for bifactor EFA so that researchers can be aware of these limitations and so that research in this area can benefit from this awareness.
An excellent example of such research used bifactor EFA combined with exploratory SEM (B-ESEM; Morin, Arens, & Marsh, 2015) to assess two sources of construct-relevant multidimensionality: hierarchically structured constructs, which suggest a bifactor model, and impure indicators that may load on multiple constructs, which suggest an EFA approach. The researchers were also interested in assessing measurement invariance across gender. Clearly, bifactor ESEM is the appropriate analysis for such a study.
The authors used target rotation to estimate the measurement portion of ESEM models, and we take no issue with this approach. However, we would not recommend the use of the SL or JB for ESEM without acknowledging the shortcomings of both. Because the SL is equivalent to a second-order model, such an analysis would not be a “true” bifactor ESEM. If the SL transformed model is used as the measurement model, primary loadings and cross-loadings are likely to be biased unless the constraints of the SL hold; that is, unless the bifactor model of interest is equivalent to a second-order model. These biases in the measurement model could in turn lead to bias in the structural model and/or distort the analysis of measurement invariance.
If the JB is used instead, the bi-geomin rotation seems appropriate due to the a priori interest in cross-loadings. However, the local minima in JB make it more difficult to assess measurement invariance, as different solutions might be identified for the two gender groups, suggesting that multiple random starts be used and multiple solutions examined. Selecting a solution to compare across groups is difficult, however, without biasing a measurement invariance analysis towards a desired result. Thus, it seems that both the JB and SL are ill-suited for direct use in this sort of analysis, and perhaps in bifactor ESEM in general, and target rotation would be more appropriate. The SL and JB can certainly be used to help specify the target matrix for target rotation, with the caveat that multiple solutions in the JB may suggest multiple target patterns.
Another potentially fruitful use of the JB to guide the specification of CFA models appears in Murray and Johnson (2013), who compared bifactor and second-order models for the latent structure of cognitive abilities. In this study, the observed variables were measures of cognitive ability and the factors represented different dimensions of cognitive ability (Verbal, Perceptual, Rotation). Those authors were primarily concerned with a potential bias in CFA fit indices toward the bifactor model. Murray and Johnson found that the bifactor solution produced by bi-quartimin rotation deviated substantially from the oblique rotations (geomin, oblimin, promax), with the Verbal factor failing to manifest in either of the two batteries examined. As a result, the CFA phase of the study compared two bifactor models, one with a separate Verbal group factor and one without. We believe this decision may have been affected by the local minima problems in the JB, and it is likely that a different local minimum corresponded more closely with the group factor structure suggested by the oblique rotations. When the SL and different solutions in the JB suggest different confirmatory models, a comparison of those confirmatory models may be very fruitful in identifying the best model, fit comparison biases notwithstanding. The SL, represented in the oblique rotations in Murray and Johnson, and the JB have great potential for suggesting CFA models as well as target rotations in EFA, but the limitations of the two approaches must be taken into account.
These studies are only two examples of applications of bifactor EFA, and interest in the area is expanding rapidly. According to Google Scholar, at the time of this writing, the original JB paper (Jennrich & Bentler, 2011) has been cited 95 times and the follow-up paper (Jennrich & Bentler, 2012) has been cited 35 times. The original SL paper (Schmid & Leiman, 1957) has been cited 265 times since 2011, reflecting renewed interest in the bifactor model.
In this tutorial, we explained several key properties and limitations of the SL and JB rotations to help guide researchers in the proper use of these tools. Although other methods of exploratory bifactor modeling exist, such as iterated target rotation and Bayesian EFA, the SL and JB are the only existing bifactor EFA methods that are purely exploratory, and we believe these methods can be valuable tools for researchers unwilling or unable to incorporate prior information into their analysis. We hope that this research will help researchers to properly apply the SL and JB to their own data, and to capitalize on the renewed interest in the bifactor model.
Acknowledgments
The ideas and opinions expressed herein are those of the authors alone, and endorsement by the University of California, Los Angeles, the National Institutes of Health, or the DMS - CDS&E-MSS is not intended and should not be inferred. The authors wish to thank Peter M. Bentler for his comments on an earlier draft of the manuscript.
This research was supported through Algorithms for Measurement Model Specification Search (PI: Peter Spirtes) DMS - CDS&E-MSS, 1317428. Additional research support was obtained through the National Institutes of Health NIH Roadmap for Medical Research Grant (AR052177; PI: David Cella).
Funding: This work was supported by Grant 1317428 from the DMS - CDS&E-MSS and Grant AR052177 from the National Institutes of Health.
Role of the Funders/Sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Footnotes
1. While oblique versions of the JB rotations exist (Jennrich & Bentler, 2012), oblique bifactor models run contrary to the conceptualization of the group factors as representing residual variance, which is by definition orthogonal to the general factor. For this and other highly technical reasons, we ignore the oblique case in this tutorial.
2. The initial estimation of communalities (e.g., squared multiple correlations), the choice of extraction method (e.g., maximum likelihood), and the number of factors extracted are, of course, important for the accuracy of the ultimate bifactor solution. However, these topics are not important for understanding the relative limitations of the SL or JB rotations, and thus will not be discussed further.
3. Yung et al. (1999) present a more complete treatment of the Schmid-Leiman orthogonalization, in which there are arbitrarily many levels of factors.
4. The true factor loading was .9973; the results in the table were rounded to two decimal places.
5. GPArotation returns solutions with seemingly random orderings and signs (positive or negative) of group factors. Because the factors are orthogonal, any reordering of group factors or change in sign of any factor is mathematically equivalent to any other. Unique solutions were thus identified as follows. First, each column of a solution was multiplied by the sign (i.e., 1 or −1) of the mean of the loadings in that column to produce only positive factors. Next, all permutations of the columns of the solution were compared to the population loading matrix, and the permutation with the lowest MSE was kept. These solutions were then compared to identify unique solutions, which we examined afterward to guarantee that no duplicate solutions were erroneously identified.
6. Jennrich and Bentler (2011) prove that the JB rotation criteria are always minimized when the structure is bifactor, but only if there are no cross-loadings.
7. Here, we use “dimensionality” to describe the number of factors with meaningful variance; if the constraints do not hold, the dimensionality of the reduced correlation matrix is, strictly speaking, p + 1, but if the constraints almost hold, the dimensionality is still “essentially” p.
8. In popular psychometric packages, such as the R psych package (Revelle, 2015), the Schmid-Leiman routine allows only two possible rotations, oblimin and promax. Researchers wishing to explore alternative rotations from the CF family should use the GPArotation package and then transform the correlated-factors solution into an SL via multiplication of lambda by L, as done here. This, of course, will not automatically yield indices such as omega, omega hierarchical, factor determinacy, and so on, as the psych package does by default.
Conflict of Interest Disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.
Ethical Principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.
References
- Asparouhov T, Muthén B. Exploratory structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal. 2009;16(3):397–438.
- Asparouhov T, Muthén B. Comparison of computational methods for high dimensional item factor analysis. Mplus Technical Report. 2012.
- Bandalos DL, Kopp JP. The utility of exploratory bi-factor rotations in scale construction. Paper presented at the annual meeting of the American Psychological Society; Washington, DC; 2013 May 24. Retrieved from http://www.jmu.edu/assessment/research/students/APS_2013_Bandalos.pdf.
- Bentler PM, Wu EJ. EQS-Windows user’s guide: version 4. BMDP Statistical Software; 1993.
- Bernaards CA, Jennrich RI. Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis. Educational and Psychological Measurement. 2005;65(5):676–696.
- Browne MW. An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research. 2001;36(1):111–150.
- Brunner M, Nagy G, Wilhelm O. A tutorial on hierarchically structured constructs. Journal of Personality. 2012;80(4):796–846. doi: 10.1111/j.1467-6494.2011.00749.x.
- Canivez GL. Bifactor modeling in construct validation of multifactored tests: Implications for understanding multidimensional constructs and test interpretation. In: Schweizer K, DiStefano C, editors. Principles and methods of test construction: Standards and recent advancements. Gottingen, Germany: Hogrefe Publishing; in press. Eds., under contract.
- Crawford CB. A comparison of the direct oblimin and primary parsimony methods of oblique rotation. British Journal of Mathematical and Statistical Psychology. 1975;28:201–213.
- Crawford CB, Ferguson GA. A general rotation criterion and its use in orthogonal rotation. Psychometrika. 1970;35:321–332.
- de Bruin GP, Henn CM. Dimensionality of the 9-item Utrecht Work Engagement Scale (UWES–9). Psychological Reports. 2013;112(3):788–799. doi: 10.2466/01.03.PR0.112.3.788-799.
- Dombrowski SC. Investigating the structure of the WJ III Cognitive in early school age through two exploratory bifactor analysis procedures. Journal of Psychoeducational Assessment. 2014;32:483–494. doi: 10.1177/0734282914530838.
- Gignac GE, Watkins MW. Bifactor modeling and the estimation of model-based reliability in the WAIS-IV. Multivariate Behavioral Research. 2013;48(5):639–662. doi: 10.1080/00273171.2013.804398.
- Harman HH. Modern factor analysis. University of Chicago Press; 1976.
- Harman HH, Jones WH. Factor analysis by minimizing residuals (minres). Psychometrika. 1966;31(3):351–368. doi: 10.1007/BF02289468.
- Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika. 1965;30:179–185. doi: 10.1007/BF02289447.
- Jennrich RI. A simple general procedure for orthogonal rotation. Psychometrika. 2001;66(2):289–306.
- Jennrich RI. A simple general method for oblique rotation. Psychometrika. 2002;67(1):7–19.
- Jennrich RI, Bentler PM. Exploratory bi-factor analysis. Psychometrika. 2011;76(4):537–549. doi: 10.1007/s11336-011-9218-4.
- Jennrich RI, Bentler PM. Exploratory bi-factor analysis: The oblique case. Psychometrika. 2012;77(3):442–454. doi: 10.1007/s11336-012-9269-1.
- Moore TM, Reise SP, Depaoli S, Haviland MG. Iteration of Partially Specified Target Matrices: Applications in Exploratory and Bayesian Confirmatory Factor Analysis. Multivariate Behavioral Research. 2015;50(2):149–161. doi: 10.1080/00273171.2014.973990.
- Morin AJ, Arens AK, Marsh HW. A bifactor exploratory structural equation modeling framework for the identification of distinct sources of construct-relevant psychometric multidimensionality. Structural Equation Modeling: A Multidisciplinary Journal. 2015:1–24.
- Murray AL, Johnson W. The limitations of model fit in comparing the bi-factor versus higher-order models of human cognitive ability structure. Intelligence. 2013;41(5):407–422.
- Muthén B, Asparouhov T. Bayesian SEM: A more flexible representation of substantive theory. Psychological Methods. 2012;17:313–335. doi: 10.1037/a0026802.
- Muthén LK, Muthén BO. Mplus User’s Guide. Sixth edition. Los Angeles, CA: Muthén & Muthén; 1998–2012.
- Olatunji BO, Ebesutani C, Reise SP. A Bifactor Model of Disgust Proneness: Examination of the Disgust Emotion Scale. Assessment. 2015;22(2):248–262. doi: 10.1177/1073191114541673.
- Olsson U. Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika. 1979;44(4):443–460.
- R Core Team. R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: R Foundation for Statistical Computing; 2015. Retrieved from http://www.R-project.org/
- Reise SP. The rediscovery of bifactor measurement models. Multivariate Behavioral Research. 2012;47(5):667–696. doi: 10.1080/00273171.2012.715555.
- Reise SP, Cook KF, Moore TM. Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In: Reise S, Revicki D, editors. Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment. New York: Routledge; 2015. pp. 13–40.
- Reise S, Moore T, Maydeu-Olivares A. Target rotations and assessing the impact of model violations on the parameters of unidimensional item response theory models. Educational and Psychological Measurement. 2011;71(4):684–711.
- Revelle W. Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research. 1979;14(1):57–74. doi: 10.1207/s15327906mbr1401_4.
- Revelle W. psych: Procedures for personality and psychological research [Computer software manual]. R package version 1.5.8. 2015. Retrieved from http://cran.r-project.org/web/packages/psych/
- Robitzsch A. sirt: Supplementary item response theory models [Computer software manual]. R package version 1.11.0. 2016. Retrieved from http://cran.r-project.org/web/packages/sirt/. Accessed July 2016.
- Rodriguez A, Reise SP, Haviland MG. Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods. In press. doi: 10.1037/met0000045.
- Rodriguez A, Reise SP, Haviland MG. Applying bifactor statistics indices in the evaluation of psychological measures. Journal of Personality Assessment. In press. doi: 10.1080/00223891.2015.1089249.
- Rozeboom WW. The glory of suboptimal factor rotation: Why local minima in analytic optimization of simple structure are more blessing than curse. Multivariate Behavioral Research. 1992;27(4):585–599. doi: 10.1207/s15327906mbr2704_5.
- Sass DA, Schmitt TA. A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research. 2010;45(1):73–103. doi: 10.1080/00273170903504810.
- Schmid J, Leiman JM. The development of hierarchical factor solutions. Psychometrika. 1957;22(1):53–61.
- Simms LJ, Grös DF, Watson D, O’Hara MW. Parsing the general and specific components of depression and anxiety with bifactor modeling. Depression and Anxiety. 2008;25(7):E34–E46. doi: 10.1002/da.20432.
- Stout W, Habing B, Douglas J, Kim HR. Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement. 1996;20:331–354.
- Thurstone LL. An analytical method for simple structure. Psychometrika. 1954;19(3):173–182.
- Walls MM, Kleinknecht RA. Disgust factors as predictors of blood-injury fear and fainting. Paper presented at the annual meeting of the Western Psychological Association; San Jose, CA; 1996 Apr.
- Yates A. Multivariate exploratory data analysis: A perspective on exploratory factor analysis. Albany: State University of New York Press; 1987.
- Yung YF, Thissen D, McLeod LD. On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika. 1999;64(2):113–128.
- Zhang J. Conditional covariance theory and DETECT for polytomous items. Psychometrika. 2007;72:69–91.
- Zinbarg RE, Revelle W, Yovel I, Li W. Cronbach’s α, Revelle’s β, and McDonald’s ω H: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika. 2005;70(1):123–133.