Abstract
As pointed out by Sijtsma (in press), coefficient alpha is inappropriate as a single summary of the internal consistency of a composite score. Better estimators of internal consistency are available. In addition to those mentioned by Sijtsma, an old dimension-free coefficient and structural equation model based coefficients are proposed to improve the routine reporting of psychometric internal consistency. The various ways to measure internal consistency are also shown to be appropriate to binary and polytomous items.
Sijtsma (in press) has ably critiqued the unfortunate ascendance of coefficient alpha as the almost universal and sole estimator of the somewhat vague population concept of internal consistency. Among several recommendations, he suggests reporting the greatest lower bound, here denoted ρglb, as a measure of internal consistency reliability, and the explained common variance (ECV) based on minimum rank factor analysis as a measure of unidimensionality. These are good recommendations, but they are incomplete. We provide additional technical detail on some methods, and also consider several alternative methods.
Using a slightly different notation than Sijtsma, we are interested in describing the reliability of a composite X that is a simple sum of p unit-weighted components, X = X1 + X2 +…+ Xp. An internal consistency reliability coefficient describes the quality of the composite or scale in terms of hypothesized constituents of the components Xi. These might represent true and error parts based on classical test theory (Xi = Ti + Ei), common and unique parts based on common factor analysis (Xi = Ci + Ui), or the loading of the component on its factor plus residual error (Xi = λi F + Ei). Since Ui is the sum of two uncorrelated parts called specificity and error, that is, Ui = Si + Ei, where Si is specificity, the common factor decomposition is more general than that of classical test theory, and we focus on it. We assume that Ci and Ui are uncorrelated, and that the Xi are not linearly dependent. Then the covariance structure model ∑ = ∑C + Ψ follows, where ∑C is the positive semidefinite (psd) covariance matrix of the common variables and Ψ is the covariance matrix of the unique variables, typically taken as positive definite and diagonal. In general, the Ci are linearly dependent, so that ∑C can be of low rank and can have such factor analytic (FA) decompositions as ∑C = ΛΛ′ or ∑C = ΛΦΛ′, or a structure based on a FA simultaneous equation system such as ∑C = Λ(I − B)⁻¹Φ(I − B)⁻¹′Λ′ (see Bentler, 2007). Here Λ is a factor loading matrix, Φ is the covariance matrix of nondependent latent factors, and B is a matrix of coefficients relating latent factors.
Reliability and Internal Consistency
As noted by Sijtsma, internal consistency reliability is a vague concept, and so we now give it a concrete definition. Given the decomposition Xi = Ci + Ui for the parts, and further that Ui = Si + Ei (so that Ti = Ci + Si and Xi = Ti + Ei), the total or composite score has a similar decomposition X = C + U, where C = ∑Ci and U = ∑Ui = S + E, and S = ∑Si and E = ∑Ei are the specific and error total scores. Thus X = C + S + E, i.e., the total score is made up of mutually uncorrelated total common, specific, and error variables. The covariance matrices of the parts are ∑ = ∑C + Ψ = ∑C + ΨS + ΨE = ∑T + ΨE, where ∑T = ∑C + ΨS and ΨE are the true and error covariance matrices and ΨS is the diagonal matrix of specific variances. The reliability of the composite, here denoted ρXX, is defined as the ratio var(T)/var(X), or
$$\rho_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\mathbf{1}'\Sigma_T\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}} = 1 - \frac{\mathbf{1}'\Psi_E\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}}, \tag{1}$$
where 1 is a column vector of units. We also immediately have a clear-cut definition of internal consistency reliability ρxx as
$$\rho_{xx} = \frac{\sigma_C^2}{\sigma_X^2} = \frac{\mathbf{1}'\Sigma_C\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}} = 1 - \frac{\mathbf{1}'\Psi\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}}, \tag{2}$$
which is a lower bound to the reliability of the composite. See Bentler (1968, eq. 12) and Heise and Bohrnstedt (1970, eq. 32) for the origin of these ideas. These equations make clear that an internal consistency coefficient is a downward biased estimator of reliability by the proportion of specific variance, that is,
$$\rho_{xx} = \rho_{XX} - \frac{\mathbf{1}'\Psi_S\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}} \le \rho_{XX}. \tag{3}$$
Equality holds only when there is no specific variance.
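The relation among (1), (2), and (3) can be checked numerically. The following is a minimal sketch in plain Python, assuming a hypothetical three-part composite whose loadings, specific variances, and error variances are invented for illustration; the helper names `total`, `diag`, and `add` are likewise not from the source.

```python
def total(matrix):
    """Return 1'M1, the sum of all elements of a square matrix."""
    return sum(sum(row) for row in matrix)

def diag(values):
    """Build a diagonal matrix from a list of values."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Hypothetical population values for p = 3 parts (assumed, not from the paper).
loadings = [0.8, 0.7, 0.6]
sigma_c = [[li * lj for lj in loadings] for li in loadings]  # common part, rank 1
psi_s = diag([0.10, 0.15, 0.20])  # specific variances
psi_e = diag([0.26, 0.36, 0.44])  # error variances

sigma_t = add(sigma_c, psi_s)  # true-score covariance matrix
sigma = add(sigma_t, psi_e)    # covariance matrix of the parts

reliability = total(sigma_t) / total(sigma)           # equation (1)
internal_consistency = total(sigma_c) / total(sigma)  # equation (2)
specific_share = total(psi_s) / total(sigma)          # gap in equation (3)

print(round(reliability, 4), round(internal_consistency, 4), round(specific_share, 4))
```

With these assumed values the internal consistency falls short of the reliability by exactly the share of specific variance in the total variance, as (3) states.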
Alpha
Let σij be an off-diagonal element of ∑ and σ̄ij be the average of all σij. Then we have coefficient alpha as (see Sijtsma, eq. 2)
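$$\alpha = \frac{p^2\,\bar\sigma_{ij}}{\mathbf{1}'\Sigma\mathbf{1}}. \tag{4}$$
We now assume that Ψ is diagonal, so that σ̄ij = σ̄CiCj, where σ̄CiCi and σ̄CiCj are the average diagonal and off-diagonal elements of ∑C. Since ∑C is positive semidefinite, σ̄CiCi ≥ σ̄CiCj, and we have that
$$\alpha = \frac{p^2\,\bar\sigma_{C_iC_j}}{\mathbf{1}'\Sigma\mathbf{1}} \le \frac{p\,\bar\sigma_{C_iC_i} + p(p-1)\,\bar\sigma_{C_iC_j}}{\mathbf{1}'\Sigma\mathbf{1}} = \rho_{xx}, \tag{5}$$
with equality if all the elements of ∑C are equal. Although Sijtsma points out that α is downwardly biased, we now see by comparing (2), (3), and (4) that the extent of this bias is given by
$$\rho_{XX} - \alpha = \frac{p\,(\bar\sigma_{C_iC_i} - \bar\sigma_{C_iC_j})}{\mathbf{1}'\Sigma\mathbf{1}} + \frac{\mathbf{1}'\Psi_S\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}}. \tag{6}$$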
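The first bias term can be illustrated with a short computation. This is a sketch only, assuming a hypothetical one-factor population with unequal loadings, diagonal Ψ, and no specific variance; `coefficient_alpha` is an illustrative helper, not a function from any package.

```python
def coefficient_alpha(sigma):
    """Coefficient alpha as p^2 times the mean off-diagonal covariance over 1'Sigma1."""
    p = len(sigma)
    total_variance = sum(sum(row) for row in sigma)
    off_diagonal_sum = total_variance - sum(sigma[i][i] for i in range(p))
    mean_covariance = off_diagonal_sum / (p * (p - 1))
    return p * p * mean_covariance / total_variance

# Hypothetical one-factor population with unit variances (assumed values).
loadings = [0.8, 0.7, 0.6]
p = len(loadings)
sigma_c = [[li * lj for lj in loadings] for li in loadings]
sigma = [[sigma_c[i][j] if i != j else 1.0 for j in range(p)] for i in range(p)]

total_variance = sum(sum(row) for row in sigma)
alpha = coefficient_alpha(sigma)
rho_xx = sum(sum(row) for row in sigma_c) / total_variance

# Average diagonal and off-diagonal elements of the common covariance matrix.
mean_diag_c = sum(sigma_c[i][i] for i in range(p)) / p
mean_offdiag_c = sum(sigma_c[i][j] for i in range(p) for j in range(p) if i != j) / (p * (p - 1))
gap = p * (mean_diag_c - mean_offdiag_c) / total_variance

print(round(alpha, 4), round(rho_xx, 4), round(gap, 4))
```

Because the loadings differ, the elements of ∑C are not all equal and α falls strictly below ρxx; the shortfall equals p times the difference between the average diagonal and average off-diagonal elements of ∑C, divided by the total variance.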
A point not discussed by Sijtsma is that α can, in fact, overestimate ρXX. This can occur if Ψ is not diagonal, i.e., if there are correlated errors. Then (2) is still appropriate, but the numerator in (4) is inflated by these correlated errors. For some discussions of this problem, see, e.g., Raykov (2001) and Kano and Azuma (2003). Of course, if correlated errors are parameterized as factors so that Ψ is diagonal, then (5) again holds. It would seem that the question of whether to consider correlated errors as factors and hence part of the common factor space, or as residual covariances and hence as part of the unique space, should be left up to the goals of the investigator.
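A small numerical sketch makes the overestimation concrete. It assumes a hypothetical three-part composite with equal loadings and no specific variance, so that internal consistency and reliability coincide, and adds one positive error covariance; all values are invented for illustration.

```python
def total(matrix):
    """Return the sum of all elements of a matrix."""
    return sum(sum(row) for row in matrix)

def coefficient_alpha(sigma):
    """Coefficient alpha computed from a covariance matrix."""
    p = len(sigma)
    off_diagonal_sum = total(sigma) - sum(sigma[i][i] for i in range(p))
    return (p / (p - 1)) * off_diagonal_sum / total(sigma)

p = 3
loading = 0.7
sigma_c = [[loading * loading] * p for _ in range(p)]  # all elements equal

# Unique covariance matrices: diagonal, and with one correlated error pair.
psi_diagonal = [[0.51, 0.00, 0.00], [0.00, 0.51, 0.00], [0.00, 0.00, 0.51]]
psi_correlated = [[0.51, 0.20, 0.00], [0.20, 0.51, 0.00], [0.00, 0.00, 0.51]]

def compare(psi):
    """Return (alpha, internal consistency) for Sigma = Sigma_C + Psi."""
    sigma = [[sigma_c[i][j] + psi[i][j] for j in range(p)] for i in range(p)]
    return coefficient_alpha(sigma), total(sigma_c) / total(sigma)

alpha_diag, rho_diag = compare(psi_diagonal)
alpha_corr, rho_corr = compare(psi_correlated)
print(round(alpha_diag, 4), round(rho_diag, 4))
print(round(alpha_corr, 4), round(rho_corr, 4))
```

With diagonal Ψ the two coefficients agree, since all elements of ∑C are equal; with the error covariance treated as part of the unique space, α exceeds the coefficient defined by (2).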
From (4) we also see that α reflects the number of parts in the composite and also a weighted average correlation among measures, which may be useful information. If the variances of the parts are all equal, it reflects the average correlation. More precisely, let the generalized classical test theory average item or part reliability (Bentler, 1964) be defined by
$$\bar\rho_{ii} = \frac{\bar\sigma_{ij}}{\bar\sigma_{ii}}, \tag{7}$$
where σ̄ii is the average item or part variance. If the item variances are equal, as in a parallel test situation, (7) is just the average correlation among items. In general, α is just the Spearman-Brown step-up of the part reliability (7), that is,
$$\alpha = \frac{p\,\bar\rho_{ii}}{1 + (p-1)\,\bar\rho_{ii}}. \tag{8}$$
It can be argued that such an interpretation provides sufficient rationale for the continuing use of α as one of several measures of internal consistency.
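The step-up identity is easy to verify. The sketch below assumes a hypothetical 3 × 3 covariance matrix with unequal variances and compares α computed directly from the covariance matrix with the Spearman-Brown step-up of the average part reliability; the variable names are illustrative.

```python
# Hypothetical covariance matrix with unequal part variances (assumed values).
sigma = [
    [4.00, 1.20, 1.00],
    [1.20, 1.00, 0.50],
    [1.00, 0.50, 2.25],
]
p = len(sigma)

total_variance = sum(sum(row) for row in sigma)
trace = sum(sigma[i][i] for i in range(p))

mean_variance = trace / p
mean_covariance = (total_variance - trace) / (p * (p - 1))

# Average part reliability: mean covariance over mean variance.
part_reliability = mean_covariance / mean_variance

# Spearman-Brown step-up of the part reliability to a composite of p parts.
alpha_step_up = p * part_reliability / (1 + (p - 1) * part_reliability)

# Alpha computed directly from the covariance matrix.
alpha_direct = p * p * mean_covariance / total_variance

print(round(part_reliability, 4), round(alpha_step_up, 4), round(alpha_direct, 4))
```

The two values of α agree to rounding error even though the part variances differ, which is the sense in which α reflects a weighted rather than a simple average correlation.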
Dimension-Free Internal Consistency Reliability
Sijtsma described the coefficient ρglb, but did not review an earlier coefficient that is precisely equal to the glb under standard assumptions. Suppose now we take ∑C = ΛΛ′ so that ∑ = ΛΛ′ + Ψ and Ψ is diagonal, i.e., we have a traditional factor analysis model. Bentler (1972) developed a dimension-free reliability coefficient based on this decomposition without any assumption on the number of factors. He proposed that the factor loading matrix Λ (of arbitrary dimension k) be chosen so that trace(ΛΛ′) is minimized while the model ∑ = ΛΛ′ + Ψ holds precisely.¹ Because trace(∑) = trace(ΛΛ′) + trace(Ψ) is fixed, minimizing trace(ΛΛ′) is equivalent to maximizing trace(Ψ) = 1′Ψ1, so that Bentler’s dimension-free lower bound to reliability is
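$$\rho_{blb} = 1 - \frac{\max_{\Psi}\,\mathbf{1}'\Psi\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}}, \quad \text{subject to } \Sigma - \Psi \text{ psd and } \Psi \text{ diagonal}. \tag{9}$$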
There is no smaller internal consistency reliability coefficient for which (∑ − Ψ) is positive semidefinite, i.e., for which the factors are real, and not imaginary. If, at the solution, the unique variance matrix Ψ has non-negative variances ψii ≥ 0, the dimension-free lower bound is the greatest lower bound to reliability. Actually, we would argue that reliability theory requires every variable to have some nonzero error variance.
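We had shown in (6) how α relates to reliability ρXX. In (6), however, no specific structure was assumed for ∑C. Using the structure (9), for ∑C = ΛΛ′ with minimum trace we can say
$$\rho_{blb} = \alpha + \frac{\min \operatorname{tr}(\Lambda\Lambda') - p\,\bar\sigma_{ij}}{\mathbf{1}'\Sigma\mathbf{1}}. \tag{10}$$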
That is, ρblb increases α by the smallest possible amount that would make ∑C psd with minimum trace.
If we also constrain ψii ≥ 0 when minimizing (9), that is, we disallow Heywood cases, we obtain the greatest lower bound to internal consistency reliability (Woodhouse & Jackson, 1977; Bentler & Woodward, 1980). Specifically,
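$$\rho_{glb} = 1 - \frac{\max_{\Psi}\,\mathbf{1}'\Psi\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}}, \quad \text{subject to } \Sigma - \Psi \text{ psd, } \Psi \text{ diagonal, and } \psi_{ii} \ge 0. \tag{11}$$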
Since (11) is an optimization problem with an additional constraint as compared to (9), it immediately follows that
$$\rho_{blb} \le \rho_{glb}, \tag{12}$$
with equality when there are no negative unique variances in ρblb.² Another way to say this is that ρglb improves on ρblb to the extent that variables in the population have negative error variances. This may be an illusory gain, since it implies that a factor model with nonnegative unique variances does not fit the population, and hence a reliability coefficient based on it may be questionable.
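A worked special case, added here for illustration and not taken from the source, shows how the two bounds behave. Assume p standardized parts that are equicorrelated with common correlation r, 0 ≤ r < 1. The maximization problem is convex and invariant under permutations of the parts, so an optimal Ψ can be taken to be of the form ψI:

```latex
\Sigma = (1-r)\,I + r\,\mathbf{1}\mathbf{1}', \qquad
\mathbf{1}'\Sigma\mathbf{1} = p + p(p-1)\,r .

% Eigenvalues of \Sigma - \psi I are (1-r-\psi), with multiplicity p-1,
% and (1-r-\psi) + pr, so the psd constraint reduces to a single inequality.
\Sigma - \psi I = (1-r-\psi)\,I + r\,\mathbf{1}\mathbf{1}' \ \text{psd}
\iff \psi \le 1-r .

% The maximal trace is p(1-r), with nonnegative unique variances.
\rho_{blb} = \rho_{glb}
= 1 - \frac{p\,(1-r)}{p + p(p-1)\,r}
= \frac{p\,r}{1 + (p-1)\,r}
= \alpha .
```

In this case the common covariance matrix at the solution is r11′, all of whose elements are equal, so the lower bounds coincide with α; the bounds exceed α only when the common covariance matrix has unequal elements.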
We now make several critical observations about these dimension-free coefficients. First, it was stated in association with (9) and (11) that the number of factors, k, is arbitrary. In covariance structure analysis, k is usually taken to be a small number so that a model would have positive degrees of freedom. Here, however, k is usually sufficiently large that a factor model would have negative degrees of freedom, that is, k exceeds the Ledermann bound. This might be an argument to consider model-based reliability coefficients, discussed in the next section. Closely related to this is the fact that ρblb and ρglb are based on the assumption that all covariation in the data should be modeled in the common factor space. Stated differently, Ψ is diagonal and no correlated errors are allowed. If the partitioning of variance that one has in mind implies that correlated errors should be considered part of error and not systematic common variance, these lower-bound coefficients overestimate internal consistency. Further, the computational procedures used to obtain solutions to (9) and (11) assume that there is no sampling variability in the sample covariances, that is, they treat sample covariances as if they are population covariances. Stated differently, while (9) and (11) are based on the population covariance matrix, the procedures are applied to a sample covariance matrix. This seems risky as any sample-based distortion of correlations will be modeled as real common variance. To take an extreme case, suppose the population covariance matrix ∑ = I, that is, variables are uncorrelated. ρblb and ρglb would be zero. But a sample covariance matrix would have nonzero covariances, so that ρblb and ρglb would be positive. This illustrates the bias noted by Sijtsma. Shapiro and ten Berge (2000) offered an explicit expression for the asymptotic bias of MTFA, and hence for ρblb. More effective non-asymptotic bias corrections also are available (Li & Bentler, 2004).
The very nature of k-dimensional coefficients also can be questioned. Perhaps one would like to assure that any measure of reliability is based on a single latent dimension, yet ρblb and ρglb allow k dimensions. Actually, this is an illusory problem. Bentler (2007) shows that there exists a rotation in k-dimensional factor space that gives a unidimensional subspace of maximum internal consistency, and that the resulting coefficients are actually given by ρblb and ρglb.
Sijtsma recommends assessing unidimensionality by the ratio of the first eigenvalue of ∑C to the sum of its eigenvalues, expressed as a percent. Called explained common variance, or ECV, this is a measure of how close ∑C is to being unidimensional. This is an informative measure. Sijtsma follows ten Berge and Sočan (2004) in recommending that ∑C and ECV be computed by a procedure called minimum rank factor analysis. But there is no reason to limit the definition in this way. Since tr(∑C) equals the sum of the common factor eigenvalues, and this sum is minimized by the dimension-free methods described above, as suggested by Bentler (1972), it would be at least as meaningful to apply the ECV measure to the ∑C obtained from dimension-free methods as to the one obtained from the minimum rank method.
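Once some ∑C is in hand, the ECV computation itself is simple. The sketch below assumes a hypothetical two-factor ∑C built from two orthogonal loading vectors and uses plain power iteration for the first eigenvalue; it does not perform minimum rank or minimum trace factor analysis, and the function names are illustrative.

```python
def matvec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def largest_eigenvalue(matrix, iterations=200):
    """Power iteration for the dominant eigenvalue of a symmetric psd matrix."""
    vector = [1.0 + 0.1 * i for i in range(len(matrix))]  # arbitrary start
    for _ in range(iterations):
        product = matvec(matrix, vector)
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
    # Rayleigh quotient of the (unit-length) converged vector.
    return sum(a * b for a, b in zip(vector, matvec(matrix, vector)))

def explained_common_variance(sigma_c):
    """First eigenvalue of Sigma_C as a percent of its trace."""
    trace = sum(sigma_c[i][i] for i in range(len(sigma_c)))
    return 100.0 * largest_eigenvalue(sigma_c) / trace

# Hypothetical common covariance matrix with a general and a bipolar factor.
general = [0.6, 0.6, 0.6, 0.6]
bipolar = [0.3, 0.3, -0.3, -0.3]
sigma_c = [[general[i] * general[j] + bipolar[i] * bipolar[j] for j in range(4)]
           for i in range(4)]

ecv = explained_common_variance(sigma_c)
print(round(ecv, 2))
```

Here the two nonzero eigenvalues are 1.44 and 0.36, so the first dimension accounts for 80 percent of the common variance.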
Model-Based Internal Consistency Coefficients
The internal consistency coefficients discussed by Sijtsma, and also those reviewed above, are specified at a population level and are mathematical lower bounds to population internal consistency reliability. In practice, sample covariance and correlation matrices must be used in the computation instead of their population counterparts, which are essentially never available. It is well-known that a model-based covariance matrix ∑̂ can be a more efficient estimator of the population covariance matrix ∑ than the product moment sample covariance matrix Sn based on a sample of size n, and hence it may be useful to consider reliability coefficients based on a model.
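A particular model-based coefficient that has been proposed in the literature is based on the 1-factor model ∑C = λλ′, where λ (p × 1) is the factor loading vector. Then ∑ = λλ′ + Ψ, and we can follow Jöreskog (1971, p. 112) to define
$$\rho_{11} = \frac{\mathbf{1}'\lambda\lambda'\mathbf{1}}{\mathbf{1}'\Sigma\mathbf{1}} = \frac{(\mathbf{1}'\lambda)^2}{(\mathbf{1}'\lambda)^2 + \mathbf{1}'\Psi\mathbf{1}}. \tag{13}$$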
Zinbarg et al. (2005) equate ρ11 with McDonald’s (1985) ωH. Typically, Ψ is taken to be a diagonal matrix but it need not be so (see Bollen, 1980). To make ρ11 operational, the one-factor model is estimated based on Sn, and ρ̂11 is computed from (13) using the parameter estimates λ̂ and Ψ̂ and the resulting ∑̂ in place of their population counterparts.
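Given estimates from a fitted one-factor model, the coefficient is a one-line computation. The sketch below assumes hypothetical loading and unique variance estimates rather than output from any particular SEM program; `rho_11` is an illustrative helper, and Ψ is passed as a full matrix so that nonzero unique covariances can be included.

```python
def rho_11(loadings, psi):
    """One-factor internal consistency from loadings and a unique covariance matrix."""
    common = sum(loadings) ** 2               # (1'lambda)^2
    unique = sum(sum(row) for row in psi)     # 1'Psi1, including any off-diagonals
    return common / (common + unique)

# Hypothetical estimates for four parts (assumed values).
loadings_hat = [0.9, 0.8, 0.7, 0.6]
psi_hat = [
    [0.40, 0.00, 0.00, 0.00],
    [0.00, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.60, 0.00],
    [0.00, 0.00, 0.00, 0.50],
]

estimate = rho_11(loadings_hat, psi_hat)
print(round(estimate, 4))
```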
However, a 1-factor model hardly ever describes real data with a reasonably large p; that is, the null hypothesis ∑ = λλ′ + Ψ will invariably be rejected (see, e.g., ten Berge and Sočan, 2004, p. 613). McDonald (1999, p. 89) stated that if the 1-factor fit is poor “…we should not be using the coefficient anyway,” a point emphasized by Bentler (2003, 2007) in his proposal to use any statistically acceptable structural model that contains additive random errors to estimate internal consistency reliability. A good example is a variant of the FA simultaneous equation model, which can be written as
$$\Sigma = \Lambda (I - B)^{-1} \Phi (I - B)^{-1\prime} \Lambda' + \Psi. \tag{14}$$
This fits in our framework with ∑C = Λ(I − B)⁻¹Φ(I − B)⁻¹′Λ′. Hence, after obtaining parameter estimates Λ̂, B̂, Φ̂, Ψ̂ for any model of the form (14) whose null hypothesis is accepted, we also have ∑̂C and ∑̂, and we can use our basic internal consistency definition (2) to compute
$$\hat\rho_{xx} = \frac{\mathbf{1}'\hat\Sigma_C\mathbf{1}}{\mathbf{1}'\hat\Sigma\mathbf{1}} = 1 - \frac{\mathbf{1}'\hat\Psi\mathbf{1}}{\mathbf{1}'\hat\Sigma\mathbf{1}}. \tag{15}$$
Bentler (2007) shows that (15) can equivalently be conceptualized as the coefficient for a unidimensional subspace of ∑̂C. Although we consider this coefficient to be one of the most defensible general-purpose internal consistency estimators, it was not considered by Sijtsma. Hopefully the coefficient (15) is just an ancillary measure to the more important objective of actually understanding the structure of the instrument based on (14).
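The computation can be sketched for a simple case. The example below assumes the special case B = 0, so that the common part reduces to ΛΦΛ′, with a hypothetical two-factor model for four parts; the estimates are invented, and the general model would additionally require (I − B)⁻¹.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*a)]

def total(matrix):
    """Return the sum of all elements of a matrix."""
    return sum(sum(row) for row in matrix)

# Hypothetical estimates (assumed values): two correlated factors, B = 0.
lambda_hat = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
phi_hat = [[1.0, 0.5], [0.5, 1.0]]
psi_hat_diagonal = [0.36, 0.51, 0.64, 0.19]

# Model-implied common covariance matrix, Lambda Phi Lambda'.
sigma_c_hat = matmul(matmul(lambda_hat, phi_hat), transpose(lambda_hat))

# Model-implied total variance of the composite, 1'Sigma_C1 + 1'Psi1.
total_hat = total(sigma_c_hat) + sum(psi_hat_diagonal)

rho_hat = total(sigma_c_hat) / total_hat
print(round(rho_hat, 4))
```

The coefficient uses only model-implied matrices, so it is meaningful only to the extent that the fitted model is statistically acceptable.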
Coefficients for Binary and Ordinal Variables
Sijtsma mentions item response theory but does not give concrete guidelines on how to obtain internal consistency reliability estimates under this framework. Here we note that this is easy to do if one accepts the notion that any observed binary or ordered categorical variable Xi represents a discretization of an underlying continuous normal variable Zi. The sum of the observed item scores, X = X1 + X2 +…+ Xp, is then no longer of interest; rather, the quantity of interest is the sum of the underlying normal variables, that is, Z = Z1 + Z2 +…+ Zp. The internal consistency reliability of the composite Z can be obtained once an estimate of the correlation matrix of the Zi, known as the polychoric and polyserial correlation matrix, is available. Such estimates can be obtained via standard SEM packages based, e.g., on Lee, Poon and Bentler (1995) or Muthén (1984). Any of the previously described internal consistency coefficients can then be computed.
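The discretization idea can be illustrated by recovering the cut points on Zi from observed category proportions. This is only a sketch of that single ingredient, assuming hypothetical proportions for a three-category item; it does not estimate polychoric correlations, for which the cited SEM procedures should be used.

```python
from statistics import NormalDist

def thresholds_from_proportions(proportions):
    """Map ordered category proportions to thresholds on a standard normal variable."""
    standard_normal = NormalDist()
    cumulative = 0.0
    thresholds = []
    for share in proportions[:-1]:  # the last category has no upper threshold
        cumulative += share
        thresholds.append(standard_normal.inv_cdf(cumulative))
    return thresholds

# Hypothetical proportions of responses in three ordered categories.
thresholds = thresholds_from_proportions([0.2, 0.5, 0.3])
print([round(value, 3) for value in thresholds])
```

Under the assumed normality of Zi, an observed score falls in a given category exactly when Zi lies between the adjacent thresholds.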
Minor Issues and Conclusion
The decompositions we have used to develop internal consistency coefficients also allow us to address some minor points in Sijtsma’s overview. In his real data example on coping, Sijtsma uses principal components analysis (PCA) to get “factor” loadings. However, nothing in the basic definitions related to reliability summarized above has anything to do with PCA. PCA is not relevant to the model, so whatever PCA tells us is irrelevant unless PCA and FA loading matrices are identical, which would only occur as the number of items becomes arbitrarily large (e.g., Bentler & Kano, 1990). Second, Sijtsma analyzes a correlation matrix. However, our development shows that the properties of the total score X depend on the covariance matrix of the parts, not on their correlation matrix. Reliability coefficients defined for continuous X are not identical to those that might be defined for Z = Z1 + Z2 +…+ Zp whose components are in a z-score or correlation metric. Third, in association with ρglb Sijtsma gives an example of a 3×3 covariance matrix (eq. 4) and draws conclusions related to it. However, this covariance matrix is not full rank, and hence in our view it violates a basic regularity assumption of classical reliability theory, namely, that no variable is measured perfectly without any random error.
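The difference between the covariance and correlation metrics is easy to see in a small case. The sketch below assumes a hypothetical two-part composite with unequal variances and computes α from the covariance matrix and from the corresponding correlation matrix; the helper names are illustrative.

```python
def coefficient_alpha(matrix):
    """Coefficient alpha computed from a covariance or correlation matrix."""
    p = len(matrix)
    total_variance = sum(sum(row) for row in matrix)
    off_diagonal_sum = total_variance - sum(matrix[i][i] for i in range(p))
    return (p / (p - 1)) * off_diagonal_sum / total_variance

def to_correlation(sigma):
    """Rescale a covariance matrix to a correlation matrix."""
    p = len(sigma)
    sd = [sigma[i][i] ** 0.5 for i in range(p)]
    return [[sigma[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

# Hypothetical covariance matrix with variances 4 and 1 (assumed values).
sigma = [[4.0, 1.2], [1.2, 1.0]]

alpha_covariance = coefficient_alpha(sigma)
alpha_correlation = coefficient_alpha(to_correlation(sigma))
print(round(alpha_covariance, 4), round(alpha_correlation, 4))
```

The first value describes the unit-weighted sum of the raw parts and the second the sum of the standardized parts, which are different composites.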
Almost all of the coefficients described in this paper have been available in the EQS program (Bentler, in press) for many years. The ECV measure is not currently computed.
We conclude by making an observation on a typically-overlooked point. Sample covariance matrix based estimates such as α̂ or ρ̂glb and even the model-based coefficients reviewed above are not necessarily lower-bounds to population reliability. Thirty years ago, Woodward and Bentler (1978) developed a probabilistic or statistical lower bound to population reliability. It would seem that the concept of a statistical lower bound remains to be developed for several of the coefficients discussed above.
Footnotes
Research supported in part by grants DA00017 and DA01070 from the National Institute on Drug Abuse. This paper is based in part on Bentler (2003).
¹ A more recent name for the optimization is minimum trace factor analysis or MTFA (e.g., della Riccia & Shapiro, 1982; Shapiro, 1982).
² This optimization problem is also called constrained minimum trace factor analysis (e.g., ten Berge, Snijders, & Zegers, 1981).
References
- Bentler PM. Generalized classical test theory error variance. American Psychologist. 1964;19:548.
- Bentler PM. Alpha-maximized factor analysis (Alphamax): Its relation to alpha and canonical factor analysis. Psychometrika. 1968;33:335–345. doi: 10.1007/BF02289328.
- Bentler PM. A lower-bound method for the dimension-free measurement of internal consistency. Social Science Research. 1972;1:343–357.
- Bentler PM. Should coefficient alpha be replaced by model-based reliability coefficients? Invited paper presented at International Meetings of the Psychometric Society; Cagliari, IT. 2003 Jul.
- Bentler PM. Covariance structure models for maximal reliability of unit-weighted composites. In: Lee S-Y, editor. Handbook of latent variable and related models. Amsterdam: North-Holland; 2007. pp. 1–19.
- Bentler PM. EQS 6 structural equations program manual. Encino, CA: Multivariate Software; (in press) (www.mvsoft.com).
- Bentler PM, Kano Y. On the equivalence of factors and components. Multivariate Behavioral Research. 1990;25:67–74. doi: 10.1207/s15327906mbr2501_8.
- Bentler PM, Woodward JA. Inequalities among lower bounds to reliability: With applications to test construction and factor analysis. Psychometrika. 1980;45:249–267.
- Bollen KA. Issues in the comparative measurement of political democracy. American Sociological Review. 1980;45:370–390.
- della Riccia G, Shapiro A. Minimum rank and minimum trace of covariance matrices. Psychometrika. 1982;47:443–448.
- Heise DR, Bohrnstedt GW. Validity, invalidity, and reliability. In: Borgatta EF, editor. Sociological methodology 1970. San Francisco: Jossey-Bass; 1970. pp. 104–129.
- Jöreskog KG. Statistical analysis of sets of congeneric tests. Psychometrika. 1971;36:109–133.
- Kano Y, Azuma Y. Use of SEM programs to precisely measure scale reliability. In: Yanai H, Okada A, Shigemasu K, Kano Y, Meulman JJ, editors. New developments in psychometrics. Tokyo: Springer-Verlag; 2003. pp. 141–148.
- Lee S-Y, Poon W-Y, Bentler PM. A two-stage estimation of structural equation models with continuous and polytomous variables. British Journal of Mathematical and Statistical Psychology. 1995;48:339–358. doi: 10.1111/j.2044-8317.1995.tb01067.x.
- Li L, Bentler PM. The greatest lower bound to reliability: Corrected and resampling estimators. Paper presented at Symposium on Recent Developments in Latent Variables Modeling; Tokyo: Japanese Statistical Association and Japanese Behaviormetric Society; 2004.
- McDonald RP. Factor analysis and related methods. Hillsdale, NJ: Erlbaum; 1985.
- McDonald RP. Test theory: A unified treatment. Mahwah, NJ: Erlbaum; 1999.
- Muthén B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49:115–132.
- Raykov T. Bias of coefficient α for fixed congeneric measures with correlated errors. Applied Psychological Measurement. 2001;25:69–76.
- Shapiro A. Rank reducibility of a symmetric matrix and sampling theory of minimum trace factor analysis. Psychometrika. 1982;47:187–199.
- Shapiro A, ten Berge JMF. The asymptotic bias of minimum trace factor analysis, with applications to the greatest lower bound to reliability. Psychometrika. 2000;65:413–425.
- Sijtsma K. On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika. (in press). doi: 10.1007/s11336-008-9101-0.
- ten Berge JMF, Snijders TAB, Zegers FE. Computational aspects of the greatest lower bound to reliability and constrained minimum trace factor analysis. Psychometrika. 1981;46:201–213.
- ten Berge JMF, Sočan G. The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality. Psychometrika. 2004;69:613–625.
- Woodhouse B, Jackson PH. Lower bounds for the reliability of a test composed of nonhomogeneous items II: A search procedure to locate the greatest lower bound. Psychometrika. 1977;42:579–591.
- Woodward JA, Bentler PM. A statistical lower-bound to population reliability. Psychological Bulletin. 1978;85:1323–1326.
- Zinbarg RE, Revelle W, Yovel I, Li W. Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternate conceptualizations of reliability. Psychometrika. 2005;70:1–11.
