Abstract
Unit-weight sum scores (UWSSs) are routinely used as estimates of factor scores on the basis of solutions obtained with the nonlinear exploratory factor analysis (EFA) model for ordered-categorical responses. Theoretically, this practice results in a loss of information and accuracy, and is expected to lead to biased estimates. However, the practical relevance of these limitations is far from clear. In this article, we adopt an empirical view and propose indices and procedures (some of them new) for assessing the appropriateness of UWSSs in nonlinear EFA applications. A new automated approach for obtaining UWSSs that maximize fidelity and correlational accuracy is proposed. The appropriateness of UWSSs under different conditions and the behavior of the present proposal in comparison with other more common approaches are assessed with a simulation study. A tutorial for interested practitioners is presented using an illustrative example based on a well-known personality questionnaire. All the procedures proposed in the article have been implemented in a well-known noncommercial EFA program.
Keywords: sum scores, factor score estimates, exploratory item factor analysis, ordered-categorical responses, coefficient of fidelity
Most applications of the exploratory factor analysis (EFA) model to item analysis and test scoring adopt a two-stage approach. In the first stage (calibration), the item parameters are estimated and the goodness of model–data fit is assessed. In the second stage (scoring), the first-stage estimates are taken as fixed and known, and, on this basis, individual scores are obtained for each respondent. The first stage has received the most attention by far (e.g., Fabrigar et al., 1999; Ferrando & Lorenzo-Seva, 2017).
If the main objective of a psychometric EFA application were to obtain an interpretable structure, the emphasis on calibration would be justified. However, most applications of this type ultimately involve some form of individual measurement (Ferrando & Lorenzo-Seva, 2018; McDonald, 2011), which means that the scoring stage also deserves careful attention. The usual scenario we find as reviewers and counselors, however, is not like that: Once the FA structure has been determined and fit considered acceptable, the second stage consists of obtaining sum scores by assigning unit weights (UWs) to the dominant or salient loadings on each factor (e.g., Comrey & Lee, 1992; DiStefano et al., 2009; Dumenci & Achenbach, 2008; Fava & Velicer, 1992; Horn, 1965; ten Berge & Knol, 1985). Although UW is possibly the most widely used scoring procedure in psychometric applications (e.g., Raykov & Marcoulides, 2011), when derived from an EFA solution it is theoretically suboptimal.
So far, the potential limitations of using UW scores instead of theoretically superior estimates have been assessed mainly within the linear FA model, in which the factor score estimates are weighted composites of the item scores. So, when this model is adopted, the basic scoring choice is between an unweighted linear composite and a weighted composite in which the weights are chosen to minimize a specified (usually least-squares-type) quantity and attain certain properties (Beauducel, 2015; Grice, 2001; Horn, 1965; McDonald & Burr, 1967). The limitations of the simpler UW schema are now summarized. In the case of a single-factor solution, information is expected to be lost and scores are expected to be less accurate except under very restrictive solutions (Beauducel & Leue, 2013; Thissen et al., 1983). In the general case of multiple solutions, additional distorting influences are also expected to appear due to the presence of cross-loadings in complex solutions (e.g., Comrey & Lee, 1992; Drasgow & Miller, 1982).
The extent to which the potential distortions and accuracy losses mentioned above are important in practice is still being discussed, and UW sum scores (UWSSs) also have advantages and proponents in linear EFA: They are easy to compute, interpret, relate to previous studies, and use with new data when the goal is individual assessment or prediction (DiStefano et al., 2009; Grice & Harris, 1998). Furthermore, under “weak” calibration conditions (unstable estimates, presence of outliers, complex solutions), they might (a) be even better proxies for the “true” levels than factor score estimates and (b) provide more stable results under cross-validation (Grice & Harris, 1998; Horn, 1965; Morris, 1979; Wackwitz & Horn, 1971; Wainer, 1976). Also, when conditions favor the use of more informative weighted factor score estimates, the correlations between these estimates and UWSSs are typically very high (Fava & Velicer, 1992; Horn, 1965; Horn & Miller, 1966; Schweiker, 1967; Trites & Sells, 1955).
Even though there are still some controversial issues, after 70 years of research focused on scoring under the linear model, the limitations (or possible advantages) of using UWSSs within this framework are already reasonably well known. This is not the case, however, when items are calibrated using the nonlinear model for ordered-categorical responses, which is being used more and more in psychometric applications (see, e.g., Ferrando & Lorenzo-Seva, 2017). As explained below in more detail, in this model, item-factor regressions are nonlinear step functions governed by thresholds, and, consequently, factor score estimates are no longer weighted composites of the item scores. Rather, they cannot be obtained in closed form and have to be estimated by using iterative schemas or quadrature approximations. To sum up, in this modeling, differences between the UWSS and the factor score estimates may (theoretically) be much larger than they are in the linear case. Furthermore, and possibly due to their greater complexity, factor score estimates based on the nonlinear model are rarely used in EFA applications. So, the present state of affairs, on which this research focuses, is that the nonlinear EFA model is increasingly being used but only for item-calibration purposes, while the UWSSs derived from the solution are routinely used in the scoring stage.
Evidence on the consequences of using UWSSs instead of factor score estimates under nonlinear modeling is relatively scarce and has mostly been produced within an item response theory (IRT) framework, which has traditionally focused on unidimensional item sets (e.g., Lord, 1952, 1953, 1980; Thissen & Wainer, 2001). The results again suggest that, in spite of the important differences between simple UWSSs and nonlinear factor score estimates, the correlational agreement between them is very high in most of the conditions considered (Dumenci & Achenbach, 2008; Fan, 1998; Macdonald & Paunonen, 2002). In our opinion, these results are rather incomplete, but they have led some researchers to question the potential benefits of IRT scoring (see Reise & Waller, 2009, for a discussion).
The present article takes up this discussion. Our view is that the appropriateness of UWSSs within the nonlinear FA framework should not be a source of never-ending theoretical debate. Instead, it should be addressed as the empirical issue that it is. All models and scoring schemas are approximations, and the key issue is to decide whether the approximation is good enough for the purposes of the study. In this spirit, the basic aim of our article is to assess whether UWSSs can still be reasonable proxies for factors under certain conditions and/or for certain purposes when scoring is based on nonlinear EFA solutions. As mentioned above, this aim has a clear practical relevance, as practitioners tend to use UWSSs without solid evidence to justify their choice.
More specifically, the present article aims to make a fourfold contribution: methodological, practical, illustrative, and instrumental. On the methodological level, we provide results and measures that allow the practitioner to assess the quality and appropriateness of UWSSs as proxies for the factors in the general case of multiple solutions based on nonlinear EFA. Furthermore, we propose a new automated approach for obtaining UWSSs that maximize the properties of fidelity and correlational accuracy. On the practical and illustrative levels, we provide a tutorial, based on an empirical example in the personality domain, which shows how the contributions of the article are intended to be used in applications. Finally, on the instrumental level, the proposals in the article have been implemented in a widely used noncommercial program for EFA.
Framework and Basic Results and Procedures
Of the various criteria for judging the appropriateness of factor score estimates (Beauducel, 2015; Cattell & Tsujioka, 1964; Grice, 2001; Horn & Miller, 1966; McDonald & Burr, 1967; Morris, 1979; ten Berge & Knol, 1985), we shall consider two for the UWSSs under the present modeling. The first, which we shall call fidelity, is the extent to which each UWSS is a close approximation (in correlational terms) to the factor it intends to measure. The second, which we shall call correlational accuracy, is the extent to which the correlation matrix among the UWSSs is a close approximation to the structural interfactor correlation matrix. The reasons for choosing these two properties are discussed below.
The Unidimensional Case
In the unidimensional case only fidelity is relevant. So, we shall start by discussing this simple scenario. Consider a test made up of n items, each with an ordered response format with c categories scored as 1, 2, …, c, and which attempts to measure a single trait or common factor θ. The underlying-variables approach (UVA; e.g., Edwards & Thurstone, 1952; Muthén, 1984) assumes that (a) for each item, there is an underlying, continuous-unbounded latent variable that generates the observed item categorical score, and (b) this UV is related to θ according to the unidimensional FA model. If we denote by Yij the latent score of individual i in the latent variable that underlies item j, the structural measurement model is
$$Y_{ij} = \alpha_j \theta_i + \varepsilon_{ij} \quad (1)$$
where the αs are the factor loadings, and the εs are the residuals, with zero means, and uncorrelated with θ or between themselves. The distribution and scale for the Ys and θ are indeterminate and we shall follow the common convention of assuming that they are all normally distributed with zero mean and unit variance. The distribution of the residuals εs is also assumed to be normal.
The process that produces the manifest categorical scores from the UVs is assumed to be a step function governed by c − 1 arbitrary thresholds (τ)
$$X_{ij} = k \iff \tau_{j,k-1} \le Y_{ij} < \tau_{j,k}, \quad k = 1, \ldots, c \quad (\tau_{j,0} = -\infty, \; \tau_{j,c} = +\infty) \quad (2)$$
From this modeling, it follows that the product–moment correlation between Yj and Yk is the polychoric correlation between Xj and Xk (tetrachoric in the c = 2 case). Overall, the model summarized can be viewed as an alternative parameterization of the IRT two-parameter normal-ogive model in the c = 2 case, and of Samejima’s (1969) normal-ogive graded response model for graded scores with more than two categories (see Ferrando & Lorenzo-Seva, 2013).
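For readers more familiar with the IRT parameterization, the correspondence in the unidimensional normal-ogive case is the standard one: discrimination a_j = α_j/√(1 − α_j²) and one location b_jk = τ_jk/α_j per threshold. A minimal sketch of this conversion (the function name is ours, for illustration only):

```python
import numpy as np

def fa_to_irt(alpha, tau):
    """Convert UVA-FA item parameters (standardized loading alpha and
    normal-deviate thresholds tau) to normal-ogive IRT parameters."""
    alpha = float(alpha)
    a = alpha / np.sqrt(1.0 - alpha ** 2)      # discrimination
    b = np.asarray(tau, dtype=float) / alpha   # one location per threshold
    return a, b
```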
We now review previous results based on the linear model. To do so, suppose first that the Y variables could be directly observed. If so, the relation between the UWSS and θ would be
$$S_{Y_i} = \sum_{j=1}^{n} Y_{ij} = \left(\sum_{j=1}^{n}\alpha_j\right)\theta_i + \sum_{j=1}^{n}\varepsilon_{ij} \quad (3)$$
So, the structural model for the UWSS would be the same linear FA model that holds for each individual item, with a loading equal to the sum of item loadings, and a residual equal to the sum of the item residual terms. This result implies that the UWSS in this case is an unbiased estimate of a fixed linear function of the common factor (Raykov et al., 2016). Furthermore, it follows from Equation (3) that the model-implied correlation between the UWSS and θ would be
$$\rho(S_Y, \theta) = \frac{\sum_{j=1}^{n}\alpha_j}{\sqrt{\left(\sum_{j=1}^{n}\alpha_j\right)^{2} + \sum_{j=1}^{n}\left(1-\alpha_j^{2}\right)}} \quad (4)$$
Expression (4) has been repeatedly obtained in the literature in different forms (e.g., Cattell & Tsujioka, 1964) and, in particular, is the unidimensional case of the coefficient of fidelity (COF) proposed by Drasgow and Miller (1982). Within the linear FA framework, the COF is a direct index for assessing the extent to which the UWSS is a good proxy for measuring the factor. Furthermore, the right-hand side of Equation (4) clearly explains most of the results repeatedly reported in the literature (Fava & Velicer, 1992; Raykov et al., 2016; Schweiker, 1967; Trites & Sells, 1955): When the linear FA model holds, the accuracy of the UWSS as a proxy for θ increases with the number of items and the signal/noise ratio (i.e., increase in loadings and decrease in the residual variances).
Because of the nonlinear step functions in Equation (2), the simple relations in Equations (3) and (4) do not hold in the nonlinear UVA modeling considered here. However, an ordinal version of the COF intended for this model can be derived as follows. Under the assumptions considered here, the product–moment correlation between the latent variable Yj and θ (which is the standardized loading αj) is the polyserial correlation between the manifest variable Xj and θ. On the other hand, the product–moment correlation between Xj and θ is the corresponding point–polyserial correlation. By using the relation between both coefficients (e.g., Olsson et al., 1982, equation 12), we obtain:
$$\rho(X_j, \theta) = \alpha_j \frac{\sum_{k=1}^{c-1}\varphi(\tau_{jk})}{\sigma_{X_j}} \quad (5)$$
where φ is the density of the standard normal distribution. Using Equation (5), the model-implied correlation between the UWSS and θ in the nonlinear UVA case is
$$\rho(S_X, \theta) = \frac{\sum_{j=1}^{n}\alpha_j \sum_{k=1}^{c-1}\varphi(\tau_{jk})}{\sigma_{S_X}} \quad (6)$$
We shall call ρ(S_X, θ) the ordinal coefficient of fidelity (O-COF). As for reference values for this new index: in the linear case, Drasgow and Miller (1982) proposed .90 as a reasonable cutoff for the COF if UWSSs are to be considered good measures or proxies for the underlying factors. We consider this value to be an appropriate cutoff for the ordinal version in Equation (6) as well.
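To make Equations (4) to (6) concrete, here is a minimal sketch in Python (our own function names, not FACTOR code). The loadings and thresholds are assumed to come from the calibration stage, and the standard deviation of the sum is estimated from the observed data:

```python
import numpy as np
from scipy.stats import norm

def linear_cof(loadings):
    """COF of Equation (4): unit-variance Ys with uncorrelated residuals,
    so each residual variance is 1 - alpha_j**2."""
    a = np.asarray(loadings, dtype=float)
    s = a.sum()
    return s / np.sqrt(s ** 2 + (1.0 - a ** 2).sum())

def ordinal_cof(loadings, thresholds, sd_sum):
    """O-COF of Equation (6): `thresholds` holds one array of normal-deviate
    thresholds per item (Equation 2), and `sd_sum` is the standard deviation
    of the observed unit-weight sum scores."""
    num = sum(a * norm.pdf(np.asarray(tau)).sum()
              for a, tau in zip(loadings, thresholds))
    return num / sd_sum
```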
If the linear-FA-based COF in Equation (4) is compared with the nonlinear-FA-based O-COF, it becomes clear that the latter is expected to reflect additional attenuating influences. In more detail, the magnitude of both coefficients reflects the influence of measurement error, but the O-COF also reflects the influence of nonlinearity, which, when substantial, makes the UWSSs biased estimates of the true factor levels at the ends of the scale (Wright, 1999). So, apart from the general cutoff value proposed above, it is also important to distinguish the amount of attenuation in the O-COF that is due to error from the amount due to nonlinearity. Theoretically, the conditions that lead to strong nonlinearity can be predicted from the calibration results (Ferrando, 2009; Lord, 1952, 1953). Operationally, the amount of attenuation can be estimated as follows. By definition, the correlation ratio expressing the curvilinear relation between the UWSS and θ is (e.g., Lord, 1952)
$$\eta(S_X \mid \theta) = \sqrt{\frac{\operatorname{Var}\left[E(S_X \mid \theta)\right]}{\operatorname{Var}(S_X)}} \quad (7)$$
Furthermore, the square of Equation (7) is also, by definition, the marginal reliability of the UWSS when this sum is viewed as a particular type of factor score estimate (see e.g., Ferrando & Lorenzo-Seva, 2018).
Contrary to what occurs with the O-COF in Equations (5) and (6), no closed-form expression exists in general for computing η in Equation (7). However, it can be approximated with the desired precision by using quadrature, as explained in the Supplemental Appendix (available online). Now, η is always equal to or greater than the product–moment correlation, and equality holds only when the relation is perfectly linear (Kendall & Stuart, 1977). So, the impact of nonlinearity can be inferred from the difference $\eta - \rho(S_X, \theta)$. So as to enhance interpretability, this difference can be further converted to Cohen’s (1988) f2 local effect size measure
$$f^{2} = \frac{\eta^{2} - \rho^{2}(S_X, \theta)}{1 - \eta^{2}} \quad (8)$$
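Although we do not reproduce the Supplemental Appendix here, the quadrature idea can be sketched as follows: the category probabilities implied by Equations (1) and (2) give the conditional mean and variance of S_X at each quadrature node, and η then follows from the usual variance decomposition. This is our own illustrative implementation, not the program code:

```python
import numpy as np
from scipy.stats import norm

def correlation_ratio(loadings, thresholds, n_nodes=61):
    """Approximate eta in Equation (7) by Gauss-Hermite quadrature over
    theta ~ N(0, 1), under the normal-ogive graded model implied by
    Equations (1) and (2)."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    theta = x * np.sqrt(2.0)              # nodes for a standard normal
    w = w / np.sqrt(np.pi)                # weights summing to 1

    cond_mean = np.zeros_like(theta)
    cond_var = np.zeros_like(theta)
    for a, tau in zip(loadings, thresholds):
        scale = np.sqrt(1.0 - a ** 2)     # residual SD of the underlying Y
        z = np.subtract.outer(np.asarray(tau), a * theta) / scale
        cum = np.vstack([np.zeros_like(theta), norm.cdf(z),
                         np.ones_like(theta)])
        probs = np.diff(cum, axis=0)      # P(X_j = k | theta), k = 1..c
        cats = np.arange(1, probs.shape[0] + 1)[:, None]
        m1 = (cats * probs).sum(axis=0)   # E(X_j | theta)
        m2 = (cats ** 2 * probs).sum(axis=0)
        cond_mean += m1
        cond_var += m2 - m1 ** 2          # local independence
    mu = (w * cond_mean).sum()
    var_between = (w * (cond_mean - mu) ** 2).sum()  # Var[E(S_X | theta)]
    var_within = (w * cond_var).sum()                # E[Var(S_X | theta)]
    return np.sqrt(var_between / (var_between + var_within))

# Equation (8) then follows directly:
# f2 = (eta**2 - rho_sx_theta**2) / (1 - eta**2)
```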
The results so far enable an absolute assessment to be made of the UWSS–θ relation. It is also of interest to make a comparative assessment of the information and accuracy that are lost when the UWSSs are used instead of the theoretically superior factor score estimates. The results below are based on maximum likelihood (ML) estimates, but they are expected to hold as a reasonable approximation in the case of other estimates in common use, such as modal (MAP) or expected-a-posteriori (EAP) Bayes scores (see Ferrando & Lorenzo-Seva, 2019, for details).
Let $\hat{\theta}_i$ be the factor-score estimate of individual i, and let $\theta_i$ be the corresponding true factor score. As in Samejima (1977) we can write
$$\hat{\theta}_i = \theta_i + \epsilon_i \quad (9)$$
where $\epsilon_i$ denotes the measurement error. Contrary to what happens in the regression of the UWSS on θ, the regression of $\hat{\theta}$ on θ is now linear, so the product–moment correlation and the correlation ratio have the same value. Furthermore, the squared correlation between $\hat{\theta}$ and θ can be written as (Ferrando & Lorenzo-Seva, 2019; Samejima, 1977)
$$\rho^{2}(\hat{\theta}, \theta) = \frac{\operatorname{Var}(\theta)}{\operatorname{Var}(\theta) + E\left[\sigma^{2}(\epsilon)\right]} \quad (10)$$
which has the same general form as Equation (7). So, with ML factor score estimates, the squared product–moment correlation between $\hat{\theta}$ and θ is the same as the squared correlation ratio and as the marginal reliability of the factor score estimates (Ferrando & Lorenzo-Seva, 2018). The square root of Equation (10) can then be considered as the “ceiling” or maximal correlation that can be attained between any score derived from the FA solution and θ. And the difference between the square roots of Equations (10) and (6) quantifies the information and accuracy lost when the ML factor score estimates are replaced by the simpler UWSSs. This direct difference, however, is a correlation residual, and its relevance depends on the magnitude of the correlations that are subtracted (Cohen, 1988). So, we propose instead to obtain and interpret a transformed difference based on Fisher’s z transformations, that is, $q = z\big(\rho(\hat{\theta}, \theta)\big) - z\big(\rho(S_X, \theta)\big)$. This transformed difference is Cohen’s q effect size measure (Cohen, 1988) and can be interpreted by using existing reference values.
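For instance, the transformed difference takes two lines of code (our own helper; Cohen's conventional benchmarks for q are roughly .10, .30, and .50 for small, medium, and large effects):

```python
import numpy as np

def cohens_q(r_ceiling, r_uwss):
    """Cohen's q: the difference between the ML-score ceiling (Equation 10)
    and the O-COF (Equation 6) on Fisher's z scale."""
    return np.arctanh(r_ceiling) - np.arctanh(r_uwss)
```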
The Multidimensional Case
Consider now a solution with m (possibly) correlated factors:
$$Y_{ij} = \alpha_{j1}\theta_{i1} + \alpha_{j2}\theta_{i2} + \cdots + \alpha_{jm}\theta_{im} + \varepsilon_{ij} \quad (11)$$
Generalization of the fidelity results above is more conveniently handled by using matrix notation. Let Λ(n×m) be the pattern matrix whose elements are the αjk loadings in Equation (11). Let Φ (m×m) be the interfactor correlation matrix, and let S = ΛΦ be the structure matrix whose elements are the correlations between the latent item responses Yj and the factors. Denote by W(n×m) the weight matrix, whose elements are ones (i.e., the item is included in the scale) and zeros (is not included). Finally, let DY be the diagonal matrix whose elements are the standard deviations of the UWSS which would be obtained by applying W directly to the Y variables. For the linear FA case, the Drasgow–Miller COFs would be obtained as the diagonal elements of the following matrix:
$$R_{S_Y\theta} = D_Y^{-1} W' S = D_Y^{-1} W' \Lambda \Phi \quad (12)$$
The ordinal version of these coefficients can be obtained by converting the polyserial correlations in S to point–polyserial terms using Equation (5), and by using a diagonal matrix with the standard deviations of the observed sums. If we denote these matrices by $S_X$ and $D_X$, the ordinal COFs (i.e., the predicted correlations between each UWSS and the factor that it intends to measure) are the diagonal elements of
$$R_{S_X\theta} = D_X^{-1} W' S_X \quad (13)$$
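In code, the diagonal of Equation (13) reduces to a few matrix operations. The sketch below (our own function, not FACTOR code) uses the fact that multiplying a structure element by the sum of threshold densities gives the covariance between the observed item score and the factor (the numerator of the conversion in Equation 5):

```python
import numpy as np
from scipy.stats import norm

def ordinal_cof_matrix(pattern, phi, thresholds, W, sd_sums):
    """Predicted correlations between each UWSS and its target factor
    (diagonal of Equation 13). `pattern` is the n x m loading matrix,
    `phi` the m x m interfactor correlation matrix, `thresholds` a list
    of n arrays of normal-deviate thresholds, `W` the n x m matrix of
    0/+1/-1 weights, and `sd_sums` the m observed SDs of the sums."""
    S = pattern @ phi                      # structure matrix (polyserials)
    k = np.array([norm.pdf(np.asarray(tau)).sum() for tau in thresholds])
    cov_x_theta = S * k[:, None]           # Cov(X_j, theta_s), cf. Equation (5)
    R = np.diag(1.0 / np.asarray(sd_sums)) @ W.T @ cov_x_theta  # Equation (13)
    return np.diag(R)
```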
As for the other results considered above, correlation ratios can also be obtained in the multidimensional case, as detailed in the Supplemental Appendix (available online). However, for computational simplicity, at present they have been implemented only in the orthogonal case. As for the factor score estimates, the results in Equations (9) and (10) hold for any of the m factors in the multidimensional solution.
We turn now to correlational accuracy. The correlation matrix among the UWSSs can be obtained empirically, and we shall denote this matrix as $R_{S_X S_X}$. The extent to which these empirical correlations approach the corresponding “true” interfactor correlations can be assessed by comparing the nondiagonal elements of $R_{S_X S_X}$ with the corresponding elements of Φ in terms of absolute differences. The comparison approach we propose is as follows. First, obtain the residual matrix:
$$E_{S_X S_X} = R_{S_X S_X} - \Phi \quad (14)$$
As a global, descriptive index of discrepancy, the root mean square of the residuals (RMSR) based on the nonduplicated off-diagonal elements of ESXSX is a familiar and convenient measure.
Inspection of the individual residuals is also important, and we again propose transforming them by using Fisher’s z transformations to enhance interpretability, that is, $ze_{s,t} = z(r_{X_s X_t}) - z(\phi_{st})$. Furthermore, by considering the structural interfactor correlation as fixed and known, the transformed residual can be further converted into a standardized residual by using
$$z_{s,t} = \frac{z(r_{X_s X_t}) - z(\phi_{st})}{\sqrt{1/(N-3)}} \quad (15)$$
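A short sketch of these accuracy checks (Equations 14 and 15), with Φ taken as fixed and known from the calibration stage and our own function name:

```python
import numpy as np

def correlational_accuracy(R_ss, phi, N):
    """Residual matrix (Equation 14), overall RMSR over the nonduplicated
    off-diagonal elements, and Fisher-z standardized residuals (Equation 15)."""
    E = R_ss - phi
    iu = np.triu_indices_from(E, k=1)         # nonduplicated off-diagonal
    rmsr = np.sqrt(np.mean(E[iu] ** 2))
    z_std = (np.arctanh(R_ss[iu]) - np.arctanh(phi[iu])) * np.sqrt(N - 3)
    return E, rmsr, z_std
```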
Some Substantive and Practical Considerations
In our experience, individual score estimates are derived from FA solutions in practice for one (or both) of the two basic purposes described below. The first is individual assessment (counseling, ranking individuals, creating percentile tables, etc.). The second is to use the score estimates in subsequent analyses in validity assessments, to assess or predict relations with other relevant variables. Now, if UWSSs are to be used for the first purpose, then high degrees of fidelity and a small impact of nonlinearity are clearly the major properties that need to be assessed. The second purpose also requires a high degree of correlational accuracy: If the correlations among the UWSSs substantially depart from the correlations implied by the factor solution, then biased and distorted estimates of the regression weights are expected. However, if the correlation matrix between the score estimates is very close to that of the latent variables and fidelity is high enough (i.e., the attenuation due to measurement error is small), these scores are expected to lead to essentially unbiased estimates of the regression weights when used in subsequent analyses (e.g., Skrondal & Laake, 2001).
We clearly acknowledge that the two basic properties considered here are necessary, even though they may not be sufficient, for more complex purposes, such as comparing individual scores, assessing changes over time, or making decisions based on cutoff points (e.g., Reise & Waller, 2009). We shall not discuss these issues any further here; our discussion so far has to be regarded only as a basic first step.
As a general, final caveat, the appropriateness of different scoring schemas derived from an EFA solution in terms of validity, generalizability, or stability depends largely on the strength and stability of the solution itself, and we concur with Fava and Velicer (1992) that weak and unstable solutions do not provide an adequate basis for establishing a scoring system. So, scoring only makes sense if the solution obtained in the calibration stage fits well, and is strong and replicable.
DIANA: A New Proposal for Achieving Improved UWSSs
Procedures for obtaining UWSSs that maximize fidelity, correlational accuracy, or both have been described sporadically in the EFA literature within the linear framework (e.g., Cattell & Tsujioka, 1964; Gorsuch, 1997). As mentioned above, by far the most common procedures are variants in which the pattern matrix or the structure matrix are inspected in order to optimize the 1-0 elements of the resulting W matrix (Comrey & Lee, 1992; Fava & Velicer, 1992; Horn, 1965; Wackwitz & Horn, 1971). We shall denote this approach generally as Unit-loading. An alternative approach, which we shall denote as Unit-regression, is to obtain W by directly inspecting the scoring weights (Grice & Harris, 1998; ten Berge & Knol, 1985). In principle, both approaches have a common limitation: The researcher needs to define a threshold value to decide when an element of the pattern/structure matrix (in the case of Unit-loading) or of the scoring weight (in the case of Unit-regression) is salient. As for performance, Grice and Harris (1998) compared both approaches in a simulation study using a threshold value of .30 and concluded that Unit-regression produced the best outcomes.
To the best of our knowledge, no optimizing procedures of the type discussed above have been developed for multiple EFA solutions based on the nonlinear UVA model considered here. We also note that the procedure that appeared to work best in the linear case, Unit-regression, is not directly applicable in the present modeling. Given this scenario, we wish to propose a new approach that is specifically designed for (multiple) nonlinear UVA–EFA solutions. To some extent, our proposal is related to previous unit-regression procedures. However, it is fully automated, and so does not require an arbitrary threshold value to be defined.
Let X (N×n) be the matrix of observed item scores, and let $\hat{\theta}_{is}$ be the factor score estimate of individual i in factor s. In our proposal, the factor score estimates are obtained from an improved EAP estimation approach for oblique factor models known as ORION (Ferrando & Lorenzo-Seva, 2016). ORION takes into account the interfactor correlation matrix obtained at the calibration stage, and produces factor score estimates whose correlations closely approach those in Φ (thus maximizing correlational accuracy).
We now move on to determine which items are to be added in the computation of the UWSSs. The process assesses a regression model and, in particular, which items in X are good predictors of $\hat{\theta}_s$. This assessment is not simple because the columns of X (i.e., the predictors) are collinear (strongly so in measures with high internal consistency). Now, in the context of multiple linear regression with collinear predictors, it is usual to assess which predictors are to be included in the regression model on the basis of their relative importance (see Lorenzo-Seva et al., 2010). Johnson’s (2000) relative weights measure the proportional contribution that each predictor makes to the relative importance after correcting for the effects of the intercorrelations among predictors. If a particular predictor is not related to the criterion, or if the variance of the criterion accounted for by this predictor is redundant (i.e., it is already accounted for by other predictors in the model), then the relative weight for this predictor is zero. In an ideal model in which the n predictors are strongly related to the criterion, and each predictor accounts for an amount of variance that is not accounted for by any other predictor in the model, the relative weight for any predictor in the model is 100/n. From this result, it follows that 100/n is an objective threshold value that can be used to decide which predictors are to be included in the regression model.
Overall, then, to decide which of the n items are to be added, we propose computing Johnson’s relative weights when predicting $\hat{\theta}_s$ from the columns of X. If the relative weight of an item is lower than 100/n, then the scores on this item are not used to compute the UWSS, and the corresponding element in the W matrix is set to zero; otherwise, the value is set to 1 (or −1, if the loading of the item on factor s is negative).
Once the W matrix is available, individual scores on factor s are obtained directly by simply adding the responses to the items that have a unit value in column s of W. Please note that, if the value is −1, then the scores on the corresponding item must be reversed before they are added. As the researcher does not need to propose which items are to be added to obtain scores on factor s, we name our approach Direct Item Addition of Non-Ahead proposed sets (DIANA).
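The complete rule can be sketched as follows. The first function follows Johnson's (2000) heuristic, and the second applies the 100/n cutoff; the ORION factor score estimates (theta_hat below) are assumed to be available from the calibration stage, and the function names are ours, not FACTOR's:

```python
import numpy as np

def johnson_relative_weights(Rxx, rxy):
    """Johnson's (2000) relative weights, expressed as percentages of the
    explained variance, from the predictor correlation matrix Rxx and the
    predictor-criterion correlations rxy."""
    vals, vecs = np.linalg.eigh(Rxx)
    lam = vecs @ np.diag(np.sqrt(vals)) @ vecs.T   # Rxx = lam @ lam.T
    beta = np.linalg.solve(lam, rxy)               # regression on orthogonal counterparts
    raw = (lam ** 2) @ (beta ** 2)                 # raw weights; they sum to R^2
    return 100.0 * raw / raw.sum()

def diana_weight_matrix(X, theta_hat, pattern):
    """Build W: item j scores factor s (weight +1 or -1, sign taken from its
    loading) when its relative weight for predicting the factor score
    estimate of factor s exceeds the objective 100/n threshold."""
    pattern = np.asarray(pattern, dtype=float)
    n = X.shape[1]
    Rxx = np.corrcoef(X, rowvar=False)
    W = np.zeros_like(pattern)
    for s in range(theta_hat.shape[1]):
        rxy = np.array([np.corrcoef(X[:, j], theta_hat[:, s])[0, 1]
                        for j in range(n)])
        keep = johnson_relative_weights(Rxx, rxy) > 100.0 / n
        W[keep, s] = np.sign(pattern[keep, s])
    return W
```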
In summary, DIANA aims to maximize both the fidelity and the correlational accuracy of UWSSs in the context of multiple UVA–EFA solutions intended for ordered-categorical responses. Fidelity is maximized because the sums obtained aim to be as close as possible to the factor score estimates used as the criterion (which, in turn, are the score estimates with maximal fidelity). Correlational accuracy is obtained because the ORION estimates chosen as the criterion have the property of maximizing correlational accuracy. Whether our proposal works as intended is tested in the simulation studies in the following section.
Main Simulation Study
The main simulation study was intended to assess the functioning of DIANA under various scenarios of communality level, interfactor correlation, and response extremeness. In general terms, the design attempted to mimic the conditions expected in empirical applications, and so provide realistic choices. We specified three population models, each with a different level of communality (low, wide, high). The models were taken from the population loading matrices proposed by MacCallum et al. (2001), in which there were 20 measured variables (n = 20) and three factors (m = 3). Two scenarios were considered for the interfactor correlations in the factor model at the population level: orthogonal (i.e., the three factors correlate zero) and oblique (i.e., the three factors correlate between .20 and .50). The continuous simulated responses were categorized into a 5-point response format under four conditions of response extremeness:
None: The thresholds used to categorize data were [.05, .26, .74, .95].
Low: The thresholds used to categorize data were [.05, .15, .35, .65].
Important: The thresholds used to categorize data were [.03, .09, .22, .52].
Severe: The thresholds used to categorize data were [.05, .10, .15, .25].
Data were generated as follows. First, a sample (N = 1000) of true factor scores was drawn from the normal distribution, taking into account the number of factors and the interfactor correlations in the population. The item responses were then obtained using the standard EFA model with the population loading matrix, and were categorized using the corresponding thresholds in order to obtain the expected response extremeness for the condition at hand (these generation steps are sketched in code after the list below). Once the simulated item responses were available, in order to obtain estimates of the factor model, the polychoric correlation matrix of the simulated sample responses was factor analyzed using the unweighted least squares (ULS) method, and the solution was transformed using Promin rotation (Lorenzo-Seva, 1999). Three types of score were computed:
Unit-loading scores: In order to decide whether a loading was salient, we used the .30 threshold in absolute value. To assess the observed fidelity of these scores, we computed (a) the correlation between the Unit-loading scores and the true factor scores (i.e., the true O-COF), and (b) the predicted O-COF in our proposal (Equation 13). When data were simulated on the basis of the orthogonal population scenario, we also computed the correlation ratio η. In order to assess correlational accuracy, the RMSR based on Equation (14) was obtained.
DIANA scores: True O-COF, predicted O-COF, correlation ratio (in the orthogonal case), and RMSR were obtained in the same way as described above.
PHI-Information Oblique EAP scores (ORION scores) were also computed for the purpose of comparison. To assess fidelity, the correlation between ORION scores and true scores was computed. Also, RMSR between the ORION-based correlation matrix and the population interfactor matrix was computed.
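The data-generation steps can be sketched as follows, assuming (as the bracketed values above suggest) that the thresholds are specified as cumulative population proportions converted to normal deviates; the calibration steps (polychorics, ULS, Promin) are omitted:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def simulate_categorical(pattern, phi, cum_props, N=1000):
    """Draw true factor scores, build the underlying continuous responses
    from the EFA model (unit-variance Ys), and categorize them into the
    1..5 ordered format."""
    n, m = pattern.shape
    theta = rng.multivariate_normal(np.zeros(m), phi, size=N)  # true scores
    resid_sd = np.sqrt(1.0 - np.diag(pattern @ phi @ pattern.T))
    Y = theta @ pattern.T + rng.standard_normal((N, n)) * resid_sd
    tau = norm.ppf(cum_props)          # cumulative proportions -> deviates
    X = np.digitize(Y, tau) + 1        # categories 1..5
    return X, theta
```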
To summarize, the study was based on a 3 × 2 × 4 design with 100 replications per condition. The independent variables were (a) communality: low (item communalities between .20 and .40, with an average of .32), wide (item communalities between .20 and .80, with an average of .49), and high (item communalities between .60 and .80, with an average of .69); (b) interfactor correlations: orthogonal and oblique; and (c) response extremeness: none, low, important, and severe.
Table 1 shows the means and standard deviations for the fidelity results obtained with Unit-loading scores, DIANA scores, and ORION scores. In order to assess the size of the effects obtained, analyses of variance were carried out with the IBM SPSS Statistics Version 20 program. Cohen (1988) suggested the following threshold values for η2 effect sizes: .02 for small effects, .13 for medium effects, and .26 or more for large effects. The results can be summarized as follows. First, as expected, the O-COF proposed in this article (Equation 13) closely matches the “true” O-COF in this simulated scenario in which the true factor scores are known. Second, the DIANA UWSSs outperform the Unit-loading UWSSs in all cases, and their fidelity, as expected, falls between that of the Unit-loading scores and that of the factor score estimates. Third, the fidelity of the UWSSs (both types) increases when communality is high and item distributions approach symmetry.
Table 1. Fidelity Results: Means (and Standard Deviations) for Unit-Loading Scores, DIANA Scores, and ORION Scores.

| | Unit-loading: True O-COF | Unit-loading: Predicted O-COF | DIANA: True O-COF | DIANA: Predicted O-COF | ORION |
|---|---|---|---|---|---|
| Overall | .793 (.063) | .794 (.063) | .831 (.071) | .830 (.073) | .853 (.072) |
| Communality level | | | | | |
| Low | .746 (.051) | .749 (.051) | .770 (.048) | .767 (.049) | .789 (.042) |
| Wide | .827 (.049) | .829 (.049) | .842 (.055) | .841 (.056) | .864 (.064) |
| High | .804 (.058) | .805 (.058) | .882 (.058) | .883 (.059) | .906 (.052) |
| Interfactor correlations | | | | | |
| Orthogonal | .768 (.060) | .769 (.060) | .821 (.074) | .819 (.076) | .857 (.070) |
| Oblique | .817 (.055) | .818 (.056) | .842 (.067) | .841 (.068) | .849 (.074) |
| Response extremeness | | | | | |
| None | .836 (.048) | .837 (.047) | .880 (.058) | .879 (.060) | .886 (.064) |
| Low | .820 (.044) | .820 (.045) | .862 (.052) | .861 (.053) | .878 (.057) |
| Important | .796 (.043) | .798 (.045) | .836 (.048) | .836 (.051) | .861 (.061) |
| Severe | .718 (.039) | .719 (.040) | .748 (.040) | .745 (.043) | .788 (.062) |
Note. Entries in boldface are the means of a manipulation in which the differences are significant and the effect sizes are large (i.e., η2 larger than .26). Entries in italics are the means of a manipulation in which the differences are significant and the effect sizes are medium (i.e., η2 larger than .13). DIANA = Direct Item Addition of Non-Ahead proposed sets; O-COF = ordinal coefficient of fidelity.
Table 2 shows the linear and nonlinear correlations between the two types of UWSSs and the true factor scores in the orthogonal case. Apart from the superior performance of DIANA, the results agree closely with theoretical expectations (Ferrando, 2009; Lord, 1952). First, the correlation ratios are systematically higher than the linear COFs, as they should be. Second, the closeness between both correlations depends on both communality level and item extremeness. When communality is low and the item distributions are almost symmetrical, both coefficients provide very similar results. On the other hand, the correlation ratio increasingly outperforms the O-COF in conditions of high communality and extremeness, which is only to be expected given that in these conditions the item-factor regressions become more and more nonlinear.
Table 2. Linear (O-COF) and Nonlinear (Correlation Ratio) Correlations Between the UWSSs and the True Factor Scores in the Orthogonal Case: Means (and Standard Deviations).

| | Unit-loading: O-COF | Unit-loading: Correlation ratio | DIANA: O-COF | DIANA: Correlation ratio |
|---|---|---|---|---|
| Overall | .768 (.060) | .798 (.052) | .821 (.074) | .851 (.072) |
| Communality level | | | | |
| Low | .715 (.039) | .731 (.027) | .752 (.044) | .762 (.031) |
| Wide | .811 (.044) | .840 (.019) | .831 (.055) | .864 (.031) |
| High | .778 (.050) | .823 (.012) | .880 (.056) | .926 (.013) |
| Response extremeness | | | | |
| None | .810 (.044) | .808 (.045) | .867 (.064) | .861 (.067) |
| Low | .795 (.042) | .810 (.045) | .851 (.056) | .864 (.063) |
| Important | .772 (.040) | .805 (.047) | .826 (.053) | .859 (.066) |
| Severe | .695 (.036) | .770 (.058) | .739 (.046) | .819 (.083) |
Note. DIANA = Direct Item Addition of Non-Ahead proposed sets; O-COF = ordinal coefficient of fidelity.
We turn finally to the correlational accuracy results in Table 3. They are quite clear: DIANA outperforms the Unit-loading scores in terms of correlational accuracy in all the conditions, producing acceptably low RMSRs in all cases. Note also that the Unit-loading scores perform particularly badly in the case of orthogonal solutions.
Table 3. Correlational Accuracy: RMSRs Between the UWSS Correlation Matrix and the Interfactor Correlation Matrix, Means (and Standard Deviations).

| | Unit-loading | DIANA |
|---|---|---|
| Overall | .443 (.097) | .199 (.048) |
| Communality level | | |
| Low | .431 (.089) | .207 (.052) |
| Wide | .393 (.069) | .222 (.042) |
| High | .505 (.088) | .168 (.030) |
| Interfactor correlations | | |
| Orthogonal | .521 (.056) | .213 (.050) |
| Oblique | .366 (.053) | .185 (.041) |
| Response extremeness | | |
| None | .454 (.091) | .208 (.043) |
| Low | .450 (.091) | .207 (.044) |
| Important | .445 (.093) | .203 (.046) |
| Severe | .424 (.101) | .177 (.051) |
Note. Entries in boldface are the means of a manipulation in which the differences are significant and the effect sizes are large (i.e., η2 larger than .26). Entries in italics are the means of a manipulation in which the differences are significant and the effect sizes are medium (i.e., η2 larger than .13). DIANA = Direct Item Addition of Non-Ahead proposed sets.
Follow-Up Study: Stability Under Cross-Validation and Bias in Prediction
The follow-up study aims to address two issues discussed above but not considered in the main study. The first is the stability of DIANA scores under cross-validation. The second is the possible bias in the standardized regression weights (which are structural parameters) when these parameters are estimated by empirically regressing external variables on the score estimates.
The second study considered an oblique three-factor model with interfactor correlations ranging from .20 to .50, in which the responses followed the no-extremeness condition. The manipulated conditions were the number of indicators per factor (32 or 64) and the sample size (400 or 800). The communalities of the generated data ranged between .20 and .80 (i.e., they correspond to the wide condition in the previous simulation study). The data generation procedures were the same as in the main study. As for the scoring schemas, we compared the performance of the DIANA scores with the improved-EAP ORION scores, which were taken as a reference benchmark.
The stability of the scores under cross-validation was assessed by using the following double cross-validation design. The total sample was randomly split into two halves, and in each half individual scores were obtained on the basis of (a) the calibration results obtained in the same half and (b) the calibration results obtained in the other half. Next, in each half, the two sets of resulting scores were correlated, and the average of the correlations obtained in both halves was computed and used as the dependent variable in this part of the study. The coefficients of fidelity for each scoring schema were also reported.
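In sketch form (our own pseudocode-style helper; `calibrate` and `score` stand in for the actual calibration and scoring routines, and a single factor is assumed for simplicity):

```python
import numpy as np

def double_cross_validation(X, calibrate, score, rng):
    """Split the sample in halves, score each half with its own and with
    the other half's calibration results, and average the within-half
    correlations between the two sets of scores."""
    N = X.shape[0]
    idx = rng.permutation(N)
    halves = [X[idx[:N // 2]], X[idx[N // 2:]]]
    params = [calibrate(h) for h in halves]
    stab = []
    for h, own, other in [(halves[0], params[0], params[1]),
                          (halves[1], params[1], params[0])]:
        s_own, s_other = score(h, own), score(h, other)
        stab.append(np.corrcoef(s_own, s_other)[0, 1])
    return float(np.mean(stab))
```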
A continuous variable with a standard normal distribution and a structural correlation of 0.60 with each of the three factors was generated to be used as the external variable or criterion in the predictive part of the study. The dependent variables in this case were (a) the estimated correlations between each of the scores and the criterion, and (b) the difference between each estimated regression (beta) weight and the corresponding structural parameter.
Table 4 shows the results of stability under cross-validation. In general terms, the results match expectations: Overall, DIANA scores have slightly less fidelity than the “ceiling” ORION factor score estimates, but, on the other hand, are slightly more stable. We note, however, that stability is quite high for both methods and in all conditions. As for the details, the DIANA results are slightly more variable than the ORION results.
Table 4. Stability Under Cross-Validation: Means (and Standard Deviations).

| | DIANA: O-COF | DIANA: Stability | ORION: Fidelity | ORION: Stability |
|---|---|---|---|---|
| Overall | .969 (.006) | .992 (.007) | .978 (.008) | .990 (.005) |
| N = 400 | .965 (.005) | .990 (.008) | .979 (.007) | .988 (.005) |
| N = 800 | .972 (.005) | .994 (.005) | .980 (.006) | .992 (.003) |
| m/f = 32 | .953 (.004) | .989 (.009) | .962 (.003) | .992 (.004) |
| m/f = 64 | .970 (.004) | .992 (.006) | .982 (.001) | .990 (.005) |
Note. DIANA = Direct Item Addition of Non-Ahead proposed sets; O-COF = ordinal coefficient of fidelity.
The results on the regression bias are in Table 5. Again ORION-based results can be taken as a reference, as the regression weights based on Bayes factor score estimates are expected to lead to the least amount of bias (Skrondal & Laake, 2001). Results again match expectations: The ORION-based weights are the least biased, but the DIANA-based biases are remarkably small, and approach those of ORION as the sample and model size increase. This result could be anticipated given the high degrees of correlational accuracy found for the DIANA scores in the main study. We also note that, when estimating the correlations, the DIANA-based biases are smaller than those produced by ORION (again an expected result: Bayes score estimates are expected to produce unbiased estimates of the regression weights but biased estimates of the correlations). Finally, it is interesting to observe the pattern of relations in Table 5. DIANA tends to overestimate the correlations and underestimate the regression weights, and ORION tends to produce the opposite pattern of biases.
Table 5. Correlation Estimates, Regression (Beta) Estimates, and Biases: Means (and Standard Deviations).

| | DIANA: Correlation | DIANA: Beta | DIANA: Bias | ORION: Correlation | ORION: Beta | ORION: Bias |
|---|---|---|---|---|---|---|
| Overall | .617 (.015) | .384 (.009) | −.015 (.006) | .579 (.020) | .412 (.005) | .013 (.009) |
| Sample size = 400 | .620 (.015) | .385 (.011) | −.017 (.006) | .582 (.021) | .413 (.005) | .011 (.009) |
| Sample size = 800 | .615 (.015) | .383 (.008) | −.014 (.006) | .576 (.019) | .411 (.004) | .014 (.008) |
| Variables = 32 | .632 (.009) | .394 (.005) | −.020 (.005) | .603 (.005) | .417 (.003) | .003 (.003) |
| Variables = 64 | .607 (.008) | .377 (.004) | −.012 (.004) | .563 (.004) | .409 (.003) | .019 (.003) |
Note. DIANA = Direct Item Addition of Non-Ahead proposed sets.
Implementation
The proposals made here have been implemented and tested in an experimental version of FACTOR (Ferrando & Lorenzo-Seva, 2017), a well-known, free exploratory factor analysis program, and are now available at http://www.psicologia.urv.cat/ca/utilitats/ in the 10.11.01 release of the program. To obtain them in the output, users must select the option Direct Item Addition of Non-Ahead proposed sets (DIANA) on the menu Configure advanced indices related to the factor model. The results are printed under the section PARTICIPANTS’ SCORES ON FACTORS: DIANA SCORES, and include the items that are added to compute the scores on each factor and the items that must be reversed. In addition, the COF (in the linear model), the O-COF, and the correlation ratios (in orthogonal factor models) are also reported. Finally, stability coefficients obtained by using the double cross-validation approach of the follow-up study are also included in the program output.
A Tutorial With an Illustrative Example
We use as a running example the reanalysis of the Spanish version of the Five-Factor Personality Inventory (FFPI; Rodríguez-Fornells et al., 2001), an adaptation of a Dutch questionnaire developed to assess the Big-Five model of personality (Hendriks et al., 1999). The FFPI consists of 100 brief and behaviorally simple descriptive item stems written in the third person singular (e.g., takes risks, avoids company). Each dimension is measured by 20 balanced items, and responses are provided on a 5-point scale running from not at all applicable to entirely applicable. For simplicity and didactic purposes, only 40 items are used to measure two general dimensions. The first 20 items are designed to measure a factor of Emotional Stability (ES) and the last 20 to measure Agreeableness (AG). Theoretically, both factors are independent of each other, so Φ is expected to be a diagonal matrix in the population. Respondents were 567 undergraduate college students (480 women and 87 men, mean age 19.3 years) from a Spanish university, who participated voluntarily.
In our reanalysis, items were calibrated by fitting the bidimensional UVA–EFA factor model with robust unweighted least squares estimation as implemented in FACTOR. An orthogonal Procrustes transformation was then performed on the direct solution, based on a semispecified pattern (Browne, 1972) that was built according to the theoretical structure above. Goodness-of-fit results were based on the second-order (mean and variance) corrected chi-square statistic proposed by Asparouhov and Muthén (2010) and were considered acceptable (root mean square error of approximation [RMSEA] = 0.06, comparative fit index [CFI] = 0.94, goodness-of-fit index [GFI] = 0.93). The overall congruence between the proposed target and the rotated solution was 0.92, which suggests that the solution agrees well, but not exactly, with its theoretical structure (see Lorenzo-Seva & ten Berge, 2006).
As recommended above, the strength, stability, and replicability of the solution were first assessed with the multidimensional extensions of Hancock and Mueller’s (2001) H index proposed by Ferrando and Lorenzo-Seva (2018) and provided in the FACTOR output. The G-H estimates in our solution were 0.91 (ES) and 0.93 (AG), both well above the 0.80 cutoff proposed by Ferrando and Lorenzo-Seva (2018). So, the solution obtained can be considered to provide a good basis for deriving valid individual score estimates.
We turn now to the scoring results. For didactic purposes, we considered the three scoring schemas used in the simulation. The first schema was the Unit-loading schema based on a 0.30 threshold, in which loadings greater than 0.30 in absolute value were converted to unit weights with the same sign as the loading (i.e., −1 or +1) and the remaining loadings were assigned a zero weight (UL-UWSS). The second schema was the DIANA scores computed with the program, and the third was the EAP ORION scores, also obtained from FACTOR.
Table 6 shows the fidelity results for the schemas above. We shall summarize the results in four points. First, for both factors, the fidelity values of the UWSSs are all far greater than the 0.90 criterion originally proposed by Drasgow and Miller (1982). Second, the differences between the O-COF estimates and the estimated correlations between the EAP factor score estimates and the “true” factors are small, and the q effect-size transformations are 0.11 (ES) and 0.26 (AG), which would again qualify as small (Cohen, 1988). Third, the impact of nonlinearity also seems to be small in all cases and, in terms of the f2 local effect size (Equation 8), the largest difference is 0.02, which would be small according to Cohen (1988). Finally, the differences between the COFs of UL-UWSS and the COFs of DIANA are virtually null, which is not unexpected given that there is virtually no room for improvement in terms of fidelity if the ORION-based results are taken as a ceiling.
Table 6. Fidelity Results for the Illustrative Example.

| Factors | Unit-loading | DIANA | Correlation ratio: Unit-loading | Correlation ratio: DIANA | ORION |
|---|---|---|---|---|---|
| F1-ES | .954 | .954 | .960 | .960 | .960 |
| F2-AG | .955 | .954 | .961 | .955 | .971 |
Note. DIANA = Direct Item Addition of Non-Ahead proposed sets.
We turn now to the results on correlational accuracy. In this example, Φ was diagonal, so the empirical interfactor correlation matrix is already the residual matrix in Equation (14). Furthermore, with only two factors, there is a single interfactor correlation, so the RMSR is the absolute value of this correlation. The values of this correlation were 0.129 (UL-UWSS), −0.003 (DIANA), and −0.002 (ORION). So, it seems that it is the second property that determines the differences in how the various procedures function. The standardized value (Equation 15) for the UL-UWSS-based result is 3.08, quite large for a normal deviate. In contrast, the residuals obtained with the other two types of score are virtually zero.
The stability of the various scoring procedures was finally assessed by using the same schema as in the follow-up study. The results for the different procedures were 0.986 (ES) and 0.993 (AG) for the UL-UWSSs, 0.972 (ES) and 0.997 (AG) for DIANA, and 0.939 (ES) and 0.963 (AG) for ORION.
Overall, the results can be summarized as follows. In the single-sample analyses, as expected, the factor score estimates (ORION in this case) perform best in terms of both fidelity and correlational accuracy. However, DIANA performs almost as well on both properties, and the impacts of the larger measurement error and nonlinearity are almost negligible in this case. Furthermore, the DIANA scores are more stable under cross-validation than the factor score estimates, a result that agrees with those of the follow-up study. In fact, either of the two UWSS procedures leads to noticeably more stable results than the factor score estimates. So, for the purposes discussed above (clinical assessment, construct validation, prediction), DIANA-based UWSSs could be used with some guarantee of success. Finally, the UL-UWSSs perform very well in terms of fidelity, show only a slight impact of nonlinearity, and are quite stable under cross-validation. However, in terms of correlational accuracy they perform substantially worse than either of the alternatives, so they would not be recommended for validation and prediction purposes.
Discussion
The scoring of measures calibrated with the FA model can be seen as a trade-off between simplicity and information/accuracy. On one hand, UWSSs are the simplest scores and the easiest to work with, but they use incomplete information from the calibration results and are expected to be less accurate. On the other hand, factor score estimates are more complex but use most of the calibration information, and so are expected to be the most accurate. Furthermore, when calibration is based on the UVA–EFA, which is the situation discussed in this article, the theoretical gap between both extremes gets wider: Factor score estimates are no longer weighted composites of the item scores, and UWSSs are now potentially affected by additional distorting influences (mainly nonlinearity).
This article has dealt with the appropriateness of UWSSs when the ordered-categorical items are calibrated with UVA–EFA. We have taken a practical view and assessed the extent to which these simple scores can still be reasonably good proxies for the factors identified in the calibration stage and be validly used for individual measurement. In order to assess appropriateness, we have considered two properties: fidelity and correlational accuracy. From this starting point, we first proposed operational measures for assessing them in the UVA–EFA case (mainly O-COF). Then we went on to propose a new analytical approach: DIANA, aimed at obtaining UWSSs that maximize both properties.
In general, the results of both the simulation and the empirical studies have been encouraging: DIANA-based scores systematically outperformed UWSSs based on existing approaches and, in many conditions, performed quite close to the factor score estimates that were taken as “ceiling” references. As expected from theory (Ferrando, 2009), nonlinearity only produced clear distortions when items were both extreme and highly discriminating (i.e., had high communalities). Furthermore, DIANA-UWSSs were generally more stable under cross-validation than factor score estimates in the conditions considered here. In summary, then, in well-defined solutions, simple UWSSs are expected to be reasonable proxies for the factors in many conditions found in practice, so they can be validly used at least for the most basic applications of individual assessment.
Like any proposal of this type, ours has its share of limitations and points that deserve further study. The most basic limitation is that DIANA is an analytical, (partly) data-driven approach, so it is unavoidably affected by capitalization on chance. So, in spite of the favorable results obtained in terms of stability in both the follow-up simulation and the empirical study, we recommend that the procedure be used only when the solution obtained in the calibration stage is strong and replicable, and, even in this case, we strongly recommend inspecting the cross-validation results provided in the output of the program.
As for extensions and the need for further studies, the generality of our initial results must clearly be confirmed, and the conditions in which UWSSs perform appropriately, as well as the purposes for which they can be used, must be determined. More specifically, our proposal is single-group and internal. As for the first point, how UWSSs would perform in multiple-group assessments (including invariance studies) requires further, careful assessment (e.g., von Davier, 2010). As for the second point, the “internal” nature of the proposal (see Morris, 1979) means that we have only been concerned with the closeness between the scores and the factors they intend to measure. However, it would also be of interest to consider the relations between the scores and relevant external variables as additional information for judging their appropriateness. The follow-up simulation study is a first step in this direction.
In spite of its limitations and the fact that it is an initial proposal, we believe that the article makes many original contributions that will be of clear interest for the practitioner. It provides an auxiliary source of information for judging whether the routine use of UWSSs is defensible in the study at hand, and, if it is, for obtaining the “best” UWSSs allowed by the calibration results. Furthermore, our proposal is simple and feasible, and its implementation in a free, well-known, and user-friendly program makes it more likely that it will be used in practice.
Supplemental Material
Supplemental material, appendix for The Appropriateness of Sum Scores as Estimates of Factor Scores in the Multiple Factor Analysis of Ordered-Categorical Responses by Pere J. Ferrando and Urbano Lorenzo-Seva in Educational and Psychological Measurement
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project has been made possible by the support of Ministerio de Economía, Industria y Competitividad, the Agencia Estatal de Investigación (AEI) and the European Regional Development Fund (ERDF) (PSI2017-82307-P).
ORCID iD: Pere J. Ferrando https://orcid.org/0000-0002-3133-5466
Supplemental Material: Supplemental material for this article is available online.
References
- Asparouhov T., Muthén B. (2010). Simple second order chi-square correction [Unpublished manuscript]. https://www.statmodel.com/download/WLSMV_new_chi21.pdf
- Beauducel A. (2015). A Schmid-Leiman based transformation resulting in perfect inter-correlations of three types of factor score predictors. arXiv. https://arxiv.org/abs/1511.00298
- Beauducel A., Leue A. (2013). Unit-weighted scales imply models that should be tested. Practical Assessment, Research & Evaluation, 18(1), 2.
- Browne M. (1972). Orthogonal rotation to a partially specified target. British Journal of Mathematical and Statistical Psychology, 25(1), 115-120. 10.1111/j.2044-8317.1972.tb00482.x
- Cattell R. B., Tsujioka B. (1964). The importance of factor-trueness and validity, versus homogeneity and orthogonality, in test scales. Educational and Psychological Measurement, 24(1), 3-30. 10.1177/001316446402400101
- Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
- Comrey A. L., Lee H. B. (1992). A first course in factor analysis (2nd ed.). Lawrence Erlbaum.
- DiStefano C., Zhu M., Mindrila D. (2009). Understanding and using factor scores: Considerations for the applied researcher. Practical Assessment, Research & Evaluation, 14(20), 1-11. https://eric.ed.gov/?id=EJ933679
- Drasgow F., Miller H. E. (1982). Psychometric and substantive issues in scale construction and validation. Journal of Applied Psychology, 67(3), 268-279. 10.1037/0021-9010.67.3.268
- Dumenci L., Achenbach T. M. (2008). Effects of estimation methods on making trait-level inferences from ordered categorical items for assessing psychopathology. Psychological Assessment, 20(1), 55-62. 10.1037/1040-3590.20.1.55
- Edwards A. L., Thurstone L. L. (1952). An internal consistency check for scale values determined by the method of successive intervals. Psychometrika, 17(2), 169-180. 10.1007/BF02288780
- Fabrigar L. R., Wegener D. T., MacCallum R. C., Strahan E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272-299. 10.1037/1082-989X.4.3.272
- Fan X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-381. 10.1177/0013164498058003001
- Fava J. L., Velicer W. F. (1992). An empirical comparison of factor, image, component, and scale scores. Multivariate Behavioral Research, 27(3), 301-322. 10.1207/s15327906mbr2703_1
- Ferrando P. J. (2009). Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Applied Psychological Measurement, 33(1), 9-24. 10.1177/0146621608314608
- Ferrando P. J., Lorenzo-Seva U. (2013). Unrestricted item factor analysis and some relations with item response theory [Technical report]. Universitat Rovira i Virgili, Department of Psychology, Tarragona.
- Ferrando P. J., Lorenzo-Seva U. (2016). A note on improving EAP trait estimation in oblique factor-analytic and item response theory models. Psicológica, 37(2), 235-247. https://psycnet.apa.org/record/2016-34732-007
- Ferrando P. J., Lorenzo-Seva U. (2017). Program FACTOR at 10: Origins, development and future directions. Psicothema, 29(2), 236-240. 10.7334/psicothema2016.304
- Ferrando P. J., Lorenzo-Seva U. (2018). Assessing the quality and appropriateness of factor solutions and factor score estimates in exploratory item factor analysis. Educational and Psychological Measurement, 78(5), 762-780. 10.1177/0013164417719308
- Ferrando P. J., Lorenzo-Seva U. (2019). On the added value of multiple factor score estimates in essentially unidimensional models. Educational and Psychological Measurement, 79(2), 249-271. 10.1177/0013164418773851
- Gorsuch R. L. (1997). New procedure for extension analysis in exploratory factor analysis. Educational and Psychological Measurement, 57(5), 725-740. 10.1177/0013164497057005001
- Grice J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450. 10.1037/1082-989X.6.4.430
- Grice J. W., Harris R. J. (1998). A comparison of regression and loading weights for the computation of factor scores. Multivariate Behavioral Research, 33(2), 221-247. 10.1207/s15327906mbr3302_2
- Hancock G. R., Mueller R. O. (2001). Rethinking construct reliability within latent variable systems. In Cudeck R., du Toit S. H. C., Sörbom D. F. (Eds.), Structural equation modeling: Present and future (pp. 195-216). Scientific Software.
- Hendriks A. A. J., Hofstee W. K. B., De Raad B. (1999). The Five-Factor Personality Inventory (FFPI). Personality and Individual Differences, 27(2), 307-325. 10.1016/S0191-8869(98)00245-1
- Horn J. L. (1965). An empirical comparison of methods for estimating factor scores. Educational and Psychological Measurement, 25(2), 313-322. 10.1177/001316446502500202
- Horn J. L., Miller W. C. (1966). Evidence on problems in estimating common factor scores. Educational and Psychological Measurement, 26(3), 617-622. 10.1177/001316446602600306
- Johnson J. W. (2000). A heuristic method for estimating the relative weight of predictor variables. Multivariate Behavioral Research, 35(1), 1-19. 10.1207/S15327906MBR3501_1
- Kendall M. G., Stuart A. (1977). The advanced theory of statistics (Vol. 2). Charles Griffin.
- Lord F. M. (1952). A theory of test scores (Psychometrika Monograph No. 7). Psychometric Society.
- Lord F. M. (1953). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13(4), 517-549. 10.1177/001316445301300401
- Lord F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum.
- Lorenzo-Seva U. (1999). Promin: A method for oblique factor rotation. Multivariate Behavioral Research, 34(3), 347-365. 10.1207/S15327906MBR3403_3
- Lorenzo-Seva U., ten Berge J. M. (2006). Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology, 2(2), 57-64. 10.1027/1614-2241.2.2.57
- Lorenzo-Seva U., Ferrando P. J., Chico E. (2010). Two SPSS programs for interpreting multiple regression results. Behavior Research Methods, 42(1), 29-35. 10.3758/BRM.42.1.29
- MacCallum R. C., Widaman K. F., Preacher K. J., Hong S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36(4), 611-637. 10.1207/S15327906MBR3604_06
- Macdonald P., Paunonen S. V. (2002). A Monte Carlo comparison of item and person statistics based on item response theory versus classical test theory. Educational and Psychological Measurement, 62(6), 921-943. 10.1177/0013164402238082
- McDonald R. P. (2011). Measuring latent quantities. Psychometrika, 76(4), 511-536. 10.1007/s11336-011-9223-7
- McDonald R. P., Burr E. J. (1967). A comparison of four methods of constructing factor scores. Psychometrika, 32(4), 381-401. 10.1007/BF02289653
- Morris J. D. (1979). A comparison of regression prediction accuracy on several types of factor scores. American Educational Research Journal, 16(1), 17-24. 10.3102/00028312016001017
- Muthén B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115-132. 10.1007/BF02294210
- Olsson U., Drasgow F., Dorans N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347. 10.1007/BF02294164
- Raykov T., Gabler S., Dimitrov D. M. (2016). Maximal reliability and composite reliability: A latent variable modeling approach to their difference evaluation. Structural Equation Modeling, 23(3), 384-391. 10.1080/10705511.2014.966369
- Raykov T., Marcoulides G. A. (2011). Introduction to psychometric theory. Routledge. 10.4324/9780203841624
- Reise S. P., Waller N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5, 27-48. 10.1146/annurev.clinpsy.032408.153553
- Rodríguez-Fornells A., Lorenzo-Seva U., Andrés-Pueyo A. (2001). Psychometric properties of the Spanish adaptation of the Five Factor Personality Inventory. European Journal of Psychological Assessment, 17(2), 145-153. 10.1027//1015-5759.17.2.145
- Samejima F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometrika Monograph No. 17). Psychometric Society. 10.1007/BF03372160
- Samejima F. (1977). Weakly parallel tests in latent trait theory with some criticism of classical test theory. Psychometrika, 42(2), 193-198. 10.1007/BF02294048
- Schweiker R. F. (1967). Factor scores aren’t sacred: Comments on “Abuses of Factor Scores.” American Educational Research Journal, 4(2), 168-170. 10.2307/1162125
- Skrondal A., Laake P. (2001). Regression among factor scores. Psychometrika, 66(4), 563-575. 10.1007/BF02296196
- ten Berge J. M., Knol D. L. (1985). Scale construction on the basis of components analysis: A comparison of three strategies. Multivariate Behavioral Research, 20(1), 45-55. 10.1207/s15327906mbr2001_3
- Thissen D., Wainer H. (2001). Test scoring. Routledge. 10.4324/9781410604729
- Thissen D., Steinberg L., Pyszczynski T., Greenberg J. (1983). An item response theory for personality and attitude scales: Item analysis using restricted factor analysis. Applied Psychological Measurement, 7(2), 211-226. 10.1177/014662168300700209
- Trites D. K., Sells S. B. (1955). A note on alternative methods for estimating factor scores. Journal of Applied Psychology, 39(6), 455-456. 10.1037/h0048235
- von Davier M. (2010). Why sum scores may not tell us all about test takers. Newborn and Infant Nursing Reviews, 10(1), 27-36. 10.1053/j.nainr.2009.12.011
- Wackwitz J., Horn J. (1971). On obtaining the best estimates of factor scores within an ideal simple structure. Multivariate Behavioral Research, 6(4), 389-408. 10.1207/s15327906mbr0604_2
- Wainer H. (1976). Estimating coefficients in linear models: It don't make no nevermind. Psychological Bulletin, 83(2), 213-217. 10.1037/0033-2909.83.2.213
- Wright B. D. (1999). Fundamental measurement for psychology. In Embretson S. E., Hershberger S. L. (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 65-104). Lawrence Erlbaum.