Educational and Psychological Measurement. 2022 Jul 2;83(3):495–519. doi: 10.1177/00131644221105505

A Small Sample Correction for Factor Score Regression

Jasper Bogaert, Wen Wei Loh, Yves Rosseel
PMCID: PMC10177321  PMID: 37187693

Abstract

Factor score regression (FSR) is widely used as a convenient alternative to traditional structural equation modeling (SEM) for assessing structural relations between latent variables. But when latent variables are simply replaced by factor scores, biases in the structural parameter estimates often have to be corrected, due to the measurement error in the factor scores. The method of Croon (MOC) is a well-known bias correction technique. However, its standard implementation can render poor quality estimates in small samples (e.g. less than 100). This article aims to develop a small sample correction (SSC) that integrates two different modifications to the standard MOC. We conducted a simulation study to compare the empirical performance of (a) standard SEM, (b) the standard MOC, (c) naive FSR, and (d) the MOC with the proposed SSC. In addition, we assessed the robustness of the performance of the SSC in various models with a different number of predictors and indicators. The results showed that the MOC with the proposed SSC yielded smaller mean squared errors than SEM and the standard MOC in small samples and performed similarly to naive FSR. However, naive FSR yielded more biased estimates than the proposed MOC with SSC, by failing to account for measurement error in the factor scores.

Keywords: factor score regression, method of Croon, structural equation modeling, small sample estimation, measurement error

Introduction

In the educational, behavioral, and social sciences, researchers often explore relationships between latent variables such as motivation and intelligence. The most obvious and widely adopted tool of choice to analyze these relationships is structural equation modeling (SEM). SEM is a statistical modeling procedure that estimates the measurement model and the structural relations simultaneously, and is (in combination with maximum likelihood estimation [MLE]) often regarded to be the gold standard (Bollen, 1989). However, there are several disadvantages to (traditional) SEM. For example, a large sample size is a desideratum, and a misspecification of the model might affect the estimation of all parameters (Bentler & Yuan, 1999; Boomsma, 1985; Nevitt & Hancock, 2004).

When substantive interest is primarily in assessing the structural relations between latent variables, a straightforward alternative to SEM is factor score regression (FSR) (Skrondal & Laake, 2001). In FSR, the parameters are estimated in two steps. In the first step, confirmatory factor analysis is used to estimate the parameters of the measurement model, and factor scores are predicted for each latent variable. In the second step, the factor scores are used in place of the latent variables to estimate the (linear) relationships among the latent variables. When the structural model is recursive (i.e., no feedback loops), the analysis in the second step boils down to a series of linear regressions (Devlieger et al., 2016). When the structural model is non-recursive, path analysis can be used (Devlieger & Rosseel, 2017). In this article, we use the term FSR for both settings. Factor scores can be calculated in various ways, leading to different sets of observed factor scores. Regression factor scores (Thomson, 1934; Thurstone, 1935) and Bartlett factor scores (Bartlett, 1937) are by construction always a linear combination of the observed indicators measuring the latent variable. Therefore, they still contain measurement error, and naively using them in place of latent variables will result in biased regression coefficients (Bollen, 1989; Lastovicka & Thamodaran, 1991; Shevlin et al., 1997). We describe other issues that can arise when using this method in the “Discussion” section. In this article, we refer to the above-described method, without any form of bias correction, as FSR.

It is nevertheless possible to remove the bias in the estimated regression coefficients. One possible approach is based on the work of Croon (2002), and termed the method of Croon (MOC) by Devlieger et al. (2016). The MOC proceeds by removing the measurement error from the observed variance–covariance matrix of the factor scores in order to obtain a consistent estimator of the variance–covariance matrix of the latent variables. This (co)variance matrix can then be used to obtain asymptotically unbiased estimators of the regression coefficients. Devlieger et al. (2016) and Takane and Hwang (2018) showed in their simulation studies that the MOC is a good performing alternative for SEM. As illustrated in Devlieger and Rosseel (2017), a major advantage of the MOC is its robustness against local misspecifications of the model.

Although SEM and the MOC are useful methods, both can perform poorly in small samples (N < 100). Some of the known problems that may occur if SEM is used with (very) small samples are: (a) convergence problems, (b) inadmissible solutions (e.g., Heywood cases), (c) biased estimates for variance components, and (d) incorrect confidence intervals and fit statistics (Bentler & Yuan, 1999; Boomsma, 1985; Nevitt & Hancock, 2004; Rosseel, 2020). The reason is that the (frequentist) estimation methods that are typically used in SEM (whether it be MLE or generalized least squares) only work well if the sample size is sufficiently large. To tackle the problem of non-convergence and inadmissible solutions, De Jonckere and Rosseel (2022) suggested using bounded estimation. Their method effectively decreases the rate of non-convergence in small samples, but the variability of the estimated regression coefficients in the structural model is still substantial. Smid and Rosseel (2020) showed that the MOC, being a two-step method, performs slightly better, but still exhibits a lot of variability when the sample size is very small (say, N < 50). Nonetheless, as long as the models are correctly specified, the estimates of the regression coefficients obtained by SEM or the MOC remain unbiased, even in very small samples (Smid & Rosseel, 2020).

Indeed, it is well-known that SEM and the MOC take into account the measurement error, and therefore give unbiased results. What is less known by the users of both methods is the price to pay for these unbiased results. The bias-correction performed by SEM and the MOC leads to a large variability of the parameter estimates, in particular when the sample size becomes small. This is related to the concept of bias-variance trade-off, which implies that a decreasing bias will lead to an increasing variance and vice versa (Hastie et al., 2006; James et al., 2013). Therefore, in addition to bias, we also focus on the estimators’ variability and mean squared error (MSE) in this article. Cox and Hinkley (1979) stress that an estimate with a small bias and a small variance might be preferable to one with no bias but a large variance.

Small sample corrections (SSCs) have already been presented in the general SEM framework (e.g., Ozenne et al., 2020). However, this is not the case for the MOC. Hence, in this article, we propose to modify the MOC with an SSC. Our modification is based on earlier work reported in the measurement error models literature (Carroll et al., 2006; Fuller, 1987; Wall & Amemiya, 2000). In the context of linear regression with measurement error in the predictors, Fuller (1980) proposed two modifications in order to improve small sample estimation. His aim was to reduce the variability by allowing some bias, and thereby to obtain a lower MSE. The goal of this article is to incorporate these two modifications into the MOC, and examine its performance.

The rest of this article is structured as follows. First, we introduce a simple structural equation model and the notation. We then describe different estimation methods to analyze the relationships between latent variables: traditional SEM, FSR, and the MOC. Next, we give information concerning measurement error models and Fuller’s modifications. Thereafter, we show how to integrate these different adjustments into a single small sample correction for the MOC. Subsequently, two simulation studies are described and their results are presented. Lastly, we discuss the performance of the estimators and the robustness of their performance in various models, and provide some concluding thoughts.

A Simple Structural Equation Model

To facilitate the presentation of the correction techniques developed in this article, we considered a structural equation model with one latent predictor (denoted by ηx ) measured by p indicators, and one latent dependent variable (denoted by ηy ) measured by q indicators. An example with p=q=3 is given in Figure 1.

Figure 1. A simple structural equation model.

We used this simple model because it allowed us to present many formulas in an easy to understand scalar form. For general settings, the reader can refer to Appendix A where all results are presented using matrix notation. The joint model consists of two models: a measurement model and a structural model. The measurement model can be written as:

$$x = \nu_x + \lambda_x \eta_x + \epsilon_x \quad \text{and} \quad y = \nu_y + \lambda_y \eta_y + \epsilon_y, \tag{1}$$

where x and y are the p×1 and q×1 vectors capturing the observed indicators measuring ηx and ηy respectively; νx and νy are the vectors of intercepts; λx and λy the vectors containing the factor loadings; and ϵx and ϵy the vectors of the error variables. We assume that E(ϵx)=E(ϵy)=0 and denote Var(ϵx)=Θx and Var(ϵy)=Θy .

The structural model is given by:

$$\eta_y = \alpha + \beta \eta_x + \zeta, \tag{2}$$

where ηx is the latent predictor and ηy is the latent dependent variable; α is the intercept, β the regression coefficient, and ζ the residual error term. We assumed that E(ζ)=0 , Cov(ηx,ζ)=0 , and Cov(ϵx,ζ)=Cov(ϵy,ζ)=0 . We denoted Var(ηx)=Ψx and Var(ζ)=Ψy .

Without loss of generality, we assumed that the observed variables are centered and therefore, νx=νy=0 and α=0 . The remaining free parameters to be estimated in this model are the free elements of λx , λy , Θx , Θy , as well as the scalar parameters Ψx , Ψy , and β . We collected these free parameters in the parameter vector θ . The model-implied variance–covariance matrix is written as Σ(θ) to stress that it is a function of the parameter vector θ .

Maximum Likelihood Estimation in Structural Equation Modeling

When all observed variables are continuous, a common estimator in SEM is MLE. When using MLE, the parameter estimates are found by minimizing the discrepancy function as follows:

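$$F_{ML} = \log\left|\Sigma(\hat{\theta})\right| + \operatorname{tr}\left(S\,\Sigma(\hat{\theta})^{-1}\right) - \log\left|S\right| - k, \tag{3}$$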

where S is the observed variance–covariance matrix of the observed indicators, Σ(θ^) is the model-implied variance–covariance matrix of the observed indicators, and k is the number of observed variables. Note that all the parameters in the parameter vector θ are estimated simultaneously.
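As an illustration of the discrepancy function, the following minimal sketch evaluates the standard ML discrepancy for two observed variables (k = 2) in plain Python. The function name `f_ml_2x2` and the example matrices are hypothetical and chosen only for illustration; in practice the minimization over θ is carried out by SEM software, not by hand.

```python
import math


def f_ml_2x2(S, Sigma):
    """ML discrepancy log|Sigma| + tr(S Sigma^-1) - log|S| - k for 2x2 matrices.

    S is the observed covariance matrix, Sigma the model-implied one.
    Both are given as nested lists [[a, b], [c, d]] (illustrative format).
    """
    (s11, s12), (s21, s22) = S
    (a, b), (c, d) = Sigma
    det_S = s11 * s22 - s12 * s21
    det_Sigma = a * d - b * c
    if det_S <= 0 or det_Sigma <= 0:
        raise ValueError("both matrices must have a positive determinant")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det_Sigma, -b / det_Sigma],
           [-c / det_Sigma, a / det_Sigma]]
    # trace(S @ inv) = sum over i, j of S[i][j] * inv[j][i]
    trace = (s11 * inv[0][0] + s12 * inv[1][0]
             + s21 * inv[0][1] + s22 * inv[1][1])
    k = 2  # number of observed variables
    return math.log(det_Sigma) + trace - math.log(det_S) - k


# Hypothetical observed covariance matrix.
S = [[2.0, 0.6], [0.6, 1.5]]
perfect_fit = f_ml_2x2(S, S)                          # zero up to rounding
misfit = f_ml_2x2(S, [[2.0, 0.0], [0.0, 1.5]])        # ignores the covariance
```

The discrepancy is zero (up to rounding) when the model-implied matrix reproduces S exactly, and positive when the model ignores the covariance between the two variables.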

As mentioned earlier, SEM suffers from convergence issues in small samples. Nonconvergence occurs when the applied optimization method (e.g. quasi-Newton optimization) fails to acquire a solution satisfying certain criteria (Anderson & Gerbing, 1984; Boomsma, 1985). De Jonckere and Rosseel (2022) showed that bounded estimation leads to a major decrease in the nonconvergence rate, without negative effects on the point estimates for the unbounded parameters. Given the focus on small samples in this article, bounded estimation (using standard bounds) is used for all estimators in this article in order to avoid convergence problems. Importantly, applying standard bounds prevents the elements of Θx and Θy from being negative, thereby avoiding Heywood cases.

Factor Score Regression

FSR is a two-step method. In the first step, confirmatory factor analysis (CFA) is used to estimate the parameters of the measurement model. As with SEM, convergence issues may also occur here. Therefore, we again use MLE with bounded estimation for the CFA. Because the two measurement models (for x and y ) do not share any parameters (no cross-loadings or residual correlations), we can estimate both measurement models separately. Once the parameters of the measurement model are estimated, we can compute factor scores for both latent variables as follows:

$$f_x = a_x^{T} x \quad \text{and} \quad f_y = a_y^{T} y, \tag{4}$$

where ax and ay are the factor score vectors. There are multiple ways to compute the factor score vector a , leading to different factor scores. The two most common predictors are the Regression (Thomson, 1934; Thurstone, 1935) and Bartlett predictor (Bartlett, 1937). Using the Bartlett method, the factor score vectors axB and ayB are defined as:

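$$a_x^{B} = \left[(\lambda_x^{T}\Theta_x^{-1}\lambda_x)^{-1}\lambda_x^{T}\Theta_x^{-1}\right]^{T} \quad \text{and} \quad a_y^{B} = \left[(\lambda_y^{T}\Theta_y^{-1}\lambda_y)^{-1}\lambda_y^{T}\Theta_y^{-1}\right]^{T}. \tag{5}$$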

In the Regression method, the factor score vectors axR and ayR are defined as:

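$$a_x^{R} = \left[\mathrm{Var}(\eta_x)\,\lambda_x^{T}\Sigma_x^{-1}\right]^{T} \quad \text{and} \quad a_y^{R} = \left[\mathrm{Var}(\eta_y)\,\lambda_y^{T}\Sigma_y^{-1}\right]^{T}. \tag{6}$$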
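The two sets of factor score weights can be sketched in a few lines of plain Python. This sketch rests on two assumptions that are not stated in the text: the residual covariance matrix is diagonal, and the model-implied covariance matrix of the indicators is Σ = Var(η)·λλᵀ + Θ. Under these assumptions the inverse of Σ follows from the Sherman-Morrison identity, so no matrix library is needed. All function names and numerical values are hypothetical.

```python
def bartlett_weights(loadings, residual_vars):
    """Bartlett weights (lambda' Theta^-1 lambda)^-1 Theta^-1 lambda, diagonal Theta."""
    c = sum(l * l / t for l, t in zip(loadings, residual_vars))
    return [(l / t) / c for l, t in zip(loadings, residual_vars)]


def regression_weights(loadings, residual_vars, factor_var):
    """Regression weights Var(eta) * Sigma^-1 lambda, diagonal Theta.

    Assumes Sigma = factor_var * lambda lambda' + Theta, so that by
    Sherman-Morrison: Sigma^-1 lambda = Theta^-1 lambda / (1 + factor_var * c).
    """
    c = sum(l * l / t for l, t in zip(loadings, residual_vars))
    return [factor_var * (l / t) / (1.0 + factor_var * c)
            for l, t in zip(loadings, residual_vars)]


def scale(weights, loadings):
    """Scale factor a' lambda that links the factor score to the latent variable."""
    return sum(a * l for a, l in zip(weights, loadings))


# Hypothetical measurement model with three indicators.
loadings = [1.0, 0.8, 0.6]
residual_vars = [0.5, 0.5, 0.5]

a_bartlett = bartlett_weights(loadings, residual_vars)
a_regression = regression_weights(loadings, residual_vars, factor_var=1.0)
scale_bartlett = scale(a_bartlett, loadings)      # equals 1 for Bartlett weights
scale_regression = scale(a_regression, loadings)  # smaller than 1 (shrinkage)
```

In this constructed example the Bartlett scale factor equals 1, whereas the Regression scale factor is below 1, which reflects the shrinkage of Regression factor scores toward the mean.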

In the second step, we use the factor scores to estimate the regression coefficient β of the structural model. The ordinary least-squares estimator of β can be found from the variance–covariance matrix of the factor scores as follows:

$$\hat{\beta}_{fs} = \widehat{\mathrm{Var}(f_x)}^{-1}\,\widehat{\mathrm{Cov}(f_x,f_y)}, \tag{7}$$

where the “fs” subscript indicates that the estimate is based on factor scores. This estimator is biased regardless of the sample size, except in the trivial setting where $\widehat{\mathrm{Cov}(f_x,f_y)} = 0$, or when there is no measurement error ($\epsilon_x = \epsilon_y = 0$) (Fuller, 1987). Skrondal and Laake (2001) describe a bias-avoiding FSR method that yields an unbiased estimate, but only under specific settings.

The Method of Croon

The source of the bias when using naive FSR is that the variance–covariance matrix of the factor scores differs from the variance–covariance matrix of the latent variables. Hence, the MOC (Croon, 2002) performs a correction to the variance–covariance matrix of the factor scores so that the corrected variance–covariance matrix is consistent for that of the latent variables. For clarity, we call the elements of the initial variance–covariance matrix the (co)variances of the factor scores, in this case denoted by Var(fx), Var(fy), and Cov(fx,fy). The elements of the corrected variance–covariance matrix are named the estimated (co)variances of the latent variables, denoted by Var(ηx)Croon, Var(ηy)Croon, and Cov(ηx,ηy)Croon.

The variance of a latent variable based on the variance of the factor scores can be obtained by the formula:

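$$\widehat{\mathrm{Var}(\eta_x)}_{\mathrm{Croon}} = \mathrm{scale}_x^{-2}\left[\widehat{\mathrm{Var}(f_x)} - \mathrm{offset}_x\right] = (a_x^{T}\lambda_x)^{-2}\left[\widehat{\mathrm{Var}(f_x)} - a_x^{T}\Theta_x a_x\right]. \tag{8}$$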

Furthermore, the covariance of a latent variable based on the covariance of the factor scores can be obtained by:

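$$\widehat{\mathrm{Cov}(\eta_x,\eta_y)}_{\mathrm{Croon}} = (\mathrm{scale}_x \times \mathrm{scale}_y)^{-1}\,\widehat{\mathrm{Cov}(f_x,f_y)} = (a_x^{T}\lambda_x\, a_y^{T}\lambda_y)^{-1}\,\widehat{\mathrm{Cov}(f_x,f_y)}. \tag{9}$$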

Thereafter, it is possible to obtain an unbiased estimate of the regression parameter using the corrected variance in Equation (8) and covariance in Equation (9):

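$$\hat{\beta}_{\mathrm{Croon}} = \widehat{\mathrm{Var}(\eta_x)}_{\mathrm{Croon}}^{-1}\,\widehat{\mathrm{Cov}(\eta_x,\eta_y)}_{\mathrm{Croon}}. \tag{10}$$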

The formulas above simplify when the Bartlett predictor is used to compute the factor scores, because in this case: scalex=scaley=1 . For a more comprehensive description of the MOC, we refer readers to Devlieger et al. (2016) and Devlieger and Rosseel (2017).
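To make the correction concrete, the sketch below contrasts the naive FSR slope with the Croon-corrected slope for the simple model with one predictor. It assumes a diagonal residual covariance matrix and takes the offset to be the error variance of the factor score, aᵀΘa, which follows from writing the factor score as f = aᵀλ·η + aᵀϵ. The function names and all numbers are hypothetical population values: Var(ηx) = 1, β = 0.5, and Bartlett weights for both latent variables, which give Var(fx) = 1.25 and Cov(fx, fy) = 0.5.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def croon_variance(var_f, weights, loadings, residual_vars):
    """Corrected latent variance: scale^-2 * (Var(f) - offset), diagonal Theta."""
    scale = dot(weights, loadings)                                   # a' lambda
    offset = sum(a * a * t for a, t in zip(weights, residual_vars))  # a' Theta a
    return (var_f - offset) / scale ** 2


def croon_covariance(cov_f, wx, lx, wy, ly):
    """Corrected latent covariance: Cov(f_x, f_y) / (scale_x * scale_y)."""
    return cov_f / (dot(wx, lx) * dot(wy, ly))


def fsr_and_croon_slopes(var_fx, cov_fxfy, wx, lx, theta_x, wy, ly):
    """Return (naive FSR slope, Croon-corrected slope)."""
    naive = cov_fxfy / var_fx
    corrected = (croon_covariance(cov_fxfy, wx, lx, wy, ly)
                 / croon_variance(var_fx, wx, lx, theta_x))
    return naive, corrected


# Hypothetical Bartlett weights, loadings, and residual variances.
weights = [0.5, 0.4, 0.3]
loadings = [1.0, 0.8, 0.6]
residual_vars = [0.5, 0.5, 0.5]

naive, corrected = fsr_and_croon_slopes(
    1.25, 0.5, weights, loadings, residual_vars, weights, loadings
)
```

In this constructed example the naive slope is about 0.4, attenuated relative to the true value, whereas the corrected slope recovers the value of 0.5 used to construct the example.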

Fuller’s Modifications for Small Sample Estimation

We now present the two modifications for the small sample estimator suggested by Fuller (1980). Consider a simple linear regression model with one predictor and one dependent variable. Using the notation of Fuller (1987), we can write this as:

$$Y_i = \alpha + \beta x_i + e_i, \quad i = 1, 2, \ldots, N, \tag{11}$$

where the xi are assumed to be independent draws from a N(μx, σxx) distribution, and the ei are independent N(0, σee) random variables. It is also assumed that ei is independent of xi. If we write mxx for the sample variance of xi and mxY for the sample covariance of xi and Yi, then the least squares estimator

$$\hat{\beta}_{ols} = \left[\sum_{i=1}^{N}(x_i - \bar{x})^{2}\right]^{-1}\sum_{i=1}^{N}(x_i - \bar{x})(Y_i - \bar{Y}) = m_{xx}^{-1}m_{xY}, \tag{12}$$

is unbiased for β. The estimator β^ols is also the MLE for β and has the smallest variance among unbiased linear estimators (Fuller, 1987). Next, consider a measurement error model where it is impossible to observe xi directly. Instead, we observe the sum

$$X_i = x_i + u_i, \tag{13}$$

where ui is a N(0,σuu) random variable, indicating the measurement error. In the absence of measurement error, σuu equals zero. This model is referred to as the classical measurement error model (Buonaccorsi, 2010; Carroll et al., 2006). In this measurement error model, the least squares estimator

$$\hat{\beta}_{naive} = \left[\sum_{i=1}^{N}(X_i - \bar{X})^{2}\right]^{-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y}) = m_{XX}^{-1}m_{XY}, \tag{14}$$

will now be biased. The impact of the measurement error is attenuation of the regression coefficient toward zero. However, if an estimate of σuu>0 is available, it is possible to obtain an unbiased estimator for β by using the method of moments (Buonaccorsi, 2010; Carroll et al., 2006; Fuller, 1980, 1987):

$$\hat{\beta}_{mm} = (m_{XX} - \sigma_{uu})^{-1}m_{XY}. \tag{15}$$

Note that σuu in Equation (15) plays the same role as offsetx in Equation (8). Fuller (1980, 1987) suggested an alternative estimator with better small sample properties. To construct this alternative estimator, Equation (15) needs two adjustments. Fuller (1980, 1987) proposed the following alternative estimator:

$$\hat{\beta}_{\lambda,\alpha} = \left[\hat{H}_{xx} + \alpha(N-1)^{-1}\sigma_{uu}\right]^{-1}m_{XY}, \tag{16}$$

where H^xx is an estimator of σxx , and α>0 is a fixed number to be specified. Furthermore,

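$$\hat{H}_{xx} = \begin{cases} m_{XX} - \sigma_{uu} & \text{if } \hat{\lambda} \geq 1 + (N-1)^{-1} \\ m_{XX} - \left[\hat{\lambda} - (N-1)^{-1}\right]\sigma_{uu} & \text{if } \hat{\lambda} < 1 + (N-1)^{-1}, \end{cases} \tag{17}$$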

with λ^ the root of the determinantal equation:

$$\left|m_{ZZ} - \lambda\,\mathrm{diag}(0,\sigma_{uu})\right| = 0, \tag{18}$$

where mZZ is the sample variance–covariance matrix of both Xi and Yi , containing the elements mXX , mXY , and mYY in its upper triangular part.

Fuller’s correction involves two modifications. First, a decision based on λ^ is implemented in Equation (17) to ensure H^xx is positive definite. This modification is named the λ-correction from here on. Second, including the parameter α in Equation (16) ensures a less right skewed distribution of the estimated regression coefficients, thereby reducing the variability at the cost of some bias as explained in the next section. This modification is named the α-correction throughout this article. Both modifications have different functions and can be employed separately from each other. Together they form the alternative estimator presented by Fuller (1980, 1987) for better small sample estimation in the presence of measurement error.
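The two modifications can be sketched for the single-predictor case as follows. This is a minimal illustration, not Fuller's own code. It assumes that the variables in mZZ are ordered as (Y, X), so that the determinantal equation reduces to mYY(mXX − λσuu) − mXY² = 0 and has a single root. The function name `fuller_slope` and all numerical values are hypothetical.

```python
def fuller_slope(m_XX, m_XY, m_YY, sigma_uu, n, alpha):
    """Fuller's modified slope estimator for one error-prone predictor.

    Assumes Z = (Y, X), so |m_ZZ - lambda * diag(0, sigma_uu)| = 0 becomes
    m_YY * (m_XX - lambda * sigma_uu) - m_XY**2 = 0.
    """
    if sigma_uu <= 0 or m_YY <= 0 or n < 2:
        raise ValueError("need sigma_uu > 0, m_YY > 0, and n >= 2")
    lam = (m_XX - m_XY ** 2 / m_YY) / sigma_uu
    inv_df = 1.0 / (n - 1)
    if lam >= 1.0 + inv_df:
        # No lambda-correction needed: usual method-of-moments denominator.
        h_xx = m_XX - sigma_uu
    else:
        # Lambda-correction: subtract less than the full error variance.
        h_xx = m_XX - (lam - inv_df) * sigma_uu
    # Alpha-correction: add back a fraction alpha / (n - 1) of sigma_uu.
    return m_XY / (h_xx + alpha * inv_df * sigma_uu)


# Hypothetical sample moments with n = 30 observations.
beta_mm = fuller_slope(1.5, 0.5, 1.0, 0.5, n=30, alpha=0.0)      # method of moments
beta_alpha = fuller_slope(1.5, 0.5, 1.0, 0.5, n=30, alpha=2.0)   # shrunken slightly
beta_small = fuller_slope(0.45, 0.1, 1.0, 0.5, n=30, alpha=0.0)  # lambda-correction
```

With α = 0 and a sufficiently large root, the function reproduces the method-of-moments estimator. In the third call the observed variance is smaller than the error variance, so the method-of-moments denominator would be negative, while the λ-corrected denominator stays positive.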

Combining Fuller’s Modifications and the Method of Croon

In this section, we show how the two modifications of Fuller (1980) can be incorporated into the MOC estimator. For the considered model, only Var(ηx)Croon needs to be corrected. First, the λ-correction is performed:

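$$\widehat{\mathrm{Var}(\eta_x)}_{\lambda} = \begin{cases} \widehat{\mathrm{scale}}_x^{-2}\left[\widehat{\mathrm{Var}(f_x)} - \widehat{\mathrm{offset}}_x\right] & \text{if } \hat{\lambda} \geq 1 + (N-1)^{-1} \\ \widehat{\mathrm{scale}}_x^{-2}\left[\widehat{\mathrm{Var}(f_x)} - \left[\hat{\lambda} - (N-1)^{-1}\right]\widehat{\mathrm{offset}}_x\right] & \text{if } \hat{\lambda} < 1 + (N-1)^{-1}, \end{cases} \tag{19}$$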

where λ^ is the smallest root of the determinantal equation:

$$\left|\widehat{\mathrm{Var}(f_x)} - \hat{\lambda}\,\widehat{\mathrm{offset}}_x\right| = 0. \tag{20}$$

This decision based on λ^ ensures that Var(ηx)^λ is positive. Thereafter, the α-correction is performed:

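$$\widehat{\mathrm{Var}(\eta_x)}_{ssc} = \widehat{\mathrm{Var}(\eta_x)}_{\lambda} + \widehat{\mathrm{scale}}_x^{-2}\,\alpha(N-1)^{-1}\,\widehat{\mathrm{offset}}_x, \tag{21}$$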

where α > 0 is a fixed number to be specified. In this model, the offset terms are not needed for the covariances, and therefore $\widehat{\mathrm{Cov}(\eta_x,\eta_y)}_{ssc} = \widehat{\mathrm{Cov}(\eta_x,\eta_y)}_{\mathrm{Croon}}$. Now that we have obtained the corrected terms, we can again find an estimate of the regression parameter β as usual:

$$\hat{\beta}_{ssc} = \widehat{\mathrm{Var}(\eta_x)}_{ssc}^{-1}\,\widehat{\mathrm{Cov}(\eta_x,\eta_y)}_{ssc}. \tag{22}$$

We call the adjusted estimator in Equation (22) with the two modifications the MOC with an SSC. The two modifications can be employed separately by leaving out the decision based on λ^ or by setting α equal to zero. As a consequence, it is possible to implement solely the λ-correction into the MOC. This latter extension of the MOC is named the MOC-λ method from here on.

If we assume $\hat{\lambda} \geq 1 + (N-1)^{-1}$, Equation (21) can be rewritten in a form that gives more insight into the rationale behind the α-correction:

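$$\widehat{\mathrm{Var}(\eta_x)}_{ssc} = \mathrm{scale}_x^{-2}\left[\widehat{\mathrm{Var}(f_x)} - \left(1 - \alpha(N-1)^{-1}\right)\mathrm{offset}_x\right]. \tag{23}$$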

The rationale of the correction is that only a fraction $1 - \alpha(N-1)^{-1}$ of the offsetx parameter is subtracted from the variance of the factor scores. This results in less variability at the cost of a certain amount of bias. The amount of bias and the reduction in variability depend on the value of α. Meaningful values for α range from 0 to N − 1, where using 0 is equivalent to the MOC and using N − 1 is identical to FSR when working with the Bartlett predictor. Two suggestions have been made in the literature regarding the value of α. On the one hand, Fuller (1980, 1987) proposed to use α = p + 1 (with p the number of predictors). On the other hand, Wall and Amemiya (2000) suggested using α = p + 5. In our simulation study, we also considered α = (N − 1)/2, which takes a mid-position between FSR and the MOC. With this choice, the effect of the correction is larger and adjusts to the sample size at hand. However, the correction term no longer vanishes as the sample size increases, resulting in biased estimates even when the sample size is very large. This version of the correction is therefore only useful when the sample size is small.
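The small sample correction for the simple model can be sketched as a single function. The sketch treats the one-predictor case, in which the determinantal equation is scalar and its root is simply the ratio of the factor score variance to the offset; it also assumes that the offset is strictly positive. The function name `ssc_variance` and the numerical values are hypothetical (Bartlett factor scores with a scale of 1, an offset of 0.25, and N = 30).

```python
def ssc_variance(var_f, offset, scale, n, alpha):
    """Small-sample-corrected latent variance for one predictor.

    Scalar case: the root of |Var(f_x) - lambda * offset_x| = 0 is
    lambda = Var(f_x) / offset_x (requires offset > 0).
    """
    if offset <= 0 or n < 2:
        raise ValueError("need offset > 0 and n >= 2")
    lam = var_f / offset
    inv_df = 1.0 / (n - 1)
    if lam >= 1.0 + inv_df:
        # Standard Croon correction.
        var_lambda = (var_f - offset) / scale ** 2
    else:
        # Lambda-correction keeps the corrected variance positive.
        var_lambda = (var_f - (lam - inv_df) * offset) / scale ** 2
    # Alpha-correction: add back a fraction alpha / (n - 1) of the offset.
    return var_lambda + alpha * inv_df * offset / scale ** 2


n = 30
var_croon = ssc_variance(1.25, 0.25, 1.0, n, alpha=0.0)          # standard MOC
var_fsr = ssc_variance(1.25, 0.25, 1.0, n, alpha=n - 1)          # naive FSR (Bartlett)
var_mid = ssc_variance(1.25, 0.25, 1.0, n, alpha=(n - 1) / 2)    # mid-position
var_fuller = ssc_variance(1.25, 0.25, 1.0, n, alpha=1 + 1)       # alpha = p + 1, p = 1
```

With α = 0 the function returns the Croon-corrected variance, with α = N − 1 and a scale of 1 it returns the uncorrected factor score variance, and α = (N − 1)/2 lies halfway between the two. Dividing the (unchanged) covariance by each of these variances gives the corresponding slope estimates.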

Simulation Studies

Two simulation studies were performed. In the first study, we examined the operating characteristics of the MOC with SSC estimator compared with other estimators. We considered 12 different conditions varying in sample size and reliability of the indicators. Given the focus on small sample estimation, the following sample sizes were chosen: N = 20, 30, 50, 100, 200, and 2,000. In the second study, we assessed the robustness of the performance of the three specifications of α in different models (varying in the number of latent variables and indicators). We only considered models where the number of observations is larger than the number of variables in the model. For this reason, we left the smallest sample size (N = 20) out of the second simulation study, resulting in 10 distinct conditions per model. The data generation and analysis were done using R (version 4.1.0) (R Core Team, 2020) and the lavaan package (version 0.6-10) (Rosseel, 2012).

Simulation Study 1

For the first simulation study, two data-generating models based on Equation (A2) were considered. They differed only in terms of the reliability of the indicators. The first model had a reliability of 0.5 for all indicators, whereas in the second model the reliability was set to 0.8. The conditions corresponded respectively to low and high reliability or, in other words, a condition with a higher and a lower amount of measurement error (Brunner & Austin, 2009; Nunnally & Bernstein, 1994). Both models contained three correlated (latent) predictors ( ηu , ηv , ηw ) and one (latent) dependent variable ( ηy ). All latent variables were measured by three observed indicators. In this study, the parameters of interest were the regression coefficients. The true parameter values β for both models were: βu=0.5 , βv=2 , and βw=4 . The true standardized parameter values βz for both models were: βuz=0.072 , βvz=0.302 , and βwz=0.502 . Figure 2 depicts the data-generating model with the true parameter values.

Figure 2. Simulation model. The variances of the latent predictors varied in both models. For the model with low reliability, these were Var(ηu) = 1.2, Var(ηv) = 1.3, and Var(ηw) = 0.9. For the model with high reliability, these were Var(ηu) = 4.8, Var(ηv) = 5.2, and Var(ηw) = 3.6.

For each simulated dataset, we fitted the same model in Figure 2. We then compared seven different estimators: (1) traditional SEM (SEM-MLE), (2) the standard MOC, (3) the MOC-λ, (4) FSR (using Bartlett factor scores), and (5) to (7) the SSC with one of the three suggested values for α, being: α = p + 1 = 4, α = p + 5 = 8, and α = (N − 1)/2. Bounded estimation was used for all estimators to obtain a higher number of converged solutions. Note that SEM-MLE refers to the estimation method (using MLE) and not the modeling procedure. The MOC-λ method was added to the simulation to see whether the λ-correction is beneficial for the MOC.

To evaluate each estimator, the following performance criteria were considered: the proportion of simulated datasets where the method converged (from here on named the convergence rate), the relative mean bias, the empirical standard deviation (ESD), and the MSE of the estimators. Table 1 summarizes the performance criteria. We simulated 10,000 samples for each of the 6 × 2 = 12 settings.

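Table 1. Overview of performance criteria. For a given simulated dataset $i$, the estimate is denoted by $\hat{b}_i$ and its true value by $b$. The number of simulations is denoted by $R$.

| Performance criterion | Formula |
| --- | --- |
| Convergence rate | $\#\{\text{estimator converged}\} / R$ |
| Mean bias | $\frac{1}{R}\sum_{i=1}^{R}(\hat{b}_i - b)$ |
| Relative mean bias | $\frac{1}{R}\sum_{i=1}^{R}\left(\frac{\hat{b}_i}{b} - 1\right)$ |
| Empirical standard deviation (ESD) | $\sqrt{\frac{1}{R-1}\sum_{i=1}^{R}(\hat{b}_i - \bar{b})^{2}}$ with $\bar{b} = \frac{1}{R}\sum_{i=1}^{R}\hat{b}_i$ |
| Mean squared error | $\frac{1}{R}\sum_{i=1}^{R}(\hat{b}_i - b)^{2}$ |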
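The performance criteria in Table 1 can be computed from a list of estimates as in the sketch below. Two assumptions are made that the text does not spell out: the bias, ESD, and MSE are computed over the converged replications only, and the MSE is averaged over those replications. The function name `performance_criteria` and the four example estimates are hypothetical.

```python
import math


def performance_criteria(estimates, true_value, n_replications):
    """Summarize simulation results for one parameter.

    estimates: estimates from the converged replications only (assumption).
    n_replications: total number of simulated datasets, converged or not.
    """
    r = len(estimates)
    if r < 2 or true_value == 0:
        raise ValueError("need at least two estimates and a nonzero true value")
    mean_estimate = sum(estimates) / r
    return {
        "convergence_rate": r / n_replications,
        "mean_bias": sum(b - true_value for b in estimates) / r,
        "relative_mean_bias": sum(b / true_value - 1 for b in estimates) / r,
        # Empirical standard deviation of the estimates around their own mean.
        "esd": math.sqrt(sum((b - mean_estimate) ** 2 for b in estimates) / (r - 1)),
        # Mean squared error around the true value.
        "mse": sum((b - true_value) ** 2 for b in estimates) / r,
    }


# Hypothetical estimates of a coefficient whose true value is 0.5;
# four of five replications converged.
summary = performance_criteria([0.4, 0.5, 0.6, 0.7], true_value=0.5, n_replications=5)
```

Under these definitions the MSE decomposes as the squared mean bias plus (R − 1)/R times the squared ESD, which is the bias-variance trade-off that motivates reporting all three criteria.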

Simulation Study 2

In the second simulation study, six data-generating models were considered. They differed in terms of the reliability of the indicators, as well as in the number of latent predictors and indicators. The first three models had a reliability of 0.5 for all indicators, whereas in the last three models, the reliability was set to 0.8. Hence, there was again a condition with low and high reliability (Brunner & Austin, 2009; Nunnally & Bernstein, 1994). Models 1 and 4 both contained three correlated (latent) predictors ( ηu , ηv , ηw ) and one (latent) dependent variable ( ηy ). All latent variables were measured by three observed indicators. Models 2 and 5 included three correlated (latent) predictors ( ηu , ηv , ηw ) and one (latent) dependent variable ( ηy ) as well. However, in contrast to the other models, all latent variables were measured by six observed indicators. Models 3 and 6 contained six correlated (latent) predictors ( ηr , ηs , ηt , ηu , ηv , ηw ) and one (latent) dependent variable ( ηy ). As in models 1 and 4, all latent variables were measured by three observed indicators. As before, the parameters of interest were the regression coefficients. For convenience, the true parameter values β for all models were set to one: ( βr=1 , βs=1 , βt=1 ,) βu=1 , βv=1 , and βw=1 . The true standardized parameter values βz for all models were: ( βrz=0.134 , βsz=0.155 , βtz=0.161 ,) βuz=0.134 , βvz=0.155 , and βwz=0.161 . A visualization of the models and their specific parameter values can be found in the Supplemental Material.

Analogous to the first simulation study, we compared the same seven estimators. Bounded estimation was again used for all estimators to obtain a higher number of converged solutions. In contrast to the first simulation study, the interest here lies in the robustness of the performance of the different specifications of α in the SSC. To assess the performance, we used the averaged MSE: the MSE was averaged over the regression coefficients for every estimator in each model separately. We simulated 10,000 samples for each of the 10 settings (5 different sample sizes × 2 reliabilities) per model.

Results Simulation Study 1

Based on the bias-variance trade-off, a certain pattern in the results was anticipated. On the one hand, we expected SEM and the MOC to be the best performing estimators with respect to bias, but the worst with respect to variability. On the other hand, we expected FSR and the SSC with the largest α-value to have the largest bias, but the lowest variability. Moreover, we expected the bias and variability to converge toward zero as the sample size grows larger, except for FSR and the SSC with the largest α-value, for which we anticipated the bias to remain even in large samples. Finally, we expected the patterns to be less distinct when the reliability is high. Tables with the values of the considered performance criteria for the various estimators and settings can be found in the Supplemental Material.

Convergence rate

Because methods (2) to (7) were adaptations of FSR, they would all either converge or fail to converge for the same dataset. Hence, we grouped these methods together only when assessing convergence rates and compared SEM-MLE and FSR. Table 2 reveals a high overall proportion of successful replications. The lowest convergence rates (99.30% and 99.89%) are detected for SEM-MLE in the conditions with low reliability and the smallest sample sizes. In fact, a converged solution is acquired 9,930 out of 10,000 times when N=20 , and 9,989 out of 10,000 times when N=30 , both in case of low reliability. From a sample size of 50 onward, the convergence rates reach approximately 100% for all conditions.

Table 2. Overview of convergence rates in percentage. Bounded estimation was used for all estimators. The latent variables were handled separately in the FSR methods, implying that convergence was only achieved if the factor analysis was successful for all four measurement models.

| N | Low reliability: SEM-MLE | Low reliability: FSR | High reliability: SEM-MLE | High reliability: FSR |
| --- | --- | --- | --- | --- |
| 20 | 99.30 | 99.84 | 99.96 | 99.96 |
| 30 | 99.89 | 99.94 | 100.00 | 100.00 |
| 50 | 100.00 | 99.97 | 100.00 | 100.00 |
| 100 | 100.00 | 100.00 | 100.00 | 100.00 |
| 200 | 100.00 | 100.00 | 100.00 | 100.00 |
| 2,000 | 100.00 | 100.00 | 100.00 | 100.00 |

Note. FSR = factor score regression; SEM-MLE = structural equation modeling–maximum likelihood estimation.

Bias

As expected, the results plotted in Figures 3 and 4 display that the mean bias decreases when the sample size grows. Similarly, the differences between the estimators become smaller, but remain larger with low reliability. In contrast to SEM-MLE, the MOC, and MOC-λ , the bias of FSR and the SSC does not converge to zero, as anticipated. Figures 3 and 4 show that the bias remains largest for FSR and the SSC with the largest α-value. This is in particular true for the model with low reliability.

Figure 3. Relative bias in case of low reliability.

Figure 4. Relative bias in case of high reliability.

The findings show identical results for the MOC and MOC-λ regarding (relative) bias in most conditions. The λ-correction was applied (which means that $\widehat{\mathrm{Var}(\eta)}_{\mathrm{Croon}}$ was not positive definite) mainly when the reliability was low and the sample size was small: in about 4.43% of the replications when N = 20 and 1.58% when N = 30. It was applied less frequently in three further conditions: 0.20% when the reliability was low and N = 50, 0.01% when the reliability was low and N = 100, and 0.06% when the reliability was high and N = 20. The findings suggest that the λ-correction does not have a beneficial effect on the (relative) bias.

Interestingly, the (relative) bias seems to depend on how large the regression coefficient is. If the true value is small (e.g. βu=0.5 ), then the bias is positive. If the true value becomes larger (e.g. βw=4 ), then the bias is negative.

Empirical standard deviation

As expected, the results suggest that the ESD decreases as the sample size grows. The differences between the estimators become smaller as well, especially in case of high reliability. The findings reveal no major differences between MOC-λ, FSR, and the SSC. The lowest ESD value is found for FSR and the SSC with the largest α-value. Figures 5 and 6 display similar results for all estimators from a sample size of 100 onwards.

Figure 5. Empirical standard deviation in case of low reliability (some values lie out of scale; these can be found in the corresponding table in the Supplemental Material).

Figure 6. Empirical standard deviation in case of high reliability.

The results imply that SEM-MLE is the worst performing estimator in the smaller sample sizes (N ≤ 50), in particular when the reliability is low. Figures 5 and 6 show that the MOC performed well in most conditions, except in the condition with the smallest sample size (N = 20) and low reliability. In this condition, MOC-λ outperformed the MOC, which suggests that the λ-correction contributes to a lower variability of the estimator. As mentioned above, the λ-correction is only used substantially when the reliability is low for the two smallest sample sizes. In all other conditions, the MOC and MOC-λ have identical results.

Mean squared error

As expected, the findings show decreasing MSE values as the sample size grows larger. Figures 7 and 8 indicate that the differences between the estimators become smaller as well, even more so in case of high reliability.

Figure 7.

Mean squared error in case of low reliability (some values lie out of scale, these can be found in the corresponding table in the Supplemental Material).

Figure 8.

Mean squared error in case of high reliability (some values lie out of scale, these can be found in the corresponding table in the Supplemental Material).

The overall worst performing method is SEM-MLE, specifically in the conditions with low reliability and small sample sizes. From a sample size of 100 onwards, SEM-MLE performs similarly to the other estimators. The MOC shows comparable results to the other FSR estimators. However, its performance is poor in the conditions with low reliability and a sample size of 20 or 30. Hence, the λ-correction contributes to a lower MSE in these conditions.

In most conditions, the best performing method in terms of MSE is FSR, followed by the SSC with the largest α-value. On the other hand, FSR is the worst performing estimator when considering the MSE of β^w in the two largest sample sizes (N = 200, 2,000).

Results Simulation Study 2

As in the first simulation study, the findings show decreasing average MSE values as the sample size grows larger. Figures 9 and 10 indicate that the differences between the estimators become smaller as well, especially in case of high reliability.

Figure 9.

Average mean squared error in case of low reliability (some values lie out of scale, more information regarding these values can be found in Table 3).

Figure 10.

Average mean squared error in case of high reliability (some values lie out of scale, more information regarding these values can be found in Table 4).

Regarding the best and worst performing methods, the same results as in the first simulation study can be observed. The worst performing methods regarding average MSE are SEM-MLE and the MOC. The best performing method is FSR, followed by the SSC. From a sample size of 100 onwards, the methods perform more alike. Additionally, the results reveal once more the relevance of the λ-correction in the smallest sample sizes ( N=30,50 ) when reliability is low. This can be verified in Figures 9 and 10, which summarize the average MSE values for the three models with low and high reliability (respectively).

In line with the earlier findings, FSR is the best performing method with respect to average MSE in most conditions (i.e., when N ≤ 200). To assess the robustness of the results obtained from the three specifications for α, we examine the average MSE of the methods relative to FSR. This relative average MSE makes the differences in performance between FSR and the other methods easy to see. The relative average MSE values are shown in Tables 3 and 4 for the three models with low and high reliability, respectively.
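The relative average MSE is each method's average MSE divided by that of FSR, so FSR always scores 1.00 and values above 1 indicate a worse average MSE than FSR. The arithmetic can be sketched in Python as follows; the function name and the input numbers are hypothetical and serve only as an illustration, they are not results from the study.

```python
def relative_to_reference(avg_mse, reference="FSR"):
    """Divide every method's average MSE by the reference method's value."""
    ref = avg_mse[reference]
    if ref <= 0:
        raise ValueError("the reference MSE must be positive")
    return {method: value / ref for method, value in avg_mse.items()}


# Hypothetical average MSE values (illustration only, not taken from the study).
example = {"FSR": 0.040, "SSC": 0.050, "MOC": 0.080, "SEM": 0.120}
relative = relative_to_reference(example)

for method, ratio in relative.items():
    print(f"{method}: {ratio:.2f}")
```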

Table 3.

Relative average mean squared error in case of low reliability.

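| Estimator | Model 1, N=30 | Model 1, N=50 | Model 1, N=100 | Model 1, N=200 | Model 1, N=2000 | Model 2, N=30 | Model 2, N=50 | Model 2, N=100 | Model 2, N=200 | Model 2, N=2000 | Model 3, N=30 | Model 3, N=50 | Model 3, N=100 | Model 3, N=200 | Model 3, N=2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FSR | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| SSC (α=(N−1)/2) | 1.29 | 1.30 | 1.29 | 1.23 | 0.72 | 1.22 | 1.20 | 1.19 | 1.16 | 0.91 | 1.43 | 1.42 | 1.38 | 1.31 | 0.86 |
| SSC (α=p+5) | 1.51 | 1.67 | 1.75 | 1.69 | 0.79 | 1.35 | 1.39 | 1.41 | 1.38 | 1.00 | 1.63 | 1.92 | 2.00 | 1.94 | 1.09 |
| SSC (α=p+1) | 1.72 | 1.79 | 1.81 | 1.72 | 0.79 | 1.45 | 1.45 | 1.43 | 1.40 | 1.00 | 1.99 | 2.17 | 2.10 | 1.98 | 1.09 |
| MOC-λ | 2.09 | 1.95 | 1.88 | 1.75 | 0.79 | 1.57 | 1.51 | 1.46 | 1.41 | 1.00 | 4.79 | 3.13 | 2.30 | 2.06 | 1.09 |
| MOC | 2.36 | 1.95 | 1.88 | 1.75 | 0.79 | 1.57 | 1.51 | 1.46 | 1.41 | 1.00 | 7926.05 | 4.03 | 2.30 | 2.06 | 1.09 |
| SEM | 12.17 | 3.68 | 2.16 | 1.86 | 0.80 | 2.08 | 1.65 | 1.51 | 1.43 | 1.00 | 32.00 | 15.84 | 2.83 | 2.21 | 1.09 |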

Note. FSR = factor score regression; SEM = structural equation modeling; MLE = maximum likelihood estimation; SSC = small sample correction.

Table 4.

Relative average mean squared error in case of high reliability.

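| Estimator | Model 4, N=30 | Model 4, N=50 | Model 4, N=100 | Model 4, N=200 | Model 4, N=2000 | Model 5, N=30 | Model 5, N=50 | Model 5, N=100 | Model 5, N=200 | Model 5, N=2000 | Model 6, N=30 | Model 6, N=50 | Model 6, N=100 | Model 6, N=200 | Model 6, N=2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FSR | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| SSC (α=(N−1)/2) | 1.09 | 1.08 | 1.08 | 1.07 | 0.93 | 1.05 | 1.05 | 1.04 | 1.04 | 1.00 | 1.12 | 1.10 | 1.08 | 1.08 | 0.96 |
| SSC (α=p+5) | 1.13 | 1.15 | 1.15 | 1.15 | 0.96 | 1.07 | 1.08 | 1.08 | 1.08 | 1.04 | 1.15 | 1.16 | 1.16 | 1.15 | 1.00 |
| SSC (α=p+1) | 1.16 | 1.16 | 1.16 | 1.15 | 0.96 | 1.09 | 1.09 | 1.09 | 1.08 | 1.04 | 1.19 | 1.18 | 1.17 | 1.16 | 1.00 |
| MOC-λ | 1.19 | 1.18 | 1.17 | 1.16 | 0.96 | 1.10 | 1.09 | 1.09 | 1.08 | 1.04 | 1.27 | 1.22 | 1.19 | 1.17 | 1.00 |
| MOC | 1.19 | 1.18 | 1.17 | 1.16 | 0.96 | 1.10 | 1.09 | 1.09 | 1.08 | 1.04 | 1.27 | 1.22 | 1.19 | 1.17 | 1.00 |
| SEM | 4.43 | 1.24 | 1.19 | 1.17 | 0.96 | 1.13 | 1.11 | 1.09 | 1.09 | 1.04 | 29.30 | 1.29 | 1.21 | 1.17 | 1.00 |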

Note. FSR = factor score regression; SEM = structural equation modeling; MLE = maximum likelihood estimation; SSC = small sample correction.

As expected, the largest relative average MSE values can be found for SEM-MLE, the MOC, and MOC-λ. The differences become smaller when the sample size grows and when the reliability of the indicators is high. The same patterns can be observed across the three different models. After FSR, the SSC with α=(N−1)/2 has the lowest average MSE values, followed by the specifications α=p+5 and α=p+1, respectively. Despite the different number of latent predictors and indicators, the findings suggest that the performance of the SSC with the three specifications for α is similar. Furthermore, the results indicate that the model with more indicators per latent variable has lower average MSE values.

Summary of Simulation Studies

In the first simulation study, the overall convergence rates were high for both SEM-MLE and the FSR methods. This high number of converged solutions suggests a beneficial effect of bounded estimation (in this case, standard bounds), as found earlier by De Jonckere and Rosseel (2022). The λ-correction was used mostly in the smallest sample sizes when reliability was low. Nevertheless, it had a profound impact on the ESD and MSE in the conditions with very small sample sizes. The results for the FSR methods were obtained using the Bartlett predictor; identical results can be obtained using the Regression predictor.

As mentioned earlier, the SSC performed well in terms of MSE in the considered conditions. The rationale of the SSC is to introduce some bias for less variability. This trade-off has a beneficial effect on the MSE in small sample estimation. Out of the three suggestions, the estimator with α=(N−1)/2 performed best in terms of MSE. Although it had the largest bias, combined with the smaller ESD it resulted in a lower MSE for most cases. We suggested the value (N−1)/2 as it adjusts to the sample size at hand and assures a mid-position between FSR and the MOC, but for use in small samples only. Additionally, the SSC with the suggestions from Fuller (1980, 1987) and Wall and Amemiya (2000) surpassed SEM-MLE, MOC, and MOC-λ in all conditions.
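The trade-off between bias and variability can be made explicit with the standard decomposition MSE = bias² + variance. The Python sketch below assumes that the MSE is the mean squared deviation from the true value and that the ESD divides by the number of replications (not by that number minus one), in which case the identity holds exactly. The function name and the two sets of estimates are hypothetical illustrations, not output from the simulation study.

```python
from statistics import fmean, pstdev


def summarize(estimates, true_value):
    """Return bias, empirical standard deviation, and MSE of an estimator."""
    bias = fmean(estimates) - true_value
    esd = pstdev(estimates)  # population form: divides by R, not R - 1
    mse = fmean([(est - true_value) ** 2 for est in estimates])
    return bias, esd, mse


true_beta = 0.5
# Hypothetical replications: one estimator is unbiased but noisy,
# the other is shrunken toward a smaller value but much more stable.
noisy_unbiased = [0.1, 0.9, 0.3, 0.7]
stable_biased = [0.3, 0.5, 0.35, 0.45]

for label, draws in (("unbiased", noisy_unbiased), ("shrunken", stable_biased)):
    bias, esd, mse = summarize(draws, true_beta)
    print(f"{label}: bias={bias:.3f}, ESD={esd:.3f}, "
          f"MSE={mse:.5f}, bias^2+ESD^2={bias ** 2 + esd ** 2:.5f}")
```

In this toy example the shrunken estimator has the larger bias but the smaller MSE, which is the pattern the SSC aims for in small samples.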

SEM-MLE and the MOC were the worst performing estimators regarding ESD and MSE, particularly in the smaller sample sizes. However, both estimators outperformed FSR and the SSC regarding bias in most conditions. SEM-MLE and the MOC were particularly superior in terms of bias in the case of low reliability and larger sample sizes. The results also showed decreasing bias of SEM-MLE and the MOC as the sample size grows. Furthermore, the ESD also decreases, which eventually (in larger sample sizes) results in a better performance compared to FSR and the SSC.

For the second simulation study, in each of the considered models, highly similar patterns were present in the results. This illustrates the robustness of the performance of the three α-values for the SSC in models varying in the number of latent predictors and indicators. Furthermore, the estimators performed slightly better in the model where latent variables were measured by more indicators.

Discussion

The goal of this article was to (a) evaluate the performance of (i) standard SEM, (ii) the standard MOC, (iii) naive FSR, and (iv) the MOC with the proposed SSC in small sample sizes and (b) assess the robustness of the performance of the different specifications for the SSC in different models. We assessed the estimators in terms of convergence rates, relative mean bias, ESD, and MSE. The comparison was performed in 12 different conditions, with varying sample sizes and reliability of the indicators.

The findings reveal FSR as the best overall performing estimator, followed by the SSC with the largest α-value. Only in the conditions where N=200 or N=2000 and reliability was low was FSR not the best performing estimator for the regression coefficient βw. In fact, FSR was then the worst performing estimator in terms of MSE in both cases. Therefore, the results suggest that both the reliability and the sample size at hand play an important role. Our findings are in line with those of Devlieger et al. (2016), where FSR was the worst performing estimator (compared to SEM-MLE and the MOC) regarding MSE in sample sizes above 300. Furthermore, the findings demonstrated that the performances of the different specifications of α were similar across the considered models. The average MSE values of all methods were lowest in the model with more indicators.

Although the FSR and SSC estimators had the best performance in most of the conditions, they do not take the measurement error (entirely) into account. This results in a lower MSE in small samples. However, despite the beneficial effect on the MSE, the consequences of not taking measurement error into account should not be neglected. On the one hand, the regression parameters can be over- or underestimated depending on the correlation structure between the latent variables (Cole & Preacher, 2014). On the other hand, an inflation of the type I error rate may occur under certain circumstances (Brunner & Austin, 2009). Moreover, the more complex the model becomes, the worse the consequences of both problems.

When analyzing relationships between latent variables, we encourage researchers not to choose a method without due consideration. Instead, we recommend considering the sample size, the complexity of the model, and the reliability of the indicators at hand, and using the method best suited to the situation. If the sample size is large enough (N ≥ 200), one might be better off using SEM-MLE or the MOC instead of FSR or the SSC. However, if the sample size is small (N ≤ 100), one might consider using FSR or the SSC instead of SEM-MLE or the MOC. Additionally, the goal of the research should be taken into account. If bias is important, unbiased methods, such as SEM and the MOC, are favored in larger sample sizes, or the SSC with the smallest α-value (α=p+1) if the sample size is small. If bias is not of primary importance (and a lower variability is), methods such as FSR and the SSC with the largest α-value (α=(N−1)/2) are favored. The SSC lets researchers choose a compromise between ignoring measurement error (i.e., using FSR) and taking measurement error into account (i.e., using SEM or the MOC).

Like all simulation studies, our study is not without limitations. The first limitation is that only certain conditions and models were examined. Although these conditions and models may be commonly found in the literature, different settings may lead to different results. The second limitation is that we have not considered the type I error rate and the power as performance criteria. Future research should look into hypothesis testing to see how well the SSC and other estimators perform in (small) sample sizes, preferably using a different simulation model to see how general the results obtained in this article are.

Supplemental Material

sj-zip-1-epm-10.1177_00131644221105505 – Supplemental material for A Small Sample Correction for Factor Score Regression

Supplemental material, sj-zip-1-epm-10.1177_00131644221105505 for A Small Sample Correction for Factor Score Regression by Jasper Bogaert, Wen Wei Loh and Yves Rosseel in Educational and Psychological Measurement

Acknowledgments

The authors would like to thank the editor and two reviewers for their comments on prior versions of this manuscript.

Appendix A: Formulas in Matrix Notation

The Statistical Model in Structural Equation Modeling

Consider a structural equation model with p observed variables and m latent variables. The measurement model is defined as:

y=ν+Λη+ϵ, (A1)

where y, ν, and ϵ are p×1 vectors of, respectively, the observed variables, intercepts, and residual errors. The p×m matrix Λ contains the factor loadings relating the latent to the observed variables. We denote Var(ϵ)=Θ. The structural model is defined as:

η=α+Bη+ζ, (A2)

where η, α, and ζ are m×1 vectors containing the latent variables, intercepts, and residual error terms, respectively. B is an m×m matrix containing regression coefficients. The diagonal elements of B must be zero, and (I−B) should be invertible. We assume that E(ϵ)=0, E(ζ)=0, Cov(η,ϵ)=0, and Cov(ϵ,ζ)=0. The model-implied variance–covariance matrix Var(y)=Σ is given by:

$$\Sigma = \Lambda\, \mathrm{Var}(\eta)\, \Lambda^{T} + \Theta, \tag{A3}$$

where

$$\mathrm{Var}(\eta) = (I - B)^{-1}\, \Psi\, (I - B)^{-T}. \tag{A4}$$
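Equations (A3) and (A4) translate directly into a few lines of linear algebra. The NumPy sketch below takes Ψ to be Var(ζ), which the text above does not state explicitly, and uses a hypothetical two-factor model (one latent predictor, one latent outcome, three indicators each) whose numerical values are chosen purely for illustration.

```python
import numpy as np


def implied_covariance(Lambda, B, Psi, Theta):
    """Return Var(eta) from (A4) and the model-implied Sigma from (A3)."""
    m = B.shape[0]
    inv_i_minus_b = np.linalg.inv(np.eye(m) - B)
    var_eta = inv_i_minus_b @ Psi @ inv_i_minus_b.T
    sigma = Lambda @ var_eta @ Lambda.T + Theta
    return var_eta, sigma


# Hypothetical model: eta_x -> eta_y with slope 0.5, three indicators per factor.
Lambda = np.kron(np.eye(2), np.ones((3, 1)))  # 6 x 2, all loadings equal to 1
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])                     # row = outcome, column = predictor
Psi = np.diag([1.0, 0.75])                     # assumed to be Var(zeta)
Theta = 0.5 * np.eye(6)                        # Var(epsilon)

var_eta, sigma = implied_covariance(Lambda, B, Psi, Theta)
print(var_eta)
print(sigma.shape)
```

With these values both latent variables have unit variance and their covariance equals the slope of 0.5.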

Factor Score Regression

The factor scores can be obtained by:

f=Ay, (A5)

with A the m×p factor score matrix. Using the Bartlett method (Bartlett, 1937), the factor score matrix is defined as:

$$A_{B} = (\Lambda^{T} \Theta^{-1} \Lambda)^{-1} \Lambda^{T} \Theta^{-1}. \tag{A6}$$

In the Regression method (Thomson, 1934; Thurstone, 1935), the factor score matrix is defined as:

$$A_{R} = \mathrm{Var}(\eta)\, \Lambda^{T} \Sigma^{-1}. \tag{A7}$$

In the second step (once the factor scores are calculated), an estimate of the regression parameters can be obtained as follows:

$$\hat{\beta}_{fs} = \widehat{\mathrm{Var}}(f_{x})^{-1}\, \widehat{\mathrm{Cov}}(f_{x}, f_{y}). \tag{A8}$$

This estimator is biased regardless of the sample size, except in the trivial setting where $\widehat{\mathrm{Cov}}(f_{x}, f_{y}) = 0$, or when there is no measurement error (ϵ=0) (Fuller, 1987).
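The bias of the naive estimator can be seen without any sampling by plugging population quantities into Equations (A5) to (A8). The sketch below is an illustration under stated assumptions: the model values are hypothetical, and the factor scores of both latent variables are computed from one joint measurement model, which may differ from how the scores are obtained in a given application.

```python
import numpy as np

# Hypothetical population model: true slope of eta_y on eta_x is 0.5.
Lambda = np.kron(np.eye(2), np.ones((3, 1)))  # 6 x 2 loading matrix
var_eta = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
Theta = 0.5 * np.eye(6)
Sigma = Lambda @ var_eta @ Lambda.T + Theta    # (A3)

Theta_inv = np.linalg.inv(Theta)
# Bartlett factor score matrix, Equation (A6).
A_bartlett = np.linalg.solve(Lambda.T @ Theta_inv @ Lambda,
                             Lambda.T @ Theta_inv)
# Regression factor score matrix, Equation (A7).
A_regression = var_eta @ Lambda.T @ np.linalg.inv(Sigma)


def naive_slope(A, Sigma):
    """Population slope of f_y on f_x for factor scores f = A y, as in (A8)."""
    var_f = A @ Sigma @ A.T
    return var_f[0, 1] / var_f[0, 0]


true_slope = var_eta[0, 1] / var_eta[0, 0]
slope_bartlett = naive_slope(A_bartlett, Sigma)
slope_regression = naive_slope(A_regression, Sigma)
print(true_slope, slope_bartlett, slope_regression)
```

In this particular setup the Bartlett scores attenuate the slope and the regression scores inflate it; neither recovers 0.5, even though no sampling error is involved.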

The Method of Croon

Croon (2002) presented his correction formulas in scalar form. We provide here the formulas in matrix form:

$$\mathrm{Var}(\eta)_{\mathrm{Croon}} = D^{-1}\, [\mathrm{Var}(f) - E]\, D^{-T}, \tag{A9}$$

with E an m×m matrix defined as:

E=AΘAT, (A10)

and where D is an m×m matrix defined as:

D=AΛ. (A11)

Note that when the Bartlett factor scores are used, D=I, but this is not true for regression factor scores. Using the Croon-corrected (co)variance matrix in Equation (A9), we can obtain unbiased estimates for the regression parameters:

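$$\hat{\beta}_{\mathrm{Croon}} = \widehat{\mathrm{Var}}(\eta_{x})_{\mathrm{Croon}}^{-1}\, \widehat{\mathrm{Cov}}(\eta_{x}, \eta_{y})_{\mathrm{Croon}}. \tag{A12}$$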
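A minimal NumPy sketch of Equations (A9) to (A12) is given below. It works with population quantities from a hypothetical model (not with estimates from data) to show that the correction recovers Var(η) exactly for both Bartlett and regression factor scores, and that D=I for the Bartlett scores. The function name and all numerical values are assumptions made for illustration.

```python
import numpy as np


def croon_corrected(var_f, A, Lambda, Theta):
    """Apply (A9)-(A11): D^{-1} [Var(f) - E] D^{-T}."""
    E = A @ Theta @ A.T          # (A10)
    D = A @ Lambda               # (A11)
    D_inv = np.linalg.inv(D)
    return D_inv @ (var_f - E) @ D_inv.T


# Hypothetical population model with a true slope of 0.5.
Lambda = np.kron(np.eye(2), np.ones((3, 1)))
var_eta = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
Theta = 0.5 * np.eye(6)
Sigma = Lambda @ var_eta @ Lambda.T + Theta

Theta_inv = np.linalg.inv(Theta)
A_bartlett = np.linalg.solve(Lambda.T @ Theta_inv @ Lambda,
                             Lambda.T @ Theta_inv)
A_regression = var_eta @ Lambda.T @ np.linalg.inv(Sigma)

corrected = {}
slopes = {}
for name, A in (("Bartlett", A_bartlett), ("Regression", A_regression)):
    var_f = A @ Sigma @ A.T                      # Var(f) for f = A y
    corrected[name] = croon_corrected(var_f, A, Lambda, Theta)
    # (A12) with a single predictor: Cov(eta_x, eta_y) / Var(eta_x).
    slopes[name] = corrected[name][0, 1] / corrected[name][0, 0]
    print(name, slopes[name])

D_bartlett = A_bartlett @ Lambda
```

With sample estimates in place of the population matrices the recovery is no longer exact, which is where the small sample behavior discussed in the article comes in.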

Small Sample Correction

To ensure that the SSC formulas are valid for both Bartlett and regression factor scores, we first define:

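$$\mathrm{Var}(f)^{*} = D^{-1}\, \mathrm{Var}(f)\, D^{-T} \tag{A13}$$

and

$$E^{*} = D^{-1} E D^{-T}. \tag{A14}$$

Equation (A9) can then be rewritten as:

$$\mathrm{Var}(\eta)_{\mathrm{Croon}} = \mathrm{Var}(f)^{*} - E^{*}. \tag{A15}$$

The λ-correction can be expressed as follows:

$$\widehat{\mathrm{Var}}(\eta)_{\lambda} =
\begin{cases}
\widehat{\mathrm{Var}}(f)^{*} - \hat{E}^{*} & \text{if } \hat{\lambda} \geq 1 + (N-1)^{-1}, \\
\widehat{\mathrm{Var}}(f)^{*} - [\hat{\lambda} - (N-1)^{-1}]\, \hat{E}^{*} & \text{if } \hat{\lambda} < 1 + (N-1)^{-1},
\end{cases} \tag{A16}$$

where N is the sample size, and $\hat{\lambda}$ is the smallest root of the determinantal equation $|\widehat{\mathrm{Var}}(f)^{*} - \hat{\lambda} \hat{E}^{*}| = 0$. This decision based on $\hat{\lambda}$ ensures that $\widehat{\mathrm{Var}}(\eta)_{\lambda}$ is positive definite. The α-correction then proceeds as follows:

$$\widehat{\mathrm{Var}}(\eta)_{ssc} = \widehat{\mathrm{Var}}(\eta)_{\lambda} + \alpha (N-1)^{-1} \hat{E}^{*}, \tag{A17}$$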

where α is a fixed number to be specified. Once the elements of the variance–covariance matrix are corrected, it is possible to obtain estimates of the regression parameters β :

$$\hat{\beta}_{ssc} = \widehat{\mathrm{Var}}(\eta_{x})_{ssc}^{-1}\, \widehat{\mathrm{Cov}}(\eta_{x}, \eta_{y})_{ssc}. \tag{A18}$$
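The two steps of the SSC, Equations (A16) and (A17), can be sketched in NumPy as follows. The sketch assumes that its inputs are the transformed matrices of Equations (A13) and (A14) (for Bartlett scores these are simply Var(f) and E, since D=I) and that the error matrix is positive definite. It finds the smallest root of the determinantal equation through an equivalent symmetric eigenvalue problem, which is one possible implementation choice rather than the authors' code. Function names and input values are hypothetical.

```python
import numpy as np


def smallest_root(var_f, E):
    """Smallest lambda solving |var_f - lambda * E| = 0 (E positive definite)."""
    L = np.linalg.cholesky(E)
    L_inv = np.linalg.inv(L)
    # The roots equal the eigenvalues of the symmetric matrix L^{-1} var_f L^{-T}.
    return np.linalg.eigvalsh(L_inv @ var_f @ L_inv.T).min()


def ssc_covariance(var_f, E, n, alpha):
    """Lambda-correction (A16) followed by the alpha-correction (A17)."""
    lam = smallest_root(var_f, E)
    threshold = 1.0 + 1.0 / (n - 1)
    if lam >= threshold:
        var_eta_lambda = var_f - E
    else:
        var_eta_lambda = var_f - (lam - 1.0 / (n - 1)) * E
    return var_eta_lambda + alpha / (n - 1) * E


n = 20
alpha = (n - 1) / 2  # the largest of the three suggested alpha-values

# Hypothetical inputs for which Var(f) - E is NOT positive definite.
var_f_bad = np.array([[0.30, 0.10],
                      [0.10, 0.80]])
E_bad = np.diag([0.35, 0.20])
lam_bad = smallest_root(var_f_bad, E_bad)
fixed = ssc_covariance(var_f_bad, E_bad, n, alpha=0.0)   # lambda-correction only
ssc_bad = ssc_covariance(var_f_bad, E_bad, n, alpha)

# Hypothetical well-behaved inputs: the lambda-correction is not triggered.
var_f_ok = np.array([[1.2, 0.5],
                     [0.5, 1.2]])
E_ok = 0.2 * np.eye(2)
ssc_ok = ssc_covariance(var_f_ok, E_ok, n, alpha)

beta_ssc = ssc_ok[0, 1] / ssc_ok[0, 0]  # (A18) with a single predictor
print(lam_bad, beta_ssc)
```

In the well-behaved case with α=(N−1)/2 the result equals Var(f) minus half of E, that is, a mid-position between the uncorrected matrix used by FSR and the fully corrected matrix used by the MOC.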

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Wen Wei Loh was supported by the Special Research Fund (BOF) of Ghent University postdoctoral fellowship BOF.PDO.2020.0045.01.

ORCID iD: Jasper Bogaert https://orcid.org/0000-0003-0904-8409

Supplemental Material: Supplementary Materials can be found online: https://osf.io/bgd23/?view_only=fa12678fc9134ba9a4dc805a273072a0

References

  1. Anderson J. C., Gerbing D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49(2), 155–173.
  2. Bartlett M. S. (1937). The statistical conception of mental factors. British Journal of Psychology, 28, 97–104.
  3. Bentler P. M., Yuan K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34(2), 181–197.
  4. Bollen K. A. (1989). Structural equations with latent variables. John Wiley & Sons.
  5. Boomsma A. (1985). Nonconvergence, improper solutions, and starting values in LISREL maximum likelihood estimation. Psychometrika, 50(2), 229–242.
  6. Brunner J., Austin P. C. (2009). Inflation of type I error rate in multiple regression when independent variables are measured with error. Canadian Journal of Statistics, 37(1), 33–46.
  7. Buonaccorsi J. P. (2010). Measurement error: Models, methods, and applications. Chapman & Hall/CRC.
  8. Carroll R. J., Ruppert D., Stefanski L. A., Crainiceanu C. M. (2006). Measurement error in nonlinear models: A modern perspective. CRC Press.
  9. Cole D. A., Preacher K. J. (2014). Manifest variable path analysis: Potentially serious and misleading consequences due to uncorrected measurement error. Psychological Methods, 19(2), 300.
  10. Cox D. R., Hinkley D. V. (1979). Theoretical statistics. CRC Press.
  11. Croon M. (2002). Using predicted latent scores in general latent structure models. In Marcoulides G., Moustaki I. (Eds.), Latent variable and latent structure models (pp. 195–223). Lawrence Erlbaum.
  12. De Jonckere J., Rosseel Y. (2022). Using bounded estimation to avoid nonconvergence in small sample structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 29(3), 1–16.
  13. Devlieger I., Mayer A., Rosseel Y. (2016). Hypothesis testing using factor score regression: A comparison of four methods. Educational and Psychological Measurement, 76(5), 741–770. 10.1177/0013164415607618
  14. Devlieger I., Rosseel Y. (2017). Factor score path analysis: An alternative for SEM? Methodology, 13, 31–38.
  15. Fuller W. A. (1980). Properties of some estimators for the errors-in-variables model. The Annals of Statistics, 8(2), 407–422.
  16. Fuller W. A. (1987). Measurement error models. John Wiley & Sons.
  17. Hastie T., Tibshirani R., Friedman J. (2006). The elements of statistical learning: Data mining, inference, and prediction. Springer.
  18. James G., Witten D., Hastie T., Tibshirani R. (2013). An introduction to statistical learning (Vol. 112). Springer.
  19. Lastovicka J. L., Thamodaran K. (1991). Common factor score estimates in multiple regression problems. Journal of Marketing Research, 28(1), 105–112.
  20. Nevitt J., Hancock G. R. (2004). Evaluating small sample approaches for model test statistics in structural equation modeling. Multivariate Behavioral Research, 39(3), 439–478. 10.1207/S15327906MBR3903_3
  21. Nunnally J. C., Bernstein I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill, Inc.
  22. Ozenne B., Fisher P. M., Budtz-Jørgensen E. (2020). Small sample corrections for Wald tests in latent variable models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69(4), 841–861.
  23. R Core Team. (2020). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
  24. Rosseel Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
  25. Rosseel Y. (2020). Small sample solutions for structural equation modeling. In van de Schoot R., Miocević M. (Eds.), Small sample size solutions: A guide for applied researchers and practitioners (pp. 226–238). Routledge.
  26. Shevlin M., Miles J. N., Bunting B. P. (1997). Summated rating scales. A Monte Carlo investigation of the effects of reliability and collinearity in regression models. Personality and Individual Differences, 23(4), 665–676.
  27. Skrondal A., Laake P. (2001). Regression among factor scores. Psychometrika, 66(4), 563–575.
  28. Smid S. C., Rosseel Y. (2020). SEM with small samples: Two-step modeling and factor score regression versus Bayesian estimation with informative priors. In van de Schoot R., Miocević M. (Eds.), Small sample size solutions: A guide for applied researchers and practitioners (pp. 239–254). Routledge.
  29. Takane Y., Hwang H. (2018). Comparisons among several consistent estimators of structural equation models. Behaviormetrika, 45(1), 157–188.
  30. Thomson G. (1934). The meaning of “i” in the estimate of “g”. British Journal of Psychology, 25, 92–99.
  31. Thurstone L. L. (1935). The vectors of mind. University of Chicago Press.
  32. Wall M. M., Amemiya Y. (2000). Estimation for polynomial structural equation models. Journal of the American Statistical Association, 95(451), 929–940.

Articles from Educational and Psychological Measurement are provided here courtesy of SAGE Publications
