Abstract
The item-position effect describes how an item’s position within a test, that is, the number of previously completed items, affects the response to this item. Previously, this effect was represented by constraints reflecting simple courses, for example, a linear increase. Because these representations are inflexible, our aim was to examine whether adapted representations are more appropriate than the existing ones. Models of confirmatory factor analysis were used for testing the different representations. Analyses were conducted by means of simulated data that followed the covariance pattern of Raven’s Advanced Progressive Matrices (APM) items. Since the item-position effect has been demonstrated repeatedly for the APM, it is a very suitable measure for our investigations. Results revealed no remarkable improvement by using an adapted representation. Possible reasons for these results are discussed.
Keywords: APM, confirmatory factor analysis, item-position effect, simulation
The term item-position effect denotes the dependency of the response to a specific item in a sequence of homogeneous items on the position of this item within the sequence. In other words, as stated by Knowles (1988), the experience with early items alters the answers to later items. The item-position effect has been identified in the frameworks of different methodological approaches, for example, in experimental studies, in item-response theory (IRT), or in factor analysis. The present work is located in the factor-analytic approach. There are different explanations regarding the source underlying the item-position effect. So far, fatigue, persistence, impulsivity, and executive attention have been considered as sources (Hartig & Buchholz, 2012; Kubinger, 2008; Lozano, 2015; Ren, Goldhammer, Moosbrugger, & Schweizer, 2012). However, probably the most attention has been given to learning (Embretson, 1991; Verguts & De Boeck, 2000). For example, there is empirical evidence suggesting the ability of complex learning as an important source of the item-position effect (Ren, Wang, Altmeyer, & Schweizer, 2014). Despite the available studies investigating possible sources, the question of how to represent the item-position effect most appropriately is still in need of a conclusive answer since only a few specific representations have been included in studies so far. The confirmatory factor models of the available studies included representations based on what we will call basic functions: linear, quadratic, or logarithmic functions (Schweizer, 2012; Schweizer, Schreiner, & Gold, 2009; Schweizer, Troche, & Rammsayer, 2011). Therefore, our aim was to examine the item-position effect by means of models not only assuming a course of the effect according to such a basic function, but also an adaptive course taking properties of the items into account.
So far, only real data have been investigated since the focus has been on the identification of the effect. Investigating the validity of the method for representing the item-position effect, however, has been neglected. For this purpose, simulation studies are especially well suited (Bandalos & Gagné, 2012). A crucial precondition for a simulation study is the appropriateness of the data. Since it is still unclear what characterizes a covariance pattern of data showing the item-position effect, data were simulated according to a pattern that was derived from data collected by means of Raven’s Advanced Progressive Matrices (APM; Raven, Raven, & Court, 1997), a widely used measure of fluid intelligence. In this respect, the simulation method of this study deviates from the method that is usually selected for a simulation study. Raven’s APM is especially suited for investigating the item-position effect since it is a homogeneous scale regarding the ability that is required for solving the items. Furthermore, the item-position effect has been found repeatedly in the APM (Lozano, 2015; Schweizer et al., 2009).
In general, the item-position effect can be observed in different personality scales as well as achievement tests. One of its most important characteristics is an increasing item reliability from the first to the last items within a scale, which means an increase in the relative amount of true variance (Hamilton & Shuminsky, 1990; Hartig, Hölzel, & Moosbrugger, 2007; Knowles, 1988; Knowles & Byers, 1996). As already stated by Mollenkopf (1950), the impact of the item-position effect can be examined in two different ways: either focusing on item difficulty and item means, or considering item discrimination and item variance and covariance, respectively. Current research also reflects this distinction, as the item-position effect is mostly investigated either in the framework of IRT (focusing on item characteristics) or factor analysis (focusing on variance and covariance). In the IRT framework either linear logistic test models or multidimensional Rasch models are used for studying the item-position effect (Embretson, 1991; Hohensinn et al., 2008; Kubinger, 2008; Verguts & De Boeck, 2000). In the factor-analytic framework, confirmatory factor analysis (CFA) is conducted since it allows decomposing the true variance of, for example, an achievement measure into two components: an ability-specific component and a position-specific component (Schweizer et al., 2009; Schweizer et al., 2011). A main characteristic of this approach is to constrain factor loadings according to theory-based expectations. This makes it possible to specify assumptions regarding the course of the item-position effect. In previous research this course was represented solely by functions assuming a consistent slope from the first to the last item, for instance, a linear or a quadratic slope. However, so far there is no function that can be seen as the standard way of representing the item-position effect.
Models Used in Research on the Item-Position Effect
The CFA model, which is used in the present study, can be perceived as related to the essentially tau-equivalent model of measurement (Lord & Novick, 1968), which assumes one latent variable and is extended by a constant in recent presentations (Graham, 2006). In this model of measurement, the vector of observations x is set equal to the sum of the vector of intercepts µ, the product of the vector of factor loadings λ and the latent variable η, and the vector enclosing the error components ε:
$$\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\lambda}\eta + \boldsymbol{\varepsilon} \tag{1}$$
The main characteristic of this model of measurement is that it includes as many equations as there are observations. Additionally, all factor loadings (λ1 to λp) are set to the same value. Here, this common value is defined as λτ:
$$\lambda_1 = \lambda_2 = \dots = \lambda_p = \lambda_\tau \tag{2}$$
Investigating the item-position effect requires an extension of Equation 1. For representing the item-position effect, a second component consisting of the product of the vector of factor loadings λe and the additional latent variable ηe is added to Equation 1:
$$\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\lambda}\eta + \boldsymbol{\lambda}_e\eta_e + \boldsymbol{\varepsilon} \tag{3}$$
Since variances and covariances are analyzed in the framework of CFA, the model of measurement needs to be “translated” to a CFA model. In this model, Σ is the p × p model-implied matrix of variances and covariances of p observations, Λ is the p × q matrix of factor loadings (here, q = 2 since two latent variables are included in the model), Φ is the q × q matrix of latent variances and covariances, and Θ is the p × p diagonal matrix of error variances:
$$\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \tag{4}$$
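As a hedged illustration of how a two-factor CFA model implies a covariance matrix, the following minimal sketch computes the sum of the loading-weighted latent (co)variances and the diagonal error variances. All numerical values, the assumption of uncorrelated latent variables, and the function name `implied_covariance` are hypothetical and chosen for illustration only; they are not taken from the study.

```python
import numpy as np

def implied_covariance(loadings, phi, theta_diag):
    """Model-implied covariance matrix: Lambda Phi Lambda' + Theta."""
    loadings = np.asarray(loadings, dtype=float)           # p x q loading matrix
    phi = np.asarray(phi, dtype=float)                     # q x q latent (co)variances
    theta = np.diag(np.asarray(theta_diag, dtype=float))   # p x p diagonal error matrix
    return loadings @ phi @ loadings.T + theta

p = 6                                       # hypothetical number of items
ability = np.full(p, 0.5)                   # equal loadings on the ability factor (assumed value)
position = np.arange(1, p + 1) / p          # increasing loadings on the position factor (assumed course)
Lambda = np.column_stack([ability, position])
Phi = np.diag([1.0, 0.4])                   # assumed latent variances, uncorrelated factors
Sigma = implied_covariance(Lambda, Phi, np.full(p, 0.3))
```

In this sketch, the covariance between two items grows with the product of their position loadings, which is how a position-specific latent variable produces larger covariances among later items.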
For estimating the model, the empirical matrix S of observed variances and covariances serves as input to CFA. However, there is still a problem since in the case of ability and achievement measures, the empirical matrix S is based on dichotomous and binomially distributed data, whereas data are assumed to be continuous and normally distributed. In the context of CFA, there are different ways of dealing with this discrepancy. For example, either tetrachoric correlations serve as input to CFA or item factor analysis is conducted. However, both procedures include the estimation of thresholds assuming normally distributed latent values (Bock, Gibbons, & Muraki, 1988; Muthén, 1984). This demands a high quality of the data for achieving accurate parameter estimates, especially if items are very easy or very difficult. Since this precondition is frequently not given, especially if items show a broad range of difficulties, it is recommended to combine both procedures with the robust maximum likelihood estimation method (Finney & DiStefano, 2013). Although a good model fit is achievable by robust estimation, there is still a major limitation: The effect of this estimation method is restricted to model fit and there is no improvement regarding the quality of the parameter estimates. However, for investigating the item-position effect, a high quality of the parameter estimates is essential. Besides those procedures, diagonally weighted least squares estimation has been proposed to deal with the problem of dichotomous data. This estimation method does not demand data showing multivariate normality and is appropriate for categorical data. In this context, it is recommended to use robust estimation, which is realized by using the full asymptotic covariance matrix as a weight matrix (DiStefano & Morgan, 2014; Li, 2015).
Moreover, there is another way of conducting CFA. This approach, referred to as the threshold-free approach (TfA), avoids the estimation of thresholds (Schweizer, 2013; Schweizer, Ren, & Wang, 2015). In this approach, the empirical matrix S consists of probability-based covariances (McDonald & Ahlawat, 1974; Schweizer et al., 2015), in order to transform the scale from dichotomous to continuous. These covariances are computed from the probabilities that the items are answered correctly as well as the probabilities that pairs of items are answered correctly. Furthermore, a link transformation of variances and covariances is used to obtain values that reflect the transformation of the data distribution from binomial to normal. This link transformation is realized by assigning item-specific weights wi (i = 1, . . . , p), which are a function of the probability (Pr) that the response to item Xi (i = 1, . . . , p) is correct:
Although the linking of variances has been considered in the development of the generalized linear model (McCullagh & Nelder, 1984), most of the research work on link functions has concentrated on scores using functions from the exponential family.
The item-specific weights that are obtained by Equation 5 are inserted as the main diagonal elements into the diagonal matrix W that is added to the CFA model. In this extended model, the item-specific weights are multiplied by the factor loadings. As a result, the weighted version of the CFA model is achieved:
$$\boldsymbol{\Sigma} = \mathbf{W}\boldsymbol{\Lambda}\boldsymbol{\Phi}(\mathbf{W}\boldsymbol{\Lambda})' + \boldsymbol{\Theta} \tag{6}$$
This threshold-free approach of analyzing dichotomous data can be combined with the normal maximum likelihood estimation method. It needs to be added that this approach additionally includes a disattenuation step. However, this step has to occur after the parameter estimation and can be omitted since it is not item-specific and, therefore, does not influence model fit or the relations among the factor loadings.
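The probability-based covariances described above can be sketched as follows. For two dichotomous items, the covariance equals the probability that both items are answered correctly minus the product of the two probabilities of a correct response. The function name `probability_based_covariances` and the toy response matrix are hypothetical; the link transformation by item-specific weights and the disattenuation step are deliberately not included in this sketch.

```python
import numpy as np

def probability_based_covariances(responses):
    """Covariances of 0/1 items computed from response probabilities."""
    x = np.asarray(responses, dtype=float)    # N x p matrix of 0/1 responses
    n = x.shape[0]
    p_correct = x.mean(axis=0)                # Pr(X_i = 1) for every item
    p_joint = (x.T @ x) / n                   # Pr(X_i = 1 and X_j = 1) for every pair
    return p_joint - np.outer(p_correct, p_correct)

# Hypothetical toy data: four respondents, two items
toy = [[1, 1], [1, 0], [0, 0], [1, 1]]
S = probability_based_covariances(toy)
```

The diagonal elements reduce to Pr(Xi = 1)[1 − Pr(Xi = 1)], the variance of a dichotomous variable, so very easy or very difficult items contribute little variance to the matrix.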
In general, it should be added that evaluation studies focusing on CFA model fit usually omit the simulation of data that reflect very or extremely easy items. Therefore, procedures performing well in the range between moderately easy and moderately difficult items may fail if some items are very or extremely easy. As a consequence, there is a lack of empirical evidence for deciding about the most appropriate procedure. Hence, different ways of conducting CFA were applied in the present study.
Representing the Item-Position Effect
The CFA model that is used for investigating the item-position effect is characterized by constrained discrimination, which is also a major characteristic of the one-parameter IRT model (i.e., the Rasch model; Rasch, 1980). In the CFA context, this is realized by using constrained factor loadings. As mentioned earlier, literature suggests that the major characteristic of the item-position effect is its increase from the first to the last items of a scale, as also highlighted by Knowles (1988). However, it is still unclear whether the course of the item-position effect is specific for different scales or test situations. Moreover, it is unclear whether the item-position effect is independent of properties of items or not. If there is such dependency, the response to a specific item may be more influenced by the characteristics of neighboring items than by more distant items. Accordingly, our research strategy comprises both investigations of all the items of a scale and investigations of different sections of neighboring items, also denoted as subsets of items.
So far, the available studies employ three different functions to describe the course of the item-position effect (Schweizer, 2012): linear, quadratic, or logarithmic functions. In the following, they are referred to as basic functions. First, there is the linear function that provides the simplest way of representing a constantly increasing course, following the principle of economy. It provides a numerical value for every item i (i = 1, . . . , p) out of a set of p items. In order to keep the predicted values within the interval between zero and one, the following function is used to describe a linear course of the item-position effect:
Second, the quadratic function is used which is mainly characterized by its increasing slope. For the first few items only a very small effect is assumed, but there is an accelerating increase. Such a course may be appropriate if, for instance, complex learning is the source of the item-position effect. In this case, the complexity of the demands may delay the observable outcome of learning:
Again, all outcomes stay within the interval of zero and one.
As a third alternative, the logarithmic function is considered. The logarithmic function implies a course that is roughly the opposite of the quadratic one, as there is a steep slope at the beginning, which decreases as the item number rises. The decreasing slope of this function reflects that there may be an upper limit of the item-position effect that is approached asymptotically:
As , again all outcomes stay between zero and one.
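To make the three basic courses concrete, the following minimal sketch generates loading constraints for p items. The specific formulas used here, i/p for the linear course, (i/p)² for the quadratic course, and ln(i)/ln(p) for the logarithmic course, are assumptions chosen only because they match the verbal descriptions above (values between zero and one, accelerating versus decelerating increase); the exact equations of the study may differ. The function name `basic_loadings` is hypothetical.

```python
import numpy as np

def basic_loadings(p, kind):
    """Return assumed position-factor constraints for items 1..p."""
    i = np.arange(1, p + 1, dtype=float)
    if kind == "linear":
        return i / p                      # assumed form: constant increase
    if kind == "quadratic":
        return (i / p) ** 2               # assumed form: accelerating increase
    if kind == "logarithmic":
        return np.log(i) / np.log(p)      # assumed form: decelerating increase
    raise ValueError(f"unknown course: {kind}")

courses = {kind: basic_loadings(36, kind)
           for kind in ("linear", "quadratic", "logarithmic")}
```

Under these assumed forms, all three courses end at one for the last item, whereas the early items receive very small values under the quadratic course and comparatively large values under the logarithmic course.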
All the functions presented above describe simple courses, implicitly assuming that the properties of the individual items do not influence the course of the item-position effect. However, such a simple course may be an inappropriate assumption, especially if neighboring items show different degrees of similarity, for example, large differences in item difficulty. Regarding the APM, the demand of different rules for solving the items (Carpenter, Just, & Shell, 1990) may cause a deviation of the item-position effect’s course from such a simple one. Thus, it is necessary to investigate different sections within a sequence of items. This can be done by subdividing the sequence into sections of neighboring items, also referred to as subsets, and investigating the course of the item-position effect within each subset. The results observed for subsets can provide the basis for a piecewise function that reflects characteristics of different subsets and, therefore, may describe a better representation of the item-position effect. Such a piecewise function may identify different courses for different subsets of items. The upper boundary of each subset is defined by , where , the total number of items within the scale. Accordingly, the jth subset contains the items with the numbers . For the jth subset the function is assumed:
The intercept is added to achieve a smooth transition from one section to the next one, for example,
The described functions provide values for representing the course of the item-position effect. In the CFA model, the representation is achieved by setting the factor loadings for the position-specific latent variable equal to these values. Accordingly, the functions (7) to (10) provide values for the corresponding elements of the factor loading matrix Λ (where λij describes the factor loading of the ith item on the jth latent variable):
Please note that the factor loadings on the first (ability-specific) latent variable have the same value for every item i (λi1 = λτ for all i = 1, . . . , p).
The Present Study
Research Objectives
The present study aimed at the identification of the actual course of the item-position effect in the APM. So far, in the context of CFA research only the three basic functions described in the previous section were taken into account when investigating the item-position effect. Hence, the question arose whether a more sophisticated representation is possible and whether it is necessary to adapt the representation of the item-position effect to the characteristics of the considered measure. Therefore, the main objective was to conduct a very detailed investigation of the course of the item-position effect in the APM. This was accomplished by means of a simulation study in order to decrease the dependency on one specific data set. Accordingly, the present study is structured in three tasks. First, a population matrix that served as a basis for data simulation was constructed in such a way that the simulated data resembled real data obtained with Raven’s APM. This was an important task since our aim was to investigate the item-position effect with regard to real-world characteristics. The second task was to analyze small subsets of items by means of the existing, commonly used representations of the item-position effect in order to describe its course in the best possible way. Finally, in the third task, the obtained results were used to create new representations that were applied to the same data and compared to the representations used so far. Thus, we tested whether an adapted representation of the item-position effect performed better than the simpler, nonadaptive ones or whether these simpler representations were sufficient for representing the item-position effect. These main tasks were complemented by a brief comparison of two ways of conducting CFA.
Method
Data Generation
Our aim was to simulate data that showed the same pattern as real data. Therefore, instead of basing the simulation on specific models, whose assumptions (e.g., regarding the number of latent variables) would determine the structure of the data, data were simulated according to the covariance pattern found in data obtained by means of Raven’s APM. As a result, the simulated data had the advantage of showing a pattern that would also be found in a real-world situation.
The 36 × 36 covariance matrix of the 36 items of Raven’s APM Set II, which was obtained from a sample of 530 university students, served as a basis for the simulation. In a first step, this covariance matrix, also denoted as population matrix, was smoothed since we assumed that some large differences between the neighboring coefficients (e.g., initially negative covariances) reflected specificities of the sample and would disappear in a much larger sample. A second important step was the disattenuation of the smoothed population matrix, that is, the estimation of the covariances of the continuous data that gave rise to the observed covariances based on dichotomous data. This step changed the smoothed population matrix into what is referred to as the simulation pattern. This additional step was necessary because in data simulation continuous data were generated in the first step and transformed into dichotomous data subsequently. Without it, the resulting covariances would be smaller than the covariances of the smoothed population matrix. To prevent this, the smoothed population matrix was adjusted in such a way that the mean deviation between the values contained in the simulated matrices and the corresponding values in the smoothed population matrix was at a minimum level. The mean of the absolute differences was 0.00156 (SD = 0.00135).
The actual simulation was conducted by means of the procedure proposed by Jöreskog and Sörbom (2001). First, 200 matrices of size 2,000 × 36 were generated, initially containing continuous, normally distributed, uncorrelated data. The 36 columns reflected the 36 items of Raven’s APM Set II, and the 2,000 rows a sample size of N = 2,000. In the second step, the matrices were recalculated by using weights so that they showed a covariance pattern similar to the simulation pattern. These data were still continuous and therefore served as the starting point for the dichotomization in order to obtain data that were similar to real APM data. The dichotomization was done according to the proportions of correct responses that were found for the population sample of 530 participants. For example, the 94.6% highest values of the first column were replaced by the number 1, and the remaining 5.4% lowest values were replaced by the number 0. The obtained dichotomous data served as the basis for computing matrices of probability-based covariances, which provided the input to CFA.
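The three simulation steps described above (uncorrelated normal data, reweighting toward a target covariance pattern, dichotomization by proportion correct) can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure of Jöreskog and Sörbom (2001) itself: the weights are taken from a Cholesky factor of the target matrix, which is one common way of imposing a covariance pattern, and the 3 × 3 target matrix, the proportions, and the function name `simulate_dichotomous` are hypothetical.

```python
import numpy as np

def simulate_dichotomous(target_cov, prop_correct, n, rng):
    """Simulate n x p dichotomous responses from a target covariance pattern."""
    target_cov = np.asarray(target_cov, dtype=float)
    prop_correct = np.asarray(prop_correct, dtype=float)
    p = target_cov.shape[0]
    z = rng.standard_normal((n, p))            # step 1: uncorrelated normal data
    weights = np.linalg.cholesky(target_cov)   # assumed weighting: Cholesky factor
    y = z @ weights.T                          # step 2: covariance of y approximates target_cov
    x = np.zeros((n, p), dtype=int)
    for j in range(p):
        # step 3: the highest prop_correct[j] share of column j becomes 1, the rest 0
        cut = np.quantile(y[:, j], 1.0 - prop_correct[j])
        x[:, j] = (y[:, j] > cut).astype(int)
    return x

# Hypothetical three-item pattern; 0.946 mirrors the first APM item in the text
target = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]]
props = [0.946, 0.7, 0.3]
data = simulate_dichotomous(target, props, n=2000, rng=np.random.default_rng(0))
```

Because dichotomization shrinks the covariances relative to those of the continuous data, a sketch like this also illustrates why the population matrix had to be disattenuated before it could serve as the simulation pattern.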
Approach for Data Analyses
In order to conduct an investigation that reflects details of the course of the item-position effect, in addition to analyzing the items of the measure as a whole, the 36 APM items were subdivided into two, three, four, and six same-sized subsets of neighboring items. That means, for example, the division into two sections generated two subsets of 18 items each, one containing Items 1 to 18, and the second containing Items 19 to 36. The other subsets were generated in the same way, and all subsets were analyzed independently.
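The subdivision into same-sized subsets of neighboring items can be sketched as follows. The function name `neighboring_subsets` is hypothetical; the item count of 36 and the numbers of subsets (one, two, three, four, and six) are taken from the text.

```python
def neighboring_subsets(n_items, n_subsets):
    """Split items 1..n_items into equally sized blocks of neighboring items."""
    if n_items % n_subsets != 0:
        raise ValueError("n_items must be divisible by n_subsets")
    size = n_items // n_subsets
    # each subset is given as (first item number, last item number)
    return [(k * size + 1, (k + 1) * size) for k in range(n_subsets)]

# The whole scale plus the divisions into two, three, four, and six subsets
partitions = {k: neighboring_subsets(36, k) for k in (1, 2, 3, 4, 6)}
```

Counting the whole scale as one set, this yields 16 item sets in total, each of which is analyzed independently.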
In a second step, piecewise functions were created for representing the course of the item-position effect across all items. This was done based on the results achieved for the subdivision into six subsets, as this yielded the most detailed description of the course. Two different approaches were used for specifying a piecewise function. Following the first approach, the best representation was determined for each subset of six items based on the model fit results. Accordingly, these courses were included in the new piecewise function, yielding a first adapted course. For the second approach, the variance of the position-specific latent variable was additionally taken into account. If this latent variance was significant, the best fitting representation was included in the piecewise function for this subset of items. If the latent variance was not significant, no item-position effect was assumed for this subset.
Furthermore, a cross-validation was conducted in order to check the stability of our results. For this purpose, the adapted representations were created based on a set of 100 matrices (analogous to the described approach), whereas the second set of 100 matrices served as hold-out sample. Subsequently, the obtained adapted representations were applied to the hold-out sample.
Models
On the one hand, there was the one-factor model including one latent variable only, but no representation of the item-position effect. In this model, all factor loadings were constrained to the same number (λτ), assuming a uniform influence across all items. Regarding the APM scale as a whole, a one-factor model with free factor loadings was estimated additionally. On the other hand, there were different two-factor models including a second latent variable for representing the item-position effect (see Equation 6). In total, five different representations of the item-position effect were applied, leading to five different two-factor models. A linear, a quadratic, a logarithmic, and two different adapted courses were considered (see Equations 7-10). The corresponding models are referred to as the linear model, the quadratic model, the logarithmic model, and the first and second adapted models. For all the two-factor models, the factor loadings on the first latent variable, that is, the ability-specific factor, were constrained to the same value (λτ).
Regarding the analyses of subsets of items, only the one-factor model (with constrained factor loadings), the linear, the quadratic, and the logarithmic models were considered. Since each subset contained only a few items, it was reasonable to assume only these simple courses. For analyzing the APM scale as a whole, all described models were applied.
Model Estimation
Statistical investigations were conducted by means of LISREL (Jöreskog & Sörbom, 2004). Both the maximum likelihood estimation method (as part of the TfA) and robust diagonally weighted least squares estimation (DWLS) were used. For the evaluation of results regarding model fit, several model fit indices were computed: χ², root mean square error of approximation (RMSEA; ≤.06), standardized root mean square residual (SRMR; ≤.08), goodness-of-fit index (GFI; ≥.90), nonnormed fit index (NNFI; ≥.95), comparative fit index (CFI; ≥.95), and Akaike information criterion (AIC). The numbers in parentheses served as cutoffs indicating a good model fit, as proposed by Kline (2011) and Hu and Bentler (1999). For each model fit index, means and standard deviations based on the results of the 200 simulated matrices were computed.
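The cutoffs listed above can be applied mechanically, as in the following minimal sketch. The cutoff values are those given in the text; the function name `check_fit` is hypothetical, and the example values are the mean TfA results for the quadratic model as reported in Table 1. Note that the sketch compares point values with the cutoffs, whereas Table 1 marks good fit on the basis of 95% confidence intervals.

```python
# Cutoffs from the text: "max" means the index should not exceed the value,
# "min" means the index should reach at least the value.
CUTOFFS = {
    "RMSEA": ("max", 0.06),
    "SRMR": ("max", 0.08),
    "GFI": ("min", 0.90),
    "NNFI": ("min", 0.95),
    "CFI": ("min", 0.95),
}

def check_fit(indices):
    """Return, per fit index, whether the value meets the cutoff for good fit."""
    verdict = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        verdict[name] = value <= cutoff if direction == "max" else value >= cutoff
    return verdict

# Mean TfA results of the quadratic model (Table 1)
quadratic_tfa = {"RMSEA": 0.035, "SRMR": 0.048, "GFI": 0.944,
                 "NNFI": 0.930, "CFI": 0.931}
verdict = check_fit(quadratic_tfa)
```

For these values, RMSEA, SRMR, and GFI meet their cutoffs, whereas NNFI and CFI stay below .95, which shows that the indices need not agree on whether a model fits well.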
Additionally, scaled variances of the latent variables were computed according to a procedure proposed by Schweizer (2011). This made it possible to compare the latent variances and therefore interpret the importance of a latent variable.
Results
The results are structured in three parts: First, results regarding the two ways of conducting CFA (DWLS and TfA) are compared. Second, the results concerning the investigation of subsets are summarized, and third, the results regarding the adapted models are reported. Due to the large amount of data and results, we cannot present the detailed results for every subset, but we will give an overview that focuses on the differences between the subsets. Even though only the RMSEA, AIC, and CFI statistics will be presented, we can assure that the other model fit indices reveal similar results and indications. All the figures provided in the following are based on the means of all 200 estimated samples.
Comparison of DWLS and TfA
Table 1 summarizes model fit results regarding investigations of all 36 APM items as a whole. As is obvious from Table 1, DWLS yielded the better results in all cases with the exception of SRMR. The DWLS results even suggested staying with one of the one-factor models since none of the two-factor models showed a CFI value that exceeded the CFI value of the one-factor model by at least 0.01 (Cheung & Rensvold, 2002), and the chi-square difference test could not be considered in models including constrained factor loadings. In contrast, according to the TfA results the two-factor model with quadratically increasing constraints was the best-fitting model according to all fit statistics, and its CFI value exceeded all the other CFI values by more than 0.01. Another difference to be mentioned was the variability of the fit statistics. TfA showed the smaller variability in chi-square, RMSEA, SRMR, and GFI; in NNFI and CFI the smaller variability was obtained by DWLS, probably because of a ceiling effect. Finally, it needs to be mentioned that irregularities regarding the variance of the latent variable associated with the item-position effect were observed. In 3.8% of the investigated matrices the estimate of the variance was negative when DWLS was applied, and in no case otherwise.
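The CFI comparison described above can be retraced with the mean CFI values reported in Table 1. In this minimal sketch, the function name `cfi_gain` and the choice to compare the best two-factor model against the better of the two one-factor models are assumptions made for illustration.

```python
# Mean CFI values from Table 1
CFI = {
    "TfA": {"one-factor (fix)": 0.882, "one-factor (free)": 0.899,
            "linear": 0.917, "quadratic": 0.931, "logarithmic": 0.897},
    "DWLS": {"one-factor (fix)": 0.983, "one-factor (free)": 0.981,
             "linear": 0.990, "quadratic": 0.990, "logarithmic": 0.985},
}

def cfi_gain(values):
    """Best two-factor model and its CFI gain over the better one-factor model."""
    one_factor = [v for m, v in values.items() if m.startswith("one-factor")]
    two_factor = {m: v for m, v in values.items() if not m.startswith("one-factor")}
    best = max(two_factor, key=two_factor.get)
    return best, two_factor[best] - max(one_factor)

best_tfa, gain_tfa = cfi_gain(CFI["TfA"])
best_dwls, gain_dwls = cfi_gain(CFI["DWLS"])
```

With these values the gain stays below .01 for DWLS but clearly exceeds it for TfA, where the quadratic model also lies more than .01 above the next best model (linear).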
Table 1.
Means and Standard Deviations (in Parentheses) of the Results Regarding Model Fit for All Estimated Models and a Sample Size of N = 2,000. Results Are Based on 200 Matrices.
| Model | Method | χ² (SD) | df | RMSEA (SD) | SRMR (SD) | GFI (SD) | NNFI (SD) | CFI (SD) |
|---|---|---|---|---|---|---|---|---|
| One-factor (fix) | TfA | 3761.5 (192.2) | 629 | 0.050<sup>CI</sup> (0.002) | 0.056<sup>CI</sup> (0.002) | 0.905<sup>CI</sup> (0.004) | 0.882 (0.008) | 0.882 (0.008) |
| | DWLS | 1995.9 (130.8) | 629 | 0.033<sup>CI</sup> (0.002) | 0.091 (0.008) | 0.940<sup>CI</sup> (0.030) | 0.983<sup>CI</sup> (0.002) | 0.983<sup>CI</sup> (0.002) |
| One-factor (free) | TfA | 3272.7 (197.9) | 594 | 0.048<sup>CI</sup> (0.002) | 0.043<sup>CI</sup> (0.001) | 0.917<sup>CI</sup> (0.005) | 0.893 (0.008) | 0.899 (0.007) |
| | DWLS | 2181.3 (410.8) | 594 | 0.036<sup>CI</sup> (0.004) | 0.082 (0.005) | 0.967<sup>CI</sup> (0.003) | 0.979<sup>CI</sup> (0.005) | 0.981<sup>CI</sup> (0.005) |
| Linear | TfA | 2526.1 (124.7) | 628 | 0.039<sup>CI</sup> (0.001) | 0.050<sup>CI</sup> (0.002) | 0.934<sup>CI</sup> (0.003) | 0.917 (0.007) | 0.917 (0.007) |
| | DWLS | 1645.1 (155.1) | 628 | 0.029<sup>CI</sup> (0.007) | 0.089 (0.011) | 0.950<sup>CI</sup> (0.032) | 0.990<sup>CI</sup> (0.002) | 0.990<sup>CI</sup> (0.002) |
| Quadratic | TfA | 2128.5 (98.6) | 628 | 0.035<sup>CI</sup> (0.001) | 0.048<sup>CI</sup> (0.002) | 0.944<sup>CI</sup> (0.002) | 0.930 (0.006) | 0.931 (0.006) |
| | DWLS | 1449.5 (135.5) | 628 | 0.026<sup>CI</sup> (0.002) | 0.085 (0.009) | 0.954<sup>CI</sup> (0.036) | 0.982<sup>CI</sup> (0.002) | 0.990<sup>CI</sup> (0.002) |
| Logarithmic | TfA | 3227.5 (172.8) | 628 | 0.046<sup>CI</sup> (0.002) | 0.049<sup>CI</sup> (0.001) | 0.918<sup>CI</sup> (0.004) | 0.897 (0.008) | 0.897 (0.008) |
| | DWLS | 1888.6 (255.5) | 628 | 0.035<sup>CI</sup> (0.013) | 0.092 (0.014) | 0.945<sup>CI</sup> (0.027) | 0.984<sup>CI</sup> (0.003) | 0.985<sup>CI</sup> (0.003) |
Note. RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; GFI = goodness-of-fit index; NNFI = nonnormed fit index; CFI = comparative fit index; CI = the 95% confidence interval indicated a good fit; “free” = freely estimated factor loadings; “fix” = fixed factor loadings; TfA = threshold-free approach; DWLS = diagonally weighted least squares.
Although the best model-data fit was observed in using DWLS, there were also cues that appeared to call its validity into question. First, according to chi-square, RMSEA, NNFI, and CFI the one-factor model with constrained factor loadings showed a better model fit than the one-factor model with free factor loadings. This observation was contrary to other observations and intuition since free factor loadings enabled the clearly better adaptation to the specificities of data than constrained factor loadings. Second, there was an unusually large difference between the RMSEA and SRMR values for DWLS. On average the SRMR was three times as large as RMSEA. In contrast, it could be expected that the ratio of RMSEA and SRMR would reflect to some degree the ratio of the corresponding cutoffs that were .06 and .08, respectively. Third, there were the already mentioned negative variances of latent variables. They could be taken as indications of estimation problems. The occurrence of such problems in the sample size of 2,000 clearly signified that many more cases of such problems could be expected in smaller sample sizes.
Taking these observations into account, we decided to conduct the main part of the present study based on TfA only, since the results observed by TfA showed the properties that typically characterize results obtained by the maximum likelihood estimation method in CFA.
Results Regarding the Subsets
Figure 1 shows illustrations of the RMSEA results observed in investigating the subsets of items. Each bar refers to one RMSEA coefficient, and for each subset four RMSEA coefficients are provided, reflecting the one-factor, the linear, the quadratic, and the logarithmic models. From left to right along the horizontal axis, the number of subsets is increasing. Accordingly, the first set of bars shows the results for the whole measure. The second and third sets of bars represent the division into two subsets with 18 items each. The next three sets of bars represent the three subsets with 12 items each. The remaining bars show the four subsets with nine items each, and the six subsets with six items each. The dashed, horizontal line illustrates the cutoff criterion suggesting a good model fit for RMSEA results that are smaller than .06.
Figure 1.
Means of the RMSEA results for all four models and all subsets.
Note. Results are based on 200 matrices and a sample size of N = 2,000.
For every subset there was at least one RMSEA result suggesting a good model fit, except for the very last subset including Items 31 to 36. The one-factor model (with fixed factor loadings), represented by the white bars, almost always showed the highest values, indicating the poorest model fit. For subsets in the rear part of the APM, the differences in RMSEA between the four models were especially large. For the very first items there was hardly any difference between the four models, whereas for the later items the two-factor models clearly showed better results than the one-factor model. This was very obvious for the division into six subsets. Furthermore, model fit improved as the analyzed subsets became smaller. Only for the very last items did this not seem to hold, since model fit was worst for the smallest subset.
For further comparison of the one- and two-factor models, the AIC and CFI results were computed. For every simulated data set each positive outcome of comparing AICs, that is, the AIC for the best fitting two-factor model was smaller than the AIC for the one-factor model, was counted. This resulted in a number between 0 and 200, whereby 200 indicated that an item-position effect was detected for all comparisons. For every subset the obtained number is listed in Table 2. For example, regarding the subset containing the Items 1 to 18, in 199 out of 200 cases the logarithmic model performed better than the one-factor model. Regarding the subset containing the Items 1 to 6, in only 88 out of 200 cases the quadratic model performed better than the one-factor model. Most of the subsets for which the maximum of 200 was not reached were located in the early part of APM. This implied that the item-position effect was not yet as strong as in the later items where 200 was reached frequently.
Table 2.
Number of Data Sets (From a Total of 200 Data Sets) in Which the Best Fitting Two-Factor Model Showed a Better Fit Than the One-Factor Model According to the AIC Statistic.
| Items contained in the subset | 1-36 | 1-18 | 19-36 | 1-12 | 13-24 | 25-36 | 1-9 | 10-18 | 19-27 | 28-36 | 1-6 | 7-12 | 13-18 | 19-24 | 25-30 | 31-36 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type of the best fitting two-factor model | Quadratic | Logarithmic | Quadratic | Linear | Quadratic | Quadratic | Logarithmic | Linear | Logarithmic | Quadratic | Quadratic | Logarithmic | Logarithmic | Linear | Logarithmic | Quadratic |
| Number of data sets where AIC(one-factor model) > AIC(best fitting model) | 200 | 199 | 200 | 198 | 200 | 200 | 102 | 110 | 200 | 200 | 88 | 121 | 181 | 200 | 198 | 200 |
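The counting rule behind Table 2 can be sketched in a few lines. The following is a minimal illustration only: the function name `count_aic_wins` and the AIC values are hypothetical and are not taken from the study, which reports only the resulting counts.

```python
from typing import Sequence


def count_aic_wins(aic_one_factor: Sequence[float],
                   aic_two_factor: Sequence[float]) -> int:
    """Count data sets in which the two-factor model has the smaller AIC."""
    if len(aic_one_factor) != len(aic_two_factor):
        raise ValueError("Both sequences must cover the same data sets.")
    # A "positive outcome" is a data set where AIC(one-factor) > AIC(two-factor).
    return sum(one > two for one, two in zip(aic_one_factor, aic_two_factor))


# Hypothetical AIC values for five simulated data sets (illustration only).
aic_one = [310.2, 305.7, 298.4, 312.9, 301.1]
aic_two = [300.5, 306.3, 290.0, 309.8, 301.1]

wins = count_aic_wins(aic_one, aic_two)
print(f"Two-factor model preferred in {wins} of {len(aic_one)} data sets")
```

Ties are not counted as positive outcomes here, which matches the strict inequality in the table header. With 200 data sets per subset, the same function would return a number between 0 and 200.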
The CFI results are presented in Figure 2, which is structured similarly to Figure 1. Model fit seemed to improve as the subsets became smaller. However, this was only true for subsets close to the middle, and not for those containing the very first or last items. Again, the one-factor model, indicated by the white bars, showed the poorest model fit for all subsets. According to Cheung and Rensvold (2002), differences in CFI results that are larger than .01 indicate a substantial difference in model fit. In line with this criterion, subsets where the mean CFI statistics of the one-factor model and the best performing two-factor model showed a substantial difference are marked by an asterisk in Figure 2. For example, the breakdown into six subsets clearly showed that the two-factor models outperformed the one-factor model for certain subsets, especially for subsets in the rear part of the APM. However, the index exceeded the cutoff of .95, which indicates a good model fit, for only a few models.
Figure 2.
Means of the CFI results for all four models and all subsets.
Note. Results are based on 200 matrices and a sample size of N = 2,000. *In subsets marked with an asterisk, there was a significant difference between the standard model and the best fitting two-factor model.
Additionally, scaled variances of the latent variables were computed (Schweizer, 2011), which made it possible to compare the amount of variance of different latent variables. The results are shown in Figure 3, which is structured similarly to the other figures. For each subset of items a set of four bars is presented. They illustrate the scaled variances of the latent variables regarding the one-factor, the linear, the quadratic, and the logarithmic models, respectively, in corresponding order. The gray bars represent the variance of the ability-specific latent variable. For the two-factor models the additional white bars represent the variances of the position-specific latent variable. Subsets that showed a significant variance of the position-specific latent variable are highlighted with an asterisk in Figure 3. The most noticeable observation was that the total amount of latent variance, which is the sum of the variances of the ability-specific and the position-specific latent variables, increased from earlier to later subsets. That means the amount of true variance increased from the first to the last items, and therefore more variance could be explained in the rear subsets. Another result was the increase in total variance from the one- to the two-factor models within each subset. Accordingly, the models with an additional position-specific latent variable explained more variance than the one-factor model with only one latent variable. Thus, these two-factor models reflected the analyzed data sets better than the one-factor model. In addition, the variance of the position-specific latent variable accounted for a substantial percentage of the total latent variance, especially for the rear subsets. Consequently, these results suggested that the position-specific latent variable was an important component that was not negligible.
However, for some subsets the estimated variance for the position-specific latent variable was not significant, indicating that no item-position effect can be assumed for these subsets.
Figure 3.
Means of the scaled variances of the latent variables for the four models and all subsets.
Note. Results are based on 200 matrices and a sample size of N = 2,000. Initially negative variances of the position-specific latent variable are marked by hatched bars. Subsets with a significant variance of the position-specific latent variable are marked by an asterisk.
Additionally, for three subsets of items the variance of the position-specific latent variable was negative, suggesting misspecifications for the affected models. These subsets are highlighted in Figure 3 by hatched bars. For the subset of Items 13 to 18 these negative variances even reached significance. However, these negative variances were mainly observed for subsets containing the same items. Apparently, since the computations were based on a real data pattern, these items showed characteristics that could not be explained by the applied models. As negative variances violate the assumptions of CFA, the model with two latent variables had to be rejected for the corresponding subset, just as in the cases of a negative but nonsignificant variance. Consequently, no item-position effect was assumed for the subsets where negative latent variances were found (cf. Figure 3).
Results Regarding the Adapted Representations
Based on the results presented in the previous section, two different adapted representations of the item-position effect, which were defined by two piecewise functions, were composed. The resulting courses are illustrated in Figure 4. The main difference between the two representations was that, in contrast to the first adapted model, no item-position effect was assumed for Items 1 to 18 in the second adapted model.
Figure 4.
Course one (a) and two (b) of the adapted representations of the item-position effect.
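To make the idea of a basic versus a piecewise course more concrete, the following sketch generates candidate loadings for the position-specific latent variable as functions of item position. It rests on several assumptions: the normalizations of the linear, quadratic, and logarithmic functions are only one possible choice, and the piecewise function is purely hypothetical. It merely mimics the property that no item-position effect is assumed for Items 1 to 18; the actual adapted courses are those shown in Figure 4.

```python
import math

K = 36  # number of APM items


def linear(i: int, k: int = K) -> float:
    """Linear course, scaled so that the last item receives 1 (assumed scaling)."""
    return i / k


def quadratic(i: int, k: int = K) -> float:
    """Quadratic course, scaled so that the last item receives 1 (assumed scaling)."""
    return (i / k) ** 2


def logarithmic(i: int, k: int = K) -> float:
    """Logarithmic course; ln(i + 1) / ln(k + 1) is one possible normalization."""
    return math.log(i + 1) / math.log(k + 1)


def piecewise_zero_then_linear(i: int, k: int = K, start: int = 18) -> float:
    """Hypothetical piecewise course: no effect up to item `start`, linear after."""
    if i <= start:
        return 0.0
    return (i - start) / (k - start)


positions = range(1, K + 1)
courses = {
    "linear": [linear(i) for i in positions],
    "quadratic": [quadratic(i) for i in positions],
    "logarithmic": [logarithmic(i) for i in positions],
    "piecewise (hypothetical)": [piecewise_zero_then_linear(i) for i in positions],
}

for name, values in courses.items():
    # Show the values at the first, the 18th, and the last item position.
    print(f"{name:26s} {values[0]:.3f} {values[17]:.3f} {values[-1]:.3f}")
```

In a model with fixed loadings, such values would be entered as constraints rather than estimated, so that only the variance of the position-specific latent variable remains free.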
In Table 3, the model fit results regarding the adapted models are presented. In order to examine the validity of our results, a cross-validation was conducted. For this purpose, the 200 simulated matrices were split in two. The first 100 matrices were used for investigating subsets of items according to the approach described above. The second 100 matrices served as a hold-out sample. Based on the results regarding the first 100 matrices, two adapted representations of the item-position effect (i.e., piecewise functions) were again created. As it turned out, this led to exactly the same adapted courses as the ones already described above (see Figure 4). In the next step, these adapted representations were applied to the first 100 matrices (first), as well as to the hold-out sample (second), in order to compare model fit results (Table 3). Additionally, the adapted models were applied to all 200 matrices (full). Results did not indicate any differences between the first and the second 100 matrices. The results thus seemed to be stable across different samples that show similar characteristics.
Table 3.
Means and Standard Deviations (in Parentheses) of the Results Regarding Model Fit for the Adapted Models and a Sample Size of N = 2,000.
| Sample | Model | χ² (SD) | df | RMSEA (SD) | SRMR (SD) | GFI (SD) | NNFI (SD) | CFI (SD) |
|---|---|---|---|---|---|---|---|---|
| First | Adapted 1 | 2620.0 (137.8) | 628 | 0.040^CI (0.001) | 0.050^CI (0.002) | 0.932^CI (0.003) | 0.913 (0.007) | 0.913 (0.007) |
| Second | Adapted 1 | 2615.0 (123.3) | 628 | 0.040^CI (0.001) | 0.051^CI (0.002) | 0.932^CI (0.003) | 0.914 (0.007) | 0.915 (0.007) |
| Full | Adapted 1 | 2617.5 (130.8) | 628 | 0.040^CI (0.001) | 0.051^CI (0.002) | 0.932^CI (0.003) | 0.914 (0.007) | 0.914 (0.007) |
| First | Adapted 2 | 2107.3 (101.9) | 628 | 0.034^CI (0.001) | 0.045^CI (0.002) | 0.945^CI (0.003) | 0.930 (0.006) | 0.930 (0.006) |
| Second | Adapted 2 | 2102.3 (94.7) | 628 | 0.034^CI (0.001) | 0.046^CI (0.002) | 0.945^CI (0.002) | 0.931 (0.006) | 0.931 (0.006) |
| Full | Adapted 2 | 2104.8 (98.4) | 628 | 0.034^CI (0.001) | 0.046^CI (0.002) | 0.945^CI (0.002) | 0.931 (0.006) | 0.931 (0.006) |
Note. Results are based either on the first 100 (first), on the second 100 (second), or on all 200 (full) simulated matrices. The threshold-free approach was used. RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; GFI = goodness-of-fit index; NNFI = nonnormed fit index; CFI = comparative fit index; CI = the 95% confidence interval indicated a good fit.
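As a plausibility check on Table 3, the RMSEA point estimate can be approximated from the reported mean χ² values. The sketch below assumes the common formula with N − 1 in the denominator; some programs use N instead, and the mean of the RMSEA values across replications need not equal the RMSEA of the mean χ², so only approximate agreement should be expected.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate from a chi-square value, its df, and sample size n."""
    # Negative differences (chi-square below df) are truncated at zero.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Mean chi-square values of the "full" rows in Table 3 (df = 628, N = 2,000).
rmsea_adapted_1 = rmsea(2617.5, 628, 2000)
rmsea_adapted_2 = rmsea(2104.8, 628, 2000)

print(f"Adapted 1: {rmsea_adapted_1:.3f}")
print(f"Adapted 2: {rmsea_adapted_2:.3f}")
```

Under these assumptions, both values agree with the tabulated means to three decimals and lie below the cutoff of .06.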
These results (especially of the “full”-sample case) can be compared to the TfA-based results included in Table 1. Accordingly, the quadratic model and the second adapted model provided the best fit. These models showed a similar, good fit in most statistics, and acceptable results for NNFI and CFI. Although for some fit statistics the second adapted model showed slightly better results, overall no substantial improvement in model fit was observed compared to the quadratic model. Furthermore, the linear, the logarithmic, as well as the first adapted model showed significantly poorer model fit than the other two models, based on the CFI results. Overall, the two one-factor models (one with fixed factor loadings and one with freely estimated factor loadings) provided the poorest model fit.
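The CFI-based comparison can be illustrated with the criterion of Cheung and Rensvold (2002) described above, applied to the mean CFI values of Table 3. This is a simplified sketch: the function name is hypothetical, and the comparison of means shown here is not necessarily the exact procedure used in the study.

```python
def cfi_difference_is_substantial(cfi_a: float, cfi_b: float,
                                  threshold: float = 0.01) -> bool:
    """Return True if two CFI values differ by more than the threshold."""
    return abs(cfi_a - cfi_b) > threshold


# Mean CFI values from the "full" rows of Table 3.
cfi_adapted_1 = 0.914
cfi_adapted_2 = 0.931

# The two adapted models differ by .017, which exceeds the .01 criterion.
adapted_models_differ = cfi_difference_is_substantial(cfi_adapted_1, cfi_adapted_2)

# First versus hold-out sample for the second adapted model (.930 vs. .931).
samples_differ = cfi_difference_is_substantial(0.930, 0.931)

print(adapted_models_differ, samples_differ)
```

The first comparison reflects the poorer fit of the first adapted model, and the second reflects the stability of the results across the two halves of the simulated matrices.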
Furthermore, scaled variances of the latent variables were computed (see Table 4). All estimated latent variances were significant as indicated by the asymptotic t-values. Regarding the two-factor models this indicated that the position-specific latent variable was not negligible. In fact, the position-specific variable accounted for 43% of the total latent variance in the linear model, 36% in the quadratic model, 53% in the logarithmic model, 43% in the first adapted model, and 31% in the second adapted model. The total variance, which is obtained by adding up the variances of both latent variables, was larger for the two-factor models than for the one-factor model. The largest amount of total latent variance was observed for the quadratic model. Thus, considering a latent variable for representing the item-position effect increased the amount of explained variance compared to the one-factor model. (Please note that no scaled variances were estimated for the one-factor model with freely estimated factor loadings.)
Table 4.
Means and Standard Deviations (in Parentheses) of the Scaled Variances of the Latent Variables and the Corresponding t-Values for All Estimated Models and a Sample Size of N = 2,000. Results Are Based on 200 Matrices.
| Model | Ability (SD) | t value (SD) | Position (SD) | t value (SD) |
|---|---|---|---|---|
| One-factor (fix) | 0.465 (0.016) | 26.1 (0.18) | — | — |
| Linear | 0.352 (0.022) | 19.0 (0.53) | 0.266 (0.012) | 16.8 (0.42) |
| Quadratic | 0.407 (0.021) | 22.5 (0.36) | 0.234 (0.010) | 18.6 (0.38) |
| Logarithmic | 0.242 (0.024) | 11.3 (0.87) | 0.275 (0.018) | 12.8 (0.66) |
| Adapted 1 | 0.348 (0.022) | 18.7 (0.54) | 0.260 (0.012) | 16.3 (0.44) |
| Adapted 2 | 0.426 (0.019) | 24.0 (0.29) | 0.191 (0.008) | 18.6 (0.39) |
Note. “Fix” = fixed factor loadings.
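The percentages given in the text can be recomputed approximately from the means in Table 4. The sketch below divides the position-specific variance by the total latent variance. Because the tabulated values are rounded means, the results can deviate from the reported percentages by up to about one percentage point (e.g., roughly 36.5% rather than 36% for the quadratic model).

```python
# Mean scaled variances from Table 4: (ability-specific, position-specific).
scaled_variances = {
    "Linear": (0.352, 0.266),
    "Quadratic": (0.407, 0.234),
    "Logarithmic": (0.242, 0.275),
    "Adapted 1": (0.348, 0.260),
    "Adapted 2": (0.426, 0.191),
}


def position_share(ability: float, position: float) -> float:
    """Share of the total latent variance due to the position-specific variable."""
    return position / (ability + position)


totals = {model: a + p for model, (a, p) in scaled_variances.items()}
shares = {model: position_share(a, p) for model, (a, p) in scaled_variances.items()}
largest_total = max(totals, key=totals.get)

for model in scaled_variances:
    print(f"{model:12s} total = {totals[model]:.3f}  "
          f"position share = {100 * shares[model]:.1f}%")
print("Largest total latent variance:", largest_total)
```

The totals also confirm that the quadratic model shows the largest amount of total latent variance and that every two-factor total exceeds the 0.465 of the one-factor model.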
Conclusion
The present study was conducted in order to extend and evaluate existing representations of the item-position effect. We conducted a simulation study that was based on a covariance pattern characterizing Raven’s APM. The considered models included courses according to previously tested (also referred to as basic) as well as new adaptive functions. By using adaptive functions, we tried to find a representation of the item-position effect that fits the APM data in the best possible way. At the same time, different basic functions that have been used so far, and whose courses, in a way, follow theoretical considerations, were applied. Accordingly, the adapted representations should reflect specificities of the data to a larger degree than the basic functions. Hence, the very low deviation between the basic and the adaptive models was considered an indicator of the appropriateness of the commonly used models.
Considering a model without a representation of the item-position effect (i.e., the one-factor model) alongside the models including such a representation made it possible to identify for which sequences of items it is reasonable to assume an item-position effect. The item-position effect is observed for the whole measure as well as for most of the subsets of items, as the two-factor models consistently showed better results than the one-factor model. Only for some subsets, especially in the earlier part of the APM, is the assumption of an item-position effect rejected because of nonsignificant results regarding the variance of the position-specific latent variable. Furthermore, in the investigation of subsets an increasing amount of total latent variance is observable from early to later subsets.
Overall, a quadratic function best represents the course of the item-position effect in Raven’s APM, which is consistent with prior results (Ren et al., 2014; Schweizer et al., 2009). Although one of the two adapted representations shows a similar model fit, it does not remarkably improve it compared to the quadratic model. Additionally, the amount of total latent variance is largest for the quadratic model. Therefore, following the principle of simplicity, adapting the representation of the item-position effect to the characteristics of the APM does not seem to be necessary. Rather, it seems to be sufficient to consider only basic functions.
Regarding Raven’s APM, solving one item is a complex task that demands combining different rules (Carpenter et al., 1990). Therefore, assuming learning as the underlying process, a quadratic course of the item-position effect is reasonable since only a small impact of the ability to learn is expected at the very beginning of working on the test. For later items, the necessary solving rules are (at least partly) known and the main task is to combine these rules. Accordingly, the ability to learn during a test shows a larger influence for later items. This is described best by a quadratic function.
Nevertheless, the investigation of small subsets of neighboring items reveals different courses of the item-position effect for different subsets. At first glance, this may appear inconsistent as at the same time the results suggest an overall quadratic course. In our opinion this may be reasonable: A quadratic course is mainly characterized by a large increase in slope from start to end. However, this main characteristic is not that distinct when regarding only a small section of items. Accordingly, also a linear or even a logarithmic course may be more appropriate in some sections.
Finally, the generalizability of the results needs to be discussed. Of course, the quadratic representation of the item-position effect, which turns out to be most appropriate in the present study, cannot simply be generalized to other (achievement) measures. If learning can be confirmed as the source of the item-position effect (Embretson, 1991; Ren et al., 2014; Verguts & De Boeck, 2000), its course may depend on how demanding the items of a measure are. If solving the items does not demand several complex rules (as in APM items), a linear or a logarithmic course of the item-position effect may be more appropriate. However, the observation that adapted representations do not perform substantially better than the basic linear, quadratic, or logarithmic ones may also be true for other ability measures. Thus, it seems to be sufficient to take only those basic functions into account when analyzing the item-position effect.
1. The term “course” is used to address the shape of the corresponding mathematical function that represents a kind of developmental process of the item-position effect from the first to the last items.
2. We would like to thank the reviewer who provided the opportunity to include such a comparison.
Footnotes
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Parts of the research were supported by Deutsche Forschungsgemeinschaft, Kennedyallee 40, 53175 Bonn, Germany (SCHW 402/20-1).
References
- Bandalos D. L., Gagné P. (2012). Simulation methods in structural equation modeling. In Hoyle R. H. (Ed.), Handbook of structural equation modeling (pp. 92-108). New York, NY: Guilford.
- Bock R. D., Gibbons R., Muraki E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261-280. doi: 10.1177/014662168801200305
- Carpenter P. A., Just M. A., Shell P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97, 404-431. doi: 10.1037/0033-295X.97.3.404
- Cheung G. W., Rensvold R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9, 233-255. doi: 10.1207/S15328007SEM0902_5
- DiStefano C., Morgan G. B. (2014). A comparison of diagonal weighted least squares robust estimation techniques for ordinal data. Structural Equation Modeling: A Multidisciplinary Journal, 21, 425-438. doi: 10.1080/10705511.2014.915373
- Embretson S. E. (1991). A multidimensional latent trait model for measuring learning and change. Psychometrika, 56, 495-515. doi: 10.1007/BF02294487
- Finney S. J., DiStefano C. (2013). Nonnormal and categorical data in structural equation modeling. In Hancock G. R., Mueller R. O. (Eds.), Structural equation modeling: A second course (2nd ed., pp. 439-492). Charlotte, NC: Information Age.
- Graham J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66, 930-944. doi: 10.1177/0013164406288165
- Hamilton J. C., Shuminsky T. R. (1990). Self-awareness mediates the relationship between serial position and item reliability. Journal of Personality and Social Psychology, 59, 1301-1307. doi: 10.1037/0022-3514.59.6.1301
- Hartig J., Buchholz J. (2012). A multilevel item response model for item position effects and individual persistence. Psychological Test and Assessment Modelling, 54, 418-431.
- Hartig J., Hölzel B., Moosbrugger H. (2007). A confirmatory analysis of item reliability trends (CAIRT): Differentiating true score and error variance in the analysis of item context effects. Multivariate Behavioral Research, 42, 157-183.
- Hohensinn C., Kubinger K. D., Reif M., Holocher-Ertl S., Khorramdel L., Frebort M. (2008). Examining item-position effects in large-scale assessment using the linear logistic test model. Psychology Science Quarterly, 50, 391-402.
- Hu L., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1-55. doi: 10.1080/10705519909540118
- Jöreskog K. G., Sörbom D. (2001). Interactive LISREL: User guide. Lincolnwood, IL: Scientific Software International.
- Jöreskog K. G., Sörbom D. (2004). LISREL 8.70. Lincolnwood, IL: Scientific Software International.
- Kline R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford.
- Knowles E. S. (1988). Item context effects on personality scales: Measuring changes the measure. Journal of Personality and Social Psychology, 55, 312-320. doi: 10.1037/0022-3514.55.2.312
- Knowles E. S., Byers B. (1996). Reliability shifts in measurement reactivity: Driven by content engagement or self-engagement? Journal of Personality and Social Psychology, 70, 1080-1090. doi: 10.1037/0022-3514.70.5.1080
- Kubinger K. D. (2008). On the revival of the Rasch model-based LLTM: From constructing tests using item generating rules to measuring item administration effects. Psychology Science Quarterly, 50, 311-327.
- Li C.-H. (2015). Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behavior Research Methods. Advance online publication. doi: 10.3758/s13428-015-0619-7
- Lord F. M., Novick M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
- Lozano J. H. (2015). Are impulsivity and intelligence truly related constructs? Evidence based on the fixed-links model. Personality and Individual Differences, 85, 192-198. doi: 10.1016/j.paid.2015.04.049
- McCullagh P. J., Nelder J. A. (1984). Generalized linear models (Monographs on Statistics and Applied Probability; Vol. 37). London, England: Chapman & Hall.
- McDonald R. P., Ahlawat K. S. (1974). Difficulty factors in binary data. British Journal of Mathematical and Statistical Psychology, 27, 82-99. doi: 10.1111/j.2044-8317.1974.tb00530.x
- Mollenkopf W. G. (1950). An experimental study of the effects on item-analysis data of changing item placement and test time limit. Psychometrika, 15, 291-315. doi: 10.1007/BF02289044
- Muthén B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132. doi: 10.1007/BF02294210
- Rasch G. (1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago, IL: University of Chicago Press.
- Raven J. C., Raven J., Court J. H. (1997). Raven’s Progressive Matrices and Vocabulary Scales. Edinburgh, England: J. C. Raven.
- Ren X., Goldhammer F., Moosbrugger H., Schweizer K. (2012). How does attention relate to the ability-specific and position-specific components of reasoning measured by APM? Learning and Individual Differences, 22(1), 1-7. doi: 10.1016/j.lindif.2011.09.009
- Ren X., Wang T., Altmeyer M., Schweizer K. (2014). A learning-based account of fluid intelligence from the perspective of the position effect. Learning and Individual Differences, 31, 30-35. doi: 10.1016/j.lindif.2014.01.002
- Schweizer K. (2011). Scaling variances of latent variables by standardizing loadings: Applications to working memory and the position effect. Multivariate Behavioral Research, 46, 938-955. doi: 10.1080/00273171.2011.625312
- Schweizer K. (2012). The position effect in reasoning items considered from the CFA perspective. International Journal of Educational and Psychological Assessment, 11(2), 44-58.
- Schweizer K. (2013). A threshold-free approach to the study of the structure of binary data. International Journal of Statistics and Probability, 2(2). doi: 10.5539/ijsp.v2n2p67
- Schweizer K., Ren X., Wang T. (2015). A comparison of confirmatory factor analysis of binary data on the basis of tetrachoric correlations and of probability-based covariances: A simulation study. In Millsap R. E., Bolt D. M., van der Ark L. A., Wang W.-C. (Eds.), Springer proceedings in mathematics & statistics. Quantitative psychology research (Vol. 89, pp. 273-292). Berlin, Germany: Springer.
- Schweizer K., Schreiner M., Gold A. (2009). The confirmatory investigation of APM items with loadings as a function of the position and easiness of items: A two-dimensional model of APM. Psychology Science Quarterly, 51(1), 47-64.
- Schweizer K., Troche S. J., Rammsayer T. H. (2011). On the special relationship between fluid and general intelligence: New evidence obtained by considering the position effect. Personality and Individual Differences, 50, 1249-1254. doi: 10.1016/j.paid.2011.02.019
- Verguts T., De Boeck P. (2000). A Rasch model for detecting learning while solving an intelligence test. Applied Psychological Measurement, 24, 151-162. doi: 10.1177/01466210022031589




