Applied Psychological Measurement. 2022 Feb 13;46(2):79–97. doi: 10.1177/01466216211063234

Scale Linking for the Testlet Item Response Theory Model

Seonghoon Kim, Michael J Kolen
PMCID: PMC8908412  PMID: 35281343

Abstract

In their 2005 paper, Li and her colleagues proposed a test response function (TRF) linking method for a two-parameter testlet model and used a genetic algorithm to find minimization solutions for the linking coefficients. In the present paper the linking task for a three-parameter testlet model is formulated from the perspective of bi-factor modeling, and three linking methods for the model are presented: the TRF, mean/least squares (MLS), and item response function (IRF) methods. Simulations are conducted to compare the TRF method using a genetic algorithm with the TRF and IRF methods using a quasi-Newton algorithm and the MLS method. The results indicate that the IRF, MLS, and TRF methods perform very well, well, and poorly, respectively, in estimating the linking coefficients associated with testlet effects, that the use of genetic algorithms offers little improvement to the TRF method, and that the minimization function for the TRF method is not as well-structured as that for the IRF method.

Keywords: scale linking methods, testlet model, item response theory


Educational test forms are often constructed using clusters of items based on a common stimulus or content area. For example, test items may be grouped around a reading passage, scenario, chart, or section associated with particular content. Wainer and Kiely (1987) called such a group of items a testlet and adopted it as a construction unit in computerized adaptive testing. From the perspective of item response theory (IRT; Lord, 1980; Yen & Fitzpatrick, 2006), the assumption of local independence among items nested within a testlet, given the primary latent trait, would be violated to some extent because the responses of examinees to the items might be affected by the testlet effect as well as the primary factor. An efficient way to deal with local dependence is to use the testlet model (Wainer et al., 2007), in which a secondary, random-effect factor is added to the primary factor. Researchers (e.g., DeMars, 2006; Li et al., 2006; Rijmen, 2010) have shown that the testlet model is a constrained version of the bi-factor model (Gibbons & Hedeker, 1992).

Like other IRT models, the testlet model has a model identification problem, specifically a scale indeterminacy problem, because the item parameters and person parameters are invariant within a linear transformation of the latent trait scale. In practice, scale indeterminacy typically is solved by choosing a scale such that the mean and standard deviation (SD) of the person parameters are arbitrarily set to certain values (e.g., 0 and 1) for the examinee group being analyzed (Rijmen, 2010). According to that convention, the latent scales obtained from separate calibrations of sample data from different populations are not likely to be equivalent, but they are assumed to be linearly related. This non-equivalency creates the need for a common scale, which can be developed through scale linking (or scale transformation), in which one scale is linked to another (base) scale with a linear function.

This paper is primarily concerned with the methods used to estimate the linking parameters for the testlet model under the common-item nonequivalent groups (CING) design (Kolen & Brennan, 2014). Many linking methods have been presented for use with traditional dichotomous IRT models such as the two-parameter logistic and three-parameter logistic (3PL) models (e.g., Divgi, 1985; Haebara, 1980; Loyd & Hoover, 1980; Marco, 1977; Stocking & Lord, 1983), and they have been extended to polytomous models (Kim & Lee, 2006). Most relevant to the present paper, Kim (2019) presented three linking methods for the 3PL bi-factor model, the direct least squares (DLS), item response function (IRF), and test response function (TRF) methods, which are bi-factor extensions of Divgi’s (1985), Haebara’s (1980), and Stocking and Lord’s (1983) approaches, respectively. Kim (2019) showed through simulations that the IRF, DLS, and TRF methods differed little in estimating the slope (dilation) linking coefficients, but they exhibited substantial differences in estimating the intercept (translation) linking coefficients, with the IRF method being the most accurate and the TRF method being the least accurate. However, in the IRT literature, only the TRF method has been formally extended for use with the testlet model. That extension is found in Li et al. (2005), who presented the TRF method for a two-parameter normal ogive (2PNO) testlet model. In this paper, Li et al.’s TRF method is presented under the 3PL testlet model since this general model is more widely used than the 2PNO testlet model in practice.

Questions and Purposes

As described in detail later, Li et al. (2005) formulated the linking task under the 2PNO testlet model such that, given $k$ common testlets, the linking parameters should include the means (denoted by $\mu_{\gamma_s}$) of the testlet effect factors $\gamma_s$, $s = 1, \ldots, k$, with the constraint $\sum_{s=1}^{k} \mu_{\gamma_s} = 0$, in addition to the linking coefficients $A$ and $B$ for the primary factor $\theta$. The criterion function (also known as the loss function) for the TRF method is nonlinear with respect to the linking parameters, and thus a multivariate search technique such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, one of the quasi-Newton methods (Dennis & Schnabel, 1996), should be implemented to estimate the linking parameters. Li et al. (2005) combined the GENOUD genetic algorithm (Sekhon & Mebane, 1998; see also Mebane & Sekhon, 2011) with the BFGS algorithm. Li and her colleagues used the genetic algorithm because they were concerned that if there were three or more linking parameters to be estimated, the criterion function might have multiple minima or saddle points, and the BFGS method might fail to find or converge to the global minimum. However, the GENOUD genetic algorithm is very computationally intensive and time-consuming (taking 25 or more minutes for a linking task, as reported by Li et al.).

The present study was motivated by some related questions regarding Li et al.’s (2005) approach to the linking solutions for the TRF method. The first question is “Is it necessary to use genetic algorithms to find the linking solutions for the TRF method?” This question is important, because previous studies into multidimensional IRT linking (e.g., Davey et al., 1996; Oshima et al., 2000) that considered six or more linking parameters in a rotation matrix and translation vector have not reported any problem in finding the linking solutions using a modified version of the Newton method. If the genetic method is not substantially superior to the BFGS method, there would be no compelling reason to use it in practice. The second and third questions, which are closely related to each other, are “Are linking methods for the testlet model other than the TRF method available?” and “How do the different linking methods for the testlet model compare in their performance?” These questions are also important, because the availability of different methods for scale linking allows practitioners to choose the appropriate method depending on the situation. The choice of a linking method can be made more wisely if more information about the relative performance of different methods is given to the practitioners. Even if one method is operationally used, other methods should still be implemented for diagnostic purposes (Kolen & Brennan, 2014).

The primary purposes of this paper are two-fold. One is to answer the first question posed above regarding Li et al.’s (2005) TRF method. The other is to present the mean/least squares (MLS) and IRF methods for the testlet model and investigate their linking accuracy relative to the TRF method. To achieve these purposes, we first present the 3PL testlet model (instead of the 2PNO model) in the next section for generality and reformulate the linking task of Li et al. (2005) as a special case under bi-factor modeling. Next, we use the reformulated linking framework to present the MLS, TRF, and IRF methods and conduct a simulation study to compare the accuracy of these methods.

Linking Methods for the 3PL Testlet Model

The IRT literature contains several versions of the testlet model for dichotomous items that differ slightly in parameterization (e.g., Bradlow et al., 1999; Glas et al., 2000; Wainer & Wang, 2000). For the purposes of this paper, we use the parameterization of Glas et al. (2000) to write a 3PL testlet model that defines the probability that an examinee j will answer item i correctly as

$$P_i(\theta_j, \gamma_{js(i)}) = P(\theta_j, \gamma_{js(i)}; a_i, b_i, c_i) = c_i + \frac{1 - c_i}{1 + \exp[-D a_i(\theta_j + \gamma_{js(i)} - b_i)]}, \tag{1}$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and lower asymptote parameters for item $i$, respectively; $D$ is a scaling constant (usually set to 1 or 1.7); $\theta_j$ is the primary trait (ability) parameter of examinee $j$; and $\gamma_{js(i)}$ is a random-effect parameter (assumed to be independent of $\theta_j$) for examinee $j$ of testlet $s$, the testlet to which item $i$ belongs. Equation (1), the IRF for the 3PL testlet model, can be viewed as a special case of the 3PL bi-factor model, written as

$$P_i(\theta_{j0}, \theta_{js}) = P(\theta_{j0}, \theta_{js}; a_i, C_{s(i)}, d_i, c_i) = c_i + \frac{1 - c_i}{1 + \exp[-D(a_i\theta_{j0} + a_i C_{s(i)}\theta_{js} + d_i)]}, \tag{2}$$

where $a_i$, $c_i$, and $D$ are the same as in Equation (1); $d_i = -a_i b_i$ is the intercept parameter; $\theta_{j0}$ is the parameter for examinee $j$ of the primary factor (i.e., $\theta_0 = \theta$); $\theta_{js}$ is the parameter of the specific factor of testlet $s$ with the relationship $\gamma_{s(i)} = C_{s(i)}\theta_s$; and $C_{s(i)}$ is a proportionality constant across all items nested within testlet $s$.
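The correspondence between Equations (1) and (2) can be checked numerically. The following is a minimal Python sketch, not the authors' code; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def irf_testlet(theta, gamma, a, b, c, D=1.7):
    """Equation (1): 3PL testlet model with testlet effect gamma."""
    z = D * a * (theta + gamma - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def irf_bifactor(theta0, theta_s, a, C, d, c, D=1.7):
    """Equation (2): 3PL bi-factor form with loading a*C on the specific factor."""
    z = D * (a * theta0 + a * C * theta_s + d)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Hypothetical item and person values (assumptions for illustration only).
a, b, c = 1.2, 0.5, 0.2
C = 0.7                      # SD of the testlet effect, so gamma = C * theta_s
theta, theta_s = 0.3, -1.1

p_testlet = irf_testlet(theta, C * theta_s, a, b, c)
p_bifactor = irf_bifactor(theta, theta_s, a, C, -a * b, c)   # d = -a * b
print(p_testlet, p_bifactor)
```

With $d_i = -a_i b_i$ and $\gamma_{s(i)} = C_{s(i)}\theta_s$, the two functions return the same probability up to floating-point error.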

Whether expressed as Equation (1) or (2), the 3PL testlet model cannot be identified unless some restrictions are imposed on the parameters. For Equation (1), the mean and SD of $\theta$ are typically fixed to 0 and 1, respectively, and the means of the $\gamma_{s(i)}$s are fixed to 0, with each of their SDs, $\sigma_{\gamma_s}$, being free parameters. For Equation (2), the mean and SD of $\theta_0$ and $\theta_s$ are fixed to 0 and 1, respectively, and $C_{s(i)} = \sigma_{\gamma_s}$ are considered the free parameters to be estimated. In other words, a standardized scale (0–1 scale) is independently used for each $\theta$ dimension to remove model indeterminacy. Throughout this paper, we assume that the 3PL testlet model is identified by 0–1 scaling, and its parameters are estimated using sample data.

The Linking Parameters Estimated

Consider two examinee groups, a base group and a new group, that can differ in each $\theta$ dimension. Assume that an identical test, consisting of $k$ testlets, has been administered to both groups and that for each group, separate calibration has been conducted using 0–1 scaling to estimate all item parameters, including $C_s$ (dropping the nested subscript $i$ for simplicity). Furthermore, define the 0–1 scales from the base and new groups as $\boldsymbol{\theta}_B$ and $\boldsymbol{\theta}_N$, respectively, where $\boldsymbol{\theta}_B = (\theta_{0B}, \theta_{1B}, \ldots, \theta_{kB})$ and $\boldsymbol{\theta}_N = (\theta_{0N}, \theta_{1N}, \ldots, \theta_{kN})$. Use $a_{iB}$, $b_{iB}$, $c_{iB}$, $d_{iB}$, and $C_{sB}$ to denote the item/testlet parameters estimated on the $\boldsymbol{\theta}_B$ scale, and use $a_{iN}$, $b_{iN}$, $c_{iN}$, $d_{iN}$, and $C_{sN}$ to denote the counterparts on the $\boldsymbol{\theta}_N$ scale.

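By the “within a linear transformation” invariance property of IRT, the $\boldsymbol{\theta}_B$ and $\boldsymbol{\theta}_N$ scales are linearly related as follows (Kim, 2019):

$$\theta_{0B} = A\theta_{0N} + B, \tag{3}$$
$$\theta_{sB} = \lambda_s\theta_{sN} + \beta_s, \quad s = 1, \ldots, k, \tag{4}$$

where $A$ and $B$ are the linking coefficients for the $\theta_0$ dimension and $\lambda_s$ and $\beta_s$ are the linking coefficients for the $\theta_s$ dimension. The slopes, $A$ and $\lambda_s$, adjust for unit differences between the new and base scales, and the translation intercepts, $B$ and $\beta_s$, adjust for location differences. If scale linking is perfect, the two sets of item/testlet parameter estimates from separate calibrations should be related as follows:

$$a_{iB} = a_{iN}/A, \tag{5}$$
$$\frac{C_{sB}}{A} = \frac{C_{sN}}{\lambda_s}, \tag{6}$$
$$d_{iB} = d_{iN} - a_{iN}\left(\frac{1}{A}\right)B - a_{iN}\left(\frac{C_{sB}}{A}\right)\beta_s \tag{7a}$$
$$\phantom{d_{iB}} = d_{iN} - \frac{1}{A}\begin{bmatrix} a_{iN} & a_{iN} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & C_{sB} \end{bmatrix}\begin{bmatrix} B \\ \beta_s \end{bmatrix}, \tag{7b}$$
$$b_{iB} = A\left[b_{iN} + \left(\frac{C_{sB}}{A}\right)\beta_s\right] + B, \tag{8}$$
$$c_{iB} = c_{iN}. \tag{9}$$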

However, Equations (5) through (9) do not perfectly hold among estimated item/testlet parameters because of sampling errors and possible model-data misfit. In general, linking errors are unavoidable with sample data, and the linking coefficients should be properly estimated so as to minimize the errors (Kim & Lee, 2006; Kolen & Brennan, 2014).
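To make the invariance behind Equations (3) through (9) concrete, the following Python sketch transforms one item's new-scale parameters to the base scale and confirms that the response probability is unchanged when the person parameters are transformed by Equations (3) and (4). The helper name `to_base_scale` and every numeric value are assumptions introduced here for illustration; they are not taken from the study.

```python
import math

def irf_bifactor(theta0, theta_s, a, C, d, c, D=1.7):
    """Equation (2): 3PL bi-factor response function."""
    z = D * (a * theta0 + a * C * theta_s + d)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def to_base_scale(a_N, d_N, c_N, C_N, C_B, A, B, beta_s):
    """Apply Equations (5), (6), (7a), and (9) to one item (hypothetical helper)."""
    a_B = a_N / A                                           # Eq. (5)
    lam = A * C_N / C_B                                     # Eq. (6) solved for lambda_s
    d_B = d_N - a_N * B / A - a_N * (C_B / A) * beta_s      # Eq. (7a)
    return a_B, d_B, c_N, lam                               # Eq. (9): c is unchanged

# Hypothetical parameter values on the new scale and hypothetical linking coefficients.
a_N, d_N, c_N, C_N, C_B = 1.1, -0.4, 0.2, 0.7, 0.9
A, B, beta_s = 1.25, 0.37, -0.362
a_B, d_B, c_B, lam = to_base_scale(a_N, d_N, c_N, C_N, C_B, A, B, beta_s)

theta0_N, theta_s_N = 0.5, -0.8            # a person located on the new scale
theta0_B = A * theta0_N + B                # Eq. (3)
theta_s_B = lam * theta_s_N + beta_s       # Eq. (4)

p_new = irf_bifactor(theta0_N, theta_s_N, a_N, C_N, d_N, c_N)
p_base = irf_bifactor(theta0_B, theta_s_B, a_B, C_B, d_B, c_B)
print(p_new, p_base)
```

The two probabilities agree up to rounding, which is what perfect linking means here. With estimated parameters the equality holds only approximately, and that gap is what the linking methods minimize.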

The descriptions above might be read as if the linking methods for the 3PL testlet model should estimate $2(k+1)$ linking coefficients ($A$, $\lambda_1$ to $\lambda_k$ and $B$, $\beta_1$ to $\beta_k$), but that is not the case. As can be seen from Equation (6), each $\lambda_s$ ($s = 1, \ldots, k$) is a function of $A$, $C_{sB}$, and $C_{sN}$, so if $A$ is estimated and the two constants are given, its value is determined. Thus, the $k$ lambda coefficients are not considered as linking parameters to be estimated. For the beta coefficients, $\beta_s$, only $k-1$ can be uniquely estimated due to the linear dependence among them, such as $\sum_{s=1}^{k}\beta_s = 0$. Note that the linear dependence $\sum\beta_s = 0$ agrees with the constraint $\sum\mu_{\gamma_s} = 0$ used in Li et al. (2005), where $\mu_{\gamma_s} = C_{sB}\beta_s/A$ given $A$. Therefore, the three linking methods for the testlet model described below estimate $k+1$ “free” parameters, the $A$, $B$, and $k-1$ $\beta_s$ coefficients. Because the meaning of the linear dependence among the $\beta_s$ values can be clearly revealed in presenting the MLS method, and the TRF and IRF methods are akin to each other, we present the MLS method first and then present the two response function methods. For the following presentation, it is assumed that a common testlet $s$ contains $n_s$ items and the total number of items in the $k$ common testlets for linking is $n = \sum n_s$.

MLS Method

The MLS method presented here is a hybrid in that it uses part of the mean/mean method (Loyd & Hoover, 1980) to estimate the slope A and then uses the linear least squares approach to estimate the intercepts, B and βs . Unlike the TRF and IRF methods, the MLS method can estimate the linking coefficients without an iterative search for the solutions.

Taking the mean over $a$-parameters based on Equation (5) and solving the resulting equation for $A$ leads to a legitimate statistical solution for $A$ (Loyd & Hoover, 1980):

$$A = \frac{\text{Mean}(\mathbf{a}_N)}{\text{Mean}(\mathbf{a}_B)} = \frac{\sum_i a_{iN}/n}{\sum_i a_{iB}/n}, \tag{10}$$

where $\mathbf{a}_N$ and $\mathbf{a}_B$ represent all the discrimination parameters estimated on the new and base scales, respectively. Once the $A$ coefficient is estimated by Equation (10), the $B$ and beta coefficients need to be estimated simultaneously because, as seen from Equation (7) or (8), they appear together in the same equation.

Based on Equation (7a), let $d_{iN}^* = d_{iN} - a_{iN}B/A - a_{iN}C_{sB}\beta_s/A$ and $e_i = d_{iN}^* - d_{iB}$. According to the statistical approaches used in Divgi (1985) and Oshima et al. (2000), the $B$ and beta coefficients can be estimated as the values that minimize the sum of squared differences ($\sum e_i^2$) between $d_{iN}^*$ and $d_{iB}$ for all $i$. To obtain the solutions for the intercept coefficients using the least squares method, we first write an error model, based on Equation (7b), as

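$$\mathbf{d}_N - \mathbf{d}_B = \frac{1}{A}\mathbf{P}\mathbf{D}\boldsymbol{\beta} + \mathbf{e}, \tag{11}$$

where $\mathbf{d}_N = (d_{1N}, \ldots, d_{nN})'$ and $\mathbf{d}_B = (d_{1B}, \ldots, d_{nB})'$ are $n \times 1$ vectors of $d$-parameter estimates; $\mathbf{e}$ is an $n \times 1$ error vector; $\mathbf{P}$ is an $n \times (k+1)$ matrix whose row elements are “factor loadings,” $a_{iN}$s, associated with the $(k+1)$-dimensional space $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_k)$; $\mathbf{D}$ is a diagonal matrix whose $k+1$ diagonal elements are $1, C_{1B}, \ldots, C_{kB}$; and $\boldsymbol{\beta} = (B, \beta_1, \ldots, \beta_k)'$ is a vector of size $k+1$. For instance, if there are two testlets and two items within each testlet, $\mathbf{P}$, $\mathbf{D}$, and $\boldsymbol{\beta}$ are expressed as

$$\mathbf{P} = \begin{bmatrix} a_{1N} & a_{1N} & 0 \\ a_{2N} & a_{2N} & 0 \\ a_{3N} & 0 & a_{3N} \\ a_{4N} & 0 & a_{4N} \end{bmatrix}, \quad \mathbf{D} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & C_{1B} & 0 \\ 0 & 0 & C_{2B} \end{bmatrix}, \quad \text{and} \quad \boldsymbol{\beta} = \begin{bmatrix} B \\ \beta_1 \\ \beta_2 \end{bmatrix}. \tag{12}$$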

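Although the error model in Equation (11) resembles a regression model, where the dependent variable is $\mathbf{d}_N - \mathbf{d}_B$ and the coefficient vector is $\boldsymbol{\beta}$, the solutions of $\boldsymbol{\beta}$ cannot be computed using the ordinary least squares approach because the factor pattern matrix $\mathbf{P}$ is not of full column rank (as shown by the example matrix in Equation (12)). With the condition $n \geq k+1$, usually met in practice, the rank of $\mathbf{P}$ is $k$, not $k+1$. Such rank deficiency implies that except for the $B$ coefficient, the $k$ $\beta_s$ coefficients in $\boldsymbol{\beta}$ are linearly dependent, and only $k-1$ of them need to be estimated. Although the linear dependence among the $\beta_s$ can be formulated in many ways, we here choose the constraint $\sum\beta_s = 0$, corresponding to the constraint $\sum\mu_{\gamma_s} = 0$ used in Li et al. (2005). In addition, we introduce a transformation matrix $\mathbf{T}$, which relates $\boldsymbol{\beta}$ to an estimation vector $\boldsymbol{\alpha} = (B, \alpha_1, \ldots, \alpha_{k-1})'$ such that

$$\boldsymbol{\beta} = \mathbf{T}\boldsymbol{\alpha}, \tag{13}$$

where $\mathbf{T} = \begin{bmatrix} 1 & \mathbf{0}' \\ \mathbf{0} & \mathbf{T}_D \end{bmatrix}$ is a $(k+1) \times k$ matrix, and

$$\mathbf{T}_D = \begin{bmatrix} 1/k & 1/k & \cdots & 1/k \\ (1-k)/k & 1/k & \cdots & 1/k \\ 1/k & (1-k)/k & \cdots & 1/k \\ \vdots & \vdots & \ddots & \vdots \\ 1/k & 1/k & \cdots & (1-k)/k \end{bmatrix}.$$

Now the free coefficients in $\boldsymbol{\alpha}$ can be estimated using the least squares method, and the solution formula can be derived as

$$\boldsymbol{\alpha} = A[(\mathbf{P}\mathbf{D}\mathbf{T})'(\mathbf{P}\mathbf{D}\mathbf{T})]^{-1}(\mathbf{P}\mathbf{D}\mathbf{T})'(\mathbf{d}_N - \mathbf{d}_B). \tag{14}$$

Finally, the MLS solutions for the $B$ and $\beta_s$ coefficients in $\boldsymbol{\beta}$ are obtained by plugging the resulting $\boldsymbol{\alpha}$ into Equation (13). Note that the $\beta_s$ estimates necessarily satisfy the zero-sum constraint because each column of $\mathbf{T}_D$ sums to zero.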
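The MLS computation in Equations (10) through (14) can be sketched in a few lines. The study used R's built-in linear algebra functions; the Python version below is only an illustrative stand-in that uses a small hand-written Gauss-Jordan solver to stay dependency-free. The function names `solve` and `mls_link`, the 0-based `testlet` index list, and all numeric values are assumptions. The example builds error-free base-scale parameters from known coefficients so that recovery can be checked.

```python
def solve(M, y):
    """Solve the square system M x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [y[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mls_link(a_N, d_N, a_B, d_B, C_B, testlet):
    """Return (A, B, [beta_1..beta_k]); testlet[i] is the 0-based testlet of item i."""
    n, k = len(a_N), len(C_B)
    A = (sum(a_N) / n) / (sum(a_B) / n)                         # Eq. (10)
    # Row i of P*D: a_iN for theta_0 and a_iN * C_sB for the item's own testlet.
    PD = [[a_N[i]] + [a_N[i] * C_B[s] if testlet[i] == s else 0.0 for s in range(k)]
          for i in range(n)]
    # T is (k+1) x k; the columns of its lower block T_D each sum to zero.
    T = [[1.0] + [0.0] * (k - 1)]
    for r in range(k):
        T.append([0.0] + [(1.0 - k) / k if r == j + 1 else 1.0 / k for j in range(k - 1)])
    M = [[sum(PD[i][m] * T[m][j] for m in range(k + 1)) for j in range(k)] for i in range(n)]
    y = [d_N[i] - d_B[i] for i in range(n)]
    G = [[sum(M[i][p] * M[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    rhs = [sum(M[i][p] * y[i] for i in range(n)) for p in range(k)]
    alpha = [A * x for x in solve(G, rhs)]                      # Eq. (14)
    beta = [sum(T[m][j] * alpha[j] for j in range(k)) for m in range(k + 1)]  # Eq. (13)
    return A, beta[0], beta[1:]

# Hypothetical error-free example: two testlets with two items each.
a_N, d_N = [0.8, 1.1, 1.4, 0.9], [0.2, -0.5, 0.1, 0.7]
testlet, C_B = [0, 0, 1, 1], [0.6, 0.9]
A0, B0, beta0 = 1.25, 0.37, [-0.362, 0.362]
a_B = [a / A0 for a in a_N]                                     # Eq. (5)
d_B = [d_N[i] - a_N[i] * B0 / A0 - a_N[i] * C_B[testlet[i]] * beta0[testlet[i]] / A0
       for i in range(4)]                                       # Eq. (7a)
A_hat, B_hat, beta_hat = mls_link(a_N, d_N, a_B, d_B, C_B, testlet)
print(A_hat, B_hat, beta_hat)
```

Because the example contains no estimation error, the recovered coefficients match the generating values up to rounding; with real calibration output they would differ by the linking error.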

TRF Method

For the traditional 3PL model, the TRF at a given $\theta$ is defined as the sum of IRFs over all the items on the test, written as $T(\theta) = \sum P_i(\theta)$. $T(\theta)$ is the true score for an examinee with ability $\theta$. Conceptually, the analog of $T(\theta)$ for the 3PL testlet model can be defined as the sum of the marginalized IRFs, each of which is computed by integrating the nuisance dimension $\gamma_s$ (or $\theta_s$) out from the IRF in Equation (1) (or (2)). In accordance with this conception, let $P_i(\theta_B)$ and $T(\theta_B)$ denote the marginalized IRF and TRF computed with the item/testlet parameters estimated on the $\theta_B$ scale, respectively, and let $P_i^*(\theta_B)$ and $T^*(\theta_B)$ denote the marginalized IRF and TRF computed with the parameter estimates transformed to the $\theta_B$ scale. The TRF method finds the solutions of $A$ and $\boldsymbol{\beta} = (B, \beta_1, \ldots, \beta_k)'$ that minimize the criterion function, $f_T(A, \boldsymbol{\beta})$,

$$f_T(A, \boldsymbol{\beta}) = \frac{1}{N}\sum_{q=1}^{N}\left[T(\theta_{qB}) - T^*(\theta_{qB})\right]^2 = \frac{1}{N}\sum_{q=1}^{N}\left[\sum_{i=1}^{n}P_i(\theta_{qB}) - \sum_{i=1}^{n}P_i^*(\theta_{qB})\right]^2, \tag{15}$$

where $q = 1, 2, \ldots, N$ indexes $N$ arbitrary points over the $\theta_B$ scale.

Although the marginal IRF and TRF can be straightforwardly computed for each pair of $\theta$ and $\gamma_s$, Li et al. (2005) used a new composite variable $\xi_s = \theta + \gamma_s$ for linking purposes. They noted that, assuming $\theta$ and $\gamma_s$ have independent normal distributions with zero means and variances equal to 1 and $\sigma_{\gamma_s}^2$, respectively, $\xi_s$ given $\theta$ is distributed as $N(\theta, \sigma_{\gamma_s}^2)$. With $\xi_s$, more explicitly $\xi_{s(i)}$, the 3PL testlet model in Equation (1) can be expressed as

$$P_i(\xi_s) = P(\xi_s; a_i, b_i, c_i) = c_i + \frac{1 - c_i}{1 + \exp[-D a_i(\xi_s - b_i)]}. \tag{16}$$

Then, the probability of answering item $i$ within testlet $s$ correctly, conditional on $\theta$ and $\sigma_{\xi_s}$ ($= \sigma_{\gamma_s}$), that is, the marginalized $P_i(\theta)$, is expressed as

$$P_i(\theta) \equiv P_i(\theta; \sigma_{\xi_s}) = \int P_i(\xi_s)\,h(\xi_s \mid \theta; \sigma_{\xi_s})\,d\xi_s, \tag{17}$$

where $h$ is the probability density function of $\xi_s$ given $\theta$. The integral in Equation (17) can be approximated to any desired degree of accuracy by using Gauss–Hermite quadrature.
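As a rough illustration of Equation (17), the sketch below approximates the marginalized IRF numerically in Python. It deliberately swaps in a plain midpoint rule on an equally spaced grid for the Gauss–Hermite quadrature mentioned in the text, to avoid hardcoding quadrature nodes; the function names, grid settings, and item values are all hypothetical.

```python
import math

def irf_xi(xi, a, b, c, D=1.7):
    """Equation (16): 3PL response function in terms of xi = theta + gamma."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (xi - b)))

def marginal_irf(theta, a, b, c, sigma, D=1.7, n_points=201, half_width=6.0):
    """Approximate Equation (17) with a midpoint rule over theta +/- half_width * sigma.

    This is a stand-in for Gauss-Hermite quadrature, not the authors' implementation.
    """
    if sigma == 0.0:
        return irf_xi(theta, a, b, c, D)       # no testlet effect to integrate out
    step = 2.0 * half_width / n_points
    total = 0.0
    for m in range(n_points):
        z = -half_width + (m + 0.5) * step     # standardized deviation of xi from theta
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * step
        total += irf_xi(theta + sigma * z, a, b, c, D) * weight
    return total

# Hypothetical item with a large testlet effect (variance 1).
a, b, c, sigma = 1.2, 0.0, 0.2, 1.0
p_at_b = marginal_irf(b, a, b, c, sigma)
p_marginal = marginal_irf(b + 1.0, a, b, c, sigma)
p_conditional = irf_xi(b + 1.0, a, b, c)
print(p_at_b, p_marginal, p_conditional)
```

At $\theta = b_i$ the marginal probability stays at $c_i + (1 - c_i)/2$ by symmetry, while away from $b_i$ the marginal curve is flatter than the conditional one, reflecting the extra variance contributed by the testlet effect.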

Let $\xi_{sB}$ and $\xi_{sN}$ denote the $\xi_s$ scales defined with the base and new groups, respectively. If the two scales are related as $\xi_{sB} = A\xi_{sN} + B$, the item parameters on the $\xi_{sN}$ scale can be transformed into those on the $\xi_{sB}$ scale as follows (Li et al., 2005):

$$a_{iN}^* = a_{iN}/A, \tag{18}$$
$$b_{iN}^* = Ab_{iN} + B. \tag{19}$$

Although both transformations are legitimate in a technical sense, the transformation of $b_{iN}$ by Equation (19) is insufficient for linking purposes because $\xi_{sB} = A\xi_{sN} + B$ takes into account possible mean and SD differences in $\theta$ between the base and new groups but not possible mean differences in $\gamma_s$ between the two groups. Li et al. (2005) pointed out that if separate calibration results were obtained using the model in Equation (16), possible differences in the mean of $\gamma_s$ between the base and new groups would be absorbed into $b_{iN}$, which would lead to a shift in $b_{iN}$. Therefore, they used the following transformation to account for that possible shift:

$$b_{iN}^* = A(b_{iN} + \mu_{\gamma_s}) + B. \tag{20}$$

They further indicated that the zero-sum constraint $\sum_s \mu_{\gamma_s} = 0$ ($s = 1, \ldots, k$) should be imposed for model identification, although they did not detail why the model needed that constraint. Note that based on Equation (8), the $b_{iN}^*$ in Equation (20) can also be written as

$$b_{iN}^* = A\left[b_{iN} + \left(\frac{C_{sB}}{A}\right)\beta_s\right] + B, \tag{21}$$

where $C_{sB}\beta_s/A = \mu_{\gamma_s}$. Of course, the constraint $\sum\beta_s = 0$ is necessary for the reason revealed when the MLS method was addressed above.

The criterion function $f_T(A, \boldsymbol{\beta})$ in Equation (15), defined with the two sets, $\{a_{iB}, b_{iB}, c_{iB}, C_{sB}\}$ and $\{a_{iN}^*, b_{iN}^*, c_{iN}\}$, for the $n$ common items associated with $k$ testlets, is nonlinear with respect to the linking coefficients, $A$, $B$, and $\beta_s$, where the zero-sum constraint can be dealt with in practice by setting $\beta_k = -\sum_{s=1}^{k-1}\beta_s$. Thus a multivariate search technique is required to find the linking solutions for the TRF method. Previous linking studies (e.g., Kim & Lee, 2006; Oshima et al., 2000) suggest that the minimization solutions can be obtained by using a modified Newton or quasi-Newton approach such as the BFGS method. However, Li et al. (2005) combined the GENOUD algorithm (Sekhon & Mebane, 1998) with the BFGS method to ensure that the global, not the local, minimum solutions are obtained. All of the search techniques are based on the vector of partial derivatives (i.e., gradient) of the criterion function with respect to the parameters. The analytic formulas for the gradient of $f_T(A, \boldsymbol{\beta})$ with respect to the linking coefficients are presented in the Appendix.

IRF Method

Given the marginalized IRFs $P_i(\theta_B)$ and $P_i^*(\theta_B)$ for all common items, the IRF linking method (Haebara, 1980) for the traditional 3PL model can be straightforwardly extended to the 3PL testlet model. Similarly to the TRF method, the IRF method finds the solutions of $A$ and $\boldsymbol{\beta} = (B, \beta_1, \ldots, \beta_k)'$ that minimize the criterion function, $f_I(A, \boldsymbol{\beta})$,

$$f_I(A, \boldsymbol{\beta}) = \frac{1}{Nn}\sum_{q=1}^{N}\sum_{i=1}^{n}\left[P_i(\theta_{qB}) - P_i^*(\theta_{qB})\right]^2, \tag{22}$$

where, as denoted earlier, $n$ is the number of common items and $q = 1, 2, \ldots, N$ indexes $N$ arbitrary points over the $\theta_B$ scale. Although the marginalized IRFs $P_i(\theta)$ or $P_i(\theta_0)$, as generally denoted, can be evaluated by Equation (17), they can also be computed using the bi-factor model in Equation (2) as follows:

$$P_i(\theta) \equiv P_i(\theta_0) = \int P_i(\theta_0, \theta_s)\,g(\theta_s)\,d\theta_s, \tag{23}$$

where $g$ is the probability density function of $\theta_s$. Of course, in that case, the $P_i(\theta_{qB})$ and $P_i^*(\theta_{qB})$ in Equation (22) are the probabilities evaluated at $\theta_0 = \theta_{qB}$ with the parameter sets $\{a_{iB}, c_{iB}, d_{iB}, C_{sB}\}$ and $\{a_{iN}^*, c_{iN}, d_{iN}^*\}$, respectively.

The criterion function $f_I(A, \boldsymbol{\beta})$ is nonlinear, as is $f_T(A, \boldsymbol{\beta})$, with respect to the linking coefficients, and thus a search technique is required to find the linking solutions. In this paper, we use the BFGS algorithm to find the linking solutions for the IRF method. The analytic formulas for the gradient of $f_I(A, \boldsymbol{\beta})$ are presented in the Appendix.
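The two criterion functions in Equations (15) and (22) differ only in where the squaring happens: the TRF criterion squares the summed item differences, whereas the IRF criterion squares each item difference. The Python sketch below evaluates both under stated assumptions: it is not the authors' R code, it uses a midpoint rule in place of Gauss–Hermite quadrature, it omits the optimizer and analytic gradients, and every name and number in it is hypothetical.

```python
import math

def marginal_irf(theta, a, b, c, sigma, D=1.7, n_points=81, half_width=5.0):
    """Midpoint-rule approximation of Equation (17) (stand-in for Gauss-Hermite)."""
    step = 2.0 * half_width / n_points
    total = 0.0
    for m in range(n_points):
        z = -half_width + (m + 0.5) * step
        w = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * step
        total += (c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta + sigma * z - b)))) * w
    return total

def criteria(A, B, beta_free, base, new, C_B, testlet, thetas):
    """Return (f_T, f_I) of Equations (15) and (22).

    base/new: lists of (a, b, c) per common item; beta_free: the first k-1 betas.
    """
    beta = list(beta_free) + [-sum(beta_free)]          # zero-sum constraint
    n, N = len(base), len(thetas)
    f_T = f_I = 0.0
    for th in thetas:
        diff_sum = 0.0
        for i in range(n):
            s = testlet[i]
            a_B, b_B, c_B = base[i]
            a_N, b_N, c_N = new[i]
            a_star = a_N / A                                    # Eq. (18)
            b_star = A * (b_N + (C_B[s] / A) * beta[s]) + B     # Eq. (21)
            diff = (marginal_irf(th, a_B, b_B, c_B, C_B[s])
                    - marginal_irf(th, a_star, b_star, c_N, C_B[s]))
            diff_sum += diff
            f_I += diff * diff          # IRF: square each item difference
        f_T += diff_sum * diff_sum      # TRF: square the summed difference
    return f_T / N, f_I / (N * n)

# Hypothetical error-free example: base parameters are exact transforms of new ones.
new = [(0.8, -0.5, 0.2), (1.1, 0.3, 0.15), (1.4, 0.0, 0.25), (0.9, 1.0, 0.1)]
testlet, C_B = [0, 0, 1, 1], [0.6, 0.9]
A0, B0, beta0 = 1.25, 0.37, [-0.362, 0.362]
base = [(a / A0, A0 * (b + (C_B[s] / A0) * beta0[s]) + B0, c)
        for (a, b, c), s in zip(new, testlet)]
thetas = [-4.0 + 0.2 * q for q in range(41)]            # 41 points from -4 to 4

fT_true, fI_true = criteria(A0, B0, beta0[:-1], base, new, C_B, testlet, thetas)
fT_wrong, fI_wrong = criteria(A0, B0, [0.0], base, new, C_B, testlet, thetas)
print(fT_true, fI_true, fT_wrong, fI_wrong)
```

Both criteria vanish at the generating coefficients. When only the $\beta_s$ values are wrong, item-level differences in different testlets can have opposite signs and partly cancel inside the TRF sum, which is one way to see why $f_T$ may be less informative about $\beta_s$ than $f_I$. In practice these functions would be handed to a minimizer such as BFGS.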

Simulation Study

A simulation study was conducted to compare the performance of the TRF, MLS, and IRF methods. Two versions of the TRF method were conducted: one using the GENOUD algorithm and the other using the BFGS algorithm. The IRF method was implemented using only the BFGS algorithm. The design and methodology of this simulation study were closely matched to those used by Li et al. (2005) so that the comparison might be made under nearly the same conditions as those used in the previous study.

Design and Data

The CING design was used to evaluate the linking parameter recovery of the four methods for the 3PL testlet model: (a) the GENOUD-TRF method, (b) the BFGS-TRF method, (c) the MLS method, and (d) the IRF method based on the BFGS algorithm. As in Li et al. (2005), simulated tests and data sets were generated using different sets of item/testlet parameters and linking parameters. Each simulated test form consisted of six testlets, each of which contained 5 items, giving 30 items in total. The number $k$ of common testlets between the two (“base” and “new”) test forms to be linked was considered as the simulation factor. Two levels of $k$ were used: $k = 2$ and $k = 4$, resulting in the two common testlets condition (Condition 1) and the four common testlets condition (Condition 2), respectively.

For each simulation condition, 10 pairs of simulated tests with 5000 examinees per form were generated, as in Li et al. (2005). For each test form, with $D = 1.7$, the $a_i$ parameters were generated from $LN(0, 0.5^2)$, the log-normal distribution with log-mean = 0 and log-SD = .5; the $b_i$ parameters were generated from $N(0, 1)$ under the restriction that $-3 \leq b_i \leq 3$; and the $c_i$ parameters were generated from a uniform distribution ranging from .05 to .35. For each test form, the variances of $\gamma_s$ (that is, $C_{s(i)}^2$) were set to three levels, .1 (small testlet effect), .5 (medium testlet effect), and 1 (large testlet effect), and they were assigned to three testlet pairs that were randomly matched. Note that 10 or 20 common items had the same parameters between the base and new forms to be linked.

Because the linking coefficients $A$ and $B$ reflect, respectively, the differences in the SD and mean of the primary factor $\theta_0$ between the base and new populations, and the $\beta_s$ coefficients reflect differences in the mean of the testlet effect factors $\theta_s$, the generation of linking parameters began by fixing the distributions of all factors for the base population to $N(0, 1)$. Then the slope coefficients $A$ (the SDs of $\theta_0$ for the new population) were generated from $LN(0, 0.2^2)$, and the intercept coefficients $B$ (the means of $\theta_0$ for the new population) were generated from $N(0, 0.3^2)$. Ten combinations of $A$ and $B$ values were generated, and they were applied to both Conditions 1 and 2. Note that for the first combination, the values of $A$ and $B$ were set at 1 and 0, respectively, so that it could serve as the baseline combination. For each simulation condition, the $\beta_s$ coefficients (the means of $\theta_s$ for the new population) were generated from $N(0, 0.3^2)$, subject to the constraint $\sum_{s=1}^{k}\beta_s = 0$, where the first $k-1$ beta coefficients were randomly sampled from the distribution and the last beta coefficient was set as $\beta_k = -\sum_{s=1}^{k-1}\beta_s$. Associated with the first combination of $A = 1$ and $B = 0$, all beta coefficients were set at 0. The true linking parameters, $A$, $B$, and $\beta_s$, used to generate 20 data sets (10 data sets per condition) are presented in Table 1.
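The sampling scheme for the true linking coefficients can be sketched as follows. This Python fragment is an illustration only: the function name is hypothetical, the seed is arbitrary, and it will not reproduce the values in Table 1.

```python
import random

def draw_linking_parameters(k, rng):
    """Draw A, B, and k zero-sum beta coefficients following the described scheme."""
    A = rng.lognormvariate(0.0, 0.2)       # LN(0, 0.2^2): SD of theta_0 in the new population
    B = rng.gauss(0.0, 0.3)                # N(0, 0.3^2): mean of theta_0 in the new population
    beta = [rng.gauss(0.0, 0.3) for _ in range(k - 1)]
    beta.append(-sum(beta))                # last coefficient enforces the zero-sum constraint
    return A, B, beta

rng = random.Random(2022)                  # arbitrary seed, not the one used in the study
A, B, beta = draw_linking_parameters(4, rng)
print(A, B, beta, sum(beta))
```

Setting the last coefficient to the negative sum of the others is the same device used to impose the zero-sum constraint during estimation.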

Table 1.

True Linking Coefficients for Simulation Conditions 1 and 2.

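| $A$ | $B$ | Data Set (Condition 1) | $\beta_1$ | $\beta_2$ | Data Set (Condition 2) | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.000 | .000 | 1 | .000 | .000 | 11 | .000 | .000 | .000 | .000 |
| .850 | −.244 | 2 | −.052 | .052 | 12 | −.202 | .006 | −.541 | .737 |
| 1.250 | .370 | 3 | −.362 | .362 | 13 | −.164 | .507 | .063 | −.406 |
| 1.013 | .051 | 4 | .224 | −.224 | 14 | −.141 | .512 | −.112 | −.259 |
| .900 | −.360 | 5 | .145 | −.145 | 15 | −.024 | .238 | −.143 | −.071 |
| 1.093 | .202 | 6 | −.017 | .017 | 16 | −.138 | −.008 | .658 | −.512 |
| 1.234 | −.180 | 7 | .132 | −.132 | 17 | .142 | −.172 | −.021 | .051 |
| .986 | .026 | 8 | −.321 | .321 | 18 | .087 | −.188 | .053 | .048 |
| .961 | .150 | 9 | .239 | −.239 | 19 | −.475 | −.247 | .155 | .567 |
| .857 | −.108 | 10 | .020 | −.020 | 20 | −.082 | .013 | .364 | −.295 |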

Estimation and Evaluation

For each data set, the item and testlet parameters for the 3PL testlet model were estimated using the computer program flexMIRT (Cai, 2017). By default, flexMIRT uses 0–1 scaling for each factor to estimate item parameters, and we applied that scaling approach to the separate calibrations of base and new sample data. With the separate calibration results, the linking parameters were estimated using the statistical programming language R (R Development Core Team, 2018). Specifically, the linking solutions for the MLS method were computed using the built-in linear algebra functions. The solutions for the GENOUD-TRF method were found using the “genoud” function included in the R package rgenoud (Mebane & Sekhon, 2011). The solutions for the BFGS-TRF and IRF methods were found using the “optim” function included in the R package stats. For the TRF and IRF methods, 41 $\theta_B$ points, equally spaced from −4 to 4, were used to define their criterion functions (see Equations (15) and (22)).

For each data set in each condition, differences between the estimated and true linking parameters (i.e., estimation errors) were computed to evaluate the performance of each linking method. In addition, the means of the absolute differences across the 10 data sets in each condition were computed to summarize the estimation errors for each of the linking parameters.
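The summary statistic is simply the mean of the absolute estimation errors over the 10 data sets. As a small Python check, the sketch below recomputes two of the mean absolute errors from the per-data-set errors reported in Table 2 for the GENOUD-TRF method; the function name is hypothetical.

```python
def mean_absolute_error(errors):
    """Mean of absolute estimation errors (estimate minus true value) across data sets."""
    return sum(abs(e) for e in errors) / len(errors)

# Estimation errors for B and beta_1 (GENOUD-TRF, Condition 1), copied from Table 2.
b_errors = [.013, -.016, .056, .042, -.023, -.001, -.054, -.056, .025, .006]
beta1_errors = [-.174, .053, .445, -.162, -.069, .063, .154, .121, -.530, .045]

mae_B = mean_absolute_error(b_errors)
mae_beta1 = mean_absolute_error(beta1_errors)
print(round(mae_B, 3), round(mae_beta1, 3))
```

The rounded results agree with the "Mean absolute error" row of Table 2 for these two coefficients.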

Results

Results of Condition 1

The linking parameter recovery results of the GENOUD- and BFGS-TRF methods for Condition 1 (data sets 1–10), in which two common testlets were used, are presented in Table 2. The two TRF methods performed nearly equally in estimating the true linking parameters. For the $A$, $B$, and $\beta_s$ coefficients, in most cases, the estimates produced by the GENOUD-TRF method were equal to those by the BFGS-TRF method up to three decimal places. These results suggest that the use of a genetic algorithm offers little improvement to the TRF method. The recovery of the linking parameters differed by the type of linking coefficients. For most data sets, the estimation errors for $\hat{A}$ and $\hat{B}$ were close to zero, and the mean absolute errors of $\hat{A}$ and $\hat{B}$ were .021 and .029, respectively, indicating that the two TRF methods perform well in estimating the linking coefficients for the primary factor $\theta_0$. By contrast, the estimation errors for $\hat{\beta}_1$ and $\hat{\beta}_2$ were larger (exceeding .4 in absolute value for data sets 3 and 9), and their mean absolute errors were .182 and .182, respectively (the zero-sum constraint causes the two values to be the same). It is noteworthy that for the first baseline data set, the estimation errors for the beta coefficients (−.174 and .174) are much larger than the error for the $B$ coefficient (.013). This finding suggests that the TRF method can be poor at estimating the mean differences in testlet factors between the examinee groups being analyzed for linking.

Table 2.

Estimation Errors of the Linking Parameters from the Two TRF Methods for Condition 1.

Data Set $\hat{A} - A$ $\hat{B} - B$ $\hat{\beta}_1 - \beta_1$ $\hat{\beta}_2 - \beta_2$
GENOUD-TRF method
 1 −.019 .013 −.174 .174
 2 .007 −.016 .053 −.053
 3 −.018 .056 .445 −.445
 4 −.045 .042 −.162 .162
 5 −.007 −.023 −.069 .069
 6 −.038 −.001 .063 −.063
 7 .029 −.054 .154 −.154
 8 −.020 −.056 .121 −.121
 9 −.030 .025 −.530 .530
 10 .002 .006 .045 −.045
Mean absolute error .021 .029 .182 .182
BFGS-TRF method
 1 −.019 .013 −.174 .174
 2 .007 −.016 .053 −.053
 3 −.017 .056 .444 −.444
 4 −.045 .042 −.163 .163
 5 −.007 −.023 −.069 .069
 6 −.038 −.001 .063 −.063
 7 .029 −.054 .154 −.154
 8 −.020 −.056 .121 −.121
 9 −.030 .025 −.530 .530
 10 .002 .006 .045 −.045
Mean absolute error .021 .029 .182 .182

The recovery results from the MLS and IRF methods for Condition 1 (data sets 1–10) are presented in Table 3. For both methods, the estimation errors of $\hat{A}$ and $\hat{B}$ were close to zero in most cases, as was found with the two TRF methods. The mean absolute errors for $\hat{A}$ and $\hat{B}$ with the MLS method were .035 and .037, respectively, and those with the IRF method were .021 and .033. For the recovery of the beta coefficients, the estimation errors for $\hat{\beta}_1$ and $\hat{\beta}_2$ with the MLS and IRF methods were closer to zero than those with the two TRF methods. The mean absolute error for either beta coefficient with the MLS method was .068, and that with the IRF method was .036. This finding suggests that the IRF, MLS, and TRF methods perform best, second best, and worst, respectively, in estimating the intercept linking coefficients ($\beta_s$).

Table 3.

Estimation Errors of the Linking Parameters from the MLS and IRF Methods for Condition 1.

Data Set $\hat{A} - A$ $\hat{B} - B$ $\hat{\beta}_1 - \beta_1$ $\hat{\beta}_2 - \beta_2$
MLS method
 1 .015 −.019 .013 −.013
 2 .108 −.025 −.155 .155
 3 .013 −.150 −.182 .182
 4 .025 .021 .002 −.002
 5 −.032 −.006 .084 −.084
 6 .056 −.067 −.190 .190
 7 .017 −.020 .025 −.025
 8 −.030 −.039 −.018 .018
 9 −.044 .016 .006 −.006
 10 .009 .006 .007 −.007
Mean absolute error .035 .037 .068 .068
IRF method
 1 .007 .003 −.005 .005
 2 −.003 −.013 .026 −.026
 3 .027 −.075 −.091 .091
 4 −.033 .043 .012 −.012
 5 −.036 −.050 .102 −.102
 6 −.032 .013 .037 −.037
 7 .025 −.036 .007 −.007
 8 −.010 −.069 −.024 .024
 9 .019 −.031 .046 −.046
 10 .014 −.001 .013 −.013
Mean absolute error .021 .033 .036 .036

Results of Condition 2

The recovery results of the GENOUD- and BFGS-TRF methods in Condition 2 (data sets 11–20) are presented in Table 4, and the results of the MLS and IRF methods are presented in Table 5. As was found in the results in Condition 1, all methods produced estimation errors for $\hat{A}$ and $\hat{B}$ that were close to zero in most cases. The mean absolute errors for $\hat{A}$ and $\hat{B}$ were .022 and .046, respectively, with the GENOUD-TRF method, .023 and .045 with the BFGS-TRF method, .026 and .053 with the MLS method, and .014 and .036 with the IRF method.

Table 4.

Estimation Errors of the Linking Parameters from the Two TRF Methods for Condition 2.

Data Set $\hat{A}$ $\hat{B}$ $\hat{\beta}_1$ $\hat{\beta}_2$ $\hat{\beta}_3$ $\hat{\beta}_4$
GENOUD-TRF method
 11 −.067 .006 −.504 .063 −.189 .630
 12 −.013 −.088 −.317 −.621 .563 .376
 13 −.055 −.021 −.062 −.072 −.189 .324
 14 .013 −.055 .180 −.029 −.245 .094
 15 −.009 .061 −.304 −.110 .091 .323
 16 −.027 .040 −.335 −.196 .361 .170
 17 −.001 −.013 .067 .054 −.194 .074
 18 .001 .044 −.022 .119 −.298 .201
 19 −.017 .115 −.193 −.097 −.063 .352
 20 −.018 .020 .102 −.202 −.240 .340
Mean absolute error .022 .046 .209 .156 .243 .288
BFGS-TRF method
 11 −.067 .006 −.500 .057 −.189 .632
 12 −.010 −.085 −.310 −.578 .555 .332
 13 −.055 −.024 −.088 −.059 −.182 .330
 14 .014 −.048 .066 −.038 −.160 .132
 15 −.026 .060 .053 −.114 −.101 .162
 16 −.028 .040 −.317 −.158 .296 .179
 17 .001 −.012 .054 .047 −.169 .068
 18 .001 .044 −.027 .119 −.295 .202
 19 −.017 .115 −.193 −.097 −.063 .353
 20 −.014 .019 .114 −.176 −.244 .306
Mean absolute error .023 .045 .172 .144 .225 .270
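Because both the true and the estimated beta coefficients satisfy the zero-sum constraint, the four beta estimation errors in each row of Table 4 should sum to zero up to rounding. The following sketch checks this for the GENOUD-TRF rows; the names are hypothetical and the tolerance reflects only the three-decimal rounding of the table entries.

```python
# Beta-coefficient estimation errors for the GENOUD-TRF method, copied from
# Table 4 (data sets 11-20). Columns: beta-hat-1 to beta-hat-4.
genoud_beta_errors = {
    11: (-0.504, 0.063, -0.189, 0.630),
    12: (-0.317, -0.621, 0.563, 0.376),
    13: (-0.062, -0.072, -0.189, 0.324),
    14: (0.180, -0.029, -0.245, 0.094),
    15: (-0.304, -0.110, 0.091, 0.323),
    16: (-0.335, -0.196, 0.361, 0.170),
    17: (0.067, 0.054, -0.194, 0.074),
    18: (-0.022, 0.119, -0.298, 0.201),
    19: (-0.193, -0.097, -0.063, 0.352),
    20: (0.102, -0.202, -0.240, 0.340),
}

# Four entries, each rounded to three decimals, can accumulate at most
# 4 * 0.0005 = 0.002 of rounding error; a tiny slack covers floating point.
TOLERANCE = 0.002 + 1e-9

row_sums = {ds: sum(errors) for ds, errors in genoud_beta_errors.items()}
for ds, total in row_sums.items():
    print(f"data set {ds}: sum of beta errors = {total:+.3f}")

all_rows_sum_to_zero = all(abs(total) <= TOLERANCE for total in row_sums.values())
print(all_rows_sum_to_zero)  # True
```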

Table 5.

Estimation Errors of the Linking Parameters from the MLS and IRF Methods for Condition 2.

Data Set $\hat{A}$ $\hat{B}$ $\hat{\beta}_1$ $\hat{\beta}_2$ $\hat{\beta}_3$ $\hat{\beta}_4$
MLS method
 11 .082 −.093 −.014 .078 −.010 −.054
 12 .032 −.036 −.012 .170 −.167 .010
 13 −.029 −.063 −.300 .126 .120 .054
 14 .013 −.052 .005 .094 −.034 −.064
 15 −.003 .039 .036 .267 −.085 −.218
 16 −.038 .071 .008 −.226 .256 −.038
 17 .003 −.004 .012 −.068 .012 .043
 18 .005 −.021 .096 .063 −.017 −.141
 19 −.020 .139 −.100 −.220 −.078 .398
 20 −.035 −.012 −.142 .029 .209 −.097
Mean absolute error .026 .053 .073 .134 .099 .112
IRF method
 11 .047 −.006 .003 −.009 .044 −.037
 12 .007 −.051 .086 −.036 −.081 .031
 13 −.016 .005 −.038 −.001 .010 .029
 14 −.004 −.042 .004 .066 −.018 −.053
 15 .005 .063 −.066 .167 −.041 −.060
 16 −.013 .053 −.124 −.167 .345 −.054
 17 .006 .001 −.025 −.041 −.009 .074
 18 .014 .019 .011 −.034 −.027 .050
 19 −.011 .113 −.076 −.249 −.028 .353
 20 .017 −.006 −.103 .041 .137 −.075
Mean absolute error .014 .036 .053 .081 .074 .082

For the recovery of the beta coefficients, the two TRF methods produced estimation errors for $\hat{\beta}_1$ to $\hat{\beta}_4$ that deviated from zero by more than .3 in many cases. The mean absolute errors for $\hat{\beta}_1$ to $\hat{\beta}_4$ were .209, .156, .243, and .288, respectively, with the GENOUD-TRF method and .172, .144, .225, and .270 with the BFGS-TRF method. Thus, using a genetic algorithm for scale linking can lead to worse solutions for the beta coefficients than using a quasi-Newton algorithm. In contrast, the estimation errors for $\hat{\beta}_1$ to $\hat{\beta}_4$ with the MLS and IRF methods were closer to zero than those with the two TRF methods. The mean absolute errors of $\hat{\beta}_1$ to $\hat{\beta}_4$ were .073, .134, .099, and .112, respectively, with the MLS method and .053, .081, .074, and .082 with the IRF method. It is noteworthy that with the baseline data set 11, the two TRF methods resulted in much larger estimation errors for the beta coefficients than for the B coefficient. In sum, the IRF, MLS, and TRF methods performed best, second best, and worst, respectively, in estimating the $\beta_s$ coefficients.

Discussion and Conclusions

Li et al. (2005) proposed a TRF linking method for the 2PNO testlet model and used the GENOUD genetic algorithm to find minimization solutions for the linking parameters by updating a population of solutions from generation to generation. In the present paper, we used the 3PL testlet model for generality to formulate the linking task from the perspective of bi-factor modeling and presented two alternatives (MLS and IRF) to the TRF linking method for the model. One purpose of the simulation study was to examine whether there is a compelling reason to use the genetic algorithm instead of the BFGS algorithm, a widely applied quasi-Newton method, when using the TRF method to find linking solutions. The other purpose was to investigate the performance of the TRF method (based on either the GENOUD or BFGS algorithm) against the other linking methods, MLS and IRF.

The following main results were found from the simulation study. For the simulated linking data sets using two common testlets (Condition 1), the performance of the GENOUD-TRF method was nearly the same as that of the BFGS-TRF method in recovering the true linking parameters, A (slope) and B (intercept) for the primary dimension factor and $\beta_s$ (intercepts) for the testlet factors, subject to the zero-sum constraint $\sum_s \beta_s = 0$. In Condition 2 involving four common testlets, the GENOUD-TRF method performed nearly as well as the BFGS-TRF method in estimating the A and B coefficients, but it tended to estimate the $\beta_s$ coefficients less accurately than the BFGS-TRF method. This finding suggests that using a genetic algorithm does not lead to better solutions for the linking coefficients, particularly the beta coefficients, than using a quasi-Newton algorithm. In both simulation conditions, there was a small difference in linking accuracy among the linking methods for the A and B coefficients, whereas for the $\beta_s$ coefficients, the methods differed substantially. In recovering the true $\beta_s$ coefficients, on average, the IRF method showed the least estimation error, and the TRF methods produced the largest errors, more than double the average error of the IRF method. Taken together, these results suggest that the IRF, MLS, and TRF methods perform best, second best, and worst, respectively, in estimating the linking parameters associated with testlet effects.
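The "more than double" comparison can be made concrete by averaging the reported mean absolute errors of the beta coefficients within each method and dividing by the IRF average. The sketch below is only an illustrative calculation; the dictionary layout and names are hypothetical, and the values are copied from the mean-absolute-error rows of the results tables.

```python
# Mean absolute errors of the beta-coefficient estimates, one value per testlet,
# copied from the "Mean absolute error" rows of the results tables.
beta_mae = {
    "Condition 1": {
        "BFGS-TRF": [0.182, 0.182],
        "MLS": [0.068, 0.068],
        "IRF": [0.036, 0.036],
    },
    "Condition 2": {
        "GENOUD-TRF": [0.209, 0.156, 0.243, 0.288],
        "BFGS-TRF": [0.172, 0.144, 0.225, 0.270],
        "MLS": [0.073, 0.134, 0.099, 0.112],
        "IRF": [0.053, 0.081, 0.074, 0.082],
    },
}

def average(values):
    return sum(values) / len(values)

# Ratio of each method's average beta error to the IRF average, per condition.
ratios = {}
for condition, methods in beta_mae.items():
    irf_average = average(methods["IRF"])
    for method, values in methods.items():
        ratios[(condition, method)] = average(values) / irf_average
        print(f"{condition}  {method:<10s}  ratio to IRF = {ratios[(condition, method)]:.2f}")
```

In both conditions the TRF ratios exceed 2, whereas the MLS ratios fall between 1 and 2.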

The poor performance of the TRF method against the IRF method in estimating the $\beta_s$ coefficients may be regarded as a bit unusual, but is not a new finding, as shown by Kim (2019). To understand why the TRF method estimated the beta coefficients more poorly than the IRF method, we examined the contour plots (i.e., level curves) of the negative criterion functions, $f_T(A, \beta)$ and $f_I(A, \beta)$, for the two methods. With $\beta_k = -\sum_{s=1}^{k-1} \beta_s$ and A and/or B fixed to their true values, we drew the contour plots with the axes of B and $\beta_1$ for the data sets in Condition 1 and the axes of each of the three pairs ($\beta_1$ and $\beta_2$, $\beta_1$ and $\beta_3$, and $\beta_2$ and $\beta_3$) for the data sets in Condition 2. As illustrated in Figure 1 (where the contour plots for data sets 1 and 11 are presented as examples), for all data sets in Condition 1, the contour plots of $f_T(A, \beta)$ had top-level curves shaped like elongated rings, narrow along the B-axis but wide along the $\beta_1$-axis, whereas those of $f_I(A, \beta)$ had top-level curves shaped like small ellipses, narrow along both axes. Compared with the top-level curves for the IRF method, the shape of the top-level curves for the TRF method indicates that the B coefficient can be more accurately estimated than the $\beta_1$ coefficient. For all data sets in Condition 2, the contour plots of $f_T(A, \beta)$ had top-level curves shaped like distorted ellipses, big and wide, whereas those of $f_I(A, \beta)$ had top-level curves shaped like small ellipses, tilted diagonally.

Figure 1.

Contour Plots of $f_T(A, \beta)$ and $f_I(A, \beta)$ with Data Set 1 in Condition 1 and Data Set 11 in Condition 2.

The difference in shape between the top-level curves suggests that the TRF method produces less stable estimates of the beta coefficients than the IRF method and that the GENOUD and BFGS algorithms can converge to different neighborhoods and reach a global minimum. In other words, the criterion function of the IRF method is well-structured for the linking solutions but that of the TRF method is not. The criterion function of the IRF method is based on the sum of the squared differences in category response functions at the item level, and so the possible location differences between separate calibrations for each testlet are preserved and not mixed with those for other testlets. But the criterion function of the TRF method is based on the squared differences in true test scores, so that the possible location differences for each testlet are likely confounded at the test level. Such confounding likely leads to unstable estimation of the beta coefficients.
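The confounding argument can be illustrated with a deliberately simplified toy example. The sketch below is not the testlet model used in this paper: it omits the integration over the testlet factor, uses two-parameter logistic items, and assumes two hypothetical testlets with identical item parameters. It compares an item-level criterion and a test-level criterion under (a) a common location error, analogous to an error in B, and (b) a zero-sum testlet error, analogous to an error in the beta coefficients. All names and parameter values are assumptions made for illustration.

```python
import math

D = 1.7  # conventional logistic scaling constant

def irf(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical toy test: two testlets, each with the same three (a, b) items.
ITEMS = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]
TESTLETS = {1: ITEMS, 2: ITEMS}
THETAS = [-3.0 + 0.25 * q for q in range(25)]  # ability grid from -3 to 3

def criteria(delta_b, delta_beta):
    """Return (f_I, f_T) for a common location error delta_b and a zero-sum
    testlet error (+delta_beta for testlet 1, -delta_beta for testlet 2)."""
    shift = {1: delta_b + delta_beta, 2: delta_b - delta_beta}
    n_items = sum(len(items) for items in TESTLETS.values())
    f_i = 0.0
    f_t = 0.0
    for theta in THETAS:
        diffs = [irf(theta, a, b) - irf(theta, a, b + shift[s])
                 for s, items in TESTLETS.items() for a, b in items]
        f_i += sum(d * d for d in diffs) / n_items  # item-level squared differences
        f_t += sum(diffs) ** 2                      # squared difference in true scores
    return f_i / len(THETAS), f_t / len(THETAS)

fi_common, ft_common = criteria(delta_b=0.2, delta_beta=0.0)
fi_testlet, ft_testlet = criteria(delta_b=0.0, delta_beta=0.2)
print(f"common error : f_I = {fi_common:.5f}, f_T = {ft_common:.5f}")
print(f"testlet error: f_I = {fi_testlet:.5f}, f_T = {ft_testlet:.5f}")
```

In this toy setting the item-level criterion responds to both kinds of error to a similar degree, whereas the test-level criterion responds strongly to the common error but only weakly to the zero-sum testlet error, because opposite shifts in the two testlets largely cancel in the summed score.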

From the simulation results, we find no compelling reason to use genetic algorithms instead of quasi-Newton algorithms to find minimization solutions for the TRF method. Furthermore, in a numerical sense, we find that the criterion function of the IRF method produces better-structured solutions than that of the TRF method. From a practical point of view, the BFGS algorithm should be preferred to the GENOUD algorithm because the latter takes much more time than the former to find the minimization solutions. In this simulation study, the GENOUD-TRF method often took more than 5 and 10 minutes for the data sets in Conditions 1 and 2, respectively, whereas the BFGS-TRF and IRF methods took less than 5 and 15 seconds for the corresponding data sets.

As pointed out by Li et al. (2005), the zero-sum constraint for the beta coefficients indicates that the average (across testlets) of testlet factor means should be the same between the examinee groups being analyzed for scale linking. Although the zero-sum constraint seems to be reasonable and flexible, the linear dependence among the beta coefficients can be resolved in other ways. One feasible approach is choosing a common testlet whose factor mean is expected to change little across examinee groups and fixing the beta coefficient for that testlet to zero. The rest of the beta coefficients are then treated as free parameters. An analog of this approach is found in differential item functioning (DIF) analyses. If we use the constraint that the overall mean difference across examinee groups in item difficulties is zero, any performance differences are absorbed into differences in ability and treated as impact. If we suspect that the DIF may not cancel out across items, we can designate a set of anchor items to have zero DIF. Of course, to apply that “fixed-to-zero” constraint to linking tasks, the transformation matrix in Equation (13) should be modified appropriately for the MLS method, and the partial derivatives of the criterion functions with respect to the linking parameters should be properly computed for the IRF and TRF methods.
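The two ways of removing the linear dependence can be sketched as mappings from the free parameters to the full vector of beta coefficients. The helper names below are hypothetical and the numeric values are arbitrary; the sketch shows only the parameterization, not the estimation.

```python
def expand_zero_sum(free_betas):
    """Zero-sum constraint: the last coefficient is minus the sum of the others."""
    return list(free_betas) + [-sum(free_betas)]

def expand_fixed_to_zero(free_betas, anchor_index):
    """Fixed-to-zero constraint: the coefficient of the chosen anchor testlet
    (at position anchor_index in the full vector) is set to 0."""
    full = list(free_betas)
    full.insert(anchor_index, 0.0)
    return full

# Arbitrary illustrative values for k = 4 testlets (three free parameters).
free = [0.2, -0.5, 0.1]

zero_sum_betas = expand_zero_sum(free)
anchored_betas = expand_fixed_to_zero(free, anchor_index=1)

print(zero_sum_betas)       # four coefficients that sum to (numerically) zero
print(anchored_betas)       # [0.2, 0.0, -0.5, 0.1]
```

Under either constraint an optimizer searches over k − 1 free beta coefficients; the constraints differ only in how the remaining coefficient is determined.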

Some studies need to be conducted to improve understanding of the three linking methods presented in this paper and enable a wise choice among them in practice. First, the simulation study in this paper did not address the effects of linking errors in transformed item parameter estimates on the estimation of the ability ( θ0 ) parameters for the new group examinees. The accuracy of ability estimation would be affected most by the estimates of the biN or diN parameters, which are functions of the beta coefficients, given A , B , and CsB (see Equation (21)). It should be examined whether the ability parameters can be estimated better when the linking coefficients from the IRF method are used than when those from the MLS or TRF method are used. Second, although the separate calibration and linking approach is a basic and dependable method for developing a common IRT scale, a common scale can also be developed by using multiple-group concurrent calibration (Bock & Zimowski, 1997) or fixed parameter calibration (Kim, 2006). A comparative study on the performance of the three calibration types will offer practitioners in the areas of test equating and vertical scaling useful information about the advantages and disadvantages of the various linking methods. Third, the three linking methods presented for the 3PL testlet model need to be extended to polytomous testlet models such as the graded response testlet model and the generalized partial credit testlet model. It would be meaningful to investigate the performance of the linking methods using a variety of real and simulated data. Finally, it would be very useful to derive analytic formulas for the standard errors of linking coefficient estimates obtained from the different linking methods because those standard errors of estimates can be used as indices of linking precision in practice.

Acknowledgments

The authors are grateful to two anonymous reviewers, Dr. John R. Donoghue (the Editor-in-Chief), and Dr. Christine E. DeMars (the Associate Editor) for their beneficial comments and insightful suggestions to improve the quality of this paper.

Appendix: Partial Derivatives.

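Based on $c_{iN}^{*} = c_{iN}$ and Equations (18) and (21), let us write $P_i^{*}(\xi_{sB})$ and $P_i^{*}(\theta_B)$ as

$$P_i^{*}(\xi_{sB}) = c_{iN} + (1 - c_{iN}) \big/ \{1 + \exp[-D a_{iN}^{*}(\xi_{sB} - b_{iN}^{*})]\}, \tag{A1}$$

$$P_i^{*}(\theta_B) = \int P_i^{*}(\xi_{sB})\, h(\xi_{sB} \mid \theta_B; C_{sB})\, d\xi_{sB}. \tag{A2}$$

The partial derivatives of $P_i^{*}(\theta_B)$ with respect to $A$, $B$, and $\beta_s$ (where $i \in s$) are computed as

$$\frac{\partial P_i^{*}(\theta_B)}{\partial A} = -\int D a_{iN}^{*} \left(\frac{\xi_{sB} - C_{sB}\beta_s - B}{A}\right) \left[\frac{P_i^{*}(\xi_{sB}) - c_{iN}}{1 - c_{iN}}\right] [1 - P_i^{*}(\xi_{sB})]\, h(\xi_{sB} \mid \theta_B; C_{sB})\, d\xi_{sB}, \tag{A3}$$

$$\frac{\partial P_i^{*}(\theta_B)}{\partial B} = -\int D a_{iN}^{*} \left[\frac{P_i^{*}(\xi_{sB}) - c_{iN}}{1 - c_{iN}}\right] [1 - P_i^{*}(\xi_{sB})]\, h(\xi_{sB} \mid \theta_B; C_{sB})\, d\xi_{sB}, \tag{A4}$$

$$\frac{\partial P_i^{*}(\theta_B)}{\partial \beta_s} = -\int D a_{iN}^{*} C_{sB} \left[\frac{P_i^{*}(\xi_{sB}) - c_{iN}}{1 - c_{iN}}\right] [1 - P_i^{*}(\xi_{sB})]\, h(\xi_{sB} \mid \theta_B; C_{sB})\, d\xi_{sB}. \tag{A5}$$

Then, the partial derivatives of $f_I(A, \beta)$ with respect to $A$ and $B$ are given by

$$\frac{\partial f_I}{\partial A} = -\frac{2}{Nn} \sum_{q=1}^{N} \sum_{i=1}^{n} [P_i(\theta_{qB}) - P_i^{*}(\theta_{qB})] \frac{\partial P_i^{*}(\theta_{qB})}{\partial A}, \tag{A6}$$

$$\frac{\partial f_I}{\partial B} = -\frac{2}{Nn} \sum_{q=1}^{N} \sum_{i=1}^{n} [P_i(\theta_{qB}) - P_i^{*}(\theta_{qB})] \frac{\partial P_i^{*}(\theta_{qB})}{\partial B}. \tag{A7}$$

When $\beta_k = -\sum_{s=1}^{k-1} \beta_s$, the partial derivatives of $f_I(A, \beta)$ with respect to $\beta_s$ ($s = 1, \ldots, k-1$) are given by

$$\frac{\partial f_I}{\partial \beta_s} = -\frac{2}{Nn} \sum_{q=1}^{N} \left\{ \sum_{i \in s} [P_i(\theta_{qB}) - P_i^{*}(\theta_{qB})] \frac{\partial P_i^{*}(\theta_{qB})}{\partial \beta_s} - \sum_{i \in k} [P_i(\theta_{qB}) - P_i^{*}(\theta_{qB})] \frac{\partial P_i^{*}(\theta_{qB})}{\partial \beta_k} \right\}. \tag{A8}$$

With Equations (A1) to (A5), the partial derivatives of $f_T(A, \beta)$ with respect to $A$ and $B$ are given by

$$\frac{\partial f_T}{\partial A} = -\frac{2}{N} \sum_{q=1}^{N} \left[ \sum_{i=1}^{n} P_i(\theta_{qB}) - \sum_{i=1}^{n} P_i^{*}(\theta_{qB}) \right] \sum_{i=1}^{n} \frac{\partial P_i^{*}(\theta_{qB})}{\partial A}, \tag{A9}$$

$$\frac{\partial f_T}{\partial B} = -\frac{2}{N} \sum_{q=1}^{N} \left[ \sum_{i=1}^{n} P_i(\theta_{qB}) - \sum_{i=1}^{n} P_i^{*}(\theta_{qB}) \right] \sum_{i=1}^{n} \frac{\partial P_i^{*}(\theta_{qB})}{\partial B}. \tag{A10}$$

And if $\beta_k = -\sum_{s=1}^{k-1} \beta_s$, the partial derivatives of $f_T(A, \beta)$ with respect to $\beta_s$ ($s = 1, \ldots, k-1$) are given by

$$\frac{\partial f_T}{\partial \beta_s} = -\frac{2}{N} \sum_{q=1}^{N} \left[ \sum_{i=1}^{n} P_i(\theta_{qB}) - \sum_{i=1}^{n} P_i^{*}(\theta_{qB}) \right] \left\{ \sum_{i \in s} \frac{\partial P_i^{*}(\theta_{qB})}{\partial \beta_s} - \sum_{i \in k} \frac{\partial P_i^{*}(\theta_{qB})}{\partial \beta_k} \right\}. \tag{A11}$$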
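The integrands of Equations (A3) to (A5) can be checked numerically against finite differences. The sketch below is a minimal check under stated assumptions, not the authors' implementation: it assumes that the transformed slope is the new-form slope divided by A and that the transformed location is A times the new-form location plus B plus the product of the testlet constant and the beta coefficient. These transformations are our reading of how the linking coefficients enter and are not reproduced from Equation (21). Because the density h does not involve the linking coefficients, checking the integrand is sufficient. All function names and parameter values are hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant

def p_star(xi, a_new, b_new, c, A, B, C, beta):
    """Transformed response function at xi (assumed transformations, see lead-in)."""
    a_t = a_new / A                    # assumed slope transformation
    b_t = A * b_new + B + C * beta     # assumed location transformation
    return c + (1.0 - c) / (1.0 + math.exp(-D * a_t * (xi - b_t)))

def analytic_gradient(xi, a_new, b_new, c, A, B, C, beta):
    """Integrands of (A3)-(A5): derivatives with respect to A, B, and beta."""
    a_t = a_new / A
    p = p_star(xi, a_new, b_new, c, A, B, C, beta)
    core = D * a_t * ((p - c) / (1.0 - c)) * (1.0 - p)
    d_A = -core * (xi - C * beta - B) / A
    d_B = -core
    d_beta = -core * C
    return d_A, d_B, d_beta

def numeric_gradient(step=1e-6, **params):
    """Central finite differences with respect to A, B, and beta."""
    grads = []
    for name in ("A", "B", "beta"):
        up = dict(params)
        up[name] += step
        down = dict(params)
        down[name] -= step
        grads.append((p_star(**up) - p_star(**down)) / (2.0 * step))
    return tuple(grads)

# Arbitrary illustrative values.
params = dict(xi=0.4, a_new=1.3, b_new=-0.2, c=0.2, A=1.1, B=0.3, C=0.8, beta=0.25)
analytic = analytic_gradient(**params)
numeric = numeric_gradient(**params)
max_gap = max(abs(x - y) for x, y in zip(analytic, numeric))
print(analytic)
print(numeric)
print(max_gap < 1e-6)  # True
```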

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Seonghoon Kim https://orcid.org/0000-0002-0357-8639

References

  1. Bock R. D., Zimowski M. F. (1997). Multiple group IRT. In van der Linden W. J., Hambleton R. K. (Eds.), Handbook of modern item response theory (pp. 433–448). Springer. 10.1007/978-1-4757-2691-6_25
  2. Bradlow E. T., Wainer H., Wang X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168. 10.1007/bf02294533
  3. Cai L. (2017). flexMIRT: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Vector Psychometric Group.
  4. Davey T., Oshima T. C., Lee K. (1996). Linking multidimensional item calibrations. Applied Psychological Measurement, 20(4), 405–416. 10.1177/014662169602000407
  5. DeMars C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145–168. 10.1111/j.1745-3984.2006.00010.x
  6. Dennis J. E., Schnabel R. B. (1996). Numerical methods for unconstrained optimization and nonlinear equations. Society for Industrial and Applied Mathematics.
  7. Divgi D. R. (1985). A minimum chi-square method for developing a common metric in item response theory. Applied Psychological Measurement, 9(4), 413–415. 10.1177/014662168500900410
  8. Gibbons R. D., Hedeker D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57(3), 423–436. 10.1007/bf02295430
  9. Glas C. A. W., Wainer H., Bradlow E. T. (2000). Maximum marginal likelihood and expected a posteriori estimation in testlet-based adaptive testing. In van der Linden W. J., Glas C. A. W. (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–287). Kluwer Academic Publishers. 10.1007/0-306-47531-6_14
  10. Haebara T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144–149. 10.4992/psycholres1954.22.144
  11. Kim S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 353–381. 10.1111/j.1745-3984.2006.00021.x
  12. Kim S. (2019). Common-item linking methods for the bi-factor three parameter model in MIRT. Journal of Educational Evaluation, 32(1), 27–52. 10.31158/jeev.2019.32.1.27
  13. Kim S., Lee W.-C. (2006). An extension of four IRT linking methods for mixed-format tests. Journal of Educational Measurement, 43(1), 53–76. 10.1111/j.1745-3984.2006.00004.x
  14. Kolen M. J., Brennan R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.
  15. Li Y., Bolt D. M., Fu J. (2005). A testlet characteristic curve linking method for the testlet model. Applied Psychological Measurement, 29(5), 340–356. 10.1177/0146621605276678
  16. Li Y., Bolt D. M., Fu J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3–21. 10.1177/0146621605275414
  17. Lord F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates.
  18. Loyd B. H., Hoover H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179–193. 10.1111/j.1745-3984.1980.tb00825.x
  19. Marco G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14(2), 139–160. 10.1111/j.1745-3984.1977.tb00033.x
  20. Mebane W. R., Sekhon J. S. (2011). Genetic optimization using derivatives: The rgenoud package for R. Journal of Statistical Software, 42(11), 1–26. 10.18637/jss.v042.i11
  21. Oshima T. C., Davey T. C., Lee K. (2000). Multidimensional linking: Four practical approaches. Journal of Educational Measurement, 37(4), 357–373. 10.1111/j.1745-3984.2000.tb01092.x
  22. R Development Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
  23. Rijmen F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47(3), 361–372. 10.1111/j.1745-3984.2010.00118.x
  24. Sekhon J. S., Mebane W. R. (1998). Genetic optimization using derivatives. Political Analysis, 7, 187–210. 10.1093/pan/7.1.187
  25. Stocking M. L., Lord F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210. 10.1177/014662168300700208
  26. Wainer H., Bradlow E. T., Wang X. (2007). Testlet response theory and its applications. Cambridge University Press.
  27. Wainer H., Kiely G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24(3), 185–201. 10.1111/j.1745-3984.1987.tb00274.x
  28. Wainer H., Wang X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203–220. 10.1111/j.1745-3984.2000.tb01083.x
  29. Yen W. M., Fitzpatrick A. R. (2006). Item response theory. In Brennan R. L. (Ed.), Educational measurement (4th ed., pp. 111–153). American Council on Education and Praeger.

Articles from Applied Psychological Measurement are provided here courtesy of SAGE Publications
