Abstract
The calibration of the one-parameter logistic ability-based guessing (1PL-AG) model in item response theory (IRT) with modest sample sizes remains a challenge, owing to implausible parameter estimates and the difficulty of obtaining standard errors of the estimates. This article proposes an alternative Bayesian modal estimation (BME) method, the Bayesian Expectation-Maximization-Maximization (BEMM) method, which is developed by combining an augmented-variable formulation of the 1PL-AG model with a mixture-model conceptualization of the three-parameter logistic model (3PLM). Simulation comparisons with marginal maximum likelihood estimation (MMLE) and Markov chain Monte Carlo (MCMC) in JAGS show that BEMM produces stable and accurate estimates at modest sample sizes. A real-data example and the MATLAB codes of BEMM are also provided.
Keywords: 1PL-AG, BEMM, MMLE, IRT, algorithm
In educational measurement, a student's ability is considered unobservable, or latent. With the introduction of item response theory (IRT), this issue has been addressed by collecting students' responses to a set of items and producing a measurable result of students' proficiency level. The Rasch model (Lord & Novick, 1968; Rasch, 1960) has become a widely used and highly reliable model based only on students' ability and item difficulty.
Despite its popularity, the Rasch model has had to keep evolving to meet changing needs from testing practice. One prominent example is how to model guessing behavior, which may be prevalent given the dominance of the multiple-choice item format in the testing industry. To solve this problem, the one-parameter logistic guessing (1PL-G) IRT model and its ability-based guessing version (1PL-AG) were proposed (San Martín et al., 2006, 2013). Increasingly, applications of the 1PL-AG model have appeared in the literature as supporting evidence of its usefulness (Fariña et al., 2019; San Martín et al., 2013; Wang & Huang, 2011).
One daunting challenge regarding the 1PL-AG model pertains to the difficulty of estimating its item parameters. The lack of a user-friendly implementation remains a major barrier to its widespread application in practice, and few studies to date have addressed the item calibration issue for the 1PL-AG model. The existing methods fall into three approaches: PROC NLMIXED in SAS (San Martín et al., 2006), Markov chain Monte Carlo (MCMC) via general-purpose Bayesian tools such as WinBUGS (San Martín et al., 2013), and marginal maximum likelihood estimation with the expectation-maximization (MMLE/EM) algorithm (Park et al., 2015). Although PROC NLMIXED and BUGS offer ease of programming and accurate item estimates, both are general-purpose tools and thus inherently time-consuming (Park et al., 2015), which motivates the search for an alternative implementation.
Compared with PROC NLMIXED and MCMC, MMLE/EM is a more efficient procedure for obtaining estimates of the 1PL-AG model. However, as with the essential difficulty of estimating the three-parameter logistic model (3PLM), sparse data for the guessing parameters may cause extremely small diagonal elements in the negative Hessian matrix during Newton–Raphson iterations, which leads to unstable and implausible MLEs (Mislevy, 1986) and low convergence rates (Waller & Feuerstahler, 2017). Consequently, MMLE/EM usually requires a relatively large sample size to obtain stable MLEs (Park et al., 2015).
In this article, the authors propose an alternative Bayesian modal estimation approach, termed the Bayesian Expectation-Maximization-Maximization (BEMM) method, which is developed by combining a latent variable augmentation of the 1PL-AG model with the mixture-modeling approach to the 3PLM (Zheng et al., 2018). This kind of method can also be applied to estimate the four-parameter logistic IRT model (Zhang, 2018; Zhang et al., 2018). The article is organized as follows: First, the MMLE/EM method and identification issues for the 1PL-AG model are summarized. Then, the derivation of the BEMM algorithm for the 1PL-AG model is presented. Next, a simulation study and an empirical example are reported. The last section provides a brief discussion and future directions.
Marginalized Maximum Likelihood Estimation for the 1PL-AG Model
The basic IRT formulation of the 1PL-AG model is
(1) $P(Y_{ij}=1 \mid \theta_i, \beta_j, \gamma_j, \alpha) = c_{ij} + (1 - c_{ij})\, P^{*}_{ij}$

with

(2) $P^{*}_{ij} = \dfrac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}$

(3) $c_{ij} = \dfrac{\exp(\alpha\theta_i - \gamma_j)}{1 + \exp(\alpha\theta_i - \gamma_j)}$

where $\alpha$ is the weight of the guessing component on the latent ability, common to all items in one test, $\beta_j$ and $\gamma_j$ are the difficulty and logistic-scaled guessing parameters of item $j$, respectively, and $\theta_i$ is the ability parameter of examinee $i$. Furthermore, the 1PL-AG model degenerates to the 1PL-G model by fixing $\alpha = 0$ in Equation 3.
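To make the formulation concrete, the following is a minimal MATLAB sketch of Equations 1 to 3 (the function name `prob_1plag` and the vectorized layout are ours, not part of the paper's online package):

```matlab
function P = prob_1plag(theta, beta, gamma, alpha)
% 1PL-AG response probabilities (Equations 1-3).
% theta: N x 1 abilities; beta, gamma: 1 x J item parameters; alpha: scalar.
% Implicit expansion (R2016b+) yields the full N x J probability matrix.
Pstar = 1 ./ (1 + exp(-(theta - beta)));        % p-process success (Eq. 2)
C     = 1 ./ (1 + exp(-(alpha*theta - gamma))); % ability-based guessing (Eq. 3)
P     = C + (1 - C) .* Pstar;                   % overall probability (Eq. 1)
end
```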
Let $\zeta_j$ be an arbitrary item parameter in $\{\beta_j, \gamma_j\}$, and let $\xi_j = (\beta_j, \gamma_j)'$ be the $j$th column vector of the matrix $\boldsymbol{\xi}$ collecting the item parameters of all items. Similarly, let $\mathbf{Y}$ be the $N \times J$ matrix collecting all item responses of $N$ persons on $J$ items. Then, the marginal likelihood function is

(4) $L(\boldsymbol{\xi}, \alpha \mid \mathbf{Y}) = \prod_{i=1}^{N} P(\mathbf{Y}_i \mid \boldsymbol{\xi}, \alpha, \boldsymbol{\tau})$

with

(5) $P(\mathbf{Y}_i \mid \boldsymbol{\xi}, \alpha, \boldsymbol{\tau}) = \int P(\mathbf{Y}_i \mid \theta, \boldsymbol{\xi}, \alpha)\, g(\theta \mid \boldsymbol{\tau})\, d\theta$

(6) $P(\mathbf{Y}_i \mid \theta, \boldsymbol{\xi}, \alpha) = \prod_{j=1}^{J} P_{ij}^{\,y_{ij}} (1 - P_{ij})^{1 - y_{ij}}$

where $g(\theta \mid \boldsymbol{\tau})$ is the ability density function for each examinee $i$, and $\boldsymbol{\tau}$ is the vector containing the parameters of the examinee population ability distribution. Park et al. (2015) suggested using the EM algorithm with the Newton–Raphson method to solve this function; the maximum likelihood estimates (MLEs) of the parameters can then be obtained.
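Equations 4 to 6 can likewise be sketched with rectangular quadrature over a standard normal ability density (the helper name `marginal_loglik` and the node/weight arguments `X` and `A` are our own; `prob_1plag` is the sketch above):

```matlab
function ll = marginal_loglik(Y, beta, gamma, alpha, X, A)
% Marginal log-likelihood of the 1PL-AG model (Equations 4-6).
% Y: N x J binary responses; X: K x 1 quadrature nodes; A: K x 1 weights
% (e.g., normalized standard normal densities at equally spaced nodes).
K   = numel(X);
N   = size(Y, 1);
Lik = ones(N, K);
for k = 1:K
    P = prob_1plag(X(k), beta, gamma, alpha);        % 1 x J at node X_k
    Lik(:, k) = prod(P.^Y .* (1 - P).^(1 - Y), 2);   % Eq. 6 at node X_k
end
ll = sum(log(Lik * A));                              % Eqs. 4-5
end
```

With 61 equally spaced nodes on [−6, 6], as used later in the simulations, `A` would hold the normalized standard normal density at each node.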
The Bayesian Modal Estimation for the 1PL-AG Model
Model Identification
IRT models with guessing effects (e.g., the 3PLM) suffer from identification issues that cause estimation difficulty (Lord, 1980), and the 1PL-AG model is no exception (San Martín et al., 2015). However, several studies have addressed this issue (Fariña et al., 2019; San Martín et al., 2006, 2013, 2015). In particular, San Martín et al. (2006) showed that the identification issue of the 1PL-AG model can be avoided if the latent variable $\theta$ is assumed to follow a normal distribution with a fixed mean of zero. Following the MMLE/EM algorithm (Park et al., 2015), the BEMM algorithm also uses a standard normal distribution to integrate $\theta$ out, which guarantees that the 1PL-AG model is identified.
Mixture-Modeling Reformulation
The 1PL-AG model is derived from the two-process interpretation of the 3PL model: the p-process and the g-process (Hutchinson, 1991). The p-process is the regular process in which an examinee tries to solve an item relying only on the latent ability. The g-process, apart from the regular process, involves a guessing component weighted on the examinee's ability through the common parameter $\alpha$. Thus, the 1PL-AG model can be regarded as a latent mixture model. First, note that
(7) $P(Y_{ij}=1 \mid \theta_i) = c_{ij} + (1 - c_{ij})\, P^{*}_{ij} = P^{*}_{ij} \cdot 1 + (1 - P^{*}_{ij})\, c_{ij}$
Next, a latent indicator variable $W_{ij}$ is defined to indicate whether examinee $i$ knows the answer to item $j$ using only the p-process (Béguin & Glas, 2001; Culpepper, 2015; Guo & Zheng, 2019). Specifically, let

$W_{ij} = 1$ if examinee $i$ knows the answer to item $j$ through the p-process alone, and $W_{ij} = 0$ otherwise.

Then, the authors model $W_{ij}$ probabilistically, meaning that $P(W_{ij} = 1 \mid \theta_i, \beta_j) = P^{*}_{ij}$.
From the mixture-modeling perspective, depending on the value of $W_{ij}$, the 1PL-AG model can be decomposed into two separate components with success probabilities 1 and $c_{ij}$. The conditional probabilities can be further obtained as

(8) $P(Y_{ij}=1 \mid W_{ij}=1, \theta_i) = 1, \qquad P(Y_{ij}=1 \mid W_{ij}=0, \theta_i) = c_{ij}$
Thus, the joint distribution of $Y_{ij}$ and $W_{ij}$ (conditional on item and person parameters) can be found as follows:

(9) $P(Y_{ij}=1, W_{ij}=1 \mid \theta_i) = P^{*}_{ij}, \qquad P(Y_{ij}=1, W_{ij}=0 \mid \theta_i) = (1 - P^{*}_{ij})\, c_{ij}, \qquad P(Y_{ij}=0, W_{ij}=0 \mid \theta_i) = (1 - P^{*}_{ij})(1 - c_{ij})$
Since $P(Y_{ij}=0, W_{ij}=1) = 0$, this case is unnecessary in the joint probability function. Let $\mathbf{W}_i$ be the latent indicator vector for examinee $i$. Then the joint distribution for the new augmented complete data is

(10) $P(\mathbf{Y}_i, \mathbf{W}_i \mid \theta_i, \boldsymbol{\xi}, \alpha) = \prod_{j=1}^{J} f_{ij}$

where

(11) $f_{ij} = \left(P^{*}_{ij}\right)^{W_{ij}} \left[(1 - P^{*}_{ij})\, c_{ij}^{\,Y_{ij}} (1 - c_{ij})^{1 - Y_{ij}}\right]^{1 - W_{ij}}$
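The decomposition also suggests a direct way to simulate 1PL-AG responses, drawing the indicator $W$ first and then applying the two processes; the generating values below are our own illustrative choices:

```matlab
% Simulate 1PL-AG responses through the two-process mixture (Eqs. 9-11).
N = 1000; J = 20;
theta = randn(N, 1);              % abilities
beta  = randn(1, J);              % illustrative difficulties
gamma = 1.0 + 0.5*randn(1, J);    % illustrative guessing intercepts
alpha = 0.265;                    % weight of ability in the guessing process
Pstar = 1 ./ (1 + exp(-(theta - beta)));
C     = 1 ./ (1 + exp(-(alpha*theta - gamma)));
W = rand(N, J) < Pstar;           % W_ij = 1: the p-process yields the answer
Y = W | (rand(N, J) < C);         % otherwise guess with probability c_ij
```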
Just like the MMLE/EM algorithm (Bock & Aitkin, 1981), the BEMM algorithm also marginalizes out the latent ability variable, so the likelihood function for BEMM is

(12) $L(\boldsymbol{\xi}, \alpha \mid \mathbf{Y}, \mathbf{W}) = \prod_{i=1}^{N} \int \prod_{j=1}^{J} f_{ij}\, g(\theta)\, d\theta$
This approach follows the basic premise of an EM algorithm: from a full likelihood of the data, iteratively taking expectations of the log-likelihood over the incomplete portion of the data to define "artificial data," and then maximizing the resulting expected likelihood. Here the maximization step is split into two parts: first maximizing over $\beta_j$ and $\gamma_j$, and then maximizing over $\alpha$. Prior distributions for the item parameters are further used to obtain the Bayes estimates. As such, this approach is called the Bayesian Expectation-Maximization-Maximization (BEMM) method.
Following the EM algorithm for IRT models (Bock & Aitkin, 1981; Park et al., 2015), the first derivative of the log-likelihood function is calculated for the arbitrary item parameter $\zeta_j$. Let $X_k$, $k = 1, \ldots, K$, be nodes on the ability scale with associated weights $A(X_k)$; the integral can then be approximated by numerical quadrature. Then, we have (please refer to Online Appendix A for a detailed derivation):

(13) $\dfrac{\partial \log L(\boldsymbol{\xi}, \alpha \mid \mathbf{Y}, \mathbf{W})}{\partial \zeta_j} \approx \sum_{i=1}^{N} \sum_{k=1}^{K} \dfrac{\partial \log f_{ij}(X_k)}{\partial \zeta_j}\, \pi(X_k \mid \mathbf{Y}_i, \mathbf{W}_i)$

with

(14) $\pi(X_k \mid \mathbf{Y}_i, \mathbf{W}_i) = \dfrac{P(\mathbf{Y}_i, \mathbf{W}_i \mid X_k)\, A(X_k)}{\sum_{k'=1}^{K} P(\mathbf{Y}_i, \mathbf{W}_i \mid X_{k'})\, A(X_{k'})}$

(15) $P(\mathbf{Y}_i, \mathbf{W}_i \mid X_k) = \prod_{j=1}^{J} f_{ij}(X_k)$

(16) $A(X_k) = \dfrac{\phi(X_k)}{\sum_{k'=1}^{K} \phi(X_{k'})}$

where $\pi(X_k \mid \mathbf{Y}_i, \mathbf{W}_i)$ is the posterior density of $\theta_i$ at quadrature point $X_k$ and $\phi(X_k)$ is the density function of the standard normal distribution evaluated at $X_k$.
Expectation Step and Prior Distribution
In the expectation step, the expectation of the first derivative of the log-likelihood is taken for each item parameter over the incomplete portion of the data, the latent variables $\theta_i$ and $W_{ij}$. Based on the general equation of Bayes modal estimation given by Mislevy (1986), the next maximization step is equivalent to finding the zero of the sum of the expected first derivative of the log-likelihood and the first derivative of the log prior:

(17) $E\!\left[\dfrac{\partial \log L}{\partial \zeta_j}\right] + \dfrac{\partial \log p(\zeta_j \mid \boldsymbol{\eta})}{\partial \zeta_j} = 0$

where $p(\zeta_j \mid \boldsymbol{\eta})$ is the density function of the prior distribution of $\zeta_j$ with hyperparameters $\boldsymbol{\eta}$. Notice that the above expression involves the latent discrete variable $W_{ij}$, so when the expectation of the first derivatives is taken, it is necessary to calculate $P(W_{ij} \mid Y_{ij}, \theta_i)$. By using Bayes' rule, the conditional distribution of $W_{ij}$ is

(18) $P(W_{ij} = 1 \mid Y_{ij} = 1, \theta_i) = \dfrac{P^{*}_{ij}}{P^{*}_{ij} + (1 - P^{*}_{ij})\, c_{ij}}$

(19) $P(W_{ij} = 1 \mid Y_{ij} = 0, \theta_i) = 0$

Thus, the conditional expectation of $W_{ij}$ is found to be

(20) $E(W_{ij} \mid Y_{ij}, \theta_i) = \dfrac{Y_{ij}\, P^{*}_{ij}}{P^{*}_{ij} + (1 - P^{*}_{ij})\, c_{ij}}$
Several pieces of "artificial data" can be defined for the algorithm (Bock & Aitkin, 1981):

(21) $n_k = \sum_{i=1}^{N} \pi(X_k \mid \mathbf{Y}_i), \qquad r_{jk} = \sum_{i=1}^{N} y_{ij}\, \pi(X_k \mid \mathbf{Y}_i)$

and

(22) $n^{W}_{jk} = \sum_{i=1}^{N} E(W_{ij} \mid Y_{ij}, X_k)\, \pi(X_k \mid \mathbf{Y}_i), \qquad r^{W}_{jk} = \sum_{i=1}^{N} y_{ij}\, E(W_{ij} \mid Y_{ij}, X_k)\, \pi(X_k \mid \mathbf{Y}_i)$
We find that these expressions have very intuitive interpretations (a MATLAB sketch of these computations follows this list):

$n_k$ is the expected number of persons (out of $N$) with ability $X_k$;

$r_{jk}$ is the expected number of persons with ability $X_k$ who answer item $j$ correctly;

$n^{W}_{jk}$ is the expected number of persons with ability $X_k$ who know the answer to item $j$ using only the p-process;

$r^{W}_{jk}$ is the expected number of persons with ability $X_k$ who answer item $j$ correctly and know the answer using only the p-process. Due to the definition of the conditional expectation $E(W_{ij} \mid Y_{ij}, \theta_i)$, $r^{W}_{jk}$ mathematically equals $n^{W}_{jk}$.
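A minimal sketch of this bookkeeping, reusing the quantities from the earlier sketches (`Wbar` here stores $n^{W}_{jk} = r^{W}_{jk}$; all variable names are ours):

```matlab
% E-step bookkeeping (Eqs. 14 and 18-22): posterior node weights and the
% artificial data. Lik (N x K) holds P(Y_i | X_k) as in the marginal-
% likelihood sketch; X (K x 1) are the nodes and A (K x 1) the weights.
Post = (Lik .* A') ./ (Lik * A);     % N x K posterior weights (Eq. 14)
n_k  = sum(Post, 1)';                % K x 1 expected counts per node
r_jk = Y' * Post;                    % J x K expected correct counts
Wbar = zeros(J, K);                  % J x K expected p-process counts
for k = 1:K
    Pk = 1 ./ (1 + exp(-(X(k) - beta)));            % 1 x J
    Ck = 1 ./ (1 + exp(-(alpha*X(k) - gamma)));     % 1 x J
    Ek = Pk ./ (Pk + (1 - Pk).*Ck);                 % E[W | Y=1] (Eq. 20)
    Wbar(:, k) = (Y .* Ek)' * Post(:, k);           % zero whenever Y = 0
end
```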
Since the parameter $\alpha$ should be larger than 0 (San Martín et al., 2006), a log transformation of $\alpha$ is made to facilitate estimation (Mislevy, 1986). Let $\lambda = \ln \alpha$; a normal prior can then be imposed on $\lambda$ just as on the arbitrary item parameter $\zeta_j$. Thus, instead of directly estimating $\alpha$, $\hat{\lambda}$ is obtained first and its exponential is taken as the estimate of $\alpha$. Let $\phi(\cdot\,; \mu, \sigma^2)$ be the general form of the density function of the normal distribution with mean $\mu$ and variance $\sigma^2$. By plugging in the artificial data and priors, Equation 17 can be written as

(23) $\Lambda(\zeta_j) \equiv E\!\left[\dfrac{\partial \log L}{\partial \zeta_j}\right] + \dfrac{\partial \log \phi(\zeta_j; \mu_{\zeta}, \sigma^2_{\zeta})}{\partial \zeta_j} = 0$

where

(24) $\dfrac{\partial \log \phi(\zeta_j; \mu_{\zeta}, \sigma^2_{\zeta})}{\partial \zeta_j} = -\dfrac{\zeta_j - \mu_{\zeta}}{\sigma^2_{\zeta}}$
Maximization Step 1: The β and γ Parameters
Let $\Lambda(\beta_j)$ and $\Lambda(\gamma_j)$ be the parts of Equation 23 with respect to $\beta_j$ and $\gamma_j$. Then,

(25) $\Lambda(\beta_j) = \sum_{k=1}^{K}\left(n_k P^{*}_{jk} - n^{W}_{jk}\right) - \dfrac{\beta_j - \mu_{\beta}}{\sigma^2_{\beta}}$

(26) $\Lambda(\gamma_j) = \sum_{k=1}^{K}\left[(n_k - r_{jk})\, c_{jk} - \left(r_{jk} - n^{W}_{jk}\right)(1 - c_{jk})\right] - \dfrac{\gamma_j - \mu_{\gamma}}{\sigma^2_{\gamma}}$

where $P^{*}_{jk}$ and $c_{jk}$ denote Equations 2 and 3 evaluated at node $X_k$.
Since the $\beta$, $\gamma$, and $\alpha$ parameters do not have closed-form solutions, they must be found numerically. The method used here is the Newton–Raphson (N-R) iteration:

(27) $\begin{pmatrix} \beta_j \\ \gamma_j \end{pmatrix}^{(t+1)} = \begin{pmatrix} \beta_j \\ \gamma_j \end{pmatrix}^{(t)} - \mathbf{H}^{-1} \begin{pmatrix} \Lambda(\beta_j) \\ \Lambda(\gamma_j) \end{pmatrix}$

where $\mathbf{H}$ collects the second derivatives of the expected log-likelihood (plus log prior) for the parameters listed in the subscripts:

(28) $\Lambda_{\beta_j\beta_j} = -\sum_{k=1}^{K} n_k P^{*}_{jk}\left(1 - P^{*}_{jk}\right) - \dfrac{1}{\sigma^2_{\beta}}$

(29) $\Lambda_{\gamma_j\gamma_j} = -\sum_{k=1}^{K} \left(n_k - n^{W}_{jk}\right) c_{jk}\left(1 - c_{jk}\right) - \dfrac{1}{\sigma^2_{\gamma}}$

(30) $\Lambda_{\beta_j\gamma_j} = 0$
Although the $\beta$ and $\gamma$ parameters are maximized in the same N-R cycle, the cycle can be regarded as two independent procedures because $\Lambda_{\beta_j\gamma_j} = 0$. Thus, Equation 27 can be rewritten as

(31) $\beta_j^{(t+1)} = \beta_j^{(t)} - \dfrac{\Lambda(\beta_j)}{\Lambda_{\beta_j\beta_j}}, \qquad \gamma_j^{(t+1)} = \gamma_j^{(t)} - \dfrac{\Lambda(\gamma_j)}{\Lambda_{\gamma_j\gamma_j}}$
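One N-R cycle of this first M-step might look as follows, using the artificial data from the E-step sketch and normal priors with assumed hyperparameters `mu_b`, `s2_b`, `mu_g`, and `s2_g`:

```matlab
% One Newton-Raphson cycle of Maximization step 1 (Eqs. 25-31).
for j = 1:J
    Pk = 1 ./ (1 + exp(-(X' - beta(j))));          % 1 x K p-process at nodes
    Ck = 1 ./ (1 + exp(-(alpha*X' - gamma(j))));   % 1 x K guessing at nodes
    % beta_j: expected score (Eq. 25) and second derivative (Eq. 28)
    sb = sum(n_k'.*Pk - Wbar(j,:)) - (beta(j) - mu_b)/s2_b;
    hb = -sum(n_k'.*Pk.*(1 - Pk)) - 1/s2_b;
    beta(j) = beta(j) - sb/hb;
    % gamma_j: analogous update (Eqs. 26 and 29)
    sg = sum((n_k' - r_jk(j,:)).*Ck - (r_jk(j,:) - Wbar(j,:)).*(1 - Ck)) ...
         - (gamma(j) - mu_g)/s2_g;
    hg = -sum((n_k' - Wbar(j,:)).*Ck.*(1 - Ck)) - 1/s2_g;
    gamma(j) = gamma(j) - sg/hg;
end
```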
In addition, the above procedure can also be applied to estimate the 1PL-G model with some minor modifications.
Maximization Step 2: The α Parameter
Unlike the $\beta$ and $\gamma$ parameters, the first and second derivatives with respect to $\lambda = \ln \alpha$ involve the entire set of items, so $\lambda$ cannot be maximized together with $\beta_j$ and $\gamma_j$ in one Hessian matrix within the EM family of algorithms. In this case, following Park et al. (2015), the solutions of $\beta_j$ and $\gamma_j$ are found for each item first, and then the $\alpha$ parameter is maximized. Let $\Lambda(\lambda)$ and $\Lambda_{\lambda\lambda}$ be the part of Equation 23 and the corresponding second derivative with respect to $\lambda$; the estimate in the N-R iteration can be obtained from

(32) $\lambda^{(t+1)} = \lambda^{(t)} - \dfrac{\Lambda\!\left(\lambda^{(t)}\right)}{\Lambda_{\lambda\lambda}\!\left(\lambda^{(t)}\right)}$

where

(33) $\Lambda(\lambda) = \alpha \sum_{j=1}^{J}\sum_{k=1}^{K} X_k \left[\left(r_{jk} - n^{W}_{jk}\right)(1 - c_{jk}) - (n_k - r_{jk})\, c_{jk}\right] - \dfrac{\lambda - \mu_{\lambda}}{\sigma^2_{\lambda}}$

(34) $\Lambda_{\lambda\lambda} = -\alpha^2 \sum_{j=1}^{J}\sum_{k=1}^{K} X_k^2 \left(n_k - n^{W}_{jk}\right) c_{jk}(1 - c_{jk}) + \alpha \sum_{j=1}^{J}\sum_{k=1}^{K} X_k \left[\left(r_{jk} - n^{W}_{jk}\right)(1 - c_{jk}) - (n_k - r_{jk})\, c_{jk}\right] - \dfrac{1}{\sigma^2_{\lambda}}$

with $\alpha = e^{\lambda}$.
Once the BEMM procedure has converged, the estimate of $\alpha$ is obtained as $\hat{\alpha} = \exp(\hat{\lambda})$.
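A sketch of this second M-step follows; the score function is our own derivation from the expected complete-data log-likelihood, the prior hyperparameters `mu_l` and `s2_l` are assumptions, and for brevity the second derivative of Equation 34 is replaced by a numerical approximation:

```matlab
% One Newton-Raphson step of Maximization step 2 (Eqs. 32-33) on
% lambda = log(alpha), pooling information over all J items.
lam = log(alpha);
f = @(l) exp(l)*alpha_score(exp(l), X, gamma, n_k, r_jk, Wbar) ...
         - (l - mu_l)/s2_l;              % penalized score in lambda
h = 1e-4;                                % numerical second derivative
lam   = lam - f(lam) / ((f(lam + h) - f(lam - h)) / (2*h));
alpha = exp(lam);                        % back-transform (alpha > 0)

function s = alpha_score(alpha, X, gamma, n_k, r_jk, Wbar)
% d/d(alpha) of the expected complete-data log-likelihood (no prior term).
K = size(r_jk, 2); s = 0;
for k = 1:K
    C = 1 ./ (1 + exp(-(alpha*X(k) - gamma)));               % 1 x J
    s = s + X(k) * sum((r_jk(:,k)' - Wbar(:,k)').*(1 - C) ...
                       - (n_k(k) - r_jk(:,k)').*C);
end
end
```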
Standard Errors of Parameter Estimation in the BEMM
An important index of estimation quality is the standard error (SE). However, one major criticism of the EM algorithm is that parameter estimate SEs are not a natural product of the algorithm, and so other methods have to be devised (McLachlan & Krishnan, 2007). The BEMM falls prey to this criticism, just as all members of the EM algorithm family do. As such, it is necessary to compute SEs separately from the estimation algorithm itself.
Although the inverse of the negative expected value of the matrix of second derivatives of the log-likelihood (i.e., the inverse of the information matrix) is the simplest way to obtain the covariances, it can yield seriously biased and underestimated SEs (Kendall & Stuart, 1961). Cai and Lee (2009) proposed the Supplemented EM (SEM) method to offer a robust covariance matrix of the estimates in the EM algorithm, and Tian et al. (2013) further sped up that process with the Updated SEM (USEM) method. Thus, the USEM method is used here to obtain SEs, as it provides accurate and robust SEs for IRT model estimation.
Based on the general large-sample covariance matrix of MLEs proposed by Cai and Lee (2009), the covariance matrix of the BMEs of $(\beta_j, \gamma_j)$ in the 1PL-AG model is

(35) $\mathbf{V}_j = \mathbf{I}_j^{-1}\left[\mathbf{I}_2 + \boldsymbol{\Delta}_j\left(\mathbf{I}_2 - \boldsymbol{\Delta}_j\right)^{-1}\right]$

where $\mathbf{I}_2$ is a 2 × 2 identity matrix, $\mathbf{I}_j$ is the complete-data information matrix for item $j$, $\boldsymbol{\Delta}_j$ is the rate matrix with respect to the estimated parameters of item $j$, and the SE of $\hat{\alpha}$ is transformed from that of $\hat{\lambda}$ by using the Delta method (Oehlert, 1992). The calculation of $\boldsymbol{\Delta}_j$ not only requires the temporary estimates from each BEMM iteration history but also needs dozens of additional BEMM cycles. Please refer to Cai and Lee (2009) for the details of $\boldsymbol{\Delta}_j$, Tian et al. (2013) for the USEM method, and Online Appendix B for the specific implementation.
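For the back-transformation from $\hat{\lambda}$ to $\hat{\alpha}$, the Delta method gives a one-line approximation (a generic result, not specific to the paper's implementation; variable names are ours):

```matlab
% Delta method: SE of alpha-hat from the SE of lambda-hat = log(alpha-hat).
se_alpha = exp(lam_hat) * se_lam;   % SE(alpha) ~= alpha * SE(log alpha)
```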
Simulation Studies
This section presents Monte Carlo results on the accuracy of the developed BEMM procedure for estimating 1PL-AG item parameters, compared with MMLE/EM (Park et al., 2015) and MCMC in JAGS (Plummer, 2003). The goal of the simulation studies was to assess the influence of several design features: the size of the weighted guessing component on ability, the sample size, and the priors. To this end, the BEMM and MMLE/EM algorithms were both programmed in the MATLAB environment.
Item and Person Parameter Generation
The item parameters were generated from the same distributions as in Park et al. (2015). Specifically, item parameters for 20 items were generated and then held fixed across conditions to facilitate comparison (the generated parameters are listed in Online Appendix C). Since Bayesian methods are most useful at modest or relatively small sample sizes, the ability parameters were drawn from the standard normal distribution with sample sizes of N = 500, 1,000, and 2,500 across three weighted guessing component conditions (α = 0.065, 0.265, and 0.365).
Priors
Two types of priors were applied in the simulation study: true priors and commonly used priors. The true priors for β and γ are the same as the generating distributions given in the last section, which reflect parameter recovery in the ideal situation. However, true priors are unavailable in real-world settings, so commonly used priors are provided here. Based on the recommendations of BILOG-MG (du Toit, 2003) for difficulty parameters and of flexMIRT (Houts & Cai, 2015) for the logistic-scaled guessing parameters of the 3PLM, normal priors were adopted for the β and γ parameters. Besides, a normal prior was set on λ = ln α to cover the reasonable range of α, which usually lies between 0 and 1 (San Martín et al., 2006).
The Setting of BEMM and MMLE/EM in MATLAB
In both BEMM and MMLE/EM, 61 quadrature points were used to approximate the Gaussian distribution from −6 to +6 in all simulation studies and the real-data example, and the same starting values were used for each type of parameter. Both algorithms use the USEM method to obtain the standard errors of the parameter estimates. The convergence criteria for the E-step and M-step in the BEMM and MMLE/EM algorithms are 0.0001 and 0.01, respectively. In addition, both types of priors were employed in BEMM.
The Setting of JAGS
For JAGS, four chains were run in parallel with 10,000 iterations per chain (the first 5,000 were discarded as burn-in). The convergence criterion in all conditions is the Gelman–Rubin potential scale reduction factor (Gelman & Rubin, 1992), computed from multiple independent chains with random starting values. Since MCMC is presented here as a reference Bayesian method, only true priors were employed in JAGS, yielding Bayesian estimates under an ideal situation.
In sum, 36 conditions were simulated: four estimation methods (BEMM with true priors [TP], BEMM with commonly used priors [CP], MMLE/EM, and MCMC with TP) crossed with three examinee sample sizes (500; 1,000; 2,500) and three weighted guessing components (0.065, 0.265, 0.365). For each condition, 100 replications were run to reduce sampling error. Each condition was assessed by its accuracy in recovering the item parameters, as defined by bias, root mean squared error (RMSE), and average standard error (ASE) across the 100 replications, where
(36) $\mathrm{Bias}(\hat{\zeta}) = \dfrac{1}{R}\sum_{r=1}^{R}\left(\hat{\zeta}_r - \zeta\right), \quad \mathrm{RMSE}(\hat{\zeta}) = \sqrt{\dfrac{1}{R}\sum_{r=1}^{R}\left(\hat{\zeta}_r - \zeta\right)^2}, \quad \mathrm{ASE}(\hat{\zeta}) = \dfrac{1}{R}\sum_{r=1}^{R}\widehat{SE}\left(\hat{\zeta}_r\right)$, with $R = 100$ and $\zeta$ the true parameter value.
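In MATLAB terms, the three indices for one parameter across the R = 100 replications reduce to the following (array names are ours):

```matlab
% Recovery indices of Equation 36; est, se: R x 1 vectors over replications.
bias = mean(est - true_val);
rmse = sqrt(mean((est - true_val).^2));
ase  = mean(se);
```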
Results
Figures 1, 2, and 3 present the RMSEs for each item in the three α conditions (0.065, 0.265, and 0.365), respectively. Table 1 summarizes the means of the bias, RMSEs, and ASEs across 100 replications in the 36 conditions.
Figure 1.
The RMSEs of item parameter recovery for 500, 1,000, and 2,500 examinees when α = 0.065.
Note. RMSE = root mean squared error.
Figure 2.
The RMSEs of item parameter recovery for 500, 1,000, and 2,500 examinees when α = 0.265.
Note. RMSE = root mean squared error.
Figure 3.
The RMSEs of item parameter recovery for 500, 1,000, and 2,500 examinees when α = 0.365.
Note. RMSE = root mean squared error.
Table 1.
The Mean of the Bias, RMSEs, and ASEs Across 100 Replications in 36 Conditions.
| Indicator | True α | Method | α: 500 | α: 1,000 | α: 2,500 | β: 500 | β: 1,000 | β: 2,500 | γ: 500 | γ: 1,000 | γ: 2,500 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bias | .065 | BEMM-TP | .01 | .01 | .01 | .01 | .02 | .02 | .05 | .06 | .06 |
| | | BEMM-CP | −.00 | −.00 | .00 | −.04 | −.02 | .00 | .11 | .09 | .07 |
| | | MCMC-TP | .03 | .02 | .02 | .03 | .03 | .03 | .07 | .07 | .06 |
| | | MMLE/EM | .00 | .00 | −.01 | .12 | .06 | .02 | −.21 | −.16 | −.11 |
| | .265 | BEMM-TP | −.09 | −.07 | −.06 | −.08 | −.07 | −.06 | .02 | .01 | .00 |
| | | BEMM-CP | −.11 | −.09 | −.07 | −.12 | −.11 | −.09 | −.04 | −.04 | −.03 |
| | | MCMC-TP | −.02 | −.02 | −.01 | −.10 | −.08 | .05 | .03 | .03 | .03 |
| | | MMLE/EM | −.11 | −.10 | −.08 | −.16 | −.14 | −.11 | −.32 | −.24 | −.22 |
| | .365 | BEMM-TP | −.11 | −.08 | −.06 | −.09 | −.08 | −.07 | .01 | .01 | .01 |
| | | BEMM-CP | −.11 | −.08 | −.07 | −.14 | −.13 | −.10 | −.06 | −.05 | −.03 |
| | | MCMC-TP | −.03 | −.02 | −.02 | −.10 | −.08 | −.08 | .02 | .02 | .03 |
| | | MMLE/EM | −.16 | −.14 | −.11 | −.19 | −.18 | −.15 | −.38 | −.27 | −.24 |
| RMSE | .065 | BEMM-TP | .02 | .02 | .02 | .31 | .24 | .19 | .38 | .31 | .25 |
| | | BEMM-CP | .01 | .02 | .02 | .32 | .25 | .19 | .35 | .28 | .23 |
| | | MCMC-TP | .04 | .04 | .03 | .32 | .25 | .19 | .36 | .29 | .24 |
| | | MMLE/EM | .08 | .07 | .05 | .51 | .36 | .21 | .86 | .64 | .43 |
| | .265 | BEMM-TP | .10 | .08 | .07 | .34 | .28 | .23 | .42 | .36 | .30 |
| | | BEMM-CP | .13 | .10 | .08 | .36 | .29 | .23 | .39 | .33 | .28 |
| | | MCMC-TP | .06 | .05 | .04 | .34 | .28 | .22 | .40 | .34 | .28 |
| | | MMLE/EM | .15 | .14 | .12 | .70 | .54 | .37 | 1.05 | .81 | .64 |
| | .365 | BEMM-TP | .13 | .10 | .08 | .36 | .31 | .25 | .45 | .39 | .33 |
| | | BEMM-CP | .13 | .11 | .09 | .38 | .31 | .25 | .42 | .36 | .30 |
| | | MCMC-TP | .07 | .05 | .04 | .37 | .30 | .24 | .42 | .37 | .31 |
| | | MMLE/EM | .18 | .16 | .14 | .86 | .73 | .49 | 1.19 | .96 | .78 |
| ASE | .065 | BEMM-TP | .04 | .03 | .02 | .33 | .26 | .19 | .40 | .33 | .26 |
| | | BEMM-CP | .03 | .03 | .02 | .30 | .24 | .18 | .34 | .29 | .23 |
| | | MCMC-TP | .06 | .05 | .04 | .33 | .26 | .20 | .37 | .31 | .26 |
| | | MMLE/EM | .04 | .03 | .02 | .32 | .23 | .16 | .41 | .32 | .25 |
| | .265 | BEMM-TP | .05 | .04 | .02 | .33 | .27 | .21 | .44 | .39 | .32 |
| | | BEMM-CP | .05 | .04 | .03 | .29 | .24 | .19 | .37 | .32 | .28 |
| | | MCMC-TP | .07 | .06 | .05 | .34 | .28 | .22 | .41 | .36 | .30 |
| | | MMLE/EM | .05 | .04 | .03 | .37 | .27 | .20 | .44 | .36 | .29 |
| | .365 | BEMM-TP | .05 | .04 | .03 | .34 | .29 | .23 | .47 | .41 | .34 |
| | | BEMM-CP | .06 | .04 | .03 | .30 | .25 | .22 | .39 | .35 | .31 |
| | | MCMC-TP | .08 | .06 | .05 | .36 | .30 | .24 | .43 | .38 | .33 |
| | | MMLE/EM | .05 | .04 | .03 | .38 | .29 | .22 | .46 | .38 | .31 |
Note. RMSE = root mean squared error; ASE = average of standard error; BEMM-TP = Bayesian expectation-maximization-maximization-true priors; BEMM-CP = Bayesian expectation-maximization-maximization-commonly-used priors; MCMC-TP = Markov Chain Monte Carlo-true priors; MMLE/EM = marginal maximum likelihood estimation/expectation-maximization.
In general, as the sample size increases and the weighted guessing component decreases, the absolute values of the bias and the RMSEs of BEMM, MCMC, and MMLE/EM gradually decrease, although the differences are relatively small in some conditions. The bias and RMSEs of BEMM and MCMC are clearly better than those of MMLE/EM, especially for the β and γ parameters at sample sizes of 500 and 1,000. As the figures indicate, some RMSEs of MMLE/EM for γ are quite large.
Among the three Bayesian estimation methods, MCMC with true priors recovers the α parameter better than BEMM with true priors in the conditions of α = 0.265 and α = 0.365, but their other estimates are comparable in the same conditions. Compared with BEMM with commonly used priors, BEMM with true priors performs relatively better in estimating α and β. Furthermore, the recovery of γ shows a mixed result: The bias under true priors is smaller than under commonly used priors, whereas for the RMSEs the situation is the opposite. Despite this, both priors offer acceptable bias and RMSEs for γ.
As can be seen, the estimated standard errors of BEMM, MMLE/EM, and MCMC decrease as the sample size increases, and they are similar within the same condition. For MMLE/EM, all ASEs are smaller than their corresponding RMSEs, but this relationship is unstable for the Bayesian methods due to the influence of the priors. In addition, the running times of the three methods are summarized in Table 2. The running time of BEMM is very close to that of MMLE/EM, and both are markedly shorter than MCMC in JAGS.
Table 2.
The Mean of the Running Time (Minute) Across 100 Replications in 36 Conditions.
| Method | α = .065, N = 500 | 1,000 | 2,500 | α = .265, N = 500 | 1,000 | 2,500 | α = .365, N = 500 | 1,000 | 2,500 |
|---|---|---|---|---|---|---|---|---|---|
| BEMM-TP | .05 | .09 | .14 | .05 | .12 | .18 | .05 | .09 | .22 |
| BEMM-CP | .04 | .09 | .15 | .05 | .12 | .19 | .06 | .11 | .21 |
| MCMC-TP | 40.30 | 118.13 | 234.12 | 42.48 | 121.81 | 241.87 | 41.55 | 119.44 | 244.29 |
| MMLE/EM | .03 | .02 | .09 | .06 | .08 | .11 | .06 | .08 | .15 |
Note. BEMM-TP = Bayesian expectation-maximization-maximization-true priors; BEMM-CP = Bayesian expectation-maximization-maximization-commonly-used priors; MCMC-TP = Markov Chain Monte Carlo-true priors; MMLE/EM = marginal maximum likelihood estimation/expectation-maximization.
Based on these results, several conclusions can be drawn: (a) BEMM requires only 1,000 examinees to obtain relatively stable estimates (500 examinees are barely acceptable if necessary), whereas MMLE/EM requires at least 2,500 examinees. (b) The recovery of γ in MMLE/EM is clearly worse than that of β, and MMLE/EM cannot recover γ well in small samples. (c) The estimated standard errors yielded by the USEM method are reasonable and sensitive to changes in the sample size. (d) The commonly used priors in this study work well in calibrating the item parameters of the 1PL-AG model. (e) The bias and RMSEs of MMLE/EM at the sample size of 2,500 replicate the results of Park et al. (2015) and serve as the baseline condition in the simulation.
An Empirical Example
The 1PL-AG model was applied to the flexMIRT example data set "g341-19.dat," which consists of the responses of 2,844 examinees to 12 items. In a prior study, Guo and Zheng (2019) calibrated the items using the 3PLM and found that the difficulty of the whole test is relatively low. The authors therefore believe this example is a necessary complement to the simulation studies, which were designed for a high-difficulty situation. To further demonstrate the feasibility of BEMM for 1PL-AG model calibration, the authors randomly sampled 500 and 1,000 examinees from the full data set. In sum, the point estimates and estimated standard errors yielded by BEMM, MCMC in JAGS, and MMLE/EM are compared across sample sizes of 500, 1,000, and 2,844 examinees. The commonly used priors from the last section were used in BEMM and JAGS.
As can be seen from Figure 4, the two Bayesian methods (BEMM and JAGS) produce very similar point estimates and corresponding SEs under the same conditions. Furthermore, most of the point estimates of α and β are also similar among BEMM, JAGS, and MMLE/EM across the three sample sizes. However, the point estimates of the γs from MMLE/EM fluctuate noticeably as the sample size increases, whereas those yielded by BEMM and JAGS are more stable and very close to the results from the complete data set. Meanwhile, the estimated standard errors generally tend to decrease with increasing sample size for all three methods. These results are consistent with the simulation studies. In addition, compared with BEMM and JAGS, MMLE/EM has larger SEs for β and γ but smaller SEs for α.
Figure 4.
The point estimates and corresponding SEs for α, β, and γ.
To further demonstrate the usefulness of the 1PL-AG model, a comparison between the 1PL-AG and 1PL-G models was made on the full data set. As Table 3 shows, the 1PL-AG model has smaller Akaike information criterion (AIC) and Bayesian information criterion (BIC) values than the 1PL-G model, so the 1PL-AG model fits the data better. Furthermore, the significant likelihood ratio (LR) test indicates that the α parameter is well above 0, which offers a substantively interesting interpretation: successful guessing is related to ability. In this case, the 1PL-G model does not fit these data well.
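As a quick check, the fit indices in Table 3 below can be reproduced from the reported −2 log-likelihoods, assuming 24 free item parameters for the 1PL-G (12 β and 12 γ) and 25 for the 1PL-AG (adding α); small rounding differences aside:

```matlab
% Reproduce Table 3 from the -2 log-likelihoods.
N    = 2844;                      % examinees in the full data set
m2ll = [33508.75 33456.62];       % -2 log-likelihood: 1PL-G, 1PL-AG
p    = [24 25];                   % free parameters per model (assumed counts)
AIC  = m2ll + 2*p                 % -> 33556.75, 33506.62
BIC  = m2ll + p*log(N)            % -> 33699.63, 33655.44 (after rounding)
LR   = m2ll(1) - m2ll(2)          % -> 52.13 on df = 1, p < .001
```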
Table 3.
The Model Fit Indices and Likelihood Ratio Test.

| Indicator | 1PL-G | 1PL-AG | LR test: χ² | df | Significance |
|---|---|---|---|---|---|
| −2 log-likelihood | 33508.75 | 33456.62 | 52.14 | 1 | <.001 |
| AIC | 33556.75 | 33506.62 | | | |
| BIC | 33699.63 | 33655.44 | | | |
Note. 1PL-G = one-parameter logistic guessing; 1PL-AG = one-parameter logistic ability-based guessing; LR = likelihood ratio; AIC = Akaike information criterion; BIC = Bayesian information criterion.
Discussion
In recent years, interest in the 1PL-AG model has been rising in measurement research. The current study presents a new Bayesian formulation of the 1PL-AG model and an estimation approach that has the potential to open the gates to its wide application among applied researchers. This section summarizes the current study's contributions and discusses future research directions.
First, this study offers a fast and stable Bayesian modal estimation method for the 1PL-AG model at modest sample sizes. One major barrier to wide application in educational and psychological research has been the lack of a user-friendly package for the 1PL-AG model. Through the simulation studies and a real-data analysis, the authors demonstrated that BEMM provides stable estimates at modest sample sizes. To facilitate calibration, the authors have made a MATLAB package available online (see Online Appendix B). The authors believe this package offers a better alternative for 1PL-AG model calibration, helping to promote the use of the 1PL-AG model in educational and psychological research, especially in large-scale assessment (Beghetto, 2019).
Second, this study offers a mixture-modeling reformulation of the 1PL-AG model, aligning it with a general framework in the IRT literature like those given for the 3PLM (Zheng et al., 2018), the 4PLM (Zhang et al., 2018), the three-parameter normal ogive model (3PNO; Béguin & Glas, 2001), and the 4PNO (Culpepper, 2015). This unified framework connects IRT models with guessing effects to a well-known framework in statistics, namely, the mixture model, which can potentially spur new possibilities for modeling and parameter estimation within IRT. For instance, more complex guessing and slipping effects can easily be incorporated within this framework. From a broader perspective, the 1PL-AG model is a special case of the hybrid model (von Davier & Rost, 2006; Yamamoto, 1982), which can be seen if the mixture weights are chosen as the guessing probabilities and the component densities as the degenerate density and the 1PLM, showing that this mixture-modeling perspective can indeed be extended easily. On the parameter estimation side, the most widely used algorithm in mixture modeling is the EM algorithm, which served as the main inspiration for the current study.
Although the BEMM algorithm for the 3PLM (Guo & Zheng, 2019) and the BEMM algorithm presented here both belong to the mixture-modeling framework, share a similar E-step, and carry the same name, there is an obvious difference between the two. In the 3PLM, the first M-step calculates the guessing parameter via a closed-form solution, and the second M-step then maximizes the difficulty and discrimination parameters. In the 1PL-AG model, however, the first M-step handles the difficulty and logistic-scaled guessing parameters (although their estimation is independent of each other), and the second M-step obtains the α parameter, which cannot be absorbed into the first M-step since α is common to all items.
Finally, several potential directions can be pursued in future research: (a) investigating the effects of different priors on estimation accuracy; (b) extending BEMM to the 2PL-AG or an ability-based slipping model; and (c) comparing BEMM with other Bayesian methods such as Hamiltonian Monte Carlo.
Supplemental Material
Supplemental material, sj-pdf-1-apm-10.1177_0146621621990761 for Bayesian Modal Estimation for the One-Parameter Logistic Ability-Based Guessing (1PL-AG) Model by Shaoyang Guo, Tong Wu, Chanjin Zheng and Yanlei Chen in Applied Psychological Measurement
Footnotes
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by the Peak Discipline Construction Project of Education at East China Normal University and the Flower of Happiness Project in social science of East China Normal University (2019ECNU-XFZH015).
Author’s note: Shaoyang Guo is affiliated with the Institute of Curriculum & Instruction, Faculty of Education, East China Normal University, China. Tong Wu is affiliated with the College of Education, Purdue University, USA. Chanjin Zheng is affiliated with the Department of Educational Psychology, Faculty of Education, East China Normal University, China. Yanlei Chen is affiliated with the School of Educational Science, Liaocheng University.
ORCID iD: Shaoyang Guo
https://orcid.org/0000-0002-2480-8800
Data Availability: The data sets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental Material: Supplementary material is available for this article online.
References
- Beghetto R. A. (2019). Large-scale assessments, personalized learning, and creativity: Paradoxes and possibilities. ECNU Review of Education, 2(3), 311–327. https://doi.org/10.1177/2096531119878963
- Béguin A. A., Glas C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561. https://doi.org/10.1007/bf02296195
- Bock R. D., Aitkin M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
- Cai L., Lee T. (2009). Covariance structure model fit testing under missing data: An application of the supplemented EM algorithm. Multivariate Behavioral Research, 44(2), 281–304. https://doi.org/10.1080/00273170902794255
- Culpepper S. A. (2015). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163. https://doi.org/10.1007/s11336-015-9477-6
- du Toit M. (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Scientific Software International.
- Fariña P., Jorge G., San Martín E. (2019). The use of an identifiability-based strategy for the interpretation of parameters in the 1PL-G and Rasch models. Psychometrika, 84(2), 511–528. https://doi.org/10.1007/s11336-018-09659-w
- Gelman A., Rubin D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472. https://doi.org/10.1214/ss/1177011136
- Guo S., Zheng C. (2019). The Bayesian expectation-maximization-maximization for the 3PLM. Frontiers in Psychology, 10(1175), 1–11. https://doi.org/10.3389/fpsyg.2019.01175
- Houts C. R., Cai L. (2015). flexMIRT: Flexible multilevel multidimensional item analysis and test scoring user's manual version 3.5 RC. https://www.vpgcentral.com/software/irt-software/
- Hutchinson T. P. (1991). Ability, partial information, guessing: Statistical modelling applied to multiple-choice tests. Rumsby Scientific Publishing.
- Kendall M. G., Stuart A. (1961). Interval estimation: Confidence intervals. In Kendall M. G., Stuart A. (Eds.), The advanced theory of statistics (Vol. 2: Inference and relationship, pp. 98–133). Hafner Publishing.
- Lord F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
- Lord F. M., Novick M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
- McLachlan G., Krishnan T. (2007). The EM algorithm and extensions (Vol. 382). John Wiley.
- Mislevy R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177–195. https://doi.org/10.1007/BF02293979
- Oehlert G. W. (1992). A note on the delta method. American Statistician, 46(1), 27–29.
- Park R., Pituch K. A., Kim J., Dodd B. G., Chung H. (2015). Marginalized maximum likelihood estimation for the 1PL-AG IRT model. Applied Psychological Measurement, 39(6), 1–17. https://doi.org/10.1177/0146621615574694
- Plummer M. (2003, March). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling [Paper presentation]. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003). https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf
- Rasch G. (1960). Probabilistic models for some intelligence and achievement tests. Danish Institute for Educational Research.
- San Martín E., Del Pino G., De Boeck P. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30(3), 183–203. https://doi.org/10.1177/0146621605282773
- San Martín E., González J., Tuerlinckx F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80(2), 450–467. https://doi.org/10.1007/s11336-014-9404-2
- San Martín E., Rolin J.-M., Castro L. M. (2013). Identification of the 1PL model with guessing parameter: Parametric and semi-parametric results. Psychometrika, 78(2), 341–379. https://doi.org/10.1007/s11336-013-9322-8
- Tian W., Cai L., Thissen D., Xin T. (2013). Numerical differentiation methods for computing error covariance matrices in item response theory modeling: An evaluation and a new proposal. Educational and Psychological Measurement, 73(3), 412–439. https://doi.org/10.1177/0013164412465875
- von Davier M., Rost J. (2006). Mixture distribution item response models. In Rao C. R., Sinharay S. (Eds.), Handbook of statistics (Vol. 26, pp. 643–661). Elsevier.
- Waller N. G., Feuerstahler L. M. (2017). Bayesian modal estimation of the four-parameter item response model in real, realistic, and idealized data sets. Multivariate Behavioral Research, 52(3), 350–370. https://doi.org/10.1080/00273171.2017.1292893
- Wang W. C., Huang S. Y. (2011). Computerized classification testing under the one-parameter logistic response model with ability-based guessing. Educational and Psychological Measurement, 71(6), 925–941. https://doi.org/10.1177/0013164410392372
- Yamamoto K. (1982). Hybrid model of IRT and latent class models. ETS Research Report Series, 1982(2), i–61. https://doi.org/10.1002/j.2333-8504.1982.tb01326.x
- Zhang C. (2018). From expectation-3-maximization to Bayesian expectation-3-maximization: A latent mixture modeling-based Bayesian algorithm for the 4-parameter logistic model [Master's thesis, University of Illinois at Urbana–Champaign]. http://hdl.handle.net/2142/101297
- Zhang C., Guo S., Zheng C. (2018, April). Bayesian expectation-maximization-maximization algorithm for the PLM4 [Paper presentation]. The 80th NCME Annual Meeting, New York, NY, United States.
- Zheng C., Meng X., Guo S., Liu Z. (2018). Expectation-maximization-maximization: A feasible MLE algorithm for the three-parameter logistic model based on a mixture modeling reformulation. Frontiers in Psychology, 8(2302), 1–10. https://doi.org/10.3389/fpsyg.2017.02302