Exploration of Item Selection in Dual-Purpose Cognitive Diagnostic Computerized Adaptive Testing: Based on the RRUM

Buyun Dai; Minqiang Zhang; Guangming Li

doi:10.1177/0146621616666008

2016 Sep 24;40(8):625–640. doi: 10.1177/0146621616666008


PMCID: PMC5978720 PMID: 29882535

Abstract

Cognitive diagnostic computerized adaptive testing (CD-CAT) can be divided into two broad categories: (a) single-purpose tests, which are based on the subject’s knowledge state (KS) alone, and (b) dual-purpose tests, which are based on both the subject’s KS and traditional ability level ( $θ$ ). This article seeks to identify the most efficient item selection method for the latter type of CD-CAT corresponding to various conditions and various evaluation criteria, respectively, based on the reduced reparameterized unified model (RRUM) and the two-parameter logistic model of item response theory (IRT-2PLM). The Shannon entropy (SHE) and Fisher information methods were combined to produce a new synthetic item selection index, that is, the “dapperness with information (DWI)” index, which concurrently considers both KS and $θ$ within one step. The new method was compared with four other methods. The results showed that, in most conditions, the new method exhibited the best performance in terms of KS estimation and the second-best performance in terms of $θ$ estimation. Item utilization uniformity and computing time are also considered for all the competing methods.

Keywords: cognitive diagnostic computerized adaptive testing, item selection method, knowledge state, “dapperness with information” index

Computerized adaptive testing (CAT) is typically engineered to tailor a test to each examinee’s trait level, thus matching the difficulties of the items to the examinee being measured (Chang, 2014). In addition, the aim of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help (McGlohen & Chang, 2008). “Adaptive” is the core idea behind CAT, whereas “efficiency” is an important objective of cognitive diagnostic testing (CDT). The combination of these two ideas resulted in cognitive diagnostic computerized adaptive testing (CD-CAT).

According to measurement targets, CD-CAT can be divided into two broad categories: (a) single-purpose tests (e.g., Xu, Chang, & Douglas, 2003) that only estimate the subject’s knowledge state (KS; that is, attribute mastery pattern), and (b) dual-purpose tests (e.g., McGlohen & Chang, 2008; Wang, Chang, & Douglas, 2012) that estimate both the subject’s KS and traditional ability level ( $θ$ ) in the item response theory (IRT) framework. Dual-purpose tests are more complex and thus require more technically sophisticated methods.

The item selection algorithm is a critical component of CD-CAT. It determines whether, and how closely, items are adapted to the subject’s current KS and/or ability. In addition, the item selection method affects the efficiency and efficacy of a test. To date, many item selection methods have been proposed for the two types of CD-CAT, and the best method may vary according to experimental conditions. Based on the reduced reparameterized unified model (RRUM) and various numbers of attributes, several item selection methods for dual-purpose CD-CAT are compared in this article to identify the most item-efficient method (the method that yields the best synthetic result for the same number of items).

Literature Review

Item Selection Based on KS Alone

This category of CD-CAT is a combination of the principles of CDT and CAT. Specifically, it combines the “adaptive” objective of the traditional CAT with an estimation of the subject’s KS. The Shannon entropy (SHE) method (Tatsuoka, 2002; Tatsuoka & Ferguson, 2003; Xu et al., 2003), the Kullback–Leibler (KL) information method (Xu et al., 2003), and its extensions are the key methods. In a broad sense, both KL and SHE are information indices. They are used to compute the information that each item can provide for the current subject given the current estimate of his or her KS; then the item best tailored to the current estimate of the subject is selected. These algorithms are introduced below.

The minimum expected predictive SHE algorithm

Assume that, in the Bayesian context, the prior distribution of the KS vector, $α$ , has been specified, where $α$ is a vector of length K and K is the number of attributes. In addition, binary attributes are considered in this article, whereby each element of $α$ takes a value of 1 or 0, which represents the mastery or nonmastery of an attribute. After the subject has answered n − 1 items, selecting the next item by minimizing the expected predictive SHE of the posterior distribution of $α$ would lead to minimal uncertainty and allow the estimated KS vector to better approximate the true KS vector. There are $L = 2^{K}$ possible KS vectors. The prior distribution of $α$ is $P (α = α_{l}) = λ_{l}$ , where $l = 1, \dots, 2^{K}$ . When a subject has answered n − 1 items, the posterior probability of the KS $α_{l}$ is denoted by $π_{n - 1} (α_{l})$ :

π_{n - 1} (α_{l}) \propto λ_{l} \prod_{t = 1}^{n - 1} (P_{l t})^{Y_{t}} {(1 - P_{l t})}^{1 - Y_{t}},

where $P_{l t}$ denotes $P (Y_{l t} = 1)$ and is equal to $P (Y_{t} = 1 | α = α_{l})$ , which indicates the probability of the subject providing a correct answer to item t given that his or her KS is $α_{l}$ .

The SHE of the posterior distribution of the attribute pattern is

E (π_{n - 1}) = - \sum_{l = 1}^{2^{K}} [π_{n - 1} (α_{l})] \times \log [π_{n - 1} (α_{l})] .

After the subject has answered n items, the response pattern is $Y^{n} = (Y_{1}, Y_{2}, \dots, Y_{n - 1}, Y_{n})$ , where the value of $Y_{n}$ may be 0 or 1. Thus, the expected SHE is

\begin{array}{l} SHE (π_{n}) = \sum_{z = 0}^{1} [E (π_{n} | Y^{n - 1}, Y_{n} = z) \times P (Y_{n} = z | Y^{n - 1})] \\ = \sum_{z = 0}^{1} \{ E (π_{n} | Y^{n - 1}, Y_{n} = z) \cdot \sum_{l = 1}^{2^{K}} [P (Y_{n} = z | α_{l}) \times π_{n - 1} (α_{l})] \} . \end{array}

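The derivation above follows from the definition of conditional expectation: the second line expands $P (Y_{n} = z | Y^{n - 1})$ over all possible KS vectors, weighting each by its current posterior probability. Then, in selecting the nth item for the subject, this algorithm attempts to minimize $SHE (π_{n})$ .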
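The three formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB implementation; the function names and the data layout (one list of correct-response probabilities per item, indexed by KS pattern) are assumptions made for the example.

```python
import math

def posterior(prior, p_correct, responses):
    """Posterior over KS patterns after the administered items.

    prior[l]        -- prior probability lambda_l of pattern l
    p_correct[t][l] -- P(Y_t = 1 | alpha_l) for administered item t
    responses[t]    -- observed 0/1 answer to item t
    """
    post = list(prior)
    for p_item, y in zip(p_correct, responses):
        post = [w * (p if y == 1 else 1.0 - p) for w, p in zip(post, p_item)]
    total = sum(post)
    return [w / total for w in post]

def shannon_entropy(dist):
    """SHE of a discrete distribution (terms with zero mass contribute 0)."""
    return -sum(w * math.log(w) for w in dist if w > 0.0)

def expected_she(post, p_item):
    """Expected posterior SHE if the candidate item were administered next."""
    value = 0.0
    for z in (0, 1):
        like = [p if z == 1 else 1.0 - p for p in p_item]
        p_z = sum(l * w for l, w in zip(like, post))  # P(Y_n = z | Y^{n-1})
        if p_z == 0.0:
            continue
        updated = [l * w / p_z for l, w in zip(like, post)]
        value += shannon_entropy(updated) * p_z
    return value

def select_min_she(post, candidates):
    """candidates: dict mapping item id -> list of P(Y = 1 | alpha_l)."""
    return min(candidates, key=lambda i: expected_she(post, candidates[i]))
```

With a single attribute and a uniform prior, an item that separates the two patterns (probabilities 0.1 and 0.9) has a lower expected SHE than an item that does not (0.5 and 0.5), so `select_min_she` prefers the former.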

The maximum KL information algorithm and its advanced algorithms

The KL information (which is also called the KL distance or relative entropy) measures the distance or divergence between two probability distributions. The KL index for the ith item, given the subject’s current KS estimate $\hat{α}$ , is defined as the following sum of KL distances between $\hat{α}$ and the possible candidate attribute vectors $α_{l}$ for the ith item:

{KL}_{i} (\hat{α}) = \sum_{l = 1}^{L} [\sum_{y = 0}^{1} \log (\frac{P (Y_{i} = y | \hat{α})}{P (Y_{i} = y | α_{l})}) \times P (Y_{i} = y | \hat{α})] .

The inner sum represents the KL information for the distribution of the ith item depending on attribute vectors $\hat{α}$ and $α_{l}$ , where $\hat{α}$ is regarded as a true value. Then, the item with the maximum KL index for the current subject is selected.
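As a rough illustration of the KL index, the sketch below evaluates the double sum for one item. The function name and argument layout are assumptions for this example, and all response probabilities are assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def kl_index(p_hat, p_patterns):
    """KL index of one item.

    p_hat      -- P(Y_i = 1 | alpha_hat) under the current KS estimate
    p_patterns -- list of P(Y_i = 1 | alpha_l) over all L candidate patterns
    """
    total = 0.0
    for p_l in p_patterns:
        for y in (0, 1):
            num = p_hat if y == 1 else 1.0 - p_hat  # P(Y_i = y | alpha_hat)
            den = p_l if y == 1 else 1.0 - p_l      # P(Y_i = y | alpha_l)
            total += math.log(num / den) * num
    return total
```

A pattern that yields the same response probability as $\hat{α}$ contributes nothing to the index, so the index grows only with patterns that the item can distinguish from the current estimate.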

The KL index is also called the global discrimination index (GDI; Cheng, 2010). Many scholars have proposed improvements to the KL method. For example, Cheng (2009) proposed the posterior-weighted KL (PWKL) index and hybrid KL (HKL) index. If the prior is discrete uniform, then the PWKL index is equivalent to the likelihood-weighted KL (LWKL) information method. The HKL is the PWKL weighted by the inverse of the distance between the estimate of KS and any other possible KS. To ensure adequate coverage of all attributes/skills, Cheng (2010) imposed constraints on the number of times each attribute is represented in CD-CAT (for a review, see Chang, 2014), naming the new index the Modified GDI (MGDI) and the new method the Maximum Modified GDI (MMGDI) method.

Item Selection Based on Both KS and Ability

This category of CD-CAT provides cognitive diagnoses and estimates ability concurrently. During the test process, both a subject’s KS (denoted as $α$ ) and traditional ability level (denoted as $θ$ ) are repeatedly estimated, and the item that is tailored to the subject’s current estimates of $α$ and $θ$ is selected. This category of CD-CAT is a combination of the main principles of CDT and CAT.

Based on the fusion model (Hartz, Roussos, & Stout, 2002) and the three-parameter logistic model of item response theory (IRT-3PLM), McGlohen & Chang (2008) proposed an item selection method for this category. This method involved construction of a shadow test (i.e., shadow item bank) that was optimized according to the subject’s current ability estimate $\hat{θ}$ . Then, the best item for measuring the attribute vector $\hat{α}$ was selected from the shadow test on the basis of the current $\hat{α}$ estimate, using the SHE or the KL method. Based on the DINA model (Junker & Sijtsma, 2001) and the IRT-3PLM, Du (2010) proposed an item selection method that reversed the order: The shadow test was constructed using the minimized SHE according to $\hat{α}$ ; then, the best item for measuring $\hat{θ}$ was selected from the shadow test on the basis of the current $\hat{θ}$ , using the Fisher information method.

Both of the above methods involve a shadow test; thus, they alternately select items according to ability and KS, and every item selection requires two steps. During the two steps, which aim to separately enhance the accuracy of ability and KS, the system selects the respective “local optimization” for each step; however, the combination of the two “local optimizations” does not necessarily guarantee a “good synthetic result.” The authors suggest that a more desirable item selection method should consider ability and KS concurrently within one step. To this end, a real “synthetic adaptive” algorithm must be developed.

The dual information (DI) method proposed by Cheng (2007) provides such a “synthetic adaptive” algorithm. The DI index was constructed as a weighted sum of the KL information of the $\hat{θ}$ component and the KL information of the $\hat{α}$ component, that is, $KL (\hat{θ})$ and $KL (\hat{α})$ . According to the results, the effect of different weights on measurement precision was negligible unless the weights were extreme.

Wang et al. (2012) also proposed several “synthetic adaptive” algorithms. Among these algorithms, according to their study, the maximum information based method (MIinfor method) provided the highest KS and $θ$ estimation accuracy. The method is described as follows: First, a priority index is proposed as

P_{i} = \sum_{k = 1}^{K} (u_{k} - x_{k}) d_{i k},

where $P_{i}$ is the priority index of the ith item, $d_{i k}$ summarizes the ability of the ith item to distinguish between examinees who possess the kth attribute and those who do not, and $u_{k}$ and $x_{k}$ are the upper bound and the accumulated information for the kth attribute, respectively. $u_{k}$ is set by users and should be adjusted when one of the test conditions changes. The calculations of $d_{i k}$ and $x_{k}$ are introduced in Wang et al. (2012). $u_{k} - x_{k}$ serves as a weight for the attribute-level information and indicates the importance of the information for the kth attribute. Finally, the priority index is multiplied by the Fisher information, and the item with the largest product is selected.
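The selection rule just described can be sketched as follows. The values of $d_{i k}$ , $u_{k}$ , $x_{k}$ , and the Fisher information are taken as given inputs here, because their calculation is defined in Wang et al. (2012) and not in this article; the function names are hypothetical.

```python
def priority_index(d_i, upper, accumulated):
    """P_i = sum_k (u_k - x_k) * d_ik for one item."""
    return sum((u - x) * d for d, u, x in zip(d_i, upper, accumulated))

def select_miinfor(d, upper, accumulated, fisher):
    """Pick the item whose priority index times Fisher information is largest.

    d      -- dict: item id -> list of d_ik over attributes (assumed precomputed)
    fisher -- dict: item id -> Fisher information at the current theta estimate
    """
    scores = {i: priority_index(d[i], upper, accumulated) * fisher[i] for i in d}
    return max(scores, key=scores.get)
```

When two items carry the same Fisher information, the rule favors the item that measures the attribute with the larger remaining gap $u_{k} - x_{k}$ .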

The RRUM

To date, most CD-CAT item selection studies have utilized the DINA model. This model divides subjects into two groups: One group is composed of attribute vectors that contain all of the required attributes for item i, and the other group is composed of attribute vectors that lack at least one of the required attributes for the same item. Attribute vectors in the same group are assumed to have the same probability of answering the item correctly. However, the disadvantage of the DINA model is that this assumption may not always hold for the latter group because attribute vectors in this group have varying degrees of deficiencies with respect to the required attributes. Therefore, their probabilities of success may not be identical (de la Torre, 2011). Thus, the current article employs another cognitive diagnostic model, that is, the RRUM, which was proposed by Hartz (2002).

The RRUM is a compensatory model that allows the absence of one attribute to be remedied by the presence of other attributes. Its item parameters are as follows: The baseline parameter associated with item difficulty and the penalty parameter that describes the extent to which the mastery of a specific attribute would affect the chance of answering the item correctly (Feng, Habing, & Huebner, 2014). For each item, every attribute is assigned one penalty parameter. Therefore, subjects with different KSs have different probabilities of successfully answering each item. Thus, the RRUM displays greater flexibility than the DINA model.

Consider an assessment of I items diagnosing K attributes. Let $Y_{i}$ be the observed binary 1/0 response of a subject to item i, where 1 indicates that the subject provides a correct response to item i, and 0 indicates an incorrect response. The binary Q-matrix describes how the items are related to the attributes, with entry $q_{i k}$ indicating whether attribute k is required by item i. Under the RRUM, the probability of a correct answer to item i given that a respondent has KS $α_{l}$ is as follows (Feng et al., 2014; Rupp, Templin, & Henson, 2010):

p_{l i} = P (Y_{i} = 1 | α_{l}) = π_{i}^{*} \prod_{k = 1}^{K} r_{i k}^{* (1 - α_{l k}) q_{i k}} .

The baseline parameter $π_{i}^{*}$ is the probability of a correct response to item i given that a respondent has mastered all the required attributes for the item. An item with a large $π_{i}^{*}$ parameter indicates that the required attributes can effectively explain examinee responses to the item. The penalty parameter $r_{i k}^{*}$ is the probability that a subject has not mastered attribute k but correctly answers item i divided by the probability that a subject has mastered attribute k and correctly answers item i.
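The RRUM response function is simple to evaluate directly. The sketch below follows the formula above; the function names and the tuple representation of a KS are assumptions for illustration.

```python
from itertools import product

def rrum_prob(alpha, q_row, pi_star, r_star):
    """P(Y_i = 1 | alpha) under the RRUM.

    alpha   -- tuple of 0/1 attribute mastery indicators (length K)
    q_row   -- row i of the Q-matrix (length K)
    pi_star -- baseline parameter of item i
    r_star  -- penalty parameters of item i, one per attribute
    """
    p = pi_star
    for a_k, q_k, r_k in zip(alpha, q_row, r_star):
        if q_k == 1 and a_k == 0:  # exponent (1 - alpha_k) * q_ik equals 1
            p *= r_k
    return p

def all_patterns(k):
    """All 2^K possible KS vectors."""
    return list(product((0, 1), repeat=k))
```

For an item that requires two attributes with $π_{i}^{*} = 0.9$ and penalties 0.2 and 0.3, a subject who has mastered both attributes succeeds with probability 0.9, one who lacks only the first with 0.9 × 0.2 = 0.18, and one who lacks both with 0.9 × 0.2 × 0.3 = 0.054.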

In the remaining sections, a new index that could consider both KS and $θ$ within one step without shadow tests is proposed, and is compared with four previous methods (Cheng, 2007; Du, 2010; McGlohen & Chang, 2008; Wang et al., 2012) under the RRUM and IRT-2PLM.

The Dapperness With Information (DWI) Index

As noted above, the most desirable item selection method should concurrently consider ability and KS within one step. To this end, a synthetic statistical index should be used. The authors decided to synthesize an index by combining one item selection method based on ability and one item selection method based on KS. In IRT-CAT, in which item selection is based on ability, the maximum Fisher information method is often used based on the current estimation of the subject’s ability.

To compare the KS recoveries of the item selection methods for single-purpose CD-CAT, a pilot study was conducted. The following six methods were compared: random, SHE, KL (GDI), LWKL, HKL, and MMGDI. The independent variables are item selection methods (the six above-mentioned methods), test lengths (short test: length of $K \times 4$ , long test: length of $K \times 5$ ), and attribute numbers (K = 4, 6, 8). The test lengths were tied to K because measuring a subject’s KS requires more items as the number of attributes increases. The sizes of item banks were set according to Stocking’s (1994) recommendation. The experiments were repeated 20 times, using the MATLAB 2012b software package. Groups of examinees were simulated, and each group was composed of 1,000 examinees. The maximum a posteriori (MAP) method was used to estimate the KSs of the examinees. The following results were obtained: for CD-CAT that estimates the subject’s KS alone, the SHE method was the most accurate method, the HKL and LWKL methods were slightly inferior, and the random method performed the worst. However, the SHE, HKL, and LWKL methods were almost identical, differing by no more than 1%. These results were similar to the findings of Cheng (2009) and Chen, Li, and Xin (2011) using the DINA model. Based on the above results, it was decided to include the SHE method in the synthetic index.

Thus, a synthetic index can be generated as a weighted sum as follows:

ω f (I_{i} ({\hat{θ}}_{j})) + (1 - ω) g ({SHE}_{i} ({\hat{α}}_{j})),

where $I_{i} ({\hat{θ}}_{j})$ is the Fisher information of item i with respect to the current estimation of the ability of subject j, ${SHE}_{i} ({\hat{α}}_{j})$ is the expected predictive SHE of the item to the same subject, $f$ is a monotonically increasing function, $g$ is a monotonically decreasing function, and $ω \in (0, 1)$ represents the weight of the first part. Thus, the weight $ω$ can be used to achieve a balance between the accuracy of the estimation of the ability $θ$ and that of the KS $α$ . Because there is a considerable difference between the value range of $I_{i} ({\hat{θ}}_{j})$ and ${SHE}_{i} ({\hat{α}}_{j})$ , it is not suitable to directly sum them with weights. Instead, $f (x) = \log (x)$ and $g (x) = - \log (x)$ are adopted, because the logarithmic function is often used for changing values with different orders of magnitude into values with the same order of magnitude.

The authors tried several values of ω (0.1, 0.3, 0.5, 0.7, and 0.9) and found that ω = 0.5 (i.e., $ω = \frac{1}{2}$ ) provided the best synthetic results (see Appendix B). When $ω = \frac{1}{2}$ , the synthetic index is equal to

\frac{1}{2} \log (I_{i} ({\hat{θ}}_{j})) - (1 - \frac{1}{2}) \log ({SHE}_{i} ({\hat{α}}_{j})) = \frac{1}{2} \log \frac{I_{i} ({\hat{θ}}_{j})}{{SHE}_{i} ({\hat{α}}_{j})} .

Because the logarithm is monotonically increasing, choosing the item that maximizes $\frac{1}{2} \log \frac{I_{i} ({\hat{θ}}_{j})}{{SHE}_{i} ({\hat{α}}_{j})}$ is equivalent to choosing the item that maximizes $\frac{I_{i} ({\hat{θ}}_{j})}{{SHE}_{i} ({\hat{α}}_{j})}$ . Thus, the authors use the ratio of the Fisher information to the expected predictive SHE as the synthetic item selection index for the dual-purpose CD-CAT.

In physics, entropy denotes the degree of confusion; thus, it is used to measure the degree of uncertainty (Shannon, 2001). Conversely, its inverse can be used to measure the degree of certainty, which is referred to as the degree of dapperness (or order). In CD-CAT, the SHE index selects the item that minimizes the expected SHE of the posterior distribution of the attribute vectors. Equivalently, a dapperness index can be constructed (the inverse of the SHE index), which selects the item with the maximum value. The new synthetic index is a combination of the dapperness index and the Fisher information quotient; thus, it is labeled as the DWI index. The formula can be written as

{DWI}_{i} = \frac{I_{i} ({\hat{θ}}_{j})}{{SHE}_{i} ({\hat{α}}_{j})} .
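A minimal sketch of DWI-based selection follows. It assumes the standard 2PL Fisher information, $a^{2} P (1 - P)$ , without a scaling constant, and it takes the expected predictive SHE of each item as a precomputed input that must be strictly positive; the function names are illustrative and are not taken from the article.

```python
import math

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def dwi(theta_hat, a, b, expected_she_value):
    """DWI_i = I_i(theta_hat) / SHE_i(alpha_hat); requires SHE > 0."""
    return fisher_info_2pl(theta_hat, a, b) / expected_she_value

def select_max_dwi(theta_hat, items):
    """items: dict mapping item id -> (a, b, expected SHE for this examinee)."""
    return max(items, key=lambda i: dwi(theta_hat, *items[i]))
```

Because the index is a ratio, an item is favored either when it is informative about $θ$ (large numerator) or when it is expected to leave little uncertainty about the KS (small denominator), which is how the index considers both targets in a single step.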

The Simulation Study

Simulation Design

Both McGlohen and Chang (2008) and Du (2010) used the IRT-3PLM. However, Baker and Kim (2004) noted that for the IRT-3PLM, 1,000 subjects and 60 items would be needed for accurately estimating the item and ability parameters. In the current study, all test lengths are less than 60 items; therefore, the IRT-2PLM is selected, following Wang et al. (2012).

The purpose of this study is to compare the KS recovery, ability recovery, item utilization, and computing time of different item selection methods using the RRUM and the IRT-2PLM. The following five methods are compared: McGlohen and Chang’s (2008) method, Du’s (2010) method, the DI method, the MIinfor method, and the proposed DWI method.

The independent variables are the item selection methods (the five above-mentioned methods), attribute numbers (K = 4, 6, 8), and the type of relationship among the attributes (uncorrelated, correlated). The first independent variable is a within-group variable, and the latter two are between-group variables.

The evaluation criteria are the average values of the attribute recovery rate (ARR), the average value of the pattern correct classification rate (PCCR), the average values of the mean absolute error (ABSE), the average values of item utilization uniformity ( $χ^{2}$ ), and the average values of computing time. The ARR quantifies the estimation accuracy of certain attributes, and the ARR of the kth attribute is computed as follows:

{ARR}_{k} = \frac{J_{k, \text{correct}}}{J},

where J is the total number of subjects and $J_{k, \text{correct}}$ is the number of subjects whose kth attribute is correctly classified.

The PCCR, which quantifies the estimation accuracy of the entire KS, is computed as follows:

PCCR = \frac{J_{\text{pattern correct}}}{J},

where $J_{\text{pattern correct}}$ is the number of subjects whose KSs are correctly classified.

The ABSE quantifies the estimation accuracy of subjects’ ability levels and is computed as follows:

ABSE = \frac{1}{J} \sum_{j = 1}^{J} | {\hat{θ}}_{j} - θ_{j} |,

where $θ_{j}$ is obtained through estimation involving the entire bank of items (and is treated as the true IRT ability value of subject j), and ${\hat{θ}}_{j}$ is its corresponding estimate.

Item utilization uniformity quantifies the difference between ideal item exposure distribution and real item exposure distribution, and is computed as follows (Chen et al., 2011):

χ^{2} = \frac{\sum_{i} {(e r_{i} - L / I)}^{2}}{L / I},

where L is the test length, I is the number of items in the item bank, L/I is the ideal item exposure, and $e r_{i}$ is the observed exposure of item i. The smaller the $χ^{2}$ value, the closer the item utilization is to the ideal.
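The four accuracy and exposure criteria defined above can be computed as in the following sketch. The function names and input layout are assumptions; in particular, the exposure of an item is taken here to be the proportion of examinees who were administered that item, which the article does not spell out.

```python
def arr(true_ks, est_ks, k):
    """Attribute recovery rate of attribute k (0-based index)."""
    hits = sum(1 for t, e in zip(true_ks, est_ks) if t[k] == e[k])
    return hits / len(true_ks)

def pccr(true_ks, est_ks):
    """Pattern correct classification rate over whole KS vectors."""
    hits = sum(1 for t, e in zip(true_ks, est_ks) if tuple(t) == tuple(e))
    return hits / len(true_ks)

def abse(theta_true, theta_hat):
    """Mean absolute error of the ability estimates."""
    n = len(theta_true)
    return sum(abs(h - t) for t, h in zip(theta_true, theta_hat)) / n

def chi_square(exposure, test_length):
    """Item utilization uniformity; exposure has one entry per bank item."""
    ideal = test_length / len(exposure)  # L / I
    return sum((er - ideal) ** 2 for er in exposure) / ideal
```

A perfectly uniform exposure distribution gives $χ^{2} = 0$ , and the value grows as administration concentrates on a few items.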

Experiments were repeated 20 times using the MATLAB 2012b software package. Groups of examinees were simulated, and each group was composed of 1,000 examinees.

The number of items in the item bank is denoted as I. Stocking (1994) noted that a rule of thumb in item banking is that the pool must have at least 12 times as many items as the test length. Thus, the size of the item bank was set as I = 300 when K = 4, I = 450 when K = 6, and I = 600 when K = 8.

For the case in which all attributes were uncorrelated, each subject’s true KS was randomly generated, with each attribute being independently generated and having an equal probability of being 0 or 1. For the case in which attributes were correlated, the correlation coefficients between any two different attributes were set as .5, and the probability of an attribute being mastered by a certain examinee remained .5, as before. The KS simulation of the latter case is identical to that of Henson and Douglas (2005), using multivariate normal K-dimensional vectors.
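One way to generate such KSs is sketched below. It is an assumption-laden illustration, not a reproduction of Henson and Douglas (2005): the .5 correlation is applied to latent standard normal variables through a single common factor, and each latent variable is dichotomized at 0 so that every attribute is mastered with probability .5. The function name and the `rho` and `seed` parameters are hypothetical.

```python
import random

def simulate_ks(n_subjects, k, rho=0.0, seed=None):
    """Draw KS vectors by thresholding equicorrelated latent normals at 0.

    rho -- correlation between any two latent variables (0 = uncorrelated)
    """
    rng = random.Random(seed)
    load, resid = rho ** 0.5, (1.0 - rho) ** 0.5
    patterns = []
    for _ in range(n_subjects):
        common = rng.gauss(0.0, 1.0)  # factor shared by all attributes
        latent = [load * common + resid * rng.gauss(0.0, 1.0) for _ in range(k)]
        patterns.append(tuple(1 if z > 0.0 else 0 for z in latent))
    return patterns
```

Setting `rho=0.0` reduces the procedure to independent fair coin flips per attribute, which matches the uncorrelated case described above.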

To ensure that every attribute was measured by at least one item, the first K items in the item bank were set as follows: the kth item measured the kth attribute only, where $k = 1, 2, \dots, K$ . These items were used as the initial items in the CD-CAT simulation. In regard to the other items, Q-matrix generation requires that the probability that an attribute is measured by a given item is .5, and there should be no item that measures nothing. According to Feng et al. (2014), the baseline parameters and penalty parameters are generated from uniform distributions of (0.6, 1.0) and (0.05, 0.4), respectively.
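The item bank construction described above might be sketched as follows. Redrawing a Q-matrix row until it measures at least one attribute is an assumption about how empty rows are avoided, and the function name is hypothetical.

```python
import random

def generate_item_bank(n_items, k, seed=None):
    """Return (Q-matrix, baseline parameters, penalty parameters)."""
    rng = random.Random(seed)
    q = []
    for i in range(n_items):
        if i < k:
            # The first K items each measure exactly one attribute.
            row = [1 if j == i else 0 for j in range(k)]
        else:
            row = [0] * k
            while sum(row) == 0:  # redraw rows that measure nothing
                row = [1 if rng.random() < 0.5 else 0 for _ in range(k)]
        q.append(row)
    pi_star = [rng.uniform(0.6, 1.0) for _ in range(n_items)]
    r_star = [[rng.uniform(0.05, 0.4) for _ in range(k)] for _ in range(n_items)]
    return q, pi_star, r_star
```

For example, `generate_item_bank(300, 4)` produces a bank of the size used for K = 4, with the first four rows of the Q-matrix forming an identity matrix.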

The simulation involved 11 steps:

1. Q-matrix generation;
2. baseline parameter and penalty parameter generation, based on the RRUM;
3. true KS generation;
4. simulation of the subjects’ responses on all of the items, generating the complete response matrix;
5. estimation of $θ$ and item parameters a and b based on the IRT-2PLM, according to the complete response matrix;
6. selection of initial items for a subject;
7. search of the complete response matrix for the subject’s responses to the initial items and presentation of the preliminary estimates of the subject’s KS and $θ$ ;
8. selection of the next item according to the item selection method;
9. search of the subject’s scores for the selected item (based on the complete response matrix);
10. reestimation of the subject’s KS and $θ$ ; and
11. repetition of Steps 8 through 10 until the fixed test length has been reached.
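The per-examinee part of this procedure (the sixth through eleventh steps) amounts to a simple loop. The skeleton below is only a sketch: `select_next` and `estimate` are hypothetical callbacks standing in for the chosen item selection method and for the KS and $θ$ estimators, and `responses` is one examinee's row of the complete response matrix.

```python
def run_cat(initial_items, bank_ids, test_length, responses, select_next, estimate):
    """Administer a fixed-length adaptive test to one simulated examinee.

    select_next(state, remaining) -> id of the next item to administer
    estimate(items, answers)      -> current state (e.g., KS and theta estimates)
    """
    administered = list(initial_items)
    answers = [responses[i] for i in administered]  # look up initial responses
    state = estimate(administered, answers)         # preliminary estimates
    while len(administered) < test_length:          # fixed-length stopping rule
        remaining = [i for i in bank_ids if i not in administered]
        nxt = select_next(state, remaining)         # item selection method
        administered.append(nxt)
        answers.append(responses[nxt])              # look up the response
        state = estimate(administered, answers)     # re-estimate KS and theta
    return administered, state
```

Keeping the selection rule and the estimators as callbacks lets the same loop serve all five competing methods, which differ only in how the next item is chosen.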

The length of the shadow test would influence the results of McGlohen and Chang’s (2008) and Du’s (2010) methods. After repeated trials, the best lengths of the shadow tests were found and selected in the design. The weights of $KL (\hat{θ})$ and $KL (\hat{α})$ in the DI method were set to be equal. In addition, the upper bound information for each attribute employed in the MIinfor method was obtained and selected after repeated testing.

The MAP method was used to estimate the KSs of the examinees, and the expected a posteriori (EAP) method was used to estimate the θs.
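Both estimators can be sketched compactly. The MAP estimate of the KS is the pattern with the largest posterior probability; the EAP estimate of $θ$ is the posterior mean, approximated here on an evenly spaced grid. The standard normal prior, the grid range, and the number of grid points are assumptions for this example, as are the function names.

```python
import math

def map_ks(posterior, patterns):
    """Return the KS pattern with the highest posterior probability."""
    best = max(range(len(patterns)), key=lambda l: posterior[l])
    return patterns[best]

def eap_theta(items, responses, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate under the 2PL with an assumed N(0, 1) prior.

    items     -- list of (a, b) parameters of the administered items
    responses -- list of 0/1 answers to those items
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for q in range(n_points):
        theta = lo + q * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for (a, b), y in zip(items, responses):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            weight *= p if y == 1 else 1.0 - p   # multiply in the likelihood
        num += theta * weight
        den += weight
    return num / den
```

With no responses the EAP estimate equals the prior mean of 0, and a single correct (incorrect) answer moves it above (below) 0.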

A fixed-length stopping rule was employed. Because dual-purpose CD-CAT requires more items than single-purpose CD-CAT, the test lengths were fixed at 6 times the number of attributes (24, 36, and 48 items for K = 4, 6, and 8, respectively).

Results

The results for the case in which all attributes were uncorrelated are provided in Tables 1 to 3.

Table 1.

Results for the Five Methods When Attributes Were Uncorrelated and K = 4 (Item Bank = 300, Test Length = 24).

Method	ARR				PCCR		ABSE		$χ^{2}$	T
	At.1	At.2	At.3	At.4	M	SE	M	SE	M	M
McGlohen’s	0.961	0.965	0.955	0.967	0.890	0.004	0.313	0.005	73.8	1.6
Du’s	0.974	0.976	0.972	0.978	0.913	0.005	0.415	0.008	59.6	14.2
DI	0.976	0.981	0.980	0.981	0.924	0.008	0.681	0.012	78.4	89.0
MIinfor	0.952	0.953	0.947	0.956	0.859	0.005	0.291	0.002	142.0	1.4
DWI	0.964	0.967	0.958	0.968	0.898	0.004	0.295	0.004	94.5	14.6

Item selection method	PCCR	ABSE
	M	M
McGlohen’s	0.942	0.280
Du’s	0.922	0.333
DI	0.689	0.422
MIinfor	0.928	0.267
DWI	0.949	0.269

Item selection method	PCCR	ABSE
	M	M
McGlohen’s	0.852	0.371
Du’s	0.847	0.404
DI	0.539	0.434
MIinfor	0.850	0.299
DWI	0.864	0.324

Item selection method	PCCR	ABSE
	M	M
McGlohen’s	0.745	0.706
Du’s	0.745	0.494
DI	0.452	0.451
MIinfor	0.726	0.504
DWI	0.747	0.447

	PCCR	ABSE
ω = 0.1	0.982	0.347
ω = 0.3	0.928	0.304
ω = 0.5	0.898	0.295
ω = 0.7	0.873	0.300
ω = 0.9	0.857	0.309

	PCCR	ABSE
ω = 0.1	0.970	0.500
ω = 0.3	0.906	0.348
ω = 0.5	0.858	0.312
ω = 0.7	0.803	0.322
ω = 0.9	0.752	0.326

	PCCR	ABSE
ω = 0.1	0.968	0.901
ω = 0.3	0.883	0.700
ω = 0.5	0.814	0.584
ω = 0.7	0.748	0.525
ω = 0.9	0.690	0.489

	PCCR	ABSE
ω = 0.1	0.995	0.264
ω = 0.3	0.979	0.246
ω = 0.5	0.971	0.244
ω = 0.7	0.959	0.248
ω = 0.9	0.921	0.251

	PCCR	ABSE
ω = 0.1	0.841	0.534
ω = 0.3	0.800	0.414
ω = 0.5	0.780	0.396
ω = 0.7	0.763	0.391
ω = 0.9	0.736	0.387

Method	ARR				PCCR		ABSE		$χ^{2}$	T
	At.1	At.2	At.3	At.4	M	SE	M	SE	M	M
McGlohen’s	0.991	0.990	0.988	0.991	0.965	0.002	0.264	0.005	60.1	1.5
Du’s	0.990	0.990	0.990	0.990	0.962	0.002	0.331	0.006	79.8	14.0
DI	0.965	0.964	0.931	0.964	0.832	0.015	0.388	0.005	76.7	89.3
MIinfor	0.989	0.987	0.989	0.988	0.958	0.003	0.244	0.004	93.7	1.4
DWI	0.993	0.992	0.990	0.993	0.971	0.003	0.244	0.004	71.5	14.6

Method	ARR								PCCR		ABSE		$χ^{2}$	T
	At.1	At.2	At.3	At.4	At.5	At.6	At.7	At.8	M	SE	M	SE	M	M
McGlohen’s	0.964	0.961	0.960	0.960	0.965	0.961	0.964	0.955	0.764	0.006	0.544	0.020	55.5	31.4
Du’s	0.966	0.965	0.964	0.966	0.970	0.967	0.967	0.961	0.774	0.008	0.444	0.012	77.4	868.7
DI	0.921	0.911	0.906	0.904	0.901	0.917	0.919	0.896	0.470	0.013	0.414	0.004	145.0	423.7
MIinfor	0.963	0.961	0.958	0.962	0.963	0.963	0.962	0.952	0.747	0.007	0.457	0.020	185.0	14.9
DWI	0.966	0.962	0.959	0.962	0.967	0.963	0.966	0.956	0.780	0.005	0.396	0.006	82.5	883.9

Method	ARR						PCCR		ABSE		$χ^{2}$	T
	At.1	At.2	At.3	At.4	At.5	At.6	M	SE	M	SE	M	M
McGlohen’s	0.986	0.975	0.977	0.987	0.984	0.985	0.901	0.006	0.343	0.010	54.5	6.0
Du’s	0.982	0.978	0.979	0.982	0.980	0.981	0.892	0.006	0.383	0.007	84.6	111.0
DI	0.941	0.902	0.918	0.942	0.934	0.937	0.646	0.012	0.421	0.003	111.0	204.2
MIinfor	0.981	0.972	0.975	0.981	0.976	0.981	0.885	0.006	0.287	0.005	139.5	3.5
DWI	0.986	0.973	0.975	0.987	0.983	0.984	0.904	0.007	0.308	0.006	86.2	112.5
