Applied Psychological Measurement. 2019 May 14;44(3):167–181. doi: 10.1177/0146621619843829

Evaluating the Fit of Sequential G-DINA Model Using Limited-Information Measures

Wenchao Ma
PMCID: PMC7174807  PMID: 32341605

Abstract

Limited-information fit measures appear to be promising in assessing the goodness-of-fit of dichotomous response cognitive diagnosis models (CDMs), but their performance has not been examined for polytomous response CDMs. This study investigates the performance of the Mord statistic and the standardized root mean square residual (SRMSR) for an ordinal response CDM—the sequential generalized deterministic inputs, noisy "and" gate (G-DINA) model. Simulation studies showed that the Mord statistic had well-calibrated Type I error rates, but its correct detection rates were influenced by various factors, such as item quality, sample size, and the number of response categories. In addition, the SRMSR was also influenced by many factors, and the common practice of comparing the SRMSR against a prespecified cut-off (e.g., .05) may not be appropriate. A real data set was also analyzed to illustrate the use of the Mord statistic and SRMSR in practice.

Keywords: cognitive diagnosis, ordinal response, model-data fit, goodness-of-fit, limited information, sequential G-DINA

Introduction

Cognitive diagnosis models (CDMs) aim to classify individuals into homogeneous latent classes. Each latent class has a unique profile of attributes, which are typically binary latent variables representing the presence or absence of the latent constructs of interest. A large number of CDMs have been developed in the literature. Examples of CDMs for dichotomous responses include the deterministic inputs, noisy "and" gate (DINA; Haertel, 1989) model, the deterministic input, noisy "or" gate (DINO; Templin & Henson, 2006) model, the additive CDM (A-CDM; de la Torre, 2011), and the generalized DINA (G-DINA; de la Torre, 2011) model. Examples of CDMs for polytomous responses include the sequential G-DINA model (sG-DINA; Ma & de la Torre, 2016), the diagnostic model for ordinal responses (R. Liu & Jiang, 2018), and the general diagnostic model (von Davier, 2008). The usefulness of these CDMs, however, depends on whether they can adequately fit the data, and therefore, empirically examining model-data fit is critical.

The traditional Pearson χ2 statistic and the likelihood ratio statistic G2 are well recognized to be of limited use in practice because they are based on the expected and observed frequencies of all possible response patterns, which are typically sparse for realistic test lengths. To address this issue, limited-information measures based on the observed and predicted marginal frequencies of response patterns have been proposed (e.g., Maydeu-Olivares & Joe, 2006; Reiser, 2008). In CDMs, the M2 statistic for dichotomous responses has been shown to have well-calibrated Type I error rates under varied conditions and adequate power in detecting some types of model misspecification (F. Chen, Liu, Xin, & Cui, 2018; Hansen, Cai, Monroe, & Li, 2016; Jurich, 2014; Y. Liu, Xin, Li, Tian, & Liu, 2016). The M2 statistic can also be applied to graded response data, but its calculation may be difficult or even impossible because of the heavy computational burden as the numbers of items and response categories increase (Maydeu-Olivares & Joe, 2014). Building upon the M2 statistic, Maydeu-Olivares (2013) and Cai and Hansen (2013) introduced the Mord statistic (also referred to as M2*) for polytomous item response models. They found that the Mord statistic had better-calibrated Type I error rates and higher power than the M2 statistic in detecting model misspecifications for graded response data, especially when the number of categories was large. However, the performance of the Mord statistic for ordinal response CDMs has not been investigated.

In addition to limited-information statistics with known limiting distributions, several limited-information indices have been proposed as effect size measures. The root mean square error of approximation based on the M2 statistic using the univariate and bivariate margins, typically referred to as RMSEA2, has been examined in several studies (e.g., Maydeu-Olivares & Joe, 2006; Y. Liu, Xin, et al., 2016). However, RMSEA2 is a function of the number of categories and may not be suitable for polytomous response models (Maydeu-Olivares & Joe, 2014). Maydeu-Olivares and Joe (2014) instead recommended the standardized root mean square residual (SRMSR) and suggested that a model with SRMSR < .05 can be viewed as a well-fitting model. This criterion has been used in several studies on CDMs (Jiang & Ma, 2018; R. Liu, Huggins-Manley, & Bulut, 2018), but its appropriateness has not been examined.

This study aims to investigate the performance of the Mord statistic and SRMSR for the sG-DINA model, which provides a general model framework to handle polytomously scored items that can be decomposed into a set of tasks and are scored sequentially. In addition, unlike other CDMs for polytomous responses, the sG-DINA model can account for the fact that different attributes may be involved in different tasks, and thus has the potential to provide more accurate estimation of students’ attribute profiles.

Overview of the Sequential G-DINA Model

Suppose a test measures $K$ binary attributes, producing $2^K$ latent classes. Let $\boldsymbol{\alpha}_c=(\alpha_{c1},\ldots,\alpha_{cK})^T$ denote the attribute profile vector for latent class $c$, where $c=1,\ldots,2^K$. Element $\alpha_{ck}=1$ if attribute $k$ is mastered by individuals in latent class $c$, and $\alpha_{ck}=0$ if it is not. The sG-DINA model (Ma & de la Torre, 2016) assumes that item $j\in\{1,\ldots,J\}$ involves $H_j$ tasks that need to be solved sequentially, and that students obtain a score of 0 if they fail the first task, a score of $h$ ($0<h<H_j$) if they perform the first $h$ tasks successfully but fail task $h+1$, and a score of $H_j$ if they perform all tasks successfully. The probability that individuals in latent class $c$ perform task $h$ correctly, given that task $h-1$ has been completed successfully, is referred to as the processing function (Samejima, 1997) and is denoted $s_{jh}(\boldsymbol{\alpha}_c)$.
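To make the sequential scoring rule concrete, the following base R sketch converts the processing functions of a hypothetical three-task item into category response probabilities; the numeric values of s are assumptions for illustration only.

```r
# Category response probabilities for one item under the sequential rule
# described above: a score of h requires succeeding on tasks 1..h and
# failing task h + 1; the processing-function values s are hypothetical.
category_probs <- function(s) {
  H <- length(s)
  cum <- cumprod(s)              # P(first h tasks all succeed), h = 1..H
  p <- numeric(H + 1)            # probabilities of scores 0, 1, ..., H
  p[1] <- 1 - s[1]               # score 0: fail the first task
  if (H > 1)
    for (h in 1:(H - 1)) p[h + 1] <- cum[h] * (1 - s[h + 1])
  p[H + 1] <- cum[H]             # score H: all tasks succeed
  p
}

s <- c(0.9, 0.7, 0.6)            # assumed s_j1(alpha), s_j2(alpha), s_j3(alpha)
category_probs(s)                # 0.100 0.270 0.252 0.378; sums to 1
```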

Given that different tasks of item $j$ may involve different attributes, a binary q-vector $\mathbf{q}_{jh}$ can be used to specify whether each attribute is measured by task $h$ of item $j$, where element $q_{jhk}=1$ if attribute $k$ is measured and $q_{jhk}=0$ if not. The collection of the $\mathbf{q}_{jh}$ produces a category-level Q-matrix with $\sum_{j=1}^{J}H_j$ rows. For task $h$ of item $j$, let $\boldsymbol{\alpha}_{ljh}^*$ be the reduced attribute profile consisting of only the attributes required by this task, where $l=1,\ldots,2^{K_{jh}^*}$ when, without loss of generality, the first $K_{jh}^*$ attributes are assumed to be required. Note that the $2^K$ latent classes can be collapsed into $2^{K_{jh}^*}$ latent groups for category $h$ of item $j$, and $s_{jh}(\boldsymbol{\alpha}_c)=s(\boldsymbol{\alpha}_{ljh}^*)$ when latent class $c$ is collapsed into latent group $l$. The sG-DINA model defines the processing function using the G-DINA model (de la Torre, 2011) as in

$$s(\boldsymbol{\alpha}_{ljh}^*)=\phi_{jh0}+\sum_{k=1}^{K_{jh}^*}\phi_{jhk}\alpha_{lk}+\sum_{k'=k+1}^{K_{jh}^*}\sum_{k=1}^{K_{jh}^*-1}\phi_{jhkk'}\alpha_{lk}\alpha_{lk'}+\cdots+\phi_{jh12{\cdots}K_{jh}^*}\prod_{k=1}^{K_{jh}^*}\alpha_{lk}, \qquad (1)$$

where $\boldsymbol{\phi}_{jh}=(\phi_{jh0},\ldots,\phi_{jh12{\cdots}K_{jh}^*})^T$ is the vector of parameters involved in category $h$ of item $j$, and $\boldsymbol{\phi}$ denotes the vector of all parameters involved in the measurement model. By setting appropriate constraints as in de la Torre (2011), the DINA, DINO, and A-CDM can also be used as the processing function, if necessary, for different categories within a single item. Specifically, the sequential DINA (sDINA) model is obtained when all main effects and interaction terms except the highest-order interaction are set to 0:

$$s(\boldsymbol{\alpha}_{ljh}^*)=\phi_{jh0}+\phi_{jh12{\cdots}K_{jh}^*}\prod_{k=1}^{K_{jh}^*}\alpha_{lk}. \qquad (2)$$

The sequential DINO (sDINO) model is given by

$$s(\boldsymbol{\alpha}_{ljh}^*)=\phi_{jh0}+\phi_{jhk}\alpha_{lk}, \qquad (3)$$

where $\phi_{jhk}=-\phi_{jhk'k''}=\cdots=(-1)^{K_{jh}^*+1}\phi_{jh12{\cdots}K_{jh}^*}$, for $k=1,\ldots,K_{jh}^*$, $k'=1,\ldots,K_{jh}^*-1$, and $k''>k',\ldots,K_{jh}^*$. The sequential A-CDM (sA-CDM) is the constrained identity-link G-DINA model without any interaction terms. It can be formulated as follows:

$$s(\boldsymbol{\alpha}_{ljh}^*)=\phi_{jh0}+\sum_{k=1}^{K_{jh}^*}\phi_{jhk}\alpha_{lk}. \qquad (4)$$
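As a concrete illustration of Equations 1 through 4, the base R sketch below evaluates the processing function for one category given a vector of φ parameters; the numeric values of phi are assumptions for illustration, not estimates from any data.

```r
# Processing function for one category under Equations 1-4 (base R).
# phi is ordered as (intercept, main effects, two-way interactions, ...,
# highest-order interaction); alpha is the reduced attribute pattern for
# the K* required attributes.
s_gdina <- function(alpha, phi) {
  K <- length(alpha)
  # all subsets of the required attributes: empty set (intercept), then
  # singletons (main effects), pairs (two-way interactions), and so on
  subsets <- c(list(integer(0)),
               unlist(lapply(1:K, function(m) combn(K, m, simplify = FALSE)),
                      recursive = FALSE))
  design <- vapply(subsets, function(ss) prod(alpha[ss]), numeric(1))
  sum(phi * design)
}

phi_gdina <- c(0.1, 0.3, 0.2, 0.3)    # assumed (phi0, phi1, phi2, phi12)
s_gdina(c(1, 1), phi_gdina)           # 0.9: full mastery
s_gdina(c(1, 0), phi_gdina)           # 0.4: intercept + one main effect
# sDINA (Equation 2): only intercept and highest-order interaction nonzero
s_gdina(c(1, 0), c(0.1, 0, 0, 0.8))   # 0.1: partial mastery gives baseline
```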

Limited-Information Measures

Let $\boldsymbol{\pi}=\{\pi_x\}$ be a vector of length $u$ containing the (population) probabilities of each response pattern $x$, and $\mathbf{p}=\{p_x\}$ be the corresponding observed proportions from a sample of size $N$. Also, let $\hat{\boldsymbol{\pi}}=\{\pi_x(\hat{\boldsymbol{\gamma}})\}$ be the model-implied response pattern probabilities based on $v$ parameter estimates $\hat{\boldsymbol{\gamma}}$. Bishop, Fienberg, and Holland (1975) showed that $\sqrt{N}(\mathbf{p}-\boldsymbol{\pi})\xrightarrow{d}N(\mathbf{0},\boldsymbol{\Gamma})$, where $\boldsymbol{\Gamma}=\mathrm{diag}(\boldsymbol{\pi})-\boldsymbol{\pi}\boldsymbol{\pi}^T$. In addition, as shown by Maydeu-Olivares and Joe (2005), the asymptotic distribution of the residual vector $\mathbf{p}-\hat{\boldsymbol{\pi}}$ is normal with zero means and limiting covariance matrix $\boldsymbol{\Sigma}=\boldsymbol{\Gamma}-\boldsymbol{\Delta}\mathcal{I}^{-1}\boldsymbol{\Delta}^T$, that is, $\sqrt{N}(\mathbf{p}-\hat{\boldsymbol{\pi}})\xrightarrow{d}N(\mathbf{0},\boldsymbol{\Sigma})$, where $\boldsymbol{\Delta}=\partial\boldsymbol{\pi}(\boldsymbol{\gamma})/\partial\boldsymbol{\gamma}$ is the $u\times v$ Jacobian matrix and $\mathcal{I}=\boldsymbol{\Delta}^T\mathrm{diag}[\boldsymbol{\pi}(\boldsymbol{\gamma})]^{-1}\boldsymbol{\Delta}$ is the Fisher information matrix.

Let $\hat{\boldsymbol{\kappa}}=(\hat{\boldsymbol{\kappa}}_1^T,\hat{\boldsymbol{\kappa}}_2^T)^T$ be a vector of length $w=J(J+1)/2$ containing all univariate and bivariate expectations, where $\hat{\boldsymbol{\kappa}}_1$ and $\hat{\boldsymbol{\kappa}}_2$ have elements $\kappa_a(\hat{\boldsymbol{\gamma}})=E[X_a]$ and $\kappa_{a,b}(\hat{\boldsymbol{\gamma}})=E[X_aX_b]$, respectively. Also let $\mathbf{m}=(\mathbf{m}_1^T,\mathbf{m}_2^T)^T$, where $\mathbf{m}_1$ and $\mathbf{m}_2$ are the sample counterparts of $\hat{\boldsymbol{\kappa}}_1$ and $\hat{\boldsymbol{\kappa}}_2$, respectively. It is straightforward to show that $\hat{\boldsymbol{\kappa}}$ is a linear transformation of $\boldsymbol{\pi}(\hat{\boldsymbol{\gamma}})$, that is, $\hat{\boldsymbol{\kappa}}=\mathbf{L}\boldsymbol{\pi}(\hat{\boldsymbol{\gamma}})$, where $\mathbf{L}$ is a $w\times u$ matrix of full row rank. Likewise, $\mathbf{m}=\mathbf{L}\mathbf{p}$. It follows that $\mathbf{m}-\hat{\boldsymbol{\kappa}}=\mathbf{L}[\mathbf{p}-\boldsymbol{\pi}(\hat{\boldsymbol{\gamma}})]$ is also asymptotically normal, $\sqrt{N}(\mathbf{m}-\hat{\boldsymbol{\kappa}})\xrightarrow{d}N_w(\mathbf{0},\boldsymbol{\Xi})$, where $\boldsymbol{\Xi}=\mathbf{L}\boldsymbol{\Sigma}\mathbf{L}^T=\mathbf{L}\boldsymbol{\Gamma}\mathbf{L}^T-\mathbf{L}\boldsymbol{\Delta}\mathcal{I}^{-1}\boldsymbol{\Delta}^T\mathbf{L}^T$. Denoting $\boldsymbol{\Gamma}_\kappa=\mathbf{L}\boldsymbol{\Gamma}\mathbf{L}^T$ and $\boldsymbol{\Delta}_\kappa=\mathbf{L}\boldsymbol{\Delta}$, this becomes $\boldsymbol{\Xi}=\boldsymbol{\Gamma}_\kappa-\boldsymbol{\Delta}_\kappa\mathcal{I}^{-1}\boldsymbol{\Delta}_\kappa^T$. Let $\bar{\boldsymbol{\Delta}}_\kappa$ be a $w\times(w-v)$ orthogonal complement of $\boldsymbol{\Delta}_\kappa$ so that $\bar{\boldsymbol{\Delta}}_\kappa^T\boldsymbol{\Delta}_\kappa=\mathbf{0}$. The $(w-v)$-dimensional vector $\mathbf{z}_\kappa=\sqrt{N}\bar{\boldsymbol{\Delta}}_\kappa^T(\mathbf{m}-\hat{\boldsymbol{\kappa}})$ is asymptotically normal with covariance matrix $\bar{\boldsymbol{\Delta}}_\kappa^T\boldsymbol{\Gamma}_\kappa\bar{\boldsymbol{\Delta}}_\kappa$. The Mord statistic (Maydeu-Olivares & Joe, 2014) is the quadratic form

$$M_{ord}=\mathbf{z}_\kappa^T[\bar{\boldsymbol{\Delta}}_\kappa^T\boldsymbol{\Gamma}_\kappa\bar{\boldsymbol{\Delta}}_\kappa]^{-1}\mathbf{z}_\kappa=N(\mathbf{m}-\hat{\boldsymbol{\kappa}})^T\mathbf{C}_\kappa(\mathbf{m}-\hat{\boldsymbol{\kappa}}), \qquad (5)$$

where $\mathbf{C}_\kappa=\bar{\boldsymbol{\Delta}}_\kappa[\bar{\boldsymbol{\Delta}}_\kappa^T\boldsymbol{\Gamma}_\kappa\bar{\boldsymbol{\Delta}}_\kappa]^{-1}\bar{\boldsymbol{\Delta}}_\kappa^T$. Under the null hypothesis, Mord is approximately $\chi^2$ distributed with $w-v$ degrees of freedom, where $w=J(J+1)/2$ and $v$ is the number of parameters. Both $\boldsymbol{\Gamma}_\kappa$ and $\boldsymbol{\Delta}_\kappa$ are evaluated at $\hat{\boldsymbol{\gamma}}$. When $\boldsymbol{\gamma}=(\boldsymbol{\phi}^T,\boldsymbol{\rho}^T)^T$ consists of both measurement model parameters $\boldsymbol{\phi}$ and structural parameters $\boldsymbol{\rho}$, the resulting statistic is denoted Mordall because all model parameters are considered. The number of parameters in $\boldsymbol{\rho}$ increases exponentially with the number of attributes $K$, and therefore, when $K$ is very large, the Mordall statistic may not be calculable. To address this issue, flexMIRT (Cai, 2017) ignores the structural parameters when calculating the M2 statistic. However, the impact of ignoring the structural parameters has not been documented. Therefore, the Mord statistic with $\boldsymbol{\gamma}=\boldsymbol{\phi}$ (referred to as Morditem) is also calculated. The calculation details of the Mord statistics for the sG-DINA model are given in the Supplemental Appendix. In addition to the Mord statistics, the SRMSR (Maydeu-Olivares, 2013) can also be calculated as

$$\mathrm{SRMSR}=\sqrt{\frac{\sum_{a<b}(r_{ab}-\hat{\rho}_{ab})^2}{J(J-1)/2}}, \qquad (6)$$

where $r_{ab}$ and $\hat{\rho}_{ab}$ are the observed and model-implied Pearson correlations, respectively, for items $a$ and $b$. This index can be viewed as an average of the correlation residuals across all item pairs.
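For concreteness, below is a minimal base R sketch of Equations 5 and 6. All inputs (the margins m and kappa_hat, the matrices Gamma_k and Delta_k evaluated at the parameter estimates, the response matrix X, and the model-implied correlations rho_hat) are assumed to be supplied by a fitted model; the orthogonal complement is obtained from a full QR decomposition.

```r
# Mord as the quadratic form in Equation 5. Delta_k is assumed to have
# full column rank v, so the last w - v columns of Q in a full QR
# decomposition form an orthogonal complement of its column space.
Mord_stat <- function(N, m, kappa_hat, Gamma_k, Delta_k) {
  w <- nrow(Delta_k)
  v <- ncol(Delta_k)
  Q_full    <- qr.Q(qr(Delta_k), complete = TRUE)
  Delta_bar <- Q_full[, (v + 1):w, drop = FALSE]      # w x (w - v)
  z    <- sqrt(N) * crossprod(Delta_bar, m - kappa_hat)
  Mord <- drop(t(z) %*% solve(crossprod(Delta_bar, Gamma_k %*% Delta_bar)) %*% z)
  c(Mord = Mord, df = w - v,
    p.value = pchisq(Mord, df = w - v, lower.tail = FALSE))
}

# SRMSR as in Equation 6: root mean square of the correlation residuals
# over the J(J - 1)/2 item pairs. X is the N x J matrix of item scores;
# rho_hat is the J x J matrix of model-implied Pearson correlations.
srmsr <- function(X, rho_hat) {
  r   <- cor(X)
  idx <- lower.tri(r)
  sqrt(mean((r[idx] - rho_hat[idx])^2))
}
```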

Simulation Studies

In this section, two simulation studies were conducted to evaluate the viability of the Mordall and Morditem statistics. Study 1 investigated their performance when the Q-matrix was correctly specified but the fitted measurement model did or did not conform to the underlying condensation rule; in contrast, Study 2 considered conditions where the Q-matrix was misspecified but the fitted measurement model was in line with the underlying condensation rule. The manipulated factors are summarized in Table 1 and elaborated thereafter.

Table 1.

Summary of the Simulation Factors.

Factors                                Study 1                             Study 2
Sample size (N)                        1,000, 3,000                        1,000, 3,000
Test length (J)                        30                                  30
Number of attributes (K)               5                                   5
Number of response categories (RC)     3, 4                                3, 4
Generating attribute structure         Multivariate normal distribution    Multivariate normal distribution
Item quality                           High, moderate, low                 High, moderate, low
Generating model                       sG-DINA, sDINA, sDINO, sA-CDM       sG-DINA
Fitted model                           sG-DINA, sDINA, sDINO, sA-CDM       sG-DINA
Proportion of misspecified q-entries   0%                                  5%

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; CDM = cognitive diagnosis model; A-CDM = additive CDM; sG-DINA = sequential generalized DINA model; sDINA = sequential DINA; sDINO = sequential DINO; sA-CDM = sequential A-CDM.

Study 1: The Model-Data Fit Measures Under Condensation Rule Misspecifications

Design

The number of items was fixed at $J=30$, which has been considered in many previous simulation studies (e.g., Y. Liu, Tian, & Xin, 2016) and is also similar to the test length in real-world diagnostic assessments (e.g., Bradshaw, Izsák, Templin, & Jacobson, 2014). Sample sizes were 1,000 and 3,000: the former is close to the median sample size (i.e., 1,255) of the 36 articles on CDM applications reviewed by Sessoms and Henson (2018), and the latter represents a relatively large but still realistic sample size, as 30% of the articles reviewed by Sessoms and Henson (2018) had sample sizes greater than 2,000. The number of response categories, which was identical for all items in a test, had two levels: RC = 3 or 4, with maximum scores of 2 or 3, respectively. The Q-matrix was simulated for each replication with the constraints that (a) the maximum number of attributes required by each nonzero category is 2, (b) the numbers of categories measuring one and two attributes are equal, (c) each nonzero category measures at least one attribute, and (d) each attribute is measured by at least one item. As in Chiu, Douglas, and Li (2009), for individual $i$, latent traits $\boldsymbol{\theta}_i=(\theta_{i1},\ldots,\theta_{iK})^T$ were first generated from a multivariate normal distribution with mean vector $\mathbf{0}_K$; variances and covariances in the covariance matrix were set to 1 and 0.5, respectively. Then the $k$th element of the attribute profile was set to $\alpha_{ik}=1$ if $\theta_{ik}\geq\Phi^{-1}[k/(K+1)]$ and 0 otherwise. Data were simulated using the sDINA model, sDINO model, sA-CDM, and sG-DINA model. The quality of items had three levels, with both $s(\boldsymbol{\alpha}_{ljh}^*=\mathbf{0})$ and $1-s(\boldsymbol{\alpha}_{ljh}^*=\mathbf{1})$ drawn from $U(0.05,0.15)$, $U(0.15,0.25)$, and $U(0.25,0.35)$ for all categories of all items, representing high, moderate, and low quality, respectively. For the sA-CDM, main effects were constrained to be equal for each item, indicating that all required attributes have the same contribution to the processing function, as in Ma and de la Torre (2019). For the sG-DINA model, the success probabilities for individuals with attribute patterns $\boldsymbol{\alpha}_{ljh}^*$ other than $\mathbf{0}$ and $\mathbf{1}$ were simulated randomly under the monotonicity constraint that $s(\boldsymbol{\alpha}_{ljh}^*)\geq s(\boldsymbol{\alpha}_{l'jh}^*)$ whenever $\boldsymbol{\alpha}_{ljh}^*\succeq\boldsymbol{\alpha}_{l'jh}^*$ element-wise.
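The attribute generation just described can be sketched in a few lines of R (using MASS for the multivariate normal draws); the settings match the design in this section.

```r
# Generate attribute profiles as described above: latent traits from a
# multivariate normal with unit variances and covariances of 0.5, then
# dichotomized at the k/(K + 1) quantiles of the standard normal.
library(MASS)

gen_attributes <- function(N, K = 5, rho = 0.5) {
  Sigma <- matrix(rho, K, K)
  diag(Sigma) <- 1
  theta <- mvrnorm(N, mu = rep(0, K), Sigma = Sigma)
  thresholds <- qnorm((1:K) / (K + 1))    # Phi^{-1}[k / (K + 1)]
  (theta >= matrix(thresholds, N, K, byrow = TRUE)) * 1L
}

alpha <- gen_attributes(N = 1000)
colMeans(alpha)   # mastery proportions decrease from attribute 1 to K
```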

Note that whether the Mord statistics can distinguish different sequential models relies on how similar or dissimilar the models are. Ma, Iaconangelo, and de la Torre (2016) examined the similarity among several dichotomous CDMs, but they mainly focused on additive models with different link functions and did not consider the impact of item quality. In this study, the dissimilarity between a true model and an approximating model is defined based on the Kullback–Leibler divergence (Chang & Ying, 1996; Xu, Chang, & Douglas, 2003), where a small value indicates that the approximating model can mimic the true model well and a large value indicates that it cannot. The definition of the dissimilarity index and details of the dissimilarity analysis can be found online in the Supplemental Appendix.
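The exact dissimilarity index is defined in the Supplemental Appendix; as a rough illustration of the idea, the sketch below computes one common Kullback–Leibler-based quantity, namely the latent-class-weighted KL divergence between the category response distributions implied by the true and approximating models. All inputs are assumptions for illustration and may differ from the index actually used in the paper.

```r
# Class-weighted KL divergence between two models for one item.
# P_true, P_approx: L x (H + 1) matrices of category probabilities per
# latent class (assumed strictly positive); weights: class proportions.
kl_dissimilarity <- function(P_true, P_approx, weights) {
  kl_by_class <- rowSums(P_true * log(P_true / P_approx))
  sum(weights * kl_by_class)
}
```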

Figure 1 gives boxplots of the dissimilarity among different models based on 500 replications under varied conditions. Several findings can be observed. First, under all conditions, the dissimilarity between a model and itself was 0, and the sG-DINA model mimicked the models it subsumes perfectly. Second, the additive processing function represents a condensation rule in between the conjunctive and disjunctive rules. Specifically, the sA-CDM mimicked the sDINA model better than the sDINO model did, and mimicked the sDINO model better than the sDINA model did. Third, the sA-CDM mimicked the sDINA and sDINO models similarly well. Fourth, it was difficult for the sDINA and sDINO models to mimic the sG-DINA model, whereas the sA-CDM mimicked the sG-DINA model quite well in many replications. Last but not least, the worse the quality of items became, the better one model could mimic another. This suggests that the quality of items and the dissimilarity of models are confounded; in other words, although the label "item quality" is used, it also represents the magnitude of model dissimilarity to some extent.

Figure 1.

Dissimilarity among different processing functions.

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; CDM = cognitive diagnosis model; sG-DINA = sequential generalized DINA model; sDINA = sequential DINA; sDINO = sequential DINO; sA-CDM = sequential additive CDM.

Under each condition, 500 data sets were generated and fitted using the sDINA, sDINO, sA-CDM, and sG-DINA models, and the Mordall and Morditem statistics were calculated for each fitted model. All analyses were conducted using the GDINA R package (Ma & de la Torre, 2018). The Type I error rate of a Mord statistic was calculated as the proportion of replications, under each condition, in which the generating model was mistakenly flagged as misfitting. Similarly, the correct detection rate was defined as the proportion of replications in which a misspecified condensation rule or Q-matrix was correctly flagged.
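For readers wishing to reproduce this type of analysis, a sketch of the workflow with the GDINA package follows. The function names are taken from the package, but arguments and defaults may differ across versions, so treat this as a sketch rather than a verified recipe; dat and Q stand for the user's ordinal response matrix and category-level Q-matrix.

```r
# Fitting the sequential G-DINA model and computing fit measures with the
# GDINA package (Ma & de la Torre, 2018); dat and Q are placeholders.
library(GDINA)

fit <- GDINA(dat = dat, Q = Q, model = "GDINA", sequential = TRUE)

# Limited-information fit measures (Mord-type statistic and SRMSR)
modelfit(fit)

# Estimated attribute profiles, e.g., for classification-accuracy checks
head(personparm(fit, what = "MAP"))
```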

Results

Table 2 gives the Type I error rates of the Mordall and Morditem statistics. Note that with 500 replications, the Type I error rates have a 95% chance of falling in the interval [.04, .06]. From Table 2, it can be observed that the Mordall statistic had well-calibrated Type I error rates at the .05 nominal level, with only a few exceptions. Specifically, the Mordall statistic was slightly liberal for the sDINA model, with four out of 12 Type I error rates between .06 and .075, but slightly conservative for the sG-DINA model, with two Type I error rates between .03 and .04. In contrast, the Morditem statistic was more likely to produce deflated Type I error rates, especially when items were of high quality; only two inflated Type I error rates of the Morditem statistic were observed, under moderate and low item quality. Overall, sample size, the number of response categories, and item quality had little impact on the Type I error rates.

Table 2.

Type I Error Rates Under α=.05.

High quality Moderate quality Low quality
Model N RC Mordall Morditem Mordall Morditem Mordall Morditem
sDINA 1,000 3 0.058 0.030 0.048 0.040 0.054 0.050
4 0.066 0.038 0.060 0.044 0.062 0.062
3,000 3 0.054 0.020 0.074 0.044 0.070 0.056
4 0.058 0.040 0.040 0.036 0.044 0.046
sDINO 1,000 3 0.056 0.016 0.060 0.044 0.056 0.042
4 0.054 0.026 0.052 0.052 0.048 0.054
3,000 3 0.048 0.024 0.036 0.030 0.048 0.050
4 0.048 0.032 0.050 0.034 0.050 0.048
sA-CDM 1,000 3 0.054 0.042 0.034 0.024 0.064 0.048
4 0.048 0.046 0.060 0.050 0.046 0.046
3,000 3 0.054 0.034 0.050 0.042 0.052 0.044
4 0.052 0.038 0.050 0.044 0.048 0.052
sG-DINA 1,000 3 0.044 0.034 0.058 0.050 0.056 0.054
4 0.044 0.038 0.044 0.038 0.038 0.040
3,000 3 0.034 0.030 0.046 0.046 0.036 0.038
4 0.040 0.032 0.058 0.068 0.040 0.048

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; CDM = cognitive diagnosis model; sG-DINA = sequential generalized DINA model; sDINA = sequential DINA; sDINO = sequential DINO; sA-CDM = sequential additive CDM.

The correct detection rates of the Mordall and Morditem statistics for data generated from the sDINA model are presented in Table 3. In the literature, a correct detection rate of .80 or higher is typically considered adequate, and .90 or higher excellent (e.g., de la Torre & Lee, 2013). When items were of moderate or high quality, both statistics had excellent correct detection rates in rejecting the sDINO model, with only one exception, which occurred under the N=1,000, RC=4, moderate item quality condition. When items were of low quality, the Mordall statistic had low detection rates in rejecting the sDINO model, whereas the Morditem statistic had considerably higher detection rates. For example, when N=3,000, RC=3, and items were of low quality, the correct detection rates of the Mordall and Morditem statistics were 0.282 and 0.980, respectively. In addition, as shown in Table 3, the correct detection rates of the two statistics in rejecting the sA-CDM were excellent when items were of high quality and N=3,000, but dropped dramatically as N became smaller or item quality became less optimal. For example, when items were of low quality, the correct detection rates of the Mordall and Morditem statistics did not exceed .082 and .104, respectively. Although the Morditem statistic still tended to have higher correct detection rates than the Mordall statistic in rejecting the sA-CDM under most conditions, the differences were less noticeable. Last, Table 3 also shows that the correct detection rates in rejecting the sG-DINA model were very low for both statistics (ranging from .024 to .074), which was consistent with the author's expectation in that the generating model is subsumed by the sG-DINA model. This implies that the Mord statistic is insensitive to model overfitting. Similar patterns were observed when data were generated from the other sequential models subsumed by the sG-DINA model.

Table 3.

Correct Detection Rates Under α=.05: sDINA-Generated Data.

High quality Moderate quality Low quality
Model N RC Mordall Morditem Mordall Morditem Mordall Morditem
sDINO 1,000 3 1.000 1.000 0.902 1.000 0.100 0.546
4 1.000 1.000 0.882 0.990 0.092 0.300
3,000 3 1.000 1.000 0.996 1.000 0.282 0.980
4 1.000 1.000 0.996 1.000 0.188 0.790
sA-CDM 1,000 3 0.798 0.810 0.092 0.118 0.064 0.046
4 0.402 0.418 0.026 0.030 0.056 0.048
3,000 3 1.000 1.000 0.470 0.568 0.082 0.104
4 0.926 0.942 0.202 0.236 0.048 0.062
sG-DINA 1,000 3 0.064 0.024 0.068 0.056 0.068 0.048
4 0.046 0.036 0.048 0.052 0.074 0.086
3,000 3 0.048 0.026 0.052 0.034 0.058 0.060
4 0.058 0.028 0.054 0.042 0.038 0.054

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; CDM = cognitive diagnosis model; sG-DINA = sequential generalized DINA model; sDINO = sequential DINO; sA-CDM = sequential additive CDM.

When data were generated using the sDINO model, as presented in Table 4, the correct detection rates of the Mordall and Morditem statistics in rejecting the sDINA model were adequate or better when items were of moderate or high quality (ranging from .866 to 1), but dropped considerably as item quality became poor, especially for the Mordall statistic. More specifically, when items were of low quality, the correct detection rates of the Mordall statistic ranged from .074 to .260, whereas the Morditem statistic outperformed it with correct detection rates ranging from .266 to .960. In addition, from Table 4, the correct detection rates of both statistics in rejecting the sA-CDM were excellent when items were of high quality, but worsened substantially under the other studied conditions. For example, when items were of moderate quality, the detection rates ranged from .194 to .670.

Table 4.

Correct Detection Rates Under α=.05: sDINO-Generated Data.

High quality Moderate quality Low quality
Model N RC Mordall Morditem Mordall Morditem Mordall Morditem
sDINA 1,000 3 1.000 1.000 0.912 1.000 0.114 0.540
4 1.000 1.000 0.866 0.988 0.074 0.266
3,000 3 1.000 1.000 1.000 1.000 0.260 0.960
4 1.000 1.000 1.000 1.000 0.182 0.728
sA-CDM 1,000 3 0.958 0.962 0.248 0.228 0.064 0.068
4 0.932 0.926 0.194 0.200 0.080 0.070
3,000 3 0.998 1.000 0.608 0.670 0.110 0.102
4 1.000 1.000 0.474 0.486 0.070 0.048
sG-DINA 1,000 3 0.042 0.024 0.058 0.046 0.058 0.038
4 0.058 0.022 0.056 0.040 0.054 0.062
3,000 3 0.050 0.024 0.054 0.038 0.052 0.040
4 0.042 0.020 0.042 0.036 0.054 0.064

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; CDM = cognitive diagnosis model; sG-DINA = sequential generalized DINA model; sDINA = sequential DINA; sA-CDM = sequential additive CDM.

Table 5 gives the correct detection rates for data generated from the sA-CDM. The Mordall statistic had adequate correct detection rates in rejecting the sDINA and sDINO models only when items were of high quality and N=3,000. The Morditem statistic, however, had higher correct detection rates than the Mordall statistic in rejecting the sDINA and sDINO models under all conditions. Specifically, the Morditem statistic showed excellent correct detection rates when items were of high quality, or when items were of moderate quality and N=3,000, and adequate rates when N=1,000, RC=3, and items were of moderate quality. However, when items were of low quality, both statistics had low correct detection rates in rejecting the sDINA and sDINO models.

Table 5.

Correct Detection Rates Under α=.05: sA-CDM-Generated Data.

High quality Moderate quality Low quality
Model N RC Mordall Morditem Mordall Morditem Mordall Morditem
sDINA 1,000 3 0.454 1.000 0.028 0.806 0.044 0.142
4 0.496 0.992 0.042 0.426 0.038 0.054
3,000 3 1.000 1.000 0.162 0.998 0.052 0.432
4 0.968 1.000 0.138 0.936 0.060 0.184
sDINO 1,000 3 0.488 1.000 0.022 0.856 0.032 0.134
4 0.472 1.000 0.034 0.548 0.034 0.064
3,000 3 0.974 1.000 0.202 1.000 0.040 0.572
4 0.982 1.000 0.128 0.966 0.048 0.244
sG-DINA 1,000 3 0.048 0.036 0.040 0.036 0.050 0.044
4 0.048 0.046 0.050 0.058 0.048 0.054
3,000 3 0.058 0.026 0.058 0.050 0.058 0.048
4 0.058 0.038 0.040 0.046 0.046 0.048

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; sG-DINA = sequential generalized DINA model; sDINA = sequential DINA; sDINO = sequential DINO.

Table 6 gives the correct detection rates for data generated using the sG-DINA model. Both Mordall and Morditem statistics had adequate or better detection rates in rejecting the sDINA and sDINO models under favorable conditions (i.e., larger N, smaller RC, better item quality), but their detection rates dropped below the adequate level under unfavorable conditions. Under these unfavorable conditions, a substantial improvement in the correct detection rates can be obtained by using the Morditem statistic instead of the Mordall statistic. Last, under all studied conditions, the two statistics performed similarly poorly in rejecting the sA-CDM, with correct detection rates ranging from .028 to .178, which suggests that the Mord statistics are insensitive to the omission of attribute interactions.

Table 6.

Correct Detection Rates Under α=.05: sG-DINA-Generated Data.

High quality Moderate quality Low quality
Model N RC Mordall Morditem Mordall Morditem Mordall Morditem
sDINA 1,000 3 1.000 1.000 0.850 0.976 0.172 0.324
4 0.986 0.998 0.452 0.778 0.086 0.166
3,000 3 1.000 1.000 0.996 0.998 0.544 0.868
4 1.000 1.000 0.924 0.992 0.230 0.532
sDINO 1,000 3 1.000 1.000 0.860 0.986 0.158 0.348
4 1.000 1.000 0.612 0.906 0.106 0.186
3,000 3 1.000 1.000 0.990 1.000 0.582 0.908
4 1.000 1.000 0.962 1.000 0.292 0.650
sA-CDM 1,000 3 0.076 0.064 0.054 0.056 0.056 0.040
4 0.062 0.056 0.028 0.042 0.060 0.064
3,000 3 0.206 0.178 0.058 0.048 0.052 0.056
4 0.176 0.138 0.054 0.062 0.052 0.048

Note. DINA = deterministic input noisy “and” gate; DINO = deterministic input noisy “or” gate; CDM = cognitive diagnosis model; sDINA = sequential DINA; sDINO = sequential DINO; sA-CDM = sequential additive CDM.

In addition to the Mordall and Morditem statistics, the properties of the SRMSR under varied conditions were also investigated. As an absolute fit measure, the SRMSR is usually compared against a cut-off to indicate whether the model fits the data adequately. The author attempted to find a cut-off ϵ that could either separate the true model from misspecified models or ensure that a model with SRMSR < ϵ produces relatively accurate person classifications, given that the major goal of CDM analyses is to classify students into latent classes. The accuracy of person classification was quantified by the proportion of correctly classified attribute vectors (PCV). Results are discussed below; due to space limits, the scatter plots of PCV against SRMSR under different generating models are given online in the Supplemental Appendix.
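For reference, the PCV is simply the proportion of examinees whose entire estimated attribute vector matches the true one. A minimal base R sketch follows; the estimated profiles are assumed to come from, for example, MAP classification of a fitted model.

```r
# Proportion of correctly classified attribute vectors (PCV).
# alpha_true, alpha_est: N x K binary matrices of true and estimated
# attribute profiles, respectively.
pcv <- function(alpha_true, alpha_est) {
  mean(rowSums(alpha_true == alpha_est) == ncol(alpha_true))
}
```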

When data were generated using the sDINA model, the sDINO model tended to produce larger SRMSRs and lower PCVs than the sDINA model. The sA-CDM produced larger SRMSRs when items were of high or moderate quality, but comparable SRMSRs when items were of low quality. The sG-DINA model produced SRMSRs and PCVs similar to those of the sDINA model under all conditions. Taken together, the SRMSR may be used to distinguish the sDINO model from the sDINA model, but may not be effective in distinguishing the sA-CDM from the sDINA model, especially when the quality of items was poor. It is apparent that the SRMSR cannot be used to distinguish the sG-DINA model from the sDINA model.

In addition, compared with the number of response categories, the quality of items and sample size exerted a major influence on the SRMSR. In particular, the SRMSR tended to be smaller as sample size increased, for both true and misspecified models. When the quality of items worsened, the sDINA and sG-DINA models tended to produce slightly larger SRMSRs, whereas the sDINO model and sA-CDM tended to produce smaller SRMSRs. More importantly, it is evident that no single cut-off ϵ of the SRMSR existed that could differentiate the true model from misspecified models or differentiate models that produced high PCVs from those with low PCVs.

Similar patterns were observed when the generating model was another sequential model. Specifically, when the sDINO model was the true model, the SRMSR could differentiate it from the sDINA model, but was less effective in differentiating it from the sA-CDM and sG-DINA models. When the sA-CDM was the true model, the SRMSR could distinguish it from the sDINA and sDINO models, especially when items were of high or moderate quality, but not from the sG-DINA model. When the sG-DINA model was the true model, the SRMSR could distinguish it from the sDINA and sDINO models, especially when items were of high or moderate quality, but usually could not distinguish it from the sA-CDM. Also, regardless of the generating model, sample size and item quality had a major impact on the SRMSR, and there was no single cut-off suitable for distinguishing the true model from misspecified models.

Study 2: The Model-Data Fit Measures Under Q-Matrix Misspecifications

Design

The goal of this simulation study was to examine the correct detection rates of the Mordall and Morditem statistics in detecting misspecifications in the Q-matrix. The settings of sample size, item quality, number of response categories, and generating attribute distribution were the same as in the previous simulation study. The generating CDM was the sG-DINA model, and the data were simulated in the same manner as in the previous study. For each replication, 2.5% of the 1s and 2.5% of the 0s in the Q-matrix were randomly selected and modified (i.e., 0 → 1 or 1 → 0) with the constraint that each row and each column of the misspecified Q-matrix contain at least one 0 and at least one 1. This yielded 5% balanced misspecifications in the Q-matrix, similar to de la Torre and Chiu (2016). Note that the proportion of misspecifications considered in this study was smaller than in other studies on model fit evaluation (e.g., Y. Liu et al., 2016; Wang, Shu, Shang, & Xu, 2015) because a mild level of misspecification was deliberately created; higher detection rates could be expected if more elements in the Q-matrix were misspecified. The data were fitted using the sG-DINA model along with the misspecified Q-matrix, and the Mordall and Morditem statistics were calculated.
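One possible implementation of this procedure is sketched below in base R, under the reading that a proportion of the 1s and of the 0s are flipped and the draw is repeated until the row and column constraints are met; the exact implementation details are not given in the text.

```r
# Randomly misspecify a binary Q-matrix: flip a proportion of the 1s and
# of the 0s, redrawing until each row and column of the result contains
# at least one 0 and at least one 1.
misspecify_Q <- function(Q, prop = 0.025) {
  ones  <- which(Q == 1)
  zeros <- which(Q == 0)
  repeat {
    Qm <- Q
    Qm[sample(ones,  ceiling(prop * length(ones)))]  <- 0   # 1 -> 0
    Qm[sample(zeros, ceiling(prop * length(zeros)))] <- 1   # 0 -> 1
    ok <- all(rowSums(Qm) > 0) && all(rowSums(1 - Qm) > 0) &&
          all(colSums(Qm) > 0) && all(colSums(1 - Qm) > 0)
    if (ok) return(Qm)
  }
}
```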

Results

Table 7 gives the correct detection rates of the Mordall and Morditem statistics, as well as the average SRMSRs and their 90% confidence intervals. The confidence intervals were calculated empirically from the 500 replications, with the lower and upper bounds being the 5th and 95th percentile points, respectively. Several conclusions can be drawn from the results. First, the correct detection rates of both statistics were higher under conditions with larger sample size, better item quality, or fewer response categories. For example, when items were of high quality, both statistics showed adequate or better correct detection rates (i.e., at least .826). Note that only 5% of the elements in the Q-matrix were misspecified, and the correct detection rates would be expected to improve if a larger proportion of elements were misspecified. Second, across all conditions, the Morditem statistic outperformed the Mordall statistic, with only one exception, in which the two statistics had the same correct detection rate. In addition to the Mord statistics, the SRMSRs varied considerably across conditions. More specifically, the SRMSR tended to be larger when items were of better quality and sample size was smaller. For example, when N=1,000, items were of high quality, and RC=3, the 90% confidence interval for the SRMSR was [.043, .102], whereas when N=3,000, items were of low quality, and RC=3, the interval was [.018, .027]. This implies that the SRMSR is a function not only of model-data misspecification but also of many other factors; hence, comparing the SRMSR against a single cut-off across varied conditions may not be appropriate.

Table 7.

Correct Detection Rates of Mord Statistics and SRMSR Under Misspecified Q-Matrix.

                           Correct detection rates            90% CI for SRMSR
N      Item quality  RC    Mordall    Morditem    SRMSR      LL        UL
1,000  High          3     0.950      0.954       0.071      0.043     0.102
                     4     0.826      0.874       0.072      0.041     0.107
       Moderate      3     0.706      0.732       0.045      0.032     0.061
                     4     0.384      0.540       0.046      0.034     0.061
       Low           3     0.208      0.236       0.033      0.029     0.037
                     4     0.090      0.136       0.033      0.031     0.037
3,000  High          3     0.998      0.998       0.068      0.036     0.101
                     4     0.938      0.960       0.067      0.038     0.099
       Moderate      3     0.924      0.952       0.038      0.024     0.054
                     4     0.758      0.864       0.040      0.025     0.057
       Low           3     0.498      0.586       0.022      0.018     0.027
                     4     0.188      0.312       0.022      0.019     0.027

Note. SRMSR = standardized root mean square residual; LL = lower limit; UL = upper limit.

Summary and Discussion

Assessing model-data fit has become a routine task in psychometric analyses to ensure the validity of inferences from observed responses. This study systematically investigated the performance of two implementations of the Mord statistic for ordinal response data under the sG-DINA model. Simulation studies showed that the Mordall statistic had better-calibrated Type I error rates than the Morditem statistic, which was more likely to be conservative, especially when items were of high quality. Neither statistic was sensitive in rejecting the sG-DINA model when data were simulated using the CDMs it subsumes, with correct detection rates close to the nominal level. This is not unexpected: as the model dissimilarity analysis showed, the sG-DINA model has more parameters than the generating models and can mimic them perfectly. These additional parameters would probably capture idiosyncrasies in the data and yield an overfitted model. However, such overfitting is unlikely to be detected by fit statistics that assess the magnitude of residuals, because the overfitted model produces residuals similar to, if not smaller than, those of the true model. The Wald test and likelihood ratio test have been shown to be promising for comparing the sG-DINA model with the models it subsumes (Ma & de la Torre, 2019), and thus may be used as a supplement to the Mord statistics.

When the generating model was the sDINA model, the Mord statistics yielded higher correct detection rates in rejecting the sDINO model than in rejecting the sA-CDM. Similarly, when the generating model was the sDINO model, the Mord statistics yielded higher correct detection rates in rejecting the sDINA model than in rejecting the sA-CDM. This is because the sDINA and sDINO models represent two distinct condensation rules, with the sA-CDM in between, as shown in the boxplots of model dissimilarity in Figure 1. The dissimilarity among these three models explains why distinguishing the sDINA and sDINO models from each other is easier than distinguishing either from the sA-CDM using the Mord statistics.

In addition, when the generating model was the sG-DINA model, the Mord statistics had higher power to reject the sDINA and sDINO models than the sA-CDM. This is also in line with the findings from the dissimilarity analysis: as Figure 1 shows, the sA-CDM mimicked the sG-DINA model better than the sDINA and sDINO models did, implying that distinguishing the sA-CDM from the sG-DINA model would be more challenging.

The dissimilarity analysis also revealed that the quality of items and model dissimilarity are confounded. The simulation studies showed that when items were of low quality, the Mord statistics had very low detection rates, which may be attributed to the fact that the models become too similar to differentiate. Note that when items were of low quality, the Mord statistics also experienced considerable difficulty in detecting Q-matrix misspecifications. Beyond the conditions involving items of poor quality, the correct detection rates of the Mord statistics also dropped substantially when the sample size was small and the number of response categories was large. Low detection rates of the Mord statistic for some types of model misspecification have been observed in other studies as well (e.g., Cai & Hansen, 2013). Therefore, the Mord statistics need to be used with caution under these unfavorable conditions.

An interesting but perplexing finding is that the Morditem statistic tended to produce similar or higher correct detection rates than the Mordall statistic in detecting both model and Q-matrix misspecifications. It is unclear why this happens, and more studies are needed before it can safely be concluded that the Morditem statistic should be preferred, especially given that the Morditem statistic tends to produce Type I error rates below the nominal level. Nevertheless, as shown in the real data analysis, which is presented online in the Supplemental Appendix due to space limits, the Morditem statistic is more likely to be calculable because it ignores the population proportion parameters and thus has larger degrees of freedom than the Mordall statistic.

In addition to the Mord statistics, which examine whether the model fits the data statistically, the author also investigated the performance of the SRMSR, which estimates the magnitude of model-data misfit. Although researchers have used SRMSR < .05 as the cut-off for acceptable model-data fit in CDMs (George & Robitzsch, 2015; Jiang & Ma, 2018; R. Liu et al., 2018), this study showed that the range of the SRMSR varied considerably across conditions for both true and misspecified models, and thus there is no one-size-fits-all cut-off. Given that the SRMSR is an average of the correlation residuals, a model with a smaller SRMSR can be viewed as having better absolute fit on average. However, it may be more informative to also report the maximum correlation residual, similar to J. Chen, de la Torre, and Zhang (2013), which would allow researchers to identify how severe the worst misfit is and where it occurs. In addition, it is possible to employ resampling techniques to obtain the empirical distribution of the SRMSR, based on which a formal hypothesis test can be performed to determine whether the obtained SRMSR is too large. The performance of such resampling procedures, however, needs to be examined further.
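A minimal sketch of this resampling idea is shown below. The functions simulate_fn, refit_fn, and srmsr_fn are hypothetical placeholders for model-specific routines (e.g., simulating data from the fitted sG-DINA model, re-estimating it, and computing the SRMSR), since the paper leaves the exact procedure to future research.

```r
# Parametric bootstrap reference distribution for the SRMSR (schematic).
# simulate_fn(fit): hypothetical, draws a data set from the fitted model
# refit_fn(fit, dat): hypothetical, re-estimates the model on new data
# srmsr_fn(fit): hypothetical, computes the SRMSR of a fitted model
bootstrap_srmsr <- function(fit, simulate_fn, refit_fn, srmsr_fn, B = 500) {
  stats <- replicate(B, srmsr_fn(refit_fn(fit, simulate_fn(fit))))
  quantile(stats, 0.95)   # e.g., flag misfit if observed SRMSR exceeds this
}
```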

This study systematically documents the performance of the Mord statistics and SRMSR for the sG-DINA model, but it is not without limitations. First, the numbers of attributes and items were fixed in the simulation studies, and future research may vary them. The attribute profiles were derived from a multivariate normal distribution with fixed correlations among attributes; future research may vary the correlations or assume different generating distributions. Also, as in Ma and de la Torre (2019), when generating data using the sA-CDM, all required attributes were assumed to contribute equally, and future research could relax this constraint. In addition, this study considered only two types of misspecification, namely, misspecified condensation rules and Q-matrices, and future studies may consider other causes of model-data misfit; for example, continuous latent variables may be treated as dichotomous ones, or the number of attributes may be mistakenly specified. Furthermore, despite well-calibrated Type I error rates, the Mord statistics were found to be insensitive to some types of misspecification. Future research may explore how to extend other measures, such as the log odds ratio and transformed correlation (J. Chen et al., 2013), to ordinal responses. Also note that for dichotomous items, the Mord statistic is equivalent to the M2 statistic. Although the performance of the M2 statistic has been investigated for dichotomous response CDMs (Hansen et al., 2016; Y. Liu, Tian, & Xin, 2016), those studies did not take item quality or model dissimilarity into consideration. The current study suggests that the Mord and M2 statistics need to be used with caution when items are of poor quality. Last but not least, future research may explore how to integrate the Mord statistics and SRMSR with Q-matrix validation procedures (e.g., de la Torre & Chiu, 2016; Ma & de la Torre, 2020), item-level model comparison approaches (e.g., de la Torre & Lee, 2013), and item fit measures (e.g., J. Chen et al., 2013; Sorrel, Abad, Olea, de la Torre, & Barrada, 2017; Wang et al., 2015) to determine the most appropriate models with acceptable model-data fit.

Supplemental Material

Online_Appendix – Supplemental material for Evaluating the Fit of Sequential G-DINA Model Using Limited-Information Measures by Wenchao Ma in Applied Psychological Measurement.

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Office for Research and Economic Development, The University of Alabama (Grant RG 14872).

Supplemental Material: Supplemental material for this article is available online.

References

  1. Bishop Y. M. M., Fienberg S. E., Holland P. W. (1975). Discrete multivariate analysis: Theory and practice (With the collaboration of R. J. Light & F. Mosteller). Cambridge, MA: MIT Press. [Google Scholar]
  2. Bradshaw L., Izsák A., Templin J., Jacobson E. (2014). Diagnosing teachers’ understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33, 2-14. [Google Scholar]
  3. Cai L. (2017). flexMIRT: Flexible multilevel multidimensional item analysis and test scoring [Computer software] (Version 3.51). Chapel Hill, NC: Vector Psychometric Group. [Google Scholar]
  4. Cai L., Hansen M. (2013). Limited-information goodness-of-fit testing of hierarchical item factor models. British Journal of Mathematical and Statistical Psychology, 66, 245-276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Chang H.-H., Ying Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229. [Google Scholar]
  6. Chen F., Liu Y., Xin T., Cui Y. (2018). Applying the M2 statistic to evaluate the fit of diagnostic classification models in the presence of attribute hierarchies. Frontiers in Psychology, 9, Article 1875. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Chen J., de la Torre J., Zhang Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50, 123-140. [Google Scholar]
  8. Chiu C.-Y., Douglas J. A., Li X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633-665. [Google Scholar]
  9. de la Torre J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199. [Google Scholar]
  10. de la Torre J., Chiu C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81, 253-273. [DOI] [PubMed] [Google Scholar]
  11. de la Torre J., Lee Y. S. (2013). Evaluating the Wald test for item-level comparison of saturated and reduced models in cognitive diagnosis. Journal of Educational Measurement, 50, 355-373. [Google Scholar]
  12. George A. C., Robitzsch A. (2015). Cognitive diagnosis models in R: A didactic. The Quantitative Methods for Psychology, 11, 189-205. [Google Scholar]
  13. Haertel E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301-321. [Google Scholar]
  14. Hansen M., Cai L., Monroe S., Li Z. (2016). Limited-information goodness-of-fit testing of diagnostic classification item response models. British Journal of Mathematical and Statistical Psychology, 69, 225-252. [DOI] [PubMed] [Google Scholar]
  15. Jiang Z., Ma W. (2018). Integrating differential evolution optimization to cognitive diagnostic model estimation. Frontiers in Psychology, 9, Article 2142. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Jurich D. P. (2014). Assessing model fit of multidimensional item response theory and diagnostic classification models using limited-information statistics (Unpublished doctoral dissertation). James Madison University, Harrisonburg, VA. [Google Scholar]
  17. Liu R., Huggins-Manley A. C., Bulut O. (2018). Retrofitting diagnostic classification models to responses from IRT-based assessment forms. Educational and Psychological Measurement, 78, 357-383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Liu R., Jiang Z. (2018). Diagnostic classification models for ordinal item responses. Frontiers in Psychology, 9, Article 2512. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Liu Y., Tian W., Xin T. (2016). An application of M2 statistic to evaluate the fit of cognitive diagnostic models. Journal of Educational and Behavioral Statistics, 41, 3-26. [Google Scholar]
  20. Liu Y., Xin T., Li L., Tian W., Liu X. (2016). An improved method for differential item functioning detection in cognitive diagnosis models: An application of Wald statistic based on observed information matrix. Acta Psychologica Sinica, 48, 588-598. [Google Scholar]
  21. Ma W., de la Torre J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69, 253-275. [DOI] [PubMed] [Google Scholar]
  22. Ma W., de la Torre J. (2018). GDINA: The generalized DINA model framework [Computer software] (Version 2.1). Retrieved from https://CRAN.R-project.org/package=GDINA
  23. Ma W., de la Torre J. (2019). Category-level model selection for the sequential G-DINA model. Journal of Educational and Behavioral Statistics, 44, 45-77. [Google Scholar]
  24. Ma W., de la Torre J. (2020). An empirical Q-matrix validation method for the sequential G-DINA model. British Journal of Mathematical and Statistical Psychology, 73, 143-166. [DOI] [PubMed] [Google Scholar]
  25. Ma W., Iaconangelo C., de la Torre J. (2016). Model similarity, model selection, and attribute classification. Applied Psychological Measurement, 40, 200-217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Maydeu-Olivares A. (2013). Goodness-of-fit assessment of item response theory models. Measurement: Interdisciplinary Research and Perspectives, 11, 71-101. [Google Scholar]
  27. Maydeu-Olivares A., Joe H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009-1020. [Google Scholar]
  28. Maydeu-Olivares A., Joe H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713-732. [Google Scholar]
  29. Maydeu-Olivares A., Joe H. (2014). Assessing approximate fit in categorical data analysis. Multivariate Behavioral Research, 49, 305-328. [DOI] [PubMed] [Google Scholar]
  30. Reiser M. (2008). Goodness-of-fit testing using components based on marginal frequencies of multinomial data. British Journal of Mathematical and Statistical Psychology, 61, 331-360. [DOI] [PubMed] [Google Scholar]
  31. Samejima F. (1997). Graded response model. In van der Linden W. J., Hambleton R. K. (Eds.), Handbook of modern item response theory (pp. 85-100). New York, NY: Springer. [Google Scholar]
  32. Sessoms J., Henson R. A. (2018). Applications of diagnostic classification models: A literature review and critical commentary. Measurement: Interdisciplinary Research and Perspectives, 16, 1-17. [Google Scholar]
  33. Sorrel M. A., Abad F. J., Olea J., de la Torre J., Barrada J. R. (2017). Inferential item-fit evaluation in cognitive diagnosis modeling. Applied Psychological Measurement, 41, 614-631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Templin J. L., Henson R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287-305. [DOI] [PubMed] [Google Scholar]
  35. von Davier M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287-307. [DOI] [PubMed] [Google Scholar]
  36. Wang C., Shu Z., Shang Z., Xu G. (2015). Assessing item-level fit for the DINA model. Applied Psychological Measurement, 39, 525-538. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Xu X., Chang H., Douglas J. (2003, April). Computerized adaptive testing strategies for cognitive diagnosis. Paper presented at the Annual Meeting of the National Council of Measurement in Education, Chicago, IL. [Google Scholar]
