Applied Psychological Measurement. 2020 May 8;44(6):415–430. doi: 10.1177/0146621620909898

Partially and Fully Noncompensatory Response Models for Dichotomous and Polytomous Items

R Philip Chalmers
PMCID: PMC7383690  PMID: 32788814

Abstract

This article extends Sympson’s partially noncompensatory dichotomous response model to ordered response data, and introduces a set of fully noncompensatory models for dichotomous and polytomous response data. The theoretical properties of the partially and fully noncompensatory response models are contrasted, and a small set of Monte Carlo simulations is presented to evaluate their parameter recovery performance. Results indicate that the respective models fit the data similarly when correctly matched to their respective population generating model. The fully noncompensatory models, however, demonstrated lower sampling variability and smaller degrees of bias than the partially noncompensatory counterparts. Based on the theoretical properties and empirical performance, it is argued that the fully noncompensatory models should be considered in item response theory applications when investigating conjunctive response processes.

Keywords: noncompensatory models, compensatory models, multidimensional item response theory, MIRT, conjunctive, disjunctive


Item response theory (IRT) comprises a set of probabilistic response models for capturing the relationship between one or more continuous latent traits and their interaction with categorically recorded response stimuli. These categorical stimuli, collected as either dichotomous or polytomous responses, are commonly found in psychological tests, surveys, and rating scales and are coded as interval numbers to reflect an implied scoring behavior given the underlying traits. In the event that more than one latent trait is present in any given psychological test, several multidimensional item response theory (MIRT) models have been proposed. These MIRT models are generally categorized into two nonoverlapping response modeling classes: compensatory and noncompensatory (Ackerman, 1989; Reckase, 2009). Noncompensatory models reflect whether the underlying traits follow a conjunctive (i.e., interactive) response process, while compensatory models reflect an underlying disjunctive (i.e., additive) response process (Maris, 1999; Whitely, 1980).

Compensatory IRT models are the dominant class of models utilized in practice, partially due to the support provided by commercial (e.g., flexMIRT, Cai, 2015) and open-source (e.g., mirt, Chalmers, 2012) software, but also because many more compensatory IRT models have been proposed in the literature. To date, the only class of noncompensatory MIRT model that has been proposed is the product model described by Sympson (1977), which currently is intended only for dichotomous response data. A special case of this product model was later adopted by Whitely (1980), who proposed a one-parameter variant of Sympson’s product model to measure multicomponent response processes. In both expressions of the product model, the conjunctive response process is included by computing the product of two or more distinct unidimensional IRT probability response functions so that the resulting probability remains large in magnitude only when the associated unidimensional probabilities are large in magnitude (i.e., closer to one). However, this product-model formulation is not without important practical and statistical limitations, some of which are highlighted in this article.

The purpose of this article is to generalize Sympson’s product model to polytomous response data, and to propose a novel set of noncompensatory IRT models for dichotomous and polytomous response data that contain several attractive properties compared to the product modeling approach. In particular, the proposed IRT models provide more stable likelihood functions (resulting in improved sampling variability), demonstrate an intimate connection with the general formulation of compensatory IRT models, and reflect a constrained conjunctive response process.

Compensatory and Partially Noncompensatory Models

The compensatory two-parameter logistic model (C2PLM; Reckase, 1997) is defined by the cumulative logistic function

$$P(y = 1 \mid \boldsymbol{\theta}, \mathbf{a}, d) = \frac{1}{1 + \exp[-(\mathbf{a}'\boldsymbol{\theta} + d)]}, \tag{1}$$

where y = 1 represents a positive response or endorsement (otherwise, y = 0), d is an overall intercept that reflects the relative “easiness” of an item, and a is a T × 1 vector of discrimination or slope parameters associated with each latent trait element in θ. In this model, each participant i = 1, …, N may have a unique θ vector, while the item parameters are constant within each respective item.

Inspecting Equation 1 more closely, the vector product a′θ highlights why this probability model is considered compensatory. Because the vector product generates a scalar c = a1θ1 + a2θ2 + … + aTθT, the value of c will necessarily increase whenever the signs of the respective at and θt match. This property implies that lower values on one or more latent traits will not necessarily result in a low probability of positive endorsement if the product between any adjacent latent trait and its associated slope parameter is large and positive. This property may be disadvantageous in empirical settings where the probability of positive responses should be limited by one or more ability or skill components (Whitely, 1980).
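The compensation described above can be checked numerically. The following is a minimal sketch of Equation 1, assuming illustrative slopes and an intercept that do not come from the article; the function name `c2plm` is hypothetical.

```python
import math

def c2plm(theta, a, d):
    """Compensatory 2PL (Equation 1): logistic of the linear predictor a'theta + d."""
    z = sum(a_t * th_t for a_t, th_t in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative parameters (assumed, not taken from the article).
a, d = [1.0, 1.0], 0.0

# A respondent who is weak on trait 1 but very strong on trait 2 ...
low_high = c2plm([-2.0, 4.0], a, d)
# ... has the same endorsement probability as one who is moderate on both,
# because only the weighted sum a'theta enters the model.
both_mid = c2plm([1.0, 1.0], a, d)
```

Both profiles yield the same linear predictor (c = 2), so a deficit on one trait is fully offset by a surplus on the other.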

As a practical example where compensatory response models may be limited, Sympson (1977) argued that when test items pertaining to mathematical stimuli are presented in an unfamiliar language, a positive response necessarily requires enough “reading comprehension” and “mathematical” ability for the item to be answered successfully. Sympson’s reasoning was that although a participant may have excellent mathematical skills, if this individual does not possess enough reading ability then they likely will not understand the question; therefore, these individuals should have a low probability of answering correctly. To address this conjunctive response process, Sympson recommended forming an expected probability function based on the product of separate unidimensional probability functions (such as the two-parameter logistic model [2PLM]; Lord & Novick, 1968),

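$$P(y = 1 \mid \boldsymbol{\theta}, \mathbf{a}, \mathbf{d}) = \prod_{t=1}^{T} P(y = 1 \mid \theta_t, a_t, d_t) = \prod_{t=1}^{T} \frac{1}{1 + \exp[-(a_t\theta_t + d_t)]}. \tag{2}$$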

Compared to Equation 1, notice that d is now a T × 1 vector instead of a scalar, where each trait is now associated with its own unique intercept. As well, when all at = 1 this model reduces to a Rasch-model variant, which is reflective of the model explored by Whitely (1980).¹ Further modifications are possible, such as the three-parameter logistic model (e.g., Babcock, 2011; Chalmers & Flora, 2014), which incidentally was the initial model studied by Sympson (1977).

The functional behavior that the product model attempts to reflect is the following: if individuals have sufficiently high θt on each respective tth trait then, coupled with their associated slope and intercept parameters, the resulting expected probability in Equation 2 should result in a value close to 1; otherwise, if one or more θt terms are low, then the expected probability should be limited by these unidimensional probability term(s). As Reckase (2009) notes, however, the models proposed by Sympson (1977) ought to be characterized as “partially compensatory” or “partially noncompensatory” rather than completely noncompensatory because higher θt locations will necessarily yield higher probabilities even when holding auxiliary θt terms constant. In this sense, some compensation does occur in Sympson’s model, so while the product model does generally reflect a conjunctive response process it may not do so optimally across the entire range of θ.
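A small numerical sketch of Equation 2 makes the "partially" qualifier concrete. The parameter values are illustrative assumptions, and `pnc2plm` is a hypothetical name rather than a function from any package.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def pnc2plm(theta, a, d):
    """Product model (Equation 2): product of unidimensional 2PL probabilities."""
    p = 1.0
    for a_t, th_t, d_t in zip(a, theta, d):
        p *= logistic(a_t * th_t + d_t)
    return p

# Illustrative parameters (assumed, not taken from the article).
a, d = [1.0, 1.0], [0.0, 0.0]

p_low = pnc2plm([-2.0, 0.0], a, d)        # weak on trait 1, average on trait 2
p_low_boost = pnc2plm([-2.0, 4.0], a, d)  # same trait 1, much higher trait 2
cap = logistic(-2.0)                      # ceiling set by trait 1 alone
```

Raising the second trait still raises the probability (some compensation), but the result can never exceed the unidimensional probability implied by the weaker trait.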

The top and middle images in Figure 1 illustrate item response surfaces and contour plots for Equations 1 and 2, respectively. In the top image, the compensatory property of the C2PLM can be seen from the linearity of its contours, implying that the probability of positive endorsement will typically approach 1 at any fixed level of one ability as the level of the second ability increases (provided, in this example, that the slopes are positive). In contrast, the response surface and contour plots for the partially noncompensatory two-parameter logistic model (PNC2PLM) are generally bowed, implying that increasing either latent trait is largely limited by the expected probability on the adjacent latent trait. However, because these values are bowed, rather than appearing at right angles (i.e., see the bottom row of Figure 1), it is also clear why these models reflect a partially noncompensatory effect in that slight increases in one trait will still increase the probability of positive endorsement when holding constant all other auxiliary traits.

Figure 1.


Hypothetical examples of compensatory (top row), partially noncompensatory (middle row), and fully noncompensatory (bottom row) probability models organized as surface (left column) and contour (right column) plots.

Limitations of the Partially Noncompensatory Model

As noted at the beginning of the previous section, a practical limitation with PNCMs is that they currently are available for dichotomous response data only, though it is certainly possible to encounter polytomous response data that reflect similar conjunctive response phenomena. For instance, a mathematical question scored using a partial-credit rubric such as 0, 1, 2, or 3, reflecting the number of part-marks assigned, will also demonstrate the aforementioned reading-mathematics issue described by Sympson (1977) if the item’s stem or responses are presented in a less familiar language. In this partial-credit scoring example, the magnitude of the observed score should be limited by both reading and mathematical ability; hence, a score of 3 should only be likely if both reading and mathematics ability are sufficiently high.

Another limitation with respect to the general family of PNCMs is the inability to obtain suitable parameter estimates in empirical applications. Many of these practical issues with PNCMs have been discovered through Monte Carlo simulation studies, where similar observations are recurrent: parameter estimation instability, wide intervals of sampling uncertainty, problematic convergence when implementing algorithms for model estimation, suboptimal parameter recovery of the correlation parameters between the latent traits, and so on; see the simulation investigations by Bolt and Lall (2003), Babcock (2011), and Chalmers and Flora (2014) for further details and simulation-based conclusions.

Finally, the PNCM may not optimally reflect a suitable conjunctive response process due to its partially noncompensatory nature (Reckase, 2009) in that the individual response processes do not technically act independently in the expected probability models due to the multiplication operation for collapsing the unidimensional probability components. This property negatively impacts the slope of the response functions, where if one or more values in θ are low then the response functions will demonstrate notably lower discrimination effects. For example, in the middle row of Figure 1, holding θ1 constant at a higher location (say θ1 = 3) results in a conditional trace-line along θ2 that has a noticeably steep slope, thereby indicating a potentially useful degree of discrimination between adjacent θ2 locations. However, holding θ1 constant at a lower level (e.g., θ1 = −3) will result in a conditional trace-line along θ2 that is notably less steep, indicating lower levels of discrimination. Therefore, the steepness of the response curves, regardless of the item parameters used in the PNC2PLM, is conditionally dependent upon the auxiliary latent trait locations and only reflects the typical slope-intercept interpretation of the 2PLM when the auxiliary unidimensional probabilities approach 1 (i.e., when the auxiliary θt or dt terms are very large). This property has negative consequences when interpreting the associated parameters, as well as when quantifying the degree of expected information contributed by a given test item when drawing inferences about θ (Samejima, 1969).
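The dependence of the conditional slope on the auxiliary trait follows directly from differentiating Equation 2. The short derivation below is a sketch for T = 2; the numerical values assume a1 = 1 and d1 = 0, which are illustrative choices rather than the parameters used in Figure 1.

```latex
% Write P_t(\theta_t) = 1 / (1 + \exp[-(a_t\theta_t + d_t)]) for the unidimensional 2PLM.
% For the product model with T = 2, P(\theta_1, \theta_2) = P_1(\theta_1) P_2(\theta_2), so
\[
  \frac{\partial P(\theta_1, \theta_2)}{\partial \theta_2}
  = P_1(\theta_1) \cdot a_2\, P_2(\theta_2)\,[1 - P_2(\theta_2)] .
\]
% The second factor is the usual 2PLM slope along \theta_2; the first factor rescales it.
% Assuming a_1 = 1 and d_1 = 0 (illustrative):
\[
  P_1(3) \approx 0.953, \qquad P_1(-3) \approx 0.047, \qquad
  \frac{P_1(-3)}{P_1(3)} = e^{-3} \approx 0.050 .
\]
```

Under these assumed values, the trace-line along θ2 at θ1 = −3 is roughly one twentieth as steep as the trace-line at θ1 = 3, whatever a2 and d2 are.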

A Polytomous Generalization of the PNCM for Ordinal Response Data

This section addresses the first of the three limitations of the PNCM family stated above by extending the PNCM to support polytomous response data that follow a natural response ordering. Beginning with the required compensatory model, the multidimensional compensatory variant of the graded response model (CGRM; Muraki & Carlson, 1995) requires collecting a set of sequential C2PLMs for each observed response category, and subsequently computing the difference between these response functions to obtain the expected probability of the kth category. Notionally, the development of the CGRM requires K − 1 logistic functions for the y = [0, 1, …, K − 1] possible observations of the form

$$P(y \ge k \mid \boldsymbol{\theta}, \mathbf{a}, d_k) = \frac{1}{1 + \exp[-(\mathbf{a}'\boldsymbol{\theta} + d_k)]}, \tag{3}$$

where d = [d1, d2, …, d_{K−1}] is an ordered set of intercepts (from highest to lowest). Next, the probability of responding in each kth category is formed by

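$$P(y = k \mid \boldsymbol{\theta}, \mathbf{a}, d_k, d_{k+1}) = P(y \ge k \mid \boldsymbol{\theta}, \mathbf{a}, d_k) - P(y \ge k + 1 \mid \boldsymbol{\theta}, \mathbf{a}, d_{k+1}), \tag{4}$$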

while the two boundary categories, y = 0 and y = K − 1, are defined by 1 − P(y ≥ 1|θ, a, d1) and P(y ≥ K − 1|θ, a, d_{K−1}), respectively. Note that when K = 2, the CGRM necessarily reduces to the C2PLM as a special case.
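The construction in Equations 3 and 4 can be sketched in a few lines. The code below is an illustration only: the function name `cgrm_probs` and all parameter values are assumptions, and the boundary categories are handled by padding the cumulative probabilities with 1 and 0.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cgrm_probs(theta, a, d):
    """Category probabilities for the CGRM (Equations 3 and 4).

    d holds the K - 1 intercepts, ordered from highest to lowest.
    """
    lin = sum(a_t * th_t for a_t, th_t in zip(a, theta))
    # P(y >= k) for k = 0, ..., K: P(y >= 0) = 1 and P(y >= K) = 0 by definition.
    cum = [1.0] + [logistic(lin + d_k) for d_k in d] + [0.0]
    # Adjacent differences give P(y = k) for k = 0, ..., K - 1.
    return [cum[k] - cum[k + 1] for k in range(len(d) + 1)]

# Illustrative item with K = 5 categories (values assumed).
probs = cgrm_probs([0.5, -0.5], [1.2, 0.8], [1.5, 0.5, -0.5, -1.5])

# With a single intercept (K = 2) the model collapses to the C2PLM.
two_cat = cgrm_probs([0.5, -0.5], [1.2, 0.8], [0.3])
```

Because the intercepts are ordered, the cumulative probabilities decrease in k and every category probability is positive.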

Following the reasoning presented by Sympson (1977), though changing focus from dichotomous to polytomous response data, the contribution of each latent trait is conceptualized as an independent probabilistic response process that jointly contributes to the response behavior. As such, for a dichotomous item a total of T unidimensional IRT models are required to reflect each underlying response process (where T ≥ 2), and the respective unidimensional probability models are collapsed through a product function. For polytomous items, this independent unidimensional conceptualization can also be adopted and, analogous to the PNC2PLM, requires only additional intercept parameters to control the functional forms in each respective unidimensional GRM.

Given Equation 3, modifying the components required to reflect a partially noncompensatory ordinal response model is possible by replacing each of the K − 1 C2PLM components with the commensurate PNC2PLM found in Equation 2. Analogously, instead of a single vector of ordered intercept terms, there are now K − 1 strictly ordered intercepts along each of the T dimensions. Hence, each logistic function in Equation 3 requires the a parameter vector, as well as the intercepts [dk1, dk2, …, dkT], to compute the associated PNC2PLM logistic functions. After computing these response functions the associated expected probabilities may then be substituted into the difference form in Equation 4. In total, this newly developed partially noncompensatory graded response model (PNCGRM) requires T × K parameters to be estimated for each item, reflecting the T slopes and T × (K − 1) intercept parameters, where in the special case of K = 2 this definition simplifies to the original PNC2PLM. Note that for polytomous response models, the compensatory and noncompensatory behavior is seen more clearly by focusing on the expected scoring function

$$E(\boldsymbol{\theta}) = \sum_{k=0}^{K-1} k \cdot P(y = k \mid \boldsymbol{\theta}), \tag{5}$$

rather than the K distinct probability functions.
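As a hedged illustration of the PNCGRM and of the expected scoring function in Equation 5, the sketch below replaces each cumulative C2PLM with a product of unidimensional logistic terms. The names `pncgrm_probs` and `expected_score`, and all parameter values, are assumptions made for the example.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def pncgrm_probs(theta, a, d):
    """PNCGRM category probabilities.

    d[k][t] is the intercept for cumulative boundary k + 1 on trait t; within
    each trait the intercepts must be strictly ordered from highest to lowest.
    """
    cum = [1.0]
    for d_k in d:
        p = 1.0
        for a_t, th_t, d_kt in zip(a, theta, d_k):
            p *= logistic(a_t * th_t + d_kt)   # Equation 2 at boundary k
        cum.append(p)
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(d) + 1)]  # Equation 4

def expected_score(probs):
    """Expected score (Equation 5): sum over k of k * P(y = k)."""
    return sum(k * p for k, p in enumerate(probs))

# Illustrative item with T = 2 traits and K = 5 categories (values assumed).
a = [1.0, 1.0]
d = [[3.0, 3.0], [1.5, 1.5], [0.0, 0.0], [-1.5, -1.5]]

e_weak = expected_score(pncgrm_probs([-2.0, 0.0], a, d))
e_weak_boost = expected_score(pncgrm_probs([-2.0, 5.0], a, d))
e_strong = expected_score(pncgrm_probs([2.0, 5.0], a, d))
```

The item has T slopes and T × (K − 1) intercepts, i.e., T × K = 10 parameters. The expected score rises somewhat when only the second trait increases, but it remains far below the value reached when both traits are high.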

Fully Noncompensatory Response Models

When constructing a probabilistic model suitable for reflecting a conjunctive response process the important desiderata are that (a) the probability of positive endorsement is a monotonic function with respect to each latent trait, (b) the rate of the monotonic increase is conditionally limited by the smallest latent trait’s model-implied contribution (e.g., atθt + dt, where larger positive values indicate higher contribution), and (c) given the least contributing latent trait, the probability of positive endorsement should not increase past this trait’s contribution. Building upon the idea that the atθt + dt terms reflect the core contributions of the tth trait, and with the intent of addressing the remaining limitations of the PNCM family of response models, the following fully noncompensatory model (FNCM) is proposed for data with K = 2 categories:

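$$P(y = 1 \mid \boldsymbol{\theta}, \mathbf{a}, \mathbf{d}) = \frac{1}{1 + \exp\left[-1 \sum_{t=1}^{T} I_t(\mathbf{s}) \circ (a_t\theta_t + d_t)\right]} = \frac{1}{1 + \exp[-I(\mathbf{s})'(\mathbf{a} \circ \boldsymbol{\theta} + \mathbf{d})]}, \tag{6}$$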

where ∘ represents the Hadamard product. The noncompensatory operations in this equation are driven by the values from the indicator function I(s), where the function I(s) results in a vector with the constants 1 or 0 for the tth trait depending on whether the respective atθt + dt term was, or was not, the minimum in the vector s = [a1θ1 + d1, a2θ2 + d2, …, aTθT + dT], respectively.

Compared to the family of PNCMs, Equation 6 avoids the need to multiply independent probability functions to create a noncompensatory effect by including the functional noncompensatory components directly within the model’s exponentiated component. For instance, if a2θ2 + d2 is the minimum in s then I2(s) = 1 and all remaining It(s), t ≠ 2, would equal 0. Note that in the situation of ties, only one It(s) should equal the value 1, though which It(s) minimum to set to 1 is arbitrary because the contributions are identical. Focusing on the rightmost equation in Equation 6, expressed in matrix form, I(s) is a T × 1 vector of constants with 0s everywhere except in the tth location corresponding to the minimum element in s, which is assigned the value 1. Finally, for traits that do not influence the respective item, It(s) can be set to the constant of 0, and the respective at and dt terms can be omitted during estimation.
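The indicator mechanics of Equation 6 can be sketched as follows. This is an illustration under assumed parameter values; `indicator` and `fnc2plm` are hypothetical names, and ties are resolved by taking the first minimum, which is one of the arbitrary choices the text permits.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def indicator(s):
    """I(s): 1 at the (first) minimum of s and 0 elsewhere."""
    idx = s.index(min(s))
    return [1 if t == idx else 0 for t in range(len(s))]

def fnc2plm(theta, a, d):
    """Fully noncompensatory 2PL (Equation 6)."""
    s = [a_t * th_t + d_t for a_t, th_t, d_t in zip(a, theta, d)]
    # I(s)'s picks out the single smallest contribution a_t * theta_t + d_t.
    z = sum(i_t * s_t for i_t, s_t in zip(indicator(s), s))
    return logistic(z)

# Illustrative parameters (assumed, not taken from the article).
a, d = [1.0, 1.0], [0.0, 0.0]

p_limited = fnc2plm([-2.0, 0.0], a, d)
p_limited_boost = fnc2plm([-2.0, 4.0], a, d)  # raising trait 2 changes nothing
```

Because only the minimum element of s enters the exponent, the probability is flat in every trait other than the limiting one.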

To facilitate interpretation, the general behavior of the parameters in Equation 6 is presented in the bottom row of Figure 1 for a hypothetical two-dimensional IRT model. As can be seen, modification of the at parameters has the effect of controlling the rate of change (i.e., slope) in the expected probability surfaces, where larger values of at reflect higher levels of discrimination within the respective trait. As well, the dt parameters dictate the general ease of positive endorsement within each respective trait. Most important, however, is that the resulting probability response functions become equal to a constant probability value based on the minimum in s given θ = [θ1, θ2].

Equation 6 reflects the three desiderata for noncompensatory IRT models listed above; however, it notably differs from Equation 2’s product formulation in that the resulting probability space is sharply bounded by the trait with the lowest contribution. In other words, if the associated skill or trait is not sufficiently high along the underlying continuum, given the associated slope and intercept scaling parameters, then the response probability will remain constant regardless of the adjacent θ locations. This effect is seen in the bottom row of Figure 1. Notice that the slope with respect to each dimension reflects the associated unidimensional at value at all possible locations up to the location where the expected probability becomes a constant. Therefore, unlike PNCMs, Equation 6 provides an unconditional interpretation of the slope and intercept parameters up to the point of the probabilistic limit, where below this limit the properties of these functions are identical to the unidimensional model from which they were constructed.

Finally, with respect to polytomous response data, and much like the transition from the C2PLM to the CGRM and the transition from the PNC2PLM to the PNCGRM, the fully noncompensatory two-parameter logistic model (FNC2PLM) in Equation 6 will readily generalize to ordered response data. Given the definition in Equation 6 for dichotomous response data, first apply this probability model to each of the k = [0, 1, …, K − 1] categorical terms, where, like the PNCGRM, the sets [a1, a2, …, aT] and [dk1, dk2, …, dkT] are required to compute the category probabilities. After computing these probability functions, the associated expected values are substituted into the difference form found in Equation 4 to obtain the fully noncompensatory graded response model (FNCGRM). As such, both the PNCGRM and FNCGRM require a total of T × K parameters to be estimated, subject to the constraint that the respective intercepts within each tth dimension are strictly ordered.
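A minimal sketch of the FNCGRM follows the same recipe as the other graded models: compute a cumulative probability at each boundary, here using the minimum contribution, and then difference adjacent boundaries. The function name and parameter values are assumptions for illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fncgrm_probs(theta, a, d):
    """FNCGRM category probabilities.

    d[k][t] is the intercept for cumulative boundary k + 1 on trait t; within
    each trait the intercepts must be strictly ordered from highest to lowest.
    """
    cum = [1.0]
    for d_k in d:
        s = [a_t * th_t + d_kt for a_t, th_t, d_kt in zip(a, theta, d_k)]
        cum.append(logistic(min(s)))   # Equation 6 applied at boundary k
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(d) + 1)]

# Illustrative item with T = 2 traits and K = 5 categories (values assumed).
a = [1.0, 1.0]
d = [[3.0, 3.0], [1.5, 1.5], [0.0, 0.0], [-1.5, -1.5]]

probs_limited = fncgrm_probs([-2.0, 0.0], a, d)
probs_limited_boost = fncgrm_probs([-2.0, 5.0], a, d)
```

With these assumed values the first trait is the limiting one at every boundary, so the whole category distribution is unchanged when the second trait increases.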

Relationship Between PNCMs and FNCMs

Compared to the product model’s approach for modeling a conjunctive process, Equation 6 represents a somewhat different set of modeling assumptions. Generally speaking, PNCMs assume that the contributions from the independent unidimensional cumulative response functions are nonconstant across all levels of θt, in that the properties of the conditional response models jointly depend upon the levels of the auxiliary θ components. Using a T=2 and K=2 example for PNCMs, at all levels of θ1 and θ2 the following joint probability relationship will hold:

$$P(y = 1 \mid \theta_1, \theta_2) = P_1(y = 1 \mid \theta_1)\,P_2(y = 1 \mid \theta_2), \tag{7}$$

where P1(·) and P2(·) represent the associated unidimensional probability functions for the respective latent traits. The consequence of assuming this joint relationship is that items should always be considered conditionally less discriminating (and more difficult) than the unidimensional models from which they were formed. Evidently, the resulting probability function on the left of Equation 7 consists of a heterogeneous combination of probability models, demonstrating the property P(y=1|θ1,θ2)<P1(y=1|θ1) and P(y=1|θ1,θ2)<P2(y=1|θ2) when all unidimensional conditional probabilities are less than unity.

The interpretation of the FNCMs, on the other hand, is that an item ought to be modeled as though a unidimensional cumulative response model is constant across the conditional θt range (i.e., represents a unidimensional IRT model) unless an auxiliary response process affects the probability of endorsement. This functional form has a discrete interpretation of a conjunctive response process in that the functions are limited based on the probability of whether auxiliary latent trait components have been mastered, and therefore reflect a conditional rather than joint probability relationship. Using the same T=2 and K=2 example, at all levels of θ1 and θ2 the following probability relationship will hold for the FNCM family:

$$P(y = 1 \mid \theta_1, \theta_2) = P_1(y = 1 \mid \theta_1)^{I_1(\mathbf{s})}\,P_2(y = 1 \mid \theta_2)^{I_2(\mathbf{s})}, \tag{8}$$

where It(s) = 1 for the respective trait that demonstrates the limiting contribution (otherwise, It(s) = 0). Depending on the limiting probability term selected, either P(y=1|θ1,θ2) = P1(y=1|θ1) or P(y=1|θ1,θ2) = P2(y=1|θ2) will be true given [θ1, θ2]. Based on these expressions, it is apparent that FNCMs imply a more constrained structure than the PNCMs, where the constraint is created by the It(s) selection scheme. Note that if the indicator function values are all set to 1, implying that the response probability should be limited by the probabilistic lack-of-mastery for both traits, then Equation 8 will equal Equation 7; otherwise, only one probability function will determine P(y=1|θ1,θ2) and therefore will behave identically to a unidimensional IRT model.²
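The relationship between Equations 7 and 8 can be verified on a grid of trait values. The sketch below uses assumed unit slopes and zero intercepts, and the helper name `pnc_and_fnc` is hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def pnc_and_fnc(theta1, theta2, a=(1.0, 1.0), d=(0.0, 0.0)):
    """Return (Equation 7, Equation 8, P1, P2) for a T = 2, K = 2 item."""
    s1 = a[0] * theta1 + d[0]
    s2 = a[1] * theta2 + d[1]
    p1, p2 = logistic(s1), logistic(s2)
    i1 = 1 if s1 <= s2 else 0          # ties go to trait 1 (an arbitrary choice)
    i2 = 1 - i1
    pnc = p1 * p2                      # Equation 7: both factors always active
    fnc = (p1 ** i1) * (p2 ** i2)      # Equation 8: only the limiting factor active
    return pnc, fnc, p1, p2

grid = [x / 2.0 for x in range(-8, 9)]   # theta values from -4 to 4
results = [pnc_and_fnc(t1, t2) for t1 in grid for t2 in grid]
```

At every grid point the FNCM probability equals the smaller of the two unidimensional probabilities, and the PNCM probability lies strictly below it.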

Based on the above arguments, it is clear that the key difference between PNCMs and the FNCMs is the rate at which the conditional probability functions approach their limiting probability constants, where FNCMs assume that the respondents approach the limiting probability constant more quickly (i.e., at a rate equal to the respective unidimensional IRT models) than the related PNCM family. As such, FNCMs can be viewed as a homogeneous subset of PNCMs and should in theory witness lower degrees of sampling variability due to their more constrained nature. The Monte Carlo simulations presented in the following sections demonstrate this property.

Model Identification and Estimation

In their current form, PNCMs and FNCMs require additional identification constraints before they can be estimated from samples of response data. As Babcock (2011) and Chalmers and Flora (2014) note, PNCMs should include a set of design constraints for the slope parameters to (a) properly identify the model and (b) provide sufficient stability so that the respective θt elements do not switch axes during estimation. These slope constraints are often presented in the form of a so-called Q-matrix, which dictates where the fixed and estimated slope parameters should be defined. These identification constraints remain true for the FNCMs as well; hence, these recommendations should be adopted when fitting these respective IRT models.

How the θ parameters should be estimated is also important for adequate model identification. By convention, the assumption that θ is distributed multivariate normal with a mean vector 0 and variance–covariance matrix Σ with 1’s on the diagonal provides a sufficient scale for the latent trait variables (Reckase, 2009) and therefore identifies each compensatory and noncompensatory model described herein. If desired, the off-diagonal elements in Σ may also be freely estimated to determine the bivariate correlation between the respective θt terms. Assuming the distribution form of θ a priori also allows for marginal maximum likelihood (MML) frameworks to be adopted to obtain sample estimates of the respective model parameters (Bock & Aitkin, 1981); hence, all of the above noncompensatory models naturally fit within Bock and Aitkin’s well-known Expectation–Maximization (EM) estimation paradigm.

In the following sections, the EM strategy for the MML criterion is adopted. As well, it is known that the EM algorithm is effective when estimating IRT models as long as the number of dimensions T is reasonably small; say, T ≤ 4. Therefore, to avoid issues related to numerical integration borne within high-dimensional IRT models (Reckase, 2009), the following simulation studies are limited to T = 2 latent traits. Finally, for an alternative and numerically stable parameterization of the PNCGRM and FNCGRM useful for obtaining parameter estimates with unconstrained optimization algorithms, refer to Appendix A. For an empirical application of these models to dichotomous response data, see the associated Supplemental Appendix.
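To make the role of numerical integration concrete, the sketch below evaluates the marginal probability of a response pattern for T = 2 by summing over a rectangular grid weighted by a bivariate normal density. It is a simplified illustration, not the quadrature scheme used by mirt: the grid size, the integration limits, the item parameters, and all function names are assumptions.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fnc2plm(theta, a, d):
    return logistic(min(a_t * th_t + d_t for a_t, th_t, d_t in zip(a, theta, d)))

def bvn_grid(r, n=31, lim=6.0):
    """Equally spaced nodes on [-lim, lim]^2 with normalized bivariate normal
    weights (zero means, unit variances, correlation r)."""
    nodes = [-lim + 2.0 * lim * i / (n - 1) for i in range(n)]
    grid, weights = [], []
    for x in nodes:
        for y in nodes:
            q = (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)
            grid.append((x, y))
            weights.append(math.exp(-0.5 * q))
    total = sum(weights)
    return grid, [w / total for w in weights]

def marginal_prob(pattern, items, grid, weights):
    """Approximate the integral of prod_j P_j^y (1 - P_j)^(1 - y) over theta."""
    out = 0.0
    for theta, w in zip(grid, weights):
        like = 1.0
        for y, (a, d) in zip(pattern, items):
            p = fnc2plm(theta, a, d)
            like *= p if y == 1 else 1.0 - p
        out += like * w
    return out

# Two illustrative FNC2PLM items (parameter values assumed).
items = [([1.0, 1.2], [0.5, 1.0]), ([0.8, 1.5], [1.5, 0.0])]
grid, weights = bvn_grid(r=0.3)
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
marginals = [marginal_prob(p, items, grid, weights) for p in patterns]
```

The number of nodes grows as n to the power T, which is why this kind of integration is practical only when T is small.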

Monte Carlo Simulations

This section provides a set of Monte Carlo simulations to investigate the described partially and fully noncompensatory response models for datasets containing dichotomous or polytomous responses. To facilitate interpretation, many characteristics are held constant across the simulations so that the results are more easily comparable. All studies were controlled using the SimDesign package (Sigal & Chalmers, 2016), and models were estimated using the EM algorithm engine provided by the mirt package (Chalmers, 2012), Version 1.30, using the default estimation arguments for maximum likelihood estimation (MLE). Each simulation condition was studied using 300 independent replications. Where appropriate, bias and root mean-square error (RMSE) estimates were marginalized across suitable item sets for ease of presentation. Finally, to conserve space, complete tables of the simulation results are either available in the associated Supplemental Appendix or are available from the author upon request.
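For readers less familiar with the two summary statistics, the sketch below computes bias and RMSE across replications using their standard definitions. The function names and the toy estimates are assumptions; this is not the SimDesign code used in the study.

```python
import math

def bias(estimates, true_value):
    """Average signed error of the estimates across replications."""
    return sum(estimates) / len(estimates) - true_value

def rmse(estimates, true_value):
    """Root mean-square error of the estimates across replications."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse)

# Toy example: four replications estimating a slope whose true value is 1.0.
estimates = [0.9, 1.1, 1.2, 1.0]
b = bias(estimates, 1.0)
r = rmse(estimates, 1.0)
```

Bias captures systematic over- or underestimation, while RMSE combines that systematic error with sampling variability.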

Dichotomous IRT Models

The first simulation study investigated test items with only two possible response options. To allow for suitable model identification using an associated Q-matrix, two sets of unidimensional models were generated from the more general C2PLM by constraining one of the slope parameters to 0 for each item. These unidimensional sets had either five or ten items each, where the slopes were drawn from a log-normal distribution, a ∼ logN(0.2, 0.3), and intercepts from a normal distribution, d ∼ N(0, 1). For the remaining items, a set of three, six, or nine noncompensatory models was selected as either the PNC2PLM or FNC2PLM, where slopes were drawn from a log-normal distribution, a ∼ logN(0.2, 0.3), and intercepts from a normal distribution, d ∼ N(1.5, 1). For example, in the design with five unidimensional items per dimension and three noncompensatory items, the Q-matrix is a 13-item by two-dimensional matrix (5 × 2 + 3 = 13 rows) of the form

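$$\mathbf{Q} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}'.$$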

The longest test in the simulation consisted of J = 29 items, while the shortest test consisted of J = 13 items, reflecting the above Q-matrix. Finally, N = 1000 or N = 2000 response vectors were generated in each test after obtaining suitable θ vectors drawn from a standard bivariate normal distribution with a correlation of r = 0, .3, or .6.
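The design structure can be sketched as follows: a helper that builds the Q-matrix for a given number of unidimensional and noncompensatory items, and a helper that draws correlated trait vectors. Both function names are hypothetical, the seed is arbitrary, and the correlated draws use a standard two-variable Cholesky construction rather than anything specific to the original study.

```python
import math
import random

def build_q(n_uni, n_noncomp):
    """Q-matrix rows: n_uni items on trait 1, n_uni items on trait 2,
    then n_noncomp items loading on both traits."""
    rows = [[1, 0]] * n_uni + [[0, 1]] * n_uni + [[1, 1]] * n_noncomp
    return [list(row) for row in rows]

def draw_theta(r, rng):
    """One draw from a standard bivariate normal with correlation r."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [z1, r * z1 + math.sqrt(1.0 - r * r) * z2]

q_small = build_q(5, 3)    # shortest test in the design
q_large = build_q(10, 9)   # longest test in the design

rng = random.Random(2020)  # arbitrary seed, for reproducibility only
thetas = [draw_theta(0.6, rng) for _ in range(1000)]
```

Reading the columns of `q_small` from top to bottom gives the patterns 1111100000111 and 0000011111111, one per trait.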

There were a total of 72 unique simulation conditions investigated in this study (2 unidimensional item conditions × 3 noncompensatory item conditions × 2 noncompensatory types × 2 sample sizes × 3 correlations). For each condition, the correct Q-matrix was applied to the generated data to avoid misspecification of the slope parameters. However, with respect to the noncompensatory IRT items, both the PNC2PLM and FNC2PLM were independently fitted to the same data, creating a correct–incorrect specification pairing. This was included to determine whether the noncompensatory models could be approximately distinguished when fitted to the same datasets according to the log-likelihood criterion, and to study the implications of this misspecification when investigating subsequent θ^ estimates. Note that drawing comparisons using alternative information criteria, such as Akaike’s Information Criterion (AIC), is not required because PNC2PLMs and FNC2PLMs have the same number of estimated parameters.
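The fully crossed design is easy to enumerate, which also confirms the test lengths quoted earlier. This is a plain-Python sketch; the original study managed its design with SimDesign, and the variable names here are assumptions.

```python
from itertools import product

n_uni_levels = [5, 10]            # unidimensional items per dimension
n_noncomp_levels = [3, 6, 9]      # noncompensatory items
generating_models = ["PNC2PLM", "FNC2PLM"]
sample_sizes = [1000, 2000]
correlations = [0.0, 0.3, 0.6]

conditions = list(product(n_uni_levels, n_noncomp_levels,
                          generating_models, sample_sizes, correlations))

# Test length J = two unidimensional sets plus the noncompensatory items.
test_lengths = sorted({2 * n_uni + n_nc for n_uni, n_nc, *_ in conditions})
```

Crossing the five factors gives 2 × 3 × 2 × 2 × 3 = 72 conditions, with test lengths ranging from 13 to 29 items.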

Results

Based on the log-likelihood values returned at the MLEs,³ when the data were generated and fitted with the PNC2PLM these models provided a higher log-likelihood in 84.5% of the datasets when N = 1000 and 93.7% when N = 2000 than when fitted with the FNC2PLM. When data were generated and fitted with the FNC2PLM this model provided a higher log-likelihood in 88.6% of the datasets when N = 1000, and 94.1% when N = 2000 than when fitted with the PNC2PLM. These marginalized percentages were affected by several simulation design properties, such as the overall number of items and latent trait correlations. In general, increasing the correlation r resulted in a lower probability of selecting the correct model, while increasing the number of items (regardless of whether these were compensatory or noncompensatory) increased the chance that the correct model demonstrated the highest log-likelihood.

With respect to the parameter recovery results, increasing the sample size resulted in more optimal bias and RMSE estimates in all conditions, regardless of the population generating model, while varying the value of r slightly improved the bias and RMSE estimates for the compensatory parameters in the C2PLMs (as well as the recovery of r itself). Increasing r tended to increase both the bias and RMSE for the a and d parameters in the PNC2PLM and FNC2PLM. To help with interpretation, Table 1 contains RMSE estimates after marginalizing over the sample size and r conditions. Table 1 highlights that the unidimensional item parameters were generally recovered with the same precision in both simulation designs, while the noncompensatory parameters were typically obtained with better accuracy in the FNC2PLM simulation than the similarly generated PNC2PLM simulations. Table 1 also demonstrates the trend that increasing the number of unidimensional items generally improves the parameter recovery for each IRT model, while increasing the number of noncompensatory items tends to result in better recovery for the unidimensional items but worse recovery for generating noncompensatory models.

Table 1.

RMSE Estimates When Dichotomous IRT Models Were Generated and Fitted From Their Corresponding PNC2PLM or FNC2PLM Population Generating Model.

                                       Unidimensional items    Noncompensatory items
Generating model   n_uni   n_noncomp       a         d             a         d
PNC2PLM              5         3         0.137     0.085         0.536     0.724
                               6         0.130     0.084         0.541     0.761
                               9         0.124     0.083         0.544     0.778
                    10         3         0.116     0.083         0.415     0.577
                               6         0.113     0.083         0.444     0.628
                               9         0.110     0.082         0.446     0.659
FNC2PLM              5         3         0.138     0.087         0.457     0.566
                               6         0.128     0.084         0.485     0.653
                               9         0.126     0.085         0.483     0.675
                    10         3         0.115     0.083         0.381     0.473
                               6         0.113     0.083         0.392     0.522
                               9         0.111     0.083         0.400     0.558

Note. nuni reflects the number of unidimensional items per dimension, and nnoncomp reflects the number of (partially/fully) noncompensatory items generated. RMSE = root mean-square error; IRT = item response theory; PNC2PLM = partially noncompensatory two-parameter logistic model; FNC2PLM = fully noncompensatory two-parameter logistic model.

Finally, recovery of the correlation parameters (r) was on average unbiased when the correct model was fitted to the population generating model (average bias was .003 and .001 for the FNC2PLM and PNC2PLM, respectively). However, when the models were fitted to the incorrect generating model, the absolute bias and RMSE tended to increase slightly. When fitting the FNC2PLM to data generated with the PNC2PLM, the bias increased to .035, while the average RMSE increased from .029 to .043. In the converse misspecification condition, the average bias of the correlation parameter was .028, while RMSE increased from .029 to .040.

Polytomous IRT Models

The second simulation reflected nearly the same properties as the first simulation study. However, rather than studying dichotomous items, all items were generated from the CGRMs, PNCGRMs, and FNCGRMs. Items were organized to have K = 5 response categories, and only three or five items were included due to the larger amount of statistical information provided by these items in comparison to dichotomous items (Samejima, 1969). The slope parameters were drawn from a log-normal distribution, a ~ logN(0.2, 0.2), while the ordered intercept parameters for the unidimensional items were created by adding a standard normal deviate, m* ~ N(0, 1), to the vector [1.5, 0.5, −0.5, −1.5]. For the noncompensatory IRT models, the ordered intercept parameters within each tth dimension were constructed by adding a standard normal deviate, m* ~ N(0, 1), to the vector [3, 1.5, 0, −1.5]. For each condition, the correct Q-matrix was again fitted to the generated data, and both sets of noncompensatory IRT models (PNCGRM and FNCGRM) were fitted to the same data to create correct–incorrect pairings.
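The item-generation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name `draw_graded_item` is hypothetical, the two log-normal arguments are treated as the mean and standard deviation on the log scale, m* is read as a single scalar shift applied to the whole base vector, and the base vectors are taken to be descending (consistent with the ordering d1 > d2 > ... required in Appendix A).

```python
import random

def draw_graded_item(base, rng, mu=0.2, sigma=0.2):
    """Draw one slope and one set of ordered intercepts for a graded item."""
    # Slope: log-normal draw. Reading (0.2, 0.2) as the mean and SD on the
    # log scale is an assumption, not something the text states.
    a = rng.lognormvariate(mu, sigma)
    # One standard normal deviate shifts the entire base vector, so the
    # descending order of the intercepts is preserved.
    shift = rng.gauss(0.0, 1.0)
    d = [b + shift for b in base]
    return a, d

# Base intercept vectors for K = 5 categories (assumed to be descending).
UNI_BASE = [1.5, 0.5, -0.5, -1.5]   # unidimensional items
NC_BASE = [3.0, 1.5, 0.0, -1.5]     # per-dimension base, noncompensatory items

rng = random.Random(2020)            # arbitrary seed for reproducibility
a_uni, d_uni = draw_graded_item(UNI_BASE, rng)
a_nc, d_nc = draw_graded_item(NC_BASE, rng)
```

For a noncompensatory item, the same draw would be repeated once per dimension, since each dimension carries its own slope and intercept set.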

Results

The results of this simulation were very similar to the results presented above for the dichotomous simulation study. Specifically, the unidimensional parameters tended to be recovered with similar accuracy (see Table 2), the noncompensatory IRT parameters were notably more difficult to recover than the compensatory counterparts, and the FNCGRM’s parameters were obtained more efficiently (i.e., with less sampling variability) than the PNCGRM’s parameters, particularly for the intercept parameters. Similarly, based on the log-likelihood values returned at the MLEs, when the data were generated and fitted with the PNCGRM this model provided a higher log-likelihood in 97.7% of the datasets when N=1000 and 99.6% when N=2000 than when fitted with the FNCGRM. When data were generated and fitted with the FNCGRM this model provided a higher log-likelihood in 95.0% of the datasets when N=1000 and 98.4% when N=2000 than when fitted with the PNCGRM.

Table 2.

RMSE Estimates When Polytomous IRT Models Were Generated and Fitted From Their Corresponding PNCGRM or FNCGRM Population Generating Model.

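| Generating model | nuni | nnoncomp | Uni. a | Uni. d1 | Uni. d2 | Uni. d3 | Uni. d4 | Noncomp. a | Noncomp. d1 | Noncomp. d2 | Noncomp. d3 | Noncomp. d4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PNCGRM | 3 | 3 | 0.109 | 0.097 | 0.082 | 0.081 | 0.096 | 0.382 | 0.684 | 0.549 | 0.539 | 0.754 |
| PNCGRM | 3 | 6 | 0.096 | 0.093 | 0.079 | 0.080 | 0.093 | 0.370 | 0.708 | 0.563 | 0.555 | 0.786 |
| PNCGRM | 3 | 9 | 0.092 | 0.091 | 0.076 | 0.078 | 0.092 | 0.339 | 0.670 | 0.542 | 0.538 | 0.773 |
| PNCGRM | 5 | 3 | 0.093 | 0.093 | 0.079 | 0.079 | 0.093 | 0.305 | 0.546 | 0.438 | 0.439 | 0.664 |
| PNCGRM | 5 | 6 | 0.089 | 0.092 | 0.080 | 0.079 | 0.093 | 0.313 | 0.581 | 0.472 | 0.472 | 0.703 |
| PNCGRM | 5 | 9 | 0.085 | 0.092 | 0.078 | 0.079 | 0.092 | 0.300 | 0.581 | 0.477 | 0.476 | 0.709 |
| FNCGRM | 3 | 3 | 0.110 | 0.096 | 0.081 | 0.080 | 0.097 | 0.374 | 0.591 | 0.467 | 0.389 | 0.407 |
| FNCGRM | 3 | 6 | 0.098 | 0.093 | 0.079 | 0.080 | 0.096 | 0.363 | 0.640 | 0.508 | 0.428 | 0.448 |
| FNCGRM | 3 | 9 | 0.093 | 0.092 | 0.078 | 0.077 | 0.091 | 0.356 | 0.640 | 0.519 | 0.448 | 0.464 |
| FNCGRM | 5 | 3 | 0.094 | 0.093 | 0.079 | 0.079 | 0.093 | 0.315 | 0.496 | 0.387 | 0.328 | 0.354 |
| FNCGRM | 5 | 6 | 0.090 | 0.092 | 0.080 | 0.080 | 0.093 | 0.310 | 0.542 | 0.430 | 0.366 | 0.386 |
| FNCGRM | 5 | 9 | 0.087 | 0.093 | 0.080 | 0.079 | 0.091 | 0.313 | 0.562 | 0.456 | 0.389 | 0.399 |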

Note. nuni reflects the number of unidimensional items per dimension, and nnoncomp reflects the number of (partially/fully) noncompensatory items generated. RMSE = root mean-square error; IRT = item response theory; PNC2PLM = partially noncompensatory two-parameter logistic model; FNC2PLM = fully noncompensatory two-parameter logistic model.

Recovery of the correlation parameters r was on average unbiased when the correct model was fitted to the generating model (average bias was .003 for both the FNCGRM and PNCGRM). As with the previous simulation study, when the models were fitted to the incorrect generating model, the absolute bias and RMSE tended to increase. When fitting the FNCGRM to data generated with the PNCGRM, the bias increased to .063, while the average RMSE increased from .029 to .066. In the converse misspecification condition, the average bias of the correlation parameter was .030, while RMSE increased from .031 to .043.

Recovery of the θ Values

For the above dichotomous and polytomous simulation studies, additional sets of response patterns were drawn to evaluate how well individual θ sets could be recovered by computing maximum a posteriori (MAP) predictions (θ̂MAP) under each simulation condition after the IRT parameter estimates were obtained via MLE. To limit the scope of the θ recovery, only unique pairs consisting of the values θ1 = θ2 = [−2, −1, 0, 1, 2] were investigated, resulting in a total of 25 unique [θ1, θ2] sets (i.e., [−2,−2], [−2,−1], …, [1,2], [2,2]). As before, both the PNCMs and FNCMs were fitted to the same data, creating a correct–incorrect specification pairing, which in this section is particularly useful for determining the accuracy of recovering θ under model misspecification. To help facilitate interpretation, bias and RMSE results are presented relative to the FNCM using the formula

$$\mathrm{RAB} = \frac{|\mathrm{bias}_{\mathrm{PNCM}}|}{|\mathrm{bias}_{\mathrm{FNCM}}|},$$

representing the relative absolute bias, and

$$\mathrm{RE} = \left(\frac{\mathrm{RMSE}_{\mathrm{PNCM}}}{\mathrm{RMSE}_{\mathrm{FNCM}}}\right)^{2},$$

representing the relative efficiency ratio. For both summary statistics, a ratio greater than 1 indicates that the FNCM family provides either less bias (RAB) or smaller sampling variability (RE) than the PNCM family, while values less than 1 indicate the converse.
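As a minimal sketch of how these two ratios could be computed from replicated MAP estimates at a single true θ value, consider the following. The function names and the numeric values are illustrative assumptions only and are not taken from the study.

```python
import math

def bias(estimates, truth):
    """Average signed error of the estimates around the true value."""
    return sum(e - truth for e in estimates) / len(estimates)

def rmse(estimates, truth):
    """Root mean-square error of the estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def relative_absolute_bias(est_pncm, est_fncm, truth):
    """RAB = |bias_PNCM| / |bias_FNCM|; > 1 favors the FNCM family."""
    return abs(bias(est_pncm, truth)) / abs(bias(est_fncm, truth))

def relative_efficiency(est_pncm, est_fncm, truth):
    """RE = (RMSE_PNCM / RMSE_FNCM)^2; > 1 favors the FNCM family."""
    return (rmse(est_pncm, truth) / rmse(est_fncm, truth)) ** 2

# Made-up MAP estimates across four replications for a true theta of 1.0.
theta_true = 1.0
map_pncm = [0.6, 1.1, 0.7, 1.2]
map_fncm = [0.8, 1.1, 0.9, 1.0]

rab = relative_absolute_bias(map_pncm, map_fncm, theta_true)   # about 2.0
rel_eff = relative_efficiency(map_pncm, map_fncm, theta_true)  # about 5.0
```

In this toy case both ratios exceed 1, which under the definitions above would be read as the FNCM family showing less bias and smaller sampling variability.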

Tests with more items resulted in smaller bias and RMSE estimates, and increasing the sample size had the same effect. More extreme θ sets, such as those containing either |θ1|=2 or |θ2|=2 values, were recovered with greater bias and variability than sets closer to the [0,0] pattern. More interesting in this simulation are the relative comparison results between the PNCM and FNCM when models are correctly and incorrectly fitted to the generating model. Figure 2 demonstrates the behavior of the RAB and RE summary statistics, marginalized over the 25 unique response patterns and sample size conditions for ease of presentation,4 where the left graphic depicts the RAB behavior and the right graphic the RE. Overall, the PNC2PLM and FNC2PLM provided similar bias estimates, though in general the FNC2PLM demonstrated lower marginal bias regardless of the generating model.

Figure 2.

Relative absolute bias (left set) and efficiency (right set) graphics comparing the PNC2PLM and FNC2PLM after marginalizing over 25 unique θ patterns and sample size conditions.

Note. Within each set, the top rows reflect the conditions in which the population generating model was the PNC2PLM, while the bottom rows reflect the conditions in which the generating model was the FNC2PLM.

Focusing on the RE results in the right of Figure 2, when the data were generated and fitted by the PNC2PLM the sampling variability was smaller than when the same data were fitted by the FNC2PLM. This effect becomes more apparent as the number of noncompensatory items increases from three to nine, though the effect was somewhat moderated by the number of unidimensional items included. In general, the overall precision of the θ estimates will be better when the generating and fitted model is the PNC2PLM and worse when the same generating model is incorrectly fitted with the FNC2PLM. This observed result did not occur when data were generated according to the FNC2PLM, however, reflecting that the PNC2PLM recovered the θ sets with relatively comparable precision due to its less constrained nature. Finally, because nearly identical RAB and RE behavior occurred within the PNCGRM and FNCGRM simulation study, these results are not discussed herein but are available from the author upon request.

Discussion

This article presented an extension of Sympson’s (1977) partially noncompensatory IRT model for ordered response data, and introduced a set of fully noncompensatory IRT models for dichotomous and polytomous response data. The properties of the partially and fully noncompensatory modeling approaches were contrasted, and a set of Monte Carlo simulations was presented to explore the performance of these models. Results revealed that the respective IRT models generally fit the data similarly when correctly matched with their corresponding population generating structure. However, when the correct models were fitted, the fully noncompensatory models tended to demonstrate lower sampling variability and bias than the partially noncompensatory counterparts in both dichotomous and polytomous tests. The lower sampling variability is also reflected in the standard error estimates in the empirical analysis example located in the Supplemental Appendix. With respect to the recovery of the latent trait parameters, however, the partially noncompensatory models demonstrated more flexible recovery of the underlying trait values than the fully noncompensatory models under misspecification conditions due to their less constrained nature.

The partially and fully noncompensatory models presented herein are intended to limit the probability of positive endorsement to items that reflect conjunctive response processes. From a theoretical perspective, the proposed fully noncompensatory models have some potentially attractive properties compared to the competing partially noncompensatory models. Specifically, the fully noncompensatory models provide a natural interpretation of the intercept and slope parameters (and associated Fisher information for the θ parameters) within a given item at conditional latent trait locations below the limiting probability constants. As well, because the fully noncompensatory models can be understood as a structured subset of the partially noncompensatory models, they will theoretically demonstrate lower sampling variability than the commensurate partially noncompensatory counterparts. Finally, unlike the product family of models, the strategy of applying a Hadamard product to limit the conditional monotonicity of the probability space is a concept that can readily be applied to IRT models not studied in this article, because the probability limiting components can be nested within the respective exponentiated computations directly.

Future research should contrast the fully noncompensatory model formulations with response models intended to reflect a similar conjunctive process in other measurement domains. For example, discrete response models found within the diagnostic classification model (DCM) framework (Junker & Sijtsma, 2001), such as the deterministic inputs, noisy “and” gate (DINA) model, as well as the noisy inputs, deterministic “and” gate (NIDA) model, follow a relatively similar conjunctive response process. These differ from the models studied herein in that they are intended for binary attributes in cognitive assessment tasks, typically with dichotomous response data, and follow the product-model specification to create the conjunctive response behavior. Hence, investigating the proposed fully noncompensatory approach within DCMs may breed novel and useful response models capable of capturing similar conjunctive response phenomena. Such a theory would be an important contribution to establish the potentially intimate connection between DCMs and this new family of noncompensatory IRT models.

Supplemental Material

OLA – Supplemental material for Partially and Fully Noncompensatory Response Models for Dichotomous and Polytomous Items

Supplemental material, OLA for Partially and Fully Noncompensatory Response Models for Dichotomous and Polytomous Items by R. Philip Chalmers in Applied Psychological Measurement

Acknowledgments

Special thanks to Daniel Bolt, two anonymous reviewers, the editor, and the associate editor for providing constructive comments on this manuscript.

Appendix A

Unconstrained Optimization of Ordered Intercepts

When estimating the CGRM, PNCGRM, and FNCGRM, the corresponding vectors of intercepts are assumed to be ordered within each corresponding set, whereby d1 > d2 > ⋯ > dK−1. Although these types of constraints are well documented in the literature, the topic of how these ordering constraints should be satisfied when implementing unconstrained estimation algorithms (e.g., steepest descent, Newton–Raphson, quasi-Newton) appears nonexistent, even within detailed mathematical texts specifically dedicated to estimating IRT models (e.g., Baker & Kim, 2004). As such, software developers have typically adopted ad hoc strategies to address this ordering constraint during the numerical search. For example, the numerical optimizer could terminate early when this constraint is no longer satisfied, thereby failing to locate the MLE, or the estimated parameters could be perturbed until the ordering constraint is no longer violated. However, one effective strategy to ensure that the ordered intercepts property holds during the iterative search can be achieved through re-parameterizations and transformations.

Let dj represent the ordered intercept parameters for the jth item, and ψj represent the unconstrained vector to be optimized. The set ψj is constructed in an intercept-deviation form, where the first element is equal to the first element in dj (the intercept) while the remaining terms represent the cumulative deviation from ψ1. Given a transformation matrix G with dimensions (K−1)×(K−1), as well as a function f(·) to apply the transformation exp(·) to all but the first element in ψj, the transformation

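$$
\mathbf{d}_j = \mathbf{G} f(\boldsymbol{\psi}_j) =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & -1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & -1 & -1 & \cdots & -1
\end{bmatrix}
\begin{bmatrix}
\psi_1 \\ \exp(\psi_2) \\ \vdots \\ \exp(\psi_{K-1})
\end{bmatrix}
\tag{9}
$$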

provides the correct ordering of dj. For the CGRM, Equation 9 must be applied to one set of ψj per item. For the PNCGRM and FNCGRM, however, the transformation must be independently applied to the T distinct sets of intercepts that correspond to each respective dimension.
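The re-parameterization can be illustrated with a short Python sketch. It assumes that every entry of G outside the first column is −1 wherever it is nonzero, so that each positive exp(·) term is subtracted and the resulting intercepts are strictly decreasing (d1 > d2 > ⋯ > dK−1); the function names `psi_to_d` and `d_to_psi` are hypothetical and are not taken from any particular software package.

```python
import math

def psi_to_d(psi):
    """Map an unconstrained vector psi to strictly decreasing intercepts d.

    Equivalent to d = G f(psi) when G has ones in its first column and -1
    in the remaining lower-triangular entries (an assumption, see lead-in).
    """
    d = [psi[0]]
    for p in psi[1:]:
        # exp(p) > 0, so each intercept falls strictly below the previous one.
        d.append(d[-1] - math.exp(p))
    return d

def d_to_psi(d):
    """Inverse map; requires d to be strictly decreasing."""
    psi = [d[0]]
    for prev, cur in zip(d, d[1:]):
        psi.append(math.log(prev - cur))
    return psi

# Any real-valued psi yields a valid ordered intercept vector.
psi = [1.5, 0.0, -2.0, 3.0]
d = psi_to_d(psi)
psi_back = d_to_psi(d)
```

Because every real-valued ψ maps to an admissible d, an unconstrained optimizer can update ψ freely without ever violating the ordering. For the PNCGRM and FNCGRM the same map would be applied separately to each of the T intercept sets.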

1. Note that the intercept term in Equation 2 is added rather than subtracted, as is typically found in Rasch models.

2. Note that with T>2, it is possible to create a hybridization between these two expressions. However, these mixed homogeneous–heterogeneous pairings are not considered in this article.

3. Applying Vuong’s (1989) framework for comparing non-nested models is an alternative approach to investigate non-nested models. However, research in this area is currently in its infancy (e.g., Schneider et al., 2019) and therefore was not investigated in this study.

4. Complete simulation results in the form of SimDesign objects (Chalmers, 2018) are available upon request.

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD: R. Philip Chalmers https://orcid.org/0000-0001-5332-2810

Supplemental Material: Supplemental material for this article is available online.

References

  1. Ackerman T. A. (1989). Unidimensional IRT calibration of compensatory and noncompensatory multidimensional items. Applied Psychological Measurement, 13, 113–127. https://doi.org/10.1177/014662168901300201
  2. Babcock B. (2011). Estimating a noncompensatory IRT model using Metropolis within Gibbs sampling. Applied Psychological Measurement, 35(4), 317–329. https://doi.org/10.1177/0146621610392366
  3. Baker F. B., Kim S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). Dekker.
  4. Bock R. D., Aitkin M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
  5. Bolt D. M., Lall V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov Chain Monte Carlo. Applied Psychological Measurement, 27(6), 395–414. https://doi.org/10.1177/0146621603258350
  6. Cai L. (2015). flexMIRT: A numerical engine for multilevel item factor analysis and test scoring [Computer software manual] (Version 3.0). Chapel Hill, NC: Vector Psychometric Group.
  7. Chalmers R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
  8. Chalmers R. P. (2018). SimDesign: Structure for organizing Monte Carlo simulation designs [Computer software manual] (R package version 1.13). https://CRAN.R-project.org/package=SimDesign
  9. Chalmers R. P., Flora D. B. (2014). Maximum-likelihood estimation of noncompensatory IRT models with the MH-RM algorithm. Applied Psychological Measurement, 38(5), 339–358. https://doi.org/10.1177/0146621614520958
  10. Junker B. W., Sijtsma K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.
  11. Lord F. M., Novick M. R. (1968). Statistical theory of mental test scores. Addison-Wesley.
  12. Maris E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187–212.
  13. Muraki E., Carlson E. B. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73–90.
  14. Reckase M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In van der Linden W. J., Hambleton R. K. (Eds.), Handbook of modern item response theory (pp. 271–286). Springer-Verlag.
  15. Reckase M. D. (2009). Multidimensional item response theory. Springer-Verlag.
  16. Samejima F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, 34, 1–97.
  17. Schneider L., Chalmers R. P., Debelak R., Merkle E. (2019). Model selection of non-nested and nested item response models using Vuong tests. Multivariate Behavioral Research. Advance online publication. https://doi.org/10.1080/00273171.2019.1664280
  18. Sigal M. J., Chalmers R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136–156. https://doi.org/10.1080/10691898.2016.1246953
  19. Sympson J. B. (1977). A model for testing with multidimensional items. In Weiss D. J. (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference (pp. 82–98). University of Minnesota.
  20. Vuong Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333.
  21. Whitely S. E. (1980). Multicomponent latent trait models for ability tests. Psychometrika, 60, 181–198.


Articles from Applied Psychological Measurement are provided here courtesy of SAGE Publications
