A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty

Nana Kim; Daniel M Bolt

doi:10.1177/0013164420913915

2020 Apr 27;81(1):131–154. doi: 10.1177/0013164420913915


PMCID: PMC7797955 PMID: 33456065

Abstract

This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representing the mixture of respondents following different underlying response processes (between individuals), as well as the uncertainty present at the individual level (within an individual). Simulation analyses reveal the potential of the mixture approach in identifying subgroups of respondents exhibiting response behavior reflective of different underlying response processes. Application to real data from the Students Like Learning Mathematics (SLM) scale of Trends in International Mathematics and Science Study (TIMSS) 2015 demonstrates the superior comparative fit of the mixture representation, as well as the consequences of applying the mixture on the estimation of content and response style traits. We argue that methodology applied to investigate response styles should attend to the inherent uncertainty of response style influence due to the likely influence of both response styles and the content trait on the selection of extreme response categories.

Keywords: mixture IRTree model, mixture modeling, item response theory, extreme response style, self-report rating scale

Item response tree (IRTree) models have become a popular methodological approach for the measurement of response styles (Böckenholt, 2012; Böckenholt & Meiser, 2017; Jeon & De Boeck, 2016; Khorramdel & von Davier, 2014). Central to the application of such models is the assumption of a sequential process by which respondents arrive at chosen response categories. Typical applications of IRTree models assume a single response process that applies across all respondents, an assumption that seems important to confirm. In practice, however, assuming all respondents follow the same response process may be unrealistic. The assumption of a single common response process can be easily violated particularly when modeling response styles, which by definition are content-irrelevant tendencies in the selection of response categories (Paulhus, 1991). It is conceivable that for some respondents, the selection of extreme categories may be content-irrelevant, while for other respondents, the selection of extreme categories is content-relevant.

In this article, we illustrate the application of a mixture IRTree model to self-report rating scale data to account for such a phenomenon. We show how a mixture approach can accommodate the possibility that different traits may be relevant for different respondents in explaining their response category selection. In the presence of such a mixture, we observe, as expected, a reduced precision with which response styles are estimated. As we argue in the discussion, the mixture IRTree generalization brings the IRTree methodology closer to the mixture item response theory (IRT) and multidimensional IRT approaches alternatively used to measure response styles.

The article is organized as follows. First, we review the IRTree approach to modeling response style, focusing on a simple model for extreme response style (ERS) applied to a four-category rating scale. Second, we present a competing IRTree model that explains the extreme category responses using only the content trait. In the presence of competing IRTree models, we demonstrate through simulation analyses the potential use of a mixture IRTree model to evaluate whether a single tree or multiple trees appear to be present in the data. In that context, we evaluate the success of a Bayesian estimation approach both in detecting the presence of distinguishable classes and in recovering the relevant model parameters for each class, at both the respondent (class membership) and sample (mixture proportions) levels. Third, we apply the mixture approach to a real test data set to demonstrate the practical reality of the mixture approach. We seek to demonstrate that a primary consequence of the presence of a mixture concerns its effects on the precision with which response styles are measured.

Response Styles and IRTree Models

Response styles have long been recognized as a threat to the validity of measurement. By definition, response styles refer to systematic tendencies to select response categories in ways that are unrelated to the content of item/test (Paulhus, 1991). One of the most frequently observed response styles is ERS, which refers to the tendency to overselect extreme response categories (De Jong et al., 2008; Greenleaf, 1992). ERS is desirable to measure not only because of its frequent and statistically detectable presence but also due to its potential to contaminate the estimation of the content trait. Ignoring the possibility that extreme responses can be due to ERS may yield an under- or overestimation of the content trait. Moreover, extreme response styles tend to correlate with other respondent characteristics. Specifically, many researchers have found that both individual- and country-level variables such as sociodemographic, personality, and cultural characteristics can correlate with ERS (Austin et al., 2006; Chen et al., 1995; De Jong et al., 2008; Greenleaf, 1992; Johnson et al., 2005; Meisenberg & Williams, 2008; Naemi et al., 2009; see Van Vaerenbergh & Thomas, 2013, for more details). Such correlations open the potential for ERS to contribute bias to the estimated relationships between the content trait and other criterion variables.

IRTree models have been proposed for modeling response styles (Park & Wu, 2019; Plieninger & Meiser, 2014). IRTree models (Böckenholt, 2012; De Boeck & Partchev, 2012; Jeon & De Boeck, 2016) characterize the underlying response processes associated with response selection using a sequential decision tree structure in which IRT measurement models represent the outcomes at each decision node. For example, responses to a four-category item with categories “1 = Strongly Disagree, 2 = Disagree, 3 = Agree, and 4 = Strongly Agree” might be specified as a two-stage selection process involving three decision nodes (as shown in Figure 1). At the first decision node, the respondent chooses whether to agree or disagree with the item based on their content trait ( $θ$ ), while at a second stage, the respondent decides on the extremity of agreement/disagreement based on their extreme response style ( $η$ ; ERS). The outcome at each node is binary, with respondents moving to the second node only if choosing to disagree at the first node, and moving to the third node only if choosing to agree at the first node. Importantly, by individually aligning the content and response style traits with different response nodes, the IRTree model separates the sources of information contributing to estimates of the content and response style traits, thus making each easier to estimate.

Figure 1. — Illustration of an extreme response style (ERS) item response tree (IRTree) model for responses to a 4-point Likert-type scale item.

An implicit assumption of IRTree models is that both the decision tree structure and the nature of the underlying traits involved at each decision node are the same across all respondents. This assumption might be questioned to the extent that different respondents could choose the same response category for different reasons. We might anticipate, for example, that for a certain subset of respondents, selection of the extreme categories at the second stage is not due to a response style, but is instead a further manifestation of their underlying content trait $θ$ (as the item/test developer presumably intends). Such a possibility leads to a consideration of a mixture IRTree model, as we describe in a later section. Prior to this, however, we review a common strategy taken in representing an IRTree model as a statistical model.

Statistical Representation of an IRTree Model for Extreme Response Style

A common approach to translating an IRTree model such as in Figure 1 (“ERS IRTree model”) to a statistical representation makes use of pseudo-items that correspond to the binary outcome at each decision node. We denote the outcome at decision node k for respondent $j$ on item $i$ as pseudo-item $Y_{ijk}^{*}$ , where a value of 1 typically corresponds to the decision reflecting a higher level of the trait underlying that node. Consequently, for the IRTree model in Figure 1, an “Agree” decision at the first node is coded as $Y_{ij 1}^{*}$ = 1 while a “Disagree” decision is coded as $Y_{ij 1}^{*}$ = 0 as a respondent possessing a high level of content trait (which underlies the first node) would tend to “agree” to the statement. At Nodes 2 and 3, $Y_{ij 2}^{*}$ = 1 and $Y_{ij 3}^{*}$ = 1 correspond to the selection of extreme responses (i.e., “Strongly agree” or “Strongly disagree”), and $Y_{ij 2}^{*}$ = 0 and $Y_{ij 3}^{*}$ = 0 to selection of non-extreme responses (i.e., “Agree” or “Disagree”) because ERS underlies the second and third nodes. This implies that responses to each category in Figure 1 can be recoded into three pseudo-items that correspond to the three decision nodes. For example, a response of “2 = Disagree” can be recoded to 0 for both $Y_{ij 1}^{*}$ and $Y_{ij 2}^{*}$ , and missing (“NA”) for $Y_{ij 3}^{*}$ (as respondents who disagree at Node 1 will not go through Node 3).
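The recoding just described can be sketched in a few lines of Python. This is only an illustration of the ERS-tree coding; the names `ERS_CODING` and `recode_ers` are hypothetical, and `None` stands in for the missing value (“NA”).

```python
from typing import Optional, Tuple

PseudoItems = Tuple[Optional[int], Optional[int], Optional[int]]

# ERS-tree coding: node 1 = agree (1) vs. disagree (0);
# nodes 2 and 3 = extreme (1) vs. non-extreme (0); None plays the role of "NA".
ERS_CODING = {
    1: (0, 1, None),  # Strongly Disagree: disagree, extreme, node 3 not visited
    2: (0, 0, None),  # Disagree: disagree, non-extreme, node 3 not visited
    3: (1, None, 0),  # Agree: agree, node 2 not visited, non-extreme
    4: (1, None, 1),  # Strongly Agree: agree, node 2 not visited, extreme
}


def recode_ers(category: int) -> PseudoItems:
    """Map a category response (1 to 4) to the pseudo-items (y1*, y2*, y3*)."""
    if category not in ERS_CODING:
        raise ValueError("category must be 1, 2, 3, or 4")
    return ERS_CODING[category]
```

For instance, `recode_ers(2)` returns `(0, 0, None)`, matching the “2 = Disagree” example in the text.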

Given this coding of pseudo-items, the probability of an outcome at a decision node is represented using an IRT model. In this article, we assume a two-parameter logistic model for all nodes. For decision node $k = 1$ , we assume

P (Y_{ij 1}^{*} = 1) = \frac{\exp (a_{i 1} θ_{j} + b_{i 1})}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})},

(1)

where $θ_{j}$ denotes the content trait, while for nodes $k = 2$ and 3, we assume

P (Y_{ijk}^{*} = 1) = \frac{\exp (a_{ik} η_{j} + b_{ik})}{1 + \exp (a_{ik} η_{j} + b_{ik})},

(2)

where $η_{j}$ denotes the extreme response style trait. Note that the models are expressed using the slope–intercept parameterization, where $a_{ik}$ denotes the item discrimination parameters and $b_{ik}$ denotes the intercept (difficulty-related) parameters. The $a_{ik}$ s and $b_{ik}$ s are allowed to differ across nodes, with the $a_{ik}$ s assumed to be positive. An assumption of local independence across nodes makes the probability of selecting a particular category $m$ equal to the product of probabilities of responses to pseudo-items across the three nodes:

\begin{matrix} P (Y_{ij} = m) = P (Y_{ij 1}^{*} = y_{ij 1}^{*}) P (Y_{ij 2}^{*} = y_{ij 2}^{*}) P (Y_{ij 3}^{*} = y_{ij 3}^{*}) \\ = \frac{\exp [y_{ij 1}^{*} (a_{i 1} θ_{j} + b_{i 1})]}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})} \times {[\frac{\exp [y_{ij 2}^{*} (a_{i 2} η_{j} + b_{i 2})]}{1 + \exp (a_{i 2} η_{j} + b_{i 2})}]}^{1 - y_{ij 1}^{*}} \\ \times {[\frac{\exp [y_{ij 3}^{*} (a_{i 3} η_{j} + b_{i 3})]}{1 + \exp (a_{i 3} η_{j} + b_{i 3})}]}^{y_{ij 1}^{*}}, \end{matrix}

(3)

where the $y_{ijk}^{*}$ for each node $k$ combine to render a final response of $m$ . The probability of selecting the “2 = Disagree” category, for instance, would be

P (Y_{i j} = 2) = P (Y_{i j 1}^{*} = 0) P (Y_{i j 2}^{*} = 0) = \frac{1}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})} \times \frac{1}{1 + \exp (a_{i 2} η_{j} + b_{i 2})} .

(4)
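As a numerical check on Equations (1) through (4), the following minimal sketch computes the four category probabilities under the ERS IRTree model for a single item. The function names and any parameter values passed to them are illustrative assumptions, not estimates from this article.

```python
import math
from typing import Dict, Sequence


def logistic(x: float) -> float:
    """Two-parameter logistic link: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ers_category_probs(theta: float, eta: float,
                       a: Sequence[float], b: Sequence[float]) -> Dict[int, float]:
    """Category probabilities for one item under the ERS IRTree model.

    a and b hold the slopes and intercepts of nodes 1, 2, 3 (in that order).
    """
    p_agree = logistic(a[0] * theta + b[0])          # node 1, content trait
    p_ext_disagree = logistic(a[1] * eta + b[1])     # node 2, ERS trait
    p_ext_agree = logistic(a[2] * eta + b[2])        # node 3, ERS trait
    return {
        1: (1 - p_agree) * p_ext_disagree,           # Strongly Disagree
        2: (1 - p_agree) * (1 - p_ext_disagree),     # Disagree, Equation (4)
        3: p_agree * (1 - p_ext_agree),              # Agree
        4: p_agree * p_ext_agree,                    # Strongly Agree
    }
```

Because each node splits its incoming probability into two complementary branches, the four values sum to 1 for any choice of traits and item parameters.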

In typical practice, the specification of an IRTree model can be compared against a traditional unidimensional IRT model applied to polytomously scored items (e.g., the generalized partial credit model) to verify the presence of extreme response style (see e.g., Böckenholt & Meiser, 2017). In these applications, however, the IRTree model assumes the same response process and the same traits are invoked for all respondents across nodes. The possibility that different respondents provide responses for different reasons leads to consideration of a mixture representation.

Generalization of an IRTree Model to Accommodate a Mixture of Trees

An Alternative Model: The Ordinal IRTree Model

As an alternative to the ERS IRTree model, we consider a model that assumes a similar sequential process, but where the content trait underlies decisions made at all three nodes. As a result, the general tree presentation in Figure 1 still applies, but the decisions made at the second stage (Nodes 2 and 3) are assumed to be influenced by $θ$ (content trait) as opposed to $η$ (ERS). Figure 2 illustrates the resulting Ordinal (ORD) IRTree model now having a binary outcome at Decision Nodes 2 and 3 that implies a selection of the higher of the two successive categories (as opposed to the most extreme of the options). Therefore, the binary outcome at the second node is reversed compared with the ERS model in Figure 1. Specifically, “Disagree” corresponds to $Y_{ij 2}^{*}$ = 1 and “Strongly disagree” corresponds to $Y_{ij 2}^{*}$ = 0 in this ORD model (as a respondent with a higher level of the content trait is more likely to choose “Disagree” than “Strongly Disagree”).

Figure 2. — Illustration of an ordinal (ORD) item response tree (IRTree) model for responses to a 4-point Likert-type scale item.

The outcome at each decision node is consequently modeled as

P (Y_{ijk}^{*} = 1) = \frac{\exp (a_{ik} θ_{j} + b_{ik})}{1 + \exp (a_{ik} θ_{j} + b_{ik})}

(5)

for $k = 1, 2, 3$ , where $θ_{j}$ denotes the content trait for respondent $j$ and $a_{ik}$ s are constrained to be positive. The item parameters $a_{ik}$ and $b_{ik}$ are separately estimated for each node as in the ERS model.

Note that the same pseudo-items created for the ERS model can also be applied in fitting the ORD model. However, because the outcome for Node 2 is reversed under the ORD model compared with the ERS model (i.e., $Y_{ij 2}^{*} = 1$ under the ERS model corresponds to the lower category “Strongly Disagree” rather than the higher score category “Disagree”), the outcome for the second node can be specified as

P (Y_{ij 2}^{*} = 1) = 1 - \frac{\exp (a_{i 2} θ_{j} + b_{i 2})}{1 + \exp (a_{i 2} θ_{j} + b_{i 2})} = \frac{1}{1 + \exp (a_{i 2} θ_{j} + b_{i 2})} = \frac{\exp (- a_{i 2} θ_{j} - b_{i 2})}{1 + \exp (- a_{i 2} θ_{j} - b_{i 2})}

(6)

for the ORD model using the pseudo-item created under the ERS model. In this way, each of the ERS and ORD IRTree models can be fitted to the same pseudo-items. Similar to the ERS IRTree model, an assumption of local independence across nodes makes the probability of selecting category $m$ under the ORD model equivalent to

P (Y_{i j} = m) = \frac{\exp [y_{i j 1}^{*} (a_{i 1} θ_{j} + b_{i 1})]}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})} \times {[\frac{\exp [y_{i j 2}^{*} (- a_{i 2} θ_{j} - b_{i 2})]}{1 + \exp (- a_{i 2} θ_{j} - b_{i 2})}]}^{1 - y_{i j 1}^{*}} \times {[\frac{\exp [y_{i j 3}^{*} (a_{i 3} θ_{j} + b_{i 3})]}{1 + \exp (a_{i 3} θ_{j} + b_{i 3})}]}^{y_{i j 1}^{*}},

(7)

where $y_{ijk}^{*}$ refers to the pseudo-item responses as defined under the ERS tree in Figure 1.
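The effect of the sign reversal in Equation (6) can be seen in a short sketch of the ORD category probabilities. As before, the function names and parameter values are illustrative assumptions; only the model form follows Equations (5) through (7).

```python
import math
from typing import Dict, Sequence


def logistic(x: float) -> float:
    """Two-parameter logistic link: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ord_category_probs(theta: float,
                       a: Sequence[float], b: Sequence[float]) -> Dict[int, float]:
    """Category probabilities for one item under the ORD IRTree model.

    All three nodes depend on the content trait theta. Under the ERS coding,
    y2* = 1 means "Strongly Disagree", so its probability uses the negated
    linear predictor, as in Equation (6).
    """
    p_agree = logistic(a[0] * theta + b[0])                 # node 1
    p_strongly_disagree = logistic(-(a[1] * theta + b[1]))  # node 2, Equation (6)
    p_strongly_agree = logistic(a[2] * theta + b[2])        # node 3
    return {
        1: (1 - p_agree) * p_strongly_disagree,
        2: (1 - p_agree) * (1 - p_strongly_disagree),
        3: p_agree * (1 - p_strongly_agree),
        4: p_agree * p_strongly_agree,
    }
```

With positive slopes, raising `theta` lowers the probability of category 1 and raises the probability of category 4, which is the ordinal behavior the ORD tree is meant to capture.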

A Mixture of ERS and ORD IRTree Models

The ORD model is naturally a competitor to the ERS model and could be statistically compared with the ERS IRTree model. However, when both models are viewed as applicable across a population of respondents, we can also formulate a mixture IRTree model in which each of the ERS and ORD IRTree models defines a latent class in the mixture. Under a mixture representation, each respondent is assumed to have a latent membership in either the ERS or ORD class across all item responses. As can be seen in Figures 1 and 2, the two classes are distinguished by whether the decisions at Stage 2 (Nodes 2 and 3) are affected by the content trait $θ$ or an extreme response style trait $η$ . In order to link metrics of the content trait across classes, we constrain the ORD and ERS IRTree models to share a common latent trait θ (i.e., content trait) at the first node and have identical item parameters for that first pseudo-item across classes. The item parameters for Nodes 2 and 3 are allowed to vary across classes and are separately estimated for the two classes. Consequently, the mixture IRTree model can be written as

\begin{matrix} P (Y_{ij} = m) = \frac{\exp [y_{ij 1}^{*} (a_{i 1} θ_{j} + b_{i 1})]}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})} \\ \times {\left\{ {\left[\frac{\exp [y_{ij 2}^{*} (a_{i 21} η_{j} + b_{i 21})]}{1 + \exp (a_{i 21} η_{j} + b_{i 21})}\right]}^{1 - y_{ij 1}^{*}} {\left[\frac{\exp [y_{ij 3}^{*} (a_{i 31} η_{j} + b_{i 31})]}{1 + \exp (a_{i 31} η_{j} + b_{i 31})}\right]}^{y_{ij 1}^{*}} \right\}}^{2 - z_{j}} \\ \times {\left\{ {\left[\frac{\exp [y_{ij 2}^{*} (- a_{i 22} θ_{j} - b_{i 22})]}{1 + \exp (- a_{i 22} θ_{j} - b_{i 22})}\right]}^{1 - y_{ij 1}^{*}} {\left[\frac{\exp [y_{ij 3}^{*} (a_{i 32} θ_{j} + b_{i 32})]}{1 + \exp (a_{i 32} θ_{j} + b_{i 32})}\right]}^{y_{ij 1}^{*}} \right\}}^{z_{j} - 1}, \end{matrix}

(8)

where $z_{j}$ denotes the class membership parameter of respondent $j$ (1 = ERS class, 2 = ORD class), $y_{ijk}^{*}$ denotes pseudo-item responses recoded under the ERS model, $a_{ik 1}$ and $b_{ik 1}$ for $k = 2, 3$ represent item parameters of Nodes 2 and 3 for the ERS class, and $a_{ik 2}$ and $b_{ik 2}$ represent the corresponding parameters for the ORD class. We can observe from the right-hand side of the equation that the first part (i.e., the probability for the first node) stays the same for both classes, but either the second or the third part drops out depending on the respondent's class membership $z_{j}$ . For instance, when a respondent belongs to the ORD class ( $z_{j} = 2$ ), the second part drops out and the third part stays in the equation as the exponents $2 - z_{j}$ and $z_{j} - 1$ , respectively, become 0 and 1. In contrast, when a respondent is in the ERS class ( $z_{j} = 1$ ), the third part drops out and the second part stays in the equation.
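The switching role of the exponents $2 - z_{j}$ and $z_{j} - 1$ can be illustrated with a direct, unoptimized transcription of Equation (8). The names below (`CODING`, `node_prob`, `mixture_prob`, and the dictionary keys) are hypothetical. Missing pseudo-items are replaced by 0 as a placeholder, which is harmless here because the corresponding factor is raised to the power 0.

```python
import math
from typing import Dict

# ERS-coded pseudo-items (y1*, y2*, y3*) per category; NA is stored as 0
# because that factor always carries an exponent of 0 in Equation (8).
CODING = {1: (0, 1, 0), 2: (0, 0, 0), 3: (1, 0, 0), 4: (1, 0, 1)}


def node_prob(y: int, linear: float) -> float:
    """exp(y * linear) / (1 + exp(linear)) for a binary outcome y."""
    return math.exp(y * linear) / (1.0 + math.exp(linear))


def mixture_prob(m: int, z: int, theta: float, eta: float,
                 a1: float, b1: float,
                 ers: Dict[str, float], ord_: Dict[str, float]) -> float:
    """P(Y = m) under Equation (8); z = 1 is the ERS class, z = 2 the ORD class.

    ers and ord_ hold the class-specific parameters under keys a2, b2, a3, b3.
    """
    y1, y2, y3 = CODING[m]
    first = node_prob(y1, a1 * theta + b1)  # shared across classes
    ers_part = (node_prob(y2, ers["a2"] * eta + ers["b2"]) ** (1 - y1)
                * node_prob(y3, ers["a3"] * eta + ers["b3"]) ** y1)
    ord_part = (node_prob(y2, -(ord_["a2"] * theta + ord_["b2"])) ** (1 - y1)
                * node_prob(y3, ord_["a3"] * theta + ord_["b3"]) ** y1)
    # Exactly one of the two class-specific parts survives.
    return first * ers_part ** (2 - z) * ord_part ** (z - 1)
```

Setting `z = 1` makes the result independent of the ORD parameters, and setting `z = 2` makes it independent of `eta`, which mirrors the verbal description above.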

Table 1 summarizes the pseudo-item outcomes that correspond to each item category response and the probability of responses at each node for each model (class) in the mixture model. We present the pseudo-item outcomes created under the ERS model and, therefore, use Equation (6) for the probability at Node 2 for the ORD model.

Table 1.

A Summary of Information for the Mixture Item Response Tree (IRTree) Model.

		Pseudo-items for each category ( $y_{ijk}^{*}$ )				Class membership ( $z_{j}$ )	Model [ $P (Y_{ijk}^{*} = y_{ijk}^{*})$ ]
Node	Trait	$m = 1$	$m = 2$	$m = 3$	$m = 4$		
$Y_{1}^{*}$	$θ$	0	0	1	1	Both (1, 2)	$\frac{\exp [y_{ij 1}^{*} (a_{i 1} θ_{j} + b_{i 1})]}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})}$
$Y_{2}^{*}$	$η$	1	0	NA	NA	ERS (1)	$\frac{\exp [y_{ij 2}^{*} (a_{i 21} η_{j} + b_{i 21})]}{1 + \exp (a_{i 21} η_{j} + b_{i 21})}$
$Y_{2}^{*}$	$θ$	1	0	NA	NA	ORD (2)	$\frac{\exp [y_{ij 2}^{*} (- a_{i 22} θ_{j} - b_{i 22})]}{1 + \exp (- a_{i 22} θ_{j} - b_{i 22})}$
$Y_{3}^{*}$	$η$	NA	NA	0	1	ERS (1)	$\frac{\exp [y_{ij 3}^{*} (a_{i 31} η_{j} + b_{i 31})]}{1 + \exp (a_{i 31} η_{j} + b_{i 31})}$
$Y_{3}^{*}$	$θ$	NA	NA	0	1	ORD (2)	$\frac{\exp [y_{ij 3}^{*} (a_{i 32} θ_{j} + b_{i 32})]}{1 + \exp (a_{i 32} θ_{j} + b_{i 32})}$


Note. The extreme response style (ERS) IRTree model and the ordinal (ORD) IRTree model each define a latent class in the mixture IRTree model.

The possibility of a mixture in relation to an IRTree model was also considered by Tijmstra et al. (2018). In their approach, a mixture is proposed in which respondents conform either to an IRTree model or to a generalized partial credit model in which all the response categories are an ordinal reflection of the underlying latent trait. In this article, we take an alternative approach based on the aforementioned mixture of IRTree models. The use of a mixture of IRTree models has a couple of advantages over the mixture proposed by Tijmstra et al. (2018). First, it offers the ability to link the same content trait across trees (through the assumption of common item parameters at the first node), permitting estimation of a single content trait that applies across models. A second related advantage is that by applying a common tree structure across models in the mixture, the uncertainty attached to class membership becomes manifest in the uncertainty (measurement error) in the trait estimates for each model. This proves important in the current application in that it makes it possible to examine how uncertainty regarding class contributes to uncertainty in the content ( $θ$ ) and extreme response ( $η$ ) trait estimates. We examine these issues in both a simulation study and a real data study.

Another approach using mixture models in the context of multidimensional IRT to attend to response styles was recently presented by Khorramdel et al. (2019). The Khorramdel et al. (2019) approach is implemented through a three-step procedure in which mixture IRT is applied in the second step to the pseudo-items corresponding to Nodes 2 and 3 (in isolation of the Node 1 pseudo-items), and is also exploratory in nature. The current approach formally defines one class to have a statistically identical trait for Nodes 1, 2, and 3 (the ORD class), and in this respect is a constrained mixture model. As a result, there are clear statistical differences between the approaches; a formal empirical comparison is beyond the scope of this article and an area for future research.

Simulation Analyses

We evaluated the mixture IRTree model and its estimation using a fully Bayesian estimation algorithm with simulated data. We focus our simulation on evaluating how well the model identifies the mixture of classes (both at respondent and sample levels) and recovers item and respondent parameters. The ERS and ORD IRTree models in Equations (3) and (7), respectively, were used to generate response patterns for respondents in Classes 1 (ERS class) and 2 (ORD class). Responses for a total of 1,000 respondents to 15 four-response category items were generated. We systematically varied the proportion of respondents in each class: ( $P_{1}, P_{2}$ ) = (1.0, 0.0), (0.7, 0.3), (0.5, 0.5), (0.3, 0.7), and (0.0, 1.0) as a simulation factor, where $P_{1}$ and $P_{2}$ , respectively, denote the proportion of respondents in the ERS and ORD classes. Ten data sets were generated for each condition for replication purposes; accordingly, 50 data sets (10 replications * 5 conditions) were generated in total. For each respondent, item category responses were simulated as outcomes of a sequential response process corresponding to the IRTree model of their respective class. Then the category responses were recoded into pseudo-items as defined under the ERS IRTree model. The overall data generation process can be summarized in the following steps:

Step 1. Generate person parameters $θ_{j}$ (content trait) and $η_{j}$ (ERS) for 1,000 respondents independently from $Normal (0, 1)$ , assuming that $θ_{j}$ and $η_{j}$ are uncorrelated.
Step 2. Generate item parameters $a_{ikc}$ and $b_{ikc}$ across three nodes (k = 1, 2, 3) for 15 items in each of the two classes (c = 1, 2), respectively, as $Uniform (0.5, 2)$ and $Uniform (- 3, 3)$ . The item parameters for the first node were generated to be identical across the two classes ( $a_{i 11} = a_{i 12}, b_{i 11} = b_{i 12}$ ) while the parameters for the second and third nodes were independently generated for each class.
Step 3. Assign class membership parameters ( $z_{j})$ for respondents according to the mixture proportion condition (i.e., [ $P_{1}, P_{2}$ ]) being considered. We assigned the first $100 P_{1}$ % of the respondents to Class 1 (ERS class) and the rest to Class 2 (ORD class).
Step 4. Calculate the probability of each respondent selecting category $m$ ( $m = 1, 2, 3, 4$ ) using Equation (8). For instance, for a respondent in ERS class ( $z_{j} = 1)$ , the probability of selecting category $m$ is calculated by plugging in the generated parameter values for $a_{ik 1}$ and $b_{ik 1}$ and pseudo-item outcome values corresponding to category $m$ for $y_{ijk}^{*}$ in this equation: $\frac{\exp [y_{ij 1}^{*} (a_{i 1} θ_{j} + b_{i 1})]}{1 + \exp (a_{i 1} θ_{j} + b_{i 1})} {[\frac{\exp [y_{ij 2}^{*} (a_{i 21} η_{j} + b_{i 21})]}{1 + \exp (a_{i 21} η_{j} + b_{i 21})}]}^{1 - y_{ij 1}^{*}} {[\frac{\exp [y_{ij 3}^{*} (a_{i 31} η_{j} + b_{i 31})]}{1 + \exp (a_{i 31} η_{j} + b_{i 31})}]}^{y_{ij 1}^{*}}$ . We calculate four probability values (corresponding to four response categories) for each respondent.
Step 5. Generate multinomial responses from 1,000 respondents to 15 items, using the probabilities calculated in the previous step.
Step 6. Transform the categorical responses to pseudo-items based on the ERS tree structure, as shown in Table 1. (Recall that the ORD tree structure can be fitted to ERS tree pseudo-items by forcing the discrimination parameter at the second node to be negative as opposed to positive; see Equation [6].) The final data set consequently has binary responses from 1,000 respondents to 45 pseudo-items (15 items * 3 nodes) with the irrelevant pseudo-items coded as missing.
Step 7. Repeat Steps 5 and 6 within each condition to generate 10 data sets for each mixing proportion condition.
Step 8. Repeat Steps 3 through 7 for each of the mixing proportion conditions.
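Steps 1 through 6 above can be sketched as follows for a single data set and a single mixing proportion. This is a simplified illustration using only the Python standard library, not the code used in the study; the function name `simulate`, its arguments, and the seed are assumptions. Replications (Step 7) would repeat the response draw and recoding with the same parameters.

```python
import math
import random
from typing import List, Optional, Tuple

# ERS-coded pseudo-items (y1*, y2*, y3*); None marks a node that is not visited.
ERS_CODING = {1: (0, 1, None), 2: (0, 0, None), 3: (1, None, 0), 4: (1, None, 1)}


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_resp: int = 1000, n_items: int = 15, p1: float = 0.5,
             seed: int = 1) -> Tuple[List[int], List[List[Optional[int]]]]:
    """Return class memberships and an n_resp x (3 * n_items) pseudo-item matrix."""
    rng = random.Random(seed)
    # Step 1: uncorrelated standard normal content and ERS traits.
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_resp)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(n_resp)]
    # Step 2: a[c][i][k] ~ Uniform(0.5, 2) and b[c][i][k] ~ Uniform(-3, 3).
    a = {c: [[rng.uniform(0.5, 2.0) for _ in range(3)] for _ in range(n_items)]
         for c in (1, 2)}
    b = {c: [[rng.uniform(-3.0, 3.0) for _ in range(3)] for _ in range(n_items)]
         for c in (1, 2)}
    for i in range(n_items):  # node 1 parameters are identical across classes
        a[2][i][0], b[2][i][0] = a[1][i][0], b[1][i][0]
    # Step 3: the first 100 * p1 percent of respondents form the ERS class.
    n_ers = round(n_resp * p1)
    z = [1 if j < n_ers else 2 for j in range(n_resp)]
    pseudo = []
    for j in range(n_resp):
        c = z[j]
        stage2_trait = eta[j] if c == 1 else theta[j]
        row: List[Optional[int]] = []
        for i in range(n_items):
            # Step 4: category probabilities implied by the respondent's tree.
            p_agree = logistic(a[c][i][0] * theta[j] + b[c][i][0])
            p_n2 = logistic(a[c][i][1] * stage2_trait + b[c][i][1])
            p_n3 = logistic(a[c][i][2] * stage2_trait + b[c][i][2])
            # Node 2 "success" is the extreme category in the ERS class but
            # the higher category ("Disagree") in the ORD class.
            p_sd = p_n2 if c == 1 else 1.0 - p_n2
            probs = [(1 - p_agree) * p_sd, (1 - p_agree) * (1 - p_sd),
                     p_agree * (1 - p_n3), p_agree * p_n3]
            # Step 5: draw one multinomial response.
            m = rng.choices([1, 2, 3, 4], weights=probs)[0]
            # Step 6: recode to ERS-tree pseudo-items.
            row.extend(ERS_CODING[m])
        pseudo.append(row)
    return z, pseudo
```

Calling `simulate(p1=0.7)` would correspond to the (0.7, 0.3) condition, producing 45 pseudo-item columns per respondent.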

We fit the IRTree mixture model involving ERS and ORD classes to each of the generated data sets using a Bayesian (Markov chain Monte Carlo) estimation algorithm applied using JAGS (Just Another Gibbs Sampler) 4.3.0 (Plummer, 2017). To run JAGS from the R software, the jagsUI package (Kellner, 2019) was used. The prior distributions used for the item parameters $a_{ikc}$ and $b_{ikc}$ were, respectively, $logNormal (0, 1)$ and $Normal (0, 1)$ . For person parameters, $θ_{j}$ and $η_{j}$ were each assumed to independently follow a prior distribution of $Normal (0, 1)$ while the class membership parameter $z_{j}$ was assumed to be $Categorical (p_{1}, p_{2})$ with $Dirichlet (α_{1}, α_{2})$ as a prior for the vector of hyperparameters $(p_{1}, p_{2})$ . We set $α_{1} = α_{2} = 1$ to make the prior uniformly distributed and noninformative. For each simulated data set, 10 chains were run where for each chain the total number of iterations was set to 25,100. Standard convergence criteria (i.e., the Gelman–Rubin $\hat{R}$ ) supported this number of iterations as being consistently sufficient to achieve convergence with the studied mixture model. The first 100 iterations were used for adaptation and a subsequent 5,000 iterations were discarded as burn-in. We retained every 10th subsequent value in the simulated chains (i.e., thinning interval = 10 iterations), implying that a total of 2,000 iterations were used from each of the 10 chains to produce posterior distributions of the model parameters. We used the means of the univariate posterior distributions as parameter estimates.

To compare the mixture IRTree model against the use of a single IRTree model, we also separately fit the ERS and ORD IRTree models to the same data sets to examine whether the mixture model emerges as superior in the presence of two classes. The same prior distributions as used in the mixture model were applied for the corresponding parameters under each single IRTree model. Due to the reduced complexity of these models, five chains were run for each analysis, with a total of 15,100 iterations for each chain. As above, this number of iterations was found sufficient to achieve convergence according to the Gelman–Rubin criterion. The first 100 and 5,000 iterations were discarded as adaptive and burn-in iterations, respectively. The resulting posterior distributions were constructed from the 10,000 post-burn-in iterations, again using a thinning interval of 10, implying a total of 1,000 retained iterations from each of the five chains for determination of parameter estimates.
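The iteration counts reported for the two sets of analyses can be checked with simple arithmetic. The helper below is a hypothetical illustration; it assumes, as the text implies, that the stated total per chain includes both the adaptation and burn-in iterations.

```python
from typing import Tuple


def retained_draws(total_iter: int, n_adapt: int, n_burnin: int,
                   thin: int, n_chains: int) -> Tuple[int, int]:
    """Return (draws kept per chain, draws kept across all chains)."""
    kept_per_chain = (total_iter - n_adapt - n_burnin) // thin
    return kept_per_chain, kept_per_chain * n_chains


# Mixture model: 10 chains of 25,100 iterations each.
mixture_draws = retained_draws(25_100, 100, 5_000, 10, 10)
# Single ERS or ORD model: 5 chains of 15,100 iterations each.
single_draws = retained_draws(15_100, 100, 5_000, 10, 5)
```

Under this reading, the mixture analyses keep 2,000 draws per chain (20,000 in total) and the single-model analyses keep 1,000 draws per chain (5,000 in total).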

Simulation Results

Model Fit Comparisons

We first compared the fit of the mixture IRTree model with that of the single ERS and ORD IRTree models. The deviance information criterion (DIC) values for the models, obtained from the first simulated data set, are reported in Table 2. The DIC is calculated as the sum of the mean deviance and a penalty based on the complexity of the model (the effective number of parameters, denoted as pD). As expected, for the data sets that only contained respondents from one of the two classes, that is, ( $P_{1}, P_{2}$ ) = (1.0, 0.0) and (0.0, 1.0), the IRTree model that corresponds to the correct class returned the lowest DIC value, indicating the best model fit, while the mixture model returned the lowest DIC value for all conditions involving a mixture of the two classes. The same pattern of findings was observed across all 10 of the data sets, suggesting that the mixture IRTree model correctly emerges as superior in the presence of conditions in which both the ERS and ORD classes are present for different respondent subpopulations.

Table 2.

Deviance Information Criterion (DIC) Results for the Mixture, Extreme Response Style (ERS), and Ordinal (ORD) IRTree Models for Five Different Mixture Proportion Conditions, First Simulated Data Set for Each Condition.

	Class mixing proportion condition ( $P_{1}, P_{2}$ )
	(1.0, 0.0)		(0.7, 0.3)		(0.5, 0.5)		(0.3, 0.7)		(0.0, 1.0)
Fitted model	pD	DIC	pD	DIC	pD	DIC	pD	DIC	pD	DIC
Mixture	2,529	29,107	2,583	28,948*	2,402	29,051*	2,169	28,433*	1,359	27,519
ERS	2,308	28,898*	2,439	30,932	2,486	31,718	2,406	31,391	2,719	30,228
ORD	1,249	31,726	1,256	32,422	1,264	32,125	1,195	30,662	1,201	27,374*

Note. The smallest DIC value for each condition is marked with an asterisk. pD = effective number of parameters.
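The model comparison in Table 2 amounts to picking, within each condition, the fitted model with the smallest DIC. The sketch below transcribes the (pD, DIC) pairs from Table 2 and applies that rule; the names `TABLE2`, `best_model`, and `mean_deviance` are hypothetical, and the mean deviance is recovered from the relation DIC = mean deviance + pD stated in the text.

```python
from typing import Dict, Tuple

# (pD, DIC) for each fitted model, keyed by the condition (P1, P2); from Table 2.
TABLE2: Dict[Tuple[float, float], Dict[str, Tuple[int, int]]] = {
    (1.0, 0.0): {"Mixture": (2529, 29107), "ERS": (2308, 28898), "ORD": (1249, 31726)},
    (0.7, 0.3): {"Mixture": (2583, 28948), "ERS": (2439, 30932), "ORD": (1256, 32422)},
    (0.5, 0.5): {"Mixture": (2402, 29051), "ERS": (2486, 31718), "ORD": (1264, 32125)},
    (0.3, 0.7): {"Mixture": (2169, 28433), "ERS": (2406, 31391), "ORD": (1195, 30662)},
    (0.0, 1.0): {"Mixture": (1359, 27519), "ERS": (2719, 30228), "ORD": (1201, 27374)},
}


def best_model(fits: Dict[str, Tuple[int, int]]) -> str:
    """Name of the fitted model with the smallest DIC."""
    return min(fits, key=lambda name: fits[name][1])


def mean_deviance(p_d: int, dic: int) -> int:
    """Mean deviance implied by DIC = mean deviance + pD."""
    return dic - p_d


selected = {condition: best_model(fits) for condition, fits in TABLE2.items()}
```

The resulting `selected` mapping picks the ERS model for (1.0, 0.0), the ORD model for (0.0, 1.0), and the mixture model for the three intermediate conditions.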

Estimation of Latent Proportions and Classification Accuracy

The ability of the mixture IRTree model to correctly capture the mixture of respondents in the data can also be inspected by looking at how well the model estimates the mixing proportion parameters ( $p_{1}$ , $p_{2}$ ) and the respondent class memberships ( $z_{j}$ ). We evaluated recovery in terms of the bias and root mean square errors (RMSEs) of the estimates of the latent proportions. Specifically,

Bias = E ({\hat{p}}_{1 r}) - p_{1} = \frac{1}{rep} \sum_{r = 1}^{rep} {\hat{p}}_{1 r} - p_{1},

(9)

RMSE = \sqrt{E [{({\hat{p}}_{1 r} - p_{1})}^{2}]} = \sqrt{\frac{1}{rep} \sum_{r = 1}^{rep} {({\hat{p}}_{1 r} - p_{1})}^{2}},

(10)

where ${\hat{p}}_{1 r}$ denotes the estimate of the proportion parameter $p_{1}$ obtained from the $r$ th simulation data set, and $rep$ denotes the total number of replications for each condition (in this case 10).
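Equations (9) and (10) translate directly into code. The following minimal sketch uses hypothetical function names, and any estimates passed to it are made-up inputs rather than results from the study.

```python
import math
from typing import Sequence


def bias(estimates: Sequence[float], true_value: float) -> float:
    """Equation (9): mean of the estimates across replications minus the truth."""
    return sum(estimates) / len(estimates) - true_value


def rmse(estimates: Sequence[float], true_value: float) -> float:
    """Equation (10): root of the mean squared deviation from the truth."""
    squared = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))
```

Note that estimates scattered symmetrically around the true value give a bias near zero but a positive RMSE, which is why both quantities are reported.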

As can be seen in Table 3, the bias and RMSE are all very small (less than 0.01 in absolute value) using the fully Bayesian estimation approach, indicating that the mixture model estimates the true proportion of respondents in each class accurately. At the respondent level, the estimate of class membership ( ${\hat{z}}_{j}$ ), the posterior mean of $z_{j}$ , lies between 1 and 2 and reflects the probability of belonging to each class. For example, if a respondent’s estimated class membership is 1.3, this means that the probability of belonging to Class 2 (ORD class) for the individual is 0.3 and to Class 1 (ERS class) is 0.7. We thus assign respondents to Class 1 if their ${\hat{z}}_{j}$ is smaller than or equal to 1.5 and Class 2 if their ${\hat{z}}_{j}$ is larger than 1.5. We evaluate the accuracy of such a classification by calculating the proportion of correctly assigned respondents to each class (i.e., hit rate). The average hit rate across the 10 replicated data sets within each condition is presented in the right two columns of Table 3. The hit rates are larger than 90% for all conditions, although accuracy decreases slightly as the proportion of respondents in the class decreases. For instance, the hit rate for the ERS class is 99.97% when all the respondents in the data are in the ERS class, but decreases to 91.37% when the proportion of respondents in the ERS class reduces to 0.3. These overall results suggest that the proposed modeling approach assigns the respondents to their correct classes with high accuracy provided the number of respondents in the class is sufficiently large.
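The classification rule and hit rate can be sketched as follows, assuming that each respondent's class membership estimate is the mean of the sampled values of $z_{j}$ (each draw being 1 or 2). All function names and the example draws are hypothetical.

```python
from typing import Optional, Sequence


def posterior_mean_z(draws: Sequence[int]) -> float:
    """Mean of sampled class labels (1 or 2); minus 1, it is P(ORD class)."""
    return sum(draws) / len(draws)


def assign_class(z_hat: float) -> int:
    """Class 1 (ERS) if z_hat <= 1.5, otherwise Class 2 (ORD)."""
    return 1 if z_hat <= 1.5 else 2


def hit_rate(z_hats: Sequence[float], true_z: Sequence[int],
             cls: int) -> Optional[float]:
    """Percentage of true members of class cls that are assigned to cls.

    Returns None when no respondent truly belongs to cls, as in the
    (1.0, 0.0) and (0.0, 1.0) conditions.
    """
    members = [zh for zh, z in zip(z_hats, true_z) if z == cls]
    if not members:
        return None
    hits = sum(assign_class(zh) == cls for zh in members)
    return 100.0 * hits / len(members)
```

For example, a respondent whose draws are 1 in 70% of the retained iterations and 2 in the remaining 30% has an estimate of 1.3 and is assigned to the ERS class.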

Table 3.

Bias and Root Mean Square Error (RMSE) of the Estimated Latent Proportion ${\hat{p}}_{1}$ and Classification Accuracies (Average Hit Rates) Across Five Different Mixing Proportion Conditions.

	Posterior latent proportion ${\hat{p}}_{1}$		Average hit rate (%)
Condition ( $P_{1}, P_{2}$ )	Bias	RMSE	$P ({\hat{z}}_{j} \leq 1.5 \mid z_{j} = 1)$	$P ({\hat{z}}_{j} > 1.5 \mid z_{j} = 2)$
(1.0, 0.0)	−0.003	0.003	99.97	—
(0.7, 0.3)	−0.006	0.009	96.51	92.53
(0.5, 0.5)	−0.002	0.009	94.64	96.14
(0.3, 0.7)	−0.003	0.008	91.37	97.89
(0.0, 1.0)	0.002	0.002	—	100.00

Recovery of Pseudo-Item Parameters

We also examined how well the mixture model recovers the pseudo-item parameters. Table 4 displays the bias and RMSE of the item parameters $a_{ikc}$ and $b_{ikc}$ for each class (c = 1, 2), averaged over items and nodes. The values derived from the single IRTree models are also presented as a baseline for evaluation. The mixture model produces nearly the same levels of bias and RMSE for the pseudo-item discrimination and difficulty parameters as the single IRTree models when the proportion of respondents under the corresponding model is 1. For the intermediate mixing proportion conditions, the mixture model always produces smaller bias and RMSE than the single IRTree models, suggesting that the mixture model provides superior estimation of the pseudo-item parameters. In addition, the mixture model seems to recover the item parameters for a given class well even when only a small proportion of respondents is in that class. For instance, when the proportion of respondents in the ERS class is only 0.3, the bias and RMSE of the $b_{ik 1}$ ( $b_{ERS}$ ) estimates produced by the mixture model are 0.133 and 0.263, respectively, whereas the ERS model produces larger values (0.720 and 0.762), presumably because, under the ERS model, information from respondents who are in the ORD class is also used in estimating the item parameters.

Table 4.

Bias and Root Mean Square Error (RMSE) of the Item Parameter Estimates Across Mixture, Extreme Response Style (ERS) and Ordinal (ORD) IRTree Models by Mixing Proportion Condition.

		Bias^a			RMSE
Parameter Estimate	Condition ( $P_{1}, P_{2}$ )	MIX	ERS	ORD	MIX	ERS	ORD
${\hat{a}}_{ik 1}$ ( ${\hat{a}}_{ERS}$ )	(1.0, 0.0)	0.072	0.071	—	0.184	0.183	—
	(0.7, 0.3)	0.082	0.253	—	0.208	0.328	—
	(0.5, 0.5)	0.097	0.445	—	0.234	0.504	—
	(0.3, 0.7)	0.135	0.547	—	0.324	0.605	—
	(0.0, 1.0)	0.323	0.483	—	0.365	0.551	—
${\hat{a}}_{ik 2}$ ( ${\hat{a}}_{ORD}$ )	(1.0, 0.0)	0.339	—	0.660	0.398	—	0.704
	(0.7, 0.3)	0.107	—	0.542	0.296	—	0.595
	(0.5, 0.5)	0.093	—	0.418	0.258	—	0.467
	(0.3, 0.7)	0.086	—	0.283	0.222	—	0.343
	(0.0, 1.0)	0.064	—	0.065	0.185	—	0.184
${\hat{b}}_{ik 1}$ ( ${\hat{b}}_{ERS}$ )	(1.0, 0.0)	0.071	0.072	—	0.155	0.156	—
	(0.7, 0.3)	0.076	0.370	—	0.168	0.417	—
	(0.5, 0.5)	0.092	0.558	—	0.213	0.603	—
	(0.3, 0.7)	0.133	0.720	—	0.263	0.762	—
	(0.0, 1.0)	0.927	0.998	—	0.949	1.035	—
${\hat{b}}_{ik 2}$ ( ${\hat{b}}_{ORD}$ )	(1.0, 0.0)	0.971	—	1.007	0.998	—	1.037
	(0.7, 0.3)	0.152	—	0.752	0.274	—	0.784
	(0.5, 0.5)	0.128	—	0.610	0.257	—	0.646
	(0.3, 0.7)	0.102	—	0.459	0.198	—	0.496
	(0.0, 1.0)	0.079	—	0.080	0.158	—	0.159

^a The absolute values of bias are averaged over items and nodes.

Precision (Posterior Standard Deviations) of Respondent Parameter Estimates

Appendix A displays results in terms of the bias and RMSE of respondent parameter recovery (for both $θ$ and $η$ ). Importantly, the use of the mixture IRTree model is seen to result in a recovery of respondent $θ$ estimates that is equivalent to that observed when applying the correct model to the respondent’s data. In other words, under the mixture IRTree model, the recovery of $θ$ for ORD respondents is as good as it is when applying the single ORD IRTree model, and likewise, the recovery for ERS respondents is as good as when applying the single ERS IRTree model. Similarly, for the $η$ estimates, recovery for ERS respondents under the mixture IRTree model is as good as when applying the ERS model. (Note that recovery of the $η$ for ORD respondents is consistently poor under both models, as the items do not measure $η$ for this class of respondents.) As a result, the application of a mixture approach seemingly renders accurate trait estimates with respect to the relevant traits for the respondent’s actual response process.

Of particular interest in the current application, however, is the way in which the mixture approach represents the precision of the respondent parameter estimates. As we suggested in the introduction, an anticipated consequence of applying the mixture IRTree model is that it will appropriately account for the uncertainty present in respondent parameter estimates due to the uncertainty of the respondent’s response process. As seen in Figure A2 in Appendix A, both the mixture and ERS models produce equivalently large errors for many of the $η$ estimates. One of the differences between the mixture and ERS models is that the mixture model attends to the uncertainty of the estimates due to the uncertain response process whereas the ERS model does not. Figure 3 displays kernel-smoothed functions of the relationship between the respondent parameters and their corresponding posterior standard deviations (the Bayesian equivalent of standard errors of estimates). The left plot shows the posterior standard deviations for $θ$ , the right plot for $η$ . For the content trait $θ$ , we see from the left panel in Figure 3 that the mixture model produces lower posterior standard deviations than those from the ERS model across all levels of $θ$ , a result that can be attributed to the capacity of the mixture model to extract additional information about $θ$ from Nodes 2 and 3 when the respondent is in the ORD class. At the same time, the mixture model generally produces higher posterior standard deviations than those of the ORD model, a result that is due to some respondents actually being in the ERS class, as well as the uncertainty regarding class membership for many of the respondents.

Figure 3. — Average posterior standard deviations (PSDs) of θ and η in relation to the parameters, for mixture, extreme response style (ERS), and ordinal (ORD) IRTree models, for (P₁, P₂) = (0.5, 0.5) condition.

For the posterior standard deviations of $η$ , we initially note that because only the mixture and ERS models actually estimate $η$ , the result for the ORD model is a constant value of 1. This reflects that from a Bayesian perspective, the posterior standard deviation of $η$ would be 1, the standard deviation of the prior, for all respondents in the ORD class as the data provide no information about $η$ . The right panel of Figure 3 further shows that the mixture model returns higher posterior standard deviations across all levels of $η$ in comparison to the ERS model. This result is as expected for two primary reasons: First, the mixture model produces large posterior standard deviations of $η$ for those respondents who belong to the ORD class (where $η$ is unmeasured), and second, the uncertainty of class membership makes the estimates of $η$ less precise even for those belonging to the ERS class.

To evaluate the accuracy of these precision estimates, we examine the proportion of 95% credible intervals that contain the true parameter values for each parameter under each modeling approach, as shown in Table 5. From the leftmost three columns of the table, we see that for the $θ$ parameters, the ERS and mixture models provide accurate 95% credible intervals (coverage of 0.950 to 0.954), indicating that the mixture model accurately reflects the uncertainty in the estimation of $θ$; however, coverage under the ORD model falls below the nominal level (0.913 to 0.938) in every condition that includes ERS respondents, suggesting inaccuracy. Recalling from Figure 3 that the posterior standard deviations were consistently lower for the mixture model than for the ERS model, it would thus appear that the credible intervals for the mixture model are best: they are likely narrower than those of the ERS model while achieving similar coverage.

Table 5.

The Proportion of 95% Credible Intervals for $θ$ and $η$ That Contain True Parameters, for the Mixture, Extreme Response Style (ERS), and Ordinal (ORD) IRTree Models Across Mixing Proportion Conditions.

	$θ$			$η$
Condition ( $P_{1}, P_{2}$ )	Mixture	ERS	ORD^a	Mixture	ERS	ORD^a
(1.0, 0.0)	0.952	0.952	0.938	0.940	0.940	0.944
(0.7, 0.3)	0.951	0.950	0.913	0.942	0.835	0.944
(0.5, 0.5)	0.954	0.953	0.919	0.945	0.718	0.944
(0.3, 0.7)	0.954	0.952	0.915	0.941	0.688	0.944
(0.0, 1.0)	0.953	0.953	0.952	0.944	0.830	0.944

^a The 0.025th and 0.975th quantiles of the standard normal distribution are used for the interval.

From the rightmost three columns in Table 5, we likewise see that the intervals derived from the mixture model contain the true $η$ about 94% of the time, whereas the ERS model returns much lower rates whenever ORD respondents are present (0.688 to 0.835), with the lowest rates (0.688 and 0.718) in the (0.3, 0.7) and (0.5, 0.5) conditions. Thus, by attending to the uncertainty of the $η$ estimates, the mixture model appears to produce more accurate credible intervals than the ERS model. In summary, when a mixture is simulated to be present, it is only by modeling the data as a mixture that we obtain accurate estimates and credible intervals for the respondent parameters.
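The coverage rates in Table 5 can be illustrated with a short sketch. It assumes equal-tailed 95% intervals taken from the 2.5th and 97.5th percentiles of each respondent's posterior draws; the function names and the example draws are hypothetical.

```python
import statistics

def credible_interval(draws):
    """Equal-tailed 95% interval: the 2.5th and 97.5th percentiles of the draws."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def coverage(true_values, intervals):
    """Proportion of intervals that contain the corresponding true value."""
    inside = sum(1 for t, (lo, hi) in zip(true_values, intervals) if lo <= t <= hi)
    return inside / len(true_values)

# Hypothetical posterior draws for two respondents
draws_a = [i / 100 for i in range(101)]        # draws spread evenly over 0 to 1
draws_b = [2 + i / 100 for i in range(101)]    # draws spread evenly over 2 to 3
intervals = [credible_interval(draws_a), credible_interval(draws_b)]
true_values = [0.5, 1.0]                       # the second value lies outside its interval
print(coverage(true_values, intervals))
```

A well-calibrated model should return a value near 0.95 when this is applied across all respondents and replications.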

Real Data Application

The mixture model was also fitted to actual data to demonstrate the presence of a mixture of respondents in self-report rating scale items as well as to show a real-world illustration of the results seen in the simulation. The data were collected through a survey from the Trends in International Mathematics and Science Study (TIMSS) 2015 (for Grade 8). We used responses from 1,000 randomly sampled respondents from the U.S. administration of the nine items for the Students Like Learning Mathematics (SLM) scale. Each item was rated using four response categories: disagree a lot, disagree a little, agree a little, and agree a lot. We converted the item scores to pseudo-item responses based on the ERS IRTree model and fitted the mixture model to the data using JAGS with the same specifications as for the simulation analyses reported above. As in the simulation analyses, we also fitted the single ERS and ORD IRTree models.
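The conversion of four-category item scores to pseudo-item responses can be sketched as below. Only the Node 1 coding (0 for the two disagree categories) is stated in this section; the coding of Nodes 2 and 3 as extreme-versus-nonextreme choices within each direction is an assumption based on the usual ERS IRTree layout, and the function name is hypothetical.

```python
def to_pseudo_items(response):
    """Map a response in 1..4 to (node1, node2, node3); None marks a node not reached."""
    if response not in (1, 2, 3, 4):
        raise ValueError("response must be 1, 2, 3, or 4")
    node1 = 1 if response >= 3 else 0          # direction: agree (1) vs. disagree (0)
    node2 = None
    node3 = None
    if node1 == 0:
        node2 = 1 if response == 1 else 0      # assumed: 1 = extreme disagreement
    else:
        node3 = 1 if response == 4 else 0      # assumed: 1 = extreme agreement
    return (node1, node2, node3)

# A response pattern written as a string of category digits
pattern = "221112211"
pseudo = [to_pseudo_items(int(digit)) for digit in pattern]
print(pseudo)
```

Under this coding, a respondent who answers only 1s and 2s has a Node 1 outcome of 0 on every item, so Node 1 alone cannot distinguish that pattern from a pattern of all 1s.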

We validated the presence of a mixture by comparing the model fit and examining the estimated latent proportions for each class as well as estimated class memberships of individuals in the mixture model. The comparative model fits are provided in Table 6. As can be seen, the mixture IRTree model produces the smallest DIC value, suggesting a better fit in comparison to the other two single IRTree models. Also, we can observe from the first row in Table 7 that the latent proportions for each class estimated from the mixture model are about 0.3 (ERS class) and 0.7 (ORD class), respectively, roughly agreeing with the proportion of respondents actually classified to each class (as can be seen in the second row in Table 7).

Table 6.

Deviance Information Criterion (DIC) Results of the Mixture, Extreme Response Style (ERS), and Ordinal (ORD) IRTree Models for TIMSS Students Like Learning Mathematics (SLM) Scale Data.

Models	Mixture	ERS	ORD
pD	2636.5	2206.7	1330.3
DIC	15357.7	15441.17	15391.48

Note. The mixture model has the smallest DIC value. TIMSS = Trends in International Mathematics and Science Study; pD = effective number of parameters.
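To see why the mixture model has the smallest DIC despite the largest pD, the values in Table 6 can be decomposed. The sketch below assumes the standard definition of DIC as the posterior mean deviance plus pD; the variable and function names are hypothetical.

```python
# Values copied from Table 6
table6 = {
    "Mixture": {"pD": 2636.5, "DIC": 15357.7},
    "ERS": {"pD": 2206.7, "DIC": 15441.17},
    "ORD": {"pD": 1330.3, "DIC": 15391.48},
}

def mean_deviance(entry):
    """Posterior mean deviance implied by DIC = mean deviance + pD."""
    return entry["DIC"] - entry["pD"]

best = min(table6, key=lambda model: table6[model]["DIC"])
for model, entry in table6.items():
    delta = entry["DIC"] - table6[best]["DIC"]   # DIC difference from the best model
    print(model, round(mean_deviance(entry), 2), round(delta, 2))
```

Under that assumption, the implied mean deviance is lowest for the mixture model (12721.2, compared with 13234.47 for ERS and 14061.18 for ORD), which more than offsets its larger complexity penalty.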

Table 7.

Estimated Latent Proportions and the Number of Respondents Classified to Extreme Response Style (ERS) and Ordinal (ORD) Classes for TIMSS Students Like Learning Mathematics (SLM) Scale Data.

	Class 1 (ERS)	Class 2 (ORD)
Posterior latent proportions	0.320	0.680
The number of classified respondents	227	773

Note. TIMSS = Trends in International Mathematics and Science Study.

To illustrate the implications of applying single or mixture IRTree models to data with a mixture of respondents, we provide some example response patterns and their corresponding trait estimates in Table 8. Note that the ERS and mixture IRTree models provide estimates of both $θ$ and $η$, while the ORD model only provides an estimate of $θ$ (under the ORD model, $η$ estimates could naturally be reported as the mean, 0, and standard deviation, 1, defined by the prior distribution). The mixture model also provides the estimated probability of class membership for each respondent. The estimated probabilities of being in the ORD class for Respondents 4 and 141 (0.866 and 0.871), who selected all 1s or all 4s, indicate that it is highly likely that they chose extreme responses due to their low or high content trait $θ$. The probabilities, however, are not equal to 1 because there is still a possibility that they selected all 1s or 4s due to their ERS. On the other hand, Respondents 244 and 74 have a low probability of belonging to the ORD class (0.004 and 0.032), implying that it is highly likely that their responses are influenced by their ERS. Specifically, selecting options only from both ends (1s and 4s) may reflect a high ERS, while selecting only a middle category on one side of the scale (all 3s) reflects a low ERS (i.e., a tendency to avoid extreme responses). The probabilities for both cases are also not equal to 0, reflecting the possibility that the responses were selected due to their content traits.

Table 8.

Example Response Patterns for TIMSS Students Like Learning Mathematics (SLM) Scale Data and Corresponding Content and Response Style Trait Posterior Means (Parameter Estimates) and Posterior Standard Deviations (PSD), Derived From the Mixture, Extreme Response Style (ERS), and Ordinal (ORD) IRTree Models.

		Mixture			ERS		ORD
ID	Responses	Prob(Class2)	$\hat{θ}$ (PSD( $θ$ ))	$\hat{η}$ (PSD( $η$ ))	$\hat{θ}$ (PSD( $θ$ ))	$\hat{η}$ (PSD( $η$ ))	$\hat{θ}$ (PSD( $θ$ ))
603	221112211	0.713	−0.956 (0.439)	0.096 (0.863)	−1.411 (0.553)	0.159 (0.250)	−0.866 (0.192)
4	111111111	0.866	−1.789 (0.538)	0.196 (1.087)	−1.423 (0.551)	1.477 (0.556)	−1.922 (0.496)
141	444444444	0.871	1.815 (0.555)	0.212 (1.099)	1.297 (0.581)	1.542 (0.557)	1.997 (0.511)
527	141111111	0.188	−1.075 (0.404)	1.253 (0.897)	−1.041 (0.414)	1.477 (0.570)	−1.427 (0.338)
244	144111111	0.004	−0.774 (0.309)	1.541 (0.564)	−0.784 (0.314)	1.510 (0.578)	−1.128 (0.260)
74	333333333	0.032	1.255 (0.600)	−1.133 (0.655)	1.301 (0.577)	−1.263 (0.584)	0.314 (0.189)
90	223333222	0.559	−0.187 (0.190)	−0.526 (1.015)	−0.243 (0.203)	−1.329 (0.586)	−0.159 (0.165)

Note. TIMSS = Trends in International Mathematics and Science Study.

The differences we see in the posterior means and standard deviations across models for the example patterns highlight some important differences between the mixture and single IRTree models. First, in comparing Respondents 4 and 603, we note that the content trait estimates ( $\hat{θ}$ ) under the ERS model show no real difference between the two respondents, as the only information about the content trait is extracted from Node 1, and the two response patterns reflect identical outcomes for all items at Node 1 (i.e., $Y_{i, 4, 1}^{*} = Y_{i, 603, 1}^{*} = 0$ for all $i$). However, under both the ORD and mixture IRTree models, where Nodes 2 and 3 have the potential to contribute information to estimating $θ$, we see differences such that Respondent 4 shows a lower content trait estimate than Respondent 603, as expected. A similar pattern of relationships is observed for Respondents 141 and 74. Thus, unlike the ERS model, the mixture model might be viewed as beneficial in allowing some information about the content trait to be extracted from the selection of extreme response categories.

Another aspect of using the mixture is seen when comparing the $\hat{η}$ for the mixture and ERS models. Note that the ERS trait estimates ( $\hat{η}$ ) under the mixture model are pulled substantially closer to the prior mean (0) relative to the ERS model (except for Respondent 244, for whom the uncertainty of class membership is minimal), a consequence of the greater uncertainty regarding response style under the mixture model. Similarly, the posterior standard deviations of $η$ are also seen to be larger under the mixture representation. These effects are particularly evident for Respondents 4 and 141. Note that while each of these respondents consistently selects the most extreme response option, the possibility under the mixture representation that such response patterns may also be due to extreme levels on the content trait makes the pattern less informative about extreme response style. As a result, we see extreme response style trait estimates ( $\hat{η}$ ) under the mixture that are between the estimates under the single ERS IRTree model and the prior mean (0). Such uncertainty of response process is also reflected in the content trait estimates ( $\hat{θ}$ ) under the mixture model, which lie between the estimates under the single ERS and ORD IRTree models. We also see higher posterior standard deviations under the mixture due to this uncertainty.
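The shrinkage described above can be approximated with the mean and variance of a two-component mixture. The sketch below is a rough illustration rather than the estimation procedure used in the article: it assumes that the posterior of $η$ within the ERS class is close to the posterior under the single ERS model, and that within the ORD class it equals a prior with mean 0 and standard deviation 1 that is independent of $θ$. The function name is hypothetical.

```python
import math

def mixture_eta_moments(p_ord, mean_ers, sd_ers, prior_mean=0.0, prior_sd=1.0):
    """Mean and SD of eta when it follows the ERS posterior with probability
    1 - p_ord and the prior with probability p_ord (law of total variance)."""
    p_ers = 1.0 - p_ord
    mean = p_ers * mean_ers + p_ord * prior_mean
    second_moment = (p_ers * (sd_ers ** 2 + mean_ers ** 2)
                     + p_ord * (prior_sd ** 2 + prior_mean ** 2))
    return mean, math.sqrt(second_moment - mean ** 2)

# Respondent 4 in Table 8: Prob(Class2) = 0.866; ERS-model estimate 1.477 (PSD 0.556)
mean_4, sd_4 = mixture_eta_moments(0.866, 1.477, 0.556)
print(round(mean_4, 3), round(sd_4, 3))
```

For Respondent 4 this approximation gives a mean of about 0.20 and a standard deviation of about 1.08, close to the mixture values of 0.196 and 1.087 reported in Table 8. The standard deviation can exceed the prior value of 1 because the two components have different means.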

This change in the posterior means and standard deviations of $η$ shows how the mixture IRTree model takes into account the uncertainty of the respondent’s class membership. However, in certain instances, despite the presence of the mixture, class membership is quite clear, and this shrinkage effect is minimal. An illustration is seen in comparing Respondents 527 and 244, each of whom selects options from both extremes of the rating scale (both 1 and 4). For Respondent 244, who selects twice from the opposite end of the rating scale (4) relative to the modal response (1), the PSD of $η$ is no larger than under the single ERS model (0.564 vs. 0.578), as the evidence for ERS is strong even in the application of a mixture model. For Respondent 527, who selects only once from the opposite end of the rating scale, the evidence is less strong, and the PSD of $η$ is larger than under the single ERS model (0.897 vs. 0.570).

Figure 4 displays kernel-smoothed functions of the relationship between the posterior standard deviations (i.e., standard errors) of respondents’ person parameters and the respondents’ probability of belonging to the ORD class. The probability of ORD class membership indicates the uncertainty of the respondent’s true class membership. Note that the curve for $θ$ under the mixture model lies quite consistently between those of the single ORD and ERS IRTree models (see left panel): it is closer to the ERS curve when the probability of being in the ORD class is low, closer to the ORD curve when that probability is high, and between the two for intermediate probabilities. Note that for $η$ the PSD curve under the ORD model is drawn as a straight line at the value of 1, the standard deviation of the prior distribution of $η$. The mixture model produces progressively larger PSDs of $η$ as the probability of belonging to the ORD class increases, showing how the uncertainty of the response style estimates is substantially affected by the reduced certainty of membership in the ERS class. The single ERS IRTree model, in contrast, does not consider such uncertainty, and as a result reports $η$ estimates with inflated levels of precision, as was observed in the simulation study.
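The article does not state which kernel smoother was used for these curves. As one possible sketch, a Nadaraya-Watson estimator with a Gaussian kernel computes a weighted average of the PSDs at each grid point; the function name, bandwidth, and example values below are hypothetical.

```python
import math

def kernel_smooth(x, y, grid, bandwidth):
    """Gaussian-kernel weighted average of y at each grid point (Nadaraya-Watson)."""
    smoothed = []
    for g in grid:
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        total = sum(weights)
        smoothed.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return smoothed

# Hypothetical probabilities of ORD class membership and PSDs of eta
prob_ord = [0.05, 0.20, 0.40, 0.60, 0.80, 0.95]
psd_eta = [0.57, 0.70, 0.88, 0.98, 1.05, 1.09]
grid = [0.1, 0.5, 0.9]
print(kernel_smooth(prob_ord, psd_eta, grid, bandwidth=0.15))
```

A smaller bandwidth follows the individual points more closely, while a larger one produces a flatter curve. Grid points very far from all observations should be avoided, since every weight then underflows to zero.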

In summary, the pattern of results we observe in the real data resembles quite closely the effects seen in the simulation. The assumption of a single class IRTree model, whether an ERS IRTree or an ORD IRTree, overstates the precision of the estimated respondent traits when both response processes may be present in the respondent population. The use of a single IRTree model should thus be supported by evidence of its validity across all respondents; our results suggest that a mixture may likely be present, in which case model estimates should be sensitive to the unknown class to which the respondent belongs.

Conclusions

IRTree models have become a popular way of measuring response styles for self-report rating scale assessments. Such models associate a response process (represented in the form of a decision-making tree) with content and response style traits that underlie the different decision-making nodes within the tree. The separation of traits across nodes enhances the ability to measure response style traits but comes at the cost of assuming the same response process (associated with the same traits across respective steps in the process) for all respondents.

In this article, we demonstrate through a real data application the likelihood that no single IRTree may best characterize all respondents, and that a better representation of the response process may be achieved by allowing a mixture of trees. We demonstrate this possibility in the context of a commonly used IRTree model for extreme response style by also considering an alternative IRTree model that assumes the content trait is relevant at all decision nodes. The better comparative fit of the mixture model supports the presence of such a mixture. Some of the more immediate implications of the mixture relate to the precision with which the content and response style traits are assumed to be measured. We suggest that the likely presence of a mixture introduces what should be viewed as the core challenge in attempts to measure response styles such as ERS, namely the uncertainty that exists as to whether the selection of an extreme response category is due to the content trait, a response style, or some combination of these factors. Through a mixture IRTree model approach, it becomes possible to see how the uncertainty of class membership impacts the estimation of both the content and response style traits. As expected, the extreme response style estimates tend to show larger posterior standard deviations (i.e., standard errors of estimates) when allowing for a mixture. We contend that this uncertainty, already an implicit part of both mixture IRT (von Davier & Rost, 1995) and multidimensional IRT (Bolt & Johnson, 2009) approaches to measuring response style, is important to consider when psychometric models are used to measure response styles. As Adams et al. (2019) note, attending to this uncertainty has various practical implications related to the design of survey instruments, in particular, the value of having psychometrically heterogeneous items, as well as the relevance of having external criteria (e.g., anchoring vignettes, content-heterogeneous items) to more accurately measure response styles.

Although not explored in this article, a mixture tree representation arguably brings the IRTree methodology closer to that observed with mixture and multidimensional IRT models of response style. Future comparative studies of response style methods in this context would be useful. Along these lines, Meiser et al. (2019) also demonstrate the possibility that the outcome at a single decision tree node might be influenced simultaneously by both a content trait and a response style trait. In a similar way, such a model might also be anticipated to return reduced precision in the estimated response style trait due to the less certain roles the content and response style traits play in response category selection.

Of course, despite our use of a mixture model in this article, it is also conceivable that the causes of extreme responses vary across items within a single respondent. For example, a respondent may select an extreme category as a result of the content trait for some items and as a result of a response style for others. This possibility was not considered in this article. It may be difficult to model such behavior unless a test is sufficiently long. Another issue not considered in this article concerns the potential for bias in the estimates of latent traits when failing to account for a true mixture. It is not difficult to envision scenarios in which not only is precision misestimated, but the trait estimates themselves become biased when a mixture is present but not accounted for. We leave such investigation to future study.

In conclusion, we suggest that regardless of the methodology chosen for modeling response style, more attention should be devoted not only to the point estimates of content and response style traits but also to the precision of those estimates. Such attention can make apparent where response styles can and cannot be successfully measured and will also make more apparent how different approaches to measuring response style may differ.

Supplemental Material

FigureA1 – Supplemental material for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty

Supplemental material, FigureA1 for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty by Nana Kim and Daniel M. Bolt in Educational and Psychological Measurement

FigureA2 – Supplemental material for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty

Supplemental material, FigureA2 for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty by Nana Kim and Daniel M. Bolt in Educational and Psychological Measurement

Mixturetree_AppendixA – Supplemental material for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty

Supplemental material, Mixturetree_AppendixA for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty by Nana Kim and Daniel M. Bolt in Educational and Psychological Measurement

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs: Nana Kim https://orcid.org/0000-0003-4416-2012

Daniel M. Bolt https://orcid.org/0000-0001-7593-4439

Supplemental Material: Supplemental material is available for this article online.

References

Adams D. J., Bolt D. M., Deng S., Smith S. S., Baker T. B. (2019). Using multidimensional item response theory to evaluate how response styles impact measurement. British Journal of Mathematical and Statistical Psychology, 72(3), 466-485. https://doi.org/10.1111/bmsp.12169
Austin E. J., Deary I. J., Egan V. (2006). Individual differences in response scale use: Mixed Rasch modelling of responses to NEO-FFI items. Personality and Individual Differences, 40(6), 1235-1245. https://doi.org/10.1016/j.paid.2005.10.018
Böckenholt U. (2012). Modeling multiple response processes in judgment and choice. Psychological Methods, 17(4), 665-678. https://doi.org/10.1037/a0028111
Böckenholt U., Meiser T. (2017). Response style analysis with threshold and multi-process IRT models: A review and tutorial. British Journal of Mathematical and Statistical Psychology, 70(1), 159-181. https://doi.org/10.1111/bmsp.12086
Bolt D. M., Johnson T. R. (2009). Applications of a MIRT model to self-report measures: Addressing score bias and DIF due to individual differences in response style. Applied Psychological Measurement, 33(5), 335-352. https://doi.org/10.1177/0146621608329891
Chen C., Lee S., Stevenson H. W. (1995). Response style and cross-cultural comparisons of rating scales among East Asian and North American students. Psychological Science, 6(3), 170-175. https://doi.org/10.1111/j.1467-9280.1995.tb00327.x
De Boeck P., Partchev I. (2012). IRTrees: Tree-based item response models of the GLMM family. Journal of Statistical Software, 48(1), 1-28. https://doi.org/10.18637/jss.v048.c01
De Jong M. G., Steenkamp J.-B. E. M., Fox J.-P., Baumgartner H. (2008). Using item response theory to measure extreme response style in marketing research: A global investigation. Journal of Marketing Research, 45(1), 104-115. https://doi.org/10.1509/jmkr.45.1.104
Greenleaf E. A. (1992). Measuring extreme response style. Public Opinion Quarterly, 56(3), 328-351. https://doi.org/10.1086/269326
Jeon M., De Boeck P. (2016). A generalized item response tree model for psychological assessments. Behavior Research Methods, 48(3), 1070-1085. https://doi.org/10.3758/s13428-015-0631-y
Johnson T., Kulesa P., Cho Y. I., Shavitt S. (2005). The relation between culture and response styles: Evidence from 19 countries. Journal of Cross-Cultural Psychology, 36(2), 264-277. https://doi.org/10.1177/0022022104272905
Kellner K. (2019). jagsUI: A wrapper around “rjags” to streamline “JAGS” analyses (R package Version 1.5.1). https://github.com/kenkellner/jagsUI
Khorramdel L., von Davier M. (2014). Measuring response styles across the big five: A multiscale extension of an approach using multinomial processing trees. Multivariate Behavioral Research, 49(2), 161-177. https://doi.org/10.1080/00273171.2013.866536
Khorramdel L., von Davier M., Pokropek A. (2019). Combining mixture distribution and multidimensional IRTree models for the measurement of extreme response styles. British Journal of Mathematical and Statistical Psychology, 72(3), 538-559. https://doi.org/10.1111/bmsp.12179
Meisenberg G., Williams A. (2008). Are acquiescent and extreme response styles related to low intelligence and education? Personality and Individual Differences, 44(7), 1539-1550. https://doi.org/10.1016/j.paid.2008.01.010
Meiser T., Plieninger H., Henninger M. (2019). IRTree models with ordinal and multidimensional decision nodes for response styles and trait-based rating responses. British Journal of Mathematical and Statistical Psychology, 72(3), 501-516. https://doi.org/10.1111/bmsp.12158
Naemi B. D., Beal D. J., Payne S. C. (2009). Personality predictors of extreme response style. Journal of Personality, 77(1), 261-286. https://doi.org/10.1111/j.1467-6494.2008.00545.x
Park M., Wu A. D. (2019). Item response tree models to investigate acquiescence and extreme response styles in Likert-type rating scales. Educational and Psychological Measurement, 79(5), 911-930. https://doi.org/10.1177/0013164419829855
Paulhus D. L. (1991). Measurement and control of response bias. In Robinson J. P., Shaver P. R., Wrightsman L. S. (Eds.), Measures of personality and social psychological attitudes (Vol. 1, pp. 17-59). Academic Press. https://doi.org/10.1016/B978-0-12-590241-0.50006-X
Plieninger H., Meiser T. (2014). Validity of multiprocess IRT models for separating content and response styles. Educational and Psychological Measurement, 74(5), 875-899. https://doi.org/10.1177/0013164413514998
Plummer M. (2017). JAGS version 4.3.0 user manual [Computer software manual]. sourceforge.net/projects/mcmc-jags/files/Manuals/4.x
Tijmstra J., Bolsinova M., Jeon M. (2018). General mixture item response models with different item response structures: Exposition with an application to Likert scales. Behavior Research Methods, 50(6), 2325-2344. https://doi.org/10.3758/s13428-017-0997-0
Van Vaerenbergh Y., Thomas T. D. (2013). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25(2), 195-217. https://doi.org/10.1093/ijpor/eds021
von Davier M., Rost J. (1995). Polytomous mixed Rasch models. In Fischer G. H., Molenaar I. W. (Eds.), Rasch models: Foundations, recent developments and applications (pp. 371-379). Springer. https://doi.org/10.1007/978-1-4612-4230-7_20

Supplementary Materials

FigureA1 – Supplemental material for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty (jpg, 312.6 KB)

FigureA2 – Supplemental material for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty (jpg, 162.3 KB)

Mixturetree_AppendixA – Supplemental material for A Mixture IRTree Model for Extreme Response Style: Accounting for Response Process Uncertainty (pdf, 91.4 KB)
