Psychometrika. 2025 Aug 11;90(5):1622–1650. doi: 10.1017/psy.2025.10038

Explaining Performance Gaps with Problem-Solving Process Data via Latent Class Mediation Analysis

Sunbeom Kwon, Susu Zhang
PMCID: PMC12805201  PMID: 40785085

Abstract

Process data, in particular log data collected from a computerized test, documents the sequence of actions performed by an examinee in pursuit of solving a problem, affording an opportunity to understand the test-taking behavioral patterns that account for demographic group differences in key outcomes of interest, such as the final score on a cognitive item. Addressing this aim, this article proposes a latent class mediation analysis procedure. Using continuous process features extracted from action sequence data as indicators, latent classes underlying the test-taking behavior are identified in a latent class mediation model, where an examinee’s nominal latent class membership enters as the mediator between the observed grouping and outcome variables. A headlong search algorithm for selecting the subset of process features that maximizes the total indirect effect of the latent class mediator is implemented. The proposed procedure is validated with a series of simulations. An application to a large-scale assessment highlights how the proposed method can be used to explain performance gaps between students with learning disabilities and their typically developing peers on the National Assessment of Educational Progress (NAEP) math assessment.

Keywords: large-scale assessment, latent class analysis, mediation analysis, process data, variable selection

1. Introduction

Using computers as assessment delivery platforms allowed the collection of process data, which is computer log data that documents an examinee’s sequence of actions (e.g., clicks, keystrokes, and revisits) while solving a task (Bergner & von Davier, 2019). Typically, the sequence of actions of an examinee on a particular item is stored as a tuple of nominal elements, each representing a specific action. For example, an action sequence on a constructed response item might be: (Enter_Item, Open_Scratchwork, Draw, Clear, Zoom_In, Type_7.35, Exit_Item, Enter_Item, Type_73.5, Exit_Item). It shows us what tools the examinee utilized, what answers the examinee typed in before submitting the final response, and how many times the examinee visited this item page on the computer. Such data can preserve valuable information on how examinees arrived at their outcome, thus providing information beyond response data (i.e., correct/incorrect). A rich body of literature demonstrated the utility of process data for common measurement and educational tasks, for instance, to build measurement models characterizing examinee and item characteristics (e.g., Chen, 2020; Fang & Ying, 2020; LaMar, 2018; Xiao & Liu, 2024; Zhan & Qiao, 2022) and improve proficiency scoring (e.g., He et al., 2023; Zhang et al., 2022), to identify behavioral prototypes or stages of problem-solving (e.g., Eichmann et al., 2020; Hao & Mislevy, 2019; He et al., 2019, 2022; Tang, 2023; Ulitzsch et al., 2022; Wang et al., 2020), and to identify behavioral characteristics that predict final performance (e.g., Greiff et al., 2015; He & von Davier, 2016; Qiao & Jiao, 2018; Ulitzsch et al., 2021, 2022).

The current article focuses on using process data to understand problem-solving patterns that account for group differences in test scores. Test scores play a vital role in many key decisions, both for individual candidates (e.g., in college admissions, licensing, and recruitment) and for educators and policymakers using formative and large-scale assessment data to guide instruction and policy development. Understanding demographic subgroup differences in test-taking behavior and performance is critical for mitigating potential test biases and closing achievement gaps. An example is the achievement gap in mathematics between U.S. students from underrepresented groups, such as racial minority groups and students with disabilities, and their peers, which has been persistently reported based on the National Assessment of Educational Progress (NAEP) over the years (U.S. Department of Education. Institute of Education Sciences, National Center for Education Statistics, 2022). While the NAEP assessments are designed to measure student performance instead of to explain the differences, there is growing interest in the potential utility of test-taking process data, coupled with student background and proficiency information, to provide additional insights into how problem-solving behavior (e.g., test-taking strategies, misconceptions, use of accommodation/universal design tools) explains performance differences across demographic groups. This is exemplified by the release of the restricted-use process data from select blocks of the NAEP 2017 Grade 8 and Grade 4 math assessments (NCES, 2020), as well as recent Institute of Education Sciences (IES) calls for proposals on the use of NAEP process data to understand the link between test-taking behavior and mathematics performance for learners with disabilities, the goal being to gather evidence that ultimately contributes to the improvement of learning of these students from special populations.

Indeed, many previous studies have shown that analyzing process data can aid in understanding subgroup differences (e.g., He & von Davier, 2016; Liao et al., 2019) and in explaining differences in sequential patterns between correct and incorrect problem-solving (e.g., Greiff et al., 2015; He & von Davier, 2016; Ulitzsch et al., 2022). While these findings provide supporting evidence for the potential use of process data to understand subgroup differences in item performance, a limitation of prior approaches is that the relationship between action sequence patterns and demographic backgrounds (e.g., Eichmann et al., 2020) and the relationship between action sequence patterns and the final response (e.g., Eichmann et al., 2020; Gao et al., 2022; He et al., 2023) are studied separately. This does not directly address the question of what types of sequential patterns contribute to group differences. Addressing this question requires modeling problem-solving patterns as a potential mediator that explains group differences in the final response. To date, no model-based approach directly addresses this need. We propose a latent class mediation analysis (LCMA) procedure to address this question. Using continuous process features extracted from action sequence data (e.g., features extracted using multidimensional scaling [MDS]) as indicators, latent classes underlying the test-taking behavior are identified in a latent class mediation model, where an examinee’s nominal latent class membership enters as the mediator between the observed grouping and outcome variables.

In traditional latent variable mediation analysis, the mediator is a continuous latent construct that mediates the predictor’s effect on the outcome in a linear fashion. Two methods can be used to estimate the mediation effect: the difference in coefficients method and the product of coefficients method. In the difference in coefficients method, the outcome is regressed first on the predictor alone and then on both the predictor and the mediator; the indirect effect is the difference between the two coefficients of the predictor. In the product of coefficients method, the mediator is regressed on the predictor, the outcome is regressed on the predictor and mediator, and the indirect effect is the product of the coefficients associated with the predictor–mediator and mediator–outcome relationships. By contrast, in LCMA, the mediator is a discrete grouping variable whose membership probabilities change with the predictor and generate stepwise changes in the outcome. When both the mediator and outcome are continuous, the total effect of the predictor can be additively decomposed into direct and indirect effects. However, this additive decomposition is not straightforward when the mediator is discrete, and the traditional methods for identifying indirect effects are no longer applicable (Sint et al., 2021). A counterfactual framework (Pearl, 2010; Robins & Greenland, 1992) resolves these issues by defining the direct effect (DE) and the total indirect effect (TIE) for discrete mediators. The TIE summarizes the mediation effect of a latent class mediator as the expected outcome difference in a focal group when class membership changes from what it would be under the focal group to what it would be under a reference group.
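To make the counterfactual definitions of DE and TIE concrete, the following minimal sketch computes them for a hypothetical three-class discrete mediator; all probabilities are invented for illustration and are not taken from the article:

```python
# Toy counterfactual mediation with a discrete (3-class) mediator M.
# P(M = m | G = g): class-membership probabilities under each group.
p_m = {
    0: [0.6, 0.3, 0.1],  # reference group (G = 0)
    1: [0.3, 0.4, 0.3],  # focal group (G = 1)
}
# P(Y = 1 | G = g, M = m): outcome probabilities by group and class.
p_y = {
    0: [0.8, 0.5, 0.2],
    1: [0.7, 0.4, 0.15],
}

def expected_outcome(g, g_star):
    """E[Y(g, M(g*))]: expected outcome under group g with the class
    distribution the mediator would have under group g*."""
    return sum(py * pm for py, pm in zip(p_y[g], p_m[g_star]))

DE = expected_outcome(1, 0) - expected_outcome(0, 0)   # direct effect
TIE = expected_outcome(1, 1) - expected_outcome(1, 0)  # total indirect effect
TE = expected_outcome(1, 1) - expected_outcome(0, 0)   # total effect
```

By construction, DE + TIE equals the total effect, which is exactly the additive decomposition that the counterfactual framework restores for discrete mediators.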

One difficulty in analyzing process data arises from the nonstandard format of response processes. That is, the length of the action sequence varies across examinees, and the sequence is coded as nominal elements, making traditional analyses, such as generalized linear models, inapplicable to process data. To address this unstructured data format, we work with features extracted from the process data. One example of a process feature extraction method is MDS (Borg & Groenen, 2005; Tang et al., 2020). The extracted MDS features are in a rectangular data format and scaled on a continuum while retaining the information in the original action sequences, making them suitable for the proposed LCMA procedure.
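MDS operates on pairwise dissimilarities between observations. As a toy illustration of one way a dissimilarity between two action sequences could be computed (the measure actually used for the NAEP data is the one described in the article's Appendix A, so this normalized edit distance is only an illustrative stand-in):

```python
def edit_distance(a, b):
    """Levenshtein distance between two action sequences
    (tuples of nominal action labels)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            prev, dp[j] = dp[j], min(dp[j - 1] + 1,  # insertion
                                     dp[j] + 1,      # deletion
                                     prev + cost)    # substitution
    return dp[n]

def dissimilarity(a, b):
    """Edit distance normalized by the longer sequence length."""
    return edit_distance(a, b) / max(len(a), len(b), 1)

seq1 = ("Open_Scratchwork", "Draw", "Clear", "Type_7.35")
seq2 = ("Open_Scratchwork", "Draw", "Type_73.5")
d = dissimilarity(seq1, seq2)  # one deletion + one substitution, over length 4
```

An N-by-N matrix of such dissimilarities is what a method like MDS would then reduce to a rectangular matrix of continuous features.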

Another challenge in process data analysis is that the features extracted from the process data are often high-dimensional. To address this issue, we further reduce the dimension of the process features via model-based clustering, that is, latent class analysis. Latent class analysis (Banfield & Raftery, 1993; Lazarsfeld, 1950; Lazarsfeld, 1968; Oberski, 2016; Vermunt & Magidson, 2002) can be used to identify latent nominal variables through a set of observed indicators. Clustering is often used to explore common sequential patterns and to link them to variables of interest, such as final performance and demographics (e.g., Gao et al., 2022; Hao & Mislevy, 2019; He et al., 2023). Here, we use the term latent class to refer to the latent profile or Gaussian mixture component underlying continuous indicators. Identifying latent classes in process data can classify examinees into subgroups based on their test-taking behavior and reveal individual differences in sequential patterns (e.g., Bergner & von Davier, 2019; Welling et al., 2024).

These latent classes may also help explain performance gaps, such as those observed on the NAEP Math Assessment between students with learning disabilities (LD) and their peers (Judge & Watson, 2011). This can be achieved by considering the latent class variable as a mediator explaining the effect of a predictor on the outcome (e.g., Muthén, 2011; Sint et al., 2021). Literature discussing latent classes as potential mediators has primarily focused on latent class mediators with discrete indicators (e.g., Hsiao et al., 2021; Muthén, 2011), and LCMA with continuous indicators has received little methodological investigation (Hsiao et al., 2021). Literature extending latent class analysis with continuous indicators is limited to latent class models with either covariates (Murphy & Murphy, 2020; Vermunt & Magidson, 2002) or distal outcomes (Dziak et al., 2016; Vermunt, 2010). In this study, we extend latent class analysis with continuous indicators (e.g., process features) to explain the effect of a binary predictor on a binary outcome through a nominal latent class mediator. An Expectation-Maximization (EM) algorithm is implemented for parameter estimation.

Extracted process features may contain noise or irrelevant information, which can weaken the generalizability of results in latent class mediation models. Removing noisy indicators can enhance classification accuracy and parameter precision in latent class analysis (Dean & Raftery, 2010). To address this, variable selection methods, such as the headlong search algorithm, have been proposed to identify the optimal set of indicators. In this study, a headlong search algorithm, which is generally used to explore the model space and select clustering variables, was used to select process features that maximize the TIE of the latent class mediator in explaining group differences in outcomes.

In summary, we propose an LCMA procedure for 1) identifying the latent classes underlying the distribution of process features, 2) finding the set of process features that can best explain the effect of the observed group membership on the outcome, and 3) assessing the indirect effect of the group membership on the outcome through the nominal latent class mediator. A headlong search algorithm is used to find the set of process features that best explains the group difference in performance, by finding the optimal subset of process features that maximizes the TIE. The proposed framework is intended primarily as an exploratory tool for hypothesis generation from complex process data, rather than a confirmatory tool for drawing causal conclusions about test-taking behaviors.

The rest of the article is structured as follows. The next section begins with a motivating example based on one item from the NAEP 2017 Grade 8 Math Assessment. Then, the latent class mediation model and the parameter estimation algorithms are introduced. It is followed by the headlong search algorithm for selecting the optimal set of process features. In a simulation study, the performance of the proposed analysis procedure is evaluated in terms of classification accuracy and parameter estimation accuracy. This is followed by an empirical application of the procedure on the NAEP Math Assessment item from the motivating example. Lastly, the significance and limitations of the current study are discussed.

2. LCMA

2.1. Motivating example

As a motivating example, we consider one item available from the restricted-use response and process data in the digital version of the 2017 NAEP Grade 8 Math Assessment. NAEP adopts a probabilistic sampling approach to select schools and students to represent the diverse student population in the United States. The data set consisted of Inline graphic nationally sampled students who were administered a 15-item block (block 1717MA2N03CLID30EX) on the eNAEP, which was delivered on a Surface tablet with a stylus. The eNAEP was also embedded with a set of universal design tools, including scratchwork (where students could draw and erase), zooming, color theme change, an equation editor, text-to-speech (TTS), and highlighting. Students were allowed to revisit an item multiple times during the test, and each enter/exit of the item page was recorded. For this block, students were not allowed to use a calculator. The data set included students’ ordinal scores on the 15 math items, as well as their log data on the math block, which captured student interactions with the eNAEP platform, such as item visits, tool usage, and response entries to the 15 multiple choice, constructed response, or drag-and-drop items. Students, teachers, and schools also completed a series of survey questionnaires, which contained information on students’ disability status and accommodations on the test.

In the NAEP Math Assessment, students with LD consistently underperformed compared to their typically developing (TD) peers (Judge & Watson, 2011). For the current example, we aim to identify test-taking process patterns that can explain this performance gap between LD and TD learners by focusing on one item on the multiplication of decimals (VH336968) from this block (Figure 1). The item asked students to find the solution to Inline graphic without using a calculator, and the correct response was 7.35. This item was chosen because it was a constructed response item allowing various responses and a relatively computationally involved task, in which students might use a tool (i.e., scratchwork) to facilitate computation.

Figure 1. Item VH336968 from the 2017 NAEP Grade 8 Math Assessment.

Note: https://www.nationsreportcard.gov/nqt/.

The NAEP restricted-use log data recorded each response entry to a constructed response item, from placing the cursor in the textbox to leaving the textbox, as one event. The log data thus contained the sequence of interactions of a student on the item, including the various constructed response entries (a student can have multiple entries if they made answer changes throughout the test), tool usage, and item revisits (Exit_Item, Enter_Item in the middle of the action sequence). In the data preprocessing stage, we removed system events from the log data and recoded repeated actions, such as consecutive draws/erases for each stroke using the scratchwork tool, into a single action. The first and the last actions (Enter_Item, Exit_Item) were discarded, as these were common to all students’ action sequences. We masked the final responses to ensure that the action sequence does not directly predict the final outcome, and the answer entries were recoded into two categories. The “735” category includes answers containing the number sequence 7, 3, and 5, with the decimal place masked. The “non-735” category includes responses that do not include the numbers 7, 3, or 5. A preliminary analysis revealed a common error in which many test takers placed the decimal point incorrectly, leading to incorrect answers of 735 and 73.5. Recoding the answers in this way masked the final responses while retaining information about the types of mathematical concepts the test takers struggled to demonstrate. Students with disabilities other than LD (e.g., autism) and those who received the extended time accommodation (90-minute version) were excluded from the analysis. The sample size of the LD group was Inline graphic . Two thousand five hundred students from the TD group were randomly selected to balance the sample sizes between the two groups and reduce computational demand. The sample size of the final data set used in the analysis was thus Inline graphic .
Descriptive statistics of the sample are given in Table 1. The marginal proportion of correct responses was Inline graphic for the TD group and Inline graphic for the LD group.

Table 1.

Descriptive statistics of the NAEP Math Assessment Item VH336968

                               LD       TD
Response time (secs)         102.55    88.03
Male, %                          64       49
Age                           14.57    14.38
White, %                         47       47
African American, %              11       14
Hispanic, %                      25       24
Other, %                         16       14
ELL, %                           13        4

Disability severity (LD group only)
   Profound, %                    3
   Moderate, %                   29
   Mild, %                       59
   Omitted, %                     8

                               LD       TD
Breaks during test, %             7        0
Cueing, %                         3        0
Bilingual dictionary, %           1        0
Preferential seating, %           5        0
Separate sessions, %             13        0

Note: ELL = English language learners. The number of LD students was Inline graphic and the number of TD students was Inline graphic . The sample sizes are rounded to the closest 10.

Source: U.S. Department of Education, National Center for Education Statistics, “Response Process Data from the NAEP 2017 Grade 8 Mathematics Assessment.”

To transform the process data into continuous features suitable for the subsequent analysis while preserving the original sequential pattern information, MDS was applied for feature extraction. MDS is a dimension reduction method that extracts latent features based on a pairwise dissimilarity measure between two observations. The technical details of extracting MDS features from the action sequence process data are summarized in Appendix A. The proposed LCMA procedure makes a multivariate normal distributional assumption on the indicators of the latent class variable. The process features extracted from MDS are scaled on a continuum and are thus suitable for the proposed analysis. Note, however, that our proposed method is not limited to process features from MDS. Any feature extraction method that transforms the original action sequence data into a rectangular and continuous data format while preserving the information in examinees’ problem-solving behavior could serve as a viable alternative to MDS. Based on a five-fold cross-validation, $K$ total features were extracted. The cross-validation was run on the dissimilarity matrix of the action sequence data using the ProcData R package (Tang et al., 2021). The dissimilarity matrix of the action sequence data was obtained as described in Appendix A. The $K$ process features $\mathbf{T}$ extracted using MDS were then used as the candidate continuous indicators in the LCMA.

The LCMA aims to find the latent classes underlying the process features that can explain the gap in the probability of a correct response between the LD and TD students. In the latent class mediation model, the predictor $G$ was the binary disability membership variable, where $G = 0$ if the student belongs to the TD group and $G = 1$ if the student belongs to the LD group. The outcome $Y$ was the binary score on the multiplication item, with $Y = 0$ indicating an incorrect response and $Y = 1$ indicating a correct response (i.e., answers equivalent to 7.35). The English language learner (ELL) variable was included as a covariate $X$ to control for potential confounding effects between the predictor and mediator, as well as between the mediator and outcome. Here, $X = 1$ indicates an ELL, and $X = 0$ indicates otherwise. The $K$ process features, $\mathbf{T}$, were the candidate indicators of the latent class membership variable ($U$) that mediates the relationship between $G$ and $Y$. The proposed LCMA procedure can be applied to find the optimal subset of process features maximizing the TIE of the latent class mediator between the predictor and the outcome. We next articulate the model formulation as well as the technical details.

2.2. Latent class mediation model

The latent class part of the model assumes that a nominal latent class variable $U_i \in \{1, \dots, L\}$ for the $N$ observations exists underlying the distribution of the relevant process features $\mathbf{T}_i$. The set of relevant features $\mathbf{T}_i$ is assumed to follow a mixture of multivariate normal distributions with class-specific mean $\boldsymbol{\mu}_l$ and covariance $\boldsymbol{\Sigma}_l$:

$$\mathbf{T}_i \mid (U_i = l) \sim \mathcal{N}\big(\boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l\big), \qquad l = 1, \dots, L. \tag{1}$$

Equation (1) implies that the distribution of an examinee’s process features, which contain information on their sequential patterns in pursuit of solving the item, differs across the latent classes. For a randomly sampled examinee, the probability density function of $\mathbf{T}_i$ given $\boldsymbol{\pi} = (\pi_1, \dots, \pi_L)$, $\{\boldsymbol{\mu}_l\}$, and $\{\boldsymbol{\Sigma}_l\}$ is

$$f(\mathbf{t}_i \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{l=1}^{L} \pi_l\, \phi(\mathbf{t}_i; \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l), \tag{2}$$

where $L$ is the number of latent classes and $\pi_l$ is the probability of belonging to latent class $l$. Here, $\phi(\cdot; \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)$ denotes the class-specific multivariate normal density.

The effect of the binary group membership variable $G_i$ on the latent class $U_i$, controlling for the covariate $X_i$, can be described by the multinomial logistic regression model in Equation (3).

$$P(U_i = l \mid G_i, X_i) = \frac{\exp(\beta_{0l} + \beta_{1l} G_i + \beta_{2l} X_i)}{\sum_{m=1}^{L} \exp(\beta_{0m} + \beta_{1m} G_i + \beta_{2m} X_i)}, \qquad l = 1, \dots, L. \tag{3}$$

The regression coefficients $\beta_{0l}$, $\beta_{1l}$, and $\beta_{2l}$ are the class-specific intercept and slopes for class $l$. For model identification, we set the intercept and slopes of the first class to zero, $\beta_{01} = \beta_{11} = \beta_{21} = 0$. Equation (3) implies that, for an examinee $i$, the membership probability associated with the problem-solving latent class, $P(U_i = l \mid G_i, X_i)$, depends on the observed group membership $G_i$, controlling for the covariate $X_i$. When the predictor $G$ does not represent a randomized intervention, associations among variables may be influenced by confounding factors. In such cases, it is common practice to adjust for potential confounders of the predictor–mediator ($G \rightarrow U$) and mediator–outcome ($U \rightarrow Y$) associations by including relevant covariates in the model (Muthén, 2011; Preacher, 2015; Valente et al., 2017; Witkiewitz et al., 2018). This approach helps to reduce bias in the estimated associations.
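A minimal sketch of the class-membership model in Equation (3), with the first class as the reference and invented coefficient values (none of these numbers come from the article):

```python
import math

# (beta0, betaG, betaX) per class; class 1 is fixed at zero for
# identification, so only classes 2..L carry free parameters.
betas = [
    (0.0, 0.0, 0.0),    # class 1 (reference)
    (0.5, -1.2, 0.3),   # class 2 (illustrative values)
    (-0.4, 0.8, -0.1),  # class 3 (illustrative values)
]

def class_probs(g, x):
    """P(U = l | G = g, X = x) via the softmax in Equation (3)."""
    scores = [math.exp(b0 + b1 * g + b2 * x) for b0, b1, b2 in betas]
    total = sum(scores)
    return [s / total for s in scores]

probs_td = class_probs(g=0, x=0)  # membership probabilities, TD non-ELL
probs_ld = class_probs(g=1, x=0)  # membership probabilities, LD non-ELL
```

With these made-up coefficients, the positive group slope for class 3 shifts probability mass toward that class in the focal group.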

Given the group membership $G_i$ and the latent class membership $U_i$, examinee $i$’s outcome $Y_i$ is modeled via a logistic model, controlling for the covariate $X_i$,

$$P(Y_i = 1 \mid G_i, U_i = l, X_i) = \frac{\exp(\gamma_{0l} + \gamma_1 G_i + \gamma_2 X_i)}{1 + \exp(\gamma_{0l} + \gamma_1 G_i + \gamma_2 X_i)}. \tag{4}$$

Each latent class of the problem-solving process is associated with a class-specific intercept ($\gamma_{0l}$). The intercept vector $\boldsymbol{\gamma}_0 = (\gamma_{01}, \dots, \gamma_{0L})$, together with the multinomial logistic coefficients in Equation (3), is associated with the indirect effect of the group membership $G$ on the outcome $Y$, mediated by the nominal latent class $U$. The coefficient $\gamma_1$ is associated with the direct effect of $G$ on $Y$, after controlling for $U$ and the covariate $X$. Figure 2 shows the structure of the latent class mediation model using process data.
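A matching sketch of the outcome model in Equation (4), again with invented parameter values:

```python
import math

gamma0 = [1.0, -0.2, -1.5]     # class-specific intercepts (illustrative)
gamma_g, gamma_x = -0.3, -0.5  # direct-effect and covariate slopes (illustrative)

def p_correct(g, l, x):
    """P(Y = 1 | G = g, U = l, X = x), the logistic model in Equation (4);
    l is 1-based, matching the class labels."""
    eta = gamma0[l - 1] + gamma_g * g + gamma_x * x
    return 1.0 / (1.0 + math.exp(-eta))

p1 = p_correct(g=0, l=1, x=0)  # a class with a high success intercept
p3 = p_correct(g=0, l=3, x=0)  # a class with a low success intercept
```

The class-specific intercepts are what let class membership produce stepwise changes in the correct-response probability.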

Figure 2. Latent class mediation model.

Note: $T_k$ represents a process feature, $U$ is the latent class variable, $G$ is a binary group membership (e.g., LD = 1 versus TD = 0), $Y$ is a binary outcome (e.g., correct = 1 versus incorrect = 0), and $X$ is a covariate. Solid arrows indicate predictive relationships: $G$ and $X$ predict $U$, while $U$ and $X$ predict $Y$. The dashed arrows indicate that the $T_k$s serve as measurement indicators of $U$.

The likelihood of the model parameters given the observed group memberships $\mathbf{G} = (G_1, \dots, G_N)$, the final outcomes $\mathbf{Y} = (Y_1, \dots, Y_N)$, the process features $\mathbf{T} = (\mathbf{T}_1, \dots, \mathbf{T}_N)$, and the covariate $\mathbf{X} = (X_1, \dots, X_N)$ is

$$L(\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \mathbf{G}, \mathbf{Y}, \mathbf{T}, \mathbf{X}) = \prod_{i=1}^{N} \sum_{l=1}^{L} P(U_i = l \mid G_i, X_i)\, \phi(\mathbf{t}_i; \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)\, P(Y_i \mid G_i, U_i = l, X_i). \tag{5}$$

Note that the process features are assumed to be independent of the final outcome given the latent class membership. That is, the latent class is assumed to fully capture the relationship between the process features and the outcome, given the covariates.

The number of latent classes ($L$) is determined by fitting the latent class model using only the process features and comparing the Bayesian information criterion (BIC),

$$\mathrm{BIC} = 2 \log \hat{L} - p \log N, \tag{6}$$

where p is the number of parameters, and N is the sample size. BIC is known to be consistent in choosing the number of classes in a mixture model (Keribin, 1998).
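Under this convention (larger BIC is better, as in the mclust package), selecting the number of classes is an argmax over candidate fits; the log-likelihoods and parameter counts below are placeholders, not values from the article:

```python
import math

def bic(loglik, p, n):
    """BIC as in Equation (6): larger is better under this convention."""
    return 2 * loglik - p * math.log(n)

# Placeholder fits: (number of classes, maximized log-likelihood, #parameters).
candidates = [(2, -5210.0, 25), (3, -5150.0, 38), (4, -5140.0, 51)]
n = 2500
best_L = max(candidates, key=lambda c: bic(c[1], c[2], n))[0]
```

Here the three-class fit wins: the move from two to three classes buys enough log-likelihood to offset the penalty, while the move to four classes does not.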

The class-specific covariance matrix of process features for class l, Inline graphic , is parameterized through an eigenvalue decomposition of the following form:

$$\boldsymbol{\Sigma}_l = \lambda_l \mathbf{D}_l \mathbf{A}_l \mathbf{D}_l^{\top}, \tag{7}$$

where $\lambda_l$ is a scalar controlling the volume of the ellipsoid, $\mathbf{A}_l$ is a diagonal matrix specifying the shape with $|\mathbf{A}_l| = 1$, and $\mathbf{D}_l$ is an orthogonal matrix determining the orientation of the ellipsoid (Banfield & Raftery, 1993; Celeux & Govaert, 1995; Fraley & Raftery, 2002). Various equality constraints can be imposed on the covariance structures between and within groups. Banfield & Raftery (1993) and Celeux & Govaert (1995) present models tailored to various clustering scenarios, which are implemented in the mclust R package (Scrucca et al., 2023). Celeux & Govaert (1995) recommended using models allowing different volumes, and more parsimonious models, such as a diagonal covariance matrix, for high-dimensional data. Here, we adopted the model that assumes varying volumes but equal shapes between classes, with orientations aligned with the coordinate axes. In this parsimonious model, the class-specific covariance matrix becomes

$$\boldsymbol{\Sigma}_l = \lambda_l \mathbf{A}, \tag{8}$$

where $\mathbf{A}$ is a diagonal matrix with $|\mathbf{A}| = 1$.
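This covariance structure can be constructed by rescaling an arbitrary positive diagonal to unit determinant and multiplying by a class-specific volume; a small sketch with made-up numbers:

```python
import math

def vei_covariance(volume, shape):
    """Diagonal of lambda_l * A with det(A) = 1, as in Equation (8).
    `shape` is any positive diagonal; it is divided by its geometric
    mean so that the product of its entries (the determinant) is 1."""
    d = len(shape)
    geo_mean = math.prod(shape) ** (1.0 / d)
    a = [s / geo_mean for s in shape]  # det(A) = 1
    return [volume * ak for ak in a]   # diagonal of Sigma_l

# Two classes sharing the same shape but differing in volume.
sigma1 = vei_covariance(volume=2.0, shape=[4.0, 1.0, 1.0])
sigma2 = vei_covariance(volume=0.5, shape=[4.0, 1.0, 1.0])
```

Because the shape is shared, the two diagonals differ only by the ratio of the volumes, which is exactly the "varying volume, equal shape" constraint.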

2.3. Parameter estimation

An EM algorithm (Dempster et al., 1977) is implemented to find the marginal maximum likelihood estimates of the latent class mediation model by maximizing the observed-data log-likelihood. Similar to the EM algorithm for the Gaussian mixture model in Fraley & Raftery (2002), the class membership indicator $\mathbf{z}_i = (z_{i1}, \dots, z_{iL})$ is introduced as the unobserved portion of the data, where

$$z_{il} = \begin{cases} 1, & \text{if examinee } i \text{ belongs to latent class } l,\\ 0, & \text{otherwise}. \end{cases} \tag{9}$$

The conditional distribution of $\mathbf{z}_i$ given $G_i$ and $X_i$ is

$$\mathbf{z}_i \mid G_i, X_i \sim \operatorname{Multinomial}\big(1;\, P(U_i = 1 \mid G_i, X_i), \dots, P(U_i = L \mid G_i, X_i)\big). \tag{10}$$

The log-likelihood of the parameters given the complete data $(\mathbf{G}, \mathbf{Y}, \mathbf{T}, \mathbf{X}, \mathbf{z})$ is

$$\ell(\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \mathbf{G}, \mathbf{Y}, \mathbf{T}, \mathbf{X}, \mathbf{z}) = \sum_{i=1}^{N} \sum_{l=1}^{L} z_{il} \big[ \log P(U_i = l \mid G_i, X_i) + \log \phi(\mathbf{t}_i; \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l) + \log P(Y_i \mid G_i, U_i = l, X_i) \big]. \tag{11}$$

The initial class memberships in the EM algorithm are obtained by hierarchical agglomerative clustering (Murtagh & Legendre, 2014). The algorithm iterates between the E-step and the M-step described below until a convergence criterion has been reached.

2.3.1. E-step

In the E-step, the class membership probabilities $\hat{z}_{il}$ are estimated for $i = 1, \dots, N$ and $l = 1, \dots, L$ in the $r$th iteration by

$$\hat{z}_{il}^{(r)} = \frac{P\big(U_i = l \mid G_i, X_i; \boldsymbol{\beta}^{(r-1)}\big)\, \phi\big(\mathbf{t}_i; \boldsymbol{\mu}_l^{(r-1)}, \boldsymbol{\Sigma}_l^{(r-1)}\big)\, P\big(Y_i \mid G_i, U_i = l, X_i; \boldsymbol{\gamma}^{(r-1)}\big)}{\sum_{m=1}^{L} P\big(U_i = m \mid G_i, X_i; \boldsymbol{\beta}^{(r-1)}\big)\, \phi\big(\mathbf{t}_i; \boldsymbol{\mu}_m^{(r-1)}, \boldsymbol{\Sigma}_m^{(r-1)}\big)\, P\big(Y_i \mid G_i, U_i = m, X_i; \boldsymbol{\gamma}^{(r-1)}\big)}. \tag{12}$$
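The E-step update in Equation (12) for a single examinee can be sketched as follows, using a diagonal Gaussian for the feature density and invented parameter values:

```python
import math

def diag_normal_pdf(t, mu, var):
    """Density of a multivariate normal with diagonal covariance."""
    logp = sum(-0.5 * (math.log(2 * math.pi * v) + (tk - m) ** 2 / v)
               for tk, m, v in zip(t, mu, var))
    return math.exp(logp)

def e_step_probs(t, y, prior, mus, vars_, p_correct):
    """z_il for one examinee: prior class probability * feature density *
    outcome likelihood, normalized over classes (Equation (12))."""
    terms = []
    for pi_l, mu, var, pc in zip(prior, mus, vars_, p_correct):
        like_y = pc if y == 1 else 1 - pc
        terms.append(pi_l * diag_normal_pdf(t, mu, var) * like_y)
    total = sum(terms)
    return [w / total for w in terms]

# Illustrative two-class setup for a single examinee.
z = e_step_probs(
    t=[0.2, -0.1], y=1,
    prior=[0.5, 0.5],                # P(U = l | G, X) from the Eq. (3) model
    mus=[[0.0, 0.0], [1.5, 1.5]],    # class means
    vars_=[[1.0, 1.0], [1.0, 1.0]],  # diagonal covariances
    p_correct=[0.8, 0.3],            # P(Y = 1 | G, U = l, X) from Eq. (4)
)
```

Both the feature vector (near the first class mean) and the correct response (more likely under the first class) pull the posterior toward class 1.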

2.3.2. M-step

In the M-step, we update the parameters $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$, $\boldsymbol{\mu}_l$, and $\boldsymbol{\Sigma}_l$ by maximizing the expected complete-data log-likelihood computed with the estimates $\hat{z}_{il}^{(r)}$. For updating the multinomial logistic regression coefficients, we set the first latent class as the baseline reference level for identifiability, and

$$\log \frac{\pi_{il}}{\pi_{i1}} = \beta_{0l} + \beta_{1l} G_i + \beta_{2l} X_i, \qquad l = 2, \dots, L, \tag{13}$$

where $\pi_{il} = P(U_i = l \mid G_i, X_i)$ and $\beta_{01} = \beta_{11} = \beta_{21} = 0$. The $\boldsymbol{\beta}$s, and thereby the $\pi_{il}$s, are updated with the estimates from the multinomial logistic regression model by maximizing

$$\sum_{i=1}^{N} \sum_{l=1}^{L} \hat{z}_{il}^{(r)} \log \pi_{il}. \tag{14}$$

Similarly, $\gamma_{0l}$, $\gamma_1$, and $\gamma_2$ are updated by maximizing the following term:

$$\sum_{i=1}^{N} \sum_{l=1}^{L} \hat{z}_{il}^{(r)} \log P(Y_i \mid G_i, U_i = l, X_i; \gamma_{0l}, \gamma_1, \gamma_2). \tag{15}$$

The closed-form solutions to Equations (14) and (15) are unavailable, so a quasi-Newton method (BFGS; Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) was used to update $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$.

The class-specific means of the process features, the $\boldsymbol{\mu}_l$s, have closed-form expressions based on the $\hat{z}_{il}^{(r)}$s from the E-step:

$$\hat{\boldsymbol{\mu}}_l^{(r)} = \frac{\sum_{i=1}^{N} \hat{z}_{il}^{(r)} \mathbf{t}_i}{n_l}, \tag{16}$$

where $n_l = \sum_{i=1}^{N} \hat{z}_{il}^{(r)}$. For updating the covariance matrix $\boldsymbol{\Sigma}_l$, we use the approach described in Celeux & Govaert (1995). The scattering matrix $\mathbf{W}_l$ of class $l$ is

$$\mathbf{W}_l = \sum_{i=1}^{N} \hat{z}_{il}^{(r)} \big(\mathbf{t}_i - \hat{\boldsymbol{\mu}}_l^{(r)}\big)\big(\mathbf{t}_i - \hat{\boldsymbol{\mu}}_l^{(r)}\big)^{\top}. \tag{17}$$

We update $\lambda_l$ and $\mathbf{A}$ by minimizing

$$\sum_{l=1}^{L} \frac{\mathrm{tr}\big(\mathbf{W}_l \mathbf{A}^{-1}\big)}{\lambda_l} + d \sum_{l=1}^{L} n_l \log \lambda_l. \tag{18}$$

The minimization of (18) requires an iterative procedure.

$$\hat{\mathbf{A}} = \frac{\operatorname{diag}\!\big(\sum_{l=1}^{L} \mathbf{W}_l / \hat{\lambda}_l\big)}{\big|\operatorname{diag}\!\big(\sum_{l=1}^{L} \mathbf{W}_l / \hat{\lambda}_l\big)\big|^{1/d}}, \qquad \hat{\lambda}_l = \frac{\mathrm{tr}\big(\mathbf{W}_l \hat{\mathbf{A}}^{-1}\big)}{d\, n_l}, \tag{19}$$

where $d$ is the dimension of the relevant process features $\mathbf{T}_i$. The E-step and the M-step are iterated until a termination criterion has been reached. Parameter estimates from the last iteration are used as the final estimates. For each examinee, the latent class membership can be estimated via the maximum a posteriori (MAP) rule:

$$\hat{U}_i = \operatorname*{arg\,max}_{l \in \{1, \dots, L\}} \hat{z}_{il}. \tag{20}$$

2.4. Assessing direct and indirect effect

To quantify the amount of information in the outcome explained by the group membership through the latent class mediator, we adopt the assessment of direct and indirect effects with a nominal mediator described in Muthén (2011). Although intended as an exploratory tool, this model assumes no unmeasured confounding among the predictor, mediator, and outcome, as is standard in causal inference frameworks. Let $Y(g, U(g^{*}))$ denote the potential outcome that would have been observed if the group membership were $g$ and the latent class membership were $U(g^{*})$ for an examinee. The conditional expectation of the outcome $Y$ in group $g$, when the latent class mediator $U$ is held constant at the value it would obtain for group $g^{*}$, controlling for the covariate $X$, is

$$E\big[Y(g, U(g^{*})) \mid X = x\big] = \sum_{l=1}^{L} P(Y = 1 \mid G = g, U = l, X = x)\, P(U = l \mid G = g^{*}, X = x). \tag{21}$$

The DE and the TIE are defined as follows:

$$\begin{aligned} \mathrm{DE} &= E\big[Y(1, U(0)) \mid X = x\big] - E\big[Y(0, U(0)) \mid X = x\big], \\ \mathrm{TIE} &= E\big[Y(1, U(1)) \mid X = x\big] - E\big[Y(1, U(0)) \mid X = x\big]. \end{aligned} \tag{22}$$

The TIE is interpreted as the expectation of the difference between the outcomes in the focal group ($G = 1$) when the mediator changes from the values it would obtain in the focal group to the values it would obtain in the reference group ($G = 0$). For example, in the context of the NAEP Math Assessment data, the TIE can be interpreted as the expected difference in the probability of a correct response for LD students when their latent class membership shifts from the class it would take in the LD group to the class it would take in the TD group. The TIE and DE can be estimated with the latent class mediation model parameter estimates. The sum of the DE and the TIE equals the total effect, $\mathrm{TE}$:

$$\mathrm{TE} = \mathrm{DE} + \mathrm{TIE} = E\big[Y(1, U(1)) \mid X = x\big] - E\big[Y(0, U(0)) \mid X = x\big]. \tag{23}$$

Testing of the TIE is available by constructing confidence intervals using the delta method (Sint et al., 2021) or bootstrap resampling (Muthén, 2011). The approximation of the standard error of TIE using the delta method is described in Appendix B.
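Combining the class-membership and outcome models with Equations (21) and (22), the DE and TIE at a fixed covariate value follow directly from parameter estimates; all coefficient values below are hypothetical:

```python
import math

# Hypothetical estimates: class 1 is the multinomial reference class.
betas = [(0.0, 0.0, 0.0), (0.4, 0.9, 0.2), (-0.3, 1.1, 0.1)]  # Eq. (3)
gamma0 = [1.2, -0.1, -1.0]                                    # class intercepts, Eq. (4)
gamma_g, gamma_x = -0.2, -0.4                                 # direct and covariate slopes

def class_probs(g, x):
    s = [math.exp(b0 + b1 * g + b2 * x) for b0, b1, b2 in betas]
    return [v / sum(s) for v in s]

def p_correct(g, l, x):
    eta = gamma0[l] + gamma_g * g + gamma_x * x
    return 1.0 / (1.0 + math.exp(-eta))

def expected_y(g, g_star, x):
    """E[Y(g, U(g*)) | X = x], Equation (21)."""
    return sum(p_correct(g, l, x) * pm
               for l, pm in enumerate(class_probs(g_star, x)))

x = 0
DE = expected_y(1, 0, x) - expected_y(0, 0, x)
TIE = expected_y(1, 1, x) - expected_y(1, 0, x)
TE = expected_y(1, 1, x) - expected_y(0, 0, x)
```

With these made-up numbers, group 1 is shifted toward classes with lower success intercepts, so the TIE comes out negative: the class shift alone lowers the expected probability of a correct response.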

3. Headlong search algorithm for feature selection

The process features are high-dimensional and may contain noisy information irrelevant to the relationship between the observed group and the final outcome. We implement a headlong search algorithm to find the subset of process features that maximizes the TIE. Starting from an initial subset of the K process features, the algorithm iteratively updates the current subset to find an optimal subset for which the fitted LCMA model maximizes the TIE.

3.1. Feature subset initialization

We first fit the latent class analysis (LCA) model of Equation (2) using all K process features as indicators, selecting the number of latent classes L with the BIC in Equation (6). Then, with L fixed, we fit a single-indicator latent class model for each of the K features. For each fit, the average variance of the class probability estimates across individuals is calculated, where the class probability estimates are obtained in the E-step of the EM algorithm for the single-indicator LCA model. The larger the average class probability variance, the better the indicator separates the classes. Similar to the approach in Dean & Raftery (2010), we select the L − 1 features with the largest average variance of class probabilities as the initial set, L − 1 being the maximum number of features needed to separate L latent classes by their locations. With this initial set, the latent class mediation model is fit using the current subset of features, the group membership variable G, and the outcome Y. After selecting the initial set of features, we proceed with the inclusion and exclusion steps of the headlong search algorithm.
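The initialization step can be sketched as follows, assuming the per-feature posterior class probability matrices have already been obtained from the single-indicator LCA fits (a Python illustration with made-up posteriors; the helper names are ours, and the authors' own implementation is in R):

```python
import numpy as np

def class_prob_variance(post):
    """Average over individuals of the variance of the posterior class
    probabilities; larger values indicate better class separation."""
    return float(np.mean(np.var(post, axis=1)))

def initial_feature_set(posteriors, n_classes):
    """Rank features by average posterior-probability variance and keep
    the top L - 1. `posteriors[k]` is the n x L posterior matrix from
    the LCA fit using feature k as the single indicator."""
    scores = {k: class_prob_variance(p) for k, p in posteriors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: n_classes - 1]

# Toy posteriors for three features and L = 3 classes (two examinees shown):
posteriors = {
    "f1": np.array([[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]]),  # sharp
    "f2": np.array([[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]]),  # flat
    "f3": np.array([[0.70, 0.20, 0.10], [0.10, 0.70, 0.20]]),  # moderate
}
print(initial_feature_set(posteriors, n_classes=3))  # ['f1', 'f3']
```

A nearly flat posterior (feature f2) contributes almost no class-probability variance and is ranked last, which is exactly the behavior the initialization criterion targets.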

3.2. Inclusion step

At any iteration, let the current set denote the features included in the model and the candidate set denote the remaining features. The logic of the inclusion and exclusion steps is that if adding a feature to, or removing a feature from, the current set increases the TIE significantly, the change is accepted. In the inclusion step, the latent class mediation model is refit after adding each candidate feature to the current set in turn, with the number of latent classes determined by selecting the LCA model with the highest BIC. We test whether the absolute value of the TIE increases significantly after adding the candidate feature by examining whether its 95% confidence interval excludes the TIE estimate from the previous subset. The feature that increases the TIE the most is added to the current set if the increase is significant; if no feature yields a significant increase, none is added.

3.3. Exclusion step

In the exclusion step, each feature in the current set is examined: the latent class mediation model is refit after removing that feature, with the number of latent classes again determined by selecting the LCA model with the highest BIC. The feature whose removal leads to the largest increase in the TIE is excluded if the 95% confidence interval of the TIE excludes the TIE estimate from the previous step; if no removal yields a significant increase, no feature is removed. When a full round of inclusion and exclusion steps produces no change, the feature set is finalized and the final latent class mediation model is fit.
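The inclusion and exclusion steps can be sketched as a single loop. `fit_lcma` below is a hypothetical stand-in for the model-fitting step (class count chosen by BIC, TIE and its 95% CI returned), not the authors' implementation:

```python
def headlong_search(features, initial, fit_lcma):
    """Headlong inclusion/exclusion search, repeated until no change.
    `fit_lcma(subset)` returns (tie_estimate, (ci_lower, ci_upper))."""
    current = list(initial)
    tie, _ = fit_lcma(current)
    changed = True
    while changed:
        changed = False
        # Inclusion step: accept the best addition whose CI excludes the
        # current TIE estimate (a "significant" increase in |TIE|).
        best = None
        for f in features:
            if f in current:
                continue
            t, ci = fit_lcma(current + [f])
            if abs(t) > abs(tie) and not (ci[0] <= tie <= ci[1]):
                if best is None or abs(t) > abs(best[1]):
                    best = (f, t)
        if best is not None:
            current.append(best[0])
            tie = best[1]
            changed = True
        # Exclusion step: accept the best removal under the same rule.
        best = None
        for f in list(current):
            sub = [x for x in current if x != f]
            if not sub:
                continue
            t, ci = fit_lcma(sub)
            if abs(t) > abs(tie) and not (ci[0] <= tie <= ci[1]):
                if best is None or abs(t) > abs(best[1]):
                    best = (f, t)
        if best is not None:
            current.remove(best[0])
            tie = best[1]
            changed = True
    return current, tie

# Toy stand-in: |TIE| grows with each true signal feature included.
def stub_fit(subset):
    t = -0.05 * len(set(subset) & {"a", "b", "c"})
    return t, (t - 0.01, t + 0.01)

selected, tie = headlong_search(["a", "b", "c", "noise"], ["a"], stub_fit)
print(sorted(selected), round(tie, 2))  # ['a', 'b', 'c'] -0.15
```

With this stub the search adds the remaining signal features one round at a time, never picks up the noise feature (adding it leaves the TIE unchanged), and terminates once a full inclusion-exclusion round makes no change.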

The proposed LCMA procedure using process data is summarized in Algorithm 1.


4. Simulation study

Simulation studies are conducted to examine whether the proposed procedure selects the signal indicators that effectively explain the mediation effect and accurately estimates the total indirect effect.

4.1. Data generation

Random samples with Inline graphic sample size, four latent classes, ten indicators, and a binary final outcome Y were generated under a latent class mediation model, given the binary group membership G generated from a Bernoulli distribution with Inline graphic . The number of signal indicators was S = 5, 3, or 1. The noisy indicators were generated independently of the true latent class membership; thus, they contribute nothing to the classification of subjects into latent classes and are irrelevant to the relationship between the predictor G and the outcome Y. Figure 3 presents the true mean structure of the ten indicators conditioned on the four latent classes, where each column represents a latent class; the first S rows are the mean vectors of the signal indicators. Figure 4 summarizes the simulation conditions by displaying the latent class distributions from one simulated data set in the two-dimensional space of the first two indicators. In the S = 5 condition, at least three of the signal variables need to be selected to identify the four latent classes by location. In the S = 3 condition, all the signal variables must be selected to identify the four latent classes correctly. In the S = 1 condition, the first variable is the only indicator needed to identify the four true latent classes. Two levels of class-specific variances, 1 and 3, were considered to control the level of overlap, that is, how much the latent classes can intersect; overlapping true latent classes can lead to the misclassification of individuals. In the variance-1 condition, the latent classes do not overlap, whereas in the variance-3 condition they do, allowing misclassification. The true TIE and DE were set to −0.125 and Inline graphic , respectively.
The true model parameter values are described in Appendix C. The number of replications in each condition was Inline graphic . The R code used for the simulation can be found on the Open Science Framework (OSF) at https://osf.io/a5zem/?view_only=983859876f2547bb977e02e5dfef6a3d.
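The generating scheme described above can be sketched as follows. All parameter values here are illustrative placeholders, not the Appendix C values, and the study's own generating code (in R) is on OSF:

```python
import numpy as np

rng = np.random.default_rng(1)
n, L, K, S = 500, 4, 10, 3   # sample size, classes, indicators, signals

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Group membership G ~ Bernoulli(0.5)
G = rng.binomial(1, 0.5, n)

# Latent class from a multinomial logistic model on G (illustrative betas)
b0 = rng.normal(0, 0.5, L)
b1 = rng.normal(0, 0.5, L)
probs = softmax(b0[None, :] + b1[None, :] * G[:, None])
M = np.array([rng.choice(L, p=p) for p in probs])

# Signal indicators get class-specific means; noisy indicators do not.
mu = rng.normal(0, 3, (L, S))
X = np.empty((n, K))
X[:, :S] = mu[M] + rng.normal(0, 1, (n, S))
X[:, S:] = rng.normal(0, 1, (n, K - S))

# Binary outcome from a logistic model on latent class and group
g0 = rng.normal(0, 1, L)
g1 = -0.5
pY = 1 / (1 + np.exp(-(g0[M] + g1 * G)))
Y = rng.binomial(1, pY)
print(X.shape, round(float(Y.mean()), 3))
```

Because the noisy columns are drawn without reference to M, they carry no class information by construction, mirroring the design above.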

Figure 3.

Figure 3

True mean structures in the simulation study.

Note: The columns represent the four latent classes, and the rows represent the ten indicators. The first S rows are the signal indicators, and the rest are the noisy indicators.

Figure 4.

Figure 4

Scatter plots of simulated indicators from the simulation conditions.

4.2. Simulation results

The bias, RMSE, and 95% coverage rate of the TIE are given in Table 2. The bias and RMSE of the TIE were calculated as follows:

\mathrm{Bias} = \frac{1}{R}\sum_{r=1}^{R}\left(\widehat{\mathrm{TIE}}_r - \mathrm{TIE}\right), \quad \mathrm{RMSE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\widehat{\mathrm{TIE}}_r - \mathrm{TIE}\right)^2} \quad (24)

Here, \widehat{\mathrm{TIE}}_r is the TIE estimate calculated from the model parameter estimates in the rth of R replications, and TIE is the true value. The proposed LCMA procedure recovered the TIE of the latent class mediator well, although the TIE was slightly overestimated. The magnitude of the bias increased slightly in the variance-3 conditions, where the latent classes were allowed to overlap. However, the bias of the TIE is negligible, as the relative biases were less than Inline graphic except in conditions 4 and 5. Further, an additional simulation showed that the bias of the model parameter estimates decreased as the sample size increased (Table D1). The 95% coverage rate was computed using 95% confidence intervals constructed from standard error estimates derived via the delta method.

\mathrm{Coverage} = \frac{1}{R}\sum_{r=1}^{R} I\left(\mathrm{LB}_r \le \mathrm{TIE} \le \mathrm{UB}_r\right) \quad (25)

Here, LB_r and UB_r are the lower and upper bounds of the 95% confidence interval in the rth replication, and I is the indicator function. The coverage rates of the TIE were acceptable, ranging from 0.90 to 0.94 in the non-overlapping-classes conditions and from 0.74 to 0.93 in the overlapping-classes conditions.
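The three evaluation criteria can be computed from replicate estimates as in this sketch (toy inputs, not the study's replications):

```python
import numpy as np

def evaluate_tie(estimates, lowers, uppers, true_tie):
    """Bias, RMSE, and 95% coverage of TIE estimates over R replications,
    following the formulas above; all inputs here are illustrative."""
    est = np.asarray(estimates)
    bias = float(np.mean(est - true_tie))
    rmse = float(np.sqrt(np.mean((est - true_tie) ** 2)))
    cover = float(np.mean((np.asarray(lowers) <= true_tie)
                          & (true_tie <= np.asarray(uppers))))
    return bias, rmse, cover

# Four toy replications around a true TIE of -0.125
est = [-0.120, -0.130, -0.115, -0.135]
lo = [e - 0.02 for e in est]
hi = [e + 0.02 for e in est]
bias, rmse, cover = evaluate_tie(est, lo, hi, -0.125)
print(round(bias, 4), round(rmse, 4), cover)
```

With estimates placed symmetrically around the true value, the bias is zero while the RMSE remains positive, and every toy interval covers the truth.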

Table 2.

Simulation study results

TIE
Con. VAR S ARI N.class N.ind FP TP Bias RMSE 95% C.R.
1 1 5 0.91 3.69 3.20 0.06 0.58 0.006 0.024 0.92
2 1 3 0.92 3.72 3.39 0.10 0.91 0.006 0.026 0.90
3 1 1 0.96 4.32 2.37 0.15 1.00 0.003 0.024 0.94
4 3 5 0.80 3.50 3.31 0.07 0.60 0.014 0.027 0.88
5 3 3 0.82 3.65 3.48 0.13 0.86 0.017 0.029 0.74
6 3 1 0.88 4.11 1.89 0.10 1.00 0.003 0.025 0.93

Note: 95% C.R. is the 95% coverage rate.

Throughout the simulation conditions, the selected number of classes was close to the true number of classes, four, ranging from 3.50 to 4.32 on average (Table 2). The classification accuracy of the proposed analysis was evaluated using the average adjusted Rand index (ARI; Hubert & Arabie, 1985) between the estimated and true classes. The ARI measures the agreement of two classifications even when the numbers of classes do not match: an ARI close to 1 indicates perfect agreement with the true classification, and an ARI close to 0 indicates random classification. The formula of the ARI is given as follows. Let n_{ij} be the number of individuals in true class i classified into the jth estimated class, where i ranges over the true classes and j over the classes in the latent class mediation model. Then,

\mathrm{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right]\big/\binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right]\big/\binom{n}{2}} \quad (26)

where a_i = \sum_j n_{ij}, b_j = \sum_i n_{ij}, and n = \sum_{ij} n_{ij}. Across the simulation conditions, the average ARI values were at least 0.80, indicating accurate classification by the proposed analysis. The ARI values were greater in the non-overlapping conditions (0.91–0.96) than in the overlapping conditions (0.80–0.88).
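A direct implementation of the ARI from the contingency table, matching the formula above:

```python
from math import comb
from collections import Counter

def adjusted_rand_index(true_labels, pred_labels):
    """ARI computed from the contingency counts n_ij, as in Equation (26)."""
    n = len(true_labels)
    nij = Counter(zip(true_labels, pred_labels))  # cell counts n_ij
    a = Counter(true_labels)                      # row sums a_i
    b = Counter(pred_labels)                      # column sums b_j
    index = sum(comb(c, 2) for c in nij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Relabeled-but-identical partitions agree perfectly:
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because the ARI works through the contingency table rather than the raw labels, it is invariant to relabeling of the classes, which is what makes it suitable for comparing clusterings with different class labels or even different class counts.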

The variable selection algorithm performed well under the simulation conditions. In Table 2, the sixth column shows the average number of indicators selected in each condition. When three signal indicators were needed to identify the four true latent classes (i.e., conditions 1, 2, 4, and 5), slightly more than three variables were selected on average. When the first indicator was the only signal indicator (i.e., conditions 3 and 6), 2.37 and 1.89 indicators were selected on average in the final model. The seventh and eighth columns in Table 2 show the false positive (FP) and true positive (TP) rates of indicator selection: the false positive rate is the probability of selecting a noisy indicator, and the true positive rate is the probability of selecting a signal indicator.

\mathrm{FP} = \frac{1}{R}\sum_{r=1}^{R}\frac{|\hat{S}_r \cap \mathcal{N}|}{|\mathcal{N}|}, \quad \mathrm{TP} = \frac{1}{R}\sum_{r=1}^{R}\frac{|\hat{S}_r \cap \mathcal{S}|}{|\mathcal{S}|} \quad (27)

Here, \hat{S}_r is the set of indicators selected in the final model in the rth replication, and \mathcal{S} and \mathcal{N} denote the sets of signal and noisy indicators, respectively. The variable selection algorithm controlled the false positive rate reasonably well, ranging from 0.06 to 0.15. In the S = 5 conditions, the true positive rates were 0.58 and 0.60, meaning that about 60% of the five signal variables were selected, which suffices to identify the four true latent classes. In the S = 3 conditions, most of the three signal indicators were selected, with true positive rates of 0.91 and 0.86. In the S = 1 conditions, the sole signal indicator was always selected in the final model, with a true positive rate of 1.00.
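The selection rates can be computed from the selected sets across replications; the sets below are made up for illustration:

```python
def fp_tp_rates(selected_sets, signal, noise):
    """Average false/true positive selection rates over R replications,
    following Equation (27); inputs here are illustrative."""
    R = len(selected_sets)
    fp = sum(len(s & noise) / len(noise) for s in selected_sets) / R
    tp = sum(len(s & signal) / len(signal) for s in selected_sets) / R
    return fp, tp

signal, noise = {1, 2, 3}, {4, 5}
# Selected indicator sets from four hypothetical replications:
runs = [{1, 2, 3}, {1, 2, 4}, {1, 3}, {1, 2, 3, 5}]
fp, tp = fp_tp_rates(runs, signal, noise)
print(round(fp, 3), round(tp, 3))
```

Here one of the two noisy indicators is picked up in two of four runs (FP = 0.25), while on average five of every six signal indicators are recovered.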

We conducted additional simulations to evaluate the accuracy of parameter estimates given the true number of latent classes, L. The EM algorithm for the LCMA model performed well, exhibiting low bias and RMSE in the parameter estimates. Additionally, we assessed the proposed algorithm’s performance under alternative data-generating models. The algorithm showed robust performance across various scenarios in terms of both variable selection and parameter estimation. Further details about the simulation methods and results are provided in Appendices D and E.

5. NAEP Math Assessment data analysis results

The LCMA was applied to the empirical data from the motivating example. To start, we fit a simple logistic regression predicting the final outcome Y from the disability group membership G, without any mediator. The log odds of a correct response were Inline graphic lower in the LD group than in the TD group:

\log\frac{P(Y = 1 \mid G)}{1 - P(Y = 1 \mid G)} = \hat{\beta}_0 + \hat{\beta}_1 G \quad (28)

Without any mediators, the total effect of the group membership on the final outcome was −0.273 on the probability scale, calculated as follows:

\mathrm{TE} = \hat{P}(Y = 1 \mid G = \mathrm{LD}) - \hat{P}(Y = 1 \mid G = \mathrm{TD}) = -0.273 \quad (29)
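On the probability scale, a total effect of this kind can be obtained from a fitted logistic intercept and slope; the coefficients below are hypothetical stand-ins, chosen only to illustrate the calculation (they are not the fitted NAEP values):

```python
from math import exp

def expit(z):
    """Inverse logit."""
    return 1 / (1 + exp(-z))

def total_effect(b0, b1):
    """TE on the probability scale, P(Y=1|G=1) - P(Y=1|G=0),
    from a simple logistic regression of Y on group G."""
    return expit(b0 + b1) - expit(b0)

# Hypothetical coefficients with the G = 1 group scoring lower:
te = total_effect(-0.04, -1.28)
print(round(te, 3))
```

Working on the probability scale makes the total effect directly comparable to the DE and TIE estimates reported later, which are also probability differences.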

Then, the proposed LCMA procedure is applied to the empirical data. Specifically, in the current context, the LCMA aims to address the following research questions (RQs):

  1. What are the latent classes of action sequence patterns that explain the relationship between disability group (G) and outcome (Y)? In other words, we search for the classes underlying M in Equations 3 and 4.

  2. What subset of action sequence features can best account for the effect of disability group on the outcome? In other words, we search for the feature subset that maximizes the TIE in Equations 22 and 23.

  3. How much of the group difference in the final outcome can be explained by the latent class mediator (M) underlying the problem-solving process features? In other words, we estimate and evaluate the TIE in Equations 22 and 23.

The headlong search algorithm described previously was implemented to find the subset of indicators maximizing the TIE of the disability group membership on the final score through the process features. Out of the Inline graphic MDS process features, the variable selection algorithm selected Inline graphic indicators in the final model. The data analysis required approximately 31 hours with a sample size of Inline graphic , Inline graphic candidate features, and the maximum number of latent classes set to Inline graphic . The selected number of latent classes was 18. After incorporating the latent class mediator, the TIE estimate was Inline graphic , controlling for the ELL variable, with a 95% confidence interval of (−0.183, −0.125). This shows that the latent class variable underlying the selected process features could substantially explain the final score difference between LD and TD students when ELL status is held at 0. To evaluate the reproducibility of our results, we randomly sampled 80% of the data four times and applied the proposed LCMA procedure to each subsample, examining the stability of both the TIE estimates and the classification of students. Each subsample consisted of 470 LD students and 2,000 TD students. The TIE estimates varied only slightly across the four subsamples, from −0.119 to −0.157. Although the optimal number of classes was 20 in these subsamples, the ARIs comparing the classification from the total sample to those from the subsamples ranged from Inline graphic to Inline graphic , indicating high consistency in the classification of students.

To interpret and label the identified latent classes, we propose inspecting common patterns in the original action sequences of test takers within each class. Although a common approach is to describe classes in terms of their indicators (Spurk et al., 2020), this is challenging with MDS features, whose extracted values are often difficult to interpret; analyzing the action sequences offers a clearer and more practical way to understand and label the latent classes underlying the process data. Table 3 presents a descriptive summary of common patterns in the original sequences for each class, along with the corresponding class labels. Classes marked with (h) are homogeneous classes with identical action sequences. For instance, the common action sequence for Class 2, labeled “Revisit for review, 735”, was (Part_1_735, Exit_Item, Enter_Item), indicating that every student in this class entered an answer with the numbers 7, 3, and 5 and revisited the item page once. In contrast, the common action sequence for Class 5, labeled “Omission of the first try”, was (Exit_Item, Enter_Item, Part_1_735), indicating that every student in this class initially omitted the item and then submitted an answer with the numbers 7, 3, and 5 during the second visit to the item page. To interpret the non-homogeneous classes, we examined both the common actions within each class and the summary statistics in Table 3. For example, Class 1 was labeled “Multiple revisits, no tools, 735” because every student in this class revisited the item page multiple times and submitted an answer with the numbers 7, 3, and 5.

Table 3.

Tool usage rates of latent classes from the NAEP math assessment item VH336968

Class Label No. Len.M Len.SD Rev. Avg.rev. Dra. Era. Cle. E/C Hig. Zoo. The. TOS
1 Multiple revisits, no tools, 735 40 5.16 0.69 1.00 2.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2 Revisit for review, 735 (h) 150 3.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3 Draw_Erase 90 6.28 2.63 0.03 0.04 1.00 1.00 0.42 1.00 0.03 0.00 0.00 0.00
4 Draw_Clear 60 4.95 1.53 0.00 0.00 1.00 0.00 0.98 0.98 0.00 0.00 0.00 0.00
5 Omission of the first try (h) 50 3.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
6 No tools, 735 (h) 1020 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
7 Single draw (h) 140 2.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8 Draw and revisit 100 6.61 2.20 1.00 1.17 1.00 0.43 0.57 0.79 0.01 0.00 0.00 0.00
9 Draw with clear or erase, revisit 170 7.53 5.96 0.41 0.74 0.90 0.49 0.56 0.83 0.11 0.01 0.05 0.08
10 Irrelevant tools (TOS) or reentries 80 5.72 2.48 0.31 0.37 0.52 0.16 0.17 0.29 0.03 0.00 0.03 0.91
11 Irrelevant tools (theme) or revisit 70 6.40 3.38 0.30 0.40 0.44 0.14 0.19 0.29 0.01 0.10 0.81 0.14
12 Draw with clear or erase 130 6.53 3.31 0.02 0.03 1.00 0.66 0.63 0.95 0.02 0.00 0.00 0.01
13 Multiple revisits or reentries 60 6.63 2.94 1.00 2.32 0.11 0.05 0.08 0.10 0.00 0.00 0.00 0.02
14 Omission of the first try, non-735 (h) 40 3.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
15 Draw_Erase or Draw_Clear, non-735 30 3.00 0.00 0.00 0.00 1.00 0.41 0.59 1.00 0.00 0.00 0.00 0.00
16 No tools, non-735 (h) 720 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
17 Revisit for review, non-735 (h) 60 3.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
18 Single draw, non-735 (h) 70 2.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Note: No.: Number of students classified into each class (rounded to the nearest ten); Len.M: Average action sequence length; Len.SD: Standard deviation of action sequence length; Rev.: Revisit; Avg.rev.: Average number of revisits; Dra.: Draw; Era.: Erase; Cle.: Clear; E/C: Erase or Clear; Hig.: Highlight; Zoo.: Zooming in/out; The.: Theme editor; TOS: Text-to-speech. Source: U.S. Department of Education, National Center for Education Statistics, “Response Process Data from the NAEP 2017 Grade 8 Mathematics Assessment.”

Table 4 shows the model-implied correct response probabilities and class probabilities for LD and TD students, along with the (log) odds ratios and raw differences in class probabilities. These quantities are calculated from the parameter estimates of the logistic regression in Equation 4 and the multinomial logistic regression in Equation 3, controlling for the covariate. Note that after classifying students into the latent classes, the difference in the correct response probabilities between LD and TD students within each class decreased, again showing that the latent class variable can explain the performance gap between the two groups. Behaviors observed in Classes 1 to 9 are associated with higher correct response probabilities than the marginal correct response probability, while behaviors common in Classes 10 to 18 are associated with lower correct response probabilities.

Table 4.

Model implied response probabilities and class probabilities from the NAEP math assessment item VH336968

Correct response probability | Class probability
Class Label LD TD LD TD OR LOR DIFF
Marginal probability 0.21 0.49
1 Multiple revisits, no tools, 735 0.75 0.87 0.00 0.02 0.21 -0.67 -0.01
2 Revisit for review, 735 (h) 0.74 0.86 0.02 0.05 0.33 -0.48 -0.04
3 Draw_Erase 0.63 0.79 0.03 0.03 0.82 -0.09 -0.01
4 Draw_Clear 0.60 0.77 0.01 0.02 0.29 -0.53 -0.02
5 Omission of the first try (h) 0.60 0.77 0.01 0.02 0.44 -0.35 -0.01
6 No tools, 735 (h) 0.59 0.75 0.20 0.36 0.44 -0.36 -0.16
7 Single draw (h) 0.57 0.74 0.04 0.05 0.69 -0.16 -0.02
8 Draw and revisit 0.35 0.54 0.03 0.03 1.07 0.03 0.00
9 Draw with clear or erase, revisit 0.34 0.53 0.06 0.06 1.04 0.02 0.00
10 Irrelevant tools (TOS) or reentries 0.20 0.35 0.03 0.02 1.59 0.20 0.01
11 Irrelevant tools (theme) or revisit 0.19 0.34 0.03 0.02 1.41 0.15 0.01
12 Draw with clear or erase 0.01 0.02 0.05 0.04 1.34 0.13 0.01
13 Multiple revisits or reentries 0.00 0.00 0.03 0.01 2.46 0.39 0.02
14 Omission of the first try, non-735 (h) 0.00 0.00 0.02 0.01 1.63 0.21 0.01
15 Draw_Erase or Draw_Clear, non-735 0.00 0.00 0.02 0.00 5.31 0.73 0.02
16 No tools, non-735 (h) 0.00 0.00 0.36 0.20 2.20 0.34 0.16
17 Revisit for review, non-735 (h) 0.00 0.00 0.03 0.02 1.59 0.20 0.01
18 Single draw, non-735 (h) 0.00 0.00 0.03 0.02 1.40 0.15 0.01

Note: The first pair of columns gives the model-implied correct response probability; the second pair gives the model-implied probability of belonging to each class. (h) indicates a homogeneous class with the same action sequence. OR: Model-implied odds ratio of class probabilities for LD against TD; LOR: Log odds ratio; DIFF: Difference in class probabilities. Source: U.S. Department of Education, National Center for Education Statistics, “Response Process Data from the NAEP 2017 Grade 8 Mathematics Assessment.”

From Table 4, we identify the test-taking behaviors contributing to the performance gap between LD and TD students by focusing on the latent classes with substantial class probability differences on both the odds ratio and absolute difference scales. Since most class probabilities are small except for Classes 6 and 16, some absolute proportion differences are also small. The classes with higher correct response probabilities were Class 2 “Revisit for review, 735”, Class 4 “Draw_Clear”, Class 6 “No tools, 735”, and Class 7 “Single draw”. The class probability odds ratios for these classes were 0.33, 0.29, 0.44, and 0.69, indicating that LD students were less likely to belong to them. Specifically, LD students were less likely to revisit the item for review and submit an answer with the numbers 7, 3, and 5. Additionally, behaviors such as using scratchwork with a single draw stroke or clearing the scratchwork immediately after drawing were associated with higher correct response probabilities, yet LD students were less likely to display these behaviors. When using no tools, LD students were less likely to submit an answer containing 7, 3, and 5, suggesting they were more likely to make non-decimal-point errors and demonstrate misconceptions in their responses. On the other hand, LD students were more likely to belong to the low-performing classes, Class 13 “Multiple revisits or reentries”, Class 15 “Draw_Erase or Draw_Clear, non-735”, and Class 16 “No tools, non-735”, with class probability odds ratios of 2.46, 5.31, and 2.20, respectively. These results suggest that the behaviors associated with worse performance, and more commonly observed among LD students, include multiple revisits, a sequence of Draw and Erase or Clear with non-735 responses, and using no tools with non-735 responses.

These results reveal key differences in test-taking behaviors between LD and TD students, particularly in their use of scratchwork, item review, and response patterns for non-735 answers. TD students were more likely to engage in effective scratchwork strategies, such as making a single draw stroke, which were associated with higher correct response probabilities. In contrast, LD students tended to engage in unproductive behaviors, such as repeatedly revisiting or re-entering answers, which are associated with lower performance. Additionally, for students who submitted a non-735 answer, common incorrect responses such as Inline graphic and Inline graphic suggest deeper misconceptions about decimal multiplication. These answers could be derived from the following computations: Inline graphic ; Inline graphic ; and Inline graphic . Each of these errors goes beyond simple misplacement of the decimal point and indicates fundamental misunderstandings about multiplication rules in the context of decimals.

In Figure 5, the global structure of the selected process features is displayed on a two-dimensional plot using t-distributed stochastic neighbor embedding (t-SNE; Van der Maaten & Hinton, 2008), a popular dimension-reduction method for visualization that preserves the similarity between observations by considering each observation's nearest neighbors. While Figure 5 displays a grouping of the 18 classes into distinct areas, some classes are less clearly separated. The homogeneous classes appear as single points, as their members had identical feature values. Classes 10 and 11 were the most dispersed classes in the plot because students in these classes browsed the available tools in varied orders and submitted various responses.

Figure 5.

Figure 5

t-SNE plot of the selected process features from the NAEP math assessment item VH336968.

Source: U.S. Department of Education, National Center for Education Statistics, “Response Process Data from the NAEP 2017 Grade 8 Mathematics Assessment.”

6. Discussion

Process data collected in computerized testing preserve valuable information beyond traditional response data. However, analyzing process data is challenging because of its unstructured format and noise, which hinder the use of traditional approaches developed for rectangular data. This study provides an approach to a traditionally challenging task, explaining group performance gaps, using new but noisy process data. The proposed LCMA procedure is a general statistical method that can be applied whenever the latent class variable underlying action sequences is posited as the mediator between an observed predictor and an outcome. The latent class mediation model and the headlong search algorithm enable dimension reduction and noise elimination in the process features, enhancing the interpretability of the results.

Latent class analysis with continuous indicators, often called latent profile analysis or Gaussian mixture clustering, is extended here to an LCMA. To the best of our knowledge, the current study is the first to extend latent class analysis assuming multivariate normality of the indicators into a latent class mediation model that includes both a covariate and a distal outcome to assess the mediation effect via a nominal latent class variable. A few studies have used a latent class mediator with continuous indicators: for example, Sint et al. (2021) proposed an LCMA in which each observed continuous indicator was specified as a generalized linear model given the latent class, but that approach did not account for the covariance structure of the indicators.

Process data from large-scale assessments can help explain why certain students are struggling, guiding evidence-based efforts to improve educational equity. The proposed analysis can help educators design targeted treatments for specific subgroups. With the NAEP Math Assessment data, we showed that the proposed LCMA can identify a latent class variable that explains the performance gap on a multiplication item between students with learning disabilities and their TD peers. Each class was interpreted and labeled based on summary statistics, such as the tool usage rates of the students classified into it. Calculating the model-implied correct response probabilities and class probabilities from the parameter estimates of the proposed model then allowed us to attribute the performance gap between the two groups to differences in test-taking behaviors. The key point is that identifying the latent classes underlying the features, and examining how the two groups differ in their probabilities of belonging to each latent class, allows us to explain the performance gap between the two groups.

The NAEP Math Assessment data analysis has practical implications, demonstrating the importance of identifying specific test-taking behaviors that contribute to performance gaps between LD and TD students. By focusing on behaviors such as revisiting questions and employing effective problem-solving strategies, educators can design targeted interventions to help LD students develop more effective test-taking habits and improve their overall performance. Additionally, grouping students based on their specific test-taking behaviors can allow teachers to provide more focused support and instruction to meet individual needs; such strategies can help bridge the gap in academic performance between LD and TD students.

The current study implemented a simultaneous estimation method for the latent class mediation model using an EM algorithm. This estimation method is supported by the simulation results, as the model parameters were accurately estimated when the data were generated from the true model. In addition, Bolck et al. (2004) suggested that simultaneous estimation is viable in latent class analysis with continuous indicators when a distal outcome is predicted by the latent class variable. However, in mediation analysis with a latent variable, variants of two-step and three-step estimation approaches with adjustments for classification error may also be available. In the context of LCMA with categorical indicators, Hsiao et al. (2021) compared six estimation methods, including variations of one-step, two-step, and three-step approaches. Estimation for the LCMA with continuous indicators under various conditions warrants further investigation.

A headlong search algorithm for feature selection is proposed, with the objective of finding the subset of process features that maximizes the TIE. In the simulation, the proposed feature selection algorithm performed well in selecting the signal features while excluding the noisy features irrelevant to the true clustering. This approach aligns with exploratory mediation analysis (van Kesteren & Oberski, 2019), where a mediation filter is used to find the subset of many potential mediators that explains the effect of the predictor on the outcome. One caveat applies to the proposed feature selection algorithm: each inclusion and exclusion step requires a significance test, and as the number of iterations in the search algorithm increases, the family-wise type-I error becomes hard to control at a desired significance level. Therefore, family-wise error control methods developed for step-wise variable selection, such as the Bonferroni correction, may be considered. Alternatively, different criteria, such as a decrease in the DE, could be used for selecting the initial set of features, or a search algorithm that does not rely on step-wise decisions could be implemented.

We adopted the counterfactual approach (Pearl, 2010; Robins & Greenland, 1992) and the formal definitions of effects involving a latent class mediator described in Muthén (2011) to assess the TIE of the nominal latent class mediator. The indirect effects defined in the counterfactual framework rely on several strict assumptions, described in Imai et al. (2010), Valeri & VanderWeele (2013), and VanderWeele & Vansteelandt (2009). Some of these assumptions can be satisfied when the predictor is a randomized treatment; others require that there be no unmeasured confounders of the predictor-outcome and mediator-outcome relationships. The effects of measured confounding variables can be controlled by including them as covariates, as described previously. In observational research, however, demographic predictors such as learning disability status cannot be randomized, and the indirect effect estimates are biased when some of the assumptions are violated.

Importantly, we emphasize that the proposed framework is intended as an exploratory tool for generating hypotheses about causal relationships in complex process data, rather than for drawing causal claims about test-taking behavior. To advance from hypothesis generation to more robust causal statements, future work could integrate formal sensitivity analyses. For example, future studies could adopt bias-adjustment formulas for unmeasured confounding (Vander Weele & Arah, 2011), sensitivity analysis for causal mediation effects (Imai et al., 2010), or statistical methods for examining and adjusting for assumption violations (MacKinnon & Pirlott, 2015).

Complex latent class models are susceptible to convergence at local optima, which can in turn affect BIC-based model selection. We initialize the EM algorithm via hierarchical agglomerative clustering, as implemented in the mclust R package (Scrucca et al., 2023), to improve the chance of arriving at an accurate model solution. Nonetheless, future extensions could incorporate multiple-start EM runs, as implemented in Mplus (Muthén & Muthén, 2017). It should also be noted that uncertainty in BIC-based model selection can propagate to the mediation effect estimates: unaddressed model selection variability may lead to underestimation of the posterior uncertainty of the indirect effect parameters. To address this, one can adopt fully Bayesian approaches that treat the number of latent classes as a random variable (see, e.g., Chen et al., 2021; Richardson & Green, 1997; Stephens, 2000) or apply Bayesian model averaging over candidate models (see, e.g., Hoeting et al., 1999; Russell et al., 2015; Wasserman, 2000).
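A minimal sketch of the multiple-start strategy, using a univariate Gaussian mixture as a stand-in for the LCMA measurement model: EM is run from several random starts, the highest-likelihood solution is kept, and BIC compares candidate numbers of classes. All settings below are illustrative.

```python
# Multiple-start EM with BIC-based selection of the number of classes,
# sketched for a univariate Gaussian mixture (a simplification of the
# LCMA measurement model; not the authors' implementation).
import math
import random

def em_gmm(x, L, iters=100, seed=0):
    """EM for a univariate L-component Gaussian mixture; returns (loglik, means)."""
    rng = random.Random(seed)
    n = len(x)
    mu = rng.sample(x, L)          # random start: means drawn from the data
    sd = [1.0] * L
    w = [1.0 / L] * L
    def dens(xi, l):
        return w[l] * math.exp(-(xi - mu[l]) ** 2 / (2 * sd[l] ** 2)) / (
            sd[l] * math.sqrt(2 * math.pi))
    for _ in range(iters):
        resp = []
        for xi in x:               # E-step: posterior class probabilities
            d = [dens(xi, l) for l in range(L)]
            s = sum(d)
            resp.append([dl / s for dl in d])
        for l in range(L):         # M-step: update weights, means, sds
            nl = sum(r[l] for r in resp)
            w[l] = nl / n
            mu[l] = sum(r[l] * xi for r, xi in zip(resp, x)) / nl
            sd[l] = max(1e-3, math.sqrt(
                sum(r[l] * (xi - mu[l]) ** 2 for r, xi in zip(resp, x)) / nl))
    ll = sum(math.log(sum(dens(xi, l) for l in range(L))) for xi in x)
    return ll, mu

def best_of_starts(x, L, n_starts=8):
    # Multiple-start strategy: keep the highest-likelihood solution.
    return max(em_gmm(x, L, seed=s) for s in range(n_starts))

def bic(ll, L, n):
    k = 3 * L - 1                  # L means + L sds + (L - 1) free weights
    return -2.0 * ll + k * math.log(n)

rng = random.Random(1)
x = [rng.gauss(-3, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]
ll1, _ = best_of_starts(x, 1)
ll2, _ = best_of_starts(x, 2)
print(bic(ll2, 2, len(x)) < bic(ll1, 1, len(x)))
```

For this well-separated two-cluster sample, the two-class model attains the lower BIC.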

Other machine learning techniques can be used to extract process features from the unstructured action sequence data while preserving the information in the original data. The type of information retained in the process features depends on the feature extraction method. For example, N-gram-based techniques extract the frequencies of short subsequences of actions (e.g., He & von Davier, 2016). One potential advantage of using N-gram-based features in latent class mediation modeling is interpretability: each feature corresponds to the frequency of a specific subsequence of actions, so the selected features can directly show the test-taking behaviors that explain performance gaps between groups. However, N-gram features are discrete variables, and the multivariate normality assumption of the proposed analysis may not hold for them. A future extension of this study could incorporate discrete features, such as count or binary data, into the proposed analysis framework. Another possible direction is extending the model to accommodate a multi-categorical group membership predictor by introducing $C-1$ dummy variables with their corresponding regression coefficients, where C is the number of categories.
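As a concrete illustration of such features, bigram counts can be computed directly from the action sequences; the sequences below are toy examples.

```python
# N-gram (here, bigram) feature extraction from action sequences, in the
# spirit of He & von Davier (2016). Each feature counts a specific ordered
# pair of adjacent actions, so the features stay directly interpretable as
# test-taking behaviors. The sequences below are toy examples.
from collections import Counter

def ngram_counts(seq, n=2):
    """Counts of the n-grams (tuples of n adjacent actions) in one sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

seqs = [
    ["Enter_Item", "Open_Scratchwork", "Draw", "Exit_Item"],
    ["Enter_Item", "Draw", "Draw", "Exit_Item"],
]
vocab = sorted({g for s in seqs for g in ngram_counts(s)})
features = [[ngram_counts(s)[g] for g in vocab] for s in seqs]
print(features)  # one count vector per examinee, aligned to vocab
```

Each column of `features` is tied to one named bigram in `vocab`, which is what makes selected N-gram features easy to read off as behaviors.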

Appendix A. MDS for action sequence data

MDS (Borg & Groenen, 2005) is a dimension reduction method that extracts latent features based on pairwise dissimilarity measures between observations. MDS is widely used for data visualization and in many areas of psychometrics (Takane, 2006). The goal of MDS is to locate observations in a vector space based on their pairwise dissimilarities, such that similar observations are placed close together and less similar ones farther apart. Tang et al. (2020) proposed using MDS to extract process features from problem-solving process data. In process data analysis, if the dissimilarities effectively capture differences between two processes, the coordinates derived from MDS can serve as features containing information about the original processes (Tang et al., 2020).

The dissimilarity measure between two action sequences takes into account the number of unique actions and the order of common actions (Gómez-Alonso & Valls, 2008). Let $\mathbf{s}_i$ and $\mathbf{s}_j$ be the action sequences of examinees $i$ and $j$, with lengths $L_i$ and $L_j$, respectively. Let $C_{ij}$ denote the set of common actions that appear in both $\mathbf{s}_i$ and $\mathbf{s}_j$, and let $U_{i\setminus j}$ denote the set of actions that appear in $\mathbf{s}_i$ but not in $\mathbf{s}_j$. Let $c_a(\mathbf{s}_i)$ be the number of times that an action $a$ appears in $\mathbf{s}_i$, and let $p^{a}_{k}(\mathbf{s}_i)$ denote the position of the $k$th appearance of $a$ in $\mathbf{s}_i$. Then, the dissimilarity among the common actions in $\mathbf{s}_i$ and $\mathbf{s}_j$ is quantified as

Appendix A. (A.1)

where, for each common action $a$, the comparison runs over its matched appearances in the two sequences. The count of unique actions appearing in only one of $\mathbf{s}_i$ and $\mathbf{s}_j$ is quantified as

Appendix A. (A.2)

Then, the dissimilarity between two action sequences is defined by

Appendix A. (A.3)
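To make the ingredients concrete, the following sketch combines position differences of matched common actions with a count of actions unique to one sequence. The normalization used here is an illustrative assumption, not the exact form of Equations (A.1)–(A.3).

```python
# A simplified illustration of a sequence dissimilarity built from the same
# ingredients as Equations (A.1)-(A.3): position differences of matched
# common actions plus a count of actions unique to one sequence. The
# normalization here is an illustrative assumption, not the exact form in
# Gomez-Alonso & Valls (2008).
def positions(seq, a):
    """1-based positions at which action a appears in seq."""
    return [k for k, act in enumerate(seq, start=1) if act == a]

def dissimilarity(si, sj):
    Li, Lj = len(si), len(sj)
    common = set(si) & set(sj)
    # (A.1)-style term: relative-position differences of matched appearances
    f = 0.0
    for a in common:
        for p, q in zip(positions(si, a), positions(sj, a)):
            f += abs(p / Li - q / Lj)
    # (A.2)-style term: tokens whose action type occurs in only one sequence
    g = sum(a not in common for a in si) + sum(a not in common for a in sj)
    # (A.3)-style combination, normalized by the combined length
    return (f + g) / (Li + Lj)

s1 = ["Enter_Item", "Draw", "Exit_Item"]
s2 = ["Enter_Item", "Clear", "Draw", "Exit_Item"]
print(dissimilarity(s1, s1), dissimilarity(s1, s2))
```

As required of a dissimilarity, identical sequences score zero, the measure is symmetric, and sequences differing in content or order score positive values.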

Let $\mathbf{D} = (d_{ij})$ be the $N \times N$ symmetric dissimilarity matrix, where $d_{ij}$ measures the dissimilarity between $\mathbf{s}_i$ and $\mathbf{s}_j$. Higher dissimilarities indicate greater disparities, and the dissimilarity between identical action sequences is zero. MDS assigns each action sequence a latent vector $\mathbf{x}_i$ in the $K$-dimensional Euclidean space such that the distances among these vectors approximate the dissimilarities. Applying MDS to the dissimilarity matrix $\mathbf{D}$ minimizes

$$\sum_{i<j} \left( d_{ij} - \lVert \mathbf{x}_i - \mathbf{x}_j \rVert \right)^2 . \quad \text{(A.4)}$$

Stochastic gradient descent (Robbins & Monro, 1951) can be used to solve the optimization problem. The resulting feature vectors $\mathbf{x}_1, \ldots, \mathbf{x}_N$ extracted from the action sequence process data have a standard form with homogeneous dimension while preserving the information in the original sequences. Hence, they can serve as substitutes for the action sequences in traditional statistical models such as generalized linear models (Tang et al., 2020). The number of process features K can be chosen by cross-validation, minimizing the loss function in Equation A.4.
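A minimal sketch of this optimization, using full-batch gradient descent on the stress in Equation A.4 in place of stochastic updates, for a toy three-sequence dissimilarity matrix:

```python
# Metric MDS for Equation (A.4): full-batch gradient descent on the
# stress, used here in place of stochastic updates for clarity. The toy
# dissimilarity matrix below is an illustrative stand-in for the action
# sequence dissimilarity matrix D.
import math
import random

def mds(D, K=2, steps=3000, lr=0.01, seed=0):
    rng = random.Random(seed)
    n = len(D)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * K for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [X[i][k] - X[j][k] for k in range(K)]
                dist = max(math.sqrt(sum(t * t for t in diff)), 1e-9)
                c = 2.0 * (dist - D[i][j]) / dist  # d(stress)/d(dist) factor
                for k in range(K):
                    grad[i][k] += c * diff[k]
        X = [[X[i][k] - lr * grad[i][k] for k in range(K)] for i in range(n)]
    return X

def stress(D, X):
    # Sum over pairs i < j of (d_ij - ||x_i - x_j||)^2, as in (A.4)
    s = 0.0
    for i in range(len(D)):
        for j in range(i + 1, len(D)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
            s += (D[i][j] - dist) ** 2
    return s

D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # three equidistant "sequences"
X = mds(D)
print(round(stress(D, X), 6))
```

Three equidistant objects embed exactly in two dimensions (an equilateral triangle), so the stress is driven essentially to zero.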

Appendix B. Approximation of the standard error of TIE by delta method

The TIE for the LCMA can be expressed as:

Appendix B. (B.1)

where

Appendix B. (B.2)

Let’s denote Inline graphic as,

Appendix B. (B.3)

Then, the partial derivatives of this quantity with respect to the model parameters are

Appendix B. (B.4)

with

Appendix B. (B.5)

and

Appendix B. (B.6)

The partial derivatives of the TIE with respect to the parameters are:

Appendix B. (B.7)
Appendix B. (B.8)

where

Appendix B. (B.9)
Appendix B. (B.10)

The gradient of the TIE with respect to the parameters is:

Appendix B. (B.11)

The approximated standard error of the estimated TIE is then $\sqrt{\nabla^{\top} \widehat{\Sigma} \nabla}$, where $\nabla$ is the gradient in Equation B.11 and $\widehat{\Sigma}$ is the covariance matrix of the parameter estimates. The $(1-\alpha)$ confidence interval of the TIE is constructed as $\widehat{\mathrm{TIE}} \pm z_{1-\alpha/2} \widehat{\mathrm{SE}}$, where $z_{1-\alpha/2}$ is the critical value from the standard normal distribution.
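A numerical sketch of this delta-method construction, for a simplified two-class case in which the TIE reduces to $\gamma\,(\pi(1)-\pi(0))$ with $\pi(g)$ a logistic function of the group indicator. The parameter values and the covariance matrix below are hypothetical placeholders, not estimates from the LCMA.

```python
# Delta-method SE for the TIE in a simplified two-class setting:
# TIE = gamma * (pi(1) - pi(0)), pi(g) = logistic(b0 + b1 * g) is the
# probability of the second latent class given group g. theta and Sigma
# are hypothetical placeholders for estimates and their covariance.
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def tie(theta):
    b0, b1, gamma = theta
    return gamma * (logistic(b0 + b1) - logistic(b0))

def grad_tie(theta, eps=1e-6):
    # Central finite differences stand in for the analytic gradient (B.11).
    g = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        g.append((tie(up) - tie(dn)) / (2 * eps))
    return g

theta = [-0.5, 1.2, 0.8]              # hypothetical b0, b1, gamma
Sigma = [[0.04, 0.00, 0.00],          # hypothetical parameter covariance
         [0.00, 0.09, 0.00],
         [0.00, 0.00, 0.01]]
g = grad_tie(theta)
var = sum(g[i] * Sigma[i][j] * g[j] for i in range(3) for j in range(3))
se = math.sqrt(var)
ci = (tie(theta) - 1.96 * se, tie(theta) + 1.96 * se)
print(round(tie(theta), 3), round(se, 3), tuple(round(c, 3) for c in ci))
```

Because the TIE is linear in $\gamma$, the finite-difference gradient component for $\gamma$ recovers $\pi(1)-\pi(0)$ exactly, which provides a quick internal check on the computation.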

Appendix C. True model parameter values in the simulation study

In the simulation study, the true model parameter values were set as follows. The parameter values below were fixed across all simulation conditions:

Appendix C. (C.1)

The true class-specific mean structure is given in Figure 3. The true class-specific covariance matrix is composed of a scalar $\lambda_l$ that controls the volume and a diagonal matrix $B$, so that the covariance matrix of class $l$ is $\Sigma_l = \lambda_l B$. In the equal-volume conditions, $\lambda_l$ was set as

Appendix C. (C.2)

so that the volume of the four classes is equal, and small enough to yield no between-class overlap. In the varying-volume conditions, $\lambda_l$ was set as

Appendix C. (C.3)

The classes were allowed to vary in their volumes and to have overlapping observations between classes, as demonstrated in Figure 4. The diagonal elements of $B$ were generated as follows, where $K$ is the number of items.

Appendix C. (C.4)

The variance of the $k$th item within a class is 1.2 times the variance of the first item. Then, the diagonal of $B$ was normalized by its geometric mean to satisfy $|B| = 1$.

Appendix C. (C.5)

Table D1.

Parameter recovery with fixed L and indicator set (parameter symbols not rendered). The first Bias/RMSE pair corresponds to the smaller sample size, the second to the larger.

                     Smaller sample        Larger sample
True value          Bias      RMSE        Bias      RMSE
0.333              0.033     0.206       0.009     0.169
0.667              0.014     0.249       0.013     0.172
1                  0.033     0.249      −0.01      0.164
0.667              0.009     0.226       0.014     0.183
1.333             −0.018     0.304      −0.011     0.164
2                 −0.041     0.281       0.029     0.223
0.667             −0.056     0.233      −0.026     0.171
1.333              0.008     0.282      −0.013     0.201
2                 −0.058     0.332      −0.017     0.210
1                 −0.006     0.267      −0.013     0.154
0.333             −0.002     0.215      −0.008     0.152
0.333              0.022     0.243       0.002     0.146
1                 −0.024     0.252       0.032     0.199
0                 −0.017     0.187      −0.012     0.127
0.5               −0.005     0.206       0.025     0.128

Appendix D. Model parameter recovery check

In this section, we evaluate the accuracy of the parameter estimates of the LCMA model. The LCMA model was fitted using all indicators, with the number of latent classes L fixed at its true value, to assess the parameter estimation accuracy of the EM algorithm. Random samples were generated under a latent class mediation model with four latent classes and the set of signal indicators. Two sample sizes, a smaller and a larger one, were considered. The true parameter values are specified in Equations C.1–C.2 and C.4–C.5. The additional parameters related to the covariate were set as follows:

Appendix D. (D.1)

The true mean structure was specified as in Equation D.2. Simulation results based on 100 replications are summarized in Table D1. The bias ranged from −0.058 to 0.033 under the smaller sample size and, as the sample size increased, ranged from −0.026 to 0.032. The RMSE also decreased from (0.187, 0.332) to (0.127, 0.223) as the sample size increased.

Appendix D. (D.2)

Appendix E. Simulation under alternative data generating models

In this section, we evaluate the performance of the LCMA procedure under four alternative data-generating models, in terms of variable selection and parameter estimation accuracy. The alternative models include the following.

  1. Confounder X and a non-zero direct effect (DE).

  2. Noisy latent class underlying a noisy feature.

  3. Unmeasured mediator.

  4. Mixture Poisson distribution.

In each condition, 100 random samples were generated with sample size Inline graphic and true number of latent classes Inline graphic .
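The generating scheme shared by these conditions can be sketched as follows: the predictor G determines the latent class probabilities through a logit model, features are Gaussian given the class, and the outcome receives a direct effect of G plus a class effect. All numeric values below are hypothetical placeholders, not the values in Equations (C.1)–(E.8).

```python
# A generic sketch of the shared data-generating scheme: G -> latent class
# (logit), class -> Gaussian features, and Y = direct effect of G + class
# effect + noise. All numeric values are hypothetical placeholders.
import math
import random

def generate(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = rng.randint(0, 1)                        # observed group
        p1 = 1.0 / (1.0 + math.exp(0.3 - 1.0 * g))   # P(class 2 | G = g)
        m = 1 if rng.random() < p1 else 0            # latent class mediator
        mu = [1.0, -1.0] if m else [-1.0, 1.0]       # class-specific means
        x = [rng.gauss(mk, 0.5) for mk in mu]        # process features
        y = 0.2 * g + 0.8 * m + rng.gauss(0, 0.3)    # DE of G + class effect
        data.append((g, m, x, y))
    return data

sample = generate(1000)

def class2_rate(grp):
    sub = [m for g, m, _, _ in sample if g == grp]
    return sum(sub) / len(sub)

# Mediation appears as a higher rate of class 2 in group G = 1.
print(class2_rate(0), class2_rate(1))
```

Each alternative condition modifies one piece of this scheme, for instance by adding a confounder of the G–class and class–Y paths, or by replacing the Gaussian feature model with a Poisson one.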

In Condition 1, the effect of a predictor–mediator and mediator–outcome confounder X is included in the data-generating model and is estimated. The true parameter values are specified as Equations C.1–C.2 and C.4–C.5. The parameters related to the confounder effect were set as follows:

Appendix E. (E.1)

The number of signal indicators and the true mean structure were specified as in Figure 3. In addition, we included a non-zero direct effect, that is, a non-zero DE of the predictor on the outcome given the latent class mediator and the confounder.

In Condition 2, a noisy latent class variable underlying a noisy feature was generated. This noisy latent class variable was unrelated to both the predictor and the outcome in the generating model. In this condition, we evaluated whether the proposed algorithm correctly selects the signal features despite the presence of a clustering structure underlying a noisy feature. More specifically, the signal latent class variable was generated as a function of the predictor G,

Appendix E. (E.2)

Then, the signal features were generated as

Appendix E. (E.3)

Then, the final outcome Y was generated as a function of the predictor G and the signal latent class variable, similar to Equation 4. The true parameter values are specified in Equations C.1–C.2 and C.4–C.5. The number of signal indicators and the true mean structure were specified as in Figure 3. The noisy latent class variable was generated independently from a Bernoulli distribution.

Appendix E. (E.4)

Then, one of the noisy features was generated given the noisy latent class membership as,

Appendix E. (E.5)

In Condition 3, an unmeasured mediator was considered, generated as a function of the predictor G,

Appendix E. (E.6)

Then, the outcome variable Y was generated as a function of G, the latent class mediator, and the unmeasured mediator.

Appendix E. (E.7)

The true parameter values are specified in Equations C.1–C.2 and C.4–C.5. The number of signal indicators and the true mean structure were specified as in Figure 3.

In Condition 4, we evaluate the performance of the proposed algorithm when the normality assumption is violated. The features were generated under a mixture Poisson distribution with class-specific rates given in Equation E.8, in which each column represents a latent class; the first five rows correspond to the signal indicators and the last five rows to the noisy indicators. All other true parameter values were set as in Equation C.1.

Appendix E. (E.8)

The results from the additional simulation with alternative data-generating models are presented in Table E2. The classification accuracy remained reasonably high, with the ARI ranging from 0.79 to 0.91 across conditions. The false positive rate for selecting noisy indicators ranged from 0.03 to 0.14. The bias in the TIE was small, except in Condition 3 with an unmeasured mediator: when the unmeasured mediator was not included in the model, the TIE was overestimated, and the RMSE of the TIE was likewise highest in Condition 3.

Table E2.

Results from the additional simulation with alternative data-generating models

                                                    TIE
Condition   ARI    N.class   N.ind   FP     TP     Bias    RMSE    95% C.R.
1           0.89   3.34      3.36    0.03   0.64   0.011   0.025   0.92
2           0.79   4.53      3.43    0.14   0.55   0.008   0.024   0.89
3           0.91   3.70      3.15    0.05   0.58   0.026   0.033   0.65
4           0.79   5.15      3.27    0.05   0.61   0.006   0.026   0.88

Data availability statement

The data that support the findings of this study are available from the U.S. Department of Education, National Center for Education Statistics. Restrictions apply to the availability of these data, which were used under license for this study. The code is available in the Open Science Framework (OSF) repository at https://osf.io/a5zem/?view_only=983859876f2547bb977e02e5dfef6a3d.

Funding statement

The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R324P230002 to Digital Promise and the University of Illinois Urbana-Champaign. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.

References

  1. Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821. [Google Scholar]
  2. Bergner, Y. , & von Davier, A. A. (2019). Process data in NAEP: Past, present, and future. Journal of Educational and Behavioral Statistics, 44(6), 706–732. [Google Scholar]
  3. Bolck, A. , Croon, M. , & Hagenaars, J. (2004). Estimating latent structure models with categorical variables: One-step versus three-step estimators. Political Analysis, 12(1), 3–27. [Google Scholar]
  4. Borg, I. , & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media. [Google Scholar]
  5. Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1), 76–90. [Google Scholar]
  6. Celeux, G. , & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793. [Google Scholar]
  7. Chen, Y. , Liu, Y. , Culpepper, S. A. , & Chen, Y. (2021). Inferring the number of attributes for the exploratory DINA model. Psychometrika, 86(1), 30–64. [DOI] [PubMed] [Google Scholar]
  8. Chen, Y. (2020). A continuous-time dynamic choice measurement model for problem-solving process data. Psychometrika, 85(4), 1052–1075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Dean, N. , & Raftery, A. E. (2010). Latent class analysis variable selection. Annals of the Institute of Statistical Mathematics, 62, 11–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Dempster, A. P. , Laird, N. M. , & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22. [Google Scholar]
  11. Dziak, J. J. , Bray, B. C. , Zhang, J. , Zhang, M. , & Lanza, S. T. (2016). Comparing the performance of improved classify-analyze approaches for distal outcomes in latent profile analysis. Methodology, 12(4), 107–116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Eichmann, B. , Goldhammer, F. , Greiff, S. , Brandhuber, L. , & Naumann, J. (2020). Using process data to explain group differences in complex problem solving. Journal of Educational Psychology, 112(8), 1546. [Google Scholar]
  13. Eichmann, B. , Greiff, S. , Naumann, J. , Brandhuber, L. , & Goldhammer, F. (2020). Exploring behavioural patterns during complex problem-solving. Journal of Computer Assisted Learning, 36(6), 933–956. [Google Scholar]
  14. Fang, G. , & Ying, Z. (2020). Latent theme dictionary model for finding co-occurrent patterns in process data. Psychometrika, 85(3), 775–811. [DOI] [PubMed] [Google Scholar]
  15. Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal, 13(3), 317–322. [Google Scholar]
  16. Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97(458), 611–631. [Google Scholar]
  17. Gao, Y. , Cui, Y. , Bulut, O. , Zhai, X. , & Chen, F. (2022). Examining adults’ web navigation patterns in multi-layered hypertext environments. Computers in Human Behavior, 129, 107142. [Google Scholar]
  18. Gao, Y. , Zhai, X. , Bulut, O. , Cui, Y. , & Sun, X. (2022). Examining humans’ problem-solving styles in technology-rich environments using log file data. Journal of Intelligence, 10(3), 38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Mathematics of Computation, 24(109), 23–26. [Google Scholar]
  20. Gómez-Alonso, C., & Valls, A. (2008). A similarity measure for sequences of categorical data based on the ordering of common elements. In V. Torra & Y. Narukawa (Eds.), Modeling decisions for artificial intelligence: 5th international conference, MDAI 2008, Sabadell, Spain, October 30–31, 2008, proceedings. Springer. [Google Scholar]
  21. Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students’ minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers & Education, 91, 92–105. [Google Scholar]
  22. Hao, J. , & Mislevy, R. J. (2019). Characterizing interactive communications in computer-supported collaborative problem-solving tasks: A conditional transition profile approach. Frontiers in Psychology, 10, 424340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. He, Q. , Borgonovi, F. , & Suárez-Álvarez, J. (2022). Clustering sequential navigation patterns in multiple-source reading tasks with dynamic time warping method. Journal of Computer-Assisted Learning, 39(3), 719–736. 10.1111/jcal.12748. [DOI] [Google Scholar]
  25. He, Q. , Shi, Q. , & Tighe, E. L. (2023). Predicting problem-solving proficiency with multiclass hierarchical classification on process data: A machine learning approach. Psychological Test and Assessment Modeling, 65(1), 145–177. [Google Scholar]
  26. He, Q. , & von Davier, M. (2016). Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, D. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 750–777). IGI Global. [Google Scholar]
  27. He, Q. , von Davier, M. , & Han, Z. (2019). Exploring process data in problem-solving items in computer-based large-scale assessments: Case studies in PISA and PIAAC. In Jiao H., Lissitz R. W., & Wie A. (Eds.), Data analytics and psychometrics: Informing assessment practices. Information Age Publishing, Inc. [Google Scholar]
  28. Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial (with comments by M. Clyde, D. Draper, and E. I. George, and a rejoinder by the authors). Statistical Science, 14(4), 382–417. [Google Scholar]
  29. Hsiao, Y.-Y. , Kruger, E. S. , Lee Van Horn, M. , Tofighi, D. , MacKinnon, D. P. , & Witkiewitz, K. (2021). Latent class mediation: A comparison of six approaches. Multivariate Behavioral Research, 56(4), 543–557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Hubert, L. , & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218. [Google Scholar]
  31. Imai, K. , Keele, L. , & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309. [DOI] [PubMed] [Google Scholar]
  32. Imai, K. , Keele, L. , & Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71. [Google Scholar]
  33. Judge, S. , & Watson, S. M. (2011). Longitudinal outcomes for mathematics achievement for students with learning disabilities. The Journal of Educational Research, 104(3), 147–157. [Google Scholar]
  34. Keribin, C. (1998). Consistent estimate of the order of mixture models. Comptes Rendus De L Academie Des Sciences Serie I-Mathematique, 326(2), 243–248. [Google Scholar]
  35. LaMar, M. M. (2018). Markov decision process measurement model. Psychometrika, 83(1), 67–88. [DOI] [PubMed] [Google Scholar]
  36. Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. Studies in Social Psychology in World War II Vol. IV: Measurement and Prediction, IV, 362–412. [Google Scholar]
  37. Lazarsfeld, P. F. (1968). Latent structure analysis. New York, NY: Houghton Mifflin.
  38. Liao, D. , He, Q. , & Jiao, H. (2019). Mapping background variables with sequential patterns in problem-solving environments: An investigation of United States adults’ employment status in PIAAC. Frontiers in Psychology, 10, 646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. MacKinnon, D. P. , & Pirlott, A. G. (2015). Statistical approaches for enhancing causal interpretation of the M to Y relation in mediation analysis. Personality and Social Psychology Review, 19(1), 30–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Murphy, K. , & Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2), 293–325. [Google Scholar]
  41. Murtagh, F. , & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: Which algorithms implement ward’s criterion? Journal of Classification, 31, 274–295. [Google Scholar]
  42. Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. Los Angeles, CA. Available at: https://www.statmodel.com/download/causalmediation.pdf.
  43. Muthén, B., & Muthén, L. (2017). Mplus. In Handbook of item response theory (pp. 507–518). Chapman & Hall/CRC. [Google Scholar]
  44. NCES. (2020). Process data from the 2017 NAEP grade 8 mathematics assessment. Accessed: June 15, 2024.
  45. Oberski, D. (2016). Mixture models: Latent profile and latent class analysis. In J. Robertson, & M. Kaptein, (Eds.), Modern Statistical Methods for HCI, Springer, 275–287. [Google Scholar]
  46. Pearl, J. (2010). An introduction to causal inference. The International Journal of Biostatistics, 6(2), Article 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Preacher, K. J. (2015). Advances in mediation analysis: A survey and synthesis of new developments. Annual Review of Psychology, 66(1), 825–852. [DOI] [PubMed] [Google Scholar]
  48. Qiao, X., & Jiao, H. (2018). Data mining techniques in analyzing process data: A didactic. Frontiers in Psychology, 9, 2231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Richardson, S. , & Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society Series B: Statistical Methodology, 59(4), 731–792. [Google Scholar]
  50. Robbins, H. , & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407. [Google Scholar]
  51. Robins, J. M. , & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143–155. [DOI] [PubMed] [Google Scholar]
  52. Russell, N. , Murphy, T. B. , & Raftery, A. E. (2015). Bayesian model averaging in model-based clustering and density estimation. preprint arXiv:1506.09035.
  53. Scrucca, L., Fraley, C., Murphy, T. B., & Raftery, A. E. (2023). Model-based clustering, classification, and density estimation using mclust in R. Chapman & Hall/CRC. 10.1201/9781003277965 [DOI] [Google Scholar]
  54. Shanno, D. F. (1970). Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24(111), 647–656. [Google Scholar]
  55. Sint, K. , Rosenheck, R. , & Lin, H. (2021). Latent class mediator for multiple indicators of mediation. Statistics in Medicine, 40(12), 2800–2820. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Spurk, D. , Hirschi, A. , Wang, M. , Valero, D. , & Kauffeld, S. (2020). Latent profile analysis: A review and “how to” guide of its application within vocational behavior research. Journal of Vocational Behavior, 120, 103445. [Google Scholar]
  57. Stephens, M. (2000). Bayesian analysis of mixture models with an unknown number of components-an alternative to reversible jump methods. Annals of Statistics, 28(1), 40–74. [Google Scholar]
  58. Takane, Y. (2006). Applications of multidimensional scaling in psychometrics. Handbook of Statistics, 26, 359–400. [Google Scholar]
  59. Tang, X. (2023). A latent hidden markov model for process data. Psychometrika, 89(1), 1–36. [DOI] [PubMed] [Google Scholar]
  60. Tang, X. , Wang, Z. , He, Q. , Liu, J. , & Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85, 378–397. [DOI] [PubMed] [Google Scholar]
  61. Tang, X. , Zhang, S. , Wang, Z. , Liu, J. , & Ying, Z. (2021). ProcData: An R package for process data analysis. Psychometrika, 86(4), 1058–1083. [DOI] [PubMed] [Google Scholar]
  62. Ulitzsch, E. , He, Q. , & Pohl, S. (2022). Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks. Journal of Educational and Behavioral Statistics, 47(1), 3–35. [Google Scholar]
  63. Ulitzsch, E. , He, Q. , Ulitzsch, V. , Molter, H. , Nichterlein, A. , Niedermeier, R. , & Pohl, S. (2021). Combining clickstream analyses and graph-modeled data clustering for identifying common response processes. Psychometrika, 86(1), 190–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Ulitzsch, E. , Ulitzsch, V. , He, Q. , & Lüdtke, O. (2022). A machine learning-based procedure for leveraging clickstream data to investigate early predictability of failure on interactive tasks. Behavior Research Methods, 55(3), 1–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. U.S. Department of Education. Institute of Education Sciences, National Center for Education Statistics. (2022). National assessment of educational progress (NAEP), 2022 reading assessment. Accessed: June 15, 2024.
  66. Valente, M. J. , Pelham III, W. E. , Smyth, H. , & MacKinnon, D. P. (2017). Confounding in statistical mediation analysis: What it is and how to address it. Journal of Counseling Psychology, 64(6), 659–671. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Valeri, L. , & Vander Weele, T. J. (2013). Mediation analysis allowing for exposure–mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods, 18(2), 137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Van der Maaten, L. , & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605. [Google Scholar]
  69. Vander Weele, T. J. , & Arah, O. A. (2011). Unmeasured confounding for general outcomes, treatments, and confounders: Bias formulas for sensitivity analysis. Epidemiology, 22(1), 42–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Vander Weele, T. J. , & Vansteelandt, S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2(4), 457–468. [Google Scholar]
  71. van Kesteren, E.-J. , & Oberski, D. L. (2019). Exploratory mediation analysis with many potential mediators. Structural Equation Modeling: A Multidisciplinary Journal, 26(5), 710–723. [Google Scholar]
  72. Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. [Google Scholar]
  73. Vermunt, J. K. , & Magidson, J. (2002). Latent class cluster analysis. Applied Latent Class Analysis, 11(89–106), 60. [Google Scholar]
  74. Wang, Z. , Tang, X. , Liu, J. , & Ying, Z. (2023). Subtask analysis of process data through a predictive model. British Journal of Mathematical and Statistical Psychology, 76(1), 211–235. [DOI] [PubMed] [Google Scholar]
  75. Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1), 92–107. [DOI] [PubMed] [Google Scholar]
  76. Welling, J. , Gnambs, T. , & Carstensen, C. H. (2024). Identifying disengaged responding in multiple-choice items: Extending a latent class item response model with novel process data indicators. Educational and Psychological Measurement, 84(2), 314–339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Witkiewitz, K. , Roos, C. R. , Tofighi, D. , & Van Horn, M. L. (2018). Broad coping repertoire mediates the effect of the combined behavioral intervention on alcohol outcomes in the combine study: An application of latent class mediation. Journal of Studies on Alcohol and Drugs, 79(2), 199–207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Xiao, Y. , & Liu, H. (2024). A state response measurement model for problem-solving process data. Behavior Research Methods, 56(1), 258–277. [DOI] [PubMed] [Google Scholar]
  79. Zhan, P. , & Qiao, X. (2022). Diagnostic classification analysis of problem-solving competence using process data: An item expansion method. Psychometrika, 87(4), 1529–1547. [DOI] [PubMed] [Google Scholar]
  80. Zhang, S. , Wang, Z. , Qi, J. , Liu, J. , & Ying, Z. (2022). Accurate assessment via process data. Psychometrika, 88(1), 1–22. [DOI] [PubMed] [Google Scholar]



Articles from Psychometrika are provided here courtesy of Cambridge University Press
