Applied Psychological Measurement. 2020 Jun 6;44(7-8):515–530. doi: 10.1177/0146621620920928

A Multivariate Probit Model for Learning Trajectories: A Fine-Grained Evaluation of an Educational Intervention

Yinghan Chen, Steven Andrew Culpepper

Abstract

Advances in educational technology provide teachers and schools with a wealth of information about student performance. A critical direction for educational research is to harvest the available longitudinal data to provide teachers with real-time diagnoses about students’ skill mastery. Cognitive diagnosis models (CDMs) offer educational researchers, policy makers, and practitioners a psychometric framework for designing instructionally relevant assessments and diagnoses about students’ skill profiles. In this article, the authors contribute to the literature on the development of longitudinal CDMs by proposing a multivariate latent growth curve model to describe student learning trajectories over time. The model offers several advantages. First, the learning trajectory space is high-dimensional, and previously developed models may not be applicable to educational studies that have a modest sample size. In contrast, the method offers a lower dimensional approximation and is more applicable to typical educational studies. Second, practitioners and researchers are interested in identifying factors that cause or relate to student skill acquisition. The framework can easily incorporate covariates to assess theoretical questions about factors that promote learning. The authors demonstrate the utility of their approach with an application to a pretest–posttest educational intervention study and show how the longitudinal CDM framework can provide fine-grained assessment of experimental effects.

Keywords: cognitive diagnosis, growth curve, learning trajectories, Bayesian, multivariate probit

Introduction

Cognitive diagnosis models (CDMs) offer educational researchers, policy makers, and practitioners a psychometric framework for designing instructionally relevant assessments (Huff & Goodman, 2007; Leighton & Gierl, 2007). Rather than relying upon broad measures of achievement, CDMs relate a collection of fine-grained skills to performance on educational tasks. A direct benefit of the CDM framework is that assessment results can be used to diagnose students’ skill mastery, provide teachers with information about students’ strengths, and illuminate skill deficits that must be addressed with educational interventions.

Prior research offered methodological developments and presented applications of CDMs. For example, early applications diagnosed student mastery of the skills underlying performance on fraction-subtraction questions (Mislevy & Wilson, 1996; Tatsuoka, 1984). The results of CDM analyses are becoming more relevant today in the modern, computerized classroom. Namely, the line between assessment and instruction is blurring with the development and dissemination of online learning tools (e.g., see a review by Ye et al., 2016) that allow students to practice content under the supervision of teachers. Educational technology provides teachers and schools with access to a wealth of information about student mastery and learning. Students are able to complete computerized assessments, and the responses can be analyzed in real time to support teachers’ instructional decisions.

CDMs have a clear role in supporting formative classroom assessments. Yet, one limitation is that the originally proposed CDM framework is static in that it was created for cross-sectional designs to diagnose student skills at a point in time rather than to model the trajectory students follow during the learning process. The modern classroom provides a wealth of longitudinal information that can be harvested to improve diagnoses and uncover common learning trajectories that students tend to follow toward skill mastery.

Modeling learning trajectories (i.e., the changes in skill profiles over time) is a familiar problem to the educational data mining community, in which the method of Bayesian Knowledge Tracing (BKT) has become a dominant methodology (Corbett & Anderson, 1994). Recently, longitudinal CDMs emerged naturally from the extension of static CDMs and can be seen as new approaches to BKT. Both longitudinal CDMs and BKT are concerned with modeling the presence or absence of skills or attributes, and how this changes over time. They also both require a measurement model, such as the deterministic inputs, noisy “and” gate (DINA; Haertel, 1989; Junker & Sijtsma, 2001) model that is used here, which involves the possibility of slipping or guessing that is a familiar notion in BKT models. The transition model for the presence or absence of a skill is common to both of these disciplines, and a challenging aspect is how to deal with multiple skills. For instance, existing longitudinal CDMs employ first-order (Chen et al., 2018; Kaya & Leite, 2017; Li et al., 2016) and higher order (Wang et al., 2017) hidden Markov models (HMMs). Y. Xu and Mostow (2011, 2012) address transition models in BKT with a logistic regression approach. Another common concern is how covariates may be employed to modify transition probabilities. See, for example, González-Brenes et al. (2014) from the BKT literature, in which Feature Aware Student Knowledge Tracing (FAST) is introduced, or Wang et al. (2017), in which a longitudinal CDM is introduced that adjusts for practice and intervention effects and includes a continuous random effect for learning ability to help capture dependence. Similarly, latent factor knowledge tracing (LFKT) makes clever use of both continuous and binary attributes by allowing the slipping and guessing parameters of a BKT model to depend on an individual’s latent continuous ability, as discussed in Klingler et al. (2015) and Khajah et al. (2014). Although BKT and longitudinal CDMs have grown out of slightly different disciplines, the basic structure is identical, and particular methods differ only in how they address the measurement model, the joint distribution of attributes, and the longitudinal learning transition aspect of the model.

In this article, the original contribution is to introduce another strategy for modeling the learning trajectory space that builds upon prior research on cross-sectional CDMs that approximated the latent class structure, π. As discussed below, the learning trajectory space grows exponentially as the number of skills (i.e., K) or time points (i.e., T) increases. Prior approaches to modeling learning trajectories inherited this complexity and accordingly estimate many parameters, which may be problematic for smaller scale educational studies. In contrast, the authors propose a more parsimonious alternative that balances model fit and model complexity, where the number of parameters grows only quadratically with K.

The remainder of this article includes four sections. The first section reviews the static and longitudinal CDM frameworks. The second section introduces the proposed learning trajectory model based on a multivariate probit model. Note that simulation results on model parameter recovery are presented in Online Appendix B. The authors consider an application in the third section with a data set involving a pretest–posttest experimental design to evaluate the impact of different kinds of feedback on mathematical skill development. One important finding from this application is that the method provides improved fit to the data in comparison with a longitudinal item response theory (IRT) approach. The third section also shows how to incorporate covariates in the method to evaluate educational interventions. Finally, the authors discuss the implications of the study and offer future research directions.

Cognitive Diagnosis Modeling

The authors first provide an overview of the cross-sectional CDM framework and then outline a more recently developed longitudinal CDM strategy. Note that they provide a brief review of previous research due to space considerations and direct readers to the original papers for specific details about equations and estimation algorithms.

The Cross-Sectional CDM Framework

Under the CDM framework, student skill mastery is represented as a binary latent variable α_ik, where α_ik = 1 if student i (i = 1, …, N) mastered skill k (k = 1, …, K) and zero if the student has yet to master the skill. Educational tasks often require students to have mastered one or more skills to have the greatest chance of success. Accordingly, we let α_i = (α_i1, …, α_iK) be a K-vector of binary skills needed for success in a given content area. Clearly, α_i equals one of 2^K different binary attribute profiles. For example, with K = 3, there are eight profiles, which include a profile for students with no mastered skills (i.e., α_i = (0,0,0)), another profile for students who have mastered all skills (i.e., α_i = (1,1,1)), and six additional skill profiles that represent students who are non-masters on at least one skill (i.e., (0,0,1), (0,1,0), (1,0,0), (0,1,1), (1,0,1), and (1,1,0)).
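As a concrete illustration, the 2^K profile space can be enumerated directly. The following R sketch (object names are illustrative, not from the authors' implementation) lists all eight profiles for K = 3:

```r
# Enumerate the 2^K binary attribute profiles for K = 3 skills.
K <- 3
profiles <- as.matrix(expand.grid(rep(list(0:1), K)))
colnames(profiles) <- paste0("skill", 1:K)
nrow(profiles)  # 2^3 = 8 profiles, from (0,0,0) to (1,1,1)
```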

The psychometric literature on CDMs includes numerous item response functions (IRFs) to relate α_i to performance on items. We let Y_ij be a binary random variable that equals one if individual i correctly answers item j (j = 1, …, J) and zero otherwise. In general, the IRF is P(Y_ij = 1 | α_i, ζ_j), where ζ_j is a vector of parameters for item j. There are parsimonious models that include just a few item parameters, such as the DINA; the noisy inputs, deterministic “and” gate (NIDA; Maris, 1999); and the reduced reparameterized unified model (rRUM; Hartz, 2002), in addition to more flexible models (de la Torre, 2011; Henson et al., 2009; von Davier, 2008; G. Xu, 2017) that have at most 2^K parameters, so that each latent skill class has a unique chance of success. Furthermore, CDM IRFs differ by the assumed latent response process, which determines how the underlying skills interact to affect observed performance. For instance, conjunctive models (e.g., the NIDA, DINA, and rRUM) assume that all of the necessary skills must be mastered (Haertel, 1989), disjunctive models suppose only one of the required skills is needed to maximize success (i.e., the deterministic input noisy output “or” gate [DINO] as discussed by Templin & Henson, 2006), and compensatory models consider cases where missing some needed skills can be offset by having others. Simpler models typically use fewer item parameters and assume one latent response process, whereas the advantage of more general models is that each class may have a distinct probability of success and items can assume any response process.

In this article, the authors present an application using the DINA model. The DINA is a conjunctive model with ideal responses defined as η_ij = I(α_i′q_j = q_j′q_j), where q_jk = 1 if attribute k is required for item j and zero otherwise. Therefore, η_ij = 1 if student i has all of the required skills and zero otherwise. For each item, the DINA includes slipping (i.e., s_j = P(Y_ij = 0 | η_ij = 1)) and guessing (i.e., g_j = P(Y_ij = 1 | η_ij = 0)) parameters, so the probability that examinee i correctly answers item j is

P(Y_{ij} = 1 \mid \boldsymbol{\alpha}_i, s_j, g_j) = (1 - s_j)^{\eta_{ij}}\, g_j^{\,1 - \eta_{ij}}. \quad (1)

Readers are directed to de la Torre (2009) and Culpepper (2015) for additional details about DINA model parameter estimation.
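To make the IRF concrete, the following R sketch (function and variable names are illustrative, not the authors' implementation) evaluates Equation 1 for a single examinee and item:

```r
# DINA item response function (Equation 1).
# alpha: binary skill vector; q: Q-matrix row for item j; s, g: slip and guess.
dina_prob <- function(alpha, q, s, g) {
  eta <- as.integer(all(alpha[q == 1] == 1))  # ideal response: all required skills mastered
  (1 - s)^eta * g^(1 - eta)                   # P(Y_ij = 1 | alpha, s_j, g_j)
}

# Example: the item requires Skills 1 and 4, and the examinee has mastered both.
dina_prob(alpha = c(1, 1, 0, 1), q = c(1, 0, 0, 1), s = 0.2, g = 0.25)  # 0.8
```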

An important feature of CDMs is the latent class structure, which is denoted by the 2^K-vector π, where each element denotes the probability of membership in a given class. For instance, π_c = P(α = α_c) is the cth element of π and denotes the probability that an individual has an attribute profile of α_c ∈ {0, 1}^K. The classic approach is to estimate the elements of π in an unstructured manner.

Note that the number of elements in π increases exponentially with K. One implication is that estimating the latent skill distribution may pose a computational burden when there are many skills (i.e., K is large), and prior research considered several strategies for approximating the latent class structure with more parsimonious models. There are two popular alternatives to using an unstructured model for π. For instance, de la Torre and Douglas (2004) proposed a higher order factor model that assumes that attributes are conditionally independent given one or more continuous latent variables. A second approach outlined in Henson et al. (2009) and Templin et al. (2008) approximated the skill class structure with a multivariate probit model. That is, rather than specifying a higher order factor model, the multivariate probit model estimates a tetrachoric correlation matrix among the latent binary attributes. These two alternatives differ in terms of the number of parameters estimated. For example, a single-factor, higher order model requires 2K parameters (i.e., a latent threshold and loading for each skill), whereas the multivariate probit model includes K thresholds and a K×K unstructured tetrachoric correlation matrix for a total of K(K+1)/2 parameters.
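The contrast in complexity is easy to tabulate. A small R sketch (illustrative) compares the parameter counts of the three structures as K grows:

```r
# Parameter counts for the latent class structure as the number of skills grows.
K <- 2:10
cbind(K,
      unstructured = 2^K - 1,          # free class probabilities in pi
      higher_order = 2 * K,            # a threshold and a loading per skill
      mv_probit    = K * (K + 1) / 2)  # K thresholds + K(K-1)/2 correlations
```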

The Longitudinal CDM Framework

The previous section outlined the classical CDM framework where students’ skill profiles are fixed. The authors next discuss an extension to longitudinal cases, which are becoming more popular (Chen et al., 2018; Kaya & Leite, 2017; Li et al., 2016; Madison & Bradshaw, 2017; Wang et al., 2017; Ye et al., 2016) given the utility such models offer for classroom assessments where students learn.

We let t = 1, …, T index time (i.e., the testing occasion), so that at time t, α_it and Y_it are student i’s skill profile and observed item response vector. Furthermore, in the longitudinal context, the authors are now interested in modeling students’ learning trajectory, which consists of skill profiles from time 1 to T. We let the learning trajectory for student i be α_i = (α_i1, …, α_iT). Learning trajectories track which skills students begin with, in addition to the skills they learn over time. We accordingly let Y_i = (Y_i1, …, Y_iT) be a matrix of item responses student i completes over time.

As discussed in the previous section, the IRF specifies the relationship between α_it and Y_it at each occasion. The longitudinal CDM framework includes measurement models at each t in addition to a model that describes skill transitions (i.e., learning) over time. The challenge in the longitudinal framework is to model the distribution of α_i, p(α_i). As noted by Chen et al. (2018), one complication with longitudinal CDMs is that the learning trajectory space is high-dimensional. In the case with T = 1, there are 2^K possible skill classes. In contrast, the number of learning trajectories grows exponentially with T, so that in general, there are 2^{KT} possible learning trajectories (i.e., different configurations for α_i). Prior research considered several strategies to approximate the learning trajectory space. For instance, Chen et al. (2018) noted that if skills cannot be unlearned (i.e., learning is an absorbing state), the number of learning trajectories reduces to (T + 1)^K, but the number of non-decreasing learning trajectories may still be enormous with either a large T or K. Additional research considered more parsimonious models using hidden Markov (Chen et al., 2018; Kaya & Leite, 2017; Li et al., 2016) and higher order models (Wang et al., 2017) to approximate the learning trajectory space. For instance, an unstructured first-order HMM approximates the learning trajectory space with 4^K parameters, and the number of parameters is reduced to 3^K if skill mastery is an absorbing state.

The high-dimensional nature of the learning trajectory space is apparent in the data application below. In the application, K = 8 and T = 2, so there are 2^8 = 256 skill classes at each measurement occasion and a total of 2^16 = 65,536 possible unstructured learning trajectories. The large learning trajectory space would not be problematic if we had correspondingly large data sets to estimate the likelihood of traversing each trajectory. However, as is typical for educational studies, the sample size is just a fraction of the number of learning trajectories; that is, we analyze a data set with N = 268 observations. In practice, researchers must approximate the learning trajectory space. A more parsimonious alternative as proposed by Chen et al. (2018) is to employ a first-order HMM with the assumption that students cannot unlearn skills; however, such an analysis is also impractical for the application given there are 3^8 = 6,561 non-decreasing learning trajectories. Clearly, more parsimonious models are needed when the cardinality of the learning trajectory space outpaces the sample size, as found in typical educational studies (e.g., N ≈ 200).
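The arithmetic behind these counts is summarized in the short R sketch below (the values match the application with K = 8 and T = 2):

```r
# Size of the learning trajectory space for the ACED application.
K <- 8; n_time <- 2
2^K             # 256 skill classes at each occasion
2^(K * n_time)  # 65,536 unstructured learning trajectories
(n_time + 1)^K  # 6,561 non-decreasing trajectories (absorbing mastery)
# versus the 68 parameters estimated by the proposed model (40 in B, 28 in R)
```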

The authors next outline a multivariate probit model for learning trajectories and then report results from an application that estimated just 68 parameters. Although the application concerns only two time points, the model extends to any number of time points and provides a flexible framework for incorporating covariates and modeling the dependence of multiple attributes.

Multivariate Probit Model for Learning Trajectories

In this section, the authors introduce a model for describing student changes in skill profiles. In particular, the model for αi assumes skill profiles are independent across time when conditioned upon time-specific regression coefficients, βt, and a K×K attribute residual correlation matrix, Rt, as indicated by the following model:

p(\boldsymbol{\alpha}_i \mid B, R) = \prod_{t=1}^{T} p(\boldsymbol{\alpha}_{it} \mid \boldsymbol{\beta}_t, R_t), \quad (2)

where, for brevity, we let B and R denote all βt and Rt, respectively.

There are many options for the density of α_it in Equation 2. The authors consider an underlying multivariate normal distribution to induce the probabilities of skill profiles, p(α_it | β_t, R_t), for all t. As noted above, prior research related to cross-sectional CDMs employed a multivariate probit model or a higher order factor model to describe dependence among latent skills. In this article, the authors consider the former strategy given that the multivariate probit model estimates an unstructured tetrachoric correlation matrix and does not require theory to specify a higher order factor structure.

Bayesian Model Formulation

The authors next describe the Bayesian formulation for learning trajectories using the multivariate probit model, which builds upon the work of Lawrence et al. (2008). Parameter estimation for the multivariate probit model is notoriously challenging and one strategy is to augment the prior in Equation 2 by introducing additional latent variables. Specifically, the model for αit, βt, and Rt is as follows:

\alpha_{ikt} = I(\alpha_{ikt}^{*} > 0), \quad (3)
\boldsymbol{\alpha}_{it}^{*} \mid \boldsymbol{\beta}_t, R_t \sim \mathcal{N}_K(\mathbf{x}_{it}^{\top}\boldsymbol{\beta}_t, R_t), \quad (4)

where α_it* = (α_i1t*, …, α_iKt*) and x_it is a vector of covariates for individual i at time t. Equation 3 specifies a deterministic relationship between the binary skill and a continuous augmented variable α_ikt*. Equation 4 shows that the continuous augmented data are assumed to follow a multivariate normal distribution conditioned upon the time t regression coefficients (i.e., β_t) and correlation matrix (i.e., R_t). Equations 3 and 4 together imply skill class probabilities are described by a multivariate normal distribution. Specifically, augmenting the prior in Equation 2 yields the following joint prior for learning trajectories and augmented data:

p(\boldsymbol{\alpha}_i, \boldsymbol{\alpha}_i^{*} \mid B, R) = \prod_{t=1}^{T} \left\{ p(\boldsymbol{\alpha}_{it}^{*} \mid \boldsymbol{\beta}_t, R_t) \prod_{k=1}^{K} \left[I(\alpha_{ikt}^{*} > 0)\right]^{\alpha_{ikt}} \left[I(\alpha_{ikt}^{*} \leq 0)\right]^{1 - \alpha_{ikt}} \right\}. \quad (5)

We obtain the prior in Equation 2 from Equation 5 by integrating out αi* over the truncated support.
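A generative reading of Equations 3 and 4 is given in the R sketch below, a minimal illustration assuming the MASS package; all object names are hypothetical:

```r
library(MASS)  # for mvrnorm

# Draw binary skill profiles from the multivariate probit prior.
# X: N x p design matrix; beta: p x K regression coefficients;
# R: K x K tetrachoric correlation matrix.
draw_profiles <- function(X, beta, R) {
  mu <- X %*% beta                                    # latent means (Equation 4)
  astar <- mu + mvrnorm(nrow(X), rep(0, ncol(R)), R)  # augmented data alpha*
  (astar > 0) * 1L                                    # threshold at zero (Equation 3)
}
```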

The multivariate probit model is not identified with unrestricted R_t, and it is common to restrict R_t to be a correlation matrix (Lawrence et al., 2008). There are no standard priors for correlation matrices. Accordingly, several studies considered approaches for Bayesian estimation of correlation matrices to address this identifiability issue (e.g., see Lawrence et al., 2008, for a review of prior research). We extend the parameter expansion technique described by Lawrence et al. (2008). In particular, the parameter expansion addresses the problem of sampling correlation matrices by introducing additional random variables to specify a prior for a latent covariance matrix, as opposed to R_t, such that the covariance matrix is defined as Σ_t = V_t^{1/2} R_t V_t^{1/2}, where V_t includes variances along the diagonal and zeros elsewhere. The augmented data and regression coefficients are also rescaled as W_t = α_t* V_t^{1/2}, where α_t* is the N × K matrix of augmented skills at time t, and θ_t = β_t V_t^{1/2}. The priors for the expanded regression coefficients and covariance matrix are as follows:

\boldsymbol{\theta}_t \mid \Sigma_t, \omega^2 \sim \mathcal{MN}_{p_t \times K}(M_{\beta} V_t^{1/2}, \omega^2 I_{p_t}, \Sigma_t), \quad (6)
p(\Sigma_t) \propto |\Sigma_t|^{-(K+1)/2} \exp\left[-\tfrac{1}{2}\,\mathrm{tr}(\Sigma_0 \Sigma_t^{-1})\right], \quad (7)
p(\omega^2) \propto (\omega^2)^{-\delta_1 - 1} \exp(-\delta_2 / \omega^2), \quad (8)

where p_t denotes the number of predictor variables at time t (i.e., the number of rows of θ_t). Equation 6 indicates that the conditional prior for the p_t × K matrix θ_t is a matrix normal distribution, which generalizes the multivariate normal from a vector to a matrix. The prior mean of θ_t is in general M_β V_t^{1/2}, and it is set to zero in the application below. The association among the θ_t over rows (i.e., the row covariance matrix) is ω²I_{p_t}, and the column covariance matrix is Σ_t. Equation 7 specifies an inverse-Wishart prior for the expanded covariance matrix, and an inverse-Gamma distribution is specified for the regression coefficient scale hyperparameter, ω².
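The mapping from the expanded parameters back to the identified parameters amounts to a rescaling by the latent standard deviations. A hypothetical R sketch of this step (names are illustrative):

```r
# Map a draw of the expanded parameters (theta_t, Sigma_t) back to the
# identified parameters (beta_t, R_t).
expanded_to_identified <- function(theta, Sigma) {
  R <- cov2cor(Sigma)              # R_t = V_t^{-1/2} Sigma_t V_t^{-1/2}
  v <- sqrt(diag(Sigma))           # diagonal of V_t^{1/2}
  beta <- sweep(theta, 2, v, "/")  # beta_t = theta_t V_t^{-1/2}
  list(beta = beta, R = R)
}
```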

It is important to note that the proposed Bayesian formulation offers a tractable Gibbs sampling algorithm for approximating the parameter posterior distribution. Readers are directed to Online Appendix A for additional technical details regarding the full conditional distributions for α, α*, B, R, and ω2 and a Markov chain Monte Carlo (MCMC) algorithm.

Application to an Educational Intervention Study

In this section, the authors apply the multivariate probit model to data from the Adaptive Content With Evidence-Based Diagnosis (ACED) evaluation study (Shute et al., 2008). They first provide an overview of the ACED data and then describe the implementation of the modeling approach. The standard approach for modeling longitudinal data is to model changes in a continuous latent variable over time. They compare the longitudinal CDM framework with a longitudinal, two-parameter IRT model to evaluate the relative fit of the two modeling frameworks. They conclude this section with a discussion of results.

Overview of ACED Data

In the ACED evaluation, Shute et al. (2008) used a pretest–treatment–posttest design to assess the effects of different training strategies. Students first completed a pre-test consisting of 25 test items and were then randomly assigned to one of four conditions to receive a 1-hr practice intervention. The control group (Condition 4, N4 = 55) received content irrelevant to math, whereas the other three intervention groups received practice related to the test items. Conditions 1 (N1 = 71) and 2 (N2 = 75) were adaptive conditions in which practice tasks were presented to students based on their solution history, whereas the linear condition (Condition 3, N3 = 67) presented tasks in a predetermined order. The feedback on practice tasks in Conditions 1 and 3 was verification of correctness plus explanation, whereas respondents randomly assigned to Condition 2 received only verification of the correctness of solutions and were not provided additional explanation. After the 1-hr practice period, students took a post-test of 25 items in which the required skills, difficulty, and format of each item were matched with the pre-test.

The data set contains responses to 50 test items from N = 268 students at T = 2 time points. The test items are divided into two forms, A and B, each containing 25 matched items. The forms were counterbalanced such that half of the students received the pre- and post-tests in order A-B while the other half received them in order B-A. The test requires eight skills related to solving geometric sequence questions: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns. The Q-matrix for the 25 items on each form is shown in Table 1.

Table 1.

Q-Matrix for the ACED Pre- and Post-Tests and the Estimated Item Parameters (EST) and Standard Errors (SE) for Forms A and B.

Item | Skills 1–8 (Q-matrix entries) | Form A: ŝ_j EST (SE), ĝ_j EST (SE) | Form B: ŝ_j EST (SE), ĝ_j EST (SE)
1 0 0 0 1 0 0 0 0 0.22 (0.05) 0.37 (0.05) 0.40 (0.06) 0.26 (0.04)
2 0 0 0 1 0 0 0 0 0.40 (0.05) 0.31 (0.05) 0.28 (0.05) 0.37 (0.04)
3 0 0 0 1 0 1 0 0 0.06 (0.03) 0.56 (0.05) 0.13 (0.09) 0.34 (0.05)
4 0 0 0 1 0 0 0 1 0.28 (0.07) 0.41 (0.03) 0.30 (0.08) 0.34 (0.04)
5 0 0 1 0 0 0 0 0 0.64 (0.05) 0.27 (0.05) 0.52 (0.06) 0.30 (0.10)
6 0 0 0 1 0 0 0 0 0.53 (0.05) 0.26 (0.04) 0.45 (0.05) 0.27 (0.04)
7 1 0 0 0 0 0 0 0 0.31 (0.07) 0.34 (0.04) 0.15 (0.04) 0.51 (0.04)
8 0 1 0 0 0 0 0 0 0.29 (0.09) 0.40 (0.03) 0.34 (0.08) 0.38 (0.03)
9 1 0 0 0 0 0 0 0 0.57 (0.08) 0.01 (0.01) 0.57 (0.08) 0.02 (0.01)
10 0 0 0 1 0 0 0 0 0.11 (0.04) 0.60 (0.04) 0.39 (0.06) 0.32 (0.04)
11 0 0 0 1 1 1 0 0 0.14 (0.05) 0.59 (0.04) 0.24 (0.05) 0.56 (0.04)
12 0 0 1 0 0 0 0 1 0.47 (0.11) 0.21 (0.03) 0.50 (0.12) 0.22 (0.03)
13 0 0 1 0 1 0 1 0 0.71 (0.10) 0.07 (0.04) 0.69 (0.11) 0.13 (0.05)
14 0 0 0 1 0 0 0 0 0.31 (0.06) 0.21 (0.04) 0.28 (0.05) 0.32 (0.04)
15 1 0 0 0 0 0 0 0 0.03 (0.03) 0.19 (0.06) 0.11 (0.06) 0.27 (0.05)
16 0 1 0 0 0 0 0 0 0.46 (0.11) 0.19 (0.03) 0.39 (0.12) 0.27 (0.03)
17 0 0 0 0 1 1 0 0 0.79 (0.03) 0.11 (0.06) 0.81 (0.03) 0.09 (0.05)
18 0 0 0 1 0 0 1 0 0.69 (0.05) 0.18 (0.03) 0.81 (0.05) 0.08 (0.02)
19 0 0 0 1 0 0 1 0 0.32 (0.06) 0.37 (0.04) 0.63 (0.06) 0.22 (0.03)
20 0 0 0 1 1 1 0 0 0.14 (0.05) 0.50 (0.05) 0.04 (0.03) 0.43 (0.05)
21 0 0 0 1 0 0 0 1 0.73 (0.11) 0.08 (0.02) 0.58 (0.10) 0.21 (0.02)
22 0 0 0 1 0 0 1 0 0.78 (0.04) 0.16 (0.03) 0.68 (0.06) 0.19 (0.03)
23 0 0 0 1 0 0 0 0 0.50 (0.06) 0.17 (0.04) 0.58 (0.05) 0.16 (0.03)
24 0 1 0 0 0 0 0 0 0.42 (0.11) 0.23 (0.03) 0.48 (0.13) 0.16 (0.02)
25 0 0 0 0 1 0 1 0 0.64 (0.05) 0.11 (0.07) 0.65 (0.04) 0.14 (0.08)

Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns. ACED = Adaptive Content With Evidence-Based Diagnosis; EST = Estimated Item Parameters.

Multivariate Probit Implementation

The authors implement the multivariate probit learning trajectory model for the ACED data by introducing three dummy variables to distinguish the four experimental conditions. Let D_ic for c = 1, 2, 3 denote the three treatment conditions, where D_ic = 1 if student i received intervention condition c and zero otherwise. Note that we omit a dummy variable for the control group (i.e., c = 4) and accordingly interpret the intercepts at Time 2 as the control group skill prevalence.

The design matrix for subject i and the matrix of regression coefficients for the ACED data are

X_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & D_{i1} & D_{i2} & D_{i3} \end{pmatrix}, \qquad B = \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix}, \quad (9)

where β_1 is an eight-dimensional row vector of intercepts for t = 1 and β_2 is a 4 × 8 matrix whose first row denotes the post-test intercepts and whose second, third, and fourth rows correspond to the effects of Conditions 1, 2, and 3, respectively, on skill acquisition relative to the control condition (Condition 4).
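For example, a hypothetical R helper (not the authors' code) that builds X_i in Equation 9 from a student's condition assignment:

```r
# Build the 2 x 5 design matrix in Equation 9 for one student.
# condition: 1, 2, 3 (interventions) or 4 (control).
make_Xi <- function(condition) {
  D <- as.integer(condition == 1:3)  # dummy codes D_i1, D_i2, D_i3
  rbind(c(1, 0, 0, 0, 0),            # time 1: pre-test intercept only
        c(0, 1, D))                  # time 2: post-test intercept + condition dummies
}
make_Xi(4)  # control student: second row is (0, 1, 0, 0, 0)
```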

Finally, the authors fixed the underlying correlation matrix over time so that R_1 = R_2. This restriction is not necessary in general, but it was imposed to reduce the number of estimated parameters. That is, they estimated 40 parameters in B and 28 parameters in R, for a total of 68 parameters, using a sample size of 268.

The simulation study in Online Appendix B mimics the setup of the real data, and the results suggest the algorithm converges after 2,000 iterations; therefore, the authors estimated the model parameters using MCMC with a chain of length 10,000 and a burn-in of 5,000. They implemented the MCMC algorithm in C++ and R using Rcpp (Eddelbuettel et al., 2011). They estimated model parameters (i.e., B and R) using the posterior mean. In addition, they summarized skill mastery rates by computing posterior modes for α_i.
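The posterior summaries described above could be computed as in the following sketch, assuming the sampler returns arrays of draws; B_draws, R_draws, and alpha_draws are hypothetical names, not the authors' output:

```r
# Summarize MCMC output: posterior means for B and R, posterior modes for alpha.
n_iter <- 10000; burn <- 5000
keep <- (burn + 1):n_iter
B_hat <- apply(B_draws[, , keep], c(1, 2), mean)  # 5 x 8 posterior mean
R_hat <- apply(R_draws[, , keep], c(1, 2), mean)  # 8 x 8 posterior mean
# Posterior mode of each binary skill indicator (equivalently, posterior
# mastery probability > .5 across the retained draws).
alpha_hat <- 1L * (apply(alpha_draws[, , keep], c(1, 2), mean) > 0.5)
```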

Results

Model fit and comparison with two-parameter, longitudinal IRT

Longitudinal models with continuous variables are widely used in education and it is necessary to evaluate the extent to which the proposed method improves upon existing strategies. The authors compare their method with a traditional two-parameter, longitudinal IRT model (e.g., see Albert, 1992) and assess relative model fit using the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002). Note that the Bayesian formulation for the longitudinal IRT model is summarized in Online Appendix C. They compute the marginal DIC for their multivariate probit model and compare it with the marginal DIC for the longitudinal IRT model. Their method has a DIC value of 14,995.5, which is smaller than the DIC value of 16,624.5 for the longitudinal IRT model. Accordingly, they next discuss the parameter estimates for their model given the relative improvement in fit over the IRT model.
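For reference, the DIC used in this comparison can be computed from MCMC output as in the sketch below; loglik_draws and loglik_at_mean are assumed quantities, not the authors' code:

```r
# Deviance Information Criterion (Spiegelhalter et al., 2002).
# loglik_draws: log-likelihood at each retained draw;
# loglik_at_mean: log-likelihood at the posterior mean parameters.
dic <- function(loglik_draws, loglik_at_mean) {
  D_bar <- mean(-2 * loglik_draws)       # posterior mean deviance
  p_D   <- D_bar - (-2 * loglik_at_mean) # effective number of parameters
  D_bar + p_D                            # smaller DIC indicates better fit
}
```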

Multivariate probit model parameter estimates

The last four columns in Table 1 report the estimated item parameters for the DINA model. The point estimates are sample posterior means, and the standard errors are sample posterior standard deviations. The estimates show a consistent pattern in item parameters for Forms A and B. For example, Test Items 5, 9, 13, 17, 18, 21, 22, and 25 in both forms have estimated slipping parameters higher than 0.5 and small guessing parameters, indicating these might be difficult questions; Test Items 11 and 20 in both forms have estimated guessing parameters greater than 0.4. Also, to address the comparability of forms, the authors performed signed-rank tests on the slipping and guessing parameters to assess whether the parameters differed between the two forms. The p values for the slipping and guessing parameters were .56 and .89, respectively, which suggests that the paired estimates are from the same distribution.
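This form-comparability check reduces to paired Wilcoxon signed-rank tests on the 25 item-parameter estimates; a sketch in R (s_formA and the other vectors are hypothetical names for the columns of Table 1):

```r
# Paired signed-rank tests comparing item parameter estimates across forms.
wilcox.test(s_formA, s_formB, paired = TRUE)  # slipping: reported p = .56
wilcox.test(g_formA, g_formB, paired = TRUE)  # guessing: reported p = .89
```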

Table 2 reports the estimated coefficients B̂ and their standard errors. The point estimates are sample posterior means, and the standard errors are sample posterior standard deviations. The first two rows represent the intercept terms of the latent variable α* at the two time points. Based on Equations 3 and 4, a larger (smaller) intercept indicates skills were more (less) prevalent at a given point in time. For example, Skills 1, 2, and 8 were less likely to be mastered, as indicated by the large, negative intercepts at both time points. In contrast, Skill 6 was more prevalent at both time points. The authors also find evidence that students across intervention groups acquired Skill 5, as noted by the increase in the intercept from 0.45 to 3.46.

Table 2.

Estimated B^ (EST) and Standard Errors (SE) for the ACED Evaluation Data.

Variable Skills
1 2 3 4 5 6 7 8
EST (SE) EST (SE) EST (SE) EST (SE) EST (SE) EST (SE) EST (SE) EST (SE)
Int., t=1 −0.81 (0.03) −4.47 (0.41) 0.04 (0.10) −0.44 (0.03) 0.45 (0.15) 2.64 (0.24) 1.14 (0.29) −3.58 (0.25)
Int., t=2 −0.84 (0.04) −6.67 (0.65) −1.17 (0.23) −0.21 (0.04) 3.46 (0.79) 9.25 (0.65) 1.12 (0.27) −3.22 (0.74)
Cond. 1 0.75 (0.05) 4.08 (0.51) 1.93 (0.33) 0.20 (0.05) 1.91 (0.47) 2.01 (0.78) 5.09 (0.78) 4.46 (0.55)
Cond. 2 0.69 (0.05) 0.66 (0.66) 3.96 (0.44) 0.15 (0.05) −0.63 (0.81) 4.41 (0.71) 4.66 (0.56) −2.09 (0.43)
Cond. 3 0.89 (0.05) −1.44 (0.75) 1.79 (0.31) 0.53 (0.05) 0.30 (0.51) −0.75 (0.43) 3.07 (0.33) −4.15 (0.66)

Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns. ACED = Adaptive Content With Evidence-Based Diagnosis; Int. = intercept; Cond. = condition.

Several additional observations are available from the results in Table 2. For Skills 1, 2, 3, and 8, for which β̂_2k < β̂_1k, the estimates imply that if no practice is involved, subjects tend to forget what they have mastered on these attributes. The last three rows represent the effects of the three experimental conditions relative to the control group. Generally, Condition 1 helps students master every skill, while Conditions 2 and 3 facilitate mastery for some skills. For Skill 8, Condition 1 is the only group with a positive mean for the latent variable at post-test, which implies that only students receiving adaptive practice with full feedback tend to learn how to interpret visual patterns. Finally, Table 2 shows that the standard errors were smallest for the parameters involving Skills 1 and 4. The authors note that the finding of smaller standard errors could be explained by the fact that there are more simple structure items involving Skills 1 and 4 (i.e., three for Skill 1 and six for Skill 4) than the other attributes.

Table 3 reports the estimated correlation matrix, R^. All the estimated correlation coefficients are non-negative, indicating that attributes are likely to be positively correlated. The highest correlation is between Skill 1 and Skill 4 (i.e., r14=.30) and the remaining skill profile correlations are no larger than .05. In short, the skills appear to be primarily independent in light of the small correlations in R^.

Table 3.

Estimated Correlation Matrix R^ for the Multivariate Probit Model.

Skills 1 2 3 4 5 6 7 8
1 1 .02 .02 .30 .01 .02 .01 .02
2 .02 1 .00 .02 .00 .03 .01 .03
3 .02 .00 1 .02 .00 .01 .01 .01
4 .30 .02 .02 1 .04 .02 .01 .01
5 .01 .00 .00 .04 1 .02 .02 .00
6 .02 .03 .01 .02 .02 1 .01 .00
7 .01 .01 .01 .01 .02 .01 1 .00
8 .02 .03 .01 .01 .00 .00 .00 1

Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns.

The authors next summarize posterior skill mastery rates at pre- and post-test by experimental condition to provide insight into the magnitude of change. Table 4 shows the change in the estimated mastery rate of each skill for the four conditions. Overall, the estimated mastery rates are consistent with the estimated parameters in Table 2 and with prior investigations of the ACED data. Namely, Shute et al. (2008) concluded that Group 1 performed best on the post-test, followed by Groups 2, 3, and 4. The method provides additional fine-grained insight about skill acquisition. For example, Group 3 increased the most on mastery of Skill 4 (extend) and only Group 1 increased on mastery of Skill 8 (visual). For Skill 3, the results indicate that if no training is involved, students forget what they have mastered, which is consistent with Shute et al.’s (2008) result that the test scores slightly decreased in Groups 3 and 4.

Table 4.

Estimated Mastery Rate for Each Skill by Experimental Condition.

Condition 1 Condition 2 Condition 3 Condition 4
Skill Pre Post Pre Post Pre Post Pre Post
1 0.23 0.51 0.16 0.54 0.17 0.58 0.15 0.15
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3 0.37 0.80 0.42 1.00 0.53 1.00 0.50 0.10
4 0.36 0.51 0.30 0.46 0.35 0.66 0.30 0.43
5 0.90 1.00 0.86 0.10 0.92 1.00 0.88 1.00
6 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
7 0.99 1.00 1.00 1.00 1.00 1.00 0.98 1.00
8 0.00 0.64 0.00 0.00 0.00 0.00 0.00 0.00

Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns.

Discussion

The authors proposed a new framework for modeling the high-dimensional learning trajectory space. In particular, they extended the multivariate probit model to longitudinal contexts and provided several examples of how the model can be applied to describe changes in skill acquisition. In this final section, they highlight important implications from the study, offer future research directions, and provide concluding remarks.

Implications for Research

Several studies proposed longitudinal CDM approaches, and the method improves upon prior research in several ways. First, the method does not explicitly assume that learning trajectories are non-decreasing (e.g., see Chen et al., 2018; Wang et al., 2017). It may be the case that students move between the mastery and non-mastery states during the learning process, and the method provides a framework for tracking such changes. However, there are instances where it may be reasonable to impose the condition that skills cannot be unlearned. In such cases, the authors can incorporate the non-decreasing assumption in their framework by restricting the support for the attributes and augmented data. That is, they can easily enforce conditions such as α_ikt ≥ α_ik,t−1 on the binary attributes and the continuous augmented data, as sketched below. Second, their method is applicable when the goal is to draw inferences about the high-dimensional learning trajectory space with smaller scale educational studies. As noted above, they estimated 68 parameters to evaluate the experimental interventions rather than, for example, the 6,561 parameters that must be estimated with a non-decreasing first-order HMM.
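One plausible way to implement such a restriction, sketched in R under stated assumptions (the truncnorm package; all names are hypothetical and this is not the authors' algorithm), is to truncate the augmented-data draws for previously mastered skills:

```r
library(truncnorm)  # for rtruncnorm

# Draw alpha*_ikt subject to the no-forgetting restriction: if skill k was
# mastered at time t - 1, the augmented draw is confined to the positive
# half-line, so the implied binary skill satisfies alpha_ikt >= alpha_ik,t-1.
draw_astar_monotone <- function(mu, sd, alpha_prev) {
  lower <- ifelse(alpha_prev == 1, 0, -Inf)  # restrict support when mastered
  rtruncnorm(length(mu), a = lower, mean = mu, sd = sd)
}
```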

The utility of the framework, and longitudinal CDMs more generally, will be determined by its ability to support practitioners and researchers in addressing fundamental theoretical questions about factors that promote learning. The authors provide some evidence of the value of their framework through their application. Specifically, the results of the ACED data application provided evidence about which feedback interventions promoted skill acquisition under the pretest–posttest design. In particular, the authors found evidence that Condition 1 was beneficial across skills, whereas Conditions 2 and 3 were more beneficial for specific skills. Their application results could be disseminated to practitioners to provide specific recommendations as to which types of feedback promote learning of which skills in an effort to create student-tailored instructional interventions.

The application also provided an example of the relative merits of longitudinal CDMs in comparison with IRT models that use continuous latent variables. The authors found evidence that their model improved upon the fit of a pretest–posttest two-parameter IRT model. The difference in model performance may be expected a priori given that the ACED data were originally designed within a cognitive diagnosis assessment framework, so an application of IRT to the ACED data is retrofitting (Haberman & von Davier, 2006). Furthermore, the relative fit of a CDM versus an IRT model may be related to the attribute structure. The authors found evidence that the attributes were mostly uncorrelated in the ACED data. In contrast, an IRT model may be better suited for circumstances with correlated attributes following a higher order factor structure.

Limitations and Future Directions

There are several directions for future research. First, the authors employed the DINA measurement model with the multivariate probit model for learning trajectories, which is a more parsimonious CDM that may not apply in all applications. Future research should consider applications of the learning model with more general CDMs, such as the general DINA model, the loglinear cognitive diagnostic model, or the general diagnostic model (de la Torre, 2011; Henson et al., 2009; von Davier, 2008). Second, computerized assessments provide opportunities to collect additional information about students that may improve the accuracy of classifications or offer insight about the process of student learning. Future research should incorporate other ancillary information, such as response times, to characterize student performance. Third, additional research is needed to assess minimum sample size requirements for longitudinal CDM studies. Fourth, the authors fixed the attribute tetrachoric correlation matrix over time in their application to reduce the number of estimated parameters. It is possible the attribute relationships change over time and future applications should explicitly assess such structural changes.

Fifth, the learning model in their application included an intercept for the pre- and post-test rather than an intercept and linear term as used in latent growth curve models. The authors recommend future applications of their model with more than two time points should consider parameterizations that use unstructured growth curves or polynomials of time as applied in existing growth curve models. Furthermore, in cases with at least three time points, an extension of the growth curve framework is to model individual differences in the growth curve regression coefficients to account for additional individual differences in how skills evolve. Specifically, students may differ in terms of intercepts and linear rates of growth. The proposed framework can be easily extended by assuming conditional independence of attributes over time given the growth curve random effects.

Sixth, the authors reported changes in the probability of possessing skills for each attribute (e.g., αikt) rather than the attribute profile (e.g., αit). They could have instead estimated the transition probabilities for attribute profiles by summarizing the attribute values sampled for each individual in the posterior distribution. However, their application involved K=8 attributes and any description of attribute profile transition probabilities would require summarizing transitions from 256 attribute profiles at Time 1 into one of 256 profiles at Time 2, which cannot easily be presented in tables or figures. Seventh, a methodological alternative to using the dynamic CDM framework is a dynamic multidimensional IRT model, and additional research is needed to understand the merits of these procedures in various circumstances. One advantage of the CDM framework is that mastery/non-mastery classifications are designed as a component of the model, whereas thresholds would need to be pre-specified for the multidimensional IRT framework.

Finally, the previously developed HMMs account for dependence over time by relating skills at time t to those at time t − 1. In contrast, the proposed approach follows the structural equation modeling (SEM) tradition by formulating a latent growth curve to characterize change patterns in α over time. Conceptually, the notion of a growth curve is well understood for continuous latent variables, but the authors show in this article that a new algorithm is needed to implement growth curves with binary attributes. The HMM and SEM approaches capture dependence among the latent variables differently and suit different circumstances. If the goal is, for example, to design an intelligent tutoring system where the number of trials is possibly random, then HMMs are likely the better choice for modeling dependence over time, because the focus is on understanding the probability of transitioning to mastery if it has not yet occurred. In contrast, growth curves, as outlined in this article, are helpful when the goal is to understand the pattern of skill changes over a fixed time period. The ACED data seem appropriate for this latter model given that the study was conducted over a fixed time period and the interest centered on how skills changed according to membership in the various intervention conditions.

Concluding Remarks

The authors presented a new framework for modeling learning trajectories within the CDM framework that builds upon multivariate growth curve models. This study addresses the opportunities available in the modern classroom where advances in educational technologies provide practitioners and researchers with a wealth of information about student performance. Longitudinal CDMs are designed to provide fine-grained classifications over time to support instructional decisions, and future research must continue to advance methods that support data-driven policy recommendations to improve the condition of education.

Supplemental Material

Online_Appendix – Supplemental material for “A Multivariate Probit Model for Learning Trajectories: A Fine-Grained Evaluation of an Educational Intervention” by Yinghan Chen and Steven Andrew Culpepper in Applied Psychological Measurement.

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Science Foundation Methodology, Measurement, and Statistics Program Grant #1632023 and Spencer Foundation Grant #201700062.

ORCID iD: Steven Andrew Culpepper https://orcid.org/0000-0003-4226-6176

Supplemental Material: Supplementary material is available for this article online.

References

  1. Albert J. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269.
  2. Brooks S. P., Gelman A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
  3. Chen Y., Culpepper S. A., Wang S., Douglas J. A. (2018). A hidden Markov model for learning trajectories in cognitive diagnosis with application to spatial rotation skills. Applied Psychological Measurement, 42, 5–23.
  4. Corbett A. T., Anderson J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253–278.
  5. Culpepper S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.
  6. de la Torre J. (2009). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34(1), 115–130.
  7. de la Torre J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
  8. de la Torre J., Douglas J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.
  9. Eddelbuettel D., François R., Allaire J., Chambers J., Bates D., Ushey K. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 1–18.
  10. González-Brenes J., Huang Y., Brusilovsky P. (2014). General features in knowledge tracing to model multiple subskills, temporal item response theory, and expert knowledge. In Stamper J., Pardos Z., Mavrikis M., McLaren B. M. (Eds.), Proceedings of the 7th International Conference on Educational Data Mining (pp. 84–91). International Educational Data Mining Society.
  11. Haberman S. J., von Davier M. (2006). Some notes on models for cognitively based skills diagnosis. In Rao C., Sinharay S. (Eds.), Psychometrics (Vol. 26, pp. 1031–1038). Elsevier. http://www.sciencedirect.com/science/article/pii/S0169716106260401
  12. Haertel E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.
  13. Hartz S. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality [Unpublished doctoral dissertation]. University of Illinois at Urbana-Champaign.
  14. Henson R. A., Templin J. L., Willse J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.
  15. Huff K., Goodman D. P. (2007). The demand for cognitive diagnostic assessment. In Leighton J. P., Gierl M. J. (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 19–60). Cambridge University Press.
  16. Junker B. W., Sijtsma K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.
  17. Kaya Y., Leite W. L. (2017). Assessing change in latent skills across time with longitudinal cognitive diagnosis modeling: An evaluation of model performance. Educational and Psychological Measurement, 77, 369–388. https://doi.org/10.1177/0146621617697959
  18. Khajah M., Wing R., Lindsey R., Mozer M. (2014). Integrating latent-factor and knowledge-tracing models to predict individual differences in learning. In Stamper J., Pardos Z., Mavrikis M., McLaren B. M. (Eds.), Proceedings of the 7th International Conference on Educational Data Mining (pp. 99–106). International Educational Data Mining Society.
  19. Klingler S., Käser T., Solenthaler B., Gross M. (2015). On the performance characteristics of latent-factor and knowledge tracing models. International Educational Data Mining Society.
  20. Lawrence E., Bingham D., Liu C., Nair V. N. (2008). Bayesian inference for multivariate ordinal data using parameter expansion. Technometrics, 50(2), 182–191.
  21. Leighton J. P., Gierl M. J. (2007). Why cognitive diagnostic assessment. In Leighton J. P., Gierl M. J. (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 3–18). Cambridge University Press.
  22. Li F., Cohen A., Bottge B., Templin J. (2016). A latent transition analysis model for assessing change in cognitive skills. Educational and Psychological Measurement, 76(2), 181–204.
  23. Madison M., Bradshaw L. (2017, April). Assessing intervention effects in a diagnostic classification model framework [Paper presentation]. The Annual Meeting of the National Council on Measurement in Education, San Antonio, TX, United States.
  24. Maris E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.
  25. Mislevy R. J., Wilson M. (1996). Marginal maximum likelihood estimation for a psychometric model of discontinuous development. Psychometrika, 61(1), 41–71.
  26. Shute V. J., Hansen E. G., Almond R. G. (2008). You can’t fatten a hog by weighing it—Or can you? Evaluating an assessment for learning system called ACED. International Journal of Artificial Intelligence in Education, 18(4), 289–316.
  27. Spiegelhalter D. J., Best N. G., Carlin B. P., Van Der Linde A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639.
  28. Tatsuoka K. K. (1984). Analysis of errors in fraction addition and subtraction problems. Computer-Based Education Research Laboratory, University of Illinois at Urbana-Champaign.
  29. Templin J. L., Henson R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305.
  30. Templin J. L., Henson R. A., Templin S. E., Roussos L. (2008). Robustness of hierarchical modeling of skill association in cognitive diagnosis models. Applied Psychological Measurement, 32, 559–574.
  31. von Davier M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
  32. Wang S., Yang Y., Culpepper S., Douglas J. (2017). Tracking skill acquisition with cognitive diagnosis models: A higher-order hidden Markov model with covariates. Journal of Educational and Behavioral Statistics, 43, 57–87.
  33. Xu G. (2017). Identifiability of restricted latent class models with binary responses. Annals of Statistics, 45(2), 675–707.
  34. Xu Y., Mostow J. (2011). Logistic regression in a dynamic Bayes net models multiple subskills better! In Pechenizkiy M., Calders T., Conati C., Ventura S., Romero C., Stamper J. (Eds.), Proceedings of the 4th International Conference on Educational Data Mining (pp. 337–338). Eindhoven University of Technology.
  35. Xu Y., Mostow J. (2012). Comparison of methods to trace multiple subskills: Is LR-DBN best? In Yacef K., Zaïane O., Hershkovitz A., Yudelson M., Stamper J. (Eds.), Proceedings of the 5th International Conference on Educational Data Mining (pp. 41–48). International Educational Data Mining Society.
  36. Ye S., Fellouris G., Culpepper S., Douglas J. (2016). Sequential detection of learning in cognitive diagnosis. British Journal of Mathematical and Statistical Psychology, 69(2), 139–158.
