Published in final edited form as: Comput Brain Behav. 2022 Jun 7;5(3):261–278. doi: 10.1007/s42113-022-00138-1

A Multinomial Processing Tree Model of the 2-back Working Memory Task

Michael D Lee 1, Percy K Mistry 2, Vinod Menon 2,3,4

Abstract

The n-back task is a widely used behavioral task for measuring working memory and the ability to inhibit interfering information. We develop a novel model of the commonly used 2-back task using the cognitive psychometric framework provided by Multinomial Processing Trees. Our model involves three parameters: a memory parameter, corresponding to how well an individual encodes and updates sequence information about presented stimuli; a decision parameter corresponding to how well participants execute choices based on information stored in memory; and a base-rate parameter corresponding to bias for responding “yes” or “no”. We test the parameter recovery properties of the model using existing 2-back experimental designs, and demonstrate the application of the model to two previous data sets: one from social psychology involving faces corresponding to different races (Stelter and Degner, British Journal of Psychology 109:777–798, 2018), and one from cognitive neuroscience involving more than 1000 participants from the Human Connectome Project (Van Essen et al., Neuroimage 80:62–79, 2013). We demonstrate that the model can be used to infer interpretable individual-level parameters. We develop a hierarchical extension of the model to test differences between stimulus conditions, comparing faces of different races, and comparing face to non-face stimuli. We also develop a multivariate regression extension to examine the relationship between the model parameters and individual performance on standardized cognitive measures including the List Sorting and Flanker tasks. We conclude by discussing how our model can be used to dissociate underlying cognitive processes such as encoding failures, inhibition failures, and binding failures.

Keywords: n-back task, 2-back task, Multinomial processing trees, Psychometric models, Bayesian methods, Human Connectome Project

Introduction

The n-back task, introduced by Kirchner (1958) and Mackworth (1959), is a widely used behavioral task for measuring working memory and the ability to inhibit interfering information. In a standard n-back task, a participant is presented with a series of stimuli and is required to respond “yes” if the current stimulus is the same as one presented n positions earlier in the sequence. This requires remembering earlier stimuli, so that correct “yes” responses can be produced, but also requires not responding “yes” when the current stimulus matches one that was recently presented, but not exactly n positions earlier. As Coulacoglou and Saklofske (2017, Chapter 5) note: the n-back task “requires not only the storage and continual updating of information in [working memory], but also interference resolution.”

When used to measure working memory capacity in the context of cognitive training, the n-back task is often applied adaptively, so that the value of n changes over experimental blocks depending on participant performance (e.g., Au et al., 2015; Jaeggi et al., 2008). In more general psychometric and cognitive neuroscience applications, the value of n is often fixed, with 2-back tasks being the most common.1 Examples of the use of 2-back tasks include studies of aging (Schmiedek et al., 2009), depression (Harvey, 2005), and psychosocial stress (Schoofs et al., 2008). The 2-back task is also one of the most widely used paradigms for measuring working memory in human neuroscience research (Cai et al., 2021; Owen et al., 2005).

Previous Cognitive Models

A number of cognitive models of n-back task behavior have been developed. Most of these models aim to provide a detailed account of behavior, but draw on different cognitive modeling frameworks. Examples include non-linear dynamic models based on catastrophe theory (Guastello et al., 2015), cognitive architectural models based on ACT-R (Juvina and Taatgen, 2007), detailed cognitive processing models based on the HY-GENE hypothesis generation framework (Harbison et al., 2011), and a number of connectionist models (Chatham et al., 2011; Sylvester et al., 2013).

There are fewer models of the n-back task that could be considered as psychometric or measurement models. Such a model could be very useful in psychometric studies, which usually involve a battery of cognitive tests and observed covariates such as clinical diagnoses and demographic measures. In most studies, the results of n-back tasks are summarized in terms of overall accuracy, or in terms of hit and false alarm rates. These measures are then modeled statistically, such as by regressing on the covariates or using factor analysis (e.g., Patterson 2009; Rac-Lubashevsky and Kessler 2016).

An empirical approach for dissecting behavioral measures into cognitive sub-processes is presented by Rac-Lubashevsky and Kessler (2016). These authors did not develop cognitive models, but instead used an additional experimental reference task to make inferences about underlying memory and decision-making processes. The approach builds on the literature studying the components of working memory updating (e.g., Ecker et al., 2010), using a subtractive logic in comparing reference and n-back task behavior.

Current Aims

Our goal is to build a simple process model that can act as a cognitive psychometric measurement model, without the need for additional experimentation. The focus is on being able to infer interpretable parameters corresponding to the memory and decision-making properties of individuals in completing the 2-back task. The model we develop is based on the two-high threshold Multinomial Processing Tree (MPT) model of recognition memory tasks (Batchelder and Riefer, 1999; Erdfelder et al., 2009). An n-back task can be conceived of as a sequence of inter-dependent recognition memory tasks. Rather than having separate "study" and "test" phases, a single sequence of stimuli is presented, with "test" stimuli becoming "study" stimuli n presentations later in the sequence. Thus, our model of the 2-back task involves the same recognition decision processes as the two-high threshold model, with additional assumptions about memory processes that maintain the encoding of the relative position of the stimuli throughout the sequence.

In the next section, we develop the model, including its implementation as a Bayesian graphical model. We then test the identifiability of the model in simulation studies, before presenting a series of applications of the model to two data sets. The first data set comes from a social psychology domain, involving an experiment in which faces of different races are presented (Stelter and Degner, 2018). The second data set comes from a cognitive neuroscience domain involving the Human Connectome Project (Van Essen et al., 2013), which contains 2-back data from over 1000 participants along with a battery of standardized neuropsychological measures including List Sorting and Flanker tasks (Barch et al., 2013). For both data sets we demonstrate how the model can measure individual memory and decision-making with a latent-mixture extension to detect contaminant behavior, and how it can test for group or condition differences through a hierarchical extension. For the Human Connectome Project data, we also develop a multivariate regression extension of the model to allow the relationship between model parameters and observed neuropsychological measures from other cognitive tasks to be inferred. We conclude with a discussion of limitations and possible extensions of the model.

Multinomial Processing Tree Model

In this section, we develop an MPT model of 2-back behavior. We begin with the conceptual framework for n-back tasks, then formalize the 2-back model specified by this framework.

Conceptual Model

Figure 1 provides a graphical representation of the conceptual framework for our model. It shows four stimulus sequences that identify four different cases in 2-back tasks. The top-left panel shows the sequence ABCD, with the current stimulus being the final D. Since D does not appear earlier in the sequence there is no possibility it has been encoded in memory. We call this case Ω-ago (read “nullago”) because the test stimulus has not been presented recently enough for the possibility that it is in memory to be considered. In the absence of any memory signal, our modeling framework assumes a decision process operates with a base-rate of giving the correct “no” answer.

Fig. 1. Conceptual framework for a model of 2-back task behavior. As the stimulus sequence is presented, stimuli are encoded and their positions updated. Decisions are made about the current stimulus based on the encoded stimuli and their positions.

The top-right panel shows the sequence ABCC. Since the current stimulus C was also presented one position earlier, there is a possibility of a memory signal. The arrow shows the possible encoding of the previous C in a slot that indicates it was presented 1-ago. The red circle indicates that this encoding sends a signal that the previous presentation was not 2-ago. Our model assumes that either the encoding does happen, in which case the "no" signal is executed with some level of accuracy, or the encoding does not happen, in which case the same base-rate decision process as for Ω-ago applies.

The bottom-left panel shows the sequence ABCB. This is the 2-ago situation for which the correct response is "yes". The arrow shows the initial encoding of the earlier B after it was presented. At that stage in the sequence, it is encoded as 1-ago relative to the subsequent C. As the current B is then presented, memory is potentially updated. The solid arrow shows this updating, with the earlier B now correctly encoded as 2-ago. In this case, the memory signal is for a "yes" response, indicated by the green circle. It is also possible, however, that the position of the encoded B is not updated, and it continues to be considered as 1-ago. This failed updating is shown by the broken arrow. Thus, overall, there are three possibilities: the B may not be encoded at all, it may be correctly encoded as 2-ago, or it may be incorrectly encoded as 1-ago.

Finally, the bottom-right panel shows the sequence ABCA. After the original encoding of the A there are two potential updates of its position as the C and subsequent A are presented. Solid and broken arrows again indicate correct and failed updating, leading to the possibility of A being encoded as 3-, 2-, or 1-ago. Note that there are multiple routes through which the initial A can be incorrectly encoded as 2-ago at the time the current A is presented. This creates the possibility of interference, in which memory signals an incorrect “yes” response. Overall, for the 3-ago case, there are four possibilities: the A may not be encoded at all, it may be encoded correctly as a 3-ago or incorrectly as a 1-ago, both of which signal a “no” response, or it may be encoded incorrectly as a 2-ago to signal a “yes” response.
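To make the four cases concrete, the following minimal Python sketch labels each trial of a 2-back sequence by the recency of its most recent earlier match, following the cases distinguished in Fig. 1. The function name and the treatment of lags greater than three as the null case are our own illustrative choices, not part of any original task code.

```python
def classify_trials(sequence):
    """Label each trial by the recency of the most recent earlier presentation
    of the current stimulus: "null-ago" (the Omega-ago case), "1-ago", "2-ago",
    or "3-ago". Lags beyond 3 are treated as the null case, matching the four
    cases distinguished by the model (an illustrative simplification)."""
    labels = []
    for t, stim in enumerate(sequence):
        lag = None
        for back in (1, 2, 3):                       # check the three most recent positions
            if t - back >= 0 and sequence[t - back] == stim:
                lag = back
                break                                # keep only the most recent match
        labels.append("null-ago" if lag is None else f"{lag}-ago")
    return labels

# The four example sequences from Fig. 1; the case of the final (current) stimulus
for seq in ["ABCD", "ABCC", "ABCB", "ABCA"]:
    print(seq, classify_trials(seq)[-1])
```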

Formalization as a Multinomial Processing Tree

Figure 2 formalizes the 2-back task using standard probability tree notation for the Ω -, 1-, 2-, and 3-ago cases. The probability of encoding a presented stimulus and successfully updating its encoded position is represented by the memory parameter α. The probability of executing the signal provided by a remembered stimulus is represented by the accuracy-of-execution decision parameter δ. The probability of responding “no” when there is no memory signal is represented by the base-rate parameter γ.

Fig. 2. MPT model of the 2-back task, represented in terms of the Ω-, 1-, 2-, and 3-ago cases.

The four decision trees in Fig. 2 correspond to the four cases described in Fig. 1. The trees quantify the probability of "yes" and "no" responses in each of the four cases in terms of the memory and decision parameters α, δ, and γ. The probability of a "yes" response in the Ω-ago case depends only on the base-rate. We denote this probability θ1, and it is simply given by

θ1 = 1 − γ. (1)

In the 1-ago case with the sequence ABCC, a "yes" response could be generated either by remembering the previous C with probability α but then inaccurately executing its signal with probability 1 − δ, or by failing to encode the C with probability 1 − α and then producing a "yes" response following the base-rate probability 1 − γ. Thus, the overall probability of a "yes" response in the 1-ago case is

θ2 = α(1 − δ) + (1 − α)(1 − γ). (2)

The probabilities of a “yes” response for the other cases can similarly be determined by adding the products of probabilities of branches through the trees that terminate in “Y” nodes. For the 2-ago case, it is

θ3 = α²δ + α(1 − α)(1 − δ) + (1 − α)(1 − γ), (3)

and for the 3-ago case it is

θ4 = α³(1 − δ) + 2α²(1 − α)δ + α(1 − α)²(1 − δ) + (1 − α)(1 − γ). (4)
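As a check on these expressions, the following short Python sketch computes the four "yes" probabilities of Equations 1–4 from the three model parameters. The function name and the example parameter values are illustrative only.

```python
def yes_probabilities(alpha, delta, gamma):
    """Probability of a "yes" response in the null-, 1-, 2-, and 3-ago cases
    (Equations 1-4), given the memory parameter alpha, the decision parameter
    delta, and the base-rate parameter gamma."""
    theta1 = 1 - gamma
    theta2 = alpha * (1 - delta) + (1 - alpha) * (1 - gamma)
    theta3 = (alpha**2 * delta
              + alpha * (1 - alpha) * (1 - delta)
              + (1 - alpha) * (1 - gamma))
    theta4 = (alpha**3 * (1 - delta)
              + 2 * alpha**2 * (1 - alpha) * delta
              + alpha * (1 - alpha)**2 * (1 - delta)
              + (1 - alpha) * (1 - gamma))
    return theta1, theta2, theta3, theta4

# With strong memory and decision processes and a high base-rate,
# "yes" responses are rare except in the 2-ago (target) case.
print(yes_probabilities(alpha=0.9, delta=0.9, gamma=0.9))
```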

Graphical Model Implementation

Figure 3 shows a graphical model implementation of the basic 2-back model just described. Graphical models are a language for representing probabilistic generative models developed in statistics and computer science (Jordan, 2004; Koller et al., 2007), and are increasingly widely used in cognitive science (Lee and Wagenmakers, 2013). In graphical models, nodes represent parameters and data, and the graph structure shows how they depend on each other.

Fig. 3. Graphical model representation of the basic 2-back MPT model.

The three model parameters α,γ, and δ are shown at the top of Fig. 3. They are circular nodes, because they are continuously valued, and they are unshaded, because they are latent or unobserved. The data are shown by the yt node for the tth trial, with yt=1 indicating a “yes” response and yt=0 indicating a “no” response. This node is square, because the values are discrete, and shaded, because the data are observed.

The model assumes the memory and decision parameters generate the observed behavioral data following the trees in Fig. 2. The probability of a "yes" response on the tth trial is represented by θt, which takes the different values given in Equations 1–4 depending on whether the presented stimulus corresponds to a Ω-, 1-, 2-, or 3-ago case. This information is represented by the discrete observed variable st, which takes the values 1, 2, 3, and 4, respectively. The dependence of the θt response probability on the parameters α, δ, and γ and the state st is indicated by the θt node being the child of these four parent nodes. The fact that the response probability is completely determined as a function of these other nodes is indicated by the double border around the θt node. Given this response probability, the observed response on the tth trial is yt ~ Bernoulli(θt), and the model is completed by uniform priors on the memory and decision parameters α, γ, δ ~ uniform(0, 1).

We implemented all of the graphical models in this article in JAGS (Plummer, 2003), which provides a high-level scripting language and automates the application of Markov chain Monte Carlo methods for computational Bayesian inference. The convergence of these chains was checked via visual inspection and the standard R̂ statistic (Brooks and Gelman, 1997). Our results are based on 1000 or 2000 samples collected from each of 8 independent chains after up to 10,000 burn-in samples were discarded. For some applications the chains were thinned by a factor of 5.
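The article does not reproduce the authors' JAGS scripts, but a minimal sketch of how the basic graphical model in Fig. 3 might be written in the JAGS modeling language is shown below, held as a Python string that would be passed to a JAGS interface. The data names y, s, and T are assumptions, and the θ expression simply expands Equations 1–4 by case.

```python
# A minimal JAGS sketch of the basic model in Fig. 3, held as a Python string.
# This is an assumed reconstruction from Equations 1-4, not the authors' code.
basic_2back_model = """
model {
  alpha ~ dunif(0, 1)   # memory: encoding and position updating
  delta ~ dunif(0, 1)   # decision: accuracy of executing a memory signal
  gamma ~ dunif(0, 1)   # base-rate of responding "no" with no memory signal
  for (t in 1:T) {
    # s[t] = 1, 2, 3, 4 for the null-, 1-, 2-, and 3-ago cases
    theta[t] <- equals(s[t], 1) * (1 - gamma)
      + equals(s[t], 2) * (alpha * (1 - delta) + (1 - alpha) * (1 - gamma))
      + equals(s[t], 3) * (pow(alpha, 2) * delta + alpha * (1 - alpha) * (1 - delta)
                           + (1 - alpha) * (1 - gamma))
      + equals(s[t], 4) * (pow(alpha, 3) * (1 - delta) + 2 * pow(alpha, 2) * (1 - alpha) * delta
                           + alpha * pow(1 - alpha, 2) * (1 - delta)
                           + (1 - alpha) * (1 - gamma))
    y[t] ~ dbern(theta[t])
  }
}
"""
```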

Parameter Recovery Study

In this section, we examine some properties of the basic model in Fig. 3 using simulated data. Parameter recovery studies, in which the inferences of a model are compared to the known values that generated simulated data, are widely used in cognitive modeling. Their value as tests of models is often misunderstood (Evans & Brown 2018, p. 594; Lee 2018, pp. 42–43; Lee et al. 2019, Appendix B), but they are useful for some important purposes. Parameter recovery studies provide no information about the validity of a model and do not evaluate the assumptions of a model. They can, however, serve as checks on the correctness of implementation of a model, help diagnose potential identifiability issues with a model, and provide insight into whether behavioral data collected under specific experimental designs are likely to be informative enough to lead to useful model inferences. Our parameter recovery study addresses these three goals.

We simulated data from eight groups with ten participants each. The groups varied systematically in the α,γ, and δ values assigned to participants. Participants in four of the groups had high base-rate γ values between 0.9 and 1, while the other four groups had low base-rates between 0.5 and 0.6. Within each of these sets of four groups, we used a 2 × 2 design with high and low values of the α memory and δ decision parameters. Once again, high values were between 0.9 and 1 and low values were between 0.5 and 0.6. We simulated data with each participant doing 50 experimental blocks. This corresponds to a realistic but extensive behavioral experiment.

We used three different task structures for the specific sequences of stimuli within each block. The first design used artificially created sequences of length 15. The sequences were designed so that there were significant numbers of Ω-, 1-, 2-, and 3-ago cases. Specifically, there were about 58% Ω-ago trials, 12% 1-ago trials, 17% 2-ago trials, and 13% 3-ago trials. The second design used the length-10 stimulus sequences from the 2-back task in the Human Connectome Project (Van Essen et al., 2013). These sequences have about 64% Ω-ago trials, 11% 1-ago trials, 20% 2-ago trials, and 5% 3-ago trials. The third design used the length-22 stimulus sequences from the 2-back task of Stelter and Degner (2018). This design is more problematic, with about 72% Ω-ago trials, 27% 2-ago trials, and fewer than 1% of both 1-ago and 3-ago trials.
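The following Python sketch illustrates how simulated responses for one participant might be generated for such a recovery study, reusing the yes_probabilities helper sketched earlier. Sampling trial cases independently from the stated proportions is a simplification of using the actual stimulus sequences, and the specific parameter values and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2022)

def simulate_participant(alpha, delta, gamma, cases):
    """Simulate "yes"/"no" responses (1/0) for a sequence of trial cases,
    where each case is 1 (null-ago), 2 (1-ago), 3 (2-ago), or 4 (3-ago).
    Uses the yes_probabilities helper defined in the earlier sketch."""
    theta = yes_probabilities(alpha, delta, gamma)
    return [int(rng.random() < theta[c - 1]) for c in cases]

# Illustrative design: 50 blocks of 15 trials with case proportions roughly
# matching the artificial design described in the text.
case_probs = [0.58, 0.12, 0.17, 0.13]
cases = rng.choice([1, 2, 3, 4], size=50 * 15, p=case_probs)

# One participant from a "high memory, low decision" group
# (high values in 0.9-1, low values in 0.5-0.6).
responses = simulate_participant(alpha=0.95, delta=0.55, gamma=0.95, cases=cases)
print(np.mean(responses))   # overall proportion of "yes" responses
```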

Figure 4 summarizes the results of the recovery study. The left, middle, and right columns correspond to the artificial, Human Connectome Project, and Stelter and Degner (2018) designs, respectively. The top two rows show inferred joint posterior distributions of the α memory parameter against the δ decision parameter and the γ base-rate parameter. Markers indicate posterior means for each participant and error bars show interquartile credible intervals. The marker colors and shapes indicate the group membership of each participant. For the artificial and Human Connectome Project designs, it is clear that the model inferences generally match the ranges of generating parameter values for the groups. The match is closest when all three parameters have high values, and agreement is less close when both α and δ have smaller values. For the Stelter and Degner (2018) design, recovery is less effective. The difference seems likely to be caused by the very small number of 1-ago and 3-ago trials.

Fig. 4. Summary of model inferences for simulated data with individual differences, based on artificial, Human Connectome Project (HCP), and Stelter and Degner (2018) (S&D 2018) experimental sequences.

The bottom row in Fig. 4 provides a posterior predictive check of the descriptive adequacy of the model. Markers correspond to each group and each of the Ω -, 1-, 2-, and 3-ago cases, showing the probability of a “yes” response given by the model’s posterior predictive distribution and the observed frequency of “yes” responses in the simulated data. There is very good agreement for all of these probabilities using the artificial and Human Connectome Project designs. Using the Stelter and Degner (2018) design leads to lower agreement, but we conclude that the model shows acceptable levels of descriptive adequacy for all three designs.
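A posterior predictive check of this kind can be computed directly from the MCMC samples. The sketch below, which again reuses the yes_probabilities helper, is one way such a comparison might be implemented; it is not the authors' analysis code.

```python
import numpy as np

def posterior_predictive_yes(alpha_samples, delta_samples, gamma_samples, case):
    """Posterior predictive probability of a "yes" response for one case
    (1 = null-ago, ..., 4 = 3-ago), averaged over posterior samples."""
    probs = [yes_probabilities(a, d, g)[case - 1]
             for a, d, g in zip(alpha_samples, delta_samples, gamma_samples)]
    return np.mean(probs)

def observed_yes(responses, cases, case):
    """Observed frequency of "yes" responses on trials of the given case."""
    trials = [y for y, c in zip(responses, cases) if c == case]
    return np.mean(trials) if trials else np.nan
```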

Overall, the parameter recovery study shows that the model is able to infer structured variation in the memory and decision parameters and is descriptively adequate. These results provide evidence that the model is identifiable and useful with respect to the experimental designs considered. The relatively worse performance using the Stelter and Degner (2018) design highlights the need to include all of the cases considered by the model, and more generally emphasizes the importance of experimental design in allowing model-based inferences (Cavagnaro et al., 2010; Cavagnaro et al., 2011; Myung et al., 2013).

Applications to Stelter and Degner (2018)

As a first set of applications of the model, we consider an experiment conducted by Stelter and Degner (2018). These authors used the n-back task as one of a set of tasks to investigate differences in visual working memory between in-group and out-group face stimuli. We first show how the model can measure the memory and decision-making properties of individuals, and then show how it can test for differences with respect to the two face conditions. Specifically, we consider the data from all 52 participants in Stelter & Degner (2018, Experiment 1), which used an adaptive n-back procedure for blocks of 15 white and middle eastern faces. Because of this design, different participants completed different numbers of 2-back blocks, with a minimum of 1, a maximum of 10, and a mean of 3 blocks.

Measurement of Individuals

Model

To apply the model to measure the memory and decision-making properties of individuals in an experimental setting, we extend the basic model to allow for the possibility of contaminant behavior using a latent-mixture approach (Zeigenfuse and Lee, 2010). Each participant is assumed either to make decisions according to the model on all trials, or to guess by responding “yes” with some fixed probability on all trials. If the ith participant uses the model their parameters are αi,γi, and δi, but if they guess their fixed probability is ψi. Which of the two possibilities is followed is determined by the indicator parameter zi, with zi=1 indicating model-based responses and zi=0 indicating the contaminant guessing responses. The indicator parameters are sampled as zi~ Bernoulli (ϕ) where ϕ is a population base-rate of contaminant participants with uniform prior ϕ~ uniform(0,1).

The graphical model for this latent-mixture extension is shown in Fig. 5. Note that an abbreviation is used, with θit = 2back(αi, γi, δi, si) indicating the selection of the appropriate model response probability for θit depending on the case of the current trial.
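A sketch of how this latent-mixture extension might look in JAGS is given below, with the 2back(·) shorthand expanded inline from Equations 1–4. The participant-by-trial data layout and variable names are assumptions rather than the authors' implementation.

```python
# A JAGS sketch of the latent-mixture contaminant model in Fig. 5, as a Python string.
contaminant_2back_model = """
model {
  phi ~ dunif(0, 1)                 # population base-rate of model-based responding
  for (i in 1:N) {
    z[i] ~ dbern(phi)               # 1 = follows the 2-back model, 0 = contaminant guessing
    psi[i] ~ dunif(0, 1)            # fixed guessing probability for a contaminant
    alpha[i] ~ dunif(0, 1)
    gamma[i] ~ dunif(0, 1)
    delta[i] ~ dunif(0, 1)
    for (t in 1:T[i]) {
      # case-dependent "yes" probability from the basic model (Equations 1-4)
      theta.model[i, t] <- equals(s[i, t], 1) * (1 - gamma[i])
        + equals(s[i, t], 2) * (alpha[i] * (1 - delta[i]) + (1 - alpha[i]) * (1 - gamma[i]))
        + equals(s[i, t], 3) * (pow(alpha[i], 2) * delta[i]
                                + alpha[i] * (1 - alpha[i]) * (1 - delta[i])
                                + (1 - alpha[i]) * (1 - gamma[i]))
        + equals(s[i, t], 4) * (pow(alpha[i], 3) * (1 - delta[i])
                                + 2 * pow(alpha[i], 2) * (1 - alpha[i]) * delta[i]
                                + alpha[i] * pow(1 - alpha[i], 2) * (1 - delta[i])
                                + (1 - alpha[i]) * (1 - gamma[i]))
      theta[i, t] <- z[i] * theta.model[i, t] + (1 - z[i]) * psi[i]
      y[i, t] ~ dbern(theta[i, t])
    }
  }
}
"""
```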

Fig. 5. Graphical model representation of the latent-mixture contaminant version of the 2-back model applied to data from Stelter and Degner (2018).

Result

All of the participants were inferred to use the model rather than the contaminant guessing process. The posterior means of the zi parameters were all greater than 0.99 and the base-rate ϕ was similarly high with a mean of 0.98 and 95% credible interval (0.93,1.00). This result provides some additional evidence of the adequacy of the model in accounting for participants’ behavior. The contaminant model also provides a general approach that can be used in any analysis where it is possible some participants do not follow task instructions, or fail to perform in a motivated way.

Figure 6 shows the inferred model parameters for all 52 participants. The two panels summarize the joint posterior distributions between α and δ and between α and γ. Markers correspond to posterior means and error bars show interquartile credible intervals. The posterior means show that a range of values is inferred, indicating the presence of individual differences. The credible intervals show significant uncertainty in these inferences, especially for the α memory and δ decision parameters. This is consistent with the relatively limited 2-back data for each individual.

Fig. 6. Results for the 2-back conditions in Stelter and Degner (2018). The left panel shows the joint posterior between the α memory and δ decision parameters. The right panel shows the joint posterior between the α memory and γ base-rate parameters. Points show posterior means and error bars show interquartile credible intervals. Two illustrative participants are highlighted in dark blue.

Participants 22 and 47 are highlighted to demonstrate the individual differences. Participant 22 completed 10 blocks with an overall accuracy of 83%. Participant 47 completed 9 blocks and also had an overall accuracy of 83%. While their number of blocks and overall accuracies are very similar, their accuracies for the different cases are different. Participant 22 had accuracies of 91%, 100%, 61%, and 100% for Ω -, 1-, 2-, and 3-ago cases, respectively, while Participant 47 had accuracies of 90%, 50%, 69%, and 0%. The 1-ago and 3-ago cases are based on relatively few trials, because of the limitations of the experimental design. Nevertheless, these patterns suggest that Participant 22 is less accurate in identifying target 2-ago stimuli but more accurate in avoiding “yes” responses for interfering 1-ago and 3-ago matches.

The inferred α memory and δ decision parameters for the two participants capture this distinction. Participant 22 has α22=0.83 and δ22=0.80 while Participant 47 has higher α47=0.90 but lower δ47=0.76. Both participants have similar base-rate parameters, with γ22=0.90 and γ47=0.89. The model-based interpretation is that Participant 47 has better memory encoding and updating processes, which allows for better detection of target 2-ago stimuli, but worse accuracy of execution for decisions based on memory signals, which leads to errors with interfering matches in neighboring 1-ago and 3-ago positions.

Between Condition Differences

Model

To apply the model to measure differences between conditions, we distinguish between the in-group white and out-group middle eastern faces.2 The goal is to test whether there are differences in the condition-level means of the parameters for responses on trials with white versus middle eastern faces.

We extend the model hierarchically to allow potentially different overarching Gaussian distributions for the white and middle eastern faces. The means of these distributions are expressed in terms of a parameter representing the overall condition-level mean and a parameter representing the difference between the means for the two types of faces. For example, the overall mean for the α memory parameter is μα and the difference is ϵα. The condition means are then μα − ϵα/2 for the white faces and μα + ϵα/2 for the middle eastern faces, so that they differ by ϵα.

The memory parameter used by the ith participant for white faces is then sampled as

αiw ~ Gaussian(μα − ϵα/2, 1/σα²) T(0,1),

and for the middle eastern faces as

αim ~ Gaussian(μα + ϵα/2, 1/σα²) T(0,1),

where the standard deviation σα is assumed to be the same for both conditions and measures the extent of individual differences within the conditions. The T(0,1) notation denotes truncation to keep the parameters in their valid range as probabilities. The overall mean, difference, and standard deviation are given the priors μα ~ uniform(0, 1), ϵα ~ Gaussian(0, 1/0.3²), and σα ~ uniform(0, 1). The δ decision and γ base-rate parameters are modeled in the same way at the condition and individual level.
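In JAGS, this condition-difference structure might be written as in the following fragment, shown for the α memory parameter only (δ and γ follow the same pattern). This is a sketch under the parameterization just described, not the authors' code; note that dnorm takes a precision and T(0, 1) imposes the truncation.

```python
# A JAGS fragment (to sit inside a full model block) for the condition-level
# structure of the alpha memory parameter; names are illustrative.
condition_difference_fragment = """
  mu.alpha ~ dunif(0, 1)
  eps.alpha ~ dnorm(0, 1 / pow(0.3, 2))     # prior on the condition difference
  sigma.alpha ~ dunif(0, 1)
  for (i in 1:N) {
    alpha.w[i] ~ dnorm(mu.alpha - eps.alpha / 2, 1 / pow(sigma.alpha, 2)) T(0, 1)  # white faces
    alpha.m[i] ~ dnorm(mu.alpha + eps.alpha / 2, 1 / pow(sigma.alpha, 2)) T(0, 1)  # middle eastern faces
  }
"""
```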

Figure 7 shows the graphical model representation for the face condition differences application, with each of the parameters defined with individual differences within conditions for each type of face. For the tth trial, fit=1 indicates the face is white and fit=2 indicates the face is middle eastern. This information is used to control whether αiw,γiw, and δiw or αim, γim, and δim are used to generate response probabilities according to the case of the trial. Note that the graphical model uses vectors as nodes, so that μ=(μα,μδ,μγ), αi=(αiw,αim), and so on.

Fig. 7. Graphical model representation of the between-condition differences version of the 2-back MPT model applied to data from Stelter and Degner (2018).

Result

Figure 8 shows the relationship between the inferred model parameters for white and middle eastern faces, for each of the three parameters. Each panel corresponds to a parameter, and the main scatter plot contains circular markers showing the posterior means for each participant with error bars showing interquartile credible intervals. Most participants, for all three parameters, lie near the dashed diagonal line at which the parameter values for white and middle eastern faces are the same. A few participants appear to have somewhat different values for δw and δm, but there is no large or systematic difference across the participants as a whole.

Fig. 8. Results for the between-condition analysis of the Stelter and Degner (2018) data. The left panel shows the relationship between the αw and αm memory parameters for white and middle eastern faces. Each point is a posterior mean for a participant, and error bars show interquartile credible intervals. The inset panel shows the prior (black line) and posterior (blue shaded area) distributions for the α difference between group means. The middle and right panels show the same information for the δ decision parameters and the γ base-rate parameters, respectively.

It is interesting to note that the assumption of hierarchical individual differences, in the form of an overarching Gaussian distribution, has affected the inferences about individual parameter values through hierarchical shrinkage. This shrinkage is consistent with the uncertainty at the individual level shown in Fig. 6. Individual participant values for the α memory parameter, for example, span a narrower range in Fig. 8 than they do in the non-hierarchical analysis shown in Fig. 6. The hierarchical assumptions do not prevent the model from being descriptively adequate, as determined by comparing the empirically observed and posterior predicted expected probability of “yes” responses for all Ω -, 1-, 2-, and 3-ago trials. The behavioral probabilities are 0.07, 0.33, 0.70, and 0.41 respectively. The model posterior predicted probabilities are 0.07, 0.20, 0.69, and 0.28. This level of agreement is very similar to that for the unconstrained model. Given the excellent agreement for the Ω-ago and 2-ago cases, which together constitute 98.7% of the trials, we regard this as an acceptable overall level of descriptive adequacy.

The inset axes in each panel in Fig. 8 show the prior and posterior distribution of the condition-level mean difference parameters. The prior is shown by the solid line and the posterior by the shaded region. For all three parameters, the posterior distributions have most of their mass close to the value zero, suggesting there is no difference between the faces in the two conditions. The Savage-Dickey method (Wetzels et al., 2010) provides a way to quantify this result by approximating the Bayes factor between the null model of no difference in the condition means and the alternative model of a difference. The Bayes factor is approximated as the ratio of posterior to prior density at the critical value ϵ=0. For all three parameters, the Bayes factor favors the null, with values of about 3, 2, and 14 for α,δ, and γ respectively. We interpret these results as providing weak evidence of no difference for the α memory and δ decision parameters, and strong evidence of no difference for the γ base-rate parameter. The finding of no evidence for differences due to the type of face, and some evidence for sameness, is consistent with the results in Stelter & Degner (2018, Figure 1), which showed that the accuracy for the two types of faces is very similar for the 2-back blocks.
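The Savage-Dickey computation itself is straightforward once posterior samples of a difference parameter are available. The sketch below estimates the posterior density at zero with a Gaussian kernel density estimate and divides it by the prior density implied by the Gaussian(0, 1/0.3²) prior; the function name is our own.

```python
import numpy as np
from scipy import stats

def savage_dickey_bf01(posterior_samples, prior_sd=0.3, point=0.0):
    """Approximate the Bayes factor in favor of the null (difference = 0) as the
    ratio of posterior to prior density at zero. The posterior density is
    estimated from MCMC samples of the difference parameter with a Gaussian
    kernel density estimate; the prior is Gaussian with the given standard
    deviation (0.3 matches the prior used for the difference parameters)."""
    posterior_density = stats.gaussian_kde(np.asarray(posterior_samples))(point)[0]
    prior_density = stats.norm.pdf(point, loc=0.0, scale=prior_sd)
    return posterior_density / prior_density
```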

Applications to the Human Connectome Project

As a second set of applications of the model, we consider behavioral data from the Human Connectome Project young adult data set (Van Essen et al., 2013).3 The Human Connectome Project data set contains behavioral and neuroimaging data from 1200 young adults on a variety of cognitive tasks and assessments, with the broad purpose of understanding human brain structure, function, and connectivity and their relationships to behavior. One of the Human Connectome Project tasks is the 2-back working memory task.

We consider three applications of our model to these data. The first is a measurement application, with the same goals as for the Stelter and Degner (2018) data. The second is an application to between-condition differences. In the Human Connectome Project 2-back task, participants complete eight blocks with 10 trials each, and there are four types of stimuli: faces, tools, bodies, and places. One stimulus type is used for each block. Previous studies have suggested a visual short-term memory advantage for faces compared to non-face objects (Curby and Gauthier, 2007). In the context of the 2-back task, no consensus has emerged regarding performance differences for face versus non-face stimuli, although distinct cortical regions have been implicated in the processing of faces, body parts, places, and tools as stimuli (Barch et al., 2013). Because it is not hampered by small sample sizes, the Human Connectome Project data set is well suited to evaluating a model-based account of whether there are process- and performance-related differences for face versus non-face stimuli. Thus, we test whether there are differences in model parameters between the face and non-face stimuli. Finally, we develop a new application that regresses model parameters on cognitive measures derived from other tasks in the Human Connectome Project behavioral battery. There are 1082 participants in the data set with valid 2-back task data and complete external task measures. All three analyses are conducted on these participants.

Measurement of Individuals

Model

We again use the graphical model shown in Fig. 5, including the contaminant guessing process.

Result

All of the 1082 participants were again inferred to use the model rather than the contaminant guessing process. The posterior means of the zi parameters were all greater than 0.98 and the base-rate ϕ has posterior density concentrated near 1 with a mean of 0.999 and 95% credible interval of (0.996,1.000).

Figure 9 shows the inferred model parameters for all participants. As before, the two panels summarize the joint posterior distributions between α and δ and between α and γ. Markers correspond to posterior means and error bars show interquartile credible intervals. Markers are now colored to indicate three categories of accuracy, with the most accurate 25% of participants with better than 92% correct in green, the least accurate 25% of participants with worse than 80% accuracy in red, and the remaining moderate accuracy 50% of participants in yellow.

Fig. 9. Results for the 2-back task in the Human Connectome Project. The left panel shows the joint posterior between the α memory and δ decision parameters. The right panel shows the joint posterior between the α memory and γ base-rate parameters. Markers show posterior means and are colored to indicate the 25% lowest accuracy (red), 50% moderate accuracy (yellow), and 25% highest accuracy (green) participants. Error bars show interquartile credible intervals. Two illustrative participants are also highlighted.

It is clear that highly accurate participants have, unsurprisingly, both good memory and decision execution, with all three model parameters close to 1. As accuracy decreases, a range of individual differences emerges and, once again, there is significant uncertainty about the values of the α and δ parameters. Participants 479 and 423 are highlighted to demonstrate these individual differences. Both completed the standard 8 blocks and they had accuracies of 69% and 72% respectively. Their accuracies for the different cases are, however, very different. Participant 479 had accuracies of 82%, 78%, 19%, and 75% for Ω -, 1-, 2-, and 3-ago cases, respectively. This means that they missed many target 2-ago stimuli, but were reasonably accurate in saying “no” to non-target stimuli. Participant 423, in contrast, had accuracies of 86%, 11%, 81%, and 0%. This participant is better at identifying targets, but makes more errors also saying “yes” to interfering 1-ago and 3-ago stimuli.

The inferred α memory and δ decision parameters for the two participants capture this distinction. Participant 479 has α479=0.28 and δ479=0.64 while Participant 423 has higher α423=0.93 but lower δ423=0.46. Both participants have similar base-rate parameters, with γ479=0.92 and γ423=0.89. As with the previous analysis of illustrative participants in Fig. 6, the model-based interpretation is that Participant 423 has better memory encoding and updating processes, which allows for better detection of target 2-ago stimuli, but worse accuracy of execution of decisions based on memory signals. This inferior decision-making leads to errors with interfering matches in neighboring 1-ago and 3-ago positions.

Between Condition Differences

Model

We again use the graphical model shown in Fig. 7. The only adjustment that is needed is to define the two conditions. For the tth trial, fit=1 indicates a face stimulus and fit=2 indicates one of the other stimulus types.

Result

As before, we checked the descriptive adequacy of the hierarchical model. The behavioral probabilities are 0.06, 0.10, 0.81, and 0.40, respectively, for Ω-, 1-, 2-, and 3-ago trials. The model posterior predicted probabilities are 0.06, 0.12, 0.77, and 0.20. Given the relatively small number (fewer than 5%) of 3-ago trials, we regard this as an acceptable overall level of descriptive adequacy.

Figure 10 shows the relationship between the inferred model parameters for the face (αf,δf, and γf) and non-face (αn,δn, and γn) stimuli. The main scatter plots use circular markers to show the posterior means for each participant with error bars showing interquartile credible intervals. Once again, all three parameters, but especially the α and δ parameters, show evidence of hierarchical shrinkage compared to the independent individual-level inferences.

Fig. 10. Results for the between-condition analysis of the Human Connectome Project data. The left panel shows the relationship between the αf and αn memory parameters for face and non-face stimuli. Each point is a posterior mean for a participant, and error bars show interquartile credible intervals. The inset panel shows the prior (black line) and posterior (blue shaded area) distributions for the α difference between group means. The middle and right panels show the same information for the δ decision parameters and the γ base-rate parameters, respectively.

In terms of differences between stimulus types, participants seem to vary roughly symmetrically around the dashed diagonal line of equality for the δ decision parameter. For the α memory parameter, however, it seems clear that values are systematically greater for faces compared to non-face stimuli. The same pattern appears to hold for the γ base-rate parameter, although all values are much closer to one. Bayes factors support this interpretation of the visual patterns. A Bayes factor of 10 favors the null hypothesis of no difference for δ, but Bayes factors greater than 1000 favor the alternative hypothesis of a difference for both the α and γ parameters.

This suggests a novel psychometric result in the 2-back task, with a distinct advantage emerging in the α memory parameter but not the δ decision parameter for face versus non-face stimuli. Such a distinct difference for face-related memories has not been measured within the 2-back working memory task, although previous studies have suggested a visual short-term memory advantage for faces compared to non-face objects in other working memory tasks (Curby and Gauthier, 2007).

Regression on External Measures

To demonstrate how the model can analyze the relationship between model parameters and external task measures, we consider four external cognitive assessment tasks. The first is the Penn Matrix reasoning task (Moore et al., 2015; Bilker et al., 2012), which is intended to measure abstraction and non-verbal reasoning in complex cognition using a set of matrix reasoning problems similar to Raven's (1989) progressive matrices. The remaining three tasks are the List Sort working memory task, the Card Sort dimensional change task, and the Flanker inhibitory control and attention task, all part of the NIH Toolbox cognition assessment (Weintraub et al., 2013). The List Sort task is intended to test working memory and involves sequencing pictures of animals and foods, presented together with a text name and a sound, in order of their size. The Card Sort task is intended to measure executive function and cognitive flexibility. It involves matching test pictures to target pictures that vary on two dimensions, such as shape and color; matching must first be done according to one dimension and then according to the other. The Flanker task (Eriksen and Eriksen, 1974) is intended to measure attention and inhibitory control and requires attending to a target stimulus while inhibiting attention to surrounding stimuli that may be incongruent. The stimuli are fish or arrows, and congruency or incongruency is determined by the direction in which the target and flanking stimuli point.

The motivation for this application is to evaluate the selective association between these cognitive assessments and the cognitive parameters inferred by our model of 2-back task behavior. Importantly, we do this using a joint modeling approach where the relationships are inferred within a hierarchical Bayesian framework simultaneously with inference about the model parameters, rather than a two-stage correlational approach (Turner et al., 2019). This has the advantage of incorporating uncertainty in the parameter estimates while inferring the relationships (Matzke et al., 2017).

Model

To examine the relationship between parameters and external measures, we use a multivariate regression approach. The basic idea is that each of the αi, δi, and γi parameters for the ith participant is modeled as being systematically related, via the linear combination specified by the regression model, to performance on the external tasks. Formally, this is achieved by assuming the individual-level parameters are noisy samples from a Gaussian distribution centered on a weighted linear combination of the external measures. For example, the memory parameter αi is sampled as

αi ~ Φ(Gaussian(μiα, 1/σα²)).

The probit transformation Φ(·) converts the sample from the Gaussian distribution into a probability on the interval from 0 to 1. The standard deviation σα corresponds to the level of noise and is assumed to be the same for all participants. The participant-specific mean μiα is given by the linear combination

μiα = β0α + Σk βkα xik.

In this equation, xik is the measure for the ith participant on the kth external task, βkα is the weight given to the kth task, and β0α is a constant. The external measures are normalized as z-scores. The regression weights are given standard priors β0α, βkα ~ Gaussian(0, 1) and the noise is given the uniform prior σα ~ uniform(0, 1). As shown in the graphical model in Fig. 11, the δi and γi parameters are defined similarly, with their own regression weights and levels of noise.
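In JAGS, the regression structure for the α parameter might be expressed as in the following fragment (δ and γ are analogous), where phi() is the standard normal CDF implementing the probit transformation and inprod() forms the linear combination. The variable names and the N x K data matrix x of z-scored external measures are assumptions, not the authors' code.

```python
# A JAGS fragment (to sit inside a full model block) for the regression
# structure on the alpha memory parameter; names are illustrative.
regression_fragment = """
  beta0.alpha ~ dnorm(0, 1)
  for (k in 1:K) {
    beta.alpha[k] ~ dnorm(0, 1)
  }
  sigma.alpha ~ dunif(0, 1)
  for (i in 1:N) {
    mu.alpha[i] <- beta0.alpha + inprod(beta.alpha[], x[i, ])
    alpha.raw[i] ~ dnorm(mu.alpha[i], 1 / pow(sigma.alpha, 2))
    alpha[i] <- phi(alpha.raw[i])     # probit transform onto the unit interval
  }
"""
```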

Fig. 11. Graphical model representation of the multivariate regression version of the 2-back MPT model applied to data from the Human Connectome Project.

Result

It is especially important to examine the descriptive adequacy of the regression model, to ensure that the linear combination constraints still allow for individual-level parameters consistent with observed behavior according to the basic 2-back model. We again compare the empirically observed and posterior predicted expected probability of “yes” responses for all Ω -, 1-, 2-, and 3-ago trials. There is excellent agreement for the first three cases, but not as good agreement for the 3-ago trials. As before, the behavioral probabilities are 0.06, 0.10, 0.81, and 0.40 respectively. The model posterior predicted probabilities are 0.06, 0.10, 0.76, and 0.20, which we continue to regard as adequate.

Figure 12 shows the inferred regression weights relating each model parameter, in rows, to each external task, in columns. The posterior and prior distributions are shown, and the Bayes factor comparing the null hypothesis of no difference to the alternative hypothesis of a difference is listed. These Bayes factors are expressed in terms of the hypothesis they favor, so that BF01 corresponds to evidence in favor of the null, and BF10 corresponds to evidence in favor of the alternative. The posterior distributions are colored according to the Bayes factor, with blue posteriors indicating strong evidence for a relationship (BF10 > 10), yellow posteriors indicating strong evidence for no relationship (BF01 > 10), and gray posteriors indicating no strong evidence for either hypothesis.

Fig. 12. Prior and posterior distributions for the regression weights for each model parameter, in rows, and external task, in columns. The prior distributions are shown by broken lines and the posterior distributions by the shaded areas. The Bayes factor comparing the null hypothesis of no difference to the alternative hypothesis of a difference is listed. The posterior distributions are colored according to the Bayes factor in terms of strong evidence for a relationship (blue), strong evidence for no relationship (yellow), or no strong evidence either way (gray).

There is strong evidence the List Sort task is related to the α memory parameter, with BF10 = 33.4. The posterior mean of the regression weight is 0.08 with a 95% credible interval (0.04, 0.12). All of the external tasks except the Card Sort task are significantly related to the γ base-rate parameter, all with BF10 > 100. The posterior means and credible intervals are 0.11 (0.08, 0.13) for the Penn Matrix task, 0.07 (0.04, 0.09) for the List Sort task, and 0.06 (0.03, 0.09) for the Flanker task. All of the external tasks except the Flanker task are significantly related to the δ decision parameter, with BF10 > 100 in each case. These relationships have relatively larger regression weights. The posterior means and credible intervals are 0.31 (0.25, 0.38) for the Penn Matrix task, 0.18 (0.11, 0.24) for the List Sort task, and 0.18 (0.10, 0.25) for the Card Sort task.

The regression results suggest a number of interpretable relationships. First, only the List Sort task measure, which assesses working memory, shows a significant relationship with the memory parameter α. This is consistent with the notion that α captures the core memory component of the 2-back process. Further, the List Sort task measure has significant relationships with both the decision-making and base-rate parameters, δ and γ, which reflects the fact that both the 2-back and List Sort tasks measure working memory and can be expected to require similar underlying cognitive capabilities. Second, the Penn Matrix task is a measure of fluid intelligence, and this is reflected in a strong association with both the decision-making and base-rate parameters, δ and γ, but inconclusive evidence with respect to the memory parameter α. Third, the Card Sort task, which measures cognitive flexibility, shows a significant positive relationship only with the δ decision parameter. The decision parameter represents the ability to discriminate a valid internal memory signal. There is some limited evidence of such an association between cognitive flexibility and the ability to detect signals, although these were external rather than internal signals (Figueroa and Youmans, 2012). Finally, the Flanker task, which measures attention and inhibitory control, shows a significant relationship only with the γ base-rate parameter. A possible interpretation of the base-rate parameter is as the rate of inhibiting a pre-potent "yes" response when no memory signal is detected, thus sharing the inhibitory capabilities demonstrated in the Flanker task.

Discussion

The main purpose of our model is to serve as a psychometric instrument that measures the memory and decision-making components of 2-back task behavior. We aimed to achieve this via a generative cognitive modeling approach, by making assumptions about how latent cognitive parameters lead to observable task behavior.

Broader Interpretation of Model Parameters

The n-back task is popular largely because it combines memory processes of encoding, retrieval, and updating with decision processes related to inhibition and control in one simple task. Thus, it is natural to ask how the parameters in our model relate to existing psychological constructs related to these cognitive capabilities. In general, answers to these questions require applying the model to relevant experimental data that measures the other constructs and n-back behavior in a within-participants design. Given this experimental evidence, our regression application provides a template for how the relationship between our model parameters and external task measures can be investigated in a statistically principled way. It would be possible to extend this joint modeling approach further by incorporating process models of the other task being related to n-back behavior (Turner et al., 2019). Future work should explore how tasks relating to working memory, cognitive control, temporal binding, and other relevant cognitive capabilities are related to the measures of n-back behavior our model provides using the full power of joint modeling approaches.

As an initial speculative framework to guide investigating these theoretical relationships, Fig. 13 presents one interpretation of the different branches of our MPT model. This interpretation classifies different processing possibilities as corresponding to correct processing, an encoding failure, an inhibitory failure, an incorrect inhibition, or an incorrect binding. Final branches in the 1-ago, 2-ago, and 3-ago trees are colored and labeled according to the appropriate category, or presented as dashed edges if the interpretation is unclear.

Fig. 13. Interpretation of the MPT model in terms of five broad theoretical concepts: correct processing, encoding failures, inhibitory failures, incorrect bindings, and incorrect inhibitions. The 1-ago, 2-ago, and 3-ago trees are shown, with the final branch colored to indicate which of these concepts applies. Paths that are difficult to interpret are shown as dashed edges.

For each classification type, we calculated the expected proportion of trials based on parameter posterior means for both the Stelter and Degner (2018) data and the Human Connectome Project data. Correct processing occurs when memory encoding, updating, and retrieval all function as required by the task and a correct response is produced. This is the most common expected outcome, accounting for 67%, 58%, and 51% for the 1-ago, 2-ago, and 3-ago trials in the Stelter and Degner (2018) data, and 74%, 66%, and 59% of trials in the Human Connectome Project data. Encoding failures occur when a stimulus is not stored in memory at the time it is presented. The model infers that this occurs on about 14% of trials in both data sets. Inhibitory failure occurs when the item is encoded and its position correctly updated, but an incorrect decision is made at the final stage. The model infers this accounts for about 19% and 13% of 1-ago and 3-ago trials for the Stelter and Degner (2018) data, and correspondingly 13% and 9% of trials for the Human Connectome Project data. Incorrect binding occurs when an item is encoded but there is a failure at some point in the updating process, resulting in an incorrect binding between the stimulus and the context provided by its position in the sequence (Ranganath, 2010). The model infers that incorrect binding accounts for 9% and 15% of 2-ago and 3-ago trials for the Stelter and Degner (2018) data, and correspondingly 8% and 13% for the Human Connectome Project data. Incorrect binding does not affect the 1-ago trials since no contextual updating is involved. Finally, incorrect inhibition can occur only in the 2-ago case, where the correct “yes” answer is inhibited. The model infers that this occurs on 16% of the 2-ago trials in the Stelter and Degner (2018) data and 11% in the Human Connectome Project data. Full details on this analysis for all of the paths through all of the trees are available in the supplementary material.
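For readers who wish to reproduce this style of breakdown, the sketch below shows one way the expected category proportions might be computed from posterior mean parameters. The mapping of branches to categories is our reading of Fig. 13 and should be treated as an assumption; the complete mapping for all paths is given in the authors' supplementary material.

```python
def expected_category_proportions(alpha, delta, gamma):
    """Expected proportion of trials in each interpretive category, by case,
    under one assumed reading of the branch classification in Fig. 13."""
    return {
        "correct processing": {                     # encode, update, and respond correctly
            "1-ago": alpha * delta,
            "2-ago": alpha**2 * delta,
            "3-ago": alpha**3 * delta,
        },
        "encoding failure": 1 - alpha,              # stimulus never stored in memory
        "inhibitory failure": {                     # correct position, incorrect "yes"
            "1-ago": alpha * (1 - delta),
            "3-ago": alpha**3 * (1 - delta),
        },
        "incorrect binding": {                      # position not updated correctly
            "2-ago": alpha * (1 - alpha) * delta,
            "3-ago": 2 * alpha**2 * (1 - alpha) * delta,
        },
        "incorrect inhibition": {                   # correct "yes" inhibited (2-ago only)
            "2-ago": alpha**2 * (1 - delta),
        },
    }
```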

The analysis in Fig. 13 emphasizes how the MPT approach dissects observed behavior into interpretable components. According to the model and the inferred parameters, correct responses are most likely to result from accurate encoding, updating, and responding, and much less likely to result from guessing the correct answer after an encoding failure. As detailed above, correct responses are inferred to result from correct processing on about 50 to 60% of trials. Correct guessing after failed encoding is inferred to occur on only slightly over 12% of trials when the correct answer is “no” and only about 1% of trials when the correct answer is “yes”. In terms of incorrect “yes” responses in the 3-ago case, the model explains just under half as being caused by binding failures, just under half as being caused by inhibition failures, and only a few as being caused by encoding failures. This differs from the 2-ago case, where encoding failures account for almost a third of the errors made. This sort of breakdown highlights the usefulness of the model in dissociating the processes that contribute to individual differences in interference effects as well as correct responding.

It is interesting to note that, cumulatively, the interpretable possibilities account for the majority of trials. The final branches in the 2-ago and 3-ago trees that are difficult to interpret, at least in terms of existing working memory and cognitive control theory, have much lower probability, collectively accounting for no more than about 5% of trials. This is an encouraging result in terms of the psychological interpretability of the model. It suggests that the classification of paths into the classes of correct processing, encoding failure, inhibitory failure, incorrect binding, and incorrect inhibition provides a useful theoretical framework for understanding the cognitive variables and processes involved in n-back tasks, and their potential relationship to related variables and processes in other tasks.

Additional theoretical resolution could potentially be achieved by extending our model to account for response times as well as choice behavior. Response times for n-back tasks are routinely analyzed statistically, and potentially provide additional insight into the cognitive processes underlying behavior. A framework for this extension is provided by Klauer and Kellen (2018), who develop a general approach for extending MPT models to include response times. The basic idea is to associate response time distributions with each edge of the probability tree, and define the total observed response time as the sum of component response times along the branch that produced the behavior. The potential theoretical resolution offered by an extension to response times is made clear by considering the relationship between the parameters in our model and the measures and sub-processes described by Rac-Lubashevsky and Kessler (2016) in their n-back analysis. Their "updating cost" sub-process roughly corresponds to the role of the α memory parameter in maintaining the relative positions of encoded stimuli on each trial, and their "matching" sub-process roughly corresponds to the role of the δ decision parameter in comparing the presented stimulus to one stored in memory. Similarly, their "intrusion" measure is captured by the interference effects constituting a subset of the MPT branches. The different sub-processes are assumed to contribute separately to response times, consistent with the Klauer and Kellen (2018) framework. The "substitution", "gate opening", and "gate closing" sub-processes in Rac-Lubashevsky and Kessler (2016), however, go beyond the parameters in our model, because they correspond to parameters obtained from a modified reference-back task that allows measurement of additional processes. The substitution mechanism corresponds roughly to the part of α that measures the probability of initial encoding. The gate-opening and gate-closing mechanisms seem to capture the switching between trials that need updating rather than maintenance in a modified reference task. A potential extension of our model would involve identifying separate tree structures for switch versus non-switch trials, and then identifying appropriate branches that correspond to the gate-opening and gate-closing mechanisms in each. All of these finer-grained distinctions would benefit from extending the model to make predictions about response time distributions, or applying it to more complex n-back tasks (e.g., 6-back tasks), or both.

A final important challenge is to explore the neural correlates of the dissociated process parameters, beyond the neural correlates typically measured for accuracy and reaction time (Li et al., 2021). This may provide greater functional resolution in understanding how different brain regions and their connectivity relate to the separate memory and decision-making processes.

Limitations and Extensions

Our MPT-based framework for n-back tasks is general, and could be used to formulate specific models for tasks other than the 2-back task. We do not believe, however, that the 1-back model based on the framework is well identified. In a parameter recovery evaluation, similar to the one reported here for the 2-back model, the 1-back model failed to recover meaningful variation in the generating parameters. We believe this limitation arises from a lack of cases sufficient to distinguish the different decision parameters. We believe that the 2-back model is effective, in part, because of the presence of both 1-ago and 3-ago cases surrounding the target 2-back case. The relatively poor recovery performance of the 2-back model under the Stelter and Degner (2018) experimental design, which has very limited 1-ago and 3-ago cases, is consistent with this conclusion. Accordingly, our expectation is that the conceptual framework should lead to effective models for 3-back tasks and beyond, but we have not implemented or tested these models.
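
For readers unfamiliar with the logic of such an evaluation, the following sketch illustrates the general recipe using a toy one-parameter tree rather than the full 2-back model: simulate responses from known generating parameters, re-estimate the parameters from the simulated data, and check whether the estimates track the generating values.

```python
# Schematic parameter recovery check for a toy tree: on a target trial a "yes"
# response arises either through successful encoding (probability alpha) or by
# guessing "yes" after an encoding failure (probability (1 - alpha) * 0.5).
import random

def simulate_hits(alpha, n_trials):
    """Simulate the number of 'yes' responses on n_trials target trials."""
    p_yes = alpha + (1 - alpha) * 0.5
    return sum(random.random() < p_yes for _ in range(n_trials))

def recover_alpha(hits, n_trials):
    """Invert the tree equation p_yes = alpha + (1 - alpha) * 0.5."""
    p_hat = hits / n_trials
    return max(0.0, min(1.0, (p_hat - 0.5) / 0.5))

if __name__ == "__main__":
    for alpha_true in (0.3, 0.6, 0.9):
        hits = simulate_hits(alpha_true, n_trials=200)
        print(f"generating alpha = {alpha_true:.2f}, "
              f"recovered alpha = {recover_alpha(hits, 200):.2f}")
```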

An assumption of the conceptual framework is that an earlier stimulus matching the currently presented one is unique among those being considered as potentially in memory. For example, in the 3-ago case considered in Fig. 1, the first A potentially interferes with the current A, but the stimuli presented between these two are different from A. This is a reasonable assumption for most n-back experiments, which use a large number of stimuli and present each relatively infrequently. Some n-back experiments, however, use as few as two stimuli, and thus repeat them often (e.g., Rac-Lubashevsky and Kessler, 2016). This violates the basic assumption of considering only the most recent stimulus that matches the current one, if such a stimulus exists, in formulating the Ω-, 1-, 2-, and 3-ago cases. It is not clear how well our framework and model will apply to these experimental designs.

In both of our applications of hierarchical extensions of the model we observed significant shrinkage. This is likely due to the relatively limited information about the model parameters provided by the experimental designs. To the extent that these designs are typical, however, the degree of shrinkage we observed should encourage some caution. Individual differences need to be incorporated in the model carefully, and the resulting inferences compared with those found by applying the model independently to each individual. For many applications, it may be better to apply the model independently rather than hierarchically, using the sort of approach adopted for the regression application.
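
As a reminder of what shrinkage does, the toy beta-binomial sketch below, which is not the hierarchical 2-back model itself, contrasts independent proportion-correct estimates with estimates pulled toward a group mean by a prior of assumed strength; the stronger that pull relative to the data, the more cautious individual-level interpretation should be.

```python
# Toy illustration of hierarchical shrinkage (not the paper's model): each
# individual estimate is pulled toward the group mean, with the amount of pull
# controlled by the strength (pseudo-count) of the group-level prior.

def shrunken_estimate(successes, trials, group_mean, prior_strength):
    """Posterior mean of a beta-binomial model with a group-informed prior."""
    a = group_mean * prior_strength + successes
    b = (1 - group_mean) * prior_strength + (trials - successes)
    return a / (a + b)

if __name__ == "__main__":
    data = [(18, 20), (9, 20), (14, 20)]          # (correct, trials) per person
    group_mean = sum(s for s, _ in data) / sum(n for _, n in data)
    for successes, trials in data:
        independent = successes / trials
        hierarchical = shrunken_estimate(successes, trials, group_mean,
                                         prior_strength=30)
        print(f"independent = {independent:.2f}, "
              f"hierarchical = {hierarchical:.2f}")
```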

Finally, there is scope for exploring variants of the basic model. The current assumption is that the initial encoding of a presented stimulus and its later latent position updating both occur with the same probability. We tested this assumption and found it reasonable for the data we modeled but, as mentioned earlier, it seems theoretically plausible that these probabilities may sometimes differ. It also seems reasonable to consider a more complicated model in which the updating probabilities change from trial to trial as the time since the relevant stimulus was presented increases. This may be especially important for more difficult n-back tasks with larger n and higher cognitive load. Such tasks are more likely to produce data that allow dissociable measurement of initial encoding and subsequent updating.
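
As a rough illustration of this variant, and under one simple reading of the constant-probability assumption (initial encoding followed by one update per intervening trial, each succeeding with probability α), the sketch below compares that constant-probability case with a hypothetical alternative in which each successive update succeeds with a geometrically decaying probability. The decay form is an illustrative assumption, not part of the model.

```python
# Sketch of the trial-varying alternative discussed above. Under a constant
# probability, surviving initial encoding plus m position updates has
# probability alpha ** (m + 1); the decaying variant is purely illustrative.

def survival_constant(alpha, m):
    """P(stimulus still correctly positioned) with a constant probability."""
    return alpha ** (m + 1)

def survival_decaying(alpha0, decay, m):
    """Same quantity when the t-th update succeeds with alpha0 * decay ** t."""
    prob = alpha0                      # initial encoding
    for t in range(1, m + 1):
        prob *= alpha0 * decay ** t    # each later update is a bit less likely
    return prob

if __name__ == "__main__":
    for m in range(4):
        print(f"m={m}: constant={survival_constant(0.9, m):.3f}, "
              f"decaying={survival_decaying(0.9, 0.95, m):.3f}")
```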

Conclusion

We developed and demonstrated a simple psychometric model of the widely used 2-back working memory task, using the MPT modeling framework. Consistent with the MPT modeling philosophy of dissecting cognitive processes into simple branching steps controlled by probabilities, our model treats 2-back behavior as arising from a memory and encoding probability that governs whether stimuli and their positions relative to the current stimulus are remembered, and from decision and base-rate probabilities that lead to “yes” or “no” responses depending on the signals (or lack of signals) provided by memory. The model does not aim to be a detailed account of 2-back behavior, but instead aims to provide a simple and useful characterization of individual performance. As our applications show, the basic model can serve as the core of more elaborate models tailored to specific data and research questions, including measurement, comparison across task and stimulus conditions, and regression analyses of the relationship between latent 2-back working memory measures and standardized cognitive measures of working memory, reasoning, and response inhibition.

Acknowledgements

We thank the Human Connectome Project (http://www.humanconnectome.org/) for making the data publicly available. Data were provided, in part, by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. All NIH Toolbox-related materials are Copyright 2015 Northwestern University and the National Institutes of Health.

Funding

The current study was supported by grants from the National Institutes of Health MH121069-01 (MDL, PM, VM).

Footnotes

Code Availability

MATLAB and JAGS code for the current study is available in an Open Science Framework repository at https://osf.io/esxhf/.

Declarations

Ethics Approval Not applicable

Consent to Participate Not applicable

Consent for Publication Not applicable

Conflict of Interest The authors declare no competing interests.

1. We do not include the 0-back task in this analysis, even though it is also widely used as a control for the 2-back task. The 0-back task requires participants to remember the same stimulus throughout a sequence and respond “yes” whenever that stimulus is presented again. It therefore does not require the updating of position information, nor does it create the possibility of interfering information, both of which are fundamental to n-back tasks.

2. Stelter and Degner (2018) consider only 51 of their 52 participants, removing a participant for whom these in-group and out-group definitions are problematic. It is not clear from the raw data files which participant this is, so we continue to use all 52 participants. It is very unlikely that our results would change much if this participant were removed.

Availability of Data and Materials

The data sets and additional analysis are available in an Open Science Framework repository at https://osf.io/esxhf/.

References

1. Au J, Sheehan E, Tsai N, Duncan GJ, Buschkuehl M, & Jaeggi SM (2015). Improving fluid intelligence with training on working memory: A meta-analysis. Psychonomic Bulletin & Review, 22, 366–377.
2. Barch DM, et al. (2013). Function in the human connectome: Task-fMRI and individual differences in behavior. Neuroimage, 80, 169–189.
3. Batchelder WH, & Riefer DM (1999). Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review, 6, 57–86.
4. Bilker WB, Hansen JA, Brensinger CM, Richard J, Gur RE, & Gur RC (2012). Development of abbreviated nine-item forms of the Raven’s standard progressive matrices test. Assessment, 19, 354–369.
5. Brooks SP, & Gelman A (1997). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455.
6. Cai W, Ryali S, Pasumarthy R, Talasila V, & Menon V (2021). Dynamic causal brain circuits during working memory and their functional controllability. Nature Communications, 12, 1–16.
7. Cavagnaro DR, Myung JI, Pitt MA, & Kujala JV (2010). Adaptive design optimization: A mutual information-based approach to model discrimination in cognitive science. Neural Computation, 22, 887–905.
8. Cavagnaro DR, Pitt MA, & Myung JI (2011). Model discrimination through adaptive experimentation. Psychonomic Bulletin & Review, 18, 204–210.
9. Chatham CH, Herd SA, Brant AM, Hazy TE, Miyake A, O’Reilly R, & Friedman NP (2011). From an executive network to executive control: A computational model of the n-back task. Journal of Cognitive Neuroscience, 23, 3598–3619.
10. Coulacoglou C, & Saklofske DH (2017). Psychometrics and psychological assessment: Principles and applications. Academic Press.
11. Curby KM, & Gauthier I (2007). A visual short-term memory advantage for faces. Psychonomic Bulletin & Review, 14, 620–628.
12. Ecker UK, Lewandowsky S, Oberauer K, & Chee AE (2010). The components of working memory updating: An experimental decomposition and individual differences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 170–189.
13. Erdfelder E, Auer T-S, Hilbig BE, Aßfalg A, Moshagen M, & Nadarevic L (2009). Multinomial processing tree models: A review of the literature. Zeitschrift für Psychologie/Journal of Psychology, 217, 108–124.
14. Eriksen BA, & Eriksen CW (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143–149.
15. Evans NJ, & Brown SD (2018). Bayes factors for the linear ballistic accumulator model of decision-making. Behavior Research Methods, 50, 589–603.
16. Figueroa IJ, & Youmans RJ (2012). Individual differences in cognitive flexibility predict performance in vigilance tasks. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 56 (pp. 1099–1103). Los Angeles, CA: SAGE Publications.
17. Guastello SJ, Reiter K, Malon M, Timm P, Shircel A, & Shaline J (2015). Catastrophe models for cognitive workload and fatigue in N-back tasks. Nonlinear Dynamics, Psychology, and Life Sciences, 19, 173–200.
18. Harbison J, Atkins SM, & Dougherty MR (2011). N-back training task performance: Analysis and model. In Carlson L, Hölscher C, & Shipley TF (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 120–125). Austin, TX: Cognitive Science Society.
19. Harvey P-O, et al. (2005). Cognitive control and brain resources in major depression: An fMRI study using the n-back task. Neuroimage, 26, 860–869.
20. Jaeggi SM, Buschkuehl M, Jonides J, & Perrig WJ (2008). Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences, 105, 6829–6833.
21. Jordan MI (2004). Graphical models. Statistical Science, 19, 140–155.
22. Juvina I, & Taatgen NA (2007). Modeling control strategies in the n-back task. In Proceedings of the 8th International Conference on Cognitive Modeling (pp. 73–78). New York, NY: Psychology Press.
23. Kirchner WK (1958). Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology, 55, 352.
24. Klauer KC, & Kellen D (2018). RT-MPTs: Process models for response-time distributions based on multinomial processing trees with applications to recognition memory. Journal of Mathematical Psychology, 82, 111–130.
25. Koller D, Friedman N, Getoor L, & Taskar B (2007). Graphical models in a nutshell. In Getoor L, & Taskar B (Eds.), Introduction to Statistical Relational Learning. Cambridge, MA: MIT Press.
26. Lee MD (2018). Bayesian methods in cognitive modeling. In Wixted J, & Wagenmakers E-J (Eds.), The Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience. Volume 5: Methodology, chapter 2 (pp. 37–84). John Wiley & Sons, fourth edition.
27. Lee MD, Gluck KA, & Walsh MM (2019). Understanding the complexity of simple decisions: Modeling multiple behaviors and switching strategies. Decision, 6, 335–368.
28. Lee MD, & Wagenmakers E-J (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press.
29. Li G, Chen Y, Le TM, Wang W, Tang X, & Li C-SR (2021). Neural correlates of individual variation in two-back working memory and the relationship with fluid intelligence. Scientific Reports, 11, 1–13.
30. Mackworth JF (1959). Paced memorizing in a continuous task. Journal of Experimental Psychology, 58, 206.
31. Matzke D, Ly A, Selker R, Weeda WD, Scheibehenne B, Lee MD, & Wagenmakers E-J (2017). Bayesian inference for correlations in the presence of measurement error and estimation uncertainty. Collabra: Psychology, 3, 25.
32. Moore TM, Reise SP, Gur RE, Hakonarson H, & Gur RC (2015). Psychometric properties of the Penn Computerized Neurocognitive Battery. Neuropsychology, 29, 235.
33. Myung JI, Cavagnaro DR, & Pitt MA (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57, 53–67.
34. Owen AM, McMillan KM, Laird AR, & Bullmore E (2005). N-back working memory paradigm: A meta-analysis of normative functional neuroimaging studies. Human Brain Mapping, 25, 46–59.
35. Patterson F, et al. (2009). Varenicline improves mood and cognition during smoking abstinence. Biological Psychiatry, 65, 144–149.
36. Plummer M (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Hornik K, Leisch F, & Zeileis A (Eds.), Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003).
37. Rac-Lubashevsky R, & Kessler Y (2016). Decomposing the n-back task: An individual differences study using the reference-back paradigm. Neuropsychologia, 90, 190–199.
38. Ranganath C (2010). Binding items and contexts: The cognitive neuroscience of episodic memory. Current Directions in Psychological Science, 19, 131–137.
39. Raven J (1989). The Raven Progressive Matrices: A review of national norming studies and ethnic and socioeconomic variation within the United States. Journal of Educational Measurement, 26, 1–16.
40. Schmiedek F, Li S-C, & Lindenberger U (2009). Interference and facilitation in spatial working memory: Age-associated differences in lure effects in the n-back paradigm. Psychology and Aging, 24, 203.
41. Schoofs D, Preuß D, & Wolf OT (2008). Psychosocial stress induces working memory impairments in an n-back paradigm. Psychoneuroendocrinology, 33, 643–653.
42. Stelter M, & Degner J (2018). Investigating the other-race effect in working memory. British Journal of Psychology, 109, 777–798.
43. Sylvester J, Reggia J, Weems S, & Bunting M (2013). Controlling working memory with learned instructions. Neural Networks, 41, 23–38.
44. Turner BM, Forstmann BU, & Steyvers M (2019). A tutorial on joint modeling. In Joint Models of Neural and Behavioral Data (pp. 13–37).
45. Van Essen DC, et al. (2013). The WU-Minn Human Connectome Project: An overview. Neuroimage, 80, 62–79.
46. Weintraub S, et al. (2013). Cognition assessment using the NIH Toolbox. Neurology, 80, S54–S64.
47. Wetzels R, Grasman RPPP, & Wagenmakers E-J (2010). An encompassing prior generalization of the Savage-Dickey density ratio test. Computational Statistics and Data Analysis, 54, 2094–2102.
48. Zeigenfuse MD, & Lee MD (2010). A general latent assignment approach for modeling psychological contaminants. Journal of Mathematical Psychology, 54, 352–362.
