Dynamical analysis of Bayesian inference models for the Eriksen task

Yuan Sophie Liu; Angela Yu; Philip Holmes

doi:10.1162/neco.2009.03-07-495

Author manuscript; available in PMC: 2009 Sep 23.

Published in final edited form as: Neural Comput. 2009 Jun;21(6):1520–1553. doi: 10.1162/neco.2009.03-07-495


PMCID: PMC2749702 NIHMSID: NIHMS141895 PMID: 19191595

Abstract

The Eriksen task is a classical paradigm that explores the effects of competing sensory inputs on response tendencies, and the nature of selective attention in controlling these processes. In this task, conflicting flanker stimuli interfere with the processing of a central target, especially on short reaction-time trials. This task has been modeled by neural networks and more recently by a normative Bayesian account. Here, we analyze the dynamics of the Bayesian models, which are nonlinear, coupled discrete-time dynamical systems, by considering simplified, approximate systems that are linear and decoupled. Analytical solutions of these allow us to describe how posterior probabilities and psychometric functions depend upon model parameters. We compare our results with numerical simulations of the original models and derive fits to experimental data, showing that agreements are rather good. We also investigate continuum limits of these simplified dynamical systems, and demonstrate that Bayesian updating is closely related to a drift-diffusion process, whose implementation in neural network models has been extensively studied. This provides insight on how neural substrates can implement Bayesian computations.

Keywords: Bayesian inference, decoupling, drift-diffusion model, dynamical system, Eriksen task, linearization

1 Introduction

The psychological [Laming, 1968, Ratcliff, 1978, Ratcliff et al., 1999] and neural bases of decision making [Platt and Glimcher, 2001, Schall, 2001, Gold and Shadlen, 2001] have been widely studied, particularly in constrained situations such as the two-alternative forced-choice (2AFC) task. In 2AFC, subjects are required to discriminate a stimulus and to give one of two permissible responses. The sequential probability ratio test (SPRT) is optimal for 2AFC tasks, whether the objective is to minimize the mean reaction time (RT) for a desired accuracy level [Wald and Wolfowitz, 1948], or to minimize a linear cost function in accuracy and detection delay under the Bayesian formulation [Liu and Blostein, 1992]. The SPRT compares the relative likelihoods of noisy inputs given two possible hypotheses, and reaches a decision when the cumulative evidence for one of them exceeds a fixed threshold. Performance on 2AFC tasks seems broadly consistent with the SPRT [Ratcliff and Smith, 2004], and there is evidence that competing neural populations sub-serving decision-making may implement a strategy close to the SPRT [Gold and Shadlen, 2002, Schall, 2001, Gold and Shadlen, 2001, Shadlen and Newsome, 2001, Roitman and Shadlen, 2002] and [Schall et al., 2002]. Moreover, the continuum limit of SPRT is an analytically-tractable drift-diffusion model (DDM) [Holmes et al., 2005], which yields explicit expressions for error rates and reaction times that can be used to investigate reward-rate maximization in 2AFC [Bogacz et al., 2006], and it has been shown that various neural network models of decision-making [Cohen et al., 1990, Cohen et al., 1992, Usher and McClelland, 2001] can be reduced to variants of the DDM [Bogacz et al., 2006].

The Eriksen flanker task [Eriksen and Eriksen, 1974] is an extension of the classical 2AFC task in which the decision is complicated by potentially-conflicting distractor inputs. Subjects are required to discriminate a central target stimulus (e.g. the letter H or S) flanked by distractors. Flankers can either be compatible with the central target (e.g. HHHHH) or incompatible (e.g. HHSHH). Subjects display a compatibility effect, being typically slower and less accurate on incompatible than compatible trials [Eriksen and Eriksen, 1974]. Furthermore, subjects perform at worse than chance level for short RT’s for incompatible trials only. This “dip” in accuracy implies that flanker interference is particularly potent shortly after stimulus presentation. Fig. 1 shows data from two instances of a deadlined Eriksen task. While specific details of reaction time distributions and relationships between accuracy and reaction time differ between the two studies, the basic compatibility effect and the dip in accuracy on incompatible trials are prominent in both.

Figure 1. Accuracy vs. RT in the Eriksen task. Human subjects respond more slowly and less accurately in the incompatible condition. In particular, accuracy is below chance (0.50) for short RTs, but approaches 1 for longer RTs. (A) Reaction times gauged by electromyographic (EMG) activity, adapted from [Gratton et al., 1988]. (B) Behavioral data from [Servan-Schreiber et al., 1998]. Details differ, but the compatibility effect and the “dip” in accuracy for short-RT incompatible trials are evident in both data sets.

Since the Eriksen task extends the standard 2AFC task, we suspect that the optimal policy in this case is similar to the SPRT. In this vein, [Yu et al., 2007] modeled the computations underlying the Eriksen task as iterative Bayesian updating, with the decision being made (and the trial terminated) when the cumulative posterior for one of the two possible target stimuli exceeds a fixed threshold. It was also proposed that the apparent suboptimality in performance can be explained by either an incorrect prior on the relative frequency of compatible and incompatible trials (compatibility bias model), or by inherent spatial overlap of visual processing neurons (spatial uncertainty model) [Yu et al., 2007]. Here we reduce the Bayesian models to simpler dynamical systems and study them analytically and numerically. While the simpler models closely approximate the original ones in dynamics and performance, their analytical tractability yields explicit expressions for the dependence of inferential and psychometric quantities on model parameters. We discuss the relationship between exact Bayesian inference and drift-diffusion processes, emphasising the link that this establishes between Bayesian updating and the neural substrates that may execute it. Our analysis also reveals the formal similarity of computations underlying the compatibility bias and spatial uncertainty models, which were motivated by disparate experimental literature and were formulated differently within the Bayesian framework.

The paper is organized as follows. After reviewing the Bayesian inference models in Section 2, in Section 3 we derive and analyze the simplified models: uncoupled, linear discrete dynamical systems. From these we derive explicit criteria on parameters that predict the dip in accuracy for incompatible trials, and we compare accuracies and RT distributions generated by the full and simplified models. In Section 4 we show that the DDM is a continuum limit of the simplified models, and from this derive analytical predictions for mean posterior probabilities. We also compute accuracy vs. time curves and reaction time distributions under an approximation that violates the first passage threshold crossing criterion adopted in [Yu et al., 2007], but permits explicit analysis, and we provide direct comparisons between behavioral data and predictions of the full and approximate compatibility bias models. Section 5 contains a summary and discussion.

2 A Bayesian framework for the Eriksen task

We briefly review the compatibility bias and spatial uncertainty inference models for the Eriksen task proposed by [Yu et al., 2007]. The generative process common to both models consists of the prior probability distribution over trial type (M = 1 if compatible, M = 2 if incompatible), and the stochastic relationship between the trial type M and the stimuli s, and between the stimuli and the noisy inputs into the visual system x. For simplicity, it was assumed that there are three stimuli, s ≜ {s₁, s₂, s₃}, for “left”, “center”, “right”, respectively; and each one of three neural units or populations x ≜ {x₁, x₂, x₃} responds to one stimulus. Here the pairs of left and right flankers are combined in s₁ and s₃ respectively, and we assume that all three inputs contain independent noise, both among the units/populations, and over time. Using integers s_i = ±1 to denote S and H, and M = 1, 2 to denote compatible and incompatible trials respectively, we may formally describe the process as:

β ≜ P (M = 1) \in [0, 1]

(1)

P (s ∣ M = 1) = \begin{cases} 0.5, & s_{1} = s_{2} = s_{3} = - 1 \ (HHHHH), \\ 0.5, & s_{1} = s_{2} = s_{3} = + 1 \ (SSSSS), \end{cases}

(2)

P (s ∣ M = 2) = \begin{cases} 0.5, & s_{1} = s_{3} = + 1, \ s_{2} = - 1 \ (SSHSS), \\ 0.5, & s_{1} = s_{3} = - 1, \ s_{2} = + 1 \ (HHSHH), \end{cases}

(3)

p (x_{t} ∣ s) = p (x_{1} (t) ∣ s) p (x_{2} (t) ∣ s) p (x_{3} (t) ∣ s),

(4)

p (x_{1}, x_{2}, \dots, x_{t} ∣ s) = p (x_{1} ∣ s) \, p (x_{2} ∣ s) \cdots p (x_{t} ∣ s) .

(5)

For the compatibility bias model, the prior probability β for compatible trials is assumed to be higher than the “true” value 0.5, and the inputs are taken to be normally distributed as a function of their respective stimuli and independent of neighboring stimuli:

p (x_{i} (t) ∣ s) = p (x_{i} (t) ∣ s_{i}) = \frac{1}{\sqrt{2 π σ^{2}}} \exp [\frac{- {(x_{i} - s_{i})}^{2}}{2 σ^{2}}],

(6)

i.e., at each step t the x_i(t) are drawn independently from normal distributions with means s_i and common standard deviation σ. We denote this procedure below by $x_{i} (t) \sim N (s_{i}, σ^{2})$.
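The generative process of Eqs. (1)-(6) can be sketched in a few lines of NumPy. This is only an illustration: the function names, the argument `p_compatible` (the actual frequency of compatible trials, as opposed to the subject's prior β), and the seed are assumptions introduced here, not part of the original model specification.

```python
import numpy as np

def sample_stimulus(p_compatible, rng):
    """Draw trial type M and stimulus array s = (s1, s2, s3), cf. Eqs. (1)-(3)."""
    M = 1 if rng.random() < p_compatible else 2
    s2 = int(rng.choice([-1, 1]))        # central target: -1 = H, +1 = S
    flank = s2 if M == 1 else -s2        # flankers agree (M = 1) or conflict (M = 2)
    return M, np.array([flank, s2, flank], dtype=float)

def sample_inputs(s, sigma, n_steps, rng):
    """Draw x_i(t) ~ N(s_i, sigma^2), independent over units and time, cf. Eqs. (4)-(6)."""
    return s + sigma * rng.standard_normal((n_steps, 3))

rng = np.random.default_rng(0)           # hypothetical seed, for reproducibility only
M, s = sample_stimulus(p_compatible=0.5, rng=rng)
x = sample_inputs(s, sigma=9.0, n_steps=200, rng=rng)   # rows are time steps
```

Each row of `x` is one observation x_t; the columns correspond to the left, center, and right units.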

In the spatial uncertainty model, the correct prior β = 0.5 is assumed, but the inputs are corrupted by their neighbors according to:

p (x_{t} ∣ s) = p (x_{1} (t) ∣ s_{1}, s_{2}) p (x_{2} (t) ∣ s_{1}, s_{2}, s_{3}) p (x_{3} (t) ∣ s_{2}, s_{3}), where

(7)

\begin{aligned} x_{1} (t) & \sim N (a_{1} s_{1} + a_{2} s_{2}, \; σ_{1}^{2} + σ_{2}^{2}), \\ x_{2} (t) & \sim N (a_{1} s_{2} + a_{2} s_{1} + a_{2} s_{3}, \; σ_{1}^{2} + 2 σ_{2}^{2}), \\ x_{3} (t) & \sim N (a_{1} s_{3} + a_{2} s_{2}, \; σ_{1}^{2} + σ_{2}^{2}), \end{aligned}

(8)

where a₁, σ₁ denote influence from the primary stimulus, and a₂ and σ₂ that from the flankers.

Define $z_{i, j}^{t} ≜ P (s_{2} = i, M = j ∣ X_{t})$ for the posterior probabilities, where $X_{t} ≜ \{x_{1}, \dots, x_{t}\}$ denotes the inputs observed up to time t, and $l_{i, j}^{t} ≜ p (x_{t} ∣ s_{2} = i, M = j)$ for the likelihood functions, with i ∈ {-1, +1} and j ∈ {1, 2}. Bayes’ rule then yields our inference model: four discrete-time dynamical equations, coupled through normalization:

z_{i, j}^{t} = \frac{l_{i, j}^{t} z_{i, j}^{t - 1}}{\sum_{k, l} l_{k, l}^{t} z_{k, l}^{t - 1}},

(9)

with initial conditions

z_{i, j}^{0} = \begin{cases} \frac{β}{2}, & j = 1, \ \forall i ; \\ \frac{1 - β}{2}, & j = 2, \ \forall i . \end{cases}

(10)

To make a decision based on the accumulating inputs, we compare the cumulative marginal posterior probability,

P (s_{2} = i ∣ X_{t}) = z_{i, 1}^{t} + z_{i, 2}^{t},

(11)

against a decision threshold q, a policy closely related to the SPRT [Wald, 1947]. As soon as P(s₂=i|X_t) exceeds q for i=-1 or i=+1, the system chooses the corresponding response (H or S) and terminates observations for the current trial. The computation for the marginal posterior probability over compatibility is analogous: $P (M = j ∣ X_{t}) = z_{- 1, j}^{t} + z_{1, j}^{t}$ .
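The update rule (9) and the threshold policy on the marginal posterior (11) can be sketched for the compatibility bias model as follows. This is a minimal illustration under stated assumptions: the function name `run_trial`, the hypothesis ordering, the `max_steps` cut-off, and the log-likelihood rescaling (a standard numerical safeguard) are choices made here and are not taken from [Yu et al., 2007].

```python
import numpy as np

# Stimulus arrays for the four hypotheses, ordered as (s2, M):
# (+1, 1), (-1, 1), (+1, 2), (-1, 2)
HYP = np.array([[ 1.0,  1.0,  1.0],
                [-1.0, -1.0, -1.0],
                [-1.0,  1.0, -1.0],
                [ 1.0, -1.0,  1.0]])

def run_trial(s_true, beta, sigma, q, max_steps, rng):
    """Return (choice, reaction time); choice is +1, -1, or 0 if no threshold crossing."""
    z = np.array([beta, beta, 1.0 - beta, 1.0 - beta]) / 2.0   # priors, Eq. (10)
    for t in range(1, max_steps + 1):
        x = s_true + sigma * rng.standard_normal(3)            # one noisy observation
        # Gaussian log-likelihoods; factors common to all hypotheses are dropped
        loglik = -np.sum((x - HYP) ** 2, axis=1) / (2.0 * sigma ** 2)
        z = z * np.exp(loglik - loglik.max())                  # rescale to avoid underflow
        z /= z.sum()                                           # normalization, Eq. (9)
        p_plus = z[0] + z[2]                                   # P(s2 = +1 | X_t), Eq. (11)
        if p_plus > q:
            return 1, t
        if 1.0 - p_plus > q:
            return -1, t
    return 0, max_steps

rng = np.random.default_rng(0)
choice, rt = run_trial(np.array([-1.0, 1.0, -1.0]), beta=0.9, sigma=9.0,
                       q=0.9, max_steps=1000, rng=rng)
```

Repeating `run_trial` over many trials and binning the reaction times would give accuracy vs. RT curves of the kind described in the text; the value β = 0.9 used in the call is an illustrative choice.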

Examples of accuracies and RTs thus predicted are shown in Fig. 4, below. For these and other calculations, unless otherwise noted, we adopt the parameter values used in [Yu et al., 2007]: σ = 9 for the compatibility bias model and σ₁ = 7, σ₂ = 5, a₁ = 0.7, a₂ = 0.3 for the spatial uncertainty model, and q = 0.9 for both.

Figure 4. Top panels: Accuracy and reaction time distributions for the compatibility bias model for compatible stimuli (left) and incompatible stimuli (right). Solid curves and the right-hand (blue) bar of each RT bin pair are from the full inference model of [Yu et al., 2007]; dashed curves and left-hand (yellow) bars are from the approximate linearized likelihood model. Bottom panels: Accuracy and reaction time distributions for the spatial uncertainty model. Results were obtained by averaging over 2,000 simulated trials in each case.

3 Linearization and parametric dependence

It was shown in [Yu et al., 2007] that certain choices of parameters allow both the compatibility bias and spatial uncertainty models to capture key properties of the behavioral data in Fig. 1 (see Fig. 4 below). Here we derive general constraints on the parameters in each model that allow them to reproduce the behavioral data: σ for the compatibility bias model, a₁, a₂, σ₁, and σ₂ for the spatial uncertainty model; and n, the number of distractors. While we cannot analyze the complex relationship between accuracy and reaction time directly, we wish to at least constrain parameters so that the mean posterior probability for s₂=1 (the correct answer) dips below that for s₂=-1 after one or a few timesteps of observations. Although the relative probability of a correct response at time t depends not just on the mean but also on higher-order moments, such an analysis would illuminate the magnitude and range of the effective parameters. Unfortunately, even this partial analysis is difficult for the original Bayesian model, as P(s₂|X_t) involves the summation of two exponential functions of the inputs, as in Eq. (11), and there is no obvious way to derive the expectation of P(s₂|X_t) as an explicit function of the parameters that specify the generation of the inputs x.

Due to such computational intractability, we instead work with a linearized approximation to the exact posterior update rule of Eq. (9). We will motivate and describe the approximations for the two Bayesian models, and demonstrate via simulations that the parametric constraints derived from this approximate scheme provide useful bounds for the original Bayesian models.

3.1 The compatibility bias model

Given our assumption of independent, normally-distributed inputs (Eqs. (4) and (6)), we have

p (x_{t} ∣ s) = {\frac{\exp [\frac{- {(x_{1} - s_{1})}^{2}}{2 σ^{2}}]}{\sqrt{2 π σ^{2}}}} {\frac{\exp [\frac{- {(x_{2} - s_{2})}^{2}}{2 σ^{2}}]}{\sqrt{2 π σ^{2}}}} {\frac{\exp [\frac{- {(x_{3} - s_{3})}^{2}}{2 σ^{2}}]}{\sqrt{2 π σ^{2}}}},

(12)

where each s_i can take on the value ±1. We now derive an approximation to Eq. (12) that is linear in the x_i(t)’s. Defining

ϴ_{k} ≜ \frac{\exp [\frac{- {(x_{k} - 1)}^{2}}{2 σ^{2}}]}{\sqrt{2 π σ^{2}}} + \frac{\exp [\frac{- {(x_{k} + 1)}^{2}}{2 σ^{2}}]}{\sqrt{2 π σ^{2}}},

the likelihood function for s₂=1 and M=1 (i.e. s₁=s₂=s₃=1) can be approximated as follows:

\begin{aligned}
p (x_{t} ∣ s) & = ϴ_{1} ϴ_{2} ϴ_{3} \left[ \frac{\exp [- \frac{{(x_{1} - 1)}^{2}}{2 σ^{2}}]}{ϴ_{1} \sqrt{2 π σ^{2}}} \cdot \frac{\exp [- \frac{{(x_{2} - 1)}^{2}}{2 σ^{2}}]}{ϴ_{2} \sqrt{2 π σ^{2}}} \cdot \frac{\exp [- \frac{{(x_{3} - 1)}^{2}}{2 σ^{2}}]}{ϴ_{3} \sqrt{2 π σ^{2}}} \right] \\
& = ϴ_{1} ϴ_{2} ϴ_{3} \left[ \frac{1}{1 + \exp (- \frac{2 x_{1}}{σ^{2}})} \cdot \frac{1}{1 + \exp (- \frac{2 x_{2}}{σ^{2}})} \cdot \frac{1}{1 + \exp (- \frac{2 x_{3}}{σ^{2}})} \right] \\
& = \frac{ϴ_{1} ϴ_{2} ϴ_{3}}{8} \left[ 1 + \frac{x_{1}}{σ^{2}} \right] \left[ 1 + \frac{x_{2}}{σ^{2}} \right] \left[ 1 + \frac{x_{3}}{σ^{2}} \right] + O (x_{k}^{2} / σ^{4}) \\
& = \frac{ϴ_{1} ϴ_{2} ϴ_{3}}{8} \left[ 1 + \frac{x_{1} + x_{2} + x_{3}}{σ^{2}} \right] + O (x_{k}^{2} / σ^{4}) \\
& \approx \frac{ϴ_{1} ϴ_{2} ϴ_{3}}{8} \left[ 1 + \frac{x_{1} + x_{2} + x_{3}}{σ^{2}} \right] .
\end{aligned}

(13)

The first step uses the fact that quadratic and constant terms cancel in the ratios; the next two rely on a Taylor series expansion of the exponential terms and the binomial series approximation:

{[1 + \exp (- \frac{2 x_{k}}{σ^{2}})]}^{- 1} \approx {[2 (1 - \frac{x_{k}}{σ^{2}})]}^{- 1} \approx \frac{1}{2} [1 + \frac{x_{k}}{σ^{2}}],

and the approximation is justified by the fact that x_k(t) ∈ [-1-2σ, 1+ 2σ] with > 99% probability, if we can assume that σ ≫ 1. This latter assumption is reasonable since we are modeling the time-scale at which on average many time steps of inputs are needed to perform the discrimination.
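A quick numerical check of the single-factor approximation above, over the input range quoted in the text, might look as follows. It is a sketch only: the grid resolution is arbitrary and σ = 9 is the compatibility bias value used elsewhere in the paper.

```python
import numpy as np

sigma = 9.0
# Input range [-1 - 2*sigma, 1 + 2*sigma] discussed in the text
x = np.linspace(-1.0 - 2.0 * sigma, 1.0 + 2.0 * sigma, 201)

exact = 1.0 / (1.0 + np.exp(-2.0 * x / sigma**2))   # logistic factor from Eq. (13)
linear = 0.5 * (1.0 + x / sigma**2)                 # its first-order approximation

max_err = np.max(np.abs(exact - linear))            # worst-case absolute error on the grid
print(f"max |exact - linear| = {max_err:.2e}")
```

The error is largest at the ends of the interval, where |x|/σ² is greatest, and shrinks as σ grows, in line with the assumption σ ≫ 1.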

Generalizing the approximation (13) to the other three cases and using the four resulting expressions in Eq. (9), we obtain the following approximate update rules:

z_{i, j}^{t} \approx \frac{1}{D_{t}} \begin{cases}
(1 + \frac{x_{1} (t) + x_{2} (t) + x_{3} (t)}{σ^{2}}) \, z_{+ 1, 1}^{t - 1}, & s_{2} = + 1, M = 1, \\
(1 - \frac{x_{1} (t) + x_{2} (t) + x_{3} (t)}{σ^{2}}) \, z_{- 1, 1}^{t - 1}, & s_{2} = - 1, M = 1, \\
(1 - \frac{x_{1} (t) - x_{2} (t) + x_{3} (t)}{σ^{2}}) \, z_{+ 1, 2}^{t - 1}, & s_{2} = + 1, M = 2, \\
(1 + \frac{x_{1} (t) - x_{2} (t) + x_{3} (t)}{σ^{2}}) \, z_{- 1, 2}^{t - 1}, & s_{2} = - 1, M = 2,
\end{cases}

(14)

in which the denominator

D_{t} = (1 + \frac{x_{1} (t) + x_{2} (t) + x_{3} (t)}{σ^{2}}) z_{+ 1, 1}^{t - 1} + \dots + (1 + \frac{x_{1} (t) - x_{2} (t) + x_{3} (t)}{σ^{2}}) z_{- 1, 2}^{t - 1}

(15)

is the sum of all four numerators and normalizes the posterior distribution, and the common factors ϴ₁ϴ₂ϴ₃/8 in the numerators and denominator of Eq. (14) have canceled. Initial conditions are as in Eq. (10). Since this simplified system is still nonlinearly coupled through the denominator D_t, we work with the joint probability $v_{i, j}^{t} ≜ p (s_{2} = i, M = j, X_{t})$ instead. The two are related as follows:

\begin{aligned}
z_{i, j}^{t} & = P (s_{2} = i, M = j ∣ X_{t}) = \frac{P (s_{2} = i, M = j ∣ X_{t - 1}) \, p (x_{t} ∣ s_{2} = i, M = j)}{p (x_{t} ∣ X_{t - 1})} \\
& = \frac{P (s_{2} = i, M = j) \prod_{t^{'} = 1}^{t} p (x_{t^{'}} ∣ s_{2} = i, M = j)}{p (X_{t})} = \frac{v_{i, j}^{t}}{\sum_{k, l} v_{k, l}^{t}} .
\end{aligned}

(16)

The joint probability $v_{i, j}^{t}$ obeys the uncoupled update rule:

v_{i, j}^{t} = l_{i, j}^{t} v_{i, j}^{t - 1} \approx (1 + \frac{\pm x_{1} (t) \pm x_{2} (t) \pm x_{3} (t)}{σ^{2}}) v_{i, j}^{t - 1},

(17)

where the sign preceding each x_i depends on i and j as in Eq. (14). As is apparent in Eq. (16), $z_{i, j}^{t}$ can be obtained by normalizing $v_{i, j}^{t}$ on timestep t, but $v_{i, j}^{t}$ cannot be used directly in the perceptual decision process, since a fixed threshold in the posterior probability space has no equivalent fixed value in the joint probability space. However, $v_{i, j}^{t}$ is sufficient for deriving bounds on the generative parameters that on average make the posterior probability for s₂=1 dip below that for s₂=-1 after one time step, when the inputs are generated from the incompatible stimulus array s=(-1, 1, -1) (the analysis for s=(1, -1, 1) is analogous). Specifically, since P(s₂, M|X_t)=p(s₂, M, X_t)/p(X_t), the condition $〈 z_{1, 1}^{1} + z_{1, 2}^{1} 〉 < 〈 z_{- 1, 1}^{1} + z_{- 1, 2}^{1} 〉$ is equivalent to $〈 v_{1, 1}^{1} 〉 + 〈 v_{1, 2}^{1} 〉 < 〈 v_{- 1, 1}^{1} 〉 + 〈 v_{- 1, 2}^{1} 〉$ . We therefore require

β (1 - \frac{1}{σ^{2}}) + (1 - β) (1 + \frac{3}{σ^{2}}) < β (1 + \frac{1}{σ^{2}}) + (1 - β) (1 - \frac{3}{σ^{2}}) \Rightarrow β > \frac{3}{4},

(18)

since the mean values of x₁, x₂, and x₃ are -1, 1, and -1, respectively, and the compatible terms are weighted by the compatibility prior β (and the incompatible ones by 1-β); the factor 1/2 common to all four initial conditions in Eq. (10) has been cancelled.

Hence β > 3/4 is the necessary and sufficient condition for the average posterior probability for s₂=1 to dip below that for s₂=-1 after one observation, when the true stimuli are the incompatible array (-1, 1, -1). More generally, we can show that the constraint is β > (n + 1)/(2n), where n is the total number of flankers. This makes intuitive sense, for it suggests that the dip is more prominent or more likely to happen when the subject’s prior compatibility bias is stronger and/or the number of flankers is larger. Indeed, there are behavioral data suggesting that flanker interference effects are stronger when incompatible trials are less frequent [Gratton et al., 1992].
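The one-step condition (18) and its n-flanker generalization can be checked with a short calculation. The sketch below assumes, as an illustration, that every flanker contributes an independent input with mean -1 on an incompatible trial with s₂ = +1 and that all inputs share the same σ; the function name `mean_dip_margin` is hypothetical.

```python
def mean_dip_margin(beta, sigma, n_flankers=2):
    """Mean joint probability for s2 = +1 minus that for s2 = -1 after one step.

    The true stimulus is incompatible with s2 = +1, so each flanker input has
    mean -1 and the target input has mean +1. A negative value means a dip.
    """
    m_compat = (1 - n_flankers) / sigma**2     # mean linear term, hypothesis (s2=+1, M=1)
    m_incompat = (1 + n_flankers) / sigma**2   # mean linear term, hypothesis (s2=+1, M=2)
    plus = 0.5 * beta * (1 + m_compat) + 0.5 * (1 - beta) * (1 + m_incompat)
    minus = 0.5 * beta * (1 - m_compat) + 0.5 * (1 - beta) * (1 - m_incompat)
    return plus - minus

def beta_threshold(n_flankers):
    """Critical prior (n + 1) / (2n) above which the mean posterior dips."""
    return (n_flankers + 1) / (2 * n_flankers)

for beta in (0.70, 0.75, 0.80):
    print(beta, mean_dip_margin(beta, sigma=9.0))
```

The margin changes sign at `beta_threshold(2)`, i.e. at β = 3/4, and at a lower β when more flankers are assumed.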

These analytical constraints only guarantee a dip in the posterior probability. As shown in Figure 2 (left), for a particular set of model parameters, the mean accuracy in incompatible trials terminating within 20 timesteps steadily decreases as a function of β, and it drops below 0.5, indicating the presence of a dip, for all values of β > 0.82, somewhat higher than β = 0.75, the lower bound of inequality (18) that results in a dip in posterior probability. A major factor underlying the discrepancy between the two constraints is that we only considered the mean of the posterior probability and not the full distribution. The mean accuracy depends not only on the mean posterior value, but also on higher moments. If the distribution were symmetric about its mean, then the dip in the mean posterior would directly translate into a dip in accuracy, but as we will show in Section 4, the distribution of the posterior trajectories is strongly skewed, and the interaction of that skewness with the decision threshold also plays a role in determining the presence of the dip in accuracy versus reaction time.

Figure 2. Simulated and analytical approximations of parameter values that produce dips in accuracy vs. reaction time for incompatible trials. Graphs show accuracy averaged over trials with simulated reaction times under 20 timesteps, as a function of β for the compatibility bias model (left), and of the ratio of means a₁/a₂ for the spatial uncertainty model (right). Crossings with the 0.5 accuracy line indicate numerically obtained estimates of the “true” parameter constraints; dashed lines show the approximate constraints of Eqs. (18) and (24).

A second reason for the discrepancy is that the theoretical bounds are for the dip to occur in the posterior after one time step, whereas in the numerical simulations, due to the infrequency of responses at very short RTs, all trials that terminate within the first 20 timesteps were used to estimate accuracy. If the temporal extent of the dip in the posterior distribution is very small (which is likely in boundary cases), then conditional accuracy may not fall below 0.5 when averaged over 20 timesteps. The numerically-obtained constraints are therefore likely to be more conservative than the analytical approximations.

3.2 The spatial uncertainty model

Derivation of the iterated maps for the spatial uncertainty model is more tedious than that of Eq. (14), owing to the extra “cross-talk” links in the generative model, but it follows from similar reasoning. Defining $h_{k, i, j}^{t} ≜ p (x_{k} (t) ∣ s_{k} = i, M = j)$ , forming the triple product and dividing through by

ϴ^{'} = \prod_{k = 1}^{3} [h_{k, 1, 1}^{t} + h_{k, - 1, 1}^{t} + h_{k, 1, 2}^{t} + h_{k, - 1, 2}^{t}],

(19)

we obtain the approximate update rule:

z_{i, j}^{t} = \frac{1}{D_{t}^{'}} \times \begin{cases}
[1 - A_{1} + (A_{2} + A_{3}) (x_{1} + x_{3}) + (A_{4} + A_{5}) x_{2}] \, z_{+ 1, 1}^{t - 1}, & s_{2} = + 1, M = 1, \\
[1 - A_{1} - (A_{2} + A_{3}) (x_{1} + x_{3}) - (A_{4} + A_{5}) x_{2}] \, z_{- 1, 1}^{t - 1}, & s_{2} = - 1, M = 1, \\
[1 + A_{1} - (A_{2} - A_{3}) (x_{1} + x_{3}) + (A_{4} - A_{5}) x_{2}] \, z_{+ 1, 2}^{t - 1}, & s_{2} = + 1, M = 2, \\
[1 + A_{1} + (A_{2} - A_{3}) (x_{1} + x_{3}) - (A_{4} - A_{5}) x_{2}] \, z_{- 1, 2}^{t - 1}, & s_{2} = - 1, M = 2,
\end{cases}

(20)

where $D_{t}^{'}$ is again the sum of the numerators and the parameters A_i are

\begin{matrix} A_{1} = 2 a_{1} a_{2} (\frac{1}{σ_{1}^{2} + σ_{2}^{2}} + \frac{1}{σ_{1}^{2} + 2 σ_{2}^{2}}), A_{2} = \frac{a_{1}}{σ_{1}^{2} + σ_{2}^{2}}, \\ A_{3} = \frac{a_{2}}{σ_{1}^{2} + σ_{2}^{2}}, A_{4} = \frac{a_{1}}{σ_{1}^{2} + 2 σ_{2}^{2}}, A_{5} = \frac{2 a_{2}}{σ_{1}^{2} + 2 σ_{2}^{2}} . \end{matrix}

(21)

Since the prior distribution is uniform, the initial conditions for (20) are

z_{i, j}^{0} = \frac{1}{4}, \quad i = \pm 1, \ j = 1, 2 .

(22)

Again, the constraint

〈 P (s_{2} = 1, M = 1 ∣ X_{1}) 〉 + 〈 P (s_{2} = 1, M = 2 ∣ X_{1}) 〉 < 〈 P (s_{2} = - 1, M = 1 ∣ X_{1}) 〉 + 〈 P (s_{2} = - 1, M = 2 ∣ X_{1}) 〉

(23)

is satisfied if A₄(a₁ - 2a₂) < 2A₃(a₁ - a₂), or equivalently, if the ratio of means, a₁/a₂, lies in the interval

[\frac{2 r + 3 - \sqrt{2 r^{2} + 6 r + 5}}{1 + r}, \frac{2 r + 3 + \sqrt{2 r^{2} + 6 r + 5}}{1 + r}],

(24)

where $r ≜ σ_{1}^{2} ∕ σ_{2}^{2}$ is the ratio of the variances. Intuitively, if the ratio a₁/a₂ is too large, the flankers have negligible effects; if it is too small, the inputs lose their spatial selectivity altogether. More generally, if there are n flankers, the range is

[\frac{(\frac{n}{2} + 1) r + n + 1 - \sqrt{(\frac{n^{2}}{4} + 1) r^{2} + (n^{2} + 2) r + n^{2} + 1}}{1 + r}, \frac{(\frac{n}{2} + 1) r + n + 1 + \sqrt{(\frac{n^{2}}{4} + 1) r^{2} + (n^{2} + 2) r + n^{2} + 1}}{1 + r}] .

We now compare these constraints with numerical simulations of the full inference model for the specific noise parameters (σ₁=7, σ₂=5). We simulated the full model using a range of values of a₁ and a₂ (with their sum held at 1), and obtained accuracy of all responses falling within the first 20 timesteps as a function of a₁/a₂. As can be seen in Figure 2 (right), the accuracy in this short-RT bin is less than .5 when a₁/a₂ falls within (0.70, 3.55), a somewhat more stringent condition than the analytically derived (approximate) interval (0.67, 3.98).
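The interval (24) and its n-flanker form can be evaluated directly from the noise parameters. The following is a sketch under the assumption that the closed-form expressions above are used as written; the function name `ratio_interval` is hypothetical. For n = 2 the endpoints are the roots of (r + 1)ρ² - (4r + 6)ρ + (2r + 4) = 0 with ρ = a₁/a₂, which follows from substituting A₃ and A₄ of Eq. (21) into A₄(a₁ - 2a₂) < 2A₃(a₁ - a₂).

```python
import math

def ratio_interval(sigma1, sigma2, n=2):
    """Endpoints of the a1/a2 interval that yields a one-step dip in the mean posterior.

    Implements the n-flanker expression quoted in the text; n = 2 recovers Eq. (24).
    """
    r = sigma1**2 / sigma2**2                        # ratio of variances
    centre = (n / 2 + 1) * r + n + 1
    disc = (n**2 / 4 + 1) * r**2 + (n**2 + 2) * r + n**2 + 1
    half_width = math.sqrt(disc)
    return (centre - half_width) / (1 + r), (centre + half_width) / (1 + r)

lo, hi = ratio_interval(7.0, 5.0)                    # sigma_1 = 7, sigma_2 = 5
print(f"dip predicted for {lo:.2f} < a1/a2 < {hi:.2f}")
```

Evaluated at σ₁ = 7, σ₂ = 5 this sketch gives a lower endpoint near 0.67; the upper endpoint it returns is close to 4.0, marginally above the 3.98 quoted in the text, so the second decimal should be treated as indicative only.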

3.3 Evaluating the cost of linearization

Direct simulations of the linear approximation can be compared with those of the original inference model. Figure 3 shows the results for the compatibility bias model for a particular setting of parameters (σ=9), comparing the full inference model with the simplified iteration of (17). The same sequence of noisy observations x_i(t) was used for both processes; in computing the value of P(s₂ = 1|X_t) for the latter at each timestep t, normalization was applied only at that step. The agreement is remarkably good, validating our linear approximations to the products of probabilities in Eqs. (4) and (5) developed in Section 3. The quality of the linear approximation for the spatial uncertainty model is similarly good (details not shown).

Figure 3. Posterior probability P(s₂ = 1|X_t) for one sample path of the approximate compatibility bias model (Eq. (17), dashed), compared with a sample path from the original inference model (Eq. (9), solid). The same sequence x(t) of observations was used in both cases.

We can also simulate perceptual discrimination based on the linearized evidence accumulation process, using the first passage criterion for threshold crossing appropriate for free response conditions. As in [Yu et al., 2007], we adopt the decision threshold q = 0.9 for both the compatibility bias and the spatial uncertainty model. The time span, taken here as 200 steps, is divided into ten bins and sample paths for the full model (9) and the approximate decoupled system (17) and its analogue for spatial uncertainty are computed. The decoupled results are then normalized by dividing by the sum $\sum_{i, j} v_{i, j}^{t}$ at each t in the current bin (normalization is not applied for steps 1 through t - 1). The same (unit) step size is used in all cases. Responses are logged when the first of the probabilities P(s₂ = +1|X_t) = P(s₂ = +1, M = 1|X_t) + P(s₂ = +1, M = 2|X_t) or P(s₂=-1|X_t) = P(s₂=-1, M = 1|X_t) + P(s₂=-1, M = 2|X_t) crosses q. After collecting sufficiently many paths (2000 in this case), response time histograms are formed and the fraction of correct responses in each bin summed to yield accuracy vs. time curves.

Figure 4 illustrates the results of such simulations for the compatibility and spatial uncertainty models. Accuracy vs. reaction time, and empirical distributions of reaction time are shown for both the full and approximate models. The approximate systems reproduce the characteristic dip in accuracy for fast incompatible trials for both models, and the accuracy curves and reaction time distributions predicted by the approximate theory agree well with those of the full inference models. Note that the use of the first passage criterion for response produces reaction time distributions that agree with the exact model in details of their shapes: a rise at short reaction times to a peak, followed by a long tail. The distributions for incompatible trials are also flatter and shifted rightward compared to those for compatible trials, as in the data of Figure 1.

4 A continuum limit

The key difficulty in working with the discrete dynamical systems (14) and (20) lies in the nonlinear coupling of the posteriors $z_{i, j}^{t}$ through the denominators D_t and $D_{t}^{'}$ . It can be proved that individual sample paths generated with the same noise inputs are identical whether computed by iteration of Eqs. (14) and (20) or by the analogous uncoupled systems Eq. (17), with posteriors normalised only at the last time step; cf. Eq. (16). (In computing the values for the approximate model (17) at each step t for Figure 3, normalization was applied only at that step, but not at steps 1 through t - 1, while the full iteration (9) is normalised at every step.) However, it does not follow that we may average over many realizations of the unnormalized process and then normalize, since these operations do not commute, as discussed further in Section 4.3. Nonetheless, we can decouple the dynamics by replacing the normalization constant D_t at each time step with its expectation 〈D_t〉, which does not depend on the inputs, and replacing that in turn by a constant. We then take continuum limits of the resulting decoupled linear systems to form stochastic differential equations (SDEs), allowing us to use simple analytical results to compute properties of interest. As described further in Section 5, these SDEs may in turn be related to neurally-based models of evidence accumulation.
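The sample-path identity stated above (per-step normalization versus a single normalization at the final step) is easy to confirm numerically for the linearized compatibility bias model. The sketch below is illustrative: the seed, the number of steps `T`, and the value β = 0.9 are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, beta, T = 9.0, 0.9, 50

# Signs multiplying (x1, x2, x3) in Eq. (14), ordered as (s2, M):
SIGNS = np.array([[ 1.0,  1.0,  1.0],    # (+1, 1)
                  [-1.0, -1.0, -1.0],    # (-1, 1)
                  [-1.0,  1.0, -1.0],    # (+1, 2)
                  [ 1.0, -1.0,  1.0]])   # (-1, 2)

# One input sequence from the incompatible array s = (-1, 1, -1)
x = np.array([-1.0, 1.0, -1.0]) + sigma * rng.standard_normal((T, 3))
factors = 1.0 + x @ SIGNS.T / sigma**2   # linearized likelihood factors, shape (T, 4)

z = np.array([beta, beta, 1.0 - beta, 1.0 - beta]) / 2.0
v = z.copy()
for f in factors:
    z = f * z / np.sum(f * z)            # coupled update, normalized every step (Eq. 14)
    v = f * v                            # uncoupled update, never normalized (Eq. 17)

z_from_v = v / v.sum()                   # normalize once, at the last step (Eq. 16)
print(np.max(np.abs(z - z_from_v)))      # agreement up to floating-point rounding
```

The identity holds path by path because each per-step denominator is a common scalar factor; it says nothing about averages over paths, which is the point made in the text about non-commuting operations.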

4.1 Approximating the denominators

We first examine the denominator 〈D_t〉 for the compatibility bias model:

\begin{aligned}
〈 D_{t} 〉 \approx {} & 〈 1 + \frac{x_{1} (t) + x_{2} (t) + x_{3} (t)}{σ^{2}} 〉 〈 z_{+ 1, 1}^{t - 1} 〉 + 〈 1 - \frac{x_{1} (t) + x_{2} (t) + x_{3} (t)}{σ^{2}} 〉 〈 z_{- 1, 1}^{t - 1} 〉 \\
& + 〈 1 - \frac{x_{1} (t) - x_{2} (t) + x_{3} (t)}{σ^{2}} 〉 〈 z_{+ 1, 2}^{t - 1} 〉 + 〈 1 + \frac{x_{1} (t) - x_{2} (t) + x_{3} (t)}{σ^{2}} 〉 〈 z_{- 1, 2}^{t - 1} 〉,
\end{aligned}

where the approximation comes from assuming that the input-dependent terms (functions of x_k(t)) are independent of the $z_{i, j}^{t - 1}$ terms, which depend on the previous inputs x_k(1), ..., x_k(t - 1). Although the inputs are conditionally independent (cf. Eq. (5)), they are marginally dependent. That is, if previous inputs favored a particular setting of s₂ and M, then the current one also tends to do the same. For analytical simplicity, we ignore this statistical dependence. Note that in the limit as t → ∞, one of the $z_{i, j}^{t}$ ’s (corresponding to the actual stimulus setting) converges to 1 (and the others to 0), and that no matter which $z_{i, j}^{t}$ it is,

〈 D_{t} 〉 \to 1 + \frac{3}{σ^{2}} .

(25)

More generally, we expect 〈D_t〉 to increase from 1 (D₀ is just the sum of the priors) to $1 + \frac{3 μ}{σ^{2}}$ , where μ denotes the mean value of the x_j’s. Figure 5 shows exactly this for both compatible and incompatible stimuli for a particular setting of the model parameters, with s₂=1 and averaging over 10⁵ trials. Convergence is slower for incompatible stimuli, since the compatibility prior takes time to update from its initial value P(M = 1) = 0.9.

Figure 5. Mean values of the denominator 〈D_t〉 for compatible (blue solid) and incompatible (red dashed) stimuli, each averaged over 10⁵ trials. In both cases 〈D_t〉 rises monotonically toward its upper bound $1 + \frac{3}{σ^{2}} = 1.0370 \dots$ .
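A Monte Carlo estimate of 〈D_t〉 for the linearized system (14) can be sketched as below. The function name `mean_denominator` and its defaults (2,000 trials, 300 steps, fixed seed) are assumptions chosen to keep the example fast; they are far fewer trials than the 10⁵ used for Figure 5, so the resulting curve is noisier.

```python
import numpy as np

def mean_denominator(s_true, beta=0.9, sigma=9.0, T=300, n_trials=2000, seed=0):
    """Estimate <D_t>, t = 1..T, by simulating Eq. (14) over many independent trials."""
    rng = np.random.default_rng(seed)
    signs = np.array([[ 1.0,  1.0,  1.0],    # (s2=+1, M=1)
                      [-1.0, -1.0, -1.0],    # (s2=-1, M=1)
                      [-1.0,  1.0, -1.0],    # (s2=+1, M=2)
                      [ 1.0, -1.0,  1.0]])   # (s2=-1, M=2)
    z0 = np.array([beta, beta, 1.0 - beta, 1.0 - beta]) / 2.0
    z = np.tile(z0, (n_trials, 1))           # one posterior vector per trial
    s_true = np.asarray(s_true, dtype=float)
    D_mean = np.empty(T)
    for t in range(T):
        x = s_true + sigma * rng.standard_normal((n_trials, 3))
        num = (1.0 + x @ signs.T / sigma**2) * z   # numerators of Eq. (14)
        D = num.sum(axis=1)                        # denominator D_t, Eq. (15)
        z = num / D[:, None]
        D_mean[t] = D.mean()
    return D_mean

D_compat = mean_denominator([1.0, 1.0, 1.0])
print(D_compat[0], D_compat[-50:].mean(), 1.0 + 3.0 / 9.0**2)
```

With the symmetric initial conditions (10), the first denominator equals 1 on every trial, and the late-time average should sit near the limit of Eq. (25).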

Based on these arguments, and in spite of the fact that D_t can exhibit large variance on individual trials, we assume D_t ≈ 〈D_t〉 ≈ 1, and approximate the dynamics of Eq. (14) by the following linear, decoupled system:

\begin{aligned}
z_{+ 1, 1}^{t} & = (1 + \frac{x_{1} + x_{2} + x_{3}}{σ^{2}}) \, z_{+ 1, 1}^{t - 1}, \\
z_{- 1, 1}^{t} & = (1 - \frac{x_{1} + x_{2} + x_{3}}{σ^{2}}) \, z_{- 1, 1}^{t - 1}, \\
z_{+ 1, 2}^{t} & = (1 - \frac{x_{1} - x_{2} + x_{3}}{σ^{2}}) \, z_{+ 1, 2}^{t - 1}, \\
z_{- 1, 2}^{t} & = (1 + \frac{x_{1} - x_{2} + x_{3}}{σ^{2}}) \, z_{- 1, 2}^{t - 1},
\end{aligned}

(26)

with initial conditions

z_{\pm 1, 1}^{0} = \frac{1}{2} β, z_{\pm 1, 2}^{0} = \frac{1}{2} (1 - β) .

(27)
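The linear, decoupled iteration can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes that each input x_k(t) is Gaussian with mean ±1 (set by the stimulus) and standard deviation σ, and it uses the per-hypothesis factors as they appear in the expression for 〈D_t〉 above. The function names and the values σ = 9, β = 0.9 are illustrative choices.

```python
import random

def simulate_decoupled(stimulus, sigma, beta, steps, rng):
    """Run one trial of the linear, decoupled map; return the unnormalized z's.

    stimulus is the triple of input means (s1, s2, s3), e.g. (1, 1, 1) for a
    compatible array or (-1, 1, -1) for an incompatible one (assumed coding).
    """
    # Initial conditions of Eq. (27): priors split evenly over s2 = +1 and -1.
    z = {(+1, 1): beta / 2, (-1, 1): beta / 2,
         (+1, 2): (1 - beta) / 2, (-1, 2): (1 - beta) / 2}
    for _ in range(steps):
        x1, x2, x3 = (rng.gauss(s, sigma) for s in stimulus)
        u = (x1 + x2 + x3) / sigma**2  # flankers agree with the target (M = 1)
        v = (x1 - x2 + x3) / sigma**2  # flankers oppose the target (M = 2)
        z[(+1, 1)] *= 1 + u
        z[(-1, 1)] *= 1 - u
        z[(+1, 2)] *= 1 - v
        z[(-1, 2)] *= 1 + v
    return z

def normalize(z):
    """Divide by the sum so that the four values can be read as posteriors."""
    total = sum(z.values())
    return {key: value / total for key, value in z.items()}

rng = random.Random(0)  # fixed seed, for reproducibility only
posterior = normalize(
    simulate_decoupled((1, 1, 1), sigma=9.0, beta=0.9, steps=200, rng=rng))
```

Because the factors are linearized, a very large input fluctuation can in principle make a factor negative; the sketch does not guard against this rare event.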

Similar reasoning can be used to derive a linear, decoupled approximation of Eq. (20) for the spatial uncertainty model. The approximate dynamics for both models can be written as an iterated linear mapping of the following form:

z_{i, j}^{t} = (a_{i, j} + b_{i, j} η (t)) z_{i, j}^{t - 1}, i = \pm 1, j = 1, 2,

(28)

where the random variables η(t) are drawn from a standard normal distribution, and a_i,j and b_i,j are constant parameters whose values depend on the model, the probability being computed, and the compatibility condition of the given trial.

For the compatibility bias model, from the details presented in §3.1, if the current stimulus array s(t) is compatible and s₂ = 1, we have

a_{i, j} = {\begin{matrix} 1 + \frac{3}{σ^{2}}, & i = + 1, j = 1, \\ 1 - \frac{3}{σ^{2}}, & i = - 1, j = 1, \\ 1 - \frac{1}{σ^{2}}, & i = + 1, j = 2, \\ 1 + \frac{1}{σ^{2}}, & i = - 1, j = 2, \end{matrix} and b_{i, j} = \frac{\sqrt{3}}{σ} \forall i, j;

(29)

and if s(t) is incompatible and s₂ = 1 we have

a_{i, j} = {\begin{matrix} 1 - \frac{1}{σ^{2}}, & i = + 1, j = 1, \\ 1 + \frac{1}{σ^{2}}, & i = - 1, j = 1, \\ 1 + \frac{3}{σ^{2}}, & i = + 1, j = 2, \\ 1 - \frac{3}{σ^{2}}, & i = - 1, j = 2, \end{matrix} and b_{i, j} = \frac{\sqrt{3}}{σ} \forall i, j .

(30)

For s₂ = -1 all the signs in a_i,j above are reversed.
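The coefficient tables of Eqs. (29-30), together with the sign reversal for s₂ = -1, can be collected in one small helper. The following Python sketch is illustrative only; the function name and the dictionary layout keyed by (i, j) are assumptions made for this example.

```python
import math

def compat_bias_coeffs(sigma, compatible=True, s2=1):
    """Return {(i, j): (a_ij, b_ij)} for the compatibility bias model.

    Follows Eq. (29) for compatible and Eq. (30) for incompatible arrays;
    passing s2 = -1 reverses the sign of every offset a_ij - 1.
    """
    c = 1.0 / sigma**2
    if compatible:
        offsets = {(+1, 1): 3 * c, (-1, 1): -3 * c, (+1, 2): -c, (-1, 2): c}
    else:
        offsets = {(+1, 1): -c, (-1, 1): c, (+1, 2): 3 * c, (-1, 2): -3 * c}
    b = math.sqrt(3.0) / sigma  # the same noise amplitude for all four z's
    return {key: (1.0 + s2 * off, b) for key, off in offsets.items()}

coeffs = compat_bias_coeffs(sigma=9.0, compatible=True, s2=1)
```

Each multiplicative factor in Eq. (28) is then a draw from a normal distribution with mean `a_ij` and standard deviation `b_ij`.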

For the spatial uncertainty model with compatible stimulus array and s₂ = 1, the calculations of §3.2 imply:

a_{i, j} = {\begin{matrix} 1 - A_{1} + 2 (a_{1} + a_{2}) (A_{2} + A_{3}) + (a_{1} + 2 a_{2}) (A_{4} + A_{5}), & i = + 1, j = 1, \\ 1 - A_{1} - 2 (a_{1} + a_{2}) (A_{2} + A_{3}) - (a_{1} + 2 a_{2}) (A_{4} + A_{5}), & i = - 1, j = 1, \\ 1 + A_{1} - 2 (a_{1} + a_{2}) (A_{2} - A_{3}) + (a_{1} + 2 a_{2}) (A_{4} - A_{5}), & i = + 1, j = 2, \\ 1 + A_{1} + 2 (a_{1} + a_{2}) (A_{2} - A_{3}) - (a_{1} + 2 a_{2}) (A_{4} - A_{5}), & i = - 1, j = 2, \end{matrix}

(31)

and for an incompatible stimulus array and s₂ = 1:

a_{i, j} = {\begin{matrix} 1 - A_{1} - 2 (a_{1} - a_{2}) (A_{2} + A_{3}) + (a_{1} - 2 a_{2}) (A_{4} + A_{5}), & i = + 1, j = 1, \\ 1 - A_{1} + 2 (a_{1} - a_{2}) (A_{2} + A_{3}) - (a_{1} - 2 a_{2}) (A_{4} + A_{5}), & i = - 1, j = 1, \\ 1 + A_{1} + 2 (a_{1} - a_{2}) (A_{2} - A_{3}) + (a_{1} - 2 a_{2}) (A_{4} - A_{5}), & i = + 1, j = 2, \\ 1 + A_{1} - 2 (a_{1} - a_{2}) (A_{2} - A_{3}) - (a_{1} - 2 a_{2}) (A_{4} - A_{5}), & i = - 1, j = 2 . \end{matrix}

(32)

In both cases the standard deviation of the noise is given by

b_{i, j} = {\begin{matrix} \sqrt{2 (σ_{1}^{2} + σ_{2}^{2}) {(A_{2} + A_{3})}^{2} + (σ_{1}^{2} + 2 σ_{2}^{2}) {(A_{4} + A_{5})}^{2}}, & i = \pm 1, j = 1, \\ \sqrt{2 (σ_{1}^{2} + σ_{2}^{2}) {(A_{2} - A_{3})}^{2} + (σ_{1}^{2} + 2 σ_{2}^{2}) {(A_{4} - A_{5})}^{2}}, & i = \pm 1, j = 2 . \end{matrix}

(33)

Figure 6 illustrates normal distributions from which these multiplicative terms in (28) are drawn.

Typical distributions from which the multiplicative factors a_i,j + b_i,jη(t) in Eq. (28) are drawn on each time step. Parameter values are σ = 1.8 (top) and a₁ = 0.7, a₂ = 0.3, σ₁ = 1.4, σ₂ = 1 (bottom). For illustrative purposes, the standard deviations σ, σ₁, σ₂ are 20% of those used in the text, to reduce overlap of the distributions.

4.2 Taking the continuum limit

We now take continuum limits of the discrete dynamical systems derived above, which will allow us to compute properties of interest analytically. First consider the following finite-difference limit of the iterated mapping (28):

\frac{d (z_{i, j}^{t})}{d t} = \lim_{δ t \to 0} \frac{z_{i, j}^{t} - z_{i, j}^{t - δ t}}{δ t} = \lim_{δ t \to 0} [(a_{i, j} - 1) + b_{i, j} η (t)] z_{i, j}^{t - δ t},

(34)

where the $z_{i, j}^{t}$ represent the four posteriors P(s₂, M|X_t). For finite but small δt = 1/k, this represents a finer-grained discretization in which k steps are taken for every one step of (28), the deterministic increments being of order δt and the random ones of order $\sqrt{δ t}$ [Higham, 2001]. Taking the limit δt → 0 in Eq. (34), letting y_i,j = log(z_i,j), and appealing to the Ito formula [Oksendal, 2002, Section 4.1], we obtain independent, uncoupled SDEs for y_i,j(t):

d y_{i, j} = [(a_{i, j} - 1) - \frac{b_{i, j}^{2}}{2}] d t + b_{i, j} d W \overset{def}{=} A_{i, j} d t + B_{i, j} d W,

(35)

with constant coefficients $A_{i, j} = (a_{i, j} - 1) - \frac{b_{i, j}^{2}}{2}$ and B_i,j = b_i,j, whose values are specified in §4.1. Since each z_i,j(t) represents a posterior probability, it should take values in the interval [0, 1], so we shall be interested in sample paths y_i,j(t) that start at y_i,j(0) < 0 and satisfy -∞ < y_i,j(t) ≤ 0.
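The Ito correction in the drift of Eq. (35) can be checked numerically. The sketch below is an illustration under stated assumptions: it refines the map (28) so that each substep multiplies z by 1 + (a - 1)δt + b√δt η, following the scaling described above, and it averages log z over simulated paths. The function name, the start value z(0) = 1, and the parameter values are choices made for this example.

```python
import math
import random

def mean_log_z(a, b, t_end, dt, n_paths, rng):
    """Average of log z(t_end) over paths of the refined multiplicative map."""
    n_steps = round(t_end / dt)
    total = 0.0
    for _ in range(n_paths):
        y = 0.0  # y = log z, starting from z(0) = 1 for simplicity
        for _ in range(n_steps):
            eta = rng.gauss(0.0, 1.0)
            factor = 1.0 + (a - 1.0) * dt + b * math.sqrt(dt) * eta
            y += math.log(factor)  # assumes factor > 0, true for small dt
        total += y
    return total / n_paths

sigma = 9.0
a = 1.0 + 3.0 / sigma**2           # Eq. (29), case i = +1, j = 1
b = math.sqrt(3.0) / sigma
drift = (a - 1.0) - 0.5 * b**2     # the coefficient A in Eq. (35)
estimate = mean_log_z(a, b, t_end=10.0, dt=0.1, n_paths=2000,
                      rng=random.Random(0))
```

Up to sampling error, `estimate` should lie close to `drift * 10.0` and noticeably below the naive value `(a - 1.0) * 10.0`, which omits the -b²/2 term.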

4.3 Analytical approximations for the mean posteriors

The SDE (35) describes a drift-diffusion process with constant signal and noise level, which has been extensively studied (e.g. [Gardiner, 1985, Oksendal, 2002]). In particular, for solutions (sample paths) started at y(0) = μ₀ and t = 0 the probability density function of y at time t is the following Gaussian distribution:

p (y, t) = \frac{1}{\sqrt{2 π σ {(t)}^{2}}} \exp [- \frac{{(y - μ (t))}^{2}}{2 σ {(t)}^{2}}],

(36)

where

μ (t) = A t + μ_{0} and σ (t) = B \sqrt{t} .

(37)

(Here and below we drop the subscripts {i, j} in y and z, on the understanding that the appropriate coefficients will be used in the final formulae.) We now transform back into z-space, using y = log(z) and $d y = \frac{d z}{z}$, to obtain the density:

p (z, t) = \frac{1}{z \sqrt{2 π σ {(t)}^{2}}} \exp [- \frac{{(\log (z) - μ (t))}^{2}}{2 σ {(t)}^{2}}] .

(38)

The inverse transformation z = exp(y) takes the Gaussian distribution over y into a function skewed towards z = 0, as illustrated in Figure 7.

Probability density functions in logarithmic y-space and the original z-space.

The Gaussian distribution over y takes positive values on y > 0 for all t > 0. This presents a problem, since z = exp(y) > 1 for y > 0, contrary to z’s designation as a probability measure. Therefore, when computing expected values of P(s₂, M|X_t), which requires integration of the quantity z p(z, t), we replace all values of z > 1 by z = 1 (or values of y > 0 by y = 0 in the equivalent integral over y). However, to retain analytical tractability, we continue to assume a Gaussian distribution over y at time t when generating the distribution at time t+1; that is, we only replace the inappropriate values of y (or z) in the integral, not in the underlying drift-diffusion process. The expected (mean) value of z is therefore approximated as

\begin{matrix} 〈 z (t) 〉 \approx \int_{0}^{1} z p (z, t) d z + \int_{1}^{\infty} p (z, t) d z & = \int_{0}^{1} \frac{1}{z \sqrt{2 π σ {(t)}^{2}}} \exp [- \frac{{(\log (z) - μ (t))}^{2}}{2 σ {(t)}^{2}}] z d z \\ & + \int_{1}^{\infty} \frac{1}{z \sqrt{2 π σ {(t)}^{2}}} \exp [- \frac{{(\log (z) - μ (t))}^{2}}{2 σ {(t)}^{2}}] d z, \end{matrix}

(39)

which may be evaluated as explained in Appendix A to yield

\frac{\exp [μ (t) + \frac{σ {(t)}^{2}}{2}]}{2} [1 - \erf (\frac{μ (t) + σ {(t)}^{2}}{\sqrt{2 σ {(t)}^{2}}})] + \frac{1}{2} [1 + \erf (\frac{μ (t)}{\sqrt{2 σ {(t)}^{2}}})] .

(40)
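Expression (40) is straightforward to evaluate with a standard error function. The Python sketch below implements it and compares it with a direct midpoint-rule integration over y of min(exp(y), 1) against the Gaussian density; the function names and the test values μ = -1, σ = 0.7 are illustrative assumptions, not taken from the text.

```python
import math

def clipped_mean(mu, sd):
    """Eq. (40): mean of min(z, 1) for z = exp(y), y ~ N(mu, sd^2), sd > 0."""
    s = math.sqrt(2.0) * sd
    # Contribution of 0 < z < 1 (first term of Eq. (40)).
    below = 0.5 * math.exp(mu + 0.5 * sd**2) * (1.0 - math.erf((mu + sd**2) / s))
    # Contribution of z > 1, where z is replaced by 1 (second term).
    above = 0.5 * (1.0 + math.erf(mu / s))
    return below + above

def clipped_mean_quadrature(mu, sd, n=20000, width=10.0):
    """Midpoint-rule estimate of the same expectation, integrating over y."""
    lo, hi = mu - width * sd, mu + width * sd
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        density = math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += min(math.exp(y), 1.0) * density * h
    return total

closed_form = clipped_mean(-1.0, 0.7)
numerical = clipped_mean_quadrature(-1.0, 0.7)
```

The two values should agree closely, and both should fall below the unclipped lognormal mean exp(μ + σ²/2), since clipping can only reduce the expectation.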

Substituting values appropriate for the compatibility bias model from Eqs. (29-30) for the parameters a_i,j and b_i,j, and hence for A_i,j, B_i,j, and via Eqs. (37), for μ(t) and σ(t), we obtain estimates for the four mean posterior probabilities at time t:

\begin{matrix} 〈 P (s_{2}, M ∣ X_{t}) 〉 \approx \frac{1}{2 D (t)} \times \\ {\exp [μ (t) + \frac{σ {(t)}^{2}}{2}] [1 - \erf (\frac{μ (t) + σ {(t)}^{2}}{\sqrt{2 σ {(t)}^{2}}})] + [1 + \erf (\frac{μ (t)}{\sqrt{2 σ {(t)}^{2}}})]} . \end{matrix}

(41)

where D(t) is the sum of all four probabilities that normalizes the expressions, and for compatible stimuli the functions μ(t) and σ(t) are:

μ (t) = {\begin{matrix} + \frac{3 t}{2 σ^{2}} + \log (\frac{β}{2}), & s_{2} = + 1, M = 1, \\ - \frac{9 t}{2 σ^{2}} + \log (\frac{β}{2}), & s_{2} = - 1, M = 1, \\ - \frac{5 t}{2 σ^{2}} + \log (\frac{1 - β}{2}), & s_{2} = + 1, M = 2, \\ - \frac{t}{2 σ^{2}} + \log (\frac{1 - β}{2}), & s_{2} = - 1, M = 2, \end{matrix} and σ (t) = \frac{\sqrt{3 t}}{σ} .

(42)

and for incompatible stimuli:

μ (t) = {\begin{matrix} - \frac{5 t}{2 σ^{2}} + \log (\frac{β}{2}), & s_{2} = + 1, M = 1, \\ - \frac{t}{2 σ^{2}} + \log (\frac{β}{2}), & s_{2} = - 1, M = 1, \\ + \frac{3 t}{2 σ^{2}} + \log (\frac{1 - β}{2}), & s_{2} = + 1, M = 2, \\ - \frac{9 t}{2 σ^{2}} + \log (\frac{1 - β}{2}), & s_{2} = - 1, M = 2, \end{matrix} and σ (t) = \frac{\sqrt{3 t}}{σ} .

(43)

Here, we also use the fact that all sample paths start with the initial conditions specified in Eq. (10) and that μ₀ = μ(0) = log(z(0)).
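The mean posteriors of Eq. (41) for the compatibility bias model can be evaluated as in the following Python sketch. It is a hedged illustration: the drifts are recomputed from the coefficients of Eqs. (29-30) via A = (a - 1) - b²/2 rather than copied from the tables, the helper names are hypothetical, and σ = 9, β = 0.9 are assumed example values.

```python
import math

def clipped_mean(mu, sd):
    """Eq. (40) for y ~ N(mu, sd^2), with z = exp(y) capped at 1."""
    s = math.sqrt(2.0) * sd
    below = 0.5 * math.exp(mu + 0.5 * sd**2) * (1.0 - math.erf((mu + sd**2) / s))
    return below + 0.5 * (1.0 + math.erf(mu / s))

def mean_posteriors(t, sigma, beta, compatible=True):
    """Normalized mean posteriors at time t > 0 for s2 = +1, as in Eq. (41)."""
    c = 1.0 / sigma**2
    if compatible:   # offsets a_ij - 1 from Eq. (29)
        a_minus_1 = {(+1, 1): 3 * c, (-1, 1): -3 * c, (+1, 2): -c, (-1, 2): c}
    else:            # offsets a_ij - 1 from Eq. (30)
        a_minus_1 = {(+1, 1): -c, (-1, 1): c, (+1, 2): 3 * c, (-1, 2): -3 * c}
    priors = {(+1, 1): beta / 2, (-1, 1): beta / 2,
              (+1, 2): (1 - beta) / 2, (-1, 2): (1 - beta) / 2}
    b_squared = 3.0 * c
    sd = math.sqrt(b_squared * t)                # sigma(t) = B sqrt(t)
    raw = {}
    for key, shift in a_minus_1.items():
        drift = shift - 0.5 * b_squared          # A = (a - 1) - b^2 / 2
        mu = drift * t + math.log(priors[key])   # mu(t) = A t + mu_0
        raw[key] = clipped_mean(mu, sd)
    total = sum(raw.values())                    # plays the role of D(t)
    return {key: value / total for key, value in raw.items()}

def p_correct(t, sigma, beta, compatible):
    """Marginal mean posterior of s2 = +1, summed over M = 1, 2."""
    post = mean_posteriors(t, sigma, beta, compatible)
    return post[(+1, 1)] + post[(+1, 2)]

early_incompatible = p_correct(10.0, 9.0, 0.9, compatible=False)
early_compatible = p_correct(10.0, 9.0, 0.9, compatible=True)
```

With these assumed values, the marginal for the correct response at an early time lies below 0.5 on incompatible trials and above 0.5 on compatible trials, in line with the dip discussed in the text.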

As noted at the beginning of this section, normalization and averaging do not commute. This may be understood in terms of the distributions of Figure 7 as follows. While each sample path can be computed for the uncoupled processes and normalized at time t to yield the same result as a sample path of the coupled system (cf. Figure 3), different normalization factors must typically be applied to the values of different paths z_i,j(t) at each time t. This would distort the distributions p(z_i,j, t), thereby changing their means. However, we may appeal to the observation that the expected value of the denominator remains close to 1 (cf. Figure 5) to conclude that this distortion is likely to be small, and proceed by dividing by the sums of the four mean probability trajectory values at time t to normalize the resulting expressions.

Typical results for mean posterior probabilities are shown in Figure 8. The approximate predictions developed above are shown as dashed curves and the results of averaging over 5000 simulated trials of the full inference model (9) are shown solid; compatible and incompatible trials are shown in red and blue respectively. As above, we compute 200 steps for the discrete iteration of the full system, and we evaluate the corresponding quantities for t ∈ [0, 200] time units from the formulae above. For P(M) = 0.5 (not shown), joint posteriors for correct responses increase similarly for both compatible and incompatible cases, but P(M) = 0.9 elicits markedly different behaviors (top left). The compatibility posteriors P(M = 1|X_t) show a general rise for compatible stimuli and a monotonic fall for incompatible stimuli, but the posterior probability P(s₂ = 1|X_t) shows a significant dip below 0.5 at early times for incompatible stimuli, while it rises monotonically for compatible stimuli. As discussed in Section 5, the resulting accuracies exhibit similar patterns to the experimental data, with the incompatible case showing a dip in accuracy for early responses. Evolutions of the four individual posterior probabilities are shown in the lower panels of Figure 8.

Predictions of the full and simplified compatibility bias models in the case that the central symbol is S (s₂=1) and with prior compatibility bias P(M)=0.9. Top left: marginal mean posterior probabilities P(s₂ = 1|M) (correct response) for compatible and incompatible conditions. Top right: marginal mean posterior P(M = 1) for compatibility. Bottom row: individual mean posteriors for compatible (left) and incompatible (right) trials. Results from the full inference model, averaged over 5000 simulated trials, are shown solid and predictions of the continuum approximation (41-43) are shown dashed. Keys identify individual curves.

Figure 8 illustrates that, while the approximations developed here do not capture all the detailed behavior of the full model, they do provide reasonably good approximations to the average evolutions of the posteriors over the course of a trial. Time scales are slightly misestimated and the compatibility posterior P(M=1|X_t) (top right) fails to reproduce the slight dip below 0.9 that occurs for compatible trials at early times, but the relative orderings of all the posteriors are correctly predicted. Overall, absolute errors in mean posteriors, computed as described at the end of this section, lie between 0.002 and 0.05, the largest being for P(M=1|X_t) in the case of incompatible stimuli (top right, lower curves).

Predictions for the spatial uncertainty model follow from the formula (41) in a similar manner, upon the substitution of values for a and b from Eqs. (31-33), and using the initial conditions μ₀=log(1/4) for all four posteriors (Eq. (22)). For compatible stimuli, the function μ(t) is

μ (t) = {\begin{matrix} [\frac{a_{1}^{2} + a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} - \frac{2 a_{1} a_{2}}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = + 1, M = 1, \\ [- \frac{3 a_{1}^{2} + 8 a_{1} a_{2} + 3 a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} - \frac{2 (a_{1} + a_{2}) (a_{1} + 4 a_{2})}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = - 1, M = 1, \\ [\frac{- 3 a_{1}^{2} + 4 a_{1} a_{2} + a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} + \frac{2 (3 a_{1} - 4 a_{2}) a_{2}}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = + 1, M = 2, \\ [\frac{a_{1}^{2} + 4 a_{1} a_{2} - 3 a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} - \frac{2 a_{1} (a_{1} - 3 a_{2})}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = - 1, M = 2; \end{matrix}

(44)

for incompatible stimuli

μ (t) = {\begin{matrix} [\frac{- 3 a_{1}^{2} - 4 a_{1} a_{2} + a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} - \frac{2 (3 a_{1} + 4 a_{2}) a_{2}}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = + 1, M = 1, \\ [\frac{a_{1}^{2} - 4 a_{1} a_{2} - 3 a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} - \frac{2 a_{1} (a_{1} + 3 a_{2})}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = - 1, M = 1, \\ [\frac{a_{1}^{2} + a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} + \frac{2 a_{1} a_{2}}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = + 1, M = 2, \\ [- \frac{3 a_{1}^{2} - 8 a_{1} a_{2} + 3 a_{2}^{2}}{σ_{1}^{2} + σ_{2}^{2}} - \frac{2 (a_{1} - a_{2}) (a_{1} - 4 a_{2})}{σ_{1}^{2} + 2 σ_{2}^{2}}] t + \log (\frac{1}{4}), & s_{2} = - 1, M = 2, \end{matrix}

(45)

and in both cases

σ (t) = {\begin{matrix} \sqrt{[\frac{{(a_{1} + a_{2})}^{2}}{σ_{1}^{2} + σ_{2}^{2}} + \frac{{(a_{1} + 2 a_{2})}^{2}}{σ_{1}^{2} + 2 σ_{2}^{2}}] t}, & s_{2} = \pm 1, M = 1, \\ \sqrt{[\frac{{(a_{1} - a_{2})}^{2}}{σ_{1}^{2} + σ_{2}^{2}} + \frac{{(a_{1} - 2 a_{2})}^{2}}{σ_{1}^{2} + 2 σ_{2}^{2}}] t}, & s_{2} = \pm 1, M = 2 . \end{matrix}

(46)

The above results, presented in Figure 9, are not as good as those for the compatibility bias model. Nonetheless, the approximate model captures the key features of the evolving posteriors in the full model rather well, predicting the relative ordering of the posteriors appropriately in all cases except the incorrect choices P(HHH) and P(SHS) for incompatible stimuli; in that case the approximation for P(SHS) diverges from the correct function, increasing rather than decreasing as t increases (lower right panel), for an absolute error of 0.12. Apart from this case, however, errors lie between 0.015 and 0.08.

Predictions of the full and simplified spatial uncertainty models. Top left: marginal mean posterior probabilities P(s₂ = 1|M) (correct response) for compatible and incompatible conditions. Top right: marginal mean posterior P(M = 1) for compatibility. Bottom row: individual mean posteriors for compatible (left) and incompatible (right) trials. Results from full inference model, averaged as in Figure 8, shown solid and predictions of the continuum approximation (41) and (44-46) shown dashed. Keys identify individual curves.

The errors for both models were computed for each mean posterior using the L¹ norm as follows:

Error = \sum_{t = 0}^{T} ∣ p_{t} - {\tilde{p}}_{t} ∣,

(47)

where p_t and ${\tilde{p}}_{t}$ denote the posteriors predicted by the full and simplified models respectively.
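As written, the error measure of Eq. (47) is a plain sum of absolute differences between two sampled trajectories. A minimal Python sketch follows; the function name and the toy sequences are hypothetical, and any normalization applied to the values reported in the text is not shown here.

```python
def l1_error(full, approx):
    """Sum of |p_t - p~_t| over time steps t = 0, ..., T, as in Eq. (47)."""
    if len(full) != len(approx):
        raise ValueError("trajectories must have the same number of time steps")
    return sum(abs(p - q) for p, q in zip(full, approx))

# Toy trajectories with three time steps (illustrative numbers only).
example_error = l1_error([0.5, 0.6, 0.8], [0.5, 0.7, 0.75])
```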

4.4 Making use of explicit mean posteriors

In addition to providing explicit expressions for posterior probabilities, the continuum limit also yields approximations for accuracy and reaction time distributions. To estimate accuracy as a function of response time under the free response protocol assumed by [Yu et al., 2007], we compute the fraction of mass of the evolving probability density $p (z_{i, 1}^{t}, z_{i, 2}^{t})$ that exceeds a given threshold $z_{i, 1}^{t} + z_{i, 2}^{t} = q$ at each time t (recall Eq. (11)). This procedure overestimates first passage times, since some of the sample paths that lie beyond the threshold q at time t may have crossed at earlier times, but it permits some analytical simplification. Without loss of generality, we shall assume that s₂ = 1.

The integral that we need to evaluate is

P {(s_{2} = 1 ∣ X_{t})}_{est} = \int_{0}^{\infty} \int_{q - z_{2}}^{\infty} p (z_{1, 1}^{t}, z_{1, 2}^{t}) d z_{1, 1}^{t} d z_{1, 2}^{t} \approx \int_{0}^{\infty} \int_{q - z_{2}}^{\infty} p (z_{1}, t) p (z_{2}, t) d z_{1} d z_{2},

(48)

where we have used the shorthand notation $p (z_{j}, t) = p (z_{1, j}^{t})$ , and the approximation comes from assuming $p (z_{1, 1}^{t}, z_{1, 2}^{t}) \approx p (z_{1, 1}^{t}) p (z_{1, 2}^{t})$ for the uncoupled and linearized approximate dynamical system - this assumption greatly simplifies the computations, although the uncoupled processes are not entirely independent since they are activated by common inputs (x₁, x₂, x₃), albeit in different linear combinations. We also note that the variables z_j should be non-negative (cf. Figure 7). The domain of integration is pictured in Figure 12. The p(z_j, t)’s take the forms derived in §4.3 above and since each is a normalized Gaussian in the logarithmic y variables, the integral of their product over the entire positive quadrant is 1. Hence we have

P {(s_{2} = 1 ∣ X_{t})}_{est} = 1 - \int_{0}^{q} \int_{0}^{q - z_{2}} p (z_{1}, t) p (z_{2}, t) d z_{1} d z_{2},

(49)

which is evaluated in Appendix A to yield:

P {(s_{2} = 1 ∣ X_{t})}_{est} = \frac{3}{4} - \frac{1}{4} \erf (\frac{\log (q) - μ_{2} (t)}{\sqrt{2 σ_{2} {(t)}^{2}}}) - \frac{1}{2} \int_{0}^{q} p (z_{2}, t) \erf (\frac{\log (q - z_{2}) - μ_{1} (t)}{\sqrt{2 σ_{1} {(t)}^{2}}}) d z_{2},

(50)

where

p (z_{2}, t) = \frac{1}{z_{2} \sqrt{2 π σ_{2} {(t)}^{2}}} \exp [- \frac{(\log (z_{2}) - μ_{2} (t))^{2}}{2 σ_{2} {(t)}^{2}}] .

(51)

The integral of the joint posterior probability distribution is taken over the positive (z₁, z₂)-quadrant less the shaded triangular region.

Unfortunately, the final integral in Eq. (50) cannot be computed analytically, but it can be evaluated accurately and rapidly by numerical methods.
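One way to carry out this numerical evaluation is sketched below in Python. It is not the authors' implementation: it assumes a change of variable y₂ = log z₂ and a simple midpoint rule for the final integral in Eq. (50), and it compares the result with a Monte Carlo estimate for two independent lognormal variables. The function names and parameter values are illustrative.

```python
import math
import random

def p_exceeds(q, mu1, sd1, mu2, sd2, n=4000, width=10.0):
    """Eq. (50): P(z1 + z2 > q) for independent lognormal z1 and z2."""
    first = 0.75 - 0.25 * math.erf((math.log(q) - mu2) / (math.sqrt(2.0) * sd2))
    hi = math.log(q)                                  # upper limit z2 = q
    lo = min(mu2 - width * sd2, hi - width * sd2)     # truncated lower limit
    h = (hi - lo) / n
    integral = 0.0
    for k in range(n):
        y2 = lo + (k + 0.5) * h                       # midpoint in y2 = log z2
        density = math.exp(-0.5 * ((y2 - mu2) / sd2) ** 2) / (sd2 * math.sqrt(2.0 * math.pi))
        arg = (math.log(q - math.exp(y2)) - mu1) / (math.sqrt(2.0) * sd1)
        integral += density * math.erf(arg) * h
    return first - 0.5 * integral

def monte_carlo_exceeds(q, mu1, sd1, mu2, sd2, n, rng):
    """Fraction of sampled pairs (z1, z2) with z1 + z2 > q."""
    hits = 0
    for _ in range(n):
        z1 = rng.lognormvariate(mu1, sd1)
        z2 = rng.lognormvariate(mu2, sd2)
        if z1 + z2 > q:
            hits += 1
    return hits / n

semi_analytic = p_exceeds(0.9, -0.5, 0.6, -2.0, 0.6)
sampled = monte_carlo_exceeds(0.9, -0.5, 0.6, -2.0, 0.6, 100000, random.Random(1))
```

The midpoint rule never evaluates the integrand at z₂ = q, where log(q - z₂) diverges, so no special treatment of the endpoint is needed in this sketch.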

Response accuracy is approximated by the fraction of correct responses that exceed threshold:

\frac{P {(s_{2} = 1 ∣ X_{t})}_{est}}{P {(s_{2} = 1 ∣ X_{t})}_{est} + P {(s_{2} = - 1 ∣ X_{t})}_{est}},

(52)

where the denominator approximates the sum of all four probabilities $z_{1, 1}^{t} + z_{1, 2}^{t} + z_{- 1, 1}^{t} + z_{- 1, 2}^{t}$. (The term P(s₂ = -1|X_t)_est is computed in a similar manner to Eq. (50), with the appropriate expressions for μ(t), σ(t) from §4.3.) The denominator is the cumulative reaction time distribution, and so its derivative with respect to t provides the reaction time distribution. Hence, both accuracy and reaction time distributions can be approximated semi-analytically. Figure 10 shows the resulting approximations for the compatibility bias model, for a particular setting of model parameters. The dip in accuracy for incompatible trials is reproduced, and after an initial rise in accuracy for compatible trials, accuracy slowly declines.

Predictions of accuracy (left) and reaction time histograms (right) computed under the approximation of Section 4.4. Solid curve and dark boxes indicate compatible trials; dashed curve and light boxes indicate incompatible trials.

As we have noted, sample paths of the SDE (35) may pass across q and back, possibly repeatedly, in the interval (0, t), so these results do not directly correspond to the first-passage decision policy of the Bayesian models in [Yu et al., 2007]. This accounts for differences between the accuracy curves and reaction time distributions of Figure 10 and the free response results of §3.3. For example, the compatibility bias free response data of Figure 4 do not show the mild decline in accuracy for later compatible trials of Figure 10, although the spatial uncertainty simulations of Figure 4 do show such a decline. Nonetheless, the qualitative agreement between Figures 10 and 4 is quite good, and since the semi-explicit expression of Eqs. (50-51) replaces the lengthy Monte Carlo simulations of §3.3, it may be helpful in guiding parameter fits to data.

The posterior probability expressions can also be used to constrain parameter choices, by requiring the derivative of $P (s_{2} = 1 ∣ X_{t}) = z_{1, 1}^{t} + z_{1, 2}^{t}$ at time t = 0 to be negative and finding corresponding conditions on the parameters. The results of this computation (details not shown) agree closely with those in Section 3.

4.5 Fitting the models to data

We now briefly describe the results of fitting the full models of Section 2 and the reduced DD processes of Sections 4.2-4.4 to the data of [Servan-Schreiber et al., 1998], reproduced in Fig. 1B. For the compatibility bias model the parameters fitted are the noise level σ, prior β, threshold q and step durations δt (for DDM) and ΔT (for the full model), which determine the overall timescale. For spatial uncertainty, they are σ₁, σ₂, a₁, q and δt, ΔT (as in §3.2, we set a₂ = 1 - a₁). To these we add one further parameter, T₀, to account for time occupied by sensory decoding and motor response mechanisms, which superimposes a rightward shift on the RT distributions. (Such an “overhead time” might approximate the mean RT on a simple target detection task.)
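The roles of the step duration and the overhead time T₀ can be made concrete with a small Python sketch. It assumes that a model decision time of n steps maps to a reaction time of T₀ + n·δt milliseconds, and that reaction times are grouped into bins of 100 ms width; the function names and the numerical values are illustrative assumptions.

```python
def reaction_time_ms(n_steps, step_ms, t0_ms):
    """Convert a decision time in model steps to a reaction time in ms."""
    return t0_ms + n_steps * step_ms

def rt_bin(rt_ms, bin_width_ms=100.0):
    """Index of the RT bin containing rt_ms (bin 0 covers 0 to bin_width_ms)."""
    return int(rt_ms // bin_width_ms)

# Illustrative values: 200 steps of 1 ms each, plus a 90 ms overhead.
rt = reaction_time_ms(200, step_ms=1.0, t0_ms=90.0)
bin_index = rt_bin(rt)
```

Changing `t0_ms` shifts every reaction time by the same amount, which is the rightward shift of the RT distribution described above, while `step_ms` rescales the time axis.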

We employ the same weighted Euclidean error norm as in [Liu et al., 2007] (see Appendix B for details). The parameter values obtained are as follows. Compatibility bias: σ = 6.5, β = 0.87, q = 0.98, δt = 0.95 ms, ΔT = 1.04 ms, and T₀ = 90 ms. Spatial uncertainty: σ₁ = 6.9, σ₂ = 5.1, a₁ = 0.71, q = 0.92, δt = 3.4 ms, ΔT = 0.33 ms, and T₀ = 95 ms. Note that the noise levels are consistent with the assumptions of Sections 3 and 4.1-4.2: e.g., 1/σ⁴ ≪ 1/σ² (cf. Equation (13)). The fitting errors are as follows: Compatibility bias: full model 2.5; DDM 2.3. Spatial uncertainty: full model 2.1; DDM 1.8. In fitting we excluded data points in the first (0 - 100 ms) and the last (900 - 1000 ms) of the 10 RT bins, since no accuracy data are available for the former, and all trials in which responses exceeded 1000 ms were placed in the latter (note the uptick in RT distributions at the rightmost data point). However, we computed model data in that bin and in the next one (1000 - 1100 ms). Since our fitted values of the overhead time T₀ push even the shortest model RTs beyond 100 ms, accuracies cannot be computed for the 0-100 ms bin, unless we assume some premature responses that are initiated before stimulus onset. For such premature responses, the equal prevalence of H and S in the experiments ensures that accuracy approaches chance at very short decision times (cf. upper left panels of Figs. 8 and 9). Indeed, this chance performance is unavoidable, independent of the inference or decision strategy, since the response is deprived of stimulus information and cannot possibly correlate with stimulus identity.

These results are shown in Fig. 11. Fit qualities are slightly better for the spatial uncertainty model, and in both cases, perhaps surprisingly, fit errors are slightly smaller for the reduced DDM than for the full Bayesian procedure. The fit errors are similar to that of 2.4 obtained in [Liu et al., 2007] for the [Gratton et al., 1988] data (Fig. 1A), using a DDM with variable drift rates derived from the neural network model of [Cohen et al., 1992]. That model contains 8 free parameters, compared with 5 and 6 respectively in the present cases. Indeed, in [Liu et al., 2007] 6 parameters are required to describe drift rates in the compatible and incompatible cases, modeling progressive increase in attention to the central stimulus, and these cases are fitted separately. In the present study compatible and incompatible trials are fitted simultaneously, and a single parameter in each model (the compatibility prior β, or the weight a₁), along with Bayesian updating, serves to describe the accumulation of evidence.

Accuracy (upper curves in each panel) and reaction time distributions (lower curves) from the full (squares) and reduced DD (triangles) models for compatible (left) and incompatible (right) trials. Upper panels show compatibility bias and lower panels spatial uncertainty model results respectively. Parameters were fitted to the data of [Servan-Schreiber et al., 1998] (dashed curves with circles, cf. Fig 1B).

Both models underestimate mean RTs for compatible trials, producing an excess of points in the 200-300 ms RT bin. They are also unable to capture the drop in accuracy at the shortest RTs on compatible trials (left panels), due to the T₀ behavior noted above. They do reproduce this drop on incompatible trials, although the full compatibility bias model does not exhibit the dip below 50%. The spatial uncertainty model is substantially better in this regard (lower right panel), although it underestimates accuracy in the 400 - 900 ms part of the RT range for both the compatible and incompatible cases. In preliminary work we also tried a modified norm that preferentially weights low RT data: this slightly improved fits of RT distributions, but did not affect compatible accuracy fits. We also fitted the full and DD models to the data of [Gratton et al., 1988] (Fig. 1A), obtaining similar fit qualities, although the failure to capture the steady rise from 50% accuracy at low RTs for compatible trials was more striking in that case (model results not shown here).

We note that individual subjects exhibit large differences in signal-to-noise ratios and thresholds (in DDM fits, cf. [Ratcliff et al., 1999, Bogacz et al., 2007]), and that here we have averaged over all subjects to produce single sets of fit parameters for each model. As illustrated in Fig. 1, there is also substantial variability in Eriksen data, perhaps due to differing deadlining protocols. (Deadlines are necessary to produce enough short reaction times and hence obtain a significant dip in accuracy on incompatible trials.) The resulting variability in motor preparation times can affect reaction times, and no allowance for this is made in the inference models, which describe only cognitive processing. Our additional parameter T₀ only partially accounts for this, and as we have remarked, in the present case it deprives us of accuracy data in the smallest RT bin.

5 Discussion and conclusions

In earlier work [Liu et al., 2007] a neural network model of the Eriksen task [Cohen et al., 1992, Servan-Schreiber et al., 1998] was linearized and reduced to a DDM with time-varying drift, allowing relatively complete analysis that reveals how parameters influence accuracy curves such as those of Figure 1. However, this network model involves somewhat arbitrary assumptions on architecture and parameters, and it is not clear how the DDM reduction of [Liu et al., 2007], with its variable drift rate, relates to the optimal decision theory for the constant drift case [Bogacz et al., 2006]. The present paper addresses this issue by offering analytically tractable approximations to two Bayesian inference models (compatibility bias and spatial uncertainty) proposed in [Yu et al., 2007].

Specifically, the joint signal probability distribution of Eq. (4) is approximated as a linear sum, and then, by assuming that the sum of the non-normalized posteriors remains close to one and taking a continuum limit, we obtain analytical expressions for the mean posterior probabilities. Employing a further approximation in which the net probabilities of having answered correctly or incorrectly at time t are computed, we derive semi-analytical approximations for accuracy and reaction time distributions. While the latter correspond more closely to an “interrogation protocol” [Bogacz et al., 2006, Liu et al., 2007] in which subjects are cued to respond at specific times, and so differ quantitatively from those computed numerically for free responses (compare Figure 10 with Figure 4), the overall accuracy curves and individual posteriors derived from the continuum model reproduce those of the Bayesian model quite well (see Figures 8-9).

We therefore expect that our analytical approximations will be useful in guiding parameter selection when fitting models to experimental data. In Section 3, we provide an example of this by deriving simple parametric constraints that must hold to obtain the dip below 50% in the posterior probability for early responses. Moreover, although the coefficients differ, the linearized update rules of both Eqs. (14) and (20) demonstrate that the flanker inputs x₁ and x₃ work with the target input x₂ for the compatible hypotheses, and against it for the incompatible hypotheses. This underlying computational architecture gives rise to the same basic ability of both the compatibility bias and spatial uncertainty models to account for the dynamics of flanker interference in behavioral data. In Section 4.5 we show that both the original models and DDM approximations derived from them can be fitted to experimental data, further strengthening our case.

Our analysis also reveals that a particularly simple stochastic differential equation, the constant-drift drift-diffusion (DD) process of Eq. (35), approximately describes the evolution of Bayesian posteriors in log probability space. As described in [Bogacz et al., 2006], this is a continuum limit of the sequential probability ratio test [Wald, 1947], which is known to be optimal for identifying noisy signals in two-alternative choice tasks [Wald and Wolfowitz, 1948]. Moreover, it has been shown [Bogacz et al., 2006, Liu et al., 2007] that DD and related Ornstein-Uhlenbeck processes emerge naturally in linearized reductions of competing leaky accumulator models [Usher and McClelland, 2001] for 2AFC. In these neural networks the difference between activities in a pair of units at the output decision or response stage behaves like the accumulating variable y(t) in Eq. (35)¹ [Gold and Shadlen, 2001]. DD models can also capture bottom-up (stimulus-driven) and top-down influences such as attention and expectation of rewards via variable drift rates [Liu et al., 2007, Eckhoff et al., 2007].

Since accumulator models may be derived from biophysical models of spiking neurons [Wang, 2002, Wong and Wang, 2006], in which their activities represent short-term averages of collective firing rates, this suggests a mechanism by which neural substrates may be able to perform Bayesian computations. Specifically, in reducing the coupled Bayesian inference model (9) to a DD process we see how prior information maps into initial conditions, and evolving posteriors in log probability space are represented by spike rates of groups of neurons. In connection with the latter, we note that [Bogacz and Gurney, 2007] present computational and experimental evidence that Bayesian computations involving exponentiation and taking logarithms (cf. [Yu and Dayan, 2005]), as in Section 4, can be approximated by neurons in the basal ganglia.

Acknowledgments

This work was supported by PHS grants MH58480 and MH62196 (Cognitive and Neural Mechanisms of Conflict and Control, Silvio M. Conte Center). YL benefited from studentship support from the School of Engineering and Applied Science at Princeton University and AY received funding from an NIH NRSA institutional training grant. We thank the referees for perceptive and helpful comments.

Appendix: Mathematical and data fitting details

A Evaluation of integrals

To evaluate the integrals of Eq. (39) we employ the change of variables

x = \frac{(\log (z) - μ)}{\sqrt{2 σ^{2}}}, z = \exp (μ + \sqrt{2 σ^{2}} x),

(53)

so that $d x = \frac{d z}{z \sqrt{2 σ^{2}}}$ and the integrals become

\frac{1}{\sqrt{π}} \int_{- \infty}^{\frac{- μ}{\sqrt{2 σ^{2}}}} \exp (- x^{2} + μ + \sqrt{2 σ^{2}} x) d x and \frac{1}{\sqrt{π}} \int_{\frac{- μ}{\sqrt{2 σ^{2}}}}^{\infty} \exp (- x^{2}) d x .

(54)

The second expression is a standard error function integral, and the first may be put into the same form by completing the square in the argument of the exponent:

x^{2} - μ - \sqrt{2 σ^{2}} x = {(x - \sqrt{\frac{σ^{2}}{2}})}^{2} - (μ + \frac{σ^{2}}{2}),

(55)

followed by the further change of variables

u = (x - \sqrt{\frac{σ^{2}}{2}}) .

(56)

This process results in the expressions of Eq. (40).
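As a numerical sanity check on the change of variables, the sketch below compares the two integrals of (54) with the closed forms obtained by carrying out steps (55) and (56). Eq. (40) is not reproduced in this appendix, so the closed forms in `integrals_closed` are derived here from (54)-(56) and may be arranged differently from the expressions in the main text. The function names, the integration cutoff and the test values of μ and σ are assumptions made for illustration.

```python
import math


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def integrals_numeric(mu, sigma, cutoff=12.0):
    """Direct quadrature of the two integrals in (54).

    The infinite limits are truncated at -cutoff and +cutoff, which is
    adequate only for moderate mu and sigma (an assumption of this sketch).
    """
    s = math.sqrt(2.0 * sigma ** 2)
    a = -mu / s  # shared finite limit of the two integrals
    first = midpoint(lambda x: math.exp(-x * x + mu + s * x), -cutoff, a)
    second = midpoint(lambda x: math.exp(-x * x), a, cutoff)
    return first / math.sqrt(math.pi), second / math.sqrt(math.pi)


def integrals_closed(mu, sigma):
    """Closed forms after completing the square (55) and shifting by (56)."""
    s = math.sqrt(2.0 * sigma ** 2)
    # Shifted upper limit: u = -mu/s - sqrt(sigma^2/2) = -(mu + sigma^2)/s.
    first = 0.5 * math.exp(mu + sigma ** 2 / 2.0) * math.erfc((mu + sigma ** 2) / s)
    second = 0.5 * math.erfc(-mu / s)
    return first, second
```

Calling both functions with, for example, μ = 0.3 and σ = 0.8 should give matching pairs up to quadrature error, which confirms that the prefactor exp(μ + σ²/2) and the shifted limit come out of (55) and (56) as expected.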

To evaluate the integral of Eq. (49) we proceed as follows, dropping the explicit reference to time dependence, which enters the expressions through the means and standard deviations μ(t), σ(t). Figure 12 indicates the domain of integration.

\begin{aligned}
P(s_{2} = i \mid X_{t})_{est} &= 1 - \int_{0}^{q} \int_{0}^{q - z_{2}} p(z_{1}) \, p(z_{2}) \, d z_{1} \, d z_{2} = 1 - \int_{0}^{q} p(z_{2}) \int_{0}^{q - z_{2}} p(z_{1}) \, d z_{1} \, d z_{2} \\
&= 1 - \int_{0}^{q} p(z_{2}) \, \frac{1}{2} \left[ 1 + \erf \left( \frac{\log(q - z_{2}) - μ_{1}(t)}{\sqrt{2 σ_{1}(t)^{2}}} \right) \right] d z_{2} \\
&= 1 - \frac{1}{2} \int_{0}^{q} p(z_{2}) \, d z_{2} - \frac{1}{2} \int_{0}^{q} p(z_{2}) \, \erf \left( \frac{\log(q - z_{2}) - μ_{1}(t)}{\sqrt{2 σ_{1}(t)^{2}}} \right) d z_{2} \\
&= 1 - \frac{1}{4} \left[ 1 + \erf \left( \frac{\log(q) - μ_{2}(t)}{\sqrt{2 σ_{2}(t)^{2}}} \right) \right] - \frac{1}{2} \int_{0}^{q} p(z_{2}) \, \erf \left( \frac{\log(q - z_{2}) - μ_{1}(t)}{\sqrt{2 σ_{1}(t)^{2}}} \right) d z_{2} .
\end{aligned}

(57)

Here we have added subscripts to the time-varying means and standard deviations μ_j(t), σ_j(t), using the same shorthand z_j = z_{1,j} as in §4.4 to indicate which of the four cases s₂ = ±1; M = 1, 2 enumerated in §4.3 is intended.
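The reduction in (57) from a double integral to a single integral can be checked numerically. The sketch below assumes, consistently with the error-function expressions in (57), that p(z_1) and p(z_2) are log-normal densities with parameters (μ_1, σ_1) and (μ_2, σ_2). It evaluates the last line of (57) by one-dimensional quadrature and compares it with a direct two-dimensional quadrature over the triangular domain z_1 + z_2 < q. All function names and the numerical values of q, μ_j, σ_j are hypothetical.

```python
import math


def lognormal_pdf(z, mu, sigma):
    """Log-normal density with log-mean mu and log-standard-deviation sigma."""
    return math.exp(-(math.log(z) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        z * sigma * math.sqrt(2.0 * math.pi)
    )


def midpoint(f, a, b, n):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def p_est_reduced(q, mu1, s1, mu2, s2, n=4000):
    """Last line of (57): closed-form term plus one remaining 1-D integral."""
    def integrand(z2):
        arg = (math.log(q - z2) - mu1) / math.sqrt(2.0 * s1 ** 2)
        return lognormal_pdf(z2, mu2, s2) * math.erf(arg)

    closed = 0.25 * (1.0 + math.erf((math.log(q) - mu2) / math.sqrt(2.0 * s2 ** 2)))
    return 1.0 - closed - 0.5 * midpoint(integrand, 0.0, q, n)


def p_est_direct(q, mu1, s1, mu2, s2, n=300):
    """First line of (57): double integral over the triangle z1 + z2 < q."""
    def inner(z2):
        return midpoint(lambda z1: lognormal_pdf(z1, mu1, s1), 0.0, q - z2, n)

    outer = midpoint(lambda z2: lognormal_pdf(z2, mu2, s2) * inner(z2), 0.0, q, n)
    return 1.0 - outer
```

The two routines should agree to within quadrature error for moderate parameter values. The reduced form needs only a single sweep over z_2, which is the practical benefit of carrying out the inner integral analytically.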

B Data fitting method

Data fits were performed using the fmincon() function in MATLAB. Parameters were adjusted to seek minima of an error function, defined by a weighted Euclidean norm, which combines accuracy and RT data for both compatible and incompatible trials. The usual Euclidean (L²) distance between vectors u and v with components u_j and v_j is

\| u - v \| = \sqrt{{(u_{1} - v_{1})}^{2} + {(u_{2} - v_{2})}^{2} + \ldots + {(u_{n} - v_{n})}^{2}} .

(58)

Vectors describing accuracies and RT histograms were first formed from the data (AC_d, RT_d) and from the corresponding model predictions (AC_m, RT_m), and their differences were computed using (58). Since the units of accuracy and RT differ, each difference was then weighted by dividing it by the mean of the data, as indicated by an overbar below. This produces the nondimensional quantity:

Error = \sum_{comp., incomp.} \left[ \frac{\| {AC}_{d} - {AC}_{m} \|}{\overline{\| {AC}_{d} \|}} + \frac{\| {RT}_{d} - {RT}_{m} \|}{\overline{\| {RT}_{d} \|}} \right] .

(59)

This error term, representing the sum of percentage differences in accuracy and RT, was then minimized. Note that the resulting value depends on the number of RT bins in the data, and so should be normalized with respect to this when comparing fits of data sets with differing numbers of bins.
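A minimal sketch of the error measure (59) follows. It interprets the overbar as the arithmetic mean of the components of the corresponding data vector, which is one reading of "dividing it by the mean of the data" and is an assumption here. The dictionary layout, function names and numerical values are hypothetical, and the minimization itself (performed with fmincon() in the original work) is not shown.

```python
import math


def l2_distance(u, v):
    """Euclidean distance (58) between equal-length vectors u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean(values):
    return sum(values) / len(values)


def fit_error(trials):
    """Error (59), summed over the trial types present in `trials`.

    `trials` maps a trial type (e.g. "compatible", "incompatible") to a dict
    with keys "ac_data", "ac_model", "rt_data", "rt_model". Each distance is
    divided by the mean of the data vector (assumed meaning of the overbar).
    """
    total = 0.0
    for t in trials.values():
        total += l2_distance(t["ac_data"], t["ac_model"]) / mean(t["ac_data"])
        total += l2_distance(t["rt_data"], t["rt_model"]) / mean(t["rt_data"])
    return total


# Made-up numbers, purely to exercise the function.
example = {
    "compatible": {
        "ac_data": [0.5, 1.0], "ac_model": [0.5, 0.8],
        "rt_data": [400.0, 600.0], "rt_model": [430.0, 560.0],
    },
    "incompatible": {
        "ac_data": [0.5, 1.0], "ac_model": [0.5, 1.0],
        "rt_data": [400.0, 600.0], "rt_model": [400.0, 600.0],
    },
}
```

Because each term is a norm over all bins, `fit_error(example)` grows with the number of bins, which is why the value should be normalized by bin count before fits to data sets with different binning are compared.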

Footnotes

¹ In N-alternative choice models, linear combinations of variables approximate (N - 1)-dimensional DD processes [Usher and McClelland, 2001, McMillen and Holmes, 2006].

References

Bogacz R, Brown E, Moehlis J, Holmes P, Cohen J. The physics of optimal decision making: A formal analysis of models of performance in two alternative forced choice tasks. Psychological Review. 2006;113(4):700–765. doi: 10.1037/0033-295X.113.4.700.
Bogacz R, Gurney K. The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Computation. 2007;19:442–477. doi: 10.1162/neco.2007.19.2.442.
Bogacz R, Hu P, Cohen J, Holmes P. Do humans select the speed-accuracy tradeoff maximizing reward rate? Submitted for publication. 2007.
Cohen J, Dunbar K, McClelland J. On the control of automatic processes: A parallel distributed processing model of the Stroop effect. Psychological Review. 1990;97(3):332–361. doi: 10.1037/0033-295x.97.3.332.
Cohen J, Servan-Schreiber D, McClelland J. A parallel distributed processing approach to automaticity. American Journal of Psychology. 1992;105:239–269.
Eckhoff P, Holmes P, Law C, Connolly P, Gold J. On diffusion processes with variable drift rates as models for decision making during learning. New Journal of Physics. 2007. In press. doi: 10.1088/1367-2630/10/1/015006.
Eriksen B, Eriksen C. Effects of noise letters upon the identification of a target letter in a non-search task. Perception and Psychophysics. 1974;16:143–149.
Gardiner C. Handbook of Stochastic Methods. Second Edition. Springer; New York: 1985.
Gold J, Shadlen M. Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Science. 2001;5(1):10–16. doi: 10.1016/s1364-6613(00)01567-9.
Gold J, Shadlen M. Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron. 2002;36:299–308. doi: 10.1016/s0896-6273(02)00971-6.
Gratton G, Coles M, Sirevaag E, Eriksen C, Donchin E. Pre- and poststimulus activation of response channels: a psychophysiological analysis. J. Exp. Psychol. Hum. Percept. Perform. 1988;14:331–344. doi: 10.1037//0096-1523.14.3.331.
Gratton G, Coles MGH, Donchin E. Optimizing the use of information: The strategic control of the activation of responses. J. Exp. Psych. General. 1992;121:480–506. doi: 10.1037//0096-3445.121.4.480.
Higham D. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Rev. 2001;43(3):525–546.
Holmes P, Shea-Brown E, Moehlis J, Bogacz R, Gao J, Aston-Jones G, Clayton E, Rajkowski J, Cohen J. Optimal decisions: From neural spikes, through stochastic differential equations, to behavior. IEICE Transactions on Fundamentals on Electronics, Communications and Computer Science. 2005;E88A(10):2496–2503.
Laming D. Information Theory of Choice-Reaction Times. Academic Press; New York: 1968.
Liu Y, Blostein S. Optimality of the sequential probability ratio test for nonstationary observations. IEEE Transactions on Information Theory. 1992;38(1):177–82.
Liu Y, Holmes P, Cohen J. A neural network model of the Eriksen task: Reduction, analysis, and data fitting. Neural Computation. 2007. In press. doi: 10.1162/neco.2007.08-06-313.
McMillen T, Holmes P. The dynamics of choice among multiple alternatives. J. Math. Psych. 2006;50:30–57.
Oksendal B. Stochastic Differential Equations. Springer; New York: 2002.
Platt M, Glimcher P. Neural correlates of decision variable in parietal cortex. Nature. 2001;400:233–238. doi: 10.1038/22268.
Ratcliff R. A theory of memory retrieval. Psych. Rev. 1978;85:59–108.
Ratcliff R, Smith P. A comparison of sequential sampling models for two-choice reaction time. Psychol. Rev. 2004;111:333–46. doi: 10.1037/0033-295X.111.2.333.
Ratcliff R, Van Zandt T, McKoon G. Connectionist and diffusion models of reaction time. Psych. Rev. 1999;106(2):261–300. doi: 10.1037/0033-295x.106.2.261.
Roitman J, Shadlen M. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 2002;22:9475–9489. doi: 10.1523/JNEUROSCI.22-21-09475.2002.
Schall J. Neural basis of deciding, choosing and acting. Nature Reviews in Neuroscience. 2001;2:33–42. doi: 10.1038/35049054.
Schall J, Stuphorn V, Brown J. Monitoring and control of action by the frontal lobes. Neuron. 2002;36:309–322. doi: 10.1016/s0896-6273(02)00964-9.
Servan-Schreiber D, Bruno R, Carter C, Cohen J. Dopamine and the mechanisms of cognition: Part I. A neural network model predicting dopamine effects on selective attention. Biological Psychiatry. 1998;43:713–722. doi: 10.1016/s0006-3223(97)00448-4.
Shadlen M, Newsome W. Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J. Neurophysiology. 2001;86:1916–1936. doi: 10.1152/jn.2001.86.4.1916.
Usher M, McClelland J. On the time course of perceptual choice: The leaky competing accumulator model. Psych. Rev. 2001;108:550–592. doi: 10.1037/0033-295x.108.3.550.
Wald A. Sequential Analysis. John Wiley & Sons; New York: 1947.
Wald A, Wolfowitz J. Optimum character of the sequential probability ratio test. Ann. Math. Statist. 1948;19:326–339.
Wang X-J. Probabilistic decision making by slow reverberation in cortical circuits. Neuron. 2002;36:955–968. doi: 10.1016/s0896-6273(02)01092-9.
Wong K-F, Wang X-J. A recurrent network mechanism of time integration in perceptual decisions. J. Neurosci. 2006;26:1314–1328. doi: 10.1523/JNEUROSCI.3733-05.2006.
Yu A, Cohen J, Dayan P. A Bayesian view of sensory conflicts in decision-making. Preprint, Center for the Study of Brain, Mind and Behavior, Princeton University. Submitted to J. Exp. Psych. Human Perception and Performance. 2007.
Yu A, Dayan P. Inference, attention and decision in a Bayesian neural architecture. In: Saul L, Weiss Y, Bottou L, editors. Advances in Neural Information Processing Systems. Vol. 17. MIT Press; Cambridge, MA: 2005. pp. 179–196.

