eLife. 2021 Jan 11;10:e55490. doi: 10.7554/eLife.55490

Lapses in perceptual decisions reflect exploration

Sashank Pisupati, Lital Chartarifsky-Lynn, Anup Khanal, Anne K Churchland
Editors: Daeyeol Lee, Joshua I Gold
PMCID: PMC7846276  PMID: 33427198

Abstract

Perceptual decision-makers often display a constant rate of errors independent of evidence strength. These ‘lapses’ are treated as a nuisance arising from noise tangential to the decision, e.g. inattention or motor errors. Here, we use a multisensory decision task in rats to demonstrate that these explanations cannot account for lapses’ stimulus dependence. We propose a novel explanation: lapses reflect a strategic trade-off between exploiting known rewarding actions and exploring uncertain ones. We tested this model’s predictions by selectively manipulating one action’s reward magnitude or probability. As uniquely predicted by this model, changes were restricted to lapses associated with that action. Finally, we show that lapses are a powerful tool for assigning decision-related computations to neural structures based on disruption experiments (here, posterior striatum and secondary motor cortex). These results suggest that lapses reflect an integral component of decision-making and are informative about action values in normal and disrupted brain states.

Research organism: Rat

Introduction

Perceptual decisions are often modeled using noisy ideal observers (e.g., Signal detection theory, Green and Swets, 1966; Bayesian decision theory, Dayan and Daw, 2008) that explain subjects’ errors as a consequence of noise in sensory evidence. This predicts an error rate that decreases with increasing sensory evidence, capturing the sigmoidal relationship often seen between evidence strength and subjects’ decision probabilities (i.e. the psychometric function).

Human and nonhuman subjects often deviate from these predictions, displaying an additional constant rate of errors independent of the evidence strength known as ‘lapses’, leading to errors even on extreme stimulus levels (Wichmann and Hill, 2001; Busse et al., 2011; Gold and Ding, 2013; Carandini and Churchland, 2013). Despite the knowledge that ignoring or improperly fitting lapses can lead to serious mis-estimation of psychometric parameters (Wichmann and Hill, 2001; Prins and Kingdom, 2018), the cognitive mechanisms underlying lapses remain poorly understood. A number of possible sources of noise have been proposed to explain lapses, typically tangential to the decision-making process.

One class of explanations for lapses relies on pre-decision noise added due to fluctuating attention, which is often operationalized as a small fraction of trials on which the subject fails to attend to the stimulus (Wichmann and Hill, 2001). On these trials, it is assumed that the subject cannot specify the stimulus (i.e. sensory noise with infinite variance, Bays et al., 2009) and hence guesses randomly or in proportion to prior beliefs. This model can be thought of as a limiting case of the Variable Precision model, which assumes that fluctuating attention has a more graded effect of scaling the sensory noise variance (Garrido et al., 2011), giving rise to heavy tailed estimate distributions, resembling lapses in the limit of high variability (Shen and Ma, 2019; Zhou et al., 2018). Temporal forms of inattention have also been proposed to give rise to lapses, where the animal ignores early or late parts of the evidence (impulsive or leaky integration, Erlich et al., 2015).

An alternative class of explanations for lapses relies on a fixed amount of noise added after a decision has been made, commonly referred to as ‘post-categorization’ noise (Erlich et al., 2015) or decision noise (Law and Gold, 2009). Such noise could arise from errors in motor execution (e.g. finger errors, Wichmann and Hill, 2001), non-stationarities in the decision rule arising from computational imprecision (Findling et al., 2018), suboptimal weighting of choice or outcome history (Roy et al., 2018; Busse et al., 2011), or random variability added for the purpose of exploration (e.g. ‘ε-greedy’ decision rules).

A number of recent observations have cast doubt on fixed early- or late-stage noise as satisfactory explanations for lapses. For instance, many of these explanations predict that lapses should occur at a constant rate, while in reality, lapses are known to reduce in frequency with learning in nonhuman primates (Law and Gold, 2009; Cloherty et al., 2019). Further, they can occur with different frequencies for different stimuli even within the same subject (in rodents, Nikbakht et al., 2018; and humans, Mihali et al., 2018; Bertolini et al., 2015; Flesch et al., 2018), suggesting that they may reflect task-specific, associative processes that can vary within a subject.

Lapse frequencies are even more variable across subjects and can depend on the subject’s age and state of brain function. For instance, lapses are significantly higher in children and patient populations than in healthy adult humans (Roach et al., 2004; Witton et al., 2017; Manning et al., 2018). Moreover, a number of recent studies in rodents have found that perturbing neural activity in secondary motor cortex (Erlich et al., 2015) and striatum (Yartsev et al., 2018; Guo et al., 2018) has dramatic, asymmetric effects on lapses in auditory decision-making tasks. Because these perturbations were made in structures known to be involved in action selection, an intriguing possibility is that lapses reflect an integral part of the decision-making process, rather than a peripheral source of noise. However, because these studies only tested auditory stimuli, they did not afford the opportunity to distinguish sensory modality-specific deficits from general decision-related deficits. Taken together, these observations point to the need for a deeper understanding of lapses that accounts for effects of stimulus set, learning, age, and neural perturbations.

Here, we leverage a multisensory decision-making task in rodents to reveal the inadequacy of traditional models. We challenge a key assumption of perceptual decision-making theories, i.e. subjects’ perfect knowledge of expected rewards (Dayan and Daw, 2008), to uncover a novel explanation for lapses: uncertainty-guided exploration, a well-known strategy for balancing exploration and exploitation in value-based decisions. We test the predictions of the exploration model for perceptual decisions by manipulating the magnitude and probability of reward under conditions of varying uncertainty. Finally, we demonstrate that suppressing secondary motor cortex or posterior striatum unilaterally has an asymmetric effect on lapses that generalizes across sensory modalities, but only in uncertain conditions. This can be accounted for by an action value deficit contralateral to the inactivated side, reconciling the proposed perceptual and value-related roles of these areas and suggesting that lapses are informative about the subjective values of actions, reflecting a core component of decision-making.

Results

Testing ideal observer predictions in perceptual decision-making

We leveraged an established decision-making task (Raposo et al., 2012; Sheppard et al., 2013; Licata et al., 2017) in which freely moving rats judge whether the fluctuating rate of a 1000 ms series of auditory clicks and/or visual flashes (rate range: 9–16 Hz) is high or low compared with an abstract category boundary of 12.5 Hz (Figure 1a–c). Using Bayesian decision theory, we constructed an ideal observer for our task that selects choices that maximize expected reward (see Materials and methods: Modeling). To test whether behavior matches ideal observer predictions, we presented multisensory trials with matched visual and auditory rates (i.e., both modalities carried the same number of events per second; Figure 1c, bottom) interleaved with visual-only or auditory-only trials. This allowed us to separately estimate the sensory noise in the animal’s visual and auditory system, and compare the measured performance on multisensory trials to the predictions of the ideal observer.

Figure 1. Testing ideal observer predictions in perceptual decision-making.

(a) Schematic drawing of rate discrimination task. Rats initiate trials by poking into a center port. Trials consist of visual stimuli presented via a panel of diffused LEDs, auditory stimuli presented via a centrally positioned speaker, or multisensory stimuli presented from both. Rats are rewarded with a 24 μL drop of water for reporting high-rate stimuli (greater than 12.5 Hz) with rightward choices and low-rate stimuli (lower than 12.5 Hz) with leftward choices. (b) Timeline of task events. (c) Example stimulus on auditory (top), visual (middle), and multisensory trials (bottom). Stimuli consist of a stream of events separated by long (100 ms) or short (50 ms) intervals. Multisensory stimuli consist of visual and auditory streams carrying the same underlying rate. Visual, auditory, and multisensory trials were randomly interleaved (40% visual, 40% auditory, and 20% multisensory). (d) Schematic outlining the computations of a Bayesian ideal observer. Stimulus belonging to a true category c with a true underlying rate s gives rise to noisy observations x_A and x_V, which are then integrated with each other and with prior beliefs to form a multisensory posterior belief about the category, and further combined with reward information to form expected action values Q_L, Q_R. The ideal observer selects the action â with maximum expected value. Lightning bolts denote proposed sources of noise that can give rise to (red) or exacerbate (gray) lapses, causing deviations from the ideal observer. (e) Posterior beliefs on an example trial assuming flat priors. Solid black line denotes true rate, and blue and green dotted lines denote noisy visual and auditory observations, with corresponding unisensory posteriors shown in solid blue and green. Solid red denotes the multisensory posterior, centered around the maximum a posteriori rate estimate in dotted red. Shaded fraction denotes the probability of the correct choice being rightward, with μ denoting the category boundary.
(f) Ideal observer predictions for the psychometric curve, that is, proportion of high-rate choices for each rate. Inverse slopes of the curves in each condition are reflective of the posterior widths on those conditions, assuming flat priors. The value on the abscissa corresponding to the curve’s midpoint indicates the subjective category boundary, assuming equal rewards and flat priors.

Performance was assessed using a psychometric curve, that is, the probability of high-rate decisions as a function of stimulus rate (Figure 1f). The ideal observer model predicts a relationship between the slope of the psychometric curve and noise in the animal’s estimate: the higher the standard deviation (σ) of sensory noise, the more uncertain the animal’s estimate of the rate and the shallower the psychometric curve. On multisensory trials, the ideal observer should have a more certain estimate of the rate (Figure 1e, visual [blue] and auditory [green] σ values are larger than multisensory σ [red]), driving a steeper psychometric curve (Figure 1f, red curve is steeper than green and blue curves). Since this model does not take lapses into account, it would predict perfect performance on the easiest stimuli on all conditions, and thus all curves should asymptote at 0 and 1 (Figure 1f).

Lapses cause deviations from ideal observer and are reduced on multisensory trials

In practice, the shapes of empirically obtained psychometric curves do not perfectly match the ideal observer (Figure 2) since they asymptote at values that are less than 1 or greater than 0. This is a well-known phenomenon in psychophysics (Wichmann and Hill, 2001), requiring two additional lapse parameters to precisely capture the asymptotes. To account for lapses, we fit a four-parameter psychometric function to the subjects’ choice data (Figure 2a – red, Equation 1 in Materials and methods) with the Palamedes toolbox (Prins and Kingdom, 2018). γ and λ are the lower and upper asymptotes of the psychometric function, which parameterize lapses on low and high rates respectively; ϕ is a sigmoidal function, in our case the cumulative normal distribution; x is the event rate, that is, the average number of flashes or beeps presented during the 1 s stimulus period; μ parameterizes the midpoint of the psychometric function and σ describes the inverse slope after correcting for lapses.
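The four-parameter function described above can be sketched in a few lines. This is a minimal illustration that assumes the standard form ψ(x) = γ + (1 − γ − λ)·Φ(x; μ, σ), in which the upper asymptote sits at 1 − λ; Equation 1 itself is not reproduced in this section, and the function names and parameter values below are illustrative assumptions, not the authors' Palamedes code.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution Phi(x; mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def psychometric(x, mu, sigma, gamma, lam):
    """Probability of a high-rate choice at event rate x (Hz).

    gamma: lapse rate on low rates (lower asymptote of the curve)
    lam:   lapse rate on high rates (upper asymptote is 1 - lam)
    mu:    midpoint of the lapse-corrected sigmoid
    sigma: inverse slope of the lapse-corrected sigmoid
    """
    return gamma + (1.0 - gamma - lam) * normal_cdf(x, mu, sigma)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of the curve.
    for rate in range(9, 17):
        p = psychometric(rate, mu=12.5, sigma=1.5, gamma=0.05, lam=0.10)
        print(rate, round(p, 3))
```

Setting γ = λ = 0 recovers the two-parameter ideal observer curve, whose asymptotes are 0 and 1.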

Figure 2. Deviations from ideal observer reflect lapses in judgment.

(a) Schematic psychometric performance of an ideal observer (black) vs. a model that includes lapses (red). The ideal observer model includes two parameters: midpoint (μ) and inverse slope (σ). The four-parameter model includes μ, σ, and lapse probabilities for low-rate (γ) and high-rate choices (λ). Dotted line shows the true category boundary (12.5 Hz). (b) Subject data was fit with a two-parameter model without lapses (black) and a four-parameter model with lapses (red). (c and d) Ideal observer predictions vs. measured multisensory sigma for fits with and without variable lapses across conditions. (c) Multisensory integration seems supra-optimal if lapses are not accounted for (no lapses, black), fixed across conditions (fixed lapses, purple), or assumed to be less than 0.1 (restricted lapses, yellow). (d) Optimal multisensory integration is restored when allowing lapses to vary freely across conditions (n = 17 rats. Points represent individual rats. Data points that lie on the unity line represent cases in which the measured sigma was equal to the optimal prediction). (e) Rats’ psychometric curves on auditory (green), visual (blue), and multisensory (red) trials. Points represent data pooled across 17 rats, and lines represent separate four-parameter fits to each condition. (f) Fit values of sigma (top) and lapse parameters (bottom) on unisensory and multisensory conditions. Both parameters showed significant reduction on the multisensory conditions (paired t-test, p<0.05); n = 17 rats (347,537 trials). (g) Model comparison using Bayes Information Criterion (pink) and Akaike Information Criterion (blue) for fits to pooled data across subjects (top) and to individual subject data (bottom). Lower scores indicate better fits. 
Both metrics favor a model where lapses are allowed to vary freely across conditions (‘Variable lapse’) over one without lapses (‘No lapses’), one with a fixed probability of lapses (‘Fixed lapse’), or where the lapses are restricted to being less than 0.1 (‘Restricted lapse’).

How can we be sure that the asymptotes seen in the data truly reflect nonzero asymptotes rather than fitting artifacts or insufficient data at the asymptotes? To test whether lapses were truly necessary to explain the behavior, we fit the curves with and without lapses (Figure 2b) and tested whether the lapse parameters were warranted. The fit without lapses was rejected in 15/17 rats by the Bayes Information Criterion (BIC) and in all rats by the Akaike Information Criterion (AIC). Fitting a fixed lapse rate across conditions was not sufficient to capture the data, nor was fitting a lapse rate that was constrained to be less than 0.1 (Wichmann and Hill, 2001). Both data pooled across subjects and individual subject data warranted fitting separate lapse rates to each condition (‘variable lapses’ model outperforms ‘fixed lapses’, ‘restricted lapses’ or ‘no lapses’ in 13/17 individuals based on BIC, all individuals based on AIC and in pooled data based on both, Figure 2g).
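As a rough sketch of how such a comparison works, the snippet below computes AIC and BIC from a binomial log-likelihood using their standard definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L). The helper names and the example numbers are hypothetical and are not taken from the study's data or code.

```python
import math


def binomial_log_likelihood(n_high, n_total, p_pred):
    """Log-likelihood of choice counts under predicted P(high) per stimulus.

    The binomial coefficient is omitted because it is identical across
    models fit to the same data and cancels in any comparison.
    """
    ll = 0.0
    for k, n, p in zip(n_high, n_total, p_pred):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_trials):
    """Bayes Information Criterion; lower is better."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical counts at three rates, and predictions from two toy models.
    n_high, n_total = [8, 50, 90], [100, 100, 100]
    no_lapse = [0.001, 0.5, 0.999]   # two-parameter model, asymptotes at 0 and 1
    with_lapse = [0.08, 0.5, 0.90]   # four-parameter model with lapses
    for name, pred, k in [("no lapses", no_lapse, 2), ("lapses", with_lapse, 4)]:
        ll = binomial_log_likelihood(n_high, n_total, pred)
        print(name, round(aic(ll, k), 1), round(bic(ll, k, sum(n_total)), 1))
```

BIC penalizes each extra parameter by ln n rather than 2, which is why it can reject a richer model that AIC accepts.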

Multisensory trials offer an additional, strong test of ideal observer predictions. In addition to perfect performance on the easiest stimuli, the ideal observer model predicts the minimum possible perceptual uncertainty achievable on multisensory trials through optimal integration (Ernst and Bülthoff, 2004; Equation 9 in Materials and methods). By definition, better-than-optimal performance is impossible. However, studies in humans, rodents, and nonhuman primates performing multisensory decision-making tasks suggest that in practice, performance occasionally exceeds optimal predictions (Raposo et al., 2012; Nikbakht et al., 2018; Hou et al., 2018), seeming, at first, to violate the ideal observer model. Moreover, in these data sets, performance on the easiest stimuli was not perfect and asymptotes deviated from 0 and 1. As in these previous studies, when we fit performance without lapses, multisensory performance was significantly supra-optimal (p=0.0012, paired t-test), i.e. better than the ideal observer prediction (Figure 2c, black points are above the unity line). This was also true when lapse probabilities were assumed to be fixed across conditions (p=0.0018, Figure 2c, purple) or when they were assumed to be less than 0.1 (p=0.0003, Figure 2c, yellow). However, when we allowed lapses to vary freely across conditions, performance was indistinguishable from optimal (Figure 2d, data points are on the unity line). This reaffirms that proper treatment of lapses is crucial for accurate estimation of perceptual parameters and offers a potential explanation for previous reports of supra-optimality.
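The optimal prediction used in this comparison can be illustrated with the standard cue-combination rule, in which the multisensory variance is σ_A²σ_V² / (σ_A² + σ_V²). The sketch below assumes that Equation 9 takes this standard form (it is not reproduced in this section); the function name and example values are illustrative.

```python
import math


def optimal_multisensory_sigma(sigma_a, sigma_v):
    """Minimum achievable sigma when two independent Gaussian cues are combined.

    Assumes the standard rule: 1/sigma_ms^2 = 1/sigma_a^2 + 1/sigma_v^2.
    """
    var_a, var_v = sigma_a ** 2, sigma_v ** 2
    return math.sqrt(var_a * var_v / (var_a + var_v))


if __name__ == "__main__":
    # Hypothetical unisensory sigmas (Hz); the prediction is never larger than
    # the smaller of the two unisensory values.
    print(optimal_multisensory_sigma(3.0, 4.0))
```

A measured multisensory σ below this prediction would appear supra-optimal, which is the pattern the lapse-free fits produced.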

Using this improved fitting method, we replicated previous observations (Raposo et al., 2012) showing that animals have improved sensitivity (lower σ) on multisensory vs. unisensory trials (Figure 2e, red curve is steeper than green/blue curves; Figure 2f, top). Interestingly, we observed that animals also had a lower lapse probability (λ+γ) on multisensory trials (Figure 2e, asymptotes for red curve are closer to 0 and 1; n = 17 rats, 347,537 trials). This was consistently observed across animals (Figure 2f, bottom; the probability of lapses on multisensory trials was 0.06 on average, compared to 0.17 on visual, p=1.4e-4 and 0.21 on auditory, p=1.5e-5). We also noticed that compared to unisensory trials, multisensory trials were slightly biased toward high rates. This bias may reflect that animals’ decisions do not exclusively depend on the rate of events, but are additionally weakly influenced by the total event count, as has been previously reported on a visual variant of the task (Odoemene et al., 2018).

Uncertainty-guided exploration offers a novel explanation for lapses where traditional explanations fail

What could account for the reduction in lapse probability on multisensory trials? While adding extra parameters to the ideal observer model fit the behavioral data well and accurately captured the reduction in inverse-slope on multisensory trials, this success does not provide an explanation for why lapses are present in the first place, nor why they differ between stimulus conditions.

To investigate this, we examined possible sources of noise that have traditionally been invoked to explain lapses (Figure 1d). The first of these explanations is that lapses might be due to a fixed amount of noise added once the decision has been made. These sources of noise could include decision noise due to imprecision (Findling et al., 2018) or motor errors (Wichmann and Hill, 2001). However, these sources should hinder decisions equally across stimulus conditions (Figure 3—figure supplement 1b), which cannot explain our observation of condition-dependent lapse rates (Figure 2f).

A second explanation is that lapses arise due to inattention on a small fraction of trials. Inattention would drive the animal to guess randomly, producing lapse rates whose sum should reflect the probability of not attending (Figure 3a, Materials and methods). According to this explanation, the lower lapse rate on multisensory trials could reflect increased attention on those trials, perhaps due to their increased bottom-up salience (i.e. two streams of stimuli instead of one). To examine this possibility, we leveraged a multisensory condition that has been used to manipulate perceptual uncertainty without changing salience in rats and humans (Raposo et al., 2012). Specifically, we interleaved standard matched-rate multisensory trials with ‘neutral’ multisensory trials for which the rate of the auditory stimulus ranged from 9 to 16 Hz, while the rate of the visual stimulus was always 12 Hz. This rate was so close to the category boundary (12.5 Hz) that it did not provide compelling evidence for one choice or the other (Figure 3d, left), thus reducing the information in the multisensory stimulus and increasing perceptual uncertainty on ‘neutral’ trials. However, since both ‘neutral’ and ‘matched’ conditions are multisensory, they should be equally salient, and since they are interleaved, the animal would be unable to identify the condition without actually attending to the stimulus. According to the inattention model, matched and neutral trials should have the same rate of lapses, only differing in their inverse-slope σ (Figure 3a, Figure 3—figure supplement 1c).
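The inattention prediction can be made concrete with a small mixture sketch: on attended trials the choice follows a cumulative normal, and on unattended trials the observer guesses "high" with a fixed bias. The names `p_attend` and `p_bias`, and all numerical values, are assumptions made for illustration rather than the fitted model from Materials and methods.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution Phi(x; mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def inattention_p_high(x, mu, sigma, p_attend, p_bias):
    """P(high choice): attended trials use the stimulus, the rest are guesses."""
    return p_attend * normal_cdf(x, mu, sigma) + (1.0 - p_attend) * p_bias


def lapse_rates(p_attend, p_bias):
    """Asymptotic lapse rates implied by the mixture.

    gamma: P(high choice) on the lowest rates
    lam:   P(low choice) on the highest rates
    Their sum is 1 - p_attend and does not depend on sigma.
    """
    gamma = (1.0 - p_attend) * p_bias
    lam = (1.0 - p_attend) * (1.0 - p_bias)
    return gamma, lam


if __name__ == "__main__":
    # Matched and neutral trials share p_attend (equal salience) and differ
    # only in sigma, so the curves differ in slope but share their asymptotes.
    for label, sigma in [("matched", 1.0), ("neutral", 2.5)]:
        curve = [inattention_p_high(r, 12.5, sigma, 0.9, 0.5) for r in (9, 12.5, 16)]
        print(label, [round(p, 3) for p in curve], lapse_rates(0.9, 0.5))
```

Because σ never enters `lapse_rates`, any difference in lapses between matched and neutral trials would count against this model.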

Figure 3. Uncertainty-guided exploration offers a novel explanation for lapses.

Models of lapses in decision-making: (a) Inattention model of lapses. Left panel: Observer’s posterior belief about rate. On a large fraction of trials given by p_attend, the observer attends to the stimulus and has a peaked belief about the rate whose width reflects perceptual uncertainty (red curve on matched trials, orange curve on neutral trials), but on a small fraction of trials given by 1 − p_attend, the observer does not attend to the stimulus (black curve), leading to equal posterior beliefs of rates being high or low (shaded, clear regions of curves respectively) and guesses according to the probability bias, giving rise to lapses (right panel). The sum of lapse rates then reflects 1 − p_attend, while their ratio reflects the bias. Since matched and neutral trials are equally salient, they are expected to have the same p_attend and hence similar overall lapse rates. (b) Fixed error model of lapses. Lapses could arise due to motor errors occurring on ε fraction of trials, or from decision rules that explore on a fixed proportion ε of trials (black), rather than always maximizing reward (blue). The sum of lapses reflects ε while their ratio reflects any bias in motor errors or exploration, leading to a fixed rate of lapses across conditions. (c) Uncertainty-guided exploration model. Lapses can also arise from more sophisticated exploratory decision rules such as the ‘softmax’ decision rule. Since the difference in expected value from right and left actions (Q_R − Q_L) is bounded by the maximum reward magnitudes r_Right and r_Left, even when the stimulus is very easy, the maximum probability of choosing the higher value option is not 1, giving rise to lapses. Lapse rates on either side are then proportional to the reward magnitude on that side, and to a ‘temperature’ parameter β that is modulated by the uncertainty in action values. Conditions with higher overall perceptual uncertainty (e.g. neutral, orange) are expected to have higher value uncertainty, and hence higher lapses. (d) Left: multisensory stimuli designed to distinguish between attentional and non-attentional sources of lapses. Standard multisensory stimuli with matched visual and auditory rates (top) and ‘neutral’ stimuli where one modality has a rate very close to the category boundary and is uninformative (bottom). Both stimuli are multisensory and designed to have equal bottom-up salience, and can only be distinguished by attending to them and accumulating evidence. Right: rat performance on interleaved matched (red) and neutral (orange) trials. (e) Model fits (solid lines) overlaid on average data points. Deviations from model fits are denoted with arrows. The exploration model (bottom) provides a better fit than the inattention model (top), since it predicts higher lapse rates on neutral trials (orange). (f) Model comparison using Bayes Information Criterion (pink) and Akaike Information Criterion (blue) both favor the uncertainty-guided exploration model for pooled data (top) as well as individual subject data (bottom).

Figure 3—figure supplement 1. Uncertainty-dependent exploration is the only model that accounts for behavioral data from all three manipulations.

Columns: data/predictions for three experimental manipulations. Left: unisensory (blue, green) vs. multisensory (red). Middle: matched (red) vs. neutral (orange) multisensory. Right: increased (green) or decreased (red) rightward reward vs. equal reward (black) on auditory trials. (a–d): Four candidate models. (a) Ideal observer model predicts no lapses and only changes in sensitivity/bias across conditions. (b) Fixed motor error model predicts a constant rate of lapses across conditions in addition to changes in sensitivity/bias predicted from the ideal observer. (c) Inattention model predicts that the overall lapse rate (sum of lapses on both sides) depends on the level of bottom-up attentional salience, allowing for different rates for unisensory and multisensory trials. It also predicts that the lapse rate on neutral trials should be equal to that on multisensory trials, and that manipulating rightward reward should affect both lapse rates. (d) Uncertainty-dependent exploration model predicts that overall lapse rate depends on the level of exploratoriness and hence uncertainty associated with that condition, allowing for different lapse rates on unisensory and multisensory trials. It also predicts that the lapse rate on neutral trials should be equal to that on auditory trials and manipulating rightward reward should only affect high-rate lapses. (e) Data from an example rat on all three manipulations.
Figure 3—figure supplement 2. Thompson sampling, which balances exploration and exploitation, predicts lapses that increase with perceptual noise.

(a) Formulation of perceptual decision-making task as a partially observable contextual bandit. To solve this task, an observer needs to infer the true category of the stimulus (Low or High) based on noisy observations, and pick the best action given the inferred category (Left for Low, Right for High). This requires accurately learning the expected rewards from all four state-action pairs. (b) Schematic illustrating the explore–exploit tradeoff: Leftward state-action value beliefs i.e. expected reward from leftward actions (L) performed in different states (Hi, Lo) showing different levels of uncertainty depending on policy. Beliefs are updated based on outcomes using a Bayesian update rule that takes into account uncertainty in state estimation. A greedy policy (top left) that always picks the best action maximizes reward and learns well about the preferred state-action pairs (i.e. Lo-L) but has high uncertainty about the non-preferred pairs (Hi–L). A random policy (top right) earns reward at chance, but learns equally well about all state-action pairs. An ε-greedy policy (bottom left) learns well about the non-preferred pair, but leaves the choice of ε unspecified, and continues exploring even after it has learnt the values well, continuing to forego rewards. Thompson sampling (bottom right) tunes the amount of exploration to the current uncertainties in each value, and balances immediately reward-maximizing decisions with decisions that reduce uncertainty, maximizing average reward in the long term. (c) Cumulative regret, i.e. foregone reward accrued by different policies on the rate discrimination task as a function of training, with lower regret being more desirable. Black – random exploration, Pink – greedy, Purple – ε-greedy, and Yellow – Thompson sampling. Thompson sampling outperforms all other policies by achieving the minimum regret. (d) Learnt beliefs about expected reward with Thompson sampling at various levels of perceptual uncertainty. 
Low levels of sensory noise (left top) produce more separable beliefs, while higher levels of sensory noise (left bottom) lead to large perceptual uncertainty, yielding highly overlapping belief distributions owing to a reduced ability to assign obtained rewards to one of the states. (Right) Simulated performance averaged across 2000 trials of the Bayesian observer, under a Thompson sampling policy. The observer makes fewer exploratory choices for lower levels of sensory noise (orange) owing to the more separable value beliefs, giving rise to lower lapse rates. (e) Session-averaged lapse rates as a function of sensory noise in simulations (left, center) and multisensory rat data (right). Simulations were done under increasing levels of sensory noise (colors going from hot to cold) under beliefs that action values are stationary (left) or non-stationary (center), solid lines indicate linear best-fit. Individual rat data was fit with a constrained version of the exploration model where total lapse rate was constrained to be linearly related to sensory noise across all modality conditions (auditory – green, multisensory – red, and visual – blue). Lines indicate best fit linear constraints for each rat. (f) Learnt beliefs about expected reward with Thompson sampling during early (left top) and late (left bottom) stages of training. Training reduces uncertainty about expected rewards, producing more separable beliefs and yielding less exploration and lower lapse rates over time (right – simulated average performance). (g) Session-wise lapse rates in simulated (left, center) and rat data (right) as a function of both training and sensory noise. Simulations show decreasing lapse rates over training that asymptote at zero under stationary beliefs (left) and to nonzero values dictated by sensory noise under non-stationary beliefs (center). 
Rat data was separated by session starting from the earliest day of training with all three modalities, and combined across rats to produce session-wise fits, and the resulting lapse rates were fit with an exponential curve for each modality (solid lines indicate best-fit curves for multisensory – red, visual – blue, and auditory – green).
Figure 3—figure supplement 3. Uncertainty guided exploration outperforms competing models for average and individual data.

(a) Fits of the four models (ideal observer, fixed motor error, inattention, and exploration) to average rat data on unisensory (blue – visual, green – auditory) and multisensory (red) trials. (b) Exploration model fits to unisensory and multisensory data for 17 individual animals. (c) Model comparison for individual animals using Bayes Information Criterion (BIC; left), Akaike Information Criterion (AIC; right) of the four aforementioned models, plus a constrained version of the exploration model corresponding to Thompson sampling. Darker colors are lower BICs/AICs, denoting a better fit. (d) Summed model comparison metrics across animals, showing that inattention and exploration models fit the data equally well, and much better than the ideal observer or fixed error models. Thompson sampling is preferred by BIC, since it fits as well as exploration model but with fewer effective parameters. (e) Fits of the four models to average data including neutral trials (orange) provide a stronger test of the inattention model. (f) Exploration model fits to multisensory data including neutral trials for five individual animals. (g) Model comparison for individual animals. (h) Summed model comparison metrics across animals shows that the uncertainty-guided exploration model performs better than other models.

Contrary to this prediction, we observed higher lapse rates in the ‘neutral’ condition, where trials had higher perceptual uncertainty on average, compared to the ‘matched’ condition (Figure 3d). This correlation between the average perceptual uncertainty in a condition and its frequency of lapses was reminiscent of the correlation observed while comparing unisensory and multisensory trials (Figure 2e,f; Figure 3—figure supplement 1e).

Having observed that traditional explanations of lapses fail to account for the behavioral observations, we re-examined a key assumption of ideal observer models used in perceptual decision-making – that subjects have complete knowledge about the rules and rewards (Dayan and Daw, 2008). In general, this assumption may not hold true for a number of reasons – even when the stimulus category is known with certainty, subjects might have uncertainty about the values of different actions because they are still in the process of learning (Law and Gold, 2009), because they incorrectly assume that their environment is nonstationary (Yu and Cohen, 2008), or because they forget over time (Gershman, 2015; Drugowitsch and Pouget, 2018). In such situations, rather than always ‘exploiting’ (i.e. picking the action currently assumed to have the highest value), it is advantageous to ‘explore’ (i.e. occasionally pick actions whose value the subject is uncertain about), in order to gather more information and maximize reward in the long term (Dayan and Daw, 2008). Exploratory choices of the lower value action for the easiest stimuli would resemble lapses, and the sum of lapses would reflect the overall degree of exploration.

Choosing how often to explore is challenging and requires trading off immediate rewards for potential gains in information – random exploration would reward subjects at chance, but would reduce uncertainty uniformly about the value of all possible stimulus-action pairs, while a greedy policy (i.e. always exploiting) would yield many immediate rewards while leaving lower value stimulus-action pairs highly uncertain (Figure 3—figure supplement 2a,b). Policies that explore randomly on a small fraction of trials (e.g. ‘ε-Greedy’ policies) do not make prescriptions about how often the subject should explore, and are behaviorally indistinguishable from motor errors when the fraction is fixed (Figure 3b). One elegant way to automatically balance exploration and exploitation is to explore more often when one is more uncertain about action values. In particular, a form of uncertainty-guided exploration called Thompson sampling is asymptotically optimal in many general environments (Leike et al., 2016), achieving lower regret than other forms of exploration (Figure 3—figure supplement 2c). This can be thought of as a dynamic ‘softmax’ policy (Figure 3c), whose ‘inverse temperature’ parameter (β) scales with uncertainty (Gershman, 2018). This predicts a lower β when values are more uncertain, encouraging more exploration and more frequent lapses, and a higher β when values are more certain, encouraging exploitation. The limiting case of perfect knowledge (β → ∞) reduces to the reward-maximizing ideal observer.
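The link between value uncertainty and lapses under Thompson sampling can be illustrated with a minimal sketch. It assumes, purely for illustration, that the subject's beliefs about the two action values are independent Gaussians; this assumption, the function names, and the numbers are hypothetical and are not the fitted model. Under that assumption, the probability that a sample from the lower-value action exceeds a sample from the higher-value action has a closed form, and it grows as the beliefs become more uncertain.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def thompson_lapse_prob(mean_best, mean_other, sd_best, sd_other):
    """Probability that Thompson sampling picks the lower-mean action.

    Assumes (hypothetically) independent Gaussian beliefs over the two
    action values. One value is sampled from each belief and the larger
    sample wins, so the result is P(sample_other > sample_best).
    """
    total_sd = math.sqrt(sd_best ** 2 + sd_other ** 2)
    if total_sd == 0.0:
        # Perfect knowledge: always exploit (ties are split evenly).
        if mean_other == mean_best:
            return 0.5
        return 1.0 if mean_other > mean_best else 0.0
    return normal_cdf((mean_other - mean_best) / total_sd)


if __name__ == "__main__":
    # Same mean values, increasingly uncertain beliefs (hypothetical numbers).
    for sd in (0.0, 0.1, 0.5, 1.0):
        print(sd, thompson_lapse_prob(1.0, 0.0, sd, sd))
```

In this sketch the combined belief width plays the role of an inverse of β: narrow beliefs give near-deterministic exploitation, while broad beliefs give frequent choices of the lower-value action even when the stimulus category is known.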

Subjects’ uncertainty about stimulus-action values is compounded by perceptual uncertainty – on trials where the stimulus category is not fully known, credit cannot be unambiguously assigned to one stimulus-action pair when rewards are obtained and value uncertainty is only marginally reduced. Hence conditions where trials have higher perceptual uncertainty on average (e.g. unisensory or neutral trials) will have more overlapping value beliefs, encouraging more exploration and giving rise to more frequent lapses (Figure 3—figure supplement 2d).

As a result, on neutral multisensory trials, the uncertainty-guided exploration model predicts an increase not only in the inverse slope parameter σ, but also in the rate of lapses, just as we observed (Figure 3d). In fact, this model predicts that both slope and lapse parameters on neutral trials should match those on auditory trials, since these conditions have comparable levels of perceptual uncertainty. The data was well fit by the exploration model (Figure 3e, bottom) and satisfied both predictions (Figure 4—source data 1: Neutral has higher σ and lower β than Multisensory, and matched σ and β to Auditory). By contrast, the inattention model predicts that both conditions would have the same lapse rates, with the neutral condition simply having a larger inverse slope σ. This model provided a worse fit to the data, particularly missing the data at extreme stimulus values where lapses are most clearly apparent (Figure 3e, top). Model comparison using BIC and AIC favored the exploration model over the inattention model, both for fits to pooled data across subjects (Figure 3f, top) and for fits to individual subject data (Figure 3f, bottom; Figure 3—figure supplement 3), for the 3/5 subjects rejected by the ideal observer model, that is, those with sizable lapses. Both predictions of the exploration model were confirmed using unconstrained descriptive fits to individuals, and held up for 4/5 subjects.
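The descriptive parameters discussed here (the inverse slope σ and the lapse rates at the two extremes) can be made concrete with a standard cumulative-Gaussian psychometric function with asymmetric lapses. This is a generic sketch and not necessarily the exact parameterization used for the fits reported here; the function name, the category boundary, and all numerical values are hypothetical.

```python
import math


def psychometric(x, mu, sigma, lapse_low, lapse_high):
    """Probability of a high-rate choice at stimulus level x.

    mu         : category boundary (point of subjective equality)
    sigma      : inverse slope (larger sigma = shallower curve)
    lapse_low  : lower asymptote, P(high-rate choice) at the lowest rates
    lapse_high : 1 minus the upper asymptote, P(low-rate choice) at the
                 highest rates
    """
    phi = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return lapse_low + (1.0 - lapse_low - lapse_high) * phi


if __name__ == "__main__":
    # Hypothetical parameters: a shallower curve with larger lapses
    # (as described for neutral trials) versus a steeper curve with
    # smaller lapses (as described for matched multisensory trials).
    for x in (9, 11, 12.5, 14, 16):
        matched = psychometric(x, 12.5, 1.5, 0.03, 0.03)
        neutral = psychometric(x, 12.5, 2.5, 0.10, 0.10)
        print(x, round(matched, 3), round(neutral, 3))
```

Changing σ alone changes the steepness of the curve but leaves the asymptotes untouched, which is why an increase in lapses on neutral trials cannot be absorbed by the slope parameter.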

To further understand the precise relationship between perceptual uncertainty and lapses under this form of exploration, we simulated learning in a Thompson sampling agent for various levels of sensory noise, and found a roughly linear relationship between sensory noise and average lapse rate. Hence we fit a constrained version of the exploration model to the multisensory data from 17 rats, where the degree of exploratory lapses was constrained to be a linear function of that condition’s sensory noise (with two free parameters – slope and intercept, rather than three free parameters for the three conditions). This model yielded lower BIC than the unconstrained exploration model in all 14 of the 17 rats that were rejected by the ideal observer model (Figure 3—figure supplement 3c), and yielded similar slope and intercept parameters across animals (Figure 3—figure supplement 2e).
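To see why a constrained model can be preferred by BIC when it fits about as well as the unconstrained one, it helps to write out the two criteria. The sketch below uses the standard definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L); the log-likelihood, parameter counts, and trial count are hypothetical placeholders, not values from the fits.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_trials):
    """Bayes Information Criterion: lower is better."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical example: both models reach the same log-likelihood,
    # but the constrained model describes lapses with two parameters
    # (slope, intercept) instead of three (one per condition).
    log_lik = -5000.0   # hypothetical
    n_trials = 10000    # hypothetical
    other_params = 5    # hypothetical count of shared non-lapse parameters
    print("constrained  ", bic(log_lik, other_params + 2, n_trials))
    print("unconstrained", bic(log_lik, other_params + 3, n_trials))
```

With equal likelihoods, the model with fewer parameters always has the lower BIC, and the size of the penalty per extra parameter grows with the logarithm of the number of trials.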

Reward manipulations confirm predictions of exploration model

One of the key claims of the uncertainty-guided exploration model is that lapses are exploratory choices made with full knowledge of the stimulus, and should therefore depend only on the expected rewards associated with that stimulus category (Figure 3—figure supplement 2). This is in stark contrast to the inattention model and many other kinds of disengagement (Figure 4—figure supplement 1), according to which lapses are caused by the observer disregarding the stimulus, and hence lapses at the two extreme stimulus levels are influenced by a common underlying guessing process that depends on expected rewards from both stimulus categories. This is also in contrast to fixed error models such as motor error or ε-greedy models in which lapses are independent of expected reward (Figure 3b).

Therefore, a unique prediction of the exploration model is that selectively manipulating expected rewards associated with one of the stimulus categories should only change the explore–exploit tradeoff for that stimulus category, selectively affecting lapses at one extreme of the psychometric function. Conversely, inattention and other kinds of disengagement predict that both lapses should be affected, while fixed error models predict that neither should be affected (Figure 4a, Figure 3—figure supplement 1, Figure 4—figure supplement 1).
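The three qualitative predictions can be sketched at the two extremes of the psychometric function, where the stimulus category is assumed to be known with certainty. The code below is a toy illustration rather than the fitted models: the softmax over a rewarded and an unrewarded action, the reward-matching guess used for inattention, and the values of `beta`, `p_inattend`, and `epsilon` are all assumptions made here for illustration. The reward sizes (16, 24, and 36 μL) are the ones used in the magnitude manipulation.

```python
import math


def exploration_lapses(r_left, r_right, beta):
    """Softmax over stimulus-specific values (incorrect action worth 0).

    Returns (lapse_low, lapse_high): the probability of a rightward
    choice at the lowest rate and of a leftward choice at the highest rate.
    """
    lapse_low = 1.0 / (1.0 + math.exp(beta * r_left))
    lapse_high = 1.0 / (1.0 + math.exp(beta * r_right))
    return lapse_low, lapse_high


def inattention_lapses(r_left, r_right, p_inattend):
    """Stimulus ignored on a fraction of trials, followed by one common guess.

    The guess is assumed (hypothetically) to match relative reward sizes.
    """
    guess_right = r_right / (r_left + r_right)
    return p_inattend * guess_right, p_inattend * (1.0 - guess_right)


def fixed_error_lapses(r_left, r_right, epsilon):
    """Motor error or epsilon-greedy: lapses do not depend on reward."""
    return epsilon / 2.0, epsilon / 2.0


if __name__ == "__main__":
    for label, r_right in (("equal", 24), ("increased", 36), ("decreased", 16)):
        print(label,
              exploration_lapses(24, r_right, beta=0.1),
              inattention_lapses(24, r_right, p_inattend=0.2),
              fixed_error_lapses(24, r_right, epsilon=0.1))
```

In this sketch, changing the rightward reward moves only the high-rate lapse under exploration, moves both lapses in opposite directions under inattention, and moves neither under fixed error.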

Figure 4. Reward manipulations match predictions of the exploration model.

(a) The inattention, fixed error, and exploration models make different predictions for increases and decreases in the reward magnitude for rightward (high-rate) actions. The inattention model (left panel) predicts changes in lapses for both high- and low-rate choices, while fixed error models such as motor error or ε-greedy (center) predict changes in neither lapse, and the uncertainty-dependent exploration model (right) predicts changes in lapses only for high-rate choices. Black line denotes equal rewards on both sides; green, increased rightward reward; red, decreased rightward reward. (b) Schematic of rate discrimination trials and interleaved ‘sure bet’ trials. The majority of the trials (94%) were rate discrimination trials as described in Figure 1. On sure-bet trials, a pure tone was played during a 0.2 s fixation period and one of the side ports was illuminated once the tone ended to indicate that reward was available there. Rate discrimination and sure-bet trials were randomly interleaved, as were left and right trials, and the rightward reward magnitude was either increased (36 μL) or decreased (16 μL) while maintaining the leftward reward at 24 μL. (c) Rats’ behavior on rate discrimination trials following reward magnitude manipulations. High-rate lapses decrease when water reward for high-rate choices is increased (left panel; n = 3 rats, 6976 trials), while high-rate lapses increase when reward on that side is decreased (right panel; n = 3 rats, 11,164 trials). Solid curves are exploration model fits with a single parameter change accounting for the manipulation. (d) Rats show nearly perfect performance on sure-bet trials and are unaffected by reward manipulations on these trials. (e) Reward probability manipulation. (Left) Schematic of probabilistic reward trials: incorrect (leftward) choices on high rates were rewarded with a probability of 0.5, and all other rewards were left unchanged. (Right) Rats’ behavior and exploration model fits showing a selective increase in high-rate lapses (n = 5 rats, 34,292 trials). (f) Rats’ behavior on equal reward trials conditioned on successes (green) or failures (red) on the right on the previous trials resembles effects of reward size manipulations. (g) Model comparison showing that Akaike Information Criterion and Bayes Information Criterion both favor the exploration model on data from all three manipulations.

Figure 4—source data 1. Fit parameters to pooled data across rats.

Figure 4—figure supplement 1. Alternative models of inattentional lapses.

Predictions of alternative models of lapses. (a) Effort-dependent disengagement model: In this model, there is an additional cost or mental effort to being engaged in the task which could vary with condition, and an additional random guessing action. If the net payoff of engagement is not greater than the average value of a guess, then the observer guesses randomly. Such a model does not produce lapses if the effort is fixed across trials (left), but could produce lapses if the effort fluctuates from trial to trial (center). (b) Proportion of trials on which the animal withdrew prematurely does not vary between matched and neutral trials, suggesting that rats are not disengaging preferentially on neutral trials. (c) Predictions of the effort-dependent disengagement model. The model accurately predicts increased lapses on unisensory trials (left panel, green/blue traces) and neutral multisensory trials (middle left panel, orange trace). However, for asymmetric reward manipulations (middle right – reward magnitude, right – reward probability), the model fails to predict our behavioral observation (Figure 4d) that only lapses on the manipulated side are affected. (d) Temporal inattention model: in this model, temporal weighting of evidence differs between matched and neutral trials. To test this, we compared psychophysical kernels on matched and neutral trials. The temporal dynamics of attention are unchanged between the two kinds of trials, arguing against the temporal inattention model. (e) Variable precision model: in this model, the sensory noise (or its inverse, precision) fluctuates from trial to trial, producing heavy-tailed performance curves with apparent ‘lapses’. The model accurately predicts increased apparent lapses on unisensory trials (left panel, green/blue traces) and neutral multisensory trials (middle left panel, orange trace). However, for asymmetric reward manipulations (middle right, right), the model fails to predict our behavioral observation (Figure 4d) that lapses only on the manipulated side are affected. Like other models of inattention, it predicts that manipulating reward on one side should affect both lapses. (f) Motivation + salience-dependent inattention: in this model, inattention is determined not just by salience, but also motivation, which in turn depends on average reward. This model’s predictions on unisensory, multisensory (left) and neutral (middle left) trials are identical to the inattention model, but on asymmetric reward manipulations, it predicts that total lapse rate should change as a function of total reward. As a result, when reward magnitude on one side is increased or decreased (middle right), total lapse rate also increases or decreases, in addition to the vertical shifts predicted by inattention. However, on the reward probability manipulation (right), it predicts a *decrease* in total lapse rate owing to the overall higher average reward, in addition to a downward shift predicted by inattention, unlike the rat data (Figure 4e) where overall lapse rate *increases* as a consequence of high-rate lapses selectively *increasing*.
Figure 4—figure supplement 2. Psychometric functions with lapses make it possible to assign perturbation effects to specific stages of decision-making.

(a) (Top row) Model predictions for biased sensory evidence (left), enhanced rightward action value (center), and reduced effort in performing rightward movements (right) in an exploratory regime where lapses are sizeable. The three kinds of perturbations affect decisions at the sensory, value, or motor stages and predict different effects on lapses. (Middle row) Effects of the three manipulations on the four stimulus-action value pairs. Biasing rightward evidence (left) leaves stimulus-action value pairs unchanged, while biasing the learnt rightward values (center) selectively affects rightward action values on high rates and biasing rightward effort (right) affects both high- and low-rate action values equally. (Bottom row) All three perturbations reduce to the same effect (horizontal shift) in the absence of lapses, that is, in the exploit regime. (b) Example data from two rats that experienced the same perturbation: increased rewards on the right port. The rats differ in the extent to which their psychometric functions have lapses. Top: In a psychometric function with lapses, the perturbation (green trace) leads to an interpretable change: the asymmetric change in lapses is only consistent with the explanation that the perturbation enhanced the value of rightward choices (as in [a], top, middle). The perturbation did not drive a change consistent with biased evidence or biased effort. Bottom: In a psychometric function with negligible lapses, the perturbation (red trials) leads to a cryptic change in the psychometric function: the observed shift could equivalently have been driven by biased evidence, value, or effort (as in [a], bottom three panels). Therefore, although the perturbation likely caused the same change in the two rats, an experimenter is only able to accurately explain this change in a rat with lapses.

To test these predictions experimentally, we tested rats on the rate discrimination task with asymmetric rewards (Figure 4b, top). Instead of rewarding high- and low-rate choices equally, we increased the water amount on the reward port associated with high rates (rightward choices) so that it was 1.5 times larger than before, without changing the reward on the low-rate side (leftward choices). In a second rat cohort we did the opposite: we devalued the choices associated with high-rate trials by decreasing the water amount on that side port so that it was 1.5 times smaller than before, without changing the reward on the low-rate side.

The animals’ behavior on the asymmetric-reward task matched the predictions of the exploration model. Increasing the reward size on choices associated with high rates led to a decrease in lapses for the highest rates and no changes in lapses for the lowest rates (Figure 4c, left; n = 3 rats, 6976 trials). Decreasing the reward of choices associated with high rates led to an increase in lapses for the highest rates and no changes in lapses for the lowest rates (Figure 4c, right; n = 3 rats, 11,164 trials). This shows that both increasing and decreasing the value of actions associated with one of the stimulus categories selectively affects lapses on that stimulus category, unlike the predictions of the inattention model.

A key claim of the uncertainty-guided exploration model is that the effects of reward manipulations on lapses arise from a selective shift in the trade-off between exploiting the most rewarding action and exploring uncertain ones, rather than from a non-selective bias toward the side with bigger rewards. Importantly, the model predicts that in the absence of uncertainty, decisions should be perfectly exploitative and unaffected by reward imbalances, since subjects would always be comparing perfectly certain, nonzero rewards to zero. To determine whether the effects that we observed were truly driven by uncertainty, we examined performance on randomly interleaved ‘sure bet’ trials on which the uncertainty was very low (Figure 4b, bottom). On these trials, a pure tone was played during the fixation period, after which an LED at one of the side ports was clearly illuminated, indicating a reward. Sure-bet trials comprised 6% of the total trials, and as with the rate discrimination trials, left and right trials were interleaved. Owing to the low perceptual uncertainty and consequently low value uncertainty, the model predicts that animals would quickly reach an ‘exploit’ regime, achieving perfect performance on these trials. Importantly, our model predicts that performance on these ‘sure-bet’ trials would be unaffected by imbalances in reward magnitude, since the ‘exploit’ action remains unchanged.

In keeping with this prediction, performance on sure-bet trials was near perfect (rightward probabilities of 0.003 [0.001,0.01] and 0.989 [0.978,0.995] on go-left and go-right trials respectively) and unaffected following reward manipulations (Figure 4d: Rightward probabilities of 0.004 [0.001, 0.014] and 0.996 [0.986,0.999] on increased reward, 0.006 [0.003,0.012] and 0.99 [0.983,0.994] on decreased reward). This suggests that the effects of reward manipulations that we observed (Figure 4c) are not a default consequence of reward imbalance, but a consequence of a reward-dependent trade-off between exploitation and uncertainty-guided exploration.

As an additional test of the model, we manipulated expected rewards by probabilistically rewarding incorrect choices for one of the stimulus categories. Here, leftward choices on high-rate (‘go right’) trials were rewarded with a probability of 0.5, while leaving all other rewards unchanged (Figure 4e left). The exploration model predicts that this should selectively increase the value of leftward actions on high-rate trials, hence shifting the trade-off toward exploration on high rates and increasing high-rate lapses. Indeed, this is what we observed (Figure 4e right, n = 5 animals, 347,537 trials), and the effect was strikingly similar to the decreased reward experiment, even though the two manipulations affect high-rate action values through changes on opposite actions. This experiment in particular distinguishes the exploration model from motivation-dependent models of disengagement or inattention in which overall reward modulates the total lapse rate through a nonspecific process that averages over stimulus categories (Figure 4—figure supplement 1a–c,f). Moreover, this suggests that lapses reflect changes in stimulus-specific action value caused by changing either reward magnitudes or reward probabilities, as one would expect from the exploration model. Across experiments (Figure 4—source data 1) and individuals, these changes were captured by selectively changing the relevant baseline action value in the model, despite variability in these baselines.

An added consequence of uncertainty in action values is that it should encourage continued learning even in the absence of explicit reward manipulations. This means that animals should continue to use the outcomes of previous trials to update the values of different actions as long as this uncertainty persists. Such persistent learning has been observed in a number of studies (Busse et al., 2011; Lak et al., 2018; Mendonca et al., 2018; Odoemene et al., 2018; Pinto et al., 2018; Scott et al., 2015). The uncertainty-dependent exploration model predicts that the effect of recent outcome history on action values should manifest as changes in lapse rates, rather than as horizontal biases caused by irrelevant, non-sensory evidence as is often assumed (Busse et al., 2011). For example, the action value of rightward choices should increase following a rightward success, producing similar changes to lapses as increased rightward reward magnitude. As predicted, trials following rewarded and unrewarded rightward choices showed decreased and increased lapses, respectively (Figure 4f; same rats and trials as in Figure 2e). Taken together, manipulations of value confirm the predictions of the uncertainty-dependent exploration model (Figure 4g).

Lapses are a powerful tool for assigning decision-related computations to neural structures based on disruption experiments

The results of the behavioral manipulations (above) predict that unilateral disruption of neural regions that leads to a one-sided scaling of learnt stimulus-action values should affect lapse rates asymmetrically. In contrast, disruptions to areas that process sensory evidence would lead to horizontal biases without affecting action values or lapses, and disruptions to motor areas that make one of the actions harder to perform irrespective of the stimulus would affect both lapses (Figure 4—figure supplement 2a top, middle). Crucially, in the absence of lapses, all three of these disruptions would drive an identical behavioral effect, a horizontal shift of the psychometric function (Figure 4—figure supplement 2a bottom). Indeed, the same reward manipulations that gave rise to distinct value biases in rats with sizeable lapses (Figure 4—figure supplement 2b top) led to horizontal shifts indistinguishable from sensory biases in highly trained rats with negligible lapses on multisensory trials (Figure 4—figure supplement 2b bottom). This suggests that lapses are actually informative about decision-making computations and can be used as a tool to determine which computations are affected by disruptions of a candidate brain region. To demonstrate this, we identified two candidate areas, secondary motor cortex (M2) and posterior striatum (pStr), that receive convergent input from primary visual and auditory cortices (Figure 5—figure supplement 1, results of simultaneous anterograde tracing from V1 and A1; also see Jiang and Kim, 2018; Barthas and Kwan, 2017). In previous work, disruptions of these areas had effects on auditory decisions, including changes in lapses (Erlich et al., 2015; Guo et al., 2018). However, considerable controversy remains as to which computations were affected by those disruptions. 
The effects were largely interpreted in terms of traditional ideal observer models (see Siniscalchi et al., 2019 for a notable exception), and thus attributed to perceptual biases (Guo et al., 2018), leaky accumulation (Erlich et al., 2015) or post-categorization biases (Piet et al., 2017; Erlich et al., 2015). Notably, the asymmetric effects on lapses seen in these studies resembled the effects of the reward manipulations in our task, hinting that they may actually arise from action value changes. Importantly, these existing studies used only auditory stimuli, so were limited in their ability to distinguish sensory-specific deficits from action value deficits.

Here, we used analyses of lapses to determine the decision-related computations altered by unilateral disruption of M2 and pStr. If these disruptions affected action values, the exploration model makes three strong predictions. First, because action values are computed late in the decision-making process, the model predicts that the effects should not depend on the modality of the stimulus. We therefore performed disruptions in animals doing interleaved auditory, visual, and multisensory trials. If pStr and M2 indeed compute action value, then following unilateral disruption of these areas, our model should capture changes to all three modalities by a single parameter change to the contralateral action value. Second, these disruptions should selectively affect lapses on stimuli associated with contralateral actions, irrespective of the stimulus-response contingency. To test this, we performed disruptions on animals trained on standard and reversed contingencies. Finally, because altered action values should have no effect when there is no uncertainty and consequently no exploration, disruption to pStr and M2 should spare performance on sure-bet trials (Figure 4b, bottom).

We suppressed activity of neurons in each of these areas using muscimol, a GABA-A agonist, during our multisensory rate discrimination task. We implanted bilateral cannulae in M2 (Figure 5a, Figure 5—figure supplement 2b; n = 5 rats; +2 mm AP, 1.3 mm ML, 0.3 mm DV) and pStr (Figure 5a, Figure 5—figure supplement 2a; n = 6 rats; −3.2 mm AP, 5.4 mm ML, 4.1 mm DV). On control days, rats were infused unilaterally with saline, followed by unilateral muscimol infusion the next day (M2: 0.1–0.5 μg; pStr: 0.075–0.125 μg). We compared performance on the multisensory rate discrimination task for muscimol days with preceding saline days. Inactivation of the side associated with low-rate choices biased the animals to make more low-rate choices (Figure 5b; left six panels: empty circles, inactivation sessions; full circles, control sessions), while inactivation of the side associated with high rates biased them to make more high-rate choices (Figure 5b, right six panels). The inactivations largely affected lapses on the stimulus rates associated with contralateral actions, while sparing those associated with ipsilateral actions (Figure 5c). These results recapitulated previous findings and were strikingly similar to the effects we observed following reward manipulations (as seen in Figure 4c, right panel). These effects were seen across areas (Figure 5b, top, M2; bottom, pStr) and modalities (Figure 5b; green, auditory; blue, visual; and red, multisensory).

Figure 5. Inactivation of secondary motor cortex and posterior striatum affects lapses, suggesting a role in action value encoding.

(a) Schematic of cannulae implants in M2 (top) and pStr (bottom) and representative coronal slices. For illustration purposes only, the schematic shows implants in the right hemisphere; however, the inactivations shown in panel (b) were performed unilaterally on both hemispheres. (b) Unilateral inactivation of M2 (top) and pStr (bottom). Left six plots: inactivation of the side associated with low rates shows increased lapses for high rates on visual (blue), auditory (green), and multisensory (red) trials (M2: n = 5 rats; 10,329 control trials, full line; 6174 inactivation trials, dotted line; pStr: n = 5 rats; 10,419 control trials; 6079 inactivation trials). Right six plots: inactivation of the side associated with high rates shows increased lapses for low rates on visual, auditory, and multisensory trials (M2: n = 3 rats; 5678 control trials; 3816 inactivation trials; pStr: n = 6 rats; 11,333 control trials; 6838 inactivation trials). Solid lines are exploration model fits, accounting for inactivation effects across all three modalities by scaling all contralateral values by a single parameter. (c) Increased high-rate lapses following unilateral inactivation of the side associated with low rates (top left); no change in low-rate lapses (bottom left) and vice versa for inactivation of the side associated with high rates (top, bottom right). Control data on the abscissa is plotted against inactivation data on the ordinate. Same animals as in b. Green, auditory trials; blue, visual trials; red, multisensory trials. Abbreviations: posterior striatum (pStr), secondary motor cortex (M2). (d) Sure-bet trials are unaffected following inactivation. Pooled data shows that rats that were inactivated on the side associated with high rates make near perfect rightward and leftward choices. Top, M2 (three rats); bottom, pStr (six rats). (e) Model comparison of three possible multisensory deficits – reduction of contralateral evidence by a fixed amount (left), reduction of contralateral value by a fixed amount (center), or an increased contralateral effort by a fixed amount (right). Both Akaike Information Criterion and Bayes Information Criterion suggest a value deficit. (f) Proposed computational role of M2 and striatum. Lateralized encoding of left and right action values by right and left M2/pStr (bottom) explains the asymmetric effect of unilateral inactivations on lapses (top).

Figure 5—figure supplement 1. pStr and M2 receive direct projections from visual and auditory cortex.

(a) Schematic of tracing experiments. AAV2.CB7.CI.EGFP.WPRE.RBG and AAV2.CAG.tdTomato.WPRE.SV40 constructs were injected unilaterally to primary visual (V1) and auditory (A1) cortices, respectively (V1 coordinates: 6.9 mm posterior to Bregma; 4.2 mm to the right of midline; A1 coordinates: 4.7 mm posterior to Bregma; 7 mm to the right of midline). (b) Secondary motor cortex (M2) receives inputs from V1 and A1 as shown by green and red fluorescence. (c) Posterior striatum (pStr) receives direct inputs from V1 and A1 as shown by green and red fluorescence. Yellow signal medial to pStr reflects overlapping passing fibers.
Figure 5—figure supplement 2. Histological slices of implanted rats.

Representative coronal slices of all rats implanted with cannulae for muscimol inactivation experiments. (a) Six rats were bilaterally implanted in posterior striatum (pStr). (b) Five rats were implanted in secondary motor cortex (M2).
Figure 5—figure supplement 3. Single rat performance following M2 inactivation.

Left: inactivation of the low-rate associated side. Rat shows increased lapses on high-rate trials on all sensory modalities. Right: inactivation of the high-rate associated side. Rat shows increased lapses on low-rate trials on all sensory modalities. Auditory (green), visual (blue), and multisensory (red).
Figure 5—figure supplement 4. Single rat performance following pStr inactivation.

Left: inactivation of the low-rate associated side. Rat shows increased lapses on high-rate trials on all sensory modalities. Right: inactivation of the high-rate associated side. Rat shows increased lapses on low-rate trials on all sensory modalities. Auditory (green), visual (blue), and multisensory (red).
Figure 5—figure supplement 5. Unilateral inactivation of M2 or pStr biases performance ipsilaterally and increases contralateral lapses.

Performance of the same rats shown in Figure 5b depicted as a function of the inactivated side (right or left) and the rate-contingency in which they were trained (standard or reverse), along with fits from the biased value model (solid lines – saline, dotted lines – muscimol). Standard contingency: high rate = go right, low rate = go left; reverse contingency: high rate = go left, low rate = go right. Each quadrant shows four plots: three psychometrics for rate discrimination trials and one for performance on sure-bet trials. Auditory (green), visual (blue), and multisensory (red). (a–d) M2 inactivation. (e–h) pStr inactivation. (a), (e) Rats trained on the standard contingency and inactivated on the left hemisphere show increased lapses on the high rates (i.e., fewer rightward choices on high rates). No effect on sure-bet trials. (b), (f) Rats trained on the standard contingency and inactivated on the right hemisphere show increased lapses on the low rates (i.e., fewer leftward choices on low rates). No effect on sure-bet trials. (c), (g) Rats trained on the reverse contingency and inactivated on the left hemisphere show increased lapses on the low rates (i.e., fewer rightward choices on low rates). No effect on sure-bet trials. No data for this condition for M2 inactivation. (d), (h) Rats trained on the reverse contingency and inactivated on the right hemisphere show increased lapses on the high rates (i.e., fewer leftward choices on high rates). No effect on sure-bet trials for pStr inactivated animals; no data for M2 inactivated animals.
Figure 5—figure supplement 6. Inactivations devalue contralateral actions irrespective of associated stimulus.

(a) Model predictions for rightward inactivations on standard (top) and reversed (bottom) stimulus-response contingencies – in both cases, the model predicts that reduced leftward action values should only affect lapses on the side associated with leftward movements. (b) Inactivation data on visual trials from M2 (left) or pStr (right) along with fits from the biased value model (solid lines – saline, dotted lines – muscimol) shows a pattern of effects consistent with action value deficits, irrespective of the contingency.
Figure 5—figure supplement 7. No significant effect on movement parameters following muscimol inactivation.

(a) Mean movement times from the center port to the side ports were not significantly different following muscimol inactivation of M2 (left; p=0.9554 for contralateral, 0.9852 for ipsilateral movements; n = 5 rats) or pStr (right; p=0.6629 for contra, p=0.2615 for ipsi, n = 6 rats). Control data on the abscissa is plotted against inactivation data on the ordinate. Purple, movement toward the side ipsilateral to the inactivation site; blue, movement toward the side contralateral to the inactivation site; error bars (s.e.m.) are not visible because they were obscured by the markers in all cases. (b) Mean wait times in the center port were not significantly different following muscimol inactivation of M2 (left; p=0.7612 for contra, p=0.8896 for ipsi, n = 5 rats) or pStr (right; p=0.9128 for contra, p=0.9412 for ipsi, n = 6 rats). All p-values were computed from paired t-tests. Error bars (s.e.m.) are not visible because they were obscured by the markers in all cases.

Fitting averaged data across rats with the exploration model revealed that, in keeping with the first model prediction, the effects on lapses in all modalities could be captured by scaling the contralateral action value by a single parameter (Figure 5b, joint fits to control [solid lines] and inactivation trials [dotted lines] across modalities with the ‘biased value’ model, differing only by a single parameter), similar to the reward manipulation experiments. Animals that were inactivated on the side associated with high rates showed increased lapses on low-rate trials (Figure 5c, bottom right; data points are above the unity line; n = 9 rats), but unchanged lapses on high-rate trials (Figure 5c, top right; data points are on the unity line). This was consistent across areas and modalities (Figure 5c; M2, triangles; pStr, circles; blue, visual; green, auditory; red, multisensory). Similarly, animals that were inactivated on the side associated with low rates showed the opposite effect: increased lapses on high-rate trials (Figure 5c, top left; n = 10 rats), while lapses did not change for low-rate trials (Figure 5c, bottom left). Fits to individual animals revealed that the majority of animals were best fit by the ‘biased value’ model (6/8 rats in M2 – Figure 5—figure supplement 3, 7/11 in pStr – Figure 5—figure supplement 4), and the remaining animals were best fit by the ‘biased effort’ model.

In keeping with the second prediction, when we compared the effects of the disruptions in animals trained on standard and reversed contingencies (low rates rewarded with leftward or rightward actions respectively), the effects were always restricted to lapses on the stimuli associated with the side contralateral to the inactivation (Figure 5—figure supplement 5), always resembling a devaluation of contralateral actions (Figure 5—figure supplement 6).

A model comparison across rats revealed that a fixed multiplicative scaling of contralateral value captured the inactivation effects much better than a fixed reduction in contralateral sensory evidence or a fixed addition of contralateral motor effort, both for M2 (Figure 5e, top) and pStr (Figure 5e, bottom). In uncertain conditions, this reduced contralateral value gives rise to more exploratory choices and hence more lapses on one side (Figure 5f, top).

The final prediction of the exploration model is that changes in action value will only affect trials in which there was uncertainty about the outcome. In keeping with that prediction, performance was spared on sure-bet trials (Figure 5d): rats made correct rightward and leftward choices regardless of the side that was inactivated. This observation provides further reassurance that the changes we observed on more uncertain conditions did not simply reflect motor impairments that drove a tendency to favor ipsilateral movements. Additional movement parameters such as wait time in the center port and movement times to ipsilateral and contralateral reward ports were likewise largely spared (Figure 5—figure supplement 7), suggesting that effects on decision outcome were not due to an inactivation-induced motor impairment.

Together, these results demonstrate that lapses are a powerful tool for interpreting behavioral changes in disruption experiments. For M2 and pStr disruptions, our analysis of lapses and deployment of the exploration model allowed us to reconcile previous inactivation studies. Our results suggest that M2 and pStr have a lateralized, modality-independent role in computing the expected value of actions (Figure 5f, bottom).

Discussion

Perceptual decision-makers have long been known to display a small fraction of errors even on easy trials. Until now, these ‘lapses’ were largely regarded as a nuisance and lacked a comprehensive, normative explanation. Here, we propose a novel explanation for lapses: that they reflect a strategic balance between exploiting known rewarding options and exploring uncertain ones. Our model makes strong predictions for lapses under diverse decision-making contexts, which we have tested here. First, the model predicts more lapses on conditions with higher perceptual uncertainty, such as unisensory (Figure 2) or neutral (Figure 3), compared to matched multisensory or sure-bet conditions. Second, the model predicts that stimulus-specific reward manipulations should produce stimulus-specific effects on lapses, sparing decisions about un-manipulated or highly certain stimulus-action pairs (Figure 4). Finally, the model predicts that lapses should be affected by perturbations to brain regions that encode action value. Accordingly, we observed that inactivations of secondary motor cortex and posterior striatum affected lapses similarly across auditory, visual and multisensory decisions, and could be accounted for by a one-parameter change to the action value (Figure 5). Taken together, our model and experimental data argue strongly that far from being a nuisance, lapses are informative about animals’ subjective action values and reflect a trade-off between exploration and exploitation.

Considerations of value have provided many useful insights into aspects of behavior that seem sub-optimal at first glance from the perspective of perceptual ideal observers. For instance, many perceptual tasks are designed with accuracy in mind – defining an ideal observer as one who maximizes accuracy, in line with classical signal detection theory. However, in practice, the success or failure of different actions may be of unequal value to subjects, especially if reward or punishment is delivered explicitly, as is often the case with nonhuman subjects. This may give rise to biases that can only be explained by an observer that maximizes expected utility (Dayan and Daw, 2008). Similarly, outcomes on a given trial can influence decisions about stimuli on subsequent trials through reinforcement learning, giving rise to serial biases. These biases occur even though the ideal observer should treat the evidence on successive trials as independent (Lak et al., 2018; Mendonca et al., 2018). When subjects can control how long they sample the stimulus, subjects maximizing reward rate may choose to make premature decisions, sacrificing accuracy for speed (Bogacz et al., 2006; Drugowitsch et al., 2014). Finally, additional costs of exercising mental effort could lead to bounded optimality through ‘satisficing’ or finding good enough solutions (Mastrogiorgio and Petracca, 2018; Fan et al., 2018).

Here, we take further inspiration from considerations of value to provide a novel, normative explanation for lapses in perceptual decisions. Our results argue that lapses are not simply accidental errors made as a consequence of attentional ‘blinks’ or motor ‘slips’, but can reflect a deliberate, internal source of behavioral variability that facilitates learning and information gathering when the values of different actions are uncertain. This explanation connects a well-known strategy in value-based decision-making to a previously mysterious phenomenon in perceptual decision-making.

Although exploration no longer yields the maximum utility on any given trial, it is critical for environments in which there is uncertainty about expected reward or stimulus-response contingency, especially if these need to be learnt or refined through experience. By encouraging subjects to sample multiple options, exploration can potentially improve subjects’ knowledge of the rules of the task, helping them to increase long-term utility. This offers an explanation for the higher rate of lapses seen in humans on tasks with abstract (Raposo et al., 2012), non-intuitive (Mihali et al., 2018), or non-verbalizable (Flesch et al., 2018) rules. Exploration is also critical for dynamic environments in which rules or rewards drift or change over time. Subjects adapted to such dynamic real-world environments might entertain the possibility of non-stationarity even in tasks or periods where rewards are truly stationary, and such mismatched beliefs predict residual levels of exploration even in well-trained subjects (Figure 3—figure supplement 2g, middle). Such beliefs could be probed by challenging subjects with unsignalled changes in rewards and measuring how quickly they recover from these change-points. For instance, primates with higher levels of tonic exploration on cognitive set-shifting tasks (Ebitz et al., 2019) are more flexible and make fewer perseverative errors at change-points, at the cost of more lapses in rule adherence during stable periods.

Balancing exploration and exploitation is computationally challenging, and the mechanism we propose here, Thompson sampling, is an elegant heuristic for achieving this balance. This strategy has been shown to be utilized by humans in value-based decision-making tasks (Wilson et al., 2014; Speekenbrink and Konstantinidis, 2015; Gershman, 2018) and is asymptotically optimal even in partially observable environments involving perceptual uncertainty such as ours (Figure 3—figure supplement 2c, Leike et al., 2016). It can be naturally implemented through a sampling scheme where the subject samples action values from a learnt distribution and then maximizes with respect to the sample. This strategy predicts that conditions with higher perceptual uncertainty and consequently higher value uncertainty should have more exploration, and consequently higher lapse rates, explaining the pattern of lapse rates we observed on unisensory vs. multisensory trials as well as on neutral vs. matched trials. A lower rate of lapses on multisensory trials has also been reported on a visual-tactile task in rats (Nikbakht et al., 2018) and a vestibular integration task in humans (Bertolini et al., 2015) and can potentially account for the apparent supra-optimal integration that has been reported in a number of rodent, nonhuman primate and human studies (Nikbakht et al., 2018; Hou et al., 2018; Raposo et al., 2012). A strong prediction of uncertainty guided exploration is that the animal should quickly learn to exploit on conditions with little or no uncertainty, as we observed on sure-bet trials (Figures 4d and 5d).

Uncertainty-guided exploration also predicts that exploratory choices, and consequently lapses, should decrease with training as the animal becomes more certain of the rules and expected rewards, explaining training-dependent effects on lapses in our rats (Figure 3—figure supplement 2g, right) and similar effects reported in primates (Law and Gold, 2009; Cloherty et al., 2019). This can also potentially explain why children have higher lapse rates (Witton et al., 2017; Manning et al., 2018), as they have been shown to be more exploratory in their decisions than adults (Lucas et al., 2014).

A unique prediction of the exploration model is that one-sided reward manipulations should have one-sided effects on lapses, unlike the inattention or motor error models. These predictions are borne out in our data (Figure 4c); moreover, they offer a principled, theoretically grounded way to distinguish between different sources of lapses. This approach can be extended to connect richer statistical descriptions of behavior to psychological variables such as evidence and action value. For instance, some authors have proposed that some of the variance attributed to lapses can be accounted for by allowing psychometric parameters to drift across trials (Roy et al., 2018) or switch between different settings (Ashwood et al., 2019). Whether this parametric non-stationarity arises from non-stationary evidence weighting across trials caused by inattention, variable attention (Shen and Ma, 2019) or attention to irrelevant evidence (Busse et al., 2011), or whether it arises from non-stationary beliefs about action values that encourage continued learning (Lak et al., 2018) and bouts of exploration (Ebitz et al., 2019) can be tested using one-sided reward manipulations, and by extending our model to include trial-by-trial updates of action value based on the history of evidence and outcomes (Pisupati et al., 2019). By decoupling the values of different actions on the two stimulus categories, one-sided reward manipulations distinguish between incorrect decisions made due to a lack of knowledge about the stimulus category (i.e. inattention) and those made despite this knowledge, due to uncertainty about action values (i.e. exploration). An alternative way to decouple these two kinds of errors would be to offer subjects additional actions, e.g. by adding explicit ‘opt-out’ actions (Zatka-Haas et al., 2019), or by adding task-irrelevant actions that subjects need to learn to avoid (Mihali et al., 2018), affording more opportunities to distinguish exploratory and inattentive decisions than tasks with two alternative actions.

In addition to diagnosing or remedying lapses, the exploration model can be used to harness lapses to pinpoint decision-making computations in the brain. Our model suggests that the asymmetric effects on lapses seen during unilateral inactivations of prefrontal and striatal regions (Figure 5b) arise from a selective devaluation of learnt contralateral stimulus-action values. This interpretation reconciles a number of studies that have found asymmetric effects of inactivating these areas during perceptual decisions (Erlich et al., 2015; Zatka-Haas et al., 2019; Wang et al., 2018; Guo et al., 2018) with their established roles in encoding action value (Sul et al., 2011) during value-based decisions, and strengthens previous proposals that these areas arbitrate between perceptual and value-based influences on decisions (Lee et al., 2015; Barthas and Kwan, 2017; Siniscalchi et al., 2019). The effects of inactivation in these studies are consistent with a ‘devaluation’ deficit, or multiplicative scaling of learnt stimulus-action values, resembling the majority of our inactivations (6/8 rats in M2, 7/11 in pStr) and selectively affecting lapses on stimuli strongly associated with the devalued actions. However, inactivations sometimes resembled additive deficits in action value (2/8 rats in M2, 4/11 in pStr), akin to an added ‘effort’ in performing the associated action irrespective of its learnt value, consistent with some reports in striatum (Tai et al., 2012). Further work will be needed to precisely understand the nature of value representations in these regions and why they are sometimes multiplicatively and sometimes additively impacted by inactivations.

An open question that remains is how the brain might tune the degree of exploration in proportion to uncertainty. An intriguing candidate for this is dopamine, whose phasic responses have been shown to reflect state uncertainty (Starkweather et al., 2017; Babayan et al., 2018; Lak et al., 2018), and whose tonic levels have been shown to modulate exploration in mice on a lever-press task (Beeler et al., 2010), and context-dependent song variability in songbirds (Leblois et al., 2010). Dopaminergic genes have been shown to predict individual differences in uncertainty-guided exploration in humans (Frank et al., 2009), and dopaminergic disorders such as Parkinson’s disease have been shown to disrupt the uncertainty-dependence of lapses across conditions on a multisensory task (Bertolini et al., 2015), while L-Dopa, a Parkinson’s drug and dopamine precursor, has been shown to attenuate uncertainty-guided exploration (Chakroun et al., 2019). Patients with ADHD, another disorder associated with dopaminergic dysfunction, have been shown to display both increased perceptual variability and increased task-irrelevant motor output, a measure that correlates with lapses (Mihali et al., 2018). Finally, tonic exploration and lapses of rule adherence are reduced in nonhuman primates that are administered cocaine (Ebitz et al., 2019), which interferes with dopamine transport. A promising avenue for future studies is to leverage the informativeness of lapses and the precise control of uncertainty afforded by multisensory tasks, in conjunction with perturbations or recordings of dopaminergic circuitry, to further elucidate the connections between perceptual and value-based decision-making systems.

Materials and methods

Key resources table.

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Strain, strain background (Rattus norvegicus domestica, male and female) | Long-Evans Rat | Taconic Farms | SimTac:LE | TAC: LONGEV-M, TAC: LONGEV-F |
| Recombinant DNA reagent | AAV2.CB7.CI.EGFP.WPRE.RBG | UPenn Vector Core | | Obtained from the laboratory of Dr. Partha Mitra at CSHL |
| Recombinant DNA reagent | AAV2.CAG.tdTomato.WPRE.SV40 | UPenn Vector Core | | Obtained from the laboratory of Dr. Partha Mitra at CSHL |
| Chemical compound, drug | Muscimol | abcam | ab120094 | |
| Software, algorithm | PALAMEDES toolbox | Prins and Kingdom, 2018 | doi: 10.3389/fpsyg.2018.01250 | |
| Software, algorithm | MATLAB | The Mathworks, Inc | | |

Behavior

Animal subjects and housing

All animal procedures and experiments were in accordance with the National Institutes of Health's Guide for the Care and Use of Laboratory Animals and were approved by the Cold Spring Harbor Laboratory Animal Care and Use Committee. Experiments were conducted with 34 adult male and female Long Evans rats (250–350 g, Taconic Farms) that were housed with free access to food and restricted access to water starting from the onset of behavioral training. Rats were housed on a reversed light–dark cycle; experiments were run during the dark part of the cycle. Rats were pair-housed during the whole training period.

Animal training and behavioral task

Rats were trained following previously established methods (Raposo et al., 2012; Sheppard et al., 2013; Raposo et al., 2014; Licata et al., 2017). Briefly, rats were trained to wait in the center port for 1000 ms while stimuli were presented, and to associate stimuli with left/right reward ports. Stimuli for each trial consisted of a series of events: auditory clicks from a centrally positioned speaker, full-field visual flashes, or both together. Stimulus events were separated by either long (100 ms) or short (50 ms) intervals. For the easiest trials, all inter-event intervals were identical, generating rates that were nine events per second (all long intervals) or 16 events per second (all short intervals). More difficult trials included a mixture of long and short intervals, generating stimulus rates that were intermediate between the two extremes and therefore more difficult for the animal to judge. The stimulus began after a variable delay following the moment the rat's snout broke the infrared beam in the center port. The length of this delay was selected from a truncated exponential distribution (λ = 30 ms, minimum = 10 ms, maximum = 200 ms) to generate an approximately flat hazard function. The total time of the stimulus was usually 1000 ms. Trials of all modalities and stimulus strengths were interleaved. For multisensory trials, the same number of auditory and visual events were presented (except for a subset of neutral trials). Auditory and visual stimulus event times were generated independently, as our previous work has demonstrated that rats make nearly identical decisions regardless of whether stimulus events are presented synchronously or independently (Raposo et al., 2012). For most experiments, rats were rewarded with a drop of water for moving to the left reward port following low-rate trials and to the right reward port following high-rate trials.
For muscimol inactivation experiments, half of the rats were rewarded according to the reverse contingency. Animals typically completed between 700 and 1200 trials per day. Most experiments had 24 conditions (three modalities, eight stimulus strengths), leading to 29–50 trials per condition per day.
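The pre-stimulus delay described above can be sketched as follows. This is a minimal illustration and not the task-control code used in the experiments: it assumes that the reported λ = 30 ms is the scale (mean) of the untruncated exponential and that truncation is implemented by rejection; the function name is hypothetical.

```python
import random


def sample_prestimulus_delay_ms(rng, scale_ms=30.0, min_ms=10.0, max_ms=200.0):
    """Draw one delay (ms) from an exponential truncated to [min_ms, max_ms].

    Assumption: scale_ms is the mean of the untruncated exponential, and
    out-of-range draws are simply rejected and redrawn.
    """
    while True:
        delay = rng.expovariate(1.0 / scale_ms)  # rate parameter = 1 / scale
        if min_ms <= delay <= max_ms:
            return delay


rng = random.Random(0)  # fixed seed so the sketch is reproducible
delays = [sample_prestimulus_delay_ms(rng) for _ in range(1000)]
```

Because an exponential distribution has a constant hazard rate, delays drawn this way keep stimulus onset roughly unpredictable within the truncation window.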

To probe the effect of uncertainty on lapses, rats received catch trials consisting of multisensory neutral trials, where only the auditory modality provided evidence for a particular choice, whereas the visual modality provided evidence that was so close to the category boundary (12 Hz) that it did not support one choice or the other (Raposo et al., 2012).

To probe the effect of value on lapses, we manipulated either reward magnitude or reward probability associated with high rates, while keeping low-rate trials unchanged. To increase or decrease reward magnitude associated with high rates, the amount of water dispensed on the right port was increased or decreased to 36 μL or 16 μL respectively, while the reward on the left port was maintained at 24 μL. To manipulate reward probability, we occasionally rewarded rats on the (incorrect) left port on high-rate trials with a probability of 0.5. The right port was still rewarded with a probability of 1 on high rates, and reward probabilities on low-rate trials were unchanged (1 on the left port, 0 on the right).

Analysis of behavioral data

Psychometric curves

Descriptive four-parameter psychometric functions were fit to choice data using the Palamedes toolbox (Prins and Kingdom, 2018). Psychometric functions were parameterized as:

$$
\psi(x;\mu,\sigma,\gamma,\lambda) = \phi(x;\mu,\sigma)\,(1-\lambda-\gamma) + \gamma \tag{1}
$$

where γ and 1−λ are the lower and upper asymptotes of the psychometric function, so that γ and λ parametrize the lapse rates on low and high rates, respectively. ϕ is a cumulative normal function; x is the event rate, that is, the number of flashes or beeps presented during the 1 s stimulus period; μ parametrizes the x-value at the midpoint of the psychometric function and σ describes the inverse slope. 95% confidence intervals on these parameters were generated via bootstrapping based on 1000 simulations.
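As a minimal sketch of Equation 1 (the fits reported here used the Palamedes toolbox in MATLAB, not this code), the four-parameter function can be evaluated as follows. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def psychometric(x, mu, sigma, gamma, lam):
    """Four-parameter psychometric function (Equation 1).

    Returns P(high-rate choice) at event rate x: a cumulative normal with
    midpoint mu and inverse slope sigma, rescaled to lie between gamma
    (lower asymptote) and 1 - lam (upper asymptote).
    """
    phi = NormalDist(mu, sigma).cdf(x)
    return gamma + (1.0 - gamma - lam) * phi


# Hypothetical parameter values, for illustration only.
rates = [9, 10, 11, 12, 13, 14, 15, 16]
curve = [psychometric(r, mu=12.5, sigma=1.5, gamma=0.05, lam=0.10) for r in rates]
```

With these values the curve rises monotonically from near γ at the lowest rates toward 1−λ at the highest rates.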

Our definition of lapses is restricted to strictly asymptotic errors following Wichmann and Hill, 2001, and not simply errors on the easiest stimuli tested. Errors on the easiest stimuli could in general arise not just from lapses (strictly defined) but also from perceptual errors caused by low sensitivity to the stimulus, an insufficient stimulus range or non-stationary weights (Busse et al., 2011; Roy et al., 2018). However, we do not consider easy errors alone to be evidence of lapses and only consider asymptotic errors. To confirm the necessity of including the lapse parameters, we fit the following variants of the model above, including lapse parameters when warranted by model comparison using AIC/BIC:

No lapses

This model forces λ=γ=0 for all conditions (visual, auditory, and multisensory) and only allows σ and μ parameters to vary across conditions.

Fixed lapses

This model allows for a fixed λ and γ (which may be unequal) across all conditions.

Restricted lapses

This model allows λ and γ to vary across conditions, but restricts λ+γ to be less than 0.1. This corresponds to an often used prior over total lapse rates, embodying the belief that lapse trials are infrequent (Wichmann and Hill, 2001; Prins and Kingdom, 2018).

Variable lapses

This model allows both λ and γ to vary freely across conditions, allowing them each to take any value between 0 and 1 (as long as their sum also lies between 0 and 1).
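The model comparison among these variants can be sketched as below. This is a minimal illustration of the standard AIC and BIC definitions applied to binomial choice data, not the analysis code used here; the function names are hypothetical, and the binomial coefficient is omitted from the log-likelihood because it is identical across models fit to the same data.

```python
import math


def choice_log_likelihood(n_high, n_total, p_high):
    """Log-likelihood of high-rate choice counts given predicted probabilities.

    n_high[i] of n_total[i] trials in condition i were high-rate choices;
    p_high[i] is the model's predicted choice probability for that condition.
    """
    log_lik = 0.0
    for k, n, p in zip(n_high, n_total, p_high):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        log_lik += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return log_lik


def aic(log_lik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_trials):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_trials) - 2.0 * log_lik
```

A variant with extra lapse parameters is preferred only if its gain in log-likelihood outweighs the penalty for those parameters.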

Modeling

Ideal observer model

We can specify an ideal observer model for our task using Bayesian Decision Theory (Dayan and Daw, 2008). This observer maintains probability distributions over previously experienced stimuli and choices, computes the posterior probability of each action being correct given its observations, and picks the action that yields the highest expected reward.

Let the true category on any given trial be ctrue, the true stimulus rate be strue, and the animal’s noisy visual and auditory observations of strue be xV and xA, respectively. We assume that the two sensory channels are corrupted by independent Gaussian noise with standard deviation σA and σV, respectively, giving rise to conditionally independent observations.

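$$
\begin{aligned}
p(x_A \mid s_{true}) &= \mathcal{N}(s_{true}, \sigma_A), \qquad p(x_V \mid s_{true}) = \mathcal{N}(s_{true}, \sigma_V), \\
p(x_A, x_V \mid s_{true}) &= p(x_A \mid s_{true})\, p(x_V \mid s_{true})
\end{aligned}
\tag{2}
$$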

The ideal observer can use this knowledge to compute the likelihood of seeing the current trial’s observations as a function of the hypothesized stimulus rate s. This likelihood is a Gaussian function of s with a mean given by a weighted sum of the observations xA and xV:

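$$
\begin{aligned}
\mathcal{L}(s) &= p(x_A, x_V \mid s) = p(x_A \mid s)\, p(x_V \mid s) \propto \mathcal{N}(\mu_M, \sigma_M) \\
\mu_M &= w_A x_A + w_V x_V \\
\sigma_M &= \left(\sigma_A^{-2} + \sigma_V^{-2}\right)^{-\frac{1}{2}} \\
w_A &= \frac{\sigma_M^2}{\sigma_A^2}, \qquad w_V = \frac{\sigma_M^2}{\sigma_V^2}
\end{aligned}
\tag{3}
$$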
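As a minimal illustration of Equation 3 (a sketch, not the fitting code used in this study), the combined estimate and its noise can be computed as follows. The function name and the example values in the usage note are hypothetical.

```python
def combine_cues(x_a, x_v, sigma_a, sigma_v):
    """Reliability-weighted combination of auditory and visual observations.

    Returns (mu_m, sigma_m, w_a, w_v): the mean and standard deviation of the
    multisensory likelihood and the weights given to each modality.
    """
    precision_a = 1.0 / sigma_a ** 2
    precision_v = 1.0 / sigma_v ** 2
    sigma_m = (precision_a + precision_v) ** -0.5
    w_a = sigma_m ** 2 / sigma_a ** 2  # weights sum to 1
    w_v = sigma_m ** 2 / sigma_v ** 2
    mu_m = w_a * x_a + w_v * x_v
    return mu_m, sigma_m, w_a, w_v
```

For example, `combine_cues(10.0, 14.0, 2.0, 2.0)` weights the two observations equally, and the combined noise is smaller than either unisensory noise by a factor of √2.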

The likelihood of seeing the observations as a function of the hypothesized category c is given by marginalizing over all possible hypothesized stimulus rates. Let the experimentally imposed category boundary be μ0, such that stimulus rates are considered high when s>μ0 and low when s<μ0. Then,

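$$
\begin{aligned}
\mathcal{L}(c=\text{High}) &= p(x_A, x_V \mid c=\text{High}) = \int_s p(x_A, x_V, s \mid c=\text{High})\, ds \\
&= \int_s p(x_A, x_V \mid s)\, p(s \mid c=\text{High})\, ds \qquad (\because\ x_A, x_V \perp c \mid s) \\
&= \int_{s>\mu_0} p(x_A, x_V \mid s)\, ds \\
&\propto 1 - \Phi(\mu_0; \mu_M, \sigma_M)
\end{aligned}
\tag{4}
$$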

where Φ is the cumulative normal function. Using Bayes’ rule, the ideal observer can then compute the probability that the current trial was high or low rate given the observations, that is, the posterior probability.

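$$
\begin{aligned}
p(c \mid x_A, x_V) &= \frac{p(x_A, x_V \mid c)\, p(c)}{p(x_A, x_V)} \\
p(c=\text{High} \mid x_A, x_V) &\propto p_{High}\, \big(1 - \Phi(\mu_0; \mu_M, \sigma_M)\big) \\
p(c=\text{Low} \mid x_A, x_V) &\propto p_{Low}\, \Phi(\mu_0; \mu_M, \sigma_M)
\end{aligned}
\tag{5}
$$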

where pHigh and pLow are the prior probabilities of high and low rates respectively. The expected value Q(a) of choosing right or left actions (also known as the action values) is obtained by marginalizing the learnt value of state-action pairs q(c,a) over the unobserved state c.

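$$
\begin{aligned}
Q(a=R) &= p(\text{High} \mid x_A, x_V)\, q(\text{High}, R) + p(\text{Low} \mid x_A, x_V)\, q(\text{Low}, R) \\
Q(a=L) &= p(\text{High} \mid x_A, x_V)\, q(\text{High}, L) + p(\text{Low} \mid x_A, x_V)\, q(\text{Low}, L)
\end{aligned}
\tag{6}
$$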

Under the standard contingency, high rates are rewarded on the right and low rates on the left, so for a trained observer that has fully learnt the contingency, q(High,R)→rR, q(High,L)→0, q(Low,R)→0, q(Low,L)→rL, with rR and rL being reward magnitudes for rightward and leftward actions. This simplifies the action values to:

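$$
\begin{aligned}
Q(R) &= p(\text{High} \mid x_A, x_V)\, r_R \propto p_{High}\, \big(1 - \Phi(\mu_0; \mu_M, \sigma_M)\big)\, r_R \\
Q(L) &= p(\text{Low} \mid x_A, x_V)\, r_L \propto p_{Low}\, \Phi(\mu_0; \mu_M, \sigma_M)\, r_L
\end{aligned}
\tag{7}
$$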
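Equations 5 and 7 can be sketched numerically as follows. This is an illustrative sketch rather than the code used for the fits; the function names are hypothetical, and the posterior is normalized explicitly so that the two category probabilities sum to 1.

```python
from statistics import NormalDist


def posterior_high(mu_m, sigma_m, mu_0, p_high=0.5):
    """Posterior probability that the trial is high-rate (Equation 5)."""
    below_boundary = NormalDist(mu_m, sigma_m).cdf(mu_0)
    evidence_high = p_high * (1.0 - below_boundary)
    evidence_low = (1.0 - p_high) * below_boundary
    return evidence_high / (evidence_high + evidence_low)


def action_values(rho, r_right=1.0, r_left=1.0):
    """Expected values of rightward and leftward actions (Equation 7).

    Assumes a fully learnt standard contingency: right is rewarded only on
    high rates and left only on low rates.
    """
    return rho * r_right, (1.0 - rho) * r_left
```

When the combined observation sits exactly on the category boundary and the priors are equal, the posterior is 0.5 and the two action values differ only through the reward magnitudes.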

The max-reward decision rule involves picking the action a^ with the highest expected reward:

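$$
\begin{aligned}
\hat{a} &= \operatorname*{argmax}_a Q(a), \quad \text{i.e.} \\
\hat{a} = R &\iff Q(R) > Q(L) \\
&\iff p_{High}\, \big(1 - \Phi(\mu_0; \mu_M, \sigma_M)\big)\, r_R > p_{Low}\, \Phi(\mu_0; \mu_M, \sigma_M)\, r_L \\
&\iff \Phi(\mu_M; \mu_0, \sigma_M) > \frac{1}{1 + \frac{p_{High}\, r_R}{p_{Low}\, r_L}} \\
&\iff w_A x_A + w_V x_V > \Phi^{-1}\!\left(\frac{1}{1 + \frac{p_{High}\, r_R}{p_{Low}\, r_L}};\ \mu_0,\ \left(\sigma_A^{-2} + \sigma_V^{-2}\right)^{-\frac{1}{2}}\right)
\end{aligned}
\tag{8}
$$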

In the special case of equal rewards and uniform stimulus and category priors, this reduces to choosing right when the weighted sum of observations is to the right of the true category boundary, that is, wAxA+wVxV>μ0. Note that this is a deterministic decision rule for any given observations xA and xV; however, since these are noisy and Gaussian distributed around the true stimulus rate strue, the likelihood of making a rightward decision is given by the cumulative Gaussian function Φ:

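$$
\begin{aligned}
&\text{For } p_{High} = p_{Low},\ r_R = r_L: \\
&p(\hat{a}=R \mid s) = p(w_A x_A + w_V x_V > \mu_0 \mid s) = \Phi(s_{true}; \mu_0, \sigma) \\
&\sigma = \begin{cases} \sigma_A & \text{on auditory trials} \\ \sigma_V & \text{on visual trials} \\ \left(\sigma_A^{-2} + \sigma_V^{-2}\right)^{-\frac{1}{2}} & \text{on multisensory trials} \end{cases}
\end{aligned}
\tag{9}
$$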
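The decision criterion in Equation 8 and its special case in Equation 9 can be sketched as follows. This is a minimal illustration under the stated Gaussian assumptions; the function names are hypothetical, and `NormalDist.inv_cdf` from the Python standard library stands in for Φ⁻¹.

```python
from statistics import NormalDist


def rightward_threshold(mu_0, sigma_m, p_high=0.5, r_right=1.0, r_left=1.0):
    """Value the weighted observation must exceed to choose right (Equation 8)."""
    odds = (p_high * r_right) / ((1.0 - p_high) * r_left)
    return NormalDist(mu_0, sigma_m).inv_cdf(1.0 / (1.0 + odds))


def p_right_ideal(s_true, mu_0, sigma):
    """Ideal-observer probability of a rightward choice (Equation 9).

    Valid for equal priors and equal rewards; sigma is the sensory noise of
    the condition (auditory, visual, or optimally combined multisensory).
    """
    return NormalDist(mu_0, sigma).cdf(s_true)
```

With equal priors and rewards the threshold equals the category boundary, whereas a larger rightward reward lowers the threshold and biases choices rightward without producing lapses.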

We can measure this probability empirically through the psychometric curve. Fitting it with a two-parameter cumulative Gaussian function yields μ and σ which can be compared to ideal observer predictions. The σ parameter is then taken to reflect sensory noise; and with the assumption of uniform priors and equal rewards, the μ parameter is taken to reflect the subjective category boundary. For the purpose of assessing optimality of integration, σ was individually fit to each condition and compared to ideal observer predictions, but for the purpose of comparing theoretical models of lapses, σ on multisensory conditions was constrained to be optimal for all models. Although μ should equal μ0 for the ideal observer, in practice it is treated as a free parameter in all models, and deviations of μ from μ0 could reflect any of three possible suboptimalities: (1) a subjective category boundary mismatched to the true one, possibly arising from the use of irrelevant features such as total event count (Odoemene et al., 2018), (2) mismatched priors, or (3) unequal subjective rewards rR and rL of the two actions.

Inattention model

The traditional model for lapse rates assumes that on a fixed proportion of trials, the animal fails to pay attention to the stimulus, guessing randomly between the two actions. We can incorporate this suboptimality into the ideal observer above as follows: Let the probability of attending be pattend. Then, on a 1−pattend fraction of trials, the animal does not attend to the stimulus (i.e. receives no evidence), effectively making σsensory→∞ and giving rise to a posterior that is equal to the prior. On these trials, the animal may choose to maximize this prior (always picking the option that is more likely a priori, guessing with 50–50 probability if both options are equally likely), or probability-match the prior (guessing in proportion to its prior). Let us call this guessing probability pbias. Then, the probability of a rightward decision is given by marginalizing over the attentional state:

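$$
\begin{aligned}
p(\hat{a}=R \mid s) &= p(\hat{a}=R \mid s, \text{attend})\, p(\text{attend}) + p(\hat{a}=R \mid s, \neg\text{attend})\, p(\neg\text{attend}) \\
&= p_{ideal}(\hat{a}=R \mid s)\, p_{attend} + p_{bias}\, (1 - p_{attend})
\end{aligned}
\tag{10}
$$

where p_ideal(â=R|s) denotes the ideal observer's choice probability from Equation 9.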

Comparing this with the standard four-parameter sigmoid used in psychometric fitting, we obtain

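$$
\begin{aligned}
p(\hat{a}=R \mid s_{true}) &= \gamma + (1 - \gamma - \lambda)\, \Phi(s_{true}; \mu_0, \sigma) \\
\gamma + \lambda &= 1 - p_{attend}, \qquad \frac{\gamma}{\gamma + \lambda} = p_{bias}
\end{aligned}
\tag{11}
$$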

where γ and λ are the lower and upper asymptotes respectively, collectively known as ‘lapses’. In this model, the sum of the two lapses depends on the probability of attending, which could be modulated in a bottom-up fashion by the salience of the stimulus; their ratio depends on the guessing probability, which in turn depends on the observer’s priors and subjective rewards rR and rL.
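The mapping in Equation 11 from attentional parameters to lapse rates can be sketched as below. This is an illustrative sketch with a hypothetical function name, included to make explicit how the two lapses are coupled in this model.

```python
def inattention_lapses(p_attend, p_bias):
    """Lower and upper lapse rates implied by the inattention model (Equation 11).

    The total lapse rate is fixed by p_attend; p_bias only determines how
    that total is split between the two asymptotes.
    """
    total = 1.0 - p_attend   # gamma + lambda
    gamma = p_bias * total   # rightward guesses on low-rate trials
    lam = total - gamma      # leftward guesses on high-rate trials
    return gamma, lam
```

For a fixed probability of attending, any change in the guessing bias that raises one lapse must lower the other by the same amount, so the model cannot change one lapse while leaving the other untouched unless both p_attend and p_bias change together.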

Motor error/ϵ greedy model

Lapses can also occur if the observer does not always pick the reward-maximizing or ‘exploit’ decision. This might occur due to random errors in motor execution on a small fraction of trials given by ε, or it might reflect a deliberate propensity to occasionally make random ‘exploratory’ choices to gather information about rules and rewards. This is known as an ε-greedy decision rule, where the observer chooses randomly (or according to pbias) on ε fraction of trials. Both these models yield predictions similar to those of the inattention model:

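$$
\begin{aligned}
p(\hat{a}=R \mid s) &= p_{ideal}(\hat{a}=R \mid s)\, (1 - \epsilon) + \epsilon\, p_{bias} \\
\gamma + \lambda &= \epsilon, \qquad \frac{\gamma}{\gamma + \lambda} = p_{bias}
\end{aligned}
\tag{12}
$$

where p_ideal(â=R|s) denotes the ideal observer's choice probability from Equation 9.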

Uncertainty guided exploration model

A more sophisticated form of exploration is the ‘softmax’ decision rule, which explores options in proportion to their expected rewards, allowing for a balance between exploration and exploitation through the tuning of a parameter β known as inverse temperature. In particular, in conditions of greater uncertainty about rules or rewards, it is advantageous to be more exploratory and have a lower β. This form of uncertainty-guided exploration is known as Thompson sampling. It can be implemented by sampling from a belief distribution over expected rewards and maximizing with respect to the sample, reducing to a softmax rule whose β depends on the total uncertainty in expected reward (Gershman, 2018).

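$$ p(\hat{a}=R \mid Q(a)) = \frac{\exp\left(\beta Q(R)\right)}{\exp\left(\beta Q(L)\right)+\exp\left(\beta Q(R)\right)} = \frac{1}{1+\exp\left(-\beta\left(Q(R)-Q(L)\right)\right)} \tag{13} $$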

The proportion of rightward choices conditioned on the true stimulus rate is then obtained by marginalizing over the latent action values Q(a), using the fact that the choice depends on s only through its effect on Q(a). Here ρ is the animal’s posterior belief in a high-rate stimulus, that is, ρ = p(c = High | x_A, x_V); ρ is often referred to as the belief state in reinforcement learning problems involving partial observability, such as our task. With expected action values Q(R) = ρ·r_R and Q(L) = (1 − ρ)·r_L, the value difference in the softmax is Q(R) − Q(L) = ρ(r_R + r_L) − r_L.

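$$
\begin{aligned}
p(\hat{a}=R \mid s) &= \int_{Q(a)} p(\hat{a}=R, Q(a) \mid s)\,dQ = \int_{Q(a)} p(\hat{a}=R \mid Q(a))\,p(Q(a) \mid s)\,dQ \qquad (\because\ \hat{a} \perp s \mid Q(a)) \\
&= \int_{\rho} \frac{1}{1+\exp\left(-\beta\left(\rho(r_R+r_L)-r_L\right)\right)}\; \frac{\mathcal{N}\left(\Phi^{-1}(1-\rho, 0, \sigma_{\text{post}}),\ \mu_0-s,\ \sigma_{\text{post}}\right)}{\mathcal{N}\left(\Phi^{-1}(1-\rho, 0, \sigma_{\text{post}}),\ 0,\ \sigma_{\text{post}}\right)}\,d\rho
\end{aligned} \tag{14}
$$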
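The marginalization in Equation 14 can be approximated numerically. The sketch below is one possible way to do this, not the fitting code used in the study: it integrates over the noisy measurement x rather than over ρ (an equivalent change of variables), and it assumes that the measurement noise and the posterior width are both equal to sigma_post. Function names, the grid size, and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def softmax_p_right(rho, beta, r_right, r_left):
    """Eq. 13 with Q(R) = rho * r_right and Q(L) = (1 - rho) * r_left."""
    dq = rho * (r_right + r_left) - r_left
    return 1.0 / (1.0 + math.exp(-beta * dq))

def exploration_psychometric(s, mu0, sigma_post, beta, r_right, r_left, n_grid=2001):
    """Approximate Eq. 14 with a trapezoid rule over the measurement x ~ N(s, sigma_post)."""
    std_normal = NormalDist()
    measurement = NormalDist(s, sigma_post)
    lo, hi = s - 8.0 * sigma_post, s + 8.0 * sigma_post
    dx = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * dx
        rho = std_normal.cdf((x - mu0) / sigma_post)  # belief that the rate is High
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * softmax_p_right(rho, beta, r_right, r_left) * measurement.pdf(x) * dx
    return total

# Illustrative parameters only: the curve saturates below 1 at easy stimuli.
for rate in (5.0, 12.5, 20.0, 40.0):
    print(rate, exploration_psychometric(rate, 12.5, 3.0, 3.0, 1.0, 1.0))
```

Integrating over x avoids the inverse-CDF and Jacobian terms that appear when the integral is written over ρ, at the cost of the added assumption about the noise model stated above.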

Since lapses are the asymptotic probabilities of the less rewarding action at extremely easy stimulus rates, we can derive them from this expression by setting ρ → 1 or ρ → 0. This yields

$$ \gamma = \frac{1}{1+\exp(\beta r_L)}, \qquad \lambda = \frac{1}{1+\exp(\beta r_R)} \tag{15} $$

Critically, in this model, the upper and lower lapses are dissociable, depending only on the rightward or leftward rewards, respectively. In practice, since β can only be specified up to an arbitrary scaling of reward magnitudes, we either fix r_L = 1 and fit β and a reward bias of r_R relative to r_L, in units of r_L (for conditions with different expected β), or fix β = 1 and fit r_L and r_R in units of β (for conditions with the same β where one of the rewards is expected to change).
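The dissociation between the two lapses in Equation 15 can be made concrete with a short sketch. The function name and the numerical values of β and the rewards are hypothetical, chosen only to illustrate that changing the rightward reward moves λ and leaves γ untouched.

```python
import math

def exploration_lapses(beta, r_left, r_right):
    """Lower (gamma) and upper (lambda) lapses under the softmax rule (Eq. 15)."""
    gamma = 1.0 / (1.0 + math.exp(beta * r_left))
    lam = 1.0 / (1.0 + math.exp(beta * r_right))
    return gamma, lam

# Illustrative values: equal rewards, then an increased rightward reward.
print(exploration_lapses(beta=2.0, r_left=1.0, r_right=1.0))
print(exploration_lapses(beta=2.0, r_left=1.0, r_right=1.5))
```

In the second call only the second returned value (λ) shrinks, which is the selective effect that the reward manipulations were designed to test.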

Such a softmax decision rule has been used to account for suboptimalities in value-based decisions (Dayan and Daw, 2008); however, it has not been used to account for lapses in perceptual decisions. Other suboptimal decision rules described in perceptual decisions, such as generalized probability matching or posterior sampling (Acerbi et al., 2014; Drugowitsch et al., 2016; Ortega and Braun, 2013), amount to a softmax on log-posteriors or log-expected values, rather than on expected values, and do not produce lapses since in these decision rules, when the posterior probability goes to 1, so does the decision probability.

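$$ p(\hat{a}=R \mid Q(a)) = \frac{\exp\left(\beta \log Q_R\right)}{\exp\left(\beta \log Q_L\right)+\exp\left(\beta \log Q_R\right)} = \frac{Q_R^{\beta}}{Q_L^{\beta}+Q_R^{\beta}} \;\Rightarrow\; \begin{cases} \rho \to 1: & p(R) \to 1 \\ \rho \to 0: & p(R) \to 0 \end{cases} \tag{16} $$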
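The contrast between the two decision rules can be seen by evaluating both at the extremes of the belief ρ. This is a minimal sketch assuming Q_R = ρ·r_R and Q_L = (1 − ρ)·r_L; the function names and the value of β are hypothetical.

```python
import math

def p_right_value_softmax(rho, beta, r_right=1.0, r_left=1.0):
    """Softmax on expected values (Eq. 13): saturates below 1, producing lapses."""
    q_r, q_l = rho * r_right, (1.0 - rho) * r_left
    return 1.0 / (1.0 + math.exp(-beta * (q_r - q_l)))

def p_right_log_softmax(rho, beta, r_right=1.0, r_left=1.0):
    """Softmax on log expected values (Eq. 16): reaches 0 and 1 at the extremes."""
    q_r, q_l = rho * r_right, (1.0 - rho) * r_left
    return q_r ** beta / (q_l ** beta + q_r ** beta)

for rho in (0.0, 0.5, 1.0):
    print(rho, p_right_value_softmax(rho, beta=2.0), p_right_log_softmax(rho, beta=2.0))
```

At ρ = 0 and ρ = 1 the log-value rule returns exactly 0 and 1, whereas the value-based rule stays away from both, which is why only the latter yields lapses.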

Inactivation modeling

Inactivations were modeled using the following one-parameter perturbations to the decision-making process, while keeping all other parameters fixed:

Biased evidence

A fixed amount of evidence was added to all modalities. This corresponds to adding a rate bias of K·σ_i for a condition with sensory noise σ_i, with K > 0 fixed across modalities, leading to bigger biases for conditions with higher sensory noise.

Biased value

The expected value of one of the actions was scaled down by a fixed factor K < 1 across all modalities. For instance, Q_Li → K·Q_Li produced a rightward-biased value for a condition with baseline leftward expected value Q_Li. This led to a stimulus-dependent bias in action value and consequently in lapses, since Q_Li is large and heavily affected on low-rate trials, and close to zero and largely unaffected on high-rate trials.

Biased effort

A fixed ‘effort’ cost (i.e. negative value) K<0 was added to the expected values of one of the actions for all modalities. This added a stimulus-independent bias in action values, since the difference in expected values was biased away from the effortful action by the same amount irrespective of the stimulus rate.
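The three one-parameter perturbations can be written as small functions. The sketch below is illustrative and is not the code used for the fits: it assumes Q_R = ρ·r_R and Q_L = (1 − ρ)·r_L, applies the value and effort perturbations to the leftward action purely as an example, and uses hypothetical names and values of K.

```python
import math

def action_values(rho, r_right=1.0, r_left=1.0):
    """Expected values (Q_R, Q_L) given belief rho that the rate is High."""
    return rho * r_right, (1.0 - rho) * r_left

def p_right(q_r, q_l, beta):
    """Softmax choice probability for the rightward action."""
    return 1.0 / (1.0 + math.exp(-beta * (q_r - q_l)))

def biased_evidence(x, sigma_i, k):
    """Biased evidence: shift the measured rate by k * sigma_i (k > 0)."""
    return x + k * sigma_i

def biased_value(q_r, q_l, k):
    """Biased value: scale the leftward value by k < 1."""
    return q_r, k * q_l

def biased_effort(q_r, q_l, k):
    """Biased effort: add a fixed cost k < 0 to the leftward value."""
    return q_r, q_l + k

def shift_in_value_difference(rho, perturb, k):
    """Change in Q_R - Q_L caused by a value-level perturbation."""
    q_r, q_l = action_values(rho)
    new_r, new_l = perturb(q_r, q_l, k)
    return (new_r - new_l) - (q_r - q_l)

for rho in (0.0, 0.5, 1.0):
    print(rho,
          shift_in_value_difference(rho, biased_value, 0.5),
          shift_in_value_difference(rho, biased_effort, -0.3))
```

The printed shifts show the distinction drawn above: scaling a value changes the value difference most where that value is large (low ρ here) and not at all where it is zero, while an effort cost shifts the difference by the same amount at every ρ.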

Model fitting

Model fits were obtained from custom maximum likelihood fitting code using MATLAB’s fmincon, by maximizing the marginal likelihood of rightward choices given the stimulus on each trial as computed from each model. Confidence intervals for fit parameters were generated using the Hessian obtained from fmincon. Fits to multiple conditions were performed jointly, taking into account any linear or nonlinear (e.g. optimality) constraints on parameters across conditions. Model comparisons were done using AIC and BIC. For comparisons of fits to data pooled across subjects, AIC/BIC values were computed relative to the best-fit model, so that the best model had an AIC/BIC of 0. For comparisons of fits to individual subject data, AIC/BIC values for each subject were computed relative to the best-fit model for that subject, so that the best model for that subject had an AIC/BIC of 0, and then summed across subjects.
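The model-comparison bookkeeping described above can be sketched as follows. This is a Python illustration rather than the MATLAB code used in the study; it uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, a plain Bernoulli likelihood over trials, and hypothetical function names. The log-likelihoods and parameter counts in the usage lines are made-up numbers for demonstration only.

```python
import math

def bernoulli_log_likelihood(choices, p_right):
    """Log-likelihood of binary choices (1 = Right) given per-trial model predictions."""
    eps = 1e-12  # guard against log(0)
    log_lik = 0.0
    for choice, p in zip(choices, p_right):
        p = min(max(p, eps), 1.0 - eps)
        log_lik += math.log(p) if choice else math.log(1.0 - p)
    return log_lik

def aic(log_lik, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * log_lik

def bic(log_lik, n_params, n_trials):
    """Bayesian information criterion."""
    return n_params * math.log(n_trials) - 2.0 * log_lik

def relative_scores(scores):
    """Express each model's score relative to the best model (best = 0)."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}

# Hypothetical log-likelihoods and parameter counts, for illustration only.
scores = {
    "exploration": aic(-100.0, 3),
    "inattention": aic(-101.5, 4),
}
print(relative_scores(scores))
```

For per-subject comparisons, the same `relative_scores` step would be applied to each subject's scores before summing across subjects, mirroring the procedure described in the text.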

Surgical procedures

All rats subject to surgery were anesthetized with 1–3% isoflurane. Isoflurane anesthesia was maintained by monitoring respiration, heart rate, oxygen, and CO2 levels, as well as foot pinch responses throughout the surgical procedure. Ophthalmic ointment was applied to keep the eyes moistened throughout surgery. After scalp shaving, the skin was cleaned with 70% ethanol and 5% betadine solution. Lidocaine solution was injected below the scalp to provide local analgesia prior to performing scalp incisions. Meloxicam (5 mg/mL) was administered subcutaneously (2 mg/kg) for analgesia at the beginning of the surgery, and daily for 2–3 days post-surgery. The animals were allowed at least 7 days to recover before behavioral training.

Viral injections

Two rats, 15 weeks of age, were anesthetized and placed in a stereotaxic apparatus (Kopf Instruments). Small craniotomies were made in the center of primary visual cortex (V1; 6.9 mm posterior to Bregma, 4.2 mm to the right of midline) and primary auditory cortex (A1; 4.7 mm posterior to Bregma, 7 mm to the right of midline). Small durotomies were performed at each craniotomy and virus was pressure injected at depths of 600, 800, and 1000 μm below the pia (150 nL/depth). Virus injections were performed using Drummond Nanoject III, which enables automated delivery of small volumes of virus. To minimize virus spread, the Nanoject was programmed to inject slowly: fifteen 10 nL boluses, 30 s apart. Each bolus was delivered at 10 nL/s. Two to three minutes were allowed following injection at each depth to allow for diffusion of virus. The AAV2.CB7.CI.EGFP.WPRE.RBG construct was injected in V1, and the AAV2.CAG.tdTomato.WPRE.SV40 construct was injected in A1. Viruses were obtained from the University of Pennsylvania vector core.

Cannulae implants

Rats were anesthetized and placed in the stereotax as described above. After incision and skull cleaning, two skull screws were implanted to add more surface area for the dental cement. For striatal implants, two craniotomies were made, one on each side of the skull (3.2 mm posterior to Bregma; 5.4 mm to the right and left of midline). Durotomies were performed and a guide cannula (22 gauge, 8.5 mm long; PlasticsOne) was placed in the brain, 4.1 mm below the pia at each craniotomy. For secondary motor cortex implants, one large craniotomy spanning the right and left M2 was performed (∼5 mm × ∼2 mm in size centered around 2 mm anterior to Bregma and 3.1 mm to the right and left of midline). A durotomy was performed and a double guide cannula (22 gauge, 4 mm long; PlasticsOne) was placed in the brain, 300 μm below the pia. The exposed brain was covered with sterile Vaseline and cannulae were anchored to the skull with dental acrylic (Relyx). Single or double dummy cannulae protruding 0.7 mm below the guide cannulae were inserted.

Inactivation with muscimol

Rats were lightly anesthetized with isoflurane. Muscimol was unilaterally infused into pStr or M2 with a final concentration of 0.075–0.125 μg and 0.1–0.5 μg, respectively. A single/double-internal cannula (PlasticsOne), connected to a 2 μL syringe (Hamilton microliter syringe, 7000 series), was inserted into each previously implanted guide cannula. Internal cannulae protruded 0.5 mm below the guide. Muscimol was delivered using an infusion pump (Harvard PHD 22/2000) at a rate of 0.1 μL/min. Internal cannulae were kept in the brain for three additional minutes to allow for diffusion of muscimol. Rats were removed from anesthesia and returned to cages for 15 min before beginning behavioral sessions. The same procedure was used in control sessions, where muscimol was replaced with sterile saline.

Histology

At the conclusion of inactivation experiments, animals were deeply anesthetized with Euthasol (pentobarbital and phenytoin). Animals were perfused transcardially with 4% paraformaldehyde. Brains were extracted and post-fixed in 4% paraformaldehyde for 24–48 hr. After post-fixing, 50–100 μm coronal sections were cut on a vibratome (Leica) and imaged.

Acknowledgements

We thank Matt Kaufman, Simon Musall, Onyekachi Odoemene, Ashley Juavinett, Farzaneh Najafi, Akihiro Funamizu, Priyanka Gupta, Anne Urai, James Roach, Colin Stoneking, Diksha Gupta, Tatiana Engel, Rob Phillips, Tony Zador, Steve Shea, and Bo Li for scientific advice and discussions, and Angela Licata, Steven Gluf, Liete Einchorn, Dennis Maharjan, Alexa Pagliaro, Edward Lu, and Barry Burbach for technical assistance. We thank Partha Mitra, Alexander Tolpygo, and Stephen Savoia for help with slicing and imaging virus-injected brains. This work was supported by the Simons Collaboration on the Global Brain, ONR MURI, the Eleanor Schwartz Fund, the Pew Charitable Trust, and the Watson School of Biological Sciences. [Competing Interests] The authors declare that they have no competing financial interests. [Correspondence] Correspondence and requests for materials should be addressed to Anne K Churchland (email: churchland@cshl.edu).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Anne K Churchland, Email: AChurchland@mednet.ucla.edu.

Daeyeol Lee, Johns Hopkins University, United States.

Joshua I Gold, University of Pennsylvania, United States.

Funding Information

This paper was supported by the following grants:

  • Army Research Office W911NF-16-1-0368 to Sashank Pisupati, Lital Chartarifsky-Lynn, Anup Khanal, Anne K Churchland.

  • National Institutes of Health R01 EY022979 to Anne K Churchland.

Additional information

Competing interests

No competing interests declared.

Author contributions

Data curation, Formal analysis, Validation, Investigation, Visualization, Writing - original draft, Writing - review and editing.

Formal analysis, Investigation, Visualization, Writing - review and editing.

Investigation, Writing - review and editing.

Conceptualization, Resources, Data curation, Supervision, Writing - original draft, Project administration, Writing - review and editing.

Ethics

Animal experimentation: All animal procedures and experiments were in accordance with the National Institutes of Health's Guide for the Care and Use of Laboratory Animals and were approved by the Cold Spring Harbor Laboratory Animal Care and Use Committee (protocol 19-16-13-10-7).

Additional files

Transparent reporting form

Data availability

Data are publicly available: http://repository.cshl.edu/id/eprint/38957/.

The following dataset was generated:

Pisupati S, Chartarifsky L, Khanal A, Churchland A K. 2020. Dataset from: Lapses in perceptual decisions reflect exploration. CSHL.

References

  1. Acerbi L, Vijayakumar S, Wolpert DM. On the origins of suboptimality in human probabilistic inference. PLOS Computational Biology. 2014;10:e1003661. doi: 10.1371/journal.pcbi.1003661.
  2. Ashwood ZC, Roy NA, Urai AE, Aguillon Rodriguez V, Bonacchi N, Cazettes F, Chapuis GA, Churchland AK, Faulkner M, Hu F, Krasniak C, Laranjeira IC, Meijer GT, Miska NJ, Noel JP, Pan-vazquez A, Rossant C, Socha KZ, Stone IR, Wells MJ, Wilson CJ, Winter O, Pillow JW, IBL Collaboration. State-dependent modeling of psychophysical behavior during decision making. Program No. 241.11. Neuroscience Meeting Planner, Society for Neuroscience. 2019.
  3. Babayan BM, Uchida N, Gershman SJ. Belief state representation in the dopamine system. Nature Communications. 2018;9:1891. doi: 10.1038/s41467-018-04397-0.
  4. Barthas F, Kwan AC. Secondary motor cortex: where 'Sensory' Meets 'Motor' in the Rodent Frontal Cortex. Trends in Neurosciences. 2017;40:181–193. doi: 10.1016/j.tins.2016.11.006.
  5. Bays PM, Catalao RF, Husain M. The precision of visual working memory is set by allocation of a shared resource. Journal of Vision. 2009;9:7. doi: 10.1167/9.10.7.
  6. Beeler JA, Daw N, Frazier CR, Zhuang X. Tonic dopamine modulates exploitation of reward learning. Frontiers in Behavioral Neuroscience. 2010;4:170. doi: 10.3389/fnbeh.2010.00170.
  7. Bertolini G, Wicki A, Baumann CR, Straumann D, Palla A. Impaired tilt perception in Parkinson's disease: a central vestibular integration failure. PLOS ONE. 2015;10:e0124253. doi: 10.1371/journal.pone.0124253.
  8. Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review. 2006;113:700–765. doi: 10.1037/0033-295X.113.4.700.
  9. Busse L, Ayaz A, Dhruv NT, Katzner S, Saleem AB, Schölvinck ML, Zaharia AD, Carandini M. The detection of visual contrast in the behaving mouse. Journal of Neuroscience. 2011;31:11351–11361. doi: 10.1523/JNEUROSCI.6689-10.2011.
  10. Carandini M, Churchland AK. Probing perceptual decisions in rodents. Nature Neuroscience. 2013;16:824–831. doi: 10.1038/nn.3410.
  11. Chakroun K, Mathar D, Wiehler A, Ganzer F, Peters J. Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making. bioRxiv. 2019. doi: 10.1101/706176.
  12. Cloherty SL, Yates JL, Graf D, DeAngelis GC, Mitchell JF. Motion perception in the common marmoset. bioRxiv. 2019. doi: 10.1101/522888.
  13. Dayan P, Daw ND. Decision theory, reinforcement learning, and the brain. Cognitive, Affective, & Behavioral Neuroscience. 2008;8:429–453. doi: 10.3758/CABN.8.4.429.
  14. Drugowitsch J, DeAngelis GC, Klier EM, Angelaki DE, Pouget A. Optimal multisensory decision-making in a reaction-time task. eLife. 2014;3:e03005. doi: 10.7554/eLife.03005.
  15. Drugowitsch J, Wyart V, Devauchelle AD, Koechlin E. Computational precision of mental inference as critical source of human choice suboptimality. Neuron. 2016;92:1398–1411. doi: 10.1016/j.neuron.2016.11.005.
  16. Drugowitsch J, Pouget A. Learning optimal decisions with confidence. bioRxiv. 2018. doi: 10.1101/244269.
  17. Ebitz RB, Sleezer BJ, Jedema HP, Bradberry CW, Hayden BY. Tonic exploration governs both flexibility and lapses. PLOS Computational Biology. 2019;15:e1007475. doi: 10.1371/journal.pcbi.1007475.
  18. Erlich JC, Brunton BW, Duan CA, Hanks TD, Brody CD. Distinct effects of prefrontal and parietal cortex inactivations on an accumulation of evidence task in the rat. eLife. 2015;4:e05457. doi: 10.7554/eLife.05457.
  19. Ernst MO, Bülthoff HH. Merging the senses into a robust percept. Trends in Cognitive Sciences. 2004;8:162–169. doi: 10.1016/j.tics.2004.02.002.
  20. Fan Y, Gold JI, Ding L. Ongoing, rational calibration of reward-driven perceptual biases. eLife. 2018;7:e36018. doi: 10.7554/eLife.36018.
  21. Findling C, Skvortsova V, Dromnelle R, Palminteri S, Wyart V. Computational noise in reward-guided learning drives behavioral variability in volatile environments. bioRxiv. 2018. doi: 10.1101/439885.
  22. Flesch T, Balaguer J, Dekker R, Nili H, Summerfield C. Comparing continual task learning in minds and machines. PNAS. 2018;115:E10313–E10322. doi: 10.1073/pnas.1800755115.
  23. Frank MJ, Doll BB, Oas-Terpstra J, Moreno F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nature Neuroscience. 2009;12:1062–1068. doi: 10.1038/nn.2342.
  24. Garrido MI, Dolan RJ, Sahani M. Surprise leads to noisier perceptual decisions. I-Perception. 2011;2:112–120. doi: 10.1068/i0411.
  25. Gershman SJ. A unifying probabilistic view of associative learning. PLOS Computational Biology. 2015;11:e1004567. doi: 10.1371/journal.pcbi.1004567.
  26. Gershman SJ. Deconstructing the human algorithms for exploration. Cognition. 2018;173:34–42. doi: 10.1016/j.cognition.2017.12.014.
  27. Gold JI, Ding L. How mechanisms of perceptual decision-making affect the psychometric function. Progress in Neurobiology. 2013;103:98–114. doi: 10.1016/j.pneurobio.2012.05.008.
  28. Green DM, Swets JA. Signal Detection Theory and Psychophysics. Wiley; 1966.
  29. Guo L, Walker WI, Ponvert ND, Penix PL, Jaramillo S. Stable representation of sounds in the posterior striatum during flexible auditory decisions. Nature Communications. 2018;9:1534. doi: 10.1038/s41467-018-03994-3.
  30. Hou H, Zheng Q, Zhao Y, Pouget A, Gu Y. Neural correlates of optimal multisensory decision making. bioRxiv. 2018. doi: 10.1101/480178.
  31. Jiang H, Kim HF. Anatomical inputs from the sensory and value structures to the tail of the rat striatum. Frontiers in Neuroanatomy. 2018;12:30. doi: 10.3389/fnana.2018.00030.
  32. Lak A, Okun M, Moss M, Gurnani H, Wells MJ, Reddy CB, Harris KD, Carandini M. Dopaminergic and frontal signals for decisions guided by sensory evidence and reward value. bioRxiv. 2018. doi: 10.1101/411413.
  33. Law CT, Gold JI. Reinforcement learning can account for associative and perceptual learning on a visual-decision task. Nature Neuroscience. 2009;12:655–663. doi: 10.1038/nn.2304.
  34. Leblois A, Wendel BJ, Perkel DJ. Striatal dopamine modulates basal ganglia output and regulates social context-dependent behavioral variability through D1 receptors. Journal of Neuroscience. 2010;30:5730–5743. doi: 10.1523/JNEUROSCI.5974-09.2010.
  35. Lee AM, Tai LH, Zador A, Wilbrecht L. Between the primate and 'reptilian' brain: Rodent models demonstrate the role of corticostriatal circuits in decision making. Neuroscience. 2015;296:66–74. doi: 10.1016/j.neuroscience.2014.12.042.
  36. Leike J, Lattimore T, Orseau L, Hutter M. Thompson sampling is asymptotically optimal in general environments. arXiv. 2016. https://arxiv.org/abs/1602.07905
  37. Licata AM, Kaufman MT, Raposo D, Ryan MB, Sheppard JP, Churchland AK. Posterior parietal cortex guides visual decisions in rats. The Journal of Neuroscience. 2017;37:4954–4966. doi: 10.1523/JNEUROSCI.0105-17.2017.
  38. Lucas CG, Bridgers S, Griffiths TL, Gopnik A. When children are better (or at least more open-minded) learners than adults: developmental differences in learning the forms of causal relationships. Cognition. 2014;131:284–299. doi: 10.1016/j.cognition.2013.12.010.
  39. Manning C, Jones PR, Dekker TM, Pellicano E. Psychophysics with children: investigating the effects of attentional lapses on threshold estimates. Attention, Perception, & Psychophysics. 2018;80:1311–1324. doi: 10.3758/s13414-018-1510-2.
  40. Mastrogiorgio A, Petracca E. Satisficing as an alternative to optimality and suboptimality in perceptual decision making. Behavioral and Brain Sciences. 2018;41:e235. doi: 10.1017/S0140525X18001358.
  41. Mendonca AG, Drugowitsch J, Vicente MI, DeWitt E, Pouget A, Mainen ZF. The impact of learning on perceptual decisions and its implication for speed-accuracy tradeoffs. bioRxiv. 2018. doi: 10.1101/501858.
  42. Mihali A, Young AG, Adler LA, Halassa MM, Ma WJ. A Low-Level perceptual correlate of behavioral and clinical deficits in ADHD. Computational Psychiatry. 2018;2:141–163. doi: 10.1162/cpsy_a_00018.
  43. Nikbakht N, Tafreshiha A, Zoccolan D, Diamond ME. Supralinear and supramodal integration of visual and tactile signals in rats: psychophysics and neuronal mechanisms. Neuron. 2018;97:626–639. doi: 10.1016/j.neuron.2018.01.003.
  44. Odoemene O, Pisupati S, Nguyen H, Churchland AK. Visual evidence accumulation guides Decision-Making in unrestrained mice. The Journal of Neuroscience. 2018;38:10143–10155. doi: 10.1523/JNEUROSCI.3478-17.2018.
  45. Ortega PA, Braun DA. Thermodynamics as a theory of decision-making with information-processing costs. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2013;469:20120683. doi: 10.1098/rspa.2012.0683.
  46. Piet AT, Erlich JC, Kopec CD, Brody CD. Rat prefrontal cortex inactivations during decision making are explained by bistable attractor dynamics. Neural Computation. 2017;29:2861–2886. doi: 10.1162/neco_a_01005.
  47. Pinto L, Koay SA, Engelhard B, Yoon AM, Deverett B, Thiberge SY, Witten IB, Tank DW, Brody CD. An Accumulation-of-Evidence task using visual pulses for mice navigating in virtual reality. Frontiers in Behavioral Neuroscience. 2018;12:36. doi: 10.3389/fnbeh.2018.00036.
  48. Pisupati S, Musall SM, Urai AE, Churchland AK. A two stage bayesian observer predicts the effects of learning on perceptual decisions. Program No. 756.01. Neuroscience Meeting Planner, Society for Neuroscience. 2019.
  49. Prins N, Kingdom FAA. Applying the Model-Comparison approach to test specific research hypotheses in psychophysical research using the palamedes toolbox. Frontiers in Psychology. 2018;9:1250. doi: 10.3389/fpsyg.2018.01250.
  50. Raposo D, Sheppard JP, Schrater PR, Churchland AK. Multisensory decision-making in rats and humans. Journal of Neuroscience. 2012;32:3726–3735. doi: 10.1523/JNEUROSCI.4998-11.2012.
  51. Raposo D, Kaufman MT, Churchland AK. A category-free neural population supports evolving demands during decision-making. Nature Neuroscience. 2014;17:1784–1792. doi: 10.1038/nn.3865.
  52. Roach NW, Edwards VT, Hogben JH. The tale is in the tail: an alternative hypothesis for psychophysical performance variability in dyslexia. Perception. 2004;33:817–830. doi: 10.1068/p5207.
  53. Roy NA, Bak JH, Akrami A, Brody CD, Pillow JW. Efficient inference for time-varying behavior during learning. Advances in Neural Information Processing Systems; 2018. pp. 5695–5705.
  54. Scott BB, Constantinople CM, Erlich JC, Tank DW, Brody CD. Sources of noise during accumulation of evidence in unrestrained and voluntarily head-restrained rats. eLife. 2015;4:e11308. doi: 10.7554/eLife.11308.
  55. Shen S, Ma WJ. Variable precision in visual perception. Psychological Review. 2019;126:89–132. doi: 10.1037/rev0000128.
  56. Sheppard JP, Raposo D, Churchland AK. Dynamic weighting of multisensory stimuli shapes decision-making in rats and humans. Journal of Vision. 2013;13:4. doi: 10.1167/13.6.4.
  57. Siniscalchi MJ, Wang H, Kwan AC. Enhanced population coding for rewarded choices in the medial frontal cortex of the mouse. Cerebral Cortex. 2019;29:4090–4106. doi: 10.1093/cercor/bhy292.
  58. Speekenbrink M, Konstantinidis E. Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science. 2015;7:351–367. doi: 10.1111/tops.12145.
  59. Starkweather CK, Babayan BM, Uchida N, Gershman SJ. Dopamine reward prediction errors reflect hidden-state inference across time. Nature Neuroscience. 2017;20:581–589. doi: 10.1038/nn.4520.
  60. Sul JH, Jo S, Lee D, Jung MW. Role of rodent secondary motor cortex in value-based action selection. Nature Neuroscience. 2011;14:1202–1208. doi: 10.1038/nn.2881.
  61. Tai LH, Lee AM, Benavidez N, Bonci A, Wilbrecht L. Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nature Neuroscience. 2012;15:1281–1289. doi: 10.1038/nn.3188.
  62. Wang L, Rangarajan KV, Gerfen CR, Krauzlis RJ. Activation of striatal neurons causes a perceptual decision Bias during visual change detection in mice. Neuron. 2018;97:1369–1381. doi: 10.1016/j.neuron.2018.01.049.
  63. Wichmann FA, Hill NJ. The psychometric function: I. fitting, sampling, and goodness of fit. Perception & Psychophysics. 2001;63:1293–1313. doi: 10.3758/BF03194544.
  64. Wilson RC, Geana A, White JM, Ludvig EA, Cohen JD. Humans use directed and random exploration to solve the explore-exploit dilemma. Journal of Experimental Psychology: General. 2014;143:2074–2081. doi: 10.1037/a0038199.
  65. Witton C, Talcott JB, Henning GB. Psychophysical measurements in children: challenges, pitfalls, and considerations. PeerJ. 2017;5:e3231. doi: 10.7717/peerj.3231.
  66. Yartsev MM, Hanks TD, Yoon AM, Brody CD. Causal contribution and dynamical encoding in the striatum during evidence accumulation. eLife. 2018;7:e34929. doi: 10.7554/eLife.34929.
  67. Yu AJ, Cohen JD. Sequential effects: superstition or rational behavior? Advances in Neural Information Processing Systems; 2008. pp. 1873–1880.
  68. Zatka-Haas P, Steinmetz NA, Carandini M, Harris KD. Distinct contributions of mouse cortical areas to visual discrimination. bioRxiv. 2019. doi: 10.1101/501627.
  69. Zhou B, Hofmann D, Pinkoviezky I, Sober SJ, Nemenman I. Chance, long tails, and inference in a non-Gaussian, bayesian theory of vocal learning in songbirds. PNAS. 2018;115:E8538–E8546. doi: 10.1073/pnas.1713020115.

Decision letter

Editor: Daeyeol Lee
Reviewed by: Long Ding, Alex C Kwan

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This manuscript presents a novel explanation for lapses in perceptual decision making. Using precise computational models to analyze the data from a click-rate discrimination task, the authors show that lapses might be due to the rats' uncertainty-dependent exploration strategy rather than inattention or motor errors.

Decision letter after peer review:

Thank you for submitting your article "Lapses in perceptual decisions reflect exploration" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Joshua Gold as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Long Ding (Reviewer #1); Alex C Kwan (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This manuscript presents an interesting explanation for lapses in perceptual decision making. The authors showed that rats showed imperfect performance on a click-rate-discrimination task. They showed that the amount of lapse depended on whether the decision was based on uni- or multi-sensory stimulus, whether the reward associations were equal or asymmetric for the two choices, and whether M2/pStr was intact. Using precise computational models, they proposed that these lapses were due to the rats' uncertainty-dependent exploration strategy. The results are somewhat consistent with the uncertainty-dependent exploration strategy, but some features that are inconsistent with this strategy appear to be ignored. A critical analysis to directly relate uncertainty and lapse across sessions was also missing. Therefore, although the manuscript has the potential of establishing the uncertainty-dependent exploration strategy as one of the factors contributing to lapses, additional analyses/explanations are needed.

Essential revisions:

1) The contrast between regular multisensory trials and "neutral" trials is a clever design. However, the results did not convincingly support the uncertainty-dependent exploration model. As the authors stated "the lapse parameter on neutral trials should match those on auditory trials, since these conditions have comparable levels of perceptual uncertainty". Judging from Figure 3—figure supplement 1E, this prediction did not hold for the example rat. It is unclear how well it held for the other four rats.

2) Model comparison results in Figure 3—figure supplement 3C suggest that the inattention model performed as well as the Exploration model for fitting uni/multi-sensory data. However, the results of the model fitting should be more fully disclosed. The results in Figure 3—figure supplement 3G were not conclusive. Namely, the inattention model seemed to outperform the Exploration model for rat#4; both models performed similarly for rat #5, and the Exploration model was better for the other three rats. The example rat in Figure 3—figure supplement 1E also showed other behavioral patterns that are puzzling. When reward was increased for the Right choice, there appeared to be a leftward bias (comparing the "Multisensory" curve in the left panel and the "Increased Right" curve in the right panel). The "equal reward" curve in the right panel showed significantly worse performance than other curves. How representative were these behavioral patterns? Do these patterns invalidate the uncertainty-dependent exploration model?

3) The two models (inattention and fixed error) used to compare against the exploration model are simplistic and may not serve as a fair comparison. In particular, based on prior literature on similar rodent tasks, it seems that another model based on motivation + inattention might be a more relevant and reasonable explanation, and should be compared against the exploration model. There is evidence that in sensory discrimination tasks, rodents' behavior exhibits serial choice bias. Specifically, if the last trial yielded a reward, then that could influence the current decision (Busse et al., 2011; Siniscalchi et al., 2019). One reasonable interpretation is that this is a motivational component that is dependent on the prior trial's outcome. Given this, one model that may be worthwhile to try is an outcome-dependent inattention model, where the amount of inattention differs depending on whether the last trial was rewarded or not. Namely, if the last trial was rewarded, then the animal has fewer lapses, whereas if the last trial was not rewarded, then the animal has more lapses. There is indication that some aspects of the current data support this idea (Figure 4F). How would this type of model contrast with the exploration model? One specific question is, similar to Figure 4F, but if we additionally plot previous L success and previous L failure, then does the reward history for prior L choices influence the proportion of choosing R at high stimulus rate?

The premise is that the exploratory choices would resemble lapses. This is true in a task design involving two choice options, but probably should be considered as a caveat of the task design. If the task has more than two choices, then one may more confidently distinguish these processes and identify periods of exploration. Some considerations as to how such a task design (or the fact that the current finding only has two options) influences the conclusions should be added in the Discussion.

4) The claim is that there is uncertainty-driven exploration that could explain the lapse rate. However, the task always employs the same criterion boundary for the discrimination problem, and the stimulus set is fixed across sessions. The animals are presumably over-trained and expert in this task, so it is unclear why they would be incentivized to update values for the stimuli in this sensory discrimination task. The authors presented some data to suggest they continuously learn. Is there a normative explanation for why they should be doing this in the current experiments?

5) Although the data in Figure 4C appear to support the uncertainty-dependent exploration model, it is possible that, on equal reward trials, the three rats trained for the "increased rRight" condition performed much worse than the three rats trained for the "decreased rRight" condition. The difference in "Proportion choose high" at 16Hz between the two cohorts for equal reward trials appeared as large as the effects of changing reward. The differences between equal reward trials and "increased/decreased rRight" trials might be due to some factors beyond value associations (e.g., how the two cohorts were trained).

6) There are many variants of models in the manuscript, but they were not presented in sufficient details, making it hard to track what parameters were fixed or fitted separately for different types of trials in a given experiment. For example, for the data in Figure 5, the legend says that the model fits scaled all contralateral values by a single parameter. Does it mean that this scaler was the only free parameter for the inactivation data, after fitting the control data? Or the model was fitted to both control and inactivation data simultaneously, with all but the scaler fixed between the two datasets? If a single scaling parameter can account for the inactivation effects, similar effects would be expected for auditory, visual and multi-sensory decisions for a given rat. But this does not seem to be the case. For example, Rats 8,9,10 in Figure 5—figure supplement 3 showed very different effects between auditory and visual decisions for M2-low rate side inactivation. Similarly, rats 2,3,6 in Figure 5—figure supplement 4 for pStr-low rate side inactivation. It would be helpful to have a table with the fitted parameter values for each experiment/rat, so that readers can better track how the model fitting was done and develop a better sense of how changes in model parameters affect the psychometric curves.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Lapses in perceptual decisions reflect exploration" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Joshua Gold as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Long Ding (Reviewer #1); Alex C Kwan (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

This manuscript presents an interesting explanation for lapses in perceptual decision making. The authors showed that rats showed imperfect performance on a click-rate-discrimination task. They showed that the amount of lapse depended on whether the decision was based on uni- or multi-sensory stimulus, whether the reward associations were equal or asymmetric for the two choices, and whether M2/pStr was intact. Using precise computational models, they proposed that these lapses were due to the rats' uncertainty-dependent exploration strategy. The authors have addressed most of the concerns raised by the reviewers appropriately, but there is one issue that requires additional clarification.

Revisions for this paper:

The original figure of concern actually showed example neutral/auditory trials from different rats. The authors generated new figures showing both types of trials from 5 rats separately (Figure 3—figure supplement 1E and Author response image 2C). In three out of the five rats, for the LOW choice, the lapse was larger for neutral trials; for the HIGH choice, the lapse was larger for auditory trials. This kind of asymmetric difference in lapse appears similar to the predictions for effort manipulation in Figure 4—figure supplement 2. If there was no category-specific value/effort manipulation between neutral and auditory trials, it is not intuitive how the uncertainty-dependent exploration model can account for this asymmetry. An explanation would be helpful.

eLife. 2021 Jan 11;10:e55490. doi: 10.7554/eLife.55490.sa2

Author response


Essential revisions:

1) The contrast between regular multisensory trials and "neutral" trials is a clever design. However, the results did not convincingly support the uncertainty-dependent exploration model. As the authors stated "the lapse parameter on neutral trials should match those on auditory trials, since these conditions have comparable levels of perceptual uncertainty". Judging from Figure 3—figure supplement 1E, this prediction did not hold for the example rat. It is unclear how well it held for the other four rats.

We thank the reviewers for these kind words about the experimental design and we are very grateful that they brought this issue to our attention. The example rats in Figure 3—figure supplement 1E were intended to demonstrate the relationship between conditions within each manipulation, hence a different example rat was chosen for each manipulation (Author response image 1A shows the original figure). This yielded misleading patterns across manipulations, especially since these were incorrectly labelled "example rat" in the figure and legend. We thank the reviewer for bringing these discrepancies to our attention – we have corrected this and chosen the same example rat across all manipulations (same rat, lc40, as reward panel – Author response image 1B) in order to facilitate across-manipulation comparisons. We hope that this revised version of the figure makes it clear that the total lapses on neutral trials (Author response image 1B, middle panel, orange trace) are similar to those on auditory trials (Author response image 1B, left panel, green trace).

Author response image 1.


In addition:

(1) An unconstrained descriptive model ("variable lapse", i.e. 4 independent parameters per condition) fit to all 5 rats reveals that slope and lapse parameters on neutral and auditory conditions lie along the unity line for 4/5 rats (Author response image 2A). We have updated the Results section to reflect this.

Author response image 2.


(2) The exploration model fits (Author response image 2C) demonstrate that the prediction *did* hold for the example rat (1st panel) as well as the other 4 rats: auditory and neutral conditions were constrained to have the same σ and lapse parameters in the exploration model (Author response image 2B), and this provides a good fit to the data.

2) Model comparison results in Figure 3—figure supplement 3C suggest that the inattention model performed as well as the Exploration model for fitting uni/multi-sensory data. However, the results of the model fitting should be more fully disclosed. The results in Figure 3—figure supplement 3G were not conclusive. Namely, the inattention model seemed to outperform the Exploration model for rat#4; both models performed similarly for rat #5, and the Exploration model was better for the other three rats. The example rat in Figure 3—figure supplement 1E also showed other behavioral patterns that are puzzling. When reward was increased for the Right choice, there appeared to be a leftward bias (comparing the "Multisensory" curve in the left panel and the "Increased Right" curve in the right panel). The "equal reward" curve in the right panel showed significantly worse performance than other curves. How representative were these behavioral patterns? Do these patterns invalidate the uncertainty-dependent exploration model?

The reviewer is correct in pointing out that there is individual variability in the best fitting model in Figure 3—figure supplement 3. However, it is important to note that the ideal observer model is a limiting case of the inattention/exploration models. For animals with very small lapse rates, these models are indistinguishable, so BIC would strongly prefer the more parsimonious ideal observer model. This is the case for both rats 4 and 5, for which the ideal observer provides the best fit according to BIC, suggesting that these rats have very small lapse rates. All 3 rats that are rejected by the ideal observer model, i.e. that have sizable lapse rates, are best fit by the exploration model. In the revised version of the manuscript, we acknowledge this variability in individuals that are best fit by the ideal observer vs exploration models (Results).
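The limiting-case argument can be illustrated with a minimal sketch. It assumes a generic four-parameter lapse psychometric function (bias `mu`, noise `sigma`, and lower/upper lapse rates; these names and all numbers are illustrative, not the fitted models from the paper) and the standard definition BIC = k ln(n) − 2 ln(L). The likelihoods are evaluated at fixed illustrative parameters rather than fitted. With both lapses at zero the function reduces to the ideal observer, so for a near-lapse-free animal the two extra parameters buy almost no likelihood and the BIC penalty favors the simpler model.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_high(rate, mu, sigma, lapse_low=0.0, lapse_high=0.0):
    """P(choose high). Lapses compress the sigmoid; zero lapses give the ideal observer."""
    return lapse_low + (1.0 - lapse_low - lapse_high) * phi((rate - mu) / sigma)

def log_lik(data, **params):
    """Binomial log-likelihood; data = [(rate, n_high, n_total), ...]."""
    ll = 0.0
    for rate, n_high, n_total in data:
        p = min(max(p_high(rate, **params), 1e-9), 1 - 1e-9)
        ll += n_high * math.log(p) + (n_total - n_high) * math.log(1 - p)
    return ll

def bic(ll, k, n):
    """Bayesian information criterion for k free parameters and n trials."""
    return k * math.log(n) - 2.0 * ll

# Hypothetical choice counts from a near-lapse-free observer (illustrative only)
rates = [9, 10, 11, 12, 13, 14, 15, 16]
data = [(r, round(200 * p_high(r, 12.5, 1.5)), 200) for r in rates]
n = sum(total for _, _, total in data)

ll_ideal = log_lik(data, mu=12.5, sigma=1.5)
ll_lapse = log_lik(data, mu=12.5, sigma=1.5, lapse_low=0.001, lapse_high=0.001)
bic_ideal = bic(ll_ideal, k=2, n=n)   # 2 free parameters
bic_lapse = bic(ll_lapse, k=4, n=n)   # 4 free parameters
```

Here the two log-likelihoods are nearly identical, so the comparison is driven by the 2 ln(n) penalty on the extra lapse parameters.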

As for the rat in Figure 3—figure supplement 1E, the "equal reward" and "increased Right" conditions on the rightmost panel both consist of auditory trials, and the performance on “equal reward” trials is comparable to performance on auditory trials in the “multisensory” or “neutral” experiments (updated left and center panels from the same example rat). We have clarified this in the legend.

3) The two models (inattention and fixed error) used to compare against the exploration model are simplistic and may not serve as a fair comparison. In particular, based on prior literature on similar rodent tasks, it seems that another model based on motivation + inattention might be a more relevant and reasonable explanation, and should be compared against the exploration model. There is evidence that in sensory discrimination tasks, rodents' behavior exhibits serial choice bias. Specifically, if the last trial yielded a reward, then that could influence the current decision (Busse et al., 2011; Siniscalchi et al., 2019). One reasonable interpretation is that this is a motivational component that is dependent on the prior trial's outcome. Given this, one model that may be worthwhile to try is an outcome-dependent inattention model, where the amount of inattention differs depending on whether the last trial was rewarded or not. Namely, if the last trial was rewarded, then the animal has fewer lapses, whereas if the last trial was not rewarded, then the animal has more lapses. There is some indication that aspects of the current data support this idea (Figure 4F). How would this type of model contrast with the exploration model? One specific question: if, as in Figure 4F, we additionally plot previous L success and previous L failure, does the reward history for prior L choices influence the proportion of R choices at high stimulus rates?

The premise is that the exploratory choices would resemble lapses. This is true in a task design involving two choice options, but probably should be considered as a caveat of the task design. If the task has more than two choices, then one may more confidently distinguish these processes and identify periods of exploration. Some considerations as to how such a task design (or the fact that the current finding only has two options) influences the conclusions should be added in the Discussion.

We fully agree with the reviewers here. If the level of attention were changed following correct/error trials, this could potentially account for the trial history effects: an increase/decrease in attention following success/failure, in combination with an upward or downward shift due to changes in guessing probabilities emerging from updates to the prior/action values (i.e. following success/failure on R, the animal either assumes that High/Low rates are more likely or assumes that R actions are more/less rewarding).

Importantly, in order for this to capture the asymmetry in the data, the two effects would have to be fine-tuned to cancel each other out at low rates and add up at high rates. Further, since this model invokes trial-by-trial updates, a fair comparison to it would be an exploration model combined with trial-by-trial updates of the values of chosen actions based on past outcomes (thus producing the asymmetry, which is seen following both leftward and rightward actions – Author response image 3), which is how we currently propose outcomes affect subsequent trials, as we allude to in the discussion on trial-by-trial modeling.

Author response image 3.


However, we can still make predictions for trial-averaged behavior from this model for the different manipulations, by assuming that p(Attend) is modulated by average reward, i.e. higher average rewards give rise to greater overall motivation, more attention and fewer lapses. This allows us to compare its predictions with other models on the neutral and reward manipulations:

(1) Predictions for matched v. neutral: Since the “matched” and “neutral” trial types are randomized and uncued, the animal doesn't know what the upcoming condition is and the average rewards preceding the two trials should be the same. Hence, the motivated inattention model predicts the same level of attention across the two conditions, just like the regular inattention model, predicting equal lapses (unlike those observed in the data).

(2) Predictions for reward magnitude manipulation: The motivated inattention model can indeed explain the effects of the reward magnitude experiment, by assuming that the higher/lower average reward on increased/decreased reward conditions gives rise to more/less attention, and fewer/more lapses in conjunction with the upward/downward shifts predicted by the regular inattention model. Once again, this does require fine-tuning of the two effects in order to cancel out at low rates (Author response image 4, left and center panels).

Author response image 4.


(3) Predictions for reward probability experiment: In this experiment, leftward actions are probabilistically rewarded (50%) on high rates (instead of yielding 0 reward), and always rewarded on low rates, thus increasing the overall proportion of rewarded outcomes and increasing the proportion of leftward trials rewarded, compared to rightward trials.

– This predicts an increase in overall attention (to both stimulus categories) due to the higher levels of motivation, and hence a decrease in overall lapses.

– It also predicts a non-specific increase in leftward choices due to the leftward biased average rewards yielding more leftward inattentive guesses.

In particular, these two effects should lead to a bigger downward shift in low rates (as shown in Author response image 4, right panel).

However, this is not the effect observed in the data. Instead, the manipulation only increases leftward choices for high rates, thus increasing lapses on high rates and consequently increasing the overall lapse rate, which matches the predictions of the exploration model.
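The motivated inattention prediction for the reward probability experiment can be sketched numerically. The snippet below assumes a standard inattention mixture (attend with probability `p_attend`, otherwise guess "high" with probability `p_guess_high`); the function name and all parameter values are hypothetical and chosen only to show the direction of the effects, not taken from the fits in the paper.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_high_inattention(rate, p_attend, p_guess_high, mu=12.5, sigma=1.0):
    """Attend with prob p_attend (use the stimulus), else guess 'high' with p_guess_high."""
    return p_attend * phi((rate - mu) / sigma) + (1.0 - p_attend) * p_guess_high

# Hypothetical baseline: unbiased guesses
baseline = dict(p_attend=0.80, p_guess_high=0.5)
# Hypothetical manipulated condition under motivated inattention:
# slightly more attention overall, and guesses biased toward the left ("low") port
manipulated = dict(p_attend=0.82, p_guess_high=0.3)

low_rate, high_rate = 5.0, 20.0  # far from the boundary, so phi is ~0 or ~1
shift_low = (p_high_inattention(low_rate, **manipulated)
             - p_high_inattention(low_rate, **baseline))
shift_high = (p_high_inattention(high_rate, **manipulated)
              - p_high_inattention(high_rate, **baseline))
```

At low rates the attention and guess-bias effects add (both reduce "high" choices), while at high rates they partly cancel, so `shift_low` is the larger downward shift. This is the pattern that, according to the response above, the data do not show.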

We thank the reviewers for suggesting the motivated inattention model, and have added it as a new panel (Figure 4—figure supplement 1F). We have also added text in the Discussion about the caveats of a task design with two options, and cited emerging work that offers a possible remedy (Zatka-Haas et al., 2018; Mihali et al., 2018).

4) The claim is that there is uncertainty-driven exploration that could explain the lapse rate. However, the task always employs the same criterion boundary for the discrimination problem, and the stimulus set is fixed across sessions. The animals are presumably over-trained and expert in this task, so it is unclear why they would be incentivized to update values for the stimuli in this sensory discrimination task. The authors presented some data to suggest they continuously learn. Is there a normative explanation for why they should be doing this in the current experiments?

The reviewers are correct that in the current task, the true category boundary, stimulus-action contingency and expected rewards are fixed. Therefore the normative strategy is to explore until the uncertainty in action values reduces to zero, and then stop exploring. While some features of the task, such as sensory uncertainty, abstractness of the stimulus-response contingency and arbitrariness of the category boundary could lead to increased action value uncertainty to begin with, this normative strategy still predicts zero lapses in the asymptotic limit of training (e.g. Author response image 5A simulation with belief in stationary rewards, increasing sensory uncertainty – indicated by cooler colors – reduces speed of reduction in lapse rates, but asymptotic lapse rate is 0).

Author response image 5.


A possible normative explanation for the fact that the animals are continuously learning/exploring is that they do not assume that the action values are static, but instead entertain the possibility that they drift/change over time (possibly reflecting the statistics of real world rewards). While this model is clearly mismatched to the current task, under this model (i.e. in truly non-stationary worlds) the normative solution is to maintain a low level of exploration to test whether the world has changed or not. We can simulate this using a model that assumes a small rate of non-stationarity in values (Author response image 5B). This model predicts a residual level of uncertainty that never goes to 0, and consequently a residual asymptotic lapse rate scaled by sensory uncertainty. Some animals do indeed achieve close-to-zero lapse rates after extensive training (e.g. Figure 4—figure supplement 2), but for animals that still have residual lapses, it is difficult to distinguish these models with finite training data.
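The contrast between stationary and non-stationary beliefs can be illustrated with a generic scalar Kalman filter. This is a stand-in technique, not the simulation used for Author response image 5, and the function name and variance values are assumptions made for illustration. The belief variance about one action value shrinks toward zero when no drift is assumed, but settles at a positive floor when a small drift is assumed.

```python
def posterior_variance(n_trials, drift_var, obs_var=1.0, prior_var=1.0):
    """Variance of a Gaussian belief about one action value after n_trials outcomes.

    drift_var is the assumed per-trial variance of value drift (0 = stationary belief).
    """
    v = prior_var
    for _ in range(n_trials):
        v = v + drift_var                # allow for the value having drifted
        v = v * obs_var / (v + obs_var)  # scalar Kalman update after one outcome
    return v

# Hypothetical settings: same training length, with and without assumed drift
stationary = posterior_variance(5000, drift_var=0.0)
nonstationary = posterior_variance(5000, drift_var=0.01)
```

Under uncertainty-guided exploration, the residual variance in the non-stationary case would correspond to a residual, non-zero rate of exploratory choices even after extensive training.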

Fortunately, a unique prediction of a mismatched world model is that it predicts that in the event of a contingency change, subjects should unlearn old contingencies much faster than predicted by a stationary belief model (which would display perseveration and take as many examples to unlearn a contingency as it did to learn it – akin to Ebitz et al., 2019). We have performed preliminary tests of this prediction by reversing the contingency in a small cohort of rats. We observed that rats did indeed unlearn old contingencies much faster than predicted from the stationary belief model, and resembled a model that entertains a small possibility of non-stationarity in values.

We have expanded on this point about normativity and included predictions for matched/mismatched world models and tests of contingency change in the Discussion. However, we have not included the results of our contingency change experiments in this manuscript since they are quite preliminary.

5) Although the data in Figure 4C appear to support the uncertainty-dependent exploration model, it is possible that, on equal reward trials, the three rats trained for the "increased rRight" condition performed much worse than the three rats trained for the "decreased rRight" condition. The difference in "Proportion choose high" at 16Hz between the two cohorts for equal reward trials appeared as large as the effects of changing reward. The differences between equal reward trials and "increased/decreased rRight" trials might be due to some factors beyond value associations (e.g., how the two cohorts were trained).

The reviewer is correct in pointing out that there are individual differences between cohorts at baseline, possibly due to differences between the precise history of training data each rat has seen (which we expect to reflect in the subjective action values learnt by each rat).

For this reason, we restrict all our reward comparisons in Figure 4C to within-cohort comparisons: each rat is first trained on equal reward, then tested with a reward manipulation (either increase or decrease) to measure the effect of the reward manipulation on its behavior relative to baseline (which in the exploration model is captured by scaling only the relevant baseline value). Neither of the cohorts has baseline lapses at ceiling/floor, allowing for measurement of the effect of reward manipulation on both lapses and hence a comparison of the different models. Moreover, the exploration model captures within-individual comparisons relative to baseline by changing high rate action values on the right alone, even though individuals vary substantially in their baseline left/right values (Table 2). We have clarified this in the Results.

6) There are many variants of models in the manuscript, but they were not presented in sufficient details, making it hard to track what parameters were fixed or fitted separately for different types of trials in a given experiment. For example, for the data in Figure 5, the legend says that the model fits scaled all contralateral values by a single parameter. Does it mean that this scaler was the only free parameter for the inactivation data, after fitting the control data? Or the model was fitted to both control and inactivation data simultaneously, with all but the scaler fixed between the two datasets? If a single scaling parameter can account for the inactivation effects, similar effects would be expected for auditory, visual and multi-sensory decisions for a given rat. But this does not seem to be the case. For example, Rats 8,9,10 in Figure 5—figure supplement 3 showed very different effects between auditory and visual decisions for M2-low rate side inactivation. Similarly, rats 2,3,6 in Figure 5—figure supplement 4 for pStr-low rate side inactivation. It would be helpful to have a table with the fitted parameter values for each experiment/rat, so that readers can better track how the model fitting was done and develop a better sense of how changes in model parameters affect the psychometric curves.

We agree. First, to address the point about clarifying models and parameters in the paper, we generated a new table with fit parameters for all the experiments and models in order to clarify parameters and constraints for each of the fits (Table 1, i.e. Figure 4—source data 1: fits to pooled data across individuals). We generated a second table, Table 2, with individual fits. However, we did not include this in the revised manuscript because we feared this might be cumbersome to include in the supplement.

As for inactivation fits, the model was fit to both control and inactivation data simultaneously, with a single scalar being the only parameter differing between the two datasets. We have added a separate section describing inactivation modeling in the Materials and methods to clarify this point.

Second, we have addressed the issue that the fits in (Figure 5—figure supplement 3,4) were unclear. In the original manuscript, they were indeed all descriptive, unconstrained fits (i.e. 4 parameters per fit x 3 modalities x 2 perturbation conditions = 24 params per rat). We have updated this figure with individual fits of the best fitting model for each rat (biased evidence/value/effort, which have 11 control + scalar = 12 params per rat). Despite being heavily constrained, these account for all rats quite well.

All three models of inactivation are capable of producing effects with differing strengths across modalities: the key intuition is that these effects interact with the baseline sensory noise (biased evidence) and baseline values/exploratoriness (biased value and biased effort), producing the strongest effects for modalities with the highest sensory noise/exploration. The simulation (Author response image 6) illustrates this difference. Performance on control trials for left- and right-biased baseline action values shows only subtle differences (compare solid traces in top and bottom panels). However, the same inactivation strength (i.e. multiplicative/additive factor) drives strikingly different effects on these conditions (compare dashed traces in top and bottom panels). This highlights the strength of a model that can estimate action values from lapse rates. This approach can reconcile seemingly different effects across conditions within the same animal (or across animals) with a single change in action value, without needing to invoke separate mechanisms for each condition.
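The interaction between a fixed inactivation factor and baseline action values can be sketched as follows. This is a deliberately simplified illustration, assuming a two-action softmax on an easy trial where the incorrect action has value 0; the function names, the inverse temperature `beta` and all values are hypothetical and are not the model fitted in the paper.

```python
import math

def lapse_rate(v_correct, v_incorrect=0.0, beta=1.0):
    """Softmax probability of picking the lower-valued action on an easy trial."""
    return 1.0 / (1.0 + math.exp(beta * (v_correct - v_incorrect)))

def lapse_change(v_baseline, scale=1.0, effort=0.0):
    """Change in lapse rate when the contralateral value becomes scale * v - effort."""
    return lapse_rate(scale * v_baseline - effort) - lapse_rate(v_baseline)

# Two hypothetical baseline values for the contralateral action
for v in (3.0, 5.0):
    multiplicative = lapse_change(v, scale=0.5)  # "biased value": same scalar for both
    additive = lapse_change(v, effort=1.0)       # "biased effort": same offset for both
    print(v, round(multiplicative, 3), round(additive, 3))
```

In this sketch the same scalar produces a larger increase in lapses for the lower baseline value, which is the sense in which a single parameter can yield effects of different sizes across conditions.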

Author response image 6.


However, despite the ability of the model to account for disparate changes in the psychometric function, we agree with the reviewer's assessment that the biased value model does not fully account for the effects in some rats. In Rats 8, 9 (M2 low rate), 2 (pStr high), 3 (pStr low) and 6 (pStr high and low), the best fitting model was still one in which action values changed. But, importantly, the winning model for those rats was the one in which a single scalar 'effort' (i.e. negative, stimulus-independent value) was added to the contralateral side, rather than a single multiplicative (i.e. stimulus-dependent) value scaling. This suggests that the inactivations affected value additively, rather than multiplicatively in these rats. In the revised version of the text, we address these individual differences (and display fits from the best fitting model for each rat in Figure 5—figure supplements 3,4), and acknowledge that additional recording and inactivation studies might shed light on why disruptions drive changes that are sometimes additive and sometimes multiplicative. We argue that our approach nonetheless gives an experimenter considerable power in interpreting inactivations that diverge across conditions.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Revisions for this paper:

The original figure of concern actually showed example neutral/auditory trials from different rats. The authors generated new figures showing both types of trials from 5 rats separately (Figure 3—figure supplement 1E and Author response image 2C). In three out of the five rats, for the LOW choice, the lapse was larger for neutral trials; for the HIGH choice, the lapse was larger for auditory trials. This kind of asymmetric difference in lapse appears similar to the predictions for effort manipulation in Figure 4—figure supplement 2. If there was no category-specific value/effort manipulation between neutral and auditory trials, it is not intuitive how the uncertainty-dependent exploration model can account for this asymmetry. An explanation would be helpful.

The reviewers have correctly identified that while auditory and neutral trials have comparable sigmas and total lapse rates, some rats indeed show a slight low-rate bias on auditory trials compared to neutral and multisensory trials (Author response image 8). While all models considered can account for this through their free “bias” and “lapse-bias” parameters (hence not affecting model comparisons), none of them explain its existence.

Author response image 8. Bias in psychometric functions of individual rats.


Data from the 5 rats used in the neutral experiment, on auditory (green), neutral (orange) and multisensory (red) conditions, demonstrating individual variability in low-rate bias on auditory trials, compared to neutral/multisensory trials. Some rats (e.g. rightmost panel) show almost no bias, resembling the “rate-only” decoders, while others resemble “hybrid” decoders with weak influences from count information.

Instead, we think that this bias can be explained by animals not using a pure "flash/pulse rate" decoder to solve the task, but instead using a hybrid decoder that incorporates both "flash/pulse rate" and "total flash/pulse count" information. These two features are correlated with each other and with reward on all trial types, making them both susceptible to reward-based credit assignment – we reported a similar hybrid "rate+count" strategy in rats and mice on a visual-only variant of the rate task (Odoemene et al., 2018, Figure 3). Note that neutral and multisensory trials however are offset on the "count" dimension compared to auditory trials, since they have additional events for the same "rate" (twice as many on multisensory, 12 additional on neutral – First column in Author response image 7).

Author response image 7. Use of count information generates bias in psychometric functions.


Simulations demonstrating the effect of 4 different linear decoders (1st column) on psychometric functions under the fixed error (2nd column), inattention (3rd column) and exploration (4th column) models. Dotted black lines indicate the true category boundary that separates “low” and “high” stimulus categories, solid black lines indicate the subjective category boundary for each decoder. Colored dots indicate stimulus sets in auditory (green), neutral (orange) and multisensory (red) conditions. Open circles indicate point of subjective equality i.e. stimulus that produces equal “high” and “low” evidence for each condition. Rate-only decoder (1st row) aligns with the true category boundary, producing unbiased psychometric functions across conditions, with models differing only in the total lapse rate of each condition. Hybrid decoders incorporating a small amount of count information (2nd row: 1/15th as much as rate, 3rd row: 1/5th as much as rate, 4th row: just as much as rate) that are unbiased on multisensory, neutral conditions produce a low-rate bias on auditory trials, with more use of count information producing higher bias. This produces horizontal shifts (i.e. bias) across models, and additionally produces vertical shifts (i.e. lapse bias) in inattention, exploration models, while preserving the predictions for total lapse rates across models/conditions.

As a result, only an observer that uses a pure "rate" decoder would have an unbiased psychometric across all conditions, centered at the true category boundary of 12.5 (First row in Author response image 7) – this is true of rat 5 in Author response image 8. However, even a weak influence of count would bias conditions with respect to each other. For instance, an animal that uses a "hybrid" decoder that is unbiased for multisensory and neutral conditions would be biased towards low rates on auditory trials, with increasing use of count information producing greater biases (Second, third, fourth rows of Author response image 7 – effects of count information adding 1/15th as much, 1/7th as much or just as much evidence as rate information).

Such a hybrid decoder would translate into different effective state-action value pairs for different conditions based on their position in this 2-d space, with unisensory conditions having the highest asymmetry between low and high rate action values. In the fixed error model, this asymmetry would simply produce horizontal biases in the psychometric function, but in both inattention and exploration models, this asymmetry would additionally bias the lapses. Hence, the varying biases and lapse biases seen in the other 4 rats (Author response image 8) could arise from varying degrees of use of count information. Note that if an animal using a hybrid decoder was unbiased on auditory trials, then it would show a high-rate bias on multisensory trials, as is observed in some of the animals in the 1st cohort (unisensory vs. multisensory).
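The effect of count information on bias can be worked through with a small linear-decoder sketch. It assumes 1-s stimuli so that count equals rate on auditory trials (an assumption), and uses the offsets stated above (twice as many events on multisensory trials, 12 additional events on neutral trials). The decoder form, the names `pse` and `count_ref`, and the choice to make the multisensory condition exactly unbiased are illustrative assumptions rather than the authors' simulation.

```python
BOUNDARY = 12.5  # true category boundary (events/s), from the text

# count = a * rate + b for each condition, assuming 1-s stimuli
CONDITIONS = {
    "auditory": (1.0, 0.0),
    "multisensory": (2.0, 0.0),  # twice as many events
    "neutral": (1.0, 12.0),      # 12 additional events
}

def pse(condition, w, count_ref=2 * BOUNDARY):
    """Rate at which evidence (rate - BOUNDARY) + w * (count - count_ref) equals zero.

    w is the weight of count relative to rate; count_ref is set so that the
    multisensory condition is unbiased.
    """
    a, b = CONDITIONS[condition]
    return (BOUNDARY + w * (count_ref - b)) / (1.0 + w * a)

# Rate-only decoder (w = 0) and two hybrid decoders with increasing count weight
for w in (0.0, 1 / 15, 1.0):
    print(round(w, 3), {c: round(pse(c, w), 2) for c in CONDITIONS})
```

With `w = 0` every condition has its point of subjective equality at the true boundary. For `w > 0` the auditory point of subjective equality moves above 12.5, meaning more "low" choices near the boundary (a low-rate bias), while the multisensory and neutral conditions stay close to unbiased.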

We tested this “hybrid decoder” hypothesis in an independent cohort of rats where we increased/decreased the duration of auditory trials to increase/decrease count information without changing rate information (as per Odoemene et al., 2018), and indeed found evidence for a hybrid decoder with a weak influence of count (Author response image 9). However, we think these results are outside the scope of the current manuscript. In case other readers share the reviewers’ concern about bias, we now mention the count bias and refer to previous work.

Author response image 9. Duration manipulation on auditory trials confirms weak influence of count.


Data from 4 rats (right) show that increasing (light green) or decreasing (brown) the duration of auditory trials produces slight high- or low-rate biases, resembling a hybrid decoder with a weak influence of count information (middle panel, left).

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Pisupati S, Chartarifsky L, Khanal A, Churchland A K. 2020. Dataset from: Lapses in perceptual decisions reflect exploration. CSHL.

    Supplementary Materials

    Figure 4—source data 1. Fit parameters to pooled data across rats.
    Transparent reporting form

    Data Availability Statement

    Data are publicly available: http://repository.cshl.edu/id/eprint/38957/.

    The following dataset was generated:

    Pisupati S, Chartarifsky L, Khanal A, Churchland A K. 2020. Dataset from: Lapses in perceptual decisions reflect exploration. CSHL.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
