The effects of phasic and tonic dopamine on Believer-Skeptic competition. (A) Simulation of a probabilistic value-based decision task (upper-left; see Frank et al., 2004) in which the agent must learn the relative value of two arbitrary stimuli based on trial-and-error feedback. On each trial the agent makes a decision by choosing between a pair of Japanese symbols, one with a higher probability of yielding a reward (left column; chosen with action aopt) than the other (right column; chosen with action asub). Value-based decisions are simulated as a race-to-threshold between two stochastic accumulators (see Figure 3), each reflecting the direct-indirect competition within a single action channel (see Figure 2). Both actions start out with equal associated values, Q(aopt) = Q(asub), and thus equal drift-rates of accumulation. On each trial, the corrective effects of phasic changes in dopamine are simulated by enhancing (depressing) the sensitivity of the direct (indirect) pathway following positive outcomes (+δ) and vice-versa following negative outcomes (−δ). In the accumulator model, this learning results in an increase in the drift-rate for aopt (solid arrow) and a decrease in the drift-rate for asub (dotted arrow), proportional to the difference in their associated value. The bottom panel shows the timeline of the estimated value difference for alternative actions (Q(aopt) − Q(asub)) for three different probabilistic reward schedules. Stimulus pairs with a greater discrepancy in reward probability (i.e., red > green > blue) lead to faster associative value learning. (B) Simulated effects of tonic dopamine levels on the exploration-exploitation tradeoff. Tonic dopamine levels were simulated by varying the strength of non-specific background inputs (Iλ) in a network with stronger weighting of cortical input to the direct than to the indirect pathway.
(Bottom panel) The same ratio of cortical input to the direct (green) and indirect (blue) pathways leads to faster gating in the presence of higher Iλ (darker colors, increased baseline) compared to when Iλ is low (lighter colors, decreased baseline). (Top panel) Increasing tonic levels of Iλ facilitates exploitation of the current cortico-striatal weights by accelerating evidence accumulation, resulting in faster decisions and reduced trial-to-trial variability in RT. In contrast, behavior is substantially more variable with lower levels of Iλ, promoting an exploration policy.
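
The simulation logic described in this caption can be illustrated with a minimal sketch. It is not the authors' model: the network-level direct-indirect competition is abstracted away, phasic dopamine is reduced to a simple delta-rule update of Q, and tonic dopamine (Iλ) is approximated by a hypothetical additive baseline `tonic` on both drift-rates. All function names, parameter names, and default values below are assumptions chosen for illustration.

```python
import numpy as np

def race_trial(drifts, rng, threshold=1.0, noise=0.3, dt=0.01, max_steps=1000):
    """Race noisy accumulators to a common threshold.

    Returns (index of the winning accumulator, decision time).
    """
    drifts = np.asarray(drifts, dtype=float)
    x = np.zeros(drifts.size)
    for step in range(1, max_steps + 1):
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion noise.
        x += drifts * dt + noise * np.sqrt(dt) * rng.standard_normal(drifts.size)
        if x.max() >= threshold:
            return int(x.argmax()), step * dt
    # No accumulator crossed in time: report the leader at the deadline.
    return int(x.argmax()), max_steps * dt

def simulate(p_reward=(0.8, 0.2), n_trials=500, alpha=0.1,
             tonic=1.0, gain=1.0, seed=0):
    """Learn Q(a_opt) and Q(a_sub) from probabilistic feedback.

    Index 0 is a_opt, index 1 is a_sub. `tonic` is an assumed stand-in
    for the background input I_lambda; `gain` scales how strongly the
    value difference moves the two drift-rates apart.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                      # equal initial values, equal drift-rates
    value_diff = np.empty(n_trials)
    rts = np.empty(n_trials)
    for t in range(n_trials):
        # Drift-rates diverge in proportion to the learned value difference.
        drifts = tonic + gain * (q - q.mean())
        choice, rt = race_trial(drifts, rng)
        reward = float(rng.random() < p_reward[choice])
        delta = reward - q[choice]       # prediction error (+delta / -delta)
        q[choice] += alpha * delta       # assumed delta-rule stand-in for phasic DA
        value_diff[t] = q[0] - q[1]
        rts[t] = rt
    return value_diff, rts
```

Calling `simulate` with different `p_reward` pairs gives value-difference timelines comparable in spirit to the bottom panel of (A), and comparing the mean and standard deviation of `rts` across `tonic` values illustrates the RT effects described in (B). Because the additive baseline is only a crude proxy for Iλ, the sketch should be read as showing the speed and variability of decisions, not the network's choice policy.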