Single-phase models have difficulty assigning credit correctly during reinforcement learning. (a) Only the MSNs of the direct pathway are active. Activity levels are indicated by the size of the yellow star. Activity in the striatum influences the final action selection by premotor cortex, which in this instance chooses R. In this case, dopamine-dependent reinforcement works correctly to strengthen synapses onto the R MSN in the direct pathway. (b) Both direct and indirect pathways are active, and both ‘vote’ on actions. The final choice is R because votes for L in the direct pathway are balanced by votes against L in the indirect pathway. If R is reinforced, the resulting dopamine release will enhance the active synapses onto the L MSNs, which have the largest eligibility trace in the direct pathway. This is problematic because it strengthens the input to L rather than to the chosen action R, and so will not lead to the desired change in behaviour. (Online version in colour.)
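
The two panels can be illustrated with a minimal numerical sketch. It is not the model from the text: the activity values, the subtractive 'vote' (direct minus indirect), the use of activity as the eligibility trace, the learning rate and all function names are assumptions chosen only to reproduce the qualitative pattern described in (a) and (b).

```python
import numpy as np

ACTIONS = ["L", "R"]  # index 0 = L, index 1 = R


def select_action(direct, indirect):
    """Assumed readout: net vote = direct activity minus indirect activity."""
    net = np.asarray(direct, dtype=float) - np.asarray(indirect, dtype=float)
    return int(np.argmax(net)), net


def single_phase_update(w_direct, direct, dopamine, lr=0.1):
    """Assumed single-phase rule: dopamine potentiates direct-pathway
    synapses in proportion to their eligibility (here, MSN activity)."""
    eligibility = np.asarray(direct, dtype=float)
    return np.asarray(w_direct, dtype=float) + lr * dopamine * eligibility


def run_case(direct, indirect):
    """Return (chosen action, action whose direct synapses grew most)."""
    choice, _ = select_action(direct, indirect)
    w_before = np.ones(2)
    w_after = single_phase_update(w_before, direct, dopamine=1.0)
    most_strengthened = int(np.argmax(w_after - w_before))
    return ACTIONS[choice], ACTIONS[most_strengthened]


# Panel (a): only the direct pathway is active (hypothetical values).
case_a = run_case(direct=[0.2, 0.8], indirect=[0.0, 0.0])

# Panel (b): the direct pathway favours L, but the indirect pathway
# votes against L, so R is still chosen (hypothetical values).
case_b = run_case(direct=[0.9, 0.5], indirect=[0.9, 0.0])

print("panel (a): chosen, strengthened =", case_a)
print("panel (b): chosen, strengthened =", case_b)
```

Under these assumed values, the chosen and the most strengthened action coincide in (a) but differ in (b), where R is chosen yet the synapses onto the L MSN receive the largest increment.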