A gated-reinforcement learning model for sequence learning. (A) Schematic of the model for a song motif with three time steps (T1→T2→T3). Three ensembles of MSNs (a/b/c) in Area X (gray circle) are involved in learning the sequence of corresponding vocal elements (labeled as syllables A/B/C). The MSN ensembles receive inputs from ensembles of HVC neurons (HVCx, labeled t1/t2/t3), each of which is active at the corresponding time step (T1–T3). Arrows between HVCx and an individual ensemble of MSNs indicate their synaptic connections and the corresponding weights. In this hypothetical scenario, the probability that ensemble “a” is activated is equal across all three time steps for a given rendition of the song motif. This contrasts with ensemble “b”, which can be activated only at T2, and ensemble “c”, which can be activated at either T2 or T3. (B) Theoretical neural activity in MSN ensembles (a/b/c) in Area X at different time steps, given the corticostriatal connections shown in (A). (C) Syllable syntax map showing the potential syllable transitions that could result from the activity patterns across MSN ensembles in Area X shown in (B). For instance, if ensemble “a” is active at all three time steps in one rendition, the syntax “AAA” is produced. In another rendition, if ensemble “a” is active at T1 and ensemble “c” is active at T2 and T3, the syntax “ACC” is produced. Arrows between different syllables indicate the corresponding transitions. Circular arrows beside “A” or “C” indicate repetition of the given syllable. (D) Schematic of the three inputs to MSN ensemble “a”: VTA/SN, HVC, and ChIs. The square represents ChIs whose pauses are associated with syllable “A”. The green circle indicates that ensemble “a” is depolarized at the same time that the ChIs projecting onto ensemble “a” pause their activity. Arrows indicate glutamatergic inputs from either HVC or DLM (thalamus).
Lines ending in a circle indicate modulatory inputs from either VTA/SN (gray, DA) or ChIs (black, ACh). (E) Schematic of gated-reinforcement learning. ChI pauses are coincident with activation of ensemble “a” (green circle), while the phasic dopamine signal (DA, gray rectangle) is active at T1. LTP (red rectangle) results at T1, when MSN depolarization coincides with a ChI pause and a phasic increase in DA; LTD (blue rectangle) results at T2/T3, when MSN depolarization coincides with a ChI pause in the absence of a phasic increase in DA. (F) Given the coincident MSN depolarization, cholinergic pauses, and phasic increase in DA shown in (E), LTP (red arrow) results at “t1→a” synapses and LTD (blue dashed arrow) results at “t2→a” and “t3→a” synapses.
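
The logic of panels (A)–(C) and (E)–(F) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`possible_syntaxes`, `gated_update`), the learning rate `lr`, the starting weight of 0.5, and the use of equal-sized LTP and LTD steps are all assumptions made for illustration, as is the simplification that exactly one MSN ensemble is active per time step. The sketch first enumerates the syllable sequences permitted by the connectivity in (A), then applies the gating rule in (E): a synapse changes only when MSN depolarization coincides with a ChI pause, with the sign of the change set by the presence or absence of phasic DA.

```python
from itertools import product

TIME_STEPS = ("t1", "t2", "t3")

# Time steps at which each MSN ensemble can be activated, as described in panel (A).
ALLOWED = {
    "a": {"t1", "t2", "t3"},
    "b": {"t2"},
    "c": {"t2", "t3"},
}


def possible_syntaxes(allowed=ALLOWED, steps=TIME_STEPS):
    """Enumerate syllable sequences the connectivity permits.

    Assumption (not stated in the caption): exactly one ensemble is
    active at each time step.
    """
    options = [sorted(e for e, ts in allowed.items() if t in ts) for t in steps]
    return ["".join(seq).upper() for seq in product(*options)]


def gated_update(weights, depolarized, chi_pause, dopamine, lr=0.1):
    """Return updated HVCx -> MSN weights under the gated rule of panel (E).

    A synapse is eligible for plasticity only when MSN depolarization
    coincides with a ChI pause. Eligible synapses undergo LTP (+lr) if
    phasic DA is present and LTD (-lr) if it is absent. The step size
    `lr` and the symmetric LTP/LTD magnitudes are illustrative assumptions.
    """
    updated = dict(weights)
    for t in weights:
        if depolarized[t] and chi_pause[t]:  # gate is open
            updated[t] += lr if dopamine[t] else -lr
    return updated


if __name__ == "__main__":
    print(possible_syntaxes())

    # Scenario of panels (E)-(F) for ensemble "a": depolarized with a
    # coincident ChI pause at every time step, phasic DA at T1 only.
    weights = {t: 0.5 for t in TIME_STEPS}  # hypothetical starting weights
    always = {t: True for t in TIME_STEPS}
    dopamine = {"t1": True, "t2": False, "t3": False}
    print(gated_update(weights, always, always, dopamine))
```

Under these assumptions, the enumeration includes the “AAA” and “ACC” examples from (C), and the update strengthens the “t1→a” weight while weakening “t2→a” and “t3→a”, matching the qualitative outcome in (F). Synapses whose gate is closed (no depolarization or no ChI pause) are left unchanged regardless of DA.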