Figure 1.
(A) Pictorial representation of the reciprocal anatomical connections between the basal ganglia, thalamus, and cerebellum. Green arrows depict the cortico-striatal reward learning circuitry via the thalamus. Blue arrows depict the cortico-cerebellar recurrent loops for classically conditioned reflexive behaviors. Adapted and modified from Doya (2000a). (B) Combinatorial learning framework in which ICO learning and actor-critic reinforcement learning operate in parallel. The individual learning mechanisms adapt their weights independently, and their weighted outputs (Oico and Oac) are then combined into Ocom using a reward-modulated heterosynaptic plasticity rule (dotted arrows represent plastic synapses). Ocom controls the agent's behavior (policy), while sensory feedback from the agent is sent back to both learning mechanisms in parallel.
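
The parallel combination in panel (B) can be illustrated with a minimal Python sketch. The caption does not give the plasticity rule itself, so the update form below is an assumption chosen for illustration: each combination weight changes in proportion to the reward, to the deviation of its own learner's output from a running mean, and to the other learner's weight (the heterosynaptic term). The names `combine`, `update_combination_weights`, `w_ico`, `w_ac`, `mean_ico`, `mean_ac`, and `eta` are all hypothetical.

```python
def combine(o_ico, o_ac, w_ico, w_ac):
    """Weighted sum of the two learner outputs (Ocom in the caption)."""
    return w_ico * o_ico + w_ac * o_ac


def update_combination_weights(w_ico, w_ac, o_ico, o_ac,
                               mean_ico, mean_ac, reward, eta=0.1):
    """One step of an ASSUMED reward-modulated heterosynaptic rule.

    Each weight change is gated by the reward and scaled by the other
    pathway's weight. This is an illustrative form, not the rule
    specified by the figure or its source.
    """
    d_ico = eta * reward * (o_ico - mean_ico) * w_ac
    d_ac = eta * reward * (o_ac - mean_ac) * w_ico
    return w_ico + d_ico, w_ac + d_ac


if __name__ == "__main__":
    w_ico, w_ac = 0.5, 0.5
    o_ico, o_ac = 1.0, 0.0  # hypothetical learner outputs at one time step
    o_com = combine(o_ico, o_ac, w_ico, w_ac)
    w_ico, w_ac = update_combination_weights(
        w_ico, w_ac, o_ico, o_ac, mean_ico=0.0, mean_ac=0.0, reward=1.0
    )
    print(o_com, w_ico, w_ac)
```

In this sketch a zero reward leaves both weights unchanged, so the combination only shifts toward one learner when its output deviates from its mean while reward is present.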