PLoS Comput Biol. 2011 May 12;7(5):e1001133. doi: 10.1371/journal.pcbi.1001133

Figure 2. Neuronal actor-critic architecture generating and exploiting a dopaminergic TD error signal.


The input layer of the neuronal network consists of pools of cortical neurons (C) representing state information. The critic module is composed of neurons in the striatum (STR), neurons in the ventral pallidum (VP) and dopaminergic neurons (DA). The direct pathway from the striatum to the dopamine neurons is delayed with respect to the indirect pathway via the neuron population in the ventral pallidum. The actor module consists of one neuron for each possible action. The neural network interacts with an environment (E). The environment stimulates the cortical neurons representing the current state with a DC input. Whichever action neuron fires first is interpreted by the environment as the chosen action for the current state. After an action has been chosen, the environment inhibits the actor neurons for a short period with a negative DC input. If the current state is associated with a reward, the environment delivers a reward signal (R) in the form of a DC input to the dopaminergic neurons. The dopaminergic signal acts as a global third factor that modulates the plasticity of the cortico-striatal synapses and of the synapses between cortex and actor neurons. Red lines: inhibitory connections; blue lines: excitatory connections; purple lines: dopaminergic signal. All neurons receive additional Poissonian background noise (not shown).
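Two mechanisms named in the caption, first-spike action selection and dopamine-gated (three-factor) plasticity, can be illustrated with a minimal sketch. It is not the model used in the article: the actor neurons are reduced to noisy leaky integrators, Gaussian noise stands in for the Poissonian background input, and the weight update is a generic three-factor rule. The function names and all parameter values (`drive`, `tau_m`, `noise_sd`, `baseline`, `lr`) are hypothetical choices made for illustration only.

```python
import math
import random


def first_spike_action(weights, rng, dt=0.1, tau_m=10.0, threshold=1.0,
                       drive=0.15, noise_sd=0.05, t_max=1000.0):
    """Pick an action by letting one leaky integrator per action race to threshold.

    weights: assumed cortex-to-actor weights for the active state, one per action.
    Returns (action_index, latency_ms), or (None, t_max) if no neuron fires.
    """
    v = [0.0] * len(weights)
    steps = int(t_max / dt)
    for step in range(1, steps + 1):
        for a, w in enumerate(weights):
            # Gaussian noise is an assumed stand-in for Poissonian background input.
            noise = rng.gauss(0.0, noise_sd) * math.sqrt(dt)
            # Euler step: leak towards zero plus constant drive scaled by the weight.
            v[a] += dt * (-v[a] / tau_m + drive * w) + noise
        crossed = [a for a in range(len(v)) if v[a] >= threshold]
        if crossed:
            # If several neurons cross in the same step, take the largest potential.
            winner = max(crossed, key=lambda a: v[a])
            return winner, step * dt
    return None, t_max


def three_factor_update(w, eligibility, dopamine, baseline, lr=0.1):
    """Generic three-factor rule (assumed form, not the article's equations).

    The local pre/post term (eligibility) is scaled by a global signal:
    the deviation of the dopamine level from an assumed baseline.
    """
    return w + lr * (dopamine - baseline) * eligibility


rng = random.Random(0)
action, latency = first_spike_action([0.8, 2.0], rng)
w_new = three_factor_update(0.5, eligibility=1.0, dopamine=1.5, baseline=1.0)
print(action, latency, w_new)
```

In this sketch, a larger cortex-to-actor weight shortens the time to threshold, so strengthening a synapse through the dopamine-gated update makes the corresponding action more likely to win the race. The brief inhibition of the actor neurons after a choice is not modeled here.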