2013 Apr 11;9(4):e1003024. doi: 10.1371/journal.pcbi.1003024

Figure 2. Critic learning in a linear track task.


A: Learning rule with three factors. Top: TD-LTP is the learning rule given in Eq. 17. It works by passing the presynaptic spike train (factor 1) and the postsynaptic spike train (factor 2) through a coincidence window. Spikes are counted as coincident if the postsynaptic spike occurs within a few ms after a presynaptic spike. The result of the pre-post coincidence measure is filtered through a kernel and then multiplied by the TD error (factor 3) to yield the learning rule, which controls the change of the synaptic weight. Bottom: TD-STDP is a TD-modulated variant of R-STDP. The main difference from TD-LTP is the presence of a post-before-pre component in the coincidence window. B: Linear track task. The linear track experiment is a simplified version of the standard maze task. The actor's choice is forced to the correct direction with constant velocity (left), while the critic learns to represent value (right). C: Value function learning by the critic. Each colored trace shows the value function represented by the critic neurons' activity against time in the first 20 simulation trials (from dark blue in trial 1 to dark red in trial 20), with [inline graphic] corresponding to the time of the reward delivery. The black line shows an average over trials 30 to 50, after learning converged. The gray dashed line shows the theoretical value function. D: TD signal corresponding to the simulation in C. The gray dashed line shows the reward time course.
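The three-factor structure described in panel A can be illustrated with a small discrete-time sketch. This is not Eq. 17 itself: the function names, the 5 ms window, the exponential kernel with a 50 ms time constant, the learning rate, and the sign of the post-before-pre term are all assumptions chosen for illustration. Setting `post_pre_amp` to zero gives a TD-LTP-like rule (pre-before-post only), while a non-zero value adds the post-before-pre component that distinguishes the TD-STDP-like variant.

```python
import math

def coincidences(pre, post, window=5.0, post_pre_amp=0.0):
    """Pair up spikes (times in ms) and return (event_time, amplitude) tuples.

    Pre-before-post pairs within `window` ms contribute +1 at the post spike.
    Post-before-pre pairs contribute `post_pre_amp` at the pre spike
    (hypothetical parameter; 0.0 disables this component).
    """
    events = []
    for t_post in post:
        for t_pre in pre:
            lag = t_post - t_pre
            if 0.0 < lag <= window:
                events.append((t_post, 1.0))
            elif -window <= lag < 0.0 and post_pre_amp != 0.0:
                events.append((t_pre, post_pre_amp))
    return events

def eligibility(events, t, tau=50.0):
    """Filter coincidence events through a causal exponential kernel (assumed shape)."""
    return sum(a * math.exp(-(t - te) / tau) / tau for te, a in events if te <= t)

def weight_change(pre, post, td_error, t_end, dt=1.0, eta=0.1,
                  window=5.0, post_pre_amp=0.0, tau=50.0):
    """Accumulate eta * td_error(t) * eligibility(t) * dt over [0, t_end]."""
    events = coincidences(pre, post, window, post_pre_amp)
    dw = 0.0
    for k in range(int(round(t_end / dt)) + 1):
        t = k * dt
        dw += eta * td_error(t) * eligibility(events, t, tau) * dt
    return dw

# Pre spike at 10 ms followed by a post spike at 12 ms, with a constant positive TD error.
dw_ltp = weight_change([10.0], [12.0], lambda t: 1.0, t_end=200.0)
# Reversed spike order: no change without the post-before-pre component.
dw_rev = weight_change([12.0], [10.0], lambda t: 1.0, t_end=200.0)
# Same reversed pair with an assumed negative post-before-pre amplitude.
dw_stdp = weight_change([12.0], [10.0], lambda t: 1.0, t_end=200.0, post_pre_amp=-1.0)
```

In this sketch the weight change takes the sign of the TD error whenever a pre-before-post coincidence precedes it, and it vanishes when the TD error is zero, which is the sense in which the TD error acts as the third factor.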