Figure - PMC

2016 Jan 19;9:85. doi: 10.3389/fncir.2015.00085

Copyright © 2016 Frémaux and Gerstner.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Schematic of reward-modulated learning rules. Boxes on the left show the magnitude of plasticity as a function of the time difference Δt = t_post − t_pre between postsynaptic and presynaptic spike firing.

(A) R-max (Pfister et al., 2006; Baras and Meir, 2007; Florian, 2007; Frémaux et al., 2010). The learning rule is maximal for “pre-before-post” coincidences (red line, ϵ) and rides on a negative bias representing the expected number of postsynaptic spikes 〈post〉 (red dashed line). This Hebbian coincidence term is then low-pass filtered by an exponential filter before being multiplied by the delayed reward R transmitted by a neuromodulator.

(B) R-STDP (Farries and Fairhall, 2007; Florian, 2007; Izhikevich, 2007; Legenstein et al., 2008; Vasilaki et al., 2009; Frémaux et al., 2010). Similar to A, except for the shape of the pre-post coincidence window W, which is bi-phasic and does not depend on the expected number of postsynaptic spikes. After filtering, the Hebbian coincidence term is multiplied by the neuromodulator transmitting the success signal M = R − b, where b is the expected reward.

(C) TD-STDP (Frémaux et al., 2013). Similar to B, except for the modulating factor, which in this case is the TD error M = δ^TD.

(D) Generalized learning rule. Changing the meaning of the neuromodulator term M = D allows switching between different regimes of the learning rule.
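
The structure shared by panels B to D (a pre-post coincidence window, an exponential low-pass filter, and a multiplicative neuromodulator) can be illustrated with a small numerical sketch. This is only one possible reading of the caption and is not taken from the article. The exponential shape of the window, all-to-all spike pairing, the Euler time stepping, and every name and parameter value (`stdp_window`, `run_rule`, `tau_e`, `eta`, and so on) are assumptions chosen for illustration.

```python
import math


def stdp_window(delta_t, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Bi-phasic window W(delta_t), with delta_t = t_post - t_pre in ms.

    Exponential lobes and all parameter values are illustrative assumptions.
    """
    if delta_t > 0:  # pre-before-post: potentiation lobe
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:  # post-before-pre: depression lobe
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


def run_rule(pre_spikes, post_spikes, modulator, t_end, dt=1.0, tau_e=500.0, eta=0.01):
    """Return the weight change accumulated as eta * M(t) * e(t) * dt.

    e(t) is the Hebbian coincidence term after exponential low-pass
    filtering. `modulator` is any callable M(t); swapping it changes the
    regime of the rule (e.g. R - b, or a TD error).
    """
    # Each spike pair contributes W(delta_t) at the time of its later spike
    # (all-to-all pairing, an assumption made for this sketch).
    events = {}
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            step = int(round(max(t_pre, t_post) / dt))
            events[step] = events.get(step, 0.0) + stdp_window(t_post - t_pre)

    trace = 0.0   # filtered Hebbian term e(t)
    weight_change = 0.0
    decay = math.exp(-dt / tau_e)
    for step in range(int(round(t_end / dt)) + 1):
        trace = trace * decay + events.get(step, 0.0)
        weight_change += eta * modulator(step * dt) * trace * dt
    return weight_change


def delayed_pulse(amplitude, start=100.0, width=10.0):
    """Hypothetical modulator: a brief pulse of M arriving after the spikes."""
    return lambda t: amplitude if start <= t < start + width else 0.0
```

For example, `run_rule([10.0], [15.0], delayed_pulse(1.0), t_end=200.0)` gives a positive weight change, because the pre-before-post pairing leaves a positive trace that is still present when the delayed signal arrives. Reversing the spike order, or making the pulse negative (reward below expectation), flips the sign, and a modulator that is zero throughout leaves the weight unchanged.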