J Neurophysiol. 2019 Mar 13;121(5):1748–1760. doi: 10.1152/jn.00817.2018

Fig. 4.

Bidirectional learning rule for η. Because of feature overlap, the learned value (black curve, plotted here against objective time) does not drop immediately to zero after the reward time T (dashed vertical line) but decreases gradually. This gradual decrease allows the derivative of the learned value, V̂˙, to be negative over a nonzero interval of time, which in turn allows the update rule for η in Eq. 6 to take negative values over that same interval. As a result, η increases if reward is delivered (roughly) before T and decreases if reward is delayed past T (gray curve).
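
The sign logic described in this caption can be illustrated with a small numerical sketch. This is not the paper's Eq. 6, which is not reproduced here; it only assumes (i) a learned value trace that ramps up toward the expected reward time T and, because of feature overlap, decays gradually after T rather than dropping to zero, and (ii) an η update proportional to the value derivative at the moment reward actually arrives. The names V_hat, V_dot, and eta_update, and the specific curve shapes, are illustrative assumptions.

import numpy as np

# Assumed, illustrative setup (not the paper's Eq. 6):
T = 1.0                                    # expected reward time (dashed line in Fig. 4)
t = np.linspace(0.0, 2.0, 2001)            # objective time axis
ramp = np.clip(t / T, 0.0, 1.0)            # value ramps up before T
decay = np.exp(-np.clip(t - T, 0.0, None) / 0.3)  # gradual post-T decay from feature overlap
V_hat = ramp * decay                       # learned value (black curve)
V_dot = np.gradient(V_hat, t)              # time derivative of the learned value

def eta_update(reward_time, alpha=0.1):
    """Assumed update: change in eta proportional to V_dot at reward delivery."""
    idx = np.searchsorted(t, reward_time)
    return alpha * V_dot[idx]

print(eta_update(0.8))   # reward earlier than T -> positive update (eta increases)
print(eta_update(1.3))   # reward delayed past T -> negative update (eta decreases)

Under these assumptions, the update is positive wherever the value is still rising (before T) and negative over the interval after T where the value decays gradually, reproducing the bidirectional behavior shown by the gray curve.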