Author manuscript; available in PMC: 2016 Nov 4.
Published in final edited form as: Neuron. 2015 Oct 22;88(3):528–538. doi: 10.1016/j.neuron.2015.09.037

Distinct Eligibility Traces for LTP and LTD in Cortical Synapses

Kaiwen He 1, Marco Huertas 2, Su Z Hong 1, XiaoXiu Tie 1, Johannes W Hell 3, Harel Shouval 2, Alfredo Kirkwood 1,*
PMCID: PMC4660261  NIHMSID: NIHMS726142  PMID: 26593091

Abstract

In reward-based learning, synaptic modifications depend on a brief stimulus and a temporally delayed reward, which poses the question of how synaptic activity patterns associate with a delayed reward. A theoretical solution to this so-called “distal reward problem” has been the notion of activity-generated ‘synaptic eligibility traces’, silent and transient synaptic tags that can be converted into long-term changes in synaptic strength by reward-linked neuromodulators. Here we report the first experimental demonstration of eligibility traces in cortical synapses. We demonstrate the Hebbian induction of distinct traces for LTP and LTD and their subsequent timing-dependent transformation into lasting changes by specific monoaminergic receptors anchored to postsynaptic proteins. Notably, the temporal properties of these transient traces allow stable learning in a recurrent neural network that accurately predicts the timing of the reward, further validating the induction/transformation of eligibility traces for LTP and LTD as a plausible synaptic substrate for reward-based learning.

Introduction

A central aim of learning in biological organisms is to maximize reward. To achieve this aim, animals must learn which stimuli and actions predict an often delayed reward, and when the reward is likely to arrive. This poses a fundamental question regarding the synaptic mechanisms of learning: how can a delayed reward gate plasticity in synapses that were only transiently activated by the predictive stimulus? Decades ago, a theoretical solution was proposed for bridging this temporal gap between stimulus and reward, the so-called “credit assignment problem.” In this proposal, neural activity generates silent and transient “synaptic eligibility traces” that reward-linked neuromodulators can transform into long-term changes in synaptic strength (Crow, 1968; Frémaux, Sprekeler, & Gerstner, 2010; Gavornik, Shuler, Loewenstein, Bear, & Shouval, 2009; Hull, 1943; Izhikevich, 2007; Klopf, 1982; Sutton & Barto, 1998; Turner, O'Connor, Tate, & Abraham, 2003; Wörgötter & Porr, 2005).

In most theoretical models of reward-driven learning, synaptic eligibility traces are induced in a Hebbian manner by coincident pre- and postsynaptic activity. They have half-lives on the order of seconds (Frémaux et al., 2010; Izhikevich, 2007; Klopf, 1982; Sutton & Barto, 1998), and during that time neuromodulators can convert them into long-term changes. Bidirectional synaptic plasticity induced by coincident activity is well established, particularly in the form of spike-timing-dependent plasticity (STDP) (Caporale & Dan, 2008; Richards, Aizenman, & Akerman, 2010). However, eligibility traces for LTP have been reported in only two studies, neither of them in cortex (Cassenaer & Laurent, 2012; Yagishita et al., 2014).

Recent findings in rodents and humans have implicated primary sensory cortices in reinforcement learning (Chubykin, Roach, Bear, & Shuler, 2013; Gardner & Fontanini, 2014; Jaramillo & Zador, 2011; Poort et al., 2015; Seitz, Kim, & Watanabe, 2009; Shuler & Bear, 2006). This makes them attractive systems in which to look for eligibility traces. Historically, reward-associated neuroplasticity has been studied primarily in the dopaminergic system and its projection areas, including the basal ganglia and prefrontal cortex, which are involved in detecting reward and orchestrating the appropriate response. However, learning to recognize reward-predicting stimuli likely involves remodeling in primary sensory cortices as well. Indeed, cells in primary sensory cortices can predict essential attributes of the reward, including its timing (Poort et al., 2015; Shuler & Bear, 2006) and value (Gardner & Fontanini, 2014).

We examined eligibility traces in layer II/III pyramidal cells in slices from both visual and prefrontal cortices. An important motivation was the observation that, in the visual cortex, the Hebbian induction of long-term potentiation and depression (LTP and LTD) depends crucially on neuromodulator receptors coupled to Gs and Gq as well as on glutamate receptors (Choi et al., 2005; Huang et al., 2012; Yang & Dani, 2014). Because reward is typically delayed in reinforcement learning, we tested whether neuromodulators could also act in a retrograde manner, enabling synaptic changes when applied after conditioning. In both visual and prefrontal cortices, we demonstrated the Hebbian induction of short-lived eligibility traces that specific monoamines can convert into either LTP or LTD. The LTP- and LTD-associated traces have different dynamics. We demonstrated the functional significance of this difference by showing that temporal competition between the traces produces stable learning, which allows a recurrent neural network to predict the arrival time of the reward.

Results

Specific monoamines transform synaptic eligibility traces induced by spike-timing conditioning into LTP or LTD

As mentioned above, in cortex, unlike in other structures such as the hippocampus, the induction of Hebbian plasticity depends critically on the activation of G-protein coupled receptors (GPCRs): blockade of these receptors or depletion of the endogenous neuromodulators prevents LTP and LTD (Choi et al., 2005; Huang et al., 2012). Because of this GPCR dependency, under certain experimental conditions, including ours, the Hebbian induction of synaptic plasticity with spike-timing (ST) dependent conditioning also requires the addition of exogenous neuromodulators (Edelmann & Lessmann, 2013; Huang et al., 2014; Seol et al., 2007; Yang & Dani, 2014). We exploited this fact to test directly for the induction of eligibility traces in cortical slices. To do so, we determined whether ST conditioning can result in LTP or LTD when rapidly followed by application of neuromodulator agonists. We tested norepinephrine, serotonin, dopamine, and acetylcholine, all of which have been implicated in cortical plasticity. We first focused on the primary visual cortex, where reward-based changes are well established in both primates (including humans) and rodents (Goltstein, Coffey, Roelfsema, & Pennartz, 2013; Poort et al., 2015; Seitz et al., 2009; Shuler & Bear, 2006). Recordings were made from layer II/III pyramidal cells and involved activation of two independent layer IV to layer II/III pathways. Both pathways were conditioned simultaneously with near-coincident pre- and postsynaptic stimulation (spike-timing or ST conditioning, Fig 1A, B). In one pathway, presynaptic stimulation preceded a burst of postsynaptic action potentials by 10 ms (pre-post: to promote LTP); in the other, it occurred 10 ms after the burst (post-pre: to promote LTD). Neuromodulators were pressure-ejected from a nearby pipette, beginning immediately after ST conditioning and continuing for 10 s.
As expected, under control conditions ST conditioning alone elicited no plasticity (Fig 1C, pre-post: p = 0.563; post-pre: p = 0.156). Plasticity was observed, however, when ST conditioning was immediately followed by pressure ejection of norepinephrine (NE: 50 μM, 10 s) or serotonin (5-HT: 50 μM, 10 s). NE selectively potentiated the pre-post pathway without affecting the post-pre pathway (Fig 1D, pre-post: p = 0.002; post-pre: p = 0.232). Conversely, 5-HT selectively depressed the post-pre pathway but not the pre-post pathway (Fig 1E, pre-post: p = 0.160; post-pre: p = 0.002). Pressure ejection of the agonists alone onto naïve (non-conditioned) pathways had no lasting effect on synaptic strength (NE only: 102.9 ± 5.6%, n = 6; 5-HT only: 102.8 ± 8.5%, n = 5; data not shown). This confirms that the monoamine agonists converted previously induced eligibility traces into changes in synaptic strength.
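Values such as 102.9 ± 5.6% (n = 6) express EPSP strength after conditioning relative to baseline, summarized across cells. This section does not state how the normalization is done or which dispersion measure "±" denotes. The sketch below is a minimal illustration under two assumptions: each cell's post-conditioning EPSPs are normalized to that cell's own pre-conditioning mean, and the group spread is the standard error of the mean (SEM). The function names are hypothetical.

```python
import math
import statistics


def percent_of_baseline(baseline, post):
    """EPSP magnitude after conditioning as a percentage of the baseline mean.

    `baseline` and `post` are per-sweep measurements (e.g. EPSP slopes) from one
    pathway in one cell, recorded before and after conditioning.
    """
    return 100.0 * statistics.mean(post) / statistics.mean(baseline)


def mean_sem(per_cell_percent):
    """Group summary across cells: (mean, standard error of the mean, n).

    Assumes the reported '±' is the SEM; needs at least two cells.
    """
    n = len(per_cell_percent)
    sem = statistics.stdev(per_cell_percent) / math.sqrt(n)
    return statistics.mean(per_cell_percent), sem, n
```

A pathway whose post-conditioning EPSPs match its baseline scores 100.0. This is why values near 100% (as for NE or 5-HT alone) indicate no lasting change.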

Figure 1.

Specific monoamines transform STDP-induced eligibility traces into LTP and LTD.

(A) Two-pathway whole-cell recording configuration.

(B) Induction of eligibility traces with STDP paradigms. A representative response for the two-pathway ST conditioning is shown in the dashed box.

(C) In the visual cortex, ST conditioning alone did not affect synaptic strength in either the pre-post (red dots) or the post-pre (blue dots) pathway.

(D,E) Pressure ejection of NE (50 μM, 10 s, grey bar) immediately after ST conditioning (arrow) converted the LTP eligibility trace in the pre-post pathway into LTP (pre-post in D: 132.3 ± 9.0%). A similar puff of 5-HT (50 μM) transformed the LTD trace in the post-pre pathway into LTD (post-pre in E: 73.1 ± 4.5%).

(F,G) Eligibility traces were not affected by pressure ejection of either 50 μM DA (F) or 250 μM CCh (G).

Indicated in parentheses is the number of experiments. Traces in C to G are averages of 10 EPSPs of the two pathways (Red: pre-post; blue: post-pre) recorded in the same neuron immediately before (thin light-color line) or 25 min after (thick dark-color line) conditioning.

Scale: 2 mV, 25 ms.

See also Fig S1.

In contrast to NE and 5-HT, dopamine application had no effect (DA: 50 μM; Fig 1F, pre-post: p = 0.843; post-pre: p = 1). This is not surprising given that dopaminergic transmission is minimal in visual cortex. Similarly, application of the cholinergic agonist carbachol (CCh: 250 μM; Fig 1G, pre-post: p = 0.742; post-pre: p = 0.547) after ST conditioning did not affect the EPSPs, even with a long (5 min) puff (Fig S1A). However, confirming previous findings (Kirkwood, Rozas, Kirkwood, Perez, & Bear, 1999), the long CCh exposure did promote LTD induction when applied before ST conditioning (Fig S1B). Thus, only a subset of neuromodulators can transform eligibility traces into LTP and LTD.

The induction of the traces, on the other hand, is a general phenomenon not restricted to ST conditioning. It can also be achieved by pairing synaptic stimulation (10 Hz, 20 s) with sustained postsynaptic depolarization (to −10 mV for LTP and to −40 mV for LTD). Pairing with depolarization to −10 mV produced a modest LTP (109.64 ± 3.59%, n = 8, p = 0.005; data not shown). Consistent with the crucial role of neuromodulators in cortical LTP (Choi et al., 2005; Huang et al., 2012), this LTP was substantially impaired (101.36 ± 4.58%, n = 8; Fig S1C) when endogenous monoamines were depleted by reserpine injection 1 day before the experiments (Choi et al., 2005; Otmakhova & Lisman, 1996). In these depleted slices, however, LTP developed robustly when NE was puffed after the conditioning protocol (131.28 ± 7.08%, n = 7, p = 0.006; Fig S1C). Similarly, in slices from reserpine-injected mice, 10 Hz stimulation paired with depolarization to −40 mV did not by itself induce LTD (106.3 ± 7.0%, n = 9; Fig S1D). It did, however, cause prominent LTD when immediately followed by a 5-HT puff (78.2 ± 6.8%, n = 9, p = 0.027; Fig S1D).

To evaluate the generality of the eligibility traces, we extended the study to layer II/III synapses of the medial prefrontal cortex (mPFC). This region is densely innervated by dopaminergic, noradrenergic, and serotonergic fibers and has been implicated in multiple forms of reward-based learning (Kahnt, Grueschow, Speck, & Haynes, 2011; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004; Rushworth, Noonan, Boorman, Walton, & Behrens, 2011). As in visual cortex, NE (50 μM, 10 s) transformed the trace in the pre-post pathway into LTP (Fig 2A, p = 0.01), and 5-HT (50 μM, 10 s) transformed the trace in the post-pre pathway into LTD (Fig 2B, p = 0.008). Unlike in visual cortex, however, DA (50 μM, 10 s) also transformed the trace in the pre-post pathway into LTP (Fig 2C, p = 0.01). CCh, on the other hand, was ineffective in either pathway (Fig 2D, pre-post: p = 0.156; post-pre: p = 0.125). Altogether, these results indicate that eligibility traces for LTP and LTD can be induced in a Hebbian manner. They also show that distinct, specific monoamine neuromodulators can transform these otherwise silent traces into long-term synaptic plasticity in both visual and prefrontal cortex.

Figure 2.

Eligibility traces in the prefrontal cortex.

(A) In LII/III synapses of the mPFC, a 10 s puff of NE (50 μM) transformed the LTP trace (pre-post: 133.1 ± 9.7%).

(B) A puff of 5-HT (50 μM) transformed the LTD trace (post-pre: 72.0 ± 7.3%).

(C) A puff of DA (50 μM) transformed the LTP trace (pre-post: 133.1 ± 9.7%).

(D) A puff of CCh (250 μM) did not affect the EPSPs (pre-post: 113.5 ± 7.4%; post-pre: 116.6 ± 8.6%).

Traces in A to D are coded as in Figure 1. Scale: 2 mV, 25 ms.

Endogenous monoamines can transform synaptic eligibility traces

Puffing neuromodulators at high concentration yields consistent results, but this paradigm may not resemble conditions in vivo. We therefore tested a more physiological paradigm for transforming eligibility traces: releasing endogenous neuromodulators optogenetically. We used TH-ChR2 and Tph2-ChR2 mice, which express channelrhodopsin-2 (ChR2) in noradrenergic/dopaminergic (Fig S2) and serotonergic nuclei (Zhao et al., 2011), respectively. As with puffing, in visual cortex release of endogenous NE transformed only the LTP eligibility trace (Fig 3A, pre-post: p = 0.039), whereas endogenous 5-HT transformed only the LTD trace (Fig 3C, post-pre: p = 0.002). Importantly, the LTP/LTD traces were transformed only when the monoamines were released after the Hebbian conditioning, not before (Fig 3B, D). This strict temporal order between ST conditioning and phasic neuromodulator release mirrors the stimulus-reward sequence of reinforcement learning.

Figure 3.

Endogenous neuromodulators released optogenetically transform previously induced eligibility traces.

(A-B) In the visual cortex, optogenetic stimulation (blue bar) released endogenous NE in the TH-ChR2 mouse or 5-HT in the Tph2-ChR2 mouse. This local release transformed the LTP/LTD eligibility traces generated by ST conditioning (pre-post in A: 115.5 ± 4.4%; post-pre in B: 73.8 ± 8.9%).

(C-D) Neuromodulators transform eligibility traces only when released phasically after ST conditioning, not immediately before it (with no overlap between the light and the conditioning) (light before in C: 90.7 ± 6.7%; light before in D: 106.2 ± 11%).

Traces in A-B are coded as in Figure 1. Scale: 2 mV, 25 ms. See also Fig S2.

Reinforcement learning in behaving animals occurs over multiple stimulus-reward epochs spaced in time (Chubykin et al., 2013; Seitz et al., 2009). This differs from the protocols used above, which were chosen to demonstrate the induction and transformation of eligibility traces unequivocally. In those experiments, the neuromodulators were delivered just once, after 200 Hebbian conditionings massed into a single induction epoch. To better mirror reinforcement learning, we tested whether optogenetic reinforcement of individual ST conditioning epochs (40 pre-post or post-pre pairings spaced at 20 s intervals) can also produce LTP/LTD. In slices from TH-ChR2 mice, 1 s trains of blue light pulses (10 ms at 10 Hz) flashed immediately after each pre-post conditioning epoch induced robust LTP (Fig 4A, p1, p = 0.016). Similarly, in slices from Tph2-ChR2 mice, blue light flashed immediately after each post-pre conditioning epoch induced LTD (Fig 4B, p1, p = 0.002). In both cases (Fig 4A, B), control pathways received the same ST epochs but out of phase with the light flashes (10 s gap), and their synaptic responses did not change (p2, pre-post only: p = 0.164, Fig 4A; p2, post-pre only: p = 0.734, Fig 4B). Altogether, these results indicate that monoamine-mediated transformation of eligibility traces is a physiologically plausible mechanism for encoding reward-based learning in vivo.
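The logic of the paired versus unpaired pathways can be checked with a small timing sketch. It takes four values from the text: 40 epochs, 20 s spacing, a 1 s light train immediately after each paired epoch, and trace lifetimes below 10 s (LTP) and 5 s (LTD) from Fig 6. Several details are assumptions, labeled as such in the code: the unpaired pathway is conditioned exactly halfway between paired epochs (the 10 s gap), each pathway receives 40 epochs, epochs are treated as instantaneous, and a trace counts as "alive" only if light arrives strictly before its window closes. The function names are hypothetical.

```python
import math

# From the text: 40 epochs spaced 20 s apart; light = 10 pulses at 10 Hz (a 1 s train).
N_EPOCHS = 40
INTERVAL_S = 20.0
LIGHT_PULSES, LIGHT_RATE_HZ = 10, 10.0
LIGHT_TRAIN_S = LIGHT_PULSES / LIGHT_RATE_HZ  # 1.0 s
# Assumption: the unpaired pathway is conditioned halfway between paired epochs (10 s gap).
OFFSET_S = INTERVAL_S / 2
# Upper bounds on trace lifetime read from Fig 6: < 10 s for LTP, < 5 s for LTD.
TRACE_WINDOW_S = {"LTP": 10.0, "LTD": 5.0}


def schedule():
    """Return (paired epoch times, unpaired epoch times, light onset times) in seconds.

    Assumption: each conditioning epoch is treated as instantaneous.
    """
    paired = [i * INTERVAL_S for i in range(N_EPOCHS)]
    unpaired = [t + OFFSET_S for t in paired]
    light_onsets = list(paired)  # light starts immediately after each paired epoch
    return paired, unpaired, light_onsets


def delay_to_next_light(t, light_onsets):
    """Seconds from a conditioning epoch at t to the next light onset (inf if none)."""
    delays = [onset - t for onset in light_onsets if onset >= t]
    return min(delays) if delays else math.inf


def n_reinforced(epochs, light_onsets, window_s):
    """Count epochs whose trace is still alive (delay < window) when the light arrives."""
    return sum(delay_to_next_light(t, light_onsets) < window_s for t in epochs)


paired, unpaired, lights = schedule()
for kind, window in TRACE_WINDOW_S.items():
    print(kind, "paired:", n_reinforced(paired, lights, window),
          "unpaired:", n_reinforced(unpaired, lights, window))
# prints:
# LTP paired: 40 unpaired: 0
# LTD paired: 40 unpaired: 0
```

Under these assumptions, every unpaired epoch meets the next light only after 10 s, beyond either trace window. The previous light ended 9 s before the epoch, and the text reports that light delivered before conditioning is ineffective. Together these account for the unchanged control pathway.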

Figure 4.

Optogenetic release of endogenous neuromodulators transforms eligibility traces induced by spaced single ST-conditioning.

Experimental design (left): two pathways received 40 ST-conditioning epochs in alternation every 20 s. One pathway (red or blue symbols) was paired with 1 s of light (10 light pulses, 10 ms and 700 mA each, delivered at 10 Hz); the unpaired pathway (grey symbols) served as a control.

(A) Light stimulation transforms LTP traces induced by pre-post conditioning (red symbols) in slices from TH-ChR2 mice.

(B) Light stimulation transforms LTD traces induced by post-pre conditioning (blue symbols) in slices from the Tph2-ChR2 mice.

Traces in A and B are coded as in Figure 1. Scale: 2 mV, 25 ms

Transformation of short-lived synaptic eligibility traces requires anchoring of monoamine receptors

We previously showed that stimulation of Gs- and Gq-coupled receptors promotes LTP and LTD, respectively (Seol et al., 2007). It was therefore surprising that NE and DA, which stimulate both types of receptors, affected only the eligibility traces for LTP, and that only 5-HT acted on the LTD traces. To resolve this conundrum, we first set out to identify the relevant neuromodulator receptors using receptor-specific antagonists. An attractive candidate among the Gs-coupled adrenoreceptors was the β2AR, which is enriched in spines and promotes LTP (Davare et al., 2001; Qian et al., 2012). We found that the β2AR antagonist ICI 118,551 (1 μM) blocked the transformation of the LTP traces by NE (Fig 5A). Moreover, the βAR agonist isoproterenol (50 μM) was sufficient to transform the LTP trace. So was direct elevation of intracellular cAMP, consistent with the role of β2AR stimulation in cAMP production (Fig S3). On the other hand, the generic 5-HT2 antagonist ketanserin (1 μM) blocked the transformation of the LTD trace (99.97 ± 6.75%, n = 7; data not shown). In addition, the specific 5-HT2CR antagonist RS 102221 (1 μM) was sufficient to block the transformation of the LTD traces by 5-HT (Fig 5B), consistent with the absence of 5-HT2A receptors in layer II/III (Weber & Andrade, 2010). Multiple Gs- and Gq-coupled receptors, including the noradrenergic α1 and the cholinergic M1 receptors, may prime the subsequent induction of synaptic plasticity in visual cortex. Nevertheless, our results strongly suggest that β2AR and 5-HT2CR are mainly responsible for transforming previously induced eligibility traces.

Figure 5.

Anchoring of monoamine receptors is crucial for the transformation of transient LTP/D eligibility traces.

(A) The β2AR-specific antagonist ICI 118,551 (1 μM) prevents the transformation of the LTP eligibility trace by NE (95.2 ± 5.3%). The magenta line depicts control LTP (data from Fig 1D).

(B) The 5-HT2CR-specific antagonist RS 102221 (1 μM) prevents the transformation of the LTD eligibility trace by 5-HT (99.8 ± 8.2%). The blue line depicts control LTD (data from Fig 1E).

(C) β2AR directly interacts with PSD-95, and its C-terminal peptide DSPL disrupts this interaction.

(D) DSPL, but not the scrambled peptide DAPA, abolished the NE-mediated transformation of the LTP eligibility trace (DSPL: 96.1 ± 8.2%; DAPA: 127.8 ± 7.9%).

(E) The C-terminal peptide 2C-ct prevents the interaction between 5-HT2CR and PDZ-containing proteins such as PSD-95.

(F) 2C-ct, but not the control peptide CSSA, blocked transformation of the LTD eligibility trace by 5-HT (2C-ct: 102.9 ± 3.7%; CSSA: 82.6 ± 3.9%).

See also Fig S3-4.

One possible determinant of the specific role of β2AR and 5-HT2CR in trace transformation is the subcellular location of these receptors. Both receptors can interact directly with PDZ domain-containing proteins such as PSD-95 and/or MUPP1 (Bécamel et al., 2001; Bécamel et al., 2004; Joiner et al., 2010), suggesting that they are anchored at or very close to the synapse. We therefore tested the effect of disrupting their interaction with PDZ proteins by adding the C-terminal peptides of β2AR (DSPL: 50 μM) or 5-HT2CR (2C-ct peptide: 50 μM) to the recording electrode (Altier, Lory, Wijnholds, & Marin, 2006; Joiner et al., 2010) (Fig 5C-F). DSPL, but not the control peptide DAPA (with the −2 and 0 positions changed to alanine), blocked the NE-mediated transformation of the LTP trace (Fig 5D, p = 0.041 between DSPL and DAPA). Likewise, the 2C-ct peptide, but not its scrambled control CSSA, prevented the transformation of the LTD eligibility trace (Fig 5F, p = 0.004 between 2C-ct and CSSA). The peptides did not block synaptic plasticity induced by presynaptic stimulation paired with postsynaptic depolarization, an effective induction protocol that does not require added neuromodulators (Fig S4; see Methods and Huang et al., 2012 for further details). Receptor anchoring is therefore required only for the conversion of eligibility traces, not for the induction of plasticity. Together, these results suggest that β2AR and 5-HT2CR must be anchored at or close to the synapse to convert very transient eligibility traces.

LTP/D synaptic eligibility trace properties allow a network to learn to predict reward timing

Theoretical considerations suggest that synaptic eligibility traces should be transient, but experimentally very little is known about their duration (Yagishita et al., 2014). Moreover, distinct traces for LTP and LTD had not previously been described, either experimentally or theoretically, so nothing was known about the temporal properties of LTD traces. We therefore measured the duration of each trace and found that the two differ. We then show theoretically that these different durations are sufficient to produce stable learning in recurrent networks that learn to predict expected reward times.

To measure the duration of the eligibility traces experimentally, we varied the delay between ST conditioning and the neuromodulator puff (Fig 6A inset). LTP magnitude was reduced by about half when the agonist puff was delayed by 5 s, and LTP was completely absent with a 10 s delay (Fig 6A, B; p = 0.007 between Δt = 10 s and Δt = 0 s). The LTD eligibility trace was even shorter and had completely disappeared by 5 s (Fig 6A, B; p = 0.003 between Δt = 5 s and Δt = 0 s). Thus, the eligibility traces are short-lived, and the LTD trace is substantially shorter than the LTP trace.
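A rough consistency check on the LTP time course rests on two assumptions that the text does not make: the trace decays as a single exponential, and LTP magnitude A scales linearly with the trace remaining when the puff arrives. Starting from the reported halving at a 5 s delay:

```latex
\frac{A(\Delta t)}{A(0)} = e^{-\Delta t/\tau}, \qquad
e^{-5\,\mathrm{s}/\tau} = \tfrac{1}{2} \;\Rightarrow\; \tau = \frac{5\,\mathrm{s}}{\ln 2} \approx 7.2\,\mathrm{s}, \qquad
\frac{A(10\,\mathrm{s})}{A(0)} = \left(\tfrac{1}{2}\right)^{2} = 25\%.
```

No LTP remained at a 10 s delay, so this exponential-plus-linear-readout picture would overestimate the late effect. One possible reading is that conversion requires the trace to exceed some threshold, or that the decay is steeper than exponential. The measurements described here do not distinguish these possibilities.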

Figure 6.

Eligibility traces for LTP/D are transient and have different durations.

(A) Magnitude of synaptic changes (measured 30 min after conditioning) evoked when neuromodulators (50 μM isoproterenol for LTP: magenta line and symbols; 50 μM 5-HT for LTD: blue line and symbols) were puffed after ST conditioning at the specified delays (Δt (s), as shown in the top right inset). The LTP eligibility trace lasted less than 10 s and the LTD eligibility trace less than 5 s.

(B) Significant LTP (filled magenta circles, top panel) or LTD (filled blue circles, bottom panel) was induced when neuromodulators were puffed immediately after the spike-timing pairings. EPSP slope did not change when Iso was puffed 10 s after (open magenta circles, top panel) or 5-HT 5 s after (open blue circles, bottom panel).

In general, learning rules must not only represent the statistics of the environment but also find stable solutions in which synaptic efficacies neither saturate nor fall to zero. One possible consequence of having two eligibility traces, one for LTP and one for LTD, is that the balance between LTP and LTD could produce stable learning. Because the experimentally observed eligibility traces are Hebbian, they depend on network dynamics, which in turn depend on synaptic efficacies. Here we propose that, under certain conditions, the observed difference in the temporal dynamics of the two traces can generate stable reinforcement learning in cortical networks.

We illustrate this process in the context of learning to predict reward timing in a recurrent neural network. Our example is motivated by several experiments in primary sensory cortex (Chubykin et al., 2013; Gavornik et al., 2009; Goltstein et al., 2013; Shuler & Bear, 2006). In these experiments, pairing a stimulus with a delayed reward produces cortical cells that remain active until the expected reward time. We simulated a recurrent network of excitatory neurons (architecture depicted in Fig 7A; model details and equations are given in the Mathematical Model part of the Methods section). The network implements a learning rule based on two eligibility traces with different dynamics, as observed experimentally (Fig 6). As shown previously (Gavornik & Shouval, 2011; Gavornik et al., 2009), such a network can generate long-lasting dynamics that predict the timing of reward. It does so by learning an appropriate set of lateral connection strengths, denoted by the connection matrix L (Fig 7A). A learning rule based on a single eligibility trace and active inhibition of reward was proposed previously, but that rule is inconsistent with experimental results (Chubykin et al., 2013; Gavornik & Shouval, 2011; Gavornik et al., 2009; Liu et al., 2015). We replaced it with a rule consistent with the findings reported here. The proposed learning rule rests on a minimal set of assumptions. First, two eligibility traces, one for LTP and one for LTD, are activated in a Hebbian manner. Second, the time constant of the LTP trace is longer than that of the LTD trace. Third, the LTD trace saturates at a higher effective value than the LTP trace. Finally, the change in synaptic weights depends on the difference between the LTP and LTD traces at the time of reward. These assumptions are implemented mathematically by equations 1-3 in the Mathematical Model part of the Methods section.
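The Methods' equations 1-3 are not reproduced in this section. One hypothetical way to write down the four assumptions follows; it is an illustration, not the published model. Each trace relaxes toward its own ceiling while a Hebbian coincidence term H(t) is on and otherwise decays with its own time constant. The weight change is read out at the reward time t_R.

```latex
\begin{aligned}
\frac{dT_p}{dt} &= -\frac{T_p}{\tau_p} + \eta\, H(t)\,\bigl(T_p^{\max} - T_p\bigr), \\
\frac{dT_d}{dt} &= -\frac{T_d}{\tau_d} + \eta\, H(t)\,\bigl(T_d^{\max} - T_d\bigr), \\
\Delta L &\propto T_p(t_R) - T_d(t_R),
\end{aligned}
\qquad
\tau_p > \tau_d, \quad
T_d^{\mathrm{ss}} > T_p^{\mathrm{ss}}, \quad
T^{\mathrm{ss}} = \frac{\eta\, T^{\max}}{\eta + 1/\tau}.
```

Here T^ss is the level each trace approaches during sustained Hebbian drive. The condition T_d^ss > T_p^ss encodes the third assumption, and τ_p > τ_d encodes the second.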
The first two assumptions are demonstrated experimentally in this paper, and the others are biologically plausible. The network (Fig 7A) was trained by repeatedly pairing a brief feedforward stimulus (100 ms) with a reward delayed by 1000 ms. Initially, the network responded only during the stimulus (Fig 7B). Over many trials, however, strengthening of the recurrent synaptic weights (L in Fig 7A) transformed its activity into a sustained, slowly decaying response spanning the time between the stimulus and the expected reward (Fig 7C-D; raster plots in Fig S5). After training, the network's sustained activity terminated near the expected time of the reward. The network had thus learned to represent reward timing, similar to what is observed in rodent visual cortex after a comparable training procedure (Chubykin et al., 2013; Shuler & Bear, 2006). This self-limiting sustained activity results from temporal competition between the LTP (magenta) and LTD (blue) eligibility traces (Fig 7E-G). Initially, at the time of the reward, the LTP eligibility trace (Fig 7E, magenta) is larger than the LTD trace (Fig 7E, blue), resulting in net LTP. The increase in recurrent synaptic efficacies causes reverberations that extend the network activity (Fig 7C). While network activity is still substantially shorter than the delay to reward, the LTP eligibility trace continues to dominate (Fig 7F). When the duration of network activity approaches the reward time (Fig 7D), the two eligibility traces cancel each other at the time of reward (Fig 7G), and the network dynamics stabilize. If the network dynamics overshoot the reward time, or if the reward is moved to a shorter delay, the LTD trace dominates. The network dynamics then shorten and stabilize at the correct reward interval (Fig S5 C1-C3).
This learning mechanism is robust and can be used to learn the timing for rewards arriving over a large range of different temporal delays (Fig 7H).
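The competition between the two traces can be reproduced in a deliberately minimal sketch. It is not the paper's spiking network and does not use the Methods' equations 1-3. Instead it assumes a hypothetical trace rule, dT/dt = −T/τ + η·H(t)·(T_max − T), with H(t) = 1 while the network is active and 0 afterwards. Its parameter values are hypothetical but respect the stated assumptions (a slower LTP trace and higher LTD saturation). A single lumped recurrent weight w stands in for the network and lengthens activity linearly (duration = 0.1 s stimulus + w). The plasticity signal at reward is the LTP trace minus the LTD trace.

```python
import math

# Hypothetical parameters chosen for illustration only (not the published model's values).
TAU_P, TAU_D = 1.5, 0.5    # trace decay time constants (s): the LTP trace is slower
ETA = 10.0                 # Hebbian drive (1/s) while the network is active
TMAX_P, TMAX_D = 1.0, 1.2  # saturation ceilings: the LTD trace saturates higher


def trace_at(t_eval, active_dur, tau, tmax, eta=ETA):
    """Closed-form value at t_eval of dT/dt = -T/tau + eta*H(t)*(tmax - T),
    with T(0) = 0 and H(t) = 1 for 0 <= t < active_dur, H(t) = 0 afterwards."""
    k = eta + 1.0 / tau                # relaxation rate while active
    t_ss = eta * tmax / k              # steady-state level while active
    t_on = min(active_dur, t_eval)     # active time elapsed before t_eval
    rising = t_ss * (1.0 - math.exp(-k * t_on))
    return rising * math.exp(-(t_eval - t_on) / tau)  # passive decay after activity ends


def net_change(active_dur, reward_time):
    """Plasticity signal at reward: LTP trace minus LTD trace (> 0 means net LTP)."""
    return (trace_at(reward_time, active_dur, TAU_P, TMAX_P)
            - trace_at(reward_time, active_dur, TAU_D, TMAX_D))


def train(reward_time, stim=0.1, lr=0.5, trials=200):
    """Toy learning loop: a lumped recurrent weight w sets activity duration = stim + w."""
    w = 0.0
    for _ in range(trials):
        w = max(0.0, w + lr * net_change(stim + w, reward_time))
    return stim + w


def stable_duration(reward_time, lo=0.1, tol=1e-9):
    """Bisection for the activity duration at which the two traces cancel at reward."""
    hi = reward_time
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_change(mid, reward_time) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these illustrative values, `train(1.0)` starts from stimulus-only activity and grows w while the LTP trace dominates at reward. It settles a few tens of milliseconds before the 1 s reward, where `net_change` crosses zero, and `train(2.0)` likewise settles just before a 2 s reward. The linear duration-weight relation and the clipping of w at zero are simplifications chosen for transparency, not features of the published model.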

Figure 7.

Competition between LTP and LTD eligibility traces results in stable reinforcement learning.

(A) Diagram of recurrent network of excitatory neurons representing cells in visual cortex driven by feed-forward input from LGN.

(B-D) Simulated average population firing rate computed from a recurrent network of 100 integrate-and-fire excitatory units. The network is trained to report a 1 second time interval after a 100 ms stimulation. Three instances of network dynamics are shown: B, before training; C, during training (18 trials); and D, after training (70 trials).

(E-G) Time evolution of LTP- and LTD-promoting eligibility traces corresponding to the same trials as in B - D. Magenta lines are LTP eligibility traces, and blue lines are LTD eligibility traces. LTP and LTD eligibility traces both increase during the period of network activity (see above). LTD traces saturate at higher effective levels. At the beginning of training (E), LTP traces are larger than LTD traces at the time of reward, and therefore LTP is expressed. At the end of training (G), LTP and LTD traces are equal, resulting in no net change in synaptic efficacy.

(H) The model can be trained to predict different reward timings accurately.

See also Fig S5

After training, network dynamics do not terminate exactly at time of reward, but decay just prior to its arrival (Fig 7, Fig S5). The time between the termination of network dynamics and the delivery of reward (defined as D) depends on the parameters of the learning rule (Fig S5 D, E), and this can be approximately characterized by a simple formula (See methods section – Mathematical Model and Fig S5E).
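A rough version of such a formula can be derived under a hypothetical trace rule of the form dT/dt = −T/τ + η H(t)(T^max − T); this is an assumption, not the Methods' formula. Suppose activity lasts long enough for both traces to saturate at T^ss = ηT^max/(η + 1/τ). Activity then ends a time D before reward, after which both traces decay passively. Setting the two traces equal at reward gives:

```latex
T_p^{\mathrm{ss}}\, e^{-D/\tau_p} = T_d^{\mathrm{ss}}\, e^{-D/\tau_d}
\quad\Longrightarrow\quad
D = \frac{\ln\!\bigl(T_d^{\mathrm{ss}} / T_p^{\mathrm{ss}}\bigr)}{1/\tau_d - 1/\tau_p}.
```

D is positive whenever the LTD trace saturates higher (T_d^ss > T_p^ss) and decays faster (τ_d < τ_p). This matches the observation that activity stops just before reward, and in this approximation D does not depend on the reward delay itself. With illustrative, hypothetical values τ_p = 1.5 s, τ_d = 0.5 s, T_p^ss = 0.9375, and T_d^ss = 1.0, D ≈ 0.048 s.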

Figure 6A shows a small potentiation when serotonin is applied with a 5 s delay after an LTD-inducing protocol. This potentiation is not statistically significant, but one might ask how it would affect the behavior of the model. We find that, at least for the network trained here, it would have no significant effect because at long delays the net effect is LTP anyway. Once network activity approaches the reward time, LTD still dominates, and learning remains stable.

We have shown here that reinforcement learning based on competition between LTP and LTD traces, as is consistent with our experimental observations, stabilizes learning. It does so without the additional reward-inhibiting mechanisms assumed previously (Gavornik et al., 2009; Rescorla & Wagner, 1972; Sutton & Barto, 1998).

Discussion

Although it is well established that Hebbian plasticity can account for the remodeling of cortical networks during learning, it has been less clear how reward can recruit or gate Hebbian plasticity. Here we have provided direct physiological support for the theoretical concept of “synaptic eligibility traces.” Importantly, we demonstrate two eligibility traces, one for LTP and one for LTD, with different dynamics. Specific monoamine receptors anchored at the synapse transform these transient traces into synaptic plasticity. Distinct traces for LTP and LTD may be a general phenomenon, as they are observable in both visual and prefrontal cortices. The different temporal dynamics of the two traces generate a self-stabilizing learning rule that allows the cortical network to perform a fundamental computation: learning the expected time of reward. We surmise that the Hebbian induction of distinct eligibility traces for LTP and LTD, transformable by specific monoamines, is a simple and attractive mechanism that would allow cortical circuits to learn which stimuli and actions predict reward.

The molecular details of eligibility traces remain to be determined. A plausible scenario is that the traces reflect residual activity of kinases and phosphatases that gate AMPA receptor trafficking into and out of the synapse, and that neuromodulators, by promoting AMPA receptor phosphorylation, play a crucial role in complementing or enhancing this process (Huang et al., 2012; Seol et al., 2007). Consistent with this idea, the decay of the LTP trace roughly matches the decay of CaMKII activity at pyramidal cell synapses (Lee, Escobedo-Lozoya, Szatmari, & Yasuda, 2009). The present results also agree with our previous observation that G protein-coupled receptors act downstream of NMDA receptor activation to prime subsequent STDP induction in a pull-push manner, with Gs-coupled receptors promoting LTP over LTD and Gq-coupled receptors promoting LTD over LTP (Huang et al., 2012; Seol et al., 2007). In line with this pull-push model, the β2 and 5-HT2C receptors in visual cortex, which specifically transform the traces for LTP and LTD, are coupled to Gs and Gq, respectively. Notably, however, although prolonged stimulation of multiple G protein-coupled receptors can prime LTP and LTD, the corresponding traces are transformed only by β2 and 5-HT2C receptors, which are anchored to the synapse. Moreover, brief stimulation of these two receptors can transform previously induced traces but does not promote subsequent plasticity. Thus, our findings extend the pull-push model: both the anterograde (priming) and retrograde (transformation) actions of neuromodulators follow the Gs/Gq rule for LTP/LTD induction. At the same time, the present results reveal that the spatiotemporal profile of neuromodulator activation dictates whether it supports priming or transformation of plasticity.

The principles uncovered in visual cortex were confirmed in prefrontal cortex, suggesting that transformation of LTP and LTD traces occurs throughout the cortex, although the specific Gs- and Gq-coupled receptors involved may vary between cortical regions and layers. For example, DA can convert LTP traces in frontal but not visual cortex, and in visual cortex, acetylcholine puffs can reward input activity in layer V cells (Chubykin et al., 2013) but not in layer II/III cells (Fig 1). These discrepancies could be explained simply by the synaptic anchoring of different GPCRs in these cells, although we cannot rule out more complex scenarios involving different mechanisms of synaptic plasticity (Wang & Daw, 2003). A general mechanism of trace transformation is also consistent with the retrograde action of octopamine on STDP in insect olfactory learning (Cassenaer & Laurent, 2012), and with the recent report that in the striatum, Gs-coupled D1 receptors promote structural plasticity akin to LTP in synapses previously conditioned in a Hebbian manner (Yagishita et al., 2014). These previous studies showed only a single eligibility trace, and it remains unclear whether two independent traces are a general phenomenon that also applies to these systems.

In contrast to previous theories focusing on a single plasticity trace, we uncover distinct and independent traces for LTP and LTD. The observation that the LTD eligibility trace decays about twice as fast as the LTP trace was initially surprising, because theoretical analyses of unsupervised STDP in neural networks indicate that a larger window for LTD induction confers stability to learning (Kempter, Gerstner, & van Hemmen, 2001; Song, Miller, & Abbott, 2000). To obtain stability, theories of reinforcement learning typically require an additional stopping rule (Gavornik et al., 2009; Rescorla & Wagner, 1972; Sutton & Barto, 1998), which at the physiological level is usually interpreted as inhibition of a reward nucleus. We demonstrated that, owing to competition between the two eligibility traces, firing of cells within the network naturally stops before the reward time without any need to inhibit the reward signal. This stability arises not from competition between different neuromodulators (Boureau & Dayan, 2011) but from temporal competition between synaptic eligibility traces with different dynamics, and could in principle be achieved even if the same neuromodulator converted both traces. Such neural dynamics, as observed in vivo (Shuler & Bear, 2006), can enable a cortical network to perform the behaviorally important task of predicting reward times. It would be of interest to explore whether the properties of the two independent eligibility traces, besides predicting timing, can also enable learning about other attributes of the reward, such as quality and quantity, which are essential for decision making.

Experimental Procedures

Animals

All procedures were approved by the Institutional Animal Care and Use Committee at Johns Hopkins University. TH-ChR2 mice were produced by crossing THicre homozygotes (generously provided by Dr. Jeremy Nathan) with Floxed-ChR2 mice (B6;129S-Gt(ROSA)26Sortm32(CAG-COP4*H134R/EYFP)Hze/J, used for data in Fig 3A and Sup Fig 2B; or B6.Cg-Gt(ROSA)26Sortm27.1(CAG-COP4*H134R/tdTomato)Hze/J, used in Fig 3B and Sup Fig 2C-D; The Jackson Laboratory, Bar Harbor, ME). A Tph2-ChR2 (B6;SJL-Tg(Tph2-COP4*H134R/EYFP)5Gfng/J) heterozygote breeding pair was purchased from The Jackson Laboratory. Mice used for Sup Fig 1C-D were injected intraperitoneally with reserpine (5 mg/kg) 23-24 hours before the experiment. All mice were bred on a C57BL/6J background and used at P25-P45, an age at which both LTP and LTD are expressed postsynaptically (Seol et al., 2007).

Slice preparation

Coronal brain slices (300 μm thick) containing either visual or frontal cortex were prepared from C57BL/6J or transgenic mice (P25-P45) as described (Huang et al., 2012). Briefly, slices were cut in ice-cold dissection buffer containing (in mM): 212.7 sucrose, 5 KCl, 1.25 NaH2PO4, 10 MgCl2, 0.5 CaCl2, 26 NaHCO3, 10 dextrose, bubbled with 95% O2/5% CO2 (pH 7.4). Slices were transferred to normal artificial cerebrospinal fluid (ACSF; identical to the dissection buffer except that sucrose is replaced by 119 mM NaCl, MgCl2 is lowered to 1 mM, and CaCl2 is raised to 2 mM), incubated at 30 °C for 30 minutes, and then kept at room temperature for at least 30 minutes prior to recording.
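The normal ACSF is specified as a set of substitutions into the dissection buffer. As a bookkeeping aid, a minimal Python sketch that applies those substitutions is shown below; the dictionary and function names are hypothetical, and only the concentrations come from the text.

```python
# Dissection buffer composition as stated in the text (mM).
DISSECTION_BUFFER_MM = {
    "sucrose": 212.7,
    "KCl": 5.0,
    "NaH2PO4": 1.25,
    "MgCl2": 10.0,
    "CaCl2": 0.5,
    "NaHCO3": 26.0,
    "dextrose": 10.0,
}


def acsf_from_dissection_buffer(buffer_mm: dict) -> dict:
    """Apply the substitutions described in the text to obtain normal ACSF (mM)."""
    acsf = dict(buffer_mm)   # copy so the dissection recipe is left untouched
    del acsf["sucrose"]      # sucrose is removed...
    acsf["NaCl"] = 119.0     # ...and replaced by 119 mM NaCl
    acsf["MgCl2"] = 1.0      # MgCl2 lowered from 10 mM to 1 mM
    acsf["CaCl2"] = 2.0      # CaCl2 raised from 0.5 mM to 2 mM
    return acsf


ACSF_MM = acsf_from_dissection_buffer(DISSECTION_BUFFER_MM)
for name, conc in sorted(ACSF_MM.items()):
    print(f"{name:>8}: {conc:g} mM")
```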

Whole-cell current clamp recording

Visualized whole-cell recordings were made from regular-spiking pyramidal neurons in layer II/III (>35% depth from the pia). Glass pipette recording electrodes (3-5 MΩ) were filled with a solution containing (in mM): 130 (K)Gluconate, 10 KCl, 0.2 EGTA, 10 HEPES, 4 (Mg)ATP, 0.5 (Na)GTP, 10 (Na)Phosphocreatine (pH 7.2-7.3, 280-290 mOsm). Only cells with a membrane potential < -65 mV, series resistance < 25 MΩ, and input resistance > 85 MΩ were recorded. Cells were discarded if any of these values changed by more than 25% during the experiment. Data were filtered at 10 kHz and digitized at 10 kHz using Igor Pro (WaveMetrics Inc., Lake Oswego, Oregon).
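The inclusion criteria above can be expressed as a simple filter. This is a minimal sketch, assuming that the 25% stability criterion is evaluated as the relative change between the start and the end of the recording (the text does not say how often the values were checked); `CellParams`, `passes_entry_criteria`, `stable_throughout`, and `keep_cell` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CellParams:
    """Passive properties of a recorded cell (hypothetical container)."""
    vm_mv: float     # resting membrane potential, mV
    rs_mohm: float   # series resistance, MOhm
    rin_mohm: float  # input resistance, MOhm


def passes_entry_criteria(c: CellParams) -> bool:
    # Thresholds from the text: Vm < -65 mV, Rs < 25 MOhm, Rin > 85 MOhm.
    return c.vm_mv < -65.0 and c.rs_mohm < 25.0 and c.rin_mohm > 85.0


def stable_throughout(start: CellParams, end: CellParams, max_change: float = 0.25) -> bool:
    # Assumption: stability is judged as the relative change from the start value.
    pairs = [
        (start.vm_mv, end.vm_mv),
        (start.rs_mohm, end.rs_mohm),
        (start.rin_mohm, end.rin_mohm),
    ]
    return all(abs(b - a) / abs(a) <= max_change for a, b in pairs)


def keep_cell(start: CellParams, end: CellParams) -> bool:
    """Keep a cell only if it meets the entry criteria and stays within 25%."""
    return passes_entry_criteria(start) and stable_throughout(start, end)
```

For example, a cell starting at -70 mV, 15 MΩ, 120 MΩ whose series resistance drifts to 20 MΩ (a 33% change) would be discarded.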

Electrical stimulation and induction of plasticity

Synaptic responses were evoked in two independent pathways at 0.05 Hz by either alternating or consecutive (300 ms apart) paired-pulse stimulation (0.2 ms; 10-100 μA; 50 ms interval) delivered through two concentric bipolar electrodes (125 μm diameter; FHC, Bowdoin, ME) placed ∼300 μm apart in the middle of the cortical thickness. Stimulus intensity was adjusted to evoke simple-waveform (2-8 mV), short-onset-latency (<4 ms) monosynaptic excitatory postsynaptic potentials (EPSPs). Input independence was confirmed by the absence of paired-pulse interactions between the pathways. Spike-timing (ST) conditioning consisted of 200 pairings delivered at 10 Hz, each consisting of one presynaptic stimulus given either 10 ms before or 10 ms after 4 consecutive action potentials at 100 Hz in the postsynaptic neuron. Action potentials were generated by injecting 1.2-1.6 nA of current for 2 ms. Pairings were followed by one of the following manipulations: a 10 s puff (1-6 psi) of neuromodulator (Picospritzer; Parker Instrumentation); 50 UV light pulses (Thorlabs 365 nm LED, 100 ms duration) delivered through the 40× objective at 5 Hz to uncage DMNB-caged cAMP (Invitrogen); or trains of blue light pulses (Thorlabs 455 nm LED, 10 ms duration) delivered at 10 Hz for 10 s (Fig 3) or 1 s (Fig 4) to activate ChR2. Pairing LTP/LTD in Sup Fig 4 was induced by 150 pairings of presynaptic stimulation with postsynaptic depolarization to 0/-40 mV at 0.75 Hz (each depolarization lasted 666 ms; presynaptic stimulation was given 100 ms after depolarization onset). Pairing LTP/LTD in reserpine-injected mice (Sup Fig 1C-D) was induced by pairing 10 Hz presynaptic stimulation with 20 s of postsynaptic depolarization from -70 mV to -10 mV for LTP, or to -40 mV for LTD, with or without a 10 s neuromodulator puff. Synaptic strength was quantified as the initial slope of the EPSP.
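The spike-timing conditioning protocol can be laid out as explicit event times. The sketch below is an illustration only: it assumes that the 10 ms pre-before-post interval is measured to the first action potential of the burst and that the post-before-pre interval is measured from the last one, which the text does not specify; the function name `stdp_pairing_schedule` is hypothetical.

```python
from typing import List, Tuple


def stdp_pairing_schedule(
    order: str,               # "pre_post" (LTP protocol) or "post_pre" (LTD protocol)
    n_pairings: int = 200,    # 200 pairings, as in the text
    pairing_hz: float = 10.0, # pairings delivered at 10 Hz
    n_spikes: int = 4,        # 4 postsynaptic action potentials...
    burst_hz: float = 100.0,  # ...at 100 Hz
    delay_ms: float = 10.0,   # 10 ms pre/post interval
) -> Tuple[List[float], List[float]]:
    """Return (presynaptic stimulus times, postsynaptic AP times), in ms."""
    period = 1000.0 / pairing_hz        # 100 ms between pairings
    isi = 1000.0 / burst_hz             # 10 ms between APs in a burst
    burst_len = (n_spikes - 1) * isi    # 30 ms from first to last AP
    pre: List[float] = []
    post: List[float] = []
    for k in range(n_pairings):
        t0 = k * period
        if order == "pre_post":
            # Presynaptic stimulus precedes the first AP by delay_ms.
            pre.append(t0)
            first_ap = t0 + delay_ms
        elif order == "post_pre":
            # Assumption: the delay is measured from the LAST AP of the burst.
            first_ap = t0
            pre.append(t0 + burst_len + delay_ms)
        else:
            raise ValueError("order must be 'pre_post' or 'post_pre'")
        post.extend(first_ap + i * isi for i in range(n_spikes))
    return pre, post
```

With the defaults, each protocol spans 20 s (200 pairings at 10 Hz) and contains 800 postsynaptic action potentials.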

Isoproterenol hydrochloride (Iso, 50 μM), methoxamine hydrochloride (Met, 50 μM), carbamoylcholine chloride (CCH, 10-500 μM), norepinephrine bitartrate (NE, 10-50 μM), and ketanserin tartrate salt (ketanserin, 1 μM) were purchased from Sigma. Serotonin hydrochloride (5-HT, 50 μM), dopamine hydrochloride (DA, 50 μM), RS 102221 hydrochloride (RS 102221, 1 μM), ICI 118,551 hydrochloride (ICI 118,551, 1 μM), and reserpine (5 mg/kg, in 1.5% acetic acid) were purchased from Tocris. 4,5-Dimethoxy-2-Nitrobenzyl Adenosine 3′,5′-Cyclic Monophosphate (DMNB-caged cAMP, 100 μM) was purchased from Invitrogen. The membrane-permeable peptide DSPL (11R-QGRNSNTNDSPL) and its analogue DAPA (11R-QGRNSNTNDAPA) were gifts from Dr. Johannes W Hell. Synthetic peptides (5-HT2C-Ct, VNPSSVVSERISSV; 5-HT2CSSA-Ct, VNPSSVVSERISSA; >98% purity) were purchased from GenScript.

Biocytin staining and imaging

For imaging LC noradrenergic neurons, 5-week-old TH-ChR2 mice were transcardially perfused with fresh 4% PFA. Brains were removed and fixed overnight in PFA before being transferred to a sterile solution of 30% sucrose in PBS (pH 7.4) for at least 12 hours. The fixed brains were sectioned into 40 μm coronal slices using a freezing microtome (Leica) and kept at -20°C until use. For imaging recorded neurons from acute cortical slices of TH-ChR2 mice, biocytin was included in the recording pipette. After recording, slices were fixed in 10% formalin at least overnight and then rinsed in 0.1 M PBS (2× 10 min). Slices were then permeabilized (2% Triton X in 0.1 M PBS) for 1 h before incubation with 1 μg/ml streptavidin-488 (in 0.1 M PBS containing 1% Triton X) overnight at 4°C. Slices were rinsed with 0.1 M PBS (2× 10 min) before being mounted on glass slides.

Confocal images were taken on a Zeiss LSM 510 with the following objective lenses: 10×/0.45, 20×/0.75, and 40×/1.2.

Data analysis

Data were analyzed using a custom program written in Igor Pro. Responses were averaged over the last 5 min of the post-induction period and normalized to the average of the last 5 min of baseline. The Wilcoxon rank-sum test was used for independent data, and one-way ANOVA followed by Tukey's HSD post hoc test was used to compare the means of more than two samples. Differences were considered significant at P < 0.05.

Mathematical Model

Learning rules

Simulations were performed on a recurrent network of 100 excitatory integrate-and-fire units with all-to-all lateral connections. The network was driven by feedforward excitatory input representing incoming spikes from the LGN. The equations describing the neuronal dynamics are as in Gavornik et al. (2009), except for the learning rule that updates the synaptic weights of the lateral connections. The prolonged network dynamics arise from positive feedback through the lateral connections, and the strength of these synaptic efficacies (denoted by the matrix L) determines the duration of network activity. In the current model, two synaptic eligibility traces (previously referred to as proto-weights; Gavornik et al., 2009), one mediating LTP ($T^p_{ij}$) and one mediating LTD ($T^d_{ij}$), evolve in time according to a pair of ordinary differential equations of the form:

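$$\tau_p \frac{dT^p_{ij}}{dt} = -T^p_{ij} + \varepsilon\, H_p(R_i,R_j)\left(T^p_{\max} - T^p_{ij}\right) \tag{1}$$
$$\tau_d \frac{dT^d_{ij}}{dt} = -T^d_{ij} + \varepsilon\, H_d(R_i,R_j)\left(T^d_{\max} - T^d_{ij}\right) \tag{2}$$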

where $\tau_p$ and $\tau_d$ are the decay time constants of the LTP and LTD traces, respectively, and $H_p(R_i,R_j)$ and $H_d(R_i,R_j)$ are Hebbian terms, which in general differ between the two traces and can include the effects of pre- and postsynaptic spike order. In the present model we make the simplest assumption: both Hebbian terms are identical and depend on the product of the instantaneous firing rates of the postsynaptic ($R_i$) and presynaptic ($R_j$) neurons, as in Gavornik et al. (2009). Each trace saturates at its own level, set by $T^p_{\max}$ and $T^d_{\max}$. Finally, $\varepsilon$ is a factor scaling the Hebbian term.

We chose a very simple rule for updating the synaptic weights, which depends on the difference between these traces and on the delivery of reward:

$$\frac{dL_{ij}}{dt} = \eta\left(T^p_{ij} - T^d_{ij}\right)\delta\left(t - t_{\text{reward}}\right) \tag{3}$$

where $L_{ij}$ is the synaptic weight from presynaptic neuron j to postsynaptic neuron i, $\eta$ is the learning rate, and the delta function indicates that weight changes occur only at the time of reward ($t_{\text{reward}}$), when the neuromodulator is released. The delta function could easily be replaced by a narrow function centered near the reward time, representing the presence of the neuromodulator. All these equations were chosen to be as simple as possible rather than biophysically precise.
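To make the interplay of the trace equations and the reward-gated weight update concrete, here is a forward-Euler sketch for a single synapse. All parameter values are hypothetical and chosen only for illustration (TAU_D is half of TAU_P, mirroring the roughly twofold faster decay of the LTD trace, and TMAX_D > TMAX_P); the Hebbian input is a placeholder function, and the delta function of the weight equation is applied as a single jump of ETA * (T_p - T_d) at the reward time.

```python
from typing import Callable, List, Tuple

# Illustrative parameters only (hypothetical, not fitted to the data).
TAU_P, TAU_D = 2.0, 1.0     # s; the LTD trace decays about twice as fast
TMAX_P, TMAX_D = 1.0, 1.5   # saturation levels T^p_max, T^d_max
EPS = 5.0                   # epsilon, scaling of the Hebbian term
ETA = 0.1                   # learning rate eta
DT = 1e-3                   # integration step, s


def simulate_traces(hebb: Callable[[float], float],
                    t_end: float) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the two trace equations for one synapse.

    hebb(t) stands in for H(R_i, R_j); as in the text, both traces share it.
    Returns a list of (t, T_p, T_d) samples.
    """
    tp = td = 0.0
    samples = []
    for k in range(int(round(t_end / DT)) + 1):
        t = k * DT
        samples.append((t, tp, td))
        h = EPS * hebb(t)
        tp += DT * (-tp + h * (TMAX_P - tp)) / TAU_P
        td += DT * (-td + h * (TMAX_D - td)) / TAU_D
    return samples


def weight_change_at_reward(samples: List[Tuple[float, float, float]],
                            t_reward: float) -> float:
    """Integrating the delta function yields one jump of eta * (T_p - T_d)."""
    _, tp, td = samples[int(round(t_reward / DT))]
    return ETA * (tp - td)


# Activity (Hebbian drive) lasts for the first second, then stops.
trace = simulate_traces(lambda t: 1.0 if t < 1.0 else 0.0, 5.0)
```

With these values, a reward 0.2 s after activity ends (t = 1.2 s) yields a negative weight change because the LTD trace still dominates, whereas a reward 3 s after activity ends (t = 4.0 s) yields a positive change because the slower LTP trace has outlasted it. This is the temporal competition that lets the network settle on an activity duration that predicts the reward.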

Note that the model assumes a single reward signal at time $t_{\text{reward}}$ and does not distinguish between the two neuromodulators; we thereby implicitly assume that the actual reward activates both neuromodulators simultaneously. One could write a more complex equation in which two different neuromodulators act independently on the two traces. For our implementation this would not matter, but it could be useful for situations in which one neuromodulator is active and the other is not.

Recurrent network

The recurrent network is constructed as in Gavornik et al (2009), and only the learning rule is modified. Each neuron is a conductance-based integrate-and-fire unit, following the equations:

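$$C\,\frac{dv_i}{dt} = g_L\left(E_L - v_i\right) + g_{E,i}\left(E_E - v_i\right)$$

and

$$\frac{ds_k}{dt} = -\frac{s_k}{\tau_s} + \rho\left(1 - s_k\right)\sum_j \delta\left(t - t^k_j\right)$$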

where $v_i$ is the membrane potential of the i-th neuron (all neurons in this simple model are excitatory, E) and $s_k$ is the synaptic activation of the k-th presynaptic neuron. The remaining parameters are the membrane capacitance $C$; the leak and excitatory conductances $g_L$ and $g_{E,i}$; the leak and excitatory reversal potentials $E_L$ and $E_E$; the fractional increase in synaptic activation per input spike, $\rho$; and the synaptic activation time constant $\tau_s$. The neuron fires an action potential when its membrane potential reaches threshold ($v_i = v_{th}$), after which the membrane potential is reset to $v_{rest}$. The delta function in the equation for $s_k$ indicates that synaptic activation increases only at the arrival of a presynaptic spike at time $t^k_j$, where the index j denotes the j-th spike of neuron k, and where:

$$g_{E,i} = \sum_k L_{ik}\, s_k$$

All parameter values are as in Gavornik et al 2009.
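A single unit of the network, following the membrane and synaptic-activation equations above, can be sketched as below. The parameter values are placeholders, not those of Gavornik et al. (2009), and the unit system (ms, mV, nF, µS) is an assumption; between presynaptic spikes s_k decays exponentially, and each arriving spike adds rho * (1 - s_k), which is how the delta-function term is integrated here. The function name `run_neuron` is hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Placeholder parameters (hypothetical; NOT the values of Gavornik et al., 2009).
# Units: time in ms, voltage in mV, capacitance in nF, conductance in uS.
PARAMS: Dict[str, float] = {
    "C": 0.2, "g_L": 0.01, "E_L": -70.0, "E_E": 0.0,
    "v_th": -55.0, "v_rest": -70.0, "tau_s": 20.0, "rho": 0.5,
}


def run_neuron(pre_spikes: Sequence[Sequence[float]], weights: Sequence[float],
               t_end: float, dt: float = 0.1,
               p: Dict[str, float] = PARAMS) -> List[float]:
    """Simulate one conductance-based integrate-and-fire unit.

    pre_spikes[k] holds the spike times (ms) of presynaptic neuron k and
    weights[k] is the corresponding weight L_ik (uS). Returns output spike times (ms).
    """
    v = p["v_rest"]
    s = [0.0] * len(pre_spikes)                 # synaptic activations s_k
    trains = [sorted(ts) for ts in pre_spikes]
    nxt = [0] * len(pre_spikes)                 # index of next unprocessed spike
    decay = math.exp(-dt / p["tau_s"])          # exact decay of s_k over one step
    out: List[float] = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        for k in range(len(s)):
            s[k] *= decay
            # Delta-function term: each arriving spike adds rho * (1 - s_k).
            while nxt[k] < len(trains[k]) and trains[k][nxt[k]] <= t:
                s[k] += p["rho"] * (1.0 - s[k])
                nxt[k] += 1
        g_e = sum(w * sk for w, sk in zip(weights, s))   # g_E,i = sum_k L_ik s_k
        dv = (p["g_L"] * (p["E_L"] - v) + g_e * (p["E_E"] - v)) / p["C"]
        v += dt * dv                                     # forward Euler step
        if v >= p["v_th"]:                               # threshold crossing
            out.append(t)
            v = p["v_rest"]                              # reset
    return out
```

For example, `run_neuron([list(range(0, 100, 2))], [0.1], 100.0)` drives the unit with one input spiking every 2 ms and produces output spikes, while the same input with a weight of 0 leaves the unit silent at rest. In the full network, the inputs to each unit would include the recurrent spikes of the other 99 units weighted by the row L_i of the lateral weight matrix.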

Derivation of the equation in Figure S5E

After training, network activity decays almost fully before the reward signal is delivered. The interval between the time at which network activity decays below a threshold and the reward time is defined as D (Fig S5D). D can be approximated using the observation that fixed points of learning are reached when the two eligibility traces are equal at the time of reward (equation 3). To calculate it, we make the following crude approximations: the network is either fully active or inactive, and when it is fully active both traces are saturated. Combining these approximations with equations 1 and 2 (after activity ends the Hebbian term vanishes, so each trace decays exponentially from its saturation level), we obtain:

$$T^p_{\max}\, e^{-D/\tau_p} = T^d_{\max}\, e^{-D/\tau_d}$$

which can be solved for D to yield:

$$D = \ln\left(\frac{T^d_{\max}}{T^p_{\max}}\right)\frac{\tau_p\,\tau_d}{\tau_p - \tau_d}$$

In Figure S5E this approximate formula is compared with simulation results and agrees well, at least within these biophysically plausible parameter ranges.
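The closed-form approximation for D can be checked against direct root-finding on the crossing condition T^p_max exp(-D/tau_p) = T^d_max exp(-D/tau_d). A minimal sketch with hypothetical parameter values; note that when tau_p > tau_d, a positive D requires T^d_max > T^p_max, i.e., the LTD trace must start higher but decay faster. The function names are hypothetical.

```python
import math


def delay_closed_form(tmax_p: float, tmax_d: float, tau_p: float, tau_d: float) -> float:
    """D = ln(T^d_max / T^p_max) * tau_p * tau_d / (tau_p - tau_d)."""
    return math.log(tmax_d / tmax_p) * tau_p * tau_d / (tau_p - tau_d)


def delay_numeric(tmax_p: float, tmax_d: float, tau_p: float, tau_d: float,
                  lo: float = 0.0, hi: float = 100.0, tol: float = 1e-10) -> float:
    """Bisection on f(D) = T^p_max e^{-D/tau_p} - T^d_max e^{-D/tau_d}."""
    def f(d: float) -> float:
        return tmax_p * math.exp(-d / tau_p) - tmax_d * math.exp(-d / tau_d)

    if f(lo) * f(hi) > 0:
        raise ValueError("no sign change: need T^d_max > T^p_max and tau_p > tau_d")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:   # root lies in [lo, mid]
            hi = mid
        else:                     # root lies in [mid, hi]
            lo = mid
    return 0.5 * (lo + hi)
```

For example, with T^p_max = 1, T^d_max = 2, tau_p = 2 s and tau_d = 1 s, both functions give D = 2 ln 2 ≈ 1.39 s.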

Supplementary Material

supplement

Acknowledgments

We thank Dr. H.K. Lee for insightful discussions and Drs. M. Bridi and J. Lucas Whitt for comments on the manuscript. Supported by NIH grants R01MH093665 to H.S. and R01EY012124 to A.K., and by an SLI grant to A.K.

Footnotes

Author contributions: Conceptualization, K.H., H.S., and A.K.; Investigation, K.H., M.H., S.Z.H., and X.T.; Writing, K.H., H.S., and A.K.; Funding Acquisition, H.S. and A.K.; Resources, J.W.H.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Gavarini S, Bécamel C, Altier C, Lory P, et al. Marin P. Opposite effects of PSD-95 and MPP3 PDZ proteins on serotonin 5-hydroxytryptamine 2C receptor desensitization and membrane stability. Molecular Biology of the Cell. 2006;17(11):4619–4631. doi: 10.1091/mbc.E06.
  2. Bécamel C, Figge A, Poliak S, Dumuis A, Peles E, Bockaert J, et al. Ullmer C. Interaction of serotonin 5-hydroxytryptamine type 2C receptors with PDZ10 of the multi-PDZ domain protein MUPP1. The Journal of Biological Chemistry. 2001;276(16):12974–82. doi: 10.1074/jbc.M008089200.
  3. Bécamel C, Gavarini S, Chanrion B, Alonso G, Galéotti N, Dumuis A, et al. Marin P. The serotonin 5-HT2A and 5-HT2C receptors interact with specific sets of PDZ proteins. The Journal of Biological Chemistry. 2004;279(19):20257–66. doi: 10.1074/jbc.M312106200.
  4. Boureau YL, Dayan P. Opponency revisited: competition and cooperation between dopamine and serotonin. Neuropsychopharmacology. 2011;36:74–97. doi: 10.1038/npp.2010.151.
  5. Caporale N, Dan Y. Spike timing-dependent plasticity: a Hebbian learning rule. Annual Review of Neuroscience. 2008;31:25–46. doi: 10.1146/annurev.neuro.31.060407.125639.
  6. Cassenaer S, Laurent G. Conditional modulation of spike-timing-dependent plasticity for olfactory learning. Nature. 2012;482(7383):47–52. doi: 10.1038/nature10776.
  7. Choi SY, Chang J, Jiang B, Seol GH, Min SS, Han JS, et al. Kirkwood A. Multiple receptors coupled to phospholipase C gate long-term depression in visual cortex. The Journal of Neuroscience. 2005;25(49):11433–43. doi: 10.1523/JNEUROSCI.4084-05.2005.
  8. Chubykin AA, Roach EB, Bear MF, Shuler MGH. A cholinergic mechanism for reward timing within primary visual cortex. Neuron. 2013;77(4):723–35. doi: 10.1016/j.neuron.2012.12.039.
  9. Crow TJ. Cortical synapses and reinforcement: a hypothesis. Nature. 1968;219(5155):736–737. doi: 10.1038/219736a0.
  10. Davare MA, Avdonin V, Hall DD, Peden EM, Burette A, Weinberg RJ, et al. Hell JW. A beta2 adrenergic receptor signaling complex assembled with the Ca2+ channel Cav1.2. Science. 2001;293:98–101. doi: 10.1126/science.293.5527.98.
  11. Edelmann E, Lessmann V. Dopamine regulates intrinsic excitability thereby gating successful induction of spike timing-dependent plasticity in CA1 of the hippocampus. Frontiers in Neuroscience. 2013;7:25. doi: 10.3389/fnins.2013.00025.
  12. Frémaux N, Sprekeler H, Gerstner W. Functional requirements for reward-modulated spike-timing-dependent plasticity. The Journal of Neuroscience. 2010;30(40):13326–37. doi: 10.1523/JNEUROSCI.6249-09.2010.
  13. Gardner MPH, Fontanini A. Encoding and tracking of outcome-specific expectancy in the gustatory cortex of alert rats. The Journal of Neuroscience. 2014;34(39):13000–17. doi: 10.1523/JNEUROSCI.1820-14.2014.
  14. Gavornik JP, Shouval HZ. A network of spiking neurons that can represent interval timing: mean field analysis. Journal of Computational Neuroscience. 2011;30:501–513. doi: 10.1007/s10827-010-0275-y.
  15. Gavornik JP, Shuler MGH, Loewenstein Y, Bear MF, Shouval HZ. Learning reward timing in cortex through reward dependent expression of synaptic plasticity. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(16):6826–31. doi: 10.1073/pnas.0901835106.
  16. Goltstein PM, Coffey EBJ, Roelfsema PR, Pennartz CMA. In vivo two-photon Ca2+ imaging reveals selective reward effects on stimulus-specific assemblies in mouse visual cortex. The Journal of Neuroscience. 2013;33:11540–55. doi: 10.1523/JNEUROSCI.1341-12.2013.
  17. Huang S, Rozas C, Treviño M, Contreras J, Yang S, Song L, et al. Kirkwood A. Associative Hebbian synaptic plasticity in primate visual cortex. The Journal of Neuroscience. 2014;34:7575–9. doi: 10.1523/JNEUROSCI.0983-14.2014.
  18. Huang S, Treviño M, He K, Ardiles A, de Pasquale R, Guo Y, et al. Kirkwood A. Pull-push neuromodulation of LTP and LTD enables bidirectional experience-induced synaptic scaling in visual cortex. Neuron. 2012;73(3):497–510. doi: 10.1016/j.neuron.2011.11.023.
  19. Hull CL. Principles of behavior: an introduction to behavior theory. Appleton-Century; 1943.
  20. Izhikevich EM. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex. 2007;17(10):2443–52. doi: 10.1093/cercor/bhl152.
  21. Jaramillo S, Zador AM. The auditory cortex mediates the perceptual effects of acoustic temporal expectation. Nature Neuroscience. 2011;14:246–251. doi: 10.1038/nn.2688.
  22. Joiner ML, Lisé MF, Yuen EY, Kam AYF, Zhang M, Hall DD, et al. Hell JW. Assembly of a beta2-adrenergic receptor–GluR1 signalling complex for localized cAMP signalling. The EMBO Journal. 2010;29(2):482–495. doi: 10.1038/emboj.2009.344.
  23. Kahnt T, Grueschow M, Speck O, Haynes JD. Perceptual learning and decision-making in human medial frontal cortex. Neuron. 2011;70(3):549–59. doi: 10.1016/j.neuron.2011.02.054.
  24. Kempter R, Gerstner W, van Hemmen JL. Intrinsic stabilization of output rates by spike-based Hebbian learning. Neural Computation. 2001;13:2709–2741. doi: 10.1162/089976601317098501.
  25. Kirkwood A, Rozas C, Kirkwood J, Perez F, Bear MF. Modulation of long-term synaptic depression in visual cortex by acetylcholine and norepinephrine. The Journal of Neuroscience. 1999;19(5):1599–1609. doi: 10.1523/JNEUROSCI.19-05-01599.1999.
  26. Klopf AH. The hedonistic neuron: a theory of memory, learning, and intelligence. New York: Hemisphere/Taylor & Francis; 1982.
  27. Lee SJR, Escobedo-Lozoya Y, Szatmari EM, Yasuda R. Activation of CaMKII in single dendritic spines during long-term potentiation. Nature. 2009;458(7236):299–304. doi: 10.1038/nature07842.
  28. Liu C, Coleman JE, et al. Shuler MGH. Selective activation of a putative reinforcement signal conditions cued interval timing in primary visual cortex. Current Biology. 2015:1–11. doi: 10.1016/j.cub.2015.04.028.
  29. Otmakhova NA, Lisman JE. D1/D5 dopamine receptor activation increases the magnitude of early long-term potentiation at CA1 hippocampal synapses. The Journal of Neuroscience. 1996;16(23):7478–86. doi: 10.1523/JNEUROSCI.16-23-07478.1996.
  30. Poort J, Khan AG, Pachitariu M, Nemri A, Orsolic I, Krupic J, et al. Hofer SB. Learning enhances sensory and multiple non-sensory representations in primary visual cortex. Neuron. 2015;86(6):1478–1490. doi: 10.1016/j.neuron.2015.05.037.
  31. Qian H, Matt L, Zhang M, Nguyen M, Patriarchi T, Koval OM, et al. Hell JW. β2-Adrenergic receptor supports prolonged theta tetanus-induced LTP. Journal of Neurophysiology. 2012. doi: 10.1152/jn.00374.2011.
  32. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Classical conditioning II: current research and theory. New York: Appleton-Century; 1972. pp. 64–99.
  33. Richards BA, Aizenman CD, Akerman CJ. In vivo spike-timing-dependent plasticity in the optic tectum of Xenopus laevis. Frontiers in Synaptic Neuroscience. 2010;2:7. doi: 10.3389/fnsyn.2010.00007.
  34. Ridderinkhof KR, van den Wildenberg WPM, Segalowitz SJ, Carter CS. Neurocognitive mechanisms of cognitive control: the role of prefrontal cortex in action selection, response inhibition, performance monitoring, and reward-based learning. Brain and Cognition. 2004;56(2):129–40. doi: 10.1016/j.bandc.2004.09.016.
  35. Rushworth MFS, Noonan MP, Boorman ED, Walton ME, Behrens TE. Frontal cortex and reward-guided learning and decision-making. Neuron. 2011;70(6):1054–69. doi: 10.1016/j.neuron.2011.05.014.
  36. Seitz AR, Kim D, Watanabe T. Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron. 2009;61:700–707. doi: 10.1016/j.neuron.2009.01.016.
  37. Seol GH, Ziburkus J, Huang S, Song L, Kim IT, Takamiya K, et al. Kirkwood A. Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity. Neuron. 2007;55(6):919–929. doi: 10.1016/j.neuron.2007.08.013.
  38. Shuler MG, Bear MF. Reward timing in the primary visual cortex. Science. 2006;311(5767):1606–9. doi: 10.1126/science.1123513.
  39. Song S, Miller KD, Abbott LF. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience. 2000;3:919–926. doi: 10.1038/78829.
  40. Sutton RS, Barto AG. Reinforcement learning: an introduction. IEEE Transactions on Neural Networks. 1998;9:1054. doi: 10.1109/TNN.1998.712192.
  41. Turner PR, O'Connor K, Tate WP, Abraham WC. Roles of amyloid precursor protein and its fragments in regulating neural activity, plasticity and memory. Progress in Neurobiology. 2003;70(1):1–32. doi: 10.1016/s0301-0082(03)00089-3.
  42. Wang XF, Daw NW. Long term potentiation varies with layer in rat visual cortex. Brain Research. 2003;989:26–34. doi: 10.1016/S0006-8993(03)03321-3.
  43. Weber ET, Andrade R. Htr2a gene and 5-HT2A receptor expression in the cerebral cortex studied using genetically modified mice. Frontiers in Neuroscience. 2010;4:1–12. doi: 10.3389/fnins.2010.00036.
  44. Wespatat V, Tennigkeit F, Singer W. Phase sensitivity of synaptic modifications in oscillating cells of rat visual cortex. The Journal of Neuroscience. 2004;24(41):9067–75. doi: 10.1523/JNEUROSCI.2221-04.2004.
  45. Wörgötter F, Porr B. Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural Computation. 2005;17(2):245–319. doi: 10.1162/0899766053011555.
  46. Yagishita S, Hayashi-Takagi A, Ellis-Davies GCR, Urakubo H, Ishii S, Kasai H. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science. 2014;345(6204):1616–1620. doi: 10.1126/science.1255514.
  47. Yang K, Dani JA. Dopamine D1 and D5 receptors modulate spike timing-dependent plasticity at medial perforant path to dentate granule cell synapses. The Journal of Neuroscience. 2014;34(48):15888–15897. doi: 10.1523/JNEUROSCI.2400-14.2014.
  48. Zhao S, Ting JT, Atallah HE, Qiu L, Tan J, Gloss B, et al. Feng G. Cell type-specific channelrhodopsin-2 transgenic mice for optogenetic dissection of neural circuitry function. Nature Methods. 2011;8(9):745–752. doi: 10.1038/nmeth.1668.
