Author manuscript; available in PMC: 2014 Feb 20.
Published in final edited form as: Neuron. 2013 Feb 20;77(4):723–735. doi: 10.1016/j.neuron.2012.12.039

A cholinergic mechanism for reward timing within primary visual cortex

Alexander A Chubykin 1,*, Emma B Roach 2,*, Mark F Bear 1,3, Marshall G Hussain Shuler 2,3
PMCID: PMC3597441  NIHMSID: NIHMS435986  PMID: 23439124

Summary

Neurons in rodent primary visual cortex (V1) relate operantly conditioned stimulus-reward intervals with modulated patterns of spiking output, but little is known about the locus or mechanism of this plasticity. Here we show that cholinergic basal forebrain projections to V1 are necessary for the neural acquisition, but not the expression, of reward timing in the visual cortex of awake, behaving animals. We then mimic reward timing in vitro by pairing white matter stimulation with muscarinic receptor activation at a fixed interval, and show that this protocol results in the prolongation of electrically-evoked spike train durations out to the conditioned interval. Together, these data suggest that (1) V1 possesses the circuitry and plasticity to support reward time prediction learning and (2) the cholinergic system serves as an important reinforcement signal which, in vivo, conveys to the cortex the outcome of behavior.

INTRODUCTION

Predicting the timing and identity of salient events from limited sensory experience is critical for survival, allowing animals to exploit resources and avoid harmful situations. Learned temporal relationships are thought to be represented in the frontal and parietal cortices (Genovesio et al., 2009; Leon and Shadlen, 2003; Mita et al., 2009), regions atop a cortical hierarchy that receive highly processed sensory information (Felleman and Van Essen, 1991). Surprisingly, neurons within the primary visual cortex (V1), located at the lowest level of this presumptive hierarchy, are also capable of providing information about the learned timing of reward in relation to sensory input. This “reward timing activity” arises as a consequence of training adult rodents to associate visual cues with water reward at brief delays (e.g., 1 or 2 seconds). Through the course of conditioning, V1 neural responses evolve from relating simple features of the visual cues to expressing what these cues have come to predict: the expected time of reward (Shuler and Bear, 2006).

The means by which cortical neurons come to express reward timing activity are unknown, but are thought to derive from a process of reinforcement learning, wherein a reinforcement signal relates the outcome of behavior to preceding neural activity (Dayan and Niv, 2008; Sutton and Barto, 1998). Computational studies have shown how such a reinforcement signal can impinge locally on synapses that have been active in the recent past, selectively modifying them so that appropriate cue-reward intervals are expressed (Gavornik and Shouval, 2011; Gavornik et al., 2009). These models assert that V1 receives convergent information about the stimulus as well as a signal conveying the subsequent receipt of reward. While it is well known that V1 receives feed-forward visual input from the thalamus (Hubel and Wiesel, 1962), the source and identity of such a reinforcing signal are unknown, as are the mechanisms, perhaps local to V1, that transform the sensory response to encode time.

Neuromodulators are attractive candidates for reinforcement due to their ability to simultaneously broadcast a message of behavioral importance throughout the cortex (Doya, 2002; Woolf, 1996) but see (Pennartz, 1995), and acetylcholine (ACh) from the basal forebrain (BF) is particularly well-suited to reinforce V1 for a variety of reasons. There is a high density of cholinergic varicosities within V1 (Lysakowski et al., 1989; Mechawar et al., 2000) arising from multiple BF cholinergic nuclei, including the substantia innominata, nucleus basalis, and the diagonal band of Broca (Carey and Rieck, 1987; Rye et al., 1984). Functionally, ACh contributes to plasticity in V1 (Bear and Singer, 1986; Gu and Singer, 1993) for reviews see (Gu, 2003; Origlia et al., 2008) and is involved in the alteration of tuning properties and map organization in other areas of cortex (Conner et al., 2003; Froemke et al., 2007; Kilgard and Merzenich, 1998; Sachdev et al., 1998; Weinberger, 2003). In addition, single unit recordings in the BF have provided evidence for the encoding of unexpected rewarding (Santos-Benitez et al., 1995; Wilson and Rolls, 1990) and salient events (Lin and Nicolelis, 2008).

The current study had two goals. The first was to determine if the cholinergic innervation of visual cortex provides the conjectured reinforcement signal required for neurons to learn conditioned cue-reward intervals. The second was to investigate the possibility that V1, isolated from the rest of the brain in an in vitro preparation, could support response duration plasticity (RDP) analogous to reward timing activity observed in vivo.

RESULTS

Cholinergic innervation is required to learn, but not express, reward timing in V1

The first set of experiments utilized the behavioral protocol previously established in adult male Long-Evans rats by Shuler and Bear (2006) and was designed to distinguish between a requirement for local BF cholinergic input in the acquisition vs. the expression of reward timing activity. Water-restricted rats were trained to approach a nosepoke, receive one of two visual cues (100 ms, full-field retinal cue presented to either the left or right eye through removable, head-mounted goggles), and lick on a spout to receive delayed water reward (Fig. 1A). Each cue was experimentally associated with a discrete number of licks necessary to release a drop of water. To correctly complete a trial, animals were required to lick m times following a presentation of cue 1 and n times following cue 2, translating to a short delay to reward following cue 1 and a long delay following cue 2. To differentiate reward receipt from expectancy, half the total number of trials were unrewarded even if the animal licked to criterion. Rats quickly adopt a simple strategy in this task: they begin licking immediately following cue presentation and exit the nosepoke after either receiving and consuming a reward (rewarded trial, water delivery indicated by blue droplets in Fig. 1A) or soon after the longer delay has expired (unrewarded trial, reward omission indicated by ghosted droplets in Fig. 1A). Therefore, a feature of this task is that differences in neural activity across experimental conditions can be interpreted free of behavioral confound, as the exit times between the two unrewarded conditions are indistinguishable (see quantification below).

Fig 1.


Experimental design. A) Following a 2 s intertrial interval, rats could enter the nosepoke to activate one of four pseudorandomly interleaved trial types. A brief cue (green flash) was presented to either the left eye or the right and was predictive of the number of licks (solid black ticks) required to gain a small bolus of water (blue drop) in half of the trials. B) Implanted animals performed the task as outlined in A before receiving a bilateral infusion of either saline or 192-IgG-saporin into V1. Following recovery, the task parameters were reversed such that the cue previously associated with the short delay was now paired with the longer delay and vice versa. C) Coronal sections demonstrating AChE histochemistry from a lesioned animal (top) and an intact animal (bottom). The black boxes on the low magnification images (left) correspond to the high magnification regions on the right. The white arrowhead indicates a remaining fiber stained for AChE, and the yellow arrowhead indicates a capillary. D) Example visualization of fiber depletion. The contour plot (top) shows the relative difference of laminar-averaged staining intensity between atlas-matched sections from an intact and lesioned hemisphere. The heat plot (bottom) represents the relative comparison for the coronal slice with the widest apparent lesion extent (anterior-posterior position indicated by the black triangle next to the contour plot above). The solid lines demarcate the boundary of V1 while the dashed lines indicate the border between the monocular (V1M) and binocular (V1B) portions of V1. Lower diagram adapted from Paxinos and Watson (2008). E) Aerial view of approximate recording locations and infusion zones for the lesion (top) and intact (bottom) groups. Each large circle indicates the expected region affected by the infusion for each animal, and the estimated position of the 8×2 recording electrode array is indicated by the pin points. 
The gray dashed line indicates the region shown in the contour plot in D. Aerial maps adapted from Zilles (1985).

Following the acquisition of stereotypical task performance (i.e., stable cue-reward intervals and nosepoke exit times), animals were bilaterally implanted with microelectrode arrays and infusion cannulae in the deep layers of binocular V1. Single unit data were collected while animals performed daily sessions of the task under the initial cue-reward pairing, where cue 1 predicts a short delay to reward and cue 2 predicts a long delay. As previously observed (Shuler and Bear, 2006), many neurons appeared to report one (but not both) of the operantly conditioned intervals between visual cue and expected reward delivery with a spiking pattern of sustained increased, sustained decreased, or delayed peak activity (Figs. 2A–C).

Fig 2.


V1 neurons recorded under initial pairing. A–C) Columns contain single unit examples of each form of reward timing (left: sustained increase, center: sustained decrease, right: peaks). A) Dots represent spikes recorded from a single unit in a behavioral session, with responses to the dominant cue on top (dark gray) and non-dominant below (light gray). The raster plots are gathered from all correctly completed unrewarded trials, aligned at stimulus onset, and stacked in chronological order. B) Smoothed, average spike rates compiled across trials are shown in dark gray for the dominant response and light gray for the non-dominant (shaded green bar: cue presentation; blue line: average reward delivery following the dominant cue). C) Bin-by-bin ROC analysis, comparing responses evoked in all trials of dominant and non-dominant cue presentation, provides the area under each ROC curve (dark gray) and a 95% confidence interval (light gray). For sustained increase (left) and sustained decrease neurons (center), the neural report of time (response time; yellow star) is defined as the first bin returning to chance. For peak neurons (right), the neural report of time is defined as the maximal ROC value. D) Cumulative population distributions of delay index scores are plotted on the left, where zero represents the short delay to reward and one represents the long delay. Plotted in dark gray is the population of neurons (n = 180) whose response is dominated by the cue associated with the short delay (cue 1) and in light gray the population (n = 116) dominated by the cue paired with the longer delay (cue 2). The median values of these distributions lie close to zero and one respectively, and are significantly different from one another (Mann-Whitney U test P < 0.05), indicating that the observed subpopulations accurately relate the reward times associated with their dominant cues.
The boxplot on the right indicates the median values (circle) and 25th–75th percentile range (line) for each subpopulation.

To quantify this observation in individual neurons, we first used a correlation analysis to determine which cue evoked the greatest modulation of spiking activity, referred to as the neuron's “dominant” cue (as opposed to its “non-dominant” cue) in the post-stimulus period (>100 ms after stimulus offset up to the average trial duration). Next, we employed a bin-by-bin receiver operating characteristic (ROC) analysis to compare the responses evoked across all trials of dominant versus non-dominant cue presentation. As exit behavior is consistent across trials containing presentations of either cue, this approach allows us to use the non-dominant response as a comparison for any modulation resulting from the animal's immediate behavior. Of 842 well-isolated single units, 609 (48% sustained increase, 36% sustained decrease, 16% peak) showed dominant cue-evoked spiking activity in the post-stimulus period that differed significantly from the non-dominant response and were considered for further analysis.
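The bin-by-bin ROC comparison described above can be illustrated with a minimal sketch. It assumes that each trial has already been reduced to a list of per-bin spike counts, and it computes the area under the ROC curve for each bin by direct pairwise comparison. The function names (`auc`, `binwise_auc`) and the toy data are hypothetical and are not taken from the study's analysis code; the confidence-interval step is omitted.

```python
from typing import List, Sequence


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    if not pos or not neg:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def binwise_auc(dominant: List[List[float]],
                non_dominant: List[List[float]]) -> List[float]:
    """Per-bin AUC; dominant[t][b] is the spike count in bin b on trial t."""
    n_bins = len(dominant[0])
    return [
        auc([trial[b] for trial in dominant],
            [trial[b] for trial in non_dominant])
        for b in range(n_bins)
    ]


# Toy data (hypothetical): two trials per cue, three time bins.
dominant_trials = [[5, 3, 1], [6, 2, 1]]
non_dominant_trials = [[1, 2, 1], [2, 3, 1]]
print(binwise_auc(dominant_trials, non_dominant_trials))
```

An AUC of 0.5 in a bin means the two cue responses are indistinguishable there, and values near 0 or 1 indicate strong discrimination.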

For each of these units, the neural report of time (NRT) was defined as the moment the area under the ROC curve returned to chance with 95% confidence, or for peak neurons, as the post-stimulus moment with the highest ROC value (Fig. 2C). To analyze NRTs collected across animals that experienced different cue-reward intervals, NRTs were expressed by a “delay index” score:

$$\text{delay index} = \frac{\mathrm{NRT} - \tau_{\text{short}}}{\tau_{\text{long}} - \tau_{\text{short}}},$$

where τshort and τlong correspond to the average time the animal received reward following the short and long delay, respectively. Under this index, a value of zero translates to a perfect report of the short delay, whereas a value of one is a perfect report of the long delay. If neurons tend to report the expected reward time associated with their dominant cue, then a population analysis of cue 1-dominant neurons should reveal that NRT scores conform to the short delay, and that cue 2-dominant neurons match the long delay. The cumulative distributions of NRT scores shown in Fig. 2D demonstrate that, indeed, the subpopulations have significantly different median values that accord with the operantly conditioned reward delays paired with their dominant cues. These results corroborate the original in vivo description of cue-specific reward timing activity in V1.
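As a small worked illustration of the delay index, the sketch below normalizes an NRT so that the short delay maps to zero and the long delay maps to one, as described in the text. The function name and the example delays (1 s and 2 s) are assumptions chosen for illustration only.

```python
def delay_index(nrt: float, tau_short: float, tau_long: float) -> float:
    """Normalize a neural report of time (NRT) to the short/long reward delays.

    Returns 0.0 for a perfect report of the short delay and 1.0 for a
    perfect report of the long delay. All arguments share one time unit.
    """
    if tau_long <= tau_short:
        raise ValueError("tau_long must exceed tau_short")
    return (nrt - tau_short) / (tau_long - tau_short)


# Hypothetical delays: short = 1 s, long = 2 s.
print(delay_index(1.0, 1.0, 2.0))  # report matches the short delay
print(delay_index(1.5, 1.0, 2.0))  # report falls midway between delays
print(delay_index(2.0, 1.0, 2.0))  # report matches the long delay
```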

Having established that subpopulations of V1 neurons exhibit reward timing to the delay paired with their dominant cue, we went on to query the necessity of BF cholinergic innervation in 1) the neural expression of the previously learned contingency and 2) the ability for neurons to acquire reward timing that matches novel task parameters. To address these issues, the animals were first given a bilateral infusion of either 192-IgG-saporin (n = 6) or saline (n = 7) through cannulae that were implanted alongside and centered to each recording electrode array (Fig. 1E) and then subjected to additional training with reversed cue-reward pairings (Fig. 1B). The immunotoxin 192-IgG-saporin binds p75, a growth factor receptor expressed exclusively by cholinergic neurons arising from the BF, and is transported to the soma where it induces cell death by ribosome inactivation (Wiley et al., 1991). Importantly, the technique is fast (Waite et al., 1994) and eliminates only cholinergic BF input to the infusion zone, leaving the rest of the circuit intact (Holley et al., 1994). Based on measures of dye diffusion in the rat brain (Myers, 1966), and given our cannula/electrode geometry, we expected our infusion (~350 nL) to impact a 1–2 mm diameter spherical volume centered about the electrode array. We used a diaminobenzidine-enhanced Hatchett Brown stain for acetylcholinesterase (Tago et al., 1986) (Fig. 1C) to visualize the expanse of our lesioning protocol, both in area and in depth (Fig. 1D), and quantified the extent of lesion as a volume with a diameter of 1.583 ± 0.105 mm (mean ± standard error, n = 3; see Supplementary Online Methods). The fiber loss was restricted to the occipital cortex and was not observed in any subcortical region.

The selective removal of cholinergic input to V1 prior to task reversal enabled us to disambiguate the effect of cholinergic depletion upon the expression of previously established reward timing, versus the acquisition of new reward timing intervals. Therefore, following 5–6 days of post-infusion recovery, single unit data were collected while animals performed daily sessions of the task with reversed parameters (i.e., cue 1 predicting a long delay and cue 2 predicting a short delay; Fig. 1B). Because animals exit the nosepoke shortly after the expiration of the longer delay time on unrewarded trials, regardless of which cue was presented (Student's paired t-test, P = 0.1862), the behavioral readout (average unrewarded exit time divided by average long reward delay) remained stable following cue-delay reversal for both intact and lesioned rats (two-way repeated measures ANOVA: task epoch, P = 0.744; infusion content, P = 0.179; interaction, P = 0.105). The average delays to reward were also unchanged across reversal for both groups (short delay: task epoch, P = 0.739; infusion content, P = 0.210; interaction, P = 0.146; long delay: task epoch, P = 0.653; infusion content, P = 0.246; interaction, P = 0.155). Fig. 3A,B shows example neurons of each form of reward timing (sustained increase, sustained decrease, and peak) collected following the reversal for neurons that were either cue-1 dominant (Fig. 3A) or cue-2 dominant (Fig. 3B). These examples illustrate that both cue 1-dominant and cue 2-dominant neurons recorded from intact animals express NRTs that appropriately reflect the new policy. Conversely, although cue 1- and cue 2-dominant neurons recorded from 192-IgG-saporin-infused animals are capable of displaying all forms of reward timing activity, they do not update their NRTs but rather persist in reporting the now outdated policy.

Fig 3.


V1 neurons recorded following cortical infusion and cue-reward reversal. A–B) Example neurons exhibiting sustained increase (top), sustained decrease (middle) and peak (bottom) reward timing recorded from intact (orange) and lesioned (red) animals following V1 drug infusion and cue-reward pairing reversal. Conventions are as in Fig. 2, with the current reward time associated with the dominant cue shown in blue (“reversed”) and the reward time initially associated with the dominant cue shown in gray (“initial”). A) Cue 1-dominant neurons from saline-infused animals (left) update their policy to reflect the reversed contingency (cue 1 associated with a long delay) while neurons from lesioned animals (right) continue to report the initial policy. B) Cue 2-dominant neurons report the new contingency (cue 2 associated with a short delay) in intact animals (left), while neurons from 192-IgG-saporin-infused animals (right) continue to express the outdated policy. C) NRTs recorded following contingency reversal, plotted as cumulative population distributions of update index scores. Zero represents the reward time initially associated with the dominant cue and one represents the reward time associated with the dominant cue following reversal (gray: observations recorded under the initial task parameters; orange: observations collected from intact animals after reversal; red: observations gathered from lesioned animals after reversal). Scores from intact animals (left; n = 182) form a population with a median value that is distinct from the initial observations (Mann-Whitney U test P < 10⁻¹⁰) while neurons collected from lesioned animals (right; n = 131) continue as a population to report the same median value (P = 0.6929). See also Figures S1 and S2. Boxplot conventions are as in Fig. 2.

To quantify these observations from 192-IgG-saporin-infused and intact animals following policy reversal, NRTs gathered under the second contingency were scored using an “update index” to express the degree to which NRTs reflected the new, as opposed to the previously conditioned, reward time. The update index expresses the temporal difference of the NRT from the previously associated time of reward in units of the absolute difference between the old and new reward times:

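$$\text{update index} = \begin{cases} \dfrac{\mathrm{NRT} - \tau_{\text{initial}}}{\left|\tau_{\text{initial}} - \tau_{\text{reversed}}\right|}, & \text{cue 1 dominant } (\tau_{\text{reversed}} > \tau_{\text{initial}}) \\[2ex] \dfrac{\tau_{\text{initial}} - \mathrm{NRT}}{\left|\tau_{\text{initial}} - \tau_{\text{reversed}}\right|}, & \text{cue 2 dominant } (\tau_{\text{reversed}} < \tau_{\text{initial}}) \end{cases},$$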

where τinitial and τreversed correspond to the average reward delays paired with the dominant cue during the initial and reversed task parameters. Under this “update index,” a score of zero indicates that the NRT has not moved from the initially associated reward time, whereas a positive one indicates traveling exactly one unit towards the new reward time (a negative one indicates that the NRT has moved a complete unit in the wrong direction). Pre-reversal NRTs were also expressed by this index for the purpose of comparing population responses before and after policy reversal. Since pre-reversal index scores from intact and 192-IgG-saporin-infused animals were not significantly different from each other (cue 1-dominant: Mann-Whitney U test P = 0.4010; cue 2-dominant: P = 0.7980), these data were combined together for the remaining analyses.
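The update index can be sketched in the same way. The version below follows the verbal definition in the text (the signed displacement of the NRT from the initial reward time, in units of the absolute difference between old and new reward times) and assumes the direction of the expected shift is given by whether the reversed delay is longer or shorter than the initial one. The function name and example delays are hypothetical.

```python
def update_index(nrt: float, tau_initial: float, tau_reversed: float) -> float:
    """Signed progress of an NRT from the initial toward the reversed reward time.

    0.0  -> NRT still at the initially associated reward time
    1.0  -> NRT moved exactly one unit toward the new reward time
    -1.0 -> NRT moved one unit in the wrong direction
    """
    span = abs(tau_initial - tau_reversed)
    if span == 0:
        raise ValueError("initial and reversed delays must differ")
    if tau_reversed > tau_initial:
        # Delay lengthened at reversal (cue 1-dominant case in the text).
        return (nrt - tau_initial) / span
    # Delay shortened at reversal (cue 2-dominant case in the text).
    return (tau_initial - nrt) / span


# Hypothetical delays: 1 s and 2 s, swapped at reversal.
print(update_index(2.0, 1.0, 2.0))  # fully updated, delay lengthened
print(update_index(1.0, 2.0, 1.0))  # fully updated, delay shortened
print(update_index(3.0, 2.0, 1.0))  # moved one unit the wrong way
```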

The distribution of update index scores collected from intact, saline-infused animals has a median value which is not significantly different from a score of one (Wilcoxon signed-rank P = 0.8161) and shows a significant shift relative to the pre-reversal data (Fig. 3C, left), indicating that as a population, neurons from intact animals updated their policy to reflect the novel pairing. However, the median value of the neural population recorded from 192-IgG-saporin-infused animals post-reversal is not significantly different from that observed under the initial conditions (Fig. 3C, right). Indeed, the median value is indistinguishable from the initial target index (Wilcoxon signed-rank P = 0.8219), indicating that the population is continuing to express the previously learned temporal relationships. Dividing the neurons by cue dominance into subpopulations reveals the same effect: neuronal responses from intact animals come to reflect the newly paired cue-reward intervals whether their dominant cue went from being paired with a short delay to a long (Fig. S1A, top), or a long delay to a short (Fig. S1B, top). However, comparing the scores collected from 192-IgG-saporin-infused animals to the pre-reversal data demonstrates that 192-IgG-saporin-infused animals do not update their reports of time, but rather continue to express the initial policy: cue 1 paired with a short delay (Fig. S1A, bottom) and cue 2 paired with a long delay (Fig. S1B, bottom).

While the NRTs obtained from 192-IgG-saporin-infused animals do not display a median shift following reversal, the shape of the distribution changes significantly (Kolmogorov-Smirnov P < 0.005) due to an apparent increase in variance. This increased variability could either be a direct consequence of removing BF cholinergic innervation or occur in its prolonged absence. To differentiate between these possibilities, we categorized NRTs as belonging to an “early” (first three sessions following the reversal) or “late” (>4 sessions after) post-reversal epoch (Fig. S2). In the early post-reversal epoch, NRT update scores collected from intact and 192-IgG-saporin-infused animals are not significantly different from each other and are indistinguishable from the pre-reversal distribution. Whereas NRTs from intact animals eventually come to relate the new reward times given more experience with the new pairings, NRTs collected from 192-IgG-saporin-infused animals persist in reporting the outdated policy, albeit with diminished precision. Thus, the long term consequences of cholinergic depletion following reversal are an inability to update temporal relationships and a degradation of the NRT population precision, possibly reflecting a neural signature of extinction.

Teaching a visual cortical slice to report time

Because the effects of 192-IgG-saporin treatment were restricted to the cholinergic innervation of the occipital cortex (Fig. 1C–E), the in vivo findings suggest that the plasticity underlying reward-timing activity occurs locally within V1. This insight encouraged us to attempt induction of response-duration plasticity (RDP) in acutely prepared slices of visual cortex, with electrical stimulation substituting for visual input and a brief pulse of the cholinergic agonist carbachol (CCh) serving as a proxy for the putative reinforcement signal. Because slice health and plasticity decline with age, we elected to use juvenile rather than adult animals for these experiments. Furthermore, we used mice rather than rats to establish an in vitro RDP protocol in anticipation of the need to use genetically engineered animals in future mechanistic investigations. This species difference does not undermine the relevance of the slice experiments, however, because mouse V1 also shows in vivo reward timing activity as observed in rat (Liu and Shuler, unpublished).

We recorded extracellular spiking activity of layer 5 neurons in response to electrical stimulation (200 μs) of the underlying white matter (WM) every 30 s (Fig. 4A) in acutely prepared slices from male C57BL/6 mice (postnatal day 16–21). After collecting stable baseline responses, each slice was randomly assigned to a conditioning protocol involving either CCh or artificial cerebrospinal fluid (ACSF), which served as our control. Conditioning comprised local application of brief (30 ms) puffs of CCh (100 μM) or ACSF to the preparation one second after each electrical stimulus for 200 trials. After conditioning, slices were continuously perfused with ACSF for an additional 30 minutes. Following this “washout,” we applied post-conditioning WM stimuli to assess potential changes in the duration of electrically-evoked spike trains.

Fig. 4.


Response Duration Plasticity. A) Experimental design (Rec, recording electrode; Stim, stimulating electrode; WM, white matter; L4, layer 4; CCh, carbachol; ACSF, artificial cerebrospinal fluid). Conditioning (cond) is performed by CCh application at Δt delay after electrical stimulation. B–C) Raster plots of representative neurons before (baseline) and after (post-cond) conditioning. B) A neuron conditioned with CCh applied at 1 s delay. C) A control neuron conditioned with ACSF applied at 1 s delay. The vertical red tick indicates the time of electrical stimulus; the vertical green and gray ticks indicate the time of CCh or ACSF application during conditioning, respectively. D–E) SDFs of the neurons before (baseline) and after (post-cond) CCh conditioning (D; n = 22 neurons from 8 animals) or ACSF conditioning (E; n = 20 from 6 animals). For visualization, the color scale shows spike densities greater than 1 standard deviation above spontaneous, normalized to the peak response magnitude during the baseline period. Cyan crosses represent neurons' calculated response durations (time to return to 1 STD of a neuron's spontaneous firing rate). Individual neuronal responses are sorted according to the baseline response durations. F) Population SDFs before (black) and after (green) CCh conditioning. G) Population SDFs before (black) and after (gray) control ACSF conditioning. The mean neuronal response duration before and after conditioning is shown as the corresponding colored bars below the x axis. See also Figure S3.

Under our experimental conditions (see methods), each shock applied to the WM during baseline evoked a barrage of action potentials that usually subsided within a few hundred milliseconds (e.g., Fig. 4B,C top and Fig. S3). Although puffing ACSF one second after WM stimulation had no effect, conditioning with CCh at that delay increased the duration of the spike trains subsequently elicited by WM stimulation. We calculated spike density functions (SDFs) for each isolated single unit and averaged all SDFs within the same experimental condition before (baseline) and after conditioning to achieve population SDFs. For visualization, the SDFs of individual neurons before and after conditioning were normalized to the peak magnitude of their baseline population response (Fig. 4D,E) and the population SDFs were compared (Fig. 4F,G). To quantify the changes observed following conditioning on a per-neuron basis, the first moment of post-stimulus time when the SDF was no longer above 1 STD of the neuron's spontaneous activity was credited as being the neuron's response duration (cyan crosses Fig. 4D,E; see methods). A comparison of response durations before and after training reveals that the CCh-conditioned population has a substantially longer response (baseline = 673 ± 103 ms, post-cond = 1069 ± 76 ms, P < 0.01, paired t-test) terminating at a moment that accords well with the stimulus-CCh interval (Wilcoxon signed-rank test, no significant difference from 1 s stimulus-CCh interval, P = 0.13, but significant difference, P < 0.001 and P < 0.0001 respectively, from 0.5 s and 1.5 s intervals which constitute alternative conditioning intervals tested below), an effect that was not observed following ACSF conditioning (baseline = 634 ± 67 ms, post-cond = 670 ± 60 ms, P > 0.05, paired t-test; Wilcoxon signed-rank test, P < 0.001 for 1 s).
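The response-duration measure can be illustrated with a minimal sketch. It assumes the SDF is sampled at a fixed bin width starting at stimulus onset, and it takes the threshold to be the mean of the spontaneous rate plus one standard deviation, which is one reading of "1 STD of the neuron's spontaneous activity". The function name, the threshold convention, and the toy numbers are assumptions for illustration and are not drawn from the study's methods.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def response_duration(sdf: Sequence[float],
                      spontaneous: Sequence[float],
                      dt_ms: float) -> Optional[float]:
    """First post-stimulus time (ms) at which the SDF is no longer above threshold.

    sdf         : spike density samples, index 0 at stimulus onset
    spontaneous : spike density samples from a pre-stimulus period
    dt_ms       : spacing between consecutive SDF samples, in ms
    Threshold is assumed to be mean + 1 SD of spontaneous activity.
    Returns None if the SDF never falls to threshold within the window.
    """
    threshold = mean(spontaneous) + pstdev(spontaneous)
    for i, rate in enumerate(sdf):
        if rate <= threshold:
            return i * dt_ms
    return None


# Toy data (hypothetical): spontaneous mean 3, SD 1, so threshold is 4 spikes/s.
spont = [2, 4, 2, 4]
evoked = [50, 30, 10, 5, 4, 3]
print(response_duration(evoked, spont, dt_ms=100))
```

Returning `None` when the response never decays keeps an unresolved duration distinct from a genuine value of zero.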

While this first experiment showed that CCh conditioning can prolong WM-evoked responses, it did not assess whether the duration of conditioned responses is determined by the specific stimulus-CCh interval or is simply a non-specific consequence of CCh pairing. To address this question we performed a second experiment where slices were conditioned using randomly assigned intervals of 0.5 s or 1.5 s (Fig. 5A–D). Response durations of the neurons conditioned with the 0.5 s stimulus-CCh interval increased significantly, though modestly given the proximity of the mean baseline response to the target interval (baseline: 428 ± 50 ms; post-cond: 598 ± 51 ms, P < 0.01, paired t-test). Following conditioning with a stimulus-CCh interval of 1.5 s, the mean response duration increased from the baseline duration (619 ± 69 ms) to 1,368 ± 76 ms (P < 0.001, paired t-test). For comparison, response durations from all experimental groups before versus after conditioning are plotted in Fig. 5E–F. Cumulative probability distributions allow direct comparison of response durations following conditioning with different stimulus-CCh intervals (Fig. 5G). The distributions of response durations of neurons trained to the 1 s stimulus-CCh interval were significantly different from the response durations of neurons trained to both the 0.5 s interval and the 1.5 s interval (Kolmogorov-Smirnov P < 0.001). Only after CCh conditioning were the central tendencies of the response durations significantly extended beyond their original baseline decay times to times that accorded well with their respective conditioned times (Wilcoxon signed-rank test, no significant difference from their corresponding stimulus-CCh intervals: 0.5 s – P = 0.10, 1.5 s – P = 0.23, but significant differences to the other intervals, 0.5 s: 1 s – P < 0.0001, 1.5 s – P < 0.0001; 1.5 s: 0.5 s – P < 0.0001, 1 s – P < 0.001).

Fig. 5.


Conditioning to two different stimulus-CCh delays, 0.5 s and 1.5 s. A) SDFs before and after conditioning to 0.5 s stimulus-reward interval; n = 19 from 7 animals. Conventions are the same as Fig. 4. B) Plots of SDFs before and after conditioning to 1.5 s stimulus-reward interval; n = 25 from 10 animals. Individual neuronal responses are sorted according to the initial response time. C) Population SDFs conditioned to the 0.5 s stimulus-reward interval before (black) and after (light green) conditioning. D) Population SDFs conditioned to the 1.5 s stimulus-reward interval before (black) and after (dark green) conditioning. The mean neuronal response duration before and after conditioning is shown as the corresponding colored bars below the x axis. E) Connected filled circles represent response times of individual neurons before and after conditioning with ACSF (gray) and with CCh to different stimulus-reward times 0.5 s (light green), 1.0 s (green), 1.5 s (dark green). Post-cond, post-conditioning. F) Mean response times of the neurons before (black) and after (colored) conditioning. Error bars represent S.E.M. * P < 0.05; **P < 0.01; ***P < 0.001, Student's t-test. G) Cumulative probability distributions of the neuronal response times after conditioning with ACSF and CCh. See also Figure S4 & S5.

These data show that timed CCh application can change the duration of evoked spike trains such that they relate to the stimulus-CCh interval. To determine if CCh acts via muscarinic ACh receptors, we repeated the experiment with a 1 s stimulus-CCh interval in the presence of atropine (2 μM). We found that this drug treatment completely blocked the induction of RDP (baseline: 431 ± 64 ms; post-cond: 388 ± 56 ms, P = 0.4267, Fig. S4). Thus, muscarinic ACh receptor activation is necessary for the modifications responsible for RDP.

Stimulation of a visual cortical slice in the continuous presence of carbachol can lead to long-term potentiation (LTP) of evoked responses (Brocher et al., 1992; Cho et al., 2012; but see Kirkwood et al., 1999). Although this mechanism alone seems unlikely to account for the precise timing of RDP after conditioning with different stimulus-CCh intervals, LTP of peak response magnitude offers one plausible mechanism for prolongation of an evoked spike train. Analysis of peak response magnitudes after conditioning did reveal an increase in the experiments using a 1.5 s delay (baseline: 78 ± 16 spikes/s; post-cond: 113 ± 22 spikes/s, P = 0.0014), but not in experiments using the 0.5 s (baseline: 71 ± 12 spikes/s; post-cond: 82 ± 13 spikes/s) or 1 s delay intervals (baseline: 91 ± 15 spikes/s; post-cond: 95 ± 18 spikes/s, Fig. S5). To investigate this question further, we examined the relationship between changes in response duration and the corresponding changes in response magnitude on a cell-by-cell basis and found no correlation (Pearson's correlation coefficient R = −0.0606, P = 0.5797) (Fig. S7A). We also examined the relationship between the change in response duration and the latency to the first spike and found no correlation (Pearson's correlation coefficient R = −0.0283, P = 0.7958) (Fig. S7B). These results make LTP of monosynaptic inputs, directly activated by the electrical stimulation, an unlikely basis for RDP.

Another possibility is that CCh exerts its effects by strongly depolarizing neurons and evoking spikes at a fixed delay after WM stimulation. If this were the case, the coincidence of CCh-induced spiking with residual polysynaptic activity in layer 5 neurons could induce LTP of the active synapses and enhance reverberating activity. However, recording of spiking during conditioning revealed no such effect. Indeed, not only was CCh ineffective in evoking spikes, remarkably, the average WM-evoked response durations did not change during conditioning (Fig. S5). RDP only became apparent after washout of the CCh. To test if a brief CCh puff produces subthreshold depolarization, we conducted whole-cell patch clamp recordings that revealed no direct effect on membrane potential at the time of puff (Fig. S6A, B). To ensure that the training protocol was effective in these experiments, we continued conditioning the slices and confirmed, using extracellular recordings, successful entrainment to the 1s stimulus-reward interval (Fig. S6D).

One interesting property of reward timing activity in V1 in vivo is that individual neurons display cue dominance, reporting the stimulus-reward interval associated exclusively with one, but not both, visual cues (Shuler and Bear, 2006). To assess if RDP in the slice also shows input specificity, we performed a two-pathway experiment where only one pathway was conditioned (Fig. 6A–B). We first collected baseline responses of neurons to two alternating stimulating electrodes (Stim 1 and 2) placed at a distance from each other in the white matter to allow for activation of at least partially non-overlapping inputs that converge onto the recorded neurons. We then conditioned the slice by pairing only Stim 1 with a 100 μM CCh puff applied at a 1s delay (200 stimulus-CCh presentations). After 30 minute washout in ACSF, we again collected responses from the same neurons to the two stimulating electrodes. Figure 6 shows, per neuron, the SDFs evoked by the conditioned (Stim 1, Fig. 6B) and the non-conditioned (Stim 2, Fig. 6C) electrodes before and after the pairing protocol, as well as the mean population response evoked by stimulating the conditioned (Fig. 6D) or the non-conditioned pathway (Fig. 6E). Response durations pre- and post-conditioning (Fig. 6F) reveal that after conditioning, only the responses to the conditioned stimulus were extended to the time at which CCh was applied (baseline: 450 ± 44 ms; post-cond: 1,158 ± 79 ms, P < 0.001, paired t-test), while the responses to the control stimulus were not significantly changed (baseline: 314 ± 35 ms; post-cond: 412 ± 51 ms, P = 0.055, paired t-test) (Fig. 6G–H). The cumulative distributions of the response durations evoked by the conditioned and the non-conditioned stimulating electrodes were also significantly different (Kolmogorov-Smirnov P < 0.001) (Fig. 6H).

Fig. 6.

Two-pathway experiment. A) Experimental setup. Stim 1 represents the conditioned stimulus; Stim 2 represents the control stimulus. B) SDFs before (baseline) and after (post-cond) conditioning with a 1 s stimulus-CCh interval, n = 25 units from 9 animals. Conventions are the same as Fig. 4. C) SDFs of the neurons in response to the control stimulus. Both B) and C) represent responses of the same neurons to Stim 1 and Stim 2; individual neuronal responses are sorted according to the initial duration of response to the Stim 1 electrode. D) Population SDF to the conditioned stimulus before (black) and after (green) conditioning. E) Population SDF to the control stimulus before (black) and after (cyan) conditioning. The mean neuronal response duration before and after conditioning is shown as the corresponding colored bars below the x axis. F) Individual response times of the conditioned (green) and the control (cyan) stimuli. G) Mean response times of the conditioned and the control stimuli, ***P < 0.001; n.s., not significant; Student's t-test. H) Cumulative probability distributions of the neuronal responses after conditioning.

Theoretical and experimental studies suggest that alteration of neuronal timing, resulting from delayed behavioral reinforcement, can be accounted for either by synaptic modifications within a network of interconnected neurons (Gavornik and Shouval, 2011; Gavornik et al., 2009; Sanchez-Vives and McCormick, 2000) or by changes in the intrinsic electrical excitability of individual cells (Egorov et al., 2002; Fransen et al., 2006; Shouval and Gavornik, 2011). To decide which model better describes RDP in visual cortex, we performed whole-cell patch clamp recordings of layer 5 neurons after conditioning slices with CCh at a 1 second delay between stimulus and “reward.” Neurons were patched in the vicinity of an extracellular recording electrode used to monitor baseline activity and the effect of conditioning. Current-clamp recordings of evoked action potentials confirmed RDP following CCh conditioning (Fig. 7A–D). Response durations of individual conditioned neurons were extended to the time when CCh had been applied following electrical stimulation (conditioned: 915 ± 64 ms, unconditioned control: 325 ± 40 ms; P < 0.001, Student's t-test) (Fig. 7C–F), and the distributions of response durations of the conditioned and control neurons were significantly different (Fig. 7G; Kolmogorov-Smirnov P < 0.001). Voltage clamp experiments, in which the cell was held at the membrane potential Vm=−70 mV, revealed a barrage of incoming synaptic currents corresponding well with extracellular recordings and the encoded stimulus-reward interval (Fig. 7H). The mean synaptically evoked charge transfer was 53.04 ± 5.9 pC (n = 8) in conditioned neurons compared to 15.53 ± 2.7 pC (n = 9) in controls (P < 0.001, Student's t-test). However, there was no difference in the membrane resistance of these same cells following conditioning (Fig. 7I), nor in the function relating action potential number to the amount of current injected into the cell in current-clamp (Fig. 7J). 
Based on these findings, we conclude that changes in intrinsic membrane excitability of layer 5 neurons are unlikely to account for RDP.

Fig. 7.

Characterization of response time plasticity using whole-cell patch clamp. A) Representative traces of the current clamp recordings (top) and the simultaneous multi-unit recordings (bottom) post-conditioning. The persistent firing is triggered by the barrages of synaptic potentials resulting in irregular spike frequency. The inset shows a part of the current-clamp recording with the arrows indicating excitatory postsynaptic potentials (EPSPs). B) Control traces. C, D) SDFs of the individual whole-cell recordings: C) post-conditioning (n = 18 from 10 animals) and D) control (n = 17 from 8 animals). Neuronal responses are sorted according to their duration. Cyan crosses represent the calculated response durations. E) Population SDFs of the conditioned (green) and the control neurons (black). The mean neuronal response duration for control and post-conditioning is shown as the corresponding colored bars below the x axis. F) Mean response durations of the control (black) and the conditioned (green) neurons. Error bars represent S.E.M. ***P < 0.001, Student's t-test. G) Cumulative probability distributions of the control and the conditioned neuronal response durations. H) Voltage clamp recording of the individual cell (top) and the simultaneous extracellular multi-unit recording (bottom). Barrages of synaptic currents correspond well with the extracellular recordings of spiking activity. I) Membrane resistance of the control (black) and the conditioned (green) neurons was not significantly different. J) Intrinsic excitability of the conditioned and control neurons was also not significantly different as measured by the number of action potentials (AP) triggered by the increasing step current injections. See also Figure S6 & S7.

DISCUSSION

We find that selective removal of BF cholinergic input to V1 is sufficient to disrupt neural acquisition of cue-reward intervals while leaving intact the expression of intervals that were previously learned, indicating that V1 is the locus of plastic changes underlying reward timing activity and that ACh is necessary for this plasticity to occur. An instructive role for ACh in learning reward timing is indicated by the finding that pairing stimulation of a V1 slice with a delayed pulse of the cholinergic agonist CCh extends the evoked spike train durations so that they relate to the conditioned interval. Together, these data suggest that cholinergic input provides a signal of behavioral reinforcement to V1 and that even the reduced network contained within a visual cortical slice is sufficient to support reinforcement learning of temporal intervals as long as 1.5 seconds. The RDP protocol we describe here lays a foundation for future studies aimed at dissecting the mechanisms that underlie reward time prediction learning.

The neural process by which the brain attributes behavioral relevance to cues in the environment, and learns temporal expectancies based on that evidence, is unknown. Reinforcement learning theory provides a conceptual framework to explain how the brain might relate preceding neural activity with behavioral outcomes; by modifying relevant network elements on the basis of behavioral success, particular behaviors become more or less likely to occur in the future (Sutton and Barto, 1998). The biological implementation of this framework is achieved in part by invoking a global reinforcer, a signal which is permissive to plasticity and conveys the relative success or failure of recent actions throughout a network of neurons. However, relating the reinforcement signal directly to preceding neural activity presents a challenge for intervals that exceed the membrane time constant. Reinforcement learning theory contends with this “distal reward problem” by proposing the existence of an eligibility trace, a molecular memory of recent synaptic (or cellular) activity. Relating neural activity to future outcomes therefore requires an interaction between the reinforcement signal and slowly decaying eligibility traces, a notion which has been successfully applied to resolve the distal reward problem (Izhikevich, 2007). However, the observed reward timing in V1 requires further explanation: not only are associations expressed relating cues to reward, but explicit temporal representations of expected cue-reward intervals are generated.
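The interaction between a delayed reinforcement signal and a decaying eligibility trace can be illustrated with a few lines of Python. This is a generic sketch, not the model used in this study: it assumes a single exponentially decaying trace, and the names `tau_s`, `learning_rate`, and `reward` are hypothetical parameters chosen only for illustration.

```python
import math


def eligibility(t_since_activity_s, tau_s=1.0):
    """Value of an exponentially decaying trace t seconds after the activity.

    tau_s is a hypothetical decay time constant, not a measured quantity.
    """
    return math.exp(-t_since_activity_s / tau_s)


def weight_change(delay_s, reward=1.0, learning_rate=0.1, tau_s=1.0):
    """Weight update when a reinforcement signal arrives delay_s after activity.

    The update is the product of the reinforcement signal and whatever is
    left of the trace at that moment, so later reinforcement changes less.
    """
    return learning_rate * reward * eligibility(delay_s, tau_s)


if __name__ == "__main__":
    for delay in (0.5, 1.0, 1.5):
        print(delay, round(weight_change(delay), 4))
```

Under these assumptions the update shrinks monotonically with the delay between activity and reinforcement, which is why a trace that outlasts the membrane time constant is needed to bridge intervals on the order of a second.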

A formal model describing how visually-cued expectancies can be learned and expressed within V1 has only recently been proposed (Gavornik et al., 2009). This model demonstrates how a reinforcement-based process termed “reward-dependent expression” generates not only the formation of appropriate associations, but also encodes the intervals interceding stimulus and reward. This model, and the in vivo results showing that ACh could serve as the reinforcement signal in V1, informed the design of our slice experiments. Consistent with theoretical assumptions, we found that when brief electrical stimulation of white matter is followed at a fixed delay by a pulse of CCh, the spike trains evoked in neurons by the electrical stimulation increase in duration to the time at which CCh was delivered. According to this theory, reward timing can be implemented either by adjusting the synaptic weights of a recurrent network of neurons (Gavornik et al., 2009) or by changing the intrinsic excitability of individual cells (Egorov et al., 2002; Fransen et al., 2006; Shouval and Gavornik, 2011). There are ample biological precedents for both types of change (Bekisz et al., 2010; Breton and Stuart, 2009; Budd, 1998; Johnson et al., 2010; Maravall et al., 2004; Nataraj et al., 2010; Rahman and Berger, 2011; Saar and Barkai, 2009). The RDP observed in layer 5 neurons appears to be better explained by the network model, as it is input specific and reflected in synaptic events without any changes in intrinsic membrane excitability. We cannot rule out changes in excitability in other neurons in the network, however.

The in vitro data presented here appear to correspond well with the results in vivo. We find that responses can be trained to represent time from stimulus to CCh “reward” with considerable accuracy, and that the modified response is specific to the input that is associated with such reinforcement. However, there are some noteworthy differences. Perhaps the most apparent is that reward timing activity in vivo assumes one of three different forms: sustained increases in firing to the anticipated reward time, sustained decreases in firing to the anticipated reward time, and peak firing at the anticipated reward time (Fig. 2). In the slice, however, RDP manifested in neurons as sustained increases in the duration of spike trains evoked by WM stimulation. This difference is not accounted for by a species difference, as the same three classes of reward timing activity have been observed in mouse V1 (Liu and Shuler, unpublished). It could be due to a different sample of neurons in the in vivo and in vitro experiments, since only layer 5 neurons were recorded in slices. Another possible explanation is that neurons in slices have much lower baseline firing rates than neurons in vivo, thus preventing detection of neurons that decrease firing to the time of anticipated CCh delivery, as well as detection of neurons that depend upon input from “sustained decrease” neurons to generate a response that peaks at the reward time. A final possibility is that RDP in the slice is a phenomenon that does not overlap mechanistically with the reward timing observed in vivo. However, given the critical involvement of cholinergic transmission in both phenomena, we believe that the last explanation is unlikely.

The in vivo findings suggest that the release of ACh in V1 is a mechanistic requirement for acquisition of reward timing activity, and slice experiments show that CCh, acting via muscarinic ACh receptors, can elicit RDP to encode specific stimulus-CCh intervals. Precisely how CCh acts in this context as a reinforcement signal remains to be determined. One simple possibility for which there is precedent from cortical slice culture experiments (Johnson et al., 2010) is that CCh marks the time interval by triggering a second burst of action potentials, and by doing so triggers LTP of reverberating polysynaptic activity. However, our recordings during conditioning with CCh suggest that this is not the case. Under our experimental conditions, the CCh puff induces neither spiking nor subthreshold depolarization around the time of drug application. Our results are consistent with those of Gulledge et al. (2009). In that study, the authors applied brief pulses of a high concentration of ACh with pipettes in close proximity to layer 5 neurons in frontal cortex slices. The ACh did not evoke spikes directly, and instead produced a phasic hyperpolarization with a ~1 s delay. If the neurons were induced to fire action potentials with an intracellular current injection, the transient hyperpolarization was followed by an acceleration of the firing rate with a longer delay (~2 s). Both effects were blocked by atropine and were absent in the M1 receptor knockout. Because the neurons in our experiments were not firing action potentials at the time of the CCh application, there was no opportunity for spike acceleration. We did not observe CCh-induced hyperpolarization, possibly because our application method was more diffuse and less focal. In any case, neither the literature nor our experiments support the idea that CCh initiates response duration plasticity by simply depolarizing layer 5 neurons at the time of application.

More work is required to identify the molecular basis of the theoretically postulated “eligibility traces” and to understand how conditioning results in encoding specific time intervals expressed as the duration of stimulus-elicited spike trains. Recently it has been shown that the sign of synaptic plasticity can change its polarity from LTP to LTD depending on the timing of cholinergic activation (Gu and Yakel, 2011; Pawlak et al., 2010). The visual cortical circuit might possess similar synaptic plasticity, and RDP may be its functional consequence at the network level. In any case, RDP in the slice offers an experimentally accessible framework in which these behaviorally relevant questions can be addressed.

Conclusion

Local disruption of cholinergic input is sufficient to block the acquisition of reward timing in vivo, and acutely isolated slices of V1 can learn to anticipate delivery of CCh. Collectively, these results indicate that this type of reward prediction can occur early in sensory processing streams and is not a specialized function allocated exclusively to higher order cortical areas. In addition to introducing a robust in vitro protocol to study reinforcement learning, the present study advances the understanding of V1 reward timing activity and suggests that cholinergic neuromodulation serves as a reinforcement signal which informs the cortex about the outcome of behavior.

Experimental Procedures

All animal procedures followed NIH guidelines and were approved by the Johns Hopkins University IACUC or the Committee on Animal Care of Massachusetts Institute of Technology.

In vivo data collection and analysis

Sterile surgical procedures (see Supplementary Experimental Procedures) were used to chronically implant bilateral, movable electrode assemblies and infusion cannulae into binocular V1 (1–2 mm anterior, 4.5 mm lateral from Lambda) of adult, male Long-Evans rats. Animals were housed in a vivarium with ad libitum food, and following recovery from surgery, given water only during behavioral sessions and brief scheduled access periods. Infusions were administered under light anesthesia and consisted of 350 nL saline or 192-IgG-saporin (13.5 μg/mL; Advanced Targeting Systems). After the final recording session, each animal was given a lethal dose of pentobarbital and transcardially perfused with cold phosphate-buffered saline (PBS) followed by 4% paraformaldehyde in PBS. After post-fixation, brain tissue was processed for acetylcholinesterase visualization (See Supplementary Experimental Procedures).

Before each session, animals were placed in an operant chamber containing a water spout within a nosepoke port (Med Associates). Sixteen-channel headstage amplifiers were attached to electrode interface plugs overlying each hemisphere, and bandpass filtered (1 kHz – 10 kHz) neural signals were sampled at 33 kHz by commercially available hardware (Neuralynx). Prior to recording, the spiking activity was inspected and a minute adjustment (~25 μm) of the electrode depth was applied if necessary to increase the yield or quality of recordings. Spike waveforms were sorted offline into single units (Offline Sorter, Plexon) by manually tracing boundaries between three-dimensional clusters formed by the following attributes: principal components 1 and 2, peak minus valley amplitude, nonlinear energy, and valley timing. Clusters that could not be cleanly separated from noise were excluded. The isolated timestamps were visualized with Neuroexplorer (Nex Technologies) and analyzed with MATLAB (Mathworks) scripts. Collapsing across all of the recording sessions from each animal (range: 12–40), each electrode yielded an average of 3.031 ± 0.887 single unit recordings.

Visual cues were presented through removable, head-mounted goggles overlying each eye and embedded with green LEDs. Following a 2 s inter-trial interval, the animal could initiate a trial by entering the nosepoke, triggering a 100 ms flash to one of the goggles. The side of the cue presentation dictated the number of licks (range: 4–18) required to briefly activate a solenoid placed between the lick spout and a water reservoir. Rewarded and unrewarded trials were pseudorandomly interleaved such that no more than 6 consecutive trials could have the same stimulus and outcome. Behavioral sessions were controlled via custom signal conditioning hardware and MATLAB (Mathworks) software executed on the data acquisition computer.
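The interleaving constraint described above can be sketched as a simple rejection procedure. The following Python is an illustration only, since the original sessions were controlled by custom MATLAB software. The function names, the equal cue probabilities, and the reward probability `p_reward` are assumptions made for the example and are not taken from the study.

```python
import random


def make_trial_sequence(n_trials, max_run=6, p_reward=0.5, seed=None):
    """Draw (cue, rewarded) pairs, redrawing any trial that would create a
    run longer than max_run of identical stimulus and outcome.

    Cue side is chosen with equal probability and p_reward is a
    hypothetical reward probability; both are assumptions.
    """
    rng = random.Random(seed)
    trials = []
    run_length = 0
    while len(trials) < n_trials:
        trial = (rng.choice(("LE", "RE")), rng.random() < p_reward)
        if trials and trial == trials[-1]:
            if run_length >= max_run:
                continue  # would exceed the allowed run; draw again
            run_length += 1
        else:
            run_length = 1
        trials.append(trial)
    return trials


def longest_run(trials):
    """Length of the longest stretch of identical consecutive trials."""
    best = current = 0
    previous = None
    for trial in trials:
        current = current + 1 if trial == previous else 1
        previous = trial
        best = max(best, current)
    return best


if __name__ == "__main__":
    sequence = make_trial_sequence(200, seed=1)
    print(sequence[:5], longest_run(sequence))
```

Redrawing only the offending trial keeps the rest of the sequence random while guaranteeing the run limit.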

For each single unit, cue dominance to the left eye (LE) or right eye (RE) cue was determined by comparing the correlations of the mean evoked responses in the post-stimulus period (>100 ms following stimulus offset up to average trial duration) to their differences. For instance, if the correlation between LE and LE-RE was greater than that of the correlation between RE and RE-LE, the neuron was classified as having cue dominance to the LE (and vice versa if the RE value was higher). Next, the area under the ROC curve was geometrically estimated bin-by-bin (10 ms, non-overlapping) from the distributions formed by all trials of dominant (“signal”) and non-dominant (“noise”) cue presentations (Wickens, 2001). Bias-corrected 95% confidence intervals were generated from 1,000 bootstrapped data sets drawn randomly from the signal and noise distributions with replacement (Obuchowski and Lieber, 1998). In the case of sustained inhibited neurons, the ROC area and confidence intervals were inverted. Cells were considered to have significantly different modulation if the lower bound of the confidence interval exceeded chance in more than 5% of the bins in the post-stimulus period.
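The bin-by-bin ROC analysis can be sketched as follows. This is a simplified illustration under stated assumptions rather than the analysis code used here: the ROC area is computed by pairwise comparison (which equals the trapezoidal area under the empirical ROC curve), and a plain percentile bootstrap is substituted for the bias-corrected interval described above. All function names are hypothetical.

```python
import random


def roc_area(signal, noise):
    """Area under the empirical ROC curve: P(signal > noise) + 0.5 * P(tie)."""
    wins = 0.0
    for s in signal:
        for n in noise:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(signal) * len(noise))


def bootstrap_ci(signal, noise, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the ROC area (not bias-corrected)."""
    rng = random.Random(seed)
    areas = []
    for _ in range(n_boot):
        s = rng.choices(signal, k=len(signal))  # resample with replacement
        n = rng.choices(noise, k=len(noise))
        areas.append(roc_area(s, n))
    areas.sort()
    lower = areas[int(round((alpha / 2) * n_boot))]
    upper = areas[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


def fraction_significant(signal_trials, noise_trials, n_boot=1000):
    """Fraction of bins whose lower confidence bound exceeds chance (0.5).

    Each argument is a list of trials; each trial is a list of spike
    counts, one per 10 ms bin of the post-stimulus period.
    """
    n_bins = len(signal_trials[0])
    hits = 0
    for b in range(n_bins):
        sig = [trial[b] for trial in signal_trials]
        noi = [trial[b] for trial in noise_trials]
        lower, _ = bootstrap_ci(sig, noi, n_boot=n_boot, seed=b)
        if lower > 0.5:
            hits += 1
    return hits / n_bins
```

Under the criterion stated above, a unit would count as significantly modulated when `fraction_significant` returns a value greater than 0.05.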

Brain slice experiments

Slices of visual cortex from male P16–P21 C57BL/6 mice were prepared as described (Philpot et al., 2001). Coronal slices (350 μm thickness) recovered for 30 min at 32 °C in a holding chamber filled with warmed modified artificial cerebrospinal fluid (ACSF) that closely mimics ionic concentrations in physiological CSF (124 mM NaCl, 3.5 mM KCl, 1.25 mM Na2PO4, 26 mM NaHCO3, 0.8 mM MgCl2, 1 mM CaCl2, and 10 mM dextrose, saturated with 95% O2, 5% CO2) (Di Terlizzi and Platt, 2006; Sachdev et al., 1998) and then continued to recover for an additional 2 h at room temperature. All recordings were done in a submersion-type recording chamber maintained at 30 °C and perfused at a rate of 2.5 ml/min with modified ACSF. Stimulation electrodes (clustered bipolar tungsten, FHC, Bowdoin, ME) were positioned in white matter and extracellular recordings were performed in layer 5 using glass recording electrodes (≈1 MΩ) filled with ACSF. Baseline and post-conditioning responses were collected every 30 s; the stimulus intensity (30–150 μA, 0.2 ms duration) was chosen to achieve a stable persistent response duration longer than 30 ms. For conditioning, we used a Picospritzer II (General Valve Corporation) to apply brief pulses of CCh to a slice (30 ms puffs of 100 μM CCh).

Extracellular recordings made using Axopatch 200B (Axon Instruments) were amplified 1000 times, filtered between 0.3 and 3 kHz, and digitized at 10 or 30 kHz. Offline spike detection and discrimination of single unit activity was done using stochastic wavelet analysis, with spike events detected using amplitude thresholding at 3.5× or 5× the standard deviation of the noise (MATLAB (Mathworks) software by Rodrigo Quiroga, www.vis.caltech.edu/~rodri) (Quiroga et al., 2004). Spike density functions (SDF) were built from the peri-stimulus time histograms with 10 ms bins and smoothed by convolution with a Gaussian kernel (σ = 70 ms). The last ten SDFs of baseline and the first ten SDFs post-conditioning (following the “washout” period) were averaged for each neuron. The effect of conditioning was visualized at the population, as well as the individual neuron level. At the population level, the responses of all neurons in each experimental condition were averaged to create population SDFs before and after training:

$$\mathrm{sdf}_{\mathrm{population}} = \frac{\sum_{i=1}^{N} \mathrm{sdf}_{i}}{N}$$

Population SDFs were then normalized to the peak magnitude of the population SDF before training (baseline). SDFs of individual neurons were similarly normalized for visualization. To calculate each individual neuron's response duration, we first subtracted the neuron's spontaneous activity +1 standard deviation from its SDF and then defined the neuron's response duration as the time of the first zero-crossing of the resulting SDF. These neuronal response durations were then compared before and after conditioning.
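The SDF construction and response-duration measure described above can be sketched in Python using only the standard library. This is an illustration under assumptions, not the analysis code used in this study: the kernel is truncated at 4σ, the signal is zero-padded at its edges, the spontaneous mean and standard deviation are supplied by the caller (for example from a pre-stimulus window), and the duration is taken as the first downward threshold crossing after the response has risen above threshold. All function names are hypothetical.

```python
import math


def psth(spike_times_per_trial, t_stop, bin_s=0.010):
    """Trial-averaged firing rate (spikes/s) in non-overlapping bins."""
    n_bins = int(round(t_stop / bin_s))
    counts = [0.0] * n_bins
    for trial in spike_times_per_trial:
        for t in trial:
            b = math.floor(t / bin_s)
            if 0 <= b < n_bins:
                counts[b] += 1.0
    n_trials = len(spike_times_per_trial)
    return [c / (n_trials * bin_s) for c in counts]


def gaussian_smooth(rate, sigma_s=0.070, bin_s=0.010):
    """Convolve with a unit-area Gaussian kernel (truncated at 4 sigma)."""
    half = int(round(4 * sigma_s / bin_s))
    kernel = [math.exp(-0.5 * (k * bin_s / sigma_s) ** 2)
              for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    smoothed = []
    for i in range(len(rate)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(rate):  # zero padding outside the window
                acc += w * rate[idx]
        smoothed.append(acc)
    return smoothed


def population_sdf(sdfs):
    """Bin-wise mean of the SDFs of N neurons."""
    n = len(sdfs)
    return [sum(column) / n for column in zip(*sdfs)]


def normalize_to_peak(sdf, baseline_sdf):
    """Scale an SDF by the peak of the baseline (pre-training) SDF."""
    peak = max(baseline_sdf)
    return [value / peak for value in sdf]


def response_duration(sdf, spont_mean, spont_sd, bin_s=0.010):
    """Time of the first zero-crossing of sdf - (spont_mean + spont_sd).

    Returns None if the response never exceeds the threshold, and the
    full window length if it never falls back below it.
    """
    threshold = spont_mean + spont_sd
    above = False
    for i, value in enumerate(sdf):
        if value - threshold > 0:
            above = True
        elif above:
            return i * bin_s
    return len(sdf) * bin_s if above else None
```

For a neuron with a list of per-trial spike times `trials`, a call such as `response_duration(gaussian_smooth(psth(trials, 2.0)), spont_mean, spont_sd)` would give the duration in seconds, which can then be compared before and after conditioning.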

Whole-cell patch clamp recordings were done using pipettes with open tip resistances of 3–5 MΩ. For current clamp recordings the pipette internal solution contained: 20 mM KCl, 100 mM Na-gluconate, 10 mM HEPES, 4 mM MgATP, 0.3 mM Na2GTP, 7 mM phosphocreatine-Tris, 0.2% biocytin with pH adjusted to 7.2 and osmolarity adjusted to 300 mOsm; for voltage clamp the internal solution contained: 103 mM Cs-gluconate, 5 mM TEA-Cl, 2.8 mM NaCl, 20 mM HEPES, 0.3 mM Na2GTP, 4 mM MgATP, 10 mM Na2-phosphocreatine, 0.2 mM EGTA, 0.2% biocytin and 5 mM QX-314-Cl with pH adjusted to 7.2 and osmolarity adjusted to <300 mOsm with sucrose or ddH2O. Seal resistances were >1 GΩ, and pipette capacitive transients were minimized prior to breakthrough. Only cells with series resistance < 30 MΩ were included in this study. Membrane resistance and cell excitability were measured in response to a series of incremental 200 ms step current injections from −100 pA to 100 pA. Whole-cell recordings were made using Axopatch 200B (Axon Instruments). Evoked synaptic charge transfer was calculated as the integral of the postsynaptic currents over time from the moment following electrical stimulation to 1.5 s post-stimulus.
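The two whole-cell measures above reduce to short calculations, sketched here in Python. This is an illustration under assumptions, not the original analysis: the charge integral uses the trapezoidal rule on a baseline-subtracted current trace and is reported as a magnitude, and membrane resistance is estimated as the least-squares slope of steady-state voltage against injected current, which is one common approach. Function and parameter names are hypothetical.

```python
def charge_transfer_pc(current_pa, dt_s, t_start_s=0.0, t_stop_s=1.5):
    """Magnitude of the integral of current (pA) over time (s), in pC.

    current_pa is assumed to be baseline-subtracted and sampled every
    dt_s seconds, with index 0 at the moment following stimulation.
    """
    i0 = int(round(t_start_s / dt_s))
    i1 = int(round(t_stop_s / dt_s))
    segment = current_pa[i0:i1 + 1]
    area = sum(0.5 * (a + b) * dt_s for a, b in zip(segment, segment[1:]))
    return abs(area)  # pA * s = pC


def input_resistance_mohm(currents_pa, voltages_mv):
    """Least-squares slope of V (mV) against I (pA), converted to megaohms."""
    n = len(currents_pa)
    mean_i = sum(currents_pa) / n
    mean_v = sum(voltages_mv) / n
    sxy = sum((i - mean_i) * (v - mean_v)
              for i, v in zip(currents_pa, voltages_mv))
    sxx = sum((i - mean_i) ** 2 for i in currents_pa)
    return 1000.0 * sxy / sxx  # mV/pA is gigaohms; x1000 gives megaohms


if __name__ == "__main__":
    steps_pa = [-100, -50, 0, 50, 100]
    steady_mv = [-90, -80, -70, -60, -50]  # made-up values for illustration
    print(input_resistance_mohm(steps_pa, steady_mv))
```

The voltages in the usage example are invented solely to exercise the function and do not correspond to any recorded cell.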

To prevent experimental bias, all experiments were done in pairs of protocol conditions where the type of experiment was determined by a coin flip. The data are reported as means ± SEM. To determine the statistical significance of the difference in response durations before and after conditioning, we used a paired Student's t-test. To compare the difference in response durations post-training for different conditioning intervals and for the two-pathway experiments, we used the Kolmogorov-Smirnov test. All in vitro analyses were done using pClamp (Axon Instruments) and custom-written MATLAB and Python scripts.
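The summary statistics and test statistics named above can be computed as in the following standard-library Python sketch. It is illustrative only and is not the original analysis code: it returns the paired t statistic with its degrees of freedom and the two-sample Kolmogorov-Smirnov distance, but not P values, which would normally be obtained from a statistics package. Function names are hypothetical.

```python
import math
import statistics
from bisect import bisect_right


def mean_sem(values):
    """Mean and standard error of the mean."""
    return (statistics.mean(values),
            statistics.stdev(values) / math.sqrt(len(values)))


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    mean, sem = mean_sem(diffs)
    return mean / sem, len(diffs) - 1


def ks_statistic(x, y):
    """Largest vertical distance between two empirical cumulative
    distributions (the two-sample Kolmogorov-Smirnov statistic)."""
    xs, ys = sorted(x), sorted(y)
    distance = 0.0
    for value in xs + ys:
        fx = bisect_right(xs, value) / len(xs)
        fy = bisect_right(ys, value) / len(ys)
        distance = max(distance, abs(fx - fy))
    return distance


if __name__ == "__main__":
    # Made-up durations (ms) purely to exercise the functions.
    baseline = [400.0, 450.0, 380.0, 500.0]
    post = [900.0, 1100.0, 950.0, 1200.0]
    print(paired_t(baseline, post), ks_statistic(baseline, post))
```

The numbers in the usage example are invented and do not reproduce any value reported in this paper.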

Highlights

  • V1 neurons report stimulus-reward intervals after operant conditioning.

  • Cholinergic input is required to learn, but not express, reward timing in V1.

  • Response duration plasticity is induced in V1 slices by timed delivery of carbachol.

  • V1 contains the circuitry and mechanisms required for learning reward timing.

Acknowledgements

We thank Harel Shouval, Jeff Gavornik, and Arnold Heynen for insightful comments and critiques. We are grateful to Jill McGaughy, Weidong Hu, and Peter Holland for sharing their histological expertise. We thank Tanya Marton, Vijay M. K. Namboodiri, and Camila Zold for helpful discussions of the experimental design. We also thank Hannah Joo and Shelby Hecht for their assistance with data collection, electrode drive assembly, and histology, and Kathleen Oram, Erik Sklar, and Suzanne Meagher for technical and administrative support. This work was funded by NIMH (R01 MH084911) and start-up funds provided by the Johns Hopkins University to M.G.H.S, NIDA (F31 DA026687) to E.B.R., and grants from NEI (R01 EYO12309) and NICHD (R01 HD046943) to M.F.B.

References

  1. Bear MF, Singer W. Modulation of visual cortical plasticity by acetylcholine and noradrenaline. Nature. 1986;320:172–176. doi: 10.1038/320172a0.
  2. Bekisz M, Garkun Y, Wabno J, Hess G, Wrobel A, Kossut M. Increased excitability of cortical neurons induced by associative learning: an ex vivo study. The European journal of neuroscience. 2010;32:1715–1725. doi: 10.1111/j.1460-9568.2010.07453.x.
  3. Breton J-D, Stuart GJ. Loss of sensory input increases the intrinsic excitability of layer 5 pyramidal neurons in rat barrel cortex. The Journal of physiology. 2009;587:5107–5119. doi: 10.1113/jphysiol.2009.180943.
  4. Brocher S, Artola A, Singer W. Agonists of cholinergic and noradrenergic receptors facilitate synergistically the induction of long-term potentiation in slices of rat visual cortex. Brain Res. 1992;573:27–36. doi: 10.1016/0006-8993(92)90110-u.
  5. Budd JM. Extrastriate feedback to primary visual cortex in primates: a quantitative analysis of connectivity. Proceedings. Biological sciences / The Royal Society. 1998;265:1037–1044. doi: 10.1098/rspb.1998.0396.
  6. Carey RG, Rieck RW. Topographic projections to the visual cortex from the basal forebrain in the rat. Brain Res. 1987;424:205–215. doi: 10.1016/0006-8993(87)91463-6.
  7. Cho KH, Jang HJ, Jo YH, Singer W, Rhie DJ. Cholinergic induction of input-specific late-phase LTP via localized Ca2+ release in the visual cortex. J Neurosci. 2012;32:4520–4530. doi: 10.1523/JNEUROSCI.4577-11.2012.
  8. Conner JM, Culberson A, Packowski C, Chiba AA, Tuszynski MH. Lesions of the Basal forebrain cholinergic system impair task acquisition and abolish cortical plasticity associated with motor skill learning. Neuron. 2003;38:819–829. doi: 10.1016/s0896-6273(03)00288-5.
  9. Dayan P, Niv Y. Reinforcement learning: the good, the bad and the ugly. Current opinion in neurobiology. 2008;18:185–196. doi: 10.1016/j.conb.2008.08.003.
  10. Di Terlizzi R, Platt S. The function, composition and analysis of cerebrospinal fluid in companion animals: part I - function and composition. Vet J. 2006;172:422–431. doi: 10.1016/j.tvjl.2005.07.021.
  11. Doya K. Metalearning and neuromodulation. Neural networks : the official journal of the International Neural Network Society. 2002;15:495–506. doi: 10.1016/s0893-6080(02)00044-8.
  12. Egorov AV, Hamam BN, Fransen E, Hasselmo ME, Alonso AA. Graded persistent activity in entorhinal cortex neurons. Nature. 2002;420:173–178. doi: 10.1038/nature01171.
  13. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex (New York, N.Y. : 1991) 1991;1:1–47. doi: 10.1093/cercor/1.1.1-a.
  14. Fransen E, Tahvildari B, Egorov AV, Hasselmo ME, Alonso AA. Mechanism of graded persistent cellular activity of entorhinal cortex layer v neurons. Neuron. 2006;49:735–746. doi: 10.1016/j.neuron.2006.01.036.
  15. Froemke RC, Merzenich MM, Schreiner CE. A synaptic memory trace for cortical receptive field plasticity. Nature. 2007;450:425–429. doi: 10.1038/nature06289.
  16. Gavornik JP, Shouval HZ. A network of spiking neurons that can represent interval timing: mean field analysis. J Comput Neurosci. 2011;30:501–513. doi: 10.1007/s10827-010-0275-y.
  17. Gavornik JP, Shuler MG, Loewenstein Y, Bear MF, Shouval HZ. Learning reward timing in cortex through reward dependent expression of synaptic plasticity. Proc Natl Acad Sci U S A. 2009;106:6826–6831. doi: 10.1073/pnas.0901835106.
  18. Genovesio A, Tsujimoto S, Wise SP. Feature- and order-based timing representations in the frontal cortex. Neuron. 2009;63:254–266. doi: 10.1016/j.neuron.2009.06.018.
  19. Gu Q. Contribution of acetylcholine to visual cortex plasticity. Neurobiol Learn Mem. 2003;80:291–301. doi: 10.1016/s1074-7427(03)00073-x. [DOI] [PubMed] [Google Scholar]
  20. Gu Q, Singer W. Effects of intracortical infusion of anticholinergic drugs on neuronal plasticity in kitten striate cortex. Eur J Neurosci. 1993;5:475–485. doi: 10.1111/j.1460-9568.1993.tb00514.x. [DOI] [PubMed] [Google Scholar]
  21. Gu Z, Yakel JL. Timing-dependent septal cholinergic induction of dynamic hippocampal synaptic plasticity. Neuron. 2011;71:155–165. doi: 10.1016/j.neuron.2011.04.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Gulledge AT, Bucci DJ, Zhang SS, Matsui M, Yeh HH. M1 receptors mediate cholinergic modulation of excitability in neocortical pyramidal neurons. J Neurosci. 2009;29:9888–9902. doi: 10.1523/JNEUROSCI.1366-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Holley LA, Wiley RG, Lappi DA, Sarter M. Cortical cholinergic deafferentation following the intracortical infusion of 192 IgG-saporin: a quantitative histochemical study. Brain research. 1994;663:277–286. doi: 10.1016/0006-8993(94)91274-2. [DOI] [PubMed] [Google Scholar]
  24. Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of physiology. 1962;160:106–154. doi: 10.1113/jphysiol.1962.sp006837. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Izhikevich EM. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral cortex (New York, N.Y. : 1991) 2007;17:2443–2452. doi: 10.1093/cercor/bhl152. [DOI] [PubMed] [Google Scholar]
  26. Johnson HA, Goel A, Buonomano DV. Neural dynamics of in vitro cortical networks reflects experienced temporal patterns. Nat Neurosci. 2010;13:917–919. doi: 10.1038/nn.2579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Kilgard MP, Merzenich MM. Cortical map reorganization enabled by nucleus basalis activity. Science. 1998;279:1714–1718. doi: 10.1126/science.279.5357.1714. [DOI] [PubMed] [Google Scholar]
  28. Kirkwood A, Rozas C, Kirkwood J, Perez F, Bear MF. Modulation of long-term synaptic depression in visual cortex by acetylcholine and norepinephrine. The Journal of neuroscience : the official journal of the Society for Neuroscience. 1999;19:1599–1609. doi: 10.1523/JNEUROSCI.19-05-01599.1999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Leon MI, Shadlen MN. Representation of time by neurons in the posterior parietal cortex of the macaque. Neuron. 2003;38:317–327. doi: 10.1016/s0896-6273(03)00185-5. [DOI] [PubMed] [Google Scholar]
  30. Lin S-C, Nicolelis MAL. Neuronal ensemble bursting in the basal forebrain encodes salience irrespective of valence. Neuron. 2008;59:138–149. doi: 10.1016/j.neuron.2008.04.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Lysakowski A, Wainer BH, Bruce G. An atlas of the regional and laminar distribution of choline acetyltransferase immunoreactivity in rat cerebral cortex. Neuroscience. 1989;28:291–336. doi: 10.1016/0306-4522(89)90180-2. [DOI] [PubMed] [Google Scholar]
  32. Maravall M, Stern EA, Svoboda K. Development of intrinsic properties and excitability of layer 2/3 pyramidal neurons during a critical period for sensory maps in rat barrel cortex. Journal of neurophysiology. 2004;92:144–156. doi: 10.1152/jn.00598.2003. [DOI] [PubMed] [Google Scholar]
  33. Mechawar N, Cozzari C, Descarries L. Cholinergic innervation in adult rat cerebral cortex: a quantitative immunocytochemical description. The Journal of comparative neurology. 2000;428:305–318. doi: 10.1002/1096-9861(20001211)428:2<305::aid-cne9>3.0.co;2-y. [DOI] [PubMed] [Google Scholar]
  34. Mita A, Mushiake H, Shima K, Matsuzaka Y, Tanji J. Interval time coding by neurons in the presupplementary and supplementary motor areas. Nature neuroscience. 2009;12:502–507. doi: 10.1038/nn.2272. [DOI] [PubMed] [Google Scholar]
  35. Myers RD. Injection of solutions into cerebral tissue: Relation between volume and diffusion. Physiology & Behavior. 1966;1:171–IN179. [Google Scholar]
  36. Nataraj K, Le Roux N, Nahmani M, Lefort S, Turrigiano G. Visual deprivation suppresses L5 pyramidal neuron excitability by preventing the induction of intrinsic plasticity. Neuron. 2010;68:750–762. doi: 10.1016/j.neuron.2010.09.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Obuchowski NA, Lieber ML. Confidence intervals for the receiver operating characteristic area in studies with small samples. Academic radiology. 1998;5:561–571. doi: 10.1016/s1076-6332(98)80208-0. [DOI] [PubMed] [Google Scholar]
  38. Origlia N, Kuczewski N, Pesavento E, Aztiria E, Domenici L. The role of cholinergic system in neuronal plasticity: focus on visual cortex and muscarinic receptors. Archives italiennes de biologie. 2008;146:165–188. [PubMed] [Google Scholar]
  39. Pawlak V, Wickens JR, Kirkwood A, Kerr JN. Timing is not Everything: Neuromodulation Opens the STDP Gate. Front Synaptic Neurosci. 2010;2:146. doi: 10.3389/fnsyn.2010.00146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Pennartz CM. The ascending neuromodulatory systems in learning by reinforcement: comparing computational conjectures with experimental findings. Brain research. Brain research reviews. 1995;21:219–245. doi: 10.1016/0165-0173(95)00014-3. [DOI] [PubMed] [Google Scholar]
  41. Philpot BD, Sekhar AK, Shouval HZ, Bear MF. Visual experience and deprivation bidirectionally modify the composition and function of NMDA receptors in visual cortex. Neuron. 2001;29:157–169. doi: 10.1016/s0896-6273(01)00187-8. [DOI] [PubMed] [Google Scholar]
  42. Quiroga RQ, Nadasdy Z, Ben-Shaul Y. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Comput. 2004;16:1661–1687. doi: 10.1162/089976604774201631. [DOI] [PubMed] [Google Scholar]
  43. Rahman J, Berger T. Persistent activity in layer 5 pyramidal neurons following cholinergic activation of mouse primary cortices. Eur J Neurosci. 2011;34:22–30. doi: 10.1111/j.1460-9568.2011.07736.x. [DOI] [PubMed] [Google Scholar]
  44. Rye DB, Wainer BH, Mesulam MM, Mufson EJ, Saper CB. Cortical projections arising from the basal forebrain: a study of cholinergic and noncholinergic components employing combined retrograde tracing and immunohistochemical localization of choline acetyltransferase. Neuroscience. 1984;13:627–643. doi: 10.1016/0306-4522(84)90083-6. [DOI] [PubMed] [Google Scholar]
  45. Saar D, Barkai E. Long-lasting maintenance of learning-induced enhanced neuronal excitability: mechanisms and functional significance. Molecular neurobiology. 2009;39:171–177. doi: 10.1007/s12035-009-8060-5. [DOI] [PubMed] [Google Scholar]
  46. Sachdev RNS, Lu S-M, Wiley RG, Ebner FF. Role of the Basal Forebrain Cholinergic Projection in Somatosensory Cortical Plasticity. J Neurophysiol. 1998;79:3216–3228. doi: 10.1152/jn.1998.79.6.3216. [DOI] [PubMed] [Google Scholar]
  47. Sanchez-Vives MV, McCormick DA. Cellular and network mechanisms of rhythmic recurrent activity in neocortex. Nat Neurosci. 2000;3:1027–1034. doi: 10.1038/79848. [DOI] [PubMed] [Google Scholar]
  48. Santos-Benitez H, Magariños-Ascone CM, Garcia-Austt E. Nucleus basalis of Meynert cell responses in awake monkeys. Brain research bulletin. 1995;37:507–511. doi: 10.1016/0361-9230(95)00033-b. [DOI] [PubMed] [Google Scholar]
  49. Shouval HZ, Gavornik JP. A single spiking neuron that can represent interval timing: analysis, plasticity and multi-stability. J Comput Neurosci. 2011;30:489–499. doi: 10.1007/s10827-010-0273-0. [DOI] [PubMed] [Google Scholar]
  50. Shuler MG, Bear MF. Reward timing in the primary visual cortex. Science (New York, N.Y.) 2006;311:1606–1609. doi: 10.1126/science.1123513. [DOI] [PubMed] [Google Scholar]
  51. Sutton RS, Barto AG. Reinforcement learning. MIT Press; Cambridge, MA: 1998. [Google Scholar]
  52. Tago H, Kimura H, Maeda T. Visualization of detailed acetylcholinesterase fiber and neuron staining in rat brain by a sensitive histochemical procedure. Journal of Histochemistry & Cytochemistry. 1986;34:1431–1438. doi: 10.1177/34.11.2430009. [DOI] [PubMed] [Google Scholar]
  53. Waite JJ, Wardlow ML, Chen AC, Lappi DA, Wiley RG, Thal LJ. Time course of cholinergic and monoaminergic changes in rat brain after immunolesioning with 192 IgG-saporin. Neuroscience Letters. 1994;169:154–158. doi: 10.1016/0304-3940(94)90379-4. [DOI] [PubMed] [Google Scholar]
  54. Weinberger NM. The nucleus basalis and memory codes: Auditory cortical plasticity and the induction of specific, associative behavioral memory. Neurobiology of Learning and Memory. 2003;80:268–284. doi: 10.1016/s1074-7427(03)00072-8. [DOI] [PubMed] [Google Scholar]
  55. Wickens TD. Elementary Signal Detection Theory. Oxford University Press; USA: 2001. [Google Scholar]
  56. Wiley RG, Oeltmann TN, Lappi DA. Immunolesioning: selective destruction of neurons using immunotoxin to rat NGF receptor. Brain research. 1991;562:149–153. doi: 10.1016/0006-8993(91)91199-b. [DOI] [PubMed] [Google Scholar]
  57. Wilson FA, Rolls ET. Neuronal responses related to reinforcement in the primate basal forebrain. Brain research. 1990;509:213–231. doi: 10.1016/0006-8993(90)90546-n. [DOI] [PubMed] [Google Scholar]
  58. Woolf NJ. Global and serial neurons form a hierarchically arranged interface proposed to underlie memory and cognition. Neuroscience. 1996;74:625–651. doi: 10.1016/0306-4522(96)00163-7. [DOI] [PubMed] [Google Scholar]
  59. Zilles K. The Cortex of the Rat: A Stereotaxic Atlas. Springer-Verlag; 1985. [Google Scholar]
