SUMMARY
Appropriate choices about delayed rewards are fundamental to the survival of animals. While animals tend to prefer immediate reward, delaying gratification is often advantageous. Dorsal raphe (DR) serotonergic neurons have long been implicated in the processing of delayed reward, but it has been unclear whether or when their activity causally directs choice. Here we transiently augmented or reduced the activity of DR serotonergic neurons while mice decided between differently delayed rewards in a novel odor-guided intertemporal choice task. We found that these manipulations, precisely targeted at the decision point, were sufficient to bidirectionally influence impulsive choice. The manipulations specifically affected choices with a more difficult trade-off. Similar effects were observed when we manipulated the serotonergic projections to the nucleus accumbens (NAc). We propose that DR serotonergic neurons preempt reward delays at the decision point and play a critical role in suppressing impulsive choice by regulating decision trade-off.
Keywords: Serotonin, raphe, intertemporal choice, decision trade-off, impulsivity, accumbens, delay discounting
INTRODUCTION
Intertemporal choice is a decision between future outcomes that are differently delayed. Impulsive choice, the preference for more immediate but smaller rewards, is associated with a number of maladaptive behaviors [1–4] and psychiatric disorders [5–8].
Serotonin has been well studied in intertemporal choice paradigms with lesion and pharmacological methods, and is thought to promote the choice for larger, more delayed reward [9–11]. Due to the low temporal resolution of such techniques, it has been difficult to pinpoint how serotonin performs this function. We reasoned that since many intertemporal decisions are carried out before action or outcome materializes, and are often prompted by conditioned stimuli (CS), neuronal activity causally driving decision-making should occur at cue. Indeed, recordings of optically tagged DR serotonergic neurons in a Pavlovian task showed that some of them fired phasically to reward-predicting cues [12]. We therefore hypothesized that this phasic activity of serotonergic neurons could be used to direct choice at the decision point with the presentation of a CS.
To test our hypothesis, we needed a behavioral task that isolated a clearly defined decision point [13]. To this end, we devised a novel odor-guided intertemporal choice (OGIC) task that randomizes reward delay contingencies trial by trial, in the fashion of existing odor-based rodent decision tasks [14–16]. Randomizing reward contingencies ensures that a decision has to be made on every trial, and the use of odor cues allows the isolation of a decision period for temporally precise neural manipulations. We then systematically tested mice on decisions between a large reward and a small one that were differently delayed, while optogenetically activating or suppressing DR serotonergic neurons during the decision period in a random subset of trials. Our findings suggest that serotonergic neuronal activity suppresses impulsive choice at the decision point under trade-off conditions, possibly via action in the nucleus accumbens (NAc), another structure implicated in impulsive choice [17–20]. We propose that serotonergic neurons play a crucial role in resolving the trade-off between reward delay and size under decision conflict.
RESULTS
Mice Can Use Odor Cues to Choose Sooner or Larger Rewards
We conducted the task in a custom-designed operant chamber with a center odor port and two side reward ports. Subjects initiated each trial by nosepoking into the odor port and sampled a mixture of two odors. After cue sampling, mice could respond by choosing to nosepoke in either reward port (Figure 1A). After a reward delay, a water reward was delivered in the chosen side port. A predetermined trial duration was set to the same length regardless of the chosen reward option so that subjects could not perform more trials over a given period of time by choosing the less delayed reward. All possible combinations of left and right reward delays were offered pseudorandomly in an interleaved fashion within each session. Sert-Cre mice [21] were water-restricted and shaped to associate the intensity of odor 1 with left reward delay and that of odor 2 with right reward delay (Figure 1B). The performance of an example subject after training is shown in Figure 1C. During training, both left reward size (SL) and right reward size (SR) were 1 drop (SL1 sessions), and after training the left reward size became 2 drops while the right reward size remained at 1 (SL2 sessions). Motor aspects of response to the two odors are shown in Figure S1. Subjects chose the left reward more often during SL2 sessions (Figure 1F, paired t-test, t9= 6.090, p<0.001). Psychometric curves of a representative group of subjects show the proportion of left choice as a function of left and right reward delays (10 mice, 64 SL1 sessions, 10882 trials; 79 SL2 sessions, 10269 trials). We fitted the proportion of left choice with a multiple regression model to express left choice as a function of left and right reward delays offered. We reasoned that since the goal of the task was to compare reward delays to make a choice, it was very likely that the effects of the two reward delays depended on each other, and therefore, we included an interaction term between the left and right reward delays. 
This addition also improved the fit of the model (see Methods and Table S1 for model selection). Our model indeed had a significant interaction term on the population level (Wilcoxon signed-rank test, p<0.01 for SL1 sessions, and p<0.01 for SL2 sessions). Between SL1 and SL2 sessions, the only model coefficient that changed significantly was that of the interaction between left and right delay, β_{Left Delay × Right Delay^−1} (Figure 1J, Wilcoxon signed-rank test, p<0.01), indicating reduced dependency between the effects of the two reward delays.
Optogenetic Manipulations of DR Serotonergic Neurons Bidirectionally Regulated Impulsive Choice
We manipulated DR serotonergic neurons at the decision point in the OGIC task to investigate the role of transient serotonergic activity in intertemporal choice. In all manipulation experiments, we tested a smaller range of right reward delays (0, 2 and 8 seconds). The left reward was always twice as large as the right reward (SL=2×SR).
First, we hypothesized that transiently silencing serotonergic neurons would encourage impulsive choice. SERT-Cre mice trained on the OGIC task as described were injected in the DR nucleus with AAV9-ef1α-DIO-eArch3.0-eYFP (DR-Arch, Figure 2A and C), or a control virus containing only eYFP (DR-Ctrl), and implanted with a single optical fiber over DR. Light was delivered on an interleaved subset of trials. A constant pulse of green light (532 nm, 5 mW) was triggered by odor port entry and terminated at reward port entry (Figure 2B) to ensure that light delivery spanned the entire epoch in which a decision could be made or changed. Similar light delivery was effective in suppressing spontaneous firing in DR in an in vivo anesthetized preparation (Figure S2A and B). Green light reduced choice of the larger reward in the DR-Arch but not the DR-Ctrl group (Figure 2K, two-way ANOVA with repeated measures, significant interaction between group and light, F1, 19 = 11.72, p<0.01; n=10 DR-Arch, 10 DR-Ctrl). The proportions of left choice for the DR-Arch and DR-Ctrl groups under green light manipulation were fitted with the choice model described in the previous section, with an additional term for light treatment, as shown in Figure 2D, E, I and J. We found a significant positive interaction between green light and the delay interaction term (Figure 2M, Wilcoxon signed-rank test, p<0.05). Green light did not interact significantly with any other model parameter (Table S2, Wilcoxon signed-rank test, p>0.05). We conclude that green light specifically increased the interaction between left and right reward delays in favor of choosing the smaller reward on the right.
Next we hypothesized that augmenting serotonergic activity during the same time point might have the opposite effect. We injected similarly trained and implanted SERT-Cre mice with AAVrh8-CBA-DIO-ChR2-eYFP virus (DR-ChR2, Figure 2F and H). A transient bout of blue light was able to reliably trigger spiking in the ChR2-expressing serotonergic neurons (Figure S2C–E). In the task, blue light increased preference for the larger reward in DR-ChR2 group but not DR-Ctrl group (Figure 2L, two-way ANOVA with repeated measures, significant interaction between group and light, F1, 19 = 29.95, p <0.01; n=10 DR-ChR2, 10 DR-Ctrl). There was a significantly negative interaction between blue light and specifically the delay interaction term (Figure 2M, Wilcoxon signed-rank test, p<0.05) in favor of choosing the larger reward. Blue light did not have significant interaction with any other model parameters (Table S2, Wilcoxon signed-rank test, p>0.05).
Taken together, these results indicate that manipulating the predictive activity of DR serotonergic neurons bidirectionally regulated impulsive choice by altering how the effects of both reward delays on offer affected each other.
Nucleus Accumbens is a Site of Action for Serotonergic Neurons in Regulating Impulsive Choice
NAc is a region with extensive neuromodulatory innervation and interaction [20,22,23]. Serotonergic axons have been shown to innervate both the core and the shell regions of NAc [24]. As has been previously reported [25], we observed eYFP-containing projections in NAc, more prominently in the shell region, in SERT-Cre mice expressing the ChR2-eYFP construct in DR (Figure 3A and B).
To investigate the downstream target of DR neurons responsible for the effects we observed, we manipulated opsin-expressing serotonergic axons in NAc. Sert-Cre mice trained on the OGIC task were injected with AAVs carrying either a ChR2-eYFP construct (DR-NAC-ChR2), an eArch3.0-eYFP construct (DR-NAC-Arch), or a control eYFP construct (DR-NAC-Ctrl), and implanted bilaterally with optical fibers over NAc.
Green light reduced the proportion of left choice in the DR-NAC-Arch group but not in the control group (Figure 3E, two-way ANOVA with repeated measures, significant interaction between group and light, F1, 19 = 13.19, p<0.01; n=10 DR-NAC-Arch, 10 DR-NAC-Ctrl). Blue light increased this preference in the DR-NAC-ChR2 group but not in the control group (Figure 3F, two-way ANOVA with repeated measures, significant interaction between group and light, F1, 19 = 15.17, p<0.01; n=10 DR-NAC-ChR2, 10 DR-NAC-Ctrl). Similar to what we observed in the DR cell body experiments, green light in the DR-NAC-Arch group had a positive interaction with the delay interaction term (Figure 3G, Wilcoxon signed-rank test, p<0.05), whereas blue light in the DR-NAC-ChR2 group had a negative interaction with the delay interaction term (Figure 3G, Wilcoxon signed-rank test, p<0.01). Light did not interact significantly with any other model parameter (Table S2, Wilcoxon signed-rank test, p>0.05). These results suggest that NAc likely acts as a target site mediating the action of serotonergic neurons in suppressing impulsive choice, by altering how strongly the effects of the two reward delays on offer depend on each other.
DR Serotonergic Manipulation Effects Were Modulated by the Degree of Trade-off
We consistently found that optogenetic manipulations of serotonergic activity, either at the cell bodies or at their NAc-projecting terminals, bidirectionally altered the interaction between the effects of left and right reward delays. In the OGIC task, difficulty arising from having to discriminate between odor pairs (perceptual difficulty, see Methods) covaried with the dilemma in choosing between two reward delays (choice difficulty, see Methods). Therefore, we could not immediately exclude the possibility that light manipulations were affecting the subjects’ discrimination between similarly concentrated odors. We found, though, that light did not alter the effects of the left delay or right delay alone, indicating that the subjects could respond normally to each odor by itself. Since interaction terms are difficult to interpret, we visualized the manipulation effects predicted by our choice model as heatmaps (Figure 4A–D). We found that the valleys of the Arch manipulation effects and the peaks of the ChR2 manipulation effects did not coincide with the perceptual boundary, where left and right reward delays are equal, but were skewed in the direction of longer left reward delay. Since the left reward was larger, the subjects were willing to tolerate longer delays for it. Therefore, the region where we saw the greatest manipulation effects was the region with the more difficult decision trade-off. Given the model prediction, we looked in our raw choice data for a relationship between choice difficulty and the light effect (Figure 4E–H). To quantify the relationship, we expressed light effect as a function of both choice difficulty and perceptual difficulty (see Methods). 
The coefficient for choice difficulty, after any effect of perceptual difficulty was accounted for, was still significantly lower than zero for the DR-Arch and NAC-Arch groups (Figure 4J and L, Wilcoxon signed-rank test, p<0.05) and significantly higher than zero for the DR-ChR2 and NAC-ChR2 groups (Figure 4J and L, Wilcoxon signed-rank test, p<0.05 and p<0.01 respectively). Choice difficulty did not affect light effects in the control groups (Figure 4J and L, Wilcoxon signed-rank test, p>0.05).
Taken together, we show that optogenetic manipulations of serotonergic neurons and their NAc-projecting axons are most effective in altering the subjects’ choice when the trade-off was most difficult.
DISCUSSION
Serotonergic Activity at the Decision Point Modulates Impulsive Choice
We report here that transient inhibition of serotonergic activity at the decision point promotes impulsive choice, whereas transient activation has the opposite effect.
We demonstrated a role for serotonergic activity at the decision point in affecting choice. Previous work has shown that serotonergic activation during waiting for reward increased waiting [26,27], suggesting that serotonin signals the availability of delayed reward and increases persistence in waiting for that reward. In contrast to the subjects in a waiting paradigm, the subjects in the present experiments had not already been waiting at the onset of activation or inhibition, and therefore the effects we observed were not simply due to extending an existing behavioral state. We showed that manipulations of serotonergic neurons are capable of recruiting patient behavior programs, such as a patient choice in our experiments, before waiting even begins. Our results therefore demonstrate an additional and distinct mechanism by which serotonin suppresses impulsive choice, and they buttress the theory that serotonin supports patience. We did not observe altered behavioral inhibition, since the subjects did not spend more time sampling in the odor port or traveling to the reward ports during light stimulation (Figure S2F and G). Our results provide strong evidence that serotonergic neurons support patience primarily through suppressing impulsive choice, as opposed to impulsive action [28].
Serotonergic neurons have also been suggested to encode unconditioned rewarding stimuli in naïve mice [25,29]. After training, serotonergic neurons have been shown to fire phasically at CS in proportion to the value predicted by the CS [12,30–33] in Pavlovian tasks. Our results provided causal evidence that this reward value-predictive signal could be used to direct behavior. The manipulation effects in the present experiments are specific to the reward delay interaction term (Figure 2M and 3G), which was also the term that was different between SL1 and SL2 sessions in baseline behavior (Figure 1J). It is possible that serotonergic neuronal suppression caused the subjects to undervalue the left reward size and activation caused them to overvalue it. This bidirectional shift in preferences was only captured because our task constituted only free choices with equally valid responses on both sides. From this perspective, serotonergic neurons could act as a decision variable for controlling intertemporal preferences.
Serotonergic Neurons Are Engaged in Difficult Delay-size Trade-off
We observed the greatest effect of serotonergic manipulations when the subjects had to wait longer for the larger reward, but not vastly longer (Figure 4). We propose that serotonergic neuronal control of choice is specific to difficult trade-off conditions. This idea may explain the seemingly inconsistent results of serotonergic manipulations in the rodent delay discounting literature. Chemical lesions of the serotonergic system have been shown to increase impulsivity in some delay discounting experiments [10, 34], but not in others [35]. Mobini et al. and Wogar et al. used an adjusting-delay procedure [36], whereas Winstanley et al. varied delay contingencies systematically. Since adjusting-delay procedures titrate toward the indifference delay for the larger reward, they sample preference near the indifference delay, i.e. under conditions with greater delay-size trade-off. On the other hand, when delays are systematically varied, the region of greatest trade-off may contribute only narrowly to the sampled data sets. In our study, we may have identified conditions for which serotonergic neurons are important, and those for which they are not, because we exhaustively sampled large sets of delay combinations in fine grain, which allowed us to distinguish high trade-off from low trade-off conditions.
Our proposal for serotonin function in trade-off emphasizes the integrative nature of serotonin encoding. To process trade-off, a neural structure must have information about both reward and cost. Others have proposed similar theories for serotonin’s role in combining positive and negative valences. For example, serotonin was proposed to encode “beneficialness” [37], i.e. reward magnitude multiplied by reward probability, minus cost. Given the diverse observations of serotonin effects in both reward and punishment processing, the more integrative theories may have greater explanatory power. Of course, positive and negative valences are not necessarily integrated in parallel as in the case of positive and negative outcomes of a choice. A possible alternative interpretation of our data is that the need for trade-off constitutes a precondition for serotonergic neurons to be engaged. Decision conflict between wanting to choose a larger reward and a sooner one could induce anxiety and lead to impulsive behavior. Interestingly, it has been shown that optogenetic activation was most effective in averting wait errors during the late phase of waiting, when mice were more likely to give up [26]. It is possible that serotonergic neurons are activated in response to this negative mood to maintain a representation of positive predictions of future reward, thereby resolving decision conflicts appropriately. Testing these theories will require a combination of manipulation and monitoring of neuronal activity during behavioral paradigms explicitly designed to probe the interaction between positive and negative valences.
DR-NAc Circuits Can Mediate Serotonergic Control of Intertemporal Choice
We found that augmenting and inhibiting opsin-expressing serotonergic axonal projections in NAc mimicked the effects of manipulating DR serotonergic cell bodies. NAc has been found to be involved in choice impulsivity [17], and the dense and diverse neuromodulatory input to the region is thought to contribute [20]. More specifically, ablation of the core region of NAc (NAcC), but not of the shell region (NAcSh), has been shown to reduce responding for the larger reward in delay discounting tasks [18,19]. We observed more prominent opsin-expressing serotonergic projections in NAcSh, and to a smaller extent in NAcC. Due to the difficulty of restricting the spread of light in optogenetic manipulations, we cannot definitively attribute the effects of DR-NAc manipulations to either sub-region, although it is more likely that DR-NAcSh projections mediate the observed effects. The latter possibility appears inconsistent with the prevailing belief that NAcC is the major component responsible for processing delayed reward. The differences between chronic and acute neuronal manipulations are often discussed, especially in the era of optogenetics [38]. It is possible that NAcC is the main component processing delayed reward, but NAcSh is a site of dynamic neuromodulatory control upstream of NAcC. A future goal would be to elucidate the functional relationship between the two structures.
We propose that NAc is a projection target that mediates the action of serotonergic neurons in guiding intertemporal choice. Since we cannot rule out the possibility that terminal stimulation caused antidromic activation of DR cell bodies, and since serotonergic neurons project widely, it is possible that other areas could be affected by stimulation of DR projections in NAc. Further work is required to clarify the relationship between DR serotonergic neurons and their target areas.
Neurochemical Considerations
Optogenetic activation of serotonergic neurons has been well documented to cause serotonin release in projection targets [26,39]. Serotonergic neurons have also been reported to signal reward via the co-release of both serotonin and glutamate [25,29,40]. It is possible that some of the effects we observed are mediated by glutamate transmission. Although we did not observe rewarding effects of optogenetic stimulation of DR, we cannot exclude this possibility. We therefore draw our conclusions about serotonergic neurons as labeled by the genetic marker Sert.
In conclusion, we developed a novel odor-guided choice task that combines genetic tools of great temporal and cellular specificity with well-controlled psychophysical characterization of intertemporal decision-making. We show that transient manipulations of DR serotonergic activity at the decision point causally regulated impulsive choice during difficult delay-size trade-off. We propose that predictive reward-value encoding signals in serotonergic neurons underlie appropriate choice between differently delayed reward options under decision trade-off.
Star Methods
Experimental Model and Subject Details
Subjects were male adult mice heterozygous for the transgene SERT-Cre (Tg(Slc6a4-cre)ET33Gsat; Gensat) in the C57BL/6J background. They were singly housed and water-restricted in accordance with guidelines from the Committee on Animal Care (CAC) at the Massachusetts Institute of Technology. Behavior experiments were carried out during the dark phase of the 12 hour dark/12 hour light cycle.
Methods Details
Viral Construction
The AAV-CBA-DIO-ChR2-eYFP plasmid was constructed by inserting the ChR2-eYFP gene fragment, which was obtained from a template, pLenti-CaMKIIa-ChR2-eYFP (courtesy of Dr. Karl Deisseroth at Stanford University [40]), into a linearized and modified AAV vector containing the chicken β-actin promoter, using the double-floxed inverted construct strategy. Restriction digests were made according to standard protocol, and ligations were made using Takara DNA ligation kit version 2.1. The construct was amplified using EndoFree Plasmid QIAGEN maxi prep kit. Recombinant AAV vectors were serotyped with AAVrh8 coat proteins and were packaged by the viral vector core at the Gene Therapy center and Vector Core at the University of Massachusetts Medical School. The final viral concentration was 1 × 10^13 genome copies/mL.
Stereotactic Injections and Optical Fiber Implantations
Stereotactic viral injections and optical fiber implantations were performed in accordance with CAC guidelines at MIT. Mice were anaesthetized using 250 mg/kg avertin and mounted onto a stereotactic setup. A small craniotomy was made over DR. 1.5 µL of virus was injected using a pulled glass micropipette attached to a 10 µL Hamilton microsyringe through a microelectrode holder filled with mineral oil. A microsyringe pump was used to control the speed of the injection. For Arch experiments, mice were injected in the DR nucleus (AP: −4.65 mm, DV: −3 mm, ML: 0 mm) with AAV9-ef1α-DIO-eArch3.0-eYFP (UNC Vector Core), or with the control virus AAV9-ef1α-DIO-eYFP (UNC Vector Core), diluted to a titer of 1 × 10^11 particles/mL. For ChR2 experiments, mice were injected in the DR nucleus with AAVrh8-CBA-DIO-ChR2-eYFP diluted to a titer of 1 × 10^12 particles/mL.
For DR cell body experiments, a single optical fiber (diameter 200 µm, numerical aperture 0.22) was implanted over the DR nucleus in the same session as viral injections, targeted for DR (AP: −4.65 mm, DV: −2.7 mm, ML: 0 mm), and secured with dental cement fitted with the top segment of a black microcentrifuge tube. For DR-NAc projection manipulation experiments, bilateral optical fibers were implanted above the nucleus accumbens (AP: +1.2 mm, DV: −4mm, ML: ±0.7 mm). For further light shielding, any dental cement not covered by the microcentrifuge tube was painted over with black nail polish. Mice were allowed to recover over a period of at least two weeks before being water-restricted again.
For electrophysiological validation, viruses were injected and allowed to express for 3–4 weeks before recordings were performed.
In vivo Electrophysiological Recordings
DR-ChR2 mice were anaesthetized via an i.p. injection (100 mL/kg) of a mixture of ketamine (100 mg/mL) / xylazine (20 mg/mL) and mounted on a stereotactic instrument. DR-Arch mice were anaesthetized with an i.p. injection of 1 g/kg of urethane dissolved in saline. A small craniotomy was made above DR, and an optrode consisting of a 0.5 MΩ tungsten electrode and an optical fiber with a diameter of 200 µm was lowered slowly into the DR nucleus. The tip of the electrode extended 200 µm beyond the tip of the fiber. The optical fiber was connected either to a blue laser (473 nm, 10 mW at fiber tip) or a green laser (532 nm, 5 mW at fiber tip) during the recording. The lasers were controlled by a signal generator. For ChR2 experiments, the blue laser was pulsed at 5 ms width and 10 Hz frequency. For Arch experiments, a constant pulse of green light was used. Multiunit or single unit activity was recorded via an Axon Digidata 1440A acquisition system running Clampex 10.2 software. Data were analyzed using MATLAB.
Histology and Immunohistochemistry
Mice were perfused with 4% paraformaldehyde (PFA) in phosphate buffered saline (PBS). Brains were then post-fixed in the same solution overnight and sliced with a vibratome into 50 µm sections. Sections were then incubated in primary antibodies in blocking solution (PBS containing 0.2% triton and 1% BSA) overnight at room temperature, washed in PBS 3 times for 15 min each, and then incubated in secondary antibodies in blocking solution for 2 hours at room temperature. Primary antibodies used were rabbit anti-TPH (Thermo Fisher, 1:1000) and chicken anti-GFP (Abcam, 1:1000). Secondary antibodies used were AlexaFluor 488- or 568-conjugated secondary antibodies (Invitrogen, 1:1000). Sections were then washed again in PBS 3 times, mounted onto glass slides and allowed to dry completely. Vectashield mounting medium containing DAPI was then applied to the brain sections and coverslips were added.
Behavior Chamber
The behavior task took place in an operant chamber. The chamber was equipped with 3 ports into which the subjects could nosepoke. The center port delivered the odor and the two side ports delivered water reward. The center port was connected to an odor manifold fitted with filters containing odors diluted with mineral oil. Air flow to the manifold was controlled by an olfactometer (Island Motion, Tappan, NY, USA). A stream of air flow through the odor filters joined the main carrier flow and was delivered to the subject. The side ports were fitted with water tubing connected to solenoid valves. When the subjects poked into the ports, infrared beams broke and the poke events were logged by Matlab (The MathWorks Inc., Natick, MA, USA).
Behavior Task
Subjects were water restricted for a week (administered 1.2 ml of water in a single session per day), and then gradually shaped to associate odor intensity with reward delays. On day one, subjects were placed into the chamber with the shaping protocol in place. Trials were self-initiated. Subjects were required to poke in the center port and collect a reward from either side port. Total air flow was held constant at 1000mL/min throughout the session. Odor delivery started as soon as the subjects made a center poke in. The intensity of odor 1 (caproic acid, 1:10 dilution in mineral oil) was linearly correlated with delay to the left reward. The intensity of odor 2 (hexanol, 1:10 dilution in mineral oil) was linearly correlated with delay to the right reward. The subjects were required to sample for a minimum of 400 milliseconds at the center port before the trial was able to proceed. The subjects were then required to commit a choice in one of the side ports to initiate the reward delay. The LEDs on the ports turned on when a center poke was made and were turned off when a valid side poke was made. If a center poke was not long enough to qualify the trial, the trial was aborted and the LEDs turned off. During the reward delay, the subjects were free to leave the port and move about the chamber. Once the reward delay lapsed, the water valve released a drop of water (~3 µL) with a click, and the subject could collect the water reward. The trials lasted 6 seconds longer than the longer reward delay so that the length of the trial was not contingent upon the choice. During training, both reward ports released 1 drop of water each. Subjects were typically trained on the task for 3 to 4 months before they consistently chose the less delayed reward. Once subjects had associated the odor intensities with reward delays, the left reward port released 2 separate drops of water (a large reward), while the right reward remained at 1 drop (a small reward).
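The trial-length rule described above can be illustrated with a minimal sketch. It assumes that the trial clock starts when the choice is committed; the function names are hypothetical and are not taken from the original task code. Only the 6 second padding comes from the task description.

```python
# Illustrative sketch only: helper names are hypothetical, and the trial
# clock is ASSUMED to start when the subject commits a choice.
TRIAL_PADDING_S = 6.0  # trials last 6 s longer than the longer offered delay


def trial_duration(left_delay_s, right_delay_s):
    """Trial length, which depends on the delays offered, not on the choice."""
    return max(left_delay_s, right_delay_s) + TRIAL_PADDING_S


def time_after_reward(chosen_delay_s, left_delay_s, right_delay_s):
    """Time left in the trial once the chosen reward has been delivered."""
    return trial_duration(left_delay_s, right_delay_s) - chosen_delay_s


# Offered delays: 2 s on the left, 8 s on the right.
print(trial_duration(2, 8))
print(time_after_reward(2, 2, 8), time_after_reward(8, 2, 8))
```

Under this rule, choosing the shorter delay lengthens the post-reward period instead of shortening the trial, so the number of trials per session does not depend on which option is chosen.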
Implanted mice performed the OGIC task with laser delivery through patch cords attached to the optical implants by ceramic sleeves. The patch cords were attached to a rotary joint to allow rotations and to free the mice for movement within the operant chamber. Between 10% and 20% of the trials were pseudorandomly selected to be light-on trials, in which the laser was turned on when a valid center poke was made, concurrent with the odor onset, and turned off when a valid side poke was made, committing the choice. The lasers (CNI, Jilin, China; Optoengine, Utah, USA) were triggered via a TTL pulse issued from the state machine that also controls the behavior apparatus. For Arch and green light control experiments, a constant pulse of 5 mW 532 nm light was used. For ChR2 and blue light control experiment, a train of 473 nm light was used at 10 Hz and 5 ms pulse width, power measured to be 10mW at constant output.
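One way to select light-on trials pseudorandomly is sketched below. The source states only that between 10% and 20% of trials were light-on trials; the exact selection scheme, the default fraction of 0.15, the seed, and the function name are all assumptions made for illustration.

```python
import random


def assign_light_trials(n_trials, fraction=0.15, seed=0):
    """Return one boolean per trial, True for light-on trials.

    ASSUMPTION: a fixed fraction of trials is sampled without replacement.
    The source reports only the 10-20% range, not the selection scheme.
    """
    if not 0.10 <= fraction <= 0.20:
        raise ValueError("fraction is outside the reported 10-20% range")
    n_light = round(n_trials * fraction)
    rng = random.Random(seed)  # seeded so the sequence is reproducible
    light_idx = set(rng.sample(range(n_trials), n_light))
    return [i in light_idx for i in range(n_trials)]


flags = assign_light_trials(200)
print(sum(flags), "light-on trials out of", len(flags))
```

Sampling without replacement fixes the number of light-on trials per session, which keeps the light-on and light-off trial counts predictable for later comparison.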
Quantification and Statistical Analyses
Preference Data Extraction
Port entry timestamps were logged by computer. For each subject, trial-by-trial data arrays were constructed using Matlab. Data from all sessions performed by the subject were pooled. The preferences were tabulated for each combination of left and right delays. Sampling time was calculated by subtracting the center port entry time from the center port exit time. Transit time was calculated by subtracting the center port exit time from the side port entry time. The preference data were first sorted by relative reward delays: whether the left reward delay was longer than, equal to or shorter than the right reward delay. Average preference for each case was calculated and compared between stimulation conditions.
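The extraction steps can be sketched as follows. This is an illustrative reimplementation in Python, not the original Matlab analysis; the function names and the tuple layout of a trial are assumptions. Sampling time is taken as center port exit minus center port entry, and transit time as side port entry minus center port exit, so that both are positive durations.

```python
from collections import defaultdict


def sampling_time(center_in_s, center_out_s):
    """Time spent in the odor port."""
    return center_out_s - center_in_s


def transit_time(center_out_s, side_in_s):
    """Time from leaving the odor port to entering a reward port."""
    return side_in_s - center_out_s


def tabulate_preference(trials):
    """Proportion of left choices for each (left delay, right delay) pair.

    `trials` is an iterable of (left_delay, right_delay, chose_left) tuples;
    this layout is hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # [left choices, total trials]
    for left_delay, right_delay, chose_left in trials:
        cell = counts[(left_delay, right_delay)]
        cell[0] += int(chose_left)
        cell[1] += 1
    return {key: left / total for key, (left, total) in counts.items()}


example = [(2, 8, True), (2, 8, True), (2, 8, False), (8, 2, False)]
print(tabulate_preference(example))
```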
Choice Model for Psychometric Curves
We used a multiple regression model to describe choice. We took the inverse of the right reward delay because it fitted the data better. Because the right reward delay included values of 0 seconds, we added a small delay of 0.5 seconds, roughly the average transit time, to both the left and right reward delays before taking the inverse. We compared different link functions, as well as models with and without an interaction between left delay and inverse right delay. Model selection was based on goodness of fit, assessed using the Akaike Information Criterion normalized by the number of trials. We used the MATLAB function glmfit to fit a generalized linear model for the proportion of left choices, with the linear predictor $\beta_0 + \beta_{L} L + \beta_{R^{-1}} R^{-1} + \beta_{L \times R^{-1}} L R^{-1}$, where $\beta_0$ is a constant, $L$ is the left delay and $R^{-1}$ is the inverse of the right delay. To assess the influence of light activation and inhibition, we added a dummy variable for the presence of light (0 for light off, 1 for light on), which could interact with any of the existing independent variables. Population means and SEMs of the light interaction coefficients were compared to zero.
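The structure of this model can be sketched in Python as follows. This is a hedged illustration rather than the fitting code used here: it builds the regressors and evaluates a prediction for given coefficients, but does not perform the fit. The logistic link is an assumption made for the example, since the text compares several link functions without naming the one selected, and the function names are hypothetical.

```python
import math

DELAY_OFFSET_S = 0.5  # small delay added to both delays, as described above


def design_row(left_delay, right_delay, light_on):
    """Regressors for one trial: base terms, then their light interactions."""
    left = left_delay + DELAY_OFFSET_S
    right_inv = 1.0 / (right_delay + DELAY_OFFSET_S)
    base = [1.0, left, right_inv, left * right_inv]
    light = 1.0 if light_on else 0.0
    # Multiplying by the light dummy yields the light main effect and
    # its interaction with each delay term.
    return base + [light * x for x in base]


def predict_left(betas, left_delay, right_delay, light_on):
    """Predicted proportion of left choices (logistic link assumed)."""
    row = design_row(left_delay, right_delay, light_on)
    eta = sum(b * x for b, x in zip(betas, row))
    return 1.0 / (1.0 + math.exp(-eta))
```

With all eight coefficients set to zero, `predict_left` returns 0.5 for any pair of delays, i.e. no preference for either side.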
Trade-off Analysis
Heatmaps were generated by representing the light effect predicted by the model (predicted proportion of left choices with light on minus predicted proportion of left choices with light off) as color intensity. Choice difficulty was calculated as 0.5 − |preference during light-off trials − 0.5|, i.e. the distance from the subject’s mean preference to the nearer bound (0 or 1). Perceptual difficulty was calculated as 1 − |intensity of odor 1 − intensity of odor 2| / 140, where 140 mL/min is the maximum total odor intensity possible. To compare the influence of choice difficulty with that of perceptual difficulty on the manipulation effect, a multiple linear regression was performed for each subject: $\text{effect} = \beta_0 + \beta_{\text{choice}} \times \text{choice difficulty} + \beta_{\text{perceptual}} \times \text{perceptual difficulty}$. Population means and SEMs of the regression coefficients were compared to zero.
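The two difficulty measures follow directly from the definitions above and can be written out in Python as a minimal sketch. The function and constant names are assumptions introduced for illustration.

```python
MAX_ODOR_INTENSITY = 140.0  # mL/min, maximum total odor intensity from the text


def choice_difficulty(pref_light_off):
    """Distance from the light-off preference to the nearer bound (0 or 1)."""
    return 0.5 - abs(pref_light_off - 0.5)


def perceptual_difficulty(intensity_odor1, intensity_odor2):
    """1 when the odor intensities are equal, 0 when they differ maximally."""
    return 1.0 - abs(intensity_odor1 - intensity_odor2) / MAX_ODOR_INTENSITY
```

Under these definitions, choice difficulty ranges from 0 (the subject always chooses the same side) to 0.5 (indifference), and perceptual difficulty ranges from 0 to 1.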
Statistics
Data from control and test groups with a light-on/light-off design were analyzed in R using a two-way mixed-model ANOVA, with light as the within-subject factor and group as the between-subject factor. Model coefficients sometimes violated the assumption of normality of residuals, as assessed by the Shapiro-Wilk test, and were therefore compared between conditions or against zero with Wilcoxon signed-rank tests, wherever appropriate. The null hypothesis was rejected at p < 0.05; where Bonferroni correction for multiple comparisons was applied, the p-value was first multiplied by m, the number of hypotheses. All data were visualized as population means with SEM, and each fit line represents the population mean of the linear fit lines of individual subjects.
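The Bonferroni adjustment and the SEM used for visualization can be sketched in Python as follows. This is an illustration and not the R code used for the analysis; the function names are hypothetical, and capping adjusted p-values at 1 is a common convention assumed here rather than something stated in the text.

```python
import math
import statistics


def bonferroni_adjust(p_values):
    """Multiply each p-value by m, the number of hypotheses (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def reject_null(p_values, alpha=0.05):
    """Reject the null hypothesis where the adjusted p-value is below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))
```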
Acknowledgments
We thank C. W. Lovett for assistance with histology and colony management; C. A. Twiss for assistance with behavior training; W. Yu for assistance with histology; and T. O’Connor for help with the behavior setup. We are grateful to N. Uchida for advice on the experiments, K. Deisseroth for viral constructs, and M. S. Fee and E. K. Miller for comments on the study. We thank A. Bari for reading the manuscript and K. J. Bustamante for helpful input on the initial design of the task. We thank members of the Tonegawa lab for helpful discussions. This work was supported by a National Science Scholarship from the Agency for Science, Technology and Research (to S.X.), the RIKEN Brain Science Institute, the Howard Hughes Medical Institute, and the JPB Foundation (to S.T.).
Footnotes
Author Contributions:
S. T. and S. X. conceived the study; S. X., G. D. and E. H. designed the experiments. S. X. conducted the experiments. G. D. provided reagents. S. T. and S. X. wrote the paper with input from all authors.
The authors declare no conflict of interest.
References
1. Shoda Y, Mischel W. Predicting adolescent cognitive and self-regulatory competencies from preschool delay of gratification: Identifying diagnostic conditions. Dev. Psychol. 1990;26:978–986.
2. Bickel WK, Koffarnus MN, Moody L, Wilson AG. The behavioral- and neuro-economic process of temporal discounting: A candidate behavioral marker of addiction. Neuropharmacology. 2014;76(Pt B):518–527. doi: 10.1016/j.neuropharm.2013.06.013.
3. Reynolds B. A review of delay-discounting research with humans: relations to drug use and gambling. Behav. Pharmacol. 2006;17:651–667. doi: 10.1097/FBP.0b013e3280115f99.
4. Story GW, Vlaev I, Seymour B, Darzi A, Dolan RJ. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective. Front. Behav. Neurosci. 2014;8:76. doi: 10.3389/fnbeh.2014.00076.
5. Barkley RA, Edwards G, Laneri M, Fletcher K, Metevia L. Executive functioning, temporal discounting, and sense of time in adolescents with attention deficit hyperactivity disorder (ADHD) and oppositional defiant disorder (ODD) J. Abnorm. Child Psychol. 2001;29:541–556. doi: 10.1023/a:1012233310098.
6. Heerey EA, Robinson BM, McMahon RP, Gold JM. Delay discounting in schizophrenia. Cognit. Neuropsychiatry. 2007;12:213–221. doi: 10.1080/13546800601005900.
7. Takahashi T, Oono H, Inoue T, Boku S, Kako Y, Kitaichi Y, Kusumi I, Masui T, Nakagawa S, Suzuki K, et al. Depressive patients are more impulsive and inconsistent in intertemporal choice behavior for monetary gain and loss than healthy subjects--an analysis based on Tsallis’ statistics. Neuro Endocrinol. Lett. 2008;29:351–358.
8. Ahn W-Y, Rass O, Fridberg DJ, Bishara AJ, Forsyth JK, Breier A, Busemeyer JR, Hetrick WP, Bolbecker AR, O’Donnell BF. Temporal discounting of rewards in patients with bipolar disorder and schizophrenia. J. Abnorm. Psychol. 2011;120:911–921. doi: 10.1037/a0023333.
9. Wogar AM, Bradshaw CM, Szabadi E. Effect of lesions of the ascending 5-hydroxytryptaminergic pathways on choice between delayed reinforcers. Psychopharmacology (Berl.) 1993:1–5. doi: 10.1007/BF02245530.
10. Mobini S, Chiang T, Ho M, Bradshaw C, Szabadi E. Effects of central 5-hydroxytryptamine depletion on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl.) 2000;152:390–397. doi: 10.1007/s002130000542.
11. Schweighofer N, Bertin M, Shishida K, Okamoto Y, Tanaka SC, Yamawaki S, Doya K. Low-serotonin levels increase delayed reward discounting in humans. J Neurosci. 2008;28:4528–4532. doi: 10.1523/JNEUROSCI.4982-07.2008.
12. Cohen JY, Amoroso MW, Uchida N. Serotonergic neurons signal reward and punishment on multiple timescales. eLife. 2015;4:e06346. doi: 10.7554/eLife.06346.
13. Kim S, Hwang J, Lee D. Prefrontal Coding of Temporally Discounted Values during Intertemporal Choice. Neuron. 2008;59:522. doi: 10.1016/j.neuron.2008.05.010.
14. Kepecs A, Uchida N, Zariwala H, Mainen ZF. Neural correlates, computation and behavioural impact of decision confidence. Nature. 2008;455:227–231. doi: 10.1038/nature07200.
15. Roesch M, Calu D, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 2007;10:1615–1624. doi: 10.1038/nn2013.
16. Uchida N, Mainen ZF. Speed and accuracy of olfactory discrimination in the rat. Nat Neurosci. 2003;6:1224–1229. doi: 10.1038/nn1142.
17. Acheson A, Farrar AM, Patak M, Hausknecht KA, Kieres AK, Choi S, de Wit H, Richards JB. Nucleus accumbens lesions decrease sensitivity to rapid changes in the delay to reinforcement. Behav Brain Res. 2006;173:217–228. doi: 10.1016/j.bbr.2006.06.024.
18. Cardinal R, Pennicott D, Lakmali C. Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science. 2001;292:2499–2501. doi: 10.1126/science.1060818.
19. Pothuizen HJ, Jongen-Rêlo J, Feldon AL, Yee BK. Double dissociation of the effects of selective nucleus accumbens core and shell lesions on impulsive-choice behaviour and salience learning in rats. Eur. J. Neurosci. 2005;22:2605–2616. doi: 10.1111/j.1460-9568.2005.04388.x.
20. Winstanley CA, Theobald DEH, Dalley JW, Robbins TW. Interactions between serotonin and dopamine in the control of impulsive choice in rats: therapeutic implications for impulse control disorders. Neuropsychopharmacology. 2005;30:669–682. doi: 10.1038/sj.npp.1300610.
21. Gerfen CR, Paletzki R, Heintz N. Cre-Recombinase Driver Lines to Study the Functional Organization of Cerebral Cortical and Basal Ganglia Circuits. Neuron. 2013;80:1368–1383. doi: 10.1016/j.neuron.2013.10.016.
22. Dalley JW, Fryer TD, Brichard L, Robinson ESJ, Theobald DEH, Laane K, Pena Y, Murphy ER, Shah Y, Probst K, et al. Nucleus accumbens D2/3 receptors predict trait impulsivity and cocaine reinforcement. Science. 2007;315:1267–1270. doi: 10.1126/science.1137073.
23. Dalley JW, Everitt BJ, Robbins TW. Impulsivity, compulsivity, and top-down cognitive control. Neuron. 2011;69:680–694. doi: 10.1016/j.neuron.2011.01.020.
24. Brown P, Molliver ME. Dual Serotonin (5-HT) Projections to the Nucleus Accumbens Core and Shell: Relation of the 5-HT Transporter to Amphetamine-Induced Neurotoxicity. J. Neurosci. 2000;20:1952–1963. doi: 10.1523/JNEUROSCI.20-05-01952.2000.
25. Liu Z, Zhou J, Li Y, Hu F, Lu Y, Ma M, Feng Q, Zhang J, Wang D, Zeng J, et al. Dorsal raphe neurons signal reward through 5-HT and glutamate. Neuron. 2014;81:1360–1374. doi: 10.1016/j.neuron.2014.02.010.
26. Miyazaki KW, Miyazaki K, Tanaka KF, Yamanaka A, Takahashi A, Tabuchi S, Doya K. Optogenetic Activation of Dorsal Raphe Serotonin Neurons Enhances Patience for Future Rewards. Curr. Biol. 2014;24:2033–2040. doi: 10.1016/j.cub.2014.07.041.
27. Fonseca MS, Murakami M, Mainen ZF. Activation of Dorsal Raphe Serotonergic Neurons Promotes Waiting but Is Not Reinforcing. Curr. Biol. 2015;25:306–315. doi: 10.1016/j.cub.2014.12.002.
28. Bari A, Robbins TW. Inhibition and impulsivity: Behavioral and neural basis of response control. Prog. Neurobiol. 2013;108:44–79. doi: 10.1016/j.pneurobio.2013.06.005.
29. Qi J, Zhang S, Wang HL, Wang H, de Jesus Aceves Buendia J, Hoffman AF, Lupica CR, Seal RP, Morales M. A glutamatergic reward input from the dorsal raphe to ventral tegmental area dopamine neurons. Nat Commun. 2014;5:5390. doi: 10.1038/ncomms6390.
30. Matias S, Lottem E, Dugué GP, Mainen ZF. Activity patterns of serotonin neurons underlying cognitive flexibility. eLife. 2017;6:e20552. doi: 10.7554/eLife.20552.
31. Bromberg-Martin ES, Hikosaka O, Nakamura K. Coding of Task Reward Value in the Dorsal Raphe Nucleus. J. Neurosci. 2010;30:6262–6272. doi: 10.1523/JNEUROSCI.0015-10.2010.
32. Nakamura K, Matsumoto M, Hikosaka O. Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus. J. Neurosci. 2008;28:5331–5343. doi: 10.1523/JNEUROSCI.0021-08.2008.
33. Ranade SP, Mainen ZF. Transient Firing of Dorsal Raphe Neurons Encodes Diverse and Specific Sensory, Motor, and Reward Events. J. Neurophysiol. 2009;102:3026–3037. doi: 10.1152/jn.00507.2009.
34. Wogar MA, Bradshaw CM, Szabadi E. Effect of lesions of the ascending 5-hydroxytryptaminergic pathways on choice between delayed reinforcers. Psychopharmacology (Berl.) 1993;111:239–243. doi: 10.1007/BF02245530.
35. Winstanley CA, Dalley JW, Theobald DEH, Robbins TW. Fractionating impulsivity: contrasting effects of central 5-HT depletion on different measures of impulsive behavior. Neuropsychopharmacol. Off. Publ. Am. Coll. Neuropsychopharmacol. 2004;29:1331–1343. doi: 10.1038/sj.npp.1300434.
36. Mazur JE. An adjusting procedure for studying delayed reinforcement. Hillsdale, NJ: Earlbaum; 1987.
37. Luo M, Li Y, Zhong W. Do dorsal raphe 5-HT neurons encode “beneficialness”? Neurobiol. Learn. Mem. 2016;135:40–49. doi: 10.1016/j.nlm.2016.08.008.
38. Sudhof TC. Reproducibility: Experimental mismatch in neural circuits. Nature. 2015;528:338–339. doi: 10.1038/nature16323.
39. Marcinkiewcz CA, Mazzone CM, D’Agostino G, Halladay LR, Hardaway JA, DiBerto JF, Navarro M, Burnham N, Cristiano C, Dorrier CE, et al. Serotonin engages an anxiety and fear-promoting circuit in the extended amygdala. Nature. 2016;537:97–101. doi: 10.1038/nature19318.
40. McDevitt RA, Tiran-Cappello A, Shen H, Balderas I, Britt JP, M MRA, Chung SL, Richie CT, Harvey BK, Bonci A. Serotonergic versus Nonserotonergic Dorsal Raphe Projection Neurons: Differential Participation in Reward Circuitry. Cell Rep. 2014;8:1857–1869. doi: 10.1016/j.celrep.2014.08.037.
41. Mattis J, Tye KM, Ferenczi EA, Ramakrishnan C, O’Shea DJ, Prakash R, Gunaydin LA, Hyun M, Fenno LE, Gradinaru V, et al. Principles for applying optogenetic tools derived from direct comparative analysis of microbial opsins. Nat Meth. 2012;9:159–172. doi: 10.1038/nmeth.1808.