Author manuscript; available in PMC: 2015 Nov 3.
Published in final edited form as: Curr Biol. 2014 Oct 23;24(21):2564–2568. doi: 10.1016/j.cub.2014.09.016

Observation of reward delivery to a conspecific modulates dopamine release in ventral striatum

Vadim Kashtelyan 1, Nina T Lichtenberg 1, Mindy L Chen 1, Joseph F Cheer 3, Matthew R Roesch 1,2
PMCID: PMC4303906  NIHMSID: NIHMS627822  PMID: 25438944

SUMMARY

Dopamine (DA) neurons increase and decrease firing to rewards that are better and worse than expected, respectively. These correlates have been observed at the level of single-unit firing and in measurements of phasic DA release in ventral striatum (VS) [1-10]. Here, we ask if DA release is modulated by delivery of reward, not to oneself, but to a conspecific. It is unknown what, if anything, DA release encodes during social situations in which one animal witnesses another animal receive reward. It might be predicted that DA release will increase, suggesting that watching a conspecific receive reward is a favorable outcome. Conversely, DA release may be entirely dependent on personal experience, or perhaps observation of receipt of reward might be experienced as a negative outcome because another individual, rather than oneself, received the reward. Our data show that animals display a mixture of affective states during observation of conspecific reward, first exhibiting increases in appetitive calls (50 kHz), followed by an increase in aversive calls (22 kHz) [11-14]. Like ultrasonic vocalizations (USVs), DA signals were modulated by delivery of reward to the conspecific. We show stronger DA release during observation of the conspecific receiving reward relative to observation of reward delivered to an empty box, but only on the first trial. During the following trials this relationship reversed; DA release was reduced during observation of the conspecific receiving rewards. These findings suggest that positive and negative states associated with conspecific reward delivery modulate DA signals related to learning in social situations.

Keywords: dopamine, ventral striatum, nucleus accumbens, observation, reward, voltammetry, rat, ultrasonic vocalization, social behavior

RESULTS

Rats experienced 3 different trial-types during data collection (Fig. 1A). For one trial type, observer rats were placed in the left side of the behavioral box, which was divided in two by a wire mesh. While on the left side, observer rats received palatable sucrose pellet rewards 10 s after illumination of a cue-light while a conspecific was located on the right (receive-reward trial type). Following ~15 trials, observers were removed from the left side and placed in the right side. At this point the program resumed on the left side (i.e., reward was delivered in the food cup 10 s after light onset) in one of two ways. In half of the blocks, a conspecific was placed in the left side, where it consumed reward while the observer rat watched (observe-rat trial-type). In the other half of the blocks, the left side was empty and the pellets were dropped into the food cup with no conspecific present (observe-empty trial-type). Following ~15 trials, observer rats were placed back in the left side and received rewards for ~15 trials, before returning to the right side. This sequence repeated several times (Fig. 1B).

Figure 1. Task design.

A. Rats were trained that reward (sugar pellet) was delivered 10 s after onset of cue lights. The chamber was divided in half by wire mesh, which allowed the animals to hear, see, and smell both sides. Each recording session consisted of 3 block types: ‘receive-reward’, ‘observe-rat’, and ‘observe-empty’. In the ‘receive-reward’ block, the animal from which DA was recorded was placed in the left chamber and another animal was placed in the right half. In the ‘observe’ blocks recorded rats were placed in the right side while the program commenced on the left side. During these trials, the left side of the box was ‘empty’ or occupied by a conspecific. B. The order of the blocks was: 1. receive-reward → 2. observe-rat → 3. receive-reward → 4. observe-empty → 5. receive-reward → 6. observe-rat → 7. receive-reward → 8. observe-empty. The 2nd and 6th block of trials alternated with the 4th and 8th block of trials daily. Each block had an average of 15 trials. C. Placement of chronic recording electrodes based on histology.
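The block sequence in panel B can be sketched in a few lines. This is only an illustration of the schedule as described in the caption: the function name `block_order` and the 0-based `day` index are hypothetical, and the choice of which day starts with observe-rat is an assumption, since the caption states only that the two observe block types swapped positions daily.

```python
def block_order(day):
    """Return the 8-block sequence for one session.

    `day` is a hypothetical 0-based session index. Even days follow the
    order listed in the caption; odd days swap the two observe block types.
    """
    if day % 2 == 0:
        first, second = "observe-rat", "observe-empty"
    else:
        first, second = "observe-empty", "observe-rat"

    order = []
    for i in range(4):
        # Every observe block is preceded by a receive-reward block.
        order.append("receive-reward")
        order.append(first if i % 2 == 0 else second)
    return order


if __name__ == "__main__":
    for position, name in enumerate(block_order(0), start=1):
        print(position, name)
```

With roughly 15 trials per block, a session built this way contains about 120 trials, half of them receive-reward trials.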

In a group of 16 rats we recorded USVs during these trial types. Figure 2A-D plots USVs for the first 6 (black; early) and last 6 (gray; late) trials in successive 10 second epochs. During observe-empty trials (i.e., pellets delivered to the food cup in an empty box) observer rats exhibited slightly increased 50 kHz vocalizations during early trials. Fifty kHz rates were significantly higher in bins 10-30 relative to baseline (10 s before cue; i.e., -10 in Fig. 2A; Wilcoxon, p's < 0.05). There were also increases in 22 kHz calls during observe-empty trials (Fig. 2B) but they did not achieve significance (Wilcoxon, p's > 0.056). Thus, observation of reward delivered to an empty box marginally increased USVs.
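The binning behind these rate comparisons can be illustrated as follows. This is a minimal sketch, not the analysis code used for the study: the function name `usv_rate_by_bin`, the call timestamps in the example, and the convention that each bin is labeled by its start time relative to cue onset (so that -10 is the baseline bin) are all assumptions made for illustration.

```python
import math


def usv_rate_by_bin(call_times, cue_onset, bin_width=10.0, first_bin=-10, last_bin=30):
    """Return calls per second in each bin for one trial.

    call_times: call onset times in seconds (same clock as cue_onset).
    Bins are labeled by their start time relative to cue onset, an
    assumed convention: -10 covers the 10 s before the cue (baseline).
    """
    labels = range(first_bin, last_bin + 1, int(bin_width))
    counts = {label: 0 for label in labels}
    for t in call_times:
        relative = t - cue_onset
        label = int(math.floor(relative / bin_width) * bin_width)
        if label in counts:  # ignore calls outside the analyzed window
            counts[label] += 1
    return {label: n / bin_width for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up timestamps for a single trial with cue onset at t = 100 s.
    rates = usv_rate_by_bin([95.0, 101.2, 112.5, 113.0, 125.9], cue_onset=100.0)
    for label in sorted(rates):
        print(label, rates[label])
```

Per-rat rates in a given bin could then be compared against the -10 bin with a paired nonparametric test; the section reports Wilcoxon tests but does not state which variant was used.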

Figure 2. USVs are modulated by conspecific reward observation.

A and B. Fifty kHz (A) and 22 kHz (B) USV rates in 10 s bins during the trial for observe-empty trials. C and D. Fifty kHz (C) and 22 kHz (D) USV rates in 10 s bins during the trial for observe-rat trials. Early trials = first 6 trials (black); Late trials = last 6 trials (gray); *Wilcoxon between observe-rat and observe-empty trials; p < 0.05. E and F. USV rates and dB level for 50 and 22 kHz USVs for the first 6 observe-rat trials during the 10 s period after reward delivery. *Wilcoxon between indicated trial and last trial in the block. G and H. USV frequency and dB levels during the first 6 trials when rats were in the left side of the box with no other rat present. Error bars are standard error of the mean.

In contrast, delivery of the reward to the conspecific had pronounced effects on USVs. Both appetitive and aversive calls increased after cue-light onset. During early trials (black), USV rates were significantly higher on observe-rat compared to observe-empty trials (Wilcoxon; p < 0.05). Twenty-two kHz calls increased gradually after onset of the cue light during observe-rat trials (Fig. 2D; black), but were not significantly different than observe-empty trials (Fig. 2B) until bin 30 (Wilcoxon, p < 0.05). During the later trials (last 6 trials; gray), both 50 and 22 kHz were less frequent during trial events (cue onset to cue offset; 0-20s), but remained significantly higher during the ITI (Fig. 2C-D, asterisks).

To further characterize the development of USVs during early portions of each trial block we examined USVs over the course of the first 6 trials during the 10-second period after reward. Fifty kHz calls were emitted at the highest rate and amplitude during the first trial of observation of the conspecific (Fig. 2E). Only on the first trial were the 50 kHz USV rates and amplitude significantly higher compared to the last trial in the block (Wilcoxon, p < 0.05). Unlike 50 kHz USVs, 22 kHz calls were most prevalent on the third trial (Fig. 2F; Wilcoxon; 3rd versus last trial; p < 0.05). Importantly, these USV patterns were not observed when the demonstrators received reward with no observer present (Fig. 2G and H), suggesting that the USV microphone was too far from the demonstrator to collect USVs and/or that the demonstrator was not emitting them. Together these results suggest that appetitive calls were prominent from cue onset through reward delivery (Fig. 2C), whereas aversive calls gradually increased during cue onset and reward delivery, peaking several seconds into the ITI (Fig. 2D). During reward delivery and consumption, appetitive calls were strongest during the first trial (Fig. 2E), whereas aversive calls emerged over the first several trials (Fig. 2F).

To determine how DA release changes during the 3 trial-types (Fig. 1) we outfitted 4 rats with FSCV electrodes in nucleus accumbens core (Fig. 1C). For our first analysis we averaged over all first trials during observation of the conspecific over all sessions (see supplemental for single trial examples). Recall that rats exhibited the most robust 50 kHz vocalizations in the first trial. We observed DA release following cue onset and around the time of reward delivery on observe-rat trials (Fig. 3A; green). Release occurred prior to the onset of the reward delivery, which might reflect anticipation of the reward by either estimating the 10 s period or using social cues (e.g., USVs [15]). DA release was significantly elevated during the 10 s following cue onset (cue epoch; t-test, t21 = 2.80; p < 0.05) and during the 2 s after reward delivery (reward epoch; t-test, t21 = 3.52; p < 0.05) relative to baseline (1 s before light on).
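The epoch definitions above (baseline: 1 s before light on; cue epoch: 10 s following cue onset; reward epoch: 2 s after reward delivery, which occurs 10 s after cue onset) can be expressed as a short sketch. Several details here are assumptions and not taken from the text: the 10 Hz sampling rate, the function names, and the reading of the reported test as a comparison of per-trial baseline-subtracted values against zero. Under that reading, 21 degrees of freedom correspond to 22 trials, which is consistent with 11 sessions each containing two observe-rat blocks.

```python
import math
import statistics

SAMPLE_HZ = 10  # assumed sampling rate; not stated in this section


def epoch_mean(trace, cue_index, start_s, stop_s, hz=SAMPLE_HZ):
    """Mean of `trace` between start_s and stop_s (seconds relative to cue onset)."""
    lo = cue_index + int(round(start_s * hz))
    hi = cue_index + int(round(stop_s * hz))
    return statistics.mean(trace[lo:hi])


def trial_epochs(trace, cue_index):
    """Baseline-subtracted cue and reward epoch averages for one trial."""
    baseline = epoch_mean(trace, cue_index, -1, 0)  # 1 s before light on
    cue = epoch_mean(trace, cue_index, 0, 10)       # 10 s following cue onset
    reward = epoch_mean(trace, cue_index, 10, 12)   # 2 s after reward delivery
    return {"cue": cue - baseline, "reward": reward - baseline}


def t_against_zero(differences):
    """t statistic and degrees of freedom for mean(differences) versus 0."""
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


if __name__ == "__main__":
    # Made-up trace: 1 s baseline, 10 s cue period, 2 s reward period, 1 s tail.
    trace = [0.0] * 10 + [1.0] * 100 + [3.0] * 20 + [0.0] * 10
    print(trial_epochs(trace, cue_index=10))
```

Collecting the "cue" (or "reward") value from every first observe-rat trial and passing the list to `t_against_zero` would give a statistic of the same form as the one reported.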

Figure 3. DA release is modulated by conspecific observation.

A and B. Average DA concentration over time during the first (A) and last (B) trial of each trial type (11 sessions; 4 rats). C and D. Average DA release for all three trial-types taken 10 s after cue light onset during the first (C) and last (D) trials in each block (cue epoch). E and F. Average DA release for all three trial-types taken 2 s after reward delivery (reward epoch) during the first (E) and last (F) trials in each block. Error bars are standard error of the mean. *Tukey; p < 0.05. Blue: Receive-reward; Green: Observe-rat; Red: Observe-empty.

Next, we determined if DA release was differentially modulated during the three types of trials during the cue and reward epochs. A 2-factor ANOVA was performed with trial-type (receive-reward, observe-rat, and observe-empty) and trial (first and last) as factors, independently for each epoch. Main and interaction effects with trial-type were explored via post-hoc tests (Cue epoch: trial-type, F(2,168) = 24.5, p < 0.05; trial, F(1,168) = 8.8, p < 0.05; interaction, F(2,168) = 0.28, p = 0.75; Reward epoch: trial-type, F(2,168) = 29.6, p < 0.05; trial, F(1,168) = 16.4, p < 0.05; interaction, F(2,168) = 1.3, p = 0.75).
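The degrees of freedom in these F statistics follow from the design, and the bookkeeping can be checked with a short sketch. It assumes a standard fixed-effects two-way ANOVA with an interaction term, which the text does not state explicitly; the function names are hypothetical. Under that assumption, an error term of 168 with a 3 × 2 design implies 174 observations in total.

```python
def two_way_anova_df(levels_a, levels_b, n_obs):
    """Degrees of freedom for factor A, factor B, A x B, and error.

    Assumes a fixed-effects two-way design with interaction.
    """
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    df_error = n_obs - levels_a * levels_b
    return df_a, df_b, df_ab, df_error


def n_obs_from_error_df(levels_a, levels_b, df_error):
    """Total number of observations implied by a reported error df."""
    return df_error + levels_a * levels_b


if __name__ == "__main__":
    # 3 trial-types x 2 trials (first, last), error df = 168 as reported.
    n = n_obs_from_error_df(3, 2, 168)
    print(n, two_way_anova_df(3, 2, n))
```

The numerator values of 2, 1, and 2 reported for trial-type, trial, and the interaction match what this design predicts.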

During observe-rat (green) and observe-empty (red) trials, cue-related DA release was significantly attenuated relative to when the CS was a reliable predictor that reward would be delivered to the rat being recorded from; i.e., receive-reward trials (Fig. 3A and C; green versus blue: Tukey's, t64 = 2.35, p < 0.05; red versus blue: Tukey's, t64 = 3.49, p < 0.05). Cue-evoked DA release was not significantly different between observe-rat (green) and observe-empty (red) trial types (Fig. 3A and C; Tukey's; t21 = 1.30, p = 0.21).

In the reward epoch, receipt of reward by the recorded rat elicited the most robust DA release (Fig. 3E; blue versus green: Tukey's, t64 = 2.44, p < 0.05; blue versus red: Tukey's, t64 = 4.53, p < 0.05). DA release during delivery to the recorded rat (blue) and during observation of the conspecific receiving reward (green) were both significantly higher than delivery of reward to an empty box during the first trial (Fig. 3E; blue versus red: Tukey's, t64 = 4.53, p < 0.05; green versus red: Tukey's, t21 = 2.70, p < 0.05). There was no significant difference between observe-rat and observe-empty during the last trials in each block (Fig. 3B and F; t21 = 0.41, p = 0.69).

To explore the dynamics of DA release after the initial increase observed on the first trial, we plotted average DA release over time for the second trial. For comparison, we re-plotted the first trials for each trial-type (Fig. 4A and B). Remarkably, the initial release of DA present on the first observe-rat trial was not present on the second trial. In fact, release was dramatically reduced after delivery of reward on the second trial (Fig. 4A; thin green) compared to that observed during the first trial of observation (Fig. 4A; thick green). The same degree of change was not observed during the second trial of observe-empty trials (Fig. 4B; thick versus thin red).

Figure 4. Modulation of DA release depends on trial number.

A and B. Average DA concentration over time during the first (thick) and second (thin) trial of each trial type (11 sessions; 4 rats). C and D. Average DA release for all three trial-types taken during the first 6 trials and the last trial in each block for cue (C) and reward (D) epoch. Error bars are standard error of the mean. E. Orientation to the left side of the box by the recording rat during observe-rat and observe-empty trial blocks. Please see methods for detail about video scoring. *Wilcoxon; p < 0.05. Blue: Receive-reward; Green: Observe-rat; Red: Observe-empty.

To quantify DA release over each trial block, we plotted the average DA concentration over the cue (Fig. 4C) and reward (Fig. 4D) epochs during the first six trials. A multi-factor ANOVA exhibited main and interaction effects with trial-type and trial number (Cue epoch: trial-type, F(2,593) = 114.8, p < 0.05; trial, F(6,593) = 3.6, p < 0.05; interaction, F(12,593) = 1.7, p = 0.06; Reward epoch: trial-type, F(2,593) = 99.4, p < 0.05; trial, F(6,593) = 5.21, p < 0.05; interaction, F(12,593) = 2.31, p < 0.05). Over the course of the first several trials during observe-rat blocks, release declined in both epochs, reaching a minimum during trial four (see supplemental figure 3 for individual rat plots). For both cue and reward epochs, release during trial four of observe-rat blocks was significantly different from trial one (Fig. 4C and D; green trial one versus trial four; t21's > 3.57, p's < 0.05) and from observe-empty trials (Trial 4; red versus green; t21's > 2.17; p < 0.05; see supplemental for analysis of release later in the trial).

Changes in DA release might reflect changes in orientation toward the rewarding side of the box. In a final analysis we examined the observer's orientation toward the conspecific's side of the box during the first 6 and last trials. Orientation declined over the first 6 trials. Rats oriented less to the left side of the box on the last trial relative to the first trial during observation blocks (Wilcoxon; p < 0.05; Fig. 4E; see methods for more details); however, there was no significant difference between observe-rat and observe-empty trials on the first trial. Interestingly, orienting to the rewarding side of the box declined for both observation trial-types, but this decline occurred earlier during observe-rat trials (Fig. 4E). During observation of the conspecific, significantly less orienting was observed by the fourth trial relative to the first (Fig. 4E; green; Wilcoxon, p < 0.05), whereas during observation of the empty box significant differences between the first and subsequent trials were not present until the very last trial (Fig. 4E; red; Wilcoxon, p < 0.05). Thus, the decline matched that observed by the fourth trial when examining average DA release during observe-rat trials (see supplemental for further analysis). Combined with the USV data, these results suggest that observing the conspecific receive reward is initially appetitive and then becomes aversive, at which point observers orient away from the conspecific.

DISCUSSION

Here we show that DA release in VS is modulated by the presence of a conspecific engaged in the pursuit of reward. Increased and decreased DA release around the time of reward delivery do not just reflect reward anticipation or the inability to obtain reward, respectively, because these responses were not present when the box was empty. Both increases and decreases in DA release were observed in response to trial events, and therefore cannot reflect the presence of the other rat [16, 17]. We conclude that DA is not exclusively modulated when rats receive reward themselves, but also during observation of a conspecific receiving reward.

Remarkably, DA release during observation of reward delivery to the conspecific was only present during the first trial. Fifty kHz calls were more prominent during the first trial of observation [11-14], suggesting that observing rats found the first trial of conspecific reward delivery to be a positive affective event. During trials 2-4, DA signals dropped below what was observed on observe-empty trials, but then returned to comparable levels by trial 5. This corresponds remarkably well to changes in 50 and 22 kHz calls. Previous work has shown decreases and increases in 50 and 22 kHz calls during extinction, respectively [11-14]. Here, we show increased 22 kHz calls after rats observe conspecifics receive reward for several trials. The rapid decline in the DA signal on observe-rat trials compared to observe-empty trials likely reflects a negative affective state associated with this trial type in relation to previous receipt of reward to oneself. Strong emotional reactions and changes in DA release should more strongly modify behavior during observe-rat trials. Indeed, rats oriented away from the left side of the box faster for observe-rat trials relative to observe-empty trials. It is not clear why rats orient away from the conspecific so quickly on observe-rat trials, but one intriguing possibility is that they experience such a negative affective state (e.g., frustration, jealousy) when watching another rat eat while they themselves are not eating that they quickly look away. Certainly, this interpretation fits with the decrease and increase in 50 and 22 kHz USVs over the first several trials.

Although DA release was modulated during observation of reward delivery to a conspecific, its amplitude was smaller compared to when the recorded rat performed blocks in which it obtained reward. This suggests that DA release does not reflect the relationship between the cue and the reward, as this did not change over the course of the experiment. In all blocks, the CS always predicted that reward would be delivered 10 s later. This raises the intriguing notion that cue- and reward-related DA signals are dependent on whether or not the reward is to be consumed by oneself, another rat, or not at all.

We conclude that delivery of reward to a conspecific modulates DA release and affective states of rats during observation. Considering DA's role in reinforcement learning we hypothesize that this signal is critical for animals to learn by watching the actions of others when those actions are adaptive (i.e., they result in reward) [1-10]. We certainly cannot unambiguously prove this here because there was no instrumental response to be learned, but previous work has shown that animals can learn to perform behaviors through observation, thus paving the way for interesting future studies [18-21].

Methods

Electrode fabrication, surgical procedure, histology, and electrochemical detection of dopamine are the same as previously reported and described in detail in the supplemental material [22]. Task parameters are described in the text and figure 1. For more detail see supplemental material.

ACKNOWLEDGEMENTS

This work was supported by University of Maryland College Park/School of Medicine Seed grant and funds from NIDA (R01DA031695, MR). We thank Ronny Gentry, Erik Oleson, and Roger Cachope for training related to collection and analysis of fast-scan cyclic voltammetry.

REFERENCES

1. Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci. 2007;10:1020–1028. doi: 10.1038/nn1923.
2. Oleson EB, Gentry RN, Chioma VC, Cheer JF. Subsecond dopamine release in the nucleus accumbens predicts conditioned punishment and its successful avoidance. J Neurosci. 2012;32:14804–14808. doi: 10.1523/JNEUROSCI.3087-12.2012.
3. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
4. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141. doi: 10.1016/j.neuron.2005.05.020.
5. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nat Neurosci. 2006;9:1057–1063. doi: 10.1038/nn1743.
6. Roitman MF, Stuber GD, Phillips PE, Wightman RM, Carelli RM. Dopamine operates as a subsecond modulator of food seeking. J Neurosci. 2004;24:1265–1271. doi: 10.1523/JNEUROSCI.3823-03.2004.
7. Bromberg-Martin ES, Matsumoto M, Hikosaka O. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron. 2010;68:815–834. doi: 10.1016/j.neuron.2010.11.022.
8. Roesch MR, Calu DJ, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat Neurosci. 2007;10:1615–1624. doi: 10.1038/nn2013.
9. Pan WX, Schmidt R, Wickens JR, Hyland BI. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci. 2005;25:6235–6242. doi: 10.1523/JNEUROSCI.1478-05.2005.
10. Schultz W. Updating dopamine reward signals. Current opinion in neurobiology. 2013;23:229–238. doi: 10.1016/j.conb.2012.11.012.
11. Knutson B, Burgdorf J, Panksepp J. Ultrasonic vocalizations as indices of affective states in rats. Psychological bulletin. 2002;128:961–977. doi: 10.1037/0033-2909.128.6.961.
12. Burgdorf J, Knutson B, Panksepp J. Anticipation of rewarding electrical brain stimulation evokes ultrasonic vocalization in rats. Behavioral neuroscience. 2000;114:320–327.
13. Wohr M, Schwarting RK. Affective communication in rodents: ultrasonic vocalizations as a tool for research on emotion and motivation. Cell and tissue research. 2013;354:81–97. doi: 10.1007/s00441-013-1607-9.
14. Burgdorf J, Panksepp J, Moskal JR. Frequency-modulated 50 kHz ultrasonic vocalizations: a tool for uncovering the molecular substrates of positive affect. Neuroscience and biobehavioral reviews. 2011;35:1831–1836. doi: 10.1016/j.neubiorev.2010.11.011.
15. Willuhn I, Tose A, Wanat MJ, Hart AS, Hollon NG, Phillips PE, Schwarting RK, Wohr M. Phasic Dopamine Release in the Nucleus Accumbens in Response to Pro-Social 50 kHz Ultrasonic Vocalizations in Rats. J Neurosci. 2014;34:10616–10623. doi: 10.1523/JNEUROSCI.1060-14.2014.
16. Robinson DL, Zitzman DL, Smith KJ, Spear LP. Fast dopamine release events in the nucleus accumbens of early adolescent rats. Neuroscience. 2011;176:296–307. doi: 10.1016/j.neuroscience.2010.12.016.
17. Robinson DL, Heien ML, Wightman RM. Frequency of dopamine concentration transients increases in dorsal and ventral striatum of male rats during introduction of conspecifics. J Neurosci. 2002;22:10477–10486. doi: 10.1523/JNEUROSCI.22-23-10477.2002.
18. Jeon D, Kim S, Chetana M, Jo D, Ruley HE, Lin SY, Rabah D, Kinet JP, Shin HS. Observational fear learning involves affective pain system and Cav1.2 Ca2+ channels in ACC. Nat Neurosci. 2010;13:482–488. doi: 10.1038/nn.2504.
19. Zentall TR, Levine JM. Observational learning and social facilitation in the rat. Science. 1972;178:1220–1221. doi: 10.1126/science.178.4066.1220.
20. Saggerson AL, Honey RC. Observational learning of instrumental discriminations in the rat: the role of demonstrator type. Q J Exp Psychol (Colchester). 2006;59:1909–1920. doi: 10.1080/17470210600705032.
21. Leggio MG, Graziano A, Mandolesi L, Molinari M, Neri P, Petrosini L. A new paradigm to analyze observational learning in rats. Brain Res Brain Res Protoc. 2003;12:83–90. doi: 10.1016/j.brainresprot.2003.08.001.
22. Oleson EB, Gentry RN, Chioma VC, Cheer JF. Subsecond dopamine release in the nucleus accumbens predicts conditioned punishment and its successful avoidance. J Neurosci. 2012;32:14804–14808. doi: 10.1523/JNEUROSCI.3087-12.2012.
