eLife. 2022 Feb 3;11:e72549. doi: 10.7554/eLife.72549

Cell-type-specific responses to associative learning in the primary motor cortex

Candice Lee 1, Emerson F Harkin 1, Xuming Yin 1, Richard Naud 1,2,3,4, Simon Chen 1,3,4
Editors: Jun Ding5, Michael J Frank6
PMCID: PMC8856656  PMID: 35113017

Abstract

The primary motor cortex (M1) is known to be a critical site for movement initiation and motor learning. Surprisingly, it has also been shown to possess reward-related activity, presumably to facilitate reward-based learning of new movements. However, whether reward-related signals are represented among different cell types in M1, and whether their response properties change after cue–reward conditioning, remain unclear. Here, we performed longitudinal in vivo two-photon Ca2+ imaging to monitor the activity of different neuronal cell types in M1 while mice engaged in a classical conditioning task. Our results demonstrate that most of the major neuronal cell types in M1 showed robust but differential responses to both the conditioned cue stimulus (CS) and reward, and their response properties undergo cell-type-specific modifications after associative learning. PV-INs’ responses became more reliable to the CS, while VIP-INs’ responses became more reliable to reward. Pyramidal neurons only showed robust responses to novel reward, and they habituated to it after associative learning. Lastly, SOM-INs’ responses emerged and became more reliable to both the CS and reward after conditioning. These observations suggest that cue- and reward-related signals are preferentially represented among different neuronal cell types in M1, and the distinct modifications they undergo during associative learning could be essential in triggering different aspects of local circuit reorganization in M1 during reward-based motor skill learning.

Research organism: Mouse

Introduction

The primary motor cortex (M1) is an essential site for movement execution and motor learning. Within M1, neurons encode movement goals and movement kinematics (Georgopoulos et al., 1992; Moran and Schwartz, 1999; Peters et al., 2014). Intriguingly, neurons in M1 have also been reported to show reward-related activity. In vivo recording studies performed in nonhuman primates found neurons in M1 that encode reward anticipation, reward delivery, and mismatches between the two (Marsh et al., 2015; Ramakrishnan et al., 2017; Ramkumar et al., 2016). In human subjects, reward has also been shown to modulate M1 activity, likely through an inhibitory circuit-dependent mechanism (Thabit et al., 2011). However, it remains unclear how reward-related responses are represented in M1, and if the representation changes with associative learning.

It was recently shown that in well-trained mice performing a skilled reaching task, a subset of layer 2/3 (L2/3) pyramidal neurons (PNs) in M1 specifically report successful, but not failed, reach-and-grasp movements. In contrast, a different subset of PNs report only failed reach-and-grasp movements (Levy et al., 2020). Since the ability to use past experience to learn action–outcome associations is critical to survival, encoding the outcome in M1 may be an important part of motor skill learning. It is widely accepted that associative learning using reinforcement can accelerate and enhance learning (Abe et al., 2011; Nikooyan and Ahmed, 2015). In the case of motor learning, studies have demonstrated that positive feedback (reward) facilitates motor memory retention and negative feedback (punishment) speeds up the learning process (Galea et al., 2015). One hypothesis is that during learning, reward signals in the brain, together with neuromodulators and synaptic plasticity, are involved in potentiating and optimizing the neural circuitry in M1 that underlies the rewarded movement. Implementing such a learning process would necessitate the interplay between different cell types within the local microcircuitry (Richards et al., 2019).

M1, like other cortical areas, is densely packed with PNs and diverse inhibitory interneuron (IN) types and is wired in a delicately balanced and intricate circuit. Different IN subtypes have been shown to have distinct gene expression profiles, electrophysiological properties, and connectivity motifs (Fishell and Rudy, 2011; Markram et al., 2004). Somatostatin-, parvalbumin-, and vasoactive intestinal peptide-expressing inhibitory neurons (SOM-INs, PV-INs, and VIP-INs, respectively) are three major nonoverlapping subtypes of GABAergic neurons that broadly form a common microcircuit motif in the cortex. Some studies have demonstrated that SOM-INs preferentially target distal dendrites of PNs to filter synaptic inputs, fast-spiking PV-INs preferentially target perisomatic regions of PNs enabling strong inhibition of spiking, and VIP-INs regulate local microcircuits by controlling other local INs (Pfeffer et al., 2013). Due to their diverse properties and strategic connectivity motifs, these INs exert fine control over local network activity and provide a potential mechanism for how the brain processes reward signals and ultimately uses this information to optimize neural activity related to learned motor skills.

Multiple studies using in vivo opto-recordings in the primary visual cortex have shown that visual orientation selectivity in PNs is modulated and sharpened by PV- and SOM-INs (Atallah et al., 2012; Lee et al., 2012; Wilson et al., 2012). In the primary auditory cortex, PV- and SOM-INs exert analogous control over PN frequency tuning (Seybold et al., 2015). Moreover, in the auditory cortex, prefrontal cortex, and basolateral amygdala, reinforcement signals such as reward and punishment have been shown to recruit VIP-INs, which in turn, inhibit SOM- and PV-INs (Krabbe et al., 2019; Pi et al., 2013). The subtype-specific roles of these INs have long been elusive, but a complex picture is emerging where INs are not only responsible for maintaining a delicate balance of excitation and inhibition, but are also actively involved in processing activity in the cortex (Lee et al., 2020; Wood et al., 2017).

Here, we employed chronic in vivo two-photon imaging, combined with a head-fixed classical conditioning task, to monitor the activity of the same population of PNs, PV-INs, SOM-INs, or VIP-INs before and after associative learning to investigate whether and how conditioned cue stimulus (CS) and unconditioned reward are represented among different neuronal cell types in M1. Our results demonstrate that all four major cell types in M1 show distinct responses to CS and reward, and their response properties undergo cell-type-specific modifications after associative learning. Notably, PV-INs and VIP-INs exhibited stimulus-specific modifications, in which PV-INs became more reliably responsive to the CS but not to the reward, whereas VIP-INs became more reliable to the reward but not to the CS. PNs initially showed robust responses to novel reward but became habituated to it after associative learning. Lastly, SOM-IN responses emerged with learning and responded more reliably to both the CS and reward. Taken together, these results show that cue- and reward-related signals are preferentially represented among major neuronal cell types in M1, and they undergo cell-type-specific modifications during associative learning, indicating they may have distinct roles in integrating reinforcement signals to promote circuit reorganization in M1 during motor skill learning.

Results

To understand how reward-associated signals are represented within the local microcircuitry in M1 before and after associative learning, we established a head-fixed auditory cued reward conditioning task, which allowed us to combine the task with in vivo two-photon Ca2+ imaging to examine the response properties of different neuronal cell-type populations in awake and behaving mice (Figure 1A). In this task, water-restricted mice were exposed to a conditioned stimulus (CS; auditory tone, 1-s duration), followed by a 1.5-s delay and then the delivery of the unconditioned stimulus (US; water reward, ~10 µl). Mice were trained for ~30–35 trials/session (1 session/day for 7 days) with randomly varied intertrial intervals (ITIs) between 60 and 120 s (Figure 1B). Since M1 is known to be involved in movement initiation and motor skill learning, we chose to use a simple classical conditioning task with just an auditory tone paired with reward and omitted any additional training where mice would be required to learn a new movement. The rationale for this is that many neuronal cell types, including PNs, PV-, and SOM-INs, have been shown to undergo modifications when mice acquire new movements (Chen et al., 2015; Cichon and Gan, 2015; Donato et al., 2013; Xu et al., 2009). However, since licking is an innate movement that does not induce plastic changes in adult mice (Chen et al., 2015; Komiyama et al., 2010; Peters et al., 2014), we can reliably attribute changes in neuronal activity over the course of the task to associative learning, rather than motor learning.
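The trial structure described above can be sketched as a simple schedule generator. This is an illustration only, not the authors' acquisition code: the function name `make_session`, the uniform ITI distribution, and the choice to measure the ITI from reward delivery to the next CS onset are all assumptions, since the text states only the durations and the 60–120 s range.

```python
import random

CS_DURATION_S = 1.0          # auditory tone duration (from the text)
DELAY_S = 1.5                # delay between CS offset and reward (from the text)
ITI_RANGE_S = (60.0, 120.0)  # randomly varied ITI range (from the text)


def make_session(n_trials=30, seed=0):
    """Return a list of trials with CS onset and reward times in seconds.

    Assumption: ITIs are drawn uniformly and run from reward delivery to
    the next CS onset; the paper gives only the 60-120 s range.
    """
    rng = random.Random(seed)
    trials = []
    t = rng.uniform(*ITI_RANGE_S)  # initial wait before the first CS
    for _ in range(n_trials):
        cs_onset = t
        reward_time = cs_onset + CS_DURATION_S + DELAY_S
        trials.append({"cs_onset": cs_onset, "reward_time": reward_time})
        t = reward_time + rng.uniform(*ITI_RANGE_S)
    return trials


session = make_session(n_trials=30)
```

Under these assumptions, reward always follows CS onset by 2.5 s, which matches the 2.5-s analysis window used for CS responses later in the Results.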

Figure 1. Associative learning during a head-fixed classical conditioning task.

(A) Schematic of head-fixed classical conditioning task. (B) Trial structure. (C) Mean lick rate per second on days 1 and 7. Binned over 0.5-s intervals. Lick rate following cue stimulus (CS) onset up to reward delivery time is higher on day 7. Two-way analysis of variance (ANOVA), ***p < 1 × 10−3, effect of time: p < 1 × 10−3, effect of day: p < 1 × 10−3. (D) Mean anticipatory lick rate across trials within days 1 and 7 sessions. Mean anticipatory lick rate was calculated from CS onset to end of delay period. Two-way ANOVA, effect of trial number: p = 0.91, effect of day: p < 1 × 10−3. (E) Mean lick rate during the first 2.5 s of intertrial interval (ITI) lick bouts, 2.5 s following CS onset and 2.5 s following reward delivery on day 1. Each point is the mean from an individual mouse. One-way ANOVA with Tukey–Kramer correction for multiple comparisons, ITI vs. CS: p = 0.97, ITI vs. reward: p < 1 × 10−3, CS vs. reward: p < 1 × 10−3. (F) Mean lick rate during the first 2.5 s of ITI lick bouts, 2.5 s following CS onset and 2.5 s following reward delivery on day 7. Each point is the mean from an individual mouse. One-way ANOVA with Tukey–Kramer correction for multiple comparisons, ITI vs. CS: p = 1.08 × 10−3, ITI vs. reward: p < 1 × 10−3. n = 23 mice. **p < 0.01, ***p < 0.001. Error bars show standard error of the mean (SEM).

Mice learned to associate the CS with the reward after 7 days, as shown by an increase in anticipatory lick rate, a conditioned response, following the cue onset on day 7 compared to day 1 (Figure 1C). On a trial-by-trial basis, anticipatory lick rate did not change significantly within a single session on either day 1 or day 7, implying limited within-session improvements (Figure 1D). To ensure the increase in lick rate was specific to the CS, we compared the mean lick rate during the CS and reward period to the lick rate during self-initiated spontaneous lick bouts in the ITI (in the absence of the CS or reward). To be consistent with the 2.5-s analysis window for CS responses, we analyzed the first 2.5 s of ITI lick bouts and 2.5 s following reward delivery. On day 1, the mean lick rates during ITI lick bouts (1.47 ± 0.17/s) and the CS (1.37 ± 0.17/s) were similar, while the lick rate following reward delivery was significantly higher (7.16 ± 0.48/s). In contrast, on day 7 following associative learning, the lick rate during the CS period (3.42 ± 0.37/s) was significantly higher than during ITI lick bouts (1.71 ± 0.2/s), demonstrating that the mice effectively learned the CS–reward association by day 7 (Figure 1E, F).
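The windowed lick-rate comparison above amounts to counting licks in a fixed 2.5-s window after each event onset and averaging across events. The following is a minimal sketch of that calculation; the function names and the toy timestamps are hypothetical and are not taken from the paper's analysis code.

```python
def lick_rate(lick_times, window_start, window_s=2.5):
    """Licks per second in [window_start, window_start + window_s)."""
    n = sum(window_start <= t < window_start + window_s for t in lick_times)
    return n / window_s


def mean_rate(lick_times, onsets, window_s=2.5):
    """Average the windowed lick rate over a list of event onsets."""
    rates = [lick_rate(lick_times, t0, window_s) for t0 in onsets]
    return sum(rates) / len(rates)


# Hypothetical toy data: lick timestamps (s) and two CS onsets (s).
licks = [10.2, 10.6, 11.0, 11.9, 12.3, 50.1, 51.0]
cs_onsets = [10.0, 50.0]
anticipatory = mean_rate(licks, cs_onsets)
```

The same function can be applied to ITI lick-bout onsets or reward delivery times by passing a different list of onsets, which keeps the three conditions on an identical window length.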

To investigate the activity of different neuronal cell types during this task, we used in vivo two-photon Ca2+ imaging of different cell-type populations. To target PNs in M1, we injected an adeno-associated virus (AAV) carrying a Ca2+ indicator (GCaMP6f) driven by the CaMKII promoter (AAV1.CaMKIIa.GCaMP6f) into M1 of wild-type B6129S mice. After 3–5 weeks, we recorded the activity of hundreds of L2/3 PNs using two-photon microscopy in awake mice while they underwent the head-fixed conditioning task, and we tracked the same population of neurons on days 1 and 7 (Figure 2A). We identified all the active neurons within a session, irrespective of the behavioral task (see Methods), and sorted neurons by the timing of their peak activity relative to the CS onset. It was apparent that there were subpopulations of neurons more responsive to CS, reward, or both (Figure 2B, C). We also repeated the experiments to examine whether the major IN subtypes in M1 respond to the CS and reward during the conditioning task. To do this, we injected AAV-Syn-Flex-GCaMP6f in PV-Cre, SOM-Cre, or VIP-Cre transgenic mice to selectively express GCaMP6f in PV-INs, SOM-INs, or VIP-INs, respectively, and then performed in vivo two-photon Ca2+ imaging to monitor the response properties of the same population of INs on days 1 and 7 after associative learning (Figure 2—figure supplements 1 and 2). We compared the mean percentage of active cells within the entire session to ensure all cell types had a similar proportion of active cells (irrespective of the behavioral task) on days 1 and 7 (Figure 2D).

Figure 2. Longitudinal in vivo Ca2+ imaging of neuronal responses in M1 during a classical conditioning task.

(A) Experimental timeline (top). In vivo two-photon imaging of L2/3 pyramidal neurons (PNs) expressing GCaMP6f in M1 (bottom left). The same population can be tracked from days 1 to 7 (bottom right). Yellow arrows indicate example tracked neurons across days. (B) Z-Scored fluorescence traces from 13 neurons (top), and the corresponding licking measured with the lick-o-meter (bottom) from the same mouse and same trial on day 1. Gray bar represents the timing of the cue stimulus (CS). Dotted red line indicates the onset of water reward delivery. (C) Z-Scored activity of all the active neurons from an example mouse during one representative trial on days 1 and 7, sorted by timing of maximum activity following the CS onset. Gray bar represents the timing of the CS. White line indicates the onset of water reward delivery. (D) Mean percent of active neurons within a session, irrespective of the behavioral task, for PNs, PV-INs, VIP-INs, and SOM-INs on days 1 and 7. All cell types showed a similar percentage of active neurons. One-way analysis of variance (ANOVA), n.s., nonsignificant, day 1: p = 0.38, day 7: p = 0.22. Error bars show standard error of the mean (SEM). (E–H) Mean percent of responsive neurons to CS (top) and reward (bottom) within 2.5 s of CS/reward onset for each cell type. Violin plots show null distribution of percentage of responsive neurons made by randomly resampling mice and shuffling the session, 1000 times (see Methods). The circle represents the mean percentage of tone- or reward-responsive neurons. Monte-Carlo with Bonferroni correction, n.s., nonsignificant, *p < 0.05, ***p < 1 × 10−3. 
PN CS day 1: p = 0.582, CS day 7: p = 0.423, Reward day 1: p < 1 × 10−3, Reward day 7: p = 0.015, n = 1029 cells from six mice (E), PV-IN CS day 1: p < 1 × 10−3, CS day 7: p < 1 × 10−3, Reward day 1: p < 1 × 10−3, Reward day 7: p < 1 × 10−3, n = 316 cells from six mice (F), VIP-IN CS day 1: p = 0.039, CS day 7: p < 1 × 10−3, Reward day 1: p < 1 × 10−3, Reward day 7: p < 1 × 10−3, n = 407 cells from four mice (G), SOM-IN CS day 1: p = 0.47, CS day 7: p < 1 × 10−3, Reward day 1: p = 0.033, Reward day 7: p < 1 × 10−3, n = 189 cells from seven mice (H).

Figure 2—figure supplement 1. Z-Scored ∆F traces from cells among each cell type on days 1 and 7.

Example calcium traces from five example cells (same mouse) during a representative intertrial interval (ITI) lick bout on day 1 (left), a trial on day 1 (middle) and a trial on day 7 (right). The same cells are tracked from days 1 to 7. Blue dashed line indicates the time of the first lick in the lick bout (left). Gray shading represents time of the cue stimulus (CS) and the red dashed line indicates the time of the water reward delivery (middle, right).

Figure 2—figure supplement 2. Z-Scored population activity of cells tracked from days 1 to 7 for each cell type.

Neurons were not sorted, and the order of all cells was maintained across the day 1 and day 7 plots. Gray bar represents the timing of the cue stimulus (CS). White line indicates the onset of water reward delivery. Black lines indicate licking.

To examine task-related activity in each cell type, we first compared the mean percent of active cells during the CS and reward to a null distribution made by randomly sampling the session irrespective of the behavioral task, and then calculating the mean percentage of active neurons during the sampled period. By repeating this 1000 times for each cell type on days 1 and 7, we created a distribution of the percentage of active neurons that were present at baseline levels or by chance. Surprisingly, we found that only the PV-IN and VIP-IN cell types had a percent of CS- and reward-responsive cells that was significantly greater than chance level on both days 1 and 7 (PV-IN CS: day 1: 15.26% ± 2.11%, day 7: 24.09% ± 2.98%; PV-IN reward: day 1: 20.17% ± 3.27%, day 7: 27.47% ± 3.66%; VIP-IN CS: day 1: 11.29% ± 3.23%, day 7: 16.59% ± 2.01%; VIP-IN reward: day 1: 18.65% ± 5.91%, day 7: 26.3% ± 6.04%; Figure 2F, G). PN responses to the CS were not different from the null distribution on either day 1 or day 7 (day 1: 18.42% ± 1.05%, day 7: 16.41% ± 1.33%; Figure 2E); in contrast, PN responses to the reward were significantly higher than the null distribution on day 1 but significantly lower on day 7 (day 1: 23.95% ± 2.49%, day 7: 12.12% ± 0.95%; Figure 2E). Lastly, SOM-INs showed significant responses to the CS and reward only on day 7 following associative learning, while on day 1, they demonstrated no response to the CS and a modest response to the reward (CS: day 1: 5.53% ± 2.7%, day 7: 13.56% ± 3.17%; Reward: day 1: 9% ± 3.66%, day 7: 12.15% ± 3.93%; Figure 2H). Based on these findings, we decided to examine only the sessions where the percent of active cells was significantly greater than the null distribution in our subsequent analysis, as a nonsignificant percent of active cells during the stimulus period cannot be readily distinguished from nontask-related baseline noise.
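The comparison against a randomly sampled null distribution can be sketched as a Monte-Carlo test. This is a simplified illustration under stated assumptions, not the authors' procedure: it works on a hypothetical binary event matrix (cells by frames), draws random windows from a single session, omits the resampling of mice mentioned in the Figure 2 caption, and reports a one-sided upper-tail p-value. All function and variable names are made up for the example.

```python
import random


def percent_active(events, starts, win):
    """Mean percent of cells with at least one event per window of `win` frames."""
    n_cells = len(events)
    per_window = []
    for s in starts:
        active = sum(any(cell[s:s + win]) for cell in events)
        per_window.append(100.0 * active / n_cells)
    return sum(per_window) / len(per_window)


def monte_carlo_p(events, stim_starts, win, n_iter=1000, seed=0):
    """One-sided p-value: share of null draws at or above the observed value."""
    rng = random.Random(seed)
    n_frames = len(events[0])
    observed = percent_active(events, stim_starts, win)
    null = []
    for _ in range(n_iter):
        # Sample as many random windows as there are stimulus windows.
        starts = [rng.randrange(n_frames - win + 1) for _ in stim_starts]
        null.append(percent_active(events, starts, win))
    p = (1 + sum(v >= observed for v in null)) / (n_iter + 1)
    return observed, null, p


# Hypothetical toy session: 20 cells, 2000 frames, 5 stimulus windows.
toy_rng = random.Random(1)
n_cells, n_frames, win = 20, 2000, 25
stim_starts = [100, 500, 900, 1300, 1700]
events = [[0] * n_frames for _ in range(n_cells)]
for cell in events:
    for _ in range(5):
        cell[toy_rng.randrange(n_frames)] = 1  # sparse background events
for cell in events[:10]:
    for s in stim_starts:
        cell[s + 5] = 1                        # stimulus-locked events

observed, null, p = monte_carlo_p(events, stim_starts, win)
```

A test for responses below chance, as reported for PN reward responses on day 7, would need the lower tail or a two-sided variant; the sketch covers only the upper tail.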

We began our analysis on PV-INs and VIP-INs because they both showed significant responses to both the CS and reward on days 1 and 7. To understand how their representations of reward and reward-associated cues changed over the course of learning, we first analyzed the tuning of individual cells to identify their response properties in an unbiased manner. By quantifying the tuning of each cell’s average response during the CS and reward response periods (2.5-s window) using the nonparametric Spearman correlation ρ (see Methods), we observed a wide range of tuning coefficients to the CS and reward, with a small proportion that was strongly positively or negatively tuned to the CS or reward stimulus (tuning coefficient near –1 or 1; Figure 3A–D), consistent with our earlier analyses demonstrating that neurons in M1 show activity associated with the CS or reward during the conditioning task. We next examined whether the tuning coefficient within each cell type changed after associative learning by calculating the change in tuning coefficients for each cell between days 1 and 7. Again, to validate our findings, we compared these values to a null distribution of Δρ values obtained by randomly sampling the two sessions (see Methods). The PV-IN population did not show any significant changes in either CS or reward tuning between days 1 and 7 ($\overline{\Delta\rho}_{\mathrm{CS}} = -0.049 \pm 0.046$, $\overline{\Delta\rho}_{\mathrm{reward}} = 0.014 \pm 0.054$; Figure 3E, F), indicating that neither CS- nor reward-related tuning became stronger after associative learning. In contrast, VIP-INs’ CS tuning did not change significantly between days 1 and 7 ($\overline{\Delta\rho}_{\mathrm{CS}} = -0.065 \pm 0.048$), but VIP-INs’ reward tuning significantly increased on day 7 ($\overline{\Delta\rho}_{\mathrm{reward}} = 0.161 \pm 0.086$; Figure 3G, H), suggesting a strengthening of VIP-IN responsivity to reward following associative learning.
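To make the tuning coefficient concrete, here is a minimal sketch of a Spearman-based tuning measure and its change Δρ between two days. The exact definition is given in the paper's Methods and is not reproduced here; this example assumes that tuning is the Spearman correlation between a cell's trial-averaged trace and a boxcar marking the response window. The hand-rolled rank and correlation helpers, and the toy traces, are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tuning(trace, win_start, win_len):
    """Assumed definition: rho between a trial-averaged trace and a boxcar
    that is 1 inside the response window and 0 elsewhere."""
    boxcar = [1 if win_start <= i < win_start + win_len else 0
              for i in range(len(trace))]
    return spearman(trace, boxcar)


# Hypothetical trial-averaged traces for one cell (window = frames 4 to 7).
day1 = [0.1, 0.0, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.0, 0.1]
day7 = [0.1, 0.0, 0.2, 0.1, 0.9, 1.2, 1.0, 0.8, 0.0, 0.1]
delta_rho = tuning(day7, 4, 4) - tuning(day1, 4, 4)
```

In this toy case the day 7 trace is concentrated in the response window, so Δρ is positive, which is the direction reported for VIP-IN reward tuning.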

Figure 3. Learning-associated changes in single-neuron tuning properties in M1.

Example fluorescence traces, color-coded based on nonparametric Spearman correlation with cue stimulus (CS) (A) or reward (B). Each trace is a trial-averaged response from a different neuron on day 7. Trial-averaged fluorescence of all the neurons recorded on day 7, sorted by the value of the Spearman correlation (−1 to 1) with CS (C) or reward (D). Neurons active during the CS- or reward-responsive period showed higher tuning coefficients. Left, distribution of changes in Spearman correlation Δρ with CS (E, G) or reward (F, H) for PV-INs (top) and VIP-INs (bottom). Each curve represents a Gaussian kernel density estimate of the distribution of Δρ in a single mouse. Right, mean change in Spearman correlation $\overline{\Delta\rho}$ for PV-INs and VIP-INs. Null distributions (gray) were estimated by resampling each mouse and shuffling trials 1000 times (see description of calculation of tuning coefficients in Methods). VIP-IN reward tuning significantly increased with associative learning. Monte-Carlo, ***p < 1 × 10−3, n.s., nonsignificant, PV-IN CS p = 0.12 (E), PV-IN Reward p = 0.61 (F), VIP-IN CS p = 0.082 (G), VIP-IN Reward p < 1 × 10−3 (H). PV-IN: n = 316 cells from six mice. VIP-IN: n = 407 cells from four mice.

Although the tuning properties can reveal changes in task-related responsivity, this analysis is limited in identifying changes at the trial-by-trial level. When we assessed population activity following the CS onset (Figure 4A), it was apparent that a group of PV-INs and VIP-INs were responsive to CS on both days 1 and 7 (Figures 4B and 5B). Hence, by identifying and tracking the same neurons from days 1 to 7, we were able to ask if there was (1) an increase in the number of neurons being recruited as CS or reward responsive during associative learning or (2) a change in the trial-by-trial reliability of CS and reward responses. When we compared the mean percent of CS-responsive neurons on days 1 and 7, we found that the average percent of CS-responsive PV-INs during a trial increased significantly by day 7 (day 1: 15.26% ± 2.11%, day 7: 24.09% ± 2.98%; Figure 4C), while the percent of CS-responsive VIP-INs did not change (day 1: 11.29% ± 3.23%, day 7: 16.59% ± 2.01%; Figure 4D), demonstrating that more PV-INs became responsive to the CS after associative learning. We then assessed the reliability of the responses, defined as the percent of trials within a session where a neuron was responsive to the CS. This measure quantifies how consistently a neuron responded to the CS within a session. We first plotted the cumulative distribution function of reliabilities among all PV-INs and VIP-INs. We observed that PV-INs, as a population, were significantly more reliable in their CS responses than VIP-INs on day 1 (Figure 4E). When we sorted neurons based on their day 1 reliability values and followed them to day 7, we observed that many of the PV-INs that initially had Low Reliability to CS became more responsive on day 7 (Figure 4H); therefore, we grouped neurons into ‘High Reliability’ if they were among the top 50th percentile, while neurons in the lower 50th percentile were deemed ‘Low Reliability’. 
We found that PV-INs that began as highly reliable maintained their reliability to the CS (day 1: 29.8% ± 1.51%, day 7: 33.87% ± 4.72%), while PV-INs that began as Low Reliability became significantly more reliable (day 1: 8.47% ± 0.46%, day 7: 18.99% ± 3.76%; Figure 4F). In contrast, the reliability of both High and Low Reliability VIP-INs did not change (High Reliability: day 1: 26.55% ± 2.62%, day 7: 25.93% ± 3.81%; Low Reliability: day 1: 6.32% ± 0.76%, day 7: 14.24% ± 2.6%; Figure 4G). We then followed individual Low Reliability PV-INs and calculated the change in reliability to the CS (reliabilityCS) from days 1 to 7. As a control, we randomly sampled the day 7 session irrespective of the behavioral task and calculated a reliability value. We then subtracted the actual day 1 CS reliability from that value to generate a randomized change in reliability (reliabilityrandom) for each Low Reliability neuron (see Methods). When we compared the two distributions, we found that among the Low Reliability PV-INs, reliabilityCS was significantly greater than the reliabilityrandom control (Figure 4I). Lastly, we examined whether the onset of neuronal activity after the CS changed after associative learning, and whether any neurons that were previously responsive to the CS became responsive to the reward only. We did not observe a change in the onset of neuronal activity following CS (Figure 4—figure supplement 1A, see Methods). Furthermore, 97% of the CS-responsive PV-INs from day 1 still showed responsivity to CS on day 7 (Figure 4—figure supplement 1C, D). Altogether, these results show that as a population, more PV-INs became responsive to the CS, and this is mainly due to Low Reliability PV-INs that became more reliable following associative learning.
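The reliability measure and the High/Low split described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: responses are represented as hypothetical binary maps (cells by trials), the split is made at the day 1 median, and cells exactly at the median are assigned to the Low group, a tie rule the text does not specify.

```python
from statistics import median


def reliability(responses):
    """Percent of trials on which a cell responded (responses are 0 or 1)."""
    return 100.0 * sum(responses) / len(responses)


def split_by_median(day1_rel):
    """Split cell indices at the day 1 median (50th percentile).

    Assumption: cells exactly at the median go to the Low group.
    """
    cut = median(day1_rel)
    high = [i for i, r in enumerate(day1_rel) if r > cut]
    low = [i for i, r in enumerate(day1_rel) if r <= cut]
    return high, low


# Hypothetical binary response maps: rows are cells, columns are trials.
day1 = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 0, 1, 1], [0, 0, 0, 0]]
day7 = [[1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]]

rel1 = [reliability(c) for c in day1]
rel7 = [reliability(c) for c in day7]
high, low = split_by_median(rel1)

# Change in reliability (day 7 minus day 1) for the Low Reliability cells.
delta_low = [rel7[i] - rel1[i] for i in low]
```

Because the groups are defined from day 1 values and then followed to day 7, the same cell indices are used on both days, mirroring the tracked-cell design of the experiment.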

Figure 4. PV-IN and VIP-IN cue stimulus (CS)-related responses before and after associative learning.

(A) Trial structure. Gray shaded bar represents the response period analyzed for CS-responsive activity. (B) Z-Scored activity of all the active PV-INs from an example mouse during one representative trial on days 1 and 7, sorted by timing of maximum activity following the CS onset. Gray bar represents the timing of the CS. White line indicates the onset of water reward delivery. Mean percent of cells that are responsive to the CS for PV-INs (C) and VIP-INs (D). PV-INs showed an increase in the percent of CS-responsive neurons after reinforcement learning, while VIP-INs did not show any change. Paired t-test, *p < 0.05, n.s., nonsignificant, PV-IN: p = 0.031 (C), VIP-IN: p = 0.38 (D). (E) Cumulative probability plots showing the percent of trials that each neuron responded to the CS for PV-INs and VIP-INs on day 1. Neurons from each cell type were pooled across mice. PV-INs showed significantly greater reliability to the CS than VIP-INs. Kolmogorov–Smirnov test, p < 1 × 10−3. Mean reliability index of cells that are responsive to the CS for PV-INs (F) and VIP-INs (G). Each cell type is divided into High or Low Reliability Group based on the 50th percentile from the cumulative probability plots in (E). High Reliability PV-INs maintained their consistency, while the Low Reliability group became more consistent in their responses to CS. (G) VIP-INs did not show a change in either group after learning. Paired t-test, PV-IN High Reliability: p = 0.38, PV-IN Low Reliability: p = 0.044, VIP-IN High Reliability: p = 0.79, VIP-IN Low Reliability: p = 0.060. (H) CS responses from all tracked PV-INs in one example mouse on days 1 and 7. Left, cumulative distribution of CS response reliability among all tracked cells within the example mouse. Right, binary map of each cell’s CS response (active or not) across all trials on days 1 and 7. Cells were sorted by their day 1 reliability shown on the left and the order is maintained on day 7. 
Cells with low response reliability to CS on day 1 became more reliable on day 7. (I) Cumulative probability plots of the change in reliability from days 1 to 7 (reliabilityCS) among PV-INs in the Low Reliability group for the CS on day 1. Bold red, reliabilityCS of the population of PV-INs. Thin red lines show the reliabilityCS distribution within individual PV-Cre mice. As a control, the day 7 session was randomly sampled and a random reliability was calculated. reliabilityrandom was calculated by subtracting day 1 CS reliability from the day 7 random reliability (gray, reliabilityrandom from the same population of PV-INs). Kolmogorov–Smirnov test, ***p < 1 × 10−3. PV-IN: n = 316 cells from six mice. VIP-IN: n = 407 cells from four mice. Error bars show standard error of the mean (SEM).

Figure 4—figure supplement 1. Response properties of PV-IN and VIP-IN cells tracked from days 1 to 7 are consistent across days.

(A) Mean time from the cue stimulus (CS) onset to Ca2+ event onset for all PV-IN CS responses. Each line represents an individual mouse. The mean event onset time did not change from days 1 to 7. Paired t-test, p = 0.058. (B) Mean time from the first lick in response to reward delivery to Ca2+ event onset for all VIP-IN reward responses. Each line represents an individual mouse. The mean event onset time did not change from days 1 to 7. Paired t-test, p = 0.53. (C) Overall proportion of PV-INs that responded to the CS on day 1 did not change on day 7. Light red indicates the proportion of day 7 CS-responsive cells that also responded to the CS on day 1. Each line represents an individual mouse. Paired t-test, p = 0.32. (D) PV-INs that responded to the CS on day 1, tracked to day 7. 97% of the cells that responded to CS on day 1 also responded to the CS on day 7. 3% of the cells that responded to the CS on day 1 became responsive to reward but not to the CS on day 7. Less than 1% responded to neither CS nor reward on day 7. (E) Overall proportion of VIP-INs that responded to the reward stimuli on day 1 did not change on day 7. Light blue indicates the proportion of day 7 reward-responsive cells that also responded to reward on day 1. Each line represents an individual mouse. Paired t-test, p = 0.32. (F) VIP-INs that responded to reward on day 1, tracked to day 7. 95% of the cells that responded to reward on day 1 also responded to reward on day 7. 4% of cells that responded to reward on day 1 became responsive to CS but not reward on day 7. Less than 1% responded to neither cue nor reward on day 7. PV-IN: n = 316 cells from six mice. VIP-IN: n = 407 cells from four mice. Error bars show standard error of the mean (SEM).

Figure 4—figure supplement 2. PV-IN and VIP-IN population plasticity is specific to associative learning.

(A) Trial structure for the no-reward control task. Gray shaded bar represents the response period analyzed for tone-responsive activity. (B) Mean percent of tone-responsive PV-INs from the no-reward task. Violin plots show null distribution of percentage of responsive neurons made by resampling mice and shuffling the session 1000 times (same as Figure 2). The circle represents the mean percentage of tone-responsive neurons. In the absence of reward, PV-INs did not show significant responses to tone. Bootstrap, day 1: p = 0.11, day 7: p = 0.11. (C) The mean percent of tone-responsive PV-INs during the no-reward control did not change between days 1 and 7. Paired t-test, n.s., nonsignificant, p = 0.89. (D) Trial structure for the nonpaired control task. Tone was followed by a randomly varied 40- to 80-s period before reward delivery and then a randomly varied intertrial interval (ITI) between 15 and 25 s. First gray shaded bar shows the response period analyzed for tone-responsive activity (2.5 s from tone onset). Second shaded bar shows the response period analyzed for reward-responsive activity (2.5 s from onset of first lick after reward delivery). (E) Mean lick rate during the tone from the nonpaired task. Mice did not show increased anticipatory licking during the tone on day 7. One-way analysis of variance (ANOVA), p = 0.097. (F) Mean lick rate following the randomized water rewards from the nonpaired task. Mice increased lick rate to consume the reward. The lick response was greater on day 7. One-way ANOVA, p < 1 × 10−3. (G) Mean percent of tone-responsive PV-INs during the nonpaired task. Violin plots show null distribution of percentage of responsive neurons made by resampling mice and shuffling the session 1000 times. The circle represents the mean percentage of tone-responsive neurons. When tone was not paired with reward, PV-INs showed significant responses to tone on day 1 but not day 7.
Bootstrap, Bonferroni correction for multiple comparisons, day 1: p = 0.003, day 7: p = 0.069. (H) The mean percent of tone-responsive PV-INs during the nonpaired task did not change between days 1 and 7. Paired t-test, n.s., nonsignificant, p = 0.43. (I) Mean reliability index of PV-INs that were responsive to the tone during the nonpaired task. Cells were divided into High or Low Reliability Group based on the 50th percentile of tone reliability. The reliability of neither High nor Low Reliability PV-INs changed from days 1 to 7. Paired t-test, n.s., nonsignificant, High: p = 0.86, Low: p = 0.19. (J) Mean percent of reward-responsive PV-INs during the nonpaired task. Violin plots show null distribution of percentage of responsive neurons made by resampling mice and shuffling the session 1000 times. The circle represents the mean percentage of reward-responsive neurons. PV-INs responded significantly to randomly timed rewards on days 1 and 7. Bootstrap, Bonferroni correction for multiple comparisons, day 1: p < 1 × 10−3, day 7: p < 1 × 10−3. (K) The mean percent of reward-responsive PV-INs during the nonpaired task did not change between days 1 and 7. Paired t-test, n.s., nonsignificant, p = 0.51. (L) Mean reliability index of PV-INs that were responsive to the reward. Cells were divided into High or Low Reliability Group based on the 50th percentile of reward reliability. The reliability of neither High nor Low Reliability PV-INs changed from days 1 to 7. Paired t-test, n.s., nonsignificant, High: p = 0.23, Low: p = 0.24. (M) Mean percent of reward-responsive VIP-INs during the nonpaired task. Violin plots show null distribution of percentage of responsive neurons made by resampling mice and shuffling the session 1000 times. The circle represents the mean percentage of reward-responsive neurons. VIP-INs responded significantly to randomly timed rewards on days 1 and 7. Bootstrap, day 1: p < 1 × 10−3, day 7: p < 1 × 10−3.
(N) The mean percent of reward-responsive VIP-INs during the nonpaired control did not change between days 1 and 7. Paired t-test, n.s., nonsignificant, p = 0.57. (O) Mean reliability index of VIP-INs that were responsive to the reward. Cells were divided into High or Low Reliability Group based on the 50th percentile of reward reliability. The reliability of neither High nor Low Reliability VIP-INs changed from days 1 to 7. Paired t-test, n.s., nonsignificant, High: p = 0.53, Low: p = 0.22. **p < 0.01, ***p < 0.001.
No-water rewards: n = 357 PV-IN cells from three mice. Nonpaired rewards: n = 324 PV-IN cells from five mice; n = 208 VIP-IN cells from three mice. Error bars show standard error of the mean (SEM).

Figure 5. PV-IN and VIP-IN reward-related responses before and after associative learning.

(A) Trial structure. Gray shaded bar represents the response period analyzed for reward-responsive activity. (B) Z-Scored activity of all the active VIP-INs from an example mouse during one representative trial on days 1 and 7, sorted by timing of maximum activity following the cue stimulus (CS) onset. Gray bar represents the timing of the CS. White line indicates the onset of water reward delivery. Mean percent of cells that are responsive to the reward for PV-INs (C) and VIP-INs (D). Neither PV-INs nor VIP-INs showed a significant change. Paired t-test, n.s., nonsignificant, PV-IN: p = 0.16 (C), VIP-IN: p = 0.16 (D). (E) Cumulative probability plots showing the percent of trials that each neuron responded to reward for PV-INs and VIP-INs on day 1. Neurons from each cell type were pooled across mice. VIP-INs showed a significantly greater response reliability to the reward than PV-INs. Kolmogorov–Smirnov test, *p < 0.05, p = 5.2 × 10−3. Mean reliability index of cells that are responsive to the reward for PV-INs (F) and VIP-INs (G). Each cell type is divided into High or Low Reliability Group based on the 50th percentile from the cumulative probability plots in (E). High and Low Reliability PV-INs maintained their consistency. (G) High Reliability VIP-INs did not show a change, while Low Reliability VIP-INs significantly increased in reliability following associative learning. Paired t-test, PV-IN High Reliability: p = 0.36, PV-IN Low Reliability: p = 0.058, VIP-IN High Reliability: p = 0.090, VIP-IN Low Reliability: p = 0.045. (H) Reward responses from all tracked VIP-INs in an example mouse on days 1 and 7. Left, cumulative distribution of reward response reliability among all tracked cells within the example mouse. Right, binary map of each cell’s reward response (active or not) across all trials on days 1 and 7. Cells were sorted by their day 1 reliability shown on the left and the order is maintained on day 7.
Cells with low response reliability to reward on day 1 became more reliable on day 7. (I) Cumulative probability plots of the change in reliability from days 1 to 7 (Δreliabilityreward) among VIP-INs with Low Reliability to reward on day 1. Bold blue, Δreliabilityreward of the population. Thin blue lines show the Δreliabilityreward distribution within individual VIP-Cre mice. As a control, day 7 session was randomly sampled and a random reliability was calculated. Δreliabilityrandom was calculated by subtracting day 1 reward reliability from the day 7 random reliability (Gray, Δreliabilityrandom from the same population of VIP-INs). Kolmogorov–Smirnov test, ***p < 1 × 10−3. PV-IN: n = 316 cells from six mice. VIP-IN: n = 407 cells from four mice. Error bars show standard error of the mean (SEM).

To demonstrate that changes in PV-INs’ representation of the CS resulted from associative learning, we conducted additional control experiments to examine their responses to tone when mice received no rewards or nonpaired randomly timed rewards. In the first experiment, water-restricted PV-Cre mice were exposed to the same auditory tone used as the CS (1-s duration) but all water rewards were omitted (randomly varied ITI between 60 and 120 s; ~30 trials/session; 1 session/day for 7 days). We imaged PV-IN activity on both days 1 and 7 and assessed their population responses following the tone onset (Figure 4—figure supplement 2A). We first compared the mean percent of active cells during tone to a null distribution made by randomly sampling the session irrespective of the behavioral task (as in Figure 2). Surprisingly, PV-INs did not show significant tone-responsive cells compared to the chance level on either day 1 or 7 (Figure 4—figure supplement 2B); hence, the average percent of tone-responsive PV-INs per trial also did not increase from days 1 to 7 (Figure 4—figure supplement 2C). Next, in a separate cohort of mice, we exposed water-restricted PV-Cre mice to tone, followed by a ‘nonpaired’ water reward that was given at randomly varied time intervals (40–80 s). Mice were also trained for ~30 trials/session (1 session/day for 7 days) with a randomly varied ITI between 15 and 25 s (Figure 4—figure supplement 2D). We found PV-INs were significantly responsive to the tone stimuli on day 1, similar to what we observed earlier in the CS–reward task (Figure 2F). Interestingly, by day 7, PV-INs no longer responded to the tone stimulus (Figure 4—figure supplement 2G). We next examined if mice that received the tone stimulus with nonpaired water reward learned to associate the two after 7 days. We found the animals did not learn the association, as their conditioned response (tone-evoked anticipatory licking) did not increase at day 7 (Figure 4—figure supplement 2E, F). 
Unlike the mice that learned the association in the CS–reward task, we did not observe a change in the mean percent of tone-responsive PV-INs from days 1 to 7 in the nonpaired paradigm, and the reliability of Low Reliability PV-INs to tone also did not change (Figure 4—figure supplement 2H, I). Together, these results suggest that PV-INs in M1 do not respond to auditory tone in general, but instead only respond to the tone when the animal actively associates it with reward. Moreover, the changes among PV-INs to the CS tone from days 1 to 7 are specific to associative learning.
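
As a rough illustration of the chance-level comparison used above, the following Python sketch builds a null distribution by placing response windows at random positions in the session and compares the observed percentage of responsive cells against it. It is a simplified sketch under stated assumptions, not the analysis code used in this study: the binary `active[cell][frame]` matrix, the function names, and the one-sided p-value with a +1 correction are hypothetical, and the resampling of mice described in the figure legends is omitted.

```python
import random

def percent_active(active, onsets, window):
    """Mean percent of cells with any active frame in [onset, onset + window)."""
    n_cells = len(active)
    per_window = []
    for t in onsets:
        n_resp = sum(1 for cell in active if any(cell[t:t + window]))
        per_window.append(100.0 * n_resp / n_cells)
    return sum(per_window) / len(per_window)

def null_distribution(active, n_onsets, window, n_iter=1000, seed=0):
    """Null percentages from windows placed at random, ignoring the task (assumed scheme)."""
    rng = random.Random(seed)
    n_frames = len(active[0])
    null = []
    for _ in range(n_iter):
        onsets = [rng.randrange(0, n_frames - window + 1) for _ in range(n_onsets)]
        null.append(percent_active(active, onsets, window))
    return null

def one_sided_p(observed, null):
    """Fraction of null values at least as large as the observed value (+1 correction)."""
    return (1 + sum(1 for v in null if v >= observed)) / (1 + len(null))
```

With a 30 Hz frame rate, a 2.5-s response window would correspond to `window = 75` frames in this representation.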

We next assessed reward responses among PV-INs and VIP-INs in the same manner but now looked for responses within 2.5 s of the reward delivery time (Figures 4B and 5A, B). We tracked the same neurons from days 1 to 7 and compared the mean percent of reward-responsive neurons. PV-INs and VIP-INs did not show a significant change in the percent of responsive cells per trial (PV-IN: day 1: 20.17% ± 3.27%, day 7: 27.47% ± 3.66%; VIP-IN: day 1: 18.65% ± 5.91%, day 7: 26.3% ± 6.04%; Figure 5C, D). When we examined the cumulative distribution of reliabilities for reward responses between the two cell types, VIP-INs as a population were significantly more reliable than PV-INs on day 1 (Figure 5E). By dividing the cells into High and Low Reliability groups, we found the High Reliability VIP-INs maintained their reliability (VIP-IN High Reliability: day 1: 38.79% ± 2.71%, day 7: 35.88% ± 4.32%), and the Low Reliability VIP-INs became significantly more reliable on day 7 (VIP-IN Low Reliability: day 1: 10.25% ± 1.67%, day 7: 24.06% ± 4.87%; Figure 5G, H). In contrast, both High and Low Reliability PV-INs maintained their reliability to reward (High Reliability: day 1: 35.59% ± 2.81%, day 7: 41.94% ± 5.7%; Low Reliability: day 1: 10.58% ± 0.74%, day 7: 19.59% ± 3.08%; Figure 5F). We also followed individual Low Reliability VIP-INs and calculated the change in reliability to reward (Δreliabilityreward) from days 1 to 7. As a control, we randomly sampled the day 7 session irrespective of the behavioral task and calculated a Δreliabilityrandom for each neuron as described above. When we compared the two distributions, the Δreliabilityreward was significantly greater than the Δreliabilityrandom, demonstrating that Low Reliability VIP-INs became more reliably responsive to reward (Figure 5I).
We also examined if the onset of neuronal activity after reward consumption changed after associative learning, and if neurons that were previously responsive to reward became responsive to the CS only. We did not observe a change in the onset of neuronal activity following reward (Figure 4—figure supplement 1B). Furthermore, 95% of the VIP-INs from day 1 still showed responsivity to reward on day 7 (Figure 4—figure supplement 1E, F). Lastly, to demonstrate that changes in VIP-INs’ representation of reward resulted from associative learning, we examined both PV-IN and VIP-IN responses to reward in mice that were exposed to the nonpaired behavioral paradigm (tone + randomly timed water rewards; Figure 4—figure supplement 2D–F). Consistent with what we observed in the CS–reward paradigm, both PV-INs and VIP-INs consistently showed a higher mean percent of active cells during reward (2.5 s from the first lick after reward delivery) compared to the chance level on both days 1 and 7 (Figure 4—figure supplement 2J–M). However, because these mice did not learn the association between the auditory tone and randomly delivered water reward, we did not see an increase in the reliability of the Low Reliability VIP-INs from days 1 to 7 (Figure 4—figure supplement 2O), or a change in the percent of responsive cells in either PV-INs or VIP-INs (Figure 4—figure supplement 2K–N). Altogether, we found that during associative learning, while the proportion of reward-responsive VIP-INs during a given trial did not change, a subset of VIP-INs that were largely unresponsive to reward on day 1 became more reliably responsive on day 7.
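
The reliability measures used in this section can be sketched in Python as follows. The sketch assumes that each cell's responses are stored as a list of 0/1 values (one per trial) and that cells at or below the day 1 median are assigned to the Low Reliability group; the function names and the tie-handling rule are illustrative assumptions, not details taken from the study.

```python
from statistics import median

def reliability(responses):
    """Percent of trials on which a cell responded; responses is a list of 0/1."""
    return 100.0 * sum(responses) / len(responses)

def split_by_median(day1_rel):
    """Indices of High (> median) and Low (<= median) reliability cells on day 1."""
    cut = median(day1_rel)
    high = [i for i, r in enumerate(day1_rel) if r > cut]
    low = [i for i, r in enumerate(day1_rel) if r <= cut]
    return high, low

def delta_reliability(day1_rel, day7_rel, idx):
    """Change in reliability (day 7 minus day 1) for the tracked cells in idx."""
    return [day7_rel[i] - day1_rel[i] for i in idx]

# Toy example: four tracked cells, four trials per day (made-up values).
day1 = [reliability(r) for r in ([1, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0])]
day7 = [reliability(r) for r in ([1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0])]
high, low = split_by_median(day1)
low_change = delta_reliability(day1, day7, low)
```

Because the groups are defined on day 1 and the same cells are followed to day 7, the change is computed per cell, not from the two group means.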

Although PV-INs and VIP-INs were the only cell types that were significantly responsive to both CS and reward on both days 1 and 7, PNs and SOM-INs also had significant responses to specific stimuli on certain days. While PNs did not show significant CS responses when compared to baseline, their reward responses on day 1 were significantly above the null distribution, and they became significantly lower than the null distribution on day 7 (Figure 2E). This result is in line with the change in tuning coefficient (ρreward), which showed a significant decrease in reward tuning between days 1 and 7 (Δρ¯reward = −0.141 ± 0.067; Figure 6A). Moreover, the cumulative distribution function of PN reliability also shifted significantly to lower reliabilities on day 7 compared to day 1 (Figure 6B). These results indicate that PNs initially responded to novel reward; however, they habituated to the reward following associative learning.

Figure 6. Pyramidal neuron (PN) and SOM-IN reliability is altered after associative learning.

(A) Left, distribution of changes in PN Spearman correlation Δρ for reward. Each curve represents a Gaussian kernel density estimate of the distribution of Δρ in a single mouse. Right, mean change in Spearman correlation Δρ¯ . Null distributions (gray) were estimated by resampling each mouse and shuffling trials 1000 times. Reward tuning among PNs decreased after associative learning, Monte-Carlo, ***p < 1 × 10−3. (B) Cumulative probability plots showing the percent of trials that each neuron responded to reward for PNs on days 1 and 7. Neurons were pooled across mice. Day 7 reliability was significantly lower than day 1. Kolmogorov–Smirnov test, ***p < 1 × 10−3. (C) Left, distribution of changes in SOM-IN Spearman correlation Δρ with cue stimulus (CS). Each curve represents a Gaussian kernel density estimate of the distribution of Δρ in a single mouse. Right, mean change in Spearman correlation Δρ¯ . SOM-INs did not show a change in tone tuning. Monte-Carlo, n.s., nonsignificant, p = 0.128. (D) Cumulative probability plots showing the percent of trials that each neuron responded to the CS for SOM-INs on days 1 and 7. Neurons were pooled across mice. Day 7 reliability was significantly greater than day 1. Kolmogorov–Smirnov test, p < 1 × 10−3. (E) Left, distribution of changes in SOM-IN Spearman correlation Δρ with reward. Each curve represents a Gaussian kernel density estimate of the distribution of Δρ in a single mouse. Right, mean change in Spearman correlation Δρ¯ . SOM-INs did not show a change in reward tuning. Monte-Carlo, n.s., nonsignificant, p = 0.598. (F) Cumulative probability plots showing the percent of trials that each neuron responded to the reward for SOM-INs on days 1 and 7. Neurons were pooled across mice. Day 7 reliability was significantly greater than day 1. Kolmogorov–Smirnov test, **p = 0.012.

PN: n = 1029 cells from six mice. SOM-INs: n = 189 cells from seven mice.

SOM-INs initially had no response to the CS on day 1, but their responses became significant on day 7 (Figure 2H). The change in CS tuning coefficient (ρtone) was not significant (Δρ¯tone = 0.059 ± 0.031, Figure 6C), suggesting their responsivity did not change with learning. Interestingly, when we assessed the cumulative distribution of CS response reliability on days 1 and 7, the cumulative distribution function shifted to significantly higher reliability values on day 7 (Figure 6C, D). Notably, fewer SOM-INs had 0% reliability to the CS on day 7 than on day 1, indicating that cells that were completely unresponsive to the CS on day 1 began to respond by day 7. Finally, SOM-INs showed modest but significant responses to reward on days 1 and 7 (Figure 2H). When we assessed the reward tuning among the SOM-IN population, ρreward did not show a significant change between days 1 and 7 (Δρ¯reward = 0.017 ± 0.040, Figure 6E). However, SOM-IN reliabilities also shifted to higher values on day 7 (Figure 6F). Altogether, these results suggest that in naive mice, SOM-INs were unresponsive to the CS and modestly responsive to the novel reward; however, following associative learning, SOM-INs became more reliably responsive to both the CS and reward.
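
The shifts in cumulative reliability distributions reported here were assessed with the Kolmogorov–Smirnov test. As a minimal sketch of what that test measures, the Python code below computes the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between two empirical cumulative distribution functions. The function names and the toy reliability values are illustrative assumptions; the sketch does not compute a p-value, which in practice would come from a statistics library such as `scipy.stats.ks_2samp`.

```python
def ecdf(sample, x):
    """Empirical CDF: fraction of values in sample that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Toy reliabilities (percent of trials with a response); values are made up.
day1_reliability = [0, 0, 10, 20]
day7_reliability = [10, 20, 30, 40]
d = ks_statistic(day1_reliability, day7_reliability)
```

A day 7 distribution that lies to the right of the day 1 distribution, as described for SOM-INs, produces a large gap at low reliability values, where many day 1 cells sit at or near 0%.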

Lastly, reward consumption requires innate tongue movements during licking, and since microstimulation of M1 in mice has been shown to evoke tongue and jaw movements (Komiyama et al., 2010), it is crucial to distinguish whether the observed CS and reward responses resulted from task-related stimuli or if the activity is simply associated with licking movements. We demonstrated earlier that head-fixed mice learned the CS–reward association by displaying the conditioned response (anticipatory licking) following the CS on day 7 (Figure 1). To address this potential confound, we identified all the self-initiated licking bouts during ITIs, when no reward was present (Figure 7A–C; Figure 2—figure supplement 1). We first assessed all the significantly active cells in each cell type (identified in Figure 2D) during the first lick bout of each ITI on days 1 and 7. We observed that in each cell type, the majority of the neurons were nonlick neurons on both days 1 and 7 (Figure 7—figure supplement 1A). We then tracked individual lick and nonlick neurons and examined if they shifted their responses after associative learning. We found that most of the neurons maintained the same responses, and nonlick neurons were still the majority in all cell types (Figure 7—figure supplement 1B). Next, we examined whether the lick neurons also showed mixed responses to CS, reward, or CS + reward. Indeed, lick neurons exhibited mixed responses to CS, reward, or CS + reward (Figure 7—figure supplement 1C). We then further divided them into three categories (‘CS cells’, ‘reward cells’, and ‘CS + reward cells’) and compared the percentage of neurons in each category between days 1 and 7; we did not observe a significant difference (Figure 7—figure supplement 1D). Lastly, at the population level, we examined the response reliability index of all the active neurons during all ITI lick bouts and compared them to the response reliability index for the CS and reward.
On both day 1 (when there was minimal anticipatory licking during the CS) and day 7 (when mice showed anticipatory licking), all cell types exhibited lower reliability index values for the ITI lick bouts compared to the CS and reward, indicating that the increase in task-related responses following water rewards was specific to the reward stimulus, and not licking movements (Figure 7D–J). Together, these results suggest that the cell-type-specific modifications observed between days 1 and 7 were not caused by licking movements.

Figure 7. Cell-type-specific cue and reward activity are not due to licking movements.

(A) Schematic of the cued reward conditioning task (top) and the trial structure (bottom). (B) Example licking behavior during the intertrial intervals (ITIs) from one mouse on day 1. Purple shading shows licks that were considered to be an individual lick bout. Gray shaded bar shows the cue stimulus (CS) timing and the red dotted line shows the reward timing. (C) Z-Scored activity of all the active pyramidal neurons (PNs) from an example mouse during one representative ITI lick bout. Left: maximum activity aligned to the lick bout onset. Right: maximum activity aligned to the CS onset. Gray bar represents the timing of the CS. White line indicates the time of water reward delivery. (D–F) Mean reliability index for all cell types with significant CS-related responses during ITI lick bouts with no water reward present, compared to the mean reliability index during the CS and up to but not including the reward delivery time. As shown in Figure 1, mice display anticipatory licking during the CS on day 7, but not day 1. PV-INs, VIP-INs, and SOM-INs were more reliably responsive during the CS than during licking movement alone. Only sessions/cell types with significant CS-related responses were analyzed. Paired t-test, *p < 0.05, ***p < 0.001. PV-IN day 1: p < 1 × 10−3, PV-IN day 7: p = 6.6 × 10−3 (D), VIP-IN day 1: p = 0.0355, VIP-IN day 7: p < 1 × 10−3 (E), SOM-IN day 7: p = 3.9 × 10−3 (F). (G–J) Mean reliability index for all cell types with significant reward-related responses during ITI lick bouts with no water reward present, compared to the mean reliability index following reward timing. All cell types were more reliably responsive during the reward period than during licking movement alone. Only sessions/cell types with significant reward-related responses were analyzed. Paired t-test, PV-IN day 1: p = 1.6 × 10−3, PV-IN day 7: p = 1.6 × 10−3 (G), VIP-IN day 1: p = 0.049, VIP-IN day 7: p = 0.014 (H), PN day 1: p < 1 × 10−3 (I), SOM-IN day 7: p = 0.030 (J).

PN: n = 1029 cells from six mice. PV-IN: n = 316 cells from six mice. VIP-IN: n = 407 cells from four mice. SOM-IN: n = 189 cells from seven mice. Error bars show standard error of the mean (SEM).

Figure 7—figure supplement 1. Lick cells were a small subset of active cells with stable mixed selectivity.

(A) The fraction of lick cells (in black) among all active cells tracked from days 1 to 7 for each cell type. Percentages indicate the percentage of cells that were nonlick cells. For all cell types, the majority of cells were nonlick cells. (B) Contingency table showing lick selectivity among all tracked active cells on days 1 and 7. Individual cells were tracked across days to determine if licking-related activity changed across days. For all cell types, most cells were nonlick cells on days 1 and 7. (C) Proportion of mixed selectivity for cue stimulus (CS) and reward among lick cells for all cell types on days 1 and 7. (D) Percentage of lick cells with significant CS but not reward selectivity for all cell types. Each line represents an individual mouse. (E) Percentage of lick cells with significant reward but not CS selectivity for all cell types. Each line represents an individual mouse. (F) Percentage of lick cells with significant CS and reward selectivity for all cell types. Each line represents an individual mouse.
PN: n = 1029 cells from six mice. PV-IN: n = 316 cells from six mice. VIP-IN: n = 407 cells from four mice. SOM-IN: n = 189 cells from seven mice. Error bars show standard error of the mean (SEM).

Discussion

M1 is known to be involved in motor initiation, movement kinematics, and motor learning. Recent studies have demonstrated reward-related activity in M1 using in vivo electrophysiological recordings in nonhuman primates (Marsh et al., 2015; Ramakrishnan et al., 2017; Ramkumar et al., 2016) and transcranial magnetic stimulation in human subjects (Thabit et al., 2011). However, whether CS- and reward-associated signals are represented among different neuronal cell types within the microcircuit in M1 is still unclear. Using chronic two-photon Ca2+ imaging, combined with transgenic mouse lines and viral strategies to target different neuronal cell types, we demonstrated that during a conditioning task, all major cell types in M1 responded to either the CS, the reward, or both. Most notably, each cell type underwent distinct modifications after associative learning. By tracking the same population of neurons, we revealed that the CS-responding population increased among PV-INs and individual cells responded more reliably to the CS following associative learning. In contrast, VIP-INs became more reliable in response to reward. When mice underwent control behavioral paradigms where tone was not paired with reward and no associative learning occurred, PV-INs and VIP-INs did not undergo these changes. Additionally, PNs had a drastically reduced response to reward, while SOM-INs became more reliable to both the CS and the reward. Our findings suggest that each cell type has a distinct role in processing information related to the cue–reward association in M1, and they may work together to provide the reinforcement signals in M1 that are important for motor skill learning.

Previous studies in trained rhesus monkeys performing a joystick center-out task have shown a widespread representation of reward anticipation and reward-related activity among cortical neurons in M1 (Ramakrishnan et al., 2017). Consistent with earlier work, we also observed reward-related activity in all four major cell types in M1, even in naive mice on day 1 when they were first exposed to the CS and reward. It has been reported that in sensory cortices, repeated passive exposure to a sensory stimulus leads to a long-lasting reduction in PN responsivity, but when animals are engaged in learning, PNs maintain their responsivity to the repeated stimulus (Kato et al., 2015; Makino and Komiyama, 2015). However, we found in M1, when water-restricted mice were engaged in a conditioning task to learn the association between the CS and water reward, PNs still showed a drastic habituation to the reward stimulus. A recent study that imaged neuronal activity in expert mice performing a head-fixed pellet reaching task demonstrated that L2/3 PNs in M1 are involved in encoding movement outcome (success vs. failure) but not the appetitive outcome (reward vs. no reward). However, the authors did not image the mice at the naive stage (Levy et al., 2020). Hence, one possibility is that L2/3 PNs in M1 encode reward signals during the naive stage, but after associative learning, they habituate and become unresponsive to the reward stimulus. In addition, in the sensory cortices, the flexibility to either respond to or ignore sensory stimuli is based on the stimulus’ behavioral relevance and is gated by local SOM-INs (Kato et al., 2015; Makino and Komiyama, 2015; Poort et al., 2021). In line with these findings, we found that SOM-INs became more reliably responsive to both the CS and the reward with associative learning. 
We also observed stimulus-specific increases in the reliability of PV-INs’ response to the CS and VIP-INs’ response to the reward after associative learning, and these changes to tone and reward were not observed in the absence of associative learning. When mice were exposed to tone alone, PV-INs did not show significant responses to tone on either day 1 or 7, while in mice exposed to the nonpaired tone and reward task, PV-INs showed significant responses to tone on day 1, but not on day 7. This suggests that in M1, PV-INs only respond to behaviorally relevant cues such as those that predict reward. Finally, while VIP-INs remained responsive to reward in the nonpaired paradigm, VIP-INs did not show changes in reliability in the absence of associative learning. Together, our results suggest that different IN subtypes may have distinct roles in processing CS- and reward-related information in M1 during motivated associative learning.

One hypothesis is that PV-INs are recruited by the CS to control the behavioral responses (anticipatory licking) during reward anticipation since PV-INs are known to regulate PN firing through both feedforward and feedback inhibition (Fishell and Rudy, 2011; Xu and Callaway, 2009; Xue et al., 2014). Similar observations have been reported in the striatum, in which optogenetic activation or suppression of PV-INs during a similar conditioning task impaired anticipatory licking, demonstrating the importance of PV-INs in the expression of conditioned responses (Lee et al., 2017). Likewise, PV-INs in the basolateral amygdala are also recruited during the CS and subsequently inhibited during the unconditioned stimulus (US) in an auditory fear conditioning task. Optogenetic activation of PV-INs during the CS increased conditioned freezing behavior while PV-IN suppression reduced freezing, indicating bidirectional control of the conditioned response (Wolff et al., 2014). Our results demonstrate that in a naive animal, a subset of PV-INs in M1 are responsive to the CS only when rewards are present, and more PV-INs are recruited by the CS if the animal learns that the CS predicts reward. This suggests that in M1, PV-IN responses to the CS are not purely sensory, but rather, they may play an important role in controlling the behavioral responses to the CS.

VIP-INs, on the other hand, were significantly less reliable in responding to the CS compared to PV-INs, and their responses to the CS remained low. However, VIP-INs’ responses to the reward were more reliable than those of PV-INs, and they became more closely tuned and reliably responsive to the reward with learning. Due to the disinhibitory position of VIP-INs in the microcircuit, activation of VIP-INs can lead to widespread increases in local excitability and contribute to regulating cortical gain (Fu et al., 2014; Jackson et al., 2016; Pfeffer et al., 2013). Furthermore, a growing body of evidence suggests a general principle across brain regions, in which VIP-INs receive long-range inputs (Duan et al., 2020; Krabbe et al., 2019; Turi et al., 2019; Zhang et al., 2014; Gasselin et al., 2021), respond to reinforcement signals (Krabbe et al., 2019; Pi et al., 2013), and play an important role in goal-oriented learning (Krabbe et al., 2019; Turi et al., 2019). Taken together, our results suggest that during CS–reward conditioning, PV-INs in M1 encode the CS association, and may regulate local circuit activity related to reward anticipation, whereas VIP-INs act as a context-dependent switch following the reward delivery (Muñoz et al., 2017; Turi et al., 2019) to instruct and disinhibit local PNs to enable learning-induced plastic changes critical for the acquisition of new movements. An interesting point to note is that since PV-INs are only responsive to tone when it is paired with reward, and neither PV-INs nor VIP-INs undergo plastic changes in the absence of associative learning, M1 is unlikely to be a primary site for learning reward predictions. We hypothesize that other brain regions are responsible for learning relevant CS–reward associations while filtering out behaviorally irrelevant stimuli, and these regions subsequently send long-range inputs to M1 to instruct motor responses to the CS and the reward. 
In summary, this study provides insight into how different IN subtypes in M1 integrate incoming inputs from various brain regions and orchestrate local circuit plasticity. Future work will be important to identify the origin of these putative long-range inputs to different cell types in M1.

Materials and methods

Mice

Experimental mice were group housed in plastic cages with food and water ad libitum in a room with a reversed light cycle (12–12 hr). PV-Cre (008069), SOM-Cre (013044), VIP-Cre (010908), and B6129SF1/J (101043) mouse lines were acquired from Jackson Laboratory (Bar Harbor, ME, USA). All mouse lines were homozygous and in C57BL/6 × 129S4 background. For all mouse lines, both males and females were used. Mice were between P40 and P60 at the time of surgery.

Surgery

Mice were deeply anesthetized under 1–2% isoflurane and given subcutaneous injections of Baytril (10 mg/kg) to prevent infection and buprenorphine (0.05 mg/kg) for analgesia. An incision was performed to remove a piece of the scalp and a custom head-plate was implanted onto the skull using instant glue (Krazy Glue) and dental cement (Lang Dental, Wheeling, IL, USA). A craniotomy of approximately 2 mm in diameter was performed over the right primary motor cortex. Virus (PNs: AAV1.CaMKII.GCaMP6f.WPRE.SV40; PV-IN, VIP-IN, and SOM-IN: AAV1.Syn.Flex.GCaMP6f.WPRE.SV40) was diluted 1:5 in saline and injected at a depth of ~250 µm from the pia using a glass pipette. All viruses were obtained from Addgene (Watertown, MA, USA). Injections were performed at five sites, centered on coordinates 1.5 mm lateral and 0.3 mm anterior to bregma. For PN groups, 20 nl per site was injected. For PV-IN, VIP-IN, and SOM-IN, 40 nl per site was injected. All injections were performed at a rate of 10 nl/min and the pipette was left in place for 4 min following the injection to avoid backflow. A glass imaging window was then implanted over the craniotomy and sealed with dental cement. Following surgery, a subcutaneous injection of dexamethasone (2 mg/kg) and buprenorphine (0.1 mg/kg) was given. Mice were given a minimum of 1 week to recover prior to beginning water restriction.

Auditory cued reward conditioning behavior

Mice were gradually water restricted down to ~1 ml/day and were maintained at ~80% of original body weight over 2 weeks prior to the start of imaging/behavior sessions (Chen et al., 2015; Harvey et al., 2012; Komiyama et al., 2010; O’Connor et al., 2013; Peters et al., 2014). Mice were then head-fixed for simultaneous two-photon imaging and exposed to the conditioned stimulus (a constant 9 kHz auditory tone, 1 s in duration) followed by a 1.5-s delay period and a water reward (~10 µl). All lick times were measured by an infrared beam lick-o-meter and logged using the data acquisition software WaveSurfer (https://wavesurfer.janelia.org/). The ITI between the previous water reward and subsequent CS onset was randomly varied between 60 and 120 s. Each session was 1 hr in duration with 30–35 trials in total. Mice underwent 1 session/day for seven consecutive days. Two-photon calcium imaging was performed simultaneously on days 1 and 7 of the behavioral task.
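
The trial timing described above can be laid out as a simple schedule. The Python sketch below is illustrative only and is not the task-control code used in the study: the function name, the `first_cs` start time, and the use of a uniform distribution for the ITI are assumptions, while the tone duration, delay, and ITI range are taken from the text.

```python
import random

TONE_S = 1.0                  # CS (tone) duration, from the text
DELAY_S = 1.5                 # delay between CS offset and reward, from the text
ITI_RANGE_S = (60.0, 120.0)   # previous reward to next CS onset, from the text

def make_schedule(n_trials, first_cs=10.0, seed=0):
    """Return CS onset, CS offset, and reward times (s) for each trial.

    first_cs and the uniform ITI draw are assumptions made for illustration.
    """
    rng = random.Random(seed)
    trials = []
    cs_on = first_cs
    for _ in range(n_trials):
        reward = cs_on + TONE_S + DELAY_S
        trials.append({"cs_on": cs_on, "cs_off": cs_on + TONE_S, "reward": reward})
        cs_on = reward + rng.uniform(*ITI_RANGE_S)
    return trials

schedule = make_schedule(30)
```

In every trial the reward follows CS onset by 2.5 s, which is the span used for the anticipatory response window.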

To assess licking behavior, lick rate (number of licks per second, measured as infrared beam breaks) was calculated within 500-ms bins, then averaged across all trials within a session for each mouse. Lick rate was then averaged across mice. Mean anticipatory lick rate was calculated as the mean lick rate from the time of the CS onset to the end of the delay period (2.5 s in duration), not including the reward delivery. Mean ITI lick rate was calculated from the lick rate during the first 2.5 s of self-initiated spontaneous lick bouts. ITI lick bouts were defined as licking events that followed the previous trial by at least 20 s and preceded the subsequent trial by more than 2.5 s. Mean reward lick rate was calculated from the lick rate from the time of reward delivery to 2.5 s after.
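The lick-rate calculation described above can be illustrated with a minimal Python sketch. It assumes lick times and CS onsets are available as lists of timestamps in seconds; the function names and the display window around the CS (`t_before`, `t_after`) are hypothetical choices for illustration and are not taken from the analysis code of this study.

```python
import numpy as np

BIN_S = 0.5      # bin width stated in the text (500 ms)
WINDOW_S = 2.5   # CS (1 s) plus delay (1.5 s)


def binned_lick_rate(lick_times, cs_onsets, t_before=2.0, t_after=6.0):
    """Trial-averaged lick rate (licks/s) in 500-ms bins aligned to CS onset.

    t_before and t_after are illustrative values, not taken from the paper.
    """
    lick_times = np.asarray(lick_times, dtype=float)
    edges = np.arange(-t_before, t_after + BIN_S / 2, BIN_S)
    counts = np.zeros((len(cs_onsets), len(edges) - 1))
    for i, t0 in enumerate(cs_onsets):
        # Licks outside the bin range are ignored by np.histogram.
        counts[i], _ = np.histogram(lick_times - t0, bins=edges)
    return edges, counts.mean(axis=0) / BIN_S


def mean_window_lick_rate(lick_times, window_starts, window_s=WINDOW_S):
    """Mean lick rate over fixed-length windows (e.g., CS onset to end of delay)."""
    lick_times = np.asarray(lick_times, dtype=float)
    rates = [
        np.sum((lick_times >= t0) & (lick_times < t0 + window_s)) / window_s
        for t0 in window_starts
    ]
    return float(np.mean(rates))
```

Passing CS onset times to `mean_window_lick_rate` gives an anticipatory lick rate in this sketch; passing reward delivery times or ITI lick bout onsets would give the reward and ITI lick rates under the same 2.5-s window.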

All trials within a session were included in the lick rate analysis in Figure 1. To ensure behavioral consistency across trials, only trials with at least three lick responses within 2.5 s of the reward delivery time were included in all analyses of neural responses.

In the control experiments with water rewards omitted, mice were head-fixed and exposed to a constant 9 kHz auditory tone, 1 s in duration, followed by a randomly varied ITI between 15 and 25 s. In the nonpaired control experiments, mice were exposed to a constant 9 kHz auditory tone, followed by a randomly varied delay period between 40 and 80 s before delivery of a water reward (Krabbe et al., 2019). Water rewards were then followed by a 15- to 25-s ITI. In both tasks, each session was 45 min in duration with an average of 30 trials. Mice underwent 1 session/day for seven consecutive days.

Calcium imaging and analysis

In vivo imaging was performed using a commercial two-photon microscope (B-scope, Thorlabs, Newton, NJ, USA) and a ×16 water immersion objective (Nikon) with excitation at 925 nm (InSight X3, Spectra-Physics, Milpitas, CA, USA) with a frame rate of 30 Hz. Images were taken at 512 × 512 pixels covering 755 by 650 µm.

Images were corrected for movement in the x and y plane using full-frame cross-correlation image alignment (TurboReg plug-in for ImageJ; Thévenaz et al., 1998). The entire session was visually inspected and regions of interest (ROIs) were manually drawn on neurons using a custom MATLAB program, described in Peters et al., 2014. The ROI template from day 1 was loaded onto day 7 and aligned along the x and y plane. Only ROIs that could be tracked from days 1 to 7 were included in the dataset unless otherwise specified.

Fluorescence within an ROI was averaged across pixels. ΔF was calculated by subtracting a time-varying baseline fluorescence estimate (F0) from the raw fluorescence trace. F0 was estimated iteratively from inactive parts of the fluorescence trace as previously described (Chu et al., 2016; Kato et al., 2012; Peters et al., 2014; Peters et al., 2017).

We adapted a method by Driscoll et al., 2017 to identify significant activity events for each neuron and then excluded ROIs with no significant activity events within the session, irrespective of the behavior. For each neuron the ΔF trace was circularly shifted by a random integer 1000 times and compared to the original trace. If the original ΔF trace was greater than the shifted data for at least five consecutive frames in at least 950 iterations, this was considered an active event. If a neuron did not have at least one active event in the entire session, irrespective of the behavior, it was removed from the dataset. This only accounted for a small proportion of ROIs as most of them are active on both days 1 and 7, as shown in Figure 2D.
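The circular-shift test can be sketched as follows. This is one possible reading of the procedure, not the code used in this study: it assumes that a frame counts as significant when the original ΔF exceeds the shifted trace at that frame in at least 950 of 1000 shifts, and that an active event is a run of at least five consecutive significant frames. The function names are hypothetical.

```python
import numpy as np


def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for value in mask:
        run = run + 1 if value else 0
        best = max(best, run)
    return best


def has_active_event(dff, n_shifts=1000, min_iters=950, min_frames=5, seed=0):
    """Return True if the trace contains at least one active event.

    Assumed reading: per-frame counts of (original > circularly shifted)
    are thresholded at min_iters, then runs of >= min_frames are required.
    """
    rng = np.random.default_rng(seed)
    dff = np.asarray(dff, dtype=float)
    n = dff.size
    exceed = np.zeros(n, dtype=int)
    for _ in range(n_shifts):
        shift = rng.integers(1, n)  # random non-zero circular shift
        exceed += dff > np.roll(dff, shift)
    significant = exceed >= min_iters
    return longest_run(significant) >= min_frames
```

An ROI for which `has_active_event` returns False for the whole session would be excluded under this reading.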

For all subsequent analyses, a modified Z-score, adapted from Kato et al., 2015, was applied to ΔF. The Z-score was calculated as Z = (f(t) − µ)/σ, where f(t) is the ΔF trace for a neuron, µ is the mean, and σ is the standard deviation of the neuron’s ΔF during the baseline period. The baseline period was a concatenation of 2.5 s preceding the CS onset (start of a trial) for all trials within a session.
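As an illustration of the modified Z-score, the sketch below computes µ and σ from the concatenated 2.5-s pre-CS windows and applies them to the whole trace. It assumes the 30 Hz frame rate given above and that CS onsets are supplied as frame indices; the function name is hypothetical, and the use of the population standard deviation (NumPy default) is an assumption, since the text does not specify it.

```python
import numpy as np

FRAME_RATE = 30   # Hz, from the imaging description
BASELINE_S = 2.5  # seconds preceding each CS onset


def baseline_zscore(dff, cs_onset_frames, frame_rate=FRAME_RATE):
    """Z-score a ΔF trace against the concatenated pre-CS baseline periods."""
    dff = np.asarray(dff, dtype=float)
    n_base = int(round(BASELINE_S * frame_rate))
    # Concatenate the 2.5 s preceding every CS onset in the session.
    baseline = np.concatenate(
        [dff[max(f - n_base, 0):f] for f in cs_onset_frames]
    )
    mu = baseline.mean()
    sigma = baseline.std()  # assumption: population SD (ddof=0)
    return (dff - mu) / sigma
```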

Calculation of tuning coefficients

We quantified the tuning of individual neurons to the CS and reward stimuli delivered in our classical conditioning task using the nonparametric Spearman correlation ρ (scipy.stats.spearmanr) between the trial-averaged fluorescence and the timing of stimulus delivery

$$\rho_{m,n,s}^{(d)} = \mathrm{corr}\left( \frac{1}{|T_m^{(d)}|} \sum_{t \in T_m^{(d)}} f^{(d)}_{m,n,\,t-T_{\mathrm{baseline}}\,:\,t+T_{\mathrm{CS}}+T_{\mathrm{delay}}+T_{\mathrm{reward}}+T_{\mathrm{post}}},\ \mathbb{1}_s \right),$$

where $t$ is the start time of a trial (defined as the start of the CS), $T_m^{(d)}$ is the set of all trial start times from mouse $m$ on day $d \in \{1, 7\}$, $f^{(d)}_{m,n,\,t-T_{\mathrm{baseline}}\,:\,t+T_{\mathrm{CS}}+T_{\mathrm{delay}}+T_{\mathrm{reward}}+T_{\mathrm{post}}}$ is the fluorescence trace of neuron $n$ from mouse $m$ during a single trial, $\mathbb{1}_s$ is an indicator function for stimulus $s \in \{\mathrm{CS}, \mathrm{reward}\}$, $|T_m^{(d)}|$ is the number of trials, and $\rho$ is the Spearman correlation coefficient. Analysis was carried out with $T_{\mathrm{baseline}} = 2$ s and $T_{\mathrm{post}} = 6$ s. We considered the ‘CS’ period indicated by $\mathbb{1}_{\mathrm{CS}}$ to range from the start of the CS at time $t$ to the start of reward delivery at time $t + T_{\mathrm{CS}} + T_{\mathrm{delay}}$, and the ‘reward’ period indicated by $\mathbb{1}_{\mathrm{reward}}$ to be the first 2.5 s of reward delivery (see schematic in Figures 4A and 5A). We used the change in $\rho$ from days 1 to 7 as a cell-resolved measure of changes in tuning over the course of learning.
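The tuning coefficient can be illustrated with a short sketch built on `scipy.stats.spearmanr`, the function named above. It assumes a 30 Hz frame rate, trial starts given as frame indices, and a reward window length of 2.5 s for the trial segment (the text defines the reward period as 2.5 s but does not state the window length separately, so this is an assumption). The function and constant names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

FRAME_RATE = 30                        # Hz
T_BASELINE, T_POST = 2.0, 6.0          # seconds, from the text
T_CS, T_DELAY = 1.0, 1.5               # seconds, from the task description
T_REWARD = 2.5                         # seconds, assumed window length


def tuning_coefficient(trace, trial_start_frames, stimulus, fr=FRAME_RATE):
    """Spearman rho between the trial-averaged trace and a stimulus indicator."""
    trace = np.asarray(trace, dtype=float)
    pre = int(round(T_BASELINE * fr))
    post = int(round((T_CS + T_DELAY + T_REWARD + T_POST) * fr))
    # Average the fluorescence across trials, aligned to CS onset.
    trials = np.stack([trace[t - pre:t + post] for t in trial_start_frames])
    mean_trace = trials.mean(axis=0)
    time = np.arange(-pre, post) / fr
    if stimulus == "CS":
        indicator = (time >= 0) & (time < T_CS + T_DELAY)
    elif stimulus == "reward":
        start = T_CS + T_DELAY
        indicator = (time >= start) & (time < start + T_REWARD)
    else:
        raise ValueError("stimulus must be 'CS' or 'reward'")
    # Index [0] is the correlation coefficient across SciPy versions.
    return spearmanr(mean_trace, indicator.astype(float))[0]
```

Because the indicator is binary, the coefficient of a perfectly CS-locked neuron is bounded below 1 in this sketch; the change in the coefficient between days is the quantity of interest.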

To summarize learning-associated changes in tuning, we calculated the mean change in the Spearman correlation for each cell type and trial component (CS or reward) from days 1 to 7 as follows

$$\Delta\bar{\rho}_s = \frac{1}{|M|} \sum_{m \in M} \frac{1}{N_m} \sum_{n=1}^{N_m} \left( \rho_{m,n,s}^{(7)} - \rho_{m,n,s}^{(1)} \right),$$

where $M$ is the set of mice used in the experiment, $N_m$ is the number of neurons in mouse $m$, and $\rho_{m,n,s}^{(d)}$ is the Spearman correlation as defined above.

We used a nonparametric approach for statistical tests involving the mean change in Spearman correlation by scrambling trial times and bootstrapping mice to construct a null distribution for $\Delta\bar{\rho}_s$. Specifically, we first drew a random sample of $|M|$ mice from $M$ with replacement, then drew a random sample of $|T_m^{(d)}|$ trial start times uniformly distributed between 0 and $T_{\mathrm{session}}^{(d)} - (T_{\mathrm{baseline}} + T_{\mathrm{CS}} + T_{\mathrm{delay}} + T_{\mathrm{reward}} + T_{\mathrm{post}})$ for each day $d$ and randomly selected mouse, and finally used these randomly selected mice and scrambled trial start times to compute the change in tuning $\Delta\bar{\rho}_s$. This process was repeated 1000 times to approximate the distribution of $\Delta\bar{\rho}_s$ under the null hypothesis that changes in tuning are unrelated to the CS and reward delivery. We considered the observed changes in tuning $\Delta\bar{\rho}_s$ to be statistically significant at the * or ** level if they fell into the 5 or 1% tails of this distribution, respectively.
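The resampling scheme can be outlined in a minimal sketch. It is not the analysis code of this study: `delta_for_mouse` is a hypothetical user-supplied callable that recomputes the mean change in tuning for one mouse from scrambled trial start times, and the significance rule assumes that the 5 and 1% tails are split equally between the two sides of the null distribution, which the text does not specify.

```python
import numpy as np


def mean_delta_rho(rho_day1, rho_day7):
    """Mean change in rho: average over neurons within a mouse, then over mice.

    rho_day1, rho_day7: one sequence of per-neuron rho values per mouse.
    """
    per_mouse = [
        np.mean(np.asarray(r7, dtype=float) - np.asarray(r1, dtype=float))
        for r1, r7 in zip(rho_day1, rho_day7)
    ]
    return float(np.mean(per_mouse))


def scrambled_trial_starts(n_trials, session_s, trial_s, rng):
    """Uniform random trial start times that leave room for a full trial."""
    return rng.uniform(0.0, session_s - trial_s, size=n_trials)


def bootstrap_null(n_mice, delta_for_mouse, n_boot=1000, seed=0):
    """Null distribution of the mean change in tuning.

    delta_for_mouse(mouse_index, rng) is a hypothetical callable returning
    the change in tuning for one mouse computed from scrambled trial times.
    """
    rng = np.random.default_rng(seed)
    null = np.empty(n_boot)
    for b in range(n_boot):
        sampled = rng.integers(0, n_mice, size=n_mice)  # mice with replacement
        null[b] = np.mean([delta_for_mouse(m, rng) for m in sampled])
    return null


def significance_label(observed, null):
    """Assumed two-sided rule: '*' for the 5% tails, '**' for the 1% tails."""
    lo5, hi5 = np.percentile(null, [2.5, 97.5])
    lo1, hi1 = np.percentile(null, [0.5, 99.5])
    if observed < lo1 or observed > hi1:
        return "**"
    if observed < lo5 or observed > hi5:
        return "*"
    return "n.s."
```

With the durations given above and an assumed 2.5-s reward window, the trial length passed to `scrambled_trial_starts` would be 2 + 1 + 1.5 + 2.5 + 6 = 13 s.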

Activity analysis

To identify neuron responses to the CS and reward, we applied a set threshold to each neuron. Neurons were defined as CS or reward responsive on a trial-by-trial basis if they exceeded 1 Z-score (excitation threshold used in Kato et al., 2015) for at least five consecutive frames within 2.5 s of the CS onset or 2.5 s of the reward delivery time, respectively. This was assessed for each trial with at least three lick responses within 2.5 s of the reward delivery time. We then took the median of the percent of responsive neurons across all trials in a session from one mouse, and the mean across mice. In the nonpaired experiments, since the reward was not preceded by a CS, the mice required variable amounts of time to notice the water reward. Therefore, in the nonpaired experiments, we calculated reward responses within 2.5 s from the onset of the first lick following the water delivery. In the case that no licks were recorded before the subsequent tone, the trial was not included in the reward analysis.
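The threshold criterion can be expressed as a short sketch. It assumes Z-scored traces arranged as a neurons × frames array, event onsets (CS or reward) given as frame indices, and the 30 Hz frame rate stated above; the function names are hypothetical.

```python
import numpy as np

FRAME_RATE = 30  # Hz


def is_responsive(z, onset_frame, fr=FRAME_RATE, window_s=2.5,
                  thresh=1.0, min_frames=5):
    """True if z exceeds thresh for >= min_frames consecutive frames
    within window_s seconds of the onset frame."""
    n_win = int(round(window_s * fr))
    above = np.asarray(z)[onset_frame:onset_frame + n_win] > thresh
    run = 0
    for value in above:
        run = run + 1 if value else 0
        if run >= min_frames:
            return True
    return False


def percent_responsive(zscores, onset_frames):
    """Median across trials of the percentage of responsive neurons."""
    per_trial = [
        100.0 * np.mean([is_responsive(z, f) for z in zscores])
        for f in onset_frames
    ]
    return float(np.median(per_trial))
```

In this sketch, `onset_frames` would contain only the included trials (those with at least three licks within 2.5 s of reward delivery), and the per-mouse values would then be averaged across mice.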

We used a Monte-Carlo approach to validate the percent of CS- and reward-responsive neurons. The observed mean percentage of CS- and reward-responsive neurons was compared to a null distribution made for each cell type on each day. We randomly sampled mice with replacement, then sampled the entire session, and then calculated the percentage of active cells (exceeding 1 Z-score for at least five consecutive frames) during a randomly chosen 2.5-s window. For each mouse, the number of samples was equal to the number of included trials (i.e., number of trials with at least three lick responses within 2.5 s of reward delivery). We then took the median across the random samples and then took the mean across mice to obtain a mean percentage of responsive neurons during a randomly chosen time window. This was repeated 1000 times to generate a null distribution of mean percentage of active neurons. To assess whether the observed percentage of CS- and reward-responsive neurons was significantly different from the null distribution, the observed value was compared to the tails of the null distribution. This was done for each cell type on both days 1 and 7. We considered the CS or reward responses to be statistically significant at the * or ** level if they fell into the 5 or 1% tails of this distribution, respectively, and *** if there was no overlap with the distribution. Since this approach tests the null hypothesis that the observed neuronal responses are due to chance (in this case, baseline activity/noise), only cell types with a significantly higher percentage of responsive neurons for a given session were analyzed further.

The CS/reward reliability index was defined as the percentage of trials within a session where the neuron was CS/reward responsive. The reliability cumulative distribution was made by pooling the day 1 index values of all the neurons from a neuronal cell type (across mice). If a neuron’s day 1 index value was lower than or equal to the index value at the 50th percentile of the cumulative distribution (excluding nonresponsive neurons with a reliability of 0) for that cell type, it was categorized into the Low Reliability group. If a neuron’s day 1 index value exceeded the 50th percentile value, it was categorized into the High Reliability group. To assess changes in reliability at the population level, we took the mean reliability within each group on days 1 and 7. To assess changes in reliability among individual Low Reliability neurons, we used a Δreliability measure, where Δreliability_CS = (CS reliability_Day7 − CS reliability_Day1) and Δreliability_reward = (reward reliability_Day7 − reward reliability_Day1). As a control, we randomly sampled the day 7 session, matching the number of trials, and calculated the reliability to obtain a ‘random reliability’ for each neuron. We then calculated a Δreliability_random, where Δreliability_random = (reliability_random − reliability_Day1), for CS and reward reliabilities.
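The reliability index and the Low/High split can be sketched as below. The sketch assumes a boolean trials × neurons array of per-trial responsiveness, and it assumes that neurons with a reliability of 0 are excluded only from the calculation of the 50th-percentile cutoff; whether such neurons were also left out of the Low Reliability group is not stated in the text. The function names are hypothetical.

```python
import numpy as np


def reliability_index(responsive):
    """Percentage of trials on which each neuron was responsive.

    responsive: boolean array of shape (n_trials, n_neurons).
    """
    return 100.0 * np.asarray(responsive, dtype=float).mean(axis=0)


def split_low_high(day1_reliability):
    """Split neurons at the median day 1 reliability of responsive neurons.

    Assumption: zero-reliability neurons are excluded from the cutoff only.
    """
    rel = np.asarray(day1_reliability, dtype=float)
    cutoff = np.percentile(rel[rel > 0], 50)
    low = rel <= cutoff
    high = rel > cutoff
    return cutoff, low, high


def delta_reliability(day1_reliability, day7_reliability):
    """Per-neuron change in reliability from day 1 to day 7."""
    return np.asarray(day7_reliability, float) - np.asarray(day1_reliability, float)
```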

The onset time of neuronal activity following the CS was calculated as the time from the CS onset to the time of the first Ca2+ event (fifth frame above threshold) within the CS response period. For the onset time of reward-related neuronal activity, the time from the first lick (after reward delivery) to the time of the first Ca2+ event (fifth frame above threshold) within the reward response period was used. The latency for each cell was first calculated by taking the mean across all active trials for a single cell, then the median of all cells within a mouse was calculated. Only cells that were tracked between days were included.

To determine the proportion of PV-INs that responded to the CS on days 1 and 7, we found the overall proportion of cells that responded to the CS out of total active cells on both days 1 and 7. To calculate the percentage of PV-INs that maintained CS responses across days, we found the proportion of day 7 cells that also had CS responses on day 1. We also found the proportion of day 1 cells with CS responses, that either maintained CS responses on day 7, became CS unresponsive but reward responsive, or became unresponsive to both CS and reward. The same analysis was performed on VIP-IN reward responses. Only active cells that were tracked from days 1 to 7 were included.

Licking-related analysis

ITI lick bouts were defined as self-initiated licking events that occurred at least 20 s after the preceding reward delivery time (trial end) and more than 2.5 s prior to the subsequent CS onset (trial start). If individual licks were separated by 3 s or more, they were considered to be a new lick bout. To remain consistent with CS and reward analyses, only the first 2.5 s of a lick bout were analyzed for neural responses. ITI lick bout reliability indices were calculated as described above.
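The bout definition above can be written as a small sketch in plain Python. It assumes lick, reward, and CS times are lists of timestamps in seconds, and that a bout begins at any lick separated from the previous lick by 3 s or more; the function name and argument names are hypothetical.

```python
def iti_lick_bouts(lick_times, reward_times, cs_onsets,
                   gap_s=3.0, after_reward_s=20.0, before_cs_s=2.5):
    """Return onset times of ITI lick bouts.

    A bout starts at a lick that follows the previous lick by >= gap_s.
    It is kept if it starts >= after_reward_s after the preceding reward
    and > before_cs_s before the next CS onset.
    """
    onsets = []
    previous = None
    for t in sorted(lick_times):
        if previous is None or t - previous >= gap_s:
            onsets.append(t)
        previous = t

    kept = []
    for onset in onsets:
        earlier_rewards = [r for r in reward_times if r <= onset]
        later_cs = [c for c in cs_onsets if c > onset]
        far_from_reward = (not earlier_rewards
                           or onset - max(earlier_rewards) >= after_reward_s)
        far_from_cs = not later_cs or min(later_cs) - onset > before_cs_s
        if far_from_reward and far_from_cs:
            kept.append(onset)
    return kept
```

Only the first 2.5 s after each returned onset would then be used for the neural analysis, matching the CS and reward windows.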

To determine lick cells, we found the first ITI lick bout in each ITI and calculated the mean Z-scored ΔF during the first 2.5 s of the lick bout. We then created a matrix of concatenated ITIs from the same session and randomly sampled the concatenated ITIs, and calculated the mean Z-scored ΔF, matching the duration and number of ITI lick bouts. A paired t-test was used to compare the mean Z-scored ΔF during ITI lick bouts and during the random samples. Cells with significantly higher ΔF during lick bouts were considered to be lick cells. We performed this analysis on days 1 and 7 and tracked individual neurons to identify changes in selectivity on a cell-by-cell basis. To determine if cells have mixed selectivity, we performed the same analysis using the CS and reward response periods (2.5 s from onset) and compared this activity to an equal number of random samples using a paired t-test.
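A simplified version of the lick-cell test is sketched below using `scipy.stats.ttest_rel`. It assumes that candidate start frames for the random samples have already been restricted to the concatenated ITIs (`random_start_frames`, a hypothetical input), that the significance level is 0.05, and that "significantly higher" means a positive t statistic with a two-sided p value below that level; none of these choices is specified in the text.

```python
import numpy as np
from scipy.stats import ttest_rel

FRAME_RATE = 30  # Hz


def is_lick_cell(z, bout_frames, random_start_frames, fr=FRAME_RATE,
                 window_s=2.5, alpha=0.05, seed=0):
    """Compare mean Z-scored ΔF during lick bouts with matched random samples.

    random_start_frames: candidate window starts within the ITIs
    (hypothetical input; must contain at least len(bout_frames) entries).
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    w = int(round(window_s * fr))
    bout_means = np.array([z[f:f + w].mean() for f in bout_frames])
    # Match the number and duration of the lick bout windows.
    starts = rng.choice(random_start_frames, size=len(bout_frames), replace=False)
    random_means = np.array([z[s:s + w].mean() for s in starts])
    result = ttest_rel(bout_means, random_means)
    t_stat, p_value = result[0], result[1]
    return bool(t_stat > 0 and p_value < alpha)
```

The same comparison, with CS or reward onsets in place of bout onsets, would give the mixed-selectivity test described above.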

Statistical analysis

Statistical analysis for tuning coefficients was performed in Python and in R. All other statistical analyses were performed in MATLAB using the Statistics and Machine Learning Toolbox. Two-way analysis of variance (ANOVA) was used to test for differences in anticipatory lick rate on days 1 and 7. One-way ANOVA was used to test for differences in lick rate during ITI, CS, and reward. One-way ANOVA was used to compare the percent of active cells across cell types on a single day. The Monte-Carlo approach (as described above) was used to test for a significant percentage of CS- and reward-responsive neurons, and for changes in tuning properties. Paired t-tests were used to test for differences in the percentage of responsive cells and reliability index on days 1 and 7, and for differences in neuron reliability between ITI lick bouts, CS, and reward. Paired t-tests were used to determine mixed selectivity as described above. The Kolmogorov–Smirnov test was used to compare response reliability cumulative distributions and Δreliability distributions. All values were reported as the mean and standard error of the mean unless otherwise specified. Power analysis was not performed to predetermine the sample size, and the experiments were not blinded.

Data analysis and code availability

Tuning coefficient calculation and statistical tests were performed using Python 3.8 with the following libraries: NumPy, Pandas, h5py, and SQLAlchemy. Figures were prepared in Python using matplotlib and seaborn, and in R using ggplot2. Code to reproduce the analysis for Figures 1, 2, and 4–7 is available at https://github.com/clee162/Analysis-of-Cell-type-Specific-Responses-to-Associative-Learning-in-M1 (Lee, 2022; copy archived at swh:1:rev:824cf3c2d3c4345b174227656b15d66cb84ede31). Code to reproduce the analysis and Figure 3 is available at https://github.com/nauralcodinglab/interneuron-reward (Harkin, 2022; copy archived at swh:1:rev:3c30ebc43f5032c1ccbc09704e3d0bc295eaa778). Data can be found on Dryad at https://doi.org/10.5061/dryad.q573n5tjj.

Acknowledgements

We thank the members of the Chen lab for discussions and for providing feedback on the manuscript. This work was supported by grants for S.X.C. from Canada Research Chair (CRC) (grant no. 950-231274) and Natural Sciences and Engineering Research Council of Canada (NSERC) (grant no. 05308), and a grant for R.N. from NSERC (grant no. 06972). E.H. was supported by an NSERC graduate scholarship. C.L. was supported by an Ontario Graduate Scholarship and a Queen Elizabeth II Graduate Scholarship.

Funding Statement

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Contributor Information

Simon Chen, Email: schen2@uottawa.ca.

Jun Ding, Stanford University, United States.

Michael J Frank, Brown University, United States.

Funding Information

This paper was supported by the following grants:

  • Natural Sciences and Engineering Research Council of Canada 05308 to Simon Chen.

  • Canada Research Chairs 950-231274 to Simon Chen.

  • Natural Sciences and Engineering Research Council of Canada 06972 to Richard Naud.

Additional information

Competing interests

No competing interests declared.


Author contributions

Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing - original draft, Writing - review and editing.

Formal analysis, performed the analyses for Figure 3.

Data curation.

Formal analysis, Supervision, performed the analyses for Figure 3.

Conceptualization, Funding acquisition, Supervision, Writing - original draft, Writing - review and editing.

Ethics

All animal experiments were approved by the University of Ottawa Animal Care Committee (protocol #: CMM-2737) and in accordance with the Canadian Council on Animal Care guidelines.

Additional files

Transparent reporting form
Supplementary file 1. Summary of the number of mice, total regions of interest (ROIs), total active cells, and total active cells tracked from days 1 to 7 in all experimental conditions.
elife-72549-supp1.xlsx (10.5KB, xlsx)

Data availability

Codes to reproduce the analysis for figures 1-2 and 4-7 are available at https://github.com/clee162/Analysis-of-Cell-type-Specific-Responses-to-Associative-Learning-in-M1. Codes to reproduce the analysis and figure 3 are available at https://github.com/nauralcodinglab/interneuron-reward. Data can be found on Dryad at https://doi.org/10.5061/dryad.q573n5tjj.

The following dataset was generated:

Chen SX. 2022. Data from: Cell-type specific responses to associative learning in the primary motor cortex. Dryad Digital Repository.

References

  1. Abe M, Schambra H, Wassermann EM, Luckenbaugh D, Schweighofer N, Cohen LG. Reward improves long-term retention of a motor memory through induction of offline memory gains. Current Biology. 2011;21:557–562. doi: 10.1016/j.cub.2011.02.030.
  2. Atallah BV, Bruns W, Carandini M, Scanziani M. Parvalbumin-expressing interneurons linearly transform cortical responses to visual stimuli. Neuron. 2012;73:159–170. doi: 10.1016/j.neuron.2011.12.013.
  3. Chen SX, Kim AN, Peters AJ, Komiyama T. Subtype-specific plasticity of inhibitory circuits in motor cortex during motor learning. Nature Neuroscience. 2015;18:1109–1115. doi: 10.1038/nn.4049.
  4. Chu MW, Li WL, Komiyama T. Balancing the Robustness and Efficiency of Odor Representations during Learning. Neuron. 2016;92:174–186. doi: 10.1016/j.neuron.2016.09.004.
  5. Cichon J, Gan WB. Branch-specific dendritic Ca(2+) spikes cause persistent synaptic plasticity. Nature. 2015;520:180–185. doi: 10.1038/nature14251.
  6. Donato F, Rompani SB, Caroni P. Parvalbumin-expressing basket-cell network plasticity induced by experience regulates adult learning. Nature. 2013;504:272–276. doi: 10.1038/nature12866.
  7. Driscoll LN, Pettit NL, Minderer M, Chettih SN, Harvey CD. Dynamic Reorganization of Neuronal Activity Patterns in Parietal Cortex. Cell. 2017;170:986–999. doi: 10.1016/j.cell.2017.07.021.
  8. Duan Z, Li A, Gong H, Li X. A Whole-brain Map of Long-range Inputs to GABAergic Interneurons in the Mouse Caudal Forelimb Area. Neuroscience Bulletin. 2020;36:493–505. doi: 10.1007/s12264-019-00458-6.
  9. Fishell G, Rudy B. Mechanisms of inhibition within the telencephalon: “where the wild things are.”. Annual Review of Neuroscience. 2011;34:535–567. doi: 10.1146/annurev-neuro-061010-113717.
  10. Fu Y, Tucciarone JM, Espinosa JS, Sheng N, Darcy DP, Nicoll RA, Huang ZJ, Stryker MP. A cortical circuit for gain control by behavioral state. Cell. 2014;156:1139–1152. doi: 10.1016/j.cell.2014.01.050.
  11. Galea JM, Mallia E, Rothwell J, Diedrichsen J. The dissociable effects of punishment and reward on motor learning. Nature Neuroscience. 2015;18:597–602. doi: 10.1038/nn.3956.
  12. Gasselin C, Hohl B, Vernet A, Crochet S, Petersen CCH. Cell-type-specific nicotinic input disinhibits mouse barrel cortex during active sensing. Neuron. 2021;109:778–787. doi: 10.1016/j.neuron.2020.12.018.
  13. Georgopoulos AP, Ashe J, Smyrnis N, Taira M. The motor cortex and the coding of force. Science (New York, N.Y.) 1992;256:1692–1695. doi: 10.1126/science.256.5064.1692.
  14. Harkin EF. Calcium activity in M1 during classical conditioning. GitHub. 3c30ebc. 2022. https://github.com/nauralcodinglab/interneuron-reward
  15. Harvey CD, Coen P, Tank DW. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature. 2012;484:62–68. doi: 10.1038/nature10918.
  16. Jackson J, Ayzenshtat I, Karnani MM, Yuste R. VIP+ interneurons control neocortical activity across brain states. Journal of Neurophysiology. 2016;115:3008–3017. doi: 10.1152/jn.01124.2015.
  17. Kato HK, Chu MW, Isaacson JS, Komiyama T. Dynamic sensory representations in the olfactory bulb: modulation by wakefulness and experience. Neuron. 2012;76:962–975. doi: 10.1016/j.neuron.2012.09.037.
  18. Kato HK, Gillet SN, Isaacson JS. Flexible Sensory Representations in Auditory Cortex Driven by Behavioral Relevance. Neuron. 2015;88:1027–1039. doi: 10.1016/j.neuron.2015.10.024.
  19. Komiyama T, Sato TR, O’Connor DH, Zhang Y-X, Huber D, Hooks BM, Gabitto M, Svoboda K. Learning-related fine-scale specificity imaged in motor cortex circuits of behaving mice. Nature. 2010;464:1182–1186. doi: 10.1038/nature08897.
  20. Krabbe S, Paradiso E, d’Aquin S, Bitterman Y, Courtin J, Xu C, Yonehara K, Markovic M, Müller C, Eichlisberger T, Gründemann J, Ferraguti F, Lüthi A. Adaptive disinhibitory gating by VIP interneurons permits associative learning. Nature Neuroscience. 2019;22:1834–1843. doi: 10.1038/s41593-019-0508-y.
  21. Lee S-H, Kwan AC, Zhang S, Phoumthipphavong V, Flannery JG, Masmanidis SC, Taniguchi H, Huang ZJ, Zhang F, Boyden ES, Deisseroth K, Dan Y. Activation of specific interneurons improves V1 feature selectivity and visual perception. Nature. 2012;488:379–383. doi: 10.1038/nature11312.
  22. Lee K, Holley SM, Shobe JL, Chong NC, Cepeda C, Levine MS, Masmanidis SC. Parvalbumin Interneurons Modulate Striatal Output and Enhance Performance during Associative Learning. Neuron. 2017;93:1451–1463. doi: 10.1016/j.neuron.2017.02.033.
  23. Lee C, Lavoie A, Liu J, Chen SX, Liu BH. Light Up the Brain: The Application of Optogenetics in Cell-Type Specific Dissection of Mouse Brain Circuits. Frontiers in Neural Circuits. 2020;14:18. doi: 10.3389/fncir.2020.00018.
  24. Lee C. Analysis of cell-type-specific responses to associative learning in M1. GitHub. 824cf3c. 2022. doi: 10.7554/eLife.72549. https://github.com/clee162/Analysis-of-Cell-type-Specific-Responses-to-Associative-Learning-in-M1
  25. Levy S, Lavzin M, Benisty H, Ghanayim A, Dubin U, Achvat S, Brosh Z, Aeed F, Mensh BD, Schiller Y, Meir R, Barak O, Talmon R, Hantman AW, Schiller J. Cell-Type-Specific Outcome Representation in the Primary Motor Cortex. Neuron. 2020;107:954–971. doi: 10.1016/j.neuron.2020.06.006.
  26. Makino H, Komiyama T. Learning enhances the relative impact of top-down processing in the visual cortex. Nature Neuroscience. 2015;18:1116–1122. doi: 10.1038/nn.4061.
  27. Markram H, Toledo-Rodriguez M, Wang Y, Gupta A, Silberberg G, Wu C. Interneurons of the neocortical inhibitory system. Nature Reviews. Neuroscience. 2004;5:793–807. doi: 10.1038/nrn1519.
  28. Marsh BT, Tarigoppula VSA, Chen C, Francis JT. Toward an autonomous brain machine interface: integrating sensorimotor reward modulation and reinforcement learning. The Journal of Neuroscience. 2015;35:7374–7387. doi: 10.1523/JNEUROSCI.1802-14.2015.
  29. Moran DW, Schwartz AB. Motor cortical representation of speed and direction during reaching. Journal of Neurophysiology. 1999;82:2676–2692. doi: 10.1152/jn.1999.82.5.2676.
  30. Muñoz W, Tremblay R, Levenstein D, Rudy B. Layer-specific modulation of neocortical dendritic inhibition during active wakefulness. Science (New York, N.Y.) 2017;355:954–959. doi: 10.1126/science.aag2599.
  31. Nikooyan AA, Ahmed AA. Reward feedback accelerates motor learning. Journal of Neurophysiology. 2015;113:633–646. doi: 10.1152/jn.00032.2014.
  32. O’Connor DH, Hires SA, Guo ZV, Li N, Yu J, Sun Q-Q, Huber D, Svoboda K. Neural coding during active somatosensation revealed using illusory touch. Nature Neuroscience. 2013;16:958–965. doi: 10.1038/nn.3419.
  33. Peters AJ, Chen SX, Komiyama T. Emergence of reproducible spatiotemporal activity during motor learning. Nature. 2014;510:263–267. doi: 10.1038/nature13235.
  34. Peters AJ, Lee J, Hedrick NG, O’Neil K, Komiyama T. Reorganization of corticospinal output during motor learning. Nature Neuroscience. 2017;20:1133–1141. doi: 10.1038/nn.4596.
  35. Pfeffer CK, Xue M, He M, Huang ZJ, Scanziani M. Inhibition of inhibition in visual cortex: the logic of connections between molecularly distinct interneurons. Nature Neuroscience. 2013;16:1068–1076. doi: 10.1038/nn.3446.
  36. Pi HJ, Hangya B, Kvitsiani D, Sanders JI, Huang ZJ, Kepecs A. Cortical interneurons that specialize in disinhibitory control. Nature. 2013;503:521–524. doi: 10.1038/nature12676.
  37. Poort J, Wilmes KA, Blot A, Chadwick A, Sahani M, Clopath C, Mrsic-Flogel TD, Hofer SB, Khan AG. Learning and attention increase visual response selectivity through distinct mechanisms. Neuron. 2021;1:00954–00955. doi: 10.1016/j.neuron.2021.11.016.
  38. Ramakrishnan A, Byun YW, Rand K, Pedersen CE, Lebedev MA, Nicolelis MAL. Cortical neurons multiplex reward-related signals along with sensory and motor information. PNAS. 2017;114:E4841–E4850. doi: 10.1073/pnas.1703668114.
  39. Ramkumar P, Dekleva B, Cooler S, Miller L, Kording K. Premotor and Motor Cortices Encode Reward. PLOS ONE. 2016;11:e0160851. doi: 10.1371/journal.pone.0160851.
  40. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, Clopath C, Costa RP, de Berker A, Ganguli S, Gillon CJ, Hafner D, Kepecs A, Kriegeskorte N, Latham P, Lindsay GW, Miller KD, Naud R, Pack CC, Poirazi P, Roelfsema P, Sacramento J, Saxe A, Scellier B, Schapiro AC, Senn W, Wayne G, Yamins D, Zenke F, Zylberberg J, Therien D, Kording KP. A deep learning framework for neuroscience. Nature Neuroscience. 2019;22:1761–1770. doi: 10.1038/s41593-019-0520-2.
  41. Seybold BA, Phillips EAK, Schreiner CE, Hasenstaub AR. Inhibitory Actions Unified by Network Integration. Neuron. 2015;87:1181–1192. doi: 10.1016/j.neuron.2015.09.013.
  42. Thabit MN, Nakatsuka M, Koganemaru S, Fawi G, Fukuyama H, Mima T. Momentary reward induce changes in excitability of primary motor cortex. Clinical Neurophysiology. 2011;122:1764–1770. doi: 10.1016/j.clinph.2011.02.021.
  43. Thévenaz P, Ruttimann UE, Unser M. A pyramid approach to subpixel registration based on intensity. IEEE Transactions on Image Processing. 1998;7:27–41. doi: 10.1109/83.650848.
  44. Turi GF, Li W-K, Chavlis S, Pandi I, O’Hare J, Priestley JB, Grosmark AD, Liao Z, Ladow M, Zhang JF, Zemelman BV, Poirazi P, Losonczy A. Vasoactive Intestinal Polypeptide-Expressing Interneurons in the Hippocampus Support Goal-Oriented Spatial Learning. Neuron. 2019;101:1150–1165. doi: 10.1016/j.neuron.2019.01.009.
  45. Wilson NR, Runyan CA, Wang FL, Sur M. Division and subtraction by distinct cortical inhibitory networks in vivo. Nature. 2012;488:343–348. doi: 10.1038/nature11347.
  46. Wolff SBE, Gründemann J, Tovote P, Krabbe S, Jacobson GA, Müller C, Herry C, Ehrlich I, Friedrich RW, Letzkus JJ, Lüthi A. Amygdala interneuron subtypes control fear learning through disinhibition. Nature. 2014;509:453–458. doi: 10.1038/nature13258.
  47. Wood KC, Blackwell JM, Geffen MN. Cortical inhibitory interneurons control sensory processing. Current Opinion in Neurobiology. 2017;46:200–207. doi: 10.1016/j.conb.2017.08.018.
  48. Xu X, Callaway EM. Laminar specificity of functional input to distinct types of inhibitory cortical neurons. The Journal of Neuroscience. 2009;29:70–85. doi: 10.1523/JNEUROSCI.4104-08.2009.
  49. Xu T, Yu X, Perlik AJ, Tobin WF, Zweig JA, Tennant K, Jones T, Zuo Y. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature. 2009;462:915–919. doi: 10.1038/nature08389.
  50. Xue M, Atallah BV, Scanziani M. Equalizing excitation-inhibition ratios across visual cortical neurons. Nature. 2014;511:596–600. doi: 10.1038/nature13321.
  51. Zhang S, Xu M, Kamigaki T, Hoang Do JP, Chang WC, Jenvay S, Miyamichi K, Luo L, Dan Y. Selective attention: Long-range and local circuits for top-down modulation of visual cortex processing. Science (New York, N.Y.) 2014;345:660–665. doi: 10.1126/science.1254126.

Editor's evaluation

Jun Ding

Using advanced live brain imaging techniques, the authors studied the activities of neurons in the primary motor cortex of mice during a classical conditional task, in which a tone is paired with a water reward. They found that distinct types of neurons respond differently to the auditory cue or the reward, and the responses evolve differentially as learning proceeds. This work reveals an interesting role of the motor cortex beyond its well-recognized function in motor control and suggests distinct functions of pyramidal neurons as well as various interneurons in reinforcement learning.

Decision letter

Editor: Jun Ding
Reviewed by: Jerry L Chen, Hyung-Bae Kwon

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Cell-Type Specific Responses to Associative Learning in the Primary Motor Cortex" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jerry L Chen (Reviewer #1); Hyung-Bae Kwon (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

Reviewers think that additional control experiments are needed for the revision. In addition, additional data analysis and clarification are needed to strengthen the paper. The following are essential revisions:

1. To address the question of whether changes are specifically associated with learning, a critical control experiment is missing. The behavioral study lacks a non-paired control – it would be more compelling if there was a CS of the same modality that was not paired with the US so we can be sure that the effects are not cue-specific but specific to conditioning. Then those cues would need to be counterbalanced across animals. This would be important for us to conclude the neural effects reflect associative learning vs some other impact of the cue over time.

2. Please examine the motor-related activity in the data set and address the following questions raised by Reviewer #1 and Reviewer #2: (a) As a control, can you quantify the number of licking-related neurons across cell types and confirm that they do not change with learning? (b) Are there neurons that show mixed responses to cue and licking? Do those responses change at all during learning? (c) Are there neurons that show mixed responses to licking and reward? Do those responses change at all during learning?

3. The 2.5s period chosen for analysis covers both tone presentation and delay. Will the conclusions in Figure 2E-H change if the analysis is restricted to the 1s of tone presentation? In addition, it seems better to use the actual duration of reward presentation (see Q2) for analysis. As the statistical analysis is done by random shuffling, there seems no need to match the period for tone and reward analyses.

4. In Activity Analysis and Tuning Coefficients Calculation, the authors performed a resampling of mice with replacement, and the size of the random sample equals that of the original set of mice. Please clarify why this is done this way. Can the authors simply take all mice into analysis?

5. Both Reviewer #2 and Reviewer #3 pointed out the importance of plotting calcium signals over days without re-sorting. The authors conclude that "PV-INs that began as highly reliable maintained their reliability to the CS, …, while PV-INs that began as low reliability became significantly more reliable." However, Figure 4F only shows how the percentage of neurons in the high- or low-reliability category changes over training. To draw this conclusion, the authors need to track individual neurons and compare the same neuron's reliability on d1 and d7.

Reviewer #1 (Recommendations for the authors):

I would encourage the authors to examine the motor-related activity in their data set, to help shed light on the following questions. Do the cue- and reward-related changes really reflect local circuit changes, as the authors seem to suggest? Or could they potentially reflect changes outside of M1 that are then inherited and read out by M1 cell types? If local circuit changes occur, one might expect to see changes in the conjunctive responses of cue, reward, and motor activity within individual cells. Changes in network activity between cue, reward, and motor cells may also be observed. It should be possible and worth examining these relationships to tease apart potential mechanisms and impact of non-motor changes in M1.

Reviewer #2 (Recommendations for the authors):

1. It is helpful to provide further details about the genetic background of the transgenic animals. Are they homozygous or heterozygous? What genetic background are they maintained in? Also, it would be helpful to indicate the number of neurons imaged per mouse.

2. A question about the behavioral setup. How long is the water reward presented? How many licks does it take the mouse to consume the 10 ul water reward? It seems from Figure 1B that most licks occur when water is no longer available. Also, the example in Figure 2C suggests that post-cue licking is far fewer on d7 than on d1, suggesting that the mouse has learned to inhibit non-rewarded licking. This seems not to agree with Figure 1C.

3. As imaging is conducted in M1, how is the response of the cells related to licking? Can the authors make a licking-triggered average of Ca traces for comparison?

4. In Figure 2B: is the scale bar 10% or 10, i.e., 1000%? Many transients show a very slow onset, which is not consistent with the rapid rising phase of GCaMP6f signals as shown in many previous publications. Also, previous publications show that PV and SOM interneurons have very synchronized activity in mPFC (Pinto and Dan, 2015 Neuron) and secondary motor cortex (Garcia-Junco-Clemente et al., 2019 Cell Report). Is it true in M1? Can the authors give some examples of Ca traces of each type of the interneurons? If Ca transients in M1 interneurons exhibit different kinetics from those in pyramidal neurons, how would that affect the choice of analysis criteria?

5. The 2.5s period chosen for analysis covers both tone presentation and delay. Will the conclusions in Figure 2E-H change if the analysis is restricted to the 1s of tone presentation? In addition, it seems better to use the actual duration of reward presentation (see Q2) for analysis. As the statistical analysis is done by random shuffling, there seems no need to match the period for tone and reward analyses.

6. In Activity Analysis and Tuning Coefficients Calculation, the authors performed a resampling of mice with replacement, and the size of the random sample equals that of the original set of mice. What is the reason to do this? Can the authors simply take all mice into analysis?

7. Figure 3A-B: a little clarification is needed. (1) Is each trace a single trial or the average over trials? (2) What does it mean that "each trace is from the same neuron on d7?" Are the five traces from one neuron, or five neurons? If they are from five neurons, are these the same five neurons in A and B? (3) Why does the color of a single trace change with time?

8. The authors conclude that "PV-INs that began as highly reliable maintained their reliability to the CS, …, while PV-INs that began as low reliability became significantly more reliable." However, Figure 4F only shows how the percentage of neurons in the high- or low-reliability category changes over training. The authors need to track individual neurons and compare the same neuron's reliability on d1 and d7 to draw this conclusion. The same issue applies to the analysis of all cell types.

Reviewer #3 (Recommendations for the authors):

First of all, the work is great. Analyzing calcium activity from each interneuron cell type is quite difficult, but authors elegantly performed the experiments. I wonder whether you can plot calcium signals over days (day 1 to day 7) without resorting. I understand it is very difficult, but if you have a good imaging quality enough to trace calcium activity from the same cells over several days, it would be nicer to show. Another concern is the lack of control experiments that show no such changes presented in training groups. Otherwise, it is quite good.

eLife. 2022 Feb 3;11:e72549. doi: 10.7554/eLife.72549.sa2

Author response


Essential revisions:

Reviewers think that additional control experiments are needed for the revision. In addition, additional data analysis and clarification are needed to strengthen the paper. The following are essential revisions:

1. To address the question of whether changes are specifically associated with learning, a critical control experiment is missing. The behavioral study lacks a non-paired control – it would be more compelling if there was a CS of the same modality that was not paired with the US so we can be sure that the effects are not cue-specific but specific to conditioning. Then those cues would need to be counterbalanced across animals. This would be important for us to conclude the neural effects reflect associative learning vs some other impact of the cue over time.

We thank the reviewers for suggesting this control experiment. To properly address this, we conducted 3 sets of experiments using PV-Cre and VIP-Cre mice. In the first experiment, we exposed PV-Cre mice to the same auditory tone used as the CS but we omitted all water rewards, and recorded the response of PV-INs to tone on both Day 1 and Day 7 (Figure 4 —figure supplement 1A). Additionally, in separate cohorts of animals, we exposed PV-Cre and VIP-Cre mice to tone but with non-paired water rewards (given at randomly varied time intervals) and recorded the responses from PV-INs and VIP-INs on both Day 1 and Day 7 (Figure 4 —figure supplement 1D). In all three experiments, tone was presented at the same frequency and duration as previous CS-reward experiments, and the number of trials was also equivalent.

Surprisingly, we found that when mice were not given water rewards, PV-INs did not respond to the tone stimulus on either Day 1 or Day 7, as the mean percent of active cells during the tone response period was not significantly different from the null distribution generated by randomly sampling the session (Figure 4 —figure supplement 1B). In comparison, when we examined mice that were exposed to the tone stimulus with non-paired water rewards, we found that PV-INs were significantly responsive to the tone stimulus on Day 1, similar to what we observed in Figure 2. Interestingly, by Day 7, PV-INs no longer responded to the tone stimulus (Figure 4 —figure supplement 1G). We next examined if the mice that received the tone stimulus with non-paired water reward learned to associate the tone with the reward after 7 days. We found the animals did not learn the association, as their conditioned response (anticipatory licking) did not increase at Day 7 (Figure 4 —figure supplement 1E). Together, in the first set of experiments where reward was omitted, we demonstrated that PV-INs did not respond to the tone stimulus on either day in the absence of reward. In the second set of experiments, where the tone was not paired with reward and the mice did not learn the CS-reward association, PV-INs were initially responsive to the tone on Day 1, but by Day 7, they no longer responded. These experiments suggest that PV-INs in M1 do not respond to auditory tone in general, but instead only respond to the tone when the animal is actively associating it with reward. In addition, unlike the mice that learned the association, we did not observe a change in the mean percent of tone-responsive PV-INs from Day 1 to Day 7 in the non-paired mice, and ‘Low Reliability’ PV-INs did not show a change in tone reliability (Figure 4 —figure supplement 1H-I). These results further demonstrate that the changes among PV-INs to the CS tone were specific to associative learning.

Lastly, in Figure 5, we showed that after associative learning, the reward response reliability of ‘Low Reliability’ VIP-INs increased; therefore, we performed an additional control experiment examining VIP-IN responses to reward in the non-paired paradigm. We found that although VIP-Cre mice did not learn to associate the tone with the random water rewards (Figure 4 —figure supplement 1E), VIP-INs consistently responded to rewards (Figure 4 —figure supplement 1M). However, without learning the association, the reward response reliability of ‘Low Reliability’ VIP-INs did not change from Day 1 to Day 7 (Figure 4 —figure supplement 1O).

In conclusion, these control experiments further support our findings that different cell types in M1 undergo cell-type specific modifications after associative learning, in which PV-INs’ responses became more reliable to the cue stimulus, while VIP-INs’ responses became more reliable to the reward. We have included these results in the manuscript (Line 235 – 260, 279 – 290).

2. Please examine the motor-related activity in the data set and address the following questions raised by Reviewer #1 and Reviewer #2: (a) As a control, can you quantify the number of licking-related neurons across cell types and confirm that they do not change with learning? (b) Are there neurons that show mixed responses to cue and licking? Do those responses change at all during learning? (c) Are there neurons that show mixed responses to licking and reward? Do those responses change at all during learning?

We thank the reviewers for these suggestions. To identify licking-related neurons, we took all the active cells within a session and examined their activity during the first lick bout of each ITI. We then took the mean z-score during the lick bouts and compared it to the mean z-score of an equal number of randomly sampled bouts (with equal duration). If the mean lick bout z-score was significantly higher than the mean random bout z-score, we considered the cell to be a lick neuron. We found that on both Day 1 and Day 7, the majority of neurons (from all cell types) were non-lick neurons (Figure 7 —figure supplement 1A). We also tracked individual neurons, performed the same analysis for both Day 1 and Day 7, and examined whether lick and non-lick neurons shift their responses after associative learning. We found that the majority of cells maintained the same responses, and non-lick cells were still the largest group in all cell types (Figure 7 —figure supplement 1B).
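The lick-neuron criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used for the paper: the names `mean_over_bouts` and `is_lick_neuron` are hypothetical, bouts are represented as (start frame, length) pairs, and significance is assessed here with a one-sided Monte Carlo p-value (1,000 random draws, alpha = 0.05), whereas the response does not specify which test was applied.

```python
import random

def mean_over_bouts(trace, bouts):
    """Mean z-score over every frame covered by the (start, length) bouts."""
    values = [trace[t] for start, length in bouts
              for t in range(start, start + length)]
    return sum(values) / len(values)

def is_lick_neuron(trace, lick_bouts, n_draws=1000, alpha=0.05, seed=0):
    """Compare the mean lick-bout z-score with randomly placed bouts.

    Each random draw uses the same number of bouts with the same durations
    as the real lick bouts (assumption: starts are drawn uniformly from the
    whole session). Returns (is_lick, p_value).
    """
    rng = random.Random(seed)
    observed = mean_over_bouts(trace, lick_bouts)
    at_least_as_large = 0
    for _ in range(n_draws):
        random_bouts = [(rng.randrange(0, len(trace) - length + 1), length)
                        for _, length in lick_bouts]
        if mean_over_bouts(trace, random_bouts) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_draws + 1)
    return p_value < alpha, p_value

# Toy trace: activity is elevated only during the two lick bouts.
trace = [0.0] * 200
for start in (50, 120):
    for t in range(start, start + 10):
        trace[t] = 5.0
print(is_lick_neuron(trace, [(50, 10), (120, 10)]))
```

On the toy trace, the cell is labelled a lick neuron because randomly placed bouts almost never reach the mean z-score of the real bouts; a flat trace would not be labelled.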

Next, we examined whether these lick neurons also showed mixed responses to CS and reward on Day 1 and Day 7. Indeed, these lick neurons exhibited mixed responses to CS, reward, or CS + reward (Figure 7 —figure supplement 1C). We then divided the lick neurons into three categories – ‘CS cells’, ‘reward cells’, and ‘CS + reward cells’. When we compared the percentage of neurons in each category between Day 1 and Day 7, we did not observe a significant difference (Figure 7 —figure supplement 1D). Together, our analyses showed that there were not many lick neurons (from all cell types) present in M1, and the percentage of lick and non-lick neurons did not change after associative learning. This is consistent with previous optical and electrical micro-stimulation studies showing that tongue, jaw, and lip movements are more reliably evoked in the Anterior-Lateral-Motor area (ALM) (PMID: 20376005, 2036613) than in M1. Furthermore, we found that while these lick neurons also exhibit mixed responses to CS, reward, or CS + reward, the percentage of these mixed-response cells did not change after associative learning. Therefore, since the lick cells were stable from Day 1 to Day 7, it is unlikely that they impacted the learning-related changes. We have included these results in Figure 7 —figure supplement 1 and Line 322 – 331.

3. The 2.5s period chosen for analysis covers both tone presentation and delay. Will the conclusions in Figure 2E-H change if the analysis is restricted to the 1s of tone presentation? In addition, it seems better to use the actual duration of reward presentation (see Q2) for analysis. As the statistical analysis is done by random shuffling, there seems no need to match the period for tone and reward analyses.

Following the reviewers’ suggestion, we calculated the percent of CS-responsive neurons for each cell type on Day 1 and Day 7 using a 1s time window following CS presentation. Surprisingly, we found the percentage of CS-responsive neurons decreased significantly compared to the 2.5s window in all cell types, and the percentages were all around 5% of total active neurons (Author response image 1A). One plausible explanation is that M1 is not a region for direct sensory input related to auditory tone, water, or reward; hence, neuronal responses in M1 could be slightly delayed. This is supported by our no-reward control experiment, in which PV-INs did not respond to auditory tone when no water rewards were given (Figure 4 —figure supplement 1B). In combination with the slow kinetics of calcium indicator compared to electrophysiology recording, we believe this explains why a 1s window is not sufficient to capture CS or reward-related activity in M1. Since all cell types only have around 5% of CS-responsive cells, we do not think using the 1-sec time window truly represents the CS responses in M1.

Author response image 1.

(A) Comparing the percent of CS-responsive cells calculated with a 2.5s or a 1.0s response window after CS onset. For all cell types, the percent of CS-responsive cells was greatly reduced when using a 1.0s response window for analysis compared to a 2.5s window. Most cell types were reduced to about 5% of responsive cells. Paired t-test, * p < 0.05, ** p < 0.01, *** p < 0.001. PN: Day 1 p < 1×10⁻³, Day 7 p < 1×10⁻³; PV-IN: Day 1 p = 0.002, Day 7 p = 0.001; VIP-IN: Day 1 p = 0.008, Day 7 p < 1×10⁻³; SOM-IN: Day 1 p = 0.034, Day 7 p < 1×10⁻³. (B) Comparing Monte-Carlo resampling methods using sampling with replacement vs. without replacement. For each cell type, the null distribution estimates the percent of cells active at baseline using sampling with replacement (top row) and without replacement (bottom row) for Day 1 and Day 7. With replacement: mice were re-sampled with replacement, the session was randomly sampled, and the mean percent of active cells was calculated; this was repeated 1000 times. Without replacement: all mice were sampled once, the session was randomly sampled, and the mean percent of active cells was calculated; this was repeated 1000 times. As random sampling of mice with replacement allows for a more unbiased estimate of between-animal variability, the distribution created without replacement is much narrower with a smaller range and standard deviation. PN: n = 1029 cells from 6 mice. PV-IN: n = 316 cells from 6 mice. VIP-IN: n = 407 cells from 4 mice. SOM-IN: n = 189 cells from 7 mice. Error bars show SEM.

Regarding Reviewer #2’s Q2, for each trial, the entire 10µL water reward was delivered all at once. However, we noticed that the water droplet sometimes disperses to the sides of the lick port, so it is hard to determine how many licks it takes the mouse to finish the water. Since this varies from trial to trial, it is difficult to determine the exact reward duration for each trial. The example trace in Figure 2C only reflects that particular trial; in the other licking examples in Figure 4B and 5B, mice all showed sustained licking after water delivery on Day 7.

4. In Activity Analysis and Tuning Coefficients Calculation, the authors performed a resampling of mice with replacement, and the size of the random sample equals that of the original set of mice. Please clarify why this is done this way. Can the authors simply take all mice into analysis?

In our understanding, it is typical to perform the bootstrap analysis with random sample replacement. In two classic papers (Efron, The Annals of Statistics, 1979; Sitter, Comparing three bootstrap methods for survey data, 1992), the authors described the advantage of sampling with replacement for obtaining a more accurate and unbiased estimate of variance.

Following the reviewer’s suggestion, we repeated the analysis in Figure 3 without sample replacement for all cell types. Interestingly, we noticed that after 1,000 repetitions, the distributions using ‘without replacement’ were much narrower compared to ‘with replacement’ (Author response image 1B). This is in line with the ‘without replacement’ method providing an inherently biased estimate of variance that does not capture the true distribution of the data.
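The two resampling schemes compared here can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `null_distribution` is hypothetical, each mouse is represented by a per-frame series of the percent of active cells, and "randomly sampling the session" is taken to mean drawing one window of fixed length at a uniformly random position.

```python
import random
import statistics

def null_distribution(percent_active_by_mouse, window, n_iter=1000,
                      resample_mice=True, seed=0):
    """Build a null distribution of the mean percent of active cells.

    percent_active_by_mouse: dict mapping mouse ID to a per-frame series
        (hypothetical data layout).
    resample_mice=True draws as many mice as in the original set, with
        replacement; False uses every mouse exactly once per iteration.
    """
    rng = random.Random(seed)
    mice = list(percent_active_by_mouse)
    means = []
    for _ in range(n_iter):
        drawn = [rng.choice(mice) for _ in mice] if resample_mice else mice
        per_mouse = []
        for mouse in drawn:
            series = percent_active_by_mouse[mouse]
            # Random window, irrespective of the behavioural task.
            start = rng.randrange(0, len(series) - window + 1)
            per_mouse.append(sum(series[start:start + window]) / window)
        means.append(sum(per_mouse) / len(per_mouse))
    return means

# Toy data: three mice that differ from each other but are flat over time,
# so all spread in the null distribution comes from which mice are drawn.
toy = {"A": [10] * 20, "B": [30] * 20, "C": [50] * 20}
with_repl = null_distribution(toy, window=5, resample_mice=True)
without_repl = null_distribution(toy, window=5, resample_mice=False)
print(statistics.pstdev(with_repl), statistics.pstdev(without_repl))
```

In this toy case the distribution without replacement has zero spread, because between-mouse differences never enter it, while resampling mice with replacement yields a wider distribution. That is the same qualitative pattern as the narrower 'without replacement' distributions reported above.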

5. Both Reviewer #2 and Reviewer #3 pointed out the importance of plotting calcium signals over days without re-sorting. The authors conclude that "PV-INs that began as highly reliable maintained their reliability to the CS, …, while PV-INs that began as low reliability became significantly more reliable." However, Figure 4F only shows how the percentage of neurons in the high- or low-reliability category changes over training. To draw this conclusion, the authors need to track individual neurons and compare the same neuron's reliability on d1 and d7.

We thank the reviewers for their suggestion. We performed the suggested analysis focusing on CS-responsive PV-INs and reward-responsive VIP-INs as these two populations showed an increase in reliability after associative learning (Figure 4F and 5F).

We were able to track each neuron and compare the same neuron’s reliability between Day 1 and Day 7. We first plotted an example mouse to show their reliability increases from Day 1 to Day 7. To do this, we began by sorting the neurons based on their Day 1 reliability and showed each individual neuron’s response to every trial within the session. We then tracked the same neuron to Day 7 and again showed each neuron’s response to every trial while maintaining the same order as Day 1 (without re-sorting on Day 7). We can see that many of the low reliability neurons on Day 1 had more active trials on Day 7 (Figure 4H and 5H).

To quantify that the increase in reliability of individual neurons is significant, we isolated the ‘Low Reliability’ neurons on Day 1 and calculated the change in reliability (Δreliability) for each individual neuron between Day 1 and Day 7. As a control, we randomly sampled the Day 7 session irrespective of the behavioural task and calculated a reliability value. We then subtracted that value from the actual Day 1 reliability value to generate a random Δreliability value for each neuron. By comparing the two distributions, we found the Δreliability from Day 1 to Day 7 in both PV-INs’ responses to CS and VIP-INs’ responses to reward were significantly greater than the random Δreliability in control (Figure 4I and 5I). Together, by tracking the same neurons from Day 1 to Day 7, we showed that the increase in reliability also occurs at the individual neuron level. We have included these results in the manuscript (Line 218 – 234, 271 – 278).
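The Δreliability comparison can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, reliability is assumed to be the fraction of response windows containing at least one active frame, Δreliability is assumed to be Day 7 minus Day 1, and the random control is assumed to use windows placed uniformly at random in the Day 7 session with the same sign convention.

```python
import random

def reliability(active, window_starts, window_len):
    """Fraction of windows that contain at least one active frame."""
    hits = sum(any(active[s:s + window_len]) for s in window_starts)
    return hits / len(window_starts)

def delta_reliability(day1_active, day7_active, day1_starts, day7_starts,
                      window_len, seed=0):
    """Return (actual, random) change in reliability for one tracked neuron.

    actual: Day 7 reliability minus Day 1 reliability, using the real
        response windows (e.g. after CS onset or reward delivery).
    random: the same difference, but with Day 7 windows placed at random
        positions in the session, irrespective of the behavioural task.
    """
    rng = random.Random(seed)
    r1 = reliability(day1_active, day1_starts, window_len)
    r7 = reliability(day7_active, day7_starts, window_len)
    random_starts = [rng.randrange(0, len(day7_active) - window_len + 1)
                     for _ in day7_starts]
    r7_random = reliability(day7_active, random_starts, window_len)
    return r7 - r1, r7_random - r1

# Toy neuron: responds on 1 of 4 trials on Day 1 and on 4 of 4 on Day 7.
starts = [0, 25, 50, 75]
day1 = [False] * 100
day7 = [False] * 100
for t in range(0, 5):
    day1[t] = True
for s in starts:
    for t in range(s, s + 5):
        day7[t] = True
print(delta_reliability(day1, day7, starts, starts, window_len=5))
```

Collecting the two values across all 'Low Reliability' neurons gives the actual and random Δreliability distributions that are then compared statistically.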

Reviewer #1 (Recommendations for the authors):

I would encourage the authors to examine the motor-related activity in their data set, to help shed light on the following questions. Do the cue- and reward-related changes really reflect local circuit changes, as the authors seem to suggest? Or could they potentially reflect changes outside of M1 that are then inherited and read out by M1 cell types? If local circuit changes occur, one might expect to see changes in the conjunctive responses of cue, reward, and motor activity within individual cells. Changes in network activity between cue, reward, and motor cells may also be observed. It should be possible and worth examining these relationships to tease apart potential mechanisms and impact of non-motor changes in M1.

We thank the reviewer for bringing up this important issue. Please find a detailed response above in ‘Essential Revision #2’. In brief, we identified lick cells in all the cell types and further examined their responses to CS, reward, or CS + reward (Figure 7 —figure supplement 1). We found the majority of active neurons (in all cell types) were non-lick neurons, and the percentage remained stable from Day 1 to Day 7. Moreover, among the lick cells that showed responses to CS, reward, or CS + reward, the percentage of cells in each category also did not change significantly after associative learning. Our working hypothesis is that the changes in CS and reward representations observed in M1 are happening outside of M1, and long-range inputs from other brain regions to a specific cell type in M1 (CS for PV-INs or reward for VIP-INs) are increased and/or strengthened after associative learning. This hypothesis is supported by our control experiments, in which in the no-reward paradigm, PV-INs in M1 did not respond to auditory tone when no water rewards were given. Also, while VIP-INs consistently responded to water rewards on both day 1 and day 7 in the non-paired behavioral paradigm, their reliability did not change when the animals did not learn the association. Future work will involve identifying the brain regions that provide these specific inputs to M1; however, we think that is beyond the scope of this manuscript. We have included a brief discussion of our working hypothesis in the Discussion section (Line 412 – 417).

Reviewer #2 (Recommendations for the authors):

1. It is helpful to provide further details about the genetic background of the transgenic animals. Are they homozygous or heterozygous? What genetic background are they maintained in? Also, it would be helpful to indicate the number of neurons imaged per mouse.

Following the reviewer’s suggestion, we have included the new information regarding the transgenic mice in the Methods (Line 709 – 710), and an additional table with the number of neurons imaged per mouse (Supplementary File 1). The figure legends indicate the total number of ROIs across all mice for each experimental condition, and the table provides further details including the number of active cells on day 1 (irrespective of the behavioral task), and the number of active cells tracked from day 1 to day 7 from each individual mouse for all experimental conditions used.

2. A question about the behavioral setup. How long is the water reward presented? How many licks does it take the mouse to consume the 10 ul water reward? It seems from Figure 1B that most licks occur when water is no longer available. Also, the example in Figure 2C suggests that post-cue licking is far fewer on d7 than on d1, suggesting that the mouse has learned to inhibit non-rewarded licking. This seems not to agree with Figure 1C.

We thank the reviewer for the question. The 10µL water reward was delivered all at once. However, we noticed that sometimes the water droplet gets dispersed to the sides of the lick port, and the mouse continues to lick although no additional water was delivered. Hence, it is hard to determine how many licks it takes for the mouse to finish the water. Since it varies trial-to-trial, it is difficult to determine the exact reward duration for each trial. The example in Figure 2C only reflects that particular trial because in other licking examples provided in Figure 1C, 4B, 5B, mice all showed continuous licking after water delivery on Day 7.

3. As imaging is conducted in M1, how is the response of the cells related to licking? Can the authors make a licking-triggered average of Ca traces for comparison?

We thank the reviewer for bringing up this important issue. Please find a detailed response above in ‘Essential Revision #2’. In brief, we did not find many lick neurons (from all cell types) present in M1, and the percentage of lick and non-lick neurons did not change after associative learning. This is consistent with previous optical and electrical micro-stimulation studies showing that tongue, jaw, and lip movements are more reliably evoked in the Anterior-Lateral-Motor area (ALM) (PMID: 20376005, 2036613) than in M1. We have also included example individual Ca2+ traces during ITI lick bouts for each of the cell types (Figure 2 —figure supplement 1).

4. In Figure 2B: is the scale bar 10% or 10, i.e., 1000%? Many transients show a very slow onset, which is not consistent with the rapid rising phase of GCaMP6f signals as shown in many previous publications. Also, previous publications show that PV and SOM interneurons have very synchronized activity in mPFC (Pinto and Dan, 2015 Neuron) and secondary motor cortex (Garcia-Junco-Clemente et al., 2019 Cell Report). Is it true in M1? Can the authors give some examples of Ca traces of each type of the interneurons? If Ca transients in M1 interneurons exhibit different kinetics from those in pyramidal neurons, how would that affect the choice of analysis criteria?

We thank the reviewer for the question. The ‘10’ in Figure 2B is the z-score value, which means 10 standard deviations from the mean of the neuron’s entire trace.

Following the reviewer’s suggestion, we have provided Ca2+ traces of each cell type (PN, PV-INs, VIP-INs, SOM-INs). We showed 5 cells during one lick bout and during one trial on Day 1. We then tracked the same cells to Day 7 and showed the Ca2+ trace of the same neurons during one trial on Day 7. In general, we did not notice much difference in the kinetics between cell types during Ca2+ imaging (Figure 2 —figure supplement 1). We also assessed published work from other groups using GCaMP6f for in vivo Ca2+ imaging in inhibitory neurons to further verify the kinetics we observed. When we compared the kinetics of our calcium traces from PN, PV-IN, VIP-IN and SOM-IN to those featured in Pinto and Dan, Neuron, 2015 (PMID: 26143660), Puscian et al., Cell Reports, 2020 (PMID: 32726633) and Poort et al., Neuron, 2021 (PMID: 34906356), we did not observe a notable difference. In addition, since we utilized a threshold to determine active events, the rise kinetics should not influence the quantification of active events. We employed a similar active event criterion as Kato et al., Neuron, 2015 (PMID: 26586181), where they used GCaMP6s for Ca2+ imaging in PV-INs and SOM-INs in the auditory cortex and a threshold of over 1.0 z-score for 3 consecutive frames to identify active events. In our paper, we used GCaMP6f (which has faster kinetics than GCaMP6s), and we used a longer criterion for active events (above 1.0 z-score for 5 consecutive frames) than theirs. Hence, we do not believe that the kinetics of different neuron types will affect our analyses.
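The active-event criterion mentioned above (z-score above 1.0 for 5 consecutive frames) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names `zscore_trace` and `active_event_frames` are hypothetical, the z-score is assumed to use the mean and population standard deviation of the neuron's entire trace, and marking every frame of a qualifying run as active is likewise an assumption.

```python
import statistics

def zscore_trace(trace):
    """Z-score a fluorescence trace against its own mean and SD."""
    mean = statistics.fmean(trace)
    sd = statistics.pstdev(trace)  # assumption: population SD of the whole trace
    return [(x - mean) / sd for x in trace]

def active_event_frames(zscores, threshold=1.0, min_frames=5):
    """Mark frames that belong to an active event.

    A run of frames counts as an event only if the z-score stays above
    `threshold` for at least `min_frames` consecutive frames.
    """
    active = [False] * len(zscores)
    run_start = None
    # A trailing sentinel below threshold closes a run that reaches the end.
    for i, z in enumerate(list(zscores) + [float("-inf")]):
        if z > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                for j in range(run_start, i):
                    active[j] = True
            run_start = None
    return active

# A 3-frame excursion is ignored; a 5-frame excursion counts as an event.
z = [0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0]
print(active_event_frames(z))
```

Because a run only has to stay above threshold for the minimum number of frames, the shape of the rising phase does not change whether an event is counted, which is the point made in the response about rise kinetics.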

In regard to synchrony, we generated similar visualization plots as Pinto and Dan, 2015 and Garcia-Junco-Clemente et al., 2019 following the reviewer’s question. In Pinto and Dan, 2015, the authors show that Ca2+ traces from different PV-IN and SOM-IN neurons in mPFC are highly correlated. We qualitatively compared these to our own individual traces (Figure 2 —figure supplement 1) and did not observe synchrony. In Garcia-Junco-Clemente et al., 2019, the authors used heatmaps to visualize PV-IN population activity in M2 over extended periods of time. They found synchrony within two PV-IN subpopulations – one group that was highly active when the mouse was running, and another group that was highly active during stationary periods. Therefore, we performed a similar analysis and used heatmaps to visualize the activity of all PV-INs within each mouse over 1,000s periods (similar time course as in Garcia-Junco-Clemente et al.); however, we did not observe any synchrony. We also did not observe any synchrony in SOM-INs (Author response image 2A-B). It is possible that synchrony is specific to certain brain regions, specific tasks and/or specific behavioural states.

Author response image 2.

(A-B) Examining if PV-INs or SOM-INs in M1 showed synchronized activity. Z-scored activity of all tracked active neurons from an example PV-Cre mouse and SOM-Cre mouse during the first 1,000s of the Day 1 session (left) and the Day 7 session (right). Neuron order along the y-axis was not sorted and was maintained across days. Red arrows indicate trial start time, and white dashed lines indicate water reward delivery. Licking activity is shown above. Different from Garcia-Junco-Clemente et al., 2019, synchrony was not apparent within the PV-IN or SOM-IN population.

5. The 2.5s period chosen for analysis covers both tone presentation and delay. Will the conclusions in Figure 2E-H change if the analysis is restricted to the 1s of tone presentation? In addition, it seems better to use the actual duration of reward presentation (see Q2) for analysis. As the statistical analysis is done by random shuffling, there seems no need to match the period for tone and reward analyses.

We thank the reviewer for the suggestion. Please find a detailed response above in ‘Essential Revision #3’. In brief, we calculated the percent of CS-responsive neurons for each cell type on Day 1 and Day 7 using the 1-sec time window following CS presentation. We found the percentage of CS-responsive neurons decreased significantly compared to the 2.5-sec window in all the cell types, and the percentages were all around 5% of total active neurons (Author response image 1A). Since all cell types had a low percentage of CS-responsive cells, we do not think using a 1-sec time window truly represents the CS responses in M1.

6. In Activity Analysis and Tuning Coefficients Calculation, the authors performed a resampling of mice with replacement, and the size of the random sample equals that of the original set of mice. What is the reason to do this? Can the authors simply take all mice into analysis?

We thank the reviewer for the suggestion. Please find a detailed response above in ‘Essential Revision #4’.

7. Figure 3A-B: a little clarification is needed. (1) Is each trace a single trial or the average over trials? (2) What does it mean that "each trace is from the same neuron on d7?" Are the five traces from one neuron, or five neurons? If they are from five neurons, are these the same five neurons in A and B? (3) Why does the color of a single trace change with time?

We apologize for the lack of clarity and confusion. Each trace is the mean z-score from all the trials of one neuron, and they are different neurons on Day 1 and Day 7. We have removed the sentence ‘each trace is from the same neuron on d7’ from the figure legend. The lighter color part of the trace during non-CS or non-reward period was meant to highlight the actual response period. We have now re-made the figures, and the entire trace is a uniform color that represents the correlation value during CS or reward period.

8. The authors conclude that "PV-INs that began as highly reliable maintained their reliability to the CS, …, while PV-INs that began as low reliability became significantly more reliable." However, Figure 4F only shows how the percentage of neurons in the high- or low-reliability category changes over training. The authors need to track individual neurons and compare the same neuron's reliability on d1 and d7 to draw this conclusion. The same issue applies to the analysis of all cell types.

We thank the reviewer for the comments. Please find a detailed response above in ‘Essential Revision #5’. In brief, we tracked each PV-IN in the ‘Low Reliability’ to CS group and each VIP-IN in the ‘Low Reliability’ to reward group from Day 1 to Day 7, and we calculated the change in reliability (ΔReliability) between the two days. As a control, we randomly sampled Day 7 activity irrespective of the behavioral task and generated a reliability value. We then subtracted that value from the actual Day 1 reliability to generate a random ‘ΔReliability’ value for each neuron. By comparing the two distributions, we found that the increases in reliability from Day 1 to Day 7 in both Low Reliability PV-INs to the CS and Low Reliability VIP-INs to reward were significantly higher than random.
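
The comparison described above can be sketched as follows. This is a simplified illustration rather than our analysis code, and it rests on several assumptions that are not specified in this response: reliability is taken to be the fraction of trials in which a neuron is active, ΔReliability is taken as Day 7 minus Day 1, and the random control draws response windows at random times from a frame-by-frame record of Day 7 activity. All function names, parameters and example values are hypothetical.

```python
import random


def reliability(trial_active):
    """Fraction of trials in which the neuron was active (assumed definition)."""
    return sum(trial_active) / len(trial_active)


def random_reliability(active_frames, n_trials, window, rng):
    """Reliability computed from `n_trials` windows of `window` frames placed at
    random times in the session, i.e. irrespective of the behavioral task."""
    last_start = len(active_frames) - window
    hits = 0
    for _ in range(n_trials):
        start = rng.randint(0, last_start)  # inclusive bounds
        hits += any(active_frames[start:start + window])
    return hits / n_trials


def delta_reliability(day1_trials, day7_trials):
    """Change in reliability from Day 1 to Day 7 (sign convention assumed)."""
    return reliability(day7_trials) - reliability(day1_trials)


# Made-up example for one neuron: active on 1 of 4 trials on Day 1
# and on 3 of 4 trials on Day 7.
day1 = [True, False, False, False]
day7 = [True, True, True, False]
observed = delta_reliability(day1, day7)

# Made-up Day 7 frame-by-frame activity used for the random control.
rng = random.Random(0)
day7_frames = [False] * 90 + [True] * 10
control = random_reliability(day7_frames, n_trials=4, window=5, rng=rng) - reliability(day1)
print(observed, control)
```

Repeating this for every tracked neuron yields one distribution of observed ΔReliability values and one distribution of random values, which can then be compared with a suitable statistical test.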

Reviewer #3 (Recommendations for the authors):

First of all, the work is great. Analyzing calcium activity from each interneuron cell type is quite difficult, but authors elegantly performed the experiments. I wonder whether you can plot calcium signals over days (day 1 to day 7) without resorting. I understand it is very difficult, but if you have a good imaging quality enough to trace calcium activity from the same cells over several days, it would be nicer to show.

We thank the reviewer for the generous comments and suggestions. We included calcium traces from 5 example neurons for each cell type on Day 1 during one lick bout and during one trial. We then showed the same neurons again on Day 7 during another trial (Figure 2 —figure supplement 1). However, the limitation of showing calcium traces is that we can only visualize one trial at a time. In order to demonstrate that PV-INs and VIP-INs increase their response reliability to CS and reward, respectively, we decided to show every single trial from all neurons from one example mouse. To do that, we first sorted the neurons (of one mouse) based on their Day 1 reliability, and we tracked the same neurons to Day 7 without re-sorting and showed whether each trial was active or not (Figure 4H and 5H). We can clearly see that the ‘Low Reliability’ PV- or VIP-INs from Day 1 showed more active trials on Day 7 after associative learning (Figure 4I and 5I).

Another concern is the lack of control experiments that show no such changes presented in training groups. Otherwise, it is quite good.

We thank the reviewer for the comments. Please find a detailed response above in ‘Essential Revision #1’. In brief, we altered the reward contingencies in the behavioral task and performed control experiments in three separate cohorts of mice. First, we found that tone presentations in the absence of water rewards did not activate PV-INs in M1, as the percent of active PV-INs during the tone response period was not different from that at baseline. In another cohort of mice, we presented the tone with non-paired, randomly timed water rewards. Intriguingly, PV-INs were significantly responsive to the tone on Day 1 but not on Day 7, suggesting that PV-INs in M1 do not have pure sensory responses to the tone but instead respond to reward-predicting cues. In contrast, VIP-INs remained significantly responsive to randomly timed water rewards, but the response reliability of the VIP-INs in the ‘Low Reliability’ to reward group did not change from Day 1 to Day 7. Together, our results suggest that the increases in reliability of PV-INs to the CS and of VIP-INs to reward are specific to associative learning.

Associated Data

    Data Citations

    1. Chen SX. 2022. Data from: Cell-type specific responses to associative learning in the primary motor cortex. Dryad Digital Repository.

    Supplementary Materials

    Transparent reporting form
    Supplementary file 1. Summary of the number of mice, total regions of interest (ROIs), total active cells, and total active cells tracked from days 1 to 7 in all experimental conditions.
    elife-72549-supp1.xlsx (10.5KB, xlsx)

    Data Availability Statement

    Code to reproduce the analysis for figures 1-2 and 4-7 is available at https://github.com/clee162/Analysis-of-Cell-type-Specific-Responses-to-Associative-Learning-in-M1. Code to reproduce the analysis for figure 3 is available at https://github.com/nauralcodinglab/interneuron-reward. Data can be found on Dryad at https://doi.org/10.5061/dryad.q573n5tjj.

    The following dataset was generated:

    Chen SX. 2022. Data from: Cell-type specific responses to associative learning in the primary motor cortex. Dryad Digital Repository.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
