Author manuscript; available in PMC: 2023 Jul 1.
Published in final edited form as: Psychopharmacology (Berl). 2022 Dec 27;240(1):213–225. doi: 10.1007/s00213-022-06298-z

Task parameters influence operant response variability in mice

Emma G Follman 1,#, Maxime Chevée 1,#, Courtney J Kim 1, Amy R Johnson 1, Jennifer Tat 1, Michael Z Leonard 1, Erin S Calipari 1,2,3,4,5,*
PMCID: PMC9894580  NIHMSID: NIHMS1865335  PMID: 36572717

Abstract

Rationale:

During operant conditioning, animals associate actions with outcomes. However, patterns and rates of operant responding change over learning, which makes it difficult to distinguish changes in learning from general changes in performance or movement. Thus, understanding how task parameters influence movement execution is essential.

Objectives:

To understand how specific operant task parameters influenced the repetition of future operant responses, we investigated the ability of operant conditioning schedules and contingencies to promote reproducible bouts of five lever presses in mice.

Methods:

Mice were trained on one of four operant tasks to test three distinct hypotheses: (1) whether a cue presented concurrently with sucrose delivery influenced the pattern of lever pressing; (2) whether requiring animals to collect earned sucrose promoted the organization of responses into bouts; (3) whether only reinforcing bouts where inter-response time (IRT) variances were below a target promoted reproducible patterns of operant behavior.

Results:

(1) Signaling reinforcer delivery with a cue increased learning rates but resulted in mice pressing the lever in fast succession until the cue turned on, rather than executing discrete bouts. (2) Requiring mice to collect the reinforcer between bouts had little effect on behavior. (3) A training strategy that directly reinforced bouts with low variance IRTs was not more effective than a traditional fixed ratio schedule at promoting reproducible action execution.

Conclusions:

Together, our findings provide insights into the parameters of behavioral training that promote reproducible actions and that should be carefully selected when designing operant conditioning experiments.

Keywords: reinforcer, reinforcement schedule, movement kinematics, learning, reward, inter-response times, fixed ratio, FR5

Introduction

The ability to learn relationships between actions and outcomes is the basis of operant behavior. Early stages of reinforcement learning are probabilistic – initially a random action results in an outcome, and that action then becomes more or less likely to occur in the future, depending on the consequence (Ferster and Skinner, 1957; Gershman and Ölveczky, 2020). Operant conditioning therefore requires that subjects move in order to explore their environment, as well as perform task-related actions – thus making both movement control and learning fundamentally related in an operant context. Likewise, the neural circuits engaged during reinforcement learning overlap significantly with those engaged during movement (Kravitz and Kreitzer, 2012; Panigrahi et al., 2015; Faure et al., 2005; Packard and Knowlton, 2002). Identifying signatures of reinforcement performance and movement control is therefore challenging, as distinct aspects of an operant task have the potential to influence either process (or both). Disentangling these factors is critical to understanding both behavior in general and the neural basis of specific behaviors. Here, we sought to test how specific schedules of reinforcement and contingencies within operant conditioning tasks influenced the patterns by which animals emitted responses during the task.

The goal of the present study is to provide insights into the influence of several features of operant conditioning on motor output in mice – a model organism for which genetic and optogenetic tools allow fine-scale circuit dissection but for which our understanding of operant behavior is less extensive than for other organisms such as rats and non-human primates. Several experimental results in zebra finches (Duffy et al., 2022; Gadagkar et al., 2016) and in rodents (Greenstreet et al., 2022) suggest that learning and motor signals in the brain may be conceptualized as motor errors that represent the mismatch between intended and executed movements – an attractive idea that unites both motor and learning signals into one theoretical framework. In this context, we sought to test distinct operant training conditions during a task that produces repeated actions with little variability at both large and fine scales, because reproducible motor patterns allow the relatively easy dissection of execution errors. Fixed ratio (FR) schedules of reinforcement using a lever press require repeated movements whose microstructure can be dissected and are known to promote the grouping of presses into bouts (Felton and Lyon, 1966; Jin et al., 2014). Understanding which features of an FR schedule effectively or ineffectively promote the emergence of reproducible actions in mice will help the field to take full advantage of the tools only accessible in this organism, and to identify the circuits, cell types and molecular mechanisms that together control reinforcement learning and movement.

In this study, we trained mice in a series of operant conditioning tasks where lever pressing was reinforced with sucrose on an FR5 schedule of reinforcement. We tested three distinct hypotheses: First, we tested whether the presence of an additional cue presented concurrently with sucrose delivery influenced the pattern of lever pressing throughout learning. A “signaled” reinforcer has been shown to improve instrumental learning (Branch, 1977; Lewis et al., 1974; Marcucella and Margolius, 1978; Sanderson et al., 2014; Schachtman and Reed, 1992) and has been proposed to influence the behavioral strategy underlying action control (Vandaele et al., 2017). Thus, we reasoned that the immediate feedback provided by the cue and the absence of the need to check for reinforcer delivery after completing the response ratio may promote reproducible times to reward collection as well as inter-response times (IRTs) within each bout. Second, we tested whether requiring animals to collect their earned sucrose, thus preventing them from accumulating unconsumed reinforcers, was an effective condition to promote the organization of responses into bouts. Finally, we tested whether reinforcing reproducible bouts by only rewarding those whose IRT variance was below a specified target was an effective strategy to promote reproducible patterns of operant behavior.

Our results show that signaling reinforcer delivery with a consequent cue is indeed crucial for mice to learn the task and to organize their lever presses into bouts. The sucrose collection condition, however, did not have measurable effects on the reproducibility of behavior. Finally, the strategy in which we directly reinforced reproducible bouts revealed that a cue signaling reinforcer delivery, while very effective at promoting learning and at organizing presses into bouts, promotes pressing that is terminated by stimulus presentation rather than organized into intrinsically discrete bouts. Our results provide valuable insight into how features of a task shape the actions generated during operant behaviors and will allow future studies to investigate the neural substrates of such behaviors while carefully tuning their training parameters.

Methods

Subjects:

Experiments were approved by the Institutional Animal Care and Use Committee of Vanderbilt University Medical Center and conducted according to the National Institutes of Health guidelines for animal care and use. Forty-seven 8-week-old animals were used for this study. C57BL/6J mice (22 males and 25 females) were acquired from Jackson Laboratory (Bar Harbor, ME; SN: 000664) and maintained on a reverse 12-hour light cycle (lights 8 am/8 pm). Experiments were performed during the dark phase. Four to five animals were housed per cage with unlimited access to water. Food access was restricted to maintain ~90% of pre-restriction body weight. Only mice that met FR1 acquisition criteria (described below) advanced to the subsequent FR5 sessions.

The number of animals in each group was as follows: one FR5 w/ MustCollect mouse did not reach FR1 acquisition criteria and was excluded. An additional 9 mice did not complete the full training due to COVID-19-related scheduling issues and were also excluded from all subsequent analyses (N = 1 from the FR5 w/ MustCollect group, N = 6 from the FR5 w/ LightCue&MustCollect group, and N = 2 from the LowVariance group). In total, the cohort reported in this study included 37 animals (FR5 w/ LightCue&MustCollect: 5 males/4 females; FR5 w/ MustCollect: 3 males/3 females; FR5 w/ LightCue: 4 males/4 females; LowVariance: 5 males/9 females). The data were acquired in three sequential cohorts: one cohort included the FR5 w/ LightCue (N=4) and FR5 w/ MustCollect (N=3) groups; another included the FR5 w/ LightCue&MustCollect group (N=9); a last cohort included the FR5 w/ LightCue (N=4), FR5 w/ MustCollect (N=3) and LowVariance (N=14) groups. The data in Figure 6 were only acquired for a subset of animals (N=9 mice from the LowVariance group, N=6 mice from the FR5 w/ LightCue&MustCollect group).

Figure 6: Individual presses become less variable with training.

(A) Example bouts of five presses midway (day 5) through training (left) and late (day 10) in training (right) from one mouse. (B) Heatmap showing the pairwise correlation coefficients for the lever displacement of all presses of an example mouse trained on LowVariance. Correlations increase with time, showing that presses become more similar to each other. (C) Comparison of the pairwise correlation coefficients across time and between FR5 w/ LightCue&MustCollect and LowVariance groups. Data presented as mean +/− S.E.M. Males are depicted as dashed lines (B-C).

Apparatus:

Mice were trained and tested daily in individual standard wide mouse operant conditioning chambers (Med Associates Inc., St. Albans, Vermont) in which 3D-printed dividers were inserted, limiting the available space to a small square area (13 × 13 cm = 169 cm²) providing access to a sucrose port and a lever. These boxes were fitted with a standard retractable lever and a white noise generator with a speaker. A custom-made 3D-printed wall insert was used to hold and display a stainless-steel cannula (lick port, 18 gauge, 0.042" ID, 0.05" OD, 0.004" wall thickness), which was connected to a syringe pump for sucrose delivery. An illumination light was affixed above the lick port. To measure lever displacement, two small magnets (neodymium block magnets, N45, 0.069 × 0.591 × 0.197 in; Buymagnets.com #EP331) were fixed to the lever and a Hall effect sensor (Sensor Hall Analog Radial Lead, Honeywell #SS49E) was placed 5 mm above the lever. The weight of the magnets was countered by attaching a 1.38", 12 V, 44 lb electromagnet (APW Company #EM137–12-222) 30 mm above the lever. Control of the electromagnet strength and acquisition of the Hall effect sensor data were performed using an Arduino Nano Every (Arduino #ABX00033).

Procedure:

General Procedural Information:

All sessions lasted until the maximum number of rewards was obtained (51) or 1 hour was reached, whichever came first. White noise signaled the beginning of the session and was on for the entire duration of the session.

Task Design:

FR5 w/ MustCollect:

Mice were first trained on a fixed ratio 1 (FR1) schedule of reinforcement. Each lever press resulted in the delivery of 8 μL of a 10% sucrose solution. Additional presses performed after sucrose delivery but before sucrose collection had no programmed consequence and did not count toward the next sequence. Once the reinforcer was collected, lever presses counted again. Acquisition criteria were considered met once a mouse had obtained 50 rewards within the allotted 1 hour for two consecutive days (5.7±2.0 days, N=3 females; 5.0±0.58 days, N=3 males). Mice that did not meet the criteria within 10 days were excluded (N=1 mouse). Upon meeting the criteria, the reinforcement contingency was increased to FR5 with all other conditions the same. All animals in the study were trained on this paradigm for 10 days.
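
Although the task logic ran in MedPC, the MustCollect contingency is compact enough to summarize in pseudocode. The Python sketch below is illustrative only - the event names and function are hypothetical, not the authors' code - and uses ratio=1 for the FR1 phase described above (ratio=5 for the FR5 phase).

    def run_must_collect(events, ratio=1):
        """Count presses toward the ratio only when no reinforcer is pending."""
        presses_counted = 0
        reward_pending = False  # True while a delivered reinforcer sits uncollected
        rewards = 0
        for event in events:    # e.g., ['press', 'press', 'collect', ...]
            if event == 'press' and not reward_pending:
                presses_counted += 1
                if presses_counted == ratio:
                    rewards += 1             # deliver 8 uL of 10% sucrose
                    reward_pending = True    # further presses are ignored...
                    presses_counted = 0
            elif event == 'collect':
                reward_pending = False       # ...until the mouse visits the port
        return rewards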

FR5 w/ LightCue:

Animals were first trained on an FR1 schedule of reinforcement. Each lever press resulted in the delivery of 8 μL of a 10% sucrose solution and the light above the lick port turning on for 1 second. Acquisition criteria were considered met once a mouse had obtained 50 rewards within the allotted 1 hour for two consecutive days (3.25±0.63 days, N=4 females; 3.5±0.65 days, N=4 males). Mice that did not meet criteria within 10 days were excluded (N=0 mouse). Upon meeting the criteria, the reinforcement contingency was increased to FR5 with all other conditions the same. All animals in the study were trained on this paradigm for 10 days.

FR5 w/ LightCue&MustCollect:

Mice trained on this task experienced both conditions described above (w/ LightCue and w/ MustCollect). Acquisition criteria were considered met once a mouse had obtained 50 rewards within the allotted 1 hour for two consecutive days (3.5±0.29 days, N=4 females; 3.0±0.32 days, N=5 males). Mice that did not meet the criteria within 10 days were excluded (N=0 mouse). Upon meeting criteria, the reinforcement contingency was increased to FR5 with all other conditions the same. All animals in the study were trained on this paradigm for 10 days.

LowVariance:

This strategy was identical to the FR5 w/ LightCue&MustCollect group during the initial FR1 training phase, in that it included both the w/ LightCue and w/ MustCollect conditions (days to acquisition: 4.1±0.42 days, N=9 females; 3.8±0.49 days, N=5 males). Subsequently, during the FR5 phase, a sequence of 5 presses was only rewarded if the variance of its within-bout inter-response times (IRTs) was below a threshold computed as the median IRT variance over the last 5 sequences. If the IRT variance of a bout was above the threshold, the threshold was still updated but no external signals were generated, and the animal simply had to try again.
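
The LowVariance rule can be made concrete with a short sketch. The class below is a minimal illustration, not the authors' MedPC code; it assumes non-overlapping five-press sequences and that sequences completed before any variance history exists are reinforced.

    import numpy as np
    from collections import deque

    class LowVarianceSchedule:
        def __init__(self, ratio=5, history=5):
            self.ratio = ratio
            self.bout = []                            # press times, current attempt
            self.recent_vars = deque(maxlen=history)  # IRT variances, last 5 sequences

        def register_press(self, t):
            """Return True when the just-completed sequence earns a reinforcer."""
            self.bout.append(t)
            if len(self.bout) < self.ratio:
                return False                          # sequence not complete yet
            irt_var = np.var(np.diff(self.bout))      # variance of the 4 within-bout IRTs
            threshold = np.median(self.recent_vars) if self.recent_vars else np.inf
            self.recent_vars.append(irt_var)          # threshold updates even after failures
            self.bout = []                            # the next attempt starts fresh
            return irt_var <= threshold               # True -> sucrose + light cue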

Analysis

Statistical Analyses:

All analyses were performed using custom code in Python (v3.6.13). The SciPy package (v1.5.3) was used to perform paired t-tests (Figures 2B, 3F-G, 4B-C, 4E-F, 5D-E, 7B-C) and the Pingouin package (v0.3.12) was used to perform one-way and mixed ANOVAs as well as the corresponding post-hoc Tukey tests (Figures 2C-E, 3F-G, 6C). All data are reported as Mean ± SEM and all statistical tests used are specified in the Results section.
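
For orientation, the sketch below mirrors that pipeline using the packages named above; the data frames, column names, and values are made-up placeholders rather than data from the study.

    import pandas as pd
    import pingouin as pg
    from scipy.stats import ttest_rel

    # Early vs. late comparison within one group (one value per mouse)
    early = [5.7, 4.9, 6.1, 5.2]          # e.g., day-1 press rates
    late = [11.8, 10.2, 12.9, 11.1]       # mean of days 9 and 10, same mice
    t, p = ttest_rel(early, late)

    # One-way ANOVA and post-hoc Tukey test across groups
    df = pd.DataFrame({
        'mouse': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6'],
        'group': ['LightCue'] * 3 + ['LowVariance'] * 3,
        'fold_change': [2.1, 1.8, 2.6, 3.0, 3.3, 2.9],
    })
    print(pg.anova(data=df, dv='fold_change', between='group'))
    print(pg.pairwise_tukey(data=df, dv='fold_change', between='group'))

    # Group x time mixed ANOVA (cf. Figure 6C)
    long = pd.DataFrame({
        'mouse': ['m1', 'm2', 'm3', 'm4'] * 2,
        'group': ['LC&MC', 'LC&MC', 'LowVar', 'LowVar'] * 2,
        'time': ['mid'] * 4 + ['late'] * 4,
        'corr': [0.55, 0.52, 0.41, 0.43, 0.61, 0.60, 0.46, 0.48],
    })
    print(pg.mixed_anova(data=long, dv='corr', within='time',
                         between='group', subject='mouse'))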

Figure 2: Absence of a light cue signaling sucrose delivery impairs performance on an FR5 lever pressing task.

(A) Plots of cumulative lever presses across sessions for one example mouse per group. Black dashes indicate reinforcer delivery. (B) Comparison of lever press rates between early in training (day 1) and late in training (mean of days 9 and 10) for each group. (C) Comparison of the fold change in lever press rates across groups which had a significant change. (D) Comparison of the lever press rates late in training across groups. (E) Comparison of the total rewards acquired during 10 days across groups. Data presented as mean +/− S.E.M. * p < 0.05, ns, not significant. Males are depicted as dashed lines (B) and hollow circles (C-E).

Figure 3: A light cue signaling sucrose delivery is necessary for clustering lever presses into bouts.

(A) Diagram showing the distinction between inter-response times (IRTs) occurring within reinforced bouts and those occurring between bouts. (B-E) Plots showing the distribution of the two types of IRTs across training for example mice from each group. (i) Example cumulative lever press plots (reset at each reward) color-coded according to the diagram in (A). (ii) Distribution of “within-bout IRTs” and “between-bout IRTs” on day 1 (top) and on day 10 (bottom). (iii) Heatmaps showing the distribution of “within-bout IRTs” (left; yellow line, median) and of “between-bout IRTs” (right; blue line, median) across days. (F) Comparison of “within-bout IRTs” early (day 1) versus late (mean of days 9 and 10) for each group (left), and comparison of the late/early fold change for the groups with a significant difference (right). (G) Comparison of the ratio of “between-bout IRTs” to “within-bout IRTs” early (day 1) versus late (mean of days 9 and 10) for each group (left), and comparison of the late/early fold change for the groups with a significant difference (right). Data presented as mean +/− S.E.M. * p < 0.05; ns, not significant. Males are depicted as dashed lines (F left, G left) and hollow circles (F right, G right).

Figure 4: A light cue that signals sucrose delivery shortens the delay to reinforcer collection.

(A) Plots showing the cumulative sum of latencies from reinforcer delivery to collection across sessions for one example mouse from each group. (B) Comparison of the median latency to collect the reinforcer between early (day 1) versus late (mean of days 9 and 10) for each group. (C) Comparison of the slope of the cumulative density function of the latency to reward collection at 50% of maximum between early (day 1) and late (mean of days 9 and 10) for each group. (D) Plots showing the cumulative sum of latencies from reinforcer collection to next lever press across sessions for one example mouse from each group. (E) Comparison of the median latency to first press between early (day 1) and late (mean of days 9 and 10) for each group. (F) Comparison of the slope of the cumulative density function of the latency to first press at 50% of maximum between early (day 1) and late (mean of days 9 and 10) for each group. Data presented as mean +/− S.E.M. * p < 0.05, ns, not significant. Males are depicted as dashed lines (B-C, E-F).

Figure 5: Direct reinforcement of low variance bouts does not produce more homogeneous bouts than an FR5 schedule.

(A-B) Example raster plots (center) and post press time histograms (bottom) showing early (A) and late (B) bouts of presses aligned to the first press. The IRT variance for each bout is plotted on the right panel and the distribution of these variances is shown in the top right plots. The light gray line shows the early variance median and the dark gray line shows the late variance median. (C) Heatmaps showing the distribution of “within-bout IRT” variances for example mice from each group. (D) Comparison of the “within-bout IRT” coefficient of variation (CV) of reinforced bouts between early and late in training for each group. (E) Comparison of the late/early fold change for the groups with a significant difference. Data presented as mean +/− S.E.M. * p < 0.05, ns, not significant. Males are depicted as dashed lines (D) and hollow circles (E).

Figure 7: The LowVariance strategy induces a decrease of within-bout inter-response times but fails to produce bouts of five lever presses.

(A) Diagram showing the different types of IRTs for mice performing the LowVariance task (top) and an example cumulative plot (bottom). (B) Comparison of the early vs late IRTs for each type of IRT in LowVariance mice. (C) Comparison of the late/early fold change for each type of IRT that was significantly different. Data presented as mean +/− S.E.M. * p < 0.05, ns, not significant. Males are depicted as dashed lines (B) and hollow circles (C).

Hall effect sensor data processing:

Data were acquired at 1,000 Hz and lowpass filtered at 5e−9 cycles/unit to remove fast oscillations originating from the electromagnet. To identify deflections in the time series, we first computed a threshold which, when used to define deflection points, resulted in the same number of deflections as lever presses counted by MedPC. However, this procedure produced a small number of false positives as well as false negatives. To identify which deflections in the data represented a lever press counted by MedPC versus deflections that were too small to trigger a bona fide lever press, we used MedPC timestamps as a reference. This approach allowed us to manually adjust the labels for each deflection, and only sessions in which 90% of presses were accounted for were kept for further analysis. The start and end of each deflection were identified by sliding backwards and forwards in time, respectively, from the threshold crossing point until the values returned to baseline or until the preceding/following press. The data from each session were z-scored. To compute the Pearson correlation coefficient between individual presses, we used one second from the start of each deflection. Presses that lasted longer were truncated and presses that were shorter were padded with zeros. This approach resulted in the correlations being primarily driven by the shape of the downward deflection and the duration of the press. Changing the size of this window had little effect on our results. This system was only installed halfway through the experiment, which is why we only presented data for a subset of mice and days. Hall effect sensor analyses were performed with Python 3.9.7.
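
The core of this pipeline can be sketched as follows. This is a simplified illustration rather than the analysis code itself: the filter cutoff and threshold handling are placeholders (the actual threshold was validated against MedPC press counts, as described above).

    import numpy as np
    from scipy.signal import butter, filtfilt

    FS = 1000  # Hz, acquisition rate

    def preprocess(raw):
        """Lowpass filter to remove electromagnet ripple, then z-score the session."""
        b, a = butter(2, 20, btype='low', fs=FS)   # 20 Hz cutoff is illustrative
        x = filtfilt(b, a, raw)
        return (x - x.mean()) / x.std()

    def press_onsets(x, threshold):
        """Indices where the signal first crosses below threshold (lever pushed down)."""
        below = x < threshold
        return np.flatnonzero(below[1:] & ~below[:-1]) + 1

    def pairwise_press_correlations(x, onsets, window=FS):
        """Pearson correlations between 1-s snippets; shorter presses zero-padded."""
        snips = []
        for i in onsets:
            s = x[i:i + window]                    # longer presses are truncated
            snips.append(np.pad(s, (0, window - len(s))))
        return np.corrcoef(np.vstack(snips))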

Results

Paradigm-specific effects on reinforcement behavior.

To test the hypothesis that specific features of operant conditioning differentially shape action execution, we trained four groups of mice on four distinct FR5 lever pressing procedures. A first group was trained on an FR5 schedule in which lever presses only counted if executed after the previous reinforcer had been collected (FR5 w/ MustCollect, N=6, Figure 1Ai). A second group was trained on an FR5 schedule in which a light cue signaled sucrose delivery (one second light cue, FR5 w/ LightCue, N=8, Figure 1Aii). A third group was trained with both conditions (FR5 w/ LightCue&MustCollect, N=9, Figure 1Aiii). Finally, a fourth group was trained to test whether a schedule that specifically reinforces bouts of five lever presses with reproducible IRTs would be more effective than FR5 at producing highly reproducible actions. Specifically, the variance of the IRTs in a bout of five presses had to be below a target variance to trigger reinforcer delivery. The target was dynamically defined as the median variance of the last 5 bouts (LowVariance, N=14, Figure 1Aiv). In the case of a failed attempt, the mouse did not get any indication that the IRT variance was higher than the target and simply had to continue pressing until it generated a bout of five presses whose IRT variance was below the target. These four strategies allowed us to determine which specific features of training paradigms promote or hinder the development of reproducible actions.

Figure 1: A light cue signaling sucrose delivery improves acquisition of a lever pressing task.

(A) Diagram describing the four reinforcement strategies used in this study. (B) Cumulative fraction plot showing the number of days mice from each group took to reach the FR1 acquisition criteria (maximum rewards acquired on two consecutive days). Data presented as a cumulative fraction of the total number of mice.

A light cue that signals sucrose delivery improves acquisition of a lever pressing operant task in mice.

All mice were first trained on an FR1 schedule and moved on to FR5 once they acquired the maximum number of reinforcers on two consecutive days (51 rewards, 1-hour max sessions). To determine whether the w/ LightCue and the w/ MustCollect conditions impacted the mice’s ability to learn to press the lever for a sucrose reinforcer, we plotted the cumulative distribution of days to FR1 acquisition for each group (Figure 1B). There was a significant difference in the time to acquisition across groups for mice that reached criteria within 10 days (FR5 w/ LightCue&MustCollect, N=9 mice: 3.1±0.26 days; FR5 w/ MustCollect, N=6 mice: 5.2±0.94 days; FR5 w/ LightCue, N=8 mice: 3.4±0.42 days; LowVariance, N=14 mice: 3.7±0.30 days; one-way ANOVA, F = 3.2, p = 0.037), with the group trained using the FR5 w/ MustCollect paradigm acquiring more slowly than the FR5 w/ LightCue&MustCollect group (post-hoc Tukey test: FR5 w/ LightCue&MustCollect vs. FR5 w/ MustCollect p = 0.03). This result suggests that the light cue signaling reinforcer delivery improved the acquisition of a simple lever pressing task, a result in line with previous studies (Branch, 1977; Lewis et al., 1974).

Absence of a light cue signaling reinforcer delivery impairs the response rate on an FR5 schedule of reinforcement.

We first tested whether each training strategy successfully increased the rate of lever pressing across 10 sessions. Figure 2A shows the cumulative lever presses across training for one example mouse in each group. Mice from all groups except FR5 w/ MustCollect pressed the lever at higher rates late in training (mean of days 9 and 10) compared to early in training (day 1) (Figure 2B, FR5 w/ LightCue&MustCollect - early: 5.7±0.42 LP/min, late: 11.8±1.3 LP/min, paired t-test p = 0.032; FR5 w/ MustCollect - early: 2.2±0.18 LP/min, late: 3.4±0.58 LP/min, paired t-test p = 0.27; FR5 w/ LightCue - early: 4.3±0.53 LP/min, late: 7.6±1.3 LP/min, paired t-test p = 0.036; LowVariance - early: 2.3±0.25 LP/min, late: 5.7±0.41 LP/min, paired t-test p = 4.1e−5). To identify which training strategy resulted in the largest increase in response rate, we computed the fold change in rate between late and early for the groups that had a significant increase and found no difference across contingencies (Figure 2C, FR5 w/ LightCue&MustCollect: 2.6±0.70; FR5 w/ LightCue: 1.8±0.32; LowVariance: 3.1±0.47; one-way ANOVA, F = 1.457, p = 0.25). However, the FR5 w/ LightCue&MustCollect group pressed faster than FR5 w/ MustCollect and LowVariance late in training (Figure 2D, FR5 w/ LightCue&MustCollect: 11.8±1.8 LP/min; FR5 w/ MustCollect: 3.4±0.86 LP/min; FR5 w/ LightCue: 7.6±1.8 LP/min; LowVariance: 5.7±0.59 LP/min; one-way ANOVA, F = 8.9, p = 4.3e−5; post-hoc Tukey tests: FR5 w/ LightCue&MustCollect vs. FR5 w/ MustCollect p = 0.0010, FR5 w/ LightCue&MustCollect vs. LowVariance p = 0.0010), and the FR5 w/ LightCue&MustCollect and FR5 w/ LightCue groups obtained more total rewards than the FR5 w/ MustCollect and LowVariance groups across sessions (Figure 2E, FR5 w/ LightCue&MustCollect, N=15 mice: 493±5.57 reinforcers; FR5 w/ MustCollect, N=6 mice: 296±32.7 reinforcers; FR5 w/ LightCue, N=8 mice: 427±32.1 reinforcers; LowVariance, N=16 mice: 285±23.2 reinforcers; one-way ANOVA, F = 17.08, p = 7.17e−7; post-hoc Tukey tests: FR5 w/ LightCue&MustCollect vs. FR5 w/ MustCollect p = 0.0010; FR5 w/ LightCue&MustCollect vs. LowVariance p = 0.0010; FR5 w/ LightCue vs. FR5 w/ MustCollect p = 0.016; FR5 w/ LightCue vs. LowVariance p = 0.0010). These results indicate that all strategies except FR5 w/ MustCollect, which was the only paradigm without a light cue signaling sucrose delivery, reinforced lever pressing, although they resulted in distinct rates of reinforcer delivery.

A light cue that signals sucrose delivery is necessary for clustering lever presses into bouts.

To determine how effective each training strategy was at promoting the clustering of presses into bouts, we analyzed the distribution of IRTs. We labeled IRTs between presses occurring within a reinforced bout of five as “within-bout IRTs” and IRTs between presses occurring across two subsequent bouts as “between-bout IRTs” (Figure 3A). Mice that learn to perform a bout of 5 presses should have “within-bout IRTs” that are progressively shorter compared to “between-bout IRTs”. To visualize this, we plotted example cumulative lever press plots (reset at each sucrose delivery, Figures 3B-E, panel i) as well as histograms showing the distribution of “within-bout IRTs” and of “between-bout IRTs” for the first and last sessions for one example mouse from each group (Figures 3B-E, panels ii). The same distributions are shown for each session across training using density heatmaps (Figure 3B-E, panels iii). The median “within-bout IRTs” decreased from early (day 1) to late (mean of days 9 and 10) in training for each group except for mice trained on the FR5 w/ MustCollect paradigm (Figure 3F, left, FR5 w/ LightCue&MustCollect - early: 2.8±0.24 s, late: 0.78±0.065 s; paired t-test p = 6.3e−4; FR5 w/ MustCollect - early: 9.1±1.1 s, late: 11.6±2.9 s; paired t-test p = 0.46; FR5 w/ LightCue - early: 5.9±1.2 s, late: 1.3±0.21 s; paired t-test p = 0.035; LowVariance - early: 11.8±1.7 s, late: 1.5±0.14 s; paired t-test p = 7.5e−4), although there was no difference in the late/early fold change in “within-bout IRT” duration between the three groups that clustered their lever presses (Figure 3F, right, FR5 w/ LightCue&MustCollect: 0.32±0.055, FR5 w/ LightCue: 0.36±0.90, LowVariance: 0.17±0.029; one-way ANOVA, F = 3.076, p = 0.062). Similarly, the ratio of median “between-bout IRTs” to median “within-bout IRTs” increased for all groups except FR5 w/ MustCollect (Figure 3G, left, FR5 w/ LightCue&MustCollect - early: 3.3±0.27, late: 12.6±1.4; paired t-test p = 0.0022; FR5 w/ MustCollect - early: 2.7±0.48, late: 3.4±1.0; paired t-test p = 0.71; FR5 w/ LightCue - early: 2.9±0.29, late: 16.2±2.0; paired t-test p = 0.0036; LowVariance - early: 2.1±0.19, late: 19.2±3.1; paired t-test p = 0.0017), showing that all conditions but the one without a light cue signaling sucrose delivery clustered their lever presses into bouts. To compare the magnitude of the clustering attained with each training strategy, we compared the late/early ratio for the three groups which clustered their presses and found no differences (Figure 3G, right, FR5 w/ LightCue&MustCollect: 4.2±0.86, FR5 w/ LightCue: 6.8±2.0, LowVariance: 9.7±2.1; one-way ANOVA, F = 2.226, p = 0.127), suggesting all three strategies promoted the clustering of lever presses into bouts equally.
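
The IRT labeling can be expressed in a few lines. The function below is a sketch with hypothetical inputs (arrays of press and reward timestamps); it assumes each reward is time-stamped at or just after the press that completed the bout.

    import numpy as np

    def classify_irts(press_times, reward_times):
        """Split IRTs into within-bout and between-bout sets (FR5 assumed)."""
        press_times = np.asarray(press_times)
        irts = np.diff(press_times)
        # index of the press that completed each reinforced bout
        bout_ends = set(np.searchsorted(press_times, reward_times, side='right') - 1)
        # IRT i separates press i from press i+1: it is "between-bout"
        # when press i terminated a reinforced bout
        within = np.array([irt for i, irt in enumerate(irts) if i not in bout_ends])
        between = np.array([irt for i, irt in enumerate(irts) if i in bout_ends])
        return within, between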

A light cue that signals sucrose delivery shortens the delay to reinforcer collection.

Because the action of collecting the reinforcer is an integral part of the behavior, performance of reinforcer collection may also be under the influence of task rules. We therefore asked how the delay from reinforcer delivery to reinforcer collection was affected under each strategy. The cumulative sum plots shown in Figure 4A illustrate the progression of latency to reinforcer collection across sessions for one example mouse from each group. The median latency from reinforcer delivery to collection decreased from early to late in training for each group except for mice trained on the FR5 w/ MustCollect paradigm (Figure 4B, FR5 w/ LightCue&MustCollect - early: 1.4±0.16 s, late: 0.61±1.9e−2 s; paired t-test p = 9.0e−3; FR5 w/ MustCollect - early: 2.5±0.29 s, late: 2.4±0.58 s; paired t-test p = 0.92; FR5 w/ LightCue - early: 1.8±0.19 s, late: 0.71±3.2e−2 s; paired t-test p = 3.8e−3; LowVariance - early: 2.2±0.27 s, late: 0.87±5.5e−2 s; paired t-test p = 2.6e−3). However, there was no difference in the late/early fold change between the three groups that collected faster (FR5 w/ LightCue&MustCollect: 0.49±6.3e−2, FR5 w/ LightCue: 0.47±6.8e−2, LowVariance: 0.46±5.5e−2; one-way ANOVA, F = 0.035, p = 0.97). To quantify the variability in latency from reinforcer delivery to collection, we compared the slope of the cumulative density function (CDF) at 50% (normalized versions of Figure 4A) - a steeper slope indicates less variability. We found that, similar to the median delay, the CDF slope increased from early to late in training for each group except for mice trained on FR5 w/ MustCollect (Figure 4C, FR5 w/ LightCue&MustCollect - early: 0.19±1.2e−2, late: 0.26±9.6e−3; paired t-test p = 0.021; FR5 w/ MustCollect - early: 0.11±1.6e−2, late: 0.15±1.3e−2; paired t-test p = 0.38; FR5 w/ LightCue - early: 0.14±1.4e−2, late: 0.25±8.6e−3; paired t-test p = 1.8e−3; LowVariance - early: 0.14±1.4e−2, late: 0.26±4.8e−3; paired t-test p = 2.6e−5), although the late/early fold change was not different for the three groups whose CDF slope steepened (FR5 w/ LightCue&MustCollect: 1.5±0.21, FR5 w/ LightCue: 2.1±0.36, LowVariance: 3.4±1.2; one-way ANOVA, F = 1.0, p = 0.38). These results show that, in addition to clustering their lever presses into bouts, all conditions but the one without a light cue signaling sucrose delivery improved and stabilized their performance during the reinforcer collection phase.
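
The text does not spell out how the CDF slope at 50% was estimated; the sketch below shows one plausible estimator, approximating the probability density at the median (the window width is our assumption).

    import numpy as np

    def cdf_slope_at_half(latencies, half_window=0.25):
        """Approximate the slope of the empirical CDF at its 50% point."""
        x = np.asarray(latencies)
        m = np.median(x)
        # [CDF(m + h) - CDF(m - h)] / (2h), i.e., density near the median
        frac_near_median = np.mean(np.abs(x - m) <= half_window)
        return frac_near_median / (2 * half_window)  # units: fraction per second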

Unlike the latency to reinforcer collection, the delay between reinforcer collection and the next lever press changed little across sessions, with only the LightCue&MustCollect group showing a significant decrease (Figure 4D-E, FR5 w/ LightCue&MustCollect - early: 23.9±3.7 s, late: 9.4±1.8 s; paired t-test p = 0.036; FR5 w/ MustCollect - early: 37.3±6.5 s, late: 92.3±30.4 s; paired t-test p = 0.27; FR5 w/ LightCue - early: 24.3±7.0 s, late: 22.8±4.3 s; paired t-test p = 0.87; LowVariance - early: 44.6±8.5 s, late: 24.7±3.7 s; paired t-test p = 0.15). Similarly, the slope at 50% of the cumulative density function of the delay between reinforcer collection and the next lever press changed significantly only for the LightCue&MustCollect group (Figure 4F, FR5 w/ LightCue&MustCollect - early: 0.10±2.8e−2, late: 6.6e−2±8.1e−3; paired t-test p = 0.048; FR5 w/ MustCollect - early: 0.12±7.3e−3, late: 0.11±1.2e−2; paired t-test p = 0.64; FR5 w/ LightCue - early: 8.6e−2±8.5e−2, late: 8.5e−2±6.1e−3; paired t-test p = 0.96; LowVariance - early: 0.11±5.6e−3, late: 9.6e−2±2.8e−3; paired t-test p = 0.34). Together, these results show that the latency between reinforcer delivery and collection became shorter and less variable in the presence of a light cue signaling reinforcer delivery, while the delay between reinforcer collection and the next press was less sensitive to that signal.

Direct reinforcement of low variance bouts does not produce more reproducible inter-response times than a traditional FR5.

The distribution of “within-bout IRTs” versus “between-bout IRTs” provides a useful metric to quantify how well animals clustered their presses. However, it does not indicate whether bouts of presses become more reproducible across sessions. One goal of our experiment was to test the hypothesis that directly reinforcing bouts with low variance IRTs is more effective than a traditional FR5 schedule in promoting reproducible patterns of lever presses. Here, we were specifically interested in testing the reproducibility of presses within a single bout rather than across all bouts. To test this hypothesis, we used the variance of “within-bout IRTs” as a quantitative metric to assess the reproducibility of the rhythm of presses within each bout. While the variance of “within-bout IRTs” is large because the rhythm is irregular early in training (Figure 5A), more rapid and uniform “within-bout IRTs” later in training have lower variance (Figure 5B). To visualize how “within-bout IRT” variance changed throughout training, we computed the IRT variance for each bout and plotted heatmaps showing the distribution of these variances across sessions for example mice (Figure 5C). For each training strategy, we tested whether the “within-bout IRT” coefficient of variation changed from early (day 1) to late (mean of days 9 and 10) in training and found that the mean coefficient of variation decreased only for mice trained on FR5 w/ LightCue&MustCollect and on LowVariance (Figure 5D, FR5 w/ LightCue&MustCollect - early: 11.9±1.4, late: 5.0±0.9, paired t-test p = 0.049; FR5 w/ MustCollect - early: 33.1±4.8, late: 48.7±10.6, paired t-test p = 0.18; FR5 w/ LightCue - early: 18.5±3.8, late: 16.5±4.6, paired t-test p = 0.82; LowVariance - early: 26.3±5.4, late: 4.4±0.68, paired t-test p = 0.015). The late/early fold changes were not different between these two groups (Figure 5E, FR5 w/ LightCue&MustCollect: 0.66±0.26; LowVariance: 0.35±0.086; paired t-test p = 0.23). These results show that the rhythm of lever presses produced by mice trained on FR5 with a light cue that signals sucrose delivery and a reinforcer collection condition becomes more reproducible across training and that directly reinforcing reproducible rhythms does not generate bouts that are less variable than those developed naturally.
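
As one plausible reading of this metric (the exact pooling is not specified in the text), the coefficient of variation can be computed per reinforced bout and then averaged within a session:

    import numpy as np

    def session_within_bout_cv(bouts_irts):
        """bouts_irts: list of arrays, each holding the 4 IRTs of one reinforced bout."""
        cvs = [np.std(irts) / np.mean(irts) for irts in bouts_irts]
        return float(np.mean(cvs))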

Individual lever presses become less variable with training.

The rhythm of presses is one way to quantify the reproducibility of actions. Another metric is the detailed kinematics of the movement executed by a mouse each time it presses the lever. Using magnetic sensors, we measured the displacement of the lever during behavioral sessions and tested the hypothesis that individual lever pressing movements become more reproducible as training progresses. Examples of presses in a sequence midway through training (Figure 6A, left, day 5) and late in training (Figure 6A, right, day 10) illustrate how individual presses indeed became more reproducible. The progression in press reproducibility was apparent when we computed the pairwise Pearson correlation coefficient between all presses across training (Figure 6B). To quantify this progression and specifically test whether correlations improved with time and whether there was a difference between mice trained on FR5 w/ LightCue&MustCollect and mice trained on LowVariance, we compared the mean pairwise correlation coefficients across groups and between midway through training and late in training (Figure 6C). We found that the pairwise correlation coefficient between presses increased from midway to late in training and that mice trained on FR5 w/ LightCue&MustCollect had more reproducible presses than mice trained on the LowVariance paradigm (LowVariance - mid: 0.41±0.029, LowVariance - late: 0.46±0.037, FR5 w/ LightCue&MustCollect - mid: 0.55±0.065, FR5 w/ LightCue&MustCollect - late: 0.61±0.041; mixed ANOVA: group (between subject) F = 6.2, p = 0.027; time (within subject) F = 8.5, p = 0.012; post-hoc t-tests: FR5 w/ LightCue&MustCollect vs. LowVariance p = 0.047, Mid vs. Late p = 0.0096). These results indicate that mice produce presses that are less variable as they progress through training and confirm that the LowVariance training strategy did not produce more homogeneous actions.

The LowVariance strategy induces a decrease in within-bout inter response times but fails to produce bouts of five lever presses.

Mice trained using the LowVariance strategy learned to press the lever (Figure 1), gradually increased their rate of pressing (Figure 2), clustered their presses (Figure 3), decreased their latency to collect the sucrose reinforcer (Figure 4), and reduced the variability of their within-bout IRTs (Figure 5). However, this paradigm was designed to only reinforce a bout when the “within-bout IRT” variance was below a threshold, which resulted in a subset of “failed bouts” - those for which the IRT variance did not dip below the threshold (Figure 7A). For those animals, the IRTs between bouts are therefore composed of two distinct types: those that occur between the last press of a “failed” bout and the first press of the next attempted bout (Figure 7A, red “post failed bout IRTs”) and those that occur between the last press of a reinforced bout and the first press of the subsequent bout (Figure 7A, blue “post reinforced bout IRTs”). The bimodal distribution of “between-bout IRTs” observed in the example mouse shown in Figure 3E suggests that these two types of IRTs change differentially with training. To test this possibility, we separated each type and compared IRTs early versus late in training. We found that, in addition to the “within attempted bout IRTs”, the “post failed bout IRTs” were dramatically shortened while the “post reinforced bout IRTs” remained unchanged (Figure 7B, Within attempted bout IRTs - early: 9.8±0.96 s, late: 1.5±0.16 s, paired t-test p = 3.7e−5; Post reinforced bout IRTs - early: 18.3±1.7 s, late: 20.7±2.4 s, paired t-test p = 0.46; Post failed bout IRTs - early: 16.9±2.2 s, late: 1.4±0.15 s, paired t-test p = 3.8e−4). Importantly, the late/early ratio was not different between “within attempted bout IRTs” and “post failed bout IRTs” (Figure 7C, Within attempted bout IRTs: 0.17±0.029; Post failed bout IRTs: 0.11±0.024; paired t-test p = 0.16), suggesting that while LowVariance mice successfully generated clustered lever presses, they failed to generate discrete bouts of 5 presses.
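
Separating the two types of between-bout IRTs only requires knowing each attempted bout's outcome. The sketch below uses hypothetical per-bout summaries (first-press time, last-press time, and a reinforced flag for each attempt):

    import numpy as np

    def split_post_bout_irts(first_press, last_press, reinforced):
        """Return (post-reinforced, post-failed) gaps between consecutive attempts."""
        first_press = np.asarray(first_press)
        last_press = np.asarray(last_press)
        outcome = np.asarray(reinforced[:-1], dtype=bool)  # outcome of the earlier bout
        gaps = first_press[1:] - last_press[:-1]
        return gaps[outcome], gaps[~outcome]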

Discussion

Here, we sought to test whether specific schedules of reinforcement and contingencies within operant conditioning tasks were effective at promoting reproducible performance of actions. Using several variations of an FR5 lever pressing task, we found that the presence of a light cue signaling sucrose delivery was a necessary component for learning to press the lever (Figures 1 and 2), for clustering presses into bouts (Figure 3), and for collecting the reinforcer in a reproducible manner (Figure 4), while a reinforcer collection condition neither dramatically improved nor hindered metrics of performance. In addition, we tested whether directly reinforcing the performance of bouts of five lever presses with low IRT variance (LowVariance) was more effective than a traditional FR5 strategy at generating reproducible actions. Although it promoted response rates similar to the other cued conditions, this strategy failed to generate organized bouts of lever presses of the target length (i.e., 5 lever presses) or to produce lever presses whose kinematics were less variable than those produced during FR5 training.

One striking result from our study is the learning deficit observed in the FR5 w/ MustCollect group, which was the only group without a cue that signaled sucrose delivery. Not only did these animals fail to cluster their lever presses or to collect the sucrose reinforcer as readily as the other groups (Figures 2–4), but they also showed deficits during the acquisition of FR1 compared to the FR5 w/ LightCue&MustCollect group (Figure 1). This finding is consistent with previous studies showing that “signaled” reinforcers reinforce behavior more effectively than “unsignaled” reinforcers (Branch, 1977; Doughty and Lattal, 2003; Lewis et al., 1974; Marcucella and Margolius, 1978; Sanderson et al., 2014; Schachtman and Reed, 1992) and that temporal proximity of reinforcement increases reinforcement learning efficiency (Arbel et al., 2017; Foerde and Shohamy, 2011; Peterburs et al., 2016; Weinberg et al., 2012; Yin et al., 2018). Indeed, because the reinforcer cue provides an immediate proxy for sucrose availability, there is no need for the mouse to check whether a reinforcer was delivered, which dramatically improves the contiguity between action and outcome (Foerde and Shohamy, 2011; Garr et al., 2021; Urcelay and Jonkman, 2019).

While the light cue was effective at indicating sucrose delivery and contributed to reinforcing lever pressing, “post failed bout IRTs” in the LowVariance group were indistinguishable from “within attempted bout IRTs”, suggesting the animals failed to generate discrete bouts of 5 presses (Figure 7). This pattern of IRTs reveals that under this particular contingency, mice exclusively depended on the light cue to stop their bout of presses. One interpretation is that the LowVariance group simply engaged a stereotypical rhythm of presses and waited for the light cue to signal they should stop their ongoing bout, rather than executing a motor program consisting of a bout of 5 presses (Wymbs et al., 2012). In one case, the behavior is a bout of actions of predefined length while in the other it is a repeating action motif that is halted by external factors. Our study shows that one can easily masquerade as the other in the presence of a “signaled” reinforcer, which should be considered carefully when analyzing operant behaviors.

The LowVariance strategy we describe in this study resembles schedules such as differential reinforcement of low/high rates of behavior (DRL/DRH; Alleman, 1970; Ferster and Skinner, 1957; Kramer and Rilling, 1970; Kuch and Platt, 1976) in that we sought to reinforce the performance of a subset of IRTs. However, while DRL/DRH schedules are designed to produce a desired rate of behavior through reinforcement of either long or short IRTs, our goal was to produce consistency in the temporal structure of responding, independent of the rate at which it is emitted. Furthermore, more valuable reinforcers are more effective at changing behavior than less valuable reinforcers in positive reinforcement settings (Baron et al., 1992; Blakely and Schlinger, 1988; Schlinger et al., 1990). Here, the reinforcer delivery rate for the LowVariance group fell between 0 and 1 reinforcer per 5 presses depending on the mouse’s performance, whereas for a mouse performing a traditional FR5 the reward rate was 1 reinforcer per 5 presses. This difference likely explains why FR5 w/ LightCue&MustCollect mice pressed the lever at an overall faster rate than LowVariance mice (Figure 2D) and suggests that LowVariance mice achieved largely comparable performance to FR5 w/ LightCue&MustCollect mice on measures of reproducibility (Figures 2–6) despite operating under a leaner reinforcement paradigm. Indeed, under fixed-ratio schedules, incentive variables such as satiety state (Malott and Cumming, 1966; Sidman and Stebbins, 1954), reinforcer magnitude (Lowe et al., 1974; Powell, 1969) and probability of reinforcement (McMillan, 1971) tend to modify response rates as a function of the latency to initiate responding post-reinforcement, while rates of responding within bouts typically remain unchanged.

Consistent with results from previous studies, our findings show that the relationship between action execution and reinforcement rate is complex. For example, animals trained on DRL schedules effectively produce different distributions of IRTs compared to control cohorts trained on variable ratio schedules with a rate of reinforcer delivery yoked to the DRL rates (Kuch and Platt, 1976). In addition, work in rats has shown that task structure influences the behavioral strategies underlying action control without affecting response rates, such that discrete trials promote habit formation while continuous FR5 promotes goal-directed behavior (Vandaele et al., 2017; Vandaele and Ahmed, 2020). How the manipulations we implemented here affect the cognitive basis of the reinforced lever pressing actions remains to be investigated.

Together, our experiments describe the complex relationship between operant contingencies and the induction of reproducible lever pressing patterns. We demonstrated the importance of a cue signaling reinforcer delivery for this behavior, showing that the cue was effective at increasing learning rates but resulted in mice pressing the lever in fast succession until the cue turned on, rather than pressing it in discrete bouts of five. Finally, we showed that a training strategy that directly reinforced bouts of lever presses with low IRT variance was not more effective than a traditional fixed ratio schedule at promoting reproducible action execution. Our findings provide insights into the parameters of behavioral training that promote reproducible action execution.

Acknowledgements:

This work was supported by NIH grants DA042111 and DA048931 to E.S.C. and 5T32MH065215-18 to M.C. and M.Z.L., as well as by funds from the Brain and Behavior Research Foundation, the Whitehall Foundation, and the Edward Mallinckrodt, Jr. Foundation to E.S.C.

Footnotes

Conflict of Interest Statement: The authors have no conflicts to report.

References

1. Alleman HD (1970) Interresponse time reinforcement. University of Iowa thesis.
2. Arbel Y, Hong L, Baker TE, et al. (2017) It’s all about timing: An electrophysiological examination of feedback-based learning with immediate and delayed feedback. Neuropsychologia 99: 179–186. DOI: 10.1016/j.neuropsychologia.2017.03.003.
3. Baron A, Mikorski J and Schlund M (1992) Reinforcement magnitude and pausing on progressive-ratio schedules. Journal of the Experimental Analysis of Behavior 58(2): 377–388.
4. Blakely E and Schlinger H (1988) Determinants of pausing under variable-ratio schedules: reinforcer magnitude, ratio size, and schedule configuration. Journal of the Experimental Analysis of Behavior 50(1): 65–73.
5. Branch MN (1977) Signalled and unsignalled percentage reinforcement of performance under a chained schedule. Journal of the Experimental Analysis of Behavior 27(1): 71–83. DOI: 10.1901/jeab.1977.27-71.
6. Doughty AH and Lattal KA (2003) Response persistence under variable-time schedules following immediate and unsignalled delayed reinforcement. Quarterly Journal of Experimental Psychology Section B: Comparative and Physiological Psychology 56B(3): 267–277. DOI: 10.1080/02724990244000124.
7. Duffy A, Latimer KW, Goldberg JH, et al. (2022) Dopamine neurons evaluate natural fluctuations in performance quality. Cell Reports 38(13): 110574. DOI: 10.1016/j.celrep.2022.110574.
8. Faure A, Haberland U, Condé F, et al. (2005) Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. Journal of Neuroscience 25(11): 2771–2780. DOI: 10.1523/JNEUROSCI.3894-04.2005.
9. Felton M and Lyon D (1966) The post-reinforcement pause. Journal of the Experimental Analysis of Behavior 9(2): 131–134.
10. Ferster CB and Skinner BF (1957) Schedules of Reinforcement. Appleton-Century-Crofts.
11. Foerde K and Shohamy D (2011) Feedback timing modulates brain systems for learning in humans. Journal of Neuroscience 31(37): 13157–13167. DOI: 10.1523/JNEUROSCI.2701-11.2011.
12. Gadagkar V, Puzerey PA, Chen R, et al. (2016) Dopamine neurons encode performance error in singing birds. Science 354(6317): 1278–1283.
13. Garr E, Padovan-Hernandez Y, Janak PH, et al. (2021) Maintained goal-directed control with overtraining on ratio schedules. Learning & Memory 28(12): 435–439. DOI: 10.1101/lm.053472.121.
14. Gershman SJ and Ölveczky BP (2020) The neurobiology of deep reinforcement learning. Current Biology 30(11): R629–R632. DOI: 10.1016/j.cub.2020.04.021.
15. Greenstreet F, Vergara HM, Pati S, et al. (2022) Action prediction error: a value-free dopaminergic teaching signal that drives stable learning. bioRxiv.
16. Jin X, Tecuapetla F and Costa RM (2014) Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nature Neuroscience 17(3): 423–430. DOI: 10.1038/nn.3632.
17. Kramer TJ and Rilling M (1970) Differential reinforcement of low rates: a selective critique. Psychological Bulletin 74(4). DOI: 10.1037/h0021468.
18. Kravitz AV and Kreitzer AC (2012) Striatal mechanisms underlying movement, reinforcement, and punishment. Physiology 27: 167–177. DOI: 10.1152/physiol.00004.2012.
19. Kuch D and Platt JR (1976) Reinforcement rate and interresponse time differentiation. Journal of the Experimental Analysis of Behavior 26(3): 471–486.
20. Lewis P, Lewin L, Muehleisen P, et al. (1974) Preference for signalled reinforcement. Journal of the Experimental Analysis of Behavior 22(1): 143–150. DOI: 10.1901/jeab.1974.22-143.
21. Lowe CF, Davey GCL and Harzem P (1974) Effects of reinforcement magnitude on interval and ratio schedules. Journal of the Experimental Analysis of Behavior 22(3): 553–560.
22. Malott RW and Cumming WW (1966) Concurrent schedules of interresponse time reinforcement: probability of reinforcement and the lower bounds of the reinforced interresponse time intervals. Journal of the Experimental Analysis of Behavior 9(4): 317–325. DOI: 10.1901/jeab.1966.9-317.
23. Marcucella H and Margolius G (1978) Time allocation in concurrent schedules: the effect of signalled reinforcement. Journal of the Experimental Analysis of Behavior 29(3): 419–430. DOI: 10.1901/jeab.1978.29-419.
24. McMillan JC (1971) Percentage reinforcement of fixed-ratio and variable-interval performances. Journal of the Experimental Analysis of Behavior 15(3): 297–302. DOI: 10.1901/jeab.1971.15-297.
25. Packard MG and Knowlton BJ (2002) Learning and memory functions of the basal ganglia. Annual Review of Neuroscience 25: 563–593. DOI: 10.1146/annurev.neuro.25.112701.142937.
26. Panigrahi B, Martin KA, Li Y, et al. (2015) Dopamine is required for the neural representation and control of movement vigor. Cell 162(6): 1418–1430. DOI: 10.1016/j.cell.2015.08.014.
27. Peterburs J, Kobza S and Bellebaum C (2016) Feedback delay gradually affects amplitude and valence specificity of the feedback-related negativity (FRN). Psychophysiology 53(2): 209–215. DOI: 10.1111/psyp.12560.
28. Powell RW (1969) The effect of reinforcement magnitude upon responding under fixed-ratio schedules. Journal of the Experimental Analysis of Behavior 12(4): 605–608. DOI: 10.1901/jeab.1969.12-605.
29. Sanderson DJ, Cuell SF and Bannerman DM (2014) The effect of US signalling and the US-CS interval on backward conditioning in mice. Learning and Motivation 48: 22–32. DOI: 10.1016/j.lmot.2014.08.002.
30. Schachtman TR and Reed P (1992) Reinforcement signals facilitate learning about early behaviors of a response sequence. Behavioural Processes 26: 1–11.
31. Schlinger H, Blakely E and Kaczor T (1990) Pausing under variable-ratio schedules: interaction of reinforcer magnitude, variable-ratio size, and lowest ratio. Journal of the Experimental Analysis of Behavior 53(1): 133–139.
32. Sidman M and Stebbins WC (1954) Satiation effects under fixed-ratio schedules of reinforcement. Journal of Comparative and Physiological Psychology 47(2): 114–116. DOI: 10.1037/h0054127.
33. Urcelay GP and Jonkman S (2019) Delayed rewards facilitate habit formation. Journal of Experimental Psychology: Animal Learning and Cognition 45(4): 413–421. DOI: 10.1037/xan0000221.
34. Vandaele Y and Ahmed SH (2020) Habit, choice, and addiction. Neuropsychopharmacology. DOI: 10.1038/s41386-020-00899-y.
35. Vandaele Y, Pribut HJ and Janak PH (2017) Lever insertion as a salient stimulus promoting insensitivity to outcome devaluation. Frontiers in Integrative Neuroscience 11: 23. DOI: 10.3389/fnint.2017.00023.
36. Weinberg A, Luhmann CC, Bress JN, et al. (2012) Better late than never? The effect of feedback delay on ERP indices of reward processing. Cognitive, Affective and Behavioral Neuroscience 12(4): 671–677. DOI: 10.3758/s13415-012-0104-z.
37. Wymbs NF, Bassett DS, Mucha PJ, et al. (2012) Differential recruitment of the sensorimotor putamen and frontoparietal cortex during motor chunking in humans. Neuron 74: 936–946. DOI: 10.1016/j.neuron.2012.03.038.
38. Yin H, Wang Y, Zhang X, et al. (2018) Feedback delay impaired reinforcement learning: principal components analysis of Reward Positivity. Neuroscience Letters 685: 179–184. DOI: 10.1016/j.neulet.2018.08.039.
