Abstract
In Experiment 1 with rats, a left lever press led to a 5-s delay and then a possible reinforcer. A right lever press led to an adjusting delay and then a certain reinforcer. This delay was adjusted over trials to estimate an indifference point, or a delay at which the two alternatives were chosen about equally often. Indifference points increased as the probability of reinforcement for the left lever decreased. In some conditions with a 20% chance of food, a light above the left lever was lit during the 5-s delay on all trials, but in other conditions, the light was only lit on those trials that ended with food. Unlike previous results with pigeons, the presence or absence of the delay light on no-food trials had no effect on the rats' indifference points. In other conditions, the rats showed less preference for the 20% alternative when the time between trials was longer. In Experiment 2 with rats, fixed-interval schedules were used instead of simple delays, and the presence or absence of the fixed-interval requirement on no-food trials had no effect on the indifference points. In Experiment 3 with rats and Experiment 4 with pigeons, the animals chose between a fixed-ratio 8 schedule that led to food on 33% of the trials and an adjusting-ratio schedule with food on 100% of the trials. Surprisingly, the rats showed less preference for the 33% alternative in conditions in which the ratio requirement was omitted on no-food trials. For the pigeons, the presence or absence of the ratio requirement on no-food trials had little effect. The results suggest that there may be differences between rats and pigeons in how they respond in choice situations involving delayed and probabilistic reinforcers.
Keywords: choice, delay, probability, fixed ratio, fixed interval, lever press, key peck, rats, pigeons
If all else is equal, animals typically show a preference for an alternative that delivers a reinforcer on every trial over one that delivers a reinforcer on only a percentage of the trials (e.g., Battalio, Kagel, & McDonald, 1985; Logan, 1965; Waddington, Allen, & Heinrich, 1981). If there is also a delay between the choice response and delivery of the reinforcer, there can be a tradeoff between delay and probability: An animal may choose a reinforcer delivered on a probabilistic basis after a short delay over a reinforcer delivered with certainty after a longer delay. In a series of experiments (Mazur, 1989, 1991, 1995; Mazur & Romano, 1992), I have used an adjusting-delay procedure to study how delay to, and probability of, reinforcement combine to determine pigeons' responses in choice situations. This procedure involves a choice between a standard alternative, which has a constant delay (e.g., a 5-s delay, followed by food on 20% of the trials) and an adjusting alternative (e.g., a delay that varies over trials, followed by food on 100% of the trials). The delay for the adjusting alternative is systematically increased and decreased over trials, depending on the animal's choices, in order to estimate an indifference point—a delay at which the two alternatives are chosen about equally often. For example, in one study, pigeons showed an indifference point between food delivered on 20% of the trials after a 5-s delay and food delivered on 100% of the trials after a delay of about 17 s (Mazur, 1989).
This research also found that the presence or absence of distinctive stimuli during the delay intervals had a large effect on the pigeons' indifference points. The standard and adjusting alternatives were associated with red and green keylights, and red and green houselights were lit during the delays before food. In the choice situation just described (called the red-present condition), a peck on the red (standard) key was always followed by a 5-s delay with red houselights, and then food was delivered on 20% of the trials. However, in the red-absent condition, a slight change was made in the stimulus conditions: After a peck on the red key, the 5-s red houselights were present on the standard trials that ended with food, but they were omitted on the standard trials that ended without food. Rather, on no-food trials, a peck on the red key led only to the white houselights associated with the intertrial interval (ITI), which remained on until the next trial began. Under these conditions, the indifference points decreased from 17 s to about 7 s, indicating a much stronger preference for the standard alternative when the red houselights were omitted on standard trials without food.
Notice that this large change in preference occurred even though the delays and probabilities of the food deliveries were identical in both conditions. To account for these results, Mazur (1989) proposed that choice depended on the strengths of the conditioned reinforcers that preceded food (the red and green keylights and houselights), and that the strengths of the conditioned reinforcers were inversely related to the total time spent in their presence per food delivery (cf. Fantino, 1977). With a 20% chance of food, an average of five standard trials would occur per food delivery. Therefore, in the red-present condition, the red stimuli associated with the standard alternative would be present for an average of 30 s per food delivery (because with response latencies of about 1 s, there would be an average of five 1-s keylight presentations and five 5-s houselight presentations per food delivery). In the red-absent condition, the red stimuli would be present for an average of only 10 s per food delivery (because there would be an average of five 1-s keylight presentations but only one 5-s houselight presentation per food delivery). According to this reasoning, the red conditioned reinforcers were stronger in the red-absent condition because food was delivered more frequently in their presence than in the red-present condition, and this is why the pigeons showed a much stronger preference for the standard alternative in the red-absent condition.
To address these matters in a more precise and quantitative way, Mazur (1989) used the following equation, known as the hyperbolic-decay model:
$$V = \sum_{i=1}^{n} P_i \frac{A}{1 + K D_i} \tag{1}$$
V is the value or strength of a reinforcer that could be delivered after any one of n possible delays, Pi is the probability that a delay of Di seconds will occur, A is a measure of the amount of reinforcement, and K is a decay parameter that determines how quickly V decreases with increases in Di. Previous studies had found that Equation 1 could make accurate predictions for pigeons' indifference points in a variety of choice situations, including delay-amount tradeoffs (Mazur, 1987), choices between reinforcers delivered after fixed versus variable delays (Mazur, 1984), and choices between single and multiple delayed reinforcers (Mazur, 1986).
To apply this equation to choices involving delayed and probabilistic reinforcers, Mazur (1989) defined Di as the total time spent in the presence of the conditioned reinforcers per food delivery. For example, in the red-present conditions, Di would equal 6 s for the cases in which food was delivered after one choice of the standard alternative (1-s red keylight plus 5-s red houselight), 12 s for cases in which food was delivered after two choices of the standard alternative, and so on. As specified in Equation 1, each different value of Di would then be weighted by its probability of occurrence, Pi, to estimate the overall value of the standard alternative. Mazur (1989) showed that Equation 1 could make accurate predictions for the results of this experiment.
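This conditioned-reinforcement account can be illustrated with a short computation (a sketch, not Mazur's analysis code; it assumes K = 1, the 1-s response latencies used in the text's calculations, and a geometric distribution of trials per food delivery):

```python
def equation1_value(delays, probs, A=1.0, K=1.0):
    """Equation 1: V = sum over i of P_i * A / (1 + K * D_i)."""
    return sum(p * A / (1.0 + K * d) for p, d in zip(probs, delays))

def standard_value(stim_per_no_food_trial, stim_on_food_trial=6.0,
                   p_food=0.2, K=1.0, n_terms=1000):
    """Value of the standard alternative when food follows the i-th
    choice with probability p * (1 - p)**(i - 1) and D_i is the total
    conditioned-reinforcer time accumulated by that food delivery."""
    probs = [p_food * (1 - p_food) ** (i - 1) for i in range(1, n_terms + 1)]
    delays = [stim_per_no_food_trial * (i - 1) + stim_on_food_trial
              for i in range(1, n_terms + 1)]
    return equation1_value(delays, probs, K=K)

# Red-present: 6 s of red stimuli on every standard trial
# (1-s keylight + 5-s houselight).
v_present = standard_value(stim_per_no_food_trial=6.0)
# Red-absent: only the 1-s keylight on no-food trials.
v_absent = standard_value(stim_per_no_food_trial=1.0)
```

Because less conditioned-reinforcer time accumulates per food delivery in the red-absent arrangement, v_absent exceeds v_present, which is consistent with the stronger preference for the standard alternative observed in the red-absent condition.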
Another experiment in this series used the same procedures as in the red-present conditions, but the duration of the ITI was varied across conditions (Mazur, 1989, Experiment 2). Varying the ITI had no measurable effect on the pigeons' indifference points. From a logical perspective, longer ITIs might be expected to cause a decrease in preference for the probabilistic alternative, because a subject will usually have to wait through several trials, and several ITIs, before receiving food. However, with Di defined as the durations of the conditioned reinforcers, Equation 1 correctly predicts that ITI duration will have no effect in this situation, because the red and green stimuli were never presented during the ITI. Later studies found support for some counterintuitive predictions of Equation 1, including cases in which, with the appropriate arrangement of conditioned reinforcers, pigeons chose food delivered on 50% of the trials over food delivered on 100% of the trials (Mazur, 1995).
In summary, Equation 1 successfully predicted the results from several experiments on delayed and probabilistic reinforcement. However, all of the previous experiments used pigeons as subjects, and it remains to be seen whether similar results can be found with other species. Experiments 1 and 2 used an adjusting-delay procedure to determine whether similar effects of delay, probability, conditioned reinforcers, and ITI duration would be found with rats. The choice responses of these rats were different in some ways from the previous results with pigeons. To obtain more information about possible species differences, Experiment 3 used rats and Experiment 4 used pigeons in parallel experiments on probabilistic reinforcement in which standard and adjusting ratio schedules were used in place of standard and adjusting delays.
Experiment 1
The experiment included three phases. In Phase 1, the probability of reinforcement for the standard alternative was varied to measure the tradeoff between delay and probability. Phase 2 included conditions similar to the red-present and red-absent conditions of Mazur (1989) to determine whether the presence or absence of conditioned reinforcers on trials without food would affect rats' indifference points. In Phase 3, ITI duration was varied: In some conditions, trials occurred about every 40 s, and in other conditions, trials occurred about every 100 s.
Method
Subjects
Three experimentally naive male Sprague-Dawley rats, about 5 months old at the start of the experiment, were maintained at approximately 85% of their free-feeding weights. A 4th rat showed an extreme bias for the lever associated with the adjusting delay and was removed from the experiment.
Apparatus
The experimental chamber was a modular test chamber for rats, 30.5 cm long, 24 cm wide, and 21 cm high. The side walls and top of the chamber were Plexiglas, and the front and back walls were aluminum. The floor consisted of steel rods, 0.48 cm in diameter and 1.6 cm apart, center to center. The front wall had two retractable response levers, 11 cm apart, 6 cm above the floor, 4.8 cm long, and extending 1.9 cm into the chamber. Centered in the back wall was a nonretractable lever with the same dimensions, 2.5 cm above the floor. A force of approximately 0.25 N was required to operate each lever, and when a lever was active, each effective response produced a feedback click. Above each lever was a 2-W stimulus light, 2.5 cm in diameter. The lights above the left and back levers were white, and the light above the right lever was green. A pellet dispenser delivered 45-mg food pellets into a receptacle through a 5.1-cm square opening in the center of the front wall. A 2-W white houselight was mounted at the top center of the rear wall.
The chamber was enclosed in a sound-attenuating box containing a ventilation fan. All stimuli were controlled and responses recorded by an IBM®-compatible personal computer using the Medstate® programming language.
Procedure
Pretraining
Each rat was first placed in the test chamber for several brief sessions, with food pellets in the food tray. Once a rat promptly ate the pellets upon entering the chamber, lever pressing was trained on all three levers through an autoshaping procedure. About once a minute, the light above one of the levers was lit, and a lever press led to the delivery of a food pellet. If no lever press occurred within 10 s, the light was turned off and a pellet was delivered. After the rats began to press each lever when the light above it was lit, they received several 60-trial sessions in which a food pellet was delivered after a response on the back lever followed by a response on one of the front levers. As in the actual experiment, the light above the back lever was lit at the start of each trial. A response on the back lever caused the light above the lever to turn off, one of the front levers was extended into the chamber, and the light above that lever was lit. When that lever was pressed, it retracted, the chamber became dark, and a food pellet was delivered. The next trial started about 1 min later. Once the rats were reliably pressing the back and front levers on each trial, the first condition of the experiment began.
The experiment consisted of 12 conditions, which were divided into three phases.
Phase 1 (Conditions 1 to 3)
Every session lasted for 64 trials or for 60 min, whichever came first. Within a session, each block of four trials consisted of two forced trials followed by two choice trials. At the start of each trial, the houselight and the light above the rear lever were lit. A response on the rear lever was required to begin the choice period. On choice trials, after a response on the rear lever, the light above this lever was turned off, the lights above the two front levers were turned on, and the two front levers were extended into the chamber. A single response on the left lever constituted a choice of the standard alternative, and a single response on the right lever constituted a choice of the adjusting alternative.
If the adjusting (right) lever was pressed during the choice period, the two front levers were retracted, only the light above the right lever remained on, and there was a delay of adjusting duration (as explained below). At the end of the adjusting delay, the light above the right lever was turned off, one food pellet was delivered, and the chamber was dark for 1 s. Then the houselight was turned on and an ITI began. For all adjusting and standard trials, the duration of the ITI was set so that the total time from a choice response to the start of the next trial was 40 s.
If the standard (left) lever was pressed during the choice period, the two front levers were retracted and there was a 5-s delay during which only the light above the left lever was lit. At the end of the standard delay, the light was turned off, and on a certain percentage of the trials, a food pellet was delivered and the chamber was dark for 1 s. Then the houselight was turned on and the ITI began. If no food pellet was delivered on a standard trial, the chamber did not turn dark, and the houselight was turned on immediately after the 5-s standard delay. Food pellets were delivered on 100%, 20%, and 40% of standard trials in Conditions 1 to 3, respectively. Table 1 shows the order of conditions in Experiment 1.
Table 1. Order of conditions in Experiment 1.
| Condition | Standard reinforcer percentage | Stimulus condition | Trial spacing (s) |
| --- | --- | --- | --- |
| 1 | 100 | Light present | 40 |
| 2 | 20 | Light present | 40 |
| 3 | 40 | Light present | 40 |
| 4 | 40 | Light absent | 40 |
| 5 | 20 | Light absent | 40 |
| 6 | 20 | Light present | 40 |
| 7 | 20 | Light absent | 40 |
| 8 | 20 | Light present | 100 |
| 9 | 20 | Light present | 40 |
| 10 | 20 | Light present | 100 |
| 11 | 20 | Light present | 40 |
| 12 | 20 | Light present | 100 |
The procedure on forced trials was the same as on choice trials, except that only one lever, left or right, was extended into the chamber after a response on the back lever, and only the stimulus light above that lever was lit. A response on this lever led to the sequence described above. Of every two forced trials, one involved the left lever and the other the right lever. The temporal order of these two types of trials varied randomly.
After every two choice trials, the duration of the adjusting delay might be changed. If the subject chose the standard lever on both trials, then the adjusting delay was decreased by 1 s. If a subject chose the adjusting lever on both choice trials, then the adjusting delay was increased by 1 s (up to a maximum of 35 s). If the subject chose each lever on one trial, then no change was made. In all three cases, this adjusting delay remained in effect for the next block of four trials. At the start of the first session of a condition, the adjusting delay was 0 s. At the start of later sessions of the same condition, the adjusting delay was determined by the above rules as if it were a continuation of the preceding session.
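The adjustment rule can be summarized in a short sketch (function and variable names are illustrative; the floor at 0 s is implied by the procedure rather than stated):

```python
def update_adjusting_delay(delay, choices, step=1.0, maximum=35.0):
    """Update the adjusting delay after a block of two choice trials.

    choices: two entries, each 'standard' or 'adjusting'.  Two standard
    choices decrease the delay by one step (not below 0 s); two
    adjusting choices increase it by one step (up to the maximum); a
    split leaves it unchanged for the next block of four trials.
    """
    n_adjusting = choices.count('adjusting')
    if n_adjusting == 2:
        return min(delay + step, maximum)
    if n_adjusting == 0:
        return max(delay - step, 0.0)
    return delay
```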
Phase 2 (Conditions 4 to 7)
The purpose of this phase was to determine whether the absence of the delay-interval stimulus on trials without food would affect preference for the standard alternative. Condition 4 was similar to Condition 3 (40% of the standard trials ended with food), except for the following change. On standard trials without food, the light above the left lever was turned off immediately after a response on this lever (along with the light above the right lever if it was a choice trial), the houselight was turned on, and the ITI began. On standard trials with food, the light above the left lever remained on during the 5-s delay that preceded food, as in Condition 3.
In Conditions 5, 6, and 7, 20% of the standard trials ended with food. In Condition 6, the light above the left lever was lit for 5 s on both trials with and without food (making it a replication of Condition 2), but in Conditions 5 and 7, the light above the left lever was lit for 5 s only on trials with food. Conditions in which the light above the left lever was lit for 5 s on every standard trial will be called light-present conditions; conditions in which the light was only lit on trials with food will be called light-absent conditions.
Phase 3 (Conditions 8 to 12)
The purpose of this phase was to determine whether lengthening the ITI would affect preference. All conditions were light-present conditions, and 20% of the standard trials ended with food. The time from a choice response to the start of the next trial was 40 s in Conditions 9 and 11, but it was 100 s in Conditions 8, 10, and 12. The maximum possible adjusting delay was increased to 95 s in Conditions 8, 10, and 12. Because of the longer time between trials, sessions in these three conditions ended after 32 trials or 60 min, whichever came first.
Stability Criteria
All conditions lasted for a minimum of 20 sessions, except Condition 2, which lasted for a minimum of 27 sessions. After the minimum number of sessions, a condition was terminated for each subject individually when several stability criteria were met. To assess stability, each session was divided into two 32-trial blocks (or 16-trial blocks for the conditions with 32 trials per session), and for each block the mean adjusting delay was calculated. The results from the first two sessions of a condition were not used, and the condition was terminated when all of the following criteria were met, using the data from all subsequent sessions: (a) Neither the highest nor the lowest single-block mean of a condition could occur in the last six blocks of a condition, (b) the mean adjusting delay across the last six blocks could not be the highest or the lowest six-block mean of the condition, (c) the mean delay of the last six blocks could not differ from the mean of the preceding six blocks by more than 10% or by more than 1 s (whichever was larger), and (d) visual inspection revealed no systematic upward or downward trends in the last six blocks.
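Criteria (a) through (c) can be sketched as follows (my reconstruction: the sliding six-block windows in criterion b are an assumption, and criterion d, visual inspection for trends, is not automated):

```python
def stability_met(block_means):
    """Sketch of stability criteria (a)-(c).

    block_means: half-session mean adjusting delays for a condition,
    excluding the first two sessions, as described in the text.
    """
    if len(block_means) < 12:
        return False
    last6 = block_means[-6:]
    # (a) Neither the highest nor the lowest single-block mean of the
    #     condition occurs in the last six blocks.
    if max(block_means) in last6 or min(block_means) in last6:
        return False
    # (b) The mean of the last six blocks is neither the highest nor the
    #     lowest six-block mean of the condition (sliding windows assumed).
    windows = [sum(block_means[i:i + 6]) / 6.0
               for i in range(len(block_means) - 5)]
    last6_mean = windows[-1]
    others = windows[:-1]
    if last6_mean > max(others) or last6_mean < min(others):
        return False
    # (c) The last-six mean differs from the preceding-six mean by no
    #     more than 10% or 1 s, whichever is larger.
    prev6_mean = sum(block_means[-12:-6]) / 6.0
    tolerance = max(0.10 * prev6_mean, 1.0)
    return abs(last6_mean - prev6_mean) <= tolerance
```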
Results and Discussion
The number of sessions needed to meet the stability criteria ranged from 20 to 31 (median = 22 sessions). All data analyses were based on the results from the six half-session blocks that satisfied the stability criteria in each condition. For each subject and each condition, the mean adjusting delay from these six half-session blocks was used as a measure of the indifference point.
Figure 1 shows the indifference points from Phase 1, in which the reinforcement percentage for the standard alternative ranged from 20% to 100%. Overall, the results from Phase 1 were similar to those found by Mazur (1989) with pigeons: Indifference points increased at an accelerating rate with decreasing reinforcement percentages.
Figure 1. Mean adjusting delays (and standard deviations) are shown for each rat and for the group mean from Phase 1, in which the reinforcement percentage for the standard alternative ranged from 20% to 100%.
Figure 2 shows the indifference points from Conditions 3 through 7, which compared light-present conditions (white bars) and light-absent conditions (black bars). The first two bars are from conditions with 40% reinforcement for the standard alternative, and the last three bars are from conditions with 20% reinforcement. In contrast to results from several experiments with pigeons (Mazur, 1989, 1991, 1995), there was no evidence for a systematic difference between the light-present and light-absent conditions, either with 40% reinforcement or with 20% reinforcement.
Figure 2. Mean adjusting delays (and standard deviations) are shown for each rat and for the group mean from Conditions 3 through 7, which compared light-present conditions (white bars) and light-absent conditions (black bars).
Figure 3 shows the indifference points from Conditions 8 through 12, in which the trials occurred either about every 40 s (black bars) or about every 100 s (white bars). With only one exception (Rat 1 in Condition 10), indifference points were longer in conditions with 100-s trial spacing (M = 30.5 s) than in those with 40-s trial spacing (M = 22.9 s). A planned comparison of the indifference points from the two 40-s conditions with those from the three 100-s conditions revealed a significant difference, F(1, 8) = 17.45, p < .01. These results show that preference for the standard alternative, which had a 20% reinforcement percentage, decreased when the time between trials was longer.
Figure 3. Mean adjusting delays (and standard deviations) are shown for each rat and for the group mean from Conditions 8 through 12, in which the trials occurred either about every 40 s (black bars) or about every 100 s (white bars).
The results from these rats were similar to Mazur's (1989) results from pigeons in some ways, but different in others. As with the pigeons, the mean adjusting delays increased systematically as the reinforcement percentage for the standard alternative decreased (Figure 1). However, in contrast to the results with pigeons, the presence or absence of the 5-s stimulus lights on trials without food had no effect on the rats' indifference points (Figure 2). Furthermore, whereas the duration of the ITI had no effect on pigeons' indifference points, the rats' indifference points were longer in the conditions with longer ITIs than in the conditions with shorter ITIs (Figure 3).
Although there were some clear differences in the data from these rats compared to those obtained with pigeons, it is still possible to account for them using the same theoretical framework. As already discussed, Equation 1 provided a good account of the results from several experiments with pigeons when Di was defined as the time spent in the presence of the conditioned reinforcers—the red and green keylights and houselights that preceded food. Time spent in the ITIs was ignored in these analyses, primarily because the data dictated this approach: Variations in ITI duration had no discernible effect on the pigeons' choices. However, as Rachlin, Logue, Gibbon, and Frankel (1986) pointed out, ITI duration does indeed affect the average delay to a reinforcer that is delivered on a probabilistic basis. For example, if a reinforcer occurs, on average, every five trials, there will be an average of four ITIs between the first choice response and delivery of the reinforcer. From a logical perspective, one could argue that ITI duration should affect preference for a probabilistic reinforcer because the ITIs are part of the total time before reinforcer delivery. When Rachlin et al. had college students choose among hypothetical monetary reinforcers that differed in probability and amount, they found less preference for a low-probability reinforcer when ITIs were longer.
Suppose that the choices of the rats in this experiment were controlled by the total time to reinforcement, including both the delay periods with the stimulus lights and the ITIs. Equation 1 was used to generate predictions for this experiment (setting K equal to 1, and assuming an average response latency of 1 s—values that were used to make predictions for pigeons in previous studies). Two sets of predictions were made, one with ITI durations included in the calculation of Di, and one with ITI durations excluded. Table 2 compares both sets of predictions to the mean adjusting delays obtained in this experiment, averaged across subjects and replications.
Table 2. The mean adjusting delays from the different conditions in Experiment 1 are compared to the predictions of Equation 1, using computations that either included or excluded ITI durations in calculating the delay to reinforcement, Di. All durations are in seconds.
| Reinforcer percentage | Stimulus condition | Trial spacing | Mean adjusting delay | Predicted including ITI | Predicted excluding ITI |
| --- | --- | --- | --- | --- | --- |
| 100 | Light present | 40 | 6.5 | 5.0 | 5.0 |
| 40 | Light present | 40 | 12.6 | 13.4 | 9.1 |
| 20 | Light present | 40 | 22.3 | 26.1 | 14.5 |
| 40 | Light absent | 40 | 13.3 | 13.4 | 6.2 |
| 20 | Light absent | 40 | 21.5 | 26.1 | 7.8 |
| 20 | Light present | 100 | 30.5 | 29.6 | 14.5 |
The first three rows of Table 2 show that as reinforcement probability decreased, the indifference points increased in a manner that was roughly consistent with both sets of predictions, although the results were somewhat closer to the predictions that included ITI durations. A comparison of the light-present and light-absent conditions (rows 2 through 5) clearly favored the predictions that included ITI durations, both because there were no systematic differences between these two types of conditions, and because the mean adjusting delays were close to the predictions of Equation 1 when ITI durations were included in the calculation of Di. Finally, rows 3 and 6 in Table 2 show that even when ITI durations were included in the calculations, Equation 1 predicted only a small change in the indifference points (from 26.1 s to 29.6 s) when trial durations increased from 40 s to 100 s. The predicted difference is so small because, with typical parameter values, V in Equation 1 is primarily determined by the value of food when it occurs after a single 5-s delay. For cases in which food is delivered only after two or more trials, the calculated value of the food is so small that it makes little difference whether the trial duration was 40 s or 100 s. Therefore, although the differences observed between the 40-s and 100-s conditions (mean adjusting delays of 22.3 s and 30.5 s, respectively) were modest, they were somewhat larger than those predicted by Equation 1 with K = 1.
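As a check, the light-present predictions in Table 2 can be approximately reproduced with a short sketch (my reconstruction, not the original analysis code; it assumes K = 1, a 1-s response latency, a geometric distribution of trials per food delivery, and, when ITIs are included, that each unsuccessful trial contributes the 1-s latency plus the full trial spacing to D_i):

```python
def predicted_indifference(p_food, spacing=40.0, include_iti=True,
                           K=1.0, latency=1.0, delay=5.0, n_terms=2000):
    """Predicted adjusting delay at indifference under Equation 1.

    When food follows the i-th standard choice, D_i either excludes or
    includes the intervening ITIs.  The adjusting alternative is a
    certain reinforcer after latency + d, so at indifference
    1 / (1 + K * (latency + d)) = V, giving d = (1/V - 1)/K - latency.
    """
    V = 0.0
    for i in range(1, n_terms + 1):
        P_i = p_food * (1 - p_food) ** (i - 1)
        if include_iti:
            D_i = (latency + spacing) * (i - 1) + latency + delay
        else:
            D_i = (latency + delay) * i
        V += P_i / (1.0 + K * D_i)
    return (1.0 / V - 1.0) / K - latency
```

Under these assumptions the function yields values close to the light-present entries in Table 2 (roughly 26 s for the 20%, 40-s condition with ITIs included and roughly 14.5 s with ITIs excluded).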
In summary, when ITI durations were included in the calculations, the predictions of Equation 1, though not exact, were in rough correspondence with the group means obtained in this experiment. These findings suggest that, unlike the pigeons in previous experiments (Mazur, 1989, 1991, 1995), the rats in this experiment were sensitive to ITI duration.
Cardinal, Daw, Robbins, and Everitt (2002) used rats in an adjusting-delay procedure similar to the one used in this experiment. They reported that their rats were not sensitive to the duration of the adjusting delay, and they questioned whether this is a suitable procedure to measure rats' preferences. It is therefore worth considering whether the lack of a difference between the light-present and light-absent conditions in this experiment might reflect a problem with the adjusting-delay procedure itself. Two lines of evidence argue against this possibility. First, although it is not clear why Cardinal et al. did not find sensitivity to the adjusting delay in their study, other studies have successfully used the adjusting-delay procedure with rats (e.g., Mazur, 1988; Mazur, Stellar, & Waraczynski, 1987). Second, in the present experiment, the rats' indifference points showed sensitivity to reinforcement probability in Phase 1 and to ITI duration in Phase 3, so the results from these conditions provided clear evidence that the rats were sensitive to the size of the adjusting delay.
One possible explanation of the failure of the stimulus lights to affect the indifference points in Phase 2 is that these lights were simply not salient enough. The only difference between the light-present and light-absent conditions was that the light above the left lever was lit for 5 s on trials without food in the light-present conditions but not in the light-absent conditions. If the rats did not attend to this light, there would be no reason to expect a difference in performance between the light-present and light-absent conditions. This possibility was examined in Experiment 2.
Experiment 2
The purpose of this experiment was to determine whether a difference between light-present and light-absent trials would emerge if the delay-interval stimuli were more salient. To ensure that the rats would attend to the delay-interval stimuli, a fixed-interval (FI) schedule had to be completed during each delay period in the light-present conditions. In a light-absent condition, no FI schedule was in effect on no-food trials.
Method
Subjects and Apparatus
Three male Sprague-Dawley rats, about 22 months old at the start of the experiment, were maintained at approximately 85% of their free-feeding weights. They had served previously in other studies on choice in the same test chamber. The chamber was similar to the one used in Experiment 1.
Procedure
An adjusting-delay procedure similar to that of Experiment 1 was used: Sessions lasted for 64 trials or for 60 min, whichever came first, and each block of four trials consisted of two forced trials followed by two choice trials. The lever in the rear of the chamber was not used in this experiment.
Light-Present Conditions
At the start of a choice trial, the lights above the two front levers were turned on, and the two levers were extended into the chamber. A single response on the left lever constituted a choice of the standard alternative, and a single response on the right lever constituted a choice of the adjusting alternative.
If the standard (left) lever was pressed during the choice period, the right lever was retracted and the light above it was turned off, but unlike in Experiment 1, the left lever remained in the chamber, the light above it remained on, and an FI 4-s schedule was in effect. The first lever press after 4 s completed the FI requirement, after which the left lever was retracted and the light above it was turned off. On 20% of the standard trials, a food pellet was delivered and the chamber was dark for 1 s. Then the houselight was turned on, and a 45-s ITI began. On the other 80% of the standard trials, no food pellet was delivered, the chamber did not turn dark for 1 s, and the ITI began immediately.
If the adjusting (right) lever was pressed during the choice period, the left lever was retracted and the light above it was turned off, but the right lever remained in the chamber, the light above it remained on, and an FI schedule of adjusting duration was in effect. The first lever press after the adjusting delay completed the FI requirement, after which the right lever was retracted, the light above it was turned off, a food pellet was delivered, and the chamber was dark for 1 s. Then the houselight was turned on, and an ITI began. To keep the times between trial onsets approximately constant, the ITI for the adjusting lever was 49 s minus the duration of the adjusting delay.
The procedure on forced trials was the same as on choice trials, except that only one lever, left or right, was extended into the chamber at the start of the trial, and only the stimulus light above that lever was lit. A response on this lever led to the sequence described above. After every block of four trials, the adjusting delay could be changed, according to the same rules as in Experiment 1.
Light-Absent Condition
The procedure in the light-absent condition was the same in all respects except that on the standard trials without food, there was no FI schedule. On trials without food, immediately after a choice response on the standard lever, the lever was retracted, the light above it was turned off, and the ITI began.
Order of Conditions
The experiment began with nine sessions of a training procedure that was identical to the light-present conditions described above, except that all standard trials ended with reinforcement. Then all subjects received a light-present condition, a light-absent condition, and a second light-present condition. Each condition lasted for a minimum of 14 sessions and was terminated using the same stability criteria as in Experiment 1.
Results and Discussion
The number of sessions needed to meet the stability criteria ranged from 14 to 23 (median = 16 sessions). All data analyses were based on the results from the six half-session blocks that satisfied the stability criteria in each condition. For each subject and each condition, the mean adjusting delay from these six half-session blocks was used as a measure of the indifference point.
Figure 4 shows the mean adjusting delay for each rat in the three conditions. There was no systematic difference between the light-present conditions (white bars) and light-absent condition (black bars) for any subject. The mean indifference point was 27.9 s in the light-present conditions and 28.3 s in the light-absent condition.
Figure 4. Mean adjusting delays (and standard deviations) are shown for each rat and for the group mean from Experiment 2, which compared light-present conditions (white bars) and a light-absent condition (black bars).
In Experiment 1, it was possible that the rats did not attend to the white light above the standard lever that was present during the 5-s delays. If they did not, this could explain why there was no difference between the light-present and light-absent conditions in that experiment. However, in Experiment 2, it is difficult to argue that the rats did not attend to the difference between light-present and light-absent trials, because an FI requirement was in effect on light-present trials but not light-absent trials. There were actually three features of the no-food trials in the light-present conditions that were not included in the light-absent conditions: (a) the light above the left lever was on during the delay on all standard trials, (b) the left lever remained in the chamber, and (c) the FI schedule required at least one response on this lever before the delay period was terminated. That is, the light, lever, and FI schedule were present on all standard trials in the light-present conditions, but in the light-absent conditions, they were present on only the 20% of the trials that ended with food. Despite these differences, there were no systematic variations in the mean adjusting delays.
It is not clear why the rats in these experiments showed no differences between the light-present and light-absent conditions, whereas large differences have been found in experiments with pigeons (Mazur, 1989, 1995). However, the results of Experiment 2 suggest that the difference between rats and pigeons was not due to the rats' failure to attend to the delay-interval stimuli.
To make the events on food and no-food trials even more different, Experiment 3 (with rats) and Experiment 4 (with pigeons) used fixed-ratio (FR) schedules in place of the FI schedules used in Experiment 2. Similar procedures were used in these two experiments. The standard alternative was an FR 8 schedule, and food was delivered on 33% of the trials. The adjusting alternative was an adjusting ratio schedule, and food was delivered on 100% of the trials. In ratio-present conditions, the animals had to complete the FR 8 schedule on all standard trials, whereas in ratio-absent conditions, the FR 8 schedule was omitted on no-food trials.
Experiment 3
Method
Subjects and Apparatus
Four male Sprague-Dawley rats, about 8 months old at the start of the experiment, were maintained at approximately 80% of their free-feeding weights. They had served previously in other studies on choice in the same test chamber. The chamber was the same as in Experiment 1.
Procedure
The overall procedure was similar to that of Experiment 2: Sessions lasted for 64 trials or for 60 min, whichever came first, and each block of four trials consisted of two forced trials followed by two choice trials.
Ratio-Present Conditions
At the start of a choice trial, the lights above the two front levers were turned on, and the two levers were extended into the chamber. A single response on the left lever constituted a choice of the standard alternative, and a single response on the right lever constituted a choice of the adjusting alternative.
If the standard (left) lever was pressed during the choice period, the right lever was retracted and the light above it was turned off, but the left lever remained in the chamber, the light above it remained on, and an FR 8 schedule was in effect. After eight additional responses on the left lever, the lever was retracted, and the light above it was turned off. On 33% of the standard trials, a food pellet was delivered and the chamber was dark for 1 s. Then the houselight was turned on and a 40-s ITI began. On the other 67% of the standard trials, no food pellet was delivered, the chamber did not turn dark for 1 s, and the ITI began immediately.
If the adjusting (right) lever was pressed during the choice period, the left lever was retracted and the light above it was turned off, but the right lever remained in the chamber, the light above it remained on, and an adjusting ratio schedule was in effect. When the ratio requirement was completed, the right lever was retracted, the light above it was turned off, a food pellet was delivered, and the chamber was dark for 1 s. Then the houselight was turned on and a 40-s ITI began.
The procedure on forced trials was the same as on choice trials, except that only one lever, left or right, was extended into the chamber at the start of the trial, and only the stimulus light above that lever was lit. A response on this lever led to the sequence described above.
The adjusting-ratio schedule operated as follows: After every block of four trials, if the two previous choice responses were on the adjusting lever, then the adjusting ratio was increased by one response, up to a maximum of 100 responses. If the two previous choice responses were on the standard lever, then the adjusting ratio was decreased by one response. If one choice response was made on each lever, then there was no change in the adjusting ratio.
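The update rule just described can be sketched as a small function. This is an illustrative reconstruction, not the original control code; the floor of one response is an assumption, since the paper specifies only the 100-response ceiling:

```python
def update_adjusting_ratio(current_ratio, last_two_choices,
                           max_ratio=100, min_ratio=1):
    """Apply the adjusting-ratio rule after a block of four trials.

    last_two_choices: the two choice-trial responses in the block,
    each either "adjusting" or "standard".  Two adjusting-lever
    choices raise the ratio by one response (capped at max_ratio);
    two standard-lever choices lower it by one (floored at
    min_ratio, an assumption); a split block leaves it unchanged.
    """
    if all(c == "adjusting" for c in last_two_choices):
        return min(current_ratio + 1, max_ratio)
    if all(c == "standard" for c in last_two_choices):
        return max(current_ratio - 1, min_ratio)
    return current_ratio
```

For example, starting at a ratio of 50, a block with two adjusting-lever choices moves it to 51, and a split block leaves it at 50.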
Ratio-Absent Conditions
The procedure in the ratio-absent conditions was the same as in the ratio-present conditions in all respects, except that on the standard trials without food, there was no FR schedule. On trials without food, immediately after a choice response on the standard lever, the lever was retracted, the light above it was turned off, and a 40-s ITI began.
Order of Conditions
Conditions 1 and 3 were ratio-present conditions, and Conditions 2 and 4 were ratio-absent conditions. Each condition lasted for a minimum of 20 sessions and was terminated using the same stability criteria as in the previous experiments, except that the dependent variable was the mean adjusting ratio rather than the mean adjusting delay.
Results and Discussion
The number of sessions needed to meet the stability criteria ranged from 20 to 38 (median = 21.5 sessions). All data analyses were based on the results from the six half-session blocks that satisfied the stability criteria in each condition. Response rates on the ratio schedules were rapid, and slightly higher on the standard lever. In the ratio-present conditions, the mean response rates were 3.0 responses/s on the standard lever and 2.2 responses/s on the adjusting lever. In the ratio-absent conditions, the mean response rates were 2.8 responses/s on the standard lever and 2.4 responses/s on the adjusting lever. The slower response rates on the adjusting lever may be attributable to ratio strain with the longer ratios (see below).
For each subject and each condition, the mean adjusting ratio from the six half-session blocks that satisfied the stability criteria was used as a measure of the indifference point. Figure 5 shows the mean adjusting ratio for each rat in the four conditions. All 4 rats showed a similar pattern: The mean adjusting ratio was larger in the ratio-absent conditions (black bars, M = 79.5) than in the ratio-present conditions (white bars, M = 51.7). A repeated-measures analysis of variance (ANOVA) showed that this difference between the ratio-absent and ratio-present conditions was statistically significant, F(1, 3) = 52.05, p < .01. An increase in the adjusting ratio represents a decrease in preference for the standard alternative. Therefore, these results indicate that preference for the standard alternative was greater in the ratio-present conditions (when an FR 8 schedule had to be completed on every trial) than in the ratio-absent conditions (when the FR 8 schedule had to be completed on only the 33% of the trials that included food).
Figure 5. Mean adjusting ratios (and standard deviations) are shown for each rat in Experiment 3, which compared ratio-present conditions (white bars) and ratio-absent conditions (black bars).
This result was unexpected, and it is surprising because it suggests that the rats actually preferred a situation that required, on average, three times as many lever presses for every reinforcer obtained. In the ratio-absent conditions, one FR 8 schedule had to be completed for every food delivery, whereas in the ratio-present conditions an average of three FR 8 schedules had to be completed for every food delivery. Furthermore, because the duration of the ITI was always 40 s, the time needed to complete the FR requirements lengthened the total time between trials and decreased the overall rate of reinforcement.
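The press-count arithmetic behind this comparison is simple and can be made explicit. This is illustrative only; the single choice response that starts each trial is ignored:

```python
def presses_per_food(fr, p_food, ratio_present):
    """Mean standard-lever FR presses per food delivery.

    With food on a proportion p_food of trials, food arrives once
    every 1 / p_food trials on average.  If the FR must be completed
    on every trial, each food delivery costs fr / p_food presses; if
    the FR is omitted on no-food trials, it costs only fr presses.
    """
    return fr / p_food if ratio_present else float(fr)
```

With FR 8 and food on one third of trials, this gives 24 presses per pellet in the ratio-present conditions versus 8 in the ratio-absent conditions.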
Before speculating about possible explanations, it will be useful to compare the performance of rats and pigeons on this task. Would pigeons also show greater preference for the standard alternative in ratio-present conditions? Experiment 4 was conducted to answer this question.
Experiment 4
Method
Subjects
Four male White Carneau pigeons were maintained at approximately 80% of their free-feeding weights. They had served previously in other experiments on choice.
Apparatus
The experimental chamber was 30 cm long, 30 cm wide, and 31 cm high. The chamber had three response keys, each 2 cm in diameter, mounted in the front wall of the chamber, 24 cm above the floor and 8 cm apart, center to center. The center key was not used in this experiment. A force of approximately 0.15 N was required to operate each key. Each key could be transilluminated with lights of different colors. A hopper below the center key provided controlled access to grain, and when grain was available, the hopper was illuminated with a 2-W white light. Two 2-W white houselights were mounted above the Plexiglas ceiling of the chamber. The chamber was enclosed in a sound-attenuating box containing a ventilation fan. All stimuli were controlled and responses recorded by an IBM-compatible personal computer using the Medstate programming language.
Procedure
The procedure was similar to that of Experiment 3. Sessions lasted for 64 trials or for 60 min, whichever came first, and each block of four trials consisted of two forced trials followed by two choice trials.
The white houselights were on throughout each session except when food was delivered. At the start of a choice trial, the left key (the standard key) was lit green and the right key (the adjusting key) was lit red. A single peck on either key constituted a choice of that alternative.
In ratio-present conditions, if the standard (left) key was pecked during the choice period, the right keylight was turned off, but the left keylight remained on, and an FR 8 schedule was in effect. After eight additional responses on the left key, the left keylight was turned off. On 33% of the standard trials, the houselights were turned off, the light above the food hopper was lit, and food was presented for 3 s. Then the houselights were turned on and a 40-s ITI began. On the other 67% of the standard trials, no food was delivered, the houselights remained on, and the ITI began immediately.
If the adjusting (right) key was pecked during the choice period, the left keylight was turned off, but the right keylight remained on, and an adjusting ratio schedule was in effect. When the ratio requirement was completed, the right keylight and the houselights were turned off, the light above the food hopper was lit, and food was presented for 3 s. Then the houselights were turned on and a 40-s ITI began.
The procedure on forced trials was the same as on choice trials, except that only one keylight, left or right, was lit. A response on this key led to the sequence described above. The adjusting-ratio schedule operated just as in Experiment 3.
In ratio-absent conditions, the procedure was the same in all respects except that on the standard trials without food, there was no FR schedule. On trials without food, immediately after a choice response on the standard key, the keylight was turned off and a 40-s ITI began.
Conditions 1 and 3 were ratio-present conditions, and Conditions 2 and 4 were ratio-absent conditions. Each condition lasted for a minimum of 20 sessions and was terminated using the same stability criteria as in Experiment 3.
Results and Discussion
The number of sessions needed to meet the stability criteria ranged from 20 to 31 (median = 22 sessions). All data analyses were based on the results from the six half-session blocks that satisfied the stability criteria in each condition. Response rates on the ratio schedules were rapid and similar on the two keys. In the ratio-present conditions, the mean response rates were 3.3 responses/s on the standard key and 3.5 responses/s on the adjusting key. In the ratio-absent conditions, the mean response rates were 3.1 responses/s on the standard key and 3.3 responses/s on the adjusting key.
For each subject and each condition, the mean adjusting ratio from the six half-session blocks that satisfied the stability criteria was used as a measure of the indifference point. Figure 6 shows the mean adjusting ratio for each pigeon in the four conditions. The mean adjusting ratio in the ratio-absent conditions (black bars, M = 18.5) was slightly lower than in the ratio-present conditions (white bars, M = 22.2). A repeated-measures ANOVA found that this difference fell just short of statistical significance, F(1, 3) = 9.57, p = .054. In two ways, these results differed from those obtained with rats in Experiment 3. First, the pigeons showed no significant change in preference when the ratio requirement was omitted on no-food trials, whereas the rats showed a substantially lower preference for the standard alternative under these conditions. Second, the pigeons' mean indifference points (22.2 and 18.5 responses in the ratio-present and ratio-absent conditions, respectively) were much smaller than those of the rats in Experiment 3 (51.7 and 79.5 responses). Taken at face value (i.e., ignoring any differences that could be due to differences in the procedures for the rats and pigeons), these results suggest that the pigeons had a stronger preference for the probabilistic reinforcer (in both ratio-present and ratio-absent conditions) than did the rats in the previous experiment.
Figure 6. Mean adjusting ratios (and standard deviations) are shown for each pigeon in Experiment 4, which compared ratio-present conditions (white bars) and ratio-absent conditions (black bars).
Although these results from the pigeons are not as surprising as those from the rats in Experiment 3, they are still not what would be expected if we extrapolate from previous experiments with pigeons that compared red-present and red-absent procedures (Mazur, 1989, 1991). In those experiments, the standard alternative was a 5-s delay followed by food on 20% of the trials, and the differences between the red-absent and red-present conditions were large and unmistakable. For example, in Experiment 3 of Mazur (1989), the mean indifference points were about 17 s in the red-present conditions and 7 s in the red-absent conditions, implying a much stronger preference for the standard alternative when the red houselights were omitted on no-food trials. Assuming that varying the presence and absence of a ratio requirement would have at least as much effect on the pigeons' choices as varying the presence and absence of red houselights, the difference between conditions in the present experiment should have been substantial. However, there was only a small and inconsistent difference between the ratio-present and ratio-absent conditions. Extrapolating from the previous findings with red-present and red-absent conditions was not successful, but it is not clear why.
General Discussion
These experiments on choice with delayed and probabilistic reinforcers found some similarities between rats and pigeons, some differences between the two species, and some surprising results that will require additional research to interpret. The results can be divided into four areas: (a) the overall effects of decreasing probability of reinforcement on choice, (b) the effects of delay-interval stimuli, (c) the effects of ITI duration, and (d) the effects of ratio requirements.
Figure 1 showed that as the probability of reinforcement for the standard alternative decreased from 100% to 20%, the rats' indifference points increased dramatically, signifying a decreasing preference for the standard alternative. The curvilinear patterns in this figure resemble those found with pigeons (e.g., Mazur, 1989, 1991), and in fact the indifference points with 20% reinforcement (M = 22.3 s) are not very different from those obtained with pigeons under similar conditions (M = 19.9 s in Experiment 1 of Mazur, 1989). Because of the ITIs that occurred after every trial, the average time between a choice response and the delivery of food was much greater for the standard alternative than for the adjusting alternative. Therefore, the choices of both the rats and pigeons can be described as risk-prone—at the indifference point they selected the probabilistic reinforcer on about half of the trials despite its much longer average time to reinforcement.
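The "much longer average time to reinforcement" for the probabilistic alternative can be made concrete with a geometric-waiting-time calculation. This is an illustrative sketch in the spirit of the Rachlin, Logue, Gibbon, and Frankel (1986) analysis, not the paper's Equation 1, and it assumes the animal keeps choosing the probabilistic option until food arrives; the 45-s ITI is borrowed from Experiment 2 for the example:

```python
def mean_time_to_food(delay_s, p_food, iti_s):
    """Mean time from choosing the probabilistic option to food.

    The number of attempts needed is geometric with mean 1 / p_food,
    so there are (1 - p_food) / p_food failed attempts on average.
    Each failure costs delay + ITI before the next chance; the
    final, successful attempt costs only the delay.
    """
    mean_failures = (1.0 - p_food) / p_food
    return delay_s + mean_failures * (delay_s + iti_s)
```

With a 5-s delay, food on 20% of trials, and a 45-s ITI, the mean time to food is 5 + 4 × 50 = 205 s, far longer than the roughly 22-s certain delays the rats accepted at indifference.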
Although the overall effects of a decreasing probability of reinforcement were quite similar for rats and pigeons, the effects of delay-interval stimuli and ITI durations were not. In Experiment 1, the rats' indifference points were not systematically different in the light-present and light-absent conditions, whereas previous studies with pigeons found much shorter indifference points in light-absent conditions (Mazur, 1989, 1991). To make the difference between the light-present and light-absent conditions more salient, Experiment 2 used FI schedules instead of the simple delays used in Experiment 1. Once again, there were no systematic differences between the light-present and light-absent conditions. However, the rats' indifference points were slightly longer with longer ITIs, whereas previous results from pigeons showed no effect of ITI (Mazur, 1989, Experiments 1 and 2).
There were clear differences in the responses of rats and pigeons under these conditions. Nevertheless, the same mathematical framework (Equation 1) can describe the results from both species, provided that different assumptions about the measurement of delay (Di in Equation 1) are used for the two species. My previous studies with pigeons found that Equation 1 could give a good description of the results if Di included only the time spent in the presence of the stimuli associated with the standard and adjusting alternatives (red and green keylights and houselights), and excluded the durations of the ITIs, during which white houselights were always on. This approach to measuring Di was dictated by the results, which showed clearly that only time spent in the presence of the red and green stimulus lights affected the pigeons' choices. I suggested that these stimuli could be considered conditioned reinforcers because they precede and predict the arrival of food and that Equation 1 provides a way of calculating the strength or value of these conditioned reinforcers.
In contrast to these findings, the data from Experiments 1 and 2 suggested that for rats, Di should reflect all of the time between a choice response and the delivery of food, including the ITI. Table 2 shows that calculations based on this assumption provided a fairly good account of the group results, whereas calculations that excluded the ITI did not.
Perhaps there is an inherent difference in how these two species respond to choice situations involving delayed and probabilistic reinforcers. Pigeons may be insensitive to the time between trials when choosing between certain and uncertain reinforcers, whereas rats may treat the time between trials as part of the total delay to food. If so, the choices of rats could be considered more rational because, as Rachlin et al. (1986) explained, time between trials affects the average delay to food when dealing with probabilistic reinforcers. Another possibility, however, is that these different results occurred because of differences in the procedures or stimuli that were used with the two species. Experiment 2 appeared to rule out the simple possibility that the rats did not attend to the delay-interval stimuli. However, other procedural differences could still account for the different results.
The results from the ratio-present and ratio-absent conditions in Experiments 3 and 4 also showed a clear difference between rats and pigeons, but they raised some unanswered questions as well. The difference was that the pigeons showed a small increase in preference (not statistically significant) for the standard alternative when the ratio schedule was omitted on no-food trials, whereas the rats showed a large and statistically significant decrease in preference. At least in one respect, these results were consistent with those of the previous studies: the direction of the difference between rats and pigeons was the same as in the experiments on light-present and light-absent conditions. That is, the pigeons showed greater preference for the light-absent conditions than did the rats, and the pigeons also showed a greater preference for the ratio-absent conditions than did the rats. The unanswered questions are why the absence of the ratio requirement on no-food trials had so little effect for the pigeons, and why its absence actually produced a decrease in preference for the rats.
The results from the rats are particularly surprising because they showed an increase in preference when an FR 8 schedule had to be completed every trial compared to when the FR 8 schedule had to be completed on only 33% of the trials. One possible explanation is that receiving a stimulus signaling the absence of food (i.e., the white houselights) is aversive, and this signal came immediately after a choice response in the ratio-absent conditions, but it was delayed in the ratio-present conditions until after the rats completed the FR 8 requirement. Previous studies have shown that subjects will avoid stimuli that signal the absence of reinforcement (e.g., Fantino & Case, 1983; Mulvaney, Dinsmoor, Jwaideh, & Hughes, 1974). A problem with this explanation, however, is that the white houselights also came on immediately on no-food trials in the light-absent conditions of Experiments 1 and 2, but the rats did not show a preference for the light-present condition—they merely showed no difference in preference between the light-present and light-absent conditions. There was obviously something different about the removal of a ratio schedule and the removal of a delay-interval light, but it will take additional research to determine the reasons for this difference.
The different behavior patterns found with rats and pigeons could represent an inherent difference in how these two species react to probabilistic reinforcement, or they could be due to the inevitable procedural differences that occur whenever research is conducted with different species. If the differences between rats and pigeons were due to procedural differences, it should be possible to modify the choice behavior of either or both species with suitable changes in the procedures. For instance, there should be a way to design a procedure in which pigeons' choices are sensitive to the duration of the ITI, or one in which rats' choices are sensitive to the presence or absence of delay-interval stimuli. Further research on this matter would be useful, because it could uncover the reasons for these apparent species differences, and because it could identify variables that modulate an animal's sensitivity to the effects of delay and probabilistic reinforcement.
Acknowledgments
This research was supported by Grant MH 38357 from the National Institute of Mental Health. I thank Michael Lejeune, Joseph Sastre, and Joel Yudt for their help in various phases of the research.
REFERENCES
- Battalio R.C, Kagel J.H, McDonald D.N. Animals' choices over uncertain outcomes. American Economic Review. 1985;75:597–613.
- Cardinal R.N, Daw N, Robbins T.W, Everitt B.J. Local analysis of behaviour in the adjusting-delay task for assessing choice of delayed reinforcement. Neural Networks. 2002;15:617–634. doi: 10.1016/s0893-6080(02)00053-9.
- Fantino E. Conditioned reinforcement: Choice and information. In: Honig W.K, Staddon J.E.R, editors. Handbook of operant behavior. Englewood Cliffs, NJ: Prentice-Hall; 1977. pp. 313–339.
- Fantino E, Case D.A. Human observing: Maintained by stimuli correlated with reinforcement but not extinction. Journal of the Experimental Analysis of Behavior. 1983;40:193–210. doi: 10.1901/jeab.1983.40-193.
- Logan F.A. Decision making by rats: Uncertain outcome choices. Journal of Comparative and Physiological Psychology. 1965;59:246–251. doi: 10.1037/h0021850.
- Mazur J.E. Tests of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes. 1984;10:426–436.
- Mazur J.E. Choice between single and multiple delayed reinforcers. Journal of the Experimental Analysis of Behavior. 1986;46:67–77. doi: 10.1901/jeab.1986.46-67.
- Mazur J.E. An adjusting procedure for studying delayed reinforcement. In: Commons M.L, Mazur J.E, Nevin J.A, Rachlin H, editors. Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value. Hillsdale, NJ: Erlbaum; 1987. pp. 55–73.
- Mazur J.E. Choice between small certain and large uncertain reinforcers. Animal Learning & Behavior. 1988;16:199–205.
- Mazur J.E. Theories of probabilistic reinforcement. Journal of the Experimental Analysis of Behavior. 1989;51:87–99. doi: 10.1901/jeab.1989.51-87.
- Mazur J.E. Choice with probabilistic reinforcement: Effects of delay and conditioned reinforcers. Journal of the Experimental Analysis of Behavior. 1991;55:63–77. doi: 10.1901/jeab.1991.55-63.
- Mazur J.E. Conditioned reinforcement and choice with delayed and uncertain primary reinforcers. Journal of the Experimental Analysis of Behavior. 1995;63:139–150. doi: 10.1901/jeab.1995.63-139.
- Mazur J.E, Romano A. Choice with delayed and probabilistic reinforcers: Effects of variability, time between trials, and conditioned reinforcers. Journal of the Experimental Analysis of Behavior. 1992;58:513–525. doi: 10.1901/jeab.1992.58-513.
- Mazur J.E, Stellar J.R, Waraczynski M. Self-control choice with electrical stimulation of the brain as a reinforcer. Behavioural Processes. 1987;15:143–153. doi: 10.1016/0376-6357(87)90003-9.
- Mulvaney D.E, Dinsmoor J.A, Jwaideh A.R, Hughes L.H. Punishment of observing by the negative discriminative stimulus. Journal of the Experimental Analysis of Behavior. 1974;21:37–44. doi: 10.1901/jeab.1974.21-37.
- Rachlin H, Logue A.W, Gibbon J, Frankel M. Cognition and behavior in studies of choice. Psychological Review. 1986;93:33–45.
- Shull R.L, Spear D.J. Detention time after reinforcement: Effects due to delay of reinforcement? In: Commons M.L, Mazur J.E, Nevin J.A, Rachlin H, editors. Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value. Hillsdale, NJ: Erlbaum; 1987. pp. 187–204.
- Vaughan W., Jr. Choice: A local analysis. Journal of the Experimental Analysis of Behavior. 1985;43:383–405. doi: 10.1901/jeab.1985.43-383.
- Waddington K.D, Allen T, Heinrich B. Floral preferences of bumblebees (Bombus edwardsii) in relation to intermittent versus continuous rewards. Animal Behaviour. 1981;29:779–784.