Abstract
Six pigeons were trained on a procedure in which seven components arranged different food-delivery ratios on concurrent variable-interval schedules each session. The components were unsignaled, lasted for 10 food deliveries, and occurred in random order with a 60-s blackout between components. The schedules were arranged using a switching-key procedure in which two responses on a center key changed the schedules and associated stimuli on two side keys. In Experiment 1, over five conditions, an increasing proportion of food deliveries accompanied by a magazine light was replaced with the presentation of the magazine light only. Local analyses of preference showed preference pulses toward the alternative that had just produced either a food-plus-magazine-light or magazine-light-only presentation, but pulses after food deliveries were always greater than those after magazine lights. Increasing proportions of magazine lights did not change the size of preference pulses after food or magazine-light presentations. Experiment 2 investigated the effects of correlations between food ratios and magazine-light ratios: In Condition 6, magazine-light ratios in components were inversely correlated (−1.0) with food ratios, and in Condition 7, magazine-light ratios were uncorrelated with food ratios. In Conditions 8 and 9, pecks also produced occasional 2.5-s flashes of a green keylight. In Condition 8, food and magazine-light ratios were correlated 1.0 whereas food and green-key ratios were correlated −1.0. In Condition 9, food and green-key ratios were correlated 1.0 whereas food and magazine-light ratios were correlated −1.0. Preference pulses toward alternatives after magazine lights and green keys depended on the correlation between these event ratios and the food ratios: If the ratios were correlated +1.0, positive preference pulses resulted; if the correlation was −1.0, preference pulses were negative. 
These results suggest that the Law of Effect has more to do with events signaling consequences than with strengthening responses.
Keywords: choice, concurrent schedules, preference pulses, reinforcement, conditional reinforcement, pecking, pigeons
According to the usual phrasing of the law of effect, reinforcers increase the probability of responses that they follow (Skinner, 1938). This statement is generally accepted in the experimental and applied analyses of behavior as an accurate description of the process of reinforcement. It also is generally accepted that if a previously neutral stimulus is paired with a primary reinforcer (the “pairing hypothesis”), like food for a hungry animal, then the previously neutral stimulus becomes a conditional reinforcer, and it, too, will increase the probability of behavior that it follows—so long as the pairing of the conditional stimulus and the reinforcer is maintained. However, assuming that the stimulus–reinforcer pairing is an example of respondent conditioning, the simple pairing hypothesis requires modification because, as Rescorla (1967, 1968, 1972) showed, presentation of the unconditional stimulus (US) in the absence of the conditional stimulus (CS) changes the quantitative function of the CS—the CS needs to be differentially predictive of the US for it to act as a CS. Additionally, the pairing hypothesis is weakened by the finding that the brief stimulus that maintains considerable amounts of behavior in second-order schedules does not have to be paired with the reinforcer (Stubbs, 1971; Stubbs & Cohen, 1972). Alternatives to the pairing hypothesis that have found considerable support include delay-reduction theory (DRT; Fantino, 1969), in which a stimulus is said to become a conditional reinforcer when it signals a decrease in the expected time to a reinforcer; and the information theory of conditional reinforcement (Bloomfield, 1972).
Conditional reinforcement has been investigated in a number of different ways: in chained schedules, in concurrent-chained schedules, in second-order schedules, and in extinction. In the present experiment, we investigated conditional reinforcement in concurrent variable-interval (VI) schedules in which the food-delivery ratios changed frequently within sessions (Davison & Baum, 2000). This procedure arranged a random sequence of seven different food ratios in seven components within each session with a 60-s blackout between components. Each component lasted for 10 food deliveries. In this arrangement, food deliveries produce a period of heightened preference, lasting 20–30 responses and termed a “preference pulse,” for the alternative that just produced the food. Arguably, this is a prototypical reinforcement effect, though other reinforcement effects have been documented, such as increasingly high initial (and later) preferences as more food deliveries occur in sequence from the same alternative (“continuations”). The focus of the present experiment was to ask whether preference pulses similar to those following food occurred following putative conditional reinforcers. If stimuli paired with food are conditional reinforcers, we might expect to find preference pulses following such paired stimuli, though weaker than those following food. Additionally, decreasing the frequency of pairing with food might be expected to decrease the preference pulses following the conditional reinforcers (Baum & Davison, 2004). Thus, in Experiment 1, we arranged a baseline procedure as used by Davison and Baum (2000) and then, across conditions, increased the number of conditional reinforcers (magazine-light deliveries) relative to the number of food deliveries.
Experiment 1
Method
Subjects
Six show-homer pigeons numbered 91 to 96 were maintained at 85% ± 15 g of their free-feeding body weights. They had access to grit and water at all times.
Apparatus
The pigeons were housed individually in 375-mm high by 370-mm deep by 370-mm wide cages, and these cages also acted as the experimental chambers. On one wall of the cage were three 20-mm diameter plastic pecking keys set 100 mm apart center to center and 220 mm from a wooden perch situated 100 mm from the wall and 20 mm from the floor. Each key could be transilluminated by yellow, green, or red LEDs, and responses to illuminated keys exceeding about 0.1 N were counted as effective responses. Beneath the center key, and 60 mm above the perch, was a 40-mm by 40-mm magazine aperture. During food presentation, the keylights were extinguished, the aperture was illuminated, and the hopper, containing wheat, was raised for 2.5 s. The pigeons could see and hear pigeons in other experiments, but no personnel entered the experimental room while the experiments were running.
Procedure
The room lights turned on at midnight, and experimentation started at 12:30 a.m. The room lights were extinguished at 4 p.m. Sessions were conducted in the pigeons' numerical order. The sessions started with the lighting of keylights and ended in blackout after all seven components had been completed or after fixed times that were at least 30 min longer than the calculated session time.
As the pigeons were naïve at the start of the experiment, they were deprived to 85% of their free-feeding body weights and then autoshaped to peck all three keys and key colors over a period of 2 weeks. When the pigeons were pecking the keys reliably, variable-interval (VI) schedules were arranged on all keys and colors, and the mean intervals of these schedules were increased until all were VI 60 s. Then the experiment began.
Sessions commenced with either the left key or the right key lit yellow (randomly selected with p = .5), and the center key lit red. The component in effect was randomly chosen without replacement from the set of seven components. The procedure was a switching-key arrangement, with the center red key as the switching key. Responding on the two yellow side keys intermittently produced food (2.5 s access to wheat with the magazine light lit) and, in some conditions, 2.5 s magazine light without food. Both types of contingent event were arranged via concurrent VI schedules, with food ratios determined according to which component was in effect. When the red center key was pecked, the side key on which the animal had been responding was extinguished, and a further center-key peck turned on the other side key, and turned off the center key and made it inoperative. A peck on the newly presented side key turned the center-key light on again, and switches again became available. Technically, then, the procedure used a changeover ratio (Stubbs, Pliskoff, & Reid, 1977) of two responses. Responses to the changeover key were excluded from all analyses. The base VI schedule was arranged by interrogating a probability gate, set at .037, every second (that is, VI 27 s), and when this schedule arranged an event, it was allocated to the Left and Right keys according to a probability that changed across components. Thus, the VI schedules were dependently arranged—when one schedule set up an event (food plus magazine light, or magazine light, or green keylight in Experiment 2), timing of both schedules stopped until that event had been produced. Schedule timing continued during changing over. The experimental contingencies were controlled by MED-PC IV programs arranged on a remote PC-compatible computer, which recorded the time of every experimental and behavioral event with a resolution of 0.01 s.
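The probability-gate arrangement described above can be sketched in Python as follows. This is a minimal illustration, not the MED-PC IV program used in the experiment: the function name, the seed, and the simplifying assumption that every arranged event is collected as soon as it is set up (so the pause in schedule timing while an event waits is not modelled) are all hypothetical.

```python
import random

def simulate_dependent_vi(p_left, n_events, gate_p=0.037, seed=0):
    """Return (time_s, side) tuples for n_events arranged events.

    Once per second a probability gate (gate_p) is interrogated. When it
    fires, the single arranged event is allocated to the left key with
    probability p_left, otherwise to the right key.
    Assumption: the event is produced immediately, so timing never pauses.
    """
    rng = random.Random(seed)
    events = []
    t = 0
    while len(events) < n_events:
        t += 1  # one gate interrogation per second
        if rng.random() < gate_p:
            side = "L" if rng.random() < p_left else "R"
            events.append((t, side))
    return events

# Example: the 27:1 component, where p(left) = 27/28.
events = simulate_dependent_vi(p_left=27 / 28, n_events=10_000)
mean_interval = events[-1][0] / len(events)
left_share = sum(side == "L" for _, side in events) / len(events)
print(f"mean interval = {mean_interval:.1f} s, left share = {left_share:.3f}")
```

With a gate probability of .037 the mean interval between arranged events should come out close to 1/.037, or about 27 s, which is the sense in which the text describes the base schedule as VI 27 s.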
Components, which were unsignaled, lasted for 10 food deliveries. Components were separated by 60-s blackouts. The component food ratios were 1∶27, 1∶9, 1∶3, 1∶1, 3∶1, 9∶1 and 27∶1 as in Davison & Baum (2000).
The changeover ratio (COR) was increased from zero to two responses during the first 200 sessions (not reported here), at which point performance was similar to that reported by Davison & Baum (2000) using a changeover delay rather than a changeover ratio. The benefit of using a COR on a separate key, rather than a changeover delay, is that the changeover responses themselves are not counted as part of the measure of preference.
The sequence of experimental conditions for Experiment 1 is shown in Table 1. Each condition lasted 100 sessions, and successive conditions had increasing frequencies of presentation of magazine lights from zero (baseline Condition 1) to eight on average per component (Condition 5). Magazine lights were scheduled in the following way: With a set probability (zero to .8), the control program (written in MED-PC IV) replaced a scheduled food delivery (food plus magazine light) with just a magazine light, but components still lasted for 10 food presentations. Thus, as magazine-light-only probability increased, the food delivery rate fell, but the overall frequency of events (food plus magazine light or magazine light alone) remained constant at 1 per 27 s. This way of arranging magazine lights produced, across components, a 1.0 correlation between magazine-light frequency from an alternative and food-delivery frequency from that alternative. Thus, in the 27∶1 food-ratio component, there was also a 27∶1 magazine-light ratio; in a 1∶27 food-ratio component, the magazine-light ratio was 1∶27. The scheduling and COR applied to magazine-light presentations in the same way as it applied to food deliveries.
Table 1. Sequence of conditions arranged in Experiments 1 and 2. Components changed after 10 food deliveries: Thus, when eight magazine-light presentations per component were arranged, components changed after an average of 18 events (food or magazine light). The value of r in Experiment 2 is the correlation of magazine-light and green-keylight presentation ratios with food ratios.
Condition | Arranged number of magazine-light presentations per component |
Experiment 1 | |
1 | 0 |
2 | 2 |
3 | 4 |
4 | 6 |
5 | 8 |
Experiment 2 | Correlation (r) between magazine light ratios, and green keylight ratios, with food ratios |
6 | Magazine light r = −1.0 |
7 | Magazine light r = 0 |
8 | Magazine light r = +1.0; Green keylights r = −1.0 |
9 | Magazine light r = −1.0; Green keylights r = +1.0 |
Results
The data used in all analyses were the times of all experimental events over the last 85 sessions of each condition. Group average data were obtained by taking the mean of the appropriate raw data across all individuals. This is not the preferred method of averaging (averaging, for example, the log response ratios would be preferable), but for some detailed analyses no responses were emitted on an alternative by a pigeon, giving an infinite log response ratio and making an average across pigeons impossible. Taking averages in different ways for different analyses would be confusing, and would make comparisons difficult.
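The averaging problem described above can be illustrated with a short Python sketch. The response counts are hypothetical, and the two function names are introduced here only for illustration; the point is that pooling raw counts stays defined when one pigeon emits no responses on an alternative, whereas the mean of individual log ratios does not.

```python
import math

def pooled_log_ratio(left_counts, right_counts):
    """Log10 ratio computed from responses summed across subjects."""
    return math.log10(sum(left_counts) / sum(right_counts))

def mean_of_log_ratios(left_counts, right_counts):
    """Mean of per-subject log10 ratios; fails if any right count is zero."""
    logs = [math.log10(l / r) for l, r in zip(left_counts, right_counts)]
    return sum(logs) / len(logs)

# Hypothetical counts for three pigeons; the third made no right-key responses.
left = [40, 25, 12]
right = [10, 5, 0]

pooled = pooled_log_ratio(left, right)  # defined: log10(77 / 15)
try:
    per_bird = mean_of_log_ratios(left, right)
except ZeroDivisionError:
    per_bird = None  # an infinite individual ratio makes the mean undefined
print(pooled, per_bird)
```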
Our first concern was that the COR procedure used here would produce results similar to those from the changeover-delay procedure that we used previously (Davison & Baum, 2000, 2002). We compared the data from Condition 1 to our previous data. Figure 1 shows two analyses of the data from Condition 1 using group data; Appendix Figures A1 and A2 show the same analyses for individual pigeons. The upper curve in Figure 1 shows sensitivity to food ratio between successive food deliveries in the components. Sensitivity estimates were obtained from the generalized matching law (Baum, 1974; Davison & McCarthy, 1988):
\[ \log\frac{B_{Li}}{B_{Ri}} = a \log\frac{R_{Li}}{R_{Ri}} + \log c \tag{1} \]
where B and R are, respectively, responses and food deliveries obtained on the left (L) and right (R) alternatives. The subscript i (i = 0 to 9) denotes food-delivery number within components. The parameter a is sensitivity to the food ratio, a measure of the change in log behavior ratio with a unit change in log food ratio, and log c is bias, a constant proportional preference for an alternative across changes in food ratios. Thus, Equation 1 was fitted, using linear regression, to (for example) log response ratio before the first food delivery and from one delivery to the next in components (i = 0 to 9) across the seven obtained component food ratios.
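The regression step can be sketched as below, assuming the usual form of the generalized matching law, log(B_L/B_R) = a log(R_L/R_R) + log c. The data are synthetic (generated with a = 0.6 and log c = −0.2, and using the arranged rather than obtained food ratios), and the function name is hypothetical; the sketch shows only how sensitivity and bias fall out of an ordinary least-squares line for a single value of i.

```python
import math

def fit_generalized_matching(log_food_ratios, log_response_ratios):
    """Least-squares fit of y = a*x + log_c; returns (a, log_c)."""
    n = len(log_food_ratios)
    mean_x = sum(log_food_ratios) / n
    mean_y = sum(log_response_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in log_food_ratios)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log_food_ratios, log_response_ratios))
    a = sxy / sxx                 # sensitivity to the food ratio
    log_c = mean_y - a * mean_x   # bias
    return a, log_c

# Seven component food ratios (left:right), as arranged in the procedure.
food_ratios = [1 / 27, 1 / 9, 1 / 3, 1, 3, 9, 27]
x = [math.log10(r) for r in food_ratios]
# Synthetic log response ratios, not data from the experiment.
y = [0.6 * xi - 0.2 for xi in x]

a, log_c = fit_generalized_matching(x, y)
print(f"sensitivity a = {a:.2f}, bias log c = {log_c:.2f}")
```

Repeating the fit for each i (before the first food delivery, after the first, and so on) would give one sensitivity and one bias estimate per food-delivery number.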
Sensitivity to food ratio increased from being slightly negative before any food had been obtained in a component to around 0.6 after the ninth food delivery. These results (Figures 1 and A1) are almost identical to those reported by Davison & Baum (2002) for a 60-s blackout between components and a changeover delay of 2 s. As Figure A1 shows, sensitivity increased similarly across successive food deliveries for the 6 pigeons, but sensitivity level varied (cf. Pigeons 91 and 94). The lower lines in Figures 1 and A1 show bias (log c in Equation 1). Bias showed no systematic variation and was idiosyncratic to the individuals, with Pigeon 94 showing a strong bias toward the right key. Sometimes bias increased (e.g., Pigeon 92), and sometimes it decreased (Pigeon 95), with successive food deliveries, and there was no consistent trend. The mean negative bias in Figure 1 was caused mainly by Pigeon 94's strong right-key bias.
Figures 2 (group data) and A2 (individual pigeon data) show log response ratios following food in Condition 1. The upper panel of Figure 2 shows log response ratios following left- and right-key food separately, whereas the lower panel shows these data collapsed across the alternatives as response ratios of the just-productive alternative (P) to the not-just-productive alternative (N). In this latter analysis, the response ratios are free of the overall right-key bias shown in the upper panel. The sample size necessarily decreases with increasing distance from food, increasing variability in log response ratios. Again, these results were similar to those previously reported (Davison & Baum, 2002, 2003). Thus, the changeover delay and COR procedures produce very similar results. Log response ratios in Figure 2 (upper panel) were not symmetrical around 0 (indifference), indicating a small average bias (the horizontal line) toward responding on the right key (mostly contributed by Pigeon 94). Comparing the lower panel of Figure 2 with individual-pigeon data in Figure A2 demonstrates that mean results were representative of individual results also at this level of analysis. Because of the similarity in both of the above comparisons of individual and mean data, we concentrate on the group data in the remainder of the paper.
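One way to compute the collapsed P/N analysis is sketched below in Python. The event-stream format, the function name, and the choice to omit ordinal positions where either count is zero are assumptions made for illustration; the original analysis programs are not described at this level of detail.

```python
import math
from collections import defaultdict

def preference_pulse(stream, max_n=40):
    """Log10(P/N) by ordinal response position after each event.

    stream: ordered list of ("event", side) or ("resp", side) tuples,
    with side "L" or "R". P counts responses to the key that produced the
    most recent event; N counts responses to the other key.
    """
    p_counts = defaultdict(int)
    n_counts = defaultdict(int)
    last_side = None
    ordinal = 0
    for kind, side in stream:
        if kind == "event":
            last_side, ordinal = side, 0   # restart counting after each event
        elif last_side is not None and ordinal < max_n:
            ordinal += 1
            if side == last_side:
                p_counts[ordinal] += 1
            else:
                n_counts[ordinal] += 1
    # Positions with a zero count would give an infinite ratio and are skipped.
    return {k: math.log10(p_counts[k] / n_counts[k])
            for k in sorted(set(p_counts) | set(n_counts))
            if p_counts[k] and n_counts[k]}

# Tiny hypothetical stream: three events, three responses after each.
stream = [("event", "L"), ("resp", "L"), ("resp", "L"), ("resp", "R"),
          ("event", "R"), ("resp", "R"), ("resp", "L"), ("resp", "L"),
          ("event", "L"), ("resp", "R"), ("resp", "L"), ("resp", "R")]
pulse = preference_pulse(stream)
print(pulse)
```

Because fewer events are followed by long runs of responses, the counts at later ordinal positions shrink, which is the source of the increasing variability noted in the text.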
Figure 3 shows the changes in sensitivity and bias according to Equation 1 fitted to the seven food ratios arranged across components before each of the first 10 events in components. The separate panels show sensitivity and bias as the number of magazine-light-only presentations increased from 0 to an average of 8 per component (Conditions 1 to 5). The lower-right panel shows results from all five conditions and the mean. In all conditions, sensitivity increased in a negatively accelerated pattern from slightly less than 0 to between 0.67 and 0.73 after the ninth event. Bias values always were negative (around −0.2), and although trends across successive food deliveries occurred in some conditions, their direction was not systematic. The lower-right graph shows no systematic changes across conditions. Thus, in this analysis, there were no clear effects of increasing the number of magazine-light presentations or, indeed, of decreasing the overall food-presentation rate as more magazine-light-only presentations replaced food-plus-magazine-light presentations.
Log response ratios of the just-productive alternative (P) to the not-just-productive alternative (N) as a function of successive responses following food-plus-magazine-light presentations for Conditions 1 to 5 are shown in the upper portion of Figure 4. The ratios for magazine-light-only presentations are shown in the lower portion. As previously mentioned, variance increased with successive responses because sample size decreased. This was especially true for magazine-light-only presentations in conditions in which these were relatively infrequent. Preference pulses occurred after both food plus magazine light and magazine-light-only presentations, but the latter were always less extreme than the former at every ordinal response. Both types of preference pulses left longer-term changes in choice (that is, log response ratios did not fall to zero within 40 responses after events), but those after magazine light only were less extreme than those after food. Additionally, both types of pulses showed an asymmetry between left and right events, again implying an overall bias of about 0.2 toward the right alternative; this asymmetry is not visible in Figure 4, which collapses the data across alternatives. An analysis of changes in log response ratios over the first 40 responses across conditions after both food plus magazine light (N = 40, k = 4) and magazine light only (N = 40, k = 3) showed no significant trends as the frequency of magazine-light-only presentations relative to food plus magazine light presentations was increased (nonparametric trend test, z = 1.29 and −1.46 respectively, p > .05).
Discussion
When magazine lights and food were paired, magazine-light-only presentations that were contingent on responding led to an immediate increase in the probability of whichever response they followed (Figure 4). Thus, the magazine light appeared to act as a conditional reinforcer. This result would be expected from the previous research on conditional reinforcement. But two aspects of the results from Experiment 1 were unexpected. First, the size of preference pulses after food-plus-magazine-light presentations was not systematically affected by an increasing dilution by magazine-light-only presentations and, concomitantly, by a decreasing rate of food delivery. Davison and Baum (2000) found that sensitivity to food ratio, in the same type of analysis depicted in Figure 3, increased with increasing overall food-delivery rates. Since the overall frequency of events—whether food plus magazine light or magazine-light-only presentations—remained invariant, the overall frequency of events may have constituted the main variable controlling preference.
The second unexpected finding was that preference pulses following magazine-light-only presentations remained invariant as their frequency increased (Figure 4). Although food delivery always implied magazine light, the association (predictability) between magazine light and food decreased as the number of magazine-light-only presentations increased. Thus, in Condition 5, only 56% of all magazine-light presentations were associated with food. Most theoretical approaches, particularly those based on respondent conditioning, would predict that the magazine light would become a less effective conditional reinforcer as its predictability of food decreased. Additionally, if we theorized that the effect of reinforcement might be conserved, a food-reinforcer effect that is shared with a conditional reinforcer ought to decrease as a result of such sharing; that is, we might expect that total reinforcer effects are constant, and that we cannot create unlimited conditional reinforcers by pairing them with “primary” reinforcers.
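The 56% figure follows from the arranged numbers in Table 1: each component contained 10 food deliveries (all with magazine light) plus, on average, the arranged number of magazine-light-only presentations. The short Python sketch below works through that arithmetic for Conditions 1 to 5; it uses the arranged averages, not obtained counts, and the variable names are illustrative only.

```python
FOOD_PER_COMPONENT = 10
# Arranged magazine-light-only presentations per component, Conditions 1 to 5.
LIGHT_ONLY = {1: 0, 2: 2, 3: 4, 4: 6, 5: 8}

# Proportion of all magazine-light presentations that accompanied food.
paired = {cond: FOOD_PER_COMPONENT / (FOOD_PER_COMPONENT + n)
          for cond, n in LIGHT_ONLY.items()}

for cond, share in paired.items():
    print(f"Condition {cond}: {share:.0%} of magazine lights paired with food")
```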
However, in another sense, the predictability of food by magazine-light presentation remained constant. Because of the procedure we used in Experiment 1, there was a correlation of 1.0 between the ratio of magazine-light presentations and the ratio of food deliveries in the components. Thus, rather than the individual occurrence of the magazine light, the relative rate of magazine lights predicted the location at which more food would be found—in fact, predicted as well as the relative rate of food itself. Since the alternative producing the first event in the component (food plus magazine light or magazine light only) was also usually the richer alternative, the magazine light might have been differentially paired with the higher food rate. In that way, the magazine light may have acted as a discriminative stimulus for the future location of food on the left and right keys. Although one might suppose that the magazine light came to act as a conditional reinforcer, too, such an explanation may prove superfluous. If a signal occurs during responding at an alternative, and the signal indicates that more food is available there, and behavior increases for a while following that signal, is this an example of conditional reinforcement? Or is it more parsimonious to see it as discriminative control? Suppose instead that the signal indicates that food has become available at another location; then the immediate increase in responding at the other location could hardly be taken as conditional reinforcement because the rate of the prior behavior would be decreased, and the rate of alternative behavior would be increased, by the response-produced stimulus. Such a finding would support the view that the magazine light acted only as a signal.
If, as we have suggested, further food (or higher rates of food) were equally predictable from magazine-light-only presentations or from food delivery itself, then why were preference pulses after food plus magazine light larger than those after magazine-light-only presentations? Is this difference most parsimoniously explained by the concept of conditional reinforcement? Perhaps not. Perhaps some stimuli are simply more effective discriminative stimuli. Perhaps food is a more salient stimulus than magazine light?
These considerations led us to conduct a further experiment in which we manipulated the degree to which contingent stimuli signaled the location of future food, and in which the stimuli were either unpaired with a higher food rate or not paired with food at all.
Experiment 2
In Experiment 1, across components, the ratio of food deliveries was perfectly correlated with the ratio of magazine-light-only presentations. We varied this correlation in Conditions 6 and 7 of Experiment 2. In Condition 6 (see Table 1), we arranged a correlation of −1.0 between food ratios and magazine-light ratios, and in Condition 7 we arranged a zero correlation. Then, in Conditions 8 and 9, we added a further response-contingent event, a green keylight. In Condition 8, the magazine-light ratio was correlated +1.0 with the food ratio, whereas the green keylight ratio was correlated −1.0 with the food ratio. In Condition 9, the reverse was arranged; green keylight ratio was correlated +1.0, and magazine-light ratio was correlated −1.0, with food ratio. This set of conditions allowed us to assess whether events produce preference pulses only if they predict further food at an alternative and to determine whether a stimulus that is never paired with food, but is predictive of food rate, will produce preference pulses.
Method
Subjects, Apparatus, and Procedure
The subjects and apparatus were the same as in Experiment 1, and the sequence of conditions conducted is shown in Table 1. In Condition 6, the negative correlation between magazine-light ratio and food ratio was arranged by scheduling equal rates of events in all seven components on a VI 27-s schedule and varying the probability that these events would be food plus magazine light presentations across components. Thus, when the food ratio was 27∶1, the magazine-light ratio was 1∶27, and so on. Components still ended after 10 food deliveries. This resulted in the same set of food ratios in components as in Experiment 1, but an inverse ratio of magazine-light-only presentations. In Condition 7 (zero correlation), in each component, a VI 27-s schedule arranged all events, and these were either food plus magazine light presentations or magazine-light-only presentations with a probability of .5, and were then further allocated to the two alternatives with a set of probabilities that produced the usual food ratios but a magazine-light ratio equal to 1.0. If the event was to be a magazine-light-only presentation, it was allocated to either alternative with a probability of .5. Thus, food plus magazine light and magazine-light presentations occurred on average at intervals of 54 s, and the ratio of magazine-light-only presentations was always about 1.0, even as the food ratio varied across components.
Conditions 8 and 9 were arranged in a similar way, except that events that were not food plus magazine light presentations were divided (p = .5) into two categories: magazine light or a 2.5-s green keylight presented on the lit side key. In Condition 8, magazine-light-only presentations were then allocated to alternatives with the same probabilities as food deliveries, whereas green keylight presentations were allocated with the complementary probabilities. Thus, for example, if the food ratio in a component was 1∶9, the ratio of magazine-light-only presentations was also 1∶9, whereas the ratio of green keylight presentations was 9∶1. In Condition 9, green keylight presentations were allocated to alternatives with the same probabilities as food, whereas magazine-light-only presentations were allocated with the complementary probabilities, reversing the correlations of Condition 8. In both Conditions 8 and 9, the scheduling and COR requirement applied to green keylight presentations as well as to food plus magazine light and magazine-light-only presentations.
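The allocation rules for Conditions 6 to 9 can be summarized in a short Python sketch. It models only the probability that each event type is assigned to the left key, given a component's left:right food ratio; the function names and the dictionary layout are hypothetical, and the probabilities that decide which type of event occurs are not modelled.

```python
def left_probabilities(food_ratio, condition):
    """P(allocated to left key) for each event type in Conditions 6 to 9.

    food_ratio is the left:right food ratio of the component (e.g. 9 for 9:1).
    """
    p_food = food_ratio / (1 + food_ratio)
    same, inverse = p_food, 1 - p_food
    by_condition = {
        6: {"magazine_light": inverse},                        # r = -1.0
        7: {"magazine_light": 0.5},                            # r = 0
        8: {"magazine_light": same, "green_keylight": inverse},
        9: {"magazine_light": inverse, "green_keylight": same},
    }
    return {"food": p_food, **by_condition[condition]}

def as_ratio(p_left):
    """Convert a left-key probability back to a left:right ratio."""
    return p_left / (1 - p_left)

# The 1:9 component in Condition 8, as in the example in the text.
probs = left_probabilities(1 / 9, condition=8)
for event, p in probs.items():
    print(f"{event}: left:right ratio = {as_ratio(p):.3f}")
```

Under these rules the magazine-light ratio tracks the food ratio in Condition 8 while the green-keylight ratio is its reciprocal, and Condition 9 swaps the two.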
All other procedural details remained the same as in Experiment 1.
Results and Discussion
In the analysis of Experiment 2, we focused on preference pulses following food, magazine-light, and green keylight presentations. The data used were again pooled across the last 85 sessions of each condition.
Figure 5 shows group preference pulses (just-productive key P to not-just-productive key N) after food plus magazine light and magazine-light-only presentations for Conditions 6 and 7, with those for Condition 5 (+1.0 correlation of ratios) included for comparison. As Figure 5 shows, the preference pulse following food delivery remained approximately constant across Conditions 5 to 7. However, the correlation between magazine-light ratios and food ratios showed considerable effects: When the correlation was +1.0 between magazine-light ratio and food ratio, preference pulses produced by magazine-light-only presentations were large but smaller than those following food, and the effects lasted more than 40 responses. But when the correlation was −1.0 between magazine-light ratio and food ratio, only the first response ratio after a magazine-light-only presentation was toward the alternative (P) that produced the magazine light. All subsequent response ratios up to Response 40 were toward the alternative (N) that had not produced the magazine light. Finally, when the correlation was zero, a small preference pulse followed magazine lights lasting two responses, following which response ratios approximated indifference (zero). Thus, the magazine-light-only preference pulses reflect the degree to which magazine-light-only presentations predicted subsequent food deliveries; they do not reflect the pairing of magazine lights with food (conditional reinforcement). Furthermore, the results of Condition 6, in which following a response by a food-paired stimulus (magazine light) led to less subsequent responding, contradict the notion of conditional reinforcement, which would predict more responding to the conditionally reinforced alternative. Rather, the magazine light, despite being paired with food, acted as a discriminative stimulus signaling the conditional probability of food delivery at an alternative.
The results from Condition 8 (food correlated +1.0 with magazine lights and −1.0 with green keylights) are shown in the upper panel of Figure 6. The results were consistent with those from Conditions 5 and 6 (Figure 5): Food plus magazine light presentations resulted in a large preference pulse and a large residual preference up to Response 40; magazine-light-only presentations produced a smaller preference pulse and residual preference. Negatively correlated green keylights, never paired with food, produced a small one- to two-response preference pulse, followed by a small, but generally consistent, preference for the other alternative.
The lower panel of Figure 6 shows the results from Condition 9, in which magazine-light ratios were correlated −1.0 with food ratios, and green-keylight ratios were correlated +1.0 with food ratios. Green keylights, which were unpaired with food, produced a small preference pulse to the alternative on which the green keylight had been presented, and this preference lasted up to 40 responses following the lights. Magazine lights produced a two-response preference pulse to the alternative (P) that had produced the magazine light, followed by a small but consistent preference for the other alternative (N) until about the 29th response, from which point preference favored neither alternative consistently.
We investigated whether there were any differences between the effects of magazine-light and green-keylight ratios when they were correlated +1.0 and −1.0 with food ratios. Using the data shown in Figure 6, when r = +1.0, log response ratios following magazine-light-only presentations (open circles, upper graph) were significantly greater (binomial test, N = 40, p < .01) than those following green keylights (filled squares, lower graph); similarly, when r = −1.0, log response ratios following magazine lights (open circles, lower graph) were significantly less negative than those following green keylights (filled squares, upper graph; p < .01). However, although the differences were statistically significant, they were small in magnitude: Averaged over the 40 responses, the median differences between the +1.0 correlated log response ratios and between the −1.0 correlated log response ratios in Conditions 8 versus 9 were 0.02 and 0.06. We conclude that the magazine lights and green keylights had roughly equivalent effects.
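The text does not spell out how the binomial test was computed. One plausible reading, sketched below in Python, is a two-sided sign test on the 40 ordinal response positions: count how many positions show a difference in one direction and ask how likely such a split would be under a .5 chance. The function name and the counts passed to it are hypothetical.

```python
from math import comb

def sign_test_p(n_positive, n):
    """Two-sided exact binomial p-value for n_positive of n under p = .5."""
    k = max(n_positive, n - n_positive)          # size of the larger group
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical splits over N = 40 ordinal response positions.
for k in (20, 26, 30):
    print(f"{k} of 40 positions in one direction: p = {sign_test_p(k, 40):.4f}")
```

A test of this kind is sensitive only to the direction of the differences, which is why a difference can be statistically reliable and still as small as the 0.02 and 0.06 log units reported.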
Figure 7 shows changes in log response ratio in Conditions 8 and 9 across the first nine successive events (food, magazine light, or green keylight) in a component and for the first 10 responses following the event. Preference pulses appear here as the vertical spacing of data points at each step of the x-axis. In both Conditions 8 and 9, log response ratios usually increased with successive food events, as expected from the general increase in sensitivity within components shown in Figures 1 and 3. This result demonstrates the increasing control by food as the component progresses. Control by the magazine light only or by the green keylight similarly increased within components when the magazine-light ratio or green-keylight ratio was positively correlated with the food ratio. Increasing control in the opposite direction appears as a decrease in log response ratio when the magazine-light ratio or green-keylight ratio was negatively correlated with food ratio. Thus, for all three events, control continued to increase across successive events, ruling out any overshadowing of the effects of magazine light or keylight by food.
The results from Experiment 2 therefore supported the notion that magazine lights paired with food had much the same function as green keylights that were unpaired with food, and that these functions could be described parsimoniously as discriminative, rather than as conditionally reinforcing. Since the magazine light was always paired with food, and the green keylights were produced by both alternatives and only related by their ratio to the food ratio in an extended time frame, any theory of conditional reinforcement would predict a large difference in function. Both events, when their ratios were correlated +1.0 with food ratios across components, produced similar preference pulses—they increased the responses that they followed. But both events, when their ratios correlated −1.0 with food ratios (but with magazine lights still paired with food), produced subsequent preferences to the alternative that had not just produced the event—that is, they increased the probability of responses that they had not followed. Thus, for both magazine light and green keylight, control increased as the component progressed, independent of whether the correlation between event ratio and food ratio was +1.0 or −1.0.
General Discussion
We propose that the pairing of a stimulus with food does not make the stimulus into a conditional reinforcer that increases the probability of any response that it follows. Rather, responses that follow the stimulus are increased only if the stimulus signals the subsequent availability of food for that response. Even when a stimulus is paired with food, if it signals a lower subsequent probability of food, then it decreases the probability of responses that it follows. The notion of conditional reinforcement fails to explain the present findings, whereas the notion of discriminative control explains them. The only apparent contradiction lies in the one- or two-response preference pulses that followed negatively correlated or zero-correlated events (Figures 5, 6, and 7). These pulses probably arose simply from our having imposed a changeover or travel cost between alternatives—with a COR or COD (Davison & Baum, 2000) procedure, staying at an alternative for one or two responses after an event is preferable to switching simply on the basis that it avoids the COR or COD. The cost of changing over, all other factors aside, probably produced the one- to two-response pulses that were approximately invariant regardless of the response-produced stimulus (magazine light versus green keylight) or the correlation with the food ratio (+1.0 versus −1.0). That this second-order effect occurred after the green keylight, which was never paired with food, indicates that it cannot be explained as conditional reinforcement.
The present results may be compared to similar results from research using second-order schedules in which brief response-produced stimuli maintain considerable amounts of behavior. However, much research shows that such brief stimuli have the same effect even without being paired with food and that the amount of behavior maintained differs little between stimuli that are paired or unpaired (Neuringer & Chung, 1967; Squires, Norborg, & Fantino, 1975; Stubbs, 1971; Stubbs & Cohen, 1972; see the review by Marr, 1979). Neuringer and Chung called the function of unpaired, but apparently reinforcing, stimuli “quasi-reinforcement.” These results are evidence against the pairing hypothesis because the brief stimuli are only ever paired with nonreinforcement. The present research leads us to reinterpret Neuringer and Chung's conclusion: The brief stimuli may not have reinforcing properties at all—“quasi” or otherwise—they simply have discriminative properties because they signal which response will likely produce food. In our experiment, the signals indicated on average whether the alternative that just produced the event will likely produce food as well (+1.0 correlation of ratios) or whether the other alternative will likely produce food (−1.0 correlation of ratios). In experiments including only one food-producing activity, signals correlated with food increase that activity in comparison with other activities such as resting and grooming.
Food delivery itself has discriminative properties. This was demonstrated in the 1950s (e.g., Bullock & Smith, 1953; Dufort, Guttman, & Kimble, 1954; Reid, 1958) and more recently by Krägeloh, Davison, and Elliffe (2005), who arranged various conditional probabilities of food delivery given a prior food delivery. When food delivery signaled the absence of food delivery, subsequent preference pulses were predominantly toward the alternative (not-just-productive) key. Thus, it is but a small step from seeing putative conditional reinforcer effects as discriminative effects to seeing putative primary reinforcer effects similarly as discriminative effects. In most procedures, food delivery signals more food deliveries for the same activity, and that activity repeats. In the procedure of Krägeloh et al., or any procedure that requires alternation, food delivery signals less food available, and behavior switches. The question is: Are the effects that we traditionally have termed reinforcing effects only discriminative effects?
Baum (1973) suggested that correlation in time rather than pairing provided a general framework for understanding the effects of consequences such as food and electric shock. This framework might be expanded in light of the present results. To be effective, Pavlovian or respondent procedures that pair a stimulus with food or shock also must pair the non-occurrence of food or shock with the non-occurrence of the stimulus. Thus, the differential predictiveness of the stimulus produces its effect. Predictive stimuli within procedures that correlate certain activities with the occurrence of food or shock are usually called discriminative stimuli, signals, or cues. If we think of food or shock as predictive of itself, we may also understand the effects of common procedures that establish correlations between food or shock and the further likelihood of food or shock. These procedures, by creating the predictiveness of these events, guide activity toward or away from them, setting up the conditions for observing the effects commonly called reinforcement and punishment. For example, in a two-alternative situation, behavior shifts strongly to an unpunished alternative when mild electric shocks follow responses at the other alternative (Azrin & Holz, 1966). When we see situations in which food or shock signals its own unavailability correlated with some alternative activity, and we see behavior switch to the likely source of food or to the activity that avoids the shock, we may generalize further. The most general principle, rather than a strengthening and weakening by consequences, may be that whatever events predict phylogenetically important (i.e., fitness-enhancing or fitness-reducing) events, such as food and pain, will guide behavior into activities that produce fitness-enhancing events and into activities that prevent the fitness-reducing events (see Baum, 2005, for a view of operant behavior in an evolutionary framework). 
This principle embraces our present results, the results of Krägeloh et al. (2005), and the phenomena of aversive control. A fuller discussion of the general principle awaits future research.
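The idea of differential predictiveness discussed above can be made concrete with a simple contingency index: the probability of food given the stimulus minus the probability of food given its absence. This is a minimal sketch using a standard measure, offered here only as an illustration; it is not a measure the present experiments computed, and the function name `delta_p` and the counts are hypothetical.

```python
def delta_p(
    food_with_stimulus: int,
    trials_with_stimulus: int,
    food_without_stimulus: int,
    trials_without_stimulus: int,
) -> float:
    """Contingency index: P(food | stimulus) - P(food | no stimulus).

    Positive values mean the stimulus predicts food, negative values
    mean it predicts the absence of food, and zero means the stimulus
    carries no information about food, however often the two co-occur.
    """
    if trials_with_stimulus <= 0 or trials_without_stimulus <= 0:
        raise ValueError("both trial counts must be positive")
    p_given_stimulus = food_with_stimulus / trials_with_stimulus
    p_given_absence = food_without_stimulus / trials_without_stimulus
    return p_given_stimulus - p_given_absence


# Hypothetical counts: the stimulus is paired with food on half of its
# presentations in both cases, but only the first is predictive.
print(delta_p(5, 10, 1, 10))  # food is more likely after the stimulus
print(delta_p(5, 10, 5, 10))  # food is equally likely either way
```

The second call shows why pairing alone may be insufficient on this view: the stimulus accompanies food equally often in both cases, yet it is differentially predictive only in the first.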
If reinforcement is defined only by its effect on behavior, then some traditional reinforcers may sometimes increase the behavior they follow and sometimes may not. Equally, the stimuli that occur at the same time as traditional reinforcers may act as conditional reinforcers or they may not. The present research, and that of Krägeloh et al. (2005), suggest that whether we call these processes “reinforcement” and “conditional reinforcement” depends on whether or not they signal a higher conditional probability of further reinforcers. On this view, and as is well known already, defining reinforcement as increasing the response that it follows is deeply problematic.
In Conditions 8 and 9, magazine lights that signaled that subsequent food was less likely produced slightly greater switching of preference than did green keylights that signaled the same lower likelihood of food. However, magazine lights that signaled either a higher or lower probability of future food resulted in an increment in positive preference pulses compared with green keylights that signaled the same probabilities. But these differences were small in comparison with the differences produced by positive versus negative correlation with food. This small difference might be an effect of pairing a stimulus with food.
One might ask what the present conception of reinforcement implies for behavior analysis. Since the signaling properties of positive or negative phylogenetically important events themselves, or of signals of these events, determine changes in behavior, the contingent presentation of putative reinforcers may have behavior-increasing effects (if the reinforcers signal more of the same) or behavior-decreasing effects (if they signal fewer of the same). Thus, in an economy in which only a small number of reinforcers are available, those reinforcers may actually decrease one activity and, at the same time, promote alternative activities. An example might be a foraging situation in which only a small fixed number of prey are available in a patch: If this fixed number can easily be discriminated, then animals will remove themselves immediately from the patch when that number has been obtained. If the number is larger, and cannot be discriminated accurately, then the animal will remove itself when some other stimulus condition occurs, such as after nonreinforcement for a period of time. In experimental sessions that end after a fixed number of food deliveries, the number of food deliveries obtained may signal the unavailability of food in the experimental setting, and perhaps the availability of food at another patch, such as postsession feeding in the home cage. With punishers, such as timeout (Dunn, 1990), the punisher may be ineffective in the long term if it signals a period of increased probability of reinforcers or decreased probability of punishers following the timeout.
Overall, the present approach suggests that considerable care needs to be taken to ensure that reinforcers do indeed signal more of the same, and that punishers do not signal periods of increased reinforcer frequency. For example, to keep a student working on task, we must not only remove immediate reinforcers for undesirable activity but also ensure that, in the long run, timeout cannot signal other reinforcers, such as sympathy from peers. The present results suggest that conditional probabilities of long-term events should be taken into account and exploited in planning behavioral interventions.
Acknowledgments
We thank Chris Krägeloh and Douglas Elliffe for many discussions of these data, and Mick Sibley for looking after the pigeons.
References
- Azrin N.H, Holz W.C. Punishment. In: Honig W.K, editor. Operant behavior: Areas of research and application. New York: Appleton-Century-Crofts; 1966. pp. 380–447.
- Baum W.M. The correlation-based law of effect. Journal of the Experimental Analysis of Behavior. 1973;20:137–153. doi: 10.1901/jeab.1973.20-137.
- Baum W.M. On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior. 1974;22:231–242. doi: 10.1901/jeab.1974.22-231.
- Baum W.M. Understanding behaviorism: Behavior, culture, and evolution. Malden, MA: Blackwell Publishing; 2005.
- Baum W.M, Davison M. Choice in a variable environment: Visit patterns in the dynamics of choice. Journal of the Experimental Analysis of Behavior. 2004;81:85–127. doi: 10.1901/jeab.2004.81-85.
- Bloomfield T.M. Reinforcement schedules: Contingency or contiguity? In: Gilbert R.M, Millenson J.R, editors. Reinforcement: Behavioral analyses. New York: Academic Press; 1972. pp. 165–208.
- Bullock D.H, Smith W.C. An effect of repeated conditioning-extinction upon operant strength. Journal of Experimental Psychology. 1953;46:349–352. doi: 10.1037/h0054544.
- Davison M, Baum W.M. Choice in a variable environment: Every reinforcer counts. Journal of the Experimental Analysis of Behavior. 2000;74:1–24. doi: 10.1901/jeab.2000.74-1.
- Davison M, Baum W.M. Choice in a variable environment: Effects of blackout duration and extinction between components. Journal of the Experimental Analysis of Behavior. 2002;77:65–89. doi: 10.1901/jeab.2002.77-65.
- Davison M, Baum W.M. Every reinforcer counts: Reinforcer magnitude and local preference. Journal of the Experimental Analysis of Behavior. 2003;80:95–129. doi: 10.1901/jeab.2003.80-95.
- Davison M, McCarthy D. The matching law: A research review. Hillsdale, NJ: Erlbaum; 1988.
- Dufort R.H, Guttman N, Kimble G.A. One-trial discrimination reversal in the white rat. Journal of Comparative and Physiological Psychology. 1954;47:248–249. doi: 10.1037/h0057856.
- Dunn R. Timeout from concurrent schedules. Journal of the Experimental Analysis of Behavior. 1990;53:163–174. doi: 10.1901/jeab.1990.53-163.
- Fantino E. Choice and rate of reinforcement. Journal of the Experimental Analysis of Behavior. 1969;12:723–730. doi: 10.1901/jeab.1969.12-723.
- Krägeloh C.U, Davison M, Elliffe D.M. Local preference in concurrent schedules: The effects of reinforcer sequences. Journal of the Experimental Analysis of Behavior. 2005;84:37–64. doi: 10.1901/jeab.2005.114-04.
- Marr M.J. Second-order schedules and the generation of unitary response sequences. In: Zeiler M.D, Harzem P, editors. Advances in the analysis of behaviour: Vol. 1. Reinforcement and the organization of behavior. Chichester, England: Wiley; 1979. pp. 223–260.
- Neuringer A.J, Chung S.H. Quasi-reinforcement: Control of responding by a percentage-reinforcement schedule. Journal of the Experimental Analysis of Behavior. 1967;10:45–54. doi: 10.1901/jeab.1967.10-45.
- Reid R.L. The role of the reinforcer as a stimulus. British Journal of Psychology. 1958;49:202–209. doi: 10.1111/j.2044-8295.1958.tb00658.x.
- Rescorla R.A. Pavlovian conditioning and its proper control procedures. Psychological Review. 1967;74:71–80. doi: 10.1037/h0024109.
- Rescorla R.A. Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology. 1968;66:1–5. doi: 10.1037/h0025984.
- Rescorla R.A. Informational variables in Pavlovian conditioning. In: Bower G, editor. The psychology of learning and motivation, Vol. 6. New York: Academic Press; 1972. pp. 1–46.
- Skinner B.F. The behavior of organisms. New York: Appleton-Century-Crofts; 1938.
- Squires N, Norborg J, Fantino E. Second-order schedules: Discrimination of components. Journal of the Experimental Analysis of Behavior. 1975;24:157–171. doi: 10.1901/jeab.1975.24-157.
- Stubbs D.A. Second-order schedules and the problem of conditioned reinforcement. Journal of the Experimental Analysis of Behavior. 1971;16:289–313. doi: 10.1901/jeab.1971.16-289.
- Stubbs D.A, Cohen S.L. Second-order schedules: Comparison of different procedures for scheduling paired and nonpaired brief stimuli. Journal of the Experimental Analysis of Behavior. 1972;18:403–413. doi: 10.1901/jeab.1972.18-403.
- Stubbs D.A, Pliskoff S.S, Reid H.M. Concurrent schedules: A quantitative relation between changeover behavior and its consequences. Journal of the Experimental Analysis of Behavior. 1977;27:85–96. doi: 10.1901/jeab.1977.27-85.