Abstract
Three pigeons chose between random-interval (RI) and tandem, continuous-reinforcement, fixed-interval (crf-FI) reinforcement schedules by pecking either of two keys. As long as a pigeon pecked on the RI key, both keys remained available. If a pigeon pecked on the crf-FI key, then the RI key became unavailable and the crf-FI timer began to time out. With this procedure, once the RI key was initially pecked, the prospective value of both alternatives remained constant regardless of time spent pecking on the RI key without reinforcement (RI waiting time). Despite this constancy, the rate at which pigeons switched from the RI to the crf-FI decreased sharply as RI waiting time increased. That is, prior choices influenced current choice—an exercise effect. It is argued that such influence (independent of reinforcement contingencies) may serve as a sunk-cost commitment device in self-control situations. In a second experiment, extinction was programmed if RI waiting time exceeded a certain value. Rate of switching to the crf-FI first decreased and then increased as the extinction point approached, showing sensitivity to both prior choices and reinforcement contingencies. In a third experiment, crf-FI availability was limited to a brief window during the RI waiting time. When constrained in this way, switching occurred at a high rate regardless of when, during the RI waiting time, the crf-FI became available.
Keywords: choice, law of exercise, sunk-cost, key peck, pigeons
Thorndike (1911) originally proposed a “law of exercise” along with a positive and negative “law of effect.” According to the law of exercise:
Any response to a situation will, other things being equal, be more strongly connected with the situation in proportion to the number of times it has been connected to that situation and to the average vigor and duration of the connections. (p. 244)
We can translate Thorndike's language into choice-based terms:
Choice of any alternative from a given set of alternatives will, other things being equal, increase in probability in proportion to the number of times that alternative has been chosen in the past from that set of alternatives.
Thorndike later abandoned both the negative law of effect and the law of exercise. It is now generally agreed that he was wrong to have abandoned the negative law of effect (Rachlin & Herrnstein, 1969). Was he also wrong to have abandoned the law of exercise?
Although Guthrie (1935) based his theory of learning entirely on the principle of exercise, modern theories of choice have ignored it. The generalized matching law (Baum, 1974), for example, describes choice as a function of parameters of reinforcement (rate, amount, delay, and so forth) contingent on one or another choice. Theories derived from matching such as Fantino's delay reduction theory (Fantino & Abarca, 1985) also are phrased in terms of choice as a function of its consequences—not as a function of prior choices. Economic maximization theory (Rachlin, Battalio, Kagel, & Green, 1981) is explicit in disregarding prior choices. Organisms are said to choose so as to maximize utility; to do this, current choices must be focused entirely on the future. In any kind of maximization theory, whether momentary (Shimp, 1966) or long-term (Rachlin et al., 1981), the past is meaningful only to the extent that it predicts or determines future value. Otherwise it should be ignored. Your decision to buy or sell a stock, for instance, should not be influenced by the fact that you bought or sold it in the past but only by whether you believe it will go up or down in the future. Economists call those past choices “sunk costs.” If you did use such past choices to guide your present behavior (given that those choices themselves did not affect future value), you would have committed the “sunk-cost fallacy.”
The only conceivable biological advantage of allowing past choices, as such, to influence present choice arises when those past choices were based on more valid information about the future than present choices are; that (more valid) information might be obscured at the present time by current incentives. Such a reason for the sunk-cost effect would be psychological, not economic, and would apply in situations requiring self-control. We shall return to this issue in the General Discussion.
The present experiments attempted to test the law of exercise with pigeons by exposing them to a continuous choice between two concurrent schedules where reinforcer parameters—probability, delay, amount—of each alternative remained constant over time. In Experiment 1, as time passed between reinforcers, nothing changed except the pigeons' own prior choices. If, between reinforcers, a pigeon's current choice varied as a function of its prior choices, it could only have been because of the influence of those prior choices. The finding that current choice varied directly with prior choices would be evidence for a positive law of exercise; the finding that current choice varied inversely with prior choices would be evidence for a negative law of exercise.
At the beginning of each session and after a 3-s postreinforcement blackout that followed each reinforcer delivery in Experiment 1 (Condition 2), the pigeons were exposed to two lit keys. The purpose of the blackout was to increase the salience of the start of the random-interval (RI) waiting period. Pecks on one key were reinforced on an RI schedule; pecks on the other key were reinforced on a tandem continuous-reinforcement, fixed-interval (crf-FI) schedule. In the tandem schedule, the fixed interval did not start to time out until the crf-FI key was pecked. Moreover, that first peck on the crf-FI key turned off the RI keylight and made RI pecks ineffective. The values of the RI and crf-FI schedules were 60 s and 14 s, respectively; these values equated the average immediacies (the harmonic means of the delays) of the two alternatives, which remained constant as long as both alternatives were available.
As long as the pigeon pecked exclusively on the RI key, both keys remained available. Therefore, at any point prior to RI reinforcement the pigeon could either (a) continue pecking the RI key until an RI peck eventually was reinforced, or (b) switch to the crf-FI key. If the pigeon switched, the RI key became unavailable until the crf-FI reinforcer had been obtained and a 3-s postreinforcement blackout had elapsed. Figure 1 illustrates the three possible postreinforcement sequences of responding. The first (a) shows the case where the pigeon's first peck is on the crf-FI; the RI key becomes unavailable and the FI delay begins. The second two cases (b and c) show the two possible sequences given that the first peck was on the RI key. In the middle sequence (b) the pigeon pecks the RI key and keeps pecking it exclusively (RI waiting time) until reinforcement is obtained. In the lower sequence (c) the pigeon pecks the RI key initially, pecks it for a while without reinforcement, then switches to the crf-FI key. At that point the RI key becomes unavailable and the FI schedule begins.
The random nature of the RI schedule ensured that the probability of RI reinforcement remained constant regardless of RI waiting time. Immediately after reinforcement, the expected time to reinforcement for pecks on the RI key was 60 s. After waiting any amount of time without RI reinforcement, the expected time to RI reinforcement was still 60 s and remained 60 s regardless of how long the pigeon waited (as long as the pigeon kept pecking the RI key at a normal rate). Correspondingly, the expected time to reinforcement for a switch to the crf-FI key remained constant at the FI value. Also, the amount of reinforcement delivered by either of the two schedules was the same (3 s of access to a hopper with mixed grain). These reinforcement-parameter constancies are the “other things” that both Thorndike's conception of the law of exercise and our relativistic conception of that law require be kept constant.
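This constancy is the memorylessness property of the geometric distribution. With a fixed per-second set-up probability $p$ ($p \approx 1/60$ for the RI 60-s schedule used here), the scheduled interval $T$ satisfies

$$\Pr(T > s + t \mid T > s) \;=\; \frac{(1-p)^{s+t}}{(1-p)^{s}} \;=\; (1-p)^{t} \;=\; \Pr(T > t),$$

so the expected time to RI reinforcement, $E[T] = 1/p \approx 60$ s, is the same after any unreinforced wait.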
The choice contingencies of Experiment 1 differ crucially from those in concurrent variable-interval variable-interval (conc VI VI) schedules. In such schedules, the longer the time spent on either alternative without switching, the greater the probability that responding to the other alternative will be reinforced. Momentary maximization theories of choice such as Shimp's (1966) theory rely on this increased probability to explain switching. In Experiment 1, however, probability of reinforcement for switching from the RI to the crf-FI schedule remained constant (the pigeons could not switch from the crf-FI to the RI schedule).
Because the probability of reinforcement of both alternatives remained constant as long as the pigeon pecked the RI key, any choice theory based on reinforcement parameters would predict either (a) exclusive choice of the more favorable alternative or (b) a constant probability of switching (from the RI to the crf-FI schedule) determined by some relative measure of the constant reinforcer values. If it is found that choice is neither exclusive nor constant but that it varies as a function of how long the pigeon has been pecking on the RI key (RI waiting time), then the variation must be explained in terms of either RI waiting time itself or the number of pecks on the RI key (the only parameters that systematically varied with RI waiting time). If local response rates had varied systematically during RI waiting time, then response rate could conceivably have been used to time the interval. But, as is typically the case with RI schedules, local rates did not vary in any systematic way for any pigeon and so could not have served this function.
Cognitive theories of choice—that is, theories relying on hypothetical internal mechanisms—also could not explain changes in the probability of switching from the RI schedule to the crf-FI schedule. As an example, consider Gibbon's scalar expectancy theory (SET; Gibbon, 1977; Gibbon, Fairhurst, Church, & Kacelnik, 1988). According to SET, the chooser repeatedly samples from a hypothetical internal distribution of prior interreinforcement times and current elapsed time to reinforcement (a pacemaker count). This yields a continuously varying estimate of time-left-to-reinforcement for each alternative. Actual choice is a function of the relative time-left estimates. But, in Experiment 1, as long as both alternatives were available (that is, prior to a switch from the RI to the crf-FI), time-left never changed for either alternative. During the RI waiting time, the time left on the RI 60-s schedule was always 60 s; the time left on the crf-FI schedule was always the FI value. Thus, because the time-left estimates would not change as a function of RI waiting time, SET must predict a constant rate of switching from the RI to the crf-FI, as must any theory that does not account in some way for effects of past choices per se on current choice.
One theory that does account for an influence of past events on current choice is Nevin's behavioral momentum theory (Nevin, Mandell, & Atak, 1983). But those events are past reinforcers, not past choices. In fact, behavioral momentum theory explicitly denies the relevance of past responding to current choice (Nevin & Grace, 2000).
Experiment 1 used the procedure illustrated in Figure 1 to determine whether the probability of a switch from the RI schedule would remain constant, increase, or decrease as a function of time spent pecking on the RI schedule without reinforcement. A constant switch probability over time would be evidence for the absence of an exercise effect in this experiment; a decrease in switch probability over time would be evidence of a positive exercise effect; an increase in switch probability over time would be evidence for a negative exercise effect.
Experiment 1
Method
Subjects
Three male White Carneau pigeons (Palmetto Pigeon Plant) maintained at approximately 80% of their free-feeding weight were housed individually in a colony room, with free access to grit and water, where they were exposed to a 12:12 hr light/dark cycle with controlled temperature. Their weight was monitored before and after each daily session; they received the postsession feeding necessary to maintain their target weights.
Apparatus
Sessions were conducted in four Extra Tall Modular Operant Test Chambers manufactured by MedAssociates, Inc. (30.5 cm wide, 24.1 cm deep, and 29.2 cm tall), each enclosed in a sound-attenuating box equipped with a ventilating fan that masked external noise. Mounted on the front panel of each chamber were three response keys (2.54 cm in diameter) arranged horizontally, 8 cm apart and 20 cm above the floor. The distance from the side walls to the center of each of the two side keys was 4 cm. The keys could be illuminated with green, red, or white light. Only the two side keys were used in this experiment, while the center key remained dark and inoperative. On the back panel of the chamber were three houselights (red, green, and white) that were not used during this experiment. Food reinforcement was access to mixed grain delivered by a food hopper located below the center key, 2 cm above the floor. During reinforcement the hopper was illuminated with white light. All events were controlled by a computer running Med-PC® for Windows.
Procedure
Because all pigeons had had previous experience with similar procedures, it was not necessary to shape their behavior; they were exposed directly to the first condition of the experiment. All sessions lasted 90 min or until 60 reinforcers had been collected, whichever happened first. Sessions started at approximately the same time every day and were run a minimum of 6 days per week.
Condition 1: RI 60 s Only
In this condition, subjects were exposed exclusively to the RI schedule, which was signaled by a green keylight for all pigeons. At the beginning of the session, and 3 s after each reinforcer, either the left or the right key was illuminated with green light (sides were assigned randomly after each reinforcer). Pecks to the illuminated key were reinforced every 60 s, on average, according to an RI schedule of reinforcement (RI 60 s): in each second there was a fixed probability (p = 0.0167) that reinforcement would become available. The first peck after reinforcement had been set up turned off the keylight and activated the food hopper for 3 s. An interval of 3 s followed reinforcement, during which all lights in the chamber were turned off (blackout). Then either the left or the right key was illuminated again, as at the beginning of the session. Pecks to nonilluminated keys had no scheduled consequences. This condition was maintained for 15 sessions.
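To make the contingency concrete, here is a minimal simulation sketch in Python (not the Med-PC program that ran the experiment; it assumes, for simplicity, that the pigeon pecks the lit key at least once per second):

```python
import random

P = 0.0167  # per-second probability that the RI 60-s reinforcer is set up

def simulate_ri_interval():
    """One interreinforcer interval of Condition 1, assuming the pigeon
    pecks the lit key at least once per second (a simplifying assumption).
    Returns the obtained delay, in seconds, to the reinforced peck."""
    t = 0
    while True:
        t += 1
        if random.random() < P:   # reinforcement is set up during second t
            return t              # ...and the next peck is reinforced

delays = [simulate_ri_interval() for _ in range(10_000)]
print(sum(delays) / len(delays))  # ≈ 60 s, the scheduled RI value
```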
Condition 2: Modified Concurrent Choice
In this condition (diagrammed in Figure 1), the pigeons chose between the RI 60 s of Condition 1 and a crf-FI 14-s schedule (for Pigeon 3, this was reduced to crf-FI 10 s, as explained below). At the beginning of the session and 3 s after each reinforcer, the left and right keys were illuminated with green and red light. Colors were randomly assigned to each side after reinforcement. Pecks on the RI (green) key had the same scheduled consequences as in the previous condition, and had no effect on availability of the crf-FI schedule. A peck on the crf-FI (red) key turned off the RI key (making it inoperative) and started the crf-FI timer; the first peck after 14 s extinguished the red keylight and activated the hopper for a 3-s reinforcement period. Again, a 3-s blackout followed each reinforcer. This condition was run for 20 sessions. Behavior was judged to be stable well before this point because the hazard functions for each pigeon (see below) showed no apparent change in height or shape after the first 10 sessions.
Condition 3: Exclusive Choice
In this condition, pigeons were again faced with a choice between the RI and the crf-FI (with crf-FI values set at 14 s for Pigeons 1 and 2 and at 10 s for Pigeon 3). The procedure was identical to that of the previous condition, except that the first peck to either key following the blackout eliminated the availability of the other key, and the reinforcer was available only after the corresponding delay. This condition was run for a minimum of 20 sessions after which the proportion of choices over the last five sessions was visually judged to be stable.
Results
Condition 1: RI 60 s Only
The purpose of this condition was to expose the subjects to the geometric distribution of reinforcement delays generated by the RI 60-s schedule. Distributions of obtained delays to reinforcement across the sessions of this condition are shown in Figure 2. For each pigeon, the graphs show the proportion of obtained reinforcers at every second, as well as the theoretical geometric distribution of delays expected given the RI schedule value.
The programmed distribution of delays, y = 0.0167 × (0.9833)^(s − 1), was fitted to the obtained delay distributions, and the resulting proportion of variance accounted for by the fit is shown in the figure for each pigeon. The arithmetic average delays (scheduled and obtained), as well as their corresponding standard errors, are shown in Table 1. Together, these data show that the program was effective in generating a random distribution of reinforcement delays.
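The fit can be reproduced in outline as follows (a sketch in which simulated delays stand in for a pigeon's obtained distribution; `variance_accounted_for` is our hypothetical helper, not the paper's code):

```python
import numpy as np

p = 0.0167
s = np.arange(1, 301)                   # delay bins, in seconds
programmed = p * (1 - p) ** (s - 1)     # y = 0.0167(0.9833)^(s-1)

def variance_accounted_for(obtained, predicted):
    """Proportion of variance in the obtained delay proportions
    accounted for by the programmed geometric distribution."""
    ss_res = np.sum((obtained - predicted) ** 2)
    ss_tot = np.sum((obtained - np.mean(obtained)) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulated stand-in for one pigeon's obtained delays (hypothetical data):
rng = np.random.default_rng(seed=1)
delays = rng.geometric(p, size=900)
obtained = np.bincount(delays, minlength=302)[1:301] / delays.size
print(variance_accounted_for(obtained, programmed))
```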
Table 1. Average delays to reinforcement and standard errors (in seconds) for data in the first condition of Experiment 1.
| Delays | Pigeon 1 M (SE) | Pigeon 2 M (SE) | Pigeon 3 M (SE) | Overall M (SE) |
|---|---|---|---|---|
| Scheduled | 58.05 (1.94) | 58.81 (1.93) | 59.82 (1.97) | 58.89 (1.12) |
| Obtained | 58.96 (1.94) | 59.83 (1.93) | 60.41 (1.97) | 59.73 (1.13) |
Condition 2: RI 60 s vs. crf-FI 14 s Modified Concurrent Choice
For this condition, the following information was recorded: number of RI and crf-FI initial choices (i.e., the first peck after the keys were lit), number of switches, and time at which each event (responses in RI or crf-FI, switches, and reinforcements) occurred. This information was used to compute the proportion of initial RI choices (number of times during an interreinforcer interval [IRI] in which the first peck occurred on the RI-key divided by total number of IRIs) and the proportion of switches to the crf-FI (number of times the pigeon switched to the crf-FI divided by total number of initial RI choices). The likelihood of a switch to the crf-FI as a function of time spent pecking on the RI key also was computed in order to determine the relation between switching behavior and RI waiting time.
Proportion of Choices and Switches
Figure 3 shows, for each pigeon, the proportion of IRIs with initial RI choices in each session, as well as the proportion of those IRIs that ended with a switch to the crf-FI schedule (i.e., the proportion of crf-FI choices conditional on initial RI choice). Note that by the 20th session, the proportion of switches made by Pigeon 3 had decreased and remained below 20%. Because our main interest was in the analysis of switches, the crf-FI value was decreased, for this pigeon only, from 14 s to 10 s. As a result, Pigeon 3's proportion of switches to the crf-FI increased, so that by the end of the second 20 sessions it fluctuated around 50%.
The figure shows that all pigeons were predominantly starting IRIs by choosing the RI schedule. Table 2 shows the mean proportion of RI choices and switches across sessions and their corresponding standard deviations. For all pigeons, the mean proportion of RI choices was above 85%. That is, pigeons strongly preferred the RI schedule at the beginning of the IRI.
Table 2. Mean proportion of RI choices and switches and standard deviations per session for Condition 2 of Experiment 1.
| Proportion | Pigeon 1 M (SD) | Pigeon 2 M (SD) | Pigeon 3, Phase 1 M (SD) | Pigeon 3, Phase 2 M (SD) |
|---|---|---|---|---|
| RI choices | 0.91 (0.06) | 0.97 (0.02) | 0.87 (0.28) | 0.89 (0.09) |
| Switches | 0.52 (0.14) | 0.36 (0.16) | 0.42 (0.28) | 0.56 (0.19) |
Note. For Pigeon 3 there are two phases. Phase 1 corresponds to Sessions 1 through 20, with crf-FI 14 s; Phase 2 corresponds to Sessions 21 through 40, where the crf-FI value was decreased to 10 s.
Switching Behavior
Given the initial preferences, it is important to look at the times at which switches occurred between reinforcers in order to determine whether preferences changed as a function of time into the IRI.
For each pigeon, all IRIs during which the RI key was pecked first were taken for analysis. As sections b and c in Figure 1 illustrate, each RI waiting time could end with only one of two events: an RI reinforcer or a switch to the crf-FI. Each RI waiting time was divided into 5-s intervals. The frequencies of switches or RI reinforcements were obtained for each 5-s bin and collapsed across all RI waiting times. All intervals that did not include either of these terminating events were considered time spent “waiting.” A measure of the probability of switching in each interval was computed by dividing the number of switches in each interval by the number of times the pigeon was still waiting at the midpoint of the interval (this measure is known as the hazard rate).
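A sketch of this computation (function and variable names are ours; wait times are assumed to be measured from the first RI peck of the IRI):

```python
import numpy as np

BIN = 5.0  # bin width, in seconds

def hazard_rates(wait_times, ended_in_switch, n_bins=30):
    """Hazard of switching in each 5-s bin: switches observed in the bin
    divided by the number of RI waiting times still unresolved ('waiting')
    at the bin midpoint, per the rule described in the text."""
    wait_times = np.asarray(wait_times, dtype=float)
    ended_in_switch = np.asarray(ended_in_switch, dtype=bool)
    hazards = np.full(n_bins, np.nan)
    for i in range(n_bins):
        lo = i * BIN
        hi = np.inf if i == n_bins - 1 else lo + BIN   # last bin open-ended
        switches = np.sum((wait_times >= lo) & (wait_times < hi) & ended_in_switch)
        still_waiting = np.sum(wait_times > lo + BIN / 2)  # alive at midpoint
        if still_waiting > 0:
            hazards[i] = switches / still_waiting
    return hazards
```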
Data were organized into thirty 5-s bins, for which the frequencies of RI reinforcements and switches were obtained; the 30th bin, starting at 150 s, included all events from that point forward. This information was used to perform a survival analysis that tested the fit of the observed hazard rates to a model assuming a constant hazard rate (i.e., an exponential model). The data were tested against an exponential model because, as explained in the introduction, current models of choice would predict a constant rate of switching to the crf-FI. The hazard rate is the probability of a switch to the crf-FI during an interval, given that the pigeon has stayed (not switched) prior to that point. For each pigeon, the tests showed that the best-fitting exponential model (i.e., the exponential function with the best-fitting parameters) described the data significantly worse than a model that does not assume a constant likelihood of switching. The statistical information on the fit of the exponential model is shown in Table 3.
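One way to carry out such a test is sketched below (our reconstruction, not the published analysis, which may differ in detail; RI reinforcements are treated as censoring events, and df = 28 follows Table 3):

```python
import numpy as np
from scipy.stats import chi2

def lr_test_constant_hazard(times, switched, n_bins=30, width=5.0):
    """Likelihood-ratio test of a constant switching rate (exponential
    model) against a piecewise-constant hazard over the 5-s bins.
    `times`: RI waiting times in seconds; `switched`: 1 if the wait ended
    with a switch to the crf-FI, 0 if it ended with an RI reinforcer
    (treated as censoring)."""
    times = np.asarray(times, dtype=float)
    switched = np.asarray(switched, dtype=int)

    # Null (exponential) model: MLE of lambda is switches / total exposure.
    d, exposure = switched.sum(), times.sum()
    lam = d / exposure
    ll_null = d * np.log(lam) - lam * exposure

    # Alternative model: a separate constant hazard within each bin.
    ll_alt = 0.0
    for i in range(n_bins):
        lo = i * width
        hi = np.inf if i == n_bins - 1 else lo + width   # last bin open-ended
        expo_i = np.clip(times - lo, 0.0, hi - lo).sum() # exposure in bin
        d_i = int(np.sum((times >= lo) & (times < hi) & (switched == 1)))
        if d_i > 0:  # bins with no switches contribute zero at their MLE
            lam_i = d_i / expo_i
            ll_alt += d_i * np.log(lam_i) - lam_i * expo_i

    stat = 2.0 * (ll_alt - ll_null)
    df = n_bins - 2  # = 28, the degrees of freedom reported in Table 3
    return lam, stat, chi2.sf(stat, df)
```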
Table 3. Best-fit estimates (lambda) of an exponential model that assumes a constant hazard rate, with corresponding standard errors (SE), log-likelihoods, chi-squares, and significance values. The obtained distributions differ significantly from the exponential model that assumes a constant rate of switching as a function of waiting time.

| Pigeon | Lambda | SE Lambda | Log-likelihood | Chi-square | df | p |
|---|---|---|---|---|---|---|
| 1 | .053 | .004 | −1,889.33 | 166.43 | 28 | < .001 |
| 2 | .035 | .003 | −1,662.90 | 127.84 | 28 | < .001 |
| 3 | .079 | .007 | −1,986.24 | 127.78 | 28 | < .001 |
Figure 4 presents the obtained hazard rate functions for each pigeon, along with the best-fitting exponential model (for Pigeon 3, both the crf-FI 14-s and the crf-FI 10-s data are shown). Note that an exponential function is a horizontal line in Figure 4; this is because the likelihood of switching, according to the exponential model, is constant. In all cases, the lack of fit to the exponential model is due to the fact that the points at the beginning of the IRI are consistently above and the points at the end of the IRI are below the line corresponding to the exponential model. In the case of Pigeon 3, the hazard rate function for the crf-FI 14-s phase also follows this general pattern, although it is more horizontal than are those of the other pigeons. In general, the data of Pigeons 1 and 2 are closer to each other than to those of Pigeon 3, which are noisier for the crf-FI 10-s phase. However, although Pigeon 3's initial decrease is not as sharp as that of the other 2 pigeons, its rate of switching also decreases as a function of RI waiting time.
Condition 3: Exclusive Choice
Figure 5 shows the proportion of initial pecks to the RI key for Condition 3. The corresponding data for Condition 2 are repeated here for comparison. Note that the proportion of times an IRI was initiated with a peck to the RI key decreased in Condition 3, when the availability of the crf-FI key was removed contingent on the first RI peck. The mean proportion of RI choices in the last five sessions of the condition decreased from .91, .97, and .89 in Condition 2 to .60, .79, and .31 in Condition 3 for Pigeons 1, 2, and 3, respectively.
Discussion
There were three main results. First, the pigeons did not show exclusive preference for the RI alternative throughout those IRIs where the RI key was pecked first. Although all pigeons initially chose the RI alternative on over 85% of the IRIs, they switched to the crf-FI on more than 35% of those IRIs. Second, preferences exhibited at the beginning changed as time into the IRI progressed. The poor fit to the exponential model (constant switching rate) is due to the fact that pigeons tended to switch more initially than later on during the IRI. Even though the hazard rate was low just after reinforcement, it decreased further as RI waiting time increased. That is, pigeons tended to choose the RI initially; if they had not switched to the crf-FI early in the wait, they became less and less likely to do so. Third, when the availability of the crf-FI key was removed by introducing a forced-choice procedure, initial preference for the RI decreased.
How would current models of choice explain these data? The pigeons did not maximize reinforcement rate over the session. In order to do that, they would have had to choose the crf-FI schedule exclusively; this would have yielded 3.53 reinforcers per minute. The rate of reinforcement actually obtained was 1.73, 1.36, and 1.95 reinforcers per minute for Pigeons 1, 2, and 3. This result is not surprising because a strong preference for a variable delay mixture over a fixed delay equal to the average delay of the mixture has been observed frequently (Bateson & Kacelnik, 1995, 1996, 1997; Cicerone, 1976; Davison, 1969; Herrnstein, 1964; Killeen, 1968; Mazur, 1984; Navarick & Fantino, 1976). The fixed-delay alternative was initially set at the harmonic average (average immediacy) of the random mixture (and then decreased for Pigeon 3). According to models that assume the maximization of immediacy (e.g., Bateson & Kacelnik, 1996; Mazur, 1984), Pigeons 1 and 2 should have been indifferent between the two alternatives. Yet a strong initial preference for the RI over the crf-FI was found. This preference may have been due to the fact that, in this procedure, subjects were not forced to commit to the random mixture once they chose it, but were allowed to sample it for as long as they chose and then switch. This freedom to leave the RI might have actually increased its initial value (Catania, 1980; Catania & Sagvolden, 1980; Cerutti & Catania, 1997). The results of the third condition seem to support this hypothesis. It seems probable that the RI in Condition 2 was more valuable initially than the crf-FI because of both the RI's potential for immediate reinforcement and the fact that pigeons could leave it any time.
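The rate and immediacy figures can be checked in a few lines (a sketch; treating each exclusive crf-FI cycle as the 14-s interval plus 3 s of hopper access is our reading of which epochs were counted):

```python
import numpy as np

# Exclusive crf-FI choice: one reinforcer per 14-s interval plus 3 s of
# hopper access (assumed cycle composition):
print(60 / (14 + 3))                    # ≈ 3.53 reinforcers per minute

# Harmonic average (average immediacy) of the RI 60-s geometric mixture:
p, t = 0.0167, np.arange(1, 100_000)
mean_immediacy = np.sum((1 / t) * p * (1 - p) ** (t - 1))
print(1 / mean_immediacy)               # ≈ 14.4 s, close to the 14-s crf-FI
```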
None of these arguments implies that the hazard function should have varied as RI waiting time increased. Recall that, at every second, the prospective value of the alternatives faced by the pigeons was exactly the same; yet the probability of switching to the crf-FI decreased as a function of RI waiting time.
There is some similarity between this experiment and Gibbon and colleagues' time-left procedure (see, e.g., Gibbon & Church, 1981). However, in the time-left procedure, when variable delay mixtures (a truncated geometric distribution) were used, the fixed delay option was always the elapsing comparison and began to elapse at the beginning of the trial and not when subjects chose it (Brunner, Gibbon, & Fairhurst, 1994; Gibbon et al., 1988). Given the contingencies of the present experiment where, as long as the pigeon did not switch, the two options were always identical (i.e., the expected delay to reinforcement for each of them was always the same, regardless of how long the pigeon had been pecking on the RI key), whatever preference a model predicts at the beginning of the trial should apply as well further on into the RI waiting time. That is, according to Gibbon et al.'s SET, as well as other choice models (e.g., Grace, 1994; Killeen, 1982), the switching rate in this experiment should have remained constant.
In other words, investment of time in the RI schedule made future RI choices more likely. This positive exercise effect corresponds to the so-called sunk-cost “fallacy” cited in the human decision-making literature, whereby the likelihood of continuing a course of action increases as a function of previous investment in that alternative (e.g., Arkes & Blumer, 1985). The reason why this behavior is often considered fallacious, as stated in the introduction, is that current choices should be based on future benefits and costs (prospects) without regard to past choices (except to the extent that past choices may determine future values). That is, an exercise effect can disrupt utility maximization. However, as will be argued in more detail in the General Discussion, committing the sunk-cost fallacy may be useful in the context of self-control problems.
Of course, the fact that there was an exercise effect in this experiment does not mean that choice is only an effect of exercise or that the law of effect or any particular theory of choice is invalid. Compared to the effect of postchoice contingencies (the law of effect), the effect of prior choices on current choice (the law of exercise) would be expected to be weak. The purpose of Experiment 2 was to test this expectation. In Experiment 2, differential instrumental contingencies were imposed as waiting progressed. It was expected that when those postchoice contingencies opposed the effect of prior choices, the former would strongly dominate. At the same time, demonstration of typical postchoice contingency effects with the present procedure would show that the procedure itself does not somehow impose an exercise effect—in other words, that the exercise effect was not an artifact of the procedure of Experiment 1.
Experiment 2
In Experiment 1, the more time pigeons spent waiting for the RI reinforcement, the more likely they were to keep on waiting. This may have been at least partly due to the fact that, despite the uncertainty of the reward, reinforcement eventually always arrived. In Experiment 2, we introduced the possibility that the probabilistic reinforcement schedule would never deliver reinforcement. That is, if the pigeon waited on the RI for a specified time without switching to the crf-FI key, the probability of reinforcement dropped to zero. The specific point in time at which extinction was introduced for each pigeon was based on its behavior in Experiment 1.
Method
Subjects and Apparatus
Subjects and apparatus were the same as in Experiment 1.
Procedure
The general procedure was exactly the same as in the previous experiment, with the exception of the introduction of extinction after time T. If the pigeon started an IRI by pecking on the RI key, did not switch, and did not receive reinforcement by time T, then extinction was in place; that is, after T, pigeons had to switch to the crf-FI and satisfy the crf-FI requirement for a peck to be reinforced and the next IRI to begin. The value of T was established by obtaining the point at which the probability of waiting (number of waits divided by number of waits plus number of switches in 5-s bins) reached 1.0 in the second condition of Experiment 1. This corresponded to 150 s, 185 s, and 125 s for Pigeons 1, 2, and 3, respectively. These values were used initially as the point at which the probability of reinforcement dropped to zero; pigeons were tested for 25 sessions under this condition. However, the frequency with which pigeons entered extinction was very low, and the effect of the extinction constraint on behavior was hardly evident. For this reason, a second condition was run in which the extinction point was reduced to 75% of the original value (i.e., 113 s, 139 s, and 94 s for Pigeons 1, 2, and 3). This condition was run for 30 sessions. As in Experiment 1, hazard functions for all pigeons did not change in apparent height or shape after 10 sessions.
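The rule for locating T can be sketched as follows (a hypothetical helper; rounding to a 5-s bin edge is our assumption):

```python
import numpy as np

def extinction_point_T(wait_times, ended_in_switch, width=5.0):
    """Point at which the probability of waiting (waits / (waits + switches)
    per 5-s bin) reaches 1.0, i.e., the bin edge beyond which no switches
    were observed in Condition 2 of Experiment 1. A sketch of the rule
    described above, not the authors' code."""
    switch_times = np.asarray(wait_times, float)[np.asarray(ended_in_switch, bool)]
    last_switch = switch_times.max()                 # no switches beyond here
    return np.ceil(last_switch / width) * width      # round up to a bin edge
```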
Results
Proportion of RI Choices and Switches
The proportion of RI choices and switches to the crf-FI for each session in each of the two extinction conditions are shown in Figure 6, with the corresponding means and standard errors shown in Table 4. In Figure 6, the leftmost points in each graph correspond to the average proportions of the second condition in the previous experiment.
Table 4. Mean proportion of RI choices and switches to the crf-FI in the two conditions of Experiment 2 (labeled first extinction and second extinction, respectively). Standard deviations also are presented.
| Proportion | Pigeon 1, first ext. M (SD) | Pigeon 1, second ext. M (SD) | Pigeon 2, first ext. M (SD) | Pigeon 2, second ext. M (SD) | Pigeon 3, first ext. M (SD) | Pigeon 3, second ext. M (SD) |
|---|---|---|---|---|---|---|
| RI choices | 0.92 (0.06) | 0.91 (0.07) | 0.97 (0.03) | 0.98 (0.03) | 0.90 (0.09) | 0.98 (0.02) |
| Switches | 0.50 (0.11) | 0.53 (0.12) | 0.33 (0.07) | 0.44 (0.07) | 0.69 (0.11) | 0.67 (0.08) |
Switching Behavior
The hazard rate was computed using thirty 5-s bins (where the 30th includes every event after 150 s) and entered into a survival analysis. The hazard rate functions for this experiment again were significantly different from those of the best-fit exponential model (i.e., a constant rate of switching). Figure 7 presents the hazard rate functions for the second extinction condition, along with the baseline hazard rate from Experiment 1 and the best-fit exponential for the extinction function. For all 3 pigeons, the separation between the two curves increases as the extinction point approaches; again, the data for Pigeon 3 are noisier than those for the other 2. In the case of the first 2 pigeons, the separation between the two curves widens slowly after 25 seconds (fifth bin), whereas for Pigeon 3 the two curves diverge almost immediately. This difference may have been due to the fact that extinction for Pigeon 3 occurred sooner than for the other 2. The obtained reinforcement rates in the second extinction condition were 1.78, 1.46, and 2.25 reinforcers per minute for Pigeons 1, 2, and 3, respectively. Note that these values are not very different, for Pigeons 1 and 2, from those of the first experiment (1.73, 1.36, and 1.95). Pigeon 3 increased its obtained rate of reinforcement by increasing its proportion of switches out of the RI.
In summary, the introduction of extinction increased switching as the extinction point approached. However, for Pigeons 1 and 2 the initial peak in the hazard rate function remained and was followed by virtually the same sharp decrease as in Experiment 1 before increasing. In the case of Pigeon 3, the divergence between the hazard rate functions of the two experiments occurs towards the beginning of the IRI; this pigeon had the shortest point of entrance to extinction. For all pigeons, the rise in switching probability occurred well before the point of extinction. Of course, any further rise is not meaningful, because the pigeons must eventually switch (or they would be in extinction for the remainder of the session). In fact, the pigeons stayed past the extinction point on less than 2% of the IRIs in which they initially chose the RI. The increases in error bar length with time into the IRI reflect the decreasing number of cases in which the pigeons waited that long before switching.
Discussion
The introduction of extinction towards the end of the IRI increased the probability of switching as time to extinction approached. The initial segment of the switching function remained virtually unchanged, at least for Pigeons 1 and 2. As in Experiment 1, switching rate decreased initially. However, further time into the IRI signaled the arrival of extinction, and the pigeons started switching again, presumably because the possibility of not receiving the reinforcement (law of effect) counteracted the effect of the time invested in the RI (law of exercise).
The increase in the rate of switching as extinction approached indicates that whatever mechanism is responsible for taking past time investments into account when making choices, the mechanism is not blind. That is, choice was sensitive to prospective as well as retrospective conditions.
Experiment 3
In Experiments 1 and 2, the pigeons were free to choose between the RI and crf-FI schedules at any point prior to RI reinforcement. That is, the pigeons could stay or they could switch. It may be the case, however, that the exercise effect did not depend on free choice per se (availability of the crf-FI alternative) but merely on exposure to the RI schedule. Experiment 3 tested this possibility by constraining pecks to the RI schedule (pigeons could not switch except during a single, brief time window). If the pigeons were to switch at the same rate during this window (after being constrained to the RI schedule alone) as they did during the equivalent postreinforcement period in Experiment 1 (where they freely chose to stay on the RI schedule), it would be evidence that the positive exercise effect (i.e., the sunk-cost effect) found in Experiment 1 was due to prior responding (exercise per se) as distinct from prior choice. One purpose of Experiment 3 was to test between these possibilities.
Timing theories of choice, whether behavioral (Killeen & Fetterman, 1988; Staddon & Higa, 1999) or not (Gibbon, 1977), attribute choice to timing. Such theories might explain the results of Experiment 1 in terms of a nonlinear relation between behavior and reinforcer probability over time. Although the actual probability of reinforcement was constant over time during the RI schedule, subjective (or behavioral) probability of RI reinforcement might have increased, yielding the decreasing hazard functions of Figure 4. For example, the RI schedule might have had the behavioral effect of a rectangularly distributed VI schedule where reinforcement probability increases as a function of time without reinforcement. In nonbehavioral terms, the pigeons could have misperceived time left to an RI reinforcer such that the more time they spent pecking on the RI key, the shorter the time-left appeared. Whatever timing mechanism caused the misperception should operate equivalently in Experiment 3 and Experiment 1. Perceived time left may be expected to depend on current waiting time and overall expected time to reinforcement (which was the same in this experiment as in Experiment 1) and not on the availability or unavailability of another alternative (which differed). The availability or nonavailability of a nonchosen alternative might conceivably affect timing to some small, quantitative extent but would not be expected to alter it in a drastic or qualitative manner. A second purpose of Experiment 3 was to test this time-perception account of the results of Experiment 1.
In the first condition, pigeons' access to choice between the RI and crf-FI schedules was restricted to a brief period, presented at a particular point into the IRI (the RI and crf-FI schedule values remained the same as in the previous two experiments). In the second condition, the pigeons were again exposed to the original choice procedure (Condition 2) of Experiment 1.
Method
Subjects and Apparatus
The 3 subjects used in Experiments 1 and 2 were used in this experiment. The apparatus was the same.
Procedure
Condition 1: Restricted Choice
In this condition, an IRI started with one of the two side keys illuminated green (sides assigned randomly on every IRI). That is, at the beginning of every IRI the only alternative available to the pigeon was the RI schedule. At a particular point into the IRI (based on data from Experiment 1, as described below), there was a brief period during which both alternatives were made available (choice window): the RI key flickered for 0.5 s, and both the houselight and the crf-FI (red) key were turned on. A peck on the crf-FI key at this point extinguished both the RI key and the houselight and started the FI timer; the FI alone was then in effect for the remainder of the IRI. A peck on the RI key extended the availability of the crf-FI key for 2 s further. If the pigeon did not switch to the crf-FI during those 2 s, both the crf-FI key and the houselight were extinguished and the RI schedule alone was in effect for the remainder of the IRI. The purpose of extending the time window 2 s beyond an RI peck was to ensure that the pigeon was facing the keys and responding, rather than engaging in interim behavior, during the period of crf-FI availability.
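The window logic can be sketched as follows (hypothetical names and polling loop; the experiment itself ran under Med-PC, and the duration of the initial window before any peck is our assumption):

```python
import time

def run_choice_window(get_peck, base_window=2.0):
    """Sketch of the restricted-choice window. `get_peck()` is a hypothetical
    non-blocking poll of the keys returning 'RI', 'FI', or None;
    `base_window` is an assumed initial window duration.
    Returns 'FI' if the pigeon switched, 'RI' if the window closed."""
    deadline = time.monotonic() + base_window
    while time.monotonic() < deadline:
        peck = get_peck()
        if peck == 'FI':
            return 'FI'    # RI key and houselight off; FI timer starts
        if peck == 'RI':
            # each RI peck extends crf-FI availability 2 s beyond that peck
            deadline = time.monotonic() + 2.0
    return 'RI'            # crf-FI key and houselight off; RI alone in effect
```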
Only one choice window could become available during a postreinforcement period. However, not every postreinforcement period included a window, given that in some cases, RI reinforcement could be set up and delivered before the window was due to appear. The point into the IRI at which the choice window became available (X) was varied across phases of the experiment. In this way, we avoided IRIs with more than one choice window; with multiple windows, a pigeon's choice to stay on the RI during an earlier window could have influenced its choice during a later window.
There were five values of X for each pigeon: 0.5 s, 1.5 s, 25%, 50%, and 75%. The percentages correspond to the RI waiting times at which the pigeon had completed 25%, 50%, and 75% of its total switches in Condition 2 of Experiment 1; these times were obtained from the survival analysis. They were 8.5 s, 36.5 s, and 116.5 s for Pigeon 1; 18.5 s, 84 s, and 248 s for Pigeon 2; and 12.5 s, 30 s, and 67 s for Pigeon 3. The number of times per session that the choice window actually became available decreased as a function of how far into the IRI it was scheduled to occur (see Table 5). For this reason, each pigeon was tested in each phase for as many sessions as necessary to obtain between 1,150 and 1,300 exposures to the choice period, or for a maximum of 100 sessions, whichever happened first. As in previous conditions, the stability of switching as a function of RI waiting time was obtained well before this point. The first phase for all pigeons corresponded to the shortest value of X (0.5 s), which then was increased and decreased from phase to phase. This procedure yielded two data points for each value of X, except for the 75% point, which was run only once.
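The placement of the percentage-based windows can be sketched as follows (a simplification taking plain empirical quantiles of observed switch times, whereas the published values came from the survival analysis, which may differ in detail):

```python
import numpy as np

def window_placements(switch_times):
    """RI waiting times by which 25%, 50%, and 75% of a pigeon's switches
    in Condition 2 of Experiment 1 had occurred (simple empirical
    quantiles; censoring by RI reinforcers is ignored in this sketch)."""
    return np.quantile(np.asarray(switch_times, dtype=float), [0.25, 0.50, 0.75])
```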
Table 5. Number of sessions and total number of choice windows across all sessions for each phase in Experiment 3.
| Time of choice | Pigeon 1 Sessions | Pigeon 1 Windows | Pigeon 2 Sessions | Pigeon 2 Windows | Pigeon 3 Sessions | Pigeon 3 Windows |
|---|---|---|---|---|---|---|
| 0.5 s | 20 | 1,194 | 20 | 1,184 | 20 | 1,200 |
| 1.5 s | 20 | 1,178 | 20 | 1,172 | 20 | 1,191 |
| 25% | 25 | 1,305 | 28 | 1,197 | 26 | 1,282 |
| 50% | 35 | 1,172 | 81 | 1,209 | 35 | 1,285 |
| 75% | 100 | 852 | 100 | 86 | 65 | 1,262 |
| 50% | 35 | 1,148 | 81 | 1,259 | 35 | 1,290 |
| 25% | 25 | 1,262 | 28 | 1,217 | 26 | 1,257 |
| 1.5 s | 20 | 1,176 | 20 | 1,180 | 20 | 1,194 |
| 0.5 s | 20 | 1,191 | 20 | 1,196 | 20 | 1,197 |
Condition 2: Return to Baseline
In this condition, subjects were returned to the original choice procedure of Experiment 1 (Condition 2) for 20 sessions.
Results
Condition 1: Restricted Choice
The number of sessions that each pigeon was tested in each phase is shown in Table 5, along with the total number of choice windows that occurred per phase. Figure 8 shows the proportion of times in which the choice window ended with a peck to the crf-FI (i.e., the total number of times the crf-FI schedule was chosen divided by the total of obtained choice windows in a phase). The figure also includes the hazard rate functions for the second condition of Experiment 1, when the choice was continuous. The closed squares correspond to the first determination of the proportions, when X was increasing from phase to phase. The second determination is presented with open squares. The squares are unconnected because only a single window of switching availability was presented in each phase of this experiment. That is, the postreinforcement position of the window varied from phase to phase, not within a session, so as to prevent earlier choices from influencing later choices.
The proportion of switches to the crf-FI was much higher with the choice windows of the present experiment than it was with continuous choice availability in Experiment 1. Switching in the present experiment increased or remained relatively constant as a function of the time into the RI at which the choice was offered. There are two exceptions to this observation, namely the last point (at 75%) for both Pigeons 1 and 2. In all other cases it seems clear that the rate of switching is not a decreasing function of time spent pecking on the RI key. This result contrasts with the general decrease in likelihood of switching to the crf-FI alternative observed in the previous experiments.
Condition 2: Return to Baseline
Figure 9 shows the hazard rate functions for all pigeons, along with the best-fit exponential model that assumes a constant rate of switching. Along with these functions, the figure also shows the data for the second condition of Experiment 1 (i.e., baseline). As in the previous two experiments, the hazard rate functions obtained in this condition were substantially different from the model that assumes a constant rate of switching.
Discussion
In the first condition of this experiment, where opportunity to switch from the RI to the crf-FI was constrained to a single, brief time window, the probability of switching typically was much higher than it was at corresponding periods in Experiment 1, where the pigeons were free to switch at any time during the wait. In the second condition, where pigeons were returned to the original baseline, the resulting switching rates decreased and switching returned to a generally decreasing function similar to that of Experiment 1 (original baseline). However, as Figure 9 shows, switching rates remained somewhat higher than they were in Experiment 1, perhaps due to residual effects of the first condition of this experiment.
There is a superficial similarity between the procedure of the first condition of this experiment and Gibbon and Church's (1981) time-left procedure. In both procedures, the schedule switched to was a fixed-interval schedule that began to time out only after the switch (a tandem crf-FI schedule). In both procedures, the schedule switched to became available only after a certain interval had passed on the schedule switched from. However, in the time-left procedure, the schedule switched from was also a fixed interval that elapsed with time. The longer the pigeon stayed on that schedule without switching, the closer it came to reinforcement. In the present case (first condition of Experiment 3), however, the schedule switched from was an RI schedule. This is a crucial difference because, in the time-left procedure, time-left steadily decreased; pigeons switched to the crf-FI early in the interval when time-left was long and did not switch late in the interval when time-left was short. The mean and variance of the switching point thus measured the mean and variance of pigeons' estimates of time-left. In the present procedure, in contrast, expected time-left never varied; elapsed time in the RI was not correlated with a reduction in time-left to RI reinforcement; there was no reason (except in terms of an exercise effect) for pigeons to switch at any point during the IRI.
Although there was considerable variability among pigeons in this experiment, the redeterminations (open squares) generally fell close to the original determinations; individual pigeons behaved consistently. Pigeon 3 switched to the crf-FI schedule at a relatively high rate whenever the choice window appeared. The near-zero slope of the squares in Figure 8 for Pigeon 3 indicates the absence of an exercise effect for this pigeon during this condition. Recall, however, that Pigeon 3 also showed the least exercise effect in Experiment 1. When Pigeon 3 chose the RI schedule freely, it usually stayed with it; when its pecking was constrained to the RI schedule, it tended to switch from the RI whenever it was free to do so.
Pigeons 1 and 2, however, showed a negative exercise effect, at least in the early stages (prior to the rightmost point) of the first condition of Experiment 3. In Experiment 1 and in the second condition of Experiment 3, the longer they were exposed to the RI schedule as a free choice, the greater was their tendency to stay on it; in the first condition of Experiment 3, by contrast, the longer they were constrained to the RI schedule alone, the greater was their tendency to switch from it. For Pigeons 1 and 2, the rightmost point (the filled square) is lower than the others and gives an inverted-U shape to the functions for these pigeons in Figure 8.
However, even accounting for the variability among pigeons and the dip in the rightmost points in Figure 8 for Pigeons 1 and 2, Experiment 3 makes it clear that the positive exercise effect found in Experiment 1 depended on the pigeons' choices in that experiment, rather than on the mere fact that they had been pecking for a period on a key. That is, the exercise effect depends on exercise of choice rather than on exercise per se.
Experiment 3, moreover, provides some evidence against an account of the results of Experiment 1 in terms of a misperception of time. It seems unlikely that the existence or nonexistence of an alternative could have had such a strong qualitative and quantitative effect on timing as was shown in Experiment 3.
General Discussion
In Experiment 1, choice consequences remained constant as the interreinforcement interval progressed. We found that preferences changed despite the constancy of prospective value of these consequences; the likelihood of switching from the RI to the crf-FI schedule decreased as a function of prior choices of the RI schedule. This result is evidence for a positive effect of exercise on current choice independent of the effect of consequences on current choice. However, the exercise effect was not blind to changes in prospective value; pigeons' behavior adapted to the introduction of extinction, as shown in Experiment 2. Experiment 3 showed that the sharp decrease in switching during the interreinforcement interval in Experiment 1 depended on continuous availability of an alternative.
The positive exercise effect found in Experiment 1 (the decrease in switching during the interreinforcement interval) is equivalent to the sunk-cost effect commonly found in research with human subjects. Has it also been found with animals in nature? There is an ongoing discussion in the biological literature about whether animals commit the sunk-cost fallacy (known in that literature as the Concorde fallacy). Trivers' theory of parental investment (as cited in Dawkins & Brockmann, 1980) argues that, of the two parents of a brood, the most likely candidate to desert the brood and its partner is the one that has invested the least in the brood. This argument was later criticized by Dawkins and Carlisle (1976), who asserted that behavior must be driven by the ultimate final cause of fitness, which lies in the future, and which requires the evaluation of only prospective costs and benefits.
Since then, many attempts have been made with naturally occurring behaviors, mostly related to parental investment, to determine whether nonhumans commit the Concorde fallacy (for examples, see Dawkins & Brockmann, 1980; Lavery, 1995; and Weatherhead, 1979). According to some (Arkes & Ayton, 1999; Curio, 1987), all of the results claimed to support the existence of the Concorde fallacy could also be interpreted as driven by future benefits. Others argue that they have found the effect and attempt to explain it in terms of nonhuman cognitive limitations.
However, in nonhuman experimental situations, alternative explanations of sunk-cost effects are less plausible. Several studies with nonhumans have found reinforcer value to be enhanced by prior responding. Neuringer (1969), for example, found that pigeons preferred pecking a key followed by reinforcement to free reinforcers. Clement, Feltus, Kaiser, and Zentall (2000) found, also with pigeons, that colors associated with a higher work requirement were later preferred to those associated with a lower work requirement. Navarro and Fantino (2005) explicitly tested for a sunk-cost effect with pigeons. On a given trial, the pecks required to obtain reinforcement could suddenly increase. On these occasions, pigeons persisted in pecking the originally chosen key even when they could “escape” and begin a new trial with a shorter expected response requirement. Similarly, Kacelnik and Marsh (2002) showed that starlings preferred alternatives that had been paired previously with high work requirements over those that had been paired with low work requirements, despite the fact that both offered the same amount of food. Dawkins and Brockmann (1980) point out that committing the Concorde fallacy, or paying attention to sunk costs, could in many cases result in maladaptive choices, that is, choices that do not lead to maximization of fitness. That was certainly the case in the present experiments, where staying on the RI alternative sharply reduced overall reinforcement rate. However, as Dawkins and Brockmann also indicate, an overall strategy of paying attention to past investments when making a decision may be the result of adaptation if it solves an adaptive problem with the available resources. That is to say, the sunk-cost effect could be thought of as a “heuristic,” a rule of thumb of sorts (see, e.g., Gigerenzer & Todd, 1999). Following this line of reasoning, we would argue that positive weighting of previous investments can be useful in self-control situations.
The tendency to persist in previous choices may serve to bridge over the temptation of a small reward when it becomes imminent. This tendency may be seen as a primitive form of response-patterning—the organization of behavior into relatively rigid, temporally extended units (Rachlin, 1995). In an experiment by Siegel and Rachlin (1995), pigeons strongly preferred a smaller immediate reward to a large delayed reward. However, when they were required to make 31 pecks (FR 31) on either of two keys leading to the two rewards, preference for the smaller immediate reward was reduced by 44%. At the moment when the FR 31 requirement started, pigeons preferred the large delayed alternative. After the first peck on that alternative, the probability of switching to the smaller reward key dropped to around 0.01. It was argued that pigeons started a relatively rigid pattern of behavior; the cost of interruptions of this pattern committed them to their initial choice and allowed them to obtain the initially preferred alternative. They thus overcame the “temptation” of the small immediate reward. Siegel and Rachlin called this process “soft-commitment.” Given the results of the present experiment, the soft-commitment Siegel and Rachlin observed may be interpreted as a manifestation of the sunk-cost effect and vice versa.
Taken together, Experiments 1 and 3 show that Thorndike should not have withdrawn the law of exercise. Prior behavior may influence current choice and that influence depends on whether prior behavior is constrained to one alternative or freely allocated between two alternatives. But the results of Experiment 2 indicate that the law of exercise may be overwhelmed by the law of effect; at the point where the prospective value of one alternative was reduced, choice shifted away from that alternative regardless of prior choices.
One question left unanswered by the current experiments is whether the effect of past choices on current choice depends on active responding. In the present experiments, the responding was active in the sense that pigeons chose by repeatedly pecking the RI key or switching to a peck on the crf-FI key. We do not know whether the same effects would have been found if choices were made in a time-allocation procedure where, for example, a changeover key was used to switch between a random-time schedule and a fixed-time schedule (with reinforcers in both cases delivered independently of responding). This is a subject for future research.
Finally, these experiments say nothing about the mechanism underlying the law of exercise. An underlying tendency to persist in choice might conceivably have been constant during interreinforcement periods; that constant tendency might have been modulated by a countervailing tendency to switch that varied and caused the observed variation in choice. Or, an underlying tendency to stay and an underlying tendency to switch might both have varied; measured choice variation would then have been the resultant of those underlying opposing processes. These possibilities would be very difficult to separate, behaviorally, from simple persistence of choice but might be separable eventually by physiological investigation.
References
- Arkes H.R, Ayton P. The sunk cost and Concorde effects: Are humans less rational than lower animals? Psychological Bulletin. 1999;125:591–600.
- Arkes H.R, Blumer C. The psychology of sunk cost. Organizational Behavior and Human Decision Processes. 1985;35:124–140.
- Bateson M, Kacelnik A. Preferences for fixed and variable food sources: Variability in amount and delay. Journal of the Experimental Analysis of Behavior. 1995;63:313–329. doi: 10.1901/jeab.1995.63-313.
- Bateson M, Kacelnik A. Rate currencies and the foraging starling: The fallacy of the averages revisited. Behavioral Ecology. 1996;7:341–352.
- Bateson M, Kacelnik A. Starlings' preferences for predictable and unpredictable delays to food. Animal Behaviour. 1997;53:1129–1142. doi: 10.1006/anbe.1996.0388.
- Baum W. On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior. 1974;22:231–242. doi: 10.1901/jeab.1974.22-231.
- Brunner D, Gibbon J, Fairhurst S. Choice between fixed and variable delays with different reward amounts. Journal of Experimental Psychology: Animal Behavior Processes. 1994;20:331–346.
- Catania A.C. Freedom of choice: A behavioral analysis. In: Bower G.H, editor. The psychology of learning and motivation: Vol. 14. New York: Academic Press; 1980. pp. 97–145.
- Catania A.C, Sagvolden T. Preference for free choice over forced choice in pigeons. Journal of the Experimental Analysis of Behavior. 1980;34:77–86. doi: 10.1901/jeab.1980.34-77.
- Cerutti D, Catania A.C. Pigeons' preference for free choice: Number of keys versus key area. Journal of the Experimental Analysis of Behavior. 1997;68:349–356. doi: 10.1901/jeab.1997.68-349.
- Cicerone R.A. Preference for mixed versus constant delay of reinforcement. Journal of the Experimental Analysis of Behavior. 1976;25:257–261. doi: 10.1901/jeab.1976.25-257.
- Clement T.S, Feltus J.R, Kaiser D.H, Zentall T.R. “Work ethic” in pigeons: Reward value is directly related to the effort or time required to obtain the reward. Psychonomic Bulletin & Review. 2000;7:100–106. doi: 10.3758/bf03210727.
- Curio E. Animal decision making and the “Concorde fallacy.” Trends in Ecology and Evolution. 1987;2:148–152. doi: 10.1016/0169-5347(87)90064-4.
- Davison M.C. Preference for mixed-interval versus fixed-interval schedules. Journal of the Experimental Analysis of Behavior. 1969;12:247–253. doi: 10.1901/jeab.1969.12-247.
- Dawkins R, Brockmann H.J. Do digger wasps commit the Concorde fallacy? Animal Behaviour. 1980;28:892–896.
- Dawkins R, Carlisle T.R. Parental investment, mate desertion and a fallacy. Nature. 1976;262:131–133.
- Fantino E, Abarca N. Choice, optimal foraging, and the delay-reduction hypothesis. Behavioral and Brain Sciences. 1985;8:315–330.
- Gibbon J. Scalar expectancy theory and Weber's law in animal timing. Psychological Review. 1977;84:279–325.
- Gibbon J, Church R.M. Time left: Linear vs. logarithmic subjective time. Journal of Experimental Psychology: Animal Behavior Processes. 1981;7:87–107.
- Gibbon J, Fairhurst S, Church R.M, Kacelnik A. Scalar expectancy theory and choice between delayed rewards. Psychological Review. 1988;95:102–114. doi: 10.1037/0033-295x.95.1.102.
- Gigerenzer G, Todd P.M. Simple heuristics that make us smart. New York: Oxford University Press; 1999.
- Grace R.C. A contextual model of concurrent-chains choice. Journal of the Experimental Analysis of Behavior. 1994;61:113–129. doi: 10.1901/jeab.1994.61-113.
- Guthrie E.R. The psychology of learning. New York: Harper & Row; 1935.
- Herrnstein R.J. Aperiodicity as a factor in choice. Journal of the Experimental Analysis of Behavior. 1964;7:179–182. doi: 10.1901/jeab.1964.7-179.
- Kacelnik A, Marsh B. Cost can increase preference in starlings. Animal Behaviour. 2002;63:245–250.
- Killeen P.R. On the measurement of reinforcement frequency in the study of preference. Journal of the Experimental Analysis of Behavior. 1968;11:263–269. doi: 10.1901/jeab.1968.11-263.
- Killeen P.R. Incentive theory: II. Models for choice. Journal of the Experimental Analysis of Behavior. 1982;38:217–232. doi: 10.1901/jeab.1982.38-217.
- Killeen P.R, Fetterman J.G. A behavioral theory of timing. Psychological Review. 1988;95:274–295. doi: 10.1037/0033-295x.95.2.274.
- Lavery R.J. Past reproductive effort affects parental behaviour in a cichlid fish, Cichlasoma nigrofasciatum: A comparison of inexperienced and experienced breeders with normal and experimentally reduced broods. Behavioral Ecology and Sociobiology. 1995;36:193–199.
- Mazur J.E. Test of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes. 1984;10:426–436.
- Navarro A.D, Fantino E. The sunk cost effect in pigeons and humans. Journal of the Experimental Analysis of Behavior. 2005;83:1–13.
- Neuringer A. Animals respond for food in the presence of free food. Science. 1969;166:399–401. doi: 10.1126/science.166.3903.399.
- Nevin J.A, Grace R.C. Behavioral momentum and the law of effect. Behavioral and Brain Sciences. 2000;23:73–130. doi: 10.1017/s0140525x00002405.
- Nevin J.A, Mandell C, Atak J.R. The analysis of behavioral momentum. Journal of the Experimental Analysis of Behavior. 1983;39:49–59. doi: 10.1901/jeab.1983.39-49.
- Rachlin H. Self-control: Beyond commitment. Behavioral and Brain Sciences. 1995;18:109–159.
- Rachlin H, Battalio R, Kagel J, Green L. Maximization theory in behavioral psychology. Behavioral and Brain Sciences. 1981;4:371–388.
- Rachlin H, Herrnstein R.J. Hedonism revisited: On the negative law of effect. In: Campbell B.A, Church R.M, editors. Punishment and aversive behavior. New York: Appleton-Century-Crofts; 1969. pp. 83–109.
- Shimp C.P. Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior. 1966;9:443–455. doi: 10.1901/jeab.1966.9-443.
- Siegel E, Rachlin H. Soft commitment: Self-control achieved by response persistence. Journal of the Experimental Analysis of Behavior. 1995;64:117–128. doi: 10.1901/jeab.1995.64-117.
- Staddon J.E.R, Higa J.J. Time and memory: Towards a pacemaker-free theory of interval timing. Journal of the Experimental Analysis of Behavior. 1999;71:215–251. doi: 10.1901/jeab.1999.71-215.
- Thorndike E.L. Animal intelligence. New York: Macmillan; 1911.
- Weatherhead P.J. Do savannah sparrows commit the Concorde fallacy? Behavioral Ecology and Sociobiology. 1979;5:373–381.