Journal of the Experimental Analysis of Behavior. 2008 Nov;90(3):283–299. doi: 10.1901/jeab.2008.90-283

REINFORCER ACCUMULATION IN A TOKEN-REINFORCEMENT CONTEXT WITH PIGEONS

Rachelle L Yankelevitz, Christopher E Bullock, Timothy D Hackenberg
PMCID: PMC2582204  PMID: 19070337

Abstract

Four pigeons were exposed to a token-reinforcement procedure with stimulus lights serving as tokens. Responses on one key (the token-production key) produced tokens that could be exchanged for food during an exchange period. Exchange periods could be produced by satisfying a ratio requirement on a second key (the exchange-production key). The exchange-production key was available any time after one token had been produced, permitting up to 12 tokens to accumulate prior to exchange. Token accumulation, measured in terms of both frequency (percent cycles with accumulation) and magnitude (mean number of tokens accumulated), decreased as the token-production ratio increased from 1 to 10 across conditions (with exchange-production ratio held constant), and increased as the exchange-production ratio increased from 1 to 250 across conditions (with token-production ratio held constant). When tokens were removed, accumulation decreased markedly compared to conditions with tokens and the same schedules. These data show that token accumulation is an orderly function of token-production and exchange-production schedules, and they are broadly consistent with a unit-price model based on local and global responses per reinforcer.

Keywords: reinforcer accumulation, token reinforcement, unit price, fixed-ratio schedules, key peck, pigeons


The accumulation of resources for later use is common in everyday life. For example, bees visit and collect nectar from several flowers per foraging trip before traveling to the hive and depositing their cache, kangaroo rats collect multiple seeds in their cheek pockets before returning to the nest, and some foraging birds obtain several prey items before returning to the nest to feed young. Each of these examples is characterized by a three-component sequence: (a) the production of a resource, (b) the accumulation of that resource, and (c) the utilization of the resource. Together, the components constitute larger functional units of activity such as accumulating stores for the winter or birthing season. In ecological terms, the components might be called procurement, caching, and consumption; in economic terms, they might be earning, saving, and collecting/spending. In either case, one might expect accumulation to vary with the costs and benefits of the overall response sequence.

In the laboratory, resource accumulation has been studied by providing opportunities to earn multiple reinforcers before collecting them. In an experiment by Killeen (1974), for example, rats' lever presses produced food that was dispensed in a receptacle located at the opposite end of a straight alley. Once deposited, the food pellet remained in the receptacle until it was collected. Multiple lever presses thus added food pellets to the receptacle. The extent to which rats accumulated food in the receptacle depended on the travel cost, which varied across conditions from 300 to 2400 mm. As the distance between the lever and the food receptacle increased, the number of lever presses emitted between trips to the food receptacle (i.e., reinforcer accumulation) also increased. Killeen's study showed that accumulation varied as an orderly function of one cost embedded within the exchange sequence—that of changing from earning to consuming food.

McFarland and Lattal (2001) expanded upon Killeen's (1974) experiment, examining two dimensions of cost by manipulating both the response requirement of earning and collecting food and the distance between food-production and food-collection levers. Rats responded on two levers, one of which was on a moveable panel. Each completion of a fixed-ratio (FR) requirement on the “earn” lever produced a brief tone and earned an undelivered pellet. At any point after one pellet was earned, completion of a second FR requirement on the “collect” lever resulted in delivery of a food pellet already earned. All earned pellets could be collected successively by repeatedly fulfilling the collect FR requirement. Earn and collect FR requirements were each either FR 1 or FR n (with n varying between 7 and 20), and the distance between the levers was systematically varied across conditions, from 310 mm to 2480 mm. When the collect FR requirement was held constant, accumulation magnitude (average number of pellets accumulated per collection cycle) and frequency (number of cycles with accumulation) increased with distance between the two levers and decreased with increases in the earn FR requirement. When the earn requirement was held constant, accumulation varied directly with the distance between the levers.

Accumulation in these studies may be understood in terms of tradeoffs between reinforcer immediacy and overall reinforcement density (see Cole, 1990). As the changeover requirement (or travel cost) increases, the delay to the upcoming reinforcer also increases. If each food pellet is collected as soon as it is earned, then each food reinforcer requires FR n earn responses, plus the travel between the earn lever and the food receptacle (assuming constant collect/consumption requirements). Earning additional food pellets before collection/consumption results in fewer travel episodes per session, decreasing the overall effort and the average delay per food pellet. Indeed, the most globally efficient response pattern in the long run is to accumulate the maximum number of reinforcers. Conversely, consuming each reinforcer as it is earned (no accumulation) is the pattern with the highest local payoff, the one that minimizes the delay to the upcoming reinforcer. Intermediate levels of accumulation reflect tradeoffs between reinforcer immediacy and overall reinforcer density.

Although previous results are qualitatively consistent with this interpretation, one difficulty is that different components were defined in different units—lever presses and distance traveled. It would be preferable for quantitative purposes to define the costs of the various components in similar units. The present study was designed to accomplish this, examining accumulation as a parametric function of cost variables all measured in the same units: number of key-peck responses. Reinforcers were earned according to one FR schedule, with opportunities to exchange earned reinforcers arranged according to a second FR schedule (analogous to distance in McFarland & Lattal, 2001). An additional FR schedule permitted collection of earned reinforcers.

Defining the separate components as ratio schedules permits a straightforward analysis of the costs and benefits of accumulation in terms of unit price—the number of responses per unit of food reinforcer. For example, consider the unit price associated with accumulation versus nonaccumulation with an earn ratio of 1, a ratio of 50 to move from earning to collecting, and a collect ratio of 1. Accumulating the maximum number of reinforcers available in the present study yields twelve 2-s periods of food access, resulting in a unit price of 3.1 (74 responses for 24-s access to food). This overall unit price compares favorably to that associated with no accumulation (52 responses for 2-s access to food, a unit price of 26), despite the somewhat greater number of responses to the upcoming reinforcer (74 vs. 52). Intermediate levels of accumulation yield intermediate unit prices, permitting a quantitative analysis of the tradeoffs between local and more global reinforcement variables.
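The arithmetic in this example can be written out explicitly. As a sketch, let n be the number of reinforcers accumulated before collection, using the earn ratio of 1, the changeover ratio of 50, the collect ratio of 1, and the 2-s food access per reinforcer described above. The symbol n and the layout are introduced here only for illustration.

```latex
\begin{align*}
\text{unit price}(n)  &= \frac{(1 \cdot n) + 50 + (1 \cdot n)}{2n} \\[4pt]
\text{unit price}(12) &= \frac{12 + 50 + 12}{24} = \frac{74}{24} \approx 3.1 \\[4pt]
\text{unit price}(1)  &= \frac{1 + 50 + 1}{2} = \frac{52}{2} = 26
\end{align*}
```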

The ratio schedules comprising the overall sequence were embedded within a token reinforcement system. Tokens are conditioned reinforcers that can be earned, accumulated, and exchanged for other reinforcers (Kelleher, 1966). Tokens are often manipulable objects, but can be nonmanipulable as well (e.g., points, lights, etc.). In the present study, the tokens consisted of stimulus lamps mounted in a horizontal array above the response keys in a standard operant chamber for pigeons, patterned after similar procedures developed by Jackson and Hackenberg (1996; see also Bullock & Hackenberg, 2006; Foster & Hackenberg, 2004; Foster, Hackenberg, & Vaidya, 2001; Hackenberg & Vaidya, 2003; Pietras & Hackenberg, 2005).

Token-reinforcement schedules consist of three distinct schedule components: (a) the schedule by which tokens are produced (token-production schedule), (b) the schedule by which an exchange period is presented (exchange-production schedule), and (c) the schedule by which tokens are exchanged for food (token-exchange schedule) (Malagodi, Webbe, & Waddell, 1975). On conventional token-reinforcement schedules, the components are arranged successively, in chain-like fashion (Kelleher & Gollub, 1962). In the present experiment, the token-production and exchange-production schedules operated concurrently after the first token was produced, permitting pigeons to choose between producing additional tokens and exchanging earned tokens for food. The token-production and exchange-production ratios were altered systematically across experimental conditions, generating functions relating accumulation to the costs of earning tokens and of producing the exchange periods. The token-production schedule is analogous to the “earn” requirement, and the exchange-production schedule to the travel requirement in prior studies (Killeen, 1974; McFarland & Lattal, 2001). Thus, one might predict that accumulation would vary directly with the exchange-production ratio and inversely with the token-production ratio.

Another aim of the present study was to determine more precisely the discriminative functions of earned reinforcers. In previous studies of accumulation, earnings of food deliveries produced correlated stimuli (e.g., a brief tone), but it is unclear what, if any, effect the cache of accumulated food had on further accumulation. The spatial separation between the food and the food-producing lever may have attenuated discriminative control by the amount of food already earned. The present study was designed to enhance discriminative control over accumulation by the number of recently earned reinforcers through the use of the token reinforcement procedure. Once tokens were earned, they remained visible until exchanged for food, providing clear temporal and spatial markers of food availability and amount. The discriminative role of the tokens can also be readily changed or eliminated by altering the contingency between responding and token display or between token display and food delivery. To evaluate further the discriminative role of the tokens, some conditions were conducted without tokens present.

Method

Subjects

Four male White Carneau pigeons (Columba livia), numbered 38, 907, 2295, and 866, served as subjects. All except Pigeon 2295 had prior experience with token-reinforcement schedules. Pigeons were housed individually under a 16.5-hr/7.5-hr light/dark cycle (lights on from 7:00 a.m. until 11:30 p.m.) and had continuous access to water and grit in their home cages. Pigeons were maintained at 80% ± 20 g of their free-feeding weights by supplementary postsession feeding of mixed grain.

Apparatus

A standard experimental chamber was used. The chamber had inside dimensions of 483 mm long × 356 mm wide × 356 mm high. Three response keys, 25 mm in diameter, were arrayed horizontally across the intelligence panel. The keys could be illuminated yellow, red, or green, and required a force of 0.25 N to be operated. The centers of each key were 108 mm below the ceiling of the chamber and 89 mm apart. Above the three response keys was a row of 12 red lights, which served as tokens. Each token light had a diameter of 13 mm and protruded 13 mm into the chamber. The distance between the centers of each token was 29 mm, and the centers of the token lights were 50 mm above the centers of the response keys. The order of illumination of the tokens was always from left to right. Food delivery consisted of 2-s access to mixed grain through a centrally located rectangular aperture (50 mm × 58 mm), located 120 mm below the middle key. A photocell mounted in the food aperture ensured that each food delivery was of equal duration, timed from head entry into the hopper. A Sonalert was mounted behind the stimulus panel and emitted a 0.1-s tone when tokens were illuminated or extinguished. A white houselight centered above the token array was on throughout the session, and white noise masked external sounds continuously. Contingencies were programmed and data collected using a computer and MED-PC software located in an adjacent room.

Preliminary Training

Because Pigeon 2295 had no previous experimental history, key pecking was shaped by reinforcing with food successive approximations to pecking. The other 3 pigeons required no preliminary key peck training, but all pigeons received several training sessions in which the final performance was established via principles of backward chaining. In the first set of training sessions, each cycle began with the illumination of one stimulus light (token) along with the red center (exchange) key. One response on this exchange key turned off the token, produced 2-s access to food, and began the cycle anew. In the second set of training sessions, one token was illuminated, along with the right green (exchange-production) key. One response on this exchange-production key produced the exchange key, in the presence of which a response turned off the token and produced 2-s access to food. In the third and final set of training sessions, only the left yellow (token-production) key was initially available. One response on this key illuminated a token and the exchange-production key and extinguished the token-production key. One response on the exchange-production key extinguished this key and produced the exchange key, in the presence of which a response turned off the token and produced 2-s access to food. All training sessions ended after 30 reinforcers had been earned. Each pigeon completed all three sets of training sessions within 13 total sessions.

Experimental Procedure

In the terminal procedure, each cycle began with illumination of the left yellow (token-production) key. Satisfying a ratio contingency on this key produced a token. When the first token had been earned, the right green (exchange-production) key was also illuminated. Thus, after one token was earned, responses could be emitted on either the token-production key or the exchange-production key, schedules on each of which were arranged independently. Responses on the token-production key after 12 tokens were earned had no effect. When the ratio requirement on the exchange-production key had been satisfied, both the left and right keys darkened, and the center (exchange) key was illuminated red. Each peck on this exchange key darkened the rightmost lit token and produced a tone and 2-s access to food. This continued until all tokens (maximum of 12) earned that cycle had been exchanged for food, whereupon a new cycle began with illumination of the yellow token-production key.
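The contingencies of the terminal procedure can be summarized as a small state machine. The following is a minimal sketch, not the MED-PC program used in the study. The class name `TokenCycle`, the key labels, and the assumption that each key's ratio count is retained when the pigeon switches keys are all introduced here for illustration.

```python
from dataclasses import dataclass

MAX_TOKENS = 12  # number of token lights in the array


@dataclass
class TokenCycle:
    """Illustrative model of one pigeon's cycle contingencies (hypothetical)."""
    token_fr: int             # token-production ratio (left yellow key)
    exchange_fr: int          # exchange-production ratio (right green key)
    tokens: int = 0           # tokens currently lit
    token_count: int = 0      # pecks counted toward the next token
    exchange_count: int = 0   # pecks counted toward the exchange period
    in_exchange: bool = False
    food_deliveries: int = 0

    def peck(self, key: str) -> None:
        if self.in_exchange:
            # Only the center (exchange) key is lit; token exchange is FR 1.
            if key == "center" and self.tokens > 0:
                self.tokens -= 1
                self.food_deliveries += 1
                if self.tokens == 0:
                    # All tokens exchanged: a new cycle begins.
                    self.in_exchange = False
                    self.token_count = 0
                    self.exchange_count = 0
            return
        if key == "left":
            if self.tokens >= MAX_TOKENS:
                return  # pecks after 12 tokens have no effect
            self.token_count += 1
            if self.token_count == self.token_fr:
                self.tokens += 1
                self.token_count = 0
        elif key == "right" and self.tokens >= 1:
            # The exchange-production key is lit only after the first token.
            self.exchange_count += 1
            if self.exchange_count == self.exchange_fr:
                self.in_exchange = True
```

Under these assumptions, a simulated FR 5/FR 2 cycle with 10 left-key pecks, 2 right-key pecks, and 2 center-key pecks yields two tokens and two food deliveries before the next cycle starts.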

The FR response requirements on the token-production and exchange-production keys were varied independently and parametrically across conditions. The exchange-production ratio was varied from 1 to 250 across conditions at each of three token-production ratios. For Pigeons 38, 907, and 2295, token-production FRs were 1, 5, and 10. For Pigeon 866, responding was too weak to meet stability criteria during FR 10 token-production conditions, so token-production FRs were 1, 2 and 5 for this subject. For ease of exposition, we refer to blocks of token-production conditions as phases. Thus, the exchange-production ratio was varied across conditions within a token-production phase, yielding between 15 and 19 unique conditions (between 4 and 7 exchange-production × 3 token-production) per subject. The token-exchange schedule (the schedule on the center key in the presence of which responses produced food) was held constant at FR 1 throughout the experiment. In general, at least one exchange-production ratio condition was replicated at each token-production ratio. Replications focused on intermediate values of the exchange-production ratio requirement in order to characterize more thoroughly the shape of the function relating exchange-production ratio to accumulation. Additionally, in several conditions tokens were absent. In these no-token conditions, all contingencies were identical to comparable conditions with tokens, except that no tokens or tones were presented. Conditions are denoted by indicating the phase (defined by token-production FR value) followed by the condition (defined by exchange-production FR value), such that FR 1/FR 25 indicates the condition in which the token-production ratio was 1 and the exchange-production ratio was 25.

Table 1 shows the order of conditions and the number of sessions conducted in each. Between the FR 1/FR 1 condition and the FR 1/FR 25 condition, the exchange-production ratio was gradually increased from FR 5 to FR 20 across several sessions. Responding grew too weak to meet stability requirements at the highest token- and exchange-production FR requirements and in some no-token conditions. In these cases, conditions were terminated prematurely, and these data were excluded from the analysis. Because an aim of the current study was to generate functions relating accumulation to the exchange-production ratio, we also excluded data from token-production phases without enough complete conditions to establish a function. This was the case for the highest token-production ratio for each subject. Excluded conditions (denoted with an asterisk) are listed in Table 1 to provide a fuller characterization of the sequence of conditions experienced by each subject.

Table 1.

Sequence of conditions, number of sessions in each condition, mean number of tokens per cycle, percentage of cycles with multiple tokens accumulated, and token-production and exchange-production response rates and preratio pauses for each pigeon.

Pigeon | Token production | Exchange production | Number of sessions | Average number of tokens per cycle | Percent multiple-token exchanges | Token-production responses per minute | Exchange-production responses per minute | Token-production pre-ratio pause | Exchange-production pre-ratio pause
38 | FR 1 | FR 1 | 18 | 2.21 | 74.39 | 51.65 | 39.52 | 1.85 | 1.53
38 | FR 1 | FR 25 | 17 | 3.52 | 83.93 | 24.95 | 104.74 | 6.26 | 2.29
38 | FR 1 | FR 50 | 24 | 5.57 | 97.50 | 29.90 | 119.71 | 8.13 | 2.78
38 | FR 1 | FR 100 | 16 | 4.80 | 97.78 | 16.25 | 120.17 | 15.16 | 3.84
38 | FR 1 | FR 100 NT | 36 | 2.19 | 46.77 | 4.52 | 99.25 | 50.46 | 9.47
38 | FR 1 | FR 50 REP | 22 | 4.44 | 94.18 | 29.90 | 134.75 | 5.60 | 3.08
38 | FR 1 | FR 150 | 40 | 5.72 | 97.50 | 8.33 | 94.97 | 33.62 | 10.25
38 | FR 1 | FR 200* | 30
38 | FR 25 | FR 1* | 8
38 | FR 10 | FR 1 | 12 | 1.01 | 1.03 | 110.42 | 39.91 | 2.00 | 1.51
38 | FR 10 | FR 25 | 10 | 1.02 | 1.04 | 66.27 | 122.21 | 4.79 | 2.17
38 | FR 10 | FR 50 | 16 | 1.04 | 4.27 | 41.35 | 122.36 | 10.93 | 2.87
38 | FR 10 | FR 100 | 20 | 1.61 | 51.64 | 33.04 | 122.26 | 22.69 | 4.50
38 | FR 10 | FR 150* | 61
38 | FR 5 | FR 1 | 11 | 1.00 | 0.00 | 82.44 | 46.39 | 1.69 | 1.31
38 | FR 10 | FR 100 REP | 57 | 1.81 | 64.46 | 17.52 | 117.06 | 54.38 | 5.33
38 | FR 10 | FR 100 NT* | 24
38 | FR 5 | FR 25 | 51 | 1.56 | 53.17 | 39.63 | 115.73 | 9.03 | 2.44
38 | FR 5 | FR 50 | 15 | 1.81 | 77.10 | 34.81 | 125.01 | 11.88 | 2.67
38 | FR 5 | FR 100 | 24 | 3.60 | 88.47 | 21.73 | 100.99 | 36.12 | 6.57
38 | FR 5 | FR 150 | 10 | 4.65 | 98.33 | 14.81 | 98.86 | 64.50 | 11.79
38 | FR 5 | FR 200 | 15 | 6.92 | 96.67 | 9.51 | 94.27 | 207.51 | 16.02
38 | FR 5 | FR 250* | 17
38 | FR 5 | FR 100 REP | 25 | 1.71 | 51.54 | 9.45 | 107.78 | 70.22 | 5.88
38 | FR 5 | FR 100 NT* | 18
907 | FR 1 | FR 1 | 14 | 1.12 | 11.38 | 20.92 | 37.56 | 2.92 | 1.60
907 | FR 1 | FR 25 | 23 | 8.29 | 94.29 | 70.88 | 90.22 | 3.00 | 4.29
907 | FR 1 | FR 50 | 20 | 9.33 | 100.00 | 65.85 | 94.88 | 4.64 | 5.59
907 | FR 1 | FR 100 | 48 | 5.06 | 92.73 | 18.77 | 110.76 | 11.91 | 3.98
907 | FR 1 | FR 100 NT | 18 | 2.15 | 36.11 | 7.31 | 91.22 | 29.94 | 3.64
907 | FR 1 | FR 50 REP | 13 | 8.96 | 100.00 | 49.52 | 83.03 | 5.01 | 3.46
907 | FR 1 | FR 150 | 32 | 10.11 | 100.00 | 14.49 | 60.58 | 25.29 | 18.14
907 | FR 25 | FR 1* | 32
907 | FR 25 | FR 25* | 18
907 | FR 25 | FR 50* | 23
907 | FR 25 | FR 100* | 5
907 | FR 10 | FR 1 | 11 | 1.01 | 0.51 | 99.76 | 42.45 | 3.74 | 1.42
907 | FR 10 | FR 25 | 10 | 1.03 | 3.10 | 54.88 | 111.66 | 6.71 | 2.19
907 | FR 10 | FR 50 | 28 | 1.63 | 50.14 | 29.61 | 69.46 | 24.40 | 6.11
907 | FR 10 | FR 100 | 34 | 3.06 | 93.96 | 25.19 | 107.53 | 45.08 | 4.73
907 | FR 10 | FR 100 NT | 21 | 1.92 | 15.55 | 16.72 | 103.50 | 46.46 | 3.12
907 | FR 10 | FR 100 REP | 47 | 3.49 | 89.78 | 16.72 | 72.51 | 86.48 | 7.59
907 | FR 10 | FR 150 | 29 | 7.06 | 96.67 | 13.31 | 77.07 | 193.54 | 6.17
907 | FR 10 | FR 200 | 17 | 8.00 | 100.00 | 18.82 | 89.94 | 184.94 | 9.74
907 | FR 10 | FR 250 | 12 | 8.87 | 100.00 | 9.40 | 73.62 | 426.78 | 16.27
907 | FR 5 | FR 1 | 12 | 1.00 | 0.00 | 61.91 | 44.39 | 4.07 | 1.36
907 | FR 5 | FR 25 | 16 | 1.47 | 39.93 | 36.96 | 79.66 | 8.01 | 3.66
907 | FR 5 | FR 50 | 47 | 3.93 | 94.36 | 34.04 | 61.88 | 16.43 | 5.74
907 | FR 5 | FR 100 | 14 | 5.84 | 100.00 | 33.95 | 90.80 | 23.13 | 5.15
907 | FR 5 | FR 150 | 10 | 10.03 | 100.00 | 26.06 | 77.80 | 31.03 | 5.14
907 | FR 5 | FR 200 | 10 | 11.40 | 100.00 | 25.50 | 91.16 | 71.72 | 9.99
907 | FR 5 | FR 250 | 12 | 7.96 | 100.00 | 16.68 | 96.82 | 80.02 | 10.38
907 | FR 5 | FR 100 REP | 10 | 3.11 | 87.99 | 23.55 | 96.81 | 24.15 | 4.88
907 | FR 5 | FR 100 NT* | 23
2295 | FR 1 | FR 1 | 14 | 1.10 | 8.27 | 30.73 | 54.67 | 2.11 | 1.24
2295 | FR 1 | FR 25 | 29 | 1.50 | 19.25 | 17.52 | 147.88 | 4.80 | 1.73
2295 | FR 1 | FR 50 | 30 | 8.17 | 100.00 | 22.70 | 140.10 | 13.06 | 2.13
2295 | FR 1 | FR 100 | 18 | 6.87 | 96.67 | 12.33 | 136.77 | 20.51 | 2.65
2295 | FR 1 | FR 100 NT | 19 | 4.28 | 75.00 | 9.37 | 115.90 | 23.97 | 4.79
2295 | FR 1 | FR 50 REP | 14 | 6.55 | 100.00 | 23.93 | 135.69 | 13.86 | 2.51
2295 | FR 25 | FR 1* | 12
2295 | FR 1 | FR 150 | 18 | 11.40 | 100.00 | 17.41 | 90.72 | 32.98 | 10.60
2295 | FR 25 | FR 25* | 12
2295 | FR 25 | FR 50* | 14
2295 | FR 25 | FR 100* | 30
2295 | FR 25 | FR 150* | 10
2295 | FR 10 | FR 1 | 11 | 1.00 | 0.00 | 114.93 | 52.63 | 2.43 | 1.14
2295 | FR 10 | FR 25 | 11 | 1.01 | 0.51 | 58.47 | 143.92 | 6.98 | 1.48
2295 | FR 10 | FR 50 | 12 | 1.00 | 0.00 | 39.38 | 129.83 | 11.80 | 2.74
2295 | FR 10 | FR 100 | 29 | 2.03 | 42.21 | 31.04 | 111.86 | 29.96 | 6.46
2295 | FR 10 | FR 150 | 29 | 2.12 | 52.21 | 16.65 | 109.78 | 61.95 | 16.68
2295 | FR 10 | FR 200* | 90
2295 | FR 10 | FR 150 REP | 20 | 1.89 | 34.76 | 6.26 | 74.29 | 235.25 | 30.44
2295 | FR 10 | FR 100 REP | 45 | 1.35 | 21.08 | 12.29 | 89.39 | 65.80 | 16.27
2295 | FR 10 | FR 100 NT* | 14
2295 | FR 5 | FR 1 | 10 | 1.03 | 2.67 | 72.24 | 48.39 | 2.08 | 1.24
2295 | FR 5 | FR 25 | 10 | 1.00 | 0.00 | 33.02 | 132.95 | 6.84 | 1.57
2295 | FR 5 | FR 50 | 12 | 1.01 | 0.51 | 17.05 | 121.53 | 15.52 | 2.95
2295 | FR 5 | FR 100 | 17 | 2.49 | 41.81 | 19.68 | 106.92 | 33.31 | 9.15
2295 | FR 5 | FR 150 | 23 | 4.90 | 76.51 | 7.81 | 68.37 | 176.13 | 46.74
2295 | FR 5 | FR 200 | 38 | 6.00 | 97.14 | 4.96 | 51.63 | 240.17 | 72.19
2295 | FR 5 | FR 150 REP | 13 | 7.11 | 100.00 | 9.33 | 47.76 | 122.06 | 49.55
2295 | FR 5 | FR 150 NT* | 36
866 | FR 1 | FR 1 | 14 | 1.04 | 2.65 | 38.91 | 48.18 | 1.59 | 1.27
866 | FR 1 | FR 25 | 13 | 1.12 | 11.86 | 9.66 | 69.39 | 6.89 | 5.18
866 | FR 1 | FR 50 | 18 | 1.76 | 58.36 | 6.75 | 79.32 | 13.03 | 5.51
866 | FR 1 | FR 100 | 18 | 5.43 | 96.92 | 8.05 | 48.75 | 22.29 | 8.91
866 | FR 1 | FR 100 NT | 55 | 1.16 | 7.74 | 1.31 | 72.92 | 75.95 | 12.09
866 | FR 1 | FR 50 REP | 65 | 1.63 | 59.48 | 9.60 | 106.43 | 9.69 | 5.12
866 | FR 1 | FR 150 | 20 | 7.83 | 100.00 | 12.92 | 63.30 | 46.55 | 19.92
866 | FR 1 | FR 200 | 20 | 10.10 | 100.00 | 0.87 | 60.12 | 727.81 | 36.17
866 | FR 10 | FR 1* | 10
866 | FR 10 | FR 25* | 10
866 | FR 10 | FR 50* | 10
866 | FR 10 | FR 100* | 13
866 | FR 5 | FR 1 | 10 | 1.00 | 0.00 | 77.53 | 46.75 | 1.85 | 1.29
866 | FR 5 | FR 25 | 12 | 1.00 | 0.00 | 42.36 | 135.08 | 4.97 | 1.55
866 | FR 5 | FR 50 | 11 | 1.05 | 4.85 | 20.23 | 116.17 | 13.02 | 3.13
866 | FR 5 | FR 100 | 27 | 1.07 | 6.08 | 11.52 | 112.19 | 23.33 | 4.90
866 | FR 2 | FR 1 | 16 | 1.01 | 0.51 | 50.89 | 41.32 | 1.48 | 1.45
866 | FR 2 | FR 25 | 12 | 1.03 | 3.37 | 19.56 | 110.93 | 5.41 | 2.37
866 | FR 2 | FR 50 | 38 | 1.77 | 73.12 | 18.30 | 136.13 | 9.86 | 4.60
866 | FR 2 | FR 100 | 12 | 1.86 | 75.65 | 8.79 | 127.02 | 24.23 | 6.06
866 | FR 2 | FR 150 | 11 | 6.09 | 100.00 | 9.57 | 104.34 | 45.92 | 11.29
866 | FR 2 | FR 200 | 36 | 5.68 | 90.00 | 2.35 | 63.33 | 264.08 | 25.35
866 | FR 2 | FR 100 NT | 31 | 1.61 | 25.09 | 4.24 | 128.23 | 40.67 | 3.09
866 | FR 2 | FR 50 NT* | 5
Note. NT = no-token condition; REP = replication; * = condition excluded from the analysis (only the number of sessions is reported).

Sessions ended following 40 food deliveries or 75 minutes, whichever came first. Because up to 12 tokens could be exchanged for food in a given exchange cycle, the exact number of food deliveries per session could vary from 40 (if the final exchange cycle began after 28 food deliveries) to 51 (if the final exchange cycle began after 39 food deliveries). Conditions lasted for a minimum of 10 sessions, and remained in effect until accumulation magnitude (average number of tokens accumulated per cycle) and frequency (percent of cycles with multiple tokens exchanged) were deemed stable, defined as five consecutive sessions that did not include the highest or lowest points of the condition and were free of monotonic trends.
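The stability criterion can be expressed as a short check on a per-session series. This is one possible reading of the rule, offered as a sketch rather than the procedure the authors actually ran. The function name `is_stable` is hypothetical, ties with an earlier extreme are assumed to be acceptable, and "monotonic trend" is assumed to mean strictly increasing or strictly decreasing across the five sessions.

```python
def is_stable(values, window=5, min_sessions=10):
    """Return True if the last `window` sessions meet the assumed criterion.

    values: one accumulation measure per session (magnitude or frequency),
    in the order the sessions were run within a condition.
    """
    if len(values) < min_sessions or len(values) <= window:
        return False
    last, earlier = values[-window:], values[:-window]

    # Assumption: the window "includes" the condition's highest or lowest
    # point only if it sets a new extreme relative to earlier sessions.
    sets_new_high = max(last) > max(earlier)
    sets_new_low = min(last) < min(earlier)

    # Assumption: a monotonic trend is a strictly rising or falling run.
    diffs = [b - a for a, b in zip(last, last[1:])]
    monotonic = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)

    return not (sets_new_high or sets_new_low or monotonic)
```

Because the study required both magnitude and frequency to be stable, a condition would end under this sketch only when `is_stable` holds for both series.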

Results

All analyses are based on the last five sessions of each condition. The main measures of accumulation, frequency and magnitude, are presented in Table 1 and summarized in Figures 1 and 2.

Fig 1.

Percentage of exchange periods during which multiple tokens were exchanged as a function of token-production and exchange-production ratios, for each pigeon averaged across the final five sessions of each condition. Error bars are standard deviations. Token-production ratios are depicted across columns, and exchange-production ratios within columns. Replications are represented by detached, unfilled points, and no-token conditions by detached, filled points. Replication and no-token condition symbols are slightly horizontally displaced to avoid overlapping data points.

Fig 2.

Mean tokens per exchange as a function of token-production and exchange-production ratios for each pigeon across the final five sessions of each condition. All other details are as in Figure 1.

Figure 1 shows mean accumulation frequency for each subject at each token-production FR and as a function of the exchange-production FR. Increasing the exchange-production FR generally increased accumulation at each token-production FR (comparing within columns in Figure 1). Increasing token-production FR generally decreased accumulation at a given exchange-production ratio (comparing across columns in Figure 1). Results of replicated conditions (unconnected open symbols) were similar to the original conditions, especially at the lower token-production ratios. Removal of the tokens (unconnected filled symbols) produced marked decreases in accumulation frequency when compared to standard token conditions for all pigeons. Averaged across pigeons in the FR 1/FR 100 condition, accumulation frequency was 96.0% with tokens vs. 41.4% without tokens.

Figure 2 shows mean accumulation magnitude for individual subjects at each token-production FR and as a function of the exchange-production ratio. In general, within each token-production FR, increasing the exchange-production ratio increased the number of tokens accumulated per exchange. Within each exchange-production FR, increasing the token-production FR decreased the number of tokens produced per exchange. Three of the four pigeons (38, 907, and 2295) accumulated fewer tokens on average during the FR 1/FR 100 condition than at lower exchange-production response requirements within the same token-production requirement, but accumulation returned to high levels in the subsequent FR 1/FR 150 condition. Replicated conditions yielded data similar to the original conditions for all pigeons. Removal of the tokens resulted in large decreases in accumulation magnitude, except for Pigeon 866 under FR 2/FR 100 and Pigeon 907 under FR 10/FR 100, conditions in which accumulation magnitudes were close to zero even with tokens. Averaged across pigeons in the FR 1/FR 100 condition, pigeons accumulated 5.5 reinforcers per cycle with tokens vs. 2.4 without tokens.

Figure 3 shows obtained unit prices (responses per reinforcer) associated with different patterns of accumulation as a function of token-production FR size and exchange-production FR size. (Responses on the token-production key after 12 tokens had been earned had no programmed contingencies but were included in the calculations. Such responses, although somewhat more frequent at higher exchange-production ratios, occurred with sufficiently low frequency that unit price calculations using obtained versus programmed responses per reinforcer were extremely similar.) The patterns of accumulation can be analyzed in cost-benefit terms, according to the following equation:

$$P = \frac{R_t + R_p + R_x}{A} \tag{1}$$

where P is the unit price associated with the accumulation pattern, and A is the total seconds' access to food earned in the cycle. Rt is the number of responses allocated towards the token-production schedule, Rp is the number of responses allocated towards the exchange-production schedule, and Rx is the number of responses allocated to the token-exchange schedule, with each term in the numerator being summed across the entire cycle. The reference lines in Figure 3 are theoretical functions depicting unit prices associated with no accumulation (dashed lines) and maximum accumulation (solid lines). The former corresponds to sensitivity to the upcoming reinforcer, or minimizing local responses per unit of food. The latter corresponds to sensitivity to overall responses per reinforcer, or minimizing global unit price.
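The two reference lines can be computed directly from the programmed ratios. The sketch below assumes Equation 1 has the form P = (Rt + Rp + Rx)/A implied by the definitions above, with token exchange held at FR 1 and 2-s food access per token as in the Method. The function names and default arguments are introduced here for illustration.

```python
def unit_price(r_t, r_p, r_x, access_s):
    """Equation 1 (assumed form): responses per second of food access."""
    return (r_t + r_p + r_x) / access_s


def programmed_unit_price(token_fr, exchange_fr, n_tokens,
                          token_exchange_fr=1, access_per_token_s=2.0):
    """Programmed unit price for a cycle in which n_tokens are accumulated."""
    r_t = token_fr * n_tokens            # token-production responses
    r_p = exchange_fr                    # exchange-production responses
    r_x = token_exchange_fr * n_tokens   # token-exchange responses
    return unit_price(r_t, r_p, r_x, access_per_token_s * n_tokens)


def reference_lines(token_fr, exchange_frs, max_tokens=12):
    """Return (no-accumulation, maximum-accumulation) prices per exchange FR."""
    local = [programmed_unit_price(token_fr, e, 1) for e in exchange_frs]
    global_ = [programmed_unit_price(token_fr, e, max_tokens)
               for e in exchange_frs]
    return local, global_
```

For example, `programmed_unit_price(1, 50, 1)` gives 26.0 and `programmed_unit_price(1, 50, 12)` gives about 3.1, the two values from the worked example in the introduction.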

Fig 3.

Programmed and obtained unit prices as a function of token-production and exchange-production ratios for each pigeon. Obtained points were averaged across the final five sessions of each condition. The reference lines correspond to theoretical unit prices resulting from minimization of local responses per food delivery (dashed lines) and minimization of global responses per food delivery (solid lines). See text for other details.

For most pigeons, in conditions with tokens, obtained unit prices were closer to those predicted by global minimization than by local minimization, especially at the higher exchange-production ratios and the lower token-production ratios. Obtained unit prices were in closer accord with global minimization in 81% (17 out of 21) of the conditions at the smallest token-production ratio (FR 1 for all pigeons), in 67% (16 out of 24) of the conditions at the middle token-production ratio (FR 2 or FR 5), and in 35% (7 out of 20) of the conditions at the highest token-production ratio (FR 5 or FR 10). All FR 1 exchange-production conditions are excluded from this analysis because the predictions of local and global analyses nearly converge at this point. Replicated values approximated their original values, and no-token conditions yielded performances more in accord with local minimization than did standard token conditions (a consequence of lower accumulation).

In Figure 3, unit price is computed across an entire cycle, comparing the payoffs of earning 1 or 12 tokens, corresponding to no accumulation or maximum accumulation, respectively. Equation 1 may also be applied more locally, at different points in a sequence of accumulated tokens. In other words, accumulation may be conceptualized as a series of choices between exchanging already-earned tokens versus earning an additional token. At each choice point the unit price associated with exchanging the already-earned tokens for food can be compared to the unit price of earning one additional token before exchange. Calculating unit price more locally predicts that successive tokens will be earned until the unit price obtained by exchanging those earned tokens is less than or equal to the unit price obtained by earning one additional token. In other words, accumulation will continue until the marginal utility of exchanging exceeds the marginal utility of earning one more token.

In this more local calculation, Rt is the number of upcoming token-production responses required by the choice sequence under consideration rather than the total responses allocated towards the token-production schedule across the entire cycle. If the subject exchanges already-earned tokens, no future token-production responses are required that cycle (hence, Rt = 0); however, if one additional token is earned, then Rt equals the number of responses in the token-production FR requirement. For example, in the FR 5/FR 50 condition, after earning the ninth token in a cycle, the unit price associated with exchanging already-earned tokens equals (0 + 50 + 9)/(2*9), or 3.3. This unit price of exchanging can be compared to the unit price of earning one additional token before exchange, which is (5 + 50 + 10)/(2*10), or 3.25. Because the unit price of exchanging exceeds the unit price of earning one more token, the model predicts further accumulation. Iterating these calculations at the choice point after production of the 10th token shows that exchanging now yields a unit price, (0 + 50 + 10)/(2*10) = 3.0, that is no longer greater than the unit price of earning one more token, (5 + 50 + 11)/(2*11) = 3.0, so the model predicts the subject will stop accumulating tokens at this point.
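The iterative comparison described here can be sketched in a few lines. The code assumes the stopping rule stated in the text (keep earning until the unit price of exchanging is less than or equal to the unit price of earning one more token), token exchange at FR 1, 2-s food access per token, and the 12-token ceiling. The function names are hypothetical, and exact fractions are used so that ties are detected without rounding error.

```python
from fractions import Fraction


def local_prices(token_fr, exchange_fr, n_earned,
                 token_exchange_fr=1, access_per_token_s=2):
    """Unit prices at a choice point with n_earned tokens already earned.

    Returns (price of exchanging now, price of earning one more token first).
    """
    exchange_now = Fraction(
        0 + exchange_fr + token_exchange_fr * n_earned,
        access_per_token_s * n_earned)
    earn_one_more = Fraction(
        token_fr + exchange_fr + token_exchange_fr * (n_earned + 1),
        access_per_token_s * (n_earned + 1))
    return exchange_now, earn_one_more


def predicted_accumulation(token_fr, exchange_fr, max_tokens=12):
    """Predicted tokens per cycle under the local unit-price rule."""
    n = 1  # the exchange-production key is available after the first token
    while n < max_tokens:
        exchange_now, earn_one_more = local_prices(token_fr, exchange_fr, n)
        if exchange_now <= earn_one_more:
            break  # exchanging is at least as cheap: stop accumulating
        n += 1
    return n
```

For the FR 5/FR 50 example, `local_prices(5, 50, 9)` returns 59/18 (about 3.3) and 13/4 (3.25), and `predicted_accumulation(5, 50)` returns 10. With a token-production ratio of 1 and the same exchange-production ratio, the prediction is truncated at 12.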

Figure 4 shows the predicted (lines without points) and obtained (lines with points) number of tokens accumulated at each token-production and exchange-production ratio with unit price computed locally, as described above. The obtained data are the same data portrayed in Figure 2, averaged across subjects (excluding Pigeon 866, for whom a different range of token-production ratios was studied). Because the maximum number of tokens that could be accumulated was 12, the predicted number of tokens at each unit price was truncated at 12. At a given token-production ratio, the model predicts that accumulation will vary directly with the exchange-production ratio. Conversely, at a given exchange-production ratio, accumulation will vary inversely with the token-production ratio. The obtained number of tokens accumulated per cycle is in ordinal agreement with the predictions of Equation 1, locally construed, although the equation overestimates the number of tokens accumulated.

Fig 4.

Mean tokens accumulated per exchange cycle across exchange-production ratios, averaged across subjects and the final five sessions of each condition. Separate data paths depict different token-production FR values: FR 1 (circles), FR 5 (squares), and FR 10 (triangles). Also shown are theoretical functions (lines without symbols), corresponding to the predictions of Equation 1, locally construed. Separate line styles depict the theoretical functions for different token-production FR values: FR 1 (long dashes), FR 5 (dashes and dots), and FR 10 (dots). See text for other details.

Apart from accumulation, response rates and preratio pausing engendered by the token-production and exchange-production ratios are also of interest. These measures are included in Table 1 and the response-rate data are summarized in Figure 5. This figure shows mean responses per minute in token-production and exchange-production components as a function of token-production and exchange-production FR values. Response rates are calculated as the number of responses on each key divided by the total time spent responding on each key, excluding the initial preratio pauses each cycle. Token-production response rates generally decreased as the exchange-production schedule increased; exchange-production response rates increased across lower exchange-production ratios and then decreased at the highest exchange-production ratios. Changes in the token-production ratio did not systematically affect response rate. Rates of responding were consistently higher in exchange-production than in token-production components except when the exchange-production schedule was FR 1. Replications produced response rates close to those from the original condition.
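The response-rate measure defined above reduces to a simple calculation, sketched here in Python. The function name and the input values are hypothetical and are not taken from Table 1. The sketch also assumes that the time spent responding on a key is the time in that component minus the initial preratio pause.

```python
def responses_per_minute(responses, component_seconds, preratio_pause_seconds):
    """Responses on a key divided by responding time, excluding the initial preratio pause."""
    responding_seconds = component_seconds - preratio_pause_seconds
    if responding_seconds <= 0:
        raise ValueError("responding time must be positive after removing the pause")
    return responses / (responding_seconds / 60.0)


if __name__ == "__main__":
    # Hypothetical values: 120 pecks in a 70-s component that began with a 10-s pause.
    print(responses_per_minute(120, 70, 10))
```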

Fig 5.

Mean responses per minute in the token-production component (triangles) and the exchange-production component (squares) as a function of token-production and exchange-production ratios for each pigeon across the final five sessions of each condition. Error bars are standard deviations. Unconnected points represent replication conditions.

Due to the concurrent presentation of the token- and exchange-production keys, pigeons could switch back to the token-production key after making responses on the exchange-production key but before fulfilling the exchange-production ratio requirement of that cycle. Switching back was not systematically related to FR value on either key and was infrequent, with most occurrences in no-token conditions in which no signals indicated token-production FR completions.

Discussion

Accumulation of reinforcers was systematically related to the contingencies whereby tokens and exchange opportunities were made available. The frequency and magnitude of accumulation varied directly with the exchange-production ratio, and inversely with token-production ratio. These effects are seen most clearly in Figures 1 and 2 by comparing accumulation both: (a) within-phase, at different exchange-production FRs at a given token-production FR; and (b) across-phase, at different token-production ratios at a given exchange-production ratio.

The present findings are generally consistent with previous research on reinforcer accumulation (Killeen, 1974; McFarland & Lattal, 2001), but extend it in some important ways. First, the costs were varied over a much wider parametric range, generating functions relating accumulation both to the costs of earning reinforcers (token-production ratio) and to the costs of switching from earning to collecting (exchange-production ratio). Second, and more importantly, the present study defined all costs in similar currency: number of key-peck responses. This not only permitted more balanced comparisons between the different components of accumulation, but also facilitated a quantitative analysis of accumulation in cost-benefit terms (Equation 1, Figures 3 and 4).

One way the present results differed from previous results was in the overall levels of accumulation. Measured either in terms of frequency (percentage of cycles with accumulation) or magnitude (mean number of tokens accumulated), accumulation was somewhat greater in the present study than in previous studies. This was perhaps due to enhanced discriminative control. Illumination of a token upon completion of each token-production FR served as a stimulus correlated with each food reinforcer. The influence of added stimuli on accumulation was shown by Cole (1990) in an experiment comparing signaled and unsignaled accumulation. In the first phase, lever presses with interresponse times (IRTs) less than 1 s were reinforced with food pellets. This IRT limit was sufficiently short to prevent consumption of food pellets between responses in a single pellet production-consumption cycle. The delivery of each food pellet was accompanied by a brief visual stimulus (darkening of the cue light). In the second phase, short IRTs also earned food pellets, but there were no visual stimuli associated with pellet accumulation, and delivery of pellets was delayed; instead of being delivered individually after each response, all pellets were delivered at once when the IRT requirement was exceeded. Rats saved more food pellets under signaled than unsignaled conditions.

The current study found these same relations between accumulation and the presence of added stimuli. All pigeons accumulated less frequently (Figure 1), accumulated fewer tokens (Figure 2), and obtained higher unit prices (Figure 3), in the absence of tokens. In contrast to earlier reinforcer-accumulation experiments, cumulative signals (in the form of successively illuminated token lights) were used to indicate the number of accumulated reinforcers in a continuous manner. These illuminated token lights served a discriminative function, correlated with amount of food available in exchange. That the stimuli were extended through time and available during the entire token-production component may have enhanced their discriminative function, contributing to the increased accumulation relative to previous experiments.

Such discriminative functions resemble those seen in extended-chained schedules, including token reinforcement schedules, suggesting that response patterns in the current study may be similar to those in other sequential-schedule arrangements. Response rates were higher on the exchange-production schedule than on the token-production schedule (Figure 5). This effect parallels that seen in extended-chained and second-order token schedules with ratio components, where response rates are higher in the presence of stimuli more proximal to food (Bullock & Hackenberg, 2006; Foster et al., 2001; Jwaideh, 1973; Kelleher, 1958). That token-production response rates decreased at higher exchange-production ratios can also be understood in chained-schedule terms: These are the conditions that generated higher levels of accumulation, hence, a greater number of links per cycle. This, too, is consistent with prior results showing that response rates vary inversely with the number of links per chain (Jwaideh, 1973).

Within each schedule component, effects also resembled prior results. Response rates on the exchange-production key were a bitonic function of exchange-production ratio size, resembling response patterns on simple FR schedules (Mazur, 1983). Conversely, response rates increased slightly as the token-production ratio increased. Although seemingly at odds with prior results with FR token-production ratios (Bullock & Hackenberg, 2006; Kelleher, 1958), those studies used much higher values (FR 25 to 100), and the function relating response rates to token-production ratio often did not turn over until FR 50 or higher. The token-production ratios used in the present study were likely on the ascending leg of the bitonic function.

Examining accumulation in a token-reinforcement context allowed investigation of accumulation not of food itself, but of a type of currency exchangeable for food. This context comes closer than nontoken procedures to replicating an economic system in which one resource (tokens) is depleted as it is used to gain another resource (food). Human economic decisions frequently center around how much of a resource should be saved before spending. Often, the monetary value of a commodity is not the only cost involved in its procurement; travel time, opportunity costs (such as interest and lost wages), and physical effort required may all influence a purchasing decision and its timing. The point at which accumulation stops is the point at which the value of the terminal reinforcer exceeds the marginal value of saving additional resources. Questions about the influence of costs on saving can be examined using operant procedures like those used here.

In a similar economic vein, one explanation for the accumulation observed at higher ratios in the current study involves tradeoffs between immediate and delayed reinforcement variables. As shown in the cost-benefit analysis (Figure 3), the unit price associated with a token-accumulation cycle can be computed as overall responses per reinforcer. Accumulating all 12 tokens before completing the exchange-production requirement results in the lowest overall price; in contrast, earning only 1 token before completing the exchange-production requirement minimizes the number of responses before the first upcoming reinforcer. Each additional reinforcer accumulated per cycle, while increasing the local responses per reinforcer, decreases the session-wide responses per reinforcer. As the cost-decreasing contribution of each saved token increased with increases in the exchange-production ratio, pigeons' choices were more controlled by the overall cost per reinforcer at the expense of reinforcer immediacy. Conceptualizing accumulation more locally as a series of choices between exchanging already-earned tokens and earning one subsequent token yields predictions about the number of tokens accumulated per cycle. Accumulation varied across token- and exchange-production values in a pattern broadly consistent with this model.
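The tradeoff between overall cost and reinforcer immediacy can be made concrete with a short Python sketch. It is illustrative only. The function names are hypothetical, the factor of 2 in the denominator is carried over from the worked unit-price example (assumed to be the amount of food per token), and the count of responses before the first food delivery assumes one exchange response per token. The overall price is taken as total token-production, exchange-production, and exchange responses per cycle divided by total food.

```python
from fractions import Fraction

FOOD_PER_TOKEN = 2  # denominator factor from the worked example (assumed: food amount per token)


def global_unit_price(token_fr, exchange_fr, tokens):
    """Overall responses per unit of food for a cycle in which `tokens` are accumulated."""
    total_responses = tokens * token_fr + exchange_fr + tokens
    return Fraction(total_responses, FOOD_PER_TOKEN * tokens)


def responses_before_first_food(token_fr, exchange_fr, tokens):
    """Responses emitted before the first food delivery of the cycle (assumed form)."""
    return tokens * token_fr + exchange_fr + 1


def saving_from_next_token(token_fr, exchange_fr, tokens):
    """Drop in overall unit price from accumulating one more token."""
    current = global_unit_price(token_fr, exchange_fr, tokens)
    return current - global_unit_price(token_fr, exchange_fr, tokens + 1)


if __name__ == "__main__":
    for tokens in (1, 6, 12):
        price = float(global_unit_price(5, 50, tokens))
        wait = responses_before_first_food(5, 50, tokens)
        print(tokens, round(price, 2), wait)
```

Under these assumptions the overall price falls with every added token while the responses before the first food delivery rise, and the saving from each added token grows with the exchange-production ratio.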

The present findings are also related to preference reversals seen in the self-control realm. Accumulation, the more globally efficient pattern of responding, is akin to choosing the larger but more delayed reinforcer, but the associated cost is that the number of responses to be emitted before the next reinforcer must increase. Rachlin and Green (1972) showed that pigeons preferred a smaller, more immediate reinforcer over a larger delayed reinforcer when the delay between choices and reinforcers was relatively short, but that preferences reversed in favor of the larger reinforcer when choices were scheduled sufficiently in advance of the reinforcers. This preference reversal has subsequently been replicated across a range of procedures and species (e.g., Ainslie & Herrnstein, 1981; Green & Snyderman, 1980; Rachlin, Castrogiovanni & Cross, 1987). Increasing the exchange-production requirement in the present experiment is analogous to moving the choice further from the exchange (measured either in terms of time or work required). Increasing these requirements produced graded shifts in preferences from the immediate small reinforcer (no accumulation) to the larger delayed one (maximum accumulation). As Cole (1990) and others have noted, reinforcer accumulation procedures provide an alternative way to study choice with contrasting short-term and longer-term consequences.

Such tradeoffs between short- and long-term consequences also operate in natural environments outside the laboratory. Accumulating food or other commodities for later use is part of the natural foraging patterns of many species, including hamsters, bees, and ants, to name just a few. These species are biologically predisposed toward accumulation, the selective contingencies having operated over evolutionary time. Although experiences have been shown to influence the timing and degree of cache building (Fantino & Cabanac, 1984; Bartness & Clein, 1994), it is often difficult to disentangle ontogenetic from phylogenetic variables. Pigeons, on the other hand, which do not normally accumulate food in naturalistic settings, are an ideal species with which better to understand the role of experiences in accumulation. To the extent that accumulation can be generated in the absence of phylogenetic contingencies that select for it, the behavior must reflect specific learning histories. This may shed light on the local behavioral mechanisms, or “rules of thumb” by which accumulation develops under more naturalistic conditions outside the laboratory.

Acknowledgments

Authorship between the first two authors is equal. The research was supported by grants SES 9982452 and IBN 0420747 from the National Science Foundation, and was part of a Master's Thesis submitted by the first author to the Graduate School at the University of Florida. We are grateful to Marc Branch, Hank Pennypacker, and Alan Spector for their thoughtful comments on this manuscript, and to Theresa A. Foster and Anthony DeFulio for assistance in running experimental sessions. Portions of these data were presented at the 2003 meeting of the Southeastern Association for Behavior Analysis and the 2003, 2004, and 2006 meetings of the Association for Behavior Analysis. Chris Bullock is now at the United States Army Medical Research Institute of Chemical Defense.

References

  1. Ainslie G, Herrnstein R.J. Preference reversal and delayed reinforcement. Animal Learning & Behavior. 1981;9:476–482.
  2. Bartness T.J, Clein M.R. Effects of food deprivation and restriction, and metabolic blockers on food hoarding in Siberian hamsters. American Journal of Physiology – Regulatory, Integrative and Comparative Physiology. 1994;266:R1111–R1117. doi: 10.1152/ajpregu.1994.266.4.R1111.
  3. Bullock C.E, Hackenberg T.D. Second-order schedules of token reinforcement with pigeons: Implications for unit price. Journal of the Experimental Analysis of Behavior. 2006;85:95–106. doi: 10.1901/jeab.2006.116-04.
  4. Cole M.R. Operant hoarding: A new paradigm for the study of self-control. Journal of the Experimental Analysis of Behavior. 1990;53:247–261. doi: 10.1901/jeab.1990.53-247.
  5. Fantino M, Cabanac M. Effect of a cold ambient temperature on the rat's food hoarding behavior. Physiology & Behavior. 1984;32:183–190. doi: 10.1016/0031-9384(84)90127-6.
  6. Foster T.A, Hackenberg T.D. Unit price and choice in a token reinforcement context. Journal of the Experimental Analysis of Behavior. 2004;81:5–25. doi: 10.1901/jeab.2004.81-5.
  7. Foster T.A, Hackenberg T.D, Vaidya M. Second-order schedules of token reinforcement with pigeons: Effects of fixed- and variable-ratio exchange schedules. Journal of the Experimental Analysis of Behavior. 2001;76:159–178. doi: 10.1901/jeab.2001.76-159.
  8. Green L, Snyderman M. Choice between rewards differing in amount and delay: Toward a choice model of self control. Journal of the Experimental Analysis of Behavior. 1980;34:135–147. doi: 10.1901/jeab.1980.34-135.
  9. Hackenberg T.D, Vaidya M. Determinants of pigeons' choices in token-based self-control procedures. Journal of the Experimental Analysis of Behavior. 2003;79:207–218. doi: 10.1901/jeab.2003.79-207.
  10. Jackson K, Hackenberg T.D. Token reinforcement, choice, and self-control in pigeons. Journal of the Experimental Analysis of Behavior. 1996;66:29–49. doi: 10.1901/jeab.1996.66-29.
  11. Jwaideh A.R. Responding under chained and tandem fixed-ratio schedules. Journal of the Experimental Analysis of Behavior. 1973;19:259–267. doi: 10.1901/jeab.1973.19-259.
  12. Kelleher R.T. Fixed-ratio schedules of conditioned reinforcement with pigeons. Journal of the Experimental Analysis of Behavior. 1958;1:281–289. doi: 10.1901/jeab.1958.1-281.
  13. Kelleher R.T. Conditioned reinforcement in second-order schedules. Journal of the Experimental Analysis of Behavior. 1966;9:475–485. doi: 10.1901/jeab.1966.9-475.
  14. Kelleher R.T, Gollub L.R. A review of positive conditioned reinforcement. Journal of the Experimental Analysis of Behavior. 1962;5:543–597. doi: 10.1901/jeab.1962.5-s543.
  15. Killeen P. Psychophysical distance functions for hooded rats. The Psychological Record. 1974;24:229–235.
  16. Malagodi E.F, Webbe F.M, Waddell T.R. Second-order schedules of token reinforcement: Effects of varying the schedule of food presentation. Journal of the Experimental Analysis of Behavior. 1975;24:173–181. doi: 10.1901/jeab.1975.24-173.
  17. Mazur J.E. Steady-state performance on fixed-, mixed-, and random-ratio schedules. Journal of the Experimental Analysis of Behavior. 1983;39:293–307. doi: 10.1901/jeab.1983.39-293.
  18. McFarland J.M, Lattal K.A. Determinants of reinforcer accumulation during an operant task. Journal of the Experimental Analysis of Behavior. 2001;76:321–338. doi: 10.1901/jeab.2001.76-321.
  19. Pietras C.J, Hackenberg T.D. Response-cost punishment via token loss with pigeons. Behavioural Processes. 2005;69:343–356. doi: 10.1016/j.beproc.2005.02.026.
  20. Rachlin H, Castrogiovanni A, Cross D. Probability and delay in commitment. Journal of the Experimental Analysis of Behavior. 1987;48:347–353. doi: 10.1901/jeab.1987.48-347.
  21. Rachlin H, Green L. Commitment, choice, and self-control. Journal of the Experimental Analysis of Behavior. 1972;17:15–22. doi: 10.1901/jeab.1972.17-15.

Articles from Journal of the Experimental Analysis of Behavior are provided here courtesy of Society for the Experimental Analysis of Behavior