Choice as a Function of Reinforcer “Hold”: From Probability Learning to Concurrent Reinforcement

Greg Jensen; Allen Neuringer

doi:10.1037/0097-7403.34.4.437

Author manuscript; available in PMC: 2009 Apr 24.

Published in final edited form as: J Exp Psychol Anim Behav Process. 2008 Oct;34(4):437–460. doi: 10.1037/0097-7403.34.4.437

PMCID: PMC2673116 NIHMSID: NIHMS103857 PMID: 18954229

Abstract

Two procedures commonly used to study choice are concurrent reinforcement and probability learning. Under concurrent-reinforcement procedures, once a reinforcer is scheduled, it remains available indefinitely until collected. Therefore reinforcement becomes increasingly likely with passage of time or responses on other operanda. Under probability learning, reinforcer probabilities are constant and independent of passage of time or responses. Therefore a particular reinforcer is gained or not, on the basis of a single response, and potential reinforcers are not retained, as when betting at a roulette wheel. In the “real” world, continued availability of reinforcers often lies between these two extremes, with potential reinforcers being lost owing to competition, maturation, decay, and random scatter. The authors parametrically manipulated the likelihood of continued reinforcer availability, defined as hold, and examined the effects on pigeons’ choices. Choices varied as power functions of obtained reinforcers under all values of hold. Stochastic models provided generally good descriptions of choice emissions with deviations from stochasticity systematically related to hold. Thus, a single set of principles accounted for choices across hold values that represent a wide range of real-world conditions.

Keywords: matching, stochastic response, limited hold, reinforcement probability, pigeons

Probability learning and concurrent reinforcement are procedures commonly used to study choice. They share many similarities. For example, subjects choose repeatedly between two options to obtain reinforcers: Human participants press buttons or push computer keys for points or money; monkeys look left or right for orange juice or raisins; rats run down alleys or press levers for food pellets; and pigeons peck keys for grain. Choice distributions under both procedures are influenced by reinforcer distributions and by other attributes of the reinforcers, for example, qualities, amounts, and delays. However, aspects of the procedures differ, and there is controversy concerning results, especially whether results from the two procedures are consistent.

Under probability-learning procedures, reinforcer availability depends on a random number generator that is activated (or fired) each time a choice occurs. Thus each choice is reinforced with some probability, and the probabilities are independent of previous events. For example, a left (L) choice might be reinforced with 0.4 probability and a right (R) one with 0.1 probability. In some studies the resulting response distributions are described as probability matching: Ratios of responses (L/R) equal ratios of reinforcement probabilities, or 4 to 1 in the example given (Myers, Lohmeier, & Well, 1994; Vulkan, 2000). Such matching is inefficient: probability matching leads to 34% of responses being reinforced in the example, whereas choosing L exclusively achieves 40% reinforcement, the maximum possible. Thus, unsurprisingly, recent studies indicate that probability matching is limited: It tends to occur early in training and when subjects are not strongly motivated. For example, Shanks, Tunney, and McCarthy (2002) showed that extended training, large monetary rewards, and feedback that indicates level of proficiency all tend to move choice distributions by humans toward exclusive preference for the best alternative. More generally, when subjects are motivated to obtain rewards quickly or to gain as many as possible, rats, pigeons, and people tend to choose the highest valued alternative a large proportion of the time, sometimes exclusively (see Fantino & Esfandiari, 2002; Herrnstein & Loveland, 1975; Shah, Bradshaw, & Szabadi, 1989; Shanks et al., 2002; Vulkan, 2000).
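The percentages in this example can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `expected_reinforcement` is hypothetical, and the probabilities are those of the example above.

```python
def expected_reinforcement(p_left, p_right, prop_left):
    """Expected proportion of reinforced responses when a proportion
    prop_left of choices goes to L and the remainder goes to R."""
    return prop_left * p_left + (1.0 - prop_left) * p_right


p_left, p_right = 0.4, 0.1

# Probability matching: the L/R response ratio equals p_left/p_right (4:1),
# so 80% of responses go to L.
matching_prop_left = p_left / (p_left + p_right)
matching = expected_reinforcement(p_left, p_right, matching_prop_left)

# Exclusive preference for the richer alternative.
maximizing = expected_reinforcement(p_left, p_right, 1.0)

print(round(matching, 2), round(maximizing, 2))  # 0.34 0.4
```

Any allocation that sends some responses to R earns less than exclusive choice of L, because every R response replaces a 0.4 chance of reinforcement with a 0.1 chance.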

Concurrent-reinforcement procedures, developed and popularized by Richard Herrnstein and his colleagues, differ in that availability of reinforcers depends on the passage of time, most commonly under variable-interval (VI) schedules (see Herrnstein, 1997). For example, L choices might be reinforced an average of once per minute and R an average of once every 3 min. The original report by Herrnstein (1970) indicated that choice proportions matched proportions of obtained reinforcers, and the phenomenon came to be referred to as reinforcement matching. In the example just given, assuming that three times as many reinforcers were received on the left as on the right, the same left-to-right ratio of responses would be emitted to achieve a “matching” relationship. Note that matching in the concurrent-reinforcement case is between choices and experienced or obtained reinforcers, whereas in the probability-learning case, the term is used with respect to programmed probabilities of reinforcement. Because programmed and obtained are not always equivalent—obtained reinforcers partly depend on choice distributions—matching according to one definition may be inconsistent with matching according to the other. Furthermore, many studies using the concurrent procedure report “undermatching” (Baum, 1979), that is, response proportions are closer to 0.50 than are predicted by Herrnstein’s matching formulation, adding to the uncertainty as to how best to integrate results from the two procedures.

Research in the two areas is often directed at similar questions: How do consequences influence choice? What theories account for the distribution of responses? How do drugs, injury to the central nervous system (CNS), and psychopathologies affect choice? And what CNS systems underlie choice? Some attempts have been made to relate the two (Fantino, 1998; Herrnstein, 1970; Herrnstein & Loveland, 1975; Mackintosh, 1974), but researchers in these areas generally make little contact with one another. The main goal of the present research is to assess whether progress toward integration can be obtained by placing the two procedures along a single continuum, the hold continuum. Hold refers to the duration or probability that a reinforcer will remain available after it is initially scheduled. It indicates the lastability, persistence, or retention of a commodity or opportunity.

Probability learning and concurrent reinforcement lie at opposite ends of the hold continuum. Under concurrent-reinforcement procedures, a reinforcer remains available from the time it has “set up” until it is collected. An example from the “real” world would be mail delivered to a mailbox: The mail remains in the box until retrieved, no matter how long the interval between delivery and receipt. Another example is withdrawing money from a savings account: Once the money is deposited, it remains accessible until withdrawn. Pumping water from a slowly filling well or getting dinnerware from the cupboard are also good metaphors. These are goods or goals that are available until the requisite response, no matter the delay between when the goods are initially deposited and when the response is made.

In contrast, under probability-learning procedures, the reinforcer is available (or not), contingent on a single response: A possible reinforcer is not held beyond the one response. An example is throwing a desperation pass in a football game toward a crowd of receivers and opponents. Other examples of the temporary (or “one-shot”) availability of probability-learning reinforcers include taking a picture of the rare bird that flies by and placing a bet at a roulette table.

Many real-world instances lie between these extremes: Potential rewards are often available for a period of time, or series of responses, but with a limit. When the phone rings, answering will be effective, but only for the period of the ring. Ripe apples hang from the tree, but not forever. Availability of a $100 bill that happens to lie on the sidewalk is temporary, since it may be found by someone else or blown away by the wind. These last examples represent common occurrences in nature, where potential rewards such as fruit, prey, or mates are available only temporarily, and natural processes of scatter, decay, or competition limit access. The continued availability of a reinforcer is determined by many factors.

Three of the four experiments described in this article explore the role of systematic variations in hold probabilities on choice distributions. We are not the first to superimpose hold contingencies on operant reinforcement schedules, for example, as in limited hold procedures (see Boelens, 1984; Buskist & Morgan, 1987; Morse, 1966). As far as we know, however, we are the first to apply hold equally for all choices and to test whether choice distributions—under conditions spanning the range from probability learning to concurrent reinforcement—can be explained by differences in hold (see Killeen & Shumway, 1971, for related research).

Our initial experiment combined probability-learning and concurrent-reinforcement procedures in order to test a procedure for the experiments that followed. Pigeons chose repeatedly among three keys, with concurrently operating random ratio (RR) schedules programming the reinforcers on each of the keys. Unlike in most concurrent ratio procedures, the random-number generators were activated for all keys whenever any key was chosen. Thus, if a subject responded five times to the right-hand key, all three keys, left, middle, and right (L, M, and R), had five opportunities for a reinforcer to become available. Once available, a potential reinforcer remained so until collected. The schedule is referred to as a dependent concurrent random ratio, or Conc_dep3RR (following MacDonall, 1988), and is similar to concurrent VIs: Reinforcers on one operandum partly depend on time (in the VI case) or responses (in the RR case) devoted to other operanda. It is essentially analogous to R. Herrnstein’s (1970) concurrent schedule but with reinforcers programmed probabilistically at each response instead of in time (see Lau & Glimcher, 2005; MacDonall, 1988). The Conc_dep3RR differs importantly from the more common concurrent VI, however, in that response rates and pauses do not influence reinforcer availability, and as is shown here, this facilitates analyses.

Experiment 1

Method

Subjects

Six Racing Homer pigeons (275 g to 350 g) had prior experience pecking keys for food rewards in the apparatus used in this study. The birds were maintained at 85% of their free-feeding weight and housed individually under a 16:8 hr light/dark cycle, with water continuously available in their home cage. The birds received food during the experimental procedure only if their weight fell below the 85% level.

Apparatus

The apparatus consisted of five identical Gerbrands operant-conditioning chambers, 28 × 30 × 29.5 cm. Three back-lit response keys were spaced along the wall on the left-hand side of the door, 21.5 cm above the floor. The chambers were housed within sound-insulating enclosures. The keys—referred to as L (left), M (middle), and R (right)—had diameters of 2 cm and were spaced 7 cm from one another. A food hopper was located in a 5.5 × 4.5 cm opening below the middle key, 7.5 cm above the floor. A houselight was mounted on top of the chamber. Apple eMac computers controlled events in the chamber, with programs written in TrueBasic.

Procedure

Each of the three keys was associated with a separate random number generator that controlled reinforcement availability (or reinforcer set-up) for that key. Responses to any of the keys activated (or “fired”) all three random generators. Thus, a response might be reinforced or lead to set up of reinforcers on all keys simultaneously, or on no keys, or on some keys. Once set up, a reinforcer remained available until collected (as under concurrent VI conditions), and no additional reinforcers could set up on that operandum (reinforcers did not “stack”). As was indicated previously, the schedule is abbreviated as Conc_dep3RR, with dep referring to reinforcer set-up events depending on responses to any key. The contingency, functionally identical to the two-operandum procedure described by Lau and Glimcher (2005), did not use a change-over delay (COD), that is, did not withhold reinforcers for switching among the keys. A session terminated after 3 hr or 210 rewards, whichever came first. Generally, all reinforcers were collected in sessions that averaged approximately 90 min.
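The contingency just described can be sketched in code. This is an illustration under stated assumptions, not the authors' TrueBasic program: the class name `ConcDep3RR`, its attributes, and the order of operations within a response (all generators fire first, then the chosen key is checked) are assumptions made for the sketch.

```python
import random


class ConcDep3RR:
    """Sketch of a dependent concurrent random-ratio schedule."""

    def __init__(self, probs, rng=None):
        self.probs = dict(probs)                     # set-up probability per key
        self.armed = {k: False for k in self.probs}  # is a reinforcer waiting?
        self.rng = rng if rng is not None else random.Random()

    def respond(self, key):
        """Register one response; return True if it is reinforced."""
        # A response to any key fires the generator of every key.
        # A key that already holds a reinforcer cannot gain a second one.
        for k, p in self.probs.items():
            if not self.armed[k] and self.rng.random() < p:
                self.armed[k] = True
        # The chosen key pays off if a reinforcer is waiting there;
        # reinforcers on the other keys remain available until collected.
        reinforced = self.armed[key]
        self.armed[key] = False
        return reinforced


# Usage: ten randomly chosen responses under the Phase 2 probabilities.
rng = random.Random(0)
schedule = ConcDep3RR({"L": 0.06, "M": 0.02, "R": 0.12}, rng=rng)
outcomes = [schedule.respond(rng.choice("LMR")) for _ in range(10)]
```

Because set-up depends only on the count of responses, pauses and response rates leave the `armed` state untouched, which is the property that distinguishes this schedule from a concurrent VI.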

Each response produced a 2-s interresponse interval (IRI), during which the houselight and key lights were dark. A reinforced choice produced 3 s of access to Purina Nutriblend pellets, identical to the normal feed given in the pigeons’ home cages, followed by the 2-s IRI. Responses during the IRI reset it, but such responses were rare. Key lights and the houselight were darkened during reinforcer delivery, with a light above the hopper illuminating the available food. At other times, the houselight and key lights were illuminated (white lights), and the hopper light was off.

The pigeons experienced six phases, each consisting of five sessions, except the first phase, which consisted of four sessions because of an experimental error. Each phase provided a different set of reinforcement probabilities on the three keys, with the sum of the probabilities equal to 0.2 in all cases (see Table 1).

Table 1.

Probabilities of Reinforcer Set-Up for the Three Keys in Each Phase of Experiment 1

Phase	Key L	Key M	Key R
1	0.0667	0.0667	0.0667
2	0.06	0.02	0.12
3	0.09	0	0.11
4	0	0.2	0
5	0.1	0.06	0.04
6	0	0.15	0.05

Results

The solid lines connecting the solid points in Figure 1 show proportions of responses on L, M, and R keys for each pigeon (S1 through S6) during each session. Proportions were calculated by dividing total number of responses to a key during a session by the sum of responses on all three keys. The unconnected open points show the associated proportions of obtained reinforcers, calculated similarly. The numbers above the curve in the S1 graph indicate the programmed reinforcer probabilities, these being the same for all of the pigeons.

Figure 1. Proportions of responses (solid circles and connecting lines) on left (L), middle (M), and right (R) keys for each of 6 pigeons (S1 through S6) during each session of Experiment 1. Also depicted are proportions of obtained reinforcers (open circles) and programmed reinforcer set-up probability (gray lines). These programmed probabilities, which were the same for all pigeons, are also written on the S1 graph for each phase.

Three aspects of these results are noteworthy. First, the pigeons responded similarly to one another: Intersubject consistency was high. This consistency is representative of most of the results throughout our four experiments, and to conserve space, we present averages of the 6 birds in many cases, or group all pigeons’ performances on a single graph in others.

Second, response proportions paralleled obtained reinforcers, a result shown more directly in Figure 2. Ongoing discussion in the literature focuses on how best to summarize choice–reinforcer relationships when more than two alternatives are available (Aparicio & Cabrera, 2001; Davison, Krageloh, Fraser, & Breier, 2007). A solution that minimizes theoretical assumptions focuses on ratios of response pairs and associated reinforcer pairs (e.g., L/M, M/R, and R/L). These ratios are shown in three graphs in Figure 2, response pairs as a function of reinforcer pairs on logarithmic coordinates. The graphs are based on responses and obtained reinforcers during the fifth (final) session of each phase. (We generally use fifth-session data for all detailed analyses.) Each point represents one pigeon during one phase. The good fits of the data to the least squares lines (high r² values) and the absence of bias (represented by the y intercepts being close to 0.0) permit us to combine data from the three pairs onto a single graph, as provided in the lower right-hand quadrant. The slopes of the fitted log-log functions (equivalently, the exponents of the corresponding power functions) were all approximately 1.0, indicating a direct 1:1 matching relationship. In other words, proportions of responses equaled proportions of reinforcers.

Figure 2. Log of response ratios (left/middle, middle/right, and right/left) as a function of log obtained reinforcer ratios, these shown in the upper left, upper right, and lower left quadrants, respectively. Each point represents a pigeon’s performance during the last session of each phase. The lower right quadrant combines the data from the other three quadrants. The lines are the least-squares, best fitting functions.
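The ratio analysis amounts to fitting a straight line in log-log coordinates, log(B_i/B_j) = s × log(R_i/R_j) + log b, where the slope s is the exponent of the power function and log b is the bias. The sketch below shows one way to compute such a fit. The function name `fit_log_ratios`, the use of base-10 logarithms, and the counts are all assumptions for illustration; the counts are hypothetical and are not the pigeons' data.

```python
import math


def fit_log_ratios(response_pairs, reinforcer_pairs):
    """Least-squares line through (log reinforcer ratio, log response ratio).

    Returns (slope, intercept). Pairs with a zero count are not handled,
    because their log ratios are undefined.
    """
    xs = [math.log10(r1 / r2) for r1, r2 in reinforcer_pairs]
    ys = [math.log10(b1 / b2) for b1, b2 in response_pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical counts, constructed so that responses are exactly
# proportional to reinforcers (perfect matching, no bias).
reinforcers = [(60, 20), (30, 90), (50, 50), (10, 40)]
responses = [(600, 200), (300, 900), (500, 500), (100, 400)]

slope, intercept = fit_log_ratios(responses, reinforcers)
```

With these constructed counts the slope is 1 and the intercept is 0 (up to floating-point rounding). A slope below 1 would correspond to undermatching, and a nonzero intercept to bias toward one key of the pair.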

A third conclusion can be drawn from the gray horizontal lines in Figure 1 that represent proportions of programmed reinforcers. The important observation is that choice proportions approximated programmed, as well as obtained, reinforcer proportions, indicating that the observed relationships were not caused by responses “driving” those distributions.

How might these data be explained? “Matching occurs naturally” has been offered as one explanation for choices under concurrent schedules, implying that animals and people have evolved to allocate choices in proportion to received reinforcers (Gallistel et al., 2007; Gallistel, Mark, King, & Latham, 2001). Our data are consistent with this interpretation, because they show rapid and consistent matching, but other possibilities are not excluded.

One alternative is that matching is the by-product of “lower-level” relationships in which choices are governed by the momentarily best option. In other words, matching at the molar level, as is depicted in Figure 2, may occur because subjects choose (or attempt to choose) the response option with the momentarily highest reinforcement probability (Shimp, 1966; Silberberg, Hamilton, Ziriax, & Casey, 1978). To examine this possibility, we calculated—for every response in a session—the probabilities that each of the three possible choices, L, M, and R, would be reinforced (for similar calculations in the concurrent VI case, see Williams, 1988; Staddon, Hinson, & Kram, 1981). Under the Conc_dep3RR contingencies, the probability that a particular response (an instance) will be reinforced on key k, denoted as I_k, is determined by two factors: the programmed probability of reinforcement on key k (here denoted as R_k) and the number of responses since k was last chosen (here denoted as n to indicate “responses since last selection of k”). The formula is

I_{k} = 1 - (1 - R_{k})^{n}, \quad n \geq 1

(1)

Because the exponent n indicates “since last selection of key k,” repetitions yield an n value of 1, the lowest value possible. The derivation of this equation is provided in Appendix A.

One can readily calculate I_k for each operandum at each response in a session and thereby determine, response by response, the “best choices,” that is, those that would maximize overall reinforcers per response. The result was a different sequence for each of the six phases of the experiment: LMRLMR … in Phase 1; RRLRMRLRRLRMRL … in Phase 2; MLML … in Phase 3; MM … in Phase 4; LMLRLMLR … in Phase 5; and MMMLMMML … in Phase 6. In other words, given that reinforcers were programmed probabilistically, and thus that the pigeons could not anticipate with certainty when or where a reinforcer was available, the sequences just described are the best that the pigeon could do to maximize reinforcers per response.
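A response-by-response calculation of this kind can be sketched as a greedy rule: compute I_k from Equation 1 for every key and choose the key with the highest value. The helper names, the starting state (every key at n = 1), and the tie-breaking rule (first key in dictionary order) are assumptions of this sketch; the exact procedure used for the published sequences is not specified in this section.

```python
def instance_probability(r_k, n):
    """Equation 1: probability that a response to key k is reinforced,
    given set-up probability r_k and n responses since k was last chosen."""
    return 1.0 - (1.0 - r_k) ** n


def greedy_sequence(probs, length):
    """Choose, at each response, the key with the highest I_k."""
    since = {k: 1 for k in probs}  # assumption: every key starts at n = 1
    sequence = []
    for _ in range(length):
        # max() returns the first key on ties (dictionary order).
        best = max(probs, key=lambda k: instance_probability(probs[k], since[k]))
        sequence.append(best)
        # The chosen key resets to n = 1; every other key ages by one response.
        for k in since:
            since[k] = 1 if k == best else since[k] + 1
    return "".join(sequence)


# Phase 5 probabilities from Table 1.
phase5 = {"L": 0.10, "M": 0.06, "R": 0.04}
print(greedy_sequence(phase5, 8))  # LMLRLMLR
```

Under these assumptions the Phase 5 probabilities reproduce the LMLRLMLR pattern reported for that phase. For other phases the first few choices can depend on the assumed starting state before the sequence settles into its repeating cycle.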

Instead of detecting highly structured response patterns of this sort, we found evidence for an alternative response-allocation strategy, suggested by Nevin (1969, 1979) and others (Heyman, 1979): namely, that the birds matched by responding stochastically. An example of stochastic matching is this: In an environment in which 60% of reinforcers were obtained from one key, 30% from another, and 10% from the third, respond as if governed by a biased three-sided object that had a 0.60 probability of landing on side A, 0.30 probability on side B, and 0.10 on side C. We evaluated stochasticity of responses in a number of ways and report the results of two tests that are representative of the others. The data show that responses were generally consistent with a stochastic model, but we also identify deviations from stochasticity. (The term stochastic is used here to imply a Bernoulli-type process in which events are independent.)

The first test involved predicting proportions of each of the nine possible dyads (LL, LM, LR, etc.) from proportions of L, M, and R responses. If responses were generated stochastically, then the proportions of each of the dyads could be predicted from the first-order L, M, and R proportions. For example, if the proportion of L, M, and R responses in a session were .6, .3, and .1, respectively, then the proportions predicted from a stochastic model for L followed by M would be (0.6 × 0.3) or .18; of two Ls in a row would be .36; and so on. Note that the first-order proportions do not necessitate the dyad proportions. If responses were not stochastic in nature, then the same first-order proportions could be generated by 60 Ls in a row, followed by 30 Ms, followed by 10 Rs, and the emitted dyads would differ appreciably from those just described. Similarly, the deterministic “optimal response sequences” described here also would fail to produce stochastic-like dyads.
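The dyad prediction can be illustrated with a short sketch. The function names are hypothetical, and counting overlapping pairs (responses 1-2, 2-3, 3-4, and so on) is an assumption, since this section does not state how dyads were tallied.

```python
from collections import Counter
from itertools import product


def first_order_proportions(seq, keys="LMR"):
    """Proportion of responses on each key."""
    counts = Counter(seq)
    return {k: counts[k] / len(seq) for k in keys}


def predicted_dyads(props):
    """Dyad proportions expected if successive responses are independent."""
    return {a + b: props[a] * props[b] for a, b in product(props, repeat=2)}


def observed_dyads(seq, keys="LMR"):
    """Proportions of overlapping response pairs actually emitted."""
    pairs = Counter(a + b for a, b in zip(seq, seq[1:]))
    total = len(seq) - 1
    return {a + b: pairs[a + b] / total for a, b in product(keys, repeat=2)}


props = {"L": 0.6, "M": 0.3, "R": 0.1}
pred = predicted_dyads(props)
# pred["LM"] is 0.6 * 0.3 = 0.18 and pred["LL"] is 0.36, up to rounding.

# A blocked sequence with the same first-order proportions.
blocked = "L" * 60 + "M" * 30 + "R" * 10
obs = observed_dyads(blocked)
# obs["LL"] is 59/99 and obs["LM"] is 1/99, far from the stochastic prediction.
```

The blocked sequence shows why first-order proportions alone do not fix the dyad distribution: its L, M, and R proportions are .6, .3, and .1, yet almost all of its dyads are repeats.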

Because of the large number of data points (9 dyads × 6 conditions × 6 pigeons), we used an information statistic to summarize the distributions of dyads and compared the pigeons’ values to those predicted by the stochastic model. If responses were emitted stochastically, then the information contained in the dyad distribution would be the following:

{Info}_{Stoch} = \sum_{i, j} [Prop (k_{i}) \times Prop (k_{j})] \times {log}_{2} [Prop (k_{i}) \times Prop (k_{j})]

(2)

Here, Prop(k_i) and Prop(k_j) refer to the proportions (across a session) on L, M, and R keys. In essence, the calculation provides the predicted information values in the dyads from the first-order L, M, and R proportions, taken two at a time.

The analogous information value calculated from the pigeons’ actual data was as follows:

{Info}_{Data} = \sum_{i, j} Prop (k_{i} k_{j}) \times {log}_{2} [Prop (k_{i} k_{j})]

(3)

Here, Prop(k_i k_j) represents the proportions of the nine possible dyads as actually emitted. We used each pigeon’s terminal sessions’ first-order proportions (proportions of L, M, and R) to predict that pigeon’s dyad proportions and then compared the pigeon’s dyad distribution, in terms of information value, to the predicted distribution.
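Equations 2 and 3 can be computed as in the following sketch. The function names are hypothetical, dyads are counted as overlapping pairs (an assumption), and terms with zero proportion are skipped on the usual convention that they contribute nothing. The functions follow the equations as printed here, without a leading minus sign, so the values are non-positive; multiplying by -1 gives the conventional entropy in bits.

```python
import math
from collections import Counter
from itertools import product


def info_stoch(props):
    """Equation 2: value predicted from first-order proportions."""
    total = 0.0
    for a, b in product(props, repeat=2):
        p = props[a] * props[b]
        if p > 0:  # assumption: zero-probability dyads contribute nothing
            total += p * math.log2(p)
    return total


def info_data(seq):
    """Equation 3: value computed from the dyads actually emitted."""
    pairs = Counter(zip(seq, seq[1:]))  # only dyads that occurred are counted
    n = len(seq) - 1
    return sum((c / n) * math.log2(c / n) for c in pairs.values())


# A strictly cycling sequence uses only three of the nine dyads.
cycling = "LMR" * 10 + "L"
uniform = {"L": 1 / 3, "M": 1 / 3, "R": 1 / 3}

predicted = info_stoch(uniform)  # about -3.17 (nine equally likely dyads)
observed = info_data(cycling)    # about -1.58 (three equally likely dyads)
```

The cycling sequence has nearly equal first-order proportions, yet its observed value lies far from the stochastic prediction, which is the kind of discrepancy the comparison is designed to detect.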

The upper left graph in Figure 3 plots the information values obtained from the pigeons’ data (y axis) as a function of the values predicted from a stochastic model (x axis). The best-fitting line of these points has a slope very close to 1.0, indicating that the pigeons’ data closely followed the information expected from a stochastic model. That is, the pigeons’ behaviors were consistent with stochastic generation.

Figure 3. Information contained in the distribution of response dyads (LL, LM, LR, ML …) as a function of the information value predicted if responses were stochastic. Data are from the last session in each phase of each of the experiments.

No single test suffices to indicate stochasticity (Knuth, 1969; Nickerson, 2002). Whereas the information statistic evaluates overall distributions of dyads, the “mean run length” statistic, which we present next, evaluates lengths of response strings, or “runs,” on a single key (e.g., RRRR, without interruption from responses to keys L or M). The average length of runs can be predicted on the basis of first-order proportions of L, M, and R, assuming that the generating process was stochastic. The question then becomes whether the pigeons’ mean run lengths were similar to those predicted by the stochastic model.

To calculate mean run length on operandum k, we divide the number of responses to k by the number of response strings composed exclusively of ks, or

MRL (k) = \frac{Responses (k)}{Strings (k)}

(4)

As an example, consider the following sequence: LLLMLRRLLMLLLL. This sequence has 10 individual responses to L and four strings composed of Ls: LLL, L, LL, and LLLL, in that order. Thus, the MRL for L is 2.5 (or 10/4).
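Equation 4 can be computed directly from a response sequence. The following is a minimal sketch with a hypothetical function name, applied to the example sequence above (written without the line-break hyphen).

```python
from itertools import groupby


def mean_run_length(seq, key):
    """Equation 4: responses to `key` divided by the number of
    uninterrupted strings of `key`."""
    runs = [len(list(group)) for k, group in groupby(seq) if k == key]
    if not runs:
        raise ValueError(f"no responses to key {key!r}")
    return sum(runs) / len(runs)


print(mean_run_length("LLLMLRRLLMLLLL", "L"))  # 2.5
```

`itertools.groupby` splits the sequence into maximal runs of identical responses, so the strings of Ls (LLL, L, LL, LLLL) are found without any explicit bookkeeping.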

If responses were stochastic in nature, then the actual mean run lengths should be related to the first-order L, M, and R response proportions according to Equation 5:

MRL {(k)}_{Stoch} = \frac{1}{1 - Prop (k)}

(5)

Here Prop(k) indicates the proportion of responses on key k across the entire session.
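Equation 5 follows from the geometric distribution of run lengths when successive responses are independent. As a sketch, writing p for Prop(k) (a shorthand introduced here, not used in the original), a run that has started continues with probability p and ends with probability 1 - p at each subsequent response:

```latex
P(\text{run length} = m) = p^{m-1} (1 - p), \quad m \geq 1

MRL(k)_{Stoch} = \sum_{m=1}^{\infty} m \, p^{m-1} (1 - p) = \frac{1 - p}{(1 - p)^{2}} = \frac{1}{1 - p}
```

For example, a key that receives a proportion of 0.6 of the responses has an expected mean run length of 1/(1 - 0.6) = 2.5.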

Figure 4 shows the pigeons’ mean run lengths as a function of their first-order response proportions, with each point representing an individual bird’s responses on each of the three keys in the final session of each phase of the experiment. The drawn function is that expected from a stochastic model (Equation 5). The pigeons’ mean run lengths approximated the stochastic model, although the data points appeared to fall below the stochastic line in the middle range, indicating a possible deviation from stochasticity.

Figure 4. Mean run lengths on each key during the last session of each phase of Experiment 1 as a function of the proportion of responses to that key. Data from all of the pigeons are shown. The drawn line is the expected function if responses were stochastic.

Additional analysis indicated a tendency in some phases for the pigeons to switch among the keys more than that predicted by a stochastic model. Expected switch rates were calculated by taking the proportions of responses on each key (e.g., L = 0.6, M = 0.3, and R = 0.1) and squaring them to predict the proportion of each “repeat pair” (LL = 0.36, MM = 0.09, and RR = 0.01). The sum of these probabilities (0.46) should (assuming statistical independence) represent the proportion of response repetitions; and (1-repetitions) equals switches (0.54).
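The expected and actual switch proportions can be computed as follows. This is a minimal sketch; the function names are hypothetical, and the example sequence is illustrative rather than taken from the pigeons' data.

```python
def expected_switch_proportion(props):
    """1 minus the summed probability of the repeat pairs (LL, MM, RR),
    assuming successive responses are independent."""
    return 1.0 - sum(p ** 2 for p in props.values())


def actual_switch_proportion(seq):
    """switches / (switches + repeats) over successive response pairs."""
    pairs = list(zip(seq, seq[1:]))
    switches = sum(a != b for a, b in pairs)
    return switches / len(pairs)


expected = expected_switch_proportion({"L": 0.6, "M": 0.3, "R": 0.1})
print(round(expected, 2))  # 0.54

actual = actual_switch_proportion("LLLMLRRLLMLLLL")  # 6 switches in 13 pairs
```

A ratio of actual to expected switch proportion above 1.0 indicates more switching than independent responding would produce.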

Table 2 provides average proportions of switches, calculated as switches/(switches + repeats) (left column), the switch proportions predicted from the stochastic model (center column), and the ratio of these two values (right column). The pigeons switched significantly more than was expected in Phases 2 and 3, paired t tests, t(5) = 4.96 and 4.54, p < .01 in both cases. Although these data are only suggestive of a bias in switch rates, they are presented here because they led to similar evaluations in the later experiments, and these showed that the pigeons’ switch versus repeat responses systematically deviated from otherwise stochastic emissions.

Table 2.

Ratio of Actual Versus Expected Proportion of Switches Per Phase in Experiment 1

Phase	Actual switch proportion	Expected switch proportion	Actual:Expected ratio
1	0.703	0.660	1.066
2	0.622	0.538	1.155
3	0.647	0.516	1.257
4	0.080	0.082	0.994
5	0.648	0.612	1.060
6	0.381	0.379	1.000

Shown are arithmetic means of the 6 birds’ performances.

Discussion

Under a dependent concurrent RR schedule, proportions of responses by pigeons matched (or equaled) proportions of obtained reinforcers. Matching occurred rapidly and was consistent across 6 pigeons. Many aspects of the procedure differed from commonly used concurrent schedules: (a) Reinforcers were scheduled probabilistically, with availability dependent on all responses (i.e., every response increased the probability of reinforcement on all keys). (b) Three choice alternatives were provided, rather than the more common two. (c) Responses were “discrete,” with a 2-s IRI pause between each response (and after reinforcement). And (d) overall frequencies of reinforcement were high in comparison with those in many experiments in this field of study, with about one reinforcer for every five to six responses.

Each of these may have contributed to rapid, orderly, and consistent changes in responses. A few other experiments have used similar procedures, although with only two choice alternatives (Lau & Glimcher, 2005; MacDonall, 1988; Meisch & Spiga, 1998), and choices in those cases were also related in orderly fashion to reinforcers but were generally less sensitive to changes in reinforcer ratios (exponents of the power function < 1.0).

Response proportions matched both obtained and programmed reinforcers. The latter relationship is of interest because (assuming stochastic generation) matching to programmed reinforcers maximizes overall reinforcers per response under the present conditions. We independently verified, by modeling the contingencies with different response-allocation strategies and proportions, that if a subject responds stochastically, matching is the most effective choice distribution. Thus, as a somewhat oversimplified general statement, the birds stochastically matched and, by so doing, maximized reinforcements.

Two statistics supported the hypothesis that the pigeons’ responses were stochastic-like in nature, one a measure of information and the other a measure of run lengths. However, deviation from stochasticity was also identified: In two phases the pigeons switched from one key to the others more frequently than was predicted by a stochastic model. Such switching could be due to the close physical proximity of the keys, absence of CODs, a species-specific (or “natural”) tendency for pigeons to switch among choice alternatives, or some other “general tendency.” Another possibility is that switching was differentially reinforced (Plowright & Shettleworth, 1990). These hypotheses are explored in Experiments 2 through 4. The primary goal of Experiment 1 was met, namely to assess whether the Conc_dep3RR procedure provides data of sufficient order to test effects of hold.

Experiment 2

As is indicated in the previous section, in some situations, a potential reinforcer remains available until collected, whereas in other situations, the potential reinforcer is temporary, lasting for an instant or one response. Many cases lie between these extremes. Experiment 2 studied choices by pigeons under contingencies similar to those in Experiment 1 but in which we systematically varied the probability that a reinforcer, once set up, would remain available. The hold (h) parameter specified that probability, and the schedule is referred to as Conc_dep3RR_h, with h a variable between 0.0 and 1.0. When h = 0.0, if a response did not gain a just-made-available reinforcer (because the response was to another key), that potential reinforcer was removed (the probability of retention was zero). When h = 1.0, a reinforcer remained available from the time it set up until received, no matter how long the interim. As has been indicated, these extremes represent two large bodies of literature encompassing research on probability learning and concurrent reinforcement. The contribution of this experiment is to explore hold probabilities that lie between these extremes.
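The role of the hold parameter can be sketched by adding one step to a simulation of the dependent concurrent RR schedule. The class name `ConcDep3RRHold` and the order of operations are assumptions of this sketch. In particular, the hold test is assumed to be applied once per response to every uncollected reinforcer, which is consistent with the two endpoints described above but is not spelled out in this section.

```python
import random


class ConcDep3RRHold:
    """Sketch of a dependent concurrent RR schedule with hold probability h."""

    def __init__(self, probs, hold, rng=None):
        self.probs = dict(probs)                     # set-up probability per key
        self.hold = hold                             # retention probability h
        self.armed = {k: False for k in self.probs}  # is a reinforcer waiting?
        self.rng = rng if rng is not None else random.Random()

    def respond(self, key):
        """Register one response; return True if it is reinforced."""
        # 1. Every response fires the generator of every key that does not
        #    already hold a reinforcer.
        for k, p in self.probs.items():
            if not self.armed[k] and self.rng.random() < p:
                self.armed[k] = True
        # 2. The chosen key pays off if a reinforcer is waiting there.
        reinforced = self.armed[key]
        self.armed[key] = False
        # 3. Each uncollected reinforcer is retained with probability h
        #    (assumed here to be tested once per response).
        for k in self.armed:
            if self.armed[k] and self.rng.random() >= self.hold:
                self.armed[k] = False
        return reinforced
```

With `hold=1.0` the third step never removes a reinforcer, so the schedule behaves as in Experiment 1. With `hold=0.0` every uncollected reinforcer is removed, so each response is reinforced with its key's programmed probability regardless of history, as in probability learning.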