Reinforcer Control by Comparison-Stimulus Color and Location in a Delayed Matching-to-Sample Task

Brent Alsop; B Max Jones

doi:10.1901/jeab.2008-89-311

2008 May;89(3):311–331.

PMCID: PMC2373768 PMID: 18540217

Abstract

Six pigeons were trained in a delayed matching-to-sample task involving bright- and dim-yellow samples on a central key, a five-peck response requirement to either sample, a constant 1.5-s delay, and the presentation of comparison stimuli composed of red on the left key and green on the right key or vice versa. Green-key responses were occasionally reinforced following the dimmer-yellow sample, and red-key responses were occasionally reinforced following the brighter-yellow sample. Reinforcer delivery was controlled such that the distribution of reinforcers across both comparison-stimulus color and comparison-stimulus location could be varied systematically and independently across conditions. Matching accuracy was high throughout. The ratio of left to right side-key responses increased as the ratio of left to right reinforcers increased, the ratio of red to green responses increased as the ratio of red to green reinforcers increased, and there was no interaction between these variables. However, side-key biases were more sensitive to the distribution of reinforcers across key location than were comparison-color biases to the distribution of reinforcers across key color. An extension of Davison and Tustin's (1978) model of DMTS performance fit the data well, but the results were also consistent with an alternative theory of conditional discrimination performance (Jones, 2003) that calls for a conceptually distinct quantitative model.

Keywords: reinforcer control, response bias, comparison-stimulus dimensions, delayed matching-to-sample, pigeon

Delayed matching-to-sample (DMTS) tasks are a set of procedures used for investigating aspects of remembering in human and nonhuman animals. In a typical DMTS task using pigeons, one of two sample stimuli is chosen randomly and presented on a central key, the pigeon pecks this key, then the sample is removed and a delay, or retention interval, begins. Following that delay, two comparison stimuli are presented via the two side keys, one on each side of the sample. Pecking the comparison stimulus that matches (either directly or symbolically) the sample stimulus is “correct” and produces occasional reinforcement (e.g., access to food). Incorrect responses produce a short blackout. Numerous experiments have shown that matching accuracy in these tasks decreases as the delay increases (e.g., Blough, 1959; Cumming & Berryman, 1965; D'Amato, 1973; White, 1985).

The present study focuses on an additional feature of DMTS tasks: the location of the two comparison stimuli varies randomly or quasi-randomly between the two side-key locations across trials. The reason for this arrangement is straightforward. If the comparison stimuli were always presented in the same locations, the task would require only a simple discrimination between the samples rather than a conditional discrimination (see Honig & Wasserman, 1981), and a pigeon could make the precursors of a correct choice response (e.g., standing in front of the correct key) immediately after the sample appears. This overt mediating behavior could conceivably attenuate the effects of the delay, and so challenge the face validity of the task as one in which remembering (or short-term or working memory) is required. Indeed, matching accuracy declines more rapidly with increasing delay intervals in delayed conditional discriminations than in delayed simple discriminations (Honig & Wasserman, 1981; Smith, 1967).

Varying the location of the comparison stimuli poses an interesting problem for investigations into the operation of reinforcement in these procedures. Specifically, reinforcers delivered for correct responses in DMTS tasks are associated with two orthogonal stimulus dimensions, that dimension which is relevant in the reinforcement contingencies (e.g., color of keys) and the locations of those stimuli (i.e., position of side key relative to the central key). Most studies that have focused on reinforcer control in DMTS tasks, however, have ignored the location dimension associated with reinforcement, sometimes to the extent that the relevant data simply were not collected or reported (e.g., Harnett, McCarthy, & Davison, 1984; McCarthy & Davison, 1986). The implicit assumption seems to be that the location of comparison stimuli, and any associated asymmetry in reinforcer deliveries across those locations, was either irrelevant or a minor consideration. These assumptions also appear in quantitative models of reinforcer control in these procedures (e.g., Davison & Tustin, 1978; Alsop & Davison, 1991) which ignore the location of the comparison stimuli in their formulations. First, the responses in MTS (and DMTS) are defined in terms of the comparison stimulus chosen (Davison & Nevin, 1999) in order to apply those models to both MTS and signal-detection tasks involving two responses such as saying “yes” or “no” or pressing left or right operanda. Second, the ratio of reinforcers obtained for Comparison 1 to Comparison 2 selections, irrespective of their locations, is the primary independent variable. Consequently, any systematic changes in either the left-to-right response ratio or the left-to-right reinforcer ratio obtained will usually have gone unnoticed.

Two empirical studies of reinforcement in DMTS tasks have examined position biases in choice responding (Jones & White, 1992; McCarthy & Davison, 1991). McCarthy and Davison arranged dim- and bright-yellow samples, and red and green comparison stimuli. They examined the degree to which variations of the red/green reinforcer ratio produced changes in response bias for choosing one comparison stimulus (called reinforcer sensitivity) as the delay between offset of the sample and presentation of the comparisons (the sample–choice delay), or the delay between a correct comparison selection and the reinforcer (the choice–reinforcer delay), was varied across conditions. Sensitivity to the red/green reinforcer ratio decreased with increasing sample–choice and choice–reinforcer delays, but about half of the pigeons also showed increasing position biases with these increasing delays. Furthermore, when they estimated sensitivity of position biases to the left/right obtained reinforcer ratio, these pigeons showed increasing sensitivity with increasing delays. In an attempt to reconcile the decrease in red/green sensitivities with the predictions of their quantitative models, they argued that there may have been “a change in the locus of control exerted by the reinforcers” (p.65); namely, from control by reinforcers associated with comparison color to control by reinforcers associated with comparison position.

Jones and White (1992) also examined sensitivity to reinforcement in a DMTS task; however, they varied only the sample–choice delay and did so within sessions rather than across conditions. In contrast to McCarthy and Davison's (1991) results, estimates of sensitivity to the comparison color (red/green) reinforcer ratio increased with increasing sample–choice delays, position preferences at all delays were negligible, and sensitivities of position preferences did not change systematically with delays. Jones and White argued that these results were consistent with McCarthy and Davison's (1991) proposal of an interaction between reinforcer control by comparison color and comparison location; that is, the absence of position biases in the behavior of Jones and White's subjects concomitant with increasing color biases across delays is consistent with McCarthy and Davison's suggestion that their subjects showed decreased comparison-color sensitivity because they developed comparison-location sensitivity as delays increased.

Studies of MTS performance also suggest that subjects learn about the location of comparison stimuli and, therefore, that the left/right reinforcer ratio might affect performance. Jones (2003) extended Sidman's (1986, 2000) and Cumming and Berryman's (1965) theoretical analysis of MTS tasks and argued that the effective discriminative stimuli in MTS tasks (and, by implication, in DMTS tasks also) “are not the two comparison stimuli themselves irrespective of their location, … but the two 2-key stimulus configurations presented at the choice phase on a trial” (p. 342). Thus, in the typical two-sample two-comparison procedure, Comparison 1 on the left key and Comparison 2 on the right key (i.e., C₁–C₂) constitutes one discriminative stimulus while Comparison 2 on the left and Comparison 1 on the right (i.e., C₂–C₁) constitutes the other. Furthermore, he argued that selecting the left key and selecting the right key in such procedures, rather than selecting either comparison stimulus, should be considered the fundamental responses in these tasks. These contentions were supported by his data and data from other studies. For example, Iversen, Sidman and Carrigan (1986) trained monkeys to perform MTS and then showed that interchanging the locations of sample and comparison stimuli across response keys severely disrupted matching accuracy. Similarly, Kamil and Sacks (1972) trained pigeons on a MTS task involving only three of the four combinations of a sample and comparison-stimuli locations. Following one sample (S₁), Comparison 1 appeared on the left and Comparison 2 appeared on the right (C₁–S₁–C₂) on some trials, and the reverse (C₂–S₁–C₁) was arranged on an equal number of other trials. Following the other sample (S₂) however, Comparison 1 always appeared on the left and Comparison 2 always on the right (C₁–S₂–C₂). Once all pigeons were matching accurately, they added the fourth untrained stimulus configuration (i.e., C₂–S₂–C₁). 
There was no immediate transfer to the new configuration; the pigeons responded not to the correct comparison stimulus on the left key (C₂), but rather to the right key—the location of the correct response for the previously learnt C₁–S₂–C₂ configuration. Kamil and Sacks' pigeons seemed to have learned the comparison-stimulus configurations associated with each sample. Finally, Sidman (1992) arranged a more complicated set of comparison configurations for his monkeys. Four choice operanda were arranged in a square, and the two sample stimuli were presented on an operandum in the center of the square. On any trial, the two comparison stimuli could be presented on any two of the four locations, making a total of 12 possible configurations per sample. Not only did the monkeys have difficulty learning the task, but acquisition followed different patterns for the various comparison–location configurations; that is, they seemed to learn which response was appropriate for each of the configurations separately. Although these results were all generated in MTS tasks, given the similarity between MTS and DMTS tasks, it is likely that similar effects operate in DMTS tasks as well.

The present experiment systematically varied the distribution of reinforcers across comparison-stimulus color (i.e., red & green) and comparison-stimulus locations (i.e., left & right) in a DMTS task for pigeons. We examined whether the frequency of reinforcers for left- versus right-key responses exerted any control over choice between comparison stimuli, compared the control by the left/right reinforcer ratio with the control by the red/green reinforcer ratio, and looked for any interaction between the two types of control.

Method

Subjects

Six adult ex-homing pigeons, numbered 141 to 146, were experimentally naïve at the start of the experiment. They were maintained at 85 % ± 15 g of their free-feeding body weight by providing supplementary mixed grain after daily sessions if needed. Water and grit were freely available in home cages. Pigeons were housed individually in their home cages in a room with about 75 other pigeons running on unrelated experiments. The room was temperature controlled and artificially lit on a 15.5 h light/8.5 h dark cycle.

Apparatus

For all subjects, the 38 cm × 38 cm × 38 cm home cage doubled as an experimental chamber. Three of the walls were solid sheet metal while the floor, ceiling, and fourth wall were galvanized rods. Two wooden perches were situated at the bottom of the cage to assist access to mixed grain, water, grit, and the interface panel. The interface panel was situated on one of the solid metal walls and consisted of four circular, translucent response keys each with a diameter of 2 cm and situated 5 cm apart from one another and 21 cm up from the perch. Only the three keys furthest from the cage door were operative during the experiment. The keys were illuminated using Light Emitting Diodes (LEDs). The center key could be illuminated with one of two intensities of yellow and the two side keys could be illuminated either red or green. A peck of 0.1 N was defined as an effective response.

A 6 cm × 6 cm food aperture centrally located under the response keys on the interface panel provided controlled access to wheat via a food hopper. During reinforcer delivery, all key lights were extinguished, the food hopper was raised for 2 s, and the aperture was illuminated yellow while the hopper was raised.

All experimental events were run and recorded using a computer running Med-PC software in an adjacent room.

Procedure

Subjects were initially magazine trained and autoshaped. They were then trained on a variable-interval (VI) 5-s schedule for pecking the center key when lit bright yellow. This was progressively increased to VI 15 s across sessions. Later, VI schedules operated for pecking each stimulus that would be used in the DMTS task; namely, bright and dim yellow lights on the center key, and red and green lights on the side keys. The VI schedules were again increased from VI 5 s to VI 15 s.

Following simple schedules of reinforcement for pecking lit keys, 40 sessions of forced-choice chain-schedule training were arranged. In this training, each discrete trial began with the center key lit either bright or dim yellow with a probability of .5. Following five responses to the center key, one of the side keys was lit with a probability of .5. The side key was lit red if the sample was bright yellow, and green if the sample was dim yellow. A single peck to the lit side key produced either a food reinforcer or 2 s of blackout; which consequence was earned was determined by a probabilistic method of scheduling reinforcers described below. In either case, there was a 5-s intertrial interval in blackout before the next trial began.

The final phase of preliminary training consisted of a symbolic DMTS task. The center key was lit either bright or dim yellow with a probability of .5. After five responses to this center key it was extinguished, and after a delay of 0.01 s, the two side keys were lit with either red on the left and green on the right, or vice versa. The probability that red would appear on the left was .5. Correct responses were defined as a response to the red side key after the bright sample and a response to the green side key after the dim sample; these were the same relations as those used in the forced-choice training phase. Furthermore, intermittent reinforcement of correct responses was scheduled in the same manner as it was in forced-choice training, and is described in detail below. This training was conducted for 43 sessions.

The delay interval was then increased to 0.5 s for 15 sessions, 1.0 s for 15 sessions, and then to 1.5 s for 15 sessions, at which point experimental conditions commenced. For Bird 146, however, matching accuracy dropped to chance levels with a 1.5 s delay, so the delay interval for this subject was decreased to 0.5 s.

Intermittent reinforcement of side key responses in forced-choice training, and of correct comparison-stimulus selections in later preliminary training and all experimental conditions, was scheduled as follows: At the start of a session and following each reinforcer delivery during the session, the next reinforcer was assigned to either the next correct red or the next correct green response with a set probability. This reinforcer was then further assigned to either the next correct left or the next correct right response with another set probability. (In all preliminary training, the probability that the next reinforcer would be earned for a correct red response was .5 and the probability that it would be earned for the next correct left response was .5.) Thus, a reinforcer was allocated to one of four possible color-location combinations (i.e., red on the left, red on the right, green on the left, or green on the right), and all correct responses to any other color–location combination prior to that selected combination earned only 2 s blackout; the same consequence as errors. The arranged reinforcer remained allocated across trials to a particular color–position combination until that correct response had been made and the reinforcer had been delivered. Similarly, other reinforcers were not scheduled until that reinforcer was collected, ensuring that the obtained reinforcer ratios closely approximated those arranged.
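
The scheduling rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Med-PC program: the function names `assign_reinforcer` and `consequence`, and the string labels used for colors and locations, are assumptions introduced here.

```python
import random

def assign_reinforcer(p_red, p_left, rng=random):
    """Allocate the next reinforcer to one color-location combination.

    p_red and p_left are the set probabilities described in the text;
    the function name and return format are hypothetical.
    """
    color = "red" if rng.random() < p_red else "green"
    location = "left" if rng.random() < p_left else "right"
    return (color, location)

def consequence(correct, chosen_color, chosen_location, allocated):
    """Return 'food' only for a correct response to the allocated combination.

    Errors and correct responses to any other combination both earn blackout.
    """
    if correct and (chosen_color, chosen_location) == allocated:
        return "food"
    return "blackout"

# Example: preliminary-training settings, p(red) = p(left) = .5
allocated = assign_reinforcer(0.5, 0.5)
```

In a full session loop, `allocated` would be held unchanged across trials until `consequence` returned `"food"`, and only then would `assign_reinforcer` be called again, mirroring the rule that no other reinforcer is scheduled until the current one is collected.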

This method of scheduling reinforcers is equivalent to a variable-ratio (VR) 4 schedule for correct responding, given that red and green, and left and right, were equally likely to be correct on any one trial. (The density of reinforcement across trials could, however, be considerably lower if matching accuracy was lower and long runs of particular comparison configurations were displayed and/or selected for reinforcement.) It allowed us to vary independently the reinforcer ratio obtained for red/green responses and the reinforcer ratio obtained for left/right responses across conditions. Table 1 shows the probability that a reinforcer was assigned to the next correct red response, p(red), and the probability that a reinforcer was assigned to the next correct left response, p(left), in each of the 25 experimental conditions analyzed here. [Note that the probability of reinforcers for green responses is simply the complement of p(red), that is, 1 − p(red), and the probability of reinforcers for right responses is the complement of p(left), that is, 1 − p(left).] Four of the 25 conditions involved direct replications of earlier conditions (i.e., Conditions 4, 14, 15 & 16) to investigate order effects in the data.
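
The VR 4 equivalence can be checked with a short calculation, offered here as a sketch under two idealized assumptions: every comparison response is correct, and the sample and the side on which red appears are each chosen with probability .5 on every trial. The symbol q is introduced here for illustration. Whichever color–location combination currently holds the reinforcer, the probability that a given trial's correct response falls on that combination is

```latex
q = P(\text{sample calls for the allocated color}) \times P(\text{that color is on the allocated side})
  = .5 \times .5 = .25,
\qquad
E[\text{correct responses per reinforcer}] = \frac{1}{q} = 4 .
```

Under these assumptions q does not depend on p(red) or p(left), so the overall reinforcer density stays near VR 4 while the distribution of reinforcers across colors and locations changes.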

Table 1.

The probabilities of reinforcing the next correct–red and the next correct–left response in each of the experimental conditions. Replication conditions are indicated by *. Conditions 5, 6, and 7 were run with a different sample stimulus disparity and their data were not included in the present analyses. For Birds 141–145, the delay interval was 1.5 s, and for Bird 146 the delay interval was 0.5 s.

Condition	p(red)	p(left)
1	.5	.5
2	.2	.5
3	.8	.5
4*	.5	.5
8	.5	.8
9	.2	.8
10	.8	.8
11	.5	.2
12	.2	.2
13	.8	.2
14*	.5	.8
15*	.5	.2
16*	.5	.5
17	.9	.5
18	.1	.5
19	.5	.9
20	.5	.1
21	.1	.1
22	.8	.1
23	.2	.1
24	.9	.1
25	.9	.9
26	.9	.2
27	.9	.8
28	.1	.8

In all preliminary training and experimental conditions, sessions ended after 50 reinforcers had been obtained, or 45 min had elapsed, whichever came first.

Each condition was run for a minimum of 40 experimental sessions and until there were no obvious trends in measures of accuracy or response bias. Eight response types were tallied in each session: correct responses to red on the left key, correct responses to red on the right, correct responses to green on the left, correct responses to green on the right, incorrect responses to red on the left, incorrect responses to red on the right, incorrect responses to green on the left, and incorrect responses to green on the right. In addition, the number of reinforcers obtained for each of the four types of correct response was tallied for each session. The data summed over the last 20 sessions of each condition for each subject were analyzed.

Results

The relative frequency with which subjects chose one comparison-stimulus color over the other (i.e., red versus green), and the relative frequency with which they chose one comparison-stimulus location over the other (i.e., left versus right) are important aspects of performance in this study. Figure 1 presents an analysis of the relative frequency of red responses across conditions and Figure 2 an analysis of the relative frequency of left responses. In Figure 1, the log ratio of red to green responses was calculated separately for each configuration of sample and comparison stimuli (i.e., C₁–S₁–C₂, C₂–S₁–C₁, C₁–S₂–C₂, and C₂–S₂–C₁) in each condition, where C₁ and C₂ refer to the red and green comparisons respectively, and S₁ and S₂ refer to the bright and dim sample respectively. (Log red/green response ratios are positive when red is selected more often than green, and negative when green is selected more often than red. The absolute size of the value indicates the degree of bias for choosing one color over the other.) Figure 1 shows the mean and standard errors of these log ratios across subjects. Each panel plots these means as a function of the probability of a reinforcer being arranged for a correct response to the red comparison, p(red). The different panels show the effects of different probabilities of a reinforcer being arranged on the left key, p(left).
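
Because red and green always occupy opposite side keys, the color ratios of Figure 1 and the position ratios of Figure 2 are two readings of the same counts. The sketch below illustrates this for a single configuration. The function name, the argument names, and the use of base-10 logarithms are assumptions made for illustration, and the counts in the example are hypothetical.

```python
import math

def log_ratios(red_count, green_count, red_on_left):
    """Return (log red/green, log left/right) for one configuration.

    Counts must be positive. Base-10 logs are assumed here.
    """
    log_red_green = math.log10(red_count / green_count)
    # Choosing red is choosing left only when red is on the left key;
    # otherwise the position ratio is the color ratio with its sign flipped.
    log_left_right = log_red_green if red_on_left else -log_red_green
    return log_red_green, log_left_right

# Hypothetical counts: 80 red and 20 green choices with red on the right key
color_ratio, position_ratio = log_ratios(80, 20, red_on_left=False)
```

For this hypothetical red-on-right configuration, the color ratio is positive (more red than green choices) while the position ratio is negative (more right than left choices).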

Fig 1 — Within each panel, the logarithm of the red-to-green response ratio (log B_red/B_green) is plotted separately for each sample-comparison configuration as a function of the probability that a reinforcer was arranged for the next correct response on the red comparison p(red). The different panels show the effects of varying the probability that the reinforcer would be obtained at a particular location, p(left).

Fig 2 — Within each panel, the logarithm of the left-to-right response ratio (log B_left/B_right) is plotted separately for each sample-comparison configuration as a function of the probability that a reinforcer was arranged for the next correct response on the left side key p(left). The different panels show the effects of varying the probability that the reinforcer would be arranged for the next correct–red response, p(red).

Figure 1 shows that choice between comparison colors differed across the four configurations, and that these differences were related to the sample stimulus, the probability that responses to a comparison color were reinforced, and the probability that responses to a position were reinforced. The top left panel illustrates these effects. Consider first log response ratios on the two comparison configurations associated with S₁; the open and filled circles. These data show that there were more red (correct) responses than green (incorrect) responses following S₁ because all these data points exceeded 0. However, the bias for choosing the red comparison on S₁ trials was greater when the red comparison was on the right (C₂–S₁–C₁; open circles), where the probability of reinforcement for correct red responses was higher (p(right) = 1 – p(left) = 1 − .1 = .9), than when the red comparison was on the left (C₁–S₁–C₂; filled circles) where the probability of reinforcement for correct red responses was lower (p(left) = .1). The results following S₂ presentations show the mirror-image effect. That is, although choice on both S₂ configurations was biased to green (indicated by all triangles being less than 0), the bias for green was greater when green appeared on the right side-key (C₁–S₂–C₂ configuration; filled triangles) where the probability of reinforcement for correct green responses was high (p(right) = 1 – p(left) = 1 − .1 = .9), than when green appeared on the left key (C₂–S₂–C₁ configuration; open triangles) where the probability of reinforcement for correct green responses was lower (p(left) = .1). These differences between configurations with the same sample stimuli were evident in all the panels shown in Figure 1 except the rightmost where the probability of reinforcement across positions was the same (p(left) = .5).

The effect of varying the probability of reinforcers for correct red or correct green responses, p(red), is shown within each panel of Figure 1. This effect was less robust than the effect of varying reinforcement probabilities across comparison position, but there was some evidence that the relative number of red responses increased as the probability for red reinforcers increased.

Figure 2 shows the primary analysis implied by Jones' (2003) theory: log left/right response ratios for each of the four sample–comparison configurations plotted as a function of the probability that correct-left responses would be reinforced. Again, these data are the mean of log response ratios across subjects. The different panels depict the effects of different probabilities of a reinforcer being arranged for a correct-red response, p(red). Various effects are apparent in Figure 2. First, clear differences between left/right response ratios on the four configurations are apparent. The data points depicting choice on configurations where the left comparison was correct (C₁–S₁–C₂ and C₂–S₂–C₁; filled circles & open triangles respectively) are all positive, indicating more left than right responses (and, therefore, more correct than incorrect responses) on these trials. Similarly, data points depicting choice on configurations where the right comparison was correct (C₂–S₁–C₁ and C₁–S₂–C₂; open circles & filled triangles respectively) are all negative, indicating more right than left responses (and, therefore, more correct than incorrect responses) on these trials also. While these effects are apparent across all the panels except that showing equal probabilities of red and green reinforcers (p(red) = .5), the relative positions of the S₁ and S₂ configurations differ across panels. Consider the top left panel of Figure 2 where p(red) = .1: conditions where the probability of reinforcement for correct-red responses was low and, therefore, the probability of reinforcement for green responses was high. The bias toward left and correct on the C₂–S₂–C₁ configuration (open triangles) exceeded the bias toward left and correct on the C₁–S₁–C₂ configuration (filled circles). In addition, the bias toward right and correct on the C₁–S₂–C₂ configuration (filled triangles) exceeded the bias toward right and correct on the C₂–S₁–C₁ configuration (open circles). 
Put another way, there was a larger separation between data depicting the two S₂ configurations than there was between data depicting the two S₁ configurations, implying that matching was more accurate on S₂ trials than on S₁ trials when the probability of reinforcement for correct responses on S₂ trials (1 − p(red)) exceeded that arranged on S₁ trials (p(red)). However, when the probability of a red reinforcer was high and that of a green reinforcer was low (p(red) = .9; the lower-right panel in Figure 2), exactly the opposite effects can be seen: The separation between the two S₁ configurations exceeded the separation between the two S₂ configurations, and thus matching accuracy was higher on S₁ trials than on S₂ trials. Overall, therefore, the difference between position biases in the presence of the two S₁ configurations (C₁–S₁–C₂ and C₂–S₁–C₁), or equivalently matching accuracy on S₁ trials, increased as the probability of a red reinforcer increased, and the difference between position biases in the presence of the two S₂ configurations (C₁–S₂–C₂ and C₂–S₂–C₁), or equivalently matching accuracy on S₂ trials, increased as the probability of a green reinforcer increased. Finally, Figure 2 shows that the log left/right response ratios increased (with only one exception) as the probability of reinforcement for correct-left responses increased. It is noteworthy that this increase was considerably more robust than the increase in log red/green response ratios seen in Figure 1, a point which will be elaborated and discussed later.

The subsequent analyses use a quantitative model of signal detection (Davison & Tustin, 1978) to measure separately the effects of stimulus disparity and the left/right and red/green distributions of reinforcers. This model has routinely been applied to performances on MTS and DMTS tasks. Estimates of discriminability between the sample stimuli and response (comparison–selection) bias in each experimental condition were calculated using the measures log d and log b, respectively, from their model; that is,

$$\log d = \frac{1}{2}\log\left(\frac{B_{11}\,B_{22}}{B_{12}\,B_{21}}\right) \qquad (1)$$

and

$$\log b = \frac{1}{2}\log\left(\frac{B_{11}\,B_{21}}{B_{12}\,B_{22}}\right) \qquad (2)$$

where B₁₁ and B₁₂ are correct and incorrect responses, respectively, following Sample 1 (S₁), and B₂₂ and B₂₁ are correct and incorrect responses, respectively, following Sample 2 (S₂).
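
As a rough illustration of how these two measures are computed from response counts, the following sketch assumes the standard Davison and Tustin point estimates, log d = ½ log[(B₁₁·B₂₂)/(B₁₂·B₂₁)] and log b = ½ log[(B₁₁·B₂₁)/(B₁₂·B₂₂)], with base-10 logarithms. The function names and the example counts are hypothetical.

```python
import math

def log_d(b11, b12, b21, b22):
    """Discriminability: half the log of (correct products / error products)."""
    return 0.5 * math.log10((b11 * b22) / (b12 * b21))

def log_b(b11, b12, b21, b22):
    """Bias toward the response that is correct after S1 (here, red)."""
    return 0.5 * math.log10((b11 * b21) / (b12 * b22))

# Hypothetical counts: b11/b12 = correct/incorrect after S1,
# b22/b21 = correct/incorrect after S2.
d = log_d(b11=90, b12=10, b21=10, b22=90)   # symmetric accuracy
b = log_b(b11=90, b12=10, b21=10, b22=90)   # no bias, so 0.0
```

Both functions require every count to be positive; a cell with zero responses leaves the ratio undefined and would need separate handling.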

Red/Green and Left/Right Reinforcer Control

For each subject in each condition, the log response ratios (log B_red/B_green) from the configurations with red–left and green–right side keys following S₁ and S₂ (C₁–S₁–C₂ and C₁–S₂–C₂) were averaged and Equation 2 was applied. This gave a measure of any bias each subject had for choosing one comparison color more often than the other for that comparison-key configuration (C₁–S_1or2–C₂). This was also done for green–left and red–right side-key configurations (C₂–S_1or2–C₁). Figure 3 plots the mean estimates of color bias across subjects for each configuration as a function of the mean obtained log red/green reinforcer ratio in each condition. Each panel shows these relations for conditions with a different probability of the reinforcer being obtained on the left key, p(left). Lines of best fit through each set of data were found by least squares linear regression, and the equations describing those lines are shown in each panel. There was a moderate positive relation between color bias and changes in the log red/green reinforcer ratio, as indicated by the positive slopes of the best-fitting lines. Furthermore, there was no evidence of systematic differences between the slopes of the regression lines as a function of either p(left) or the two different configurations. Table 2 shows the results of corresponding linear regressions on the data from individual subjects. Overall, the individuals' results show a positive relation between red/green bias and the log red/green reinforcer ratio, although this effect is quite weak for Bird 144. These individual analyses also showed no systematic changes in slope across either p(left) or the two configurations.
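
The regression step can be illustrated with an ordinary least-squares fit of log bias on the log reinforcer ratio, where the slope corresponds to the sensitivity a and the intercept to the inherent bias log c reported in Table 2. The sketch is a generic least-squares routine rather than the authors' analysis code, and the data points are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired lists x, y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical log red/green reinforcer ratios and log color biases
log_reinforcer_ratio = [-0.95, -0.60, 0.0, 0.60, 0.95]
log_color_bias = [0.4 * x - 0.7 for x in log_reinforcer_ratio]

a, log_c = fit_line(log_reinforcer_ratio, log_color_bias)
```

With these noise-free hypothetical points the fit recovers the generating slope and intercept; real data would scatter around the line, which is what the R² column in Table 2 summarizes.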

Fig 3 — Log color bias, calculated using Equation 2, is plotted as a function of the log red/green reinforcer ratio. The different panels show the results from different probabilities that the reinforcer was also arranged for a left-key response, p(left).

Table 2.

Estimates of the sensitivity of color biases to changes in the red/green reinforcer ratio, a, and inherent bias, log c, for the fits to the data from individual subjects. The variance accounted for by the fitted lines (R²) is also given.

Bird	p(left)	Choice-key configurations
		left–red and green–right			left–green and red–right
		a	log c	R²	a	log c	R²
141	0.1	0.43	−0.72	0.59	0.29	0.61	0.60
	0.2	0.35	−0.64	0.99	0.37	0.47	0.96
	0.5	0.14	−0.29	0.37	0.47	0.09	0.80
	0.8	0.05	−0.08	0.28	0.45	−0.17	0.87
142	0.1	0.32	−0.98	0.79	0.38	1.16	0.87
	0.2	0.28	−0.70	0.77	0.29	0.62	0.28
	0.5	0.28	−0.08	0.53	0.29	0.14	0.40
	0.8	0.05	0.35	0.02	0.19	−0.52	0.39
143	0.1	0.16	−0.96	0.28	0.26	0.90	0.91
	0.2	0.46	−0.65	0.92	0.44	0.42	0.94
	0.5	0.40	−0.01	0.58	0.44	0.04	0.73
	0.8	0.75	0.40	0.81	0.33	−0.50	0.83
144	0.1	0.16	−1.00	0.15	0.44	1.28	0.78
	0.2	0.37	−0.59	0.90	0.04	0.71	0.01
	0.5	0.21	0.21	0.32	−0.05	−0.19	0.03
	0.8	0.06	0.83	0.03	−0.06	−0.96	0.03
145	0.1	0.22	−0.52	0.31	0.09	0.73	0.08
	0.2	−0.15	−0.35	0.17	0.51	0.33	0.92
	0.5	0.24	0.02	0.52	0.35	−0.08	0.57
	0.8	0.57	0.53	0.84	0.06	−0.69	0.05
146	0.1	0.38	−0.27	0.83	0.60	0.60	0.89
	0.2	0.57	−0.28	0.95	0.52	0.28	0.95
	0.5	0.30	0.17	0.71	0.38	−0.25	0.73
	0.8	0.50	0.55	0.93	0.33	−0.63	0.52

Figure 3 and Table 2 show that the intercepts of the regression lines for the two configurations changed as p(left) changed. When p(left) equaled 0.5, the intercepts were nearly identical, but the two functions became increasingly separated as p(left) became more extreme. These changes were consistent with a bias toward the side key with the greater probability of reinforcement for each configuration. For example, when p(left) was .1 and p(right) was .9, the subjects showed a bias for the green key in the red–left green–right configuration (C₁–S_1or2–C₂; mean = −0.74) and for the red key in the green–left red–right configuration (C₂–S_1or2–C₁; mean = 0.88). The intercepts following the regression analyses of individual data (Table 2) generally follow the same pattern as the mean data.

Figure 4 shows the corresponding analysis when mean measures of position bias on the C₁–S_1or2–C₂ and C₂–S_1or2–C₁ configurations were calculated using the appropriate modification to Equation 2 and plotted as a function of the mean obtained log left/right reinforcer ratio in each condition. Each panel plots these data for conditions with a different probability of reinforcement for red responses (p(red)). The data from each comparison-key configuration are plotted separately, each with the best fitting lines drawn through them, and the equations of those lines provided. There was a clear positive relation between position bias and changes in the log left/right reinforcer ratio evident in high positive slopes to the fitted lines. Furthermore, there were no systematic changes in the slopes of the regression lines as a function of either p(red) or the two different comparison configurations. Table 3 shows the results of corresponding regressions for individual subjects' data, and these also showed no systematic effects of either p(red) or configuration.

Fig 4 — Log position bias, calculated using a modification to Equation 2, is plotted as a function of the log left/right reinforcer ratio. The different panels show the results from different probabilities that the reinforcer was also arranged for a red-key response, p(red).
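The relation between the color-bias and position-bias measures can be illustrated with a small calculation. The sketch assumes that Equation 2 averages the two log response ratios obtained after S₁ and S₂, and that the "appropriate modification" simply substitutes left/right for red/green response ratios; both are assumptions, and the response counts are hypothetical.

```python
import math

def log_bias(ratio_after_s1, ratio_after_s2):
    """Average of the two log10 response ratios (assumed form of Equation 2)."""
    return 0.5 * (math.log10(ratio_after_s1) + math.log10(ratio_after_s2))

# Hypothetical response counts for the green-left red-right configuration.
# Each entry is (left pecks, right pecks); green is on the left key here.
counts = {"S1": (30, 70), "S2": (90, 10)}

# Position bias uses left/right response ratios.
position_bias = log_bias(counts["S1"][0] / counts["S1"][1],
                         counts["S2"][0] / counts["S2"][1])

# Color bias uses red/green ratios; red is on the right key in this
# configuration, so each ratio is inverted relative to left/right.
color_bias = log_bias(counts["S1"][1] / counts["S1"][0],
                      counts["S2"][1] / counts["S2"][0])
```

For this configuration the two measures have equal magnitude and opposite sign, whereas for the red–left green–right configuration they would coincide. This is why a single side-key preference appears as opposite color biases in the two configurations.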

Table 3.

Estimates of the sensitivity of position biases to changes in the left/right reinforcer ratio, a, and inherent bias, log c, for the fits to the data from individual subjects. The variance accounted for by the fitted lines (R²) is also given.

Bird	p(red)	Choice-key configurations
		left–red and green–right			left–green and red–right
		a	log c	R²	a	log c	R²
141	0.1	0.51	−0.43	0.97	0.82	0.33	0.99
	0.2	0.80	−0.43	0.86	0.46	0.09	0.94
	0.5	0.46	−0.28	0.72	0.42	−0.09	0.82
	0.8	0.26	−0.18	0.87	0.62	−0.25	0.98
	0.9	0.46	−0.09	0.89	0.52	−0.54	0.96
142	0.1	0.89	−0.41	0.99	1.08	0.08	0.91
	0.2	0.99	−0.16	0.99	0.79	0.09	0.95
	0.5	0.93	−0.14	0.94	0.94	−0.10	0.96
	0.8	1.22	0.27	0.99	1.12	−0.33	0.93
	0.9	0.54	−0.04	0.90	0.92	−0.40	0.79
143	0.1	0.81	−0.65	0.99	0.91	0.19	0.99
	0.2	0.61	−0.35	0.83	0.85	0.17	0.97
	0.5	0.72	−0.12	0.83	0.78	0.04	0.89
	0.8	1.07	0.20	0.90	0.83	−0.25	0.95
	0.9	1.22	0.39	0.96	0.87	−0.41	0.96
144	0.1	1.02	0.03	0.95	1.01	0.26	0.99
	0.2	1.52	0.02	1.00	1.56	0.24	1.00
	0.5	0.97	0.17	0.93	1.03	0.01	0.92
	0.8	1.28	0.42	0.98	1.64	0.06	0.99
	0.9	0.95	0.11	0.95	1.37	−0.01	0.84
145	0.1	0.45	−0.13	0.95	0.78	0.40	0.92
	0.2	0.40	−0.20	0.50	0.97	0.19	0.79
	0.5	0.77	0.11	0.88	0.91	0.19	0.92
	0.8	1.18	0.24	0.94	1.19	0.14	0.99
	0.9	0.76	0.42	0.88	0.69	−0.23	0.89
146	0.1	0.47	−0.16	0.96	0.70	0.53	0.98
	0.2	0.45	−0.16	0.74	0.55	0.40	0.74
	0.5	0.79	0.25	0.93	0.87	0.20	0.94
	0.8	0.50	0.32	0.79	0.66	−0.08	1.00
	0.9	0.56	0.63	0.96	1.15	−0.21	0.98

A comparison of Figures 3 and 4, and Tables 2 and 3, showed that the slopes of the regression lines following changes to the log left/right reinforcer ratio were consistently greater than those following changes to the log red/green reinforcer ratio. That is, the sensitivity of position biases to the obtained left/right reinforcer ratio consistently exceeded the sensitivity of color biases to the obtained red/green reinforcer ratio. The differences were also quite large; the mean sensitivity of color bias to the red/green reinforcer ratio across subjects and configurations was 0.30 (SD = 0.19), whereas the mean sensitivity of position biases to the left/right reinforcer ratio was 0.84 (SD = 0.30).

Figure 4 shows that the intercepts of the regression lines for the two configurations changed as p(red) changed. When p(red) was 0.5, the two functions were nearly identical, but they became increasingly separated as p(red) became more extreme. These changes were consistent with an increasing bias toward the key color with the greater probability of reinforcement for each configuration as that probability of reinforcement increased. For example, when p(red) was .1 and p(green) was .9 (top left panel), the subjects showed a bias for the right key in the red–left green–right configuration (C₁–S_1or2–C₂; intercept = 0.30) and a bias for the left key in the green–left red–right configuration (C₂–S_1or2–C₁; intercept = −0.29). These biases remained when p(red) was 0.2 and p(green) was .8 (top center panel), but they both decreased. When p(red) was .8 and p(green) was .2 (bottom left panel), the bias was now to choosing left in the red–left green–right configuration (C₁–S_1or2–C₂; intercept = 0.21) and to choosing right in the green–left red–right configuration (C₂–S_1or2–C₁; intercept = −0.12). These biases remained but were greater when p(red) equaled .9 (bottom center panel). The results from the regressions of individual subjects' data (Table 3) were consistent with the mean data shown in Figure 4. However, the size of the changes in intercept across changes in p(red) was generally smaller in Figure 4 and Table 3 than the corresponding changes found when p(left) was varied (Figure 3, Table 2).

Taken together, the results of Figures 3 and 4 and Tables 2 and 3 suggested the red/green and the left/right reinforcer ratios operate multiplicatively to determine choice between comparison stimuli and, therefore, that the effects of the logarithms of these reinforcer ratios were independent and additive. In mathematical terms, the comparison-color bias for the red–left green–right configuration (C₁–S_1or2–C₂) could be described by the equation,

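$$\log b_{C_1\text{-}S\text{-}C_2} = a_1 \log\left(\frac{R_{red}}{R_{green}}\right) + a_2 \log\left(\frac{R_{left}}{R_{right}}\right) + \log c_1 + \log c_2 \qquad (3)$$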

and the bias for either comparison color on the green–left red–right configuration (C₂–S_1or2–C₁) by,

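$$\log b_{C_2\text{-}S\text{-}C_1} = a_1 \log\left(\frac{R_{red}}{R_{green}}\right) - a_2 \log\left(\frac{R_{left}}{R_{right}}\right) + \log c_1 - \log c_2 \qquad (4)$$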

where R denotes numbers of reinforcers, a₁ denotes the sensitivity of behavior to the red/green reinforcer ratio, a₂ denotes the sensitivity of behavior to the left/right reinforcer ratio, c₁ and c₂ are inherent biases related to color and location respectively, and all other notation is as above. Therefore, a multiple linear regression was conducted on the data from each subject in all 25 conditions using the estimates of response bias from the separate configurations as the dependent variable; that is, Equations 3 and 4 were fitted simultaneously to the data. The results are shown in Table 4. The R² measures indicated that the model fitted well, accounting for much of the variance in the data. There was a significant effect of both the red/green reinforcer ratio (a₁) and the left/right reinforcer ratio (a₂) for every subject. Furthermore, sensitivity of behavior to changes in the left/right reinforcer ratio was always greater than sensitivity to changes in the red/green reinforcer ratio, sometimes markedly so (e.g., Birds 142 and 144). There was also some evidence of an inverse relation between estimates of a₁ and a₂, but a correlation test was not significant (p = .16). Estimates of inherent biases for choosing one comparison color over the other (log c₁) were negligible and not significantly different from 0 for 5 of the 6 subjects. In contrast, inherent biases for location (log c₂) were quite large and significantly different from 0 for 5 of the 6 subjects at p < .05. Finally, the residuals following the multiple linear regressions were plotted as a function of the log red/green reinforcer ratio and the log left/right reinforcer ratio separately for all subjects (Figure 5). In both cases, the residuals were evenly scattered around zero, indicating that the model did not appear to miss any systematic effects of the independent variables.
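One way to fit both configurations simultaneously is a single least-squares problem in which the position terms change sign with the configuration. The sketch below assumes that form, inferred from the text: the color terms enter identically for both configurations, and the left/right terms enter positively for red–left green–right and negatively for green–left red–right. The function name, the configuration coding, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_color_and_position(log_rg, log_lr, config_sign, log_b):
    """Estimate a1, a2, log c1, log c2 from color-bias data of both configurations.

    config_sign is +1 for red-left green-right rows and -1 for
    green-left red-right rows, so the position terms flip sign.
    """
    log_rg = np.asarray(log_rg, dtype=float)
    log_lr = np.asarray(log_lr, dtype=float)
    s = np.asarray(config_sign, dtype=float)
    y = np.asarray(log_b, dtype=float)
    # Columns: color ratio, signed position ratio, color bias, signed position bias.
    design = np.column_stack([log_rg, s * log_lr, np.ones_like(y), s])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    a1, a2, log_c1, log_c2 = coeffs
    return float(a1), float(a2), float(log_c1), float(log_c2)

# Hypothetical noise-free data on a 5 x 5 grid of reinforcer ratios (25 conditions).
levels = np.log10([1 / 9, 1 / 4, 1.0, 4.0, 9.0])
rg, lr = (g.ravel() for g in np.meshgrid(levels, levels))
log_rg_all = np.concatenate([rg, rg])
log_lr_all = np.concatenate([lr, lr])
sign_all = np.concatenate([np.ones(25), -np.ones(25)])
true_params = (0.30, 0.85, 0.0, 0.10)
log_b_all = (true_params[0] * log_rg_all
             + sign_all * true_params[1] * log_lr_all
             + true_params[2]
             + sign_all * true_params[3])
estimates = fit_color_and_position(log_rg_all, log_lr_all, sign_all, log_b_all)
```

The sketch returns point estimates only. Standard errors and significance tests such as those reported for each subject would need a statistics package.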

Table 4.

Results of a multiple linear regression of log response bias (log b) for the red–green and green–red configurations as a function of red–green and left–right obtained reinforcer ratios fitting Equations 3 and 4 simultaneously. The parameters a₁ and a₂ measure sensitivity of behavior to changes in the ratio of reinforcers across key color and key position, respectively. The parameters c₁ and c₂ measure any consistent bias for key color or key position, respectively, across conditions.

Subject	a₁ (std err)	a₂ (std err)	log c₁ (std err)	log c₂ (std err)	R²
141	0.32 (.04)^***	0.50 (.04)^***	−0.09 (0.02)^**	−0.20 (.02)^***	.88
142	0.25 (.05)^***	0.90 (.05)^***	0.00 (.03)	−0.13 (.03)^***	.90
143	0.40 (.05)^***	0.87 (.05)^***	−0.03 (.03)	−0.06 (.03)	.90
144	0.14 (.06)^*	1.18 (.06)^***	0.03 (.04)	0.10 (.04)^*	.90
145	0.26 (.05)^***	0.81 (.05)^***	−0.01 (.04)	0.11 (.04)^**	.84
146	0.40 (.04)^***	0.73 (.04)^***	0.01 (.03)	0.20 (.03)^***	.89

^*** p < .001, ^** p < .01, ^* p < .05
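The ordering of the two sensitivities, and the direction of the relation between them, can be rechecked from the estimates in Table 4. The text does not name the correlation test that was used, so Pearson's r in the sketch below is an assumption; the sketch reports only the coefficient and does not attempt to reproduce the published p value.

```python
import numpy as np

# Sensitivity estimates copied from Table 4 (Birds 141 to 146).
a1 = np.array([0.32, 0.25, 0.40, 0.14, 0.26, 0.40])  # color (red/green)
a2 = np.array([0.50, 0.90, 0.87, 1.18, 0.81, 0.73])  # position (left/right)

# True if position sensitivity exceeds color sensitivity for every bird.
position_exceeds_color = bool(np.all(a2 > a1))

# Pearson correlation between the two sets of estimates (assumed test).
r = float(np.corrcoef(a1, a2)[0, 1])
```

A negative `r` corresponds to the inverse relation mentioned in the text. With only six subjects, a coefficient of this size is compatible with the nonsignificant result reported.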

Fig 5 — Residuals, following fits to Equations 3 and 4, are plotted as a function of the log red/green reinforcer ratio (left panel) and log left/right reinforcer ratio (right panel) for each subject in each condition.

Stimulus Discriminability

Figure 6 plots the mean estimates of stimulus discriminability (log d, Equation 1) as a function of the log red/green reinforcer ratio for each of the two configurations of comparison stimuli separately. The separate panels show different arranged values of p(left). There were no consistent differences between the estimates of log d obtained from the two configurations. However, Figure 6 provides some evidence that the estimates of stimulus discriminability (log d) changed as a function of the log red/green reinforcer ratio; specifically, estimates of discriminability were greater at more extreme reinforcer ratios. This was particularly clear in the U-shaped pattern in the top left panel of Figure 6 where p(left) was 0.1. A more detailed examination of this effect appears below.

Fig 6 — Mean discriminability (Equation 1) is plotted as a function of the log red/green reinforcer ratio. The different panels show the results from different probabilities that the reinforcer was also arranged for a left-key response, p(left).

Figure 7 shows the corresponding analysis of stimulus discriminability estimates using the log left/right reinforcer ratio as the independent variable within each panel, and with p(red) varying across panels. There were no consistent differences between the estimates of log d obtained from the two configurations and, unlike Figure 6, there was no evidence of a U-shaped function.

Fig 7 — Mean discriminability (Equation 1) is plotted as a function of the log left/right reinforcer ratio. The different panels show the results from different probabilities that the reinforcer was also arranged for a red-key response, p(red).

A multiple linear regression was conducted for each subject using the data from all conditions to investigate the patterns suggested by Figures 6 and 7. The dependent variables were the estimates of stimulus discriminability (log d) from each of the two configurations. The independent variables were the log red/green reinforcer ratio, the log left/right reinforcer ratio, and, because Figure 6 suggested quadratic effects, the squares of these log ratios. Table 5 shows the results of the multiple regressions. All subjects showed a significant y-intercept (p < .001), but this was not surprising because Figures 6 and 7 clearly show high values of log d in all the conditions. There was also some statistical support for the U-shaped pattern in Figure 6 in that Subjects 143, 145, and 146 showed significant quadratic effects of the log red/green reinforcer ratio. This implies that estimates of stimulus discriminability increased significantly as the red/green reinforcer ratio deviated further, and in either direction, from 1 (a log ratio of 0). There was less support for an inverted U-shaped relation in Figure 7; only Subject 144 showed a significant negative quadratic effect (p = 0.041) of the log left/right reinforcer ratio. Finally, Subject 146 showed a small but significant linear effect (p = 0.039) of the log red/green reinforcer ratio.
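The quadratic regression described above can be sketched as an ordinary least-squares fit with four predictors and an intercept. This is an illustrative sketch, not the authors' procedure: the function name and the synthetic U-shaped data are hypothetical, and only point estimates are returned.

```python
import numpy as np

def fit_log_d(log_rg, log_lr, log_d):
    """Regress log d on both log reinforcer ratios and their squares.

    Returns coefficients in the order (a1, a2, a3, a4, intercept):
    linear color, linear position, quadratic color, quadratic position.
    """
    x1 = np.asarray(log_rg, dtype=float)
    x2 = np.asarray(log_lr, dtype=float)
    y = np.asarray(log_d, dtype=float)
    design = np.column_stack([x1, x2, x1 ** 2, x2 ** 2, np.ones_like(y)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return tuple(float(c) for c in coeffs)

# Hypothetical data: log d is U-shaped in the red/green ratio only.
levels = np.log10([1 / 9, 1 / 4, 1.0, 4.0, 9.0])
x1_demo, x2_demo = (g.ravel() for g in np.meshgrid(levels, levels))
log_d_demo = 1.0 + 0.4 * x1_demo ** 2
coeffs_demo = fit_log_d(x1_demo, x2_demo, log_d_demo)
```

A positive quadratic coefficient (a₃ or a₄) corresponds to a U-shaped function centered on equal reinforcer rates, and a negative one to an inverted U.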

Table 5.

Results of a multiple linear regression of discriminability (log d) for the red–green and green–red configurations as a function of the log red–green reinforcer ratio, the log left–right obtained reinforcer ratio, and the squares of these ratios.

Bird	log(R_red/R_green)	log(R_left/R_right)	(log(R_red/R_green))²	(log(R_left/R_right))²	Intercept
	a₁ (std err)	a₂ (std err)	a₃ (std err)	a₄ (std err)	(std err)
141	0.03 (0.07)	−0.05 (0.07)	−0.15 (0.12)	−0.21 (0.12)	1.28 (0.08)^***
142	0.00 (0.07)	0.00 (0.07)	−0.05 (0.07)	−0.23 (0.12)	1.05 (0.08)^***
143	0.11 (0.05)	−0.05 (0.05)	0.34 (0.09)^***	0.06 (0.10)	0.93 (0.06)^***
144	−0.03 (0.07)	0.00 (0.06)	0.17 (0.12)	−0.24 (0.11)^*	1.06 (0.07)^***
145	0.02 (0.06)	0.04 (0.05)	0.39 (0.09)^***	0.08 (0.10)	1.13 (0.07)^***
146	0.12 (0.05)^*	−0.02 (0.05)	0.66 (0.09)^***	0.12 (0.10)	1.15 (0.06)^***

^*** p < .001, ^* p < .05

Discussion

Various findings emerged from this systematic variation of reinforcer ratios in a DMTS task. The pigeons' behavior toward the comparison stimuli was sensitive to the distribution of reinforcers across both comparison-stimulus color and comparison-stimulus position (Figures 1 and 2). Furthermore, the effects of these two dimensions of reinforcer control were independent and additive when the ratio data were transformed into logarithms (Figures 3 and 4, Tables 2 and 3). That is, the sensitivity of position biases to the obtained left/right reinforcer ratio did not change systematically across sets of conditions arranging different red/green reinforcer ratios. Similarly, the sensitivity of color biases to the red/green reinforcer ratio was unaffected by variations of the left/right reinforcer ratio.

Extant quantitative models of DMTS (and MTS) performance do not account for the effects of the left/right reinforcer ratio observed here because none incorporates the relevant dependent and independent variables. However, an extended version of the Davison and Tustin (1978) model that included separate terms for the distribution of reinforcers across comparison-stimulus colors and across comparison-stimulus positions (Equations 3 and 4) described the results well (Table 4, Figure 5). These analyses revealed that the sensitivity of left/right response ratios to the reinforcer distribution across position was consistently and markedly greater than the sensitivity of red/green response ratios to the reinforcer distribution across comparison color. This result, and the size of the difference (Tables 2 and 3), is, perhaps, surprising. Comparison color was the relevant stimulus dimension during the choice phase for showing discrimination between the sample stimuli, and Figures 6 and 7 illustrate that the pigeons discriminated between the samples accurately; in other words, the color of the comparison stimuli clearly had stimulus control over the pigeons' responding. Despite this degree of discrimination between the comparison colors, varying the red/green reinforcer ratio produced relatively small changes in measures of color bias (Figure 3). Furthermore, we are unaware of any evidence that sensitivity of choice responding to reinforcer ratios is markedly lower in switching-key concurrent schedules, where stimuli such as key colors typically signal the two schedules, than in two-key concurrent schedules where key location signals the schedules. Perhaps response location is a particularly salient dimension where reinforcer control is concerned, and this salience is highlighted in situations such as DMTS tasks where position and other stimulus dimensions can gain reinforcer control.

The present experiment is relevant for Jones' (2003) conceptual model of contingencies of reinforcement in MTS tasks. Jones proposed that MTS tasks (and by implication, DMTS tasks) involve four-term contingencies of reinforcement where the sample stimuli serve as conditional stimuli (the first terms), the comparison-key configurations serve as the discriminative stimuli (the second terms), left and right responses (or, more generally, location-directed responses) are the fundamental units of behavior (the third terms), and differential consequences for the two responses (depending on the comparison configurations and the conditional stimuli appearing on a trial) serve as the fourth terms. The results in Figures 1 and 2 are consistent with parts of this theory. In Figure 1, whenever p(left) was not .5, there were clear differences between performance for each configuration; that is, the contingencies signaled by a particular configuration controlled behavior on that trial. For example, when p(left) was .9 and p(red) was .9 (rightmost data points in the bottom-central panel), then the configuration of red–left and green–right comparison stimuli was associated with a more extreme reinforcer distribution than the green–left and red–right configuration, and the response ratio data clearly reflect this difference (Figure 1). Thus, these results show that comparison configurations can serve as discriminative stimuli in a DMTS task, at least when differential frequencies of reinforcement are earned for left versus right and red versus green responses (conditions that generally hold even when these reinforcer ratios are not explicitly varied). In addition, Figure 2 shows that left/right response ratios changed systematically with variation of the left/right reinforcer ratio, supporting Jones' contention that location-directed responses (i.e., left versus right in the typical DMTS task) are the appropriate units of behavior for analysis. 
Together, these results support Jones' theory that subjects in a MTS task (and a DMTS task) are not choosing the comparison color that is designated by the sample to be the correct choice; instead they are choosing the side-key that is designated by the sample and the comparison configuration on that trial to be the correct choice. Although this approach is an alternative to the view taken in extant models of DMTS (and MTS) performance, it does not necessarily conflict with the model presented in Equations 3 and 4 where Davison and Tustin's (1978) model was extended. Such an extension was required to capture the combined effects of reinforcement parameters across the two different configurations, and was straightforward because the effects of the reinforcer ratios for the two stimulus dimensions comprising the configuration were independent.

Aside from receiving support from the results obtained in this experiment, the theory advanced by Jones (2003) might also assist our understanding of the difference between position-bias and color-bias sensitivities observed here. One implication of his offering location-directed responses (e.g., left and right keypecks) as the fundamental responses in MTS and DMTS tasks is that the relative frequency of emitting either response should vary as a function of the relative frequency with which it is reinforced. That is, a choice exists between emitting the different responses, and the reinforcement history of each should exert some control over that choice. (As noted above, Figure 2 shows evidence of such control.) However, Jones challenged the traditional view that comparison-color biases arise from variations of the comparison-color reinforcer ratio by a similar mechanism. Instead, he argued that any tendency to choose one comparison color more often than the other over a block of trials (where the two comparisons were correct equally often) reflected different comparison-discrimination accuracies on S₁ and S₂ trials over that block. His argument went as follows: First, he noted that the degree of any difference between position biases on the two comparison configurations after one sample (e.g., C₁–S₁–C₂ and C₂–S₁–C₁) afforded a measure of the differential responding toward the comparisons after that sample; for example, the larger the difference between a left-key bias on C₁–S₁–C₂ and C₂–S₁–C₁ trials, the greater the differential responding. The data for such an analysis appear in Figure 2 where a change in the difference between position biases on the two configurations after one sample was apparent across parts of the experiment varying the red/green reinforcer ratio. (Note that a point estimate of discrimination accuracy between the comparisons after one sample could be calculated by applying Equation 1 to these data.) 
Furthermore, responding differentially to the two configurations after one sample reflects stimulus control by the relevant dimension of the comparison stimuli, usually their colors. Thus, the greater the difference between position biases on the two comparison configurations, the greater the stimulus control by the comparison colors, and the higher matching accuracies will be. Second, he noted that subjects will have selected one comparison stimulus more often than the other over a large block of trials whenever these measures of stimulus control by comparison colors differ after each sample. (Jones & Davison, 1998, referred to differences between these measures as asymmetrical discrimination accuracies). For example, when subjects choose C₁ after S₁ more often than they choose C₂ after S₂, and S₁ and S₂ are presented equally often, they will have chosen C₁ more often than C₂ overall. Thus, different degrees of discrimination between the comparisons after the two samples will generate a difference in comparison-selection frequencies that mimics a response bias arising from some matching-type relations. But how might such different discrimination accuracies arise? Jones (2003) highlighted the fact that different rates of reinforcement for correct responding are signaled by the samples (to a degree depending on sample disparity) whenever the reinforcer ratio for C₁/C₂ selections deviates from 1.0. Based on his results, Jones argued that the rate of reinforcement that could effectively be signaled by a sample determined the degree of discrimination between the comparisons (or the stimulus control exerted by the relevant comparison-stimulus dimension); the higher that rate of signaled reinforcement, the greater the comparison discrimination. (Figure 2 presents further evidence of this effect.) 
Therefore, his approach to the current data asserts that the difference between position-bias and color-bias sensitivities reflects a difference between the mechanisms underlying each so-called bias. To reiterate, the tendency to choose one comparison color more often than the other results from a difference between comparison–discrimination accuracies following the two samples arising from different signaled probabilities of reinforcement for correct responses, whereas the tendency to choose left more often than right reflects some degree of matching (i.e., sensitivity) to the obtained left/right reinforcer ratio. With such a difference between mechanisms, it is perhaps not surprising that the sensitivities of color biases and the sensitivities of position biases were so different in the present experiment. Although plausible, this interpretation remains tentative and awaits further examination.
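The argument that unequal discrimination accuracies after the two samples can mimic a comparison-color bias can be made concrete with a small numerical example. The counts below are hypothetical, and the bias measure assumes that Equation 2 is the average of the two log C₁/C₂ response ratios.

```python
import math

# Hypothetical counts over 100 S1 trials and 100 S2 trials.
# C1 is correct after S1; C2 is correct after S2.
c1_after_s1, c2_after_s1 = 95, 5    # accurate choice after S1
c1_after_s2, c2_after_s2 = 20, 80   # less accurate choice after S2

# Overall selection frequencies of each comparison.
total_c1 = c1_after_s1 + c1_after_s2
total_c2 = c2_after_s1 + c2_after_s2

# Apparent bias toward C1: average of the two log10 C1/C2 response ratios
# (assumed form of Equation 2).
apparent_log_bias = 0.5 * (math.log10(c1_after_s1 / c2_after_s1)
                           + math.log10(c1_after_s2 / c2_after_s2))
```

Here C₁ is chosen 115 times and C₂ 85 times, and the bias measure is about 0.34, even though nothing in the example involves matching to a comparison-color reinforcer ratio. The asymmetry in accuracy alone produces the apparent bias.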

The present results are also relevant for previous research that investigated reinforcer control in DMTS tasks (e.g., Harnett et al., 1984; Jones & White, 1992; McCarthy & Davison, 1991; McCarthy & Voss, 1995). These studies varied the reinforcer ratio for C₁/C₂ selections, but never controlled for systematic variation in the reinforcer distribution across positions. On one level this might not be a problem; the present data suggest that although differences in the reinforcer distribution across left and right keys would affect performance, these effects would be independent of the effects of the reinforcer distribution across colors. This means that the conclusions of these previous studies would be unchanged. However, the interaction between control by the comparison-color reinforcer ratio and control by the left/right reinforcer ratio proposed by McCarthy and Davison (1991) and Jones and White (1992) is unlikely to have existed. This implies that the differences in results between these two studies cannot be due to different degrees of position bias in the behavior of the subjects across studies. Instead, other procedural differences between the studies must account for the different results, and only further research can establish what those critical differences were. Future research might also consider whether sensitivity to comparison-stimulus-color and comparison-stimulus-position reinforcer ratios both change as a function of stimulus–choice delay, and, in particular, whether there is an inverse relation between the changes in these parameters.

The fact that comparison-stimulus configuration is a multidimensional stimulus complicates the modeling of configuration discriminability, and therefore, of discriminability between the comparison stimuli themselves. In particular, the effects of varying the stimulus disparity of each dimension (e.g., position and color) comprising the configuration are not necessarily equivalent. It is beyond the scope of the present data to test a model that captures all these effects. We do, however, offer two directions for extending conventional quantitative models of DMTS performance. The first model is a modification of Equations 3 and 4. For simplicity and economy, only the equation for choice on trials involving a bright-yellow sample and the red–left and green–right comparison configuration (assuming the red comparison is correct) is given; the related equations for the other three combinations of sample stimuli and choice-key configurations follow the same logical pattern. The equation, without its logarithmic transformation, is given by

graphic file with name jeab-89-03-03-e05.jpg

where all notation is as above, and the parameters b₁ and b₂ measure the extent to which the disparity between key colors and key positions, respectively, affects performance. Together, the parameters b₁ and b₂ encompass the effects of discriminability between the two stimulus dimensions defining the configuration. However, the two parameters produce slightly different effects on behavior. The parameter b₁ modulates the effects of the red–green reinforcer ratio, any red–green response bias, and discrimination between the sample stimuli, whereas the parameter b₂ changes only the effects of the left–right reinforcer ratio and left–right response bias.

Alsop and Davison (1991) provided an alternative quantitative model of signal detection. Extended to accommodate the effects of configuration, the equivalent equation to Equation 5 could be written

graphic file with name jeab-89-03-03-e06.jpg

where d_sb, d_br1, and d_br2 measure “the generalization engendered by confusion between stimulus–response relations and between response–reinforcer relations” (Davison & Nevin, 1999, p. 449), respectively, and all other notation is as above. The parameter d_sb is similar to log d in previous equations. The parameters d_br1 and d_br2 measure the extent to which changes in the distribution of reinforcers (in this case across key color and key position, respectively) change behavior. This approach assumes that strict matching occurs (i.e., changes in reinforcer ratios produce equal changes in response ratios) only when the discriminability between the response–reinforcer contingencies is perfect. Physical disparity between the response alternatives (e.g., color or position) is one factor that can affect the discriminability between response–reinforcer contingencies (Alsop & Davison, 1991; Godfrey & Davison, 1998; Miller, Saunders, & Bourland, 1980). The parameters d_br1 and d_br2, therefore, inherently capture the discriminability between configurations, provided the effects of the configuration's different dimensions are independent, an assumption that is supported by the results of the present experiment.

The question of which of the two new models presented here is more viable is not straightforward to answer. Equation 6 and its related equations seem a more parsimonious treatment of the effects of configuration than Equation 5. In addition, Equation 5 does not predict changes in inherent response biases, c₁ and c₂, as a function of changes in the discriminability between the two configurations, and this seems unlikely on logical grounds. Separate expressions modeling effects on these measures of inherent bias would increase the complexity, and probably the number of free parameters, of Equation 5. An extensive parametric experiment varying reinforcer distributions at different levels of disparity between both dimensions of the configurations would be necessary to test the absolute and relative merit of these two models.

A further result obtained in the present experiment was that the distribution of reinforcers across comparison color and comparison position had different effects on discriminability. There was evidence of a differential outcome effect (DOE) across changes in the distribution of red/green reinforcers; that is, estimates of stimulus discriminability (log d) were greater when the red/green reinforcer distribution was unequal than when it was equal (Figure 6). Analysis of individual subjects' data showed this effect for 3 of the 6 subjects (Table 5). However, there was no evidence of a DOE across changes in the distribution of left/right reinforcers (Figure 7), and 1 subject showed the opposite effect with lower estimates of log d at unequal distributions (Table 5). This difference between the effects of the red/green and the left/right reinforcer ratios is consistent with an explanation of the DOE known as outcome-expectancy theory (Peterson & Trapold, 1980). According to this theory, matching accuracy is higher when different outcomes serve as reinforcers for the two correct responses because an expectancy of the outcome serves as an additional cue for selection of the correct comparison stimulus. Clearly, the sample stimuli in a DMTS task signal which comparison will be correct but they do not signal which position will be correct. Thus, a DOE would not be predicted to occur when the two comparison positions are associated with different rates of reinforcement because the samples cannot elicit expectancies of those positions to supplement stimulus control by the samples, a prediction that was upheld in the data reported here.

The DOE is problematic for quantitative models of discrimination performances. If the DOE arises due to, for example, added mediating stimuli during a stimulus–choice delay, then there is no clear way of incorporating this effect into extant models. The situation is further complicated because there is no guarantee that a particular inequality of reinforcer frequencies (or magnitudes, say) will produce a DOE, or that all subjects will show this effect (as in the present study, Table 5). Therefore, although we recognize this limitation to our equations, we can offer no solution, and it may be that the source of the DOE is orthogonal to the main focus of behavioral models of signal detection, MTS and DMTS.

There was a further aspect of the analysis of stimulus-discriminability estimates (log d) that warrants attention. Figures 6 and 7 plot the estimates of stimulus discriminability separately for each configuration. This not only allowed comparison of performance from the two configurations, but it was also necessary to preserve the integrity of the analysis. The two configurations represent two trial types with the same sample stimuli, but different levels of response bias (e.g., Figure 1) arising from the different reinforcer distributions signaled by each configuration. Pooling data across trial types with different levels of response bias can contaminate and reduce these estimates of discriminability. This is illustrated in Figure 8, which replots the data from the p(left) = .8 panel of Figure 6 and the p(red) = .8 panel of Figure 7, and also includes estimates of log d from the data pooled across the two configurations. The pooled data from the p(left) = .8 conditions (open circles in left panel) are generally lower than the estimates of log d for the individual configurations. The pooled data from the p(red) = .8 conditions (open circles in right panel) are also lower than the data from individual configurations and are markedly so at the more extreme comparison-position reinforcer ratios. The differences between pooled and configuration data were larger in the right panel because there were greater effects on response bias by the comparison-position reinforcer ratio than by the comparison-color reinforcer ratio (Figures 3 and 4). Most researchers using MTS and DMTS procedures pool their data across configurations without assessing whether different degrees of the two types of bias (i.e., position and comparison color) exist for the different comparison configurations. The results of our analysis suggest that this practice has the potential to reduce and systematically distort measures of matching accuracy such as log d. 
Researchers explicitly examining changes in matching accuracy, such as those studying short-term remembering in animals using a DMTS task, may need to ensure that their independent variables have not also produced unexpected response biases if they are to avoid this problem. As a hypothetical example, consider the research comparing many-to-one and one-to-many DMTS procedures. In the former procedure, there are a number of different sample stimuli, but only two comparison stimuli, typically distinguished by two different visual cues that change randomly between two positions across trials. One of the comparisons is correct for some of the sample stimuli, and the other is correct for the remaining samples. One-to-many MTS uses the same basic procedure, but there are only two sample stimuli, and a number of different comparison stimuli. Performance is usually more accurate on many-to-one procedures than on one-to-many procedures, and this difference is usually discussed in terms of coding processes (e.g., Santi & Roberts, 1985). However, the number of different comparison configurations is greater in one-to-many procedures than in many-to-one procedures. This effectively adds discriminative stimuli to the task (Jones, 2003), and complicates reinforcer control along the dimensions of the visual cues. Response biases related to the more stable dimension of comparison position might also contribute to the apparent decrease in accuracy.
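The effect of pooling described above can be illustrated with a small numerical sketch. It is not the analysis used in this study: the response counts are generated from hypothetical values (a discriminability of d = 10 and a fourfold bias toward the left key), and the function `log_d` assumes the usual Davison and Tustin (1978) point estimate, half the log of the product of the correct-to-error response ratios following each sample, as a stand-in for Equation 1. The sketch also assumes that a position bias simply multiplies the odds of a left-key response.

```python
import math


def log_d(correct_1, error_1, correct_2, error_2):
    """Assumed point estimate of discriminability: half the log10 of the
    product of the correct/error response ratios after each sample."""
    return 0.5 * math.log10((correct_1 / error_1) * (correct_2 / error_2))


def expected_counts(d, bias_to_correct, n_trials):
    """Split n_trials into (correct, error) counts when the odds of a
    correct response are d multiplied by the bias toward the correct key."""
    odds = d * bias_to_correct
    correct = n_trials * odds / (1.0 + odds)
    return correct, n_trials - correct


# Hypothetical values chosen only for illustration.
D = 10.0          # true discriminability (log d = 1.0)
LEFT_BIAS = 4.0   # odds multiplier favoring left-key responses
N = 100           # trials per sample in each configuration

# Configuration A: red on the left. Red is correct after the bright sample,
# so the left bias helps on bright trials and hurts on dim trials.
a_bright = expected_counts(D, LEFT_BIAS, N)
a_dim = expected_counts(D, 1.0 / LEFT_BIAS, N)

# Configuration B: red on the right, so the roles are reversed.
b_bright = expected_counts(D, 1.0 / LEFT_BIAS, N)
b_dim = expected_counts(D, LEFT_BIAS, N)

# Estimates computed separately for each configuration.
log_d_a = log_d(*a_bright, *a_dim)
log_d_b = log_d(*b_bright, *b_dim)

# Estimate computed after summing counts across the two configurations.
pooled_bright = (a_bright[0] + b_bright[0], a_bright[1] + b_bright[1])
pooled_dim = (a_dim[0] + b_dim[0], a_dim[1] + b_dim[1])
log_d_pooled = log_d(*pooled_bright, *pooled_dim)

print(f"configuration A: {log_d_a:.3f}")
print(f"configuration B: {log_d_b:.3f}")
print(f"pooled:          {log_d_pooled:.3f}")
```

Under these assumptions, each configuration recovers a log d of 1.0 because the bias cancels within a configuration, whereas the pooled estimate falls to roughly 0.74. The size of the shortfall grows with the position bias, which is in line with the larger discrepancies reported at the more extreme comparison-position reinforcer ratios.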

Fig 8 — Mean discriminability (Equation 1) is plotted as a function of the log red/green reinforcer ratio (left panel) and the log left/right reinforcer ratio (right panel). The estimates for the two side-key configurations and the data pooled across the two configurations are shown separately.

Finally, it is clear that reinforcer control in DMTS tasks is more complex than most researchers have previously acknowledged. The present experiment revealed systematic effects of varying the reinforcer ratio associated with the two comparison-stimulus colors and the reinforcer ratio associated with their positions. Furthermore, these effects were independent, and there was a striking difference between the magnitude of each effect. These results are consistent with Jones' (2003) conceptual model of MTS and DMTS performance in that they demonstrate that comparison-stimulus configurations can serve as discriminative stimuli in those tasks, and that position biases change in lawful ways (suggesting that left and right side-key responses are the fundamental units of behavior). That model also offered an account of why comparison-color sensitivities were lower than comparison-position sensitivities. However, a quantitative extension of a conventional signal-detection model also captured the main effects of the reinforcer variables. Therefore, the degree to which quantitative modeling of reinforcement variables in MTS and DMTS requires modification remains unclear and awaits future research.

Acknowledgments

We thank the University of Auckland Research Committee for their financial support of this research, Mick Sibley for his care of the subjects, and the undergraduate and graduate students who helped run this experiment. Reprints may be obtained from either author.

References

Alsop B, Davison M. Effects of varying stimulus disparity and the reinforcer ratio in concurrent-schedule and signal-detection procedures. Journal of the Experimental Analysis of Behavior. 1991;56:67–80. doi: 10.1901/jeab.1991.56-67.
Blough D.S. Delayed matching in the pigeon. Journal of the Experimental Analysis of Behavior. 1959;2:151–160. doi: 10.1901/jeab.1959.2-151.
Cumming W.W, Berryman R. The complex discriminated operant: Studies of matching-to-sample and related problems. In: Mostofsky D.I, editor. Stimulus generalization. Stanford: Stanford University Press; 1965. pp. 284–330.
D'Amato M.R. Delayed matching and short-term memory in monkeys. In: Bower G.H, editor. The psychology of learning and motivation, Vol. 7. New York: Academic Press; 1973. pp. 227–269.
Davison M, Nevin J.A. Stimuli, reinforcers, and behavior: An integration. Journal of the Experimental Analysis of Behavior. 1999;71:439–482. doi: 10.1901/jeab.1999.71-439.
Davison M, Tustin R. The relation between the generalized matching law and signal-detection theory. Journal of the Experimental Analysis of Behavior. 1978;29:331–336. doi: 10.1901/jeab.1978.29-331.
Godfrey R, Davison M. Effects of varying sample- and choice-stimulus disparity on symbolic matching-to-sample performance. Journal of the Experimental Analysis of Behavior. 1998;69:311–326. doi: 10.1901/jeab.1998.69-311.
Harnett P, McCarthy D, Davison M. Delayed signal detection, differential reinforcement, and short-term memory in the pigeon. Journal of the Experimental Analysis of Behavior. 1984;42:87–111. doi: 10.1901/jeab.1984.42-87.
Honig W.K, Wasserman E.A. Performance of pigeons on delayed simple and conditional discriminations under equivalent training procedures. Learning and Motivation. 1981;12:149–170.
Iversen I.H, Sidman M, Carrigan P. Stimulus definition in conditional discriminations. Journal of the Experimental Analysis of Behavior. 1986;45:297–304. doi: 10.1901/jeab.1986.45-297.
Jones B.M. Quantitative analyses of matching-to-sample performance. Journal of the Experimental Analysis of Behavior. 2003;79:323–350. doi: 10.1901/jeab.2003.79-323.
Jones B.M, Davison M.C. Reporting contingencies of reinforcement in concurrent schedules. Journal of the Experimental Analysis of Behavior. 1998;69:161–183. doi: 10.1901/jeab.1998.69-161.
Jones B.M, White K.G. Stimulus discriminability and sensitivity to reinforcement in delayed matching-to-sample. Journal of the Experimental Analysis of Behavior. 1992;58:159–172. doi: 10.1901/jeab.1992.58-159.
Kamil A.C, Sacks R.A. Three-configuration matching-to-sample in the pigeon. Journal of the Experimental Analysis of Behavior. 1972;17:483–488. doi: 10.1901/jeab.1972.17-483.
McCarthy D, Davison M. Delayed reinforcement and delayed choice in symbolic matching to sample: Effects of stimulus discriminability. Journal of the Experimental Analysis of Behavior. 1986;46:293–303. doi: 10.1901/jeab.1986.46-293.
McCarthy D, Davison M. The interaction between stimulus and reinforcer control on remembering. Journal of the Experimental Analysis of Behavior. 1991;56:51–66. doi: 10.1901/jeab.1991.56-51.
McCarthy D, Voss P. Delayed matching-to-sample performance: Effects of relative reinforcer frequency and of signaled versus unsignaled reinforcer magnitudes. Journal of the Experimental Analysis of Behavior. 1995;63:33–51. doi: 10.1901/jeab.1995.63-33.
Miller J.T, Saunders S.S, Bourland G. The role of stimulus disparity in concurrently available reinforcement schedules. Animal Learning and Behavior. 1980;8:635–641.
Peterson G.B, Trapold M.A. Effects of altering outcome expectancies on pigeons' delayed conditional discrimination performance. Learning and Motivation. 1980;11:267–288.
Santi A, Roberts W.A. Prospective representation: The effects of varied mapping of sample stimuli to comparison stimuli and differential trial outcomes on pigeons' working memory. Animal Learning and Behavior. 1985;13:103–108.
Sidman M. Functional analysis of emergent verbal classes. In: Thompson T, Zeiler M.D, editors. Analysis and integration of behavioral units. Hillsdale, NJ: Erlbaum; 1986. pp. 213–245.
Sidman M. Adventitious control by the location of comparison stimuli in conditional discriminations. Journal of the Experimental Analysis of Behavior. 1992;58:173–182. doi: 10.1901/jeab.1992.58-173.
Sidman M. Equivalence relations and the reinforcement contingency. Journal of the Experimental Analysis of Behavior. 2000;74:127–146. doi: 10.1901/jeab.2000.74-127.
Smith L. Delayed discrimination and delayed matching in pigeons. Journal of the Experimental Analysis of Behavior. 1967;10:529–533. doi: 10.1901/jeab.1967.10-529.
White K.G. Characteristics of forgetting functions in delayed matching to sample. Journal of the Experimental Analysis of Behavior. 1985;44:15–34. doi: 10.1901/jeab.1985.44-15.

