Abstract
One experiment with rats used Pavlovian-to-instrumental transfer (PIT) tests to explore potential competitive interactions between Pavlovian and instrumental processes during instrumental learning. Two instrumental response-outcome relations (e.g., left lever – grain pellets, right lever – sucrose pellets) were first trained in distinct contexts for one group of rats (Group Differential) or in each of two contexts for a second group (Group Non-Differential). Both of these groups then received training with two Pavlovian stimulus-outcome relations in a third experimental context. Selective PIT tests conducted in both the Pavlovian and instrumental contexts revealed greater selective PIT in Group Non-Differential than in Group Differential subjects. This result is discussed in terms of the roles played by context-outcome, response-outcome, and outcome-response associations during instrumental learning. The results further help us understand the nature of Pavlovian-instrumental interactions in specific PIT tasks.
The study of Pavlovian – instrumental interactions has once again become a popular and exciting area of research. Recent use of more refined behavioral tasks and neuroscience techniques has led to an explosion of interest in the neurobiological substrates of basic learning processes (e.g., Berridge, 2009; Delamater & Lattal, 2014; Fanselow, Zelikowsky, Perusini, Rodriguez Barrera, & Hersman, 2014; Honey, Iordanova, & Good, 2014; Laurent, Morse, & Balleine, 2014; McDannald, Jones, Takahashi, & Schoenbaum, 2014) and how such Pavlovian-instrumental interactions may play a role in a wide variety of circumstances (e.g., Corbit & Janak, 2007; Holland & Hsu, 2014; Holmes, Marchand, & Coutureau, 2010; Lewis, Niznikiewicz, Delamater, & Delgado, 2013; Martinovic et al., 2014; Ostlund, LeBlanc, Kosheleff, Wassum, & Maidment, 2014; Parnaudeau et al., 2014; Peciña & Berridge, 2013). It is worth briefly reviewing some of the key ideas that have arisen from the behavioral literature on the study of Pavlovian-instrumental interactions because these help identify what we take to be the central theoretical issues in this area of research.
Pavlov (1932), Konorski and Miller (1937), and Estes and Skinner (1941) were among the first to explore how these two basic learning processes might jointly affect performance. Later, Rescorla and Solomon (1967), Trapold and Overmier (1972), and Rescorla (1992) advanced the main conceptual approaches that theorists today frequently use in explaining various Pavlovian-instrumental interaction phenomena. Following Mowrer (1947; also Konorski & Miller, 1937), Rescorla and Solomon (1967) suggested that a stimulus used in Pavlovian conditioning acquires the capacity to activate a rather general central motivational state. They further assumed that activation of this central motivational state, in turn, influences instrumental responding by affecting the overall motivational substrate that supports that responding. For instance, in the case of appetitively-reinforced instrumental responding (e.g., lever pressing for food), it was assumed that activation by a stimulus of an appetitive central motivational state would enhance or energize such responding because this would further activate the appetitive motivational substrate that supports that response. In contrast, activation of an aversive motivational state by the stimulus would antagonize the appetitive state that normally supports the food-reinforced instrumental response, and the effect would be to decrease instrumental responding (see also Weiss, Thomas, & Weissman, 1996). The added value of this framework is that it makes further interesting and testable predictions for situations where the instrumental response is maintained through aversive motivational processes, such as avoidance conditioning (e.g., Overmier, Bull, & Pack, 1971; Rescorla & LoLordo, 1965).
While this approach continues to have a rather wide appeal (e.g., see Balleine & Killcross, 2006), it fails to account for more specific incentive motivational effects that a stimulus has on instrumental behavior. The most common method used today to study Pavlovian-instrumental interactions is the Pavlovian-to-instrumental transfer test (PIT). There are different variants of the PIT procedure, but a common method is to train an animal to perform two distinct instrumental responses with different outcomes, creating two response-outcome (R-O) relations (e.g., a left lever press is paired with pellets and a right lever press is paired with sucrose) in one phase of the experiment. Separately, two distinct Pavlovian stimuli are differentially paired with the two outcomes (e.g., a light is paired with pellets and a tone is paired with sucrose). And, finally, the effects of the Pavlovian stimuli on instrumental responding are assessed in a non-reinforced choice test. The usual result is that the stimulus selectively enhances, above baseline, the response with which it shares a reinforcing outcome. That is, in this example, the presentation of the light results in increased responding to the left lever and the presentation of the tone results in increased responding to the right lever (e.g., Delamater & Holland, 2008; Kruse, Overmier, & Rokke, 1983), an effect known as outcome-specific PIT.
Outcome-specific PIT poses a serious problem for the sort of motivational account of Rescorla and Solomon (1967), because both stimuli should activate the same central appetitive motivational state and, therefore, activate both responses, not just one. In order to explain the specific incentive motivational effect, Trapold and Overmier (1972) suggested that in addition to a stimulus activating a general central motivational state, the stimulus comes to associate with (see also Konorski, 1967) and, thus, activate, a representation of the specific sensory qualities of the reinforcer (e.g., the taste of sucrose). Further, these authors assumed that this sensory-specific outcome expectancy could, itself, act as a stimulus that could associate directly with the instrumental response. In the usual PIT paradigm, this outcome-response (O-R) association could be learned during the instrumental training phase of the experiment by virtue of the fact that the response is often reinforced in the presence of an expectancy of the outcome (provided, for instance, by contextual cues that have become associated with the outcome). As such, this expectancy can be seen as a discriminative stimulus for the instrumental response. If we allow for specific S-O and O-R associations to be formed in the Pavlovian and instrumental phases of the experiment, respectively, then one can see how a Pavlovian stimulus could exert a selective effect on instrumental responding during the PIT test. Namely, a given stimulus activates the specific O representation with which it was associated (S-O), and this, in turn, activates the specific R with which it was associated (O-R).
A related account – the bidirectional hypothesis – was put forth by Pavlov (1932) and discussed more extensively by Mackintosh and Dickinson (1979; also Colwill & Rescorla, 1985; Rescorla, 1992; Urcuioli & DeMarse, 1997). This account similarly assumes that specific S-O associations are formed during Pavlovian learning, but that R-O (rather than O-R) associations are formed during instrumental training. In order for the stimulus to selectively affect instrumental responding during a PIT test in this case, it is further assumed that the stimulus activates the outcome representation, which, in turn, activates its associated response through the forward R-O associative link that is used in the backward direction, i.e., the R-O link is, in some sense, bidirectional. In this way, the specific PIT effects that are usually found can be understood, once again, in the form of an S-O, O-R associative chain but the nature of the instrumental learning is different in the two cases.
This rather subtle difference has important implications for the control of instrumental actions. Rescorla (1992) contrasted different predictions made by these two accounts and provided evidence favoring the bidirectional R-O model. In one experiment, Rescorla (1992) taught the rat to make one instrumental response (R1) for one particular reinforcing outcome (O1) in the presence of a discriminative stimulus that otherwise signaled the non-contingent delivery of a different reinforcing outcome (O2). In this task, the animals could potentially learn an R1-O1 association (when R1 was reinforced with O1) or an O2-R1 association (because R1 was reinforced in the presence of an expectation for O2 activated by the discriminative stimulus). In a series of studies, Rescorla (1992) observed that the rats were more strongly controlled by the R-O than the O-R relations.
Another type of experiment – the so-called “stimulus-response overshadowing” experiment popular for a time in the 1970s – provides some evidence to favor the view that R-O (and not O-R) associations are learned during instrumental training. In one study, Pearce and Hall (1978; see also Williams, 1999) demonstrated with rats that learning to lever press on an intermittent reinforcement schedule was impaired when a brief visual stimulus consistently occurred just prior to the delivery of every food reinforcement. The result was understood by assuming that a Pavlovian light-pellet association was established and competed with (overshadowed) the formation of an instrumental lever press-pellet association. This interpretation quite naturally follows from the assumption that expected outcomes are less well processed and, therefore, cannot easily support new learning compared to unexpected outcomes (Kamin, 1969). If instrumental learning consists of the animals developing an O-R association, it is not so straightforward to see why presenting a light stimulus after the response but just before the outcome should have diminished O-R learning. However, one possible way to explain this is to assume that O-R learning is best accomplished when the instrumental response is trained in the presence of a strongly conditioned context (e.g., Steinhauer, Davol, & Lee, 1976). If the visual cue in this study overshadowed the context-pellet association, then this could have potentially weakened the development of an O-R association because the outcome expectancy would not have been present prior to the actual response having been made. Therefore, the outcome expectancy should not have been easily learned as a discriminative cue for the instrumental response (Trapold & Overmier, 1972).
The present study explored this issue in a somewhat different way. Half of the rats in the present experiment were first taught an R1-O1 relation in Context 1 and an R2-O2 relation in a physically distinct Context 2 (Group Differential), while the remaining rats learned both of these instrumental relations in both contexts (Group Non-Differential). If the context in which lever pressing takes place associates with the reinforcing outcomes, then according to the O-R view of instrumental learning Group Differential subjects will learn to press each of the two levers in the presence of distinct outcome expectancies. This might be expected to enhance instrumental learning and outcome-specific control over responding because of the differential outcome effect (e.g., Trapold, 1970; Trapold & Overmier, 1972). In contrast, according to this view Group Non-Differential subjects should learn to associate each of the two contexts with both reinforcing outcomes. But since both responses are trained in those contexts the two outcome expectancies should each associate with both instrumental responses, and this would be expected to diminish outcome-specific control over these responses.
We tested these predictions by giving differential S1-O1 and S2-O2 Pavlovian training in a third context, and then, ultimately, giving PIT tests in either the Pavlovian or instrumental training contexts. If O-R associations were learned during instrumental training and controlled performance during the PIT tests, then greater selective PIT should be seen in Group Differential than in Group Non-Differential for the reasons noted above (and related to this theory’s account of the basic differential outcome effect; Trapold & Overmier, 1972).
On the other hand, the opposite prediction is made if instrumental learning results in the formation of R-O associations. Group Differential subjects would learn to make each instrumental response in the presence of a context that already predicts the reinforcing outcome. Because of the stimulus-response overshadowing (Pearce & Hall, 1978; Williams, 1999) and blocking (Kamin, 1969) effects, the distinct context-outcome associations in Group Differential should reduce the effectiveness of those reinforcers in supporting instrumental R-O associations (see also Rescorla & Wagner, 1972). Group Non-Differential subjects should not suffer from this competition, however, because the different instrumental responses are the most valid predictors of the different reinforcing outcomes in this situation. In this case, since both responses are trained in both contexts, the contexts become less valid than the responses, themselves, at predicting which outcome will occur at any given moment. Thus, the problem reduces to a relative cue validity one (Wagner, Logan, Haberlandt, & Price, 1968), and, as a result, the R-O associations should be more strongly conditioned in Group Non-Differential than in Group Differential. Therefore, during the PIT tests the strongest selective PIT effects should be seen in Group Non-Differential. The present study examined these contrasting hypotheses in an effort to determine which associative mechanism, R-O or O-R, underlies instrumental learning and, ultimately, mediates the selective PIT effect.
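The cue-competition prediction of the R-O account can be illustrated with a small simulation. The sketch below is not part of the reported experiment: it assumes a Rescorla-Wagner (1972) update rule, treats each context and each response as a cue of equal salience, and uses a hypothetical learning rate and number of training cycles. It also ignores magazine training and any context-alone exposure, so it illustrates the direction of the predicted difference rather than offering a fitted model.

```python
def rescorla_wagner(trials, cues, outcomes, alpha=0.05, epochs=2000):
    """Apply the Rescorla-Wagner rule over repeated trial cycles.

    trials: list of (cues_present, outcome_delivered) pairs.
    Returns V[cue][outcome], the strength linking each cue to each outcome.
    alpha and epochs are illustrative values, not taken from the study.
    """
    V = {c: {o: 0.0 for o in outcomes} for c in cues}
    for _ in range(epochs):
        for present, delivered in trials:
            for o in outcomes:
                lam = 1.0 if o == delivered else 0.0
                # Summed prediction from all cues present on this trial.
                total = sum(V[c][o] for c in present)
                for c in present:
                    V[c][o] += alpha * (lam - total)
    return V


CUES = ["C1", "C2", "R1", "R2"]
OUTCOMES = ["O1", "O2"]

# Group Differential: each R-O relation is trained in its own context.
differential = [(("C1", "R1"), "O1"), (("C2", "R2"), "O2")]

# Group Non-Differential: both R-O relations are trained in both contexts.
non_differential = [
    (("C1", "R1"), "O1"), (("C1", "R2"), "O2"),
    (("C2", "R1"), "O1"), (("C2", "R2"), "O2"),
]

V_diff = rescorla_wagner(differential, CUES, OUTCOMES)
V_nondiff = rescorla_wagner(non_differential, CUES, OUTCOMES)


def selectivity(V, response, own, other):
    """Outcome-specific strength of a response: own minus other outcome."""
    return V[response][own] - V[response][other]


print("Differential     R1:", round(selectivity(V_diff, "R1", "O1", "O2"), 3))
print("Non-Differential R1:", round(selectivity(V_nondiff, "R1", "O1", "O2"), 3))
```

Under these assumptions the context shares associative strength with the response in Group Differential, whereas in Group Non-Differential the contexts are poorer predictors of the specific outcome than the responses, so the outcome-specific strength of each response ends up larger in that group.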
Method
Subjects
Subjects were 32 naïve male Long-Evans rats, weighing approximately 325 g at the start of the experiment. The experiment was run in two replications (n = 16 rats in each replication). The rats were housed in cages containing between two and four animals and were food restricted such that their weights were maintained at 85% of their free-feeding weight throughout the experiment. They were given free access to water at all times. The colony room in which they were housed was kept on a 14/10 h light/dark cycle, with the lights coming on at 7 AM and turning off at 9 PM each day.
Apparatus
The apparatus consisted of two sets of eight identical conditioning chambers, which were encased in light- and sound-resistant wooden shells. The conditioning chambers were 30.5 cm long × 24.0 cm wide × 25.0 cm deep. The end walls were made from aluminum and the ceiling and the sidewalls were made from clear Plexiglas. The food magazine measured 3.0 cm long × 3.6 cm wide × 2.0 cm deep and was located at the center of one of the end walls. The outcomes (0.1 ml droplet of 20% sucrose or approximately two 45 mg food pellets (Research Diet)) were delivered into a well on the bottom of the food magazine. The floor was made from stainless steel rods (0.6 cm in diameter, 2.0 cm apart). An infrared detector and emitter were mounted on the magazine walls (and positioned at the entrance) to record magazine entries by the rat. Two response levers were presented that were 4 cm wide and were located 3 cm on either side of the food magazine and 8 cm above the floor level. While both levers were permanently mounted in the chamber, access to either one could be restricted by placement of an aluminum cover over it. The cover was the same height as the chamber and was 8.1 cm wide and 3.18 cm deep. A tone CS (1500 Hz) was produced by a speaker located approximately 22 cm behind the front wall of the conditioning chamber. This tone measured 4 dB above background noise levels. A steady light CS (6-W light bulb) was mounted on the top of the sidewall of the outer chamber. Background noise and ventilation were created by a fan, which was attached to the outer shell. Background noise was measured at 78 dB. The equipment was controlled and data were recorded by a personal computer and interfacing equipment (Alpha Products) located in the same room as the experimental chambers.
To create three distinct contexts, different sets of inserts were used. The two instrumental training contexts were created using either aluminum or Plexiglas inserts. In the aluminum context a flat sheet of aluminum (28.74 cm × 23.81 cm) was inserted to cover the grid floor, and a second sheet (35.56 cm × 23.5 cm) was arranged such that it sat diagonally across the chamber between the top of the wall bearing the levers and the magazine, and the floor on the opposite side of the chamber. The second instrumental context was made up of a 27.94 cm × 29.53 cm sheet of Plexiglas with 506 evenly spaced 0.64 cm diameter holes drilled into it. A second sheet of Plexiglas sat diagonally across the chamber so that it reached from the vertical edge of the wall bearing the levers and the magazine to the furthest corner on the opposite side of the chamber. The context used during the Pavlovian stage consisted only of the chamber itself with the levers covered by the aluminum covers. Thus, the three contexts differed in terms of their floor texture and their overall shape.
Behavioral Procedures
Magazine training
For two days hungry subjects were placed in individual conditioning chambers and were trained to approach the food magazine to receive the outcomes. All of the animals received two training sessions on each day – Context 1 was used for one of these sessions and Context 2 for the other. For Group Non-Differential, on day one, Outcome 1 was presented in Context 1 and Outcome 2 in Context 2. On the second day Outcome 1 was presented in Context 2 and Outcome 2 in Context 1. In this way both outcomes were presented in each context. The magazine training for Group Differential was very similar, except that Outcome 1 was always presented in Context 1 and Outcome 2 was always presented in Context 2. The identity of the outcomes as O1 or O2 (pellets, sucrose) was also counterbalanced across animals, as was the identity of the contexts as Context 1 or 2. The outcome was delivered 20 times during each 20 min session, and was presented on a variable time 60-s schedule. The order of presentation of the outcomes was also counterbalanced across the two days; if Outcome 1 was presented first on the first day of magazine training, then Outcome 2 was presented first on the second day. Access to the levers during these sessions was prevented by using the aluminum lever covers, and the two Pavlovian stimuli were not presented during the magazine training phase.
Instrumental training
Following completion of magazine training, the rats were given two sessions of instrumental continuous reinforcement (CRF) training. In each of these sessions, only one lever was accessible to the rat. In the first session of the day rats were presented with lever 1, so that only response 1 (R1) could be performed. Initially rats were trained on a CRF schedule, so that every time they performed R1, O1 was delivered to the magazine. During the second session of the day, lever 2 was made accessible, and lever 1 was covered. Again, animals were initially trained on a CRF schedule so that every time the rat performed R2, O2 was delivered. For the animals in Group Non-Differential, both levers were paired with their respective outcomes in both contexts, so on day 1 R1-O1 was given in Context 1 and on day 2 it was given in Context 2. The same was true of the R2-O2 pairing. For animals in Group Differential, each R-O relation was trained in only one context. That is, on both days R1-O1 was given only in Context 1 and R2-O2 only in Context 2. Animals were given instrumental training in both contexts on each day. Each subject was trained on the CRF schedule until it had made 50 responses on each lever. After reaching this criterion, all subjects were trained on steadily increasing variable ratio schedules. They received VR 5 for two days and VR 10 for four days. On each of these days, subjects were given two 20-min training sessions. During one of these sessions R1 was reinforced with O1, and during the other R2 was reinforced with O2. For animals in Group Differential, R1 was only ever available in Context 1, and R2 in Context 2. For the animals in Group Non-Differential, as with CRF training, both pairings were given in both contexts. That is, R1 was reinforced with O1 in Context 1 and in Context 2 on an equal number of occasions. This was also true of R2 and O2.
The order of training sessions was counterbalanced in a pseudorandom order so that on some days the animals were trained with R1 first and on the remaining days they were trained on R2 first. The number of responses was recorded during each session.
Pavlovian conditioning
For eight days following the completion of instrumental training, the two Pavlovian stimuli (S1 and S2) were individually paired with the two outcomes (O1 and O2). For half of the animals S1 was paired with sucrose and S2 with pellets; for the other half S1 was paired with pellets and S2 with sucrose. Each conditioning session was 79 min long and contained six S1-O1 and six S2-O2 trials. The stimuli were presented in a pseudorandom order with the provision that neither stimulus was presented more than three times in a row. Each stimulus lasted 90 s and the appropriate outcome was delivered within each stimulus according to a variable time 30 s schedule. The inter-trial intervals (ITIs) averaged 5 min and ranged from 2 to 8 min, and the animals were removed from the chambers 1 min following the final conditioning trial. The number of magazine approach responses was recorded during the 90 s before the stimulus (referred to as the pre-CS period), and for the 90 s duration of the stimulus. Pavlovian conditioning took place in Context 3.
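The trial-order constraint and the session length described above can be sketched as follows. This is an illustration only: the original control software is not described, so the rejection-sampling approach, the function names, and the seed are assumptions. The duration check additionally assumes that each of the 12 trials is preceded by one ITI of 5 min on average.

```python
import random


def pseudorandom_order(n_each=6, max_run=3, seed=0):
    """Shuffle six S1 and six S2 trials until no stimulus repeats more
    than max_run times in a row (simple rejection sampling)."""
    rng = random.Random(seed)
    trials = ["S1"] * n_each + ["S2"] * n_each
    while True:
        rng.shuffle(trials)
        run = longest = 1
        for prev, cur in zip(trials, trials[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        if longest <= max_run:
            return list(trials)


def session_minutes(n_trials=12, cs_s=90, mean_iti_min=5, tail_min=1):
    """Expected session length: stimulus time + ITIs + final 1 min."""
    return n_trials * cs_s / 60 + n_trials * mean_iti_min + tail_min


order = pseudorandom_order()
print(order)
print(session_minutes())  # 18 min of stimuli + 60 min of ITIs + 1 min
```

With these assumptions the expected duration works out to the 79 min stated for each conditioning session.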
Pavlovian-instrumental transfer test
Following Pavlovian conditioning, the animals were given two further instrumental training sessions the day before the first test session, one for each response. These sessions were conducted with VR 10 reinforcement schedules. The rats received four test sessions in the Pavlovian context on consecutive days. Two of these tests were single response tests and two were choice tests where both of the levers were available. Half of the rats received two single response tests (one with each lever) followed by two choice tests, and the remaining rats were tested with two choice tests followed by two single response tests. Each test session lasted 30 min, with four 90 s presentations of each Pavlovian stimulus, each preceded by a 90 s pre-stimulus period. The tests began with a 6-min period in which no stimuli were presented to familiarize the animals with the choice procedure as well as to lower the overall levels of responding before the transfer trials began. The two stimuli were then presented in an ABBA BAAB sequence (counterbalanced across days). The number of responses on each of the levers was recorded during the 90 s immediately before stimulus onset (pre-CS) and during the 90 s presentation of the stimulus. No outcomes were delivered during these test sessions.
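The structure of a test session can be laid out as a simple timeline. The sketch below is hypothetical: it assumes that each 90 s stimulus is immediately preceded by its 90 s pre-CS period with no additional gaps, and the function name and event labels are illustrative. Which physical stimulus serves as "A" is a counterbalancing detail not modeled here.

```python
def pit_test_schedule(order="ABBABAAB", warmup_s=360, pre_s=90, cs_s=90):
    """Build (event, stimulus, start_s, end_s) tuples for one PIT test.

    Assumes a 6-min stimulus-free period, then pre-CS/CS pairs
    back to back in the given order.
    """
    events = [("warmup", None, 0, warmup_s)]
    t = warmup_s
    for label in order:
        events.append(("pre-CS", label, t, t + pre_s))
        t += pre_s
        events.append(("CS", label, t, t + cs_s))
        t += cs_s
    return events, t


events, total_s = pit_test_schedule()
for event in events:
    print(event)
print(total_s / 60, "min")
```

Under this layout, the 6-min initial period plus eight pre-CS/CS pairs accounts for the full 30-min session.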
In order to determine if the pattern of PIT results would differ when testing occurred in the instrumental, as opposed to the Pavlovian training contexts, rats in replication 2 received an additional set of tests that were conducted following the tests described above. These rats were first given an additional six instrumental retraining sessions followed by two additional Pavlovian retraining sessions, and all of these were conducted exactly as in these original conditioning phases. Subsequently there were two choice tests like those described above, but one of these was conducted in instrumental context 1 and the other in instrumental context 2.
Statistical Analyses
The lever press data were analyzed using standard analysis of variance (ANOVA) techniques. A Type I error rate of 0.05 was adopted for all statistical tests. Magazine responding during the Pavlovian conditioning phases was not statistically evaluated because such responding could reflect a mixture of both conditioned and unconditioned effects when the outcomes were presented at random times within the 90 s stimulus.
Results
Lever press responding steadily increased over the course of instrumental conditioning in both groups (from a mean of 10.7 rpm on day 1 to 19.0 rpm on day 6 for Group Differential, and from 12.3 to 21.2 rpm for Group Non-Differential). An ANOVA applied to these data revealed a significant main effect of Session, F(5,140) = 61.45, but the small apparent difference between the groups was not significant.
The PIT results from the choice test sessions conducted in the Pavlovian training context can be seen in Figure 1. The data were converted to elevation scores. The elevation score was calculated by dividing the mean response rate during the stimulus presentation (A) by the sum of the response rates during the stimulus and the 90 s pre-CS period (B), i.e., A/(A+B), so that a score of 0.5 indicates no change from baseline. To ensure that there were no significant differences between the two replications, Replication was included as a factor in an ANOVA comparing the effects of Group (Differential or Non-Differential) and Response (the lever associated with the same outcome as the stimulus being presented or the lever associated with a different outcome). The ANOVA revealed no meaningful effects involving the Replication factor (a significant Replication × Group interaction, F(1,28) = 8.00, merely reflected the fact that overall levels of responding in the two groups differed in the two replications). In addition, there was a significant main effect of Response, F(1,28) = 6.38, as well as a significant Group × Response interaction, F(1,28) = 6.74, where Group Non-Differential subjects responded more on the lever associated with the same outcome as the stimulus being presented than on the lever associated with a different outcome. The mean response rates per minute during the baseline pre-CS intervals (collapsed across bins) were 2.1 (MSE = 0.35) and 2.3 (MSE = 0.54) on the ‘same’ lever for Group Differential and Non-Differential, respectively, and on the ‘different’ lever they were 2.5 (MSE = 0.57) and 2.4 (MSE = 0.47). These small baseline differences were not reliable.
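The elevation score defined above, A/(A+B), can be computed as in the following minimal sketch. The function names are illustrative, and the handling of a zero denominator (returning None when no responses occurred in either period) is an assumption, since the text does not say how such cases were treated. The same-minus-different difference is likewise only a convenient summary, not a measure reported in the analyses.

```python
def elevation_score(cs_rate, pre_rate):
    """Elevation score A / (A + B).

    cs_rate:  response rate during the stimulus (A).
    pre_rate: response rate during the pre-CS period (B).
    Returns None if both rates are zero (assumed convention).
    """
    total = cs_rate + pre_rate
    if total == 0:
        return None
    return cs_rate / total


def selective_pit(same_cs, same_pre, diff_cs, diff_pre):
    """Illustrative summary: 'same' lever score minus 'different' lever score."""
    return elevation_score(same_cs, same_pre) - elevation_score(diff_cs, diff_pre)


# Hypothetical rates (responses per minute), not data from the study.
print(elevation_score(6.0, 2.0))   # stimulus elevates responding
print(elevation_score(3.0, 3.0))   # no change from baseline
print(selective_pit(6.0, 2.0, 3.0, 3.0))
```

Scores above 0.5 indicate that responding increased during the stimulus relative to the pre-CS period, and scores below 0.5 indicate a decrease.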
In the single response test sessions (data not shown) none of the main effects or interactions were significant.
The results from the PIT tests conducted in the instrumental training contexts (for replication 2 rats) are presented in Figure 2. We examined the hypothesis that Group Differential subjects may have been impaired at selective PIT when testing took place in the Pavlovian training context because they had difficulty retrieving the R-O associations at the time of test. If so, then these subjects should perform more normally when testing takes place in the instrumental training contexts, especially if the same R-O relation was retrieved at the time of test. The data were, therefore, initially segregated in Group Differential subjects in terms of whether the same R-O relation as that signaled by the test stimulus was tested in the context in which that particular R-O relation was trained (and, therefore, retrieved) or whether the same R-O relation was tested in the context in which it was not trained (and, thus, not retrieved). For instance, Group Differential rats received R1-O1 training in Context 1 and R2-O2 training in Context 2. When testing S1 in Context 1 the same R1-O1 relation would be retrieved, but when testing S1 in Context 2 the same R1-O1 relation would not be retrieved. If memory retrieval processes play a role then larger selective PIT effects would be expected when the same R-O relations were tested in their retrieved contexts.
The data in Figure 2 show that this was not the case. An ANOVA was performed on Group Differential subjects that compared the factors of Response (same or different) and Retrieval Context. This analysis revealed no significant main effects or interaction.
Figure 2 also shows data from these test sessions for Group Non-Differential. Since there was no effect of retrieval context in Group Differential, a second Response (same, different) × Group (Differential, Non-Differential) ANOVA was conducted on the data collapsed across both instrumental contexts. Consistent with the Pavlovian context PIT data reported above, this ANOVA revealed no significant main effect of Group, a significant main effect of Response, F(1,14) = 29.13, and, most importantly, a significant Response × Group interaction, F(1,14) = 8.33. Once again, Group Non-Differential showed a greater outcome-specific PIT effect than Group Differential. The mean response rates per minute during the baseline pre-CS intervals were 4.0 (MSE = 0.81) and 3.1 (MSE = 0.91) on the ‘same’ lever for Group Differential and Non-Differential, respectively. On the ‘different’ lever they were 4.3 (MSE = 1.28) and 3.1 (MSE = 0.8). None of these apparent differences were reliable.
Discussion
The main result from the present experiment is that outcome-selective PIT effects are reduced (or even eliminated) when instrumental training of two distinct R-O relations is given in two separate contexts, compared to when they are given in both of these contexts. The failure to see a strong selective PIT effect in Group Differential rats when the responses were tested in the Pavlovian context, in principle, could have been caused by an inability of these rats to retrieve their instrumental associations, rather than an inability to learn specific instrumental associations. However, further tests conducted in the instrumental training contexts were not consistent with this possibility. We reasoned that if reduced selective PIT was due to a memory retrieval deficit, then the difference between the groups should have disappeared when the same instrumental R-O relation was tested in its retrieved context. But this result was not obtained. Our findings have important implications for our understanding of Pavlovian-instrumental interactions and, more generally, of the nature of instrumental learning, and we will discuss these below.
As noted in the introduction, PIT effects have most often been understood in terms of general or specific influences of Pavlovian cues on instrumental responding. Konorski’s (1967) framework has proven useful in understanding how Pavlovian stimuli might exert these general and specific effects (see also Balleine & Killcross, 2006; Wagner & Brandon, 1989). Briefly, Konorski assumed that Pavlovian stimuli enter into separate associations with the general emotional and specific sensory qualities of a reinforcing outcome. The former type of association would enable a stimulus to exert quite general effects on behavior of the sort discussed by Rescorla and Solomon (1967) because this association would result in the stimulus evoking a general motivational/emotional state capable of modulating a wide variety of activities. In contrast, when a stimulus associatively activates a sensory-specific representation of its associated outcome, such a cue would be assumed to impact instrumental responding only in a very specific manner that would require an instrumental learning structure that also directly encodes that specific outcome.
Although Konorski’s framework for understanding Pavlovian learning can be usefully applied to understanding how Pavlovian stimuli might be capable of modulating instrumental behaviors in rather general or specific ways, by itself, it offers little guidance on selecting among rival mechanisms that describe the nature of the Pavlovian-instrumental interaction on performance in a PIT task. To describe selective PIT effects, for instance, we have distinguished between two different accounts. Trapold and Overmier (1972) suggested that S-O and O-R associations form during Pavlovian and instrumental training phases, respectively, whereas Pavlov (1932; also Mackintosh & Dickinson, 1979; Rescorla, 1992) suggested that S-O and R-O associations form during these phases of training and that the R-O association is used in the backward direction to enable selective PIT. Both of these models expect specific PIT to occur only to the extent that the O component in these two associations is specific in its sensory content and shared between the two. On the basis of a wide variety of studies there is little reason to question that Pavlovian conditioning can result in the development of highly specific S-O associations (e.g., Delamater, 2012). As to whether instrumental learning results in O-R or R-O associations there is less evidence available. In a series of studies Rescorla (1992) directly contrasted predictions deduced from these two views and concluded that the R-O association primarily controlled the instrumental response (see also Urcuioli & DeMarse, 1997).
Our data can also be taken to support the view that instrumental learning results in the development of highly specific R-O associations and, by inference, that these associations are used in the backward direction during a specific PIT test. This conclusion follows from our finding that training two instrumental R-O relations in distinct contexts disrupts the ability of those responses to be modulated by Pavlovian stimuli. As noted in the introduction, the O-R model predicts that training in this manner should produce more specific O-R associations than training both responses in each of two contexts. As such, it fails to anticipate the results we obtained. The R-O model, however, correctly anticipates our findings. According to this view, the learning of separate R-O associations should be weakened when these R-O relations are trained in distinct contexts because the contexts themselves signal the reinforcing outcomes and render the instrumental responses redundant predictors of those outcomes. This would result in the context-outcome associations overshadowing the specific R-O associations (see also Pearce & Hall, 1978; Williams, 1999). This overshadowing effect should not have occurred in our Group Non-Differential rats because, for these rats, the instrumental responses were relatively more valid predictors of the specific outcomes than were the contexts. This would mean that the R-O associations should remain strong in these rats (Wagner et al., 1968).
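The overshadowing argument can be made concrete with a small simulation. The sketch below is only an illustration, not an analysis reported in this study: it assumes a Rescorla-Wagner style error-correction rule (a model we did not fit to our data), treats each reinforced response as a discrete trial on which the context and the response are the only cues present, gives all cues the same learning rate, and omits unreinforced time spent in the context. The cue names (`ctxA`, `R1`, and so on), the parameter values, and the function `train` are hypothetical.

```python
from collections import defaultdict

ALPHA = 0.1               # learning rate shared by all cues (assumed value)
OUTCOMES = ("O1", "O2")   # e.g., grain pellets and sucrose pellets


def train(trials, cycles=500):
    """Cycle through the trial list and return V[cue][outcome].

    Each trial is (cues_present, outcome_delivered). For every outcome the
    prediction error is computed from the summed strengths of the cues
    present, and each present cue is updated by ALPHA * error.
    """
    V = defaultdict(lambda: {o: 0.0 for o in OUTCOMES})
    for _ in range(cycles):
        for cues, delivered in trials:
            for o in OUTCOMES:
                lam = 1.0 if o == delivered else 0.0
                error = lam - sum(V[c][o] for c in cues)
                for c in cues:
                    V[c][o] += ALPHA * error
    return V


# Group Differential: each R-O relation is trained in its own context.
differential = [
    (("ctxA", "R1"), "O1"),
    (("ctxB", "R2"), "O2"),
]

# Group Non-Differential: both R-O relations are trained in each context.
non_differential = [
    (("ctxA", "R1"), "O1"),
    (("ctxA", "R2"), "O2"),
    (("ctxB", "R1"), "O1"),
    (("ctxB", "R2"), "O2"),
]

v_diff = train(differential)["R1"]["O1"]
v_nondiff = train(non_differential)["R1"]["O1"]
print(f"R1-O1 strength, Differential:     {v_diff:.2f}")
print(f"R1-O1 strength, Non-Differential: {v_nondiff:.2f}")
```

Under these simplifying assumptions the context and the response share the available associative strength equally in the Differential arrangement, whereas in the Non-Differential arrangement the response, being the more valid predictor of its outcome, acquires the larger share. Different salience values or the inclusion of unreinforced context exposure would change the exact numbers, so the sketch should be read as a qualitative illustration only.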
Our claim that context-outcome learning overshadowed R-O learning in Group Differential rats may lead one to expect that instrumental responding during the initial instrumental training phase should be lower in Group Differential than in Group Non-Differential rats – a result that we did not observe statistically. However, it should be noted that our specific PIT tests allow us to target very particular aspects of learning, such as outcome-specific associations, in a way that simple instrumental responding cannot assess. It is possible, therefore, that while context-outcome associations successfully overshadowed learning of specific R-O associations, those associations may not have been able to compete with other aspects of instrumental learning, such as the acquisition of S-R associations.
If the instrumental learning we observed in Group Differential was, indeed, best described in these terms, then it follows that these responses should also be less sensitive to outcome devaluation manipulations (e.g., Colwill & Rescorla, 1985). Further work will be required to determine if this is the case, but there is a considerable amount of evidence to support the conclusion that distinct neural circuits underlie the formation of goal-oriented behaviors (reflective of R-O associations), on the one hand, and habitual behaviors (reflective of control by S-R associations), on the other hand (e.g., Coutureau & Killcross, 2003; Killcross & Coutureau, 2003). Thus, it seems highly plausible that although one aspect of instrumental learning may be undermined another aspect of learning may be fully intact with the end result being no difference in overall levels of responding.
A rather different approach to understanding PIT effects was advocated by Cohen-Hatton et al. (2013) who questioned the basic assumption that selective PIT effects occur through the operation of an S-O association learned during Pavlovian training. Instead, these investigators suggested that a mediated S-R association is learned during the Pavlovian training phase. This account assumes that during instrumental training a bidirectional R-O association is established, and, further, that when an outcome is paired with a stimulus during the Pavlovian training phase this ensures that an outcome-evoked memory of its associated response occurs in close temporal contiguity with the physically present stimulus. This contiguity results in new S-R learning and is responsible for the specific PIT effect. This view makes the somewhat radical assertion that S-O associations are either not acquired during Pavlovian learning or they play no special role in explaining selective PIT.
Our results present problems for this model. In particular, it is not clear why mediated S-R associations should have been weaker in our Group Differential rats compared to our Group Non-Differential rats. If presenting an O during the Pavlovian training phase evokes a memory of its associated R when the S is present, then it is difficult to see why this should not have occurred equally in our two groups of rats. At the very least, additional assumptions would need to be made for this account to accommodate our findings.
One further point is worth considering. While we have emphasized associative mechanisms at the intersection of Pavlovian and instrumental learning that would permit an interaction at the levels of learning and performance, it is important to realize that there may well be important differences in the underlying associative circuitries of Pavlovian and instrumental learning that prevent interactions from taking place in some circumstances. For instance, in one study Corbit and Balleine (2003) trained animals to perform two instrumental responses in a heterogeneous chain to earn one outcome. Separately, the animals were trained with two distinct Pavlovian S-O relations. In PIT tests where the rats could choose freely between both responses in the presence and absence of both Pavlovian stimuli, selective PIT was only observed with the response component of the chain that was most proximal to outcome delivery during instrumental training. In other words, the stimuli had no effect (general or specific) on the first response element of the chain. Separately, these authors found that the first response element of the chain, but not the second, was sensitive to an instrumental incentive learning manipulation. Thus, it appears that there may be important differences in the manner in which the outcome is encoded in Pavlovian and instrumental conditioning that may limit performance interactions. The incentive learning effect requires that the distal response had, indeed, become associated with a specific O, but that outcome encoding was apparently quite different from the manner in which the outcome was encoded during Pavlovian training since the Pavlovian stimulus failed to control this response. In contrast, the proximal response element apparently did become associated with an encoding of the outcome that was more similar to the Pavlovian outcome encoding since this response was influenced by the Pavlovian stimulus in an outcome-specific manner.
Selective PIT, then, would only be expected to occur to the extent that the outcome encodings in both Pavlovian and instrumental associative structures were similar. Under conditions where instrumental incentive learning is expected to occur, selective PIT should not also occur and the only manner in which Pavlovian stimuli might come to modulate such responding would be through the more general processes emphasized by Rescorla and Solomon (1967). The generality of this conclusion will need to be established, but it is illuminating that distinct neural mechanisms are beginning to be uncovered in the study of Pavlovian and instrumental processes (e.g., Corbit, Muir, & Balleine, 2001; Laurent et al., 2014).
In summary, we have presented data from an experiment designed to contrast predictions made by O-R and R-O associative models of instrumental learning. We used a selective PIT task to assess which of these models more accurately describes performance after instrumental training of separate R-O relations was conducted in two separate contexts or in each of two contexts. We observed weaker selective PIT effects when animals learned each of two different R-O relations in separate contexts. Apparently, such training enables distinct context-outcome associations to overshadow the learning of specific R-O associations over the course of instrumental learning. A further inference from these findings is that selective PIT is largely controlled by an R-O association working in the backward direction, rather than by an O-R association. Overall, the data from this study show how specific PIT paradigms can be used to answer interesting questions about the nature of Pavlovian-instrumental interactions, while also providing further information regarding the nature of instrumental learning itself.
Acknowledgments
Financial Support: The research reported here was supported by a National Institute on Drug Abuse (SC1 DA034995) grant awarded to ARD.
Footnotes
Conflict of Interest: None.
References
- Balleine BW, Killcross S. Parallel incentive processing: an integrated view of amygdala function. Trends in Neurosciences. 2006;29:272–279. doi: 10.1016/j.tins.2006.03.002.
- Barnet RC, Miller RR. Second-order excitation mediated by a backward conditioned inhibitor. Journal of Experimental Psychology: Animal Behavior Processes. 1996;22:279–296. doi: 10.1037//0097-7403.22.3.279.
- Berridge KC. ‘Liking’ and ‘wanting’ food rewards: Brain substrates and roles in eating disorders. Physiology & Behavior. 2009;97:537–550. doi: 10.1016/j.physbeh.2009.02.044.
- Cohen-Hatton SR, Haddon JE, George DN, Honey RC. Pavlovian-to-instrumental transfer: paradoxical effects of the Pavlovian relationship explained. Journal of Experimental Psychology: Animal Behavior Processes. 2013;39:14–23. doi: 10.1037/a0030594.
- Colwill RM, Rescorla RA. Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes. 1985;11:120–132.
- Corbit LH, Janak PH. Ethanol-associated cues produce general Pavlovian-instrumental transfer. Alcoholism: Clinical and Experimental Research. 2007;31:766–774. doi: 10.1111/j.1530-0277.2007.00359.x.
- Corbit LH, Balleine BW. Instrumental and Pavlovian incentive processes have dissociable effects on components of a heterogeneous instrumental chain. Journal of Experimental Psychology: Animal Behavior Processes. 2003;29:99–106. doi: 10.1037/0097-7403.29.2.99.
- Corbit LH, Balleine BW. The general and outcome-specific forms of Pavlovian-instrumental transfer are differentially mediated by the nucleus accumbens core and shell. Journal of Neuroscience. 2011;31:11786–11794. doi: 10.1523/JNEUROSCI.2711-11.2011.
- Corbit LH, Muir JL, Balleine BW. The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. Journal of Neuroscience. 2001;21:3251–3260. doi: 10.1523/JNEUROSCI.21-09-03251.2001.
- Coutureau E, Killcross S. Inactivation of the infralimbic prefrontal cortex reinstates goal-directed responding in overtrained rats. Behavioural Brain Research. 2003;146:167–174. doi: 10.1016/j.bbr.2003.09.025.
- Delamater AR. On the nature of CS and US representations in Pavlovian learning. Learning & Behavior. 2012;40:1–23. doi: 10.3758/s13420-011-0036-4.
- Delamater AR, Holland PC. The influence of CS–US interval on several different indices of learning in appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes. 2008;34:202–222. doi: 10.1037/0097-7403.34.2.202.
- Delamater AR, Lattal KM. The study of associative learning: Mapping from psychological to neural levels of analysis. Neurobiology of Learning and Memory. 2014;108:1–4. doi: 10.1016/j.nlm.2013.12.006.
- Estes WK, Skinner BF. Some quantitative properties of anxiety. Journal of Experimental Psychology. 1941;29:390–400.
- Fanselow MS, Zelikowsky M, Perusini J, Rodriguez Barrera V, Hersman S. Isomorphisms between psychological processes and neural mechanisms: From stimulus elements to genetic markers of activity. Neurobiology of Learning and Memory. 2014;108:5–13. doi: 10.1016/j.nlm.2013.10.021.
- Holland PC, Hsu M. Role of amygdala central nucleus in the potentiation of consuming and instrumental lever-pressing for sucrose by cues for the presentation or interruption of sucrose delivery in rats. Behavioral Neuroscience. 2014;128:71–82. doi: 10.1037/a0035445.
- Holmes NM, Marchand AR, Coutureau E. Pavlovian to instrumental transfer: a neurobehavioural perspective. Neuroscience and Biobehavioral Reviews. 2010;34:1277–1295. doi: 10.1016/j.neubiorev.2010.03.007.
- Honey RC, Iordanova MD, Good M. Associative structures in animal learning: Dissociating elemental and configural processes. Neurobiology of Learning and Memory. 2014;108:96–103. doi: 10.1016/j.nlm.2013.06.002.
- Kamin L. Selective association and conditioning. In: Mackintosh NJ, Honig WK, editors. Fundamental issues in associative learning. Halifax, Canada: Dalhousie University Press; 1969. pp. 42–89.
- Killcross AS, Coutureau E. Coordination of actions and habits in the medial prefrontal cortex of rats. Cerebral Cortex. 2003;13:400–408. doi: 10.1093/cercor/13.4.400.
- Konorski J. Integrative activity of the brain. Chicago, IL: University of Chicago Press; 1967.
- Konorski J, Miller S. On two types of conditioned reflex. Journal of Genetic Psychology. 1937;16:264–272.
- Kruse JM, Overmier JB, Konz WA, Rokke E. Pavlovian conditioned stimulus effects upon instrumental choice behavior are reinforcer specific. Learning & Motivation. 1983;14:165–181.
- Laurent V, Morse AK, Balleine BW. The role of opioid processes in reward and decision-making. British Journal of Pharmacology. 2014. doi: 10.1111/bph.12818. [Epub ahead of print]
- Lewis AH, Niznikiewicz MA, Delamater AR, Delgado MR. Avoidance-based human Pavlovian-to-instrumental transfer. European Journal of Neuroscience. 2013;38:3740–3748. doi: 10.1111/ejn.12377.
- Mackintosh NJ, Dickinson A. Instrumental (Type II) conditioning. In: Dickinson A, Boakes RA, editors. Mechanisms of learning and motivation: A memorial volume to Jerzy Konorski. Hillsdale, NJ: Lawrence Erlbaum Associates; 1979. pp. 143–170.
- Martinovic J, Jones A, Christiansen P, Rose AK, Hogarth L, Field M. Electrophysiological responses to alcohol cues are not associated with Pavlovian-to-Instrumental transfer in social drinkers. PLoS One. 2014;9:e94605. doi: 10.1371/journal.pone.0094605. PMCID: PMC3986108.
- McDannald MA, Jones JL, Takahashi YK, Schoenbaum G. Learning theory: A driving force in understanding orbitofrontal function. Neurobiology of Learning and Memory. 2014;108:22–27. doi: 10.1016/j.nlm.2013.06.003.
- Mowrer OH. On the dual nature of learning- a re-interpretation of "conditioning" and "problem-solving". Harvard Educational Review. 1947;17:102–148.
- Ostlund SB, LeBlanc KH, Kosheleff AR, Wassum KM, Maidment NT. Phasic mesolimbic dopamine signaling encodes the facilitation of incentive motivation produced by repeated cocaine exposure. Neuropsychopharmacology. 2014;39:2441–2449. doi: 10.1038/npp.2014.96.
- Overmier JB, Bull JA, Pack K. On instrumental response interaction as explaining the influences of Pavlovian CS+s upon avoidance behavior. Learning and Motivation. 1971;2:103–112.
- Parnaudeau S, Taylor K, Bolkan SS, Ward RD, Balsam PD, Kellendonk C. Mediodorsal thalamus hypofunction impairs flexible goal-directed behavior. Biological Psychiatry. 2014. doi: 10.1016/j.biopsych.2014.03.020. pii: S0006-3223(14)00221-222.
- Pavlov IP. The reply of a physiologist to psychologists. Psychological Review. 1932;39:91–127.
- Pearce JM, Hall G. Overshadowing the instrumental conditioning of a lever-press response by a more valid predictor of the reinforcer. Journal of Experimental Psychology: Animal Behavior Processes. 1978;4:356–367.
- Peciña S, Berridge KC. Dopamine or opioid stimulation of nucleus accumbens similarly amplify cue-triggered 'wanting' for reward: entire core and medial shell mapped as substrates for PIT enhancement. European Journal of Neuroscience. 2013;37:1529–1540. doi: 10.1111/ejn.12174.
- Rescorla RA. Response-outcome versus outcome-response associations in instrumental learning. Animal Learning & Behavior. 1992;20:223–232.
- Rescorla RA, LoLordo VM. Inhibition of avoidance behavior. Journal of Comparative and Physiological Psychology. 1965;59:406–412. doi: 10.1037/h0022060.
- Rescorla RA, Solomon RL. Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review. 1967;74:151–182. doi: 10.1037/h0024475.
- Steinhauer GD, Davol GH, Lee A. Acquisition of the autoshaped key peck as a function of amount of preliminary magazine training. Journal of the Experimental Analysis of Behavior. 1976;25:355–359. doi: 10.1901/jeab.1976.25-355.
- Trapold MA. Are expectancies based upon different positive reinforcing events discriminably different? Learning and Motivation. 1970;1:129–140.
- Trapold MA, Overmier JB. The second learning process in instrumental learning. In: Black AH, Prokasy WF, editors. Classical conditioning. II. Current research and theory. New York, NY: Appleton-Century-Crofts; 1972.
- Urcuioli PJ, DeMarse TB. Further tests of response-outcome associations in differential-outcome matching-to-sample. Journal of Experimental Psychology: Animal Behavior Processes. 1997;23:171–182.
- Wagner AR, Brandon SE. Evolution of a structured connectionist model of Pavlovian conditioning (AESOP). In: Klein SB, Mowrer RR, editors. Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theory. Hillsdale, NJ: Erlbaum; 1989. pp. 149–189.
- Wagner AR, Logan FA, Haberlandt K, Price T. Stimulus selection in animal discrimination learning. Journal of Experimental Psychology. 1968;76:171–180. doi: 10.1037/h0025414.
- Weiss SJ, Thomas DA, Weissman RD. Combining operant-baseline-derived conditioned excitors and inhibitors from the same and different incentive classes: an investigation of appetitive-aversive interactions. Quarterly Journal of Experimental Psychology: B. 1996;49:357–381. doi: 10.1080/713932635.
- Williams BA. Associative competition in operant conditioning: Blocking the response-reinforcer association. Psychonomic Bulletin & Review. 1999;6:618–623. doi: 10.3758/bf03212970.