Summary
Little is known about the neural mechanisms that allow humans and animals to plan actions using knowledge of task contingencies. Emerging theories hypothesize that it involves the same hippocampal mechanisms that support self-localization and memory for locations. Yet, limited direct evidence supports the link between planning and the hippocampal place map. We addressed this by investigating model-based planning and place memory in healthy controls and epilepsy patients, treated using unilateral anterior temporal lobectomy with hippocampal resection. Both functions were impaired in the patient group. Specifically, the planning impairment was related to right hippocampal lesion size, controlling for overall lesion size. Furthermore, while planning and boundary-driven place memory co-varied in the control group, this relationship was attenuated in patients, consistent with both functions relying on the same structure in the healthy brain. These findings clarify both the neural mechanism of model-based planning and the scope of hippocampal contributions to behavior.
eTOC Blurb
Testing patients with hippocampal damage, Vikbladh et al. demonstrate that model-based planning and place memory rely on a common hippocampal substrate. The study bridges the reinforcement-learning and spatial memory literatures to clarify the scope of hippocampal contributions to behavior.
Introduction
Using knowledge of task contingencies, humans and other animals can plan novel courses of action, such as trajectories through a maze. Although the neural substrates for such “model-based” planning are poorly understood, this ability is often viewed as similar to other functions supported by the hippocampus, like representing and remembering locations in space. Both model-based planning (Tolman, 1948) and place memory (O’Keefe and Nadel, 1978) are often described as requiring ‘cognitive maps’ of the environment or task structure, and are contrasted against habitual response-based behaviors that depend on the basal ganglia. Still, despite the commonalities, these functions are distinct in principle and it is unclear whether they actually share a common neural mechanism and, if so, what that mechanism is.
Research into hippocampal spatial cognition most clearly emphasizes localization: determining one’s position in allocentric space. This function is most famously exemplified by location-selective neural responses in the hippocampus (O’Keefe and Nadel, 1978) and behaviorally operationalized using spatial tasks such as the Morris Water Maze (MWM) (Morris et al., 1982), where rodents have to find and remember the location of a hidden platform in an open arena. This type of “place memory,” keyed to allocentric configurations of cues like boundaries, is distinguished from more landmark-based strategies, such as egocentric stimulus-response strategies (e.g., turn left or right) which rely more on the basal ganglia (McDonald and White, 1994, Packard and McGaugh, 1996, Pearce et al., 1998). An analogous dissociation between hippocampal and basal ganglia-dependent memory has been demonstrated in humans using functional magnetic resonance imaging (fMRI) in virtual spatial tasks (Hartley et al., 2003, Iaria et al., 2003, Voermans et al., 2004, Doeller et al., 2008).
By contrast, research into planning investigates how organisms use knowledge of task contingencies, like action outcomes and state transitions, to evaluate actions by mental simulation. Experiments probing such functions, including reward devaluation in operant lever pressing (Adams and Dickinson, 1981, Adams, 1982) and multi-step reinforcement learning (Gläscher et al., 2010, Daw et al., 2011), support a distinction between two classes of strategies – referred to as goal-directed or model-based planning vs. habitual or model-free learning (Balleine and Dickinson, 1998, Daw et al., 2005). This distinction seems to parallel the place vs. response memory dichotomy from spatial cognition (Poldrack and Packard, 2003, Kosaki et al., 2018) and the related declarative vs. procedural memory distinction from the memory literature (Squire, 1992, Knowlton et al., 1996, Foerde and Shohamy, 2011, Shohamy and Daw, 2015). Indeed, model-free learning, like landmark-based stimulus-response strategies in some spatial tasks, is well captured by theories of dopamine and the basal ganglia (Schultz et al., 1997, Bayer and Glimcher, 2005).
It is less clear what neural mechanisms are responsible for model-based planning. There are, however, a number of suggestive reasons to suspect it shares a common hippocampal substrate with place memory (Hirsch, 1974, Dickinson and Balleine, 1993, Eilan et al., 1993, Johnson and Redish, 2007, Shohamy and Daw, 2015, Kumaran et al., 2016). Hippocampal function, of course, extends beyond spatial cognition to support declarative memory and, notably, the encoding of relationships among environmental stimuli (Eichenbaum and Cohen, 2004, Davachi and Wagner, 2002, Kumaran et al., 2009, Schapiro et al., 2016, Boorman et al., 2016, Garvert et al., 2017). Knowing such relations is critical to building a model of task contingencies. Tests of relational encoding have even relied on tasks which are similar in logic to probes for model-based planning, like transitive inference or acquired equivalence (Dusek and Eichenbaum, 1997, Heckers et al., 2004, Shohamy and Wagner, 2008, Wimmer and Shohamy, 2012). Moreover, the hippocampus has been implicated in the ability to imagine or simulate future events, a function that may be critical to model-based planning (Hassabis et al., 2007, Addis et al., 2011). Spatial navigation studies have further demonstrated that the hippocampus and surrounding medial temporal areas, in addition to the current location, encode other variables that are relevant to planning, such as boundaries or the identity, direction, and distance to a goal (Spiers and Maguire, 2007, Viard et al., 2011, Chadwick et al., 2015, Wikenheiser and Redish, 2015, Brown et al., 2016, Kaplan et al., 2017).
Non-local place-cell firing, such as preplay of locations ahead of the animal, has also been proposed to support planning by mental simulation of candidate routes, drawing on a cognitive model or map of the world (Johnson and Redish, 2007, Pfeiffer and Foster, 2013, Shohamy and Daw, 2015, Mattar and Daw, 2017).
At the same time, there is a surprising lack of direct evidence for hippocampal involvement in model-based planning. For predominant rodent models of model-based behavior, including outcome devaluation and contingency degradation in operant lever-pressing, hippocampal lesions have negligible effects (Corbit and Balleine, 2000, Corbit et al., 2002). One exception is a recent rodent study in which hippocampal inactivation impaired model-based planning in a multistep decision task (Miller et al., 2017). However, the extensive training needed to teach animals such sequential decision tasks may elicit model-free strategies that only mimic the signatures of planning in more lightly trained humans (Akam et al., 2015, Economides et al., 2015). Even seemingly model-based rodent behavior (Miller et al., 2017) could thus rely on the hippocampus for different reasons. Finally, with few exceptions (Simon and Daw, 2011), little evidence links hippocampal activity in human neuroimaging, or rodent place cell preplay, to planning in tasks specifically designed to identify choice strategies that require knowledge of task contingencies.
We therefore sought to directly test the hypothesis that model-based planning uses a hippocampal mechanism in humans, and to ask whether this substrate is shared with boundary-driven place memory. To this end, we studied performance on a model-based planning task (Daw et al., 2011) and a spatial memory task (Doeller et al., 2008) in healthy controls and patients with medically intractable epilepsy, treated by unilateral anterior temporal lobectomy (ATL) with hippocampal resection. We investigated whether damage to the temporal lobe impaired model-based planning and boundary-driven place memory and how it affected the relationship between them. If the hippocampus is a common neural substrate for both functions, we expected hippocampal damage to impair performance on both tasks. Furthermore, a common substrate could lead to correlated performance across the tasks, but this correlation should itself be attenuated if that substrate is impaired by hippocampal damage. Finally, because the lesions also affected overlying cortex, we explored to what extent performance in either task was related specifically to the extent of damage to hippocampus on either side, controlling for the overall extent of the lesion.
Results
Participant Characteristics
We recruited 19 epilepsy patients treated with unilateral anterior temporal lobectomy (ATL), i.e., surgical removal of the anterior temporal lobe on one side, and 19 healthy controls (see STAR Methods). Patients and controls displayed no significant group differences in IQ (t36=0.2200, p=0.8271), age (t36=−0.7760, p=0.4428) or number of males vs. females (z=−0.3261, p=0.7444). For 10 out of 19 patients ATL was right lateralized. There were no significant differences between the right and left lateralized ATL groups in IQ (t17=−0.5295, p=0.6033), age (t17=1.0876, p=0.2919) or number of males vs. females (z=−0.6752, p=0.4995).
Individual lesion masks, derived from post-surgical structural brain scans normalized to the 1 mm MNI template (Figure 1 and STAR Methods), were used to estimate the standardized total lesion size of each patient. Total lesion size in the right lateralized ATL patient group (mean=65314 voxels, SE=2791 voxels) was found to be significantly larger (t17=−2.6103, p=0.0183) than in the left lateralized ATL patient group (mean=40536 voxels, SE=2296 voxels).
Each patient’s lesion mask was also compared to the Harvard-Oxford Brain Atlas (p>.5) in order to estimate what percentage of the hippocampus had been resected (see STAR Methods). Mean hippocampal lesion size estimated in the right lateralized ATL patient group (mean=2791 voxels, SE=321 voxels) was not significantly larger (t17=−1.1124, p=0.2814) than in the left lateralized ATL patient group (mean=2296 voxels, SE=302 voxels). The mean hippocampal lesion sizes corresponded to 62.8% (SE=7.2) and 53.7% (SE=7.1) of the hippocampus resected, for the right and left ATL groups respectively.
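The percentage estimate described above amounts to a voxel-overlap calculation. The following is a minimal sketch of that idea, not the pipeline used in the study: it assumes the lesion mask and the atlas probability map are already in the same normalized space and are given as flat, equal-length voxel lists. The function name `percent_resected` and the toy data are hypothetical.

```python
def percent_resected(lesion_mask, atlas_prob, threshold=0.5):
    """Percentage of atlas-defined hippocampal voxels that fall inside the lesion.

    lesion_mask: sequence of 0/1 flags, one per voxel (1 = resected).
    atlas_prob:  sequence of hippocampus probabilities for the same voxels.
    threshold:   a voxel counts as hippocampus when its probability exceeds this.
    """
    if len(lesion_mask) != len(atlas_prob):
        raise ValueError("mask and atlas must cover the same voxels")
    hippocampus = [p > threshold for p in atlas_prob]
    n_hippocampus = sum(hippocampus)
    if n_hippocampus == 0:
        raise ValueError("no voxel exceeds the atlas threshold")
    n_lesioned = sum(1 for hit, les in zip(hippocampus, lesion_mask) if hit and les)
    return 100.0 * n_lesioned / n_hippocampus


# Toy data (hypothetical): 4 voxels exceed the threshold, 3 of them are resected.
lesion = [1, 1, 1, 0, 1, 0]
atlas = [0.9, 0.8, 0.6, 0.7, 0.2, 0.1]
print(percent_resected(lesion, atlas))  # 75.0
```

In this sketch a resected voxel outside the thresholded hippocampus (the fifth voxel in the toy data) adds to total lesion size but not to the hippocampal percentage. This is why the two measures can be entered separately in later analyses.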
Patients display shift from model-based to model-free strategy
Participants completed 200 trials of a two-step Markov decision task (Daw et al., 2011) designed to quantify the reliance on model-based and model-free strategies (see STAR Methods). The mean number of completed trials was 195.3 (SE=2.1) with no significant difference between the control and patient groups (t36=1.2515, p=0.2188). The mean number of rewards received was 107.9 (SE=2.6), again with no significant difference between the control and patient groups (t36=−0.0399, p=0.9684). In general, rewards in this task are by design highly stochastic and not sensitive to differences in strategy.
On each trial the participant first made a choice between two spaceships. One spaceship most commonly (p=.7) transitioned to the purple planet, and otherwise made a rare transition (p=.3) to the red planet. For the other spaceship, probabilities were reversed. The participant then made a choice between two planet-specific aliens, each associated with a unique, slowly drifting probability of reward (Figure 2).
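The trial structure can be sketched as a short simulation. This is an illustrative reconstruction from the description above, not the task code used in the study. The transition probability of .7 comes from the text. The drift step size, the clipping bounds, the starting reward probabilities and all function names are assumptions made for the example.

```python
import random

COMMON_P = 0.7  # probability of the common transition, as stated in the text


def run_trial(spaceship, alien, reward_probs, rng):
    """Simulate one trial; returns (planet, transition_type, reward).

    spaceship:    0 or 1 (spaceship 0 commonly reaches the purple planet).
    alien:        0 or 1, the alien chosen on whichever planet is reached.
    reward_probs: {planet: [p_alien0, p_alien1]} current reward probabilities.
    """
    common_planet = "purple" if spaceship == 0 else "red"
    rare_planet = "red" if spaceship == 0 else "purple"
    if rng.random() < COMMON_P:
        planet, transition = common_planet, "common"
    else:
        planet, transition = rare_planet, "rare"
    reward = int(rng.random() < reward_probs[planet][alien])
    return planet, transition, reward


def drift(reward_probs, rng, sd=0.025, lo=0.25, hi=0.75):
    """Random-walk each reward probability, clipped to [lo, hi] (assumed values)."""
    for probs in reward_probs.values():
        for i, p in enumerate(probs):
            probs[i] = min(hi, max(lo, p + rng.gauss(0.0, sd)))


rng = random.Random(0)
reward_probs = {"purple": [0.6, 0.4], "red": [0.3, 0.7]}  # hypothetical start
n_common = 0
for _ in range(10000):
    planet, transition, reward = run_trial(0, 0, reward_probs, rng)
    n_common += transition == "common"
    drift(reward_probs, rng)
print(n_common / 10000)  # proportion of common transitions, close to 0.7
```

Because the reward probabilities drift independently of the transition structure, the same first-stage choice can be rewarded after either a common or a rare transition. The analyses that follow exploit this to separate the two strategies.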
Figure 3 shows markers of both strategies as a function of group, estimated from a factorial logistic regression (Table S1) that predicts whether the participant chooses the same spaceship as on the previous trial. Model-free learning is signaled by a main effect of reward, i.e., a tendency to repeat the choice of a spaceship that led to reward, whereas in model-based learning, choice of spaceship is mediated by expectations about the planets to which it leads, indicated by an interaction between reward and whether a rare or common transition occurred on the last trial. If, for instance, a reward is received but following a rare transition, a model-based agent should be less likely to repeat the choice of spaceship on the next trial. The difference in these effects measures the relative strength of model-based vs model-free choice. The regression also controls for the nuisance explanatory factors age and IQ.
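The logic of the two markers can be illustrated with raw stay proportions. The sketch below is a descriptive simplification, not the logistic regression reported in Table S1: it omits the nuisance covariates and any participant-level modeling, and the function name `stay_indices` and the toy trial sequence are hypothetical. It assumes every combination of reward and transition type occurs at least once.

```python
from collections import defaultdict


def stay_indices(trials):
    """trials: list of (choice, transition, reward) tuples in task order,
    with transition in {"common", "rare"} and reward in {0, 1}.

    Returns (p_stay, model_free, model_based), where p_stay maps
    (reward, transition) on the previous trial to the proportion of repeats.
    Raises KeyError if any of the four conditions never occurs.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_stay, n_total]
    for prev, cur in zip(trials, trials[1:]):
        key = (prev[2], prev[1])
        counts[key][0] += int(cur[0] == prev[0])
        counts[key][1] += 1
    p_stay = {k: s / n for k, (s, n) in counts.items()}
    rc, rr = p_stay[(1, "common")], p_stay[(1, "rare")]
    uc, ur = p_stay[(0, "common")], p_stay[(0, "rare")]
    model_free = (rc + rr) / 2 - (uc + ur) / 2  # main effect of reward
    model_based = (rc - rr) - (uc - ur)         # reward x transition interaction
    return p_stay, model_free, model_based


# Toy sequence from a pure "win-stay, lose-shift" chooser that ignores transitions.
trials = [
    (0, "common", 1),
    (0, "rare", 1),
    (0, "common", 0),
    (1, "rare", 0),
    (0, "common", 1),
]
p_stay, mf, mb = stay_indices(trials)
print(mf, mb)  # 1.0 0.0
```

A chooser that repeats rewarded choices regardless of transition type produces a reward main effect and no interaction, which is the model-free signature described above.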
The expression of model-based vs model-free strategies differed significantly by group, with controls showing a relatively even mixture of strategies (similar to previous reports using this task) but patients skewed away from model-based planning toward model-free learning. The relative reliance on model-based over model-free strategies, calculated by taking the difference between these effects, differed significantly between groups, indicating a specific strategy change rather than a general impairment (z=2.028, p=0.043). This finding is consistent with our hypothesis that hippocampal damage in the ATL lesion group specifically affects the use of internal models or maps of task contingencies.
For simplicity, the above factorial analysis only considers the effect of the preceding trial’s events on each trial’s choice. To verify that our results were not dependent on this assumption, and in keeping with previous work (Daw et al., 2011) we repeated our analysis (Table S2) by fitting participants’ choices with a full 6-parameter computational learning model (Daw et al., 2011, as modified by Gillan et al., 2016), which uses the full history of preceding rewards to predict each choice. The results recapitulate the findings from the regression: chiefly, a significant interaction of RL strategy and experimental group such that patients are biased away from model-based and towards model-free strategies (p=0.036). In addition, in this analysis (here going beyond the simpler regression analysis) the estimated strength of model-free learning is itself significantly higher in the patient group than the control group (p=0.043). The remaining parameters of the computational model did not differ significantly between groups.
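The contrast between the two strategies can be made concrete with a small sketch. It is not the 6-parameter model fitted here. It only illustrates, using the transition probabilities of the task, how a model-based learner evaluates spaceships from second-stage values, whereas a model-free learner updates a cached value directly from reward. The names `TRANSITIONS`, `model_based_values` and `td_update`, the learning rate and the toy values are illustrative assumptions.

```python
# Transition model of the task: spaceship -> probability of reaching each planet.
TRANSITIONS = {
    0: {"purple": 0.7, "red": 0.3},
    1: {"purple": 0.3, "red": 0.7},
}


def model_based_values(q_stage2):
    """Value of each spaceship as the expected value of the best alien
    on the planets it leads to, weighted by the transition probabilities.

    q_stage2: {planet: [value_alien0, value_alien1]}.
    """
    return {
        ship: sum(p * max(q_stage2[planet]) for planet, p in probs.items())
        for ship, probs in TRANSITIONS.items()
    }


def td_update(q, reward, alpha):
    """Model-free update: move a cached value toward the obtained reward."""
    return q + alpha * (reward - q)


q_stage2 = {"purple": [0.8, 0.2], "red": [0.4, 0.1]}  # hypothetical alien values
values = model_based_values(q_stage2)
print(values)                    # spaceship 0 is valued higher (about 0.68 vs 0.52)
print(td_update(0.5, 1, 0.2))    # cached value moves from 0.5 toward 1
```

The model-based evaluation changes as soon as the second-stage values change, even for a spaceship that was not just chosen, whereas the model-free update only touches the value of the action actually taken. This is why a reward after a rare transition pushes the two learners in opposite directions.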
Patients display impaired boundary-driven place memory
Participants completed a spatial task (Doeller et al., 2008) in which, on each of 64 trials, they had to navigate, in first person (Figure 4 Left), to indicate from memory the correct location of one of four objects in a virtual arena (see STAR Methods). The mean number of completed trials was 61.3 (SE=0.5) with no significant difference between the control and patient groups (t36=−0.2882, p=0.7749).
For two of the objects, correct locations were defined in relation to distal boundary cues around the arena, and for the other two objects correct locations were defined in relation to a landmark inside the arena (Figure 4 Middle). Trials were presented in four blocks consisting of 16 trials each. Within the blocks, the landmark location was fixed with respect to the boundary cues, but between the blocks, the landmark moved (Figure 4 Right). Participants were not instructed about the difference between landmark and boundary objects, or about the block-wise landmark movements.
The movements of the landmarks with respect to the boundary cues serve to dissociate spatial memory performance based on either type of cue. Given previous results (Pearce et al., 1998; Doeller et al., 2008), we hypothesized that hippocampal damage would preferentially impair reliance on boundary cues. Following Doeller et al. (2008), for each object, we therefore focused our analyses on the first trial following each of the three landmark movements (average total 11.4 trials per participant, due to missed trials). This is because these trials cleanly dissociate performance based on recalling the object’s location in the previous block relative to each type of cue (Figure 4 Middle). Note that since the boundary and landmark cues remain fixed with respect to each other within each block, performance on the remaining trials of each block is not as diagnostic of cue usage, since to the extent behavior is based on recalling the object’s most recent location with respect to either cue, this is equivalent for both cues (though see Figures S1 and S2 for analysis of remaining trials). Similarly, to avoid relying on the assumption that participants were able to learn to differentiate landmark from boundary objects (which is only possible following experience with at least one of the three landmark movements), we analyzed landmark and boundary error for all objects rather than differentiating by object type. We did, however, specify a regression model in which, for the critical trials following the landmark moves, we additionally interacted group and distance error type with object type (Table S3). We found a two-way interaction of object type by error type (F1,782.4=10.527, p<0.001), indicating that participants were ultimately able to treat the two object types differently. However, we found no significant three-way interaction of object type by error type by group (F1,782.1=2.6427, p=0.1044), indicating that we did not detect a group difference in this respect.
On the critical trials, those following movement of the landmark, we quantified reliance on either cue type by computing distance errors dB and dL, respectively, between the chosen location and the correct locations as predicted by boundary cues and landmark cues, based on the previous block (Figure 4 Right). dB and dL thus inversely reflect performance with respect to boundary and landmark cues. To assess group differences we specified a regression model where the dependent variable, distance error (dB and dL for each trial) was regressed on the key explanatory variables lesion group (control vs. patient) and distance error type (dB or dL), while also controlling for additional nuisance explanatory factors, age and IQ.
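The two distance errors can be sketched as follows. This is a minimal illustration assuming two-dimensional arena coordinates. It also assumes that the landmark-predicted location preserves the object's vector offset from the landmark, which is a simplification. The function name `cue_errors` and the example coordinates are hypothetical.

```python
import math


def cue_errors(response, prev_object, prev_landmark, new_landmark):
    """Return (d_B, d_L) for one critical trial.

    All arguments are (x, y) positions in arena coordinates.
    response:      location the participant indicated.
    prev_object:   object's correct location in the previous block.
    prev_landmark: landmark position in the previous block.
    new_landmark:  landmark position after the move.
    """
    # Boundaries do not move, so the boundary-predicted location is unchanged.
    boundary_pred = prev_object
    # The landmark-predicted location keeps the object's offset from the landmark.
    landmark_pred = (
        new_landmark[0] + prev_object[0] - prev_landmark[0],
        new_landmark[1] + prev_object[1] - prev_landmark[1],
    )
    d_b = math.dist(response, boundary_pred)
    d_l = math.dist(response, landmark_pred)
    return d_b, d_l


# Landmark shifts 3 units right and 4 units up; the response follows the boundary.
d_b, d_l = cue_errors((10.0, 10.0), (10.0, 10.0), (0.0, 0.0), (3.0, 4.0))
print(d_b, d_l)  # 0.0 5.0
```

A response placed exactly at the boundary-predicted location gives zero boundary error and a landmark error equal to the size of the landmark displacement, which is what makes the post-movement trials diagnostic.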
We found a significant interaction of group by distance error type (F1,97.58=5.5080, p=0.021), indicating a difference between groups in their relative reliance on the two cue types. This effect mainly reflected the finding that patients’ dB was significantly higher, i.e., patients’ performance was less driven by boundary cues (t39.41=2.5102, p=0.016) (Figure 5 and Table S4).
In a follow-up analysis, aimed at simplifying the design for later elaboration by assessing relative reliance on boundary vs. landmark cues using a single explanatory variable, we defined a relative measure of error: the ratio dB/(dL+dB), which measures whether participants were relatively biased toward using boundary cues over landmark cues (lower values indicate greater relative reliance on boundary cues). Regressing it on group (controlling for nuisance variables age and IQ), we again found that patients were significantly biased away from relying on boundary cues (F1,432=8.213, p=0.004) (Table S5). All these results are consistent with our prediction that anterior temporal lobe structures like the hippocampus preferentially support boundary- over landmark-driven memory.
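The relative error measure is a simple ratio of the two distance errors. The sketch below assumes the errors have already been computed for a trial, and the function name `relative_boundary_error` is hypothetical. Note the direction of the scale: because dB is an error, lower values of the ratio correspond to greater reliance on boundary cues.

```python
def relative_boundary_error(d_b, d_l):
    """Ratio d_B / (d_L + d_B): 0 = response on the boundary-predicted
    location, 1 = response on the landmark-predicted location."""
    total = d_b + d_l
    if total == 0:
        raise ValueError("both errors are zero; the cues do not dissociate")
    return d_b / total


print(relative_boundary_error(0.0, 5.0))  # 0.0 (pure boundary use)
print(relative_boundary_error(5.0, 0.0))  # 1.0 (pure landmark use)
print(relative_boundary_error(2.0, 2.0))  # 0.5 (no bias either way)
```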
Relationship between model-based planning and boundary-driven place memory
So far, we have shown impaired model-based planning and boundary-driven place memory in the patient group. Next, we examined the relationship between these two functions, first by investigating their baseline correlation in neurologically intact control participants. We did this by calculating the mean boundary distance error dB for each participant, and using it as a covariate in the logistic regression model of our decision task. This approach is analogous to estimating participant-by-participant scores for model-based planning from the logistic model, then correlating those with dB in a second step, but preferable because it takes account of statistical uncertainty about the participants’ planning scores in computing their relationship to dB, which the naive correlation neglects. IQ was also included as a nuisance covariate to account for task-general variation.
Figure 6 displays the results of this regression (Table S6), broken down by group. Boundary-driven place memory significantly predicted a control participant’s use of a model-based strategy (z=6.6455, p= 0.001), consistent with the two measures sharing some underlying substrate.
Next we engaged in a series of follow-up analyses to interrogate the specificity of this cross-task relationship. First, we wished to examine whether the relationship was specific to model-based and boundary-driven task strategies, rather than general to both strategies tested in each task. We first verified that the increase in model-based planning associated with dB is significantly larger than any corresponding effect for model-free choice (z=2.137, p=0.033), or in other words that dB is associated with an increase in model-based relative to model-free choice. Next, we refined this analysis to also probe its specificity to boundary over landmark error. In a new regression model (Table S7), we replaced the boundary error dB with the relative error ratio dB/(dL+dB), which measures whether participants were relatively biased toward using boundary cues over landmark cues in the spatial task. In the control group, as with absolute boundary error, relative error was also significantly associated with a relative increase in model-based, minus model-free, choice (z=2.069, p=0.039). Thus, in healthy controls there is a specific relationship between boundary-driven spatial memory and model-based choice, relative to their respective alternative strategies.
We next tested the specificity of the cross-task relationship to controls vs patients. We reasoned that if the relationship in the control group depends on the intact hippocampus (e.g., if it arose due to a common substrate located there), then over and above the effects on each task separately, their relationship would be affected by hippocampal/MTL damage. Therefore, we tested the null hypothesis that the relationship between the tasks is unaffected by ATL damage, the rejection of which would support the alternative hypothesis that the ATL does affect their relationship. The relationship between boundary-driven memory and model-based planning was indeed significantly attenuated in the patient group (z=2.137, p=0.032). Reflecting this attenuation, the patient group, considered alone, did not display a significantly detectable relationship between the two functions (z=0.156, p=0.875). Critically, this null result does not imply that these functions are unrelated in the patient group.
Finally, we repeated this analysis using the full computational learning model in place of the simpler regression-based index of learning (Table S8). Again, while controlling for IQ, we observed a significant positive correlation between model-based planning and place memory in the control group (p=0.030) but not the patient group (p=0.803), although the group-wise interaction was merely trending in this version of the analysis (p=0.081).
Deficits are more robust for patients with right lateralized ATL
Based on previous literature, we next sought to examine to what extent the reported effects might be preferentially associated with lesions lateralized to one side or the other. Breaking down the data this way requires examining small subgroups (N=9 and 10), meaning that the key analyses comparing the two laterality groups against one another are underpowered relative to comparing either group to controls. Also, lesion laterality is correlated with overall lesion extent in our sample. Altogether, these analyses are fundamentally more exploratory and their results more tentative than those reported above.
With those caveats, we expected boundary-based memory, and spatial relations generally, to be more strongly associated with the right hippocampus (Burgess et al., 2002). For instance, boundary-memory-related hippocampal activity in the previous fMRI study of the spatial memory task we used was right-lateralized (Doeller et al., 2008). It is less clear, a priori, how model-based planning might be lateralized.
Figure 7 shows decision task and spatial memory task performance with the patient data further subdivided by ATL laterality, along with the relationship between model-based planning and dB, also broken down by laterality. In all three cases the differences between patients and controls appeared to be driven by the right lesion patients, with the left lesion patients more similar to controls. This impression is only partly borne out by statistics, however (Tables S9, S10 and S11). In particular, in all three cases the right patients differ significantly from controls (model-based minus model-free: z=2.300, p=0.022; dB minus dL: F1,92.22=4.464, p=0.034; across-task correlation between model-based learning and boundary memory: z=2.550, p=0.011), whereas the left patient group did not differ significantly from controls in any case (dB minus dL: F1,100.7=2.644, p=0.107; model-based minus model-free: z=1.001, p=0.317; across-task correlation between model-based learning and boundary memory: z=0.470, p=0.639). However, in no case were the lesion groups significantly different from one another (dB minus dL: F1,97.8=0.150, p=0.699; model-based minus model-free: z=−1.149, p=0.251; across-task correlation between model-based learning and boundary memory: z=1.552, p=0.121).
We also examined the breakdown, by lesion laterality, of the relationship between relative measures of planning and spatial memory, to account for alternative strategies. In the regression model specified earlier that included the relative error ratio dB/(dL+dB) as a covariate (Table S7), the association between relative preference for model-based (minus model-free) planning and the relative bias toward using boundary over landmark cues was larger in the left group than the right group (z=2.8082, p=0.005). An estimated effect in the same direction was also seen comparing the control with the right patient group, with the relationship being stronger in the control group, although it did not reach significance (z=1.567, p=0.117).
Thus, although noisy, there is a consistent suggestion across all three measures and different ways of examining the cross-task relationship that the results of this study were most robust in the right lesion group.
Lesion-size in the right hippocampus predicts model-based planning deficits
One way to sharpen the foregoing analyses is to focus specifically on not just the side but the particular anatomical region hypothesized to underlie the effects: the hippocampus. Accordingly, we tested how performance on the tasks co-varied with estimated lesion size in the right and left hippocampus respectively. Hippocampal lesion size for each patient (Figure 1) was estimated by comparing the normalized anatomical masks to the Harvard-Oxford Brain Atlas (p>.5) (see STAR Methods). Importantly, the ATL procedure involves a pattern of damage to numerous temporal lobe structures in addition to hippocampus, which means one should be highly cautious interpreting these results with respect to any particular structure. Although it is not practical to control for damage to many different MTL structures individually, we attempt to mitigate these concerns and focus on hippocampal lesion size by controlling for the overall lesion size as a nuisance effect. The regression analyses also controlled for age and IQ.
We found, as can be seen in Figure 8 (Table S12), that model-based planning was significantly worse for larger right hippocampal lesions (z=2.831, p=0.005). Conversely, planning was not significantly related to the amount of damage to the left hippocampus (z=1.062, p=0.288), and this difference between right and left effects was itself significant (z=2.508, p=0.012). These results indicate that the amount of damage to the right but not the left hippocampus is related to model-based deficits. However, in order to further test specificity, and ensure that the lesions in the right hippocampus are not simply causing general learning deficits, we also calculated the effect of each hemisphere’s hippocampal lesion size on the difference between model-based and model-free learning, as estimated by the logistic regression. As predicted, we found that right hippocampal lesion size was significantly related to a shift away from a model-based and towards a model-free strategy (z=2.984, p=0.003) and that this effect was significantly larger for right compared to left hippocampal lesions (z=−3.377, p<0.001).
For boundary memory, the effects of lesion size on performance were similar in magnitude and pattern, although not significant in either the right (F1,12.16=2.6082, p=0.1320) or the left patient group (F1,12.93=0.1204, p=0.7340) (Table S13). It should be noted that this analysis is based on many fewer trials than the sequential decision task analysis.
Discussion
Although extensive evidence indicates that the hippocampus supports localization in allocentric space, there is relatively little direct evidence for the hypothesis that the same mechanisms extend to model-based planning. We addressed this gap by testing model-based planning and place memory in patients with extensive hippocampal damage as a result of unilateral ATL lesions and matched, neurologically typical controls. Our results are consistent with the hypothesis that model-based planning and boundary-driven place memory share a common mechanism, which is affected in ATL patients and, more tentatively, associated with right hippocampus.
As predicted, ATL patients displayed significantly attenuated boundary-driven place memory in our MWM-like spatial memory task, alongside spared landmark-based memory. These results echo the dual-systems view of navigation supported both by rodent lesion (O’Keefe and Nadel, 1978, McDonald and White, 1994, Packard and McGaugh, 1996, Pearce et al., 1998) and human neuroimaging experiments (Hartley et al., 2003, Iaria et al., 2003, Voermans et al., 2004, Doeller et al., 2008), whereby the hippocampus supports fast learning of allocentric spatial maps and the striatum facilitates slow, incremental associations between stimuli and responses. That said, one weakness of the current task in operationalizing the place vs. response distinction is that although use of boundary cues clearly exercises allocentric spatial localization, landmark usage imperfectly captures a striatal response system as classically envisioned (for instance, because some allocentric information is still needed to place objects correctly relative to the landmark). Nevertheless, the use of boundary cues to index hippocampal function (the measure most important to our results) is unambiguous and well validated (Pearce et al., 1998, Doeller et al., 2008), even if the landmark foil imperfectly captures a hypothetical striatal contribution. Also, our distance error measures assess the tendency of participants to use either sort of cue. They cannot distinguish deficits in boundary-driven place memory per se from performance deficits such as reduced attention to these cues or a greater belief that landmarks predict object locations. All these mechanisms, though, are consistent with the broader perspective that anterior temporal lobe is ultimately involved in allocentric spatial localization based on configurations of cues.
The patients were also significantly biased away from using model-based planning and toward model-free habitual strategies in the two-step Markov decision task. This result provides causal evidence for the inference that temporal lobe structures support model-based planning, over and above their role in place memory. The appearance of a compensatory shift toward improved model-free learning, which is rarely reported with this task (Frank et al., 2004), indicates that behavior in the ATL patients is not simply noisier, and instead is consistent with models invoking multiple, potentially competing, reinforcement-learning systems in the human brain (Daw et al., 2005). Still, our results do not speak clearly to the perennial question whether the hippocampus plays a special role in such models for spatial vs. more abstract relational tasks. This is because although our planning task is structured like an abstract Markov decision process, its cover story, in terms of rocket trips to planets, might have elicited a spatial interpretation.
Our results also complement and extend previous research with rodents. Unit recording studies have shown results suggestive of hippocampal involvement in model-based planning, notably replay of forward trajectories in hippocampal place cells (Johnson and Redish, 2007, Pfeiffer and Foster, 2013). However, in contrast to our results, previous studies with place cell recordings have not yet shown behavioral evidence for a link between the hippocampus and the use of this knowledge in planning, nor do they provide evidence for a causal role of hippocampus in such a function. In these respects, our results more closely parallel a recent report of a related deficit in model-based learning in rodents during inactivation of the dorsal hippocampus, using an analogous two-stage Markov decision task (Miller et al., 2017). The targeting of the inactivation to hippocampus in the rodent study sharpens the anatomical specificity of the effect. Conversely, our human result clarifies the contribution of the damaged structure, because we know more about the computations underlying model-based behavior on this task in humans. In particular, in humans, but not yet rodents, model-based choices have been explicitly linked to prospective neural activity at decision time (Doll et al., 2015). This helps to rule out other potentially confounding strategies, such as that the apparently model-based choices in rodents are instead produced by some learned response switching rule contingent on events spanning multiple trials. It has been suggested that such model-free strategies might arise following overtraining of the sort used to teach animals this task (Akam et al., 2015, Economides et al., 2015); this might also implicate hippocampus for other, confounding reasons, such as its involvement in trace conditioning and latent states (Solomon et al., 1986, Büchel et al., 1999, Gershman et al., 2010). 
Our findings of a similar result in humans, without extensive training, thus help corroborate the interpretation of the rodent study as well.
Comparing performance between our two tasks we also found that in healthy controls, boundary-driven place memory performance correlated with the extent of reliance on model-based planning strategies. Importantly, this relationship was also significantly attenuated in the patient group, a result that suggests the lesion affects some common substrate for the tasks that is otherwise provided by the temporal lobes in the healthy brain. Following damage to this structure, however, the two tasks may be at least partly supported by differential compensatory mechanisms, leading to their de-correlation. These findings are consistent with the hypothesis that both model-based planning and place memory share a common mechanism, which is impaired in ATL patients.
It is surprising and interesting that the effects we report emerge following damage to only one lobe, as unilateral temporal lobe damage is generally known to produce rather subtle effects on cognition in humans (Spiers et al., 2001) and animals (van Praag et al., 1998), relative to the famously dramatic effects of bilateral lesion (e.g. Scoville and Milner, 1957). This may relate to our use of two behavioral tasks that are well attuned to temporal lobe function. However, there exist inherent and important caveats in drawing conclusions about the neural bases of effects from a study of this sort. It is possible that the observed effects are caused, at least in part, by damage to the brain, including the hemisphere not surgically altered, as a result of the chronic epilepsy that prompted the surgery. Indeed, as with all studies of temporal lobe function in patients with epilepsy, the possibility that impaired behavior and cognition in patients is due to a history of epilepsy rather than the surgical intervention per se must be taken into account.
For this reason and others, we must also be cautious about associating the damage with individual structures. Our analyses indicate that the size of lesion to the right hippocampus is significantly related to model-based planning deficits. Still, ATL lesions additionally affect a number of other regions including parahippocampal cortex, perirhinal cortex, and amygdala, which might also subserve these effects. Moreover, since the pattern of the lesions mainly varies in the extent to which the temporal lobe has been removed in the dorsal direction, the patterns of damage to these structures tend to covary across individuals. Such collinearity makes it difficult to use variation across patients in damage to individual structures to fully disentangle their differential roles. We attempted to mitigate these issues by controlling for overall lesion size. Nevertheless, due to the very substantial analytic and interpretational issues, this anatomical specificity remains emphatically tentative.
These caveats aside, a final question posed by our results concerns how model-based planning and boundary-driven place memory actually relate to one another. In the spatial literature, the notion of a cognitive map primarily refers to place-selective hippocampal activity, which allows organisms to recognize and remember discrete locations in allocentric space. From the perspective of planning, a cognitive map goes beyond such a representation, but is built upon it: the map captures the relationships between locations, which can be used to evaluate candidate actions. This function fits well with the broader view of hippocampus supporting relational memory (Eichenbaum and Cohen, 2001, Davachi and Wagner, 2002, Kumaran et al., 2009, Schapiro et al., 2016, Boorman et al., 2016, Garvert et al., 2017), which suggests that the planning deficit in patients stems from hippocampal damage being accompanied by attenuation of the knowledge of relationships between actions and states. This view is also consistent with recent computational models describing how the hippocampus might serve model-based planning. In spatial tasks, sequential activations of place-selective cells are hypothesized to provide not only a mnemonic function, through supporting reactivation of previously traversed trajectories, but also a planning function, by generating novel place cell sequences based on the learned contingencies between locations (Johnson and Redish, 2007, Pfeiffer and Foster, 2013, Mattar and Daw, 2018). The related successor representation model (Stachenfeld et al., 2017, Garvert et al., 2017) also focuses on learned relationships among locations, by proposing that place selectivity itself is built from experience of state transitions to reflect expectations about future locations.
A key challenge for future work addressing these ideas will be studying hippocampal activity in tasks, like the planning one used here, which manipulate animals' experience of environmental relationships, to reveal how they leverage this knowledge to guide choice.
STAR Methods
CONTACT FOR RESOURCE SHARING
Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Oliver M. Vikbladh (omv208@nyu.edu).
SUBJECT DETAILS
The NYU committee on activities involving human subjects approved this study, and all participants gave written informed consent before participation. 19 individuals, who had undergone unilateral anterior temporal lobectomy (ATL) for the treatment of intractable epilepsy, were recruited from the New York University (NYU) Patient Registry for the Study of Perception, Emotion and Cognition (PROSPEC). A clinical neuropsychologist (MRM or KB) conducted all standardized procedures for screening patients for inclusion into NYU PROSPEC. Patients were only selected for inclusion in the study if there was no evidence of global cognitive dysfunction as measured by a comprehensive neuropsychological evaluation, an FSIQ (Wechsler Adult Intelligence Scale-Fourth Edition; Wechsler, 2008) above 80, no evidence of diffuse atrophy on MRI (e.g., brain tumor or idiopathic epilepsy), and no history of psychiatric or neurologic disease other than the primary etiology for the focal brain lesion. The patient participants had a mean age of 37.0 years (SE=1.5), a mean IQ score of 109.7 (SE=2.8) and a 10/9 male to female ratio. 19 healthy controls were also recruited from the local community through internet-based advertisement and gave consent to participate in the study. The control participants had a mean age of 39.3 years (SE=3.6), a mean IQ score of 108.7 (SE=3.6) and an 11/8 male to female ratio.
METHOD DETAILS
MRI Scanning
When post-surgical structural brain scans (T1 MP-RAGE) were not available from the referring center, the Department of Radiology at the NYU School of Medicine, patients were imaged at the NYU Center for Brain Imaging on a 3-Tesla Siemens Allegra head-only MR scanner. Medical Center scans were obtained using 1.5 or 3-Tesla Siemens full-body MR scanners. Image acquisitions included a conventional three-plane localizer and two T1-weighted gradient-echo sequence (MP-RAGE) volumes (TE = 3.25 ms, TR = 2530 ms, TI = 1100 ms, flip angle = 7°, FOV = 256 mm, voxel size = 1×1×1.33 mm). Acquisition parameters were optimized for increased gray/white matter image contrast.
Task Order
On the day of testing, participants completed the two behavioral tasks separated by a short break. For all participants the sequential decision-making task was given first, followed by the spatial memory task. For the control participants the completion of the tasks was followed by the administration of the WAIS-IV. For the patient participants, the WAIS-IV had been completed during screening procedures for inclusion into PROSPEC.
Sequential-Decision Making Task
Participants completed 200 trials of a two-step Markov decision task designed to quantify the extent to which participants use a world model to prospectively evaluate actions (Daw et al., 2011). The task was framed as a game about mining for space treasure (Decker et al., 2016) and was presented with Matlab, using the Psychophysics Toolbox extensions (Kleiner et al., 2007). Each trial involved two choices in succession, followed by reward (Figure 2). Participants first made a choice between two actions, depicted as spaceships, randomly presented on the left and right. The choice resulted in a transition to one of two second-stage states (depicted as a red or purple planet). One spaceship most commonly (p=.7) transitioned to the purple planet, and otherwise made a rare transition (p=.3) to the red planet. For the other spaceship, probabilities were reversed. Participants were informed that each spaceship was more likely to go to a different planet but not which planet, nor the explicit transition probabilities.
Subsequently, participants made a choice between two actions depicted as a pair of aliens that were unique to the planet, randomly presented on the left and right. Each alien was associated with a probability of monetary reward (vs. nothing) that slowly diffused over trials according to an independent random walk. Rewards were paid out at the end of the experiment at a rate of 15 cents per reward. The random change in the second-stage reward probabilities encouraged participants to adjust their choice preferences at both stages trial-by-trial, so as to maximize payoffs. For each choice, participants had 3 seconds to respond; otherwise the trial was aborted with a time-out message.
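The trial structure described above can be illustrated with a short simulation. This is a minimal sketch, not the task code used in the study: the transition probability (p=.7) comes from the text, whereas the random-walk step size, the reflecting bounds, the starting reward probabilities, the random placeholder choice policy, and all function names are assumptions introduced for illustration.

```python
import random

COMMON_P = 0.7  # from the text: each spaceship reaches its usual planet with p = .7


def second_stage_state(spaceship, rng):
    """Return (planet, transition_type) for a first-stage choice.

    Spaceship 0 usually travels to planet 0 and spaceship 1 to planet 1.
    """
    common = rng.random() < COMMON_P
    planet = spaceship if common else 1 - spaceship
    return planet, ("common" if common else "rare")


def drift(p, rng, sd=0.025, lo=0.25, hi=0.75):
    """One Gaussian random-walk step with reflecting bounds.

    sd, lo and hi are illustrative assumptions; the text does not state them.
    """
    p = p + rng.gauss(0.0, sd)
    if p > hi:
        p = 2 * hi - p
    if p < lo:
        p = 2 * lo - p
    return min(max(p, lo), hi)


def run_trials(n_trials, seed=0):
    """Simulate n_trials with a random (placeholder) choice policy."""
    rng = random.Random(seed)
    # reward_prob[planet][alien]; starting values are assumptions
    reward_prob = [[0.5, 0.5], [0.5, 0.5]]
    history = []
    for _ in range(n_trials):
        spaceship = rng.randrange(2)
        planet, kind = second_stage_state(spaceship, rng)
        alien = rng.randrange(2)
        reward = rng.random() < reward_prob[planet][alien]
        history.append((spaceship, kind, planet, alien, reward))
        # every alien's reward probability diffuses independently
        reward_prob = [[drift(p, rng) for p in row] for row in reward_prob]
    return history, reward_prob
```

Calling `run_trials(200)` yields one simulated session of the same length as the experiment. Replacing the random policy with a learning agent is what allows the two strategies discussed below to be compared.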
Prior to the experiment, participants completed an extensive instructional tutorial. The tutorial included a 20-trial practice run, using a different set of visual stimuli (planets, spaceships, and aliens) but otherwise identical.
Spatial Memory Task
Each participant completed 64 trials of a spatial memory task, identical (with one exception, see below) to the task used by Doeller et al. (2008). On each trial, participants navigated a virtual reality arena using keyboard presses. UnrealEngine2 Runtime software (Epic Games) was used to present a first-person perspective view of the arena. The virtual arena (Figure 2 Left) was bounded by a circular wall, contained a single intra-maze landmark in the form of a traffic cone, and was surrounded by distant cues (mountains, clouds, and the sun) projected at infinity. Both the boundary (wall) and landmark (cone) were rotationally symmetric, leaving the distal cues as the main source of orientation.
At the beginning of each trial, a picture of one of four objects was presented on a grey background for 2 s. Participants were then placed in a random position within the arena without any objects, one-fifth of the radius from the center of the arena and facing a random direction (note that in Doeller et al. (2008) the starting radius was not restricted). Participants subsequently had 12 seconds to navigate to the correct location of the object as they remembered it from previous trials, and indicate that position by a button press. Following this button press, the object immediately appeared in its correct location. If no response had been made in 12 seconds, the object also appeared in its correct location automatically. Participants ended the trial by collecting the object in its correct location. A fixation cross was then presented for 2 s, before the start of the next trial.
The task consisted of 64 trials divided into 4 continuous blocks, each containing 4 pseudo-randomized presentations of each of the 4 objects. Between blocks, the landmark moved in relation to the boundaries, such that there were four arena configurations, with the landmark roughly in the middle of the north, south, west, and east sectors of the arena, as defined by the distal cues (Figure 2 Middle). The order of arena configurations over blocks was counterbalanced across participants and experimental groups. Participants were not informed of the landmark movements prior to the experiment.
During the first block, the correct location of all objects was in rough proximity of the landmark. Two of the objects were ‘boundary objects’, for which the correct locations were fixed relative to the environmental boundaries across the whole experiment. The other two objects (unannounced to the participants) were ‘landmark objects,’ for which correct locations were fixed at a constant distance and direction to the intra-maze landmark even as the landmark moved.
The task probed for memory of correct object locations within the arena. Critically, by manipulating the landmark location in relation to the boundary and distal cues, the task distinguished whether participants relied on place memory for allocentric location in relation to the boundary and distal cues, or on egocentric response memory in relation to individual landmarks (Figure 4 Right). The original study, using the same procedure during fMRI in healthy participants, showed that boundary-related and landmark-related memory correlated with activity in the right posterior hippocampus and striatum, respectively (Doeller et al., 2008).
Participants practiced in an unrelated virtual environment with a different set of object stimuli before performing the experiment. Additionally, before the first trial, participants collected each of the objects once in their correct block 1 locations.
QUANTIFICATION AND STATISTICAL ANALYSIS
MRI Image Processing
The high-resolution structural images from each patient were normalized to Montreal Neurological Institute (MNI) standard space using FSL FLIRT (FMRIB’s Linear Image Registration Tool; http://fsl.fmrib.ox.ac.uk/fsl) (Jenkinson and Smith, 2001). This consisted of a two-step procedure: First, using MRIcron (http://www.mccauslandcenter.sc.edu/mricro/mricron/), a mask was drawn over the lesion and any craniotomy defect to prevent bias in the transformation, then masked voxels were assigned a weight of “0” and ignored during a subsequent 12-parameter affine transformation of the lesioned brain to the standard MNI 1 mm reference volume (Mackey et al., 2016). The second step was manually tracing the lesions on individual slices of the patients’ brains overlaid on the standard MNI brain template, while crosschecking in all three planes. This tracing procedure produced a 3D mask with “1” indicating the presence of the lesion and “0” the presence of normal tissue. All patients had surgical lesions, which made the margins readily visible on the T1-weighted MRI images. In instances where there was uncertainty regarding the lesion margins, the treating neurosurgeon(s) and/or neuro-radiologists were consulted.
The lesion masks drawn in MNI space were subsequently overlaid on the Harvard-Oxford Structural Atlas (Mazziotta et al., 2001) to estimate the extent of damage to the hippocampus (Figure 1). Hippocampal lesion size was calculated as the voxel overlap between the individual lesion masks and the hippocampus as defined by the Atlas with p>.5.
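The overlap computation can be sketched as follows. This is an illustrative example only, assuming that the lesion mask and the atlas probability map have been flattened into equal-length lists in the same voxel order, and that atlas probabilities are expressed as proportions between 0 and 1. The function name is hypothetical.

```python
def hippocampal_lesion_size(lesion_mask, atlas_prob, threshold=0.5):
    """Count voxels that are lesioned (mask == 1) and whose atlas
    probability of belonging to the hippocampus exceeds the threshold."""
    if len(lesion_mask) != len(atlas_prob):
        raise ValueError("mask and atlas must cover the same voxels")
    return sum(
        1 for m, p in zip(lesion_mask, atlas_prob) if m == 1 and p > threshold
    )


# Toy example with five voxels: only voxels 0 and 3 are both lesioned
# and above the p > .5 hippocampal threshold.
toy_mask = [1, 1, 0, 1, 0]
toy_atlas = [0.9, 0.4, 0.8, 0.6, 0.1]
toy_size = hippocampal_lesion_size(toy_mask, toy_atlas)  # 2 voxels
```

With 1 mm isotropic voxels in MNI space, the voxel count is numerically equal to the overlap volume in cubic millimeters.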
Sequential-Decision Making Task – Regression Analysis
The logic of the task exploits the noisy coupling between spaceships and planets to distinguish model-free learning (directly learning the value of spaceship choices) from model-based planning (prospectively computing the value of the spaceship choices in terms of the planets they lead to).
For instance, consider on some trial choosing the spaceship that usually transitions to the purple planet, but instead being taken to the red planet (a “rare” transition). On the red planet your choice of alien is subsequently rewarded. In this situation model-free and model-based strategies make conflicting predictions about first-level choice behavior on the next trial. Participants using a model-free strategy will be more likely to choose the same spaceship on the next trial, as it was rewarded. Conversely, participants using a model-based strategy will be more likely to switch and choose the other first-level action. This is because the model-based strategy computes the value of the spaceships using a cognitive map or model of their transition probabilities to the respective planets and the reward expected at the planets.
The goal of analysis was to estimate, for each participant, the extent to which they followed either strategy. Following previous work (Daw et al., 2011), we did this in two ways: using a factorial logistic regression that captures the above qualitative logic, and using fits of a more elaborate, but more assumption-laden, computational learning model.
We analyzed the first-level choices over spaceships using mixed-effects logistic regression (estimated using the fitglme function in Matlab). For each trial, the dependent variable (coded as stay with the same spaceship or switch, relative to the previous trial) was explained in terms of two events from the previous trial: whether reward was received, whether the planet encountered was reached following a common or rare transition given the spaceship chosen, and the interaction of these two factors. Our measure of model-free choice was the main effect of reward; our measure of model-based choice was the interaction of reward by transition type (common vs. rare). We further interacted the task factors with experimental group (lesion vs. control) as well as with two nuisance covariates, IQ and age, which have both been shown to affect behavior on this task (Gillan et al., 2016). The intercept, and the regression coefficients for reward, transition, and their interaction were all taken as random effects (allowed to vary across participants).
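The qualitative logic behind the two regression terms can be illustrated with raw stay probabilities. The sketch below is a descriptive simplification, not the mixed-effects logistic regression reported here: it tabulates how often the first-level choice is repeated after each combination of reward and transition type, then forms a reward main-effect contrast (model-free signature) and a reward-by-transition interaction contrast (model-based signature). The trial dictionary format and function names are assumptions.

```python
def stay_table(trials):
    """Proportion of repeated first-level choices, keyed by the previous
    trial's (reward, transition) combination."""
    counts = {}
    for prev, cur in zip(trials, trials[1:]):
        key = (prev["reward"], prev["transition"])
        stays, total = counts.get(key, (0, 0))
        stayed = 1 if cur["choice"] == prev["choice"] else 0
        counts[key] = (stays + stayed, total + 1)
    return {key: stays / total for key, (stays, total) in counts.items()}


def strategy_indices(p_stay):
    """Return (model_free, model_based) contrasts from a full 2x2 stay table."""
    rc, rr = p_stay[(True, "common")], p_stay[(True, "rare")]
    uc, ur = p_stay[(False, "common")], p_stay[(False, "rare")]
    model_free = ((rc + rr) - (uc + ur)) / 2  # main effect of reward
    model_based = (rc - rr) - (uc - ur)       # reward x transition interaction
    return model_free, model_based


# Idealized model-based pattern: stay after rewarded-common and
# unrewarded-rare trials, switch otherwise.
ideal = {
    (True, "common"): 0.9, (True, "rare"): 0.5,
    (False, "common"): 0.5, (False, "rare"): 0.9,
}
mf_index, mb_index = strategy_indices(ideal)
```

In this idealized table the reward main effect is zero while the interaction is large, which is the pattern expected from a purely model-based chooser.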
To test our predictions about the relationship between reinforcement learning strategies employed in the decision-making task and place memory performance from the spatial memory task, we also specified a second regression model which interacted the task- and group-related factors (reward, common vs. rare, group, and their interactions) with participant-specific average boundary distance error (dB). IQ was also included as a nuisance variable. The interactions with dB (up to four-way) measure the extent to which the various effects in the decision task systematically vary, across participants, with their spatial memory performance. This is analogous to extracting per-participant effect sizes from the decision model and correlating them with dB but, by estimating that correlation as an effect within the regression defining those decision effects, it accounts properly for uncertainty in the per-participant estimates. We also calculated a ratio dB/(dL+dB), where dL and dB were participant-wise means of landmark and boundary distance error, respectively. This ratio was also used in a separate model, interacted with task- and group-related factors.
Sequential-Decision Making Task – Computational Model Fit
The logistic regression analysis considers only the previous trial’s experience in predicting each choice; this simplification is motivated by a limiting argument over the learning rate parameter in a more elaborate RL model of the data (Daw et al., 2011). In order to ensure that our results were not affected by neglecting the effect of earlier trials, we repeated our analyses fitting each participant’s trial-by-trial choices with a full RL model in which each choice depends on values learned from all previous rewards (based on Daw et al., 2011, but using the version from Gillan et al. 2016). To estimate the model we utilized Markov Chain Monte Carlo (MCMC) methods, implemented in the Stan modeling language (Stan Development Team). Given an arbitrary generative model for data dependent on free parameters, the method permits samples to be drawn from the posterior probability distribution of parameter values, conditional on the observed data. From the quantiles of these distributions, we constructed confidence intervals – technically, credible intervals – over the likely values of the free parameters (Kruschke, 2010). We also report the posterior likelihood that the credible region contains zero, as one minus the size of the largest symmetric credible interval that excludes zero, which is roughly comparable to a two-sided P value.
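The reported posterior quantity can be computed directly from MCMC draws. The sketch below assumes that "symmetric" refers to an equal-tailed interval; under that assumption, one minus the mass of the largest interval excluding zero equals twice the smaller of the two posterior tail masses on either side of zero. The function name is hypothetical.

```python
def two_sided_posterior_p(samples):
    """Twice the smaller posterior tail mass on either side of zero.

    Equivalent to one minus the mass of the largest equal-tailed
    credible interval that still excludes zero.
    """
    n = len(samples)
    below = sum(1 for x in samples if x < 0) / n
    above = sum(1 for x in samples if x > 0) / n
    return 2 * min(below, above)


# Toy posterior: 100 draws, 5 of which fall below zero.
toy_draws = list(range(-5, 95))
toy_p = two_sided_posterior_p(toy_draws)
```

For the toy draws, 5% of the mass lies below zero, so the largest equal-tailed interval excluding zero covers 90% and the resulting value is 0.1.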
For each model, we produced 4 chains of 10,000 samples each. The first 2500 samples from each chain were discarded to allow for equilibration. We verified the convergence of the chains by visual inspection, and additionally by computing for each parameter the ‘potential scale reduction factor’ R̂ (Gelman and Rubin, 1992). For all parameters, we verified that R̂ fell within a range consistent with convergence (Gelman, Carlin, Stern, and Rubin, 2003).
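As an illustration of the convergence check, the potential scale reduction factor can be computed from a set of chains as sketched below. This follows the basic between-chain versus within-chain variance formulation of Gelman and Rubin (1992); Stan reports a related but more refined statistic, so this sketch is not a reproduction of the values used in the analysis. Function names are assumptions.

```python
import statistics


def discard_warmup(chain, n_warmup=2500):
    """Drop the initial samples used for equilibration."""
    return chain[n_warmup:]


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for equal-length chains of one parameter."""
    n = len(chains[0])
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")
    chain_means = [statistics.mean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n
    within = statistics.mean(statistics.variance(c) for c in chains)
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Two chains exploring the same region give a value at or below 1;
# chains stuck in different regions give a value far above 1.
agree = potential_scale_reduction([[0, 1, 2, 3], [0, 1, 2, 3]])
disagree = potential_scale_reduction([[0, 1, 2, 3], [10, 11, 12, 13]])
```

Values close to 1 indicate that the chains are sampling from the same distribution, whereas values well above 1 indicate that they have not yet mixed.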
We simultaneously estimated a model of all the data, incorporating individual parameters for each participant nested within a population-level model of the distribution of these parameters for each group.
At the participant level, the model is the same as the one used by Gillan et al. (2016), and full equations are presented there. In brief, the model learns from experience to predict values Q(s,a) for the different actions a (spaceships, aliens) in the different states s (planets and the starting state). Different RL algorithms, model-based and model-free, produce different estimates of Q at each step. First-level (spaceship) choices are determined by softmax choice, according to the weighted combination of model-based and model-free Q values, with weightings controlled by the free inverse temperature parameters βMB and βMF; a third parameter βstick captures any value-independent bias to stay or switch. Second-level (alien) choices are determined by a single set of Q values (since model-based and model-free evaluation coincide for terminal actions), with inverse temperature βstage2. The various Q values are updated according to delta rules with a free learning rate parameter α. Finally, the net model-free weighting βMF is itself derived from the weighted combination of Q values learned by two variants of TD learning, TD(0) and TD(1), with weights βMF0 and βMF1. (This is a minor change of variables with respect to the standard model-free TD(λ) algorithm used to hybridize these learning rules in Daw et al., 2011. Here the second temperature parameter replaces the eligibility trace parameter λ used in that model, which has the advantage of eliminating its [0,1] boundaries.) Following estimation, we reverse the change of variables by computing the net model-free weighting βMF from βMF0 and βMF1, where the α accounts for a difference in scaling between the two parameters (see Gillan et al., 2016). When making group comparisons, group estimates of βMF0 are scaled by the estimated α of the corresponding group.
The model thus estimates six free parameters per participant: α, βMB, βMF0, βMF1, βstick and βstage2, and our main hypotheses of interest concern group-wise differences in βMB and the net βMF.
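The first-level choice rule can be sketched in a few lines. This is a simplified illustration rather than the fitted model: it uses a single model-free value per spaceship weighted by a net βMF (instead of the separate TD(0) and TD(1) components), computes model-based values in the standard way from the transition probabilities and the best second-level value on each planet, and uses a plain delta rule. The exact equations and scaling are those of Gillan et al. (2016) and may differ from this sketch; all function names are assumptions.

```python
import math


def model_based_values(transition_probs, q_stage2):
    """Q_MB(a) = sum over planets of P(planet | a) * best alien value there."""
    return [
        sum(p * max(q_stage2[planet]) for planet, p in enumerate(row))
        for row in transition_probs
    ]


def first_stage_choice_probs(q_mb, q_mf, beta_mb, beta_mf, beta_stick,
                             prev_choice=None):
    """Softmax over the weighted combination of model-based and model-free
    values, plus a value-independent bonus for repeating the last choice."""
    logits = []
    for action in range(len(q_mb)):
        stay = 1.0 if action == prev_choice else 0.0
        logits.append(
            beta_mb * q_mb[action] + beta_mf * q_mf[action] + beta_stick * stay
        )
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def delta_update(q, target, alpha):
    """Move a value estimate toward a target by a fraction alpha."""
    return q + alpha * (target - q)


transitions = [[0.7, 0.3], [0.3, 0.7]]   # P(planet | spaceship), from the text
stage2_values = [[0.2, 0.8], [0.4, 0.1]]  # illustrative alien values
q_mb = model_based_values(transitions, stage2_values)
probs = first_stage_choice_probs(q_mb, [0.0, 0.0], beta_mb=5.0,
                                 beta_mf=0.0, beta_stick=0.0)
```

With βMF and βstick set to zero, the choice probabilities depend only on the planned values, so the spaceship leading more often to the planet with the better alien is preferred.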
Group-level Modeling and Estimation for Computational Model Fit
The model was specified hierarchically, so that the participant-specific parameter estimates were assumed to be drawn from a population-level distribution, separately for the patient and control groups. In particular, the parameters θs (a six-vector) for each participant s were modeled as drawn from a multivariate normal with mean μ and covariance Σ. An additional vector δ coded any difference in means for the lesion group (i.e., their mean was μ + δ, allowing us to test for group differences in each parameter by comparing the corresponding element of δ to zero). For the parameter α (which is constrained to [0,1]), the corresponding element of θs (which has infinite support) was transformed through the CDF of the standard normal.
We jointly estimated the posterior distribution over the individual and group-level parameters using MCMC as described above, which required specifying prior distributions (“hyper-priors”) on the parameters of the group-level distributions. In particular, priors for the elements of μ and δ were individually normal (mean=0, SD=2), which is uninformative within the relevant range. The covariance Σ was specified (as recommended in the Stan documentation) as the product of a correlation matrix Ω (which had an LKJ prior with shape ν=2; Lewandowski et al., 2009) scaled element-wise by the outer product of a scale vector (whose elements were again taken as normal, mean=0, SD=2) with itself. This model also included individual IQ scores and age as covariates.
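Two pieces of this specification are simple enough to write out explicitly: the construction of the covariance from a correlation matrix and a scale vector, and the transformation that maps an unconstrained value onto the unit interval for the learning rate. The sketch below is illustrative only; the variable and function names are assumptions and the numbers are toy values, not estimates from the study.

```python
import math


def covariance_from_correlation(corr, scales):
    """Scale a correlation matrix element-wise by the outer product of the
    scale vector with itself: Sigma[i][j] = scales[i] * scales[j] * corr[i][j]."""
    size = len(scales)
    return [
        [scales[i] * scales[j] * corr[i][j] for j in range(size)]
        for i in range(size)
    ]


def standard_normal_cdf(x):
    """CDF of the standard normal, mapping the real line onto (0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def group_mean(population_mean, group_difference, is_patient):
    """Mean vector for a participant's group: the group-difference vector
    is added only for the lesion group."""
    if not is_patient:
        return list(population_mean)
    return [m + d for m, d in zip(population_mean, group_difference)]


toy_corr = [[1.0, 0.5], [0.5, 1.0]]
toy_sigma = covariance_from_correlation(toy_corr, [2.0, 3.0])
toy_alpha = standard_normal_cdf(0.0)
```

In the toy example the diagonal of the covariance holds the squared scales (4 and 9), the off-diagonal holds the scaled correlation (3), and an unconstrained value of zero maps to a learning rate of 0.5.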
In order to test the interaction between performances in the two tasks, we then expressed a new model with group-specific parameter estimates (priors were normal distributions with mean = 0 and sd=1) that specified how individual z-scored participant-wise estimates of mean boundary distance error (dB) predicted the participant-specific parameter estimates. This model also included individual IQ scores as a covariate.
Spatial Memory Task Analysis
To measure memory of locations in relation to boundary and landmark cues, we focused our main analysis on the trials following the relative movement of the landmark in relation to the boundaries. Reliance on boundary cues was quantified by boundary distance error (dB), where dB was the distance from the response location to the correct location as defined by the boundary and distal cues in the previous block (Figure 4 Right). Reliance on landmark cues was quantified by landmark distance error (dL), where dL was the distance from the response location to the correct location as defined by the landmark, according to the position of the object relative to the landmark in the previous block, translated with respect to the landmark’s new position (Figure 4 Right). Low dB thus indicated greater reliance on boundary cues, which we interpret as ‘place memory’, and low dL indicated greater reliance on landmark cues, which we interpret as ‘response memory’.
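The two distance errors can be computed from arena coordinates as sketched below. This is a minimal illustration of the definitions above, assuming two-dimensional positions expressed in a common arena coordinate frame; the function and argument names are hypothetical.

```python
import math


def distance_errors(response, prev_object, prev_landmark, new_landmark):
    """Return (dB, dL) for one response after the landmark has moved.

    dB: distance to the object's previous location in arena coordinates,
        i.e. the location predicted by boundary and distal cues.
    dL: distance to the previous location translated along with the
        landmark, i.e. the location predicted by the landmark.
    """
    d_boundary = math.hypot(response[0] - prev_object[0],
                            response[1] - prev_object[1])
    # keep the object's offset from the landmark, applied at the new position
    landmark_target = (
        new_landmark[0] + (prev_object[0] - prev_landmark[0]),
        new_landmark[1] + (prev_object[1] - prev_landmark[1]),
    )
    d_landmark = math.hypot(response[0] - landmark_target[0],
                            response[1] - landmark_target[1])
    return d_boundary, d_landmark


# A response placed exactly where the landmark predicts the object to be.
d_b, d_l = distance_errors(response=(3.0, 4.0), prev_object=(0.0, 0.0),
                           prev_landmark=(1.0, 0.0), new_landmark=(4.0, 4.0))
```

In this example the response follows the landmark perfectly, so dL is zero while dB equals the full displacement of 5 units.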
To capture the repeated-measure structure of the data, all statistical analyses of performance in the task were done using mixed-effects linear regression, treating participant as a random factor. The models were estimated using the fitlme function in Matlab, with standard errors computed using the Satterthwaite approximation to the degrees of freedom when the model was linear, and Wald (asymptotic Gaussian) test for logistic models. The dependent variable, distance error (dB and dL respectively for each trial) was regressed on the key explanatory variables lesion group, distance error-type (dB or dL) and object type (boundary or landmark), while also controlling for additional nuisance explanatory factors, age and IQ.
Supplementary Material
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Software and Algorithms | | |
MATLAB | Mathworks | https://www.mathworks.com |
Psychophysics Toolbox | Kleiner et al., 2007 | http://psychtoolbox.org/ |
UnrealEngine2 | Epic Games | http://api.unrealengine.com/udk/Two/WebHome.html |
Stan Modeling Language | Stan Development Team | https://mc-stan.org/ |
FSL FLIRT | Jenkinson and Smith, 2001 | http://fsl.fmrib.ox.ac.uk/fsl |
MRIcron | NITRC | http://www.mccauslandcenter.sc.edu/mricro/mricron/ |
Highlights
- We tested planning and spatial memory in people with hippocampal damage and controls
- Patients relied less on both model-based planning and allocentric spatial memory
- The planning impairment was related to the amount of damage to right hippocampus
- Planning and place memory covaried in controls, but were less related in patients
Acknowledgements
We wish to thank Philip Parnamets and Aaron Bornstein for helpful input and Catherine Hartley for assistance with experimental stimuli. This project was supported by NIH grant DA038891, part of the CRCNS program, and a gift from Google DeepMind.
Footnotes
Author Contributions
Conceptualization: O.M.V. and N.D.D., Methodology: O.M.V., N.D.D. and N.B., Software: O.M.V., J.K. and N.D.D., Formal Analysis: O.M.V., Investigation: O.M.V., M.R.M. and K.B., Resources: O.D., M.R.M. and K.B., Data Curation: M.R.M. and K.B., Writing – Original Draft: O.M.V. and N.D.D., Writing – Review & Editing: O.M.V., N.D.D., D.S., N.B., O.D. and M.R.M., Visualizations: O.M.V., Supervision: N.D.D., N.B. and D.S., Funding Acquisition: N.D.D. and D.S.
Declaration of Interests
No conflicts of interest to declare.
References:
- Adams CD (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. The Quarterly Journal of Experimental Psychology, 34(2), 77–98.
- Adams CD, and Dickinson A (1981). Instrumental responding following reinforcer devaluation. The Quarterly Journal of Experimental Psychology, 33(2), 109–121.
- Addis DR, Cheng T, Roberts RP, and Schacter DL (2011). Hippocampal contributions to the episodic simulation of specific and general future events. Hippocampus, 21(10), 1045–1052.
- Akam T, Costa R, and Dayan P (2015). Simple plans or sophisticated habits? State, transition and learning interactions in the two-step task. PLoS Computational Biology, 11(12), e1004648.
- Balleine BW, and Dickinson A (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology, 37(4), 407–419.
- Bayer HM, and Glimcher PW (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1), 129–141.
- Boorman ED, Rajendran VG, O’Reilly JX, and Behrens TE (2016). Two anatomically and computationally distinct learning signals predict changes to stimulus-outcome associations in hippocampus. Neuron, 89(6), 1343–1354.
- Brown TI, Carr VA, LaRocque KF, Favila SE, Gordon AM, Bowles B, … and Wagner AD (2016). Prospective representation of navigational goals in the human hippocampus. Science, 352(6291), 1323–1326.
- Büchel C, Dolan RJ, Armony JL, and Friston KJ (1999). Amygdala–hippocampal involvement in human aversive trace conditioning revealed through event-related functional magnetic resonance imaging. Journal of Neuroscience, 19(24), 10869–10876.
- Burgess N, Maguire EA, and O’Keefe J (2002). The human hippocampus and spatial and episodic memory. Neuron, 35(4), 625–641.
- Chadwick MJ, Jolly AE, Amos DP, Hassabis D, and Spiers HJ (2015). A goal direction signal in the human entorhinal/subicular region. Current Biology, 25(1), 87–92.
- Corbit LH, and Balleine BW (2000). The role of the hippocampus in instrumental conditioning. Journal of Neuroscience, 20(11), 4233–4239.
- Corbit LH, Ostlund SB, and Balleine BW (2002). Sensitivity to instrumental contingency degradation is mediated by the entorhinal cortex and its efferents via the dorsal hippocampus. Journal of Neuroscience, 22(24), 10976–10984.
- Davachi L, and Wagner AD (2002). Hippocampal contributions to episodic encoding: insights from relational and item-based learning. Journal of Neurophysiology, 88(2), 982–990.
- Daw ND, Niv Y, and Dayan P (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704.
- Daw ND, Gershman SJ, Seymour B, Dayan P, and Dolan RJ (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron, 69(6), 1204–1215.
- Decker JH, Otto AR, Daw ND, and Hartley CA (2016). From creatures of habit to goal-directed learners: Tracking the developmental emergence of model-based reinforcement learning. Psychological Science, 27(6), 848–858.
- Dickinson A, and Balleine B (1993). Actions and responses: The dual psychology of behaviour.
- Doeller CF, King JA, and Burgess N (2008). Parallel striatal and hippocampal systems for landmarks and boundaries in spatial memory. Proceedings of the National Academy of Sciences, 105(15), 5915–5920.
- Doll BB, Duncan KD, Simon DA, Shohamy D, and Daw ND (2015). Model-based choices involve prospective neural activity. Nature Neuroscience, 18(5), 767–772.
- Dusek JA, and Eichenbaum H (1997). The hippocampus and memory for orderly stimulus relations. Proceedings of the National Academy of Sciences, 94(13), 7109–7114.
- Economides M, Kurth-Nelson Z, Lübbert A, Guitart-Masip M, and Dolan RJ (2015). Model-based reasoning in humans becomes automatic with training. PLoS Computational Biology, 11(9), e1004463.
- Eichenbaum H, and Cohen NJ (2004). From conditioning to conscious recollection: Memory systems of the brain (No. 35). Oxford University Press on Demand.
- Eilan N, McCarthy RA, and Brewer B (1993). Spatial representation: Problems in philosophy and psychology.
- Foerde K, and Shohamy D (2011). Feedback timing modulates brain systems for learning in humans. Journal of Neuroscience, 31(37), 13157–13167.
- Frank MJ, Seeberger LC, and O'Reilly RC (2004). By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science, 306(5703), 1940–1943.
- Garvert MM, Dolan RJ, and Behrens TE (2017). A map of abstract relational knowledge in the human hippocampal–entorhinal cortex. Elife, 6.
- Gelman A, Carlin JB, Stern HS, and Rubin DB (2003). Bayesian Data Analysis (Chapman and Hall/CRC Texts in Statistical Science). Retrieved from http://www.citeulike.org/group/302/article/105949
- Gelman A, and Rubin DB (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 457–472.
- Gershman SJ, Blei DM, and Niv Y (2010). Context, learning, and extinction. Psychological Review, 117(1), 197.
- Gillan CM, Kosinski M, Whelan R, Phelps EA, and Daw ND (2016). Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. Elife, 5, e11305.
- Gläscher J, Daw N, Dayan P, and O’Doherty JP (2010). States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4), 585–595.
- Hartley T, Maguire EA, Spiers HJ, and Burgess N (2003). The well-worn route and the path less traveled: distinct neural bases of route following and wayfinding in humans. Neuron, 37(5), 877–888.
- Hassabis D, Kumaran D, Vann SD, and Maguire EA (2007). Patients with hippocampal amnesia cannot imagine new experiences. Proceedings of the National Academy of Sciences, 104(5), 1726–1731.
- Heckers S, Zalesak M, Weiss AP, Ditman T, and Titone D (2004). Hippocampal activation during transitive inference in humans. Hippocampus, 14(2), 153–162.
- Hirsh R (1974). The hippocampus and contextual retrieval of information from memory: A theory. Behavioral Biology, 12(4), 421–444.
- Iaria G, Petrides M, Dagher A, Pike B, and Bohbot VD (2003). Cognitive strategies dependent on the hippocampus and caudate nucleus in human navigation: variability and change with practice. Journal of Neuroscience, 23(13), 5945–5952.
- Jenkinson M, and Smith S (2001). A global optimisation method for robust affine registration of brain images. Medical Image Analysis, 5(2), 143–156.
- Johnson A, and Redish AD (2007). Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. Journal of Neuroscience, 27(45), 12176–12189.
- Kaplan R, King J, Koster R, Penny WD, Burgess N, and Friston KJ (2017). The neural representation of prospective choice during spatial planning and decisions. PLoS Biology, 15(1), e1002588.
- Kleiner M, Brainard D, Pelli D, Ingling A, Murray R, and Broussard C (2007). What’s new in Psychtoolbox-3. Perception, 36(14), 1.
- Knowlton BJ, Mangels JA, and Squire LR (1996). A neostriatal habit learning system in humans. Science, 273(5280), 1399–1402.
- Kruschke JK (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science, 1(5), 658–676.
- Kumaran D, Summerfield JJ, Hassabis D, and Maguire EA (2009). Tracking the emergence of conceptual knowledge during human decision making. Neuron, 63(6), 889–901.
- Kumaran D, Hassabis D, and McClelland JL (2016). What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7), 512–534.
- Mackey WE, Devinsky O, Doyle WK, Meager MR, and Curtis CE (2016). Human dorsolateral prefrontal cortex is not necessary for spatial working memory. Journal of Neuroscience, 36(10), 2847–2856.
- Mattar MG, and Daw ND (2018). Prioritized memory access explains planning and hippocampal replay. BioRxiv, 225664. doi:10.1101/225664
- Mazziotta J, Toga A, Evans A, Fox P, Lancaster J, Zilles K, … others (2001). A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philosophical Transactions of the Royal Society of London B: Biological Sciences, 356(1412), 1293–1322.
- McDonald RJ, and White NM (1994). Parallel information processing in the water maze: evidence for independent memory systems involving dorsal striatum and hippocampus. Behavioral and Neural Biology, 61(3), 260–270.
- Miller KJ, Botvinick MM, and Brody CD (2017). Dorsal hippocampus plays a causal role in model-based planning. BioRxiv, 096594.
- Morris RGM, Garrud P, Rawlins JA, and O'Keefe J (1982). Place navigation impaired in rats with hippocampal lesions. Nature, 297(5868), 681.
- O'Keefe J, and Nadel L (1978). The hippocampus as a cognitive map. Oxford: Clarendon Press.
- Packard MG, and McGaugh JL (1996). Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiology of Learning and Memory, 65(1), 65–72.
- Pearce JM, Roberts AD, and Good M (1998). Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors. Nature, 396(6706), 75.
- Pfeiffer BE, and Foster DJ (2013). Hippocampal place cell sequences depict future paths to remembered goals. Nature, 497(7447), 74.
- Poldrack RA, and Packard MG (2003). Competition among multiple memory systems: converging evidence from animal and human brain studies. Neuropsychologia, 41(3), 245–251.
- Schapiro AC, Turk-Browne NB, Norman KA, and Botvinick MM (2016). Statistical learning of temporal community structure in the hippocampus. Hippocampus, 26(1), 3–8.
- Schultz W, Dayan P, and Montague PR (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
- Scoville WB, and Milner B (1957). Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery, and Psychiatry, 20(1), 11.
- Shohamy D, and Wagner AD (2008). Integrating memories in the human brain: hippocampal-midbrain encoding of overlapping events. Neuron, 60(2), 378–389.
- Shohamy D, and Daw ND (2015). Integrating memories to guide decisions. Current Opinion in Behavioral Sciences, 5, 85–90.
- Simon DA, and Daw ND (2011). Neural correlates of forward planning in a spatial decision task in humans. Journal of Neuroscience, 31(14), 5526–5539.
- Solomon PR, Vander Schaaf ER, Thompson RF, and Weisz DJ (1986). Hippocampus and trace conditioning of the rabbit’s classically conditioned nictitating membrane response. Behavioral Neuroscience, 100(5), 729.
- Spiers HJ, Burgess N, Maguire EA, Baxendale SA, Hartley T, Thompson PJ, and O’Keefe J (2001). Unilateral temporal lobectomy patients show lateralized topographical and episodic memory deficits in a virtual town. Brain, 124(12), 2476–2489.
- Spiers HJ, and Maguire EA (2007). A navigational guidance system in the human brain. Hippocampus, 17(8), 618–626.
- Squire LR (1992). Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans. Psychological Review, 99(2), 195.
- Stachenfeld KL, Botvinick M, and Gershman SJ (2014). Design principles of the hippocampal cognitive map. In Advances in Neural Information Processing Systems (pp. 2528–2536).
- Tolman EC (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189.
- van Praag H, Qu PM, Elliott RC, Wu H, Dreyfus CF, and Black IB (1998). Unilateral hippocampal lesions in newborn and adult rats: effects on spatial memory and BDNF gene expression. Behavioural Brain Research, 92(1), 21–30.
- Viard A, Doeller CF, Hartley T, Bird CM, and Burgess N (2011). Anterior hippocampus and goal-directed spatial decision making. Journal of Neuroscience, 31(12), 4613–4621.
- Wimmer GE, and Shohamy D (2012). Preference by association: how memory mechanisms in the hippocampus bias decisions. Science, 338(6104), 270–273.
- Voermans NC, Petersson KM, Daudey L, Weber B, Van Spaendonck KP, Kremer HP, and Fernández G (2004). Interaction between the human hippocampus and the caudate nucleus during route recognition. Neuron, 43(3), 427–435.
- Wikenheiser AM, and Redish AD (2015). Hippocampal theta sequences reflect current goals. Nature Neuroscience, 18(2), 289.