PLOS Computational Biology. 2020 Sep 9;16(9):e1008149. doi: 10.1371/journal.pcbi.1008149

Similarities and differences in spatial and non-spatial cognitive maps

Charley M Wu 1,2,*, Eric Schulz 3, Mona M Garvert 4,5,6, Björn Meder 2,7,8, Nicolas W Schuck 5,9
Editor: Daniele Marinazzo
PMCID: PMC7480875  PMID: 32903264

Abstract

Learning and generalization in spatial domains are often thought to rely on a “cognitive map”, representing relationships between spatial locations. Recent research suggests that this same neural machinery is also recruited for reasoning about more abstract, conceptual forms of knowledge. Yet, to what extent do spatial and conceptual reasoning share common computational principles, and what are the implications for behavior? Using a within-subject design, we studied how participants used spatial or conceptual distances to generalize and search for correlated rewards in successive multi-armed bandit tasks. Participant behavior indicated sensitivity to both spatial and conceptual distance, and was best captured using a Bayesian model of generalization that formalized distance-dependent generalization and uncertainty-guided exploration as a Gaussian Process regression with a radial basis function kernel. The same Gaussian Process model best captured human search decisions and judgments in both domains, and could simulate realistic learning curves, where we found equivalent levels of generalization in spatial and conceptual tasks. At the same time, we also found characteristic differences between domains. Relative to the spatial domain, participants showed reduced levels of uncertainty-directed exploration and increased levels of random exploration in the conceptual domain. Participants also displayed a one-directional transfer effect, where experience in the spatial task boosted performance in the conceptual task, but not vice versa. While confidence judgments indicated that participants were sensitive to the uncertainty of their knowledge in both tasks, they did not or could not leverage their estimates of uncertainty to guide exploration in the conceptual task. These results support the notion that value-guided learning and generalization recruit cognitive-map dependent computational mechanisms in spatial and conceptual domains. Yet both behavioral and model-based analyses suggest domain-specific differences in how these representations map onto actions.

Author summary

There is a resurgence of interest in “cognitive maps” based on recent evidence that the hippocampal-entorhinal system encodes both spatial and non-spatial relational information, with far-reaching implications for human behavior. Yet little is known about the commonalities and differences in the computational principles underlying human learning and decision making in spatial and non-spatial domains. We use a within-subject design to examine how humans search for either spatially or conceptually correlated rewards. Using a Bayesian learning model, we find evidence for the same computational mechanisms of generalization across domains. While participants were sensitive to expected rewards and uncertainty in both tasks, how they leveraged this knowledge to guide exploration was different: participants displayed less uncertainty-directed and more random exploration in the conceptual domain. Moreover, experience with the spatial task improved conceptual performance, but not vice versa. These results provide important insights about the degree of overlap between spatial and conceptual cognition.

Introduction

Thinking spatially is intuitive. We remember things in terms of places [1–3], describe the world using spatial metaphors [4, 5], and commonly use concepts like “space” or “distance” in mathematical descriptions of abstract phenomena. In line with these observations, previous theories have argued that reasoning about abstract conceptual information follows the same computational principles as spatial reasoning [6–8]. This has recently gained new support from neuroscientific evidence suggesting that common neural substrates are the basis for knowledge representation across domains [9–13].

One important implication of these accounts is that reinforcement learning [14] in non-spatial domains may rely on a map-like organization of information, supported by the computation of distances or similarities between experiences. These representations of distance facilitate generalization, allowing for predictions about novel stimuli based on their similarity to previous experiences. Here, we ask: to what extent does the search for rewards depend on the same distance-dependent generalization across two different domains—one defined by spatial location and another by abstract features of a Gabor patch—despite potential differences in how the stimuli and their similarities may be processed?

We formalize a computational model that incorporates distance-dependent generalization and test it in a within-subject experiment, where either spatial features or abstract conceptual features are predictive of rewards. This allows us to study the extent to which the same organizational structure of cognitive representations is used in both domains, based on examining the downstream behavioral implications for learning, decision making, and exploration.

Whereas early psychological theories described reinforcement learning as merely developing an association between stimuli, responses, and rewards [15–17], more recent studies have recognized that the structure of representations plays an important role in making value-based decisions [11, 18] and is particularly important for knowing how to generalize from limited data to novel situations [19, 20]. This idea dates back to Tolman, who famously argued that both rats and humans extract a “cognitive map” of the environment [21]. This cognitive map encodes relationships between experiences or options, such as the distances between locations in space [22], and—crucially—facilitates flexible planning and generalization. While cognitive maps were first identified as representations of physical spaces, Tolman hypothesized that similar principles may underlie the organization of knowledge in broader and more complex cognitive domains [21].

As was the case with Tolman, neuroscientific evidence for a cognitive map was initially found in the spatial domain, in particular, with the discovery of spatially selective place cells in the hippocampus [23, 24] and entorhinal grid cells that fire along a spatial hexagonal lattice [25]. Together with a variety of other specialized cell types that encode spatial orientation [26, 27], boundaries [28, 29], and distances to objects [30], this hippocampal-entorhinal machinery is often considered to provide a cognitive map facilitating navigation and self-location. Yet more recent evidence has shown that the same neural mechanisms are also active when reasoning about more abstract, conceptual relationships [31–36], characterized by arbitrary feature dimensions [37] or temporal relationships [38, 39]. For example, using a technique developed to detect spatial hexagonal grid-like codes in fMRI signals [40], Constantinescu et al. found that human participants displayed a pattern of activity in the entorhinal cortex consistent with mental travel through a 2D coordinate system defined by the length of a bird’s legs and neck [9]. Similarly, the same entorhinal-hippocampal system has also been found to reflect the graph structure underlying sequences of stimuli [10] or the structure of social networks [41], and even to replay non-spatial representations in the sequential order that characterized a previous decision-making task [42]. At the same time, much evidence indicates that cognitive map-related representations are not limited to medial temporal areas, but also include ventral and orbital medial prefrontal areas [9, 11, 40, 43–45]. Relatedly, a study by Kahnt and Tobler [46] using uni-dimensional variations of Gabor stimuli showed that the generalization of rewards was modulated by dopaminergic activity in the hippocampus, indicating a role of non-spatial distance representations in reinforcement learning.

Based on these findings, we asked whether learning and searching for rewards in spatial and conceptual domains are governed by similar computational principles. Using a within-subject design comparing spatial and non-spatial reward learning, we tested whether participants used perceptual similarities in the same way as spatial distances to generalize from previous experiences and inform the exploration of novel options. To ensure commensurate stimuli discriminability between domains, participants completed a training phase where they were required to reach the same level of proficiency in correctly matching a series of target stimuli (see Methods; Fig 1c). In both domains, rewards were correlated (see S2 Fig), such that nearby or similar options tended to yield similar rewards. To model how participants generalize and explore using either perceptual similarities or spatial distances, we used Gaussian Process (GP) regression [47, 48] as a Bayesian model of generalization based on the principle of function learning. The Bayesian predictions of the GP model generalize about novel options using a common notion of similarity across domains, and provide estimates of expected reward and uncertainty. We tested out-of-sample predictions of the GP model against a Bayesian learner that incorporates uncertainty-guided exploration but without generalization, and investigated differences in parameters governing value-based decision making and uncertainty-directed exploration [49–51].

Fig 1. Experiment design.

Fig 1

a) In the spatial task, options were defined as a highlighted square in an 8 × 8 grid, where the arrow keys were used to move the highlighted location. b) In the conceptual task, each option was represented as a Gabor patch, where the arrow keys changed the tilt and the number of stripes (S1 Fig). Both tasks corresponded to correlated reward distributions, where choices in similar locations or having similar Gabor features predicted similar rewards (S2 Fig). c) The same design was used in both tasks. Participants first completed a training phase where they were asked to match a series of target stimuli. This used the same inputs and stimuli as the main task, where the arrow keys modified either the spatial or conceptual features, and the spacebar was used to make a selection. After reaching the learning criterion of at least 32 training trials and a run of 9 out of 10 correct, participants were shown instructions for the main task and asked to complete a comprehension check. The main task was 10 rounds long, where participants were given 20 selections in each round to maximize their cumulative reward (shown in panels a and b). The 10th round was a “bonus round” where after 15 selections participants were asked to make 10 judgments about the expected reward and associated uncertainty for unobserved stimuli from that round. After judgments were made, participants selected one of the options, observed the reward, and continued the round as usual.

Participant performance was correlated across tasks and was best captured by the GP model in both domains. We were also able to reliably predict participant judgments about unobserved options based on parameters estimated from the bandit task. Whereas the model parameters indicated similar levels of generalization in both domains, we found lower levels of directed exploration in the conceptual domain, where participants instead showed increased levels of random exploration. Moreover, we also observed an asymmetric task order effect, where performing the spatial task first boosted performance on the conceptual task but not vice versa. These findings provide a clearer picture of both the commonalities and differences in how people reason about and represent both spatial and abstract phenomena in complex reinforcement learning tasks.

Results

129 participants searched for rewards in two successive multi-armed bandit tasks (Fig 1). The spatial task was represented as an 8 × 8 grid, where participants used the arrow keys to move a highlighted square to one of the 64 locations, with each location representing one option (i.e., arm of the bandit). The conceptual task was represented using Gabor patches, where a single patch was displayed on the screen and the arrow keys changed the tilt and stripe frequency (each having 8 discrete values; see S1 Fig), providing a non-spatial domain where similarities are relatively well defined. Each of the 64 options in both tasks produced normally distributed rewards, where the means of each option were correlated, such that similar locations or Gabor patches with similar stripes and tilts yielded similar rewards (S2 Fig), thus providing traction for similarity-guided generalization and search. The strength of reward correlations was manipulated between subjects, with one half assigned to smooth environments (with higher reward correlations) and the other assigned to rough environments (with lower reward correlations). Importantly, both classes of environments had the same expectation of rewards across options.

The spatial and conceptual tasks were performed in counter-balanced order, with each task consisting of an initial training phase (see Methods) and then 10 rounds of bandits. Each round had a different reward distribution (drawn without replacement from the assigned class of environments), and participants were given 20 choices to acquire as many points as possible (later converted to monetary rewards). The search horizon was much smaller than the total number of options and therefore induced an explore-exploit dilemma and motivated the need for generalization and efficient exploration. The last round of each task was a “bonus round”, where after 15 choices, participants were shown 10 unobserved options (selected at random) and asked to make judgments about the expected reward and their level of confidence (i.e., uncertainty about the expected rewards). These judgments were used to validate the internal belief representations of our models. All data and code, including interactive notebooks containing all analyses in the paper, are publicly available at https://github.com/charleywu/cognitivemaps.

Computational models of learning, generalization, and search

Multi-armed bandit problems [52, 53] are a prominent framework for studying learning, where various reinforcement learning (RL) models [14] are used to model the learning of reward valuations and to predict behavior. A common element of most RL models is some form of prediction-error learning [54, 55], where model predictions are updated based on the difference between the predicted and experienced outcome. One classic example of learning from prediction errors is the Rescorla-Wagner [55] model, in which the expected reward V(⋅) of each bandit is described as a linear combination of weights wt and a one-hot stimulus vector xt representing the current state st:

$$V(\mathbf{x}_t) = \mathbf{w}_t^\top \mathbf{x}_t \qquad (1)$$

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \eta \delta_t \mathbf{x}_t \qquad (2)$$

Learning occurs by updating the weights w as a function of the prediction error δt = rt − V(xt), where rt is the observed reward, V(xt) is the reward expectation, and 0 < η ≤ 1 is the learning rate parameter. In our task, we used a Bayesian Mean Tracker (BMT) as a Bayesian variant of the Rescorla-Wagner model [55, 56]. Rather than making point estimates of reward, the BMT makes independent and normally distributed predictions V(si,t) ∼ N(mi,t, vi,t) for each state si,t, which are characterized by a mean m and variance v and updated on each trial t via the delta rule (see Methods for details).
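As an illustration, the following minimal Python sketch (not the authors' code, which is available in the linked repository) shows a single Rescorla-Wagner update of Eqs 1 and 2; the learning rate and reward values are hypothetical.

```python
import numpy as np

def rescorla_wagner_update(w, chosen_option, reward, eta=0.1):
    """One Rescorla-Wagner update (Eqs 1-2) with a one-hot state vector."""
    x = np.zeros_like(w)
    x[chosen_option] = 1.0            # one-hot encoding of the current state
    delta = reward - w @ x            # prediction error: observed minus expected reward
    return w + eta * delta * x        # only the chosen option's weight is updated

# Example: 64 options (8 x 8 feature space), weights initialized to zero
w = rescorla_wagner_update(np.zeros(64), chosen_option=10, reward=75.0)
```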

Generalization using Gaussian process regression

Yet, an essential aspect of human cognition is the ability to generalize from limited experiences to novel options. Rather than learning independent reward representations for each state, we adopt a function learning approach to generalization [19, 57], where continuous functions represent candidate hypotheses about the world, mapping the space of possible options to some outcome value. For example, a function can map how pressure on the gas pedal is related to the acceleration of a car, or how different amounts of water and fertilizer influence the growth rate of a plant. Crucially, the learned mapping provides estimates even for outcomes that have not been observed, by interpolating or extrapolating from previous experiences.

While the literature on how humans explicitly learn functions extends back to the 1960s [58], more recent approaches have proposed Gaussian Process (GP) regression [47] as a candidate model of human function learning [59–61]. GPs unite previous proposals of rule-based [62, i.e., learning the weights of a particular parametric function] and exemplar-based theories [63, i.e., neural networks predicting similar inputs will produce similar outputs], while also predicting the perceived difficulty of learning different functions [64] and explaining biases in how people extrapolate from limited data [59].

Formally, a GP defines a multivariate-normal distribution P(f) over possible value functions f(s) that map inputs s to outputs y = f(s).

$$P(f) \sim \mathcal{GP}\left(m(s),\, k(s, s')\right) \qquad (3)$$

The GP is fully defined by the mean function m(s), which is frequently set to 0 for convenience without loss of generality [47], and kernel function k(s, s′) encoding prior assumptions (or inductive biases) about the underlying function. Here we use the radial basis function (RBF) kernel:

$$k(s, s') = \exp\left(-\frac{\|s - s'\|^2}{2\lambda^2}\right) \qquad (4)$$

encoding similarity as a smoothly decaying function of the squared Euclidean distance between stimuli s and s′, measured either in spatial or conceptual distance. The length-scale parameter λ encodes the rate of decay, where larger values correspond to broader generalization over larger distances.
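To make the similarity computation concrete, here is a minimal Python sketch of the RBF kernel in Eq 4 (not the authors' implementation); the stimulus coordinates and length-scale values below are purely illustrative.

```python
import numpy as np

def rbf_kernel(s, s_prime, length_scale=2.0):
    """RBF kernel (Eq 4): similarity decays smoothly with the squared
    Euclidean distance between two stimuli (spatial or conceptual)."""
    sq_dist = np.sum((np.asarray(s, float) - np.asarray(s_prime, float)) ** 2)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

# Two options described by [row, column] grid positions or [tilt, stripe] indices;
# a larger lambda (length_scale) corresponds to broader generalization
rbf_kernel([0, 0], [2, 1], length_scale=2.0)   # ~0.54
rbf_kernel([0, 0], [2, 1], length_scale=4.0)   # ~0.86
```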

Given a set of observations Dt = [st, yt] about previously observed states and associated rewards, the GP makes normally distributed posterior predictions for any novel stimulus s, defined in terms of a posterior mean and variance:

$$m(s \mid \mathcal{D}_t) = \mathbf{K}(s, \mathbf{s}_t)\left[\mathbf{K}(\mathbf{s}_t, \mathbf{s}_t) + \sigma_\epsilon^2 \mathbf{I}\right]^{-1}\mathbf{y}_t \qquad (5)$$

$$v(s \mid \mathcal{D}_t) = k(s, s) - \mathbf{K}(s, \mathbf{s}_t)\left[\mathbf{K}(\mathbf{s}_t, \mathbf{s}_t) + \sigma_\epsilon^2 \mathbf{I}\right]^{-1}\mathbf{K}(\mathbf{s}_t, s) \qquad (6)$$

The posterior mean corresponds to the expected value of s while the posterior variance captures the underlying uncertainty in the prediction. Note that the posterior mean can also be rewritten as a similarity-weighted sum:

$$m(s \mid \mathcal{D}_t) = \sum_{i=1}^{t} w_i\, k(s, s_i) \qquad (7)$$

where each si is a previously observed input in st and the weights are collected in the vector $\mathbf{w} = \left[\mathbf{K}(\mathbf{s}_t, \mathbf{s}_t) + \sigma_\epsilon^2 \mathbf{I}\right]^{-1}\mathbf{y}_t$. Intuitively, this means that GP regression is equivalent to a linearly weighted sum, but uses basis functions k(⋅, ⋅) that project the inputs into a feature space, instead of the discrete state vectors. To generate new predictions, every observed reward yi in yt is weighted by the similarity of the associated state si to the candidate state s based on the kernel similarity. This similarity-weighted sum (Eq 7) is equivalent to an RBF network [65], which has featured prominently in machine learning approaches to value function approximation [14] and as a theory of the neural architecture of human generalization [66] in vision and motor control.
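The following NumPy sketch illustrates how the posterior mean and variance (Eqs 5 and 6) could be computed for a set of candidate options. It assumes the RBF kernel from Eq 4 and a fixed noise variance, and is a simplified illustration rather than the authors' implementation.

```python
import numpy as np

def gp_posterior(S_obs, y_obs, S_new, length_scale=2.0, noise_var=1.0):
    """GP posterior predictions (Eqs 5-6) with an RBF kernel.
    S_obs: (t, 2) observed stimuli; y_obs: (t,) observed rewards;
    S_new: (n, 2) candidate stimuli to predict."""
    S_obs, S_new = np.asarray(S_obs, float), np.asarray(S_new, float)
    y_obs = np.asarray(y_obs, float)

    def K(A, B):
        # Pairwise RBF kernel matrix between two sets of stimuli
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * length_scale ** 2))

    K_tt = K(S_obs, S_obs) + noise_var * np.eye(len(S_obs))
    K_st = K(S_new, S_obs)                       # similarity of candidates to observations
    w = np.linalg.solve(K_tt, y_obs)             # the weight vector from Eq 7
    mean = K_st @ w                              # posterior mean (Eq 5)
    # Posterior variance (Eq 6); k(s, s) = 1 for the RBF kernel
    var = 1.0 - np.einsum('ij,ji->i', K_st, np.linalg.solve(K_tt, K_st.T))
    return mean, var

# Example: three observed rewards on the 8 x 8 space, predictions for two new options
mean, var = gp_posterior([[0, 0], [1, 2], [5, 5]], [60.0, 70.0, 20.0], [[1, 1], [7, 7]])
```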

Uncertainty-directed exploration

In order to transform the Bayesian reward predictions of the BMT and GP models into predictions about participant choices, we use upper confidence bound (UCB) sampling together with a softmax choice rule as a combined model of both directed and random exploration [19, 50, 51].

UCB sampling uses a simple weighted sum of expected reward and uncertainty:

$$q_{\mathrm{UCB}}(s) = m(s) + \beta\, v(s) \qquad (8)$$

to compute a value q for each option s, where the exploration bonus β determines how to trade off exploring highly uncertain options against exploiting high expected rewards. This simple heuristic—although myopic—produces highly efficient learning by preferentially guiding exploration towards uncertain yet promising options, making it one of the only algorithms with known performance bounds in Bayesian optimization [67]. Recent studies have provided converging evidence for directed exploration in human behavior across a number of domains [19, 50, 68–70].

The UCB values are then put into a softmax choice rule:

$$P(s_i) = \frac{\exp\left(q(s_i)/\tau\right)}{\sum_{j}\exp\left(q(s_j)/\tau\right)} \qquad (9)$$

where the temperature parameter τ controls the amount of random exploration. Higher temperature sampling leads to more random choice predictions, with τ → ∞ converging on uniform sampling. Lower temperature values make more precise predictions, where τ → 0 converges on an argmax choice rule. Taken together, the exploration bonus β and temperature τ parameters estimated on participant data allow us to assess the relative contributions of directed and undirected exploration, respectively.
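Putting Eqs 8 and 9 together, a minimal sketch of the resulting choice rule might look as follows; the values of β and τ are illustrative placeholders rather than fitted estimates.

```python
import numpy as np

def choice_probabilities(mean, var, beta=0.5, tau=0.1):
    """UCB values (Eq 8) passed through a softmax (Eq 9).
    mean, var: posterior mean and variance for each of the 64 options;
    beta: directed (uncertainty-guided) exploration; tau: random exploration."""
    q = np.asarray(mean) + beta * np.asarray(var)   # upper confidence bound per option
    q = (q - q.max()) / tau                          # shift by the max for numerical stability
    p = np.exp(q)
    return p / p.sum()

# Sampling a choice from the resulting probability distribution
probs = choice_probabilities(mean=np.random.rand(64), var=np.ones(64))
choice = np.random.default_rng().choice(64, p=probs)
```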

Behavioral results

After training, participants were highly proficient in discriminating the stimuli, achieving at least 90% accuracy in both domains (see S3 Fig). Participants were also successful in both bandit tasks, achieving much higher rewards than chance in both conceptual (one-sample t-test: t(128) = 24.6, p < .001, d = 2.2, BF > 100) and spatial tasks (t(128) = 34.6, p < .001, d = 3.0, BF > 100; Fig 2a; see Methods for further details about statistics). In addition, participants could also leverage environmental structure in both domains. Using a two-way mixed ANOVA, we found that both environment (smooth vs. rough: F(1, 127) = 9.4, p = .003, η2 = .05, BF = 13) and task (spatial vs. conceptual: F(1, 127) = 35.8, p < .001, η2 = .06, BF > 100) influenced performance. The stronger reward correlations present in smooth environments facilitated higher performance (two sample t-test: t(127) = 3.1, p = .003, d = 0.5, BF = 12), even though both environments had the same expected reward.

Fig 2. Behavioral results.

Fig 2

a) Mean reward in each task, where each dot is a participant and lines connect the same participant across tasks. Tukey boxplots show median (horizontal line) and 1.5x IQR, while diamonds indicate the group mean. The dashed line indicates chance performance. Bayes Factors (BF) indicate the evidence against a specified null hypothesis for either two sample (rough vs. smooth) or paired (conceptual vs. spatial) t-tests (see Methods). b) Correspondence between tasks, where each dot represents the average reward of a single participant and the dotted line indicates y = x. c) Task order effect, where experience with spatial search boosted performance on conceptual search, but not vice versa. Bayes factors correspond to paired t-tests. d) Average learning curves over trials, showing the mean (line) and standard error (ribbon) aggregated across rounds and participants. The dashed line indicates chance performance. e) The Manhattan distance between selections compared to a random baseline (black line). f) Distance between selections as a function of the previous observed reward value, showing the aggregate means (points) and the group-level predictions of a mixed-effects regression (S1 Table), where the ribbons indicate the 95% CI.

While performance was strongly correlated between the spatial and conceptual tasks (Pearson’s r = .53, p < .001, BF > 100; Fig 2b), participants performed systematically better in the spatial version (paired t-test: t(128) = 6.0, p < .001, d = 0.5, BF > 100). This difference in task performance can largely be explained by a one-directional transfer effect (Fig 2c). Participants performed better on the conceptual task after having experienced the spatial task (t(127) = 2.8, p = .006, d = 0.5, BF = 6.4). This was not the case for the spatial task, where performance did not differ whether performed first or second (t(127) = − 1.7, p = .096, d = 0.3, BF = .67). Thus, experience with spatial search boosted performance on conceptual search, but not vice versa.

Participants learned effectively within each round and obtained higher rewards with each successive choice (Pearson correlation between reward and trial: r = .88, p < .001, BF > 100; Fig 2d). We also found evidence for learning across rounds in the spatial task (Pearson correlation between reward and round: r = .91, p < .001, BF = 15), but not in the conceptual task (r = .58, p = .104, BF = 1.5).

Patterns of search also differed across domains. Comparing the average Manhattan distance between consecutive choices in a two-way mixed ANOVA showed an influence of task (within: F(1, 127) = 13.8, p < .001, η2 = .02, BF = 67) but not environment (between: F(1, 127) = 0.12, p = .73, η2 = .001, BF = 0.25, Fig 2e). This reflected that participants searched in smaller step sizes in the spatial task (t(128) = − 3.7, p < .001, d = 0.3, BF = 59), corresponding to a more local search strategy, but did not adapt their search distance to the environment. Note that each trial began with a randomly sampled initial stimulus, such that participants did not begin near the previous selection (see Methods). The bias towards local search (one-sample t-test comparing search distance against chance: t(128) = − 16.3, p < .001, d = 1.4, BF > 100) is therefore not a side effect of the task characteristics, but both purposeful and effortful (see S4 Fig for additional analysis of search trajectories).

Participants also adapted their search patterns based on reward values (Fig 2f), where lower rewards predicted a larger search distance on the next trial (correlation between previous reward and search distance: r = − .66, p < .001, BF > 100). We analyzed this relationship using a Bayesian mixed-effects regression, where we found previous reward value to be a reliable predictor of search distance (bprevReward = − 0.06, 95% HPD: [−0.06, −0.06]; see S1 Table), while treating participants as random effects. This provides initial evidence for generalization-like behavior, where participants actively avoided areas with poor rewards and stayed near areas with rich rewards.

In summary, we found correlated performance across tasks, but also differences in both performance and patterns of search. Participants were boosted by a one-directional transfer effect, where experience with the spatial task improved performance on the conceptual task, but not the other way around. In addition, participants made larger jumps between choices in the conceptual task and searched more locally in the spatial task. However, participants adapted these patterns in both domains in response to reward values, where lower rewards predicted a larger jump to the next choice.

Modeling results

To better understand how participants navigated the spatial and conceptual tasks, we used computational models to predict participant choices and judgments. Both GP and BMT models implement directed and undirected exploration using the UCB exploration bonus β and softmax temperature τ as free parameters. The models differed in terms of learning, where the GP generalized about novel options using the length-scale parameter λ to modulate the extent of generalization over spatial or conceptual distances, while the BMT learns the rewards of each option independently (see Methods).

Both models were estimated using leave-one-round-out cross validation, where we compared goodness of fit using out-of-sample prediction accuracy, described using a pseudo-R2 (Fig 3a). The differences between models were reliable and meaningful, with the GP model making better predictions than the BMT in both the conceptual (t(128) = 3.9, p < .001, d = 0.06, BF > 100) and spatial tasks (t(128) = 4.3, p < .001, d = 0.1, BF > 100). In total, the GP model best predicted 85 participants in the conceptual task and 93 participants in the spatial task (out of 129 in total). Comparing this same out-of-sample prediction accuracy using a Bayesian model selection framework [71, 72] confirmed that the GP had the highest posterior probability (corrected for chance) of being the best model in both tasks (protected exceedance probability; conceptual: pxp(GP) = .997; spatial: pxp(GP) = 1.000; Fig 3b). The superiority of the GP model suggests that generalization about novel options via the use of structural information played a guiding role in how participants searched for rewards (see S6 Fig for additional analyses).

Fig 3. Modeling results.

Fig 3

a) Predictive accuracy of each model, where 1 is a perfect model and 0 is equivalent to chance. Each dot is a single participant, with lines indicating the difference between models. Tukey boxplot shows the median (line) and 1.5 × IQR, with the group mean indicated as a diamond. b) Protected Exceedance Probability (pxp), which provides a hierarchical estimate of model prevalence in the population (corrected for chance). c) Simulated learning curves. Each line is the averaged performance over 10,000 replications, where we sampled participant parameter estimates and simulated behavior on the task. The pink line is the group mean of our human participants, while the black line provides a random baseline. d) Simulation results from panel c aggregated over trials, where the height of the bar indicates average reward. e) GP parameter estimates from the conceptual (x-axis) and spatial (y-axis) tasks. Each point is the mean estimate for a single participant and the dotted line indicates y = x. For readability, the x- and y-axis limits are set to Tukey’s upper fence (Q3 + 1.5 × IQR) for the larger of the two dimensions, but all statistics are performed on the full data.

Learning curves

To confirm that the GP model indeed captured learning behavior better in both tasks, we simulated learning curves from each model using participant parameter estimates (Fig 3c; see Methods). The GP model achieved human-like performance in all tasks and environments (comparing aggregate GP and human learning curves: conceptual MSE = 17.7; spatial MSE = 16.6), whereas BMT learning curves were substantially less similar (conceptual MSE = 150.6; spatial MSE = 330.7). In addition, the GP captured the same qualitative difference between domains and environments as our human participants (Fig 3d), with better performance in spatial vs. conceptual, and smooth vs. rough. These patterns were not present in the BMT or random simulations.

Parameter estimates

To understand how generalization and exploration differed between domains, Fig 3e compares the estimated model parameters from the conceptual and spatial tasks. The GP model had three free parameters: the extent of generalization (λ) of the RBF kernel, the exploration bonus (β) of UCB sampling, and the temperature (τ) of the softmax choice rule (see S9 Fig for BMT parameters). Note that the exploration bonus captures exploration directed towards uncertainty, whereas temperature captures random, undirected exploration, which have been shown to be distinct and recoverable parameters [19, 70].

We do not find reliable differences in λ estimates across tasks (Wilcoxon signed-rank test: Z = − 1.2, p = .115, r = − .11, BF = .13). In all cases, we observed lower levels of generalization relative to the true generative model of the underlying reward distributions (λrough = 2, λsmooth = 4; min-BF = 1456), replicating previous findings [19] that found undergeneralization to be largely beneficial in similar settings. Generalization was anecdotally correlated across tasks (Kendall rank correlation: rτ = .13, p = .028, BF = 1.3), providing weak evidence that participants tended to generalize similarly across domains.

Whereas generalization was similar between tasks, there were intriguing differences in exploration. We found substantially lower exploration bonuses (β) in the conceptual task (Z = − 5.0, p < .001, r = − .44, BF > 100), indicating a large reduction of directed exploration, relative to the spatial task. At the same time, there was an increase in temperature (τ) in the conceptual task (Z = 6.9, p < .001, r = − .61, BF > 100), corresponding to an increase in random, undirected exploration. These domain-specific differences in β and τ were not influenced by task order or environment (two-way ANOVA: all p > .05, BF < 1). Despite these differences, we find some evidence of correlations across tasks for directed exploration (rτ = .18, p = .002, BF = 13) and substantial evidence for correlations between random exploration across domains (rτ = .43, p < .001, BF > 100).

Thus, participants displayed similar and somewhat correlated levels of generalization in both tasks, but with markedly different patterns of exploration. Whereas participants engaged in typical levels of directed exploration in the spatial domain (replicating previous studies; [19, 70]), they displayed reduced levels of directed exploration in the conceptual task, substituting instead an increase in undirected exploration. Again, this is not due to a lack of effort, because participants made longer search trajectories in the conceptual domain (see S4a Fig). Rather, this indicates a meaningful difference in how people represent or reason about spatial and conceptual domains in order to decide which are the most promising options to explore.

Bonus round

In order to further validate our behavioral and modeling results, we analyzed participants’ judgments of expected rewards and perceived confidence for 10 unobserved options they were shown during the final “bonus” round of each task (see Methods and Fig 1c). Participants made equally accurate judgments in both tasks (comparing mean absolute error: t(128) = − 0.2, p = .827, d = 0.02, BF = .10; Fig 4a), which were far better than chance (conceptual: t(128) = − 9.2, p < .001, d = 0.8, BF > 100; spatial: t(128) = − 8.4, p < .001, d = 0.7, BF > 100) and correlated between tasks (r = .27, p = .002, BF = 20). Judgment errors were also correlated with performance in the bandit task (r = − .45, p < .001, BF > 100), such that participants who earned higher rewards also made more accurate judgments.

Fig 4. Bonus round.

Fig 4

a) Mean absolute error (MAE) of judgments in the bonus round, where each dot is a single participant and lines connect performance across tasks. Tukey boxplots show the median and 1.5 × IQR, with the diamonds indicating the group mean and the dashed line providing a comparison to chance. Bayes factor indicates the evidence against the null hypothesis for a paired t-test. b) Average confidence ratings (Likert scale: [0, 10]). c) Comparison between participant judgments and model predictions (based on the parameters estimated from the search task). Each point is a single participant judgment, with colored lines representing the predicted group-level effect of a mixed effect regression (S2 Table) and ribbons showing the 95% CI (undefined for the BMT model, which makes identical predictions for all unobserved options). d) Correspondence between participant confidence ratings and GP uncertainty, where both are rank-ordered at the individual level. Black dots show aggregate means and 95% CI, while the colored line is a linear regression.

Participants were equally confident in both domains (t(128) = − 0.8, p = .452, d = 0.04, BF = .13; Fig 4b), with correlated confidence across tasks (r = .79, p < .001, BF > 100), suggesting some participants were consistently more confident than others. Ironically, more confident participants also had larger judgment errors (r = .31, p < .001, BF = 91) and performed worse in the bandit task (r = − .28, p = .001, BF = 28).

Using parameter estimates from the search task (excluding the entire bonus round), we computed model predictions for each of the bonus round judgments as an out-of-task prediction analysis. Whereas the BMT invariably made the same predictions for all unobserved options since it does not generalize (Fig 4c), the GP predictions were correlated with participant judgments in both conceptual (mean individual correlation: r̂ = .35; single sample t-test of z-transformed correlation coefficients against μ = 0: t(128) = 11.0, p < .001, d = 1.0, BF > 100) and spatial tasks (r̂ = .43; t(128) = 11.0, p < .001, d = 1.0, BF > 100). This correspondence between human judgments and model predictions was also confirmed using a Bayesian mixed effects model, where we again treated participants as random effects (bparticipantJudgment = .82, 95% HPD: [0.75, 0.89]; see S2 Table for details).

Not only was the GP able to predict judgments about expected reward, but it also captured confidence ratings. Fig 4d shows how the highest confidence ratings corresponded to the lowest uncertainty estimates made by the GP model. This effect was also found in the raw data, where we again used a Bayesian mixed effects model to regress participant confidence judgments onto the GP uncertainty predictions (bparticipantJudgment = − 0.02, 95% HPD: [-0.02, -0.01]; see S2 Table).

Thus, participant search behavior was consistent with our GP model, and we were also able to make accurate out-of-task predictions about both expected reward and confidence judgments using parameters estimated from the search task. These predictions validate the internal learning model of the GP, since reward predictions depend only on the generalization parameter λ. Altogether, our results suggest domain differences were not due to differences in how participants computed or represented expected reward and uncertainty, since they were equally good at judging their uncertainty in the bonus rounds of both domains. Rather, these diverging patterns of search arose from differences in exploration, where participants substantially reduced their level of exploration directed towards uncertain options in the conceptual domain.

Discussion

Previous theories of cognitive maps [21, 3234] have argued that reasoning in abstract domains follows similar computational principles as in spatial domains, for instance, sharing a common approach to computing similarities between experiences. These accounts imply that the shared notion of similarity should influence how people generalize from past outcomes, and also how they balance between sampling new and informative options as opposed to exploiting known options with high expected rewards.

Here, we investigated to what extent learning and searching for rewards are governed by similar computational principles in spatial and conceptual domains. Using a within-subject design, we studied participant behavior in both spatially and conceptually correlated reward environments. Comparing different computational models of learning and exploration, we found that a Gaussian Process (GP) model that incorporated distance-based generalization, and hence a cognitive map of similarities, best predicted participants' behavior in both domains. Our parameter estimates further indicated equivalent levels of generalization across domains. Using these parameters, our model was able to simulate human-like learning curves and make accurate out-of-task predictions about participant reward estimations and confidence ratings in a final bonus round. This model-based evidence for similar distance-based decision making in both domains was also in line with our behavioral results. Performance was correlated across domains and benefited from higher outcome correlations between similar bandit options (i.e., smooth vs. rough). Subsequent choices tended to be more local than expected by chance, and similar options were more likely to be chosen after a high reward than a low reward outcome.

In addition to revealing similarities, our modeling and behavioral analyses provided a diagnostic lens into differences between spatial and conceptual domains. Whereas we found similar levels of generalization in both tasks, patterns of exploration were substantially different. Although participants showed clear signs of directed exploration (i.e., seeking out more uncertain options) in the spatial domain, this was notably reduced in the conceptual task. However, as if in compensation, participants increased their random exploration in the conceptual task. This implies a reliable shift in sampling strategies but not in generalization. Thus, even though the computational principles underpinning reasoning in both domains are indeed similar, how these computations are mapped onto actions can vary substantially. Moreover, participants obtained more rewards and sampled more locally in the spatial domain. We also found a one-directional transfer effect, where experience with the spatial task boosted performance on the conceptual task, but not vice versa. These findings shed new light onto the computational mechanisms of generalization and decision making, suggesting a universality of generalization and a situation-specific adaptation of decision making policies.

Related work

Our findings also contribute to a number of other cognitive models and theories. According to the successor representation (SR; [18]) framework, hippocampal cognitive maps reflect predictions of expected future state occupancy [73–75]. This provides a similarity metric based on transition dynamics, where an analytic method for computing the SR in closed form is to assume random transitions through the state space. This assumption of a random policy produces a nearly identical similarity metric as the RBF kernel [76], with exact equivalencies in certain cases [77].

However, the SR can also be learned online using the Temporal-Difference learning algorithm, leading to asymmetric representations of distance that are skewed by the direction of travel [73, 78]. Recent work building on Kohonen maps has also suggested that the distribution of the experienced stimuli in feature space will have implications for the activation profiles of grid cells and the resulting cognitive map [79].

In our current study, we have focused on the simplifying case of a cognitive map learned through a random policy. This context was induced by having stimuli uniformly distributed over the search space and using a training phase involving extensive and random trajectories over the search space (i.e., matching random targets from random starting points). While this assumption is not always met in real life domains, it provides a useful starting point and allows us to reciprocally compare behavior in spatial and conceptual domains.

Previous work has also investigated transfer across domains [80], where inferences about the transition structure in one task can be generalized to other tasks. Whereas we used identical transition structures in both tasks, we nevertheless found asymmetric transfer between domains. A key question underlying the nature of transfer is the remapping of representations [81, 82], which can be framed as a hidden state-space inference problem. Different levels of prior experience with the spatial and conceptual stimuli could give rise to different preferences for reuse of task structure as opposed to learning a novel structure. This may be a potential source of the asymmetric transfer we measured in task performance.

Additionally, clustering methods (e.g., [79]) can also provide local approximations of GP inference by making predictions about novel options based on the mean of a local cluster. For instance, a related reward-learning task on graph structures [76] found that a k-nearest neighbors model provided a surprisingly effective heuristic for capturing aspects of human judgments and decisions. However, a crucial limitation of any clustering model is that it would be incapable of learning and extrapolating directional trends, which is a key feature of human function learning [59, 60]. Alternatively, clustering could also play a role in approximate GP inference [83], by breaking up the inference problem into smaller chunks or by considering only a subset of inputs. Future work should explore the question of how human inference scales with the complexity of the data.

Lastly, the question of “how the cognitive map is learned” is distinct from the question of “how the cognitive map is used”. Here, we have focused on the latter, and used the RBF kernel to provide a map based on the assumption of random transitions, similar to a random-policy implementation of the SR. While both the SR and GP provide a theory of how people utilize a cognitive map for performing predictive inferences, only the GP provides a theory about representations of uncertainty via Bayesian predictions of reward. These representations of uncertainty are a key feature that sets the GP apart from the SR. Psychologically, GP uncertainty estimates systematically capture participant confidence judgments and provide the basis for uncertainty-directed exploration. This distinction may also be central to the different patterns of search we observed in spatial and non-spatial domains, where a reduction in uncertainty-directed exploration may also reflect computational differences in the structure of inference. However, the exact nature of these representations remains an open question for future neuroimaging research.

Future directions

Several questions about the link between cognitive maps across domains remain unanswered by our current study and are open for future investigations. Why did we find differences in exploration across domains, even though the tasks were designed to be as equivalent as possible, including requiring commensurate stimuli discriminability during the pre-task training phase? Currently, our model can capture but not fully explain these differences in search behavior, since it treats both domains as equivalent generalization and exploration problems.

One possible explanation is a different representation of spatial and non-spatial information, or different computations acting on those representations. Recent experimental work has demonstrated that representations of spatial and non-spatial domains may be processed within the same neural systems [9, 12], suggesting representational similarities. But in our study it remains possible that different patterns of exploration could instead result from a different visual presentation of information in the spatial and the non-spatial task. It is, for example, conceivable that exploration in a (spatially or non-spatially) structured environment depends on the transparency of the structure in the stimulus material, or the alignment of the input modality. In our case, the spatial structure was embedded in the stimulus itself, whereas the conceptual structure was not. Additionally, the arrow key inputs may have been more intuitive for manipulating the spatial stimuli. While generalization could be observed in both situations, directed exploration might require more explicitly accessible information about structural relationships or be facilitated by more intuitively mappable inputs. Previous work used a task where both spatial and conceptual features were simultaneously presented [84, i.e., conceptual stimuli were shuffled and arranged on a grid], yet only spatial or only conceptual features predicted rewards. However, differences in the saliency of spatial and conceptual features meant participants were highly influenced by spatial features, even when they were irrelevant. The present study was designed to overcome these issues by presenting only task-specific features, yet future work should address the computational features that allow humans to leverage structured knowledge of the environment to guide exploration. There is also a wide range of alternative non-spatial stimuli that we have not tested (for instance auditory [12] or linguistic stimuli [85, 86]), which could be considered more “conceptual” than our Gabor stimuli or may be more familiar to participants. Thus, it is an open empirical question to determine the limits to which spatial and different kinds of conceptual stimuli can be described using the same computational framework.

Our model also does not account for attentional mechanisms [87] or working memory constraints [88, 89], which may play a crucial role in influencing how people integrate information differently across domains [90]. To ask whether feature integration is different between domains, we implemented a variant of our GP model using a Shepard kernel [65], which used an additional free parameter estimating the level of integration between the two feature dimensions (S10 Fig). This model did not reveal strong differences in feature integration, yet replicated our main findings with respect to changes in exploration. Additional analyses showed asymmetries in attention to different feature dimensions, which was an effect modulated by task order (S4d–S4f Fig). Task order also modulated performance differences between domains, which only appeared when the conceptual task was performed before the spatial task (Fig 2c). Experience with the spatial task version may have facilitated a more homogenous mapping of the conceptual stimuli into a 2D similarity space, which in turn facilitated better performance. This asymmetric transfer may support the argument that spatial representations have been “exapted” to other more abstract domains [68]. For example, experience of different resource distributions in a spatial search task was found to influence behavior in a word generation task, where participants exposed to sparser rewards in space generated sparser semantic clusters of words [91]. Thus, while both spatial and conceptual knowledge are capable of being organized into a common map-like representation, there may be domain differences in terms of the ease of learning such a map and asymmetries in the transfer of knowledge. Future research should investigate this phenomenon with alternative models that make stronger assumptions about representational differences across domains.

We also found no differences in predictions and uncertainty estimates about unseen options in the bonus round. This means that participants generalized and managed to track the uncertainties of unobserved options similarly in both domains, yet did not or could not leverage their representations of uncertainty for performing directed exploration as effectively in the conceptual task. Alternatively, differences in random exploration could also arise from limited computational precision during the learning of action values [92]. Thus, the change in random exploration we observed may be due to different computational demands across domains. Similar increases in random exploration have also been observed under direct cognitive load manipulations, such as by adding working memory load [93] or by limiting the available decision time [94].

Finally, our current experiment only looked at similarities between spatial and conceptual domains when the underlying structure was the same in both tasks. Future studies could expand this approach across different domains such as logical rule-learning, numerical comparisons, or semantic similarities. Additionally, structure learned in one domain could transfer either to the same domain with slightly changed structures or to entirely different domains with different structures. A truly all-encompassing model of generalization should capture transfer across domains and structural changes. Even though several recent studies have advanced our understanding of how people transfer knowledge across graph structures [80], state similarities in multi-task reinforcement learning [95], and target hypotheses supporting generalization [90], whether or not all of these recruit the same computational principles and neural machinery remains to be seen.

Conclusion

We used a rich experimental paradigm to study how people generalize and explore both spatially and conceptually correlated reward environments. While people employed similar principles of generalization in both domains, we found a substantial shift in exploration, from more uncertainty-directed exploration in the spatial task to more random exploration in the conceptual domain. These results enrich our understanding of the principles connecting generalization and search across different domains and pave the way for future cognitive and neuroscientific investigations.

Methods

Participants and design

140 participants were recruited through Amazon Mechanical Turk (requiring a 95% approval rate and 100 previously approved HITs) for a two part experiment, where only those who had completed part one were invited back for part two. In total 129 participants completed both parts and were included in the analyses (55 female; mean age = 35, SD = 9.5). Participants were paid $4.00 for each part of the experiment, with those completing both parts being paid an additional performance-contingent bonus of up to $10.00. Participants earned $15.6 ± 1.0 and spent 54 ± 19 minutes completing both parts. There was an average gap of 18 ± 8.5 hours between the two parts of the experiment.

We varied the task order between subjects, with participants completing the spatial and conceptual task in counterbalanced order in separate sessions. We also varied between subjects the extent of reward correlations in the search space by randomly assigning participants to one of two different classes of environments (smooth vs. rough), with smooth environments corresponding to stronger correlations, and the same environment class used for both tasks (see below).

Ethics statement

The study was approved by the ethics committee of the Max Planck Institute for Human Development (A 2019/27) and all participants gave written informed consent.

Materials and procedure

Each session consisted of a training phase, the main search task, and a bonus round. At the beginning of each session, participants were required to complete a training task to familiarize themselves with the stimuli (spatial or conceptual), the inputs (arrow keys and spacebar), and the search space (8 × 8 feature space). Participants were shown a series of randomly selected targets and were instructed to use the arrow keys to modify a single selected stimulus (i.e., adjusting the stripe frequency and angle of a Gabor patch or moving the location of a spatial selector, Fig 1c) in order to match a target stimulus displayed below. The target stayed visible during the trial and did not have to be held in memory. The space bar was used to make a selection and feedback was provided for 800ms (correct or incorrect). Participants were required to complete at least 32 training trials and were allowed to proceed to the main task once they had achieved at least 90% accuracy on a run of 10 trials (i.e., 9 out of 10). See S3 Fig for analysis of the training data.

After completing the training, participants were shown instructions for the main search task and had to complete three comprehension questions (S11 and S12 Figs) to ensure full understanding of the task. Specifically, the questions were designed to ensure participants understood that the spatial or conceptual features predicted reward. Each search task comprised 10 rounds of 20 trials each, with a different reward function sampled without replacement from the set of assigned environments. The reward function specified how rewards mapped onto either the spatial or conceptual features, where participants were told that options with either similar spatial features (Spatial task) [19, 96] or similar conceptual features (Conceptual task) [20, 57] would yield similar rewards. Participants were instructed to accumulate as many points as possible, which were later converted into monetary payoffs.

The tenth round of each session was a “bonus round”, with additional instructions shown at the beginning of the round. The round began as usual, but after 15 choices, participants were asked to make judgments about the expected rewards (input range: [1, 100]) and their level of confidence (Likert scale from least to most confident: [0, 10]) for 10 unrevealed targets. These targets were uniformly sampled from the set of unselected options during the current round. After the 10 judgments, participants were asked to make a forced choice between the 10 options. The reward for the selected option was displayed and the round continued as normal. All behavioral and computational modeling analyses exclude the last round, except for the analysis of the bonus round judgments.

Spatial and conceptual search tasks

Participants used the arrow keys to either move a highlighted selector in the spatial task or change the features (tilt and stripe frequency) of the Gabor stimuli in the conceptual task (S1 Fig). On each round, participants were given 20 trials to acquire as many cumulative rewards as possible. A selection was made by pressing the space bar, and then participants were given feedback about the reward for 800 ms, with the chosen option and reward value added to the history at the bottom of the screen. At the beginning of each trial, the starting position of the spatial selector or the displayed conceptual stimulus was randomly sampled from a uniform distribution. Each reward observation included normally distributed noise, ϵ ∼ N(0, 1), where the rewards for each round were scaled to a uniformly sampled maximum value in the range of 80 to 95, so that the value of the global optimum in each round could not be easily guessed.

Participants were given feedback about their performance at the end of each round in terms of the ratio of their average reward to the global maximum, expressed as a percentage (e.g., “You have earned 80% of the maximum reward you could have earned on this round”). The performance bonus (up to $10.00) was calculated based on the cumulative performance of each round and across both tasks.

Bonus round judgments

In both tasks the last round was a “bonus round”, which solicited judgments about the expected reward of 10 unrevealed options together with participants’ confidence in those judgments. Participants were informed that the goal of the task remained the same (maximize cumulative rewards), but that after 15 selections, they would be asked to provide judgments about 10 randomly selected options that had not yet been explored. Judgments about expected rewards were elicited using a slider from 1 to 100 (in increments of 1), while judgments about confidence were elicited using a slider from 0 to 10 (in increments of 1), with the endpoints labeled ‘Least confident’ and ‘Most confident’. After providing the 10 judgments, participants were asked to select one of the options they had just rated, and subsequently completed the round like all others.

Environments

All environments were sampled from a GP prior parameterized with a radial basis function (RBF) kernel (Eq 4), where the length-scale parameter (λ) determines the rate at which the correlations of rewards decay over (spatial or conceptual) distance, with higher λ-values corresponding to stronger correlations. We generated 40 samples of each type of environment, using λrough = 2 and λsmooth = 4, which were sampled without replacement and used as the underlying reward functions in each task (S2 Fig). Environment type was manipulated between subjects, with the same environment type used in both conceptual and spatial tasks.
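To make the environment-generation step concrete, the following is a minimal sketch rather than the authors' code. It assumes one common parameterization of the RBF kernel, exp(−d²/(2λ²)); the exact form used is given in Eq 4 of the main text. A small jitter term is added for numerical stability, and sampled functions are normalized to a common range, consistent with S2 Fig.

```r
# Minimal sketch: sample reward environments on the 8 x 8 grid from a GP prior
# with an RBF kernel (assumed parameterization exp(-d^2 / (2*lambda^2))).
library(MASS)

grid <- expand.grid(x1 = 1:8, x2 = 1:8)     # the 64 options of the 8 x 8 feature space

rbf <- function(X, lambda) {
  D2 <- as.matrix(dist(X))^2                # squared Euclidean distances between options
  exp(-D2 / (2 * lambda^2))                 # correlations decay with distance
}

sample_env <- function(lambda) {
  K <- rbf(grid, lambda) + diag(1e-6, nrow(grid))        # jitter for numerical stability
  f <- mvrnorm(1, mu = rep(0, nrow(grid)), Sigma = K)    # one draw from the GP prior
  (f - min(f)) / (max(f) - min(f))                       # normalize to a common range
}

rough_envs  <- replicate(40, sample_env(lambda = 2), simplify = FALSE)  # lambda_rough = 2
smooth_envs <- replicate(40, sample_env(lambda = 4), simplify = FALSE)  # lambda_smooth = 4
```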

Models

Bayesian mean tracker

The Bayesian Mean Tracker (BMT) is a simple but widely-applied associative learning model [69, 97, 98], which is a special case of the Kalman Filter with time-invariant reward distributions. The BMT can also be interpreted as a Bayesian variant of the Rescorla-Wagner model [56], making predictions about the rewards of each option j in the form of a normally distributed posterior:

$P(\mu_{j,t} \mid \mathcal{D}_t) = \mathcal{N}(m_{j,t}, v_{j,t})$ (10)

The posterior mean mj,t and variance vj,t are updated iteratively using a delta-rule update based on the observed reward yt when option j is selected at trial t:

$m_{j,t} = m_{j,t-1} + \delta_{j,t}\, G_{j,t}\, [y_t - m_{j,t-1}]$ (11)
$v_{j,t} = [1 - \delta_{j,t}\, G_{j,t}]\, v_{j,t-1}$ (12)

where δj,t = 1 if option j was chosen on trial t, and 0 otherwise. Rather than having a fixed learning rate, the BMT scales updates based on the Kalman Gain Gj,t, which is defined as:

$G_{j,t} = \dfrac{v_{j,t-1}}{v_{j,t-1} + \theta_{\epsilon}^{2}}$ (13)

where θϵ2 is the error variance, which is estimated as a free parameter. Intuitively, the estimated mean of the chosen option mj,t is updated based on the prediction error yt − mj,t−1, scaled by the Kalman Gain Gj,t (Eq 11). At the same time, the estimated variance vj,t is reduced by a factor of 1 − Gj,t, which lies in the range [0, 1] (Eq 12). The error variance θϵ2 can be interpreted as an inverse sensitivity, where smaller values result in more substantial updates of the mean mj,t and larger reductions of the uncertainty vj,t.
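For concreteness, a minimal sketch of the update in Eqs 11–13 is given below. This is not the authors' implementation; the function name and the prior values for the mean and variance are illustrative.

```r
# Minimal sketch: one BMT update for the chosen option j, following Eqs 11-13.
bmt_update <- function(m, v, j, y, theta_eps2) {
  G    <- v[j] / (v[j] + theta_eps2)   # Kalman gain (Eq 13)
  m[j] <- m[j] + G * (y - m[j])        # delta-rule update of the posterior mean (Eq 11)
  v[j] <- (1 - G) * v[j]               # reduction of posterior variance (Eq 12)
  list(m = m, v = v)
}

# Illustrative priors: 64 options, prior mean 50, prior variance 500;
# observing reward 72 on option 10 with error variance 5.
post <- bmt_update(m = rep(50, 64), v = rep(500, 64), j = 10, y = 72, theta_eps2 = 5)
```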

Model cross-validation

As with the behavioral analyses, we omit the 10th “bonus round” in our model cross-validation. For each of the other nine rounds, we use cross validation to iteratively hold out a single round as a test set, and compute the maximum likelihood estimate using differential evolution [99] on the remaining eight rounds. Model comparisons use the summed out-of-sample prediction error on the test set, defined in terms of log loss (i.e., negative log likelihood).
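The following sketch illustrates this leave-one-round-out procedure (not the authors' code): it assumes a function `negLL(params, rounds)` returning the negative log likelihood of a model's choice predictions on the listed rounds, and uses the DEoptim package [99]; the parameter bounds are illustrative placeholders.

```r
# Minimal sketch: hold out one round, fit by differential evolution on the rest,
# and score out-of-sample log loss on the held-out round.
library(DEoptim)

cv_round <- function(test_round, lower, upper, rounds = 1:9) {
  train <- setdiff(rounds, test_round)
  fit   <- DEoptim(fn = function(p) negLL(p, train), lower = lower, upper = upper)
  negLL(fit$optim$bestmem, test_round)   # out-of-sample log loss for this round
}

# Summed out-of-sample prediction error across the nine held-out rounds
# (illustrative bounds for three parameters on a log scale):
total_loss <- sum(sapply(1:9, cv_round, lower = rep(-5, 3), upper = rep(4, 3)))
```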

Predictive accuracy

As an intuitive statistic for goodness of fit, we report predictive accuracy as a pseudo-R2:

$R^{2} = 1 - \dfrac{\log \mathcal{L}(\mathcal{M}_{k})}{\log \mathcal{L}(\mathcal{M}_{\mathrm{rand}})}$ (14)

comparing the out-of-sample log loss of a given model Mk against a random model Mrand. R2 = 0 indicates chance performance, while R2 = 1 is a theoretically perfect model.
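For example, with 64 options the random model assigns probability 1/64 to every choice, so the pseudo-R² follows directly from the summed out-of-sample negative log likelihood, as in the sketch below (the numbers in the usage line are illustrative, not actual results).

```r
# Pseudo-R2 (Eq 14) from out-of-sample negative log likelihoods; the random model
# chooses among the 64 options uniformly on every trial.
pseudo_R2 <- function(nLL_model, n_trials) {
  nLL_rand <- n_trials * log(64)   # log loss of the random model
  1 - nLL_model / nLL_rand
}

pseudo_R2(nLL_model = 500, n_trials = 180)   # illustrative values
```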

Protected exceedance probability

The protected exceedance probability (pxp) is defined within a Bayesian model selection framework for group studies [71, 72]. Intuitively, it can be described as a random-effects analysis, where models are treated as random effects and are allowed to differ between subjects. Inspired by a Pólya urn model, we can imagine a population containing K different types of models (i.e., people best described by each model), much like an urn containing marbles of different colors. If we assume that there is a fixed but unknown distribution of models in the population, what is the probability of each model being more frequent in the population than all other models under consideration?

This is modelled hierarchically, using variational Bayes to estimate the parameters of a Dirichlet distribution describing the posterior probabilities of each model P(mk|y) given the data y. The exceedance probability is thus defined as the posterior probability that the frequency $r_{m_k}$ of model $m_k$ is larger than that of all other models $m_{k' \neq k}$ under consideration:

$xp(m_k) = p\left(r_{m_k} > r_{m_{k' \neq k}} \mid y\right)$ (15)

Rigoux et al. [72] extend this approach by correcting for chance, based on the Bayesian omnibus risk (BOR), which is the posterior probability that all model frequencies are equal:

$pxp(m_k) = xp(m_k)\,(1 - \mathrm{BOR}) + \dfrac{\mathrm{BOR}}{K}$ (16)

This produces the protected exceedance probability (pxp) reported throughout this article, and is implemented using https://github.com/sjgershm/mfit/blob/master/bms.m.
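Eq 16 amounts to shrinking the exceedance probabilities towards chance (1/K) in proportion to the BOR; a direct transcription is shown below (the values in the usage line are illustrative).

```r
# Protected exceedance probability (Eq 16).
pxp <- function(xp, bor) xp * (1 - bor) + bor / length(xp)

pxp(xp = c(.97, .03), bor = .05)   # illustrative values for K = 2 models
```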

Simulated learning curves

We simulated each model by sampling (with replacement) from the set of cross-validated participant parameter estimates, and performing search on a simulated bandit task. We performed 10,000 simulations for each combination of model, environment, and domain (spatial vs. conceptual).
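A minimal sketch of this procedure is shown below; `params` (a matrix of cross-validated estimates, one row per participant) and `run_round()` (a function playing one 20-trial round of a given model on a given environment and returning the obtained rewards) are assumed placeholder names, not the authors' code.

```r
# Minimal sketch: simulate learning curves by resampling participant parameter
# estimates (with replacement) and environments, then averaging over simulations.
simulate_learning_curve <- function(model, envs, params, n_sim = 10000) {
  curves <- replicate(n_sim, {
    p   <- params[sample(nrow(params), 1), ]   # sample one participant's estimates
    env <- envs[[sample(length(envs), 1)]]     # sample one reward environment
    run_round(model, p, env)                   # rewards obtained on each of the 20 trials
  })
  rowMeans(curves)                             # average learning curve
}
```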

Bonus round predictions

Bonus round predictions used each participant’s estimated parameters to predict their judgments about expected reward and confidence. Because rewards in each round were randomly scaled to a different global maximum, we rescaled the model predictions to align them with the observed rewards and participant judgments.

Statistical tests

Comparisons

We report both frequentist and Bayesian statistics. Frequentist tests are reported as Student’s t-tests (specified as either paired or independent) for parametric comparisons, while the Mann-Whitney-U test or Wilcoxon signed-rank test is used for non-parametric comparisons (for independent or paired samples, respectively). Each of these tests is accompanied by a Bayes factor (BF) quantifying the relative evidence the data provide in favor of the alternative hypothesis (HA) over the null (H0), which we interpret following [100].

Parametric comparisons are tested using the default two-sided Bayesian t-test for either independent or dependent samples, where both use a Jeffreys-Zellner-Siow prior with its scale set to √2/2, as suggested by [101]. All statistical tests are non-directional, as defined by a symmetric prior (unless otherwise indicated).
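For instance, such a test could be run with the BayesFactor R package as in the sketch below; this is illustrative and not necessarily the authors' exact code, and `score_conceptual` and `score_spatial` are placeholder vectors of participant means.

```r
# Illustrative default two-sided JZS t-test (prior scale sqrt(2)/2) for a paired
# comparison of the two tasks.
library(BayesFactor)
bf_t <- ttestBF(x = score_conceptual, y = score_spatial, paired = TRUE, rscale = sqrt(2)/2)
```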

Non-parametric comparisons are tested using either the frequentist Mann-Whitney-U test for independent samples, or the Wilcoxon signed-rank test for paired samples. In both cases, the Bayesian test is based on performing posterior inference over the test statistics (Kendall’s rτ for the Mann-Whitney-U test and the standardized effect size r = Z/√N for the Wilcoxon signed-rank test) and assigning a prior using parametric yoking [102]. This leads to a posterior distribution for Kendall’s rτ or the standardized effect size r, which yields an interpretable Bayes factor via the Savage-Dickey density ratio test. The null hypothesis posits that parameters do not differ between the two groups, while the alternative hypothesis posits an effect and assigns an effect size using a Cauchy distribution with the scale parameter set to 1/√2.

Correlations

For testing linear correlations with Pearson’s r, the Bayesian test is based on Jeffreys’s [103] test for linear correlation and assumes a shifted, scaled beta prior distribution B(1/k, 1/k) for r, where the scale parameter is set to k = 1/3 [104].
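An illustrative call with the BayesFactor R package is shown below (not necessarily the authors' exact code; `judgments` and `model_predictions` are placeholder vectors); the prior width of 1/3 corresponds to the scale described above.

```r
# Illustrative Bayes factor for a Pearson correlation with a stretched beta prior of width 1/3.
library(BayesFactor)
bf_r <- correlationBF(y = judgments, x = model_predictions, rscale = 1/3)
```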

For testing rank correlations with Kendall’s tau, the Bayesian test is based on parametric yoking to define a prior over the test statistic [105], and performing Bayesian inference to arrive at a posterior distribution for rτ. The Savage-Dickey density ratio test is used to produce an interpretable Bayes Factor.

ANOVA

We use a two-way mixed-design analysis of variance (ANOVA) to compare the means of both a fixed-effects factor (smooth vs. rough environments) as a between-subjects variable and a random-effects factor (conceptual vs. spatial) as a within-subjects variable. To compute the Bayes factor, we assume independent g-priors [106] for each effect size, θ1 ∼ N(0, g1σ2), …, θp ∼ N(0, gpσ2), where each g-value is drawn i.i.d. from an inverse chi-square prior with a single degree of freedom, gi ∼ inverse-χ2(1), and assuming a Jeffreys prior on the aggregate mean and scale factor. Following [107], we compute the Bayes factor by integrating the likelihoods with respect to the prior on parameters, where Monte Carlo sampling was used to approximate the g-priors. The Bayes factor reported in the text can be interpreted as the log-odds of the model relative to an intercept-only null model.
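This corresponds to the type of analysis implemented by `anovaBF` in the BayesFactor R package; a sketch under assumed variable names (`score`, `environment`, `task`, and `id` in a data frame `df`) is shown below, treating participant id as a random effect. It is illustrative, not necessarily the authors' exact code.

```r
# Illustrative mixed-design ANOVA Bayes factor: environment is between-subjects,
# task is within-subjects, and participant id enters as a random effect.
library(BayesFactor)
df$id          <- factor(df$id)
df$environment <- factor(df$environment)
df$task        <- factor(df$task)
bf_anova <- anovaBF(score ~ environment * task + id, data = df, whichRandom = "id")
```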

Mixed effects regression

Mixed effects regressions are performed in a Bayesian framework with brms [108] using MCMC methods (No-U-Turn sampling [109] with the proposal acceptance probability set to .99). In all models, we use a maximal random effects structure [110] and treat participants as a random intercept. Following [111], we use the following generic weakly informative priors:

$b_0 \sim \mathcal{N}(0, 1)$ (17)
$b_i \sim \mathcal{N}(0, 1)$ (18)
$\sigma \sim \text{Half-}\mathcal{N}(0, 1)$ (19)

All models were estimated over four chains of 4000 iterations, with a burn-in period of 1000 samples.
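A minimal sketch of one such model specification in brms is shown below; variable names such as `distance`, `prev_reward`, and `id` are illustrative placeholders rather than the authors' exact specification. The priors correspond to Eqs 17–19, with the prior on group-level standard deviations automatically truncated at zero (half-normal).

```r
# Minimal sketch of a Bayesian mixed-effects regression in brms.
library(brms)

m <- brm(
  distance ~ prev_reward + (1 + prev_reward | id),   # random intercept and slope per participant
  data    = df,
  prior   = c(prior(normal(0, 1), class = "Intercept"),
              prior(normal(0, 1), class = "b"),
              prior(normal(0, 1), class = "sd")),     # truncated at 0, i.e., half-normal
  chains  = 4, iter = 4000, warmup = 1000,
  control = list(adapt_delta = 0.99)                  # No-U-Turn sampler acceptance target
)
```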

Supporting information

S1 Fig. Gabor stimuli.

Tilt varies from left to right from 105° to 255° in equally spaced intervals, while stripe frequency increases moving upwards from 1.5 to 15 in log intervals.

(TIFF)

S2 Fig. Correlated reward environments.

Heatmaps of the reward environments used in both spatial and conceptual domains. The color of each tile represents the expected reward of the bandit, where the x-axis and y-axis were mapped to the spatial location or the tilt and stripe frequency (respectively). All environments have the same minimum and maximum reward values, and the two classes of environments share the same expectation of reward across options.

(EPS)

S3 Fig. Training phase.

a) Trials needed to reach the learning criterion (90% accuracy over 10 trials) in the training phase, where the dotted line indicates the 32 trial minimum. Each dot is a single participant with lines connecting the same participant. Tukey boxplots show median (line) and 1.5x IQR, with diamonds indicating group means. b) Average correct choices during the training phase. In the last 10 trials before completing the training phase, participants had a mean accuracy of 95.0% on the spatial task and 92.7% on the conceptual task (difference of 2.3%). In contrast, in the first 10 trials of training, participants had a mean accuracy of 84.1% in the spatial task and 68.8% in the conceptual (difference of 15.4%). c) Heatmaps of the accuracy of different target stimuli, where the x and y-axes of the conceptual heatmap indicate tilt and stripe frequency, respectively. d) The probability of error as a function of the magnitude of error (Manhattan distance from the correct response). Thus, most errors were close to the target, with higher magnitude errors being monotonically less likely to occur.

(EPS)

S4 Fig. Search trajectories.

a) Distribution of trajectory length, separated by task and environment. The dashed vertical line indicates the median for each category. Participants had longer trajectories in the conceptual task (t(128) = − 10.7, p < .001, d = 1.0, BF > 100), but there were no differences across environments (t(127) = 1.3, p = .213, d = 0.2, BF = .38). b) Average reward value as a function of trajectory length. Longer trajectories were correlated with higher rewards (r = .23, p < .001, BF > 100). Each dot is a mean with error bars showing the 95% CI. c) Distance from the random initial starting point in each trial as a function of the previous reward value. Each dot is the aggregate mean, while the lines show the fixed effects of a Bayesian mixed-effects model (see S1 Table), with the ribbons indicating the 95% CI. The relationship is not quite linear, but is also found using a rank correlation (rτ = .18, p < .001, BF > 100). The dashed line indicates random chance. d) Search trajectories decomposed into the vertical/stripe frequency dimension vs. horizontal/tilt dimension. Bars indicate group means and error bars show the 95% CI. We find more attention given to the vertical/stripe frequency dimension in both tasks, with a larger effect for the conceptual task (F(1, 127) = 26.85, p < .001, η2 = .08, BF > 100), but no difference across environments (F(1, 127) = 1.03, p = .311, η2 = .005, BF = 0.25). e) We compute attentional bias as Δdim = P(vertical/stripe frequency) − P(horizontal/tilt), where positive values indicate a stronger bias towards the vertical/stripe frequency dimension. Attentional bias was influenced by the interaction of task order and task (F(1, 127) = 8.1, p = .005, η2 = .02, BF > 100): participants were more biased towards the vertical/stripe frequency dimension in the conceptual task when the conceptual task was performed first (t(66) = − 6.0, p < .001, d = 0.7, BF > 100), but these differences disappeared when the spatial task was performed first (t(61) = − 1.6, p = .118, d = 0.2, BF = .45). f) Differences in attention and score. Each participant is represented as a pair of dots, where the connecting line shows the change in score and Δdim across tasks. We found a negative correlation between score and attention for the conceptual task only in the conceptual first order (rτ = − .31, p < .001, BF > 100), but not in the spatial first order (rτ = − .07, p = .392, BF = .24). There were no relationships between score and attention in the spatial task in either order (spatial first: rτ = .03, p = .738, BF = .17; conceptual first: rτ = − .03, p = .750, BF = .17).

(EPS)

S5 Fig. Heatmaps of choice frequency.

Heatmaps of chosen options in a) the Gabor feature of the conceptual task and b) the spatial location of the spatial task, aggregated over all participants. The color shows the frequency of each option centered on yellow representing random chance (1/64), with orange and red indicating higher than chance, while green and blue were lower than chance.

(EPS)

S6 Fig. Additional modeling results.

a) The relationship between mean performance and predictive accuracy, where in all cases, the best performing participants were also the best described. b) The best performing participants were also the most diagnostic between models, but not substantially skewed towards either model. Linear regression lines strongly overlap with the dotted line at y = 0, where participants above the line were better described by the GP model. c) Model comparison split by which task was performed first vs. second. In both cases, participants were better described on their second task, although the superiority of the GP over the BMT remains when comparing only task one (paired t-test: t(128) = 4.6, p < .001, d = 0.10, BF = 1685) or only task two (t(128) = 3.5, p < .001, d = 0.08, BF = 27).

(EPS)

S7 Fig. GP parameters and performance.

a) We do not find a consistent relationship between λ estimates and performance: estimates were anecdotally correlated with performance in the spatial task (rτ = .13, p = .030, BF = 1.2) but negatively correlated in the conceptual task (rτ = − .22, p < .001, BF > 100). b) Higher β estimates were strongly predictive of better performance in both conceptual (rτ = .32, p < .001, BF > 100) and spatial tasks (rτ = .31, p < .001, BF > 100). c) On the other hand, high temperature values predicted lower performance in both conceptual (rτ = − .59, p < .001, BF > 100) and spatial tasks (rτ = − .58, p < .001, BF > 100).

(EPS)

S8 Fig. GP exploration bonus and temperature.

We check here whether there exists any inverse relationship between directed and undirected exploration, implemented using the UCB exploration bonus β (x-axis) and the softmax temperature τ (y-axis), respectively. Results are split into conceptual (a) and spatial tasks (b), where each dot is a single participant and the dotted line indicates y = x. The upper axis limits are set to the largest 1.5 × IQR, for both β and τ, across both conceptual and spatial tasks.

(EPS)

S9 Fig. BMT parameters.

Each dot is a single participant and the dotted line indicates y = x. a) We found lower error variance (σϵ2) estimates in the conceptual task (Wilcoxon signed-rank test: Z = − 4.8, p < .001, r = − .42, BF > 100), suggesting participants were more sensitive to the reward values (i.e., made more substantial updates to their mean estimates). Error variance was also correlated across tasks (rτ = .18, p = .003, BF = 10). b) As with the GP model reported in the main text, we also found strong differences in exploration behavior in the BMT. We found lower estimates of the exploration bonus in the conceptual task (Z = − 5.9, p < .001, r = − .52, BF > 100). The exploration bonus was also somewhat correlated between tasks (rτ = .16, p = .006, BF = 4.8). c) Also in line with the GP results, we again find an increase in random exploration in the conceptual task (Z = − 6.9, p < .001, r = − .61, BF > 100). Once more, temperature estimates were strongly correlated (rτ = .34, p < .001, BF > 100).

(EPS)

S10 Fig. Shepard kernel parameters.

We also considered an alternative form of the GP model. Instead of modeling generalization as a function of squared Euclidean distance with the RBF kernel, this variant uses the Shepard kernel described in [65], which is based on Minkowski distance with a free parameter ρ ∈ [0, 2]. This model is identical to the GP model reported in the main text when ρ = 2, but when ρ < 2, the input dimensions transition from integral to separable representations [112]. The lack of clear differences in model parameters motivated us to only include the standard RBF kernel in the main text. a) We find no evidence for differences in generalization between tasks (Z = − 1.8, p = .039, r = − .15, BF = .32). There is also marginal evidence of correlated estimates (rτ = .13, p = .026, BF = 1.3). b) There is anecdotal evidence of lower ρ estimates in the conceptual task (Z = − 2.5, p = .006, r = − .22, BF = 2.0). The implication of a lower ρ in the conceptual domain is that the Gabor features were treated more independently, whereas the spatial dimensions were more integrated. However, the statistics suggest this is not a very robust effect. These estimates are also not correlated (rτ = − .02, p = .684, BF = .12). c) Consistent with all the other models, we find systematically lower exploration bonuses in the conceptual task (Z = − 5.5, p < .001, r = − .49, BF > 100). There was weak evidence of a correlation across tasks (rτ = .14, p = .021, BF = 1.6). d) We find clear evidence of higher temperatures in the conceptual task (Z = − 6.3, p < .001, r = − .56, BF > 100), with strong correlations across tasks (rτ = .41, p < .001, BF > 100).

(EPS)

S11 Fig. Comprehension questions for the conceptual task.

The correct answers are highlighted.

(TIFF)

S12 Fig. Comprehension questions for the spatial task.

The correct answers are highlighted.

(TIFF)

S1 Table. Mixed effects regression results: Previous reward.

(PDF)

S2 Table. Mixed effects regression results: Bonus round judgments.

(PDF)

Acknowledgments

We thank Daniel Reznik, Nicholas Franklin, Samuel Gershman, Christian Doeller, and Fiery Cushman for helpful discussions.

Data Availability

All data and analysis code is available from https://github.com/charleywu/cognitivemaps.

Funding Statement

ES is supported by the Max Planck Society and the Jacobs Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. James W. The Principles of Psychology. Dover, New York; 1890. [Google Scholar]
  • 2. Yates FA. Art of Memory. Routledge; 2013. [Google Scholar]
  • 3. Dresler M, Shirer WR, Konrad BN, Müller NC, Wagner IC, Fernández G, et al. Mnemonic training reshapes brain networks to support superior memory. Neuron. 2017;93:1227–1235. 10.1016/j.neuron.2017.02.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Landau B, Jackendoff R. Whence and whither in spatial language and spatial cognition? Behavioral and Brain Sciences. 1993;16:255–265. 10.1017/S0140525X00029927 [DOI] [Google Scholar]
  • 5. Lakoff G, Johnson M. Metaphors We Live By. University of Chicago press; 2008. [Google Scholar]
  • 6. Todd PM, Hills TT, Robbins TW. Cognitive search: Evolution, algorithms, and the brain. MIT press; 2012. [Google Scholar]
  • 7. Hills TT, Todd PM, Goldstone RL. Search in external and internal spaces: Evidence for generalized cognitive search processes. Psychological Science. 2008;19:802–808. 10.1111/j.1467-9280.2008.02160.x [DOI] [PubMed] [Google Scholar]
  • 8. Hills TT. Animal foraging and the evolution of goal-directed cognition. Cognitive Science. 2006;30:3–41. 10.1207/s15516709cog0000_50 [DOI] [PubMed] [Google Scholar]
  • 9. Constantinescu AO, O’Reilly JX, Behrens TE. Organizing conceptual knowledge in humans with a gridlike code. Science. 2016;352:1464–1468. 10.1126/science.aaf0941 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Garvert MM, Dolan RJ, Behrens TE. A map of abstract relational knowledge in the human hippocampal–entorhinal cortex. eLife. 2017;6:e17086 10.7554/eLife.17086 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Schuck NW, Cai MB, Wilson RC, Niv Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron. 2016;91:1402–1412. 10.1016/j.neuron.2016.08.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Aronov D, Nevers R, Tank DW. Mapping of a non-spatial dimension by the hippocampal–entorhinal circuit. Nature. 2017;543(7647):719 10.1038/nature21692 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Solomon EA, Lega BC, Sperling MR, Kahana MJ. Hippocampal theta codes for distances in semantic and temporal spaces. Proceedings of the National Academy of Sciences. 2019;116(48):24343–24352. 10.1073/pnas.1906729116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Sutton RS, Barto AG. Reinforcement learning: An introduction. Cambridge: MIT Press; 1998. [Google Scholar]
  • 15. Thorndike EL. Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements. 1898;2(4):i. [Google Scholar]
  • 16. Pavlov IP. Conditional reflexes: an investigation of the physiological activity of the cerebral cortex. Oxford University Press; 1927. [Google Scholar]
  • 17. Skinner BF. The behavior of organisms: An experimental analysis. Appleton-Century, New York; 1938. [Google Scholar]
  • 18. Dayan P. Improving generalization for temporal difference learning: The successor representation. Neural Computation. 1993;5:613–624. 10.1162/neco.1993.5.4.613 [DOI] [Google Scholar]
  • 19. Wu CM, Schulz E, Speekenbrink M, Nelson JD, Meder B. Generalization guides human exploration in vast decision spaces. Nature Human Behaviour. 2018;2:915––924. 10.1038/s41562-018-0467-4 [DOI] [PubMed] [Google Scholar]
  • 20. Stojić H, Schulz E, Analytis P P, Speekenbrink M. It’s new, but is it good? How generalization and uncertainty guide the exploration of novel options. Journal of Experimental Psychology: General. 2020;. [DOI] [PubMed] [Google Scholar]
  • 21. Tolman EC. Cognitive maps in rats and men. Psychological Review. 1948;55:189–208. 10.1037/h0061626 [DOI] [PubMed] [Google Scholar]
  • 22. Thorndyke PW. Distance estimation from cognitive maps. Cognitive psychology. 1981;13(4):526–550. 10.1016/0010-0285(81)90019-0 [DOI] [Google Scholar]
  • 23. O’Keefe J, Dostrovsky J. The hippocampus as a spatial map: Preliminary evidence from unit activity in the freely-moving rat. Brain research. 1971;. [DOI] [PubMed] [Google Scholar]
  • 24. O’Keefe J. A review of the hippocampal place cells. Progress in neurobiology. 1979;13(4):419–439. 10.1016/0301-0082(79)90005-4 [DOI] [PubMed] [Google Scholar]
  • 25. Hafting T, Fyhn M, Molden S, Moser MB, Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature. 2005;436(7052):801 10.1038/nature03721 [DOI] [PubMed] [Google Scholar]
  • 26. Taube JS, Muller RU, Ranck JB. Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. Journal of Neuroscience. 1990;10(2):420–435. 10.1523/JNEUROSCI.10-02-00420.1990 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Taube JS. Head direction cells and the neurophysiological basis for a sense of direction. Progress in neurobiology. 1998;55(3):225–256. 10.1016/S0301-0082(98)00004-5 [DOI] [PubMed] [Google Scholar]
  • 28. Lever C, Burton S, Jeewajee A, O’Keefe J, Burgess N. Boundary vector cells in the subiculum of the hippocampal formation. Journal of Neuroscience. 2009;29(31):9771–9777. 10.1523/JNEUROSCI.1319-09.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Solstad T, Boccara CN, Kropff E, Moser MB, Moser EI. Representation of geometric borders in the entorhinal cortex. Science. 2008;322(5909):1865–1868. 10.1126/science.1166466 [DOI] [PubMed] [Google Scholar]
  • 30. Høydal ØA, Skytøen ER, Andersson SO, Moser MB, Moser EI. Object-vector coding in the medial entorhinal cortex. Nature. 2019;568(7752):400 10.1038/s41586-019-1077-7 [DOI] [PubMed] [Google Scholar]
  • 31. Epstein RA, Patai EZ, Julian JB, Spiers HJ. The cognitive map in humans: spatial navigation and beyond. Nature neuroscience. 2017;20(11):1504 10.1038/nn.4656 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Behrens TE, Muller TH, Whittington JC, Mark S, Baram AB, Stachenfeld KL, et al. What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior. Neuron. 2018;100(2):490–509. 10.1016/j.neuron.2018.10.002 [DOI] [PubMed] [Google Scholar]
  • 33. Kaplan R, Schuck NW, Doeller CF. The role of mental maps in decision-making. Trends in Neurosciences. 2017;40:256–259. 10.1016/j.tins.2017.03.002 [DOI] [PubMed] [Google Scholar]
  • 34. Bellmund JL, Gärdenfors P, Moser EI, Doeller CF. Navigating cognition: Spatial codes for human thinking. Science. 2018;362(6415):eaat6766 10.1126/science.aat6766 [DOI] [PubMed] [Google Scholar]
  • 35. Eichenbaum H. Hippocampus: remembering the choices. Neuron. 2013;77(6):999–1001. 10.1016/j.neuron.2013.02.034 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Spiers HJ. The Hippocampal Cognitive Map: One Space or Many? Trends in Cognitive Sciences. 2020; 10.1016/j.tics.2019.12.013. [DOI] [PubMed] [Google Scholar]
  • 37. Schiller D, Eichenbaum H, Buffalo EA, Davachi L, Foster DJ, Leutgeb S, et al. Memory and space: towards an understanding of the cognitive map. Journal of Neuroscience. 2015;35(41):13904–13911. 10.1523/JNEUROSCI.2618-15.2015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Kraus BJ, Robinson RJ II, White JA, Eichenbaum H, Hasselmo ME. Hippocampal “time cells”: time versus path integration. Neuron. 2013;78(6):1090–1101. 10.1016/j.neuron.2013.04.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. MacDonald CJ, Carrow S, Place R, Eichenbaum H. Distinct hippocampal time cell sequences represent odor memories in immobilized rats. Journal of Neuroscience. 2013;33(36):14607–14616. 10.1523/JNEUROSCI.1537-13.2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Doeller CF, Barry C, Burgess N. Evidence for grid cells in a human memory network. Nature. 2010;463(7281):657 10.1038/nature08704 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Tavares RM, Mendelsohn A, Grossman Y, Williams CH, Shapiro M, Trope Y, et al. A map for social navigation in the human brain. Neuron. 2015;87(1):231–243. 10.1016/j.neuron.2015.06.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Schuck NW, Niv Y. Sequential replay of nonspatial task states in the human hippocampus. Science. 2019;364(6447):eaaw5181 10.1126/science.aaw5181 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Jacobs J, Weidemann CT, Miller JF, Solway A, Burke JF, Wei XX, et al. Direct recordings of grid-like neuronal activity in human spatial navigation. Nature neuroscience. 2013;16(9):1188 10.1038/nn.3466 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Schuck NW, Wilson R, Niv Y. A state representation for reinforcement learning and decision-making in the orbitofrontal cortex In: Goal-Directed Decision Making. Elsevier; 2018. p. 259–278. [Google Scholar]
  • 45. Niv Y. Learning task-state representations. Nature neuroscience. 2019;22(10):1544–1553. 10.1038/s41593-019-0470-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Kahnt T, Tobler PN. Dopamine regulates stimulus generalization in the human hippocampus. Elife. 2016;5:e12678 10.7554/eLife.12678 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Rasmussen C, Williams C. Gaussian Processes for Machine Learning Adaptive Computation and Machine Learning. MIT Press; 2006. [Google Scholar]
  • 48. Schulz E, Speekenbrink M, Krause A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. bioRxiv. 2017;. [Google Scholar]
  • 49. Auer P. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research. 2002;3(Nov):397–422. [Google Scholar]
  • 50. Wilson RC, Geana A, White JM, Ludvig EA, Cohen JD. Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General. 2014;143:155–164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Schulz E, Gershman SJ. The algorithmic architecture of exploration in the human brain. Current Opinion in Neurobiology. 2019;55:7–14. 10.1016/j.conb.2018.11.003 [DOI] [PubMed] [Google Scholar]
  • 52. Steyvers M, Lee MD, Wagenmakers EJ. A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology. 2009;53:168–179. 10.1016/j.jmp.2008.11.002 [DOI] [Google Scholar]
  • 53.Acuna D, Schrater P. Bayesian modeling of human sequential decision-making on the multi-armed bandit problem. In: Proceedings of the 30th annual conference of the cognitive science society. vol. 100. Washington, DC: Cognitive Science Society; 2008. p. 200–300.
  • 54. Bush RR, Mosteller F. A mathematical model for simple learning. Psychological Review. 1951;58:313 10.1037/h0054388 [DOI] [PubMed] [Google Scholar]
  • 55. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical conditioning II: Current research and theory. 1972;2:64–99. [Google Scholar]
  • 56. Gershman SJ. A unifying probabilistic view of associative learning. PLoS Computational Biology. 2015;11(11):e1004567 10.1371/journal.pcbi.1004567 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Schulz E, Konstantinidis E, Speekenbrink M. Putting bandits into context: How function learning supports decision making. Journal of experimental psychology: learning, memory, and cognition. 2018;44(6):927. [DOI] [PubMed] [Google Scholar]
  • 58. Carroll JD. Functional learning: The learning of continuous functional mappings relating stimulus and response continua. ETS Research Bulletin Series. 1963;1963:i–144. [Google Scholar]
  • 59. Lucas CG, Griffiths TL, Williams JJ, Kalish ML. A rational model of function learning. Psychonomic Bulletin & Review. 2015;22(5):1193–1215. 10.3758/s13423-015-0808-5 [DOI] [PubMed] [Google Scholar]
  • 60. Griffiths TL, Lucas C, Williams J, Kalish ML. Modeling human function learning with Gaussian processes. In: Advances in Neural Information Processing Systems; 2009. p. 553–560. [Google Scholar]
  • 61. Schulz E, Tenenbaum JB, Duvenaud D, Speekenbrink M, Gershman SJ. Compositional inductive biases in function learning. Cognitive Psychology. 2017;99:44–79. 10.1016/j.cogpsych.2017.11.002 [DOI] [PubMed] [Google Scholar]
  • 62. Koh K, Meyer DE. Function learning: Induction of continuous stimulus-response relations. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1991;17:811. [DOI] [PubMed] [Google Scholar]
  • 63. Busemeyer JR, Byun E, DeLosh EL, McDaniel MA. Learning functional relations based on experience with input-output pairs by humans and artificial neural networks In: Lamberts K, Shanks D, editors. Concepts and Categories. Cambridge: MIT Press; 1997. p. 405–437. [Google Scholar]
  • 64.Schulz E, Tenenbaum JB, Reshef DN, Speekenbrink M, Gershman S. Assessing the Perceived Predictability of Functions. In: Proceedings of the 37th Annual Meeting of the Cognitive Science Society. Cognitive Science Society; 2015. p. 2116–2121.
  • 65. Jäkel F, Schölkopf B, Wichmann FA. Similarity, kernels, and the triangle inequality. Journal of Mathematical Psychology. 2008;52:297–303. 10.1016/j.jmp.2008.03.001 [DOI] [Google Scholar]
  • 66. Poggio T, Bizzi E. Generalization in vision and motor control. Nature. 2004;431:768–774. 10.1038/nature03014 [DOI] [PubMed] [Google Scholar]
  • 67.Srinivas N, Krause A, Kakade SM, Seeger M. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. Proceedings of the 27th International Conference on Machine Learning (ICML 2010). 2010; p. 1015–1022.
  • 68. Gershman SJ. Deconstructing the human algorithms for exploration. Cognition. 2018;173:34–42. 10.1016/j.cognition.2017.12.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Speekenbrink M, Konstantinidis E. Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science. 2015;7:351–367. 10.1111/tops.12145 [DOI] [PubMed] [Google Scholar]
  • 70. Schulz E, Wu CM, Ruggeri A, Meder B. Searching for rewards like a child means less generalization and more directed exploration. Psychological Science. 2019;. 10.1177/0956797619863663 [DOI] [PubMed] [Google Scholar]
  • 71. Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ. Bayesian model selection for group studies. Neuroimage. 2009;46:1004–1017. 10.1016/j.neuroimage.2009.03.025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Rigoux L, Stephan KE, Friston KJ, Daunizeau J. Bayesian model selection for group studies—revisited. Neuroimage. 2014;84:971–985. 10.1016/j.neuroimage.2013.08.065 [DOI] [PubMed] [Google Scholar]
  • 73. Stachenfeld KL, Botvinick MM, Gershman SJ. The hippocampus as a predictive map. Nature Neuroscience. 2017;20:1643 EP –. 10.1038/nn.4650 [DOI] [PubMed] [Google Scholar]
  • 74. Russek EM, Momennejad I, Botvinick MM, Gershman SJ, Daw ND. Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS computational biology. 2017;13(9):e1005768 10.1371/journal.pcbi.1005768 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Bellmund JL, De Cothi W, Ruiter TA, Nau M, Barry C, Doeller CF. Deforming the metric of cognitive maps distorts memory. Nature Human Behaviour. 2020;4(2):177–188. 10.1038/s41562-019-0767-3 [DOI] [PubMed] [Google Scholar]
  • 76. Wu CM, Schulz E, Gershman SJ. Inference and search on graph-structured spaces. bioRxiv. 2020;. [Google Scholar]
  • 77.Machado MC, Rosenbaum C, Guo X, Liu M, Tesauro G, Campbell M. Eigenoption Discovery through the Deep Successor Representation. In: Proceedings of the International Conference on Learning Representations (ICLR); 2018.
  • 78. Mehta MR, Quirk MC, Wilson MA. Experience-dependent asymmetric shape of hippocampal receptive fields. Neuron. 2000;25(3):707–715. 10.1016/S0896-6273(00)81072-7 [DOI] [PubMed] [Google Scholar]
  • 79. Mok RM, Love BC. A non-spatial account of place and grid cells based on clustering models of concept learning. Nature communications. 2019;10(1):1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Mark S, Moran R, Parr T, Kennerley S, Behrens T. Transferring structural knowledge across cognitive maps in humans and models. bioRxiv. 2019;. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81. Sanders H, Wilson MA, Gershman SJ. Hippocampal Remapping as Hidden State Inference. BioRxiv. 2019;. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Whittington JC, Muller TH, Mark S, Chen G, Barry C, Burgess N, et al. The Tolman-Eichenbaum Machine: Unifying space and relational memory through generalisation in the hippocampal formation. bioRxiv. 2019; p. 770495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Liu H, Ong YS, Shen X, Cai J. When Gaussian process meets big data: A review of scalable GPs. IEEE Transactions on Neural Networks and Learning Systems. 2020;. [DOI] [PubMed] [Google Scholar]
  • 84.Wu CM, Schulz E, Garvert MM, Meder B, Schuck NW. Connecting conceptual and spatial search via a model of generalization. In: Rogers TT, Rau M, Zhu X, Kalish CW, editors. Proceedings of the 40th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2018. p. 1183–1188.
  • 85. Abbott JT, Austerweil JL, Griffiths TL. Random walks on semantic networks can resemble optimal foraging. Psychological Review. 2015;122(3):558–569. 10.1037/a0038693 [DOI] [PubMed] [Google Scholar]
  • 86. Hills TT, Jones MN, Todd PM. Optimal foraging in semantic memory. Psychological review. 2012;119(2):431 10.1037/a0027373 [DOI] [PubMed] [Google Scholar]
  • 87. Radulescu A, Niv Y, Ballard I. Holistic Reinforcement Learning: The Role of Structure and Attention. Trends in Cognitive Sciences. 2019;23:278–292. 10.1016/j.tics.2019.01.010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88. Collins AG, Frank MJ. Within-and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory. Proceedings of the National Academy of Sciences. 2018;115:2502–2507. 10.1073/pnas.1720963115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Ohl S, Rolfs M. Saccadic selection of stabilized items in visuospatial working memory. Consciousness and Cognition. 2018;64:32–44. 10.1016/j.concog.2018.06.016 [DOI] [PubMed] [Google Scholar]
  • 90. Austerweil JL, Sanborn S, Griffiths TL. Learning How to Generalize. Cognitive science. 2019;43(8). 10.1111/cogs.12777 [DOI] [PubMed] [Google Scholar]
  • 91. Hills TT, Todd PM, Goldstone RL. The central executive as a search process: Priming exploration and exploitation across domains. Journal of Experimental Psychology: General. 2010;139(4):590 10.1037/a0020666 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Findling C, Skvortsova V, Dromnelle R, Palminteri S, Wyart V. Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nature neuroscience. 2019;22(12):2066–2077. 10.1038/s41593-019-0518-9 [DOI] [PubMed] [Google Scholar]
  • 93. Cogliati Dezza I, Cleeremans A, Alexander W. Should we control? The interplay between cognitive control and information integration in the resolution of the exploration-exploitation dilemma. Journal of Experimental Psychology: General. 2019;. [DOI] [PubMed] [Google Scholar]
  • 94.Wu CM, Schulz E, Gerbaulet K, Pleskac TJ, Speekenbrink M. Under pressure: The influence of time limits on human exploration. In: Goel AK, Seifert CM, Freksa C, editors. Proceedings of the 41st Annual Conference of the Cognitive Science Society. Montreal, QB: Cognitive Science Society; 2019. p. 1219––1225.
  • 95. Tomov M, Schulz E, Gershman SJ. Multi-Task Reinforcement Learning in Humans. bioRxiv. 2019; p. 815332. [DOI] [PubMed] [Google Scholar]
  • 96.Wu CM, Schulz E, Speekenbrink M, Nelson JD, Meder B. Mapping the unknown: The spatially correlated multi-armed bandit. In: Proceedings of the 39th Annual Meeting of the Cognitive Science Society; 2017. p. 1357–1362.
  • 97. Courville AC, Daw ND. The rat as particle filter. In: Advances in neural information processing systems; 2008. p. 369–376. [Google Scholar]
  • 98. Navarro DJ, Tran P, Baz N. Aversion to option loss in a restless bandit task. Computational Brain & Behavior. 2018;1(2):151–164. 10.1007/s42113-018-0010-8 [DOI] [Google Scholar]
  • 99. Mullen K, Ardia D, Gil DL, Windover D, Cline J. DEoptim: An R package for global optimization by differential evolution. Journal of Statistical Software. 2011;40(6):1–26. [Google Scholar]
  • 100. Jeffreys H. The theory of probability. OUP Oxford; 1998. [Google Scholar]
  • 101. Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review. 2009;16:225–237. [DOI] [PubMed] [Google Scholar]
  • 102. van Doorn J, Ly A, Marsman M, Wagenmakers EJ. Bayesian Latent-Normal Inference for the Rank Sum Test, the Signed Rank Test, and Spearman’s ρ. arXiv preprint arXiv:171206941. 2017;. [Google Scholar]
  • 103. Jeffreys H. The Theory of Probability. Oxford, UK: Oxford University Press; 1961. [Google Scholar]
  • 104. Ly A, Verhagen J, Wagenmakers EJ. Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology. 2016;72:19–32. 10.1016/j.jmp.2015.06.004 [DOI] [Google Scholar]
  • 105. van Doorn J, Ly A, Marsman M, Wagenmakers EJ. Bayesian inference for Kendall’s rank correlation coefficient. The American Statistician. 2018;72:303–308. 10.1080/00031305.2016.1264998 [DOI] [Google Scholar]
  • 106.Zellner A, Siow A. Posterior odds ratios for selected regression hypotheses. In: Bernardo JM, Lindley DV, Smith AFM, editors. Bayesian Statistics: Proceedings of the First International Meeting held in Valencia (Spain). University of Valencia; 1980. p. 585–603.
  • 107. Rouder JN, Morey RD, Speckman PL, Province JM. Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology. 2012;56:356–374. 10.1016/j.jmp.2012.08.001 [DOI] [Google Scholar]
  • 108. Bürkner PC. brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software. 2017;80(1):1–28. [Google Scholar]
  • 109. Hoffman MD, Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research. 2014;15:1593–1623. [Google Scholar]
  • 110. Barr DJ, Levy R, Scheepers C, Tily HJ. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of memory and language. 2013;68(3):255–278. 10.1016/j.jml.2012.11.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge university press; 2006. [Google Scholar]
  • 112.Austerweil J, Griffiths T. Learning hypothesis spaces and dimensions through concept learning. In: Proceedings of the Annual Meeting of the Cognitive Science Society. vol. 32; 2010.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008149.r001

Decision Letter 0

Daniele Marinazzo

5 Mar 2020

Dear Mr. Wu,

Thank you very much for submitting your manuscript "Similarities and differences in spatial and non-spatial cognitive maps" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers.

The paper was overall well received, but some important issues need to be addressed, in particular involving better motivating your approach and situating it in the state of the art, together with commenting on its generalizability.

Concerning the statistics and the report of the results:

- please always report all the data points, as you do in most figures, instead of bar plots with confidence bars

- removing outliers messes with the degrees of freedom. Several alternative approaches exist, excellent robust alternatives are proposed in this paper Wilcox, R. R., & Rousselet, G. A. (2018). A Guide to Robust Statistical Methods in Neuroscience. Current Protocols in Neuroscience, 82(1). doi:10.1002/cpns.41 (open access here https://www.biorxiv.org/content/10.1101/151811v1). The same paper also suggests multivariate robust linear regression as an alternative to ANOVA

- some data appear to be distributed in a very non-linear way, questioning the linear fit

In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The manuscript “Similarities and differences in spatial and non-spatial cognitive maps” claims that spatial and conceptual cognitive maps are fundamentally different. The authors propose that the two domains are dealt with differently in important and characteristic ways. One of the main differences is claimed to be that conceptual cognition involves more random exploration while spatial cognition involves more uncertainty-driven exploration. The authors also discovered that experience with space helps with concepts (higher accuracy), but not vice versa. To explain and analyse their results the authors use a Gaussian process model.

Overall, I am unsure about how generalisable these results are, given the authors have not sufficiently proposed any strong theoretical constraints on their hypotheses. I would have loved to have seen a higher-level theoretical account as to why they designed their experiment the way they did, why they suspected sampling strategies would differ (or perhaps they did not?), and especially why they chose the stimuli used. I will elaborate below what I mean and how I think some of these issues can be addressed.

Firstly, in terms of the stimuli: I believe that Gabor patches are not confirmed to be homogeneously mappable to 2D. This might indicate that while participants can achieve high accuracy in the training task (which was matching a stimulus to target stimulus by moving in 2D space) the higher cognitive demands of the test phase (which involves exploration as well) would impair participants’ accuracy if Gabor patches are harder to map onto 2D. In other words, due to the nature of the stimuli the test phase could be harder for Gabor patches. To assuage my worries, the authors could show that the stimuli in the two cases (spatial vs conceptual) are indeed 2D and indeed homogeneously distributed in their respective domains. What I mean is that Gabor patches might not all be equally easy to tell apart as a function of their frequency and orientation. And therefore, arguably, the spatial case could be seen as more homogenous. One way to explore this is to first ascertain if the stimuli are 2D — something like Ahlheim and Love (2018, the code is open source) could be run on the raw pixels of what the participants see to ensure both sets of stimuli have the same dimensionality. Alternatively, other methods of investigating this are possible. After that, assuming that both spaces are found to be (roughly) 2D, the issue of the homogeneity of the spaces can be addressed (thank you to Sebastian Bobadilla Suarez for input on this issue). Is moving e.g., one step in 2D frequency/orientation space also one step in the stimulus space of the Gabor stimuli? Mutatis mutandis for the spatial case, of course, which I suspect is 2D and homogenous.

Secondly, as mentioned above, I believe that some attention needs to be paid to other models, like Kohonen maps, e.g., the work in Mok and Love (2019). It might prove useful to give a few sentences on such models’ computational properties in order to understand what the paper sets out to investigate: the computational overlap between special and conceptual cognitive processing and what such an overlap might imply. In other words, given the authors are interested in the computational nature of cognitive maps some mention of modelling maps (explicitly) computationally is pertinent. Furthermore, it might help address, or at least contextualise, some of the ideas around the one-directional facilitation effect found and provide a formalisable structure and plan for future work.

Thirdly, if all the above is addressed, I would be more comfortable with the claims that this is a “fundamental difference in how people represent or reason about spatial and conceptual domains” but still not completely. Arguably people have vastly more experience with a 2D spatial domain than the domain of Gabor patches. Even the input modality is more easily mappable onto the spatial than conceptual domain since participants used the keyboard arrows in both tasks. I do not believe this is a fatal flaw in the paper, but it is something that has to be touched on: input space and task space are aligned more so in the spatial case than the concept case.

Code: I am having trouble running your code. I suggest the first step is to tidy up your code according to R best practices and especially in terms of dependencies by including a DESCRIPTION file and name these requirements, explaining how to install them in your README file. Also mention the version of R you used to create and run your codebase. See: http://r-pkgs.had.co.nz/description.html#dependencies — as well as: https://github.com/ropensci/Rclean and https://github.com/ironholds/urltools as examples of good practice to copy from.

Minor: Figure 3 panel d has a typo, should be “Bonus”.

References

Ahlheim, C., & Love, B. C. (2018). Estimating the functional dimensionality of neural representations. NeuroImage, 179, 51-62.

Mok, R. M., & Love, B. C. (2019). A non-spatial account of place and grid cells based on clustering models of concept learning. Nature communications, 10(1), 1-9.

Reviewer #2: I enjoyed reading this paper. I think trying to understand computational differences in how individuals reason and generalize in spatial and non-spatial maps explore is an important problem in cognitive science. The study presented in the paper finds some intriguing similarities and differences between generalization and exploration in these domains that I think will be inspiring for future research. In general, the analysis are very well presented and I appreciate the thorough investigation of the behavioral data in a model-free manner in addition to the sophisticated model-fitting. I only have a few critiques.

1. Comparisons in generalization and exploration parameters between the two tasks rely on distances between the two stimuli meaning the same thing between the two tasks. There’s no reason though that this should inherently be the case. The authors use evidence from the training task to argue that there are not perceptual discriminability differences between the stimuli for the two tasks. I’m not sure I understood though how this would address the question of whether distances between stimuli are comparable.

As a side point here, I’m not sure I fully understood the training task and what exactly data from it is meant to show. Was the target on the screen while the subjects navigated to it, or were subjects required to hold the target in memory? Additionally, in order to receive a correct response, were subjects required to take the shortest path to the target, or merely to arrive at the target eventually - perhaps this would bear on whether subjects intuited a map-like distance in the conceptual space that is similar to the map distance in spatial space?

Related to the question about how we can know whether distances between tasks are equivalent, I’m not sure it’s fair to assume that distances across the two dimensions of the conceptual stimuli mean the same thing (as i think is assumed by the RBF kernel in the GP model.) I think the authors should either address this with further analysis, or discuss whether this assumption being untrue would change interpretation of parameters from the model.

If it is not possible to address these concerns with further analysis, I think the paper is still valuable and interesting, but I think the authors should address, in the discussion, whether this concern poses limitations in the interpretation of parameter similarities and differences.

2. I appreciated the in depth model-free analysis (Behavioral Results section) prior to the modeling analysis. However it seems that a number of the features of the data that are presented in the model-free are not addressed again in the modeling section. This leads to the impression that perhaps there are aspects of behavior that the GP model is not picking up.

In particular, I was wondering whether differences in model parameters (between environments and also between tasks) can account for the following features in the data:

- That participants get more rewards in smooth compared to rough domains

- The one-directional transfer effect that subjects conceptual performance benefits from first performing the spatial task.

Relatedly, in comparing human and model learning curves (figure 3c) it appears that humans outperform the model in smooth, but not in rough environments. Why does the model fail to capture learning curves as well in smooth environments?

I think it is fine if the model cannot account for all these differences. But it would be useful for the reader for the paper to clearly state what aspects of the model-free analysis the GP models can and cannot account for.

3. Lastly, for comparison of exploration parameters, I think the difference in directed exploration between environments is a really interesting difference between tasks. However, I question whether it is fair to interpret differences in the random exploration parameter as a strategy difference. This is because, I presume, that errors in model prediction of behavior get soaked up into that parameter. If this is correct, couldn’t the paper equivalently just state that the model fits worse in the conceptual task than the spatial task?

Reviewer #3: Wu and colleagues tested human participants on a spatial and conceptual task to assess whether similar cognitive mechanisms are used across these domains. They applied computational modelling and found shared and different processes, suggesting some processes are shared whereas other processes might be distinct.

This is an exceptionally well conducted study with a clear rationale, strong analyses and modelling work. I believe this will be a great paper for PLOS computational biology, after minor revisions. Mainly, I have some questions to clarify parts of the paper, and on how the works compares to some of the current literature. I also include some few suggestions that I hope will help the paper, if space allows.

Task design

1. Smooth vs rough designs: Is the reason why people get more rewards on smooth conditions because there is more rewards overall across the map, or is it actually that they do better in smooth environments because they learnt it?

I ask because it looks like smooth environments have more rewards (looking at the maps in S2 - more yellow cells). But it also sounds like the environments are normalized to have the same overall expected reward - so would that mean each yellow cell carries lower rewards in the smooth versus rough environments? But if that's the case, then it's much harder to get higher rewards in the smooth conditions (since the highest reward per choice is lower?)

Results

2. Do participants attend to one feature dimension more than another (e.g. Nosofsky, 1986 - you could check whether they weighted one dimension more than another / were more sensitive to one dimension)? Probably not for spatial, but maybe for the Gabors? Does that affect anything / is it maybe harder to generalize to space if so?

3. Related to the above point - transfer - is there an analysis to show why there is transfer for space --> Gabors but not vice versa? E.g., in general: people who learn better on space --> transfer more. If so, maybe people who learn better on Gabors also transfer more to space (even though the main effect is not there). Possibly, those who show more 'equal' attention to both dimensions on the Gabors show better transfer?

Model results:

4. The results of 3A are convincing but show that the BMT model also does well, whereas 3B looks like it's doing very poorly; why is this? I realize A is predicting novel choices, and B is model fit. Is the BMT actually doing a very bad job of fitting, but still getting pretty good predictive accuracy?

Model questions / considerations:

5. Do the rewards have to be spatially correlated for the GP model to work well / generalize? Comparing the BMT (point estimate) versus the GP model, it makes sense that the function learning approach will do better than a point-estimate approach when locations in the space are correlated. Would your approach do as well if they were less/not (spatially) correlated? I.e., would the GP model still work with reward structures that were not, e.g., distributed smoothly?

My question is: if there is structure, but it is not spatially smooth at all (rough is still quite smooth), would the GP still learn it? E.g., across blocks, you change the rewarded locations, but the relations between the reward locations are kept the same. Or would a point-estimate model do as well?
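To illustrate the distinction being drawn here, a minimal one-dimensional sketch of the two classes of model follows: a GP regression with an RBF kernel spreads observed rewards to nearby, unvisited options, whereas a point-estimate tracker only updates the options actually sampled and otherwise falls back to its prior mean. This is not the authors' implementation (the actual BMT also tracks uncertainty, and both models feed into a choice rule); the lengthscale, prior mean, and example rewards below are arbitrary.

```python
import numpy as np

def rbf(x1, x2, lam):
    # k(x, x') = exp(-(x - x')^2 / (2 * lam^2)); the exact scaling of lam is assumed here
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / lam) ** 2)

# Rewards observed at three of ten options along a line, with spatial correlation
x_obs = np.array([2.0, 3.0, 8.0])
y_obs = np.array([0.9, 0.8, 0.1])
x_all = np.arange(10.0)

# GP regression posterior mean: spreads observed rewards to nearby, unvisited options
K = rbf(x_obs, x_obs, lam=1.5) + 1e-4 * np.eye(3)   # small jitter for numerical stability
gp_mean = rbf(x_all, x_obs, lam=1.5) @ np.linalg.solve(K, y_obs)

# Point-estimate tracker: unvisited options keep a prior mean (0.5 here), no generalization
tracker_mean = np.full(10, 0.5)
tracker_mean[x_obs.astype(int)] = y_obs

print(np.round(gp_mean, 2))       # estimates are elevated around x = 2-3, even where nothing was observed
print(np.round(tracker_mean, 2))  # only the three visited options deviate from the prior
```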

6. In the Discussion: "Comparing different computational models of learning and exploration, we found that a Gaussian Process model that incorporated distance-based generalization, and hence a cognitive map of similarities, best predicted participants behavior in both domains. "

The authors only compare with one model - the BMT, a Bayesian model that tracks point estimates of rewards. How about other models? The claim that the GP is a good model because it captures generalization is fine, but maybe place less emphasis on it as the only model? E.g., can other models solve the task? Maybe there are good theoretical reasons this model is better anyway - the authors could at least discuss this and why it is better than / different from other models.

Theoretically, would a Gaussian mixture model or clustering model not also work for structures where the rewards are spatially correlated like this? You'd get generalization to new parts of the space if it learns the centres of the reward regions, though I'm not sure if it would learn as quickly as is needed (compared to GPs).

To be clear, I am not asking the authors to run all the models, but to state what they show (they can show good generalization, but not that this is the only model that can do so). It could make sense to discuss other models and maybe why they would not work, if that is the case.

Related literature

7. How does this relate to other ideas about the neural underpinnings, e.g. grid cells? For example, the successor representation (e.g. Stachenfeld et al., 2017; Momennejad & Howard, 2018), clustering (Mok & Love, 2019), Gaussian/Bayesian mixture models and more (e.g. Sanders, Wilson, Gershman 2019, bioRxiv), and spatial-conceptual accounts (Bellmund et al., 2019 - cited, but a relevant comparison). Would these models do well at your task, or is the GP the only one that could capture the data? Are there any predictions or interpretations of the GP for neural data?

Suggestions

Abstract:

1. Key findings include both similarities and differences between cognitive mechanisms for the two tasks - the differences are described well but the similarities are a bit vague: "Using a Bayesian learning model, we find evidence for the same computational mechanisms of generalization across domains." If there is space, I suggest the authors add a sentence stating what they find - that there are no/little differences between the parameters of this model across tasks, and that they are correlated across participants (or qualify which parameters were not different/correlated).

Introduction

2. An explanation of what 'generalization' is assumed to mean in the cited work and in the current work. Something simple would already help, e.g.: when people learn or gain an understanding of an environment, they can generalize in the sense that they know what the value of novel options is and can select them even though they have never experienced them.

3. How are cognitive maps/generalization related to directed vs random exploration? I am not sure this is addressed in the introduction, though the authors set out to test it.

4. The reader would benefit from a short introduction to GPs and the motivation for using them - why are they a good model to be used here? How are they related to previous ideas mentioned in the intro? This could be very brief, since there is more in the Results.

Minor:

5. Suggestion: Figure 1 - why not show all the Gabors to illustrate the structure of the stimulus space and its correspondence to the spatial structure? It is currently not clear that this is the case. It could be nice to have supplementary figure S1 included here, if it fits.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Olivia Guest

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008149.r003

Decision Letter 1

Daniele Marinazzo

25 May 2020

Dear Mr. Wu,

Thank you very much for submitting your manuscript "Similarities and differences in spatial and non-spatial cognitive maps" for consideration at PLOS Computational Biology.

We appreciate the changes that you made to your manuscript following the recommendations, and two reviewers are fully satisfied by them.

On the other hand, Dr. Guest feels that some major concerns have not been addressed, and I agree. Now, it can be that there is a misunderstanding in the conception of these issues, or in the wording used to address them, or even that you don't feel that these issues need to be addressed, or to be addressed as suggested by Dr. Guest. Or a mixture of all this.

I think that it's important to disambiguate these issues and to agree (even agree to disagree) on the remaining issues.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: My review is uploaded as a PDF attachment.

Reviewer #2: The authors have adequately addressed all of my concerns. I think the revised paper is strong.

Reviewer #3: Thank you for writing this detailed and clear response, engaging with my questions, and the effort on all the extra analyses.

The authors have answered all my questions and have improved the paper with clarifications and additional analyses.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Olivia Guest

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: Review.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008149.r005

Decision Letter 2

Daniele Marinazzo

13 Jul 2020

Dear Mr. Wu,

We are pleased to inform you that your manuscript 'Similarities and differences in spatial and non-spatial cognitive maps' has been provisionally accepted for publication in PLOS Computational Biology.

Some disagreements remain, and since in general reviewers are proxies for the wider community of readers, I hope you will agree to make the reviews public and to engage with any subsequent comments (this should be true in general for any research product anyway).

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology


***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for apologising for their confusing previous response letter, however I am not entirely sure this more recent one is a huge improvement. I am not used to dealing with differences between phraseology in the letter versus the manuscript. Ideally, the authors should be consistent because both manuscript and letter reflect their internal ideas about their work. Are they going to publish a very carefully written piece (this one) but then talk about it in presentations using language that is so generalised that they will get the reactions they got from me in my previous reply? (Anyway, this is a rhetorical question for the editor and authors.)

Some of the changes to the manuscript are appropriate and useful — I am glad the authors spent time attempting to improve their paper — while in other cases the authors have opted not to modify or clarify their prose. Given this, they likely do not want to edit their manuscript (in those cases) because they disagree with my perspectives as given in my previous review, which is of course understandable and totally within their remit. So while I think some of the changes made are appropriate, some of the more important points I raised still stand.

To end, I appreciate some of the changes the authors have made, but we obviously disagree on some, perhaps very core, issues. Since it’s not a productive use of time to just rehash our previous disagreements (which are in my previous review, of course), I will close by saying that I wish the authors and their manuscript all the best and that I have nothing further to add.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Olivia Guest

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008149.r006

Acceptance letter

Daniele Marinazzo

20 Aug 2020

PCOMPBIOL-D-20-00183R2

Similarities and differences in spatial and non-spatial cognitive maps

Dear Dr Wu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom | ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Gabor stimuli.

    Tilt varies from left to right from 105° to 255° in equally spaced intervals, while stripe frequency increases moving upwards from 1.5 to 15 in log intervals.

    (TIFF)

    S2 Fig. Correlated reward environments.

    Heatmaps of the reward environments used in both spatial and conceptual domains. The color of each tile represents the expected reward of the bandit, where the x-axis and y-axis were mapped to the spatial location or the tilt and stripe frequency (respectively). All environments have the same minimum and maximum reward values, and the two classes of environments share the same expectation of reward across options.

    (EPS)

    S3 Fig. Training phase.

    a) Trials needed to reach the learning criterion (90% accuracy over 10 trials) in the training phase, where the dotted line indicates the 32 trial minimum. Each dot is a single participant with lines connecting the same participant. Tukey boxplots show median (line) and 1.5x IQR, with diamonds indicating group means. b) Average correct choices during the training phase. In the last 10 trials before completing the training phase, participants had a mean accuracy of 95.0% on the spatial task and 92.7% on the conceptual task (difference of 2.3%). In contrast, in the first 10 trials of training, participants had a mean accuracy of 84.1% in the spatial task and 68.8% in the conceptual (difference of 15.4%). c) Heatmaps of the accuracy of different target stimuli, where the x and y-axes of the conceptual heatmap indicate tilt and stripe frequency, respectively. d) The probability of error as a function of the magnitude of error (Manhattan distance from the correct response). Thus, most errors were close to the target, with higher magnitude errors being monotonically less likely to occur.

    (EPS)

    S4 Fig. Search trajectories.

    a) Distribution of trajectory length, separated by task and environment. The dashed vertical line indicates the median for each category. Participants had longer trajectories in the conceptual task (t(128) = − 10.7, p < .001, d = 1.0, BF > 100), but there were no differences across environments (t(127) = 1.3, p = .213, d = 0.2, BF = .38). b) Average reward value as a function of trajectory length. Longer trajectories were correlated with higher rewards (r = .23, p < .001, BF > 100). Each dot is a mean with error bars showing the 95% CI. c) Distance from the random initial starting point in each trial as a function of the previous reward value. Each dot is the aggregate mean, while the lines show the fixed effects of a Bayesian mixed-effects model (see S1 Table), with the ribbons indicating the 95% CI. The relationship is not quite linear, but is also found using a rank correlation (rτ = .18, p < .001, BF > 100). The dashed line indicates random chance. d) Search trajectories decomposed into the vertical/stripe frequency dimension vs. horizontal/tilt dimension. Bars indicate group means and error bars show the 95% CI. We find more attention given to the vertical/stripe frequency dimension in both tasks, with a larger effect for the conceptual task (F(1, 127) = 26.85, p < .001, η2 = .08, BF > 100), but no difference across environments (F(1, 127) = 1.03, p = .311, η2 = .005, BF = 0.25). e) We compute attentional bias as Δdim = P(vertical/stripe frequency) − P(horizontal/tilt), where positive values indicate a stronger bias towards the vertical/stripe frequency dimension. Attentional bias was influenced by the interaction of task order and task (F(1, 127) = 8.1, p = .005, η2 = .02, BF > 100): participants were more biased towards the vertical/stripe frequency dimension in the conceptual task when the conceptual task was performed first (t(66) = − 6.0, p < .001, d = 0.7, BF > 100), but these differences disappeared when the spatial task was performed first (t(61) = − 1.6, p = .118, d = 0.2, BF = .45). f) Differences in attention and score. Each participant is represented as a pair of dots, where the connecting line shows the change in score and Δdim across tasks. We found a negative correlation between score and attention for the conceptual task only in the conceptual first order (rτ = − .31, p < .001, BF > 100), but not in the spatial first order (rτ = − .07, p = .392, BF = .24). There were no relationships between score and attention in the spatial task in either order (spatial first: rτ = .03, p = .738, BF = .17; conceptual first: rτ = − .03, p = .750, BF = .17).

    (EPS)
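As a side note on the Δdim measure defined in the S4 Fig caption above, the computation can be sketched in a few lines. How the published analysis decomposes each trajectory into per-dimension movement proportions is not fully specified in the caption, so the decomposition below (normalized absolute step sizes along each dimension) is an assumption; the function name and toy trajectory are likewise illustrative.

```python
import numpy as np

def attention_bias(trajectory):
    """Delta_dim = P(vertical / stripe-frequency movement) - P(horizontal / tilt movement).

    `trajectory` is a sequence of (horizontal, vertical) grid positions visited in a trial.
    Decomposing movement into per-dimension proportions of total absolute step size is an
    assumption of this sketch, not necessarily the decomposition used in the paper.
    """
    steps = np.diff(np.asarray(trajectory, dtype=float), axis=0)
    horizontal = np.sum(np.abs(steps[:, 0]))  # movement along the horizontal / tilt dimension
    vertical = np.sum(np.abs(steps[:, 1]))    # movement along the vertical / stripe-frequency dimension
    total = horizontal + vertical
    return (vertical - horizontal) / total if total > 0 else 0.0

# A toy trajectory that moves mostly along the stripe-frequency dimension
print(attention_bias([(0, 0), (0, 2), (1, 2), (1, 5)]))   # positive value: vertical bias
```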

    S5 Fig. Heatmaps of choice frequency.

    Heatmaps of chosen options in a) the Gabor feature of the conceptual task and b) the spatial location of the spatial task, aggregated over all participants. The color shows the frequency of each option centered on yellow representing random chance (1/64), with orange and red indicating higher than chance, while green and blue were lower than chance.

    (EPS)

    S6 Fig. Additional modeling results.

    a) The relationship between mean performance and predictive accuracy, where in all cases, the best performing participants were also the best described. b) The best performing participants were also the most diagnostic between models, but not substantially skewed towards either model. Linear regression lines strongly overlap with the dotted line at y = 0, where participants above the line were better described by the GP model. c) Model comparison split by which task was performed first vs. second. In both cases, participants were better described on their second task, although the superiority of the GP over the BMT remains, comparing only task one (paired t-test: t(128) = 4.6, p < .001, d = 0.10, BF = 1685) or only task two (t(128) = 3.5, p < .001, d = 0.08, BF = 27).

    (EPS)

    S7 Fig. GP parameters and performance.

    a) We do not find a consistent relationship between λ estimates and performance, which were anecdotally correlated in the spatial task (rτ = .13, p = .030, BF = 1.2) but negatively correlated in the conceptual task (rτ = − .22, p < .001, BF > 100). b) Higher β estimates were strongly predictive of better performance in both conceptual (rτ = .32, p < .001, BF > 100) and spatial tasks (rτ = .31, p < .001, BF > 100). c) On the other hand, high temperature values predicted lower performance in both conceptual (rτ = − .59, p < .001, BF > 100) and spatial tasks (rτ = − .58, p < .001, BF > 100).

    (EPS)

    S8 Fig. GP exploration bonus and temperature.

    We check here whether there exists any inverse relationship between directed and undirected exploration, implemented using the UCB exploration bonus β (x-axis) and the softmax temperature τ (y-axis), respectively. Results are split into conceptual (a) and spatial tasks (b), where each dot is a single participant and the dotted line indicates y = x. The upper axis limits are set to the largest 1.5 × IQR, for both β and τ, across both conceptual and spatial tasks.

    (EPS)

    S9 Fig. BMT parameters.

    Each dot is a single participant and the dotted line indicates y = x. a) We found lower error variance (σϵ2) estimates in the conceptual task (Wilcoxon signed-rank test: Z = − 4.8, p < .001, r = − .42, BF > 100), suggesting participants were more sensitive to the reward values (i.e., more substantial updates to their mean estimates). Error variance was also correlated across tasks (rτ = .18, p = .003, BF = 10). b) As with the GP model reported in the main text, we also found strong differences in exploration behavior in the BMT. We found lower estimates of the exploration bonus in the conceptual task (Z = − 5.9, p < .001, r = − .52, BF > 100). The exploration bonus was also somewhat correlated between tasks (rτ = .16, p = .006, BF = 4.8). c) Also in line with the GP results, we again find an increase in random exploration in the conceptual task (Z = − 6.9, p < .001, r = − .61, BF > 100). Once more, temperature estimates were strongly correlated (rτ = .34, p < .001, BF > 100).

    (EPS)

    S10 Fig. Shepard kernel parameters.

    We also considered an alternative form of the GP model. Instead of modeling generalization as a function of squared-Euclidean distance with the RBF kernel, we use the Shepard kernel described in [65], where we instead use Minkowski distance with the free parameter ρ ∈ [0, 2]. This model is identical to the GP model reported in the main text when ρ = 2. But when ρ < 2, the input dimensions transition from integral to separable representations [112]. The lack of clear differences in model parameters motivated us to only include the standard RBF kernel in the main text. a) We find no evidence for differences in generalization between tasks (Z = − 1.8, p = .039, r = − .15, BF = .32). There is also marginal evidence of correlated estimates (rτ = .13, p = .026, BF = 1.3). b) There is anecdotal evidence of lower ρ estimates in the conceptual task (Z = − 2.5, p = .006, r = − .22, BF = 2.0). The implication of a lower ρ in the conceptual domain is that the Gabor features were treated more independently, whereas the spatial dimensions were more integrated. However, the statistics suggest this is not a very robust effect. These estimates are also not correlated (rτ = − .02, p = .684, BF = .12). c) Consistent with all the other models, we find systematically lower exploration bonuses in the conceptual task (Z = − 5.5, p < .001, r = − .49, BF > 100). There was weak evidence of a correlation across tasks (rτ = .14, p = .021, BF = 1.6). d) We find clear evidence of higher temperatures in the conceptual task (Z = − 6.3, p < .001, r = − .56, BF > 100), with strong correlations across tasks (rτ = .41, p < .001, BF > 100).

    (EPS)
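As a side note on the S10 Fig caption above, the role of the Minkowski exponent ρ can be illustrated with a minimal sketch: at ρ = 2 the distance is Euclidean and the kernel below matches an RBF kernel, consistent with the statement that the Shepard-kernel model is identical to the main-text GP model in that case. The function name and the exact scaling of λ are assumptions for illustration; the parameterization in [65] may differ.

```python
import numpy as np

def minkowski_kernel(x1, x2, lam, rho):
    """Similarity as a function of Minkowski distance with exponent rho in [0, 2].

    At rho = 2 the distance is Euclidean and the kernel matches an RBF kernel;
    at rho = 1 it is a city-block distance. The exact scaling of lam, and whether
    the squared distance enters the exponent, are assumptions of this sketch.
    """
    d = np.sum(np.abs(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)) ** rho) ** (1.0 / rho)
    return float(np.exp(-(d ** 2) / (2.0 * lam ** 2)))

a, b = (1, 1), (3, 4)
print(minkowski_kernel(a, b, lam=2.0, rho=2.0))  # Euclidean / more integral treatment of the dimensions
print(minkowski_kernel(a, b, lam=2.0, rho=1.0))  # city-block / more separable treatment
```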

    S11 Fig. Comprehension questions for the conceptual task.

    The correct answers are highlighted.

    (TIFF)

    S12 Fig. Comprehension questions for the spatial task.

    The correct answers are highlighted.

    (TIFF)

    S1 Table. Mixed effects regression results: Previous reward.

    (PDF)

    S2 Table. Mixed effects regression results: Bonus round judgments.

    (PDF)

    Attachment

    Submitted filename: Neurogrid_resubmission_letter.pdf

    Attachment

    Submitted filename: Review.pdf

    Attachment

    Submitted filename: reviewerRebuttal2ndresubmission.pdf

    Data Availability Statement

    All data and analysis code is available from https://github.com/charleywu/cognitivemaps.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS

    RESOURCES