eLife. 2024 Feb 29;12:RP84141. doi: 10.7554/eLife.84141

Endotaxis: A neuromorphic algorithm for mapping, goal-learning, navigation, and patrolling

Tony Zhang 1, Matthew Rosenberg 1,2, Zeyu Jing 1, Pietro Perona 3, Markus Meister 1
Editors: Srdjan Ostojic4, Timothy E Behrens5
PMCID: PMC10911395  PMID: 38420996

Abstract

An animal entering a new environment typically faces three challenges: explore the space for resources, memorize their locations, and navigate towards those targets as needed. Here we propose a neural algorithm that can solve all these problems and operates reliably in diverse and complex environments. At its core, the mechanism makes use of a behavioral module common to all motile animals, namely the ability to follow an odor to its source. We show how the brain can learn to generate internal “virtual odors” that guide the animal to any location of interest. This endotaxis algorithm can be implemented with a simple 3-layer neural circuit using only biologically realistic structures and learning rules. Several neural components of this scheme are found in brains from insects to humans. Nature may have evolved a general mechanism for search and navigation on the ancient backbone of chemotaxis.

Research organism: None

Introduction

Animals navigate their environment to look for resources – such as shelter, food, or a mate – and exploit such resources once they are found. Efficient navigation requires knowing the structure of the environment: which locations are connected to which others (Tolman, 1948). One would like to understand how the brain acquires that knowledge, what neural representation it adopts for the resulting map, how it tags significant locations in that map, and how that knowledge gets read out for decision-making during navigation.

Experimental work on these topics has mostly focused on simple environments – such as an open arena (Wilson and McNaughton, 1993), a pond (Morris et al., 1982), or a desert (Müller and Wehner, 1988) – and much has been learned about neural signals in diverse brain areas under these conditions (Sosa and Giocomo, 2021; Collett and Collett, 2002). However, many natural environments are highly structured, such as a system of burrows, or of intersecting paths through the underbrush. Similarly, for many cognitive tasks, a sequence of simple actions can give rise to complex solutions.

One algorithm for finding a valuable resource is common to all animals: chemotaxis. Every motile species has a way to track odors through the environment, either to find the source of the odor or to avoid it (Baker et al., 2018). This ability is central to finding food, connecting with a mate, and avoiding predators. It is believed that brains originally evolved to organize the motor response in pursuit of chemical stimuli. Indeed, some of the oldest regions of the mammalian brain, including the hippocampus, seem organized around an axis that processes smells (Jacobs, 2012; Aboitiz and Montiel, 2015).

The specifics of chemotaxis, namely the methods for finding an odor and tracking it, vary by species, but the toolkit always includes a search strategy based on trial-and-error: try various actions that you have available, then settle on the one that makes the odor stronger (Baker et al., 2018). For example, a rodent will weave its head side-to-side, sampling the local odor gradient, then move in the direction where the smell is stronger. Worms and maggots follow the same strategy. Dogs track a ground-borne odor trail by casting across it side-to-side. Flying insects perform similar casting flights. Bacteria randomly change direction every now and then, and continue straight as long as the odor improves (Berg, 1988). We propose that this universal behavioral module for chemotaxis can be harnessed to solve general problems of search and navigation in a complex environment, even when tell-tale odors are not available.

For concreteness, consider a mouse exploring a labyrinth of tunnels (Figure 1A). The maze may contain a source of food that emits an odor (Figure 1A1). That odor will be strongest at the source and decline with distance along the tunnels of the maze. The mouse can navigate to the food location by simply following the odor gradient uphill. Suppose that the mouse discovers some other interesting locations that do not emit a smell, like a source of water, or the exit from the labyrinth (Figure 1A2–3). It would be convenient if the mouse could tag such a location with an odorous material, so it may be found easily on future occasions. Ideally, the mouse would carry with it multiple such odor tags, so it can mark different targets each with its specific recognizable odor.

Figure 1. A mechanism for endotaxis.


(A) A constrained environment of tunnels linked by intersections, with special locations offering food, water, and the exit. (1) A real odor emitted by the food source decreases with distance (shading). (2) A virtual odor tagged to the water source. (3) A virtual odor tagged to the exit. (4) Abstract representation of this environment by a graph of nodes (intersections) and edges (tunnels). (B) A neural circuit to implement endotaxis. Open circles: four populations of neurons that represent ‘resource,’ ‘point,’ ‘map,’ and ‘goal.’ Arrows: signal flow. Solid circles: synapses. Point cells have small receptive fields localized in the environment and excite map cells. Map cells excite each other (green synapses) and also excite goal cells (blue synapses). Resource cells signal the presence of a resource, for example, cheese, water, or the exit. Map synapses and goal synapses are modified by activity-dependent plasticity. A ‘mode’ switch selects among various goal signals depending on the animal’s need. They may be virtual odors (water, exit) or real odors (cheese). Another goal cell (clock) may report how recently the agent has visited a location. The output of the mode switch gets fed to the chemotaxis module for gradient ascent. Mathematical symbols used in the text: ui is the output of a point cell at location i, wi is the input to the corresponding map cell, vi is the output of that map cell, M is the matrix of synaptic weights among map cells, G are the synaptic weights from the map cells onto goal cells, and rk is the output of goal cell k.

Figure 3. Theory of the goal signal.


Dependence of the goal signal on graph distance, and the consequences for endotaxis navigation. (A) The graph representing a binary tree labyrinth (Rosenberg et al., 2021) serves for illustration. Suppose the endotaxis model has acquired the adjacency matrix perfectly: M=A. We compute the goal signal Exy between any two nodes on the graph and compare the results at different values of the map gain γ. (B) Dependence of the goal signal Exy on the graph distance Dxy between the two nodes. Mean ± SD, error bars often smaller than markers. The maximal distance on this graph is 12. Note logarithmic vertical axis. The signal decays exponentially over many log units. At high γ, the decay distance is greater. (C) A detailed look at the goal signal, each point is for a pair of nodes (x,y). For low γ, the decay with distance is strictly monotonic. At high γ, there is overlap between the values at different distances. As γ exceeds the critical value γc=0.38, the distance dependence breaks down. (D) Using the goal signal for navigation. For every pair of start and end nodes, we navigate the route by following the goal signal and compare the distance traveled to the shortest graph distance. For all routes with the same graph distance, we plot the median navigated distance with 10 and 90% quantiles. Variable gain at a constant noise value of ϵ=0.01. (E) As in panel (D) but varying the noise at a constant gain of γ=0.34.

Here we show that such tagging does not need to be physical. Instead, we propose a mechanism by which the mouse’s brain may compute a ‘virtual odor’ signal that declines with distance from a chosen target. That neural signal can be made available to the chemotaxis module as though it were a real odor, enabling navigation up the gradient toward the target. Because this goal signal is computed in the brain rather than sensed externally, we call this hypothetical process endotaxis.

The developments reported here were inspired by a recent experimental study with mice navigating a complex labyrinth (Rosenberg et al., 2021) that includes 63 three-way junctions. Among other things, we observed that mice could learn the location of a resource in the labyrinth after encountering it just once, and perfect a direct route to that target location after 10 encounters. Furthermore, they could navigate back out of the labyrinth using a direct route they had not traveled before, even on the first attempt. Finally, the animals spent most of their waking time patrolling the labyrinth, even long after they had perfected the routes to rewarding locations. These patrols covered the environment efficiently, avoiding repeat visits to the same location. All this happened within a few hours of the animal’s first encounter with the labyrinth. Our modeling efforts here are aimed at explaining these remarkable phenomena of rapid spatial learning in a new environment: one-shot learning of a goal location, zero-shot learning of a return route, and efficient patrolling of a complex maze. In particular we want to do so with a biologically plausible mechanism that could be built out of neurons.

Results

A neural circuit to implement endotaxis

Figure 1B presents a neural circuit model that implements three goals: mapping the connectivity of the environment; tagging of goal locations with a virtual odor; and navigation toward those goals. The model includes four types of neurons: resource cells, point cells, map cells, and goal cells.

Resource cells

These are sensory neurons that fire when the animal encounters an interesting resource, for example, water or food, that may form a target for future navigation. Each resource cell is selective for a specific kind of stimulus. The circuitry that produces these responses is not part of the model.

Point cells

This layer of cells represents the animal’s location. (We avoid the term ‘place cell’ here because [1] that term has a technical meaning in the rodent hippocampus, whereas the arguments here extend to species that do not have a hippocampus; and [2] all the cells in this network have a place field, but it is smallest for the point cells.) Each neuron in this population has a small response field within the environment. The neuron fires when the animal enters that response field. We assume that these point cells exist from the outset as soon as the animal enters the environment. Each cell’s response field is defined by some conjunction of external and internal sensory signals at that location.

Map cells

This layer of neurons learns the structure of the environment, namely how the various locations are connected in space. The map cells get excitatory input from point cells in a one-to-one fashion. These input synapses are static. The map cells also excite each other with all-to-all connections. These recurrent synapses are modifiable according to a local plasticity rule. After learning, they represent the topology of the environment.

Goal cells

Each goal cell serves to mark the locations of a special resource in the map of the environment. The goal cell receives excitatory input from a resource cell, which gets activated whenever that resource is present. It also receives excitatory synapses from map cells. Such a synapse is strengthened when the presynaptic map cell is active at the same time as the resource cell.

After the map and goal synapses have been learned, each goal cell carries a virtual odor signal for its assigned resource. The signal increases systematically as the animal moves closer to a location with that resource. A mode switch selects one among many possible virtual odors (or real odors) to be routed to the chemotaxis module for odor tracking. (The mode switch effectively determines the animal’s behavioral policy. In this report, we do not consider how or why the animal chooses one mode or another.) The animal then pursues its chemotaxis search strategy to maximize that odor, which leads it to the selected tagged location.

Why does the circuit work?

The key insight is that the output of the goal cell declines systematically with the distance of the animal from the target location. This relationship holds even if the environment is constrained with a complex connectivity graph (Figure 1A4). Here we explain how this comes about, with mathematical details to follow.

In a first phase, the animal explores the environment while the circuit builds a map. When the animal moves from one location to an adjacent one, those two point cells fire in rapid succession. That leads to a Hebbian strengthening of the excitatory synapses between the two corresponding map cells (Figure 2A and B). In this way, the recurrent network of map cells learns the connectivity of the graph that describes the environment. To a first approximation, the matrix of synaptic connections among the map cells will converge to the correlation matrix of their inputs (Dayan and Abbott, 2001; Galtier et al., 2012), which in turn reflects the adjacency matrix of the graph (Equation 1). Now the brain can use this adjacency information to find the shortest path to a target.

Figure 2. The phases of endotaxis during exploration, goal-tagging, and navigation.


A portion of the circuit in Figure 1 is shown, including a single goal cell that responds to the water resource. Bottom shows a graph of the environment, with nodes linked by edges, and the agent’s current location shaded in orange. Each node has a point cell that reports the presence of the agent to a corresponding map cell. Map cells are recurrently connected (green) and feed convergent signals onto the goal cell. (A) Initially the recurrent synapses are weak (empty circles). (B) During exploration, the agent moves between two adjacent nodes on the graph, and that strengthens (arrowhead) the connection between their corresponding map cells (filled circles). (C) After exploration, the map synapses reflect the connectivity of the graph. Now the map cells have an extended profile of activity (darker = more active), centered on the agent’s current location x and decreasing from there with distance on the graph. (D) When the agent reaches the water source y, the goal cell gets activated by the sensation of water, and this triggers plasticity (arrowhead) at its input synapses. Thus, the state of the map at the water location gets stored in the goal synapses. This event represents tagging of the water location. (E) During navigation, as the agent visits different nodes, the map state gets filtered through the goal synapses to excite the goal cell. This produces a signal in the goal cell that declines with the agent’s distance from the water location.

After this map learning, the output of the map network is a hump of activity, centered on the current location x of the animal and declining with distance along the various paths in the graph of the environment (Figure 2C). If the animal moves to a different location y, the map output will change to another hump of activity, now centered on y (Figure 2D). The overlap of the two hump-shaped profiles will be large if nodes x and y are close on the graph, and small if they are distant. Fundamentally the endotaxis network computes that overlap.

Suppose the animal visits y and finds water there. Then the water resource cell fires, triggering synaptic learning in the goal synapses. That stores the current profile of map activity vi(y) in the synapses Gki onto the goal cell k that responds to water (Figure 2D, Equation 9). When the animal subsequently moves to a different location x, the goal cell k receives the current map output v(x) filtered through the previously stored synaptic template v(y) (Figure 2E). This is the desired measure of overlap (Equation 10). Under suitable conditions, this goal signal declines monotonically with the shortest graph distance between x and y, as we will demonstrate both analytically and in simulations (sections ‘Theory of endotaxis’ and ‘Acquisition of map and targets during exploration’).
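For readers who prefer numbers, the following short sketch (ours, not part of the published model code; it uses the closed-form map output derived in the next section, 'Theory of endotaxis') evaluates this overlap on a 6-node path graph and shows how it decays with the number of links between the agent and the goal:

import numpy as np

# Illustration on a path graph 0-1-2-3-4-5 (assumed example, not from the paper)
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1          # undirected links along the path

gamma = 0.2                                 # map gain, well below the critical value
Y = np.linalg.inv(np.eye(n) / gamma - A)    # column x of Y is the map output v(x)

y = 5                                       # goal node where water was found
for x in range(n):
    print(x, Y[:, y] @ Y[:, x])             # goal signal = overlap of v(y) and v(x)
# The printed overlap shrinks roughly geometrically as x moves away from the goal.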

Theory of endotaxis

Here we formalize the processes of Figure 2 in a concrete mathematical model. The model is simple enough to allow some exact predictions for its behavior. The present section develops an analytical understanding of endotaxis that will help guide the numerical simulations in subsequent parts.

The environment is modeled as a graph consisting of n nodes, with adjacency matrix

$$A_{ij} = \begin{cases} 1, & \text{if node } i \text{ can be reached from node } j \text{ in one step} \\ 0, & \text{otherwise, including the } i=j \text{ case} \end{cases} \tag{1}$$

We suppose the graph is undirected, meaning that every link can be traversed in both directions,

$$A_{ij} = A_{ji}$$
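For concreteness, the adjacency matrix of a ring graph like the one simulated later (Figure 4) can be assembled in a few lines. This helper is ours, and the placement of the extra shortcut link is hypothetical:

import numpy as np

def ring_adjacency(n, extra_links=()):
    # Adjacency matrix of Equation 1 for an undirected ring of n nodes,
    # plus optional extra edges given as (i, j) pairs.
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        j = (i + 1) % n                # neighbor around the ring
        A[i, j] = A[j, i] = 1
    for i, j in extra_links:
        A[i, j] = A[j, i] = 1          # shortcut link, as added in Figure 4
    return A

A = ring_adjacency(14, extra_links=[(2, 9)])    # shortcut placement chosen arbitrarily
assert (A == A.T).all()                          # undirected: A_ij = A_ji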

Movements of the agent are modeled as a sequence of steps along that graph. During exploration, the agent performs a walk that tries to cover the entire environment. In the process, it learns the adjacency matrix A. During navigation, the agent uses that knowledge to travel to a known target.

For an agent navigating a graph, it is very useful to know the shortest graph distance between any two nodes

$$D_{ij} = \text{minimum number of steps needed to reach node } i \text{ from node } j \tag{2}$$

Given this information, one can navigate the shortest route from x to y: for each of the neighbors of x, look up its distance to y and step to the neighbor with the shortest distance. Then repeat that process until y is reached. Thus, the shortest route can be navigated one step at a time without any high-level advanced planning. This is the core idea behind endotaxis.
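This one-step-at-a-time policy is easy to express in code. The sketch below (ours; it assumes a connected, undirected graph given by its adjacency matrix) first computes the distance matrix D of Equation 2 by breadth-first search and then navigates greedily:

from collections import deque

def graph_distances(A):
    # All-pairs shortest graph distances D (Equation 2) by breadth-first search.
    n = len(A)
    D = [[None] * n for _ in range(n)]
    for src in range(n):
        D[src][src] = 0
        queue = deque([src])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if A[i][j] and D[src][j] is None:
                    D[src][j] = D[src][i] + 1
                    queue.append(j)
    return D

def greedy_route(A, D, start, goal):
    # Step to the neighbor with the smallest remaining distance to the goal;
    # no advance planning of the whole route is needed.
    route, x = [start], start
    while x != goal:
        neighbors = [j for j in range(len(A)) if A[x][j]]
        x = min(neighbors, key=lambda j: D[j][goal])
        route.append(x)
    return route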

The network of Figure 1B effectively computes the shortest graph distances. We implement the circuit as a textbook linear rate model (Dayan and Abbott, 2001). Each map unit i has a synaptic input wi that it converts to an output vi,

$$v_i = \gamma w_i \tag{3}$$

where γ is the gain of the units. The input consists of an external signal ui summed with a recurrent feedback through a connection matrix M

$$w_i = u_i + \sum_j M_{ij} v_j \tag{4}$$

where Mij is the synaptic strength from unit j to i.

The point neurons are one-hot encoders of location. A point neuron fires if the agent is at that location; all the others are silent:

$$u_i(x) = \text{firing rate of point cell } i \text{ with the agent at node } x = \delta_{ix} \tag{5}$$

where δix is the Kronecker delta.

So the vector of all map outputs is

$$\mathbf{v} = \gamma(\mathbf{u} + \mathbf{M}\mathbf{v}) = \left(\tfrac{1}{\gamma}\mathbf{1} - \mathbf{M}\right)^{-1}\mathbf{u} \tag{6}$$

where u is the one-hot input from point cells.

Now consider goal cell number k that is associated to a particular location y because its resource is present at that node. The goal cell sums input from all the map units vi, weighted by its goal synapses Gki. So with the agent at node x, the goal signal rk is

$$r_k(x) = \sum_i G_{ki} v_i(x) = \mathbf{g}_k \cdot \mathbf{v}(x) = \mathbf{g}_k \left(\tfrac{1}{\gamma}\mathbf{1} - \mathbf{M}\right)^{-1}\mathbf{u}(x) \tag{7}$$

where we write gk for the kth row vector of the goal synapse matrix G. This is the set of synapses from all map cells onto the specific goal cell in question.

Suppose now that the agent has learned the structure of the environment perfectly, such that the map synapses are a copy of the graph’s adjacency matrix (1),

$$\mathbf{M} = \mathbf{A} \tag{8}$$

Similarly, suppose that the agent has acquired the goal synapses perfectly, namely proportional to the map output at the goal location y:

$$\mathbf{g}_k = \mathbf{v}(y) \tag{9}$$

Then as the agent moves to another location x, the goal cell reports a signal

$$r_k(x) = \mathbf{g}_k \cdot \mathbf{v}(x) = \mathbf{v}(y) \cdot \mathbf{v}(x) \equiv E_{xy} \tag{10}$$

where the matrix

$$\mathbf{E} = \left(\tfrac{1}{\gamma}\mathbf{1} - \mathbf{A}\right)^{-1}\left(\tfrac{1}{\gamma}\mathbf{1} - \mathbf{A}\right)^{-1} \tag{11}$$

It has been shown (Meister, 2023) that for small values of γ the elements of the resolvent matrix

$$\mathbf{Y} = \left(\tfrac{1}{\gamma}\mathbf{1} - \mathbf{A}\right)^{-1} \tag{12}$$

are monotonically related to the shortest graph distances D. Specifically,

$$Y_{xy} \xrightarrow{\;\gamma \to 0\;} \gamma^{\,1+D_{xy}} \tag{13}$$

Building on that, the matrix E becomes

$$E_{xy} \xrightarrow{\;\gamma \to 0\;} \sum_z \gamma^{\,1+D_{zx}}\,\gamma^{\,1+D_{zy}} = \sum_z \gamma^{\,2+D_{zx}+D_{zy}} \tag{14}$$

The limit is dominated by the term with the smallest exponent, which occurs when z lies on a shortest path from x to y

$$\min_z\left(D_{zx} + D_{zy}\right) = D_{xy}$$

where we have used the undirected nature of the graph, namely Dzx=Dxz.

Therefore,

$$E_{xy} \xrightarrow{\;\gamma \to 0\;} \gamma^{\,2+D_{xy}} \tag{15}$$

where Dxy is the smallest number of steps needed to get from node y to node x.

Figure 3 illustrates this relationship with numerical results on a binary tree graph. As expected, for small γ the goal signal decays exponentially with graph distance (Figure 3B). Therefore, an agent that makes local turning decisions to maximize that goal signal will reach the goal by the shortest possible path.

The exponential decay of the goal signal represents a challenge for practical implementation with biological circuits. Neurons have a finite signal-to-noise ratio, so detecting minute differences in the firing rate of a goal neuron will be unreliable. Because the goal signal changes by a factor of γ across every link in the graph, one wants to set the map neuron gain γ as large as possible. However, there is a critical gain value γc that sets a strict upper limit:

$$\gamma < \gamma_c \equiv \frac{1}{\text{largest absolute eigenvalue of } \mathbf{A}} \tag{16}$$

For γ>γc, the goal signal Exy no longer represents graph distances (Meister, 2023). The largest eigenvalue of the adjacency matrix in turn is related to the number of edges per node. For graphs with 2–4 edges per node, γc is typically about 0.3. The graph in Figure 3A has γc ≈ 0.383, and indeed Exy becomes erratic as γ approaches that value (Figure 3C).
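The critical gain is straightforward to compute for any given environment; a minimal sketch (ours, assuming an undirected graph so that A is symmetric):

import numpy as np

def critical_gain(A):
    # gamma_c = 1 / (largest absolute eigenvalue of A), Equation 16
    return 1.0 / np.max(np.abs(np.linalg.eigvalsh(A)))

# For the binary tree graph of Figure 3A this evaluates to about 0.383,
# the value quoted in the text; useful gains must stay below this number.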

To implement the finite dynamic range explicitly, we add some noise to the goal signal of Equation 10:

$$r_k(x) = \mathbf{g}_k \cdot \mathbf{v}(x) + \eta \tag{17}$$

where the noise η has a Gaussian distribution with full width ϵ:

$$\eta \sim \mathcal{N}\!\left(0, (\epsilon/2)^2\right) \tag{18}$$

The scale ϵ of this noise is expressed relative to the maximum value of the goal signal. If the agent must decide between two goal signals separated by less than ϵ, the noise will take a toll on the resulting navigation performance.

Of course, neurons everywhere within the network will carry some noise. We lump the cumulative effects of that into the final readout step because that allows for efficient calculations (see section ‘Average navigated distance’). (In the circuit of Figure 1B, one can envision that the readout noise gets added after the mode switch.) What is a reasonable value for this effective readout noise? For reference, humans and animals can routinely discriminate sensory stimuli that differ by only 1%, for example, the pitch of tones or the intensity of a light, especially if they occur in close succession. Clearly the neurons all the way from receptors to perception must represent those small differences. Thus, we will use ϵ=0.01 as a reference noise value in many of the results presented here.

The process of navigation toward a chosen goal signal is formalized in Algorithm 1. At each node, the agent inspects the goal signal that would be obtained at all the neighboring nodes, corrupted by the readout noise η. Then it steps to the neighbor with the highest value. Suppose the agent starts at node x and navigates following the goal signal for node y. The resulting navigation route x = s0, s1, …, sn = y has Lxy = n steps. Navigation is perfect if this equals the shortest graph distance, Lxy = Dxy. We will assess deviations from perfect performance by the excess length of the routes.

Algorithm 1 Navigation.
Parameters: gain γ, noise ϵ
Input: map synapse matrix M, goal synapse vector g
  s ← x                                                 start navigation at node x
  while not at goal do                                  stop when goal node is found
    for all nodes j that neighbor s do
      ui(j) ← δi,j for every point cell i               point cell output with agent at node j
      v(j) ← (1/γ 1 − M)^(−1) u(j)                      map output
      r(j) ← g · v(j) + η(j)                            noisy goal signal, η ∼ N(0, (ϵ/2)^2)
    end for
    s ← argmaxj r(j)                                    choose the neighbor node with the highest goal signal
  end while
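A compact Python rendering of Algorithm 1 may be helpful. This is our sketch, with the graph supplied as an adjacency matrix and the noise applied in absolute units rather than relative to the maximum goal signal:

import numpy as np

def navigate(A, M, g, start, goal, gamma=0.34, eps=0.01, max_steps=1000):
    # Algorithm 1: greedy ascent on the noisy goal signal r = g.v + eta
    n = len(A)
    Y = np.linalg.inv(np.eye(n) / gamma - M)   # column j of Y is the map output v(j)
    s, route = start, [start]
    for _ in range(max_steps):                 # guard against non-convergence
        if s == goal:
            break
        neighbors = np.flatnonzero(A[s])
        r = [g @ Y[:, j] + np.random.normal(0, eps / 2) for j in neighbors]
        s = int(neighbors[np.argmax(r)])       # step to the highest goal signal
        route.append(s)
    return route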

Figure 3D and E illustrate how the navigated path distance Lxy depends on the noise level ϵ and the gain γ. For small gain or high noise, the goal signal extends only over a graph distance of 5–6 links. Beyond that, the navigated distance Lxy begins to exceed the graph distance Dxy. As the gain increases, the goal signal extends further through the graph and navigation becomes reliable over longer distances (Figure 3D). Eventually, however, the goal signal loses its monotonic distance dependence (Figure 3C). At that stage, navigation across the graph may fail because the agent gets trapped in a local maximum of the goal signal. This can happen even before the critical gain value is reached (Figure 3C). For the example in Figure 3, the highest useful gain is γ=0.34 whereas γc=0.383.

For any given value of the gain, navigation improves with lower noise levels, as expected (Figure 3E). At the reference value of ϵ=0.01, navigation is perfect even across the 12 links that separate the most distant points on this graph.

In summary, this analysis spells out the challenges that need to be met for endotaxis to work properly. First, during the learning phase, the agent must reliably extract the adjacency matrix of the graph and copy it into its map synapses. Second, during the navigation phase, the agent must evaluate the goal signal with enough resolution to distinguish the values at alternative nodes. The neuronal gain γ plays a central role: with γ too small, the goal signal decays rapidly with distance and vanishes into the noise just a few steps away from the goal. But at large γ the network computation becomes unstable.

Acquisition of map and targets during exploration

As discussed above, the goal of learning during exploration is that the agent acquires a copy of the graph’s adjacency matrix in its map synapses, M = A (Equation 8), and stores the map output at a goal location y in the goal synapses, g = v(y) (Equation 9). Here we explore how the rules for synaptic plasticity in the map and goal networks allow that to happen. Algorithm 2 spells out the procedure we implemented for learning from a random walk through the environment.

Algorithm 2 Map and goal-learning.
Parameters: γ,θ,α
Input: adjacency matrix A, resource signals F
  M ← 0                                                 initiate map synapses at 0
  G ← 0                                                 initiate goal synapses at 0
  t ← 0                                                 t counts the steps
  s(t) ← x                                              start random walk at x
  while learning do
    t ← t + 1
    s(t) ← a random neighbor of s(t−1)                  continue the random walk
    ui(t) ← δi,s(t) for every point cell i              point cell output
    v(t) ← (1/γ 1 − M)^(−1) u(t)                        map cell output
    for all map cell pairs (i,j) do
      if vj(t−1) > θ and vi(t) > θ then                 threshold on pre- and postsynaptic activity
        Mji, Mij ← 1                                    on a directed graph only increment Mij
      end if
    end for
    r ← G v(t)                                          goal signals
    for every goal neuron k do
      if Fk,s(t) > 0 then                               the agent is at a location that contains resource k
        for every map neuron j do
          Gkj ← Gkj + α (Fk,s(t) − rk) vj(t)            update goal synapses
        end for
      end if
    end for
  end while
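A corresponding Python sketch of Algorithm 2 (again ours; the resource signals are supplied as a matrix F with F[k, s] > 0 where resource k is present at node s):

import numpy as np

def learn(A, F, steps=10000, gamma=0.32, theta=0.27, alpha=0.3, start=0, seed=0):
    # Algorithm 2: learn map synapses M and goal synapses G on a random walk
    rng = np.random.default_rng(seed)
    n = len(A)
    M = np.zeros((n, n))                   # map synapses start at zero
    G = np.zeros((F.shape[0], n))          # goal synapses start at zero
    s_prev, v_prev = start, np.zeros(n)
    for _ in range(steps):
        s = int(rng.choice(np.flatnonzero(A[s_prev])))      # random neighbor
        u = np.eye(n)[s]                                    # one-hot point-cell output
        v = np.linalg.solve(np.eye(n) / gamma - M, u)       # map-cell output
        pre, post = v_prev > theta, v > theta               # threshold on pre/post activity
        M[np.ix_(post, pre)] = 1                            # Hebbian map learning
        M[np.ix_(pre, post)] = 1                            # symmetric (undirected) rule
        r = G @ v                                           # current goal signals
        for k in np.flatnonzero(F[:, s] > 0):               # agent is at a resource location
            G[k] += alpha * (F[k, s] - r[k]) * v            # update goal synapses
        s_prev, v_prev = s, v
    return M, G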

The map synapses Mij start out at zero strength. When the agent moves from node j=s(t) at time t to node i=s(t+1), the map cells j and i are excited in close succession. When that happens, the agent potentiates the synapses between those two neurons to Mji=Mij=1. Of course, a map cell can also get activated through the recurrent network, and we must distinguish that from direct input from its point cell. We found that a simple threshold criterion is sufficient. Here θ is a threshold applied to both the pre- and postsynaptic activity, and the map synapse gets established only if both neurons respond above threshold. The tuning requirements for this threshold are discussed below.

The goal synapses Gkj similarly start out at zero strength. Consider a particular goal cell k, and suppose its corresponding resource cell has activity Fky when the agent is at location y. When a positive resource signal arrives, that means the agent is at a goal location. If the goal signal rk received from the map output is smaller than the resource signal Fky, then the goal synapses get incremented by something proportional to the current map output. Learning at the goal synapses saturates when the goal signal correctly predicts the resource signal. The learning rate α sets how fast that will happen. Note that both the learning rules for map and goal synapses are Hebbian and strictly local: each synapse is modified based only on signals available in the pre- and postsynaptic neurons.

To illustrate the process of map and goal-learning, we simulate an agent exploring a simple ring graph by a random walk (Figure 4). At first, there are no targets in the environment that can deliver a resource (Figure 4A). Then we add one target location, and later a second one. Finally, we add a new link to the graph that makes a connection clear across the environment. As the agent explores the graph, we will track how its representations evolve by monitoring the map synapses and the profile of the goal signal.

Figure 4. Learning the map and the targets during exploration.


(A) Simulation of a random walk on a ring with 14 nodes. Left: layout of the ring, with resource locations marked in blue. The walk progresses in 800 time steps (top to bottom); with the agent’s position marked in red (nodes 0–13, horizontal axis). At each time, the color map shows the goal signal that would be produced if the agent were at position ‘Node.’ White horizontal lines mark the appearance of a target at t=200, a second target with the same resource at t=400, and a new link across the ring at step t=600. (B) The matrix M of map synapses at various times. The pixel in row i and column j represents the matrix element Mij. Color purple =0. Note the first few steps (number above graph) each add a new synapse. Eventually, M reflects the adjacency matrix of nodes on the graph. (C) Goal signals just before and just after the agent encounters the first target. (D) Goal signals just before and just after the agent encounters the second target. (E) Goal signals just before and just after the agent travels the new link for the first time. Parameters: γ=0.32,θ=0.27,α=0.3.

At the outset, every time the agent steps to a new node, the map synapse corresponding to that link gets potentiated (Figure 4B). After enough steps, the agent has executed every link on the graph, and the matrix of map synapses resembles the full adjacency matrix of the graph (Figure 4B). At this stage, the agent has learned the connectivity of the environment.

Once a target appears in the environment, it takes the agent a few random steps to encounter it. At that moment, the goal synapses get potentiated for the first time, and suddenly a goal signal appears in the goal cell (Figure 4C). The profile of that goal signal is fully formed and spreads through the entire graph thanks to the pre-established map network. By following this goal signal uphill, the agent can navigate along the shortest path to the target from any node on the graph. Note that the absolute scale of the goal signal grows a little every time the agent visits the goal (Figure 4A) and eventually saturates.

Sometime later, we introduce a second target elsewhere in the environment (Figure 4D). When the agent encounters it along its random walk, the goal synapses get updated, and the new goal signal has two peaks in its profile. Again, this goal signal grows during subsequent visits. By following that signal uphill from any starting point, the agent will be led to a nearby target by the shortest possible path.

When a new link appears, the agent eventually discovers it on its random walk. At that point, the goal signal changes instantaneously to incorporate the new route (Figure 4E). An agent following the new goal signal from node 13 on the ring will now be led to a target location in just three steps, using the shortcut, whereas previously it took five steps.

This simulation illustrates how the structure of the environment is acquired separately from the location of resources. The agent can explore and learn the map of the environment even without any resources present (Figure 4B). This learning takes place among the map synapses in the endotaxis circuit (Figure 1B). When a resource is found, its location gets tagged within that established map through learning by the goal synapses. The resulting goal signal is available immediately without the need for further learning (Figure 4C). If the distribution of resources changes, the knowledge in the map remains unaffected (Figure 4D) but the goal synapses can change quickly to incorporate the new target. Vice versa, if the graph of the environment changes, the map synapses get updated, and that adapts the goal signal to the new situation even without further change in the goal synapses (Figure 4E).

What happens if a previously existing link disappears from the environment, for example, because one corridor of the mouse burrow caves in? Ideally the agent would erase that link from the cognitive map. The learning algorithm (Algorithm 2) is designed for rapid and robust acquisition of a cognitive map starting from zero knowledge and does not contain a provision for forgetting. However, one can add a biologically plausible rule for synaptic depression that gradually erases memory of a link if the agent never travels it. Details are presented in section ‘Forgetting of links and resources’ (Figure 10). For the sake of simplicity, we continue the present analysis of endotaxis based on the simple three-parameter algorithm presented above (Algorithm 2).

Choice of learning rule

The map learning rule in Algorithm 2 produces full-strength synapses Mij and Mji after a single co-activation of the two neurons. A more common approach to synaptic learning uses small incremental updates and stabilizes the update rule with some form of normalization, based on the average pre- or postsynaptic activity over many steps (Gerstner and Kistler, 2002). For example, presynaptic normalization leads the synaptic network to learn a transition probability matrix (Fang et al., 2023)

Tij=probability of stepping to nodeigiven current nodej

Instead, we adopted the instantaneous update model for two reasons: most importantly, this allows the agent to learn a route after the first traversal, which is needed to explain the rapid learning observed in experimental animals. For example, section ‘Navigating a partial map: homing behavior’ models accurate homing after the first excursion into the labyrinth. Furthermore, when we repeated the analysis of Figure 3 using the transition matrix Tij instead of the adjacency matrix Aij, the goal signal correlated more weakly with distance, and even with the optimal gain setting the range of correct navigation was considerably reduced.

This rapid learning rule reflects an implicit assumption that the environment is static, such that the learned transition will always be available. For adaptation to slow changes in the environment, see section ‘Forgetting of links and resources.’ Note also that the above procedure Algorithm 2 updates both synapses between neurons i and j. This assumes implicitly that the experienced edge on the graph can also be traversed in the opposite direction, which applies to many navigation problems. To learn a directed environment – such as a city map with one-way streets or a game in which moves cannot be reversed – one may use a directed learning rule that requires the presynaptic neuron to fire before the postsynaptic neuron. This will update only the synapse Mij representing the edge that was actually traveled. For all simulations in this article, we will use the symmetric learning rule.

Navigation using the learned goal signal

We now turn to the ‘exploitation’ component of endotaxis, namely use of the learned information to navigate toward targets. In the simulations of Figure 5, we allow the agent to explore a graph. Every node on the graph drives a separate resource cell, thus the agent simultaneously learns goal signals to every node. After a random walk sufficient to cover the graph several times, we test the agent’s ability to navigate to the goals by ascending on the learned goal signal. For that purpose, we teleport the agent to an arbitrary start node in the graph and ask how many steps it takes to reach the goal node following the policy of Algorithm 1. In these tests, the learning of map and goal synapses was turned off during the navigation phase, so we could separately assess how learning and navigating affect the performance. However, there is no functional requirement for this, and indeed one of the attractive features of this model is that learning and navigation can proceed in parallel at all times.

Figure 5. Navigation using the learned map and targets.


(A–C) Ring with 50 nodes. (A) Goal signal for a single target location (blue dot on left icon) after learning during random exploration with 10,000 steps. Color scale is logarithmic, yellow = high. Note the monotonic decay of the goal signal with graph distance from the target. (B) Results of all-to-all navigation where every node is a separate goal. For all pairs of nodes, this shows the navigated distance vs the graph distance. Median ±10/90 percentiles for all routes with the same graph distance. ‘Ideal’ navigation would follow the identity. The actual navigation is ideal over short distances, then begins to deviate from ideal at a critical distance that depends on the noise level ϵ. (C) As in (B) over a wider range, note logarithmic axis. Noise ϵ=0.01. Includes comparison to navigation by a random walk; and navigation using the optimal goal signal based on knowledge of the graph structure and target location. γ=0.41,θ=0.39,α=0.1. (D–F) As in (A–C) for a binary tree graph with 127 nodes. (D) Goal signal to the node marked on the left icon. This was the reward port in the labyrinth experiments of Rosenberg et al., 2021. White lines separate the branches of the tree. γ=0.33,θ=0.30,α=0.1. (G–I) As in (A–C) for a ‘Tower of Hanoi’ graph with 81 nodes. γ=0.29,θ=0.27,α=0.1.

Figure 5A–C shows results on a ring graph with 50 nodes. With suitable values of the model parameters (γ,θ,α) – more on that later – the agent learns a goal signal that declines monotonically with distance from the target node (Figure 5A). The ability to ascend on that goal signal depends on the noise level ϵ, which determines whether the agent can sense the difference in goal signal at neighboring nodes. At a high noise level ϵ=0.1, the agent finds the target by the shortest route from up to five links away (Figure 5B); beyond that range, some navigation errors creep in. At a low noise level of ϵ=0.005, navigation is perfect up to 10 links away. Every factor of two increase in noise seems to reduce the range of navigation by about one link.

How does the process of learning the map of the environment affect the ultimate navigation performance? Figure 5C makes that comparison by considering an agent with oracular knowledge of the graph structure and target location (Equations 9 and 10). Interestingly, this barely improves the distance range for perfect navigation. By contrast, an agent performing a random walk with zero knowledge of the environment would take about 40 times longer to reach the target than by using endotaxis (Figure 5C).

The ring graph is particularly simple, but how well does endotaxis learn in a more realistic environment? Figure 5D–F shows results on a binary tree graph with six levels: this is the structure of a maze used in a recent study on mouse navigation (Rosenberg et al., 2021). In those experiments, mice learned quickly how to reach the reward location (blue dot in Figure 5D) from anywhere within the maze. Indeed, the endotaxis agent can learn a goal signal that declines monotonically with distance from the reward port (Figure 5D). At a noise level of ϵ=0.01, navigation is perfect over distances of 9 links and close to perfect over the maximal distance of 12 links that occurs in this maze (Figure 5E). Again, the challenge of having to learn the map affects the performance only slightly (Figure 5F). Finally, comparison with the random agent shows that endotaxis shortens the time to target by a factor of 100 on this graph (Figure 5F).

Figure 5G–I shows results for a more complex graph that represents a cognitive task, namely the game ‘Tower of Hanoi.’ Disks of different sizes are stacked on three pegs, with the constraint that no disk can rest on top of a smaller one. The game is solved by rearranging the pile of disks from the center peg to another. In any state of the game, there are either two or three possible actions, and they form an interesting graph with many loops (Figure 5G). The player starts at the top node (all disks on the center peg) and the two possible solutions correspond to the bottom left and right corners. Again, random exploration leads the endotaxis agent to learn the connectivity of the game and to discover the solutions. The resulting goal signal decays systematically with graph distance from the solution (Figure 5G). At a noise of ϵ=0.01, navigation is perfect once the agent gets to within nine moves of the target (Figure 5H). This is not quite sufficient for an error-free solution from the starting position, which requires 15 moves. However, compared to an agent executing random moves, endotaxis speeds up the solution by a factor of 10 (Figure 5I). If the game is played with only three disks, the maximal graph distance is 7, and endotaxis solves it perfectly at ϵ=0.01.
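For readers who want to reproduce this example, the state graph of the game can be enumerated in a few lines. The sketch below is ours and simply encodes the rule that a disk may only be placed on a larger one:

from itertools import product

def hanoi_graph(n_disks=4):
    # A state is a tuple giving the peg (0, 1, or 2) of each disk, smallest disk first.
    states = list(product(range(3), repeat=n_disks))
    adj = {s: set() for s in states}
    for s in states:
        for peg_from in range(3):
            disks = [d for d in range(n_disks) if s[d] == peg_from]
            if not disks:
                continue
            top = min(disks)                        # smallest (topmost) disk on this peg
            for peg_to in range(3):
                if peg_to == peg_from:
                    continue
                if all(s[d] != peg_to for d in range(top)):   # no smaller disk on target peg
                    t = list(s)
                    t[top] = peg_to
                    adj[s].add(tuple(t))            # legal move = edge of the game graph
    return states, adj

states, adj = hanoi_graph(4)
print(len(states))                       # 81 states, as in Figure 5G
print({len(v) for v in adj.values()})    # every state offers 2 or 3 possible moves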

These results show that endotaxis functions well in environments with very different structure: linear, tree-shaped, and cyclic. Random exploration in conjunction with synaptic learning can efficiently acquire the connectivity of the environment and the location of targets. With a noise level of 1%, the resulting goal signal allows perfect navigation over distances of ∼9 steps, independent of the nature of the graph. This is a respectable range: personal experience suggests that we rarely learn routes that involve more than nine successive decisions. Chess openings, which are often played in a fast and reflexive fashion, last about 10 moves.

Parameter sensitivity

The endotaxis model has only three parameters: the gain γ of map units, the threshold θ for learning at map synapses, and the learning rate α at goal synapses. How does performance depend on these parameters? Do they need to be tuned precisely? And does the optimal tuning depend on the spatial environment? There is a natural hierarchy to the parameters if one separates the process of learning from that of navigation. Suppose the circuit has learned the structure of the environment perfectly, such that the map synapses reflect the adjacencies (Equation 8), and the goal synapses reflect the map output at the goal (Equation 9). Then the optimal navigation performance of the endotaxis system depends only on the gain γ and the noise level ϵ. For a given γ, in turn, the precision of map learning depends only on the threshold θ (see Algorithm 2). Finally, if the gain is set optimally and the map was learned properly, the identification of targets depends only on the goal-learning rate α. Figure 6 explores these relationships in turn.

Figure 6. Sensitivity of performance to the model parameters.


On each of the three graphs, we simulated endotaxis for all-to-all navigation, where each node serves as a start and a goal node. The performance measure was the range of the goal signal, defined as the graph distance over which at least half the navigated routes follow the shortest path. The exploration path for synaptic learning was of medium length, visiting each edge on the graph approximately 10 times. The noise was set to ϵ=0.01. (A) Ring graph with 50 nodes. Left: dependence of the goal signal range on the gain γ and the threshold θ for learning map synapses. Performance increases with higher gain until it collapses beyond the critical value. For each gain, there is a sharply defined range of useful thresholds, with lower values at lower gain. Right: dependence of the goal signal range on the learning rate α at goal synapses, and the length of the exploratory walk, measured in visits per edge of the graph. For a short walk (one edge visit), a high learning rate is best. For a long walk (100 edge visits), a lower learning rate wins out. (B) As in (A) for the Binary tree maze with 127 nodes. (C) As in (A) for the Tower of Hanoi graph with 81 nodes.

We simulated the learning phase of endotaxis as in the preceding section (Figure 5B, E and H), using a noise level of ϵ=0.01, and systematically varying the model parameters (γ,θ,α). For each parameter set, we measured the graph distance over which at least half of the navigated routes were perfect. We defined this distance as the range of the goal signal.
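In code, this range statistic can be obtained from the all-to-all navigation results roughly as follows (our reading of the definition above; routes are grouped by their true graph distance and the range is the last distance at which at least half of them are shortest):

import numpy as np

def goal_signal_range(graph_dist, nav_dist):
    # graph_dist[i], nav_dist[i]: shortest and navigated distance for route i
    graph_dist, nav_dist = np.asarray(graph_dist), np.asarray(nav_dist)
    best = 0
    for d in range(1, int(graph_dist.max()) + 1):
        routes = nav_dist[graph_dist == d]
        if len(routes) and np.mean(routes == d) >= 0.5:
            best = d                  # at least half the routes took the shortest path
        else:
            break
    return best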

For example, on the ring graph (Figure 6A) the signal range improves with gain until performance collapses beyond a maximal gain value. This is just as predicted by the theory (Figure 3), except that the maximal gain γmax=0.41 is somewhat below the critical value γc=0.5. Clearly the added complications of having to learn the map and goal locations take their toll at high gain. Below the maximal cutoff, the dependence of performance on gain is rather gentle: for example, a 14% change in gain from 0.35 to 0.40 leads to a 26% change in performance. At any given gain value, there is a range of values for the threshold θ within which the map is learned perfectly. Note that this range is generous and does not require precise adjustment: for example, under a near-maximal gain of 0.38, the threshold can vary freely over a 35% range.

Once the gain and synaptic threshold are set so as to acquire the map synapses, the quality of goal-learning depends only on the learning rate α. With large α, a single visit to the goal fully potentiates the goal synapses so that they do not get updated further. This allows for a fast acquisition of that target, but at the risk of imperfect learning, because the map may not be fully explored yet. A small α will update the synapses only partially over many successive visits to the goal. This leads to a poor performance after short exploration, because the weak goal signal competes with noise, but superior performance after long explorations: a tradeoff between speed of learning and accuracy. Precisely this speed-accuracy tradeoff is seen in the simulations (Figure 6A, right): a high learning rate is optimal for short explorations, but for longer ones a small learning rate wins out. An intermediate value of α=1 delivers a good compromise performance.

We found qualitatively similar behavior for the other two environments studied here: the binary maze graph (Figure 6B) and the Tower of Hanoi graph (Figure 6C). In each case, the maximal usable gain is slightly below the critical value γc of that graph. A learning rate of α=1 delivers intermediate results. For long explorations, a lower learning rate is best.

In summary, this sensitivity analysis shows that the optimal parameter set for endotaxis does depend on the environment. This is not altogether surprising: every neural network needs to adapt to the distribution of inputs it receives so as to perform optimally. At the same time, the required tuning is rather generous, allowing at least 10–20% slop in the parameters for reasonable performance. Furthermore, a single parameter set of γ=0.29,θ=0.26,α=1 performs quite well on both the binary maze and the Tower of Hanoi graphs, which are dramatically different in character.

A saturating activation function improves navigation

So far, the model of the map network used neurons with a linear activation function (Equation 3), meaning the output v is simply proportional to the input, v=γw. We also explored nonlinear activation functions v=f(w) and found that the performance of endotaxis improves under certain conditions (Fang et al., 2023). The most important feature is that f(w) should saturate for inputs w that are larger than the output of the point cells (u=1 in Equation 4). The detailed shape matters little, so for illustration we will use a linear-flat activation curve (Figure 7A):

$$f(w) = \begin{cases} \gamma w, & \text{if } w \le 1 \\ \gamma, & \text{if } w > 1 \end{cases} \tag{19}$$
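With this nonlinearity, the map output no longer has the closed form of Equation 6; in a simulation one can instead iterate the circuit equations to a fixed point. A minimal sketch (ours; it assumes synchronous updates and that a fixed number of iterations suffices for convergence):

import numpy as np

def map_output_saturating(M, u, gamma, n_iter=200):
    # Solve v = f(u + M v) with f from Equation 19 by fixed-point iteration.
    v = np.zeros(len(u))
    for _ in range(n_iter):
        w = u + M @ v                        # total input to each map cell (Equation 4)
        v = gamma * np.minimum(w, 1.0)       # f(w) = gamma*w for w <= 1, gamma for w > 1
    return v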

Figure 7. Benefits of a nonlinear activation function.


(A) The activation function relating a map neuron’s output v to its total input w. Red: linear function with gain γ. Blue: nonlinear function with saturation at w>1. (B–D) Range of the goal signal, as defined in Figure 6, as a function of the gain γ (noise ϵ=0.01). Range increases with gain up to a maximal value. The maximal range achieved is higher with nonlinear activation (blue) than linear activation (red). Results for the ring graph (B), binary tree maze (C), and Tower of Hanoi graph (D). (E) Output of map cells during early exploration of the ring graph (gain γ=0.49). Suppose the agent has walked back and forth between nodes 2 and 5, so all their corresponding map synapses are established (black bars). Then the agent steps to node 6 for the first time (orange). Lines plot the output of the map cells with the agent at locations 2, 3, 4, 5, or 6. Dotted line indicates the maximal possible setting of the threshold θ in the learning rule. With linear activation (left), a map cell receiving purely recurrent input (4) may produce a signal larger than threshold (arrowhead above the dotted line). Thus, cells 4 and 6 would form an erroneous synapse. With a saturating activation function (right), the map amplitude stays constant throughout learning, and this confound does not happen. (F) The goal signal from an end node of the binary maze, plotted along the path from another end node. Map and goal synapses set to their optimal values assuming full knowledge of the graph and the target (gain γ=0.37). With linear activation (red), the goal signal has a local maximum, so navigation to the target fails. With a saturating activation function (blue), the goal signal is monotonic and leads the agent to the target.

Figure 7B–D reports the range of navigation on the three sample graphs, defined and computed from simulations as in the preceding section (Figure 6). The effective range is the largest graph distance over which the median trajectory chooses the shortest route. As observed using linear map neurons (Figure 6), the range increases with the gain γ until it collapses beyond some maximal value (Figure 7B–D). However, the saturating activation function allowed for higher gain values, which led to considerable increases in the range of navigation: by a factor of 2.2 for the ring graph, and 1.5 for the Tower of Hanoi graph. On the binary maze, the saturating activation function allowed perfect navigation over the maximal distance available of 12 steps.

The enhanced performance was a result of better map learning as well as better navigation. To understand the former, consider Figure 7E: here the agent has begun to learn the ring graph by walking back and forth between a few nodes (2–5), thus establishing all their pairwise map synapses; then it steps to a new node (6). With a linear activation function (Figure 7E, left), the recurrent synapses enhance the map output, so the map signal with the agent in the explored region (2–5) is considerably larger than after stepping to the new node. This interferes with the mechanism for map learning: the learning rule must identify which of the map cells represents the current location of the agent, and does so by setting a threshold on the output signal (Algorithm 2). In the present example, this leads to erroneous synapses because a map cell that receives only recurrent input (4) produces outputs larger than the threshold (arrowhead in Figure 7E). With the saturating activation function (Figure 7E, right), the directly activated map cells always have the largest output signal, so the learning rule can operate without errors.

The saturating activation function also helps after learning is complete. In Figure 7F, the agent is given perfect knowledge of the binary maze map, then asked to use the resulting goal signals to navigate from one end node to another. With a linear activation function, the goal signal has a large local maximum that traps the agent. The nonlinear activation function produces a monotonic goal signal that leads the agent to the target.

Both these aspects of enhanced performance can be traced to the normalizing effect of the nonlinearity that keeps the peak output of the map constant. Such normalization could be performed by other mechanisms as well, for example, a global inhibitory feedback among the map neurons.

In summary, this section shows that altering details of the model can substantially extend its performance. For the remainder of this article, we will return to the linear activation curve because interesting behavioral phenomena can be observed even with the simple linear model.

Navigating a partial map: Homing behavior

We have seen that endotaxis can learn both connections in the environment and the locations of targets after just one visit (Figure 6). This suggests that the agent can navigate well on whatever portion of the environment it has already seen, before covering it exhaustively. To illustrate this, we analyze an ethologically relevant instance.

Consider a mouse that enters an unfamiliar environment for the first time, such as a labyrinth constructed by graduate students (Rosenberg et al., 2021). Given the uncertainties about what lurks inside, the mouse needs to retain the ability to flee back to the entrance as fast as possible. For concreteness, take the mouse trajectory in Figure 8A. The animal has entered the labyrinth (location 1), made its way to one of the end nodes (3), then explored further to another end node (4). Suppose it needs to return to the entrance now. One way would be to retrace all its steps. But the shorter way is to take a left at (2) and cut out the unnecessary branch to (3). Experimentally we found that mice indeed take the short direct route instead of retracing their path (Rosenberg et al., 2021). They can do so even on the very first visit of an unfamiliar labyrinth. Can endotaxis explain this behavior?

Figure 8. Homing by endotaxis.


(A) A binary tree maze as used in Rosenberg et al., 2021. A simulated mouse begins to explore the labyrinth (colored trajectory, purple = early, yellow = late), traveling from the entrance (1) to one of the end nodes (3), then to another end node (4). Can it return to the entrance from there using endotaxis? (B) Goal signal learned by the end of the walk in (A), displayed as in Figure 5D, purple = 0. Note the goal signal is nonzero only at the nodes that have been encountered so far. From all those nodes, it increases monotonically toward the entrance. (C) Detailed plot of the goal signal along the shortest route for homing. Parameters γ=0.33,θ=0.30,α=10,ϵ=0.01.

We assume that the entrance is a salient location, so the agent dedicates a goal cell to the root node of the binary tree. Figure 8B plots the goal signal after the path in panel A, just as the agent wants to return home. The goal signal is nonzero only at the locations the agent has visited along its path. It clearly increases monotonically toward the entrance (Figure 8C). At a noise level of ϵ=0.01, the agent can navigate to the entrance by the shortest path without error. Note specifically that the agent does not retrace its steps when arriving at location (2), but instead turns toward (1).

One unusual aspect of homing is that the goal is identified first, before the agent has even entered the environment to explore it. That strengthens the goal synapse from the sole map cell that is active at the entrance. Only subsequently does the agent build up map synapses that allow the goal signal to spread throughout the map network. Still, in this situation, the single synapse onto the goal cell is sufficient to convey a robust signal for homing.

Efficient patrolling

Beside exploring and exploiting, a third mode of navigating the environment is patrolling. At this stage, the animal knows the lay of the land, and has perhaps discovered some special locations, but continues to patrol the environment for new opportunities or threats. In our study of mice freely interacting with a large labyrinth, the animals spent more than 85% of the time patrolling the maze (Rosenberg et al., 2021). This continued for hours after they had perfected the targeting of reward locations and the homing back to the entrance. Presumably, the goal of patrolling is to cover the entire environment frequently and efficiently so as to spot any changes as soon as they develop. So the ideal path in patrolling would visit every node on the graph in the smallest number of steps possible. In the binary tree maze used for our experiments, that optimal patrol path takes 252 steps: it visits every end node of the labyrinth exactly once without any repeats (Figure 9A).

Figure 9. Patrolling by endotaxis.

(A) Left: a binary tree maze as used in Rosenberg et al., 2021, plotted here so every node has a different vertical offset. Right: a perfect patrol path through this environment. It visits every node in 252 steps, then starts over. (B) Patrolling efficiency of different agents on the binary tree maze. The focus here is on the 64 end nodes of the labyrinth. We ask how many distinct end nodes are found (vertical axis) as a function of the number of end nodes visited (horizontal axis). For the perfect patrolling path, that relationship is the identity (‘perfect’). For a random walk, the curve is shifted far to the right (‘random’, note log axis). Ten mice in Rosenberg et al., 2021 showed patrolling behavior within the shaded range. Solid lines are the endotaxis agent, operating at different noise levels ϵ. Note ϵ = 0.01 produces perfect patrolling; in fact, panel (A) is a path produced by this agent. Higher noise levels lead to lower efficiency. The behavior of mice corresponds to ϵ ≈ 1. Gain γ = 0.33, habituation β = 1.2, with recovery time τ = 100 steps.

Real mice do not quite execute this optimal path, but their patrolling behavior is much more efficient than random (Figure 9B). They avoid revisiting areas they have seen recently. Could endotaxis implement such an efficient patrol of the environment? The task is to steer the agent to locations that have not been visited recently. One can formalize this by imagining a resource called ‘neglect’ distributed throughout the environment. At each location, neglect increases with time, then resets to zero the moment the agent visits there. To use this in endotaxis, one needs a goal cell that represents neglect.

We add to the core model a goal cell that represents ‘neglect.’ It receives excitation from every map cell via synapses that are equal and constant in strength (see clock symbol in Figure 1B). This produces a goal signal that is approximately constant everywhere in the environment. Now suppose that the point neurons undergo a form of habituation: when a point cell fires because the agent walks through its field, its sensitivity decreases by some habituation factor. That habituation then decays over time until the point cell recovers its original sensitivity. As a result, the most recently visited points on the graph produce a smaller goal signal. Endotaxis based on this goal signal will therefore lead the agent to the areas most in need of a visit.

Figure 9B illustrates that this is a powerful way to implement efficient patrols. Here we modeled endotaxis on the binary tree labyrinth, using the standard parameters that served for exploration, exploitation, and homing in the previous sections. To this, we added habituation in the point cells with exponential recovery dynamics. Formally, the procedure is defined by Algorithm 3. Again, we turned off the learning rules (Algorithm 2) during this simulation to observe the effects of habituation in isolation. A fully functioning agent can keep the learning rules on at all times (Figure 11).

Algorithm 3 Patrolling.
Parameters: gain γ, noise ϵ, habituation β, recovery time τ
Input: map synapses M
  h_i ← 1, for all point cells i                      starting sensitivity of point cell at node i
  s ← x                                               begin patrolling at node x
  while patrolling do
    h_s ← h_s · e^(−β)                                habituation of point cell s
    h_i ← 1 − (1 − h_i) · e^(−1/τ), for all i         resensitization of all point cells
    for all nodes j that neighbor s do                agent tests available options
      u_i(j) ← δ_ij · h_j, for all i                  point cell output with agent at node j
      v(j) ← (𝟙/γ − M)^(−1) · u(j)                    map output
      p(j) ← (1/Z) · Σ_i v_i(j) + η                   sum of map output with noise, normalized so max = 1
    end for
    s ← argmax_j p(j)                                 move to neighbor node with the highest patrol signal
  end while
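
To make Algorithm 3 concrete, the following is a minimal Python/NumPy sketch of a single patrolling decision. The function name, the neighbor dictionary, and the choice of zero-mean Gaussian readout noise with amplitude ϵ are illustrative assumptions rather than details of the released code.

import numpy as np

def patrol_step(s, h, M, neighbors, gamma=0.33, beta=1.2, tau=100, eps=0.01, rng=None):
    """One patrolling decision in the spirit of Algorithm 3 (sketch, not the released code).

    s         : current node index
    h         : array of point-cell sensitivities, one entry per node
    M         : learned map synapse matrix, shape (n, n)
    neighbors : dict mapping each node to the list of adjacent nodes
    Returns (next node, updated sensitivities).
    """
    if rng is None:
        rng = np.random.default_rng()
    n = M.shape[0]
    h = h.copy()
    h[s] *= np.exp(-beta)                      # habituate the point cell at the current node
    h = 1.0 - (1.0 - h) * np.exp(-1.0 / tau)   # all point cells recover toward sensitivity 1

    K = np.linalg.inv(np.eye(n) / gamma - M)   # map readout operator (1/gamma - M)^(-1)
    score = {}
    for j in neighbors[s]:                     # agent tests each available option
        u = np.zeros(n)
        u[j] = h[j]                            # point-cell output if the agent stood at node j
        v = K @ u                              # map-cell output
        score[j] = v.sum()                     # 'neglect' goal signal: plain sum of map output
    z = max(score.values())                    # normalize so the largest signal equals 1
    noisy = {j: score[j] / z + eps * rng.standard_normal() for j in score}
    s_next = max(noisy, key=noisy.get)         # step toward the highest (noisy) patrol signal
    return s_next, h

Iterating this step from h = np.ones(n) should reproduce the qualitative trend of Figure 9B: systematic coverage at low ϵ, degrading toward a random walk as ϵ grows.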

With appropriate choices of habituation β and recovery time τ, the agent does in fact execute a perfect patrol path on the binary tree, traversing every edge of the graph exactly once in each direction, and then repeating that sequence indefinitely (Figure 9A). For this to work, some habituation must persist for the time taken to traverse the entire tree; in this simulation, we used τ=100 steps on a graph that requires 252 steps. As in all applications of endotaxis, the performance also depends on the readout noise ϵ. For increasing readout noise, the agent’s behavior transitions gradually from the perfect patrol to a random walk (Figure 9B). The patrolling behavior of real mice is situated about halfway along that range, at an equivalent readout noise of ϵ=1 (Figure 9B).

Finally, this suggests a unified explanation for exploration and patrolling: in both modes, the agent follows the output of the ‘neglect’ cell, which is just the sum total of the map output. However, in the early exploration phase, when the agent is still assembling the cognitive map, it gives the neglect signal zero or low weight, so the turning decisions are dominated by the readout noise and produce something close to a random walk. Later on, the agent assigns a higher weight to the neglect signal, so it exceeds the readout noise and shifts the behavior toward systematic patrolling. In our simulations, an intrinsic readout noise of ϵ=0.01 is sufficiently low to enable even a perfect patrol path (Figure 9B).

In summary, the core model of endotaxis can be enhanced by adding a basic form of habituation at the input neurons. This allows the agent to implement an effective patrolling policy that steers towards regions which have been neglected for a while. Of course, habituation among point cells will also change the dynamics of map learning during the exploration phase. We found that both map and goal synapses are still learned effectively, and navigation to targets is only minimally affected by habituation (Figure 11).

Discussion

Summary of claims

We have presented a biologically plausible neural mechanism that can support learning, navigation, and problem solving in complex environments. The algorithm, called endotaxis, offers an end-to-end solution for assembling a cognitive map (Figure 4), memorizing interesting targets within that map, navigating to those targets (Figure 5), as well as accessory functions like instant homing (Figure 8) and effective patrolling (Figure 9). Conceptually, it is related to chemotaxis, namely the ability to follow an odor signal to its source, which is shared universally by most or all motile animals. The endotaxis network creates an internal ‘virtual odor’ which the animal can follow to reach any chosen target location (Figure 1). When the agent begins to explore the environment, the network learns both the structure of the space, namely which points are connected, and the location of valuable resources (Figure 4), even after a single experience (Figures 4 and 8). The agent can then navigate back to those target locations efficiently from any point in the environment (Figure 5). Beyond spatial navigation, endotaxis can also learn the solution to purely cognitive tasks (Figure 5) that can be formulated as search on a graph (section ‘Theory of endotaxis’). It takes as given two elementary facts: the existence of place cells that fire when the animal is at a specific location, and a behavioral module that allows the animal to follow an odor gradient uphill. The proposed circuit (Figure 1) provides the interface from the place cells to the virtual odor gradient. In the following sections, we consider how these findings relate to phenomena of animal behavior and neural circuitry, and prior art in the area of theory and modeling.

Theories and models of spatial learning

Broadly speaking, endotaxis can be seen as a form of reinforcement learning (Sutton and Barto, 2018): the agent learns from rewards or punishments in the environment and develops a policy that allows for subsequent navigation to special locations. The goal signal in endotaxis plays the role of a value function in reinforcement learning theory. From experience, the agent learns to compute that value function for every location and control its actions accordingly. Within the broad universe of reinforcement learning algorithms, endotaxis combines some special features as well as limitations that are inspired by empirical phenomena of animal learning, and also make it suitable for a biological implementation.

First, most of the learning happens without any reinforcement. During the exploratory random walk, endotaxis learns the topology of the environment, specifically by updating the synapses in the map network (M in Figure 1B). Rewards are not needed for this map learning, and indeed the goal signal remains zero during this period (Figure 4). Once a reward is encountered, the goal synapses (G in Figure 1B) get set, and the goal signal instantly spreads through the known portion of the environment. Thus, the agent learns how to navigate to the goal location from a single reinforcement (Figure 4). This is possible because the ground has been prepared, as it were, by learning a map. In animal behavior, the acquisition of a cognitive map without rewards is called latent learning. Early debates in animal psychology pitched latent learning and reinforcement learning as alternative explanations (Thistlethwaite, 1951). Instead, in the endotaxis algorithm, neither can function without the other as the goal signal explicitly depends on both the map and goal synapses (Equation 18, Algorithm 1).

More specifically, the neural signals in endotaxis bear some similarity to the so-called successor representation (Dayan, 1993; Corneil and Gerstner, 2015; Stachenfeld et al., 2017; Garvert et al., 2017; Fang et al., 2023). This is a proposal for how the brain might encode the current state of the agent, intended to simplify the mathematics of time-difference reinforcement learning. In that representation, there is a neuron for every state of the agent, and the activity of neuron j is the time-discounted probability that the agent will find itself at state j in the future. Similarly, the output of the endotaxis map network is related to future states of the agent and follows a similar functional dependence on distance (Meister, 2023, Equation 7). However, despite these formal similarities, the underlying logic is quite different. In the successor representation, γ plays the role of a temporal discount factor for rewards (Dayan, 1993); essentially it is the proportionality factor in the agent’s belief that ‘time is money.’ In this picture, varying γ allows the agent to make predictions with different time horizons (Fang et al., 2023; Stachenfeld et al., 2017). In endotaxis, there is no time/reward tradeoff. The agent simply wants the shortest path to the goal. The map network reflects the objective connectivity of the environment to the farthest extent possible. Here γ is the gain of the map neurons that, when properly chosen, allows the neural network to perform that computation. The agent may want to tune γ to the statistics of the environment, although we showed that a common value of γ works quite well across environments (Figure 6). (These differences in how the problem is formulated can lead to slightly different mathematical expressions, for example, compare the role of γ in Equation 7 with Equation 2 of Fang et al., 2023.)
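
To make the formal parallel concrete, one can write both quantities side by side (standard notation, not the cited equations themselves): for a fixed transition matrix T, the successor representation is the discounted sum over future occupancies,

  S = Σ_{t=0}^{∞} γ^t T^t = (𝟙 − γT)^(−1),

whereas the endotaxis map readout used throughout this article is v = (𝟙/γ − M)^(−1) u = γ (𝟙 − γM)^(−1) u, with M the learned connectivity. Both are resolvent-type matrix inverses that sum contributions from paths of every length, attenuated by a factor of γ per step, which is why both signals fall off with graph distance in a similar exponential fashion; the difference lies in what γ is taken to mean.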

Second, endotaxis does not tabulate the list of available actions at each state. That information remains externalized in the environment: the agent simply tries whatever actions are available at the moment, then picks the best one. This is a characteristically biological mode of action and most organisms have a behavioral routine that executes such trial-and-error. This ‘externalized cognition’ simplifies the learning task: for any given navigation policy, the agent needs to learn only one scalar function of location, namely the goal signal. By comparison, many machine learning algorithms develop a value function for state–action pairs, which then allows more sophisticated planning (Sutton and Barto, 2018; Moerland et al., 2023). The relative simplicity of the endotaxis circuit depends on the limitation to learning only state functions.

Some key elements of the endotaxis model have appeared in prior work, starting with the notion of ascending a scalar goal signal during navigation (Schmajuk and Thieme, 1992; Voicu and Schmajuk, 2000; Samsonovich and Ascoli, 2005). Several models assume the existence of a map layer, in which individual neurons stand for specific places, and the excitatory synapses between neurons represent the connections between those places (Gaussier et al., 2002; Schölkopf and Mallot, 1995; Voicu and Schmajuk, 2000; Trullier and Meyer, 2000; Martinet et al., 2011; Ponulak and Hopfield, 2013; Khajeh-Alijani et al., 2015). Then the agent somehow reads out those connections in order to find the shortest path between its current location (the start node) and a desired target (the end node).

Very different schemes have been proposed for this readout of the map. The most popular scheme is to somehow inject a signal into the desired end node, let it propagate backward through the network, and read out the magnitude or gradient of the signal near the start node (Glasius et al., 1996; Gaussier et al., 2002; Gorchetchnikov and Hasselmo, 2005; Martinet et al., 2011; Ponulak and Hopfield, 2013; Khajeh-Alijani et al., 2015). In general, this requires some accessory system that can look up which neuron in the map corresponds to the desired end node, and which neuron to the agent’s current location or its neighbors; often these accessory functions remain unspecified (Schölkopf and Mallot, 1995; Voicu and Schmajuk, 2000; Khajeh-Alijani et al., 2015). By contrast, in the endotaxis model the signal is propagated in the forward direction starting with the activity of the place cell at the agent’s current location. The signal strength is read out at the goal location: The goal neuron is the same neuron that also responds directly to the rewarding feature at the goal location. For example, the proximity to water is read out by a neuron that is also excited when the animal drinks water. In this way, the brain does not need to maintain a separate lookup table for goal neurons. If the agent wants to find water, it should simply follow the same neuron that fires when it drinks.

Another distinguishing feature of endotaxis is that it operates continuously. Many models for navigation have to separate the phase of spatial learning from the phase of goal-directed navigation. Sometimes plasticity needs to be turned off or reset during one phase or the other (Samsonovich and Ascoli, 2005; Ponulak and Hopfield, 2013). Sometimes a special signal must be injected during goal-seeking (Voicu and Schmajuk, 2000). Sometimes the rules change depending on whether the agent approaches or leaves a target (Blum and Abbott, 1996). Again this requires additional supervisory systems that often go unexplained. By contrast, endotaxis is ‘always on.’ Whether the animal explores a new environment, navigates to a target, or patrols a well-known graph, the synaptic learning rules are always the same. The animal chooses its policy by setting the mode switch that selects one of the available goal signals for the taxis module (Figure 1). Nothing has to change under the hood in the operation of the circuit. All the same signals are used for map learning, target learning, and navigation.

In summary, various components of the endotaxis model have appeared in other proposed schemes for spatial learning and navigation. The present model stands out in that all the essential functions are covered in a feed-forward and neuromorphically plausible manner, without invoking unexplained control schemes.

Animal behavior

The millions of animal species no doubt use a wide range of mechanisms to get around their environment, and it is worth specifying which types of navigation endotaxis might solve. First, the learning mechanism proposed here applies to complex environments, namely those in which discrete paths form sparse connections between points. For a rodent and many other terrestrial animals, the paths they may follow are usually constrained by obstacles or by the need to remain under cover. In those conditions, the brain cannot assume that the distance between points is given by Euclidean geometry, or that beacons for a goal will be visible in a straight line from far away, or that a target can be reached by following a known heading. As a concrete example, a mouse wishing to exit from deep inside a labyrinth (Figure 8A, Rosenberg et al., 2021) can draw little benefit from knowing the distance and heading of the entrance.

Second, we are focusing on the early experience with a new environment. Endotaxis can get an animal from zero knowledge to a cognitive map that allows reliable navigation toward goals discovered on a previous foray. It explains how an animal can return home from inside a complex environment on the first attempt (Rosenberg et al., 2021) or navigate to a special location after encountering it just once (Figures 6 and 8). But it does not implement more advanced routines of spatial reasoning, such as stringing a habitual sequence of actions together into one, or deliberating internally to plan entire routes. Clearly, given enough time in an environment, animals may develop algorithms other than the beginner’s choice proposed here.

A key characteristic of endotaxis, distinct from other forms of navigation, is the reliance on trial-and-error. The agent does not deliberate to plan the shortest path to the goal. Instead, it finds the shortest path by locally sampling the real-world actions available at its current point, and choosing the one that maximizes the virtual odor signal. In fact, there is strong evidence that animals navigate by real-world trial-and-error, at least in the early phase of learning (Redish, 2016). Lashley, 1912, in his first scientific paper on visual discrimination in the rat, reported that rats at a decision point often hesitate ‘with a swaying back and forth between the passages.’ These actions – called ‘vicarious trial and error’ – look eerily like sniffing out an odor gradient, but they occur even in the absence of any olfactory cues. Similar behaviors occur in arthropods (Tarsitano, 2006) and humans (Santos-Pata and Verschure, 2018) when poised at a decision point. We suggest that the animal does indeed sample a gradient, not of an odor, but of an internally generated virtual odor that reflects the proximity to the goal. The animal seems to use the same policy of spatial sampling that it would apply to a real odor signal.

Frequently, a rodent stopped at a maze junction merely turns its head side-to-side, rather than walking down a corridor to sample the gradient. Within the endotaxis model, this could be explained if some of the point cells in the lowest layer (Figure 1B) are selective for head direction or for the view down a specific corridor. During navigation, activation of that ‘direction cell’ systematically precedes activation of point cells further down that corridor. Therefore, the direction cell gets integrated into the map network. From then on, when the animal turns in that direction, this action takes a step along the graph of the environment without requiring a walk in ultimately fruitless directions. In this way, the agent can sample the goal gradient while minimizing energy expenditure.

Once the animal gains familiarity with the environment, it performs fewer of the vicarious trial-and-error movements, and instead moves smoothly through multiple intersections in a row (Redish, 2016). This may reflect a transition between different modes of navigation, from the early endotaxis, where every action gets evaluated on its real-world merit, to a mode where many actions are strung together into behavioral motifs. Eventually the animal may also develop an internal forward model for the effects of its own actions, which would allow for prospective planning of an entire route (Kay et al., 2020; Nyberg et al., 2022). An interesting direction for future research is to seek a neuromorphic circuit model for such action planning; perhaps it can be built naturally on top of the endotaxis circuit.

Brain circuits

The key elements in the proposed circuitry (Figure 1) are a large population of neurons with sparsely selective responses; massive convergence from that population onto a smaller set of output neurons; and synaptic plasticity at the output neurons gated by signals from the animal’s experience. A prominent instance of this motif is found in the mushroom body of the arthropod brain (Heisenberg, 2003; Strausfeld et al., 2009). Here the Kenyon cells, with their sparse odor responses (Stopfer, 2014), play the role of both point and map cells. They are strongly recurrently connected; in fact, most of the Kenyon cell output synapses are onto other Kenyon cells (Eichler et al., 2017; Takemura et al., 2017). Kenyon cells converge onto a much smaller set of mushroom body output neurons (Aso et al., 2014), which play the role of goal cells. Plasticity at the synapse between Kenyon cells and output neurons is gated by neuromodulators that encode rewards or punishments (Cohn et al., 2015). Mushroom body output neurons are known to guide the turning decisions of the insect (Aso et al., 2014), perhaps through their projections to the central complex (Li et al., 2020), an area critical to the animal’s turning behavior (Honkanen et al., 2019). Conceivably, this is where the insect’s basic chemotaxis module is implemented.

In the conventional view, the mushroom body helps with odor discrimination and forms memories of discrete odors that are associated with salient experience (Heisenberg, 2003). Subsequently, the animal can seek or avoid those odors. But the endotaxis model suggests a different interpretation: insects can also use odors as landmarks in the environment. In this more general form of navigation, the odor is not a goal in itself, but serves to mark a route toward some entirely different goal (Knaden and Graham, 2016; Steck et al., 2009). A Kenyon cell, through its sparse odor selectivity, may be active at only one place in the environment, and thus provide the required location-selective input to the endotaxis circuit. Recurrent synapses among Kenyon cells will learn the connectivity among these odor-defined locations, and the output neurons will learn to produce a goal signal that leads the insect to a rewarding location, which itself may not even have a defined odor.

Bees and certain ants rely strongly on vision for their navigation. Here the insect uses discrete panoramic views of the landscape as markers for its location (Webb and Wystrach, 2016; Buehlmann et al., 2020; Sun et al., 2020). In those species, the mushroom body receives massive input from visual areas of the brain. If the Kenyon cells respond sparsely to the landscape views, like the point cells in Figure 1, then the mushroom body can tie together these discrete vistas into a cognitive map that supports navigation toward arbitrary goal locations.

The same circuit motifs are commonly found in other brain areas, including the mammalian neocortex and hippocampus. While the synaptic circuitry there is less understood than in the insect brain, one can record from neurons more conveniently. Much of that work on neuronal signals during navigation has focused on the rodent hippocampal formation (Moser et al., 2015), and it is instructive to compare these recordings to the expectations from the endotaxis model. The three cell types in the model – point cells, map cells, and goal cells – all have place fields, in that they fire preferentially in certain regions within the graph of the environment. However, they differ in important respects.

The place field is smallest for a point cell; somewhat larger for a map cell, owing to recurrent connections in the map network; and larger still for goal cells, owing to additional pooling in the goal network. Such a wide range of place field sizes has indeed been observed in surveys of the rodent hippocampus, spanning at least a factor of 10 in diameter (Wilson and McNaughton, 1993; Kjelstrup et al., 2008). Some place cells show a graded firing profile that fills the available environment. Furthermore, one finds more place fields near the goal location of a navigation task, even when that location has no overt markers (Hollup et al., 2001). Both of those characteristics are expected of the goal cells in the endotaxis model.

The endotaxis model assumes that point cells exist from the very outset in any environment. Indeed, many place cells in the rodent hippocampus appear within minutes of the animal’s entry into an arena (Wilson and McNaughton, 1993; Frank et al., 2004). Furthermore, any given environment activates only a small fraction of these neurons. Most of the ‘potential place cells’ remain silent, presumably because their sensory trigger feature does not match any of the locations in the current environment (Alme et al., 2014; Epsztein et al., 2011). In the endotaxis model, each of these sets of point cells is tied into a different map network, which would allow the circuit to maintain multiple cognitive maps in memory (Muller et al., 1991).

Goal cells, on the other hand, are expected to have large place fields, centered on a goal location, but extending over much of the environment, so the animal can follow the gradient of their activity (Burgess and O’Keefe, 1996). Indeed, such cells have been reported in rat cortex (Hok et al., 2005). In the endotaxis model, a goal cell appears suddenly when the animal first arrives at a memorable location, the input synapses from the map network are potentiated, and the neuron immediately develops a place field (Figure 4). This prediction is reminiscent of a startling experimental observation in recordings from hippocampal area CA1: a neuron can suddenly start firing with a fully formed place field that may be located anywhere in the environment (Bittner et al., 2017). This event appears to be triggered by a calcium plateau potential in the dendrites of the place cell, which potentiates the excitatory synaptic inputs the cell receives. A surprising aspect of this discovery was the large extent of the resulting place field, which requires the animal several seconds to cover. Subsequent cellular measurements indeed revealed a plasticity mechanism that extends over several seconds (Magee and Grienberger, 2020). The endotaxis model relies on just such a plasticity rule for map learning (Algorithm 2) that can correlate events at subsequent nodes on the agent’s trajectory.

Outlook

Endotaxis is a hypothetical neural circuit solution to the problems of spatial exploration, learning, and navigation. Its compact circuit structure and all-in-one functionality suggest that it would fit in even the smallest brains. Effectively, endotaxis represents a brain module that could be interposed between a spatial-sensing module, which produces place cells, and a taxis module, which delivers the movements to ascend a goal signal. It further relies on some high-level policy that sets the ‘mode switch’ by which the animal chooses what goal to pursue. Future research might get at this behavioral control mechanism through a program of anatomical module tracing: first find the neural circuit that controls chemotaxis behavior. Then test if that module receives a convergence of goal signals from other circuits with non-olfactory information. If so, the mechanism of arbitrage that routes one or another goal signal to the taxis module should reveal the high-level coordination of the animal’s behavior. Given the recent technical developments in mapping the connectome (Dorkenwald et al., 2023), we believe that such a program of module tracing is within reach, probably first for the insect brain.

Materials and methods

Simulations

Numerical simulations were performed as described (see Algorithms 1–4). Parameter settings are listed in the text and figure captions. The sensitivity to parameters is reported in Figure 6. Code that produced all the results is available in a public repository.

Average navigated distance

In the text, we often assess the performance of an endotaxis agent by considering point-to-point navigation between all pairs of points on a graph. Given the readout noise ϵ that affects the goal signal, navigation is a stochastic process with many random decisions along the route. Different random instantiations of the process will produce routes of different lengths. Fortunately, there is a way to calculate the expectation value of the route length without any Monte Carlo simulation.

Consider navigation to goal node y. From the state of the network (M and G), we compute the goal signal Eyj at every node j. When the agent is at node j, it chooses among the neighbor nodes the one with the highest sum of goal signal and noise (Algorithm 1). Based on the goal signal Eyj and the noise ϵ, one can compute the probability for each such possible step from j. This leads to a transition matrix for the random walk

T_ij(y) = probability of stepping to node i when at node j, while in pursuit of goal y

Subsequent decisions along the route are independent of each other. Hence, the process is a Markov chain. Then we make use of a well-known result for first-capture times on a Markov chain to compute the expected number of steps to arrival at y starting from any node x.
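
For illustration, that first-capture calculation takes only a few lines of Python, assuming the transition matrix T(y) has already been assembled from the goal signal and the noise model; this is the textbook absorbing-chain computation, not code from the released repository.

import numpy as np

def expected_steps_to_goal(T, goal):
    """Expected number of steps to reach `goal` from every starting node.

    T[i, j] = probability of stepping to node i when currently at node j
              (column-stochastic), as defined above.
    Returns an array h with h[x] = expected route length from node x to the goal.
    """
    n = T.shape[0]
    transient = [k for k in range(n) if k != goal]
    Q = T[np.ix_(transient, transient)]              # transitions among non-goal nodes
    # First-capture times satisfy h = 1 + Q^T h, i.e. (I - Q^T) h = 1
    h_t = np.linalg.solve(np.eye(n - 1) - Q.T, np.ones(n - 1))
    h = np.zeros(n)
    h[transient] = h_t
    return h

The resulting h[x] is the expectation value of the route length from node x to goal y referred to in the text.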

Note that the method assumes the process is stationary Markov, such that the goal signal E_yj does not change in the course of navigation. In our analysis of patrolling (Figures 9 and 11), this assumption is violated because the habituation state of the point cells depends on what path the agent took to the current node. In those cases, we resorted to Monte Carlo simulations to estimate the distribution of route lengths.

Nonlinear activation function

The activation function of a map neuron is the relationship of input to output

v_i = f(w_i)    (20)

where (Equation 4)

w_i = u_i + Σ_j M_ij v_j    (21)

is the input to the map neuron. Most of the report assumes a linear activation function (Equation 3)

f(w) = γw    (22)

For Figure 7, we used a saturating function instead (Equation 20):

f(w) = { γw,  if w ≤ 1;   γ,  if w > 1 }    (23)

The recurrent network equation v_i = f(u_i + Σ_j M_ij v_j) was solved using Python’s fsolve.
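
As an illustration of that numerical step, here is a minimal sketch using SciPy's fsolve with the saturating activation of Equation 23; the function and variable names are ours, not those of the released code.

import numpy as np
from scipy.optimize import fsolve

def map_output(u, M, gamma=0.33):
    """Solve the fixed point v = f(u + M v) for the map-cell activity."""
    def f(w):
        # Equation 23: linear below the saturation point, constant above it
        return np.where(w <= 1.0, gamma * w, gamma)

    def residual(v):
        return v - f(u + M @ v)

    v0 = gamma * u            # linear-regime solution as the starting guess
    return fsolve(residual, v0)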

Forgetting of links and resources

In section ‘Acquisition of map and targets during exploration,’ we discuss the learning algorithm that acquires the connectivity of the environment and the locations of resources. It reacts rapidly to the appearance of new links in the environment: as soon as the agent travels from one point to another, the synapse between the corresponding map cells gets established. Suppose now that a previously existing link becomes blocked: How can one remove the corresponding synapse from the map? A simple solution would be to let all synapses decay over time, balanced by strengthening whenever a link gets traveled. In that case, the entire map would be forgotten when the animal goes to sleep for a few hours, whereas it is clear that animals retain such maps over many days. Instead, one wants a mode of active forgetting: memory of the link from node i to j should be weakened only if the agent finds itself at node i and repeatedly chooses not to go to j. We formalize this in Algorithm 4, which differs only slightly from Algorithm 2.

Algorithm 4 Learning and forgetting.
Parameters: gain γ, threshold θ, goal-learning rate α, forgetting rate δ
Input: adjacency matrix A, resource signals F
  M ← 0                                               initiate map synapses at 0
  G ← 0                                               initiate goal synapses at 0
  t ← 0                                               t counts the steps
  s(t) ← x                                            start random walk at x
  while learning do
    t ← t + 1
    s(t) ← a random neighbor of s(t−1)                continue the random walk
    u_i(t) ← δ_i,s(t), for every point cell i         point cell output
    v(t) ← (𝟙/γ − M)^(−1) · u(t)                      map cell output
    for all map cell pairs (i,j) do
      if v_j(t−1) > θ then                            if pre-synaptic high
        if v_i(t) > θ then                            if post-synaptic also high
          M_ij, M_ji ← 1                              potentiate the synapses
        else                                          if post-synaptic low
          M_ij ← e^(−δ) · M_ij                        depress the synapses
          M_ji ← e^(−δ) · M_ji
        end if
      end if
    end for
    r ← G · v(t)                                      goal signals
    for every goal neuron k do
      D ← F_k,s(t) − r_k                              difference between resource signal and prediction from the map
      if D > 0 then                                   if the resource signal exceeds the prediction from the map
        for every map neuron j do
          G_kj ← G_kj + α · D · v_j(t)                potentiate goal synapses
        end for
      else                                            if resource signal less than prediction
        for every map neuron j do
          G_kj ← e^(−δ·v_j) · G_kj                    depress goal synapses
        end for
      end if
    end for
  end while

Here the added parameter δ determines how much a map synapse gets depressed each time the corresponding link is not chosen. Similarly, goal synapses decay if their prediction for a resource exceeds the resource signal received by the goal cell. The synaptic learning rule resembles the BCM rule (Bienenstock et al., 1982): synaptic modification is conditional on presynaptic activity and leads to either potentiation or depression depending on the level of postsynaptic activity.
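
To make the structure of Algorithm 4 explicit, here is a minimal Python sketch of one learning-and-forgetting update, applied after each step of the random walk. The function signature, array shapes, and the indexing of the resource signal F are illustrative assumptions; the released code remains the authoritative implementation.

import numpy as np

def learn_step(s_curr, v_prev, v_curr, M, G, F, theta=0.27, alpha=0.3, delta=0.1):
    """One synaptic update after the agent has stepped to node s_curr (Algorithm 4, sketch).

    v_prev, v_curr : map-cell activity vectors before and after the step
    M              : map synapses, shape (n, n)
    G              : goal synapses, one row per goal cell, shape (n_goals, n)
    F              : resource signals, F[k, s] = signal for goal cell k at node s
    """
    M, G = M.copy(), G.copy()
    n = M.shape[0]

    # Map synapses: potentiate when pre- and postsynaptic activity are both high,
    # depress when the presynaptic cell was high but the postsynaptic cell stayed low.
    for j in np.flatnonzero(v_prev > theta):         # presynaptic map cells above threshold
        for i in range(n):
            if v_curr[i] > theta:
                M[i, j] = M[j, i] = 1.0              # potentiate both directions
            else:
                M[i, j] *= np.exp(-delta)            # depress both directions
                M[j, i] *= np.exp(-delta)

    # Goal synapses: nudge each goal cell's prediction toward the resource signal.
    r = G @ v_curr                                   # current goal-signal predictions
    for k in range(G.shape[0]):
        D = F[k, s_curr] - r[k]                      # unpredicted part of the resource signal
        if D > 0:
            G[k] += alpha * D * v_curr               # potentiate toward the surprise
        else:
            G[k] *= np.exp(-delta * v_curr)          # depress, weighted by map activity
    return M, G

Note that potentiation saturates a map synapse at 1 while depression is multiplicative, which is what makes acquisition of a new link fast (a single traversal) and forgetting gradual, as in Figure 10E.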

Figure 10 illustrates this process with a simulation analogous to Figure 4. The agent explores a ring graph by a random walk. At some point, a new link appears clear across the ring. Later on that link disappears again. Acquisition of the link happens very quickly, within a single time step (Figure 10A and C). Forgetting that link takes longer, on the order of several hundred steps (Figure 10A, D and E). In this simulation, δ=0.1, so the map synapses decay by about 10% whenever a link is not traveled. One could, of course, accelerate that with a higher δ, but at the cost of destabilizing the entire map. Even the synapses for intact links get depressed frequently (Figure 10E) because the random choices of the agent lead it to take any given link only a fraction of the time.

Figure 10. Forgetting a link during exploration.

(A) Simulation of a random walk on a ring with 14 nodes as in Figure 4. Left: layout of the ring, with resource locations marked in blue. The walk progresses in 1000 time steps (top to bottom), with the agent’s position marked in red (nodes 0–13, horizontal axis). At each time, the color map shows the goal signal that would be produced if the agent were at position ‘Node.’ White horizontal lines mark the appearance of a new link between nodes 4 and 11 at t=200, and disappearance of that link at t=400. (B) The matrix M of map synapses at various times. The pixel in row i and column j represents the matrix element M_ij. Color purple = 0. Note the first few steps (number above graph) each add a new synapse. Eventually, M reflects the adjacency matrix of nodes on the graph, and changes as a link is added and removed. (C) Goal signals just before and just after the agent travels the new link. (D) Goal signals just before the link disappears and at the end of the walk. (E) Strength of two synapses in the map, M_4,5 and M_4,11, plotted against time during the random walk. Model parameters: γ = 0.32, θ = 0.27, α = 0.3, δ = 0.1.

One limitation of the endotaxis agent is that it does not keep a record of what actions are available at each node. Instead, it leaves that information in the environment (see ‘Discussion’) and simply tries all the actions that are available. When faced with a blocked tunnel, the endotaxis agent does not recognize that this passage was previously available. Clearly, a more advanced model of the world that includes a state–action table would allow more effective editing of the cognitive map.

Habituation in point cells

In section ‘Efficient patrolling,’ we discuss an extension of the core endotaxis model in which a point neuron undergoes habituation after the agent passes through its node. With every visit, the neuron’s sensitivity declines by a factor of e^(−β). Between visits, the sensitivity gradually returns toward 1 with an exponential recovery time of τ steps (see Algorithm 3).

This addition to the model changes the dynamics of the network input throughout the phases of exploration, navigation, and patrolling. We explored how the resulting performance is affected by applying a strong habituation that decays slowly (β = 1.2, τ = 100) and comparing it to the basic model with no habituation (β = 0). During the learning phase, when the map and goal synapses are established via a random walk, the main change is that it takes somewhat longer to learn the map. This is because synaptic updates happen only when both pre- and postsynaptic map cells exceed a threshold (see Algorithm 2), and that requires that both of the respective point neurons be in a high-sensitivity state. Remarkably, all the parameter settings (γ, θ, α) that support learning and navigating under standard conditions (Figure 6) also work well when habituation takes place.

To illustrate the overall effect that habituation has on performance, we simulated learning and navigation on the binary tree graph of Figure 9. For every pair of start and end nodes, we asked how the actual navigated distance compared to the shortest graph distance. Figure 11 shows that performance is affected only slightly. At the standard noise value ϵ=0.01 used in other simulations, the range of navigation extends over 10 or more steps under both conditions.

Figure 11. Navigation performance with and without habituation.

Navigated distance on the binary tree maze, displayed as in Figure 5E. Left: an agent with strong habituation: β = 1.2, τ = 100. Right: no habituation: β = 0. The agent learned the map and the goal signals for every node during a random walk with 30,000 steps. Then the agent navigated between all pairs of points on the maze. Graphs show the median ± 10/90 percentile of the navigated distance for all routes with the same graph distance. Other model parameters: γ = 0.33, θ = 0.30, α = 0.1, ϵ as listed.

Acknowledgements

This work was supported by the Simons Collaboration on the Global Brain (grant 543015 to MM and 543025 to PP), NSF award 1564330 to PP, and a gift from Google to PP.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Markus Meister, Email: meister4@mac.com.

Srdjan Ostojic, École Normale Supérieure - PSL, France.

Timothy E Behrens, University of Oxford, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • Simons Foundation 543015 to Markus Meister.

  • Simons Foundation 543025 to Pietro Perona.

  • National Science Foundation 1564330 to Pietro Perona.

  • Google to Pietro Perona.

Additional information

Competing interests

No competing interests declared.

Reviewing editor, eLife.

Author contributions

Conceptualization, Software, Formal analysis, Investigation, Writing – review and editing.

Conceptualization, Investigation, Writing – review and editing.

Formal analysis, Investigation, Writing – review and editing.

Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Writing – review and editing.

Conceptualization, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Investigation, Writing - original draft, Writing – review and editing.

Additional files

MDAR checklist

Data availability

Data and code to reproduce the reported results are openly available at https://github.com/markusmeister/Endotaxis-2023 (copy archived at Meister, 2024).

References

  1. Aboitiz F, Montiel JF. Olfaction, navigation, and the origin of isocortex. Frontiers in Neuroscience. 2015;9:402. doi: 10.3389/fnins.2015.00402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alme CB, Miao C, Jezek K, Treves A, Moser EI, Moser MB. Place cells in the hippocampus: Eleven maps for eleven rooms. PNAS. 2014;111:18428–18435. doi: 10.1073/pnas.1421056111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Aso Y, Sitaraman D, Ichinose T, Kaun KR, Vogt K, Belliart-Guérin G, Plaçais P-Y, Robie AA, Yamagata N, Schnaitmann C, Rowell WJ, Johnston RM, Ngo T-TB, Chen N, Korff W, Nitabach MN, Heberlein U, Preat T, Branson KM, Tanimoto H, Rubin GM. Mushroom body output neurons encode valence and guide memory-based action selection in Drosophila. eLife. 2014;3:e04580. doi: 10.7554/eLife.04580. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Baker KL, Dickinson M, Findley TM, Gire DH, Louis M, Suver MP, Verhagen JV, Nagel KI, Smear MC. Algorithms for olfactory search across species. The Journal of Neuroscience. 2018;38:9383–9389. doi: 10.1523/JNEUROSCI.1668-18.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Berg HC. In: Cold Spring Harbor Symposia on Quantitative Biology. Berg HC, editor. Elsevier; 1988. A physicist looks at bacterial chemotaxis; pp. 1–9. [DOI] [PubMed] [Google Scholar]
  6. Bienenstock EL, Cooper LN, Munro PW. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. The Journal of Neuroscience. 1982;2:32–48. doi: 10.1523/JNEUROSCI.02-01-00032.1982. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bittner KC, Milstein AD, Grienberger C, Romani S, Magee JC. Behavioral time scale synaptic plasticity underlies CA1 place fields. Science. 2017;357:1033–1036. doi: 10.1126/science.aan3846. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Blum KI, Abbott LF. A model of spatial map formation in the hippocampus of the rat. Neural Computation. 1996;8:85–93. doi: 10.1162/neco.1996.8.1.85. [DOI] [PubMed] [Google Scholar]
  9. Buehlmann C, Wozniak B, Goulard R, Webb B, Graham P, Niven JE. Mushroom bodies are required for learned visual navigation, but not for innate visual behavior, in ants. Current Biology. 2020;30:3438–3443. doi: 10.1016/j.cub.2020.07.013. [DOI] [PubMed] [Google Scholar]
  10. Burgess N, O’Keefe J. Neuronal computations underlying the firing of place cells and their role in navigation. Hippocampus. 1996;6:749–762. doi: 10.1002/(SICI)1098-1063(1996)6:6<749::AID-HIPO16>3.0.CO;2-0. [DOI] [PubMed] [Google Scholar]
  11. Cohn R, Morantte I, Ruta V. Coordinated and compartmentalized neuromodulation shapes sensory processing in Drosophila. Cell. 2015;163:1742–1755. doi: 10.1016/j.cell.2015.11.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Collett TS, Collett M. Memory use in insect visual navigation. Nature Reviews. Neuroscience. 2002;3:542–552. doi: 10.1038/nrn872. [DOI] [PubMed] [Google Scholar]
  13. Corneil DS, Gerstner W. Attractor network dynamics enable preplay and rapid path planning in maze-like environments. Advances in Neural Information Processing Systems. 2015. [Google Scholar]
  14. Dayan P. Improving generalization for temporal difference learning: the successor representation. Neural Computation. 1993;5:613–624. doi: 10.1162/neco.1993.5.4.613. [DOI] [Google Scholar]
  15. Dayan P, Abbott LF. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Computational Neuroscience. Cambridge, Mass: MIT Press; 2001. [Google Scholar]
  16. Dorkenwald S, Matsliah A, Sterling AR, Schlegel P, Yu SC, McKellar CE, Lin A, Costa M, Eichler K, Yin Y, Silversmith W, Schneider-Mizell C, Jordan CS, Brittain D, Halageri A, Kuehner K, Ogedengbe O, Morey R, Gager J, Kruk K, Perlman E, Yang R, Deutsch D, Bland D, Sorek M, Lu R, Macrina T, Lee K, Bae JA, Mu S, Nehoran B, Mitchell E, Popovych S, Wu J, Jia Z, Castro M, Kemnitz N, Ih D, Bates AS, Eckstein N, Funke J, Collman F, Bock DD, Jefferis G, Seung HS, Murthy M, FlyWire Consortium Neuronal Wiring Diagram of an Adult Brain. bioRxiv. 2023 doi: 10.1101/2023.06.27.546656. [DOI] [PMC free article] [PubMed]
  17. Eichler K, Li F, Litwin-Kumar A, Park Y, Andrade I, Schneider-Mizell CM, Saumweber T, Huser A, Eschbach C, Gerber B, Fetter RD, Truman JW, Priebe CE, Abbott LF, Thum AS, Zlatic M, Cardona A. The complete connectome of a learning and memory centre in an insect brain. Nature. 2017;548:175–182. doi: 10.1038/nature23455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Epsztein J, Brecht M, Lee AK. Intracellular determinants of hippocampal CA1 place and silent cell activity in a novel environment. Neuron. 2011;70:109–120. doi: 10.1016/j.neuron.2011.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Fang C, Aronov D, Abbott LF, Mackevicius EL. Neural learning rules for generating flexible predictions and computing the successor representation. eLife. 2023;12:e80680. doi: 10.7554/eLife.80680. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Frank LM, Stanley GB, Brown EN. Hippocampal plasticity across multiple days of exposure to novel environments. The Journal of Neuroscience. 2004;24:7681–7689. doi: 10.1523/JNEUROSCI.1958-04.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Galtier MN, Faugeras OD, Bressloff PC. Hebbian learning of recurrent connections: A geometrical perspective. Neural Computation. 2012;24:2346–2383. doi: 10.1162/NECO_a_00322. [DOI] [PubMed] [Google Scholar]
  22. Garvert MM, Dolan RJ, Behrens TE. A map of abstract relational knowledge in the human hippocampal-entorhinal cortex. eLife. 2017;6:e17086. doi: 10.7554/eLife.17086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Gaussier P, Revel A, Banquet JP, Babeau V. From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biological Cybernetics. 2002;86:15–28. doi: 10.1007/s004220100269. [DOI] [PubMed] [Google Scholar]
  24. Gerstner W, Kistler WM. Mathematical formulations of Hebbian learning. Biological Cybernetics. 2002;87:404–415. doi: 10.1007/s00422-002-0353-y. [DOI] [PubMed] [Google Scholar]
  25. Glasius R, Komoda A, Gielen SCAM. A biologically inspired neural net for trajectory formation and obstacle avoidance. Biological Cybernetics. 1996;74:511–520. doi: 10.1007/BF00209422. [DOI] [PubMed] [Google Scholar]
  26. Gorchetchnikov A, Hasselmo ME. A biophysical implementation of a bidirectional graph search algorithm to solve multiple goal navigation tasks. Connection Science. 2005;17:145–164. doi: 10.1080/09540090500140925. [DOI] [Google Scholar]
  27. Heisenberg M. Mushroom body memoir: from maps to models. Nature Reviews. Neuroscience. 2003;4:266–275. doi: 10.1038/nrn1074. [DOI] [PubMed] [Google Scholar]
  28. Hok V, Save E, Lenck-Santini PP, Poucet B. Coding for spatial goals in the prelimbic/infralimbic area of the rat frontal cortex. PNAS. 2005;102:4602–4607. doi: 10.1073/pnas.0407332102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Hollup SA, Molden S, Donnett JG, Moser MB, Moser EI. Accumulation of hippocampal place fields at the goal location in an annular watermaze task. The Journal of Neuroscience. 2001;21:1635–1644. doi: 10.1523/JNEUROSCI.21-05-01635.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Honkanen A, Adden A, da Silva Freitas J, Heinze S. The insect central complex and the neural basis of navigational strategies. The Journal of Experimental Biology. 2019;222:Suppl. doi: 10.1242/jeb.188854. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Jacobs LF. From chemotaxis to the cognitive map: The function of olfaction. PNAS. 2012;109:10693–10700. doi: 10.1073/pnas.1201880109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Kay K, Chung JE, Sosa M, Schor JS, Karlsson MP, Larkin MC, Liu DF, Frank LM. Constant sub-second cycling between representations of possible futures in the hippocampus. Cell. 2020;180:552–567. doi: 10.1016/j.cell.2020.01.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Khajeh-Alijani A, Urbanczik R, Senn W. Scale-free navigational planning by neuronal traveling waves. PLOS ONE. 2015;10:e0127269. doi: 10.1371/journal.pone.0127269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Kjelstrup KB, Solstad T, Brun VH, Hafting T, Leutgeb S, Witter MP, Moser EI, Moser MB. Finite scale of spatial representation in the hippocampus. Science. 2008;321:140–143. doi: 10.1126/science.1157086. [DOI] [PubMed] [Google Scholar]
  35. Knaden M, Graham P. The sensory ecology of ant navigation: from natural environments to neural mechanisms. Annual Review of Entomology. 2016;61:63–76. doi: 10.1146/annurev-ento-010715-023703. [DOI] [PubMed] [Google Scholar]
  36. Lashley KS. Visual discrimination of size and form in the albino rat. Journal of Animal Behavior. 1912;2:310–331. doi: 10.1037/h0071033. [DOI] [Google Scholar]
  37. Li F, Lindsey JW, Marin EC, Otto N, Dreher M, Dempsey G, Stark I, Bates AS, Pleijzier MW, Schlegel P, Nern A, Takemura S-Y, Eckstein N, Yang T, Francis A, Braun A, Parekh R, Costa M, Scheffer LK, Aso Y, Jefferis GS, Abbott LF, Litwin-Kumar A, Waddell S, Rubin GM. The connectome of the adult Drosophila mushroom body provides insights into function. eLife. 2020;9:e62576. doi: 10.7554/eLife.62576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Magee JC, Grienberger C. Synaptic plasticity forms and functions. Annual Review of Neuroscience. 2020;43:95–117. doi: 10.1146/annurev-neuro-090919-022842. [DOI] [PubMed] [Google Scholar]
  39. Martinet LE, Sheynikhovich D, Benchenane K, Arleo A. Spatial learning and action planning in a prefrontal cortical network model. PLOS Computational Biology. 2011;7:e1002045. doi: 10.1371/journal.pcbi.1002045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Meister M. A Fast Algorithm for All-Pairs-Shortest-Paths Suitable for Neural Networks. arXiv. 2023 doi: 10.1162/neco_a_01716. https://arxiv.org/abs/2308.07403 [DOI] [PubMed]
  41. Meister M. Endotaxis-2023. swh:1:rev:7c97e345063101f15c59ab9d321a3eea9809fa8b. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:5039b9a19394508de3394e91877b8fc01f672f08;origin=https://github.com/markusmeister/Endotaxis-2023;visit=swh:1:snp:4a9895f6868d180d3ba38e3ea92083c91fd42693;anchor=swh:1:rev:7c97e345063101f15c59ab9d321a3eea9809fa8b
  42. Moerland TM, Broekens J, Plaat A, Jonker CM. Model-based reinforcement learning: A survey. IEEE; 2023. [DOI] [Google Scholar]
  43. Morris RGM, Garrud P, Rawlins JNP, O’Keefe J. Place navigation impaired in rats with hippocampal lesions. Nature. 1982;297:681–683. doi: 10.1038/297681a0. [DOI] [PubMed] [Google Scholar]
  44. Moser MB, Rowland DC, Moser EI. Place cells, grid cells, and memory. Cold Spring Harbor Perspectives in Biology. 2015;7:a021808. doi: 10.1101/cshperspect.a021808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Müller M, Wehner R. Path integration in desert ants, Cataglyphis fortis. PNAS. 1988;85:5287–5290. doi: 10.1073/pnas.85.14.5287. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Muller RU, Kubie JL, Saypoff R. The hippocampus as a cognitive graph (abridged version) Hippocampus. 1991;1:243–246. doi: 10.1002/hipo.450010306. [DOI] [PubMed] [Google Scholar]
  47. Nyberg N, Duvelle É, Barry C, Spiers HJ. Spatial goal coding in the hippocampal formation. Neuron. 2022;110:394–422. doi: 10.1016/j.neuron.2021.12.012. [DOI] [PubMed] [Google Scholar]
  48. Ponulak F, Hopfield JJ. Rapid, parallel path planning by propagating wavefronts of spiking neural activity. Frontiers in Computational Neuroscience. 2013;7:98. doi: 10.3389/fncom.2013.00098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Redish AD. Vicarious trial and error. Nature Reviews. Neuroscience. 2016;17:147–159. doi: 10.1038/nrn.2015.30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Rosenberg M, Zhang T, Perona P, Meister M. Mice in a labyrinth show rapid learning, sudden insight, and efficient exploration. eLife. 2021;10:e66175. doi: 10.7554/eLife.66175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Samsonovich AV, Ascoli GA. A simple neural network model of the hippocampus suggesting its pathfinding role in episodic memory retrieval. Learning & Memory. 2005;12:193–208. doi: 10.1101/lm.85205. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Santos-Pata D, Verschure PFMJ. Human vicarious trial and error is predictive of spatial navigation performance. Frontiers in Behavioral Neuroscience. 2018;12:237. doi: 10.3389/fnbeh.2018.00237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Schmajuk NA, Thieme AD. Purposive behavior and cognitive mapping: A neural network model. Biological Cybernetics. 1992;67:165–174. doi: 10.1007/BF00201023. [DOI] [PubMed] [Google Scholar]
  54. Schölkopf B, Mallot HA. View-based cognitive mapping and path planning. Adaptive Behavior. 1995;3:311–348. doi: 10.1177/105971239500300303. [DOI] [Google Scholar]
  55. Sosa M, Giocomo LM. Navigating for reward. Nature Reviews. Neuroscience. 2021;22:472–487. doi: 10.1038/s41583-021-00479-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Stachenfeld KL, Botvinick MM, Gershman SJ. The hippocampus as a predictive map. Nature Neuroscience. 2017;20:1643–1653. doi: 10.1038/nn.4650. [DOI] [PubMed] [Google Scholar]
  57. Steck K, Hansson BS, Knaden M. Smells like home: Desert ants, Cataglyphis fortis, use olfactory landmarks to pinpoint the nest. Frontiers in Zoology. 2009;6:5. doi: 10.1186/1742-9994-6-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Stopfer M. Central processing in the mushroom bodies. Current Opinion in Insect Science. 2014;6:99–103. doi: 10.1016/j.cois.2014.10.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Strausfeld NJ, Sinakevitch I, Brown SM, Farris SM. Ground plan of the insect mushroom body: functional and evolutionary implications. The Journal of Comparative Neurology. 2009;513:265–291. doi: 10.1002/cne.21948. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Sun X, Yue S, Mangan M. A decentralised neural model explaining optimal integration of navigational strategies in insects. eLife. 2020;9:e54026. doi: 10.7554/eLife.54026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; 2018. [Google Scholar]
  62. Takemura S-Y, Aso Y, Hige T, Wong A, Lu Z, Xu CS, Rivlin PK, Hess H, Zhao T, Parag T, Berg S, Huang G, Katz W, Olbris DJ, Plaza S, Umayam L, Aniceto R, Chang L-A, Lauchie S, Ogundeyi O, Ordish C, Shinomiya A, Sigmund C, Takemura S, Tran J, Turner GC, Rubin GM, Scheffer LK. A connectome of A learning and memory center in the adult Drosophila brain. eLife. 2017;6:e26975. doi: 10.7554/eLife.26975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Tarsitano M. Route selection by a jumping spider (Portia labiata) during the locomotory phase of a detour. Animal Behaviour. 2006;72:1437–1442. doi: 10.1016/j.anbehav.2006.05.007. [DOI] [PubMed] [Google Scholar]
  64. Thistlethwaite D. A critical review of latent learning and related experiments. Psychological Bulletin. 1951;48:97–129. doi: 10.1037/h0055171. [DOI] [PubMed] [Google Scholar]
  65. Tolman EC. Cognitive maps in rats and men. Psychological Review. 1948;55:189–208. doi: 10.1037/h0061626. [DOI] [PubMed] [Google Scholar]
  66. Trullier O, Meyer JA. Animat navigation using a cognitive graph. Biological Cybernetics. 2000;83:271–285. doi: 10.1007/s004220000170. [DOI] [PubMed] [Google Scholar]
  67. Voicu H, Schmajuk N. Exploration, navigation and cognitive mapping. Adaptive Behavior. 2000;8:207–223. doi: 10.1177/105971230000800301. [DOI] [Google Scholar]
  68. Webb B, Wystrach A. Neural mechanisms of insect navigation. Current Opinion in Insect Science. 2016;15:27–39. doi: 10.1016/j.cois.2016.02.011. [DOI] [PubMed] [Google Scholar]
  69. Wilson MA, McNaughton BL. Dynamics of the hippocampal ensemble code for space. Science. 1993;261:1055–1058. doi: 10.1126/science.8351520. [DOI] [PubMed] [Google Scholar]

eLife assessment

Srdjan Ostojic 1

This valuable work proposes a framework inspired by chemotaxis for understanding how the brain might implement behaviors related to navigating toward a goal. The evidence supporting the conceptual claim is convincing. The article proposes a hypothesis that would be of interest to the broad systems neuroscience community, although it was noted the relationship to existing similar hypotheses could be clarified.

Reviewer #1 (Public Review):

Anonymous

This paper presents a highly compelling and novel hypothesis for how the brain could generate signals to guide navigation toward remembered goals. Under this hypothesis, which the authors call "Endotaxis", the brain co-opts its ancient ability to navigate up odor gradients (chemotaxis) by generating a "virtual odor" that grows stronger the closer the animal is to a goal location. This idea is compelling from an evolutionary perspective and a mechanistic perspective. The paper is well-written and delightful to read.

The authors develop a detailed model of how the brain may perform "Endotaxis", using a variety of interconnected cell types (point, map, and goal cells) to inform the chemotaxis system. They tested the ability of this model to navigate in several state spaces, representing both physical mazes and abstract cognitive tasks. The Endotaxis model performed reasonably well across different environments and different types of goals.

The authors further tested the model using parameter sweeps and discovered a critical level of network gain, beyond which task performance drops. This critical level approximately matched analytical derivations.

Overall, this paper provides a very compelling model for how neural circuits may have evolved the ability to navigate towards remembered goals, using ancient chemotaxis circuits.

This framework will likely be very important for understanding how the hippocampus (and other memory/navigation-related circuits) interfaces with other processes in the brain, giving rise to memory-guided behavior.

Reviewer #2 (Public Review):

Anonymous

The manuscript presents a computational model of how an organism might learn a map of the structure of its environment and the location of valuable resources through synaptic plasticity, and how this map could subsequently be used for goal-directed navigation.

The model is composed of 'map cells', which learn the structure of the environment in their recurrent connections, and 'goal cells', which store the location of valued resources with respect to the map cell population. Each map cell corresponds to a particular location in the environment due to receiving external excitatory input at this location. The synaptic plasticity rule between map cells potentiates synapses when activity above a specified threshold at the pre-synaptic neuron is followed by above-threshold activity at the post-synaptic neuron. The threshold is set such that map neurons are only driven above this plasticity threshold by the external excitatory input, causing synapses to only be potentiated between a pair of map neurons when the organism moves directly between the locations they represent. This causes the weight matrix between the map neurons to learn the adjacency for the graph of locations in the environment, i.e. after learning the synaptic weight matrix matches the environment's adjacency matrix. Recurrent activity in the map neuron population then causes a bump of activity centred on the current location, which drops off exponentially with the diffusion distance on the graph. Each goal cell receives input from the map cells, and also from a 'resource cell' whose activity indicates the presence or absence of a given valued resource at the current location. Synaptic plasticity potentiates map-cell to goal-cell synapses in proportion to the activity of the map cells at time-points when the resource cell is active. This causes goal cell activity to increase when the activity of the map cell population is similar to the activity where the resource was obtained. The upshot of all this is that after learning, the activity of goal cells decreases exponentially with the diffusion distance from the corresponding goal location. The organism can therefore navigate to a given goal by doing gradient ascent on the activity of the corresponding goal cell. The process of evaluating these gradients and using them to select actions is not modelled explicitly, but the authors point to the similarity of this mechanism to chemotaxis (ascending a gradient of odour concentration to reach the odour source), and the widespread capacity for chemotaxis in the animal kingdom, to argue for its biological plausibility. The ideas are interesting and the presentation of the results in the manuscript is generally clear.
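To make the pipeline just described concrete, here is a minimal numerical sketch. It is our own toy example, not the paper's code: the linear-track graph, the gain value, and all variable names are invented. It shows an adjacency matrix learned from direct moves, map-cell activity obtained by settling the recurrent network, a goal vector stored as the map activity at the rewarded location, and a move chosen by comparing the goal signal at neighboring locations.

```python
import numpy as np

# Toy linear track with 5 locations: 0 - 1 - 2 - 3 - 4
n = 5
A = np.zeros((n, n))                      # adjacency learned from direct moves
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0       # symmetric rule: undirected graph

gamma = 0.3                               # map-cell gain, below 1/(largest eigenvalue of A)

def map_activity(location):
    """Steady state of v = gamma * (u + A v): a bump centred on the current location."""
    u = np.zeros(n)
    u[location] = 1.0                     # point-cell input
    return np.linalg.solve(np.eye(n) / gamma - A, u)

# Goal learning: when the resource is found at node 4, the map-to-goal weights
# are potentiated in proportion to the map activity at that moment.
w_goal = map_activity(4)

def goal_signal(location):
    return w_goal @ map_activity(location)

# Navigation: from node 1, sample the neighbours and move where the goal signal is larger.
here = 1
neighbours = [j for j in range(n) if A[here, j] > 0]
best = max(neighbours, key=goal_signal)
print(best)   # 2, the step toward the goal at node 4
```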

Closely related ideas have been explored in previous work, and there are some aspects of how the work relates to previous literature that it would be useful to clarify. Several lines of work have proposed learning long-range relationships between states in the environment, to enable navigation to rewarding goals by effectively descending distance gradients. The most well-known of these in the neuroscience literature is the Successor Representation (SR) (Dayan 1993), which is defined as the expected discounted future occupancy of each state given the current state. As noted in the discussion, this is closely related to the representation learnt by the map cells in the current model. The key difference is that the successor representation uses state-state transitions under a given policy (a mapping from states to actions), whereas the current model uses the adjacency matrix between states, which depends only on the environment and hence is independent of the policy followed while the representation is learnt (given sufficient exploration). This policy independence is useful, as the SR can fail to generate good routes to goals when these are very different from the policy under which it was learned (see Russek et al. https://doi.org/10.1371/journal.pcbi.1005768). However, there are several prior proposals for policy-independent SR-like mechanisms that it would be useful to discuss. Baram et al. (https://doi.org/10.1101/421461) propose navigating to goals by doing gradient descent on diffusion distances, computed as powers of the adjacency matrix as in the current work. One limitation of using the adjacency matrix is that it does not handle situations where transitions between states are probabilistic, which is not a big issue for navigation in physical space but is for applying the mechanism to cognitive tasks more broadly. There are prior ideas for learning policy-independent representations similar to the SR that do not have this limitation. Kaelbling (Learning to achieve goals, IJCAI, 1993) proposed using an off-policy learning rule similar to Q-learning, to learn shortest path distances between states. Piray and Daw (https://doi.org/10.1038/s41467-021-25123-3) consider a default representation, which is a successor-like representation under a generic default policy, building on the Linear Markov Decision Process (LMDP) framework of Todorov (https://doi.org/10.1073/pnas.0710743106). Also relevant to the current study is the work of Fang et al. (https://doi.org/10.7554/eLife.80680) who, as in the current work, propose using recurrent network dynamics to compute a long-range representation (the SR) from synaptic weights that store local transition information.

One other area where I felt the work could be better integrated with the existing literature was the discussion of mapping the model onto brain circuits. An interesting and attractive aspect of the work is the idea that the relatively high-level operation of goal-directed navigation could be built on top of evolutionarily older mechanisms for ascending odour gradients. Given this framing, I was expecting the discussion of brain circuits to consider interactions between spatial mapping systems and regions involved in olfactory processing. However the discussion of mammalian brains focussed exclusively on the hippocampus without any link to olfaction, which feels like a missed opportunity. I am not an expert on olfaction, but one region that seems particularly interesting in this context is the olfactory tubercle (see Wesson & Wilson https://doi.org/10.1016/j.neubiorev.2010.08.004 for a review). This region is contiguous with the ventral striatum and has similar local circuitry, receives strong input from olfactory regions, but also input from the hippocampal formation, and a strong dopaminergic innervation from VTA. This suggests a mapping of the model to brain circuits in which map cells in the hippocampal formation project to goal cells in the olfactory tubercle, with the dopaminergic input acting as resource cells (note that different dopamine neuron populations appear to respond to different reward types, see e.g. https://doi.org/10.1038/s41586-022-04954-0, https://doi.org/10.1101/2023.05.09.540067). I was also surprised not to see any discussion of internally generated sequential activity in the hippocampus as a possible mechanism for the look-ahead needed to evaluate the goal distance gradient, particularly given the authors suggest that vicarious trial and error (VTE) is a behavioural signature of this gradient sampling, and it is known that during VTE hippocampus plays out internally generated sequences of possible future locations (see Redish https://doi.org/10.1038/nrn.2015.30).

Reviewer #3 (Public Review):

Anonymous

This paper describes an algorithm that provides a general mechanism for goal-directed behaviour in a biologically plausible neural form.

The method depends on substantial simplifying assumptions. The simulated animal effectively moves through an environment consisting of discrete locations and can reliably detect when it is in each location. Whenever it moves from one location to an adjacent location, it perfectly learns the connectivity between these two locations (changes the value in an adjacency matrix to 1). This creates a graph of connections that reflects the explored environment. In this graph, the current location gets input activation and this spreads to all connected nodes multiplied by a constant decay (adjusted to the branching number of the graph) so that as the number of connection steps increases the activation decreases. Some locations will be marked as goals through experiencing a resource of a specific identity there and subsequently will be activated by an amount proportional to their distance in the graph from the current location, i.e., their activation will increase if the agent moves a step closer and decrease if it moves a step further away. Hence by making such exploratory movements, the animal can decide which way to move to obtain a specified goal.

Although the algorithm is presented within a conceptual framework of chemotaxis, i.e., making movements to sample a local gradient and move up it, the approach relates closely to previous models of exploration, learning, and navigation that similarly establish (through experience) a graph structure to represent how locations are connected and use some form of activity-propagation from the current node or goal node to identify a (shortest) route between them. Many of these similarly claim to be plausible neural circuits. The current authors argue that the current algorithm has several desirable features with respect to such previous work: for example, the 'readout' of the path does not require explicit 'look-up' and activation of the goal node (although it does require a choice of which goal node is currently connected to behavior); and does not require any separate control or rules for learning vs. navigation phases. By comparison to the successor representation method used in RL, which also appears related, they note that the gain (decay) factor is not equivalent to a temporal discount and that their method learns only state-state transitions, allowing the value of actions to be externalised, i.e., calculated by trying alternative actions to see which increases the activation at the goal node the most. On the other hand, it should be noted that some issues addressed in previous models, such as uncertainty over the current state or probabilistic state(-action) transitions, are not addressed in this work.

The algorithm presents some elegant features with respect to previous work such as conceptually separating the 'goal' nodes from the state (location) graph (i.e., 'goals' are not just special target states within the graph) so that a small number of goals can become associated to (potentially) multiple regions of the state graph where they are satisfied, or near to being satisfied. This architecture is suggested, in the discussion, to resemble the insect mushroom body (MB), where it is known that a small number of output neurons (MBONs, putative goal neurons) are activated by plastic connections from Kenyon cells (KCs, putative state neurons). However, it goes substantially beyond any available evidence to claim that KC connectivity could support the acquisition of a graph (in the form of an adjacency matrix) representing the structure of the environment: KCs show sparse distributed activity (not one active node per state); it seems unlikely that any two arbitrary KCs can (rapidly) become connected; and it has not yet been demonstrated that KC connectivity is plastic at all.

The results presented are fairly straightforward given the simplification of the tasks, as described above. They show (1) in practical terms, the spreading signal travels further for a larger decay but becomes erratic as the decay parameter (map neuron gain) approaches its theoretical upper bound and decreases below noise levels beyond a certain distance. Both follow the theory but it is perhaps helpful to see that there is a viable range of values of the gain for which the mechanism works, that is, it is not highly dependent on precise tuning. (2) That different graph structures can be acquired and used to approach goal locations (not surprising). (3) That simultaneous learning and exploitation of the graph only minimally affects the performance over starting with perfect knowledge of the graph. (4) That the parameters interact in expected ways. (5) That the separation of goals from states can be used flexibly e.g. the homing behaviour (a goal state is learned before any of the map is learned) and the patrolling behaviour (a goal cell that monitors all states for how recently they were visited). It is also interesting to link the mechanism of exploration of neighbouring states to observed scanning behaviours in navigating animals. It would have been interesting to explore whether the parameters could be dynamically tuned, based on the overall graph activity.

eLife. 2024 Feb 29;12:RP84141. doi: 10.7554/eLife.84141.3.sa4

Author Response

Tony Zhang 1, Matthew Hudson Rosenberg 2, Zeyu Jing 3, Pietro Perona 4, Markus Meister 5

The following is the authors’ response to the original reviews.

Thank you for the detailed and constructive reviews. We revised the paper accordingly, and a point-by-point reply appears below. The main changes are:

  • An extended discussion section that places our work in context with other related developments in theory and modeling.

  • A new results section that demonstrates a substantial improvement in performance from a non-linear activation function. This led to the addition of a co-author.

  • The mathematical proof that the resolvent of the adjacency matrix leads to the shortest path distances has been moved to a separate article, available as a preprint and attached to this resubmission. This allows us to present that work in the context of graph theory, and focus the present paper on neural modeling.

Reviewer #1 (Public Review):

This paper presents a highly compelling and novel hypothesis for how the brain could generate signals to guide navigation towards remembered goals. Under this hypothesis, which the authors call "Endotaxis", the brain co-opts its ancient ability to navigate up odor gradients (chemotaxis) by generating a "virtual odor" that grows stronger the closer the animal is to a goal location. This idea is compelling from an evolutionary perspective and a mechanistic perspective. The paper is well-written and delightful to read.

The authors develop a detailed model of how the brain may perform "Endotaxis", using a variety of interconnected cell types (point, map, and goal cells) to inform the chemotaxis system. They tested the ability of this model to navigate in several state spaces, representing both physical mazes and abstract cognitive tasks. The Endotaxis model performed reasonably well across different environments and different types of goals.

The authors further tested the model using parameter sweeps and discovered a critical level of network gain, beyond which task performance drops. This critical level approximately matched analytical derivations.

My main concern with this paper is that the analysis of the critical gain value (gamma_c) is incomplete, making the implications of these analyses unclear. There are several different reasonable ways in which the Endotaxis map cell representations might be normalized, which I suspect may lead to different results. Specifically, the recurrent connections between map cells may either be an adjacency matrix, or a normalized transition matrix. In the current submission, the recurrent connections are an unnormalized adjacency matrix. In a previous preprint version of the Endotaxis manuscript, the recurrent connections between the map cells were learned using Oja's rule, which results in a normalized state-transition matrix (see "Appendix 5: Endotaxis model and the successor representation" in "Neural learning rules for generating flexible predictions and computing the successor representation", your reference 17). The authors state "In summary, this sensitivity analysis shows that the optimal parameter set for endotaxis does depend on the environment". Is this statement, and the other conclusions of the sensitivity analysis, still true if the learned recurrent connections are a properly normalized state-transition matrix?

Yes, this is an interesting topic. In v.1 of our bioRxiv preprint we used Oja’s rule for learning, which will converge on a map connectivity that reflects the transition probabilities. The matrix M becomes a left-normalized or right-normalized stochastic matrix, depending on whether one uses the pre-synaptic or the post-synaptic version of Oja’s rule. This is explained well in Appendix 5 of Fang 2023.

In the present version of the model we use a rule that learns the adjacency matrix A, not the transition matrix T. The motivation is that we want to explain instances of one-shot learning, where an agent acquires a route after traversing it just once. For example, we had found experimentally that mice can execute a complex homing route on the first attempt.

An agent can establish whether two nodes are connected (adjacency) the very first time it travels from one node to the other. Whereas it can evaluate the transition probability for that link only after trying this and all the other available links on multiple occasions. Hence the normalization terms in Oja’s rule, or in the rule used by Fang 2023, all involve some time-averaging over multiple visits to the same node. This implements a gradual learning process over many experiences, rather than a one-shot acquisition on the first experience.
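As a purely illustrative contrast (the random walk and the update below are invented for this sketch, not taken from the paper's simulations): a one-shot rule fixes an adjacency weight the first time an edge is traversed, whereas a normalized rule must average over many visits before its weights approximate transition probabilities.

```python
import numpy as np

n = 4
A = np.zeros((n, n))        # one-shot adjacency estimate
T = np.zeros((n, n))        # running transition-probability estimate
visits = np.zeros(n)

rng = np.random.default_rng(0)
state = 0
for _ in range(200):
    nxt = (state + rng.choice([-1, 1])) % n        # random walk on a 4-node ring
    A[state, nxt] = 1.0                            # learned after a single traversal
    visits[state] += 1
    # incremental averaging toward the empirical transition frequency
    T[state] += (np.eye(n)[nxt] - T[state]) / visits[state]
    state = nxt

# A is exact after one pass over an edge;
# T only approaches ~0.5 per neighbour after many visits.
print(A)
print(T.round(2))
```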

Still one may ask whether there are advantages to learning the transition matrix rather than the adjacency matrix. We looked into this with the following results:

• The result that (1/γ − A)⁻¹ is monotonically related to the graph distances D in the limit of small γ (a proof now moved to the Meister 2023 preprint) holds also for the transition matrix T. The proof follows the same steps. So in the small-gain limit, the navigation model would work with T as well.

• If one uses the transition matrix to compute the network output (1/γ − T)⁻¹, then the critical gain value is γ_c = 1. It is well known that the largest eigenvalue of any Markov transition matrix is 1, and the critical gain γ_c is the inverse of that. This result is independent of the graph. So this offers the promise that the network could use the same gain parameter γ regardless of the environment.

• In practice, however, the goal signal turned out to be less robust when based on T than when based on A. We illustrate this with the attached Author response image 1. This replicates the analysis in Figure 3 of the manuscript, using the transition matrix instead of the adjacency matrix. Some observations:

• Panel B: The goal signal follows an exponential dependence on graph distance much more robustly for the model with A than with T. This holds even for small gain values where the exponential decay is steep.

• Panel C: As one raises the gain closer to the critical value, the goal signal based on T scatters much more than when based on A.

• Panels D, E: Navigation based on A works better than based on T. For example, using the highest practical gain value, and a readout noise of ϵ = 0.01, navigation based on T has a range of only 8 steps on this graph, whereas navigation based on A ranges over 12 steps, the full size of this graph.

We have added a section “Choice of learning rule” to explain this. The Author response image 1 is part of the code notebook on Github.
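To make the eigenvalue point above concrete, here is a small numerical sketch; the graph is an arbitrary toy example, not the maze analyzed in the paper. The critical gain is the inverse of the largest eigenvalue of whichever matrix drives the recurrence, so it varies with the graph for A but is always 1 for a stochastic T.

```python
import numpy as np

# Adjacency of a small arbitrary graph (a 4-cycle with one chord)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], float)

# Row-normalized transition matrix of the unbiased random walk on the same graph
T = A / A.sum(axis=1, keepdims=True)

gamma_c_A = 1.0 / np.max(np.linalg.eigvals(A).real)   # depends on the graph
gamma_c_T = 1.0 / np.max(np.linalg.eigvals(T).real)   # stochastic matrix: largest eigenvalue is 1

print(gamma_c_A)   # ~0.39 for this particular graph
print(gamma_c_T)   # ~1.0, independent of the graph
```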

Author response image 1.

Overall, this paper provides a very compelling model for how neural circuits may have evolved the ability to navigate towards remembered goals, using ancient chemotaxis circuits.

This framework will likely be very important for understanding how the hippocampus (and other memory/navigation-related circuits) interfaces with other processes in the brain, giving rise to memory-guided behavior.

Reviewer #2 (Public Review):

The manuscript presents a computational model of how an organism might learn a map of the structure of its environment and the location of valuable resources through synaptic plasticity, and how this map could subsequently be used for goal-directed navigation.

The model is composed of 'map cells', which learn the structure of the environment in their recurrent connections, and 'goal cells', which store the location of valued resources with respect to the map cell population. Each map cell corresponds to a particular location in the environment due to receiving external excitatory input at this location. The synaptic plasticity rule between map cells potentiates synapses when activity above a specified threshold at the pre-synaptic neuron is followed by above-threshold activity at the post-synaptic neuron. The threshold is set such that map neurons are only driven above this plasticity threshold by the external excitatory input, causing synapses to only be potentiated between a pair of map neurons when the organism moves directly between the locations they represent. This causes the weight matrix between the map neurons to learn the adjacency for the graph of locations in the environment, i.e. after learning the synaptic weight matrix matches the environment's adjacency matrix. Recurrent activity in the map neuron population then causes a bump of activity centred on the current location, which drops off exponentially with the diffusion distance on the graph. Each goal cell receives input from the map cells, and also from a 'resource cell' whose activity indicates the presence or absence of a given valued resource at the current location. Synaptic plasticity potentiates map-cell to goal-cell synapses in proportion to the activity of the map cells at time points when the resource cell is active. This causes goal cell activity to increase when the activity of the map cell population is similar to the activity where the resource was obtained. The upshot of all this is that after learning, the activity of goal cells decreases exponentially with the diffusion distance from the corresponding goal location. The organism can therefore navigate to a given goal by doing gradient ascent on the activity of the corresponding goal cell. The process of evaluating these gradients and using them to select actions is not modelled explicitly, but the authors point to the similarity of this mechanism to chemotaxis (ascending a gradient of odour concentration to reach the odour source), and the widespread capacity for chemotaxis in the animal kingdom, to argue for its biological plausibility.

The ideas are interesting and the presentation in the manuscript is generally clear. The two principal limitations of the manuscript are: (i) Many of the ideas that the model implements have been explored in previous work. (ii) The mapping of the circuit model onto real biological systems is pretty speculative, particularly with respect to the cerebellum.

Regarding the novelty of the work, the idea of flexibly navigating to goals by descending distance gradients dates back to at least Kaelbling (Learning to achieve goals, IJCAI, 1993), and is closely related to both the successor representation (cited in manuscript) and Linear Markov Decision Processes (LMDPs) (Piray and Daw, 2021, https://doi.org/10.1038/s41467-021-25123-3, Todorov, 2009 https://doi.org/10.1073/pnas.0710743106). The specific proposal of navigating to goals by doing gradient descent on diffusion distances, computed as powers of the adjacency matrix, is explored in Baram et al. 2018 (https://doi.org/10.1101/421461), and the idea that recurrent neural networks whose weights are the adjacency matrix can compute diffusion distances is explored in Fang et al. 2022 (https://doi.org/10.1101/2022.05.18.492543). Similar ideas about route planning using the spread of recurrent activity are also explored in Corneil and Gerstner (2015, cited in manuscript). Further exploration of this space of ideas is no bad thing, but it is important to be clear where prior literature has proposed closely related ideas.

We have added a discussion section on “Theories and models of spatial learning” with a survey of ideas in this domain and how they come together in the Endotaxis model.

Regarding whether the proposed circuit model might plausibly map onto a real biological system, I will focus on the mammalian brain as I don't know the relevant insect literature. It was not completely clear to me how the authors think their model corresponds to mammalian brain circuits. When they initially discuss brain circuits they point to the cerebellum as a plausible candidate structure (lines 520-546). Though the correspondence between cerebellar and model cell types is not very clearly outlined, my understanding is they propose that cerebellar granule cells are the 'map-cells' and Purkinje cells are the 'goal-cells'. I'm no cerebellum expert, but my understanding is that the granule cells do not have recurrent excitatory connections needed by the map cells. I am also not aware of reports of place-field-like firing in these cell populations that would be predicted by this correspondence. If the authors think the cerebellum is the substrate for the proposed mechanism they should clearly outline the proposed correspondence between cerebellar and model cell types and support the argument with reference to the circuit architecture, firing properties, lesion studies, etc.

On further thought we agree that the cerebellum-like circuits are not a plausible substrate for the endotaxis algorithm. The anatomy looks compelling, but plasticity at the synapse is anti-Hebbian, and, as the reviewer points out, there is little evidence for recurrence among the inputs. We changed the discussion text accordingly.

The authors also discuss the possibility that the hippocampal formation might implement the proposed model, though confusingly they state 'we do not presume that endotaxis is localized to that structure' (line 564).

We have removed that confusing bit of text.

A correspondence with the hippocampus appears more plausible than the cerebellum, given the spatial tuning properties of hippocampal cells, and the profound effect of lesions on navigation behaviours. When discussing the possible relationship of the model to hippocampal circuits it would be useful to address internally generated sequential activity in the hippocampus. During active navigation, and when animals exhibit vicarious trial and error at decision points, internally generated sequential activity of hippocampal place cells appears to explore different possible routes ahead of the animal (Kay et al. 2020, https://doi.org/10.1016/j.cell.2020.01.014, Redish 2016, https://doi.org/10.1038/nrn.2015.30). Given the emphasis the model places on sampling possible future locations to evaluate goal-distance gradients, this seems highly relevant.

In our model, the possible future locations are sampled in real life, with the agent moving there or at least in that direction, e.g. via VTE movements. In this simple form the model has no provision for internal planning, and the animal never learns any specific route sequence. One can envision extending such a model with some form of sequence learning that would then support an internal planning mechanism. We mention this in the revised discussion section, along with citation of these relevant articles.

Also, given the strong emphasis the authors place on the relationship of their model to chemotaxis/odour-guided navigation, it would be useful to discuss brain circuits involved in chemotaxis, and whether/how these circuits relate to those involved in goal-directed navigation, and the proposed model.

The neural basis of goal-directed navigation is probably best understood in the insect brain. There the locomotor decisions seem to be initiated in the central complex, whose circuitry is being revealed by the fly connectome projects. This area receives input from diverse sensory areas that deliver the signal on which the decisions are based. That includes the mushroom body, which we argue has the anatomical structure to implement the endotaxis algorithm. It remains a mystery how the insect chooses a particular goal for pursuit via its decisions. It could be revealing to force a change in goals (the mode switch in the endotaxis circuit) while recording from brain areas like the central complex. Our discussion now elaborates on this.

Finally, it would be useful to clarify two aspects of the behaviour of the proposed algorithm:

1. When discussing the relationship of the model to the successor representation (lines 620-627), the authors emphasise that learning in the model is independent of the policy followed by the agent during learning, while the successor representation is policy dependent. The policy independence of the model is achieved by making the synapses between map cells binary (0 or 1 weight) and setting them to 1 following a single transition between two locations. This makes the model unsuitable for learning the structure of graphs with probabilistic transitions, e.g. it would not behave adaptively in the widely used two-step task (Daw et al. 2011, https://doi.org/10.1016/j.neuron.2011.02.027) as it would fail to differentiate between common and rare transitions. This limitation should be made clear and is particularly relevant to claims that the model can handle cognitive tasks in general. It is also worth noting that there are algorithms that are closely related to the successor representation, but which learn about the structure of the environment independent of the subjects policy, e.g. the work of Kaelbling which learns shortest path distances, and the default representation in the work of Piray and Daw (both referenced above). Both these approaches handle probabilistic transition structures.

Yes. Our problem statement assumes that the environment is a graph with fixed edge weights. The revised text mentions this and other assumptions in a new section “Choice of learning rule”.
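To illustrate the limitation being acknowledged here, a hypothetical toy example (the 0.8/0.2 transition probabilities are invented for this sketch): a binary adjacency rule writes the same weight for a rare transition as for a common one, so the learned graph cannot distinguish the two routes that a probabilistic task would require it to.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.zeros((3, 3))

# From state 0, the environment moves to state 1 with p=0.8 and to state 2 with p=0.2
for _ in range(100):
    nxt = rng.choice([1, 2], p=[0.8, 0.2])
    A[0, nxt] = 1.0          # binary rule: rare and common transitions end up identical

print(A[0])                  # [0. 1. 1.] -- the 0.8 / 0.2 asymmetry is lost
```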

2. As the model evaluates distances using powers of the adjacency matrix, the resulting distances are diffusion distances, not shortest path distances. Though diffusion and shortest path distances are usually closely correlated, they can differ systematically for some graphs (see Baram et al. cited above).

The recurrent network of map cells implements a specific function of the adjacency matrix, namely the resolvent (Eqn 7). We have a mathematical proof that this function delivers the shortest graph distances exactly, in the limit of small gain (γ in Eqn 7), and that this holds true for all graphs. For practical navigation in the presence of noise, one needs to raise the gain to something finite. Figure 3 analyzes how this affects deviations from the shortest graph distance, and how nonetheless the model still supports effective navigation over a surprising range. The mathematical details of the proof and further exploration of the resolvent distance at finite gain have been moved to a separate article, which is cited from here, and attached to the submission. The preprint by Baram et al. is cited in that article.
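As a brief numerical sanity check of this claim (the toy graph, gain value, and logarithmic readout below are our own illustrative choices, not the formulas of the Meister 2023 preprint): for small gain the resolvent entries fall off, to leading order, like γ^(d+1), so taking logarithms recovers the shortest-path distances computed independently by Floyd-Warshall.

```python
import numpy as np
from itertools import product

# Arbitrary small graph: a 4-cycle with one chord
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], float)
n = len(A)

gamma = 1e-3                                   # well below the critical gain
R = np.linalg.inv(np.eye(n) / gamma - A)       # resolvent of the adjacency matrix

# Read out distances: R_ij ~ gamma**(d_ij + 1) for small gamma
D_est = np.round(np.log(R) / np.log(gamma)).astype(int) - 1

# Reference shortest-path distances by Floyd-Warshall
D_ref = np.where(A > 0, 1, np.inf)
np.fill_diagonal(D_ref, 0)
for k, i, j in product(range(n), repeat=3):
    D_ref[i, j] = min(D_ref[i, j], D_ref[i, k] + D_ref[k, j])

print(np.array_equal(D_est, D_ref.astype(int)))   # True
```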

Reviewer #3 (Public Review):

This paper argues that it has developed an algorithm conceptually related to chemotaxis that provides a general mechanism for goal-directed behaviour in a biologically plausible neural form.

The method depends on substantial simplifying assumptions. The simulated animal effectively moves through an environment consisting of discrete locations and can reliably detect when it is in each location. Whenever it moves from one location to an adjacent location, it perfectly learns the connectivity between these two locations (changes the value in an adjacency matrix to 1). This creates a graph of connections that reflects the explored environment. In this graph, the current location gets input activation and this spreads to all connected nodes multiplied by a constant decay (adjusted to the branching number of the graph) so that as the number of connection steps increases the activation decreases. Some locations will be marked as goals through experiencing a resource of a specific identity there, and subsequently will be activated by an amount proportional to their distance in the graph from the current location, i.e., their activation will increase if the agent moves a step closer and decrease if it moves a step further away. Hence by making such exploratory movements, the animal can decide which way to move to obtain a specified goal.

I note here that it was not clear what purpose, other than increasing the effective range of activation, is served by having the goal input weights set based on the activation levels when the goal is obtained. As demonstrated in the homing behaviour, it is sufficient to just have a goal connected to a single location for the mechanism to work (i.e., the activation at that location increases if the animal takes a step closer to it); and as demonstrated by adding a new graph connection, goal activation is immediately altered in an appropriate way to exploit a new shortcut, without the goal weights corresponding to this graph change needing to be relearnt.

As the reviewer states, allowing a graded strengthening of multiple synapses from the map cells increases the effective range of the goal signal. We have now confirmed this in simulations. For example, in the analysis of Fig 3E, a single goal synapse enables perfect navigation only over a range of 7 steps, whereas the distributed goal synapses allow perfect navigation over the full 12 steps. This analysis is included in the code notebook on Github.

Given the abstractions introduced, it is clear that the biological task here has been reduced to the general problem of calculating the shortest path in a graph. That is, no real-world complications such as how to reliably recognise the same location when deciding that a new node should be introduced for a new location, or how to reliably execute movements between locations, are addressed. Noise is only introduced as a 1% variability in the goal signal. It is therefore surprising that the main text provides almost no discussion of the conceptual relationship of this work to decades of previous work in calculating the shortest path in graphs, including a wide range of neural- and hardware-based algorithms, many of which have been presented in the context of brain circuits.

The connection to this work is briefly made in appendix A.1, where it is argued that the shortest path distance between two nodes in a directed graph can be calculated from equation 15, which depends only on the adjacency matrix and the decay parameter (provided the latter falls below a given value). It is not clear from the presentation whether this is a novel result. No direct reference is given for the derivation so I assume it is novel. But if this is a previously unknown solution to the general problem it deserves to be much more strongly featured and either way it needs to be appropriately set in the context of previous work.

As far as we know this proposal for computing all-pairs-shortest-path is novel. We could not find it in textbooks or an extended literature search. We have discussed it with two graph theorist colleagues, who could not recall seeing it before, although the proof of the relationship is elementary. Inspired by the present reviewer comment, we chose to publish the result in a separate article that can focus on the mathematics and place it in the appropriate context of prior work in graph theory. For related work in the area of neural modeling please see our revised discussion section.

Once this principle is grasped, the added value of the simulated results is somewhat limited. These show: (1) in practical terms, the spreading signal travels further for a smaller decay but becomes erratic as the decay parameter (map neuron gain) approaches its theoretical upper bound and decreases below noise levels beyond a certain distance. Both follow the theory. (2) that different graph structures can be acquired and used to approach goal locations (not surprising). (3) that simultaneous learning and exploitation of the graph only minimally affects the performance over starting with perfect knowledge of the graph. (4) that the parameters interact in expected ways. It might have been more impactful to explore whether the parameters could be dynamically tuned, based on the overall graph activity.

This is a good summary of our simulation results, but we differ in the assessment of their value. In our experience, simulations can easily demolish an idea that seemed wonderful before exposure to numerical reality. For example, it is well known that one can build a neural integrator from a recurrent network that has feedback gain of exactly 1. In practical simulations, though, these networks tend to be fickle and unstable, and require unrealistically accurate tuning of the feedback gain. In our case, the theory predicts that there is a limited range of gains that should work, below the critical value, but large enough to avoid excessive decay of the signal. Simulation was needed to test what this practical range was, and we were pleasantly surprised that it is not ridiculously small, with robust navigation over a 10-20% range. Similarly, we did not predict that the same parameters would allow for effective acquisition of a new graph, learning of targets within the graph, and shortest-route navigation to those targets, without requiring any change in the operation of the network.
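For readers unfamiliar with the integrator example, a minimal single-unit caricature (not the network from the paper; the gains and pulse are invented) of why exact tuning is so demanding: the stored value persists only when the feedback gain is exactly 1, while slightly smaller gains leak and slightly larger gains blow up.

```python
import numpy as np

u = np.r_[np.ones(10), np.zeros(90)]       # brief input pulse, then silence
for g in (0.98, 1.00, 1.02):
    v = 0.0
    for t in range(100):
        v = g * v + u[t]                   # one-unit recurrent "integrator" with feedback gain g
    print(g, round(v, 2))                  # the stored value is held only when g is exactly 1
```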

Perhaps the most biologically interesting aspect of the work is to demonstrate the effectiveness, for flexible behaviour, of keeping separate the latent learning of environmental structure and the association of specific environmental states to goals or values. This contrasts (as the authors discuss) with the standard reinforcement learning approach, for example, that tries to learn the value of states that lead to reward. Examples of flexibility include the homing behaviour (a goal state is learned before any of the map is learned) and the patrolling behaviour (a goal cell that monitors all states for how recently they were visited). It is also interesting to link the mechanism of exploration of neighbouring states to observed scanning behaviours in navigating animals.

The mapping to brain circuits is less convincing. Specifically, for the analogy to the mushroom body, it is not clear what connectivity (in the MB) is supposed to underlie the graph structure which is crucial to the whole concept. Is it assumed that Kenyon cell connections perform the activation spreading function and that these connections are sufficiently adaptable to rapidly learn the adjacency matrix? Is there any evidence for this?

Yes, there is good evidence for recurrent synapses among Kenyon cells (map cells in the model), and for reward-gated synaptic plasticity at the synapses onto mushroom body output cells (goal cells in our model). We have expanded this material in the discussion section. Whether those functions are sufficient to learn the structure of a spatial environment has not been explored; we hope our paper might give an impetus, and are exploring behavioral experiments on flies with colleagues.

As discussed above, the possibility that an algorithm like 'endotaxis' could explain how the rodent place cell system could support trajectory planning has already been explored in previous work so it is not clear what additional insight is gained from the current model.

Please see our revised discussion section on “Theories and models of spatial learning”. In short, some ingredients of the model have appeared in prior work, but we believe that the present formulation offers an unexpectedly simple end-to-end solution for all components of navigation: exploration, target learning, and goal seeking.

Reviewer #1 (Recommendations For The Authors):

Major concern:

See the public review. How do the results change depending on whether the recurrent connections between map cells are an adjacency matrix vs. a properly normalized state-transition matrix? I'm especially asking about results related to critical gain (gamma_c), and the dependence of the optimal parameter values on the environment.

Please see our response above including the attached reviewer figure.

Minor concerns:

It is not always clear when the learning rule is symmetric vs asymmetric (undirected vs directed graph), and it seems to switch back and forth. For example, line 127 refers to a directed graph; Fig 2B and the intro describe symmetric Hebbian learning. Most (all?) of the simulations use the symmetric rule. Please make sure it's clear.

For simplicity we now use a symmetric rule throughout, as is appropriate for undirected graphs. We mention that a directed learning rule could be used to learn directed graphs. See the section on “Choice of learning rule”.

M_ij is not defined when it's first introduced (eq 4). Consider labeling the M's and the G's in Fig 2.

Done.

The network gain factor (gamma, eq 4) is distributed over both external and recurrent inputs (v = gamma(u + Mv)), instead of local to the recurrent weights like in the Successor Representation. This notational choice is obviously up to the authors. I raise slight concern for two reasons -- first, distributing gamma may affect some of the parameter sweep results (see major concern), and second, it may be confusing in light of how gamma is used in the SR literature (see reviewer's paper for the derivation of how SR is computed by an RNN with gain gamma).

In our model, gamma represents the (linear) activation function of the map neuron, from synaptic input to firing output. Because the synaptic input comes from point cells and also from other map cells, the gain factor is applied to both. See for example the Dayan & Abbott book Eqn 7.11, which at steady state becomes our Eqn 4. In the formalism of Fang 2023 (Eqn 2), the factor γ is only applied to the recurrent synaptic input J ⋅ f, but somehow not to the place cell input ϕ. Biophysically, one could imagine applying the variable gain only to the recurrent synapses and not the feed-forward ones. Instead we prefer to think of it as modulating the gain of the neurons, rather than the synapses. The SR literature follows conventions from the early reinforcement learning papers, which were unconstrained by thinking about neurons and synapses. We have added a footnote pointing the reader to the uses of γ in different papers.
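A quick numerical check of this convention (the weight matrix and input below are arbitrary toy values, not parameters from the paper): iterating v = γ(u + Mv), with the gain applied to both the point-cell input and the recurrent input, converges to the closed-form steady state (1/γ·I − M)⁻¹ u.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.abs(rng.normal(size=(4, 4))) * 0.1   # arbitrary weak recurrent weights
u = np.array([1.0, 0.0, 0.0, 0.0])          # external (point-cell) input
gamma = 0.5

v = np.zeros(4)
for _ in range(200):
    v = gamma * (u + M @ v)                 # iterate the steady-state equation v = gamma*(u + M v)

v_closed = np.linalg.solve(np.eye(4) / gamma - M, u)
print(np.allclose(v, v_closed))             # True: the gain multiplies both input terms
```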

In eq 13, and simulations, noise is added to the output only, not to the activity of recurrently connected neurons. It is possible this underestimates the impact of noise since the same magnitude of noise in the recurrent network (map cells) could have a compounded effect on the output.

Certainly. The equivalent output noise represents the cumulative effect of noise everywhere in the network. We argue that a cumulative effect of 1% is reasonable given the overall ability of animals at stimulus discrimination, which is also limited by noise everywhere in the network. This has been clarified in the text.

Fig 3 E, F, it looks like the navigated distance may be capped. I ask because the error bars for graph distance = 12 are so small/nonexistent. If it's capped, this should be in the legend.

Correct. 12 is the largest distance on this graph. This has been added to the caption.

Fig 3D legend, what does "navigation failed" mean? These results are not shown.

On those occasions the agent gets trapped at a local maximum of the goal signal other than the intended goal. We have removed that line as it is not needed to interpret the data.

Line 446, typo (Lateron).

Fixed.

Line 475, I'm a bit confused by the discussion of birds and bats. Bird behavior in the real world does involve discrete paths between points. Even if they theoretically could fly between any points, there are costs to doing so, and in practice, they often choose discrete favorite paths. It is definitely plausible that animals that can fly could also employ Endotaxis, so it is confusing to suggest they don't have the right behavior for Endotaxis, especially given the focus on fruit flies later in the discussion.

Good points, we removed that remark. Regarding fruit flies, they handle much important business while walking, such as tracking a mate, fighting rivals over food, finding a good oviposition site.

Section 9.3, I'm a bit confused by the discussion of cerebellum-like structures, because I don't think they have as dense recurrent connections as needed for the map cells in Endotaxis. Are you suggesting they are analogous to the output part of Endotaxis only, not the whole thing?

Please see our reply in the public review. We have removed this discussion of cerebellar circuits.

Line 541, "After sufficient exploration...", clarify that this is describing learning of just the output synapses, not the recurrent connections between map cells?

We have revised this entire section on the arthropod mushroom body.

In lines 551-556, the discussion is confusing and possibly not consistent with current literature. How can a simulation prove that synapses in the hippocampus are only strengthened among immediately adjacent place fields? I'd suggest either removing this discussion or adding further clarification. More broadly, the connection between Endotaxis and the hippocampus is very compelling. This might also be a good point to bring up BTSP (though you do already bring it up later).

As suggested, we removed this section.

Line 621 "The successor representation (at least as currently discussed) is designed to improve learning under a particular policy" That's not actually accurate. Ref 17 (reviewer's manuscript, cited here) is not policy-specific, and instead just learns the transition statistics experienced by the animal, using a biologically plausible learning rule that is very similar to the Endotaxis map cell learning rule (see our Appendix 5, comparing to Endotaxis, though that was referencing the previous version of the Endotaxis preprint where Oja's rule was used).

We have edited this section in the discussion and removed the reference to policy-specific successor representations.

Line 636 "Endotaxis is always on" ... this was not clear earlier in the paper (e.g. line 268, and the separation of different algorithms, and "while learning do" in Algorithm 2).

The learning rules are suspended during some simulations so we can better measure the effects of different parts of endotaxis, in particular learning vs navigating. There is no interference between these two functions, and an agent benefits from having the learning rules on all the time. The text now clarifies this in the relevant sections.

Section 9.6, I like the idea of tracing different connected functions. But when you say "that could lead to the mode switch"... I'm a bit confused about what is meant here. A mode switch doesn't need to happen in a different brain area/network, because winner-take-all could be implemented by mutual inhibition between the different goal units.

This is an interesting suggestion for the high-level control algorithm. A Lorenzian view is that the animal’s choice of mode depends on internal states or drives, such as thirst vs hunger, that compete with each other. In that picture the goal cells represent options to be pursued, whereas the choice among the options occurs separately. But one could imagine that the arbitrage between drives happens through a competition at the level of goal cells: For example the consumption of water could lead to adaptation of the water cell, such that it loses out in the winner-take-all competition, the food cell takes over, and the mouse now navigates towards food. In this closed-loop picture, the animal doesn’t have to “know” what it wants at any given time, it just wants the right thing. This could eliminate the homunculus entirely! Of course this is all a bit speculative. We have edited the closing comments in a way that leaves open this possibility.

Line 697-704, I need more step-by-step explanation/derivation.

We now derive the properties of E step by step starting from Eqn (14). The proof that leads to Eqn 14 is now in a separate article (available as a preprint and attached to this submission).

Reviewer #3 (Recommendations For The Authors):

  • Please include discussion and comparison to previous work of graph-based trajectory planning using spreading activation from the current node and/or the goal node. Here is a (far from comprehensive) list of papers that present similar algorithms:

Glasius, R., Komoda, A., & Gielen, S. C. (1996). A biologically inspired neural net for trajectory formation and obstacle avoidance. Biological Cybernetics, 74(6), 511-520.

Gaussier, P., Revel, A., Banquet, J. P., & Babeau, V. (2002). From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biological cybernetics, 86(1), 15-28.

Gorchetchnikov A, Hasselmo ME. A biophysical implementation of a bidirectional graph search algorithm to solve multiple goal navigation tasks. Connection Science. 2005;17(1-2):145-166

Martinet, L. E., Sheynikhovich, D., Benchenane, K., & Arleo, A. (2011). Spatial learning and action planning in a prefrontal cortical network model. PLoS computational biology, 7(5), e1002045.

Ponulak, F., & Hopfield, J. J. (2013). Rapid, parallel path planning by propagating wavefronts of spiking neural activity. Frontiers in computational neuroscience, 7, 98.

Khajeh-Alijani, A., Urbanczik, R., & Senn, W. (2015). Scale-free navigational planning by neuronal traveling waves. PloS one, 10(7), e0127269.

Adamatzky, A. (2017). Physical maze solvers. All twelve prototypes implement 1961 Lee algorithm. In Emergent computation (pp. 489-504). Springer, Cham.

Please see our reply to the public review above, and the new discussion section on “Theories and models of spatial learning”, which cites most of these papers among others.

  • Please explain, if it is the case, why the goal cell learning (other than a direct link between the goal and the corresponding map location) and calculation of the overlapping 'goal signal' is necessary, or at least advantageous.

Please see our reply in the public review above.

  • Map cells are initially introduced (line 84) as getting input from "only one or a few point cells". The rest of the paper seems to assume only one. Does it work when this is 'a few'? Does it matter that 'a few' is an option?

We simplified the text here to “only one point cell”. A map cell with input from two distant locations creates problems. After learning the map synapses from adjacencies in the environment, the model now “believes” that those two locations are connected. This distorts the graph on which the graph distances are computed and introduces errors in the resulting goal signals. One can elaborate the present toy model with a much larger population of map cells that might convey more robustness, but that is beyond our current scope.

  • (line 539 on) Please explain what feature in the mushroom body (or other cerebellum-like) circuits is proposed to correspond to the learning of connections in the adjacency matrix in the model.

Please see our response to this critique in the public review above. In the mushroom body, the Kenyon cells exhibit sparse responses and are recurrently connected. These would correspond to map cells in Endotaxis. For vertebrate cerebellum-like circuits, the correspondence is less compelling, and we have removed this topic from the discussion.

