Proceedings of the National Academy of Sciences of the United States of America. 2021 Dec 30;119(1):e2107431118. doi: 10.1073/pnas.2107431118

Sector search strategies for odor trail tracking

Gautam Reddy a,b, Boris I Shraiman c,1, Massimo Vergassola b,d,1
PMCID: PMC8740577  PMID: 34983837

Significance

Surface-bound odor trail tracking is critical for the survival of terrestrial animals dependent on olfaction. Little is known about how animals track trails at the algorithmic level. In the present study, we propose that a tracking animal maintains a noisy estimate of where the trail is headed based on its past contacts with the trail. We show that virtual agents trained to exploit this strategy reproduce the tracking patterns of ants and rodents. The observed patterns emerge simply as a consequence of common geometric constraints, which also impose fundamental limits on how quickly an animal can track trails. A series of experiments is proposed to quantify how past experience and trail statistics shape tracking behavior.

Keywords: tracking, algorithm, behavior, optimization

Abstract

Ants, mice, and dogs often use surface-bound scent trails to establish navigation routes or to find food and mates, yet their tracking strategies remain poorly understood. Chemotaxis-based strategies cannot explain casting, a characteristic sequence of wide oscillations with increasing amplitude performed upon sustained loss of contact with the trail. We propose that tracking animals have an intrinsic, geometric notion of continuity, allowing them to exploit past contacts with the trail to form an estimate of where it is headed. This estimate and its uncertainty form an angular sector, and the emergent search patterns resemble a “sector search.” Reinforcement learning agents trained to execute a sector search recapitulate the various phases of experimentally observed tracking behavior. We use ideas from polymer physics to formulate a statistical description of trails and show that search geometry imposes basic limits on how quickly animals can track trails. By formulating trail tracking as a Bellman-type sequential optimization problem, we quantify the geometric elements of optimal sector search strategy, effectively explaining why and when casting is necessary. We propose a set of experiments to infer how tracking animals acquire, integrate, and respond to past information on the tracked trail. More generally, we define navigational strategies relevant for animals and biomimetic robots and formulate trail tracking as a behavioral paradigm for learning, memory, and planning.


Experimental studies demonstrate the ability of ants, dogs, humans, and rodents to track odor trails (1–6). Rodents accurately track trails in the dark, remaining close to the trail and casting when contact is lost (Fig. 1A) (5). Carpenter ants closely follow a trail while sampling it using a “crisscross” pattern with their two antennae (Fig. 1B) (1). Current models of this behavior rely on variants of chemotaxis (7) based on continuous estimates of the rising and falling odor gradients as the trail is crossed. One such strategy compares simultaneous odor concentrations detected by two spatially separated sensors (8). Yet, rats with a blocked nostril (5) and ants with a single antenna (1) are still able to track trails, although less accurately. An alternative chemotaxis strategy has the animal measuring odor gradients along its trajectory and turning when a significant decrease is perceived (5).

Fig. 1.

Sample trail-tracking trajectories from previous experiments and our RL simulations. (A) A rat (head position in red) tracking a trail (in black). Note the wide casts on extended loss of contact with the trail. Data reproduced from ref. 5. (B) A carpenter ant tracking an odor trail (black) using a stereotyped crisscrossing strategy (1). (C and D) Sample trajectories obtained from RL for agents with one sensor (C) and two sensors (D) recapitulate experimentally observed tracking patterns in A and B. (E, Left) Search paths executed by RL agents with a single sensor upon loss of contact with the trail. (E, Right) The initial prior distribution (E, Bottom Right) over trail headings transforms into a bimodal posterior distribution (E, Top Right and E, Middle Right), which drives the oscillatory pattern of casting. (F) RL agents with two sensors show a characteristic crisscrossing pattern close to the last detection point. The search path is similar to the single-sensor agent at long distances (SI Appendix, Fig. S1A). (G) RL agents show a trade-off between tracking speed (rescaled by the sector angle σ, sensor size a, and sampling frequency ω) and the probability of losing the trail entirely.

While chemotaxis-based strategies can allow for trail tracking when trails are continuous, they fail when trails are broken and gradients are absent, which is certainly relevant for animals tracking trails in the wild. In experiments with broken trails (1, 5), the absence of signal triggers casting, which is a fundamental feature shared with olfactory searches in a turbulent medium (9–11). Even though turbulent searches also feature sporadic cues, airborne odor signals tend to be localized in a cone, and even within the cone, the signal is highly fluctuating (12, 13). Therefore, beyond qualitative similarities between terrestrial trail tracking and airborne olfactory searches, the specific statistics of detections, geometric constraints, and behavioral patterns are distinct.

In contrast with chemotaxis-based algorithms, we propose an alternative framework built on the searcher exploiting past contacts with the trail to maintain an estimate of the trail’s local heading and its uncertainty. A minimal memory of the approximate locations of the two most recent contacts suffices to delineate an angular sector of probable trail headings that radiates from the most recent detection point. The resulting “sector search” provides a quantitative description of trail-tracking behavior that unifies its various phases and yields specific experimental predictions.

Results

We first show that reinforcement learning (RL) based on the sector search idea can recapitulate natural behavior. An RL agent in this scheme learns to traverse the trail as quickly as possible while minimizing the probability of losing it (Materials and Methods has details). Our in silico RL experiments show that general aspects of animal tracking behavior naturally emerge (Fig. 1 C and D). Specifically, casts are observed around the most likely heading of the trail, and their amplitude is within the angular sector defined by the initial uncertainty σ of the trail’s heading ϕ. The reason for the oscillatory pattern of casting is intuitive. Indeed, while moving along a path C without detecting the trail, the estimated heading’s probability distribution $P(\phi)$ (Fig. 1E) is updated into $P_C(\phi)$ as

$$P(\phi) \;\to\; P_C(\phi) \propto \Gamma_C(\phi)\,P(\phi) \tag{1}$$

where $\Gamma_C(\phi)$ is the probability of not detecting the trail headed along ϕ. Irrespective of the explicit form of $\Gamma_C(\phi)$, the depletion of headings already explored generally leads to a bimodal posterior distribution, with the two modes at the edges of the angular sector (Fig. 1E). The search process is analogous to an agent “foraging” for the trail at two spatially separated patches. The emergence of oscillations is then understood in terms of marginal value theory (14, 15); we show using a minimal model of casting (Materials and Methods) that the turning point of a cast occurs when the marginal value of continuing on one side of the sector (i.e., without paying the cost of traveling) is outweighed by the probability of finding the trail on the opposite side.
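The depletion argument can be illustrated numerically. The following is a minimal sketch of the update in Eq. 1, assuming a searcher that runs straight along the estimated heading (θ = 0) at radial speed v, a Gaussian prior of SD σ, and the Gaussian detection kernel described in Materials and Methods. The function name and all parameter values are illustrative choices, not taken from the paper.

```python
import math

def posterior_after_straight_run(sigma=0.5, a=0.1, omega=10.0, v=1.0,
                                 t_end=3.0, n_t=3000, n_phi=301):
    """Posterior over trail headings after an undetected straight run (Eq. 1)."""
    # Symmetric grid of candidate headings covering +/- 3 sigma.
    phis = [-3.0 * sigma + 6.0 * sigma * i / (n_phi - 1) for i in range(n_phi)]
    dt = t_end / n_t
    post = []
    for phi in phis:
        prior = math.exp(-phi ** 2 / (2.0 * sigma ** 2))
        # Time-integrated detection kernel along the path r = v t, theta = 0
        # (small-angle Gaussian kernel, midpoint rule in time).
        exposure = sum(
            math.exp(-((v * (k + 0.5) * dt) * phi) ** 2 / (2.0 * a ** 2))
            for k in range(n_t)
        ) * dt
        gamma_c = math.exp(-omega * exposure)  # probability of no detection
        post.append(gamma_c * prior)
    z = sum(post)
    return phis, [p / z for p in post]
```

With these illustrative values the posterior is strongly depleted around ϕ = 0 and peaks symmetrically on either side, which corresponds to the two "patches" of the foraging analogy.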

We proceed now by establishing geometric limits on tracking speed. A typical RL curve for the probability of losing the trail vs. speed is shown in Fig. 1G. Its monotonicity epitomizes universal limits that “staying on the trail” imposes on tracking speed. Intuitively, searching slowly reduces the distance between detections (the interdetection interval [IDI]), decreasing the uncertainty in the estimate of the trail’s heading and thus, the probability of losing the trail. However, these benefits come at the cost of slow forward progression along the trail. In contrast, moving quickly reduces the detection rate, leading to longer IDIs, increased uncertainty, and loss probability.

We quantify the above trade-off using simple scaling arguments. Suppose the tracking agent has a sensor of size a, samples at a frequency ω, and moves with a fixed forward speed v. As shown in Fig. 2A, the angle subtended by the detector at distance r from the last contact is a / r. The agent searching over an angular sector then scans at a rate $d\phi/dt = \omega a/r \approx \omega a/(vt)$. Integrating the above expression for the angular rate, $\frac{v}{a\omega}\,d\phi = dt/t$, we obtain the typical time for searching over a sector angle σ: $t_c \sim \omega^{-1} e^{\sigma v/a\omega}$. The corresponding distance $L' \approx v t_c$ is obtained using $r \approx vt$. The heading of the trail is known with uncertainty σ, which is the opening angle of the conical sector shown in Fig. 2A. Uncertainty is expected to depend on the distance L from the previous detection as $\sigma(L) = (L/\ell)^{\gamma}$, where $\ell$ and γ characterize the statistics of trails (below and Fig. 2D). Importantly, a stable strategy for long-term tracking requires that successive IDIs should on average be equal (i.e., $L' = L$). Combining $L = v t_c$ with $L = \ell\,\sigma^{1/\gamma}$ and the expression for $t_c$, we finally obtain an upper bound on the tracking speed v:

$$\frac{v^{1+\gamma}}{a\,\ell^{\gamma}\,\omega^{1+\gamma}} = (\omega t_c)^{-\gamma}\log(\omega t_c) \le \gamma^{-1}e^{-1}. \tag{2}$$
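The intermediate steps leading to Eq. 2 can be spelled out as follows. This is a sketch of the scaling argument only: the lower limit of the time integral is taken to be one sampling interval 1/ω (an assumption consistent with the quoted form of the search time), order-one prefactors are dropped, and $\ell$ denotes the length scale in $\sigma(L) = (L/\ell)^{\gamma}$.

```latex
\begin{align*}
\int_0^{\sigma}\frac{v}{a\omega}\,d\phi = \int_{1/\omega}^{t_c}\frac{dt}{t}
 &\;\Longrightarrow\; \frac{\sigma v}{a\omega} = \log(\omega t_c),\\
\sigma = \left(\frac{L}{\ell}\right)^{\gamma},\quad L = v t_c
 &\;\Longrightarrow\; \frac{v^{1+\gamma}\,t_c^{\gamma}}{a\,\ell^{\gamma}\,\omega} = \log(\omega t_c)
 \;\Longrightarrow\; \frac{v^{1+\gamma}}{a\,\ell^{\gamma}\,\omega^{1+\gamma}} = (\omega t_c)^{-\gamma}\log(\omega t_c),\\
\frac{d}{dx}\left[x^{-\gamma}\log x\right] = x^{-\gamma-1}\,(1-\gamma\log x) = 0
 &\;\Longrightarrow\; x = \omega t_c = e^{1/\gamma},\qquad
 \max_x\, x^{-\gamma}\log x = \frac{1}{\gamma e}.
\end{align*}
```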

Fig. 2.

History dependence and trail models. (A and B) Trail tracking naturally splits into distinct episodes punctuated by trail detections by the searcher. In each episode, we propose that the tracker searches for the trail using an estimate of the trail’s heading updated based on the past points of contact with the trail and a model of trail statistics. We affix a polar coordinate system with the origin at the most recent contact point and the azimuthal angle defined relative to the estimated trail heading. The uncertainty σ fixes the angular width of the search. The searcher moves forward with a speed v while sampling at a frequency ω. A sensor of size a spans a / r radians at distance r, which determines the rate at which the angular space is searched. (C) To estimate where the trail is headed and its uncertainty from past contacts, the tracker can either use local anisotropy estimated from a single contact (C, Left) or extrapolate from previous points of contact using a model of trail statistics (C, Center and C, Right). In the latter case, the most likely trail paths (dashed blue lines) are similar to interpolated splines, which capture basic geometric notions of persistence in heading and curvature. (D) The uncertainty in trail heading (in radians) as a function of the distance, L, between points of contact for the GWLC model of trails discussed in the text. λ is the correlation length scale of the trail’s curvature. SI Appendix, Fig. S3C illustrates the various scaling regimes exhibited by the GWLC model. (E) The correlation between trail heading at the most recent and second most recent points of contact for the GWLC model changes with the distance between these points, yet it is generally expected to be negative. (F) The expected search distance, $v t_c$, against the distance, L, between the previous two points of contact for a/λ = 0.1 and v/aω = 5, 6.25 (black and gray, respectively) (discussion above Eq. 2). The point of intersection with the 45° dashed line is the condition for a stable tracking strategy. The gray curve corresponds to $v_{\max}$ beyond which tracking is unstable.

Its maximum $v_{\max} \sim \omega\,(a\,\ell^{\gamma})^{1/(1+\gamma)}$ defines the optimal stable tracking speed in terms of the tracker’s sensory parameters and trail statistics. The basic element that leads to this bound is the geometric factor 1/r that underlies searching over an angular sector. The result from Eq. 2 that $\omega t_c$ is of order one ($e^{1/\gamma}$) explains experimental observations (Fig. 1) that tracking animals typically take only a few samples to reestablish contact with the trail.
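As a numerical check of the bound, the sketch below scans the right-hand side of Eq. 2 over x = ωt_c and evaluates the corresponding maximal speed, keeping the order-one factor (γe)^(-1/(1+γ)) that the scaling form of the maximal speed drops. The function names are hypothetical, and `ell` stands for the length scale in σ(L) = (L/ell)^γ.

```python
import math

def eq2_rhs(x, gamma):
    """Right-hand side of Eq. 2, x**(-gamma) * log(x), with x = omega * t_c."""
    return x ** (-gamma) * math.log(x)

def scan_maximum(gamma, x_min=1.0, x_max=100.0, n=200000):
    """Brute-force scan for the maximizer of eq2_rhs on [x_min, x_max]."""
    best_x, best_val = x_min, eq2_rhs(x_min, gamma)
    for k in range(1, n + 1):
        x = x_min + (x_max - x_min) * k / n
        val = eq2_rhs(x, gamma)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

def max_stable_speed(a, ell, omega, gamma):
    """Largest v compatible with Eq. 2, including the 1/(gamma*e) factor."""
    bound = 1.0 / (gamma * math.e)
    return omega * (a * ell ** gamma * bound) ** (1.0 / (1.0 + gamma))
```

For diffusive trails (γ = 1/2) the maximizer is ωt_c = e² ≈ 7.4 samples, and for curvature-dominated trails (γ = 1) it is e ≈ 2.7, in line with the statement that only a few samples are needed.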

The above argument implies that tracking speed depends on the trail statistics via the relation between uncertainty and the distance between points of contact. We use ideas from polymer physics to quantify how this relationship depends on geometric properties of the trails. Specifically, we ask how detecting the trail at a set of points $r_0, r_1, r_2, \ldots$ (Fig. 2B) constrains the searcher’s estimate of its future heading. We consider the case when the searcher keeps track of the two most recent points of contact with perfect memory of their location. A more extended memory is discussed further below; an imperfect memory adds to the uncertainty and can be easily accommodated within the framework developed below. Intuition for the two-point case is provided by the familiar “curve” tool in graphical design software, which draws a cubic spline through a set of prescribed points (Fig. 2C). The tool captures the simple intuition that tangents to a curve are continuous (i.e., the trail’s heading has local persistence), which is a plausible, minimal assumption about trails. We show in SI Appendix that cubic spline interpolation corresponds to the most likely path (through a fixed set of points) in the so-called worm-like chain (WLC) ensemble (originally introduced for polymers) (16, 17). In this ensemble, the tangent direction undergoes diffusion with rate κ, and the uncertainty is then $\sigma = (\kappa L/3)^{1/2}$, which determines the two parameters: the scaling law, $\gamma = 1/2$, and the correlation length scale, $\ell = 3\kappa^{-1}$, in Eq. 2. Actual trails could be smoother and have a well-defined curvature (the rate of change of heading) that persists on a characteristic length scale λ. We capture this ensemble of curves by introducing two additional parameters: persistence length λ and typical radius of curvature ξ (Materials and Methods). Uncertainty is then given by $\sigma \approx L/2\xi$ (hence, γ = 1 and $\ell = 2\xi$) at distances $L < \lambda$, while diffusive scaling is recovered at larger distances with an effective diffusivity $\kappa = 2\lambda\xi^{-2}$.
This extended model defines a generalized worm-like chain (GWLC) ensemble with cross-overs across the various regimes (Materials and Methods). In summary, the model leads to a “propagator,” which encodes how information about past contacts is integrated to form an estimate of the trail’s heading while taking into account geometric aspects of trails. A general feature is that the headings at two consecutive contacts are anticorrelated (Fig. 2E), which reflects the bending of the spline relative to the chord seen in Fig. 2C. We emphasize that although the general strategy of the agent depends on the statistical properties of the trail ensemble, the specific actions taken by the tracking agent along a particular trail, such as reorientation based on the most likely trail heading, will depend on the history of contact points via the propagator for the WLC (or GWLC) model.
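The diffusive (WLC) scaling can be checked with a short Monte Carlo sketch. It assumes a convention in which the variance of the tangent angle grows as κ times arc length, and it measures the spread of the trail heading at the most recent contact relative to the chord joining the two contacts, which is one plausible reading of the two-point uncertainty. The function name, discretization, and parameter values are illustrative.

```python
import math
import random

def heading_sd_given_chord(kappa, L, n_steps=200, n_trails=2000, seed=0):
    """SD of (final heading - chord direction) over simulated WLC trails."""
    rng = random.Random(seed)
    ds = L / n_steps
    step_sd = math.sqrt(kappa * ds)  # heading diffuses: variance kappa per unit length
    devs = []
    for _ in range(n_trails):
        heading, x, y = 0.0, 0.0, 0.0
        for _ in range(n_steps):
            x += math.cos(heading) * ds
            y += math.sin(heading) * ds
            heading += rng.gauss(0.0, step_sd)
        # Deviation of the local heading from the line through both contacts.
        devs.append(heading - math.atan2(y, x))
    mean = sum(devs) / n_trails
    return math.sqrt(sum((d - mean) ** 2 for d in devs) / n_trails)
```

For small κL, where the small-angle limit applies, the simulated spread should be close to (κL/3)^(1/2); for example, `heading_sd_given_chord(0.002, 10.0)` can be compared with `math.sqrt(0.002 * 10.0 / 3)`.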

Why and when do searchers need to cast? The question stems from our previous result that a few samples are typically sufficient to reestablish contact with the trail. To address it quantitatively, we consider again the setup of Eqs. 1 and 2. The nondetection probability averaged over the ensemble of trails that pass through past contact points is

$$\Gamma_C = \left\langle \exp\!\left(-\omega \int_C \frac{ds}{v}\, I_a\big(r(s), y(s)\big)\right) \right\rangle_y, \tag{3}$$

where s parametrizes the searcher’s path C and the Boolean indicator function $I_a$ measures if the agent at r(s) is within sensing range a of the trail at y (i.e., the integral is the time spent in contact with the trail). Numerical simulations of the search show a power law scaling regime for $\Gamma_C$, which is cut off at short distances by the initial surge and at long distances by trails escaping out of the casting envelope (Fig. 3 A–C). We proceed to explain these three regimes shown in Fig. 3B. Intuitively, at short radial distances $r < a/\sigma \sim v/\omega$ (the latter from Eq. 2), the sensor covers the entire sector of likely headings, the searcher can just move forward, and $\Gamma_C \approx e^{-\omega r/v}$ (Fig. 3B). Casting sets in if the searcher reaches, without detection, a distance $r \approx a/\sigma$ (i.e., when the sector is not fully covered any longer by the sensor of size a). The sector geometry in Fig. 3A implies that the length of a single casting sweep is proportional to r. A fixed forward speed then implies that the distance between successive encounters with the trail also scales with r. Hence, the number of times the searcher crosses the trail (and thus, the time spent on the trail) per unit radial distance decreases as 1/r. This 1/r scaling in the overlap then leads to a logarithmic integral in the exponent of Eq. 3 and thus, a power law regime $\Gamma_C \sim r^{-\beta}$ during casting. The optimal exponent β* depends on the statistics of the trails, yet it generally satisfies β* > 1 (β* = 1.63 for the curvature-dominated GWLC model) (Eq. 19). Since $\Gamma_C$ is a cumulative distribution, β > 1 implies that the mean distance is indeed determined by the lower cutoff (i.e., the trail is typically found in a few samples, as estimated in Eq. 2). However, the power law decay implies that casting phases are frequent and can span up to the upper cutoff where all aspects of trail statistics and search geometry come into play, as discussed below.
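The logarithmic integral behind the power law can be written out explicitly. In this sketch, β is introduced as the dimensionless coefficient of the 1/r contact time per unit radial distance (multiplied by ω), and $r_0 \approx a/\sigma$ is the radius at which casting starts; both are shorthand for the scaling argument above, not quantities computed here.

```latex
\begin{align*}
\Gamma_C(r) &\approx \Gamma_C(r_0)\,\exp\!\left(-\int_{r_0}^{r}\frac{\beta}{r'}\,dr'\right)
 = \Gamma_C(r_0)\left(\frac{r}{r_0}\right)^{-\beta}, \qquad r > r_0 \approx \frac{a}{\sigma},\\
\int_{r_0}^{\infty}\Gamma_C(r)\,dr &= \Gamma_C(r_0)\,\frac{r_0}{\beta-1} < \infty
 \quad\text{only if}\quad \beta > 1 .
\end{align*}
```

Because the mean distance to detection is the integral of the nondetection probability, the second line shows that for β > 1 the mean is finite and set by the lower cutoff $r_0$.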

Fig. 3.

The role of casting in trail tracking. (A–C) The probability of not detecting the trail at distance L decays exponentially during the initial forward surge (red) and as a power law during casting (blue) over a conical envelope. The searcher often finds the trail within the initial surge (mean in B), whereas casting determines the probability of losing the trail (determined by the upper cutoff) if the trail is not found during the surge. Beyond a characteristic length scale $\ell$, the detection rate becomes negligible (green region in B) as trails initially well inside the casting envelope escape out of the envelope (C). (D and E) The casting policy obtained from Bellman optimization. The specific casting strategy depends on trail statistics, shown here for diffusive and curvature-dominated trail ensembles (scaling exponent γ = 0.5, 1.0) in blue and red, respectively. Note the increasing casting angle and the slowing down of the agent at $r \sim \ell$ (E). (F) The trade-off between the probability of missing the trail for fixed L and searcher speed v.

How should an agent perform sustained casts so as to minimize the probability of losing the trail while maximizing tracking speed? To go beyond the above scaling arguments, we now consider the geometry of sector search in detail. The searcher’s path is parameterized by the sequence of turning points of its casting trajectory $\{r_k, \theta_k\}$ (in polar coordinates with respect to the most recent point of contact), which are to be optimized. We maximize the average tracking speed L / T, where L and T are the distance and duration between the most recent and the subsequent point of contact with the trail, respectively. As discussed previously, the uncertainty estimated for a bout of sector search depends on the IDI. To constrain the uncertainty, we therefore constrain the average, $\langle L \rangle$. Hence, we consider the following optimization problem:

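$$V_{v,\Lambda} = \max_{\{r_k,\theta_k\}}\left[\left\langle \frac{L}{T} \right\rangle - \Lambda\,\langle L \rangle\right], \tag{4}$$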

where Λ is the Lagrange multiplier enforcing the $\langle L \rangle$ constraint. The turning points and the searcher’s speed v affect the probability of detecting the trail in a single cast, which is implicit in the expectation in Eq. 4. We solve the Bellman equation corresponding to the above optimization problem using dynamic programming (Materials and Methods), which sequentially optimizes for the turning points by considering at each step the two possibilities that either the trail is detected during the cast or the agent advances to the next cast. The probabilities for these two events are controlled by the nondetection probability given by Eq. 3. In the event of no detection, the estimate of the trail’s heading is updated according to Eq. 1. The resulting optimization yields a search strategy with an increasing sequence of casting angles (Fig. 3D). The specific casting strategy depends on how trails meander and curve (Fig. 3E). The choice of v controls the trade-off between the tracking speed and the probability of losing the trail entirely (Fig. 3F). Independent of the choice of v and $\langle L \rangle$, azimuthal excursions are by and large conical but extend dramatically (Fig. 3 D and E) when trails that were initially inside the cone escape from it with high probability (Fig. 3C). This happens at a distance that scales with $\ell$ but also depends on the sector envelope, which in turn, depends on σ (recall that $\ell$ is the correlation length of the trail heading). Intuitively, at length scales $\gtrsim \ell$, the trail’s heading decorrelates from its initial value, the relevance of information on past detections has expired, the trail is effectively lost, and it is optimal to stop progressing forward.

Experimental Tests

A number of transformative experimental assays are suggested by our theoretical framework. The broad theme is whether and how animals adapt their behavior to the statistics of trails. For field experiments, it would be informative to measure the statistics of natural trails, analogous to the statistics of natural images that has brought insight into the adaptation of visual responses (18–21). Specifically, one can measure the autocorrelation of local heading and curvature of natural trails, which would test the validity and fix the parameters of our WLC-type models. In laboratory settings, the statistics of trails can be controlled by varying persistence or curvature or using broken trails (Fig. 4A). The general issue of adaptation is articulated in the following four specific questions that stem from our work.

Fig. 4.

Behavioral assays to dissect trail-tracking strategies. (A) A tracker executing a sector search often maintains continuous contact with the trail (A, Left). Extended sector searches can instead be systematically elicited using broken trails (A, Right). The subsequent search envelope’s orientation and width relative to past contact points inform the tracker’s internal estimate and uncertainty σ of the trail’s heading. (B) We propose dashed trails as an assay to infer how a tracking animal integrates past information. The distance between dashes forces contact points to be separated by at least δL, and the subsequent search sector yields an estimate of σ. (C) Our theory predicts that the most likely trail heading $\phi_{\rm ML}$ is proportional to the angle ψ between the line segments joining points of contact (red dots), with a prefactor that depends on the distance δL between the two most recent contact points, the trail model, and memory. (D) Automated behavioral tracking of rodents on a treadmill allows control of tracking speed and trail statistics (5, 22). (E) The mean distance to the trail as a function of speed in simulations of sector search (as in A, Left), which recapitulates the linear relationship (dashed line) observed in experiments with rats (5). We use a sector search strategy, where the longitudinal speed v is fixed and the tracker rapidly casts within a conical envelope (SI Appendix, Fig. S4A shows a sample trajectory in a single bout). For each v, we simulate 100 trials, where each trial consists of 10 successive successful contacts with the trail. Trails are generated from the GWLC ensemble with κ = 0, ξ/λ = 3, a/λ = 0.1. Error bars are one SEM.

First, how long after the loss of contact do animals “give up” tracking? Our prediction is that they should when they get beyond the characteristic correlation length of the trails. At this point, the value of past information has expired, and it is best to turn back or start a new search. This prediction can be tested by varying trail statistics, interrupting the trails, and measuring when animals give up.

Second, does the amplitude of casting depend on the statistics of trails and the IDI? We predict that it should, and the specific quantitative relationship is a signature of the underlying predictive model employed by the animal (Fig. 4B). Our prediction should be contrasted with the nonadaptive casting envelope assumed in ref. 5. The IDI can be experimentally manipulated by generating dashed trails as shown in Fig. 4B, which forces the animal to detect the trail sporadically yet at controllable intervals. It would be particularly informative to verify whether or not animals include curvature in their estimates of the trails’ future heading or limit to persistence.

Third, what is the memory of past trail contacts? Experiments with forked trails (5) show that rats exhibit a predictive component, suggesting a memory that extends over the recent past. Our theory posits that the tracker remembers (at least) the two most recent detection points. For the case of two-point memory, we predict the sector search is oriented along the line connecting those two points. If more than two points are remembered, the expected heading deviates from this line (by an angle that we calculate explicitly in SI Appendix) as illustrated in Fig. 4C. Note that the heading is not an average of the past headings, as assumed in ref. 5, and actually depends on the IDI between recent contacts; this prediction could again be tested by using curved, dashed trails as in Fig. 4B.

Fourth, does the tracking speed vary with the typical IDI, reducing with increased uncertainty as predicted by the speed–accuracy trade-off Eq. 2? This can be tested by varying the speed, for instance, of the treadmill in ref. 5 (Fig. 4D) and measuring tracking accuracy. Available data for three speeds in ref. 5 are captured by our theory (Fig. 4E), which highlights the importance of revisiting those pioneering experiments and measuring additional quantities, namely the explicit prediction in Fig. 3F.

Conclusion

In conclusion, we show that an optimized sector search strategy based on the memory of two or more recent detection events yields an oscillatory search path with increasing amplitude that naturally unifies the observed low-amplitude “zigzagging” and larger-amplitude “casting” behaviors into the same quantitative framework. This framework elucidates the geometric and computational constraints faced by tracking animals and identifies general features of the algorithms that efficiently solve the task, which can also be implemented for robotic applications. Insights and predictions developed here impact and should inform the design and analysis of future animal behavior experiments.

While the computational constraints discussed in this work apply generally, animals may face an additional set of species-specific physical and physiological constraints that influence aspects of their trail-tracking strategies. For example, inertial effects and gait constraints will affect the rate at which an animal sweeps across a sector and consequently, its speed selection. Further flexibility is offered by extra degrees of freedom in the sensorimotor apparatus, such as modulation of the inhalation rate or independent control of segments of an ant’s antennae. In our model, the sensorimotor mechanism impacts the strategy via the probability of detection of the trail from the animal’s sensors, which we have assumed to be fixed. Indeed, without additional trade-offs, our model implies that the optimal strategy is to simply pick the parameters that maximize the probability of detection (i.e., the maximal sensor size and sampling frequency). Physiological constraints may impose trade-offs (e.g., between the strength and rate of inhalation), which then lead to an altogether new dimension in the animal’s decision making. We postpone a detailed examination of these trade-offs and how they influence the trail-tracking strategy to future work.

We have assumed that the searcher’s behavior is guided by a model of continuous trails and investigated how an animal should behave in the presence of a sporadic break (a common perturbation in experiments). Recurrent breaks in the trails can be incorporated into the decision-making framework by introducing an additional term in the exponent of Eq. 3 for the presence or absence of a trail. When averaged over the trail ensemble and the statistics of brokenness, the effect of broken trails appears as a reduced detection rate if the breaks are much shorter than the correlation length scale $\ell$, and the general conclusions remain unaffected. However, if the breaks last longer than $\ell$, the trail is “patchy,” with each patch containing little information about the location of the next patch. In such a scenario, a strategy that employs biased random exploration or foraging may prove optimal.

Materials and Methods

An RL Framework for Trail Tracking

Trail tracking naturally splits into discrete episodes where after each loss of contact, the tracker searches and attempts to reestablish contact with the trail. We use RL to identify optimal search strategies for each episode and explore how factors such as sensory configuration or movement constraints influence the strategy. The task is a one-dimensional search over angular space θ, the geometry of which is illustrated in Fig. 2A. The tracker controls its tangential speed $u(t) \equiv r\dot{\theta}$, while its radial speed v is kept constant. For simplicity, we focus here on the configuration featuring a single sensor of size a sampling with a Poisson frequency ω. The generalization to two sensors is found in SI Appendix.

In each episode, the (Bayesian) agent maintains a posterior probability distribution function (PDF) P(ϕ) over possible trail headings, which is continuously updated based on the locations already visited until the trail is recontacted. The agent’s strategy of decisions about its future trajectory a priori depends on the full high-dimensional distribution P(ϕ), which is difficult to learn. To circumvent this issue, we formulate the search task using a tractable parametrization of the posterior as a mixture of K Gaussian basis functions:

$$P(\phi) = \sum_{i=1}^{K} q_i\,B_i(\phi), \qquad B_i(\phi) = s^{-1}\,\varphi\!\left(\frac{|\phi-\mu_i|}{s}\right), \tag{5}$$

where φ is the standard normal PDF and the $q_i$’s are normalized weights. The posterior is encoded by the weights q, the posterior probabilities of the latent states given the agent’s history. The corresponding vector is lower dimensional and yields to standard RL methods. For simulations in the text, we used K = 3; equal initial weights; $\mu_1, \mu_2, \mu_3 = -\sigma, 0, \sigma$; and $s = 0.5\sigma$. These values were chosen so that the prior has mean zero and SD σ. Using K > 3 led to similar strategies of search, but training was slower. We define the detection probability given latent state i for an agent at r as

$$\bar{\gamma}_i(r) = \int \gamma(r,\phi)\,B_i(\phi)\,d\phi, \tag{6}$$

where γ(r,ϕ) is the detection probability of finding a trail headed along ϕ if the searcher is at r. We assume a Gaussian detection kernel of size a, with distance measured to the closest point on the trail: $\gamma(r,\phi) = e^{-r^2\sin^2(\theta-\phi)/2a^2} \approx e^{-r^2(\theta-\phi)^2/2a^2}$, where we have used the small-angle approximation. Conditional on no detection at r, Bayes’ rule yields $\dot{q}_i = -\omega\,q_i\,(\bar{\gamma}_i(r) - \bar{\gamma}(r))$, where $\bar{\gamma} = \sum_i q_i\,\bar{\gamma}_i$ is the total probability of detection. From Eq. 6, we have

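$$\bar{\gamma}_i(r) = \sqrt{2\pi}\,\frac{\sigma_r}{\tilde{\sigma}_r}\,\varphi\!\left(\frac{\theta-\mu_i}{\tilde{\sigma}_r}\right), \tag{7}$$

where $\sigma_r = a/r$ and $\tilde{\sigma}_r = \sqrt{\sigma_r^2 + s^2}$.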
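The closed form for the per-state detection probability can be cross-checked against direct quadrature of Eq. 6, and the Bayes update of the weights can be applied as a single Euler step. This is a sketch under the stated Gaussian kernel in its small-angle form; the prefactor sqrt(2π)·σ_r/σ_tot used below is what the Gaussian integral in Eq. 6 gives, with σ_tot = sqrt(σ_r² + s²). Function names, the explicit Euler step, and the numerical values are assumptions made for illustration.

```python
import math

def normal_pdf(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gamma_bar_closed(r, theta, mu, s, a):
    """Closed-form Gaussian integral of Eq. 6 for one basis function."""
    sigma_r = a / r
    sigma_tot = math.sqrt(sigma_r ** 2 + s ** 2)
    return (math.sqrt(2.0 * math.pi) * (sigma_r / sigma_tot)
            * normal_pdf((theta - mu) / sigma_tot))

def gamma_bar_numeric(r, theta, mu, s, a, n=20000, half_width=10.0):
    """Midpoint-rule quadrature of Eq. 6 over mu +/- half_width * s."""
    lo = mu - half_width * s
    dphi = 2.0 * half_width * s / n
    total = 0.0
    for k in range(n):
        phi = lo + (k + 0.5) * dphi
        kernel = math.exp(-(r * (theta - phi)) ** 2 / (2.0 * a * a))
        total += kernel * normal_pdf((phi - mu) / s) / s * dphi
    return total

def belief_step(q, gammas, omega, dt):
    """One Euler step of dq_i/dt = -omega * q_i * (gamma_i - mean gamma)."""
    mean_gamma = sum(qi * gi for qi, gi in zip(q, gammas))
    return [qi - omega * qi * (gi - mean_gamma) * dt for qi, gi in zip(q, gammas)]
```

The update conserves the total weight, and states whose headings have just been sampled without a detection lose weight relative to the others.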

We use a discount rate λ and provide a reward as discussed below, after Eq. 8. Training is performed in an episodic fashion with each episode lasting time T. The kinematic variables are updated with time step dt, and actions are taken with time step $dt_{\rm act} \ge dt$. Movement constraints are imposed by restricting the set of actions to three values, $u/a\omega \in \{-\alpha, 0, \alpha\}$. The state space has four dimensions: r, θ, $\ln(q_1/q_2)$, and $\ln(q_3/q_2)$. We discretize our state space using a nonoverlapping tile coding scheme (23). We refer to SI Appendix for hyperparameter values, details about the state space architecture, and the case of two sensors.

We use the SARSA (state-action-reward-state-action) Q-learning algorithm (23), which learns the so-called Q function, that is the value function for each action in a given state:

$$Q_\pi(\mathbf{r},\mathbf{q},u)=\bar\gamma(\mathbf{r})\,dt+\left[1-\left(\lambda+\bar\gamma(\mathbf{r})\right)dt\right]V_\pi(\mathbf{r}+\mathbf{v}\,dt,\;\mathbf{q}+\dot{\mathbf{q}}\,dt),$$ [8]

where $V_\pi(\mathbf{r},\mathbf{q})=\sum_u Q_\pi(\mathbf{r},\mathbf{q},u)\,\pi(u|\mathbf{r},\mathbf{q})$, and the index $\pi$ highlights the dependence on the probabilistic policy $\pi(u|\mathbf{r},\mathbf{q})$. The above equation differs from the standard SARSA update by the addition of $\bar\gamma\,dt$ in the discount term, which is due to our formulation of the search as a continuing process conditional on no contact with the trail. Alternatively, one may provide a unit reward when the trail is found, stop the episode, and start over. However, the credit assignment problem in goal-oriented tasks makes the training (Eq. 9 hereafter) problematic, even though the final optimal policy is equivalent (e.g., ref. 24). Our formulation circumvents both issues by 1) giving a local reward $\bar\gamma\,dt$ rather than a final one, which addresses the credit assignment, and 2) including the detection probability $\bar\gamma\,dt$ into the discount rate to account for the condition of no contact.

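To learn the Q function as defined in Eq. 8, we use a “softmax” training policy: that is, $\ln\pi(u|\mathbf{r},\mathbf{q})=\hat{Q}(\mathbf{r},\mathbf{q},u)/T_{\rm explo}$ up to normalization, where $\hat{Q}$ is the current estimate of the Q function and $T_{\rm explo}$ is a “temperature” parameter that is annealed as training progresses to allow for sufficient exploration of actions. Given an action $u$ at state $(\mathbf{r},\mathbf{q})$ and a subsequent action $u'$ at state $(\mathbf{r}',\mathbf{q}')$, $\hat{Q}$ is updated during training as

$$\hat{Q}_\pi(\mathbf{r},\mathbf{q},u)\leftarrow\hat{Q}_\pi(\mathbf{r},\mathbf{q},u)(1-\eta)+\eta\left(\bar\gamma(\mathbf{r})\,dt+\left(1-\left(\lambda+\bar\gamma(\mathbf{r})\right)dt\right)\hat{Q}_\pi(\mathbf{r}',\mathbf{q}',u')\right),$$ [9]

where $\eta$ is the learning rate. The function $\hat{Q}$ obtained at the end of the training period yields a search strategy as $\pi^*(\mathbf{r},\mathbf{q})=\arg\max_u\hat{Q}(\mathbf{r},\mathbf{q},u)$.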
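The update rule of Eq. 9, the softmax training policy, and the final greedy policy can be sketched in tabular form as follows. This is a minimal illustration, assuming the discretized state is represented by a hypothetical integer tile index and the three actions by the indices 0, 1, 2; the table layout, function names, and numerical values are assumptions and do not reproduce the tile coding described in SI Appendix.

```python
import math

def softmax_policy(q_values, temperature):
    """Action probabilities with ln(pi) equal to Q / T up to normalization."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((qv - top) / temperature) for qv in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def sarsa_update(Q, state, action, next_state, next_action,
                 gamma_bar, lam, eta, dt):
    """Eq. 9: local reward gamma_bar*dt plus the next value, discounted by
    both the discount rate lam and the detection probability gamma_bar."""
    target = gamma_bar * dt + (1.0 - (lam + gamma_bar) * dt) * Q[next_state][next_action]
    Q[state][action] = (1.0 - eta) * Q[state][action] + eta * target
    return Q[state][action]

def greedy_action(Q, state):
    """Final search strategy: the action with the largest estimated Q."""
    values = Q[state]
    return max(range(len(values)), key=lambda u: values[u])

# Hypothetical two-tile table with three actions per tile.
Q = {0: [0.0, 0.0, 0.0], 1: [0.5, 0.2, 0.1]}
new_value = sarsa_update(Q, state=0, action=1, next_state=1, next_action=0,
                         gamma_bar=0.2, lam=0.1, eta=0.5, dt=0.1)
probs = softmax_policy(Q[1], temperature=0.05)
print(new_value, greedy_action(Q, 0), probs)
```

Lowering `temperature` concentrates the softmax on the greedy action, which is the role of annealing $T_{\rm explo}$ during training.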

We applied the algorithm just described for a range of values of $v$, $\alpha$, and $d$ (half the distance between sensors for the two-sensor case). For each parameter set, we obtain a search strategy, the corresponding probability $\Gamma_C(T)$ of missing the trail in time $T$, and the expected number of samples to find the trail. Comparing $\Gamma_C(T)$ on a test set for different numbers of training episodes (SI Appendix, Fig. S1B) shows that nontrivial learning takes place, typically saturating at $10^4$ training episodes.

Casting in a Minimal Model of Sector Search

To establish the relation between casting and marginal value theory, we propose here a minimal model of sector search. The model lends itself to an analytical solution, which allows us to quantify how the frequency of casting and the efficiency of search depend on the movement and computational constraints imposed upon the tracker.

We consider the same setting as the above episodic RL framework, where the tracker is searching for the trail over a sector after losing contact with it. To focus on casting, we analyze the behavior of the searcher after an initial forward excursion along the most likely heading. This surge decreases the probability weight $q$ of the mode at $\phi=0$ and yields a symmetric bimodal posterior distribution concentrated at the two modes, $\pm\phi_0$ (typically $\phi_0\sim\sigma$). The resulting model is equivalent to Eq. 5 with $K=2$, $\mu_1=-\phi_0$, $\mu_2=\phi_0$, and $s$ small. The searcher moves radially as $r(t)$ (fixed), controls its tangential speed $u=r\dot\theta$, receives a unit reward when it finds the trail, and incurs a movement cost per unit time $\mu u^2/2$, where $\mu$ sets the movement constraint. The reward and cost are discounted at a rate $\lambda$. The two-dimensional state space of the agent consists of $\theta$ and the probability $q$ of finding the target at $\phi_0$ (the probability is $1-q$ at $-\phi_0$). For full details, we refer to SI Appendix.

The above model is exactly solvable. An optimal searcher exhibits oscillations between $\phi_0$ and $-\phi_0$ (casting) until it finds the trail (SI Appendix, Fig. S2A). After an initial transient, the searcher traverses a loop in state space $(\theta,q)$, alternating between sampling at $\phi_0$ or $-\phi_0$ and casting to the other side:

$$(\phi_0,q_s^*)\xrightarrow{\text{cast}}(-\phi_0,q_s^*)\xrightarrow{\text{sample}}(-\phi_0,1-q_s^*)\xrightarrow{\text{cast}}(\phi_0,1-q_s^*)\xrightarrow{\text{sample}}(\phi_0,q_s^*),$$ [10]

where $q_s^*$ is the optimal switching probability at which the searcher stops sampling at $\phi_0$ and traverses to $-\phi_0$. The speed of traversal from $\phi_0$ to $-\phi_0$ is determined by balancing the cost of traversing quickly and the potential value at $-\phi_0$ discounted due to the limited time horizon.

Intuitively, the searcher casts when the marginal value of continuing to sample at $\phi_0$ is just outweighed by the marginal value that the searcher receives if it stops sampling, traverses from $\phi_0$ to $-\phi_0$, and samples at $-\phi_0$. Balancing the marginal value of these two possibilities then yields the optimal switching probability $q_s^*$. For small $q_s^*$, we derive (SI Appendix)

$$q_s^*\approx\frac{\lambda}{1+\lambda}\,\frac{\zeta^2}{1+\zeta+(1+\lambda)\zeta^2-\zeta^3},$$ [11]

where $\zeta\equiv1-2\phi_0\sqrt{2\mu\lambda(1+\lambda)}>0$. When $\zeta<0$, the movement cost outweighs the value the agent may receive, and the optimal strategy is to simply not move. $q_s^*$ gets smaller, and thus, the searcher casts less frequently, with increasing time horizon ($\lambda\ll1$) or movement costs ($0<\zeta\ll1$). The probability of not detecting the trail after time $t$ is given by

$$\Gamma_C(t)\sim e^{-I\int_{t_0}^{t}\frac{ds}{r(s)}},$$ [12]

where $I$ is interpreted as the rate of information acquisition. Its dependence on $\mu$, $\lambda$, and $\phi_0$ is shown in SI Appendix, Fig. S2B. As expected, increasing movement cost (decreasing $\zeta$) decreases how quickly information is acquired. Similarly, a large time horizon ($\lambda$ small) makes the agent sample at $\pm\phi_0$ longer and cast slower, decreasing the rate of information acquisition. For a constant radial speed $v$, the probability $\Gamma_C(t)\sim t^{-I/v}$ decreases as a power law, which arises quite generally from the sector search geometry as discussed in the text.
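The power law follows from Eq. 12 because, for $r(s)=vs$, the integral $\int_{t_0}^{t}ds/r(s)$ equals $v^{-1}\ln(t/t_0)$. The short sketch below evaluates the integral numerically and measures the log-log slope. The function names and the values of `v`, `info_rate`, and `t0` are illustrative assumptions.

```python
import math

def nondetection_probability(t, t0, info_rate, radius, n=20000):
    """Eq. 12 by midpoint quadrature: exp(-I * integral_{t0}^{t} ds / r(s))."""
    h = (t - t0) / n
    integral = sum(h / radius(t0 + (k + 0.5) * h) for k in range(n))
    return math.exp(-info_rate * integral)

def loglog_slope(f, t_a, t_b):
    """Slope of ln f versus ln t between two times."""
    return (math.log(f(t_b)) - math.log(f(t_a))) / (math.log(t_b) - math.log(t_a))

# Illustrative values (assumptions): constant radial speed, r(s) = v * s.
v, info_rate, t0 = 0.5, 1.2, 1.0

def gamma_c(t):
    return nondetection_probability(t, t0, info_rate, lambda s: v * s)

slope = loglog_slope(gamma_c, 10.0, 100.0)
print("measured slope:", slope, "expected -I/v:", -info_rate / v)
```

Passing a different `radius` function, for example one that slows down with time, shows how the decay departs from a pure power law when the radial speed is not constant.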

Trail Statistics

If an agent makes contact with the trail at two points separated by distance $L$, statistical and geometric information is encoded in the propagator $P(\phi_L,\phi_0)$, where $\phi_L,\phi_0$ are the trail headings at the two points (measured relative to the line joining them). If $H$ points of contact are remembered, then the propagator can be used to compute the posterior distribution of the heading: $P(\phi)=\int D\phi_i\,P(\phi|\phi_1)P(\phi_1|\phi_2)\cdots P(\phi_{H-2},\phi_{H-1})$. We introduce an ensemble of trails (which we call the GWLC ensemble) that have persistence in heading and curvature, quantified respectively by the parameters $\kappa$ and $\lambda$ (distinct from the discount rate used for RL) and a typical radius of curvature, $\xi$. Typical samples of trails from this ensemble are presented in SI Appendix, Fig. S3A. The WLC ensemble (16, 17) previously introduced for polymers is a special case with $\xi=\infty$.

The propagator $P(\phi_L,\phi_0)$ takes into account all the trails constrained to pass through two contact points, weighted by their probability. We affix a coordinate system, where the line joining the two contact points is along the $x$ axis. Specifically, the trail’s $y$ coordinate, $y_x$, satisfies the constraint $y_0=y_L=0$; the small-angle approximation is used so that $\dot{y}_x=\phi_x$; and the trail has end-point headings $\phi_0,\phi_L$. We define

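$$P(\phi_L,\phi_0)=Z^{-1}\int D\phi_x\,D\chi_x\,e^{-E(\{\phi_x,\chi_x\})},$$ [13]

where $Z$ is a normalization constant, and the integral is over all possible headings and curvatures $\phi_x,\chi_x$ at the various positions $x$. The action $E$ of a path is given by

$$E(\{\phi_x,\chi_x\})=\frac{1}{2\kappa}\int_0^L dx\,(\dot\phi_x-\chi_x)^2+\frac{\lambda\xi^2}{4}\int_0^L dx\left(\dot\chi_x+\frac{\chi_x}{\lambda}\right)^2+\frac{\xi^2\chi_0^2}{2}.$$ [14]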

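The model Eq. 14 is a Gaussian process. Since symmetry dictates that $\langle\phi_L\rangle=\langle\phi_0\rangle=0$, it follows that $P(\phi_L,\phi_0)$ is defined entirely by the variance $\sigma^2(L)\equiv\langle\phi_L^2\rangle=\langle\phi_0^2\rangle$ and by the correlation $\rho(L)\equiv\langle\phi_0\phi_L\rangle/\sigma_L^2$.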

We present the full calculation of $\sigma(L)$ and $\rho(L)$ in SI Appendix. In summary, the first integral over $\chi_x$ can be performed using the Gaussian integral formula and leads to an effective action in $\phi_x$. The Euler–Lagrange equation of this effective action then yields extremal paths (“splines”) that minimize the effective action (SI Appendix has details). The splines have the form

$$y_x=a_0(L-x)+a_Lx+c_0\frac{(L-x)^3}{6}+c_L\frac{x^3}{6}+\lambda^2d_0e^{-(L-x)/\lambda}+\lambda^2d_Le^{-x/\lambda},$$ [15]

where the constants in the above equation are set so that $y_0=y_L=0$ and can be expressed in terms of $\phi_0,\phi_L$. SI Appendix, Fig. S3B shows the splines between contact points spaced at increasing intervals. Plugging Eq. 15 into the effective action, we obtain

$$\sigma^2(L)=\frac{\kappa\Omega^2\lambda^2}{L}\,\frac{b_1+b_2}{2},\qquad\rho(L)=\frac{b_2-b_1}{b_2+b_1},$$ [16]

where the two functions $b_1$ and $b_2$ are

$$b_1=\frac{L^2}{2}-\frac{\lambda L}{2\Omega^2V^2}\left(1-e^{-L/\lambda}\right),\qquad b_2=\frac{L^2}{6}-\frac{\lambda(L+2\lambda)}{2\Omega^2V^2L}\left[(L-2\lambda)+e^{-L/\lambda}(L+2\lambda)\right],$$ [17]

and $V=\sqrt{\kappa\xi^2\lambda/2}$, $\Omega^2=\lambda^{-2}+V^{-2}$. The variance $\sigma(L)$ and the correlation $\rho(L)$ are plotted in SI Appendix, Fig. S3 C and D. Three distinct regimes are apparent. For $L/\lambda\ll1$, we can approximate $\sigma^2(L)\approx\kappa L/3+L^2/4\xi^2$, which reflects diffusive ($\propto L$) and curvature-dominated ($\propto L^2$) scalings. When diffusion dominates, $\rho(L)=-1/2$, whereas $\rho(L)\approx-1$ when curvature dominates. The perfect anticorrelation in the curvature-dominated regime is intuitive as the line joining the two points of contact can be viewed as the chord of a circle with radius $\xi$, whereas $\sigma(L)=L/2\xi$ is the angular deviation of the trail around this chord. When $L/\lambda\gg1$, the heading is randomized over many correlation lengths, and the diffusive scaling $\sigma^2(L)\approx2\lambda L/3\xi^2$ is recovered.
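The three regimes can be checked numerically with a short sketch. It assumes the reading of Eqs. 16 and 17 in which $V^2=\kappa\xi^2\lambda/2$, $\Omega^2=\lambda^{-2}+V^{-2}$, $b_1=L^2/2-\lambda L(1-e^{-L/\lambda})/2\Omega^2V^2$, and $b_2=L^2/6-\lambda(L+2\lambda)[(L-2\lambda)+e^{-L/\lambda}(L+2\lambda)]/2\Omega^2V^2L$; the function name and the parameter values are illustrative assumptions.

```python
import math

def gwlc_moments(L, kappa, lam, xi):
    """Variance sigma^2(L) and correlation rho(L) of the end-point headings,
    under the assumed reading of Eqs. 16 and 17 stated above."""
    V2 = kappa * xi ** 2 * lam / 2.0
    Omega2 = 1.0 / lam ** 2 + 1.0 / V2
    c = lam / (2.0 * Omega2 * V2)
    decay = -math.expm1(-L / lam)  # 1 - exp(-L/lam), accurate for small L
    b1 = L ** 2 / 2.0 - c * L * decay
    bracket = (L - 2.0 * lam) + math.exp(-L / lam) * (L + 2.0 * lam)
    b2 = L ** 2 / 6.0 - c * (L + 2.0 * lam) / L * bracket
    sigma2 = kappa * Omega2 * lam ** 2 / L * (b1 + b2) / 2.0
    rho = (b2 - b1) / (b2 + b1)
    return sigma2, rho

def short_range_estimate(L, kappa, xi):
    """Small-L approximation quoted in the text: kappa*L/3 + L^2/(4 xi^2)."""
    return kappa * L / 3.0 + L ** 2 / (4.0 * xi ** 2)

# Illustrative parameter sets (assumptions), one per regime.
diffusive = gwlc_moments(L=0.01, kappa=1.0, lam=1.0, xi=1000.0)
curved = gwlc_moments(L=0.01, kappa=1e-6, lam=1.0, xi=1.0)
long_range = gwlc_moments(L=1000.0, kappa=1e-3, lam=1.0, xi=1.0)
print("diffusive:", diffusive)
print("curvature-dominated:", curved)
print("L >> lambda:", long_range)
```

For very small $L/\lambda$ the terms in $b_2$ nearly cancel, so a production implementation would likely switch to a series expansion there; the values used above stay well within double precision.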

While $P(\phi_L|\phi_0)$ (the interpolation model) is required to integrate past information, search strategies also require the forward propagator $P(\phi_L,y_L|\phi_0)$, which keeps track of trail headings and locations while the agent searches for the trail. The methods described above can be used again (SI Appendix has details) to yield extremal paths as in Eq. 15 and the three quantities $\langle\phi_L^2\rangle_{\rm fwd}$, $\langle y_L^2\rangle_{\rm fwd}$, and $\langle\phi_Ly_L\rangle_{\rm fwd}$. These quantities fully describe $P(\phi_L,y_L|\phi_0)$. We validated our interpolation model Eq. 16 and the forward model using numerical simulations (SI Appendix, Fig. S3 E and F).

The Nondetection Probability during Surge and Cast

We introduce a sector search strategy that allows us to quantify the nondetection probability taking into account the full dynamics of the trails and yields intuition on the factors that contribute toward losing the trail entirely. We suppose the radial speed $v$ is fixed and the tangential speed $u\gg a\omega$. In other words, the agent casts rapidly within a conical envelope of semiaperture angle $\sigma\Theta_0$, where $\sigma$ is the prior uncertainty of the trail’s heading and $\Theta_0\gtrsim1$ (SI Appendix, Fig. S4A). To simplify the presentation, we assume curvature-dominated trails [i.e., $\sigma(L)=L/2\xi$], although the arguments below are general.

The rapid casting limit $u/a\omega\gg1$ allows us to compute the nondetection probability at distance $r$ from the most recent contact point defined by Eq. 3:

$$\Gamma_C(r)=\left\langle e^{-\frac{\omega}{v}\int_0^r dx\,\gamma(x,y(x))}\right\rangle_{y(x)},$$ [18]

where the expectation is over the full ensemble of trails $\{y(x)\}$ that pass through past detection points. Here, we provide intuition for the detection rate $\omega\gamma(x,y(x))$ and refer to SI Appendix for full details. For small $x$ (i.e., $x\ll a/2\sigma\Theta_0$), $\gamma(x,y(x))=1$ for most trails as the sensor size $a$ spans the entire casting envelope $2\sigma x\Theta_0$. In the casting regime ($x\gg a/2\sigma\Theta_0$), since the time spent on the trail in a single cast is $a/u$, the probability of nondetection per crossing is $e^{-a\omega/u}$. After $n$ crossings, the nondetection probability is then $e^{-na\omega/u}$. As the agent moves a distance $dx$ in the radial direction, it crosses the trail $n=u\,dx/2v\sigma x\Theta_0$ times [i.e., $\gamma(x,y(x))=a/2\sigma x\Theta_0$], which is independent of $u$. From Eq. 18, these relations yield an exponential $\Gamma_C(r)$ during the initial surge followed by a power law, $r^{-\beta}$, with exponent $\beta\equiv a\omega/2\sigma v\Theta_0$. This heuristic argument aligns well with the results of numerical simulations in SI Appendix, Fig. S4B.
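The heuristic above can be turned into a few lines of code by replacing the ensemble average in Eq. 18 with the single typical detection fraction $\gamma(x)=\min(1,\,a/2\sigma x\Theta_0)$. This crude interpolation between the two regimes is an assumption made for illustration; it ignores trail-to-trail fluctuations and trails that escape the envelope. The function names and parameter values are also illustrative.

```python
import math

def detection_fraction(x, a, sigma, theta0):
    """Heuristic gamma(x): 1 during the surge, a / (2 sigma x Theta0) while casting."""
    return min(1.0, a / (2.0 * sigma * x * theta0))

def nondetection(r, a, sigma, theta0, omega, v, n=20000):
    """exp(-(omega/v) * integral_0^r gamma(x) dx) by midpoint quadrature."""
    h = r / n
    integral = h * sum(detection_fraction((k + 0.5) * h, a, sigma, theta0)
                       for k in range(n))
    return math.exp(-(omega / v) * integral)

# Illustrative values (assumptions): crossover at r0 = a / (2 sigma Theta0) = 0.1.
a, sigma, theta0, omega, v = 0.1, 0.25, 2.0, 15.0, 1.0
beta = a * omega / (2.0 * sigma * v * theta0)

surge_value = nondetection(0.05, a, sigma, theta0, omega, v)
tail_slope = ((math.log(nondetection(10.0, a, sigma, theta0, omega, v))
               - math.log(nondetection(1.0, a, sigma, theta0, omega, v)))
              / math.log(10.0))
print("beta:", beta, "tail slope:", tail_slope, "surge value:", surge_value)
```

Within the surge the decay is exponential in $r$, and beyond the crossover the log-log slope settles at $-\beta$, independent of the tangential speed $u$.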

The spline formulation of the GWLC yields a geometric picture of the dynamics. Intuitively, the nondetection probability (and thus, the posterior probability from Eq. 1) is large if the trail has a large probability of escaping the casting envelope before it is found by the tracker. In order to “escape” the casting envelope, the trails that are initially well within the casting envelope have to bend significantly. This bending incurs a cost in the action Eq. 14, which reduces with increasing $r$. The nondetection probability conditional on initial trail heading, $\Gamma_C(r|\phi)$, flattens out (i.e., the detection rate vanishes) when a significant fraction of trails escapes the casting envelope, including trails with initial heading $\phi=0$ (Fig. 3C).

This geometric picture yields a length scale $L_{\rm in}$, which is the distance at which the trails initially along the most likely heading $\phi=0$ escape the casting envelope. For curvature-dominated trails, escaping trails should deviate by an angle $\sigma\Theta_0\approx L_{\rm in}/2\xi$, which gives $L_{\rm in}\approx2\sigma\xi\Theta_0$. The probability of losing the trail can then be estimated as $\epsilon_{\rm in}=\Gamma_C(L_{\rm in})$. An additional contribution to the flattening of $\Gamma_C(r)$ comes from the trails that are headed outside of the cone $|\phi|>\Theta_0$ even before reaching $L_{\rm in}$. The probability $\Gamma_C(r)$ would then flatten out at a different length scale $L_{\rm out}$, where $\Gamma_C(L_{\rm out})=\epsilon_{\rm out}$. The relative contributions of the two mechanisms are discussed in detail in SI Appendix, and they are validated by using numerical simulations as shown in SI Appendix, Figs. S4C and S5B.

Due to the power law tail for $\Gamma_C(r)$ with exponent $\beta$, the mean distance to find the trail, $\bar{r}$, is determined by the upper or lower cutoff for $\beta<1$ or $\beta>1$, respectively. Since the upper cutoff is much larger than the lower cutoff, it is more convenient for the tracker to use $\beta>1$. By using the self-consistency condition $\sigma=\bar{r}/2\xi$ in the curvature-dominated regime and the relation $\bar{r}\approx v\omega^{-1}\left(1+e^{-\beta}/(\beta-1)\right)$, we obtain for the maximum speed

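$$v^*=\omega\sqrt{a\xi/\Theta_0}\;\max_\beta\left(\beta+\frac{\beta e^{-\beta}}{\beta-1}\right)^{-1}\approx0.47\,\omega\sqrt{a\xi/\Theta_0},$$ [19]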

where the maximum occurs at $\beta^*=1.63$ (SI Appendix, Fig. S5C). The mean number of samples to find the trail at this optimum is 1.3, indicating that the optimal strategy is for the agent to move slowly enough that it typically finds the trail within one or two samples.
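The numerical values quoted with Eq. 19 can be reproduced with a simple grid search over $\beta>1$. The sketch below maximizes $(\beta+\beta e^{-\beta}/(\beta-1))^{-1}$ and evaluates the mean number of samples, taken here to be $\omega\bar{r}/v=1+e^{-\beta}/(\beta-1)$ (an interpretation assumed from the relation for $\bar{r}$). The function names and grid bounds are illustrative assumptions.

```python
import math

def speed_factor(beta):
    """Dimensionless factor maximized over beta in Eq. 19."""
    return 1.0 / (beta + beta * math.exp(-beta) / (beta - 1.0))

def mean_samples(beta):
    """omega * r_bar / v, from r_bar ~ (v / omega) * (1 + exp(-beta) / (beta - 1))."""
    return 1.0 + math.exp(-beta) / (beta - 1.0)

def grid_argmax(f, lo, hi, n):
    """Return (x, f(x)) at the best of n + 1 equally spaced grid points."""
    best_x, best_val = lo, f(lo)
    for k in range(1, n + 1):
        x = lo + (hi - lo) * k / n
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Search beta in (1, 5] with a grid spacing of 0.001.
beta_star, factor = grid_argmax(speed_factor, 1.01, 5.0, 3990)
print("beta*:", beta_star)
print("speed factor:", factor)
print("mean samples:", mean_samples(beta_star))
```

The factor vanishes both as $\beta\to1^+$ (the mean distance diverges) and as $\beta\to\infty$ (the agent moves very slowly), so the maximum on this interval is interior.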

Intuition for the above results holds quite generally, including for other statistical ensembles of trails and for nonconical casting envelopes. The latter is relevant because we expect that a slowdown in the radial direction and a widening casting envelope may increase the likelihood of finding the trail. Relaxing the constraints of fixed radial speed and conical envelopes constitutes the aim of the next section.

Bellman Optimization of the Casting Strategy

Our final step is to formulate the problem of optimizing the casting strategy. The geometry of the search is shown in SI Appendix, Fig. S6A. To simplify, we parametrize the path by the set of turning points $\{r_i,\theta_i\}$. We assume a fixed speed $v$ (note that the average radial speed depends on the strategy), which is set later based on the probability of losing the trail. As discussed in the text, we optimize the average tracking speed, $L/T$, after constraining the IDI, $L$, using a Lagrange multiplier $\Lambda$ (Eq. 4). This optimization can be recast as a Bellman-type dynamic programming problem by breaking the problem up into discrete steps, each corresponding to a cast from $\{r_i,\theta_i\}\to\{r_{i+1},\theta_{i+1}\}$ (denoted $\{r,\theta\}\to\{r',\theta'\}$ below for conciseness):

$$V(r,t,P(\phi))=\max_{r',\theta'}\left[\left(\frac{r'}{t'}-\Lambda r'\right)\left(1-\bar\Gamma_{r',\theta'}\right)+\bar\Gamma_{r',\theta'}\,V\!\left(r',t',P_{r',\theta'}(\phi)\right)\right],$$ [20]

where the required reward function in Eq. 4 is $V_{v,\Lambda}=V(0,0,P_0(\phi))$ with the Gaussian normal prior $P_0(\phi)=N(0,\sigma^2)$. The first term in the square bracket returns the reward if the trail is found (weighted by the detection probability $1-\bar\Gamma_{r',\theta'}$), while the second term advances to the next cast if detection failed. The Bellman equation thus relates $V$ before and after a cast by updating the current state variables, $r,t,P(\phi)$. The time elapsed is updated as $t\to t+(r'-r+2r'|\theta'|)/v$, where we approximate $|\theta'-\theta|\approx2|\theta'|$ for simplicity. The prior $P(\phi)$ is updated using Bayes’ rule $P_{r',\theta'}(\phi)=\bar\Gamma_{r',\theta'}(\phi)P(\phi)/\bar\Gamma_{r',\theta'}$, where $\bar\Gamma_{r',\theta'}(\phi)$ is the nondetection probability given the initial trail heading $\phi$ and $\bar\Gamma_{r',\theta'}=\langle\bar\Gamma_{r',\theta'}(\phi)\rangle_\phi$ is the normalization.

It remains to calculate $\bar\Gamma_{r,\theta}(\phi)$, which depends on the forward model of the trail. We consider generally that the trail’s azimuthal position expands as $\sigma_{\rm fwd}(r)$ [i.e., a trail initially headed along $\phi$ is located at the azimuthal position, $\phi+\Delta\phi$, where $\Delta\phi\sim N(0,\sigma_{\rm fwd}^2(r))$]. For the GWLC model, we show in SI Appendix that $\sigma_{\rm fwd}(r)=\sigma(r)$, where $\sigma(r)$ is given in Eq. 16. Since the time spent on the trail within the casting envelope in a single cast is $a/v$, we have $\bar\Gamma_{r,\theta}(\phi)=e^{-a\omega/v}$ if $|\phi+\Delta\phi|<|\theta|$ and equal to unity otherwise. Since $\Delta\phi$ is normally distributed, we have $\langle\mathbb{1}(|\phi+\Delta\phi|<|\theta|)\rangle_{\Delta\phi}=\Phi\!\left(\frac{|\theta|-\phi}{\sigma_{\rm fwd}(r)}\right)-\Phi\!\left(\frac{-|\theta|-\phi}{\sigma_{\rm fwd}(r)}\right)$, where $\Phi$ is the normal cumulative distribution function. We finally approximate $\bar\Gamma_{r,\theta}(\phi)\approx e^{-\frac{a\omega}{v}\langle\mathbb{1}(|\phi+\Delta\phi|<|\theta|)\rangle_{\Delta\phi}}$ and compute the expectation over $\phi$ in $\bar\Gamma_{r,\theta}$ numerically. The previous approximation greatly simplifies the optimization as $P(\phi)$ is then a sufficient statistic for past measurements (thus allowing the decomposition in Eq. 20) and yet, captures the effect of the trail’s widening trajectory on the optimized casting strategy.
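A single cast of this approximate model can be sketched on a grid of headings: compute the nondetection probability for each heading from the normal CDF, average it over the prior, and apply Bayes’ rule. The grid resolution, the function names, and the values of `sigma`, `sigma_fwd`, `a`, `omega`, `v`, and `theta` below are illustrative assumptions; in the paper $\sigma_{\rm fwd}(r)$ comes from the GWLC forward model rather than being a fixed number.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_inside(phi, theta, sigma_fwd):
    """Probability that phi + dphi lies within (-|theta|, |theta|)."""
    half = abs(theta)
    return (normal_cdf((half - phi) / sigma_fwd)
            - normal_cdf((-half - phi) / sigma_fwd))

def nondetection_given_heading(phi, theta, sigma_fwd, a, omega, v):
    """Approximate nondetection probability for initial heading phi."""
    return math.exp(-(a * omega / v) * prob_inside(phi, theta, sigma_fwd))

def cast_update(phis, prior, theta, sigma_fwd, a, omega, v):
    """Return (Gamma_bar, posterior) after an unsuccessful cast to theta."""
    h = phis[1] - phis[0]
    weighted = [nondetection_given_heading(p, theta, sigma_fwd, a, omega, v) * w
                for p, w in zip(phis, prior)]
    gamma_bar = sum(weighted) * h
    return gamma_bar, [w / gamma_bar for w in weighted]

# Gaussian prior N(0, sigma^2) on a uniform grid spanning +/- 5 sigma.
sigma, n = 0.3, 1001
phis = [-5.0 * sigma + 10.0 * sigma * k / (n - 1) for k in range(n)]
h = phis[1] - phis[0]
raw = [math.exp(-0.5 * (p / sigma) ** 2) for p in phis]
norm = sum(raw) * h
prior = [x / norm for x in raw]

gamma_bar, posterior = cast_update(phis, prior, theta=0.3, sigma_fwd=0.1,
                                   a=0.05, omega=20.0, v=1.0)
print("Gamma_bar:", gamma_bar)
```

After a failed cast, probability mass shifts from headings inside the swept aperture to headings outside it, which is what drives the widening of subsequent casts in the optimized strategy.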

At each casting step, we optimize (using standard black box optimization methods from the SciPy library in Python) for $\Delta r=r'-r>0$ and $\theta'$ by expanding Eq. 20 two steps further into the future. The optimization then involves six variables, the immediate pair $\Delta r,\theta'$ and the subsequent two pairs. The optimized immediate pair is then used for updating as detailed above, and the process is repeated. Optimizing more than two steps did not yield different results as the $\bar\Gamma_{r',\theta'}$ factors introduce an effective planning horizon, suppressing contributions from future rewards beyond two steps. For fewer than two steps, optimization is “greedy,” and the corresponding objective was found to have a qualitatively different, shallow landscape around the optimum. After completing the optimization procedure, the constraint of fixed $L$ is imposed by varying $v$ and the Lagrange multiplier $\Lambda$ (SI Appendix, Fig. S6B).

Supplementary Material

Supplementary File

Acknowledgements

We thank Venkatesh Murthy for stimulating discussions throughout our work. This research was supported in part by NSF Grant PHY-1748958, NIH Grant R25GM067110, and Gordon and Betty Moore Foundation Grant 2919.02. G.R. was partially supported by NSF–Simons Center for Mathematical & Statistical Analysis of Biology at Harvard Award 1764269 and the Harvard Quantitative Biology Initiative. B.I.S. was supported by NSF Grant PHY-1707973.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

See online for related content such as Commentaries.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2107431118/-/DCSupplemental.

Data Availability

There are no data underlying this work.

References

  • 1. Draft R. W., McGill M. R., Kapoor V., Murthy V. N., Carpenter ants use diverse antennae sampling strategies to track odor trails. J. Exp. Biol. 221, jeb185124 (2018).
  • 2. Thesen A., Steen J. B., Døving K. B., Behaviour of dogs during olfactory tracking. J. Exp. Biol. 180, 247–251 (1993).
  • 3. Hepper P. G., Wells D. L., How many footsteps do dogs need to determine the direction of an odour trail? Chem. Senses 30, 291–298 (2005).
  • 4. Porter J., et al., Mechanisms of scent-tracking in humans. Nat. Neurosci. 10, 27–29 (2007).
  • 5. Khan A. G., Sarangi M., Bhalla U. S., Rats track odour trails accurately using a multi-layered strategy with near-optimal sampling. Nat. Commun. 3, 703 (2012).
  • 6. Bhattacharyya U., Bhalla U. S., Robust and rapid air-borne odor tracking without casting. eNeuro 2, ENEURO.0102-15.2015 (2015).
  • 7. Berg H. C., E. coli in Motion (Springer Science & Business Media, 2008).
  • 8. Riman N., Victor J. D., Boie S. D., Ermentrout B., The dynamics of bilateral olfactory search and navigation. SIAM Rev. 63, 100–120 (2021).
  • 9. David C., Kennedy J., Ludlow A., Finding of a sex pheromone source by gypsy moths released in the field. Nature 303, 804–806 (1983).
  • 10. Murlis J., Elkinton J. S., Carde R. T., Odor plumes and how insects use them. Annu. Rev. Entomol. 37, 505–532 (1992).
  • 11. Mafra-Neto A., Cardé R. T., Fine-scale structure of pheromone plumes modulates upwind orientation of flying moths. Nature 369, 142–144 (1994).
  • 12. Vergassola M., Villermaux E., Shraiman B. I., ‘Infotaxis’ as a strategy for searching without gradients. Nature 445, 406–409 (2007).
  • 13. Celani A., Villermaux E., Vergassola M., Odor landscapes in turbulent environments. Phys. Rev. X 4, 041015 (2014).
  • 14. Charnov E. L., Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9, 129–136 (1976).
  • 15. Stephens D. W., Krebs J. R., Foraging Theory (Princeton University Press, 1986), vol. 1.
  • 16. Doi M., Edwards S. F., The Theory of Polymer Dynamics (Oxford University Press, 1988), vol. 73.
  • 17. Marko J. F., Siggia E. D., Stretching DNA. Macromolecules 28, 8759–8770 (1995).
  • 18. Field D. J., Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379–2394 (1987).
  • 19. Ruderman D. L., Bialek W., Statistics of natural images: Scaling in the woods. Phys. Rev. Lett. 73, 814–817 (1994).
  • 20. Simoncelli E. P., Olshausen B. A., Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
  • 21. Wark B., Lundstrom B. N., Fairhall A., Sensory adaptation. Curr. Opin. Neurobiol. 17, 423–429 (2007).
  • 22. Mathis A., et al., DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
  • 23. Sutton R. S., Barto A. G., Reinforcement Learning: An Introduction (MIT Press, 2018).
  • 24. Reddy G., Wong-Ng J., Celani A., Sejnowski T. J., Vergassola M., Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018).

