PLOS Computational Biology. 2021 May 20;17(5):e1008965. doi: 10.1371/journal.pcbi.1008965

Maximally efficient prediction in the early fly visual system may support evasive flight maneuvers

Siwei Wang 1,2, Idan Segev 3,4, Alexander Borst 5, Stephanie Palmer 1,2,*
Editor: Lyle J Graham
PMCID: PMC8136689  PMID: 34014926

Abstract

The visual system must make predictions to compensate for inherent delays in its processing. Yet little is known, mechanistically, about how prediction aids natural behaviors. Here, we show that despite a 20-30ms intrinsic processing delay, the vertical motion sensitive (VS) network of the blowfly achieves maximally efficient prediction. This prediction enables the fly to fine-tune its complex, yet brief, evasive flight maneuvers according to its initial ego-rotation at the time of detection of the visual threat. Combining a rich database of behavioral recordings with detailed compartmental modeling of the VS network, we further show that the VS network has axonal gap junctions that are critical for optimal prediction. During evasive maneuvers, a VS subpopulation that directly innervates the neck motor center can convey predictive information about the fly’s future ego-rotation, potentially crucial for ongoing flight control. These results suggest a novel sensory-motor pathway that links sensory prediction to behavior.

Author summary

Survival-critical behaviors shape neural circuits to translate sensory information into strikingly fast predictions, e.g. in escaping from a predator faster than the system’s processing delay. We show that the fly visual system implements fast and accurate prediction of its visual experience. This provides crucial information for directing fast evasive maneuvers that unfold over just 40ms. Our work shows how this fast prediction is implemented, mechanistically, and suggests the existence of a novel sensory-motor pathway from the fly visual system to a wing steering motor neuron. Echoing and amplifying previous work in the retina, our work hypothesizes that the efficient encoding of predictive information is a universal design principle supporting fast, natural behaviors.

Introduction

The goal of sensory processing is to guide behavior. Whether we are trying to catch prey, escape a predator, or find food in a complex environment, the data our senses collect are only useful to the extent that they can improve the success of our future actions. However, most previous work establishing that early sensory processing might be optimal [1, 2] applied the notion of optimality only to the physical limits of sensing or to the statistics of the external input alone. To move beyond this picture, one must consider what sensory information is used for. Here the fly is an ideal testbed: both precise measurements of its behavior and a detailed mechanistic understanding of its underlying sensory processing are available, so one can investigate the behavioral constraints that arise from key survival strategies and how these constraints might sculpt early sensory circuits.

Here we focus on how targeted sensory processing in the fly may support one key survival-critical behavior: in-flight evasive maneuvers. To maximize the animal's survival, escapes generally take place on strikingly fast timescales [3], usually comparable to the timescale of the inherent sensory processing delay. For example, Drosophila can complete its evasive maneuver within a mere 40 ms after the detection of a purely visual threat. This is on the same timescale as its visual processing delay, which is about 30 ms [4, 5]. Because of this, previous work has suggested that escape behaviors are largely pre-planned, i.e. that they unfold as a "motor tape". However, while visual information may not be updated during the course of the escape maneuver, it is still possible that the initial visual information informs how the escape unfolds. These in-flight evasive maneuvers are banked turns: an initial rotation immediately followed by active damping of that turn to start a counter-rotation in the opposite direction. In addition, the maneuvers themselves require active and ongoing control: i) The initial rotation is not stereotyped (i.e. not purely 90° or about predetermined body axes, as saccadic flight turns are [6]); it must be tuned to both the position of the looming threat and the animal's heading at the moment of detection. ii) The subsequent counter-rotation determines how successful the animal is at turning away from the threat by changing its prior flight trajectory; it too must be fine-tuned in an ongoing manner rather than being a simple correction of the initial rotation. iii) Both the initial rotation and the follow-up counter-rotation (and their combined trajectories) must be variable, even for visual threats appearing at the same position, to avoid the predator's trap (e.g. a tentacled snake can exploit its prey's predictable, reflexive C-start [7] and thus foil the escape).
Previous experiments identified that the mechanosensory circuit, i.e. the halteres, can perform such motor control for voluntary saccadic turns. Thus, it was hypothesized that the halteres are also responsible for the motor control during the escape.

However, the banked turns of evasive maneuvers are five times faster than voluntary saccadic turns, and the halteres alone do not have sufficient dynamic range to sense these fast rotations. Halteres are known to be gyroscopic sensors [8]; they help a fly keep its aerodynamic balance [9]. Like the vestibular system, they respond to short-timescale rotational perturbations (e.g. air anisotropies) and experience Coriolis forces during a fly's body rotation. Previous work showed that the halteres can sense ego-rotation with angular velocities of up to 2500°/s [10, 11]. Because voluntary saccadic turns, i.e. fixed, stereotyped rotations, have angular velocities of only around 1000°/s, most previous studies were able to use these turns to show how the halteres control behavior. This dynamic range, however, is dwarfed by the angular velocities achieved during evasive maneuvers, which can be as high as 5300°/s. Alternatively, descending visual inputs can shift the dynamic range of the halteres, allowing them to engage in active control when encountering the high angular velocities of evasive maneuvers. This is the so-called control-loop hypothesis [12], which was recently validated experimentally [13]. Such vision-mediated control has also been observed in other behaviors [14-16].

The descending visual inputs useful for haltere control are the global motion responses generated by the large lobula plate tangential cells (LPTCs) in the fly visual system. The fly visual system is organized in four largely feedforward layers: lamina, medulla, lobula, and lobula plate. Located in the 4th neuropil, these LPTCs have a roughly 30 ms processing lag across many dipterans [4, 5, 17]. Because of this processing lag, it is unlikely that evasive maneuvers can access visual information through feedback during the escape. Instead, we propose that the visual information about ego-motion at the time of threat detection may play a new feedforward role. Said another way, by using its own past visual input from before the evasive maneuver to predict its future visual experience during that maneuver, a fly bypasses this processing lag and still uses visual inputs for active control. This so-called "bottom-up" prediction exploits the temporal correlations between past and future visual stimuli (due to the stereotypy of the escape behavior) during evasive maneuvers. Such bottom-up prediction exists in the vertebrate retina [18, 19], where it ensures fluid interaction with the external environment. It has also been shown to be important in the formation of long-term memory [20]. In our case, the escape trajectory depends on the threat angle relative to the fly's heading [21]. Where and how the escape maneuver begins constrains how it will unfold, giving the visual system ample predictive power with which to feed forward into active flight control. We show how this bottom-up prediction provides information about the future sensory input, subverting delays in the visual input stream. Because blowflies are known to execute higher velocity saccadic turns than Drosophila, this bottom-up prediction should be even more important for the blowfly's shorter and faster evasive maneuvers.

The neural architecture of insect brains is highly conserved. It is hypothesized that the neural circuits and escape behaviors co-evolved, as early as flight itself [22]. Many modern arthropod species inherited these core sensory-behavioral modules. Both the blowfly and Drosophila use banked turns to change their heading direction [6, 23, 24]. Although blowflies in general execute higher velocity saccadic turns and have higher acuity in their compound eyes than Drosophila, both animals share similar wingbeat frequencies [25, 26]. This suggests similar time courses in their escape trajectories. Because of the high angular velocity, banked turns during evasive maneuvers are mostly composed of pitch/roll combinations. These rotations generate rotational flow fields processed mainly by the vertical system (VS) network [27-29]. The VS cells in both the blowfly and Drosophila have analogous electrotonic structures [30] despite their size difference (a blowfly is roughly four times bigger than a Drosophila). Similarly, their mechanosensory and motor systems are scaled according to this size difference [13, 25, 31-36]. Because the only precise measurements of the fly's evasive maneuver are available in Drosophila [21], the only experimentally validated neural circuit that processes visual input during these maneuvers comes from the blowfly [37-39], and the two species are so similar overall [40], we investigate how the blowfly's motion sensing circuit extracts behaviorally relevant information based on behavioral measurements from Drosophila.

Although both local motion detection in the retina [41, 42] and global motion processing in the VS network have long low-pass time constants of 150-200 ms to achieve their optimal temporal frequency tuning, these considerations apply only in the steady state. Previous experiments [43] and modeling [44] showed that single VS cells elicit transient detector responses to short, abrupt maneuvers on the timescale of milliseconds. This makes the VS network suitable for evasive maneuvers. In addition, recent theoretical work [45, 46] showed that the population coding scheme in the VS network efficiently encodes constant-speed rotational motion using the transient response only, i.e. the output of the initial 10 ms after encountering a motion stimulus. This effect depended on the positioning of gap junctions between the axons of neighboring VS cells, and has been experimentally validated [37, 47]. This work suggests that the VS network can encode motion information on a timescale compatible with evasive maneuvers. However, without determining whether the output of the encoding scheme is behaviorally relevant, we cannot determine whether such processing is useful for downstream motor control. In naturalistic behaviors, ego-motion is highly dynamic. Encoding a past ego-motion is only useful if it is informative about the animal's future experience. Merely maximizing information about the instantaneous motion is not the challenge the animal faces during evasive maneuvers; the VS network must extract predictive information about future motion. To identify whether the specific wiring architecture of the VS network supports evasive maneuvers, we ask whether it encodes predictive information about future ego-motion within its own transient responses.

To explore this hypothesis, ideally one would record the activity of the VS cells in behaving animals. However, evasive flight maneuvers require untethered flight, which makes population recordings from the VS network prohibitive. Furthermore, it is not feasible to block the VS network in flying animals, because its cells are essential for optomotor responses [48, 49]. Therefore, we use numerical simulations of an in-silico, biophysically realistic compartmental reconstruction of the VS network to investigate how the VS network might encode this kind of fast, predictive information. This compartmental reconstruction is highly experimentally constrained [37, 38]: all single-cell [50] and neuronal circuitry parameters [37, 51, 52] are set such that the reconstruction behaves as the real VS network does when presented with the same current injection [37, 38, 51, 52]. Our computational approach (using a model when large-scale recordings from a complete circuit during natural behavior are not possible) is similar to how previous work on electric fish [53] used a synthetic population model of the electrosensory lobe to show how accurate cancellation of self-generated electric signals is achieved.

We first show that the VS network uses axonal gap junctions to output substantial predictive information for evasive maneuvers. Next, we show that this predictive information is near-optimal throughout the duration of evasive maneuvers: it can be used to prospect forward in time with equal fidelity throughout the escape. We further show that the output circuitry of the VS network (the VS 5-6-7 triplet) to the neck motor center retains all available information about the future stimulus, i.e. compressing the readout does not sacrifice how much a downstream pathway knows about the ongoing evasive maneuver. Finally, we show that this encoding of predictive information is particularly suitable for fine-tuning future motion directions. These results suggest possible novel sensory-motor pathways: either a direct connection from the lobula plate descending neurons to the wing steering muscles [54-56], or an indirect connection from the visual system through the halteres to the wing steering muscles, as proposed in the control-loop hypothesis [12, 13].

Results

Visual prediction provides substantial information about motion without delay

We show in Fig 1 that visual prediction contains substantial information about future motion that may be important for controlling evasive flight maneuvers. We first use a schematic trace to illustrate the inputs and delays in the fly visual system (Fig 1A). Previous work showed that the haltere outputs reach the wing steering muscles after a 15-20 ms delay [21, 57], towards the second half of the maneuver, right before the active counter-banked turn starts. Visual feedback would arrive too late, coming online only after 30 ms, long after the initial rotation is replaced by the counter-rotation through active control [57].

Fig 1. Predictive information is the dominant information source about visual inputs during evasive flight maneuvers.

Fig 1

(A) Upon emergence of a threat (shown as the red star), the dashed arrow represents the visual-motor delay of 60 ms from the onset of threat to the start of evasive maneuvers. After this sensory-motor delay, the position of the threat is known. The fly performs an evasive maneuver by changing its heading through a banked turn (arrows show a rotation at direction θ and its respective counter-rotation). During evasive maneuvers, visual predictions can provide motion information throughout the entire duration, i.e. without delay (shown as the yellow zone), whereas the haltere feedback is only available after 20 ms (shown as the green zone) and the visual feedback is only available after 30 ms (shown as the shaded zone). The arrow leading to the haltere system illustrates how visual information might regulate haltere activity (as recently shown in [13]): because of the 30 ms sensory processing lag, haltere activity must be regulated by visual prediction. (3D fly rendering courtesy of D. Allan Drummond.) (B) This histogram compares how much information the visual prediction (blue) encodes about ego-rotation (I(θ, V)) during evasive maneuvers with the corresponding entropy (S(θ), gray). We use the ego-rotation distribution at Δt = 10 ms into the evasive maneuver to compute this entropy. Its distribution is shown in S2(A) Fig. Note that the VS output encodes almost half of the entropy of a future ego-rotation. (C) The Mercator map of a randomly generated natural scene background. To generate this map, we first randomly generate a natural scene environment. We then generate a movie mimicking an evasive flight in the natural environment by rotating this natural scene environment according to the respective measured rotations. We project this movie onto a unit sphere that represents the fly's retina, see details in S1 Fig. There are 5,000 local motion detectors (LMD) on this unit sphere, as on the fly's retina.
The responses of these LMDs are then integrated to form the input current I to the VS network (shown as an arrow to D). (D) A biophysically detailed model of the VS network, based on known neural circuitry [50, 51]. Note that because the soma is decoupled in VS cells (connecting to the rest of the cell only via a soma fiber), we leave out the soma in this VS model. We highlight the outputs to the neck motor center here, the axonal voltages of the VS 5-6-7 triplet. This is the only known readout that directly connects to motor pathways. (E) A cartoon showing how the information bottleneck problem is set up for prediction in this system: using the general correlation between a past input (either the ego-rotation θ or the corresponding dendritic input I) and the future, the information bottleneck finds a compact representation V of the past input to the VS network (I_past) that retains the predictive components about the future (θ_future or I_future); the bottleneck defines how much information about the input is 'squeezed out' when V is generated.

To quantify how much visual prediction encodes about ego-rotation (θ) in the fly's future escape trajectories (Fig 1B), we define this ego-rotation-relevant predictive information in the output voltage of the fly VS network as I_future(θ, Δt) (Eq 1, abbreviated as I_future(θ)),

I_future(θ, Δt) = I(V_past; θ_future) = I(V_t; θ_{t+Δt}), (1)

where V_t is the output axonal voltage of the VS network at time t and Δt is the time interval between the past voltage and the future ego-rotation. Here, we use intervals of Δt = 10 ms, 20 ms, 30 ms, 40 ms to obtain the output of the VS network, because the maximum firing rate of the descending neuron connecting to the neck motor center is 100 Hz [38], which corresponds to an integration step of at least 10 ms (see Materials and methods). Throughout this paper, we represent future ego-rotations θ_{t+Δt} by their vector components (cos(θ_{t+Δt}), sin(θ_{t+Δt})). The cosine component corresponds to roll direction/magnitude and the sine component to pitch direction/magnitude. This vector lies within the fly's coronal plane, to which the VS neurons are selectively sensitive. We then estimate p(θ_{t+Δt}), the stimulus distribution, and p(θ_{t+Δt}|V_t), the probability of a future ego-rotation conditioned on the past output axonal voltage, to obtain I_future(θ, Δt) (see Materials and methods). Fig 1B shows that the predictive information I_future(θ, Δt) in the VS output voltages captures nearly 50% of the entropy of the future escape trajectory. This is because where and how the escape maneuver begins constrains how it will unfold, giving the visual system ample predictive power. This suggests that the predictive information encoded by the VS network is an important information source for evasive flight behaviors in the natural environment. It is also consistent with the haltere being a multisensory integration circuit: other sensory modalities, e.g. the antennae [58, 59], ocelli [60-62] or wing reflexes [63, 64], may provide additional motion information.
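Estimating a quantity like I(V_t; θ_{t+Δt}) requires approximating the joint distribution of past voltages and future ego-rotations from finite samples. As a minimal illustration (not the paper's actual estimation procedure, which is described in its Materials and methods), a binned plug-in estimator of mutual information between two scalar variables might look like:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in estimate of I(X; Y) in bits from paired scalar samples.

    Histograms the joint distribution, then applies
    I(X; Y) = sum_{x,y} p(x, y) * log2[ p(x, y) / (p(x) p(y)) ].
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Sanity check on synthetic data: a noiseless copy carries the full
# 2 bits of a uniform 4-letter variable; shuffling destroys it.
rng = np.random.default_rng(0)
x = rng.integers(0, 4, 100_000).astype(float)
print(mutual_information(x, x, bins=4))                  # close to 2 bits
print(mutual_information(x, rng.permutation(x), bins=4)) # close to 0 bits
```

In practice the Δt-dependence of I_future(θ, Δt) would be obtained by pairing V_t with θ_{t+Δt} at each lag; bin count and finite-sample bias corrections matter for quantitative claims.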

In the original paper describing the escape trajectories of Drosophila [21], the authors found that escape trajectories remain consistent despite different expansion velocities and expansion angles of the visual threat. Therefore, the specific visual features of the threat were not important. These authors also used a large range of sizes for the spatially uniform visual threat. A visual threat as large in diameter as their maximum expansion angle (107°) is similar to a uniform object found in a natural scene, like a large patch of sky or other large, low-contrast object. Thus, we hypothesize that using a generic natural scene background will not significantly change our results.

To evaluate I_future(θ, Δt), we need to approximate both the ego-rotation distributions and the respective output distributions of the VS network. To obtain these ego-rotation distributions, we generate 650,000 samples of visual field motion experience based on behavioral recordings published in [21]. Each visual experience corresponds to one instance of a particular evasive maneuver embedded in a randomly selected set of nature scene images. There are 10,000 samples for each of the 65 evasive flight trajectories with duration of 40 ms (out of the total 92 published trajectories in [21]). Fig 1C shows one exemplar visual experience of a particular evasive maneuver trajectory. Here, we obtain the "cage" of natural images for simulation by randomly selecting six images from the van Hateren dataset [65] and patching them onto the six faces of a cube. We then generate a movie mimicking an evasive flight in the natural environment by rotating this natural scene cage according to the measured rotations of an evasive flight trajectory (S1 Fig). Because previous work [39] showed that the VS network is not responsive to translational motion, we do not use the translation component of evasive maneuvers in this simulation (see also Materials and methods). We next project this movie onto a unit sphere that represents the fly's retina, following the protocol introduced in [39, 45]. There are 5,500 local motion detectors located on this unit sphere, whose outputs are local motion estimates based on pixel intensity differences between neighboring photoreceptors.
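The pipeline above (rotate a natural-scene "cage" around a spherical retina, then read out local motion) can be caricatured in a few lines. This sketch substitutes a smooth random intensity field for the van Hateren images and a bare temporal intensity difference for the correlation-based local motion detectors; the detector count, rotation axis, and all parameter values are illustrative only:

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotate by `angle` (radians) about unit vector `axis`."""
    ax = np.asarray(axis, float)
    ax = ax / np.linalg.norm(ax)
    K = np.array([[0.0, -ax[2], ax[1]],
                  [ax[2], 0.0, -ax[0]],
                  [-ax[1], ax[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# Hypothetical stand-in for the natural-scene cage: a smooth random
# intensity field that can be evaluated along any viewing direction.
rng = np.random.default_rng(1)
freqs = rng.normal(size=(8, 3))
phases = rng.uniform(0.0, 2.0 * np.pi, 8)

def scene(dirs):
    return np.cos(dirs @ freqs.T + phases).sum(axis=1)

# Photoreceptor directions on a unit sphere (a Fibonacci lattice as a
# stand-in for the paper's several-thousand-detector retina).
n = 500
i = np.arange(n)
polar = np.arccos(1.0 - 2.0 * (i + 0.5) / n)
azimuth = np.pi * (1.0 + 5.0**0.5) * i
dirs = np.stack([np.sin(polar) * np.cos(azimuth),
                 np.sin(polar) * np.sin(azimuth),
                 np.cos(polar)], axis=1)

# An ego-rotation (here 5300 deg/s sustained for 10 ms, about the roll
# axis) moves the scene across the retina; a crude local motion signal
# is the temporal intensity difference at each photoreceptor.
R = rotation_matrix([1.0, 0.0, 0.0], np.deg2rad(5300.0 * 0.010))
lmd_output = scene(dirs @ R.T) - scene(dirs)
```

A faithful simulation would instead sample the van Hateren images on the cube faces and run Reichardt-type correlations between neighboring photoreceptors, as in the protocol of [39, 45].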

To evaluate whether the VS network encodes I_future(θ, Δt) efficiently, we need to define a few additional information theoretic quantities within the VS network architecture (Fig 1D). The dendrites of all VS cells receive current inputs resulting from integrating the outputs of hundreds of upstream local motion detectors [51]. These outputs are integrated to become the input current I to the VS network (Fig 1D), which in turn outputs axonal voltages V for downstream processing. Because of the feedforward structure of the fly visual system (lamina, medulla, lobula, lobula plate), the VS network, located in the 4th neuropil, the lobula plate, does not have direct access to the visual input. Therefore, we define a proxy for correlations between past and future ego-rotations based on the past and future input currents, and probe how the VS network might use these correlations to encode predictive information about the future ego-rotation. In this encoding scheme, the generalized correlation between the past and the future of the input current, I_future^max (Eq 2), itself limits how much predictive information the VS network can encode:

I_future^max(I, Δt) = I(I_past; I_future) = I(I_t; I_{t+Δt}) (2)

This is the mutual information between the past and future input (the dendritic current) and defines the total interdependence of the current with itself in time.

Similar to I_future^max, we define the amount of information retained by the axonal voltage of the VS network about its future input as I_future(I, Δt) (Eq 3, abbreviated as I_future(I)).

I_future(I, Δt) = I(V_past; I_future) = I(V_t; I_{t+Δt}). (3)

This is the predictive information between the output axonal voltage and the future input current, which again we are using as a proxy for future ego-rotation.

All of the information encoded by the VS network comes from its sole input current, I_past. To quantify the efficiency of encoding, we need to quantify not only the prediction (i.e. I_future), but also how much the axonal voltages consume to obtain that predictive information, i.e. how much they encode about the input at the same past time. This is the mutual information quantity I_past (Eq 4),

Ipast=I(Vpast;Ipast)=I(Vt;It). (4)

Because brief and fast evasive maneuvers have a quickly varying ego-rotation, we identify prediction as the most important 'relevant' variable in the input to the VS system and set up the corresponding information bottleneck problem (Fig 1E, also see Materials and methods): this method finds the maximal amount of relevant predictive information that the VS network can encode about the future ego-rotation θ_future and its proxy, the input I_future, via its axonal voltages at a past time, V_past. The solutions to the information bottleneck problem, I_future^*(I, Δt) or I_future^*(θ, Δt), are subject to a constraint on the amount of information V_past has about the input in the past, I(V_t; I_t). The absolute maximum of the predictive information is set by the generalized correlation between the past and future of the ego-rotation trajectory itself, I(θ_past; θ_future) (and its encoding in the VS network input, I_future^max), which we take from real fly maneuvers.
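For discrete variables, an information bottleneck problem of this kind can be solved with the standard self-consistent iteration (alternating updates of p(t|x), p(t), and p(y|t)). The sketch below is a generic implementation of that iteration, not the paper's actual computation for V_past and θ_future; the joint distribution, cluster count, and trade-off parameter β are placeholders:

```python
import numpy as np

def information_bottleneck(pxy, n_clusters=4, beta=5.0, n_iter=100, seed=0):
    """Discrete information bottleneck via self-consistent iteration.

    Given a joint distribution p(x, y), returns a stochastic encoder
    p(t|x) that trades off compression I(T; X) against prediction
    I(T; Y) with trade-off parameter beta. Here X plays the role of the
    past input and Y the future ego-rotation.
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    px = pxy.sum(axis=1)                       # p(x)
    py_x = pxy / (px[:, None] + eps)           # p(y|x)
    pt_x = rng.dirichlet(np.ones(n_clusters), size=len(px))  # random init of p(t|x)
    for _ in range(n_iter):
        pt = pt_x.T @ px                       # p(t)
        py_t = (pt_x * px[:, None]).T @ py_x / (pt[:, None] + eps)  # p(y|t)
        # KL[p(y|x) || p(y|t)] for every (x, t) pair, in nats
        kl = (py_x[:, None, :] *
              np.log((py_x[:, None, :] + eps) / (py_t[None, :, :] + eps))).sum(axis=-1)
        pt_x = pt[None, :] * np.exp(-beta * kl)
        pt_x = pt_x / pt_x.sum(axis=1, keepdims=True)
    return pt_x

# Degenerate toy example: the past fully determines the future, so a
# bottleneck at moderate beta recovers an (almost) deterministic encoder.
encoder = information_bottleneck(np.eye(4) / 4.0)
```

Sweeping β traces out the information curve (compression I(T;X) versus prediction I(T;Y)) against which an encoder like the VS output can be judged optimal or not.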

The specific texture of the visual scene significantly impacts the ego-motion-induced input to the VS dendrites. Of course, if the fly rotates against a uniform background, no visual motion information is available. The details of the spatial correlation structure in the scene also affect the input to the VS system and its subsequent ego-rotation estimate. Previous work [37] compared motion encoding using two different visual backgrounds: an artificial background with regular texture (e.g. random dots) and a natural scene background with irregular texture and an inhomogeneous contrast distribution. Those results showed that with the artificial background, rotational optic flow can be reliably read out at the VS dendrites; axonal gap junctions do not improve this estimate. However, the irregularly distributed patches of high and low contrast in natural scenes make the motion estimate noisy, even after dendritic integration. Only once the outputs are filtered through the VS axons with gap junction coupling does a robust encoding of the fly's rotational optic flow emerge. By pooling inputs across neighboring VS cells, axonal gap junctions average out the fluctuations in local motion measurements due to irregular textures in natural scenes. Such contrast normalization was identified to denoise the dendritic inputs to the VS network [66], and recent work further showed that it acts nearly instantaneously [67, 68], compatible with the processing timescale of evasive maneuvers. Moreover, [46] showed that even with these axonal gap junctions, the VS network can only encode 50% of the motion information in natural scenes compared to artificial, regular textures (shown in S1(C) Fig). Here we challenge our in silico fly with both naturalistic self-motion and complex visual texture. The axonal gap junctions in the VS system may have evolved to solve this particular, behaviorally relevant problem in natural scenes.

Axonal gap junctions enable prediction during evasive maneuvers

The VS network generates wide-field rotational responses by combining upstream vertical local motion detection [27-29]. There are 20 different VS cells (10 in each compound eye). VS dendrites integrate upward and downward local motion signals from T4 and T5 cells such that their axons output global motion components of ego-rotation based on the fly's visual experience. Each VS cell has its receptive field centered on a specific rotational axis in the fly's coronal plane (combinations of all pitch and roll rotations). They are numbered VS1-VS10 according to their preferred ego-rotation axes, arranged along the fly's anterior-posterior axis [69]. Recordings from VS cells confirm that they mainly respond to rotational flow fields but not expanding flow fields [49]. If the animal is moving in an open field, translational motion does not generate rotational flow, and the VS network is only sensitive to ego-rotation. VS cells are upstream of both other LPTCs (previous work described ultra-selective LPTCs, LPLC2 and LC4, for detecting the size and speed of a looming target [70, 71]) and descending neurons connecting to the neck motor center. Because the rotations during evasive maneuvers are mainly pitch and roll combinations (similar to how a fighter jet gains tactical advantage), the VS network is a good candidate for encoding information specific to and most relevant for this behavior.
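One way to picture this arrangement is as a bank of axis-tuned units tiling the coronal plane. The toy model below assumes cosine tuning and evenly spaced preferred axes, neither of which is claimed by the paper in this exact form, and reads out the rotation axis with a population vector:

```python
import numpy as np

# Assumed preferred rotation axes for VS1-VS10: an even tiling of the
# coronal plane (a rotation axis is defined modulo 180 degrees).
preferred = np.deg2rad(np.linspace(0.0, 162.0, 10))

def vs_population_response(theta):
    """Toy tuning: each cell responds with the cosine of the angle
    between the ego-rotation axis and its own preferred axis."""
    return np.cos(theta - preferred)

def decode_axis(response):
    """Population-vector readout of the rotation axis from the 10 responses."""
    z = (response * np.exp(1j * preferred)).sum()
    return np.angle(z) % np.pi

theta_true = np.deg2rad(40.0)
theta_hat = decode_axis(vs_population_response(theta_true))
```

With an even tiling, the cross terms cancel and the decoded axis matches the true one exactly; uneven tiling or response noise would bias this simple readout, which is part of what the full network analysis must contend with.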

Previous work identified the R_m, R_a, C_m, g_exc, g_inh for each VS cell [50] and observed that the performance of the VS network is dominated by its chain-like wiring architecture [37, 50-52]. Each VS cell only connects with other VS cells with immediately neighboring receptive fields, e.g. VS2 only connects to VS1 and VS3. Meanwhile, the VS1 and VS10 cells show reciprocal inhibition [72]. Previous dual-recording experiments [51] have shown that VS cells connect amongst each other through electrical synapses, and dye-coupling experiments showed that these electrical synapses are gap junctions [52]. Subsequent work [37] identified that these gap junctions are located at the axons of VS cells. Anatomically, these gap junctions are the only connections between the VS cells. Mechanistically, the VS network implements an electrotonic segregation through these gap junctions: all VS cells show broadened receptive fields at their axons compared to those at their dendrites. These broadened receptive fields improve the robustness of visual motion encoding [37, 73] at the output of the VS network. The upstream input from local motion detectors places a much more substantial synaptic conductance load on the dendrites than the dendritic-axonal leak, so placing a gap junction at the dendrites has negligible effects on the VS output [73]. However, placing gap junctions between the axonal compartments performs a kind of linear interpolation along the lateral (azimuthal) direction [37]. This removes corruption of the ego-rotation signal arising from the inhomogeneous contrast distribution of the natural visual scene rather than from the global ego-rotation signal (analogous to how dendritic integration in a single VS cell removes corruption along elevation [74]).
Gap junctions also support a robust [37] and efficient [46] subpopulation readout scheme: the output from the VS network arises from subpopulations of adjacent cell triplets, which target different downstream areas [39, 75]. In particular, the VS network connects to the downstream descending motor neurons or neck motor neurons only through the VS 5-6-7 triplet of cells [50, 75], whose dendritic receptive fields are located at the center of the fly's field of view. These axonal gap junctions are therefore dominant for network performance. Here, we investigate i) whether axonal gap junctions improve prediction; ii) how such improvement differs from the efficient coding of the instantaneous input previously shown in [46].
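The lateral-smoothing intuition can be captured by a steady-state toy model of the 10-cell chain: each axon leaks toward its own (noisy) dendritic drive while gap junctions pull neighboring axons together. The conductance values below are illustrative placeholders, not the fitted parameters of the compartmental model:

```python
import numpy as np

def chain_steady_state(dendritic_drive, g_leak=1.0, g_gj=5.0):
    """Steady-state axonal voltages of a 1-D chain of compartments.

    Solves (g_leak * I + g_gj * L) V = g_leak * d, where L is the graph
    Laplacian of the chain (nearest-neighbor gap-junction wiring only,
    as in the VS network) and d is the dendritic drive.
    """
    d = np.asarray(dendritic_drive, float)
    n = len(d)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = g_leak
        for j in (i - 1, i + 1):          # chain wiring: nearest neighbors only
            if 0 <= j < n:
                A[i, i] += g_gj
                A[i, j] -= g_gj
    return np.linalg.solve(A, g_leak * d)

# A rotation signal shared across the 10 cells, corrupted by independent
# natural-scene-induced fluctuations at each dendrite.
rng = np.random.default_rng(2)
shared_signal = np.full(10, 0.5)
noisy_drive = shared_signal + rng.normal(0.0, 0.5, 10)
axonal_voltage = chain_steady_state(noisy_drive)
# Coupling shrinks the independent (noise) components while passing a
# common drive unchanged, so the axonal voltages end up closer to the
# shared signal than the raw dendritic drive is.
```

In this linear picture the common component of the drive is a null vector of the Laplacian and passes through unattenuated, while every non-shared component is shrunk by 1/(1 + g_gj λ); this is one way to see why axonal (but not dendritic) coupling denoises the ego-rotation signal without distorting it.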

Causality dictates that the past axonal voltages can only obtain information about the future current from their own past current; therefore I_future^max (shown as the solid line in Fig 2A) is an upper bound on I_future. Here, we explore which network wiring features support the maximal transmission of the correlation structure in the input current onto the output axonal voltage of the VS network. As shown in Fig 2A, axonal gap junctions are necessary for the system to encode the maximal amount of predictive information about the input current. Namely, I_future(I) (dashed line) only approaches I_future^max (solid line) when gap junctions are present between neighboring VS axons. The other two configurations, i.e. no gap junctions or gap junctions at the dendrites (dotted and dash-dotted lines, respectively), cannot encode as much predictive information.

Fig 2. The capacity of the VS network to encode predictive information varies with the anatomical locations of the gap junction between VS cells.

Fig 2

A) The predictive information about the future input current, Ifuture, encoded in four different schemes: 1) the past dendritic input current (solid line; this is the limit Ifuturemax and the upper bound on Ifuture), 2) the past axonal voltage when gap junctions are present between VS axons (dashed line), 3) the past axonal voltage when gap junctions are present between VS dendrites (dotted line), and 4) the past axonal voltage in the absence of gap junctions (dash-dotted line). All gap junctions are set to 1000 nS when present; only their locations differ, i.e. axon vs. dendrite. Note that when the gap junctions are present between VS cell axons, the output voltages preserve almost the entire amount of predictive information available at the inputs (red). (See also Materials and methods.) This encoding is not a consequence of linear correlation: there is negligible linear correlation between the past and future ego-rotation at Δt = 30 ms or Δt = 40 ms. B) The presence of axonal GJs improves the encoding of predictive information more than that of instantaneous input. We compare how the VS network encodes the predictive information in two scenarios (i.e. IfutureI and Ifuture(θ, Δt) with Δt = 10 ms) and the instantaneous ego-motion θ, with axonal GJs (cyan bars) and without GJs (red bars). The encoding of instantaneous constant ego-motion I(θt; Vt) (without prediction forward in time) is compiled from previous work [46]. I(θt; Vt) was defined as the mutual information between a constant rotation θ and the transient axonal voltages of the VS network (integrated for Δt = 10 ms).

We also observe that the presence of axonal gap junctions improves the encoding of predictive information much more than it improves the encoding of the instantaneous ego-motion [46] (Fig 2B). Comparing Ifuture(θ, Δt) with I(θt; Vt), we observe that the presence of axonal gap junctions improves I(θt+Δt; Vt) (i.e., Ifutureθ) and I(It+Δt; Vt) (i.e., IfutureI) by about 100%, as opposed to about 10% for I(θt; Vt); see the differences between the cyan and red bars in Fig 2B. Because axonal gap junctions can only reformat information already received from the dendrites, this suggests that placing gap junctions at this position may be a strategy the VS network uses to specifically improve the predictive capacity of its output encoding. Next, we investigate whether such improvement approaches the physical limits set by the input current or the ego-rotation information, computed using the information bottleneck framework.

The VS network is near-optimal in predicting its own future input

All of the information encoded by the VS network comes from its sole input current, Ipast. To quantify the efficiency of encoding, we need to measure not only the prediction (i.e. Ifuture) but also how much the axonal voltages encode about the past input, Ipast. Comparing Ipast and Ifuture, where the past is at time t and the future at t + Δt, we can ask formally whether the VS network encodes as much predictive information as possible, using the information bottleneck framework [76]: given the amount of information the axonal voltage encodes about the past sensory input, what is the maximal amount of information it can retain about the future input? This optimum, Ifuture*(I,Δt), traces out a bound (the dark blue line in Fig 3) as a function of Ipast. It is the maximal possible predictive information at each level of compression, Ipast. Among encodings with the same Ipast, those approaching the bound are optimal.
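
For discrete variables, the information bottleneck optimum referenced here can be computed with the self-consistent iterative equations of [76]. The numpy sketch below illustrates the idea on a toy joint distribution; the cluster count, the trade-off parameter β, and the toy p(x, y) are illustrative assumptions, not the quantities computed for the VS network.

```python
import numpy as np

def information_bottleneck(p_xy, n_t=4, beta=5.0, n_iter=200, seed=0):
    """Iterative IB (Tishby et al.): compress X into T while preserving info about Y.

    p_xy: joint distribution over (past X, future Y), shape (n_x, n_y).
    Returns the encoder p(t|x) and the (I_past, I_future) point it achieves, in bits.
    """
    rng = np.random.default_rng(seed)
    n_x, n_y = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]

    # Random soft initialization of the encoder p(t|x).
    q_t_given_x = rng.random((n_x, n_t))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        q_t = q_t_given_x.T @ p_x                               # p(t)
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x
        q_y_given_t /= np.maximum(q_t[:, None], 1e-12)          # p(y|t)
        # KL[p(y|x) || p(y|t)] for every (x, t) pair, in nats.
        log_ratio = np.log(np.maximum(p_y_given_x[:, None, :], 1e-12)) \
                  - np.log(np.maximum(q_y_given_t[None, :, :], 1e-12))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        q_t_given_x = q_t[None, :] * np.exp(-beta * kl)
        q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

    def mi(p_ab):
        """Mutual information (bits) of a joint distribution."""
        pa, pb = p_ab.sum(1), p_ab.sum(0)
        mask = p_ab > 0
        return (p_ab[mask] * np.log2(p_ab[mask] / np.outer(pa, pb)[mask])).sum()

    p_xt = q_t_given_x * p_x[:, None]                           # joint p(x, t)
    p_ty = (q_t_given_x * p_x[:, None]).T @ p_y_given_x         # joint p(t, y)
    return q_t_given_x, mi(p_xt), mi(p_ty)

# Toy joint distribution: Y is a noisy copy of X over 8 states.
n = 8
p_xy = np.full((n, n), 0.02 / (n - 1))
np.fill_diagonal(p_xy, 0.98)
p_xy /= p_xy.sum()
enc, i_past, i_future = information_bottleneck(p_xy)
```

Sweeping β traces out the (Ipast, Ifuture) information curve: small β favors compression (low Ipast), while large β pushes Ifuture toward the bound set by I(X; Y).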

Fig 3. Near-optimal prediction of the input to the VS network.

Fig 3

(A) The encoding of predictive information about the future current input to the VS network is near-optimal 10 ms after the evasive maneuver starts (Δt = 10 ms). This holds both for the entire VS network and for the triplets. The dark blue curve traces out the optimal encoding of the future input to the VS network given varying amounts of information retained about the past input (also see Materials and methods). This curve divides the plane into allowed (blue shaded region) and forbidden regions. No encoding can exist in the forbidden region because it cannot have more information about its future inputs than the input correlation structure allows, given causality and the data processing inequality. In addition, the maximal amount of information that is available as predictive information (the highest point of the information curve) is limited by the correlation structure of the input current itself. We then plot the amount of information the axonal voltages of the VS network encode about the future input (the input current at time t + Δt) versus the information they retain about the past input (the input current at time t), with axonal gap junctions in pink and without gap junctions in black, for all 120 triplets (crosses) and the whole network (circle). A particular encoding scheme corresponds to a single point in this (Ipast, Ifuture) plane, which shows how much information it encodes about the past input vs. how much it encodes about the future. A particular VS encoding could occupy any point within the blue shaded region, but those that approach the bound Ifuture*(I,Δt) for a particular Ipast are maximally informative predictors of the future input. (B) Ifuture for the whole VS network (light bars) vs. triplets (dark bars, with error bars) throughout the time span of evasive maneuvers (Δt = 10 ms, 20 ms, 30 ms, 40 ms).

The known circuitry of the VS network allows us to probe two coupled questions: 1) What is the predictive efficiency (based on the encoding of the past) and 2) what is the predictive capacity (encoding of the past input only to predict the future input) of the VS network, given different readout circuitry (the entire VS network vs. the output VS 5-6-7 triplet)?

The predictive capacity of the VS network for its own future inputs is near-optimal. As shown in Fig 3A, the axonal voltages of the VS network encode Ifuture = 3.49 ± 0.1 bits about future inputs at Δt = 10 ms (the beginning of the banked turn). Given that the optimum is Ifuture*(I,Δt) = 3.59 bits, the axonal voltages of the entire VS network capture Ifuture/Ifuture*(I,Δt) = 97.2% of the optimal predictive information.

Similarly, predictions of the VS network's future input based only on the axonal voltages of triplets are close to optimal as well (the red cross in Fig 3A). Encodings based on the outputs of triplets reach Ifuture = 2.89 ± 0.36 bits, while their respective physical limits are 3.07 ± 0.24 bits (Fig 3A). This means that all triplets achieve 89.8 ± 1.5% efficiency (Ifuture/Ifuture*) in encoding predictive information about the inputs. VS triplets are thus not quite as efficient as the entire VS network in encoding the future dendritic input.

In general, triplets also retain less absolute predictive information about future input than the entire VS network throughout the time span of evasive maneuvers (Fig 3B). But the key point is not how well the triplet predicts the VS input, but how well it might help guide behavior by predicting the fly’s future ego-rotation. We next explore this directly by asking how much information the triplet output has about the future ego-rotation.

The triplet architecture selectively encodes predictive information about future ego-rotation

Here, we show that the triplet readout architecture retains nearly all of the available predictive information about the future ego-rotation Ifuture(θ, Δt) (light bars in Fig 4A) available to the VS network from the upstream input (darker bars in Fig 4A). Because downstream pathways of the VS network only read out from triplets, the VS network appears to use a two-step strategy to optimize this readout: it first efficiently represents correlations within its own past and future input, i.e. Ifuture(I,Δt), at its output; then it selects components within that output that are relevant for predicting future ego-rotation, i.e. Ifuture(θ, Δt). This is possible because correlations coming from events in the visual world, such as the movement of large objects or the full-field background movement, have a different temporal structure (e.g. longer correlation times) than those internal to the brain.

Fig 4. Encodings based on the axonal voltages of triplets are near-optimal in predicting the future ego-rotation.

Fig 4

(A) Histogram showing that the triplets (we use the output VS 5-6-7 triplet here) encode nearly as much information about the future ego-rotation (dark bars) as the entire VS network (light bars), throughout evasive maneuvers. The solid line shows the mutual information between the ego-rotations themselves: between the prior heading upon detection of the visual threat, θpast, and the future ego-rotation at different times throughout evasive maneuvers. The dashed line shows the mutual information between the past input to the VS network and the future ego-rotation at those same times, all using the same past input Ipast corresponding to the prior θpast. This is also the limit of prediction in the information bottleneck framework. (B) The encoding of predictive information about θfuture at 10 ms after the start of evasive maneuvers (Δt = 10 ms). The dark blue curve traces out the optimal encoding of the future ego-rotation (I(Vpast; θfuture)) given varying amounts of information retained about the past input (I(Vpast; Ipast)). Each cyan cross shows how much information one of the 120 possible triplets encodes about the future ego-rotation vs. how much information it retains about the past input.

We also observe that all triplets are near-optimal in encoding predictive information about the future ego-rotation (Fig 4B) (this optimality is also present for predictions of the more distant future, i.e. Δt > 10 ms; results not shown). Considering that the VS 5-6-7 triplet encodes nearly as much information about the future ego-rotation as the entire VS network (Fig 4A), the main benefit of using triplets is compression: the VS triplet readout discards information predictive of the less behaviorally-relevant intrinsic dynamics of the inputs themselves. This compression sits close to the region where the predictive information just begins to saturate, indicating that the triplet output achieves a better trade-off than the whole VS network in terms of how much Ifuture(θ, Δt) it can encode given its cost, Ipast.

Although all triplets encode similar amounts of information about the future ego-rotation (the standard deviation of the Ifuture(θ, Δt) amongst all 120 triplets is just 0.1 bits), the particular triplet connecting to the neck motor center, the VS 5-6-7, is one of the better choices in terms of how much information about the future ego-rotation it packs into its prediction of the future input, while the VS 1-2-3 triplet is the most efficient. However, if we factor in wiring constraints, linking the output from VS 5-6-7 to a downstream dendritic arbor in the descending neurons for the neck motor center requires a much shorter wiring length compared to the peripheral location of the VS 1-2-3 triplet (VS cells are numbered according to their locations along the anterior-posterior axis; VS 5-6-7 are central in the compound eyes). It is possible that the minimization of wiring length [77] is important in selecting the simultaneously most predictive and most resource-efficient encoding.

Here we show that the VS 5-6-7 triplet, which projects to the downstream neck motor center, retains nearly all of the predictive information about future ego-rotation that is present in the entire VS network. This result also shows that the predictive information encoded by the VS network is compressible: the VS 5-6-7 triplet successfully reformats the predictive information from 20 dendrites/axons (10 VS cells from both compound eyes combined) into six axons (the axons of VS 5-6-7 from both compound eyes combined). In the next section, we investigate how ego-rotation representations differ between the entire VS network and the VS 5-6-7 triplet. We hope to understand a) what kind of computation is possible via the encoding of near-optimal predictive information, and b) how the VS 5-6-7 triplet reformats this near-optimal prediction.

Predictive information encoded by the VS network provides fine-scale discrimination of nearby stimuli

Because the entire VS network has 20 VS cells while the VS 5-6-7 triplet has only 6, it is difficult to directly compare their encodings of the fly's ego-rotation, which differ in dimensionality. However, both schemes encode similar amounts of predictive information about ego-rotation. Because solutions with the same or similar y-values on the information curve share the same dimensionality in their respective compressed representations [76], the difference between these representations reveals how the whole VS network differs from the VS 5-6-7 triplet in its encoding of predictive information. Because solving for these representations is generally intractable, we instead use the variational information bottleneck (VIB) to approximate the optimal representations [78–80]. The VIB is a generative learning framework closely related to variational autoencoders [81]. Given pairs of inputs and outputs, it learns a latent feature space whose dimensions are predictive features mapping the input to the output (S6 Fig). One can then project the input into this latent feature space to obtain the predictive representation of the output. Here, we obtain these predictive representations in two steps: we first train the VIB to generate a latent feature space that maps the input (the axonal voltages of the VS network) to the future input current. We next project the axonal voltages that correspond to the same future ego-rotation onto this latent space. We can label these points in the latent space according to their respective future ego-rotation, and repeat this procedure for multiple ego-rotations. We can then visually and computationally examine how overlapping or distinct these maximally predictive clusters of ego-rotations are in the latent space of the VIB.
To allow for a direct comparison, we keep the dimension of the latent feature space the same while changing the input, using either the axonal voltages of the entire VS network or those of the VS 5-6-7 triplet. S7 Fig shows that a two-dimensional latent space already enables these predictive representations to encode substantial predictive information.

The VIB-generated representations of future ego-rotation (Fig 5) show that the predictive information encoded by the VS network supports fine-scale discrimination of the input motion direction. For ego-rotations with different degrees of clockwise roll and up-tilt pitch (but the same direction), the encoded predictive information maps ego-rotations with nearby azimuths far apart: a pair of ego-rotations roughly 10 degrees apart (56° and 67°, shown in S9(A) and S9(B) Fig) is mapped to distinct, well-separated clusters in the latent space of the VIB, whereas another pair (37° and 56°) that is farther apart shares some overlap (S9(C) Fig). The VS 5-6-7 triplet preserves this fine-scale discrimination while compressing the readout (S9(B) and S9(D) Fig). The same fine-scale discrimination is also present for ego-rotations combining counter-clockwise roll and up-tilt, i.e. corresponding to vectors within the 4th quadrant of the fly's coronal plane (Fig 5B and 5D). However, these predictive representations cannot discriminate ego-rotations with vastly different roll or pitch directions, i.e. those belonging to different quadrants: there is substantial overlap if we overlay the predictive representations, e.g. the cluster corresponding to 270° (shown in magenta in Fig 5C) would entirely cover the cluster corresponding to 19° (also shown in magenta, but in Fig 5A). The same overlap is present in Fig 5B and 5D as well.

Fig 5. The predictive information encoded by the VS network supports fine scale discrimination of future ego-rotation.

Fig 5

(A) The predictive representation of four future ego-rotations in the same quadrant of roll and pitch, e.g. an up-tilt and a clockwise roll. This representation maps the axonal voltage of the entire VS network to future ego-rotation through a latent feature space. The dimensions of this latent feature space (shown as VIB D1 and VIB D2) are VIB-learned predictive features based on the output of the VS network. All ego-rotations correspond to vectors within the 1st quadrant of the fly's coronal plane. The inset shows a polar histogram in grey and the four selected ego-rotations in color. (B) Similar to A, but using the axonal voltages of the VS 5-6-7 triplet. (C) Similar to A, but the ego-rotations are all counter-clockwise roll and up-tilt, corresponding to vectors in the 4th quadrant (between 270° and 360°) of the fly's coronal plane. (D) Similar to C, but obtained using the axonal voltages of the VS 5-6-7 triplet as the VIB input. Note that although the overall correlation is high for the VIB solution using the axonal voltages of the VS 5-6-7 triplet, VIB D1 and VIB D2 encode different information about θfuture: VIB D1 encodes 1 bit about θfuture and VIB D2 encodes an additional 0.3 bits.

Even with a representation that retains all of the ego-rotation-relevant information in the input to the VS network, one cannot discriminate widely different ego-rotations from the information available at the VS network input. We construct such a representation based on the instantaneous input current of the present ego-rotation. These input currents contain 2.44 bits relevant to the fly's instantaneous ego-rotation, i.e. without prediction forward in time. This is more than the information available via prediction from the past input current (2.1 bits, shown as the red bar in S7 Fig). The first two principal components (PCs) of the input current retain nearly all of the available ego-rotation-relevant information, so we ask how ego-rotations disentangle in the representation formed by the first two instantaneous PCs. Projecting all VS input currents onto these first 2 PCs, we find that substantial overlap between ego-rotations remains (S8 Fig), e.g. the cluster of 19° in magenta almost covers the entire cluster of 247° in light green (S8(A) Fig). This shows that the input to the VS network can only support fine-scale discrimination, whether via an instantaneous or a predictive readout.
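
The projection onto the first two principal components used for this comparison can be sketched as follows; the "currents" here are synthetic random stand-ins for the simulated VS input currents, used only to show the mechanics of the projection.

```python
import numpy as np

# Synthetic stand-in for VS input currents: n_samples x 20 (10 cells per eye),
# generated here only to illustrate the projection; the real currents come
# from the compartmental simulation.
rng = np.random.default_rng(1)
n_samples, n_cells = 500, 20
currents = rng.standard_normal((n_samples, n_cells)) @ rng.standard_normal((n_cells, n_cells))

# PCA via SVD of the mean-centered data.
centered = currents - currents.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
pc_scores = centered @ vt[:2].T          # projection onto the first 2 PCs
explained = (s[:2] ** 2).sum() / (s ** 2).sum()   # variance captured by 2 PCs
```

Each row of `pc_scores` is one sample's coordinates in the 2-PC plane; clusters of samples sharing the same ego-rotation label can then be compared for overlap, as in S8 Fig.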

Ego-rotation representations differ depending on whether the system is optimized for prediction or for encoding the present input. For example, the VIB-based predictive representation not only separates nearby ego-rotations (e.g. 56° and 67°) into distinct clusters, but also inserts a cluster of another ego-rotation angle (e.g. 19°) between them. By contrast, a circuit that maximally encodes the instantaneous inputs projects all distinguishable ego-rotations into adjacent clusters according to their azimuthal angle, i.e. 56° and 67° end up next to each other. Because predictive representations contain less absolute information about the ego-rotation than the instantaneous-optimal representation, this difference suggests that the predictive information might preferentially support fine-scale discrimination of nearby ego-rotations. Information is not necessarily lost, however. Downstream, at the descending motor neurons [82–85] or the neck motor center [86, 87], information from other pathways (e.g. the haltere and prosternal organs [40]) is integrated. The discrimination of larger ego-rotation angles may thus be supported there, while the VS system serves a specialized role in fine discrimination.

Discussion

Here, by focusing our analysis on the fly's neural code for a key survival strategy, the evasive flight maneuver, we have shown that the VS network in the early fly visual system encodes behaviorally relevant predictive information near-optimally. A subpopulation readout mechanism, based on triplets of VS cells, further compresses the representation of that predictive information. This compression trades off prediction of the network's own input against prediction of the future ego-rotation: while the triplet readout encodes less absolute information about the future input, it retains, in its outputs, higher-fidelity predictive information about what a fly will experience during evasive maneuvers. The encoding of predictive information has a concrete behavioral goal: it enables fine-tuning of ego-rotation during evasive maneuvers.

Combining these observations, the fly brain satisfies an overarching computational goal of effectively guiding evasive flight trajectories through visual prediction at both levels of input filtering (via axonal gap junctions) and output reformatting (via subpopulation readout based on triplets). By next identifying that the predictive representations of future ego-rotation are best at enabling fine-scale discrimination of nearby ego-rotations, we have shown how structure maps to function in this motion sensing system. In addition, we have shown that behaviorally relevant features of ego-rotation are faithfully encoded via circuit mechanisms at the sensory end of the arc from sensation to action. This suggests that behavioral goals sculpt neural encoding even at the earliest stages of sensory processing.

Previous work proposed that the halteres may implement a proportional-integral (PI) controller to generate motor commands [57]. In this framework, we hypothesize that feedforward visual control may play a functional role similar to the integral term: both serve to reduce noise for high-frequency signals, i.e. the high-velocity rotations of the evasive maneuver. A key difference between visual prediction and the integral term is that visual prediction does not merely respond to past errors; it provides an estimate of the future, supporting fine-scale discrimination of future ego-rotations at high velocity. This noise reduction is better suited to the abrupt, brief nature of evasive maneuvers, for which the past error (before the initiation of evasive flight) may not be useful. Recent experiments [13] identified that visual information can shift the dynamic range of the motor neurons of the halteres. Namely, the halteres themselves obtain proportional gain for ego-rotations up to 2500°/s from their own mechanosensory input. Visual prediction then induces a shift in the haltere dynamic range so that they can output appropriate motor commands when a future ego-rotation exceeds 2500°/s, up to 5300°/s. Whether this linear PI controller is indeed implemented by the halteres is an interesting direction for future research.

The halteres may implement the above PI controller in different ways, some of which suggest novel sensory pathways between the visual system and visually gated motor neurons of the wing steering muscles. The halteres are known to act as a multi-sensory integration circuit: they combine inputs from the eyes, ocelli, antennae, and from the halteres themselves via multiple campaniform sensilla embedded in the stalk. One possibility is that the descending visual input recruits the Coriolis-sensitive sensilla. This activates the reflex loop consisting of the dF2 campaniform sensilla and the wing steering motor neurons, and thus directly modifies the firing phase of the tonic first basalar muscle w.B1, whose phase advance is associated with increased wingbeat amplitude [88]. Alternatively, the visual prediction may act through dF3, a campaniform sensillum that receives input independently of dF2, to recruit the phasic motor neurons, i.e. the second basalar motor neuron M.b2 and steering muscle w.B2. Little is known about these neurons other than that they are responsible for initiating the elevated wing kinematics (stroke amplitude and frequency [54, 55, 89]) that are necessary for evasive maneuvers. In the prediction paradigm, when the predictive visual information tunes the strength of haltere mechanosensory feedback [13], it indirectly activates these motor neurons via the halteres' connections to M.b2 through electrical synapses (fast enough for evasive maneuvers). Because driver lines targeting individual campaniform sensilla of the halteres are lacking, these remain open questions for future investigation.

Our work also suggests that the encoding of predictive information is a key functional role of the lobula plate tangential neurons in dipterans. This might not be obvious, given that different species have very different behavioral repertoires and selective pressures sculpting their tangential neuron computations [40]. However, all dipterans use their lobula plate tangential neurons to encode motion information [40, 84], and all of these neurons have long processing lags relative to their behavioral timescales (e.g. the reaction time of robber flies in prey capture is even faster, around 10-30 ms, while their sensory processing delay is around 18-28 ms) [5]. The common requirement to overcome processing lags in their global motion sensitive neurons suggests that they may all use prediction to meet the selective pressures of different survival-critical behaviors. The predictive information from the visual system only accounts for about half of the future discriminability. We believe that the halteres, which integrate input from other sources, combine feedforward visual prediction with predictions from other modalities to improve their motor commands [88].

How the fly brain reads out the predictive information from the VS system is an important open question. Although the information contained in the temporally averaged output of the VS population response can be read out linearly, this may not be how readout takes place in the real downstream pathways of the VS network. For example, two descending neurons (DNOVS1 and DNOVS2) connect to the output of the VS 5-6-7 triplet [69, 75, 90, 91]. While DNOVS1 is a graded neuron, DNOVS2 is a spiking neuron (with firing rates up to 100 Hz). DNOVS2 introduces a substantial nonlinearity, such that the biological readout may not be a simple integrator. In addition, the output of the VS network may be further combined with the outputs of other LPTCs before reaching a motor control center [39].

Gap junctions are prevalent throughout the brain in many species [92, 93]. In vertebrate visual systems, the retina also encodes predictive information near-optimally, potentially circumventing sensory processing delays [18, 94]. Initial evidence supports the notion that gap junctions are a key circuit element for improving signal transmission in the retina: for example, gap junctions between directionally selective ganglion cells in the mouse retina result in lag-normalization [95], and the gap junctions present in cones and bipolar cells improve the signal-to-noise ratio of their respective outputs [96]. Gap junctions can also rapidly regulate chemical synapses and improve sensitivity to correlated signals [97]. When processing stimuli with correlations between the past and the future (e.g. predictable motion), these mechanisms can support prediction to compensate for delays. In the central nervous system, gap junctions are versatile enough to support flexible hierarchical information processing in cortical circuits, as hypothesized in [98]. The ubiquitous evolutionary pressure to perform efficient prediction may shape nervous systems through this common circuit motif.

The brain carries out flexible, robust, and efficient computations at every moment as an organism explores and interacts with the external world. These computations are only possible through versatile mechanisms that operate under realistic behavioral constraints. We have shown that optimizing the transmission of predictive information in sensing systems is a useful way to interrogate the neural code. Given the presence of predictive information in sensory systems that evolved independently [18], our work supports the idea that predictive information may very well be a fundamental design principle that underlies neural circuit evolution. While we have dug into the specific mechanisms and representations that support this kind of efficient prediction for fast, natural and behaviorally critical motion processing in the fly visual system, the lessons learned may apply to a much larger class of neural sensing systems.

Materials and methods

Ego-rotation for evasive flight maneuvers

We obtain ego-rotation stimuli from a large dataset of evasive flight maneuvers in Drosophila published in [21]. This dataset contains 82 traces of evasive trajectories in which flies face looming targets arriving from all possible angles in their left visual field. All traces contain motion information (e.g. direction, velocity, etc.) from the emergence of the threat to the end of the evasive maneuver, and the trajectories are aligned to the beginning of the maneuver. The authors of [21] showed that neither the speed nor the expansion size of the looming threat changes the escape time course and dynamics, i.e. this 40 ms evasive maneuver is the best Drosophila can do. The durations of the evasive trajectories vary between 10-40 ms, with 65 out of 82 flights lasting the full 40 ms. We chose this dataset for two reasons: a) its sampling rate (7500 fps) allows us to trace the activity of the VS network at the millisecond scale; b) it contains threats approaching the fly from angles spanning a full 180°, providing a well-sampled collection of the fly's behavioral repertoire.

Simulation of the in silico VS network

Our simulation uses a biophysically realistic, simplified model of the VS network based on the reconstruction introduced in [37] (we used the modelDB python package introduced in [45]). This reconstruction models each VS cell with hundreds of dendritic compartments based on image stacks obtained by two-photon microscopy. It implements the chain-like circuitry of the VS network using a) resistances connecting neighboring cells as axonal gap junctions [51, 52]; and b) a negative conductance between VS1 and VS10 to account for their reciprocal inhibition [72].

Compared to the detailed reconstruction, the simplified, biophysically realistic model introduced in [38] reduces all dendritic compartments to a single compartment while keeping the other components intact. In this simplified model, an individual VS cell is represented by one dendritic and one axonal compartment. All parameters were determined by a genetic algorithm [38] such that the simplified model behaves roughly the same as the real VS network given the same current injection [51, 52]. The dendritic and axonal compartments have their own conductances (gdend and gaxon, respectively) and a connecting conductance between them (gdend/axon).

This simplified model omits a few detailed dendritic structures of the VS cells, including the rotational distribution of dendritic branches and the inhibition from LPi cells [66], which was identified after this model was proposed. We do not think the inclusion of these details would significantly change our results, especially given the consistency between this simplified model and real VS cells under the same current injection [37, 39].

We combine the evasive traces and natural scene images to generate the optic flow patterns to which the VS network responds. For each of the 65 evasive traces that lasted a full 40 ms, we simulated 10,000 randomly generated natural scenes to obtain samples of the input (current arriving at dendrites) and output (axonal voltages) for subsequent analysis. In every simulation, we first generate the pseudo-3D visual “cube” (S1(A) Fig), representing the environment to which our model fly visual system responds, by randomly selecting six images from the van Hateren dataset. We then rotate this natural scene cube according to the rotational motion during evasive maneuvers recorded in [21] (we sample the rotational motion at a Δt = 1 ms interval, and integrate the VS response at a smaller time step of 0.01 ms to guarantee numerical accuracy). This yields the optic flow pattern which we project onto a unit sphere that represents the fly’s retina, following the protocol introduced in [39, 45].

The final step before simulating the VS network's response is to model its upstream local motion detection in the fly's retina. We use 5000 local motion detectors (LMDs) evenly distributed on the unit sphere. Each LMD contains two subunits separated by 2° in elevation. Their responses are R(t) = (f * V1)(t)(g * V2)(t) − (g * V1)(t)(f * V2)(t), where V1,2(t) are the photoreceptor responses at two neighboring locations, f is a low-pass filter, and g is a high-pass filter [99].
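A minimal discrete-time version of this correlator might look as follows. The filter forms (a first-order exponential low-pass, with the high-pass taken as its complement) and the time constant are illustrative assumptions, not the fitted filters of [99].

```python
import numpy as np

def exp_lowpass(x, tau, dt=1.0):
    """First-order low-pass filter (discrete exponential smoothing)."""
    y = np.zeros_like(x, dtype=float)
    a = dt / (tau + dt)
    for i in range(1, len(x)):
        y[i] = y[i - 1] + a * (x[i] - y[i - 1])
    return y

def reichardt(v1, v2, tau=20.0, dt=1.0):
    """R(t) = (f*V1)(t)(g*V2)(t) - (g*V1)(t)(f*V2)(t).

    f is a first-order low-pass; g is taken as its complement (a simple
    high-pass). Both filter shapes and tau are illustrative stand-ins.
    """
    f1, f2 = exp_lowpass(v1, tau, dt), exp_lowpass(v2, tau, dt)
    g1, g2 = v1 - f1, v2 - f2
    return f1 * g2 - g1 * f2
```

Swapping the two inputs flips the sign of R exactly, which is the direction selectivity the VS dendrites integrate.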

Each VS dendrite takes as input the output of the LMDs that fall within its receptive field. The receptive fields (RFs) of these dendritic compartments are modeled as 2-D Gaussians, S(x, y) = 1/(2πσxσy) exp(−[(x − xctr)²/(2σx²) + y²/(2σy²)]), with azimuth σx = 15° and elevation σy = 60°, tiling along the anterior-posterior axis (e.g., the RF centers are located at xctr = 10°, 26°, ⋯, 154° for VS1-10, respectively; see the detailed configuration in [39]). The input current of an individual VS dendrite is the weighted sum of the synaptic conductance loads from both excitatory and inhibitory inputs, multiplied by the respective driving potentials: the excitatory conductance load is gE = Σx=−180..180 Σy=−90..90 S(x, y)Rt(x, y) (and similarly for gI), and the input current is then gE * EE + gI * EI, where EE,I are the reversal potentials of the excitatory and inhibitory synapses, respectively. In our simulation, the input current is kept between −2.5 nA and 2.5 nA to be realistic [50].
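The receptive-field weighting can be sketched as follows. The split of the LMD responses into rectified excitatory and inhibitory conductance loads and the reversal-potential values e_exc/e_inh are illustrative assumptions, not the model's fitted values.

```python
import numpy as np

def rf_weight(x, y, x_ctr, sx=15.0, sy=60.0):
    """2-D Gaussian receptive field S(x, y), centered at (x_ctr, 0) degrees."""
    return np.exp(-((x - x_ctr) ** 2 / (2 * sx ** 2)
                    + y ** 2 / (2 * sy ** 2))) / (2 * np.pi * sx * sy)

def dendritic_current(lmd_x, lmd_y, lmd_r, x_ctr,
                      e_exc=1.0, e_inh=-1.0, i_max=2.5):
    """RF-weighted conductance input of one VS dendrite (sketch).

    Positive LMD responses load the excitatory conductance gE, negative
    ones the inhibitory conductance gI; the current gE*EE + gI*EI is
    clipped to the +/-2.5 nA range of [50]. e_exc/e_inh are placeholders.
    """
    w = rf_weight(lmd_x, lmd_y, x_ctr)
    g_e = np.sum(w * np.clip(lmd_r, 0.0, None))
    g_i = np.sum(w * np.clip(-lmd_r, 0.0, None))
    return float(np.clip(g_e * e_exc + g_i * e_inh, -i_max, i_max))
```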

The neighboring axonal compartments of different VS cells are connected by gap junctions (shown as gGJ), whereas VS1 and VS10 are connected by inhibitory chemical synapses. In our simulation, we set all conductance magnitudes using the same method as in [38]. Based on experimental findings [50], we vary the magnitude of the GJ conductance between 0 and 1 μS. Because the maximum firing rate of the descending neuron connecting to the neck motor center is 100 Hz [38], we use the transient temporal average of the resulting axonal voltage, V̄past = (1/T)∫Vpast(t)dt, for the subsequent analysis (in the main text, the Vpast are these temporal averages; for simplicity, we write Vpast instead of V̄past). For the voltages just before the start of evasive maneuvers, we average from t = −10 to 0 ms, where 0 ms is the start of the evasive maneuver.

Efficient encoding of predictive information

To predict the future input motion, the only information available to the VS network is its dendritic input at past times up to the present, Ipast. Ideally, the VS network output represents the future motion in a specific form, Z, following the optimal encoding dictated by the solution to our information bottleneck problem. The bottleneck minimizes how much the representation retains about the past input, I(Z; Ipast), while maximizing how much it encodes about the future input, I(Z; Ifuture) (or I(Z; θfuture) in the Results). Formally, such an encoding Z solves the following variational problem, prediction of its own input:

L[p(Z|Ipast); β] = I(Z; Ipast) − βI(Z; Ifuture) (5)

where β is the trade-off parameter between compression of information about the past and retention of information about the future sensory input (we switch to Ifuture(θ, Δt) when we examine the prediction of future ego-rotation, as shown in Section 4 of the Results). For each level of compression I(Z; Ipast), there exists an optimal I*future(Ipast), the maximum I(Z; Ifuture) achievable at that compression level, determined by the statistics of the sensory input, i.e., Ipast, itself.

We find the Z that optimizes Eq 5 using the following iterative procedure, the Blahut-Arimoto algorithm [100] (a MATLAB implementation by Shabab Bazrafkan is available at: https://www.mathworks.com/matlabcentral/fileexchange/65937-information-bottleneck-iterative-algorithm); we write X = Ipast and Y = Ifuture here:

pt(Z|X) = [pt(Z)/Z(X, β)] exp[−β ΣY p(Y|X) log(p(Y|X)/pt(Y|Z))] (6)
pt+1(Z) = ΣX p(X)pt(Z|X) (7)
pt+1(Y|Z) = ΣX p(Y|X)pt(X|Z) (8)
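On a discretized joint distribution, these three self-consistent updates can be implemented directly. The sketch below is a toy re-implementation of Eqs 6-8 for small discrete alphabets, not the MATLAB code linked above; the random initialization, iteration count, and numerical-stability constant are arbitrary choices.

```python
import numpy as np

def info_bottleneck(p_xy, beta, n_z=None, n_iter=300, seed=0):
    """Iterative information bottleneck (Eqs 6-8) on a discrete joint p(x, y).

    Returns the encoder p(z|x). Toy implementation for small alphabets.
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_x, n_y = p_xy.shape
    n_z = n_z or n_x
    p_x = p_xy.sum(axis=1)
    p_y_x = p_xy / p_x[:, None]                    # p(y|x)
    p_z_x = rng.random((n_x, n_z))
    p_z_x /= p_z_x.sum(axis=1, keepdims=True)      # random initial encoder
    for _ in range(n_iter):
        p_z = p_x @ p_z_x                          # Eq 7: p(z)
        p_y_z = (p_z_x * p_x[:, None]).T @ p_y_x   # joint p(z, y)
        p_y_z = p_y_z / (p_y_z.sum(axis=1, keepdims=True) + eps)  # Eq 8
        # Eq 6: p(z|x) proportional to p(z) exp(-beta * KL[p(y|x) || p(y|z)])
        kl = np.einsum('xy,xzy->xz', p_y_x,
                       np.log(p_y_x[:, None, :] + eps)
                       - np.log(p_y_z[None, :, :] + eps))
        p_z_x = p_z[None, :] * np.exp(-beta * kl)
        p_z_x /= p_z_x.sum(axis=1, keepdims=True)
    return p_z_x
```

At large β the encoder sharpens toward a deterministic mapping, which is the regime used below to estimate the upper bound on the mutual information.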

Mutual information estimation

We use the K-nearest-neighbor approach described in [101] (its open-source software MILCA is available at https://www.ucl.ac.uk/ion/milca-0) to obtain mutual information estimates of Ifuture(I, Δt), Ifuturemax, Ifuture(θ, Δt) and Ipast. Here, the mutual information is approximated via the digamma function ψ:

I(X; Y) = ψ(K) − ⟨ψ(nx + 1) + ψ(ny + 1)⟩ + ψ(N) (9)

where ψ is the digamma function, K is the number of nearest neighbors, and N is the sample size; here N = 650,000.

To choose K, we run the Blahut-Arimoto algorithm with a large β (e.g., β = 100) to estimate the upper bound of the mutual information, based on the observation that the information bottleneck reaches its optimum at I(X; Y). Each Blahut-Arimoto estimation can take a week or longer on a multi-core cluster, so we only obtained the upper bounds of Ifuture, Ifuture(θ, Δt) and Ipast for the entire VS network and the VS 5-6-7 triplet. We then use these upper bounds to determine K. In general, we use K = 10, ⋯, 15 (or K = 1000, ⋯, 1100 for the bootstrapped θ distributions) and report the mean as the estimate in our analysis. We omitted the standard deviations when plotting Figs 2 and 4 because of their small magnitudes (< 0.2).
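For intuition, Eq 9 can be evaluated with a brute-force neighbor search; since every argument of ψ is a positive integer, the digamma function reduces to a harmonic sum. This sketch is suitable only for small samples; MILCA's implementation uses fast neighbor searches and is the one used in this work.

```python
import numpy as np

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + np.sum(1.0 / np.arange(1, n))

def ksg_mi(x, y, k=5):
    """Brute-force K-nearest-neighbor estimate of I(X;Y) in nats (Eq 9).

    For each sample, eps is the Chebyshev distance to its k-th neighbor in
    the joint space; nx, ny count strictly closer neighbors in each marginal.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)              # Chebyshev (max) metric
    np.fill_diagonal(dz, np.inf)         # exclude each point from its own list
    eps = np.sort(dz, axis=1)[:, k - 1]  # distance to k-th joint neighbor
    nx = (dx < eps[:, None]).sum(axis=1) - 1  # minus the point itself
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return (digamma_int(k) + digamma_int(n)
            - np.mean([digamma_int(a + 1) + digamma_int(b + 1)
                       for a, b in zip(nx, ny)]))
```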

Variational approximation of optimal encoding of the predictive information (VIB)

We use the variational approximation introduced in [78]. We first rewrite Eq 5 as Eq 10.

L̃[p(Z|Ipast); β′] = I(Z; Ifuture) − β′I(Z; Ipast) (10)

The minimization of Eq 5 is equivalent to the maximization of Eq 10 (i.e., with β′ = 1/β, Eq 10 is equivalent to Eq 5).

L̃[p(Z|Ipast); β′] − H(θfuture) ≥ LVIB = ∫dIfuture dZ p(Ifuture, Z)log q(Ifuture|Z) − β′∫dIpast dZ p(Ipast)p(Z|Ipast)log[p(Z|Ipast)/r(Z)] (11)

Next, we maximize the variational lower bound LVIB (Eq 11) of Eq 10; the entropy H(θfuture) is a constant of the optimization. The advantage of using this variational approximation is that we can constrain the distribution of Z to a particular form (here, a 2-D Gaussian) while letting the distributions of Ipast and Ifuture be arbitrary. This provides us with a latent feature representation of the lower bound for the optimal encoding of predictive information.

In this work, we would like to understand the structure of the optimal encoding of future ego-rotation given the input (the dendritic current, the VS axonal voltages, or the triplet voltages). We therefore obtain the respective solutions of LVIB with fixed β′ = 40, a value that falls in the diminishing-returns part of the IB curves in both Figs 3 and 4.
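For a Gaussian encoder and decoder, the two terms of LVIB can be evaluated in closed form per batch. The sketch below is illustrative only: the standard-normal prior r(Z) and the unit-variance decoder likelihood are simplifying assumptions, and the actual encoder/decoder architecture (200 intermediate filters with batch normalization) is described in S6 Fig.

```python
import numpy as np

def vib_loss(mu, log_var, y, y_pred_mean, beta_prime):
    """Per-batch VIB objective for a Gaussian encoder/decoder (sketch).

    mu, log_var    : encoder outputs for Z, shape (batch, 2) here
    y, y_pred_mean : future-input targets and decoder means
    Rate term: closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    Distortion term: unit-variance Gaussian negative log-likelihood.
    Minimizing nll + beta' * kl maximizes the lower bound L_VIB.
    """
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    nll = 0.5 * np.sum((y - y_pred_mean) ** 2, axis=1)  # up to a constant
    return float(np.mean(nll + beta_prime * kl))
```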

Supporting information

S1 Fig

A) Schematic depiction of the visual stimuli for the simulation, recompiled from [46]. Six natural images (five are shown here, with one excluded to reveal the fly’s viewing perspective) were randomly selected from the van Hateren dataset [65]; each image was patched onto a different face of a cube. Assuming that the fly is located at the center of this cube, we obtain the visual experience of the fly’s ego-rotational motion by rotating the cube around a particular motion direction, shown by the dark blue arrow. We then project the moving natural scene cube onto a unit sphere that represents the fly’s retina, following the protocol introduced in [39, 45]. There are ∼5,500 local motion detectors (LMDs) evenly distributed on this unit sphere. The responses of those LMDs whose locations fall within a VS cell’s dendritic receptive field (σazimuth = 15° and σelevation = 60°, tiling along the fly’s anterior-posterior axis; see details in supplementary Materials and methods) are then integrated as the input current to this particular VS cell. B) Mercator maps with both checkerboard and natural scene backgrounds, at 1° resolution in spherical coordinates. C) Ego-motion information inferable from checkerboard and natural scene backgrounds. The stimulus is a constant rotation of 500°/s from [46]. Note that only about half as much information about this motion stimulus is available with the natural scene background as with the checkerboard background.

(TIF)

S2 Fig. Egorotation distributions for different time steps during the evasive maneuver.

Here we focus on the egorotations to which the VS network is sensitive. Because the VS network responds only to combinations of roll and pitch motions, i.e., motions within the fly’s coronal plane, we represent all stimuli by their corresponding vectors in this plane. A) The egorotation distribution at 10 ms before the onset of evasive maneuvers. B) The future egorotation at 10 ms after the initiation of evasive maneuvers. C) Similar to B, but for the egorotation at 20 ms into the evasive maneuver; here, most of the banked turns slow down and counter-banked turns start. D) Similar to B, but for the egorotation at 30 ms into the evasive maneuver; this corresponds to the start of the counter-banked turn. E) Similar to B, but for the egorotations a fly experiences at the end of the evasive maneuver; this corresponds to the slowing of the counter-banked turn and the completion of the evasive maneuver. All of these egorotation distributions have comparable entropies of ∼4 − 4.3 bits. F) The Jensen–Shannon divergence between the past egorotation distribution and the egorotation distributions at Δt = 10, 20, 30, 40 ms of the evasive maneuvers, respectively.

(TIF)

S3 Fig. Linear correlation between the past egorotation θpast (at t = −10 ms, before the start of evasive maneuvers) and the future egorotation θfuture (at t = 10 ms, 20 ms, 30 ms, 40 ms) at different time lags in the evasive maneuver.

These egorotations are calculated as the axis of rotation, combining the rotational angles along both roll and pitch body axes. A) The correlation between the egorotation distribution at 10ms before the onset of evasive maneuvers and the egorotation distribution 10ms into evasive maneuvers. B) Similar to A), but for the egorotation distribution 20ms into evasive maneuvers. C) Similar to A), the egorotation distribution 30ms into evasive maneuvers. D) Similar to A), the egorotation distribution 40ms into evasive maneuvers.

(TIF)

S4 Fig. Scatterplots of the 120 triplets in A) the IfutureI-Ipast plane; B) the Ifutureθ-Ipast plane.

(TIF)

S5 Fig. How much a triplet-based encoding retains from the past input vs. how much of that information is about the future stimulus (out of the information about its own future input), for all 120 possible triplets.

The particular VS 5-6-7 triplet (shown by the red circle and arrow), which connects with the neck motor center, is one of the most efficient in terms of the fraction of its prediction of its own input that is about the future stimulus, while its encoding cost Ipast is modest.

(TIF)

S6 Fig. Network schematic for the variational approximation of the information bottleneck solution (VIB).

By constructing a variational approximation, the encoder learns a latent representation z from the past VS voltages. For training the encoder, we first project the axonal voltages of the 20 VS cells onto 200 intermediate filters, followed by a batch normalization layer. We then learn a latent representation of dimension 2 for easy visualization. A decoder with the same structure as the encoder then generates samples from z and reads them out as the future input current to the VS network. Note that the VS network does not have direct access to the stimulus; it uses the correlations between its past and future inputs induced by the stimulus as a proxy for the stimulus correlations themselves. z follows a Gaussian distribution with parameters μ and Σ. During training of this VIB, the mean μ and covariance matrix Σ of z map the axonal voltages of the VS cells to the future input. When the VIB succeeds, we obtain the predictive representation of the future stimulus by projecting the respective axonal voltages into the latent feature space of z.

(TIF)

S7 Fig. Predictive information for the future stimulus 10ms after the evasive maneuver starts (Δt = 10ms).

The red bar shows that the PCA projection onto the first 2 PCs of the input current contains almost all of the stimulus information available in the input current itself. We use this PCA projection to ask whether it is possible to disentangle input stimuli from different quadrants using prediction in Fig 5. The green bar shows the limit on predictive information, based on the information bottleneck method; it corresponds to the point on the information curve at the given compression in Fig 4B. The cyan bar corresponds to the predictive information about the future stimulus using the outputs of all VS cells. The darker-colored region shows how much information the corresponding VIB captures about the future stimulus. The purple bar is similar to the cyan bar, but for the predictive encodings of the VS 5-6-7 triplet vs. their respective VIB solution.

(TIF)

S8 Fig. The input to the VS network only supports local discrimination.

A) The representation of 8 randomly selected stimuli within the plane whose dimensions are the first two principal components of the input currents. Note that there is substantial overlap between clusters: e.g., the light-green cluster lies almost on top of the dark-red/dark-blue clusters. B) A subset of 4 stimuli from A. The only difference compared to A is that all of these stimuli have the same pitch/roll directions (clockwise roll and upward pitch, i.e., they are all within the 1st quadrant of the fly’s coronal plane).

(TIF)

S9 Fig. The predictive information encoded by the VS network preferentially discriminates nearby egorotations.

A) The predictive representation of stimuli at 37° and 56° obtained by mapping the respective axonal voltages of the entire VS network to the latent feature space generated by the VIB. B) Similar to A, but using the VS 5-6-7 triplet as input. C) The predictive representation of two stimuli that are much closer in stimulus space: 56° and 67°, respectively. Note that there is no overlap between these two nearby stimuli whereas there is some overlap for stimuli that are farther apart (shown in A). D) Similar to C, but using the VS 5-6-7 triplet as input.

(TIF)

Acknowledgments

We thank D. Allan Drummond for graphical design of the 3D fly illustrations in Fig 1.

Data Availability

This paper is a theoretical work and does not contain experimental data. All parameters and open-source software packages required to reproduce our simulations and results are specified in the Materials and methods section. We have prepared a GitHub repository, https://github.com/siwei-wang/VS_pred, containing all intermediate data and code used to generate the figures in the manuscript.

Funding Statement

This work was supported in part by the Gatsby Charitable Foundation (https://www.gatsby.org.uk/neuroscience) (SW, IS) and by the Max Planck Hebrew University Center for Sensory Processing of the Brain in Action (https://www.mpg.de/7021540/hebrew_uni_center_Jerusalem) (SW, IS, AB). The work was also supported by the National Science Foundation, both via CAREER award 1652617 (SEP, SW) and through the Center for the Physics of Biological Function (PHY-1734030) (SEP), and by the National Institutes of Health BRAIN-R01 EB026943 (SEP, SW). The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Laughlin S. A simple coding procedure enhances a neuron’s information capacity. Zeitschrift fur Naturforschung Section C, Biosciences. 1981;36:910–912. 10.1515/znc-1981-9-1040 [DOI] [PubMed] [Google Scholar]
  • 2. Reliability and statistical efficiency of a blowfly movement-sensitive neuron. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 1995;348(1325):321–340. 10.1098/rstb.1995.0071 [DOI] [Google Scholar]
  • 3. Card G, Dickinson M. Visually mediated motor planning in the escape response of Drosophila. Current Biology. 2008;18(17):1300–7. 10.1016/j.cub.2008.07.094 [DOI] [PubMed] [Google Scholar]
  • 4. Land M, Collett T. Chasing behaviour of houseflies (Fannia canicularis). Journal of Comparative Physiology. 1974;89:331–357. [Google Scholar]
  • 5. Fabian ST, Sumner ME, Wardill TJ, Rossoni S, Gonzalez-Bellido PT. Interception by two predatory fly species is explained by a proportional navigation feedback controller. Journal of The Royal Society Interface. 2018;15(147):20180466. 10.1098/rsif.2018.0466 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Muijres FT, Elzinga MJ, Iwasaki NA, Dickinson MH. Body saccades of Drosophila consist of stereotyped banked turns. The Journal of experimental biology. 2015;218:864–875. 10.1242/jeb.114280 [DOI] [PubMed] [Google Scholar]
  • 7. Catania KC. Tentacled snakes turn C-starts to their advantage and predict future prey behavior. Proceedings of the National Academy of Sciences. 2009;106(27):11183–11187. 10.1073/pnas.0905183106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Buddenbrock Wv. Die vermutliche Losung der Halternfrage. Pfugers Arch. 1919;175:125–164. 10.1007/BF01722145 [DOI] [Google Scholar]
  • 9. Derham W. Physico-Theology. W. & J. Innys; 1714.
  • 10. Ristroph L, Bergou AJ, Ristroph G, Coumes K, Berman GJ, Guckenheimer J, et al. Discovering the flight autostabilizer of fruit flies by inducing aerial stumbles. Proceedings of the National Academy of Sciences. 2010;107(11):4820–4824. 10.1073/pnas.1000615107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Bergou AJ, Ristroph L, Guckenheimer J, Cohen I, Wang ZJ. Fruit flies modulate passive wing pitching to generate in-flight turns. Physical review letters. 2010;104:148101. 10.1103/PhysRevLett.104.148101 [DOI] [PubMed] [Google Scholar]
  • 12. Chan WP, Prete F, Dickinson MH. Visual input to the efferent control system of a fly’s “gyroscope”. Science (New York, NY). 1998;280:289–292. 10.1126/science.280.5361.289 [DOI] [PubMed] [Google Scholar]
  • 13. Dickerson BH, de Souza AM, Huda A, Dickinson MH. Flies Regulate Wing Motion via Active Control of a Dual-Function Gyroscope. Current Biology. 2019;29(20):3517–3524. 10.1016/j.cub.2019.08.065 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Hengstenberg R. Mechanosensory control of compensatory head roll during flight in the blowflyCalliphora erythrocephala Meig. Journal of Comparative Physiology A. 1988;163(2):151–165. 10.1007/BF00612425 [DOI] [Google Scholar]
  • 15. Sherman A. A comparison of visual and haltere-mediated equilibrium reflexes in the fruit fly Drosophila melanogaster. Journal of Experimental Biology. 2003;206(2):295–302. 10.1242/jeb.00075 [DOI] [PubMed] [Google Scholar]
  • 16. Sherman A. Summation of visual and mechanosensory feedback in Drosophila flight control. Journal of Experimental Biology. 2004;207(1):133–142. 10.1242/jeb.00731 [DOI] [PubMed] [Google Scholar]
  • 17. Kim AJ, Fenk LM, Lyu C, Maimon G. Quantitative Predictions Orchestrate Visual Signaling in Drosophila. Cell. 2017;168(1-2):280–294.e12. 10.1016/j.cell.2016.12.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Palmer S, Marre O, Berry M, Bialek W. Predictive information in a sensory population. Proc Natl Acad Sci USA. 2015;112:6908–6913. 10.1073/pnas.1506855112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Chalk M, Marre O, Tkačik G. Toward a unified theory of efficient, predictive, and sparse coding. Proceedings of the National Academy of Sciences. 2017;115(1):186–191. 10.1073/pnas.1711114115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Berman G, Bialek W, Shaevitz J. Predictability and hierarchy in Drosophila behavior. Proc Natl Acad Sci USA. 2016;113:11943–11948. 10.1073/pnas.1607601113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Muijres F, Elzinga M, Melis J, Dickinson M. Flies evade looming targets by executing rapid visually directed banked turns. Science. 2014;344:172–177. 10.1126/science.1248955 [DOI] [PubMed] [Google Scholar]
  • 22. Dickinson MH. Death Valley,Drosophila, and the Devonian Toolkit. Annual Review of Entomology. 2014;59(1):51–72. 10.1146/annurev-ento-011613-162041 [DOI] [PubMed] [Google Scholar]
  • 23. Schilstra Hateren. Blowfly flight and optic flow. I. Thorax kinematics and flight dynamics. The Journal of experimental biology. 1999;202 (Pt 11):1481–1490. 10.1242/jeb.202.11.1481 [DOI] [PubMed] [Google Scholar]
  • 24. Hateren Schilstra. Blowfly flight and optic flow. II. Head movements during flight. The Journal of experimental biology. 1999;202 (Pt 11):1491–1500. 10.1242/jeb.202.11.1491 [DOI] [PubMed] [Google Scholar]
  • 25. Pringle JWS. The gyroscopic mechanism of the halteres of Diptera. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences. 1948;233(602):347–384. [Google Scholar]
  • 26. Lehmann FO, Dickinson MH. The control of wing kinematics and flight forces in fruit flies (Drosophila spp.). Journal of Experimental Biology. 1998;201(3):385–401. 10.1242/jeb.201.3.385 [DOI] [PubMed] [Google Scholar]
  • 27. Krapp HG, Hengstenberg R. Estimation of self-motion by optic flow processing in single visual interneurons. Nature. 1996;384(6608):463–466. 10.1038/384463a0 [DOI] [PubMed] [Google Scholar]
  • 28. Haag J, Borst A. Dendro-dendritic interactions between motion-sensitive large-field neurons in the fly. The Journal of neuroscience: the official journal of the Society for Neuroscience. 2002;22:3227–3233. 10.1523/JNEUROSCI.22-08-03227.2002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Mauss AS, Meier M, Serbe E, Borst A. Optogenetic and pharmacologic dissection of feedforward inhibition in Drosophila motion vision. The Journal of neuroscience: the official journal of the Society for Neuroscience. 2014;34:2254–2263. 10.1523/JNEUROSCI.3938-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Cuntz H, Forstner F, Schnell B, Ammer G, Raghu SV, Borst A. Preserving Neural Function under Extreme Scaling. PLoS ONE. 2013;8(8):e71540. 10.1371/journal.pone.0071540 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Gnatzy W, Grunert U, Bender M. Campaniform sensilla of Calliphora vicina (Insecta, Diptera). Zoomorphology. 1987;106(5):312–319. 10.1007/BF00312005 [DOI] [Google Scholar]
  • 32. Chevalier RL. The fine structure of campaniform sensilla on the halteres ofDrosophila melanogaster. Journal of Morphology. 1969;128(4):443–463. 10.1002/jmor.1051280405 [DOI] [Google Scholar]
  • 33. Smith DS. The fine structure of haltere sensilla in the blowfly Calliphora erythrocephala (Meig.), with scanning electron microscopic observations on the haltere surface. Tissue and Cell. 1969;1(3):443–484. 10.1016/S0040-8166(69)80016-9 [DOI] [PubMed] [Google Scholar]
  • 34. Agrawal S, Grimaldi D, Fox JL. Haltere morphology and campaniform sensilla arrangement across Diptera. Arthropod Structure & Development. 2017;46(2):215–229. 10.1016/j.asd.2017.01.005 [DOI] [PubMed] [Google Scholar]
  • 35. Toh Y. Structure of campaniform sensilla on the haltere ofDrosophila prepared by cryofixation. Journal of Ultrastructure Research. 1985;93(1-2):92–100. 10.1016/0889-1605(85)90089-8 [DOI] [Google Scholar]
  • 36. Chan WP, Dickinson MH. Position-specific central projections of mechanosensory neurons on the haltere of the blow fly, Calliphora vicina. Journal of Comparative Neurology. 1996;369(3):405–418. [DOI] [PubMed] [Google Scholar]
  • 37. Cuntz H, Haag J, Forstner F, Segev I, Borst A. Robust coding of flow-field parameters by axo-axonal gap junctions between fly visual interneurons. Proc Natl Acad Sci USA. 2007;104:10229–10233. 10.1073/pnas.0703697104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Weber F, Eichner H, Cuntz H, Borst A. Eigenanalysis of a neural network for optic flow processing. New Journal of Physics. 2008;10:015013. 10.1088/1367-2630/10/1/015013 [DOI] [Google Scholar]
  • 39. Borst A, Weber F. Neural action fields for optic flow based navigation: a simulation study of the fly lobula plate network. PLoS One. 2011;6(1):e16303. 10.1371/journal.pone.0016303 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Buschbeck K, Strausfeld N. The relevance of neural architecture to visual performance: phylogenetic conservation and variation in Dipteran visual systems. J Comp Neurol. 1997;383(3):282–304. [PubMed] [Google Scholar]
  • 41. Creamer MS, Mano O, Clark DA. Visual Control of Walking Speed in Drosophila. Neuron. 2018;100(6):1460–1473.e6. 10.1016/j.neuron.2018.10.028 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Haag J, Denk W, Borst A. Fly motion vision is based on Reichardt detectors regardless of the signal-to-noise ratio. Proceedings of the National Academy of Sciences. 2004;101(46):16333–16338. 10.1073/pnas.0407368101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Reisenman C, Haag J, Borst A. Adaptation of response transients in fly motion vision. I: Experiments. Vision Research. 2003;43(11):1293–1309. 10.1016/S0042-6989(03)00091-9 [DOI] [PubMed] [Google Scholar]
  • 44. Borst A, Reisenman C, Haag J. Adaptation of response transients in fly motion vision. II: Model studies. Vision research. 2003;43:1309–1322. 10.1016/S0042-6989(03)00092-0 [DOI] [PubMed] [Google Scholar]
  • 45. Trousdale J, Carroll S, Gabbiani F, Josi K. Near-optimal decoding of transient stimuli from coupled neuronal subpopulations. JNeurosci. 2014;34:12206–12222. 10.1523/JNEUROSCI.2671-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Wang S, Borst A, Zaslavsky N, Tishby N, Segev I. Efficient encoding of motion is mediated by gap junctions in the fly visual system. PLoS Comput Biol. 2017;13(12):e1005846. 10.1371/journal.pcbi.1005846 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Elyada YM, Haag J, Borst A. Different receptive fields in axons and dendrites underlie robust coding in motion-sensitive neurons. Nature neuroscience. 2009;12:327–332. 10.1038/nn.2269 [DOI] [PubMed] [Google Scholar]
  • 48. Borst A. Fly visual course control: behaviour, algorithms and circuits. Nature Reviews Neuroscience. 2014;15(9):590–599. 10.1038/nrn3799 [DOI] [PubMed] [Google Scholar]
  • 49. Borst A, Haag J, Mauss AS. How fly neurons compute the direction of visual motion. Journal of Comparative Physiology A. 2019; p. 1–16. 10.1007/s00359-019-01375-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Borst A, Haag J. The intrinsic electrophysiological characteristics of fly lobula plate tangential cells: I. Passive membrane properties. Journal of computational neuroscience. 1996;3:313–336. 10.1007/BF00161091 [DOI] [PubMed] [Google Scholar]
  • 51. Haag J, Borst A. Neural mechanism underlying complex receptive field properties of motion sensitive interneurons. Nat Neurosci. 2004;7:628–634. 10.1038/nn1245 [DOI] [PubMed] [Google Scholar]
  • 52. Haag J, Borst A. Dye-coupling visualizes networks of large-field motion-sensitive neurons in the fly. Journal of Comparative Physiology A. 2005;191(5):445–454. 10.1007/s00359-005-0605-0 [DOI] [PubMed] [Google Scholar]
  • 53. Kennedy A, Wayne G, Kaifosh P, Alviña K, Abbott L, Sawtell NB. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nature neuroscience. 2014;17(3):416. 10.1038/nn.3650 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Tu MS, Dickinson MH. The control of wing kinematics by two steering muscles of the blowfly (Calliphora vicina). Journal of comparative physiology A, Sensory, neural, and behavioral physiology. 1996;178:813–830. [DOI] [PubMed] [Google Scholar]
  • 55. Heide G, Götz KG. Optomotor control of course and altitude in Drosophila melanogaster is correlated with distinct activities of at least three pairs of flight steering muscles. The Journal of experimental biology. 1996;199:1711–1726. 10.1242/jeb.199.8.1711 [DOI] [PubMed] [Google Scholar]
  • 56. Lindsay T, Sustar A, Dickinson M. The Function and Organization of the Motor System Controlling Flight Maneuvers in Flies. Current Biology. 2017;27(3):345–358. 10.1016/j.cub.2016.12.018 [DOI] [PubMed] [Google Scholar]
  • 57. Dickinson M, Muijres F. the aerodynamics and control of free flight manoeuvres in Drosophila. Phil Trans R Soc. 2016;371. 10.1098/rstb.2015.0388 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Hengstenberg R. Gaze control in the blowfly Calliphora: a multisensory, two-stage integration process. Seminars in Neuroscience. 1991;3(1):19–29. 10.1016/1044-5765(91)90063-T [DOI] [Google Scholar]
  • 59. Göpfert MC, Robert D. The mechanical basis of Drosophila audition. The Journal of experimental biology. 2002;205:1199–1208. 10.1242/jeb.205.9.1199 [DOI] [PubMed] [Google Scholar]
  • 60. Parsons MM, Krapp HG, Laughlin SB. A motion-sensitive neurone responds to signals from the two visual systems of the blowfly, the compound eyes and ocelli. Journal of Experimental Biology. 2006;209(22):4464–4474. 10.1242/jeb.02560 [DOI] [PubMed] [Google Scholar]
  • 61. Budick SA, Reiser MB, Dickinson MH. The role of visual and mechanosensory cues in structuring forward flight in Drosophila melanogaster. Journal of Experimental Biology. 2007;210(23):4092–4103. 10.1242/jeb.006502 [DOI] [PubMed] [Google Scholar]
  • 62. Krapp HG. Ocelli. Current Biology. 2009;19(11):R435–R437. 10.1016/j.cub.2009.03.034 [DOI] [PubMed] [Google Scholar]
  • 63. Sane SP, Dieudonne A, Willis MA, Daniel TL. Antennal Mechanosensors Mediate Flight Control in Moths. Science. 2007;315(5813):863–866. 10.1126/science.1133598 [DOI] [PubMed] [Google Scholar]
  • 64. Eberle AL, Dickerson BH, Reinhall PG, Daniel TL. A new twist on gyroscopic sensing: body rotations lead to torsion in flapping, flexing insect wings. Journal of the Royal Society, Interface. 2015;12:20141088. 10.1098/rsif.2014.1088 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. van Hateren J. A theory of maximizing sensory information. Biol Cybern. 1992;68:23–29. 10.1007/BF00203134 [DOI] [PubMed] [Google Scholar]
  • 66. Mauss AS, Pankova K, Arenz A, Nern A, Rubin GM, Borst A. Neural Circuit to Integrate Opposing Motions in the Visual Field. Cell. 2015;162:351–362. 10.1016/j.cell.2015.06.035 [DOI] [PubMed] [Google Scholar]
  • 67. Drews MS, Leonhardt A, Pirogova N, Richter FG, Schuetzenberger A, Braun L, et al. Dynamic Signal Compression for Robust Motion Vision in Flies. Current Biology. 2020;30(2):209–221.e8. 10.1016/j.cub.2019.10.035 [DOI] [PubMed] [Google Scholar]
  • 68. Matulis CA, Chen J, Gonzalez-Suarez AD, Behnia R, Clark DA. Heterogeneous Temporal Contrast Adaptation in Drosophila Direction-Selective Circuits. Current biology: CB. 2020;30:222–236.e6. 10.1016/j.cub.2019.11.077 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Wertz A, Gaub B, Plett J, Haag J, Borst A. Robust coding of ego-motion in descending neurons of the fly. Journal of Neuroscience. 2009;29(47):14993–15000. 10.1523/JNEUROSCI.3786-09.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, et al. Neural basis for looming size and velocity encoding in the Drosophila giant fiber escape pathway. Current Biology. 2019;29(6):1073–1081. 10.1016/j.cub.2019.01.079 [DOI] [PubMed] [Google Scholar]
  • 71. Klapoetke NC, Nern A, Peek MY, Rogers EM, Breads P, Rubin GM, et al. Ultra-selective looming detection from radial motion opponency. Nature. 2017;551(7679):237–241. 10.1038/nature24626 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Haag J, Borst A. Reciprocal inhibitory connections within a neural network for rotational optic-flow processing. Frontiers in neuroscience. 2007;1:111–121. 10.3389/neuro.01.1.1.008.2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Elyada Y, Haag J, Borst A. Different receptive fields in axons and dendrites underlie robust coding in motion-sensitive neurons. Nat Neurosci. 2009;12(3):327–332. 10.1038/nn.2269 [DOI] [PubMed] [Google Scholar]
  • 74. Single S, Borst A. Dendritic Integration and Its Role in Computing Image Velocity. Science. 1998;281(5384):1848–1850. 10.1126/science.281.5384.1848 [DOI] [PubMed] [Google Scholar]
  • 75. Haag J, Wertz A, Borst A. Integration of Lobula Plate Output Signals by DNOVS1, an Identified Premotor Descending Neuron. Journal of Neuroscience. 2007;27(8):1992–2000. 10.1523/JNEUROSCI.4393-06.2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Tishby N, Pereira FC, Bialek W. The Information Bottleneck Method; 1999.
  • 77. Cuntz H, Borst A, Segev I. Optimization principles of dendritic structure. Theor Biol Med Model. 2007;4:21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Alemi A, Fischer I, Dillon J, Murphy K. Deep Variational Information Bottleneck. In: International Conference on Learning Representations (ICLR); 2017.
  • 79. Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In: International Conference on Learning Representations (ICLR); 2017.
  • 80. Chalk M, Marre O, Tkacik G. Relevant sparse codes with variational information bottleneck; 2016.
  • 81. Kingma DP, Welling M. Auto-Encoding Variational Bayes; 2013.
  • 82. Gronenberg W, Strausfeld NJ. Descending pathways connecting the male-specific visual system of flies to the neck and flight motor. Journal of comparative physiology A, Sensory, neural, and behavioral physiology. 1991;169:413–426. [DOI] [PubMed] [Google Scholar]
  • 83. Gronenberg W, Strausfeld NJ. Premotor descending neurons responding selectively to local visual stimuli in flies. The Journal of Comparative Neurology. 1992;316(1):87–103. 10.1002/cne.903160108 [DOI] [PubMed] [Google Scholar]
  • 84. Douglass JK, Strausfeld NJ. Anatomical organization of retinotopic motion-sensitive pathways in the optic lobes of flies. Microscopy Research and Technique. 2003;62(2):132–150. 10.1002/jemt.10367 [DOI] [PubMed] [Google Scholar]
  • 85. Strausfeld NJ, Bassemir UK. The organization of giant horizontal-motion-sensitive neurons and their synaptic relationships in the lateral deutocerebrum of Calliphora erythrocephala and Musca domestica. Cell and Tissue Research. 1985;242(3). 10.1007/BF00225419 [DOI] [Google Scholar]
  • 86. Strausfeld NJ, Seyan HS. Convergence of visual, haltere, and prosternal inputs at neck motor neurons of Calliphora erythrocephala. Cell and Tissue Research. 1985;240(3):601–615. 10.1007/BF00216350 [DOI] [Google Scholar]
  • 87. Strausfeld NJ, Seyan HS, Milde JJ. The neck motor system of the fly Calliphora erythrocephala. Journal of Comparative Physiology A. 1987;160(2):205–224. 10.1007/BF00609727 [DOI] [Google Scholar]
  • 88. Dickerson BH. Timing precision in fly flight control: integrating mechanosensory input with muscle physiology. Proceedings of the Royal Society B: Biological Sciences. 2020;287(1941):20201774. 10.1098/rspb.2020.1774 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Lehmann FO, Gotz KG. Activation phase ensures kinematic efficacy in flight-steering muscles of Drosophila melanogaster. Journal of Comparative Physiology A. 1996;179(3). 10.1007/BF00194985 [DOI] [PubMed] [Google Scholar]
  • 90. Wertz A, Borst A, Haag J. Nonlinear integration of binocular optic flow by DNOVS2, a descending neuron of the fly. The Journal of neuroscience: the official journal of the Society for Neuroscience. 2008;28:3131–3140. 10.1523/JNEUROSCI.5460-07.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91. Wertz A, Haag J, Borst A. Local and global motion preferences in descending neurons of the fly. Journal of comparative physiology A, Neuroethology, sensory, neural, and behavioral physiology. 2009;195:1107–1120. 10.1007/s00359-009-0481-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Connors B. Synchrony and so Much More: Diverse Roles for electrical Synapses in Neural Circuits. Dev Neurobiol. 2017;77(5):610–624. 10.1002/dneu.22493 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Marder E. Electrical synapses: Beyond speed and synchrony to computation. Current Biology. 1998;8:R795–R797. 10.1016/S0960-9822(07)00502-7 [DOI] [PubMed] [Google Scholar]
  • 94. Sederberg A, MacLean J, Palmer S. Learning to make external sensory stimulus predictions using internal correlations in populations of neurons. Proc Natl Acad Sci USA. 2018;115(5):1105–1110. 10.1073/pnas.1710779115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95. Trenholm S, Schwab DJ, Balasubramanian V, Awatramani GB. Lag normalization in an electrically coupled neural network. Nature neuroscience. 2013;16:154–156. 10.1038/nn.3308 [DOI] [PubMed] [Google Scholar]
  • 96. Ala-Laurila P, Greschner M, Chichilnisky E, Rieke F. Cone photoreceptor contributions to noise and correlation in the retinal output. Nat Neurosci. 2011;14:1309–1316. 10.1038/nn.2927 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Jacoby J, Nath A, Jessen Z, Schwartz G. A Self-Regulating Gap Junction Network of Amacrine Cells Controls Nitric Oxide Release in the Retina. Neuron. 2018;100(5):1149–1162. 10.1016/j.neuron.2018.09.047 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Heeger D. Theory of cortical function. Proc Natl Acad Sci USA. 2017;114(8):1773–1782. 10.1073/pnas.1619788114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Fitzgerald JE, Clark DA. Nonlinear circuits for naturalistic visual motion estimation. eLife. 2015;4. 10.7554/eLife.09123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100. Blahut R. Computation of Channel Capacity and Rate-distortion Functions. IEEE Trans Inf Theor. 1972;18(4):460–473. 10.1109/TIT.1972.1054855 [DOI] [Google Scholar]
  • 101. Kraskov A, Stogbauer H, Grassberger P. Estimating mutual information. Phys Rev E. 2004;69:066138. 10.1103/PhysRevE.69.066138 [DOI] [PubMed] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008965.r001

Decision Letter 0

Lyle J Graham

9 Nov 2020

Dear Dr Palmer,

Thank you very much for submitting your manuscript "Maximally efficient prediction in the early fly visual system may support evasive flight maneuvers" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lyle Graham

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Wang and colleagues present an analysis of numerical simulations of a realistic network of direction-selective VS cells in flies. The network is presented with the visual stimuli encountered during evasive maneuvers recorded in flying Drosophila, using a cube-geometry of natural images as visual inputs. The goal of this study is to understand whether signals in this network of neurons are predictive of future signals, during the evasive maneuver, such that they could be used for control during such fast maneuvers (typically on the order of 40 ms). The analyses appeared rigorous and well-done, but I thought some conceptual points could use clarification.

Major points

1) Line 46: Active control needs more definition. I interpreted it to mean on-going control during the evasive maneuver, rather than the motor activity stemming from a one-time command, and I’m not sure that the subsequent reasoning in this paragraph all supports that interpretation, since a one-time command could create points 1 and 4. Point 3 is just that all flight required on-going control, but not necessarily visual control. (And control on what timescale? Per wing-beat?) Point 2 seems most relevant, if flies update their escape maneuver in response to continuing changes in the looming stimulus during their maneuver. (Use of “active control” again in line 94).

2) Related to the point above: This whole analysis asks how present visual information can be used to predict future visual information, but if it’s being used to do this, isn’t that a forward model for control, rather than active control? This hinges on the author’s definition of “active control”, but it seems like ‘active control’ should be distinct from the kind of control that employs a forward model of what’s going to happen. Does the framework used here – in which current signals are informative about future ones -- differ from a forward model?

3) The authors note that these predictions are only possible because of correlations (not necessarily low-order) in the trajectory during these evasive maneuvers. A vanilla explanation of these results would be that simple, second-order trajectory correlations account for a good fraction of trajectory variance, and the VS network predicts the future well because it’s a good representation of the present. Can one ask how much variance in future state is explained by current trajectory? I guess I’m asking about whether some of this analysis can also be done using simple representations of temporal co-variance, and how different the results are when more sophisticated, information theoretic methods are used.

4) Input currents to VS cells are critical to define well, given their importance in this study: they were nowhere defined that I could see, though might be in a referenced paper. For instance, I expect these currents to include excitatory and inhibitory inputs from the LMD model, but do they allow those conductances to shunt current? For instance, if both excitatory and inhibitory conductances are high, is the current their sum, or an appropriately weighted sum reflecting the VS membrane voltage? Does it make a difference?

5) Related to the point above: how much do these results depend on the timescale of filtering of the local motion detectors? To obtain temporal frequency tuning of ~1 Hz (as in typical VS recordings), the delay line in any motion detector must do some substantial filtering over time, say 150 ms, which would cause reasonably long autocorrelations in the velocity signals. How does this affect the ability of this network to encode predicted changes in the future signal on timescales of the 40 ms maneuver? I can’t quite see how a long time delay in the motion detector could work with these short, fast maneuvers.

Minor points

1) Line 87: Not sure how an active counter-rotation differs from a counter-rotation.

2) Line 146: Citation for TF-tuning of local motion detectors only references theory for and recordings from an LPTC, which has spatially integrated, opponent-subtracted signals from local motion detectors. Creamer et al. 2018 showed that individual local motion detectors in Drosophila also have this tuning (before opponent subtraction).

3) Line 154: Add citations for LPLC2 and LC2. I don’t think the authors mean LC2 – it should be LC4. This will be Klapoetke et al. and Ache et al.

4) Fig 1C: Mercator projection seems to only show 180 degrees of azimuth. Perhaps scales would be helpful if a full 360 is not being shown?

5) Line 301: Talking about contrast heterogeneity. Not necessarily required for this model, but might be important to mention recent work in Drosophila on contrast normalization in the LMDs preceding LPTCs (Drews et al. and Matulis et al.). These sorts of effects almost certainly also exist in bigger Diptera.

6) Is the result in Figure 2 just due to averaging over space? Figure S3 seems to argue strongly against that, and the authors might consider moving that to a main figure.

7) Figure 3: Is it worth showing all triplets, rather than the cross bar? Would give a better sense of how many of them lie close to or far from the limit. Same with Figure 4B.

8) Since this is all information theoretic, is there a proposed method for the read out of the future information from the current state? Or just that it’s there?

9) Figure 5: numbers representing the angles are pretty unreadable. Please enlarge.

10) Figure 5BD: I don’t understand why these two VIB dimensions are so highly correlated. Does this mean a single dimension could do the VIB encoding in the case of this triplet?

11) Line 589: capitalize Drosophila

12) Line 634: need equation for local motion detection.

13) Line 638: reference V_past on both sides of equation, T is undefined.

14) Line 667: variable k appears undefined. Some capitalizations required in this paragraph.

15) Fig S2: Only 2 labeled panels, but don’t match the figure caption.

Reviewer #2: In this interesting paper the authors demonstrate that the VS network in the fly might be optimized to encode relevant information about future behavior during evasive flight maneuvers. This work is a skillful combination of a broad range of approaches - natural stimulus and behavior statistics, detailed biophysical modelling and information theory. The paper generates novel experimental hypotheses, and provides a link between theoretical principles of efficient coding and predictive information and natural behavior. Overall, I think this work could be of potentially broad interest and relevance.

I have, however, some concerns which I think the authors should address before acceptance.

1 - In reality, evasive maneuvers are triggered by a specific object in the visual field (e.g. an obstacle or a predator). The naturalistic scenes generated by the authors do not have, however, such visual obstacles matched to the behavior - they are just images from the van Hateren database. That is, the statistical structure of the scene and the evasive behavior are independent of each other. How does this affect the interpretation of the results? Could it be that after matching visual scene content to the behavior the ratio of I(theta, V) / S(theta) (Fig. 1B) would be much closer to 1? I think this point is central to the argument of the paper and should be explicitly discussed (and/or supported with additional analysis).

2 - The authors highlight the importance of the gap junctions (GJ) as a key biophysical component necessary to encode the predictive information (e.g. Fig. 2). These gap-junctions are a subset of parameters of the VS model. How many other parameters does the model have? Are there other parameters which might dramatically affect the network performance other than GJs? In other words - what makes GJs a unique subset of parameters from the perspective of predictive information coding?

3 - The idea of encoding the information about the organism's own future behavior is interesting, but could be perhaps better discussed. From a more "control-theoretic" perspective, the aim would be to encode the stimulus, incorporate it into the model of the environment, and then generate the action which maximizes the probability of obstacle avoidance, given the fly's current belief (or prediction) about the environment. Can one think of encoding the predictive information as extracting bits relevant only for such "control-theoretic" planning? I think it could be better explained and positioned in the context of the current literature.

4 - If I understood correctly, at best, predictive information is only around 50% of the entropy of the body rotation (Fig. 1B). If so, then to avoid the obstacle the fly still needs a lot of information. Where does it come from? How can partial information be used to perform the maneuver much earlier? It would be good to discuss these aspects explicitly.

5 - The authors should dedicate more space to explaining the relationship of this work to their previous paper [57]. In particular, if [57] claims that almost the entire information about the rotation theta_t is encoded in the VS network state, how substantial is the current advancement? After all, efficient coding and coding of predictive information will start to diverge when the bottleneck is strong (i.e. the network can retain only a small proportion of bits from the input). Even if my understanding of [57] is incorrect, this should be much more explicitly discussed.

6 - The article could be much more clear, and would definitely benefit from some streamlining. This work is a synergy of multiple approaches and research traditions - which is its strength. It however combines technical wording and explanations which make it confusing to readers who are not experts in all these fields (and I am clearly a member of this club). My specific suggestions are:

Shorten the introduction - it is very long and it is hard to understand what the main contributions of the paper are. Many parts of it could be moved to the results (e.g. the description of the VS network).

There are very many information quantities with a lot of confusing indices. It took me a while to map them all out, and I'm still not certain I did it right. A clear diagram in Fig 1 explaining the relationship between V and I and I(V_past; I_past), etc. would be a great help. Fig 1E does not seem to be enough.

Improve Fig 1A - I find it very confusing - e.g. what does the dashed vertical arrow correspond to? Is it a process which happens instantaneously? Is all of the vertical dimension time?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: As far as I can tell, the numerical data underlying graphs is not available in spreadsheet form as supporting information. Although the authors reference lots of software packages to account for their simulations and fits, this work would be most reproducible if the analysis code were provided, perhaps also with intermediate data (like the output of simulations).

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008965.r003

Decision Letter 1

Lyle J Graham

20 Feb 2021

Dear Dr Palmer,

Thank you very much for submitting your manuscript "Maximally efficient prediction in the early fly visual system may support evasive flight maneuvers" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic and your revisions. I am saying minor revision based on Reviewer 1's comments, which are truly minor, so I won't send it back out for review again.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lyle J. Graham

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this revision, the authors have addressed well all the points I brought up. I think the revisions to the introduction improved clarity and I found the notation clearer as well. I'm also glad they will provide their code, since I think that will benefit the community.

A few minor notes:

Line 62: point 1 is a run-on sentence.

Line 55: I think it would be good to add a few citations to back up this claim about previous work.

Figure S3 could use x and y axis labels and units. I’m guessing these are in degrees per second. Is this roll rotation or yaw or total? Which rotation types are quantified/plotted is something that could be clarified throughout.

Line 285-288: Text does not quite match the authors’ description of it in their response to minor point 5, since it cites only Drews, not Matulis.

Line 561: I’ve only ever heard these referred to as ‘campaniform sensilla’, never ‘campaniforms’.

Line 570: typo “the”?

Line 630: Capitalize?

Line 736: “Blahut-Arimoto” should be capitalized here and later in paragraph.

Reviewer #2: I thank the Authors for their response and modifications of the manuscript. In particular, I appreciate streamlining of the Introduction and the simplified notation of information quantities. The text is now much easier to understand (at least from my perspective).

I would encourage the Authors to include in the text some variant of their response to my question about the independence of the visual scene and the shape of the evasive trajectory (first question in the previous review). After all, this study connects statistics of stimuli to behavioral control, and many readers may wonder whether there is a link between the image of the obstacle/threat and the evasive maneuver.

Other than that, I think that the manuscript has now improved and will be of interest to a broad audience in computational neuroscience. I recommend it for publication, and congratulate the Authors.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No


References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008965.r005

Decision Letter 2

Lyle J Graham

13 Apr 2021

Dear Dr Palmer,

We are pleased to inform you that your manuscript 'Maximally efficient prediction in the early fly visual system may support evasive flight maneuvers' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Lyle J. Graham

Deputy Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008965.r006

Acceptance letter

Lyle J Graham

30 Apr 2021

PCOMPBIOL-D-20-01719R2

Maximally efficient prediction in the early fly visual system may support evasive flight maneuvers

Dear Dr Palmer,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Andrea Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig

    A) Schematic depiction of the visual stimuli for the simulation, recompiled from [46]. Six natural images (five are shown here, with one excluded to reveal the fly’s viewing perspective) were randomly selected from the van Hateren dataset [65]; each image was patched onto a different face of a cube. Assuming that the fly is located at the center of this cube, we obtain the visual experience of the fly’s ego-rotational motion by rotating this cage around a particular motion direction, shown by the dark blue arrow. We then project the moving natural scene cage onto a unit sphere that represents the fly’s retina, following the protocol introduced in [39, 45]. There are ∼5,500 local motion detectors (LMDs) evenly distributed on this unit sphere. The responses of those LMDs whose locations fall within a VS cell’s dendritic receptive field (σazimuth = 15°, σelevation = 60°, tiling along the fly’s anterior-posterior axis; see details in supplementary Materials and methods) are then integrated as the input current to this particular VS cell. B) Mercator maps with both checkerboard and natural scene backgrounds, at 1° resolution in spherical coordinates. C) Ego-motion information inferable in checkerboard and natural scene backgrounds. The stimulus is a constant rotation of 500°/s from [46]. Note that the natural scene background conveys only about half as much information about this motion stimulus as the checkerboard background.

    (TIF)
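The receptive-field integration described in panel A can be made concrete with a short sketch. This is a minimal pure-Python illustration, assuming Gaussian receptive-field weighting with the widths stated above; the function names and the toy random LMD placement are ours, not from the paper:

```python
import math
import random

def receptive_field_weight(az, el, center_az, center_el,
                           sigma_az=15.0, sigma_el=60.0):
    """Gaussian dendritic receptive-field weight (angles in degrees)."""
    d_az = az - center_az
    d_el = el - center_el
    return math.exp(-0.5 * ((d_az / sigma_az) ** 2 + (d_el / sigma_el) ** 2))

def vs_input_current(lmd_responses, lmd_positions, center_az, center_el):
    """Weighted sum of local-motion-detector responses -> VS input current."""
    return sum(r * receptive_field_weight(az, el, center_az, center_el)
               for r, (az, el) in zip(lmd_responses, lmd_positions))

# ~5,500 LMDs over the sphere (toy random placement, not the paper's tiling)
random.seed(0)
positions = [(random.uniform(-180, 180), random.uniform(-90, 90))
             for _ in range(5500)]
responses = [random.random() for _ in positions]
current = vs_input_current(responses, positions, center_az=0.0, center_el=0.0)
```

The elongated σelevation makes each model VS cell pool motion signals over a tall, narrow vertical stripe of the visual field, as in the caption.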

    S2 Fig. Egorotation distributions for different time steps during the evasive maneuver.

Here we focus on the egorotations to which the VS network is sensitive. Because the VS network is only responsive to combinations of roll and pitch motions, i.e. motions within the fly’s coronal plane, we represent all stimuli with their corresponding vectors in this plane. A) The egorotation distribution at 10ms before the onset of evasive maneuvers. B) The future egorotation distribution at 10ms after the initiation of evasive maneuvers. C) Similar to B, but for the egorotation at 20ms into the evasive maneuver. Here, most of the banked turns slow down and counter-banked turns start. D) Similar to B, but for the egorotation at 30ms into the evasive maneuver. This motion corresponds to the start of the counter-banked turn. E) Similar to B, but for the egorotations a fly would experience at the end of the evasive maneuver. This motion corresponds to the slowing down of the counter-banked turn and the completion of the evasive maneuver. All of these egorotation distributions have comparable entropy, ∼4 − 4.3 bits. F) The Jensen–Shannon divergence between the past egorotation distribution and the egorotation distributions at Δt = 10, 20, 30, 40ms of the evasive maneuver, respectively.

    (TIF)
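The Jensen–Shannon divergence in panel F can be computed directly from two discrete egorotation distributions. A minimal sketch (the helper names `kl` and `jsd` and the toy distributions are ours):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits (0 log 0 := 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence in bits: average KL to the mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# two half-overlapping toy distributions over binned egorotations
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(jsd(p, q))  # 0.5 bits
```

Unlike KL divergence, the JSD is symmetric and always finite, which makes it suitable for comparing the past distribution against each Δt without worrying about empty bins.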

    S3 Fig. Linear correlation between the past egorotation θpast (t = −10ms, i.e. before the start of evasive maneuvers) and the future egorotation θfuture (t = 10ms, 20ms, 30ms, 40ms) at different time lags in the evasive maneuver.

    These egorotations are calculated as the axis of rotation, combining the rotational angles along both the roll and pitch body axes. A) The correlation between the egorotation distribution at 10ms before the onset of evasive maneuvers and the egorotation distribution 10ms into evasive maneuvers. B) Similar to A), but for the egorotation distribution 20ms into evasive maneuvers. C) Similar to A), but for the egorotation distribution 30ms into evasive maneuvers. D) Similar to A), but for the egorotation distribution 40ms into evasive maneuvers.

    (TIF)
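The correlations in panels A-D are ordinary Pearson correlations between past and future rotation angles. A minimal sketch with made-up example angles (the data here are illustrative, not drawn from the behavioral recordings):

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two angle samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

theta_past = [10.0, 35.0, 80.0, 120.0]    # deg, at t = -10 ms (toy values)
theta_future = [14.0, 40.0, 76.0, 118.0]  # deg, at t = +10 ms (toy values)
r = pearson(theta_past, theta_future)
```

A high r at short lags and a decaying r at longer lags would indicate that the initial ego-rotation is most informative about the early phase of the maneuver.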

    S4 Fig. Scatterplots of all 120 triplets in A) the IfutureI–Ipast plane; B) the Ifutureθ–Ipast plane.

    (TIF)

    S5 Fig. How much information a triplet-based encoding retains about the past input vs. how much of that information is about the future stimulus (as a fraction of the information about its own future input), for all 120 possible triplets.

    The particular VS 5-6-7 triplet (shown by the red circle and the arrow), which connects to the neck motor center, is among the most efficient in terms of what fraction of its prediction of its own input is about the future stimulus, while its encoding cost Ipast remains modest.

    (TIF)

    S6 Fig. Network schematic for the variational approximation of the information bottleneck solution (VIB).

    By constructing a variational approximation, the encoder learns a latent representation z from the past VS axonal voltages. For training the encoder, we first project the axonal voltages of the 20 VS cells onto 200 intermediate filters, followed by a batch normalization layer. We then learn a two-dimensional latent representation z (dim(z) = 2) for easy visualization. A decoder with the same structure as the encoder generates samples from z and reads them out as the future input current to the VS network. Note that the VS network does not have direct access to the stimulus; it uses the correlations between its past and future inputs, induced by the stimulus, as a proxy for the stimulus correlations themselves. z follows a Gaussian distribution with mean μ and covariance Σ. During training of the VIB, μ and Σ map the axonal voltages of the VS cells to the future input. When the VIB succeeds, we obtain the predictive representation of the future stimulus by projecting the axonal voltages into the latent feature space of z.

    (TIF)
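The Gaussian latent step of the VIB encoder can be made concrete with the standard reparameterization trick. This toy pure-Python sketch collapses the 200-filter projection and batch normalization into a single linear map per latent dimension; all weights, names, and shapes here are illustrative assumptions, not the trained network:

```python
import math
import random

def encode(voltages, w_mu, w_logvar):
    """Toy linear 'encoder': map 20 VS axonal voltages to the parameters
    (mean, log-variance) of a 2-D diagonal Gaussian latent."""
    mu = [sum(w * v for w, v in zip(row, voltages)) for row in w_mu]
    logvar = [sum(w * v for w, v in zip(row, voltages)) for row in w_logvar]
    return mu, logvar

def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

rng = random.Random(0)
voltages = [rng.uniform(-60.0, -40.0) for _ in range(20)]  # 20 VS cells (mV)
w_mu = [[rng.gauss(0.0, 0.05) for _ in range(20)] for _ in range(2)]
w_logvar = [[rng.gauss(0.0, 0.01) for _ in range(20)] for _ in range(2)]
mu, logvar = encode(voltages, w_mu, w_logvar)
z = sample_latent(mu, logvar, rng)  # 2-D predictive representation
```

Sampling through μ and σ rather than through z itself keeps the stochastic latent differentiable with respect to the encoder weights, which is what allows the VIB objective to be trained by gradient descent.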

    S7 Fig. Predictive information for the future stimulus 10ms after the evasive maneuver starts (Δt = 10ms).

    The red bar shows that the PCA projection onto the first 2 PCs of the input current contains almost all of the stimulus information available in the input current itself. We use this PCA projection to examine whether it is possible to disentangle input stimuli from different quadrants using prediction in Fig 5. The green bar shows the limit on predictive information, based on the information bottleneck method; it corresponds to the point on the information curve at the given compression in Fig 4B. The cyan bar corresponds to the predictive information about the future stimulus using the outputs of all VS cells. The darker-colored region shows how much information the corresponding VIB captures about the future stimulus. The purple bar is similar to the cyan bar, but for the predictive encodings of the VS 5-6-7 triplet vs. their respective VIB solution.

    (TIF)

    S8 Fig. The input to the VS network only supports local discrimination.

    A) The representation of 8 randomly selected stimuli within the plane spanned by the first two principal components of the input currents. Note that there are substantial overlaps between clusters: e.g., the light-green cluster lies almost on top of the dark-red and dark-blue clusters. B) A subset of 4 stimuli from A. The only difference from A is that all of these stimuli have the same pitch/roll directions (clockwise roll and upward pitch, i.e. they all lie within the 1st quadrant of the fly’s coronal plane).

    (TIF)

    S9 Fig. The predictive information encoded by the VS network preferentially discriminates nearby egorotations.

    A) The predictive representation of stimuli at 37° and 56° obtained by mapping the respective axonal voltages of the entire VS network to the latent feature space generated by the VIB. B) Similar to A, but using the VS 5-6-7 triplet as input. C) The predictive representation of two stimuli that are much closer in stimulus space: 56° and 67°, respectively. Note that there is no overlap between these two nearby stimuli whereas there is some overlap for stimuli that are farther apart (shown in A). D) Similar to C, but using the VS 5-6-7 triplet as input.

    (TIF)


    Data Availability Statement

    The paper is a theoretical work and does not contain experimental data. All the parameters and open-source software packages required to reproduce our simulations and results are specified in the Materials and Methods section. We have prepared a GitHub repository, https://github.com/siwei-wang/VS_pred, containing all intermediate data and code used to generate the figures in the manuscript.

