eLife. 2021 Apr 21;10:e65541. doi: 10.7554/eLife.65541

Creating and controlling visual environments using BonVision

Gonçalo Lopes 1, Karolina Farrell 2, Edward AB Horrocks 2, Chi-Yu Lee 2, Mai M Morimoto 2, Tomaso Muzzu 2, Amalia Papanikolaou 2, Fabio R Rodrigues 2, Thomas Wheatcroft 2, Stefano Zucca 2, Samuel G Solomon 2,†, Aman B Saleem 2,†
Editor: Chris I Baker
PMCID: PMC8104957  PMID: 33880991

Abstract

Real-time rendering of closed-loop visual environments is important for next-generation understanding of brain function and behaviour, but is often prohibitively difficult for non-experts to implement and is limited to a few laboratories worldwide. We developed BonVision as easy-to-use open-source software for the display of virtual or augmented reality, as well as standard visual stimuli. BonVision has been tested on humans and mice, and is capable of supporting new experimental designs in other animal models of vision. As the architecture is based on the open-source Bonsai graphical programming language, BonVision benefits from native integration with experimental hardware. BonVision therefore enables easy implementation of closed-loop experiments, including real-time interaction with deep neural networks, and communication with behavioural and physiological measurement and manipulation devices.

Research organism: Human, Mouse, Rat, Zebrafish

Introduction

Understanding behaviour and its underlying neural mechanisms calls for the ability to construct and control complex, naturalistic environments that immerse animals, including humans, and that are responsive to their actions. Gaming-driven advances in computation and graphical rendering have driven the development of immersive closed-loop visual environments, but these new platforms are not readily amenable to traditional research paradigms. For example, they do not specify images in egocentric units (degrees of visual angle), they sacrifice precise control of the visual display, and they lack transparent interaction with external hardware.

Most vision research has been performed in non-immersive environments with standard two-dimensional visual stimuli, such as gratings or dot stimuli, using established platforms including PsychToolbox (Brainard, 1997) or PsychoPy (Peirce, 2007; Peirce, 2008). Pioneering efforts to bring gaming-driven advances to neuroscience research have provided new platforms for closed-loop visual stimulus generation: STYTRA (Štih et al., 2019) provides 2D visual stimuli for larval zebrafish in python, ratCAVE (Del Grosso and Sirota, 2019) is a specialised augmented reality system for rodents in python, FreemoVR (Stowers et al., 2017) provides virtual reality in Ubuntu/Linux, and ViRMEn (Aronov and Tank, 2014) provides virtual reality in Matlab. However, these new platforms lack the generalised frameworks needed to specify or present standard visual stimuli.

Our initial motivation was to create visual display software with three key features. First, an integrated, standardised platform that could rapidly switch between traditional visual stimuli (such as grating patterns) and immersive virtual reality. Second, the ability to replicate experimental workflows across different physical configurations (e.g. when moving from one to two computer monitors, or from flat-screen to spherical projection). Third, the ability to interface rapidly and efficiently with external hardware (needed for experimentation) without having to develop complex multi-threaded routines. We wanted to provide these advances in a way that made it easier for users to construct and run closed-loop experimental designs. In closed-loop experiments, stimuli are ideally conditioned by asynchronous inputs, such as those provided by multiple independent behavioural and neurophysiological measurement devices. Most existing platforms require the development of multi-threaded routines to run experimental paradigms (e.g. control brain stimulation, or sample from recording devices) without compromising the rendering of visual scenes. Implementing such multi-threaded routines is complex. We therefore chose to develop a visual presentation framework within the Bonsai programming language (Lopes et al., 2015). Bonsai is a graphical, high-performance, and event-based language that is widely used in neuroscience experiments and is already capable of real-time interfacing with most types of external hardware. Bonsai is specifically designed for flexible and high-performance composition of data streams and external events, and is therefore able to monitor and connect multiple sensor and effector systems in parallel, making it easier to implement closed-loop experimental designs.

We developed BonVision, an open-source software package that can generate and display well-defined visual stimuli in 2D and 3D environments. BonVision exploits Bonsai’s ability to run OpenGL commands on the graphics card through the Bonsai.Shaders package. BonVision further extends Bonsai by providing pre-built GPU shaders and resources for stimuli used in vision research, including movies, along with an accessible, modular interface for composing stimuli and designing experiments. The definition of stimuli in BonVision is independent of the display hardware, allowing for easy replication of workflows across different experimental configurations. Additional unique features include the ability to automatically detect and define the relationship between the observer and the display from a photograph of the experimental apparatus, and to use the outputs of real-time inference methods to determine the position and pose of an observer online, thereby generating augmented reality environments.

Results

To provide a framework that allowed both traditional visual presentation and immersive virtual reality, we needed to bring these very different ways of defining the visual scene into the same architecture. We achieved this by mapping the 2D retino-centric coordinate frame (i.e. degrees of the visual field) to the surface of a 3D sphere using the Mercator projection (Figure 1A, Figure 1—figure supplement 1). The resulting sphere could therefore be rendered onto displays in the same way as any other 3D environment. We then used ‘cube mapping’ to specify the 360° projection of 3D environments onto arbitrary viewpoints around an experimental observer (human or animal; Figure 1B). Using this process, a display device becomes a window into the virtual environment, where each pixel on the display specifies a vector from the observer through that window. The vector links pixels on the display to pixels in the ‘cube map’, thereby rendering the corresponding portion of the visual field onto the display.
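To make the geometry concrete, the following minimal Python sketch (not BonVision's actual shader code, which runs in OpenGL via Bonsai.Shaders) maps a retino-centric coordinate to a direction on the unit sphere around the observer and picks the cube-map face that the direction falls on; the coordinate conventions are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not BonVision's shader code) of the sphere mapping used for
# 2D stimuli: azimuth and elevation in degrees of visual field are treated as
# longitude and latitude on a unit sphere centred on the observer.
def visual_degrees_to_sphere(azimuth_deg, elevation_deg):
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    x = np.cos(el) * np.sin(az)    # rightward
    y = np.sin(el)                 # upward
    z = -np.cos(el) * np.cos(az)   # forward is -z (OpenGL-style convention)
    return np.array([x, y, z])

# Cube mapping: each display pixel defines a view vector from the observer,
# and the vector's dominant axis selects which face of the cube map to sample.
def cube_face(v):
    x, y, z = v
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return '+x' if x > 0 else '-x'
    if ay >= ax and ay >= az:
        return '+y' if y > 0 else '-y'
    return '+z' if z > 0 else '-z'

d = visual_degrees_to_sphere(30, 10)   # 30 deg right, 10 deg up of centre
print(d, cube_face(d))
```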

Figure 1. BonVision's adaptable display and render configurations.

(A) Illustration of how two-dimensional textures are generated in BonVision using Mercator projection for sphere mapping, with elevation as latitude and azimuth as longitude. The red dot indicates the position of the observer. (B) Three-dimensional objects were placed at the appropriate positions and the visual environment was rendered using cube-mapping. (C–E) Examples of the same two stimuli, a checkerboard + grating (middle row) or four three-dimensional objects (bottom row), displayed in different experimental configurations (top row): two angled LCD monitors (C), a head-mounted display (D), and a demi-spherical dome (E).

Figure 1.

Figure 1—figure supplement 1. Mapping stimuli onto displays in various positions.

Figure 1—figure supplement 1.

(A) Checkerboard stimulus being rendered. (B) Projection of the stimulus onto a sphere using Mercator projection. (C) Example display positions (dA–dF) and (D) corresponding rendered images. Red dot in C indicates the observer position.

Figure 1—figure supplement 2. Modular structure of workflow and example workflows.

Figure 1—figure supplement 2.

(A) Description of the modules in BonVision workflows that generate stimuli. Every BonVision stimulus includes a module that creates and initialises the render window, shown in ‘BonVision window and resources’. This defines the window parameters in Create Window (such as background colour, screen index, VSync), and loads predefined textures (BonVision Resources), user-defined textures (Texture Resources, not shown), and 3D meshes (Mesh Resources). This is followed by the modules: ‘Drawing region’, where the visual space covered by the stimuli is defined, which can be the complete visual space, 360° × 360°; ‘Draw stimuli’ and ‘Define scene’, where the stimulus is defined; ‘Map Stimuli’, which maps the stimuli into the 3D environment; and ‘Define display’, where the display devices are defined. (B and C) Modules that define the checkerboard + grating stimulus (B) shown in the middle row of Figure 1 and the 3D world (C) with five objects shown in the bottom row of Figure 1. The display device is defined separately and either display can be appended at the end of the workflow. This separation of the display device allows for replication between experimental configurations. (D) The variants of the modules used to display stimuli on a head-mounted display. The empty region under ‘Define scene’ would be filled by the corresponding nodes in B and C.

Our approach has the advantage that the visual stimulus is defined irrespective of the display hardware, allowing us to independently define each experimental apparatus without changing the preceding specification of the visual scene, or the experimental design (Figure 1C–E, Figure 1—figure supplements 1 and 2). Consequently, BonVision makes it easy to replicate visual environments and experimental designs on various display devices, including multiple monitors, curved projection surfaces, and head-mounted displays (Figure 1C–E). To facilitate easy and rapid porting between different experimental apparatus, BonVision features a fast, semi-automated display calibration: the 3D position and orientation of each display relative to the observer are measured from a photograph of the experimental setup containing fiducial markers (Garrido-Jurado et al., 2014) (Figure 2 and Figure 2—figure supplement 1). BonVision’s inbuilt image processing algorithms then estimate the position and orientation of each marker to fully specify the display environment.
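The sketch below illustrates the kind of marker-based pose estimation this calibration relies on, using the OpenCV ArUco module in Python rather than BonVision's own implementation; the file names, marker IDs, and marker size are hypothetical.

```python
import cv2
import numpy as np

# Hypothetical inputs: a photo of the rig, camera intrinsics from a prior
# camera calibration, and the printed marker side length in metres.
image = cv2.imread("rig_photo.jpg")
camera_matrix = np.load("camera_matrix.npy")      # 3x3 intrinsic matrix
dist_coeffs = np.load("dist_coeffs.npy")          # lens distortion terms
marker_length = 0.05                              # assumed 5 cm markers

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)

# Estimate the pose of every detected marker in the camera's frame of
# reference (legacy cv2.aruco API; newer OpenCV uses cv2.aruco.ArucoDetector).
rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
    corners, marker_length, camera_matrix, dist_coeffs)
poses = {int(i): t.ravel() for i, t in zip(ids.flatten(), tvecs)}

# Assume marker 0 is fixed to the display and marker 1 marks the observer:
display_relative_to_observer = poses[0] - poses[1]
print("display position relative to observer (m):", display_relative_to_observer)
```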

Figure 2. Automated calibration of display position.

(A) Schematic showing the position of two hypothetical displays of different sizes, at different distances and orientation relative to the observer (red dot). (B) How a checkerboard of the same visual angle would appear on each of the two displays. (C) Example of automatic calibration of display position. Standard markers are presented on the display, or in the environment, to allow automated detection of the position and orientation of both the display and the observer. These positions and orientations are indicated by the superimposed red cubes as calculated by BonVision. (D) How the checkerboard would appear on the display when rendered, taking into account the precise position of the display. (E and F) Same as (C and D), but for another pair of display and observer positions. The automated calibration was based on the images shown in C and E.

Figure 2.

Figure 2—figure supplement 1. Automated workflow to calibrate display position.

Figure 2—figure supplement 1.

The automated calibration takes advantage of ArUco markers (Garrido-Jurado et al., 2014) that can be used to calculate the 3D position of a surface. (Ai) We use one marker on the display and one placed in the position of the observer. We then use a picture of the display and observer position taken by a calibrated camera. This is an example where we used a mobile phone camera for calibration. (Aii) The detected 3D positions of the screen and the observer, as calculated by BonVision. (Aiii) A checkerboard image and a small superimposed patch of grating, rendered based on the precise position of the display. (B and C) Same as A for different screen and observer positions: with the screen tilted towards the animal (B), or the observer shifted to the right of the screen (C). The automated calibration was based on the images shown in Ai, Bi, and Ci, which in this case were taken using a mobile phone camera.

Figure 2—figure supplement 2. Automated gamma-calibration of visual displays.

Figure 2—figure supplement 2.

BonVision monitored a photodiode (Photodiode v2.1, https://www.cf-hw.org/harp/behavior) through a HARP microprocessor to measure the light output of the monitor (Dell Latitude 7480). The red, green, and blue channels of the display were sent the same values (i.e. grey scale). (A) Gamma calibration. The input to the display channels was modulated by a linear ramp (range 0–255). Without calibration the monitor output (arbitrary units) increased exponentially (blue line). The measurement was then used to construct an intermediate look-up table that corrected the values sent to the display. Following calibration, the display intensity is close to linear (red line). Inset at top: schematic of the experimental configuration. (B) Similar to A, but showing the intensity profile of a drifting sinusoidal grating. Measurements before calibration resemble an exponentiated sinusoid (blue dotted line). Measurements after calibration resemble a regular sinusoid (red dotted line).
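The following Python sketch illustrates the gamma-calibration logic described in the legend above: measure luminance for a ramp of input values, then build an inverse look-up table so that requested and displayed intensity are linearly related. The measured curve here is simulated with an assumed gamma of 2.2; it is not BonVision's internal code.

```python
import numpy as np

# Simulate the measured luminance for a linear ramp of input values 0..255,
# assuming a display gamma of 2.2 (stand-in for the photodiode measurement).
inputs = np.arange(256)
measured = (inputs / 255.0) ** 2.2

# Build the inverse look-up table: for each desired linear output level,
# find the input value whose measured luminance comes closest to it.
lum = (measured - measured.min()) / (measured.max() - measured.min())
desired = np.linspace(0.0, 1.0, 256)
lut = np.array([int(np.argmin(np.abs(lum - d))) for d in desired], dtype=np.uint8)

# Values passed through the LUT before display now produce ~linear luminance.
linearised = lum[lut[inputs]]
print("max deviation from linear:", np.max(np.abs(linearised - desired)))
```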

Virtual reality environments are easy to generate in BonVision. BonVision has a library of standard pre-defined 3D structures (including planes, spheres, and cubes), and environments can be defined by specifying the position and scale of the structures, and the textures rendered on them (e.g. Figure 1—figure supplement 2 and Figure 5F). BonVision also has the ability to import standard format 3D design files created elsewhere in order to generate more complex environments (file formats listed in Materials and methods). This allows users to leverage existing 3D drawing platforms (including open source platform ‘Blender’: https://www.blender.org/) to construct complex virtual scenes (see Appendix 1).

BonVision can define the relationship between the display and the observer in real-time. This makes it easy to generate augmented reality environments, where what is rendered on a display depends on the position of an observer (Figure 3A). For example, when a mouse navigates through an arena surrounded by displays, BonVision enables closed-loop, position-dependent updating of those displays. Bonsai can track markers to determine the position of the observer, but it also has turn-key capacity for real-time pose estimation – using deep neural networks (Mathis et al., 2018; Pereira et al., 2019; Kane et al., 2020) – to keep track of the observer’s movements. This allows users to generate and present interactive visual environments (simulation in Figure 3—video 1 and Figure 3B and C).
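A minimal sketch of this closed loop is shown below (in Python, not Bonsai): on each video frame, the latest head-position estimate is read and converted into a viewing direction for one simulated 'window' display. The arena dimensions, window position, and the get_head_position placeholder are all assumptions standing in for the real tracking pipeline.

```python
import numpy as np

def get_head_position():
    """Placeholder for the real-time pose estimate (e.g. a DeepLabCut network
    tracking the head); returns (x, y) in arena coordinates, in metres."""
    return np.random.uniform(0.0, 0.4, size=2)    # hypothetical 40 cm arena

# One simulated 'window' (display) on an arena wall, in arena coordinates.
window_centre = np.array([0.20, 0.00, 0.10])      # x, y, height (m), assumed

def view_direction(head_xy, head_height=0.03):
    """Unit vector from the observer's head to the window centre; the renderer
    uses this viewpoint to draw the part of the virtual scene visible through
    that 'window' from where the animal currently is."""
    observer = np.array([head_xy[0], head_xy[1], head_height])
    d = window_centre - observer
    return d / np.linalg.norm(d)

for _ in range(3):                                 # one update per video frame
    print(view_direction(get_head_position()))
```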

Figure 3. Using BonVision to generate an augmented reality environment.

(A) Illustration of how the image on a fixed display needs to adapt as an observer (red dot) moves around an environment. The displays simulate windows from a box into a virtual world outside. (B) The virtual scene (from: http://scmapdb.com/wad:skybox-skies) that was used to generate the example images and Figure 3—video 1 offline. (C) Real-time simulation of scene rendering in augmented reality. We show two snapshots of the simulated scene rendering, which is also shown in Figure 3—video 1. In each case the inset image shows the actual video images, of a mouse exploring an arena, that were used to determine the viewpoint of an observer in the simulation. The mouse’s head position was inferred (at a rate of 40 frames/s) by a network trained using DeepLabCut (Mathis et al., 2018). The top image shows an instance when the animal was on the left of the arena (head position indicated by the red dot in the main panel) and the lower image shows an instance when it was on the right of the arena.

Figure 3.

Figure 3—video 1. Augmented reality simulation using BonVision.

Download video file (3.2MB, mp4)
This video is an example of a deep neural network, trained with DeepLabCut, being used to estimate the position of a mouse’s head in an environment in real-time, and updating a virtual scene presented on the monitors based on this estimated position. The first few seconds of the video display the online tracking of specific features (nose, head, and base of tail) while an animal is moving around (shown as a red dot) in a three-port box (as in Soares et al., 2016). Subsequently the inset shows the original video of the animal’s movements, which the simulation is based on. The rest of the video image shows how a green field landscape (source: http://scmapdb.com/wad:skybox-skies) outside the box would be rendered on three simulated displays within the box (one placed on each of the three oblique walls). These three displays simulate windows onto the world beyond the box. The position of the animal was updated by DeepLabCut at 40 frames/s, and the simulation was rendered at the same rate.

BonVision is capable of rendering visual environments near the limits of the hardware (Figure 4). This is possible because Bonsai is based on a just-in-time compiler architecture, which adds little computational overhead. BonVision accumulates a list of OpenGL commands as the programme makes them. To optimise rendering performance, these commands are ordered according to the priority defined in the Shaders component of the LoadResources node (which the user can manipulate for high-performance environments). These ordered calls are then executed when the frame is rendered. To benchmark the responsiveness of BonVision in closed-loop experiments, we measured the delay (latency) between an external event and the presentation of a visual stimulus. We first measured the closed-loop latency for BonVision when a monitor was refreshed at a rate of 60 Hz (Figure 4A). We found that delays averaged 2.11 ± 0.78 frames (35.26 ± 13.07 ms). This latency was slightly shorter than that achieved by PsychToolbox (Brainard, 1997) on the same laptop (2.44 ± 0.59 frames, 40.73 ± 9.8 ms; Welch’s t-test, p<10⁻⁸⁰, n = 1000). The overall latency of BonVision was mainly constrained by the refresh rate of the display device, such that higher frame rate displays yielded lower latency (60 Hz: 35.26 ± 13.07 ms; 90 Hz: 28.45 ± 7.22 ms; 144 Hz: 18.49 ± 10.1 ms; Figure 4A). That is, the number of frames between the external event and stimulus presentation was similar across frame rates (60 Hz: 2.11 ± 0.78 frames; 90 Hz: 2.56 ± 0.65 frames; 144 Hz: 2.66 ± 1.45 frames; Figure 4C). We used two additional methods to benchmark visual display performance relative to other frameworks (we did not try to optimise code fragments for each framework) (Figure 4B and C). BonVision was able to render up to 576 independent elements and up to eight overlapping textures at 60 Hz without missing (‘dropping’) frames, broadly matching PsychoPy (Peirce, 2007; Peirce, 2008) and Psychtoolbox (Brainard, 1997). BonVision’s performance was similar at different frame rates – at the standard frame rate (60 Hz) and at 144 Hz (Figure 4—figure supplement 1). BonVision achieved slightly fewer overlapping textures than PsychoPy, as BonVision does not currently have the option to trade off the resolution of a texture and its mask for performance. BonVision also supports video playback, either by preloading the video or by streaming it from disk. The streaming mode, which utilises real-time file I/O and decompression, is capable of displaying both standard definition (480 p) and full HD (1080 p) video at 60 Hz on a standard computer (Figure 4D). At higher rates, performance is impaired for full HD videos, but is improved by buffering, and fully restored by preloading the video into memory (Figure 4D). We benchmarked BonVision on a standard Windows OS laptop, but BonVision is now also capable of running on Linux.
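For illustration, the sketch below shows how the two benchmark quantities reported here (dropped frames and closed-loop latency) can be computed from logged photodiode flip times; the data are simulated and the analysis is a generic reconstruction, not the benchmark code itself (which is available in the benchmarks repository).

```python
import numpy as np

# Simulated photodiode flip times (s) for a 60 Hz display; the benchmarks
# logged these with a HARP device, here we fabricate them and deliberately
# skip one frame at index 500.
frame_rate = 60.0
expected = 1.0 / frame_rate
flip_times = np.cumsum(np.full(1000, expected))
flip_times[500:] += expected                  # one frame was skipped

# Dropped frames: flip intervals that span more than one refresh period.
intervals = np.diff(flip_times)
dropped = int(np.sum(np.round(intervals / expected) - 1))
print("dropped frames:", dropped)

# Closed-loop latency: time from an external trigger (e.g. a nose poke logged
# by the DAQ) to the next stimulus change detected by the photodiode. In real
# data, flip_times would be the times of the triggered stimulus changes.
trigger_times = np.array([1.003, 5.017, 9.250])
next_flip = flip_times[np.searchsorted(flip_times, trigger_times)]
latencies_ms = (next_flip - trigger_times) * 1e3
print("latencies (ms):", np.round(latencies_ms, 1))
```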

Figure 4. Closed-loop latency and performance benchmarks.

(A) Latency between sending a command (virtual key press) and updating the display (measured using a photodiode). (A.i and A.ii) Latency depended on the frame rate of the display, updating stimuli with a delay of one to three frames. (A.iii and A.iv). (B and C) Benchmarked performance of BonVision with respect to Psychtoolbox and PsychoPy. (B) When using non-overlapping textures BonVision and Psychtoolbox could present 576 independent textures without dropping frames, while PsychoPy could present 16. (C) When using overlapping textures PsychoPy could present 16 textures, while BonVision and Psychtoolbox could present eight textures without dropping frames. (D) Benchmarks for movie playback. BonVision is capable of displaying standard definition (480 p) and high definition (1080 p) movies at 60 frames/s on a laptop computer with a standard CPU and graphics card. We measured display rate when fully pre-loading the movie into memory (blue), or when streaming from disk (with no buffer: orange; 1-frame buffer: green; 2-frame buffer: red; 4-frame buffer: purple). When asked to display at rates higher than the monitor refresh rate (>60 frames/s), the 480 p video played at the maximum frame rate of 60fps in all conditions, while the 1080 p video reached the maximum rate when pre-loaded. Using a buffer slightly improved performance. A black square at the bottom right of the screen in A–C is the position of a flickering rectangle, which switches between black and white at every screen refresh. The luminance in this square is detected by a photodiode and used to measure the actual frame flip times.

Figure 4.

Figure 4—figure supplement 1. BonVision performance benchmarks at high frame rate.

Figure 4—figure supplement 1.

(A) When using non-overlapping textures BonVision was able to render 576 independent textures without dropping frames at 60 Hz. At 144 Hz BonVision was able to render 256 non-overlapping textures with no dropped frames, and seldom dropped frames with 576 textures. BonVision was unable to render 1024 or more textures at the requested frame rate. (B) When using overlapping textures BonVision was able to render 64 independent textures without dropping frames at 60 Hz. At 144 Hz BonVision was able to render 32 textures with no dropped frames. Note that these tests were performed on a computer with better hardware specification than that used in Figure 4, which led to improved performance on the benchmarks at 60 Hz. A black square at the bottom right of the screen in A and B is the position of a flickering rectangle, which switches between black and white at every screen refresh. The luminance in this square is detected by a photodiode and used to measure the actual frame flip times.

To confirm that the rendering speed and timing accuracy of BonVision are sufficient to support neurophysiological experiments, which need high timing precision, we mapped the receptive fields of neurons early in the visual pathway (Yeh et al., 2009), in the mouse primary visual cortex and superior colliculus. The stimulus (‘sparse noise’) consisted of small black or white squares briefly (0.1 s) presented at random locations (Figure 5A). This stimulus, which is commonly used to measure receptive fields of visual neurons, is sensitive to the timing accuracy of the visual stimulus, meaning that errors in timing would prevent the identification of receptive fields. In our experiments using BonVision, we were able to recover receptive fields from electrophysiological measurements – both in the superior colliculus and primary visual cortex of awake mice (Figure 5B and C) – demonstrating that BonVision meets the timing requirements for visual neurophysiology. The receptive fields shown in Figure 5C were generated using timing signals obtained directly from the stimulus display (via a photodiode). BonVision’s independent logging of stimulus presentation timing was also sufficient to capture the receptive fields (Figure 5—figure supplement 1).

Figure 5. Illustration of BonVision across a range of vision research experiments.

(A) Sparse noise stimulus, generated with BonVision, rendered onto a demi-spherical screen. (B and C) Receptive field maps from recordings of local field potential in the superior colliculus (B), and spiking activity in the primary visual cortex (C) of the mouse. (D) Two cubes were presented at different depths in a virtual environment through a head-mounted display to human subjects. Subjects had to report which cube was larger: left or right. (E) Subjects predominantly reported the larger object correctly, with a slight bias to report that the object in front was bigger. (F) BonVision was used to generate a closed-loop virtual platform that a mouse could explore (top: schematic of platform). Mice naturally tended to run faster along the platform, and in later sessions developed a speed profile, where they slowed down as they approached the end of the platform (virtual cliff). (G) The speed of the animal at the start and at the end of the platform as a function of training. (H) BonVision was used to present visual stimuli overhead while an animal was free to explore an environment (which included a refuge). The stimulus was a small dot (5° diameter) moving across the projected surface over several seconds. (I) The cumulative probability of freeze and flight behaviour across time in response to a moving dot presented overhead.

Figure 5.

Figure 5—figure supplement 1. BonVision timing logs are sufficient to support receptive field mapping of spiking activity.

Figure 5—figure supplement 1.

Top row in each case shows the receptive field identified using the timing information provided by a photodiode that monitored a small square on the stimulus display that was obscured from the animal. Bottom row in each case shows the receptive field identified using the timing logged by BonVision during the stimulus presentation (a separate timing system was used to align the clocks between the computer hosting BonVision and the Open Ephys recording device). (A) Average OFF and ON receptive field maps for 33 simultaneously recorded units in a single recording session. (B) Individual OFF receptive field maps for three representative units in the same session.

To assess the ability of BonVision to control virtual reality environments we first tested its ability to present stimuli to human observers on a head-mounted display (Scarfe and Glennerster, 2015). BonVision uses positional information (obtained from the head-mounted display) to update the view of the world that needs to be provided to each eye, and returns two appropriately rendered images. On each trial, we asked observers to identify the larger of two non-overlapping cubes that were placed at different virtual depths (Figure 5D and E). The display was updated in closed-loop to allow observers to alter their viewpoint by moving their head. Distinguishing objects of the same retinal size required observers to use depth-dependent cues (Rolland et al., 1995), and we found that all observers were able to identify which cube was larger (Figure 5E).

We next asked if BonVision was capable of supporting other visual display environments that are increasingly common in the study of animal behaviour. We first projected a simple environment onto a dome that surrounded a head-fixed mouse (as shown in Figure 1E). The mouse was free to run on a treadmill, and the treadmill’s movements were used to update the mouse’s position on a virtual platform (Figure 5F). Not only did mouse locomotion speed increase with repeated exposure, but the animals modulated their speed depending on their location on the platform (Figure 5F and G). BonVision is therefore capable of generating virtual reality environments which both elicit and are responsive to animal behaviour. BonVision was also able to produce instinctive avoidance behaviours in freely moving mice (Figure 5H and I). We displayed a small black dot slowly sweeping across the overhead visual field. Visual stimuli presented in BonVision primarily elicited a freezing response, consistent with previous reports from similar experiments (De Franceschi et al., 2016; Figure 5I). Together these results show that BonVision provides sufficient rendering performance to support human and animal visual behaviour.

Discussion

BonVision is a single software package to support experimental designs that require visual display, including virtual and augmented reality environments. BonVision is easy and fast to implement, cross-platform and open source, providing versatility and reproducibility.

BonVision makes it easier to address several barriers to reproducibility in visual experiments. First, BonVision is able to replicate and deliver visual stimuli on very different experimental apparatus. This is possible because BonVision’s architecture separates specification of the display and the visual environment. Second, BonVision includes a library of workflows and operators to standardise and ease the construction of new stimuli and virtual environments. For example, it has established protocols for defining display positions (Figure 2), mesh-mapping of curved displays (Figure 1E), and automatic linearisation of display luminance (Figure 2—figure supplement 2), as well as a library of examples for experiments commonly used in visual neuroscience. In addition, the modular structure of BonVision enables the development and exchange of custom nodes for generating new visual stimuli or functionality without the need to construct the complete experimental paradigm. Third, BonVision is based on Bonsai (Lopes et al., 2015), which has a large user base and an active developer community, and is now a standard tool for open-source neuroscience research. BonVision naturally integrates Bonsai’s established packages in the multiple domains important for modern neuroscience, which are widely used in applications including real-time video processing (Zacarias et al., 2018; Buccino et al., 2018), optogenetics (Zacarias et al., 2018; Buccino et al., 2018; Moreira et al., 2019), fibre photometry (Soares et al., 2016; Hrvatin et al., 2020), electrophysiology (including specific packages for Open Ephys (Siegle et al., 2017; Neto et al., 2016) and high-density silicon probes (Jun et al., 2017; Dimitriadis, 2018)), and calcium imaging (e.g. UCLA Miniscope; Aharoni et al., 2019; Cai et al., 2016). Bonsai requires researchers to get accustomed to its graphical interface and event-based framework. However, it subsequently reduces the time required to learn real-time programming, and the time to build new interfaces with external devices (see Appendix 1). Moreover, since Bonsai workflows can be called via the command line, BonVision can also be integrated into pre-existing, specialised frameworks in established laboratories.

In summary, BonVision can generate complex 3D environments and retinotopically defined 2D visual stimuli within the same framework. Existing platforms used for vision research, including PsychToolbox (Brainard, 1997), PsychoPy (Peirce, 2007; Peirce, 2008), STYTRA (Štih et al., 2019), or RigBox (Bhagat et al., 2020), focus on well-defined 2D stimuli. Similarly, gaming-driven software, including FreemoVR (Stowers et al., 2017), ratCAVE (Del Grosso and Sirota, 2019), and ViRMEn (Aronov and Tank, 2014), are oriented towards generating virtual reality environments. BonVision combines the advantages of both these approaches in a single framework (Appendix 1), while bringing the unique capacity to automatically calibrate the display environment, and use deep neural networks to provide real-time control of virtual environments. Experiments in BonVision can be rapidly prototyped and easily replicated across different display configurations. Being free, open-source, and portable, BonVision is a state-of-the-art tool for visual display that is accessible to the wider community.

Materials and methods

Benchmarking

We performed benchmarking to measure latencies and skipped (‘dropped’) frames. For benchmarks at 60 Hz refresh rate, we used a standard laptop with the following configuration: Dell Latitude 7480, Intel Core i7-6600U Processor Base with Integrated HD Graphics 520 (Dual Core, 2.6 GHz), 16 GB RAM. For higher refresh rates we used a gaming laptop ASUS ROG Zephyrus GX501GI, with an Intel Core i7-8750H (six cores, 2.20 GHz), 16 GB RAM, equipped with a NVIDIA GeForce GTX 1080. The gaming laptop's built-in display refreshes at 144 Hz, and for measuring latencies at 90 Hz we connected it to a Vive Pro SteamVR head-mounted display (90 Hz refresh rate). All tests were run on Windows 10 Pro 64-bit.

To measure the time from input detection to display update, as well as to detect dropped frames, we used open-source HARP devices from the Champalimaud Research Scientific Hardware Platform, via the Bonsai.HARP package. Specifically, we used the HARP Behavior device (a low-latency DAQ; https://www.cf-hw.org/harp/behavior) to synchronise all measurements, with the extensions: ‘Photodiode v2.1’ to measure the change of the stimulus on the screen, and ‘Mice poke simple v1.2’ as the nose-poke device to externally trigger changes. To filter out the infrared noise generated by an internal LED sensor inside the Vive Pro HMD, we positioned an infrared cut-off filter between the internal headset optics and the photodiode. Typically, the minimal latency for any update is two frames: one needed for the VSync, and one introduced by the OS. Display hardware can add further delays if it includes additional buffering. Benchmarks for video playback were carried out using a trailer from the Durian Open Movie Project (copyright Blender Foundation | durian.blender.org).

All benchmark programmes and data are available at https://github.com/bonvision/benchmarks.

File formats

We tested the display of images and videos using the image and video benchmark workflows. We confirmed the ability to use the following image formats: PNG, JPG, BMP, TIFF, and GIF. Movie display relies on the FFmpeg library (https://ffmpeg.org/), an industry standard, and we confirmed the ability to use the following containers: AVI, MP4, OGG, OGV, and WMV, in conjunction with the standard codecs H264, MPEG4, MPEG2, and DIVX. Importing 3D models and complex scenes relies on the Open Asset Importer Library (Assimp; http://assimp.org/). We confirmed the ability to import and render 3D models and scenes from the following formats: OBJ, Blender.

Animal experiments

All experiments were performed in accordance with the Animals (Scientific Procedures) Act 1986 (United Kingdom) and Home Office (United Kingdom) approved project and personal licenses. The experiments were approved by the University College London Animal Welfare Ethical Review Board under Project License 70/8637. The mice (C57BL6 wild-type) were group-housed with a maximum of five to a cage, under a 12 hr light/dark cycle. All behavioural and electrophysiological recordings were carried out during the dark phase of the cycle.

Innate defensive behaviour

Mice (five male, C57BL6, 8 weeks old) were placed in a 40 cm square arena. A dark refuge placed outside the arena could be accessed through a 10 cm door in one wall. A DLP projector (Optoma GT760) illuminated a screen 35 cm above the arena with a grey background (80 candela/m2). When the mouse was near the centre of the arena, a 2.5 cm black dot appeared on one side of the projection screen and translated smoothly to the opposite side over 3.3 s. Ten trials were conducted over 5 days and the animal was allowed to explore the environment for 5–10 min before the onset of each trial.

Mouse movements were recorded with a near infrared camera (Blackfly S, BFS-U3-13Y3M-C, sampling rate: 60 Hz) positioned over the arena. An infrared LED was used to align video and stimulus. Freezing was defined as a drop in the animal speed below 2 cm/s that lasted more than 0.1 s; flight responses as an increase in the animal running speed above 40 cm/s (De Franceschi et al., 2016). Responses were only considered if they occurred within 3.5 s from stimulus onset.
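A minimal Python sketch of this classification, applied to a synthetic speed trace sampled at the 60 Hz camera rate, is shown below; it is a reconstruction of the stated criteria, not the analysis code used in the study.

```python
import numpy as np

# Synthetic speed trace (cm/s) at the 60 Hz camera frame rate.
fs = 60.0
speed = np.concatenate([np.full(120, 10.0),   # cruising at 10 cm/s
                        np.full(30, 1.0),     # 0.5 s below 2 cm/s -> freeze
                        np.full(60, 45.0)])   # above 40 cm/s -> flight

def classify(speed, fs, freeze_thresh=2.0, flight_thresh=40.0, min_freeze_s=0.1):
    """Freeze: speed below freeze_thresh for longer than min_freeze_s.
    Flight: speed exceeds flight_thresh at any point."""
    below = speed < freeze_thresh
    freeze, run = False, 0
    for b in below:
        run = run + 1 if b else 0
        if run / fs > min_freeze_s:
            freeze = True
            break
    flight = bool(np.any(speed > flight_thresh))
    return freeze, flight

print(classify(speed, fs))   # -> (True, True)
```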

Surgery

Mice were implanted with a custom-built stainless-steel metal plate on the skull under isoflurane anaesthesia. A ~1 mm craniotomy was performed either over the primary visual cortex (2 mm lateral and 0.5 mm anterior from lambda) or superior colliculus (0.5 mm lateral and 0.2 mm anterior from lambda). Mice were allowed to recover for 4–24 hr before the first recording session.

We used a virtual reality apparatus similar to those used in previous studies (Schmidt-Hieber and Häusser, 2013; Muzzu et al., 2018). Briefly, mice were head-fixed above a polystyrene wheel with a radius of 10 cm. Mice were positioned in the geometric centre of a truncated spherical screen onto which we projected the visual stimulus. The visual stimulus was centred at +60° azimuth and +30° elevation and had a span of 120° azimuth and 120° elevation.

Virtual reality behaviour

Five male, 8-week-old, C57BL6 mice were used for this experiment. One week after the surgery, mice were placed on a treadmill and habituated to the virtual reality (VR) environment by progressively increasing the amount of time spent head-fixed, from ~15 min to 2 hr. Mice spontaneously ran on the treadmill, moving through the VR in the absence of reward. The VR environment was a 100 cm long platform with a patterned texture that animals ran along over multiple trials. Each trial started with an animal at the start of the platform and ended when it reached the end, or if 60 s had elapsed. At the end of a trial, there was a 2 s grey interval before the start of the next trial.

Neural recordings

To record neural activity, we used multi-electrode array probes with two shanks and 32 channels (ASSY-37 E-1, Cambridge Neurotech Ltd., Cambridge, UK). Electrophysiology data was acquired with an Open Ephys acquisition board connected to a different computer from that used to generate the visual stimulus.

The electrophysiological data from each session were processed using Kilosort 1 or Kilosort 2 (Pachitariu et al., 2016). We synchronised spike times with behavioural data by aligning the signal of a photodiode that detected the visual stimulus transitions (PDA25K2, Thorlabs, Inc, USA). We sampled the firing rate at 60 Hz, and then smoothed it with a 300 ms Gaussian filter. We calculated receptive fields as the average firing rate or local field potential elicited by the appearance of a stimulus in each location (custom routines in MATLAB).
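The sketch below reconstructs this receptive-field analysis in Python on synthetic data: smooth the firing rate with a 300 ms Gaussian (interpreted here as the filter sigma) and average the response following stimulus onsets at each grid location. The grid size, response window, and synthetic data are assumptions; the original analysis used custom MATLAB routines.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

fs = 60.0                              # firing-rate sampling rate (Hz)
n_samples, grid = 36000, (8, 8)        # 10 min of data, assumed 8 x 8 grid

rng = np.random.default_rng(0)
stim = rng.integers(-1, grid[0] * grid[1], size=n_samples)   # -1 = no square
spikes = rng.poisson(0.1, size=n_samples).astype(float)      # spike counts

# Smooth the firing rate with a 300 ms Gaussian (interpreted as sigma).
rate = gaussian_filter1d(spikes * fs, sigma=0.3 * fs)

# Average the smoothed rate in an assumed 50-150 ms window after each
# appearance of a square at each grid location.
start, stop = int(0.05 * fs), int(0.15 * fs)
rf = np.zeros(grid[0] * grid[1])
for loc in range(rf.size):
    onsets = np.flatnonzero(np.diff((stim == loc).astype(int)) == 1) + 1
    windows = [rate[o + start:o + stop] for o in onsets if o + stop < n_samples]
    rf[loc] = np.mean([w.mean() for w in windows]) if windows else 0.0

print(rf.reshape(grid))
```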

Augmented reality for mice

The mouse behaviour videos were acquired by Bruno Cruz from the lab of Joe Paton at the Champalimaud Centre for the Unknown, using methods similar to Soares et al., 2016. A ResNet-50 network was trained using DeepLabCut (Mathis et al., 2018; Kane et al., 2020). We simulated a visual environment in which a virtual scene was presented beyond the arena, and updated the scenes on three walls of the arena. This simulated how the view changed as the animal moved through the environment. The position of the animal was updated from the video file at a rate of 40 frames/s on a gaming laptop (ASUS ROG Zephyrus GX501GI, with an Intel Core i7-8750H (six cores, 2.20 GHz), 16 GB RAM, equipped with a NVIDIA GeForce GTX 1080), using a 512 × 512 video. The performance can be improved by using a lower pixel resolution for video capture, and we were able to achieve up to 80 frames/s without a noticeable decrease in tracking accuracy using this strategy. Further enhancements can be achieved using a MobileNetV2 network (Kane et al., 2020). The position inference from the deep neural network and the BonVision visual stimulus rendering were run on the same machine.

Human psychophysics

All procedures were approved by the Experimental Psychology Ethics Committee at University College London (Ethics Application EP/2019/002). We obtained informed consent and consent to publish from all participants. Four male participants were tested for this experiment. The experiments were run on a gaming laptop (described above) connected to a Vive Pro SteamVR head-mounted display (90 Hz refresh rate). BonVision is compatible with different headsets (e.g. Oculus Rift, HTC Vive). BonVision receives the projection matrix (the perspective projection of the world onto the display) and the view matrix (the position of the eye in the world) for each eye from the headset. BonVision uses these matrices to generate two textures, one for the left eye and one for the right eye. Standard onboard computations on the headset provide additional non-linear transformations that account for the relationship between the eye and the display (such as lens distortion effects).
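The per-eye rendering step can be summarised with the following Python sketch: for each eye, the image is rendered with a model-view-projection product built from the projection and view matrices supplied by the headset. The matrices below are illustrative stand-ins for what the HMD driver actually returns.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """Standard OpenGL-style perspective projection (stand-in for the
    projection matrix supplied by the headset)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2)
    return np.array([[f / aspect, 0, 0, 0],
                     [0, f, 0, 0],
                     [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
                     [0, 0, -1, 0]])

def eye_view(eye_offset_x):
    """Stand-in for the per-eye view matrix: shift the world opposite to the
    eye's horizontal offset from the head centre."""
    v = np.eye(4)
    v[0, 3] = -eye_offset_x
    return v

model = np.eye(4)                                # object at the world origin
vertex = np.array([0.0, 0.0, -2.0, 1.0])         # a point 2 m straight ahead
for eye, offset in [("left", -0.032), ("right", +0.032)]:
    mvp = perspective(90, 1.0, 0.1, 100.0) @ eye_view(offset) @ model
    clip = mvp @ vertex
    print(eye, "screen x:", clip[0] / clip[3])   # small horizontal disparity
```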

Code availability

BonVision is an open-source software package available to use under the MIT license. It can be downloaded through the Bonsai (bonsai-rx.org) package manager, and the source code is available at: github.com/bonvision/BonVision. All benchmark programmes and data are available at https://github.com/bonvision/benchmarks (copy archived at swh:1:rev:7205c04aa8fcba1075e9c9991ac117bd25e92639, Lopes, 2021). Installation instructions, demos, and learning tools are available at: bonvision.github.io/.

Acknowledgements

We are profoundly thankful to Bruno Cruz and Joe Paton for sharing their videos of mouse behaviour. This work was supported by a Wellcome Enrichment award: Open Research (200501/Z/16/A), a Sir Henry Dale Fellowship from the Wellcome Trust and Royal Society (200501), and a Human Frontier Science Program grant (RGY0076/2018) to ABS, an International Collaboration Award (with Adam Kohn) from the Stavros Niarchos Foundation/Research to Prevent Blindness to SGS, and Medical Research Council grant (R023808) and Biotechnology and Biological Sciences Research Council grant (R004765) to SGS and ABS.

Appendix 1

Basic workflow structure

Each BonVision workflow starts by loading the basic Shaders library (this is Bonsai's implementation of OpenGL) and then creating a window in which stimuli are to be displayed. Bonsai is an event-based framework, so the visual stimulus generation and control are driven by events from the RenderFrame or UpdateFrame nodes, which are in turn activated when a screen refresh occurs. An event broadcast from the RenderFrame or UpdateFrame node then activates the cascade of nodes that load, generate, or update the different visual stimuli.

Closed-loop control

Parameters of stimuli can also be updated, asynchronously and in parallel, by other events. Parameters of any Bonsai node can be controlled by addressing the relevant property within that node – all parameters within a node can be made visible to the external caller of that node. This is particularly useful for generating closed loop stimuli where the value of these parameters can be linked to external IO devices (e.g. position sensors) that are easily accessible using established Bonsai drivers and packages. A major advantage of the Bonsai framework is that the visual stimulus generation does not need to pause to poll those I/O devices, and the values from those devices can be retrieved any time up to the rendering of the frame, creating opportunities for low-lag updating of the visual stimulus.
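The Python sketch below illustrates the principle (it is not Bonsai code): a sensor thread updates a stimulus parameter asynchronously, and the render loop simply reads the most recent value when each frame is drawn, without pausing to poll the device.

```python
import threading, time, random

class StimulusParams:
    """Thread-safe container for a stimulus parameter that an input device
    updates asynchronously while the render loop reads the latest value."""
    def __init__(self):
        self._lock = threading.Lock()
        self._orientation = 0.0

    def set(self, value):
        with self._lock:
            self._orientation = value

    def get(self):
        with self._lock:
            return self._orientation

params = StimulusParams()

def sensor_loop():                       # stands in for an I/O device callback
    while True:
        params.set(random.uniform(0, 360))
        time.sleep(0.005)                # device updates at ~200 Hz

threading.Thread(target=sensor_loop, daemon=True).start()

for frame in range(5):                   # stands in for the 60 Hz render loop
    time.sleep(1 / 60)
    print(f"frame {frame}: drawing grating at {params.get():.1f} deg")
```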

Considerations while using BonVision

Client control

Some experimental designs may rely on complex experimental control protocols that are already established in other software, or are challenging to implement in a reactive framework. For such applications, BonVision’s rendering platform can be used as a client to create and control calibrated visual stimuli. This can be implemented using Bonsai’s inbuilt IP communication protocols to interact with the independent controller software (e.g. Python or MATLAB). BonVision workflows can also be executed from the command-line using standard syntax, without opening the graphical interface of Bonsai.
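As an example of the client side of this pattern, the Python sketch below sends trial parameters to a running workflow over UDP. The port number and JSON message format are hypothetical and depend entirely on how the receiving Bonsai workflow is configured (e.g. which network or OSC nodes it uses).

```python
import socket, json, time

# Hypothetical endpoint of the listening Bonsai/BonVision workflow.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
address = ("127.0.0.1", 2323)

# Send one message per trial; the workflow would parse these fields and
# update the stimulus accordingly.
for trial, contrast in enumerate([0.1, 0.5, 1.0]):
    message = json.dumps({"trial": trial, "contrast": contrast, "duration": 2.0})
    sock.sendto(message.encode(), address)
    time.sleep(3.0)        # wait for the stimulus plus an inter-trial interval
```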

Mercator projection

A key motivation in developing BonVision was the ability to present 2D and 3D stimuli in the same framework. To enable this, we chose to project 2D stimuli onto a 3D sphere, using the Mercator projection. The Mercator projection, however, contracts longitude coordinates around the two poles, and the consequence is that 2D stimuli presented close to the poles are deformed without compensation. Experiments that require 2D-defined stimuli to be presented near the default poles therefore need particular care. There are a few options to overcome this limitation. One option is to rotate the sphere mapping so that the poles are shifted away from the desired stimulus location. A second option is to present the texture on a 3D object facing the observer. For example, to present a grating in a circular aperture, we could render the grating texture on a disk presented in 3D, with the disk placed at the appropriate position. Finally, the user can present stimuli via the NormalisedView node, which defines stimuli in screen pixel coordinates, and use manual calibrations and precomputations to ensure the stimuli are of the correct dimensions.
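The size of this pole contraction is easy to quantify: a patch spanning a fixed range of azimuth covers a physical angle of roughly that azimuth span multiplied by the cosine of its elevation, as in the short sketch below.

```python
import numpy as np

# A texture patch spanning a fixed azimuth (longitude) range covers a physical
# angle of ~(azimuth span) x cos(elevation) on the sphere, so the same patch
# appears progressively narrower as it approaches the poles.
azimuth_span = 20.0                                  # degrees of longitude
for elevation in [0, 30, 60, 80]:
    width = azimuth_span * np.cos(np.radians(elevation))
    print(f"elevation {elevation:>2} deg -> apparent width ~{width:4.1f} deg")
```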

Constructing 3D environments

There are many well-established software packages with graphical interfaces that are capable of creating 3D objects and scenes, and users are likely to have their preferred method. BonVision therefore focuses on providing easy importing of a wide variety of 3D model formats. BonVision offers three options for building 3D environments:

  1. BonVision (limited capability). Inbuilt BonVision processes allow for the rendering of textures onto simple planar surfaces. The user defines the position and orientation of each plane in 3D space, and the texture that is to be drawn onto that plane, using the DrawTexturedModel node.

  2. Import (load) 3D models of objects (including cubes, spheres, and more complex models). Common 3D models (such as those used in Figure 1) are often freely available online. Custom models can be generated using standard 3D software, including Blender and CAD programmes. The user defines the position of each object, and its dynamics, within BonVision, and can independently attach the desired texture(s) to each of the different faces of those objects using the DrawTexturedModel node.

  3. Import a full 3D scene (with multiple objects and camera views). BonVision is able to interact with both individual objects and cameras defined within a 3D scene. A particular advantage of this method is that specialised software (e.g. Blender) provides convenient methods to construct and visualise scenes in advance; BonVision provides the calibrated display environment and the capacity for interaction with the objects.

Once the 3D scene is created, the user can then control a camera (e.g. move or rotate) in the resultant virtual world. BonVision computes the effects of the camera movement (i.e. without any additional user code) to render what the camera should see onto a display device.

Animation lags and timing logs

While BonVision makes substantial efforts to eliminate interruptions to the presentation of a visual stimulus, these can occur, and their causes may be beyond the control of the experimenter. To avoid the potential accumulation of timing errors, the UpdateFrame node uses the current time to specify the current location in an animation sequence. The actual presentation time of each frame in an animation can be logged using the standard logging protocols in BonVision. The log can also include the user-predefined or real-time-updated parameters that were used to generate the corresponding stimulus frame.

Customised nodes and new stimuli

Bonsai’s modular nature and simple integration with C# and Python scripting mean that BonVision can be extended by users. The BonVision package is almost entirely implemented using the Bonsai visual programming language, showcasing its power as a domain-specific language. Custom BonVision nodes, with user-defined inputs, outputs, properties, and operations, can be created in the graphical framework or through C# or Python scripting, allowing users to create novel visual stimuli, define interactions between objects, and enable visual environments that are arbitrarily responsive to experimental subjects.

Physics engine

BonVision is able to calculate interactions between objects using the package Bonsai.Physics, including collisions, bouncing off surfaces, or deformations.

Spatial calibration

BonVision provides automatic calibration protocols to define the position of display(s) relative to the observer. A single positional marker is sufficient for each flat display (illustrated in Figure 2; a standard operating procedure is described on the website). An additional marker is placed in the position of the observer to provide the reference point.

When the observer’s position relative to the display varies (e.g. in the augmented reality example in Figure 3 and Figure 3—video 1), the easiest solution is to calibrate the position of the displays relative to a fixed point in the arena. The observer position is then calculated in real-time, and the vector from the observer to the reference point is added to the vector from the reference to the display. The resultant vector is the calibrated position of the display relative to the observer’s current position.
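A minimal sketch of this bookkeeping, with made-up coordinates, is shown below.

```python
import numpy as np

# The display is calibrated once against a fixed reference point in the arena;
# each frame, its position relative to the moving observer is recomputed as
# (observer -> reference) + (reference -> display). Coordinates are invented.
reference = np.array([0.00, 0.00, 0.00])         # fixed point in the arena (m)
display = np.array([0.30, 0.05, 0.10])           # calibrated once, arena frame

def display_relative_to_observer(observer):
    observer_to_reference = reference - observer
    reference_to_display = display - reference
    return observer_to_reference + reference_to_display   # = display - observer

for observer in [np.array([0.05, 0.10, 0.03]), np.array([0.25, 0.30, 0.03])]:
    print(display_relative_to_observer(observer))
```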

In the case of head-mounted displays (HMDs), BonVision takes advantage of the fact that HMD drivers can provide the calibrated transform matrices from the observer’s eye centre, using the HMDView node.

When the presentation surface is curved (e.g. projection onto a dome) a manual calibration step is required as in other frameworks. This calibration step is often referred to as mesh-mapping and involves the calculation of a transformation matrix that specifies the relationship between a (virtual) flat display and position on the projection surface. A standard operating procedure for calculating this mesh-map is described on the BonVision website.

Performance optimisation

We recommend displaying stimuli through a single graphics card when possible. When multiple displays are used for visual stimulation, we recommend configuring them as a single extended display (as seen by the operating system). All our tests were performed under this configuration.

Appendix 1—table 1. Features of visual display software.
Features BonVision PsychToolbox PsychoPy ViRMEn ratCAVE FreemoVR Unity
Free and Open-source (FOSS) √√ √# √√ √# √√
Rendering of 3D environments √√ √√ √√ √√ √√
Dynamic rendering based on observer viewpoint √√ √√ √√
GUI for designing 3D scenes √√ √√
Import 3rd party 3D scenes √√ √√
Real-time interactive 3D scenes √√ √√ √√ √√ √√
Web-based deployment √√ √√
Interfacing with cameras, sensors and effectors √√ √√ ~ √√ ~ ~
Real-time hardware control √√ ~ ~ √√
Traditional visual stimuli √√ √√ √√
Auto-calibration of display position and pose √√
Integration with deep learning pose estimation √√

√√ easy and well-supported.

√ possible, not well-supported.

~ difficult to implement.

# based on MATLAB (requires a license).

Learning to use BonVision

We provide the following learning materials (which will continue to be updated):

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Samuel G Solomon, Email: s.solomon@ucl.ac.uk.

Aman B Saleem, Email: aman.saleem@ucl.ac.uk.

Chris I Baker, National Institute of Mental Health, National Institutes of Health, United States.

Funding Information

This paper was supported by the following grants:

  • Wellcome Trust 200501/Z/16/A to Aman B Saleem.

  • Wellcome Trust Sir Henry Dale Fellowship (200501) to Aman B Saleem.

  • Royal Society Sir Henry Dale Fellowship (200501) to Aman B Saleem.

  • Medical Research Council R023808 to Samuel G Solomon, Aman B Saleem.

  • Stavros Niarchos Foundation to Samuel G Solomon.

  • Biotechnology and Biological Sciences Research Council R004765 to Samuel G Solomon, Aman B Saleem.

  • Human Frontier Science Program RGY0076/2018 to Aman B Saleem.

Additional information

Competing interests

Gonçalo Lopes is affiliated with NeuroGEARS Ltd. The author has no financial interests to declare.

The other authors declare no competing interests.

Author contributions

Conceptualization, Resources, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Validation, Writing - review and editing.

Validation, Investigation, Writing - review and editing.

Validation.

Validation, Investigation, Writing - review and editing.

Validation, Investigation, Writing - original draft, Writing - review and editing.

Validation, Investigation, Writing - review and editing.

Validation, Investigation, Writing - review and editing.

Validation, Investigation, Writing - review and editing.

Validation, Investigation, Writing - review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing - original draft, Project administration, Writing - review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing - original draft, Project administration, Writing - review and editing.

Ethics

Human subjects: All procedures were approved by the Experimental Psychology Ethics Committee at University College London (Ethics Application EP/2019/002). We obtained informed consent, and consent to publish from all participants.

Animal experimentation: All experiments were performed in accordance with the Animals (Scientific Procedures) Act 1986 (United Kingdom) and Home Office (United Kingdom) approved project and personal licenses. The experiments were approved by the University College London Animal Welfare Ethical Review Board under Project License 70/8637.

Additional files

Transparent reporting form

Data availability

BonVision is an open-source software package available to use under the MIT license. It can be downloaded through the Bonsai (https://bonsai-rx.org) package manager, and the source code is available at: https://github.com/bonvision/BonVision. All benchmark programs and data are available at https://github.com/bonvision/benchmarks (copy archived at https://archive.softwareheritage.org/swh:1:rev:7205c04aa8fcba1075e9c9991ac117bd25e92639). Installation instructions, demos and learning tools are available at: https://bonvision.github.io/.

References

  1. Aharoni D, Khakh BS, Silva AJ, Golshani P. All the light that we can see: a new era in miniaturized microscopy. Nature Methods. 2019;16:11–13. doi: 10.1038/s41592-018-0266-x.
  2. Aronov D, Tank DW. Engagement of neural circuits underlying 2D spatial navigation in a rodent virtual reality system. Neuron. 2014;84:442–456. doi: 10.1016/j.neuron.2014.08.042.
  3. Bhagat J, Wells MJ, Harris KD, Carandini M, Burgess CP. Rigbox: an open-source toolbox for probing neurons and behavior. eNeuro. 2020;7:ENEURO.0406-19.2020. doi: 10.1523/ENEURO.0406-19.2020.
  4. Brainard DH. The psychophysics toolbox. Spatial Vision. 1997;10:433–436. doi: 10.1163/156856897X00357.
  5. Buccino AP, Lepperød ME, Dragly SA, Häfliger P, Fyhn M, Hafting T. Open source modules for tracking animal behavior and closed-loop stimulation based on Open Ephys and Bonsai. Journal of Neural Engineering. 2018;15:055002. doi: 10.1088/1741-2552/aacf45.
  6. Cai DJ, Aharoni D, Shuman T, Shobe J, Biane J, Song W, Wei B, Veshkini M, La-Vu M, Lou J, Flores SE, Kim I, Sano Y, Zhou M, Baumgaertel K, Lavi A, Kamata M, Tuszynski M, Mayford M, Golshani P, Silva AJ. A shared neural ensemble links distinct contextual memories encoded close in time. Nature. 2016;534:115–118. doi: 10.1038/nature17955.
  7. De Franceschi G, Vivattanasarn T, Saleem AB, Solomon SG. Vision guides selection of freeze or flight defense strategies in mice. Current Biology. 2016;26:2150–2154. doi: 10.1016/j.cub.2016.06.006.
  8. Del Grosso NA, Sirota A. Ratcave: a 3D graphics Python package for cognitive psychology experiments. Behavior Research Methods. 2019;51:2085–2093. doi: 10.3758/s13428-019-01245-x.
  9. Dimitriadis G. Why not record from every channel with a CMOS scanning probe? bioRxiv. 2018. doi: 10.1101/275818.
  10. Garrido-Jurado S, Muñoz-Salinas R, Madrid-Cuevas FJ, Marín-Jiménez MJ. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognition. 2014;47:2280–2292. doi: 10.1016/j.patcog.2014.01.005.
  11. Hrvatin S, Sun S, Wilcox OF, Yao H, Lavin-Peter AJ, Cicconet M, Assad EG, Palmer ME, Aronson S, Banks AS, Griffith EC, Greenberg ME. Neurons that regulate mouse torpor. Nature. 2020;583:115–121. doi: 10.1038/s41586-020-2387-5.
  12. Jun JJ, Steinmetz NA, Siegle JH, Denman DJ, Bauza M, Barbarits B, Lee AK, Anastassiou CA, Andrei A, Aydın Ç, Barbic M, Blanche TJ, Bonin V, Couto J, Dutta B, Gratiy SL, Gutnisky DA, Häusser M, Karsh B, Ledochowitsch P, Lopez CM, Mitelut C, Musa S, Okun M, Pachitariu M, Putzeys J, Rich PD, Rossant C, Sun WL, Svoboda K, Carandini M, Harris KD, Koch C, O'Keefe J, Harris TD. Fully integrated silicon probes for high-density recording of neural activity. Nature. 2017;551:232–236. doi: 10.1038/nature24636.
  13. Kane GA, Lopes G, Saunders JL, Mathis A, Mathis MW. Real-time, low-latency closed-loop feedback using markerless posture tracking. eLife. 2020;9:e61909. doi: 10.7554/eLife.61909.
  14. Lopes G, Bonacchi N, Frazão J, Neto JP, Atallah BV, Soares S, Moreira L, Matias S, Itskov PM, Correia PA, Medina RE, Calcaterra L, Dreosti E, Paton JJ, Kampff AR. Bonsai: an event-based framework for processing and controlling data streams. Frontiers in Neuroinformatics. 2015;9:7. doi: 10.3389/fninf.2015.00007.
  15. Lopes G. BonVision Benchmarks. Software Heritage. 2021. swh:1:rev:7205c04aa8fcba1075e9c9991ac117bd25e92639. https://archive.softwareheritage.org/swh:1:rev:7205c04aa8fcba1075e9c9991ac117bd25e92639
  16. Mathis A, Mamidanna P, Cury KM, Abe T, Murthy VN, Mathis MW, Bethge M. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience. 2018;21:1281–1289. doi: 10.1038/s41593-018-0209-y.
  17. Moreira JM, Itskov PM, Goldschmidt D, Baltazar C, Steck K, Tastekin I, Walker SJ, Ribeiro C. optoPAD, a closed-loop optogenetics system to study the circuit basis of feeding behaviors. eLife. 2019;8:e43924. doi: 10.7554/eLife.43924.
  18. Muzzu T, Mitolo S, Gava GP, Schultz SR. Encoding of locomotion kinematics in the mouse cerebellum. PLOS ONE. 2018;13:e0203900. doi: 10.1371/journal.pone.0203900.
  19. Neto JP, Lopes G, Frazão J, Nogueira J, Lacerda P, Baião P, Aarts A, Andrei A, Musa S, Fortunato E, Barquinha P, Kampff AR. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset. Journal of Neurophysiology. 2016;116:892–903. doi: 10.1152/jn.00103.2016.
  20. Pachitariu M, Steinmetz N, Kadir S, Carandini M, Harris KD. Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. bioRxiv. 2016. doi: 10.1101/061481.
  21. Peirce JW. PsychoPy – psychophysics software in Python. Journal of Neuroscience Methods. 2007;162:8–13. doi: 10.1016/j.jneumeth.2006.11.017.
  22. Peirce JW. Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics. 2008;2:10. doi: 10.3389/neuro.11.010.2008.
  23. Pereira TD, Aldarondo DE, Willmore L, Kislin M, Wang SS, Murthy M, Shaevitz JW. Fast animal pose estimation using deep neural networks. Nature Methods. 2019;16:117–125. doi: 10.1038/s41592-018-0234-5.
  24. Rolland JP, Gibson W, Ariely D. Towards quantifying depth and size perception in virtual environments. Presence: Teleoperators and Virtual Environments. 1995;4:24–49. doi: 10.1162/pres.1995.4.1.24.
  25. Scarfe P, Glennerster A. Using high-fidelity virtual reality to study perception in freely moving observers. Journal of Vision. 2015;15:3. doi: 10.1167/15.9.3.
  26. Schmidt-Hieber C, Häusser M. Cellular mechanisms of spatial navigation in the medial entorhinal cortex. Nature Neuroscience. 2013;16:325–331. doi: 10.1038/nn.3340.
  27. Siegle JH, López AC, Patel YA, Abramov K, Ohayon S, Voigts J. Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. Journal of Neural Engineering. 2017;14:045003. doi: 10.1088/1741-2552/aa5eea.
  28. Soares S, Atallah BV, Paton JJ. Midbrain dopamine neurons control judgment of time. Science. 2016;354:1273–1277. doi: 10.1126/science.aah5234.
  29. Štih V, Petrucco L, Kist AM, Portugues R. Stytra: an open-source, integrated system for stimulation, tracking and closed-loop behavioral experiments. PLOS Computational Biology. 2019;15:e1006699. doi: 10.1371/journal.pcbi.1006699.
  30. Stowers JR, Hofbauer M, Bastien R, Griessner J, Higgins P, Farooqui S, Fischer RM, Nowikovsky K, Haubensak W, Couzin ID, Tessmar-Raible K, Straw AD. Virtual reality for freely moving animals. Nature Methods. 2017;14:995–1002. doi: 10.1038/nmeth.4399.
  31. Yeh CI, Xing D, Williams PE, Shapley RM. Stimulus ensemble and cortical layer determine V1 spatial receptive fields. PNAS. 2009;106:14652–14657. doi: 10.1073/pnas.0907406106.
  32. Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nature Communications. 2018;9:1–11. doi: 10.1038/s41467-018-05875-1.

Decision letter

Editor: Chris I Baker
Reviewed by: Jonathan P Newman, André Maia Chagas, Sue Ann Koay

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

Increasingly, neuroscience experiments require immersive virtual environments that approximate natural sensory motor loops while permitting high-bandwidth measurements of brain activity. BonVision is an open-source graphics programming library that allows experimenters to quickly implement immersive 3D visual environments across display hardware and geometry with automated calibration and integration with hundreds of different neural recording technologies, behavioral apparatuses, etc. BonVision standardizes sharing complex, closed-loop visual tasks between labs with vastly different equipment and provides a concrete and easy way to do so.

Decision letter after peer review:

Thank you for submitting your article "Creating and controlling visual environments using BonVision" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Chris Baker as the Senior and Reviewing Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jonathan P Newman (Reviewer #1); André Maia Chagas (Reviewer #2); Sue Ann Koay (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this letter to help you prepare a revised submission.

Essential revisions:

In general, the reviewers were very positive about the manuscript and appreciated the time and effort taken both to develop BonVision and write this manuscript. The major concerns reflect a desire from the reviewers to see more detail on specific points as well as clarification over some of the statements made.

In your revision please address the specific recommendations below.

Reviewer #1 (Recommendations for the authors):

General comment: There are two measures of performance that are not explored in the manuscript but may aid in describing BonVision's advantages over alternative software. The first is the improved performance and ease of use compared to alternatives in cases where the input used to drive visual stimuli consists of mixtures of asynchronous data sources (e.g. ephys and behavioral measurements together). This is something I imagine BonVision could do with less effort and greater computational efficiency than alternative software. The animal experiments provided are good benchmarks because they are common designs, but do not demonstrate BonVision's immense potential for easily creating visual tasks with complex IO and stimulus contingencies. The second is a measure of human effort required to implement an experiment using BonVision compared to imperative, text-based alternatives. I think both of these issues could be tackled in the discussion by expanding a bit on Lines 144-146: why are BonVision and Bonsai so good at data stream composition compared to alternatives, and why is a visual programming approach so appropriate for Bonsai/BonVision's target use cases?

General comment: Following up on my desire for a more detailed explanation of the operation of the Bonsai.Shaders library, most of its operators have obvious relations to traditional OpenGL programming operations. However, an explanation of how the traditionally global state machine (context) of OpenGL was mapped onto the Bonsai.Shaders nodes and how temporal order of OpenGL context option manipulation is enforced might be helpful for those wishing to understand the underlying mechanics of BonVision and create their own tools using the Shaders library.

Line 11: The use of the word "timing" is ambiguous to me. Are the authors referring to closed loop reaction times and determinism, hardware IO delays, the combination of samples from asynchronous data streams, or all of the above?

Lines 13 and 22: The authors correctly state that graphics programming requires advanced training. However, the use of Bonsai, a functional language that operates entirely on Observable Sequences, also requires quite a lot of training to use effectively. I do think the authors have a point here, and I agree Bonsai is a tool worth learning, but I feel the main strength of using Bonsai is its (broadly defined) performance (speed, elegance when dealing with asynchronous data, ease and formality of experiment sharing, ease of rapid prototyping, etc) rather than its learning curve. This point is exacerbated by the lack of documentation (outside of the source code) for many Bonsai features.

Line 64: Adding a parenthetical link to the Blender website seems appropriate.

Line 97: The model species should be stated here.

Figure 4(C): There is a single instance of BonVision being outperformed by PsychoPy3 in the case of 16-layer texture blending at 60 FPS. Can the authors comment on why this might be (e.g. PsychoPy3's poor performance at low layer counts is due to some memory bottleneck?) and why this matters (or does not matter) practically in the context of BonVision's target uses?

Figure 4(A-C): The cartoons of display screens have little black boxes in the lower right corners and I'm not sure what they mean.

Figure 5(A): As mentioned previously, it seems that these are post-hoc temporally aligned receptive fields (RFs). Is it worth seeing what the RFs created without post-hoc photodiode-based alignment of stimuli onset look like so that we can see the effect of display presentation jitter (or lack thereof)? This would be a nice indication of the utility of the package for real-time stimulus shaping for system ID purposes where ground truth alignment might not be possible. This is made more relevant given BonVision's apparent larger latency jitter compared to PsychToolbox (Figure 4A).

Figure 5(D): Although useful, the size discrimination task probably does not cover all potential corner cases with this type of projection. I don't think more experiments need to be performed but a more thorough theoretical comparison with other methods, e.g. sphere mapping, might be useful to motivate the choice of cube mapping for rendering 3D objects, perhaps in the discussion.

Figure 5(I): The caption refers to the speed of the animal on the ordinate axis but that figure seems to display a psychometric curve for freezing or running behaviors over time from stimulus presentation.

Lines 293 and 319-322: HARP is a small enough project that I feel some explanation of the project's intentions and capabilities and the Bonsai library used to acquire data from HARP hardware might be useful.

Line 384: "OpenEphys" should be changed to "Open Ephys" in the text and in reference 13 used to cite the Acquisition Board's use.

Reviewer #2 (Recommendations for the authors):

– Figures 1, 2, 3 and Supp1 – the indication of the observer in these figures is sometimes a black, and sometimes a red dot. This is not bad, but I think you could streamline your figures if on the first one you had a legend for what represents the observer (ie observer = red dot) and have the same pattern through the figures?

– Figure 2 – If I understand correctly, in panels C and E, the markers are read by an external camera, which I am supposing in this case is the laptop camera? If this is the case, could you please change these panels so that they explicitly show where the cameras are? Maybe adding the first top left panel from supp Figure 3 to Figure 2 and indicating from where the markers are read would solve this?

– Figure 5 – Panel I: the legend states "The speed of the animal across different trials, aligned to the time of stimulus appearance." but the figure Y axis states Cumulative probability. I guess the legend needs updating? Also it is not clear to me how the cumulative probabilities of freeze and flight can sum up to more than one, as it seems to be the case from the figure? I am assuming that an animal either freezes or flees in this test? Maybe I missed something?

– In the results, lines 40 to 42, the authors describe how they have managed to have a single framework for both traditional visual presentation and immersive virtual reality. Namely, they project the 2D coordinate frame onto a 3D sphere using the Mercator projection. I would like to ask the authors to explain a bit how they deal with the distortions present in this type of projection. As far as I understand, this type of projection inflates the size of objects that are further away from the sphere midline (with increased intensity the further away)? Is this compensated for in the framework somehow? Would it make sense to offer users the option to choose different projections depending on their application?

– In line 62 "BonVision also has the ability to import standard format 3D design files" could the authors specify which file formats are accepted?

– When benchmarking BonVision (starting on line 73), the authors focus on 60Hz stimulus presentation using monitors with different capabilities. This is great, as it addresses main points for human, non-human primates and rodent experiments. I believe however that it would be great for the paper and the community in general if the authors could do some benchmarking with higher frame rates and contextualize BonVision for the use with other animal models, such as Fly, fish, etc. Given that there are a couple of papers describing visual stimulators that take care of the different wavelengths needed to stimulate the visual system of these animals, it seems to me that BonVision would be a great tool to create stimuli and environments for these stimulators and animal models.

Reviewer #3 (Recommendations for the authors):

I have a few presentation style points where I feel the text should be more careful not to come across as unintendedly too strong, or otherwise justification needs to be provided to substantiate the claims. Most importantly, line 19 "the ability for rapid and efficient interfacing with external hardware (needed for experimentation) without development of complex multi-threaded routines" is a bit mysterious to me because I am unsure what these external hardware are that BonVision facilitates interfacing with. For example, experimenters do prefer multi-threaded routines where the other threads are used to trigger reward delivery, sensory stimuli of other modalities, or control neural stimulation or recording devices. This is in order to avoid blocking execution of the visual display software when these other functions are called. If BonVision provides a solution for these kinds of experiment interfacing requirements, I think they are definitely important enough to mention in the text. Otherwise, the sentence of line 19 needs some work in order to make it clear as to exactly which functionalities of BonVision are being referred to.

The other claims that stood out to me are as follows. In the abstract it is said that "Real-time rendering… necessary for next-generation…", but I don't know if anybody can actually claim that any one method is necessary. In line 116, "suggesting habituation to the virtual environment", the authors can also acknowledge that mice might simply be habituating to the rig (e.g. even if there was no visual display), since this does not seem to be a major claim that needs to be made. The virtual cliff effect (line 118) also seems very interesting, but the authors have not fully demonstrated that mice are not alternatively responding to a change in floor texture. It is also unclear to me why a gray floor (which looks to be equiluminant with the rest of the textured floor at least by guessing from Figure 5F) should be visually identified as a cliff, as opposed to, say, black. In order to make this claim about visual cliff identification especially without binocular vision, the authors would probably have to show experiments where the mice do not slow down at other floor changes (to white maybe?), but I'm unsure as to whether the data exists for this or whether it is worth the effort. Overall I don't see a reason why the authors should attempt to claim that "BonVision is capable of eliciting naturalistic behaviors in a virtual environment", since the naturalness of rodent behaviors in virtual environments is a topic of debate in some circles, independent of the software used to generate those environments. I figure it's better to stay away unless this is a fight that one desires to fight.

eLife. 2021 Apr 21;10:e65541. doi: 10.7554/eLife.65541.sa2

Author response


Reviewer #1 (Recommendations for the authors):

General comment: There are two measures of performance that are not explored in the manuscript but may aid in describing BonVision's advantages over alternative software. The first is the improved performance and ease of use compared to alternatives in cases where the input used to drive visual stimuli consists of mixtures of asynchronous data sources (e.g. ephys and behavioral measurements together). This is something I imagine BonVision could do with less effort and greater computational efficiency than alternative software. The animal experiments provided are good benchmarks because they are common designs, but do not demonstrate BonVision's immense potential for easily creating visual tasks with complex IO and stimulus contingencies. The second is a measure of human effort required to implement an experiment using BonVision compared to imperative, text-based alternatives. I think both of these issues could be tackled in the discussion by expanding a bit on Lines 144-146: why are BonVision and Bonsai so good at data stream composition compared to alternatives, and why is a visual programming approach so appropriate for Bonsai/BonVision's target use cases?

We agree and we have now revised the Introduction and Discussion to better make these points transparent (particularly around lines 44-55 and 235-239).
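
To make the idea of composing asynchronous data streams concrete for readers, the following is a minimal, purely illustrative sketch in plain Python (not Bonsai or BonVision code) of the pattern the reviewer describes: two independent inputs, such as a running-wheel stream and an ephys-derived trigger, arrive asynchronously and are sampled once per frame to condition a stimulus. All names and numbers are hypothetical.

```python
# Illustrative only: generic Python, not Bonsai/BonVision code.
# Two asynchronous sources (a simulated running wheel and a simulated
# ephys-derived trigger) feed queues; the per-frame loop samples the
# latest values and uses them to condition a stimulus.
import queue
import random
import threading
import time

wheel_events = queue.Queue()
trigger_events = queue.Queue()

def wheel_source():
    position = 0.0
    while True:                      # ~100 Hz wheel samples
        position += random.uniform(-1.0, 1.0)
        wheel_events.put(position)
        time.sleep(0.01)

def trigger_source():
    while True:                      # irregular triggers
        time.sleep(random.uniform(0.2, 0.5))
        trigger_events.put(time.time())

threading.Thread(target=wheel_source, daemon=True).start()
threading.Thread(target=trigger_source, daemon=True).start()

latest_position, stimulus_on = 0.0, False
for frame in range(60):              # one second of notional 60 Hz frames
    while not wheel_events.empty():  # keep only the most recent wheel sample
        latest_position = wheel_events.get()
    while not trigger_events.empty():
        trigger_events.get()
        stimulus_on = not stimulus_on
    # A real experiment would render here using (latest_position, stimulus_on).
    time.sleep(1 / 60)
```

In an event-based framework such as Bonsai, this bookkeeping (threads, queues, per-frame sampling) is handled by the language itself rather than written by hand, which is the point being made above.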

General comment: Following up on my desire for a more detailed explanation of the operation of the Bonsai.Shaders library, most of its operators have obvious relations to traditional OpenGL programming operations. However, an explanation of how the traditionally global state machine (context) of OpenGL was mapped onto the Bonsai.Shaders nodes and how temporal order of OpenGL context option manipulation is enforced might be helpful for those wishing to understand the underlying mechanics of BonVision and create their own tools using the Shaders library.

We thank the reviewer for prompting us. Generally, we now mention that we build on the Bonsai.Shaders package in new text (lines 58-59 and in Supplementary Details).

Specifically, regarding the temporal order of OpenGL commands, we also include the text (in lines 133-135): “BonVision accumulates a list of the commands to OpenGL as the program makes them. To optimise rendering performance, the priority of these commands is ordered according to that defined in the Shaders component of the LoadResources node (which the user can manipulate for high-performance environments). These ordered calls are then executed when the frame is rendered.”
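
As a purely illustrative sketch of this kind of deferred, priority-ordered execution (and not the BonVision or Bonsai.Shaders implementation itself), the Python fragment below shows the general pattern; the RenderQueue class and resource names are hypothetical.

```python
# Illustrative sketch only (not BonVision source): a render loop that
# accumulates draw commands during the frame and executes them in a
# priority order declared up front, analogous to the ordering described above.

class RenderQueue:
    def __init__(self, resource_priority):
        # resource_priority: dict mapping resource/shader name -> rank,
        # standing in for the order declared in a LoadResources-style node.
        self.resource_priority = resource_priority
        self.pending = []

    def submit(self, resource_name, command):
        # Commands are accumulated as the program issues them...
        rank = self.resource_priority.get(resource_name, float("inf"))
        self.pending.append((rank, command))

    def render_frame(self):
        # ...and executed in priority order when the frame is rendered.
        for _, command in sorted(self.pending, key=lambda item: item[0]):
            command()
        self.pending.clear()


queue = RenderQueue({"Background": 0, "Grating": 1, "Overlay": 2})
queue.submit("Overlay", lambda: print("draw overlay"))
queue.submit("Background", lambda: print("draw background"))
queue.render_frame()  # prints background first, then overlay
```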

Line 11: The use of the word "timing" is ambiguous to me. Are the authors referring to closed loop reaction times and determinism, hardware IO delays, the combination of samples from asynchronous data streams, or all of the above?

Thank you for picking this up – the organisation of the first paragraph meant that the subject of this sentence was unclear, and we have now tried to make this paragraph clearer, including splitting it into two distinct points (lines 3-17). We hope these changes now address the reviewer's point.

Lines 13 and 22: The authors correctly state that graphics programming requires advanced training. However, the use of Bonsai, a functional language that operates entirely on Observable Sequences, also requires quite a lot of training to use effectively. I do think the authors have a point here, and I agree Bonsai is a tool worth learning, but I feel the main strength of using Bonsai is its (broadly defined) performance (speed, elegance when dealing with asynchronous data, ease and formality of experiment sharing, ease of rapid prototyping, etc) rather than its learning curve. This point is exacerbated by the lack of documentation (outside of the source code) for many Bonsai features.

We agree and have revised the Introduction (lines 42-55) and Discussion (lines 235-239) to make this clearer.

Line 64: Adding a parenthetical link to the Blender website seems appropriate.

Line 97: The model species should be stated here.

Done.

Figure 4(C): There is a single instance of BonVision being outperformed by PsychoPy3 in the case of 16-layer texture blending at 60 FPS. Can the authors comment on why this might be (e.g. PsychoPy3's poor performance at low layer counts is due to some memory bottleneck?) and why this matters (or does not matter) practically in the context of BonVision's target uses?

In the conditions under which the benchmarking was performed, PsychoPy was able to present more overlapping stimuli than BonVision and PsychToolbox because PsychoPy presented stimuli at a lower resolution than the other systems. We now indicate this in the main text of the manuscript (lines 150-151).

Figure 4(A-C): The cartoons of display screens have little black boxes in the lower right corners and I'm not sure what they mean.

The black square represents the position of a flickering square, the luminance of which is detected by a photodiode and used to measure frame display times. We have now updated the legend of Figure 4 to make this clear.
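
For readers unfamiliar with this technique, the sketch below shows one way such a photodiode trace could be converted into frame display times. It is a generic illustration under stated assumptions, not the analysis code used here; the function and variable names are hypothetical.

```python
# Illustrative sketch (not part of BonVision): estimating frame display times
# from a photodiode signal recorded over the flickering corner square.
# Assumes `trace` is a 1-D voltage array sampled at `fs` Hz.
import numpy as np

def frame_times_from_photodiode(trace, fs, threshold=None):
    trace = np.asarray(trace, dtype=float)
    if threshold is None:
        # Midpoint between the dark and bright levels of the flicker square.
        threshold = 0.5 * (trace.min() + trace.max())
    above = trace > threshold
    # A display time is registered at every luminance transition (dark<->bright),
    # since the square toggles on each presented frame.
    transitions = np.flatnonzero(np.diff(above.astype(int)) != 0) + 1
    return transitions / fs  # seconds

# Example with a synthetic 60 Hz flicker sampled at 1 kHz:
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
synthetic = np.floor(t * 60) % 2  # alternates every frame
print(frame_times_from_photodiode(synthetic, fs)[:5])
```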

Figure 5(A): As mentioned previously, it seems that these are post-hoc temporally aligned receptive fields (RFs). Is it worth seeing what the RFs created without post-hoc photodiode-based alignment of stimuli onset look like so that we can see the effect of display presentation jitter (or lack thereof)? This would be a nice indication of the utility of the package for real-time stimulus shaping for system ID purposes where ground truth alignment might not be possible. This is made more relevant given BonVision's apparent larger latency jitter compared to PsychToolbox (Figure 4A).

We thank the reviewer for this suggestion. We now include receptive field maps calculated using the BonVision timing log in Figure 5—figure supplement 1. Using the BonVision timing log alone was also effective in identifying receptive fields.
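
As an illustration of the general approach (and not the specific analysis pipeline used in the paper), a receptive field can be estimated from a logged stimulus sequence and spike times by simple reverse correlation, for example:

```python
# Illustrative only: a basic reverse-correlation receptive-field estimate
# from a logged sparse-noise sequence. All names are hypothetical.
# frames : array (n_stimuli, ny, nx) of presented noise images
# onsets : logged onset time of each stimulus frame, in seconds
# spikes : spike times of one unit, in seconds
import numpy as np

def receptive_field(frames, onsets, spikes, delay=0.05, window=0.1):
    frames = np.asarray(frames, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    spikes = np.asarray(spikes, dtype=float)
    # Count spikes in a response window following each onset (shifted by `delay`).
    counts = np.array([
        np.sum((spikes >= t + delay) & (spikes < t + delay + window))
        for t in onsets
    ])
    # Average the stimulus frames weighted by the spike counts they evoked.
    return np.tensordot(counts, frames, axes=(0, 0)) / max(counts.sum(), 1)

# Example with random data:
rng = np.random.default_rng(0)
frames = rng.integers(0, 2, size=(500, 8, 8))
onsets = np.arange(500) * 0.1
spikes = rng.uniform(0, 50, size=2000)
print(receptive_field(frames, onsets, spikes).shape)  # (8, 8)
```

The only difference between the two maps compared above is which set of onset times is passed in: the photodiode-derived times or the times taken from the BonVision log.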

Figure 5(D): Although useful, the size discrimination task probably does not cover all potential corner cases with this type of projection. I don't think more experiments need to be performed but a more thorough theoretical comparison with other methods, e.g. sphere mapping, might be useful to motivate the choice of cube mapping for rendering 3D objects, perhaps in the discussion.

We now clarify that we use the size discrimination task as a simple test of the ability of BonVision to run VR stimuli on a head-mounted display.

Although we considered the different mapping styles, we settled on cube mapping for 3D stimuli, as this is currently the standard for 3D rendering systems and the most computationally efficient. We have included a detailed discussion of the merits and issues with the Mercator projection for 2D stimuli in the new section “Appendix 1”.

Figure 5(I): The caption refers to the speed of the animal on the ordinate axis but that figure seems to display a psychometric curve for freezing or running behaviors over time from stimulus presentation.

Thank you for pointing this out, we have now corrected this.

Lines 293 and 319-322: HARP is a small enough project that I feel some explanation of the project's intentions and capabilities and the Bonsai library used to acquire data from HARP hardware might be useful.

We have now added more information on the HARP sources, and why we have employed it here, including details of the Bonsai library needed to use the HARP device (lines 648-652). However, we are not core members of the HARP project and are wary of speaking on its behalf about its intentions and other capabilities.

Line 384: "OpenEphys" should be changed to "Open Ephys" in the text and in reference 13 used to cite the Acquisition Board's use.

Done.

Reviewer #2 (Recommendations for the authors):

– Figures 1, 2, 3 and Supp1 – the indication of the observer in these figures is sometimes a black, and sometimes a red dot. This is not bad, but I think you could streamline your figures if on the first one you had a legend for what represents the observer (ie observer = red dot) and have the same pattern through the figures?

Great suggestion, thank you. We have now changed all observers to red dots and indicated this in the legend.

– Figure 2 – If I understand correctly, in panels C and E, the markers are read by an external camera, which I am supposing in this case is the laptop camera? If this is the case, could you please change these panels so that they explicitly show where the cameras are? Maybe adding the first top left panel from supp Figure 3 to Figure 2 and indicating from where the markers are read would solve this?

We think that the reviewer had spotted that there are multiple cameras shown in the image, and we apologise for not spotting this ourselves. The calibration is performed using only the images shown (that is, the camera that is taking the image is the one used for the calibration). We now make this clearer in the legend to Figure 2.

– Figure 5 – Panel I: the legend states "The speed of the animal across different trials, aligned to the time of stimulus appearance." but the figure Y axis states Cumulative probability. I guess the legend needs updating? Also it is not clear to me how the cumulative probabilities of freeze and flight can sum up to more than one, as it seems to be the case from the figure? I am assuming that an animal either freezes or flees in this test? Maybe I missed something?

We thank the reviewer for highlighting this error. We have updated the legend.

– In the results, lines 40 to 42, the authors describe how they have managed to have a single framework for both traditional visual presentation and immersive virtual reality. Namely, they project the 2D coordinate frame onto a 3D sphere using the Mercator projection. I would like to ask the authors to explain a bit how they deal with the distortions present in this type of projection. As far as I understand, this type of projection inflates the size of objects that are further away from the sphere midline (with increased intensity the further away)? Is this compensated for in the framework somehow? Would it make sense to offer users the option to choose different projections depending on their application?

This is an excellent point. We have added a specific discussion of the Mercator projection in the new Appendix 1, where we describe the distortions and methods to work around them.
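
For reference, the distortion under discussion follows directly from the standard Mercator relation; the exact convention adopted in Appendix 1 of the paper may differ, so the following is given only as background.

```latex
% Standard Mercator relation between spherical coordinates (azimuth \lambda,
% elevation \phi) and flat texture coordinates (x, y):
\[
  x = \lambda, \qquad y = \ln \tan\!\left(\frac{\pi}{4} + \frac{\phi}{2}\right)
\]
% The local vertical magnification is therefore
\[
  \frac{dy}{d\phi} = \sec\phi ,
\]
% so a patch of fixed angular size is stretched by roughly \(\sec\phi\) as it
% moves away from the midline, and the poles cannot be represented at all.
```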

– In line 62 "BonVision also has the ability to import standard format 3D design files" could the authors specify which file formats are accepted?

We now link from the main text to the ‘File Formats’ section in Methods.

– When benchmarking BonVision (starting on line 73), the authors focus on 60Hz stimulus presentation using monitors with different capabilities. This is great, as it addresses main points for human, non-human primates and rodent experiments. I believe however that it would be great for the paper and the community in general if the authors could do some benchmarking with higher frame rates and contextualize BonVision for the use with other animal models, such as Fly, fish, etc. Given that there are a couple of papers describing visual stimulators that take care of the different wavelengths needed to stimulate the visual system of these animals, it seems to me that BonVision would be a great tool to create stimuli and environments for these stimulators and animal models.

We have added a new Figure 4—figure supplement 1, in which we show the results of the non-overlapping textures benchmark for BonVision at 144 Hz refresh. Comparison with the same data obtained at 60 Hz shows little deterioration in performance. These new data supplement the extant tests in Figure 4A, where we tested the closed-loop latency at these higher frame rates.

Reviewer #3 (Recommendations for the authors):

I have a few presentation style points where I feel the text should be more careful not to come across as unintendedly too strong, or otherwise justification needs to be provided to substantiate the claims. Most importantly, line 19 "the ability for rapid and efficient interfacing with external hardware (needed for experimentation) without development of complex multi-threaded routines" is a bit mysterious to me because I am unsure what these external hardware are that BonVision facilitates interfacing with. For example, experimenters do prefer multi-threaded routines where the other threads are used to trigger reward delivery, sensory stimuli of other modalities, or control neural stimulation or recording devices. This is in order to avoid blocking execution of the visual display software when these other functions are called. If BonVision provides a solution for these kinds of experiment interfacing requirements, I think they are definitely important enough to mention in the text. Otherwise, the sentence of line 19 needs some work in order to make it clear as to exactly which functionalities of BonVision are being referred to.

We agree and have now revised the Introduction (lines 42-55) to make these points clearer.

The other claims that stood out to me are as follows. In the abstract it is said that "Real-time rendering… necessary for next-generation…", but I don't know if anybody can actually claim that any one method is necessary.

We have changed the text to say ‘important’ rather than ‘necessary’.

In line 116, "suggesting habituation to the virtual environment", the authors can also acknowledge that mice might simply be habituating to the rig (e.g. even if there was no visual display), since this does not seem to be a major claim that needs to be made. The virtual cliff effect (line 118) also seems very interesting, but the authors have not fully demonstrated that mice are not alternatively responding to a change in floor texture. It is also unclear to me why a gray floor (which looks to be equiluminant with the rest of the textured floor at least by guessing from Figure 5F) should be visually identified as a cliff, as opposed to, say, black. In order to make this claim about visual cliff identification especially without binocular vision, the authors would probably have to show experiments where the mice do not slow down at other floor changes (to white maybe?), but I'm unsure as to whether the data exists for this or whether it is worth the effort. Overall I don't see a reason why the authors should attempt to claim that "BonVision is capable of eliciting naturalistic behaviors in a virtual environment", since the naturalness of rodent behaviors in virtual environments is a topic of debate in some circles, independent of the software used to generate those environments. I figure it's better to stay away unless this is a fight that one desires to fight.

We agree that there are heated debates around these issues in the field, and that this is not the place to have those discussions. We have changed the relevant sentence to read (lines 204-205): “BonVision is therefore capable of generating virtual reality environments which both elicit, and are responsive to animal behaviour.”
