Journal of Neurophysiology
2019 Sep 25;122(6):2220–2242. doi: 10.1152/jn.00301.2019

A passive, camera-based head-tracking system for real-time, three-dimensional estimation of head position and orientation in rodents

Walter Vanzella 1,3,*, Natalia Grion 1,*, Daniele Bertolini 1,*, Andrea Perissinotto 1,3, Marco Gigante 2, Davide Zoccolan 1
PMCID: PMC6966308  PMID: 31553687

Abstract

Tracking head position and orientation in small mammals is crucial for many applications in the field of behavioral neurophysiology, from the study of spatial navigation to the investigation of active sensing and perceptual representations. Many approaches to head tracking exist, but most of them only estimate the 2D coordinates of the head over the plane where the animal navigates. Full reconstruction of the pose of the head in 3D is much more challenging and has been achieved only in a handful of studies, which employed headsets made of multiple LEDs or inertial units. However, these assemblies are rather bulky and need to be powered to operate, which prevents their application in wireless experiments and in the small enclosures often used in perceptual studies. Here we propose an alternative approach, based on passively imaging a lightweight, compact, 3D structure, painted with a pattern of black dots over a white background. By applying a cascade of feature extraction algorithms that progressively refine the detection of the dots and reconstruct their geometry, we developed a tracking method that is highly precise and accurate, as assessed through a battery of validation measurements. We show that this method can be used to study how a rat samples sensory stimuli during a perceptual discrimination task and how a hippocampal place cell represents head position over extremely small spatial scales. Given its minimal encumbrance and wireless nature, our method could be ideal for high-throughput applications, where tens of animals need to be simultaneously and continuously tracked.

NEW & NOTEWORTHY Head tracking is crucial in many behavioral neurophysiology studies. Yet reconstruction of the head’s pose in 3D is challenging and typically requires implanting bulky, electrically powered headsets that prevent wireless experiments and are hard to employ in operant boxes. Here we propose an alternative approach, based on passively imaging a compact, 3D dot pattern that, once implanted over the head of a rodent, allows estimating the pose of its head with high precision and accuracy.

Keywords: head tracking, perceptual discrimination, place cell, pose estimation

INTRODUCTION

Careful monitoring and quantification of motor behavior is essential to investigate a range of cognitive functions (such as motor control, active perception, and spatial navigation) in a variety of different species. Examples include tracking eye movements in primate and nonprimate species (Kimmel et al. 2012; Payne and Raymond 2017; Remmel 1984; Stahl et al. 2000; Wallace et al. 2013; Zoccolan et al. 2010), monitoring whisking activity in rodents (Knutsen et al. 2005; Perkon et al. 2011; Rigosa et al. 2017), and tracking the position of virtually any species displaying interesting navigation patterns—from bacteria (Berg and Brown 1972) and invertebrate species (Cavagna et al. 2017; Garcia-Perez et al. 2005; Mazzoni et al. 2005; Mersch et al. 2013) to small terrestrial (Aragão et al. 2011; Tort et al. 2006) and aerial mammals (Tsoar et al. 2011; Yartsev and Ulanovsky 2013) and birds (Attanasi et al. 2014). In particular, studies in laboratory animals aimed at measuring the neuronal correlates of a given behavior require tools that can accurately track it in time and space and record it along with the underlying neuronal signals.

A classical application of this approach is to track the position of a light-emitting diode (LED), mounted over the head of a rat or a mouse, while recording the activity of place cells in hippocampus (O’Keefe and Dostrovsky 1971; O’Keefe and Nadel 1978) or grid cells in entorhinal cortex (Fyhn et al. 2004; Hafting et al. 2005), with the goal of understanding how space is represented in these brain structures (Moser et al. 2008, 2015). It is also common, in studies of spatial representations, to track the yaw of the head (i.e., its orientation in the horizontal plane where the rodent navigates; see Fig. 2C), to investigate the tuning of neurons in hippocampus, entorhinal cortex, and other limbic structures for head direction, speed, or angular velocity (Acharya et al. 2016; Kropff et al. 2015; Sargolini et al. 2006; Taube 2007). In these applications, the yaw is tracked through an overhead video camera imaging two LEDs of different colors (e.g., red and green), mounted over the head stage of the neuronal recording system and placed along the anteroposterior axis of the head. In some studies, this LED arrangement was also used to estimate the pitch of the head (i.e., its rotation about the interaural axis; see Fig. 2C), by measuring the distance between the two LEDs in the image plane (Bassett and Taube 2001; Stackman and Taube 1998).

Fig. 2.

Illustration of the 3D pattern of dots used for head tracking and of the Euler angles that define its pose in the camera reference system. A: CAD rendering of the pattern of dots that, once mounted over the head of a rat, allows its position and pose to be tracked. Notice the arm that allows the pattern to be attached to a matching base surgically implanted over the skull of the rat. B: different views of a rat with the dot pattern mounted over his head. C: definition of the angles of rotation of the reference system centered on the dot pattern (x′, y′, z′; purple arrows) with respect to the camera reference system (x, y, z; black arrows). The three Euler angles—yaw, pitch, and roll—are shown, respectively, by the red, green, and blue arrows. O′ indicates the origin of the pattern reference system. The brown arrow indicates the direction where the rat’s nose is pointing and it is parallel to the head’s anteroposterior axis (i.e., to the x′ axis). The dashed brown line is the projection of the brown arrow over the (x, y) plane of the camera reference system.

It is more difficult (and it has only rarely been attempted) to achieve a complete estimate of the pose and location of the head in three-dimensional (3D) space—i.e., to simultaneously track the three Cartesian coordinates of the head and the three Euler angles that define its orientation: yaw, pitch, and roll (with the latter defined as the rotation about the head’s anteroposterior axis; see Fig. 2C). Recently, two groups have successfully tracked the head of small, freely moving mammals in 3D through videography, by relying either on a single camera imaging a custom tetrahedral arrangement of four LEDs with different colors (Finkelstein et al. 2015), or on multiple cameras (up to four) imaging custom 3D arrangements of up to six infrared LEDs (Sawinski et al. 2009; Wallace et al. 2013). Other groups have used inertial measurement units (IMUs), such as accelerometers and gyroscopes, mounted over the head of a rat, to record its angular displacement and velocity along the three Euler rotation axes (Kurnikova et al. 2017; Pasquet et al. 2016).

All these approaches provide accurate measurements of head position and pose in 3D. However, having been developed as ad hoc solutions for specific experimental settings, their design is not necessarily optimal for every application domain. For instance, most of these systems were conceived to track the head of small mammals roaming over an open-field arena, where the relatively large size of the custom LED- or IMU-based headset (extending several centimeters above and/or around the animal’s head) was not an issue in terms of encumbrance or obstruction. Moreover, these headsets need to be powered to operate. In general, this requires dedicated wires, which increase the stiffness of the bundle of cables connected to the headstage of the recording system and prevent performing fully unplugged recordings using headstages equipped with wireless transmitters (Pinnell et al. 2016; Szuts et al. 2011).

In this study, we tried to overcome these limitations by using a single overhead camera to passively image a 3D-printed structure, painted with a pattern of black dots over a white background and mounted over the head of a rat. The small size of the pattern (1.35×1.35×1.5 cm) makes it ideal for perceptual studies, where a rodent performs a discrimination task inside a narrow operant box, often with its head inserted through an opening or confined within a funnel, as in the studies of rodent visual perception recently carried out by our group (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Nikbakht et al. 2018; Rosselli et al. 2015; Tafazoli et al. 2012; Zoccolan et al. 2009) and other authors (Bossens and Op de Beeck 2016; De Keyser et al. 2015; Hauser et al. 2019; Horner et al. 2013; Kemp and Manahan-Vaughan 2012; Kurylo et al. 2015, 2017; Mar et al. 2013; Stirman et al. 2016; Vermaercke and Op de Beeck 2012; Yu et al. 2018). In what follows, besides describing in detail the equipment and the algorithm on which our method is based (materials and methods) and validating its accuracy and precision (first part of the results and discussion), we provide a practical demonstration of how our head tracker can help in understanding 1) how a rat samples the sensory stimuli during a visual or auditory discrimination task; and 2) how hippocampal neurons represent head position over extremely small spatial scales around the area where the animal delivers its perceptual decision and collects the reward.

MATERIALS AND METHODS

Experimental rig for behavioral tests and in vivo electrophysiology.

To demonstrate how our head tracker can be used in behavioral neurophysiology experiments with rodents, we applied it to track the head of a rat engaged in a two-alternative forced-choice (2AFC) discrimination task (Zoccolan 2015; Zoccolan and Di Filippo 2018). The animal was also implanted with an electrode array to perform neuronal recordings from the hippocampus.

The rig used to administer the task to the rat is similar to that employed in previous studies from our group (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Rosselli et al. 2015; Zoccolan et al. 2009) and is illustrated in Fig. 1. Briefly, a custom-made operant box was designed with the CAD software SolidWorks (Dassault Systèmes) and then built using black and transparent Plexiglas. The box was equipped with a 42-in. LCD monitor (Sharp, PN-E421) for presentation of visual stimuli and an array of three stainless steel feeding needles (Cadence Science), 10 mm apart from each other and connected to three proximity sensors. The left and right feeding needles were connected to two computer-controlled syringe pumps (New Era Pump Systems NE-500), for automatic pear juice delivery. Each feeding needle was flanked by an LED on one side and a photodiode on the other side, so that, when the rat licked the needle, he broke the light beam extending from the LED to the photodiode and his responses were recorded. The front wall of the operant box had a rectangular aperture, which was 4 cm wide and extended vertically for the whole height of the wall, so as to allow room for the cables connecting the implanted electrode array to the preamplifiers of the acquisition system. The rat learned to insert his head through this aperture, so as to reach the feeding needles and face the stimulus display, which was located 30 cm from his nose. Two speakers positioned at the sides of the monitor were used for playing the sound stimuli. Stimulus presentation, input and output devices, and all task-relevant parameters and behavioral data acquisition were controlled with the freeware, open-source software MWorks (https://mworks.github.io/) running on a Mac Mini (Apple; solid cyan and black arrows in Fig. 1A).

Fig. 1.

Illustration of the experimental rig where perceptual discrimination tests were combined with in vivo neuronal recordings and real-time tracking of the head of a rat subject. A: this schematic shows the operant box in which the rat was trained, along with its various components: the array of response ports (gray rectangle) for collection of behavioral responses, the stimulus display (orange rectangle), and the speakers for presentation of the visual and acoustic stimuli. The drawing also shows the computers used to run the behavioral task, record the neurophysiological signals, and acquire the video stream collected by the camera of the head tracker (green box). The arrows show the flow of the signals collected and commands issued by the three computers (see materials and methods for details). TDT, System Three workstation from Tucker-Davis Technologies. B: CAD rendering of some of the key elements of the experimental rig. The drawing allows appreciating the relative size and position of the operant box (light gray), stimulus display (dark gray), camera (dark green), and illuminator (light blue). The inset shows a detail of the 3D-printed block (cyan) holding the response ports and allows appreciating the size and typical position of the dot pattern, relative to the other components of the rig.

During the neurophysiological experiments, stimulus presentation and collection of behavioral responses were synchronized with the amplification and acquisition of extracellular neuronal signals from hippocampus (red arrow in Fig. 1A), performed using a System Three workstation (Tucker-Davis Technologies, TDT), with a sampling rate of 25 kHz, running on a Dell PC. Specifically, MWorks sent a code with the identity and time of the stimulus presented in every trial to the TDT system via a UDP connection (dotted black arrow). The TDT system also controlled the acquisition of the frames by the camera of the head tracker, by generating a square wave that triggered the camera image uptake (dashed black arrow). The camera, in turn, generated a unique identification code for every acquired image. This code was saved by the PC running the head-tracking software (along with the Cartesian coordinates and orientation of the head) and also fed back to the TDT (dashed green arrow), which saved it in a data file along with the neurophysiological and behavioral recordings. The period and duty cycle of the square wave were fixed, but both could be adjusted by the user from the TDT graphical interface. In our experiments, the period was 23 ms (~50 Hz) and the duty cycle was around 33%.

Perceptual discrimination tasks, surgery, and neuronal recordings.

One adult male Long-Evans rat (Charles River Laboratories) was used for the validation of the head-tracking system. The animal was housed in a ventilated cabinet (temperature controlled) and maintained on a 10:14-h light-dark cycle. The rat weighed ~250 g at the onset of the experiment and grew to over 500 g at the end of the study. The rat received a solution of water and pear juice (ratio 1:5) as a reward during each training session and, in addition, he had access to water ad libitum for 1 h after the training. All animal procedures were in agreement with international and institutional standards for the care and use of animals in research and were approved by the Italian Ministry of Health: project N. DGSAF 22791-A, submitted on Sep. 7, 2015 and approved on Dec. 10, 2015 (approval N. 1254/2015-PR).

The rat was trained in a visual and sound-recognition task in the rig described in Experimental rig for behavioral tests and in vivo electrophysiology. The task required the animal to discriminate between either two visual objects or two sounds (Fig. 5A). Each trial started when the animal touched the central response port, which triggered the stimulus presentation. After the presentation of the stimulus on the monitor or the playback of the sound, the rat had to report its identity by licking either the left or the right response port. Each stimulus was associated with one specific port. Hence, only one action, licking either the left or the right port, was associated with the reward in any given trial. Correct stimulus identification was followed by the delivery of the pear juice-water solution, while an incorrect response yielded a 1- to 3-s timeout, with no reward delivery and a failure tone played along with the flickering of the monitor from black to middle gray at 15 Hz. The stimuli were presented for 1 s or until the animal licked one of the lateral response ports, independently of the correctness of the choice.

Fig. 5.

Head tracking of a rat engaged in a perceptual discrimination task. A: illustration of the visual discrimination task. The rat learned to lick the central response port to trigger the presentation of either object 1 or object 2 on the stimulus display placed in front of the operant box (see Fig. 1A). Presentation of object 1 required the rat to approach and lick the response port on the left to correctly report its identity, while presentation of object 2 required the animal to lick the response port on the right. B: example snapshots captured and processed by the head tracker at three representative times—i.e., when the rat licked the central, the left, and the right response ports. The colored lines are the x′ (red), y′ (green), and z′ (blue) axes of the reference system centered on the dot pattern (see Fig. 2C), as inferred in real time by the head tracker (see also the Supplemental Video: https://doi.org/10.5281/zenodo.2789936). C: the trajectories of the nose of the rat in consecutive trials during the execution of the task are superimposed on a snapshot of the block holding the response ports, as imaged by the head tracker. The red and blue traces refer to trials in which the animal chose, respectively, the left and right response port. The trajectories are plotted in the Cartesian plane corresponding to the floor of the operant box, where the x and y axes (black arrows) are, respectively, parallel and orthogonal to the stimulus display (see also Fig. 1A). D: the bar plot shows the distribution of the percentage of frames lost because of segmentation failure across the trials recorded in the experiment of Figs. 6 and 7.

The visual stimuli consisted of a pair of three-lobed objects, previously used by our group in several studies of visual perceptual discrimination in rats (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Rosselli et al. 2015; Zoccolan et al. 2009). Specifically, each object was a rendering of a three-dimensional model built using the ray tracer POV-Ray (http://www.povray.org). The sound stimuli were two pure tones, at 600 Hz and 6,000 Hz, respectively (sound level: ~55 dB). Sounds were delivered from two speakers located symmetrically on both sides of the front part of the operant box, so that the sound level at the position of the animal’s ears was equal when the animal’s nose was near the central response port at the onset of the trial.

Once the rat reached ≥70% correct behavioral performance, it was implanted with an electrode array for chronic recordings. To this aim, the animal was anesthetized with isoflurane and positioned in a stereotaxic apparatus (Narishige, SR-5R). A craniotomy was made above the dorsal hippocampus and a 32-channel Zif-clip array (Tucker-Davis Technologies, Inc.) was lowered into the craniotomy. Six stainless steel screws were inserted into the skull (three anterior, one lateral, and two posterior to the craniotomy) to provide anchoring for the cemented implant. Around the implant, we put hemostatic gelatin sponge (Spongostan Dental, Ethicon, Inc.) saturated with sterile sodium chloride solution to protect the brain from dehydration, and then silicone (Kwik-Cast, World Precision Instruments) to seal the craniotomy and protect it from the dental cement (Secure, Sun Medical Co. Ltd.) that was finally used to secure the whole implant to the skull. Hippocampal stereotaxic coordinates were −3.7 mm AP, −3.5 mm ML. The final depth of the electrodes to target CA1 was around −2.2 mm and for the CA3 subregion was around −3.4 mm. To place the dot pattern over the head of the rat for head tracking (see Head-tracking system), a connector complementary to the one mounted on the pattern was also cemented on the head, anterior to the electrode array (with a magnet on top; 210 g/cm2 holding force; see Fig. 2, A and B). The rat was given the antibiotic enrofloxacin (Baytril; 5 mg/kg) and carprofen (Rimadyl; 2.5 mg/kg, subcutaneous injection) for prophylaxis against infections and for postoperative analgesia over the 3 days postsurgery. The animal was allowed to recover for 7–10 days after the surgery, during which he had access to water and food ad libitum. The behavioral and recording sessions in the operant box were resumed after this recovery period. Action potentials (spikes) in the extracellular signals acquired by the TDT system were detected and sorted for each recording site separately, using Wave Clus (Quiroga et al. 2004) in MATLAB (The MathWorks). Online visual inspection of prominent theta waveforms, in addition to histology, confirmed the position of the electrodes.

Head-tracking system.

The head tracker (Fig. 1) consists of an industrial monochromatic CMOS camera (Point Grey Flea3, 1.3 MP B&W, C-Mount), a far-red illuminator, a dedicated PC (Intel Xeon HP Workstation Z620 with Xeon CPU 2.5 GHz, and RAM 16 GB), and a three-dimensional pattern of dots, mounted over the head of the animal and imaged by the overhead camera (Fig. 2A). The sensor size of the camera was 1/2″ and its resolution was 1,280 × 1,024 pixels. The camera was fitted with a TUSS Mega-pixel Fixed Focal Lens (2/3″, 50 mm, f/2.8 aperture). The illuminator was made of a matrix of 4×4 LEDs, with dominant wavelength at 730 nm (OSLON SSL 150, PowerCluster LED Arrays) and a radiance angle of [−40°, 40°], and was powered at 100–150 mW. It was mounted right above the stimulus display, oriented toward the operant box at an angle of ~50° with respect to the display (Fig. 1B). The camera was set in external trigger mode, and the triggers it received from the TDT system (dashed black arrow in Fig. 1A) were counted, so that each frame was assigned a unique code, which was encoded in the first four pixels of each acquired image. In our validation analyses and in our tests with the implanted rat, the CMOS sensor integration time was set at 3 ms. This value was chosen after preliminary tests with the implanted rat performing the discrimination task, so as to guarantee that the images of the dot pattern acquired during fast sweeps among the response ports were not blurred. The camera was placed above the stimulus display, oriented toward the operant box at an angle of ~50° with respect to the display (Fig. 1B), although some of the validation measures presented in the results (Fig. 3) were obtained with the camera mounted so that its optical axis was perpendicular to the floor of the measurement area.

Fig. 3.

Nominal precision of the head tracker in the camera reference system. A: CAD rendering of the breadboard with the grid of 5 × 5 locations where the dot pattern was placed to obtain the validation measurements shown in D. The pattern was mounted over an apposite pedestal (dark gray structure) with four feet, which were inserted in matching holes over the breadboard. B: CAD rendering of the stereotax arm that was used to displace the dot pattern vertically, to obtain the validation measurements shown in E. C: CAD rendering of the stereotax arm that was used to change the pitch and roll angles of the dot pattern, to obtain the validation measurements shown in F. D, left: validation measurements (red dots) obtained by placing the dot pattern on a grid of 5 × 5 ground-truth positions (grid intersections), over the floor of the testing area, using the breadboard shown in A. Each red dot in the main panel is actually a cloud of 30 repeated measurements, as it can be appreciated by zooming into the area of the grid intersection at the submillimeter scale (inset). Right: mean precision (top) and accuracy (bottom) of the head-tracker measurements over the 25 tested positions (see results for details). E: top: validation measurements (red dots) obtained by vertically displacing the dot pattern (relative to the floor of the testing area) of 34 consecutive increments, using the stereotax arm shown in B. Bottom: mean precision (left) and accuracy (right) of the head-tracker measurements over the 36 tested vertical displacements. F, left: validation measurements (red dots) obtained by setting the roll and pitch angles of the dot pattern to a combination of 13 × 9 ground-truth values (grid intersections), using the custom assembly shown in C. The inset allows appreciating the spread of the 30 measurements taken at one of the tested angle combinations. Note that, for some extreme rotations of the pattern, no measurements could be taken (grid intersections without red dots), since the dots on the pattern were not visible. Right: mean precision (left) and accuracy (right) of the head-tracker measurements over the set of tested angle combinations.

Following the generation of a trigger signal by the TDT, a propagation delay occurred before the camera started integrating for the duration of the exposure time. This delay was ~5 µs and was measured as suggested by the camera manufacturer. That is, we configured one of the camera’s GPIO pins to output a strobe pulse and connected both the input trigger pin and the output strobe pin to an oscilloscope. The time delay from the trigger input to the complete image formation was then given by the sum of the propagation delay and the exposure time, where the latter, as explained above, was fixed at 3 ms. Other possible delays, such as the sensor readout and the data transfer to the PC, do not affect real-time acquisition, since the content of the image and its frame number are entirely established at the end of the exposure time.

After the image has been read from the camera memory buffer, it becomes available to the head-tracking software (see Overview of the head-tracking algorithm) and, before any image processing occurs, the embedded frame number is extracted from the image and sent back to the TDT system via the UDP protocol. Because of the data transfer over the USB 3.0 and UDP connections, the frame number reaches the TDT system with an unpredictable delay relative to the signal that triggered the acquisition of the frame itself. However, synchronizing the trigger pulses and the frames is straightforward, because each frame number can be matched to the count of trigger pulses generated up to that point. To validate this synchronization method, we cross-checked the synchronization by frame number against a synchronization by timestamps. The latter uses the timestamp at which the TDT receives a frame number and looks for the trigger pulse that could have generated that frame within a time interval of [−0.0855, −0.0170] ms before the frame number arrived; in particular, it assigns to the current frame number the most recent pulse that has not yet been assigned. The two methods gave identical results over long sessions.
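To make this bookkeeping concrete, the sketch below (Python/NumPy) illustrates the two steps just described: decoding the frame counter embedded in the image and matching it to the train of trigger pulses. The 32-bit big-endian layout of the counter across the first four pixels, the zero-based counting, and the function names are assumptions made here for illustration; the actual encoding is defined by the camera and acquisition software.

```python
# Illustrative sketch of the frame/trigger bookkeeping described above (not the actual
# implementation). Assumptions: the counter occupies the first 4 pixels of the
# monochrome frame as a 32-bit big-endian integer, and the n-th trigger pulse generates
# the frame carrying counter value n (zero-based).
import numpy as np

def decode_frame_number(frame: np.ndarray) -> int:
    """Read the frame counter embedded in the first 4 pixels of a frame."""
    b = frame.ravel()[:4].astype(np.uint32)
    return int((b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3])

def match_frames_to_triggers(frame_numbers, trigger_times):
    """Return {frame_number: time of the trigger pulse that generated it}."""
    return {fn: trigger_times[fn]
            for fn in frame_numbers if 0 <= fn < len(trigger_times)}
```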

The head tracker works by imaging a small (1.35×1.35×1.5 cm), lightweight (~1.10 g), easy-to-place 3D-printed structure (shown in Fig. 2A), consisting of five coplanar black dots over a white background (~3.5 mm apart from each other), plus a sixth dot located over an elevated pillar (~5 mm tall). The five coplanar dots were arranged in an L-like shape, while the elevated dot was placed on the opposite side with respect to the corner of the L shape, so as to minimize the chance of occluding the other dots. The 3D structure holding the dots was designed with the CAD software SolidWorks (Dassault Systèmes) and then 3D printed. The dots were printed on a standard white paper sheet, which was then cut to the proper size and glued over the 3D-printed structure. This structure also included an arm, with a magnet at the bottom, which allowed the pattern to be mounted over a dedicated support (holding a complementary magnet) that was surgically implanted on the rat’s head (Fig. 2, A and B).

Since the head-tracking algorithm (see Overview of the head-tracking algorithm) requires a precise knowledge of the spatial arrangement of the dots relative to each other, the distance between each pair of dots, as well as the height of the pillar, were carefully measured using a caliper and by acquiring high-magnification images of the pattern along with a precision ruler placed nearby (resolution 0.5 mm). This information is inserted in a dedicated configuration file and is part of the calibration data that are required to operate the head tracker. The other piece of calibration information is the matrix of intrinsic camera parameters K, which has to be precomputed through the camera calibration procedure (see appendix A). Once this information is known, it can be stored and then loaded from the configuration file at the beginning of each behavioral/recording session, provided that the position of the camera in the rig is not changed and the same dot pattern is always used.
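For illustration only, the calibration data described above could be organized as in the sketch below (Python/NumPy). The dot coordinates are placeholders roughly consistent with the ~3.5-mm dot spacing and ~5-mm pillar mentioned in the text, and the intrinsic parameters are arbitrary; the released software stores the same information in its own configuration-file format.

```python
# Hypothetical layout of the calibration data needed by the tracker (placeholder values;
# the actual configuration-file format of the released software differs).
import numpy as np

calibration = {
    # 3D coordinates (mm) of the 6 dots in the pattern reference system (x', y', z'):
    # five coplanar dots in an L-like arrangement plus one elevated dot on the pillar.
    "dots_3d_mm": np.array([
        [0.0, 0.0, 0.0],    # origin dot, O'
        [3.5, 0.0, 0.0],
        [7.0, 0.0, 0.0],
        [0.0, 3.5, 0.0],
        [0.0, 7.0, 0.0],
        [7.0, 7.0, 5.0],    # elevated dot on the pillar
    ]),
    # Intrinsic camera matrix K (pixels) and lens distortion, from the camera
    # calibration procedure (appendix A); the numbers below are placeholders.
    "K": np.array([[10000.0, 0.0, 640.0],
                   [0.0, 10000.0, 512.0],
                   [0.0, 0.0, 1.0]]),
    "dist_coeffs": np.zeros(5),
}
```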

Overview of the head-tracking algorithm.

Our head-tracking algorithm consists of three different functional parts: 1) the Point Detection (PD) module; 2) the Points-Correspondences Identification (PcI) module; and 3) the Perspective-n-Point (PnP) module. These modules are executed in sequence at each frame captured by the camera. An estimate of the position/pose of the dot pattern (and, therefore, of the head) is computed only when the PD module is able to extract the positions of all the dots within the acquired image. In cases in which this condition is not satisfied, the PD algorithm signals and records the inability to successfully recover the position/pose of the pattern for the current frame.

The PD module is a fast feature-detection algorithm that eliminates false-positive blobs and restricts the search space of point configurations through a set of geometric constraints. Following the correct extraction of the dot positions by the PD algorithm, such positions must be unambiguously matched to the known geometry of the pattern by the PcI algorithm. That is, all six points, as they appear in the image, must be associated with the known 3D coordinates of the corresponding dots on the 3D pattern. Finally, the position and pose of the pattern reference system (x′, y′, z′; purple arrows in Fig. 2C) with respect to the camera reference system (x, y, z; black arrows in Fig. 2C) is obtained by solving the PnP problem. The whole algorithm was designed to process the images captured by the camera in real time, so as to output the estimated position and pose of the pattern without the need to store the images for off-line processing. This required designing the three modules so as to maximize both the speed of processing and the accuracy of the estimates.
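As an illustration of how these three stages fit together, the sketch below (Python/OpenCV) processes a single frame with off-the-shelf components: a generic blob detector stands in for the custom PD module, the PcI step is only indicated as a placeholder, and the pose is recovered with OpenCV's PnP solver. It is not the released implementation (see appendix A and the linked source code for that); dots_3d_mm, K, and dist_coeffs are assumed to come from the calibration data.

```python
# Per-frame sketch of the PD -> PcI -> PnP pipeline, using generic OpenCV components
# in place of the custom modules described in appendix A.
import cv2
import numpy as np

def track_frame(gray, dots_3d_mm, K, dist_coeffs):
    # 1) PD: detect candidate dark blobs (SimpleBlobDetector looks for dark blobs by default).
    params = cv2.SimpleBlobDetector_Params()
    params.filterByArea = True
    params.minArea, params.maxArea = 10, 500        # placeholder size limits
    detector = cv2.SimpleBlobDetector_create(params)
    keypoints = detector.detect(gray)
    if len(keypoints) != 6:                         # all 6 dots must be visible
        return None                                 # signal a dropped frame

    # 2) PcI: order the detected blobs so that they correspond, row by row, to
    #    dots_3d_mm. The real algorithm exploits the known L-shaped geometry; here the
    #    detection order is kept only as a placeholder (it would not work in practice).
    img_pts = np.array([kp.pt for kp in keypoints], dtype=np.float64)

    # 3) PnP: recover the rotation and translation bringing the pattern reference
    #    system into the camera reference system.
    ok, rvec, tvec = cv2.solvePnP(dots_3d_mm.astype(np.float64), img_pts, K, dist_coeffs)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                      # 3x3 rotation matrix
    return R, tvec                                  # pose and position (O') in camera coordinates
```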

A detailed description of each processing module is provided in appendix A. The source code of the tracking algorithm, along with a short, example video to test it, is freely available at the following link: https://doi.org/10.5281/zenodo.3429870.

RESULTS

Our head tracker was developed in the context of a 2AFC perceptual discrimination experiment, involving the visual and auditory modalities. A scheme of the experimental setup is shown in Fig. 1A. The rat performed the discrimination task inside an operant box that was equipped with a monitor and two speakers for the presentation of the sensory stimuli. The task required the animal to insert the head through an opening in the wall facing the monitor and interact with an array of three response ports (i.e., three feeding needles, equipped with proximity sensors). Specifically, the rat had to lick the central needle to trigger the presentation of the stimulus. Afterward, he had to lick one of the lateral needles to report his perceptual choice and receive a liquid reward, in case of successful discrimination (see materials and methods for details).

Figure 1A also shows the equipment used to acquire the neurophysiological signals and image the dot pattern mounted over the head of the rat to perform head tracking (see materials and methods for a description). Figure 1B shows a CAD drawing with a scaled representation of the key components of the rig and their relative position: the operant box, the monitor, the block holding the feeding needles, the overhead camera, and the case of the light source.

The key element of our head-tracking system is a 3D pattern of dots, mounted over the head of the animal and imaged by the overhead camera (Fig. 2A). At the beginning of each experimental session, the pattern is mounted on top of the head of the animal (Fig. 2B), using a magnet that connects it to a base that was previously surgically implanted. As described in detail in the materials and methods and appendix A, the head-tracking algorithm detects the six black dots in the frames of the video stream and computes the position and pose of the pattern in the 3D space of the camera reference system. More specifically, a pattern reference system (x′, y′, z′; purple arrows in Fig. 2C) is defined, with the origin O′ placed over one of the dots, the x′ and y′ axes parallel to the edges of the plate, and the z′ axis perpendicular to it (i.e., parallel to the pillar). In the ideal case in which the pattern had been precisely aligned to the anatomical axes of the head at the time of the implant, x′ corresponds to the head’s anteroposterior axis, while y′ corresponds to the head’s interaural axis. The camera reference system (x, y, z; black arrows in Fig. 2C) results instead from calibrating the camera using a standard procedure that consists of imaging a checkerboard pattern placed at various positions and orientations (see appendix A). Once the camera is calibrated, the head-tracking algorithm provides 1) the three Cartesian coordinates of O′ in the camera reference system; and 2) the rotation matrix R that defines the 3D rotation bringing the pattern reference system (x′, y′, z′) to be aligned with the camera reference system (x, y, z). R is defined as

$$R = R_z(\varphi_C)\, R_y(\theta_C)\, R_x(\gamma_C), \quad (1)$$

where Rz, Ry, and Rx are the elemental rotation matrixes that define intrinsic rotations by the Euler angles φC (yaw), θC (pitch), and γC (roll) about the axes of the camera reference system. More specifically:

$$R_z(\varphi_C) = \begin{bmatrix} \cos\varphi_C & -\sin\varphi_C & 0 \\ \sin\varphi_C & \cos\varphi_C & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$R_y(\theta_C) = \begin{bmatrix} \cos\theta_C & 0 & \sin\theta_C \\ 0 & 1 & 0 \\ -\sin\theta_C & 0 & \cos\theta_C \end{bmatrix}$$

$$R_x(\gamma_C) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma_C & -\sin\gamma_C \\ 0 & \sin\gamma_C & \cos\gamma_C \end{bmatrix},$$

where, with reference to Fig. 2C: φC is the angle between the projection of x′ onto the camera (x, y) plane and the camera x axis; θC is the angle between x′ and the (x, y) plane; and γC is the rotation angle of the pattern around x′.
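As a concrete check of this convention, the short sketch below (Python/NumPy) composes R as in Eq. 1 and reads the three Euler angles back from it; the closed-form recovery is valid away from a pitch of ±90°. The function names are ours, and angles are in radians.

```python
# Sketch of the rotation convention in Eq. 1, R = Rz(yaw) Ry(pitch) Rx(roll),
# and of the recovery of the three Euler angles from R (valid for |pitch| < 90 deg).
import numpy as np

def euler_to_R(yaw, pitch, roll):
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx                              # Eq. 1

def R_to_euler(R):
    yaw = np.arctan2(R[1, 0], R[0, 0])               # phi_C
    pitch = np.arcsin(-R[2, 0])                      # theta_C
    roll = np.arctan2(R[2, 1], R[2, 2])              # gamma_C
    return yaw, pitch, roll

# round-trip check
assert np.allclose(R_to_euler(euler_to_R(0.3, -0.2, 0.5)), (0.3, -0.2, 0.5))
```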

It should be noted that the three Euler angles φC, θC, and γC, as well as the three Cartesian coordinates of O′, do not, by themselves, specify the pose and position of the head in the environment. First, they are relative to the camera reference system, whereas the experimenter needs to know them with respect to some meaningful environmental landmarks (e.g., the floor and walls of the arena or the operant box where the animal is tested). Second, no matter how carefully the pattern is placed over the head of the rat, in general, the (x′, y′, z′) axes will not be perfectly aligned to the anatomical axes of the head—e.g., x′ and y′ will not be perfectly aligned to the head’s anteroposterior and interaural axes. However, expressing the pose and position of the pattern in camera coordinates allows measuring the nominal precision and accuracy that the head tracker can achieve, which is essential to validate the system. In Validation of the head tracker: nominal precision and accuracy in the camera reference system, we illustrate how we collected these validation measurements, and, in Operation of the head tracker: measuring displacements and rotations relative to reference poses in the physical environment, we explain how the actual position and pose of the head can be expressed in a convenient reference system, by collecting images of the checkerboard and dot pattern at, respectively, a reference position and pose.

Validation of the head tracker: nominal precision and accuracy in the camera reference system.

To measure the precision and accuracy of the head tracker, we used a custom combination of breadboards, linear stages, rotary stages, and goniometers to hold the dot pattern in known 3D positions and poses (Fig. 3, A–C). This allowed comparing the ground-truth coordinates/angles of the pattern with the measurements returned by the head tracker (Fig. 3, D–F). For these measurements, the camera was positioned so that its optical axis was perpendicular to the floor of the testing area.

We first measured the ability of the system to track two-dimensional (2D) displacements of the pattern over the floor of the testing area. To this aim, the pattern was placed over a 3D-printed breadboard with a grid of 5 × 5 holes that were 37.5 and 25 mm apart along, respectively, the horizontal and vertical dimensions (Fig. 3A; the maximal error on the positioning of the pattern was ±0.15 mm and ±0.10 mm along the two dimensions; see appendix B). For each of the 25 breadboard locations, we took a set of 30 repeated, head-tracker measurements of the origin of the pattern in the camera reference system (i.e., O′ in Fig. 2C). Since the coordinates of the grid holes in such a reference system are not known a priori, a Procrustes analysis (Gower and Dijksterhuis 2004) was applied to find the optimal match between the set of 25×30 measures returned by the head tracker and the known physical positions of the holes of the breadboard. Briefly, the Procrustes analysis is a standard procedure to optimally align two shapes (or two sets of points, as in our application) by uniformly rotating, translating, and scaling one shape (or one set of points) with respect to the other. In our analysis, since we compared physical measurements acquired with a calibrated camera, we did not apply the scale transformation (i.e., the scale factor was fixed to 1). When applied to our set of 25×30 measures, the Procrustes analysis returned a very good match with the set of 25 ground-truth positions of the grid (Fig. 3D, left). As shown by the virtually absent spread of the dots at each intersection, the root mean square error (RMSE) of each set of 30 measurements, relative to their mean, was very low, yielding a mean precision (across positions) of 0.056 ± 0.007 mm along the x-axis and 0.037 ± 0.005 mm along the y-axis (Fig. 3D, right; top bar plot). The RMSE of each set of 30 measurements, relative to the corresponding grid intersection, was also very low, yielding a mean accuracy (across positions) of 0.663 ± 0.079 mm along the x-axis and 0.268 ± 0.041 mm along the y-axis (Fig. 3D, right; bottom bar plot).
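One standard way to implement the no-scaling Procrustes alignment used here is the SVD-based (Kabsch) solution sketched below in Python/NumPy; in MATLAB, the built-in procrustes function with scaling disabled is the analogous tool. The helper names and the per-axis RMSE convention are ours.

```python
# Sketch of a rigid Procrustes alignment (rotation + translation, scale fixed to 1)
# via the SVD/Kabsch method. X: head-tracker estimates (e.g., 25 x 30 = 750 x-y points);
# Y: the corresponding ground-truth grid positions.
import numpy as np

def rigid_procrustes(X, Y):
    """Return X rigidly aligned onto Y (no scaling)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # exclude reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return (R @ Xc.T).T + Y.mean(axis=0)

def rmse_per_axis(a, b):
    """RMSE along each axis: precision if b is the cloud mean, accuracy if b is ground truth."""
    return np.sqrt(np.mean((a - b) ** 2, axis=0))
```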

Interestingly, zooming into the area of the grid intersections at the submillimeter scale (see Fig. 3D, inset) revealed that the spread of the position estimates was elongated along a specific direction in the (x,y) plane. This can be explained by the fact that the PnP algorithm, when it finds the position/pose of the pattern that yields the best match between the projection of the dots on the camera sensor and their actual position, as segmented from the acquired image (see appendix A), naturally tends to make larger errors along the direction of sight of the camera (i.e., the axis connecting the camera’s sensor to the pattern). In fact, small changes along this direction would result in projected images of the pattern that do not differ much from one another (in other words, this is the direction along which the least information can be inferred about the position of the pattern). Hence, the cloud of repeated position estimates would mainly spread along this direction. When these estimates are projected onto the (x,y) plane, the spread will still appear elongated in a given direction (as shown in the inset of Fig. 3D), depending on the relative positions of the camera and the pattern (only if the pattern were located directly below the camera, perpendicular to its optical axis, would the clouds of measurements be circularly symmetric, since, in this case, the measurements would spread along the z-axis). It should also be noted that an elongated spread in the (x,y) plane automatically implies a difference in the accuracy of the x and y estimates. For instance, in the case of our measurements, the RMSE in x was about 2.5 times larger than in y (Fig. 3D, right; bottom bar plot), which is consistent with the larger spread of the measurements along the x-axis, as compared with the y-axis, that is apparent in the inset.

To verify the ability of the head tracker to estimate the height of the pattern (i.e., its displacement along the z-axis) when it was varied over a range of positions, the pattern was mounted on the arm of a stereotax (MyNeuroLab Angle One) through a 3D-printed custom joint (Fig. 3B). The arm was positioned so as to be perpendicular to the floor of the testing area, thus allowing the pattern to be displaced vertically in 34 consecutive 2-mm increments, with a resolution of 0.01 mm. After each increment, the z coordinate of the pattern was estimated by the head tracker in 30 repeated measurements, which were then compared with the total physical displacement of the stereotaxic arm up to that point (Fig. 3E, top). The resulting estimates were again very precise and accurate (Fig. 3E, bottom), with RMSE values that were close to those obtained previously for the x and y displacements (compare the z bars of Fig. 3E to the x and y bars of Fig. 3D).

For the validation of the angular measurements, we built a custom assembly, made of a rotary stage (Thorlabs MSRP01/M) mounted on a stereotaxic arm (MyNeuroLab Angle One), that allowed rotating the pattern about two orthogonal axes, corresponding to roll and pitch (Fig. 3C). Rotations about each axis were made in 10° steps, spanning from −60° to 60° roll angles and from −40° to 40° pitch angles, while the yaw was kept fixed at 0° (the resolution of the stereotaxic arm was 0.2°, while the rotary stage had an accuracy and precision of, respectively, 0.13° and 0.04°, measured as RMSE; see appendix B). Again, 30 repeated head-tracker measurements were collected at each known combination of angles over the resulting 13 × 9 grid. As for the case of the 2D displacements, the angles returned by the head tracker were not immediately comparable with the nominal rotations on the rotary stages, because the two sets of angles are measured with respect to two different reference systems—i.e., the camera reference system (as defined in Fig. 2C) and the stage reference system (as defined by the orientation of the pattern in the physical environment, when the rotations of the stages are set to zero). Therefore, to compare the nominal rotations of the pattern on the stages (R_stages^nom) to their estimates provided by the head tracker (R_cam^est), we first had to express the former in the camera reference system (R_cam^nom). To this aim, we followed the same approach as Wallace et al. (2013) and computed R_cam^nom as

$$R_{\mathrm{cam}}^{\mathrm{nom}} = R_{\mathrm{stage}}^{\mathrm{cam}}\, R_{\mathrm{stages}}^{\mathrm{nom}}\, \left(R_{\mathrm{stage}}^{\mathrm{cam}}\right)^{T} R_{\mathrm{cam}}^{\mathrm{nom}\,0}, \quad (2)$$

where R_cam^nom 0 is the pose of the pattern (in the camera reference system) when all the rotations of the stages are nominally set to zero (i.e., the reference rotation), and R_stage^cam is the matrix mapping the stage reference system into the camera reference system (note that each matrix in Eq. 2 is of the form shown in Eq. 1). The matrixes R_stage^cam and R_cam^nom 0 are unknowns that can be estimated by finding the optimal match between the nominal and estimated rotations in camera coordinates, i.e., between R_cam^nom, as defined in Eq. 2, and R_cam^est. Following Wallace et al. (2013), we defined the rotation difference matrix R_diff = R_cam^est · (R_cam^nom)^T, from which we computed the error of a head-tracker angle estimate as the total rotation in R_diff, i.e., as

$$\mathrm{err} = \cos^{-1}\left(\frac{\mathrm{trace}\left(R_{\mathrm{diff}}\right) - 1}{2}\right).$$

By minimizing the sum of err² over all tested rotations of the stages (using the MATLAB fminsearch function), we obtained a very close match between the estimated and nominal stage rotations. This is illustrated in Fig. 3F, left, where the red dots are the head-tracker estimates and the grid intersections are the nominal rotations. As for the case of the Cartesian displacements, also for the pitch and roll angles the head tracker returned very precise (roll: RMSE = 0.095° ± 0.005°; pitch: RMSE = 0.109° ± 0.006°) and accurate (roll: RMSE = 0.557° ± 0.045°; pitch: RMSE = 0.381° ± 0.030°) measurements (Fig. 3F, right).
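The sketch below (Python/SciPy) shows one way this fit could be carried out: the two unknown matrices are parameterized by Euler angles (a choice made here only for illustration) and estimated with Nelder-Mead, the algorithm behind MATLAB's fminsearch. euler_to_R is the helper from the sketch after Eq. 1, and R_nom_stages and R_est_cam stand for the lists of nominal stage rotations and head-tracker estimates, one per tested angle combination.

```python
# Sketch of the fit of R_stage^cam and R_cam^nom0 by minimization of the summed squared
# rotation error (Eq. 2 and the err formula above). The Euler-angle parameterization
# and all names are illustrative choices, not the released analysis code.
import numpy as np
from scipy.optimize import minimize

def rotation_error(Ra, Rb):
    """Total rotation (rad) of the difference matrix R_diff = Ra @ Rb.T."""
    c = (np.trace(Ra @ Rb.T) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))

def total_squared_error(params, R_nom_stages, R_est_cam):
    R_stage_cam = euler_to_R(*params[:3])      # stage -> camera mapping
    R_cam_nom0 = euler_to_R(*params[3:])       # reference pose of the pattern
    err2 = 0.0
    for R_nom, R_est in zip(R_nom_stages, R_est_cam):
        R_cam_nom = R_stage_cam @ R_nom @ R_stage_cam.T @ R_cam_nom0   # Eq. 2
        err2 += rotation_error(R_est, R_cam_nom) ** 2
    return err2

# fit = minimize(total_squared_error, x0=np.zeros(6),
#                args=(R_nom_stages, R_est_cam), method="Nelder-Mead")
```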

Operation of the head tracker: measuring displacements and rotations relative to reference poses in the physical environment.

While the validation procedure described in Validation of the head tracker: nominal precision and accuracy in the camera reference system provides an estimate of the nominal precision and accuracy of the head tracker, measuring Cartesian coordinates and Euler angles in the camera reference system is impractical. To refer the head-tracker measurements to a more convenient reference system in the physical environment, we 3D printed a custom adapter to precisely place the checkerboard pattern used for camera calibration over the block holding the feeding needles, so as to be parallel to the floor of the operant box, with the vertex of the top-right black square vertically aligned with the central needle (see Fig. 4A). We then acquired an image of this reference checkerboard, which served to establish a new reference system (x″, y″, z″), where the x″ and y″ axes are parallel to the edges at the base (the floor) of the operant box, while z″ is perpendicular to the floor and passes through the central feeding needle. The position measurements returned by the head tracker can be expressed as (x″, y″, z″) coordinates by applying the rotation matrix R_box^cam and the translation vector t_box^cam that map the camera reference system into this new operant box reference system. R_box^cam is of the form shown in Eq. 1, but with the angles referring to the rotation of the reference checkerboard with respect to the camera reference system; t_box^cam = (Δt_x, Δt_y, Δt_z)^T, where each Δt is the distance between the origins of the two reference systems along the corresponding axis.
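The re-expression of a position estimate in box coordinates then reduces to one rotation and one translation, as in the sketch below (Python/NumPy). It assumes, as stated above, that R_box^cam and t_box^cam map camera coordinates into box coordinates; the function name is ours.

```python
# Sketch of the camera-to-box coordinate change described above. R_box_cam and
# t_box_cam are derived from the image of the reference checkerboard.
import numpy as np

def cam_to_box(p_cam, R_box_cam, t_box_cam):
    """Map a 3D point (mm) from camera coordinates to operant-box coordinates (x'', y'', z'')."""
    return R_box_cam @ np.asarray(p_cam) + np.asarray(t_box_cam)
```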

Fig. 4.

Validation of the head tracker in the reference system of the operant box. A: CAD rendering of the top view of the 3D-printed block (cyan) holding the three feeding needles that worked as response ports in the operant box. The figure shows how the checkerboard pattern used to calibrate the camera was placed over the response port block, in such a way to be parallel to the floor of the operant box and with its top-right vertex vertically aligned with the central needle. An image of the checkerboard pattern in such a reference position was acquired, so as to express the position measurements returned by the head tracker in a reference system with the x and y axes parallel to the edges at the base of the operant box, and the z perpendicular to the floor and passing through the central feeding needle. B: CAD rendering of the custom assembly of two 3D-printed goniometers and a rotary stage that was used to obtain the validation measurements shown in D. C, left: validation measurements (red dots) obtained by placing the dot pattern on a grid of 5 × 5 ground-truth positions (grid intersections; same as in Fig. 3D), over the floor of the operant box. Each measurement is relative to the box reference system, defined as described in A. The inset allows appreciating the spread of the 30 measurements taken at one of the tested positions. Right: mean precision (top) and accuracy (bottom) of the head-tracker measurements over the 25 tested positions. RMSE, root mean square error. D, left: validation measurements (red dots) obtained by setting the yaw, pitch, and roll angles of the dot pattern to a combination of 3 × 7 × 5 ground-truth values (grid intersections), using the custom assembly shown in B. Each measurement is relative to a pose zero reference system, obtained by acquiring an image of the dot pattern with all the angles on custom assembly set to zero. Right: mean precision (top) and accuracy (bottom) of the head-tracker measurements over the set of tested angle combinations.

We tested the ability of the head tracker to correctly recover the Cartesian coordinates of the dot pattern relative to the box reference system by placing the pattern over the 5×5 grid shown in Fig. 3A and collecting 30 repeated head-tracker measurements at each location. To verify the functioning of the head tracker under the same settings used for the behavioral and neurophysiological experiments (see Head movements of a rat performing a two-alternative forced choice task in the visual and auditory modalities), the camera was not centered above the operant box with its optical axis perpendicular to the floor, as previously done for the validation shown in Fig. 3. Had it been placed there, during a neuronal recording session the cable connecting the headstage protruding from the rat head to the preamplifier would have partially occluded the camera’s field of view. Hence the need to place the camera in front of the rat, above the stimulus display, oriented at an angle of ~50° relative to the floor (see Fig. 1B). This same positioning was used here to collect the set of validation measurements over the 5×5 grid. As shown in Fig. 4C, the match between the estimated and nominal horizontal (x″) and vertical (y″) coordinates of the pattern was very good (compare the red dots to the grid intersections), with a barely appreciable dispersion of the 30 measurements around each nominal position value. This resulted in a good overall precision (x: RMSE = 0.034 ± 0.004 mm; y: RMSE = 0.158 ± 0.010 mm) and accuracy (x: RMSE = 0.403 ± 0.050 mm; y: RMSE = 1.36 ± 0.093 mm) of the x″ and y″ measurements.

The Euler angles defining the pose of the head in 3D space could also be measured relative to the operant box reference system (x″, y″, z″). However, this would yield an estimate of the rotation of the dot pattern, rather than of the rat head, in the physical environment. In fact, no matter how carefully the support holding the pattern is implanted at the time of the surgery, it is unlikely for the pattern to be perfectly aligned to the anatomical axes of the head, once put in place. In general, the x′ and y′ axes of the pattern reference system (see Fig. 2B) will be slightly tilted with respect to the head’s anteroposterior and interaural axes. Therefore, it is more convenient to acquire an image of the pattern when the rat is placed in the stereotax, with its head parallel to the floor of the operant box, and then use such a pose zero of the pattern as a reference for the angular measurements returned by the head tracker. This can be achieved by defining a pose zero reference system (x0, y0, z0) and a rotation matrix R_pose0^cam mapping the camera reference system into this new coordinate system [the matrix is of the form shown in Eq. 1, but with the angles referring to the rotation of the pose zero of the pattern with respect to the camera reference system].
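In code, referring the tracked rotation to the pose zero amounts to one matrix composition, as in the sketch below (Python/NumPy): R_est_cam denotes the pattern-to-camera rotation returned by the tracker for the current frame, R_pose0_cam the matrix just defined, and R_to_euler the helper from the sketch after Eq. 1. The composition order shown follows the convention stated above and would need to be adapted if a different convention were used.

```python
# Sketch: head rotation expressed relative to the "pose zero" of the pattern.
import numpy as np

def head_angles_relative_to_pose_zero(R_est_cam, R_pose0_cam):
    # R_pose0_cam maps camera coordinates into pose-zero coordinates, so the product
    # below expresses the current pattern pose in the pose-zero reference system.
    R_rel = R_pose0_cam @ R_est_cam
    yaw, pitch, roll = R_to_euler(R_rel)
    return np.degrees([yaw, pitch, roll])
```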

To test the ability of the head tracker to correctly recover the Euler angles of the dot pattern relative to the pose zero reference system, we mounted the pattern over a custom assembly made of two 3D-printed goniometers, each allowing rotations over a span of 70° (from −35° to +35°), and a rotary stage, enabling 360° rotations (Fig. 4B). This allowed setting the pose of the pattern over a grid of 3×7×5 known combinations of yaw, pitch, and roll angles (the accuracy and precision of such ground-truth rotations were 0.07° and 0.02° for the yaw, 0.05° and 0.03° for the pitch, and 0.13° and 0.04° for the roll, all measured as RMSE; see appendix B). As illustrated in Fig. 4D, left, we found a good match between the nominal and estimated pattern rotations (note that not all 3×7×5 angle combinations were actually tested, since the dots on the pattern were not detectable at some extreme rotations). When averaged across all tested angle combinations, the resulting precision and accuracy (Fig. 4D, right) were very similar to the nominal ones shown in Fig. 3F (precision: roll 0.068° ± 0.003°, pitch 0.076° ± 0.004°, yaw 0.0409° ± 0.001°; accuracy: roll 0.746° ± 0.036°, pitch 0.598° ± 0.048°, yaw 0.929° ± 0.052°).

Head movements of a rat performing a two-alternative forced choice task in the visual and auditory modalities.

To illustrate the application of our head tracker in vivo, we implanted a rat with an electrode array targeting the hippocampus (see materials and methods and Simultaneous head-tracking and neuronal recordings during a two-alternative forced choice discrimination task for details). The implant also included a base with a magnet that allowed attaching the dot pattern during the behavioral/recording sessions (see Fig. 2, A and B). As previously explained, the animal had to interact with an array of three response ports, each equipped with a feeding needle and a proximity sensor (Fig. 1). Licking the central port triggered the presentation of either a visual object (displayed on the screen placed in front of the rat) or a sound (delivered through the speakers located on the sides of the monitor). Two different visual objects could be presented to the animal (same as in Alemi-Neissi et al. 2013; Zoccolan et al. 2009)—object 1 required the rat to approach and lick the left response port to obtain liquid reward, while object 2 required him to lick the right response port (Fig. 5A). The same applied to the two sounds, one associated with the left and the other with the right response port.

Figure 5B shows example images captured and processed by the head tracker in three representative epochs during the execution of the task (the colored lines are the x′, y′, and z′ axes that define the pose of the pattern, as inferred in real time by the head tracker; see the Supplemental Video: https://doi.org/10.5281/zenodo.2789936). By tracking the position of the pattern in the 2D plane of the floor of the operant box, it was possible to plot the trajectory of the rat’s nose in each individual trial of the behavioral test (Fig. 5C, red versus blue lines, referring to trials in which the animal chose, respectively, the left and right response port). The position of the nose was monitored because it reflects the interaction of the animal with the response sensors better than the position of the pattern does. The latter can be converted into the nose position by carefully measuring, at the time of the surgery, the distance between the origin of the pattern and the tip of the nose along each of the axes of the pattern reference system (these distances were 30.84, 1.5, and 22.16 mm along, respectively, the x′, y′, and z′ axes). Importantly, despite the presence of possible obstacles that could occlude the view of the pattern (i.e., the cable connected to the headstage for neuronal recordings), and despite the fact that the head could undergo large rotations (see Fig. 8), only a tiny fraction of frames was dropped because of segmentation failure. As shown in Fig. 5D, in the majority of the trials (56.3% of the total), less than 10% of the frames were lost per trial, and only in 2.1% of the trials did we observe more than 50% dropped frames.
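The pattern-to-nose conversion mentioned above is a single rigid offset, sketched below (Python/NumPy). R and t are the pattern-to-camera rotation and the position of O′ returned by the tracker; the offset vector uses the distances measured at surgery (30.84, 1.5, and 22.16 mm along x′, y′, and z′), but the signs chosen here are only illustrative, since they depend on the direction in which each pattern axis points relative to the nose.

```python
# Sketch of the conversion from the tracked pattern origin to the nose position.
import numpy as np

NOSE_OFFSET_MM = np.array([30.84, 1.5, -22.16])   # magnitudes from the text; signs illustrative

def nose_position_cam(R, t, offset=NOSE_OFFSET_MM):
    """Nose position in camera coordinates, given the tracked pattern pose (R, t)."""
    return np.asarray(t).ravel() + R @ offset
```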

Fig. 8.

Statistical characterization of the head’s rotations performed by the rat during the perceptual discrimination task. A: illustration of the yaw and roll rotations that the rat’s head can perform, relative to its reference pose (see main text). B: time course of the pitch (left), roll (middle), and yaw (right) angles during repeated trials of the perceptual discrimination task. The red and blue colors indicate trials in which the rat chose, respectively, the left and right response port. Traces are aligned to the time in which the stimulus was presented (dashed gray line). The mean total response time (i.e., the mean time of arrival to the selected response port) is shown by the dashed black line. The thick colored lines are the averages of the left and right response trajectories. C: the pitch (top), roll (middle), and yaw (bottom) angles measured by the head tracker in individual trials (thin blue lines) and, on average, across all trials (thick black lines), at the times the rat reached the left, central, and right response ports.

With such a precise and reliable knowledge of the position and pose of the nose/head in space and time, we could address questions concerning the perceptual, motor, and cognitive processes deployed by the rat during the visual and auditory discriminations, well beyond the knowledge that can be gained by merely monitoring the response times collected by the proximity sensors.

For example, in Fig. 6A, left, we have reported the x position of the rat’s nose as a function of time in all the behavioral trials collected over 14 consecutive test sessions (total of 3,413 trials). The traces are aligned to the time in which the stimulus was presented (i.e., 300 ms after the animal had licked the central port). From these traces, we computed the reaction time (RcT) of the rat in each trial, defined as the time, relative to the stimulus onset, in which the animal left the central sensor to start a motor response, eventually bringing his nose to reach either the left or right response port. As can be appreciated by looking at the spread of both sets of traces, RcT was highly variable across trials, ranging from 300 ms to 1,166 ms (extreme values not considering outliers), with a median around 530 ms (Fig. 6A, right). By contrast, a much smaller variability was observed for the ballistic response time (BRsT), i.e., the time taken by the rat to reach the left or right response port, relative to the last time he passed by the central port (Fig. 6B, left), with BRsT ranging between 66 and 566 ms (Fig. 6B, right: top box plot; median ~300 ms). This suggests that much of the variability in the total response time (ToRsT; i.e., the time taken to reach the left or right response port, relative to the stimulus onset; Fig. 6B, right: bottom box plot) has to be attributed to the perceptual/decisional process required for the correct identification of the stimulus. However, a close inspection of the response trajectories of Fig. 6B (aligned to the time in which the ballistic motor response was initiated) shows that the rat, after leaving the central port in response to the stimulus, often did not point directly to the port that he would eventually choose (referred to as the selected response port in what follows). In many trials, the final ballistic response was preceded by earlier movements of the head, either toward the selected response port or the opposite one (referred to as the opposite response port in what follows).
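
As a rough illustration of how RcT can be extracted from the tracked trajectories, the following sketch (Python; the threshold and variable names are hypothetical, not the authors' analysis code) finds the first time, after stimulus onset, at which the nose leaves a band around the central port.

    import numpy as np

    def reaction_time(t, x, stim_onset, central_halfwidth=5.0):
        """Time (same unit as t) between stimulus onset and the first
        departure of the nose from a +/- central_halfwidth band around
        the central port (x = 0). The band width is a placeholder."""
        departed = (t >= stim_onset) & (np.abs(x) > central_halfwidth)
        idx = np.flatnonzero(departed)
        return t[idx[0]] - stim_onset if idx.size else None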

Fig. 6.

Statistical characterization of the head’s displacements performed by the rat during the perceptual discrimination task. A, left: position of the rat’s nose as a function of time, along the x axis of the Cartesian plane corresponding to the floor of the operant box (i.e., x-axis in Fig. 5C). The traces, recorded in 3,413 trials over the course of 14 consecutive sessions, are aligned to the time in which the stimulus was presented (gray dashed line). The black dashed line indicates the mean reaction time (as defined in the main text). The red and blue colors indicate trials in which the rat chose, respectively, the left and right response port. The dashed lines highlight two specific examples of left and right responses, while the thick lines are the averages of the left and right response trajectories. Right: box plot showing the distributions of reaction times for the two classes of left and right responses. B, left: same trajectories as in A, but aligned to the last time the rat passed by the central port (dashed line). Right: box plots showing the distributions of ballistic response times (top) and total response times (bottom) for the two classes of left and right responses (see results for definitions). C: subset of the response patterns (referred to as P1) shown in A and B, in which the rat, after leaving the central port, made a direct ballistic movement to the selected response port (trace alignment as in B). D: subset of the response patterns (referred to as P2) shown in A and B, in which the rat, before making a ballistic movement toward the selected response port, made an initial movement toward the opposite port (trace alignment as in B). E: subset of the response patterns (referred to as P3) shown in A and B, in which the rat made an initial movement toward the selected response port, then moved back to the central port, and finally made a ballistic movement to reach the selected port (trace alignment as in B). F: comparisons among the mean reaction times (left), among the mean motor response times (middle; see main text for a definition), and among the mean ballistic response times (right) that were measured for the three types of trajectories (i.e., P1, P2, and P3) shown, respectively, in C, D, and E (***P < 0.001; two-tailed, unpaired t test). The number of trajectories of a given type is reported on the corresponding bar. Error bars are SE of the mean. G: velocity of the rat’s nose as a function of time for the three types of trajectories P1, P2, and P3.

To better investigate rat response patterns, we separated the recorded trajectories into three response patterns. In the first response pattern (P1), the rat, after leaving the central port, made a direct ballistic movement to the selected response port (either left or right; Fig. 6C). In the second response pattern (P2), the rat made an initial movement toward the opposite response port, before correcting himself and making a ballistic movement toward the selected response port (Fig. 6D). In the third response pattern (P3), the rat made an initial movement toward the selected response port but then moved back to the central port, before approaching again, with a ballistic movement, the selected port (Fig. 6E). Interestingly, RcT was significantly smaller in P2 than in P1 (P < 0.001; two-tailed, unpaired t test; Fig. 6F, leftmost bar plot), suggesting that the trials in which the animal reversed his initial decision were those in which he made an “impulsive” choice that he eventually corrected. As expected, the motor response time (MoRsT; i.e., the time taken by the animal to reach the selected response port, after leaving, for the first time, the central port) was substantially lower in P1, as compared with P2 and P3, given the indirect trajectories that the latter trial types implied (P < 0.001; two-tailed, unpaired t test; Fig. 6F, middle bar plot). By contrast, BRsT was slightly, but significantly, higher in P1 than in P2 and P3, indicating that ballistic movements were faster when they followed a previously aborted choice (P < 0.001; two-tailed, unpaired t test; Fig. 6F, rightmost bar plot).
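
A minimal sketch of how such a classification could be implemented is shown below (Python; the excursion threshold and the way the ballistic onset is passed in are hypothetical, not the authors' criteria).

    import numpy as np

    def classify_response_pattern(x, chosen_side, ballistic_start, thresh=10.0):
        """Classify a nose x-trajectory (mm, central port at x = 0) into
        P1, P2, or P3. chosen_side is +1 for right and -1 for left
        responses; ballistic_start is the sample index at which the final
        ballistic movement begins; thresh is a placeholder excursion."""
        pre = np.asarray(x[:ballistic_start]) * chosen_side
        if np.any(pre < -thresh):   # early excursion toward the opposite port
            return "P2"
        if np.any(pre > thresh):    # early excursion toward the selected port
            return "P3"
        return "P1"                 # direct ballistic movement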

To better understand this phenomenon, we plotted the average velocity of the rat’s nose as a function of time for the three response patterns (Fig. 6G; curves are aligned to the onset of the ballistic motor response). As expected, in P1, the velocity was close to zero until the ballistic response was initiated. By contrast, in P2, the velocity was already high at the onset of the ballistic movement, because the animal’s nose passed by the central sensor when sweeping from the opposite response port to the selected one (see Fig. 6D). This momentum of the rat’s head was thus at the origin of the faster ballistic responses in P2, as compared with P1. In the case of P3, there was no appreciable difference in velocity relative to P1 at the onset of the ballistic movements. This is expected, given that the rat moved twice in the direction of the selected port and, in between the two actions, he stopped at the central sensor (see Fig. 6E). However, following the onset of the ballistic movement, the rat reached a larger peak velocity in P3 than in P1, which explains the shorter time needed to complete the ballistic response in the former trial type. This may possibly indicate a greater confidence of the rat in his final choice, following an earlier, identical response choice that was later aborted. More generally, the response patterns P2 and P3 appear to be consistent with the behavior known as vicarious trial and error, which reflects the deliberation process deployed by rats when considering possible behavioral options (Redish 2016).

We also used the head tracker to inquire whether the rat deployed different response/motor patterns depending on the sensory modality of the stimuli he had to discriminate. Rat performance was higher in the sound discrimination task, as compared with the visual object discrimination task (P < 0.001; two-tailed, unpaired t test; Fig. 7A). Consistent with this observation, the fraction of trials in which the animal aborted an initial perceptual choice (i.e., response patterns P2 and P3), as opposed to making a direct response to the selected port (i.e., response pattern P1), was significantly larger in visual than in auditory trials (P < 0.01, χ2 test for homogeneity; Fig. 7B). This means that the animal was less certain about his initial decision in the visual trials, displaying a tendency to correct such decisions more often, as compared with the auditory trials. Interestingly, the lower perceptual discriminability of the visual stimuli did not translate into a general tendency of reaction times to be longer in visual than auditory trials. When the animal aimed directly at the selected port (P1; the vast majority of trials, as can be appreciated in Fig. 7B), no significant difference in RcT was observed between visual and auditory trials (P > 0.05; two-tailed, unpaired t test; Fig. 7C, top left bar plot). By contrast, in trials in which the rat corrected his initial decision (P2), RcT was significantly longer in visual than auditory trials (P < 0.01; two-tailed, unpaired t test; Fig. 7C, middle left bar plot), but the opposite trend was found in trials in which the animal swung back and forth to the selected response port (P3; P < 0.05; two-tailed, unpaired t test; Fig. 7C, bottom left bar plot). We found instead a general tendency of the rat to make faster ballistic responses in visual than auditory trials, with this trend being significant in P1 and P2 (P < 0.001; two-tailed, unpaired t test; Fig. 7C, top right and middle right bar plots). It is hard to interpret these findings, which could indicate a greater impulsivity of the animal in the visual discrimination but, possibly, also a greater confidence in his decision. Addressing this issue in more depth is obviously beyond the scope of our study, since it would require measuring the relevant metrics (e.g., RcT and BRsT) over a cohort of animals, while our goal here was simply to provide an in vivo demonstration of the working principle of our head tracker and suggest possible ways of using it to investigate the perceptual, decisional, and motor processes involved in a perceptual discrimination task.

Fig. 7.

Statistical comparison of the head displacements performed by the rat in visual and auditory trials. A: comparison between the performances attained by the rat in the visual (light gray) and auditory (dark gray) discrimination tasks (***P < 0.001; two-tailed, unpaired t test). B: proportion of response patterns of type P1, P2, and P3 (see Fig. 6, C–E) observed across the visual (light gray) and auditory (dark gray) trials. The two distributions were significantly different (P < 0.01, χ2 test). C: comparisons between the mean reaction times (left) and between the mean ballistic response times (right) that were measured in visual (light gray) and auditory (dark gray) trials for each type of trajectory (i.e., P1, P2, and P3). *P < 0.05, **P < 0.01, ***P < 0.001; two-tailed, unpaired t test. In every plot, the number of trajectories of a given type is reported on the corresponding bar.

As a further example of the kind of behavioral information that can be extracted using the head tracker, we analyzed the pose of the rat’s head during the execution of the task (Fig. 8). As shown in Fig. 8B, at the time the rat triggered the stimulus presentation (time 0), his head was, on average across trials (thick curves), parallel to the floor of the operant box (i.e., with pitch 0° and roll close to 0°) and facing frontally the stimulus display (i.e., with yaw close to 0°). However, at the level of single trials (thin curves), the pose of the head was quite variable, with approximately a ±30° excursion in the pitch and a ±15°/20° excursion in the roll and yaw. This can also be appreciated by looking at the polar plots of Fig. 8C (central column), which report the average pitch, roll, and yaw angles that were measured over the 10 frames (~300 ms) following the activation of each response port (thin lines: individual trials; thick lines: trials’ average). Being aware of such variability is especially important in behavioral and neurophysiological studies of rodent visual perception (Zoccolan 2015). In fact, a variable pose of the head at the time of stimulus presentation implies that the animal viewed the stimuli under quite different angles across repeated behavioral trials. This, in turn, means that the rat had to deal with a level of variation in the appearance of the stimuli on his retina that was larger than that imposed, by design, by the experimenter. In behavioral experiments where the invariance of rat visual perception is under investigation, this is not an issue, because, as observed in Alemi-Neissi et al. (2013), it can lead at most to an underestimation (not to an overestimation) of rat invariant shape-processing abilities. However, in studies where a tight control over the retinal image of the visual stimuli is required, the trial-by-trial variability reported in Fig. 8, B and C indicates that the use of a head tracker is necessary to measure, and possibly compensate for, the change of viewing angle occurring across repeated stimulus presentations. This applies, for instance, to neurophysiological studies of visual representations in unrestrained (i.e., not head-fixed) rodents, especially when localized stimuli (e.g., visual objects) are used to probe low- and middle-level visual areas. For instance, head tracking, ideally also paired with eye tracking, would be necessary to investigate putative ventral stream areas in unrestrained rats, as recently done in anesthetized (Matteucci et al. 2019; Tafazoli et al. 2017) or awake but head-fixed animals (Kaliukhovich and Op de Beeck 2018; Vinken et al. 2014, 2016, 2017). By contrast, the issue of pose variability is less of a concern for neurophysiological studies targeting higher-order association or decision areas, especially when large (ideally full-field) periodic visual patterns (e.g., gratings) are used (Nikbakht et al. 2018).

Monitoring the pose of the head during the behavioral trials also revealed that the rat approached the lateral response ports with his head at port-specific angles (compare red versus blue curves in Fig. 8B, and left versus right columns in Fig. 8C). Unsurprisingly, the yaw angle was the opposite for left (~40°) and right (about −35°) responses, since the animal had to rotate his head toward opposite directions to reach the lateral response ports (see Fig. 8A, red arrows). It was less obvious to observe opposite rotations also about the roll and pitch axes, which indicates that the rat bent his head in port-specific ways to reach each response port and lick from the corresponding feeding needle. Again, this information is relevant for behavioral studies of rodent visual perception, where the stimulus is often left on the screen after the animal makes a perceptual choice and during the time he retrieves the liquid reward (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Rosselli et al. 2015; Zoccolan et al. 2009). This implies that the animal experiences each stimulus from a port-specific (and therefore stimulus-specific) viewing angle for a few seconds after the choice. As pointed out in Djurdjevic et al. (2018), this can explain why rats learn to select specific stimulus features to process a given visual object.

Simultaneous head-tracking and neuronal recordings during a two-alternative forced choice discrimination task.

To illustrate how our head tracker can be combined with the recording of neuronal signals, we monitored the head movements of the rat during the execution of the visual/auditory discrimination task, while recording the activity of hippocampal neurons in CA1. Given that, in rodents, hippocampal neurons often encode the position of the animal in the environment (Moser et al. 2008, 2015), we first built a map of the places visited by the rat while performing the task (Fig. 9A, top). It should be noted that, unlike in typical hippocampal studies, where the rodent is allowed to freely move inside an arena, the body of the rat in our experiment was at rest, while his head, after entering through the viewing hole, made small sweeps among the three response ports. Therefore, the map of visited positions shown in Fig. 9A refers to the positions of the rat’s nose, rather than to his body. Thus, by binning the area around the response ports, we obtained a density map of the locations visited by the nose (Fig. 9A, bottom). Not surprisingly, this map revealed that the rat spent most of the time with his nose very close to one of the response ports (red spots). Figure 9B shows the average firing rate of an example well-isolated hippocampal neuron (whose waveforms and interspike interval distribution are shown in Fig. 9C) as a function of the position of the rat’s nose (only locations that were visited more than 10 times were considered). The resulting spatial map displayed a marked tendency of the neuron to preferentially fire when the rat approached the right response port. In other words, this neuron showed the sharp spatial tuning that is typical of hippocampal place cells, but over a much smaller spatial extent (i.e., over the span of his head movements around the response ports) than typically observed in freely moving rodents.
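
The computation of the density and rate maps can be sketched as follows (Python/NumPy/SciPy; bin size, visit threshold, and smoothing follow the values given in the Fig. 9 legend, while the function and variable names are hypothetical).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def place_maps(nose_xy, spike_counts, bin_size=(20, 25), min_visits=10, sigma=1.2):
        """nose_xy: (N, 2) nose pixel coordinates, one row per 33-ms frame;
        spike_counts: matching per-frame spike counts of the neuron."""
        ix = (nose_xy[:, 0] // bin_size[0]).astype(int)
        iy = (nose_xy[:, 1] // bin_size[1]).astype(int)
        shape = (ix.max() + 1, iy.max() + 1)
        occupancy = np.zeros(shape)
        spikes = np.zeros(shape)
        np.add.at(occupancy, (ix, iy), 1)
        np.add.at(spikes, (ix, iy), spike_counts)
        density = np.log(occupancy + 1)                 # visit-density map (Fig. 9A, bottom)
        rate = np.full(shape, np.nan)
        valid = occupancy >= min_visits
        rate[valid] = spikes[valid] / occupancy[valid]  # firing rate per visited bin
        field = gaussian_filter(np.nan_to_num(rate), sigma=sigma)  # smoothed place field (Fig. 9B)
        return density, rate, field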

Fig. 9.

Place field of a rat hippocampal neuron during the execution of the perceptual discrimination task. A, top: map of the places visited by the nose of the rat (red dots) around the area of the response ports, while the animal was performing the discrimination task. Bottom: density map of visited locations around the response ports. The map was obtained by 1) dividing the image plane into spatial bins of 20×25 pixels; 2) counting how many times the rat visited each bin; and 3) taking the logarithm of the resulting number of visits per bin (so as to allow a better visualization of the density map). B: place field of a well-isolated hippocampal neuron (as shown in C). The average firing rate of the neuron (i.e., the number of spikes fired in a 33-ms time bin, corresponding to the duration of a frame captured by the head tracker) was computed across all the visits the rat’s nose made to any given spatial bin around the response ports (same binning as in A). Only bins with at least 10 visits were considered, and the raw place field was smoothed with a Gaussian kernel with sigma = 1.2 bins. C: superimposed waveforms (left) and interspike interval distribution (right) of the recorded hippocampal neuron. D: time course of the activity of the neuron, during the execution of the task. The dots in the raster plots (top) indicate the times at which the neuron fired an action potential (or spike). The periresponse time histograms (PRTHs) shown in the bottom graphs were obtained from the raster plots by computing the average number of spikes fired by the neuron across repeated trials in consecutive time bins of 0.48 ms. The color indicates the specific stimulus condition that was presented in a given trial (see key in the figure). The panels on the left refer to trials in which the rat’s choice was correct, while the panels on the right refer to trials in which his response was incorrect. Both the raster plots and the PRTHs are aligned to the time of arrival at the response port (time 0; black dashed line). The gray dashed line shows the mean stimulus presentation time, relative to the time of the response.

To verify that the neuron mainly coded spatial information, we plotted its activity as a function of time across the various epochs of the discrimination task, as well as of the sensory stimuli the rat had to discriminate and of the correctness of his choices. This is illustrated by the raster plots of Fig. 9D, top, where each dot shows the time at which the neuron fired an action potential, before and after the animal made a perceptual choice (i.e., before and after he licked one of the lateral response ports). Individual rows show the firing patterns in repeated trials during the task, with the color coding the identity of the stimulus. In addition, the trials are grouped according to whether the rat’s response was correct (left) or not (right). Each raster plot was then used to obtain the periresponse time histograms (PRTHs) shown in the bottom of Fig. 9D, where the average firing rate of the neuron across trials of the same kind is reported as a function of time. As expected for a place cell, and consistent with the spatial tuning shown in Fig. 9B, the neuron started to respond vigorously after the rat licked the right response port and did so regardless of whether the stimulus was auditory or visual (see, respectively, the green and blue dots/lines in Fig. 9D). In addition, the neuron also fired when the rat licked the right port in response to the visual stimulus (object 1) that required a response to be delivered on the left port, i.e., on trials in which his choice was incorrect (red dots/line).

Obviously, our analysis of this example neuronal response pattern is far from being systematic. It is simply meant to provide an illustration of how our lightweight, portable head-tracking system can be applied to study the motor/behavioral patterns of small rodents (and their neuronal underpinnings) in experimental contexts where the animals do not navigate large environments but are confined to a restricted (often small) span of spatial locations.

DISCUSSION

In this study, we designed, implemented, and validated a novel head-tracking system to infer the three Cartesian coordinates and the three Euler angles that define the pose of the head of a small mammal in 3D. In algorithmic terms, our approach draws on established computer vision methods but combines them in a unique and original way to achieve the high accuracy, precision, and speed demonstrated in our validation measurements (Figs. 3 and 4). This is typical of computer vision applications, where the challenge is often to properly choose, among the battery of available approaches, those that can be most effectively combined to solve a given problem, so as to craft a reliable and efficient application. In our case, we built our head-tracking software incrementally, by adding specific algorithms to solve a given task (e.g., feature extraction), until the task was accomplished with sufficient reliability (e.g., minimal risk of segmentation failure) and the whole tracking system reached submillimeter and subdegree precision and accuracy. As a result, our method, when compared with existing approaches, has several distinctive features that make it ideal for many applications in the field of behavioral neurophysiology.

First, to our knowledge, this is the only methodological study that describes in full detail a videography-based method for 3D pose estimation of the head, while providing at the same time a systematic validation of the system, a demonstration of its application in different experimental contexts, and the source code of the tracking algorithm, along with an example video to test it (see materials and methods). A few other descriptions of head trackers for 3D pose estimation can be found in the neuroscience literature, but only as supplementary sections of more general studies (Finkelstein et al. 2015; Wallace et al. 2013). As a consequence, the details of the image processing and tracking algorithms are not covered as exhaustively as in our study (see materials and methods and appendix A), and the sets of validation measurements needed to establish the precision and accuracy of the methods are not always reported. This makes it hard for a potential user to implement the method from scratch and to gauge its validity in a specific application domain, also because neither open-source code nor freeware software packages implementing the tracking algorithms are available.

A second, key feature of our method is the minimal weight and encumbrance of the dot pattern implanted on the animal’s head, along with the fact that the implant, being passively imaged by the camera of the head tracker, does not contain active elements that need to be electrically powered (Figs. 1 and 2). Achieving such purely passive tracking was a major challenge in the development of our method, which we solved by applying a cascade of feature extraction algorithms that progressively refine the detection of the dots on the pattern (see appendix A). This sets our head tracker apart from existing approaches, based either on videography (Finkelstein et al. 2015; Wallace et al. 2013) or on inertial measurement units (Kurnikova et al. 2017; Pasquet et al. 2016), which rely instead on relatively large headsets that need to be powered to operate. As a result, these methods do not allow wireless recordings, and their application is limited to relatively large (usually open-field) arenas. By contrast, as demonstrated in our study (Figs. 6, 7, 8, and 9), our system can work in the small enclosures and operant boxes that are typical of perceptual studies, even when the animal is instructed to sample the sensory stimuli through a narrow viewing hole. Crucially, these practical advantages over other methods were achieved without compromising on tracking accuracy, which was comparable or superior to that reported for previous camera-based head trackers. Specifically, the overall mean absolute error achieved in Finkelstein et al. (2015) was 5.6° (i.e., an order of magnitude larger than the values obtained in our study), while Wallace et al. (2013) reported RMSEs on the order of those found for our system (i.e., 0.79° for pitch, 0.89° for roll, and 0.55° for yaw; compare with Fig. 4D).

Together, the lightness and compactness of our method, along with its suitability for wireless recordings, make our system ideal for high-throughput applications, where tens of animals are either continuously monitored in their home cages while engaged in spontaneous behaviors or are tested in parallel, across multiple operant boxes, in perceptual discrimination tasks (Aoki et al. 2017; Dhawale et al. 2017; Meier et al. 2011; Zoccolan 2015). At the same time, no major obstacles prevent our method from being used in more traditional contexts, i.e., in navigation studies employing open-field arenas, mazes, or linear tracks (Moser et al. 2008, 2015). This would simply require choosing a video camera with a larger field of view and a higher resolution, so as to image with sufficient definition the dot pattern over a wider area. In this regard, it is important to highlight that our method, despite being based on purely passive imaging, can successfully track head movements as fast as ~400 mm/s (Fig. 6G)—a speed similar to that reached by rats running on a linear track, as measured by imaging head-mounted LEDs (Davidson et al. 2009). Our method could also be extended to track the head’s pose of multiple animals in 3D in the same environment—e.g., by adding to each pattern an additional dot with a distinctive color, coding for the identity of each animal. Alternatively, multiple synchronized cameras could be used to track the head of a single animal, either to extend the field over which the head is monitored (e.g., in the case of very large arenas) or to mitigate the problem of occlusion (e.g., by the cables of the neuronal recording apparatus or by the presence of objects or other animals in the testing area), thus reducing the proportion of frames dropped because of segmentation failure (although this issue, at least in the context of the perceptual task tested in our study, is already properly addressed by the use of a single camera; see Fig. 5D).

With regard to our validation procedure, it should be noted that the collection of 30 repeated head-tracker measurements in rapid succession at each tested position/pose could in principle lead to an overestimation of the precision of our system. In fact, such validation would not take into account the environmental changes that are likely to occur over the course of many hours or days of usage of the head tracker (e.g., because of mechanical drifts or changes in the temperature of the camera). Given our goal of testing the head tracker over grids with tens of different positions/poses (see Figs. 3 and 4), it was obviously impossible to obtain many repetitions of the same measurements across delays of hours or days. However, we minimized the risk of overestimating the precision of our system by making sure that the camera was firmly bolted to the optical table where our measurements took place and by keeping the camera turned on for many consecutive days before performing our tests (with the camera and the whole testing rig located inside an air-conditioned room with tight control of the environmental temperature). This allowed the camera to operate at a stable working temperature and minimized mechanical drifts.

To conclude, because of the key features discussed above, we believe that our method will allow exploring the influence of head positional signals on neuronal representations in a variety of cognitive tasks, including not only navigation, but also sensory processing and perceptual decision making. In fact, there is a growing appreciation that, at least in the rodent brain, the representation of positional information is ubiquitous. Cells tuned for head direction have been found in unexpected places, such as the hippocampus (Acharya et al. 2016). An ever-increasing number of spatial signals (e.g., speed) appear to be encoded in entorhinal cortex (Kropff et al. 2015). A variety of positional, locomotor, and navigational signals has been found to be encoded even in sensory areas, such as primary visual cortex (Ayaz et al. 2013; Dadarlat and Stryker 2017; Niell and Stryker 2010; Saleem et al. 2013, 2018). So far, this positional information has been mainly studied in 2D, but the rodent brain likely encodes a much richer, 3D representation of the head, and studies exploring this possibility are under way. For instance, some authors recently reported the existence of neurons in rat V1 that linearly encode the animal's 3D head direction (i.e., the yaw, roll, and pitch angles) (Guitchounts et al. 2019). It is likely that such 3D head positional signals also affect higher-order visual association areas, such as those recently implicated in shape processing and object recognition (Kaliukhovich and Op de Beeck 2018; Matteucci et al. 2019; Tafazoli et al. 2017; Vinken et al. 2014, 2016, 2017), as well as the cortical regions involved in perceptual decision making, such as posterior parietal cortex (Licata et al. 2017; Nikbakht et al. 2018; Odoemene et al. 2018; Raposo et al. 2014). Overall, this indicates that estimation of the pose of the head in 3D will soon become an essential tool in the study of rodent cognition. This calls for the availability of reliable, fully validated, easy-to-use, freeware/open-source approaches for 3D head tracking, such as the method presented in this study.

One could wonder whether this problem could be solved, in an even less involved and more user-friendly way, by approaches that perform markerless tracking of animals or body parts using deep learning architectures (Insafutdinov et al. 2016; Mathis et al. 2018). Most notably, DeepLabCut has recently been shown to be able to segment and track the body and body parts of several animal species with great accuracy (Mathis et al. 2018). However, current tracking methods based on deep learning operate in the pixel space. They exploit the generalization power of deep networks trained for image classification (often referred to as transfer learning) to correctly segment, label, and track target features (e.g., a hand or a paw) in the image plane. Yet they are not meant to provide a geometric reconstruction of the pose of a feature in 3D, which is especially problematic if the goal is to measure rotations about the three Euler rotation axes. While it can be relatively easy to convert pixel displacements into metric displacements by acquiring proper calibration measurements, it is not obvious how pixel measures could be converted into angle estimates. In the spirit of deep learning, this would likely require feeding a deep network many labeled examples of head poses with known rotation angles, but, without resorting to marker-based measurements, it is unclear how such ground-truth data could be collected in the first place. In summary, at least in the context of experiments that require estimating head rotations in 3D, videography methods that are based on tracking patterns of markers with known geometry are not easily replaceable by markerless approaches.

GRANTS

This work was supported by a Human Frontier Science Program Grant to D. Zoccolan (contract n. RGP0015/2013), a European Research Council Consolidator Grant to D. Zoccolan (project n. 616803-LEARN2SEE) and a FSE POR Regione autonoma FVG Program Grant (HEaD - Higher education and development) to W. Vanzella.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

W.V., N.G., D.B., and D.Z. conceived and designed research; W.V., N.G., D.B., A.P., and M.G. performed experiments; W.V., N.G., D.B., A.P., and M.G. analyzed data; W.V., N.G., D.B., M.G., and D.Z. interpreted results of experiments; W.V., N.G., and D.Z. drafted manuscript; W.V., N.G., and D.Z. edited and revised manuscript; N.G., D.B., M.G., and D.Z. prepared figures; D.Z. approved final version of manuscript.

ACKNOWLEDGMENTS

Present address for D. Bertolini: VIVISOL (SOL group), Milan, Italy.

Present address for A. Perissinotto: Department of Biomedical Engineering, Tel Aviv University, Ramat Aviv, Israel.

APPENDIX A

Point detection module.

To solve the final PnP problem and estimate the pose of the dot pattern in the camera reference system, all the possible point candidates that represent a 2D projected configuration of the pattern must be considered. This requires first extracting the positions of all six points in each frame captured by the camera. The PD module of our algorithm takes care of this step by applying a Difference of Gaussians (DoG) filter, which is particularly efficient to compute, is rotationally invariant, and shows good stability under projective transformations and illumination changes (Lowe 1991, 2004).

The DoG filter is an approximation of the well-known Laplacian of Gaussian (LoG) filter. It is defined as the difference between the images resulting from filtering a given input image I with two Gaussians having different sigma, σ1 and σ2 (Jähne 2005), i.e., DoG = G(I,σ1) − G(I,σ2), where G(I,σ) is the convolution of I with a Gaussian filter G with parameter σ. When the size of the DoG kernel matches the size of a bloblike structure in the image, the response of the filter becomes maximal. The DoG kernel can therefore be interpreted as a matching filter (Duda et al. 2001). In our implementation, the ratio R = σ1/σ2 has been fixed to 2. Therefore, σ1 and σ2 can be written as

σ1 = σ·√2
σ2 = σ/√2

where σ can be interpreted as the size of the DoG kernel. In principle, σ should be set around r/2, where r is the radius of a black dot as it appears in a frame imaged by the camera (Lindeberg 1998). However, the method is quite tolerant to variations of the distance of the dot pattern from the camera, and the dots are correctly detected even when they are at a working distance that is half that originally set (i.e., when their size is twice as large as r). In addition, the software implementation of our head tracker includes a graphical user interface (GUI) that allows manually adjusting the value of σ, as well as of other key parameters (e.g., Ic and score; see next paragraph and Point-correspondences identification module and Perspective-n-Point module below), depending on the stability of the tracking procedure (as visually assessed by the user, in real time, through the GUI).
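
A compact sketch of this filtering stage, written with standard OpenCV calls (not the authors' implementation), is:

    import cv2
    import numpy as np

    def dog_response(frame, sigma):
        """Difference-of-Gaussians blob response with the sigma ratio R = 2
        used in the text: sigma1 = sigma*sqrt(2), sigma2 = sigma/sqrt(2)."""
        img = frame.astype(np.float32)
        g1 = cv2.GaussianBlur(img, (0, 0), sigma * np.sqrt(2.0))
        g2 = cv2.GaussianBlur(img, (0, 0), sigma / np.sqrt(2.0))
        return g1 - g2   # extrema of this map mark bloblike structures of size ~sigma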

After detecting the candidate dots in the image plane using the DoG filter, we applied a nonmaxima suppression algorithm (Canny 1987) that rejects all candidate locations that are not local maxima or whose response is smaller than a contrast threshold Ic. Still, depending on the value of Ic, a large number of false positives can be found along the edges of the pattern. In fact, a common drawback of the DoG and LoG representations is that local maxima can also be detected in the neighborhood of contours or straight edges, where the signal change is only in one direction. These maxima, however, are less stable, because their localization is more sensitive to noise or small changes in the neighboring texture. A way to solve the problem of these false detections is to analyze simultaneously the trace and the determinant of the Hessian matrix over a neighborhood of pixels in the image (Mikolajczyk and Schmid 2002). The trace of the Hessian matrix is equal to the LoG, but simultaneously considering the maxima of the determinant penalizes points for which the second derivatives detect signal changes in only one direction. Since a similar idea is exploited in the Harris cornerness operator (Harris and Stephens 1988), our algorithm uses the already calculated Gaussian-smoothed image and the efficient implementation of Harris corner detection available in the OpenCV library (https://opencv.org).

Given an image I, the Harris cornerness operator computes the local image structure tensor H_Sp over a neighborhood of pixels Sp, where H_Sp is defined as

H_Sp = [ Σ_Sp (∂I/∂x)²          Σ_Sp (∂I/∂x)(∂I/∂y) ]
       [ Σ_Sp (∂I/∂x)(∂I/∂y)    Σ_Sp (∂I/∂y)²       ]

Here, ∂I/∂x and ∂I/∂y are the partial derivatives of the image intensity I along the two spatial axes x and y of the image plane, computed using a Sobel operator with an aperture of three pixels—i.e., a 3×3 filter that implements a smooth, discrete approximation of a first-order derivative (González and Woods 2008; Jähne 2005). H_Sp provides a robust distinction between edges and small blobs, because the difference det(H_Sp) − k·trace(H_Sp)² assumes values that are strictly negative on edges and positive on blob centers. This difference depends on three parameters: 1) the aperture of the Sobel filter, which, as mentioned above, was fixed to the minimal possible value (3) for the sake of speed of computation; 2) the weight k, assigned to the trace of the H_Sp tensor, which, as suggested in Grauman and Leibe (2011), was set to 0.04, i.e., the default value of the OpenCV library (https://opencv.org); and 3) the block size of the square neighborhood Sp, which was empirically set to 9 based on some pilot tests, where it showed good stability over a broad range of working distances (note that Sp could be extrapolated from the dot size, the working distance between the camera and the dot pattern, and the internal camera parameters, assuming an ideal condition of a clean white background around the dots).

To summarize, the complete PD algorithm worked as follows. First, the DoG filter was applied to identify all candidate dots in the acquired image. Then, the nonmaxima suppression algorithm was used to prune some of the false positives around the maxima. Finally, for each of the remaining detected dots, the Harris cornerness operator H_Sp was computed over a neighborhood Sp centered on the position of each dot and, depending on the sign of the difference det(H_Sp) − k·trace(H_Sp)², the dot was rejected as a false positive or accepted as the projection on the image plane of one of the dots of the 3D pattern. As mentioned above, the contrast threshold Ic of the nonmaxima suppression algorithm was adjustable by the user through the same GUI used to adjust the σ of the DoG.
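
The pruning step based on the sign of det(H_Sp) − k·trace(H_Sp)² can be sketched with OpenCV's Harris implementation as follows (parameter values as in the text; the handling of candidate coordinates is hypothetical):

    import cv2
    import numpy as np

    def keep_blob_candidates(gray, candidates, block_size=9, ksize=3, k=0.04):
        """Reject edge-induced false positives: the Harris response
        det(H) - k*trace(H)^2 is negative on edges and positive on blob
        centers, so only candidates with a positive response are kept.
        candidates is a list of integer (x, y) dot locations."""
        harris = cv2.cornerHarris(np.float32(gray), block_size, ksize, k)
        return [(x, y) for (x, y) in candidates if harris[y, x] > 0]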

To conclude, it should be noted that, despite the pruning of false detections performed by the PD algorithm, still many spurious dots are identified in the image, in addition to those actually present on the pattern. This is because, in general, many dotlike features are present in any image. To further refine the identification of the actual dots of the pattern, it is necessary to take into account their relationship, given the geometry of the pattern itself. This is achieved by the PcI algorithm that is described in Point-correspondences identification module.

Point-correspondences identification module.

Since projective transformations maintain straight lines, aligned triplets of dots (p1, p2, p3) in the 3D pattern must still be aligned in the images captured by the camera. Our PcI algorithm searches for all aligned triplets of dots in an image and, to reduce the number of possible triplets, considers only those whose length (i.e., the distance between the external points) is smaller than D. To identify a triplet, for any given pair of detected dots, we looked at whether a third, central dot was present in the proximity of the middle position between the pair (note that, in the case of a triplet, the offset of the projected central point with respect to the middle position is negligible, because the distance between the black dots is usually much shorter than the working distance, so an orthographic projection can be assumed). D is automatically computed from the physical distances between the external points of the triplets on the dot pattern, knowing the intrinsic parameters of the camera. It must be set slightly bigger than the maximum distance between the external points of the triplets on the image plane, when the pattern is positioned at the maximal working distance from the camera (10% bigger proved to be sufficient). This condition is achieved when the z-axis of the pattern is exactly aligned with the optical axis of the camera.

Once all candidate triplets have been identified, the algorithm looks for those having a common external point, which corresponds to the corner of the L-shaped arrangement of five coplanar dots in the pattern (Fig. 2A). The search for this L’s corner is performed by considering 5-tuples of point configurations, so as to obtain the final correspondence assignment. In addition to collinearity, another important projective invariant is the angular ordering (on planes facing the view direction of the imaging device). That is, if we take three points defining a triangle, once we have established an ordering of them (either clockwise or counterclockwise), such ordering is maintained under any projective transformation that looks down on the same side of the plane (Bergamasco et al. 2011). In our framework, this implies evaluating the cross (external) product of the two vectors that start from the common point (i.e., the L’s corner) and end on the respective external points. This establishes the orientation order and, consequently, uniquely assigns the correspondences of the five black dots between the 3D pattern and its image.
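
The two geometric tests described above, the collinear-triplet search and the angular ordering of the L's arms, can be sketched as follows (the midpoint tolerance is a hypothetical parameter; the actual implementation may differ):

    import numpy as np

    def find_triplets(points, max_len, mid_tol=2.0):
        """For every pair of dots closer than max_len (the bound D in the
        text), look for a third dot near their midpoint; returns index
        triplets ordered as (external, central, external)."""
        pts = np.asarray(points, dtype=float)
        triplets = []
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                if np.linalg.norm(pts[i] - pts[j]) > max_len:
                    continue
                d = np.linalg.norm(pts - 0.5 * (pts[i] + pts[j]), axis=1)
                d[[i, j]] = np.inf
                k = int(np.argmin(d))
                if d[k] < mid_tol:
                    triplets.append((i, k, j))
        return triplets

    def l_orientation(corner, end_a, end_b):
        """Sign of the 2D cross product of the two arms of the candidate L,
        which fixes the angular ordering of the five coplanar dots."""
        v1, v2 = np.subtract(end_a, corner), np.subtract(end_b, corner)
        return np.sign(v1[0] * v2[1] - v1[1] * v2[0])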

Following this assignment, the sixth dot belonging to the pattern (i.e., the one placed over the pillar; Fig. 2A) is searched for in the proximity of the L’s corner. To this aim, the maximal distance between the dot on the pillar and the L’s corner (DP) in the image plane is automatically estimated (in the same way as D) from the actual distance between the two dots in the 3D pattern, knowing the camera’s internal parameters and its maximal working distance from the pattern. This yields a set of candidate pillar dots. Finding the correct one requires evaluating each candidate dot, in conjunction with the other five coplanar dots, in terms of its ability to minimize the reprojection error computed by solving the PnP problem (see Perspective-n-Point module below). It should be noted that the correct identification of the sixth point on the pillar is fundamental, since it is the only point out of the plane that allows the PnP problem to be solved robustly. The reprojection error is defined as the summed norm distance between the estimated positions of the dots in the image and the projections of the physical dots of the 3D pattern onto the image plane, under a given assumed pose of the pattern and knowing the camera’s calibration parameters. More specifically, during the sixth-point selection we defined the score of a given dot configuration as

score = 100 − S·Reprojection_error    (A1)

where S is a proper scalar factor established experimentally (in our application, we set S = 5). To understand the meaning of S, suppose, for instance, that we have a distance error of one pixel for each dot, thus yielding a reprojection error of 6. Without the S scaling factor (i.e., with S = 1), we would obtain a score of 94. However, the error on each dot is typically well below one pixel (see the next paragraph about the way the dot coordinates are estimated with subpixel accuracy), and the score would therefore always be close to 100. Hence the need to introduce the factor S to rescale the score, so that it can range between 90 and 100. As mentioned above, during the selection procedure, the sixth point on the pillar was chosen among the candidates as the one that maximized the score defined above. In fact, each hypothesis about the position of the sixth point yielded a PnP transformation, for which it was possible to compute the reprojection error. Note that, to eliminate some possible ambiguities in the selection of the sixth point, we also exploited the a priori knowledge about the direction of the pillar (which must point toward the camera sensor).
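
The score of Eq. A1 can be computed for a candidate six-dot configuration as sketched below (Python/OpenCV; the hypothesized pose, expressed as a rotation vector rvec and a translation vector tvec, is assumed to come from the PnP module described later, and the array layouts are assumptions):

    import cv2
    import numpy as np

    def configuration_score(object_pts, image_pts, K, dist, rvec, tvec, S=5.0):
        """Eq. A1: 100 minus the scaled reprojection error, i.e., the summed
        distance between the detected dots (image_pts, Nx2) and the 3D
        pattern dots (object_pts, Nx3) reprojected under the pose
        (rvec, tvec), with camera matrix K and distortion coefficients dist."""
        proj, _ = cv2.projectPoints(object_pts, rvec, tvec, K, dist)
        err = np.sum(np.linalg.norm(proj.reshape(-1, 2) - image_pts, axis=1))
        return 100.0 - S * err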

As expected, given how sensitive the PnP procedure is to small variations in the estimated positions of the dots, we empirically verified that a pixel-level accuracy was not sufficient to guarantee high precision and stability in our pose estimates. For this reason, we estimated the positions of the centers of the dots at the subpixel level by three-point Gaussian approximation (Naidu and Fisher 1991). This method considers the three highest, contiguous intensity values (along either the x or y spatial axes) within a region of the image that has been identified as one of the dots (i.e., a blob) and assumes that the shape of the observed peak fits a Gaussian profile. This assumption is reasonable, because the sensor integration over a small area of pixels containing a blob, after the DoG filtering, produces smooth profiles very similar to Gaussian profiles. If a, b, and c are the intensity values observed at pixel positions x − 1, x, and x + 1, with b having the highest value, then the subpixel location (xs) of the peak is given by

xs = x − (1/2) · (ln c − ln a) / (ln a + ln c − 2·ln b)

where x is the x-coordinate of the center of the pixel with intensity value b. The same approximation is applied to obtain the subpixel y coordinate ys of the dot center.
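
In code, the three-point Gaussian interpolation along one axis reduces to a one-line formula (a sketch; the actual implementation applies it along both x and y for each detected blob):

    import numpy as np

    def subpixel_peak(a, b, c, x):
        """Subpixel peak location along one axis (Naidu and Fisher 1991):
        a, b, c are the intensities at pixel positions x-1, x, x+1, with
        b the largest of the three."""
        la, lb, lc = np.log(a), np.log(b), np.log(c)
        return x - 0.5 * (lc - la) / (la + lc - 2.0 * lb)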

Perspective-n-Point module.

In computer vision, the problem of estimating the position and orientation of an object with respect to a perspective camera, given its intrinsic parameters obtained from calibration and a set of world-to-image correspondences, is known as the Perspective-n-Point (PnP) camera pose problem (Fiore 2001; Lowe 1991). Given a number of 2D-3D point correspondences mi ↔ Mi (where mi = [u v]′ are the 2D coordinates of point i over the image plane and Mi = [X Y Z]′ are the 3D coordinates of point i in the physical environment) and the matrix K with the intrinsic camera parameters (see definition below), the PnP problem requires finding 1) a rotation matrix R that defines the orientation of the object (i.e., of the pattern reference system x′, y′, z′ in our case) with respect to the camera reference system x, y, z (see Eq. 1 in the results and Fig. 2C); and 2) a translation vector t that specifies the Cartesian coordinates of the center of the object (i.e., of the origin O′ of the pattern reference system in our case) in the camera reference system, such that

m̃i ≅ K [R | t] M̃i    (A2)

for all i, where ~ denotes homogeneous coordinates (Jähne 2005) and ≅ defines an equation up to a scale factor. Specifically:

m̃i = [u, v, 1]′

K = [ fx  0   cx ]
    [ 0   fy  cy ]
    [ 0   0   1  ]

[R | t] = [ r11  r12  r13  t1 ]
          [ r21  r22  r23  t2 ]
          [ r31  r32  r33  t3 ]

M̃i = [X, Y, Z, 1]′

where fx and fy are the focal lengths and cx and cy are the coordinates of the principal point of the camera lens.

To solve the PnP problem, all methods have to face a tradeoff between speed and accuracy. Direct methods, such as the Direct Linear Transform, find a solution to a system of linear equations as in Eq. A2 and are usually faster but less accurate, as compared with iterative methods. On the other hand, iterative methods that explicitly minimize a meaningful geometric error, such as the reprojection error, are more accurate but slower (Garro et al. 2012). In our application, we adopted the method known as EPnP (Lepetit et al. 2009), followed by an iterative refinement. The EPnP approach is based on a noniterative solution to the PnP problem, and its computational complexity grows linearly with n, where n is the number of point correspondences. The method is applicable for all n ≥ 4 and properly handles both planar and nonplanar configurations. The central idea is to express the n 3D points as a weighted sum of four virtual control points. Since high precision is required in our setup, the output of the closed-form solution given by the EPnP was used to initialize an iterative Levenberg-Marquardt scheme (More 1977), which finds the pose that minimizes the reprojection error, thus improving the accuracy with a negligible amount of additional time. Both the EPnP and the Levenberg-Marquardt iterative scheme are available in the OpenCV library (https://opencv.org) and are extremely efficient. The execution time on our HP Z620 workstation was on the order of a few milliseconds.
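
With OpenCV, the EPnP solution followed by Levenberg-Marquardt refinement can be sketched as below (cv2.solvePnPRefineLM is available in recent OpenCV releases; the authors' code may use a different refinement call, so this is only an illustration of the two-step scheme):

    import cv2

    def estimate_pose(object_pts, image_pts, K, dist):
        """object_pts: (6, 3) physical dot coordinates in the pattern frame;
        image_pts: (6, 2) detected dot positions; K, dist: calibration
        parameters. Returns the rotation matrix R and translation vector t."""
        ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist,
                                      flags=cv2.SOLVEPNP_EPNP)
        rvec, tvec = cv2.solvePnPRefineLM(object_pts, image_pts, K, dist,
                                          rvec, tvec)
        R, _ = cv2.Rodrigues(rvec)
        return R, tvec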

Camera calibration procedure.

Camera calibration, or more precisely camera resectioning, is the process of estimating the parameters of a pinhole camera model approximating the true camera that produced a given image or set of images. With the exception of the so-called self-calibration methods, which try to estimate the parameters by exploiting only point correspondences among images, a calibration object with a precisely known geometry is needed. In fact, self-calibration cannot usually achieve an accuracy comparable with that obtained with a calibration object, because it needs to estimate a large number of parameters, resulting in a much harder mathematical problem (Zhang 2000).

Much progress has been made, first in the photogrammetry community and more recently in the field of computer vision, in developing object-based calibration methods. In general, these approaches can be classified into two major categories, based on the number of dimensions of the calibration objects: 1) 3D object-based calibration, where camera calibration is performed by observing a calibration object whose geometry in 3D space is known with very good precision; and 2) 2D plane-based calibration, which is based on imaging a planar pattern shown at a few different orientations. In our application, we adopted this second option, because it has proven to be the best choice in most situations, given its ease of use and good accuracy. Specifically, we adopted the method of Zhang (2000), available in the OpenCV library (https://opencv.org), which, in its iterative process, also estimates some lens distortion coefficients (see the following paragraphs for a discussion of the distortion).

The fundamental equation to achieve the calibration is the same as that of the PnP problem (i.e., Eq. A2), and the iterative solution is based on minimizing the reprojection error defined in Eq. A1. However, in the case of camera calibration, the matrix K with the intrinsic parameters is also unknown, in addition to R and t. As such, solving the equation is, in principle, harder. However, the algorithm used to minimize the reprojection error does not need to run in real time, since the calibration is performed before the camera is used for head tracking. In addition, the point correspondences mi ↔ Mi over which the error is computed and minimized are in the order of several hundreds, which makes the estimation of the parameters very robust and reliable. In our application, these points were the intersections of the 9×7 squares of a planar checkerboard (shown in Fig. 4A) imaged in 15 different poses/positions. Specifically, a few snapshots were taken centrally at different distances from the camera, others spanned the image plane to sample the space where the radial distortion is more prominent, and finally (and most importantly) other snapshots were taken from different angles of orientation with respect to the image plane (Zhang 2000).
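
As an illustration, a plane-based calibration of this kind can be obtained with standard OpenCV routines (a sketch only: the inner-corner count follows from a board of 9×7 squares, which exposes 8×6 inner intersections, while the square size is a placeholder):

    import cv2
    import numpy as np

    def calibrate_from_checkerboard(gray_images, inner_corners=(8, 6), square_mm=10.0):
        """Zhang (2000) plane-based calibration from grayscale snapshots of
        the checkerboard; returns the camera matrix K, the distortion
        coefficients, and the reprojection RMSE (pixels)."""
        objp = np.zeros((inner_corners[0] * inner_corners[1], 3), np.float32)
        objp[:, :2] = np.mgrid[0:inner_corners[0],
                               0:inner_corners[1]].T.reshape(-1, 2) * square_mm
        obj_pts, img_pts = [], []
        for gray in gray_images:
            found, corners = cv2.findChessboardCorners(gray, inner_corners)
            if not found:
                continue
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)
        rmse, K, dist, _, _ = cv2.calibrateCamera(
            obj_pts, img_pts, gray_images[0].shape[::-1], None, None)
        return K, dist, rmse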

To measure the effect of changing the calibration images on the pose estimation, we collected a set of 80 images of the calibration checkerboard at different orientations. Fifty random subsamples (without replacement), each composed of 50% of the total images, were used to calibrate the system, thus yielding 50 different calibrations. In Table A1, we report the mean and standard deviation of the internal parameters and the lens distortion coefficients obtained from such calibrations. In Table A2, we report the averages and standard deviations of the contributions of the radial and tangential parts of the distortion. Some of the parameters reported in these tables (i.e., fx, fy, cx, and cy) have been defined in Eq. A2, while the other parameters (k1, k2, k3, p1, and p2) are distortion coefficients described in the next paragraphs.

The focal lengths fx and fy (Table A1) showed good stability. The coordinates of the principal point (cx, cy), as is well known, are among the hardest parameters to estimate and, in our case, had a standard deviation of ~4.5 pixels. In any event, they are additive terms and, as such, do not affect distance and angle measurements. The remaining parameters describe the distortion introduced by the lens during image formation. We quantified the effect of distortion, both its radial and tangential parts, on the pose estimation, considering our specific setup.

From Eq. A2, after the roto-translation transformation [x y z]′ = [R | t]·[X Y Z 1]′, we obtain the normalized 2D coordinates

x′ = x/z,  y′ = y/z.

Then, the distorted coordinates x″ and y″ can be modeled by the parameters k1, k2, k3, p1, and p2 using the following equation:

x″ = x′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x′·y′ + p2·(r² + 2·x′²)
y″ = y′·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p2·x′·y′ + p1·(r² + 2·y′²)    (A3)

where r² = x′² + y′². The final 2D image coordinates are then obtained as s·[u, v, 1]′ = K·[x″, y″, 1]′.

In the worst-case scenario, namely in the corners of the image where the distortion is at its maximum, x′ and y′ can be approximated by dSx/(2f) and dSy/(2f), where f is the focal length of the lens and dSx and dSy are the x and y dimensions of the image sensor. By simple geometrical considerations, since our camera mounts a 1/3″ CMOS sensor and the lens has a focal length of 12.5 mm, r² equals 0.0576. The distortion corrections were then calculated for every sample considering this worst case. The radial distortion correction (k1·r² + k2·r⁴ + k3·r⁶) dominates the tangential part 2·p1·x′·y′ + p2·(r² + 2·x′²), but, more importantly, it has a standard deviation of just ~0.1% (see Table A2).
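
This worst-case figure can be checked with a few lines of arithmetic, assuming the nominal 4.8 × 3.6 mm active area of a 1/3″ sensor and the mean coefficients of Table A1:

    # Worst-case normalized coordinates at the image corners.
    dSx, dSy, f = 4.8, 3.6, 12.5          # sensor size (mm, nominal) and focal length (mm)
    xp, yp = dSx / (2 * f), dSy / (2 * f) # 0.192, 0.144
    r2 = xp**2 + yp**2                    # 0.0576, as reported above

    # Radial distortion correction with the mean coefficients of Table A1.
    k1, k2, k3 = -0.396, 2.23, -26.37
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3   # ~ -0.0205, cf. Table A2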

In this framework, it is clear that the calibration is very stable, with fluctuations around 0.1% for fx, fy, and the distortion coefficients, whereas cx and cy do not affect the displacement or angular measurements. Conversely, an accurate measurement of the physical distances between the dots of the pattern is crucial. We verified that an error of 5% in the dot distances produces errors of 5% in the pose estimates (for example, if the distances are estimated to be 5% larger than the truth, the dot pattern is estimated at a pose 5% more distant from the camera), and, consequently, the translation measurements are affected.

APPENDIX B

Accuracy measurements of the custom assemblies used to position the dot pattern.

The accuracy of the placement of the dot pattern over the 5×5 breadboard shown in Fig. 3A was computed from the known accuracy of the 3D printer used to build the board. Specifically, our 3D printer can make at most an error of ±0.05 mm (±0.025 mm) every 25.4 mm of printed material along the x and y directions (the error decreases to 0.025 mm for the z axis). Given the 150-mm span along the x-axis and the 100-mm span along the y-axis, this yields, in the worst-case scenario in which the errors keep accumulating without canceling each other, a maximal expected error of ±0.15 mm along x and ±0.10 mm along y.

Computing the precision and accuracy of the custom assembly (made of two 3D-printed goniometers and a rotary stage) that was used to rotate the dot pattern in 3D (Fig. 4B) was more involved. It required building a validation setup, such that actual and desired angular displacements could first be transformed into linear shifts for easier comparison. To validate the yaw and pitch rotations, the setup consisted of two facing plates, mounted vertically on an optical breadboard table (Kinetic Systems Vibraplane) and held perpendicular to the table with two rectified vertical brackets (Thorlabs VB01B/M; see Supplemental Fig. S1A: https://doi.org/10.5281/zenodo.3429691). On the surface of one of the plates we placed millimeter-graduated paper, while on the other plate we mounted the custom assembly, but with a laser source instead of the dot pattern. The rationale of the system is that, once the center of rotation of each goniometer is known, each performed (nominal) rotation results in a specific linear displacement of the laser beam over the millimeter-graduated paper. Such linear displacement can be converted back to the actual angle α of the rotation using the following trigonometric identity:

α = tan⁻¹(l/d)

where l is the linear displacement and d is the distance between the rotation center of the goniometer used to perform the rotation and the millimeter-graduated paper, along the axis that is perpendicular to the latter. Thus, by performing several repeated nominal rotations, targeting various combinations of desired yaw and pitch angles, it was possible to establish both the precision and the accuracy of the performed rotations. Specifically, we set the yaw and pitch angles at the following pairs of values: (1°,1°), (3°,3°), (5°,5°), (8°,8°), and (10°,10°) in the top right quadrant; (1°,−1°), (3°,−3°), …, etc. in the bottom right quadrant; (−1°,−1°), (−3°,−3°), …, etc. in the bottom left quadrant; and (−1°,1°), (−3°,3°), …, etc. in the top left quadrant. Each target rotation was repeated three times, thus yielding a total of 75 validation measurements, from which we computed the accuracy and precision of the performed rotations (in terms of RMSE) as 0.07° and 0.02° for the yaw and 0.05° and 0.03° for the pitch.
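In code, the conversion from laser-spot displacement to actual angle, together with one plausible way of computing the accuracy and precision figures (here we assume accuracy to be the RMSE of the actual minus the nominal angles and precision the RMSE of repetitions around their per-target mean; names and exact definitions are ours), could look as follows:

    import numpy as np

    def displacement_to_angle(l_mm, d_mm):
        """Actual rotation angle (deg) from the linear displacement l of the
        laser spot and the distance d between rotation center and paper."""
        return np.degrees(np.arctan2(l_mm, d_mm))

    def accuracy_precision(nominal_deg, actual_deg):
        """Accuracy: RMSE of actual vs. nominal angles.
        Precision: RMSE of repetitions around their per-target mean."""
        nominal = np.asarray(nominal_deg, dtype=float)
        actual = np.asarray(actual_deg, dtype=float)
        accuracy = np.sqrt(np.mean((actual - nominal) ** 2))
        means = {n: actual[nominal == n].mean() for n in np.unique(nominal)}
        residuals = actual - np.array([means[n] for n in nominal])
        precision = np.sqrt(np.mean(residuals ** 2))
        return accuracy, precision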

For the roll angle, we followed a similar approach, but the plate with the millimeter-graduated paper and the plate mounting the custom assembly were placed perpendicularly to each other, rather than parallel (see Supplemental Fig. S1B: https://doi.org/10.5281/zenodo.3429691). This allowed orienting the rotary stage used to produce the roll displacements in such a way that the rotations resulted in vertical linear shifts over the millimeter-graduated paper. Specifically, we set the roll angle to the following values: 1°, 3°, 5°, 8°, and 10°, as well as −1°, −3°, −5°, −8°, and −10°, and we repeated each rotation three times. This yielded estimates of the accuracy and precision of the roll displacements of 0.13° and 0.04° RMSE, respectively.

Table A1.

Mean values and standard deviations of the internal camera parameters and distortion coefficients

        fx        fy        cx      cy      k1       k2     p1        p2        k3       RMSE, pixels
Mean    2,701.7   2,707.4   636.6   508.9   −0.396   2.23   0.00098   −0.0019   −26.37   0.10
SD      2.51      1.98      4.73    4.29    0.012    0.78   0.00018   0.00025   15.50    0.00094

The reprojection error, namely the distance between the detected corners of the calibration squares in the images and their positions reprojected after the calibration, is reported in the last column. fx and fy are the focal lengths and cx and cy are the coordinates of the principal point of the camera lens (see Eqs. A2 and A3); k1, k2, k3, p1, and p2 are the distortion coefficients used in Eq. A3. RMSE, root mean square error.

Table A2.

Mean values and standard deviations of the two components of the distortion, i.e., the radial and the tangential part

        Radial Distortion    Tangential Distortion
Mean    −0.02048             −0.00020
SD      0.0010               2.49e-05

The radial distortion of −0.02048 indicates that the pixels in the corners of the image appear ~2% closer to the center of the image (barrel distortion).

