Abstract
In mammalian animal models, high-resolution kinematic tracking is restricted to brief sessions in constrained environments, limiting our ability to probe naturalistic behaviors and their neural underpinnings. To address this, we developed CAPTURE, a behavioral monitoring system that combines motion capture and deep learning to continuously track the 3D kinematics of a rat’s head, trunk, and limbs for week-long timescales in freely behaving animals. CAPTURE realizes 10–100 fold gains in precision and robustness compared with existing convolutional network approaches to behavioral tracking. We demonstrate CAPTURE’s ability to comprehensively profile the kinematics and sequential organization of natural rodent behavior, its variation across individuals, and its perturbation by drugs and disease, including identifying perseverative grooming states in a rat model of Fragile X syndrome. CAPTURE significantly expands the range of behaviors and contexts that can be quantitatively investigated, opening the door to a new understanding of natural behavior and its neural basis.
Keywords: Behavior, Computational Ethology, Animal Tracking, Autism
Graphical Abstract

INTRODUCTION
The overarching goal of neuroscience and psychology is to describe the neural principles and mechanisms that underlie natural behavior. A critical first step towards this goal is the development of tools and analysis frameworks capable of precisely measuring and describing the behavior of our experimental subjects (Anderson and Perona, 2014; Egnor and Branson, 2016; Krakauer et al., 2017). Advances have been made in quantifying movements and behavior in both humans and mammalian animal models during brief recording sessions in well-delineated environments (Machado et al., 2015; Mathis et al., 2018; Pereira et al., 2019). However, methods for precisely measuring behavior in naturalistic settings and over longer timescales have been lacking, permitting only coarse estimates of an animal’s posture, movements, and behavioral state (Hong et al., 2015; Wiltschko et al., 2015). This technological limitation has impeded quantitative inquiries into the organization of naturalistic behavior and its neural underpinnings.
For instance, while movements and the behaviors they produce are thought to operate under organizational rules, much like the phonological and syntactical rules that govern language, only a handful of such rules have been described (Berridge and Fentress, 1987; Dawkins, 1976; Lashley, 1951; Tinbergen, 1950). In contrast to most of biological research, which deals with genes, cells, and species, there is no formal taxonomical structure for parsing and naming behaviors of laboratory animals, let alone their combinations into behavioral sequences or states. Precisely describing naturalistic behavior in terms of variables that are reproducible across experiments, such as body-part kinematics, could lead to the creation of rigorous and commonly accepted definitions of complex behaviors. Extending these measurements across temporal scales could lead to the establishment of a true behavioral lingua franca. Such standard and quantitative metrics of behavior could greatly facilitate inquiries into the behavioral effects of cellular-, molecular- and circuit-level pathologies. More generally, they could serve as biomarkers for various disease models, the utility of which is currently limited by the lack of behavioral reproducibility across laboratories (Brunner et al., 2015; Silverman et al., 2010).
Similarly, it has been hypothesized that the neural systems that control movement mirror the hierarchical structure of animal behavior, with different brain regions or spinal circuits controlling the movement of single limbs, the production of individual behaviors, and the selection of longer timescale behavioral programs (Gallistel, 1982; Merel et al., 2019a). However, while task-based studies have yielded detailed characterizations of how individual movements are produced (Dhawale et al., 2019; Svoboda and Li, 2018), how more complex whole-body behaviors are controlled and generated remain poorly understood. Similar to how naturalistic visual stimuli were essential for illuminating the function and organization of the visual system (Simoncelli and Olshausen, 2001), effective tools for measuring naturalistic behavior should enable a new understanding of the principles by which the motor system stores and produces behavior (Merel et al., 2019b).
The ideal tool for describing natural behavior, its neural underpinnings, and the effects of environmental and neural perturbations must deliver precise and continuous multiscale measurements of movement kinematics across an animal’s natural behavioral repertoire. In mammals, achieving this goal requires tracking the position of points on the limbs, trunk and head at high spatiotemporal resolution and in 3D, across diverse postures and locations. In rodents, this necessitates kinematic tracking with millimeter-scale precision and millisecond-timescale resolution, ideally continuously over hours and days to sample the full repertoire of rodent behavior and to capture its long timescale structure (Anderson and Perona, 2014; Egnor and Branson, 2016). While overhead video-based recordings or depth imaging can estimate an animal’s pose in largely featureless environments, these methods fail to reliably track appendages (Hong et al., 2015; Jhuang et al., 2010; Wiltschko et al., 2015). Keypoint tracking tools from computer vision and machine learning allow monitoring of visible landmarks on an animal’s body (Branson et al., 2009; Machado et al., 2015; Mathis et al., 2018; Pereira et al., 2019), but to date have been limited by an inability to robustly track occluded landmarks or to track across a broad range of animal postures.
Here we present a technique capable of recording 3D movement kinematics across a rat’s behavioral repertoire. By combining motion capture, deep learning, and body piercing, we achieve continuous long-term kinematic tracking of a rat’s head, trunk and appendages with substantially superior precision and robustness compared to convolutional networks. We use these continuous kinematic recordings to collect a definitive reference dataset of rat behavior, cataloging nearly every movement a rat makes over week-long timescales. To parse these recordings, we developed a machine learning analysis framework that allowed us to identify stereotyped organismal behaviors, behavioral sequences, and behavioral states. This framework allows us to describe organizing principles of natural behavior and comprehensively phenotype behavioral perturbations introduced by stimulants and in a rat model of Fragile X syndrome.
RESULTS
CAPTURE: Continuous Appendicular and Postural Tracking Using Retroreflector Embedding
We sought to develop a system for continuous tracking of 3D whole-body kinematics during naturalistic rodent behavior. Because of its advantages in spatial precision, tracking speed, and data storage (Methods), we used motion capture, in which a calibrated camera array tracks the position of retroreflective markers placed on a human or animal subject (Mischiati et al., 2015). While well established in humans, motion capture has seen limited use in animal models due to difficulties in stably attaching markers over long recording sessions (Courtine et al., 2008; Mimica et al., 2018; Takeoka et al., 2014). To overcome this limitation, we developed a method for chronically attaching retroreflective markers to animals using body-piercings, a body modification approach also well established in humans (Horst et al., 1992; Stirn, 2003) (Methods S1). Our retroreflective markers consist of half-silvered high-index of refraction ball lenses (n=2.0) that serve as bright and durable motion capture markers (Mischiati et al., 2015). We fuse these retroreflectors to biocompatible transdermal body piercings using high strength epoxy. The vast majority of the piercings remain stably attached for the entirety of our experiments (Figure S1C).
To track the rat’s retroreflective body piercings, we constructed a rodent motion capture studio consisting of motion capture cameras positioned around a two-foot diameter plexiglass arena (Figure S1A). Because markers must be seen by at least two cameras to be triangulated into 3D, we used 12 cameras to provide robustness to occlusions. To encourage behavioral diversity, we equipped the arena with bedding, objects, and a lever for operant training (Kawai et al., 2015)(Figure 1A-B)(Videos S1 and S2).
Figure 1 ∣. CAPTURE: Continuous Appendicular and Postural Tracking Using Retroreflector Embedding.
(A) Schematic of the CAPTURE apparatus. Twelve motion capture cameras continuously track the position of 20 body piercings affixed to the animal’s head, trunk, and limbs.
(B) Upper: Schematic depictions of a rat with attached markers, colored by the major body segments tracked, engaging in different species-typical behaviors. Lower: hypothetical wireframe representation of 3D marker positions tracked by motion capture for each of the depicted behaviors.
(C) Speed of markers located on a single rat’s head, trunk, and appendages, across 72 hours of near-continuous motion capture recordings (upper), measured alongside the speed of the animal’s center-of-mass and the state of the room lights (‘on’ or ‘off’). Investigation of kinematics on minute-long timescales (lower) reveals modulation into alternating periods of movement and rest. Speed traces smoothed with a 30-s boxcar filter.
(D) During a behavioral sequence of scratching, rearing, and walking (upper), CAPTURE recordings reveal rhythmic modulation of a selected subset of body part Cartesian and joint angle velocity components (middle). Visualization of individual behaviors on millisecond timescales shows independent control of different appendages during scratching and wet dog shake behaviors (lower). We defined joint angles with respect to sagittal (s), coronal (c), transverse (t), and inter-segment planes (i)(Methods).
Shaded regions in C and D denote expanded regions in lower panels.
See also Figures S1-S3, Videos S1-S3, Table S1, Methods S1-S2.
We first assessed the ability of CAPTURE to record the kinematics of individual limbs across the rodent behavioral repertoire. We habituated our rats (n=5) to the arena and equipped them with the same set of twenty markers, which, once tracked, could be used to compute the position and orientation of the animal’s head, trunk, forelimbs, and hindlimbs (Figure S1B). We tracked marker positions nearly continuously at 300 Hz, for one week in each rat. Unlike depth imaging approaches (Mallick et al., 2014), the use of bedding or objects does not interfere with our motion capture recordings, allowing the experimental arena to double as the animal’s home cage (Figure 1C-D)(Video S3). We found that our recordings showed sub-millimeter tracking precision (0.21±0.07 mm; Methods), and that segment lengths between markers remained stable over the recording session (Figure S1D), indicating that CAPTURE could reliably report limb kinematics over one week timespans. While tracking performance degraded slightly when using fewer cameras (Figure S1E), there is little incentive to reduce the number of cameras. This is because, unlike with traditional video camera-based systems, adding motion capture cameras does not meaningfully add to the experimental or computational effort (Table S1, Methods S1).
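The segment-length check described above is straightforward to compute from the tracked trajectories. Below is a minimal illustrative sketch in Python (the published analysis was performed in MATLAB, and the array layout is an assumption): for each pair of markers on the same rigid body segment, a stable inter-marker distance over time indicates reliable tracking.

```python
import numpy as np

def segment_length_stability(markers, segment_pairs):
    """Summarize inter-marker distances for markers on the same body segment.

    markers:       (T, n_markers, 3) array of 3D positions in mm (NaN where dropped)
    segment_pairs: list of (i, j) marker-index tuples lying on a common segment
    returns:       dict mapping each pair to (median length, standard deviation)
    """
    stats = {}
    for i, j in segment_pairs:
        lengths = np.linalg.norm(markers[:, i] - markers[:, j], axis=1)
        stats[(i, j)] = (np.nanmedian(lengths), np.nanstd(lengths))
    return stats
```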
To test whether body piercings altered the animal’s behavioral repertoire, we attached a headcap fitted with retroreflectors to n=3 rats and tracked them for two days before attaching body piercings (Figure S2). Animals with and without body piercings showed equal fractions of time spent moving in the arena and similar distributions and covariances of head velocities (Figure S2A-C). A classifier trained to identify the animal’s behavior from the tracked movements of the headcap predicted equivalent behavioral usage before and after marker attachment (Figure S2D-F), altogether suggesting that piercings do not cause major behavioral changes in our animals.
Like all vision-based tracking approaches, unprocessed motion capture recordings were prone to dropouts of forelimb and hindlimb markers due to self- or environmental-occlusion. The vast majority of these dropouts were brief (~20 ms in duration), allowing us to use standard interpolation methods based on the temporal history of marker position to faithfully reconstruct the position of dropped markers (Liu and McMillan, 2006)(Figure S3). However, as these methods do not incorporate constraints from neighboring markers or model long-timescale influences on marker position, they perform poorly for longer dropouts. To address this, we used our large collection of well-tracked motion capture data (~25 million frames per day) to train a more expressive deep learning architecture. In particular, we trained a temporal convolutional network (Methods S2)(Oord et al., 2016a) to predict a given marker’s position using both temporal information about its past locations and spatial information about the position of all other markers (Figure S3). Our imputation procedure resulted in a low, ~1 mm estimated median error during artificial dropout periods. Following imputation, all 20 markers were well tracked for ~99% of frames when animals were active, resulting in a sub-millimeter positional error across markers (Figure S1F-S1H).
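To make the imputation step concrete, the sketch below shows a toy dilated temporal convolutional model in PyTorch. It is illustrative only: the layer sizes, window handling, and training details are assumptions rather than the architecture specified in Methods S2; the authors' imputation code is linked under Data and Code Availability.

```python
import torch
import torch.nn as nn

class MarkerImputer(nn.Module):
    """Toy temporal convolutional imputer: maps a window of all 20 markers'
    (x, y, z) traces, with dropped samples zeroed out, to an estimate of one
    target marker's 3D trajectory over the same window."""

    def __init__(self, n_markers=20, hidden=64):
        super().__init__()
        in_channels = n_markers * 3
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, dilation=1, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, dilation=2, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, dilation=4, padding=8),
            nn.ReLU(),
            nn.Conv1d(hidden, 3, kernel_size=1),   # (x, y, z) of the target marker
        )

    def forward(self, x):          # x: (batch, 60, T) masked marker traces
        return self.net(x)         # -> (batch, 3, T) imputed target positions

# Training would minimize the reconstruction error on frames where the target
# marker was artificially masked, e.g.
#   loss = ((model(masked_input) - true_xyz) ** 2)[dropout_mask].mean()
```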
Comparison with keypoint detection using convolutional networks
Recently, 2D convolutional networks trained from scratch or on human-labeled pose databases have been applied to detect un-occluded visual landmarks, mostly in 2D, in individual behavioral tasks (Mathis et al., 2018; Pereira et al., 2019). To test how our approach compares to these established networks in terms of 3D tracking across a larger range of behaviors, we quantified the tracking accuracy of DeepLabCut relative to CAPTURE. We first used the recommended approach of fine-tuning a pretrained network using a small number (225) of hand-labeled frames. We then used the fine-tuned network to detect keypoints in frames from 6 synchronized, calibrated cameras and triangulated these detections across cameras to produce estimates of the animal’s 3D posture. Inspection revealed the keypoint predictions to be poor, showing substantial deviation from human keypoint labels on a held-out test dataset (Figure S4). These predictions were worse on the appendages or when using predictions from only 3 cameras, indicating that the networks especially struggled to track occluded markers.
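For reference, the triangulation step reduces to standard linear (direct linear transform, DLT) triangulation of each 2D detection across calibrated cameras; the sketch below is a generic implementation, not the specific code used for this comparison.

```python
import numpy as np

def triangulate_keypoint(projection_matrices, points_2d):
    """Linear (DLT) triangulation of one keypoint seen by two or more
    calibrated cameras.

    projection_matrices: list of 3x4 camera projection matrices
    points_2d:           list of (x, y) pixel detections, one per camera
    returns:             (3,) array, the triangulated 3D position
    """
    rows = []
    for P, (x, y) in zip(projection_matrices, points_2d):
        rows.append(x * P[2] - P[0])   # each view contributes two linear constraints
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)        # homogeneous least-squares solution
    X = vt[-1]
    return X[:3] / X[3]
```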
It could be that these networks simply needed more training data to track naturalistic behavior in 3D. To test this, we trained DeepLabCut using 100–100,000 frames that were labeled by projecting ground truth marker positions (determined by motion capture) into video frames, a common practice in computer vision (Video S4)(Ionescu et al., 2014). Networks trained on large numbers of samples (100,000 and 10,000) showed accurate tracking, even on frequently occluded keypoints on the forelimbs and hindlimbs (Figure 2). This performance degraded substantially for networks trained on fewer frames (1,000 and 100), especially when using only 3 cameras.
Figure 2 ∣. 2D Convolutional networks are ill-suited for whole-body 3D tracking across multiple behaviors.
(A–B) Example video images, as well as wireframe representations of points tracked using CAPTURE and DeepLabCut. We trained DeepLabCut using between 100 and 100,000 video frames with labels indicating the position of the 20 marker sites tracked using CAPTURE. DeepLabCut predictions were made using 6 cameras for animals bearing markers in (A) or out of (B) the training dataset.
(C) The average 3D distance between DeepLabCut predictions and the position of points tracked using CAPTURE, across training frames, camera numbers and the presence of the rat in or out of the training dataset (marked as in-sample or out-of-sample, respectively). The motion capture reprojection error is shown for comparison. Error bars (mean ± s.e.m.) are within markers and computed over 306,356 and 249,241 frames for 3 in-sample and 2 out-of-sample animals, respectively.
(D) The 10–30 mm differences in precision between CAPTURE and DeepLabCut can produce dramatic changes in the ability to accurately reconstruct an animal’s pose. We computed the fraction of frames in which the length of 19 different body segments, as reported by DeepLabCut, were within 18 mm of the true segment lengths tracked using CAPTURE, for animals in- and out-of-sample. The fraction of correct segment lengths for CAPTURE is shown for comparison. Shaded error bars (within lines) show standard deviation of 100 bootstrapped samples of frames.
However, even when trained on a large number of samples, these convolutional networks did not generalize to tracking out-of-sample rats bearing markers. Networks applied to out-of-sample animals showed 20–30 mm average tracking error, making it impossible to accurately reconstruct the animal’s posture on the vast majority of frames (Figure 2D). Estimating performance with 6 additional cameras (12 total) did not rescue tracking on these out-of-sample animals, consistent with past reports suggesting dozens of cameras and tens to hundreds of thousands of domain- and view-specific hand labels are required for 3D tracking using 2D convolutional networks (Bala et al., 2020; Iskakov et al., 2019). Lastly, training DeepLabCut using large numbers of labeled frames and then fine-tuning the network again on hand-labeled frames from an out-of-sample rat not bearing markers did not substantially improve tracking (Figure S4). Thus, while useful for tracking in constrained behavioral tasks, we find that 2D convolutional networks are not currently well suited to the more general problem of 3D tracking across multiple naturalistic behaviors in freely moving animals (Figure S4E, Table S2).
Comprehensive profiling of behavioral kinematics
Having established CAPTURE’s state-of-the-art precision and ability to precisely track markers over long timescales, we next validated that these kinematic recordings can be used to identify the frequency and transition structure of known and novel rodent behaviors. To do so, we developed a framework for describing discrete behaviors from our kinematic recordings by combining ideas from two previous approaches: (i) supervised behavioral classification, which can be used to detect instances of hand-labeled example behaviors but is unable to identify novel behaviors (Kabra et al., 2013), and (ii) unsupervised behavioral clustering, which can identify a much larger set of potentially novel behaviors, but to date has not been shown to robustly detect known rodent behaviors such as grooming (Berman et al., 2014; Wiltschko et al., 2015). To blend these approaches, we selected two sets of features that resembled those used in supervised and unsupervised classification approaches, respectively: 80 features that were informative about distinctions between a set of commonly recognized rodent behaviors (Figure S5), and 60 features that broadly encapsulated the animal’s pose and kinematics. This yielded a 140-dimensional feature vector that encapsulated the animal’s behavior in a ~500 ms time window (Methods).
To identify repeated instances of behaviors, we used a behavioral mapping approach, in which this high-dimensional behavioral feature vector is embedded into two dimensions to create a map that facilitates clustering and exploratory data analysis (Berman et al., 2014). We collected timepoints across a representative set of CAPTURE recordings (16 rats, 1.04·10⁹ frames), subsampled this collection to yield a balanced representation of different behavioral categories, and embedded the subsampled features into two dimensions using t-distributed stochastic neighbor embedding (t-SNE)(Figure 3A)(Maaten and Hinton, 2008). The resulting embedding contained density peaks that corresponded to repeated instances of similar behaviors, which we clustered using a watershed transform (Figure 3A; Video S5). Inspection revealed that behavioral clusters in different regions of the map corresponded to commonly recognized categories of rodent behaviors such as walking, rearing, grooming of the face and body using the tongue, and scratching of different body sites using the hindlimbs (Video S6). Individual behavioral clusters within a category corresponded to postural and kinematic variants (Video S7).
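The overall recipe of this mapping (eigenpostures, wavelet features, t-SNE, watershed clustering) is sketched below in Python for illustration. The published analysis used the MATLAB MotionMapper code (see Data and Code Availability) together with the 140-feature construction and category-balanced subsampling described in Methods; the feature set, frequency range, and clustering parameters here are simplifying assumptions.

```python
import numpy as np
import pywt
from scipy.ndimage import gaussian_filter
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from skimage.segmentation import watershed

def behavioral_map(poses, fps=300.0, n_eig=10, freqs=np.geomspace(0.5, 20, 25)):
    """poses: (T, D) egocentrically aligned marker coordinates (e.g. D = 60)."""
    # 1. Eigenpostures: principal components of the aligned poses
    scores = PCA(n_components=n_eig).fit_transform(poses)

    # 2. Time-frequency features: Morlet wavelet amplitudes of each eigenposture score
    scales = pywt.central_frequency('morl') * fps / freqs
    feats = np.hstack([np.abs(pywt.cwt(scores[:, i], scales, 'morl',
                                       sampling_period=1.0 / fps)[0]).T
                       for i in range(n_eig)])

    # 3. Subsample frames and embed in two dimensions with t-SNE
    idx = np.random.default_rng(0).choice(len(feats),
                                          size=min(20000, len(feats)), replace=False)
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(feats[idx])

    # 4. Cluster density peaks in the map with a watershed transform
    hist, _, _ = np.histogram2d(embedding[:, 0], embedding[:, 1], bins=200)
    density = gaussian_filter(hist, sigma=2.0)
    cluster_map = watershed(-density)      # one basin per density peak
    return embedding, density, cluster_map
```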
Figure 3 ∣. Comprehensive kinematic profiling of the rat behavioral repertoire.
(A) We developed a behavioral mapping procedure to identify behaviors in CAPTURE recordings. We first defined a set of 140 per-frame features describing the pose and kinematics of rats within a 500 ms local window. Many of these features were obtained by computing the eigenpostures of the rats from the measured kinematics (upper), which consisted of commonly observed postural changes such as rearing or turning to the left and right. We then computed time-frequency decompositions of the eigenposture scores over time using a wavelet transform (middle). We subsampled features from all 16 rats recorded (1.047 billion frames) and co-embedded them in two dimensions using t-SNE to create a behavioral map that we clustered using a watershed transform (Figure S5) (Methods). We annotated clusters in the behavioral map, which showed that behavioral categories segregated to different regions of the map. From these behavioral clusters we then computed ethograms describing behavioral usage over time (lower).
(B) The power spectrum of the speed of markers on different body segments during rhythmic behaviors. Because different body segments show large variance in overall power, power spectral densities are separately scaled within each marker group on the head, forelimbs, and hindlimbs. Power spectra computed from n=4 animals on two days.
(C) Upper: Power spectral densities of individual behavioral clusters belonging to rhythmic behaviors for one rat over two days. Power spectral densities were computed over the speed of one marker on the body segment listed. Colored lines correspond to examples below. Insets show position of the example behavioral clusters within the coarse behavioral category. Spectral densities are reported relative to 1 (mm·s⁻¹)²/Hz. Lower: individual instances of marker kinematics randomly drawn from example clusters.
(D) Example poses selected from non-rhythmic behavior clusters associated with rearing and stretching.
CAPTURE thus offers the ability to comprehensively profile the kinematics of the rodent behavioral repertoire. As a demonstration of this, we examined the frequency spectrum of different body parts during rhythmic behaviors, specifically grooming, scratching, and wet dog shakes (Figure 3B). Grooming of the body consistently showed peaks in the frequency of the head and side-specific forelimb speed at 4 Hz and 7-9 Hz, consistent with past reports (Berridge et al., 1987). In contrast, scratching showed side-specific frequency peaks across a 7-12 Hz range, consistent with the 15-20 Hz frequency reported in mice when adjusted for body size (Elliott et al., 2000). Wet dog shakes showed a peak in trunk power at 14 ± 0.6 Hz, consistent with past work using high-speed video analysis (Dickerson et al., 2012). Interestingly, while instances of wet dog shakes and grooming showed similar frequencies but variable amplitudes, scratching behaviors varied more broadly in both frequency and amplitude, suggesting that they may be generated by more flexible or less robust control circuits (Figure 3C). Furthermore, CAPTURE enabled the detection of variability in non-rhythmic behaviors, for instance postural variability in static rearing behaviors (Figure 3D).
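The spectral measurements above reduce to estimating the power spectral density of a marker's speed within each behavioral cluster. A minimal sketch follows, using Welch's method as a stand-in for whatever estimator the authors used (not specified in this section).

```python
import numpy as np
from scipy.signal import welch

def marker_speed_psd(xyz, fps=300.0, nperseg=1024):
    """Power spectral density of one marker's speed.

    xyz: (T, 3) marker trajectory in mm, sampled at fps
    returns: frequencies (Hz) and power spectral density in (mm/s)^2 / Hz
    """
    speed = np.linalg.norm(np.diff(xyz, axis=0), axis=1) * fps   # mm/s
    freqs, psd = welch(speed - speed.mean(), fs=fps, nperseg=nperseg)
    return freqs, psd
```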
New insights into the hierarchical structure of behavior
Animal behavior is thought to be hierarchically structured in time into repeated behavioral patterns (Dawkins, 1976; Tinbergen, 1950), yet systematic, quantitative means of identifying these structures have to date been lacking. To address this, we first probed the degree to which there is longer-timescale temporal structure in rodent behavior by examining the behavioral transition matrix at different timescales (Figure S6A). We found significantly more structure in the transition matrix at 10–100 second timescales than predicted by a first-order Markov chain (Berman et al., 2016), a time-invariant process that is commonly used to model behavioral dynamics (Berridge et al., 1987; Wiltschko et al., 2015). This enhancement in structure was far less pronounced at timescales of several minutes.
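The comparison against a first-order Markov chain can be made explicit: a memoryless model predicts that the transition matrix at a lag of k frames equals the k-th power of the lag-1 matrix, so excess structure in the observed lag-k matrix indicates longer-timescale organization. Below is a hedged sketch with stand-in labels (the paper's exact statistic and timescale binning are described in Methods, not here).

```python
import numpy as np

def transition_matrix(labels, lag=1):
    """Row-normalized probability of observing behavior j `lag` frames after
    behavior i, for an integer-coded per-frame ethogram `labels`."""
    n = int(labels.max()) + 1
    counts = np.zeros((n, n))
    np.add.at(counts, (labels[:-lag], labels[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True).clip(min=1)

# Stand-in ethogram; with real data, systematic differences between the
# observed lag-k matrix and the Markov prediction T1^k reveal non-Markovian
# structure at that timescale.
rng = np.random.default_rng(0)
labels = rng.integers(0, 8, size=1_000_000)   # behavior ID per frame (hypothetical)
k = 3000                                      # e.g. a 10 s lag at 300 Hz
T_markov = np.linalg.matrix_power(transition_matrix(labels, 1), k)
T_observed = transition_matrix(labels, k)
excess_structure = np.abs(T_observed - T_markov).sum()
```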
To elucidate the nature of these non-Markovian behavioral structures, we developed an algorithm that identifies temporal epochs with similar patterns of behavioral usage on a fixed timescale τ (Figure 4A)(Methods). As examples of these patterns, we used 15-s and 2-min timescales, which identified distinct behavioral patterns (Figure S6B-D). On 15-s timescales, the algorithm identified sequentially ordered patterns in the behavior, such as ‘canonical’ grooming sequences of the face followed by the body (Berridge et al., 1987) or the performance of stereotyped lever-pressing sequences acquired during training in our task (Kawai et al., 2015). We thus termed these detected patterns ‘sequences’ (Figure 4B-D)(Video S8). On 2-min timescales, it identified epochs of varying arousal or task-engagement, which often lacked stereotyped sequential ordering. We refer to these as ‘states’ (Video S9). Consistent with this nomenclature, the transition matrices of patterns on 15-s timescales were significantly sparser than those on 2-min timescales, indicating they possessed a more stereotypic ordering between behaviors (Figure S6E).
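A simplified sketch of this pattern-matching step follows, under stated assumptions (non-overlapping windows, a fixed correlation threshold, and hierarchical clustering standing in for the clustering procedure detailed in Methods).

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.cluster.hierarchy import linkage, fcluster

def detect_patterns(ethogram, fps=300.0, window_s=15.0, corr_thresh=0.6, n_clusters=20):
    """ethogram: (T, n_behaviors) one-hot matrix of behavior labels per frame.
    Returns the indices of windows whose behavioral usage recurs elsewhere in
    the recording, plus a cluster label for each (candidate sequences/states)."""
    w = int(window_s * fps)
    # Smooth behavioral usage on the chosen timescale; keep one usage vector
    # per non-overlapping window
    usage = uniform_filter1d(ethogram.astype(float), size=w, axis=0)[::w]
    z = (usage - usage.mean(1, keepdims=True)) / (usage.std(1, keepdims=True) + 1e-9)
    similarity = z @ z.T / z.shape[1]               # pairwise Pearson correlation
    np.fill_diagonal(similarity, 0.0)
    recurring = np.where(similarity.max(axis=1) > corr_thresh)[0]
    labels = fcluster(linkage(usage[recurring], method='average', metric='correlation'),
                      t=n_clusters, criterion='maxclust')
    return recurring, labels
```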
Figure 4 ∣. Rat behavior is hierarchically organized into behavioral sequences and states.
(A) We developed a temporal pattern matching algorithm to detect repeated behavioral patterns in our behavioral recordings. We smooth ethograms over 15-s or 2-minute timescales and compute the pairwise correlation to yield a similarity matrix, which we threshold to extract high-value off-diagonal elements. These correspond to patterns of behavioral usage observed at two or more timepoints in the dataset. We cluster these patterns to identify repeated sequences and states, which can be organized to identify hierarchical structure. Ethograms and similarity matrices are shown from a full day of recording in one animal, smoothed with 4-s and 15-s boxcar filters, respectively. Dendrograms and clustered states are schematic examples. Behaviors colored following Figure 3A.
(B) Ethograms, smoothed with a 4-s filter for visualization, shown on minute- (upper) and hour-long (lower) timescales. Ethograms are shown for the subset of behaviors observed during each time period, sorted and shaded by their membership in different behavioral sequences (upper) and states (lower). States and sequences detected using the pattern matching algorithm are shown above and ordered in (B) and (C) by their membership in different behavioral categories.
(C) Heatmaps showing the composition of sequences (upper) and states (lower) in terms of different behavioral categories.
(D) Examples of behavioral patterning during different sequences and states. Sequences show repeated temporal patterns of behavioral usage while states show average increases in behavioral frequency but without detailed temporal structure. For clarity, only a subset of frequently occurring behaviors are shown. Behaviors are colored by their membership in behavioral categories as in Figure 3A.
(E) To display the hierarchical organization of these sequences and states, we computed a stacked tree, whose lower links reflect the probability of observing different behaviors during each sequence and whose upper links reflect the probability of observing different sequences in each state. Behaviors and links are colored according to their behavioral type: green (grooming): grooming and scratching; blue (idling): prone still and postural adjustment; red (active): investigatory, walking, shaking, and rearing. For clarity, one in eight behaviors are shown, and we only visualize links showing probabilities of occurrence greater than 0.05 and 0.1 for behaviors and sequences, respectively. Shaded region corresponds to region highlighted in (F).
(F) Hierarchical arrangement of the right grooming behavioral sequences highlighted in (E; left), as well as example ethograms of a subset of behaviors in each sequence (right). For clarity, in the hierarchy we show only one in four behaviors, and a subset of behavioral states used. Across animals, there was significant mutual information between behavioral state and the grooming sequence observed (0.62 vs 10⁻³ nats, P=0.004, sign test, over 9 days where grooming sequences were observed).
Example performance data in B–F shown for a single animal on one full day of recording. Example states and sequences numbered in parentheses reference behavioral usages shown in C.
We used these sequences and states to form a hierarchical representation of behavior (Figure 4D), which was, again, significantly more structured than expected from Markovian behavior (Figure S6F-S6H). Rather than being organized as a strict, tree-based hierarchy (Berman et al., 2016; Dawkins, 1976), we found that behaviors were shared across multiple behavioral sequences, which were then used differentially across behavioral states (Figure 4E). For example, grooming of the right forelimb was used both in persistent body grooming sequences and in shorter, more vigorous episodes of grooming, with these sequences being used to different extents in different behavioral states (Figure 4F).
Lastly, we used our pattern detection algorithm to identify usage patterns on a broad range of timescales between 7.5 and 480 s. Patterns detected were largely distinct across timescales, but we found few unique patterns beyond timescales of several hundred seconds, and similar patterns below 15 s (Figure S6B-S6D). Together with our analysis of non-Markovian structure in the transition matrix, these results indicate that the most evident long-timescale structures in behavioral usage are patterns at 10–100 s timescales.
Multiscale phenotyping of the effects of drugs and disease
CAPTURE’s measurement precision and ability to record kinematics over long timescales enable an unprecedentedly comprehensive description of how drugs, neural circuit manipulations and disease states affect behavior. As a first example of this, we performed recordings after acute administration of two stimulants, caffeine and amphetamine, at dosages known to similarly increase locomotor activity (Antoniou et al., 1998). Analysis of CAPTURE recordings recapitulated these findings, showing that both caffeine and amphetamine increase the amount of time animals spend moving compared to recordings at baseline or after injection of a saline vehicle (Figure 5A). However, we found that the changes elicited by these compounds differed substantially in the types of behaviors affected. While both compounds increased the amount of time spent in active behaviors at the expense of idling, caffeine, but not amphetamine, increased the proportion of time rats spent grooming (Figure 5A-C). Amphetamine affected not only the types of behaviors animals expressed, but also their patterning in time (Figure 5D). While the states and sequences expressed by animals after administration of caffeine were similar to naturally occurring states of arousal, amphetamine elicited entirely novel behavioral states that consisted of repetitive locomotor sequences such as circling (Figure 5D,E)(Video S10). In the past, amphetamine was thought to elicit a biphasic behavioral response, stimulating locomotor activity at low doses and repetitive motor ‘stereotypies’, such as head swinging, at higher doses (Antoniou et al., 1998). Our results suggest that the increase in repetitive behavior triggered by amphetamine begins at much lower doses of administration and is, at least partially, dissociable from its effects on locomotion.
Figure 5 ∣. Caffeine and amphetamine show similar effects on arousal, but divergent effects on behavioral organization.
(A) Behavioral density maps during baseline and after acute administration of either a saline vehicle, caffeine (10 mg/kg) or amphetamine (0.5 mg/kg). The fraction of time animals spent moving compared to baseline (32±1%) increased significantly after administration of caffeine (80±1%) and amphetamine (84±1%) but not after a saline vehicle (17±1%; P=0.016 for caffeine and amphetamine, P=0.4 for vehicle, n=4 rats for drug and vehicle conditions and n=5 rats for baseline condition, 4,635–7,391 s per condition). Both stimulant compounds altered the amount of time spent engaging in walking and rearing (arrows).
(B) Caffeine and amphetamine increased the fraction of time spent in active locomotor behaviors at the expense of idling behaviors, but had divergent effects on grooming, with amphetamine alone showing suppression of grooming activity (all P<10⁻⁵, binomial test, n=238–989 effective samples per behavioral category).
(C) Caffeine and amphetamine introduced new types of active and grooming behaviors that were rarely observed at baseline, for instance high velocity walks and more vigorous grooming. Colored bars denote fold changes of significantly modulated behaviors (P<10⁻⁶, Poisson probability of rates in the perturbed condition, Benjamini-Hochberg corrected). Sorting all behavioral changes by their modulation after administration of caffeine (right) reveals that caffeine and amphetamine both increase the frequency of many active behaviors and decrease the frequency of many grooming behaviors.
(D) Box-and-whisker plots showing the correlation coefficient of sequence (upper) and state (lower) probability vectors across baseline and drug conditions. Amphetamine and caffeine induced significant changes in the long-timescale organization of behavior (**P=1.5·10⁻⁶, *P<0.005; rank-sum test; n=16 pairs of days between conditions). Because of the lack of movement during vehicle and time-matched controls, baseline data here is taken from two days of recordings.
(E) Heatmaps showing the composition of states in terms of behavioral categories (upper) and the fold change in the usage of these states compared to baseline (lower), sorted to emphasize changes across conditions. Amphetamine introduced highly distinct states in animals, emphasized by black lines (Video S10).
See also Video S10.
Having validated CAPTURE’s capacity to identify both known and novel effects of drugs on behavior, we next screened for behavioral changes in a rat model of Fragile X syndrome, the most common monogenic form of autism spectrum disorders in humans (Miller et al., 2019). While behavioral assays using targeted behavioral tests have been performed in a variety of models of autism, including Fragile X syndrome (Silverman et al., 2010), reports of behavioral changes often conflict across studies, especially those related to grooming behaviors (Brunner et al., 2015). We used CAPTURE to compare the behavioral repertoire of male Fmr1-KO rats and their wildtype cage mates over three continuous days of home cage behavior. While knockout rats spent comparable time moving, they spent significantly more time grooming (Figure 6A-6B). This did not simply reflect an increase in the amount of time spent in normal grooming behaviors; Fmr1-KO rats also engaged in different types of grooming behaviors that tended to occur in longer bouts (Figure 6C). The grooming behaviors in knockouts were also assembled into different sequential patterns than controls, exhibiting altered usage of behavioral sequences and states (Figure 6D). This effect was driven in part by abnormal and idiosyncratic grooming sequences in knockout animals (Figure 6E). Taken together, this suggests that while Fmr1-KO rats largely overlap with wildtype animals in their locomotor behavior, they show idiosyncratic perseverative grooming sequences, a robust behavioral manifestation well suited for exploring the neural mechanisms underlying motor stereotypies present in autism spectrum disorders (Kalueff et al., 2016).
Figure 6 ∣. Fmr1-KO rats show idiosyncratic perseverative body grooming sequences.
(A) Behavioral density maps of Fmr1-WT (n=3) and Fmr1-KO (n=4) age-matched cage mates for two full days of recording. Arrows highlight periods of increased grooming in Fmr1-KO rats. Both wildtype and knockout rats spent equal amounts of time moving (35.1±2% vs 35.0±0.7%, rank-sum test, P=0.85; n=6,8 days of recording in WT and KO rats respectively, 445,601-673,473 s per condition).
(B) Fmr1-KO but not Fmr1-WT animals spent increased time grooming, at the expense of idling behaviors (all P<10⁻⁵, binomial test, n_eff=166–232 effective samples per behavioral category). This was accompanied by a significant increase in the dwell time of grooming behaviors alone (10±2% increase in dwell time between KO and WT animals over all grooming behaviors, P=0.03 signed-rank test, n=208 grooming behaviors, compared to <0.01% changes in the average dwell time of active and idling behaviors).
(C) The composition of behavioral categories, especially grooming behaviors, was substantially modified in Fmr1-KO but not Fmr1-WT animals. Colored bars denote fold change of behaviors significantly modulated across the stated conditions (P<10⁻⁶, Poisson probability of altered rates, Benjamini-Hochberg corrected). Sorting all behavioral changes by their modulation in Fmr1-KO animals (right) shows little commonality between behaviors modulated in wildtype and knockout animals.
(D) Box-and-whisker plots showing the correlation of sequence and state usage probabilities for WT and Fmr1-KO rats. There was a significantly decreased correlation across genotypes (*P=0.0006, P=0.01 for sequences and states, respectively; rank-sum test, n=48 pairs of days between conditions).
(E) Heatmaps showing the composition of sequences in terms of behavioral categories (upper) and the fold change in the usage of these sequences compared to baseline (lower), sorted to emphasize changes across conditions. Black lines highlight elevated levels of idiosyncratic grooming sequences in knockout rats.
Stability, individuality, and commonality of the rodent behavioral repertoire
Rodents are known to exhibit individual differences in behavior as a result of interaction between genetic, environmental, developmental, and social factors (Lathe, 2004). Individuality is typically measured through coarse estimates of behavioral usage (Forkosh et al., 2019; Freund et al., 2013), leaving open the question of whether animals exhibit individuality along other dimensions, for instance in the kinematics and long-timescale patterning of behavior that can now be systematically explored using CAPTURE.
We therefore used CAPTURE to compare the behavior of five female Long-Evans rats across 3-5 days of baseline recordings. CAPTURE identified subtle kinematic differences in the vigor and frequency of rhythmic behaviors across days and animals, but these kinematic differences were significantly smaller than those present across different behaviors, indicating that animals drew from a common behavioral repertoire (Figure 7A-7B). However, while animals drew from this common repertoire, they varied significantly in the usage of these behaviors, showing ~80% similarity across animals (Figure 7C-E). This similarity in usage showed only moderate dependence on the coarseness of behavioral definitions, indicating that this individuality in behavioral usage is not a trivial consequence of the slight differences in behavioral kinematics across animals (Figure 7F). Interestingly, dissimilarity across animals was most pronounced for grooming and rearing behaviors, suggesting they may be particularly sensitive to developmental or environmental differences (Figure 7E).
Figure 7 ∣. Long-timescale behavioral structure is a locus of behavioral individuality.
(A) Left: power spectral densities of the velocity of the front head marker, for two individual behavioral clusters, for n=5 rats over 3-5 days each. Spectral densities are reported relative to 1 (mm·s⁻¹)²/Hz. Right: the average Cartesian velocity of the head in each behavioral cluster, shown for two rats on three days. Shaded bars are s.e.m.
(B) Box plots showing the Pearson correlation of the 140-dimensional behavioral feature vector for five separate comparisons: within-animal, within-day comparisons of (1) different instances of the same behavior, (2) different behaviors in the same coarse category, or (3) different behaviors in different coarse categories, and comparisons of behavioral averages across different days (4) within and (5) across animals. All P<10⁻¹⁰, Kruskal-Wallis test with Bonferroni corrected post-hoc testing, n=1731, 13386, 135094, 5264, and 22934 pairs of feature vectors for each comparison, respectively. Only behaviors with at least 5 instances per day were analyzed.
(C) Representative behavioral density maps showing consistency of behavior usage from day-to-day for two rats.
(D) Overlaid t-SNE density maps across days (upper) and rats (lower).
(E) The Pearson correlation between the probability vectors of behavior usage over days, across all categories (left) and per behavioral category (right), over different days and rats. There was a significant decrease in correlation across rats, especially for rearing and grooming behaviors (signed-rank test, P<10⁻⁷ for all comparisons; n=36 and 348 pairs of days within and across animals, respectively). There was a significant correlation between the behavioral usage change over days and the number of calendar days between recordings (P=0.002, F-test compared to null model, R²=0.23, n=36), indicating that there was some drift in behavioral usage over time.
(F) The Pearson correlation between the probability vectors of behavior usage over days, across 7 increasingly fine-grained clusterings of the behavioral space (Methods).
(G) Example ethograms of sequence usage for two rats on two different days, as well as summary bar graphs showing the average sequence usage over the entire day. For clarity, we show only a subset of sequences, smoothed with a 10-s filter. Scale bar on bar plots corresponds to 10% of frames.
(H) The average Pearson correlation in the usage of temporal patterns across timescales. We found significantly greater similarity in usage across animals at short timescales compared to long timescales (signed-rank test, P<10⁻¹⁰, n=174 pairs of days).
(I) We trained a random forest classifier to distinguish individual rats or individual days using behavioral usage statistics. Incorporating long-timescale pattern usage significantly improved identification of individual rats, but not of individual days within a rat (signed-rank test, P=0.01, 0.6, respectively, n=21 days).
See also Figure S7.
Beyond variation in overall behavioral usage, animals demonstrated individuality in the long-timescale patterning of behavior. Animals again drew from a common set of temporal patterns (Figure S7). Single animals showed ~80% similarity in pattern usage over different days, but varied more significantly in their usage across animals, especially at long-timescales (Figure 7G-H). Classifiers trained to recognize animal identities based on behavioral usage were significantly improved by adding additional information about long-timescale patterns (Figure 7I). This suggests that the patterning of behavior reflects a previously unappreciated locus of individuality, above and beyond changes in behavioral usage.
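The identity analysis amounts to classifying recording days by animal from usage statistics. The sketch below uses stand-in data purely to show the structure of such a comparison (the paper used a random forest classifier; the feature dimensions and cross-validation scheme here are assumptions).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_rats, n_days = 5, 5
y = np.repeat(np.arange(n_rats), n_days)        # animal identity of each recording day
usage = rng.random((n_rats * n_days, 200))      # stand-in behavior-usage fractions per day
patterns = rng.random((n_rats * n_days, 60))    # stand-in sequence/state-usage fractions

clf = RandomForestClassifier(n_estimators=500, random_state=0)
acc_usage = cross_val_score(clf, usage, y, cv=5).mean()
acc_with_patterns = cross_val_score(clf, np.hstack([usage, patterns]), y, cv=5).mean()
# With real data, an improvement of acc_with_patterns over acc_usage indicates that
# long-timescale patterning carries identity information beyond behavioral usage alone.
```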
DISCUSSION
We combined motion capture, deep learning, and body piercings to enable continuous kinematic tracking of the head, trunk, and appendages in freely behaving rats over week-long timescales. In comparison to past approaches using depth cameras or convolutional neural networks, CAPTURE offers the ability to track limbs in 3D, across the rat’s full behavioral repertoire (Figures 1, 2). To parse these continuous 3D kinematic recordings, we developed an unsupervised computational analysis framework, which provides a synoptic picture of behavior in terms of discrete organismal behaviors and long-timescale behavioral patterns such as sequences and states (Figures 3, 4). This analysis allowed us to both recapitulate the kinematic and sequential characteristics of known rodent behaviors and identify novel kinematic, sequential, and organizational properties of behavior, including a coupling between the rodent’s behavioral state and the type of grooming observed (Figure 4F). Finally, we demonstrated how CAPTURE can be used to identify novel behavioral, sequential, and kinematic changes in response to drugs and disease states (Figures 5 and 6), and across individuals (Figure 7), in particular demonstrating the presence of perseverative grooming stereotypies in a rat model of autism spectrum disorder.
Comparisons to the emerging behavioral tracking toolkit
A behavioral revolution is underway in neuroscience. Much like the diverse options available for monitoring electrical, calcium, hemodynamic, and neuromodulatory activity in the brain, the expanding behavioral toolkit contains a range of tools for behavioral measurements that vary in the number of unique behaviors they can discriminate, and their ability to report the location of body keypoints in 2D or 3D (Table S2). By combining large numbers of high-resolution camera views with high signal-to-noise markers, CAPTURE offers a robust and reliable framework for 3D kinematic measurements, continuously across the rat behavioral repertoire. In comparison, alternative approaches using depth cameras (Hong et al., 2015; Wiltschko et al., 2015) may offer reduced cost and increased throughput, but they struggle with infrared reflections (e.g. from bedding and other objects), have difficulty dissociating animals from objects in the arena, and cannot track defined keypoints, including the position of appendages. These limitations constrain the range of environments in which depth camera approaches can be applied, as well as their ability to test hypotheses relating neural activity with behavioral kinematics. Alternatively, we found that 2D convolutional networks, while useful for tracking non-occluded keypoints in behavioral tasks, need extensive training datasets (100,000 fully labeled frames) to perform whole-body 3D tracking, and even then fail to learn general visual features that readily transfer to new animals (Figure 2). Thus, these approaches are not suited to whole-body 3D tracking across an animal’s behavioral repertoire.
CAPTURE, however, also faces challenges, as it is difficult to extend to species and regions of the body that cannot be equipped with markers. While we found that body piercings did not overtly modify animals’ behavior (Figure S2), they may cause subtler behavioral changes or influence animals’ social or environmental interactions. These potential adverse effects, as well as potential shifts in marker positions during longer experiments, should be carefully considered. Extending CAPTURE to a broader range of species and keypoint locations could be helped by using it alongside synchronized video recordings, to train deep learning algorithms for markerless 3D motion capture as is done in humans (Ionescu et al., 2014; Pavlakos et al., 2017). This could permit high-throughput markerless motion capture with as few as one camera.
Behavioral analysis across spatial and temporal scales
Understanding the logic of biological systems can rarely be achieved by studies at a single scale. The omics era has demonstrated how complementary insights into the development and function of organisms can be derived across the single-cell, tissue, and organ-system scales, while neuroscience is increasingly driven forward by an integration of insights from molecular, systems and computational approaches (Luo et al., 2018). Previous studies of behavior in rodents were restricted to high-resolution snapshots of individual behaviors, or to more qualitative descriptions of a broader range of behaviors, stymieing efforts to integrate insights across scales (Berman, 2018). CAPTURE can precisely track kinematics over long timescales, allowing us to record the occurrence of behaviors, their precise 3D kinematics, and their organization into sequences and states. This multi-scale capability allowed us to quantitatively interrogate both the stereotypy and flexibility of rat behavior across timescales and individuals. We showed that behaviors are associated with different patterns of variability in frequency and amplitude (Figure 3C), that the same behaviors can be organized into different sequential patterns as a function of the animal’s behavioral state (Figure 4E-F), and that different animals show individuality in behavioral kinematics, usage, and patterning (Figure 7).
Our results represent only a few select examples of the types of inquiry into behavioral organization possible with CAPTURE. Similar to recent comprehensive surveys of brain connectivity or cellular transcriptional properties, the high-resolution and comprehensive nature of these datasets can be used by the broader community to achieve a diverse set of goals. For example, they could serve to test new algorithms for behavioral analyses, as reference datasets for standardizing behavioral definitions, or to characterize the biomechanical properties of movement across the rodent behavioral repertoire.
Comprehensive phenotyping of behavioral perturbations
In comparison to existing approaches for motor and behavioral phenotyping that use task-based batteries of tests to probe specific kinematic or behavioral deficits (Jinnah and Hess, 2015; Silverman et al., 2010) or depth cameras to coarsely assess changes in usage across a broader range of natural behaviors (Wiltschko et al., 2015), CAPTURE can comprehensively identify precise changes in both behavioral kinematics and usage. Additionally, the continuity of CAPTURE recordings, and compatibility with naturalistic arenas such as the animal’s home cage, enables a more complete characterization of changes in long timescale behavioral organization.
As an example of this capability, we demonstrated how caffeine and amphetamine elicit different changes in behavioral states and kinematics, with amphetamine producing a distinct, stereotypic behavioral state with highly altered behaviors and behavioral kinematics (Figure 5). We also described idiosyncratic perseverative grooming sequences in a rat model of Fragile X syndrome (Figure 6). The kinematic and behavioral resolution of CAPTURE should be especially useful for motor diseases such as Parkinson’s, Huntington’s, Tourette’s, and dystonia, where key motor symptoms such as tremor, dyskinesia, stereotypies, and postural changes lack robust quantitative metrics in animal models (Gittis and Kreitzer, 2012; Pappas et al., 2014; Parker et al., 2018). The ability of CAPTURE to record natural behaviors over long timescales should be especially useful in models of behavioral disorders such as autism, anxiety, bipolar disorder, and depression, in which symptoms are often associated with slower changes in behavioral patterns that have been difficult to reliably measure in animal models (Nestler and Hyman, 2010).
Next-generation frameworks for behavioral analysis
High-dimensional data analysis, especially at the billion-timepoint scale of CAPTURE recordings, is an emerging discipline. In the future, new frameworks for identifying behaviors and long timescale structure will be aided by new approaches for dimensionality reduction (DeAngelis et al., 2019), clustering (Todd et al., 2017), feature set design (Du et al., 2015), and statistical modelling (Calhoun et al., 2019). However, such unsupervised analyses will only be an intermediate step towards the creation of standardized and rigorously defined behavioral taxonomies, which CAPTURE recordings should greatly facilitate (Anderson and Adolphs, 2014).
The ability of CAPTURE to read out postural kinematics also opens the possibility of creating analysis frameworks that go beyond behavioral identification to elucidate important axes of variation from the perspective of motor control (DeAngelis et al., 2019), behavioral exploration (Wu et al., 2014), and neural activity. Much like how task-optimized neural networks have been valuable tools for elucidating the relevant axes of variation of poorly characterized sensory representations (Bao et al., 2020), the internal structure of deep neural networks trained to imitate CAPTURE recordings should be a valuable tool for disentangling motor representations (Merel et al., 2018, 2019b).
Outlook
CAPTURE, in its current form, is a powerful tool for recording whole-body kinematics, but future improvements could extend its ability to provide an even more integrated picture of whole-organism behavior and physiology across species. CAPTURE’s kinematic coverage could be extended using increased numbers of smaller markers, active motion capture markers, or improved algorithms for imputation and body model fitting. To facilitate a broader understanding of behavior and physiological state, CAPTURE recordings could be paired with complementary measurements of musculoskeletal or physiological state, such as skeletal or muscular dynamics (Nakamura, 2005), or eye, pupil, and whisker tracking (Meyer et al., 2018). CAPTURE can easily be extended to other mammalian model systems such as marmosets or mice, or to social settings by using distinguishable marker sets, although potential effects of body piercings should be evaluated in each new use case. Motion capture systems identify keypoints in real time, and so can be used with real-time behavioral recognition approaches to deliver closed-loop feedback, enabling precise and rapid reinforcement of natural behaviors and specific movement kinematics. Lastly, the comprehensive behavioral recordings enabled by CAPTURE, when combined with neural recordings, should be a powerful tool for disentangling the neural coding schemes for movement and the influence of behavior on cognitive and sensory coding.
Overall, by providing a quantitative and comprehensive portrait of animal behavior, CAPTURE sets the stage for a broad range of new inquiries into animal behavior and its control by the nervous system.
STAR METHODS
RESOURCE AVAILABILITY
Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Bence P. Ölveczky (olveczky@fas.harvard.edu).
Materials Availability
This study did not generate new unique reagents.
Data and Code Availability
Demonstration code and example datasets are available at https://github.com/jessedmarshall/CAPTURE_demo. The core functions used for the behavioral embedding are available at https://github.com/gordonberman/MotionMapper. The code used for imputation is available from the authors at https://github.com/diegoaldarondo/MarkerBasedImputation. The remainder of the analysis code was written using standard approaches in MATLAB 2017b and open-source code extensions and is available from the corresponding author on request.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Experiments were performed using female and male Long-Evans rats aged 2-18 months. Female rats were obtained from Charles River (Strain 006). Male Fmr1-KO rats (LE-Fmr1em4Mcwi) and cagemate controls were obtained from the Medical College of Wisconsin. Knockouts were generated by a CRISPR/SpCas9 knockout of Fmr1 as described elsewhere (Miller et al., 2019). Loss of the Fmr1 protein was confirmed by Western blot. We recorded all Fmr1-KO and Fmr1-WT rats at 6-10 weeks of age. The genotype of individuals used in the study was confirmed by PCR.
METHOD DETAILS
Motion Capture Arena
We performed motion capture recordings using a commercial 12-camera motion capture system (Kestrel, Motion Analysis). To reduce background due to infrared reflections in motion capture recordings, we positioned cameras 5-10 feet from the center of the motion capture recording arena, a 2-foot diameter plexiglass cylinder. The arena was filled with pine chip bedding and contained a ceramic pedestal (5 inches diameter by 2 inches in height). We placed cameras at two different heights and angled them at 15° and 35° to the horizontal to track the half-silvered motion capture markers across the ~270° azimuthal extent of their retroreflectivity. We secured the plexiglass arena in a custom wooden platform sealed with urethane and painted with ultra-flat black paint (Krylon 1602). We fitted the arena with a custom lever and water spout for animal training (Kawai et al., 2015), and coated reflective elements outside the arena with dulling spray (Krylon 1301) to reduce reflectivity. We placed six RGB video cameras (Flea3 FL3-U3-13S2C, Point Grey) with varifocal lenses (Computar T10Z0513CS) on tripods (Slik Able 300 DX) above and to the sides of the arena to provide video recordings. The arena was kept on a 12-hour light cycle, and red lights were used at night during RGB video recordings.
Retroreflective Body Piercings
We fabricated motion capture markers from 5 mm diameter high-index of refraction ball lenses (n=2.0, H-ZLAF, Worldhawk Optoelectronics) (Mischiati et al., 2015). To increase the reflectivity of the ball lenses, we cleaned and half-silvered them using a commercial silvering kit (A1204D, Angel Gilding) and custom rubber mold (Mischiati et al., 2015). The half-silvered markers were detectable in the motion capture arena across a 270° azimuthal range. To track the position and angular rotations of the head, three retroreflectors were attached to a custom 3D printed acrylic headcap (30x40 mm, 35 mm tall, 10 g) that was painted using flat black paint (Krylon 1602). We placed the markers in an isosceles triangle to improve marker assignment based on pairwise distances. We fabricated piercings for tracking the trunk and hips by soldering a 6 mm steel cup (H20-1585FN, Fire Mountain Gems) to a Monel clamp (INS1005-5, Kent Scientific). We used high-strength epoxy to affix retroreflectors to the steel cup (Loctite EA0151) that we cured at 175 °C, resulting in light (0.6 g) and durable retroreflectors. We chose the angle of the cup to the clamp, and hence the angle of the retroreflectors, to maximize the observability of markers across the range of motion of the rat. We angled markers on the posterior of the animal towards the head to facilitate marker tracking when the animal reared. We angled lateral markers towards the midline to facilitate tracking during trunk twists. Because the skin on the lateral side of the animal was poorly suited to support the clamp piercings, we fabricated 0.4 g markers to be placed on the shoulders, forelimbs and hindlimbs by fusing retroreflectors to steel earstuds with 6 mm wide cups using high-strength epoxy.
To track the position and orientation of the animal’s head, trunk, and major appendages, as well as provide appropriate asymmetry to uniquely identify motion capture markers using a pairwise body model, we designed a custom 20 marker placement strategy. We placed markers according to skeletal landmarks identified under the skin (Hebel and Stromberg, 1976). We placed three markers along the animal’s spine at the sixth thoracic vertebra (Th6), the first lumbar vertebra (L1) and the first sacral vertebra (S1). To provide asymmetry for distinguishing the left and right sides of the animal, we also placed two markers on the animal’s left trunk, midway along the anterior-posterior axis between each pair of adjacent spine markers and located 20 mm vertically beneath the spine. Markers over the animal’s hips were placed along the femur above the trochanter minor. We used ten markers to track the position and configuration of the forelimbs and hindlimbs. We attached three markers to each forelimb: one over the scapula, 10 mm from the posterior endpoint, one over the olecranon (elbow), and one at the midpoint of the ulna. We attached two markers to each hindlimb: one on the animal’s patella and one at the midpoint of the tibia.
Drug Administration
We injected all drugs at 1 ml/kg. We dissolved caffeine (Sigma C0750) in phosphate-buffered saline (PBS) at a concentration of 10 mg/ml and injected it at 10 mg/kg. We administered amphetamine (Sigma A5880) at 0.5 mg/kg in PBS. Vehicle injections were performed using PBS. To administer drugs, we briefly anesthetized animals and allowed them to recover for 10 minutes before we began recordings. Baseline recordings were matched to the same time of day as drug administration.
Surgery
The care and experimental manipulation of all animals were reviewed and approved by the Harvard University Faculty of Arts and Sciences Institutional Animal Care and Use Committee. All surgical procedures were designed to limit pain and discomfort. Surgeries for attaching body piercings for motion tracking were performed under 1-2% isoflurane anesthesia. Prior to surgery, we sterilized all tools. We sterilized body piercings by placing them in 70% ethanol for 30 minutes and rinsing with sterile water. We shaved the animal’s head, trunk, and limbs using an electric razor. We removed remaining hair on the scalp using depilatory cream (Nair) and sterilized the scalp using betadine. To attach the custom acrylic headcap bearing head markers, we made a longitudinal incision over the animal’s scalp and retracted the skin to expose the skull. We placed three skull screws over the cerebellum and temporal lobes and covered the skull with C&B Metabond (Parkell). We affixed the headcap using cyanoacrylate glue and dental cement (A-M Systems, 525000). Sites for the placement of body piercings on the skin were marked using a skin pen and then sterilized using alternating washes of betadine and 70% ethanol.
To attach markers to the spine, trunk and hips, we made two small incisions spaced by 1 cm, and drew a sterile, beveled, 18-gauge needle through the incisions to draw open the sites (Angel, 2009). We then inserted body piercings through the ends of the incision and secured them in place using pliers. Incisions and piercings were oriented perpendicularly to the skin lines of maximal tension (Hussein, 1973). For markers on the shoulders, forelimbs and hindlimbs, we similarly inserted a sterile, 18-gauge hollow needle through two points on the skin spaced 10 mm apart. We inserted the end of the piercing through the hollow end of the needle and retracted the needle from the skin. To then secure limb piercings, we attached earnuts (Fire Mountain Gems, H20-A5314FN) to the back of the piercings. We spaced the earnuts from the skin using damp wooden barriers and soldered them to the piercings using a soldering iron and solder flux. We applied antibiotic cream to marker sites and administered buprenorphine (0.05 mg/kg) and carprofen (5 mg/kg) subcutaneously following surgery. Following surgery, rats maintained a stable weight (linear fit of normalized body weight against day after surgery: R2=0.06, P=0.25 for the linear correlation coefficient; 20 timepoints over 6 animals).
Motion Capture Recordings
Prior to all recordings, we habituated animals to handling by the investigator for at least 5 days, and animals were handled daily during recording to provide enrichment (van Praag et al., 2000). We habituated animals to the arena for at least one day prior to marker attachment. We allowed animals to recover in the arena for at least one day before beginning recordings. To provide further enrichment, we allowed animals to interact with a familiar, gender-matched conspecific for 30-60 min.
We performed motion capture recordings using a commercial motion capture acquisition program (Cortex, Motion Analysis) on a custom acquisition computer (32 GB RAM, 3.6 GHz Intel i7). Motion capture was recorded continuously at 300 Hz with a 1/2500th second strobe and 750 nm illumination. We acquired data in 30-minute epochs with 5 s gaps in between and saved them to a local server. We made simultaneous video recordings at 30-50 Hz. Points tracked by motion capture were assigned identities in real time based on a model of pairwise distance relationships between markers that was fit offline.
Animals that were not part of the Fragile X experiments (Fmr1-KO and Fmr1-WT) had previously been trained in a motor skill task as described (Kawai et al., 2015; Poddar et al., 2013). Briefly, animals were water restricted to 85% of body weight and given three 1-hour training sessions per day in which they had to press a lever twice within a 700 ms interval to receive a water reward. Failed trials were penalized with a 1.2 s timeout. Animals were trained to asymptote in the automated recording boxes, with a mean inter-press interval within 10% of 700 ms and a coefficient of variation of the inter-press interval less than 0.25. We allowed animals to habituate to the motion capture arena until reliable task execution was observed.
Motion Capture Postprocessing
We performed preprocessing and data analysis on a custom analysis workstation (128 GB RAM, 3.6 GHz Intel i7) using Matlab (Mathworks, Natick MA). We first smoothed marker data using a 3-frame median filter. We then transformed marker positions into an egocentric reference frame centered on the middle of the animal’s spine, which we placed at the origin. In this egocentric reference frame, we aligned all markers in the horizontal (x-y) plane so that the front of the animal’s spine was oriented along the y-axis. We defined joint angles as projections of the angles between adjacent body segments (Figure S1B) in the y-z (sagittal), x-z (coronal) and x-y (transverse) planes of the egocentric coordinate system, as well as the inter-segment plane defined by the three points of the adjacent segments.
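As an illustration of the alignment step, the following is a minimal Python sketch (the original pipeline was implemented in MATLAB) of centering one frame of markers on the mid-spine marker and rotating the horizontal plane so the spine points along +y; the function and argument names are ours, not from the CAPTURE codebase.

```python
import numpy as np

def egocentrize_frame(markers, spine_mid_idx, spine_front_idx):
    """Center one frame of marker positions (n_markers x 3, in mm) on the mid-spine
    marker and rotate about the vertical axis so the mid-to-front spine vector
    points along +y in the horizontal (x-y) plane."""
    centered = markers - markers[spine_mid_idx]      # mid-spine marker to the origin
    vx, vy = centered[spine_front_idx, :2]           # horizontal spine direction
    theta = np.arctan2(vx, vy)                       # angle away from the +y axis
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])                  # rotation about the z (vertical) axis
    return centered @ R.T
```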
We labeled frames as moving if the average speed of the spine in a 1-s moving window exceeded 1.5 mm/s. We marked markers as missing if they were not detected by the motion capture array or not fit by the motion capture body model. We additionally excluded measurements in which markers exceeded a velocity threshold of 25 mm/frame, or in which markers on the forelimb and hindlimb showed non-physical configuration relative to markers on the olecranon or patella, respectively.
To impute the position of missing markers we used a three-pronged approach. First, for gaps shorter than 5 frames (17 ms), we applied cubic interpolation separately across each gap for each Cartesian marker position. Second, for gaps longer than 5 frames, in frames in which a marker was missing but 5-8 adjacent markers were well tracked, markers were imputed using a random forest regression trained to predict the position of the marker from the well-tracked adjacent markers. We fit a separate model for each Cartesian component of each marker. Random forest models used 50 trees trained over each 30-minute recording session, and had less than 1 mm median out-of-bag error for each marker group (Head: 0.53 mm; Trunk: 0.34 mm; Forelimbs: 0.85 mm; Hindlimbs: 0.24 mm; n=16 animals, n=73 conditions). Lastly, to impute missing marker data over gaps >17 ms in which a marker and one or more of its adjacency partners were missing, we used a temporal convolutional network that incorporates past temporal information about all marker positions to impute missing data (Oord et al., 2016a) (Figure S3, Methods S1). Our network uses 9 past time points sampled at 60 Hz (150 ms) in a 4-layer neural network that uses dilated causal convolutions with 512 filters per layer. Neurons in the network used a linear activation function. Networks using a rectified-linear activation performed similarly. We trained the network separately over all frames on each full day using a GPU cluster (4x Nvidia V100).
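A minimal Python sketch of the first two imputation stages is shown below; it is not the released MarkerBasedImputation code, and the helper names and arguments are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from sklearn.ensemble import RandomForestRegressor

def impute_short_gaps(x, max_gap=5):
    """Cubic interpolation of one Cartesian marker component x (1D array, NaN =
    missing), filling only gaps of at most max_gap frames."""
    x = x.copy()
    nan_idx = np.flatnonzero(np.isnan(x))
    good_idx = np.flatnonzero(~np.isnan(x))
    spline = CubicSpline(good_idx, x[good_idx])
    # split the missing indices into contiguous runs and fill only the short ones
    runs = np.split(nan_idx, np.where(np.diff(nan_idx) > 1)[0] + 1)
    for run in runs:
        if 0 < len(run) <= max_gap:
            x[run] = spline(run)
    return x

def fit_neighbor_regressor(neighbor_xyz, target_component, n_trees=50):
    """Random forest predicting one Cartesian component of a marker from the
    positions of its well-tracked adjacent markers (n_frames x n_features)."""
    ok = ~np.isnan(target_component) & ~np.isnan(neighbor_xyz).any(axis=1)
    model = RandomForestRegressor(n_estimators=n_trees, oob_score=True)
    model.fit(neighbor_xyz[ok], target_component[ok])
    return model   # model.oob_score_ is the out-of-bag R^2, not the mm error quoted above
```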
Following imputation, we re-applied the velocity and physical thresholds above to identify poorly imputed markers that we then marked as missing. We additionally applied two other quality thresholds: a threshold of 1 standard deviation on the summed jerk of all markers averaged across a 2.5 second moving window and an additional physical constraint on the sum of pairwise distance dij of head markers (50 mm < ∑dij < 100 mm) for each frame to identify any remaining tracking errors.
To compare the number of markers detected by 6 or 12 cameras (Figure S1E), we recorded five animals using all 12 cameras, and stored the raw motion capture video files as well as the tracked marker positions. We used the motion capture video files to re-track the animals post hoc using Cortex. We tracked animals using the original 12 camera set, as well as four different subsets of 6 cameras: the upper and lower sets of cameras, as well as two different sets of alternating camera pairs. We assigned frames as moving where the velocity of the center of the spine, detected using all 12 cameras, was less than 3 mm/s, and compared detection ability over moving frames. For all points detected using 6 cameras, we assessed them as being on the animal if they were within 20 mm of a marker tracked using 12 cameras.
Performance comparisons with 2D Convolutional Neural Networks
Recently, 2D convolutional networks trained from scratch or on human-labeled pose databases have been applied to behavioral tasks to detect un-occluded visual landmarks in animals (Mathis et al., 2018; Pereira et al., 2019). These approaches, as well as keypoint prediction using classical computer vision approaches (Guo et al., 2015; Kabra et al., 2013), can be straightforwardly extended to 3D keypoint detection through triangulation of detected keypoints across multiple calibrated cameras. However, to date, these approaches have been restricted to 3D detection of un-occluded keypoints in individual behaviors. There are principled reasons for this. 3D tracking with these techniques requires that a keypoint be accurately identified in two different cameras before triangulation. As these 2D convolutional networks are pre-trained and fine-tuned on human labeled datasets in which only un-occluded markers are labeled, these networks are typically only able to identify un-occluded keypoints in each view. Thus, detection and successful triangulation of keypoints on the often-occluded appendages across a broad behavioral repertoire requires large numbers of cameras and labeled examples. These problems are elegantly solved using motion capture, which uses high-sensitivity markers to improve labeling precision to sub-pixel resolution (10 μm in our recordings), and on-board camera FPGAs to allow for integration of large numbers of high-resolution cameras with low data-bandwidth (10 GB/day compared to ~1000 GB/day for compressed video or depth camera recordings as computed from our data). Despite these limitations, these 2D approaches have been suggested as general purpose tools for 3D keypoint detection across behaviors.
To perform comparisons between CAPTURE and predictions from 2D convolutional networks, we synchronized the six video cameras (Motion Capture Recordings, above) with motion capture acquisition using a custom decimator circuit (Arduino Uno, Arduino) that downsampled 300 Hz motion capture trigger signals to 30 Hz. We acquired images at 1320 x 1048 and compressed them in real-time using h264 compression. We encompassed the entire arena (600 mm diameter) in each camera’s field of view, so that each camera’s pixel size at the center of the arena was less than 1 mm. We calibrated cameras using custom scripts in Matlab. We used a custom checkerboard with 17 mm spacing to compute camera intrinsics, and a motion capture L-frame (Motion Analysis) to compute camera extrinsics and align cameras to the motion capture world coordinate system. We manually inspected 2D projections of the 3D motion capture data, and in some experiments refined the camera calibrations by manually selecting points on the rat’s headcap and using these to recalculate extrinsic camera parameters. All transformations between the 3D world coordinate system and the image frame of individual cameras accounted for radial and tangential distortions of images. All calibrations showed low ~0.3 mm reprojection errors across cameras.
To generate training datasets from motion capture data, we synchronized recordings from five rats bearing markers using two different sets of camera views (3.24·10⁵ timepoints; 1.94·10⁶ images; 12 total views). We sampled 192,000 images from recordings from three of these rats, sampling across all views used. To ensure an adequate diversity of poses in this training dataset, we sampled these frames uniformly from 40 partitions of the motion capture recordings made by performing k-means clustering (k=40) on the animal’s pose estimated by motion capture. To make predictions across variable amounts of training data, we trained networks on randomly sampled subsets of 100, 1000, 10,000, or 100,000 frames from the initial 192,000 frame training dataset. We obtained 2D keypoint labels for all 20 markers by projecting 3D keypoints detected by motion capture into the 2D image frames from all views. We used the remaining two animals as a test dataset to evaluate the network’s capacity to generalize to held-out animals.
To generate training datasets in rats not bearing markers (Figure S4), we lightly shaved three animals and attached a headcap such that rats were subject to the same physical preparation as animals used for motion capture, except for marker attachment. Two human labelers marked the position of each of the 20 keypoints corresponding to the body positions of motion capture markers that were visible in 225 frames from 3 views (25 unique timepoints per rat) to compute the inter-human variability in hand-labeling. We used the labels from one human as a training dataset, and a separate set of 225 frames (25 timepoints per rat) labeled by the same human as a test dataset. We drew samples from the same two sets of camera views used above.
We initialized DeeperCut (the algorithm used in DeepLabCut) using ResNet 101 and used default training configurations without the use of pairwise terms (Insafutdinov et al., 2016; Mathis et al., 2018). We fine-tuned networks on hand-labeled images or motion capture reprojections described above for 1.03·10⁶ steps. We also trained DeepLabCut on the full 192,000 frame set of labeled frames in animals bearing markers, and then additionally fine-tuned this network on the set of 225 hand-labeled images described above to attempt to improve generalization to rats not bearing markers. We triangulated the resulting marker predictions from 2D predictions across multiple views by taking the median vector across all individual pairwise triangulations, which we found was superior to multi-view singular value decomposition (Hartley and Zisserman, 2003). For 3-camera predictions, we repeated this procedure for all possible sets of 3 views and reported average statistics. To compare DeepLabCut with motion capture (Figure 2C), we compared DeepLabCut predictions to motion capture measurements on non-imputed recordings, ignoring frames in which motion capture markers were not tracked. To assess the ability of DeepLabCut or CAPTURE to reconstruct the animal’s posture (Figure 2D), we computed the fraction of body segments (Figure S1B) whose predicted lengths were within 18 mm of the segment lengths measured using motion capture. CAPTURE estimates of pose reconstruction efficacy are derived from the more extensive dataset used in Figure S1.
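For the pairwise-triangulation step, a possible numpy implementation is sketched below: a linear (DLT) triangulation for each camera pair followed by the median across pairs. It assumes undistorted 2D detections and 3x4 projection matrices; the function names are ours.

```python
import numpy as np
from itertools import combinations

def triangulate_pair(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one keypoint from two cameras.
    P1, P2: 3x4 projection matrices; uv1, uv2: undistorted pixel coordinates."""
    A = np.vstack([uv1[0] * P1[2] - P1[0],
                   uv1[1] * P1[2] - P1[1],
                   uv2[0] * P2[2] - P2[0],
                   uv2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def triangulate_median(proj_mats, uv_by_cam):
    """Median over all pairwise triangulations; cameras with missing (NaN)
    detections are skipped. proj_mats: list of 3x4 arrays; uv_by_cam: n_cams x 2."""
    pts = [triangulate_pair(proj_mats[i], proj_mats[j], uv_by_cam[i], uv_by_cam[j])
           for i, j in combinations(range(len(proj_mats)), 2)
           if not (np.isnan(uv_by_cam[i]).any() or np.isnan(uv_by_cam[j]).any())]
    return np.median(pts, axis=0) if pts else np.full(3, np.nan)
```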
To compute expected enhancements in DeepLabCut performance when using 12 cameras, we used published measurements of markerless pose tracking performance using variable numbers of cameras (Bala et al., 2020; Iskakov et al., 2019). These reports demonstrate sub-linear increases in measurement precision when adding additional cameras, consistent with the precision of a noisy variable estimated using N independent measurements scaling as 1/√N. We therefore estimated the error of DeepLabCut predictions made using 12 cameras as √2-fold below those that use 6 cameras, for each frame, and used these reduced errors when estimating pose reconstruction efficacy (Figure 2D).
Behavioral Feature Generation and Selection
To detect specific behaviors and kinematic variants of behaviors in our datasets, we generated a set of 140 per-frame features that describe the instantaneous speed and spatial configuration of the motion capture markers, as well as the spatio-temporal trajectories of the markers on a 500 ms timescale (Berman et al., 2014; Brown et al., 2013; Kabra et al., 2013; Stephens et al., 2008). We computed two sets of features for this analysis: (1) a set of 80 features specifically selected to provide information about 37 pairs of behavioral distinctions commonly recognized by rodent researchers, such as rearing, walking, and the subphases of facial and body grooming (Figure S5)(Whishaw and Kolb, 2005), and (2) a more general set of descriptors to enable discrimination between kinematic variants of these behaviors.
To select the features most informative about behavioral distinctions, we first generated a set of 985 features describing the pose kinematics of the rat in a 500 ms window. This 985-feature set consisted of both per-marker features describing the velocity of individual markers on different timescales, and whole-organism features, which conveyed information about the relative position and velocity of markers as a group. Per-marker features included the Cartesian velocity components of each marker in the animal’s egocentric reference frame, smoothed on 100, 300, and 1000 ms time intervals, as well as the moving standard deviation of each velocity component within each interval (Kabra et al., 2013). We additionally included features encapsulating the animal’s overall speed: the average velocity and standard deviation of the animal in the world reference frame in each time interval. To compute whole-organism features, we combined information across the 10 markers on the top of the animal, which included the head, trunk, and hips. We reasoned that this set of posterior markers would be sufficient to classify the animal’s behavior because of the success of depth cameras in the same task (Hong et al., 2015; Wiltschko et al., 2015). We computed the top 10 principal components of this marker set’s Cartesian position, segment lengths and selected joint angles over time. To additionally compute features with frequency-specific information, we computed a wavelet transform of each of these pose features using 25 Morlet wavelets spaced between 0.5 and 60 Hz. This yielded a set of 250 time-frequency coefficients that we compressed by computing the top 15 principal components of the wavelets. In all cases in which we computed the principal components of a set of pose or wavelet features, we used the top eigenvectors from one rat as a fixed basis set to compute the top principal components of each of these feature categories across all rats.
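A sketch of the time-frequency feature computation, assuming log-spaced Morlet center frequencies (the exact spacing is not specified above) and using scipy.signal.cwt/morlet2 (present in SciPy versions that still ship cwt); the original features were computed in MATLAB with MotionMapper-style code.

```python
import numpy as np
from scipy.signal import cwt, morlet2
from sklearn.decomposition import PCA

FS = 300.0                                                    # motion capture frame rate (Hz)
N_FREQS = 25
freqs = np.logspace(np.log10(0.5), np.log10(60.0), N_FREQS)   # assumed log spacing, 0.5-60 Hz
W0 = 5.0                                                      # Morlet width parameter
widths = W0 * FS / (2 * np.pi * freqs)                        # scale for each center frequency

def wavelet_amplitudes(pose_pcs):
    """pose_pcs: (n_frames, n_pcs) time series of pose principal components.
    Returns per-frame wavelet amplitudes, shape (n_frames, n_pcs * N_FREQS)."""
    amps = [np.abs(cwt(pose_pcs[:, k], morlet2, widths, w=W0)).T
            for k in range(pose_pcs.shape[1])]
    return np.hstack(amps)

# Compress the time-frequency coefficients with PCA fit on one reference rat,
# then apply that fixed basis to all rats, e.g.:
# pca = PCA(n_components=15).fit(wavelet_amplitudes(reference_rat_pcs))
# compressed = pca.transform(wavelet_amplitudes(other_rat_pcs))
```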
We then selected features that provided information about behavioral distinctions. Two observers used a custom graphical user interface to annotate a subset of 1.56·10⁴ motion capture frames with one of 37 commonly recognized rodent behaviors across n=3 animals. We computed the information each feature provided about pairs of behavioral distinctions (Figure S5). We selected 80 features, chosen by the presence of an approximate knee in the cumulative pairwise discriminability, that were most informative about behavioral distinctions: the top 10 and 6 principal components, respectively, of the Euclidean pose and segment lengths, the top 10, 15, and 15 principal components, respectively, of the wavelet transform of the joint angles of both the whole animal and, separately, the head and trunk, the relative speed of the head, hips, and trunk within a 100 and 300 ms window, the absolute speed of the trunk and its standard deviation in a 100 and 300 ms window, and the z-component of the trunk and head velocity averaged over a 100 and 300 ms window.
To provide greater ability to distinguish between new behaviors and kinematic variants of behavior not previously recognized by rodent researchers, we computed a set of more general features describing the configuration and kinematics of all 20 markers. We generated a tree of links between markers that approximated the major joint angles of the head, neck, trunk, forelimbs and hindlimbs (Figure S1), and computed the segment lengths, joint angles, and Cartesian pose of these links. We computed the top ten principal components of each of these feature categories. To provide a set of kinematic descriptors of each frame, we computed the wavelet transform of each of these ten principal components, using the same wavelet parameters as above. We computed all principal components using a common set of eigenvectors computed from one rat. Concatenating these postural and kinematic descriptors produced a 60-dimensional feature set that we combined with the features selected above to yield a 140-dimensional feature vector for each frame, which we then whitened.
To compare embeddings made using Cartesian or joint angle representations of posture (Figure S5E), we used the principal components and wavelet transforms of the Cartesian pose, or the joint angles and segment lengths, respectively.
This 140-dimensional feature set was sufficient for separating commonly recognized behaviors and their kinematic variants (Figure 3). Some of these features, such as the principal components of the Cartesian and joint angle eigenpostures, contain redundant information (Figure S5E), suggesting that a more parsimonious feature space could be used. Conversely, some behaviors, such as sniffing, that are subtle or difficult to hand-annotate may not be emphasized in the feature selection approach above. Other feature engineering approaches, guided by ground truth CAPTURE datasets, could be used to weight, select, or expand this feature set to facilitate interpretable behavioral embedding and classification.
Behavioral Embedding, Clustering, and Systematics
We created behavioral maps by embedding behavioral feature vectors in two dimensions using t-SNE (Berman et al., 2014; Maaten and Hinton, 2008). To create a co-embedding across all rats, we concatenated the feature matrix of frames in which animals were moving for 16 rats across 73 different behavioral conditions (1.04·10⁹ frames). We subsampled this feature matrix at 1 Hz to create a feature matrix comprising ~10⁶ timepoints. Because t-SNE uses an adaptive similarity metric between points, when we created embeddings by uniformly sampling the data the embeddings were dominated by large regions when the animal was relatively still or adjusting its posture. We thus balanced the feature set (Berman et al., 2014) by performing k-means clustering on the full ~10⁶ frame feature matrix using 8 clusters. We drew 30,000 samples from each cluster to create a 240,000 frame feature matrix that we embedded using a multi-core implementation of t-SNE. We found that adding further samples resulted in overcrowding of the t-SNE space (Kobak and Berens, 2019). We performed all t-SNE embedding using the Barnes-Hut approximation with ϴ=0.5 and the top 50 principal components of the feature matrix. For co-embeddings across rats, we used a perplexity of 200, which we found produced superior results for large feature sets. After creation of the embedding space, we re-embedded out-of-sample points in two steps. First, we found the 25 nearest neighbors to the out-of-sample point in the 140-dimensional feature space of whitened features. Next, we found the position of the first nearest neighbor in the embedding space and took the position of the out-of-sample point as the median position of all the 25 nearest neighbors within a 3 unit radius of this closest nearest neighbor (Kobak and Berens, 2019).
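The balanced-sampling, embedding, and re-embedding logic might look like the following Python sketch, with scikit-learn's Barnes-Hut t-SNE standing in for the multi-core implementation used here; function names and parameters beyond those quoted above are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def build_template_embedding(features, n_balance=8, per_cluster=30000,
                             perplexity=200, seed=0):
    """Balance the frames with k-means, then embed the balanced subset with
    Barnes-Hut t-SNE on the top 50 principal components."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_balance, random_state=seed).fit_predict(features)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c),
                   size=min(per_cluster, np.sum(labels == c)), replace=False)
        for c in range(n_balance)])
    pcs = PCA(n_components=50).fit_transform(features[idx])
    emb = TSNE(perplexity=perplexity, angle=0.5, init="pca").fit_transform(pcs)
    return idx, emb

def reembed(new_features, template_features, template_embedding, k=25, radius=3.0):
    """Nearest-neighbor re-embedding of out-of-sample frames into the template map."""
    nn = NearestNeighbors(n_neighbors=k).fit(template_features)
    _, neighbors = nn.kneighbors(new_features)
    out = np.empty((len(new_features), 2))
    for i, nb in enumerate(neighbors):
        anchor = template_embedding[nb[0]]                        # closest neighbor
        pts = template_embedding[nb]
        close = np.linalg.norm(pts - anchor, axis=1) <= radius    # within 3 map units
        out[i] = np.median(pts[close], axis=0)
    return out
```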
To create behavioral clusters, we smoothed the behavioral map with a gaussian kernel of width 0.125, ~2 times the width of the spatial autocorrelation of the t-SNE map. We then clustered the data using a watershed transform (Berman et al., 2014). After clustering, two observers defined the kinematic criteria for assigning behavioral clusters into one of 12 coarse behavioral categories, such as walking, rearing, or grooming. The observers also established criteria for further assigning clusters to one of ~80 fine behavioral categories, such as ‘low rear’, ‘high rear’, and ‘right head scratch’, that provided additional detail regarding the exact posture and kinematics of the animal. Each observer then watched 24 instances of each behavior selected at random from one animal and assigned each behavioral cluster into a coarse and fine behavioral category. Disagreements were resolved through discussion. Coarse behavioral boundaries drawn on the behavioral maps are hand drawn approximations to the occurrence of coarse behavioral labels in the dataset.
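A minimal sketch of the density-based watershed clustering, using scipy/scikit-image in place of the original MATLAB implementation; the grid size and the histogram-plus-smoothing construction of the density map are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.segmentation import watershed

def cluster_embedding(embedding, grid_size=501, sigma_map_units=0.125):
    """Histogram the 2D embedding, smooth with a Gaussian kernel, and apply a
    watershed transform to the inverted density so each basin is one cluster."""
    lo, hi = embedding.min(axis=0) - 1, embedding.max(axis=0) + 1
    xe = np.linspace(lo[0], hi[0], grid_size + 1)
    ye = np.linspace(lo[1], hi[1], grid_size + 1)
    density, _, _ = np.histogram2d(embedding[:, 0], embedding[:, 1], bins=[xe, ye])
    sigma_px = sigma_map_units / (xe[1] - xe[0])       # kernel width in grid pixels
    density = gaussian_filter(density, sigma=sigma_px)
    basin_labels = watershed(-density)                 # basins of the inverted density
    # assign each embedded point the label of the grid cell it falls in
    ix = np.clip(np.digitize(embedding[:, 0], xe) - 1, 0, grid_size - 1)
    iy = np.clip(np.digitize(embedding[:, 1], ye) - 1, 0, grid_size - 1)
    return basin_labels[ix, iy]
```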
Temporal Pattern Matching Algorithm
To identify patterns of repeated behaviors, we began with an ethogram encapsulating the occurrence of K behaviors over M frames. We smoothed the ethogram with a boxcar filter across the temporal dimension, using filter windows of τ = 15, 120 s. We normalized each frame, yielding a behavioral probability density matrix. In this matrix, individual frames reflect the probability of behavioral usage in a window of length τ. We then computed the correlation coefficient between all frames of the density matrix, yielding a behavioral similarity matrix of dimensions M x M.
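In code, the construction of the behavioral similarity matrix could look like the sketch below; the sampling-rate argument is an assumption, and the full M x M matrix is only practical for modestly sized ethograms.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def behavioral_similarity(ethogram, fs, tau_s):
    """ethogram: (M, K) one-hot behavior occupancy sampled at fs Hz. Boxcar-smooth
    over tau_s seconds, renormalize each frame into a probability density over
    behaviors, and correlate all pairs of frames."""
    win = max(1, int(round(tau_s * fs)))
    smoothed = uniform_filter1d(ethogram.astype(float), size=win, axis=0)
    density = smoothed / np.clip(smoothed.sum(axis=1, keepdims=True), 1e-12, None)
    return np.corrcoef(density)        # M x M behavioral similarity matrix
```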
Off-diagonal elements in this behavioral similarity matrix correspond to pairs of temporal epochs with similar behavioral usage. To identify repeated behavioral patterns, we thresholded the matrix, retaining regions with a correlation coefficient in their behavioral density vector greater than 0.3. We then performed a watershed transform of this thresholded similarity matrix, where each region extracted using the watershed transform corresponded to a pair of temporal epochs with similar behavioral usage in the dataset. We retained only regions of length > seconds, and additionally excluded pairs of neighboring timepoints within 2 τ of one another, which may retain causal influence from the filtering process. For single day recordings, this procedure produced 50,000-100,000 epochs of similar behavioral usage to at least one other temporal epoch. To detect frequently occurring patterns, we computed the correlation distance between these epochs, and clustered the resulting distance matrix using hierarchical clustering, with a cutoff of 0.65 and tree depth of 3. The algorithm was linearly sensitive to the clustering cutoff used in a similar manner across animals. We chose a cutoff so that commonly accepted patterns such as the skilled tapping task were identified and not overly split across clusters.
The resulting clusters, which we refer to as ‘sequences’ and ‘states’ for short and long timescales, respectively, were marked as grooming if animals spent at least 40% of their time within the cluster grooming. Clusters were marked as active or inactive if they spent greater time during active behaviors (walking, investigating, rearing, and wet dog shake) compared to inactive behaviors (prone still, postural adjustment) or vice versa. Clusters were marked as task-related if they comprised at least 10% of the task-related timepoints, which were defined as those frames within 5 s or 30 s of a lever tap for sequences and states, respectively. Comparisons to Markovian behavior were made by simulating behavioral traces of identical length to the observed data using the average observed transition matrix and identifying sequences and states in the same manner as real data.
In cases when the algorithm was run over multiple days and animals, the number of epochs exceeded the memory capacity of hierarchical clustering algorithms, which rely on creating a full matrix of pairwise distances. To identify behavioral patterns in this case, we subsampled 50,000 epochs from the data in a balanced manner, by running k-means clustering on the dataset with k=300, and then evenly sampling from the clusters without replacement. We performed hierarchical clustering using the same parameters as above on this subsampled dataset and performed nearest-neighbors re-embedding to assign clusters for the remainder of the data.
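A sketch of the balanced subsampling and hierarchical clustering step; the linkage method and the handling of the tree-depth criterion are not specified above and are assumptions here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def cluster_epochs(density_vectors, n_subsample=50000, k_balance=300,
                   cutoff=0.65, seed=0):
    """Subsample epochs in a balanced way with k-means, then hierarchically
    cluster the subsample using correlation distance with a fixed cutoff."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=k_balance, random_state=seed).fit_predict(density_vectors)
    per = n_subsample // k_balance
    idx = np.concatenate([rng.permutation(np.flatnonzero(labels == c))[:per]
                          for c in range(k_balance)])
    D = pdist(density_vectors[idx], metric="correlation")
    Z = linkage(D, method="average")                 # linkage method is an assumption
    return idx, fcluster(Z, t=cutoff, criterion="distance")
```

The remaining epochs would then be assigned by nearest-neighbor matching to the clustered subsample, as described above.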
Behavioral Comparisons Before and After Body Piercings
To compare behavioral usage before and after attachment of body piercings, we created a 122-dimensional feature set describing the movements of the head in the arena. This feature set included the instantaneous position and velocity of the head in the arena’s Cartesian coordinate frame, as well as in an egocentric coordinate frame centered on the head. We computed the roll and pitch angles of the head relative to the global Cartesian coordinate system. Because the yaw of the head can only be defined in a relative manner without measurements of the neck position, we only computed the absolute velocity of yaw deviations across neighboring frames. We smoothed all velocities with a 5-frame median filter, and to incorporate longer-timescale information about head movements, we convolved each Cartesian and egocentric velocity and position component with a boxcar filter on 200 ms and 1 s timescales. We additionally included 25 principal components of the wavelet transform of the Cartesian and egocentric velocities, using the same wavelet parameters as described above.
To automatically identify the behavior of animals bearing head markers alone, we trained a random forest classifier with 50 trees to predict the animals’ behavior from the 122-dimensional feature set describing the kinematics of the headcap. To train the classifier, we assigned behavioral labels to the days after marker attachment, and used half of the data as training set, reserving the second, interleaved, half as a test set. We assumed a uniform prior probability of observing each coarse behavioral class.
QUANTIFICATION AND STATISTICAL ANALYSIS
Motion Capture Precision
Motion capture uses FPGAs on cameras to identify marker positions in real time, obviating the need to stream high-definition video and permitting the use of large camera arrays to track points without prohibitive data storage and processing requirements. Because of the high signal-to-noise of markers and the large number of high-definition, high bit-depth cameras used, motion capture is extremely precise, and stationary markers in our arena were tracked with 10 μm precision (Mischiati et al., 2015). Much of the error in motion capture measurements comes from changes in the subset of cameras used to record markers as animals move in the arena, which will vary in their estimate of a marker’s position due to noise in the camera calibration. To estimate this error due to variation in the camera subset used, we computed the difference between the measured marker position in 3D space, averaged across all cameras, and the position of the marker projected into each camera. We then averaged this reprojection error across markers to arrive at an estimate of the overall precision taken over 5 rats: 0.21±0.07 mm for 12 cameras, n=8·10⁷ frames; 0.22±0.08 mm for 6 cameras, n=2.7·10⁶ frames; mean ± s.d.
We then estimated the overall error of CAPTURE (Figure S1G) for each marker m as Em = fr·er + fi·Στ fτ·eτ, where fr is the fraction of timepoints when the marker was well-tracked, er is the motion capture reprojection error, fi is the fraction of frames where the marker was imputed, fτ is the fraction of imputed frames with a gap length of τ, and eτ is the mean error across synthetic gaps with length τ for the chosen marker.
Behavioral Power Spectra
To compute the power spectra of marker velocities for each behavior, we extracted [−1,+1] s windows of time around all instances of a behavior from a single day. We computed the instantaneous Cartesian velocity of each marker in each time window in the egocentric reference frame centered on the animal. We concatenated these velocities and computed the power spectrum of each component using Welch’s method, with a 1-s bin size and 0.5-s overlap between bins. We averaged the power spectra across Cartesian components to yield the total power spectrum. To temporally align individual instances of behavior, we computed the cross-correlation between the z-velocity of a chosen marker across all behavioral instances, using a 500 ms lag. To further maximize the similarity of behaviors in each cluster, we divided instances into three clusters, and temporally shifted behaviors within a cluster to maximize the cross-correlation. Only clusters with at least 5 observed instances are shown.
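A compact version of the spectral computation using scipy.signal.welch, assuming the 300 Hz frame rate described above; the temporal alignment and sub-clustering steps are omitted.

```python
import numpy as np
from scipy.signal import welch

FS = 300          # motion capture frame rate (Hz)

def behavior_power_spectrum(velocities):
    """velocities: (n_samples, 3) concatenated egocentric Cartesian velocity of one
    marker across all [-1, +1] s windows of a behavior. Welch's method with a 1-s
    segment and 0.5-s overlap; the three Cartesian spectra are then averaged."""
    freqs, pxx = welch(velocities, fs=FS, nperseg=FS, noverlap=FS // 2, axis=0)
    return freqs, pxx.mean(axis=1)
```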
Transition Matrix Eigenvalue Analysis
To assess the existence and extent of non-Markovian structure in rat behavior, we computed the eigenvalue decomposition of the behavioral transition matrix, Tij(τ), where Tij(τ) = p(Xt+τ = j∣Xt = i) for two behaviors i and j. Since this transition matrix is right-stochastic in all cases, by the Perron-Frobenius theorem, the leading eigenvalue should be exactly equal to one, and all other eigenvalues must have magnitudes less than one (Berman et al., 2016). The left eigenvector corresponding to the largest eigenvalue is proportional to the overall probability of observing a given state, while other eigenvectors correspond roughly to linear combinations of behaviors that exhibit the longest temporal correlations. Here, we look at the magnitudes of the largest eigenvalues (λ1,λ2,…) of this matrix as a function of τ (Figure S6A). For comparison, we also computed the eigenvalues of the transition matrix Tij(τ) made from simulated data assuming that behavior is organized as a first-order Markov chain. We simulated a Markov chain separately for each day from the average observed transition matrix, to create a chain of identical length to the observed data.
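The core of this analysis reduces to estimating a row-stochastic transition matrix, taking its eigenvalue magnitudes, and simulating a matched first-order Markov chain; a minimal numpy sketch, with function names of our choosing, is:

```python
import numpy as np

def transition_matrix(labels, n_states, lag=1):
    """Row-stochastic T(tau), with T[i, j] = p(X_{t+tau} = j | X_t = i)."""
    T = np.zeros((n_states, n_states))
    np.add.at(T, (labels[:-lag], labels[lag:]), 1)
    T /= np.clip(T.sum(axis=1, keepdims=True), 1, None)
    return T

def eigenvalue_magnitudes(T):
    """Sorted magnitudes of the eigenvalues; the leading value is 1."""
    return np.sort(np.abs(np.linalg.eigvals(T)))[::-1]

def simulate_markov(T, n_steps, seed=0):
    """First-order Markov control of the same length (assumes every row of T is a
    proper probability distribution, i.e. every state was visited)."""
    rng = np.random.default_rng(seed)
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.integers(T.shape[0])
    for t in range(1, n_steps):
        states[t] = rng.choice(T.shape[0], p=T[states[t - 1]])
    return states
```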
Similarity of Behavioral Patterns Across Timescales
To compute the similarity of temporal pattern usage on two different timescales s(t1) and s(t2) (Figure S6D), we computed the fraction of time the two time series of cluster labels were in agreement after optimal permutation of the cluster labels: for each temporal pattern j at the short timescale t1 we found the temporal pattern i at the longer timescale t2 with the most similar distribution over time. We computed the fractional overlap of these two temporal patterns f(j) = ∣{s(t1) = i}⋂{s(t2) = j}∣/∣{s(t1) = i}⋃{s(t2) = j}∣ and computed the average of this fractional overlap over all patterns j, weighted by the frequency of the observed pattern, to yield the overall similarity.
To compute the average sparsity of the pattern transition matrix (Figure S6E), for each pattern p on timescale τ we computed Sp(τ) = Σi ρi Σj (Ti->j)², where ρi is the frequency of observing behavior i, and Ti->j is the probability of transitioning between behaviors i and j. If a behavior transitions to n other behaviors with equal probability, then Σj (Ti->j)² = 1/n. Thus higher sparsity values Sp(τ) indicate that, on average, behaviors in a pattern transition to a smaller set of behaviors, and hence the pattern shows greater sequential organization. In Figure S6E we report the average sparsity over all patterns (1/Mp)∑pSp(τ), where Mp is the number of patterns, normalized by the average over all timescales. To ensure that our estimates of the transition matrix for each pattern are robust, we computed the sparsity for 25 different samples of behavior for each pattern by randomly selecting a consecutive subset of half the observed frames for each sample.
Cross-condition Behavioral Comparisons
To compare the frequency of behaviors across perturbations induced by drugs and disease we computed the number of occurrences of individual behaviors or coarse behavioral categories for each rat and condition at 6 Hz sampling (drug experiments) or 1 Hz sampling (Fragile X experiments). To test for significant changes in the fraction of time animals spent moving or in coarse behavioral categories (Figure 5A-5B; 6A-6B), we first computed the effective sample size of temporal processes using neff = n/(1 + 2τ), where τ is the temporal autocorrelation of the timeseries (Gelman et al., 2013). The autocorrelation of the binarized moving-not moving timeseries was τ =469±157 s, while the autocorrelation of the coarse behavioral timeseries and behavioral timeseries were 62±12 s and 0.65±0.1 s, respectively (mean ± s.d, computed over 10 days of recordings and 5 rats).
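A simple implementation of the effective-sample-size correction, with the integrated autocorrelation time computed in units of samples and truncated at the first zero crossing of the autocorrelation function (a common convention; the truncation rule is not stated above):

```python
import numpy as np

def effective_sample_size(x):
    """n_eff = n / (1 + 2*tau), with tau the integrated autocorrelation time of x
    in units of samples, summed up to the first zero crossing of the ACF."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf = acf / acf[0]
    cutoff = np.argmax(acf <= 0) if np.any(acf <= 0) else acf.size
    tau = acf[1:cutoff].sum()
    return x.size / (1 + 2 * tau)
```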
To examine the up- or down-regulation of specific behaviors, we first restricted analyses to a set of frequent behaviors observed in at least 120 samples across all conditions (20 or 120 seconds for drug and Fragile X experiments, respectively). To determine whether these frequent behaviors were significantly modulated across conditions, we computed the Poisson probability of the observed behavioral frequency in the perturbed condition, given the behavioral frequency in the baseline condition. We then adjusted all Poisson p-values by the Benjamini-Hochberg procedure, and only showed behaviors whose adjusted p-value was less than 10⁻⁶.
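A sketch of the Poisson test with Benjamini-Hochberg adjustment; the two-sided tail probability and the assumption of matched recording durations are ours, since the sidedness is not specified above.

```python
import numpy as np
from scipy.stats import poisson
from statsmodels.stats.multitest import multipletests

def modulated_behaviors(counts_baseline, counts_perturbed, alpha=1e-6):
    """Two-sided Poisson tail probability of each perturbed count given the baseline
    count as the expected rate (assumes matched recording durations), adjusted
    across behaviors with Benjamini-Hochberg."""
    lam = np.asarray(counts_baseline, dtype=float)
    k = np.asarray(counts_perturbed)
    p_up = poisson.sf(k - 1, lam)          # P(X >= k) under the baseline rate
    p_down = poisson.cdf(k, lam)           # P(X <= k)
    pvals = np.minimum(1.0, 2 * np.minimum(p_up, p_down))
    _, p_adj, _, _ = multipletests(pvals, method="fdr_bh")
    return p_adj < alpha, p_adj
```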
Individuality Analyses
To compare behavior across days and animals, we selected days for analysis from 5 female Long-Evans rats where no drugs or vehicle were administered. We varied the number of clusters in Figure 7F by varying the kernel used to create a density map before watershed clustering. To compare usage of sequences and states, we ran our pattern detection algorithm over all animals and days considered. To maximize similarity in sequences and states detected across animals, we used a density of 0.2 for smoothing the behavioral map, which corresponds to 527 clusters in Figure 7F. We observed similar results using other densities.
To decode animal identity on each day given information about the animal’s behavioral and pattern usage, we trained a random forest classifier with 50 trees. We divided periods of active behavior on each day into 10-minute segments and computed the average behavioral and pattern usage within each segment. We trained the algorithm on half of the segments of each day, and tested on the second, interleaved, half. We found that training the classifier using shorter input segment lengths showed poorer performance, especially when the classifier only had access to behavioral usage statistics.
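A minimal scikit-learn version of the identity decoder, assuming "interleaved" means alternating segments assigned to train and test:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def decode_identity(segment_features, animal_ids, n_trees=50, seed=0):
    """Decode animal identity from per-segment behavioral/pattern usage vectors,
    training on alternating (even-indexed) segments and testing on the rest."""
    segment_features = np.asarray(segment_features)
    animal_ids = np.asarray(animal_ids)
    train = np.arange(len(animal_ids)) % 2 == 0
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(segment_features[train], animal_ids[train])
    preds = clf.predict(segment_features[~train])
    return accuracy_score(animal_ids[~train], preds)
```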
Data Analysis and Statistics
We performed all filtering and data analysis using custom software written in MATLAB (Mathworks). Unless otherwise stated, we used non-parametric statistical tests (two-tailed) to avoid assumptions that distributions were normal and had equal variance across groups. Boxes in all box-and-whisker plots corresponded to the interquartile range, with lines at the median. Whiskers spanned the smallest and largest data point within ± 1.5 times the interquartile range. No data was excluded in this study. Because of the proof-of-principle nature of this work, investigators were not blinded, nor were pre-determined sample sizes used to determine cohort sizes. Replication of these experiments was outside the scope of the current work.
Supplementary Material
Supplemental Video 1 ∣ Schematic 3D rendering of the CAPTURE technique.
Related to Figure 1.
A rat is free to behave in a behavioral arena. We attach twenty retroreflective markers to the animal using body piercings, which we track using a twelve-camera motion capture array. Tracked marker positions are used to create a wireframe representation of the pose of the rat’s trunk and appendages. We animated videos using the actual behavior of the rat in the open field arena.
Supplemental Video 2 ∣ Example of raw CAPTURE recordings in real time.
Related to Figure 1.
A view from one of the twelve infrared cameras used to track the rat’s retroreflective markers (right) is shown alongside a color video recording of the rat shown with the positions of all twenty markers reprojected (left). Green circles in the infrared view show the computed positions of markers, and lags reflect delays in software rendering rather than true lags in identifications. Colored linkages between markers in the color video recording are used to uniquely assign tracked markers to positions of the body. Motion capture data is shown with no post-processing, so unimputed marker dropouts are present.
Supplemental Video 3 ∣ CAPTURE enables continuous kinematic recordings.
Related to Figure 1.
Wireframe representations of rat behavior in the open field arena for the same rat on six different days, animated at 2x, 10x and 100x real time.
Supplemental Video 4 ∣ 2D Pose detection networks are unsuited to tracking in 3D across the rat behavioral repertoire.
Related to Figure 2.
Video recording of a rat behaving in an open field, with keypoints tracked using CAPTURE or predicted by DeepLabCut projected onto video frames (left). We trained DeepLabCut with 100-100,000 frames. Animals whose videos were used in the training dataset (In-sample) or out of the training dataset (Out-of-sample) are shown separately. Right: concurrent wireframe representations of the keypoints in the animal’s egocentric reference frame. Video speed is 1.5-times real time.
Supplemental Video 5 ∣ Rats move through behavioral embedding space.
Related to Figure 3.
A video recording of an animal behaving in the CAPTURE arena (right), and the simultaneous position of the animal’s posture and kinematics in the behavioral map over the past ten timepoints, visualized as black circles (left). Video speed is 4-times real time.
Supplemental Video 6 ∣ Behavioral embeddings separate coarse behavioral categories.
Related to Figure 3.
A series of examples of rat behavior sampled from behavioral clusters distributed across different coarse categories in the embedding space. For each cluster sampled (left), six different examples of the rat’s behavior in that cluster are shown in the egocentric reference frame of the animal (right). Coarse behavioral categories are colored as in Figure 2. Videos are shown at real time and repeated twice.
Supplemental Video 7 ∣ Behavioral embeddings allow identification of kinematic variation.
Related to Figure 3.
A series of examples of rat behavior sampled from behavioral clusters distributed across the same coarse region of the embedding space. For each cluster, we show sixteen different instances of rat behavior, represented as wireframes in the animal’s egocentric reference frame (left) drawn from the same cluster in the embedding space (right). Different videos are shown at real time and repeated twice.
Supplemental Video 8 ∣ Behavior is organized on short timescales into repeated behavioral sequences.
Related to Figure 4.
Examples of different behavioral sequences detected by our pattern matching algorithm (Figure 4) on 15-s timescales. White squares indicate breaks between different instances of the behavior shown. Video speed is 4-times real time.
Supplemental Video 9 ∣ Behavior is structured on longer timescales into behavioral states.
Related to Figure 4.
Examples of different behavioral states detected by our pattern matching algorithm on 2-min timescales. Video speed is 4-times real time.
Supplemental Video 10 ∣ Caffeine and amphetamine introduce aberrant behavioral states.
Related to Figure 5.
Examples of three different behavioral states preferentially observed after administration of a saline vehicle, caffeine, or amphetamine, respectively. Video speed is 12-times real time.
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Chemicals, Peptides, and Recombinant Proteins | ||
| Amphetamine | Sigma-Aldrich | A5880 |
| Caffeine | Sigma-Aldrich | C0750 |
| Experimental Models: Organisms/Strains | ||
| Long-Evans Rats | Charles River | 006 |
| Fmr1-KO, WT Long-Evans Rats | Medical College of Wisconsin | RGD 11553874 |
| Software and Algorithms | ||
| Motion Mapper | (Berman et al., 2014) | https://github.com/gordonberman/MotionMapper |
| TCN Imputation | This Work | https://github.com/diegoaldarondo/MarkerBasedImputation |
| Behavioral Analysis Suite | This Work | https://github.com/jessedmarshall/CAPTURE_demo |
| Matlab | Mathworks | https://www.mathworks.com/products/matlab.html |
| Other | ||
| Motion Capture System | Motion Analysis | Kestrel 4200 |
| Video Cameras | FLIR | Flea3 FL3-U3-13S2C |
| 5 mm H-ZLAF ball lenses | Worldhawk Optoelectronics | Custom, via https://www.alibaba.com/ |
| Silvering Kit | Angel Gilding | Cat# A1204D |
| Earrings | Fire Mountain Gems | Cat# H20-1585FN; H20-A5314FN |
| High-strength epoxy | Loctite | EA0151 |
| Clamps | Kent Scientific | INS1005-5 |
Acknowledgements
We thank all members of the Ölveczky lab for feedback and advice and acknowledge special experimental and computational assistance from Ashesh Dhawale, Steffen Wolff, Sasha Ileue, Alexander Barrett, Mahmood Shah, as well as the Harvard Center for Brain Science neuroengineers: Ed Soucy, Joel Greenwood, Adam Bercou and Brett Graham. We additionally thank the laboratory of Aaron Guerts and the Simons Foundation Autism Research Initiative (SFARI) for generating the Fmr1-KO rat model. JDM acknowledges support from a Helen Hay Whitney Foundation postdoctoral fellowship and an NINDS K99/R00 Pathway to Independence Award from the NIH, DEA from a National Science Foundation graduate fellowship, and WLW from a Harvard College Research Program fellowship. GJB acknowledges grant support from the Research Corporation, HFSP and NIMH, and BPÖ acknowledges support from SFARI, the NIH, the NSF, and the Starr Family Foundation. Parts of this work were performed at the Aspen Center for Physics, which is supported by the NSF.
Footnotes
Declaration of Interest
The authors declare no competing interests.
References
- Agarwal A, and Triggs B (2004). Tracking Articulated Motion Using a Mixture of Autoregressive Models In Computer Vision - ECCV 2004, Pajdla T, and Matas J, eds. (Berlin, Heidelberg: Springer Berlin Heidelberg; ), pp. 54–65. [Google Scholar]
- Alahi A, Goel K, Ramanathan V, Robicquet A, Fei-Fei L, and Savarese S (2016). Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–971. [Google Scholar]
- Anderson DJ, and Adolphs R (2014). A framework for studying emotions across species. Cell 157, 187–200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anderson DJ, and Perona P (2014). Toward a science of computational ethology. Neuron 84, 18–31. [DOI] [PubMed] [Google Scholar]
- Angel E (2009). The piercing bible : the definitive guide to safe body piercing (Berkeley, Calif: Celestial Arts; ). [Google Scholar]
- Antoniou K, Kafetzopoulos E, Papadopoulou-Daifoti Z, Hyphantis T, and Marselos M (1998). D-amphetamine, cocaine and caffeine: a comparative study of acute effects on locomotor activity and behavioural patterns in rats. Neurosci Biobehav Rev 23, 189–196. [DOI] [PubMed] [Google Scholar]
- Bai S, Kolter JZ, and Koltun V (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. ArXiv180301271 Cs. [Google Scholar]
- Bala PC, Eisenreich BR, Yoo SBM, Hayden BY, Park HS, and Zimmermann J (2020). Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun 11, 4560. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bao P, She L, McGill M, and Tsao DY (2020). A map of object space in primate inferotemporal cortex. Nature 583, 103–108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Berman GJ (2018). Measuring behavior across scales. BMC Biol. 16, 23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Berman GJ, Choi DM, Bialek W, and Shaevitz JW (2014). Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11, 20140672. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Berman GJ, Bialek W, and Shaevitz JW (2016). Predictability and hierarchy in Drosophila behavior. Proc Natl Acad Sci U A 113, 11943–11948. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Berridge KC, and Fentress JC (1987). Disruption of natural grooming chains after striatopallidal lesions. Psychobiology 15, 336–342. [Google Scholar]
- Berridge KC, Fentress JC, and Parr H (1987). Natural syntax rules control action sequence of rats. Behav. Brain Res 23, 59–68. [DOI] [PubMed] [Google Scholar]
- Blake R, and Shiffrar M (2006). Perception of Human Motion. Annu Rev Psychol 58, 47–73. [DOI] [PubMed] [Google Scholar]
- Box GEP, and Jenkins GM (1976). Time Series Analysis: Forecasting and Control, Revised Ed (Holden-Day; ). [Google Scholar]
- Branson K, Robie AA, Bender J, Perona P, and Dickinson MH (2009). High-throughput ethomics in large groups of Drosophila. Nat. Methods 6, 451–457. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown AEX, Yemini EI, Grundy LJ, Jucikas T, and Schafer WR (2013). A dictionary of behavioral motifs reveals clusters of genes affecting Caenorhabditis elegans locomotion. Proc Natl Acad Sci 110, 791–796. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brunner D, Kabitzke P, He D, Cox K, Thiede L, Hanania T, Sabath E, Alexandrov V, Saxe M, Peles E, et al. (2015). Comprehensive Analysis of the 16p11.2 Deletion and Null Cntnap2 Mouse Models of Autism Spectrum Disorder. PLOS ONE 10, e0134572. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Calhoun AJ, Pillow JW, and Murthy M (2019). Unsupervised identification of the internal states that shape natural behavior. Nat. Neurosci 22, 2040–2049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cappello A, Cappozzo A, La Palombara PF, Lucchetti L, and Leardini A (1997). Multiple anatomical landmark calibration for optimal bone pose estimation. Hum. Mov. Sci 16, 259–274. [Google Scholar]
- Chen L-C, Papandreou G, Kokkinos I, Murphy K, and Yuille AL (2016). Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv:1412.7062 [cs].
- Collins TD, Ghoussayni SN, Ewins DJ, and Kent JA (2009). A six degrees-of-freedom marker set for gait analysis: Repeatability and comparison with a modified Helen Hayes set. Gait Posture 30, 173–180.
- Courtine G, Song B, Roy RR, Zhong H, Herrmann JE, Ao Y, Qi J, Edgerton VR, and Sofroniew MV (2008). Recovery of supraspinal control of stepping via indirect propriospinal relay connections after spinal cord injury. Nat Med 14, 69–74.
- Dawkins R (1976). Hierarchical organisation: A candidate principle for ethology. In Growing Points in Ethology (Cambridge, England: Cambridge University Press), pp. 7–54.
- DeAngelis BD, Zavatone-Veth JA, and Clark DA (2019). The manifold structure of limb coordination in walking Drosophila. eLife 8, e46409.
- Deutscher J, Blake A, and Reid I (2000). Articulated body motion capture by annealed particle filtering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000), pp. 126–133, vol. 2.
- Dhawale AK, Poddar R, Wolff SB, Normand VA, Kopelowitz E, and Olveczky BP (2017). Automated long-term recording and analysis of neural activity in behaving animals. eLife 6, e27702.
- Dhawale AK, Wolff SBE, Ko R, and Ölveczky BP (2019). The basal ganglia can control learned motor sequences independently of motor cortex. bioRxiv 827261.
- Dickerson AK, Mills ZG, and Hu DL (2012). Wet mammals shake at tuned frequencies to dry. J R Soc Interface 9, 3208–3218.
- Du Y, Wang W, and Wang L (2015). Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118.
- Egnor SER, and Branson K (2016). Computational Analysis of Behavior. Annu. Rev. Neurosci 39, 217–236.
- Elliott GR, Vanwersch RAP, and Bruijnzeel PLB (2000). An automated method for registering and quantifying scratching activity in mice: Use for drug evaluation. J. Pharmacol. Toxicol. Methods 44, 453–459.
- Forkosh O, Karamihalev S, Roeh S, Alon U, Anpilov S, Touma C, Nussbaumer M, Flachskamm C, Kaplick PM, Shemesh Y, et al. (2019). Identity domains capture individual differences from across the behavioral repertoire. Nat. Neurosci 22, 2023–2028.
- Freund J, Brandmaier AM, Lewejohann L, Kirste I, Kritzler M, Krüger A, Sachser N, Lindenberger U, and Kempermann G (2013). Emergence of Individuality in Genetically Identical Mice. Science 340, 756–759.
- Gal Y, and Ghahramani Z (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In International Conference on Machine Learning, pp. 1050–1059.
- Gall J, Rosenhahn B, Brox T, and Seidel H-P (2010). Optimization and filtering for human motion capture. Int J Comput Vis 87, 75.
- Gallistel CR (1982). The Organization of Action: A New Synthesis (Hillsdale, NJ: Psychology Press).
- Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, and Rubin DB (2013). Bayesian data analysis (CRC Press).
- Gittis AH, and Kreitzer AC (2012). Striatal microcircuitry and movement disorders. Trends Neurosci 35, 557–564.
- Graves A, Mohamed A, and Hinton G (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649.
- Guo JZ, Graves AR, Guo WW, Zheng J, Lee A, Rodriguez-Gonzalez J, Li N, Macklin JJ, Phillips JW, Mensh BD, et al. (2015). Cortex commands the performance of skilled movement. eLife 4, e10774.
- Hebel R, and Stromberg MW (1976). Anatomy of the laboratory rat (Baltimore: Williams and Wilkins).
- Heskes T (1997). Practical Confidence and Prediction Intervals. In Advances in Neural Information Processing Systems 9, Mozer MC, Jordan MI, and Petsche T, eds. (MIT Press), pp. 176–182.
- Hong W, Kennedy A, Burgos-Artizzu XP, Zelikowsky M, Navonne SG, Perona P, and Anderson DJ (2015). Automated measurement of mouse social behaviors using depth sensing, video tracking, and machine learning. Proc. Natl. Acad. Sci 112, E5351–E5360.
- Hussein MA (1973). Skin cleavage lines in the rat. Eur Surg Res 5, 73–79.
- Ionescu C, Papava D, Olaru V, and Sminchisescu C (2014). Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. IEEE Trans Pattern Anal Mach Intell 36, 1325–1339.
- Iskakov K, Burkov E, Lempitsky V, and Malkov Y (2019). Learnable Triangulation of Human Pose. In International Conference on Computer Vision, pp. 7718–7727.
- Jhuang H, Garrote E, Yu X, Khilnani V, Poggio T, Steele AD, and Serre T (2010). Automated home-cage behavioural phenotyping of mice. Nat. Commun 1, 68.
- Jinnah HA, and Hess EJ (2015). Chapter 4 - Assessment of Movement Disorders in Rodents. In Movement Disorders (Second Edition), LeDoux MS, ed. (Boston: Academic Press), pp. 59–76.
- Kabra M, Robie AA, Rivera-Alba M, Branson S, and Branson K (2013). JAABA: interactive machine learning for automatic annotation of animal behavior. Nat Methods 10, 64–67.
- Kalueff AV, Stewart AM, Song C, Berridge KC, Graybiel AM, and Fentress JC (2016). Neurobiology of rodent self-grooming and its value for translational neuroscience. Nat. Rev. Neurosci 17, 45–59.
- Kawai R, Markman T, Poddar R, Ko R, Fantana AL, Dhawale AK, Kampff AR, and Olveczky BP (2015). Motor cortex is required for learning but not for executing a motor skill. Neuron 86, 800–812.
- Kitagawa M, and Windsor B (2008). MoCap for Artists: Workflow and Techniques for Motion Capture (Amsterdam; Boston: Focal Press).
- Kobak D, and Berens P (2019). The art of using t-SNE for single-cell transcriptomics. bioRxiv 453449.
- Kobak D, and Linderman GC (2019). UMAP does not preserve global structure any better than t-SNE when using the same initialization. bioRxiv 2019.12.19.877522.
- Krakauer JW, Ghazanfar AA, Gomez-Marin A, MacIver MA, and Poeppel D (2017). Neuroscience Needs Behavior: Correcting a Reductionist Bias. Neuron 93, 480–490.
- Lakshminarayanan B, Pritzel A, and Blundell C (2017). Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Advances in Neural Information Processing Systems 30, Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, and Garnett R, eds. (Curran Associates, Inc.), pp. 6402–6413.
- Lamb AM, Goyal AGAP, Zhang Y, Zhang S, Courville AC, and Bengio Y (2016). Professor forcing: A new algorithm for training recurrent networks. In Advances in Neural Information Processing Systems, pp. 4601–4609.
- Lashley KS (1951). The problem of serial order in behavior. In Cerebral Mechanisms in Behavior (New York: Wiley), pp. 112–131.
- Lathe R (2004). The individuality of mice. Genes Brain Behav. 3, 317–327.
- Liu G, and McMillan L (2006). Estimation of missing markers in human motion capture. Vis. Comput 22, 721–728.
- Luo L, Callaway EM, and Svoboda K (2018). Genetic Dissection of Neural Circuits: A Decade of Progress. Neuron 98, 256–281.
- van der Maaten L, and Hinton G (2008). Visualizing Data using t-SNE. J. Mach. Learn. Res 9, 2579–2605.
- Machado AS, Darmohray DM, Fayad J, Marques HG, and Carey MR (2015). A quantitative framework for whole-body coordination reveals specific deficits in freely walking ataxic mice. eLife 4, e07892.
- Mallick T, Das PP, and Majumdar AK (2014). Characterizations of Noise in Kinect Depth Images: A Review. IEEE Sens. J 14, 1731–1740.
- Mathis A, Mamidanna P, Cury KM, Abe T, Murthy VN, Mathis MW, and Bethge M (2018). DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci 21, 1281–1289.
- Merel J, Hasenclever L, Galashov A, Ahuja A, Pham V, Wayne G, Teh YW, and Heess N (2018). Neural Probabilistic Motor Primitives for Humanoid Control. In International Conference on Learning Representations.
- Merel J, Botvinick M, and Wayne G (2019a). Hierarchical motor control in mammals and machines. Nat. Commun 10.
- Merel J, Aldarondo D, Marshall J, Tassa Y, Wayne G, and Olveczky B (2019b). Deep neuroethology of a virtual rodent. In International Conference on Learning Representations.
- Meyer AF, Poort J, O’Keefe J, Sahani M, and Linden JF (2018). A Head-Mounted Camera System Integrates Detailed Behavioral Monitoring with Multichannel Electrophysiology in Freely Moving Mice. Neuron 100, 46–60.e7.
- Miller EA, Kastner DB, Grzybowski MN, Dwinell MR, Geurts AM, and Frank LM (2019). Robust and replicable measurement for prepulse inhibition of the acoustic startle response. bioRxiv 601500.
- Mimica B, Dunn BA, Tombaz T, Bojja VPTNCS, and Whitlock JR (2018). Efficient cortical coding of 3D posture in freely behaving rats. Science 362, 584–589.
- Mischiati M, Lin H-T, Herold P, Imler E, Olberg R, and Leonardo A (2015). Internal models direct dragonfly interception steering. Nature 517, 333–338.
- Nestler EJ, and Hyman SE (2010). Animal Models of Neuropsychiatric Disorders. Nat. Neurosci 13, 1161–1169.
- Nix DA, and Weigend AS (1994). Estimating the mean and variance of the target probability distribution. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), pp. 55–60, vol. 1.
- Ohayon S, Avni O, Taylor AL, Perona P, and Roian Egnor SE (2013). Automated multi-day tracking of marked mice for the analysis of social behavior. J. Neurosci. Methods 219, 10–19.
- van den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, and Kavukcuoglu K (2016a). WaveNet: A Generative Model for Raw Audio. arXiv:1609.03499 [cs].
- van den Oord A, Kalchbrenner N, and Kavukcuoglu K (2016b). Pixel Recurrent Neural Networks. arXiv:1601.06759 [cs].
- Pappas SS, Leventhal DK, Albin RL, and Dauer WT (2014). Mouse models of neurodevelopmental disease of the basal ganglia and associated circuits. Curr Top Dev Biol 109, 97–169.
- Parker JG, Marshall JD, Ahanonu B, Wu Y-W, Kim TH, Grewe BF, Zhang Y, Li JZ, Ding JB, Ehlers MD, et al. (2018). Diametric neural ensemble dynamics in parkinsonian and dyskinetic states. Nature 557, 177–182.
- Pavlakos G, Zhou X, Derpanis KG, and Daniilidis K (2017). Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1253–1262.
- Pearce T, Zaki M, Brintrup A, and Neely A (2018). High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach. arXiv:1802.07167 [stat].
- Pereira TD, Aldarondo DE, Willmore L, Kislin M, Wang SS-H, Murthy M, and Shaevitz JW (2019). Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125.
- Poddar R, Kawai R, and Olveczky BP (2013). A fully automated high-throughput training system for rodents. PLoS One 8, e83171.
- van Praag H, Kempermann G, and Gage FH (2000). Neural consequences of environmental enrichment. Nat. Rev. Neurosci 1, 191–198.
- Seidler H, Bernhard W, Teschler-Nicola M, Platzer W, zur Nedden D, Henn R, Oberhauser A, and Sjøvold T (1992). Some anthropological aspects of the prehistoric Tyrolean Ice Man. Science 258, 455–457.
- Silverman JL, Yang M, Lord C, and Crawley JN (2010). Behavioural phenotyping assays for mouse models of autism. Nat. Rev. Neurosci 11, 490–502.
- Simoncelli EP, and Olshausen BA (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci 24, 1193–1216.
- Stephens GJ, Johnson-Kerner B, Bialek W, and Ryu WS (2008). Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput Biol 4, e1000028.
- Stirn A (2003). Body piercing: medical consequences and psychological motivations. Lancet 361, 1205–1215.
- Svoboda K, and Li N (2018). Neural mechanisms of movement planning: motor cortex and beyond. Curr. Opin. Neurobiol 49, 33–41.
- Takeoka A, Vollenweider I, Courtine G, and Arber S (2014). Muscle spindle feedback directs locomotor recovery and circuit reorganization after spinal cord injury. Cell 159, 1626–1639.
- Taylor GW, Hinton GE, and Roweis ST (2007). Modeling Human Motion Using Binary Latent Variables. In Advances in Neural Information Processing Systems 19, Schölkopf B, Platt JC, and Hoffman T, eds. (MIT Press), pp. 1345–1352.
- Tinbergen N (1950). The hierarchical organization of nervous mechanisms underlying instinctive behaviour. In Symposium for the Society for Experimental Biology, pp. 305–312.
- Todd JG, Kain JS, and de Bivort BL (2017). Systematic exploration of unsupervised methods for mapping behavior. Phys. Biol 14, 015002.
- Venkatraman S, Jin X, Costa RM, and Carmena JM (2010). Investigating neural correlates of behavior in freely behaving rodents using inertial sensors. J Neurophysiol 104, 569–575.
- Whishaw IQ, and Kolb B (2005). The behavior of the laboratory rat: a handbook with tests (Oxford; New York: Oxford University Press).
- Wiltschko AB, Johnson MJ, Iurilli G, Peterson RE, Katon JM, Pashkovski SL, Abraira VE, Adams RP, and Datta SR (2015). Mapping Sub-Second Structure in Mouse Behavior. Neuron 88, 1121–1135.
- Wu HG, Miyamoto YR, Gonzalez Castro LN, Ölveczky BP, and Smith MA (2014). Temporal structure of motor variability is dynamically regulated and predicts motor learning ability. Nat. Neurosci 17, 312–321.
- Yu F, and Koltun V (2016). Multi-Scale Context Aggregation by Dilated Convolutions. arXiv:1511.07122 [cs].
Associated Data
Supplementary Materials
Supplemental Video 1 ∣ Schematic 3D rendering of the CAPTURE technique.
Related to Figure 1.
A rat behaves freely in the behavioral arena. We attach twenty retroreflective markers to the animal using body piercings and track them with a twelve-camera motion capture array. Tracked marker positions are used to create a wireframe representation of the pose of the rat’s trunk and appendages. The animation was generated from the actual behavior of the rat in the open-field arena.
Supplemental Video 2 ∣ Example of raw CAPTURE recordings in real time.
Related to Figure 1.
A view from one of the twelve infrared cameras used to track the rat’s retroreflective markers (right), shown alongside a color video recording of the rat with the positions of all twenty markers reprojected (left). Green circles in the infrared view show the computed marker positions; apparent lags reflect delays in software rendering rather than true lags in marker identification. Colored linkages between markers in the color video recording are used to uniquely assign tracked markers to positions on the body. Motion capture data are shown without post-processing, so unimputed marker dropouts are present.
Supplemental Video 3 ∣ CAPTURE enables continuous kinematic recordings.
Related to Figure 1.
Wireframe representations of behavior in the open-field arena for the same rat on six different days, animated at 2x, 10x, and 100x real time.
Supplemental Video 4 ∣ 2D pose detection networks are unsuited to tracking in 3D across the rat behavioral repertoire.
Related to Figure 2.
Video recording of a rat behaving in an open field, with keypoints tracked using CAPTURE or predicted by DeepLabCut projected onto the video frames (left). We trained DeepLabCut with 100–100,000 frames. Animals whose videos were included in the training dataset (in-sample) or excluded from it (out-of-sample) are shown separately. Right: concurrent wireframe representations of the keypoints in the animal’s egocentric reference frame. Video speed is 1.5-times real time.
Supplemental Video 5 ∣ Rats move through behavioral embedding space.
Related to Figure 3.
A video recording of an animal behaving in the CAPTURE arena (right), and the simultaneous position of the animal’s posture and kinematics in the behavioral map over the past ten timepoints, visualized as black circles (left). Video speed is 4-times real time.
Supplemental Video 6 ∣ Behavioral embeddings separate coarse behavioral categories.
Related to Figure 3.
A series of examples of rat behavior sampled from behavioral clusters distributed across different coarse categories in the embedding space. For each cluster sampled (left), six different examples of the rat’s behavior in that cluster are shown in the egocentric reference frame of the animal (right). Coarse behavioral categories are colored as in Figure 2. Videos are shown in real time and repeated twice.
Supplemental Video 7 ∣ Behavioral embeddings allow identification of kinematic variation.
Related to Figure 3.
A series of examples of rat behavior sampled from behavioral clusters within the same coarse region of the embedding space. For each cluster, sixteen different instances of rat behavior drawn from that cluster in the embedding space (right) are shown as wireframes in the animal’s egocentric reference frame (left). Videos are shown in real time and repeated twice.
Supplemental Video 8 ∣ Behavior is organized on short timescales into repeated behavioral sequences.
Related to Figure 4.
Examples of different behavioral sequences detected by our pattern matching algorithm (Figure 4) on 15-s timescales. White squares indicate breaks between different instances of the behavior shown. Video speed is 4-times real time.
Supplemental Video 9 ∣ Behavior is structured on longer timescales into behavioral states.
Related to Figure 4.
Examples of different behavioral states detected by our pattern matching algorithm on 2-min timescales. Video speed is 4-times real time.
Supplemental Video 10 ∣ Caffeine and amphetamine introduce aberrant behavioral states.
Related to Figure 5.
Examples of three different behavioral states preferentially observed after administration of a saline vehicle, caffeine, or amphetamine, respectively. Video speed is 12-times real time.
Data Availability Statement
Demonstration code and example datasets are available at https://github.com/jessedmarshall/CAPTURE_demo. The core functions used for the behavioral embedding are available at https://github.com/gordonberman/MotionMapper. The code used for imputation is available from the authors at https://github.com/diegoaldarondo/MarkerBasedImputation. The remainder of the analysis code was written using standard approaches in MATLAB 2017b and open-source extensions and is available from the corresponding author on request.