eLife. 2021 Sep 2;10:e61806. doi: 10.7554/eLife.61806

A framework for studying behavioral evolution by reconstructing ancestral repertoires

Damián G Hernández 1,2, Catalina Rivera 1, Jessica Cande 3, Baohua Zhou 1,4, David L Stern 3, Gordon J Berman 1,5
Editors: Jesse H Goldberg6, Christian Rutz7
PMCID: PMC8445618  PMID: 34473052

Abstract

Although different animal species often exhibit extensive variation in many behaviors, typically scientists examine one or a small number of behaviors in any single study. Here, we propose a new framework to simultaneously study the evolution of many behaviors. We measured the behavioral repertoire of individuals from six species of fruit flies using unsupervised techniques and identified all stereotyped movements exhibited by each species. We then fit a Generalized Linear Mixed Model to estimate the intra- and inter-species behavioral covariances, and, by using the known phylogenetic relationships among species, we estimated the (unobserved) behaviors exhibited by ancestral species. We found that much of intra-specific behavioral variation has a similar covariance structure to previously described long-time scale variation in an individual’s behavior, suggesting that much of the measured variation between individuals of a single species in our assay reflects differences in the status of neural networks, rather than genetic or developmental differences between individuals. We then propose a method to identify groups of behaviors that appear to have evolved in a correlated manner, illustrating how sets of behaviors, rather than individual behaviors, likely evolved. Our approach provides a new framework for identifying co-evolving behaviors and may provide new opportunities to study the mechanistic basis of behavioral evolution.

Research organism: D. melanogaster

Introduction

Behavior is one of the most variable and rapidly evolving phenotypes, with notable differences even between closely related species (Lorenz, 1958; Martins, 1996). Variable behaviors and rapid behavioral evolution likely facilitate adaptation to new or varying environments and speciation (Baier and Hoekstra, 2019; West-Eberhard, 2003). Despite the importance of animal behavior, progress in revealing the genetic basis of behavioral evolution has been slow (Gleason and Ritchie, 2004; Yamamoto and Ishikawa, 2013; Ellison et al., 2011; Shaw and Lesnick, 2009). In contrast, recent decades have seen significant progress in understanding the genetic causes of morphological evolution (Williams and Carroll, 2009; Shubin et al., 2009; Levine and Davidson, 2005; Stern and Frankel, 2013).

While there are many potential reasons for the discrepancy between studies of behavioral and morphological evolution, including the lack of a fossil record for behavior, a key difficulty has been identifying which aspects of an animal’s development and physiology are the proximate causes of behavior evolution. Evolutionary changes in behavior could emerge from alterations in the developmental patterning of neural circuits (e.g., brain networks, descending commands, central pattern generators), changes in hormonal regulation that influence neural activity, or even from changes in non-neuronal morphology (Baker et al., 2001; Massey et al., 2019). Each of these possibilities could result in behavioral effects at different, yet overlapping, timescales – from muscle twitches to stereotyped suites of behaviors to longer-lived states like foraging or courtship or aging that may control the relative frequency of a given behavior. This complexity may make it difficult to identify the precise aspects of behavior that have evolved.

To address these difficulties, the standard approach in the genetic study of behavioral evolution has been to identify focal behaviors that exhibit robust differences between species, such as courtship behavior in fruit flies (Cande et al., 2012; Cande et al., 2014; Ding et al., 2019) or burrow formation in deermice (Weber et al., 2013; Hu and Hoekstra, 2017). It has been possible to identify genomic regions that correlate with quantitative changes in focal behaviors. However, usually multiple genomic regions are identified, each containing many genes. Given the large number of putative genes involved, combined with the possibility of epistatic interactions between loci, identification of the contributions of individual genes to behavioral evolution has progressed slowly.

An alternative approach to focusing on single behaviors is to examine the full repertoire of movements that an animal performs. By identifying sets of behaviors that evolve together, as was recently performed for hand-tuned traits in a study of birds-of-paradise evolution (Ligon et al., 2018), it may be possible to identify regulators of these suites of behaviors. This approach has been thwarted by the challenge of robustly measuring multiple behavioral phenotypes simultaneously. Recent progress in the unsupervised identification of animal behaviors across length and time scales, however, has made this approach possible (Berman, 2018; Brown and de Bivort, 2018). In this study, we introduce a quantitative framework for studying the evolutionary dynamics of large suites of behavior. We have focused initially on fruit flies, which provide a convenient model for this problem – both because they exhibit a wide range of complex behaviors and because unsupervised approaches can be used to map all of the animal movements captured in video recordings (Berman et al., 2014; Cande et al., 2018; Berman et al., 2016).

We recorded movies of isolated male flies from six species in a nearly stimulus-deprived environment. Because we did not record flies experiencing social and other environmental cues, we did not observe many charismatic natural behaviors, such as courtship and aggression. Nevertheless, we found that the behaviors they performed, including walking and grooming, contain species-specific information. We thus hypothesized that our quantitative representations of behaviors could be studied in an evolutionary context. To infer the evolutionary trajectories of behavioral evolution, we estimated ancestral behavioral repertoires with a Generalized Linear Mixed Model (GLMM) approach (Hadfield, 2010), which builds upon Felsenstein’s approach to reconstructing ancestral states (Felsenstein, 1985; Felsenstein, 2005; Hadfield and Nakagawa, 2010; O’Meara, 2012). Using these results, we develop a framework that allows us to model the behavioral traits that co-vary both within a species and along the phylogeny. We found that within-species variance has a similar structure to long-lasting internal states of the animal that we characterized previously, and that inter-species variance can capture how disparate behaviors may have evolved together. This latter finding points toward the presence of higher order behavioral traits that would not have been detected by studying individual behaviors in isolation and that may be amenable to further evolutionary and genetic analysis.

Experiments and behavioral quantification

We captured video recordings of all behaviors performed by single flies isolated in a largely featureless environment for multiple individuals from six species of the Drosophila melanogaster species subgroup: D. mauritiana, D. melanogaster, D. santomea, D. sechellia, D. simulans, and D. yakuba (Cande et al., 2018). Although the animals could not jump or fly in these chambers and were not expected to exhibit social or feeding behaviors, the flies displayed a variety of complex behaviors, including locomotion and grooming. Each of these behaviors involves multiple body parts that move at varying time scales. The species studied here were chosen because their phylogenetic relationships are well understood (Clark et al., 2007; Obbard et al., 2012; Chyb and Gompel, 2013; Seetharam and Stuart, 2013) (summarized in the tree seen in Figure 3), and genetic tools are available for most of these species (Stern et al., 2017). Since a single strain represents a genomic ‘snapshot’ of each species, we assayed multiple individuals from each of multiple strains of each species to attempt to capture species-specific differences, and not variation specific to particular strains (see Materials and methods). In total, we collected data from 561 flies, each measured for an hour at a sampling rate of 100 Hz.

While previous studies have identified differences in specific behaviors, such as courtship behavior, between these species (Cande et al., 2012; Ding et al., 2019; Yamamoto and Ishikawa, 2013; Auer and Benton, 2016), here we assayed the full repertoire of behaviors the flies performed in the arena, with the aim of identifying combinations of behaviors that may be evolving together. To measure this repertoire, we used a previously described behavior mapping method (Berman et al., 2014; Cande et al., 2018) that starts from raw video images and finds each animal’s stereotyped movements in an unsupervised manner. The output of this method is a two-dimensional probability density function (PDF) that contains many peaks and valleys (Figure 1A), where each peak corresponds to a different stereotyped behavior (e.g., right wing grooming, proboscis extension, running, etc).

Figure 1. Behavioral repertoires of Drosophila.

(A) The behavioral space probability density function, obtained using the unsupervised approach described in Berman et al., 2014 on the entire data set of 561 individuals across all species. Coarse grained behaviors corresponding to the different types of movements exhibited in the map are shown as well. (B) The relative performance of each of the 134 stereotyped behaviors for each of the six species. Each region here represents a behavior, and the color scale indicates the logarithm of the fraction of time that each species performs the specified behavior divided by the average across all species.

Briefly, to create the density plots, raw video images were rotationally and translationally aligned to create an egocentric frame for the fly. The transformed images were decomposed using Principal Components Analysis into a low-dimensional set of time series. For each of these postural mode time series, a Morlet wavelet transform was applied, obtaining a local spectrogram between 1 Hz and 50 Hz (the Nyquist frequency). After normalization, each point in time was mapped using t-SNE (van der Maaten and Hinton, 2008) into a two-dimensional plane. Finally, convolving these points with a two-dimensional Gaussian and applying the watershed transform (Meyer, 1994) produced 134 different regions, each containing a single local maximum of probability density that corresponds to a particular stereotyped behavior. We integrate over this local region of the probability density to calculate the probability that a fly is performing this behavior at a random point in time. Thus, we can associate each fly with a 134-dimensional real-valued vector that represents the probability of the fly performing a certain stereotyped behavior at a given time during the hour-long experimental session. We will refer to this quantity as the animal’s behavioral vector, P.
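
The final integration step can be illustrated with a minimal sketch. It assumes the smoothed density and the watershed labels are already available as small grids; the function name `behavioral_vector` and the toy values are hypothetical and are not taken from the original analysis code.

```python
from collections import defaultdict


def behavioral_vector(density, labels):
    """Integrate a gridded density over labelled regions.

    density: list of rows of non-negative floats (smoothed PDF on a grid).
    labels:  same shape as density; one integer region id per grid cell,
             e.g. the output of a watershed segmentation.
    Returns {region id: probability}, normalised to sum to 1.
    """
    mass = defaultdict(float)
    for density_row, label_row in zip(density, labels):
        for value, region in zip(density_row, label_row):
            mass[region] += value
    total = sum(mass.values())
    if total <= 0:
        raise ValueError("density must have positive total mass")
    return {region: m / total for region, m in sorted(mass.items())}


# Toy 2 x 3 grid with two regions (illustrative values only, not real data).
toy_density = [[0.1, 0.3, 0.0],
               [0.2, 0.2, 0.2]]
toy_labels = [[0, 0, 1],
              [0, 1, 1]]
P = behavioral_vector(toy_density, toy_labels)
print(P)
```

In the study itself the same operation would run over 134 regions, giving one 134-dimensional vector per fly.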

The behavioral map averaged across all six species is shown in Figure 1A and displays a pattern of movements similar to those we found in previous work, where locomotion, idle/slow, anterior/posterior movements, etc. are segregated into different regions (Berman et al., 2014; Cande et al., 2018). Averaging across all individuals of each species, we found the mean behavioral vector for each species (Figure 1B) and observed that each species performs certain behaviors with different probabilities. For example, D. mauritiana individuals spend more time performing fast locomotion than all other species on average, and D. yakuba individuals spend much of their time performing an almost species-unique type of slow locomotion, but little time running quickly.

These average probability maps provide some insight into potential species differences, but to identify species-specific behaviors, we also need to account for variation in the probability that individuals of each species perform each behavior. One way to address this problem is to ask whether an individual’s species identity can be predicted solely from its multi-dimensional behavioral vector. To explore this question, we first used t-SNE to project all 561 individuals into a two-dimensional plane (Figure 2A), using the Jensen-Shannon divergence as the distance metric between individual behavioral vectors. In this plot, different colors represent different species, and different symbols with the same color represent different strains within the same species. Although species do not segment cleanly into separate regions of this plane, the distribution of species is far from random, with individuals from the same species tending to group near one another. Given this structure, there is likely species-specific information in the behavioral vectors.
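
For readers unfamiliar with the Jensen-Shannon divergence, the following sketch shows how a pairwise matrix of this kind could be computed from behavioral vectors. The function names and the choice of base-2 logarithms are assumptions made for illustration; the text does not state which logarithm base was used.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors (0 to 1 bit)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_matrix(vectors):
    """Symmetric matrix of pairwise Jensen-Shannon divergences."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = js_divergence(vectors[i], vectors[j])
    return d


# Three toy "behavioral vectors" over two behaviors (illustrative only).
toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
D = js_matrix(toy)
```

A matrix like `D` (561 by 561 in the study) is the kind of input a t-SNE implementation accepts when run on precomputed distances.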

Figure 2. Classification of fly species based on behavioral repertoires.

(A) A t-SNE embedding of the behavioral repertoires shows that behavioral repertoires contain some species-specific information. Each dot represents one individual fly, with different colors representing different species and different symbols with the same color representing different strains within the same species. The distance matrix (561 by 561) used to create the embedding is the Jensen-Shannon divergence between the behavioral densities of individual flies. (B) Confusion matrix for the logistic regression with each row normalized. All the values are averaged from 100 different trials. The standard error is less than 0.01 for the diagonal elements and less than 0.005 for each of the off-diagonal elements.

To quantify this observation, we applied a multinomial logistic regression classifier to the data, performing a six-way classification based solely on the high-dimensional behavioral vectors. After training, the classifier correctly classified 89 ± 0.2% of vectors in our test set (a randomly selected 30% of the entire data set that was not used during training). Moreover, the confusion matrix (Figure 2B) revealed no systematic misclassification bias amongst the species. Note that we have used a relatively simple classifier compared to modern deep learning methods (Goodfellow et al., 2016), so these results likely represent a lower bound on the distinguishability of the behavioral vectors. Thus, the behavioral vectors contain considerable species-specific information. We therefore proceeded to explore how these behavioral vectors may have evolved along the phylogeny.

Reconstructing ancestral behavioral repertoires

Multiple methods have been proposed for reconstructing ancestral states from data collected from extant species (Felsenstein, 1985; Felsenstein, 2005; Yang, 2006; O’Meara, 2012; Royer-Carenzi and Didier, 2016). These methods generally fall into two camps: parsimony reconstruction, which attempts to reconstruct evolutionary history with the fewest evolutionary changes (Cunningham et al., 1998), and diffusion processes, which model evolution as a random walk on a multi-dimensional landscape (Hadfield and Nakagawa, 2010). Given the high-dimensional behavioral vectors that we are attempting to model, a diffusion process is more likely to capture the inter-trait correlations that we would like to understand. Thus, we focus on a diffusion-based model here.

Given a phylogeny for a collection of species, we modeled how species-specific complexes of behaviors might have emerged. We assumed that each animal’s behavior is a quantitative trait with an additive random effect; that is, each animal’s behavior results from the additive effects of many genetic loci, each of small effect, combined with a non-genetic effect that represents intra-specific variation. We do not, however, assume that all behaviors evolve independently of each other. Thus, we are interested in predicting (1) whether intra- and inter-species variation can be separated to identify independently evolving sets or linear combinations of behaviors and (2) how behaviors co-vary along the phylogeny, potentially revealing co-evolving suites of behaviors.

We assumed that the observed flies’ behaviors evolved via a diffusion process with Gaussian noise from a common ancestor along the known phylogenetic tree. Note that this is a less restrictive assumption than neutrality, as multiple traits under selection may evolve in a correlated manner. Specifically, we fit a multi-response Generalized Linear Mixed Model (GLMM) to the data, using the approach described in Hadfield, 2010, modeling the evolutionary process such that the logarithm of the behavioral vector, P, for each individual (l = (l_1, …, l_K), with K = 134), is given by

l=μ+ρ+e, (1)

where μ is the mean behavior of the common ancestor (treated as the fixed effects of this model), and ρ and e are the random effects corresponding to the phylogenetic and individual variability, respectively. We assume that these random effects are generated from the multi-dimensional normal distributions 𝒩(0, A ⊗ V(a)) (phylogenetic) and 𝒩(0, I ⊗ V(e)) (individual), where ⊗ denotes the Kronecker product. Here, the matrix A represents the information contained in the phylogenetic tree, with Aij being proportional to the length of the path from the most recent common ancestor of species i and j to the common ancestor. A is normalized so that the diagonal elements are all equal to 1. Therefore, Aij represents the phylogenetic similarity between a pair of species. I is the identity matrix, and V(a) and V(e) are the phylogenetic and within-species covariance matrices, respectively.
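
The construction of A can be sketched on a toy tree. The three-species tree and its branch lengths below are invented for illustration and do not correspond to the species studied here. The sketch also assumes an ultrametric tree (all tips equidistant from the root), so that dividing by the tree height sets every diagonal element to 1.

```python
def phylo_similarity(parent, leaves, tol=1e-9):
    """Shared-path-length matrix A for the given leaves, scaled to unit diagonal.

    parent: dict mapping node -> (parent node, branch length to that parent);
            the root maps to (None, 0.0).
    A[i][j] is the length of the path from the root to the most recent common
    ancestor of leaves i and j, divided by the root-to-tip height.
    """
    def edges_to_root(node):
        # Each edge is identified by its child node.
        edges = {}
        while parent[node][0] is not None:
            up, length = parent[node]
            edges[node] = length
            node = up
        return edges

    paths = {leaf: edges_to_root(leaf) for leaf in leaves}
    shared = [[sum(length for edge, length in paths[i].items() if edge in paths[j])
               for j in leaves] for i in leaves]
    height = shared[0][0]
    if any(abs(shared[k][k] - height) > tol for k in range(len(leaves))):
        raise ValueError("tree is not ultrametric; diagonal cannot be scaled to 1")
    return [[value / height for value in row] for row in shared]


# Hypothetical tree: ((a:0.4, b:0.4)X:0.6, c:1.0)root
toy_tree = {
    "root": (None, 0.0),
    "X": ("root", 0.6),
    "a": ("X", 0.4),
    "b": ("X", 0.4),
    "c": ("root", 1.0),
}
A = phylo_similarity(toy_tree, ["a", "b", "c"])
```

In this toy case the sister species a and b share 60% of their root-to-tip path, so A[0][1] is 0.6, whereas c shares no branch with either and has similarity 0.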

We fit μ, V(a), and V(e) using Markov Chain Monte Carlo (MCMC) simulations, confirming that the MCMC converged using the Gelman-Rubin diagnostic (see Materials and methods, Figure 3—figure supplement 1). Our model is also able to infer the mean endpoint behavioral repertoires (Figure 3—figure supplement 2), providing confidence that it is consistent with our input data. Beyond the inferred behavioral states corresponding to the common ancestor, P̄_Anc, we also reconstructed the mean behavioral representations for the intermediate ancestors (Figure 3).
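
As a reminder of what the Gelman-Rubin diagnostic measures, the sketch below computes the Potential Scale Reduction Factor (PSRF) for a single scalar parameter from several chains. It uses the textbook between-chain and within-chain variance formula and is not the authors' code; the chains are synthetic.

```python
import math
import random
import statistics


def psrf(chains):
    """Potential Scale Reduction Factor for one scalar parameter.

    chains: list of m equal-length lists of MCMC samples (n draws each).
    Values close to 1 suggest that the chains sample the same distribution.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


random.seed(0)
# Four synthetic chains drawn from the same distribution: PSRF should be near 1.
mixed = [[random.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# Two chains stuck around different values: PSRF should be far above 1.
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
r_mixed = psrf(mixed)
r_stuck = psrf(stuck)
```

In practice the diagnostic is evaluated separately for each inferred parameter, as in the panels of Figure 3—figure supplement 1.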

Figure 3. Reconstructed behavioral repertoires using the GLMM.

Inferred probabilities of the behavioral traits for the ancestral states are plotted at the denoted locations along the phylogeny. Except for the common ancestor, ancestral states are plotted with respect to the closest ancestor. For each behavioral trait, i, in the intermediate ancestors, we show: log(P̄_i) − log(P̄_i^Anc), where P̄_i and P̄_i^Anc correspond to the inferred mean behavioral trait for the given ancestor and its closest ancestor, respectively. Coarse grained behaviors corresponding to different types of movements are shown on the top right corner.

Figure 3—figure supplement 1. Gelman-Rubin diagnostic for model parameters inferred using MCMC.

(A) Potential Scale Reduction Factor (PSRF, see Materials and methods) for the 134 ancestral behaviors inferred in the GLMM. 20 MCMC chains with different initial conditions were used. (B) PSRF for the phylogenetic covariance matrix elements corresponding to the 10% most common behaviors performed by the measured flies. (C) PSRF for the individual covariance matrix elements corresponding to the 10% most common behaviors performed by the measured flies. The PSRF values for all of these inferred parameters indicate that the MCMC chains have converged.
Figure 3—figure supplement 2. Comparison between measured and inferred behaviors (on a log scale) for each of the extant species.

Here, each measured behavioral mean is plotted against the mean obtained from the components of the MCMC samples corresponding to that particular species and behavioral mode (i.e., the inferred behavioral repertoires from the GLMM). The biggest differences occur mostly in the low probability behaviors, which we expect to be more sensitive to sampling errors.
Figure 3—figure supplement 3. Comparison of the independent focused trait approach vs the repertoire approach for a pair of behaviors.

(A) Schematic of the different predictions that each model provides for the probability contour lines for a pair of behaviors – uncorrelated single-trait model in orange vs. correlated full-repertoire approach in blue. By definition, the single-trait model cannot predict behavioral covariance either inter- or intra-species. (B) Behavioral traits averaged within-species (colored dots) for two specific behaviors show a positive correlation, which is explained by the full-repertoire model (in blue). Ellipses are centered at the coordinates representing the behavioral traits of the inferred ancestral state, with semi-major and semi-minor axes corresponding to the eigenvectors and eigenvalues of the phylogenetic covariance matrix, restricted to the behaviors shown on the left. For comparison, the contour line inferred using the single-trait model is shown in orange (level curves at two standard deviations from the mean). (C) Behavioral traits for all individuals within a species show a negative correlation for this particular pair of behaviors, in contrast to the positive correlation observed in the species means and predicted by the full model. Blue ellipses correspond to the contour probability levels coming from the individual covariance matrix of the full-repertoire model. Note that the predictions from the single-trait model must necessarily be uncorrelated.

We also found that the model that allows behavioral co-evolution out-performs a model where each behavioral trait evolves independently. Specifically, we fit a model where correlations between behavioral traits were removed by enforcing that V(a) and V(e) must be diagonal matrices, a reduction of more than 17,500 parameters compared to the full model (see Materials and methods for details). The phylogenetic ancestral reconstruction was then made for each behavioral trait separately. To compare the relative performance of these models, we used the Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002), a commonly used assessment tool for MCMC-fit hierarchical models that lack a good estimate for the number of effective parameters. Like other information-theoretic model selection criteria (e.g., the Akaike Information Criterion or the Bayesian Information Criterion), smaller values of the DIC imply a larger posterior probability of the model given the available data. Despite the large reduction in the number of parameters for the independent-trait model, the DIC for the independent-trait model was substantially higher (DIC = (242 ± 2) × 10^3) than the DIC for the full model (DIC = (114 ± 2) × 10^3). Moreover, the full model was able to predict the inter- and intra-species covariances between dissimilar pairs of behaviors (Figure 3—figure supplement 3). Hence, modeling the evolution of the full behavioral repertoire captures the structure of the observed data better than a single trait approach.
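
The size of the parameter reduction can be checked by counting the free entries of a symmetric K × K covariance matrix with K = 134, assuming that both V(a) and V(e) are otherwise unconstrained in the full model. The last line gives the standard definition of the DIC from Spiegelhalter et al., 2002; the symbols D, D̄, θ̄, and p_D are the usual notation for that criterion and do not appear elsewhere in the text.

```latex
\begin{align}
\text{full } V:&\quad \frac{K(K+1)}{2} = \frac{134 \cdot 135}{2} = 9045,
\qquad \text{diagonal } V:\quad K = 134, \\
\text{reduction over both matrices}:&\quad 2 \times (9045 - 134) = 17{,}822, \\
\mathrm{DIC} &= \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta}),
\qquad D(\theta) = -2 \log p(y \mid \theta).
\end{align}
```

Here D̄ is the posterior mean of the deviance and θ̄ is the posterior mean of the parameters, so p_D acts as an effective number of parameters that penalizes model complexity.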

Individual variability and long timescale correlations

While it is not possible to directly test the accuracy of our ancestral state reconstructions (Figure 3), the inferred covariance matrices generate predictions about behavioral and genetic correlations that are, in principle, testable. We therefore focus on the fitted covariance matrices, V(e) and V(a) (each in ℝ^{134×134}), which account for within-species and phylogenetic random effects, respectively.

We will focus first on the intra-species covariance matrix, V(e). We note first that the matrix exhibits a modular structure (Figure 4A). After rearranging the behavior order via an information-based clustering procedure (Slonim et al., 2005), we see that a block diagonal pattern emerges, with positive correlations lying within the blocks and negative correlations lying off the diagonal. The details of this particular clustering approach are described in Materials and methods, but we find that the results are nearly identical for several different clustering methodologies (Figure 4—figure supplement 1). Quantifying the matrix’s modularity via the average within-cluster dissimilarity,

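$$\langle d \rangle = \sum_k p(C_k) \left\langle \frac{1}{2}\left[1 - \frac{V^{(e)}_{ij}}{\sqrt{V^{(e)}_{ii} V^{(e)}_{jj}}}\right] \right\rangle_{i,j \in C_k}, \tag{2}$$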

Figure 4. The structure of variability between flies of the same species relates to long timescale transitions in behavior.

(A) The intra-species behavioral covariance matrix (V(e)), with columns and rows ordered via an information-based clustering algorithm (Slonim et al., 2005). The black squares represent behaviors that are grouped together in the three-cluster solution. (B) Behavioral map representation of the clustering solutions. The two-, three-, and six-cluster solutions are shown on top (colors on the three cluster solution match those above the plot in A). The clusters are all spatially contiguous and break down hierarchically (see Figure 4—figure supplement 1 for more examples). (C) Clustering structure of the behavioral space obtained by finding the optimally predictive groups of behaviors (see text for details). Note how these clusterings are very similar to the clusterings in B, despite having been derived from an entirely independent measure.

Figure 4—figure supplement 1. Behaviors clustered according to the individual covariance matrix using three different clustering methods.

(A) Results using the k-medoids clustering method with distance matrix d_ij = (1 − ρ_ij)/2 for 2, 3, …, 7 clusters. To the right, the WSI between the clusters obtained using k-medoids and those obtained using the Deterministic Information Bottleneck (DIB) method on behavioral transitions (see Materials and methods). There is a high degree of similarity between these independently derived measurements, as can be seen by comparison with the WSI calculated by randomly shuffling the labels of the k-medoids clustering corresponding to each number of clusters. (B) Same as in A but using Spectral clustering instead of k-medoids. The similarity index between Spectral clustering and predictive information bottleneck is also statistically significant. (C) Same as in A but using an Information-based clustering approach (see Materials and methods) instead of k-medoids. The similarity index between Information-based clustering and the results from the DIB analysis is statistically significant as well.
Figure 4—figure supplement 2. Modularity of the intra-species behavioral covariance matrix using information based clustering.

⟨d⟩ corresponds to the average distance among elements of the same cluster (see Materials and methods). We show that for different numbers of clusters, the within-cluster distance is significantly smaller (in blue) than expected by random assignment of behaviors to clusters (in orange).
Figure 4—figure supplement 3. Coarse-grained behavioral representations that are optimally predictive of the future behavior states via DIB.

(A) Behavioral representation with 2,3,…,7 clusters using τ=50 in Equation 10. (B) Optimal trade-off curve (Pareto Front) between complexity of coarse grained description against predictive power. For each number of clusters, representations in A correspond to points (red points) on this curve with the highest predictive information.

where C_k is the set of all behaviors belonging to the k-th cluster, we find that ⟨d⟩ ≈ 0.30 and 0.22 for the 3- and 6-cluster solutions, respectively. These values are significantly smaller than the average distances obtained using random cluster assignments (⟨d⟩ = 0.46 ± 0.03 and 0.45 ± 0.04 for 3 and 6 clusters respectively, see Figure 4—figure supplement 2 for other numbers of clusters and clustering methods). Thus, we can conclude that the intra-specific covariance matrix has a far-from-random modular structure, implying that between individuals of the same species, groups of behaviors tend to vary together in a stereotyped manner.
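
The within-cluster dissimilarity can be made concrete with a small sketch. Two details are assumptions made here, because this section does not spell them out: p(C_k) is taken to be the fraction of behaviors assigned to cluster k, and the inner average runs over distinct pairs i < j within a cluster. The 4 × 4 covariance matrix is a toy example with two obvious blocks.

```python
import math


def within_cluster_dissimilarity(V, labels):
    """Average within-cluster distance (1 - correlation) / 2 for a covariance matrix V.

    V: square covariance matrix as a list of lists.
    labels: cluster id for each row/column of V.
    Assumes p(C_k) = fraction of behaviors in cluster k (illustrative choice).
    """
    n = len(V)
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    total = 0.0
    for members in clusters.values():
        pairs = [(i, j) for i in members for j in members if i < j]
        if not pairs:
            continue  # a singleton cluster contributes no pairwise distance
        mean_distance = sum(
            0.5 * (1.0 - V[i][j] / math.sqrt(V[i][i] * V[j][j])) for i, j in pairs
        ) / len(pairs)
        total += (len(members) / n) * mean_distance
    return total


# Toy covariance with two positively correlated blocks (illustrative only).
V_toy = [
    [1.0, 0.8, -0.5, -0.5],
    [0.8, 1.0, -0.5, -0.5],
    [-0.5, -0.5, 1.0, 0.8],
    [-0.5, -0.5, 0.8, 1.0],
]
d_blocks = within_cluster_dissimilarity(V_toy, [0, 0, 1, 1])  # matches the blocks
d_mixed = within_cluster_dissimilarity(V_toy, [0, 1, 0, 1])   # cuts across them
```

A clustering aligned with the block structure gives a small value, while one that cuts across the blocks gives a large one. This is the contrast that the comparison against random cluster assignments quantifies.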

Moreover, these groups of behaviors that co-vary together within a species are not random collections of behaviors. Instead, we found that co-varying clusters are spatially contiguous in the behavioral map, implying that covariances of groups of similar behaviors (behaviors involving moving similar parts of the animals’ bodies at similar speeds) compose much of the observed intra-species variance. The clustering method does not take the spatial structure of the behavioral map into account at all (just the values in V(e)), so the clusters of local behaviors in the behavior map reflect underlying similarity in the covariance of nearby behaviors, rather than an artifact of the algorithm. Moreover, co-varying clusters are hierarchically organized, where coarse-grained co-varying behaviors can be sub-divided into smaller co-varying clusters (Figure 4B), a feature that is not guaranteed by the information-based clustering algorithm.

This hierarchical structure of the behavioral map is reminiscent of the hierarchical temporal structure of behavior that was hypothesized originally by ethologists (Tinbergen, 1951; Deutsch et al., 2020) and was observed to optimally explain the history-dependent long timescale non-stationary structure of Drosophila melanogaster behavioral transitions (Berman et al., 2016). Thus, we hypothesized that the structure of the intra-species covariance matrix might be linked to deviations from statistical stationarity in the behavioral data that were not explicitly measured in the unsupervised clustering or modeled in the GLMM.

To explore this connection further, we performed an analysis that is analogous to the single-species study in Berman et al., 2016, finding coarse-grainings of the behavioral space (i.e., a description of the behavioral space using fewer behaviors) that are optimally predictive of the future behaviors that the flies perform. Specifically, if b(t) is the behavior that a fly performs at time t, we would like to create a clustered version of our behavioral map, Z, such that z(t) ∈ Z, the cluster that the fly is in at time t, contains as much information as possible about the future behavior of the fly, b(t+τ). To keep Z from separating each behavior into its own cluster, we also need to make sure that Z is as simple a clustering as possible (i.e., a smaller number of clusters and a more even distribution of time spent within each cluster).

To be more precise, we calculated Z using the Deterministic Information Bottleneck (DIB) method (Strouse and Schwab, 2017). This approach minimizes the functional

$$\mathcal{J}_\tau = -I\big(Z(t);\, b(t+\tau)\big) + \gamma H(Z), \tag{3}$$

where b(t+τ) is a fly’s behavior at time t+τ, Z(t) is the coarse-grained behavior visited at time t, I(Z(t); b(t+τ)) is the mutual information between these quantities, γ is a positive constant, and H(Z) is the entropy of the coarse-grained representation (see Materials and methods). As γ is increased, progressively simpler, but less predictive, representations are found.
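
To make the two terms of this functional concrete, the sketch below evaluates it for a fixed hard clustering of a behavior sequence, using plug-in (empirical frequency) estimates in bits. It only scores a given clustering and does not perform the DIB optimization itself. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Plug-in entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information_bits(xs, ys):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy_bits(xs) + entropy_bits(ys) - entropy_bits(list(zip(xs, ys)))


def dib_objective(behaviors, cluster_of, tau, gamma):
    """Evaluate J = -I(Z(t); b(t+tau)) + gamma * H(Z) for a fixed clustering.

    behaviors:  sequence of behavior labels over time.
    cluster_of: dict mapping each behavior label to its cluster id.
    """
    if tau < 1 or tau >= len(behaviors):
        raise ValueError("tau must satisfy 1 <= tau < len(behaviors)")
    z_now = [cluster_of[b] for b in behaviors[:-tau]]
    b_future = behaviors[tau:]
    return -mutual_information_bits(z_now, b_future) + gamma * entropy_bits(z_now)


# Toy sequence: the fly alternates 0/1 for a while, then alternates 2/3.
sequence = [0, 1] * 50 + [2, 3] * 50
two_states = {0: "A", 1: "A", 2: "B", 3: "B"}   # groups behaviors by long-lived state
one_state = {0: "A", 1: "A", 2: "A", 3: "A"}    # trivial clustering

j_informative = dib_objective(sequence, two_states, tau=1, gamma=0.0)
j_penalized = dib_objective(sequence, two_states, tau=1, gamma=0.5)
j_trivial = dib_objective(sequence, one_state, tau=1, gamma=0.5)
```

With γ = 0 the two-state clustering scores well because it predicts which pair of behaviors comes next. Raising γ adds the entropy penalty, and the trivial clustering always scores exactly zero, which is the trade-off that the Pareto front in Figure 4—figure supplement 3B summarizes.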

Applying this method to the data, pooled across all six species and using τ=50 (Figure 4C, Figure 4—figure supplement 3), we found the same hierarchical division of the behavioral map that was observed for freely moving D. melanogaster (Berman et al., 2016). Moreover, we found that the structure of the space using this approach closely mirrors the structure found via directly clustering the intra-species covariance matrix, V(e) (Figure 4C). We quantified the similarity between the two clustering partitions with the Weighted Similarity Index (WSI), a modification of the Rand Index (Rand, 1971) (Materials and methods). The WSI between the information-based clustering method and the predictive information bottleneck is 0.73 for three clusters and 0.87 for six clusters. For random clusterings, we would expect to observe 0.51±0.02 and 0.70±0.01 for 3 and 6 clusters, respectively, indicating a non-random overlap between these two partitions. Figure 4—figure supplement 1 shows that this result is independent of the clustering method and the number of clusters.
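
The WSI itself is defined in Materials and methods and is not reproduced here. As a point of reference, the sketch below computes the plain, unweighted Rand Index that it modifies: the fraction of pairs of behaviors on which two partitions agree, either by grouping the pair together in both or by separating it in both. The example labels are invented.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unweighted Rand Index between two partitions of the same items.

    Note: this is the classical index (Rand, 1971), not the weighted
    variant (WSI) used in the analysis.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Identical partitions up to a relabeling of the clusters.
r_same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that cut across each other.
r_cross = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because even unrelated partitions agree on many pairs by chance, the index only becomes meaningful when compared against a shuffled-label baseline, as is done for the WSI values reported above.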

The overlap between these two coarse-grainings indicates that most individual variability in the behaviors we observe results from non-stationarity in behavioral measurements, rather than from individual-specific variation. That is, much of the intraspecific variation appears to reflect flies recorded when they were experiencing different hidden behavioral states (e.g., circadian state, hunger, etc.), rather than reflecting fixed (environmental or genetic) differences between flies. This variation may have arisen because, although we controlled many variables (e.g., fly age, circadian cycle, temperature, and humidity), it is not possible to control for all internal factors (e.g., hunger, arousal, etc.) that affect an animal’s behavioral patterns (Anderson, 2016). The temporal coarse-graining of the behavioral space that we found via the DIB provides insight into these non-stationarities, as the resulting clusters are optimally predictive of the fly’s future behaviors. Given the contiguous nature of these regions, this result means that flies tended to stay within specific regions of the behavioral space much longer than one would assume from a Markov model, hinting that there is an important connection between variability within an animal over time and variability between animals.

More precisely, these results imply that variation in behavior observed among individuals, especially in non-manipulated settings, may often reflect a large component of hidden behavioral states (Figure 5A). Thus, it may be possible to improve upon behavioral measurements in many settings by controlling for the variability associated with these hidden states. For example, the observation that one fly performs less anterior grooming than another may indicate that the animal is in a different long timescale behavioral state, rather than that it has a genetically encoded preference for reduced grooming.

Figure 5. Variability within a species, long timescale transitions, and hidden states modulating behavior.

(A) A cartoon of the hypothesized relation between individual variability within a species and long timescale transitions through hidden states. (B) Accounting for the long timescale dynamics - by adjusting for the amount of time spent in each coarse-grained region (here, the six cluster solution at the top right of Figure 4C) - affects the measured behavioral distributions between D. santomea and D. yakuba. Shown is the comparison of the Mahalanobis distance ((zb)ij) between behavioral distributions before (x-axis) and after (y-axis) adjusting. (C) Kernel density estimates of the distributions for the circled behaviors in (B) on the left before (left) and after (right) adjustments. Solid lines represent D. santomea and dashed lines represent D. yakuba.

A potential method for accounting for these artifacts is to normalize each individual’s behavioral density such that the amount of time that the animal spends in each of the coarse-grained regions is equalized. In other words, the amounts of time spent anterior grooming, locomoting, etc. are set to be the same for all animals, thus accounting for the variability associated with the inferred hidden states. Mathematically, if $P_i$ is the probability of observing behavior i, and $C_i$ is the clustering assignment of this behavior, we can define a normalized probability, $\hat{P}_i$, via

$$\hat{P}_i = \frac{\bar{P}(C_i)}{P_i(C_i)}\, P_i, \qquad (4)$$

where $P_i(C) = \sum_{k \in C} P_k$ is the total density in cluster C for an individual fly and $\bar{P}(C)$ is the average across all animals.
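As an illustration of Equation 4, the following minimal sketch rescales each animal's behavioral density so that its occupancy of every coarse-grained cluster matches the population average. The function name, the array layout (one row per animal), and the toy numbers are assumptions made for this example and are not taken from the published analysis code; the sketch also assumes that every animal spends a nonzero amount of time in every cluster.

```python
import numpy as np

def normalize_by_cluster_occupancy(P, clusters):
    """Apply the Equation 4 rescaling to a matrix of behavioral densities.

    P        : (n_animals, n_behaviors) array; each row sums to 1.
    clusters : (n_behaviors,) integer label C_i for every behavior.
    """
    P = np.asarray(P, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    # membership[c, i] is 1 when behavior i belongs to cluster labels[c]
    membership = (clusters[None, :] == labels[:, None]).astype(float)
    # Per-animal density in each cluster, shape (n_animals, n_clusters)
    cluster_mass = P @ membership.T
    # Population-average density in each cluster
    mean_mass = cluster_mass.mean(axis=0)
    # Assumes every animal visits every cluster (no division by zero)
    scale = mean_mass[None, :] / cluster_mass
    # Expand the per-cluster factor back to one factor per behavior
    return P * (scale @ membership)

# Hypothetical toy data: two animals, four behaviors, two clusters
P = np.array([[0.4, 0.2, 0.3, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
clusters = np.array([0, 0, 1, 1])
P_hat = normalize_by_cluster_occupancy(P, clusters)
```

After the rescaling, every animal in the toy example spends the same fraction of time in each cluster, while the relative frequencies of behaviors within a cluster are unchanged for that animal.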

We found that applying this normalization to our data often results in substantial changes in the inferred distributions of behavioral densities. For example, Figure 5B displays how the difference in behavioral density between D. santomea and D. yakuba (as measured by the Mahalanobis distance between the distributions) alters as a result of normalization. For some behaviors, the signal increases (red points), and in some cases, it decreases (blue points). Thus, it is important to take these non-stationary effects into account when estimating how often single behaviors are performed in studies of behavioral evolution. To measure these non-stationary effects, many behaviors must be measured, not just a focal behavior, thus partially explaining the relative success of our multi-trait model compared to a model where each trait is analyzed independently.

Identifying phylogenetically linked behaviors

One of the advantages of our approach is that we separate variations in behavior corresponding to evolutionary patterns, the phylogenetic variability, from variations among individuals of the same species. By studying the properties of the phylogenetic covariance matrix (V(a)), we can identify multiple behaviors that may have evolved together.

We first characterized the coarse-grained structure within V(a) through the information-based clustering used in the previous section (Slonim et al., 2005) (see Materials and methods). As seen in Figure 6A, the phylogenetically co-varying clusters are not spatially contiguous in the behavioral map. This finding is in contrast to the spatial contiguity we observed for the intra-species covariance matrix (Figure 4B). For example, the two-cluster solution (Figure 6A, left) groups the behavioral space into side-leg movements (middle of the behavioral map) and certain locomotion gaits (far left of the behavioral map) versus the rest of the behaviors. Similarly, when the matrix is clustered into a larger number of clusters, correlated groups are not contiguously arranged within the behavioral map. Thus, our model predicts that many non-similar behaviors are evolving in a correlated manner.

Figure 6. Phylogenetic variability and behavioral meta-traits.

(A) (top) Clustering the phylogenetic covariance matrix (using the same information-based clustering method from Figure 4), we observe that the clusters are no longer spatially contiguous. (bottom) The phylogenetic covariance matrix reordered according to four clusters (colors corresponding to the four-cluster map above). (B) Fraction of variance explained by the largest eigenvalues of the phylogenetic covariance matrix. (C) The eigenvectors corresponding to the largest six eigenvalues. (D) Distributions of the projections of individual density vectors from D. santomea and D. yakuba onto eigenvector 3. (E) Same as in D but using projections of individuals from D. sechellia and D. simulans onto eigenvector 4. (F) Same as in D but using projections of individuals from D. simulans and D. mauritiana onto eigenvector 5.

To quantify these patterns as traits, we decomposed V(a) via an eigendecomposition. As seen in Figure 6B, almost all of the variance within the matrix can be explained with only the first six eigenmodes. These eigenvectors (Figure 6C) share similar non-local structure to the clusterings described above. By projecting individual behavioral vectors onto these eigenvectors, the resulting dot products represent a meta-trait that is a linear combination of phylogenetically linked behaviors.
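The eigendecomposition and projection steps described above can be sketched as follows. This is only an illustration under stated assumptions: the covariance matrix here is a synthetic rank-two matrix rather than the fitted V(a), the individual vectors are random numbers, and the function name `meta_traits` is hypothetical.

```python
import numpy as np

def meta_traits(V, X, n_modes):
    """Eigendecompose a symmetric covariance matrix and project data onto its top modes.

    V       : (K, K) symmetric covariance matrix.
    X       : (n_individuals, K) matrix of individual behavioral vectors.
    n_modes : number of leading eigenvectors to keep.
    """
    V = np.asarray(V, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(V)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # fraction of variance per mode
    modes = eigvecs[:, :n_modes]
    projections = np.asarray(X, dtype=float) @ modes   # one meta-trait value per mode
    return explained, modes, projections

# Synthetic stand-ins (assumptions for illustration only)
rng = np.random.default_rng(0)
U = rng.normal(size=(10, 2))
V_toy = U @ U.T                      # rank-two covariance, so two modes carry all variance
X_toy = rng.normal(size=(5, 10))     # five hypothetical individuals
explained, modes, proj = meta_traits(V_toy, X_toy, n_modes=2)
```

Each column of `proj` is a candidate meta-trait: a single number per individual that summarizes a linear combination of behaviors.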

These evolving meta-traits may be suitable targets for further neurobiological or genetic studies. Three examples of these distributions are shown in Figure 6D–F for several pairs of closely related species. These three examples were not chosen at random, but instead because they showed significant differentiation between species. The aim of this analysis is not to show that all meta-traits would differ between all pairs of species, which is unlikely, but rather that it is possible to identify synthetic meta-traits that could be further interrogated with experimental methods.

Discussion

We have developed a quantitative framework to study the evolution of behavioral repertoires, using fruit flies (Drosophila) as a model system. We started with observations of 561 individuals from six extant species behaving in an unremarkable environment. This assay did not include social behaviors, such as courtship and aggression, nor many foraging behaviors. Thus, at first glance, it might seem like we had excluded most species-specific behaviors from the analysis. Nonetheless, we found that other complex behaviors, like walking, running, and grooming, exhibit species-specific features that can be used to reliably assign individuals to the correct species. Thus, the motor patterns of behaviors that are not normally investigated for their species-specific features are likely evolving between even closely related species. It is not clear whether these differences reflect natural selection or genetic drift. Because all of these behaviors are critical to individual survival, it is possible that they have evolved, at least in part, in response to natural selection. In either case, the underlying mechanisms, and perhaps the neural circuitry, controlling these behaviors must have evolved.

Inspired by these observations, we estimated patterns of behavioral evolution in the context of a well-understood phylogeny. We fit a Generalized Linear Mixed Model to our behavioral measurements and the given phylogeny to reconstruct ancestral behavioral repertoires and the intra- and inter-species covariance matrices. We found that the patterns of intra-species variability are similar to long timescale behavioral dynamics that violate statistical stationarity, a result we reported previously in a study of a single species (Berman et al., 2016). This result suggests that much of the intra-specific variability that emerged by sampling flies under well-controlled conditions reflects variability in the hidden behavioral states of individual flies. While it may be challenging to conceptualize that seemingly simple behaviors, like the pace of walking and running, are reflective of an underlying long timescale behavioral state, many short timescale behaviors, such as the individual movements involved in grooming (Seeds et al., 2014), courtship (Calhoun et al., 2019; Deutsch et al., 2020) and aggression (Hoopfer, 2016; Duistermars et al., 2018) reflect behaviors performed only, or mainly, in the context of a longer lasting behavioral state. These types of long timescale variability may be a statistical confound for evolutionary and experimental studies of behavior. We therefore propose a method to control for these internal states by normalizing the frequency of behaviors relative to an estimate of an animal’s non-stationary states. This method improved the accuracy of behavioral phenotyping and dramatically altered estimates of some species-specific behaviors. For more focused studies, it may not be necessary to measure the full suite of behaviors to effectively normalize for behavioral state, since state can sometimes be estimated from a smaller number of behaviors. In fact, targeted studies of charismatic behaviors, including behaviors associated with aggression or courtship, often implicitly normalize by behavioral state.

Given our estimates for how suites of behaviors evolved, we examined whether the inter-species covariance matrix could be used to identify behavioral meta-traits that might be subjected to further evolutionary and experimental analysis. We identified multiple suites of behaviors that differed between closely related species, providing a starting point for further analysis of how the mechanisms underlying these suites of behaviors have evolved.

There are multiple possible interpretations of these phylogenetically correlated behaviors. For instance, at the neural level, each of these groups of movements may reflect a motor response to shared upstream commands (Cande et al., 2018). Here, for example, different types of locomotion might be controlled through the same descending neural circuitry, but due to evolutionary changes, the same commands could lead to different behavioral outputs, as has been observed in fly courtship patterns (Ding et al., 2019). Alternatively, at the genetic level, multiple behaviors may be linked by pleiotropic effects of individual genetic changes. Finally, groups of co-evolving traits may not be linked mechanistically, but co-evolution may instead reflect selection on suites of behaviors. For example, the male neurons that drive fly courtship song production in the ventral nerve cord are unlikely to be related to the female neurons in the central brain that perceive and interpret the courtship song. Nonetheless, these traits co-evolve such that females tend to prefer songs produced by males of their own species (Bennet-Clark and Ewing, 1969; Ding et al., 2019).

The analysis framework introduced here represents the first attempt to analyze full behavioral repertoires to gain insight into evolution. In principle, this approach could be applied to any data set where a large number of behaviors have been sampled in many species. However, there are several areas where this approach could be improved in the future. First, we recorded behavior from only six species of flies. Including additional species would place more constraints on the evolutionary dynamics, likely resulting in less variance in the ancestral state estimations and potentially adding more structure to the relatively low rank (i.e., highly modular) covariance matrices. Additionally, further work is required to determine the balance between sampling within and between strains and species that optimizes estimates of evolutionary dynamics.

Second, our framework assumes that all evolutionary changes in behavior resemble a diffusion process. Although this assumption is a reasonable initial hypothesis (Felsenstein, 1985), it may be possible to test this assumption. For example, deeper sampling of additional species may allow identification of specific behaviors on particular lineages where neutrality can be rejected (Tajima, 1993). If evidence emerges that the analyzed behaviors do not evolve under a diffusion process but under stabilizing selection, for example, the model for ancestral reconstruction can be changed from a Brownian motion to an Ornstein-Uhlenbeck process (Martins and Hansen, 1997; Hansen and Martins, 1996; Royer-Carenzi and Didier, 2016). Such a change can be implemented by altering the structure of the phylogenetic matrix, A (Martins and Hansen, 1997; Caetano and Beaulieu, 2020), but without other alterations to the overall methodology presented here.
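To make the idea of swapping the phylogenetic matrix concrete, the sketch below builds tip covariances for a hypothetical three-taxon tree under Brownian motion and under one common Ornstein-Uhlenbeck parameterization (unit diffusion variance, root state treated as fixed). The tree, the value of the selection strength `alpha`, and the function names are assumptions for illustration; the choice of OU parameterization is also an assumption and is not necessarily the form one would adopt in the model described here.

```python
import numpy as np

def bm_covariance(shared):
    """Brownian motion: covariance between tips equals their shared branch length."""
    return np.asarray(shared, dtype=float).copy()

def ou_covariance(shared, alpha):
    """Ornstein-Uhlenbeck tip covariance with unit diffusion variance and fixed root.

    shared : (n, n) matrix of shared branch lengths (diagonal = root-to-tip height).
    alpha  : strength of the pull toward the optimum (alpha > 0).
    """
    shared = np.asarray(shared, dtype=float)
    heights = np.diag(shared)
    # Path length separating each pair of tips
    dist = heights[:, None] + heights[None, :] - 2.0 * shared
    # exp(-alpha * d_ij) * (1 - exp(-2 * alpha * s_ij)) / (2 * alpha)
    return np.exp(-alpha * dist) * (-np.expm1(-2.0 * alpha * shared)) / (2.0 * alpha)

# Hypothetical tree ((A:1, B:1):1, C:2): A and B share one unit of history
shared = np.array([[2.0, 1.0, 0.0],
                   [1.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0]])
A_bm = bm_covariance(shared)
A_ou = ou_covariance(shared, alpha=1.0)
```

As `alpha` approaches zero the OU matrix approaches the Brownian motion matrix, which is one way to check that only the structure of the matrix changes and the rest of the fitting procedure can stay the same.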

Another potential limitation of our analysis is that some of the observed inter-specific differences may reflect species-specific responses to environmental factors like room temperature or humidity, rather than underlying genetic or developmental factors. Two observations militate against this possibility, however. First, there were no significant differences in the overall activity level of the different species, which would be a key indicator of environment-induced covariance. Second, the intra-species covariance matrix (derived from data from all species) agrees well with previous findings within a single species (Berman et al., 2016), implying that many of the potential environmental co-varying factors are shared across all six species.

In addition, all of our current analyses ignored the temporal structure of behavior and sequences of movements. While we found that the intra-specific variance has a similar structure to temporal structure that we reported previously (Berman et al., 2016), the order in which behaviors occur may also provide important biological information, especially during events like courtship or aggression. It should be possible to incorporate temporal structure directly into the regression (Caetano and Beaulieu, 2020). Deciding exactly which quantities to measure and how they should be incorporated, however, are complex questions that are outside the scope of this initial study. In addition, the number of fit parameters, already over 18,000 here, would need to grow even larger to accommodate modeling transition rates between the behaviors as traits themselves. Thus, fitting such models would necessitate even larger data sets than the one collected here. Moreover, because the Perron-Frobenius Theorem mathematically couples the transition probabilities between behaviors and the probabilities of the fly performing a given behavior, additional care (and data) is required to ensure that observed differences in behavior are due to changes in temporal structure rather than changes in the frequencies of performing a given behavior.
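The coupling mentioned above can be seen in a small example: for an irreducible Markov chain, the transition matrix fixes the stationary probabilities of the states, so changing a single transition probability also changes how often each behavior is performed. The two-state matrices below are hypothetical and serve only to illustrate this point.

```python
import numpy as np

def stationary_distribution(T):
    """Stationary distribution of a row-stochastic transition matrix T.

    Found as the left eigenvector of T with eigenvalue 1, normalized to sum to 1.
    """
    T = np.asarray(T, dtype=float)
    eigvals, eigvecs = np.linalg.eig(T.T)      # left eigenvectors of T
    k = np.argmin(np.abs(eigvals - 1.0))       # eigenvalue closest to 1
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

# Hypothetical two-behavior chains that differ in a single transition probability
T_a = np.array([[0.9, 0.1],
                [0.5, 0.5]])
T_b = np.array([[0.8, 0.2],
                [0.5, 0.5]])
pi_a = stationary_distribution(T_a)   # behavior frequencies implied by T_a
pi_b = stationary_distribution(T_b)   # behavior frequencies implied by T_b
```

Here the change from 0.1 to 0.2 in one transition probability shifts the long-run frequency of the first behavior from 5/6 to 5/7, which is why differences in temporal structure and differences in behavioral frequencies are difficult to separate.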

Lastly, capturing the full range of animal behaviors for a large number of animals presents a number of technological challenges, which is why we focused on measuring behavior in a highly simplified environment. However, a more complete understanding of the structure of behavior will require more sophisticated ways to capture behavioral dynamics in more naturalistic settings and during complex social arrangements. While modern deep learning methods have made tracking animals in more realistic settings increasingly plausible (Pereira et al., 2019; Mathis and Mathis, 2020), there are still considerable hurdles to translating this information into a form that can be subjected to the kind of analysis we propose here.

Despite these limitations, this work represents a new way to quantitatively characterize the evolution of complex behaviors, which may provide new phenotypes that can be subjected to experimental analysis. In the absence of a behavioral fossil record, reconstructing ancestral behaviors requires an inferential approach like the one we present here. In addition, more complex models could be built to test assumptions of the diffusion-based model we employed. Finally, a strength of our approach is that it makes falsifiable predictions about how behaviors are linked mechanistically, providing predictions that can be tested experimentally to provide further insight into the genetic and neurobiological structure of behavior.

Materials and methods

Data collection

All fly handling and imaging of fly behavior followed the procedures described in Cande et al., 2018, except that we did not provide retinol-free food to any of the animals, nor did we provide any red light cycling during the experiments. Individual male flies were collected upon eclosion and housed singly in 2 mL wells in a 96-well ’condo,’ with food deposited in the bottom of each well, which was sealed at the top with an airpore sheet. In total, we collected data from 561 individuals from 18 strains and six species. Flies were imaged at age 7–12 days, within 4 hr of lights on. Individuals were sampled from multiple strains and species: three strains of D. mauritiana (mau29: 29 flies, mau317: 35 flies, mau318: 32 flies), four strains of D. melanogaster (Canton-S: 31 flies, Oregon-R: 33 flies, mel54: 34 flies, mel56: 31 flies), three strains of D. santomea (san00: 29 flies, san1482: 33 flies, STO OBAT: 22 flies), three strains of D. sechellia (sech28: 32 flies, sech340: 25 flies, sech349: 33 flies), three strains of D. simulans (sim5: 33 flies, sim199: 30 flies, Oxnard: 34 flies), and two strains of D. yakuba (yak01: 34 flies, CYO2: 31 flies).
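The per-strain counts listed above can be tallied to confirm the stated totals. The dictionary below simply transcribes the numbers from the text; the variable names are chosen for this example.

```python
# Per-strain sample sizes transcribed from the Data collection paragraph
flies_per_strain = {
    "D. mauritiana": {"mau29": 29, "mau317": 35, "mau318": 32},
    "D. melanogaster": {"Canton-S": 31, "Oregon-R": 33, "mel54": 34, "mel56": 31},
    "D. santomea": {"san00": 29, "san1482": 33, "STO OBAT": 22},
    "D. sechellia": {"sech28": 32, "sech340": 25, "sech349": 33},
    "D. simulans": {"sim5": 33, "sim199": 30, "Oxnard": 34},
    "D. yakuba": {"yak01": 34, "CYO2": 31},
}

# Totals per species, then overall counts of species, strains, and flies
flies_per_species = {sp: sum(strains.values()) for sp, strains in flies_per_strain.items()}
n_species = len(flies_per_strain)
n_strains = sum(len(strains) for strains in flies_per_strain.values())
n_flies = sum(flies_per_species.values())

for species, count in flies_per_species.items():
    print(f"{species}: {count} flies")
print(f"{n_flies} flies, {n_strains} strains, {n_species} species")
```

The sums agree with the 561 individuals, 18 strains, and six species reported in the text.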

Generalized linear mixed model

We fit our GLMM (Equation 1) using the software introduced in Hadfield, 2010. The covariance matrices $V^{(e)}, V^{(a)} \in \mathbb{R}^{K \times K}$ (with K = 134) and the mean vector $\mu \in \mathbb{R}^{K \times 1}$ were inferred from the posterior distribution via MCMC sampling. Prior distributions for the covariance matrices were given by inverse-Wishart distributions (conjugate priors for the multivariate Gaussian model) with K degrees of freedom and scale matrix $\frac{1}{K+1}\,\frac{I+J}{2}$, with J and I the unit and identity matrices, respectively. Tree branch lengths were estimated from Seetharam and Stuart, 2013.

Gelman-Rubin convergence diagnostic

This test evaluates MCMC convergence by analyzing the difference between several Markov chains. Specifically, we compare the estimated between-chains and within-chain variances for each parameter of the model. Large differences between these variances indicate non-convergence (Gelman and Rubin, 1992). Let θ be a model parameter of interest and $\{\theta_m^{t}\}_{t=1}^{N}$ be the m-th simulated chain, $m = 1, 2, \ldots, M$. Let $\hat{\theta}_m$ and $\hat{\sigma}_m^2$ denote the sample posterior mean and variance of the m-th chain. If $\hat{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m$ is the overall posterior mean estimator, the between-chains (B) and within-chain (W) variances are given by:

$$B = \frac{N}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m - \hat{\theta}\right)^2, \qquad W = \frac{1}{M}\sum_{m=1}^{M}\hat{\sigma}_m^2. \qquad (5)$$

In Gelman and Rubin, 1992, it is shown that the following weighted average of W and B is an unbiased estimator of the marginal posterior variance of θ: $\hat{V} = \frac{N-1}{N}W + \frac{M+1}{NM}B$. The ratio $\hat{V}/W$ should approach one as the M chains converge to the target distribution with $N \to \infty$. In Brooks and Gelman, 1998, this ratio, known as the Potential Scale Reduction Factor (PSRF), was corrected to account for the sampling variability using $R_c = \frac{d+3}{d+1}\frac{\hat{V}}{W}$, where d is the degrees of freedom estimate of a t-distribution. Brooks and Gelman, 1998 use PSRF values of $R_c < 1.1$ for all model parameters as a criterion for convergence of the MCMC chains. Here, we used 20 independent chains, each with a different initialization.
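The between-chain and within-chain variances of Equation 5 and the ratio of the pooled variance estimate to W can be computed in a few lines. The sketch below is a simplified illustration: it omits the degrees-of-freedom correction factor (d+3)/(d+1), and the simulated chains are hypothetical draws rather than output from the fitted model.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Uncorrected Gelman-Rubin ratio V_hat / W for one scalar parameter.

    chains : (M, N) array holding M chains of N samples each.
    """
    chains = np.asarray(chains, dtype=float)
    M, N = chains.shape
    chain_means = chains.mean(axis=1)            # theta_hat_m
    chain_vars = chains.var(axis=1, ddof=1)      # sigma_hat_m^2
    B = N / (M - 1) * np.sum((chain_means - chain_means.mean()) ** 2)
    W = chain_vars.mean()
    V_hat = (N - 1) / N * W + (M + 1) / (N * M) * B
    return float(V_hat / W)

# Hypothetical chains: four well-mixed chains, and four chains stuck at different means
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))
stuck = mixed + np.arange(4)[:, None]
psrf_mixed = potential_scale_reduction(mixed)
psrf_stuck = potential_scale_reduction(stuck)
```

Chains that sample the same distribution give a ratio close to one, whereas chains centered on different values give a ratio well above the 1.1 threshold.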

Deviance information criterion (DIC)

The DIC is used as a Bayesian model selection criterion in problems where there is hierarchical structure to the underlying models and where the correct effective number of parameters is difficult to ascertain (Spiegelhalter et al., 2002). These aspects are often found in models like ours where the posterior distributions have been sampled using Markov Chain Monte Carlo. The DIC is defined as follows:

$$\mathrm{DIC} = \overline{D(\theta)} + p_D, \quad \text{with} \quad \overline{D(\theta)} = E_{p(\theta|y)}[D(\theta)], \quad p_D = \overline{D(\theta)} - D(\bar{\theta}), \qquad (6)$$

where $D(\theta) = -2\log P(y|\theta) + 2\log f(y)$ is called the deviance ($f(y)$ denotes a function of the data alone and $P(y|\theta)$ corresponds to the likelihood of the model under evaluation). Hence, the posterior mean, $\overline{D(\theta)}$, can be considered as a Bayesian measure of fit. $p_D$ represents the effective number of parameters, where $D(\bar{\theta})$ is the deviance evaluated at the posterior mean of the parameters, $\bar{\theta}$. Note that (i) both quantities needed to calculate DIC, $\overline{D(\theta)}$ and $D(\bar{\theta})$, can be readily estimated from the samples generated by MCMC, and (ii) alternatively, we can also re-write $\mathrm{DIC} = D(\bar{\theta}) + 2p_D$. For models with negligible prior information, or for large data sets where the likelihood dominates over the prior, this expression is similar in form to the better-known Akaike Information Criterion (AIC) (Akaike, 1973).
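The way DIC is assembled from posterior samples (Equation 6) can be sketched on a toy problem. The example assumes a one-parameter Gaussian model with known unit variance and a flat prior, sets f(y) = 1, and draws posterior samples directly from the known posterior rather than by MCMC; all of these are simplifying assumptions for illustration and do not correspond to the GLMM fitted in this study.

```python
import numpy as np

def deviance(theta, y):
    """D(theta) = -2 log P(y | theta) for y_i ~ Normal(theta, 1), taking f(y) = 1."""
    n = y.size
    log_lik = -0.5 * np.sum((y - theta) ** 2) - 0.5 * n * np.log(2.0 * np.pi)
    return -2.0 * log_lik

def dic_from_samples(theta_samples, y):
    """Return (DIC, p_D, D(theta_bar)) from posterior samples of a scalar parameter."""
    mean_dev = np.mean([deviance(t, y) for t in theta_samples])   # posterior mean deviance
    dev_at_mean = deviance(np.mean(theta_samples), y)             # deviance at posterior mean
    p_d = mean_dev - dev_at_mean                                  # effective number of parameters
    return mean_dev + p_d, p_d, dev_at_mean

# Toy data and posterior draws (flat prior gives theta | y ~ Normal(mean(y), 1/n))
rng = np.random.default_rng(1)
y = rng.normal(loc=0.5, scale=1.0, size=50)
theta_samples = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=5000)
dic, p_d, dev_at_mean = dic_from_samples(theta_samples, y)
```

For this one-parameter model the effective number of parameters comes out close to one, and the two equivalent expressions for DIC agree.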

Comparing the focused trait and full repertoire models

To build a model where behavioral traits evolve independently from each other, we fit a single-trait GLMM for each behavior j:

$$l_j = \mu_j + \rho_j + e_j, \qquad (7)$$

where $l_j$ denotes the logarithm of the behavioral trait $P_j$, $\mu_j$ is the logarithm of the mean behavior of the common ancestor (treated as the fixed effect of this model), and $\rho_j$ and $e_j$ are the random effects corresponding to the phylogenetic and individual variability, respectively. Similar to the multi-response model, these random effects are normally distributed, drawn from $\mathcal{N}(0, A\sigma_j)$ and $\mathcal{N}(0, A\alpha_j)$, with $\sigma_j$ and $\alpha_j$ (single numbers) corresponding to the inferred phylogenetic and individual variances and A the phylogenetic matrix defined in the main text. Prior distributions for the variances are given by inverse-Wishart distributions with 1.002 degrees of freedom and scale parameter equal to the variance of the logarithm of the corresponding behavioral trait.

We fit these models using 10 bootstrapped data sets and obtained an average DIC value of $(230 \pm 2) \times 10^3$. Note that in the single-trait model, since each behavior is treated independently, the likelihood factorizes into the individual likelihoods corresponding to each behavioral trait: $P(l_1, l_2, \ldots, l_K | \theta) = \prod_{i=1}^{K} P(l_i | \theta_i)$. Therefore, the DIC (estimated in terms of the log-likelihood) is given by $\mathrm{DIC} = \sum_{i=1}^{K} \mathrm{DIC}_i$, where $\mathrm{DIC}_i$ is calculated for each single-trait GLMM.

In contrast, the complete GLMM (described in the main text and in the section above) had a significantly lower average DIC value of $(114 \pm 2) \times 10^3$ (calculated over 10 bootstrapped data sets as well).

Information-based clustering

The information-based clustering approach used in this article (originally introduced in Slonim et al., 2005) minimizes the distance between elements within clusters, while also compressing the original representation as much as possible. More precisely, the method minimizes the functional

$$\mathcal{F} = \langle d \rangle + T\, I(C; i), \qquad (8)$$

where $I(C;i) = \sum_{i=1}^{N}\sum_{C=1}^{N_c} P(C, i)\log\left[\frac{P(C|i)}{P(C)}\right]$ is the mutual information between the original behavioral variable i and the clustering C, $\langle d \rangle = \sum_{C=1}^{N_c} P(C)\, d(C)$, and $d(C)$ is the average distance of elements chosen out of a single cluster:

$$d(C) = \sum_{i_1=1}^{N}\sum_{i_2=1}^{N} P(i_1|C)\, P(i_2|C)\, d(i_1, i_2), \qquad (9)$$

with $d(i_1, i_2)$ being the distance measure between a pair of elements and $P(i|C)$ being the probability to find element i in cluster C. T is a Lagrange multiplier that modulates the relative importance of minimizing the average within-cluster distance and simplifying the clustering.
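To show how the two terms of Equation 8 are evaluated for a given assignment, the sketch below computes the average within-cluster distance, the mutual information, and their weighted sum. It assumes a uniform prior over elements, P(i) = 1/N, and that no cluster is empty; the function name, the distance matrix, and the value of T are hypothetical, and the code evaluates the functional only, without performing the iterative optimization.

```python
import numpy as np

def clustering_functional(dist, p_c_given_i, T):
    """Evaluate <d> + T * I(C; i) for a (possibly soft) cluster assignment.

    dist        : (N, N) matrix of pairwise distances d(i1, i2).
    p_c_given_i : (N, Nc) matrix of assignment probabilities P(C | i); rows sum to 1.
    T           : trade-off parameter.
    """
    dist = np.asarray(dist, dtype=float)
    q = np.asarray(p_c_given_i, dtype=float)
    N = q.shape[0]
    p_i = np.full(N, 1.0 / N)                          # assumption: uniform P(i)
    p_c = p_i @ q                                      # P(C); assumes no empty cluster
    p_i_given_c = (q * p_i[:, None]) / p_c[None, :]    # Bayes rule, shape (N, Nc)
    # d(C): average distance between two elements drawn from the same cluster
    d_c = np.einsum("ic,jc,ij->c", p_i_given_c, p_i_given_c, dist)
    mean_d = float(p_c @ d_c)
    joint = q * p_i[:, None]                           # P(C, i)
    mask = joint > 0
    info = float(np.sum(joint[mask] * np.log((q / p_c[None, :])[mask])))
    return mean_d + T * info, mean_d, info

# Hypothetical example: four elements forming two tight pairs
dist = np.array([[0.0, 1.0, 5.0, 5.0],
                 [1.0, 0.0, 5.0, 5.0],
                 [5.0, 5.0, 0.0, 1.0],
                 [5.0, 5.0, 1.0, 0.0]])
hard = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
F, mean_d, info = clustering_functional(dist, hard, T=2.0)
```

For this hard assignment into two equally sized clusters, the information term equals log 2 and the average within-cluster distance is 0.5.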

Given $|C| = N_c$, T, and a random initial condition for $P(C|i)$, a solution is obtained by iterating a set of self-consistent equations (Slonim et al., 2005) until the convergence criterion $(\mathcal{F}_t - \mathcal{F}_{t+1})/\mathcal{F}_t < 10^{-5}$ is satisfied. We chose 40,000 different initial conditions for $P(C|i)$, along with randomly chosen values of $T \in [0.1, 1000]$ and $N_c \in \{2, 3, \ldots, 20\}$. For each set of initial conditions and parameters, we performed the optimization until the convergence criterion was met. We defined the Pareto front as the set of solutions $P(C|i)$ such that no other solution presents a smaller $\langle d \rangle$ and a smaller $I(C;i)$, and we only kept solutions that were along this front (eliminating duplicates). Finally, for each number of clusters we selected the solution with the lowest $\langle d \rangle$.
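The Pareto-front filtering step can be sketched as follows. Each candidate solution is summarized here by a pair (average distance, mutual information); following the wording above, a solution is discarded when some other solution is strictly smaller in both quantities. The function name and the example values are hypothetical.

```python
def pareto_front(solutions):
    """Keep the (mean_distance, information) pairs that no other pair beats on both counts.

    solutions : iterable of (mean_distance, information) tuples.
    Returns the non-dominated pairs, sorted and with duplicates removed.
    """
    unique = sorted(set(solutions))        # eliminate duplicate solutions
    front = []
    for d, info in unique:
        # Dominated if another solution has both a smaller distance and less information
        dominated = any(d2 < d and i2 < info for d2, i2 in unique)
        if not dominated:
            front.append((d, info))
    return front

# Hypothetical optimization results: (average within-cluster distance, I(C; i))
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.0)]
front = pareto_front(candidates)
```

In this toy set, the pair (3.0, 4.0) is removed because (2.0, 3.0) is better on both criteria, and the repeated pair is kept only once.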

For each number of clusters, we assess the modularity of the solution found by comparing $\langle d \rangle$ for the solution to the average distance corresponding to random cluster assignments. These assignments are made by randomly shuffling the vector that assigns each behavior to a particular cluster, so that the number of elements per cluster is conserved. The values presented in the main text correspond to the mean and standard deviation of $\langle d \rangle$ over 50 different random trials.

Deterministic Information Bottleneck

We use the Deterministic Information Bottleneck (DIB) method (Strouse and Schwab, 2017) to find coarse-grainings of the behavioral space that optimally predict future states. Inspired by the Information Bottleneck (IB) (Tishby et al., 1999), given two measured variables, X and Y, the DIB method finds a clustering, Z, of X, where Z is maximally informative of Y, but is as simple as possible. Specifically, we minimize the functional:

$$\mathcal{J} = -I(Y;Z) + \gamma H(Z) \qquad (10)$$

with respect to $p(z \in Z \,|\, x \in X)$. Here, γ is a Lagrange multiplier that modulates the relative importance of the two terms, with larger values of γ resulting in simpler representations.

In practice, to compute this minimum for a given value of γ and an initial condition for $p(z|x)$, we minimize

$$\mathcal{J}(\alpha) = \gamma H(Z) - \alpha H(Z|X) - I(Y;Z) \qquad (11)$$

with respect to $p(z \in Z \,|\, x \in X)$ and take the limit as $\alpha \to 0$, following the self-consistent equation procedure described in Strouse and Schwab, 2017.

To apply DIB to the behavioral dynamics, we count time in units of the transitions between states, providing a discrete time series of behaviors: b(n) can thus be one of N=134 different integer values at each discrete time n. Here, we relate the joint distributions of b(n) (X in Equation 10) and b(n+τ) (Y) through a coarse-grained clustering of the behavioral states (Z). Similar to our approach with information-based clustering (see previous section), we chose 10,000 different pairs of random values for γ between 0.1 and $10^4$ and $N_c$ between 2 and 30 clusters. Given $N_c$, γ, and a random initial condition for $p(z|x)$, we find a solution by iterating through a set of self-consistent equations (Strouse and Schwab, 2017) until the convergence criterion (an absolute change in the functional of less than $10^{-6}$) is satisfied. If any cluster has its probability become zero at any iteration, then that cluster is dropped for all future iterations. Thus, $N_c$ is the maximum number of clusters that can be returned. Of these 10,000 solutions, we keep all solutions that are on the Pareto front (i.e., no other solution has both a higher I(Y;Z) and a smaller H(Z)). The displayed clusters are the solutions on the Pareto front with the largest I(Y;Z) for a given number of clusters.
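A compact sketch of the hard-assignment update at the heart of this procedure is given below. It follows the general form of the deterministic update in Strouse and Schwab, 2017, rewritten for the γ parameterization of Equation 10, in which each state x is reassigned to the cluster z that maximizes γ log q(z) minus the Kullback-Leibler divergence between p(y|x) and q(y|z). The function names, the stopping rule (assignments no longer change, rather than a tolerance on the functional), the toy joint distribution, and the initial assignment are assumptions for illustration and do not reproduce the published implementation.

```python
import numpy as np

def lagged_joint(seq, n_states, tau):
    """Empirical joint distribution of (b(n), b(n + tau)) from one integer sequence."""
    seq = np.asarray(seq)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (seq[:-tau], seq[tau:]), 1.0)
    return counts / counts.sum()

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(pab):
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa * pb)[mask])))

def dib(pxy, init, gamma, max_iter=500):
    """Iterate hard cluster assignments; returns (assignment, I(Y;Z), H(Z)).

    pxy  : (n_x, n_y) joint distribution with every row having nonzero mass.
    init : (n_x,) initial integer cluster labels in 0..Nc-1.
    """
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1)
    py_given_x = pxy / px[:, None]
    z = np.asarray(init).copy()
    n_clusters = int(z.max()) + 1
    for _ in range(max_iter):
        scores = np.full((pxy.shape[0], n_clusters), -np.inf)
        for c in range(n_clusters):
            members = z == c
            qz = px[members].sum()
            if qz == 0:                      # empty clusters stay dropped
                continue
            qy = pxy[members].sum(axis=0) / qz
            with np.errstate(divide="ignore", invalid="ignore"):
                log_ratio = np.where(py_given_x > 0, np.log(py_given_x / qy), 0.0)
            kl = np.sum(py_given_x * log_ratio, axis=1)
            scores[:, c] = gamma * np.log(qz) - kl
        new_z = scores.argmax(axis=1)
        if np.array_equal(new_z, z):         # assignments stopped changing
            break
        z = new_z
    pzy = np.vstack([pxy[z == c].sum(axis=0) for c in range(n_clusters)])
    return z, mutual_information(pzy), entropy(pzy.sum(axis=1))

# Toy joint distribution: states 0,1 predict y=0; states 2,3 predict y=1
py_given_x = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
pxy_toy = 0.25 * py_given_x
z, info, h_z = dib(pxy_toy, init=[0, 1, 1, 1], gamma=0.1)
```

In the analysis described above, the joint distribution would instead be built from the behavioral time series at lag τ (for instance with a helper like `lagged_joint`), and many random values of γ, cluster numbers, and initial assignments would be explored before selecting solutions along the Pareto front.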

Weighted similarity index

We quantify the similarity between clustering partitions using the Weighted Similarity Index (WSI), a modification of the Rand Index (Rand, 1971) such that behaviors contribute to the index according to their overall probability. Specifically,

$$\mathrm{WSI} = \sum_{(i,j) \in S_a} W_{ij} + \sum_{(k,l) \in S_b} W_{kl}, \qquad W_{ij} = \frac{P_i P_j}{\sum_{k \neq l} P_k P_l},$$

where $S_a$ ($S_b$) is the set of pairs of behaviors that belong to the same (different) cluster in both partitions and $P_k$ is the probability of observing behavior k.
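A direct way to evaluate this index is to mark, for every pair of distinct behaviors, whether the two partitions agree (the pair is together in both or apart in both) and to sum the normalized weights of the agreeing pairs. The sketch below treats pairs as ordered with i ≠ j in both the numerator and the normalization, which is an assumption about the convention; the function name and example partitions are hypothetical.

```python
import numpy as np

def weighted_similarity_index(a, b, P):
    """Probability-weighted Rand-type agreement between two partitions.

    a, b : (K,) integer cluster labels of each behavior under the two partitions.
    P    : (K,) probability of observing each behavior.
    """
    a, b, P = np.asarray(a), np.asarray(b), np.asarray(P, dtype=float)
    same_a = a[:, None] == a[None, :]          # pair shares a cluster in partition a
    same_b = b[:, None] == b[None, :]          # pair shares a cluster in partition b
    agree = same_a == same_b                   # together in both, or apart in both
    W = np.outer(P, P)                         # unnormalized pair weights P_i * P_j
    off_diag = ~np.eye(len(P), dtype=bool)     # exclude i == j
    return float(W[agree & off_diag].sum() / W[off_diag].sum())

# Hypothetical partitions of four equally likely behaviors
P_uniform = np.full(4, 0.25)
wsi_same = weighted_similarity_index([0, 0, 1, 1], [0, 0, 1, 1], P_uniform)
wsi_diff = weighted_similarity_index([0, 0, 1, 1], [0, 1, 1, 1], P_uniform)
```

Identical partitions give an index of one; in the second comparison, three of the six distinct pairs are treated consistently, so the index is 0.5 when all behaviors are equally likely.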

Acknowledgements

We thank Ilya Nemenman, Jennifer Rieser, and Daniel Weissman for their helpful comments on the manuscript. DGH was supported by Programa Raices from the MinCyT. CR was supported by the NSF Physics of Living Systems Student Research Network (1806833). GJB was supported by NIMH R01 MH115831-01, the Human Frontier Science Program (RGY0076/2018), and a Cottrell Scholar Award, a program of the Research Corporation for Science Advancement (25999). JC, DLS, and GJB were supported by the Howard Hughes Medical Institute and the Janelia visiting researcher program.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Gordon J Berman, Email: gordon.berman@emory.edu.

Jesse H Goldberg, Cornell University, United States.

Christian Rutz, University of St Andrews, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • National Institute of Mental Health MH115831-01 to Gordon J Berman.

  • Human Frontier Science Program RGY0076/2018 to Gordon J Berman.

  • Howard Hughes Medical Institute to Jessica Cande, David L Stern, Gordon J Berman.

  • Research Corporation for Science Advancement 25999 to Gordon J Berman.

  • National Science Foundation 1806833 to Catalina Rivera.

  • Ministerio de Ciencia, Tecnología e Innovación de Argentina to Damián G Hernández.

Additional information

Competing interests

Reviewing editor, eLife.

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Data curation, Investigation, Methodology, Writing - review and editing.

Conceptualization, Software, Investigation, Methodology, Writing - review and editing.

Conceptualization, Resources, Funding acquisition, Investigation, Visualization, Methodology, Writing - review and editing.

Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Additional files

Source data 1. Fly behavior source data.
elife-61806-data1.zip (505.4KB, zip)
Transparent reporting form

Data availability

All behavioral region information is submitted with the article and is posted on GitHub (https://github.com/bermanlabemory/behavioral-evolution, copy archived at https://archive.softwareheritage.org/swh:1:rev:b01a6e3a2c7da193f38631dfe925c65229494d74). The original video data are too large to post (tens of TB), but will be made available upon request.

References

  1. Akaike H. Information theory and an extension of the maximum likelihood principle. Proceedings of the 2nd International Symposium on Information Theory; 1973. pp. 267–281.
  2. Anderson DJ. Circuit modules linking internal states and social behaviour in flies and mice. Nature Reviews Neuroscience. 2016;17:692–704. doi: 10.1038/nrn.2016.125.
  3. Auer TO, Benton R. Sexual circuitry in Drosophila. Current Opinion in Neurobiology. 2016;38:18–26. doi: 10.1016/j.conb.2016.01.004.
  4. Baier F, Hoekstra HE. The genetics of morphological and behavioural island traits in deer mice. Proceedings of the Royal Society B. 2019;286:20191697. doi: 10.1098/rspb.2019.1697.
  5. Baker BS, Taylor BJ, Hall JC. Are complex behaviors specified by dedicated regulatory genes? reasoning from Drosophila. Cell. 2001;105:13–24. doi: 10.1016/S0092-8674(01)00293-8.
  6. Bennet-Clark HC, Ewing AW. Pulse interval as a critical parameter in the courtship song of Drosophila melanogaster. Animal Behaviour. 1969;17:755–759. doi: 10.1016/S0003-3472(69)80023-0.
  7. Berman GJ, Choi DM, Bialek W, Shaevitz JW. Mapping the stereotyped behaviour of freely moving fruit flies. Journal of the Royal Society Interface. 2014;11:20140672. doi: 10.1098/rsif.2014.0672.
  8. Berman GJ, Bialek W, Shaevitz JW. Predictability and hierarchy in Drosophila behavior. PNAS. 2016;113:11943–11948. doi: 10.1073/pnas.1607601113.
  9. Berman GJ. Measuring behavior across scales. BMC Biology. 2018;16:23. doi: 10.1186/s12915-018-0494-7.
  10. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics. 1998;7:434–455. doi: 10.1080/10618600.1998.10474787.
  11. Brown AEX, de Bivort B. Ethology as a physical science. Nature Physics. 2018;14:653–657. doi: 10.1038/s41567-018-0093-0.
  12. Caetano DS, Beaulieu JM. Comparative analyses of phenotypic sequences using phylogenetic trees. The American Naturalist. 2020;195:E38–E50. doi: 10.1086/706912.
  13. Calhoun AJ, Pillow JW, Murthy M. Unsupervised identification of the internal states that shape natural behavior. Nature Neuroscience. 2019;22:2040–2049. doi: 10.1038/s41593-019-0533-x.
  14. Cande J, Andolfatto P, Prud'homme B, Stern DL, Gompel N. Evolution of multiple additive loci caused divergence between Drosophila yakuba and D. santomea in wing rowing during male courtship. PLOS ONE. 2012;7:e43888. doi: 10.1371/journal.pone.0043888.
  15. Cande J, Stern DL, Morita T, Prud’homme B, Gompel N. Looking Under the Lamp Post: Neither fruitless nor doublesex Has Evolved to Generate Divergent Male Courtship in Drosophila. Cell Reports. 2014;8:363–370. doi: 10.1016/j.celrep.2014.06.023.
  16. Cande J, Namiki S, Qiu J, Korff W, Card GM, Shaevitz JW, Stern DL, Berman GJ. Optogenetic dissection of descending behavioral control in Drosophila. eLife. 2018;7:e34275. doi: 10.7554/eLife.34275.
  17. Chyb S, Gompel N. Atlas of Drosophila Morphology: Wild-Type and Classical Mutants. London: Academic Press; 2013.
  18. Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia AC, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jaffe DB, Jagadeeshan S, Jeck WR, Johnson J, Jones CD, Jordan WC, Karpen GH, Kataoka E, Keightley PD, Kheradpour P, Kirkness EF, Koerich LB, Kristiansen K, Kudrna D, Kulathinal RJ, Kumar S, Kwok R, Lander E, Langley CH, Lapoint R, Lazzaro BP, Lee SJ, Levesque L, Li R, Lin CF, Lin MF, Lindblad-Toh K, Llopart A, Long M, Low L, Lozovsky E, Lu J, Luo M, Machado CA, Makalowski W, Marzo M, Matsuda M, Matzkin L, McAllister B, McBride CS, McKernan B, McKernan K, Mendez-Lago M, Minx P, Mollenhauer MU, Montooth K, Mount SM, Mu X, Myers E, Negre B, Newfeld S, Nielsen R, Noor MA, O'Grady P, Pachter L, Papaceit M, Parisi MJ, Parisi M, Parts L, Pedersen JS, Pesole G, Phillippy AM, Ponting CP, Pop M, Porcelli D, Powell JR, Prohaska S, Pruitt K, Puig M, Quesneville H, Ram KR, Rand D, Rasmussen MD, Reed LK, Reenan R, Reily A, Remington KA, Rieger TT, Ritchie MG, Robin C, Rogers YH, Rohde C, Rozas J, 
Rubenfield MJ, Ruiz A, Russo S, Salzberg SL, Sanchez-Gracia A, Saranga DJ, Sato H, Schaeffer SW, Schatz MC, Schlenke T, Schwartz R, Segarra C, Singh RS, Sirot L, Sirota M, Sisneros NB, Smith CD, Smith TF, Spieth J, Stage DE, Stark A, Stephan W, Strausberg RL, Strempel S, Sturgill D, Sutton G, Sutton GG, Tao W, Teichmann S, Tobari YN, Tomimura Y, Tsolas JM, Valente VL, Venter E, Venter JC, Vicario S, Vieira FG, Vilella AJ, Villasante A, Walenz B, Wang J, Wasserman M, Watts T, Wilson D, Wilson RK, Wing RA, Wolfner MF, Wong A, Wong GK, Wu CI, Wu G, Yamamoto D, Yang HP, Yang SP, Yorke JA, Yoshida K, Zdobnov E, Zhang P, Zhang Y, Zimin AV, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D'Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O'Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, Raghuraman S, Rege F, Reyes R, Rise C, 
Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Gnerre S, Grabherr M, Kleber M, Mauceli E, MacCallum I. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–218. doi: 10.1038/nature06341.
  19. Cunningham CW, Omland KE, Oakley TH. Reconstructing ancestral character states: a critical reappraisal. Trends in Ecology & Evolution. 1998;13:361–366. doi: 10.1016/S0169-5347(98)01382-2.
  20. Deutsch D, Pacheco D, Encarnacion-Rivera L, Pereira T, Fathy R, Clemens J, Girardin C, Calhoun A, Ireland E, Burke A, Dorkenwald S, McKellar C, Macrina T, Lu R, Lee K, Kemnitz N, Ih D, Castro M, Halageri A, Jordan C, Silversmith W, Wu J, Seung HS, Murthy M. The neural basis for a persistent internal state in Drosophila females. eLife. 2020;9:e59502. doi: 10.7554/eLife.59502.
  21. Ding Y, Lillvis JL, Cande J, Berman GJ, Arthur BJ, Long X, Xu M, Dickson BJ, Stern DL. Neural evolution of Context-Dependent fly song. Current Biology. 2019;29:1089–1099. doi: 10.1016/j.cub.2019.02.019.
  22. Duistermars BJ, Pfeiffer BD, Hoopfer ED, Anderson DJ. A brain module for scalable control of complex, Multi-motor threat displays. Neuron. 2018;100:1474–1490. doi: 10.1016/j.neuron.2018.10.027.
  23. Ellison CK, Wiley C, Shaw KL. The genetics of speciation: genes of small effect underlie sexual isolation in the hawaiian cricket Laupala. Journal of Evolutionary Biology. 2011;24:1110–1119. doi: 10.1111/j.1420-9101.2011.02244.x.
  24. Felsenstein J. Phylogenies and the comparative method. The American Naturalist. 1985;125:1–15. doi: 10.1086/284325.
  25. Felsenstein J. Using the quantitative genetic threshold model for inferences between and within species. Philosophical Transactions of the Royal Society B: Biological Sciences. 2005;360:1427–1434. doi: 10.1098/rstb.2005.1669.
  26. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7:457–472. doi: 10.1214/ss/1177011136.
  27. Gleason JM, Ritchie MG. Do quantitative trait loci (QTL) for a courtship song difference between Drosophila simulans and D. sechellia coincide with candidate genes and intraspecific QTL? Genetics. 2004;166:1303–1311. doi: 10.1534/genetics.166.3.1303.
  28. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, USA: MIT press; 2016.
  29. Hadfield JD. MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package. Journal of Statistical Software. 2010;33:1–22. doi: 10.18637/jss.v033.i02.
  30. Hadfield JD, Nakagawa S. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. Journal of Evolutionary Biology. 2010;23:494–508. doi: 10.1111/j.1420-9101.2009.01915.x.
  31. Hansen TF, Martins EP. Translating between microevolutionary process and macroevolutionary patterns: the correlation structure of interspecific data. Evolution. 1996;50:1404–1417. doi: 10.1111/j.1558-5646.1996.tb03914.x.
  32. Hoopfer ED. Neural control of aggression in Drosophila. Current Opinion in Neurobiology. 2016;38:109–118. doi: 10.1016/j.conb.2016.04.007.
  33. Hu CK, Hoekstra HE. Peromyscus burrowing: a model system for behavioral evolution. Seminars in Cell & Developmental Biology. 2017;61:107–114. doi: 10.1016/j.semcdb.2016.08.001.
  34. Levine M, Davidson EH. Gene regulatory networks for development. PNAS. 2005;102:4936–4942. doi: 10.1073/pnas.0408031102.
  35. Ligon RA, Diaz CD, Morano JL, Troscianko J, Stevens M, Moskeland A, Laman TG, Scholes E. Evolution of correlated complexity in the radically different courtship signals of birds-of-paradise. PLOS Biology. 2018;16:e2006962. doi: 10.1371/journal.pbio.2006962.
  36. Lorenz KZ. The evolution of behavior. Scientific American. 1958;199:67–78. doi: 10.1038/scientificamerican1258-67.
  37. Martins EP. Phylogenies and the Comparative Method in Animal Behavior. Oxford: Oxford University Press; 1996.
  38. Martins EP, Hansen TF. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. The American Naturalist. 1997;149:646–667. doi: 10.1086/286013.
  39. Massey JH, Chung D, Siwanowicz I, Stern DL, Wittkopp PJ. The yellow gene influences Drosophila male mating success through sex comb melanization. eLife. 2019;8:e49388. doi: 10.7554/eLife.49388.
  40. Mathis MW, Mathis A. Deep learning tools for the measurement of animal behavior in neuroscience. Current Opinion in Neurobiology. 2020;60:1–11. doi: 10.1016/j.conb.2019.10.008.
  41. Meyer F. Topographic distance and watershed lines. Signal Processing. 1994;38:113–125. doi: 10.1016/0165-1684(94)90060-4.
  42. Obbard DJ, Maclennan J, Kim KW, Rambaut A, O'Grady PM, Jiggins FM. Estimating divergence dates and substitution rates in the Drosophila phylogeny. Molecular Biology and Evolution. 2012;29:3459–3473. doi: 10.1093/molbev/mss150.
  43. O’Meara BC. Evolutionary inferences from phylogenies: a review of methods. Annual Review of Ecology, Evolution, and Systematics. 2012;43:267–285. doi: 10.1146/annurev-ecolsys-110411-160331.
  44. Pereira TD, Aldarondo DE, Willmore L, Kislin M, Wang SS, Murthy M, Shaevitz JW. Fast animal pose estimation using deep neural networks. Nature Methods. 2019;16:117–125. doi: 10.1038/s41592-018-0234-5.
  45. Rand WM. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association. 1971;66:846–850. doi: 10.1080/01621459.1971.10482356.
  46. Royer-Carenzi M, Didier G. A comparison of ancestral state reconstruction methods for quantitative characters. Journal of Theoretical Biology. 2016;404:126–142. doi: 10.1016/j.jtbi.2016.05.029.
  47. Seeds AM, Ravbar P, Chung P, Hampel S, Midgley FM, Mensh BD, Simpson JH. A suppression hierarchy among competing motor programs drives sequential grooming in Drosophila. eLife. 2014;3:e02951. doi: 10.7554/eLife.02951.
  48. Seetharam AS, Stuart GW. Whole genome phylogeny for 21 Drosophila species using predicted 2b-RAD fragments. PeerJ. 2013;1:e226. doi: 10.7717/peerj.226.
  49. Shaw KL, Lesnick SC. Genomic linkage of male song and female acoustic preference QTL underlying a rapid species radiation. PNAS. 2009;106:9737–9742. doi: 10.1073/pnas.0900229106.
  50. Shubin N, Tabin C, Carroll S. Deep homology and the origins of evolutionary novelty. Nature. 2009;457:818–823. doi: 10.1038/nature07891.
  51. Slonim N, Atwal GS, Tkacik G, Bialek W. Information-based clustering. PNAS. 2005;102:18297–18302. doi: 10.1073/pnas.0507432102.
  52. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B. 2002;64:583–639. doi: 10.1111/1467-9868.00353.
  53. Stern DL, Crocker J, Ding Y, Frankel N, Kappes G, Kim E, Kuzmickas R, Lemire A, Mast JD, Picard S. Genetic and transgenic reagents for Drosophila simulans, D. mauritiana, D. yakuba, D. santomea, and D. virilis. G3: Genes, Genomes, Genetics. 2017;7:1339–1347. doi: 10.1534/g3.116.038885.
  54. Stern DL, Frankel N. The structure and evolution of cis-regulatory regions: the shavenbaby story. Philosophical Transactions of the Royal Society B: Biological Sciences. 2013;368:20130028. doi: 10.1098/rstb.2013.0028.
  55. Strouse DJ, Schwab DJ. The deterministic information bottleneck. Neural Computation. 2017;29:1611–1630. doi: 10.1162/NECO_a_00961.
  56. Tajima F. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics. 1993;135:599–607. doi: 10.1093/genetics/135.2.599.
  57. Tinbergen N. The Study of Instinct. Oxford: Oxford University Press; 1951.
  58. Tishby N, Pereira FC, Bialek W. The information bottleneck method. Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing Urbana-Champaign; 1999. pp. 368–377.
  59. van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research. 2008;9:2579–2605.
  60. Weber JN, Peterson BK, Hoekstra HE. Discrete genetic modules are responsible for complex burrow evolution in Peromyscus mice. Nature. 2013;493:402–405. doi: 10.1038/nature11816.
  61. West-Eberhard MJ. Developmental Plasticity and Evolution. Oxford: Oxford University Press; 2003.
  62. Williams TM, Carroll SB. Genetic and molecular insights into the development and evolution of sexual dimorphism. Nature Reviews Genetics. 2009;10:797–804. doi: 10.1038/nrg2687.
  63. Yamamoto D, Ishikawa Y. Genetic and neural bases for species-specific behavior in Drosophila species. Journal of Neurogenetics. 2013;27:130–142. doi: 10.3109/01677063.2013.800060.
  64. Yang Z. Computational Molecular Evolution. Oxford: Oxford University Press; 2006.

Decision letter

Editor: Jesse H Goldberg
Reviewed by: Iain D Couzin

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Different animal species exhibit distinct behavioral repertoires, even in cases where common ancestors are relatively recent. How do repertoires acquire species-specificity during evolution? This manuscript provides new methods to examine behavioral patterns of different Drosophila species, and even to reconstruct potentially ancestral behavioral modes. Consistent with past work, the authors find that behaviors are inherited in clusters. Overall, this stimulating paper presents an exciting method for the unbiased study of behavioral evolution based on large datasets that are increasingly common in animal behavior, which differs from previous work focusing on specific traits.

Decision letter after peer review:

Thank you for submitting your article "A framework for studying behavioral evolution by reconstructing ancestral repertoires" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Christian Rutz as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Iain D Couzin (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is “in revision at eLife”. Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

In this paper, the authors build upon their previous (ground-breaking) work in which automated methodology was employed to quantify the structure of behaviour and its regulation in fruit flies. Specifically, here, they address whether such methodology can be insightful in the study of how behaviour evolves, again using fruit flies as convenient and powerful model species (their phylogenetic relationships are well-understood, they do well in the lab, and it has been previously shown how well the behaviour of these species can be decomposed using automated methodology). There are a host of reasons why such an analysis is of value, not least because revealing the hidden structure of behavioural regulation has the potential to be informative regarding ancestral behavioural repertoires, and thus how it evolves. Overall, there is consensus that this is a thought-provoking paper that provides a valuable starting point in the analysis of behavioural evolution via detailed quantitative ethology. With added consideration of the factors discussed below this work makes a very helpful and novel contribution.

Essential revisions:

1. Consider behavioural transitions

All three reviewers were confused as to why transitions between states were not considered. Given that the authors have previously shown their capacity to reconstruct the hierarchical nested structure of behavioural states, typically represented as a network with transition probabilities between states, which would seemingly be extremely informative about the evolution of behavioural modules, all reviewers were left wondering why they did not consider comparisons of these networks -- especially given the array of network analysis tools available. Organisms, as they are well aware, don't just jump randomly from one behaviour to another, but there is a hierarchical organisation by which one behaviour influences the probability that another is elicited, and so on. Why did the authors not do this? There may be very good reasons, but it will be important for the reader to be informed. This would seem to be the most direct way to address how behaviour is structured. It is not clear to the reviewers exactly how much extra work this would be -- or how the scope of the paper would be required to change if transitions were considered. Thus, while formal inclusion of transitions into the framework is strongly preferred for revision, it is not absolutely required. At the minimum, the authors need to clarify exactly why transitions were not considered as well as what may be gained from such an approach in future work.

2. Confirm utility of the approach, its discoveries, and how it compares to more focused inquiries into the evolution of behaviour

The work suffers from a lack of clarity over why/how the method is superior to the focused trait approach. The fact that behaviours co-evolve in suites is by itself not especially novel. The authors should bring to the forefront, if possible, exactly how and why this approach is superior to others. Ultimately, the paper is an argument for the utility of studying repertoires rather than specific behaviours. The main thrust the paper tries to make is that we learn something new by looking at the evolution of repertoires of behaviour rather than focused analyses of specific behaviours (really they are looking at time budgets and perhaps that's a bit of a disconnect from their message as well). But they put forth in Figure 3 a reconstruction that seems to generate a nonsense behavioural repertoire for the ancestral species. So the authors should address the validity of using the specific phylogenetic methods to reconstruct behaviours based on the repertoires. They can address this point in two ways:

2.1. Generate simulations of repertoire evolution on the known Drosophila phylogeny and then take the endpoint repertoires they have evolved and assess how well the reconstruction of the known ancestral state works. People do these sorts of studies regularly to assess phylogenetic reconstruction approaches and methods (e.g., Royer-Carenzi and Didier 2016, J. Theoretical Biol.). This might be done with a real or arbitrary repertoire as the starting point for the simulation. This is crucial since it appears the rest of their findings hinge on the ancestral reconstruction.

2.2. As a complement to 2.1, the authors might assess the reconstruction of the repertoires in comparison to ancestral trait reconstructions for specific behavioural traits that might be more conventionally measured, assuming this can be extracted from the existing data. Part of the argument the authors make is that by considering the repertoire we achieve a more complete understanding of behavioral evolution. The approach seems very promising, though it is difficult to assess how it compares to other approaches for measuring behaviour. However good or bad the reconstructions of ancestral behaviours are for the repertoires, how do they compare to a focused trait approach? Perhaps better? Perhaps worse? Unless a comparison is done it's hard to say. More broadly, the paper argues for the utility of this approach to studying behaviour but it is not clear how it compares to analyses that focus on specific elements of behaviour rather than time budget vectors.
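The kind of simulation check requested in 2.1 can be illustrated with a deliberately simplified sketch. It assumes a single continuous trait evolving by Brownian motion on a hypothetical star phylogeny (every species descends independently from the root), which is neither the real Drosophila tree nor the GLMM used in the paper; the function names, branch lengths, and rate are illustrative only. Under these assumptions the maximum-likelihood root value is the inverse-branch-length weighted mean of the tip values, so the reconstruction can be compared directly with the known starting point.

```python
import random

def simulate_tips(root_value, branch_lengths, sigma, rng):
    """Brownian motion on a star tree: each tip drifts independently from the root,
    with variance sigma**2 * branch_length."""
    return [root_value + rng.gauss(0.0, sigma * t ** 0.5) for t in branch_lengths]

def reconstruct_root(tip_values, branch_lengths):
    """Maximum-likelihood root under Brownian motion on a star tree:
    an inverse-variance (1 / branch length) weighted mean of the tips."""
    weights = [1.0 / t for t in branch_lengths]
    return sum(w * x for w, x in zip(weights, tip_values)) / sum(weights)

# Hypothetical setup: six tips, arbitrary branch lengths, known ancestral value.
branch_lengths = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
true_root = 0.5
rng = random.Random(42)

# Repeat the simulate-then-reconstruct cycle and record the reconstruction error.
errors = []
for _ in range(2000):
    tips = simulate_tips(true_root, branch_lengths, sigma=0.2, rng=rng)
    errors.append(reconstruct_root(tips, branch_lengths) - true_root)

mean_error = sum(errors) / len(errors)
print(f"mean reconstruction error over {len(errors)} runs: {mean_error:.4f}")
```

A full version of the check would replace the star tree with the known phylogeny and the scalar trait with the whole repertoire vector.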

3. Make the paper more accessible for a general audience

A major issue with this paper was its inaccessibility, which could, unfortunately, ultimately reduce its impact. Most readers are by now familiar with past work using clustering methods (e.g., Cande et al., 2018; Berman et al., 2014) -- but many of the analyses in this paper are novel, difficult to penetrate, and poorly introduced for a general audience. The authors should go through this paper with a fine-toothed comb to make each analysis as accessible as possible. Simple fixes could include more plain language around what each analysis is testing/achieving, as well as consistency in terminology. Specifically, the final sentence of each paragraph could state in plain terms what the preceding analysis has just demonstrated, ruled in, or ruled out.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "A framework for studying behavioral evolution by reconstructing ancestral repertoires" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Jesse H Goldberg as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Christian Rutz as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Iain D Couzin (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this decision letter to help you prepare a revised submission.

Please address continued concerns from Reviewer #3, and please attempt to clarify key sections of the text as recommended by Reviewer #1.

1) Reviewer #3: The paper has improved in many respects but the revision failed to deal with what I see as the most glaring issue (and which I had raised in my previous review).

All the extant species show biases in behaviors as shown by the hotter/browner colors in their behavioral maps. This suggests that we should expect that any species of Drosophila, extant or extinct, would be highly likely to have biased behaviors as well. However, the ancestrally reconstructed behavioral repertoire is totally washed out – there are no clear biases in behavior in the ancestral graph that is estimated in figure 3. This suggests to me that the method potentially has flaws.

I had suggested that the authors start with a PDF from one of their species and then simulate its evolution along the known Drosophila phylogeny and then try to reconstruct that repertoire using ancestral state reconstruction. If that process ends up showing a similarly 'flat' PDF where everything is similarly low probability (i.e. a totally blue PDF as shown in figure 3) then there is some cause for concern. If it were to retrieve a PDF similar in structure to the one that initiated the simulation, then that would be very reassuring.

The author responses seem to say that is what they have done in using the MCMC method but where the ancestral reconstruction analysis of a simulated evolutionary trajectory is to be found in the paper is unclear to me. I searched 'simulat' in the text of the manuscript and cannot see where trait evolution was simulated and then used as the starting values to try to evaluate how closely the ancestral reconstruction gets to the known ancestral starting point.

There is a lot in this paper, so although I feel this issue is important to address, it does not necessarily preclude the paper from being published in my view. I would still like to see the authors address the 'flatness' of the ancestral state that they present in figure 3, which seems unrealistic (to me at least).

2) Reviewer #1: The writing is slightly improved, though some sections could still be improved by concluding paragraphs not just with a mathematical result but also what it means in plain terms. E.g. Lines 264-7 conclude with the finding that the covariance matrix has a 'far from random modular structure;' this does not explain how this result relates to the title of the section (Individual Variability and long timescale correlations).

3) Please note that eLife has recently adopted the STRANGE guidelines for animal behaviour research:

https://doi.org/10.1038/d41586-020-01751-5

https://reviewer.elifesciences.org/author-guide/journal-policies

In your revisions, please consider scope for sampling biases in your study and how these may limit the generalisability of your findings, and make declarations as necessary. A few sentences in the section "Data collection" (lines 462-470) may suffice.

eLife. 2021 Sep 2;10:e61806. doi: 10.7554/eLife.61806.sa2

Author response


Essential revisions:

1. Consider behavioural transitions

All three reviewers were confused as to why transitions between states were not considered. Given that the authors have previously shown their capacity to reconstruct the hierarchical nested structure of behavioural states, typically represented as a network with transition probabilities between states, which would seemingly be extremely informative about the evolution of behavioural modules, all reviewers were left wondering why they did not consider comparisons of these networks -- especially given the array of network analysis tools available. Organisms, as they are well aware, don't just jump randomly from one behaviour to another, but there is a hierarchical organisation by which one behaviour influences the probability that another is elicited, and so on. Why did the authors not do this? There may be very good reasons, but it will be important for the reader to be informed. This would seem to be the most direct way to address how behaviour is structured. It is not clear to the reviewers exactly how much extra work this would be -- or how the scope of the paper would be required to change if transitions were considered. Thus, while formal inclusion of transitions into the framework is strongly preferred for revision, it is not absolutely required. At the minimum, the authors need to clarify exactly why transitions were not considered as well as what may be gained from such an approach in future work.

We agree that studying the evolution of the transition structure between behaviors is an important subject, and we are currently developing methods to study it. The primary difficulty here is the number of parameters involved. Because we are fitting two covariance matrices and an ancestral mean, we need to fit at least (N + 1)N + N parameters, where N is the number of traits being reconstructed. Currently, we study a system that has N = 134 traits, leading to 18,224 parameters that we must fit with the GLMM. If we were to add the transition probabilities (or transition rates), this would add another ≈N² traits (N + N² = 18,090 in total). Thus, instead of fitting a system with ≈18,000 parameters, our model would now have 327,284,280 parameters. While we would likely be able to perform the computation, we would be unlikely to believe the results at this point because our data set size is insufficient (the amount of time needed to accurately sample a transition matrix is a power of two longer than the amount of time needed to accurately sample the behavioral frequencies).
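The parameter counts quoted above can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid, not part of the analysis code; the function name is hypothetical, and it assumes (this is not spelled out in the text) that adding transitions enlarges the trait vector from N entries to N + N² entries, one per element of an N × N transition matrix.

```python
def glmm_parameter_count(n_traits: int) -> int:
    """Two symmetric covariance matrices plus an ancestral mean: (N + 1) * N + N."""
    return (n_traits + 1) * n_traits + n_traits

n_behaviors = 134

# Behavioral frequencies only.
frequencies_only = glmm_parameter_count(n_behaviors)

# Assumption: every entry of an N x N transition matrix becomes an extra trait.
n_with_transitions = n_behaviors + n_behaviors ** 2
with_transitions = glmm_parameter_count(n_with_transitions)

print(f"{frequencies_only:,}")   # 18,224
print(f"{with_transitions:,}")   # 327,284,280
```

Under that assumption the count grows roughly as the fourth power of N, which is why the data requirements increase so sharply.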

In addition, these traits (the transition matrix values) have different units than the behaviors, and the transition probabilities mathematically depend on the behavioral frequencies, since the leading eigenvector of the transition matrix is (by definition) proportional to the average behavioral frequency usage. Thus, it becomes necessary to add additional regularizing or normalizing factors, further complicating the analysis in a manner that is beyond the scope of what we aimed to demonstrate here.
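The dependence between transition probabilities and behavioral frequencies can be illustrated on a toy example. This is a minimal sketch, assuming a row-stochastic convention (entry `T[i, j]` is the probability of moving from behavior `i` to behavior `j`) and a made-up three-state matrix; neither the convention nor the numbers come from the paper.

```python
import numpy as np


def stationary_distribution(T: np.ndarray) -> np.ndarray:
    """Return the leading left eigenvector of a row-stochastic matrix,
    normalized so that its entries sum to one."""
    eigvals, eigvecs = np.linalg.eig(T.T)  # left eigenvectors of T
    lead = np.argmax(eigvals.real)         # eigenvalue 1 is the largest
    pi = np.real(eigvecs[:, lead])
    return pi / pi.sum()                   # also fixes an overall sign flip


# Hypothetical transition matrix between three behaviors (rows sum to 1).
T = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.20, 0.20, 0.60]])

pi = stationary_distribution(T)
print(pi)      # long-run fraction of time spent in each behavior
print(pi @ T)  # equals pi: the frequencies are fixed by the transitions
```

Because the long-run frequencies are fully determined by the transition matrix, the two sets of traits cannot be treated as independent quantities in a joint model.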

Although transitions are not included in our model, one of the main points that we have highlighted is the similarity between the hierarchical structure that explains long timescale transitions, which we demonstrated previously, and the variability we observe here between individuals of the same species.

Again, though, this is a problem that we find extremely interesting, and we are actively working on exploring how to solve these technical and representational questions. We had some text describing this as a future direction in our Discussion section, but we have now expanded this section to further point out the importance of this question and some of the current technical challenges surrounding the study of the evolution of behavioral transitions (lines 428-441).

2. Confirm utility of the approach, its discoveries, and how it compares to more focused inquiries into the evolution of behaviour

The work suffers from a lack of clarity over why/how the method is superior to the focused trait approach. The fact that behaviours co-evolve in suites is by itself not especially novel. The authors should bring to the forefront, if possible, exactly how and why this approach is superior to others. Ultimately, the paper is an argument for the utility of studying repertoires rather than specific behaviours. The main thrust the paper tries to make is that we learn something new by looking at the evolution of repertoires of behaviour rather than focused analyses of specific behaviours (really they are looking at time budgets and perhaps that's a bit of a disconnect from their message as well). But they put forth in Figure 3 a reconstruction that seems to generate a nonsense behavioural repertoire for the ancestral species. So the authors should address the validity of using the specific phylogenetic methods to reconstruct behaviours based on the repertoires. They can address this point in two ways:

2.1. Generate simulations of repertoire evolution on the known Drosophila phylogeny and then take the endpoint repertoires they have evolved and assess how well the reconstruction of the known ancestral state works. People do these sorts of studies regularly to assess phylogenetic reconstruction approaches and methods (e.g., Royer-Carenzi and Didier 2016, J. Theoretical Biol.). This might be done with a real or arbitrary repertoire as the starting point for the simulation. This is crucial since it appears the rest of their findings hinge on the ancestral reconstruction.

This process is precisely what we are doing with our MCMC method (which was originally developed by Hadfield, as cited in the text). Specifically, given a known tree structure, we select a common ancestral mean behavior and two behavioral covariance matrices such that the likelihood of observing the measured behavioral distribution (including the covariances between the behaviors) is maximized. In addition to showing that our model has converged (Figure 3, Supplement 1), we also show that the predicted behavioral means at the endpoint species match the data (Figure 3, Supplement 2).

2.2. As a complement to 2.1, the authors might assess the reconstruction of the repertoires in comparison to ancestral trait reconstructions for specific behavioural traits that might be more conventionally measured, assuming this can be extracted from the existing data. Part of the argument the authors make is that by considering the repertoire we achieve a more complete understanding of behavioral evolution. The approach seems very promising, though it is difficult to assess how it compares to other approaches for measuring behaviour. However good or bad the reconstructions of ancestral behaviours are for the repertoires, how do they compare to a focused trait approach? Perhaps better? Perhaps worse? Unless a comparison is done it's hard to say. More broadly, the paper argues for the utility of this approach to studying behaviour but it is not clear how it compares to analyses that focus on specific elements of behaviour rather than time budget vectors.

To address these points, we have added an extra section in the main text, as well as in Materials and methods, that shows a comparison between our model and a simpler model where behaviors evolve independently (lines 214-229 and 491-524, as well as Figure 3, Supplement 3). In order to compare the performance of models with different complexity, we need a measure that not only takes into account how well the data are fit (i.e., maximizes likelihood), but that also penalizes the addition of extra parameters in more complex models (in this case, the off-diagonal terms of the two behavioral covariance matrices). For hierarchical models fit using MCMC methods (such as ours), the deviance information criterion (DIC) is the standard approach for model selection (Spiegelhalter et al., 2002). The DIC is a generalization of the Akaike criterion (see Materials and methods for additional details), and it is proportional to the negative log-likelihood evaluated at the mean of the posterior parameters, plus a penalty term proportional to the effective number of parameters in the model. The values of DIC for our model are much smaller than those for the models with a focused analysis, indicating that modeling covariance between behaviors explains the observed data much better than treating all behaviors independently. The implications of this result have been further expanded and discussed in the text, and we thank the reviewers for suggesting this validation approach.
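The DIC described above can be computed directly from posterior samples. This is a minimal sketch that follows the usual Spiegelhalter et al. definition, DIC = D(θ̄) + 2 p_D with D(θ) = −2 log L(θ) and p_D = mean(D) − D(θ̄); the toy Gaussian model, the sample sizes, and all function names are assumptions for illustration and are not the GLMM fit reported in the paper.

```python
import numpy as np


def dic(loglik_samples: np.ndarray, loglik_at_posterior_mean: float):
    """Return (DIC, effective number of parameters p_D)."""
    d_bar = -2.0 * np.mean(loglik_samples)   # posterior mean deviance
    d_hat = -2.0 * loglik_at_posterior_mean  # deviance at the posterior mean
    p_d = d_bar - d_hat                      # effective number of parameters
    return d_hat + 2.0 * p_d, p_d


def gaussian_loglik(y: np.ndarray, mu: float) -> float:
    """Log-likelihood of data y under Normal(mu, 1)."""
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * y.size * np.log(2 * np.pi)


rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)

# Toy stand-in for MCMC output: with a flat prior and known unit variance,
# the posterior of the mean is Normal(mean(y), 1 / n), so we sample it directly.
posterior_mu = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=20000)

loglik_samples = np.array([gaussian_loglik(y, m) for m in posterior_mu])
dic_value, p_d = dic(loglik_samples, gaussian_loglik(y, posterior_mu.mean()))

print(dic_value)  # lower values indicate a better fit-complexity trade-off
print(p_d)        # close to 1, since this toy model has one free parameter
```

In a model comparison, the same quantity would be computed for each candidate model on the same data, and the model with the smaller DIC would be preferred.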

3. Make the paper more accessible for a general audience

A major issue with this paper was its inaccessibility, which could, unfortunately, ultimately reduce its impact. Most readers are by now familiar with past work using clustering methods (e.g., Cande et al., 2018; Berman et al., 2014) -- but many of the analyses in this paper are novel, difficult to penetrate, and poorly introduced for a general audience. The authors should take a fine-grained comb to this paper to make each analysis as accessible as possible. Simple fixes could include more plain language around what each analysis is testing/achieving, as well as consistency in terminology. Specifically, the final sentence of each paragraph could state in plain terms what the preceding analysis has just demonstrated, ruled-in, or ruled out.

We agree with the reviewers that many aspects of the original manuscript text could have benefited from additional prose explaining the rationale and conclusions of the analysis. Following these comments, we have now included many additional clarifying comments to make the text more accessible.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Please address continued concerns from Reviewer #3, and please attempt to clarify key sections of the text as recommended by Reviewer #1.

(1) Reviewer #3: The paper has improved in many respects but the revision failed to deal with what I see as the most glaring issue (and which I had raised in my previous review).

All the extant species show biases in behaviors as shown by the hotter/browner colors in their behavioral maps. This suggests that we should expect that any species of Drosophila, extant or extinct, would be highly likely to have biased behaviors as well. However, the ancestrally reconstructed behavioral repertoire is totally washed out: there are no clear biases in behavior in the ancestral graph that is estimated in figure 3. This suggests to me that the method potentially has flaws.

I had suggested that the authors start with a PDF from one of their species and then simulate its evolution along the known Drosophila phylogeny and then try to reconstruct that repertoire using ancestral state reconstruction. If that process ends up showing a similarly 'flat' PDF where everything is similarly low probability (i.e. a totally blue PDF as shown in figure 3) then there is some cause for concern. If it were to retrieve a PDF similar in structure to the one that initiated the simulation, then that would be very reassuring.

The author responses seem to say that is what they have done in using the MCMC method but where the ancestral reconstruction analysis of a simulated evolutionary trajectory is to be found in the paper is unclear to me. I searched 'simulat' in the text of the manuscript and cannot see where trait evolution was simulated and then used as the starting values to try to evaluate how closely the ancestral reconstruction gets to the known ancestral starting point.

There is a lot in this paper, so though I feel this issue is important to address, it does not necessarily preclude the paper from being published in my view. I would still like to see the authors address the 'flatness' and thus seemingly unrealistic (it would seem to me at least) ancestral state that they present in figure 3.

We agree that the way in which we presented Figure 3 provides the impression that the ancestral repertoire is “washed out.” This is an effect of the way in which we plotted our data, and we thank the reviewer for pointing out this less-than-ideal visualization. The new version of the figure shows the ancestral distribution in non-logarithmic values, showing that there is indeed structure in this repertoire. We agree that it would be worrying if the distribution was indeed flat, but as can be seen, this isn’t the case here. All of the brown/hotter colors in the subsequent repertoires to the right are differences of the logs, which better shows the behavioral alterations along the phylogeny, as they are often subtle.

As to the proposed simulation, due to the reviewer’s excellent suggestion in the previous round of reviews, we added Figure 3—figure supplement 2, which shows that starting from the inferred ancestral state, the model’s mean predictions are in agreement with the data, providing a good double-check of our model’s self-consistency. In other words, when starting from the endpoints, we reconstruct the ancestral state (this is how the model is fit in the first place), and starting from the fit ancestral state, the model predicts, on average, the correct endpoint states. In addition, Figure 3, Supplement 3 (also emerging from the reviewers’ suggestions) shows that the model predicts the behavioral covariances well. We did not need to do simulations, per se, to show these results, since these values can be directly numerically calculated from the model (hence the lack of the word ‘simulation’ connected to these efforts in the manuscript).
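The kind of round trip discussed in this exchange (evolve a known ancestral value along a tree, then reconstruct it from the tips) can be sketched on a toy case. This is only an illustration under strong assumptions: a single trait evolving by Brownian motion, a hypothetical four-species ultrametric tree, and a generalized least squares estimate of the root. It is not the multi-trait GLMM used in the paper, and the tree, rate, and function names are invented for the example.

```python
import numpy as np

# Hypothetical tree ((A,B),(C,D)) with total depth 1.
# C[i, j] is the branch length shared by tips i and j, measured from the root.
C = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.3],
              [0.0, 0.0, 0.3, 1.0]])


def simulate_tips(root_value, sigma2, rng, n_reps):
    """Draw tip values under Brownian motion starting from a known root.
    Returns an array of shape (n_reps, n_tips)."""
    mean = np.full(len(C), root_value)
    return rng.multivariate_normal(mean, sigma2 * C, size=n_reps)


def reconstruct_root(tips):
    """Generalized least squares estimate of the root value for each replicate."""
    w = np.linalg.solve(C, np.ones(len(C)))  # C^{-1} 1
    return tips @ w / w.sum()


rng = np.random.default_rng(1)
true_root = 0.25
tips = simulate_tips(true_root, sigma2=0.5, rng=rng, n_reps=5000)
estimates = reconstruct_root(tips)

print(estimates.mean())  # close to the true root value on average
print(estimates.std())   # spread of the reconstruction across replicates
```

In this toy setting the reconstruction is unbiased on average, while the spread across replicates shows how much uncertainty a small tree leaves in any single ancestral estimate.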

(2) Reviewer #1: The writing is slightly improved, though some sections could still be improved by concluding paragraphs not just with a mathematical result but also what it means in plain terms. E.g. Lines 264-7 conclude with the finding that the covariance matrix has a 'far from random modular structure;' this does not explain how this result relates to the title of the section (Individual Variability and long timescale correlations).

We thank the reviewer for their comments, and we have adapted several passages in the text to increase the clarity of our methodology and to make connections between the mathematical results and the corresponding biological significance (see the latex-diff document for all changes).

(3) Please note that eLife has recently adopted the STRANGE guidelines for animal behaviour research:

https://doi.org/10.1038/d41586-020-01751-5

https://reviewer.elifesciences.org/author-guide/journal-policies

In your revisions, please consider scope for sampling biases in your study and how these may limit the generalisability of your findings, and make declarations as necessary. A few sentences in the section "Data collection" (lines 462-470) may suffice.

We thank the reviewers for pointing this out (even as an eLife reviewing editor, GJB was embarrassingly unaware of these guidelines). Although detailed information about how the flies were handled/housed was previously published in Cande et al., eLife, 2018 (and was pointed to accordingly via a citation), we added relevant details here to the Materials and methods section to make the manuscript more self-contained and to make our experimental details more clear.

As to sampling biases, we have provided a detailed list of the specific strains used in our experiments, all of which are readily available through the UCSD or Bloomington fly stocks. While we have no evidence that our precise choice of species/strains within the D. melanogaster species subgroup should generate bias, we do acknowledge in the Discussion section that our phylogenetic reconstruction is likely under-constrained, and that adding more species/strains will “place more constraints on the evolutionary dynamics, likely resulting in less variance in the ancestral state estimations and potentially adding more structure to the relatively low rank (i.e., highly modular) covariance matrices. Additionally, further work is required to determine the balance between sampling within and between strains and species that optimizes estimates of evolutionary dynamics.”

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Source data 1. Fly behavior source data.
    elife-61806-data1.zip (505.4KB, zip)
    Transparent reporting form

    Data Availability Statement

    All behavioral region information is submitted with the article and is posted on GitHub (https://github.com/bermanlabemory/behavioral-evolution, copy archived at https://archive.softwareheritage.org/swh:1:rev:b01a6e3a2c7da193f38631dfe925c65229494d74). The original video data are too large to post (tens of TB), but will be made available upon request.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
