Significance
A long-standing problem in evolutionary biology is to understand the processes that shape the genetic composition of populations. In a population without migration, two processes that change allele frequencies are selection, which increases beneficial alleles and removes deleterious ones, and genetic drift, which randomly changes frequencies as some parents contribute more or fewer alleles to the next generation. Previous efforts to disentangle these processes have used genomic samples from a single time point and models of how selection affects neighboring sites (linked selection). Here, we use genomic data taken through time to quantify contributions of selection and drift to genome-wide frequency changes. We show that selection acts over short timescales in three evolve-and-resequence studies and has a sizable genome-wide impact.
Keywords: linked selection, experimental evolution, adaptation
Abstract
Rapid phenotypic adaptation is often observed in natural populations and selection experiments. However, detecting the genome-wide impact of this selection is difficult since adaptation often proceeds from standing variation and selection on polygenic traits, both of which may leave faint genomic signals indistinguishable from a noisy background of genetic drift. One promising signal comes from the genome-wide covariance between allele frequency changes observable from temporal genomic data (e.g., evolve-and-resequence studies). These temporal covariances reflect how heritable fitness variation in the population leads changes in allele frequencies at one time point to be predictive of the changes at later time points, as alleles are indirectly selected due to remaining associations with selected alleles. Since genetic drift does not lead to temporal covariance, we can use these covariances to estimate what fraction of the variation in allele frequency change through time is driven by linked selection. Here, we reanalyze three selection experiments to quantify the effects of linked selection over short timescales using covariance among time points and across replicates. We estimate that at least 17 to 37% of allele frequency change is driven by selection in these experiments. Against this background of positive genome-wide temporal covariances, we also identify signals of negative temporal covariance corresponding to reversals in the direction of selection for a reasonable proportion of loci over the time course of a selection experiment. Overall, we find that in the three studies we analyzed, linked selection has a large impact on short-term allele frequency dynamics that is readily distinguishable from genetic drift.
A long-standing problem in evolutionary genetics is quantifying the roles of genetic drift and selection in shaping genome-wide allele frequency changes. Selection can affect allele frequencies, both directly and indirectly, with the indirect effect coming from the action of selection on correlated loci elsewhere in the genome [e.g., linked selection (1–4); ref. 5 has a review]. Previous work has mostly focused on teasing apart the impacts of drift and selection on genome-wide diversity using population samples from a single contemporary time point, often by modeling the correlation between regional recombination rate, gene density, and diversity created in the presence of linked selection (6, 7). This approach has shown that linked selection has a major role in shaping patterns of genome-wide diversity across the genomes of a range of sexual species (8–16) and has allowed us to quantify the relative influence of positive selection (hitchhiking) and negative selection (background selection) (8, 9, 16–19). However, we lack an understanding of both how linked selection acts over short time intervals and its full impact on genome-wide allele frequency changes.
There are numerous examples of rapid phenotypic adaptation (20–23) and rapid, selection-driven genomic evolution in asexual populations (24–26). Yet, the polygenic nature of fitness makes detecting the impact of selection on genome-wide variation over short timescales in sexual populations remarkably difficult (27–29). This is because the effect of selection on a polygenic trait (such as fitness) is distributed across numerous loci. This can lead to subtle allele frequency shifts on standing variation that are difficult to distinguish from background levels of genetic drift and sampling variance. Increasingly, genomic experimental evolution studies with multiple time points, and in some cases multiple replicate populations, are being used to detect large-effect selected loci (30, 31) and differentiate modes of selection (32–34). In addition, these temporal–genomic studies have begun in wild populations, some with the goal of finding variants that exhibit frequency changes consistent with fluctuating selection (35, 36). In a previous paper, we proposed that one useful signal for understanding the genome-wide impact of polygenic linked selection detectable from temporal genomic data is the temporal autocovariance (i.e., covariance between two time points) of allele frequency changes (37). These covariances are created when the loci that underlie heritable fitness variation perturb the frequencies of linked alleles; in contrast, when genetic drift acts alone in a closed population, these covariances are expected to be zero for neutral alleles. Mathematically, temporal covariances are useful because it is natural to decompose the total variance in allele frequency change across a time interval into the variances and covariances in allele frequency change between generations.
Furthermore, biologically, these covariances reflect the extent to which allele frequency changes in one generation predict changes in another due to shared selection pressures and associations with selected loci.
Here, we provide empirical analyses to quantify the impact of linked selection acting over short timescales (tens of generations) across two evolve-and-resequence studies (33, 38) and an artificial selection experiment (39). These sequencing selection experiments have started to uncover selected loci contributing to the adaptive response; however, it is as yet far from clear how much of genome-wide allele frequency change is driven by selection rather than genetic drift. We repeatedly find a signal of temporal covariance, consistent with linked selection acting to significantly perturb genome-wide allele frequency changes in a manner that other approaches would not be able to differentiate from genetic drift. We estimate a lower bound of the fraction of variance in allele frequency change caused by selection, as well as the correlation between allele frequency changes between replicate populations caused by convergent selection pressures. Overall, we demonstrate that linked selection has a powerful role in shaping genome-wide allele frequency changes over very short timescales in experimental evolution.
Results
We first analyzed the dataset of Barghi et al. (33), an evolve-and-resequence study with 10 replicate populations exposed to a high-temperature laboratory environment, evolved for 60 generations, and sequenced every 10 generations. Using the seven time points and 10 replicate populations, we estimated the genome-wide temporal covariance matrix for each of the 10 replicates. Each row of these matrices represents the temporal covariance $\mathrm{Cov}(\Delta_{10} p_s, \Delta_{10} p_t)$ between the allele frequency change (in 10-generation intervals, denoted $\Delta_{10} p_t$) of some initial reference generation $s$ (the row of the matrix) and some later time point $t$ (the column of the matrix). We corrected these matrices for biases created due to sampling noise and normalized the entries for heterozygosity (SI Appendix, sections S1.2 and S1.4). These covariances are expected to be zero when only drift is acting, as only heritable variation for fitness can create covariance between allele frequency changes in a closed population (37). Averaging across the 10 replicate temporal covariance matrices, we find temporal covariances that are statistically significant (95% block bootstrap CIs do not contain zero), consistent with linked selection perturbing genome-wide allele frequency changes over very short time periods. The covariances between all adjacent time intervals are positive and then decay toward zero as we look at more distant time intervals (Fig. 1A), as expected when directional selection affects linked variants’ frequency trajectories until ultimately linkage disequilibrium (LD) and the associated additive genetic variance for fitness decays (which could occur as a population reaches a new optimum and directional selection weakens) (37). The temporal covariances per replicate are noisier, but this general pattern holds (SI Appendix, Fig. S23).
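As a minimal illustration of the quantity described above, the following Python sketch computes a temporal covariance matrix from a matrix of allele frequencies and applies it to simulated neutral data. It is a sketch under stated assumptions, not the estimator used in the analysis: the function name `temporal_covariance_matrix`, the simulated population size, and the number of loci are hypothetical, loci are treated as unlinked, and the corrections for sampling noise and the heterozygosity normalization are omitted.

```python
import numpy as np

def temporal_covariance_matrix(freqs):
    """Covariances between allele frequency changes in different time intervals.

    freqs: array of shape (n_timepoints, n_loci) for a single replicate.
    Returns an (n_timepoints - 1) x (n_timepoints - 1) matrix whose entry (s, t)
    is Cov(delta_p_s, delta_p_t), with the covariance taken across loci.
    No sampling-noise correction or heterozygosity normalization is applied.
    """
    deltas = np.diff(freqs, axis=0)  # one row of frequency changes per interval
    return np.cov(deltas)            # np.cov treats each row as one variable

# Hypothetical neutral data: binomial (Wright-Fisher) drift at unlinked loci.
rng = np.random.default_rng(1)
n_diploids, n_loci, n_intervals = 500, 20_000, 6
p = rng.uniform(0.1, 0.9, n_loci)
trajectory = [p]
for _ in range(n_intervals):
    p = rng.binomial(2 * n_diploids, p) / (2 * n_diploids)
    trajectory.append(p)

cov = temporal_covariance_matrix(np.array(trajectory))
off_diagonal = cov[~np.eye(n_intervals, dtype=bool)]
print("mean variance per interval:", np.diag(cov).mean())
print("largest |covariance| between intervals:", np.abs(off_diagonal).max())
```

Under drift alone the off-diagonal entries should scatter around zero while the diagonal (the per-interval variance) stays clearly positive, which is the contrast the temporal covariance signal relies on.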
Since our covariances are averages over loci, the covariance estimate could be strongly affected by a few outlier regions. To test whether large outlier regions drive the genome-wide signal we see in the Barghi et al. (33) data, we calculate the covariances in 100-kb windows along the genome (we refer to these as windowed covariances throughout) and take the median windowed covariance (and trimmed-mean windowed covariance) as a measure of the genome-wide covariance robust to large-effect loci. These robust estimates (SI Appendix, Table S1 and Fig. S24) confirm the patterns we see using the mean covariance, establishing that genomic temporal covariances are nonzero due to the impact of selection acting across many genomic regions.
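The robustness check described here can be sketched as follows, assuming loci are assigned to windows by integer division of their position and that each window's covariance is the plain sample covariance of its loci. The function name `windowed_covariances` and the toy data (one outlier window with a strong shared signal, all other windows pure noise) are hypothetical and only illustrate why a median is less sensitive to a single large-effect region than a mean.

```python
import numpy as np

def windowed_covariances(d_s, d_t, positions, window=100_000):
    """Covariance between two vectors of allele frequency changes, per window.

    d_s, d_t: frequency changes at the same loci in two time intervals.
    positions: base-pair position of each locus on one chromosome.
    """
    bins = positions // window
    covs = []
    for b in np.unique(bins):
        idx = bins == b
        if idx.sum() < 2:          # a covariance needs at least two loci
            continue
        covs.append(np.cov(d_s[idx], d_t[idx])[0, 1])
    return np.array(covs)

# Toy data: 50 windows of 100 loci; only the first window carries a shared signal.
rng = np.random.default_rng(2)
positions = np.arange(5_000) * 1_000
d_s = rng.normal(0.0, 0.01, positions.size)
d_t = rng.normal(0.0, 0.01, positions.size)
outlier_signal = rng.normal(0.0, 0.2, 100)
d_s[:100] += outlier_signal
d_t[:100] += outlier_signal

covs = windowed_covariances(d_s, d_t, positions)
print("mean windowed covariance:  ", covs.mean())
print("median windowed covariance:", np.median(covs))
```

In this toy case the mean is pulled upward by the single outlier window, whereas the median stays near zero; a genome-wide signal that survives the median therefore cannot be attributed to a few outlier regions.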
While the presence of positive temporal covariances is consistent with selection affecting allele frequencies over time, this measure is not easily interpretable. We can calculate a more intuitive measure from the temporal covariances to quantify the impact of selection on allele frequency change: the ratio of total covariance in allele frequency change to the total variance in allele frequency change. We denote the change in allele frequency as $\Delta p_t = p_{t+1} - p_t$, where $p_t$ is the allele frequency in generation $t$. Since the total variation in allele frequency change can be partitioned into variance and covariance components, $\mathrm{Var}(p_t - p_0) = \sum_{i=0}^{t-1} \mathrm{Var}(\Delta p_i) + \sum_{i \neq j} \mathrm{Cov}(\Delta p_i, \Delta p_j)$ (we correct for biases due to sequencing depth), and the covariances are zero when drift acts alone, this is a lower bound on how much of the variance in allele frequency change is caused by linked selection (37). We call this measure $G(t)$, defined as
$$G(t) = \frac{\sum_{i \neq j} \mathrm{Cov}(\Delta p_i, \Delta p_j)}{\mathrm{Var}(p_t - p_0)}. \qquad [1]$$
This estimates the impact of selection on allele frequency change between the initial generation 0 and some later generation $t$, which can be varied to see how this quantity grows through time. When the sum of the covariances is positive, this measure can intuitively be understood as a lower bound on the relative fraction of allele frequency change normally thought of as “drift” that is actually due to selection. Additionally, $G(t)$ can be understood as a short-timescale estimate of the reduction in neutral diversity due to linked selection (or equivalently, the reduction in neutral effective population size needed to account for linked selection) (SI Appendix, section S7). Since the Barghi et al. (33) experiment is sequenced every 10 generations, the numerator uses the covariances estimated between 10-generation blocks of allele frequency change; thus, the strong, unobservable covariances between adjacent generations do not contribute to the numerator of $G(t)$. Had these covariances been measurable on shorter timescales, their cumulative effect would likely have been higher yet (SI Appendix, sections S2 and S8.4 have more details). Additionally, selection inflates the variance in allele frequency change per generation; however, this effect cannot be easily distinguished from drift. For both these reasons, our measure is quite conservative (we demonstrate this through simulations in SI Appendix, section S8.4). Still, we find a remarkably strong signal. Greater than of total, genome-wide allele frequency change over 60 generations is the result of selection (Fig. 1B). This proportion of variance attributable to selection builds over time in Fig. 1B as the effects of linked selection, unlike those of genetic drift, are compounded over the generations. Our $G(t)$ starts to plateau to a constant level as the covariances from earlier generations have decayed and so no longer contribute as strongly (Fig. 1).
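The ratio of total covariance to total variance can be computed directly from a temporal covariance matrix, because the variance of the summed frequency change over the first t intervals equals the sum of all entries in the leading t-by-t block. The sketch below assumes such a matrix is already available; the function name `G` mirrors the notation in the text, the 3-by-3 matrix holds made-up values chosen for easy arithmetic, and no sequencing-depth bias correction is included.

```python
import numpy as np

def G(cov, t):
    """Fraction of the variance in total frequency change over the first t
    intervals that is contributed by covariances between distinct intervals."""
    block = np.asarray(cov, dtype=float)[:t, :t]
    total_variance = block.sum()                        # Var(p_t - p_0)
    total_covariance = total_variance - np.trace(block) # sum over i != j
    return total_covariance / total_variance

# Made-up temporal covariance matrix (arbitrary units), not data from any study.
toy_cov = np.array([
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

for t in (1, 2, 3):
    print(t, G(toy_cov, t))
```

With a single interval there are no covariance terms, so the measure is zero; it then grows as more intervals, and therefore more positive covariance terms, are included.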
Additionally, we looked for a signal of temporal autocovariance in Bergland et al. (35), a study that collected Drosophila melanogaster through spring–fall season pairs across 3 years. If there was a strong pattern of genome-wide fluctuating selection, we might expect a pattern of positive covariances between similar seasonal changes (e.g., spring–fall in two adjacent years) and negative covariances between dissimilar seasonal changes (e.g., spring–fall and fall–spring in two adjacent years). However, we find no such signal over years, and in reproducing their original analysis, we find that their number of statistically significant seasonal polymorphisms is not enriched compared with an empirical null distribution created by permuting seasonal labels (we discuss this in more depth in SI Appendix, section S6).
The replicate design of Barghi et al. (33) allows us to quantify another covariance: the covariance in allele frequency change between replicate populations experiencing convergent selection pressures. These between-replicate covariances are created in the same way as temporal covariances: alleles linked to a particular fitness background are expected to have allele frequency changes in the same direction if the selection pressures are similar. Intuitively, where temporal covariances reflect that alleles associated with heritable fitness backgrounds are predictive of frequency changes between generations, replicate covariances reflect that heritable fitness backgrounds common to each replicate predict (under the same selection pressures) frequency changes between replicates; we note that there is not a direct one-to-one correspondence between temporal and replicate covariances since the latter are driven by a shared selection pressure and the stochastic genetic backgrounds across replicate populations. We measure this through a statistic similar to a correlation, which we call the convergent correlation: the ratio of average between-replicate covariance across all pairs to the average SD across all pairs of replicates:
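$$\mathrm{cor}(\Delta p_{t,A}, \Delta p_{s,B}) = \frac{\mathbb{E}_{A \neq B}\left[\mathrm{Cov}(\Delta p_{t,A}, \Delta p_{s,B})\right]}{\mathbb{E}_{A \neq B}\left[\sqrt{\mathrm{Var}(\Delta p_{t,A})\,\mathrm{Var}(\Delta p_{s,B})}\right]}, \qquad [2]$$
where $A$ and $B$ here are two replicate labels, and for the Barghi et al. (33) data, we use $\Delta_{10} p_t$.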
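The ratio described above (average between-replicate covariance over the average product of SDs, both taken over pairs of distinct replicates) can be sketched as below. The function name `convergence_correlation`, the array layout, and the simulated data are assumptions for illustration; in the toy data each replicate's frequency change is a shared component plus independent noise of equal variance, so the expected value is 0.5.

```python
import numpy as np
from itertools import permutations

def convergence_correlation(d_t, d_s):
    """Average between-replicate covariance divided by the average product of SDs.

    d_t, d_s: arrays of shape (n_replicates, n_loci) holding allele frequency
    changes in time intervals t and s; only pairs of distinct replicates are used.
    """
    n_reps = d_t.shape[0]
    covs, sds = [], []
    for a, b in permutations(range(n_reps), 2):
        covs.append(np.cov(d_t[a], d_s[b])[0, 1])
        sds.append(np.sqrt(np.var(d_t[a], ddof=1) * np.var(d_s[b], ddof=1)))
    return np.mean(covs) / np.mean(sds)

# Toy data: three replicates, equal shared and replicate-specific variance.
rng = np.random.default_rng(3)
n_reps, n_loci = 3, 20_000
shared = rng.normal(0.0, 0.01, n_loci)
convergent = shared + rng.normal(0.0, 0.01, (n_reps, n_loci))
independent = rng.normal(0.0, 0.01, (n_reps, n_loci))

r_convergent = convergence_correlation(convergent, convergent)
r_independent = convergence_correlation(independent, independent)
print("shared response:", r_convergent)
print("no shared response:", r_independent)
```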
We have calculated the convergent correlation for all rows of the replicate covariance matrices. Like temporal covariances, we visualize these through time (Fig. 2 A, Left), with each line representing the convergent correlation from a particular reference generation as it varies with $t$ (shown on the x axis). In other words, each of the colored lines corresponds to the like-colored row of the convergence correlation matrix (Fig. 2 A, Right). We find that these convergent correlation coefficients are relatively weak and decay very quickly from an initial value of about 0.1 (95% block bootstrap CIs ) to around 0.01 (95% CIs [0.0087, 0.015]) within 20 generations. This suggests that while a substantial fraction of the initial response is shared over the replicates, this is followed by a rapid decay, a result consistent with the primary finding of the original Barghi et al. (33) study: that alternative loci contribute to longer-term adaptation across the different replicates.
A benefit of between-replicate covariances is that unlike temporal covariances, these can be calculated with only two sequenced time points and a replicated study design. This allowed us to assess the impact of linked selection in driving convergent patterns of allele frequency change across replicate populations in two other studies. First, we reanalyzed the selection experiment of Kelly and Hughes (38), which evolved three replicate wild populations of Drosophila simulans for 14 generations adapting to a novel laboratory environment. Since each replicate was exposed to the same selection pressure and shared LD common to the original natural founding population, we expected each of the three replicate populations to have positive convergence correlations. We find that all three convergent correlation coefficients between replicate pairs are significant (Fig. 2B) and average to 0.36 ( CI ). Additionally, we can calculate the proportion of the total variance in allele frequency change from convergent selection pressure, analogous to our $G(t)$, where the numerator is the convergent covariance and the denominator is the total variance (SI Appendix, section S4). We find that 37% of the total variance is due to shared allele frequency changes caused by selection (95% CI [29%, 41%]); these are similar to the convergence correlation since the variance is relatively constant across the replicates.
Next, we reanalyzed the Longshanks selection experiment, which selected for longer tibiae length relative to body size in mice, leading to a response to selection of about 5 SDs over the course of 20 generations (39, 40). This study includes two independent selection lines, Longshanks 1 (LS1) and Longshanks 2 (LS2), and an unselected control line (Ctrl) where parents were randomly selected. Consequently, this selection experiment offers a useful control to test our convergence correlations: we expect to see significant positive convergence correlations in the comparison between the two Longshanks selection lines but not between each of the control and Longshanks line pairs. We find that this is the case (gray CIs in Fig. 2C), with convergence correlations between each of the Longshanks lines and the control not being statistically different from zero, while the convergence correlation between the two Longshanks lines is strong (0.18) and statistically significant (CIs [0.07, 0.25]).
One finding in the Longshanks study was that two major-effect loci showed parallel frequency shifts between the two selection lines. We were curious to what extent our genome-wide covariances were being driven by these two outlier large-effect loci, so we excluded them from the analysis. Since we do not know the extent to which LD around these large-effect loci affects neighboring loci, we took the conservative precaution of excluding the entire chromosomes these loci reside on (chromosomes 5 and 10) and recalculating the temporal covariances. We find that excluding these large-effect loci has little impact on the CIs (blue CIs in Fig. 2C), indicating that these across-replicate covariances are indeed driven by a large number of loci. This is consistent with a signal of selection on a polygenic trait driving genome-wide change, although we note that large-effect loci can contribute to the indirect change at unlinked loci (41, 42).
The presence of an unselected control line provides an alternative way to partition the effects of linked selection and genetic drift: we can compare the total variance in allele frequency change of the control line (which excludes the effect of artificial selection on allele frequencies) with the total variance in frequency change of the Longshanks selection lines. This allows us to estimate the increase in variance in allele frequency change due to selection, which we can further partition into the effects of selection shared between selection lines and those unique to a selection line by estimating the shared effect through the observed covariance between replicates (Materials and Methods and SI Appendix, section S4 have more details). We estimate at least 32% (95% CI ) of the variance in allele frequency change is driven by the effects of selection, of which 14% (95% CI ) is estimated to be unique to a selection line, and 17% (95% CI ) is the effect of shared selection between the two Longshanks selection lines.
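The logic of this partition can be sketched in a few lines. The estimator below is an assumption made for illustration and is not taken from the Materials and Methods or SI Appendix: it treats the control-line variance as the drift-only baseline, attributes the excess variance in the selection lines to selection, and takes the between-line covariance as the shared part. The function name `partition_selection_variance` and the simulated variances are hypothetical, and differences in population size between control and selection lines are ignored.

```python
import numpy as np

def partition_selection_variance(d_sel1, d_sel2, d_ctrl):
    """Split the variance in allele frequency change of two selection lines into
    a selection fraction (excess over the control line), a fraction shared
    between the lines, and a fraction unique to a line. Illustrative only."""
    var_sel = 0.5 * (np.var(d_sel1, ddof=1) + np.var(d_sel2, ddof=1))
    var_ctrl = np.var(d_ctrl, ddof=1)
    shared_cov = np.cov(d_sel1, d_sel2)[0, 1]
    total = (var_sel - var_ctrl) / var_sel
    shared = shared_cov / var_sel
    return {"total": total, "shared": shared, "unique": total - shared}

# Toy data with known components: drift variance 1.0, shared 0.3, unique 0.2
# (arbitrary units), so the expected fractions are 1/3, 0.2 and about 0.133.
rng = np.random.default_rng(4)
n_loci = 50_000
shared = rng.normal(0.0, np.sqrt(0.3), n_loci)
line1 = rng.normal(0.0, 1.0, n_loci) + shared + rng.normal(0.0, np.sqrt(0.2), n_loci)
line2 = rng.normal(0.0, 1.0, n_loci) + shared + rng.normal(0.0, np.sqrt(0.2), n_loci)
control = rng.normal(0.0, 1.0, n_loci)

fractions = partition_selection_variance(line1, line2, control)
print(fractions)
```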
We observed that in the longest study we analyzed (33), some genome-wide temporal covariances become negative at future time points (the first two rows in Fig. 1 A, Left). This shows that alleles that were on average going up initially are later going down in frequency (i.e., that the average direction of selection experienced by alleles has flipped). This might reflect either a change in the environment or the genetic background, due to epistatic relationships among alleles altered by frequency changes (which can occur during an optima shift; ref. 43) or recombination breaking up selective alleles. Such reversals in selection dynamics could be occurring at other time points, but the signal of a change in the direction of selection at particular loci may be washed out when we calculate our genome-wide average temporal covariances. To address this limitation, we calculated the distribution of the temporal covariances over 100-kb windowed covariances (Fig. 3 shows these distributions pooling across all replicates, and SI Appendix, Fig. S26 shows individual replicates). The covariance estimate of each genomic window will be noisy, due to sampling and genetic drift, and the neutral distribution of the covariance is complicated due to LD, which can occur over long physical distances in evolve-and-resequence and selection studies (44, 45). To address this, we have developed a permutation-based procedure that constructs an empirical neutral null distribution by randomly flipping the sign of the allele frequency changes in each genomic window (i.e., a single random sign flip is applied to all loci in a window). This destroys the systematic covariances created by linked selection and creates a sampling distribution of the covariances spuriously created by neutral genetic drift while preserving the complex dependencies between adjacent loci created by LD.
This empirical neutral null distribution is conservative in the sense that the variances of the covariances are wider than expected under drift alone, as selection not only creates covariance between time intervals but also inflates the magnitude of allele frequency change within a time interval. We see (Fig. 3 A and B) that there is an empirical excess of windows with positive covariances between close time points compared with the null distribution (a heavier right tail) and that this then shifts to an excess of windows with negative covariances between more distant time points (a heavier left tail).
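A sign-flip permutation of this kind can be sketched as follows. One detail is an assumption of this sketch rather than something stated in the text: the random sign is drawn independently for each window and for each of the two time intervals, since flipping both intervals of a window with the same sign would leave the window's covariance unchanged. The function names, window sizes, and toy data (a shared component in every window) are hypothetical.

```python
import numpy as np

def window_covariances(d_s, d_t, win_ids, n_windows):
    """Sample covariance between two intervals' frequency changes in each window."""
    return np.array([
        np.cov(d_s[win_ids == w], d_t[win_ids == w])[0, 1]
        for w in range(n_windows)
    ])

def sign_flip_null(d_s, d_t, win_ids, n_windows, n_perm, rng):
    """Null distribution of windowed covariances from random per-window sign flips.
    All loci in a window share one sign, so within-window dependence is preserved."""
    null = np.empty((n_perm, n_windows))
    for i in range(n_perm):
        flips_s = rng.choice([-1.0, 1.0], size=n_windows)
        flips_t = rng.choice([-1.0, 1.0], size=n_windows)
        null[i] = window_covariances(
            d_s * flips_s[win_ids], d_t * flips_t[win_ids], win_ids, n_windows
        )
    return null

# Toy data: 50 windows of 100 loci, with a shared component between intervals.
rng = np.random.default_rng(5)
n_windows, loci_per_window = 50, 100
win_ids = np.repeat(np.arange(n_windows), loci_per_window)
shared = rng.normal(0.0, 0.01, win_ids.size)
d_s = shared + rng.normal(0.0, 0.01, win_ids.size)
d_t = shared + rng.normal(0.0, 0.01, win_ids.size)

observed = window_covariances(d_s, d_t, win_ids, n_windows)
null = sign_flip_null(d_s, d_t, win_ids, n_windows, n_perm=200, rng=rng)
right_tail_excess = np.mean(observed > np.quantile(null, 0.975))
print("fraction of observed windows above the null 97.5% quantile:", right_tail_excess)
```

Because each flip only changes the sign of a window's covariance, the null is symmetric around zero with the same spread as the observed values, which is why an excess in either observed tail is informative.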
We quantified the degree to which the left and right tails are inflated compared with the null distribution as a function of time and see excesses in both tails in Fig. 3C. This finding is also robust to sign-permuting allele frequency changes on a chromosome level, the longest extent that gametic LD can extend (SI Appendix, Fig. S29). We see a striking pattern that the windowed covariances not only decay toward zero but in fact, become negative through time, consistent with many regions in the genome having had a reversed fitness effect at later time points.
Finally, we used forward-in-time simulations to explore the conditions under which temporal and convergent correlations arise. We show a subset of our results for a model of stabilizing selection on a phenotype where directional selection is induced by a sudden shift in the optimum phenotype of varying magnitudes (Fig. 4A). We find that positive temporal covariances are produced by such selection (Fig. 4B) and that these positive temporal covariances can compound together to generate a large proportion of allele frequency change being due to selection [i.e., large $G(t)$] over relatively short time periods similar to those our analyzed selection datasets span (Fig. 4C). The magnitude of $G(t)$ increases with the strength of selection (i.e., the variance in fitness) such that stronger selection generates larger proportions of allele frequency change. We find a similar picture of stronger convergent selection pressures generating larger convergence correlations (Fig. 4D; SI Appendix, Fig. S12 shows how other factors impact convergence correlations).
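To give a feel for why heritable fitness variation produces positive temporal covariance at linked neutral sites, the toy simulation below uses a much simpler model than the stabilizing-selection simulations described above: a haploid Wright-Fisher population with no recombination, in which each lineage carries a fixed fitness and a set of neutral alleles. Everything here (the function names, population size, number of loci, and the log-normal fitness distribution) is a hypothetical choice for illustration.

```python
import numpy as np

def simulate(n_ind, n_loci, n_gen, fitness_sd, rng):
    """Clonal haploid Wright-Fisher model with heritable fitness.

    Neutral alleles are placed at random on fitness backgrounds and are
    inherited together with fitness (complete linkage, no mutation).
    Returns allele frequencies with shape (n_gen + 1, n_loci).
    """
    genomes = rng.random((n_ind, n_loci)) < 0.5        # neutral alleles
    log_fitness = rng.normal(0.0, fitness_sd, n_ind)   # heritable fitness
    freqs = [genomes.mean(axis=0)]
    for _ in range(n_gen):
        w = np.exp(log_fitness)
        parents = rng.choice(n_ind, size=n_ind, p=w / w.sum())
        genomes, log_fitness = genomes[parents], log_fitness[parents]
        freqs.append(genomes.mean(axis=0))
    return np.array(freqs)

def adjacent_covariance(freqs):
    """Mean covariance (across loci) between frequency changes in adjacent generations."""
    cov = np.cov(np.diff(freqs, axis=0))
    return np.mean(np.diag(cov, k=1))

rng = np.random.default_rng(6)
selected = adjacent_covariance(simulate(4_000, 1_000, 5, fitness_sd=0.5, rng=rng))
neutral = adjacent_covariance(simulate(4_000, 1_000, 5, fitness_sd=0.0, rng=rng))
print("adjacent-generation covariance with fitness variation:", selected)
print("adjacent-generation covariance under drift alone:     ", neutral)
```

In this sketch neutral alleles that happen to sit on fit backgrounds keep rising in later generations, so the covariance between adjacent generations is positive, whereas with no fitness variation it fluctuates around zero.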
Averaging across replicates, these simulation results show $G(t)$ is relatively insensitive to the number of loci underlying the trait. However, if only a small number of loci influence the trait, the trajectories are typically much more stochastic across replicates. This reaffirms that the genome-wide linked selection response we see in the Barghi et al. (33) data is highly polygenic (compare Fig. 1B with SI Appendix, Fig. S6). Furthermore, using our simulations we find that sampling only every 10 generations does indeed mean that our estimates of $G(t)$ are an underestimate of the proportional effect of linked selection as they cannot include the covariance between closely spaced generations (SI Appendix, Fig. S14).
Additionally, we explored other modes of selection with simulations. We find that the long-term dynamics of the covariances under directional truncation selection, which generates substantial epistasis, are richer than we see under Gaussian stabilizing selection (GSS) and multiplicative selection (SI Appendix, Fig. S18). We also conducted simulations of purifying selection alone (i.e., background selection) and find that this can also generate positive temporal covariances (SI Appendix, Fig. S16) and under some circumstances, can even generate convergence correlations (SI Appendix, Fig. S17). Thus, it is unlikely that the signatures of linked selection we see are entirely the result of the novel selection pressure the populations are exposed to, and some of this signature may be ongoing purifying selection. Only in the case of the Longshanks experiment does the presence of a control line allow us to conclude that the signal of selection is almost entirely due to the novel selection pressure.
While none of our experiments have selected the populations in divergent directions, in our simulations we find that such selection can generate negative convergent correlations (Fig. 4D). This suggests that selection experiments combining multiple replicates, control lines, as well as divergent selection pressures might be quite informative in disentangling the contribution of particular selection pressures from genome-wide allele frequency changes.
Discussion
Since the seminal analysis of Smith and Haigh (1) demonstrating that linked neutral diversity is reduced as an advantageous polymorphism sweeps to fixation, over four decades of theoretical and empirical research has bettered our understanding of linked selection. One underused approach to understand the genome-wide effects of selection on polygenic traits (e.g., on standing variation) stems from an early quantitative genetic model of linked selection (41) and its later developments (42, 46–48; ref. 5 has a comparison of these models with classic hitchhiking models). Implicit in these models is that autocovariance between allele frequency changes is created when there is heritable fitness variation in the population, a signal that may be readily detected from temporal genomic data (37). Depending on how many loci affect fitness, even a strong effect of linked selection may not be differentiable from genetic drift using only single contemporary population samples or looking at temporal allele frequency change at each locus in isolation. In this way, averaging summaries of temporal data allows us to sidestep the key problem of detecting selection from standing variation: that the genomic footprint leaves too soft a signature to differentiate from a background of genetic drift. In fact, we find that the temporal covariance signal is detectable even in the extremely difficult-to-detect case of selection on highly polygenic traits (37).
It is worth building some intuition for why temporal covariance allows us to detect such faint signals of polygenic linked selection from temporal genomic data. Variance in allele frequency change is subject to both drift and sampling noise, which at any single locus, may swamp the temporal covariance signal due to selection or create spurious covariances when selection is not acting. However, these spurious covariances do not share a directional signal, whereas the covariances created by linked selection do; consequently, averaging across the entire genome, the temporal signal exceeds sampling noise.
Our analyses reveal that a sizable proportion of allele frequency change in these experimental evolution populations is due to the (likely indirect) action of selection. Capitalizing on replicated designs, we characterized the extent to which convergent selection pressures lead to parallel changes in allele frequencies across replicate populations and found that a substantial proportion of the response is shared across short timescales. These likely represent substantial underestimates of the contribution of linked selection because the studies we have reanalyzed do not sequence the population each generation, preventing us from including the effects of stronger correlations between adjacent generations. Furthermore, our estimation methods are intentionally conservative: for example, they exclude the contribution of selection that does not persist across generations and selection that reverses sign; thus, they can be seen as a lower bound of the effects of selection, which we have confirmed through forward-in-time simulations. Finally, through simulation results, we show that for a given level of additive genetic variance, the strengths of temporal and replicate covariances depend on the mode of selection, the details of the populations or selection experiment, and the level of LD, yet the level of temporal covariance is relatively invariant to the number of loci underlying fitness, as long as fitness is sufficiently polygenic.
These estimates of the contribution of selection could be refined by using patterns of LD and recombination, which would allow us to more fully parameterize a linked selection model of temporal allele frequency change (37). The basic prediction is that regions of higher LD and lower recombination should have greater temporal autocovariance than regions with lower LD and higher recombination. However, one limitation of these pooled sequence datasets is that none of the studies we reanalyzed estimated LD data for the evolved populations. While there are LD data for a natural population of D. simulans (49, 50), we did not find a relationship between temporal covariance and LD. We believe that this is driven by the idiosyncratic nature of LD in evolve-and-resequence populations, which often extends over large genomic distances (38, 44). Future studies complete with LD data and recombination maps would allow one to disentangle the influence of closely linked sites from more distant sites in causing temporal autocovariance and allow the fitting of more parametric models to estimate population parameters such as the additive genetic variance for fitness directly from temporal genomic data alone (37). Future work could refine our estimates by including selection’s impact on the variance in allele frequency terms (e.g., equation 26 of ref. 37) and possibly quantifying the covariances missed when sequencing is not done each generation; both would lead to less conservative estimates that could show a large impact of selection.
Our primary focus here has been on evolution in laboratory populations. It is unclear whether we should expect a similar impact of selection in natural populations. In some of these experiments, selection pressures may have been stronger or more sustained than in natural populations (51, 52). Conversely, these laboratory populations were maintained at relatively small census sizes (Table 1), which will amplify the role of genetic drift, and increase the frequency of rare deleterious alleles in selection lines due to founder effects. The advantage of laboratory experiments is that they are closed populations; in natural populations, temporal covariance could also arise from the systematic migration of alleles from differentiated populations. Adapting these methods to natural populations will require either populations that are reasonably closed to migration or the effect of migration to be accounted for possibly either by knowledge of allele frequencies in source populations or the identification of migrant individuals.
Table 1.
Study | Species | Selection | Replicates | Population size* | Generations | Time points |
Kelly and Hughes (38) | D. simulans | Laboratory adaptation | 3 | 1,100 | 14 | 2 |
Barghi et al. (33) | D. simulans | Laboratory adaptation | 10 | 1,000 | 60 | 7 |
Castro et al. (39) | Mus musculus | Tibiae length | 2 | 32 | 17 | 2 |
Castro et al. (39) | Mus musculus | Control | 1 | 28 | 17 | 2 |
*Approximate census population size during experiment.
While it is challenging to apply temporal methods to natural populations, there is a lot of promise for these approaches (35, 36). Efforts to quantify the impact of linked selection have found that obligately sexual organisms have up to an 89% reduction in genome-wide diversity over long time periods (16, 18, 53–55). Thus, linked selection makes a sizable contribution to long-term allele frequency change in some species, and there is reason to be hopeful that we could detect this from temporal data, which would help to resolve the timescales that linked selection acts over in the wild. In our reanalysis of the Barghi et al. (33) study, we find evidence of complex linked selection dynamics, with selection pressures flipping over time due to environmental change, the breakup of epistatic combinations, or advantageous haplotypes. Such patterns would be completely obscured in samples from only contemporary populations. Thus, we can hope to have a much richer picture of the impact of selection as temporal sequencing becomes more common, allowing us to observe the effects of ecological dynamics in genomic data (52).
Furthermore, understanding the dynamics of linked selection over short timescales will help to unite phenotypic studies of rapid adaptation with a detectable genomic signature to address long-standing questions concerning linked selection, evolutionary quantitative genetics, and the overall impact selection has on genetic variation.
Materials and Methods
Datasets Analyzed.
We used available genomic data from four studies: pooled population resequencing data from Barghi et al. (33), Kelly and Hughes (38), and Bergland et al. (35) and individual-level sequencing data from Castro et al. (39). In all cases, we used the variants kept after the filtering criteria of the original studies.
Variance and Covariance Estimates.
To remove systematic covariances in allele frequency change caused by tracking the reference or minor allele, we randomly choose an allele to track frequency for each locus. Then, we calculate the variance–covariance matrix of allele frequency changes using a Python software package we have written, available at http://github.com/vsbuffalo/cvtk. This simultaneously calculates temporal variances and covariances and replicate covariances and uses the sampling depth and number of diploid individuals to correct for bias in the variance estimates and a bias that occurs in covariance estimates between adjacent time points due to shared sampling noise (SI Appendix, sections S1.2–S1.4 have mathematical details of these estimators). We assess that our bias correction procedure is working adequately through a series of diagnostic plots that ensure that the procedure removes the relationship between sampling depth and uncorrected variance and covariances (SI Appendix, Fig. S4). Through our simulations we find that our estimates can differ based on how fixations and losses are handled in long time series (SI Appendix, section S8.7), but none of our findings in the text are qualitatively altered by this decision (SI Appendix, Figs. S19 and S20).
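The polarization and covariance steps described above can be sketched in plain NumPy. This is a minimal illustration, not the cvtk implementation: the function name `temporal_covariance` and the (time points × loci) input layout are assumptions made for this example, and the sampling-depth bias corrections described in SI Appendix, sections S1.2–S1.4 are deliberately omitted.

```python
import numpy as np

def temporal_covariance(freqs, rng=None):
    """Naive temporal variance-covariance matrix of allele frequency changes.

    freqs : array of shape (n_timepoints, n_loci) holding allele frequencies
            for a single replicate (hypothetical layout for this sketch).
    Returns an (n_timepoints - 1, n_timepoints - 1) matrix whose entry (t, s)
    is the covariance, taken across loci, between the frequency change in
    interval t and the frequency change in interval s.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.asarray(freqs, dtype=float)

    # Randomly choose which allele to track at each locus, so that no
    # systematic covariance arises from always following the reference
    # or the minor allele.
    flip = rng.random(freqs.shape[1]) < 0.5
    polarized = np.where(flip, 1.0 - freqs, freqs)

    # Allele frequency change between adjacent time points: (T - 1, n_loci).
    deltas = np.diff(polarized, axis=0)

    # np.cov treats rows as variables (time intervals) and columns as
    # observations (loci). No sampling-noise correction is applied here.
    return np.cov(deltas)
```

The diagonal holds the temporal variances and the off-diagonal entries hold the temporal covariances; with uncorrected pooled data, the diagonal and the first off-diagonal would still carry the sampling-noise biases that the text describes.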
Estimating Uncertainty with a Block Bootstrap.
To infer the uncertainty of covariance, convergence correlation, and $G(t)$ estimates, we used a block bootstrap procedure. This bootstrap procedure resamples blocks of loci, rather than individual loci, to infer the uncertainty of a statistic in the presence of unknown correlation between loci. As most estimators in this paper are ratios [e.g., covariance standardized by sample heterozygosity, $G(t)$, and the convergence correlation], which we estimate with a ratio of averages, we exploit the linearity of expectation for efficient computation of bootstrap samples (SI Appendix, Fig. S3 shows details).
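One way to see why linearity helps is that a ratio of averages over loci can be rebuilt from per-block sums, so each bootstrap replicate only needs to resample and add block totals. The sketch below illustrates that idea under stated assumptions: the function name, the argument layout, and the use of a simple percentile interval are choices made for this example and may differ from the procedure in SI Appendix.

```python
import numpy as np

def block_bootstrap_ratio(num, den, block_ids, n_boot=1000, alpha=0.05, rng=None):
    """Block bootstrap for a ratio-of-averages statistic.

    num, den  : per-locus numerator and denominator terms (for example a
                product of frequency changes and a heterozygosity term).
    block_ids : per-locus label of the genomic block each locus falls in.
    Returns (estimate, lower, upper), where the bounds form a percentile
    interval (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)

    # Map arbitrary block labels to 0..B-1 and sum each term within blocks.
    _, inverse = np.unique(np.asarray(block_ids), return_inverse=True)
    n_blocks = int(inverse.max()) + 1
    num_sums = np.bincount(inverse, weights=num, minlength=n_blocks)
    den_sums = np.bincount(inverse, weights=den, minlength=n_blocks)

    # The ratio of averages equals the ratio of sums over the same loci.
    estimate = num_sums.sum() / den_sums.sum()

    # Resample whole blocks with replacement; by linearity, each replicate
    # is just a sum of precomputed block totals.
    draws = rng.integers(0, n_blocks, size=(n_boot, n_blocks))
    boot = num_sums[draws].sum(axis=1) / den_sums[draws].sum(axis=1)

    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return estimate, lower, upper
```

Because the per-locus terms are touched only once, the cost of each bootstrap replicate scales with the number of blocks rather than the number of loci.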
Partitioning Unique and Shared Selection Effects in the Longshanks Study.
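The unselected control line in the Longshanks experiment allows us to additionally partition the total variance in allele frequency change into drift, shared effects of selection, and unshared effects of selection between selected replicates. We begin by decomposing the allele frequency change in LS1 as $\Delta p_{t,\mathrm{LS1}} = \Delta_D p_{t,\mathrm{LS1}} + \Delta_U p_{t,\mathrm{LS1}} + \Delta_S p_{t,\mathrm{LS}}$, where these terms are the drift in LS1 ($\Delta_D p_{t,\mathrm{LS1}}$), selection unique to the LS1 replicate ($\Delta_U p_{t,\mathrm{LS1}}$), and selection response shared between the two Longshanks replicates ($\Delta_S p_{t,\mathrm{LS}}$; and similarly for LS2). By construction, this decomposition assumes that each of these terms is uncorrelated within replicates, so the contribution of each term to the total variance in allele frequency change, $\mathrm{Var}(\Delta p_{t,\mathrm{LS1}})$, is the variance of that term’s allele frequency change.
We estimate the effects of selection by first calculating the fraction of the total variance explained by drift. We assume that the variance in allele frequency change observed in the unselected control [$\mathrm{Var}(\Delta p_{t,\mathrm{Ctrl}})$] is driven entirely by neutral genetic drift, and since an identical breeding scheme was used across all three replicates (except that breeders for the Ctrl line were chosen at random), we can use this as an estimate of the contribution of neutral genetic drift in the selected lines, $\mathrm{Var}(\Delta_D p_{t,\mathrm{LS1}}) = \mathrm{Var}(\Delta_D p_{t,\mathrm{LS2}}) = \mathrm{Var}(\Delta p_{t,\mathrm{Ctrl}})$. Then, we can estimate the increase in variance in allele frequency change due to selection as $[\mathrm{Var}(\Delta p_{t,\mathrm{LS1}}) + \mathrm{Var}(\Delta p_{t,\mathrm{LS2}})]/2 - \mathrm{Var}(\Delta p_{t,\mathrm{Ctrl}})$, which under the decomposition combines the unique and shared effects of selection. Finally, because drift and replicate-specific selection are uncorrelated between lines, the covariance in allele frequency change between replicates is used to estimate the shared effects of selection between lines, $\mathrm{Cov}(\Delta p_{t,\mathrm{LS1}}, \Delta p_{t,\mathrm{LS2}}) = \mathrm{Var}(\Delta_S p_{t,\mathrm{LS}})$.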
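The partition described above can be illustrated with a short sketch. It assumes the decomposition stated in the text (drift, replicate-specific selection, and shared selection are mutually uncorrelated), averages the variance over the two selected lines, and ignores sampling noise; the function name `partition_variance` and its arguments are hypothetical and exist only for this example.

```python
import numpy as np

def partition_variance(dp_ls1, dp_ls2, dp_ctrl):
    """Partition variance in allele frequency change for two selected lines.

    dp_ls1, dp_ls2 : per-locus allele frequency changes in the two selected
                     replicates (same loci, same allele polarization).
    dp_ctrl        : per-locus allele frequency changes in the control line.
    Returns the fractions of the selected-line variance attributed to drift,
    shared selection, and replicate-specific (unique) selection.
    """
    # Average total variance across the two selected replicates.
    var_selected = 0.5 * (np.var(dp_ls1, ddof=1) + np.var(dp_ls2, ddof=1))

    # The control line is assumed to change through drift alone.
    var_drift = np.var(dp_ctrl, ddof=1)

    # Only the shared selection response is correlated between replicates,
    # so the between-replicate covariance estimates its variance.
    var_shared = np.cov(dp_ls1, dp_ls2)[0, 1]

    # Whatever selection adds beyond drift and is not shared is unique.
    var_unique = (var_selected - var_drift) - var_shared

    return {
        "drift": var_drift / var_selected,
        "shared": var_shared / var_selected,
        "unique": var_unique / var_selected,
    }
```

By construction the three fractions sum to one, so a negative entry would signal that one of the assumptions (for example, equal drift in control and selected lines) is not holding in the data.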
Windowed Covariance and the Empirical Neutral Null.
Throughout the paper, we use genomic windows for the block bootstrap procedure. For the D. simulans and D. melanogaster data from the Barghi et al. (33), Kelly and Hughes (38), and Bergland et al. (35) studies, we used large megabase windows for the block bootstrap procedure, while we used a 10-Mb window for the large mouse genome data from the Castro et al. (39) study.
Given evidence of a reversal in the direction of selection at later time points in the Barghi et al. (33) study, we calculated windowed temporal covariances on 10-kb windows and looked at the distribution of these covariances through time. We compare these distributions of windowed covariances with an empirical neutral null created by randomly permuting the sign of allele frequency change at the block level (to preserve the correlation structure between loci due to LD). This destroys the systematic covariances in allele frequency change created by linked selection, which emulates a frequency trajectory under drift. This approach is conservative since heritable fitness variation also inflates the magnitude of allele frequency change more than expected under drift, but we do not change these magnitudes. Using this empirical neutral null distribution of windowed covariances, we calculate how much of the observed windowed covariance distribution falls outside of the empirical null distribution for different tail probabilities. While the comparison between the distribution of 10-kb windowed covariances and the empirical neutral null created from sign-permuting 10-kb windows is most natural, we wanted to ensure that the shift we found from mostly positive to mostly negative windowed covariances through time (Fig. 3) was robust to LD extending beyond the range of these 10-kb windows. We took the conservative approach of also sign permuting at the chromosome level and found the same qualitative shift (SI Appendix, Fig. S29).
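The sign-permutation null can be sketched as follows for a single pair of time intervals. This is an illustrative reading of the procedure, not the authors' code: the function names, the use of one random sign per window and per interval, and the uncorrected (population) covariance within each window are assumptions of this example.

```python
import numpy as np

def windowed_covariance(d1, d2, window_ids):
    """Covariance between two sets of frequency changes within each window."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    _, inv = np.unique(np.asarray(window_ids), return_inverse=True)
    n = np.bincount(inv)
    mean1 = np.bincount(inv, weights=d1) / n
    mean2 = np.bincount(inv, weights=d2) / n
    return np.bincount(inv, weights=d1 * d2) / n - mean1 * mean2

def sign_permuted_null(d1, d2, window_ids, n_perm=100, rng=None):
    """Empirical null: flip the sign of every change in a window together."""
    rng = np.random.default_rng() if rng is None else rng
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    _, inv = np.unique(np.asarray(window_ids), return_inverse=True)
    n_win = int(inv.max()) + 1
    null = np.empty((n_perm, n_win))
    for i in range(n_perm):
        # One random sign per window and per interval, shared by all loci in
        # the window, so within-window LD structure and the magnitudes of the
        # changes are preserved while the direction is randomized.
        s1 = rng.choice([-1.0, 1.0], size=n_win)[inv]
        s2 = rng.choice([-1.0, 1.0], size=n_win)[inv]
        null[i] = windowed_covariance(s1 * d1, s2 * d2, inv)
    return null

def tail_excess(observed, null, tail=0.05):
    """Fraction of observed windows beyond the null's lower and upper tails."""
    lo, hi = np.quantile(null, [tail, 1.0 - tail])
    return float(np.mean(observed < lo)), float(np.mean(observed > hi))
```

Under this scheme each permuted window keeps the magnitude of its observed covariance and only its sign is randomized, which is why the comparison is conservative in the sense described in the text. Permuting at a coarser level, such as whole chromosomes, amounts to passing chromosome labels for the sign draw while keeping the 10-kb windows for the covariance calculation.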
Forward-in-Time Simulations.
To explore how aspects of genetic architecture, models of selection, and experimental design impact temporal covariance, the $G(t)$ trajectories, and convergence correlations, we ran extensive forward-in-time simulations using SLiM (56); here, we discuss the GSS simulations in Fig. 4, but SI Appendix, section S8 describes these simulation routines and others in detail.
We simulated directional selection on a trait by first evolving each population of diploids to equilibrium (we will refer to this as the burn-in hereafter) under GSS for generations with the stabilizing selection variance and an optimum set at 0. We note that the small burn-in population size means that these simulations should not be taken as reflecting any specific natural population, and they are for illustrative purposes only. We simulated a polygenic architecture by setting the trait mutation rate to per base pair, per generation, in addition to having a separate neutral mutation rate of , which created neutral mutations that we used to calculate the temporal covariances. Our simulated region was 50 Mb in length (about one-quarter of a Drosophila chromosome), and trait alleles were randomly selected to have a effect size. By tracking the trait mean through the burn-in, we found that it converged to the optimum as expected. After the burn-in, the population was split into two different replicate populations to capture the effect of bottlenecks in selection experiments (these population sizes varied between 50, 500, and 1,000 diploids, the latter representing no bottleneck). Each population then underwent an optimum shift of 0.1, 0.5, or 1 on generation 5, with the first four generations serving as a control. These optimum shifts were in the same direction (converging), in different directions (diverging), or applied to only one optimum (as a control). After the shift, the trait mean showed the expected directional response to selection (SI Appendix, Fig. S7). Using the neutral population frequency data from these simulations, we calculated the temporal covariances, $G(t)$ trajectories, and convergence correlations.
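As a much simpler stand-in for the neutral control generations in these simulations, the sketch below simulates pure Wright-Fisher drift at unlinked loci and computes the temporal covariance matrix. It is not the SLiM model used in the paper: there is no selection, no linkage, and no mutation, and the function name and parameter values are assumptions chosen for illustration. Its only purpose is to show that under drift alone the expected covariance between frequency changes in different generations is zero.

```python
import numpy as np

def simulate_drift(n_diploid, n_loci, n_gens, p0=0.5, rng=None):
    """Neutral Wright-Fisher drift at independent loci.

    Returns an array of shape (n_gens + 1, n_loci) of allele frequencies,
    starting every locus at frequency p0.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_chrom = 2 * n_diploid
    freqs = np.empty((n_gens + 1, n_loci))
    freqs[0] = p0
    for t in range(n_gens):
        # Each generation samples 2N gametes binomially from the current
        # allele frequency at every locus.
        freqs[t + 1] = rng.binomial(n_chrom, freqs[t]) / n_chrom
    return freqs

def drift_temporal_covariance(freqs):
    """Covariance across loci between per-generation frequency changes."""
    return np.cov(np.diff(freqs, axis=0))
```

For example, `drift_temporal_covariance(simulate_drift(500, 20000, 5))` should give diagonal entries near p(1 − p)/(2N) and off-diagonal entries that scatter around zero; sustained positive off-diagonal values are the signal the paper attributes to linked selection.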
Acknowledgments
We thank the authors of the original studies we have analyzed, including Neda Barghi, Nick Barton, Alan Bergland, Frank Chan, Kimberly Hughes, John Kelly, Dmitri Petrov, Campbell Rolian, and Christian Schlötterer. We also thank Doc Edge for helpful statistical advice and Dave Begun, Erin Calfee, Sarah Friedman, Andy Kern, Chuck Langley, Michael Turelli, Matt Osmond, Peter Ralph, and Sivan Yair for helpful discussions. Additionally, we thank Guy Sella and an anonymous reviewer whose comments greatly improved the manuscript. This research was supported by NSF Graduate Research Fellowship Grant 1650042 (to V.B.), NIH Grant R01-GM108779 (to G.C.), and NSF Grant 1353380 (to G.C.).
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1919039117/-/DCSupplemental.
Data Availability.
All analyses were done in Python using numpy, matplotlib, and Jupyter notebooks (57–60); reanalysis code and notebooks to reproduce these analyses are available on GitHub, https://github.com/vsbuffalo/cvtk/ (61). All data are from previous studies and available; Barghi et al. (33, 62) data were downloaded from https://datadryad.org/resource/doi:10.5061/dryad.rr137kn, Kelly and Hughes (38, 63) data were downloaded from https://gsajournals.figshare.com/articles/Supplemental_Material_for_Kelly_and_Hughes_2018/7124963, Bergland et al. (35, 64) data were downloaded from https://datadryad.org/stash/dataset/doi:10.5061/dryad.v883p, and Castro et al. (39, 65) data were downloaded from http://ftp.tuebingen.mpg.de/fml/ag-chan/Longshanks/.
References
- 1. Smith J. M., Haigh J., The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23–35 (1974).
- 2. Charlesworth B., Morgan M. T., Charlesworth D., The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
- 3. Nordborg M., Charlesworth B., Charlesworth D., The effect of recombination on background selection. Genet. Res. 67, 159–174 (1996).
- 4. Neher R. A., Kessinger T. A., Shraiman B. I., Coalescence and genetic diversity in sexual populations under selection. Proc. Natl. Acad. Sci. U.S.A. 110, 15836–15841 (2013).
- 5. Barton N. H., Genetic hitchhiking. Philos. Trans. R. Soc. Lond. B Biol. Sci. 355, 1553–1562 (2000).
- 6. Cutter A. D., Payseur B. A., Genomic signatures of selection at linked sites: Unifying the disparity among species. Nat. Rev. Genet. 14, 262–274 (2013).
- 7. Sella G., Petrov D. A., Przeworski M., Andolfatto P., Pervasive natural selection in the Drosophila genome? PLoS Genet. 5, e1000495 (2009).
- 8. Macpherson J. M., Sella G., Davis J. C., Petrov D. A., Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics 177, 2083–2099 (2007).
- 9. Andolfatto P., Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res. 17, 1755–1762 (2007).
- 10. Begun D. J., et al., Population genomics: Whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 5, e310 (2007).
- 11. Beissinger T. M., et al., Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2, 16084 (2016).
- 12. Sattath S., Elyashiv E., Kolodny O., Rinott Y., Sella G., Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7, e1001302 (2011).
- 13. Williamson R. J., et al., Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora. PLoS Genet. 10, e1004622 (2014).
- 14. Andersen E. C., et al., Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity. Nat. Genet. 44, 285–290 (2012).
- 15. Cutter A. D., Choi J. Y., Natural selection shapes nucleotide polymorphism across the genome of the nematode Caenorhabditis briggsae. Genome Res. 20, 1103–1111 (2010).
- 16. Elyashiv E., et al., A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 12, e1006130 (2016).
- 17. Nordborg M., et al., The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, e196 (2005).
- 18. McVicker G., Gordon D., Davis C., Green P., Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471 (2009).
- 19. Hernandez R. D., et al.; 1000 Genomes Project, Classic selective sweeps were rare in recent human evolution. Science 331, 920–924 (2011).
- 20. Grant P. R., Grant B. R., Causes of lifetime fitness of Darwin’s finches in a fluctuating environment. Proc. Natl. Acad. Sci. U.S.A. 108, 674–679 (2011).
- 21. Grant P. R., Grant B. R., Evolution of character displacement in Darwin’s finches. Science 313, 224–226 (2006).
- 22. Reznick D. N., Shaw F. H., Rodd F. H., Shaw R. G., Evaluation of the rate of evolution in natural populations of guppies (Poecilia reticulata). Science 275, 1934–1937 (1997).
- 23. Franks S. J., Sim S., Weis A. E., Rapid evolution of flowering time by an annual plant in response to a climate fluctuation. Proc. Natl. Acad. Sci. U.S.A. 104, 1278–1282 (2007).
- 24. Good B. H., McDonald M. J., Barrick J. E., Lenski R. E., Desai M. M., The dynamics of molecular evolution over 60,000 generations. Nature 551, 45–50 (2017).
- 25. Bennett A. F., Dao K. M., Lenski R. E., Rapid evolution in response to high-temperature selection. Nature 346, 79–81 (1990).
- 26. Baym M., et al., Spatiotemporal microbial evolution on antibiotic landscapes. Science 353, 1147–1151 (2016).
- 27. Latta R. G., Differentiation of allelic frequencies at quantitative trait loci affecting locally adaptive traits. Am. Nat. 151, 283–292 (1998).
- 28. Pritchard J. K., Pickrell J. K., Coop G., The genetics of human adaptation: Hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 20, R208–R215 (2010).
- 29. Kemper K. E., Saxton S. J., Bolormaa S., Hayes B. J., Goddard M. E., Selection for complex traits leaves little or no classic signatures of selection. BMC Genom. 15, 246 (2014).
- 30. Turner T. L., Stewart A. D., Fields A. T., Rice W. R., Tarone A. M., Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7, e1001336 (2011).
- 31. Turner T. L., Miller P. M., Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191, 633–642 (2012).
- 32. Burke M. K., et al., Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467, 587–590 (2010).
- 33. Barghi N., et al., Genetic redundancy fuels polygenic adaptation in Drosophila. PLoS Biol. 17, e3000128 (2019).
- 34. Therkildsen N. O., et al., Contrasting genomic shifts underlie parallel phenotypic evolution in response to fishing. Science 365, 487–490 (2019).
- 35. Bergland A. O., Behrman E. L., O’Brien K. R., Schmidt P. S., Petrov D. A., Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. PLoS Genet. 10, e1004775 (2014).
- 36. Machado H. E., et al., Broad geographic sampling reveals predictable and pervasive seasonal adaptation in Drosophila. 10.1101/337543 (11 October 2019).
- 37. Buffalo V., Coop G., The linked selection signature of rapid adaptation in temporal genomic data. Genetics 213, 1007–1045 (2019).
- 38. Kelly J. K., Hughes K. A., Pervasive linked selection and intermediate-frequency alleles are implicated in an evolve-and-resequencing experiment of Drosophila simulans. Genetics 211, 943–961 (2019).
- 39. Castro J. P., et al., An integrative genomic analysis of the Longshanks selection experiment for longer limbs in mice. Elife 8, e42014 (2019).
- 40. Marchini M., et al., Impacts of genetic correlation on the independent evolution of body mass and skeletal size in mammals. BMC Evol. Biol. 14, 258 (2014).
- 41. Robertson A., Inbreeding in artificial selection programmes. Genet. Res. 2, 189–194 (1961).
- 42. Santiago E., Caballero A., Effective size of populations under selection. Genetics 139, 1013–1030 (1995).
- 43. Hayward L. K., Sella G., Polygenic adaptation after a sudden change in environment. 10.1101/792952 (3 October 2019).
- 44. Nuzhdin S. V., Turner T. L., Promises and limitations of hitchhiking mapping. Curr. Opin. Genet. Dev. 23, 694–699 (2013).
- 45. Baldwin-Brown J. G., Long A. D., Thornton K. R., The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol. Biol. Evol. 31, 1040–1055 (2014).
- 46. Santiago E., Caballero A., Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics 149, 2105–2117 (1998).
- 47. Wray N. R., Thompson R., Prediction of rates of inbreeding in selected populations. Genet. Res. 55, 41–54 (1990).
- 48. Woolliams J. A., Wray N. R., Thompson R., Prediction of long-term contributions and inbreeding in populations undergoing mass selection. Genet. Res. 62, 231–242 (1993).
- 49. Signor S. A., New F. N., Nuzhdin S., A large panel of Drosophila simulans reveals an abundance of common variants. Genome Biol. Evol. 10, 189–206 (2018).
- 50. Howie J. M., Mazzucco R., Taus T., Nolte V., Schlötterer C., DNA motifs are not general predictors of recombination in two Drosophila sister species. Genome Biol. Evol. 11, 1345–1357 (2019).
- 51. Hendry A. P., Kinnison M. T., Perspective: The pace of modern life: Measuring rates of contemporary microevolution. Evolution 53, 1637–1653 (1999).
- 52. Hairston N. G. Jr, Ellner S. P., Geber M. A., Yoshida T., Fox J. A., Rapid evolution and the convergence of ecological and evolutionary time. Ecol. Lett. 8, 1114–1127 (2005).
- 53. Corbett-Detig R. B., Hartl D. L., Sackton T. B., Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13, e1002112 (2015).
- 54. Coop G., Does linked selection explain the narrow range of genetic diversity across species? 10.1101/042598 (7 March 2016).
- 55. Comeron J. M., Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet. 10, e1004434 (2014).
- 56. Haller B. C., Messer P. W., SLiM 3: Forward genetic simulations beyond the Wright-Fisher model. Mol. Biol. Evol. 36, 632–637 (2019).
- 57. Van Rossum G., Drake F. L., Python Tutorial (Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands, 1995).
- 58. Oliphant T. E., A Guide to NumPy (Trelgol Publishing USA, 2006), vol. 1.
- 59. Kluyver T., et al.; Jupyter development team, “Jupyter Notebooks—a publishing format for reproducible computational workflows” in Positioning and Power in Academic Publishing: Players, Agents and Agendas, Loizides F., Schmidt B., Eds. (IOS Press, 2016), pp. 87–90.
- 60. Hunter J. D., Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
- 61. Buffalo V., Coop G., Estimating the genome-wide contribution of selection to temporal allele frequency change. GitHub. https://github.com/vsbuffalo/cvtk. Deposited 31 July 2020.
- 62. Barghi N., et al., Data from “Genetic redundancy fuels polygenic adaptation in Drosophila.” Dryad. https://datadryad.org/stash/dataset/doi:10.5061/dryad.rr137kn. Deposited 6 February 2019.
- 63. Kelly J. K., Hughes K. A., Data from “Supplemental Material for Kelly and Hughes.” Figshare. 10.25386/genetics.7124963. Deposited 28 December 2018.
- 64. Bergland A. O., et al., Data from “Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila.” Dryad. https://datadryad.org/stash/dataset/doi:10.5061/dryad.v883p. Deposited 8 October 2015.
- 65. Castro J. P. L., et al., Data from “Index of /fml/ag-chan/Longshanks; beagle_genMap.all.impute.vcf.gz.” University of Tuebingen. http://ftp.tuebingen.mpg.de/fml/ag-chan/Longshanks/. Deposited 4 June 2019.