The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data

Vince Buffalo; Graham Coop

doi:10.1534/genetics.119.302581

Genetics. 2019 Sep 26;213(3):1007–1045.

PMCID: PMC6827383 PMID: 31558582

Populations adapt to selection on polygenic traits through subtle allele frequency changes scattered throughout the genome. Detecting such changes from population genomic data is quite difficult, as these small changes can look like genetic drift. Buffalo...

Keywords: linked selection, polygenic selection, rapid adaptation, temporal genomic data; MPP

Abstract

The majority of empirical population genetic studies have tried to understand the evolutionary processes that have shaped genetic variation in a single sample taken from a present-day population. However, genomic data collected over tens of generations in both natural and laboratory populations are increasingly used to find selected loci underpinning adaptation over these short timescales. Although these studies have been quite successful in detecting selection on large-effect loci, the fitness differences between individuals are often polygenic, such that selection leads to allele frequency changes that are difficult to distinguish from genetic drift. However, one promising signal comes from polygenic selection’s effect on neutral sites that become stochastically associated with the genetic backgrounds that lead to fitness differences between individuals. Previous theoretical work has established that the random associations between a neutral allele and heritable fitness backgrounds act to reduce the effective population size experienced by this neutral allele. These associations perturb neutral allele frequency trajectories, creating autocovariance in the allele frequency changes across generations. Here, we show how temporal genomic data allow us to measure the temporal autocovariance in allele frequency changes and characterize the genome-wide impact of polygenic selection. We develop expressions for these temporal autocovariances, showing that their magnitude is determined by the level of additive genetic variation, recombination, and linkage disequilibria in a region. Furthermore, by using analytic expressions for the temporal variances and autocovariances in allele frequency, we demonstrate that one can estimate the additive genetic variation for fitness and the drift-effective population size from temporal genomic data. We also show how the proportion of total variation in allele frequency change due to linked selection can be estimated from temporal data. 
Overall, we demonstrate that temporal genomic data offer opportunities to identify the role of linked selection on genome-wide diversity over short timescales, and can help bridge population genetic and quantitative genetic studies of adaptation.

Adaptation can occur over remarkably short ecological timescales, with dramatic changes in phenotypes occurring over just a few generations in natural populations. This rapid pace of adaptive change has been known to be mirrored at the genetic level since the early work of Fisher and Ford (1947), which tested whether the rapid decline in a coloration polymorphism was consistent with natural selection or genetic drift. Since then, researchers have continued to use temporal data to detect selection on polymorphisms over short timescales in natural populations (Dobzhansky 1943, 1971; Fisher and Ford 1947; Kettlewell 1958, 1961; Mueller et al. 1985b), as well as to quantify the rate of genetic drift (Prout 1954; Wallace 1956; Nei and Tajima 1981; Pollak 1983; Mueller et al. 1985a; Waples 1989; Wang and Whitlock 2003). However, this line of work in sexual populations has been partially eclipsed by a vast body of work examining large-scale population genetic and genomic data sets from a single contemporary timepoint. More recently, studies have applied similar temporal approaches to whole-genome data to discover selected loci in contemporaneous natural populations (Bergland et al. 2014; Rajpurohit et al. 2018), in evolve-and-resequence studies (Teotónio et al. 2009; Burke et al. 2010; Johansson et al. 2010; Turner et al. 2011; Orozco-terWengel et al. 2012; Turner and Miller 2012; Franssen et al. 2017; Barghi et al. 2019), and in ancient DNA (Mathieson et al. 2015; Fu et al. 2016). Furthermore, numerous methods have been developed to estimate effective population size (Nei and Tajima 1981; Pollak 1983; Waples 1989) and detect selected loci (Malaspinas et al. 2012; Mathieson and McVean 2013; Feder et al. 2014; Terhorst et al. 2015) from time-series data.

Overall, these approaches have identified compelling examples where selection has driven extreme allele frequency change at particular loci that is inconsistent with drift alone. However, most adaptation on ecological timescales likely involves selection on phenotypes with polygenic architecture and abundant standing variation (Endler 1986; Hendry and Kinnison 1999; Kinnison and Hendry 2001; Kopp and Hermisson 2009b). We know from theory that adaptation on such traits can result from very subtle allele frequency changes across the many loci that underlie the trait (Bulmer 1980), at least for the short-term evolutionary response (Hermisson and Pennings 2005; Chevin and Hospital 2008; Jain and Stephan 2015, 2017; Thornton 2018; Höllinger et al. 2019). These changes may be individually indistinguishable from genetic drift in temporal data. This poses a challenge for population genetic approaches to quantify selection: rapid phenotypic adaptations occurring on ecological timescales may leave a signal on genome-wide patterns of diversity that is undetectable by methods focused on individual loci.

Here, we explore an alternative: rather than aiming to find selected loci, we can use temporal data to quantify the genome-wide effects of linked selection during polygenic adaptation. Linked selection introduces a new source of stochasticity into evolution, as a neutral allele’s frequency change depends on the fitness of the set of random genetic backgrounds it finds itself on (Gillespie 2000). The impact linked selection has on neutral loci is mediated by associations [linkage disequilibria/disequilibrium (LD)] with selected loci and hence their recombination environment; neutral loci tightly linked to selected sites experience greater average reductions in diversity than more loosely coupled sites. Studies using a single timepoint have long exploited this idea, with some of the first evidence of pervasive natural selection being the correlation between diversity and recombination in Drosophila (Aguade et al. 1989; Begun and Aquadro 1992). Various forms of linked selection give rise to such patterns, with much attention focusing on the hitchhiking (positive selection; Smith and Haigh 1974) or background selection models (negative or purifying selection; Charlesworth et al. 1993; Hudson and Kaplan 1995). Recent genomic studies have modeled patterns of genome-wide diversity considering substitutions, functional constraints, and recombination environments to estimate parameters of hitchhiking and background selection models, and have begun to differentiate between these models (McVicker et al. 2009; Hernandez et al. 2011; Elyashiv et al. 2016). Across-taxa comparisons have shown that signals of linked selection are present in many sexual organisms, and that in some species a proportion of the stochastic change in allele frequencies is due to the randomness of linked selection instead of genetic drift (Cutter and Payseur 2013; Corbett-Detig et al. 2015; Coop 2016). 
Likewise, in asexual and facultatively sexual organisms, both theory and empirical work show that linked selection and interference are primary determinants of the levels of genetic diversity (Neher and Shraiman 2011; Neher 2013; Good et al. 2014, 2017).

In this paper, we extend this well-established approach of quantifying genome-wide selection through its indirect impact on linked neutral sites to temporal genomic data. We show that during rapid polygenic selection, linked selection leaves a signal in temporal genomic data that can readily be differentiated from neutral processes. Specifically, selected alleles perturb the allele frequency trajectories of neighboring neutral loci, increasing the variance of neutral allele frequency change and creating covariance between the neutral allele frequency changes across generations. Earlier work has modeled this effect on neutral alleles as a long-term reduction in the effective population size (Wray and Thompson 1990; Santiago and Caballero 1995, 1998; Woolliams et al. 1993; Robertson 1961), but the increasing availability of genome-wide frequency data across multiple timepoints allows us to directly quantify the extent of linked selection over short ecological timescales (tens of generations). We develop theory for the variances and covariances of neutral allele frequency change under selection, and show that analogous to diversity in a single timepoint study, their magnitudes depend on the local fitness variation, recombination, and LD. Furthermore, we show that our theory can (1) directly partition the variation in genome-wide frequency change into the components caused by drift and selection, (2) estimate the additive genetic variance for fitness and how it changes over time, and (3) detect patterns of fluctuating selection from temporal data. Overall, we believe that our approach to modeling temporal genomic data will provide a more complete picture of how selection shapes allele frequency changes over ecological timescales in natural populations, potentially allowing us to understand short-term effects of linked selection that would otherwise not be perceptible from studies using a single timepoint.

Outline of Temporal Autocovariance Theory

Our goal is to understand how linked selection affects the frequency trajectories of neutral sites by modeling the variances and covariances of neutral allele frequency changes ( $Δ p_{t} = p_{t + 1} - p_{t}$ , where $p_{t}$ is the population frequency at time t; see Table 1 for a list of all notation used). We assume a closed population with discrete, nonoverlapping generations. When there are no heritable fitness differences between individuals, genetic drift is the only source of stochasticity in allele frequency change, and it arises from two sources of variation: random nonheritable or environmental differences in offspring number, and Mendelian segregation of heterozygotes. Both are directionless, such that when averaged over evolutionary replicates, the expected change in allele frequency due to drift alone is $E (Δ p_{t}) = 0$; drift is instead quantified by the variance in allele frequency change, $Var (Δ p_{t})$ (as summarized by the variance effective population size; Wright 1938; Crow and Kimura 1970; Charlesworth 2009).
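To make the drift-only baseline concrete, the following is a minimal sketch assuming Wright–Fisher reproduction, in which the 2N gene copies of the next generation are drawn binomially from the current allele frequency. This is a standard idealization that the text above does not commit to, and the function name `wright_fisher_changes` is hypothetical. Under this assumption, $E (Δ p_{t}) = 0$ and $Var (Δ p_{t}) = p_{t} (1 - p_{t}) / (2 N)$, which the replicate averages should approach.

```python
import random
import statistics

def wright_fisher_changes(p0, n_diploid, n_reps, seed=1):
    """One-generation changes dp = p1 - p0 over independent neutral replicates.

    Assumes Wright-Fisher reproduction (hypothetical helper): the 2N gene copies
    of the offspring generation are drawn independently with probability p0,
    so there are no heritable fitness differences and drift is the only force."""
    rng = random.Random(seed)
    n_genes = 2 * n_diploid
    changes = []
    for _ in range(n_reps):
        count = sum(rng.random() < p0 for _ in range(n_genes))  # binomial(2N, p0) draw
        changes.append(count / n_genes - p0)
    return changes

p0, N = 0.3, 50
dp = wright_fisher_changes(p0, N, n_reps=20000)
print(statistics.fmean(dp))       # close to E(dp) = 0
print(statistics.pvariance(dp))   # close to p0 (1 - p0) / (2N)
print(p0 * (1 - p0) / (2 * N))    # the Wright-Fisher expectation
```

With heritable fitness differences added, the variance would exceed this binomial value, which is the extra $Var (Δ_{H} p_{t})$ term introduced below.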

Table 1. Notation.

Symbol	Usage and relevant equations
$p_{t}$	Allele frequency in generation/timepoint t
$Δ p_{t}$	Allele frequency change between generations $t + 1$ and t, $Δ p_{t} = p_{t + 1} - p_{t}$
$Δ_{N} p_{t}$	Frequency change due to nonheritable variation in fitness, (1), (2)
$Δ_{M} p_{t}$	Frequency change due to Mendelian segregation, (1), (2)
$Δ_{H} p_{t}$	Frequency change due to heritable differences, (1), (2)
N	Census population size of breeding individuals
$N_{e}$	Effective population size
$f_{i}$	Fitness (expected number of offspring) of individual i, (28)
$α_{t, l}$	Effect size in generation t and locus l, (5), (36)
L	Total number of loci impacting fitness, (5)
$g_{i, l} \in \{0, 1, 2\}$	Individual i’s gene count at locus l, (37)
$x_{i} \in \{0, 1, 2\}$	Individual i’s neutral gene count at the tracked neutral site, (5), (37), (38)
$D_{t, l}$ or $D_{t, l}^{'}$	Gametic linkage disequilibrium between the tracked neutral site and selected locus l at time t, Supplementary Figure SA.2, (5), (37), (38)
$D_{t, l}^{″}$	Nongametic disequilibrium between the tracked neutral site and selected locus l at time t, Supplementary Figure SA.2, (5), (37), (38)
$E (ℛ_{t, l}^{2})$	The expected squared correlation coefficient of linkage disequilibrium between the tracked neutral site and selected site l at time t, (7), (44)
$r_{l}$	The recombination fraction between the tracked neutral site and selected site l
$V_{a} (s)$	The additive genic variance, (8)
$V_{A} (s)$	The additive genetic variance, (17)
R	The total level of recombination in the region, in morgans, (9) and Figure 1
$r (g)$	A mapping function (i.e., Haldane’s), which maps a position g to a recombination fraction.
ρ	The population recombination rate, $ρ = 4 N r$; see Temporal autocovariance for an average neutral polymorphism
$A (R, t, s)$	The average linkage disequilibrium in a region of R morgans that persisted from generation t to generation s, (10)
$V_{N}$	The nonheritable variance in offspring number, (11)
$S S H (t)$	The sum of site heterozygosity at selected sites at time t, (13)
$S S H_{n} (t)$	The sum of site heterozygosity at neutral sites at time t, (13)
$s s h_{n} (t)$	The proportion of the sum of site heterozygosity at neutral sites at time t relative to $S S H_{n} (1)$, i.e., $S S H_{n} (t) / S S H_{n} (1)$, (15)
$z_{i}$	The breeding value of the trait that determines fitness, $z_{i} = \sum_{l = 1}^{L} α g_{i, l}$, see Temporal Variance and Autocovariance Under Multilocus Selection in Appendix
$w (z_{i})$	The fitness of individual i with fitness function $w (\cdot)$ , see Temporal Variance and Autocovariance Under Multilocus Selection in Appendix
Q	The sample standardized variance–covariance matrix, (16)
$Q_{t, s}$	The elements of the observed sample matrix Q, (16)
Σ	The standardized variance–covariance matrix, based on our theoretical expressions
$Σ_{t, s}$	The elements of the standardized variance–covariance matrix, (10), (11)
$Δ p_{n, t}$	The allele frequency change at site n between times $t + 1$ and t, (16)
τ	The number of allele frequency changes observed, e.g., after sampling for $τ + 1$ timepoints
$V_{a, s s h_{n}} (t)$	The additive genic variation at time t as approximated by the observed decay in the sum of site heterozygosity at neutral sites, (15)
${Var}_{i} (z_{i})$	The variance in trait values taken over individuals, (17)
$\widehat{V_{A}} (1)$	The method-of-moments estimate of the additive genetic variance in the first generation, (18), (19)
$\hat{F}$	The method-of-moments estimate of Wright’s standardized variance, $F = 1 / (2 N)$, (18), (19)
$\hat{N}$	The method-of-moments estimate of drift-effective population size, $N = 1 / (2 F)$
$σ^{2}$	The sampling noise around each element of the sample variance–covariance matrix.
B	The total number of windows after partitioning the genome, (22)
$w_{\text{BP}}$	Width of windows (in base pairs), (22)
$v_{A} (1)$	The average additive genetic variance per base pair
$w_{CBP}$	Number of coding base pairs in a window, (22)
$W_{CBP}$	Total number of coding base pairs in the genome, (23)
G	A conservative measure of the total variance in allele frequency change due to linked selection, (24)
$G'$	An alternate, less conservative measure of the total variance in allele frequency change due to linked selection, (25)
$G_{a b s}$	A variant of G using the absolute value of covariances, (27)


When there are heritable fitness differences between individuals in the population, a third source of stochasticity affects a neutral allele’s frequency change: neutral alleles can become randomly associated with the genetic backgrounds that determine the fitness differences between individuals (Santiago and Caballero 1995, 1998; Robertson 1961). Even though the neutral alleles do not impact fitness, their frequency trajectories are perturbed by their fitness background, as those on advantageous backgrounds leave more descendants, while those on disadvantageous backgrounds leave fewer. We can partition a neutral allele’s frequency change into these three uncorrelated stochastic components [following Santiago and Caballero (1995); see Appendix section Decomposition of Allele Frequency Change for proof],

Δ p_{t} = \underbrace{Δ_{N} p_{t} + Δ_{M} p_{t}}_{\text{drift}} + \underbrace{Δ_{H} p_{t}}_{\text{selection}}

(1)

where $Δ_{N} p_{t}$, $Δ_{M} p_{t}$, and $Δ_{H} p_{t}$ are the neutral allele’s frequency changes due to nonheritable variation in fitness between diploid individuals, Mendelian segregation of heterozygotes into offspring, and heritable variation in fitness (we refer to the last as the heritable change in neutral allele frequency), respectively. Note that while throughout the paper we consider the allele frequency change between adjacent generations, the same approach can be extended to situations where the study system cannot be observed every generation. Like the stochastic components of drift, the allele frequency change due to heritable fitness differences is directionless ($E (Δ_{H} p_{t}) = 0$). Additionally, since each component is uncorrelated with the others, the variance in allele frequency change is

Var (Δ p_{t}) = Var (Δ_{N} p_{t}) + Var (Δ_{M} p_{t}) + Var (Δ_{H} p_{t}) .

(2)

The terms $Var (Δ_{N} p_{t})$ and $Var (Δ_{M} p_{t})$ capture the variance due to the random reproduction process, and the former can accommodate extra nonheritable variance in offspring number [as long as individuals are exchangeable with respect to their genotype (Cannings 1974)], while the term $Var (Δ_{H} p_{t})$ captures heritable fitness variation due to systematic differences in the fitnesses of individuals caused by their genotypes.

In addition to inflating the within-generation variance in allele frequency change, heritable fitness variation has another profound effect on neutral alleles: while the stochastic components of drift have independent effects on frequency change each generation, heritable variation in fitness creates temporal autocovariance in neutral allele frequency changes across generations. The contribution of temporal autocovariance becomes evident when we write the total cumulative allele frequency change as the sum of the allele frequency changes in each generation,

\begin{array}{l} Var (p_{t} - p_{0}) = Var (Δ p_{t - 1} + Δ p_{t - 2} + \dots + Δ p_{0}) \\ = \sum_{i = 0}^{t - 1} Var (Δ p_{i}) + \sum_{i \neq j} Cov (Δ p_{i}, Δ p_{j}) \\ Var (p_{t} - p_{0}) = \underbrace{\sum_{i = 0}^{t - 1} (Var (Δ_{N} p_{i}) + Var (Δ_{M} p_{i}))}_{\text{drift}} + \underbrace{\sum_{i = 0}^{t - 1} Var (Δ_{H} p_{i})}_{\text{genetic variance in offspring number}} + \underbrace{\sum_{i \neq j} Cov (Δ_{H} p_{i}, Δ_{H} p_{j})}_{\text{temporal autocovariance}} . \end{array}

(3)

These covariance terms are expected to be nonzero only when there is heritable variation in fitness [assuming there is neither non-Mendelian segregation, nor covariance between the parental and offspring environment, e.g., as in Heyer et al. (2005)].
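Equation 3 rests on the identity that the variance of a sum equals the sum of all variances plus all pairwise covariances. The sketch below checks this numerically on replicate trajectories from a deliberately simple toy model, which is hypothetical and not the paper's model: each replicate's change in generation t is fresh Gaussian noise standing in for drift, plus a replicate-specific component $h c^{t}$ that persists and decays across generations, standing in for a neutral allele's association with its fitness background. Only the persistent component produces nonzero covariance terms. All function and parameter names are ours.

```python
import random
import statistics

def cov(xs, ys):
    """Covariance across replicates, normalised by n (not n - 1)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simulate_changes(n_reps, n_gens, drift_sd, sel_sd, decay, seed=2):
    """Toy model, not the paper's: per-generation change = fresh Gaussian noise
    (stand-in for drift) + h * decay**t, where h is drawn once per replicate
    (stand-in for a persisting, decaying association with a fitness background)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        h = rng.gauss(0.0, sel_sd)
        reps.append([rng.gauss(0.0, drift_sd) + h * decay**t for t in range(n_gens)])
    return reps

reps = simulate_changes(n_reps=5000, n_gens=5, drift_sd=0.01, sel_sd=0.01, decay=0.5)
T = len(reps[0])
by_gen = [[rep[t] for rep in reps] for t in range(T)]  # dp_t across replicates
total = [sum(rep) for rep in reps]                     # p_T - p_0 per replicate

var_terms = sum(cov(by_gen[i], by_gen[i]) for i in range(T))
cov_terms = sum(cov(by_gen[i], by_gen[j])
                for i in range(T) for j in range(T) if i != j)

print(cov(total, total), var_terms + cov_terms)  # equal up to floating-point rounding
print(cov_terms)  # the persistent component makes this positive in expectation
```

Setting `sel_sd=0.0` removes the persistent component, and the covariance terms then fluctuate around zero, as they would under drift alone.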

Temporal autocovariance is caused by the persistence over generations of the statistical associations (LD) between a neutral allele and the fitnesses of the random genetic backgrounds it finds itself on; as long as some fraction of association persists, the heritable variation for fitness in one generation predicts the change in later generations, as illustrated by the fact that $Cov (Δ p_{2}, Δ p_{0}) > 0$ (see Figure 1A). Ultimately, segregation and recombination break down haplotypes and shuffle alleles among chromosomes, leading to the decay of autocovariance with time.

Figure 1. (A) On an advantageous background (light blue), a neutral allele increases in frequency, leading to a positive change in allele frequency early on, $Δ p_{0} = p_{1} - p_{0}$ . As long as some fraction of neutral alleles remain associated with this advantageous background, the neutral allele is expected to increase in frequency in later generations, here $Δ p_{2} = p_{3} - p_{2}$ . This creates temporal autocovariance, $Cov (Δ p_{2}, Δ p_{0}) > 0$ . Similarly, had the neutral allele found itself on a low-fitness background (orange), this would also create temporal autocovariance. (B) The setup for our multilocus model. Multiple alleles (yellow) determine the fitness in a region R morgans in length, and these perturb the allele frequency trajectory of a focal neutral site (light blue).

The effect that heritable variation has on neutral alleles has traditionally been modeled in a quantitative genetics framework where a large number of loosely linked polymorphisms contribute to heritable fitness differences between individuals, and the impact of heritable fitness variation on a neutral allele is quantified as a reduction in its long-run effective population size (Santiago and Caballero 1995, 1998; Robertson 1961). This form of linked selection can be contrasted with classic population genetic hitchhiking theory (Smith and Haigh 1974), which considers how neutral alleles closely linked to a new beneficial mutation are affected as it sweeps to fixation. While classic population genetic linked selection models consider how neutral variation is affected by strong associations caused by tight linkage to an advantageous site, quantitative genetic models of linked selection consider the weakest forms of associations: those between unlinked loci within an individual (Morley 1954; Santiago and Caballero 1995; Robertson 1961) [see Barton (2000) for more on the two models of linked selection]. These quantitative genetic models of linked selection match the expressions for the loss of diversity found in classic hitchhiking models under a steady flux of loosely linked advantageous alleles entering the population [see page 2110 in Santiago and Caballero (1998)] and genome-wide background selection [see equation 12 of Santiago and Caballero (1998) and Nordborg et al. (1996)]. The distant associations considered by these models are quickly established but are rapidly broken down by segregation and independent assortment, yet still can have a marked effect on diversity (Santiago and Caballero 1995; Robertson 1961). However, since the impact of heritable fitness variation has traditionally been modeled as causing a reduction in the effective population size, there has been no direct way to separately estimate its effects from those of drift. 
We show that with temporal genomic data, one can directly measure the temporal variances and autocovariances of allele frequency change in the population. Additionally, we show that temporal autocovariance is created under both tightly and loosely linked selection, and below we develop expressions for its magnitude that apply to both, bridging the two regimes of linked selection.

A Model for Multilocus Temporal Autocovariance

Here, we develop theory for the temporal autocovariance in a neutral allele’s frequency changes through time, generated by the presence of heritable fitness variation in the population. We measure the temporal autocovariance $Cov (Δ p_{t}, Δ p_{s})$ at a single diallelic neutral locus. Since only allele frequency changes due to heritable variation in fitness contribute to temporal autocovariance, we can focus exclusively on the behavior of $Δ_{H} p_{t}$ in deriving our expressions for the autocovariance across timepoints. We imagine that an individual i has fitness $f_{i}$ , i.e., that their expected number of children is $f_{i}$ . We assume a constant population size, so the population average fitness is $E_{i} (f_{i}) = 1$ . Additionally, we assume that all fitness variation has an additive, polygenic architecture. Then, with L loci contributing to fitness, we can write individual i’s fitness as $f_{i} = 1 + \sum_{l = 1}^{L} α_{t, l} g_{i, l}$ , where $α_{t, l}$ is the effect size in generation t and $g_{i, l} \in \{0, 1, 2\}$ is individual i’s gene count at locus l. Here, each $α_{t, l}$ is analogous to a selection coefficient acting at locus l, since the fitnesses for genotypes $A_{1} A_{1}$ , $A_{1} A_{2}$ , and $A_{2} A_{2}$ are $1$, $1 + α_{t, l}$, and $1 + 2 α_{t, l}$ , respectively. This formulation is approximately equivalent to exponential directional selection on some additively determined trait, implying that selection does not create LD between unlinked loci; see Temporal Variance and Autocovariance Under Multilocus Selection in the Appendix for more detail.

When fitness variation exists in the population (that is, $Var_{i} (f_{i}) > 0$ ), the frequency of the neutral allele changes stochastically, as fitter individuals leave more descendants that inherit the neutral allele they carry and less-fit individuals leave fewer. Across the population, stochastic associations can form between the genetic components of an individual’s fitness and the neutral allele they carry, leading the neutral allele frequency to change due to fitness differences across individuals. The total heritable change in neutral allele frequency $Δ_{H} p_{t}$ can then be partitioned into each individual’s contribution to this change based on their fitness $f_{i}$ and the number of the tracked neutral alleles they carry, $x_{i} \in \{0, 1, 2\}$ , giving us

Δ_{H} p = \frac{1}{2 N} \sum_{i = 1}^{N} x_{i} (f_{i} - 1)

(4)

(Santiago and Caballero 1995). Substituting each fitness $f_{i}$ with its genetic basis and simplifying [see Temporal Variance and Autocovariance Under Multilocus Selection in the Appendix for derivation and equation 10 of Kirkpatrick et al. (2002)] gives

Δ_{H} p_{t} = \sum_{l = 1}^{L} α_{t, l} D_{t, l}^{'} + \sum_{l = 1}^{L} α_{t, l} D_{t, l}^{″}

(5)

where $D_{t, l}^{'}$ is the gametic LD between the neutral allele and the allele at the selected site l on the same gamete, whereas $D_{t, l}^{″}$ is the nongametic LD, or the covariance across the neutral and selected allele on the two different gametes forming an individual [see p. 121 of Weir (1996) for details and Appendix Figure A2A for an illustration]. Intuitively, this expression tells us that the heritable change in neutral allele frequency is determined by the gametic and nongametic LD between the neutral site and all sites that affect an individual’s fitness, scaled by the magnitude of each selected locus’s effect. Alternatively, we can see this as the multivariate breeder’s equation, where the neutral allele is a correlated trait responding to selection on other traits/loci (Lande 1979). This expression is the multilocus analog of the change in a neutral site’s frequency due to hitchhiking at a single linked site [e.g., see equations 2 and 3 in Stephan et al. (2006)].
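Equations 4 and 5 can be checked against each other on a simulated population. This is a minimal sketch under our own assumptions: each gamete carries one neutral and several selected biallelic sites drawn independently (so any LD arises purely by chance), and fitness is taken as $f_{i} = 1 + \sum_{l} α_{l} (g_{i, l} - 2 p_{l})$, i.e., genotypes are centered so that mean fitness is exactly 1, matching the constant-population-size assumption. Gametic LD is averaged over the 2N gametes and nongametic LD over the 2N cross-gamete pairs within individuals. The helper names and the centering choice are hypothetical, not taken from the paper's code.

```python
import random

def simulate_population(n, p_neutral, p_sel, seed=3):
    """N diploids; each has two gametes (x, g): x is the neutral allele (0/1),
    g lists the selected alleles (0/1). All alleles are drawn independently."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        individual = []
        for _ in range(2):  # two gametes per diploid individual
            x = int(rng.random() < p_neutral)
            g = [int(rng.random() < p) for p in p_sel]
            individual.append((x, g))
        pop.append(individual)
    return pop

def _freqs(pop, n_loci):
    """Neutral and selected allele frequencies over all 2N gametes."""
    gametes = [gam for ind in pop for gam in ind]
    p_n = sum(x for x, _ in gametes) / len(gametes)
    p_l = [sum(g[l] for _, g in gametes) / len(gametes) for l in range(n_loci)]
    return p_n, p_l

def delta_h_eq4(pop, alpha):
    """Equation 4: (1/2N) sum_i x_i (f_i - 1), with centred additive fitness."""
    _, p_l = _freqs(pop, len(alpha))
    total = 0.0
    for (x1, g1), (x2, g2) in pop:
        f_minus_1 = sum(a * (g1[l] + g2[l] - 2 * p_l[l]) for l, a in enumerate(alpha))
        total += (x1 + x2) * f_minus_1
    return total / (2 * len(pop))

def delta_h_eq5(pop, alpha):
    """Equation 5: sum_l alpha_l (D'_l + D''_l)."""
    p_n, p_l = _freqs(pop, len(alpha))
    two_n = 2 * len(pop)
    result = 0.0
    for l, a in enumerate(alpha):
        # D': neutral and selected allele on the same gamete
        d_gam = sum((x - p_n) * (g[l] - p_l[l]) for ind in pop for x, g in ind) / two_n
        # D'': neutral and selected allele on the two different gametes of one individual
        d_non = sum((x1 - p_n) * (g2[l] - p_l[l]) + (x2 - p_n) * (g1[l] - p_l[l])
                    for (x1, g1), (x2, g2) in pop) / two_n
        result += a * (d_gam + d_non)
    return result

pop = simulate_population(n=200, p_neutral=0.4, p_sel=[0.2, 0.5, 0.7])
alpha = [0.05, -0.02, 0.01]
print(delta_h_eq4(pop, alpha), delta_h_eq5(pop, alpha))  # agree up to rounding
```

The agreement is exact algebraically because $\sum_{i} (g_{i, l} - 2 p_{l}) = 0$, so $x_{i}$ can be replaced by its deviation $x_{i} - 2 p$ without changing Equation 4, and that deviation product splits into the within-gamete and between-gamete terms.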

Because the effects of the nongametic LD are relatively weak compared to the gametic LD for tightly linked loci (see The Strength of Unlinked and Nongametic Associations in the Appendix for an expression of their strength), we ignore these and hereafter omit the primes in our notation, so that $D_{t, l}$ refers to $D_{t, l}^{'}$ . Since $E (Δ_{H} p_{t}) = 0$ , we can write the covariance $Cov (Δ_{H} p_{t}, Δ_{H} p_{s})$ as $E (Δ_{H} p_{t} Δ_{H} p_{s})$ . Hereafter, we also omit the subscript H, since $Cov (Δ p_{t}, Δ p_{s}) = Cov (Δ_{H} p_{t}, Δ_{H} p_{s})$ . Expanding these terms, the covariance between the allele frequency changes at generations t and s can be written as

\begin{array}{l} Cov (Δ p_{t}, Δ p_{s}) = E [(\sum_{l = 1}^{L} α_{t, l} D_{t, l}) (\sum_{l = 1}^{L} α_{s, l} D_{s, l})] \\ = \underbrace{\sum_{l = 1}^{L} α_{t, l} α_{s, l} E (D_{t, l} D_{s, l})}_{\text{persistence of associations with selected site } l} + \underbrace{\sum_{l \neq k} α_{t, k} α_{s, l} E (D_{t, k} D_{s, l})}_{\text{cross-associations between two selected sites}} . \end{array}

(6)

This statement for temporal autocovariance is fairly general, as it can handle fluctuating selection (e.g., when $α_{t, l}$ varies with time t) and any additive multilocus evolution (as long as the LD dynamics can be specified). Looking at the first term in this sum, we see that the temporal autocovariance is determined in part by the terms $E (D_{t, l} D_{s, l})$ . These expected LD products reflect the degree to which the association between the neutral locus and a selected site persists from generation t to s (here, $t < s$ ). Intuitively, the higher the initial association between the neutral and selected loci, and the slower the rate of decay of LD between sites, the greater the temporal autocovariance will be.

The multilocus temporal autocovariance model with directional selection

Thus far, in reaching Equation 6 we have assumed only that fitness is additive across loci. In this section, we develop a model of how temporal autocovariance behaves specifically under directional selection beginning at a specific time. We make three assumptions to simplify our expressions. First, we assume that the effect size remains constant through time, such that $α_{l} := α_{t, l}$ for all t (we relax this assumption in Fluctuating selection). Second, we ignore the contribution of the second term of Equation 6, $E (D_{t, k} D_{s, l})$ (for $k \neq l$ ), to temporal autocovariance. When the population is initially at mutation–drift–recombination equilibrium, we expect this product to be zero, as there is no directional association between the two selected sites and the neutral site. However, we note that interaction between selected sites [Hill–Robertson interference (HRi)] will cause this term to become negative (Barton and Otto 2005), a point we return to later. Third, we assume that the selected sites increase in frequency independently, such that the dynamics of the LD between the neutral and selected site pairs can be modeled using two-locus dynamics. Using a deterministic continuous-time model for the dynamics of the LD between the selected and neutral site (Smith and Haigh 1974; Barton 2000), we rewrite the $E (D_{t, l} D_{s, l})$ terms in the expression for temporal autocovariance as

\begin{array}{l} Cov (Δ p_{t}, Δ p_{s}) = \sum_{l = 1}^{L} α_{l}^{2} E (D_{t, l} D_{s, l}) \\ = \sum_{l = 1}^{L} α_{l}^{2} E (ℛ_{t, l}^{2}) p_{t} (1 - p_{t}) p_{s, l} (1 - p_{s, l}) {(1 - r_{l})}^{s - t} \end{array}

(7a)

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \sum_{l = 1}^{L} α_{l}^{2} p_{s, l} (1 - p_{s, l}) E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t},

(7b)

where $E (ℛ_{t, l}^{2})$ is the expected squared correlation between the neutral site and selected site l at time t (a common measure of LD; Hill and Robertson 1968), and $r_{l}$ is the recombination fraction between the neutral site and selected site l.
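Equation 7b is a weighted sum over selected sites, so it is straightforward to evaluate once per-site inputs are specified. The sketch below is illustrative only: the function name and the example inputs are hypothetical, and the lists are assumed to be aligned by selected site.

```python
def autocov_eq7b(alpha, p_sel_s, r2_t, r, lag):
    """Right-hand side of Equation 7b, Cov(dp_t, dp_s) / [p_t (1 - p_t)].

    alpha[l]   : effect size alpha_l, held constant over time
    p_sel_s[l] : frequency of selected allele l at time s
    r2_t[l]    : E(R^2_{t,l}), expected squared LD correlation at time t
    r[l]       : recombination fraction between the neutral site and site l
    lag        : s - t in generations (must be >= 0)
    """
    if lag < 0:
        raise ValueError("lag = s - t must be non-negative")
    return sum(a**2 * p * (1 - p) * r2 * (1 - rl) ** lag
               for a, p, r2, rl in zip(alpha, p_sel_s, r2_t, r))

# Hypothetical inputs: two selected sites, one tightly and one loosely linked.
alpha, p_s, r2_t, r = [0.1, 0.1], [0.5, 0.5], [0.2, 0.2], [0.01, 0.4]
for lag in (1, 2, 5, 10):
    print(lag, autocov_eq7b(alpha, p_s, r2_t, r, lag))
```

The loosely linked site's contribution shrinks by a factor of 0.6 per generation of lag, whereas the tightly linked site's shrinks by only 0.99, so at longer lags the autocovariance is dominated by tightly linked selected sites.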

We can further simplify this expression by assuming that there is no covariation between the additive genic variation at a selected site and the LD between that selected site and the neutral site (see Temporal Variance and Autocovariance Under Multilocus Selection in the Appendix for more detail). This allows us to factor out the average additive genic variation for fitness at time s and write the covariance as

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \frac{V_{a} (s)}{2 L} \sum_{l = 1}^{L} E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t}

(8)

where $V_{a} (s)$ is the additive genic variance for fitness, which is the additive genetic variance for fitness $(V_{A})$ without the contribution of LD between selected sites, $V_{a} (s) = 2 \sum_{l} α_{l}^{2} p_{l} (s) (1 - p_{l} (s))$ . That our expression does not depend on the LD between selected sites is partly a result of ignoring the second term in Equation 6; we revisit the consequences of this assumption in Comparing theory to simulation results.
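Equation 8 replaces the per-site weights $α_{l}^{2} p_{s, l} (1 - p_{s, l})$ of Equation 7b by their average, $V_{a} (s) / (2 L)$. A minimal sketch with hypothetical helper names and inputs, comparing the exact per-site sum against the factored form:

```python
def additive_genic_variance(alpha, p_sel):
    """V_a = 2 * sum_l alpha_l^2 p_l (1 - p_l)."""
    return 2 * sum(a**2 * p * (1 - p) for a, p in zip(alpha, p_sel))

def autocov_eq8(v_a, r2_t, r, lag):
    """Equation 8: (V_a(s) / 2L) * sum_l E(R^2_{t,l}) (1 - r_l)^(s - t)."""
    n_loci = len(r)
    return v_a / (2 * n_loci) * sum(r2 * (1 - rl) ** lag for r2, rl in zip(r2_t, r))

# Hypothetical inputs with identical effect sizes and frequencies at every site.
alpha = [0.05] * 4
p_s = [0.3] * 4
r2_t = [0.4, 0.2, 0.1, 0.05]
r = [0.01, 0.05, 0.1, 0.3]
lag = 3

# Per-site sum from Equation 7b, written out directly.
exact = sum(a**2 * p * (1 - p) * r2 * (1 - rl) ** lag
            for a, p, r2, rl in zip(alpha, p_s, r2_t, r))
approx = autocov_eq8(additive_genic_variance(alpha, p_s), r2_t, r, lag)
print(exact, approx)
```

Here the two values coincide (up to rounding) because every site has the same $α_{l}^{2} p_{l} (1 - p_{l})$. If effect sizes or frequencies covaried with LD or recombination, the two would diverge, which is exactly the no-covariation assumption stated above.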

This expression allows us to calculate the temporal autocovariance in cases where we know the vector of recombination fractions between the neutral and each of the selected sites, $r_{1}, r_{2}, \dots, r_{L}$ . Often we do not know the exact positions of these sites, but we can treat these positions as randomly placed on a chromosome and further simplify our model to understand the factors that determine temporal autocovariance.

Temporal autocovariance for an average neutral polymorphism

In the second part of our derivation, we develop a simple, intuitive model of how temporal autocovariance is determined by a few key parameters when we make two additional assumptions. First, we assume that selected sites are randomly and uniformly distributed along the chromosome, such that a site’s position on the genetic map is a random variable $g \sim U (- R / 2, R / 2)$ (where R is the region’s length in morgans), and that the focal neutral site with which we calculate temporal autocovariance lies in the middle of this idealized chromosome at the origin (as depicted in Figure 1B). Then, the recombination fraction between the focal neutral site and a selected site at random position g is given by a mapping function $r (g)$, which maps the position g to a recombination fraction. A simple choice for $r (g)$, which we use here, is Haldane’s mapping function, $r (g) = \frac{1}{2} (1 - e^{- 2 | g |})$ (Haldane 1919); we take the absolute value of g to translate the position g into a distance from the focal neutral site. Second, we assume that the LD between each selected site and the focal neutral site depends only on the recombination fraction $r (g)$ between the two loci, and not on their absolute positions or effect sizes; we can then rewrite $E (ℛ_{t, l}^{2})$ as the function $E (ℛ_{t}^{2} (r (g)))$. For example, if the population was initially at drift–recombination balance, this would be $E (ℛ^{2}) = (10 + ρ) / (22 + 13 ρ + ρ^{2})$, where $ρ = 4 N r (g)$ (Hill and Robertson 1968; Ohta and Kimura 1969). These assumptions allow us to understand conceptually the factors that determine temporal autocovariance; in practice, in temporal studies with LD data and recombination maps, one can directly calculate the sum in Equation 8 (see Empirically Calculating the Average LD Persisting Across Generations in the Appendix). We then write the temporal autocovariance experienced by a neutral allele in a region R morgans long containing $V_{a} (s)$ fitness variation at time s as

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} \approx \frac{V_{a} (s)}{2 R} \int_{- R / 2}^{R / 2} E (ℛ_{t}^{2} (r (g))) {(1 - r (g))}^{(s - t)} d g

(9)

(see Temporal Variance and Autocovariance Under Multilocus Selection in the Appendix for details).
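To see how Equation 9 behaves numerically, here is a minimal Python sketch. It assumes Haldane's mapping function and that the drift–recombination balance value of $E (ℛ^{2})$ given above still holds at time t (an illustrative assumption; once selection begins, LD need not stay at this balance). The function names are hypothetical, and the integral is approximated with the trapezoidal rule on a fine grid.

```python
import numpy as np

def haldane(d):
    """Haldane's mapping function: map distance d (morgans) to a recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d)))

def expected_r2(r, N):
    """E(R^2) at drift-recombination balance: (10 + rho) / (22 + 13 rho + rho^2), rho = 4 N r."""
    rho = 4.0 * N * r
    return (10.0 + rho) / (22.0 + 13.0 * rho + rho ** 2)

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a particular NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def autocov_eq9(Va_s, R, N, gap, n_grid=20001):
    """Eq. 9: standardized Cov(dp_t, dp_s) for a neutral site at the centre of a
    region R morgans long, with selected sites uniform on (-R/2, R/2); gap = s - t."""
    g = np.linspace(-R / 2.0, R / 2.0, n_grid)
    r = haldane(g)
    integrand = expected_r2(r, N) * (1.0 - r) ** gap
    return Va_s / (2.0 * R) * trapezoid(integrand, g)

for gap in (1, 5, 10):
    print(gap, autocov_eq9(Va_s=0.02, R=0.1, N=1000, gap=gap))
```

Because $(1 - r (g))^{s - t}$ shrinks with the gap everywhere except at $g = 0$, the printed autocovariance decays as $s - t$ grows.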

This integral is the sum of the initial LD between a typical neutral locus and a selected site, weighted by the decay of LD due to recombination over $s - t$ generations. Selection enters here through the total additive genic variance for fitness for the region divided by the genetic map length of the region $(V_{a} (s) / R)$ . Thus, a key compound parameter in describing the temporal covariance is the additive genic variance per morgan, a quantity somewhat similar to the ratio of new adaptive mutations per base pair to recombination per base pair, $ν_{BP} / r_{BP}$ , that occurs in models of recurrent sweeps (Stephan et al. 1992) and models of the limits of selection with linked loci (Robertson 1970, 1976). Note that this does not include the effects of genome-wide fitness variation, e.g., the impacts that unlinked selected sites have on the neutral site due to the associations created when the sites sort within the same individuals. We quantify the magnitude of these in The Contribution of the Rest of the Genome to Temporal Autocovariance at a Locus in the Appendix.

To validate our theory, we simulate a fixed region R morgans long and calculate the covariance in allele frequency changes by averaging over many uniformly distributed neutral sites within this region. Then, the random distance between a neutral site’s position n and a selected site’s position g is $c = | n - g |$, where $n, g \sim U (0, R)$; this random variable c has a triangle distribution, $f (c) = 2 (R - c) / R^{2}$ for $0 \leq c \leq R$. Averaging over the positions of both randomly placed neutral and selected sites, the temporal autocovariance is

Σ_{t, s} := \frac{E_{n} (Cov (Δ p_{t}, Δ p_{s}))}{E_{n} (p_{t} (1 - p_{t}))} = \frac{V_{a} (s)}{2} \underbrace{\int_{0}^{R} E (ℛ_{t}^{2} (r (c))) {(1 - r (c))}^{(s - t)} \frac{2 (R - c)}{R^{2}} d c}_{A (R, t, s)}

(10)

where $E_{n} (\cdot)$ indicates an expectation taken over the position of the randomly placed neutral sites, and we define $A (R, t, s)$ as the average LD between selected and neutral sites that persists from generation t to generation s $(t \leq s)$. As is common when estimating the expected values of other ratios such as $F_{ST}$ (Bhatia et al. 2013), we use a ratio of expectations rather than the expectation of a ratio.
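A minimal sketch of $A (R, t, s)$ from Equation 10 follows, again under the illustrative assumptions of Haldane's mapping function and drift–recombination balance LD. It also checks by Monte Carlo that the distance between two uniform positions on $(0, R)$ follows the triangle density, whose mean is $R / 3$. Names such as A_avg_ld and sigma_ts are hypothetical.

```python
import numpy as np

def haldane(d):
    """Haldane's mapping function (morgans to recombination fraction)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d)))

def expected_r2(r, N):
    """E(R^2) at drift-recombination balance, with rho = 4 N r."""
    rho = 4.0 * N * r
    return (10.0 + rho) / (22.0 + 13.0 * rho + rho ** 2)

def A_avg_ld(R, N, gap, n_grid=20001):
    """A(R, t, s) in Eq. 10 for gap = s - t, averaging over the distance c between a
    random neutral site and a random selected site, c ~ 2 (R - c) / R^2 on [0, R]."""
    c = np.linspace(0.0, R, n_grid)
    r = haldane(c)
    density = 2.0 * (R - c) / R ** 2
    y = expected_r2(r, N) * (1.0 - r) ** gap * density
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(c)))  # trapezoidal rule

def sigma_ts(Va_s, R, N, gap):
    """Standardized autocovariance Sigma_{t,s} = Va(s) / 2 * A(R, t, s)."""
    return Va_s / 2.0 * A_avg_ld(R, N, gap)

# Monte Carlo check: c = |n - g| with n, g ~ U(0, R) should have mean R / 3.
rng = np.random.default_rng(1)
R = 0.5
c_draws = np.abs(rng.uniform(0.0, R, 200_000) - rng.uniform(0.0, R, 200_000))
mc_mean = c_draws.mean()
print(mc_mean, R / 3)
print(sigma_ts(0.02, 0.1, 1000, gap=1))
```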

We can also use this expression to calculate the variance of allele frequency change. The standardized variance $Var (Δ p_{t}) / p_{t} (1 - p_{t})$ has two components: the drift term and the heritable variance in offspring number. Adding these independent contributions, the standardized variance is

\frac{E_{n} (Var (Δ p_{t}))}{E_{n} (p_{t} (1 - p_{t}))} = \frac{V_{N} + 2}{8 N} + \frac{V_{a} (t)}{2} A (R, t, t)

(11)

where $V_{N}$ is the nonheritable variance in offspring number. Under a Wright–Fisher model of reproduction, $V_{N} \approx 2$, so this simplifies to

Σ_{t, t} := \frac{E_{n} (Var (Δ p_{t}))}{E_{n} (p_{t} (1 - p_{t}))} = \frac{1}{2 N} + \frac{V_{a} (t)}{2} A (R, t, t) .

(12)

When combined, this expression for the variance in allele frequency change and our expression for temporal autocovariance are in agreement with Robertson (2009) and Santiago and Caballero (1995; 1998) when predicting the total variance in allele frequency change (see Connecting our Model with the Models of Robertson and Santiago and Caballero in the Appendix). With the above expressions for the variances and covariances, we have a complete set of theoretical expressions for the variance–covariance matrix of allele frequency change, which we call Σ, with the diagonal variance elements $Σ_{t, t}$ given by Equation 12, and the upper- and lower-triangle covariance elements $Σ_{t, s}$ $(t \neq s)$ given by Equation 10.
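The theoretical matrix Σ can be assembled from these pieces. The Python sketch below assumes Wright–Fisher reproduction (Equation 12), a user-supplied sequence of $V_{a} (t)$ values, and that $A (R, t, s)$ depends only on $s - t$, which holds if the LD function is taken as fixed at drift–recombination balance; all function names are hypothetical.

```python
import numpy as np

def haldane(d):
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d)))

def expected_r2(r, N):
    rho = 4.0 * N * r
    return (10.0 + rho) / (22.0 + 13.0 * rho + rho ** 2)

def A_avg_ld(R, N, gap, n_grid=20001):
    """A(R, t, s) of Eq. 10 with the triangle density for neutral-selected distances."""
    c = np.linspace(0.0, R, n_grid)
    r = haldane(c)
    y = expected_r2(r, N) * (1.0 - r) ** gap * 2.0 * (R - c) / R ** 2
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(c)))

def sigma_matrix(Va, R, N):
    """Theoretical tau x tau matrix Sigma of standardized (co)variances.
    Va -- additive genic variances Va(1), ..., Va(tau), one per generation.
    Diagonal: 1/(2N) + Va(t)/2 * A(R, t, t)          (Eq. 12)
    Off-diagonal (t < s): Va(s)/2 * A(R, t, s), mirrored below the diagonal (Eq. 10)."""
    tau = len(Va)
    S = np.empty((tau, tau))
    for t in range(tau):
        for s in range(t, tau):
            value = Va[s] / 2.0 * A_avg_ld(R, N, s - t)
            if s == t:
                value += 1.0 / (2.0 * N)  # drift term under Wright-Fisher reproduction
            S[t, s] = S[s, t] = value
    return S

Sigma = sigma_matrix(Va=[0.02, 0.019, 0.018, 0.017], R=0.1, N=1000)
print(np.round(Sigma, 6))
```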

Modeling the dynamics of additive genic and genetic variation

Our expressions for temporal autocovariance (Equation 10) require an expression for $V_{a} (t)$ , the additive genic variation through time. However, we lack general expressions for the dynamics of the additive genic variation during selection, as these dynamics are quite complex for a few reasons. First, since our theory considers polygenic selection at a finite number of loci in a region, additive genic variation is not as constant as it would be under an infinitesimal model (Bulmer 1980). Second, we allow for arbitrary levels of recombination from very tight linkage to loose linkage. Previous work has shown that predicting the dynamics of additive genic variation in a system with an arbitrary level of recombination is difficult, as both the additive genic and genetic variances depend on the higher-order moments of LD [see Barton and Turelli (1987), p. 607 of Turelli and Barton (1990), and Barton (1991)].

A primary determinant of the additive genic variation is the heterozygosity of the selected sites. Assuming effect sizes are constant through time and across loci, we can rewrite the additive genic variation, $V_{a} (t) = 2 α^{2} \sum_{l} p_{l} (t) (1 - p_{l} (t))$ , as

V_{a} (t) = α^{2} S S H (t)

(13)

V_{a} (t) = V_{a} (1) \frac{S S H (t)}{S S H (1)}

(14)

where $S S H (t) = 2 \sum_{l} p_{l} (t) (1 - p_{l} (t))$ is the sum of site heterozygosity at time t. Ideally, we would directly use $S S H (t)$ in a region; however, this would require knowing a priori which sites are being selected. Instead, we assume that the trait is sufficiently polygenic that frequency changes due to selection are weak, and that the change in heterozygosity at neighboring neutral polymorphic sites approximately mirrors that at selected polymorphisms (as is the case under the infinitesimal model, where the change in frequencies due to selection is no different from the change due to drift; Robertson 1960; Bulmer 1980; Kimura 1984). Then, using the sum of site heterozygosity at neutral sites, $S S H_{n} (t)$, as a proxy for the sum of site heterozygosity at selected sites, we approximate the additive genic variation as

V_{a, s s h_{n}} (s) := V_{a} (1) s s h_{n} (s)

(15)

where we define $s s h_{n} (s) = S S H_{n} (s) / S S H_{n} (1)$ as the factor by which $V_{a} (1)$ decreases at time s, approximated by neutral sites’ allele frequency changes. Under this approximation, the dynamics of genic variation are determined by one free parameter, $V_{a} (1)$ , and the directly measurable sum of site heterozygosity at neutral sites through time.
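Equations 13–15 reduce to simple bookkeeping on a frequency matrix. The following hypothetical helpers compute $S S H (t)$ and $V_{a, s s h_{n}} (s)$ from neutral allele frequencies, assuming frequencies are stored with one row per generation and one column per site.

```python
import numpy as np

def ssh(freqs):
    """Sum of site heterozygosity, SSH(t) = 2 * sum_l p_l(t) (1 - p_l(t)), for each row.
    freqs -- (generations x sites) array of allele frequencies."""
    freqs = np.asarray(freqs, dtype=float)
    return 2.0 * np.sum(freqs * (1.0 - freqs), axis=1)

def va_ssh_n(Va1, neutral_freqs):
    """Eq. 15: V_{a,ssh_n}(s) = Va(1) * ssh_n(s), with ssh_n(s) = SSH_n(s) / SSH_n(1)."""
    h = ssh(neutral_freqs)
    return Va1 * h / h[0]

# Two hypothetical neutral sites; the second is lost by the second generation.
print(va_ssh_n(0.04, [[0.5, 0.5], [0.5, 0.0]]))
```

Fixed or lost sites contribute zero heterozygosity, so in this sketch they pull $s s h_{n} (s)$ down just as the loss of variation at selected sites would reduce $V_{a}$.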

Our focus here is on the short-term response of a population, and so we look at the decay of genetic backgrounds present at the onset of directional selection. In reality, new mutations continually create additive genetic variation for fitness; thus, an equilibrium level of additive genetic variance can be maintained in the population. The long-run effect of linked selection under this equilibrium model is handled by Santiago and Caballero (1995, 1998); see Connecting our Model with the Models of Robertson and Santiago and Caballero in the Appendix.

Multilocus simulation details

To test our theoretical expressions, we have conducted extensive forward simulations of directional selection on a polygenic trait. We vary four critical parameters in these simulations: (1) the level of additive genetic variance at the onset of selection $(V_{A})$, (2) the level of recombination (R in morgans), (3) the number of selected sites in the region (L), and (4) the population size (N). We choose our grid of selection and recombination parameters based on the levels we would expect across a wide variety of organisms; see the Multilocus Simulation Details section in the Appendix for details. We used three different population sizes $(N \in \{100, 500, 1000\})$, but our figures show results for $N = 1000$ and a subset of the other parameters.

Before the onset of selection, we create the initial diploid population from a pool of gametes created by msprime (Kelleher et al. 2016), such that the initial allele frequency distribution and LD between sites are at mutation–drift–recombination balance. Details of how msprime was called are available in this paper’s code repository (https://github.com/vsbuffalo/tempautocov) in R/simpop.r. Then, we pass this pool of gametes into a forward Wright–Fisher-with-recombination simulation routine and let it evolve neutrally for four generations before initiating selection in the fifth generation. These first four generations of neutral evolution (without mutation) serve as a control to validate that the variance in neutral allele frequency change is as expected under a Wright–Fisher model, and that the temporal autocovariance between a generation before selection and a generation during selection is zero.

We generate genetic variation for fitness by choosing L random loci from the neutrally evolved sites and randomly assigning each an effect size of $- α$ or $+ α$, such that the expected total amount of additive genic variation is $V_{A}$ (note that the initial additive genic and genetic variances are equal in expectation, $V_{A} = V_{a}$, as the LD contribution is zero in expectation for randomly chosen sites). The details of this are given in the Multilocus Simulation Details section in the Appendix. This approach creates some additional variance around the target level of additive genic variation, as the sum of site heterozygosities varies stochastically across simulation replicates. At the onset of selection, an individual i’s trait value is calculated as $z_{i} = \sum_{l = 1}^{L} α_{l} g_{i, l}$, where $α_{l} \in \{- α, + α\}$ is the effect size assigned to locus l and $g_{i, l}$ is the number of copies of the effect allele that individual i carries at locus l. Then, their absolute fitness is calculated using an exponential fitness function $w (z_{i}) = e^{z_{i}}$ [see p. 17 in Turelli and Barton (1990)]. Under our Wright–Fisher model, we sample the parents of the next generation according to a multinomial distribution, where the probability of individual i being a parent is proportional to its relative fitness $w (z_{i}) / \bar{w}$.
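The following Python sketch shows one generation of selection and reproduction in this style: trait values from effect sizes $\pm α$, exponential fitness, and multinomial sampling of parents. It is not the paper's simulation code. In particular, it assumes free recombination between loci rather than a genetic map and omits neutral sites, and the names scale_alpha and next_generation are hypothetical. Here α is chosen so that $2 α^{2} \sum_{l} p_{l} (1 - p_{l}) = V_{A}$.

```python
import numpy as np

def scale_alpha(VA, p_selected):
    """Effect size alpha such that 2 alpha^2 sum_l p_l (1 - p_l) equals the target VA."""
    p = np.asarray(p_selected, dtype=float)
    return float(np.sqrt(VA / (2.0 * np.sum(p * (1.0 - p)))))

def next_generation(haps, effects, rng):
    """One Wright-Fisher generation with exponential fitness w(z) = exp(z).

    haps    -- (N, 2, L) array of 0/1 alleles for N diploids at L selected loci
    effects -- length-L array of effect sizes, each -alpha or +alpha
    Simplifying assumption: loci assort independently (free recombination),
    unlike the genetic map used in the paper's simulations.
    """
    N, _, L = haps.shape
    g = haps.sum(axis=1)                 # g_{i,l}: copies of the effect allele (0, 1, 2)
    z = g @ effects                      # trait values z_i = sum_l alpha_l g_{i,l}
    w = np.exp(z)                        # absolute fitness
    prob = w / w.sum()                   # parent probability, proportional to w / w_bar
    parents = rng.choice(N, size=(N, 2), p=prob)  # two parents per offspring
    offspring = np.empty_like(haps)
    for k in range(2):                   # each parent transmits one gamete
        par = haps[parents[:, k]]        # (N, 2, L) haplotypes of the k-th parents
        pick = rng.integers(0, 2, size=(N, 1, L))  # parental haplotype chosen per locus
        offspring[:, k, :] = np.take_along_axis(par, pick, axis=1)[:, 0, :]
    return offspring

rng = np.random.default_rng(42)
N, L = 200, 50
p0 = rng.uniform(0.05, 0.95, L)
haps = (rng.random((N, 2, L)) < p0).astype(np.int64)
alpha = scale_alpha(0.02, p0)
effects = alpha * rng.choice([-1.0, 1.0], size=L)
for _ in range(5):
    haps = next_generation(haps, effects, rng)
print(haps.mean(axis=(0, 1))[:5])
```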

We record 50 generations of simulated evolution, after which we compute the standardized sample temporal variance–covariance matrix Q for each replicate (this is the sample analog of our theoretical variance–covariance matrix Σ) as follows. First, we mark frequencies reaching fixation or loss as missing values. This allows the frequency changes before fixation or loss to contribute to the measured covariance, rather than removing the entire locus’s trajectory, which would condition the covariance on more intermediate frequencies. Note that fixations and losses cannot simply be left in the data: after fixation or loss, $Δ p_{t} = 0$, so these sites contribute zero autocovariance, which would underestimate the true level of autocovariance at segregating sites. Having marked fixations and losses as missing, we take the frequency matrix and calculate a vector of allele frequency changes $Δ {\vec{p}}_{n} = [Δ p_{n, 1}, Δ p_{n, 2}, \dots, Δ p_{n, τ}]$ using each neutral locus n’s $τ + 1$ observed generations. Finally, we calculate the $τ \times τ$ sample standardized variance–covariance matrix Q, averaging over M neutral loci such that element $Q_{t, s}$ is calculated as

Q_{t, s} = \frac{\frac{1}{M - 1} \sum_{n = 1}^{M} (Δ p_{n, t} Δ p_{n, s} - (\frac{1}{M} \sum_{n' = 1}^{M} Δ p_{n', t}) (\frac{1}{M} \sum_{n' = 1}^{M} Δ p_{n', s}))}{\frac{1}{M} \sum_{n = 1}^{M} p_{n, \min (t, s)} (1 - p_{n, \min (t, s)})},

(16)

though see Accounting for Allele Frequency Sampling Noise in the Appendix for a bias-corrected version when samples rather than population allele frequencies are used. Sums involving missing values use only pairwise-complete observations, as implemented by the use = "pairwise.complete.obs" argument of R’s cov() function.
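A Python sketch of Equation 16 with pairwise-complete observations is below. It is a hypothetical reimplementation, not the paper's R code. One design choice to note: the heterozygosity in the denominator is averaged over the same pairwise-complete loci as the covariance, whereas Equation 16 writes the average over all M loci.

```python
import numpy as np

def mark_fixed(freqs):
    """Mark frequencies at fixation (1) or loss (0) as missing (NaN)."""
    f = np.array(freqs, dtype=float)
    f[(f <= 0.0) | (f >= 1.0)] = np.nan
    return f

def standardized_cov_matrix(freqs):
    """Sample standardized variance-covariance matrix Q (Eq. 16).

    freqs -- (tau + 1) x M array of population allele frequencies at M neutral loci,
             with fixations/losses already marked as NaN.
    Each Q[t, s] uses only loci observed for both Delta p_t and Delta p_s
    (pairwise-complete, in the spirit of R's cov(use = "pairwise.complete.obs")).
    """
    freqs = np.asarray(freqs, dtype=float)
    dp = np.diff(freqs, axis=0)           # tau x M allele frequency changes
    het = freqs * (1.0 - freqs)           # p (1 - p) for each generation and locus
    tau = dp.shape[0]
    Q = np.full((tau, tau), np.nan)
    for t in range(tau):
        for s in range(t, tau):
            ok = ~np.isnan(dp[t]) & ~np.isnan(dp[s])
            m = int(ok.sum())
            if m < 2:
                continue                  # too few complete pairs to estimate a covariance
            x, y = dp[t, ok], dp[s, ok]
            cov = np.sum((x - x.mean()) * (y - y.mean())) / (m - 1)
            Q[t, s] = Q[s, t] = cov / het[t, ok].mean()   # t = min(t, s) since s >= t
    return Q

freqs = [[0.5, 0.5, 0.5], [0.6, 0.4, 0.5], [0.7, 0.3, 0.5]]
Q = standardized_cov_matrix(mark_fixed(freqs))
print(Q)
```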

We have extensively validated our simulation procedure in a neutrally evolving population, ensuring that the decay of LD and the allele frequency change match expectations (see Supplemental Material, Figures S1.1, S1.2, and S1.3).

Comparing theory to simulation results

To validate our expressions for temporal autocovariance, we compare the levels of autocovariance and variance predicted by Equation 10 and Equation 12 to the average levels observed across simulation replicates. To calculate the theoretical values of temporal autocovariance and variance, our expressions require the additive genic variation at time s, $V_{a} (s)$; however, as described in Modeling the dynamics of additive genic and genetic variation, we lack an analytic expression for the dynamics of genic variation to plug in for $V_{a} (s)$. Following the approach of others in evolutionary quantitative genetics [see p. 930 in Turelli and Barton (1994)], we substitute the numerical values calculated directly from the simulation data for $V_{a} (s)$. Additionally, we consider two other numerical values related to the additive genic variance: the observed additive genetic variance from our simulations [$V_{A} (s) = {Var}_{i} (z_{i})$ at time s, which includes the contribution of LD between selected sites], and the additive genic variation at time s as approximated by the observed decay in the sum of site heterozygosity at neutral sites [$V_{a, s s h_{n}} (s)$, as described in Modeling the dynamics of additive genic and genetic variation].

Figure 2 compares the temporal autocovariances predicted by our theory, using each of these three variances, with the empirical covariances from our multilocus simulations. In each panel, we plot the temporal autocovariance between the allele frequency change over the first generation of selection $(Δ p_{5})$ and some later allele frequency change $Δ p_{s}$, where s varies along the x-axis. Each point represents the temporal autocovariance (calculated across all sites in a region according to Equation 16) averaged across 100 replicate simulations, with the color of the point indicating the number of selected sites in the region. Within each panel, the temporal autocovariance predicted by Equation 10 is plotted as a set of three lines, one for each of the three different types of variance we have substituted in for $V_{a} (s)$. Overall, the fit is close but varies depending on the type of variance used for $V_{a} (s)$; we discuss each in turn below.

In each panel, the temporal autocovariance $Cov (Δ p_{5}, Δ p_{s})$ is shown on the y-axis while generation s varies along the x-axis. Selection is initiated on the 5th generation, so $Δ p_{5}$ is the neutral allele’s frequency change across the first generation of selection. Each point is the temporal autocovariance between $Δ p_{5}$ and the $Δ p_{s}$ in a region, averaged over 100 simulation replicates, with the colors indicating the number of selected loci. The gray curves indicate the theoretical predictions (for $L = 500$ loci only) using Equation 10, with the equation’s variance provided by the empirically observed additive genic (solid), additive genetic (long dashes), and neutral sum of site heterozygosity approximations (short dashes). A thin horizontal dashed line indicates $y = 0$ . Across the columns, the level of recombination (in morgans) is varied; across rows, the initial level of additive genetic variation is varied. Note that while our results here are between the frequency change at the onset of selection $Δ p_{5}$ and some later change $Δ p_{s}$ , our covariance theory matches simulation results between any two arbitrary frequency changes $Δ p_{t}$ and $Δ p_{s}$ ; see Supplementary Figure S3.1.

Using empirical additive genic variation (solid lines), our theory provides a good fit to the simulation results for a short period after selection is initiated (around five generations) in regions with tighter linkage ($R = 0.01$ M) across a range of additive genetic variation parameters ($0.01 \leq V_{A} \leq 0.05$; see Figure S3.2 for $V_{A}$ varying over orders of magnitude). With looser linkage ($R \geq 0.1$ M), our theory using the empirical additive genic variation fits much more closely over a longer duration (∼10–15 generations). Note that some variability is caused by the noise of each simulation replicate around the target initial additive genetic variation $V_{a}$ (Multilocus simulation details), as each replicate samples sites from a neutral coalescent. Our theory also accurately predicts the temporal autocovariance for different choices of reference generation, i.e., varying t (see Figure S3.1).

When we use the sum of site heterozygosity at neutral sites ( $V_{a, s s h_{n}}$ , shown as a short-dashed line) as a proxy for additive genic variation, the theory fits simulations over the same time span as using the empirical additive genic variation. This is because: (1) the sum of site heterozygosity at neutral sites closely matches the sum of site heterozygosity at selected sites and (2) both closely follow the dynamics of additive genic variation through time (see Figure S2.1). Using $V_{a, s s h_{n}}$ has the advantage that we can directly measure the neutral sum of site heterozygosity, which proves useful later in Estimating Linked-Selection Parameters from Temporal Autocovariance, as we use this approach to help infer the initial additive genic variation at the onset of selection.

Finally, we find that using the additive genetic variance $V_{A} (s) = {Var}_{i} (z_{i})$ accurately predicts the dynamics of temporal autocovariance over tens of generations (see the long-dashed lines in Figure 2). Furthermore, calculating the temporal autocovariance using the empirical additive genetic variation better fits simulation data in regimes with tight linkage, where using genic variation performs poorly after the first few generations (e.g., the column of panels where $R = 0.01$ M). Thus, using the additive genetic variance in our framework provides a good fit to the temporal dynamics over relatively long time spans.

What differentiates $V_{A} (s)$ from $V_{a} (s)$ that could explain this better fit? The additive genic variation $V_{a} (s)$ ignores the contribution of LD between selected sites. We can write the additive genetic variance as

V_{A} (s) = {Var}_{i} (z_{i}, s) = \underbrace{2 α^{2} \sum_{l = 1}^{L} p_{l} (s) (1 - p_{l} (s))}_{\text{genic variation, } V_{a} (s)} + \underbrace{α^{2} \sum_{i \neq j} D_{i, j} (s)}_{\text{LD contribution}}

(17)

where $D_{i, j} (s)$ is the LD between selected sites i and j at time s. At the onset of selection, there is no expected LD between selected sites, since the sites and effect sizes were randomly sampled; in other words, $E (D_{i, j} (s)) = 0$. We see this in Figure 2, as the temporal autocovariances predicted with $V_{a} (s)$ match those predicted with $V_{A} (s)$ when $s = 6$ (see also Figure S2.1, which plots the empirical additive genic and genetic variances over time). Over time, these two quantities diverge as negative LD builds up. While negative LD between selected sites builds up due to epistasis under some forms of selection (known as the Bulmer effect) (Bulmer 1971, 1980), this is known not to happen under multiplicative selection [see p. 50 and p. 177 in Bürger (2000)], which is equivalent to the exponential directional selection fitness surface we have used in our simulations. Instead, the buildup of negative LD between selected sites is likely due to HRi between selected sites (Hill and Robertson 1966), which affects the total additive genetic variation that selection acts on. HRi creates negative LD among beneficial alleles in a finite population because beneficial alleles that share a haplotype move through the population more quickly than beneficial alleles on deleterious backgrounds. This negative LD among beneficial alleles lowers $V_{A}$ relative to the genic $V_{a}$ (Hill and Robertson 1966; Barton and Otto 2005; Good et al. 2014; Crouch 2017). In the derivation of our expression for temporal autocovariance, we greatly simplified the multilocus dynamics by ignoring the second term in Equation 6. This term includes the expected product of two LD terms, each of which is the LD between the neutral site and a selected site.
Using full multilocus theory, one may find that, by including these LD products, $V_{A}$ rather than $V_{a}$ factors out of the expression in Equation 7, but we leave this for future work. Importantly, our simulation results suggest that the negative LD created by selective interference affects the temporal autocovariances only through the variance term $V_{a} (s)$, and that the variance actually determining temporal autocovariance is the additive genetic variance, $V_{A} (s)$.
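To make Equation 17 concrete, the sketch below computes both the genic variance $V_{a}$ and the realized variance ${Var}_{i} (z_{i})$ from a haplotype array, assuming equal effects α at every site. The example population is hypothetical and deliberately extreme: each haplotype carries exactly one of two effect alleles, so negative LD cancels the genic variance entirely.

```python
import numpy as np

def genic_and_genetic_variance(haps, alpha):
    """Return (V_a, V_A) for a trait with equal effects alpha at every selected site (Eq. 17).

    haps -- (N, 2, L) array of 0/1 alleles (1 = effect allele).
    V_a  -- genic variance 2 alpha^2 sum_l p_l (1 - p_l), which ignores LD between sites.
    V_A  -- Var_i(z_i), the realized variance of trait values, which includes LD.
    """
    g = haps.sum(axis=1)              # genotype counts g_{i,l}
    z = alpha * g.sum(axis=1)         # z_i = sum_l alpha g_{i,l}
    p = haps.mean(axis=(0, 1))        # allele frequency at each selected site
    Va = 2.0 * alpha ** 2 * float(np.sum(p * (1.0 - p)))
    VA = float(np.var(z))
    return Va, VA

# Hypothetical extreme case: every haplotype carries exactly one of two effect alleles,
# i.e., strong negative LD between the two selected sites.
rng = np.random.default_rng(0)
types = np.array([[1, 0], [0, 1]])
haps = types[rng.integers(0, 2, size=(1000, 2))]   # shape (1000, 2, 2)
Va, VA = genic_and_genetic_variance(haps, alpha=0.1)
print(Va, VA)
```

Every individual here has the same trait value, so ${Var}_{i} (z_{i})$ is zero up to rounding even though each site is highly polymorphic.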

In addition to modeling autocovariance through time, our theory can predict the total temporal variance in allele frequency, $Var (p_{t} - p_{0})$, when there is heritable variation for fitness. Recall from Equation 3 that we can decompose $Var (p_{t} - p_{0})$ into variance and covariance components. The variance components are determined by both the magnitude of drift $(1 / 2 N)$ and selection according to Equation 11, and the covariance components are determined solely by selection according to Equation 10 (assuming no inheritance of environmental factors). Using our theory, we have predictions for each of these components given the amount of additive genic/genetic variation for fitness, the population size (N), and the amount of recombination (R). In Figure 3, we compare the magnitudes of these components (averaged over the replicates of our simulations) to our theoretical predictions. We depict the predictions for the variance and covariance components using both the empirical additive genetic variance $(V_{A})$ and the neutral sum of site heterozygosity proxy $(V_{a, s s h_{n}})$ as adjacent bars on either side of a point range, where the point represents the average value over simulation replicates and the range spans the lower and upper quartiles over simulations.

Summing over generations, Equation 10 and Equation 12 accurately predict the total variation in allele frequency change due to the variance (var) and covariance (cov) components. The predicted cumulative variance in allele frequency change across the 10 generations after the onset of selection $(Var (p_{15} - p_{5}))$ is shown as bars, using both the empirical additive genetic variation $V_{A}$ (bars to the left of the point range) and the empirical neutral sum of site heterozygosity (bars to the right of the point range). The variance and covariance components are represented by blue/green and orange/yellow tones, respectively. Finally, we show the averaged results of our simulations as point ranges, with the points depicting the average and the ranges spanning the lower and upper quartiles.
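The decomposition referred to above rests on the identity $Var (p_{τ} - p_{0}) = \sum_{t, s} Cov (Δ p_{t}, Δ p_{s})$: the diagonal gives the variance component and the off-diagonal entries the covariance component. A minimal Python check follows, using a hypothetical (unstandardized) covariance matrix and multivariate normal draws as stand-ins for allele frequency changes; the function name is illustrative.

```python
import numpy as np

def decompose_total_variance(S):
    """Split Var(p_tau - p_0) = sum_{t,s} Cov(dp_t, dp_s) into its variance (diagonal)
    and covariance (off-diagonal) components, given a tau x tau covariance matrix S."""
    S = np.asarray(S, dtype=float)
    var_part = float(np.trace(S))
    cov_part = float(S.sum()) - var_part   # twice the sum of the upper-triangle covariances
    return var_part, cov_part, var_part + cov_part

# Hypothetical covariance matrix of three successive frequency changes.
S = np.array([[1.0e-3, 2.0e-4, 1.0e-4],
              [2.0e-4, 1.0e-3, 2.0e-4],
              [1.0e-4, 2.0e-4, 1.0e-3]])
rng = np.random.default_rng(3)
dp = rng.multivariate_normal(np.zeros(3), S, size=200_000)
total_mc = dp.sum(axis=1).var()            # empirical Var(p_3 - p_0)
print(decompose_total_variance(S), total_mc)
```

Positive autocovariances, as generated by linked selection, inflate the total variance beyond what the diagonal (variance) terms alone would predict.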

Finally, we have found that across a wide range of recombination and additive genetic variation parameters, the temporal autocovariance $Cov (Δ p_{t}, Δ p_{s})$ is largely determined by the compound parameter $V_{A} / R$ and by the number of generations between t and s, which is a factor in Equation 9. We show in Figure 4 that the temporal autocovariances $Cov (Δ p_{5}, Δ p_{s})$ from simulations across a wide range of $V_{A}$ and R parameters fall roughly on the same curve for each number of elapsed generations $s - t$.

The compound parameter $V_{A} / R$ and the number of generations $s - t$ separating the two allele frequency changes largely determine the magnitude of the temporal autocovariance across a wide spectrum of $V_{A}$ and R parameters. Each point is a simulation replicate, with its x-axis position given by $V_{A} / R$, its y-axis position equal to the temporal autocovariance, and the number of elapsed generations given by $(s - t)$. Each line is a Loess curve fit through the set of points for a particular generation (with smoothing parameter $α = 0.9$).

Data availability

All code to reproduce these results is available on GitHub at https://github.com/vsbuffalo/tempautocov. Larger simulation data sets used to create figures are in the Supplemental material available at FigShare: https://doi.org/10.6084/m9.figshare.7709930.

Estimating Linked-Selection Parameters from Temporal Autocovariance

Our multilocus theory provides analytic expressions for the expected variances and covariances of a neutral allele’s frequency changes; thus, a natural approach to parameter estimation is to equate these expectations to averages from the data and apply the method of moments. We describe below a method-of-moments procedure to estimate the initial additive genetic variance at the onset of selection in the first generation $(V_{A} (1))$ and the drift-effective population size (N) from temporal data within a single region R morgans long, and then show a simple extension that allows this to be applied to genome-wide data. Our basic approach is to first calculate the sample variances and covariances of the τ observed generation-to-generation allele frequency changes, averaging over all of the putatively neutral sites in a region. We then equate these sample variances and covariances to our analytic expressions for the variances and covariances, leaving us with an overdetermined system of equations, which we solve using least squares. We demonstrate that this simple estimation procedure provides accurate estimates of the initial additive genetic variance and the drift-effective population size. We focus on this procedure because it is simple and handles incomplete trajectories due to missing data or fixation/loss well. Calculating pairwise-complete covariances can leave sample covariance matrices non-positive definite, which would likely make maximum likelihood estimation considerably more difficult. Throughout, we use population allele frequencies (i.e., there is no sampling noise), which simplifies the description of the method; in the Appendix section Accounting for Allele Frequency Sampling Noise, we describe how the method is changed by finite sampling of chromosomes from a population.

From our multilocus theory, we have analytic expressions for each element of the $τ \times τ$ covariance matrix of allele frequency changes in a region. To model the additive genic variance through time, we use the empirical neutral sum of site heterozygosity approximation as described in Modeling the dynamics of additive genic and genetic variation. This approximates the rate that the additive genic variation decreases through time from some initial level, $V_{A} (1)$ , which we wish to estimate. In total, we have $τ + τ (τ - 1) / 2$ unique moment equations, which for the variance and covariance are defined as

\frac{Var (Δ p_{t})}{E (p_{t} (1 - p_{t}))} = \frac{\hat{V_{A} (1)}}{2} \frac{S S H_{n} (t)}{S S H_{n} (1)} A (R, t, t) + \hat{F} := Σ_{t, t}

(18)

\frac{Cov (Δ p_{t}, Δ p_{s})}{E (p_{t} (1 - p_{t}))} = \frac{\hat{V_{A} (1)}}{2} \frac{S S H_{n} (s)}{S S H_{n} (1)} A (R, t, s) := Σ_{t, s} \quad (\text{for } s > t) .

(19)

Here, Equation 18 gives the form of the τ equations for the variance of allele frequency changes between subsequent generations, which includes the effect of genetic drift, $\hat{F} = 1 / 2 N$. Equation 19 gives the form of the covariances of allele frequency changes among different generations. The term $A (R, t, s)$ is the average level of LD remaining after the $s - t$ generations that have elapsed, in a region of R morgans. In our multilocus theory section and simulations, this is equal to the integral in Equation 10. However, we can also directly calculate a sample $\bar{A (R, t, s)}$ from observed LD in a region (for details, see Equation 55 in the Appendix).

Following the method of moments, we equate each of these $τ + τ (τ - 1) / 2$ unique equations for $Σ_{t, s}$ to the corresponding observed sample moments, the elements $Q_{t, s}$ of the upper triangle (including the diagonal) of the observed heterozygosity-normalized covariance matrix $\hat{Q}$ described in Equation 16. This yields $τ + τ (τ - 1) / 2$ equations in two unknown parameters: $\hat{V_{A} (1)}$ and $\hat{N}$. We solve this overdetermined system of equations using least squares, an approach similar to the generalized method of moments in econometrics (Hansen 1982). This approach finds the parameter values that minimize the sum of squared differences between the observed moments and their theoretical expectations. We write the elements $Q_{t, s}$ of the upper triangle of the observed covariance matrix in the vector $\vec{q}$, and write the method-of-moments equations as

\vec{q} = \hat{V_{A} (1)} \vec{a} + \hat{F} \vec{b} + \vec{ε}

(20)

where the elements of $\vec{a}$ and $\vec{b}$ , in the same order as $\vec{q}$ , are given by

a_{t, s} = \frac{1}{2} \frac{S S H_{n} (s)}{S S H_{n} (1)} A (R, t, s), \quad b_{t, s} = δ_{t, s}

(21)

where $δ_{t, s}$ is an indicator variable that is one when $s = t$ and zero otherwise.

Then, we can readily estimate the parameters $\hat{V_{A} (1)}$ and $\hat{F}$ using least squares, and obtain an estimate $\hat{N} = 1 / (2 \hat{F})$. Since these equations are not statistically independent, we cannot assume $Cov (\vec{ε}) = σ^{2} I$. However, this does not bias our estimates $\hat{V_{A} (1)}$ and $\hat{F}$, as the least squares estimator is unbiased regardless of the covariance structure between the error terms [see p. 26 in Christensen (2011)].
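A minimal Python sketch of this least-squares step is below. It assumes the coefficients $a_{t, s}$ of Equation 21 have already been computed and arranged as a $τ \times τ$ array (only its upper triangle is used). The function name is hypothetical, and the check uses noise-free synthetic moments, so it only confirms that known parameters are recovered.

```python
import numpy as np

def method_of_moments(Q, a_matrix):
    """Least-squares solution of Eq. 20, q = V_A(1) a + F b + eps.

    Q        -- tau x tau observed standardized covariance matrix (Eq. 16)
    a_matrix -- tau x tau array, a[t, s] = 0.5 * SSH_n(s) / SSH_n(1) * A(R, t, s)
    Returns (V_A(1) estimate, F estimate, N estimate = 1 / (2 F)).
    """
    tau = Q.shape[0]
    iu = np.triu_indices(tau)                  # upper triangle, diagonal included
    q = Q[iu]
    a = a_matrix[iu]
    b = (iu[0] == iu[1]).astype(float)         # delta_{t,s}: drift enters only the variances
    X = np.column_stack([a, b])
    (va1_hat, f_hat), *_ = np.linalg.lstsq(X, q, rcond=None)
    return float(va1_hat), float(f_hat), 1.0 / (2.0 * float(f_hat))

# Synthetic check with made-up a_{t,s} (decaying with s - t) and no noise.
tau = 5
lag = np.abs(np.subtract.outer(np.arange(tau), np.arange(tau)))
a_true = 0.5 * 0.01 * 0.8 ** lag
Q_synthetic = 0.05 * a_true + np.eye(tau) / (2 * 1000)
print(method_of_moments(Q_synthetic, a_true))
```

Ordinary least squares is used here for simplicity, matching the unweighted approach described in the text; a weighted variant would need an estimate of $Cov (\vec{ε})$.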

Using this method-of-moments approach, we sought to infer the parameters of 20 of the replicates across the 254 parameter combinations (the same as used in Figure 2). We use the first five generations after the onset of selection to infer $\hat{V_{A} (1)}$ and $\hat{N}$ , as for this short time span the additive genetic variance is well approximated using the sum of neutral site heterozygosity approach (see Comparing theory to simulation results). Each simulation replicate includes ∼500 neutral sites (the exact number is random, see Multilocus Simulation Details in the Appendix for details).

Applying our approach to these simulations, we find that we can infer both the initial level of additive genetic variation $V_{A} (1)$ and the effective population size N from multilocus temporal data. In Figure 5A, we show that our method of moments gives reasonable estimates of the initial level of additive genetic variance across orders of magnitude of additive genetic variation and different recombination regimes. As additive genetic variation for fitness becomes weaker (the left side of the figure), our estimates become noisier. In Figure 5B, we show the simultaneously estimated population size N against the true population size. We also plot the estimated $N_{e}$ (not accounting for selection) from a simple temporal estimator, $N_{e} = - t / (2 \log (1 - F))$ (Krimbas and Tsakas 1971; Waples 1989), where F is Wright’s standardized variance (Wright 1931). While the method-of-moments approach still underestimates N when $V_{A}$ is high and R is low, it performs far better than a standard temporal $N_{e}$ estimator that does not account for selection. In Appendix Figure A3, we include a version of this figure calculated using the method of moments on sample allele frequencies for a sample of $n = 100$ chromosomes.

Figure 5. True parameter values and estimates using the method-of-moments approach on multilocus simulation data. (A) The true $V_{A} (1)$ (x-axis) and $\hat{V_{A} (1)}$ estimated from the variance–covariance matrix (y-axis) for each simulation replicate across different levels of recombination (indicated by each point’s color). The dashed gray line shows $y = x$, where an estimate equals its true value. Note that the plot is on a log–log scale, as $V_{A}$ varies across orders of magnitude in our simulations. (B) Estimated drift-effective population size $(\hat{N})$ across a range of simulations with different levels of additive genetic variance and recombination. Each point denotes the median, with lines denoting the interquartile range. A simple temporal estimate of the effective population size, which does not account for the effects of selection, is averaged for each replicate and plotted as a dash. The true value $(N = 1000)$ is shown with the dashed gray line. Population frequencies (without sampling noise) are used in this figure; see Appendix Figure A3 for an analogous figure calculated with sample frequencies.

We can extend this approach to whole-genome data by partitioning the genome into B nonoverlapping windows of length $w_{BP}$ base pairs (e.g., megabase windows). We first assume that windows contribute uniformly to the genome-wide level of additive genetic variance for fitness, and show how our method-of-moments approach can be used to estimate a global $V_{A} (1)$. Assuming a uniform distribution of genetic variance across base pairs, the total additive variance across our B windows is $V_{A} (1) = v_{A} (1) B w_{BP}$, where $v_{A} (1)$ is the additive genetic variance per base pair. As each window i contributes $v_{A} (1) w_{BP}$, that is, a fraction $1 / B$ of the total, our least squares approach given by Equation 20 becomes

$$\Sigma_{t,s,i} = \frac{\hat{V}_{A}(1)}{2}\,\frac{1}{B}\,\frac{\mathrm{SSH}_{n}(s)}{\mathrm{SSH}_{n}(1)}\,A(R_{i}, t, s) + \hat{F}\,\delta_{t,s} + \varepsilon, \quad \text{for } s \geq t. \tag{22}$$

However, we expect a priori that windows containing more coding bases might contribute disproportionately to the total additive genetic variance. This suggests an alternative model in which each window’s share of the additive genetic variance is proportional to its number of coding bases, similar to background selection and other linked selection models (McVicker et al. 2009; Rockman et al. 2010; Corbett-Detig et al. 2015). Thus, we could write the total as $V_{A} (1) = v_{A} (1) \sum_{i = 1}^{B} w_{CBP, i}$, where $w_{CBP, i}$ is the number of coding or exonic base pairs in window i (this could be any quantifiable annotation feature in the window), and $W_{CBP} = \sum_{i = 1}^{B} w_{CBP, i}$ is the total number of coding bases in the genome. With window i contributing $v_{A} (1) w_{CBP, i}$ to the additive genetic variance and having map length $R_{i}$, we now define $\vec{q}, \vec{a}$, and $\vec{b}$ as having elements given by the equations

$$\Sigma_{t,s,i} = \frac{\hat{V}_{A}(1)}{2}\,\frac{w_{CBP,i}}{W_{CBP}}\,\frac{\mathrm{SSH}_{n}(s)}{\mathrm{SSH}_{n}(1)}\,A(R_{i}, t, s) + \hat{F}\,\delta_{t,s} + \varepsilon, \quad \text{for } s \geq t. \tag{23}$$
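The only difference between Equations 22 and 23 is the share of $V_{A} (1)$ assigned to window i: $1 / B$ under the uniform model and $w_{CBP, i} / W_{CBP}$ under the coding-base model. A minimal sketch of building the resulting regressor in Python with NumPy follows; it treats $A (R_{i}, t, s)$ and the $SSH_{n}$ values as precomputed inputs, and all function names are hypothetical. The returned coefficients would form the $V_{A} (1)$ column of a least-squares design matrix alongside the $δ_{t, s}$ indicator.

```python
import numpy as np


def uniform_weights(n_windows):
    """Equation 22: every window carries 1/B of V_A(1)."""
    return np.full(n_windows, 1.0 / n_windows)


def coding_weights(coding_bp):
    """Equation 23: window i carries w_CBP_i / W_CBP of V_A(1)."""
    coding_bp = np.asarray(coding_bp, dtype=float)
    return coding_bp / coding_bp.sum()


def va_coefficients(window_weight, ssh_s, ssh_1, a_rts):
    """Coefficient multiplying V_A(1) for each (t, s, i) entry.

    All arguments are arrays aligned entry by entry; a_rts holds precomputed A(R_i, t, s).
    """
    ssh_ratio = np.asarray(ssh_s, dtype=float) / np.asarray(ssh_1, dtype=float)
    return 0.5 * np.asarray(window_weight) * ssh_ratio * np.asarray(a_rts)


weights = coding_weights([10, 30, 60])      # -> [0.1, 0.3, 0.6]
coef = va_coefficients(weights, [0.8, 0.9, 1.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

Keeping the weighting in a separate function makes it easy to swap in any other per-window annotation count, as the text suggests.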

Again, the parameters of this model, $\hat{V_{A} (1)}$ and $\hat{N}$, can be estimated with least squares. When analyzing genome-wide data, these alternative models could be compared using an out-of-sample procedure, in which parameters inferred from a subset of windows are used to compute each model’s mean-squared prediction error on the remaining windows (Elyashiv et al. 2016). Confidence intervals for our method-of-moments estimates could be obtained by bootstrapping over genomic windows, since the errors are not independent and identically distributed.
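One way to implement the window bootstrap is a percentile bootstrap that resamples whole genomic windows with replacement and recomputes the estimate each time. The sketch below assumes Python with NumPy and a user-supplied `estimator` that refits the model on an array of window indices; the names and the choice of a percentile interval are illustrative, not the procedure used in this paper.

```python
import numpy as np


def window_bootstrap_ci(n_windows, estimator, n_boot=1000, level=0.95, seed=None):
    """Percentile CI from resampling window indices 0..n_windows-1 with replacement.

    estimator : callable taking an integer array of (possibly repeated) window indices
                and returning a scalar estimate, e.g. a refit V_A(1) or N_hat.
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_windows, size=n_windows)
        stats[b] = estimator(idx)
    alpha = 1.0 - level
    lower, upper = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lower, upper


# Toy usage: the "estimate" is the mean of a per-window statistic
per_window = np.arange(100, dtype=float)
lower, upper = window_bootstrap_ci(100, lambda idx: per_window[idx].mean(), seed=42)
```

Resampling whole windows keeps the correlated errors within a window together, which is the reason for bootstrapping windows rather than individual sites.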

Estimating the proportion of variance in frequency change due to linked selection

We can also estimate what fraction of the variance in allele frequency change over t generations, $Var (p_{t} - p_{0}) / t$, is due to linked selection perturbing the frequency trajectories of neutral alleles. We have developed two approaches: first, a more conservative approach that considers only the contribution of selection to the temporal autocovariance, and second, a more exact approach that uses the estimated effective population size to include the contribution of selection to both the variances and the covariances of allele frequency change.

First, a simple estimate of the total fraction of the variance in allele frequency change (G) caused by linked selection is

$$G = \frac{\sum_{t \neq s} \mathrm{Cov}(\Delta p_{t}, \Delta p_{s})}{\mathrm{Var}(p_{t} - p_{0})}. \tag{24}$$
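Because $Var (p_{t} - p_{0})$ equals the sum of every entry of the temporal covariance matrix of the single-generation changes $Δ p_{i}$, Equation 24 can be computed directly from that matrix. A minimal sketch in Python with NumPy, assuming a loci-by-generations array of population allele frequencies and no correction for sampling noise; the function names are hypothetical.

```python
import numpy as np


def temporal_cov_matrix(freqs):
    """Cov(Δp_i, Δp_j) across loci.

    freqs : array of shape (n_loci, T + 1), population frequencies at generations 0..T.
    Returns a (T, T) sample covariance matrix (ddof = 1).
    """
    deltas = np.diff(freqs, axis=1)          # Δp_i = p_i - p_{i-1}
    return np.cov(deltas, rowvar=False)


def g_estimate(cov):
    """Equation 24: off-diagonal sum over Var(p_T - p_0), the sum of all entries."""
    cov = np.asarray(cov, dtype=float)
    total = cov.sum()
    return (total - np.trace(cov)) / total


# Toy neutral random walks: independent increments, so G should be near zero
rng = np.random.default_rng(0)
freqs = np.clip(0.5 + np.cumsum(rng.normal(0, 0.01, size=(1000, 6)), axis=1), 0, 1)
cov = temporal_cov_matrix(freqs)
g = g_estimate(cov)
```

The denominator identity holds exactly for sample covariances as long as the same `ddof` is used throughout, so no separate variance calculation is needed.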

However, this estimator is conservative because it ignores the contribution that linked selection has on the variance in allele frequency change across a single generation [the $Var (Δ_{H} p_{t})$ term in Equation 2]. If we include these variance terms, we have a less-conservative estimator that we call $G'$ :

$$\begin{aligned} G' &= \frac{\sum_{t \neq s} \mathrm{Cov}(\Delta p_{t}, \Delta p_{s}) + \sum_{i = 1}^{t} \mathrm{Var}(\Delta_{H} p_{i})}{\mathrm{Var}(p_{t} - p_{0})} \\ &= 1 - \frac{\sum_{i = 1}^{t} \mathrm{Var}(\Delta_{M} p_{i}) + \sum_{i = 1}^{t} \mathrm{Var}(\Delta_{N} p_{i})}{\mathrm{Var}(p_{t} - p_{0})}. \end{aligned} \tag{25}$$

We can think of the numerator of the second term in Equation 25 as the variance in allele frequency change in a Wright–Fisher population without selection. Recall that under a Wright–Fisher model, the variance in allele frequency change across t generations is approximately $p_{0} (1 - p_{0}) (1 - \exp (- t / 2 N)) \approx p_{0} (1 - p_{0}) \times t / 2 N$ (equivalently, the standardized variance is approximately $t / 2 N$), where the second approximation holds for short time spans $(t / 2 N ≪ 1)$. This suggests that we can use our method-of-moments estimate of the effective population size without the effects of selection, $\hat{N}$, and compare the standardized variance we expect under this rate of drift to the empirical standardized variance,

$$G' = 1 - \frac{t\,\mathrm{E}[p_{0}(1 - p_{0})]}{2\hat{N}\,\mathrm{Var}(p_{t} - p_{0})}. \tag{26}$$
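Equation 26 needs only the starting frequencies, the observed change, the number of generations, and $\hat{N}$. A minimal sketch in Python with NumPy, assuming population (not sample) frequencies at each locus; `g_prime` is a hypothetical name.

```python
import numpy as np


def g_prime(p0, pt, t, n_hat):
    """Equation 26: G' = 1 - t E[p0 (1 - p0)] / (2 N_hat Var(p_t - p_0))."""
    p0 = np.asarray(p0, dtype=float)
    pt = np.asarray(pt, dtype=float)
    drift_expected = t * np.mean(p0 * (1.0 - p0)) / (2.0 * n_hat)
    return 1.0 - drift_expected / np.var(pt - p0)


# Observed variance 0.01 is twice the drift expectation 4 * 0.25 / (2 * 100) = 0.005, so G' = 0.5
example = g_prime([0.5, 0.5], [0.6, 0.4], t=4, n_hat=100)
```

A value near zero indicates that the observed variance matches what drift alone at the estimated $\hat{N}$ would produce.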

Figure 6 shows the estimated $G'$ based on the method-of-moments $\hat{N}$ estimates from 20 replicates of simulated data. We learn three important points about our $G'$ estimator. First, for low $V_{A}$ or $V_{A} = 0$, the estimator is quite noisy. Second, although the signal can be noisy for low $V_{A}$, the relationship between $G'$ and the level of recombination is consistent with selection affecting the total variance in allele frequency changes across the genome. Third, this suggests that a negative relationship between $G'$ and recombination rate, calculated in windows across the genome, is a robust signal of linked selection impacting the total variance in allele frequency change. Furthermore, we would expect a positive relationship between the number of coding base pairs per window (when such information is available) and $G'$, which could serve as another signal of the same effect.

Figure 6. The proportion of total variance in allele frequency changes caused by linked selection, $G'$, across a variety of different levels of additive genetic variance (each group of boxplots), and different levels of recombination (each colored boxplot within a group). Each boxplot shows the spread of values across 20 replicates, with $\hat{N}$ estimated separately for each replicate.

Fluctuating selection

Thus far, we have assumed that fitness effect sizes are constant through time, that is, $α_{t, l} = α_{s, l}$ for all $t, s$. In natural populations, changes in the environment or in the composition of the population may cause these effect sizes to change through time, due to changing selection pressures and changes in the epistatic environment experienced by alleles. If these changes occur within the timeframe of recorded allele frequency changes, the levels of temporal autocovariance will differ from the levels predicted by our directional selection theory. However, Equation 6 shows that the magnitude of temporal autocovariance is determined in large part by $E (α_{t} α_{s})$, which lets us reason about how such changes alter it.

Here, we discuss how temporal autocovariance behaves under an example of strong fluctuating selection, in which selection on a trait changes direction at some point. Specifically, we change the fitness function from $w (z_{i}) = e^{z_{i}}$ to $w (z_{i}) = e^{- z_{i}}$ after some timepoint $t^{*}$; for $t < t^{*}$, this is equivalent to setting $α_{s, l} = - α_{t, l}$ if $s \geq t^{*}$ and $α_{s, l} = α_{t, l}$ if $s < t^{*}$.

When such a strong change in the direction of selection occurs, the temporal autocovariance between timepoints before and after the change becomes negative, since temporal autocovariance is determined by the product $α_{t} α_{s}$ for $t \neq s$ (here we are holding effects constant across loci). We have validated this using the same simulation procedure as described in Multilocus simulation details, except that on generation 15 we reverse the direction of selection on the trait by changing the fitness function $w (z) = e^{z}$ to $w (z) = e^{- z}$ . In Figure 7A, we show the temporal autocovariance $Cov (Δ p_{5}, Δ p_{s})$ for varying s along the x-axis (in this case, $V_{a} = 0.05, R = 0.1$ , and $L = 500$ ). During the first five generations, the temporal autocovariance behaves as it does under directional selection, decaying due to the decrease in additive genetic variance and the breakdown of LD. Then, on the 15th generation, the direction of selection on the trait with breeding value $z_{i}$ reverses, and temporal autocovariance becomes negative since $α_{s, l} α_{t, l} < 0$ for all $s \geq t^{*}$ and $t < t^{*}$ . Under this simple flip in the direction of selection pressure, the genic variance, $V_{a} (s) = 2 α^{2} \sum_{l} p_{s, l} (1 - p_{s, l})$ , in the expressions for the temporal autocovariance can be replaced with $V_{a} (s) = 2 α_{s} α_{t} \sum_{l} p_{s, l} (1 - p_{s, l})$ , akin to a genetic/genic covariance. In Figure 7A, the gray line is our predicted level of temporal autocovariance proportional to $V_{A} A (R, t, s)$ given by Equation 8 before generation 15, and after that generation it is proportional to $- V_{A} A (R, t, s)$ (using the empirical additive genetic variance). However, note that the dynamics of additive genetic variance under fluctuating selection are more complex than under directional selection. 
Whereas under directional selection the genetic variance decays as selection proceeds, under fluctuating selection there can be a transient increase in the additive genetic variance (seen in Figure 7A between generations 15–24). This transient inflation arises because haplotypes whose heterozygosity was reduced by the earlier directional selection regain heterozygosity: with the direction of selection reversed, previously selected haplotypes move back toward intermediate frequencies, which increases the additive genetic variance until selection proceeds and this variance decays again (generation 25 onward). Overall, the dynamics of additive genetic variance under fluctuating selection are more complicated than under directional selection, which makes inference using our sum of site heterozygosity approximation infeasible.
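The substitution of $2 α_{s} α_{t} \sum_{l} p_{s, l} (1 - p_{s, l})$ for the genic variance can be sketched as follows, assuming equal effect sizes across loci as in the simulation described above. This Python/NumPy illustration only shows how the sign flips across $t^{*}$; it does not model the transient change in variance, and the function names are hypothetical.

```python
import numpy as np


def effect_at(generation, alpha, t_star):
    """Effect size at a given generation when selection reverses at t_star."""
    return alpha if generation < t_star else -alpha


def genic_covariance(t, s, alpha, t_star, p_s):
    """2 * alpha_s * alpha_t * sum_l p_{s,l} (1 - p_{s,l}), replacing the genic variance."""
    p_s = np.asarray(p_s, dtype=float)
    alpha_t = effect_at(t, alpha, t_star)
    alpha_s = effect_at(s, alpha, t_star)
    return 2.0 * alpha_s * alpha_t * np.sum(p_s * (1.0 - p_s))


p = np.array([0.5, 0.5])
before = genic_covariance(t=10, s=12, alpha=0.1, t_star=15, p_s=p)  # both before the flip: positive
after = genic_covariance(t=10, s=20, alpha=0.1, t_star=15, p_s=p)   # straddles the flip: negative
```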

Figure 7. (A) The covariance (cov) $Cov (Δ p_{5}, Δ p_{s})$, where s is varied on the x-axis, for $V_{a} = 0.05$, $R = 0.1$, and $L = 500$, averaged over 100 replicates. Selection begins in generation 5 [with the fitness function $w (z_{i}) = e^{z_{i}}$] and on generation 15 the direction of selection flips [and the fitness function becomes $w (z_{i}) = e^{- z_{i}}$]. The gray line shows our directional selection temporal autocovariance prediction, modified so that after the flip in the direction in which the trait is selected, we plot the *negative* theoretical level of temporal autocovariance. (B) The average cumulative variances (var) and covariances through the generations for the same simulation parameters, with the heights of the bars representing the total cumulative variation $Var (p_{t} - p_{0}) / t$. Since the direction of selection flips, the covariance terms after generation 15 become negative, leading the total variance to decrease (the negative covariances are plotted below the x-axis line). After generation 21, the total covariance is negative, leading the total variance to dip below the level of variance alone (determined by drift and heritable fitness variation). The dark gray dashed line shows the level of variance expected by drift alone $(Var (p_{t} - p_{0}) / t = 1 / 2 N)$. (C) The effect of using the absolute value of covariance, which prevents the negative autocovariances from canceling out the effects of other covariances before the direction of selection changes.

Since fluctuating selection can create negative temporal autocovariance, the total amount of autocovariance over time [e.g., $\sum_{t \neq s} Cov (Δ p_{t}, Δ p_{s})$] can misrepresent the extent to which linked selection affects allele frequencies on shorter timescales. In turn, this leads our estimator G of the fraction of variance in allele frequency change due to linked selection to underestimate the total contribution of linked selection over the time period. We show an example of this in Figure 7B, which depicts the total variance in allele frequency change $Var (p_{t} - p_{0}) / t$ through time, partitioned into variance and covariance components and colored according to whether the generation was before or after the reversal of directional selection. The covariances $\sum_{t \neq s} Cov (Δ p_{t}, Δ p_{s})$ increase as they would under directional selection, but after generation 15, the contribution of covariance begins to decrease as negative autocovariances accumulate. By generation 20, the total covariance terms have a net negative effect, and for a few generations they actually push the total variance $Var (p_{t} - p_{0}) / t$ below the constant level expected under drift and heritable variation.

To more fully capture the contribution of selection to allele frequency change, we modify G to use the absolute values of the covariances,

$$G_{abs} = \frac{\sum_{t \neq s} \left|\mathrm{Cov}(\Delta p_{t}, \Delta p_{s})\right|}{\mathrm{Var}(p_{t} - p_{0})}, \tag{27}$$

which prevents negative temporal autocovariances from canceling out the effects of positive temporal autocovariances, since both reflect linked selection acting on neutral allele frequency changes. We show $G_{a b s}$ in Figure 7C, where even after the change in the direction of selection we see a steady accumulation of covariance contributing to the total variance $Var (p_{t} - p_{0}) / t$. This also suggests that one plausible way to check for genomically widespread fluctuating selection would be to test whether $G_{a b s} > G$. However, we note that in contrast to our simulations, it is likely that in natural populations only a subset of alleles change their relationship to fitness. This may dampen the magnitude of genome-wide temporal autocovariance without completely reversing its direction, so different approaches may be needed to identify fluctuations.
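Equations 24 and 27 differ only in whether the off-diagonal covariances enter with their signs or as absolute values. A minimal Python/NumPy sketch, assuming a precomputed temporal covariance matrix of the $Δ p_{i}$ and taking $Var (p_{t} - p_{0})$ as the sum of all matrix entries; `g_and_g_abs` is a hypothetical name.

```python
import numpy as np


def g_and_g_abs(cov):
    """Return (G, G_abs) from a temporal covariance matrix of allele frequency changes."""
    cov = np.asarray(cov, dtype=float)
    total = cov.sum()                          # Var(p_T - p_0)
    off_diag = cov - np.diag(np.diag(cov))     # zero out the single-generation variances
    return off_diag.sum() / total, np.abs(off_diag).sum() / total


# Positive covariances early, negative ones after a reversal: they cancel in G but not in G_abs
cov = np.array([
    [1.0, 0.2, -0.1],
    [0.2, 1.0, -0.1],
    [-0.1, -0.1, 1.0],
])
g, g_abs = g_and_g_abs(cov)   # G is about 0, G_abs is about 0.27
```

When all covariances are positive the two quantities coincide, so a gap between them directly reflects negative temporal autocovariance.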

Discussion

Currently, the prevailing empirical approach to studying linked selection relies on samples from a single timepoint, modeling the patterns of diversity subject to different functional constraints and in different recombination environments. The early theoretical work underpinning empirical analyses of linked selection’s effects on diversity was based primarily on full-sweep, recurrent hitchhiking models, in which beneficial mutations arise in the population and then sweep to fixation (Smith and Haigh 1974; Kaplan et al. 1989; Stephan et al. 1992). Furthermore, by looking at the patterns of diversity around amino acid substitutions or the site frequency spectrum in low-recombination regions, researchers have teased apart the effects of background selection and hitchhiking in Drosophila (Begun et al. 2007; Elyashiv et al. 2016), humans (McVicker et al. 2009; Hernandez et al. 2011), and some plant species (Nordborg et al. 2005; Schmid et al. 2005; Williamson et al. 2014; Beissinger et al. 2016). Yet, as other theoretical models of hitchhiking have been developed, incorporating changes in the environment (Kopp and Hermisson 2007, 2009a,b), sweeps originating from standing variation (Hermisson and Pennings 2005), and multiple competing beneficial haplotypes (Pennings and Hermisson 2006), it has become apparent that the signals of these hitchhiking phenomena are harder to detect in empirical data. We have proposed here that temporal autocovariance offers a unique and measurable signal of linked selection over shorter timescales, providing a fuller picture of the ways in which genome-wide diversity has been affected by these other hitchhiking phenomena.

Empirical applications and future directions

Here, we have developed expressions for temporal variances and autocovariances, and applied these to model temporal variances and covariances during directional selection on a trait. We have demonstrated how, from these temporal variances and autocovariances, one can: (1) estimate the additive genetic variance for fitness and the drift-effective population size, (2) estimate the fraction of variance in allele frequency change due to linked selection, and (3) evaluate whether fluctuating selection is operating. However, we recognize a series of limitations and difficulties in applying these methods to empirical data and natural populations.

First, one difficulty with temporal sampling of natural populations is the risk that the genetic composition may change drastically due to migration or biased sampling across timepoints. Since our theory assumes a constant-sized and isolated population, migration into the sampled population presents a serious potential confounder. For example, seasonal migration could create an influx of new alleles that could, at best, dampen signals of directional selection across seasons or, at worst, create a spurious signal of covariance between like seasons. Similar biases could occur if a sampling method incidentally preferred certain subgroups in a stratified population. This could occur, for example, if individuals differed in some behavior affecting their likelihood of being sampled across different temporal environments, which could cause spurious covariances. While such sampling issues and migration might be detectable post hoc with exploratory data analysis approaches such as PCA, studying isolated natural populations and carefully designing sampling schemes would lead to the best inference. Importantly, the effects of gene flow and other temporal inhomogeneities could differ across recombination environments, as among-population differentiation will be more pronounced in regions of low recombination and high functional density (Keinan and Reich 2010; Nachman and Payseur 2012; Burri 2017). Where migration is a factor, one way forward might be to study the contribution of linked selection after partitioning out the effects of gene flow across recombinational and functional environments, by extending admixture inference approaches that estimate the genome-wide effect of drift and admixture.

Second, our method-of-moments approach assumes that we can approximate the decay of the additive genetic variance present at a reference timepoint by using the observed changes in the sum of site heterozygosity in a region as a proxy. While we have shown that this model works under directional selection in a relatively idealized setting, inference in natural populations may be complicated by environmental changes that cause effect sizes across loci to vary across timepoints. Although our fluctuating selection results show that our directional selection theory extends to changes in the direction of selection with minor adjustments (e.g., Figure 7), effect sizes that vary every generation would complicate the dynamics of additive genetic variance through time and make inference of $V_{A}$ difficult. Similarly, we assume that effect sizes are constant across sites. Variance in effect sizes across a region will not bias our results unless there is covariance between effect size and local recombination rate. Further work is required to develop statistical methods that test for violations of our assumption that effect sizes remain constant across time, and to potentially incorporate these complications into inference.

Along similar lines, we have focused on directional selection under a multiplicative model. However, selection experiments, a natural place to apply our method, often use truncation selection, which generates systematic epistasis for fitness and thus LD among loci (Bürger 2000; Walsh and Lynch 2018). Similarly, in natural populations stabilizing selection will act on many traits, which can also generate LD among loci. These selective processes will act to rapidly reduce the additive genetic variance for fitness over time, especially in low-recombination regions, and to reduce the initial additive genetic variance in low-recombination regions. Simulating the effects of truncation selection and stabilizing selection on temporal covariances in allele frequencies would be a useful direction. Our model may serve as a null model of selection against which the complications of epistasis and different dynamics of the additive genetic variance could be tested.

Finally, while we have demonstrated that our method-of-moments estimation approach is simple and leads to unbiased estimators, we see opportunities for simple extensions and other inference procedures. Differences in recombination rates and coding density are relatively easily accommodated (Equation 23). In fitting our model, we assumed a parametric form for the initial LD between neutral and selected loci; however, in practice the initial LD between neutral polymorphisms and putatively functional sites could be estimated empirically. Using the empirical LDs in Equation 23 could make the inference somewhat analogous to LD score regression (Bulik-Sullivan et al. 2015). The LD score of a SNP is simply the sum of the $r^{2}$ values between the focal SNP and all SNPs within some large physical genomic window, which could be used in place of the integral in Equation 10. With this substitution, $V_{A}$ could be estimated by regressing the temporal covariance of a SNP on its LD score. An alternative approach would be to use likelihood methods to model each neutral site’s frequency changes using the set of pairwise LD and recombination distances between the neutral site and all neighboring polymorphic sites. Then, genome- or region-wide estimates could be found via composite likelihood methods, in a similar manner to McVicker et al. (2009) and Elyashiv et al. (2016). Furthermore, one could include different $V_{A}$ parameters for neighboring polymorphic sites with specific functional annotations (such as those in genic regions, introns, and exons) to see how different classes of sites contribute to the additive genetic variance for fitness. Our hope is that statistical methods to quantify the effects of linked selection over short timescales will improve and be combined with measures of phenotypic change, leading to a more synthetic view of how selection on ecological timescales occurs at the genetic and phenotypic levels.
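The LD-score idea can be sketched in Python with NumPy, assuming an individuals-by-SNPs genotype matrix of polymorphic sites and a fixed physical window. Following the usual LD score convention, each SNP’s own $r^{2} = 1$ is included. How the fitted slope maps onto $V_{A}$ depends on the model and is not derived here; the sketch only shows the regression, and all names are hypothetical.

```python
import numpy as np


def ld_scores(genotypes, positions, window_bp):
    """Sum of r^2 between each SNP and every SNP (itself included) within window_bp.

    genotypes : (n_individuals, n_snps) allele counts; every SNP must be polymorphic
    positions : (n_snps,) physical positions in base pairs
    """
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    positions = np.asarray(positions)
    in_window = np.abs(positions[:, None] - positions[None, :]) <= window_bp
    return (r2 * in_window).sum(axis=1)


def regress_cov_on_ld_score(temporal_cov, scores):
    """Ordinary least-squares slope and intercept of per-SNP temporal covariance on LD score."""
    slope, intercept = np.polyfit(scores, temporal_cov, deg=1)
    return slope, intercept


# SNPs 0 and 1 are identical and 100 bp apart; SNP 2 lies far outside the window
genotypes = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1]])
scores = ld_scores(genotypes, positions=[0, 100, 10_000_000], window_bp=1_000)
```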

As the number of empirical temporal genomic studies continues to increase, it is worth noting a few ways in which our study of temporal autocovariance suggests optimizing experimental design to increase the power to differentiate the effects of selection from drift. First, one should ideally sample frequencies from consecutive generations for at least some timepoints during the experiment. This is because the variance in allele frequency change between adjacent generations, $Var (Δ p_{t})$, is affected by selection only through the heritable variance in offspring number, $Var (Δ_{H} p_{t})$ (see Equation 2), and not through the accumulation of temporal autocovariance terms (e.g., Equation 3); this allows for more accurate estimates of G. Where a long study duration is needed but sequencing is limited to only a subset of generations, a mixed-duration sampling design, such as sampling generations 1 through 4, then 10 through 14, and so forth, could serve as a compromise. Second, as described in Accounting for Allele Frequency Sampling Noise in the Appendix, the shared sampling noise between adjacent timepoints creates a negative bias in autocovariance that must be corrected for. As described in Equation 98 and Equation 99, we can estimate this bias from data, but doing so introduces additional uncertainty into our parameter estimates. Where the experimenter suspects a priori that fluctuating selection is occurring, e.g., between two seasons, we recommend at least two temporal samples per season. Comparing nonadjacent timepoints that differ in season then allows one to distinguish negative covariance caused by fluctuating selection from negative covariance left over when the bias correction procedure underestimates the bias. Finally, one can directly remove the effects of the technical sampling noise created by variation in sequencing by dividing temporal samples and barcoding them into two groups (e.g., A and B).
Then, the sample covariance estimate $Cov (Δ p_{t, A}, Δ p_{s, B})$ does not share the technical sampling noise, reducing the bias (but note that some bias remains due to the process by which individuals are sampled from the population).
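A toy simulation illustrates why splitting libraries reduces but does not remove this bias. It assumes Python with NumPy, no true change in allele frequency, binomial sampling of chromosomes (shared by both libraries) followed by independent binomial sequencing of libraries A and B; the parameter values and names are hypothetical. With a single library, the covariance between consecutive changes picks up minus the variance of both noise sources at the shared timepoint; across libraries, only the chromosome-sampling term remains.

```python
import numpy as np


def centered_cov(x, y):
    """Covariance across loci (population form)."""
    return np.mean((x - x.mean()) * (y - y.mean()))


def split_library_demo(n_loci=200_000, n_chrom=100, depth=20, seed=1):
    rng = np.random.default_rng(seed)
    p_true = np.full(n_loci, 0.5)  # no real change, so any covariance is sampling noise
    # Stage 1: chromosomes sampled at generations 0, 1, 2; shared by both libraries
    x = rng.binomial(n_chrom, p_true, size=(3, n_loci)) / n_chrom
    # Stage 2: two independently sequenced (barcoded) libraries, A and B
    lib_a = rng.binomial(depth, x) / depth
    lib_b = rng.binomial(depth, x) / depth
    d_a, d_b = np.diff(lib_a, axis=0), np.diff(lib_b, axis=0)
    same_library = centered_cov(d_a[0], d_a[1])   # first vs second change, both from A
    split_library = centered_cov(d_a[0], d_b[1])  # A vs B: shares only the sampled chromosomes
    return same_library, split_library


same_library, split_library = split_library_demo()
# Expected roughly -0.015 (same library) versus -0.0025 (split libraries)
```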

Connecting temporal linked selection with single-timepoint studies

Our goal in this paper is to suggest that quantifying variance and autocovariance using temporal data sets can help us understand the impact that linked selection has across the genome on short timescales, supplementing our current view, which is informed mainly by single-timepoint studies. A range of approaches to estimate the parameters and impacts of models of linked selection from a single contemporary timepoint have been developed (Begun and Aquadro 1992; Wiehe and Stephan 1993; Hudson 1994; McVicker et al. 2009; Sella et al. 2009; Elyashiv et al. 2016). These estimates necessarily reflect linked selection over tens to hundreds of thousands of generations. One open question is whether these estimates of the proportion of allele frequency change due to linked selection should agree with those over shorter time periods. Some forms of linked selection may be fairly uniform over time, whereas rare, strong sweeps will have a large impact on long-term patterns of variation but may be hard to catch in temporal data. Conversely, as we discuss below, fluctuating selection may lead to stronger signals of linked selection on short timescales than seen in long-term snapshots.

Studies of contemporary data have revealed multiple lines of evidence for the effect of linked selection in a variety of taxa. If linked selection is pervasive across the genome, diversity could be severely dampened, as most sites would be in the vicinity of selected sites, reducing the genome-wide level of diversity without leaving strong local signals that stand out from the background. This is one proposed resolution of Lewontin’s paradox, the observation that diversity levels occupy a narrow range across taxa with population sizes that vary by orders of magnitude (Lewontin 1974; Smith and Haigh 1974; Gillespie 2001; Leffler et al. 2012). Elyashiv et al. (2016) estimated a 77–89% reduction in neutral diversity due to selection on linked sites in Drosophila melanogaster, and concluded that no genomic window was entirely free of the effect of selection. Similarly, Corbett-Detig et al. (2015) found evidence of a stronger relative reduction in polymorphism due to linked selection in taxa with larger population sizes. However, these reductions fall short of the many orders of magnitude required for linked selection to explain Lewontin’s paradox (Coop 2016).

One limitation of these approaches is that they require estimating $π_{0}$, the level of diversity in the absence of linked selection, usually from the diversity in high-recombination regions with low gene content. The average genome-wide reduction of diversity can then be judged relative to $π_{0}$. Ideally, $π_{0}$ would be a measure of the average diversity due entirely to drift and demographic history, i.e., unaffected by heritable fitness variation. However, there are two complications with this. First, as Robertson (2009) showed, even a site completely unlinked from the sites creating heritable fitness variation experiences a reduced effective population size due to the total additive genetic variance for fitness at these unlinked sites, and thus has lower diversity [see also Santiago and Caballero (1995)]. Second, if linked selection is sufficiently strong, the bases used to measure $π_{0}$ may not be sufficiently unlinked from fitness-determining sites for their diversity to plateau at the Robertson (2009) level, a known potential limitation (Coop 2016; Elyashiv et al. 2016). Overall, empirical studies relying on present-day samples from a single timepoint could be underestimating the effects that pervasive linked selection has on diversity. If linked selection can be observed over suitable timescales in temporal data, we might be able to disentangle some of these effects. For example, if high-recombination regions still show temporal autocovariance in allele frequency change, we would have evidence that even these regions are not free of the effect of linked selection, and we might be able to estimate its long-term impact on levels of diversity.

Temporally or spatially fluctuating selection has long been discussed as an explanation for abundant, rapid phenotypic adaptation over short timescales, even though over longer timescales both phenotypic change and molecular evolution between taxa are slow (Gingerich 1983; Hendry and Kinnison 1999; Messer et al. 2016). However, most of our approaches to population genomic data are built on simple models with constant selection pressures, as typically we have not had the data to move beyond these models (Messer et al. 2016). Currently, many approaches to quantify the impact of linked selection due to hitchhiking assume classic sweeps, where a consistent selection pressure ends in the fixation of a beneficial allele (Wiehe and Stephan 1993; Sella et al. 2009; Hernandez et al. 2011). However, fluctuating selection can have a larger effect on reducing diversity than classic sweeps (Barton 2000), depending on the timescales over which such fluctuations occur. In fact, as Barton (2000) points out, the total effect of classic Maynard Smith and Haigh-type sweeps on diversity is limited by the relatively slow rate of substitutions. We show that when the direction of selection on a trait abruptly reverses, this creates negative autocovariance between the allele frequency changes before and after the reversal. We can observe the shift by plotting autocovariances over time and noting when they become negative, indicating a negative additive genetic covariance between fitness at two timepoints. Here, we assume a simple form of fluctuating selection, where selection pressures on all of our sites flip at some timepoint. In reality, selection pressures will change on only some traits, and some of the genetic response will be constrained by pleiotropy, so only some proportion of the additive genetic variance will change.
Still, we expect some level of negative covariance after a reversal in the direction of selection, and comparing how the strength of temporal autocovariance varies with recombination and with the initial level of LD in the genome could provide an additional signal of fluctuating selection.

Connecting estimates of $V_{A}$ from temporal genomic data and quantitative genetic studies

The temporal covariance of allele frequencies potentially offers a way to estimate the additive genetic variance for fitness, as illustrated by our method-of-moments approach across genomic windows. The additive genetic variance for fitness can, like that of any other trait, be estimated through quantitative genetics methods, which exploit the phenotypic resemblance between relatives and their known kinship coefficients (Kruuk 2004; Shaw and Shaw 2013) [see Hendry et al. (2018) for a review], and these methods have been applied to estimate the additive genetic variance for fitness in natural populations (Mousseau and Roff 1987; Burt 1995). Ideally, one could reconcile quantitative genetic measures of fitness variance with estimates from allele frequency covariance. For example, Charlesworth (2015) undertook a similar analysis in D. melanogaster, comparing population genetic estimates of fitness variance to quantitative genetic estimates, highlighting a discordance potentially consistent with undetected large-effect alleles that are likely maintained by some form of balancing selection. By allowing us to directly measure fitness variation from population genetic data over very short timescales, temporal data could help untangle the causes of this discordance. A natural extension of this would be to see which regions contain the greatest inferred levels of additive genetic variance for fitness and to test for functional covariates such as the number of coding bases. Whereas previous temporal studies have focused on finding loci under selection, inferring the level of additive genetic variance could provide a more complete view of how much selection operates over short timescales.

Conclusions

With temporal data, we can directly partition the total variance in allele frequency change across generations, $Var (p_{t} - p_{0})$, into components according to the processes governing their dynamics: drift and linked selection. Since the trajectory of a neutrally drifting polymorphism does not autocovary, temporal autocovariance across neutral sites in a closed population is consistent with linked selection perturbing these sites’ trajectories. If we define drift as the process by which nonheritable variation in reproductive success and Mendelian segregation change allele frequencies, then drift can be estimated from temporal data and separated from the effects of linked selection. This recasts the long-running debate about the relative roles of neutral drift and linked selection in allele frequency dynamics as a problem that can potentially be quantified directly, by using temporal data to estimate the contribution of each process.

Acknowledgments

We thank Nick Barton, Doc Edge, Matt Osmond, Enrique Santiago, Michael Turelli, and two anonymous reviewers for feedback on previous versions of the manuscript, and Aneil Agrawal, Dave Begun, Sarah Friedman, Bill Hill, John Kelly, Tyler Kent, Chuck Langley, Sally Otto, Jonathan Pritchard, Kevin Thornton, Anita To, and members of the Coop laboratory for helpful conversations. This research was supported by a National Science Foundation (NSF) Graduate Research Fellowship grant awarded to V.B. (1650042), and National Institutes of Health (R01 GM-108779) and NSF (1353380) grants awarded to G.C.

Appendix

Decomposition of Allele Frequency Change

This decomposition of neutral allele frequency change between two consecutive generations is based on that of Santiago and Caballero (1995). We imagine a closed Wright–Fisher population of N diploids, where each diploid i contributes $k_{i} \sim \mathrm{Multinom} (f_{i} / N, 2 N)$ gametes to the next generation. We assume that the population size is constant, such that one diploid begets one diploid and thus $E_{i} (f_{i}) = 1$. The neutral allele frequency in the next generation can be thought of as each of the N parents passing their average genotype $x_{i} / 2$ (where $x_{i} \in \{0, 1, 2\}$ is the number of tracked neutral alleles individual i carries) to their $k_{i}$ gametes, plus a random Mendelian deviation $b_{i j} \in \{0, - 1 / 2, 1 / 2\}$ to each offspring j. Then, the frequency in the next generation can be written as

p_{1} = \frac{1}{N} \sum_{i = 1}^{N} (k_{i} \frac{x_{i}}{2} + \sum_{j = 1}^{k_{i}} b_{i j})

(28)

where $b_{i j} = δ_{x_{i}, 1} (1 / 2 - B_{j}), B_{j} \sim Bernoulli (1 / 2)$ , and $δ_{x_{i}, 1}$ is an indicator function that is one when the individual i is a heterozygote (i.e., $x_{i} = 1$ ), and zero otherwise.

If we further decompose the number of offspring of individual i into the genetic and nongenetic contributions, $k_{i} = f_{i} + d_{i}$ , then

\begin{array}{l} p_{1} = \frac{1}{N} \sum_{i = 1}^{N} ((f_{i} + d_{i}) \frac{x_{i}}{2} + \sum_{j = 1}^{k_{i}} b_{i j}) \\ = \frac{1}{2 N} \sum_{i = 1}^{N} f_{i} x_{i} + \frac{1}{2 N} \sum_{i = 1}^{N} d_{i} x_{i} + \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{k_{i}} b_{i j} \end{array}

(29)

and the change of the neutral allele’s frequency is the difference $Δ p = p_{1} - p_{0}$, where $p_{0} = \frac{1}{2 N} \sum_{i = 1}^{N} x_{i}$. Then,

\begin{array}{l} Δ p = p_{1} - p_{0} = \frac{1}{2 N} \sum_{i = 1}^{N} f_{i} x_{i} + \frac{1}{2 N} \sum_{i = 1}^{N} d_{i} x_{i} + \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{k_{i}} b_{i j} - \frac{1}{2 N} \sum_{i = 1}^{N} x_{i} \\ = \underbrace{\frac{1}{2 N} \sum_{i = 1}^{N} x_{i} (f_{i} - 1)}_{Δ_{H} p_{1}} + \underbrace{\frac{1}{2 N} \sum_{i = 1}^{N} d_{i} x_{i}}_{Δ_{N} p_{1}} + \underbrace{\frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{k_{i}} b_{i j}}_{Δ_{M} p_{1}} . \end{array}

(30)

These d’s broadly capture nonheritable variation in an individual’s offspring number, with $E [d_{i}] = 0$. In a quantitative genetics framework, ${Var}_{i} (d_{i})$ can include nongenetic variation in the lifetime reproductive success of individuals $(V_{E})$, while in population genetic models ${Var}_{i} (d_{i})$ can accommodate the sampling of parents to form the next generation, e.g., multinomial sampling of individuals from fitnesses (Santiago and Caballero 1995). From now on, we assume that variation in these d’s is nonheritable. We note that in nonpanmictic populations, chance covariances could be created between a neutral polymorphism and the environmental component of the phenotype, especially as a population expands its range into new environments that affect the phenotype and variants “surf” to higher frequencies (Edmonds et al. 2004; Excoffier and Ray 2008; Hallatschek and Nelson 2008).
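Because Equation 30 is an accounting identity, it can be checked numerically. The Python sketch below (using NumPy) simulates one generation and confirms that the three components sum to $Δ p$. It is a minimal illustration under our own assumptions rather than the simulation code behind this work: the function name `decompose_delta_p` is hypothetical, fitness is normalized to mean one, and the gamete counts are drawn as $k \sim \mathrm{Multinom}(N, f_{i} / N)$ so that $E(k_{i}) = f_{i}$, which matches $k_{i} = f_{i} + d_{i}$ with $E(d_{i}) = 0$.

```python
import numpy as np


def decompose_delta_p(x, f, rng):
    """One generation of neutral allele frequency change, split as in Equation 30.

    x   : neutral allele counts (0, 1, 2) carried by the N diploid parents
    f   : heritable fitness of each parent, assumed normalized so mean(f) == 1
    rng : a numpy.random.Generator
    Returns (delta_p, delta_H, delta_N, delta_M).
    """
    N = len(x)
    # Assumption (ours): gamete counts drawn so that E(k_i) = f_i.
    k = rng.multinomial(N, f / f.sum())
    d = k - f                                   # nonheritable deviation, k_i = f_i + d_i
    # Mendelian deviations: only heterozygotes (x_i = 1) contribute. Summed over the
    # k_i gametes, sum_j (1/2 - B_j) = k_i / 2 - M(k_i) with M(k_i) ~ Binom(k_i, 1/2).
    M = rng.binomial(k, 0.5)
    b_sum = np.where(x == 1, k / 2 - M, 0.0)

    p0 = x.sum() / (2 * N)
    p1 = (k * x / 2 + b_sum).sum() / N          # Equation 28
    delta_H = (x * (f - 1)).sum() / (2 * N)     # heritable fitness component
    delta_N = (d * x).sum() / (2 * N)           # nonheritable component
    delta_M = b_sum.sum() / N                   # Mendelian segregation component
    return p1 - p0, delta_H, delta_N, delta_M


rng = np.random.default_rng(1)
N = 500
x = rng.binomial(2, 0.3, size=N)
f = np.exp(rng.normal(0.0, 0.1, size=N))
f /= f.mean()
dp, dH, dN, dM = decompose_delta_p(x, f, rng)
print(dp, dH + dN + dM)   # equal up to floating-point rounding
```

The identity holds for any draw, so the check does not depend on the particular offspring distribution assumed here.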

Note that by construction, the allele frequency change components $Δ_{N} p_{t}$, $Δ_{M} p_{t}$, and $Δ_{H} p_{t}$ are orthogonal within each individual, given the neutral allele count x. Partitioning the allele frequency change for an individual i into its components,

Δ p_{t, i} = \underbrace{\frac{1}{2} x_{i} (f_{i} - 1)}_{Δ_{H} p_{t, i}} + \underbrace{\frac{1}{2} x_{i} d_{i}}_{Δ_{N} p_{t, i}} + \underbrace{δ_{x_{i}, 1} (k_{i} / 2 - M (k_{i}))}_{Δ_{M} p_{t, i}}

(31)

= \frac{1}{2} x_{i} (f_{i} - 1) + \frac{1}{2} x_{i} d_{i} + δ_{x_{i}, 1} (f_{i} / 2 - M (f_{i})) + δ_{x_{i}, 1} (d_{i} / 2 - M (d_{i}))

(32)

where $M (n) = \sum_{j = 1}^{n} B_{j} \sim Binom (n, 1 / 2)$ .

Two random variables $X, Y$ are uncorrelated if $Cov (X, Y) = E (X Y) - E (X) E (Y) = 0$, and are orthogonal if $E (X Y) = 0$; uncorrelated variables are therefore also orthogonal whenever either has an expected value of zero. We briefly show that, taking expectations over conceptual evolutionary replicates, the terms are orthogonal. First, the terms $x (f - 1)$ and $x d$ are orthogonal (dropping i subscripts),

\begin{array}{l} \frac{1}{4} Cov (x (f - 1), x d) = \frac{1}{4} (E (x^{2} d (f - 1)) - E (x (f - 1)) E (x d)) \\ = \frac{1}{4} (E (x^{2}) E (d) E (f - 1) - E (x) E (f - 1) E (x d)) \\ = 0 \end{array}

(33)

since $E (f) = 1$ across evolutionary replicates due to the assumption that population size is constant, and $x ⊥ f$ , as across all evolutionary replicates there is no dependence between a particular neutral allele an individual carries and their fitness (though in particular replicates, such associations occur). Similarly, for the case $x = 1$ (other cases are all zero and can be ignored), it can be shown using the law of total expectation that $Cov (x d, d / 2 - M (d)) = 0$ [and likewise with $x (f - 1)$ and $f / 2 - M (f)$ ]. Note that across individuals within a population, there are weak covariances in their number of offspring as the total number of offspring must sum to N; under a multinomial offspring distribution, these are of order $1 / N$ .
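The orthogonality in Equation 33 is a statement about expectations over replicates, which a short Monte Carlo sketch can illustrate. We assume, for illustration only, that an individual’s neutral allele count $x \sim \mathrm{Binomial}(2, p)$, heritable fitness f (gamma distributed with mean one), and nonheritable deviation d (normal with mean zero) are drawn independently in each replicate; the distributions and parameter values are our choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000            # conceptual replicates, one individual drawn per replicate
p = 0.3                # neutral allele frequency (illustrative)

x = rng.binomial(2, p, size=n)                      # neutral allele count
f = rng.gamma(shape=20.0, scale=1 / 20.0, size=n)   # heritable fitness, E(f) = 1 (assumed)
d = rng.normal(0.0, 1.0, size=n)                    # nonheritable deviation, E(d) = 0 (assumed)

heritable = 0.5 * x * (f - 1)
nonheritable = 0.5 * x * d
# Sample covariance across replicates; Equation 33 says its expectation is zero.
cov_hn = np.mean(heritable * nonheritable) - heritable.mean() * nonheritable.mean()
print(f"Cov = {cov_hn:.5f}")
```

Within any single replicate the two terms can still covary by chance; only their expectation across replicates vanishes.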

Temporal Variance and Autocovariance Under Multilocus Selection

We assume the phenotype of an individual i has an additive polygenic basis, such that their breeding value is $z_{i} = \sum_{l = 1}^{L} α_{t, l} g_{i, l}$, which deviates around a mean of zero, where $α_{t, l}$ is the additive effect size at locus l in generation t and $g_{i, l} \in \{0, 1, 2\}$ is individual i’s allele count at this locus (note that the effect of nonheritable environmental noise affecting the trait is accounted for in the $d_{i}$ terms above). We impose directional selection on this trait using an exponential fitness function, such that individual i’s fitness is $f_{i} = w (z_{i}) / \bar{w} \approx e^{z_{i}}$ (assuming $\bar{w} \approx 1$). If we assume individuals’ phenotypic values do not deviate too far from their mean value of zero, we can approximate $f_{i}$ as $f_{i} \approx 1 + \sum_{l = 1}^{L} α_{t, l} g_{i, l}$. Then, we can write the change in neutral allele frequency due only to heritable variation in fitness $(Δ_{H} p_{t})$ as a covariance between fitness and the neutral allele frequency across individuals in generation t,

Δ_{H} p_{t} = \frac{1}{2 N} \sum_{i = 1}^{N} x_{i} (f_{i} - 1)

(34)

= \frac{1}{2} {Cov}_{i} (x_{i}, f_{i})

(35)

= \frac{1}{2} {Cov}_{i} (x_{i}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}),

(36)

which is the Robertson–Price covariance (Robertson 1966; Price 1970; Lynch et al. 1998; Walsh and Lynch 2018).
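Equation 35 uses the population covariance (dividing by N) and relies on mean fitness being one. A small Python check, with hypothetical helper names and arbitrary simulated values, confirms that Equations 34 and 35 agree once f is normalized to mean one; `np.cov(..., bias=True)` returns the divide-by-N covariance.

```python
import numpy as np


def delta_h_sum(x, f):
    """Equation 34: (1 / 2N) * sum_i x_i (f_i - 1)."""
    return np.sum(x * (f - 1)) / (2 * len(x))


def delta_h_cov(x, f):
    """Equation 35: half the population covariance (bias=True divides by N)."""
    return 0.5 * np.cov(x, f, bias=True)[0, 1]


rng = np.random.default_rng(3)
x = rng.binomial(2, 0.4, size=1000).astype(float)   # neutral allele counts (simulated)
f = 1 + 0.05 * rng.standard_normal(1000)            # fitnesses (simulated, illustrative)
f /= f.mean()                                        # mean fitness of one, as in the text
print(delta_h_sum(x, f), delta_h_cov(x, f))
```

If mean fitness were not exactly one, the two expressions would differ by $\bar{x} (\bar{f} - 1) / 2$.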

Now, we break up the genotypic value $x_{i}$ into the contributions of each of the two gametes that formed individual i, $x_{i} = x_{i}^{'} + x_{i}^{″}$, and likewise with the trait locus $g_{i, l} = g_{i, l}^{'} + g_{i, l}^{″}$, where $x_{i}^{'}, x_{i}^{″}, g_{i, l}^{'},$ and $g_{i, l}^{″}$ are all indicator variables. Expanding out the covariances, we have

\begin{array}{l} Δ_{H} p_{t} = \frac{1}{2} Cov (x_{i}^{'} + x_{i}^{″}, \sum_{l = 1}^{L} α_{t, l} (g_{i, l}^{'} + g_{i, l}^{″})) \\ = \frac{1}{2} (Cov (x_{i}^{'}, \sum_{l = 1}^{L} α_{t, l} (g_{i, l}^{'} + g_{i, l}^{″})) + Cov (x_{i}^{″}, \sum_{l = 1}^{L} α_{t, l} (g_{i, l}^{'} + g_{i, l}^{″}))) \\ = \frac{1}{2} (Cov (x_{i}^{'}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{'}) + Cov (x_{i}^{'}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{″}) + Cov (x_{i}^{″}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{'}) + Cov (x_{i}^{″}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{″})) . \end{array}

(37)

Each of these covariances is between the neutral allele and a selected allele, either on the same gamete (either maternal or paternal) or across gametes. These covariances can be written as LD terms,

\begin{array}{l} Δ_{H} p_{t} = \frac{1}{2} (Cov (x_{i}^{'}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{'}) + Cov (x_{i}^{'}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{″}) + Cov (x_{i}^{″}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{'}) + Cov (x_{i}^{″}, \sum_{l = 1}^{L} α_{t, l} g_{i, l}^{″})) \\ = \frac{1}{2} (\sum_{l = 1}^{L} α_{t, l} Cov (x_{i}^{'}, g_{i, l}^{'}) + \sum_{l = 1}^{L} α_{t, l} Cov (x_{i}^{'}, g_{i, l}^{″}) + \sum_{l = 1}^{L} α_{t, l} Cov (x_{i}^{″}, g_{i, l}^{'}) + \sum_{l = 1}^{L} α_{t, l} Cov (x_{i}^{″}, g_{i, l}^{″})) \\ = \frac{1}{2} (\sum_{l = 1}^{L} α_{t, l} D_{l}^{'} + \sum_{l = 1}^{L} α_{t, l} D_{l}^{″} + \sum_{l = 1}^{L} α_{t, l} D_{l}^{″} + \sum_{l = 1}^{L} α_{t, l} D_{l}^{'}) \\ = \sum_{l = 1}^{L} α_{t, l} D_{l}^{'} + \sum_{l = 1}^{L} α_{t, l} D_{l}^{″} \end{array}

(38)

where $D_{l}^{'}$ is the LD between alleles on the same gamete (the gametic LD), and $D_{l}^{″}$ is the across-gamete LD (nongametic LD) [see p. 121 in Weir (1996)]. This equation also appears in Kirkpatrick et al. (2002) (equation 10).

We ignore the nongametic LD $D_{l}^{″}$, as it is weak under random mating, and write the multilocus temporal covariance between the allele frequency changes $Δ p_{t}$ and $Δ p_{s}$ as

\begin{array}{l} Cov (Δ p_{t}, Δ p_{s}) = E (Δ p_{t} Δ p_{s}) - E (Δ p_{t}) E (Δ p_{s}) \\ = E (\sum_{k = 1}^{L} α_{t, k} D_{t, k}^{'} \sum_{l = 1}^{L} α_{s, l} D_{s, l}^{'}) \\ = \underbrace{\sum_{l = 1}^{L} α_{t, l} α_{s, l} E (D_{t, l}^{'} D_{s, l}^{'})}_{\text{persistence of association to selected site } l} + \underbrace{\sum_{l \neq k} α_{t, k} α_{s, l} E (D_{t, k}^{'} D_{s, l}^{'})}_{\text{cross-associations between two selected sites } k \text{ and } l} \end{array}

(39)

since $E (Δ p_{t}) = 0$ .

Modeling the dynamics of LD between selected and neutral sites

Here, we outline a model of the changes in LD between the focal neutral site and selected sites, which allows us to derive an expression for the first term in Equation 39. Typically, models of multilocus selection track the genetic changes in a population by transforming between a representation of haplotype frequencies and a representation of allele frequencies, LD, and higher-order LD (Barton and Turelli 1987, 1991; Turelli 1988; Turelli and Barton 1990). For the moment, we avoid the difficulty of a full multilocus treatment by assuming that the linkage between selected sites is loose enough that one selected site’s frequency change is independent of the change at other selected sites, i.e., there is no selective interference; this was an assumption in past treatments [Santiago and Caballero (1995, 1998); see Barton (2000) for discussion of this]. Specifically, we model the dynamics of the LD between the neutral site and each selected site as they would behave under a single-sweep model.

We adapt Barton’s (2000) model for LD dynamics during a sweep. Consider a polymorphic neutral locus with alleles $B_{1}$ and $B_{2}$ at frequencies p and $1 - p$. We partition the allele frequency of $B_{1}$ by conditioning on which allele at the selected site (either $A_{1}$ or $A_{2}$) is carried on the same background, i.e., $P (B_{1} | A_{1})$ and $P (B_{1} | A_{2})$, such that $P (B_{1}) = P (B_{1} | A_{1}) P (A_{1}) + P (B_{1} | A_{2}) P (A_{2})$. Then, the LD between neutral and selected sites can be expressed as $D = P (A_{1}) P (A_{2}) (P (B_{1} | A_{1}) - P (B_{1} | A_{2}))$. To simplify notation, we denote the frequency of the neutral allele on the different fitness backgrounds at time t by $P (B_{1} | A_{1}) = p_{t}^{(1)}$ and $P (B_{1} | A_{2}) = p_{t}^{(2)}$, and the frequency of the selected allele at locus l by $p_{t, l}$; the LD between the focal neutral site and selected site l at time t is then

D_{t, l} = p_{t, l} (1 - p_{t, l}) (p_{t}^{(1)} - p_{t}^{(2)}) .

(40)

Selection changes the frequency $p_{t, l}$ through time, and recombination acts to dissociate the neutral allele from its backgrounds. The expected difference $p_{t}^{(1)} - p_{t}^{(2)}$ is maintained with probability $(1 - r_{l})$ each generation (Barton 2000). If the initial generation is t and the future generation is s $(s > t)$, and $r_{l}$ is the recombination rate between the focal neutral locus and selected site l, this leads to

(p_{s}^{(1)} - p_{s}^{(2)}) = (p_{t}^{(1)} - p_{t}^{(2)}) {(1 - r_{l})}^{s - t} .

(41)

Then, we can use this to express $D_{s, l}$ in terms of $D_{t, l}$,

\begin{array}{l} \frac{p_{s, l} (1 - p_{s, l})}{p_{s, l} (1 - p_{s, l})} (p_{s}^{(1)} - p_{s}^{(2)}) = \frac{p_{t, l} (1 - p_{t, l})}{p_{t, l} (1 - p_{t, l})} (p_{t}^{(1)} - p_{t}^{(2)}) {(1 - r_{l})}^{s - t} \\ \frac{D_{s, l}}{p_{s, l} (1 - p_{s, l})} = \frac{D_{t, l}}{p_{t, l} (1 - p_{t, l})} {(1 - r_{l})}^{s - t} \\ D_{s, l} = D_{t, l} \frac{p_{s, l} (1 - p_{s, l})}{p_{t, l} (1 - p_{t, l})} {(1 - r_{l})}^{s - t} \end{array}

(42)

[compare with equations 30 and 31 in Stephan et al. (2006)]. Now, we can find the expected product $E (D_{t, l} D_{s, l})$ by multiplying both sides by $D_{t, l}$ and taking expectations. We treat the selected allele’s frequency trajectory as deterministic, so the ratio of heterozygosities can be taken outside the expectation, giving us,

\begin{array}{l} D_{t, l} D_{s, l} = D_{t, l}^{2} \frac{p_{s, l} (1 - p_{s, l})}{p_{t, l} (1 - p_{t, l})} {(1 - r_{l})}^{s - t} \\ E (D_{t, l} D_{s, l}) = E (D_{t, l}^{2}) \frac{p_{s, l} (1 - p_{s, l})}{p_{t, l} (1 - p_{t, l})} {(1 - r_{l})}^{s - t} . \end{array}

(43)

Then, we simplify this by replacing $E (D_{t, l}^{2})$ with $E (ℛ_{t, l}^{2}) p_{t, l} (1 - p_{t, l}) p_{t} (1 - p_{t})$ [where $ℛ_{t, l}^{2}$ is the squared correlation between the neutral site and selected site l at time t (Hill and Robertson 1968)],

\begin{array}{l} E (D_{t, l} D_{s, l}) = E (D_{t, l}^{2}) \frac{p_{s, l} (1 - p_{s, l})}{p_{t, l} (1 - p_{t, l})} {(1 - r_{l})}^{s - t} \\ = E (ℛ_{t, l}^{2}) p_{t} (1 - p_{t}) p_{t, l} (1 - p_{t, l}) \frac{p_{s, l} (1 - p_{s, l})}{p_{t, l} (1 - p_{t, l})} {(1 - r_{l})}^{s - t} \\ = E (ℛ_{t, l}^{2}) p_{t} (1 - p_{t}) p_{s, l} (1 - p_{s, l}) {(1 - r_{l})}^{s - t} . \end{array}

(44)

Returning to Equation 39 and replacing the $E (D_{t, l} D_{s, l})$ terms with our expression above,

Cov (Δ p_{t}, Δ p_{s}) = \sum_{l = 1}^{L} α_{t, l} α_{s, l} E (ℛ_{t, l}^{2}) p_{t} (1 - p_{t}) p_{s, l} (1 - p_{s, l}) {(1 - r_{l})}^{s - t}

(45)

again ignoring the cross-associations between selected sites. Dividing our temporal covariance by the neutral site’s $p_{t} (1 - p_{t})$ , we can write the multilocus temporal covariance in a standardized form (analogous to Wright’s F),

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \sum_{l = 1}^{L} α_{t, l} α_{s, l} p_{s, l} (1 - p_{s, l}) E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t} .

(46)
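The single-locus LD dynamics of Equations 41 and 42 can be traced numerically along a deterministic selected-allele trajectory. The Python sketch below assumes, purely for illustration, a haploid selection recursion $p' = p (1 + σ) / (1 + σ p)$ with a selection coefficient we call `sel` (the text does not specify a trajectory); the function names are hypothetical. It checks that the conditional difference $p^{(1)} - p^{(2)} = D / [p (1 - p)]$ decays by a factor $(1 - r)$ per generation.

```python
import numpy as np


def logistic_trajectory(p_init, sel, generations):
    """Deterministic haploid selection, p' = p (1 + sel) / (1 + sel p). Assumed for illustration."""
    p = np.empty(generations + 1)
    p[0] = p_init
    for g in range(generations):
        p[g + 1] = p[g] * (1 + sel) / (1 + sel * p[g])
    return p


def propagate_ld(D_t, p_sel, r):
    """Equation 42 with generation t at index 0:
    D_s = D_t * [p_s (1 - p_s) / p_t (1 - p_t)] * (1 - r)^(s - t)."""
    het = p_sel * (1 - p_sel)
    steps = np.arange(len(p_sel))
    return D_t * het / het[0] * (1 - r) ** steps


p_sel = logistic_trajectory(0.1, 0.05, 50)
D = propagate_ld(0.01, p_sel, r=0.01)
# Conditional difference p^(1) - p^(2) = D / [p(1 - p)] should decay geometrically (Equation 41)
diff = D / (p_sel * (1 - p_sel))
print(diff[:3])
```

The heterozygosity ratio is what lets a sweep transiently inflate or deflate $D$ even as the underlying association decays steadily.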

Using average additive genetic variation

We can approximate Equation 46 by noting that, if effect sizes remain constant through time, each term $α_{t, l} α_{s, l} p_{s, l} (1 - p_{s, l})$ is half the additive genic variance at locus l. We make that assumption here, writing $α_{l} := α_{t, l} = α_{s, l}$, leading to

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \sum_{l = 1}^{L} α_{l}^{2} p_{s, l} (1 - p_{s, l}) E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t} .

(47)

We can further simplify this by assuming that there is no covariance between the additive genic variation at a selected site and the LD between that selected site and the neutral site. We write the additive genic variation at site l at time s as $v_{a, l} (s) = 2 α_{l}^{2} p_{s, l} (1 - p_{s, l})$, and the average additive genic variation across loci as $\bar{v}_{a} (s) := V_{a} (s) / L = \frac{1}{L} \sum_{l} 2 α_{l}^{2} p_{s, l} (1 - p_{s, l})$. Then, each locus’s additive genic variation can be expressed as $v_{a, l} (s) = \bar{v}_{a} (s) + ε_{l}$. Substituting this, the autocovariance is

\begin{array}{l} \frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \frac{1}{2} \sum_{l = 1}^{L} v_{a, l} (s) E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t} \\ = \frac{1}{2} \sum_{l = 1}^{L} (\bar{v}_{a} (s) + ε_{l}) E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t} \\ = \frac{1}{2} \underbrace{\bar{v}_{a} (s)}_{\text{average genic variation per locus}} \times \underbrace{\left(\sum_{l = 1}^{L} E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t}\right)}_{\text{sum of persisting associations}} + \frac{1}{2} \underbrace{\left(\sum_{l = 1}^{L} ε_{l} E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t}\right)}_{\text{effect–association covariation}} . \end{array}

(48)

We assume that this last term is zero; it is nonzero in expectation only if there is covariance between the additive genic variation at a selected site and the expected LD between the selected and neutral sites. Writing the total genic variation as $V_{a} (s) = L \bar{v}_{a} (s)$,

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \frac{V_{a} (s)}{2 L} \sum_{l = 1}^{L} E (ℛ_{t, l}^{2}) {(1 - r_{l})}^{s - t} .

(49)
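The step from Equation 47 to Equation 49 replaces each locus’s genic variance with the genome-wide average, which is accurate when effect sizes are uncorrelated with LD to the neutral site. A small numerical sketch compares the two expressions; the effect sizes, frequencies, recombination fractions, and stand-in $E (ℛ^{2})$ values are all drawn at random and independently (our assumption), and the function names are hypothetical.

```python
import numpy as np


def standardized_cov_exact(alpha, p_s, r2, r, lag):
    """Equation 47: sum_l alpha_l^2 p_{s,l} (1 - p_{s,l}) E(R^2_{t,l}) (1 - r_l)^(s - t)."""
    return np.sum(alpha**2 * p_s * (1 - p_s) * r2 * (1 - r) ** lag)


def standardized_cov_average(alpha, p_s, r2, r, lag):
    """Equation 49: V_a(s) / (2L) * sum_l E(R^2_{t,l}) (1 - r_l)^(s - t)."""
    V_a = np.sum(2 * alpha**2 * p_s * (1 - p_s))    # total additive genic variance
    return V_a / (2 * len(alpha)) * np.sum(r2 * (1 - r) ** lag)


rng = np.random.default_rng(4)
L = 5000
alpha = rng.normal(0.0, 0.01, size=L)   # effect sizes, independent of LD (assumed)
p_s = rng.uniform(0.05, 0.95, size=L)   # selected allele frequencies at time s
r = rng.uniform(0.0, 0.5, size=L)       # recombination fractions to the neutral site
r2 = rng.uniform(0.0, 0.2, size=L)      # stand-in values for E(R^2_{t,l})
exact = standardized_cov_exact(alpha, p_s, r2, r, lag=2)
approx = standardized_cov_average(alpha, p_s, r2, r, lag=2)
print(exact, approx)
```

Making `alpha` depend on `r2` (for example, larger effects at loci in stronger LD) would make the two expressions diverge, which is the effect–association term dropped above.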

Continuous approximation to chromosomes

So far, we have treated the positions of the selected sites as fixed, conditional on knowing that selected site l lies at recombination fraction $r_{l}$ from the focal neutral site. Here, we make a few further assumptions. First, we assume the L selected loci are independently and identically distributed, uniformly along a continuous region R morgans in length $(g \sim U (- R / 2, R / 2))$, and we calculate the covariance at a focal neutral site at the origin by taking expectations over the random positions of these sites. Second, we assume that the LD between the neutral site and selected site l depends only on the recombination fraction between the sites. Since g is a genetic distance, the recombination fraction is given by a mapping function $r (g)$ that maps genetic distances to recombination fractions. Throughout the paper and in the simulations, we use Haldane’s mapping function, $r (g) = \frac{1}{2} (1 - e^{- 2 | g |})$ (the absolute value converts positions on $[- R / 2, R / 2]$ into distances from the focal neutral site). Because the LD between the two sites is then completely determined by the distance between the focal neutral site and a random selected site, we can rewrite $E (ℛ_{t, l}^{2})$ as the function $E (ℛ_{t}^{2} (r (g)))$. Now, letting $E_{r_{l}} (\cdot)$ represent the expectation taken over the random positions of selected sites on the genetic map,

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \frac{V_{a} (s)}{2 L} \sum_{l = 1}^{L} E_{r_{l}} (E (ℛ_{t}^{2} (r_{l})) {(1 - r_{l})}^{s - t})

(50a)

= \frac{V_{a} (s)}{2 L} \sum_{l = 1}^{L} \int_{- R / 2}^{R / 2} E (ℛ_{t}^{2} (r (g))) {(1 - r (g))}^{s - t} \frac{1}{R} d g

(50b)

= \frac{V_{a} (s)}{2 R} \int_{- R / 2}^{R / 2} E (ℛ_{t}^{2} (r (g))) {(1 - r (g))}^{s - t} d g

(50c)

since the integral is identical for each of the L loci, so summing over l cancels the $1 / L$.

Because the trait evolves neutrally before directional selection starts, when t is the first generation of selection we use the expected neutral LD, $E (ℛ^{2}) = (10 + ρ) / (22 + 13 ρ + ρ^{2})$, where $ρ = 4 N r (g)$ (Hill and Robertson 1968; Ohta and Kimura 1969).
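Equation 50c is a one-dimensional integral that is straightforward to evaluate numerically. The sketch below combines Haldane’s mapping function with the expected neutral LD above and applies the trapezoid rule on an evenly spaced grid; the grid size, the helper names, and the parameter values in the usage lines are our assumptions.

```python
import numpy as np


def haldane_r(g):
    """Haldane's mapping function: genetic distance (morgans) -> recombination fraction."""
    return 0.5 * (1 - np.exp(-2 * np.abs(g)))


def expected_r2_neutral(r, N):
    """Expected neutral LD, (10 + rho) / (22 + 13 rho + rho^2) with rho = 4 N r."""
    rho = 4 * N * r
    return (10 + rho) / (22 + 13 * rho + rho**2)


def standardized_cov_region(V_a, R, N, lag, n_grid=20001):
    """Equation 50c: V_a / (2R) * integral_{-R/2}^{R/2} E(R^2(r(g))) (1 - r(g))^lag dg.

    The integral is approximated with the trapezoid rule on an evenly spaced grid.
    """
    g = np.linspace(-R / 2, R / 2, n_grid)
    r = haldane_r(g)
    y = expected_r2_neutral(r, N) * (1 - r) ** lag
    integral = np.sum((y[1:] + y[:-1]) * np.diff(g)) / 2
    return V_a / (2 * R) * integral


# Illustrative values: a 0.5 M region, N = 1000, V_a(s) = 0.01
cov_lag1 = standardized_cov_region(V_a=0.01, R=0.5, N=1000, lag=1)
cov_lag5 = standardized_cov_region(V_a=0.01, R=0.5, N=1000, lag=5)
print(cov_lag1, cov_lag5)
```

The grid must be fine relative to roughly $1 / (4N)$ morgans, the scale over which $E (ℛ^{2})$ falls off near the focal site; an adaptive quadrature routine would be a reasonable alternative.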

The Contribution of the Rest of the Genome to Temporal Autocovariance at a Locus

Additionally, we can consider the impact that other unlinked selected sites have on the neutral allele’s frequency trajectory. In Equation (50c), we model the temporal autocovariance at a focal neutral site on a chromosome of total length R. Here, we assume the whole genome can be modeled in a similar fashion, as a single very large chromosome with genetic length C (the entire map length of the genome), with the focal neutral site at its center. We can then split the integral over genetic distance piecewise into linked sites (those less than a recombination fraction of one-half away) and unlinked sites (those at a recombination fraction of one-half). This piecewise integral gives us

\frac{Cov (Δ p_{t}, Δ p_{s})}{p_{t} (1 - p_{t})} = \frac{V_{a} (s)}{2 C} \int_{- C / 2}^{C / 2} E (ℛ_{t}^{2} (r (g))) {(1 - r (g))}^{s - t} d g

(51)

= \frac{V_{a} (s)}{2 C} [2 \int_{0}^{g^{*}} E (ℛ_{t}^{2} (r (g))) {(1 - r (g))}^{s - t} d g + \frac{E (ℛ_{t}^{2} (1 / 2))}{2^{s - t}} (C - 2 g^{*})]

(52)

where $g^{*}$ is the map distance at which a selected site becomes approximately unlinked from the neutral site, i.e., where the recombination fraction is approximately $1 / 2$. Moving away from the neutral site, the first term in the brackets accounts for the accumulation of linked selected sites. Beyond $g^{*}$, selected sites are effectively unlinked, and the second term accounts for the contribution of these unlinked sites, both on the same chromosome and on other chromosomes. Note that here we assume that the density of additive genetic variance per morgan is constant. As in the main text, this ignores the contribution of nongametic LD, which forms when two gametes unite in an individual and can be converted into gametic LD via recombination [see Figure A2; this process is elaborated in The Strength of Unlinked and Nongametic Associations in the Appendix, and see also p. 521 in Tenesa et al. (2007)]. Selected sites that are unlinked or loosely linked to the neutral site (e.g., $r \approx 1 / 2$) quickly become associated with the neutral site, but these associations also quickly decay; their contribution acts to further decrease the effective population size by a factor of two [see the discussion on p. 2115 in Santiago and Caballero (1998)].

As a simple thought experiment, we might ask for what value of M the contribution from unlinked sites dominates the contribution from linked sites. For $N = 1000$, and assuming that the level of LD is that under mutation–drift–recombination balance (e.g., using the equation of Tomoko Ohta), we plot the relative contributions of linked selected sites (on the focal chromosome) and unlinked selected sites (on other chromosomes) for various spans of the covariance (i.e., $| s - t |$) and sizes of the remaining portion of the genome in morgans (M) in Figure A1.
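The thought experiment above can be sketched by evaluating Equation 51 piecewise. Under assumptions of our own, the code below treats every site beyond an illustrative cutoff $g^{*} = 2$ morgans (where Haldane’s function gives $r \approx 0.49$) as exactly unlinked, so those sites occupy a total map length $C - 2 g^{*}$ with a constant integrand; the linked part is integrated numerically on both sides of the focal site. Function names and parameter values are hypothetical, and both parts omit the common prefactor $V_{a} (s) / 2C$.

```python
import numpy as np


def haldane_r(g):
    """Haldane's mapping function: morgans -> recombination fraction."""
    return 0.5 * (1 - np.exp(-2 * np.abs(g)))


def expected_r2_neutral(r, N):
    """Expected neutral LD with rho = 4 N r."""
    rho = 4 * N * r
    return (10 + rho) / (22 + 13 * rho + rho**2)


def linked_unlinked_parts(N, C, lag, g_star=2.0, n_grid=200001):
    """Split the integral in Equation 51 into linked and unlinked contributions.

    g_star is an illustrative cutoff (assumption): sites farther than g_star morgans
    from the focal site are treated as having r = 1/2 exactly.
    """
    if C < 2 * g_star:
        raise ValueError("C must be at least 2 * g_star")
    g = np.linspace(0.0, g_star, n_grid)
    r = haldane_r(g)
    y = expected_r2_neutral(r, N) * (1 - r) ** lag
    one_side = np.sum((y[1:] + y[:-1]) * np.diff(g)) / 2   # trapezoid rule
    linked = 2 * one_side                                    # both sides of the focal site
    # Unlinked sites: total map length C - 2 g_star, integrand E(R^2(1/2)) / 2^lag
    unlinked = expected_r2_neutral(0.5, N) * 0.5**lag * (C - 2 * g_star)
    return linked, unlinked


for C in (5.0, 20.0, 50.0):
    linked, unlinked = linked_unlinked_parts(N=1000, C=C, lag=1)
    print(f"C = {C:4.1f} M: linked {linked:.2e}, unlinked {unlinked:.2e}")
```

The linked part does not depend on C, while the unlinked part grows linearly in C and shrinks by half with each additional generation of lag, which is why the balance between them depends on both quantities.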

Averaging Covariance Across Multiple Loci

Thus far, our covariance assumes that a single neutral site is positioned in the center of a region R morgans long, with selected sites uniformly distributed along this region. However, our simulated regions contain many neutral sites, which we average over in calculating the temporal autocovariance. In this case, we average over the random distance between a neutral site’s position n and a selected site’s position g, which is $c = | n - g |$, where $n, g \sim U (0, R)$. This random variable c follows the triangular distribution, $f (c) = 2 (R - c) / R^{2}$; we replace the uniform probability density function (PDF) in Equation (50b) with this triangular PDF and average over the distance between sites,

\frac{E_{n} (Cov (Δ p_{t}, Δ p_{s}))}{E_{n} (p_{t} (1 - p_{t}))} = \frac{V_{a} (s)}{2} \int_{0}^{R} E (ℛ_{t}^{2} (r (c))) {(1 - r (c))}^{(s - t)} \frac{2 (R - c)}{R^{2}} d c

(53)

= \frac{V_{a} (s)}{2} A (R, t, s)

(54)

where $E_{n} (\cdot)$ indicates we take the expectation also over neutral sites, and we use $A (R, t, s)$ to denote the average LD between selected and neutral sites that persists from generations t to s $(t \leq s)$ . Note that in calculating the standardized covariance above, we use a ratio of expectations rather than the expectation of the ratio (Bhatia et al. 2013).
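Equations 53 and 54 can be evaluated in the same way, weighting by the triangular density instead of the uniform one. The sketch below (hypothetical helper names; illustrative values of R, N, and $V_{a}$) computes $A (R, t, s)$ and also checks by Monte Carlo that $| n - g |$ for two independent uniform positions has mean $R / 3$, as the triangular density implies.

```python
import numpy as np


def haldane_r(c):
    """Haldane's mapping function (morgans -> recombination fraction)."""
    return 0.5 * (1 - np.exp(-2 * np.abs(c)))


def expected_r2_neutral(r, N):
    """Expected neutral LD with rho = 4 N r."""
    rho = 4 * N * r
    return (10 + rho) / (22 + 13 * rho + rho**2)


def trapezoid(y, x):
    """Trapezoid rule, written out to avoid depending on a particular NumPy version."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2


def average_persisting_ld(R, N, lag, n_grid=20001):
    """A(R, t, s), Equation 53: average over the triangular density of c = |n - g|."""
    c = np.linspace(0.0, R, n_grid)
    r = haldane_r(c)
    density = 2 * (R - c) / R**2                    # triangular PDF on [0, R]
    return trapezoid(expected_r2_neutral(r, N) * (1 - r) ** lag * density, c)


R, N, V_a = 0.5, 1000, 0.01                         # illustrative values
A1 = average_persisting_ld(R, N, lag=1)
A5 = average_persisting_ld(R, N, lag=5)
print("standardized covariance, lag 1:", V_a / 2 * A1)   # Equation 54

# Monte Carlo check: |n - g| for n, g ~ U(0, R) has the triangular density (mean R / 3)
rng = np.random.default_rng(8)
c_draws = np.abs(rng.uniform(0, R, 1_000_000) - rng.uniform(0, R, 1_000_000))
print(c_draws.mean(), R / 3)
```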

Empirically Calculating the Average LD Persisting Across Generations

In the previous expressions for temporal autocovariance, we stepped through a conceptual model for the average level of LD between neutral and selected sites that persists across $| s - t |$ generations $(A (R, t, s))$, where the positions of selected and neutral sites are randomly distributed along a chromosomal region. In systems with a known recombination map and in studies where LD can be calculated, we have the recombination fraction $r_{i, j}$ and the pairwise LD $R_{i, j}^{2}$ between two loci i and j (where $R^{2}$ is the $M \times M$ matrix of pairwise LD calculated at time t). Since we do not know a priori whether a site is selected, we sum over all M polymorphic loci, characterizing the average LD in a region as

\bar{A} (t, s) = \frac{2}{M (M - 1)} \sum_{i = 1}^{M} \sum_{j > i} R_{i, j}^{2} {(1 - r_{i, j})}^{| t - s |} .

(55)

This sum is the empirical analog to the integral in Equation 10.
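Equation 55 is an average over all pairs $i < j$. A minimal NumPy sketch, assuming the LD matrix holds squared correlations and the recombination matrix holds pairwise recombination fractions, uses `numpy.triu_indices` to enumerate the pairs; the function name is hypothetical. The haplotypes and map positions in the usage lines are simulated placeholders rather than data, and converting positions to recombination fractions with Haldane’s function is our choice.

```python
import numpy as np


def empirical_average_ld(R2, rec, lag):
    """Equation 55: 2 / (M (M - 1)) * sum_{i < j} R2[i, j] * (1 - rec[i, j])**lag.

    R2  : M x M matrix of pairwise LD (squared correlations) measured at time t
    rec : M x M matrix of pairwise recombination fractions
    lag : |t - s| in generations
    """
    M = R2.shape[0]
    i, j = np.triu_indices(M, k=1)          # every pair with j > i, counted once
    terms = R2[i, j] * (1 - rec[i, j]) ** lag
    return 2.0 / (M * (M - 1)) * terms.sum()


# Placeholder input: 200 simulated haplotypes at 50 biallelic loci (not real data)
rng = np.random.default_rng(5)
haplotypes = rng.binomial(1, 0.5, size=(200, 50))
R2 = np.corrcoef(haplotypes, rowvar=False) ** 2
pos = np.sort(rng.uniform(0.0, 0.2, size=50))        # map positions in morgans (assumed)
rec = 0.5 * (1 - np.exp(-2 * np.abs(pos[:, None] - pos[None, :])))   # Haldane
print(empirical_average_ld(R2, rec, lag=0), empirical_average_ld(R2, rec, lag=10))
```

Because the prefactor $2 / M (M - 1)$ is one over the number of pairs, the result with `lag=0` is simply the mean of the upper triangle of the LD matrix.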

The Strength of Unlinked and Nongametic Associations

Here, we characterize the contribution of completely genetically unlinked loci segregating for fitness variation to the change in frequency of our neutral allele. Across evolutionary replicates, there is no expected covariance between the neutral allele an individual carries and their fitness $(E (Δ_{H} p_{t}) = \frac{1}{2} E ({Cov}_{i} (x_{i}, f_{i})) = 0)$; rather, for unlinked loci, chance associations are created by the variance of this process of sampling neutral alleles into individuals with varying fitness $(Var (Δ_{H} p_{t}) = \frac{1}{4} Var ({Cov}_{i} (x_{i}, f_{i})))$. As the neutral allele and fitness variation assort independently into individuals, the chance associations that form have a variance given by $Var ({Cov}_{i} (x_{i}, f_{i}))$. This has the form of the sampling variance of a covariance, which for random variables X and Y is given on p. 472 of Kendall et al. (1994),

Var (Cov (X, Y)) = \frac{{(n - 1)}^{2}}{n^{3}} (μ_{22} - μ_{11}^{2}) + \frac{n - 1}{n^{3}} (μ_{20} μ_{02} - μ_{11}^{2})

(56)

μ_{r s} = E [{(X - μ_{X})}^{r} {(Y - μ_{Y})}^{s}]

(57)

where $μ_{X}$ and $μ_{Y}$ are the means of X and Y, respectively, the variance is taken over conceptual replicate populations, and the covariance is calculated over the individuals in a population. Then, applying this to our covariance $Δ_{H} p_{1} = \frac{1}{2} {Cov}_{i} (x_{i}, f_{i})$,

\begin{array}{l} Var (Δ_{H} p_{1}) = \frac{1}{4} Var ({Cov}_{i} (x_{i}, f_{i})) \\ = \frac{{(N - 1)}^{2}}{4 N^{3}} (E [{(x_{i} - 2 p_{0})}^{2} {(f_{i} - 1)}^{2}] - {E [(x_{i} - 2 p_{0}) (f_{i} - 1)]}^{2}) \\ + \frac{N - 1}{4 N^{3}} (E [{(x_{i} - 2 p_{0})}^{2}] E [{(f_{i} - 1)}^{2}] - {E [(x_{i} - 2 p_{0}) (f_{i} - 1)]}^{2}) \\ = \frac{N - 1}{4 N^{2}} Var (x_{i}) Var (f_{i}) \\ \approx \frac{p_{0} (1 - p_{0})}{2 N} Var (f_{i}) . \end{array}

(58)

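Here we have used the independence of the neutral genotype and fitness at unlinked loci, so that the cross-moment terms vanish and the mixed fourth moment factors into $Var (x_{i}) Var (f_{i})$, together with $Var (x_{i}) = 2 p_{0} (1 - p_{0})$ under Hardy–Weinberg proportions. Thus the chance covariances that form between the neutral alleles individuals carry and their fitness have a variance proportional to $Var (f_{i}) / 2 N$.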
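The approximation in Equation 58 can be checked by simulation. The sketch below draws, in each of many replicate populations, neutral genotypes independently of fitness (the unlinked case), computes $Δ_{H} p$ as half the within-population covariance, and compares its variance across replicates with $p_{0} (1 - p_{0}) Var (f_{i}) / 2N$. The normal fitness distribution and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, p0, reps = 200, 0.3, 10_000
var_f = 0.04                                    # illustrative fitness variance (assumed)

x = rng.binomial(2, p0, size=(reps, N))         # neutral genotypes, independent of fitness
f = 1 + np.sqrt(var_f) * rng.standard_normal((reps, N))

# Delta_H p = (1/2) Cov_i(x_i, f_i): population covariance within each replicate
xc = x - x.mean(axis=1, keepdims=True)
fc = f - f.mean(axis=1, keepdims=True)
delta_h = 0.5 * (xc * fc).mean(axis=1)

simulated = delta_h.var()
predicted = p0 * (1 - p0) / (2 * N) * var_f    # last line of Equation 58
print(simulated, predicted)
```

The exact expectation carries an extra factor of $(N - 1) / N$, which is negligible for the population size used here.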

Nongametic LD’s Contribution to Temporal Autocovariance

Throughout the paper, we ignore the effects of nongametic LD, $D_{t, l}^{″}$, the disequilibrium between alleles at two loci carried on the two different gametes (maternal and paternal) within an individual (see Figure A2A for an illustration of gametic LD $D^{'}$ and nongametic LD $D^{″}$). Following the equation for the sampling variance of nongametic LD $D_{A / B}$ in Weir (1996) (see p. 124), the chance nongametic disequilibrium that builds up when $2 N$ gametes are sampled into N individuals satisfies

E ({(D_{t, l}^{″})}^{2}) = Var (D_{A / B}) = \frac{1}{2 N} p_{A} (1 - p_{A}) p_{B} (1 - p_{B}) .

(59)

Note the similarity in form to Equation 58: both the unlinked associations and the nongametic LD arise from the random sampling of alleles at different loci into individuals.

There is no expected covariance between gametic and nongametic LD within a generation, $E (D_{t, l}^{″} D_{t, l}^{'}) = 0$, assuming random mating. However, $E (D_{t, l}^{″} D_{s, l}^{'}) > 0$ for $s > t$, as a fraction of the nongametic LD may be converted into gametic LD in the next generation. Specifically, following Santiago and Caballero (1995), we can write the expected product of the nongametic LD in generation t and the gametic LD in generation s as

E (D_{t, l}^{″} D_{s, l}^{'}) = r {(1 - r)}^{(s - t - 1)} E ({(D_{t, l}^{″})}^{2})

(60)

where a proportion r of the nongametic LD in generation t is converted into gametic LD, and a proportion ${(1 - r)}^{(s - t - 1)}$ of this is carried forward unbroken by recombination over the remaining $s - t - 1$ generations (see Figure A2B for an illustration of this process).

In our analysis in the main text, we ignore these terms as $D_{t, l}^{″}$ is expected to be small due to its inverse dependence on N. However, these terms are necessary for the analysis of looser linkage (Santiago and Caballero 1995, 1998).

Connecting our model with the models of Robertson and Santiago and Caballero

Here, we describe the models of Santiago and Caballero (1995, 1998), relating their work on the long-run effective population size experienced by a neutral allele when there is (1) unlinked heritable fitness variation (Santiago and Caballero 1995) or (2) linkage, where fitness-determining sites are randomly scattered along a chromosome (Santiago and Caballero 1998). Overall, their models are formulated in a quantitative genetics tradition, where the population genetic dynamics at the selected loci are not explicitly modeled (although these links are made more explicit in their 1998 paper). In contrast, in deriving our expressions for temporal autocovariance and variance, we use a population genetic approach, modeling the dynamics at selected sites (though we simplify from the full multilocus treatment, e.g., we assume selected loci experience independent sweeps and we ignore the LD between selected sites). We show that the two approaches can be reconciled, and that the temporal autocovariance expressions we develop are implicit in their model. We also work through their expressions for $N_{e}$ with heritable variance, because they are quite useful but their original presentation is spread across two papers (with a change in notation).

Santiago and Caballero’s 1995 and 1998 models for $N_{e}$

While our goal in the main text of our paper was to develop expressions for the temporal variances and autocovariances in allele frequency change when there is heritable fitness variation in the population, the goal of both the 1995 and 1998 Santiago and Caballero papers was to derive an expression for the long-run $N_{e}$ when there is heritable variation for fitness in the population. In their 1995 paper, they found the effective population size for large t (see p. 1018, equation 16) to be

N_{e} = \lim_{t \to ∞} \frac{p_{0} (1 - p_{0}) - Var (p_{t - 1})}{2 (Var (p_{t}) - Var (p_{t - 1}))}

(61)

N_{e} = \frac{4 N}{2 + V_{n} + Q^{2} C^{2}}

(62)

where $C^{2}$ is the heritable variation for fitness ($V_{A}$ in our notation), and $V_{n}$ is the nonheritable variation in offspring number (i.e., under a Wright–Fisher model, $V_{n} \approx 2$). For a neutral locus completely unlinked from fitness variation [the situation first considered by Robertson (1961)], $Q = 1 + G / 2 + {(G / 2)}^{2} + {(G / 2)}^{3} + \dots = \sum_{i = 0}^{∞} {(G / 2)}^{i} = 2 / (2 - G)$ [see equation 17 in Santiago and Caballero (1995)]. Here, G represents the rate at which the additive genetic variance associated with a particular haplotype decays. Note that we have simplified their expressions by assuming no assortative mating, and that we try to follow their notation as closely as possible (consequently, the Q here is unrelated to the $Q_{t, s}$ of the main text). Santiago and Caballero (1995) assume that continual artificial selection maintains a constant level $C^{2}$ of fitness variation in the population each generation, yet the particular fitness backgrounds with which the neutral allele is stochastically associated retain only a fraction G of their fitness variation in the next generation, $G^{2}$ in the generation after, and so on, as selection reduces genetic variation for fitness (note: in their 1998 paper, they use Z instead of G). Similarly, the associations between the neutral allele and fitness backgrounds decay by a factor of $1 / 2$ each generation due to independent assortment. Note that Robertson (1961) assumed that the fitness backgrounds that become stochastically associated with the neutral allele do not experience any decay in their fitness variation $(G = 1)$; in this case, $Q = 2$, as Robertson found. With an arbitrary amount of linkage between the focal neutral site and fitness backgrounds, Santiago and Caballero (1998) show that only Q is affected, and derive an expression for Q that depends only on G and the size of the genome in morgans, L [see equation 6 in Santiago and Caballero (1998)].
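For the unlinked case, Equation 62 with $Q = 2 / (2 - G)$ is easy to evaluate. The Python sketch below uses hypothetical function names and takes $V_{n} = 2$ (the Wright–Fisher value) as a default; it reproduces Robertson’s case $G = 1$, $Q = 2$, and reduces to $N_{e} = N$ when there is no heritable fitness variation.

```python
def q_unlinked(G):
    """Q = sum_i (G / 2)^i = 2 / (2 - G) for a neutral locus unlinked to fitness variation."""
    return 2.0 / (2.0 - G)


def santiago_caballero_ne(N, C2, G, V_n=2.0):
    """Equation 62: N_e = 4N / (2 + V_n + Q^2 C^2), unlinked case, random mating.

    N   : number of diploids
    C2  : heritable variance for fitness (V_A in our notation)
    G   : per-generation retention of fitness variance on an associated background
    V_n : nonheritable variance in offspring number (about 2 under Wright-Fisher)
    """
    Q = q_unlinked(G)
    return 4 * N / (2 + V_n + Q**2 * C2)


print(santiago_caballero_ne(1000, C2=0.1, G=1.0))   # Robertson's case, Q = 2
print(santiago_caballero_ne(1000, C2=0.1, G=0.5))   # decaying fitness variance, Q = 4/3
```

Smaller G shortens how long each chance association carries fitness variance, so $N_{e}$ is reduced less than in Robertson’s case.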

In the main text, we model the temporal autocovariance created by heritable fitness variation, which also affects the cumulative variance in allele frequency change, $Var (p_{t} - p_{0})$. To illustrate how our model connects with theirs, we write the cumulative variance in allele frequency change over three generations in their 1995 notation, labeling the terms that make up each generation’s allele frequency change:

Var (p_{3} - p_{0}) = E \Big( \big( \underbrace{S_{1} + D_{1} + H_{1}}_{Δ p_{1}} + \underbrace{(1 - r) G S_{1} + S_{2} + D_{2} + H_{2}}_{Δ p_{2}} + \underbrace{{(1 - r)}^{2} G^{2} S_{1} + (1 - r) G S_{2} + S_{3} + D_{3} + H_{3}}_{Δ p_{3}} \big)^{2} \Big) .

(63)

Grouping terms by the generation in which each association was initially formed shows how Santiago and Caballero (1995) define the $Q_{i}$ terms in their notation:

\begin{array}{l} Var (p_{3} - p_{0}) = E \Big( \big( \underbrace{S_{1} (1 + (1 - r) G + {(1 - r)}^{2} G^{2})}_{\text{creation and persistence of generation 1 associations} \, := \, S_{1} Q_{3}} + D_{1} + H_{1} \\ \quad + \underbrace{S_{2} (1 + (1 - r) G)}_{\text{creation and persistence of generation 2 associations} \, := \, S_{2} Q_{2}} + D_{2} + H_{2} \\ \quad + \underbrace{S_{3}}_{\text{creation of generation 3 associations} \, := \, S_{3} Q_{1}} + D_{3} + H_{3} \big)^{2} \Big) \end{array}

(64)

since the associations created in generation i $(0 < i \leq t)$ persist to generation t with probability ${(1 - r)}^{t - i}$, retaining a proportion $G^{t - i}$ of their original fitness variation. In general, associations that have been acting for i generations (i.e., those formed in generation $t - i + 1$) have the cumulative coefficient $Q_{i} = \sum_{j = 0}^{i - 1} {(1 - r)}^{j} G^{j}$. Using these $Q_{i}$ terms, and extending to four generations, the cumulative variance becomes

\begin{array}{l} Var (p_{4} - p_{0}) = E \big( (S_{1} Q_{4} + D_{1} + H_{1} + S_{2} Q_{3} + D_{2} + H_{2} \\ \quad + S_{3} Q_{2} + D_{3} + H_{3} + S_{4} Q_{1} + D_{4} + H_{4})^{2} \big) \end{array}

(65)

or, in general,

Var (p_{t} - p_{0}) = \sum_{i = 1}^{t} \big( E (D_{i}^{2}) + E (H_{i}^{2}) + Q_{t - i + 1}^{2} E (S_{i}^{2}) \big) .

(66)
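To make the bookkeeping in Equations 64–66 concrete, here is a minimal Python sketch (not code from the original study). It computes the coefficients $Q_{i}$, evaluates Equation 66 under the simplifying assumption that $E (D_{i}^{2})$, $E (H_{i}^{2})$, and $E (S_{i}^{2})$ are constant with no drift decay, and checks the result by Monte Carlo. The function names are hypothetical, and drawing $S_{i}$, $D_{i}$, and $H_{i}$ as independent normal variables is an illustrative assumption; only their independence and zero means matter for Equation 66.

```python
import random

def q_coef(i: int, r: float, G: float) -> float:
    """Q_i = sum_{j=0}^{i-1} ((1 - r) G)^j for an association that has acted for i generations."""
    decay = (1.0 - r) * G
    return sum(decay ** j for j in range(i))

def cumulative_var(t, ED2, EH2, ES2, r, G):
    """Equation 66 with constant E(D^2), E(H^2), E(S^2) and no drift decay."""
    return sum(ED2 + EH2 + q_coef(t - i + 1, r, G) ** 2 * ES2 for i in range(1, t + 1))

def simulate_cumulative(t, sd_D, sd_H, sd_S, r, G, reps=100_000, seed=1):
    """Monte Carlo estimate of E((p_t - p_0)^2), building each generation's change as in Equation 63.

    Hypothetical assumption: S_i, D_i, H_i are independent zero-mean normal draws.
    """
    rng = random.Random(seed)
    decay = (1.0 - r) * G
    total_sq = 0.0
    for _ in range(reps):
        S = [rng.gauss(0.0, sd_S) for _ in range(t)]
        cum_change = 0.0
        for k in range(t):  # generation k + 1
            # Heritable part: each earlier association S_i persists with weight ((1 - r) G)^(k - i).
            cum_change += sum(decay ** (k - i) * S[i] for i in range(k + 1))
            # Nonheritable parts are fresh, independent draws every generation.
            cum_change += rng.gauss(0.0, sd_D) + rng.gauss(0.0, sd_H)
        total_sq += cum_change ** 2
    return total_sq / reps

# Robertson's (1961) unlinked case, r = 1/2 and G = 1: Q_i approaches 2.
print([q_coef(i, 0.5, 1.0) for i in range(1, 6)])  # [1.0, 1.5, 1.75, 1.875, 1.9375]
print(simulate_cumulative(3, 0.01, 0.01, 0.02, 0.5, 1.0),
      cumulative_var(3, 0.01 ** 2, 0.01 ** 2, 0.02 ** 2, 0.5, 1.0))
```

The two printed variances should agree to within Monte Carlo error, because every cross term between distinct, independent, zero-mean variables averages to zero.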

Santiago and Caballero (1995) then note that, if $V_{n}$, $C^{2}$, and the population size N are constant across generations, the magnitudes of the effects $E (D_{i}^{2})$, $E (H_{i}^{2})$, and $E (S_{i}^{2})$ are the same in every generation (so we omit the i subscript for these terms), apart from a geometric decay due to drift at rate $(1 - 1 / 2 N_{e})$ per generation that affects all terms. Including this decay, and reindexing the sum so that i counts the generations over which an association has acted (i.e., replacing i with $t - i + 1$), gives

Var (p_{t} - p_{0}) = \sum_{i = 1}^{t} (E (D^{2}) + E (H^{2}) + Q_{i}^{2} E (S^{2})) {(1 - \frac{1}{2 N_{e}})}^{t - i}

(67)

[compare with p. 1018 of Santiago and Caballero (1995)]. In the long run, the variance in the neutral allele’s frequency change reaches a balance. Many copies of the neutral allele segregating in the population are on fitness backgrounds with which it has recently become stochastically associated, as segregation and recombination have not yet broken these associations apart. A few copies of the neutral allele are on fitness backgrounds with which they became associated many generations ago, and which have by chance remained associated. In all cases, the effect that these associations have on present-day allele frequency change is weakened because natural selection has reduced the genetic variance of these fitness backgrounds. Since the variance in allele frequency after t generations of drift in a Wright–Fisher population is

Var (p_{t}) = p_{0} (1 - p_{0}) [1 - {(1 - \frac{1}{2 N})}^{t}],

(68)

one can estimate the effective population size $N_{e}$ from the difference between the observed variances $Var (p_{t})$ and $Var (p_{t - 1})$. Note that this long-run effective population size differs from the one used by others, $N_{e} = p_{0} (1 - p_{0}) t / (2 Var (p_{t} - p_{0}))$ (Crow and Kimura 1970). Santiago and Caballero use Equation 68, taking the difference $Var (p_{t}) - Var (p_{t - 1})$ and rearranging to obtain the large-t estimator of $N_{e}$,

N_{e} = \frac{p_{0} (1 - p_{0}) - Var (p_{t - 1})}{2 (Var (p_{t}) - Var (p_{t - 1}))}

(69)

[compare with p. 1018 in Santiago and Caballero (1995)]. Rearranging,

\begin{array}{l} 2 N_{e} (Var (p_{t}) - Var (p_{t - 1})) = p_{0} (1 - p_{0}) - Var (p_{t - 1}) \\ 2 N_{e} Var (p_{t}) - 2 N_{e} Var (p_{t - 1}) + Var (p_{t - 1}) = p_{0} (1 - p_{0}) \\ 2 N_{e} (Var (p_{t}) - Var (p_{t - 1}) (1 - \frac{1}{2 N_{e}})) = p_{0} (1 - p_{0}) . \end{array}

(70)
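As a quick numerical check of Equations 68 and 69, the Python sketch below (function names and parameter values are hypothetical) computes $Var (p_{t})$ under pure drift and recovers N from consecutive variances. For contrast, it also evaluates the Crow and Kimura (1970) form $p_{0} (1 - p_{0}) t / (2 Var (p_{t} - p_{0}))$, which equals N at $t = 1$ but drifts away as t grows relative to N.

```python
def wf_var(t: int, p0: float, N: float) -> float:
    """Equation 68: Var(p_t) = p0 (1 - p0) [1 - (1 - 1/(2N))^t] under pure drift."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * N)) ** t)

def ne_from_variance_difference(var_t: float, var_tm1: float, p0: float) -> float:
    """Equation 69: N_e = [p0 (1 - p0) - Var(p_{t-1})] / [2 (Var(p_t) - Var(p_{t-1}))]."""
    return (p0 * (1 - p0) - var_tm1) / (2 * (var_t - var_tm1))

def ne_crow_kimura(var_t: float, p0: float, t: int) -> float:
    """N_e = p0 (1 - p0) t / (2 Var(p_t - p_0)); with p0 fixed, Var(p_t - p_0) = Var(p_t)."""
    return p0 * (1 - p0) * t / (2 * var_t)

p0, N = 0.3, 500
for t in (1, 10, 100):
    v_t, v_tm1 = wf_var(t, p0, N), wf_var(t - 1, p0, N)
    print(t, ne_from_variance_difference(v_t, v_tm1, p0), ne_crow_kimura(v_t, p0, t))
```

Under pure drift, the first estimator returns N (up to rounding) at every t, whereas the second rises to about 525 by $t = 100$ in this example.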

The rearrangement in Equation 70 conveniently simplifies the sum in Equation 67, as the case $t = 3$ shows:

\begin{array}{l} Var (p_{3}) = (E (D^{2}) + E (H^{2}) + Q_{1}^{2} E (S^{2})) {(1 - \frac{1}{2 N_{e}})}^{2} + (E (D^{2}) + E (H^{2}) + Q_{2}^{2} E (S^{2})) (1 - \frac{1}{2 N_{e}}) + E (D^{2}) + E (H^{2}) + Q_{3}^{2} E (S^{2}) \\ Var (p_{2}) (1 - \frac{1}{2 N_{e}}) = (E (D^{2}) + E (H^{2}) + Q_{1}^{2} E (S^{2})) {(1 - \frac{1}{2 N_{e}})}^{2} + (E (D^{2}) + E (H^{2}) + Q_{2}^{2} E (S^{2})) (1 - \frac{1}{2 N_{e}}) \\ Var (p_{3}) - Var (p_{2}) (1 - \frac{1}{2 N_{e}}) = E (D^{2}) + E (H^{2}) + Q_{3}^{2} E (S^{2}) \end{array}

or, generally,

Var (p_{t}) - Var (p_{t - 1}) (1 - \frac{1}{2 N_{e}}) = E (D^{2}) + E (H^{2}) + Q_{t}^{2} E (S^{2}) .

(71)

Substituting Equation 71 into the last line of Equation 70 and letting $t \to ∞$ (so that $Q_{t} \to Q_{∞}$), the long-run effective population size can be written as

\begin{array}{l} N_{e} = \frac{p_{0} (1 - p_{0})}{2 (Var (p_{t}) - Var (p_{t - 1}) (1 - \frac{1}{2 N_{e}}))} \\ = \frac{p_{0} (1 - p_{0})}{2 (E (D^{2}) + E (H^{2}) + Q_{∞}^{2} E (S^{2}))} \end{array}

(72)

where $Q_{∞} = 1 + 1 / 2 + 1 / 4 + \dots = 2$ in Robertson’s (1961) model, and $Q_{∞} = 1 / (1 - G (1 - r))$ in Santiago and Caballero’s (1995) model, which reduces to $2 / (2 - G)$ for an unlinked neutral site $(r = 1 / 2)$. Note that r here represents the recombination fraction between fitness variation and neutral sites, which differs from equation 17 in Santiago and Caballero (1995), where r represents the correlation between parental fitnesses.
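The telescoping step behind Equations 71 and 72 can be checked numerically. The Python sketch below (hypothetical function names; the parameter values are illustrative only) evaluates Equation 67, confirms that $Var (p_{t}) - Var (p_{t - 1}) (1 - 1 / 2 N_{e})$ leaves only the newest term $E (D^{2}) + E (H^{2}) + Q_{t}^{2} E (S^{2})$, and confirms that $Q_{t}$ approaches $Q_{∞} = 1 / (1 - G (1 - r))$.

```python
def q_coef(i: int, r: float, G: float) -> float:
    """Q_i = sum_{j=0}^{i-1} ((1 - r) G)^j."""
    return sum(((1 - r) * G) ** j for j in range(i))

def var_pt(t, ED2, EH2, ES2, Ne, r, G):
    """Equation 67: sum_{i=1}^{t} [E(D^2) + E(H^2) + Q_i^2 E(S^2)] (1 - 1/(2 Ne))^(t - i)."""
    lam = 1 - 1 / (2 * Ne)
    return sum((ED2 + EH2 + q_coef(i, r, G) ** 2 * ES2) * lam ** (t - i)
               for i in range(1, t + 1))  # an empty sum (t = 0) gives Var(p_0) = 0

ED2, EH2, ES2, Ne, r, G = 1e-4, 5e-5, 2e-5, 200, 0.5, 0.9  # illustrative values only
lam = 1 - 1 / (2 * Ne)
for t in (1, 5, 50):
    lhs = var_pt(t, ED2, EH2, ES2, Ne, r, G) - var_pt(t - 1, ED2, EH2, ES2, Ne, r, G) * lam
    rhs = ED2 + EH2 + q_coef(t, r, G) ** 2 * ES2  # right-hand side of Equation 71
    print(t, lhs, rhs)
print(q_coef(500, r, G), 1 / (1 - G * (1 - r)))  # Q_t approaches Q_inf
```

Multiplying $Var (p_{t - 1})$ by $(1 - 1 / 2 N_{e})$ shifts every older term onto its counterpart in $Var (p_{t})$, so only the newest term survives the subtraction.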

Then, Santiago and Caballero (1995) show,

E (S^{2}) = \frac{p_{0} (1 - p_{0})}{2 N} C^{2}

(73)

E (D^{2}) = \frac{p_{0} (1 - p_{0})}{2 N} \frac{V_{n}}{4}

(74)

E (H^{2}) = \frac{p_{0} (1 - p_{0})}{2 N} \frac{1}{2}

(75)

[compare with equation 11 in Santiago and Caballero (1995)]. Inserting these into Equation 72, we have

N_{e} = \frac{4 N}{2 + V_{n} + 4 Q_{∞}^{2} C^{2}},

(76)

which is a simplified version of equation 16 in Santiago and Caballero (1995), and which further simplifies to $N_{e} = N$ when $V_{n} = 2$ and $C^{2} = 0$ , the effective population size of a Wright–Fisher population of N hermaphroditic individuals.
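Equation 76 is simple to evaluate. In the Python sketch below (the function name is hypothetical), $Q_{∞} = 1 / (1 - G (1 - r))$ is taken from the text following Equation 72, and the defaults $r = 1 / 2$ (an unlinked neutral site) and $V_{n} = 2$ (Wright–Fisher offspring variance) are assumptions chosen for illustration.

```python
def ne_selection(N: float, C2: float, Vn: float = 2.0, G: float = 1.0, r: float = 0.5) -> float:
    """Equation 76: N_e = 4N / (2 + V_n + 4 Q_inf^2 C^2), with Q_inf = 1 / (1 - G (1 - r))."""
    if not 0.0 <= G * (1.0 - r) < 1.0:
        raise ValueError("need 0 <= G (1 - r) < 1 for Q_inf to be finite")
    q_inf = 1.0 / (1.0 - G * (1.0 - r))
    return 4.0 * N / (2.0 + Vn + 4.0 * q_inf ** 2 * C2)

print(ne_selection(1000, C2=0.0))          # 1000.0: no fitness variation, so N_e = N
print(ne_selection(1000, C2=0.25))         # 500.0: Robertson's case (G = 1, Q_inf = 2)
print(ne_selection(1000, C2=0.25, G=0.5))  # decaying fitness variation weakens the reduction
```

With $G = 1 / 2$, $Q_{∞} = 4 / 3$ and the effective size is about 692, larger than in Robertson’s case because selection erodes the variance of each associated background.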

The covariances caused by fitness associations

With an understanding of the basics of Santiago and Caballero’s (1995, 1998) models, how they connect to our notation, and how they reach their expression for the long-run effective population size, we now turn to the temporal autocovariances implicit in their model. We start with the variance in allele frequency change between generations 0 and 4, extending Equation 63 by one generation so that the pattern is clearer:

Var (p_{4} - p_{0}) = E \Big( \big( \underbrace{S_{1} + D_{1} + H_{1}}_{Δ p_{1}} + \underbrace{(1 - r) G S_{1} + S_{2} + D_{2} + H_{2}}_{Δ p_{2}} + \underbrace{{(1 - r)}^{2} G^{2} S_{1} + (1 - r) G S_{2} + S_{3} + D_{3} + H_{3}}_{Δ p_{3}} + \underbrace{{(1 - r)}^{3} G^{3} S_{1} + {(1 - r)}^{2} G^{2} S_{2} + (1 - r) G S_{3} + S_{4} + D_{4} + H_{4}}_{Δ p_{4}} \big)^{2} \Big) .

Cross terms such as $E (D_{1} D_{2})$, $E (H_{1} D_{1})$, and $E (S_{1} S_{2})$ are expected products of independent random variables, each with expectation zero, and consequently all vanish. The only nonzero cross terms are those pairing the same $S_{i}$ across different generations, which are proportional to $E (S_{i}^{2})$. Consider first the covariances between the allele frequency change in the initial generation and that in a later generation s, $Cov (Δ_{H} p_{1}, Δ_{H} p_{s})$:

Cov (Δ_{H} p_{1}, Δ_{H} p_{1}) = Var (Δ_{H} p_{1}) = E (S_{1}^{2})

(77)

Cov (Δ_{H} p_{1}, Δ_{H} p_{2}) = \frac{E (S_{1}^{2}) G}{2} (1 - r)

(78)

Cov (Δ_{H} p_{1}, Δ_{H} p_{3}) = \frac{E (S_{1}^{2}) G^{2}}{2} {(1 - r)}^{2}

(79)

Cov (Δ_{H} p_{1}, Δ_{H} p_{4}) = \frac{E (S_{1}^{2}) G^{3}}{2} {(1 - r)}^{3}

(80)

where the $1 / 2$ coefficient comes from the fact that, in Santiago and Caballero’s work, the $E (S_{1}^{2})$ products represent both the $Cov (Δ_{H} p_{t}, Δ_{H} p_{s})$ and $Cov (Δ_{H} p_{s}, Δ_{H} p_{t})$ terms, so a single temporal autocovariance in our notation is one-half of their joint covariance term.

However, if the reference generation is later, say generation 2, associations formed in earlier generations that have persisted to the reference generation also contribute to covariances with later generations:

Cov (Δ_{H} p_{2}, Δ_{H} p_{2}) = \frac{E (S_{2}^{2}) + E (S_{1}^{2}) (1 - r) G}{2}

(81)

Cov (Δ_{H} p_{2}, Δ_{H} p_{3}) = \frac{E (S_{2}^{2}) G + E (S_{1}^{2}) (1 - r) G^{2}}{2} (1 - r)

(82)

Cov (Δ_{H} p_{2}, Δ_{H} p_{4}) = \frac{E (S_{2}^{2}) (1 - r) G^{2} + E (S_{1}^{2}) {(1 - r)}^{2} G^{3}}{2} (1 - r) .

(83)

Likewise, the covariances $Cov (Δ_{H} p_{3}, Δ_{H} p_{s})$ include contributions from associations that persist from earlier generations. In general,

Cov (Δ_{H} p_{t}, Δ_{H} p_{s}) = \frac{1}{2} \sum_{i = 1}^{t} E (S_{i}^{2}) {(G (1 - r))}^{t + s - 2 i}, \quad \text{for } t \leq s .

(84)

This expression is more complex than our expression for temporal autocovariance because it models the LD in generation t as it builds up from generations 1 to t. In contrast, our expressions for covariance incorporate all of this buildup of LD through the initial LD term $E (ℛ_{t}^{2})$ (for a single-locus case). The autocovariance implied by Santiago and Caballero’s (1995) work matches our expression for autocovariance (for a single locus) when the first generation is taken to be t rather than 1:

Cov (Δ_{H} p_{t}, Δ_{H} p_{s}) = \frac{E (S_{t}^{2}) G^{s - t}}{2} {(1 - r)}^{s - t}, \quad \text{for } t \leq s .

(85)

Using the expression for $E (S_{t}^{2})$ [equivalent to $Var (Δ_{H} p_{t})$ in our notation] derived in The Strength of Unlinked and Nongametic Association in the Appendix,

\frac{Cov (Δ_{H} p_{t}, Δ_{H} p_{s})}{p_{t} (1 - p_{t})} = \frac{C^{2} G^{s - t}}{4 N} {(1 - r)}^{s - t}, \quad \text{for } t \leq s .

(86)
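Equation 86 implies that, for fixed $C^{2}$, N, and r, the standardized autocovariance shrinks by a factor $G (1 - r)$ with each additional generation of lag $s - t$. The Python sketch below illustrates this; the function name and parameter values are hypothetical.

```python
def std_autocov(lag: int, C2: float, G: float, N: float, r: float) -> float:
    """Equation 86: Cov(Δ_H p_t, Δ_H p_s) / [p_t (1 - p_t)] = C^2 (G (1 - r))^(s - t) / (4N), lag = s - t."""
    if lag < 0:
        raise ValueError("Equation 86 is stated for t <= s")
    return C2 * (G * (1.0 - r)) ** lag / (4.0 * N)

C2, G, N, r = 0.1, 0.9, 1000, 0.5  # illustrative values
values = [std_autocov(k, C2, G, N, r) for k in range(5)]
ratios = [b / a for a, b in zip(values, values[1:])]
print(values)
print(ratios)  # each ratio equals G (1 - r) = 0.45, up to rounding
```

Both sources of decay act multiplicatively: selection erodes the variance of the associated background (G), and recombination breaks the association $(1 - r)$.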

Equation 86 is analogous to Equation 8 for a single locus, where $C^{2} G^{s - t}$ is the additive genetic variation in generation s [equivalent to our $V_{A} (s)$], and the factor $1 / 2 N$ represents the chance buildup of LD between the neutral site and an unlinked fitness background. In our expression, we condition on existing LD $E (ℛ_{t}^{2})$ between the neutral site and its fitness background, whereas they assume a buildup of LD to a drift–recombination equilibrium. We can see this by returning to the $Δ_{H} p_{4}$ term in the four-generation extension of Equation 63,

Var (Δ_{H} p_{4}) = E \big( ({(1 - r)}^{3} G^{3} S_{1} + {(1 - r)}^{2} G^{2} S_{2} + (1 - r) G S_{3} + S_{4} + D_{4} + H_{4})^{2} \big)

(87)

= {(1 - r)}^{6} G^{6} E (S_{1}^{2}) + {(1 - r)}^{4} G^{4} E (S_{2}^{2}) + {(1 - r)}^{2} G^{2} E (S_{3}^{2}) + E (S_{4}^{2})

(88)

where, following Santiago and Caballero’s (1995) approach (see p. 1018), we can replace each $E (S_{i}^{2})$ with $E (S^{2}) {(1 - 1 / 2 N)}^{i - 1}$ and let $G = 1$, as we focus on the buildup of LD. This gives the general equation,

Var (Δ_{H} p_{t}) = E (S^{2}) \sum_{i = 1}^{t} {(1 - r)}^{2 i} {(1 - \frac{1}{2 N})}^{i - 1}

(89)

Taking this geometric series to infinity (it converges, since ${(1 - r)}^{2} (1 - 1 / 2 N) < 1$) and replacing $E (S^{2})$ with the chance associations that build up as gametes are sampled into individuals (Equation 58) gives

\begin{array}{l} E ({(Δ_{H} p_{∞})}^{2}) = E (S^{2}) \sum_{i = 1}^{∞} {(1 - r)}^{2 i} {(1 - \frac{1}{2 N})}^{i - 1} \\ = \frac{E (S^{2})}{1 - \frac{1}{2 N}} \sum_{i = 1}^{∞} {({(1 - r)}^{2} (1 - \frac{1}{2 N}))}^{i} \\ = \frac{C^{2} p_{0} (1 - p_{0})}{1 - \frac{1}{2 N}} \frac{1}{1 - {(1 - r)}^{2} (1 - \frac{1}{2 N})} . \end{array}

(90)

Taking $r \to 0$ and $1 / N \to 0$ with $N r$ held constant gives

E (ℛ^{2}) = \frac{E ({(Δ_{H} p_{∞})}^{2})}{C^{2} p_{0} (1 - p_{0})} \approx \frac{1}{1 + 4 N r},

(91)

which is analogous to the $E (ℛ^{2})$ measure of LD, standardized to remove the scale of the fitness variation. The right-hand side is identical to the identity-by-descent equilibrium $E (ℛ^{2})$ under drift–recombination balance (Sved 1971). Note that our expression can be recovered from Equation 84 when the reference generation $t \to ∞$, such that the LD reaches its equilibrium level:

\frac{Cov (Δ_{H} p_{t}, Δ_{H} p_{s})}{p_{0} (1 - p_{0})} = \frac{1}{2} \underbrace{C^{2} G^{s - t}}_{V_{A} (s)} \underbrace{\frac{1}{1 + 4 N r}}_{E (ℛ_{t}^{2})} {(1 - r)}^{s - t},

(92)

which is identical to our expression for temporal autocovariance when initial LD is due to a neutral drift–recombination balance, and the change in $V_{A}$ during selection is modeled as a geometric decay at rate G. Note that the terms in underbraces indicate the corresponding terms in Equation 8 for a single locus.
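The limit taken in Equation 91 rests on the approximation $1 - {(1 - r)}^{2} (1 - 1 / 2 N) \approx 2 r + 1 / 2 N = (1 + 4 N r) / 2 N$ when r and $1 / N$ are small with $N r$ fixed. The Python sketch below (hypothetical names; $N r = 1$ is chosen only for illustration) shows the relative error of this approximation shrinking roughly in proportion to $1 / N$.

```python
def exact_denominator(N: float, r: float) -> float:
    """1 - (1 - r)^2 (1 - 1/(2N)): the denominator of the summed geometric series in Equation 90."""
    return 1.0 - (1.0 - r) ** 2 * (1.0 - 1.0 / (2.0 * N))

def approx_denominator(N: float, r: float) -> float:
    """Small-r, large-N approximation: 2r + 1/(2N) = (1 + 4Nr) / (2N)."""
    return (1.0 + 4.0 * N * r) / (2.0 * N)

NR = 1.0  # hold N r fixed while N grows
rel_errors = {}
for N in (1e2, 1e4, 1e6):
    r = NR / N
    rel_errors[N] = abs(exact_denominator(N, r) / approx_denominator(N, r) - 1.0)
    print(f"N = {N:.0e}: relative error {rel_errors[N]:.1e}")
```

The neglected terms, $r^{2}$ and $r / N$ (up to a constant), are second order when $N r$ is held fixed, which is why the error falls off like $1 / N$.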

Multilocus Simulation Details

Targeting an initial level of additive genetic variation

We choose θ for the coalescent simulations to target a total number of segregating sites $L + M$, where L is the number of selected sites (a parameter we vary in our multilocus simulations) and M (200 or more) is the number of randomly placed neutral sites over which we calculate the temporal autocovariance. The total number of target sites $M + L$ is then inflated by a factor of 1.5 to ensure a sufficient number of sites given the random mutation process. The L selected sites are randomly chosen from the segregating sites, and all remaining mutations are neutral. Thus, using Watterson’s expression for the expected number of segregating sites under the coalescent (Watterson 1975), we have $θ = 1.5 (M + L) / (γ + log (2 N))$, where $γ \approx 0.577$ is the Euler–Mascheroni constant. Each of the L selected sites is given a random effect size of $\pm α$ with equal probability, where we choose α by targeting a specific level of additive genic variation $V_{a} = 2 α^{2} \sum_{l = 1}^{L} p_{l} (1 - p_{l}) = α^{2} SSH_{L}$, where $SSH_{L} = 2 \sum_{l = 1}^{L} p_{l} (1 - p_{l})$ is the sum of site heterozygosity of the L neutrally evolving sites that will be selected once selection begins. Under neutrality, $E (SSH_{L}) = θ_{L} = L / (γ + log (2 N))$. We then set $α = \sqrt{V_{a} / θ_{L}}$. We verified empirically that the realized genic variation is close to this target level.
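The parameter choices above can be sketched in a few lines of Python. This is an illustrative reconstruction from the formulas in the text, not the authors’ simulation code; the function names and the example values (N = 1000, M = 200, L = 10, $V_{a} = 0.01$) are hypothetical. It also compares the approximation $γ + log (2 N)$ with the exact harmonic sum $\sum_{i = 1}^{2 N - 1} 1 / i$ from Watterson’s expectation, treating all 2N chromosomes as the sample.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic_approx(N: int) -> float:
    """gamma + log(2N), the approximation used in the text for sum_{i=1}^{2N-1} 1/i."""
    return EULER_GAMMA + math.log(2 * N)

def harmonic_exact(N: int) -> float:
    """Exact harmonic sum over the 2N - 1 terms in Watterson's expectation."""
    return sum(1.0 / i for i in range(1, 2 * N))

def coalescent_theta(N: int, M: int, L: int, inflation: float = 1.5) -> float:
    """theta = 1.5 (M + L) / (gamma + log(2N)), targeting M + L segregating sites."""
    return inflation * (M + L) / harmonic_approx(N)

def effect_size(N: int, L: int, Va: float) -> float:
    """alpha = sqrt(V_a / theta_L), with theta_L = L / (gamma + log(2N))."""
    theta_L = L / harmonic_approx(N)
    return math.sqrt(Va / theta_L)

N, M, L, Va = 1000, 200, 10, 0.01  # hypothetical example values
alpha = effect_size(N, L, Va)
print(coalescent_theta(N, M, L), alpha)
print(harmonic_approx(N), harmonic_exact(N))
```

By construction, $α^{2} θ_{L}$ returns the target $V_{a}$; the realized genic variation in any one simulation still varies with the sampled allele frequencies, which is why it is checked empirically.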

Choosing the simulation parameter range

We simulate over a grid of parameter values, varying the number of loci L, the target additive genic variation $V_{a}$ in the region (by varying α), and the recombination map length of the region R in morgans. We chose ranges for these parameters intended to encompass values likely to be encountered in natural populations. To set these ranges, we noted that estimates of additive genetic variation for lifetime reproductive success vary over orders of magnitude, from 0 [e.g., in male red deer (Kruuk et al. 2000)] to 1.1 [in the male red-billed gull (Teplitsky et al. 2009)]. These values of additive genetic variation are for the entire genome (we write these as $V_{A, GW}$, where GW indicates genome-wide), whereas our simulations model a region of varying map length. We expect most total genome map lengths to be roughly 5–50 M, and we chose to investigate how temporal autocovariance behaves across a spectrum of recombination: from a completely linked region $(R = 0)$, through the scale over which a strong classic hard sweep could affect diversity (0.5 cM), to a region whose ends are approximately unlinked (4.5 M). Overall, this gives a parameter range of $R \in \{0, 0.005, 0.01, 0.05, 0.1, 0.5, 1.5, 4.5\}$. Combining this grid of regional map lengths with the range of total genome map lengths, and imagining a homogeneous recombination rate, we roughly estimate the number of regions with this level of recombination in the organism’s genome as $G / R$, where here G denotes the total genome map length in morgans. Using this estimate, we calculate the genetic variation per region over our grid of parameters as $V_{A, GW} R / G$, and target a level of additive genic variation per region $V_{a}$ equal to this regional additive genetic variation $V_{A}$.
From preliminary simulations, we found that we cannot detect much temporal autocovariance when $V_{a} < 0.001$ with the initial level of LD from mutation–drift balance, so we ignore parameter values below this threshold (other than $V_{a} = 0$ as a control). Additionally, to reduce the number of simulations, we exclude $V_{a} > 0.1$, as this removes only a small region of the parameter grid and preliminary simulations demonstrated that the behavior of temporal autocovariance at high $V_{a}$ is evident with the included values. Overall, this gives a spectrum of target additive genic variation per region of $V_{a} \in \{0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.08, 0.1\}$. To keep our plots from being too dense, we include only a representative subset of this parameter grid in our figures.
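The scaling from genome-wide to regional variation, and the subsequent filtering, can be written out as below. This Python sketch is illustrative only: the genome-wide values and total map lengths are hypothetical picks from the ranges quoted above, the function name is ours, and G_total stands for the total genome map length G (not the decay parameter G of the Santiago and Caballero comparison).

```python
R_GRID = [0, 0.005, 0.01, 0.05, 0.1, 0.5, 1.5, 4.5]  # regional map lengths (morgans), from the text
V_A_GW = [0.01, 0.1, 1.1]  # hypothetical genome-wide V_A values within the quoted 0-1.1 range
G_TOTAL = [5.0, 50.0]  # total genome map lengths (morgans) at the ends of the quoted range

def regional_va(va_gw: float, R: float, g_total: float) -> float:
    """V_A,GW * R / G: genome-wide variation shared equally among the G / R regions."""
    return va_gw * R / g_total

kept = []
for va_gw in V_A_GW:
    for g_total in G_TOTAL:
        for R in R_GRID:
            va = regional_va(va_gw, R, g_total)
            # Keep 0.001 <= V_a <= 0.1, the range retained in the text (V_a = 0 is a separate control).
            if 0.001 <= va <= 0.1:
                kept.append((va_gw, g_total, R, round(va, 6)))
print(len(kept), kept[:3])
```

Combinations outside the retained range, such as a weakly selected genome with a short region, drop out of the grid, mirroring the exclusions described above.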

Accounting for Allele Frequency Sampling Noise

In practice, one will calculate the temporal variance–covariance matrix from allele frequency trajectories estimated from chromosomes sampled from the population. We assume a binomial sampling process, where n chromosomes are sampled from the population such that $\tilde{p} = X / n$ and $X \sim Binom (n, p)$. We can then write ${\tilde{p}}_{t} = p_{t} + ε_{t}$, where the sampling error $ε_{t}$ has mean zero and variance $p_{t} (1 - p_{t}) / n$, and our covariances can be written as

\begin{array}{l} Cov (Δ {\tilde{p}}_{t}, Δ {\tilde{p}}_{s}) = Cov ({\tilde{p}}_{t + 1} - {\tilde{p}}_{t}, {\tilde{p}}_{s + 1} - {\tilde{p}}_{s}) \\ = Cov (p_{t + 1} + ε_{t + 1} - p_{t} - ε_{t}, p_{s + 1} + ε_{s + 1} - p_{s} - ε_{s}) . \end{array}

(93)

Note this simplifies to

Cov (Δ {\tilde{p}}_{t}, Δ {\tilde{p}}_{s}) = Cov (p_{t + 1} - p_{t}, p_{s + 1} - p_{s}) if | t - s | > 1,

(94)

since in these cases the sampling noise at a given timepoint is not shared between the two estimated allele frequency changes. However, if $| t - s | = 1$, the sampling noise from the shared timepoint ($t + 1$ when $s = t + 1$) enters both changes, biasing the sample estimate of covariance:

\begin{array}{l} Cov (Δ {\tilde{p}}_{t}, Δ {\tilde{p}}_{t + 1}) = Cov (p_{t + 1} + ε_{t + 1} - p_{t} - ε_{t}, p_{t + 2} + ε_{t + 2} - p_{t + 1} - ε_{t + 1}) \\ = Cov (Δ p_{t} + ε_{t + 1} - ε_{t}, Δ p_{t + 1} + ε_{t + 2} - ε_{t + 1}) \\ = Cov (Δ p_{t}, Δ p_{t + 1}) - Var (ε_{t + 1}) . \end{array}

(95)

Similarly, the variance $(t = s)$ is biased upward by the binomial sampling noise at both timepoints t and $t + 1$:

Var (Δ {\tilde{p}}_{t}) = Cov (Δ p_{t} + ε_{t + 1} - ε_{t}, Δ p_{t} + ε_{t + 1} - ε_{t}) = Var (Δ p_{t}) + Var (ε_{t + 1}) + Var (ε_{t}) .

(96)
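The bias terms in Equations 94–96 can be seen in a small simulation. The NumPy sketch below is illustrative only: it assumes neutral Wright–Fisher drift for the true frequencies (no selection), binomial sampling of n chromosomes at every timepoint, and hypothetical settings for the number of loci, generations, N, and n. Covariances across loci are computed as mean products, relying on $E (Δ p) = 0$.

```python
import numpy as np

rng = np.random.default_rng(42)
n_loci, n_gen, N, n = 50_000, 4, 1_000, 50  # hypothetical settings

# True frequencies: neutral Wright-Fisher drift among 2N chromosomes (no selection).
p = np.empty((n_gen + 1, n_loci))
p[0] = rng.uniform(0.1, 0.9, n_loci)
for t in range(n_gen):
    p[t + 1] = rng.binomial(2 * N, p[t]) / (2 * N)

# Observed frequencies: a binomial sample of n chromosomes at every timepoint.
p_obs = rng.binomial(n, p) / n
dp, dp_obs = np.diff(p, axis=0), np.diff(p_obs, axis=0)  # row t holds p_{t+1} - p_t
eps_var = p * (1 - p) / n  # per-locus sampling variance Var(eps_t) given p_t

def mean_prod(a, b):
    """Covariance across loci as a mean product, assuming E(Δp) = 0 as in the text."""
    return float(np.mean(a * b))

t = 1
lag1_bias = mean_prod(dp_obs[t], dp_obs[t + 1]) - mean_prod(dp[t], dp[t + 1])
var_bias = mean_prod(dp_obs[t], dp_obs[t]) - mean_prod(dp[t], dp[t])
lag2_bias = mean_prod(dp_obs[t - 1], dp_obs[t + 1]) - mean_prod(dp[t - 1], dp[t + 1])

print("lag 1:", lag1_bias, "vs Equation 95:", -eps_var[t + 1].mean())
print("variance:", var_bias, "vs Equation 96:", eps_var[t].mean() + eps_var[t + 1].mean())
print("lag 2:", lag2_bias, "vs Equation 94: 0")
```

Each observed-minus-true difference should sit close to its predicted value: negative at lag 1, positive on the diagonal, and near zero once the two changes share no sampled timepoint.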

In practice, these covariances are calculated over loci in a region or across the entire genome. We assume that the tracked allele has been chosen at random (i.e., it is not systematically the minor, major, or reference allele), such that $E (Δ p_{t, l}) = 0$ for all t and l. Then, the unbiased covariance estimate as calculated over loci is

\frac{1}{L} \sum_{l = 1}^{L} (Δ p_{t, l} Δ p_{t + 1, l}) = \frac{1}{L} \sum_{l = 1}^{L} Δ {\tilde{p}}_{t, l} Δ {\tilde{p}}_{t + 1, l} + \frac{1}{L} \sum_{l = 1}^{L} ε_{t + 1, l}^{2} .

(97)

We can then estimate these bias terms with an unbiased plug-in estimate of the frequency sampling variance $E (ε_{t + 1, l}^{2}) = Var (ε_{t + 1, l})$, namely $p_{t + 1, l} (1 - p_{t + 1, l}) / (n_{t + 1, l} - 1)$ with $p_{t + 1, l}$ evaluated at the sample frequency [see p. 191 in Nei (1987)], and add or subtract them from the estimator accordingly. Accounting for finite sampling, the unbiased sample variance–covariance matrix has elements

Q_{t, t + 1} = \frac{1}{L} \sum_{l = 1}^{L} Δ {\tilde{p}}_{t, l} Δ {\tilde{p}}_{t + 1, l} + \frac{1}{L} \sum_{l = 1}^{L} \frac{p_{t + 1, l} (1 - p_{t + 1, l})}{n_{t + 1, l} - 1},

(98)

and variance

Q_{t, t} = \frac{1}{L} \sum_{l = 1}^{L} {(Δ {\tilde{p}}_{t, l})}^{2} - \frac{1}{L} \sum_{l = 1}^{L} \frac{p_{t, l} (1 - p_{t, l})}{n_{t, l} - 1} - \frac{1}{L} \sum_{l = 1}^{L} \frac{p_{t + 1, l} (1 - p_{t + 1, l})}{n_{t + 1, l} - 1} .

(99)
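Equations 98 and 99 translate directly into a matrix computation. The NumPy sketch below is one possible implementation, not the authors’ code: the function name is hypothetical, frequencies are arranged as a (timepoints × loci) array, the plug-in $p (1 - p) / (n - 1)$ is evaluated at the sample frequencies (the population frequencies being unobserved), and every sample is assumed to contain at least two chromosomes. Following the text, it does not mean-center, relying on $E (Δ p_{t, l}) = 0$.

```python
import numpy as np

def corrected_temporal_cov(p_obs, n):
    """Bias-corrected temporal covariance matrix of allele frequency changes (Equations 98, 99).

    p_obs: array of shape (T + 1, L), sample allele frequencies at T + 1 timepoints for L loci.
    n: array of the same shape, numbers of sampled chromosomes (each at least 2).
    Assumes randomly polarized alleles, so E(Δp) = 0 and no mean-centering is applied.
    """
    p_obs = np.asarray(p_obs, dtype=float)
    n = np.asarray(n, dtype=float)
    n_loci = p_obs.shape[1]
    dp = np.diff(p_obs, axis=0)              # (T, L): observed changes, row t = p_{t+1} - p_t
    Q = dp @ dp.T / n_loci                   # uncorrected (1/L) sum of products
    # Plug-in sampling variance p(1 - p)/(n - 1) at the sample frequency, averaged over loci.
    samp = np.mean(p_obs * (1.0 - p_obs) / (n - 1.0), axis=1)  # length T + 1
    T = dp.shape[0]
    for t in range(T):
        Q[t, t] -= samp[t] + samp[t + 1]     # Equation 99
        if t + 1 < T:
            Q[t, t + 1] += samp[t + 1]       # Equation 98
            Q[t + 1, t] += samp[t + 1]       # mirror entry keeps Q symmetric
    return Q
```

Entries with $| t - s | > 1$ are left unchanged, as Equation 94 indicates; the lower off-diagonal entry receives the same correction as the upper one so the returned matrix stays symmetric.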

In Comparing theory to simulation results, we used population frequencies when introducing the method-of-moments estimators $\hat{V_{A} (1)}$ and $\hat{N}$. Here, we discuss the performance of these estimators with sample allele frequencies. Our simulations are identical to those described in Multilocus Simulation Details, except that we have increased the target number of neutral sites in each region to around 10,000. We mimic sampling of $n \in \{50, 100, 200, 500\}$ chromosomes from the population, and use the bias-corrected sample variance–covariance matrix in the method-of-moments approach described in Estimating Linked-Selection Parameters from Temporal Autocovariance to estimate $\hat{V_{A} (1)}$ and $\hat{N}$.

In Figure A3, we show the performance of our estimators when $n = 100$ chromosomes have been sampled from the population. Overall, there are two important differences compared with Figure 5 of the main text. First, while the estimator $\hat{V_{A} (1)}$ performs well at high levels around $V_{A} \sim 0.1$, its variance increases substantially as $V_{A}$ decreases. As the covariances are proportional to $V_{A} / R$, sampling noise grows larger than the theoretical temporal autocovariance as $V_{A}$ becomes smaller. Consequently, one cannot distinguish the chance covariances created by the sampling process from the temporal autocovariance created by linked selection without either a larger sample size or more timepoints. Second, the approach underestimates $V_{A} (1)$ for very loose linkage $(R > 1 / 2)$. This is another consequence of the first problem: as linkage becomes looser, the magnitude of temporal autocovariance becomes weak relative to the sampling noise, the estimation procedure performs poorly, and this noise can be partially absorbed by $\hat{N}$. This effect can be somewhat ameliorated by calculating the sample variance–covariance matrix over a shorter region of the genome such that R is smaller (as long as SNP density is sufficient), or by increasing the sample size. Finally, the estimation of effective population size shown in Figure A3 is also affected by $V_{A}$, as discussed in Estimating Linked-Selection Parameters from Temporal Autocovariance (though the effect is obscured by the sampling noise): high levels of $V_{A}$ in regions with low recombination generally lead to underestimates of $\hat{N}$.

To show how sample size affects the method-of-moments estimators, Figure A4 depicts the median relative error of $\hat{V_{A} (1)}$ and $\hat{N}$ for various sample sizes.

Figure A1 — Here, we show the relative contributions of the linked and unlinked portions of the genome to the temporal autocovariance experienced by a neutral allele, for different numbers of generations elapsed (x-axis) and different map lengths of the unlinked genome (y-axis), for two population sizes ($N = 10^{3}$ and $N = 10^{6}$). The color gradient is determined by the ${log}_{10}$ value of the ratio of unlinked to linked contributions, the terms inside the braces in Equation 52. The dashed line indicates where the log ratio is zero, *i.e.*, where the relative contributions are equal; this was determined numerically. These results assume that LD is determined by mutation–drift–recombination balance (Ohta and Kimura 1971).

Figure A2 — (A) An illustration of gametic $(D')$ and nongametic $(D^{″})$ LD between two loci in a diploid. (B) An illustration of how nongametic LD in generation t is converted to gametic LD through recombination (which happens with probability r, the recombination fraction between the two loci), and is then maintained until generation s with probability ${(1 - r)}^{s - t - 1}$ . The gray loci on the gray gamete indicate the homologous, but not tracked, focal association. Overall, the covariance created by the conversion of nongametic LD to gametic LD is $E (D_{s}^{'} D_{t}^{″}) = r {(1 - r)}^{s - t - 1} E ({(D_{t}^{″})}^{2})$ .

Figure A3 — True parameter values and estimates using the method-of-moments approach on sample ($n = 100$ chromosomes) multilocus simulation data; these figures are analogous to Figure 5 in the main text, except that the estimators have been calculated on sample, rather than population, frequency data. (A) The true $V_{A} (1)$ (x-axis) and $\hat{V_{A} (1)}$ estimated from the sample variance–covariance matrix (y-axis) for each simulation replicate across different levels of recombination (indicated by each point’s color). (B) Estimated drift-effective population sizes $(\hat{N})$ across a range of simulations with different levels of additive genetic variance and recombination. Each point denotes the median, with lines denoting the interquartile range. A simple temporal estimate of the effective population size, estimated without accounting for the effects of selection, is averaged for each replicate and plotted as a dash. The true value $(N = 1000)$ is shown with the dashed gray line.

Figure A4 — The median relative estimation error, over 30 replicate simulations, of the method-of-moments estimator for drift-effective population size $(\hat{N})$ and the initial additive genetic variance $(\hat{V_{A} (1)})$ .

Footnotes

Supplemental material available at FigShare: https://doi.org/10.6084/m9.figshare.7709930.

Communicating editor: N. Barton

Literature Cited

Aguade M., Miyashita N., and Langley C. H., 1989. Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics 122: 607–615. [DOI] [PMC free article] [PubMed] [Google Scholar]
Barghi N., Tobler R., Nolte V., Jakšić A. M., Mallard F. et al. , 2019. Genetic redundancy fuels polygenic adaptation in Drosophila. PLoS Biol. 17: e3000128 10.1371/journal.pbio.3000128 [DOI] [PMC free article] [PubMed] [Google Scholar]
Barton N. H., 1991. Natural and sexual selection on many loci. Genetics 127: 229–255. [DOI] [PMC free article] [PubMed] [Google Scholar]
Barton N. H., 2000. Genetic hitchhiking. Proc. R. Soc. Lond. B Biol. Sci. 355: 1553–1562. 10.1098/rstb.2000.0716 [DOI] [PMC free article] [PubMed] [Google Scholar]
Barton N. H., and Otto S. P., 2005. Evolution of recombination due to random drift. Genetics 169: 2353–2370. 10.1534/genetics.104.032821 [DOI] [PMC free article] [PubMed] [Google Scholar]
Barton N. H., and Turelli M., 1987. Adaptive landscapes, genetic distance and the evolution of quantitative characters. Genet. Res. 49: 157–173. 10.1017/S0016672300026951 [DOI] [PubMed] [Google Scholar]
Begun D. J., and Aquadro C. F., 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356: 519–520. 10.1038/356519a0 [DOI] [PubMed] [Google Scholar]
Begun D. J., Holloway A. K., Stevens K., Hillier L. W., Poh Y.P. et al. , 2007. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 5: e310 10.1371/journal.pbio.0050310 [DOI] [PMC free article] [PubMed] [Google Scholar]
Beissinger T. M., Wang L., Crosby K., Durvasula A., Hufford M. B. et al. , 2016. Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2: 16084 10.1038/nplants.2016.84 [DOI] [PubMed] [Google Scholar]
Bergland A. O., Behrman E. L., O’Brien K. R., Schmidt P. S., and Petrov D. A., 2014. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. PLoS Genet. 10: e1004775 10.1371/journal.pgen.1004775 [DOI] [PMC free article] [PubMed] [Google Scholar]
Bhatia G., Patterson N., and Sankararaman S., and Price A. L., 2013. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23: 1514–1521. 10.1101/gr.154831.113 [DOI] [PMC free article] [PubMed] [Google Scholar]
Bulik-Sullivan B. K., Loh P.-R., Finucane H. K., Ripke S., Yang J. et al. , 2015. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47: 291–295. 10.1038/ng.3211 [DOI] [PMC free article] [PubMed] [Google Scholar]
Bulmer M. G., 1971. The effect of selection on genetic variability. Am. Nat. 105: 201–211. 10.1086/282718 [DOI] [Google Scholar]
Bulmer M. G., 1980. The Mathematical Theory of Quantitative Genetics, Clarendon Press, ‎Oxford. [Google Scholar]
Bürger R., 2000. The Mathematical Theory of Selection, Recombination, and Mutation, John Wiley & Sons, Hoboken, NJ. [Google Scholar]
Burke M. K., Dunham J. P., Shahrestani P., Thornton K. R., Rose M. R., et al. , 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467: 587–590. 10.1038/nature09352 [DOI] [PubMed] [Google Scholar]
Burri R., 2017. Interpreting differentiation landscapes in the light of long-term linked selection. Evol. Lett. 1: 118–131. 10.1002/evl3.14 [DOI] [Google Scholar]
Burt A., 1995. The evolution of fitness. Evolution 49: 1–8. 10.1111/j.1558-5646.1995.tb05954 [DOI] [PubMed] [Google Scholar]
Cannings C., 1974. The latent roots of certain Markov chains arising in genetics: a new approach, I. Haploid models. Adv. Appl. Probab. 7: 260–290. [Google Scholar]
Charlesworth B., 2009. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205. 10.1038/nrg2526 [DOI] [PubMed] [Google Scholar]
Charlesworth B., 2015. Causes of natural variation in fitness: evidence from studies of Drosophila populations. Proc. Natl. Acad. Sci. USA 112: 1662–1669 (erratum: Proc. Natl. Acad. Sci. USA 112: E1049). 10.1073/pnas.1423275112 [DOI] [PMC free article] [PubMed] [Google Scholar]
Charlesworth B., Morgan M. T., and Charlesworth D., 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chevin L. M. and Hospital F.. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180: 1645–1660. 10.1534/genetics.108.093351 [DOI] [PMC free article] [PubMed] [Google Scholar]
Christensen R., 2011. Plane Answers to Complex Questions/: The Theory of Linear Models, Springer-Verlag, Heidelberg, Germany: 10.1007/978-1-4419-9816-3 [DOI] [Google Scholar]
Coop G., 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv. Available at: https://www.biorxiv.org/content/10.1101/042598v1. 10.1101/042598 [DOI] [Google Scholar]
Corbett-Detig R. B., Hartl D. L., and Sackton T. B., 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13: e1002112 10.1371/journal.pbio.1002112 [DOI] [PMC free article] [PubMed] [Google Scholar]
Crouch D. J. M. 2017. Statistical aspects of evolution under natural selection, with implications for the advantage of sexual reproduction. J. Theor. Biol. 431: 79–86. 10.1016/j.jtbi.2017.07.021 [DOI] [PubMed] [Google Scholar]
Crow J. F., and Kimura M., 1970. An Introduction to Population Genetics Theory, Harper & Row, Publishers, New York. [Google Scholar]
Cutter A. D. and Payseur B. A.. 2013. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat. Rev. Genet. 14: 262–274. 10.1038/nrg3425 [DOI] [PMC free article] [PubMed] [Google Scholar]
Dobzhansky T., 1943. Genetics of natural populations IX. Temporal changes in the composition of populations of Drosophila pseudoobscura. Genetics 28: 162–186. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dobzhansky T., 1971. Evolutionary oscillations in Drosophila pseudoobscura, pp. 109–133 in Ecological Genetics and Evolution, edited by R. Creed. Springer-Verlag, New York: 10.1007/978-1-4757-0432-7_6 [DOI] [Google Scholar]
Edmonds C. A., Lillie A. S., and Cavalli-Sforza L. L.. 2004. Mutations arising in the wave front of an expanding population. Proc. Natl. Acad. Sci. U. S. A. 101: 975–979. [DOI] [PMC free article] [PubMed] [Google Scholar]
Elyashiv E., Sattath S., Hu T. T., Strutsovsky A., McVicker G., et al., 2016. A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 12: e1006130. 10.1371/journal.pgen.1006130
Endler J. A., 1986. Natural Selection in the Wild, Princeton University Press, Princeton, NJ.
Excoffier L., and Ray N., 2008. Surfing during population expansions promotes genetic revolutions and structuration. Trends Ecol. Evol. 23: 347–351. 10.1016/j.tree.2008.04.004
Feder A. F., Kryazhimskiy S., and Plotkin J. B., 2014. Identifying signatures of selection in genetic time series. Genetics 196: 509–522 [corrigenda: Genetics 210: 1559 (2018)]. 10.1534/genetics.113.158220
Fisher R. A., and Ford E. B., 1947. The spread of a gene in natural conditions in a colony of the moth Panaxia dominula. Heredity (Edinb.) 1: 143–174. 10.1038/hdy.1947.11
Franssen S. U., Kofler R., and Schlötterer C., 2017. Uncovering the genetic signature of quantitative trait evolution with replicated time series data. Heredity (Edinb.) 118: 42–51. 10.1038/hdy.2016.98
Fu Q., Posth C., Hajdinjak M., Petr M., Mallick S., et al., 2016. The genetic history of Ice Age Europe. Nature 534: 200–205. 10.1038/nature17993
Gillespie J. H., 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155: 909–919.
Gillespie J. H., 2001. Is the population size of a species relevant to its evolution? Evolution 55: 2161–2169. 10.1111/j.0014-3820.2001.tb00732.x
Gingerich P. D., 1983. Rates of evolution: effects of time and temporal scaling. Science 222: 159–161. 10.1126/science.222.4620.159
Good B. H., Walczak A. M., Neher R. A., and Desai M. M., 2014. Genetic diversity in the interference selection limit. PLoS Genet. 10: e1004222. 10.1371/journal.pgen.1004222
Good B. H., McDonald M. J., Barrick J. E., Lenski R. E., and Desai M. M., 2017. The dynamics of molecular evolution over 60,000 generations. Nature 551: 45–50. 10.1038/nature24287
Haldane J. B. S., 1919. The combination of linkage values and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309.
Hallatschek O., and Nelson D. R., 2008. Gene surfing in expanding populations. Theor. Popul. Biol. 73: 158–170. 10.1016/j.tpb.2007.08.008
Hansen L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–1054. 10.2307/1912775
Hendry A. P., and Kinnison M. T., 1999. Perspective: the pace of modern life: measuring rates of contemporary microevolution. Evolution 53: 1637–1653. 10.1111/j.1558-5646.1999.tb04550.x
Hendry A. P., Schoen D. J., Wolak M. E., and Reid J. M., 2018. The contemporary evolution of fitness. Annu. Rev. Ecol. Evol. Syst. 49: 457–476.
Hermisson J., and Pennings P. S., 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352. 10.1534/genetics.104.036947
Hernandez R. D., Kelley J. L., Elyashiv E., and Melton S. C., 2011. Classic selective sweeps were rare in recent human evolution. Science 331: 920–924. 10.1126/science.1198878
Heyer E., Sibert A., and Austerlitz F., 2005. Cultural transmission of fitness: genes take the fast lane. Trends Genet. 21: 234–239. 10.1016/j.tig.2005.02.007
Hill W. G., and Robertson A., 1966. The effect of linkage on limits to artificial selection. Genet. Res. 8: 269–294. 10.1017/S0016672300010156
Hill W. G., and Robertson A., 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231. 10.1007/BF01245622
Höllinger I., Pennings P. S., and Hermisson J., 2019. Polygenic adaptation: from sweeps to subtle frequency shifts. PLoS Genet. 15: e1008035. 10.1371/journal.pgen.1008035
Hudson R. R., 1994. How can the low levels of DNA sequence variation in regions of the Drosophila genome with low recombination rates be explained? Proc. Natl. Acad. Sci. USA 91: 6815–6818. 10.1073/pnas.91.15.6815
Hudson R. R., and Kaplan N. L., 1995. Deleterious background selection with recombination. Genetics 141: 1605–1617.
Jain K., and Stephan W., 2015. Response of polygenic traits under stabilizing selection and mutation when loci have unequal effects. G3 (Bethesda) 5: 1065–1074. 10.1534/g3.115.017970
Jain K., and Stephan W., 2017. Rapid adaptation of a polygenic trait after a sudden environmental shift. Genetics 206: 389–406. 10.1534/genetics.116.196972
Johansson A. M., Pettersson M. E., Siegel P. B., and Carlborg Ö., 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6: e1001188. 10.1371/journal.pgen.1001188
Kaplan N. L., Hudson R. R., and Langley C. H., 1989. The “hitchhiking effect” revisited. Genetics 123: 887–899.
Keinan A., and Reich D., 2010. Human population differentiation is strongly correlated with local recombination rate. PLoS Genet. 6: e1000886. 10.1371/journal.pgen.1000886
Kelleher J., Etheridge A. M., and McVean G., 2016. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12: e1004842.
Kendall M., Stuart A., Ord J. K., and O’Hagan A., 1994. Kendall’s Advanced Theory of Statistics, Vol. 1: Distribution Theory, Ed. 6. Edward Arnold and Wiley-Interscience, New York.
Kettlewell H. B. D., 1958. A survey of the frequencies of Biston betularia (L.) (Lep.) and its melanic forms in Great Britain. Heredity (Edinb.) 12: 51–72.
Kettlewell H. B. D., 1961. The phenomenon of industrial melanism in Lepidoptera. Annu. Rev. Entomol. 6: 245–262. 10.1146/annurev.en.06.010161.001333
Kimura M., 1984. The Neutral Theory of Molecular Evolution, Cambridge University Press, Cambridge.
Kinnison M. T., and Hendry A. P., 2001. The pace of modern life II: from rates of contemporary microevolution to pattern and process. Genetica 112–113: 145–164. 10.1023/A:1013375419520
Kirkpatrick M., Johnson T., and Barton N., 2002. General models of multilocus evolution. Genetics 161: 1727–1750.
Kopp M., and Hermisson J., 2007. Adaptation of a quantitative trait to a moving optimum. Genetics 176: 715–719. 10.1534/genetics.106.067215
Kopp M., and Hermisson J., 2009a. The genetic basis of phenotypic adaptation I: fixation of beneficial mutations in the moving optimum model. Genetics 182: 233–249. 10.1534/genetics.108.099820
Kopp M., and Hermisson J., 2009b. The genetic basis of phenotypic adaptation II: the distribution of adaptive substitutions in the moving optimum model. Genetics 183: 1453–1476. 10.1534/genetics.109.106195
Krimbas C. B., and Tsakas S., 1971. The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control: selection or drift? Evolution 25: 454–460. 10.1111/j.1558-5646.1971.tb01904.x
Kruuk L. E. B., 2004. Estimating genetic parameters in natural populations using the “animal model.” Philos. Trans. R. Soc. Lond. B Biol. Sci. 359: 873–890.
Kruuk L. E. B., Clutton-Brock T. H., Slate J., Pemberton J. M., Brotherstone S., et al., 2000. Heritability of fitness in a wild mammal population. Proc. Natl. Acad. Sci. USA 97: 698–703.
Lande R., 1979. Quantitative genetic analysis of multivariate evolution, applied to brain: body size allometry. Evolution 33: 402.
Leffler E. M., Bullaughey K., Matute D. R., Meyer W. K., Ségurel L., et al., 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10: e1001388.
Lewontin R. C., 1974. The Genetic Basis of Evolutionary Change, Vol. 560, Columbia University Press, New York.
Lynch M., and Walsh B., 1998. Genetics and Analysis of Quantitative Traits, Vol. I, Sinauer Associates, Sunderland, MA.
Malaspinas A. S., Malaspinas O., Evans S. N., and Slatkin M., 2012. Estimating allele age and selection coefficient from time-serial data. Genetics 192: 599–607.
Mathieson I., and McVean G., 2013. Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193: 973–984.
Mathieson I., Lazaridis I., Rohland N., Mallick S., Patterson N., et al., 2015. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528: 499–503. 10.1038/nature16152
McVicker G., Gordon D., Davis C., and Green P., 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5: e1000471. 10.1371/journal.pgen.1000471
Messer P. W., Ellner S. P., and Hairston N. G., 2016. Can population genetics adapt to rapid evolution? Trends Genet. 32: 408–418.
Morley F. H. W., 1954. Selection for economic characters in Australian Merino sheep. Aust. J. Agric. Res. 5: 305–316. 10.1071/AR9540305
Mousseau T. A., and Roff D. A., 1987. Natural selection and the heritability of fitness components. Heredity (Edinb.) 59: 181–197. 10.1038/hdy.1987.113
Mueller L. D., Wilcox B. A., Ehrlich P. R., and Heckel D. G., 1985a. A direct assessment of the role of genetic drift in determining allele frequency variation in populations of Euphydryas editha. Genetics 110: 495–511.
Mueller L. D., Barr L. G., and Ayala F. J., 1985b. Natural selection vs. random drift: evidence from temporal variation in allele frequencies in nature. Genetics 111: 517–554.
Nachman M. W., and Payseur B. A., 2012. Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367: 409–421. 10.1098/rstb.2011.0249
Neher R. A., 2013. Genetic draft, selective interference, and population genetics of rapid adaptation. Annu. Rev. Ecol. Evol. Syst. 44: 195–215. 10.1146/annurev-ecolsys-110512-135920
Neher R. A., and Shraiman B. I., 2011. Genetic draft and quasi-neutrality in large facultatively sexual populations. Genetics 188: 975–996. 10.1534/genetics.111.128876
Nei M., 1987. Molecular Evolutionary Genetics, Columbia University Press, New York. 10.7312/nei-92038
Nei M., and Tajima F., 1981. Genetic drift and estimation of effective population size. Genetics 98: 625–640.
Nordborg M., Charlesworth B., and Charlesworth D., 1996. The effect of recombination on background selection. Genet. Res. 67: 159–174. 10.1017/S0016672300033619
Nordborg M., Hu T. T., Ishino Y., Jhaveri J., Toomajian C., et al., 2005. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3: e196. 10.1371/journal.pbio.0030196
Ohta T., and Kimura M., 1969. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63: 229–238.
Ohta T., and Kimura M., 1971. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68: 571–580.
Orozco-terWengel P., Kapun M., Nolte V., Kofler R., Flatt T., et al., 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21: 4931–4941. 10.1111/j.1365-294X.2012.05673.x
Pennings P. S., and Hermisson J., 2006. Soft sweeps II: molecular population genetics of adaptation from recurrent mutation or migration. Mol. Biol. Evol. 23: 1076–1084. 10.1093/molbev/msj117
Pollak E., 1983. A new method for estimating the effective population size from allele frequency changes. Genetics 104: 531–548.
Price G. R., 1970. Selection and covariance. Nature 227: 520–521. 10.1038/227520a0
Prout T., 1954. Genetic drift in irradiated experimental populations of Drosophila melanogaster. Genetics 39: 529–545.
Rajpurohit S., Gefen E., Bergland A. O., Petrov D. A., Gibbs A. G., et al., 2018. Spatiotemporal dynamics and genome-wide association analysis of desiccation tolerance in Drosophila melanogaster. Mol. Ecol. 27: 3525–3540. 10.1111/mec.14814
Robertson A., 1960. A theory of limits in artificial selection. Proc. R. Soc. Lond. B Biol. Sci. 153: 234–249. 10.1098/rspb.1960.0099
Robertson A., 1961. Inbreeding in artificial selection programmes. Genet. Res. 2: 189–194. 10.1017/S0016672300000690
Robertson A., 1966. A mathematical model of the culling process in dairy cattle. Anim. Sci. 8: 95–108. 10.1017/S0003356100037752
Robertson A., 1970. A theory of limits in artificial selection with many linked loci, pp. 246–288 in Mathematical Topics in Population Genetics, edited by Kojima K.-I., Springer, Berlin, Heidelberg. 10.1007/978-3-642-46244-3_8
Robertson A., 1976. Artificial selection with a large number of linked loci, pp. 307–322 in Proceedings of the International Conference on Quantitative Genetics, edited by Pollak E., Kempthorne O., and Bailey T. B., Iowa State University Press, Ames, Iowa.
Rockman M. V., Skrovanek S. S., and Kruglyak L., 2010. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science 330: 372–376. 10.1126/science.1194208
Santiago E., and Caballero A., 1995. Effective size of populations under selection. Genetics 139: 1013–1030.
Santiago E., and Caballero A., 1998. Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics 149: 2105–2117.
Schmid K. J., Ramos-Onsins S., Ringys-Beckstein H., Weisshaar B., and Mitchell-Olds T., 2005. A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics 169: 1601–1615. 10.1534/genetics.104.033795
Sella G., Petrov D. A., Przeworski M., and Andolfatto P., 2009. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5: e1000495. 10.1371/journal.pgen.1000495
Shaw R. G., and Shaw F. H., 2013. Quantitative genetic study of the adaptive process. Heredity (Edinb.) 112: 13–20. 10.1038/hdy.2013.42
Maynard Smith J., and Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.
Stephan W., Wiehe T. H. E., and Lenz M. W., 1992. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41: 237–354. 10.1016/0040-5809(92)90045-U
Stephan W., Song Y. S., and Langley C. H., 2006. The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics 172: 2647–2663. 10.1534/genetics.105.050179
Sved J. A., 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2: 125–141. 10.1016/0040-5809(71)90011-6
Tenesa A., Navarro P., Hayes B. J., Duffy D. L., Clarke G. M., et al., 2007. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526. 10.1101/gr.6023607
Teotónio H., Chelo I. M., Bradić M., Rose M. R., and Long A. D., 2009. Experimental evolution reveals natural selection on standing genetic variation. Nat. Genet. 41: 251–257. 10.1038/ng.289
Teplitsky C., Mills J. A., Yarrall J. W., and Merilä J., 2009. Heritability of fitness components in a wild bird population. Evolution 63: 716–726.
Terhorst J., Schlötterer C., and Song Y. S., 2015. Multi-locus analysis of genomic time series data from experimental evolution. PLoS Genet. 11: e1005069. 10.1371/journal.pgen.1005069
Thornton K. R., 2018. Polygenic adaptation to an environmental shift: temporal dynamics of variation under Gaussian stabilizing selection and additive effects on a single trait. bioRxiv. 10.1101/505750
Turelli M., 1988. Population genetic models for polygenic variation and evolution, pp. 601–608 in Proceedings of the Second International Conference on Quantitative Genetics, edited by Weir B. S., Eisen E. J., Goodman M. M., and Namkoong G., Sinauer Associates, Sunderland, MA.
Turelli M., and Barton N. H., 1990. Dynamics of polygenic characters under selection. Theor. Popul. Biol. 38: 1–57. 10.1016/0040-5809(90)90002-D
Turelli M., and Barton N. H., 1994. Genetic and statistical analyses of strong selection on polygenic traits: what, me normal? Genetics 138: 913–941.
Turner T. L., and Miller P. M., 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191: 633–642. 10.1534/genetics.112.139337
Turner T. L., Stewart A. D., Fields A. T., Rice W. R., and Tarone A. M., 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7: e1001336.
Wallace B., 1956. Studies on irradiated populations of Drosophila melanogaster. J. Genet. 54: 280–293. 10.1007/BF02982782
Walsh B., and Lynch M., 2018. Evolution and Selection of Quantitative Traits, Oxford University Press, Oxford. 10.1093/oso/9780198830870.001.0001
Wang J., and Whitlock M. C., 2003. Estimating effective population size and migration rates from genetic samples over space and time. Genetics 163: 429–446.
Waples R. S., 1989. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121: 379–391.
Watterson G. A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276. 10.1016/0040-5809(75)90020-9
Weir B. S., 1996. Genetic Data Analysis II: Methods for Discrete Population Genetic Data, Sinauer Associates, Sunderland, MA.
Wiehe T. H., and Stephan W., 1993. Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Mol. Biol. Evol. 10: 842–854.
Williamson R. J., Josephs E. B., Platts A. E., Hazzouri K. M., Haudry A., et al., 2014. Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora. PLoS Genet. 10: e1004622.
Wray N. R., and Thompson R., 1990. Prediction of rates of inbreeding in selected populations. Genet. Res. 55: 41–54.
Woolliams J. A., Wray N. R., and Thompson R., 1993. Prediction of long-term contributions and inbreeding in populations undergoing mass selection. Genet. Res. 62: 231–242.
Wright S., 1931. Evolution in Mendelian populations. Genetics 16: 97–159.
Wright S., 1938. Size of population and breeding structure in relation to evolution. Science 87: 430–431.


[bib1] Aguade M., Miyashita N., and Langley C. H., 1989. Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics 122: 607–615. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Barghi N., Tobler R., Nolte V., Jakšić A. M., Mallard F. et al. , 2019. Genetic redundancy fuels polygenic adaptation in Drosophila. PLoS Biol. 17: e3000128 10.1371/journal.pbio.3000128 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Barton N. H., 1991. Natural and sexual selection on many loci. Genetics 127: 229–255. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Barton N. H., 2000. Genetic hitchhiking. Proc. R. Soc. Lond. B Biol. Sci. 355: 1553–1562. 10.1098/rstb.2000.0716 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Barton N. H., and Otto S. P., 2005. Evolution of recombination due to random drift. Genetics 169: 2353–2370. 10.1534/genetics.104.032821 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Barton N. H., and Turelli M., 1987. Adaptive landscapes, genetic distance and the evolution of quantitative characters. Genet. Res. 49: 157–173. 10.1017/S0016672300026951 [DOI] [PubMed] [Google Scholar]

[bib7] Begun D. J., and Aquadro C. F., 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356: 519–520. 10.1038/356519a0 [DOI] [PubMed] [Google Scholar]

[bib8] Begun D. J., Holloway A. K., Stevens K., Hillier L. W., Poh Y.P. et al. , 2007. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 5: e310 10.1371/journal.pbio.0050310 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Beissinger T. M., Wang L., Crosby K., Durvasula A., Hufford M. B. et al. , 2016. Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2: 16084 10.1038/nplants.2016.84 [DOI] [PubMed] [Google Scholar]

[bib10] Bergland A. O., Behrman E. L., O’Brien K. R., Schmidt P. S., and Petrov D. A., 2014. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. PLoS Genet. 10: e1004775 10.1371/journal.pgen.1004775 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Bhatia G., Patterson N., and Sankararaman S., and Price A. L., 2013. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23: 1514–1521. 10.1101/gr.154831.113 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Bulik-Sullivan B. K., Loh P.-R., Finucane H. K., Ripke S., Yang J. et al. , 2015. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47: 291–295. 10.1038/ng.3211 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] Bulmer M. G., 1971. The effect of selection on genetic variability. Am. Nat. 105: 201–211. 10.1086/282718 [DOI] [Google Scholar]

[bib14] Bulmer M. G., 1980. The Mathematical Theory of Quantitative Genetics, Clarendon Press, ‎Oxford. [Google Scholar]

[bib15] Bürger R., 2000. The Mathematical Theory of Selection, Recombination, and Mutation, John Wiley & Sons, Hoboken, NJ. [Google Scholar]

[bib16] Burke M. K., Dunham J. P., Shahrestani P., Thornton K. R., Rose M. R., et al. , 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467: 587–590. 10.1038/nature09352 [DOI] [PubMed] [Google Scholar]

[bib17] Burri R., 2017. Interpreting differentiation landscapes in the light of long-term linked selection. Evol. Lett. 1: 118–131. 10.1002/evl3.14 [DOI] [Google Scholar]

[bib18] Burt A., 1995. The evolution of fitness. Evolution 49: 1–8. 10.1111/j.1558-5646.1995.tb05954 [DOI] [PubMed] [Google Scholar]

[bib19] Cannings C., 1974. The latent roots of certain Markov chains arising in genetics: a new approach, I. Haploid models. Adv. Appl. Probab. 7: 260–290. [Google Scholar]

[bib20] Charlesworth B., 2009. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205. 10.1038/nrg2526 [DOI] [PubMed] [Google Scholar]

[bib21] Charlesworth B., 2015. Causes of natural variation in fitness: evidence from studies of Drosophila populations. Proc. Natl. Acad. Sci. USA 112: 1662–1669 (erratum: Proc. Natl. Acad. Sci. USA 112: E1049). 10.1073/pnas.1423275112 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] Charlesworth B., Morgan M. T., and Charlesworth D., 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Chevin L. M. and Hospital F.. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180: 1645–1660. 10.1534/genetics.108.093351 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Christensen R., 2011. Plane Answers to Complex Questions/: The Theory of Linear Models, Springer-Verlag, Heidelberg, Germany: 10.1007/978-1-4419-9816-3 [DOI] [Google Scholar]

[bib25] Coop G., 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv. Available at: https://www.biorxiv.org/content/10.1101/042598v1. 10.1101/042598 [DOI] [Google Scholar]

[bib26] Corbett-Detig R. B., Hartl D. L., and Sackton T. B., 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13: e1002112 10.1371/journal.pbio.1002112 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Crouch D. J. M. 2017. Statistical aspects of evolution under natural selection, with implications for the advantage of sexual reproduction. J. Theor. Biol. 431: 79–86. 10.1016/j.jtbi.2017.07.021 [DOI] [PubMed] [Google Scholar]

[bib28] Crow J. F., and Kimura M., 1970. An Introduction to Population Genetics Theory, Harper & Row, Publishers, New York. [Google Scholar]

[bib29] Cutter A. D. and Payseur B. A.. 2013. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat. Rev. Genet. 14: 262–274. 10.1038/nrg3425 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Dobzhansky T., 1943. Genetics of natural populations IX. Temporal changes in the composition of populations of Drosophila pseudoobscura. Genetics 28: 162–186. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Dobzhansky T., 1971. Evolutionary oscillations in Drosophila pseudoobscura, pp. 109–133 in Ecological Genetics and Evolution, edited by R. Creed. Springer-Verlag, New York: 10.1007/978-1-4757-0432-7_6 [DOI] [Google Scholar]

[bib32] Edmonds C. A., Lillie A. S., and Cavalli-Sforza L. L.. 2004. Mutations arising in the wave front of an expanding population. Proc. Natl. Acad. Sci. U. S. A. 101: 975–979. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Elyashiv E., and Sattath S., Hu T. T., Strutsovsky A., McVicker G. et al. , 2016. A Genomic map of the effects of linked selection in Drosophila. PLoS Genet. 12: e1006130 10.1371/journal.pgen.1006130 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Endler J. A., 1986. Natural Selection in the Wild, Princeton University Press, Princeton, NY. [Google Scholar]

[bib35] Excoffier L., and Ray N., 2008. Surfing during population expansions promotes genetic revolutions and structuration. Trends Ecol. Evol. 23: 347–351. 10.1016/j.tree.2008.04.004 [DOI] [PubMed] [Google Scholar]

[bib36] Feder, A. F., S. Kryazhimskiy, and J. B. Plotkin, 2014 Identifying signatures of selection in genetic time series. Genetics 196: 509–522 [corrigenda: Genetics 210: 1559 (2018)]. 10.1534/genetics.113.158220 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Fisher R. A., and Ford E. B., 1947. The spread of a gene in natural conditions in a colony of the moth Panaxia dominula. Heredity (Edinb.) 1: 143–174. 10.1038/hdy.1947.11

[bib38] Franssen S. U., Kofler R., and Schlötterer C., 2017. Uncovering the genetic signature of quantitative trait evolution with replicated time series data. Heredity (Edinb.) 118: 42–51. 10.1038/hdy.2016.98

[bib39] Fu Q., Posth C., Hajdinjak M., Petr M., Mallick S., et al., 2016. The genetic history of Ice Age Europe. Nature 534: 200–205. 10.1038/nature17993

[bib40] Gillespie J. H., 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155: 909–919.

[bib41] Gillespie J. H., 2001. Is the population size of a species relevant to its evolution? Evolution 55: 2161–2169. 10.1111/j.0014-3820.2001.tb00732.x

[bib42] Gingerich P. D., 1983. Rates of evolution: effects of time and temporal scaling. Science 222: 159–161. 10.1126/science.222.4620.159

[bib43] Good B. H., Walczak A. M., Neher R. A., and Desai M. M., 2014. Genetic diversity in the interference selection limit. PLoS Genet. 10: e1004222. 10.1371/journal.pgen.1004222

[bib44] Good B. H., McDonald M. J., Barrick J. E., Lenski R. E., and Desai M. M., 2017. The dynamics of molecular evolution over 60,000 generations. Nature 551: 45–50. 10.1038/nature24287

[bib45] Haldane J. B. S., 1919. The combination of linkage values and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309.

[bib46] Hallatschek O., and Nelson D. R., 2008. Gene surfing in expanding populations. Theor. Popul. Biol. 73: 158–170. 10.1016/j.tpb.2007.08.008

[bib47] Hansen L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–1054. 10.2307/1912775

[bib48] Hendry A. P., and Kinnison M. T., 1999. Perspective: the pace of modern life: measuring rates of contemporary microevolution. Evolution 53: 1637–1653. 10.1111/j.1558-5646.1999.tb04550.x

[bib49] Hendry A. P., Schoen D. J., Wolak M. E., and Reid J. M., 2018. The contemporary evolution of fitness. Annu. Rev. Ecol. Evol. Syst. 49: 457–476.

[bib50] Hermisson J., and Pennings P. S., 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352. 10.1534/genetics.104.036947

[bib51] Hernandez R. D., Kelley J. L., Elyashiv E., Melton S. C., et al., 2011. Classic selective sweeps were rare in recent human evolution. Science 331: 920–924. 10.1126/science.1198878

[bib52] Heyer E., Sibert A., and Austerlitz F., 2005. Cultural transmission of fitness: genes take the fast lane. Trends Genet. 21: 234–239. 10.1016/j.tig.2005.02.007

[bib53] Hill W. G., and Robertson A., 1966. The effect of linkage on limits to artificial selection. Genet. Res. 8: 269–294. 10.1017/S0016672300010156

[bib54] Hill W. G., and Robertson A., 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231. 10.1007/BF01245622

[bib55] Höllinger I., Pennings P. S., and Hermisson J., 2019. Polygenic adaptation: from sweeps to subtle frequency shifts. PLoS Genet. 15: e1008035. 10.1371/journal.pgen.1008035

[bib56] Hudson R. R., 1994. How can the low levels of DNA sequence variation in regions of the Drosophila genome with low recombination rates be explained? Proc. Natl. Acad. Sci. USA 91: 6815–6818. 10.1073/pnas.91.15.6815

[bib57] Hudson R. R., and Kaplan N. L., 1995. Deleterious background selection with recombination. Genetics 141: 1605–1617.

[bib58] Jain K., and Stephan W., 2015. Response of polygenic traits under stabilizing selection and mutation when loci have unequal effects. G3 (Bethesda) 5: 1065–1074. 10.1534/g3.115.017970

[bib59] Jain K., and Stephan W., 2017. Rapid adaptation of a polygenic trait after a sudden environmental shift. Genetics 206: 389–406. 10.1534/genetics.116.196972

[bib60] Johansson A. M., Pettersson M. E., Siegel P. B., and Carlborg Ö., 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6: e1001188. 10.1371/journal.pgen.1001188

[bib61] Kaplan N. L., Hudson R. R., and Langley C. H., 1989. The “hitchhiking effect” revisited. Genetics 123: 887–899.

[bib62] Keinan A., and Reich D., 2010. Human population differentiation is strongly correlated with local recombination rate. PLoS Genet. 6: e1000886. 10.1371/journal.pgen.1000886

[bib63] Kelleher J., Etheridge A. M., and McVean G., 2016. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12: e1004842.

[bib64] Kendall M., Stuart A., Ord J. K., and O’Hagan A., 1994. Kendall’s Advanced Theory of Statistics, Vol. 1: Distribution Theory, Ed. 6. Edward Arnold, Wiley-Interscience, New York.

[bib65] Kettlewell H. B. D., 1958. A survey of the frequencies of Biston betularia (L.) (Lep.) and its melanic forms in Great Britain. Heredity (Edinb.) 12: 51–72.

[bib66] Kettlewell H. B. D., 1961. The phenomenon of industrial melanism in Lepidoptera. Annu. Rev. Entomol. 6: 245–262. 10.1146/annurev.en.06.010161.001333

[bib67] Kimura M., 1984. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.

[bib68] Kinnison M. T., and Hendry A. P., 2001. The pace of modern life II: from rates of contemporary microevolution to pattern and process. Genetica 112–113: 145–164. 10.1023/A:1013375419520

[bib69] Kirkpatrick M., Johnson T., and Barton N., 2002. General models of multilocus evolution. Genetics 161: 1727–1750.

[bib70] Kopp M., and Hermisson J., 2007. Adaptation of a quantitative trait to a moving optimum. Genetics 176: 715–719. 10.1534/genetics.106.067215

[bib71] Kopp M., and Hermisson J., 2009a. The genetic basis of phenotypic adaptation I: fixation of beneficial mutations in the moving optimum model. Genetics 182: 233–249. 10.1534/genetics.108.099820

[bib72] Kopp M., and Hermisson J., 2009b. The genetic basis of phenotypic adaptation II: the distribution of adaptive substitutions in the moving optimum model. Genetics 183: 1453–1476. 10.1534/genetics.109.106195

[bib73] Krimbas C. B., and Tsakas S., 1971. The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control: selection or drift? Evolution 25: 454–460. 10.1111/j.1558-5646.1971.tb01904.x

[bib74] Kruuk L. E. B., 2004. Estimating genetic parameters in natural populations using the “animal model.” Philos. Trans. R. Soc. Lond. B Biol. Sci. 359: 873–890.

[bib75] Kruuk L. E. B., Clutton-Brock T. H., Slate J., Pemberton J. M., Brotherstone S., et al., 2000. Heritability of fitness in a wild mammal population. Proc. Natl. Acad. Sci. USA 97: 698–703.

[bib76] Lande R., 1979. Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33: 402.

[bib77] Leffler E. M., Bullaughey K., Matute D. R., Meyer W. K., Ségurel L., et al., 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10: e1001388.

[bib78] Lewontin R. C., 1974. The Genetic Basis of Evolutionary Change, Vol. 560. Columbia University Press, New York.

[bib79] Lynch M., and Walsh B., 1998. Genetics and Analysis of Quantitative Traits, Vol. I. Sinauer Associates, Sunderland, MA.

[bib80] Malaspinas A. S., Malaspinas O., Evans S. N., and Slatkin M., 2012. Estimating allele age and selection coefficient from time-serial data. Genetics 192: 599–607.

[bib81] Mathieson I., and McVean G., 2013. Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193: 973–984.

[bib82] Mathieson I., Lazaridis I., Rohland N., Mallick S., Patterson N., et al., 2015. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528: 499–503. 10.1038/nature16152

[bib83] McVicker G., Gordon D., Davis C., and Green P., 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5: e1000471. 10.1371/journal.pgen.1000471

[bib84] Messer P. W., Ellner S. P., and Hairston N. G., 2016. Can population genetics adapt to rapid evolution? Trends Genet. 32: 408–418.

[bib85] Morley F. H. W., 1954. Selection for economic characters in Australian Merino sheep. Aust. J. Agric. Res. 5: 305–316. 10.1071/AR9540305

[bib86] Mousseau T. A., and Roff D. A., 1987. Natural selection and the heritability of fitness components. Heredity (Edinb.) 59: 181–197. 10.1038/hdy.1987.113

[bib87] Mueller L. D., Wilcox B. A., Ehrlich P. R., and Heckel D. G., 1985a. A direct assessment of the role of genetic drift in determining allele frequency variation in populations of Euphydryas editha. Genetics 110: 495–511.

[bib88] Mueller L. D., Barr L. G., and Ayala F. J., 1985b. Natural selection vs. random drift: evidence from temporal variation in allele frequencies in nature. Genetics 111: 517–554.

[bib89] Nachman M. W., and Payseur B. A., 2012. Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367: 409–421. 10.1098/rstb.2011.0249

[bib90] Neher R. A., 2013. Genetic draft, selective interference, and population genetics of rapid adaptation. Annu. Rev. Ecol. Evol. Syst. 44: 195–215. 10.1146/annurev-ecolsys-110512-135920

[bib91] Neher R. A., and Shraiman B. I., 2011. Genetic draft and quasi-neutrality in large facultatively sexual populations. Genetics 188: 975–996. 10.1534/genetics.111.128876

[bib92] Nei M., 1987. Molecular Evolutionary Genetics. Columbia University Press, New York. 10.7312/nei-92038

[bib93] Nei M., and Tajima F., 1981. Genetic drift and estimation of effective population size. Genetics 98: 625–640.

[bib94] Nordborg M., Charlesworth B., and Charlesworth D., 1996. The effect of recombination on background selection. Genet. Res. 67: 159–174. 10.1017/S0016672300033619

[bib95] Nordborg M., Hu T. T., Ishino Y., Jhaveri J., Toomajian C., et al., 2005. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3: e196. 10.1371/journal.pbio.0030196

[bib96] Ohta T., and Kimura M., 1969. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63: 229–238.

[bib97] Ohta T., and Kimura M., 1971. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68: 571–580.

[bib98] Orozco-terWengel P., Kapun M., Nolte V., Kofler R., Flatt T., et al., 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21: 4931–4941. 10.1111/j.1365-294X.2012.05673.x

[bib99] Pennings P. S., and Hermisson J., 2006. Soft sweeps II: molecular population genetics of adaptation from recurrent mutation or migration. Mol. Biol. Evol. 23: 1076–1084. 10.1093/molbev/msj117

[bib100] Pollak E., 1983. A new method for estimating the effective population size from allele frequency changes. Genetics 104: 531–548.

[bib102] Price G. R., 1970. Selection and covariance. Nature 227: 520–521. 10.1038/227520a0

[bib103] Prout T., 1954. Genetic drift in irradiated experimental populations of Drosophila melanogaster. Genetics 39: 529–545.

[bib104] Rajpurohit S., Gefen E., Bergland A. O., Petrov D. A., Gibbs A. G., et al., 2018. Spatiotemporal dynamics and genome-wide association analysis of desiccation tolerance in Drosophila melanogaster. Mol. Ecol. 27: 3525–3540. 10.1111/mec.14814

[bib105] Robertson A., 1960. A theory of limits in artificial selection. Proc. R. Soc. Lond. B Biol. Sci. 153: 234–249. 10.1098/rspb.1960.0099

[bib106] Robertson A., 1961. Inbreeding in artificial selection programmes. Genet. Res. 2: 189–194. 10.1017/S0016672300000690

[bib107] Robertson A., 1966. A mathematical model of the culling process in dairy cattle. Anim. Sci. 8: 95–108. 10.1017/S0003356100037752

[bib108] Robertson A., 1970. A theory of limits in artificial selection with many linked loci, pp. 246–288 in Mathematical Topics in Population Genetics, edited by Kojima K.-I. Springer, Berlin, Heidelberg. 10.1007/978-3-642-46244-3_8

[bib141] Robertson A., 1976. Artificial selection with a large number of linked loci, pp. 307–322 in Proceedings of the International Conference on Quantitative Genetics, edited by Pollak E., Kempthorne O., and Bailey T. B. Iowa State University Press, Iowa City, Iowa.

[bib109] Rockman M. V., Skrovanek S. S., and Kruglyak L., 2010. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science 330: 372–376. 10.1126/science.1194208

[bib110] Santiago E., and Caballero A., 1995. Effective size of populations under selection. Genetics 139: 1013–1030.

[bib111] Santiago E., and Caballero A., 1998. Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics 149: 2105–2117.

[bib112] Schmid K. J., Ramos-Onsins S., Ringys-Beckstein H., Weisshaar B., and Mitchell-Olds T., 2005. A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics 169: 1601–1615. 10.1534/genetics.104.033795

[bib113] Sella G., Petrov D. A., Przeworski M., and Andolfatto P., 2009. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5: e1000495. 10.1371/journal.pgen.1000495

[bib114] Shaw R. G., and Shaw F. H., 2013. Quantitative genetic study of the adaptive process. Heredity (Edinb.) 112: 13–20. 10.1038/hdy.2013.42

[bib115] Smith J. M., and Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.

[bib116] Stephan W., Wiehe T. H. E., and Lenz M. W., 1992. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41: 237–254. 10.1016/0040-5809(92)90045-U

[bib117] Stephan W., Song Y. S., and Langley C. H., 2006. The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics 172: 2647–2663. 10.1534/genetics.105.050179

[bib118] Sved J. A., 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2: 125–141. 10.1016/0040-5809(71)90011-6

[bib119] Tenesa A., Navarro P., Hayes B. J., Duffy D. L., Clarke G. M., et al., 2007. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17: 520–526. 10.1101/gr.6023607

[bib120] Teotónio H., Chelo I. M., Bradić M., Rose M. R., and Long A. D., 2009. Experimental evolution reveals natural selection on standing genetic variation. Nat. Genet. 41: 251–257. 10.1038/ng.289

[bib121] Teplitsky C., Mills J. A., Yarrall J. W., and Merilä J., 2009. Heritability of fitness components in a wild bird population. Evolution 63: 716–726.

[bib122] Terhorst J., Schlötterer C., and Song Y. S., 2015. Multi-locus analysis of genomic time series data from experimental evolution. PLoS Genet. 11: e1005069. 10.1371/journal.pgen.1005069

[bib123] Thornton K. R., 2018. Polygenic adaptation to an environmental shift: temporal dynamics of variation under Gaussian stabilizing selection and additive effects on a single trait. bioRxiv. 10.1101/505750

[bib124] Turelli M., 1988. Population genetic models for polygenic variation and evolution, pp. 601–608 in Proceedings of the Second International Conference on Quantitative Genetics, edited by Weir B. S., Eisen E. J., Goodman M. M., and Namkoong G. Sinauer Associates, Sunderland, MA.

[bib125] Turelli M., and Barton N. H., 1990. Dynamics of polygenic characters under selection. Theor. Popul. Biol. 38: 1–57. 10.1016/0040-5809(90)90002-D

[bib126] Turelli M., and Barton N. H., 1994. Genetic and statistical analyses of strong selection on polygenic traits: what, me normal? Genetics 138: 913–941.

[bib127] Turner T. L., and Miller P. M., 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191: 633–642. 10.1534/genetics.112.139337

[bib128] Turner T. L., Stewart A. D., Fields A. T., Rice W. R., and Tarone A. M., 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7: e1001336.

[bib129] Wallace B., 1956. Studies on irradiated populations of Drosophila melanogaster. J. Genet. 54: 280–293. 10.1007/BF02982782

[bib130] Walsh B., and Lynch M., 2018. Evolution and Selection of Quantitative Traits. Oxford University Press, Oxford. 10.1093/oso/9780198830870.001.0001

[bib131] Wang J., and Whitlock M. C., 2003. Estimating effective population size and migration rates from genetic samples over space and time. Genetics 163: 429–446.

[bib132] Waples R. S., 1989. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121: 379–391.

[bib133] Watterson G. A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276. 10.1016/0040-5809(75)90020-9

[bib134] Weir B. S., 1996. Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sinauer Associates, Sunderland, MA.

[bib135] Wiehe T. H., and Stephan W., 1993. Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Mol. Biol. Evol. 10: 842–854.

[bib136] Williamson R. J., Josephs E. B., Platts A. E., Hazzouri K. M., Haudry A., et al., 2014. Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora. PLoS Genet. 10: e1004622.

[bib137] Wray N. R., and Thompson R., 1990. Prediction of rates of inbreeding in selected populations. Genet. Res. 55: 41–54.

[bib138] Woolliams J. A., Wray N. R., and Thompson R., 1993. Prediction of long-term contributions and inbreeding in populations undergoing mass selection. Genet. Res. 62: 231–242.

[bib139] Wright S., 1931. Evolution in Mendelian populations. Genetics 16: 97–159.

[bib140] Wright S., 1938. Size of population and breeding structure in relation to evolution. Science 87: 430–431.
