Genetics. 2018 May 29;209(4):1235–1278. doi: 10.1534/genetics.118.301058

The Effect of Strong Purifying Selection on Genetic Diversity

Ivana Cvijović, Benjamin H Good, Michael M Desai
PMCID: PMC6063222  PMID: 29844134

Negative selection is a ubiquitous evolutionary force, but its effects on diversity in large samples are poorly understood. Cvijović, Good, and Desai obtain simple analytical expressions for the whole population site frequency spectrum....

Keywords: linked selection, background selection, distinguishability, allele frequency trajectories, rare variants

Abstract

Purifying selection reduces genetic diversity, both at sites under direct selection and at linked neutral sites. This process, known as background selection, is thought to play an important role in shaping genomic diversity in natural populations. Yet despite its importance, the effects of background selection are not fully understood. Previous theoretical analyses of this process have taken a backward-time approach based on the structured coalescent. While they provide some insight, these methods are either limited to very small samples or are computationally prohibitive. Here, we present a new forward-time analysis of the trajectories of both neutral and deleterious mutations at a nonrecombining locus. We find that strong purifying selection leads to remarkably rich dynamics: neutral mutations can exhibit sweep-like behavior, and deleterious mutations can reach substantial frequencies even when they are guaranteed to eventually go extinct. Our analysis of these dynamics allows us to calculate analytical expressions for the full site frequency spectrum. We find that whenever background selection is strong enough to lead to a reduction in genetic diversity, it also results in substantial distortions to the site frequency spectrum, which can mimic the effects of population expansions or positive selection. Because these distortions are most pronounced in the low and high frequency ends of the spectrum, they become particularly important in larger samples, but may have small effects in smaller samples. We also apply our forward-time framework to calculate other quantities, such as the ultimate fates of polymorphisms or the fitnesses of their ancestral backgrounds.


PURIFYING selection against newly arising deleterious mutations is essential to preserving biological function. It is ubiquitous across all natural populations and is responsible for genomic sequence conservation across long evolutionary timescales. In addition to preserving function at directly selected sites, negative selection also leaves signatures in patterns of diversity at linked neutral sites, which have been observed in a wide range of organisms (Begun and Aquadro 1992; Charlesworth 1996; Cutter and Payseur 2003; McVicker et al. 2009; Flowers et al. 2012; Comeron 2014; Elyashiv et al. 2016). This process is known as background selection and understanding its effects is essential for characterizing the evolutionary pressures that have shaped a population, as well as for distinguishing its effects from less ubiquitous events such as population expansions or the positive selection of new adaptive traits.

At a qualitative level, the effects of background selection are well known: it reduces linked neutral diversity by reducing the number of individuals that are able to contribute descendants in the long run. Since individuals that carry strongly deleterious mutations cannot leave descendants on long timescales, all diversity that persists in the population must have arisen in individuals that were free of deleterious mutations. Since all of these individuals are equivalent in fitness, this suggests that diversity should resemble that expected in a neutral population of a smaller size—specifically, with a size equal to the number of mutation-free individuals (Charlesworth et al. 1993).

However, an extensive body of work has shown that this intuition is not correct and that background selection against strongly deleterious mutations can lead to nonneutral distortions in diversity statistics (Charlesworth et al. 1993, 1995; Hudson and Kaplan 1994; Tachida 2000; Gordo et al. 2002; Williamson and Orive 2002; O’Fallon et al. 2010; Nicolaisen and Desai 2012; Walczak et al. 2012; Good et al. 2014). The reason for this is simple: even strong selection cannot purge deleterious alleles instantly. Instead, deleterious haplotypes persist in the population on short timescales, allowing neutral variants that arise on their backgrounds to reach modest frequencies. This is most readily apparent in statistics based on the site frequency spectrum [the number, p(f), of polymorphisms which are at frequency f in the population], such as the number of singletons or Tajima’s D (Tajima 1989). As we show below, even when deleterious mutations have a strong effect on fitness, the site frequency spectrum shows an enormous excess of rare variants compared to the expectation for a neutral population of reduced effective size.

These signatures in genetic diversity are qualitatively similar to those we expect from population expansions and positive selection (Slatkin and Hudson 1991; Sawyer and Hartl 1992; Rannala 1997; Keinan and Clark 2012). A detailed quantitative understanding of background selection is therefore essential if we are to disentangle its signatures from those of other evolutionary processes.

The traditional approach to analyzing the effects of purifying selection has been to use backward-time approaches based on the structured coalescent (Hudson and Kaplan 1988, 1994). This offers an approximate framework to model how background selection affects the statistics of genealogical histories of a sample, and hence the expected patterns of genetic diversity. The approximations underlying this method are valid when selection is sufficiently strong that deleterious mutations rarely fix (Neher and Shraiman 2012), the same regime we will consider in this work. However, while these backward-time structured coalescent methods make it possible to rapidly simulate genealogies, they are essentially numerical methods and do not lead to analytical predictions. Furthermore, they give limited intuition as to the conditions under which their approximations are valid. A more technical but crucial limitation is that they rapidly become very computationally demanding in larger samples. This is becoming an increasingly important problem as advances in sequencing technology now make it possible to study sample sizes of thousands (or even hundreds of thousands) of individuals. The poor scaling of coalescent methods with sample size is of particular importance in studying background selection: since purifying selection is expected to result in an excess of rare variants, its effects increase in magnitude as sample size increases. This can reveal deviations from neutrality in large samples that are not seen in smaller samples.

Here, we use an alternative, forward-time approach to analyze how purifying selection affects patterns of genetic variation at a nonrecombining genomic segment. Our method is based on the observation that to predict single-locus statistics, such as the site frequency spectrum, it is not necessary to model the entire genealogy. Instead, we model the frequency of the lineage descended from a single mutation as it changes over time due to the combined forces of selection and genetic drift, and as it accumulates additional deleterious mutations. We then use these allele frequency trajectories to predict the site frequency spectrum, from which any other single-site statistic of interest can then be calculated (note, however, that multi-site statistics such as linkage disequilibrium or correlations between allele frequencies at different sites cannot be calculated from the site frequency spectrum).

We show that background selection creates large distortions in the frequency spectrum at linked neutral sites whenever there is significant fitness variation in the population. These distortions are concentrated in the high- and low-frequency ends of the frequency spectrum, and hence are particularly important in large samples. We provide analytical expressions for the frequencies at which these distortions occur and we can therefore predict at what sample sizes they can be seen in data.

Aside from single time-point statistics such as the site frequency spectrum, we also obtain analytical forms for the statistics of allele frequency trajectories. These trajectories have a very nonneutral character which reflects the underlying linked selection. Our approach offers an intuitive explanation for how these nonneutral behaviors arise in the presence of substantial linked fitness variation, which explains the origins of the distortions in the site frequency spectrum.

The statistics of allele frequency trajectories can also be used to calculate any time-dependent, single-site statistic. For example, we analyze how the future trajectory of a mutation can be predicted from the frequency at which we initially observe it, and we discuss the extent to which the observed frequency of a polymorphism can inform us about the fitness of the background on which it arose.

We emphasize that we focus throughout on modeling a perfectly linked genomic region. In the presence of recombination, our results offer insights about the effects of linked selection on diversity within regions that are effectively fully linked on the relevant timescales. In the Discussion, we describe how our results can be used to provide a lower bound on the length of these segments, and therefore on the amount of linked selection relevant in sexually reproducing populations, and we comment on possible future extensions of our analysis to include recombination explicitly.

We begin in the next section by providing an intuitive explanation for the origins of the distortions in the site frequency spectrum in the presence of strong background selection, and explain why these distortions always accompany a reduction in diversity. This section summarizes the importance of correctly accounting for background selection, particularly when analyzing large samples, and should be accessible to all readers. We next define a specific model of background selection and summarize our main quantitative results.

We then present the analysis of our model. We begin by reviewing how dynamical aspects of allele frequency trajectories can be related to site frequency spectra, using the trajectories of isolated loci as an example. Readers already familiar with this intuition may choose to skip ahead, but those less interested in the technical details may find that this section provides useful intuition for the calculations in a simpler context. We then explain how this approach must be modified to account for linkage between multiple selected sites and present an intuitive description of the key features of allele frequency trajectories. These sections may be of interest to readers who wish to understand the intuitive origins of nonneutral behaviors of alleles in the presence of strong background selection. Finally, in the Analysis, we turn to a formal stochastic treatment of the trajectories of neutral and deleterious mutations. In the last section, we use these trajectories to calculate the site frequency spectrum and other statistics describing genetic diversity within the population.

Strong Background Selection Distorts the Site Frequency Spectrum

We begin by presenting a more detailed description of the effects of background selection on linked neutral alleles. We focus on analyzing the allele frequency spectrum, defined as the expected number, p(f), of mutations that are present at frequency f within the population in steady state. This allele frequency spectrum contains all relevant information about single-site statistics: any such statistic of interest can be calculated by subsampling appropriately from p(f).
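The subsampling step mentioned above can be made concrete. The following is a minimal Python sketch, not the authors' code: it assumes that a sample of n individuals is drawn binomially from the population, so that the expected number of sites at which the derived allele appears i times is the integral of p(f) weighted by the binomial probability. The function name `sample_sfs`, the midpoint quadrature, and the grid size are illustrative choices.

```python
import math

def sample_sfs(p, n, grid=20000):
    """Expected sample spectrum for a sample of size n.

    p    : callable giving the population spectrum p(f) for 0 < f < 1
    n    : sample size
    grid : number of midpoint-rule cells on (0, 1) (illustrative choice)

    Returns a list whose entry i-1 is the expected number of sites at which
    the derived allele is seen exactly i times, for i = 1 .. n-1.
    """
    h = 1.0 / grid
    result = []
    for i in range(1, n):
        # log of the binomial coefficient C(n, i)
        log_binom = (math.lgamma(n + 1) - math.lgamma(i + 1)
                     - math.lgamma(n - i + 1))
        total = 0.0
        for j in range(grid):
            f = (j + 0.5) * h  # midpoint of cell j, strictly inside (0, 1)
            log_weight = log_binom + i * math.log(f) + (n - i) * math.log1p(-f)
            total += p(f) * math.exp(log_weight)
        result.append(total * h)
    return result

# Usage: a neutral spectrum p(f) = theta / f (theta is a hypothetical value).
theta = 2.0
neutral_sample = sample_sfs(lambda f: theta / f, n=10)
```

For the neutral form p(f) = θ/f the integral can be done exactly and gives θ/i, which is a convenient check on the quadrature. A uniform grid is adequate only for small samples and smooth spectra; a spectrum with sharp features at very low or very high frequencies would need a finer or log-spaced grid.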

In Figure 1A, we show an example of the site frequency spectrum of neutral mutations at a locus experiencing strong background selection, generated by Wright–Fisher forward-time simulations. This example shows several key generic features of background selection. First, at intermediate frequencies the site frequency spectrum has a neutral shape, $p(f) \sim f^{-1}$, with the total number of such intermediate-frequency polymorphisms consistent with the simple reduced “effective population size” prediction (Charlesworth et al. 1993). However, at both low and high frequencies, p(f) is significantly distorted. At low frequencies, we see an enormous excess of rare alleles, qualitatively similar to what we expect in expanding populations (Slatkin and Hudson 1991; Rannala 1997). We also see a large excess of very high frequency variants, leading to a nonmonotonic site frequency spectrum. This is reminiscent of the nonmonotonicity seen in the presence of positive selection (Sawyer and Hartl 1992). Notably, these distortions at both high and low frequencies arise in populations of constant size in which all variation is either neutral or deleterious.

Figure 1.


(A) The (unfolded) average site frequency spectrum of neutral alleles along a nonrecombining genomic segment experiencing strong background selection deviates strongly from the prediction of neutral theory. The purple line shows the simulated neutral site frequency spectrum in Wright–Fisher simulations of an asexual population of $N = 10^5$ individuals, where deleterious mutations occur at rate $NU_d = 5000$ and all have the same effect on fitness $Ns = 1000$. The black lines show the neutral expectation for the site frequency spectrum of a population of $N$ and $N_e = N e^{-U_d/s}$ individuals (solid and dashed line, respectively). The inset shows the same data, but with the x-axis linearly scaled to emphasize intermediate frequencies. Simulated site frequency spectra were obtained by measuring whole-population neutral site frequency spectra in $10^5$ Wright–Fisher simulations, in which neutral mutations were set to occur at rate $NU_n = 10^3$, and by averaging and then smoothing the obtained curve using a box kernel smoother of width much smaller than the scale on which the site frequency spectrum varies (the kernel width was set to <10% of the minor allele frequency). (B and C) Statistics of the simulated site frequency spectrum (purple) can deviate from predictions of neutral theory (black) by many orders of magnitude in large samples, even though the effect of background selection will be small in small samples. (B) The average ratio of the number of polymorphisms present as derived singletons (solid lines) or ancestral singletons (dotted lines) in the sample to the number present at 50% frequency, and (C) the average minor allele frequency of the sampled alleles.

The excess of rare derived alleles arises because selection takes a finite amount of time to purge deleterious genotypes. Thus we expect that there can be substantial neutral variation linked to deleterious alleles that, although doomed to be eventually purged from the population, can still reach modest frequencies. At the very lowest frequencies, we expect that neutral mutations arising in all individuals in the population (independent of the number of deleterious mutations they carry) can contribute. Thus, at the lowest frequencies, the site frequency spectrum should be unaffected by selection and should agree with the neutral site frequency spectrum of a population of size N. On the other hand, as argued above, the total number of common alleles must reflect the (much smaller) number of deleterious-mutation-free individuals, because only neutral mutations arising in such individuals can reach such high frequencies. Since the overall number of very rare alleles is proportional to the census population size N, and the number of common alleles reflects a much smaller deleterious-mutation-free subpopulation, there must be a transition between these two: between these extremes the site frequency spectrum must fall off more rapidly than the neutral prediction $p(f) \sim f^{-1}$. This transition reflects the fact that as frequency increases, the effect of selection will be more strongly felt, and neutral mutations arising in genotypes of increasingly lower fitnesses will become increasingly unlikely.

As the frequency increases even further, we see from our simulations that the total number of polymorphisms increases again until, at very high frequencies, it matches the prediction for a neutral population of size equal to the census size N. Note that, at these frequencies, the total number of backgrounds contributing to the diversity is constant (i.e., all mutations reaching these frequencies must arise in the small subpopulation of mutation-free individuals). This suggests that fundamentally nonneutral behaviors must be dominating the dynamics of these high frequency neutral polymorphisms. To understand this, as well as the details of the rapid falloff at very low frequencies, we will need to develop a more detailed description of the trajectories of neutral alleles in the population; we analyze this in quantitative detail in a later section.

However, a simple argument can explain the agreement with the neutral prediction at the highest frequencies. Polymorphisms observed at these very high frequencies correspond to neutral variants that have almost reached fixation. The ancestral allele is still present in the population, but at a very low frequency. In principle, the dynamics of the derived and ancestral alleles should depend on the fitnesses of their backgrounds. However, once the frequency of the ancestral allele is sufficiently low, the effects of drift will once again dominate over the effects of selection. Thus, at extremely high frequencies of the derived allele, its dynamics must become neutral. In addition to having neutral dynamics, the overall rate at which neutral mutations enter this high-frequency regime also agrees with the rate in a neutral population at the census population size. This is because, at steady state, the total rate at which neutral mutations fix is equal to the product of the rate at which they enter the population at any point in time (NUn) and their fixation probability, 1/N (Birky and Walsh 1988). Thus, since the total rate at which alleles enter this high-frequency regime is unaffected by selection, and since their dynamics within this regime are neutral, we expect that the site frequency spectrum should also agree with the neutral prediction for a population of size N.
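The rate argument in the preceding paragraph can be written out in one line. This is only a restatement in the paragraph's own notation; the label $R_{\mathrm{fix}}$ for the steady-state rate of neutral fixations is introduced here for convenience and does not appear in the original.

```latex
R_{\mathrm{fix}}
  \;=\; \underbrace{N U_n}_{\text{new neutral mutations per generation}}
  \times
  \underbrace{\frac{1}{N}}_{\text{fixation probability}}
  \;=\; U_n .
```

Neither $U_d$ nor $s$ appears in the result, which is why the rate at which alleles enter the high-frequency regime matches that of a neutral population of census size $N$.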

Although these simple arguments do not provide a full quantitative explanation of the site frequency spectrum, they already offer some intuition about the presence and magnitude of the distortions due to background selection. First, these distortions arise in part as a result of the difference in the number of backgrounds on which mutations that remain at the lowest frequencies and mutations that reach substantial frequencies can arise. Thus, they will always occur when background selection is strong enough to cause a substantial reduction in the effective population size: if the pairwise diversity $\pi$ is at all reduced compared to the neutral expectation $\pi_0$ [$\pi/\pi_0 < 1$, or, in terms of McVicker’s B statistic, $B < 1$ (McVicker et al. 2009)], these distortions exist (see Figure 1A). Second, because the distortions from the neutral shape are limited to high and low ends of the frequency spectrum, they will have limited effect on site frequency spectra of small samples, but will have dramatic consequences as the sample size increases (see Figure 1, B and C). On a practical level, this means that extrapolating conclusions from small samples about the effects of background selection can be grossly misleading.

Data availability

Code used to generate the simulated data is available at: https://github.com/icvijovic/background-selection. Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6167591.

Model and Results

In the next few sections, we will analyze the dynamics of neutral mutations under background selection in detail. We focus on the simplest possible model of purifying selection at a perfectly linked genetic locus in a population of $N$ individuals. We assume neutral mutations occur at a per-locus, per-generation rate $U_n$ and deleterious mutations occur at rate $U_d$ ($U_d \ll 1$). Throughout the bulk of the analysis, we will assume that all deleterious mutations reduce the (log) fitness of the individual by the same amount $s$, although we analyze the effects of relaxing this assumption in a later section. We assume that $s \ll 1$ since this is the interesting case for biologically relevant mutation rates, although we also consider the effects of more strongly deleterious (or lethal) mutations in the Discussion. We neglect epistasis throughout, so that the fitness of an individual with $k$ deleterious mutations at this locus is $-ks$. For simplicity we consider haploid individuals, but our analysis also applies to diploids in the case of semidominance ($h = 1/2$). We assume that selection is sufficiently strong that alleles carrying deleterious mutations cannot fix in the population ($N s e^{-U_d/s} \gg 1$). The opposite case, in which deleterious mutations are weak enough to routinely fix ($N s e^{-U_d/s} \lesssim 1$), has been the subject of earlier work (Good and Desai 2013; Neher and Hallatschek 2013; Good et al. 2014). In the Discussion, we comment on the connection between these earlier weak-selection results and the strong-selection case we study here.
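To make the model definition concrete, here is a minimal individual-based Wright–Fisher sketch in Python. It is not the authors' simulation code (that is linked under Data availability) and it tracks only the deleterious load, not neutral sites. It assumes fitness $e^{-ks}$ for an individual with k deleterious mutations, multinomial resampling of parents in proportion to fitness, and a Poisson($U_d$) number of new deleterious mutations per offspring. The function names, the hand-written Poisson sampler, and the parameter values are all illustrative.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_load(N, Ud, s, generations, seed=0):
    """Return the number of deleterious mutations carried by each of N haploids."""
    rng = random.Random(seed)
    load = [0] * N  # start from a mutation-free population
    for _ in range(generations):
        # Selection: parents are chosen in proportion to fitness exp(-k s).
        weights = [math.exp(-s * k) for k in load]
        parents = rng.choices(load, weights=weights, k=N)
        # Mutation: each offspring gains a Poisson(Ud) number of new mutations.
        load = [k + poisson(rng, Ud) for k in parents]
    return load

# Hypothetical parameters with lambda = Ud / s = 1 and N s exp(-lambda) >> 1.
final_load = simulate_load(N=2000, Ud=0.1, s=0.1, generations=200)
mean_load = sum(final_load) / len(final_load)
print("mean number of deleterious mutations:", mean_load)
```

After a few multiples of 1/s generations the mean load should settle near $U_d/s$, the mutation–selection balance discussed in the text. Sampling individuals one by one keeps the sketch short; tracking counts per fitness class would scale better to the population sizes used in the figures.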

Our model is equivalent to the nonepistatic case of the model formulated by Kimura and Maruyama (1966) and Haigh (1978) as well as to the h=1/2 case of the model considered by Charlesworth et al. (1993) and Hudson and Kaplan (1994), and later studied by many other authors (Gordo et al. 2002; Seger et al. 2010; Nicolaisen and Desai 2012; Walczak et al. 2012). However, instead of modeling the genealogies of a sample of individuals from the population backwards in time, we offer a forward-time analysis of this model in which we analyze the full frequency trajectory of alleles.

In the presence of strongly selected deleterious mutations ($N s e^{-U_d/s} \gg 1$), we find that the magnitude of the effects of background selection critically depends on the ratio, $\lambda$, of the deleterious mutation rate, $U_d$, to the selective cost of each deleterious mutation, $s$: $\lambda = U_d/s$ (Figure 2). This ratio controls the overall variance in the number of deleterious mutations carried by individuals in the population, which is equal to $\lambda = U_d/s$ (Kimura and Maruyama 1966). Whenever $\lambda \ll 1$, both the overall genetic diversity and the full neutral site frequency spectrum p(f) are unaffected by background selection and the site frequency spectrum p(f) is to leading order equal to

$$p(f) \approx \frac{2NU_n}{f}, \quad \text{when } \lambda \ll 1. \tag{1}$$

This prediction agrees with the results of forward-time simulations (see Figure 2). The intuition behind this result is simple: in the limit that $\lambda \ll 1$, a majority of individuals in the population are free of deleterious mutations; neutral alleles are therefore rarely linked to deleterious mutations. This results in a neutral site frequency spectrum.

Figure 2.


Comparison between the theoretical predictions for the site frequency spectrum and Wright–Fisher simulations. In all simulations, $N s e^{-U_d/s} = 1000\,e^{-5} \approx 6.73$, while the parameter $\lambda = U_d/s$ varies from 5 to 0.1 (values shown on figure). Dashed lines show the expectations for a neutral population at reduced effective population size $N_e = N e^{-\lambda}$. At frequencies smaller than $1/(N\sigma)$ (where $\sigma = \sqrt{U_d s}$) and larger than $1 - 1/(N\sigma)$, the theoretical predictions agree with the predictions for a neutral population with census size $N$ (black line). Within the range $1/(N\sigma) < f < 1 - 1/(N\sigma)$, the theoretical predictions (Equation 2) are given by colored lines. A single theory curve was constructed from Equation 2 by joining the piecewise forms using sigmoid functions (for details see Constructing a Single Curve from Piecewise Asymptotic Functions in Appendix I). Note that this involves fitting O(1) constants to the curve, for reasons explained in Contribution from the Peaks of Trajectories and Constructing a Single Curve from Piecewise Asymptotic Functions in Appendix I. The values of the constants used are tabulated in Table I1. In simulations in which $\lambda > 1$, $N = 10^5$ whereas $N = 10^4$ for smaller $\lambda$. In all simulations, the per-individual, per-generation neutral mutation rate is $U_n = 0.1$ and site frequency spectra were obtained from these simulations as described in the caption of Figure 1.

However, we will show that the site frequency spectrum of neutral mutations follows a very different form when $\lambda \gg 1$:

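$$
p(f) \approx
\begin{cases}
\dfrac{2NU_n}{f}, & \text{for } f \ll \dfrac{1}{N\sigma},\\[2ex]
\dfrac{NU_n}{N s f^{2} \sqrt{\lambda \log_\lambda\!\left(\dfrac{e^{\lambda}}{Nsf}\right)}}, & \text{for } \dfrac{1}{N\sigma} \ll f \ll \dfrac{e^{\lambda}}{Ns},\\[2ex]
\dfrac{2NU_n e^{-\lambda}}{f}, & \text{for } \dfrac{e^{\lambda}}{Ns} \ll f \ll 1 - \dfrac{e^{\lambda}}{Ns},\\[2ex]
\dfrac{NU_n}{N s (1-f) \log_\lambda\!\left[\dfrac{e^{\lambda}}{Ns(1-f)}\right]}, & \text{for } \dfrac{1}{N\sigma} \ll 1-f \ll \dfrac{e^{\lambda}}{Ns},\\[2ex]
\dfrac{2NU_n}{f}, & \text{for } 1-f \ll \dfrac{1}{N\sigma};
\end{cases}
\tag{2}
$$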

where $\sigma = \sqrt{U_d s}$ represents the standard deviation in fitness in the population, and line 2 in Equation 2 is valid up to a constant factor (see Contribution from the Peaks of Trajectories in Appendix I for details). Comparisons between Equation 2 and simulations of the model are shown in Figure 2. We note that p(f) matches the site frequency spectrum of a neutral population with a smaller effective population size $N_e = N e^{-\lambda}$ for $1/(N s e^{-\lambda}) < f < 1 - 1/(N s e^{-\lambda})$, but deviates strongly outside this frequency range. This implies that summary statistics based on the site frequency spectrum (e.g., the average minor allele frequency) will start to deviate from the neutral expectation in samples larger than $N s e^{-\lambda} = N_e s$ individuals, but not in smaller samples (Figure 1, B and C).
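The frequency scales that separate these regimes are easy to evaluate for a given parameter set. The short Python sketch below does so using the definitions in the text: $\lambda = U_d/s$, $\sigma = \sqrt{U_d s}$, $N_e = N e^{-\lambda}$, the drift-dominated boundary $1/(N\sigma)$, and the lower edge of the quasi-neutral range $1/(N s e^{-\lambda})$. The function name and dictionary keys are illustrative, and the example values are those quoted in the caption of Figure 1 ($N = 10^5$, $NU_d = 5000$, $Ns = 1000$).

```python
import math

def regime_scales(N, Ud, s):
    """Evaluate the characteristic scales that delimit the regimes of p(f)."""
    lam = Ud / s                      # lambda = Ud / s
    sigma = math.sqrt(Ud * s)         # standard deviation in fitness
    n0_s = N * s * math.exp(-lam)     # N s e^{-lambda} = N_e s
    return {
        "lambda": lam,
        "sigma": sigma,
        "Ne": N * math.exp(-lam),            # reduced effective population size
        "f_drift": 1.0 / (N * sigma),        # below this, drift dominates
        "f_quasi_neutral": 1.0 / n0_s,       # lower edge of the neutral-shaped range
        "sample_size_threshold": n0_s,       # distortions show in larger samples
    }

# Parameters from the Figure 1 caption: Ud = 5000 / N, s = 1000 / N.
scales = regime_scales(N=1e5, Ud=0.05, s=0.01)
for name, value in scales.items():
    print(f"{name}: {value:.4g}")
```

With these parameters the sample-size threshold $N s e^{-\lambda}$ is only a handful of individuals, which illustrates why the deviations from neutrality in Figure 1, B and C become visible at modest sample sizes.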

Our results also offer an intuitive interpretation of the origins of these distortions, which are summarized in Figure 3. When $U_d \gg s$, a large majority of individuals in the population will carry some deleterious mutations at the locus, which results in substantial fitness variation within the population. However, the majority of neutral alleles are present on backgrounds that are within $O(\sigma)$ of the mean of the distribution. Thus, at frequencies $f \ll 1/(N\sigma)$ and $1 - f \ll 1/(N\sigma)$, the effects of genetic drift dominate over any effects of linked selection for the majority of neutral alleles. At these frequencies, the site frequency spectrum agrees with that of a neutral population of size $N$ (see Figure 3).

Figure 3.


A summary of the dominant effects shaping the site frequency spectrum. The site frequency spectrum and theoretical predictions are reproduced from Figure 1 and Figure 2 ($\lambda = 5$). At frequencies below $1/(N\sigma)$ and above $1 - 1/(N\sigma)$, the allele frequency trajectories of the majority of neutral alleles are dominated by drift, resulting in neutral site frequency spectra corresponding to a population of size $N$. In contrast, linked selection has a crucial impact for $1/(N\sigma) < f < 1 - 1/(N\sigma)$. The rapid falloff of the site frequency spectrum for $1/(N\sigma) < f < 1/(N s e^{-\lambda})$ is primarily a result of allele frequency trajectories having fundamentally nonneutral properties. In this regime, the number of backgrounds on which neutral alleles can arise also declines with the frequency $f$. As we show later, for $f \ll 1/(N U_d e^{-\lambda})$, the site frequency spectrum is dominated by neutral mutations originating on deleterious backgrounds. In contrast to the rapid decline at lower frequencies, the site frequency spectrum has a neutral shape between $f = 1/(N s e^{-\lambda})$ and $f = 1 - 1/(N s e^{-\lambda})$. In this regime, both the neutral and wild-type allele are in approximate mutation–selection balance (see blue dot and blue inset, showing the fitness distribution of such alleles) and large fluctuations of the allele frequency mirror the neutral fluctuations of the most fit individuals. At frequencies larger than $1 - 1/(N s e^{-\lambda})$, the relative number of polymorphisms increases with the frequency. This pattern results from the effective positive selection of neutral alleles that fix among the fittest individuals (see red dot and red inset) and are, as a result, linked to fewer deleterious mutations than the wild type.

In contrast, the effects of linked selection have a crucial impact on allele frequency trajectories at frequencies $f$ for which $1/(N\sigma) \ll f \ll 1 - 1/(N\sigma)$. As we show in a later section, this region of the site frequency spectrum is dominated by alleles that arise on unusually fit backgrounds [with fitness with respect to the mean larger than $O(\sigma)$]. For these alleles, a crucial distinction arises between their short-term and long-term behavior: although genotypes that carry any polymorphic strongly deleterious variants are guaranteed to be eventually purged from the population, those that contain fewer than average deleterious mutations are still positively selected on shorter timescales. This results in strong nonneutral features in the frequency trajectories of these alleles. Their trajectories are characterized by rapid initial expansions, followed by a peak, and eventual exponential decline (Figure 4). These deterministic aspects of allele frequency trajectories are similar to those seen by Neher and Shraiman (2011) in models of linked selection in large facultatively sexual populations. We describe them in detail in the section titled Key features of lineage trajectories. A part of the rapid falloff in the site frequency spectrum between $f = 1/(N\sigma)$ and $f = 1/(N s e^{-\lambda})$ results from these deterministic effects: alleles arising on backgrounds with more deleterious variants can reach more limited frequencies than alleles arising on backgrounds with fewer deleterious variants. Thus, the number of backgrounds on which neutral alleles could have arisen declines with the frequency, leading to a falloff of the site frequency spectrum.

Figure 4.


(A) The average fitness of a lineage comprising individuals carrying $k$ deleterious mutations at time $t = 0$ (blue dot and blue inset). As the descendants of these individuals accumulate further deleterious mutations, the fitness of the lineage declines until the individuals accumulate an average of $\lambda$ deleterious mutations (red dot) and reach their own mutation–selection balance, which is a steady-state Poisson profile with mean $\lambda$ that has been shifted by the initial deleterious load, $ks$ (red inset). (B) In the absence of genetic drift, the size of the lineage will increase at a rate proportional to its relative fitness. Lineages arising in the class with $k = 0$ deleterious mutations reach mutation–selection balance about $t_d = \log(\lambda)/s$ after arising, after which the size of the lineage asymptotes to $n_0(t=0)\,e^{\lambda}$. The fitness of these lineages changes on a shorter timescale, $1/s$. (C) In contrast, lineages arising in classes with $k > 0$ deleterious mutations peak in size after a time $t_d(k)$, when their average relative fitness is zero, after which they decline exponentially at rate $ks$.
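The deterministic behavior described in this caption can be reproduced with a few lines of Python. The closed form used below is an assumption of this sketch rather than a formula quoted in this section: it takes the lineage's mean number of additional mutations to relax as $\lambda(1 - e^{-st})$ and its relative growth rate to be $s(\lambda e^{-st} - k)$, which integrates to a size $n(t) = n(0)\exp[\lambda(1 - e^{-st}) - kst]$. This form is consistent with the three features stated in the caption (an asymptote of $n_0 e^{\lambda}$ for $k = 0$, a peak when the relative fitness is zero, and late-time decline at rate $ks$). The function names are illustrative.

```python
import math

def lineage_size(t, k, lam, s, n_init=1.0):
    """Deterministic size of a lineage founded in class k (assumed closed form)."""
    return n_init * math.exp(lam * (1.0 - math.exp(-s * t)) - k * s * t)

def peak_time(k, lam, s):
    """Time at which the relative fitness s (lam e^{-s t} - k) crosses zero.

    Defined for k >= 1; a non-positive result (k >= lam) means the lineage
    declines from the moment it arises.
    """
    if k < 1:
        raise ValueError("peak_time is defined for k >= 1")
    return math.log(lam / k) / s

# Illustrative parameters: lambda = 5 and s = 0.01.
lam, s = 5.0, 0.01
plateau = lineage_size(5000.0, k=0, lam=lam, s=s)   # approaches exp(lam)
t_peak = peak_time(2, lam, s)
peak = lineage_size(t_peak, k=2, lam=lam, s=s)
print("k=0 plateau:", plateau, " k=2 peak time:", t_peak, " peak size:", peak)
```

Drift is ignored entirely here, so the sketch captures only the expansion, peak, and decline of Figure 4, B and C, not the fluctuations that the Analysis treats stochastically.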

However, these deterministic aspects of the allele frequency trajectory are not sufficient to produce the site frequency spectrum in Equation 2, even if stochastic effects in the early phase of the trajectory are taken into account (i.e., during “establishment”; see Desai and Fisher 2007 and Neher and Shraiman 2011). This is because fluctuations in the numbers of most-fit individuals that occur after establishment continue to drive fluctuations in the overall allele frequency. This is closely related to the fluctuations in the population fitness distribution studied by Neher and Shraiman (2012) in an analysis of Muller’s ratchet.

In the Analysis, we quantify how these fluctuations propagate to shape the statistics of allele frequency trajectories, finding that fluctuations in the number of most-fit individuals that happen on a timescale shorter than $1/s$ are smoothed out due to the finite timescale on which selection can respond. In contrast, fluctuations that happen on timescales longer than $1/s$ are faithfully reproduced in the allele frequency trajectory, which leads to quasi-neutral statistics of allele frequency trajectories at frequencies between $1/(N s e^{-\lambda})$ and $1 - 1/(N s e^{-\lambda})$ (see Figure 3). The smoothing of fluctuations on a finite timescale introduces an additional fundamentally nonneutral feature in the total allele frequency trajectory. This distorts the site frequency spectrum at frequencies below $1/(N s e^{-\lambda})$ above and beyond what would be predicted if we asserted a simple frequency-dependent effective population size equal to the number of backgrounds that can contribute to a given frequency.

Finally, we will demonstrate that the nonmonotonicity in the site frequency spectrum at frequencies between 11/(Nseλ) and 11/(Nσ) arises as a result of sweep-like behaviors of neutral alleles that have fixed among the most-fit individuals in the population (see Figure 3). Because these derived alleles carry, on average, fewer deleterious mutations than the wild type, they are positively selected despite having no inherent benefit. We will show that this difference in the average number of linked deleterious mutations gives rise to an effective frequency-dependent selection coefficient seff(f). This selection coefficient changes with the frequency f of the mutation as high-fitness, wild-type individuals ratchet to extinction:

seff(f)=logλ[1Nseλ(1f)]s,if1f1Nseλ. (3)

In the next sections, we derive the form of the site frequency spectrum in Equation 2 and explain these effects in more detail. We begin by presenting background necessary for understanding these results. We first revisit the intuition behind the shape of the site frequency spectra of isolated loci (Ewens 1963; Sawyer and Hartl 1992). We show that, in the absence of linkage between multiple selected sites, background selection does not lead to a site frequency spectrum of the form in Equation 2. Next, we explain how linkage between multiple selected sites modifies allele frequency trajectories. We revisit the key deterministic aspects of allele frequency trajectories in the presence of background selection, previously studied by Etheridge et al. (2009) and others, and extend these results to identify the key timescales important for understanding this problem. Finally, we turn to a full stochastic treatment of allele frequency trajectories in the Analysis, where we also derive the expressions for the site frequency spectra of neutral and deleterious mutations. In the Discussion, we comment on the practical implications of our results, as well as on connections to previous work and other models.

Background

Isolated loci

To gain insight into the more complicated case of linked selection, we first begin by reviewing the simplest case of a single locus isolated from any other selected loci. The probability that an allele at that locus is present at frequency f at time t, p(f,t), is described by the diffusion equation:

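$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial f}\left[sf(1-f)\,p\right] + \frac{\partial^2}{\partial f^2}\left[\frac{f(1-f)}{2N}\,p\right]. \tag{4}$$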

Ewens (1963) showed that the expected site frequency spectrum can be obtained from this forward-time description of the allele frequency trajectory: because mutations are arising uniformly in time and the time at which a mutation is observed is random, the site frequency spectrum is proportional to the average time an allele is expected to spend in a given frequency window.

In this section, we show that the low- and high-frequency ends of the site frequency spectrum of isolated loci can be obtained from a simple heuristic argument that emphasizes this connection between allele frequency trajectories and the site frequency spectrum. These calculations are not intended to be exact [resulting frequency spectra are only valid up to O(1) factors], but they provide intuition for the origins of key features of the site frequency spectrum that we will return to more formally below.

Consider the simplest case of isolated, purely neutral loci. Neutral mutations will arise in the population at rate $NU_n$. In the absence of selection, the trajectories of these mutations are governed by genetic drift. At steady state, the number of mutations we expect to see at frequency $f$ is simply proportional to the number of mutations that reach that frequency and the typical time each of these mutations spends at that frequency before fixing or going extinct. In the absence of selection, a new mutation that arises at initial frequency $f_0 = 1/N$ will reach frequency $f$ before going extinct with probability $f_0/f = 1/(Nf)$. Standard branching process calculations (Fisher 2007) show that, given that it reaches frequency $f$, the mutation will spend about $Nf$ generations around that frequency [defined as $\log(f)$ not changing by more than $O(1)$], provided that $f$ is small ($f \ll 1$).

By combining these results, we can calculate the expected site frequency spectrum for small $f$. The rate at which new mutations reach frequency $f$ is $NU_n \cdot 1/(Nf)$. Those that do will remain around $f$ (in the sense defined above) for about $Nf$ generations. Thus the total number of neutral mutations within $df$ of frequency $f$ is $p(f)\,df \sim NU_n \cdot \frac{1}{Nf} \cdot Nf \cdot d(\log f)$. In other words, we have

$$p(f) \sim \frac{NU_n}{f}. \tag{5}$$

This argument is valid when the mutant is rare ($f \ll 1$), but will start to break down at intermediate frequencies. However, because the wild type is rare when the mutant approaches fixation, an analogous argument can be used to describe the site frequency spectrum at high frequencies. The mutant trajectory still reaches frequency $f$ with probability $1/(Nf)$. It will then spend roughly $N(1-f)$ generations around this frequency [i.e., within $O(1)$ of $\log(1-f)$]. This gives $p(f) \sim NU_n/f \approx NU_n$ in the high-frequency end of the spectrum. This simple forward-time heuristic argument reproduces a well-known result of coalescent theory (Wakeley 2009) and agrees with the more formal calculation of sojourn times in the Wright–Fisher process (Ewens 1963).

We can use a similar argument to calculate the frequency spectrum of strongly selected deleterious mutations with fitness effect $-s$ (with $Ns \gg 1$) that occur at a locus that is isolated from any other selected locus. Provided that the deleterious mutation is rare [below the “drift barrier” frequency, $f < 1/(Ns)$], its trajectory is dominated by drift. Thus for $f < 1/(Ns)$, the mutation trajectory will be the same as for a neutral mutation and the frequency spectrum will therefore be neutral. In contrast, at frequencies larger than $1/(Ns)$, selection is stronger than drift, which prevents the mutation from exceeding this frequency. Combining these two expressions, we find that the frequency spectrum of an isolated deleterious mutation is, to a rough approximation, given by

$$p(f) \approx \begin{cases} \dfrac{NU_d}{f}, & \text{if } f < \dfrac{1}{Ns}, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

For completeness, we also show how a similar argument can be used to obtain the frequency spectrum of beneficial mutations. Although it is not immediately obvious that this is relevant to background selection, we will later see how similar trajectories emerge in the case of strong purifying selection. Just like deleterious alleles, strongly beneficial alleles with fitness effect $s$ (with $Ns \gg 1$) will not feel the effects of selection as long as they do not exceed the drift barrier [$f < 1/(Ns)$]. Their trajectory and frequency spectrum will therefore be neutral below the drift barrier. As a result, only a small fraction $s$ of beneficial mutations will reach frequency $1/(Ns)$. However, those that do will be destined to fix since, at frequencies larger than $1/(Ns)$, selection dominates over drift. Above this threshold, selection will cause the frequency of the mutation to grow logistically at rate $s$ [$df/dt = sf(1-f)$], spending $1/[sf(1-f)]$ generations near frequency $f$. This is valid as long as $f < 1 - 1/(Ns)$, at which point the effects of drift become dominant due to the wild type being rare, and the trajectory of the mutant is once again the same as the trajectory of a neutral mutation. Combining these expressions, we obtain a rough approximation for the frequency spectrum of an isolated beneficial mutation:

$$p(f) \approx \begin{cases} \dfrac{NU_b}{f}, & \text{if } f < \dfrac{1}{Ns}, \\[1ex] \dfrac{NU_b}{f(1-f)}, & \text{if } \dfrac{1}{Ns} < f < 1 - \dfrac{1}{Ns}, \\[1ex] NU_b \cdot Ns, & \text{if } f > 1 - \dfrac{1}{Ns}. \end{cases} \tag{7}$$
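As a rough numerical companion to the heuristic spectra of isolated loci in Equations 5–7, the piecewise expressions can be written as small functions. This is only an illustrative sketch: the function names and the parameter values in the usage lines are hypothetical choices, not part of the original analysis, and the returned values carry the same $O(1)$ uncertainty as the heuristic argument itself.

```python
def sfs_neutral(f, N, U_n):
    """Heuristic neutral spectrum (Equation 5): p(f) ~ N*U_n / f."""
    return N * U_n / f


def sfs_deleterious(f, N, U_d, s):
    """Equation 6: neutral below the drift barrier 1/(N*s), empty above it."""
    return N * U_d / f if f < 1.0 / (N * s) else 0.0


def sfs_beneficial(f, N, U_b, s):
    """Equation 7: neutral, logistic-sweep, and near-fixation regimes."""
    barrier = 1.0 / (N * s)
    if f < barrier:
        return N * U_b / f                  # drift-dominated, rare mutant
    if f < 1.0 - barrier:
        return N * U_b / (f * (1.0 - f))    # deterministic logistic growth
    return N * U_b * N * s                  # drift-dominated, rare wild type


if __name__ == "__main__":
    # Hypothetical parameters chosen so that N*s >> 1.
    N, s, U_b = 1e6, 1e-2, 1e-9
    for f in (1e-5, 1e-3, 0.5, 0.99999):
        print(f, sfs_beneficial(f, N, U_b, s))
```

The three branches of `sfs_beneficial` agree up to $O(1)$ factors at the boundaries $f = 1/(Ns)$ and $f = 1 - 1/(Ns)$, which is all the heuristic argument claims.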

Linked loci under background selection

We now turn to the analysis of background selection. Since we assume that all mutations have the same effect on fitness, the population can be partitioned into discrete fitness classes according to the number of deleterious mutations each individual carries at the locus. When the fitness effect of each mutation is sufficiently strong, the population assumes a steady-state fitness distribution in which the expected fraction of individuals with $k$ deleterious mutations, $h_k$, follows a Poisson distribution with mean $\bar{k} = \lambda$ (Kimura and Maruyama 1966; Haigh 1978):

$$h_k = \frac{e^{-\lambda}\lambda^k}{k!}. \tag{8}$$

A new allele in such a population will arise on a background with $k$ existing mutations with probability $h_k$.
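The steady-state class fractions of Equation 8 are easy to tabulate numerically. The following is a minimal sketch; the function name and the two example values of $\lambda$ are illustrative assumptions chosen to contrast a small and a large mutational load.

```python
import math


def fitness_class_fraction(k, lam):
    """Equation 8: fraction h_k of individuals carrying k deleterious mutations."""
    return math.exp(-lam) * lam**k / math.factorial(k)


if __name__ == "__main__":
    for lam in (0.01, 5.0):  # hypothetical values of lambda
        h0 = fitness_class_fraction(0, lam)
        print(f"lambda={lam}: mutation-free fraction h_0={h0:.4g}")
```

For small $\lambda$ almost every background is mutation-free, whereas for large $\lambda$ the mutation-free class is an exponentially small fraction, $e^{-\lambda}$, of the population.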

From the form of $h_k$ we see that, depending on the value of $\lambda$, the population can be in one of two regimes. In the first regime, the rate at which mutations are generated is smaller than the rate at which selection can purge them ($\lambda \ll 1$). In this case, the majority of individuals in the population carry no deleterious mutations ($h_0 \approx 1$), with only a small proportion, $1 - h_0 \approx \lambda$, of backgrounds in the population carrying some deleterious variants. To leading order in $\lambda$, all new neutral mutations will arise in a mutation-free background and will remain at the same fitness as the founding genotype. Their trajectories are thus the same as the trajectories of mutants at isolated genetic loci of the same fitness as the founding genotype (see Appendix D for details). This means that the full site frequency spectrum can be calculated by summing the contributions of site frequency spectra of isolated loci that we calculated above. The neutral and deleterious site frequency spectra are, to leading order in $\lambda$, given by Equations 5 and 6, respectively (see Appendix H for details). Thus, background selection has a negligible impact on mutational trajectories and diversity when $\lambda \ll 1$.

In the opposite regime where $\lambda \gg 1$, mutations are generated faster than selection can purge them and there will be substantial fitness variation at the locus. Consider a new allele (i.e., a new mutation at some site within the locus) that arises in this population. A short time after arising, individuals that carry this allele will accumulate newer deleterious mutations, which will lead the allele to spread through the fitness distribution. The fundamental difficulty in calculating the frequency trajectory of this allele, $f(t)$, stems from the fact that a short time after arising, individuals that carry the allele will have accumulated different numbers of newer deleterious mutations. The total strength of selection against the allele depends on the average number of deleterious mutations that the individuals that carry the allele have. This will change over time in a complicated stochastic way as the lineage purges old deleterious mutations, accumulates new ones, and changes in frequency due to drift and selection. To calculate the distribution of allele frequency trajectories in this regime, we will need to model these changes in the fitness distribution of individuals carrying the allele. Although we will formally be treating $\lambda$ as a large parameter, in practice our results will also adequately describe allele frequency trajectories in the cases of moderate $\lambda$ (i.e., $\lambda \gtrsim 2$, see Figure 2).

To make progress, we classify individuals carrying this allele (the “labeled lineage”) according to the number of deleterious mutants they have at the locus. We denote the total frequency of the labeled individuals that have $i$ deleterious mutations as $f_i(t)$, so that the total frequency of the lineage, $f(t)$, is given by

$$f(t) = \sum_i f_i(t). \tag{9}$$

The time evolution of the allele frequency in a Wright–Fisher process is commonly described by a diffusion equation for the probability density of the allele frequency (Ewens 2004). Instead, for our purposes, it will be more convenient to consider the equivalent Langevin equation (Van Kampen 2007):

$$\frac{df_i}{dt} = \left[-is + \bar{k}(t)s\right] f_i - U_d f_i + U_d f_{i-1} + C_i(t). \tag{10}$$

Here, $C_i(t)$ is a noise term with a complicated correlation structure that is necessary to keep the total size of the population fixed (see Good and Desai 2013 for details), and $\bar{k}(t)$ is the mean number of mutations per individual in the entire population at time $t$. In the strong selection limit that we are interested in here ($Nse^{-\lambda} \gg 1$), fluctuations in the mean of the fitness distribution of the population are small and $\bar{k}(t) \approx \lambda$ (Neher and Shraiman 2012).

Key features of lineage trajectories

Before turning to a detailed analysis of Equation 10, it is helpful to consider some of the key features of lineage trajectories that we will model more formally below. To begin, imagine a lineage founded by a neutral mutation in an individual with $k$ deleterious mutations. Let the lineage comprise $n_k(0)$ individuals at some time $t=0$ shortly after arising, all of which carry $k$ deleterious mutations (see blue inset in Figure 4A). At this time, the relative fitness of this lineage is simply $-ks - (-\bar{k}s) = U_d - ks$. Thus, lineages founded in classes with $k > \lambda$ will tend to decline in size. In contrast, the more interesting case arises if $k < \lambda$, since these lineages will tend to increase in size.

However, although the overall number of individuals that carry the allele will tend to increase when $k < \lambda$, the part of the lineage in the founding class $k$ (the “founding genotype”) will tend to decline in size because it loses individuals through new deleterious mutations (at per-individual rate $U_d$). As a result, the founding genotype feels an effective selection pressure of $U_d - ks - U_d = -ks$, which is negative for all $k > 0$ and 0 for $k = 0$. This means that the lineage will increase in frequency, not through an increase in size of the founding genotype, but rather through the appearance of a large number of deleterious descendants in classes of lower fitness. The lineage must therefore decline in fitness as it increases in size.

In the absence of genetic drift, we can calculate how the size and fitness of the lineage change in time by dropping the stochastic terms in Equation 10 [subject to the initial condition $n_k(t=0) = n_k(0)$ and $n_{k+i}(t=0) = 0$ for all $i \neq 0$]. These deterministic dynamics of the lineage have been analyzed previously by Etheridge et al. (2009), who showed that the number of additional mutations that an individual in the lineage carries at some later time $t$ is Poisson distributed with mean $\lambda(1 - e^{-st})$. Thus the average number of additional deleterious mutations eventually approaches $\lambda$ after $t \sim t_d = \log(\lambda)/s$ generations. At this point, the lineage has reached its own mutation–selection balance: the fitness distribution of the lineage has the same shape as the distribution of the population [i.e., $n_{i+k}(t) \propto h_i\,n_k(t)$] but is shifted by $ks$ compared to the distribution of the population (see red inset in Figure 4A).
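The Poisson result quoted above can be checked by integrating the drift-free version of Equation 10 directly. The sketch below is a simple forward-Euler integration under stated assumptions: it sets $\bar{k}s = U_d$ (so that the growth term cancels the mutational loss term in each class), truncates the classes at a hypothetical `i_max`, and uses illustrative parameter values. It is meant as a sanity check, not as the method used in the original analysis.

```python
import math


def integrate_lineage(k, s, U_d, t_end, dt=0.01, i_max=30):
    """Forward-Euler solution of dn_{k+i}/dt = -(k+i)*s*n_{k+i} + U_d*n_{k+i-1}.

    Starts from one individual in the founding class k and returns the list
    [n_k, n_{k+1}, ..., n_{k+i_max}] at time t_end.
    """
    n = [0.0] * (i_max + 1)
    n[0] = 1.0
    for _ in range(int(round(t_end / dt))):
        previous = n[:]
        for i in range(i_max + 1):
            inflow = U_d * previous[i - 1] if i > 0 else 0.0
            n[i] = previous[i] + dt * (-(k + i) * s * previous[i] + inflow)
    return n


def mean_extra_mutations(n):
    """Average number of mutations acquired since founding."""
    return sum(i * n_i for i, n_i in enumerate(n)) / sum(n)


if __name__ == "__main__":
    s, U_d, k, t = 0.01, 0.05, 1, 200.0   # hypothetical values, lambda = 5
    lam = U_d / s
    n = integrate_lineage(k, s, U_d, t)
    print("numerical mean:", mean_extra_mutations(n))
    print("lambda*(1 - exp(-s*t)):", lam * (1.0 - math.exp(-s * t)))
```

With a sufficiently small step, the numerical mean should track $\lambda(1 - e^{-st})$ closely, and the class sizes should be Poisson distributed around it.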

The average relative fitness $x(t)$ of individuals in the lineage (Figure 4A) is therefore equal to

$$x(t) = U_d e^{-st} - ks, \tag{11}$$

and the total number of individuals in the lineage is simply $n(t) = n_k(t=0)\,e^{\int_0^t x(t')\,dt'} \equiv n_k(t=0)\,g_k(t)$, where we have defined

$$g_k(t) = e^{-kst + \lambda(1 - e^{-st})}. \tag{12}$$

Thus, we can see from Equations 11 and 12 that lineages founded in the 0-class will, on average, steadily increase in size at a declining rate until they asymptote at a total size equal to $n_k(t=0)\,e^{\lambda}$ roughly $t_d = \log(\lambda)/s$ generations later (see Figure 4B). In contrast, lineages founded in the $k$-class (with $k > 0$) will increase in size for only

$$t_d(k) = \frac{\log(\lambda/k)}{s} \tag{13}$$

generations, when they peak at a size of $n_k(t=0)\,g_k$ individuals (see Figure 4C), where we have defined

$$g_k \equiv e^{\lambda}\left(\frac{k}{e\lambda}\right)^{k}. \tag{14}$$

The lineages remain near this peak size for about

$$\Delta t(k) = \frac{1}{\sqrt{k}\,s} \tag{15}$$

generations (Figure 4C). At longer times, they exponentially decline at rate $ks$ (Figure 4C).
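The relationship between Equations 12, 13, and 14 can be verified numerically: evaluating $g_k(t)$ at $t = t_d(k)$ should reproduce the peak size $g_k$. The sketch below does this for one hypothetical parameter set; the function names are illustrative only.

```python
import math


def lineage_growth(t, k, s, lam):
    """Equation 12: expected size at time t of a lineage founded by one
    individual in class k, in the absence of drift."""
    return math.exp(-k * s * t + lam * (1.0 - math.exp(-s * t)))


def peak_time(k, s, lam):
    """Equation 13: time at which the relative fitness x(t) crosses zero.
    Requires k > 0; lineages founded in the 0-class never peak."""
    return math.log(lam / k) / s


def peak_size(k, lam):
    """Equation 14: g_k = exp(lam) * (k / (e * lam))**k."""
    return math.exp(lam) * (k / (math.e * lam)) ** k


if __name__ == "__main__":
    k, s, lam = 2, 0.01, 10.0   # hypothetical values
    t_peak = peak_time(k, s, lam)
    print("t_d(k):", t_peak)
    print("g_k(t_d(k)):", lineage_growth(t_peak, k, s, lam))
    print("g_k:", peak_size(k, lam))
```

For $k = 0$ the same `lineage_growth` function saturates at $e^{\lambda}$ at long times, in line with the behavior of lineages founded in the mutation-free class.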

These simple deterministic calculations capture the average behavior of an allele and show that all alleles founded in classes with $k > 0$ are likely to be extinct on timescales much longer than $t_d(k)$, whereas sufficiently large lineages founded in the 0-class should simply reflect the frequency in the founding class about $t_d$ generations earlier: $f(t) \approx e^{\lambda} f_0(t - t_d)$. This is the forward-time analog of the intuition presented by Charlesworth et al. (1993).

Of course, this deterministic solution neglects the effects of genetic drift, which will be crucial, particularly because drift in each class propagates to affect the frequency of the lineage in all lower fitness classes (for a more detailed heuristic describing why drift can never be ignored, see The Importance of Genetic Drift in the Founding Class in Appendix B). Although these effects are complex, there is a hierarchy in the fluctuation terms which we can exploit to gain some intuition. From the deterministic solution above, we can see that a fluctuation of size $\delta f_i$ in class $i$ will, on average, eventually cause a change in the total size of the lineage proportional to $\delta f_i\, g_i$ after a time delay $t_d(i)$. Thus, the fluctuations that have the largest effect on the total size of the lineage are those that occur in the class of highest fitness (i.e., the founding class $k$). These fluctuations will turn out to be the most important in describing the frequency trajectory of the entire allele, although fluctuations in classes of lower fitness will still matter in lineages of a small enough size.

One could imagine that this result means that fluctuations in the total size of the lineage simply mirror the fluctuations in the founding class, amplified by a factor $g_k$ and after a time delay $t_d(k)$. If fluctuations in the founding class are sufficiently slow, this is indeed the case. However, this is not true for fluctuations that occur on shorter timescales. Consider, for example, the case where a neutral mutation is founded in the mutation-free ($k=0$) class. Imagine that the frequency of the allele in the founding class changes by a small amount from $f_0$ to $f_0 + \delta f_0$ as a result of genetic drift (shown in the first panel of Figure 5). Based on the deterministic solution, this fluctuation will lead to a proportional change in the frequency of the portion of the lineage in the 1-class, and this change will take place over $1/s$ generations (see Appendix A for details). During this time, the change in the 1-class begins to lead to a shift in the frequency in the 2-class, which will mirror the change in the 0-class a further $1/(2s)$ generations later (see Figure 5). This change will then propagate, in turn, to lower classes and ultimately results in a proportional change in the total allele frequency a total of $\sum_{i=1}^{\lambda} 1/(is) \approx \log(\lambda)/s$ generations later (see Figure 5).
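The total propagation delay described above is a harmonic sum over fitness classes. The short sketch below compares that sum with $\log(\lambda)/s$ for one hypothetical parameter set; the function name is illustrative, and rounding $\lambda$ down to an integer number of classes is an assumption made for the sake of the example.

```python
import math


def propagation_delay(lam, s):
    """Sum of the per-class delays 1/(i*s) for classes i = 1, ..., floor(lam)."""
    return sum(1.0 / (i * s) for i in range(1, int(lam) + 1))


if __name__ == "__main__":
    lam, s = 100.0, 0.01   # hypothetical values
    print("harmonic sum:", propagation_delay(lam, s))
    print("log(lambda)/s:", math.log(lam) / s)
```

The two quantities differ by an offset of order $1/s$ (the harmonic sum exceeds the logarithm by roughly Euler's constant divided by $s$), which is negligible compared with $\log(\lambda)/s$ when $\lambda$ is large.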

Figure 5.

A schematic showing how a change in the frequency of the lineage in the mutation-free class propagates to affect the frequency in all classes of lower fitness. At time $t=0$, the lineage is in mutation–selection balance at total frequency $f$, when the frequency of the portion of the lineage in the 0-class changes suddenly from $f_0$ to $f_0 + \delta f_0$. This change is felt in the 1-class $1/s$ generations later and propagates to the 2-class yet another $1/(2s)$ generations later. The lineage reaches a new equilibrium about $\log(\lambda)/s$ generations later, when the total allele frequency is proportional to $(f_0 + \delta f_0)\,e^{\lambda}$.

Now consider what happens if there is another change in the frequency in the founding class. If this change occurs within the initial 1/s generations, it will influence the 1-class simultaneously with the first fluctuation, and thus the effect of these two fluctuations on the overall lineage frequency will be “smoothed” out. In contrast, if the changes are separated by more than 1/s generations, they will propagate sequentially through the fitness distribution and are ultimately mirrored in the total allele frequency. Similar arguments apply to lineages founded in other fitness classes, though the relevant timescales and scale of amplification are different.

Together, these arguments suggest that fluctuations in the founding class will have the largest impact on overall fluctuations in the lineage frequency, and these overall fluctuations will represent an amplified but smoothed-out mirror of the fluctuations in the founding class. This smoothing will be crucial: the size of the lineage in the founding class will typically fluctuate neutrally, but the smoothed-out and amplified versions will have nonneutral statistics. As we will see below, this smoothing ultimately leads to distortions in the site frequency spectrum at low frequencies [$f \ll 1/(Nse^{-\lambda})$].

Analysis

Formally, we analyze all of the effects described above by computing the distribution of the frequency trajectories $f(t)$ of the allele, $p(f,t)$, from Equation 10 for an allele arising in class $k$. This process is complicated by the correlation structure in the $C_i(t)$ terms required to keep the population size constant. These correlations are important once the lineage reaches a high frequency and, in the presence of strong selection, they result in a complicated hierarchy of the moments of $f$, which do not close (Higgs and Woodcock 1995; Good and Desai 2013). However, we can simplify the problem by considering low-, high-, and intermediate-frequency lineages separately. First, at sufficiently low frequencies ($f \ll 1$), the $C_i(t)$ in Equation 10 reduce to simple uncorrelated white noise. At these low frequencies, Equation 10 thus simplifies to

$$\frac{df_i}{dt} = (-is + \lambda s) f_i - U_d f_i + U_d f_{i-1} + \sqrt{\frac{f_i}{N}}\,\eta_i(t), \tag{16}$$

where the noise terms have $\langle \eta_i(t) \rangle = 0$ and covariances $\langle \eta_i(t)\,\eta_j(t') \rangle = \delta_{ij}\,\delta(t - t')$ and should be interpreted in the Itô sense. At very high frequencies ($1 - f \ll 1$), a similar simplification arises. In this case, the wild-type lineage is at low frequency and we can model the wild-type frequency using an analogous coupled branching process with uncorrelated white noise terms. Finally, at intermediate frequencies, we cannot simplify the noise terms in this way. Fortunately, for the case of strong selection we consider here, we will show that for $1/(Nse^{-\lambda}) \ll f \ll 1 - 1/(Nse^{-\lambda})$, lineage trajectories have neutral statistics on relevant timescales. As we will see below, these low-, intermediate-, and high-frequency solutions can then be asymptotically matched, giving us allele frequency trajectories and site frequency spectra at all frequencies.

In the next several subsections, we focus on the analysis of the distribution of trajectories at low and high frequencies ($f \ll 1$ or $1 - f \ll 1$), where Equation 16 is valid. We then return in a later subsection to the analysis of trajectories at intermediate frequencies.

The dynamics of the lineage within each fitness class

To obtain the distribution of trajectories of the allele $p(f,t)$ at low frequencies ($f \ll 1$) from Equation 16, we will first compute the generating function of $f(t)$. This generating function is defined as

$$H_f(z,t) = \left\langle e^{-zf(t)} \right\rangle, \tag{17}$$

where angle brackets denote the expectation over the probability distribution of the frequency trajectory $f(t)$. $H_f(z,t)$ is simply the Laplace transform of the probability distribution of $f(t)$ and it therefore contains all of the relevant information about the probability distribution of $f(t)$.

As we have already anticipated from our discussion above, the time evolution of $f(t)$ depends on the distribution of the lineage among different fitness classes. To understand how this distribution changes under the influence of drift, mutation, and selection in these classes, we can consider the joint generating function for the $f_i(t)$,

$$H(\{z_i\},t) = \left\langle e^{-\sum_i z_i f_i(t)} \right\rangle. \tag{18}$$

The generating function for the total allele frequency $H_f(z,t)$ can then be obtained from this joint generating function by setting $z_i = z$. We will use this relationship between the two generating functions to evaluate the importance of drift, mutation, and selection within each of the fitness classes on the total allele frequency.

By taking a time derivative of Equation 18 and substituting the time derivatives $df_i/dt$ from Equation 16 (where the stochastic terms should be interpreted in the Itô sense, see Appendix C), we can obtain a partial differential equation (PDE) describing the evolution of the joint generating function:

$$\frac{\partial H}{\partial t} = -\sum_i \left( isz_i - U_d z_{i+1} + \frac{z_i^2}{2N} \right) \frac{\partial H}{\partial z_i}. \tag{19}$$

We see from Equation 19 that the joint generating function is constant along the characteristics $z_i(t - t')$ defined by

$$\frac{dz_i}{dt} = isz_i - U_d z_{i+1} + \frac{z_i^2}{2N}. \tag{20}$$

Thus, the joint generating function can be obtained by integrating along the characteristic backward in time from $t' = 0$ to $t' = t$, subject to the boundary condition $z_i(t) = z$. Note that the linear terms in the characteristic equations arise from selection and mutation out of the $i$-class and that the nonlinear term arises from drift in class $i$.

In Large Lineages Arising on Unusually Fit Backgrounds in Appendix E, we show that when considering the distribution of trajectories $p(f,t)$ at frequencies $f \gg e^{\lambda}[i/(e\lambda)]^{i}/(2Nsi) = g_i/(2Nsi)$, the nonlinear terms in Equation 20 are of negligible magnitude uniformly in time in all classes containing $i$ or more deleterious mutations per individual, as long as $i \ll \lambda$, $Nse^{-\lambda} \gg 1$, and $\lambda \gg 1$. Here, $g_i$ represents the peak of the expected number of individuals in a lineage founded by a single individual in class $i$ (see Equation 14 and Figure 4C). Thus, when $f \gg g_i/(2Nsi)$, the effect of genetic drift is negligible in classes with $i$ or more deleterious mutations. Conversely, when $f \lesssim g_i/(2Nsi)$, genetic drift in the class with $i$ deleterious mutations does affect the overall allele frequency.

Since drift is negligible in classes with $i$ or more mutations, total allele frequencies of $f(t) \gg g_i/(2Nsi)$ require that $f_i(t - t_d(i)) \gg 1/(2Nsi)$. This threshold is reminiscent of the drift barrier, but its origin for classes below the founding class ($i > k$) is more subtle. We offer an intuitive explanation for this threshold in The Importance of Genetic Drift in Classes Below the Founding Class in Appendix B. Thus, drift in class $i$ has an important impact on the overall frequency trajectory as long as $f_i \lesssim 1/(2Nsi)$. However, once $f_i$ exceeds $1/(2Nsi)$, the effect of genetic drift in that class, as well as in all classes below $i$, becomes negligible because the frequencies of the parts of the lineage in all classes below $i$ are then also guaranteed to exceed the corresponding thresholds. Note that the frequency of the founding genotype $f_k$ is exponentially unlikely to substantially exceed $1/(2Nsk)$. This is because, as we explained earlier, the frequency trajectory of the founding genotype $f_k(t)$ has the same statistics as the trajectory of a mutation of fitness $-ks$ at an isolated locus (see Equation 16 and Appendix F). Thus, because $f_k$ is unlikely to exceed $1/(2Nsk)$, the overall allele frequency $f$ of an allele founded in class $k$ is exponentially unlikely to substantially exceed $g_k \cdot 1/(2Nsk)$.

In summary, by analyzing the generating function for the components of the lineage in different fitness classes, we have found that there is a clear separation between high-fitness classes in which mutation and drift are the primary forces, and classes of lower relative fitness in which mutation and selection dominate. The boundary between the stochastic and deterministic classes can be determined from the total allele frequency, allowing us to reduce a complicated problem involving a large number of coupled stochastic terms to what we will see is a small number of stochastic terms feeding an otherwise deterministic population.
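To make the separation between stochastic and deterministic classes concrete, the thresholds $g_i/(2Nsi)$ can be tabulated and compared with a given total allele frequency. The sketch below is a crude illustration under stated assumptions: it replaces the asymptotic condition $f \gg g_i/(2Nsi)$ with a plain inequality, restricts attention to classes with $i \le \lambda$, and uses hypothetical function names and parameter values.

```python
import math


def peak_size(i, lam):
    """Equation 14: g_i = exp(lam) * (i / (e * lam))**i."""
    return math.exp(lam) * (i / (math.e * lam)) ** i


def drift_threshold(i, N, s, lam):
    """Total allele frequency g_i / (2*N*s*i) above which drift in
    classes with i or more mutations is treated as negligible."""
    return peak_size(i, lam) / (2.0 * N * s * i)


def first_deterministic_class(f, N, s, lam):
    """Smallest class i >= 1 whose threshold lies below f, or None.
    A strict '<' stands in for the asymptotic '>>' of the text."""
    for i in range(1, int(lam) + 1):
        if drift_threshold(i, N, s, lam) < f:
            return i
    return None


if __name__ == "__main__":
    N, s, lam = 1e9, 0.01, 10.0   # hypothetical values with N*s*exp(-lam) >> 1
    for i in range(1, 5):
        print(i, drift_threshold(i, N, s, lam))
    print("boundary class at f=1e-5:", first_deterministic_class(1e-5, N, s, lam))
```

Because $g_i/(2Nsi)$ decreases with $i$ for $i < \lambda$, the boundary class moves toward fitter classes as the total allele frequency grows, leaving progressively fewer stochastic terms to track.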

Statistics of trajectories with $g_1/(2Ns) \ll f \ll 1$

At this point, we are in a position to calculate a piecewise form for the generating function $H_f(z,t)$, valid near any frequency $f$. For example, consider the allele frequency trajectory in the vicinity of some frequency $g_1/(2Ns) \ll f \ll 1$. As we have explained above, at these frequencies contributions from mutations arising in class $k \geq 1$ are exponentially small, since they would require the frequency of the lineage in that class to substantially exceed $1/(2Ns)$, which happens only exponentially rarely. Thus, in this frequency range we will only see mutations arising in the mutation-free class ($k=0$). In addition to this, we have shown that at these frequencies genetic drift can be neglected in all classes but the 0-class. To obtain the generating function at these frequencies, we can therefore integrate the characteristic equations by dropping the nonlinear terms in Equation 20 for all $i > 0$ [see Large Lineages Arising on Unusually Fit Backgrounds in Appendix E for details]. This yields the generating function for the frequency of the labeled lineage:

$$H_f(z,t) = \left\langle \exp\!\left\{ -z\left[ f_0(t) + U_d \int_{-\infty}^{t} d\tau\, f_0(\tau)\, g_1(t-\tau) \right] \right\} \right\rangle, \tag{21}$$

where the average is taken over all possible realizations of the trajectory in the founding class $f_0(t)$.

As before, $g_1(t-\tau)$ represents the expected number of individuals descended from an individual present in the 1-class $t-\tau$ generations earlier (see Equation 12). Thus, the two terms in the exponent in Equation 21 represent the frequency of the lineage in the founding class $f_0$ and the total frequency of the deleterious descendants of that lineage. The latter are seeded into the 1-class at rate $NU_d f_0(\tau)$ and each of these deleterious descendants founds a lineage that $t-\tau$ generations later contains $g_1(t-\tau)$ individuals, so that the total frequency of the allele is simply

$$f(t) = f_0(t) + U_d \int_{-\infty}^{t} d\tau\, f_0(\tau)\, g_1(t-\tau). \tag{22}$$

Thus, we have obtained a simple expression for the frequency of the entire allele in which all of the stochastic effects have been reduced to a single stochastic component, $f_0(t)$. Furthermore, the stochastic dynamics of $f_0(t)$ are those of a simple, isolated, neutral mutation (see schematic of such a trajectory in Figure 6B). Note, however, that the statistics of the fluctuations in $f(t)$ are not necessarily the same as the statistics of the trajectory in the founding class (see Figure 6A). This is because $f(t)$ depends on an integral of $f_0(t)$ (see Equation 22) and therefore has different stochastic properties than $f_0(t)$ itself.
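The convolution in Equation 22 can be evaluated numerically for any sampled founding-class trajectory. The sketch below uses a simple Riemann sum; the function names, the uniform time grid, and the assumption that $f_0$ vanishes before the start of the supplied history are all illustrative choices rather than part of the original analysis. As a consistency check, a founding-class frequency held constant for much longer than $t_d$ should give a total frequency close to $e^{\lambda} f_0$, matching the deterministic result.

```python
import math


def g1(u, s, lam):
    """Equation 12 with k = 1: expected descendants of one individual
    seeded into the 1-class u generations ago."""
    return math.exp(-s * u + lam * (1.0 - math.exp(-s * u)))


def total_frequency(f0_history, dt, U_d, s):
    """Riemann-sum version of Equation 22.

    f0_history[j] is the founding-class frequency at time j*dt; the last
    entry is the present. f0 is assumed to be zero before the first entry.
    """
    lam = U_d / s
    last = len(f0_history) - 1
    descendants = 0.0
    for j, f0 in enumerate(f0_history):
        elapsed = (last - j) * dt
        descendants += U_d * f0 * g1(elapsed, s, lam) * dt
    return f0_history[-1] + descendants


if __name__ == "__main__":
    s, U_d, dt = 0.05, 0.2, 0.1           # hypothetical values, lambda = 4
    history = [1e-6] * 4001               # f0 held constant for 400 generations
    f = total_frequency(history, dt, U_d, s)
    print("f / f0:", f / 1e-6, " exp(lambda):", math.exp(U_d / s))
```

Replacing the constant history with a fluctuating one shows the smoothing directly: changes in $f_0$ that are faster than the width of the kernel $g_1$ are averaged out in $f(t)$.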

Figure 6.

Schematic of (A) the trajectory of the total allele frequency, and (B) the trajectory of the frequency of the portion of the allele that remains in the founding class. Soon after arising in the founding class, the allele frequency rapidly increases in the spreading phase of the trajectory. Early in this phase, the total allele frequency (purple) becomes much larger than the frequency of the founding genotype [black line in left inset of (A)]. In the peak phase of the trajectory, the total allele frequency trajectory represents a smoothed-out and amplified version of the trajectory in the founding class [the relationship between the total allele frequency (purple curve) and the founding genotype frequency (black curve) is based on Equation 22]. In the extinction phase of the trajectory, the allele frequency declines at an increasing rate [right inset of (A)]. We describe the extinction phase in more detail in Appendix E and in the section of the main text titled The trajectories of high frequency alleles, $1 - f \ll 1$, where we also explain how the rate of extinction changes with the frequency.

From Equation 22, we can see that the frequency trajectory of the allele still has the same qualitative features as those we have seen in the deterministic behavior of mutations. Shortly after being founded, the lineage will become dominated by the deleterious descendants of the founding class, which are captured by the second term in Equation 22 (see left inset in Figure 6A). At early times [$t \lesssim t_d = \log(\lambda)/s$], the total allele frequency must rapidly grow as the lineage spreads through the fitness distribution and approaches mutation–selection balance (see Figure 6A). About $t_d$ generations after founding, the peak phase of the trajectory begins (see Figure 6A). During this phase, the average fitness of the lineage is approximately zero and the allele traces out a smoothed-out and amplified version of the trajectory in the founding class (Figure 6B). Finally, $t_d$ generations after the descendants of the last individuals present in the founding class have peaked, the average fitness of the lineage will fall significantly below zero and the extinction phase of the trajectory begins.

As we show in Appendix I, the peak phase of the trajectory is the most important for understanding the site frequency spectrum. This is also the phase during which the trajectory of the mutation spends the longest time near a given frequency. In contrast, the spreading phase (see Figure 6A) has a negligible effect on the site frequency spectrum: by this we mean that the site frequency spectrum at a given frequency will always be dominated by the peak phase of trajectories that peak around that frequency, and will not be influenced by the spreading phase of trajectories that peak at much higher frequencies. We will therefore not consider the spreading phase in the main text, but discuss it in Contribution from the Spreading Stage of Trajectories in Appendix I. The extinction phase of the trajectory can also be neglected for a similar reason, except when considering the very highest frequencies, $f \gtrsim 1 - 1/(Nse^{-\lambda})$ (see Contribution from the Extinction Stage of Trajectories in Appendix I). At these frequencies, the wild-type frequency is small and the mutant is in the process of fixation. To analyze the allele frequency trajectory at these frequencies, we model the wild type using the coupled branching process in Equation 16 and hence describe these trajectories by the extinction phase of the wild type.

To calculate the distribution of $f(t)$ in the peak phase, we need to calculate the distribution of the time integral of $f_0(t)$ in Equation 22. We can simplify this integral by observing that $g_1(t)$ is highly peaked in time between $t_d^{(1)} - \Delta t^{(1)}/2$ and $t_d^{(1)} + \Delta t^{(1)}/2$, where $t_d^{(1)}$ and $\Delta t^{(1)}$ are given by Equations 13 and 15 and are annotated in Figure 4C. In other words, starting at times around $t_d^{(1)}$ generations after the lineage reaches a substantial frequency in the founding class, the labeled lineage is dominated by the deleterious descendants of individuals extant in the founding class between $t_d^{(1)} - \Delta t^{(1)}/2$ and $t_d^{(1)} + \Delta t^{(1)}/2$ generations earlier, with individuals extant in the founding class at other times having exponentially smaller contributions [see Large Lineages Arising on Unusually Fit Backgrounds in Appendix E for details]. Thus, the total size of the lineage will be proportional not to the frequency $f_0(t - t_d^{(1)})$ in the founding class $t_d^{(1)}$ generations earlier, but to the total time-integrated frequency within some window of width $\Delta t^{(1)}$ centered around that time. We call this quantity the “weight” and denote it by $W_{\Delta t^{(1)}}$, where

$$W_{\Delta t^{(1)}}(t) = \int_{t - \Delta t^{(1)}/2}^{t + \Delta t^{(1)}/2} f_0(t')\,dt'. \tag{23}$$

The total allele frequency in the peak phase is therefore equal to

$$f(t) \approx U_d\, W_{\Delta t^{(1)}}(t - t_d)\, g_1(t_d). \tag{24}$$
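The windowed weight and the resulting peak-phase frequency can be evaluated numerically from a sampled founding-class trajectory. The following is a minimal sketch, not the authors' code: the function names `window_weight` and `peak_phase_frequency` are hypothetical, the trajectory is assumed to be given on an increasing time grid, and the amplification factor $g_1(t_d)$ is passed in as a plain number because its closed form is defined elsewhere in the paper.

```python
def window_weight(times, f0, t, width):
    """Trapezoidal estimate of W(t): integral of f0 over [t - width/2, t + width/2].

    `times` must be increasing; `f0` holds the founding-class frequency at
    each time point. Parts of the window outside the grid contribute zero.
    """
    lo, hi = t - width / 2.0, t + width / 2.0
    total = 0.0
    for i in range(len(times) - 1):
        ta, tb = times[i], times[i + 1]
        a, b = max(ta, lo), min(tb, hi)
        if b <= a:
            continue  # segment lies outside the window
        # Linearly interpolate f0 at the clipped segment endpoints.
        slope = (f0[i + 1] - f0[i]) / (tb - ta)
        va = f0[i] + slope * (a - ta)
        vb = f0[i] + slope * (b - ta)
        total += 0.5 * (va + vb) * (b - a)
    return total


def peak_phase_frequency(times, f0, t, t_d, width, Ud, g1_at_td):
    """Total allele frequency in the peak phase: Ud * W(t - t_d) * g1(t_d)."""
    return Ud * window_weight(times, f0, t - t_d, width) * g1_at_td
```

For a founding-class frequency that is constant over the window, the weight reduces to the frequency multiplied by the window width, which is the large-frequency limit discussed in the text.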

Thus, to calculate the distribution of the allele trajectory, we only need to calculate the distribution of the weight in the founding class over a window of specified width, $\Delta t^{(1)}$. It is informative to consider the time-integrated form of the distribution of this weight, $p(W_{\Delta t^{(1)}}) = \int dt\, p(W_{\Delta t^{(1)}}, t)$, since this form is also directly relevant to the site frequency spectrum [for a discussion of the time-dependent distribution of $W_{\Delta t^{(1)}}(t)$, see Appendix F]. In Appendix F we show that $p(W_{\Delta t^{(1)}})$ is given by

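$$p\left(W_{\Delta t^{(1)}}\right) \approx \begin{cases} \dfrac{1}{2\sqrt{N\pi}}\,\dfrac{\Delta t^{(1)}}{W_{\Delta t^{(1)}}^{3/2}}, & W_{\Delta t^{(1)}} \ll \dfrac{\left(\Delta t^{(1)}\right)^2}{N}, \\[2ex] \dfrac{1}{W_{\Delta t^{(1)}}}, & W_{\Delta t^{(1)}} \gg \dfrac{\left(\Delta t^{(1)}\right)^2}{N}. \end{cases} \tag{25}$$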

This distribution has a form that can be simply understood in terms of the trajectory in the founding class. Since genetic drift takes of order $Nf_0$ generations to change $f_0$ substantially, drift will not change $f_0$ significantly within $\Delta t^{(1)}$ generations when the frequency in the founding class exceeds $\Delta t^{(1)}/N = 1/(Ns)$. As a result, the weight will be approximately equal to $W_{\Delta t^{(1)}} \approx f_0 \Delta t^{(1)} = f_0/s$. Therefore, at these large frequencies, the weight simply traces the founding class frequency and the two quantities have the same distributions. At lower frequencies, $f_0 \ll 1/(Ns)$, the founding genotype will typically have arisen and gone extinct in a time of order $Nf_{0,\max}$ generations (where $f_{0,\max}$ is the maximal frequency the lineage reaches over the course of its lifetime). By assumption, this time is much shorter than $1/s$. Thus, the weight in a window of width $1/s$ that contains this trajectory is simply $W_{\Delta t^{(1)}} \approx f_{0,\max} \cdot Nf_{0,\max}$. A trajectory this large is obtained with probability $1/(Nf_{0,\max})$, from which it follows (by a change of variable) that the distribution of weights in the founding class scales as $W_{\Delta t^{(1)}}^{-3/2}$.
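The change of variable in the last step can be written out explicitly. The following is a heuristic sketch that tracks scaling only; $P(\cdot)$ denotes a probability per founding mutation and $w$ a particular value of the weight, both introduced here for illustration. With weight $W \approx N f_{0,\max}^2$ and $P(f_{0,\max} > x) \approx 1/(Nx)$,

$$P(W > w) = P\left(f_{0,\max} > \sqrt{w/N}\right) \approx \frac{1}{N\sqrt{w/N}} = \frac{1}{\sqrt{Nw}}, \qquad -\frac{d}{dw}P(W > w) \approx \frac{1}{2\sqrt{N}}\, w^{-3/2}.$$

The $w^{-3/2}$ dependence is the power law quoted in the text. Order-one numerical prefactors, and the factor of the window width that arises when the distribution is integrated over time, are not captured by this argument.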

As we anticipated in our discussion of the propagation of fluctuations of the founding genotype through the fitness distribution (Figure 5), we have found that the trajectory of the allele in the peak phase looks like a smoothed-out, time-delayed, and amplified version of the trajectory in the founding class (Figure 6). At sufficiently high frequencies, $f(t) \gg U_d g_1 \cdot 1/(Ns^2) \approx 1/(Nse^{-\lambda})$, the timescale of the smoothing is shorter than the typical timescale of the fluctuations in the founding class. At these frequencies, the statistics of the fluctuations of the allele simply mirror the statistics of the fluctuations in the founding class, with a time delay equal to $t_d^{(1)} = \log(\lambda)/s$.

At lower frequencies, $g_1/(2Ns) \ll f \ll 1/(Nse^{-\lambda})$, the timescale of smoothing is much longer than the typical lifetime of the founding genotype. As a result, the deleterious descendants of the entire original genotype rise and fall simultaneously and fluctuations in the founding class are not reproduced in detail. Instead, the peak phase of the allele frequency trajectory consists of a single peak with size proportional to the total lifetime weight of the founding genotype, $W = \int f_0(\tau)\,d\tau$. As we calculated above, the distribution of these peak sizes falls off more rapidly than neutrally. This gives us a complete description of the statistics of the peaks of allele frequency trajectories in the frequency range $f \gg g_1/(2Ns)$.

Statistics of trajectories with $f \ll g_1/(2Ns)$

So far, we have only considered trajectories of lineages that reach a maximal allele frequency larger than $g_1/(2Ns)$, all of which must have arisen in the mutation-free class. At lower frequencies, $g_2/(4Ns) \ll f \ll g_1/(2Ns)$, the effects of genetic drift in class $i = 1$ must also be considered, but the behavior in classes with $i \geq 2$ is deterministic. In this case, by repeating our earlier procedure, we obtain a slightly different form for the generating function $H_f(z,t)$,

$$H_f(z,t) = \left\langle e^{-z\left[f_0 + f_1 + U_d\int_{-\infty}^{t} d\tau\, f_1(\tau)\, g_2(t - \tau)\right]} \right\rangle, \tag{26}$$

so that the total allele frequency is

$$f(t) = f_0(t) + f_1(t) + U_d\int_{-\infty}^{t} d\tau\, f_1(\tau)\, g_2(t - \tau). \tag{27}$$

The total allele frequency is once again dominated by the last term, which represents the bulk of the deleterious descendants. Thus, by an analogous argument, the peak size of the lineage is proportional to the weight in the 1-class in a window of width $\Delta t^{(2)} = 1/(\sqrt{2}\,s)$:

$$f(t) \approx U_d\, W_{\Delta t^{(2)}}\left(t - t_d^{(2)}\right) g_2. \tag{28}$$

There are two types of trajectories that can reach these frequencies: trajectories that arise in the 1-class and reach a sufficiently large frequency in their founding class [$f_1 \gg \lambda^{1/2}/(2NU_d)$, see Appendix G]; and trajectories that arise in the 0-class and reach a smaller frequency in their founding class [$f_0 \ll \lambda^{1/2}/(2NU_d)$], but still leave behind enough deleterious descendants in the 1-class that the overall frequency in that class exceeds $f_1 \sim \lambda^{1/2}/(2NU_d)$. By the argument that we outlined before, this ensures that genetic drift will be negligible in classes of lower fitness (i.e., for $i \geq 2$) and is guaranteed to happen if $f_0 \gg \lambda^{1/4}/(NU_d)$ (see Appendix G).

The trajectories of the former type are simple to understand since, in this case, the trajectory $f_1(t)$ is that of a simple, isolated, deleterious locus with fitness $-s$ [and $f_0(t) = 0$ at all times]. By repeating the same procedure as above, we find that the time-integrated distribution of the weights in the 1-class is

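$$p\left(W_{\Delta t^{(2)}}\right) \approx \frac{\Delta t^{(2)}}{2\sqrt{\pi N}\, W_{\Delta t^{(2)}}^{3/2}}\; e^{-N s^2 W_{\Delta t^{(2)}}/2}. \tag{29}$$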

Note that since the trajectory of a mutation in the founding 1-class is longer than $1/s$ generations only exponentially rarely, a window of length $\Delta t^{(2)}$ nearly always contains the entire founding class trajectory (see Appendix F). This is reflected in the form of the weight distribution in Equation 29, which falls as $W_{\Delta t^{(2)}}^{-3/2}$ with an exponential cutoff at $2/(Ns^2)$. Thus, the frequency trajectory of an allele that arises in the 1-class will not mirror the fluctuations in the founding genotype. Instead, the peak phase of the allele frequency trajectory will nearly always consist of a single peak, just as we have seen in the case of alleles peaking at frequencies $g_1/(2Ns) \ll f \ll 1/(Nse^{-\lambda})$.

We now return to the other type of trajectory that can peak in this range: alleles arising in the 0-class, but reaching a small enough frequency that the effects of genetic drift in the 1-class cannot be ignored [$f_0 \ll \sqrt{\lambda}/(NU_d)$]. Because the trajectory of these alleles in the 1-class represents the combined trajectory of multiple clonal “sublineages,” each founded by a mutational event in the 0-class, the distribution of weights in the 1-class will be different [$p(W_{\Delta t^{(2)}}) \sim W_{\Delta t^{(2)}}^{-5/4}$, see Appendix G], which leads to a different distribution of overall allele frequencies $f$. However, as we show in Contribution from the Peaks of Trajectories in Appendix I, these trajectories have a negligible impact on the site frequency spectrum: because the overall number of mutations arising in class 1 is substantially larger than the overall number of mutations arising in class 0, trajectories that arise in class 0 and peak in the same frequency range as mutations originating in class 1 are less frequent by a large factor ($\lambda^{3/4}$, see Appendix G).

Similarly, at even lower frequencies in the range $g_{i+1}/[2Ns(i+1)] \ll f \ll g_i/(2Nsi)$ we will see the peaks of trajectories arising on backgrounds with $i$ or fewer deleterious mutations. These trajectories all have a single peak of width equal to $\Delta t^{(i+1)} = 1/(s\sqrt{i+1})$. The maximal peak sizes are, once again, proportional to the total weight in the $i$-class, which will be distributed according to a different power law depending on the difference in the number of deleterious mutations $\Delta = k - i$ between the founding class $k$ and the $i$-class (see Appendix G for details). As we show in Contribution from the Peaks of Trajectories in Appendix I, the most numerous of these mutations are those that arise in the $i$-class ($k = i$). The index of this most-numerous class is a quantity that we return to at multiple points and we denote it by $k_c(f)$. We can obtain an explicit form for how $k_c(f)$ depends on the frequency $f$ by solving the implicit condition $g_{k_c+1}/[2Ns(k_c+1)] \ll f \ll g_{k_c}/(2Nsk_c)$ for $k_c$. We show in Contribution from the Peaks of Trajectories in Appendix I that, to leading order,

$$k_c(f) + 1 \approx \log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right), \quad \text{when } k_c(f) \gg 1. \tag{30}$$
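Equation 30 is easy to evaluate for concrete parameters. The sketch below is illustrative only: the function name `k_c` and the parameter values in the usage note are assumptions, $\lambda = U_d/s$ is taken to be larger than 1, and the formula is a leading-order result that is only meaningful for $f$ below $1/(Nse^{-\lambda})$.

```python
import math


def k_c(f, N, s, Ud):
    """Leading-order index of the dominant founding class at frequency f.

    Implements k_c(f) + 1 ~ log_lambda(1 / (N s e^{-lambda} f)) with
    lambda = Ud / s. Assumes lambda > 1 and f < 1 / (N s e^{-lambda}).
    """
    lam = Ud / s
    return math.log(1.0 / (N * s * math.exp(-lam) * f), lam) - 1.0
```

For example, with hypothetical values `N = 1e8`, `s = 0.01`, and `Ud = 0.05` (so that $\lambda = 5$), lowering $f$ by a factor of $\lambda$ raises `k_c` by one, which reflects the staggered structure of the frequency ranges described above.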

Finally, at the very lowest frequencies, $f \ll \sqrt{\lambda}/(NU_d) = 1/(N\sigma)$, the site frequency spectrum is dominated by the trajectories of lineages that arise in a class that is within a standard deviation $\sigma$ of the mean of the fitness distribution (i.e., lineages with $|\lambda - k| \lesssim \sigma/s = \sqrt{\lambda}$). Unlike the trajectories of lineages that arise in classes of higher fitness that we discussed above, allele frequency trajectories of lineages arising within a standard deviation of the mean are typically dominated by drift throughout their lifetime [see Lineages Arising on Typical Backgrounds in Appendix E]. This is because the timescale on which these lineages remain above the mean of the fitness distribution [which is limited by $t_d^{(k)}$] is shorter than the timescale that it takes them to drift to a frequency large enough for the effect of selection to be felt [$1/[N(U_d - sk)] \gg t_d^{(k)}$]. Lineages arising in these classes do not reach frequencies substantially larger than $1/(N\sigma)$, and have largely neutral trajectories at frequencies that remain below this threshold.

The mirrored fluctuations of the allele at intermediate frequencies, $1/(Nse^{-\lambda}) \ll f \ll 1 - 1/(Nse^{-\lambda})$

We have seen that the effects of genetic drift in multiple fitness classes may be important when $f \ll 1/(Nse^{-\lambda})$ but that, at frequencies larger than $1/(Nse^{-\lambda})$, genetic drift in all classes apart from the 0-class can be neglected. At these frequencies, the trajectory of the allele mirrors the fluctuations in the 0-class that occur on timescales longer than $1/s$ generations. We have also seen that overall allele frequencies larger than $1/(Nse^{-\lambda})$ correspond to 0-class frequencies of $f_0 \gg 1/(Ns)$.

At more substantial allele frequencies (for which the condition that $f \ll 1$ is not satisfied), the coupled branching process in Equation 16 cannot be used to adequately model the allele frequency trajectory. This is because, at these frequencies, the correlations between fluctuations in the frequencies of the mutant and of the wild type, which are imposed by the finite-size constraint of the population, become important. However, we can account for these correlations simply by making use of the fact that the effect of genetic drift in all classes but the 0-class will remain negligible as long as both the mutant and the wild type remain at sufficiently large frequencies. Thus, to model the overall allele frequency trajectory at these intermediate frequencies, we can use a simple, neutral model to describe the frequency of the mutant in the 0-class, $f_0(t)$, and the frequency of the wild type in the 0-class, $f_{wt,0}(t) = h_0 - f_0(t) = e^{-\lambda} - f_0(t)$, as

$$\frac{df_0}{dt} = \sqrt{\frac{f_0\left(e^{-\lambda} - f_0\right)}{N}}\;\eta_0(t) \tag{31}$$

and treat the remainder of the population deterministically (which yields an expression for the relationship between f0(t) and f(t) that is identical to Equation 22).

Furthermore, since we have assumed that $1/(Nse^{-\lambda}) \ll f \ll 1 - 1/(Nse^{-\lambda})$, an additional simplification arises. In this frequency range, the frequencies of both the mutant and the wild type in the 0-class exceed $1/(Ns)$. Thus, large fluctuations in the frequency of the mutant and of the wild type occur on timescales that are longer than $1/s$ generations. Because this timescale is longer than the timescale on which selection in lower classes responds ($1/s$), large fluctuations in the 0-class are mirrored by the overall frequency trajectory after a time delay. In other words, on timescales longer than $1/s$ generations, we can expand the exponent in the integrand in Equation 22 around its peak and approximate the total allele frequency of the mutant and the wild-type alleles as $f(t) \approx e^{\lambda} f_0(t - \log(\lambda)/s)$ and $f_{wt}(t) \approx e^{\lambda}\{h_0 - f_0(t - \log(\lambda)/s)\}$, which yields a model for the total allele frequency of the mutant:

$$\frac{df}{dt} = \sqrt{\frac{f(1 - f)}{Ne^{-\lambda}}}\;\eta(t), \tag{32}$$

where $\eta(t)$ is an effective noise term with mean $\langle\eta(t)\rangle = 0$, variance $\langle\eta(t)^2\rangle = 1$, and auto-correlation $\langle\eta(t)\eta(t')\rangle$ that vanishes on timescales longer than $1/s$. Thus, on timescales longer than $1/s$, the allele frequency trajectory is just like that of a neutral mutation in a population of smaller size $Ne^{-\lambda}$. On shorter timescales, the allele frequency trajectory will be more correlated in time than the frequency trajectory of a neutral mutation in a population of that size and will appear smoother. However, since large frequency changes of alleles at these frequencies will only occur on a timescale of order $Ne^{-\lambda}f$, which is much longer than $1/s$, this description will be sufficient for describing site frequency spectra.
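The coarse-grained neutral description can be illustrated with a simple stochastic simulation. The sketch below makes several assumptions that are not part of the analysis: the correlated noise term is replaced by uncorrelated Gaussian noise (so the path is only meaningful on timescales longer than $1/s$), time is discretized in single generations with an Euler-Maruyama step, and frequencies are clipped to the interval from 0 to 1. The function name `simulate_effective_neutral` is hypothetical.

```python
import math
import random


def simulate_effective_neutral(f_init, N, lam, generations, seed=0):
    """Euler-Maruyama path of df/dt = sqrt(f (1 - f) / (N e^{-lam})) * eta(t).

    Uses white Gaussian noise as a stand-in for the effective noise term,
    so short-time correlations (below about 1/s generations) are ignored.
    """
    rng = random.Random(seed)
    n_eff = N * math.exp(-lam)  # effective population size N e^{-lambda}
    f = f_init
    path = [f]
    for _ in range(generations):
        step = math.sqrt(max(f * (1.0 - f), 0.0) / n_eff) * rng.gauss(0.0, 1.0)
        f = min(max(f + step, 0.0), 1.0)  # keep the frequency in [0, 1]
        path.append(f)
    return path
```

Because the noise amplitude vanishes at $f = 0$ and $f = 1$, a path that reaches either boundary stays there, mimicking loss or fixation.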

We emphasize that Equation 32 relies on the overall fluctuations in the fitness distribution of the population being negligible on relevant timescales, so that the average number of deleterious mutations per individual, $\bar{k}(t)$, is approximately equal to $\lambda$ (and, crucially, independent of $f$). We expect that this approximation is valid when $Nse^{-\lambda} \gg 1$, because the overall fluctuations in $\bar{k}(t)$ are small compared to $\lambda$ in this limit (Neher and Shraiman 2012). However, it is less clear whether this approximation continues to be appropriate as $Nse^{-\lambda}$ approaches more moderate $O(1)$ values. A more detailed exploration of these effects would require a path-integral approach similar to that of Neher and Shraiman (2012) and is beyond the scope of this work.

The trajectories of high frequency alleles, $1 - f \ll 1$

The neutral model from the previous section breaks down when the allele frequency of the mutant exceeds $1 - 1/(Nse^{-\lambda})$. These total allele frequencies are attained when the frequency of the founding genotype $f_0$ exceeds the frequency $h_0 - 1/(Ns)$. When this occurs, the frequency of the wild type in the founding class will fall below $1/(Ns)$ and fluctuations that occur on timescales shorter than $1/s$ generations will once again become important. Mutant lineages that reach such high frequencies are almost certain to fix in the 0-class. Once this happens, all individuals that carry the wild-type allele at the locus will also be linked to a deleterious variant. Thus, although the mutant carries no inherent fitness benefit, it will thereafter appear fitter than the wild type because it has fixed among the most-fit individuals in the population. The mutant will therefore proceed to perform a true selective sweep and will drive the wild-type allele to extinction.

At these high allele frequencies, $1 - f \ll 1/(Nse^{-\lambda}) \ll 1$, we can once again use the coupled branching process in Equation 16 to describe the allele frequency trajectory of the wild type, $f_{wt}(t) = 1 - f(t)$. Seen from the point of view of the wild type, the fixation phase of the mutant corresponds to the extinction phase of the wild type (see right inset in Figure 3). To obtain a description of the allele frequency trajectory of the wild type at these times, we can expand the generating function in Equation 21 at long times, which yields

$$H_{f_{wt}}(z,t) \approx \exp\left[-z\,\frac{1}{Nse^{-\lambda}}\,e^{-s(t - t_0)}\right], \tag{33}$$

for some choice of $t_0$ [see Large Lineages Arising on Unusually Fit Backgrounds in Appendix E for details]. Note that, as before, Equation 33 is valid only as long as $f_{wt} \gg g_1/(2Ns)$ [i.e., as long as the size of the lineage in the 1-class exceeds $1/(2Ns)$]. Once the frequency of the wild type in the 1-class falls below $1/(2Ns)$, we can no longer treat this class deterministically. Once this happens, the part of the wild type that is in the 1-class will drift to extinction within about $1/s$ generations, whereas its bulk will continue to decay at a rate proportional to its average fitness, $2s$. This will go on for as long as the frequency in the 2-class is larger than $1/(4Ns)$, corresponding to the total frequency of the lineage being larger than $g_2/(4Ns)$. Once the frequency of the wild type in the 2-class also falls below $1/(4Ns)$, the bulk of the lineage will continue to decay even more rapidly, at rate $3s$, and so on. In general, once the frequency of the lineage in class $k_c$, but not in class $k_c + 1$, falls below $1/(2Nsk_c)$, which corresponds to the total frequency of the wild type being in the range $g_{k_c+1}/[2Ns(k_c+1)] \ll f_{wt} \ll g_{k_c}/(2Nsk_c)$, the average fitness of the bulk of the wild type will be equal to $s_{\mathrm{eff}} = s(k_c + 1)$ (see Figure 3).

Thus, the wild type goes extinct in a staggered fashion, dying out in classes of higher fitness first and declining in relative fitness in this process. As a result, the effective negative fitness of the wild type increases as its frequency declines, leading to an increasingly rapid exponential decay of the allele frequency (see right inset in Figure 6A). By solving the implicit condition for $k_c(f_{wt})$ above as we did previously (see Equations E19–E21 in Appendix E), we find that the average fitness of the bulk of the wild-type distribution, $s_{\mathrm{eff}}(f_{wt})$, is to leading order equal to

$$s_{\mathrm{eff}}(f_{wt}) = s\left[k_c(f_{wt}) + 1\right] \approx s\log_\lambda\left(\frac{1}{Nse^{-\lambda}f_{wt}}\right), \tag{34}$$

when $f_{wt} \ll 1/(Nse^{-\lambda})$. This means that the frequency trajectory of the wild type in this phase obeys

$$f_{wt}(t) \sim \exp\left[-\log_\lambda\left(\frac{1}{Nf_{wt}se^{-\lambda}}\right)st\right]. \tag{35}$$
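The accelerating decay of the wild type can be made concrete by integrating the decay rate numerically. The sketch below is one possible reading of Equations 34 and 35, not the authors' procedure: it assumes that the wild-type frequency declines at the instantaneous rate $s_{\mathrm{eff}}(f_{wt})$, uses a forward Euler step of one generation, and stops once the frequency drops below $1/(N\sigma)$ with $\sigma = \sqrt{U_d s}$, where drift takes over. The function names are hypothetical.

```python
import math


def s_eff(f_wt, N, s, Ud):
    """Leading-order effective decay rate s * log_lambda(1 / (N s e^{-lam} f_wt)).

    Assumes lambda = Ud / s > 1 and f_wt < 1 / (N s e^{-lambda}).
    """
    lam = Ud / s
    return s * math.log(1.0 / (N * s * math.exp(-lam) * f_wt), lam)


def wild_type_decay(f_start, N, s, Ud, max_generations=100000):
    """Forward Euler integration of d f_wt / dt = -s_eff(f_wt) * f_wt."""
    floor = 1.0 / (N * math.sqrt(Ud * s))  # below 1/(N sigma), drift dominates
    f = f_start
    path = [f]
    for _ in range(max_generations):
        if f <= floor:
            break
        f -= s_eff(f, N, s, Ud) * f  # one-generation step
        path.append(f)
    return path
```

Because `s_eff` grows as the frequency falls, the decay in the returned path is faster than a simple exponential at rate $s$.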

The Site Frequency Spectrum in the Presence of Background Selection

Having obtained a distribution of allele frequency trajectories, we are now in a position to evaluate the site frequency spectrum. Since the trajectory of any lineage depends on the fitness of the background on which it arose, we will find it convenient to divide the total site frequency spectrum, p(f), into the site frequency spectra of mutations with different ancestral background fitnesses, p(f,k). By definition, the total site frequency spectrum p(f) is the sum over these single-class frequency spectra:

$$p(f) = \sum_k p(f,k). \tag{36}$$

We evaluate the site frequency spectrum in three overlapping regimes, $f \ll 1$, $1/(Nse^{-\lambda}) \ll f \ll 1 - 1/(Nse^{-\lambda})$, and $1 - f \ll 1$.

The site frequency spectrum of rare alleles, $f \ll 1$

The rare end of the frequency spectrum ($f \ll 1$) consists of neutral alleles that (because they are rare) occurred on different genetic backgrounds. These alleles thus have independent allele frequency trajectories that can be described by the coupled branching process, Equation 16. As long as $f(t) \ll 1/(N\sigma)$, most lineage trajectories are dominated by genetic drift. Intuitively, this result is simple: provided that the lineage is rare enough, selection pressures in any fitness class (or more precisely, the bulk of the fitness classes where the vast majority of such alleles arise) can be neglected compared to drift. Thus, the resulting site frequency spectrum is

$$p(f,k) \approx \frac{2NU_nh_k}{f}, \quad \text{for } f \ll \frac{1}{N\sigma}. \tag{37}$$

At these frequencies, the total site frequency spectrum is equal to

$$p(f) = \sum_k p(f,k) \approx \frac{2NU_n}{f}, \quad \text{for } f \ll \frac{1}{N\sigma}. \tag{38}$$

This agrees with our earlier intuition that, at the lowest frequencies, the entire population contributes to the site frequency spectrum and also agrees with the results of Wright–Fisher simulations (see Figure 3). Since the effects of selection are negligible, each fitness class contributes proportionally to its size, with the largest fitness classes contributing the most (see Figure 7). The deleterious mutation-free ($k = 0$) class has a negligible effect on the site frequency spectrum, contributing only a small proportion (proportional to its total frequency, $h_0 = e^{-\lambda}$) of all variants seen at these frequencies.

Figure 7.


Proportions of polymorphisms at a given frequency that arose in genotypes that had a specified number of additional deleterious mutations compared to the most-fit genotype at the locus at the time they arose. The parameters in these simulations are the same as the parameters used to generate site frequency spectra in Figure 1 and Figure 3. Note that for frequencies $f > 1/(Nse^{-\lambda})$, the entire site frequency spectrum comprises lineages that arose due to neutral mutations in the $k = 0$ class (with only a few exceptions that arise due to rare ratchet events).

At larger frequencies, $f \gg 1/(N\sigma)$, selection plays an important role in shaping allele trajectories and the site frequency spectrum. The overall contribution of mutations originating in class $k$ near some frequency $f$ is determined not only by the overall rate $NU_nh_k$ at which such mutations arise, but also by the probability that these mutations reach $f$, which declines with the initial deleterious load $k$. As a result, as $f$ increases, the site frequency spectrum will become increasingly enriched for alleles arising in unusually fit backgrounds (see Figure 7).

The contributions to the site frequency spectra p(f,k) are straightforwardly obtained by integrating in time the distributions of allele frequency trajectories that we have described in the Analysis:

$$p(f,k) = NU_nh_k\int p(f,k,t)\,dt. \tag{39}$$

This integral is dominated by the peak phase of allele frequency trajectories, during which we have seen that the allele frequency is simply proportional to the weight in class $k_c(f) \sim \log[1/(Nse^{-\lambda}f)]$, which corresponds to the class of lowest fitness in which the dynamics are not deterministic:

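$$p(f,k,t) \approx \begin{cases} p\left[f = U_d\, g_{k_c}\, W_{\Delta t^{(k_c)}}\left(t - t_d^{(k_c)}\right)\right], & k \leq k_c(f), \\ 0, & \text{otherwise}. \end{cases} \tag{40}$$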

The overall site frequency spectrum is equal to the sum of these terms (Equation 36). In Contribution from the Peaks of Trajectories in Appendix I, we show that this sum is well approximated by the last term, corresponding to $k = k_c(f)$, and obtain that the site frequency spectrum in the rare end is, to leading order,

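$$p(f) \approx \begin{cases} \dfrac{NU_n}{Nsf^2\sqrt{\lambda\log_\lambda\left(1/(Nsfe^{-\lambda})\right)}} & \text{for } \dfrac{\sqrt{\lambda}}{NU_d} \ll f \ll \dfrac{1}{Nse^{-\lambda}}, \\[2ex] \dfrac{2NU_ne^{-\lambda}}{f} & \text{for } \dfrac{1}{Nse^{-\lambda}} \ll f \ll 1, \end{cases} \tag{41}$$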

where the form of the frequency spectrum for $f \ll 1/(Nse^{-\lambda})$ is valid up to a constant factor (see Contribution from the Peaks of Trajectories in Appendix I for details). A comparison between these predictions and site frequency spectra obtained in Wright–Fisher simulations of the model is shown in Figure 2.
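The leading-order forms for rare alleles can be collected into a single piecewise function for plotting or quick comparison with simulations. This is a rough sketch under stated assumptions rather than the authors' implementation: the asymptotic conditions (much smaller or much larger than a threshold) are replaced by sharp boundaries at $1/(N\sigma)$ and $1/(Nse^{-\lambda})$, the middle branch uses the square-root reading of Equation 41 and is only accurate up to a constant factor, and the function is not meant to be used for $f$ close to 1. The name `neutral_sfs_rare` is hypothetical.

```python
import math


def neutral_sfs_rare(f, N, s, Ud, Un):
    """Leading-order neutral site frequency spectrum p(f) for f << 1.

    Sharp regime boundaries are an illustrative simplification; the true
    spectrum crosses over smoothly between the three forms.
    """
    lam = Ud / s
    sigma = math.sqrt(Ud * s)
    f_drift = 1.0 / (N * sigma)                   # 1/(N sigma)
    f_class0 = 1.0 / (N * s * math.exp(-lam))     # 1/(N s e^{-lambda})
    if f < f_drift:
        return 2.0 * N * Un / f                   # drift-dominated, Eq. 38
    if f < f_class0:
        k = math.log(f_class0 / f, lam)           # log_lambda(1/(N s f e^{-lam}))
        return N * Un / (N * s * f ** 2 * math.sqrt(lam * k))  # Eq. 41, top
    return 2.0 * N * Un * math.exp(-lam) / f      # Eq. 41, bottom
```

The middle branch diverges as $f$ approaches $1/(Nse^{-\lambda})$ from below because the logarithm vanishes there, which is a reminder that the leading-order expression should not be trusted near the regime boundary.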

These results reproduce much of what we may have anticipated from our analysis of allele frequency trajectories. At frequencies $f \gg 1/(Nse^{-\lambda})$, these peaks represent the mirrored and amplified trajectories in the mutation-free ($k = 0$) founding class. To reach these frequencies, mutations need to arise in the mutation-free class (which happens at rate $NU_ne^{-\lambda}$) and drift to substantial frequencies [$f_0 \gg 1/(Ns)$]. Since fluctuations in the founding class of lineages that exceed this frequency are slow compared to the timescale on which their deleterious descendants remain at their peak ($\Delta t = 1/s$), the entire allele frequency trajectory reproduces the fluctuations of the neutral founding class. Thus, neutral site frequency spectra proportional to $f^{-1}$ emerge.

At smaller frequencies, $f \ll 1/(Nse^{-\lambda})$, allele frequency trajectories reflect the smoothed-out fluctuations in the high-fitness classes. At these frequencies, the site frequency spectrum comprises a rapidly increasing number of polymorphisms as the frequency decreases for three reasons (see Figure 3 and Figure 7). First, lower frequencies correspond to smaller feeding-class weights, which are more likely simply due to the effects of drift. Note that because in this frequency range the overall allele frequency is proportional to the total weight in the founding class and not the frequency, this effect leads to the site frequency spectrum falling off at a faster rate than the baseline expectation of $f^{-1}$, which would occur in the absence of smoothing of fluctuations in the founding class due to the finite timescale of selection. Second, the number of individuals with $k$ deleterious mutations in the locus increases with $k$ (for $k < \lambda$), causing an increase in the overall rate at which alleles peaking at lower frequencies arise. This variation in the overall number of such alleles gives rise to the steeper power law, $f^{-2}$, that can be compared to the distribution of peaks of individual lineages, which decays at most as $f^{-3/2}$. Finally, peaks that occur at frequencies $g_{k_c(f)}/(2Nsk_c(f)) \ll f \ll g_{k_c(f)-1}/[2Ns(k_c(f)-1)]$ have duration of order $\Delta t^{[k_c(f)]} \sim 1/\left(\sqrt{k_c(f)}\,s\right)$, which declines with the frequency $f$, giving rise to the root-logarithm factor.

The site frequency spectrum at intermediate and high frequencies

At frequencies much larger than $1/(Nse^{-\lambda})$ but still smaller than $1 - 1/(Nse^{-\lambda})$, the allele frequency trajectory is described by an effective neutral model on coarse enough timescales ($\gg 1/s$). At these frequencies, the site frequency spectrum is

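$$p(f) \approx \frac{2NU_ne^{-\lambda}}{f}, \quad \text{for } \frac{1}{Nse^{-\lambda}} \ll f \ll 1 - \frac{1}{Nse^{-\lambda}}. \tag{42}$$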

Note that this agrees with the result of the branching process calculation, which is valid in a part of this range, at frequencies corresponding to $1/(Nse^{-\lambda}) \ll f \ll 1$ and $1/(Nse^{-\lambda}) \ll 1 - f \ll 1$.

This breaks down at even higher frequencies, $1 - f \ll 1/(Nse^{-\lambda})$. These frequencies correspond to the extinction phase of the wild type, during which the allele frequency no longer mirrors the frequency in the 0-class, but instead declines exponentially at an accelerating rate (see Equation 35). Equation 35 can be straightforwardly integrated in time (see Contribution from the Extinction Stage of Trajectories in Appendix I for details), which yields the form of the site frequency spectrum at these high frequencies

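$$p(f) \approx \frac{U_n}{s(1 - f)\log_\lambda\left(1/(Ns(1 - f)e^{-\lambda})\right)}, \quad \text{for } \frac{1}{N\sigma} \ll 1 - f \ll \frac{1}{Nse^{-\lambda}}. \tag{43}$$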

Finally, once the wild-type frequency falls below $1/(N\sigma)$, it will be in a situation analogous to that of the mutant at very low frequencies: independent of how it is distributed among the fitness classes, its trajectory will be dominated by drift, since most individuals in the population have fitness that does not differ from the mean fitness by more than $\sigma$. Thus, at these frequencies, the site frequency spectrum will once again agree with the site frequency spectrum of neutral loci isolated from any selected sites in the genome:

$$p(f) \approx 2U_n, \quad \text{for } 1 - f \ll \frac{1}{N\sigma}. \tag{44}$$

By comparing our predictions to the results of Wright–Fisher simulations, we can see that this argument correctly predicts the form of the site frequency spectrum in these regimes, as well as the frequencies at which these transitions happen (see Figure 3).

The deleterious site frequency spectrum

So far, we have focused primarily on describing the trajectories and site frequency spectra of neutral mutations. However, because the trajectory of a neutral mutation that arises in an individual with $k$ deleterious mutations is equivalent to the trajectory of a deleterious mutation that arises in an individual with $k - 1$ deleterious mutations, descriptions of trajectories of deleterious mutations follow without modification from our descriptions of trajectories of neutral mutations. The deleterious site frequency spectrum can thus be constructed from the single-class site frequency spectra of neutral mutations by a simple modification of the total rates at which new deleterious mutations arise [specifically, with the contribution to the deleterious site frequency spectrum of mutations arising in class $k$ being equal to $p_{\mathrm{del}}(f,k) = p_{\mathrm{neutral}}(f,k+1) \cdot NU_dh_k/(NU_nh_{k+1})$]. By summing these contributions, we find that the deleterious site frequency spectrum is to leading order

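$$p_{\mathrm{del}}(f) = \begin{cases} \dfrac{2NU_d}{f}, & \text{for } f \ll \dfrac{1}{N\sigma}, \\[2ex] \dfrac{1}{f^2\lambda\log_\lambda\left(1/(Nsfe^{-\lambda})\right)}, & \text{if } \dfrac{1}{N\sigma} \ll f \ll \dfrac{1}{NU_de^{-\lambda}}, \\[2ex] 0, & \text{otherwise}, \end{cases} \tag{45}$$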

where the form proportional to $f^{-2}\log[1/(Nsfe^{-\lambda})]^{-1}$ is once again valid up to a constant factor (for the same reason as described in Contribution from the Peaks of Trajectories in Appendix I).
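The rate-conversion factor that maps single-class neutral spectra onto deleterious ones has a simple closed form. The sketch below assumes that the class frequencies follow the Poisson form $h_k = e^{-\lambda}\lambda^k/k!$ with $\lambda = U_d/s$, which is consistent with $h_0 = e^{-\lambda}$ quoted in the text; the function names are hypothetical. Under that assumption the factor $NU_dh_k/(NU_nh_{k+1})$ reduces to $(U_d/U_n)(k+1)/\lambda = (s/U_n)(k+1)$, since the population size cancels.

```python
import math


def h(k, lam):
    """Poisson class frequency h_k = e^{-lam} lam^k / k! (assumed form)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def deleterious_rate_factor(k, Ud, Un, s):
    """Factor N Ud h_k / (N Un h_{k+1}) relating p_del(f, k) to p_neutral(f, k + 1)."""
    lam = Ud / s
    return Ud * h(k, lam) / (Un * h(k + 1, lam))
```

The factor grows linearly with $k$, so deleterious mutations arising on more heavily loaded backgrounds are weighted more strongly relative to their neutral counterparts one class below.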

Distributions of Effect Sizes

The model we have thus far considered assumes that all deleterious mutations have the same effect on fitness, $s$. In reality, different deleterious mutations will have different fitness effects. In Appendix J, we show that as long as the variation in the distribution of fitness effects (DFE) is small enough that $\mathrm{var}(s)/\bar{s}^2 \ll 1/\log(U_d/\bar{s})$, the effects of background selection are well captured by a single-$s$ model. In practice, this means that when considering moderate values of $U_d/\bar{s} \sim e^5$, fractional differences in selection coefficients up to $1/\log(U_d/\bar{s}) \approx 20\%$ will not substantively alter allele frequency trajectories. In this case, the combined effects of these mutations are well described by our single-$s$ model.
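The bound quoted above is a one-line calculation, shown here only to make the arithmetic explicit. The function name `dfe_spread_bound` is hypothetical, and the logarithm is taken to be the natural logarithm, which is what reproduces the quoted value of about 20% for $U_d/\bar{s} \sim e^5$.

```python
import math


def dfe_spread_bound(Ud, s_bar):
    """Return 1 / log(Ud / s_bar), the scale below which a single-s model is adequate.

    Assumes Ud > s_bar so that the logarithm is positive.
    """
    return 1.0 / math.log(Ud / s_bar)
```

Because the bound depends only logarithmically on $U_d/\bar{s}$, it changes slowly with the mutation rate: increasing $U_d/\bar{s}$ from $e^5$ to $e^{10}$ only halves it.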

However, when mutational effect sizes vary over multiple orders of magnitude, properties of the DFE will have an important impact on the quantitative details of the mutational trajectories that are not captured by our single-$s$ model. The qualitative properties of allele frequency trajectories will remain the same (see Appendix J): alleles arising on unusually fit backgrounds will rapidly spread through the fitness distribution, peak for a finite amount of time about $t_d$ generations later, and then proceed to go extinct at a rate proportional to their average fitness cost. However, the quantitative aspects of these trajectories will be different. For instance, small differences in the fitness effects of mutations, $\delta s \sim 1/t_d$, that do not affect the early stages of trajectories will be revealed on timescales of order $t_d$ and affect the size and the width of the peak of the allele frequency trajectory. We have seen that these two quantities play an important role in determining the properties of allele frequency trajectories and of the site frequency spectrum.

As a result of the fact that weaker effects and smaller differences in effect sizes play a more important role in later parts of the allele frequency trajectory, the DFE relevant during the early phases of the trajectory may be different than the DFE relevant in the later phases of the trajectory. Furthermore, since longer-lived trajectories are also those that reach higher frequencies (having originated in backgrounds of higher fitness), this can result in a different DFE that is relevant at larger frequencies compared to the DFE relevant at lower frequencies. As a result it is possible that, for certain DFEs, no single “effective” effect size can be used to describe the trajectories at all frequencies. The full analysis of a model of background selection in which mutational effect sizes have a broad distribution remains an interesting avenue for future work.

Discussion

In this work, we have analyzed how linked purifying selection changes patterns of neutral genetic diversity in a process known as background selection. We have found that whenever background selection reduces neutral genetic diversity, it also leads to significant distortions in the neutral site frequency spectrum that cannot be explained by a simple reduction in effective population size (see Figure 1). These distortions become increasingly important in larger samples and have more limited effects in smaller samples (Figure 1, B and C). In this sense, the sample size represents a crucial parameter in populations experiencing background selection.

By introducing a forward-time analysis of the trajectories of individual alleles in a fully linked genetic locus experiencing neutral and strongly selected deleterious mutations, we derived analytical formulas for the whole-population site frequency spectrum (see Equations 1 and 2). These results can be used to calculate any diversity statistic based on the site frequency spectrum in samples of arbitrary size. Our results also offer intuitive explanations of the dynamics that underlie these distortions and give simple analytical conditions that predict when such distortions occur (Figure 3). In addition to single time-point statistics such as the site frequency spectrum, our analysis also yields time-dependent trajectories of alleles. We suggest that these may be crucial for distinguishing between evolutionary models that may remain indistinguishable based on site frequency spectra alone. We explain how this intuition about the time-dependent behavior can, in principle, be used to make simple predictions about the history and future of alleles, and we explain that it suggests new statistics of time-resolved samples that can be used to distinguish between different evolutionary models. We discuss these implications in turn below.

The frequency of a mutation tells us about its history and future

In addition to describing the expected site frequency spectrum at a single time point, our analysis of allele frequency trajectories allows us to calculate time-dependent quantities such as the posterior distribution of the past frequency trajectory of polymorphisms seen at a particular frequency, their ages, and their future behavior. For example, since the maximal frequency a mutation can attain strongly depends on the fitness of the background in which it arose (with lower-fitness backgrounds constraining trajectories to lower frequencies), observing an allele at a given frequency places a lower bound on the fitness of the background on which it arose. This in turn is informative about its past frequency trajectory. For example, alleles observed at frequencies $f \gg 1/(NU_d e^{-\lambda})$ almost certainly arose in an individual that was among the most-fit individuals in the population and experienced a rapid initial exponential expansion at rate $U_d$, while alleles observed at frequencies $f \gg [k/(e\lambda)]^k/(2Nske^{-\lambda})$ very likely arose on backgrounds with fewer than $k$ deleterious mutations compared to the most-fit individual at the time. We emphasize that these thresholds are substantially smaller than the naive thresholds obtained by assuming that a mutation arising on a background with $k$ mutations can only reach the drift barrier $1/(N_e ks)$ corresponding to isolated deleterious mutations of fitness $-ks$ in a population of effective size $N_e = Ne^{-\lambda}$.

The fitness of the ancestral background on which a mutation arose is not only interesting in terms of characterizing the history of a mutation, but is also informative of its future behavior. In the strong-selection limit of background selection that we have considered here ($Nse^{-\lambda} \gg 1$), deleterious mutations can fix in the population only exponentially rarely (Neher and Shraiman 2012). Thus, mutations arising on backgrounds already carrying deleterious mutations must eventually go extinct. We have shown that the site frequency spectrum at frequencies $f \ll 1/(Nse^{-\lambda})$ is dominated by mutations arising on deleterious backgrounds. Furthermore, we have shown that most polymorphisms seen at these frequencies are at the peak of their frequency trajectory. This means that we expect the frequency of such polymorphisms to decline on average. Thus, if we were to observe the population at some later time point, we expect that the polymorphisms present at such low frequencies should on average be observed at a lower frequency. In Figure 8, we show how the average change in frequency after $Ne^{-\lambda}$ generations depends on the original frequency, $f$, at which a mutation was sampled. Note that the expectation for a neutral population of any size is that the average allele frequency change is exactly equal to zero. In the presence of background selection, this is no longer true for neutral mutations previously observed at frequencies $f < 1/(Nse^{-\lambda})$ and $f > 1 - 1/(Nse^{-\lambda})$ (see Figure 8).

Figure 8.

The average change in frequency of an allele observed at frequency $f$ in an earlier sample. The pairs of frequencies were sampled in two consecutive sampling steps in Wright–Fisher simulations in which $N = 10^5$, $NU_d = 4Ns = 2 \cdot 10^4$. In the first step, the frequencies of all polymorphisms in the population and their unique identifiers were recorded. In the second sample, which was taken $\delta t = Ne^{-U_d/s}$ generations later, the frequencies of all polymorphisms seen in the first sample were recorded [if a polymorphism had gone extinct, or fixed, a frequency $f(\delta t) = 0$, or 1, was recorded]. The average represents the average frequency change over many distinct polymorphisms and the curve has been smoothed using a Gaussian kernel with width <5% of the minor allele frequency.

In contrast, since polymorphisms observed in the range $1/(Nse^{-\lambda}) \ll f \ll 1 - 1/(Nse^{-\lambda})$ must have originated in a mutation-free background, and since their dynamics reflect neutral evolution in this 0-class, the overall dynamics of such alleles are neutral. Therefore, although drift will lead to variation in the outcomes of individual alleles in this range, the average expected frequency change is equal to zero. This expectation is confirmed by simulations (Figure 8).

Finally, we have seen that polymorphisms seen at frequencies $f \gg 1 - 1/(Nse^{-\lambda})$ will typically already have replaced the wild-type allele within the 0-class. Thus, the wild-type allele must eventually go extinct (except for exponentially rare ratchet events). In other words, polymorphisms seen at these frequencies are certain to fix, replacing the ancestral allele at some later point in time (see Figure 8). Together, these results show that the site frequency spectrum can be divided into three regimes in which the dynamics of individual neutral alleles are effectively negatively selected, effectively neutral, and effectively positively selected. These effective selection pressures arise indirectly as a result of the fitnesses of the variants to which a neutral mutation at a given frequency is likely to be linked. This effect is important to bear in mind when analyzing time-resolved samples where these effective selection pressures could naively be misinterpreted as evidence of direct negative selection on low frequency-derived neutral alleles, and direct positive selection on high frequency-derived neutral alleles.
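The three regimes described above can be illustrated with a short sketch. The function names and parameter values below are hypothetical, and the asymptotic conditions are replaced by sharp cutoffs at $1/(Nse^{-\lambda})$ and $1 - 1/(Nse^{-\lambda})$, which is a simplification: the actual transitions between regimes are gradual.

```python
import math

def scaled_strength(N, s, Ud):
    """Return N * s * exp(-Ud / s), the quantity that sets the regime boundaries."""
    return N * s * math.exp(-Ud / s)

def classify_frequency(f, N, s, Ud):
    """Assign a neutral allele at frequency f to one of the three regimes.

    Sharp cutoffs are an illustrative stand-in for the asymptotic
    conditions (much smaller / much larger than) used in the text.
    """
    x = scaled_strength(N, s, Ud)
    if x <= 2.0:
        raise ValueError("no intermediate neutral regime: N*s*exp(-Ud/s) <= 2")
    low = 1.0 / x
    high = 1.0 - low
    if f < low:
        return "effectively negatively selected"
    if f > high:
        return "effectively positively selected"
    return "effectively neutral"

# Illustrative values only (lambda = Ud / s = 4, N*s*exp(-lambda) is about 92).
N, s, Ud = 1e5, 0.05, 0.2
for f in (0.001, 0.5, 0.999):
    print(f, classify_frequency(f, N, s, Ud))
```

Such a classifier is only a rough guide for interpreting time-resolved samples; alleles close to either cutoff can behave like either neighboring regime.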

The distinguishability of models based on site frequency spectra

As has long been appreciated, background selection can lead to signatures in the site frequency spectrum that are qualitatively similar to population expansions and selective sweeps (Charlesworth et al. 1993, 1995; Hudson and Kaplan 1994; Tachida 2000; Gordo et al. 2002; Williamson and Orive 2002; O’Fallon et al. 2010; Nicolaisen and Desai 2012; Walczak et al. 2012; Good et al. 2014). Here we have shown that these similarities are not only qualitative, but (up to logarithmic corrections) also quantitatively agree with the site frequency spectra produced under these very different scenarios (Yule 1924; Lea and Coulson 1949; Mandelbrot 1974). This suggests that distinguishing between these models based on site frequency spectra alone may not be possible. We emphasize that these effects of background selection that mimic population expansions are seen in neutral site frequency spectra in a model in which the population size is fixed, so using synonymous site frequency spectra to “correct” for the effects of demography may not always be justified.

The quantitative agreement between the effects of background selection and positive selection that we have seen in the high-frequency end of the frequency spectrum is not purely incidental. In the presence of substantial variation in fitness, alleles that fix among the most-fit genotypes in the population are, in a sense, truly positively selected because they are linked to fewer deleterious mutations than average. As a result, sweep-like behaviors can occur in the absence of positive selection, as long as there is substantial fitness variation, independent of the source of this variation (i.e., whether it arose as a result of beneficial or deleterious mutations). In this case, these models may be indistinguishable even using time-resolved statistics because the allele frequency trajectories themselves have similar features.

In other cases, time-resolved statistics may be able to differentiate between models that produce similar site frequency spectra. For example, under background selection the low-frequency end of the site frequency spectrum is dominated by mutations that are linked to a larger-than-average number of deleterious variants; alleles in this regime are therefore expected to decline in frequency on sufficiently long timescales [of order log(λ)/s]. In contrast, in an exponentially expanding population, mutations present at these frequencies are very unlikely to change in frequency during the expansion. Thus we may be able to distinguish between these models using samples from the same population spaced far enough apart in time.

The effect of very strong deleterious mutations [$s \sim \mathcal{O}(1)$]

In our analysis of background selection, we have focused on mutations with small absolute effects on fitness ($s \ll 1$). We make this choice because although deleterious mutations with very strong effect (e.g., lethal mutations) do exist, they are unlikely to lead to a substantial reduction in genetic diversity unless they also occur at a very high rate ($U_d \gtrsim s \sim 1$). In other words, these mutations can only have a substantial effect on reducing diversity if a large fraction of individuals in the population acquire them every generation. Thus, such mutations appear less likely to lead to strong effects on diversity in natural populations than mutations with smaller absolute effects on fitness.

Relationship to the structured coalescent

Throughout this work, we have assumed that selection against deleterious mutations is strong (i.e., $Nse^{-\lambda} \gg 1$), such that they are exponentially unlikely to fix (i.e., Muller’s ratchet is rare). This is the same limit in which the structured coalescent of Hudson and Kaplan (1994) is valid. Since our forward-time analysis of mutational trajectories uses similar approximations as are implicit in that method, it therefore has the same expected range of validity and accuracy. Although this limit has occasionally been referred to as “weak selection” in some prior literature, we emphasize that an assumption that is implicit in the structured coalescent is that selection against deleterious mutations is sufficiently strong that they do not routinely fix. In Supplemental Material, Figure S1, we show that our theoretical predictions indeed produce site frequency spectra that agree with the results of forward-time simulations roughly as well as numerical predictions generated using the structured coalescent. The advantage of our method is that it provides analytical predictions and scales to arbitrary sample sizes, in contrast to the structured coalescent, which is a numerical algorithm for conducting backward-time simulations.

Relationship to results on weakly selected deleterious mutations

In the case where selection is weak enough that deleterious mutations have a substantial probability of fixation (which occurs when $Nse^{-\lambda} \lesssim 1$), the population ratchets to lower fitness at the locus. In this limit, much like in the strong-selection case that we have studied here, the magnitude of the effects of background selection on diversity is controlled by whether or not deleterious mutations lead to substantial fitness variation at the locus. When deleterious mutation rates are low enough that the scaled standard deviation in fitness satisfies $N\sigma \ll 1$, site frequency spectra look largely neutral (see Figure S2 and Good et al. 2014). However, if the deleterious mutation rate is large enough that $N\sigma \gg 1$, previous work has shown that substantial distortions can result (Neher and Shraiman 2011; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Good et al. 2014). By analyzing evolutionary dynamics in this limit, Neher and Hallatschek (2013) have shown that the resulting site frequency spectrum scales as $f^{-2}$ at low frequencies and as $\{(1-f)\log[1/(1-f)]\}^{-1}$ at high frequencies. These forms are similar to our limiting expressions in the high- and low-frequency ends of the spectrum, but do not contain the neutral region at intermediate frequencies. This neutral region shrinks as $Nse^{-\lambda}$ declines and disappears when $Nse^{-\lambda} \to 2$. Thus, the form of the site frequency spectrum in Equation 2 approaches the limiting forms for weak selection as $Nse^{-\lambda} \to 1$ (Figure S2), exactly as expected for the transition to the weakly selected regime (see also Good et al. 2014).

Earlier work has argued that genealogies in this weak-selection limit ($Nse^{-\lambda} \lesssim 1$) approach the Bolthausen–Sznitman coalescent when fitness variance in the population is sufficiently large, $N\sigma \gg 1$ (Neher and Shraiman 2011; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Good et al. 2014). Recently, Hallatschek (2017) has studied allele frequency trajectories that arise in the forward-time dual of the Bolthausen–Sznitman coalescent. Our analysis of trajectories in the presence of strong background selection reveals many of the interesting features seen in that work. For instance, we have seen that, once an allele spreads through the fitness distribution and reaches mutation–selection balance, an effective frequency-dependent selection coefficient emerges:

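$$s_{\mathrm{eff}}(f) = \begin{cases} s\log\left(Nse^{-\lambda}f\right), & \text{if } f \ll \frac{1}{Nse^{-\lambda}}, \\ 0, & \text{if } \frac{1}{Nse^{-\lambda}} \ll f \ll 1 - \frac{1}{Nse^{-\lambda}}, \\ s\log\left[\frac{1}{Nse^{-\lambda}(1-f)}\right], & \text{otherwise.} \end{cases} \tag{46}$$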

This effective selection coefficient arises due to the deleterious mutations to which neutral mutations are linked. It changes with the frequency $f$ of the mutation as high-fitness individuals within the neutral lineage drift to extinction or fixation [see Figure 3 and Large Lineages Arising on Unusually Fit Backgrounds in Appendix E], and is equal to 0 in the quasi-neutral regime (Figure 3). This is analogous to the fictitious selection coefficient, $s_{\mathrm{fic}}(f) = \log[f/(1-f)]$, that emerges in the model analyzed by Hallatschek (2017). The difference in the frequency dependence of the effective selection coefficient between our results and the Hallatschek (2017) model is large when $Nse^{-\lambda} \gg 1$, but becomes negligible as $Nse^{-\lambda} \to 1$; it underlies the differences between the site frequency spectra of rapidly adapting or ratcheting populations and the strong background selection limit that we have considered here.

Note, however, that there still exists a clear discrepancy between the form of the site frequency spectrum at low frequencies that arises in the Bolthausen–Sznitman coalescent, and the functional form that is obtained by analyzing evolutionary models of weak selection (Figure S3). This suggests that the correspondence between these evolutionary models and the Bolthausen–Sznitman coalescent is only approximate, even in the limit that $N\sigma \to \infty$. In particular, although the two seem to share dynamical properties which arise once the lineage spreads out through the fitness distribution (including the frequency-dependent selection coefficient), as well as similarities in some aspects of fluctuations in the numbers of high-fitness individuals as they accumulate further mutations (i.e., due to genetic draft; see, e.g., Kosheleva and Desai 2013), it is not immediately obvious that other aspects that we have described here, such as the smoothing of fluctuations due to drift, are identical in both models.

Extensions and limitations of our analysis

We have studied a simple model of a perfectly linked locus at which all mutations are either neutral or deleterious with the same effect on fitness, $s$. Our primary goal has been to describe the qualitative and quantitative effects of background selection on frequency trajectories and the site frequency spectrum within this simplest possible context. However, it is important to note that the assumptions of our model are likely to be violated in natural populations. In many cases, these additional complications do not change the general conclusions of our analysis. For example, the qualitative properties of the trajectories and site frequency spectra described here apply when deleterious mutations have a broader distribution of effect sizes, and we have shown here that our results are quantitatively unchanged when the distribution of effect sizes (DFE) is sufficiently narrow [$\mathrm{var}(s)/\bar{s}^2 \ll 1/\log(U_d/\bar{s})$]. On the other hand, when the DFE is very broad, additional work will be required to determine the quantitative properties of site frequency spectra. We anticipate different parts of the DFE may be important at different frequencies in sufficiently broad DFEs. If this is true, this would be an unusual feature of strong negative selection that does not arise in the case of strong positive selection, in which the effects of DFEs can usually be summarized by a single, predominant fitness effect (Good et al. 2012).

Finally, our assumption of perfect linkage in the genomic segment is likely to be violated in sexual populations, in which sites that are separated by shorter genomic distances are more tightly linked than distant sites. However, even in the presence of recombination, alleles will remain effectively asexual on short enough genomic distances, and are effectively freely recombining on long enough genomic distances (Franklin and Lewontin 1970; Slatkin 1972). In this case, a standard heuristic is to treat the genome as if it consists of freely recombining asexual blocks. In rapidly adapting or ratcheting populations, this heuristic has been shown to yield a rough approximation to diversity statistics when the “effective block length” is set by the condition that each block typically recombines once on the timescale of coalescence (Neher et al. 2013; Good et al. 2014; Weissman and Hallatschek 2014).

However, our analysis highlights that many of the interesting features of allele frequency trajectories in the presence of background selection occur on timescales much shorter than the timescale of coalescence. On these timescales, alleles will be fully linked on much longer genomic distances than this effective block length. This effect will be particularly important for young alleles, which are linked to long haplotypes because of the limited amount of time that recombination has had to break them up. On longer timescales, the length of the genomic segments to which these alleles are linked will become progressively shorter, but will typically not fall below the effective block length on any timescale. Given the strong dependence of allele frequency trajectories on the total mutation rate along this segment, it is less clear what effect such linkage to increasingly shorter genomic segments has on the statistics of allele frequency trajectories. A more detailed analysis of the effects of background selection in linear genomes remains an interesting direction for future work.

It is interesting to note that the effects of background selection on the site frequency spectrum in recombining genomes have been studied previously using forward-time simulations (see, e.g., McVean and Charlesworth 2000; Kaiser and Charlesworth 2008; Zeng and Charlesworth 2010) as well as using backward-time simulations based on extensions of the structured coalescent (see, e.g., Hudson and Kaplan 1994; Zeng and Charlesworth 2011). However, much like in the asexual case, analytical predictions for the magnitude of the effects of background selection in recombining populations are usually limited to samples of two individuals (Hudson and Kaplan 1995; Nordborg et al. 1996). More recently, there has been some interest in exploring the combined effects of background selection and population subdivision (Zeng and Corcoran 2015) or partial asexuality and selfing (Agrawal and Hartfield 2016; Roze 2016). Analytical results in these cases are also often limited to very small samples and to the limit $U_d \ll s$ in which the effects of background selection are modest. We hope that our forward-time approach can be extended in future work to explore the effect of background selection in the presence of such factors more fully.

Acknowledgments

We thank Oskar Hallatschek, Joachim Hermisson, Katherine Lawrence, Matthew Melissa, Richard Neher, Daniel Rice, Boris Shraiman, Shamil Sunyaev, John Wakeley, and the members of the Desai laboratory for useful discussions and helpful comments on the manuscript. Simulations in this article were run on the Odyssey cluster supported by the FAS Division of Science Research Computing Group at Harvard University. This work was supported in part by the Simons Foundation (grant 376196), grant DEB-1655960 from the National Science Foundation, and grant GM-104239 from the National Institutes of Health. The authors also acknowledge the Kavli Institute for Theoretical Physics at University of California, Santa Barbara, supported in part by the National Science Foundation grant PHY-1125915, National Institutes of Health grant R25 GM-067110, and the Gordon and Betty Moore Foundation grant 2919.01.

Appendix A: The Propagation of Fluctuations in the Size of the Founding Class

In this appendix, we consider in more detail how fluctuations in the size of the lineage in the founding class propagate to affect the total allele frequency. For this purpose, it will be convenient to consider a neutral mutation that arose in the $k = 0$ class sufficiently long ago that it is in mutation–selection balance. Let the total frequency of the lineage be $f$. As in the main text, we denote the frequency of the part of the lineage that is in class $i$ by $f_i$. In mutation–selection balance, the $f_i$ will satisfy

$$f_i = e^{-\lambda}\frac{\lambda^i}{i!}f. \tag{A1}$$

Consider what happens if the frequency $f_0$ of the founding genotype changes suddenly to some value $f_0 + \delta f_0$. Based on the deterministic solution, after a time $t$, this will lead to a change in the frequency of the part of the lineage in class $i$, $\delta f_i(t)$, of

$$\delta f_i(t) = \delta f_0 \frac{\left[\lambda\left(1 - e^{-st}\right)\right]^i}{i!}. \tag{A2}$$

In other words, the relative change in the frequency of the lineage in the i-class is

$$\frac{\delta f_i(t)}{f_i} = \frac{\delta f_0}{f_0}\left[1 - e^{-st}\right]^i. \tag{A3}$$

This approaches $\delta f_0/f_0$ at long times as the allele reestablishes mutation–selection balance. However, we can see from Equation A3 that this change is not felt at the same time in all classes. In the 1-class, the frequency changes gradually, at rate $s$ (Equation A3), and results in a proportional change roughly $\tau_1 = 1/s$ generations later. In general, in the $i$-class, this change is felt after a total delay of roughly $\tau_i = \log(i)/s$ generations. Thus, the change propagates from class $i$ to class $i+1$ over the course of

$$\tau_{i+1} - \tau_i \approx \frac{1}{(i+1)s} \tag{A4}$$

generations.
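The delays above can be made concrete with a small numerical sketch. It assumes the relative response $[1 - e^{-st}]^i$ of Equation A3 and defines the delay, somewhat arbitrarily, as the time at which this response reaches one half; the helper name `half_response_time` and the value of `s` are illustrative choices, not quantities taken from the text.

```python
import math

def half_response_time(i, s):
    """Time t at which (1 - exp(-s * t)) ** i equals 1/2 (assumed definition of the delay)."""
    return -math.log(1.0 - 0.5 ** (1.0 / i)) / s

s = 0.01  # illustrative selection coefficient
for i in (1, 2, 5, 10, 50):
    t_half = half_response_time(i, s)
    step = half_response_time(i + 1, s) - t_half
    # Compare with the rough estimates log(i)/s and 1/((i + 1) s) quoted in the text.
    print(i, round(t_half, 1), round(math.log(i) / s, 1),
          round(step, 2), round(1.0 / ((i + 1) * s), 2))
```

With this definition the delay grows as $\log(i)/s$ plus a constant offset of order $1/s$, and the step from class $i$ to class $i+1$ approaches $1/[(i+1)s]$ for large $i$. A different response threshold would change the offset but not the logarithmic growth.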

Ultimately, $\tau_\lambda = \log(\lambda)/s = t_d$ generations later, this change will have been felt in a substantial fraction of the fitness distribution. Fitness classes near the mean of the distribution (which is $\lambda$ classes below the 0-class) are those that exhibit the largest absolute change in frequency, since they contain the largest number of individuals when the lineage is in mutation–selection balance. Thus, changes in these classes account for a large proportion of the change in the total allele frequency, which explains the origin of the delay timescale, $t_d$, that we have introduced in the main text.
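As a numerical check on this picture, the sketch below integrates the deterministic dynamics of a perturbation, assumed here to obey $d\,\delta f_i/dt = -is\,\delta f_i + U_d\,\delta f_{i-1}$ with the founding-class perturbation held fixed, and compares the result with the closed form $[\lambda(1 - e^{-st})]^i/i!$. The parameter values, the truncation at 40 classes, and the function names are illustrative assumptions.

```python
import math

def rhs(df, lam, s):
    """Deterministic rate of change of the perturbation in each class."""
    Ud = lam * s
    out = [0.0] * len(df)  # class 0 is neutral, so its perturbation stays fixed
    for i in range(1, len(df)):
        out[i] = -i * s * df[i] + Ud * df[i - 1]
    return out

def integrate(lam, s, t_end, n_classes=40, steps=4000):
    """RK4 integration starting from a unit perturbation in class 0 only."""
    dt = t_end / steps
    df = [1.0] + [0.0] * (n_classes - 1)
    for _ in range(steps):
        k1 = rhs(df, lam, s)
        k2 = rhs([x + 0.5 * dt * k for x, k in zip(df, k1)], lam, s)
        k3 = rhs([x + 0.5 * dt * k for x, k in zip(df, k2)], lam, s)
        k4 = rhs([x + dt * k for x, k in zip(df, k3)], lam, s)
        df = [x + dt * (a + 2 * b + 2 * c + d) / 6.0
              for x, a, b, c, d in zip(df, k1, k2, k3, k4)]
    return df

def analytic(i, lam, s, t):
    """Closed-form response of class i to a unit change in class 0."""
    return (lam * (1.0 - math.exp(-s * t))) ** i / math.factorial(i)

lam, s = 4.0, 0.01               # illustrative values
t_d = math.log(lam) / s          # delay timescale log(lambda)/s
numeric = integrate(lam, s, t_d)
max_err = max(abs(numeric[i] - analytic(i, lam, s, t_d)) for i in range(10))
# Fraction of the eventual change in total frequency realized by time t_d.
response = sum(numeric) / math.exp(lam)
print(max_err, response)
```

Summing the closed form over classes gives a total response of $\exp(-\lambda e^{-st})$ relative to its final value, which equals $e^{-1}$ at $t = t_d$ under these assumptions; the numerical sum reproduces this.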

Appendix B: The Large Deviations from Average Behavior Caused by Genetic Drift

In this appendix, we consider the importance of the effects of drift in each individual fitness class on the overall allele frequency. In the first subsection, we revisit a standard argument to explain why fluctuations due to genetic drift in the frequency of the founding genotype can never be neglected, framing it in terms that will be useful when considering the importance of drift in classes below the founding class. In the next subsection, we build on this argument to explain why the effects of drift become negligible in all classes $i$ in which the frequency, $f_i$, of the component of the lineage in that class satisfies $f_i \gg 1/(Nsi)$, but cannot be neglected in all classes in which the frequency does not exceed that threshold.

The Importance of Genetic Drift in the Founding Class

The essential reason why drift can never be neglected in the early phase of a trajectory is that, at low frequencies, deviations from the average behavior caused by drift are not small perturbations, but are extremely broadly distributed. Consider for instance a mutation that arises in class $k$. As we explain in the main text, the founding genotype feels an effective selection coefficient equal to $-ks$. The “deterministic trajectory” of the founding genotype is therefore

$$f_k(t) = f_k(0)e^{-kst}. \tag{B1}$$

In other words, the deterministic trajectory of a neutral founding genotype (k=0) is a flat line, whereas the deterministic trajectory of a deleterious founding genotype (k>0) decays exponentially at rate ks.

However, we know that drift leads to large deviations from the deterministic behavior in Equation B1. In fact, we have mentioned that when $f_k \ll 1/(Nsk)$, drift can lead to an $x$-fold increase above this expectation with probability $1/x$ (Fisher 2007). Thus, the deviations from the deterministic expectation due to drift are distributed according to an extremely broad power law. As a result, large deviations from Equation B1 are very likely. For lineages arising in the 0-class, these deviations can take the frequency of a lineage all the way to fixation. However, deleterious founding genotypes with $k > 0$ are exponentially unlikely to exceed the drift barrier at $1/(Nsk)$. Thus, the distribution of deviations from the mean, deterministic behavior of these founding genotypes also follows the same power law at low frequencies [$f_k \ll 1/(Nsk)$], but is capped by selection at frequencies exceeding $1/(Nsk)$. As a result, the effects of drift on trajectories of deleterious mutations become perturbative at sufficiently large frequencies and can therefore be neglected when $f_k \gg 1/(Nsk)$.
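The $1/x$ law and its cap by selection can be illustrated with a toy simulation. The sketch below swaps the Wright–Fisher dynamics for a much simpler stand-in, a random walk on lineage size in which each event is a birth with probability `p_up` and a death otherwise; this model and all names and parameter values are assumptions made for illustration. For the unbiased walk, the probability that a lineage founded by one individual reaches size $x$ before extinction is exactly $1/x$.

```python
import random

def reaches(x, p_up, rng):
    """Follow a lineage founded by one individual until it dies out or reaches size x."""
    n = 1
    while 0 < n < x:
        n += 1 if rng.random() < p_up else -1
    return n == x

def hit_probability(x, p_up=0.5, trials=20000, seed=1):
    """Monte Carlo estimate of the probability of an x-fold increase."""
    rng = random.Random(seed)
    return sum(reaches(x, p_up, rng) for _ in range(trials)) / trials

# Neutral lineage: the estimate should be close to 1/x.
for x in (2, 5, 10, 20):
    print(x, hit_probability(x), 1.0 / x)

# Deleterious lineage (deaths favored): large excursions are strongly suppressed.
print(hit_probability(20, p_up=0.45))
```

In the biased walk the probability of reaching size $x$ falls off geometrically in $x$ rather than as $1/x$, which is the toy-model analogue of the cap that selection imposes above the drift barrier.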

Because fluctuations in $f_k$ always propagate to classes of lower fitness, drift in the founding class has an important impact on the overall allele frequency whenever it has an important impact on $f_k$. This means that the overall frequency trajectory of alleles founded in the 0-class will always be affected by drift in $f_k$, which will cause large, power law-distributed deviations from the deterministic expectation of the total allele frequency trajectory. Similarly, the overall frequency trajectory of alleles founded in a class with $k > 0$ deleterious mutations will be affected by drift in the founding class when the overall allele frequency satisfies $f \ll g_k \cdot 1/(Nsk)$ [which corresponds to founding class frequencies $f_k \ll 1/(Nsk)$], but will be prevented by selection from exceeding $g_k \cdot 1/(Nsk)$ (see also Appendix E).

The Importance of Genetic Drift in Classes Below the Founding Class

Given these arguments, one may wonder whether the effects of drift are also important in classes below the founding class, in which individuals carry $i > k$ deleterious mutations. Deviations from deterministic behavior in these classes (i.e., in $f_i$) are also propagated to classes of lower fitness. Such deviations in $f_i(t)$, if large, will also have a large impact on the overall frequency trajectory of the allele, $f(t)$. However, since classes below the founding class receive substantial mutational input from higher classes, it is not immediately clear whether the effects of drift on $f_i(t)$ will “average out” as a result of these mutations, or whether drift can still lead to large deviations from the deterministic expectation for $f_i(t)$. In Appendix E, we show by formally analyzing the distribution of allele frequency trajectories that drift in class $i$ is negligible when $f_i \gg 1/(2Nsi)$, and in this appendix we give a heuristic argument explaining why this threshold arises. This heuristic argument does not reproduce $\mathcal{O}(1)$ factors that are obtained using formal methods [i.e., the factor of 1/2 in $1/(2Nsi)$], but it offers additional intuition on the existence of this threshold and its dependence on the parameters $N$, $s$, and $i$.

The threshold $1/(Nsi)$ is reminiscent of the drift barrier relevant for single deleterious loci of fitness $-is$. However, its relevance in classes below the founding class is not immediately obvious. Although the individuals in class $i$ also feel an effective selection pressure equal to $-is$, new mutational events from class $i-1$ counter these effects of selection. Thus, it is not obvious that the combination of the opposing effects of mutation into the class and selection within the class will be stronger than the effects of drift whenever $f_i \gg 1/(Nsi)$ (as opposed to some other threshold that also depends on $U_d f_{i-1}$).

To gain insight into this, we consider in more detail the effects of individual mutational events into class $i$. Each of these mutational events can be thought of as founding a new sublineage in class $i$. The frequency trajectory of each sublineage is the same as that of a single locus with fitness $-is$, and the overall trajectory $f_i(t)$ is equal to the sum of the trajectories of these sublineages. When a sublineage is small, drift will lead to large deviations from its average (deterministic) frequency trajectory, which is also given by Equation B1. However, as in the founding class, at frequencies larger than $1/(Nsi)$ these deviations are capped by the effects of selection. Thus, the drift barrier $1/(Nsi)$ represents the frequency above which fluctuations cannot lead to large deviations of individual sublineages from the average behavior.

To understand when drift has an important impact on the overall $i$-class trajectory $f_i(t)$, we can consider how these deviations in the trajectories of the sublineages add. At sufficiently small frequencies, $f_i \ll 1/(Nsi)$, the overall trajectory $f_i$ will be equal to the sum of random trajectories that have an extremely broad distribution. In this case, the sum will be dominated by the trajectory of the largest sublineage, which will be very different from the average trajectory. Thus, even when the total number of mutational events into class $i$ is large, the effects of genetic drift in class $i$ may not be negligible if each of these mutational events results in a relatively small trajectory. In other words, fluctuations due to drift in the frequency trajectories do not average out, but are rather dominated by the largest deviation from the mean. Conversely, when the total number of sublineages is large enough that many of them reach the frequency $1/(Nsi)$ [which is guaranteed to happen if the total number of mutational events into the $i$-class is much larger than $1/(si)$], the overall frequency of the lineage will be much larger than $1/(Nsi)$. In this case, the largest event is no longer very different from the average event; the effects of genetic drift are therefore negligible compared to the effects of selection. The transition between these two behaviors happens when $f_i \sim 1/(Nsi)$, which roughly corresponds to exactly one sublineage exceeding $1/(Nsi)$. We discuss these effects using a more formal approach in Appendix G. Note that, by extending this argument to classes $i+1$ and lower, we can verify that once the frequency trajectory in class $i$ exceeds $1/(Nsi)$ and becomes predominantly shaped by mutation and selection, the frequency of the allele in all lower-fitness classes is also guaranteed to exceed the corresponding frequency thresholds. This is why we can also neglect the effects of drift in all classes below a class in which the frequency exceeds $1/(Nsi)$.
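The contrast between sums dominated by their largest term and sums that average out can be seen in a toy calculation. In the sketch below, sublineage "sizes" are drawn from a distribution with $P(X > x) = 1/x$ for $x \geq 1$, a hypothetical stand-in for the broad distribution produced by drift, and an optional `cap` mimics the ceiling imposed by selection. The numbers of sublineages and replicates are arbitrary illustrative choices.

```python
import random

def largest_share(n, cap, rng):
    """Share of the total size contributed by the single largest of n sublineages."""
    sizes = []
    for _ in range(n):
        x = 1.0 / (1.0 - rng.random())  # heavy-tailed draw: P(X > x) = 1/x for x >= 1
        if cap is not None:
            x = min(x, cap)             # selection caps the size of each sublineage
        sizes.append(x)
    return max(sizes) / sum(sizes)

def mean_largest_share(n, cap=None, reps=200, seed=0):
    """Average the largest-sublineage share over independent replicates."""
    rng = random.Random(seed)
    return sum(largest_share(n, cap, rng) for _ in range(reps)) / reps

uncapped = mean_largest_share(1000)
capped = mean_largest_share(1000, cap=10.0)
print(uncapped, capped)
```

Without a cap, a single sublineage typically accounts for a sizable share of the total even with 1000 contributions; with a cap that many sublineages reach, no single sublineage matters and the sum is close to its mean.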

Appendix C: The Generating Function for the Total Size of the Labeled Lineage

In this appendix, we consider the generating function for the total frequency of the lineage,

$$H_f(z,t) = \left\langle e^{-zf(t)} \right\rangle, \tag{C1}$$

and derive a partial differential equation describing how it changes in time. As described in the main text, when the size of the lineage is small [$f(t) \ll 1$], its dynamics are described by the coupled system of Langevin equations for the components $f_i(t)$ of the total frequency $f$, where $f_i$ denotes the frequency of the part of the lineage that carries $i$ deleterious mutations,

$$\frac{df_i(t)}{dt} = -isf_i + U_d f_{i-1} + \sqrt{\frac{f_i}{N}}\,\eta_i(t). \tag{C2}$$

In Equation C2, the $\eta_i$ are independent, uncorrelated Gaussian noise terms. The total allele frequency is equal to the sum of these components, $f(t) = \sum_i f_i(t)$.

Note that the total allele frequency $f(t)$ is not a Markov random variable since its evolution depends on the details of the distribution of the individuals within the lineage among the fitness classes. However, the frequencies of the components $f_i(t)$ are jointly Markov, with their joint distribution described by the joint generating function

$$H(\{z_i\},t) = \left\langle \exp\left[-\sum_i z_i f_i(t)\right] \right\rangle. \tag{C3}$$

The generating function for $f(t)$ can be obtained from the joint generating function by setting $z_i = z$ for all $i$. We can obtain a PDE for the joint generating function by Taylor expanding $H(\{z_i\},t+dt)$ and substituting in the differentials $df_i(t) = -isf_i\,dt + U_d f_{i-1}\,dt + \sqrt{f_i/N}\,\sqrt{dt}\,\eta_i(t)$ from Equation C2, which yields

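$$\frac{\partial H}{\partial t} = -\sum_i \left(isz_i + \frac{z_i^2}{2N} - U_d z_{i+1}\right)\frac{\partial H}{\partial z_i}. \tag{C4}$$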
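The intermediate steps behind this PDE can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' own derivation: it takes the joint generating function to be $\langle \exp[-\sum_j z_j f_j] \rangle$, expands to second order in the differentials, keeps only the terms of order $dt$ (with independent noise terms of unit variance), and uses $\langle f_i\, e^{-\sum_j z_j f_j} \rangle = -\partial H/\partial z_i$.

```latex
\begin{align*}
dH &= \Big\langle \Big[ -\sum_i z_i\, df_i
      + \frac{1}{2} \sum_i z_i^2 \frac{f_i}{N}\, dt \Big]
      e^{-\sum_j z_j f_j} \Big\rangle \\
   &= \sum_i \Big( is z_i + \frac{z_i^2}{2N} \Big)
      \big\langle f_i\, e^{-\sum_j z_j f_j} \big\rangle\, dt
      - U_d \sum_i z_i \big\langle f_{i-1}\, e^{-\sum_j z_j f_j} \big\rangle\, dt \\
   &= \sum_i \Big( is z_i + \frac{z_i^2}{2N} - U_d z_{i+1} \Big)
      \big\langle f_i\, e^{-\sum_j z_j f_j} \big\rangle\, dt
      \qquad \text{(relabeling } i \to i+1 \text{ in the mutation term)} \\
   &= -\sum_i \Big( is z_i + \frac{z_i^2}{2N} - U_d z_{i+1} \Big)
      \frac{\partial H}{\partial z_i}\, dt .
\end{align*}
```

The index shift in the third line is what couples $z_i$ to $z_{i+1}$: mutations flow from class $i$ to class $i+1$, so the generating-function variable of the lower-fitness class appears in the equation for the higher-fitness one.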

We can solve this PDE for the joint generating function by using the method of characteristics. The characteristic curves $z_i(t - t')$ are defined by

dzidt=isziUdzi+1+zi22N, (C5)

and satisfy the boundary condition zi(t)=z. The linear terms in the characteristic equation arise from selection and mutation out of the i-class, and the nonlinear term arises from drift. Along these curves, the generating function is constant and so H({zi},t)=H({zi(0)},0)=eifi(0)zi(0), where the initial condition fi(0)=δik/N corresponds to a single individual present in class k at t=0. Thus, to obtain a solution for the joint generating function, we need to integrate along the characteristics in Equation C5 backwards in time from t=0 to t=t. In the next few appendices, we obtain these solutions in the limits of weak (Uds) and strong mutation (Uds).

Appendix D: Trajectories in the Presence of Weak Mutation ($U_d\ll s$)

When deleterious mutations arise more slowly than selection removes them ($U_d/s\equiv\lambda\ll1$), deleterious descendants of a lineage are much less numerous than the founding genotype. To see this, we can expand the characteristics $z_i(t-t')$ in powers of the small parameter $\lambda$. At leading order, the characteristics are uncoupled and can be straightforwardly integrated to obtain

$$z_i^0(0)=\frac{ze^{-ist}}{1+\frac{z}{2Nsi}\left(1-e^{-ist}\right)}. \tag{D1}$$

By substituting this zeroth-order solution into Equation C5, we find that corrections due to deleterious descendants are $O(\lambda)$ and are therefore small uniformly in $z$. Thus, the generating function for the total frequency $f$ of the labeled lineage $t$ generations after arising in class $k$ is

$$H_f(z,t)=\exp\left[-\frac{1}{N}\frac{ze^{-kst}}{1+\frac{z}{2Nsk}\left(1-e^{-kst}\right)}\right]+O(\lambda), \tag{D2}$$

which agrees with classic results by Kendall (1948) for the generating function of independently segregating loci of fitness $-ks$.

Equation D2 can be inverted to obtain the probability distribution, p(f,t), by an inverse Laplace transform:

$$p(f,t)=\int_{-i\infty}^{i\infty}\frac{dz}{2\pi i}\,e^{fz}H_f(z,t). \tag{D3}$$

This distribution is well known, and can be obtained by standard methods. Noting that $H_f(z,t)$ has a single essential singularity at $z=-2Nsk/(1-e^{-kst})$, we can perform the integral above either exactly by contour integration (by closing the contour using a large semicircle in the left half-plane and a straightforward application of the residue theorem, which gives a solution in terms of Bessel functions) or approximately by the method of steepest descents (taking care to deform the contour to pass through the saddle point on the right of the essential singularity). By carrying out this inverse Laplace transform, we obtain that the extinction probability by time $t$ is

$$p(f=0,t)=\exp\left[-\frac{2sk}{1-e^{-skt}}\,e^{-skt}\right], \tag{D4}$$

which becomes of order one when $t\sim1/(sk)$, in agreement with our intuition that a lineage of fitness $-ks$ can only survive for order $1/(ks)$ generations. For nonextinct lineages, the probability distribution of the frequency is

$$p(f,f>0,t)\approx\sqrt{\frac{2Nks}{1-e^{-kst}}}\,\frac{\left(e^{-kst}/N\right)^{1/4}}{\sqrt{4\pi}\,f^{3/4}}\exp\left[-\frac{2Nks}{1-e^{-kst}}\,f\left(1-\sqrt{\frac{e^{-kst}}{Nf}}\right)^2\right]. \tag{D5}$$

The site frequency spectrum can be obtained from this distribution of frequencies by integrating Equation D5 in time, or by an alternative method that we present in Appendix F.
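As a quick numerical check of the relationship between Equations D2 and D4, the sketch below evaluates the generating function at very large $z$, where $\langle e^{-zf}\rangle$ picks out the probability mass at $f=0$, and compares it with the closed-form extinction probability. The function names and the parameter values are illustrative assumptions, not part of the original analysis.

```python
import math

def H_f(z, t, N, s, k):
    """Leading-order generating function of Equation D2 (O(lambda) terms dropped)."""
    decay = math.exp(-k * s * t)
    return math.exp(-(z * decay / N) / (1.0 + z * (1.0 - decay) / (2.0 * N * s * k)))

def p_extinct(t, s, k):
    """Extinction probability by time t, Equation D4."""
    decay = math.exp(-s * k * t)
    return math.exp(-2.0 * s * k * decay / (1.0 - decay))

# Illustrative (assumed) parameters: population size, selection coefficient, founding class.
N, s, k = 10_000, 0.01, 2
for t in (10, 50, 200):
    # As z grows, H_f(z, t) converges to the probability that f(t) = 0.
    print(t, H_f(1e15, t, N, s, k), p_extinct(t, s, k))
```

The two columns should agree to many digits, since the $z\to\infty$ limit of Equation D2 reduces algebraically to Equation D4.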

Appendix E: Trajectories in the Presence of Strong Mutation ($U_d\gg s$)

When deleterious mutations arise faster than selection can remove them, mutation will play an important role in shaping the trajectory. The relative strength of mutation and selection compared to drift will depend on the frequency of the lineage. Drift will remain the dominant force at frequencies $f\ll1/(NU_d)$. However, at larger frequencies, the mutation and selection terms will become important and we will see that the effects of drift in classes of low enough fitness become negligible.

Small Lineages [$f\ll1/(NU_d)$]

The dominant term in the characteristic equation in this regime (which corresponds to $z\gg NU_d$ in the generating function) is the drift term

$$\frac{dz_i}{dt'}\approx\frac{z_i^2}{2N}, \tag{E1}$$

which has the solution

$$z_i(0)\approx\frac{z}{1+\frac{zt}{2N}}. \tag{E2}$$

We can verify that mutation and selection are negligible compared to drift on timescales $t\lesssim1/U_d$ as long as $i\le2\lambda$. Note that this condition ($i\le2\lambda$) is satisfied for essentially all of the individuals in the population since $\sum_{i=0}^{2\lambda}h_i\ge1-\sqrt{2\pi/\lambda}\,e^{-(2\lambda+1)\log(2+1/\lambda)+\lambda}\approx1$. By summing the $z_i(0)$ terms, we find that on these timescales the generating function for the frequency of the mutation is

$$H_f(z,t)\approx\exp\left[-\frac{z/N}{1+\frac{zt}{2N}}\right], \tag{E3}$$

which is just the generating function for the frequency of a neutral lineage (cf. Equation D2). On longer timescales ($t\gg1/U_d$), this approximation breaks down and mutation and selection cannot be neglected for lineages arising in fitness classes far above the mean of the fitness distribution (with $k-\lambda\ll-\sqrt{\lambda}$). This is because the probability that a portion of the lineage in a class with fewer than $\lambda-\sqrt{\lambda}$ mutations has drifted to a high enough frequency to feel the effects of mutation and selection becomes substantial on longer timescales, which can also be seen from the probability distribution of nonextinct lineages (Equation D5). We consider the generating function of these unusually fit mutations at these higher frequencies in the next subsection. In contrast, mutations that arise on more typical backgrounds with $k>\lambda-\sqrt{\lambda}$ mutations can drift to higher frequencies, of order $\sqrt{\lambda}/(NU_d)$, before feeling the effects of selection, but cannot substantially exceed a total frequency $\sqrt{\lambda}/(NU_d)$. We analyze their trajectories in the following subsection.
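The claim that classes with at most $2\lambda$ mutations contain essentially the whole population can be checked directly. The sketch below assumes that $h_i$ is the Poisson mutation-selection balance distribution, $h_i=e^{-\lambda}\lambda^i/i!$ (an assumption taken from the standard background selection model, since $h_i$ is not defined in this appendix), and sums it up to a chosen class. The function name is hypothetical.

```python
import math

def poisson_cdf(n_max, lam):
    """Sum of h_i = exp(-lam) * lam**i / i! for i = 0..n_max, evaluated in log space."""
    total = 0.0
    for i in range(n_max + 1):
        total += math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
    return total

# Fraction of the population carrying at most 2*lambda deleterious mutations.
for lam in (10.0, 30.0, 100.0):
    print(lam, poisson_cdf(int(2 * lam), lam))
```

Even for moderate $\lambda$ the excluded tail is small, and it shrinks rapidly as $\lambda$ grows.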

Large Lineages [$f\gg1/(NU_d)$] Arising on Unusually Fit Backgrounds ($k-\lambda\ll-\sqrt{\lambda}$)

In lineages that reach higher frequencies, a large number of deleterious descendants arise every generation. This leads to strong couplings between the sizes of the components of the lineage in different fitness classes, and diminishes the importance of genetic drift in classes of lower fitness, which receive large numbers of deleterious descendants from classes of higher fitness. We find that, in classes of low enough fitness, the effects of genetic drift are negligible and the dominant balance is between the linear mutation and selection terms.

The solution to the linear (deterministic) problem has been obtained by Etheridge et al. (2009), but we reproduce the derivation briefly for completeness. In the absence of drift, the characteristics evolve according to

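$$\frac{dz_i}{dt'}=isz_i-U_dz_{i+1}\equiv\sum_jL_{ij}z_j, \tag{E4}$$

which defines the linear operator $L_{ij}$. $L$ has right eigenvectors $\phi^{(j)}$ with eigenvalues $js$ given by

$$\phi_i^{(j)}=\begin{cases}\dfrac{(-\lambda)^{j-i}}{(j-i)!}, & 0\le i\le j,\\[4pt] 0, & \text{otherwise},\end{cases} \tag{E5}$$

and corresponding left eigenvectors

$$\psi_i^{(j)}=\begin{cases}\dfrac{\lambda^{i-j}}{(i-j)!}, & i\ge j,\\[4pt] 0, & \text{otherwise}.\end{cases} \tag{E6}$$

We can verify that the left and right eigenvectors are orthonormal [$\psi^{(i)}\cdot\phi^{(j)}=\sum_l\psi_l^{(i)}\phi_l^{(j)}=\delta_{ij}$]. By eigenvalue decomposing $z_i(t)$ and integrating backward in time from $t'=0$ to $t'=t$, we obtain $z_i(t-t')=\sum_je^{-jst'}b_j\phi_i^{(j)}$, where the amplitudes $b_j$ are set by the boundary condition at $t'=0$, $b_j=\psi^{(j)}\cdot z(t)=\sum_{i=j}^{\infty}z_i\lambda^{i-j}/(i-j)!$. Finally, a summation yields

$$z_i(t-t')=e^{-ist'}\sum_{j=0}^{\infty}\frac{\left[\lambda\left(1-e^{-st'}\right)\right]^j}{j!}\,z_{i+j}. \tag{E7}$$

Setting the boundary condition at $t=0$ to $f_i(0)=\delta_{ki}/N$ and evaluating $z_k(0)$, we reproduce the result by Etheridge et al. (2009): in the absence of genetic drift, the descendants of the labeled lineage follow a Poisson distribution that starts in class $k$ and has mean $\lambda(1-e^{-st})$ and amplitude $e^{-kst}e^{\lambda(1-e^{-st})}/N$.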
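The deterministic result can be checked by integrating the drift-free version of Equation C2 numerically. The sketch below uses a hand-written fourth-order Runge-Kutta step (standard library only) and compares the class frequencies with the closed form implied by Equation E7, $f_{k+j}(t)=e^{-kst}[\lambda(1-e^{-st})]^j/(j!\,N)$. The function names, the truncation to 30 classes, and the parameter values are assumptions made for illustration.

```python
import math

def analytic(i, t, k, s, Ud, N):
    """Frequency in class i >= k at time t without drift (coefficient of z_i in z_k(0)/N)."""
    lam = Ud / s
    j = i - k
    mean = lam * (1.0 - math.exp(-s * t))
    return math.exp(-k * s * t) * mean ** j / math.factorial(j) / N

def integrate(t_end, k, s, Ud, N, n_classes=30, dt=0.01):
    """RK4 integration of df_i/dt = -i s f_i + Ud f_{i-1}; f[j] is the class k + j."""
    def rhs(f):
        return [-(k + j) * s * f[j] + (Ud * f[j - 1] if j > 0 else 0.0)
                for j in range(n_classes)]
    f = [1.0 / N] + [0.0] * (n_classes - 1)   # one founding individual in class k
    for _ in range(round(t_end / dt)):
        k1 = rhs(f)
        k2 = rhs([x + 0.5 * dt * d for x, d in zip(f, k1)])
        k3 = rhs([x + 0.5 * dt * d for x, d in zip(f, k2)])
        k4 = rhs([x + dt * d for x, d in zip(f, k3)])
        f = [x + dt * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(f, k1, k2, k3, k4)]
    return f

# Illustrative (assumed) parameters with lambda = Ud / s = 4.
f_num = integrate(20.0, 1, 0.05, 0.2, 1000)
for j in range(5):
    print(1 + j, f_num[j], analytic(1 + j, 20.0, 1, 0.05, 0.2, 1000))
```

Because mutations only flow from class $i-1$ into class $i$, truncating the system at a finite number of classes does not affect the classes that are retained.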

To evaluate the effect of genetic drift on the total size of the lineage at some later time point we set $z_i=z$. A sufficient (but not necessary) condition for genetic drift in class $i$ being negligible in determining the total size of the lineage at some later time point $t$ is that the nonlinear term satisfies $z_i(t-t')^2/(2Ns)\ll iz_i(t-t')$ uniformly in $t'$. In the vicinity of some frequency $f(t)\approx f$, corresponding to $z(t)=z\sim1/f$, we find that the nonlinear term is negligible uniformly in $t'$ as long as

$$e^{-ist'}e^{\lambda\left(1-e^{-st'}\right)}\ll2Nsif\quad\text{for all }t',\ 0<t'<t. \tag{E8}$$

Note that the condition in Equation E8 is obtained by plugging in the relationship between $z_i(t-t')$ and $z(t)=z\sim1/f$ (from Equation E7) into the condition that $z_i(t-t')^2/(2Ns)\ll iz_i(t-t')$. Since the left-hand side in Equation E8 is bounded by $e^{-ist'}e^{\lambda(1-e^{-st'})}\le e^{\lambda}[i/(\lambda e)]^i\equiv g_i$, the inequality is guaranteed to be satisfied uniformly in $t'$ as long as

$$f\gg\frac{1}{2Nsi}\,g_i. \tag{E9}$$

Defining $k_c(f)$ to be the smallest integer for which $f\gg g_{k_c+1}/[2Ns(k_c+1)]$, we can verify that genetic drift is negligible in all classes with $i>k_c(f)$ but not in class $k_c(f)$.
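The bound used above follows from maximizing the exponent $-ist'+\lambda(1-e^{-st'})$ over $t'$, which gives $e^{-st'}=i/\lambda$ and hence a peak height of $e^{\lambda}[i/(\lambda e)]^i$ for $i<\lambda$. A small sketch confirming this numerically is shown below; the function names and the parameter values are assumptions made for illustration.

```python
import math

def g(i, t, s, lam):
    """Time-dependent amplitude exp(-i s t + lam (1 - exp(-s t)))."""
    return math.exp(-i * s * t + lam * (1.0 - math.exp(-s * t)))

def g_peak(i, lam):
    """Peak value e^lam * (i / (lam e))^i, valid for 0 < i < lam."""
    return math.exp(lam) * (i / (lam * math.e)) ** i

def t_peak(i, s, lam):
    """Time at which g(i, t) is maximal: log(lam / i) / s."""
    return math.log(lam / i) / s

s, lam, i = 0.01, 20.0, 3          # illustrative (assumed) values
grid_max = max(g(i, 0.5 * n, s, lam) for n in range(2000))
print(t_peak(i, s, lam), g_peak(i, lam), grid_max)
```

The grid maximum approaches the analytic peak from below, and no grid point exceeds it.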

Note that self-consistency of the deterministic solution for $k_c<\lambda$ implies that when $f\gg g_{k_c+1}/[2Ns(k_c+1)]$ the frequency of the part of the allele in class $i$ satisfies $f_i\gg1/(2Nsi)$ for all $i>k_c$, but not for $i\le k_c$. Also note that this inequality can only be satisfied for some $k_c<\lambda$ if the founding class is sufficiently far above the mean of the fitness distribution ($\lambda-k\gg\sqrt{\lambda}$, where $g_k\gg1$). We return to lineages founded in classes with $k>\lambda-\sqrt{\lambda}$ mutations in the next subsection.

Thus, since genetic drift has a negligible effect in classes containing more than kc deleterious mutations, the characteristics zi(t) are given by the deterministic solution above, which we have already integrated. The frequency of the part of the lineage in classes with i>kc is therefore a deterministic function of the frequency trajectory in class kc, fkc(t). We can solve for this deterministic function straightforwardly by explicitly including fkc(t) as a variable mutational source term for classes of lower fitness. This yields an expression for the generating function of the entire lineage

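$$H_f(z,t)=\left\langle\exp\left[-z\sum_{i=k}^{k_c}f_i(t)-zU_d\int_{-\infty}^{t}d\tau\,f_{k_c}(\tau)\,g_{k_c+1}(t-\tau)\right]\right\rangle, \tag{E10}$$

when

$$z\ll\frac{2Ns(k_c+1)}{g_{k_c+1}}, \tag{E11}$$

where we have used the notation from the main text: $g_i(t)=e^{-ist+\lambda(1-e^{-st})}$.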

The relationship between the feeding class trajectory fkc(t) and the allele frequency trajectory f(t)

Equivalently, this result can be rewritten in terms of the relationship between the allele frequency trajectory $f(t)$ and the trajectory of the portions of the allele in classes with $i\le k_c$:

$$f(t)=\sum_{i=k}^{k_c}f_i(t)+U_d\int_{-\infty}^{t}d\tau\,f_{k_c}(\tau)\,g_{k_c+1}(t-\tau), \tag{E12}$$

which is valid as long as $f\gg g_{k_c+1}/[2Ns(k_c+1)]$. Because the expression on the right-hand side of Equation E12 is dominated by the last term, the full allele frequency trajectory depends on a single stochastic term, $f_{k_c}$. Therefore, we can calculate the distribution $p(f,t)$ near any given frequency $f$ by: (1) determining the feeding class $k_c(f)$, which corresponds to the class of lowest fitness in which genetic drift is not negligible; and (2) calculating the distribution of this time integral of the trajectory in that class, $f_{k_c}(t)$, subject to the boundary condition that $f_k(0)=1/N$.

In principle, this is still challenging if $k_c>k$ because the trajectory in class $k_c$ still depends on the trajectories in higher-fitness classes, all of which are stochastic. In addition, calculating the distribution of the convolution of $f_{k_c}(t)$ and $g_{k_c+1}(t)$ is still difficult, even when $k_c=k$. Fortunately, a simplification arises from the highly peaked nature of $g_{k_c+1}(t-\tau)$. Because the exponent in $g_{k_c+1}(t-\tau)$ is peaked in time, the integral in Equation E12 is, up to exponentially small terms, dominated by the region in which $g_{k_c+1}(t-\tau)f_{k_c}(\tau)$ is largest. Since the variation in the magnitude of $g_{k_c+1}(t-\tau)$ is much larger than the variation in the magnitude of $f_{k_c}(\tau)$, the integral will be dominated by the window during which $g_{k_c+1}(t-\tau)$ is at its peak, as long as $f_{k_c}(\tau)\neq0$ in that window. In that case, we can make a Laplace-like approximation in Equation E12, in which we expand $g_{k_c+1}(t-\tau)$ around its peak, and neglect contributions that are far away from this peak, since these are exponentially small. Near $\tau=t-t_d^{(k_c+1)}=t-\log[\lambda/(k_c+1)]/s$,

$$g_{k_c+1}(t-\tau)\approx g_{k_c+1}\,e^{-(k_c+1)s^2\left[t-\tau-t_d^{(k_c+1)}\right]^2}, \tag{E13}$$

which yields

$$f(t)\approx U_dg_{k_c+1}\int_{-\infty}^{t}d\tau\,f_{k_c}(\tau)\,e^{-(k_c+1)s^2\left(t-\tau-t_d^{(k_c+1)}\right)^2}\approx U_dg_{k_c+1}\int_{t-t_d^{(k_c+1)}-\Delta t^{(k_c+1)}/2}^{t-t_d^{(k_c+1)}+\Delta t^{(k_c+1)}/2}f_{k_c}(\tau)\,d\tau\equiv U_dg_{k_c+1}W_{\Delta t^{(k_c+1)}}\left(t-t_d^{(k_c+1)}\right). \tag{E14}$$

As a result of this simplification, the allele frequency does not depend on the full frequency trajectory in the feeding class $f_{k_c}(t)$, but only on its time integral (weight) in a window of width $\Delta t^{(k_c+1)}=1/(\sqrt{k_c+1}\,s)$ around $t-t_d^{(k_c+1)}$, which we denote by $W_{\Delta t^{(k_c+1)}}(t-t_d^{(k_c+1)})$. Note that Equation E14 implies a simple condition in terms of the allele frequency trajectory in this feeding class $k_c$ that specifies when drift is negligible in downstream classes. We have shown above that as long as the total allele frequency $f\gg g_{k_c+1}/[2Ns(k_c+1)]$, drift is negligible in classes with more than $k_c$ deleterious mutations per individual. From Equation E14, we can see that this condition can be restated in terms of the weight in the feeding class as

$$W_{\Delta t^{(k_c+1)}}\left(t-t_d^{(k_c+1)}\right)\gg\frac{1}{2NU_ds(k_c+1)}. \tag{E15}$$

Thus, $k_c$ can also be thought of as corresponding to the class of highest fitness in which the weight exceeds $1/[2NU_ds(k_c+1)]$.

The approximation we have used in Equation E14 breaks down at very early times [$t\lesssim t_d^{(k)}$] and very late times, during which $f_{k_c}(\tau)=0$ in the relevant window. These correspond to the spreading and extinction phases of the trajectory. We show in Appendix I that the former has a negligible impact on the site frequency spectrum. The latter phase, however, has an important effect at very high frequencies of the mutant, i.e., when the wild type is rare and in its own extinction phase. During this extinction phase,

$$g_{k_c+1}(t-\tau)\approx e^{\lambda-(k_c+1)s(t-\tau)} \tag{E16}$$

uniformly in $t$ and the frequency trajectory is well approximated as

$$f(t)\approx U_de^{\lambda-(k_c+1)st}\int_{-\infty}^{t}d\tau\,f_{k_c}(\tau)\,e^{(k_c+1)s\tau}. \tag{E17}$$

Applying the Laplace approximation once again, we conclude that the integral in Equation E17 is dominated by the window of width $1/[(k_c+1)s]$ prior to extinction in the $k_c$-class and therefore only weakly depends on time. Thus, during this extinction phase, the allele frequency decays exponentially at rate $(k_c+1)s$ and can be written as

$$f(t)=f_{\mathrm{peak}}\,e^{-(k_c+1)s(t-t_{\mathrm{peak}})}, \tag{E18}$$

for some choice of $t_{\mathrm{peak}}$, where $f_{\mathrm{peak}}$ reflects the maximal frequency the trajectory reached before the onset of the extinction phase.

Thus, we can see that in the extinction phase of the trajectory, the effective fitness of the lineage changes with the frequency according to

$$s_{\mathrm{eff}}(f)=-[k_c(f)+1]s. \tag{E19}$$

To obtain an explicit expression for how the feeding class $k_c(f)$ and therefore $s_{\mathrm{eff}}(f)$ depend on the frequency $f$, we can solve the condition that $g_{k_c+1}/[2Ns(k_c+1)]\ll f\ll g_{k_c}/(2Nsk_c)$ for $k_c$ by setting $f=C(f)\cdot g_{k_c+1}/[2Ns(k_c+1)]$ for some $C(f)$ that satisfies $1\ll C(f)\ll\lambda$. We find that, to leading order,

$$k_c(f)+1\approx\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)\quad\text{when }k_c(f)\ge1. \tag{E20}$$

By plugging this back into the expression for $s_{\mathrm{eff}}(f)$, we find that, in the extinction phase of the trajectory, the effective selection coefficient changes with the frequency of the lineage according to

$$s_{\mathrm{eff}}(f)=-\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)s,\quad\text{if }f\ll\frac{1}{Nse^{-\lambda}}. \tag{E21}$$

In summary, we have shown in this appendix that the allele frequency trajectory in the peak phase of the allele only depends on the time integral of the frequency in class $k_c$ over a window of specified width $\Delta t^{(k_c+1)}$ and that, outside this peak phase, the trajectory has an even simpler time-dependent form that we described above.

As we will see in Appendix F, the generating function for this relevant weight in class $k_c$ is straightforward to calculate when $k_c$ is the founding class (i.e., for $k_c=k$). This case is relevant for trajectories that arise in class $k$ and exceed frequencies $f_k\gg1/[N\sqrt{U_ds(k+1)}]$, which means that the feeding-class weight will exceed $1/[2NU_ds(k+1)]$ for a certain period of time.

However, not all trajectories that arise in class $k$ will reach such large frequencies. We have seen in an earlier section that trajectories that never substantially exceed the frequency $1/(NU_d)$ are dominated by drift throughout their lifetime. However, even those that do exceed $f_k\gg1/(NU_d)$ and therefore leave behind a large number of deleterious descendants will often not reach the much larger frequency $f_k\sim1/[N\sqrt{U_ds(k+1)}]$. In this case, we will have to treat multiple fitness classes stochastically and the weight relevant for the peak of the trajectory will be that in class $k_c>k$. For $k_c>k$ ($\ge0$), a further simplification results from the fact that the width of the window $\Delta t^{(k_c+1)}$ is longer than the lifetime of the mutation in class $k_c$ (see Appendix F and Appendix G for details). We use this simplification to calculate the resulting weight distribution in Appendix G. Finally, in Appendix I we use these results to obtain expressions for the average site frequency spectrum both in the case of strong and weak mutation.

Lineages Arising on Typical Backgrounds ($k-\lambda>-\sqrt{\lambda}$)

Lineages founded in classes with $k>\lambda-\sqrt{\lambda}$ mutations will not enter the semideterministic regime described above. This is because selection in each individual class $i$ in which they can be present prevents $f_i$ from exceeding $1/(Nsi)<1/(Nsk)\lesssim1/(NU_d)$, where the latter [$1/(NU_d)$] is the necessary threshold for a large enough number of deleterious descendants to be generated that their dynamics become dominated by selection in some class below the $i$-class. This threshold of $1/(NU_d)$ emerges from our analysis of the coupled branching process in the previous subsection and is further clarified and discussed in Appendix G.

In contrast to lineages arising far above the mean of the fitness distribution, the frequency trajectories of lineages that arise near the mean of the fitness distribution are dominated by drift and are eventually capped by negative selection at large enough frequencies. Selection becomes an important force about $1/\sqrt{U_ds}=1/\sigma$ generations after the lineage was founded. At this time, the accumulated deleterious load since arising becomes large enough to affect the trajectory of the mutation. This deleterious load will affect the trajectory substantially when the frequency of the lineage $f(t)$ becomes comparable to the drift barrier set by its current relative fitness $x(t)$, $1/(Nx(t))$. The expected fitness of a lineage founded near the mean of the distribution [with $|x(0)|\ll\sigma$] is $\langle x(t)\rangle=x(0)-U_d(1-e^{-st})$. Provided that the lineage has not drifted to extinction by $t$, its expected frequency at $t$ is $f(t)\sim t/N$. Thus, when $f(t)\gg1/(Nx(t))$, the effect of selection will dominate over drift. This occurs when $t\gg1/\sqrt{U_ds}=1/\sigma$. Thus, lineages that arise near the mean of the fitness distribution have a trajectory that has neutral statistics for the majority of its lifetime, but does not exceed $1/(N\sigma)$. Finally, lineages arising in classes far below the mean of the fitness distribution ($k-\lambda\gg\sqrt{\lambda}$) will also be dominated by drift, but limited to even lower frequencies. However, these lineages are also comparatively rare and only have a small relative impact on the lowest-frequency part of the site frequency spectrum [$f\ll1/(N\sigma)$].

Appendix F: The Distribution of Allele Frequencies and of the Weight in the Founding Class

In this appendix, we calculate the distribution of frequencies $f_k(t)$ and weights $W_{\Delta t}(t)=\int_{t-\Delta t/2}^{t+\Delta t/2}f_k(t')\,dt'$ for the stochastic process defined by

$$\frac{df_k}{dt}=-ksf_k+\sqrt{\frac{f_k}{N}}\,\eta(t), \tag{F1}$$

with $f_k(0)=1/N$ and $f_k(t)=0$ for $t<0$. This process describes the trajectory of the component of the lineage that remains in the founding class (the founding genotype). To calculate these distributions, we begin by defining the joint generating function for the frequency $f_k(t)$ and the total time-integrated weight up to time $t$:

$$W(t)=\int_0^tf_k(t')\,dt'. \tag{F2}$$

The joint generating function for these two quantities is defined as

$$G(z,\zeta,t)=\left\langle e^{-zf_k(t)-\zeta W(t)}\right\rangle, \tag{F3}$$

and satisfies the PDE

$$\frac{\partial G(z,\zeta,t)}{\partial t}=-\left(-\zeta+ksz+\frac{z^2}{2N}\right)\frac{\partial G(z,\zeta,t)}{\partial z}. \tag{F4}$$

Once again, we solve this PDE using the method of characteristics. The characteristics $z(t-t')$ are defined by

$$\frac{dz}{dt'}=-\zeta+ksz+\frac{z^2}{2N},\qquad\frac{d\zeta}{dt'}=0, \tag{F5}$$

and are subject to the boundary condition z(t)=z, ζ(t)=ζ. The generating function is constant along the characteristics (dG/dt=0), and therefore satisfies

G(z,ζ,t)=G(z(0),ζ(0),0). (F6)

After integrating the ordinary differential equations in Equation F5, we find that the characteristics follow

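$$z(t-t')=a_++\frac{(a_+-a_-)(z-a_+)\exp\left[-\frac{(a_+-a_-)t'}{2N}\right]}{z-a_--(z-a_+)\exp\left[-\frac{(a_+-a_-)t'}{2N}\right]}, \tag{F7}$$

with $a_\pm=Nsk\left[-1\pm\sqrt{1+2\zeta/(N(sk)^2)}\right]$.

We can verify that the correct marginal generating function for the frequency of the lineage emerges from this result by setting $\zeta=0$ and imposing the boundary condition $G(z,0,t)=G(z(0),0,0)=e^{-z(0)/N}$, which corresponds to the initial frequency at $t=0$ being $1/N$.

To obtain the marginal generating function for the weight in the window between $t-\Delta t/2$ and $t+\Delta t/2$, we set $z=0$, $t'=\Delta t$, and choose a boundary condition that reflects the distribution of frequencies $f_k$ at $t-\Delta t/2$ generations after the lineage was founded (see Equation D2):

$$G_W(z=0,\zeta,t'=\Delta t)=\exp\left\{-\frac{1}{N}\frac{z(0)\,e^{-ks\left(t-\frac{\Delta t}{2}\right)}}{1+\frac{z(0)}{2Nsk}\left[1-e^{-ks\left(t-\frac{\Delta t}{2}\right)}\right]}\right\}, \tag{F8}$$

where

$$z(0)=\frac{2\zeta}{sk\left\{\sqrt{1+\frac{2\zeta}{N(sk)^2}}\coth\left[\frac{ks\Delta t}{2}\sqrt{1+\frac{2\zeta}{N(ks)^2}}\right]+1\right\}}. \tag{F9}$$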

The generating function in Equation F8 captures the full time-dependent behavior of the weight in the founding class in a window of width $\Delta t$ and can be inverted by standard methods. However, it is in practice unnecessary to invert Equation F8 to calculate the site frequency spectrum. For our purposes here, we will be mostly concerned with two special cases: the total weight in the founding class from founding to extinction, $W=\int_0^\infty f(t)\,dt$, and the time integral of the distributions of frequencies $p[f(t)]$ and weights $p[W_{\Delta t}(t)]$ in a window of specified width $\Delta t$. The former case has been calculated previously by Weissman et al. (2009). We quote and discuss this result for completeness in the section below. We then analyze the latter case in the following section.

The Distribution of the Total Lifetime Weight in the Founding Class, $W=\int_0^\infty f(t)\,dt$

The first special case that will be relevant to our analysis of trajectories and allele frequency spectra is the total integrated weight in the founding class from founding ($t=0$) to extinction. By setting $t=\Delta t/2$ in Equations F8 and F9, so that the window begins at founding, and then relabeling the window width $\Delta t$ as $t$, we find that the generating function for the total weight from founding to some later time $t$ is

$$G_W(\zeta,t)=\exp\left[-\frac{2\zeta}{Nsk\left\{\sqrt{1+\frac{2\zeta}{N(sk)^2}}\coth\left[\frac{kst}{2}\sqrt{1+\frac{2\zeta}{N(ks)^2}}\right]+1\right\}}\right]. \tag{F10}$$

Note that Equation F10 becomes independent of time when $t\gg2/(ks)$ (uniformly in $\zeta$), which agrees with our heuristic intuition that the lifetime of a mutation in class $k$ is not longer than $2/(ks)$ generations. Since we have shown in Large Lineages Arising on Unusually Fit Backgrounds in Appendix E that the allele frequency trajectory $f(t)$ depends on the weight in a window of width $\Delta t\le\Delta t^{(k+1)}=1/(\sqrt{k+1}\,s)$ (where the $\le$ sign follows because $k_c\ge k$) that is longer than $1/(ks)$ for $k>1$ (with $k=1$ being the marginal case), the distribution of $W_{\Delta t^{(k+1)}}(t)$ will be either equal to the total lifetime weight of the allele [for $t\lesssim\Delta t^{(k+1)}$] or negligible [for $t\gg\Delta t^{(k+1)}$].

By taking the limit $t\to\infty$ in Equation F10, we obtain that the generating function for the distribution of the lifetime weight in the founding class is

$$G_W(\zeta)=\exp\left[-\frac{2\zeta}{Nsk+\sqrt{(Nsk)^2+2\zeta N}}\right]. \tag{F11}$$

The inverse Laplace transform of Equation F11 can be evaluated by standard methods, which yields the distribution of the lifetime weight in the founding class:

$$p(W)=\frac{1}{\sqrt{2N\pi}}\frac{1}{W^{3/2}}\,e^{-\frac{N(ks)^2W}{2}-\frac{1}{2NW}}. \tag{F12}$$
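The transform pair above can be checked numerically. The sketch below computes $\int_0^\infty e^{-\zeta W}p(W)\,dW$ for the density in Equation F12 with a trapezoid rule on a logarithmic grid and compares it with the $t\to\infty$ limit of Equation F10. The integral evaluates to $\exp[-\sqrt{(sk)^2+2\zeta/N}]$, which equals the limiting generating function up to an overall factor $e^{sk}$ that is close to 1 when $sk\ll1$. The function names, grid limits, and parameter values are assumptions made for illustration.

```python
import math

def p_W(W, N, s, k):
    """Lifetime weight density in the founding class, Equation F12."""
    return math.exp(-N * (k * s) ** 2 * W / 2.0 - 1.0 / (2.0 * N * W)) / (
        math.sqrt(2.0 * math.pi * N) * W ** 1.5)

def laplace_numeric(zeta, N, s, k, lo=1e-7, hi=1e5, n=20000):
    """Trapezoid rule for the Laplace transform of p_W on the grid W = exp(u)."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.0
    for j in range(n + 1):
        W = math.exp(a + j * h)
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * math.exp(-zeta * W) * p_W(W, N, s, k) * W  # dW = W du
    return total * h

def G_lifetime(zeta, N, s, k):
    """Long-time limit of Equation F10 (coth -> 1)."""
    Q = math.sqrt(1.0 + 2.0 * zeta / (N * (s * k) ** 2))
    return math.exp(-2.0 * zeta / (N * s * k * (Q + 1.0)))

N, s, k = 1000, 0.01, 1            # illustrative (assumed) values
for zeta in (0.0, 1.0, 50.0):
    print(zeta, laplace_numeric(zeta, N, s, k), math.exp(-s * k) * G_lifetime(zeta, N, s, k))
```

The logarithmic grid is used because the density spans many orders of magnitude in $W$ between the drift cutoff near $1/(2N)$ and the selection cutoff near $2/[N(ks)^2]$.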

The Time Integrals of $p(f,t)$ and $p(W_{\Delta t},t)$

To calculate the average site frequency spectrum, we need to calculate the time integral of the distributions of frequencies and weights over time. In principle, this can be done by inverting Equation F8 and then integrating the distribution of WΔt(t) over time. However, since this is a somewhat laborious calculation, we will use a convenient mathematical shortcut in which we first solve for the distribution of weights in a different stochastic process and then relate this back to the original process in Equation F1.

Specifically, we consider the stationary limit of the stochastic process defined by the Langevin equation:

$$\frac{df}{dt}=\theta-ksf+\sqrt{\frac{f}{N}}\,\eta(t). \tag{F13}$$

This describes the time evolution of the frequency of a lineage with fitness $-ks$ in which individuals are continuously generated by mutation at some rate $N\theta$ (and have frequency $1/N$ at the time when they are generated). This process is relevant because the distributions of frequencies and weights in the stationary process are related to the time integrals of the distributions of $f(t)$ and $W_{\Delta t}(t)$. More precisely, in the limit that $\theta\to0$ (keeping $N$ constant), the distributions of $f$ (and its time integrals) in the stationary process are the same as the time-integrated distributions of the nonstationary process, provided that we also divide by the total rate at which new individuals are generated, $N\theta$, to ensure proper normalization. That is,

$$\int p[f_k(t)]\,dt=\lim_{\theta\to0}\frac{1}{N\theta}\,p(f,f>0,\theta). \tag{F14}$$

We denote the joint generating function for the frequency, $f$, and weight in this process, $W(t,\theta)=\int_0^tf(t',\theta)\,dt'$, by

$$G_\theta(z,\zeta,t)=\left\langle e^{-zf(t,\theta)-\zeta W(t,\theta)}\right\rangle. \tag{F15}$$

$G_\theta(z,\zeta,t)$ satisfies the PDE

$$\frac{\partial G_\theta(z,\zeta,t)}{\partial t}=-\left(-\zeta+ksz+\frac{z^2}{2N}\right)\frac{\partial G_\theta(z,\zeta,t)}{\partial z}-\theta zG_\theta(z,\zeta,t). \tag{F16}$$

Note that the generating functions for the two processes are related and that, by setting $\theta=0$ in Equation F16, we recover the PDE for the generating function of the nonstationary process (see Equation F4). In particular, the characteristics for Equation F16 are the same as the characteristics for Equation F4 and they follow the form we calculated previously and quoted in Equation F7. Along these characteristics, the generating function satisfies

$$\frac{dG_\theta}{dt'}=-\theta z(t-t')G_\theta, \tag{F17}$$

or equivalently, after integrating,

$$G_\theta(z,\zeta,t)=G_\theta(z(0),\zeta(0),0)\exp\left[-\theta\int_0^tz(t-t')\,dt'\right]. \tag{F18}$$

However, the boundary conditions for the two processes are different. The nonstationary process is subject to the boundary condition that there is a single individual present in the lineage at $t=0$, $G(z(0),\zeta(0),0)=e^{-z(0)/N}$, whereas the stationary process is subject to the boundary condition that the process is stationary at the initial time point ($t'=t$). The stationary property of the frequency distribution is guaranteed by the boundary condition $G_\theta(z(0),\zeta(0),0)=[1+z(0)/(2Nsk)]^{-2N\theta}$. This can be obtained either by inspection or by substituting an arbitrary boundary condition and finding the limiting form for the generating function for the frequency as $t\to\infty$, and noting that $z(0)$ becomes independent of $z$ as $t\to\infty$, so the initial condition has no impact on the frequency distribution.

Plugging in the expression for $z(t-t')$ from Equation F7 into Equation F18 and performing the integral over $t'$, we arrive at the solution to the joint generating function for $f(\theta)$ and $W(\theta)$,

$$G_\theta(z,\zeta,t)=G_\theta(z(0),\zeta(0),0)\exp[-\theta a_+t]\left\{1+(z-a_+)\frac{1-\exp\left[-\frac{(a_+-a_-)t}{2N}\right]}{a_+-a_-}\right\}^{-2N\theta}. \tag{F19}$$

To obtain the marginal generating function for $f(\theta)$, we set $\zeta=0$, giving $a_+=0$, $a_-=-2Nsk$, and

$$G_{\theta,f}(z)=\left(1+\frac{z}{2Nsk}\right)^{-2N\theta}. \tag{F20}$$

Conversely, to get the generating function for $W(t,\theta)$ we set $z=0$, which after some rearranging yields

$$z(0)=\frac{2\zeta}{sk\left\{\sqrt{1+\frac{2\zeta}{N(sk)^2}}\coth\left[\frac{kst}{2}\sqrt{1+\frac{2\zeta}{N(ks)^2}}\right]+1\right\}} \tag{F21}$$

and

$$G_{\theta,W}(\zeta)=e^{Nsk\theta t}\left\{\frac{1+\frac{\zeta}{N(sk)^2}}{\sqrt{1+\frac{2\zeta}{N(sk)^2}}}\sinh\left[\frac{kst}{2}\sqrt{1+\frac{2\zeta}{N(ks)^2}}\right]+\cosh\left[\frac{kst}{2}\sqrt{1+\frac{2\zeta}{N(ks)^2}}\right]\right\}^{-2N\theta}. \tag{F22}$$

We invert Equations F20 and F22 below.

Inversion of the generating functions in Equations F20 and F22

Since only the nonextinct portion of the process contributes to the site frequency spectrum, when inverting the generating functions for the weight and frequency, we will use the following relationship between the probability distribution p(g) and the moment-generating function Gg(z) of a random variable g:

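$$p(g)=\int_{-i\infty}^{i\infty}\frac{dz}{2\pi i}\,e^{zg}G_g(z)=\lim_{x\to\infty}\left[\frac{e^{ixg}G_g(ix)-e^{-ixg}G_g(-ix)}{2\pi ig}\right]+\int_{-i\infty}^{i\infty}\frac{dz}{2\pi i}\,\frac{e^{zg}}{g}\left[-\frac{\partial G_g}{\partial z}\right]. \tag{F23}$$

From the definition of the moment-generating function and the sine limit definition of the Dirac $\delta$ function [$\lim_{x\to\infty}\sin(xg)/(\pi g)=\delta(g)$], it follows that the boundary terms amount to the probability mass at $g=0$ and that the distribution of the nonzero portion of the process is

$$p(g,g>0)=\int_{-i\infty}^{i\infty}\frac{dz}{2\pi i}\,\frac{e^{zg}}{g}\left[-\frac{\partial G_g}{\partial z}\right]. \tag{F24}$$

After plugging this expression and the generating function for the frequency (Equation F20) into Equation F14 and taking the $\theta\to0$ limit, we find that the time-integrated distribution of frequencies in the founding class is

$$\int p[f_k(t)]\,dt=\frac{2}{f_k}\,e^{-2Nskf_k}. \tag{F25}$$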
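The limit behind Equation F25 can be illustrated numerically. A generating function of the form $(1+z/\beta)^{-\alpha}$ is the Laplace transform of a gamma density with shape $\alpha$ and rate $\beta$, so the stationary frequency distribution implied by Equation F20 is a gamma distribution with shape $2N\theta$ and rate $2Nsk$. The sketch below divides that density by $N\theta$ for a small $\theta$ and compares it with Equation F25. The function names and parameter values are assumptions made for illustration.

```python
import math

def stationary_density(f, theta, N, s, k):
    """Gamma density with shape 2 N theta and rate 2 N s k, evaluated in log space."""
    shape, rate = 2.0 * N * theta, 2.0 * N * s * k
    log_p = (shape * math.log(rate) + (shape - 1.0) * math.log(f)
             - rate * f - math.lgamma(shape))
    return math.exp(log_p)

def time_integrated(f, N, s, k):
    """Time-integrated frequency distribution in the founding class, Equation F25."""
    return 2.0 / f * math.exp(-2.0 * N * s * k * f)

N, s, k, theta = 1000, 0.01, 1, 1e-9   # illustrative (assumed) values
for f in (1e-3, 1e-2, 1e-1):
    print(f, stationary_density(f, theta, N, s, k) / (N * theta), time_integrated(f, N, s, k))
```

The agreement improves as $\theta$ is reduced, as expected for a small-$\theta$ limit.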

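The time-integrated distribution of weights in the feeding class can be obtained in an entirely analogous fashion. In this case, it will be convenient to treat the cases $k=0$ and $k>0$ separately. When $k=0$, a lengthy but straightforward substitution of Equation F22 into Equation F24 gives

$$\lim_{\theta\to0}\frac{p(W,W>0,\theta)}{N\theta}=\frac{1}{W}\int_{-i\infty}^{i\infty}\frac{d\zeta}{2\pi i}\,e^{\zeta W}\left[\frac{1}{\zeta}+\frac{t}{\sqrt{2N\zeta}}\coth\left(\sqrt{\frac{\zeta}{2N}}\,t\right)\right]. \tag{F26}$$

The simplest way to carry out this integral is by contour integration. To do this, we close the contour using a large semicircle in the left half-plane. The contribution from this semicircle vanishes as its radius approaches infinity, and so the integral considered above is equal to the sum of the residues within the left half-plane. The integrand has simple poles at $\sqrt{\zeta/(2N)}=n\pi i/t$ for $n\ge0$ with residues $2e^{-n^2\pi^22NW/t^2}$, which yields

$$\lim_{\theta\to0}\frac{p(W,W>0,\theta)}{N\theta}=\frac{1}{W}\left[2+2\sum_{n=1}^{\infty}e^{-\frac{n^2\pi^22NW}{t^2}}\right]=\frac{1}{W}\left\{1+\vartheta_3\left[0,\exp\left(-\frac{2\pi^2NW}{t^2}\right)\right]\right\}, \tag{F27}$$

where $\vartheta_3$ is the elliptic theta function. Asymptotic expansions for small and large arguments give

$$\lim_{\theta\to0}\frac{p(W,W>0,\theta)}{N\theta}=\begin{cases}\dfrac{1}{\sqrt{2N\pi}}\dfrac{t}{W^{3/2}}, & W\ll\dfrac{t^2}{2N\pi^2},\\[6pt] \dfrac{2}{W}, & W\gg\dfrac{t^2}{2N\pi^2}.\end{cases} \tag{F28}$$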
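The two asymptotic regimes can be compared against a direct evaluation of the series. The sketch below sums the exponential series in Equation F27 and prints it alongside the small-$W$ and large-$W$ forms in Equation F28. The function names, the truncation of the series at a fixed number of terms, and the parameter values are assumptions made for illustration.

```python
import math

def series(W, t, N, n_terms=2000):
    """Equation F27: (1/W) * [2 + 2 * sum_{n>=1} exp(-n^2 pi^2 2 N W / t^2)]."""
    a = 2.0 * math.pi ** 2 * N * W / t ** 2
    return (2.0 + 2.0 * sum(math.exp(-a * n * n) for n in range(1, n_terms + 1))) / W

def small_W(W, t, N):
    """Small-W asymptote in Equation F28."""
    return t / (math.sqrt(2.0 * N * math.pi) * W ** 1.5)

def large_W(W):
    """Large-W asymptote in Equation F28."""
    return 2.0 / W

N, t = 1000, 100.0                     # illustrative (assumed) values
print("crossover near W =", t ** 2 / (2.0 * N * math.pi ** 2))
for W in (5e-5, 5e-3, 0.5, 50.0):
    print(W, series(W, t, N), small_W(W, t, N), large_W(W))
```

Well below the crossover the series follows the $W^{-3/2}$ form, and well above it only the constant term survives, giving $2/W$.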

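The case $k>0$ is slightly more straightforward to evaluate since the length of the intervals we are interested in is longer than the typical timescale of selection, $t=\Delta t^{(k+1)}=1/(\sqrt{k+1}\,s)\gtrsim1/(ks)$. As a result, the arguments in the hyperbolic functions in Equation F22 satisfy $kst\cdot\sqrt{1+2\zeta/(N(ks)^2)}\gg1$ (for $k>1$, with $k=1$ being the marginal case), which yields a simple form for the distribution of nonzero weights:

$$\lim_{\theta\to0}\frac{p(W,W>0,\theta)}{N\theta}=\frac{1}{W}\int_{-i\infty}^{i\infty}\frac{d\zeta}{2\pi i}\,e^{\zeta W}\lim_{\theta\to0}\left[-\frac{1}{N\theta}\frac{\partial G(\zeta,t)}{\partial\zeta}\right]\approx\frac{1}{W}\int_{-i\infty}^{i\infty}\frac{d\zeta}{2\pi i}\,e^{\zeta W}\,\frac{t}{Nsk}\frac{1}{\sqrt{1+\frac{2\zeta}{N(sk)^2}}}. \tag{F29}$$

Note that the expression in Equation F29 reduces to a standard Gaussian integral. By carrying out this integral, we obtain for the time integral of the distribution of weights in the founding class:

$$\int p[W(t)]\,dt=\lim_{\theta\to0}\frac{p(W,W>0,\theta)}{N\theta}\approx\frac{t}{\sqrt{2N\pi}\,W^{3/2}}\,e^{-N(sk)^2W/2}. \tag{F30}$$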

Appendix G: The Distribution of Weights in Classes Below the Founding Class

We have seen in Appendix E that, when the allele frequency trajectory in the founding class fk(t) is small enough, the effects of genetic drift cannot be ignored in multiple fitness classes. In this section, we consider how the trajectories (and their weights) in these stochastic classes are coupled and derive the distribution of lifetime weights in class k+Δ, in which individuals carry Δ more mutations compared to individuals in the founding class.

The Relationship Between the Trajectory in the Founding Class k, and the Weight in Class k+1

We begin by considering the total lifetime weight in the class right below the founding class (i=k+1), which we will denote Wk+1. Wk+1 clearly depends on the weight in the founding class, Wk, since the total number of mutational events from the k-class into the (k+1)-class is equal to NUdWk. As we describe in The Importance of Genetic Drift in Classes Below the Founding Class in Appendix B, each one of these mutational events founds a sublineage, and the stochastic trajectory of each sublineage is described by Equation F1. The total weight of the lineage in class k+1 is simply the sum of the weights of each of these sublineages. The generating function of the lifetime weight in the k+1 class, Wk+1, is related to the lifetime weight in the founding class Wk according to

GWk+1(ζ)=exp[ζj=0NUdWkWk+1(j)], (G1)

where Wk+1(j) denotes the weight of the sublineage founded by the jth mutational event. Since the Wk+1(j) are independent and identically distributed, the generating function of their sum is equal to the product of their generating functions and

GWk+1(ζ)=GWk+1(1)NUdWk(ζ), (G2)

where the final average is taken over the distribution of the weight, Wk, in the k-class. The generating functions of Wk and Wk+1(1) are both given by Equation F11.

Using the same methods that we used to invert Equation F11, we obtain that the distribution of the total weights in class k+1, conditioned on the weight in class k being equal to Wk, is

p(Wk+1|Wk)=NUdWk2πN1Wk+13/2eN[s(k+1)]2Wk+12(NUdWk)22NWk+1. (G3)

We can see from this equation that the neutral decay of the distribution of weights in class k+1, which results from drift and is proportional to 1/W3/2, is exponentially cut off for Wk+1(NUdWk)2/(2N) and for Wk+12/{N[s(k+1)]2}.. The latter, high-weight cutoff is familiar from before and results from selection within the k+1 class. The low-weight cutoff results from the pressure of incoming mutational events.

A simple heuristic can explain the dependence of the low-weight cutoff on the weight in the founding class, Wk. The weight Wk+1 is at least as large as the weight of the largest sublineage. Because each of the NUdWk mutational events generates a sublineage that survives for T generations with probability 1/T and leaves a weight of order T2/N, at least one of these sublineages will survive for T generations with probability equal to 1(11/T)NUdWk1eNUdWk/T. This probability is of order 1 for TNUdWk, which means that with probability order 1 at least one of the sublineages will have weight T2/N(NUdWk)2/N. Note that this also means that when Wk>1/(NUd2) (consistent with the lineage exceeding frequency 1/(NUd) in the founding class), the weight in the next class is guaranteed to be larger than the weight in the founding class. This means that lineages that exceed the frequency 1/(NUd) in the founding class are almost guaranteed to generate an even larger number of individuals in the next class, which generates an even larger number of individuals in the following class, and so on.

We have implicitly assumed that the trajectory of each of the sublineages is dominated by drift. This will be true as long as $T < 1/[(k+1)s]$ {i.e., as long as $W_k < 1/[N U_d s(k+1)]$}. In contrast, when $W_k \gg 1/[N U_d s(k+1)]$, a large number of the lineages will exceed the frequency $1/[Ns(k+1)]$ in the next class, and the trajectory in that class will become dominated by selection. We have shown in Appendix E that once this happens, drift in class $k+1$ and all classes below it will become negligible. Note that this heuristic argument also explains the self-consistency condition that emerged in Appendix E (see Equation E15) and explains why genetic drift becomes negligible in the $(k_c+1)$-class whenever the weight in the $k_c$-class is larger than $1/[N U_d s(k_c+1)]$.

In the section below, we will use the insights above to evaluate the weight distribution in class $k+\Delta$, conditioned on the lineage arising in class $k$ and selection being negligible in all classes beneath it, $W_i \ll 1/[N(si)^2]$ for $i \le k+\Delta$. Because the lifetime of the longest-lived sublineage in each of these classes is at most $1/(is)$ in this limit and because the sublineages are seeded into the $i$-class over a time that is, by assumption, shorter than $1/\sqrt{U_d i s}$, the total lifetime of the lineage in all of these classes is strictly shorter than $\Delta t^{(i)}$, which is why we do not need to be concerned with the full, time-dependent properties of the distribution of weights in this class. Instead, the calculation of the distribution of lifetime weights will suffice for calculating the site frequency spectrum.

The Distribution of the Weight in Class k+Δ

Having obtained the distribution of the weight in class $k+1$, conditioned on the weight in class $k$ being equal to $W_k$ (see Equation G3), we can calculate the marginal distribution of weights $W_{k+1}$ by averaging over $W_k$. In the limit that $1/(N U_d^2) \ll W_k \ll 1/[N U_d s(k+1)]$ and $W_{k+1} \ll 1/[N(sk)^2]$ that we are interested in here, this distribution is

$$p(W_{k+1}) = \int_0^\infty dW_k\, p(W_k)\, p(W_{k+1}|W_k) = \frac{2^{1/4}\,\Gamma\left(\frac{5}{4}\right)}{\pi N^{1/4}} \frac{U_d^{1/2}}{W_{k+1}^{5/4}}. \tag{G4}$$
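The averaging step in Equation G4 can be verified numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the founding-class density to be the neutral form $p(W_k) = W_k^{-3/2}/\sqrt{2\pi N}$, drops the selection cutoff from Equation G3, and uses hypothetical parameter values; all function names are invented for the example.

```python
import math

def p_founding(W, N):
    """Neutral lifetime-weight density in the founding class (assumed form)."""
    return W ** -1.5 / math.sqrt(2.0 * math.pi * N)

def p_next_given_founding(W_next, W_k, N, Ud):
    """Equation G3 with the selection cutoff dropped (s -> 0 limit)."""
    n = N * Ud * W_k  # expected number of mutational events
    return (n / math.sqrt(2.0 * math.pi * N) * W_next ** -1.5
            * math.exp(-n * n / (2.0 * N * W_next)))

def marginal_numeric(W_next, N, Ud, steps=20000):
    """Midpoint-rule integral over W_k, using the substitution W_k = u**2."""
    a = N * Ud ** 2 / (2.0 * W_next)
    u_max = (50.0 / a) ** 0.25  # the Gaussian-like factor is exp(-50) at u_max
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        W_k = u * u
        total += (p_founding(W_k, N)
                  * p_next_given_founding(W_next, W_k, N, Ud) * 2.0 * u)
    return total * h

def marginal_closed_form(W_next, N, Ud):
    """Right-hand side of Equation G4."""
    return (2.0 ** 0.25 * math.gamma(1.25) * math.sqrt(Ud)
            / (math.pi * N ** 0.25 * W_next ** 1.25))

N, Ud, W_next = 1.0e6, 1.0e-2, 1.0e-3  # hypothetical parameter values
print(marginal_numeric(W_next, N, Ud), marginal_closed_form(W_next, N, Ud))
```

The two printed numbers should agree closely, which confirms the $W_{k+1}^{-5/4}$ decay and the prefactor in this limit.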

Note that the distribution in the $(k+1)$-class decays less rapidly than in the $k$-class. In particular, the probability that the weight in the $(k+1)$-class exceeds $1/[N U_d s(k+2)]$ (and leads to the deterministic propagation of individuals in classes with $k+2$ or more deleterious mutations) is

$$P\left[W_{k+1} \ge \frac{1}{N U_d s(k+2)}\right] = \frac{2^{1/4}\,\Gamma\left(\frac{1}{4}\right)}{\pi}\, U_d \left(\frac{k+2}{\lambda}\right)^{1/4}, \tag{G5}$$

which is larger than the probability that the weight in the $k$-class exceeds the corresponding value by a large factor $\lambda^{1/4}$, consistent with our intuition that the weight in the class below the founding class is guaranteed to exceed the weight in the founding class if $N U_d^2 W_k > 1$.
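The $\lambda^{1/4}$ enhancement can be made concrete with a short calculation. This is a sketch under assumptions: the founding-class probability is obtained by integrating the neutral law $W^{-3/2}/\sqrt{2\pi N}$ above $1/[N U_d s(k+1)]$, the function names are hypothetical, and the parameter values are arbitrary. The population size cancels from both probabilities.

```python
import math

def prob_founding_exceeds(k, s, Ud):
    """P[W_k >= 1/(N Ud s (k+1))] under the neutral law
    p(W) = W^{-3/2} / sqrt(2 pi N); N cancels from the result."""
    return math.sqrt(2.0 * Ud * s * (k + 1) / math.pi)

def prob_next_exceeds(k, s, Ud):
    """Right-hand side of Equation G5; N cancels here as well."""
    lam = Ud / s
    return (2.0 ** 0.25 * math.gamma(0.25) / math.pi
            * Ud * ((k + 2) / lam) ** 0.25)

s, k = 1.0e-3, 0  # hypothetical values
for Ud in (0.1, 1.6):
    lam = Ud / s
    ratio = prob_next_exceeds(k, s, Ud) / prob_founding_exceeds(k, s, Ud)
    # The last column is independent of lam, so the ratio scales as lam**0.25.
    print(lam, ratio, ratio / lam ** 0.25)
```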

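In general, we can calculate the distribution of the weight in class $k+\Delta$ by iterating this procedure. Specifically, the distribution of the weight $W_{k+2}$ in class $k+2$, conditioned on the weight in class $k+1$ being equal to $W_{k+1}$, also follows Equation G3 (but with $k$ changed to $k+1$). By repeating the above procedure $\Delta$ times, we find that the distribution of lifetime weights in class $k+\Delta$ is

$$p(W_{k+\Delta}) = \frac{1}{\sqrt{2\pi N}} \frac{1}{W_{k+\Delta}^{1+2^{-(\Delta+1)}}} \prod_{j=1}^{\Delta} \frac{\Gamma\left[\frac{1}{2}\left(1-2^{-j}\right)\right]}{2\sqrt{\pi}} \left(\frac{N U_d^2}{2}\right)^{2^{-(j+1)}} = U_d\, \frac{\prod_{j=1}^{\Delta} \Gamma\left[\frac{1}{2}\left(1-2^{-j}\right)\right]}{\left(2\sqrt{\pi}\right)^{\Delta+1}} \left(\frac{N U_d^2 W_{k+\Delta}}{2}\right)^{-2^{-(\Delta+1)}} \frac{1}{W_{k+\Delta}}. \tag{G6}$$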
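As a consistency check on the iteration, the two forms of Equation G6 can be evaluated side by side and compared with the special cases $\Delta=0$ (the neutral $W^{-3/2}$ law) and $\Delta=1$ (Equation G4). The code is a sketch: the function names and parameter values are assumptions made for illustration, and each factor in the denominator is taken to be $2\sqrt{\pi}$, which is the value obtained by iterating the integral in Equation G4.

```python
import math

def weight_density(W, N, Ud, delta):
    """Second (closed) form of Equation G6 for the weight in class k + delta."""
    gammas = 1.0
    for j in range(1, delta + 1):
        gammas *= math.gamma(0.5 * (1.0 - 2.0 ** -j))
    exponent = 2.0 ** -(delta + 1)
    return (Ud * gammas / (2.0 * math.sqrt(math.pi)) ** (delta + 1)
            * (N * Ud ** 2 * W / 2.0) ** -exponent / W)

def weight_density_product_form(W, N, Ud, delta):
    """First (product) form of Equation G6."""
    value = W ** -(1.0 + 2.0 ** -(delta + 1)) / math.sqrt(2.0 * math.pi * N)
    for j in range(1, delta + 1):
        value *= (math.gamma(0.5 * (1.0 - 2.0 ** -j))
                  / (2.0 * math.sqrt(math.pi))
                  * (N * Ud ** 2 / 2.0) ** (2.0 ** -(j + 1)))
    return value

def g4_closed_form(W, N, Ud):
    """Equation G4, the delta = 1 special case."""
    return (2.0 ** 0.25 * math.gamma(1.25) * math.sqrt(Ud)
            / (math.pi * N ** 0.25 * W ** 1.25))

N, Ud, W = 1.0e6, 1.0e-2, 1.0e-3  # hypothetical parameter values
for delta in range(4):
    print(delta, weight_density(W, N, Ud, delta),
          weight_density_product_form(W, N, Ud, delta))
```

Each additional class halves the exponent of the correction to the $1/W$ decay, so the distribution approaches $1/W_{k+\Delta}$ as $\Delta$ grows.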

Appendix H: The Site Frequency Spectrum in the Presence of Weak Mutation ($U_d \ll s$)

In the following two appendices, we use the results obtained in previous sections to calculate the site frequency spectrum of the labeled lineage in the limits that $f \ll 1$ and $1-f \ll 1$, by evaluating and inverting the generating function $H_f(z,t)$ for the total frequency of the labeled lineage.

We have seen in Appendix D that trajectories of mutations in the presence of weak background selection ($U_d \ll s$) are, to leading order in the small parameter $\lambda$, the same as those of isolated loci with fitness $-ks$. In Appendix F, we have shown that the time-integrated distribution of allele frequencies of a single, isolated locus of fitness $-ks$ is

$$p(f, f>0) = \frac{2}{f}\, e^{-2Nskf}, \tag{H1}$$

which agrees with classical results by Ewens (1963) and Sawyer and Hartl (1992). Thus, the contribution to the site frequency spectrum of neutral mutations arising in class $k$ is

$$p(f,k) = N U_n h_k\, p(f, f>0) = \frac{2 N U_n h_k}{f}\, e^{-2Nskf} + O(\lambda). \tag{H2}$$

Summing the contributions of all the classes, we find that the full neutral site frequency spectrum is

$$p(f) = \sum_k p(f,k) = \frac{2 N U_n}{f} + O(\lambda). \tag{H3}$$

The site frequency spectrum of deleterious mutations follows from the same argument, since the trajectory of a deleterious mutation arising on the background of an individual with $k$ deleterious mutations is the same as the frequency trajectory of a neutral mutation arising in an individual with $k+1$ deleterious mutations. Thus, the site frequency spectrum of deleterious mutations is

$$p_{del}(f) = \sum_k N U_d h_k \frac{2}{f}\, e^{-2Ns(k+1)f} = \frac{2 N U_d}{f}\, e^{-2Nsf} + O(\lambda), \tag{H4}$$

which once again agrees to leading order with the site frequency spectrum that we would have obtained assuming that all selected sites at the locus were isolated.
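The size of the $O(\lambda)$ correction in Equations H3 and H4 can be illustrated by carrying out the sums over fitness classes explicitly. The sketch below assumes that the class weights $h_k$ are Poisson with mean $\lambda = U_d/s$, which is not restated in this appendix; the function names and parameter values are hypothetical.

```python
import math

def poisson_weight(k, lam):
    """h_k = exp(-lam) lam^k / k!, the assumed fraction of class k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def neutral_sfs_weak_mutation(f, N, s, Un, Ud, k_max=50):
    """Sum of Equation H2 over classes k = 0..k_max."""
    lam = Ud / s
    return sum(N * Un * poisson_weight(k, lam) * 2.0 / f
               * math.exp(-2.0 * N * s * k * f) for k in range(k_max + 1))

def deleterious_sfs_weak_mutation(f, N, s, Ud, k_max=50):
    """Left-hand sum in Equation H4 over classes k = 0..k_max."""
    lam = Ud / s
    return sum(N * Ud * poisson_weight(k, lam) * 2.0 / f
               * math.exp(-2.0 * N * s * (k + 1) * f) for k in range(k_max + 1))

N, s, Un, Ud = 1.0e4, 1.0e-2, 1.0e-3, 1.0e-4  # hypothetical, lam = 0.01
f = 0.01
print(neutral_sfs_weak_mutation(f, N, s, Un, Ud), 2.0 * N * Un / f)
print(deleterious_sfs_weak_mutation(f, N, s, Ud),
      2.0 * N * Ud / f * math.exp(-2.0 * N * s * f))
```

With Poisson weights the sums have the closed form $\exp[-\lambda(1-e^{-2Nsf})]$ times the leading term, so the relative deviation from the isolated-locus result is at most of order $\lambda$.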

Appendix I: The Site Frequency Spectrum in the Presence of Strong Mutation ($U_d \gg s$)

In this appendix, we calculate the site frequency spectrum of the labeled lineage in the limits that $f \ll 1$, $1-f \ll 1$, and that $\lambda \gg 1$ by evaluating and inverting the generating function $H_f(z,t)$ for the total frequency of the labeled lineage.

In the presence of strong mutation, we have seen that trajectories of mutations are dominated by drift at the lowest frequencies, where the generating function reduces to the generating function of a neutral mutation and is simply equal to the $k=0$ limit of the single-locus generating function in Equation D2. We have already calculated the site frequency spectrum that results from these trajectories in the previous section. Plugging in these results, we find that

$$p(f) = \sum_k N U_n h_k \frac{2}{f} \approx \frac{2 N U_n}{f} \quad \text{for } f \ll \frac{1}{N\sigma}. \tag{I1}$$

The site frequency spectrum at these frequencies is dominated by the contributions of lineages arising in average backgrounds, with $|k-\lambda| \lesssim \sqrt{\lambda} = \sigma/s$. By the same argument, the frequency spectrum of deleterious mutations at the same frequencies is also

$$p_{del}(f) = \sum_k N U_d h_k \frac{2}{f} \approx \frac{2 N U_d}{f} \quad \text{for } f \ll \frac{1}{N\sigma}. \tag{I2}$$

At larger frequencies, the site frequency spectrum becomes dominated by lineages arising in unusually fit backgrounds, with $k-\lambda \ll -\sqrt{\lambda}$. Their trajectories are instead described by Equation E10. We have seen that the integral in the exponent of Equation E10 has a different dependence on $t$ for $t \ll t_d^{(k)}$, $t \approx t_d^{(k)}$, and $t \gg t_d^{(k)}$, which we have labeled the “spreading,” “peak,” and “extinction” phases of the trajectory. In evaluating the site frequency spectrum $p(f)$, it will be convenient to calculate the contributions from each of these phases separately. We denote these contributions as $p_{spread}(f)$, $p_{peak}(f)$, and $p_{ext}(f)$, and the full site frequency spectrum is obtained by summing:

$$p(f) = p_{spread}(f) + p_{peak}(f) + p_{ext}(f). \tag{I3}$$

We evaluate $p_{peak}(f)$ and $p_{ext}(f)$ in the next two subsections of this appendix. We then show in the last subsection of this appendix that the contribution from $p_{spread}(f)$ is subdominant to that of $p_{ext}(f)$.

Contribution from the Peaks of Trajectories

In Large Lineages Arising on Unusually Fit Backgrounds in Appendix E, we have shown that, in the peak phase of the trajectory, the total allele frequency is

$$f(t) \approx U_d\, g_{k_c+1}\, W_{\Delta t^{(k_c+1)}}\left(t - t_d^{(k_c+1)}\right), \tag{I4}$$

where $k_c$ is the class with the smallest number of mutations for which $f \gg g_{k_c+1}/[2Ns(k_c+1)]$ or, equivalently, the class with the smallest number of mutations in which the weight exceeds $1/[N U_d s(k_c+1)]$.

We have seen above in Appendix G that, to achieve such a large weight in class $k_c$, a mutation could have arisen in class $k=k_c$ and traced an unusually large trajectory, or arisen in class $k_c-1$ and traced a smaller trajectory in that class, which led to the creation of a large number of deleterious descendants in class $k_c$, at least one of which had weight exceeding $1/[N U_d s(k_c+1)]$. Alternatively, it could have also arisen in class $k_c-2$ and traced an even smaller trajectory in that class that led to a larger weight in class $k_c-1$, and a sufficiently large weight in class $k_c$ for genetic drift to be negligible in classes $i>k_c+1$. In other words, in the range of frequencies

$$\frac{1}{2Ns(k_c+1)}\, g_{k_c+1} \ll f \ll \frac{1}{2Nsk_c}\, g_{k_c}, \tag{I5}$$

we see the peaks of trajectories originating in classes $k<k_c$, as long as their weight in class $k_c$ is large enough that genetic drift in classes of lower fitness can be ignored. All of these peaks contribute to the site frequency spectrum and, by integrating Equation I4 in time, we find that

$$p_{peak}(f) = \sum_{k=0}^{k_c(f)} N U_n h_k\, p\left[f = U_d\, g_{k_c+1} W_{\Delta t^{(k_c+1)}} \,\middle|\, \text{arose in } k\right] = \sum_{k=0}^{k_c(f)} N U_n h_k \frac{1}{U_d\, g_{k_c+1}}\, p\left[W_{\Delta t^{(k_c+1)}} = \frac{f}{U_d\, g_{k_c+1}} \,\middle|\, \text{arose in } k\right], \tag{I6}$$

where the last term represents the time-integrated distribution of weights in a window of width $\Delta t^{(k_c+1)}$ in class $k_c$ of a lineage that arose in class $k$. This distribution is given by Equation F28 for $k_c=0$. Otherwise, when $k_c>1$, the time-integrated distribution in Equation I6 is equal to the product of the window width, $\Delta t^{(k_c+1)}$, and the distribution of lifetime weights in the founding class, given in Equation G6.

Since we have previously calculated all of these quantities, we can now turn to evaluating the sum in Equation I6. When $f \gg g_1/(2Ns)$, then $k_c=0$, and the sum in Equation I6 has only one term ($k=0$). By substituting in the expression for the time-integrated distribution of weights in Equation F28, we find that

$$p_{peak}(f) = \begin{cases} \dfrac{2 N U_n e^{-\lambda}}{f}, & \text{if } f \gg \dfrac{1}{Nse^{-\lambda}}, \\[2ex] \dfrac{N U_n}{f^{3/2}} \left[\dfrac{2e^{-\lambda}}{\pi e N s}\right]^{1/2}, & \text{if } \dfrac{1}{N U_d e^{-\lambda}} \ll f \ll \dfrac{1}{Nse^{-\lambda}}. \end{cases} \tag{I7}$$

At lower frequencies [$1/(N\sqrt{U_d s}) \ll f \ll g_1/(2Ns)$], lineages originating in multiple different fitness classes will be able to contribute to the site frequency spectrum. At these frequencies,

$$p_{peak}(f) = \sum_{k=0}^{k_c(f)} N U_n h_k \frac{\Delta t^{(k_c+1)}}{U_d\, g_{k_c+1}}\, p\left(W_{k+\Delta} = \frac{f}{U_d\, g_{k_c+1}} \,\middle|\, k+\Delta = k_c\right). \tag{I8}$$

Plugging in the expression for $p(W_{k+\Delta})$ from Equation G6, we find

$$p_{peak}(f) = N U_n \frac{\lambda\sqrt{\pi}}{f} \frac{1}{\sqrt{k_c(f)+1}} \sum_{k=0}^{k_c} h_k \frac{\prod_{j=1}^{k_c-k} \Gamma\left[\frac{1}{2}\left(1-2^{-j}\right)\right]}{\left(2\sqrt{\pi}\right)^{k_c-k+1}} \left(\frac{N U_d f}{2 g_{k_c+1}}\right)^{-2^{-(k_c-k+1)}}. \tag{I9}$$

Because $\lambda \gg 1$, this sum is dominated by the $k=k_c$ term, since $h_k$ decays much more rapidly with decreasing $k$ than any of the other terms increase. To evaluate the $f$-dependence of this term for $f \ll g_1/(2Ns)$ and $k_c(f) \gg 1$, we repeat the same procedure as in Large Lineages Arising on Unusually Fit Backgrounds in Appendix E to obtain an explicit form for $k_c(f)$. Briefly, to solve the self-consistency condition for $k_c(f)$, $g_{k_c+1}/[2Ns(k_c+1)] \ll f \ll g_{k_c}/(2Nsk_c)$, we set $f = g_{k_c+1} \cdot C(f)/[2Ns(k_c+1)]$ for some $C(f)$ that satisfies $1 \ll C(f) \ll \lambda$ and find that to leading order

$$k_c(f)+1 \approx \log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right) \quad \text{when } k_c(f) \gg 1. \tag{I10}$$

Plugging in, we obtain that the leading-order term in the distribution of peak sizes is

$$p_{peak}(f) \approx N U_n \frac{C(f)^{1/2}}{2\sqrt{2\pi}\,\sqrt{\lambda}\, N s f^2} \frac{1}{\sqrt{\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)}}, \quad \text{for } \log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right) \gg 1. \tag{I11}$$

The term $C(f)$ depends on $f$ more weakly than logarithmically, so on the frequency scales over which $p_{peak}(f)$ changes substantially it is approximately constant, $C(f) \approx C$.

Because the crossover between the $f^{-3/2}$ scaling of $p_{peak}(f)$, which occurs at high frequencies $g_1/(2Ns) \ll f \ll 1/(Nse^{-\lambda})$ [where $k_c(f)+1=1$], and the $f^{-2}\log[1/(Nse^{-\lambda}f)]^{-1/2}$ behavior, which is valid at substantially lower frequencies [where $k_c(f) \gg 1$], is in principle broad, this constant factor $C$ is difficult to determine: asymptotic matching does not typically work well in the presence of such broad transitions and crude “patching” methods do not, in general, offer satisfactory results (Hinch 1991). Thus, Equation I11 is undetermined up to the constant factor $C^{1/2}$, which is between 1 and $\sqrt{\lambda}$. For our purposes here, this level of precision is sufficient: $O(1)$ precision in the form of the spectrum was, after all, expected in the Laplace-like approximation that we used in Large Lineages Arising on Unusually Fit Backgrounds in Appendix E to calculate the stochastic integral over the trajectory of the feeding class. Thus, by absorbing the $2\sqrt{2\pi}$ term into this constant factor and relabeling $C^{1/2}$ as $C$, we find that the peak contribution to the site frequency spectrum is

$$p_{peak}(f) \approx \frac{N U_n}{\sqrt{\lambda}\, N s f^2} \frac{C}{\sqrt{\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)}}, \quad \text{for } \log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right) \gg 1, \tag{I12}$$

with $C$ in the range $1 \lesssim C \lesssim \sqrt{\lambda}$.

Contribution from the Extinction Stage of Trajectories

Once the trajectory is beyond its peak, the total allele frequency decays as

$$f(t) = f_{peak}\, e^{-[k_c(f)+1]s(t-t_{peak})}, \tag{I13}$$

where $f_{peak}$ denotes the maximal frequency that the trajectory reaches and Equation I13 is valid for $t \gg t_d^{(k)}$. Note that this stage only exists for frequencies $f \ll 1/(Nse^{-\lambda})$. At higher frequencies, $f \gg 1/(Nse^{-\lambda})$, the total allele frequency simply mirrors smoothed fluctuations in the founding class. Equation I13 can be straightforwardly integrated in time to obtain the contribution of this trajectory to the site frequency spectrum

$$p(f|f_{peak}) = \begin{cases} \dfrac{1}{[k_c(f)+1]sf}, & \text{if } f_{peak} \gg f, \\[2ex] 0, & \text{otherwise}. \end{cases} \tag{I14}$$

Averaging Equation I14 over all possible trajectories, we find that

$$p_{ext}(f) = N U_n \frac{1}{f} \frac{1}{[k_c(f)+1]s}\, \mathrm{Prob}(f_{peak} > Df), \tag{I15}$$

where $D>1$ is a constant that we have introduced to correctly account for the fact that the peak phase occurs at frequencies that are at least $O(1)$ higher than the frequencies in the extinction stage.

For peak frequencies $f_{peak} \ll 1/(Nse^{-\lambda})$, we have already calculated the overall time-integrated distribution of peak sizes of lineages arising in classes of all fitness and we can use this result to calculate the total probability that a trajectory passes through $f$ in its extinction stage,

$$\mathrm{Prob}(f_{peak} > Df) = \frac{\int_{Df}^{\infty} p_{peak}(f')\, df'}{N U_n\, \Delta t^{(k_c+1)}} = \frac{C}{\sqrt{\lambda}\, N f D}. \tag{I16}$$

This means that the contribution to the site frequency spectrum from the extinction phase of trajectories is equal to

$$p_{ext}(f) = \frac{N U_n}{D\sqrt{\lambda}\, N s f^2} \frac{C}{\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)} < \frac{N U_n}{\sqrt{\lambda}\, N s f^2} \frac{C}{\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)}. \tag{I17}$$

Therefore, $p_{ext}(f)$ is strictly smaller than $p_{peak}(f)$ by a factor $\{\log_\lambda[1/(Nse^{-\lambda}f)]\}^{1/2}$, which is large when $f \ll 1/(Nse^{-\lambda})$. Thus, this phase of the trajectory has a small effect on the low-frequency end of the spectrum.

However, in the high-frequency end of the spectrum, when $1-f \ll 1/(Nse^{-\lambda})$, the only contribution comes from this extinction phase of the wild type, which starts once the mutant approaches the frequency $f \approx 1-1/(Ns)$ in the 0-class. These events happen at a rate equal to $N U_n e^{-\lambda} \cdot 1/(Ne^{-\lambda}) = U_n$ and each contributes $1/\{s[k_c(1-f)+1](1-f)\}$ to the site frequency spectrum. Multiplying these two terms, we find that the site frequency spectrum is proportional to

$$p_{dec}(f) = \frac{U_n}{s} \frac{1}{(1-f)\log_\lambda\left[\frac{1}{Ns(1-f)e^{-\lambda}}\right]}, \quad \text{if } \frac{1}{N\sigma} \ll 1-f \ll \frac{1}{Nse^{-\lambda}}. \tag{I18}$$

Contribution from the Spreading Stage of Trajectories

At frequencies $f \ll 1/(Nse^{-\lambda})$, the site frequency spectrum also receives contributions from the spreading stage of trajectories, in which the allele frequency rapidly increases as the allele spreads through the fitness distribution. In this stage, the rate at which the frequency increases is strictly larger than what it would be if we ignored any contributions from the founding class after the mutation exceeds frequency $1/(N U_d)$ [i.e., assuming $f(t) = e^{-kst+\lambda(1-e^{-st})}/(N U_d)$]:

$$\frac{df(t)}{dt} \ge f s\left(\lambda e^{-st} - k\right). \tag{I19}$$
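The bound in Equation I19 is the exact time derivative of the deterministic trajectory $f(t) = e^{-kst+\lambda(1-e^{-st})}/(N U_d)$, which can be confirmed with a finite-difference check. This is a minimal sketch: the function names and the parameter values are assumptions chosen for illustration only.

```python
import math

def spreading_frequency(t, k, s, lam, N, Ud):
    """Deterministic trajectory f(t) = exp(-k s t + lam (1 - exp(-s t))) / (N Ud)."""
    return math.exp(-k * s * t + lam * (1.0 - math.exp(-s * t))) / (N * Ud)

def growth_rate_bound(t, k, s, lam, N, Ud):
    """Right-hand side of Equation I19, f s (lam exp(-s t) - k)."""
    f = spreading_frequency(t, k, s, lam, N, Ud)
    return f * s * (lam * math.exp(-s * t) - k)

def numerical_derivative(t, k, s, lam, N, Ud, dt=1.0e-3):
    """Central finite difference of the deterministic trajectory."""
    upper = spreading_frequency(t + dt, k, s, lam, N, Ud)
    lower = spreading_frequency(t - dt, k, s, lam, N, Ud)
    return (upper - lower) / (2.0 * dt)

k, s, Ud, N = 2, 1.0e-2, 1.0e-1, 1.0e6  # hypothetical values, lam = 10
lam = Ud / s
for t in (10.0, 50.0, 150.0):
    print(t, growth_rate_bound(t, k, s, lam, N, Ud),
          numerical_derivative(t, k, s, lam, N, Ud))
```

The growth rate changes sign at $t = \log(\lambda/k)/s$, where $\lambda e^{-st} = k$; this is the peak of the deterministic trajectory.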

Far below the peak of the trajectory, where $\lambda e^{-st} \gg k$, the contribution from this stage of a single trajectory to the frequency spectrum that passes through $f$ is thus simply bounded by

$$p_{single}(f) = \frac{1}{\left|\frac{df}{dt}\right|} \le \frac{1}{f s \lambda e^{-st}} \approx \frac{1}{s f \log\left(\frac{1}{Nse^{-\lambda}f}\right)} = \frac{1}{s f \log(\lambda)\log_\lambda\left(\frac{1}{Nse^{-\lambda}f}\right)} \le \frac{1}{s f[k_c(f)+1]}. \tag{I20}$$

Since the number of trajectories that pass through frequency $f$ in the spreading phase is the same as the number that pass through $f$ in the extinction phase, the contribution from the spreading phase to the site frequency spectrum is strictly smaller than that of the extinction phase throughout the region where both contributions exist, $f \ll 1/(Nse^{-\lambda})$.

Constructing a Single Curve from Piecewise Asymptotic Functions

In the previous sections of this appendix, we have shown that the site frequency spectrum is given by

$$p(f) \approx \begin{cases} \dfrac{2 N U_n}{f}, & \text{for } f \ll \dfrac{1}{N\sigma}, \\[2ex] \dfrac{N U_n}{N s f^2 \sqrt{\lambda \log_\lambda\left(\frac{e^{\lambda}}{Nsf}\right)}}, & \text{for } \dfrac{1}{N\sigma} \ll f \ll \dfrac{e^{\lambda}}{Ns}, \\[2ex] \dfrac{2 N U_n e^{-\lambda}}{f}, & \text{for } \dfrac{e^{\lambda}}{Ns} \ll f \ll 1-\dfrac{e^{\lambda}}{Ns}, \\[2ex] \dfrac{N U_n}{N s (1-f) \log_\lambda\left[\frac{e^{\lambda}}{Ns(1-f)}\right]}, & \text{for } \dfrac{1}{N\sigma} \ll 1-f \ll \dfrac{e^{\lambda}}{Ns}, \\[2ex] \dfrac{2 N U_n}{f}, & \text{for } 1-f \ll \dfrac{1}{N\sigma}. \end{cases} \tag{I21}$$

As we have explained, line 2 of Equation I21 is valid up to a constant factor $C_1(\lambda)$ that is bounded by $1 \lesssim C_1(\lambda) \lesssim \sqrt{\lambda}$. These piecewise functions represent the leading-order behaviors far away from the transitions between the different regimes, which occur at $f = 1/(N\sigma)$, $1/(Nse^{-\lambda})$, $1-1/(Nse^{-\lambda})$, and $1-1/(N\sigma)$. For practical purposes, it is often convenient to construct a single theoretical curve that joins these curves at these transition points, while maintaining the correct form far away from the transition points. This procedure is not intended to extend the validity of the results outside of the regimes where asymptotic forms are available and is certainly not guaranteed to produce the correct functional forms at the transitions. However, it often yields satisfactory results, especially when the transitions are narrow in practice and when the two asymptotic forms are expected to lie on opposite sides of the behavior at the transition (i.e., one is expected to overestimate, and the other to underestimate). In the present case, the latter condition is true at the transitions at $1/(Nse^{-\lambda})$ and $1-1/(Nse^{-\lambda})$.

Here, we have used a sigmoid function,

g(f)=11+e+Nseλf (I22)

to join the functional forms at the transitions, which has the convenient property

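$$g(f) \approx \begin{cases} 1, & \text{for } f \ll \dfrac{1}{Nse^{-\lambda}}, \\[2ex] 0, & \text{for } f \gg \dfrac{1}{Nse^{-\lambda}}. \end{cases} \tag{I23}$$

In addition to this, because the forms that are valid when $1/(N\sigma) \ll f \ll 1/(Nse^{-\lambda})$ and $1/(N\sigma) \ll 1-f \ll 1/(Nse^{-\lambda})$ have logarithmic divergences near the transitions [i.e., for $f, 1-f = 1/(Nse^{-\lambda})$], we also add small additive constants to these logarithms to avoid nonsensical results. Specifically, to compare our theoretical predictions with simulations, we plot

$$p_{joined}(f) = 2 N U_n \left[ \frac{C_1\, g(f)}{N s f^2 \sqrt{\lambda\left[\log\left(\frac{1}{Nse^{-\lambda}f}\right) + C_2\right]}} + [1-g(f)][1-g(1-f)]\frac{e^{-\lambda}}{f} + \frac{C_3\, g(1-f)}{N s (1-f)\left\{\log\left[\frac{e^{\lambda}}{Ns(1-f)}\right] + C_4\right\}} \right], \quad \text{for } \frac{1}{N\sigma} \le f \le 1-\frac{1}{N\sigma}. \tag{I24}$$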

$C_1$, $C_2$, $C_3$, and $C_4$ were chosen to ensure visual smoothness of the curve. Note that the constant $C_3$ is only necessary to ensure visual smoothness of the curve at limited $\lambda$ (adding $C_4$ to the denominator to control the logarithmic divergence causes the curve to be shifted downward, and $C_3$ helps to correct for this). We tabulate the values used in Table I1.

Table I1. Values of small constants defined in Equation I24 that were used in this paper.

| $\lambda$ | $C_1$ | $C_2$ | $C_3$ | $C_4$ |
|---|---|---|---|---|
| ≥3 | 1.5 | 0.5 | 1.4 | 1 |
| ≤2 | 1.0 | 0.5 | 1.4 | 1 |

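In principle, we could also use a similar procedure to join the asymptotic forms at the transitions at $1/(N\sigma)$ and $1-1/(N\sigma)$. However, since both asymptotic forms overestimate the site frequency spectrum near these transitions, this works no better than simply setting

$$p(f) = \begin{cases} \dfrac{2 N U_n}{f}, & \text{for } f \le \dfrac{1}{N\sigma}, \\[2ex] p_{joined}(f), & \text{for } \dfrac{1}{N\sigma} < f < 1-\dfrac{1}{N\sigma}, \\[2ex] \dfrac{2 N U_n}{f}, & \text{for } 1-\dfrac{1}{N\sigma} \le f. \end{cases} \tag{I25}$$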

This is the choice we have made when calculating theoretical predictions for site frequency spectra of smaller samples, which were necessary for comparisons with the structured coalescent.

Appendix J: Distributions of Effect Sizes

When the effects of deleterious mutations are not all identical but instead have a distribution with finite width, $\rho(s)$, the deterministic dynamics that arise through the combined action of mutation and selection will be modified. In this appendix, we consider these deterministic dynamics. For concreteness, we assume that the fitness effects of new mutations come from a gamma distribution with mean $\bar{s}$ and shape parameter $\alpha$,

$$\rho(s) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)\,\bar{s}^{\alpha}}\, s^{\alpha-1} e^{-s\alpha/\bar{s}}, \tag{J1}$$

and that these deleterious mutations occur at an overall rate $U_d$.

Under the assumption that all mutations have strong enough effects on fitness that the population does not experience Muller’s ratchet at the locus on timescales of coalescence, the mean fitness of an allele at the locus will be equal to $-U_d$, with the most-fit individuals being those with no deleterious mutations and an absolute fitness equal to zero. Consider now the deterministic dynamics of a lineage founded in an individual at absolute fitness $x$. The fitness of the lineage founded by this individual will change as it accumulates new deleterious mutations according to

$$x(t) = x + U_d \int_0^{\infty} ds\, \rho(s)\, e^{-st}. \tag{J2}$$

Evaluating this integral, we find

$$x(t) = x + U_d\left(1 + \frac{\bar{s}t}{\alpha}\right)^{-\alpha}. \tag{J3}$$
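The step from Equation J2 to Equation J3 is the Laplace transform of a gamma density, and it can be checked by direct numerical integration. The sketch below is illustrative only: the function names, the integration cutoff, and the parameter values are assumptions, and the gamma density is written in its standard form with mean `s_mean` and shape `alpha`.

```python
import math

def gamma_dfe(s, s_mean, alpha):
    """Gamma density with mean s_mean and shape alpha (rate alpha / s_mean)."""
    rate = alpha / s_mean
    return rate ** alpha * s ** (alpha - 1.0) * math.exp(-rate * s) / math.gamma(alpha)

def mutation_term_numeric(t, s_mean, alpha, steps=20000):
    """Midpoint-rule estimate of the integral of rho(s) exp(-s t) over s >= 0."""
    # Beyond s_max the integrand is negligible for the moderate alpha used here.
    s_max = (alpha + 60.0) / (alpha / s_mean + t)
    h = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += gamma_dfe(s, s_mean, alpha) * math.exp(-s * t)
    return total * h

def mutation_term_closed_form(t, s_mean, alpha):
    """The factor multiplying Ud in Equation J3."""
    return (1.0 + s_mean * t / alpha) ** -alpha

s_mean, alpha = 1.0e-2, 4.0  # hypothetical values
for t in (0.0, 150.0, 600.0):
    print(t, mutation_term_numeric(t, s_mean, alpha),
          mutation_term_closed_form(t, s_mean, alpha))
```

At $t=0$ the integral reduces to the normalization of $\rho(s)$, so the first printed pair should both be close to 1.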

When $\alpha$ is sufficiently large, corresponding to a sufficiently narrow distribution of fitness effects, the resulting trajectory is well approximated by assuming that all fitness effects are the same and equal to the average fitness effect $\bar{s}$ [or, more precisely, the harmonic mean of $\rho(s)$, $\bar{s}\cdot(\alpha-1)/\alpha \approx \bar{s}$]. To calculate how large $\alpha$ needs to be for this approximation to be valid, we can calculate the deterministic expectation for the average number of individuals in the lineage at time $t$ after founding. This quantity is equal to

$$g(x,t) = \exp\left[\int_0^t x(t')\, dt'\right] = \exp\left\{xt + \frac{U_d\,\alpha}{\bar{s}(\alpha-1)}\left[1 - \left(1+\bar{s}t/\alpha\right)^{-(\alpha-1)}\right]\right\}. \tag{J4}$$

We see that this differs from the single-$s$ expression only in the last term, proportional to $(1+\bar{s}t/\alpha)^{-(\alpha-1)}$. At sufficiently short times, $t \ll \alpha/\bar{s}$, this is well approximated by $e^{-\bar{s}t(\alpha-1)/\alpha}$. On sufficiently long timescales, this will not be the case. However, because the overall magnitude of this term becomes negligible at times long after the peak of $g(x,t)$, $t \gg t_d$, we only need it to remain well approximated by an exponential on timescales $t \lesssim t_d$, which requires that $\alpha/\log(U_d/\bar{s}) \gg 1$. When this is the case, $g(x,t)$ is, up to perturbative corrections, given by

$$g(x,t) \approx \exp\left[xt + \frac{U_d\,\alpha}{(\alpha-1)\bar{s}}\left(1 - e^{-\frac{(\alpha-1)\bar{s}}{\alpha}t}\right)\right], \tag{J5}$$

and the effects of selection are well described by a single-s model on all timescales.
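To see how quickly the single-$s$ description becomes accurate as the distribution of fitness effects narrows, the exponents of Equations J4 and J5 can be compared for several values of $\alpha$. This is a sketch under assumptions: the function names and parameter values are hypothetical, and the exponential in the single-$s$ form is taken to decay at the harmonic-mean effect $\bar{s}(\alpha-1)/\alpha$.

```python
import math

def log_g_exact(x, t, Ud, s_mean, alpha):
    """Exponent of Equation J4 for gamma-distributed fitness effects."""
    prefactor = Ud * alpha / (s_mean * (alpha - 1.0))
    return x * t + prefactor * (1.0 - (1.0 + s_mean * t / alpha) ** -(alpha - 1.0))

def log_g_single_s(x, t, Ud, s_mean, alpha):
    """Exponent of the single-s form with effect s_mean * (alpha - 1) / alpha."""
    s_eff = s_mean * (alpha - 1.0) / alpha
    return x * t + Ud / s_eff * (1.0 - math.exp(-s_eff * t))

x, t, Ud, s_mean = -0.02, 300.0, 0.05, 0.01  # hypothetical values
for alpha in (5.0, 50.0, 1000.0):
    exact = log_g_exact(x, t, Ud, s_mean, alpha)
    approx = log_g_single_s(x, t, Ud, s_mean, alpha)
    print(alpha, exact, approx, approx - exact)
```

The gap between the two exponents shrinks roughly as $1/\alpha$ at fixed $\bar{s}t$, in line with the requirement that $\alpha$ be large compared with the logarithmic timescale of the peak.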

Footnotes

Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6167591.

Communicating editor: N. Barton

Literature Cited

  1. Agrawal A., Hartfield M., 2016. Coalescence with background and balancing selection in systems with bi- and uniparental reproduction: contrasting partial asexuality and selfing. Genetics 202: 313–326. 10.1534/genetics.115.181024
  2. Begun D. J., Aquadro C. F., 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356: 519–520.
  3. Birky C. W., Walsh J. B., 1988. Effects of linkage on rates of molecular evolution. Proc. Natl. Acad. Sci. USA 85: 6414–6418.
  4. Charlesworth B., 1996. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68: 131–149.
  5. Charlesworth B., Morgan M. T., Charlesworth D., 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303.
  6. Charlesworth D., Charlesworth B., Morgan M. T., 1995. The pattern of neutral molecular variation under the background selection model. Genetics 141: 1619–1632.
  7. Comeron J. M., 2014. Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet. 10: e1004434 10.1371/journal.pgen.1004434
  8. Cutter A. D., Payseur B. A., 2003. Selection at linked sites in the partial selfer Caenorhabditis elegans. Mol. Biol. Evol. 20: 665–673.
  9. Desai M. M., Fisher D. S., 2007. Beneficial mutation–selection balance and the effect of linkage on positive selection. Genetics 176: 1759–1798.
  10. Elyashiv E., Sattath S., Hu T. T., Strutsovsky A., McVicker G., et al., 2016. A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 12: e1006130 10.1371/journal.pgen.1006130
  11. Etheridge, A., P. Pfaffelhuber, and A. Wakolbinger, 2009 How often does the ratchet click? Facts, heuristics, asymptotics, pp. 365–390 in Trends in Stochastic Analysis (London Mathematical Society Lecture Note Series), edited by J. Blath, P. Mörters, and M. Scheutzow. Cambridge University Press, Cambridge, UK.
  12. Ewens W. J., 1963. The diffusion equation and a pseudo-distribution in genetics. J. R. Stat. Soc. B 25: 405–412.
  13. Ewens W. J., 2004. Mathematical Population Genetics I. Springer-Verlag, New York.
  14. Fisher, D. S., 2007 Course 11 evolutionary dynamics, pp. 395–446, in Complex Systems (Les Houches, Vol. 85), edited by J.-P. Bouchaud, M. Mézard, and J. Dalibard. Elsevier, Amsterdam.
  15. Flowers J. M., Molina J., Rubinstein S., Huang P., Schaal B. A., et al., 2012. Natural selection in gene-dense regions shapes the genomic pattern of polymorphism in wild and domesticated rice. Mol. Biol. Evol. 29: 675–687. 10.1093/molbev/msr225
  16. Franklin I., Lewontin R. C., 1970. Is the gene the unit of selection? Genetics 65: 707–734.
  17. Good B. H., Desai M. M., 2013. Fluctuations in fitness distributions and the effects of weak linked selection on sequence evolution. Theor. Popul. Biol. 85: 86–102. 10.1016/j.tpb.2013.01.005
  18. Good B. H., Rouzine I. M., Balick D. J., Hallatschek O., Desai M. M., 2012. Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc. Natl. Acad. Sci. USA 109: 4950–4955. 10.1073/pnas.1119910109
  19. Good B. H., Walczak A. M., Neher R. A., Desai M. M., 2014. Genetic diversity in the interference selection limit. PLoS Genet. 10: e1004222.
  20. Gordo I., Navarro A., Charlesworth B., 2002. Muller’s ratchet and the pattern of variation at a neutral locus. Genetics 161: 2137–2140.
  21. Haigh J., 1978. The accumulation of deleterious genes in a population--Muller’s ratchet. Theor. Popul. Biol. 14: 251–267.
  22. Hallatschek O., 2017. Selection-like biases emerge in population models with recurrent jackpot events. bioRxiv 182519. 10.1101/182519
  23. Higgs P. G., Woodcock G., 1995. The accumulation of mutations in asexual populations and the structure of genealogical trees in the presence of selection. J. Math. Biol. 33: 677–702.
  24. Hinch E. J., 1991. Perturbation Methods. Cambridge University Press, Cambridge, UK.
  25. Hudson R. R., Kaplan N. L., 1988. The coalescent process in models with selection and recombination. Genetics 120: 831–840.
  26. Hudson R. R., Kaplan N. L., 1994. Gene trees with background selection, pp. 140–153 in Non-Neutral Evolution: Theories and Molecular Data, edited by Golding B. Springer, Boston.
  27. Hudson R. R., Kaplan N. L., 1995. Deleterious background selection with recombination. Genetics 141: 1605–1617.
  28. Kaiser V. B., Charlesworth B., 2008. The effects of deleterious mutations on evolution in non-recombining genomes. Trends Genet. 25: 9–12.
  29. Keinan A., Clark A. G., 2012. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336: 740–743. 10.1126/science.1217283
  30. Kendall D. G., 1948. On the generalized “birth-and-death” process. Ann. Math. Stat. 19: 1–15.
  31. Kimura M., Maruyama T., 1966. The mutational load with epistatic gene interactions in fitness. Genetics 54: 1337–1351.
  32. Kosheleva K., Desai M. M., 2013. The dynamics of genetic draft in rapidly adapting populations. Genetics 195: 1007–1025.
  33. Lea D. E., Coulson C. A., 1949. The distribution of the number of mutants in bacterial populations. J. Genet. 49: 264–285.
  34. Mandelbrot B., 1974. A population birth-and-mutation process, I: explicit distributions for the number of mutants in an old culture of bacteria. J. Appl. Probab. 11: 437–444.
  35. McVean G. A. T., Charlesworth B., 2000. The effects of Hill-Robertson interference between selected mutations on patterns of molecular evolution and variation. Genetics 155: 929–944.
  36. McVicker G., Gordon D., Davis C., Green P., 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5: e1000471 10.1371/journal.pgen.1000471
  37. Neher R. A., Hallatschek O., 2013. Genealogies of rapidly adapting populations. Proc. Natl. Acad. Sci. USA 110: 437–442.
  38. Neher R. A., Shraiman B. I., 2011. Genetic draft and quasi-neutrality in large facultatively sexual populations. Genetics 188: 975–996. 10.1534/genetics.111.128876
  39. Neher R. A., Shraiman B. I., 2012. Fluctuations of fitness distributions and the rate of Muller’s ratchet. Genetics 191: 1283–1293. 10.1534/genetics.112.141325
  40. Neher R. A., Kessinger T. A., Shraiman B. I., 2013. Coalescence and genetic diversity in sexual populations under selection. Proc. Natl. Acad. Sci. USA 110: 15836–15841. 10.1073/pnas.1309697110
  41. Nicolaisen L. E., Desai M. M., 2012. Distortions in genealogies due to purifying selection. Mol. Biol. Evol. 29: 3589–3600.
  42. Nordborg M., Charlesworth B., Charlesworth D., 1996. The effect of recombination on background selection. Genet. Res. 67: 159–174.
  43. O’Fallon B. D., Seger J., Adler F., 2010. A continuous-state coalescent and the impact of weak selection on the structure of gene genealogies. Mol. Biol. Evol. 27: 1162–1172.
  44. Rannala B., 1997. Gene genealogy in a population of variable size. Heredity 78: 417–423.
  45. Roze D., 2016. Background selection in partially selfing populations. Genetics 203: 937–957. 10.1534/genetics.116.187955
  46. Sawyer S. A., Hartl D. L., 1992. Population genetics of polymorphism and divergence. Genetics 132: 1161–1176.
  47. Seger J., Smith W. A., Perry J. J., Hunn J., Kaliszewska Z. A., 2010. Gene genealogies strongly distorted by weakly interfering mutations in constant environments. Genetics 184: 529–545. 10.1534/genetics.109.103556
  48. Slatkin M., 1972. On treating the chromosome as the unit of selection. Genetics 72: 157–168.
  49. Slatkin M., Hudson R. R., 1991. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129: 555–562.
  50. Tachida H., 2000. DNA evolution under weak selection. Gene 261: 3–9.
  51. Tajima F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
  52. Van Kampen N. G., 2007. Stochastic Processes in Physics and Chemistry, Ed. 3. North Holland: Elsevier, Amsterdam.
  53. Wakeley J., 2009. Coalescent Theory: An Introduction. Roberts and Company Publishers, Greenwood Village, CO.
  54. Walczak A. M., Nicolaisen L. E., Plotkin J. B., Desai M. M., 2012. The structure of genealogies in the presence of purifying selection: a fitness-class coalescent. Genetics 190: 753–779.
  55. Weissman D. B., Hallatschek O., 2014. The rate of adaptation in large sexual populations with linear chromosomes. Genetics 196: 1167–1183.
  56. Weissman D. B., Desai M. M., Fisher D. S., Feldman M. W., 2009. The rate at which asexual populations cross fitness valleys. Theor. Popul. Biol. 75: 286–300. 10.1016/j.tpb.2009.02.006
  57. Williamson S., Orive M. E., 2002. The genealogy of a sequence subject to purifying selection at multiple sites. Mol. Biol. Evol. 19: 1376–1384.
  58. Yule G. U., 1924. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philos. Trans. R. Soc. Lond., B 213: 21–87.
  59. Zeng K., Charlesworth B., 2010. The effects of demography and linkage on the estimation of selection and mutation parameters. Genetics 186: 1411–1424. 10.1534/genetics.110.122150
  60. Zeng K., Charlesworth B., 2011. The joint effects of background selection and genetic recombination on local gene genealogies. Genetics 189: 251–266. 10.1534/genetics.111.130575
  61. Zeng K., Corcoran P., 2015. The effects of background and interference selection on patterns of genetic variation in subdivided populations. Genetics 201: 1539–1554. 10.1534/genetics.115.178558

Associated Data

Data Availability Statement

Code used to generate the simulated data are available at: https://github.com/icvijovic/background-selection. Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6167591.


Articles from Genetics are provided here courtesy of Oxford University Press
