Abstract
A major research goal in evolutionary genetics is to uncover loci experiencing positive selection. One approach involves finding ‘selective sweep’ patterns, which can either be ‘hard sweeps’ formed by de novo mutation, or ‘soft sweeps’ arising from recurrent mutation or existing standing variation. Existing theory generally assumes outcrossing populations, and it is unclear how dominance affects soft sweeps. We consider how arbitrary dominance and inbreeding via self-fertilization affect hard and soft sweep signatures. With increased self-fertilization, sweep signatures are maintained over longer map distances due to reduced effective recombination and faster beneficial allele fixation times. Dominance can affect sweep patterns in outcrossers if the derived variant originates from either a single novel allele, or from recurrent mutation. These models highlight the challenges in distinguishing hard and soft sweeps, and propose methods to differentiate between scenarios.
Keywords: Adaptation, Dominance, Self-fertilisation, Selective Sweeps, Population Genetics
Inferring adaptive mutations from nucleotide polymorphism data is a major research goal in evolutionary genetics, and has been subject to extensive modeling work to determine the footprints they leave in genome data (Stephan 2019). The earliest models focused on a scenario where a beneficial mutation arose as a single copy before rapidly fixing. Linked neutral mutations then ‘hitchhike’ to fixation with the adaptive variant, reducing diversity around the selected locus (Maynard Smith and Haigh 1974; Kaplan et al. 1989). Hitchhiking also increases linkage disequilibrium in regions flanking the selected site, by raising the haplotype carrying the selected allele to high frequency (Thomson 1977; Innan and Nordborg 2003; McVean 2007). These theoretical expectations have spurred the creation of summary statistics for detecting sweeps, usually based on finding genetic regions exhibiting extended haplotype homozygosity (Sabeti et al. 2002; Kim and Nielsen 2004; Voight et al. 2006; Ferrer-Admetlla et al. 2014; Vatsiou et al. 2016), or an increase in high frequency derived variants (Fay and Wu 2000; Kim and Stephan 2002; Nielsen 2005; Boitard et al. 2009; Yang et al. 2018; Fujito et al. 2018).
Classic hitchhiking models consider ‘hard’ sweeps, where the common ancestor of an adaptive allele occurs after the onset of selection (Hermisson and Pennings 2017). Recent years have seen a focus on ‘soft’ sweeps, where the most recent common ancestor of a beneficial allele appeared before it became selected for (reviewed by Barrett and Schluter (2008); Messer and Petrov (2013); Hermisson and Pennings (2017)). Soft sweeps can originate from beneficial mutations being introduced by recurrent mutation at the target locus (Pennings and Hermisson 2006a, b), or originating from existing standing variation that was either neutral or deleterious (Orr and Betancourt 2001; Innan and Kim 2004; Przeworski et al. 2005; Hermisson and Pennings 2005; Wilson et al. 2014; Berg and Coop 2015; Wilson et al. 2017). A key property of soft sweeps is that the beneficial variant is present on multiple genetic backgrounds as it sweeps to fixation, so different haplotypes may carry the derived allele. This property is often used to detect soft sweeps in genetic data (Peter et al. 2012; Vitti et al. 2013; Garud et al. 2015; Garud and Petrov 2016; Schrider and Kern 2016; Sheehan and Song 2016; Harris et al. 2018a; Kern and Schrider 2018; Harris and DeGiorgio 2018, 2019). Soft sweeps have been reported in Drosophila (Karasov et al. 2010; Garud et al. 2015; Garud and Petrov 2016; Vy et al. 2017), humans (Peter et al. 2012; Schrider and Kern 2017; Laval et al. 2019), maize (Fustier et al. 2017), Anopheles mosquitoes (Xue et al. 2019), and pathogens including Plasmodium falciparum (Anderson et al. 2016) and HIV (Pennings et al. 2014; Williams and Pennings 2019). Yet determining how extensive soft sweeps are in nature remains a contentious issue (Jensen 2014; Harris et al. 2018b).
Up to now, there have only been a few investigations into how dominance affects sweep signatures. In a simulation study, Teshima and Przeworski (2006) explored how recessive mutations spend long periods of time at low frequencies, increasing the amount of recombination that acts on derived haplotypes and weakening signatures of hard sweeps. Fully recessive mutations may need a long time to reach a frequency high enough to be detectable by genome scans (Teshima et al. 2006). Ewing et al. (2011) have carried out a general mathematical analysis of how dominance affects hard sweeps, finding that recessive beneficial mutations have markedly different signatures compared to those with other dominance values. However, the impact of dominance on soft sweeps has yet to be explored in depth.
In addition, existing models have so far focused on randomly mating populations, with haplotypes freely mixing between individuals over generations. Different reproductive modes alter how alleles are inherited, potentially changing the hitchhiking effect. Self-fertilization, where male and female gametes produced from the same individual can fertilize one another, can alter adaptation rates and selection signatures (Hartfield et al. 2017). This mating system is prevalent among angiosperms (Igic and Kohn 2006), some animals (Jarne and Auld 2006) and fungi (Billiard et al. 2011). As the effects of dominance and self-fertilization become strongly intertwined, it is important to consider both together. Dominant mutations are more likely to fix than recessive ones in outcrossers, as they have a higher initial selection advantage (Haldane 1927). Yet recessive alleles can fix more easily in selfers than in outcrossers as homozygote mutations are created more rapidly (Charlesworth 1992; Glémin 2012). Furthermore, a decrease in effective recombination rates in selfers (Nordborg et al. 1996; Nordborg 2000; Charlesworth and Charlesworth 2010) can interfere with selection acting at linked sites, making it likelier that deleterious mutations hitchhike to fixation with adaptive alleles (Hartfield and Glémin 2014), or that rare mutations are lost by drift due to competition between adaptive mutations (Hartfield and Glémin 2016).
In a constant-sized population, beneficial mutations can be less likely to fix from standing variation (either neutral or deleterious) in selfers as they maintain lower diversity levels (Glémin and Ronfort 2013). Yet adaptation from standing variation becomes likelier in selfers compared to outcrossers under ‘evolutionary rescue’ scenarios, where swift adaptation is needed to prevent population extinction following environmental change. Here, rescue mutations are only present in standing variation as the population size otherwise becomes too small (Glémin and Ronfort 2013). Self-fertilization further aids this process by creating beneficial homozygotes more rapidly than in outcrossing populations (Uecker 2017).
Little data currently exists on the extent of soft sweeps in self-fertilizers. Many selfing organisms exhibit sweep-like patterns, including Arabidopsis thaliana (Long et al. 2013; Huber et al. 2014; Fulgione et al. 2018; Price et al. 2018); Caenorhabditis elegans (Andersen et al. 2012); Medicago truncatula (Bonhomme et al. 2015); and Microbotryum fungi (Badouin et al. 2017). Soft sweeps have also been reported in soya bean (Zhong et al. 2017). Detailed analyses of these cases have been hampered by a lack of theory on how hard and soft sweep signatures should manifest themselves under different self-fertilization and dominance levels. Previous studies have only focused on special cases: Hedrick (1980) analyzed linkage disequilibrium caused by a hard sweep under self-fertilization, while Schoen et al. (1996) modeled sweep patterns caused by modifiers that altered the mating system in different ways.
To this end, we develop a selective sweep model that accounts for dominance and inbreeding via self–fertilization. We determine the genetic diversity present following a sweep from either a de novo mutation, or from standing variation. We also determine the number of segregating sites and the site frequency spectrum, while comparing results to an alternative soft-sweep model where adaptive alleles arise via recurrent mutation. Note that we focus here on single sweep events, rather than characterizing how sweeps affect genome-wide diversity (Elyashiv et al. 2016; Campos et al. 2017; Booker and Keightley 2018; Rettelbach et al. 2019).
Methods
Model outline
We consider a diploid population of size N (carrying $2N$ haplotypes in total). Individuals reproduce by self-fertilization with probability σ, and outcross with probability $1 - \sigma$. A derived allele arises at a locus (B in Table 1), and we are interested in determining the population history of neutral regions (locus A) that are linked to it, with a recombination rate r between them. We principally look at the case where the beneficial allele arises from previously neutral standing variation, and subsequently look at a sweep arising from recurrent mutation. The derived allele initially segregates neutrally for a period of time, then becomes advantageous with selective advantage $hs$ when heterozygous and $s$ when homozygous, where $h$ is the dominance coefficient and $s > 0$ the selection coefficient. We further assume that the population size is large and selection is strong enough so that the beneficial allele’s change in frequency can be modeled deterministically (i.e., $N \gg 1$ and $N_e s \gg 1$). Table 1 lists the notation used in the analysis.
Table 1. Glossary of Notation.
Symbol | Usage |
---|---|
N | Population size (with $2N$ haplotypes) |
σ | Proportion of matings that are self-fertilizing |
F | Wright’s inbreeding coefficient, probability of identity-by-descent at a single gene, equal to $\sigma/(2-\sigma)$ at steady-state |
$\Phi$ | Joint probability of identity-by-descent at two loci (Equation 1) |
$N_e$ | Effective population size, equal to $N/(1+F)$ with selfing |
r | Recombination rate between loci A and B |
$r_{eff}$ | ‘Effective’ recombination rate, approximately equal to $r(1-F)$ with selfing |
R | $2Nr$, the population-level recombination rate |
$p_0$ | Frequency at which the derived allele at B becomes advantageous |
$p_{0,A}$ | Accelerated (effective) starting frequency of B appearing as a single copy, conditional on fixation |
s | Selective advantage of derived allele at B |
h | Dominance coefficient of derived allele at B |
t | Number of generations in the past from the present day |
$\tau_{p_0}$ | Time in the past when the derived allele became beneficial |
$p(t)$ | Frequency of beneficial allele at time t |
$P_c(t)$ | Probability of coalescence at time t |
$P_r(t)$ | Probability of recombination at time t |
$P_m(t)$ | Probability of mutation at time t |
$P_{NE}$ | Probability that neutral marker does not coalesce or recombine during sweep phase |
$P_{R,Sw}$ | Probability that neutral marker recombines during sweep phase |
$P_{R,Sd}$ | Probability that neutral marker recombines during standing phase |
$P_{M,Sw}$ | Probability that a lineage mutates during sweep phase |
$P_{M,Sd}$ | Probability that a lineage mutates during standing phase |
$H_l$, $H_h$ | ‘Effective’ dominance coefficient for allele at low, high frequency |
π | Pairwise diversity at site ($\pi_0$ is expected value without a sweep) |
$\pi_{SV}$ | Pairwise diversity following sweep from standing variation |
$\pi_M$ | Pairwise diversity following sweep from recurrent mutation |
μ | Probability of neutral mutation occurring per site per generation |
$\mu_b$ | Probability of beneficial mutation occurring at target locus per generation |
$\theta$ | Population level neutral mutation rate |
$\Theta_b$ | Population level beneficial mutation rate |
Our goal is to determine how the spread of the derived, adaptive allele affects genealogies at linked neutral regions. For a sweep originating from standing variation, we follow the approach of Berg and Coop (2015) and, looking backward in time, break down the selected allele history into two phases. In the recent past is the ‘sweep phase’ where the derived allele was selectively favored, with its frequency decreasing from 1 to $p_0$. Prior to that phase is the ‘standing phase’, which assumes that the derived allele is present at an approximately fixed frequency $p_0$. During both phases, a pair of haplotypes can either coalesce, or one of them recombines onto the ancestral background. A schematic is shown in Figure 1.
During the sweep phase, the derived allele will also cause the spread of linked haplotypes that it appeared on. Over the course of the sweep, haplotypes are broken down by recombination; the total number of recombination events is proportional to $r\,\tau_{p_0}$, where $\tau_{p_0}$ is the fixation time of the beneficial allele, given an initial frequency $p_0$ (Maynard Smith and Haigh 1974). Dominance and self-fertilization have different effects on $\tau_{p_0}$, and therefore the number of fixing haplotypes. If self-fertilization is rare ($\sigma \approx 0$) then highly recessive or dominant mutations take longer to go to fixation (Glémin 2012), which can increase the number of recombination events. Dominance also affects the nature of the sweep trajectory. For example, recessive mutations spend more time at a low frequency compared to dominant mutations. These different sweep trajectories can also affect the final sweep profile (Teshima and Przeworski 2006). Self-fertilization leads to decreased fixation time of adaptive mutations through converting heterozygotes to homozygotes (Glémin 2012). Recombination is likelier to act between homozygotes under self-fertilization, so its effective rate is reduced by a factor $1 - 2F + \Phi$, for $F = \sigma/(2-\sigma)$ the inbreeding coefficient (Nordborg et al. 1996; Nordborg 2000) and $\Phi$ the joint probability of identity-by-descent at the two loci (Roze 2009, 2016; Hartfield and Glémin 2016), defined as:
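$$\Phi = \frac{\sigma\left[2 - \sigma - 2(2 - 3\sigma)\,r(1-r)\right]}{(2-\sigma)\left[2 - \sigma\left(1 - 2r(1-r)\right)\right]} \tag{1}$$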
Note that $1 - 2F + \Phi$ approximates to $1 - F$ (as $\Phi \approx F$), unless σ is close to one and r is high (approximately greater than 0.1).
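The scaling of recombination under selfing can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and the closed form used for the two-locus identity probability is an assumption taken to follow Roze (2009). It compares the full factor $1 - 2F + \Phi$ with the simpler $1 - F$.

```python
def inbreeding_coefficient(sigma):
    """Steady-state inbreeding coefficient F for selfing rate sigma."""
    return sigma / (2.0 - sigma)


def joint_identity(sigma, r):
    """Two-locus identity-by-descent probability (assumed closed form)."""
    x = r * (1.0 - r)
    numerator = sigma * (2.0 - sigma - 2.0 * (2.0 - 3.0 * sigma) * x)
    denominator = (2.0 - sigma) * (2.0 - sigma * (1.0 - 2.0 * x))
    return numerator / denominator


def recombination_scaling(sigma, r):
    """Factor by which selfing rescales the recombination rate r."""
    F = inbreeding_coefficient(sigma)
    return 1.0 - 2.0 * F + joint_identity(sigma, r)


if __name__ == "__main__":
    # Compare the full factor with the 1 - F approximation.
    for sigma in (0.0, 0.5, 0.95):
        for r in (0.001, 0.1, 0.5):
            full = recombination_scaling(sigma, r)
            approx = 1.0 - inbreeding_coefficient(sigma)
            print(f"sigma={sigma} r={r} full={full:.4f} approx={approx:.4f}")
```

Under this assumed form, the two-locus identity reduces to F when r = 0, and the scaling factor vanishes under complete selfing for any r.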
During the standing phase, the number of initial recombinant haplotypes that are swept to fixation depends on the relative rates of recombination and coalescence. The latter occurs with probability proportional to $1/(2N_e)$, for $N_e$ the effective population size. Under self-fertilization $N_e = N/(1+F)$ (Wright 1951; Pollak 1987; Charlesworth 1992; Caballero and Hill 1992; Nordborg and Donnelly 1997), so self-fertilization increases the coalescence probability. This scaling factor will change if there is a large non-Poisson variation in offspring number (Laporte and Charlesworth 2002). Although we focus on inbreeding via self-fertilization, the scalings $N_e = N/(1+F)$ and $r_{eff} \approx r(1-F)$ should also hold under other systems of regular inbreeding (Caballero and Hill 1992; Charlesworth and Charlesworth 2010, Box 8.4).
We will outline how both coalescence and recombination act during both of these phases, and use these calculations to determine selective sweep properties. Previous models tended to only determine how lineages recombine away from the derived background during the sweep phase, without considering how two lineages coalesce during the sweep phase. If lineages coalesce during the sweep, then the total number of unique recombination events, and hence the number of linked haplotypes, are reduced. Barton (1998) showed that these coalescent events are negligible only for very strong selection (see also B. Charlesworth, unpublished results). Hence, accounting for these coalescent events is important for producing accurate matches with simulation results.
Throughout, analytical solutions are compared to results from Wright-Fisher forward-in-time stochastic simulations that were run using SLiM version 3.3 (Haller and Messer 2019). Results for outcrossing populations were also tested using coalescent simulations run with msms (Ewing and Hermisson 2010). The simulation methods are outlined in Supplementary File S2.
Data availability
File S1 is a Mathematica notebook of analytical derivations and simulation results. File S2 contains additional methods, results and figures. File S3 contains copies of the simulation scripts, which are also available from https://github.com/MattHartfield/SweepDomSelf. Supplemental material available at figshare: https://doi.org/10.25387/g3.11687949.
Results
Probability of events during sweep phase
We first look at the probability of events (coalescence or recombination) acting during the sweep phase for the simplest case of two alleles. Looking back in time following the fixation of the derived mutation, sites linked to the beneficial allele can either coalesce or recombine onto the ancestral genetic background. Let $p(t)$ be the adaptive mutation frequency at time t, defined as the number of generations prior to the present day. Further define $p(0) = 1$ (i.e., the allele is fixed at the present day), and $\tau_{p_0}$ the time in the past when the derived variant became beneficial (i.e., $p(\tau_{p_0}) = p_0$).
For a pair of haplotype samples carrying the derived allele, if it is at frequency $p(t)$ at time t, this lineage pair can either coalesce or one of the haplotypes recombines onto the ancestral background. Each event occurs with probability:
$$P_c(t) = \frac{1}{2N_e\,p(t)} = \frac{1+F}{2N\,p(t)}, \qquad P_r(t) = 2\,r_{eff}\left[1 - p(t)\right] = 2r(1 - 2F + \Phi)\left[1 - p(t)\right] \tag{2}$$
Equation 2 is based on those obtained by Kaplan et al. (1989), assuming that $N_e = N/(1+F)$ due to self-fertilization (Pollak 1987; Charlesworth 1992; Caballero and Hill 1992; Nordborg and Donnelly 1997), and $r_{eff} = r(1 - 2F + \Phi)$ is the ‘effective’ recombination rate after correcting for increased homozygosity due to self-fertilization (Nordborg et al. 1996; Nordborg 2000; Charlesworth and Charlesworth 2010; Roze 2009, 2016; Hartfield and Glémin 2016). Equation 2 demonstrates how each event is differently influenced by p. In particular, the per-generation coalescence probability can be small unless p is close to $1/(2N)$. The total probability that coalescence occurs during the sweep phase increases if the beneficial allele spends a sizeable time at low frequency, e.g., when it is recessive. The terms in Equation 2 can also be defined as functions of p.
We are interested in calculating (i) the probability that no coalescence or recombination occurs in the sweep phase; (ii) the probability that recombination acts on a lineage to transfer it to the neutral background that is linked to the ancestral allele, assuming that no more than one recombination event occurs per generation (see Campos and Charlesworth (2019) for derivations assuming multiple recombination events). We will go through these probabilities in turn to determine expected pairwise diversity. For $P_{NE}$, the total probability that the two lineages do not coalesce or recombine over $\tau_{p_0}$ generations equals:
$$P_{NE} = \prod_{t=0}^{\tau_{p_0}} \left[1 - P_c(t) - P_r(t)\right] \approx \exp\left(-\int_{0}^{\tau_{p_0}} \left[P_c(t) + P_r(t)\right] dt\right) = \exp\left(-\int_{1-\epsilon}^{p_0} \frac{P_c(p) + P_r(p)}{dp/dt}\, dp\right) \tag{3}$$
Here ϵ is a small term and $1 - \epsilon$ is the upper limit of the deterministic spread of the beneficial allele. We will discuss in the section ‘Effective starting frequency for a de novo mutation, and effective final frequency’ what a reasonable value for ϵ should be. Also note that we switch from a discrete-time calculation to a continuous-time calculation, which can give simplifying results. To calculate $P_{NE}$ we insert the deterministic change in allele frequency p (Glémin 2012):
$$\frac{dp}{dt} = -s\,p(1-p)\left[F + h - Fh + (1-F)(1-2h)\,p\right] \tag{4}$$
Note the negative factor in Equation 4 since we are looking back in time. By substituting Equation 4 into Equation 3, we obtain an analytical solution for $P_{NE}$, although the resulting expression is complicated (Section A of Supplementary File S1).
To calculate $P_{R,Sw}$, the probability that recombination acts during the sweep, we first calculate the probability that recombination occurs when the beneficial allele is at frequency $p$. Here, no events occur in the time leading up to $p$, then a recombination event occurs with probability $P_r(p)$. $P_{R,Sw}$ is obtained by integrating this probability over the entire sweep from time 0 to $\tau_{p_0}$:
$$P_{R,Sw} = \int_{1-\epsilon}^{p_0} \frac{P_{R,Sw}(p)}{dp/dt}\, dp \tag{5}$$
where:
$$P_{R,Sw}(p) = P_r(p)\,\exp\left(-\int_{1-\epsilon}^{p} \frac{P_c(p') + P_r(p')}{dp'/dt}\, dp'\right) \tag{6}$$
Note that the exponential term of $P_{R,Sw}(p)$ is different from $P_{NE}$ (Equation 3) since the upper integral limit is $p$ rather than $p_0$. That is, it only covers part of the sweep phase. Equation 5 is evaluated numerically. In Supplementary File S2, we provide a ‘star-like’ analytical approximation to $P_{R,Sw}$ that assumes no coalescence during the sweep phase.
Probability of coalescence from standing variation
The variant becomes advantageous at frequency $p_0$. We assume that $p_0$, and hence event probabilities, remain fixed over time. Berg and Coop (2015) have shown this assumption provides a good approximation to coalescent rates during the standing phase. The outcome during the standing phase is thus determined by competing Poisson processes. The two haplotypes could coalesce, with an exponentially-distributed waiting time with rate $1/(2N_e p_0)$. Alternatively, one of the two haplotypes could recombine onto the ancestral background with mean waiting time $1/[2 r_{eff}(1 - p_0)]$. For two competing exponential distributions with rates $\lambda_1$ and $\lambda_2$, the probability of the first event occurring given an event happens equals $\lambda_1/(\lambda_1 + \lambda_2)$ (Wakeley 2009, Chapter 2). Hence the probability that recombination occurs instead of coalescence equals:
$$P_{R,Sd} = \frac{2 r_{eff}(1 - p_0)}{1/(2N_e p_0) + 2 r_{eff}(1 - p_0)} = \frac{2R(1 - 2F + \Phi)\,p_0(1 - p_0)}{1 + F + 2R(1 - 2F + \Phi)\,p_0(1 - p_0)} \approx \frac{2R(1 - F)\,p_0(1 - p_0)}{1 + F + 2R(1 - F)\,p_0(1 - p_0)} \tag{7}$$
The probability of coalescence rather than recombination is $1 - P_{R,Sd}$. Here $R = 2Nr$ is the population-scaled recombination rate. The final approximation arises as $\Phi \approx F$ if $r$ is small. This term reflects how increased homozygosity reduces both effective recombination and $N_e$, with the latter making coalescence more likely. In addition, Equation 7 depends on recombination only through $R(1-F)/(1+F)$, which highlights how the signature of a sweep from standing variation, as characterized by the spread of different initial recombinant haplotypes, extends over a map distance that is increased by a factor of approximately $(1+F)/(1-F)$ under self-fertilization.
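The competing-rates argument can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the pairwise coalescence rate is taken as $(1+F)/(2Np_0)$, the pairwise recombination rate as $2r(1-F)(1-p_0)$ (using the approximation $\Phi \approx F$), and the parameter values are arbitrary.

```python
def prob_recombination_standing(N, sigma, r, p0):
    """Probability that recombination precedes coalescence in the standing phase."""
    F = sigma / (2.0 - sigma)                    # steady-state inbreeding coefficient
    coal_rate = (1.0 + F) / (2.0 * N * p0)       # pair coalesces, N_e = N / (1 + F)
    rec_rate = 2.0 * r * (1.0 - F) * (1.0 - p0)  # either lineage recombines away
    return rec_rate / (coal_rate + rec_rate)


if __name__ == "__main__":
    N, p0, r = 5000, 0.05, 1e-4
    outcrossing = prob_recombination_standing(N, 0.0, r, p0)
    # Under selfing, the same probability is reached at a recombination
    # rate that is larger by a factor (1 + F) / (1 - F).
    sigma = 0.9
    F = sigma / (2.0 - sigma)
    selfing = prob_recombination_standing(N, sigma, r * (1.0 + F) / (1.0 - F), p0)
    print(outcrossing, selfing)
```

Under these assumptions the two printed values coincide, which is one way to see why standing-variation signatures extend over longer map distances in selfers.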
Effective starting frequency for a de novo mutation, and effective final frequency
When a new beneficial mutation appears as a single copy, it is highly likely to go extinct by chance (Fisher 1922; Haldane 1927). Beneficial mutations that increase in frequency faster than expected when rare are more able to overcome this stochastic loss and reach fixation. These beneficial mutations will hence display an apparent ‘acceleration’ in their logistic growth, equivalent to having a starting frequency that is greater than $1/(2N)$ (Maynard Smith 1976; Barton 1998; Desai and Fisher 2007; Martin and Lambert 2015). Correcting for this acceleration is important to accurately model hard sweep signatures, and to inform on the minimum level of standing variation needed to differentiate a hard sweep from one originating from standing variation.
In Section B of Supplementary File S1, we determine that hard sweeps that go to fixation have the following effective starting frequency:
$$p_{0,A} = \frac{1+F}{4 N s H_l} \tag{8}$$
where $H_l = F + h - Fh$ is the effective dominance coefficient for mutations at a low frequency. This result is consistent with those of Martin and Lambert (2015), who obtained a distribution of effective starting frequencies using stochastic differential equations. This acceleration effect can create substantial increases in the effective $p_0$, especially for recessive mutations (Figure 2).
The effective final frequency of the derived allele $1 - \epsilon$, at which its spread is no longer deterministic, can be obtained by setting $\epsilon = p_{0,A}(H_h)$; that is, by replacing $H_l$ with $H_h = 1 - h + Fh$ in Equation 8. This final frequency is always used, even if $p_0 > 1/(2N)$. van Herwaarden and van der Wal (2002) determined that the sojourn time for an allele with dominance coefficient h that is increasing in frequency is the same as for an allele decreasing in frequency with dominance $1 - h$. Glémin (2012) showed that this result also holds under any inbreeding value F. See Charlesworth (2020) for a fuller discussion of effective final frequencies and their impact on sweep fixation times.
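To give a feel for the size of the acceleration effect, here is a heuristic Python sketch. It rests on an assumption that is not taken from the derivation in Supplementary File S1: the effective starting frequency is approximated as the single-copy frequency $1/(2N)$ divided by a fixation probability of roughly $2sH_l/(1+F)$. Function names and parameter values are hypothetical.

```python
def effective_start_frequency(N, s, h, sigma):
    """Heuristic effective starting frequency of a single-copy mutation."""
    F = sigma / (2.0 - sigma)
    H_l = F + h - F * h  # effective dominance at low frequency
    # Assumed form: (1 / 2N) divided by a fixation probability 2 s H_l / (1 + F).
    return (1.0 + F) / (4.0 * N * s * H_l)


def effective_final_frequency(N, s, h, sigma):
    """Heuristic frequency at which deterministic spread ends."""
    F = sigma / (2.0 - sigma)
    H_h = 1.0 - h + F * h  # effective dominance at high frequency
    return 1.0 - (1.0 + F) / (4.0 * N * s * H_h)


if __name__ == "__main__":
    N, s = 5000, 0.05
    for sigma in (0.0, 0.5, 0.95):
        for h in (0.1, 0.5, 0.9):
            start = effective_start_frequency(N, s, h, sigma)
            final = effective_final_frequency(N, s, h, sigma)
            print(f"sigma={sigma} h={h} start={start:.5f} final={final:.5f}")
```

Both functions divide by zero when the relevant effective dominance is zero (h = 0 or h = 1 under complete outcrossing), so those cases need separate treatment.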
Expected pairwise diversity
We use $P_{NE}$, $P_{R,Sw}$ and $P_{R,Sd}$ to calculate the expected pairwise diversity (denoted π) present around a sweep. During the sweep phase, the two neutral sites could either coalesce, or one of them recombines onto the ancestral background. If coalescence occurs, since it does so in the recent past then it is assumed that no diversity exists between samples, i.e., $\pi = 0$ for π the average number of differences between two alleles (Tajima 1983). In reality there may be some residual diversity caused by appearance of mutations during the sweep phase; we do not account for these mutations while calculating π but will do so when calculating the site-frequency spectrum. Alternatively, if one of the two samples recombines onto the neutral background, they will have the same pairwise diversity between them as the background population ($\pi_0$). If the two samples trace back to the standing phase (with probability $P_{NE}$) then the same logic applies. Hence the expected diversity following a sweep $\pi_{SV}$, relative to the background value $\pi_0$, equals:
$$\mathbb{E}\left[\frac{\pi_{SV}}{\pi_0}\right] = P_{R,Sw} + P_{NE}\,P_{R,Sd} \tag{9}$$
The full solution to Equation 9 can be obtained by plugging in the relevant parts from Equations 3, 5 and 7, which we evaluate numerically. Equation 9 is undefined for h = 0 or 1 with $\sigma = 0$; these cases can be derived separately.
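One way to carry out this numerical evaluation is sketched below in Python. It is an illustration rather than the method of the Mathematica notebook: function names, the midpoint rule on a logit scale, and all parameter values are choices made here. The sketch assumes pairwise rates $(1+F)/(2Np)$ for coalescence and $2r(1-F)(1-p)$ for recombination (with $\Phi \approx F$), the deterministic trajectory of Glémin (2012), and a user-supplied ϵ.

```python
import math


def sweep_phase(N, s, h, sigma, r, p0, eps, steps=20000):
    """Return (P_NE, P_R_Sw) for a pair of lineages during the sweep phase."""
    F = sigma / (2.0 - sigma)
    r_eff = r * (1.0 - F)                # uses the approximation Phi ~ F
    x_hi = math.log((1.0 - eps) / eps)   # logit of 1 - eps (recent end of sweep)
    x_lo = math.log(p0 / (1.0 - p0))     # logit of p0 (onset of selection)
    dx = (x_hi - x_lo) / steps
    no_event = 1.0    # probability that neither event has happened yet
    recombined = 0.0  # probability that recombination is the first event
    for i in range(steps):
        # Step backward in time, from high to low allele frequency.
        x = x_hi - (i + 0.5) * dx
        p = 1.0 / (1.0 + math.exp(-x))
        # Time spent in this slice: dt = dx / (s * effective dominance term).
        dt = dx / (s * (F + h - F * h + (1.0 - F) * (1.0 - 2.0 * h) * p))
        coal = (1.0 + F) / (2.0 * N * p)
        rec = 2.0 * r_eff * (1.0 - p)
        survive = math.exp(-(coal + rec) * dt)
        recombined += no_event * (1.0 - survive) * rec / (coal + rec)
        no_event *= survive
    return no_event, recombined


def relative_diversity(N, s, h, sigma, r, p0, eps):
    """Expected pairwise diversity after the sweep, relative to baseline."""
    p_ne, p_r_sw = sweep_phase(N, s, h, sigma, r, p0, eps)
    F = sigma / (2.0 - sigma)
    coal = (1.0 + F) / (2.0 * N * p0)
    rec = 2.0 * r * (1.0 - F) * (1.0 - p0)
    p_r_sd = rec / (coal + rec)  # standing phase: recombination before coalescence
    return p_r_sw + p_ne * p_r_sd


if __name__ == "__main__":
    for r in (1e-6, 1e-5, 1e-4, 1e-3):
        out = relative_diversity(5000, 0.05, 0.5, 0.0, r, 0.05, 0.001)
        selfed = relative_diversity(5000, 0.05, 0.5, 0.9, r, 0.05, 0.001)
        print(f"r={r:g} outcrossing={out:.4f} selfing={selfed:.4f}")
```

Each frequency slice is treated as a competing-risks step, so coalescence during the sweep is retained rather than neglected as in the star-like approximation.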
Figure 3 plots Equation 9 with different dominance, self-fertilization, and standing frequency values. The analytical solution fits well compared to forward-in-time simulations, yet slightly overestimates them for high self-fertilization frequencies. It is unclear why this mismatch arises. One explanation could be that drift effects are magnified under self-fertilization, which causes a quicker sweep fixation time than expected from deterministic spread, if conditioning on a sweep going to fixation. Although $p_{0,A}$ (Equation 8) captures these drift effects for rare alleles, there may be additional effects that are not accounted for. Under complete outcrossing, baseline diversity is restored (i.e., $\pi_{SV}/\pi_0$ goes to 1) closer to the sweep origin for recessive mutations ($h = 0.1$), compared to semidominant ($h = 0.5$) or dominant ($h = 0.9$) mutations. Sweeps caused by dominant and semidominant mutations result in a similar genetic diversity, so these cases may be hard to differentiate from diversity data alone.
These results can be better understood by examining the underlying allele trajectories, using logic described by Teshima and Przeworski (2006) (Figure 4). For outcrossing populations, recessive mutations spend most of the sojourn time at low frequencies, maximizing recombination events and restoring neutral variation. These trajectories mimic sweeps from standing variation, which spend extended periods of time at low frequencies in the standing phase. Conversely, dominant mutations spend most of their time at high frequencies, so most recombination events are between haplotypes that carry the derived allele. Hence, there is a reduced chance for linked neutral alleles to recombine onto the ancestral background.
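The trajectory argument can be made concrete with a short Python sketch. It integrates the deterministic frequency change under partial selfing, assumed here to take the form $dp/dt = s\,p(1-p)[F + h - Fh + (1-F)(1-2h)p]$ forward in time (Glémin 2012), and reports the share of the sojourn spent below a frequency of one half. Function names and parameter values are hypothetical.

```python
import math


def sojourn_time(s, h, sigma, p_start, p_end, steps=10000):
    """Generations needed to go deterministically from p_start to p_end."""
    F = sigma / (2.0 - sigma)
    x0 = math.log(p_start / (1.0 - p_start))
    x1 = math.log(p_end / (1.0 - p_end))
    dx = (x1 - x0) / steps
    total = 0.0
    for i in range(steps):
        # Midpoint rule on the logit scale, where dp = p (1 - p) dx.
        p = 1.0 / (1.0 + math.exp(-(x0 + (i + 0.5) * dx)))
        total += dx / (s * (F + h - F * h + (1.0 - F) * (1.0 - 2.0 * h) * p))
    return total


def fraction_below_half(s, h, sigma, p0):
    """Share of the sweep from p0 to 1 - p0 spent at frequencies below 1/2."""
    return sojourn_time(s, h, sigma, p0, 0.5) / sojourn_time(s, h, sigma, p0, 1.0 - p0)


if __name__ == "__main__":
    s, p0 = 0.05, 0.001
    for sigma in (0.0, 0.95):
        for h in (0.1, 0.5, 0.9):
            total = sojourn_time(s, h, sigma, p0, 1.0 - p0)
            share = fraction_below_half(s, h, sigma, p0)
            print(f"sigma={sigma} h={h} generations={total:.0f} below_half={share:.2f}")
```

Under outcrossing the recessive case spends most of its sojourn at low frequency and the dominant case at high frequency, while under high selfing the shares converge toward one half.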
As self-fertilization increases, sweep signatures become similar to the co-dominant case as the derived allele is more likely to spread as a homozygote, weakening the influence that dominance exerts over beneficial allele trajectories. Increasing $p_0$ also causes sweeps with different dominance coefficients to produce comparable signatures, as beneficial mutation trajectories become similar after conditioning on starting at an elevated frequency.
An analytical approximation can be obtained by using the ‘star-like’ result for $P_{R,Sw}$ (described in Supplementary Files S1, S2). In this case the expected pairwise diversity approximates to:
(10) |
Note that Equation 10 instead uses the probability of coalescence during the standing phase, $1 - P_{R,Sd}$. This approximation reflects similar formulas for diversity following soft sweeps in haploid outcrossing populations (Pennings and Hermisson 2006b; Berg and Coop 2015). There is a factor of two in the power term to account for two lineages. In Supplementary File S2 we demonstrate that this equation overestimates the relative diversity following a selective sweep. This mismatch arises since the star-like assumption of no coalescence during the sweep phase is only accurate for very strongly selected mutations (Barton 1998; B. Charlesworth, unpublished results). Hence it is important to consider coalescence during the sweep phase to accurately model selective sweeps that do not have an extremely high selection coefficient.
Site frequency spectrum
The star-like approximation can be used to obtain analytical solutions for the number of segregating sites and the site frequency spectrum (i.e., the probability that $l = 1, 2, \ldots, n-1$ of $n$ alleles carry derived variants). The full derivations for these statistics are outlined in Supplementary File S2. Figure 5 plots the SFS (Equation A12 in Supplementary File S2) alongside simulation results. Analytical results fit the simulation data well after including an adjusted singleton class, which accounts for recent mutations that arise on the derived background during both the standing and sweep phases (Berg and Coop 2015). Including this new singleton class improves the model fit, but there remains a tendency for analytical results to underestimate the proportion of low- and high-frequency classes (l = 1 and 9 in Figure 5), and overestimate the proportion of intermediate-frequency classes. Additional inaccuracies could have arisen due to the use of the star-like approximation, which assumes that there is no coalescence during the sweep phase.
Hard sweeps in either outcrossers or partial selfers are characterized by a large number of singletons and highly-derived variants (Figure 5), which is a typical selective sweep signature (Braverman et al. 1995; Barton 1998; Kim and Stephan 2002). As the initial frequency $p_0$ increases, so does the number of intermediate-frequency variants (Figure 5). This signature is often seen as a characteristic of soft sweeps (Pennings and Hermisson 2006b; Berg and Coop 2015). Recessive hard sweeps ($h = 0.1$ and $p_0 = 1/(2N)$) can produce SFS profiles that are similar to sweeps from standing variation, as an increased number of recombination events occur while the allele is at a low frequency for long time periods (Figure 4). With increased self-fertilization, both hard and soft sweep signatures (e.g., increased number of intermediate-frequency alleles) are recovered when measuring the SFS at a longer recombination distance than in outcrossers (Figure 5, bottom row). This is an example of how signatures of sweeps from standing variation are extended over a recombination distance that is increased by a factor of around $(1+F)/(1-F)$, as demonstrated by Equation 7.
Soft sweeps from recurrent mutation
So far, we have only focused on a soft sweep that arises from standing variation. An alternative type of soft sweep is one where recurrent mutation at the selected locus introduces the beneficial allele onto different genetic backgrounds. We can examine this case by modifying existing results. Below we derive the expected relative diversity between two alleles following this type of soft sweep, and outline the SFS for more than two samples in Supplementary File S2.
In this model, derived alleles arise from recurrent mutation and are instantaneously beneficial (i.e., there is no ‘standing phase’). During the sweep phase, lineages can escape the derived background by recombination, or if they are derived from a mutation event. If the beneficial allele is at frequency p then the probability of being descended from an ancestral allele by mutation is proportional to $\mu_b(1-p)/p$, for $\mu_b$ the beneficial mutation probability (Pennings and Hermisson 2006b). Denote the probability of a lineage experiencing recombination or mutation during this sweep phase by $P_{R,Sw}$ and $P_{M,Sw}$, respectively. In both these cases the expected diversity present at linked sites is $\pi_0$. If neither event arises, which occurs with probability $P_{NE}$, then the remaining lineages can either coalesce, or they arise from independent mutation events. If they coalesce then they have approximately zero pairwise diversity between them; alternatively, they have different origins and thus exhibit the same pairwise diversity as the neutral background. Let $P_{M,Sd}$ denote the probability that mutation occurs at the sweep origin, as opposed to coalescence.
Following this logic, the expected relative diversity for a sweep arising from recurrent mutation equals (with additional details in Supplementary File S1):
$$\mathbb{E}\left[\frac{\pi_M}{\pi_0}\right] = P_{R,Sw} + P_{M,Sw} + P_{NE}\,P_{M,Sd} \tag{11}$$
$\pi_M$ denotes the diversity around a soft sweep from recurrent mutation. $P_{NE}$ and $P_{R,Sw}$ are similar to the equations used when modeling a sweep from standing variation. They are both modified to account for additional beneficial mutations arising during the sweep phase:
(12) |
where:
(13) |
and:
(14) |
Note that Equation 14 has an upper integral limit of $p_{0,A}$, as opposed to the general $p_0$ used in the sweep from standing variation model, reflecting that there is no standing phase.
$P_{M,Sw}$ is the mutation probability during the sweep phase, and is similar to Equation 13 except that $P_r(p)$ is replaced by $P_m(p)$, for $p$ the derived allele frequency when the event occurs. $P_{M,Sd}$ is the probability that, at the sweep origin, the derived allele appears by mutation instead of coalescing, and is defined in a similar manner to $P_{R,Sd}$ (Equation 7):
(15) |
where . The coalescence probability is . Equation 15 implies that self–fertilization makes it more likely for beneficial mutations to coalesce at the start of a sweep, rather than arising from independent mutation events. Hence the signatures of soft sweeps via recurrent mutation will be weakened under inbreeding.
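The verbal argument for the recurrent mutation model can be sketched numerically. The function below is a minimal illustration of that logic and is not Equation 11 itself. It assumes that any recombination or mutation event during the sweep phase restores neutral diversity, that lineages coalescing at the sweep origin contribute zero diversity, and that lineages tracing back to independent mutations contribute neutral diversity. The names `p_event_sweep` and `p_mut_origin` are hypothetical stand-ins for the sweep-phase event probability and the probability of mutation (rather than coalescence) at the sweep origin.

```python
def relative_diversity(p_event_sweep: float, p_mut_origin: float) -> float:
    """Expected pairwise diversity relative to the neutral background.

    p_event_sweep: probability that a lineage recombines or mutates
                   away from the derived background during the sweep
                   phase (hypothetical input, not computed here).
    p_mut_origin:  probability that, given no sweep-phase event, the
                   two lineages stem from independent mutations
                   rather than coalescing (hypothetical input).
    """
    for name, p in (("p_event_sweep", p_event_sweep),
                    ("p_mut_origin", p_mut_origin)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")

    p_no_event = 1.0 - p_event_sweep
    # Sweep-phase event: neutral diversity (relative value 1).
    # No event, independent origins: neutral diversity (1).
    # No event, coalescence at the origin: zero diversity (0).
    return p_event_sweep * 1.0 + p_no_event * (p_mut_origin * 1.0)
```

When sweep-phase events are negligible (`p_event_sweep` close to zero), the result reduces to `p_mut_origin`, so diversity at the selected locus stays above zero whenever independent origins are possible.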
Figure 6 compares in the standing variation case, and for the recurrent mutation case, under different levels of self-fertilization. While dominance only weakly affects sweep signatures arising from standing variation under outcrossing, it more strongly affects sweeps from recurrent mutation in outcrossing populations, as each variant arises from an initial frequency close to (Figure 4). In addition, the two models exhibit different behavior close to the selected locus (R close to zero). The recurrent mutation model has non–zero diversity levels, while the standing variation model exhibits zero diversity. As R increases, diversity eventually becomes higher for the standing variation case compared to the recurrent mutation case. We can heuristically determine when this transition occurs as follows. Assume a large population size but weak recombination and mutation rates. Hence, it is unlikely that any events occur during the sweep phase, so , and . Then the expected relative diversity (Equation 11) equals for a sweep from standing variation, and for one from recurrent mutation. To find the recombination rate at which a sweep from recurrent mutation yields higher diversity than one from standing variation, we find the R value needed to equate the two probabilities, giving:
(16) |
The last approximation arises as . Hence for a fixed , the window where recurrent mutations create higher diversity near the selected locus increases for lower or higher F, since both of these factors reduce the potential for recombination to create new haplotypes during the standing phase. Equation 16 is generally accurate when sweeps from standing variation have higher diversity than sweeps with recurrent mutations (Figure 6, bottom row), but becomes inaccurate for in outcrossing populations, as some events are likely to occur during the sweep phase. In Supplementary File S2 we show how similar results apply to the SFS.
Discussion
Summary of theoretical findings
While there have been many investigations into how different sweep processes can be detected from next-generation sequence data (Pritchard and Di Rienzo 2010; Messer and Petrov 2013; Stephan 2016; Hermisson and Pennings 2017), these models generally assumed idealised randomly mating populations and beneficial mutations that are semidominant (h = 1/2). Here we have created a more general selective sweep model, with arbitrary self-fertilization and dominance levels. Our principal focus is on comparing a hard sweep arising from a single allele copy to a soft sweep arising from standing variation, but we also consider the case of recurrent mutation (Figure 6).
We find that the qualitative patterns of different selective sweeps under selfing remain similar to expectations from outcrossing models. In particular, a sweep from standing variation still creates an elevated number of intermediate-frequency variants compared to a sweep from de novo mutation (Figures 5, 6). This pattern is standard for soft sweeps (Pennings and Hermisson 2006b; Messer and Petrov 2013; Berg and Coop 2015; Hermisson and Pennings 2017) so existing statistical methods for detecting them (e.g., observing a higher than expected number of haplotypes; Vitti et al. 2013; Garud et al. 2015) can, in principle, also be applied to selfing organisms. Under self-fertilization, these signatures are stretched over longer physical regions than in outcrossers. These extensions arise as self-fertilization affects gene genealogies during both the sweep and standing phases in different ways. During the sweep phase, beneficial alleles fix more rapidly under higher self-fertilization as homozygous mutations are created more rapidly (Charlesworth 1992; Glémin 2012). In addition, the effective recombination rate is reduced by approximately (Nordborg et al. 1996; Nordborg 2000; Charlesworth and Charlesworth 2010), and slightly more for highly inbred populations (Roze 2009, 2016). These two effects mean that neutral variants linked to an adaptive allele are less likely to recombine onto the neutral background during the sweep phase, as reflected in Equation 3 for . During the standing phase, two haplotypes are more likely to coalesce under high levels of self-fertilization since is decreased by a factor (Pollak 1987; Charlesworth 1992; Caballero and Hill 1992; Nordborg and Donnelly 1997). This effect, combined with a reduced effective recombination rate, means that the overall recombination probability during the standing phase is reduced by a factor (Equation 7).
Hence intermediate-frequency variants, which could provide evidence of adaptation from standing variation, will be spread out over longer genomic regions (this result can be seen in the site–frequency spectrum results, Figure 5). The elongation of sweep signatures means sweeps from standing variation can be easier to detect in selfing organisms than in outcrossers. Conversely, sweeps from recurrent mutation will have weakened signatures under self–fertilization. This result is due to a reduced effective population size, making it likelier that lineages trace back to a common ancestor rather than independent mutation events.
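The rescalings that drive these elongated signatures can be illustrated with a short sketch. It assumes the standard first-order approximations for partial selfing: an inbreeding coefficient F = σ/(2 − σ) for selfing rate σ, an effective population size N/(1 + F), and an effective recombination rate r(1 − F). The function names are illustrative only, and the sketch ignores the additional reduction in recombination reported for highly inbred populations.

```python
def inbreeding_coefficient(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F for selfing rate sigma.

    Assumes the standard relationship F = sigma / (2 - sigma).
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def effective_population_size(n: float, f: float) -> float:
    """Effective size under partial selfing, assuming N_e = N / (1 + F)."""
    return n / (1.0 + f)


def effective_recombination_rate(r: float, f: float) -> float:
    """First-order effective recombination rate, assuming r_eff = r (1 - F)."""
    return r * (1.0 - f)


def recombination_to_coalescence_scaling(f: float) -> float:
    """Factor by which N_e * r_eff changes relative to outcrossing (F = 0).

    This is simply (1 - F) / (1 + F), the product of the two rescalings.
    """
    return (1.0 - f) / (1.0 + f)


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 0.9, 0.99):
        f = inbreeding_coefficient(sigma)
        print(sigma, round(f, 4),
              round(recombination_to_coalescence_scaling(f), 4))
```

Under these assumptions, a selfing rate of 0.5 already halves the scaled recombination rate relative to an outcrosser, which is consistent with sweep signatures extending over longer map distances.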
We have also investigated how dominance affects soft sweep signatures, since previous analyses have only focused on how dominance affects hard sweeps (Teshima and Przeworski 2006; Teshima et al. 2006; Ewing et al. 2011). In outcrossing organisms, recessive mutations leave weaker sweep signatures than additive or dominant mutations as they spend more time at low frequencies, increasing the amount of recombination that restores neutral variation (Figures 3, 4). With increased self-fertilization, dominance has a weaker impact on sweep signatures as most mutations are homozygous (Figure 4). We also show that the SFS for recessive alleles can resemble a soft sweep, with a higher number of intermediate-frequency variants than for other hard sweeps (Figure 5). Dominance only weakly affects sweeps from standing variation, as trajectories of beneficial alleles become similar once the variant’s initial frequency exceeds (Figures 3, 4). Yet different dominance levels can affect sweep signatures if the beneficial allele is reintroduced by recurrent mutation (Figure 6). Hence if one wishes to understand how dominance affects sweep signatures, it is also important to consider which processes underlie observed patterns of genetic diversity.
These results also demonstrate that the effects of dominance on sweeps are not necessarily intuitive. For example, both highly dominant and recessive mutations have elongated fixation times compared to co–dominant mutations (Glémin 2012). Based on this intuition, one could expect dominant and recessive mutations to both produce weaker sweep signatures than co-dominant ones. In practice, dominant mutations have similar sweep signatures to co–dominant mutations (Figures 3, 5), and recessive sweeps could produce similar signatures to sweeps from standing variation (Figure 5). Dominance also has a weaker impact on sweeps from standing variation (Figures 3, 5).
Soft sweeps from recurrent mutation or standing variation?
These theoretical results shed light on how to distinguish between soft sweeps that arise either from standing variation, or from recurrent mutation. Both models are characterized by an elevated number of intermediate-frequency variants, in comparison to a hard sweep. Yet sweeps arising from recurrent mutation have non–zero diversity at the selected locus, whereas a sweep from standing variation exhibits approximately zero diversity. Hence a sweep from recurrent mutation shows intermediate-frequency variants closer to the beneficial locus, compared to sweeps from standing variation (Figures 6 and C in Supplementary File S2). Further from the selected locus, a sweep from standing variation exhibits greater variation than one from recurrent mutation, due to recombinant haplotypes being created during the standing phase. Equation 16 provides a simple condition for , the recombination distance needed for a sweep from standing variation to exhibit higher diversity than one from recurrent mutation; from this equation, we see that the size of this region increases under higher self-fertilization. Hence it may be easier to differentiate between these two sweep scenarios in self–fertilizing organisms.
Differences in haplotype structure between sweeps from either standing variation or recurrent mutation should be more pronounced in self-fertilizing organisms, due to the reduction in effective recombination rates. However, when investigating sweep patterns over broad genetic regions, it becomes likelier that genetic diversity will be affected by multiple beneficial mutations spreading throughout the genome. Competing selective sweeps can lead to elevated diversity near a target locus for two reasons. First, selection interference increases the fixation time of individual mutations, allowing more recombination that can restore neutral diversity (Kim and Stephan 2003). In addition, competing selective sweeps can drag different sets of neutral variation to fixation. Selective sweep signatures in data tend to be asymmetric, and this effect will exacerbate this asymmetry (Chevin et al. 2008). Further investigations of selective sweep patterns across long genetic distances will prove to be a rich area of future research.
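Haplotype structure in a genomic window is often summarized with haplotype homozygosity statistics. The sketch below is an illustration rather than part of the models developed here: it counts distinct haplotypes and computes H1, H12 and H2/H1 in the sense of Garud et al. (2015), treating each haplotype in the window as a hashable sequence. The function name and the toy input are hypothetical.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Summarize haplotype structure in one genomic window.

    haplotypes: iterable of sequences (strings, lists or tuples),
                one per sampled chromosome.
    Returns the number of distinct haplotypes K, H1 (sum of squared
    haplotype frequencies), H12 (H1 with the two most common
    haplotypes pooled) and H2/H1 (H1 without the most common
    haplotype, divided by H1).
    """
    haplotypes = [tuple(h) for h in haplotypes]
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    counts = Counter(haplotypes)
    # Haplotype frequencies sorted from most to least common.
    freqs = sorted((c / n for c in counts.values()), reverse=True)

    h1 = sum(p * p for p in freqs)
    h12 = sum(freqs[:2]) ** 2 + sum(p * p for p in freqs[2:])
    h2_over_h1 = (h1 - freqs[0] ** 2) / h1
    return {"K": len(freqs), "H1": h1, "H12": h12, "H2/H1": h2_over_h1}


if __name__ == "__main__":
    # Toy window: two common haplotypes and one rare one.
    window = ["AAA"] * 5 + ["BBB"] * 4 + ["CCC"]
    print(haplotype_homozygosity(window))
```

A window dominated by a single haplotype gives H12 close to H1 and a low H2/H1, whereas several common haplotypes raise H2/H1. How these values shift with the selfing rate is not addressed by this sketch.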
Finally, we have assumed a fixed population size, and that sweeps from standing variation arose from neutral variation. The resulting signatures could differ if the population size has changed over time (Wilson et al. 2014), if populations are structured (Zheng and Wiehe 2019), or if the beneficial allele was previously deleterious (Orr and Betancourt 2001). These issues could also affect our ability to discriminate between soft and hard sweeps.
Potential applications to self-fertilizing organisms
Existing methods for finding sweep signatures in nucleotide polymorphism data are commonly based on finding regions with a site-frequency spectrum matching what is expected under a selective sweep (Nielsen et al. 2005; Boitard et al. 2009; Pavlidis et al. 2013; DeGiorgio et al. 2016; Huber et al. 2016). The more general models developed here can be used to create more specific sweep-detection methods that include self-fertilization. However, a recent analysis found that soft-sweep signatures can be incorrectly inferred if analyzing genetic regions that flank hard sweeps, which was named the ‘soft shoulder’ effect (Schrider et al. 2015). Due to the reduction in recombination in selfers, these model results indicate that ‘soft-shoulder’ footprints can arise over long genetic distances and should be taken into account. One remedy to this problem is to not just classify genetic regions as being subject to either a hard or soft sweep, but also as being linked to a region subject to one of these sweeps (Schrider and Kern 2016). These more general calculations can also be extended to quantify to what extent background selection and sweeps jointly shape genome-wide diversity in self-fertilizing organisms (Elyashiv et al. 2016; Campos et al. 2017; Booker and Keightley 2018; Rettelbach et al. 2019), or detect patterns of introgression (Setter et al. 2019).
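The site-frequency spectrum that these scans rely on can be tallied directly from a sample of haplotypes. The following is a minimal sketch, not an implementation of any of the cited methods. It assumes biallelic sites coded as 0 (ancestral) and 1 (derived), with the ancestral state known, and the function name is illustrative.

```python
def unfolded_sfs(haplotypes):
    """Unfolded site-frequency spectrum from a 0/1 haplotype matrix.

    haplotypes: list of equal-length sequences, one per sampled
                chromosome; entry 1 marks the derived allele.
    Returns a list of length n - 1 whose entry k - 1 is the number
    of segregating sites carrying the derived allele in exactly k
    of the n sampled chromosomes.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    sfs = [0] * (n - 1)
    # zip(*haplotypes) iterates over sites (columns of the matrix).
    for site in zip(*haplotypes):
        derived_count = sum(site)
        # Skip monomorphic sites (all ancestral or all derived).
        if 0 < derived_count < n:
            sfs[derived_count - 1] += 1
    return sfs


if __name__ == "__main__":
    sample = [
        [1, 0, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 1, 0],
    ]
    print(unfolded_sfs(sample))
```

In the toy sample, one site is a singleton and two sites carry the derived allele in three of four chromosomes, so the spectrum is skewed toward high-frequency derived variants.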
Acknowledgments
We would like to thank Sally Otto for providing information on the elevated effective starting frequency of beneficial mutations; Brian Charlesworth for providing advice on modeling selective sweeps, sharing unpublished results, and providing comments on the manuscript; Ben Haller for answering questions about SLiM; Nick Barton, Jeffrey Ross-Ibarra and other anonymous referees for providing feedback on the manuscript. MH was supported by a Marie Curie International Outgoing Fellowship (MC-IOF-622936) and a NERC Independent Research Fellowship (NE/R015686/1). MH and TB also acknowledge financial support from the European Research Council under the European Union’s Seventh Framework Program (FP7/2007-2013, ERC Grant 311341).
Footnotes
Supplemental material available at figshare: https://doi.org/10.25387/g3.11687949.
Communicating editor: J. Ross-Ibarra
Literature Cited
- Andersen E. C., Gerke J. P., Shapiro J. A., Crissman J. R., Ghosh R. et al., 2012. Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity. Nat. Genet. 44: 285–290. 10.1038/ng.1050
- Anderson T. J. C., Nair S., McDew-White M., Cheeseman I. H., Nkhoma S. et al., 2016. Population parameters underlying an ongoing soft sweep in southeast asian malaria parasites. Mol. Biol. Evol. 34: 131–144. 10.1093/molbev/msw228
- Badouin H., Gladieux P., Gouzy J., Siguenza S., Aguileta G. et al., 2017. Widespread selective sweeps throughout the genome of model plant pathogenic fungi and identification of effector candidates. Mol. Ecol. 26: 2041–2062. 10.1111/mec.13976
- Barrett R. D. H., and Schluter D., 2008. Adaptation from standing genetic variation. Trends Ecol. Evol. 23: 38–44. 10.1016/j.tree.2007.09.008
- Barton N. H., 1998. The effect of hitch-hiking on neutral genealogies. Genet. Res. 72: 123–133. 10.1017/S0016672398003462
- Berg J. J., and Coop G., 2015. A coalescent model for a sweep of a unique standing variant. Genetics 201: 707–725. 10.1534/genetics.115.178962
- Billiard S., López-Villavicencio M., Devier B., Hood M. E., Fairhead C. et al., 2011. Having sex, yes, but with whom? Inferences from fungi on the evolution of anisogamy and mating types. Biol. Rev. Camb. Philos. Soc. 86: 421–442. 10.1111/j.1469-185X.2010.00153.x
- Boitard S., Schlötterer C., and Futschik A., 2009. Detecting selective sweeps: A new approach based on hidden markov models. Genetics 181: 1567–1578. 10.1534/genetics.108.100032
- Bonhomme M., Boitard S., San Clemente H., Dumas B., Young N. et al., 2015. Genomic signature of selective sweeps illuminates adaptation of Medicago truncatula to root-associated microorganisms. Mol. Biol. Evol. 32: 2097–2110. 10.1093/molbev/msv092
- Booker T. R., and Keightley P. D., 2018. Understanding the factors that shape patterns of nucleotide diversity in the house mouse genome. Mol. Biol. Evol. 35: 2971–2988.
- Braverman J. M., Hudson R. R., Kaplan N. L., Langley C. H., and Stephan W., 1995. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140: 783–796.
- Caballero A., and Hill W. G., 1992. Effects of partial inbreeding on fixation rates and variation of mutant genes. Genetics 131: 493–507.
- Campos J. L., and Charlesworth B., 2019. The effects on neutral variability of recurrent selective sweeps and background selection. Genetics 212: 287–303. 10.1534/genetics.119.301951
- Campos J. L., Zhao L., and Charlesworth B., 2017. Estimating the parameters of background selection and selective sweeps in drosophila in the presence of gene conversion. Proc. Natl. Acad. Sci. USA 114: E4762–E4771. 10.1073/pnas.1619434114
- Charlesworth B., 1992. Evolutionary rates in partially self-fertilizing species. Am. Nat. 140: 126–148. 10.1086/285406
- Charlesworth B., 2020. How long does it take to fix a favorable mutation, and why should we care? Am. Nat. Early Online. 10.1086/708187
- Charlesworth B., and Charlesworth D., 2010. Elements of Evolutionary Genetics, Roberts & Company Publishers, Greenwood Village, Colo.
- Chevin L.-M., Billiard S., and Hospital F., 2008. Hitchhiking both ways: Effect of two interfering selective sweeps on linked neutral variation. Genetics 180: 301–316. 10.1534/genetics.108.089706
- DeGiorgio M., Huber C. D., Hubisz M. J., Hellmann I., and Nielsen R., 2016. SweepFinder2: increased sensitivity, robustness and flexibility. Bioinformatics 32: 1895–1897. 10.1093/bioinformatics/btw051
- Desai M. M., and Fisher D. S., 2007. Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176: 1759–1798. 10.1534/genetics.106.067678
- Elyashiv E., Sattath S., Hu T. T., Strutsovsky A., McVicker G. et al., 2016. A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 12: e1006130 10.1371/journal.pgen.1006130
- Ewing G., and Hermisson J., 2010. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 2064–2065. 10.1093/bioinformatics/btq322
- Ewing G., Hermisson J., Pfaffelhuber P., and Rudolf J., 2011. Selective sweeps for recessive alleles and for other modes of dominance. J. Math. Biol. 63: 399–431. 10.1007/s00285-010-0382-4
- Fay J. C., and Wu C.-I., 2000. Hitchhiking Under Positive Darwinian Selection. Genetics 155: 1405–1413.
- Ferrer-Admetlla A., Liang M., Korneliussen T., and Nielsen R., 2014. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol. Biol. Evol. 31: 1275–1291. 10.1093/molbev/msu077
- Fisher R. A., 1922. On the dominance ratio. Proc. R. Soc. Edinb. 42: 321–341. 10.1017/S0370164600023993
- Fujito N. T., Satta Y., Hayakawa T., and Takahata N., 2018. A new inference method for detecting an ongoing selective sweep. Genes Genet. Syst. 93: 149–161. 10.1266/ggs.18-00008
- Fulgione A., Koornneef M., Roux F., Hermisson J., and Hancock A. M., 2018. Madeiran Arabidopsis thaliana reveals ancient long-range colonization and clarifies demography in Eurasia. Mol. Biol. Evol. 35: 564–574. 10.1093/molbev/msx300
- Fustier M. A., Brandenburg J. T., Boitard S., Lapeyronnie J., Eguiarte L. E. et al., 2017. Signatures of local adaptation in lowland and highland teosintes from whole-genome sequencing of pooled samples. Mol. Ecol. 26: 2738–2756. 10.1111/mec.14082
- Garud N. R., Messer P. W., Buzbas E. O., and Petrov D. A., 2015. Recent Selective Sweeps in North American Drosophila melanogaster Show Signatures of Soft Sweeps. PLoS Genet. 11: e1005004 10.1371/journal.pgen.1005004
- Garud N. R., and Petrov D. A., 2016. Elevated linkage disequilibrium and signatures of soft sweeps are common in Drosophila melanogaster. Genetics 203: 863–880. 10.1534/genetics.115.184002
- Glémin S., 2012. Extinction and fixation times with dominance and inbreeding. Theor. Popul. Biol. 81: 310–316. 10.1016/j.tpb.2012.02.006
- Glémin S., and Ronfort J., 2013. Adaptation and maladaptation in selfing and outcrossing species: new mutations versus standing variation. Evolution 67: 225–240. Erratum: 3381. 10.1111/j.1558-5646.2012.01778.x
- Haldane J. B. S., 1927. A mathematical theory of natural and artificial selection, part V: Selection and mutation. Math. Proc. Camb. Philos. Soc. 23: 838–844. 10.1017/S0305004100015644
- Haller B. C., and Messer P. W., 2019. SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Mol. Biol. Evol. 36: 632–637. 10.1093/molbev/msy228
- Harris A. M., and DeGiorgio M., 2018. Identifying and classifying shared selective sweeps from multilocus data. bioRxiv: 446005 10.1101/446005
- Harris A. M., and DeGiorgio M., 2019. A likelihood approach for uncovering selective sweep signatures from haplotype data. bioRxiv: 678722 10.1101/678722
- Harris A. M., Garud N. R., and DeGiorgio M., 2018a. Detection and classification of hard and soft sweeps from unphased genotypes by multilocus genotype identity. Genetics 210: 1429–1452. 10.1534/genetics.118.301502
- Harris R. B., Sackman A., and Jensen J. D., 2018b. On the unfounded enthusiasm for soft selective sweeps II: Examining recent evidence from humans, flies, and viruses. PLoS Genet. 14: e1007859 10.1371/journal.pgen.1007859
- Hartfield M., Bataillon T., and Glémin S., 2017. The evolutionary interplay between adaptation and self-fertilization. Trends Genet. 33: 420–431. 10.1016/j.tig.2017.04.002
- Hartfield M., and Glémin S., 2014. Hitchhiking of deleterious alleles and the cost of adaptation in partially selfing species. Genetics 196: 281–293. 10.1534/genetics.113.158196
- Hartfield M., and Glémin S., 2016. Limits to adaptation in partially selfing species. Genetics 203: 959–974. 10.1534/genetics.116.188821
- Hedrick P. W., 1980. Hitchhiking: A comparison of linkage and partial selection. Genetics 94: 791–808.
- Hermisson J., and Pennings P. S., 2005. Soft sweeps: Molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352. 10.1534/genetics.104.036947
- Hermisson J., and Pennings P. S., 2017. Soft sweeps and beyond: understanding the patterns and probabilities of selection footprints under rapid adaptation. Methods Ecol. Evol. 8: 700–716. 10.1111/2041-210X.12808
- Huber C. D., DeGiorgio M., Hellmann I., and Nielsen R., 2016. Detecting recent selective sweeps while controlling for mutation rate and background selection. Mol. Ecol. 25: 142–156. 10.1111/mec.13351
- Huber C. D., Nordborg M., Hermisson J., and Hellmann I., 2014. Keeping It Local: Evidence for Positive Selection in Swedish Arabidopsis thaliana. Mol. Biol. Evol. 31: 3026–3039. 10.1093/molbev/msu247
- Igic B., and Kohn J. R., 2006. The distribution of plant mating systems: study bias against obligately outcrossing species. Evolution 60: 1098–1103. 10.1111/j.0014-3820.2006.tb01186.x
- Innan H., and Kim Y., 2004. Pattern of polymorphism after strong artificial selection in a domestication event. Proc. Natl. Acad. Sci. USA 101: 10667–10672. 10.1073/pnas.0401720101
- Innan H., and Nordborg M., 2003. The extent of linkage disequilibrium and haplotype sharing around a polymorphic site. Genetics 165: 437.
- Jarne P., and Auld J. R., 2006. Animals mix it up too: the distribution of self-fertilization among hermaphroditic animals. Evolution 60: 1816–1824. 10.1111/j.0014-3820.2006.tb00525.x
- Jensen J. D., 2014. On the unfounded enthusiasm for soft selective sweeps. Nat. Commun. 5: 5281 10.1038/ncomms6281
- Kaplan N. L., Hudson R. R., and Langley C. H., 1989. The “hitchhiking effect” revisited. Genetics 123: 887–899.
- Karasov T., Messer P. W., and Petrov D. A., 2010. Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924 10.1371/journal.pgen.1000924
- Kern A. D., and Schrider D. R., 2018. diploS/HIC: An updated approach to classifying selective sweeps. G3 (Bethesda) 8: 1959–1970. 10.1534/g3.118.200262
- Kim Y., and Nielsen R., 2004. Linkage disequilibrium as a signature of selective sweeps. Genetics 167: 1513–1524. 10.1534/genetics.103.025387
- Kim Y., and Stephan W., 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.
- Kim Y., and Stephan W., 2003. Selective sweeps in the presence of interference among partially linked loci. Genetics 164: 389–398.
- Laporte V., and Charlesworth B., 2002. Effective population size and population subdivision in demographically structured populations. Genetics 162: 501–519.
- Laval G., Patin E., Boutillier P., and Quintana-Murci L., 2019. A genome-wide approximate bayesian computation approach suggests only limited numbers of soft sweeps in humans over the last 100,000 years. bioRxiv: 2019.12.22.886234.
- Long Q., Rabanal F. A., Meng D., Huber C. D., Farlow A. et al., 2013. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nat. Genet. 45: 884–890. 10.1038/ng.2678
- Martin G., and Lambert A., 2015. A simple, semi-deterministic approximation to the distribution of selective sweeps in large populations. Theor. Popul. Biol. 101: 40–46. 10.1016/j.tpb.2015.01.004
- Maynard Smith J., 1976. What determines the rate of evolution? Am. Nat. 110: 331–338. 10.1086/283071
- Maynard Smith J., and Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35. 10.1017/S0016672300014634
- McVean G. A. T., 2007. The structure of linkage disequilibrium around a selective sweep. Genetics 175: 1395–1406. 10.1534/genetics.106.062828
- Messer P. W., and Petrov D. A., 2013. Population genomics of rapid adaptation by soft selective sweeps. Trends Ecol. Evol. 28: 659–669. 10.1016/j.tree.2013.08.003
- Nielsen R., 2005. Molecular signals of natural selection. Annu. Rev. Genet. 39: 197–218. 10.1146/annurev.genet.39.073003.112420
- Nielsen R., Williamson S., Kim Y., Hubisz M. J., Clark A. G. et al., 2005. Genomic scans for selective sweeps using SNP data. Genome Res. 15: 1566–1575. 10.1101/gr.4252305
- Nordborg M., 2000. Linkage disequilibrium, gene trees and selfing: An ancestral recombination graph with partial self-fertilization. Genetics 154: 923–929.
- Nordborg M., Charlesworth B., and Charlesworth D., 1996. Increased levels of polymorphism surrounding selectively maintained sites in highly selfing species. Proc. Biol. Sci. 263: 1033–1039. 10.1098/rspb.1996.0152
- Nordborg M., and Donnelly P., 1997. The coalescent process with selfing. Genetics 146: 1185–1195.
- Orr H. A., and Betancourt A. J., 2001. Haldane’s sieve and adaptation from the standing genetic variation. Genetics 157: 875–884.
- Pavlidis P., Živković D., Stamatakis A., and Alachiotis N., 2013. SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes. Mol. Biol. Evol. 30: 2224–2234. 10.1093/molbev/mst112
- Pennings P. S., and Hermisson J., 2006a. Soft Sweeps II – Molecular Population Genetics of Adaptation from Recurrent Mutation or Migration. Mol. Biol. Evol. 23: 1076–1084. 10.1093/molbev/msj117
- Pennings P. S., and Hermisson J., 2006b. Soft Sweeps III: The Signature of Positive Selection from Recurrent Mutation. PLoS Genet. 2: e186 10.1371/journal.pgen.0020186
- Pennings P. S., Kryazhimskiy S., and Wakeley J., 2014. Loss and Recovery of Genetic Diversity in Adapting Populations of HIV. PLoS Genet. 10: e1004000 10.1371/journal.pgen.1004000 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peter B. M., Huerta-Sanchez E., and Nielsen R., 2012. Distinguishing between selective sweeps from standing variation and from a De Novo mutation. PLoS Genet. 8: e1003011 10.1371/journal.pgen.1003011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pollak E., 1987. On the theory of partially inbreeding finite populations. I. Partial selfing. Genetics 117: 353–360. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price N., Moyers B. T., Lopez L., Lasky J. R., Monroe J. G. et al. , 2018. Combining population genomics and fitness QTLs to identify the genetics of local adaptation in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 115: 5028–5033. 10.1073/pnas.1719998115 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pritchard J. K., and Di Rienzo A., 2010. Adaptation - not by sweeps alone. Nat. Rev. Genet. 11: 665–667. 10.1038/nrg2880 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Przeworski M., Coop G., and Wall J. D., 2005. The signature of positive selection on standing genetic variation. Evolution 59: 2312–2323. 10.1554/05-273.1 [DOI] [PubMed] [Google Scholar]
- Rettelbach A., Nater A., and Ellegren H., 2019. How linked selection shapes the diversity landscape in Ficedula flycatchers. Genetics 212: 277–285. 10.1534/genetics.119.301991 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roze D., 2009. Diploidy, population structure, and the evolution of recombination. Am. Nat. 174: S79–S94. 10.1086/599083 [DOI] [PubMed] [Google Scholar]
- Roze D., 2016. Background selection in partially selfing populations. Genetics 203: 937–957. 10.1534/genetics.116.187955 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sabeti P. C., Reich D. E., Higgins J. M., Levine H. Z. P., Richter D. J. et al. , 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837. 10.1038/nature01140 [DOI] [PubMed] [Google Scholar]
- Schoen D. J., Morgan M. T., and Bataillon T., 1996. How Does Self-Pollination Evolve? Inferences from Floral Ecology and Molecular Genetic Variation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 351: 1281–1290. 10.1098/rstb.1996.0111 [DOI] [Google Scholar]
- Schrider D. R., and Kern A. D., 2016. S/HIC: Robust identification of soft and hard sweeps using machine learning. PLoS Genet. 12: e1005928 10.1371/journal.pgen.1005928 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schrider D. R., and Kern A. D., 2017. Soft sweeps are the dominant mode of adaptation in the human genome. Mol. Biol. Evol. 34: 1863–1877. 10.1093/molbev/msx154 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schrider D. R., Mendes F. K., Hahn M. W., and Kern A. D., 2015. Soft shoulders ahead: Spurious signatures of soft and partial selective sweeps result from linked hard sweeps. Genetics 200: 267–284. 10.1534/genetics.115.174912 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Setter D., Mousset S., Cheng X., Nielsen R., DeGiorgio M. et al., 2019. Volcanofinder: genomic scans for adaptive introgression. bioRxiv 697987. 10.1101/697987
- Sheehan S., and Song Y. S., 2016. Deep learning for population genetic inference. PLOS Comput. Biol. 12: e1004845. 10.1371/journal.pcbi.1004845
- Stephan W., 2016. Signatures of positive selection: from selective sweeps at individual loci to subtle allele frequency changes in polygenic adaptation. Mol. Ecol. 25: 79–88. 10.1111/mec.13288
- Stephan W., 2019. Selective sweeps. Genetics 211: 5–13. 10.1534/genetics.118.301319
- Tajima F., 1983. Evolutionary Relationship of DNA Sequences in Finite Populations. Genetics 105: 437–460.
- Teshima K. M., Coop G., and Przeworski M., 2006. How reliable are empirical genomic scans for selective sweeps? Genome Res. 16: 702–712. 10.1101/gr.5105206
- Teshima K. M., and Przeworski M., 2006. Directional positive selection on an allele of arbitrary dominance. Genetics 172: 713–718. 10.1534/genetics.105.044065
- Thomson G., 1977. The effect of a selected locus on linked neutral loci. Genetics 85: 753–788.
- Uecker H., 2017. Evolutionary rescue in randomly mating, selfing, and clonal populations. Evolution 71: 845–858. 10.1111/evo.13191
- van Herwaarden O. A., and van der Wal N. J., 2002. Extinction time and age of an allele in a large finite population. Theor. Popul. Biol. 61: 311–318. 10.1006/tpbi.2002.1576
- Vatsiou A. I., Bazin E., and Gaggiotti O. E., 2016. Detection of selective sweeps in structured populations: a comparison of recent methods. Mol. Ecol. 25: 89–103. 10.1111/mec.13360
- Vitti J. J., Grossman S. R., and Sabeti P. C., 2013. Detecting natural selection in genomic data. Annu. Rev. Genet. 47: 97–120. 10.1146/annurev-genet-111212-133526
- Voight B. F., Kudaravalli S., Wen X., and Pritchard J. K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4: e72. Erratums: e154; e147. 10.1371/journal.pbio.0040072
- Vy H. M. T., Won Y.-J., and Kim Y., 2017. Multiple Modes of Positive Selection Shaping the Patterns of Incomplete Selective Sweeps over African Populations of Drosophila melanogaster. Mol. Biol. Evol. 34: 2792–2807. 10.1093/molbev/msx207
- Wakeley J., 2009. Coalescent theory: an introduction, Vol. 1. Roberts & Company Publishers, Greenwood Village, Colorado.
- Williams K.-A., and Pennings P. S., 2019. Drug resistance evolution in HIV in the late 1990s: hard sweeps, soft sweeps, clonal interference and the accumulation of drug resistance mutations. bioRxiv 548198. 10.1101/548198
- Wilson B. A., Pennings P. S., and Petrov D. A., 2017. Soft selective sweeps in evolutionary rescue. Genetics 205: 1573–1586. 10.1534/genetics.116.191478
- Wilson B. A., Petrov D. A., and Messer P. W., 2014. Soft Selective Sweeps in Complex Demographic Scenarios. Genetics 198: 669–684. 10.1534/genetics.114.165571
- Wright S., 1951. The genetical structure of populations. Ann. Eugen. 15: 323–354. 10.1111/j.1469-1809.1949.tb02451.x
- Xue A. T., Schrider D. R., Kern A. D., and Ag1000G Consortium, 2019. Discovery of ongoing selective sweeps within Anopheles mosquito populations using deep learning. bioRxiv 589069. 10.1101/589069
- Yang Z., Li J., Wiehe T., and Li H., 2018. Detecting recent positive selection with a single locus test bipartitioning the coalescent tree. Genetics 208: 791–805. 10.1534/genetics.117.300401
- Zheng Y., and Wiehe T., 2019. Adaptation in structured populations and fuzzy boundaries between hard and soft sweeps. PLOS Comput. Biol. 15: e1007426. 10.1371/journal.pcbi.1007426
- Zhong L., Yang Q., Yan X., Yu C., Su L. et al., 2017. Signatures of soft sweeps across the Dt1 locus underlying determinate growth habit in soya bean. Mol. Ecol. 26: 4686–4699. 10.1111/mec.14209
Data Availability Statement
File S1 is a Mathematica notebook of analytical derivations and simulation results. File S2 contains additional methods, results, and figures. File S3 contains copies of the simulation scripts, which are also available from https://github.com/MattHartfield/SweepDomSelf. Supplemental material is available at figshare: https://doi.org/10.25387/g3.11687949.