Abstract
The effect of natural selection on linked sites has been suggested to be a major determinant of genetic diversity. While it is in principle possible to estimate this effect from genome sequence data, interactions between selection, demography and inbreeding are expected to make inference less reliable. Here, we investigate whether the genome-wide reduction in diversity due to background selection () can be accurately estimated when populations are at demographic non-equilibrium and/or reproduce by partial self-fertilization. We show that the classic-BGS model is surprisingly robust to both demographic non-equilibrium and low rates of selfing, although both processes do lead to biased estimation of the distribution of fitness effects (DFE) of deleterious mutations. A high rate of selfing leads to poor estimation of both and DFE parameters. We propose an alternative approach where background selection, demography and partial selfing are jointly estimated from windowed site frequency spectra. This approach resolves most of the bias observed under the classic-BGS model and can also generate estimates of past demography that account for the effect of background selection and partial selfing. We apply the approach to genome sequence data from Capsella grandiflora and Capsella orientalis, which have contrasting mating systems and display a forty-fold difference in nucleotide diversity. Our results suggest that background selection has a weak effect on levels of genetic diversity in the outcrosser C. grandiflora () and a more substantial effect in the predominantly selfing species C. orientalis (), but that background selection alone cannot explain their disparity in genetic diversity.
Introduction
Genetic diversity varies by several orders of magnitude across species of animals and plants (Leffler et al. 2012; Romiguier et al. 2014; Chen et al. 2017; Buffalo 2021). A number of processes are suggested to contribute to this variation, including differences in de novo mutation rate, fluctuations in census population size, population structure and variance in offspring number (reviewed in Charlesworth and Jensen 2022). Comparative analyses have revealed several correlates of genetic diversity, including parental investment and longevity (Romiguier et al. 2014; Chen et al. 2017). In plants, mating system is a key determinant of genetic diversity (Chen et al. 2017), as species that reproduce via self-fertilization will have reduced diversity from inbreeding (Nordborg and Donnelly 1997). Despite these broad-scale patterns, it is still challenging to fully explain differences in diversity between closely related species (Stoffel et al. 2018; Mackintosh et al. 2019; Barry et al. 2022). We also lack a precise explanation for why diversity varies so little in comparison to census population size (Lewontin 1974).
The effect of natural selection on linked sites is a potential explanation for at least some of the variation in genetic diversity observed among species. Positive selection acting on a new mutation leaves a surrounding valley of reduced diversity, i.e. a selective sweep, with the width of the valley being inversely proportional to the sojourn time of the selected allele (Smith and Haigh 1974). Recurrent sweeps are expected to have a major impact on genetic diversity in species with large populations and low rates of sexual reproduction / recombination, but to have a much more limited effect in species with small populations and high rates of recombination (Nam et al. 2017; Ollivier et al. 2025). Purifying selection also leads to a reduction in genetic diversity at linked sites in a process known as background selection (BGS; Charlesworth et al. 1993; Charlesworth 2012). Although selection against a single deleterious mutation has only a very small effect on levels of linked diversity (Hudson and Kaplan 1995; Nordborg et al. 1996), deleterious mutations are common enough so that their combined effect can shape the landscape of diversity across genomes and, potentially, lead to considerable differences in genetic diversity between species.
Evaluating the role of BGS in determining levels of genetic diversity within and between species requires estimation of BGS from sequence data. Hudson and Kaplan (1995) and Nordborg et al. (1996) derived expectations for the scaled reduction in diversity (B) due to purifying selection acting on a deleterious mutation at some recombination distance away. They showed that this expectation can be used to predict patterns of nucleotide diversity along the genome, provided that natural selection is strong relative to drift and that the frequencies of deleterious mutations are independent of one another. This model, hereafter referred to as classic-BGS, has since been used to estimate the reduction in diversity due to BGS in model systems such as humans and Drosophila (McVicker et al. 2009; Comeron 2014; Elyashiv et al. 2016; Murphy et al. 2022), as well as in a handful of non-model species (Liang et al. 2022; Pope et al. 2023). Most notably, Corbett-Detig et al. (2015) used classic-BGS theory to estimate the combined effect of BGS and recurrent sweeps on genetic diversity across 40 species of animals and plants. They found that species with larger census size experience a greater reduction in genetic diversity from selection at linked sites, and suggested that this effect explains the narrow range of genetic diversity levels observed across species. Although the strength of this conclusion has since been challenged (Coop 2016; Buffalo 2021), comparative analyses like that of Corbett-Detig et al. (2015) are an undoubtedly useful approach for understanding the role of selection at linked sites in determining levels of genetic diversity in nature.
Classic-BGS theory assumes that a population is at demographic equilibrium, yet recent work has shown that demographic change modulates the effect of BGS on genetic diversity (Torres et al. 2018; Johri et al. 2021; Barroso and Ragsdale 2025). Inbreeding via self-fertilization also influences the strength of BGS, and although recombination rates and dominance coefficients can be rescaled to account for selfing (Nordborg 1997; Glémin and Ronfort 2013), this approach is known to be inaccurate when the rate of selfing is high enough (Kamran-Disfani and Agrawal 2014; Roze 2016). The fact that BGS is influenced by both demography and selfing suggests that the classic-BGS model may provide inaccurate estimates under these conditions. More generally, it is unclear whether this approach can be used for full parametric inference given that previous investigations often fix some of the parameters describing the distribution of deleterious fitness effects or only assess a small number of parameter combinations (Comeron 2014; Corbett-Detig et al. 2015; Liang et al. 2022).
Overview
Here, we investigate approaches for inference of BGS from genome sequence data. We use simulations to test the performance of the classic-BGS model under a range of conditions involving demographic non-equilibrium and partial-selfing. We also propose an alternative approach which aims to jointly model BGS, demography and partial-selfing, and we compare the performance of this model to classic-BGS. Our main focus is accurate estimation of the genome-wide reduction in genetic diversity due to BGS (), but we also test the ability of these methods to estimate the distribution of deleterious fitness effects (DFE), variation in B along the genome (B-maps), and, where applicable, the demographic history of the population. We apply our proposed joint inference method to two species of Capsella with contrasting mating systems and conclude by discussing recent progress and outstanding challenges in estimation of BGS from sequence data.
Results
Constant population size and strong selection
We implemented the classic-BGS model as a method for estimating the reduction in genetic diversity due to BGS along the genome (B). Our implementation is similar to those of previous work (Comeron 2014; Elyashiv et al. 2016) (see Methods for details). One notable difference is that we do not use the approximation for B derived by Hudson and Kaplan (1995) and Nordborg et al. (1996), and instead use:
| (1) |
from Roze (2016), where is the deleterious mutation rate at selected site i, is the reduction in fitness in a homozygote, is the dominance coefficient, and is the recombination distance to the selected site. Unlike the approximations of Hudson and Kaplan (1995) and Nordborg et al. (1996), Equation (1) holds under loose linkage and for unlinked mutations (; see Matheson and Masel 2024 for an investigation of this issue). We assume that all selected sites share the same μ and h but that fitness effects of deleterious mutations are gamma distributed and therefore parameterized by mean and shape β. There are four free parameters in the model—, μ, , β—and we assume throughout as this parameter is not identifiable (note that s and h always appear together as a product in Equation (1)). The parameter refers to the effective population size in the absence of BGS, so that the overall reduction in genetic diversity () is equal to . The problem of estimating , which is our main quantity of interest, can therefore be rephrased as estimation of .
Before fitting this model to data, we can first gain intuition about how each DFE parameter affects patterns of linked diversity by visualizing analytic predictions of B. Figure 1 shows how varying either μ, the mean fitness effect, or β across a factor of ten, while holding the other two parameters constant, affects the predicted value of B given a single site under purifying selection at recombination distance r. Varying μ affects B equally across all values of r. By contrast, varying the mean fitness effect changes the relative contributions of tightly and loosely linked sites to BGS. The shape parameter β determines the spread of selection coefficients around the mean and, as a result, it determines the spread of B across recombination distances. The effect of β on B is subtle overall (Fig. 1), suggesting that this parameter may be challenging to estimate precisely. We nonetheless expect that all three DFE parameters will be identifiable as long as there is sufficient information in the data (Fig. S1).
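The qualitative behavior described above can be explored with a minimal sketch. Note that it does not implement Equation (1); as a stand-in it uses the classic approximation in the form given by Nordborg et al. (1996), B = exp(-u t / (t + r(1 - t))^2) with t = sh, averaged over a gamma DFE by Monte Carlo. The function name, the truncation of s at 1, and all parameter values are illustrative assumptions rather than settings taken from the analysis.

```python
import math
import random


def b_single_site(u, mean_s, shape, r, h=0.5, n_draws=20000, seed=1):
    """Monte Carlo estimate of B for one selected site at distance r (r > 0).

    Stand-in for Equation (1): classic approximation
        B = exp(-u * E[t / (t + r * (1 - t))**2]),  t = s * h,
    with s ~ Gamma(shape, scale = mean_s / shape), truncated at s <= 1.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls share draws
    scale = mean_s / shape
    total = 0.0
    for _ in range(n_draws):
        s = min(rng.gammavariate(shape, scale), 1.0)
        t = s * h
        total += t / (t + r * (1.0 - t)) ** 2
    return math.exp(-u * total / n_draws)


if __name__ == "__main__":
    # Hypothetical values: u is the deleterious rate for a block of sites.
    for r in (1e-4, 1e-3, 1e-2):
        print(r, b_single_site(u=1e-4, mean_s=0.01, shape=2.0, r=r))
```

Because u enters only as a multiplier inside the exponent, changing it rescales log B by the same factor at every r, whereas the mean and shape of the DFE change how the effect decays with distance.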
Fig. 1.
Analytic predictions for the reduction in genetic diversity from BGS. The predicted departure from neutral levels of diversity (, y-axis) is plotted against the recombination distance of a selected site (r, x-axis). Each panel shows the effect of varying a single DFE parameter across an order of magnitude, while holding the other parameters constant (, , ) and assuming an of .
To test the performance of the classic-BGS model under favorable conditions we simulated a panmictic population of constant size where mutations within coding DNA sequence (CDS) are subject to strong purifying selection (, , , , see Methods for more details). These simulations were conditioned on the CDS annotation and recombination map of three Capsella rubella chromosomes (Slotte et al. 2013), totaling 49 Mb in length. In these simulations BGS reduced genetic diversity within intergenic regions by 19%, i.e. , and this reduction varied across the genome due to variation in recombination rate and gene density (Fig. S2). We fit the classic-BGS model to these simulated data, using nucleotide diversity across 10 kb windows as input. The B-map predicted by the model is a qualitatively good fit to the B-map generated by the simulation (Fig. S2A) and maximum composite likelihood (MCL) parameter estimates (, , , ) are also a good match to the parameters of the simulation. We estimated the uncertainty in parameters by bootstrapping across the simulation replicates. While the 95% CIs are narrow for the parameter (), they are wider for μ () and (), and very broad for β () (Fig. S2B), which is consistent with β having a subtle effect on B. These results show that the classic-BGS model can accurately estimate , and by extension the reduction in genetic diversity due to BGS, but that variation in nucleotide diversity across the genome is only weakly informative about the DFE.
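As an illustration of the bootstrap step, the following is a minimal sketch of a percentile bootstrap over replicates. It assumes that each replicate can be summarized by a single number and that the estimator is a simple function of the resampled replicates; in the analysis itself the estimator would be a full model fit, which is not reproduced here. The function name and the toy values are hypothetical.

```python
import random


def percentile_bootstrap_ci(replicates, estimator, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap CI: resample replicates with replacement,
    re-apply the estimator, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(replicates)
    stats = []
    for _ in range(n_boot):
        resample = [replicates[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * (n_boot - 1))]
    upper = stats[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


if __name__ == "__main__":
    # Toy per-replicate summaries (made up for illustration only).
    toy = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82, 0.81, 0.80]
    print(percentile_bootstrap_ci(toy, lambda xs: sum(xs) / len(xs)))
```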
Our simulations are conditioned on the recombination map of C. rubella (Slotte et al. 2013; Brazier and Glémin 2022), where recombination varies on a scale of 100 kb. We assume this map when fitting the model, yet in reality most estimated recombination maps tend to be coarser than the true map. To investigate this issue we repeated the analysis conditioning on maps where recombination rate is instead measured in intervals of 1 or 5 Mb. We find that coarse recombination maps lead to poorly fitting B-maps and biased parameter estimates (Fig. S2). However, is still well estimated when using these recombination maps, albeit with more uncertainty (Fig. S2). This suggests that accurate estimation of is possible even with only partial information about recombination.
Non-equilibrium demography
We next consider the effect of non-equilibrium demography on estimation of BGS parameters. We modified the simulation procedure to include an instantaneous five-fold growth / decline in population size at time T generations in the past, where T is equal to N in the most recent epoch (see Methods). Values of N were chosen to ensure that different simulation scenarios would have the same level of nucleotide diversity at the time of sampling in the absence of BGS. Within these simulations the average reduction in diversity due to BGS was greater for the decline demography () than for the growth demography () (Fig. 2a). This interaction between demography and BGS has been reported previously (Torres et al. 2018; Johri et al. 2021; Barroso and Ragsdale 2025). It can be explained by BGS increasing the rate of coalescence and by extension the probability that lineages coalesce in earlier epochs, so more frequently when N is small (see the Appendix of Johri et al. 2021). We fit the classic-BGS model to these simulated datasets and found that in both cases the predicted B-map provides a good fit to the simulated B-map (Fig. 2a). The parameter (which for non-equilibrium demographies corresponds to the reciprocal of the overall rate of coalescence in the absence of BGS) is slightly overestimated for the decline demography, but the bias is very small (; Fig. 2b). Estimates of and β have similar accuracy and precision to those from simulations with constant population size. However, we find that the deleterious mutation rate is estimated incorrectly for both non-equilibrium demographies (Fig. 2b). This parameter () is underestimated under a scenario of population growth (, ()) and overestimated under population decline (, ()). 
This bias can be understood by considering the interaction between demography and BGS described above: the effect of BGS on nucleotide diversity is modulated by changes in population size and, because demography is missing from the model, this is accounted for by changing μ. So although the classic-BGS model provides meaningful estimates of under demographic non-equilibrium, estimates of μ become biased.
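The choice of N values that equalize neutral diversity across scenarios can be illustrated with the standard coalescent result for a two-epoch history. For two lineages in a diploid population of size N1 from the present back to time T and N2 before that, the expected pairwise coalescence time is 2 N1 (1 - p) + 2 N2 p, where p = exp(-T / (2 N1)) is the probability of not coalescing in the recent epoch. The sketch below applies this with T set equal to the recent N, as in the simulations; the function names and the constant size of 10,000 are hypothetical, and the continuous-time approximation is an assumption.

```python
import math


def expected_pairwise_tmrca(n_recent, n_ancient, t_change):
    """Expected pairwise coalescence time (generations), diploid two-epoch model."""
    p_escape = math.exp(-t_change / (2.0 * n_recent))
    return 2.0 * n_recent * (1.0 - p_escape) + 2.0 * n_ancient * p_escape


def recent_size_matching_diversity(n_constant, ancient_over_recent):
    """Recent N that gives the same expected diversity as a constant
    population of size n_constant, when T equals the recent N."""
    p_escape = math.exp(-0.5)  # T = N_recent, so T / (2 N_recent) = 1/2
    return n_constant / ((1.0 - p_escape) + ancient_over_recent * p_escape)


if __name__ == "__main__":
    n0 = 10_000  # hypothetical constant size
    for label, ratio in (("growth", 0.2), ("decline", 5.0)):
        n1 = recent_size_matching_diversity(n0, ratio)
        print(label, round(n1), round(n1 * ratio),
              expected_pairwise_tmrca(n1, n1 * ratio, n1))
```

Here a five-fold growth corresponds to an ancient size one fifth of the recent size, and a five-fold decline to an ancient size five times larger.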
Fig. 2.
Estimation of BGS under non-equilibrium demography. a) The reduction in genetic diversity from BGS along one chromosome is shown for simulations with constant population size, population growth and population decline. The gray line in each panel is the observed reduction in the simulation, whereas estimates from the classic-BGS and BGS-with-demography models are shown as blue and dashed-red lines, respectively. b) Estimates of , μ, and β. Point estimates are shown as large points and bootstrap estimates are shown as small jittered points. Vertical lines show 95% CIs and dashed horizontal lines correspond to the simulated parameter values. The demographic history of each simulation scenario is shown on the right. The gray line corresponds to the number of individuals (N) of the simulated Wright-Fisher population. Estimates from the BGS-with-demography model are shown as a dashed-red line and the estimated history from methods assuming selective neutrality are also plotted for comparison.
Barroso and Ragsdale (2025) have also recently shown that demographic non-equilibrium leads to biased DFE inference under the classic-BGS model and have developed an elegant two-locus method to resolve this issue. Here, we investigate an alternative approach for jointly estimating BGS and demography that uses information contained in site frequency spectra (SFS) along the genome. The approach is similar to the classic-BGS model with some notable differences. First, we calculate the composite likelihood of observing windowed SFS along the genome, rather than levels of nucleotide diversity. Second, we include a two-epoch demographic history and estimate B values separately for each epoch. Finally, we account for the transition in the rate of coalescence from to in recent time using an approximation based on the results of Nicolaisen and Desai (2013). In summary, we model the rate of coalescence within each genomic window using demographic parameters that are shared genome-wide (, , T) along with epoch-specific B values for each window that depend on the local effect of BGS through time (, ) given the local recombination rate, density of selected sites and the deleterious DFE (μ, , β).
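The idea of a composite likelihood over windowed SFS can be sketched in a deliberately simplified form. The example below assumes a single epoch of constant size, treats BGS in each window as a simple rescaling of the neutral expectation by a window-specific B, and assumes independent Poisson counts in each SFS entry. None of these simplifications is claimed to match the model described above, which includes two epochs and epoch-specific B values; the function names and the form of the likelihood are assumptions made for illustration.

```python
import math


def neutral_sfs(theta, n):
    """Expected unfolded SFS under neutrality and constant size: theta / i."""
    return [theta / i for i in range(1, n)]


def poisson_loglik(observed, expected):
    """Log-likelihood of observed counts given Poisson means."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1)
               for k, m in zip(observed, expected))


def composite_loglik(windows, theta_per_site, b_values, n):
    """Sum of per-window log-likelihoods.

    windows: list of (n_sites, observed_sfs) tuples
    b_values: predicted B for each window (here a plain rescaling of theta)
    n: number of sampled haploid genomes
    """
    total = 0.0
    for (n_sites, observed), b in zip(windows, b_values):
        expected = neutral_sfs(theta_per_site * b * n_sites, n)
        total += poisson_loglik(observed, expected)
    return total
```

In a full analysis the per-window B values would themselves be functions of the DFE parameters, the local recombination rate and the density of selected sites, so that maximizing the summed log-likelihood estimates those parameters jointly with demography.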
We fit this model of BGS-with-demography to the simulated datasets analyzed in the previous section, which differ only in their demographic history (constant, growth and decline). We find that the B-maps predicted by this model are almost identical to those predicted by the classic-BGS model (Fig. 2a). While the classic-BGS model failed to accurately estimate μ for histories of non-equilibrium demography, the new model provides accurate estimates of μ for all three simulated datasets (Fig. 2b). The small bias in estimates of under the decline demography is also resolved (Fig. 2b). The BGS-with-demography model estimates a two-epoch demographic history that approximately accounts for the effect of BGS on sequence variation. These estimated histories provide excellent matches to the simulated demographic parameters (Fig. 2b). As a comparison, we estimated demography from the same simulated datasets using a neutral two-epoch model as well as StairwayPlot2 (Liu and Fu 2020) (which also assumes selective neutrality). We find that both of these methods generate estimates of that are lower than the simulated N values by approximately a factor of (Fig. 2b). Additionally, these methods incorrectly estimate the timing of changes in , especially for the decline demography. For the constant population size simulations these methods estimate a small increase in in recent time, corresponding to the transition between and (Nicolaisen and Desai 2013; Fig. 2b). By contrast, the model of BGS-with-demography, which aims to capture this transition in the rate of coalescence, has only a very small amount of error in recent time (N is underestimated by ). Although the model of BGS-with-demography that we consider is approximate in nature, these results show that it can resolve bias associated with the classic-BGS model and also provide estimates of demography that account for the effect of BGS on sequence variation.
Weak selection
So far we have only simulated strong purifying selection where almost all deleterious mutations will be removed by natural selection. Specifically, we simulated a deleterious DFE with and , where the large value of β ensures that most mutations will satisfy the condition . However, estimates of the DFE across a wide range of species have shown that β tends to be (Chen et al. 2017), suggesting that the DFE often has a considerable fraction of weakly deleterious mutations. The model of BGS-with-demography presented above can likely accommodate weak selection to some degree because the DFE is truncated at within each epoch (see Methods). Truncating the DFE in this way is nonetheless an approximation and so inference could become biased if weakly deleterious mutations are sufficiently common. To investigate the effect of weak selection we repeated the simulations above while setting and . These values imply that most new mutations will have mildly deleterious fitness effects (i.e. ) when .
The overall reduction in genetic diversity due to BGS in these simulations was for constant, growth and decline demographies, respectively, which are smaller reductions than observed under strong purifying selection (see previous section). We fit both the classic-BGS and BGS-with-demography models to the simulated data. The B-maps predicted by these two models are almost identical and provide a qualitatively good fit to the simulations (Fig. S3). We again find that the BGS-with-demography model generates accurate estimates of the demographic history, whereas methods that assume neutrality are consistently biased (Fig. S3). However, we find that the DFE parameters tend to be poorly estimated (Fig. S3B). The two models give similar DFE estimates for simulations with constant and decline demographies, but estimates from the BGS-with-demography model for simulations with population growth are particularly biased. We suspect that the piece-wise constant assumption of our model may introduce bias whenever deleterious allele frequencies are slow to equilibrate. When we refit these models while fixing μ to the true value (which would be a realistic possibility in some model species), we find that estimates of the DFE parameters are more precise and accurate (Fig. S4), although small biases can still be observed. These results again show that the DFE is challenging to estimate from this type of model. Importantly, , B-maps, and demographic histories are accurately estimated under the BGS-with-demography model, even when the DFE is not.
Partial selfing
To investigate the effect of partial selfing on estimation of BGS we modified the general simulation procedure used above to include reproduction by self-fertilization at rate α. We simulated a mixed-mating population () and a high-selfing population (), both with a constant population size of diploids and a deleterious DFE that includes both strongly and weakly selected mutations (, ). We find that overall nucleotide diversity is reduced by a factor of in the mixed-mating population and by a factor of in the high-selfing population. However, these reductions include both the effects of selfing and BGS. Using knowledge of α, these values can be rescaled so that they only represent the effect of BGS: with and for mixed-mating and high-selfing, respectively. Given that for a randomly-mating population with the same size and DFE (see above), this shows that the interaction between selfing and BGS is weak when populations reproduce by mixed-mating but substantial under high-selfing.
We fit the classic-BGS model to these simulated data. Estimates of from this model are underestimates of the simulated N. This is expected given that partial selfing increases the long-term rate of coalescence by a factor of (where ) and the probability that a pair of lineages reach this long-term process when sampled from a diploid is only . The DFE parameter estimates from the classic-BGS model broadly match the true values for the mixed-mating population, albeit with a significant upwards bias in estimates of . Estimates for the high-selfing population are strongly biased (Fig. 3). Similarly, the predicted B-map is a qualitatively good fit for mixed-mating, but the effect of BGS on genetic diversity is underestimated for the high-selfing population. The classic-BGS model is therefore surprisingly robust to reproduction by mixed-mating but gives inaccurate estimates of the DFE and under high-selfing.
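The scaling referred to above can be made concrete with standard results for partial selfing, stated here as assumptions because the corresponding expressions are not legible in the text: the equilibrium inbreeding coefficient is F = α / (2 - α), the long-term coalescent effective size is N / (1 + F), and a diversity ratio that includes both selfing and BGS can be multiplied by (1 + F) to isolate the BGS component. The function names are hypothetical.

```python
def inbreeding_from_selfing(alpha):
    """Equilibrium inbreeding coefficient F for selfing rate alpha."""
    return alpha / (2.0 - alpha)


def selfing_from_inbreeding(f):
    """Inverse relationship: selfing rate implied by F."""
    return 2.0 * f / (1.0 + f)


def ne_under_selfing(n, alpha):
    """Coalescent effective size of N diploids with selfing rate alpha (no BGS)."""
    return n / (1.0 + inbreeding_from_selfing(alpha))


def bgs_component(total_reduction, alpha):
    """Rescale a diversity ratio that includes selfing so it reflects BGS only.

    Assumes the selfing effect is exactly the factor 1 / (1 + F).
    """
    return total_reduction * (1.0 + inbreeding_from_selfing(alpha))
```

For example, complete selfing (α = 1) gives F = 1 and halves the effective size, whereas a selfing rate of 0.5 gives F = 1/3 and reduces the effective size by a quarter.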
Fig. 3.
Estimation of BGS under partial selfing and non-equilibrium demography. a) Parameter estimates from the classic-BGS and BGS-with-demography model under different rates of partial selfing and demography. As in Fig. 2, large points are point estimates, small jittered points are bootstrap estimates and vertical lines correspond to 95% CIs. A log y-axis is used for plots showing estimates of and β. b) The reduction in genetic diversity from BGS along one chromosome is shown for each simulated scenario. The gray line in each panel is the observed reduction in the simulation, whereas estimates from the classic-BGS and BGS-with-demography models are shown as blue and dashed-red lines, respectively. The mean error (ME) between the simulated and estimated B-maps is also shown in each panel.
The BGS-with-demography model is expected to have the same limitations when fit to data from partially selfing populations. It is, however, possible to extend the model to account for partial selfing. First, the expected B value for a given time epoch and genomic window can be calculated using an approximation for the effect of BGS on the rate of coalescence under partial selfing. Here, we use a new approximation based on the results of Roze (2016):
| (2) |
where and . This approximation is expected to better capture the contribution of loosely linked loci to BGS than that of Nordborg (1997). Second, the expected SFS for a genomic window should reflect the increased rate of coalescence from selfing, as well as the distortion in shape due to sampling diploid individuals. This has been addressed by Blischak et al. (2020) and here we use their inbreeding-aware implementation of dadi to obtain the SFS for each genomic window. Together these changes allow estimation of BGS parameters conditional on α, which can be estimated from or the SFS prior to model fitting, or left as a free parameter.
We reanalyzed the data from simulations including partial selfing using this selfing-aware model of BGS-with-demography while conditioning each analysis on the value of α implied by . Under mixed-mating, the model provides accurate parameter estimates, resolving the bias in (Fig. 3a). Under high-selfing, parameter estimates have more variance but are again approximately unbiased (Fig. 3a). The predicted B-map for the high-selfing population is also a better fit than that from the classic-BGS model (Fig. 3b). In principle, this model should also perform well for partially selfing populations at demographic non-equilibrium. We therefore simulated partially selfing populations with growth and decline demographies (as above) and fit both the classic and new model to these data. We find that the new model always generates a better fitting B-map, but that the improvement in mean error depends on demography (small for growth but substantial for decline). Estimates of μ and from the new model are also always more accurate, although there is evidence for bias in some cases (e.g. estimates of β for a scenario of high-selfing with growth; Fig. 3a). We performed additional simulations under high-selfing with varying μ and found that estimation is less accurate when the deleterious mutation rate is high (Tables S1 and S2), but that the new model consistently performs better than classic-BGS. Altogether, these results show that it is possible to obtain better estimates of the effect of BGS on genetic diversity by explicitly modeling demography and partial selfing.
While BGS leads to a small bias in the results of demographic inference under random mating (Fig. 2), we expect the bias introduced under partial-selfing to be much larger. We can evaluate this by examining the distribution of pairwise coalescence times from the simulated genealogies (i.e. tree sequences). Since the rate of coalescence through time is inversely proportional to the coalescent , we can use these distributions to obtain piece-wise constant trajectories. Figure 4a shows that partial-selfing leads to low coalescent in the very recent past due to coalescence from self-fertilization in immediate ancestors of diploid individuals. Further back in time, reflects coalescence in the wider population, the rate of which is jointly determined by demography, BGS and selfing. Although we simulated populations with constant size, varies through time whenever simulations include BGS. In recent time this is due to the transition in coalescence rate between and as lineages move through fitness classes. After this transition, tends to gradually increase again (see Discussion). The large departure between the simulated demography and realized rate of coalescence shows that demographic inference in selfing populations is challenging.
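The conversion from pairwise coalescence times to a piece-wise constant trajectory can be sketched as follows. Within each time bin, the coalescence rate is estimated as the number of pairs that coalesce in the bin divided by the total time that still-uncoalesced pairs spend in it, and the diploid effective size is taken as 1 / (2 × rate). The binning, the function name and the simulated exponential times in the usage example are assumptions for illustration; the sketch does not read tree sequences.

```python
import math
import random


def piecewise_ne(coal_times, breakpoints):
    """Estimate diploid Ne in each time bin from pairwise coalescence times.

    coal_times: pairwise coalescence times in generations
    breakpoints: increasing bin edges, starting at 0
    Returns one estimate per bin [edge_k, edge_k+1).
    """
    estimates = []
    for start, end in zip(breakpoints[:-1], breakpoints[1:]):
        # Pairs that coalesce inside this bin.
        events = sum(1 for t in coal_times if start <= t < end)
        # Time spent in the bin by pairs not yet coalesced at its start.
        exposure = sum(min(t, end) - start for t in coal_times if t >= start)
        estimates.append(exposure / (2.0 * events) if events else math.inf)
    return estimates


if __name__ == "__main__":
    # Constant diploid size of 1000: exponential times with mean 2N = 2000.
    rng = random.Random(7)
    times = [rng.expovariate(1.0 / 2000.0) for _ in range(100_000)]
    print(piecewise_ne(times, [0, 500, 2000, 6000]))
```

Under a constant-size neutral model the estimates should be roughly flat across bins, so departures from flatness in such a trajectory reflect changes in the rate of coalescence through time.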
Fig. 4.
The effect of BGS and partial-selfing on coalescent through time. a) The implied by the distribution of pairwise coalescence from simulations that vary in α and whether they include BGS. The gray line in each plot corresponds to the simulated population size and colored lines correspond to coalescent trajectories from tree sequences for five batches of simulation replicates. b) Point estimates of inferred demographic histories from these simulated data using either the method of Blischak et al. (2020) or the BGS-with-demography model extended to partial-selfing.
Two-epoch demographic histories estimated by the BGS-with-demography model are shown in Fig. 4b. As a comparison we also estimated past demography using the method of Blischak et al. (2020), which accounts for partial selfing but not BGS. Generally, we observe that the method of Blischak et al. (2020) underestimates and estimates recent growth despite constant population sizes (Fig. 4b). The BGS-with-demography model performs better in these respects, but does overestimate in the distant past when the rate of selfing is high. This can be explained by BGS generating a “U-shaped” distortion in the SFS (Fig. S5), while our model assumes that the distortion is limited to low-frequency alleles. It is nonetheless encouraging that our approach can provide meaningful estimates of through time in selfing populations where the impact of BGS on patterns of sequence polymorphism is strong.
The effect of BGS on genetic diversity in Capsella grandiflora and Capsella orientalis
We next investigate the role of BGS in determining levels of genetic diversity in two closely related plant species: C. grandiflora, which is a self-incompatible outcrosser, and C. orientalis, which reproduces predominantly by selfing. We reanalyze high-coverage whole genome sequence data from Josephs et al. (2015) and Koenig et al. (2019) by jointly calling variants for 50 C. grandiflora and 16 C. orientalis individuals (see Methods; Tables S3–S5). Estimates of the inbreeding coefficient reflect the contrasting mating systems of these species, with in C. grandiflora and in C. orientalis. Hereafter we assume that C. grandiflora mates randomly () and that C. orientalis reproduces by partial selfing (). Nucleotide diversity per-site (excluding CDS) is approximately forty-fold higher in C. grandiflora () than in C. orientalis (), consistent with a strong effect of selection at linked sites in the predominantly selfing species.
We first fit the BGS-with-demography model to the outcrossing species C. grandiflora. While we have previously assumed a two-epoch demographic history and perfect polarization of alleles when fitting this model to simulated data, here we instead assume a three-epoch history and also include an allele polarization error parameter that we estimate under neutrality prior to model fitting ( ). We estimate a genome-wide reduction in diversity of in C. grandiflora, with the B-map varying between and across 10 kb windows (Fig. 5a). The gamma-distributed deleterious DFE for CDS is estimated with parameter values of , and . Although this value of β is unrealistic, setting a realistic value gives a very similar B-map (Fig. S6). The inferred demographic history is consistent with an expanding population as increases approximately three-fold over the last generations (Fig. 5c). Altogether, our results suggest that the effect of BGS on genetic diversity is weak in C. grandiflora.
Fig. 5.
The effect of BGS on genetic diversity in C. grandiflora and C. orientalis. a) Nucleotide diversity (π) is plotted as a gray line across seven C. grandiflora chromosomes in 200 kb sliding windows. Predicted levels of π and B from BGS are shown as a red line. The estimated DFE parameters are listed above the plot. b) The same as in a) but for C. orientalis. c) Estimates of through time for C. grandiflora and C. orientalis plotted on a log-log scale. d) The normalized unfolded SFS for C. grandiflora is shown as gray bars and the predicted SFS is shown as red bars. e) The same as in d) but for C. orientalis.
Ideally we would fit the same model of BGS-with-demography to the predominantly selfing species C. orientalis while also including the effect of partial selfing. Unfortunately, the SFS for our sample of C. orientalis genomes is very flat (Fig. 5e), which makes inference of past demography challenging. For example, fitting a neutral two-epoch demographic model to this data together with rates of polarization error and selfing suggests a 50-fold decrease in at , with and . These unrealistic parameter estimates (especially ) suggest that the shape of the SFS in C. orientalis may reflect other evolutionary forces that are not captured by our model (see Discussion). With this in mind, we chose to fit a simpler model of BGS-with-partial-selfing that only uses variation in π along the genome, rather than the full SFS. Although this means we cannot include past demography in our analysis, we avoid the possibility of biasing our inference by assuming that variation in the shape of the SFS across the genome reflects BGS. Instead we only use the shape of the SFS to estimate the rate of selfing prior to model fitting ( ).
For C. orientalis we estimate a genome-wide reduction in diversity of from BGS and a corresponding of approximately (Fig. 5). The estimated DFE suggests very weak selection, with , and . Note that these DFE estimates are informed by the sojourn time of deleterious mutations, which we expect to be increased by any negative linkage disequilibrium between selected mutations. Although our simulations suggest that this effect is small when , it is likely substantial in C. orientalis given and so may explain why we estimate such a weak DFE. Although we have not used the SFS for estimating BGS parameters, we can nonetheless obtain an expected SFS for this scenario of BGS-with-partial-selfing. We find that the expectation from the model, which includes no demographic change or polarization error, is a poor match to the data (Fig. 5e), especially in comparison to the model fit for C. grandiflora (Fig. 5d).
Our results suggest that the reduction in genetic diversity due to BGS () is five-fold greater in C. orientalis than in C. grandiflora. The fact that we simultaneously model partial selfing, the DFE and past demography means that we can also estimate the contribution of each of these processes to the difference in between the two species. For example, while our model of C. grandiflora predicts genome-wide, if we replace the expanding history of C. grandiflora with a constant population size with equivalent overall then we instead predict . This shows that demography only plays a small role in modulating the effect of BGS on genetic diversity in C. grandiflora, and that most of the difference in between these species is instead due to their contrasting mating systems.
Discussion
Here, we have evaluated the performance of two different methods for estimating the effect of BGS on genetic diversity: the classic-BGS model (Hudson and Kaplan 1995; Nordborg et al. 1996) and an SFS-based method that aims to jointly model BGS, demography and partial-selfing. Consistent with previous work, we found that demographic change (Torres et al. 2018; Johri et al. 2021; Barroso and Ragsdale 2025) and partial-selfing (Nordborg 1997; Kamran-Disfani and Agrawal 2014; Roze 2016; Burgarella et al. 2024) modulate the effect of BGS on genetic diversity and therefore lead to biased results when using the classic-BGS model. However, these biases mostly affect parameters describing the DFE rather than the predicted B-map. As a result, the classic-BGS model appears to be robust to both demographic change and low rates of partial-selfing if one is only interested in estimating the overall effect of BGS on genetic diversity—. This robustness can be understood by considering the identifiability of the different parameters in the model. While the DFE parameters are estimated from patterns of variation in diversity along the genome (Fig. 1), information about is mostly contained in regions where B is close to . The B-maps from simulations of randomly mating and mixed-mating populations often had considerable regions of chromosomes where (Figs. 2 and 3), making estimation of (and ) straightforward. By contrast, B-maps from simulations of high-selfing populations tended to have values of B that were much lower, meaning that estimation of requires accurate knowledge of the DFE in order to extrapolate levels of diversity to an unobservable region that is free of BGS. Accurate inference from such populations therefore requires alternative methods, and our proposed joint inference approach appears promising in this respect (Fig. 3).
Despite the robustness of the classic-BGS model, our results show that jointly modeling BGS and demography does provide several benefits. In particular, our joint inference approach resolves the bias in estimates of μ introduced by demographic non-equilibrium and also generates accurate inference of demography while accounting for BGS (Fig. 2). These benefits are demonstrated in our analysis of C. grandiflora, where we could show that a demographic history of expansion has only slightly weakened the effect of BGS on genetic diversity in this species. Our approach does assume that natural selection is strong relative to drift () and that frequencies of deleterious mutations equilibrate quickly between epochs, which will not always be true when selection is weak and effective population sizes are small (Fig. S3). Barroso and Ragsdale (2025) have also recently developed an approach for modeling BGS under non-equilibrium demography. Their two-locus method does not currently support joint inference (demographic histories are specified prior to BGS estimation), but it has the major advantage of explicitly modeling weak selection and deleterious allele frequencies through time. We therefore expect the method of Barroso and Ragsdale (2025) to be much more accurate than the one we propose here in regimes of weak selection. Additionally, because their approach uses polymorphism information at both selected and linked sites, we expect that estimation of DFE parameters will be more precise. While we have found that patterns of linked diversity contain limited information about the DFE (Figs. 2 and 3), it will be interesting to see whether the approach of Barroso and Ragsdale (2025) will prove more powerful than DFE estimation methods that ignore the effect of selection on linked sites (Keightley and Eyre-Walker 2007).
We have shown that the classic-BGS model effectively breaks down when analyzing data from populations that predominantly reproduce by selfing. By using a new approximation (Equation 2, Roze 2016) we were able to generate far more accurate estimates of and the DFE (Fig. 3). We nonetheless expect that BGS has additional effects on sequence variation in highly selfing populations that are not captured by our joint inference approach. In particular, negative linkage disequilibrium is expected to build up between deleterious mutations when outcrossing is rare, leading to less efficient selection and deleterious mutations that segregate longer than expected given sh (Hill and Robertson 1966; Daigle and Johri 2025). This is evidenced by subtle underestimates of for simulated data when (Fig. 3) and estimates of very weak selection for C. orientalis, where . We also found that distributions of coalescence times from simulations of BGS and partial-selfing were inconsistent with a piece-wise constant demographic history (Fig. 4a). While this could be due to insufficient burn-in (despite total generations), an alternative explanation is that a regime of weak purifying selection and low effective recombination rate leads to distortions in genealogies that are not captured by a monotonic transition from to (Seger et al. 2010; Strütt et al. 2025). Khudiakova et al. (2024) have recently suggested that weak purifying selection results in a multiple-merger coalescent process, and although this is outside the scope of the current work, it would be interesting to test whether it can explain the unusually flat SFS of C. orientalis (Fig. 5e). We expect that our assumptions of independence between selected sites and a Kingman-like coalescent process may be problematic when populations are almost completely selfing. In such cases, simulation-based inference may be a better prospect for accurately estimating the effect of BGS on genetic diversity (Johri et al. 2021).
Several authors have suggested that BGS should be included in baseline / null models of evolution (Comeron 2014; Johri et al. 2020). Here, we have tested whether BGS (modulated by demography and partial-selfing) can explain the large difference in genetic diversity between two closely related species—C. grandiflora and C. orientalis—or if additional demographic and selective forces are required. We estimate that the effect of BGS on genetic diversity is approximately five-fold greater in the predominant selfer C. orientalis, yet maximum composite likelihood estimates of for these species still differ by an order of magnitude (Fig. 5). This suggests that much of the difference in diversity between these species is instead explained by other factors. Both selective sweeps and population bottlenecks are expected to have a greater impact on genetic diversity in predominantly selfing species (Baker 1955; Andersen et al. 2012; Pannell 2016; Hartfield and Bataillon 2020). These two processes are particularly promising explanations for the large disparity in genetic diversity between C. grandiflora and C. orientalis as they could also explain the highly distorted SFS of the latter species. Another important factor that we have not considered in our analysis is population structure. While the C. grandiflora individuals that we analyzed were sampled from a single population in Greece (Josephs et al. 2015), the C. orientalis individuals were sampled from across central Asia (Koenig et al. 2019; Table S3). This sampling scheme will lead to increased estimates of nucleotide diversity whenever migration between populations is weak relative to drift (Li 1976). Additionally, recent work has shown that the effect of BGS on total nucleotide diversity in a metapopulation setting depends on the rate of migration (Hasan and Whitlock 2024). 
Accounting for the effect of population structure when estimating BGS from sequence data is therefore an important direction for future methods development. Although fitting models that include many evolutionary processes is still difficult, methods that perform joint inference of a few interacting forces provide a promising approach to disentangling the determinants of genetic diversity.
Methods
Classic-BGS
We implemented the classic-BGS model by calculating the composite likelihood of levels of nucleotide diversity in genomic windows across the genome, given values of , μ, and β. For each genomic window we calculate B as , where is the contribution to B from purifying selection elsewhere in the genome and is the contribution of purifying selection within that window. By integrating Equation (1) across a DFE, for genomic window k can be calculated as:
| (3) |
where is the deleterious mutation rate in window i and is the map distance between windows k and i given by Haldane’s mapping function (Haldane 1919). When windows k and i are on different chromosomes . is the probability density function of a gamma distribution parameterized by mean and shape β:
| (4) |
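As an illustration of the two ingredients named above, the following minimal Python sketch writes out a gamma density parameterized by its mean and shape (so that the scale is mean/shape) and Haldane's mapping function. The function and argument names are hypothetical and are not taken from the authors' code; this is a sketch under those assumptions rather than the implementation used here.

```python
import math


def gamma_pdf(s, mean, shape):
    """Density of a gamma distribution with the given mean and shape.

    The scale is mean / shape, so that shape * scale equals the mean.
    """
    if s <= 0:
        return 0.0
    scale = mean / shape
    log_pdf = ((shape - 1.0) * math.log(s) - s / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def haldane_r(d_morgans):
    """Recombination fraction for a map distance in Morgans (Haldane 1919)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


if __name__ == "__main__":
    # Density at the mean for an illustrative DFE (values are arbitrary).
    print(gamma_pdf(0.01, mean=0.01, shape=0.5))
    # Very distant loci approach free recombination (r = 0.5).
    print(haldane_r(0.01), haldane_r(5.0))
```

Under Haldane's function the recombination fraction is close to the map distance for tightly linked windows and saturates at 0.5 for distant ones.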
To calculate we assume that deleterious mutations are distributed uniformly within the window and average Equation 9 of Nordborg et al. (1996) across a window by integration to give:
| (5) |
where and are the diploid deleterious mutation rate and recombination fraction in window k, respectively. Given B values for each window of the genome we calculate a log composite likelihood as:
| (6) |
where and are counts of the number of sites in window i where a pair of samples are polymorphic and monomorphic, respectively, and is calculated as:
| (7) |
where is the neutral de novo mutation rate per-generation. The value of is not estimated from the data and is instead set prior to model fitting. This parameter is expected to have little effect on estimation of BGS as it only determines the absolute value of and the lower bound of the integrals in Equations (3) and (5) (as ).
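The counts of polymorphic and monomorphic sites described above suggest a binomial form for the composite likelihood. The sketch below assumes that form, and assumes that the expected diversity of a window is its B value multiplied by a genome-wide neutral expectation (called `pi0` here). Both assumptions and all names are illustrative and may differ from the authors' Equations (6) and (7).

```python
import math


def ln_composite_likelihood(poly, mono, pi_expected):
    """Binomial log composite likelihood summed over windows.

    poly[i], mono[i]: counts of polymorphic / monomorphic sites in window i
    for a pair of samples; pi_expected[i]: expected diversity in window i.
    """
    total = 0.0
    for p, m, pi in zip(poly, mono, pi_expected):
        total += p * math.log(pi) + m * math.log(1.0 - pi)
    return total


if __name__ == "__main__":
    b_map = [1.0, 0.5]   # toy B values for two windows
    pi0 = 0.01           # hypothetical diversity in the absence of BGS
    pi_expected = [b * pi0 for b in b_map]
    print(ln_composite_likelihood([10, 5], [990, 995], pi_expected))
```

In this form each window contributes most when its expected diversity matches the observed fraction of polymorphic sites.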
Even when grouping adjacent sites into windows, calculation of composite likelihoods for whole genomes can be prohibitively slow as the integrals in Equations (3) and (5) must be taken over thousands of recombination distances. To improve speed we evaluate these integrals across a grid of 40 and 11 values of r and R, respectively, given the parameters and β, and then use linear interpolation to estimate and for each window (Comeron 2014). We perform a two-step derivative-free optimization procedure to maximize the lnCL. We first perform a global search using our own implementation of the Controlled Random Search algorithm (Kaelo and Ali 2006) and then use the maximum likelihood parameter combination from this search as a starting point for a local search with the Nelder-Mead algorithm implemented in nlopt v2.7.1 (Nelder and Mead 1965; Johnson 2007).
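The grid-and-interpolate strategy can be sketched as follows. An expensive function is evaluated once on a small grid and then approximated by piecewise-linear interpolation. The log spacing of the grid, the grid bounds and the stand-in function `slow_integral` are assumptions made for illustration only.

```python
import bisect


def build_grid(func, lo, hi, n):
    """Evaluate func on n log-spaced points between lo and hi (lo > 0)."""
    xs = [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]
    ys = [func(x) for x in xs]
    return xs, ys


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation, clamped at the ends of the grid."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    x0, x1 = xs[j - 1], xs[j]
    w = (x - x0) / (x1 - x0)
    return ys[j - 1] + w * (ys[j] - ys[j - 1])


def slow_integral(r):
    """Hypothetical stand-in for an expensive integral over the DFE."""
    return 1.0 / (0.01 + r) ** 2


if __name__ == "__main__":
    xs, ys = build_grid(slow_integral, 1e-6, 0.5, 40)
    for r in (1e-4, 1e-2, 0.2):
        print(r, interpolate(xs, ys, r), slow_integral(r))
```

The trade-off is the usual one: a coarser grid is faster but interpolation error grows where the function is strongly curved, so grid density matters most at small recombination distances.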
BGS-with-demography
We extended the classic-BGS model to jointly estimate the effects of demography and BGS on sequence variation. Demography is modeled as a piece-wise constant history where time is divided into epochs so that the rate of coalescence in epoch t is and the boundary between epoch t and is . For example, in the absence of BGS and assuming a two-epoch history, the rate of coalescence through time is determined by parameters . In the presence of BGS, the coalescent history of genomic window i can be approximated by rescaling each by , where is calculated using Equations (3) and (5). One issue with these calculations is that the lower bound of the integrals depends on the coalescent that results from BGS, which is unknown for individual epochs. We therefore initially assume the coalescent implied by genome-wide nucleotide diversity and perform iterative calculation of the B-map until converges.
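The two operations described in this paragraph, rescaling a piece-wise constant history by a window's B value and iterating a calculation until it stops changing, can be sketched in a few lines of Python. The diploid coalescence rate 1/(2N) is the standard result; the update function `toy_update` is a purely hypothetical stand-in for recomputing a B value from the current estimate, and all names are assumptions.

```python
import math


def rescale_history(epoch_sizes, b):
    """Rescale the effective size of every epoch by a window's B value."""
    return [n * b for n in epoch_sizes]


def coalescence_rates(epoch_sizes):
    """Pairwise coalescence rate per generation in each epoch (diploids)."""
    return [1.0 / (2.0 * n) for n in epoch_sizes]


def iterate_to_convergence(update, start, tol=1e-10, max_iter=1000):
    """Apply update repeatedly until successive values differ by < tol."""
    value = start
    for _ in range(max_iter):
        new_value = update(value)
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    raise RuntimeError("no convergence within max_iter iterations")


def toy_update(b):
    """Hypothetical map from a current B value to a recomputed one."""
    return math.exp(-0.2 / (1.0 + b))


if __name__ == "__main__":
    sizes = rescale_history([30000.0, 10000.0], b=0.8)
    print(sizes, coalescence_rates(sizes))
    print(iterate_to_convergence(toy_update, start=1.0))
```

Because a lower B raises the coalescence rate by the same factor in every epoch, the rescaling changes the timescale of a window's history but not the relative sizes of its epochs.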
In a coalescent process with BGS, the rate of coalescence only reaches once all lineages in the sample are in the least loaded fitness class. Nicolaisen and Desai (2013) derived an approximation for the coalescent at time t generations in the past due to this transition:
| (8) |
Performing calculations across values of t for every window in a genome would likely be prohibitively slow. We instead perform a simpler calculation using the term in Equation (8) that determines the speed of the transition, while neglecting recombination and again integrating across the DFE:
| (9) |
We thereby assume that the speed of the transition in coalescence rate is shared across the genome. This approximation is a drastic simplification of the coalescent process under purifying selection / BGS (Strütt et al. 2025). At the same time, a comparison of this approximation to those of Nicolaisen and Desai (2013) shows that it is a good fit for transition dynamics in regions of low recombination (Fig. S7), which is where BGS is expected to be strongest and capturing the transition in coalescence rate is most important.
For each genomic window the coalescent history is described by rescaled demographic parameters (e.g. , , ) and a two-step transition from to calculated by evaluating Equation (9). We then use moments v1.1.16 to calculate the expected SFS of each window given its piece-wise constant coalescent history (Jouganous et al. 2017). The log composite likelihood of the model given windowed SFS along the genome is calculated as:
| (10) |
where is the expected frequency of sites with a derived allele count of j in window i and is the observed count of these sites. Note that the entry in the SFS corresponds to monomorphic sites in the sample. We perform the same optimization procedure as described above, but during the initial global search we round values of B to two decimal places to reduce the computational burden of calculating many similar expected SFS. Discretizing the B-map in this way results in a substantial speed-up in computation but does also reduce the smoothness of the likelihood surface.
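A multinomial composite likelihood over windowed spectra, combined with rounding B to two decimal places so that windows with similar B share one expected SFS, might look like the sketch below. The function `expected_sfs` is a hypothetical stand-in (a neutral constant-size expectation scaled by B) for the much more expensive calculation performed with moments. The sample size, the value 0.01 and the multinomial form itself are assumptions.

```python
import math
from functools import lru_cache

N_HAPLOID = 10  # toy haploid sample size


@lru_cache(maxsize=None)
def expected_sfs(b_rounded):
    """Toy expected SFS; entry 0 holds monomorphic sites.

    Segregating classes follow theta / j, with theta scaled by B.
    """
    theta = 0.01 * b_rounded
    poly = [theta / j for j in range(1, N_HAPLOID)]
    mono = 1.0 - sum(poly)
    return tuple([mono] + poly)


def ln_composite_likelihood(observed_by_window, b_map):
    """Sum over windows and classes of observed count * ln(expected)."""
    total = 0.0
    for observed, b in zip(observed_by_window, b_map):
        expected = expected_sfs(round(b, 2))  # discretized B-map
        total += sum(o * math.log(e)
                     for o, e in zip(observed, expected) if o > 0)
    return total


if __name__ == "__main__":
    window = (990, 5, 2, 1, 1, 1, 0, 0, 0, 0)
    print(ln_composite_likelihood([window] * 3, [0.501, 0.499, 0.8]))
    print(expected_sfs.cache_info())  # two distinct spectra were computed
```

Caching on the rounded value is what produces the speed-up, and it is also why the likelihood surface becomes stepped: small changes in parameters that do not move any window across a rounding boundary leave the likelihood unchanged.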
BGS-with-partial-selfing
To include partial-selfing in the inference approach we integrate Equation (2) over the DFE to calculate . This approximation combines Nordborg’s (1997) approximation (which is obtained in the limit of weak recombination from Roze’s (2016) more general model) and Roze’s (2016) high-selfing approximation. Numerical analysis indicates that the predictions obtained from this approximation are often close to the predictions from the more general model. To calculate and the transition in coalescence rate we rescale recombination rates and dominance coefficients by and , respectively, in Equations (5) and (9) (Nordborg 1997).
As above, we calculate the expected SFS for each genomic window given locally rescaled demographic parameters. Under partial-selfing coalescence rates are additionally rescaled by and we use the inbreeding-aware implementation of dadi (v2.3.3) to calculate expected spectra while accounting for the distortion introduced by sampling diploid individuals (Blischak et al. 2020).
Calculating B-maps and SFS along the genome under partial-selfing requires knowledge of α. This can be included as a free parameter in the model, with the shape of the SFS and patterns of nucleotide diversity along the genome informing its value. We tested this approach and found that convergence of model parameters was much slower than when α was fixed prior to model fitting. We therefore take the latter approach by calculating or performing an initial estimation from the SFS under neutrality.
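For reference, the standard relationships between the selfing rate α and the equilibrium inbreeding coefficient F, together with the usual rescalings of recombination, dominance and effective size under partial selfing, can be written as small helper functions. These are textbook results (e.g. Nordborg 1997). That they correspond exactly to the rescalings used in this section is an assumption, and the function names are hypothetical.

```python
def inbreeding_from_selfing(alpha):
    """Equilibrium inbreeding coefficient F = alpha / (2 - alpha)."""
    return alpha / (2.0 - alpha)


def selfing_from_inbreeding(f):
    """Selfing rate implied by F: alpha = 2F / (1 + F)."""
    return 2.0 * f / (1.0 + f)


def effective_recombination(r, f):
    """Recombination rate rescaled by (1 - F)."""
    return r * (1.0 - f)


def effective_dominance(h, f):
    """Dominance coefficient rescaled to h(1 - F) + F."""
    return h * (1.0 - f) + f


def effective_size(n, f):
    """Effective population size reduced by a factor of (1 + F)."""
    return n / (1.0 + f)


if __name__ == "__main__":
    f = inbreeding_from_selfing(0.9)
    print(f, selfing_from_inbreeding(f))
    print(effective_recombination(1e-8, f),
          effective_dominance(0.25, f),
          effective_size(10000, f))
```

With α = 0.9, F is roughly 0.82, so the effective recombination rate falls to under a fifth of its outcrossing value, which is why BGS is expected to be much stronger in predominantly selfing populations.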
Forward simulations
Forward simulations were performed with SLiM v4.2.2 (Haller et al. 2019; Haller and Messer 2023). Each simulation included three chromosomes that emulate chromosomes 1, 2 and 3 of the Capsella rubella genome. Coordinates of CDS positions corresponded to those of the genome annotation from Slotte et al. (2013) and recombination rates correspond to the map generated by Slotte et al. (2013) and later curated by Brazier and Glémin (2022). Recipes from the SLiM manual were used to model deleterious mutations within CDS, free recombination between chromosomes, instantaneous changes in population size and reproduction by partial-selfing. We simulated three different demographic scenarios: constant, growth and decline. The constant population size was , the growth parameters were , , and the decline parameters were , , , where corresponds to the size of the population in the most recent time epoch. Simulations with random mating consisted of generations and those with partial-selfing consisted of generations. We set the total number of generations to for the simulations used to generate the results shown in Fig. 4 to reduce the possibility that insufficient burn-in explained changes in coalescent through time. Each simulation was repeated 100 times with the output being a population-level tree sequence (Kelleher et al. 2018; Haller et al. 2019).
Tree sequences were recapitated and simplified using pySLiM v1.0.4, tskit v0.5.6 and msprime v1.3.1 (Baumdicker et al. 2021). Five diploid individuals were sampled per-simulation for randomly mating populations, whereas ten diploids were sampled for partial-selfing populations to ensure that estimation of α was not a limiting factor in our analysis. Neutral mutations were added to tree sequences while assuming a discrete genome and a per-site per-generation mutation rate of . Windowed SFS were tallied from tree sequences while removing the contribution of CDS to both polymorphic and monomorphic sites. We also outputted a VCF file for each simulation which was used to calculate via the method of Weir and Cockerham (1984).
For each simulation scenario we pooled data from the 100 replicates and fit BGS models to these windowed SFS to obtain point estimates of model parameters and B-maps. We generated bootstrap replicates by randomly sampling 100 simulation replicates with replacement 100 times. These bootstrap replicates were analyzed to calculate the uncertainty in parameter estimates, with the lower and upper bounds of the 95% CIs calculated as the 2.5th and 97.5th percentiles of parameter estimates across bootstrap replicates.
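The resampling scheme can be sketched as a percentile bootstrap over simulation replicates. In the sketch below the `estimator` argument stands in for refitting a BGS model to the pooled resample; here it is simply a mean, which is an assumption made to keep the example self-contained, and the names and the interpolation rule used for percentiles are illustrative.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_ci(replicates, estimator, n_boot=100, seed=1):
    """Percentile 95% CI from resampling replicates with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(replicates, k=len(replicates))
        estimates.append(estimator(resample))
    estimates.sort()
    return percentile(estimates, 2.5), percentile(estimates, 97.5)


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy per-replicate statistics standing in for 100 simulations.
    reps = [rng.gauss(0.8, 0.05) for _ in range(100)]
    print(mean(reps), bootstrap_ci(reps, mean))
```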
Analysis of Capsella data
Paired-end Illumina whole genome sequence data for 50 C. grandiflora and 16 C. orientalis individuals were downloaded from BioProjects PRJNA275635 and PRJEB6689, respectively (Josephs et al. 2015; Koenig et al. 2019; Tables S3–S5). Mapping and variant calling were performed using a modified version of the snpArcher pipeline (Mirchandani et al. 2024; https://github.com/ThomasBrazier/snpArcher-dev), with the C. rubella genome assembly (GCF_000375325.1; Slotte et al. 2013) as a reference. The main modification was the inclusion of a module for calculating callable sites for each sample ( reads but mean coverage) using mosdepth (Pedersen and Quinlan 2018). Callable sites were further refined by identifying and removing putative paralogous loci using ngsParalog (Linderoth 2018). The calculation for estimating excess heterozygosity used by ngsParalog was modified to account for partial-selfing. Specifically, the probability of observing a heterozygous genotype given a derived allele frequency p and inbreeding coefficient F was calculated as:
| (11) |
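A standard expression for genotype probabilities under inbreeding, which reduces heterozygosity by a factor of (1 - F) relative to Hardy-Weinberg proportions, is sketched below. That this matches the expression used in the modified ngsParalog calculation is an assumption, and the function name is hypothetical.

```python
def genotype_probs(p, f):
    """Genotype probabilities at a biallelic site under inbreeding.

    p: derived allele frequency; f: inbreeding coefficient.
    Returns (homozygous ancestral, heterozygous, homozygous derived).
    """
    q = 1.0 - p
    het = 2.0 * p * q * (1.0 - f)
    hom_anc = q * q + f * p * q
    hom_der = p * p + f * p * q
    return hom_anc, het, hom_der


if __name__ == "__main__":
    print(genotype_probs(0.3, 0.0))  # Hardy-Weinberg proportions
    print(genotype_probs(0.3, 0.9))  # strong deficit of heterozygotes
```

Accounting for F matters for paralog detection in a selfer because a site with even moderate heterozygosity is far more surprising when most genotypes are expected to be homozygous.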
This calculation allowed us to identify paralogous loci for C. orientalis by calculating and removing paralogs iteratively. We used the resulting intervals to define non-CDS regions where at least 10 C. grandiflora and 10 C. orientalis samples had callable sites (49 Mb in total). From these intervals we calculated unfolded SFS in 10 kb windows along the genome, with multivariate hypergeometric sampling of genotypes used to downsample spectra to 10 diploids per-species. Polarization of alleles was performed using the genotypes of the other species, with sites showing polymorphism in both species being set to non-callable. Given the possibility of imperfect allele polarization we estimated an error parameter () for C. grandiflora by fitting a neutral three-epoch model to the SFS with as a free parameter. Under this model the expected SFS is calculated as:
| (12) |
where is the expected frequency of polymorphisms with i derived alleles given the demographic history. The value of estimated under this model was assumed when fitting the BGS-with-demography model. We also estimated for C. orientalis under a neutral demographic model but found that the parameter values were unrealistic (see Results) and therefore chose to fit a model of BGS-with-partial-selfing where only levels of windowed nucleotide diversity are considered.
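A common way to model polarization error is to let a fraction of polymorphisms with i derived alleles be recorded as having n - i derived alleles instead. The sketch below applies that mixing to the segregating classes of an unfolded SFS. The symbol `eps` and the restriction to segregating classes are assumptions and may not match the authors' Equation (12) term for term.

```python
def apply_polarization_error(seg_sfs, eps):
    """Mix an unfolded SFS with its mirror image.

    seg_sfs[i - 1] is the expected frequency of sites with i derived
    alleles (i = 1 .. n - 1); eps is the probability that a site is
    mis-polarized, so class i is observed as class n - i.
    """
    mirrored = list(reversed(seg_sfs))
    return [(1.0 - eps) * a + eps * b for a, b in zip(seg_sfs, mirrored)]


if __name__ == "__main__":
    n = 10
    neutral = [1.0 / i for i in range(1, n)]
    total = sum(neutral)
    neutral = [x / total for x in neutral]  # normalized, classes 1..9
    print(apply_polarization_error(neutral, 0.02))
```

Even a small error rate visibly inflates the high-frequency derived classes, because the rare mis-polarized singletons are added to classes that are otherwise sparsely populated.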
We used the recombination map and CDS annotation of C. rubella to fit BGS models, while assuming that deleterious mutations are confined to CDS. Sequence variation on chromosome 5 (KB870809.1) was excluded from the analysis because of errors in the recombination map (Brazier and Glémin 2022). Note that we included CDS positions across the entire genome in our analysis, even for chromosomes or windows with no callable sites. We assumed a de novo mutation rate of per-site per-generation, given estimates in Arabidopsis thaliana (Weng et al. 2019), and a generation time of one year. The last two bins of the SFS (, ) were masked when fitting BGS models as such polymorphisms are likely to be inflated by any reference bias. We fit BGS models using 10 replicate runs for each dataset to assess convergence. We found that the BGS-with-partial-selfing model that was fit to the C. orientalis data showed unsatisfactory parameter convergence across replicates and so we repeated the analysis with narrower parameter ranges. Results from repeated runs are found in Tables S6 and S7, with results from the maximum composite likelihood run reported in the main text.
Acknowledgments
We would like to thank Thomas Brazier for his contributions to the modified snpArcher pipeline used in this study, Paul Blischak for advice on using dadi with partial-selfing and Burçin Yıldırım for providing feedback on a previous version of this manuscript.
Contributor Information
Alexander Mackintosh, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
Maxence Brault, University of Rennes, CNRS, ECOBIO (Ecosystems, Biodiversity, Evolution), Rennes, France.
Denis Roze, Sorbonne Université, CNRS, UMR 7144 AD2M, DiSEEM, Roscoff, France.
Martin Lascoux, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
Sylvain Glémin, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden; University of Rennes, CNRS, ECOBIO (Ecosystems, Biodiversity, Evolution), Rennes, France.
Supplementary material
Supplementary material is available at Molecular Biology and Evolution online.
Funding
A.M. is supported by a grant from the Swedish Research Council (2022-03099), awarded to S.G. M.B. is supported by the Agence Nationale de la Recherche Grant ANR-23-CE02-0003.
Data availability
Code for performing forward simulations with SLiM and fitting BGS models is available at https://zenodo.org/records/17465489 (Mackintosh 2025). The filtered VCF file, bed file of callable regions and windowed spectra for the two Capsella species can be found at the same repository. Python code for fitting BGS models to sequence data is available at https://github.com/A-J-F-Mackintosh/Binfer.
References
- Andersen EC et al. Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity. Nat Genet. 2012:44:285–290. 10.1038/ng.1050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baker HG. Self-compatibility and establishment after long-distance dispersal. Evolution. 1955:9:347–349. 10.2307/2405656. [DOI] [Google Scholar]
- Barroso GV, Ragsdale AP. A model for background selection in non-equilibrium populations [preprint]. bioRxiv. 2025. 10.1101/2025.02.19.639084. [DOI]
- Barry P, Broquet T, Gagnaire PA. Age-specific survivorship and fecundity shape genetic diversity in marine fishes. Evol Lett. 2022:6:46–62. 10.1002/evl3.265. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baumdicker F et al. Efficient ancestry and mutation simulation with msprime 1.0. Genetics. 2021:220:iyab229. 10.1093/genetics/iyab229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blischak PD, Barker MS, Gutenkunst RN. Inferring the demographic history of inbred species from genome-wide SNP frequency data. Mol Biol Evol. 2020:37:2124–2136. 10.1093/molbev/msaa042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brazier T, Glémin S. Diversity and determinants of recombination landscapes in flowering plants. PLoS Genet. 2022:18:e1010141. 10.1371/journal.pgen.1010141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Buffalo V. Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin’s paradox. Elife. 2021:10:e67509. 10.7554/eLife.67509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burgarella C et al. Mating systems and recombination landscape strongly shape genetic diversity and selection in wheat relatives. Evol Lett. 2024:8:866–880. 10.1093/evlett/qrae039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth B. The effects of deleterious mutations on evolution at linked sites. Genetics. 2012:190:5–22. 10.1534/genetics.111.134288. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth B, Jensen JD. How can we resolve Lewontin’s paradox? Genome Biol Evol. 2022:14:evac096. 10.1093/gbe/evac096. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth B, Morgan M, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993:134:1289–1303. 10.1093/genetics/134.4.1289. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen J, Glémin S, Lascoux M. Genetic diversity and the efficacy of purifying selection across plant and animal species. Mol Biol Evol. 2017:34:1417–1428. 10.1093/molbev/msx088. [DOI] [PubMed] [Google Scholar]
- Comeron JM. Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet. 2014:10:e1004434. 10.1371/journal.pgen.1004434. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? [preprint]. bioRxiv. 10.1101/042598.. [DOI]
- Corbett-Detig RB, Hartl DL, Sackton TB. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 2015:13:e1002112. 10.1371/journal.pbio.1002112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Daigle A, Johri P. Hill-Robertson interference may bias the inference of fitness effects of new mutations in highly selfing species. Evolution. 2025:79:342–363. 10.1093/evolut/qpae168. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elyashiv E et al. A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 2016:12:e1006130. 10.1371/journal.pgen.1006130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Glémin S, Ronfort J. Adaptation and maladaptation in selfing and outcrossing species: new mutations versus standing variation. Evolution. 2013:67:225–240. 10.1111/j.1558-5646.2012.01778.x. [DOI] [PubMed] [Google Scholar]
- Haldane JBS. The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet. 1919:8:293–303. [Google Scholar]
- Haller BC, Galloway J, Kelleher J, Messer PW, Ralph PL. Tree-sequence recording in SLiM opens new horizons for forward-time simulation of whole genomes. Mol Ecol Resour. 2019:19:552–566. 10.1111/men.2019.19.issue-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haller BC, Messer PW. SLiM 4: multispecies eco-evolutionary modeling. Am Nat. 2023:201:E127–E139. 10.1086/723601. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hartfield M, Bataillon T. Selective sweeps under dominance and inbreeding. G3 (Bethesda). 2020:10:1063–1075. 10.1534/g3.119.400919. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hasan A, Whitlock MC. FST and genetic diversity in an island model with background selection. PLoS Genet. 2024:20:e1011225. 10.1371/journal.pgen.1011225. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hill WG, Robertson A. The effect of linkage on limits to artificial selection. Genet Res (Camb). 1966:8:269–294. 10.1017/S0016672300010156. [DOI] [PubMed] [Google Scholar]
- Hudson RR, Kaplan NL. Deleterious background selection with recombination. Genetics. 1995:141:1605–1617. 10.1093/genetics/141.4.1605. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Johnson SG. The NLopt nonlinear-optimization package. 2007. https://github.com/stevengj/nlopt.
- Johri P et al. The impact of purifying and background selection on the inference of population history: problems and prospects. Mol Biol Evol. 2021:38:2986–3003. 10.1093/molbev/msab050.
- Johri P, Charlesworth B, Jensen JD. Toward an evolutionarily appropriate null model: jointly inferring demography and purifying selection. Genetics. 2020:215:173–192. 10.1534/genetics.119.303002.
- Josephs EB, Lee YW, Stinchcombe JR, Wright SI. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc Natl Acad Sci U S A. 2015:112:15390–15395. 10.1073/pnas.1503027112.
- Jouganous J, Long W, Ragsdale AP, Gravel S. Inferring the joint demographic history of multiple populations: beyond the diffusion approximation. Genetics. 2017:206:1549–1567. 10.1534/genetics.117.200493.
- Kaelo P, Ali MM. Some variants of the controlled random search algorithm for global optimization. J Optim Theory Appl. 2006:130:253–264. 10.1007/s10957-006-9101-0.
- Kamran-Disfani A, Agrawal A. Selfing, adaptation and background selection in finite populations. J Evol Biol. 2014:27:1360–1371. 10.1111/jeb.12343.
- Keightley PD, Eyre-Walker A. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics. 2007:177:2251–2261. 10.1534/genetics.107.080663.
- Kelleher J, Thornton KR, Ashander J, Ralph PL. Efficient pedigree recording for fast population genetics simulation. PLoS Comput Biol. 2018:14:e1006581. 10.1371/journal.pcbi.1006581.
- Khudiakova KA, Boenkost F, Tourniaire J. Genealogies under purifying selection [preprint]. bioRxiv. 2024. 10.1101/2024.10.15.618444.
- Koenig D et al. Long-term balancing selection drives evolution of immunity genes in Capsella. Elife. 2019:8:e43606. 10.7554/eLife.43606.
- Leffler EM et al. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 2012:10:1–9. 10.1371/journal.pbio.1001388.
- Lewontin RC. The genetic basis of evolutionary change. Columbia University Press; 1974.
- Li WH. Distribution of nucleotide differences between two randomly chosen cistrons in a subdivided population: the finite island model. Theor Popul Biol. 1976:10:303–308. 10.1016/0040-5809(76)90021-6.
- Liang YY et al. Linked selection shapes the landscape of genomic variation in three oak species. New Phytol. 2022:233:555–568. 10.1111/nph.v233.1.
- Linderoth T. Identifying population histories, adaptive genes, and genetic duplication from population-scale next generation sequencing. University of California; 2018.
- Liu X, Fu YX. Stairway plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol. 2020:21:280. 10.1186/s13059-020-02196-9.
- Mackintosh A. Data and code associated with ‘Estimating the reduction in genetic diversity from background selection under non-equilibrium demography and partial selfing’. 2025. https://zenodo.org/records/17465489.
- Mackintosh A et al. The determinants of genetic diversity in butterflies. Nat Commun. 2019:10:3466. 10.1038/s41467-019-11308-4.
- Matheson J, Masel J. Background selection from unlinked sites causes nonindependent evolution of deleterious mutations. Genome Biol Evol. 2024:16:evae050. 10.1093/gbe/evae050.
- McVicker G, Gordon D, Davis C, Green P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 2009:5:e1000471. 10.1371/journal.pgen.1000471.
- Mirchandani CD et al. A fast, reproducible, high-throughput variant calling workflow for population genomics. Mol Biol Evol. 2024:41:msad270. 10.1093/molbev/msad270.
- Murphy DA, Elyashiv E, Amster G, Sella G. Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements. Elife. 2022:12:e76065. 10.7554/eLife.76065.
- Nam K et al. Evidence that the rate of strong selective sweeps increases with population size in the great apes. Proc Natl Acad Sci U S A. 2017:114:1613–1618. 10.1073/pnas.1605660114.
- Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965:7:308–313. 10.1093/comjnl/7.4.308.
- Nicolaisen LE, Desai MM. Distortions in genealogies due to purifying selection and recombination. Genetics. 2013:195:221–230. 10.1534/genetics.113.152983.
- Nordborg M. Structured coalescent processes on different time scales. Genetics. 1997:146:1501–1514. 10.1093/genetics/146.4.1501.
- Nordborg M, Charlesworth B, Charlesworth D. The effect of recombination on background selection. Genet Res (Camb). 1996:67:159–174. 10.1017/S0016672300033619.
- Nordborg M, Donnelly P. The coalescent process with selfing. Genetics. 1997:146:1185–1195. 10.1093/genetics/146.3.1185.
- Ollivier L, Charlesworth B, Pouyet F. Beyond recombination: exploring the impact of meiotic frequency on genome-wide genetic diversity. PLoS Genet. 2025:21:e1011798. 10.1371/journal.pgen.1011798.
- Pannell JR. Evolution of the mating system in colonizing plants. In: Invasion genetics: the Baker and Stebbins legacy. John Wiley & Sons; 2016. p. 57–80.
- Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018:34:867–868. 10.1093/bioinformatics/btx699.
- Pope NS et al. The expansion of agriculture has shaped the recent evolutionary history of a specialized squash pollinator. Proc Natl Acad Sci U S A. 2023:120:e2208116120. 10.1073/pnas.2208116120.
- Romiguier J et al. Comparative population genomics in animals uncovers the determinants of genetic diversity. Nature. 2014:515:261–263. 10.1038/nature13685.
- Roze D. Background selection in partially selfing populations. Genetics. 2016:203:937–957. 10.1534/genetics.116.187955.
- Seger J et al. Gene genealogies strongly distorted by weakly interfering mutations in constant environments. Genetics. 2010:184:529–545. 10.1534/genetics.109.103556.
- Slotte T et al. The Capsella rubella genome and the genomic consequences of rapid mating system evolution. Nat Genet. 2013:45:831–835. 10.1038/ng.2669.
- Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res (Camb). 1974:23:23–35. 10.1017/S0016672300014634.
- Stoffel M et al. Demographic histories and genetic diversity across pinnipeds are shaped by human exploitation, ecology and life-history. Nat Commun. 2018:9:4836. 10.1038/s41467-018-06695-z.
- Strütt S, Excoffier L, Peischl S. A generalized structured coalescent for purifying selection without recombination. Genetics. 2025:229:iyaf013. 10.1093/genetics/iyaf013.
- Torres R, Szpiech ZA, Hernandez RD. Human demographic history has amplified the effects of background selection across the genome. PLoS Genet. 2018:14:e1007387. 10.1371/journal.pgen.1007387.
- Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984:38:1358–1370. 10.1111/j.1558-5646.1984.tb05657.x.
- Weng ML et al. Fine-grained analysis of spontaneous mutation spectrum and frequency in Arabidopsis thaliana. Genetics. 2019:211:703–714. 10.1534/genetics.118.301721.
Data Availability Statement
Code for performing forward simulations with SLiM and fitting BGS models is available at https://zenodo.org/records/17465489 (Mackintosh 2025). The filtered VCF file, the BED file of callable regions and the windowed spectra for the two Capsella species can be found at the same repository. Python code for fitting BGS models to sequence data is available at https://github.com/A-J-F-Mackintosh/Binfer.





