Genetics. 2012 May;191(1):233–246. doi: 10.1534/genetics.111.138073

The Role of Background Selection in Shaping Patterns of Molecular Evolution and Variation: Evidence from Variability on the Drosophila X Chromosome

Brian Charlesworth
PMCID: PMC3338263  PMID: 22377629

Abstract

In the putatively ancestral population of Drosophila melanogaster, the ratio of silent DNA sequence diversity for X-linked loci to that for autosomal loci is approximately one, instead of the expected “null” value of 3/4. One possible explanation is that background selection (the hitchhiking effect of deleterious mutations) is more effective on the autosomes than on the X chromosome, because of the lack of crossing over in male Drosophila. The expected effects of background selection on neutral variability at sites in the middle of an X chromosome or an autosomal arm were calculated for different models of chromosome organization and methods of approximation, using current estimates of the deleterious mutation rate and distributions of the fitness effects of deleterious mutations. The robustness of the results to different distributions of fitness effects, dominance coefficients, mutation rates, mapping functions, and chromosome size was investigated. The predicted ratio of X-linked to autosomal variability is relatively insensitive to these variables, except for the mutation rate and map length. Provided that the deleterious mutation rate per genome is sufficiently large, it seems likely that background selection can account for the observed X to autosome ratio of variability in the ancestral population of D. melanogaster. The fact that this ratio is much less than one in D. pseudoobscura is also consistent with the model’s predictions, since this species has a high rate of crossing over. The results suggest that background selection may play a major role in shaping patterns of molecular evolution and variation.


MEAN silent site DNA sequence diversities in the putatively ancestral East African populations of Drosophila melanogaster seem to be approximately the same for the X chromosome (X) and autosomes (A) (Andolfatto 2001; Hutter et al. 2007; Singh et al. 2007), despite the fact that the “null” expectation for the ratio of the effective population sizes (Ne) of X and A is 3/4 for the case of a 1:1 sex ratio and purely random variation in offspring number in both sexes (Wright 1931). There is little evidence for an X vs. A difference in mutation rate at silent sites in Drosophila, after possible differences in the intensity of selection and mutational biases on silent sites for X and A are taken into account (Bauer and Aquadro 1997; Hutter et al. 2007; Keightley et al. 2009; Zeng and Charlesworth 2010; Haddrill et al. 2011) (but see Bachtrog 2008). This observation therefore suggests an equality of Ne values for X and A, since neutral diversity under the infinite sites model is equal to the product of 4Ne and the neutral mutation rate per site (Kimura 1971), where Ne is defined as one-half of the expected coalescent time for a pair of alleles at a given locus (Charlesworth and Charlesworth 2010, p. 217).

This equality of Ne values for X and A could reflect a highly female-biased sex ratio and/or a very high variance in male reproductive success (Hedrick 2007; Hutter et al. 2007; Ellegren 2009; Vicoso and Charlesworth 2009a), both of which reduce the effective population size of males relative to females. This reduction would have a smaller effect on X than on A, since the Drosophila X spends two-thirds of its time in females and only one-third of its time in males, whereas an autosome spends half of its time in each sex. But this difference between X and A also affects the population-effective rate of recombination for X vs. A, which controls the rate of breakdown of linkage disequilibrium. Drosophila males lack recombinational exchange between homologous chromosomes (Ashburner et al. 2005); the population-effective recombination rate for a given rate of recombination r in females between two loci is therefore 0.5r for A and 0.667r for X (Charlesworth and Charlesworth 2010, p. 381). This suggests that hitchhiking effects may have less influence on variability at typical X loci compared with A loci, given similar selection intensities for X and A mutations, consistent with the observation that X and A loci with similar population-effective recombination rates appear to have relative levels of silent site variability that are close to the null expectation (Vicoso and Charlesworth 2009b); a contrary effect can be produced by recurrent selective sweeps of partially recessive, positively selected mutations (Aquadro et al. 1994; Betancourt et al. 2004; Ellegren 2009).

An alternative explanation was proposed by Hutter et al. (2007), who suggested that a recent population expansion in the Zimbabwe population of D. melanogaster has differentially affected X-linked and autosomal variability. However, recent analyses of synonymous site variability in this population have cast doubt on the reality of such an expansion, when selection on codon usage is taken into account (Zeng and Charlesworth 2009, 2010). Similarly, the fact that the X/A diversity ratio is close to one for an African D. simulans population, which lacks inversions, suggests that a reduction in autosomal diversity caused by hitchhiking effects of autosomal inversion polymorphisms in D. melanogaster (Andolfatto 2001; Singh et al. 2007) cannot be a general explanation for this effect.

The purpose of this article is to investigate whether the process of background selection, the hitchhiking of neutral or nearly neutral variability by linked deleterious mutations (Charlesworth et al. 1993; Charlesworth 2012), can account for these apparently equal X and autosomal Ne values or at least contribute to an X/A ratio that differs substantially from 3/4, as was suggested earlier by Aquadro et al. (1994). To do this, a model of the effect of background selection (BGS) on variability across a large, normally recombining region of a chromosome is needed. A previous investigation of this problem in D. melanogaster by Charlesworth (1996) used phenotypic estimates of the strength of selection against deleterious mutations and the overall mutation rate to deleterious alleles, which have now been superseded by estimates based on DNA sequence data (Loewe and Charlesworth 2006; Loewe et al. 2006; Haag-Liautard et al. 2007; Keightley and Eyre-Walker 2007; Eyre-Walker and Keightley 2009; Keightley et al. 2009; Schneider et al. 2011; Wilson et al. 2011). In addition, the study by Charlesworth (1996) assumed that sites subject to purifying selection were spread uniformly along the chromosome, whereas in reality they are clustered into coding sequences and blocks of functional noncoding sequences (Misra et al. 2002).

The present study attempts to remedy these deficiencies, with specific reference to the effect of BGS on the X/A ratio of effective population sizes or neutral diversities under the infinite sites model, both of which are proportional to the expected coalescent time for a pair of alleles. Equations 4–9 of Nordborg et al. (1996) provide formulas for the effect of BGS on the coalescent time at a given nucleotide site. Comparisons with the results of computer simulations have shown that these formulas are accurate for the case of a single chromosome, provided that the strength of selection is sufficiently strong in comparison with the effect of genetic drift that the frequencies of deleterious mutant alleles can be treated as though they are in deterministic equilibrium (Nordborg et al. 1996). The model developed here takes into account the fact that some noncoding sequences in Drosophila (both intergenic and intronic) are strongly conserved, so that deleterious mutations affecting them probably have similar selective effects to nonsynonymous mutations, whereas another large class of noncoding sequences is subject to weaker or no selective constraints (Haddrill et al. 2005; Halligan and Keightley 2006; Casillas et al. 2007; Sella et al. 2009).

This article is concerned with the effect of BGS over a whole chromosome or chromosome arm. To simplify the analysis, I assume that we are dealing with a site located in the middle of a Drosophila chromosome arm and that recombination rates per unit physical distance are uniform across the arm. This will somewhat underestimate the overall effect of BGS on variability in a Drosophila population, since recombination rates are lower at the telomeres and centromeres than in the middle of an arm (Ashburner et al. 2005), but in practice most loci used in resequencing studies in Drosophila come from regions with high levels of recombination, which constitute the majority of the genome (Charlesworth 1996). In addition, the effect of lower variability in low recombination regions is partly counteracted by the weaker effect of BGS at the ends of a chromosome arm, caused by the lack of adjacent genes compared with the sites in the middle of a chromosome (Nordborg et al. 1996), so that consideration of the properties of the middle of a chromosome arm should provide a reasonably good picture of the typical level of variability in the presence of BGS in Drosophila, except at the extreme ends of a chromosome or chromosome arm.

Theory and Methods

Model assumptions: types of sequence and their selection coefficients

Predictions are developed using several different levels of approximations, with the overall goal of evaluating the sensitivity of the predicted X/A ratio of neutral diversity to the assumptions of the models. In line with the facts described above, three classes of selected site are modeled: strongly selected nonsynonymous sites, strongly selected noncoding sites, and weakly selected noncoding sites. Probability distributions of the fitness effects of newly arising deleterious mutations are assumed, such that the probability density of selection coefficient s against a homozygous mutation at a given nucleotide site is φk(s) for the kth class of selected site, where k = 1 for nonsynonymous sites, 2 for strongly selected noncoding sites, and 3 for weakly selected noncoding sites. For reasons given below, both nonsynonymous and strongly selected noncoding sequences are assumed here to share the same distribution, so that φ1(s) = φ2(s), although this assumption is easy to relax. Sex differences in selection coefficients are also ignored in most of the results presented below; again, reasons are given later as to why this is unlikely to be important.

The different classes of sites are organized into ng coding sequences of length lg bp, separated by “intergenic” sequences, which are divided into ns strongly selected noncoding sequences of length lis and ns + 1 weakly selected noncoding sequences of length liw (see Figure 1). Each coding sequence is flanked by weakly selected noncoding sequences, except at the telomere and centromere, where any nongenic DNA beyond the last coding sequences at the ends of the arm is ignored. The total length of noncoding sequence separating a pair of coding sequences is thus li = nslis + (ns + 1)liw. By increasing the number of coding sequences and decreasing their length accordingly, a chromosome arm with genes that includes long introns, containing a mixture of strongly and weakly selected sites, can easily be modeled. For simplicity, the account below mostly refers to noncoding sequences as intergenic.

Figure 1. The organization of noncoding sequences around two coding sequences (blue), into blocks of strongly selected sequence (green) and weakly selected sequences (red).

The focal neutral site for which the effect of BGS is to be calculated is assumed to be located in the center of the chromosome arm, in the middle of a weakly selected intergenic sequence, so that the distance to the nearest coding sequence is 0.5(li – 1), assuming li to be odd. To accommodate this assumption, ng and ns are assumed to be even, and liw is assumed to be odd.

Deleterious mutations involved in BGS effects are assumed to be under such strong selection that they are kept at low frequencies. For nonrecessive autosomal mutations, this means that the equilibrium frequency of a deleterious mutation at a given site in an infinitely large randomly mating population is determined by the ratio of u, the rate of mutation to a deleterious variant at that site, to ts, the heterozygous selection coefficient against the deleterious variant (Charlesworth and Charlesworth 2010, p. 161). In a large finite population, the mean frequency of deleterious mutations over a collection of sites with the same mutation and selection parameters is close to this equilibrium, provided that recombination is sufficiently frequent (Nordborg et al. 1996). With dominance coefficient h and selection coefficient s against homozygotes, the effective selection coefficient against the mutation is thus ts = hs (h > 0). For X-linked mutations with equal selection on the two sexes, the corresponding effective selection coefficient is given approximately by ts = (2h + 1)s/3 (Charlesworth and Charlesworth 2010, p. 98); corresponding formulas can be obtained for the case of sex-specific effects on fitness.
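The two expressions for the effective selection coefficient can be written as a pair of small functions. This is only a minimal sketch: the function names and the example value of s are illustrative assumptions, not part of the original analysis.

```python
def ts_autosome(h: float, s: float) -> float:
    """Effective selection coefficient t_s = h*s for a nonrecessive autosomal mutation."""
    if h <= 0:
        raise ValueError("the low-frequency approximation requires h > 0")
    return h * s


def ts_x_linked(h: float, s: float) -> float:
    """Approximate t_s = (2h + 1)s/3 for an X-linked mutation, equal selection in both sexes."""
    return (2.0 * h + 1.0) * s / 3.0


s_example = 1e-3  # illustrative homozygous selection coefficient (an assumption)
for h in (0.5, 0.2):
    print(f"h = {h}: autosome t_s = {ts_autosome(h, s_example):.3e}, "
          f"X t_s = {ts_x_linked(h, s_example):.3e}")
```

For a given h and s, the X-linked value exceeds the autosomal one whenever h < 1, because hemizygous males express the full effect s.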

Model assumptions: mutation rates

For most of the results presented here, only a single chromosome arm is considered, on the assumption that the dissipation of the effects of BGS with recombinational distance is sufficiently strong that sites on one chromosome arm have little effect on another (evidence to support this assumption is presented in Results and Discussion). We define the diploid deleterious mutation rate for the kth class of selected site on a chromosome arm as Uk, where Uk is equal to the sum of 2u over all sites subject to purifying selection of this type of site on the chromosome arm in question. The standard numerical values for the Uk used in most of the calculations below were arrived at in the following way. The net deleterious diploid mutation rate for D. melanogaster was estimated by Haag-Liautard et al. (2007) to be 1.2 mutations per generation. The corresponding deleterious mutation rate, UD, for a typical chromosome arm contributing ∼20% of the euchromatic genome is equal to 0.24. With an average of 2800 genes per arm and an average total coding sequence length per gene of 1500 bp (Misra et al. 2002), the assumption that 70% of coding sites are nonsynonymous (Loewe and Charlesworth 2007) gives ∼2.94 Mb of sites capable of generating nonsynonymous mutations out of a total of 4.2 Mb coding sequence. In a chromosome arm of 20 Mb, this leaves ∼15.8 Mb of noncoding sequences. The majority of these are subject to some level of selective constraint (Halligan and Keightley 2006); ∼25% are strongly conserved sequences with an average length of ∼40 bp (Casillas et al. 2007).

The remaining 75% of noncoding sequences are here assumed to be under weak purifying selection. In the absence of introns, a total length li = 5659 bp of sequence between a pair of coding sequences is allowed, which is divided into ns = 36 strongly selected sequences of length 39 bp and ns + 1 = 37 weakly selected noncoding sequences of length 115 bp, organized as described above (these numbers are reduced proportionately if ng is increased and lg is reduced, to allow for long introns separating exons within genes). This gives a total of (ng – 1)nsls = 2799 × 36 × 39 = 3.93 Mb of strongly selected noncoding sequence and (ng – 1)(ns + 1)lw = 2799 × 37 × 115 = 11.91 Mb of weakly selected noncoding sequence. Together with the nonsynonymous sites, this gives a total of 18.78 Mb of sequence that is potentially under significant selection [selection on codon usage acting at synonymous sites is ignored here, since it is too weak to have significant BGS effects (Zeng and Charlesworth 2009; Zeng 2010)].
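The sequence-length bookkeeping in the last two paragraphs can be reproduced in a few lines. This is a sketch using only the numbers quoted in the text; the variable names are my own.

```python
n_g = 2800   # coding sequences per chromosome arm
l_g = 1500   # bp per coding sequence
n_s = 36     # strongly selected noncoding blocks per intergenic region
l_s = 39     # bp per strongly selected block
l_w = 115    # bp per weakly selected block

# Total noncoding length between a pair of adjacent coding sequences.
l_i = n_s * l_s + (n_s + 1) * l_w

# Sites in each class, summed over the arm.
nonsyn_bp = 0.7 * n_g * l_g                    # 70% of coding sites are nonsynonymous
strong_nc_bp = (n_g - 1) * n_s * l_s
weak_nc_bp = (n_g - 1) * (n_s + 1) * l_w
total_selected_bp = nonsyn_bp + strong_nc_bp + weak_nc_bp

# Distance from the central focal site to the nearest coding sequence.
dist_to_nearest_cds = (l_i - 1) // 2

print(l_i, strong_nc_bp, weak_nc_bp, round(total_selected_bp / 1e6, 2), dist_to_nearest_cds)
```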

To assess the contributions of these types of sequence to the deleterious mutation rate, we also need estimates of their respective levels of selective constraint, i.e., the fraction of mutations in each class that are sufficiently deleterious to ensure their elimination from the population (Halligan and Keightley 2006). Table 2 of Casillas et al. (2007) suggests that the divergence per site between D. melanogaster and D. simulans for strongly conserved noncoding sequences is comparable to that for nonsynonymous sites (see Table 1 of Sella et al. 2009), and their fit of a gamma distribution to the distribution of selection coefficients against mutations in these sequences gave a similar value of the shape parameter a to published estimates for nonsynonymous mutations (Keightley and Eyre-Walker 2007; Eyre-Walker and Keightley 2009; Haddrill et al. 2010).

There is also evidence that similar values of the fraction α of fixed differences between species caused by positive selection apply to strongly selected noncoding and nonsynonymous mutations (Casillas et al. 2007; Sella et al. 2009); these need to be removed from estimates of between-species divergence before calculating selective constraint values, which apply only to sites subject to purifying selection. This component of the ratio of divergence at strongly selected sites relative to that for putatively neutral sites is taken here to be equal to 0.078, consistent with the observed ratio of nonsynonymous to synonymous divergence between D. melanogaster and D. simulans of ∼0.13 and a somewhat conservative α-value of 0.6, yielding a constraint value for strongly selected sites of cs = 0.922, for both class 1 and class 2 sites. For weakly constrained noncoding sites, a constraint value of cw = 0.572 is used here, which is slightly higher than the value indicated by Table 2 of Casillas et al. (2007) and Table 1 of Sella et al. (2009), to accommodate a small fraction of positively selected mutations. If a shape parameter of the gamma distribution of 0.3 is assumed, consistent with the evidence just mentioned, then mean s values for strongly selected and weakly selected sites that are consistent with these constraint values can be calculated by the method described in the Appendix of Haddrill et al. (2010); assuming a dominance coefficient of 0.5, these are found to be s1 = s2 = 2.5 × 10⁻³ (strongly selected sites) and a3 = 0.3, s3 = 8 × 10⁻⁶ (weakly selected sites), assuming Ne = 10⁶.

Let the proportion of strongly selected sites among all sites potentially under significant selection be xs = xsc + xsn, where xsc and xsn are the proportions of nonsynonymous sites and strongly selected noncoding sites among potentially selected sites; the proportion of weakly selected sites is xw = 1 – xs. From the way in which UD was estimated (Haag-Liautard et al. 2007), the overall mutation rate for a chromosome arm contributed by these sequences is UA = UD/(xs cs + xw cw). We have xsc = 2.94/18.78 = 0.157, xsn = 3.93/18.78 = 0.210, xs = 0.367, and xw = 0.633. Use of these numerical values gives UA = 0.24/(0.367 × 0.922 + 0.633 × 0.572) = 0.343. We then obtain the following values of the deleterious mutation rates for each class: U1 = UAxsccs = 0.050, U2 = UAxsncs = 0.066, and U3 = UAxwcw = 0.124.
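The partition of the arm-wide deleterious mutation rate among the three classes can be checked numerically. The sketch below works from the sequence lengths in Mb quoted earlier rather than from the rounded proportions, so the results agree with the values in the text only to within rounding; variable names are my own.

```python
# Lengths (Mb) of each class of potentially selected sequence, as quoted in the text.
nonsyn_mb, strong_nc_mb, weak_nc_mb = 2.94, 3.93, 11.91
total_mb = nonsyn_mb + strong_nc_mb + weak_nc_mb

U_D = 0.24                 # deleterious diploid mutation rate for the arm
c_s, c_w = 0.922, 0.572    # constraint for strongly and weakly selected sites

x_sc = nonsyn_mb / total_mb      # proportion of nonsynonymous sites
x_sn = strong_nc_mb / total_mb   # proportion of strongly selected noncoding sites
x_s = x_sc + x_sn
x_w = 1.0 - x_s

# Overall mutation rate for these sequences, then the deleterious share of each class.
U_A = U_D / (x_s * c_s + x_w * c_w)
U_1 = U_A * x_sc * c_s
U_2 = U_A * x_sn * c_s
U_3 = U_A * x_w * c_w

print(f"U_A = {U_A:.3f}, U_1 = {U_1:.3f}, U_2 = {U_2:.3f}, U_3 = {U_3:.3f}")
```

By construction the three class-specific rates sum to UD, which is a convenient consistency check on the constraint values.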

These values need to be reduced by removing mutations that fall below the threshold value for which the formulas for BGS used below are likely to be accurate (Nordborg et al. 1996), giving truncated mutation rates of UTk for each class of site. In the numerical results presented below, the distributions for both the strongly selected and the weakly selected sites were truncated at the lower end at sT = 5 × 10⁻⁶, corresponding to an Nes value of 5 in a population of effective size 10⁶. This procedure will lead to an underestimate of the effect of BGS, as the mutations that fall below this threshold will exert some effects, although not as large as predicted by the formulas used below (Zeng and Charlesworth 2011). Using the gamma distribution and the mutation rate parameters described above, the truncated deleterious mutation rates are UT1 = 0.044, UT2 = 0.058, and UT3 = 0.044, giving a total truncated deleterious mutation rate UT = 0.146. The truncated mutation rates for the nonsynonymous and strongly selected sites are only slightly lower than the untruncated values, whereas ∼65% of the weakly selected noncoding sites are treated as neutral.
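The truncated rates can be approximated by multiplying each Uk by the fraction of the gamma distribution lying above sT. The sketch below assumes that this is how the truncation acts (mutation rate independent of s) and evaluates the gamma cumulative distribution with a simple power series; the function names are illustrative, and this is not the original FORTRAN code.

```python
import math


def gamma_cdf(shape: float, x: float, terms: int = 200) -> float:
    """Regularized lower incomplete gamma function P(shape, x) via its power series."""
    term = 1.0 / shape
    total = term
    for n in range(1, terms):
        term *= x / (shape + n)
        total += term
    return total * x ** shape * math.exp(-x) / math.gamma(shape)


def retained_fraction(shape: float, mean_s: float, s_t: float) -> float:
    """Fraction of a gamma distribution (given shape and mean) with s >= s_t."""
    x_t = s_t * shape / mean_s   # scaled variable x = s * a / mean
    return 1.0 - gamma_cdf(shape, x_t)


s_T = 5e-6
strong = retained_fraction(0.3, 2.5e-3, s_T)   # classes 1 and 2
weak = retained_fraction(0.3, 8e-6, s_T)       # class 3

U_T1 = 0.050 * strong
U_T2 = 0.066 * strong
U_T3 = 0.124 * weak
print(f"{U_T1:.3f} {U_T2:.3f} {U_T3:.3f} total {U_T1 + U_T2 + U_T3:.3f}")
print(f"weakly selected mutations treated as neutral: {1.0 - weak:.2f}")
```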

Exact model of BGS

For a focal neutral site, the expected effect of BGS caused by a given type of selected site is parameterized by the ratio of the coalescent time for this site to its “neutral” value in the absence of BGS (Hudson and Kaplan 1994, 1995; Nordborg et al. 1996), denoted here by Bk for the kth class of site. For simplicity, the mutation rate at a site in class k is assumed in the following analyses to be independent of s for these sites, but only mutations with an s above the truncation point sT described above are included in the calculations. Because different organizations of sites apply to nonsynonymous, strongly selected noncoding sites and weakly selected noncoding sites, each of these must be considered separately.

For all sites included in a given class k, we have

$$B_k \approx \exp\left\{-u \int_{s_T}^{1} \sum_i \frac{t_s\,\varphi_k(s)}{\left(t_s + r_i[1 - t_s]\right)^2}\,ds\right\}, \tag{1}$$

where u is the mean haploid mutation rate per base pair, and ri is the frequency of recombination between the focal site and the ith site of class k (Nordborg et al. 1996).

We need to relate ri for each site to the physical distance from the focal site to the ith site under selection. Let the total map length in females of the chromosome arm be M morgans (M), so that the map distance per base pair is ρ = M/(nglg + [ng – 1]li). The simplest mapping function is a linear relation between map distance and recombination rate. Taking into account the lack of crossing over in male Drosophila (Ashburner et al. 2005), and averaging recombination rates across the two sexes as explained in the Introduction, this model yields population-effective recombination rates for a distance of d bp between the focal site and a given selected site of rdA = 0.5ρd and rdX = 0.667ρd, for A and X sites, respectively. More generally, the recombination rate can be related to the map distance z = ρd by a mapping function that allows for the occurrence of double crossovers. Here, the “standard” mapping function of Charlesworth (1996) is used, which was shown by Cobbs (1978) to provide a good fit to Drosophila data. The population-effective recombination rates for map distance z are 0.25{1 – cos(2z)exp(–2z)} and 0.333{1 – cos(2z)exp(–2z)}, for the A and the X, respectively.
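The two mapping functions are easy to compare side by side. The following sketch encodes the sex-averaging factors (1/2 for A, 2/3 for X) and the two formulas given above; the function names and the dictionary are my own.

```python
import math

# Fraction of time a chromosome spends in (recombining) females.
SEX_AVERAGE = {"A": 0.5, "X": 2.0 / 3.0}


def r_linear(z: float, chrom: str) -> float:
    """Population-effective recombination rate under a linear map (z in morgans)."""
    return SEX_AVERAGE[chrom] * z


def r_standard(z: float, chrom: str) -> float:
    """Population-effective rate under the 'standard' mapping function."""
    female_rate = 0.5 * (1.0 - math.cos(2.0 * z) * math.exp(-2.0 * z))
    return SEX_AVERAGE[chrom] * female_rate


for z in (1e-3, 0.05, 0.25, 0.5):
    print(z, r_linear(z, "A"), r_standard(z, "A"), r_standard(z, "X"))
```

The two functions agree closely at small map distances and diverge as double crossovers become appreciable, which is why the choice matters only for selected sites far from the focal site.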

Given the values of the parameters described above, and the distribution of s, it is straightforward in principle to evaluate Bk for a given k by applying numerical integration over φk(s) to the summation in Equation 1. This requires specification of the distances of each selected site from the focal sites, as outlined in the Appendix for noncoding sites. The summation in Equation 1 proceeds along the chromosome arm in one direction, starting at the centrally located focal site, and the final result is doubled to estimate the sum for the whole arm.
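A toy version of this summation may help fix ideas. The sketch below makes strong simplifying assumptions that are not in the original analysis: every site within 100 kb on either side of the focal site is selected, the distribution of s is collapsed to a single value of ts, and a linear map is used. It sums in one direction and doubles the result, as described above, and compares the sum with the corresponding continuum integral.

```python
import math


def exponent_one_side(u: float, t_s: float, rho_eff: float, n_sites: int) -> float:
    """Sum of u*t_s / (t_s + r_i[1 - t_s])^2 over sites 1..n_sites bp from the focal site."""
    total = 0.0
    for d in range(1, n_sites + 1):
        r_i = rho_eff * d                      # linear map
        total += u * t_s / (t_s + r_i * (1.0 - t_s)) ** 2
    return total


u = 8.5e-9         # illustrative per-site haploid mutation rate (assumption)
t_s = 1.25e-3      # single illustrative effective selection coefficient (assumption)
rho_eff = 1.25e-8  # illustrative population-effective recombination rate per bp
n_sites = 100_000  # sites on each side of the focal site

exponent = 2.0 * exponent_one_side(u, t_s, rho_eff, n_sites)
B = math.exp(-exponent)

# Continuum check: integrate the same expression over distance instead of summing.
r_end = rho_eff * n_sites
continuum = 2.0 * u * t_s / (rho_eff * (1.0 - t_s)) * (
    1.0 / t_s - 1.0 / (t_s + r_end * (1.0 - t_s))
)
print(B, exponent, continuum)
```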

For calculations involving nonsynonymous sites, all third coding positions are treated as neutral to accommodate synonymous sites and are skipped over when summing along a coding sequence. The mutational density u for these sites is u = U1/(1.4nglg), since the total number of nonsynonymous sites on a chromosome arm is ∼0.7nglg. For strongly selected noncoding sites, we have u = U2/(2[ng – 1]nsls). The same distributions of s are used for nonsynonymous and strongly selected sites in the numerical results shown below. For weakly selected sites, u = U3/(2[ng – 1][ns + 1]lw), which is substantially smaller than u for the strongly selected sites (reflecting the lower level of selective constraint in this case), and different parameters for the distribution of s are used.
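The three per-site mutational densities follow directly from these expressions. The sketch below plugs in the standard parameter values quoted in the text; variable names are my own.

```python
n_g, l_g = 2800, 1500          # number and length (bp) of coding sequences
n_s, l_s, l_w = 36, 39, 115    # noncoding block counts and lengths (bp)
U_1, U_2, U_3 = 0.050, 0.066, 0.124   # diploid deleterious rates per class

# Haploid per-site rates: U_k divided by twice the number of sites in class k.
u_nonsyn = U_1 / (1.4 * n_g * l_g)                  # 0.7 * n_g * l_g nonsynonymous sites
u_strong = U_2 / (2 * (n_g - 1) * n_s * l_s)
u_weak = U_3 / (2 * (n_g - 1) * (n_s + 1) * l_w)

print(f"{u_nonsyn:.2e} {u_strong:.2e} {u_weak:.2e}")
```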

The only difficulty with this procedure is that the large number of nucleotide sites (∼20 million) on a typical Drosophila chromosome arm makes numerical integration over all sites very slow. For this reason, the summation formula over sites was used, averaging the contribution from each site over a grid of 1000 points taken over the range of a truncated gamma distribution, with s derived from a single-parameter gamma distribution with shape parameter ak and mean sk before truncation of values of s < sT (the mean after truncation is higher). To avoid inaccuracies of numerical integration with very small s, for values of x < 0.001 the analytical formula for the integral of x^(ak – 1) was used instead of x^(ak – 1)exp(–x) in the formula for the gamma distribution (where x = s ak/sk). Integration over the remainder of the distribution was continued up to a value of x = 10, with constant increments of x on a logarithmic scale. To avoid selection coefficients greater than one, all values with s > 1 were reset to 1.
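One way to set up such a grid is sketched below. It follows the description above (analytic treatment of x < 0.001, a logarithmically spaced grid up to x = 10, and capping of s at 1), but the trapezoidal weighting and all names are my own assumptions rather than a transcription of the original programs.

```python
import math


def truncated_gamma_summary(shape, mean_s, s_t, x_small=1e-3, x_max=10.0, n_grid=1000):
    """Return (retained probability mass, mean s among retained mutations)."""
    norm = math.gamma(shape)
    scale = mean_s / shape                # s = x * scale
    x_t = s_t / scale
    mass, s_sum, lo = 0.0, 0.0, x_t
    if x_t < x_small:
        # For very small x, integrate x^(a-1) analytically (exp(-x) taken as 1).
        mass += (x_small ** shape - x_t ** shape) / (shape * norm)
        s_sum += scale * (x_small ** (shape + 1) - x_t ** (shape + 1)) / ((shape + 1) * norm)
        lo = x_small
    # Trapezoidal rule with constant increments of log(x) from lo to x_max.
    step = math.log(x_max / lo) / (n_grid - 1)
    for i in range(n_grid):
        x = lo * math.exp(i * step)
        weight = step * (0.5 if i in (0, n_grid - 1) else 1.0)
        density = x ** shape * math.exp(-x) / norm   # gamma density times dx/dlog(x)
        mass += weight * density
        s_sum += weight * density * min(x * scale, 1.0)   # reset s > 1 to 1
    return mass, s_sum / mass


strong_mass, strong_mean = truncated_gamma_summary(0.3, 2.5e-3, 5e-6)
weak_mass, weak_mean = truncated_gamma_summary(0.3, 8e-6, 5e-6)
print(strong_mass, strong_mean, weak_mass, weak_mean)
```

In both cases the mean of s among retained mutations exceeds the untruncated mean, as noted in the text.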

This approach yields the “model 1” results for the case of the standard mapping function and “model 2” results for the case of a linear map. The features of the different types of model are summarized in Table 1.

Table 1. Models of background selection with selection on nonsynonymous, weakly selected, and strongly selected noncoding sites.

| Model | Description |
|---|---|
| Model 1 | Standard mapping function with summation over all sites and integration over the distributions of selection coefficients |
| Model 2 | Linear mapping function with summation over all sites and integration over the distributions of selection coefficients |
| Model 3 | Approximations using integration over a continuum of sites (with a linear mapping function) |
| Model 4 | Approximations using summation of the integrals over each cluster of sites with the same selection regime (with a linear mapping function) and integration over the distributions of selection coefficients |
| Model 5 | Approximations using integration along the genome of the integrals over clusters of sites (with a linear mapping function), and first and second moments of the distributions of selection coefficients |

Approximations

Results can be obtained more rapidly using several different approximations to Equation 1. The simplest is that introduced by Hudson and Kaplan (1994) and Barton (1995), which assumes a linear map of length M morgans in females. Selected sites are distributed uniformly along it, with a population-effective map length for the chromosome arm of Me (0.5M and 0.667M, for A and X, respectively), with Me assumed to be much greater than any value of ts drawn from the distribution. Replacing the summation in Equation 1 by integration along a continuum, these assumptions give Bk ≈ exp(–UT/Me) (see Equation 10 of Nordborg et al. 1996), where UT is the sum of the truncated deleterious mutation rates UTk over all classes k. This yields the “model 3” results.
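Because this approximation depends only on UT and Me, it can be evaluated in a few lines. The sketch below uses the truncated mutation rates quoted earlier and female map lengths of 0.5 M (autosomal arm) and 0.6 M (X), with the population-effective map length taken as M/2 for an autosome and 2M/3 for the X, following the sex-averaging of recombination rates defined earlier; names are my own.

```python
import math

U_T_STRONG = 0.044 + 0.058   # truncated rates for classes 1 and 2
U_T_WEAK = 0.044             # truncated rate for class 3
U_T = U_T_STRONG + U_T_WEAK


def b_model3(u_trunc: float, female_map_length: float, chrom: str) -> float:
    """B = exp(-U_T / M_e), with M_e = M/2 for an autosome and 2M/3 for the X."""
    factor = 0.5 if chrom == "A" else 2.0 / 3.0
    return math.exp(-u_trunc / (factor * female_map_length))


B_A = b_model3(U_T, 0.5, "A")
B_X = b_model3(U_T, 0.6, "X")
adjusted_ratio = 0.75 * B_X / B_A   # null X/A ratio of 3/4 times the ratio of B values

print(f"B_A = {B_A:.3f}, B_X = {B_X:.3f}, adjusted X/A = {adjusted_ratio:.3f}")
```

Under this approximation the selection coefficients drop out entirely, which is why the predicted ratio depends only on the mutation rate and the map length.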

An approximation that should in principle be more accurate can be obtained as follows, retaining the assumption of a linear map. For nonsynonymous sites, the following procedure is followed. The physical distances from the focal site to the start and end of the jth coding sequence to its right or left are ∼ li(j – 0.5) + lg(j – 1) and li(j – 0.5) + lgj, respectively. Using the relations described above, these distances can be translated into population-effective recombination frequencies of rj1 and rj2, respectively. Summation along the length of a coding sequence is replaced by integration, and a deleterious mutational density of U1/(2 × 0.7 × nglg) per site is assumed, allowing as before for the fact that an average of 70% of coding sequence mutations are nonsynonymous. Using Equation 9 of Nordborg et al. (1996) with some rearrangement of terms (see Appendix, Equations A1 and A2), for a given value of ts we obtain the following net contribution to the negative of the exponent in Equation 1 from the jth pair of coding sequences to the right and left of the focal site

$$E_{1j}(t_s) \approx \frac{U_1 t_s}{n_g\left(t_s + r_{j1}[1 - t_s]\right)\left(t_s + r_{j2}[1 - t_s]\right)}. \tag{2}$$

The assumption that selected sites are uniformly distributed along a coding sequence is of course inaccurate because of the presence of synonymous sites. However, this is likely to cause only a minor error, since the mutational density per nonsynonymous site is U1/(1.4nglg) rather than U1/(2nglg), and the mean frequency of recombination between adjacent nonsynonymous sites is higher by a factor of 1/0.7 than assumed in this expression, if 30% of sites are neutral. The derivation given in the Appendix shows that these two effects cancel out in the final expression.
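The closed-form contribution of a pair of coding sequences can be compared with a direct site-by-site sum. The sketch below treats all sites in a coding sequence as uniformly selected with density U1/(2nglg), uses a linear map for an autosomal arm, and takes a single illustrative value of ts; the function names are my own.

```python
def e1j_closed(U_1, n_g, t_s, r1, r2):
    """Closed-form contribution of the jth pair of coding sequences (integration over sites)."""
    return U_1 * t_s / (n_g * (t_s + r1 * (1.0 - t_s)) * (t_s + r2 * (1.0 - t_s)))


def e1j_sum(U_1, n_g, l_g, t_s, rho_eff, start_bp):
    """Direct sum over the sites of one coding sequence, doubled for right and left."""
    u = U_1 / (2.0 * n_g * l_g)
    total = 0.0
    for x in range(l_g):
        r = rho_eff * (start_bp + x + 0.5)   # midpoint of each site
        total += u * t_s / (t_s + r * (1.0 - t_s)) ** 2
    return 2.0 * total


U_1, n_g, l_g, l_i = 0.050, 2800, 1500, 5659
rho = 0.5 / (n_g * l_g + (n_g - 1) * l_i)   # map distance per bp, female map of 0.5 M
rho_eff = 0.5 * rho                          # autosomal population-effective rate per bp
t_s = 1.25e-3                                # illustrative value (assumption)

j = 1
start = l_i * (j - 0.5) + l_g * (j - 1)      # distance to start of jth coding sequence
end = l_i * (j - 0.5) + l_g * j              # distance to its end
closed = e1j_closed(U_1, n_g, t_s, rho_eff * start, rho_eff * end)
direct = e1j_sum(U_1, n_g, l_g, t_s, rho_eff, start)
print(closed, direct)
```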

The overall exponent can then be obtained by integration over the truncated distribution of selection coefficients for nonsynonymous sites and summing over all j from 1 to ng/2. This gives

$$B_1 \approx \exp\left\{-\int_{s_T}^{1} \sum_j E_{1j}(t_s)\,\varphi_1(s)\,ds\right\}. \tag{3}$$

The procedures for noncoding sites are similar, with summation over the integrals for each set of lis strongly selected and liw weakly selected sites, respectively (Appendix, Equations A4 and A10). Together with the results for the nonsynonymous sites, the expressions for B2 and B3 obtained in this way describe “model 4”.

Apart from model 3, these calculations are all dependent on the properties of the distribution of s. An alternative approximation that avoids using the details of this distribution can be obtained by using the approach used for model 4, but replacing summation over coding sequence or blocks of noncoding sites by integration with respect to a continuous variable representing the index of the coding sequence or noncoding block in question. The calculation is further simplified by assuming that the distribution of s is such that ts << 1 for most sites, so that the terms involving 1 – ts in the above expressions can be replaced by 1.

For nonsynonymous sites, use of the above expressions for the distances from the focal site to the start and end of a coding sequence yields the following approximation to the sum over all j of E1j(ts),

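$$E_1(t_s) \approx \frac{U_1 t_s}{n_g} \int_{1}^{(1/2)n_g} \frac{dx}{\left\{t_s + \tilde{\rho}\left(l_i[x - 1/2] + l_g[x - 1]\right)\right\}\left\{t_s + \tilde{\rho}\left(l_i[x - 1/2] + l_g x\right)\right\}}, \tag{4}$$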

where ρ̃ = 0.5ρ for A and 0.667ρ for X, and ρ is the gradient of the map distance in female meiosis with respect to the number of base pairs separating a pair of sites.

Elementary integration reduces this expression to

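$$E_1(t_s) \approx \frac{U_1 t_s}{n_g \tilde{\rho}^2 l_g (l_i + l_g)} \ln\left\{\frac{\left(a' + (1/2)b n_g\right)(a + b)}{\left(a + (1/2)b n_g\right)(a' + b)}\right\}, \tag{5}$$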

where a = ts – 0.5ρ̃li, a′ = ts – ρ̃(0.5li + lg), and b = ρ̃(li + lg).

We have (a + b)/(a′ + b) = 1 + ρ̃lg/(a′ + b) and (a + 0.5bng)/(a′ + 0.5bng) = 1 + ρ̃lg/(a′ + 0.5bng), where a′ + b = ts + 0.5ρ̃li and a′ + 0.5bng = ts – ρ̃(0.5li + lg) + 0.5ρ̃(li + lg)ng ≈ ts + 0.5ρ̃(li + lg)ng. We can reasonably assume that li >> lg, given the typical length of an intergenic sequence compared with a coding sequence in Drosophila (Misra et al. 2002). The logarithmic expression in Equation 5 can then be well approximated by its leading term ρ̃lg{(1/[ts + β]) – (1/[ts + γ])}, where β = 0.5ρ̃li and γ = 0.5ρ̃(li + lg)ng.
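The quality of this leading-term approximation can be checked numerically for the standard autosomal parameters. In the sketch below, a = ts – 0.5ρ̃li, a′ = ts – ρ̃(0.5li + lg), and b = ρ̃(li + lg); the value of ts is an illustrative assumption, and the approximation is expected to be less accurate for much smaller ts.

```python
import math

n_g, l_g, l_i = 2800, 1500, 5659
rho = 0.5 / (n_g * l_g + (n_g - 1) * l_i)   # female map of 0.5 M
rho_eff = 0.5 * rho                          # autosomal sex-averaged rate per bp
t_s = 1.25e-3                                # illustrative value (assumption)

a = t_s - 0.5 * rho_eff * l_i
a_prime = t_s - rho_eff * (0.5 * l_i + l_g)
b = rho_eff * (l_i + l_g)

# Full logarithmic term obtained by integrating over coding sequences 1..n_g/2.
log_term = math.log(
    (a_prime + 0.5 * b * n_g) * (a + b) / ((a + 0.5 * b * n_g) * (a_prime + b))
)

# Leading-term approximation.
beta = 0.5 * rho_eff * l_i
gamma = 0.5 * rho_eff * (l_i + l_g) * n_g
leading = rho_eff * l_g * (1.0 / (t_s + beta) - 1.0 / (t_s + gamma))

relative_error = abs(leading - log_term) / log_term
print(log_term, leading, relative_error)
```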

An approximation to the expectation of E1(ts) over the truncated distribution of s can then be obtained by representing 1/(ts + β) and 1/(ts + γ) by their Taylor series in the deviation of ts from its mean, δts, and taking the expectation of the resulting expression for E1(ts), ignoring terms of higher order than (δts)2. This yields the approximation

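$$B_1 \approx \exp\left\{-\frac{U_{1T}}{n_g \tilde{\rho}(l_i + l_g)} \times \left\{\frac{1}{(t_1 + \beta)}\left[t_1 - \frac{\beta V_1}{(t_1 + \beta)^2}\right] - \frac{1}{(t_1 + \gamma)}\left[t_1 - \frac{\gamma V_1}{(t_1 + \gamma)^2}\right]\right\}\right\}, \tag{6}$$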

where U1T is the deleterious mutation rate for nonsynonymous sites after truncation of the distribution of selection coefficients, and t1 and V1 are the mean and variance of ts, respectively, taken over the truncated distribution of s for nonsynonymous sites.
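Each bracketed term in this expression is a second-order Taylor approximation to the expectation of ts/(ts + k), with k = β or γ. The sketch below checks that approximation against an exact expectation for a deliberately simple two-point distribution of ts; the distribution and all numerical values are illustrative assumptions, chosen only because the exact answer is trivial to compute.

```python
def ratio(t: float, k: float) -> float:
    """The function t/(t + k) whose expectation is required."""
    return t / (t + k)


def ratio_mean_taylor(t_mean: float, t_var: float, k: float) -> float:
    """Second-order approximation: (1/(t1 + k)) * [t1 - k*V/(t1 + k)^2]."""
    return (t_mean - k * t_var / (t_mean + k) ** 2) / (t_mean + k)


t_mean, delta, beta = 1e-3, 1e-4, 3.5e-5
t_var = delta ** 2   # variance of a distribution taking t_mean +/- delta with equal weight

exact = 0.5 * (ratio(t_mean + delta, beta) + ratio(t_mean - delta, beta))
approx = ratio_mean_taylor(t_mean, t_var, beta)
naive = ratio(t_mean, beta)   # ignores the variance altogether

print(exact, approx, naive)
```

The variance correction matters most when k is comparable to or smaller than the mean of ts, which is the case for β but not for γ.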

The similar but more complex procedures for noncoding sites are described in the Appendix. Together with the results for nonsynonymous sites, these approximations for the Bk yield the “model 5” results. All these formulas were implemented in FORTRAN programs, which are available on request.

Results and Discussion

BGS on the X and A in D. melanogaster

Table 2 shows the results of calculations of the expected coalescent times under background selection relative to neutral expectation (B), based on the above formulas and assumptions and using selection and mutation parameters that are probably fairly realistic for the D. melanogaster X chromosome (which is a single arm) and an arm of a major D. melanogaster autosome. The model assumes that a chromosome is organized into blocks of coding sequences that are uninterrupted by introns, but are separated by blocks of noncoding sequence containing a mixture of weakly selected and strongly selected sites (Figure 1). Note that the truncation of very weakly selected mutations means that ∼65% of the sites in the weakly selected sequences are treated as neutral, with the standard selection parameters used here. A diploid deleterious mutation rate U = 0.24 for this genomic region was assumed, on the basis of the genome-wide estimate of 1.2 from Haag-Liautard et al. (2007), which includes all types of deleterious mutations. The results for both intermediate dominance (h = 0.5) and partial recessivity (h = 0.2) are shown. There is good evidence that many slightly deleterious mutations are partially recessive (h < 0.5) (Crow and Simmons 1983; Garcia-Dorado and Caballero 2000), although very weakly selected mutations, such as most of those generated by the gamma distributions assumed here, are likely to approach additivity (Wright 1934; Kacser and Burns 1981). It is thus not clear a priori which of these h values is likely to be more realistic, but an h value much less than 0.5 seems unlikely.

Table 2. The effects of background selection on D. melanogaster autosomal and X chromosomal genes.

                                             h     Model 1  Model 2  Model 3  Model 4  Model 5
B values for autosomes
 Effects of strongly selected sites          0.5   0.684    0.684    0.665    0.680    0.638
                                             0.2   0.687    0.687    0.665    0.690    0.606
 Effects of weakly selected noncoding sites  0.5   0.814    0.815    0.839    0.791    0.831
                                             0.2   0.806    0.806    0.839    0.757    0.867
 Effects of all sites                        0.5   0.556    0.557    0.558    0.538    0.530
                                             0.2   0.554    0.554    0.558    0.523    0.525
B values for X chromosome
 Effects of strongly selected sites          0.5   0.789    0.789    0.775    0.788    0.752
                                             0.2   0.790    0.790    0.775    0.790    0.746
 Effects of weakly selected noncoding sites  0.5   0.878    0.878    0.896    0.859    0.895
                                             0.2   0.876    0.876    0.896    0.851    0.904
 Effects of all sites                        0.5   0.693    0.693    0.695    0.677    0.673
                                             0.2   0.692    0.692    0.695    0.672    0.675
Adjusted X/A diversity ratio for all sites   0.5   0.935    0.935    0.934    0.944    0.952
                                             0.2   0.937    0.937    0.934    0.964    0.964

See Table 1 for the meaning of the different models. B is the ratio of the effective population size under background selection to the neutral value. The parameters of the gamma distributions of selection coefficients are a1 = a2 = 0.3, s1 = s2 = 2.5 × 10⁻³ (strongly selected sites) and a3 = 0.3, s3 = 8 × 10⁻⁶ (weakly selected sites). Results for the dominance coefficient h = 0.5 are shown in the top part of each row, and results for h = 0.2 are shown in the bottom part. A diploid deleterious mutation rate of UD = 0.24 is assumed for both the autosomal arm (A) and the X chromosome. Map lengths of 0.5 and 0.6 M in female meiosis are assumed for A and X, respectively. The number of coding sequences in an arm (ng) is 2800; the length of a coding sequence is 1500 bp. The noncoding regions between coding sequences are divided into 36 strongly selected sequences of length 39 bp and 37 weakly selected sequences of length 115 bp.

The map lengths of the X chromosome and the autosomal arm in female meiosis were set to 0.6 M and 0.5 M, respectively, which approximate the standard values for D. melanogaster (Ashburner et al. 2005). The genes in this model include only coding sequences. The relative values of X and A coalescent times are displayed after multiplying the B value for the X by 3/4, which is the ratio expected in the absence of BGS and with a 1:1 sex ratio and random variation in offspring number in both sexes (Wright 1931). This adjusted ratio provides a baseline prediction for the ratio of X/A neutral diversity values; an excess variance in male reproductive success due to sexual competition, or a female-biased sex ratio, would cause an even higher value (Hedrick 2007; Hutter et al. 2007; Vicoso and Charlesworth 2009a).
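The adjustment just described is a single multiplication and division. The following Python sketch is illustrative only (it is not the FORTRAN code used for the calculations, and the function name is a hypothetical choice); it applies the adjustment to the model 1, h = 0.5 "all sites" B values of Table 2.

```python
# Null X/A ratio of effective population sizes: no BGS, 1:1 sex ratio,
# random variation in offspring number in both sexes (Wright 1931).
NULL_XA_RATIO = 0.75


def adjusted_xa_ratio(b_x, b_a, null_ratio=NULL_XA_RATIO):
    """Predicted X/A neutral diversity ratio from the B values for X and A."""
    return null_ratio * b_x / b_a


# Model 1, h = 0.5, effects of all sites (Table 2).
ratio = adjusted_xa_ratio(b_x=0.693, b_a=0.556)
print(round(ratio, 3))  # about 0.935
```

Any departure from the assumptions behind the factor of 3/4, such as a female-biased sex ratio, would enter through `null_ratio` rather than through the B values.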

The predictions for B for autosomal loci when all sites are taken into account vary from ∼0.52 to 0.56, and for X-linked loci from 0.67 to 0.69, depending on the model and the value of h. The adjusted X/A ratio varies from 0.93 to 0.96; this is the parameter of most interest for the purpose of this article, so that its relatively small range is encouraging. The large data set of Hutter et al. (2007) on variability in noncoding sequences in the Zimbabwe population of D. melanogaster gave an X/A diversity ratio of 0.90 after correcting for effects of GC content on diversity and divergence, which is in good agreement with the predictions of Table 2 and highly significantly different from the null value of 0.75. Note, however, that these predictions ignore possible effects of selection on the variants in the relatively long noncoding sequences involved, so the exact value of this ratio is still somewhat uncertain.

Model 1 involves the fewest approximations, but model 2 (which assumes a linear map) gives almost identical results. Model 3, which simply assumes a uniform density of selected sites across the chromosome, gives a remarkably good approximation to the model 1 results; models 4 and 5, somewhat surprisingly, give a slightly worse fit to the model 1 results than model 3 and overpredict the effects of BGS, although the differences are probably not meaningful for the purpose of comparisons with data. The bulk of the effect of BGS comes from the strongly selected sites, but the weakly selected sites make a significant contribution to increasing the adjusted X/A diversity ratio away from 3/4. For example, model 1 with h = 0.5 and no weakly selected sites gives an adjusted X/A diversity ratio of ∼0.86 instead of 0.93.
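The contribution of each class of sites can be checked directly from Table 2. The sketch below is illustrative Python rather than the original FORTRAN, and it assumes that the contributions of the different classes of sites add in the exponent of B, so that their B values multiply; the names used are hypothetical. It uses the model 1, h = 0.5 entries, so small discrepancies in the third decimal place reflect rounding of the tabulated values.

```python
# Model 1, h = 0.5 entries from Table 2.
B_STRONG = {"A": 0.684, "X": 0.789}  # strongly selected sites only
B_WEAK = {"A": 0.814, "X": 0.878}    # weakly selected noncoding sites only


def adjusted_ratio(b_x, b_a):
    """X/A diversity ratio after multiplying the B value for X by 3/4."""
    return 0.75 * b_x / b_a


# Assumption: independent contributions add in the exponent, so B values multiply.
b_all = {c: B_STRONG[c] * B_WEAK[c] for c in ("A", "X")}

ratio_all = adjusted_ratio(b_all["X"], b_all["A"])
ratio_strong_only = adjusted_ratio(B_STRONG["X"], B_STRONG["A"])

print(round(b_all["A"], 3), round(b_all["X"], 3))
print(round(ratio_all, 3))          # close to the tabulated 0.935
print(round(ratio_strong_only, 3))  # about 0.865
```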

For models 1 and 2, a smaller value of h gives a slightly larger effect of BGS; however, the effect of dominance appears to be negligible in all cases, consistent with the good performance of model 3 as an approximation, which is independent of h. However, it should be noted that the mean and threshold s values were kept unchanged from the h = 0.5 case, to isolate the effect of h. The distribution of mutational effects in Drosophila as estimated from population genetic data in reality involves ts, not s, since nonrecessive autosomal mutations are largely selected against on the basis of their heterozygous effects and X-linked mutations on the basis of a weighted average of their heterozygous effects on females and their hemizygous effects on males (Charlesworth and Charlesworth 2010, p. 161), so that the s values with h = 0.2 should be adjusted to give the same distribution of ts values as for the h = 0.5 case. This implies that changing h should have no effect on the results, provided that the distribution of ts is held constant, other than through rounding errors in the numerical results. This was verified by recalculating the results after multiplying the mean values of s for the strongly and weakly selected sites, as well as the threshold value of s, by 0.5/h and 3 × 0.5/(2h + 1) for autosomal and X-linked loci, respectively. For example, with model 1 and the D. melanogaster parameters, the overall B values for A and X are 0.557 and 0.692, respectively, yielding an adjusted X/A diversity ratio of 0.93. The same argument can be applied to other modifications to the selection model, such as female- or male-specific selective effects, implying that the results should be robust to these changes.

BGS on the X and A in D. pseudoobscura

It is of interest to compare the results with those for D. pseudoobscura and its relatives, which have a two-arm X chromosome but single-arm autosomes and a much higher frequency of crossing over per base pair than D. melanogaster (Sturtevant and Tan 1937; Bachtrog and Andolfatto 2006; Kulathinal et al. 2008; Stevison and Noor 2010). The total map lengths of the chromosome arms are not known precisely; values of 1.2 M and 1.3 M have been assumed here for an autosome and an X chromosome arm, respectively, on the basis of Kulathinal et al. (2008) and Stevison and Noor (2010). Table 3 shows the results for these map lengths, with the other parameters being the same as for D. melanogaster. The adjusted X/A ratios are always substantially smaller than their counterparts in Table 2, reflecting the greater dissipation of BGS by the higher frequencies of recombination on both chromosomes.

Table 3. The effects of background selection on D. pseudoobscura autosomal and X chromosomal genes.

                                             h     Model 1  Model 2  Model 3  Model 4  Model 5
B values for autosomes
 Effects of strongly selected sites          0.5   0.855    0.855    0.843    0.856    0.812
                                             0.2   0.859    0.859    0.843    0.865    0.800
 Effects of weakly selected noncoding sites  0.5   0.914    0.914    0.929    0.891    0.941
                                             0.2   0.910    0.910    0.929    0.868    0.961
 Effects of all sites                        0.5   0.782    0.782    0.784    0.763    0.765
                                             0.2   0.782    0.782    0.784    0.751    0.769
B values for X chromosome
 Effects of strongly selected sites          0.5   0.897    0.897    0.889    0.899    0.860
                                             0.2   0.898    0.898    0.889    0.900    0.864
 Effects of weakly selected noncoding sites  0.5   0.940    0.940    0.950    0.922    0.960
                                             0.2   0.938    0.938    0.951    0.916    0.966
 Effects of all sites                        0.5   0.843    0.843    0.845    0.829    0.834
                                             0.2   0.843    0.843    0.845    0.825    0.835
Adjusted X/A diversity ratio for all sites   0.5   0.809    0.809    0.808    0.824    0.814
                                             0.2   0.809    0.809    0.808    0.823    0.814

See Table 1 for the meaning of the different models. The parameters are the same as for Table 2, except that map lengths of 1.2 and 1.3 M are assumed for A and X, respectively.

Haddrill et al. (2010) found that, after removing loci that deviated significantly from neutrality, the estimated mean synonymous site diversities for X and A in D. pseudoobscura were 0.0149 (SE = 0.0018) and 0.0230 (SE = 0.0021), respectively, giving a value of 0.65 (SE = 0.26) for the X/A ratio. This is not significantly different from 0.75 but is significantly different from one, and is equal to the ratio of X/A effective population sizes estimated by Haddrill et al. (2011) after taking the recent population expansion in this species into account. The corresponding estimate for the close relative D. miranda, for which there is little evidence for a recent expansion, was 0.79. The observed ratios for these two species are thus statistically consistent with the values of ∼0.81 shown in Table 3.

Robustness of the results

The results for a focal site in the middle of an arm are largely insensitive to linkage to another chromosome arm of similar size to the one being considered. For example, under model 1 with the parameters used in Tables 2 and 3 with h = 0.5, the B values for D. melanogaster when an additional arm is present are 0.557 for A and 0.693 for X, with an adjusted X/A ratio of 0.93, i.e., a very slight decrease over the Table 2 results. There is no effect at all for D. pseudoobscura. Complete insensitivity to the size of the chromosome is necessarily the case for model 3, which uses the result that, for a site that is not too close to the end of a chromosome, the effect of BGS depends only on the ratio of the total mutation rate to the map length (see derivation of the model 3 prediction above). Increasing both the mutation rate and the map length by the same factor, as would happen if the influence of an additional arm with similar mutational parameters were considered, thus has no effect on B.
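The invariance under model 3 can be illustrated numerically. The sketch below is illustrative Python, not the original FORTRAN; it assumes the model 3 form B = exp(−UT/Me) for a site far from the chromosome ends, which reproduces the model 3 autosomal entry of Table 2 when the values UT = 0.146 and MeA = 0.25 quoted in the Conclusions are used. The function name is hypothetical.

```python
import math


def b_model3(u_t, m_e):
    """Assumed model 3 form: B depends only on the ratio of the truncated
    mutation rate u_t to the population-effective map length m_e."""
    return math.exp(-u_t / m_e)


one_arm = b_model3(u_t=0.146, m_e=0.25)
# Adding a second arm with the same mutational parameters doubles both
# the mutation rate and the map length, leaving their ratio unchanged.
two_arms = b_model3(u_t=2 * 0.146, m_e=2 * 0.25)

print(round(one_arm, 3), round(two_arms, 3))  # both about 0.558
```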

Another important feature of the models is the length of the coding sequences vs. intergenic sequences. The model on which the above results are based ignores the fact that, as mentioned in the Introduction, most Drosophila genes have introns, many of which are several hundred base pairs or more in length and contain some selectively highly constrained sequences (e.g., Sella et al. 2009). This can be crudely modeled by increasing the number of coding sequences, while holding their total length constant. The length of intergenic sequence is decreased proportionately, keeping the lengths of individual blocks of weakly selected and strongly selected noncoding sequences approximately constant.

It would be expected that dividing the chromosome arm into a larger number of shorter functional sequences, for the same total size and map length, would increase the effects of BGS and hence the X/A diversity ratio, since the average density of selected sites in relation to the frequency of recombination is reduced. Table 4 shows results that are otherwise comparable with the h = 0.5 results for D. melanogaster and D. pseudoobscura, for twice the number of coding sequences as before (some minor adjustments to the numbers and lengths of the noncoding sequences were made, to meet the assumptions about the organization of the chromosome).

Table 4. Background selection with a large number of short coding sequences.

                                             Species           Model 1  Model 2  Model 3  Model 4  Model 5
B values for autosomes
 Effects of strongly selected sites          D. melanogaster   0.641    0.641    0.667    0.679    0.660
                                             D. pseudoobscura  0.833    0.833    0.845    0.855    0.829
 Effects of weakly selected noncoding sites  D. melanogaster   0.820    0.822    0.838    0.797    0.822
                                             D. pseudoobscura  0.916    0.917    0.929    0.894    0.940
 Effects of all sites                        D. melanogaster   0.525    0.527    0.559    0.541    0.542
                                             D. pseudoobscura  0.763    0.764    0.785    0.764    0.780
B values for X chromosome
 Effects of strongly selected sites          D. melanogaster   0.758    0.758    0.774    0.786    0.763
                                             D. pseudoobscura  0.881    0.881    0.889    0.897    0.879
 Effects of weakly selected noncoding sites  D. melanogaster   0.882    0.882    0.896    0.858    0.902
                                             D. pseudoobscura  0.941    0.941    0.951    0.924    0.960
 Effects of all sites                        D. melanogaster   0.668    0.668    0.695    0.673    0.689
                                             D. pseudoobscura  0.829    0.829    0.845    0.829    0.843
Adjusted X/A diversity ratio for all sites   D. melanogaster   0.954    0.950    0.932    0.932    0.953
                                             D. pseudoobscura  0.814    0.814    0.807    0.810    0.811

The top and bottom parts of each row show the results for D. melanogaster and D. pseudoobscura, respectively. See Table 1 for the meaning of the different models. A dominance coefficient h = 0.5 is assumed; the other selection parameters are the same as for Tables 2 and 3, except that 5600 coding sequences of length 750 bp, separated by 18 strongly selected 39-bp noncoding sequences and 19 weakly selected 113-bp noncoding sequences, are assumed.

As expected, the effects of BGS due to strongly selected sites under models 1 and 2 are enhanced, resulting in a slightly higher X/A diversity ratio than before. The effects of weakly selected noncoding sites are slightly diminished, presumably reflecting the fact that there are smaller clusters of blocks of these sites. The overall effect of BGS is greater than before, and the X/A diversity ratio for D. melanogaster is predicted to be >0.95, whereas that for D. pseudoobscura is barely changed at 0.81. Doubling the number of coding sequences again produces further effects in the same direction (results not shown), with the predicted X/A diversity ratios under models 1 and 2 for D. melanogaster and D. pseudoobscura becoming ∼0.98 and 0.82, respectively. The predictions of model 3 are no longer as close to the model 1 results as previously, mainly reflecting the increased effect of strongly selected sites in model 1, whereas the model 3 results change only marginally because of the adjustments in the parameters mentioned above. Model 4 performs only slightly better than model 3, mainly because it underpredicts the effects of strongly selected sites. Model 5 gives results that are closer to model 1, despite being an approximation to model 4. Overall, the results suggest that the X/A ratios are relatively insensitive to the way in which the chromosome arm is divided among coding and noncoding sequences, with a finer subdivision into strongly selected coding sequences leading to slightly larger effects of BGS.

The sensitivity of the results to the parameters of the distribution of selection coefficients was also examined. Since model 3 generally provides a good approximation, provided that the threshold value of s is kept constant, little effect of changing these parameters is expected, except by altering the proportion of deleterious mutations that fall below the threshold, thereby reducing the net truncated mutation rate UT. It would therefore be expected that, for a given shape parameter a, the effect of BGS should be greater, the larger the mean selection coefficient; similarly, for a given mean selection coefficient, the effect of BGS should be greater, the larger the value of a, since this reduces the coefficient of variation of the distribution.

The effects of changing the selection parameters for strongly selected sites were investigated, since these contribute the most to the effects of BGS. The theoretical expectations were confirmed, but the effects on the adjusted X/A diversity ratio were relatively minor. For example, changing s1 = s2 from 2.5 × 10⁻³ to 0.01 or to 0.5 × 10⁻³ caused the adjusted X/A ratio predicted by model 1 for D. melanogaster (with shape parameter a = 0.3) to change from 0.935 to 0.936 and 0.922, respectively. Changing a from 0.3 to 0.6 or to 1.2 (with s1 = s2 = 2.5 × 10⁻³) caused this ratio to change to 0.950 and 0.952, respectively.

Conclusions

Overall, it seems that the results are fairly robust to the details of the selection parameters for deleterious mutations, for a given deleterious mutation rate. However, they are very sensitive to the deleterious mutation rate and the amount of recombination. Using model 3 (Hudson and Kaplan 1994; Barton 1995), which generally gives a reasonable approximation to the more exact results, the adjusted X/A ratio with a truncated deleterious mutation rate for an arm of UT is equal to 0.75 × exp{UT(MeX − MeA)/(MeX MeA)}, where MeX and MeA are the population-effective map lengths of the X chromosomal and autosomal arms, respectively. This ratio changes almost linearly from 0.79 to 0.93, over the range from UT = 0.0365 to 0.146 (the value assumed above) with MeX = 0.40 and MeA = 0.25, the D. melanogaster values, and from 0.76 to 0.81 with MeX = 0.87 and MeA = 0.60, the D. pseudoobscura values. A higher UT of 0.25 per chromosome arm gives values of 1.09 and 0.85, for the D. melanogaster and D. pseudoobscura map lengths, respectively. Such a high mutation rate seems implausible, however, given the size of the Drosophila genome and our current estimates of the mutation rate in D. melanogaster (Haag-Liautard et al. 2007; Keightley et al. 2009), so that the lower range of mutation rates used in these calculations is more likely to apply.
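The values quoted in this paragraph can be reproduced from the model 3 expression for the adjusted ratio. The following Python sketch is illustrative (function and variable names are hypothetical, and it is not the FORTRAN code used for the main results); it evaluates 0.75 × exp{UT(MeX − MeA)/(MeX MeA)} for the three mutation rates and the two sets of population-effective map lengths mentioned above.

```python
import math


def adjusted_xa_ratio_model3(u_t, m_ex, m_ea):
    """Model 3 adjusted X/A diversity ratio for truncated mutation rate u_t
    and population-effective map lengths m_ex (X) and m_ea (autosome)."""
    return 0.75 * math.exp(u_t * (m_ex - m_ea) / (m_ex * m_ea))


# (MeX, MeA) values quoted in the text.
MAPS = {
    "D. melanogaster": (0.40, 0.25),
    "D. pseudoobscura": (0.87, 0.60),
}

for species, (m_ex, m_ea) in MAPS.items():
    for u_t in (0.0365, 0.146, 0.25):
        ratio = adjusted_xa_ratio_model3(u_t, m_ex, m_ea)
        print(f"{species}: UT = {u_t}, adjusted X/A = {ratio:.2f}")
```

Because the exponent is proportional to UT, the ratio equals 3/4 when UT = 0 and rises more steeply with UT when the map lengths are short, which is the sensitivity described in the text.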

As might be expected intuitively, longer map lengths lead to smaller X/A diversity ratios and a lower sensitivity to the deleterious mutation rate. Species like D. melanogaster, with a small number of chromosomes and relatively short map lengths, are thus most likely to show an effect of BGS on the overall X/A diversity ratio. As discussed by Charlesworth (2012), BGS will tend to reduce rather than increase the X/A diversity ratio in taxa like mammals, where crossing over occurs on the autosomes in males (the same applies to the ratio of Z chromosome to autosomal diversity in birds, but Lepidoptera should behave like Drosophila because of their lack of crossing over in females), but the effect is likely to be fairly small because of the large number of chromosomes and the correspondingly low deleterious mutation rate per chromosome. Even for a Drosophila species, whether or not the ratio of X to autosomal neutral variability for a gene in the middle of the relevant chromosome arms is substantially greater than the null expectation of 3/4 is highly dependent on the deleterious mutation rate and the map lengths in question. More accurate knowledge of these parameters will help to resolve the question of whether the observations on X/A variability ratios in different populations and species can be accounted for solely by BGS or whether the other factors mentioned in the Introduction need to be invoked.

A role for selective sweeps rather than BGS in producing this effect cannot, of course, be ruled out, although these have usually been invoked to explain the X/A silent diversity ratio of <3/4 in non-African populations of D. melanogaster and D. simulans (Begun and Aquadro 1993; Aquadro et al. 1994; Begun and Whitley 2000; Andolfatto 2001; Harr et al. 2002; Hutter et al. 2007; Singh et al. 2007; Stephan 2010; Mackay et al. 2012), on the basis of a faster rate of adaptive evolution on X than on A (Charlesworth et al. 1987; Vicoso and Charlesworth 2009a) in response to the novel out-of-Africa environment. If this hypothesis is correct, then it seems unlikely that selective sweeps could be the cause of the X/A variability ratio of near one in East African populations of D. melanogaster, given that the theoretical study of the effect of selective sweeps on the X/A diversity ratio in Drosophila by Betancourt et al. (2004) showed that the fixation of partially recessive favorable mutations reduces the value of this ratio below 3/4.

However, the question of the cause of the much lower X/A diversity ratio in non-African populations of D. melanogaster and D. simulans remains undecided, since purely demographic explanations have been proposed as an alternative or supplement to the selective sweep model (Charlesworth 2001; Wall et al. 2002; Pool and Nielsen 2007, 2008). If the ancestral population had a high value of this ratio, this would seem to rule out demographic explanations based on a greater sensitivity of X than of A to a population bottleneck that require a value close to three-quarters (Pool and Nielsen 2007, 2008). It is possible, however, that demographic effects could interact with those of BGS to contribute to the reduced X/A diversity. A severe reduction in population size would be expected to reduce the effect of BGS, since a larger proportion of deleterious mutations will fall below the threshold for validity of the model used here. If the effectiveness of BGS for a population at equilibrium with greatly reduced effective population size is examined by increasing the threshold selection coefficient sT in inverse proportion to the population size, the reduction can be quite large—for twofold and fourfold reductions below the values shown in Table 2, the adjusted X/A diversity ratios become 0.81 and 0.80, respectively. Not surprisingly, however, the ratio always remains above 3/4, in contrast to the observed ratio of <0.60 for the non-African sample of D. melanogaster in the meta-analysis in Table 4 of Singh et al. (2007).

However, the question of how background selection would interact with changing population size to affect the dynamics of the X/A diversity ratio remains to be studied; the computer algorithm for modeling BGS with recombination that has recently been developed by Zeng and Charlesworth (2011) should be helpful in this regard, since it can be modified to allow for changing population size. In this context, it is interesting to note that Singh et al. (2007) found that the level of polymorphism for intergenic noncoding sequences was similar for X and A in a sample from a U.S. population; both X and A showed a similar reduction in diversity compared with an African sample, suggesting that selective sweeps may be implicated in the greater reduction in variability for X than for A for sequences obtained from genes. The genome-wide surveys of diversity that are becoming available in Drosophila (e.g., Sackton et al. 2009; Mackay et al. 2012) should help to resolve these questions.

Acknowledgments

I thank Laurence Loewe, Tim Connallon, Kai Zeng, and two reviewers for their helpful comments on this article. This work was supported by grant BB/G003076/1 from the Biotechnology and Biological Sciences Research Council of the United Kingdom.

Appendix

Derivation of Equation 2

For a given pair of genes with index j, at the same distance to the right and left of the focal site and with a linear mapping function, the joint contribution to the negative of the exponent in Equation 1 can be written as

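$$\frac{U_1 t_s}{0.7\, n_g l_g}\int_{z_{j1}}^{z_{j2}} \frac{dz}{(t_s + r_j(z)[1 - t_s])^2} = \frac{U_1 t_s}{n_g l_g}\,\frac{l_g}{(r_{j2} - r_{j1})}\int_{r_{j1}}^{r_{j2}} \frac{dr}{(t_s + r[1 - t_s])^2}, \tag{A1}$$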

where zj1 and zj2 are the numbers of bases separating the beginning and end of the gene from the focal site, respectively, and rj1 and rj2 are the corresponding recombination fractions.

The factor of 0.7 in the denominator on the left-hand side disappears from the right-hand side, because the assumption of a uniform density of selected sites along the coding sequence implies that $dz/dr = 0.7\, l_g/(r_{j2} - r_{j1})$, since the mean recombination frequency between adjacent selected sites is $(r_{j2} - r_{j1})/(0.7\, l_g)$ and not $(r_{j2} - r_{j1})/l_g$. Performing the integration, this expression becomes

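$$\frac{U_1 t_s}{n_g (r_{j2} - r_{j1})}\,\frac{1}{(1 - t_s)}\left\{\frac{1}{(t_s + r_{j1}[1 - t_s])} - \frac{1}{(t_s + r_{j2}[1 - t_s])}\right\}, \tag{A2}$$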

which yields Equation 2 of the text.
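The integration step can be checked numerically. The sketch below is illustrative Python with hypothetical parameter values and function names; it compares a midpoint-rule evaluation of the integral over r with the closed form in braces in Equation A2, and with the equivalent product form in which the factor (rj2 − rj1) cancels.

```python
import math


def integrand(r, ts):
    """Integrand of the right-hand side of Equation A1."""
    return 1.0 / (ts + r * (1.0 - ts)) ** 2


def midpoint_integral(f, lo, hi, steps=20_000):
    """Simple midpoint-rule quadrature of f between lo and hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def closed_form(r1, r2, ts):
    """Result of the integration: the 1/(1 - ts){...} term of Equation A2."""
    return (1.0 / (1.0 - ts)) * (
        1.0 / (ts + r1 * (1.0 - ts)) - 1.0 / (ts + r2 * (1.0 - ts))
    )


def product_form(r1, r2, ts):
    """Algebraically equivalent form: (r2 - r1) over the product of the two
    denominators."""
    return (r2 - r1) / ((ts + r1 * (1.0 - ts)) * (ts + r2 * (1.0 - ts)))


# Hypothetical values chosen only for illustration.
ts, r1, r2 = 1e-3, 0.010, 0.012
numeric = midpoint_integral(lambda r: integrand(r, ts), r1, r2)

print(math.isclose(numeric, closed_form(r1, r2, ts), rel_tol=1e-6))
print(math.isclose(closed_form(r1, r2, ts), product_form(r1, r2, ts), rel_tol=1e-9))
```

The product form shows why the prefactor 1/(rj2 − rj1) in Equation A2 leaves a contribution that depends only on the recombination fractions at the two ends of the gene.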

Distances from the Focal Site for Strongly Selected Noncoding Sites and Expressions for Models 4 and 5

The distances d1mj and d2mj from the focal site to the beginning and end of the mth block of strongly selected noncoding sites, in the jth intergenic region to its right, are as follows. Let m = 1 for the leftmost block in an intergenic region and m = ns for the rightmost block (there is a block of weakly selected noncoding sequence between each of these and the adjacent gene).

For j = 0, the focal site is located in the center of the weakly selected noncoding block in the middle of the intergenic region, so that

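$$d_{1m0} = (m - 0.5)\,l_{iw} + (m - 1)\,l_{is}, \qquad d_{2m0} = d_{1m0} + l_{is} \qquad (1 \le m \le n_s/2). \tag{A3a}$$

For $1 \le j \le n_g/2$

$$d_{1mj} = j\,l_g + (j - 0.5)\,l_i + m\,l_{iw} + (m - 1)\,l_{is}, \qquad d_{2mj} = d_{1mj} + l_{is} \qquad (1 \le m \le n_s). \tag{A3b}$$

Note that $d_{2mj} - d_{1mj} = l_{is}$, independently of m.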

The corresponding population-effective recombination rates can be obtained by multiplying the d’s by the appropriate function relating recombination rate to physical distance, yielding values of r1mj and r2mj. For the mth strongly selected noncoding block in the jth intergenic sequence, we can thus obtain expressions similar to Equations 2 and 3, except that U1 is replaced by U2 and ng is replaced by ngns in Equation 2, to take into account the fact that there are a total of approximately ngns strongly selected noncoding sequences on the chromosome arm:

$$E_{2mj}(t_s) \approx \frac{U_2 t_s}{n_g n_s\,(t_s + r_{1mj}[1 - t_s])(t_s + r_{2mj}[1 - t_s])}. \tag{A4}$$

By taking the exponential of the negative of the sum of this expression over all m and j, we obtain the expression for B2 in model 4.

The corresponding model 5 approximation to this sum can be obtained as follows. Following the method used for Equations 4 and 5 of the text, m is replaced by a continuous variable x, and Equation A4 is integrated with respect to x from x = 1 to x = ns. This yields an expression similar to Equation 5,

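$$E_{2j}(t_s) \approx \frac{U_2 t_s}{n_g n_s\,\tilde{\rho}^2 (l_{iw} + l_{is})\, l_{is}} \ln\left\{\frac{(a'_j + b n_s)(a_j + b)}{(a_j + b n_s)(a'_j + b)}\right\}, \tag{A5}$$

where $a_0 = t_s - 0.5\tilde{\rho}\, l_{iw}$, $a'_0 = a_0 - \tilde{\rho}\, l_{is}$, $a_j = t_s + \tilde{\rho}(j l_g + [j - 0.5] l_i)$, $a'_j = a_j - \tilde{\rho}\, l_{is}$ (for $1 \le j \le n_g/2$), and $b = \tilde{\rho}(l_{iw} + l_{is})$.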
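The integration over x can be checked numerically. The sketch below is illustrative Python and makes several assumptions that are not part of the original calculations: a uniform population-effective recombination rate per base pair, obtained by spreading MeA = 0.25 evenly over an arm with the Table 2 organization; a hypothetical value of ts; and, as the definitions of aj and b imply, the approximation of ts + r[1 − ts] by ts + ρ̃d. It compares the closed-form integral of 1/{(a′j + bx)(aj + bx)}, i.e., the logarithmic term of Equation A5 divided by ρ̃²(liw + lis)lis, with a midpoint-rule evaluation.

```python
import math

# Chromosome organization from the footnote to Table 2.
L_G, L_IW, L_IS, N_S, N_G = 1500, 115, 39, 36, 2800
L_I = N_S * L_IS + (N_S + 1) * L_IW   # implied intergenic length (5659 bp)
RHO = 0.25 / (N_G * (L_G + L_I))      # ASSUMED uniform effective rate per bp
T_S = 1e-3                            # hypothetical value of ts


def coefficients(j):
    """Return (a_j, a'_j, b) for the jth intergenic region, j >= 1."""
    a = T_S + RHO * (j * L_G + (j - 0.5) * L_I)
    return a, a - RHO * L_IS, RHO * (L_IW + L_IS)


def integral_closed(j):
    """Closed form: log term of A5 over rho^2 (l_iw + l_is) l_is."""
    a, a_prime, b = coefficients(j)
    log_term = math.log(((a_prime + b * N_S) * (a + b))
                        / ((a + b * N_S) * (a_prime + b)))
    return log_term / (RHO ** 2 * (L_IW + L_IS) * L_IS)


def integral_numeric(j, steps=20_000):
    """Midpoint rule for 1/((a' + b x)(a + b x)), x from 1 to n_s."""
    a, a_prime, b = coefficients(j)
    h = (N_S - 1) / steps
    xs = (1 + (i + 0.5) * h for i in range(steps))
    return h * sum(1.0 / ((a_prime + b * x) * (a + b * x)) for x in xs)


for j in (1, 10):
    print(j, math.isclose(integral_closed(j), integral_numeric(j), rel_tol=1e-6))
```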

By taking expectations of the first- and second-order terms in the deviations of the ts from their mean over the truncated distribution of s, an expression similar to the negative exponent in Equation 6 is obtained for the sum of the contributions from the jth genes to the right and left of the focal site,

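$$E_{2j} \approx \frac{U_{2T}}{n_g n_s\,\tilde{\rho}\,(l_{iw} + l_{is})}\left\{\frac{1}{(\bar{t}_2 + \beta_{2j})}\left[\bar{t}_2 - \frac{\beta_{2j} V_2}{(\bar{t}_2 + \beta_{2j})^2}\right] - \frac{1}{(\bar{t}_2 + \gamma_{2j})}\left[\bar{t}_2 - \frac{\gamma_{2j} V_2}{(\bar{t}_2 + \gamma_{2j})^2}\right]\right\}, \tag{A6}$$

where $\beta_{2j} = a_j + b - t_s$, $\gamma_{2j} = a_j + b n_s - t_s$, and $U_{2T}$ is the deleterious mutation rate for strongly selected noncoding sites after truncation of the distribution of s; $\bar{t}_2$ and $V_2$ are the mean and variance of ts over this distribution.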
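The terms in square brackets arise from a second-order expansion of the expectation of ts/(ts + β) about the mean of ts. The sketch below is illustrative Python with hypothetical numerical values and function names; it uses a symmetric two-point distribution of ts, for which the exact expectation is easy to compute, to show that the second-order expression is much closer to the exact value than the first-order term alone.

```python
def second_order_mean(t_bar, var, beta):
    """Second-order approximation to E[t / (t + beta)], as in the bracketed
    terms of Equation A6: (t_bar - beta * V / (t_bar + beta)^2) / (t_bar + beta)."""
    return (t_bar - beta * var / (t_bar + beta) ** 2) / (t_bar + beta)


def exact_mean(values, weights, beta):
    """Exact expectation of t / (t + beta) for a discrete distribution."""
    return sum(w * t / (t + beta) for t, w in zip(values, weights))


# Hypothetical two-point distribution: t_bar +/- sigma with equal weights,
# which has mean t_bar and variance sigma**2.
t_bar, sigma, beta = 1e-3, 1e-4, 5e-3
values = (t_bar - sigma, t_bar + sigma)
weights = (0.5, 0.5)

exact = exact_mean(values, weights, beta)
first_order = t_bar / (t_bar + beta)
second_order = second_order_mean(t_bar, sigma ** 2, beta)

print(abs(first_order - exact))   # error when the variance is ignored
print(abs(second_order - exact))  # much smaller error with the variance term
```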

The sum of the E2j for all values of j between 0 and ng/2 is needed to obtain the final approximate expression for B2. The contribution E20 from the intergenic sequences immediately surrounding the focal site is given by Equations A5 and A6 with j = 0. The remaining part of the sum can be approximated by replacing j with a continuous variable y, so that β2j and γ2j are replaced by β2y and γ2y, and then integrating with respect to y from y = 1 to y = ng/2. We have

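$$\int_1^{n_g/2} \frac{dy}{(\bar{t}_2 + \beta_{2y})} = \frac{1}{\lambda}\ln\left(\frac{\bar{t}_2 + \kappa + (1/2)\lambda n_g}{\bar{t}_2 + \kappa + \lambda}\right) = I_{1\beta}, \tag{A7}$$

where $\kappa = \tilde{\rho}(l_{iw} + l_{is} - 0.5\, l_i)$ and $\lambda = \tilde{\rho}(l_i + l_g)$.

A similar integral $I_{1\gamma}$ can be obtained for $\gamma_{2y}$, replacing $l_{iw} + l_{is}$ in the expressions for κ and λ with $(l_{iw} + l_{is})n_s$.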

Similarly, we have

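$$\int_1^{n_g/2} \frac{\beta_{2y}\,dy}{(\bar{t}_2 + \beta_{2y})^3} = \frac{1}{2\lambda \bar{t}_2}\left\{\frac{(\kappa + (1/2)\lambda n_g)^2}{(\bar{t}_2 + \kappa + (1/2)\lambda n_g)^2} - \frac{(\kappa + \lambda)^2}{(\bar{t}_2 + \kappa + \lambda)^2}\right\} = I_{2\beta} \tag{A8}$$

with an equivalent expression for $I_{2\gamma}$, again replacing $l_{iw} + l_{is}$ in the expressions for κ and λ by $(l_{iw} + l_{is})n_s$.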
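Both integrals over y can be checked numerically by writing β2y = κ + λy. The sketch below is illustrative Python; the recombination rate per base pair (MeA = 0.25 spread evenly over an arm with the Table 2 organization) and the mean of ts are assumptions made only for this check, and all names are hypothetical. In the closed form used for the second integral, both numerators inside the braces are squared, which is what direct integration gives.

```python
import math

# Chromosome organization from the footnote to Table 2.
L_G, L_IW, L_IS, N_S, N_G = 1500, 115, 39, 36, 2800
L_I = N_S * L_IS + (N_S + 1) * L_IW    # implied intergenic length (5659 bp)
RHO = 0.25 / (N_G * (L_G + L_I))       # ASSUMED uniform effective rate per bp
T_BAR = 1e-3                           # hypothetical mean of ts
KAPPA = RHO * (L_IW + L_IS - 0.5 * L_I)
LAM = RHO * (L_I + L_G)


def i1_closed():
    """Closed form of the integral of 1/(t_bar + beta) for y in [1, n_g/2]."""
    upper = T_BAR + KAPPA + 0.5 * LAM * N_G
    lower = T_BAR + KAPPA + LAM
    return math.log(upper / lower) / LAM


def i2_closed():
    """Closed form of the integral of beta/(t_bar + beta)^3 for y in [1, n_g/2]."""
    hi, lo = KAPPA + 0.5 * LAM * N_G, KAPPA + LAM
    return (hi ** 2 / (T_BAR + hi) ** 2
            - lo ** 2 / (T_BAR + lo) ** 2) / (2 * LAM * T_BAR)


def midpoint(f, lo, hi, steps=100_000):
    """Simple midpoint-rule quadrature."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


i1_numeric = midpoint(lambda y: 1.0 / (T_BAR + KAPPA + LAM * y), 1, N_G / 2)
i2_numeric = midpoint(
    lambda y: (KAPPA + LAM * y) / (T_BAR + KAPPA + LAM * y) ** 3, 1, N_G / 2
)

print(math.isclose(i1_closed(), i1_numeric, rel_tol=1e-5))
print(math.isclose(i2_closed(), i2_numeric, rel_tol=1e-5))
```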

These can be used in place of corresponding components of the sum of the E2j in Equation A5, together with the E20 term, yielding the model 5 approximation for B2.

Distances from the Focal Site for Weakly Selected Noncoding Sites and Expressions for Models 4 and 5

A similar approach can be used for weakly selected sites. The distances d1mj and d2mj from the focal site to the beginning and end of the mth block of weakly selected noncoding sites in the jth intergenic region to its right are given by

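$$d_{110} = 0, \qquad d_{210} = 0.5\, l_{iw} \qquad (j = 0,\ m = 1) \tag{A9a}$$

$$d_{1m0} = (m - 1)(l_{iw} + l_{is}) - 0.5\, l_{iw}, \qquad d_{2m0} = d_{1m0} + l_{iw} \qquad (j = 0,\ 2 \le m \le n_s/2). \tag{A9b}$$

For $1 \le j \le n_g/2$

$$d_{1mj} = j\,l_g + (j - 0.5)\,l_i + (m - 1)(l_{iw} + l_{is}), \qquad d_{2mj} = d_{1mj} + l_{iw} \qquad (1 \le m \le n_s). \tag{A9c}$$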

These expressions yield the corresponding recombination frequencies, r1mj and r2mj, by multiplying by the appropriate function that relates recombination rate to distance.
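The distance formulas for the two kinds of noncoding block can be checked against each other, since weakly and strongly selected blocks alternate within an intergenic region. The sketch below is illustrative Python with hypothetical function names; it uses the block lengths and numbers from the footnote to Table 2 and the expressions for j ≥ 1 (Equations A3b and A9c) to confirm that each weakly selected block ends where the strongly selected block with the same index begins, and that each strongly selected block ends where the next weakly selected block begins.

```python
# Chromosome organization from the footnote to Table 2.
L_G, L_IW, L_IS, N_S = 1500, 115, 39, 36
L_I = N_S * L_IS + (N_S + 1) * L_IW   # implied intergenic length (5659 bp)


def strong_block(m, j):
    """Start and end of the mth strongly selected block (Equation A3b, j >= 1)."""
    d1 = j * L_G + (j - 0.5) * L_I + m * L_IW + (m - 1) * L_IS
    return d1, d1 + L_IS


def weak_block(m, j):
    """Start and end of the mth weakly selected block (Equation A9c, j >= 1)."""
    d1 = j * L_G + (j - 0.5) * L_I + (m - 1) * (L_IW + L_IS)
    return d1, d1 + L_IW


def blocks_tile(j):
    """True if weak and strong blocks abut without gaps or overlaps."""
    weak_then_strong = all(
        weak_block(m, j)[1] == strong_block(m, j)[0] for m in range(1, N_S + 1)
    )
    strong_then_weak = all(
        strong_block(m, j)[1] == weak_block(m + 1, j)[0] for m in range(1, N_S)
    )
    return weak_then_strong and strong_then_weak


print(all(blocks_tile(j) for j in (1, 2, 100)))
```

The first weakly selected block of the jth intergenic region starts at j lg + (j − 0.5) li, i.e., immediately after the jth gene to the right of the focal site.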

Following the same procedure as for the strongly selected noncoding sequences, the equivalent of Equation A4 is

$$E_{3mj}(t_s) \approx \frac{U_3 t_s}{n_g (n_s + 1)\,(t_s + r_{1mj}[1 - t_s])(t_s + r_{2mj}[1 - t_s])}. \tag{A10}$$

This yields the expression for B3 in model 4, by taking the exponential function of the negative of its sum over all m and j.

The model 5 approximation to this sum can be obtained in a similar way to that used for the strongly selected noncoding sites. The equivalent to Equation A6 is

$$E_{3j} \approx \frac{U_{3T}}{n_g (n_s + 1)\,\tilde{\rho}\,(l_{iw} + l_{is})}\left\{\frac{1}{(\bar{t}_3 + \beta_{3j})}\left[\bar{t}_3 - \frac{\beta_{3j} V_3}{(\bar{t}_3 + \beta_{3j})^2}\right] - \frac{1}{(\bar{t}_3 + \gamma_{3j})}\left[\bar{t}_3 - \frac{\gamma_{3j} V_3}{(\bar{t}_3 + \gamma_{3j})^2}\right]\right\}, \tag{A11}$$

where $\beta_{30} = 0.5\tilde{\rho}\, l_{iw}$, $\beta_{3j} = \tilde{\rho}(j l_g + [j - 0.5] l_i + l_{iw})$ (for $1 \le j \le n_g/2$), $\gamma_{30} = \tilde{\rho}(n_s l_{is} + [n_s + 0.5] l_{iw})$, $\gamma_{3j} = \tilde{\rho}(j l_g + [j - 0.5] l_i + n_s l_{is} + [n_s + 1] l_{iw})$ (for $1 \le j \le n_g/2$), and $\bar{t}_3$ and $V_3$ are the mean and variance of ts over the truncated distribution of s for weakly selected noncoding sites. (Here, the discontinuity between m = 1 and m = 2 for j = 0 has been ignored, since it makes only a small contribution to the total; Equation A9b has been used for the case when j = 0 and m = 1.)

Equivalents to Equations A7 and A8 can then be obtained by the same method as for strongly selected noncoding sites, where now $\bar{t}_2$ is replaced with $\bar{t}_3$, $\kappa = \tilde{\rho}(l_{iw} - 0.5\, l_i)$, and $\lambda = \tilde{\rho}(l_i + l_g)$ in the equivalents of Equations A7 and A8; κ is replaced by $\tilde{\rho}(n_s l_{is} + [n_s + 1] l_{iw} - 0.5\, l_i)$ in the corresponding expressions involving γ.

Footnotes

Communicating editor: J. Wakeley

Literature Cited

  1. Andolfatto P., 2001.  Contrasting patterns of X-linked and autosomal nucleotide variation in African and non-African populations of Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 18: 279–290. [DOI] [PubMed] [Google Scholar]
  2. Aquadro C. F., Begun D. J., Kindahl E. C., 1994.  Selection, recombination, and DNA polymorphism in Drosophila, 46–56 Non-neutral Evolution: Theories and Molecular Data, edited by Golding B. Chapman & Hall, London. [Google Scholar]
  3. Ashburner M., Golic K. G., Hawley R. S., 2005.  Drosophila. A Laboratory Handbook. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. [Google Scholar]
  4. Bachtrog D., 2008.  Evidence for male-driven evolution in Drosophila. Mol. Biol. Evol. 25: 617–619. [DOI] [PubMed] [Google Scholar]
  5. Bachtrog D., Andolfatto P., 2006.  Selection, recombination and demographic history in Drosophila miranda. Genetics 174: 2045–2059. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Barton N. H., 1995. Linkage and the limits to natural selection. Genetics 140: 821–841.
  7. Bauer V. L., Aquadro C. F., 1997. Rates of DNA sequence evolution are not sex biased in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 14: 1252–1257.
  8. Begun D. J., Aquadro C. F., 1993. African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365: 548–550.
  9. Begun D. J., Whitley P., 2000. Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97: 5960–5965.
  10. Betancourt A. J., Kim Y., Orr H. A., 2004. A pseudohitchhiking model of X vs. autosomal diversity. Genetics 168: 2261–2269.
  11. Casillas S., Barbadilla A., Bergman C. M., 2007. Purifying selection maintains highly conserved noncoding sequences in Drosophila. Mol. Biol. Evol. 24: 2222–2234.
  12. Charlesworth B., 1996. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68: 131–150.
  13. Charlesworth B., 2001. The effect of life-history and mode of inheritance on neutral genetic variability. Genet. Res. 77: 153–166.
  14. Charlesworth B., 2012. The effects of deleterious mutations on evolution at linked sites. Genetics 190: 5–22.
  15. Charlesworth B., Charlesworth D., 2010. Elements of Evolutionary Genetics. Roberts & Co., Greenwood Village, CO.
  16. Charlesworth B., Coyne J. A., Barton N. H., 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130: 113–146.
  17. Charlesworth B., Morgan M. T., Charlesworth D., 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303.
  18. Cobbs G., 1978. Renewal approach to the theory of genetic linkage: case of no chromatid interference. Genetics 89: 563–581.
  19. Crow J. F., Simmons M. J., 1983. The mutation load in Drosophila, pp. 1–35 in The Genetics and Biology of Drosophila, Vol. 3c, edited by M. Ashburner, H. L. Carson, and J. N. Thompson. Academic Press, London.
  20. Ellegren H., 2009. The different levels of genetic diversity in sex chromosomes and autosomes. Trends Genet. 25: 278–284.
  21. Eyre-Walker A., Keightley P. D., 2009. Estimating the rate of adaptive mutations in the presence of slightly deleterious mutations and population size change. Mol. Biol. Evol. 26: 2097–2108.
  22. Garcia-Dorado A., Caballero A., 2000. On the average coefficient of dominance of deleterious spontaneous mutations. Genetics 155: 1991–2001.
  23. Haag-Liautard C., Dorris M., Maside X., Macaskill S., Halligan D. L., et al., 2007. Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature 445: 82–85.
  24. Haddrill P. R., Charlesworth B., Halligan D. L., Andolfatto P., 2005. Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 6: R67.
  25. Haddrill P. R., Loewe L., Charlesworth B., 2010. Estimating the parameters of selection on nonsynonymous mutations in Drosophila pseudoobscura and D. miranda. Genetics 185: 1381–1396.
  26. Haddrill P. R., Zeng K., Charlesworth B., 2011. Determinants of synonymous and nonsynonymous variability in three species of Drosophila. Mol. Biol. Evol. 28: 1731–1743.
  27. Halligan D. L., Keightley P. D., 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide sequence comparison. Genome Res. 16: 875–884.
  28. Harr B., Kauer M., Schloetterer C., 2002. Hitchhiking mapping: a population-based fine-mapping strategy for adaptive mutations in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 99: 12949–12954.
  29. Hedrick P. W., 2007. Sex differences in mutation, recombination, selection, gene flow, and genetic drift. Evolution 61: 2750–2771.
  30. Hudson R. R., Kaplan N. L., 1994. Gene trees with background selection, pp. 140–153 in Non-neutral Evolution: Theories and Molecular Data, edited by B. Golding. Chapman & Hall, London.
  31. Hudson R. R., Kaplan N. L., 1995. Deleterious background selection with recombination. Genetics 141: 1605–1617.
  32. Hutter S., Li H. P., Beisswanger S., De Lorenzo D., Stephan W., 2007. Distinctly different sex ratios in African and European populations of Drosophila melanogaster inferred from chromosome-wide nucleotide polymorphism data. Genetics 177: 469–480.
  33. Kacser H., Burns J. A., 1981. The molecular basis of dominance. Genetics 97: 639–666.
  34. Keightley P. D., Eyre-Walker A., 2007. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177: 2251–2261.
  35. Keightley P. D., Trivedi M., Thomson M., Oliver F., Kumar S., et al., 2009. Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res. 19: 1195–1201.
  36. Kimura M., 1971. Theoretical foundations of population genetics at the molecular level. Theor. Popul. Biol. 2: 174–208.
  37. Kulathinal R. J., Bennett S. M., Fitzpatrick C. L., Noor M. A. F., 2008. Fine-scale mapping of recombination rate in Drosophila refines its correlation to diversity and divergence. Proc. Natl. Acad. Sci. USA 105: 10051–10056.
  38. Loewe L., Charlesworth B., 2006. Inferring the distribution of mutational effects on fitness in Drosophila. Biol. Lett. 2: 426–430.
  39. Loewe L., Charlesworth B., 2007. Background selection in single genes may explain patterns of codon bias. Genetics 175: 1381–1393.
  40. Loewe L., Charlesworth B., Bartolomé C., Nöel V., 2006. Estimating selection on nonsynonymous mutations. Genetics 172: 1079–1092.
  41. Mackay T. F. C., Richards S., Stone E. A., Barbadilla A., Ayroles J. F., et al., 2012. The Drosophila melanogaster genetic reference panel. Nature 482: 173–178.
  42. Misra S., Crosby M. A., Mungall C. J., Matthews B. B., Campbell K. S., et al., 2002. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol. 3: Research0083.
  43. Nordborg M., Charlesworth B., Charlesworth D., 1996. The effect of recombination on background selection. Genet. Res. 67: 159–174.
  44. Pool J. E., Nielsen R., 2007. Population size changes reshape genomic patterns of diversity. Evolution 61: 3001–3006.
  45. Pool J. E., Nielsen R., 2008. The impact of founder events on chromosomal variability in multiply mating species. Mol. Biol. Evol. 25: 1728–1736.
  46. Sackton T. B., Kulathinal R. J., Bergman C. M., Quinlan A. R., Dopman E. B., et al., 2009. Population genomic inferences from sparse high-throughput sequencing of two populations of Drosophila melanogaster. Genome Biol. Evol. 1: 449–465.
  47. Schneider A., Charlesworth B., Eyre-Walker A., Keightley P. D., 2011. A method for inferring the rate of occurrence and fitness effects of advantageous mutations. Genetics 189: 1427–1437.
  48. Sella G., Petrov D. A., Przeworski M., Andolfatto P., 2009. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5: e1000495.
  49. Singh N. D., Macpherson J. M., Jensen J. D., Petrov D. A., 2007. Similar levels of X-linked and autosomal nucleotide variation in African and non-African populations of Drosophila melanogaster. BMC Evol. Biol. 7: 202.
  50. Stephan W., 2010. Genetic hitchhiking vs. background selection: the controversy and its implications. Philos. Trans. R. Soc. B 365: 1245–1253.
  51. Stevison L. S., Noor M. A. F., 2010. Genetic and evolutionary correlates of fine-scale recombination rate variation in Drosophila persimilis. J. Mol. Evol. 71: 332–345.
  52. Sturtevant A. H., Tan C. C., 1937. The comparative genetics of Drosophila pseudoobscura and Drosophila melanogaster. J. Genet. 34: 415–431.
  53. Vicoso B., Charlesworth B., 2009a. Effective population size and the Faster-X effect: an extended model. Evolution 63: 2413–2426.
  54. Vicoso B., Charlesworth B., 2009b. Recombination rates may affect the ratio of X to autosomal noncoding polymorphism in African populations of Drosophila melanogaster. Genetics 181: 1699–1701.
  55. Wall J. D., Andolfatto P., Przeworski M. F., 2002. Testing models of selection and demography in Drosophila simulans. Genetics 162: 203–216.
  56. Wilson D. J., Hernandez R. D., Andolfatto P., Przeworski M., 2011. A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet. 7: e1002395.
  57. Wright S., 1931. Evolution in Mendelian populations. Genetics 16: 97–159.
  58. Wright S., 1934. Physiological and evolutionary theories of dominance. Am. Nat. 68: 25–53.
  59. Zeng K., 2010. A simple multiallele model and its application to identifying preferred–unpreferred codons using polymorphism data. Mol. Biol. Evol. 27: 1327–1337.
  60. Zeng K., Charlesworth B., 2009. Estimating selection intensity on synonymous codon usage in a non-equilibrium population. Genetics 183: 651–662.
  61. Zeng K., Charlesworth B., 2010. Studying patterns of recent evolution at synonymous sites and intronic sites in Drosophila melanogaster. J. Mol. Evol. 70: 116–128.
  62. Zeng K., Charlesworth B., 2011. The joint effects of background selection and genetic recombination on local gene genealogies. Genetics 189: 251–266.

Articles from Genetics are provided here courtesy of Oxford University Press