Keywords: autosomes, polymorphism, sex-chromosomes
Abstract
Understanding the determinants of neutral diversity patterns on autosomes and sex chromosomes provides a bedrock for the interpretation of population genetic data; in particular, differences between the two inform our understanding of sex-specific demographic and mutation processes. While sex-specific age-structure and variation in reproductive success have long been known to affect neutral diversity, theoretical descriptions of these effects were complicated and lacking in generality, stymying attempts to relate diversity patterns of species to their life histories. Here, we derive general yet simple expressions for these effects. In particular, we show that life history effects on X-to-autosome ratios of pairwise diversity levels (X:A diversity ratios) depend only on the male-to-female ratios of mutation rates, generation times, and reproductive variances. Our results reveal that changing the male-to-female ratio of generation times has opposite effects on X:A ratios of diversity and divergence. They also explain how sex-specific life histories modulate the response of X:A diversity ratios to changes in population size. More generally, they clarify that sex-specific life history (generation times in particular) should have marked effects on X:A diversity ratios in many taxa and enable further investigation of these effects.
ELUCIDATING the forces that shape neutral diversity patterns has been a major obsession of modern population genetics (Crow and Kimura 1970; Gillespie 2004; Charlesworth and Charlesworth 2010) and a main driver of inferences from population genetic data (e.g., International HapMap Consortium 2003; Myers et al. 2005; McVicker et al. 2009; Li and Durbin 2011). Contrasting relative neutral diversity levels on the X and autosomes is particularly interesting in this regard, as this contrast provides a unique setting for testing our understanding of the effects of these evolutionary forces. All else being equal, the ratio of genetic diversity levels on X and autosomes should mirror the ratio of their numbers in the population and thus equal 3/4. However, autosomes spend an equal number of generations in diploid form in both sexes, whereas the X spends twice as many generations in diploid form in females as in haploid form in males. As a result, X-to-autosome (X:A) diversity ratios can also be shaped by sex differences in life history and mutation processes [which are sexually dimorphic in many species (Hedrick 2007)], as well as by differences between X and autosomes in the effects of demographic history and selection at linked sites. Here, we focus on life history effects (as well as their interactions with demographic history), setting aside selection at linked sites, whose effects have been studied extensively (Charlesworth et al. 1987; Haldane 1924; Hammer et al. 2010; Charlesworth 2012). We focus on the X throughout, but similar arguments apply to neutral diversity on sex chromosomes more generally.
While the effects of sex-specific age-structures on neutral diversity have been modeled for decades (Felsenstein 1971), these effects have been underappreciated in empirical studies of X:A diversity ratios. Notably, in many species, generation times differ substantially between sexes (De Magalhães et al. 2007), with likely implications for the diversity levels of X compared to autosomes. Indeed, parental ages are among the strongest known modifiers of mutation rates, and of the degree of male mutation bias (α), in mammals (Crow 2006; Sayres et al. 2011; Segurel et al. 2014; Jónsson et al. 2017; Gao et al. 2019). Additionally, longer male than female generation times increase the numbers of generations that occur on the X relative to autosomes per unit time, influencing rates of divergence on the X relative to autosomes on a phylogenetic branch of fixed length (Amster and Sella 2016). By the same token, we would expect sex-specific generation times to affect diversity ratios: for a given absolute time (in years) to the most recent common ancestor (MRCA) of alleles at X- or autosome-linked loci, sex differences in generation times will lead to different numbers of generations on the X and autosomal lineages. In the case of diversity, however, this effect cannot be considered in isolation, as the coalescence process (and thus the distributions of times to the MRCAs on the X and autosomes) will also be affected by sex-specific generation times, and, perhaps, by sex-specific age structure more generally (Charlesworth 2001).
The effects of age structure on the effective population size were first described by Felsenstein (1971) in a haploid model, and later extended for the X and autosomes by others (e.g., Hill 1972; Johnson 1977; Emigh and Pollak 1979; Pollak 1990; Charlesworth 1994; Charlesworth 2001). Felsenstein assumed a haploid population, in which individuals survive to age i, and a proportion of newborns descend from parents of age i. He relied on identity by descent considerations to show that
(1) |
where and is the expected generation time. While this result, its extensions to X and autosomes (Hill 1972; Johnson 1977; Emigh and Pollak 1979; Pollak 1990; Charlesworth 1994), and generalizations incorporating reproductive variance (Charlesworth 2001), provide valuable insights, they depend on a full parametrization of the age structure of the population. The dependence on so many parameters limits our understanding of the general effects of age structure on neutral diversity. Hill and Pollak derived alternative expressions for the $N_e$ of autosomes (Hill 1972) and of the X (Pollak 1990). For autosomes, for example, Hill showed that
(2) |
where is the variance in the number of descendants of sex that parents of sex have throughout their life, is the covariance between the numbers of male and female descendants of parents of sex s throughout their life, is the proportion of newborns of sex s, and is the sex-averaged generation time. While Hill and Pollak’s expressions rely on fewer parameters, they are still quite complicated, again limiting our understanding of the effects of age structure. In both cases, the underlying parameters are also difficult to measure in nature, making it hard to relate the theoretical results with observed X:A diversity ratios. In contrast, the effects of a sex-specific age-structure on X:A ratios of neutral substitutions between species depend only on the male-to-female ratio of generation times (Charlesworth 1994; Amster and Sella 2016), making them easy to understand and to test against phylogenetic observations (Amster and Sella 2016).
Unlike the effects of sex-specific age structure, the effects of differences in reproductive variance between males and females (i.e., variance in the number of offspring produced throughout life) on X:A diversity ratios have been considered by many empirical studies (e.g., Charlesworth 2001; Hammer et al. 2008). Reproductive variance can arise from age-structure per se, i.e., from stochastic differences in age at death, age-specific fecundity, and sampling variance in the number of offspring. It can also arise from endogenous differences among individuals (due to environmental, social, or genetic causes). For example, many species are known to be highly polygynous, i.e., a minority of males sire offspring with multiple females, resulting in a higher reproductive variance in males than in females. We consider such variation in male fecundity to be endogenous. Since the X spends twice as many generations in females as in males, higher male reproductive variance decreases coalescence rates on the X relative to autosomes, and, thus, increases the X:A diversity ratio. Both theoretical and empirical studies suggest that this effect can be substantial (e.g., Charlesworth 2001; Hammer et al. 2008).
From a theoretical standpoint, the effects of endogenous variation in reproductive success take simple forms when they are studied in isolation, i.e., without interactions with age structure. Notably, Wright (1939, 1986) derived a simple expression for the effective population size of diploid populations with endogenous reproductive variance and nonoverlapping generations, i.e., without age-structure (see below); we present a straightforward generalization of this expression for the X (also see below). Since then, several studies have incorporated reproductive variance into models with age structure, under the assumption that the age structure and reproductive variance are completely or partially independent (e.g., Sagitov and Jagers 2005; Pollak 2011). This assumption does not account for known correlations between ages of reproduction, reproductive success, and longevity: for example, between the age of first reproduction and longevity (Westendorp and Kirkwood 1998; Pettay et al. 2005), or between reproductive success and longevity (Westendorp and Kirkwood 1998; Thomas et al. 2000). A couple of studies considered more general models combining reproductive variance and age structure, but their analyses either relied on simulations with particular parameter choices (Evans and Charlesworth 2013) or yielded complicated analytical results (Charlesworth 2001), with even more parameters than those with age-structure alone.
Our goal here is to understand how sex-specific age structure and variation in reproductive success affect neutral diversity levels on the X and autosomes in a general setting. Importantly, we aim to derive simple expressions for these dependencies, in terms of parameters that are straightforward to measure, so that the models can be related to and tested against observed diversity levels on the X and autosomes. To achieve these goals, we build on the coalescent treatment of age-structured populations (Orive 1993; Sagitov and Jagers 2005; Pollak 2011). We define the effective population size, $N_e$, in terms of the expected coalescence time of a pair of neutral alleles (i.e., we use this expectation for haploids and half that for diploids), but note that other definitions used in cited work are equivalent to ours under the models considered.
Data availability
The authors state that all data necessary for confirming the conclusions presented in the manuscript are represented fully within the manuscript. Supplemental material available at figshare: https://doi.org/10.25386/genetics.12440075.
Results
The haploid case
The model:
To illustrate how we treat the coalescence process in an age-structured population, we first consider the haploid model proposed by Felsenstein (1971). We assume a haploid, panmictic population of constant size that is divided into age classes of 1 year (for convenience). We denote the number of individuals of age by , where is assumed to be constant, due to mortality, and for ages (see Table 1 for a summary of notations). We further assume that each of the newborns is independently chosen to descend from a random parent: the ages of the parents are chosen from a distribution with expectation G (the generation time), and the specific parent is chosen with uniform probability within the age class (endogenous reproductive variance is considered below).
Table 1. Notation for the haploid model.
Notation | Definition |
---|---|
 | Probability that a newborn descends from a parent of age a |
 | Number of individuals of age a |
G | Expected generation time |
M | Effective age-class size |
 | Relative reproductive success, where is the relative reproductive success at age a |
 | Expected value of conditioned on survival to age j () |
 | Weighted average of |
 | Reproductive variance |
Effective population size:
This model was solved by Sagitov and Jagers (2005) in a coalescent framework, and here we provide an intuitive account of the solution (see SI Section 1 for a rigorous derivation).
We consider the rate of coalescence for a sample of two alleles. To this end, we trace their lineages backward in time in steps of the same size in which we measure age (e.g., years). When we trace the lineage of a single allele backward in time, the age of the individuals that carry it, a, forms a Markov chain, with stationary distribution . As we would expect from intuition, the stationary probability of being carried by a newborn is More generally, (Sagitov and Jagers 2005), where, intuitively, the term represents the probability that the allele was carried by a newborn years ago (with probability ) who inherited it from a parent of age j (with probability ). When we trace the lineages of two alleles backward in time, a coalescence can occur in age class in two possible routes. In the first, both alleles would be in newborns in the previous time step (with stationary probability ), descending from age class a (with probability ) from the same parent (with probability ). In the second, only one of the alleles would be in a newborn in the previous time step and the other would be in the age class (with stationary probability ), and the first allele would descend from the second the next step backward in time (with probability ). This second route also requires that (which is handled by defining ). Thus, the stationary probability of coalescence per year in age class a is
(3) |
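The stationary age distribution of a single lineage described above can be checked numerically. The following is a minimal sketch, not the derivation in SI Section 1: the names `p`, `q`, and `G`, the convention that newborns have age 1, and the example parental-age distribution are all assumptions made for illustration. It builds the candidate distribution $q_a = \frac{1}{G}\sum_{j \ge a} p_j$ and verifies that one backward step of the lineage's age process leaves it unchanged.

```python
# Hypothetical illustration: p, q, G and the age indexing (newborn = age 1)
# are assumptions, not notation taken from the paper.

def stationary_age_distribution(p):
    """p[j-1] = probability that a newborn descends from a parent of age j."""
    G = sum(j * pj for j, pj in enumerate(p, start=1))  # expected generation time
    # Candidate stationary distribution: q_a = (1/G) * sum_{j >= a} p_j
    q = [sum(p[a - 1:]) / G for a in range(1, len(p) + 1)]
    return q, G

def step_back(q, p):
    """Trace the lineage one time step backward.

    A carrier of age a+1 was of age a one step earlier; a newborn carrier
    (age 1) inherited the allele from a parent of age j with probability p_j.
    """
    n_ages = len(p)
    previous = [0.0] * n_ages
    for a in range(1, n_ages + 1):
        from_older = q[a] if a < n_ages else 0.0  # q[a] holds age a+1 (0-based list)
        from_newborn = q[0] * p[a - 1]
        previous[a - 1] = from_older + from_newborn
    return previous

p = [0.0, 0.3, 0.5, 0.2]          # example parental-age distribution (ages 1..4)
q, G = stationary_age_distribution(p)
q_next = step_back(q, p)

print("generation time:", G)
print("stationary distribution:", q)
print("after one backward step:", q_next)
```

In this sketch the probability of finding the lineage in a newborn is `q[0]`, which equals `1 / G`, in line with the intuition that a lineage passes through one birth per generation.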
The probability of coalescence per generation and corresponding effective population size follow. Multiplying the probabilities per age class per year by the generation time, summing them over age classes, and rearranging terms, we find that
(4) |
where In SI Section 1.3, we show that deviations from the stationary age distribution have negligible effects on coalescence rates, and thus that this approximation—and by analogy, those in the rest of the paper—conform with standard theory in neglecting effects that are smaller than an order of (Hudson 1990). Note that the (a, j)-term in is proportional to the probability that the coalescence in age class a occurs in an individual that gave birth to a newborn carrying one of the alleles at age a and a newborn carrying the other allele at age j. The terms add up to 1, allowing us to define the effective age class size, M, as the weighted harmonic mean
(5) |
The effective population size can then be viewed as the product of the effective age-class size and the effective number of age-classes, which is simply the expected generation time G, i.e.,
(6) |
In SI Section 1.5, we show that Equation 6 is equivalent to Felsenstein’s formula (Equation 1). The definition of the effective age-class size as a harmonic mean simplifies this formula and provides an intuition for the effect of age-structure on .
Reproductive variance:
Next, we extend the model to incorporate endogenous reproductive variance. To this end, we assume that each newborn is assigned a random vector (from a distribution that is a parameter of the model) describing its age-dependent, relative reproductive success, such that its probability of being chosen as a parent among the individuals of age class a is (thus, corresponds to the expected, rather than realized, reproductive success of the individual at age a). We further assume that the proportion of individuals with a given vector that reach age a, can vary with age, in effect allowing for dependencies between expected reproductive success and longevity. In SI Section 1.7, we detail why our results also apply to models that incorporate dependencies between realized reproductive success and longevity, such as the one proposed by Evans and Charlesworth (Evans and Charlesworth 2013). Our results thus apply to quite general interactions between reproductive success and age structure, including those that have been observed (see Introduction).
The extended model can be solved along the same lines we described above (see SI Section 1.2). Specifically, the coalescence rate per generation for a sample of two alleles and corresponding effective population size take a similar form:
(7) |
but in this case
(8) |
where for As in the simpler case, the (a, j)-term in is proportional to the probability that coalescence in age class a occurs in an individual that gave birth to a newborn carrying one of the alleles at age a and a newborn carrying the other allele at age j; but in this case, the coefficients and their weighted sums , incorporate the effect of endogenous reproductive variance. In contrast to the simpler case, the terms do not necessarily add up to 1. We therefore introduce a normalization by and define the effective age class size as
(9) |
In these terms, the effective population size takes the form
(10) |
Recasting in terms of reproductive variance:
To provide further intuition, we first consider the special case in which relative reproductive success is independent of age and of mortality rates. Namely, each newborn is assigned a scalar relative reproductive success r at birth, such that its probability of being chosen as a parent among the individuals of age class a is ; the distribution of r values then has expectation 1 (by construction). We denote its variance by ( when there is no endogenous reproductive variance). In this case, the coalescence rates in any given age class are increased by a factor and therefore the effective population size is
(11) |
Thus, , which can be interpreted as the reproductive variance caused by the Poisson sampling of parents, which contributes 1, and by the endogenous reproductive variance, which contributes In turn, the contribution of age-structure to reproductive variance is reflected in the term
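The inflation of coalescence rates by endogenous reproductive variance can be illustrated with a short calculation. This is a sketch under stated assumptions: the symbols `r`, `M`, and `V`, the gamma distribution used to draw relative reproductive success, and the age-class size are all hypothetical choices. If parent i in an age class of size M is chosen with probability $r_i/M$ and the $r_i$ average to 1, then two newborns pick the same parent with probability $\sum_i (r_i/M)^2 = (1+V)/M$, where V is the variance of the $r_i$. The Poisson sampling of parents contributes the 1 and the endogenous variance contributes V.

```python
import random

def same_parent_probability(r):
    """Probability that two newborns pick the same parent within one age class,
    when parent i is chosen with probability r_i / sum(r)."""
    total = sum(r)
    return sum((ri / total) ** 2 for ri in r)

random.seed(1)
M = 500                                             # hypothetical age-class size
raw = [random.gammavariate(2.0, 1.0) for _ in range(M)]
mean_raw = sum(raw) / M
r = [x / mean_raw for x in raw]                     # rescale so the mean is exactly 1
V = sum((x - 1.0) ** 2 for x in r) / M              # endogenous reproductive variance

prob = same_parent_probability(r)
expected = (1.0 + V) / M                            # baseline 1/M inflated by (1 + V)
baseline = same_parent_probability([1.0] * M)       # no endogenous variance

print("with endogenous variance:", prob)
print("(1 + V) / M:             ", expected)
print("without (1 / M):         ", baseline)
```

Because the identity is algebraic, the first two printed values agree up to floating-point error for any choice of `r` with mean 1.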
The results of the general model can also be recast in terms of the reproductive variance. First consider a haploid Wright–Fisher process (i.e., with nonoverlapping generations) with endogenous reproductive variance, modeled similarly to the example considered above. In this case, Wright (1939, 1986) showed that the effective population size is
(12) |
where is the census population size and is the reproductive variance (strictly speaking, Wright derived an equivalent result for a diploid hermaphrodite model). Second, in the case with age-structure but without endogenous reproductive variance, Hill (1972) showed that the effective population size can also be written as
(13) |
where is the number of newborns per generation, and is the reproductive variance introduced by age structure. Comparing Equations 6, 10, and 13, we see that this variance can be expressed as This expression makes intuitive sense, as defined in this way would be the reproductive variance in the case considered by Wright, if individuals in the next generation were randomly chosen from a reproductive pool including only out of individuals in the previous one. In SI Section 1.4, we show that Equation 13 also holds for general models with age-structure and endogenous reproductive variance. In this case, the reproductive variance is
(14) |
where the first term in this product, is the variance introduced by stochasticity in birth and mortality, and the second term, reflects the contribution of endogenous reproductive variance.
In summary, Equation 13 implies that all the effects of age-structure and endogenous reproductive variance (and any dependence between them) on the effective population size can be summarized in terms of the generation time, the number of newborns per year, and the reproductive variance, In principle this equation applies to Y or W chromosomes, although in practice the effects of selection at linked loci are likely to dominate diversity levels on such chromosomes. Equation 13 also shows that, along with the reproductive variance, it is the number of newborns per generation, rather than the census population size, that determines the effective population size in age-structured populations. One implication is that (in the model with endogenous reproductive variance) is an upper bound on whereas is not (see SI Section 1.6).
Stochastic fluctuations in population size and continuous time:
Here, and below, we assume that the sizes of age classes are constant but we expect our results to hold under moderate, stochastic fluctuations in these sizes (e.g., due to sampling variance). To see why, consider the effects of relaxing the assumption of a constant population size in a model without age structure and with nonoverlapping generations. Assume, for example, that the population size in any given generation, is described by a zero-truncated Poisson distribution with a constant mean reflecting stochastic fluctuations with a constant carrying capacity. The coalescence rate in such a model would be (neglecting effects that are smaller than an order of thus illustrating that stochastic fluctuations in population size have a negligible effect on By the same token, we would expect our approximations to hold under moderate, stochastic fluctuations in the sizes of age classes when the population size is sufficiently large.
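The claim that such fluctuations are negligible can be checked directly. The sketch below assumes a haploid model with nonoverlapping generations, in which two lineages coalesce in the previous generation with probability 1/N, and a zero-truncated Poisson distribution for N; the function name, the parameter name `lam`, and the truncation of the upper tail are illustrative assumptions. It compares the expected coalescence rate, E[1/N], with the rate in a population of constant size `lam`.

```python
import math

def expected_inverse_size(lam, tail_sd=12):
    """E[1/N] for N drawn from a zero-truncated Poisson with parameter lam.

    The sum is cut off tail_sd standard deviations above lam; for large lam
    the neglected mass is vanishingly small, and the mean of N is close to lam.
    """
    upper = int(lam + tail_sd * math.sqrt(lam)) + 10
    norm = 1.0 - math.exp(-lam)                      # removes the N = 0 outcome
    total = 0.0
    for n in range(1, upper + 1):
        log_pmf = -lam + n * math.log(lam) - math.lgamma(n + 1)
        total += math.exp(log_pmf) / norm / n
    return total

for lam in (100.0, 1000.0, 10000.0):
    rate = expected_inverse_size(lam)
    rel_dev = rate * lam - 1.0                       # relative excess over 1/lam
    print(f"mean size ~{lam:>7.0f}: relative deviation of coalescence rate = {rel_dev:.2e}")

rel_dev_1000 = expected_inverse_size(1000.0) * 1000.0 - 1.0
```

The relative deviation shrinks roughly in proportion to the inverse of the mean size, so it is of the same order as the terms already neglected.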
We also assume discrete time steps, but expect our approximations to apply to models in continuous time. Indeed, consider a model of a population of size in which individuals have a constant lifespan of 1 year. If we measure time in discrete units of -th of a year then we have age classes, corresponding to ages , with and , and a generation time of From Equations 5 and 6 we find that and When this model becomes the standard Wright–Fisher model with nonoverlapping generations, in which In the continuous time limit In other words, in continuous time, the effective population size is reduced to half of the census size [as is the case in the Moran model, in which each death/birth happens at a distinct time (Ewens 2004)]. Importantly, as we increase the effective population size changes smoothly and converges quickly, suggesting that our results (e.g., Equations 13 and 18) remain intact in models in continuous time.
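The convergence to the continuous-time limit can be sketched numerically. The code below makes two assumptions that are not spelled out in the surviving text: the k age classes have equal size N/k, and parents are drawn uniformly across them. It then uses the statement that the effective population size is the product of the effective age-class size and the expected generation time (measured here in time steps). Under these assumptions the product is N(k+1)/(2k), which equals N for k = 1 and approaches N/2 as k grows.

```python
def effective_size_constant_lifespan(N, k):
    """Haploid effective size with a fixed 1-year lifespan and time steps of 1/k year.

    Assumptions (hypothetical): k age classes of equal size N/k, and each newborn
    picks its parent's age class uniformly at random.
    """
    class_size = N / k                                   # harmonic mean of equal class sizes
    generation_steps = sum(range(1, k + 1)) / k          # mean parental age, (k + 1) / 2 steps
    return generation_steps * class_size

N = 10_000
for k in (1, 2, 10, 100, 1000):
    ne = effective_size_constant_lifespan(N, k)
    print(f"k = {k:>4}: N_e / N = {ne / N:.4f}")
```

The ratio changes smoothly with k, which is the behavior the argument above relies on.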
X and autosomes
The diploid model with two sexes is more elaborate, but it is defined and solved along the same lines that we have described for the haploid model. Notably, in the diploid case, we allow for age-dependent mortality, fecundity, and endogenous reproductive variance to differ between the sexes. We also accommodate X and autosomal modes of inheritance (i.e., X-linked alleles in males always descend from females, whereas in all other cases alleles are equally likely to descend from males or females). In SI Section 2.3, we solve this model for the stationary distributions of sex and age and corresponding coalescence rates, by extending Pollak’s results for age-structured populations with two sexes (Pollak 2011) to account for endogenous reproductive variance.
Here, we describe the two formulations for the effective population sizes on the X and autosomes (defined as half the inverse of the coalescence rates) relying on the intuition gained from the haploid model. By analogy with the haploid case (Equation 10), we can express the effective population sizes on the X and autosomes in terms of their respective effective sizes of age-classes. First, we define the basic quantities ( and ) for each sex as in the haploid case. Second, we define these quantities for the X and autosomes as the appropriate averages over their values in males and females (Table 2). Formulated in this way, the effective population size for the X and autosomes take the form
(15) |
We note that the factor of 2, which is absent in the haploid case (Equation 10), does not arise from diploidy, which is accounted for by the definition of in the diploid case. Rather, it arises from doubling the effective number of age classes when there are two sexes (i.e., instead of classes in the haploid case), which doubles the expected coalescence time (because for a pair of alleles to coalesce, they must be in an individual of a given age and sex; see SI Section 2.3).
Table 2. Notation for the diploid model with two sexes.
Notation | Definition |
---|---|
 | Number of newborns (of both sexes) per year |
 | Proportion of male and female newborns |
 | Expected generation times in females and males |
 | Expected generation times for X and autosomes |
 | Expected endogenous reproductive success factors in males and females |
 | Expected endogenous reproductive success factors for X and autosomes |
 | Effective sizes of age classes in males and females |
 | Effective sizes of age classes for X and autosomes |
 | Reproductive variance of males and females |
 | Reproductive variance for X and autosomes |
 | Expected mutation rate per generation in males and females |
 | Average mutation rate on the X and autosomes |
 | Function relating male-to-female ratios and X:A ratios |
All quantities in males and females are defined as in the haploid model (Table 1). All quantities for X and autosomes, other than effective age-class sizes, are arithmetic averages over sexes, weighted by the proportion of generations that X and autosomes spend in each sex. The effective age-class sizes are harmonic averages over sexes, weighted by the relative endogenous reproductive success factors in each sex.
We can also express the effective population sizes on the X and autosomes in terms of the number of newborns per generation and the reproductive variances (by analogy with Equation 13 for the haploid case). To this end, we first generalize Wright’s result for the effects of endogenous reproductive variance with nonoverlapping generations [Equation 12; (Wright 1939, 1986)] to the X and autosomes. We assume that the population consists of females and males, with a total population size of and denote the proportion of each sex by where Under this model
(16) |
where and are the reproductive variances in females and males, respectively (SI Section 2.4). With an equal sex ratio at birth (i.e., ), these expressions reduce to
(17) |
where and are the reproductive variance on the X and autosomes, respectively (Table 2). When there is no endogenous reproductive variance and the (Poisson) sampling variance in a panmictic sexually reproducing population, Equation 17 reduces to the results for the standard Wright–Fisher model, and Equations 16 and 17 differ from the analogous Equation 12 for the haploid case in a couple of ways. In SI Section 2.6 we provide some intuition for these differences by recasting the effective population sizes on X and autosomes in terms of allelic rather than individual reproductive variances. Doing so makes the analogy between the haploid and diploid formulas apparent.
By analogy with the haploid case (Equation 13), the expressions with general age-structure and endogenous reproductive variances follow from replacing the population size by the number of newborns per generation:
(18) |
where in this case, the variances and also include the effects of age-structure, is the number of newborns of both sexes per year, and the proportions of each sex, and are defined with respect to the numbers of newborns rather than individuals (SI Section 2.4). In fact, Equation 18 remains valid when the parameters and , and and are defined for the individuals of sex who survive to age as long as precedes reproductive age (because removing all individuals that do not reach from the model does not affect the coalescent). For example, we can take to be the reproductive age of sex the number of individuals (of both sexes) that survive to reproductive age per year, the proportion of individuals among them of sex and the reproductive variance of a random individual of sex that survives to reproductive age.
The expressions in Equation 18 are considerably simpler than the results of previous work [see SI Section 2.5 for a detailed discussion of the relationship of our results to those of Hill (1972) and Pollak (1990)]. In particular, they are given in terms of fewer parameters, which can be measured more readily in extant populations, especially given the flexibility to measure parameters at any age preceding reproductive age (e.g., circumventing difficulties of accounting for infant mortality). Moreover, Equation 18 demonstrates that the behavior under a general model of age-structure and endogenous reproductive variance is well approximated by the standard coalescence process (with nonoverlapping generations and no endogenous reproductive variance) using the appropriate effective population sizes (Equation 18) and units for time (i.e., using generation times on the X and autosomes, and , respectively, to describe the coalescence process in years) (see SI Sections 1.2 and 1.3).
Diversity levels on X and autosomes:
With expressions for the effective population sizes in hand, we turn to diversity levels. We allow for mutation rates to vary with sex and age [as they do in many taxa; (Segurel et al. 2014)], with their per generation rates in females, and in males, defined as expectations over parental ages. The mutation rates per generation on the X, and autosome, are defined by the appropriate averages over their values in males and females (Table 2). The expected heterozygosity on X and autosomes then follow from the standard forms:
(19) |
(Charlesworth and Charlesworth 2010). Note that these expressions are usually derived assuming that the genealogical and mutational processes are independent (Hudson 1990). This assumption is violated here, because both the coalescence and mutation rates depend on the ages along a lineage. In SI Section 3, we show that the standard forms hold nonetheless.
We can now combine our results to derive expressions for X:A ratios of effective population sizes and pairwise diversity levels (i.e., heterozygosities). From Equation 18, with some rearrangement of terms, the ratio of effective population sizes is
(20) |
and, from Equation 19, the ratio of expected heterozygosities is
(21) |
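where $f(x) = \frac{4+2x}{3+3x}$ ($f$ is defined such that, given a sex-dependent parameter $x$ whose values on the X and autosomes are weighted arithmetic averages over sexes, i.e., $x_X = \frac{2}{3}x_F + \frac{1}{3}x_M$ and $x_A = \frac{1}{2}x_F + \frac{1}{2}x_M$, its X:A ratio is given by $x_X/x_A = f(x_M/x_F)$).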
When mutation rates, age structures, and endogenous reproductive variances are identical in both sexes, Equation 21 reduces to the naïve neutral expectation of 3/4. When they vary between sexes, Equation 21 shows that, even if many parameters are required to model life history, mutational and life history effects on the ratio reduce to the effects of the male-to-female ratios of mutation rates, generation times, and reproductive variances [more precisely,]. Importantly, Equations 20 and 21 are much simpler and more general than previous results, and are expressed in terms of male-to-female ratios of parameters that are considerably easier to measure in extant populations.
Discussion
Having a general yet simple expression for mutational and life history effects on the X:A diversity ratio allows us to draw several implications. First, it allows us to generalize previous bounds on the effects of each. Notably, we see that the multiplicative effects of the male-to-female ratios of mutation rates and of generation times are each bounded between 2/3 and 4/3 [see Miyata et al. (1987) for the mutational bound alone], and the effect of the male-to-female ratio of reproductive variances is bounded between 3/4 and 3/2 (see Charlesworth 2001). Considering these factors jointly, X:A diversity ratios are bounded between 1/4 and 2, a wider range than appreciated previously (note that these bounds apply only under a constant population size; see below). The functional dependence on the three male-to-female ratios also suggests that changes in them will have the greatest impact when they are small (because $|f'(x)|$ declines with $x$). As the male-to-female ratios on the right-hand side of Equation 21 are male biased (i.e., >1) in many taxa (Fenner 2005; Dixson 2009; Sayres and Makova 2011), we would expect differences in them among populations or closely related species to result in larger differences in X:A ratios when they are ∼1, and to have much smaller effects if far from 1.
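The bounds quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative rather than a restatement of Equation 21: the function name is hypothetical, and the multiplicative structure (a baseline of 3/4, multiplied by one factor each for mutation rates and generation times and divided by one for reproductive variances) is an assumption inferred from the stated bounds. The mapping from a male-to-female ratio to an X:A ratio uses only the weights of 2/3 female and 1/3 male for the X, and 1/2 each for autosomes.

```python
def x_to_a_ratio(male_to_female):
    """X:A ratio of a sex-averaged parameter, given its male-to-female ratio.

    Values are expressed in units of the female value: the X average weights
    females 2/3 and males 1/3; the autosomal average weights each sex 1/2.
    """
    x = male_to_female
    on_x = (2.0 + x) / 3.0
    on_autosomes = (1.0 + x) / 2.0
    return on_x / on_autosomes

f_min = x_to_a_ratio(1e12)   # strongly male-biased limit, close to 2/3
f_max = x_to_a_ratio(0.0)    # strongly female-biased limit, equal to 4/3

# Assumed structure: 3/4 * f(mutation) * f(generation time) / f(variance)
lower = 0.75 * f_min * f_min / f_max
upper = 0.75 * f_max * f_max / f_min

print("single-factor range:", f_min, f_max)
print("variance-factor range:", 1.0 / f_max, 1.0 / f_min)
print("joint range of X:A diversity ratio:", lower, upper)

# Sensitivity is greatest at small ratios: doubling 0.5 -> 1 vs. 4 -> 8
change_small = x_to_a_ratio(0.5) - x_to_a_ratio(1.0)
change_large = x_to_a_ratio(4.0) - x_to_a_ratio(8.0)
print("change when doubling from 0.5:", change_small, "from 4:", change_large)
```

The last two lines illustrate the diminishing sensitivity: the same proportional change in a male-to-female ratio moves the X:A ratio less when the ratio is already large.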
Second, our results clarify the differential effects of life history on X:A ratios of diversity and divergence. More precisely, we consider not divergence but the number of substitutions that accumulate on the X and autosomes on a lineage since a species split (i.e., ignoring multiple hits and the contribution of ancestral polymorphism), denoted and respectively. The equivalent expression to Equation 21 for the ratio of substitutions is
(22) |
(Charlesworth 1994; Amster and Sella 2016). Thus, male mutation bias has the same effect on the ratios of diversity and substitutions. In contrast, reproductive variances and the sex ratio at birth affect only the diversity ratio because they affect the relative rates of coalescence, and, thus, the relative lengths of lineages on the X and autosomes; for the substitutions ratio, this length is set by the species’ split time. Interestingly, the male-to-female ratio of generation times has opposite effects on diversity and substitutions ratios (Figure 1). These opposing effects arise because generation times affect not only the relative number of generations on the X compared to autosomes, which affects both diversity and substitutions ratios, but also relative coalescence rates, which affect only the diversity ratio.
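The logic of the substitutions ratio can be illustrated with a short sketch based on a standard sex-averaging argument: the substitution rate per unit time on a lineage is taken to be the average per-generation mutation rate divided by the average generation time, with both averages weighted by the fraction of generations spent in each sex (2/3 in females for the X, 1/2 for autosomes). This is a sketch under that assumption, with per-generation mutation rates treated as fixed inputs, and is not a transcription of Equation 22. All function and parameter names are hypothetical.

```python
def sex_averaged_rate(mu_f, mu_m, g_f, g_m, female_weight):
    """Mutations per unit time on a lineage.

    female_weight is the fraction of generations the lineage spends in
    females (2/3 for the X, 1/2 for autosomes). Mutation rates are per
    generation; generation times are in the same time unit as the result.
    """
    w = female_weight
    mean_mutations = w * mu_f + (1.0 - w) * mu_m
    mean_generation_time = w * g_f + (1.0 - w) * g_m
    return mean_mutations / mean_generation_time


def xa_substitution_ratio(mutation_ratio, generation_time_ratio):
    """X:A ratio of substitutions given male-to-female ratios.

    Female values are set to 1, since only the ratios matter here.
    """
    x_rate = sex_averaged_rate(1.0, mutation_ratio, 1.0, generation_time_ratio, 2.0 / 3.0)
    a_rate = sex_averaged_rate(1.0, mutation_ratio, 1.0, generation_time_ratio, 1.0 / 2.0)
    return x_rate / a_rate


if __name__ == "__main__":
    for alpha in (1.0, 2.0, 4.0):
        for g in (0.8, 1.0, 1.2):
            print(alpha, g, round(xa_substitution_ratio(alpha, g), 4))
```

In this sketch the ratio equals 1 when the sexes are identical, falls from 4/3 toward 2/3 as male mutation bias grows, and rises as the male generation time lengthens relative to the female one.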
Third, we can rely on the results for a constant population size to study how changes in population size and life history over time jointly affect X:A diversity ratios. Changes in population size are known to affect patterns of genetic variation in general [this signal underpins modern demographic inference, e.g., Li and Durbin (2011)], and X:A diversity ratios in particular (Hey and Harris 1999; Wall et al. 2002; Pool and Nielsen 2007). Having shown that life history also dramatically affects X:A ratios, it is natural to ask how these effects act jointly. In SI Section 4, we extend our model to incorporate these effects. We assume that population size, sex-specific life histories, and mutation rates are piecewise constant in n time intervals, indexed by i moving backward from the present. We show that the expected heterozygosity on autosomes at the present is well approximated by the recursion
(23) |
Heterozygosity on the X follows a similar recursion (Eqs. S136 and S137). The ratio in any given interval i follows from the effects of the male-to-female ratios of generation times and reproductive variances in a constant population size (Equation 20). These recursions can be solved for the expected heterozygosity on the X and autosomes at the present and at any time in the past. In SI Section 4.3, we further describe how standard coalescent simulators can be used to incorporate the effects of sex- and time-dependent mutation rates, reproductive variances, and age structure.
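The general shape of such a recursion can be sketched with the textbook result for a piecewise-constant population under the standard coalescent. The sketch below is not Equation 23: it assumes a single locus, time measured in generations, a constant mutation rate, and no sex-specific life history. The function names and the (duration, size) input format are hypothetical.

```python
import math


def expected_pairwise_time(intervals, ancestral_size):
    """Expected pairwise coalescence time, in generations.

    intervals: list of (duration_in_generations, effective_size) pairs,
        ordered from the present backward in time.
    ancestral_size: effective size before the oldest interval, assumed
        constant into the indefinite past.
    Two lineages coalesce at rate 1 / (2 * effective_size) per generation.
    """
    expected = 2.0 * ancestral_size
    # Work from the oldest interval toward the present.
    for duration, size in reversed(intervals):
        escape = math.exp(-duration / (2.0 * size))  # no coalescence in interval
        expected = 2.0 * size * (1.0 - escape) + escape * expected
    return expected


def expected_heterozygosity(intervals, ancestral_size, mutation_rate):
    """Expected pairwise diversity under the infinite-sites model."""
    return 2.0 * mutation_rate * expected_pairwise_time(intervals, ancestral_size)


if __name__ == "__main__":
    # Hypothetical history: size 1,000 for the last 500 generations, 10,000 before.
    print(expected_heterozygosity([(500, 1000)], 10000, 1e-8))
```

Each step combines the equilibrium contribution of the interval with the value inherited from older intervals, weighted by the probability that the two lineages have not yet coalesced.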
To illustrate how life history modulates the effects of changes in population size on the X:A diversity ratio, we consider a simple scenario, in which the autosomal population size drops from one constant value to a lower one at a given time, with time here measured forward. We further assume that male-to-female ratios of generation times, reproductive variances, and mutation rates are constant, and set the ratio of mutation rates to 1 to isolate genealogical effects. Under these assumptions, the equilibrium X:A diversity ratio, which is also the ratio before the drop, is given by Equation 21 with the ratio of mutation rates set to 1 (with a different ratio of mutation rates, it is further multiplied by the corresponding mutational factor). Solving the recursions (Equation 23 and S137), we find that the diversity ratio at times after the drop is
(24) |
which converges to the equilibrium ratio in the long run.
This example illustrates how life history modulates the transient response of the X:A diversity ratio to changes in population size (Figure 2). Without sex-specific life histories, i.e., when the male-to-female ratios of generation times and reproductive variances both equal 1, Equation 24 reduces to the expression derived by Pool and Nielsen (2007). As they show, a higher coalescence rate on the X compared to autosomes results in faster equilibration of X-linked diversity levels to the new population size and, thus, in a transient reduction of the X:A diversity ratio (black curve in Figure 2). A higher male reproductive variance weakens the difference between X and autosomes (blue curve in Figure 2), because it causes coalescence rates (and thus rates of equilibration) to be more similar on the X and autosomes, decreasing equilibration rates on the X. On the other hand, a longer male generation time enhances the difference between X and autosomes in two ways (red curve in Figure 2): it increases coalescence rates and thus equilibration rates per generation on the X compared to autosomes, and it also decreases the relative number of generations that elapsed since the drop in population size on the X compared to autosomes. More generally, sex-specific life history modulates the effects of changes in population size on X:A diversity ratios in two ways: one is by changing the relative X:A coalescence time scale of the response in generations; the other is by changing the relative X:A generation times, and thus the relative rate of response of X vs. autosomes per unit time (in years).
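The transient dip in the absence of sex-specific life histories can be reproduced with a minimal sketch of the standard single-size-change relaxation. It assumes that the X has 3/4 of the autosomal effective size, equal mutation rates on the X and autosomes, and time measured in generations since the drop. It is meant to illustrate the qualitative behavior described above, not to reproduce Equation 24, and all names are hypothetical.

```python
import math


def diversity_after_change(size_before, size_after, generations, mutation_rate=1.0):
    """Expected pairwise diversity after an instantaneous size change.

    Diversity relaxes exponentially from its old equilibrium
    (4 * size_before * mu) to its new one (4 * size_after * mu) at rate
    1 / (2 * size_after) per generation.
    """
    old_equilibrium = 4.0 * size_before * mutation_rate
    new_equilibrium = 4.0 * size_after * mutation_rate
    decay = math.exp(-generations / (2.0 * size_after))
    return new_equilibrium + (old_equilibrium - new_equilibrium) * decay


def xa_ratio_after_change(size_before, size_after, generations):
    """X:A diversity ratio, assuming the X has 3/4 of the autosomal size."""
    x = diversity_after_change(0.75 * size_before, 0.75 * size_after, generations)
    a = diversity_after_change(size_before, size_after, generations)
    return x / a


if __name__ == "__main__":
    # Hypothetical tenfold drop from 10,000 to 1,000.
    for generations in (0, 500, 1000, 2000, 5000, 20000):
        print(generations, round(xa_ratio_after_change(10000, 1000, generations), 4))
```

Because the smaller effective size of the X makes its diversity equilibrate faster, the ratio starts at 3/4, dips below it after the drop, and returns to 3/4 in the long run.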
Fourth, our results indicate that age structure likely has underappreciated and non-negligible effects on the genetic diversity of X (or Z) and autosomes in many species. Notably, sex-specific generation times are the strongest known modifier of mutation rates (Crow 2006; Kong et al. 2012; Ségurel et al. 2014), and, as we show, their genealogical effects on divergence and diversity on autosomes and sex chromosomes can also be substantial. Our results can therefore help to interpret observed diversity ratios in many species, given estimates for life history trait values. In particular, in a parallel paper (Amster et al. 2020), we take such an approach to show that, after accounting for the differential effects of linked selection on the X and autosomes, the joint effects of changes in life history and population size can fully explain the variation in X:A diversity ratios across human populations.
Acknowledgments
We thank M. Nordborg for helpful discussions and M. Przeworski for many helpful discussions and comments on the manuscript. We also thank Brian Charlesworth, Aneil Agrawal, and four anonymous reviewers for many helpful comments on this manuscript.
Footnotes
Supplemental material available at figshare: https://doi.org/10.25386/genetics.12440075.
Communicating editor: A. Agrawal
Literature Cited
- Amster G., and Sella G., 2016. Life history effects on the molecular clock of autosomes and sex chromosomes. Proc. Natl. Acad. Sci. USA 113: 1588–1593. doi: 10.1073/pnas.1515798113
- Amster G., Murphy D. A., Milligan W. M., and Sella G., 2020. Changes in life history and population size can explain relative neutral diversity levels on X and autosomes in extant human populations. bioRxiv (Preprint posted September 9, 2019; in press). doi: 10.1101/763524
- Charlesworth B., 1994. Evolution In Age-Structured Populations, Cambridge University Press, Cambridge. doi: 10.1017/CBO9780511525711
- Charlesworth B., 2001. The effect of life-history and mode of inheritance on neutral genetic variability. Genet. Res. 77: 153–166. doi: 10.1017/S0016672301004979
- Charlesworth B., 2012. The effects of deleterious mutations on evolution at linked sites. Genetics 190: 5–22. doi: 10.1534/genetics.111.134288
- Charlesworth B., and Charlesworth D., 2010. Elements Of Evolutionary Genetics, Roberts and Co. Publishers, Greenwood Village, CO.
- Charlesworth B., Coyne J. A., and Barton N. H., 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130: 113–146. doi: 10.1086/284701
- Crow J. F., 2006. Age and sex effects on human mutation rates: an old problem with new complexities. J. Radiat. Res. 47 (Suppl B): B75–B82. doi: 10.1269/jrr.47.b75
- Crow J. F., and Kimura M., 1970. An Introduction To Population Genetics Theory, Harper & Row, New York.
- De Magalhães J. P., Costa J., and Church G. M., 2007. An analysis of the relationship between metabolism, developmental schedules, and longevity using phylogenetic independent contrasts. J. Gerontol. A Biol. Sci. Med. Sci. 62: 149–160. doi: 10.1093/gerona/62.2.149
- Dixson A. F., 2009. Sexual Selection And The Origins Of Human Mating Systems, Oxford Univ. Press, Oxford.
- Emigh T. E., and Pollak E., 1979. Fixation probabilities and effective population numbers in diploid populations with overlapping generations. Theor. Popul. Biol. 15: 86–107. doi: 10.1016/0040-5809(79)90028-5
- Evans B. J., and Charlesworth B., 2013. The effect of nonindependent mate pairing on the effective population size. Genetics 193: 545–556. doi: 10.1534/genetics.112.146258
- Ewens W. J., 2004. Mathematical Population Genetics, Springer, New York. doi: 10.1007/978-0-387-21822-9
- Felsenstein J., 1971. Inbreeding and variance effective numbers in populations with overlapping generations. Genetics 68: 581–597.
- Fenner J. N., 2005. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128: 415–423. doi: 10.1002/ajpa.20188
- Gao Z., Moorjani P., Sasani T. A., Pedersen B. S., Quinlan A. R. et al., 2019. Overlooked roles of DNA damage and maternal age in generating human germline mutations. Proc. Natl. Acad. Sci. USA 116: 9491–9500. doi: 10.1073/pnas.1901259116
- Gillespie J. H., 2004. Population Genetics: A Concise Guide, Johns Hopkins University Press, Baltimore, Md.
- Haldane J. B., 1924. A mathematical theory of natural and artificial selection–I. 1924. Bull. Math. Biol. 52: 209–240, discussion 201–207.
- Hammer M. F., Mendez F. L., Cox M. P., Woerner A. E., and Wall J. D., 2008. Sex-biased evolutionary forces shape genomic patterns of human diversity. PLoS Genet. 4: e1000202. doi: 10.1371/journal.pgen.1000202
- Hammer M. F., Woerner A. E., Mendez F. L., Watkins J. C., Cox M. P. et al., 2010. The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nat. Genet. 42: 830–831. doi: 10.1038/ng.651
- Hedrick P. W., 2007. Sex: differences in mutation, recombination, selection, gene flow, and genetic drift. Evolution 61: 2750–2771. doi: 10.1111/j.1558-5646.2007.00250.x
- Hey J., and Harris E., 1999. Population bottlenecks and patterns of human polymorphism. Mol. Biol. Evol. 16: 1423–1426. doi: 10.1093/oxfordjournals.molbev.a026054
- Hill W. G., 1972. Effective size of populations with overlapping generations. Theor. Popul. Biol. 3: 278–289. doi: 10.1016/0040-5809(72)90004-4
- Hudson R. R., 1990. Gene genealogies and the coalescent process, pp. 1–14 in Oxford Surveys in Evolutionary Biology. Oxford University Press, Oxford, UK.
- International HapMap Consortium, 2003. The international HapMap project. Nature 426: 789–796. doi: 10.1038/nature02168
- Johnson D. L., 1977. Inbreeding in populations with overlapping generations. Genetics 87: 581–591.
- Jónsson H., Sulem P., Kehr B., Kristmundsdottir S., Zink F. et al., 2017. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549: 519–522. doi: 10.1038/nature24018
- Kong A., Frigge M. L., Masson G., Besenbacher S., Sulem P. et al., 2012. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488: 471–475. doi: 10.1038/nature11396
- Li H., and Durbin R., 2011. Inference of human population history from individual whole-genome sequences. Nature 475: 493–496. doi: 10.1038/nature10231
- McVicker G., Gordon D., Davis C., and Green P., 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5: e1000471. doi: 10.1371/journal.pgen.1000471
- Miyata T., Hayashida H., Kuma K., Mitsuyasu K., and Yasunaga T., 1987. Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harb. Symp. Quant. Biol. 52: 863–867.
- Myers S., Bottolo L., Freeman C., McVean G., and Donnelly P., 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310: 321–324. doi: 10.1126/science.1117196
- Orive M. E., 1993. Effective population size in organisms with complex life-histories. Theor. Popul. Biol. 44: 316–340. doi: 10.1006/tpbi.1993.1031
- Pettay J. E., Kruuk L. E., Jokela J., and Lummaa V., 2005. Heritability and genetic constraints of life-history trait evolution in preindustrial humans. Proc. Natl. Acad. Sci. USA 102: 2838–2843. doi: 10.1073/pnas.0406709102
- Pollak E., 1990. The effective population size of an age-structured population with a sex-linked locus. Math. Biosci. 101: 121–130. doi: 10.1016/0025-5564(90)90105-8
- Pollak E., 2011. Coalescent theory for age-structured random mating populations with two sexes. Math. Biosci. 233: 126–134. doi: 10.1016/j.mbs.2011.07.002
- Pool J. E., and Nielsen R., 2007. Population size changes reshape genomic patterns of diversity. Evolution 61: 3001–3006. doi: 10.1111/j.1558-5646.2007.00238.x
- Sagitov S., and Jagers P., 2005. The coalescent effective size of age-structured populations. Ann. Appl. Probab. 15: 1778–1797. doi: 10.1214/105051605000000223
- Sayres M. A. W., and Makova K. D., 2011. Genome analyses substantiate male mutation bias in many species. BioEssays 33: 938–945. doi: 10.1002/bies.201100091
- Sayres M. A. W., Venditti C., Pagel M., and Makova K. D., 2011. Do variations in substitution rates and male mutation bias correlate with life-history traits? A study of 32 mammalian genomes. Evolution 65: 2800–2815. doi: 10.1111/j.1558-5646.2011.01337.x
- Ségurel L., Wyman M. J., and Przeworski M., 2014. Determinants of mutation rate variation in the human germline. Annu. Rev. Genomics Hum. Genet. 15: 47–70. doi: 10.1146/annurev-genom-031714-125740
- Thomas F., Teriokhin A., Renaud F., De Meeûs T., and Guégan J. F., 2000. Human longevity at the cost of reproductive success: evidence from global data. J. Evol. Biol. 13: 409–414. doi: 10.1046/j.1420-9101.2000.00190.x
- Wall J. D., Andolfatto P., and Przeworski M., 2002. Testing models of selection and demography in Drosophila simulans. Genetics 162: 203–216.
- Westendorp R. G., and Kirkwood T. B., 1998. Human longevity at the cost of reproductive success. Nature 396: 743–746. doi: 10.1038/25519
- Wright S., 1986. Statistical genetics in relation to evolution, pp. 283–341 in Evolution: Selected Papers, edited by Provine W. B., University of Chicago Press, Chicago. (Original work published 1939)
Data Availability Statement
The authors state that all data necessary for confirming the conclusions presented in the manuscript are represented fully within the manuscript. Supplemental material available at figshare: https://doi.org/10.25386/genetics.12440075.