Abstract
When models of quantitative genetic variation are built from population genetic first principles, several assumptions are often made. One of the most important assumptions is that traits are controlled by many genes of small effect. This leads to a prediction of a Gaussian trait distribution in the population, via the Central Limit Theorem. Since these biological assumptions are often unknown or untrue, we characterized how finite numbers of loci or large mutational effects can impact the sampling distribution of a quantitative trait. To do so, we developed a neutral coalescent-based framework, allowing us to gain a detailed understanding of how the number of loci and the underlying mutational model impact the distribution of a quantitative trait. Through both analytical theory and simulation we found the normality assumption was highly sensitive to the details of the mutational process, with the greatest discrepancies arising when the number of loci was small or the mutational kernel was heavy-tailed. In particular, skewed mutational effects will produce skewed trait distributions and fat-tailed mutational kernels result in multimodal sampling distributions, even for traits controlled by a large number of loci. Since selection models and robust neutral models may produce qualitatively similar sampling distributions, we advise extra caution when interpreting model-based results for poorly understood systems of quantitative traits.
Keywords: Quantitative genetics, coalescent theory, characteristic function, neutral theory
1. Introduction
Questions about the distribution of traits that vary continuously in populations were critical in motivating early evolutionary biologists. The earliest studies of quantitative trait variation relied on phenomenological models, because the underlying nature of heritable variation was not yet well understood (Galton, 1883, 1889; Pearson, 1894, 1895). Despite the rediscovery of the work of Mendel (1866), researchers studying continuous variation in natural populations were initially skeptical that Mendel’s laws could explain what they observed (Weldon, 1902; Pearson, 1904). These views were reconciled when Fisher (1918) showed that the observations of correlation and variation between phenotypes in natural populations could be explained by a model in which many genes made small contributions to the phenotype of an individual.
The insights of Fisher (1918) made it possible to build models of quantitative trait evolution from population genetic first principles. Early work focused primarily on the interplay between mutation and natural selection in the maintenance of quantitative genetic variation in natural populations, while typically ignoring the effects of genetic drift (Fisher, 1930; Haldane, 1954; Latter, 1960; Kimura, 1965).
However, genetic drift plays an important role in shaping variation in natural populations. While earlier work assumed that a finite number of alleles control quantitative genetic variation (e.g. Latter (1970)), Lande (1976) used the continuum-of-alleles model proposed by Kimura (1965) to model the impact of genetic drift on differentiation within and between populations. A key assumption of Lande’s models is that the additive genetic variance in a trait is constant over time. In fact, in finite populations the genetic variance itself is random; at equilibrium, there are still stochastic fluctuations around the deterministic value assumed by Lande, even if none of the underlying genetic architecture changes (Bürger and Lande, 1994).
Several later papers explored more detailed models to understand how genetic variance changes through time due to the joint effects of mutation and drift (e.g. Chakraborty and Nei (1982)). Lynch and Hill (1986) undertook an extremely thorough analysis of the evolution of neutral quantitative traits. They analyzed the moments (e.g. mean and variance) of trait distributions that arise due to mutation and genetic drift and provided several quantities that can be used to interpret variation within and between species and analyze mutation accumulation experiments.
Much of this earlier work has made several simplifying assumptions about the distribution of mutational effects and the genetic architecture of the traits in question. For instance, Lynch and Hill (1986), despite analyzing quite general models of dominance and epistasis, ignored the impact of heavy-tailed or skewed mutational effects. While, in many cases, such properties of the mutational effect distribution are not expected to have an impact if a large number of genes determine the phenotype in question, it is unknown what impact they may have when only a small number of genes determine the genetic architecture of the trait. Moreover, when mutational effects display “power-law” or “fat-tailed” behavior, the impact of the details of the mutational effects may persist even in the so-called infinitesimal limit of a large number of loci with small effects. Finally, mutation accumulation experiments have produced skewed and/or leptokurtic samples of quantitative traits (Mackay et al., 1992), which is a direct motivation to relax assumptions on the mutational effects distribution.
Such deviations that stem from the violations of common modeling assumptions have the potential to influence our understanding of variation in natural populations. For instance, leptokurtic trait distributions may be a signal of some kind of diversifying selection (Kopp and Hermisson, 2006) but are also possible under neutrality when the number of loci governing a trait is small. Similarly, multimodal trait distributions may reflect some kind of underlying selective process (Doebeli et al., 2007) but may also be due to rare mutations of large effect.
We have two main goals in this work. Primarily, we want to assess the impact of violations of common assumptions on properties of the sampling distribution of a quantitative trait (e.g. variance, kurtosis, modality). Secondly, we believe that the formalism that we present here can be useful in a variety of situations in quantitative trait evolution, particularly in the development of robust null models for detecting selection at microevolutionary time scales. To this end, we introduce a novel framework for computing sampling distributions of quantitative traits. Our framework builds upon the coalescent approach of Whitlock (1999), but allows us to recover the full sampling distribution, instead of merely its moments.
First, we outline the biological model and explain how we can compute quantities of interest using a formalism based on characteristic functions. We then use this approach to compute the sample central moments. While much previous work focuses on only the first two central moments (mean and variance), we are able to compute arbitrarily high central moments, which are related to properties such as skewness and kurtosis. By doing so, we are able to determine the regime in which the details of the mutational effect distribution are visible in a sample from a natural population. Additionally, we explore the convergence to the infinitesimal limit and find that when “fat-tailed” effects are present, traditional theory based on the assumption of normality can lead to misleading predictions about phenotypic variation.
2. Model
The mechanistic model we construct has two components: a coalescent process, and a genetic mutational process that acts upon the controlling quantitative trait loci by sampling effect sizes from a mutational kernel. Together, these processes generate the values of quantitative traits sampled from the study population while explicitly modeling their shared genetic ancestry. Although we opt for simple model components during this exposition, the model generally supports more realistic and complex extensions, such as population structure and epistasis.
We assume that we sample n haploid individuals from a randomly mating population of size N. Initially, we consider a trait governed by a single locus and we will later extend the theory to traits governed by multiple loci. Let µ be the mutation rate per generation at the locus, and θ = 2N µ be the coalescent-scaled mutation rate. We model mutation as a process by which a new mutant adds an independent and identically distributed random effect to the ancestral state. Note that when the distribution of random effects is continuous, this corresponds to the Kimura (1965) continuum of alleles model. However, it is also possible for the effect distribution to be discrete, similar to the discrete model of Chakraborty and Nei (1982). While this model does not capture the impact of a biallelic locus with exactly two effects, the following theory could easily be modified to analyze that case.
Figure 1 shows one realization of both the coalescent and mutational processes for a sample of size 5. Given the phenotype at the root of the tree and the locations and effects of each mutation on the tree, the phenotypes at the tips are determined by adding mutant effects from the root to tip. To specify the root, we can assume without loss of generality that the ancestral phenotype for the entire population has a value 0 (this is similar to the common assumption in quantitative genetics literature that the ancestral state at each locus can be assigned a value of 0).
Figure 1.
Example realization of the coalescent process for a sample of size 5. Mutations (marked as light gray X’s) are placed upon the genealogy representing each individual in the population. Effects of each mutation are drawn from a probability distribution and are added along each branch. The model is specified such that the most recent common ancestor (MRCA) of the population has phenotype 0.0, while the MRCA of the sample may have a phenotype different from zero, due to mutations that accumulate between the MRCA of the sample and the MRCA of the population.
This mutational process can be described as a compound Poisson process (see also Khaitovich et al. (2005b); Chaix et al. (2008); Landis et al. (2013) for compound Poisson processes in a phylogenetic context). To ensure that this paper is self-contained, we briefly review relevant facts about compound Poisson processes in Appendix A.1.
In the following, we ignore the impact of non-genetic variation and focus on the breeding value of individuals, i.e. the average phenotype of an individual harboring a given set of mutations.
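The generative model described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code. It assumes a single locus, time measured in coalescent units so that mutations fall on each lineage at rate θ/2, and a sample MRCA with phenotype 0 (the offset between the MRCA of the sample and the MRCA of the population, which shifts every tip equally, is ignored). The names `simulate_locus` and `draw_effect` are hypothetical.

```python
import random


def simulate_locus(n, theta, draw_effect, rng):
    """Simulate trait values at one locus for n haploid individuals.

    Assumptions (not taken from the paper's code): Kingman coalescent,
    mutations arrive on every lineage at rate theta / 2 per unit of
    coalescent time, and the sample MRCA has phenotype 0.
    """
    # Each active lineage is stored as the list of sampled tips below it.
    lineages = [[i] for i in range(n)]
    trait = [0.0] * n
    rate = theta / 2.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence: Exponential(k choose 2).
        t = rng.expovariate(k * (k - 1) / 2.0)
        if rate > 0:
            for tips in lineages:
                # Poisson process of mutations along this branch segment.
                elapsed = rng.expovariate(rate)
                while elapsed < t:
                    effect = draw_effect(rng)
                    # Every tip descending from this branch inherits the effect.
                    for i in tips:
                        trait[i] += effect
                    elapsed += rng.expovariate(rate)
        # Merge a uniformly chosen pair of lineages.
        u, v = rng.sample(range(k), 2)
        merged = lineages[u] + lineages[v]
        lineages = [l for j, l in enumerate(lineages) if j not in (u, v)]
        lineages.append(merged)
    return trait


if __name__ == "__main__":
    rng = random.Random(42)
    # Normally distributed mutational effects as an example kernel.
    print(simulate_locus(5, 1.0, lambda r: r.gauss(0.0, 1.0), rng))
```

For a trait governed by L independent loci, one would sum the outputs of L independent calls.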
3. Results
3.1. Computing the characteristic function of a sample
In many analyses, the object of interest is the joint probability of the data. If we let X = (X1, X2, … , Xn) be the vector representing the values of the quantitative trait observed in a sample of n individuals, we denote the joint probability of the data as p(x1, x2, … , xn). Note that, in general, Xi and Xj are correlated due to shared ancestry, and that p must be computed by integrating over all mutational histories consistent with the data. Hence, computing p directly is extremely difficult.
Instead, we compute the characteristic function of X. For a one-dimensional random variable, X, the characteristic function is defined as $\mathbb{E}(e^{ikX})$, where i is the imaginary unit and k is a dummy variable. Generalizing this definition to an n-dimensional random variable, we are interested in computing
$$\mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{X}}\right),$$
where k = (k1, k2, …, kn) is a vector of dummy variables. Like a probability density function, the characteristic function of X contains all the information about the distribution of X. Moreover, computing moments of X is reduced to calculating derivatives of the characteristic function, which will prove useful in the following.
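As a small illustration of how moments follow from derivatives of a characteristic function, the sketch below uses the standard identity that the m-th moment equals $i^{-m}\psi^{(m)}(0)$ and approximates the derivatives by central finite differences. The Laplace kernel with scale b, whose characteristic function is $1/(1 + b^2k^2)$, is used only as an example; the function names and step sizes are illustrative choices, and the code assumes a symmetric kernel so that the characteristic function is real.

```python
def laplace_cf(k, b=1.0):
    """Characteristic function of a mean-zero Laplace distribution with scale b."""
    return 1.0 / (1.0 + (b * k) ** 2)


def second_moment(cf, h=1e-4):
    """E[X^2] = -psi''(0), approximated by a three-point central difference."""
    return -(cf(h) - 2.0 * cf(0.0) + cf(-h)) / h ** 2


def fourth_moment(cf, h=1e-2):
    """E[X^4] = psi''''(0), approximated by a five-point central difference."""
    return (cf(2 * h) - 4 * cf(h) + 6 * cf(0.0) - 4 * cf(-h) + cf(-2 * h)) / h ** 4


if __name__ == "__main__":
    # For a Laplace kernel with b = 1 the exact values are 2 and 24.
    print(second_moment(laplace_cf), fourth_moment(laplace_cf))
```

The finite-difference step trades truncation error against round-off, so the fourth moment is only accurate to a few decimal places here; symbolic differentiation would be the natural choice when the characteristic function is available in closed form.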
We calculate the characteristic function of X in two parts. First, we compute a recursive formula for ϕn, the characteristic function given that the ancestral phenotype of the sample is equal to 0. Then, we compute ρn, the characteristic function of the ancestral phenotype of the sample, assuming that the ancestral phenotype of the population is equal to 0. As we show in Appendix A.2, we can then multiply these characteristic functions to obtain the characteristic function of X.
We use a backward-forward argument to compute the recursive formula, first conditioning on the state when the first pair of lineages coalesce (backward in time) and then integrating (forward in time) to obtain the characteristic function for a sample of size n, ϕn. This results in
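$$\phi_n(\mathbf{k}) = \frac{1}{\binom{n}{2} + \frac{\theta}{2}\sum_{j=1}^{n}\left(1 - \psi(k_j)\right)} \sum_{u<v} \phi_{n-1}\left(\mathbf{k}^{(u,v)}\right) \qquad (1)$$
where $\mathbf{k}^{(u,v)}$ is the vector of length n − 1 made by removing $k_u$ and $k_v$ and adding $k_u + k_v$ to the vector of dummy variables and ψ(·) is the characteristic function of the mutational effect distribution.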
This equation has a straight-forward interpretation. The characteristic function for a sample of size n, ϕn, is simply the characteristic function for a sample of size n − 1, ϕn−1, averaged over all possible pairs that could coalesce first, multiplied by the characteristic function for the amount of trait change that occurs more recently than the first coalescent. The multiplication comes from the fact that the characteristic function of a sum of independent random variables is the product of the characteristic functions of those random variables. We prove this result in Appendix A.3.
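The recursion can be evaluated numerically for small samples. The sketch below assumes the form implied by the derivation in Appendix A.3, namely that ϕn(k) equals the sum of ϕn−1 over all pairs (u, v) divided by C(n, 2) + (θ/2) Σj (1 − ψ(kj)), with base case ϕ1 = 1 because a single lineage is its own MRCA with phenotype 0. The helper name `make_phi` is hypothetical, and the cost grows factorially in n, so this is only practical for a handful of lineages.

```python
import math
from functools import lru_cache
from itertools import combinations


def make_phi(psi, theta):
    """Return a function phi(k) evaluating the assumed recursion for a tuple k.

    psi   : characteristic function of the mutational effect distribution
    theta : coalescent-scaled mutation rate
    """

    @lru_cache(maxsize=None)
    def phi(k):
        n = len(k)
        if n == 1:
            # One lineage: it is the sample MRCA, whose phenotype is 0.
            return 1.0 + 0j
        pairs = n * (n - 1) / 2.0
        denom = pairs + 0.5 * theta * sum(1.0 - psi(kj) for kj in k)
        total = 0j
        for u, v in combinations(range(n), 2):
            # Remove k_u and k_v and append their sum.
            rest = tuple(k[j] for j in range(n) if j not in (u, v))
            total += phi(rest + (k[u] + k[v],))
        return total / denom

    return phi


if __name__ == "__main__":
    normal_cf = lambda k: math.exp(-0.5 * k * k)  # standard normal kernel
    phi = make_phi(normal_cf, theta=1.0)
    print(phi((0.5, -0.3, 0.2)))
```

Memoizing on the tuple of dummy variables avoids recomputing identical sub-problems that arise when different coalescence orders lead to the same merged vector.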
In Appendix A.4, we also show that the characteristic function for the phenotype at the root of the sample is
| (2) |
Intuitively, this equation arises by conditioning on whether u lineages are left in the population when the sample reaches its common ancestor and then averaging over the (random) time between when the individuals in the sample coalesce and when everyone in the population coalesces.
Hence, the characteristic function for a sample of size n is the product ρnϕn of the characteristic functions given in (2) and (1).
3.2. Sampling traits controlled by a small number of loci
It is common practice in both theoretical and applied quantitative genetics to summarize information about the phenotypic distribution within a population by computing central moments. However, care must be taken when interpreting theoretical predictions about central moments estimated from a sample. This is because the phenotypes in the sample are not independent, but instead correlated due to their shared genealogical history. Hence, in any particular population, an estimate of a central moment may deviate from its expected value, even as the number of individuals sampled grows to infinity (Aldous, 1985).
With this caveat in mind, we computed the first four expected central moments for a sample of phenotypes taken from this model (see Appendix A.5 for details). They are
| (3) |
where hk is the unique minimum variance unbiased estimator of the kth central moment (Halmos et al., 1946), mk is the kth moment of the mutational effect distribution (which can be calculated by differentiating the characteristic function of the effect distribution, ψ) and L is the number of loci that influence the trait.
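For readers who want to compute these estimators from data, the following sketch implements the standard h-statistics for the second, third and fourth central moments, written in terms of the biased sample central moments. The function names are hypothetical. The helper `mean_over_all_samples` illustrates unbiasedness for independent draws from a small discrete population; under the coalescent model the phenotypes are correlated, so this check says nothing about the expectations in (3).

```python
from itertools import product


def h_statistics(x):
    """Return (h2, h3, h4), unbiased estimators of central moments 2 to 4."""
    n = len(x)
    if n < 4:
        raise ValueError("h4 requires at least four observations")
    mean = sum(x) / n
    # Biased (divide-by-n) sample central moments.
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    h2 = n * m2 / (n - 1)
    h3 = n ** 2 * m3 / ((n - 1) * (n - 2))
    h4 = (3 * (3 - 2 * n) * n ** 2 * m2 ** 2
          + n ** 2 * (n ** 2 - 2 * n + 3) * m4) / ((n - 3) * (n - 2) * (n - 1) * n)
    return h2, h3, h4


def mean_over_all_samples(population, n):
    """Average the h-statistics over every ordered i.i.d. sample of size n."""
    samples = list(product(population, repeat=n))
    sums = [0.0, 0.0, 0.0]
    for s in samples:
        for j, h in enumerate(h_statistics(list(s))):
            sums[j] += h
    return [total / len(samples) for total in sums]


if __name__ == "__main__":
    print(h_statistics([0.1, -0.4, 1.3, 0.7, -0.2]))
    print(mean_over_all_samples([0.0, 1.0, 3.0], 4))
```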
These equations reveal that it may be possible to construct method-of-moments estimators for the moments of the mutation effect distribution and/or the number of loci that govern a trait.
3.3. “Infinitesimal” limits for large numbers of loci
Many traits are assumed to be governed by a large number of loci, each individually of small effect. This is known as an infinitesimal model (Falconer and Mackay, 1996). Typically, the sampling distribution in the infinitesimal limit is assumed to be Gaussian, by appealing to the central limit theorem. Here, we find that under certain circumstances traits may not be normally distributed, even in the limit.
To obtain a non-trivial limit, we must assume that as the number of loci controlling the trait increases, the effect of each individual locus decreases in an appropriate way. Then, computing the characteristic function for a trait governed by a large number of independent loci is simple, because the characteristic function of the sum of independent random variables is the product of their characteristic functions. Thus, assuming that each locus has the same effect distribution (this assumption can be relaxed relatively easily) the characteristic function of the limiting distribution of trait values is given by
where Rn is the limiting distribution of the phenotype at the root and Φn is the limiting distribution of the evolution in the sample.
In Appendices A.6 and A.7, we show that mutation effect distributions with power law behavior instead converge to a limiting stable distribution. A random variable X is said to have a power law distribution if $P(X > x) \sim \kappa x^{-\alpha}$ for large x, some κ > 0 and some α ∈ [0, 2). In this limit, individuals with shared genealogy may still have highly correlated phenotypes, due to rare mutations of large effect.
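The intuition that rare mutations of large effect continue to matter under power law kernels can be illustrated with a toy calculation: draw one effect per locus and ask what fraction of the total sum of squared effects is due to the single largest effect. The sketch below is an assumption-laden illustration, not part of the authors' analysis. It uses a symmetrized Pareto kernel with tail index α = 1.5 as a stand-in for a power law kernel, and the function names are hypothetical.

```python
import random


def normal_effect(rng):
    """Effect drawn from a standard normal kernel (finite variance)."""
    return rng.gauss(0.0, 1.0)


def pareto_effect(rng, alpha=1.5):
    """Effect drawn from a symmetrized Pareto kernel, P(|X| > x) = x**(-alpha)."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * rng.paretovariate(alpha)


def largest_share(effects):
    """Fraction of the summed squared effects due to the largest single effect."""
    squares = [e * e for e in effects]
    return max(squares) / sum(squares)


def mean_largest_share(draw, n_loci, n_rep, rng):
    """Average largest_share over n_rep independent sets of n_loci effects."""
    shares = [largest_share([draw(rng) for _ in range(n_loci)])
              for _ in range(n_rep)]
    return sum(shares) / n_rep


if __name__ == "__main__":
    rng = random.Random(0)
    for n_loci in (10, 100, 1000):
        print(n_loci,
              mean_largest_share(normal_effect, n_loci, 50, rng),
              mean_largest_share(pareto_effect, n_loci, 50, rng))
```

Under the normal kernel the share of the largest effect shrinks as the number of loci grows, whereas under the heavy-tailed kernel it remains substantial.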
On the other hand, all mutation effect distributions without power law behavior converge to a Gaussian limit, due to the central limit theorem. To obtain the non-trivial limit, we assume that the variance of mutational effects per locus, τ², goes to zero as the number of loci, L, goes to infinity in such a way that the total variance summed over all loci converges to a constant, i.e. Lτ² → σ². In Appendix A.8, we show that samples taken from a population in this limit can be represented as a sample from a normal distribution with a random mean. In particular,
where 𝒩(m, s²) represents a normal distribution with mean m and variance s².
3.4. Simulation
To gain a more intuitive picture of how trait distributions change due to the underlying mutational kernel and the number of QTL, we conducted simulations. First, we used ms (Hudson, 2002) to generate coalescent genealogies and then generated and mapped mutational effects using custom scripts in R (R Core Team, 2013). Specifically, for each segregating site generated in ms, we drew a mutational effect from the appropriate distribution and added that effect to every individual who had a derived allele at that segregating site. We held the sample variance constant across our simulations; thus, we decreased the variance of the mutational effects as the number of loci increased. All code is available at http://github.com/Schraiber/quant_trait_coalescent or http://dx.doi.org/10.6084/m9.figshare.1337954.
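The effect-mapping step described above can be sketched as follows. This is not the authors' R code. It assumes that the ms output has already been parsed into one 0/1 haplotype matrix per locus (rows are individuals, columns are segregating sites, 1 marks the derived allele); the parsing itself and the names `traits_from_haplotypes` and `draw_effect` are hypothetical.

```python
import random


def traits_from_haplotypes(loci, draw_effect, rng):
    """Sum mutational effects over segregating sites and loci.

    loci        : list of 0/1 matrices (lists of rows), one per locus
    draw_effect : function rng -> float, one draw from the mutational kernel
    """
    n = len(loci[0])
    trait = [0.0] * n
    for hap in loci:
        n_sites = len(hap[0]) if hap else 0
        for s in range(n_sites):
            # One effect per segregating site, shared by all derived carriers.
            effect = draw_effect(rng)
            for i in range(n):
                if hap[i][s] == 1:
                    trait[i] += effect
    return trait


if __name__ == "__main__":
    # Two toy loci for three individuals.
    loci = [
        [[1, 0], [1, 1], [0, 0]],
        [[0], [1], [1]],
    ]
    rng = random.Random(7)
    print(traits_from_haplotypes(loci, lambda r: r.gauss(0.0, 1.0), rng))
```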
We first assessed the signature left by various mutational kernels on the sampling distribution by computing the central moments across simulation replicates. We used a variety of mutational kernels in an attempt to capture different kinds of biologically relevant behavior: the (symmetric) normal distribution, the skew-normal distribution (with skewness of 0.1, 0.5, and 0.9), and the Laplace distribution. We omit power law mutational kernels from this portion of the analysis because they do not have finite moments.
We simulated data while varying the number of loci from L = 2 to L = 256, while holding the sample size constant at n = 1024. For each mutational kernel and each L, we simulated 2000 replicates. Afterwards, we computed the mean values of the minimum variance unbiased estimators of the second, third and fourth central moments (i.e. h2, h3, and h4) across simulation replicates. We then compared these to their expected values, as computed in (3).
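Holding the sample variance constant while L varies requires rescaling the kernel for each L. The sketch below shows one way to do this under an assumption that is not stated explicitly in this section: that the expected sample variance is Lθm2/2, which follows from a compound Poisson process acting along a pair of lineages with expected coalescence time 1. The grid of L values (powers of two from 2 to 256) is inferred from the stated range and the eight values mentioned in the figure captions. The function names and the values of θ and the target variance are hypothetical.

```python
import math


def kernel_second_moment(n_loci, theta, target_variance):
    """Second moment m2 of the kernel giving E(h2) = target_variance,
    assuming E(h2) = n_loci * theta * m2 / 2."""
    return 2.0 * target_variance / (n_loci * theta)


def normal_sd(m2):
    """Standard deviation of a mean-zero normal kernel with second moment m2."""
    return math.sqrt(m2)


def laplace_scale(m2):
    """Scale b of a mean-zero Laplace kernel, whose second moment is 2 * b**2."""
    return math.sqrt(m2 / 2.0)


if __name__ == "__main__":
    theta, target = 1.0, 1.0  # illustrative values only
    for n_loci in [2 ** j for j in range(1, 9)]:
        m2 = kernel_second_moment(n_loci, theta, target)
        print(n_loci, normal_sd(m2), laplace_scale(m2))
```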
By design, h2 remains constant regardless of the mutational kernel or L. The normal and Laplace distributions are symmetric, and produce h3 of 0, regardless of the number of QTL. This result is consistent with our analytical results, which show that the trait distribution should be symmetric if the underlying mutational kernel is symmetric. However, skew-normal mutational effects result in non-zero skewness even for traits controlled by over 100 loci if the underlying mutational kernel is sufficiently strongly skewed. Thus, the more strongly skewed the mutational kernel, the more slowly the sampling distribution’s third central moment, h3, converges to zero. All mutational effects result in non-zero h4 values when L is small, due to the randomness of the mutational process. Nonetheless, the Laplace distribution, the sole leptokurtic kernel in this comparison, is the slowest of all to converge to the normally distributed limit, suggesting the importance of the kurtosis of the mutational kernel for determining the kurtosis of the trait distribution.
We predicted, based on the multivariate stable limit derived for power law mutational effects, that power law mutational kernels may result in multimodal sampling distributions, even for traits controlled by a large number of QTL. We set out to determine the frequency of multimodality using the dip test (Hartigan and Hartigan, 1985). Briefly, the dip test computes a statistic measuring departure from unimodality and compares it to a conservative null distribution. Because the dip test is conservative, we expected to reject unimodality less than 5% of the time for large L and kernels without power law behavior. Conversely, for kernels with power law behavior, we expect a larger fraction of tests to reject the null hypothesis, even for large numbers of loci.
We complemented the earlier simulated data with data simulated under three additional, α-stable, mutational kernels. We chose α ∈ {1.5, 1.7, 1.9} to assess a wide range of heavy tailed behavior, while ensuring that the distributions retained a finite mean.
Figure 3 shows that all mutational kernels result in multimodal sampling distributions when there are a small number of QTL. However, trait distributions with α-stable mutational kernels remained multimodal even for large L, and the frequency of rejecting unimodality increased as the α value of the mutational kernel decreased. As α decreases, the large effect mutations responsible for multimodality become more prominent, and cause a larger proportion of simulations to reject unimodality. In contrast, mutational kernels without power law behavior become unimodal as L increases. Nonetheless, we again see that heavier tailed distributions take longer to converge to the normal limit; in particular, Laplace distributed mutations converge to unimodality more slowly than normally distributed mutations.
Figure 3.
Frequency of rejecting a unimodal sampling distribution. Solid lines report the frequency with which the null hypothesis of the dip test, that the sampling distribution was unimodal, was rejected at p < 0.05 when evolving under various mutational kernels. Data were simulated for 1024 sampled individuals and 2000 replicates for eight values of L, the number of loci. Colors distinguish the mutational kernel and relevant kernel parameters (if any).
4. Discussion
The natural world is replete with quantitative trait variation, and understanding the forces governing the evolution of quantitative traits is a central goal of evolutionary biology. The model of Fisher (1918), which explained how quantitative variation can be generated by Mendelian inheritance, provides an underpinning for understanding the generation and maintenance of variation in continuous characters. A primary assumption of much of this work is that traits are controlled by a large number of loci and that new mutations have a very small, symmetric effect on the trait value.
In this work, we introduced a coalescent framework for modeling neutral evolution in quantitative traits. This stands in contrast to past work, which has typically taken a forward-in-time approach based on classical population genetics (but see Whitlock (1999) who also utilized a coalescent model). Our backward-in-time, sample-focused approach enabled us to derive an expression for the joint distribution of the data with arbitrary mutational effects and numbers of loci. We found that traits governed by a large number of loci with small effects are well-modeled by a Gaussian distribution, as expected. However, we saw that with small numbers of loci, significant departures from normality can be observed. Moreover, for fat-tailed (or power-law) mutational kernels, there are significant departures from normality (including multi-modality), even when the number of loci becomes large.
We assessed departure from normality in traits governed by a small number of loci by exploring the central moments of three different mutational kernels (normal, skew-normal and Laplace distributions) both analytically and by simulation. We showed that although all three mutational kernels converge to a Gaussian distribution, traits controlled by a small number of loci retain the signature of their underlying mutational kernel in their 3rd and 4th central moments. Hence, it may be possible to reconstruct aspects of the mutational effect distribution by observing phenotypes in natural populations. This may be particularly interesting for analyzing variation in gene expression, because mutational effects in cis may be strongly skewed (Khaitovich et al., 2005a; Chaix et al., 2008; Gruber et al., 2012). Our theory suggests that the distribution of gene expression in a population might therefore be skewed.
We were also interested in the circumstances under which multi-modal phenotypic distributions can arise. When a trait has a simple genetic architecture, it’s easy to see that there must be discrete phenotypic clusters, corresponding to groups of individuals sharing the same mutations. As the number of loci increases, there are more mutational targets (and thus more mutation events), which smooths the distribution, causing the sampling distribution to converge to the appropriate limiting distribution. For mutational effects with finite variance, this ultimately results in a limiting Gaussian distribution, consistent with the central limit theorem. However, when the mutational kernel is fat-tailed, the marginal effects of each locus do not vanish as the number of loci grows. Thus, some clade-specific mutations will always be of large effect despite the number of loci assumed by the model, resulting in a multi-modal sampling distribution.
These results show that even under the assumption of neutrality, significant departures from normality are possible and can be detected in empirical data. It is possible that these deviations from normality may be conflated with signatures of selection acting on quantitative variation. Several recent studies have claimed that evidence of non-Gaussianity may be evidence for non-neutral evolution at macroevolutionary time scales. For instance, Khaitovich et al. (2005a) and Chaix et al. (2008) found that the distribution of gene expression differences between great apes is strongly positively skewed. Similarly, Uyeda et al. (2011) argued that there is a one million year wait between bursts of evolution in the fossil record and numerous studies have explored non-Gaussian trait divergence in a phylogenetic context (Landis et al., 2013; Eastman et al., 2013). While it is unlikely that the population genetic model we developed can be directly applied to macroevolutionary data of this sort (Estes and Arnold, 2007), it is important to recognize that such effects can be due to purely neutral processes.
On shorter time scales, there is significant interest in detecting non-neutral quantitative trait evolution among closely related species or populations. One powerful method compares a measure of quantitative trait divergence, Qst, to the fixation index, Fst (McKay and Latta, 2002; Ovaskainen et al., 2011). However, this requires estimates of breeding values from common-garden experiments, and may be difficult to achieve. In other cases (e.g. Lemos et al. (2005)) more phenomenological approaches are taken, by comparing within and between species phenotypic diversity. The null distributions of these approaches typically rely on assumptions of the infinitesimal model, which we have shown may be violated due to mutations of large effect and/or loci with relatively simple genetic bases. To address these issues and leverage the abundance of modern quantitative trait data, Berg and Coop (2014) developed a method that explicitly uses breeding values estimated from quantitative trait mapping studies. When such effect size estimates are unavailable, it may be possible to use our formalism to develop robust null models to detect selection.
This framework, which provides a generative model based on an explicit characterization of the underlying mutational kernel, may be useful for inferring parameters of mutational effect distributions from phenotypes sampled in natural populations. In particular, a large number of studies are now quantifying thousands of phenotypes across a large number of individuals by assaying molecular phenotypes such as gene expression (e.g. Lappalainen et al. (2013)) or chromatin accessibility (e.g. Degner et al. (2012)). Because these traits are thought to evolve subject to relatively few QTL of relatively large effect, we believe that it will be possible to use our model to make inferences about the mutational effects that shape phenotypic variation at the molecular level.
Our coalescent approach can be extended in several ways. Notably, we consider only haploid populations. In principle, an extension to diploid individuals is straightforward using the result of Möhle (1998) that diploid, dioecious populations of size N are readily modeled by pairing random chromosomes from a haploid population of size 2N. To incorporate diploidy, we would also need to incorporate a model of dominance, of which several exist in the literature (e.g. the model of independent dominance of Lynch and Hill (1986)).
From the point of view of the coalescent process, it is straightforward to apply our model to populations that have undergone complex demographic histories. This is because the dynamics of a coalescent under population size fluctuations and population structure are well known. Moreover, we explored only unlinked, neutral loci and it may be possible to obtain some analytical results for linked loci and/or weak natural selection by using the ancestral recombination graph and ancestral selection graph, respectively. While analytical results are difficult within these frameworks, we believe that they can be used to perform simple simulations of quantitative traits evolving in complex scenarios, thus enabling Approximate Bayesian Computation.
Figure 2.
Central moments. From left to right, the panels correspond to the central moments, h2, h3, and h4, respectively, for the sampling distributions evolving under various mutational kernels. Data were simulated for 1024 sampled individuals and 2000 replicates for eight values of L, the number of loci. Colors distinguish the mutational kernel and relevant kernel parameters (if any). Solid lines correspond to moment values computed from the simulated data. Dashed lines correspond to the expected moment values.
Acknowledgments
We are grateful to Monty Slatkin, Anand Bhaskar and Matt Pennell for reading an earlier version of this manuscript and providing extremely detailed suggestions that significantly improved its clarity. We are also indebted to Anand Bhaskar for suggesting the forward-backward approach that led to (1). We owe a debt of gratitude to Chris Ellison for several informative discussions about RNAseq. We would also like to thank Joachim Hermisson and two anonymous reviewers for comments that improved the focus of this manuscript. J.G.S. was supported by National Institutes of Health grant R01-GM40282 (awarded to Montgomery Slatkin) and National Science Foundation postdoctoral fellowship DBI-1402120. M.J.L. was supported by National Institutes of Health grant R01-GM069801 (awarded to John P. Huelsenbeck).
Appendix A. Mathematical derivations
A.1. Compound Poisson processes
To obtain the probability of the data under this model, we must be able to compute the probability of the change in phenotype along a branch of the tree. Unfortunately, except for very simple mutational models, this probability is impossible to compute analytically. Instead, we compute the characteristic function of the change along a branch.
Using standard results for compound Poisson processes (Kingman, 1992), we see that the characteristic function of the change along a branch of length t (in coalescent units) is
$$\phi_t(k) = \exp\left\{\frac{\theta}{2}\, t \left(\psi(k) - 1\right)\right\} \tag{4}$$
where ψ is the characteristic function of the mutational effect distribution.
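As a numerical sanity check on the compound Poisson characteristic function, the sketch below simulates the change along a single branch and compares the empirical characteristic function with the standard compound Poisson form exp{λt(ψ(k) − 1)}. It assumes that mutations arrive at rate λ = θ/2 per coalescent time unit and that the mutational kernel is Normal(0, τ²); the function names and parameter values are illustrative and are not taken from the original analysis.

```python
import numpy as np

def cp_charfun(k, t, theta, psi):
    """Characteristic function of a compound Poisson process at time t.

    Assumes mutations arrive at rate theta / 2 per coalescent time unit and
    that psi(k) is the characteristic function of one mutational effect.
    """
    return np.exp(0.5 * theta * t * (psi(k) - 1.0))

def simulate_branch(t, theta, tau, size, rng):
    """Draw the total phenotypic change along a branch of length t.

    With a Normal(0, tau^2) kernel, the sum of N effects is
    Normal(0, N * tau^2), so it can be drawn in one step given N.
    """
    n_mut = rng.poisson(0.5 * theta * t, size=size)
    return np.sqrt(n_mut) * tau * rng.standard_normal(size)

rng = np.random.default_rng(1)
theta, tau, t, k = 2.0, 0.5, 1.5, 1.3

def gauss_psi(x):
    # Characteristic function of a Normal(0, tau^2) mutational effect.
    return np.exp(-0.5 * (tau * x) ** 2)

draws = simulate_branch(t, theta, tau, size=200_000, rng=rng)
empirical = np.mean(np.exp(1j * k * draws))   # Monte Carlo estimate
expected = cp_charfun(k, t, theta, gauss_psi)  # closed form
```

The two quantities should agree up to Monte Carlo error, which shrinks as the number of draws grows. Other kernels can be checked the same way by swapping `gauss_psi` and the sampling step.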
A.2. The phenotype at the root of the sample genealogy and the subsequent evolution within the sample are subindependent
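Note that

$$X_u = \mathcal{R} + \mathcal{E}_u,$$

where ℛ is the phenotype at the root of the sample genealogy and ℰu is the subsequent evolution leading to lineage u in the sample. So,

$$
\begin{aligned}
\mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{X}}\right) &= \mathbb{E}\left(e^{i\sum_u k_u(\mathcal{R} + \mathcal{E}_u)}\right) \\
&= \mathbb{E}\left(e^{i\mathcal{R}\sum_u k_u}\, e^{i\mathbf{k}^T\boldsymbol{\mathcal{E}}}\right) \\
&= \mathbb{E}\left(e^{i\mathcal{R}\sum_u k_u}\right)\mathbb{E}\left(e^{i\mathbf{k}^T\boldsymbol{\mathcal{E}}}\right),
\end{aligned}
$$

where the last line follows by independent and stationary increments of the compound Poisson process. Thus, ℛ and (ℰ1, ℰ2, …, ℰn) are subindependent, and hence their joint characteristic function is the product of their characteristic functions.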
A.3. Proof of recursive formula for the characteristic function
In this section we use X to indicate the vector of trait values conditional on the common ancestor of the sample having trait value 0. First, we condition on the state at the first coalescence (going back in time). The state consists of three components: 1) which pair of individuals coalesce, (u, v), 2) the time of the coalescent event, Tc, and 3) the trait value in each lineage at that time, X′ (note that, given (u, v), we have that $X'_u = X'_v$, since those two lineages have coalesced and hence had the same trait value at the time of coalescence). Then,
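$$
\begin{aligned}
\phi_n(\mathbf{k}) &= \mathbb{E}\left(\mathbb{E}\left(e^{i\mathbf{k}^T(\mathbf{X}' + \mathbf{Y}(T_c))} \,\middle|\, (u,v), T_c, \mathbf{X}'\right)\right) \\
&= \binom{n}{2}^{-1}\sum_{u<v} \mathbb{E}\left(\mathbb{E}\left(e^{i\mathbf{k}^T(\mathbf{X}' + \mathbf{Y}(T_c))} \,\middle|\, T_c, \mathbf{X}'\right) \,\middle|\, (u,v)\right) \\
&= \binom{n}{2}^{-1}\sum_{u<v} \mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{X}'}\, \mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{Y}(T_c)} \,\middle|\, T_c\right) \,\middle|\, (u,v)\right)
\end{aligned} \tag{5}
$$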
where Y(t) = (Y1(t), Y2(t),…,Yn(t)) is the vector accounting for the evolution on each lineage that occurs during time t. The second line follows from the fact that each pair is equally likely to coalesce (with probability $\binom{n}{2}^{-1}$) and the third line from the independent increments of a compound Poisson process.
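Now, we compute the internal expectation going forward in time. Noticing that $\mathbb{E}(e^{i\mathbf{k}^T\mathbf{Y}(T_c)} \mid T_c)$ is simply the characteristic function of a compound Poisson process run for length Tc independently on each of the n lineages, we see from (4) that

$$\mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{Y}(T_c)} \,\middle|\, T_c\right) = \exp\left\{\frac{\theta}{2}\, T_c \sum_{j=1}^{n}\left(\psi(k_j) - 1\right)\right\}.$$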
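Because Tc and X′ are independent, we can integrate over Tc analytically in the outer expectation. The distribution of the time to the first coalescent event in a sample of size n is Exponential with rate $\binom{n}{2}$; hence,

$$\mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{Y}(T_c)}\right) = \int_0^\infty \binom{n}{2} e^{-\binom{n}{2}t} \exp\left\{\frac{\theta}{2}\, t \sum_{j=1}^{n}\left(\psi(k_j) - 1\right)\right\} dt = \frac{\binom{n}{2}}{\binom{n}{2} + \frac{\theta}{2}\sum_{j=1}^{n}\left(1 - \psi(k_j)\right)}.$$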
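Plugging this result into (5) results in

$$\phi_n(\mathbf{k}) = \frac{1}{\binom{n}{2} + \frac{\theta}{2}\sum_{j=1}^{n}\left(1 - \psi(k_j)\right)} \sum_{u<v} \mathbb{E}\left(e^{i\mathbf{k}^T\mathbf{X}'} \,\middle|\, (u,v)\right),$$

but since X′ is simply the result of the same process where two of the entries are identical, we obtain the recursive formula (1) with $\phi_n = \mathbb{E}(e^{i\mathbf{k}^T\mathbf{X}})$ when X is a vector of size n.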
To initialize the recursion, we must compute the characteristic function for a sample of size 2. This is
$$\phi_2(k_1, k_2) = \frac{1}{1 + \frac{\theta}{2}\left(2 - \psi(k_1) - \psi(k_2)\right)} \tag{6}$$
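For small samples the recursion can be evaluated directly. The sketch below is one way to do so, under the assumption that the recursion has the form implied by the derivation in this section: divide by the binomial coefficient C(n, 2) plus (θ/2) times the sum of 1 − ψ(k_j), then sum the size n − 1 characteristic function over all pairs, with the dummy variables of the coalescing pair added together. This form is reconstructed from the argument here and should be checked against (1); the function name `phi` and the Gaussian kernel are illustrative choices.

```python
from itertools import combinations
import numpy as np

def phi(k, theta, psi):
    """Per-locus characteristic function of the sample, root value fixed at 0.

    k     : tuple of dummy variables, one per sampled lineage
    theta : scaled mutation rate (mutations assumed to arrive at rate theta / 2)
    psi   : characteristic function of a single mutational effect
    """
    n = len(k)
    if n == 1:
        return 1.0  # a single lineage is its own root, whose value is 0
    pairs = n * (n - 1) / 2
    denom = pairs + 0.5 * theta * sum(1.0 - psi(kj) for kj in k)
    total = 0.0
    for u, v in combinations(range(n), 2):
        # Lineages u and v coalesce: they share a value, so their
        # dummy variables add in the size n - 1 problem.
        rest = tuple(k[w] for w in range(n) if w not in (u, v))
        total += phi(rest + (k[u] + k[v],), theta, psi)
    return total / denom

theta, tau = 2.0, 0.5

def psi(x):
    # Normal(0, tau^2) mutational kernel (illustrative).
    return np.exp(-0.5 * (tau * x) ** 2)

# n = 2 should reproduce the closed form used to initialize the recursion.
phi2 = phi((0.7, -0.4), theta, psi)
closed_form = 1.0 / (1.0 + 0.5 * theta * (2.0 - psi(0.7) - psi(-0.4)))

# Second moment of one sampled value for n = 3, by a central finite difference.
h = 1e-3
second_moment_n3 = -(phi((h, 0, 0), theta, psi) - 2.0
                     + phi((-h, 0, 0), theta, psi)) / h ** 2
```

The naive recursion visits every ordering of coalescent events, so its cost grows factorially with n; it is meant for checking small cases, not for large samples. For this mean-zero kernel, `second_moment_n3` should be close to (θ/2)τ² times the expected height of a three-lineage genealogy, 4/3.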
A.4. The phenotype at the root of the sample genealogy
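First, we define Δ to be the time between when the sample genealogy finds a common ancestor and when the population genealogy finds a common ancestor. Then, we note that, conditional on Δ, the characteristic function of the phenotype at the root of the sample genealogy is

$$\mathbb{E}\left(e^{ik\mathcal{R}} \,\middle|\, \Delta\right) = \exp\left\{\frac{\theta}{2}\,\Delta\left(\psi(k) - 1\right)\right\}$$

by using equation (4). Thus, after integrating over Δ, the desired quantity is the moment generating function of Δ, defined by

$$M(z) = \mathbb{E}\left(e^{z\Delta}\right),$$

evaluated at $z = \frac{\theta}{2}\left(\psi(k) - 1\right)$.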
We compute M(z) by conditioning on how many lineages are left in the population genealogy when the sample reaches its most recent common ancestor. To do this, we make use of a result of Saunders et al. (1984),
Given that u lineages are left in the population when the sample reaches its most recent common ancestor, the remaining time until the whole population reaches its common ancestor is simply the time it takes for a coalescent started with u to reach its most recent common ancestor, Cu. Thus by conditioning on the number of lineages left in the population and using the result from Saunders et al. (1984),
where the final line follows by recognizing that Cu is the sum of u − 1 independent exponential random variables with means $\binom{j}{2}^{-1}$, j = 2, …, u. Substituting $\frac{\theta}{2}\left(\psi(k) - 1\right)$ for z yields the desired result.
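The last step uses only the fact that the time for u lineages to reach their common ancestor is a sum of independent exponential waiting times, one for each number of lineages j = u, u − 1, …, 2, with rate C(j, 2) in coalescent units. The sketch below evaluates the resulting moment generating function and checks its derivative at zero against the known mean, 2(1 − 1/u). It does not implement the result of Saunders et al. (1984); the function names are illustrative.

```python
import numpy as np

def coalescent_rates(u):
    """Rates C(j, 2) of the waiting times while j = 2, ..., u lineages remain."""
    j = np.arange(2, u + 1)
    return j * (j - 1) / 2.0

def mgf_time_to_mrca(z, u):
    """Moment generating function of the time for u lineages to coalesce.

    Each Exponential(rate) factor contributes rate / (rate - z); the
    product is finite for real z < 1, the smallest rate.
    """
    rates = coalescent_rates(u)
    return np.prod(rates / (rates - z))

def mean_time_to_mrca(u):
    """Expected time to the common ancestor: the sum of the mean waiting times."""
    return np.sum(1.0 / coalescent_rates(u))

u = 10
h = 1e-5
# The first derivative of the MGF at zero is the mean.
mean_from_mgf = (mgf_time_to_mrca(h, u) - mgf_time_to_mrca(-h, u)) / (2.0 * h)
mean_from_rates = mean_time_to_mrca(u)
```

In the derivation above the argument z is complex, since ψ(k) generally is; the same product formula applies whenever the real part of z is less than 1.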
A.5. Computing sample central moments
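While it is difficult to compute the expectation of any sample central moments for a particular sample, it is possible to average over replicate populations to compute expectations over replicate samples. We first begin by defining the h-statistics, which are the unique minimum variance unbiased estimators of the central moments (Halmos et al., 1946). In particular, letting Xi be the phenotype of individual i in a sample of size n (n.b. that these labels are arbitrary because individuals are exchangeable), and putting $S_p = \sum_{i=1}^{n} X_i^p$,

$$
\begin{aligned}
h_2 &= \frac{nS_2 - S_1^2}{n(n-1)}, \\
h_3 &= \frac{n^2S_3 - 3nS_2S_1 + 2S_1^3}{n(n-1)(n-2)}, \\
h_4 &= \frac{-3S_1^4 + 6nS_1^2S_2 + (9-6n)S_2^2 + (-4n^2+8n-12)S_1S_3 + (n^3-2n^2+3n)S_4}{n(n-1)(n-2)(n-3)}.
\end{aligned}
$$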
We now compute expectations over these quantities by computing expectations over the products of the Sp. Because the phenotypes of the samples are correlated, this results in formulas that are different from the case of independent and identically distributed random variables. For instance,

$$\mathbb{E}\left(S_1^2\right) = n\,\mathbb{E}\left(X_1^2\right) + n(n-1)\,\mathbb{E}\left(X_1X_2\right).$$
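Computing similar formulas as required and substituting into the definitions of the h-statistics results in

$$
\begin{aligned}
\mathbb{E}(h_2) &= \mathbb{E}\left(X_1^2\right) - \mathbb{E}\left(X_1X_2\right), \\
\mathbb{E}(h_3) &= \mathbb{E}\left(X_1^3\right) - 3\,\mathbb{E}\left(X_1^2X_2\right) + 2\,\mathbb{E}\left(X_1X_2X_3\right), \\
\mathbb{E}(h_4) &= \mathbb{E}\left(X_1^4\right) - 4\,\mathbb{E}\left(X_1^3X_2\right) + 6\,\mathbb{E}\left(X_1^2X_2X_3\right) - 3\,\mathbb{E}\left(X_1X_2X_3X_4\right),
\end{aligned}
$$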
where the expectations on the right hand sides are over the correlated phenotypes in the sample. It is possible to compute these expectations by taking derivatives of the characteristic function (1). We demonstrate this for the case of h2; all other results follow similarly.
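The h-statistics themselves are simple to compute from the power sums of a sample. The following is a minimal sketch using the standard power-sum expressions for h2, h3 and h4; the function name `h_statistics` is illustrative, and this is not necessarily how the statistics were computed for the figures.

```python
import numpy as np

def h_statistics(x):
    """Return (h2, h3, h4), unbiased estimators of the central moments.

    Uses the power sums S_p = sum_i x_i**p for p = 1, ..., 4.
    Requires at least four observations.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("h4 needs a sample of size at least 4")
    s1, s2, s3, s4 = (np.sum(x ** p) for p in range(1, 5))
    h2 = (n * s2 - s1 ** 2) / (n * (n - 1))
    h3 = (n ** 2 * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3) / (n * (n - 1) * (n - 2))
    h4 = (-3 * s1 ** 4
          + 6 * n * s1 ** 2 * s2
          + (9 - 6 * n) * s2 ** 2
          + (-4 * n ** 2 + 8 * n - 12) * s1 * s3
          + (n ** 3 - 2 * n ** 2 + 3 * n) * s4) / (n * (n - 1) * (n - 2) * (n - 3))
    return h2, h3, h4

rng = np.random.default_rng(0)
sample = rng.normal(size=50)
h2, h3, h4 = h_statistics(sample)
```

Two quick checks on an implementation like this: h2 should equal the usual unbiased sample variance, and all three statistics should be unchanged when a constant is added to every observation. For data far from zero, centering the sample first avoids cancellation error in the power sums.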
First, note that because all of these moments are central moments, the phenotype at the root of the sample genealogy always cancels out, and we are concerned only with the characteristic function relating to the evolution within the sample (1). Next, using the fact that the characteristic function of a sum of independent random variables is the product of their characteristic functions, let $\Phi_n(\mathbf{k}) = \phi_n(\mathbf{k})^L$, i.e. the characteristic function of a trait governed by L identical loci. A basic property of the characteristic function is that

$$\mathbb{E}\left(X_1^{a_1}\cdots X_n^{a_n}\right) = i^{-(a_1+\cdots+a_n)} \left.\frac{\partial^{a_1+\cdots+a_n}\Phi_n}{\partial k_1^{a_1}\cdots\partial k_n^{a_n}}\right|_{\mathbf{k}=\mathbf{0}}.$$
So,
and
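where Cn is a term that depends on n and is identical in the two computations, and m2 = −ψ″(0) is the second moment of the mutational effect kernel. Finally,

$$\mathbb{E}(h_2) = \mathbb{E}\left(X_1^2\right) - \mathbb{E}\left(X_1X_2\right) = \frac{\theta}{2} L m_2.$$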
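The expectation of h2 can also be checked by simulation. The sketch below draws independent Kingman genealogies for each locus, built forward in time from the sample's common ancestor, adds compound Poisson mutations along every lineage, and averages h2 over replicates. It assumes mutations arrive at rate θ/2 per lineage per coalescent time unit and uses a Normal(0, τ²) kernel, for which m2 = τ²; under those assumptions the average should be close to (θ/2)Lτ². All names and parameter values are illustrative.

```python
import numpy as np

def simulate_loci(n, n_loci, theta, tau, rng):
    """Per-locus trait contributions for n sampled lineages (root value 0).

    Each row is one unlinked locus with its own genealogy. Mutations are
    assumed to arrive at rate theta / 2 per lineage and to have
    Normal(0, tau^2) effects.
    """
    vals = np.zeros((n_loci, 2))  # two lineages descend from the root
    for j in range(2, n + 1):
        # Time during which exactly j lineages exist: Exponential, rate C(j, 2).
        t = rng.exponential(2.0 / (j * (j - 1)), size=(n_loci, 1))
        n_mut = rng.poisson(0.5 * theta * t, size=(n_loci, j))
        # A sum of N Normal(0, tau^2) effects is Normal(0, N * tau^2).
        vals = vals + np.sqrt(n_mut) * tau * rng.standard_normal((n_loci, j))
        if j < n:
            # A uniformly chosen lineage splits; the new lineage inherits its value.
            parent = rng.integers(j, size=n_loci)
            child = vals[np.arange(n_loci), parent][:, None]
            vals = np.concatenate([vals, child], axis=1)
    return vals

rng = np.random.default_rng(7)
n, n_loci, theta, tau, reps = 8, 4, 1.0, 1.0, 20_000

per_locus = simulate_loci(n, reps * n_loci, theta, tau, rng)
# Group loci into replicates and add their contributions to get trait values.
traits = per_locus.reshape(reps, n_loci, n).sum(axis=1)
mean_h2 = np.var(traits, axis=1, ddof=1).mean()  # h2 is the unbiased variance
expected_h2 = 0.5 * theta * n_loci * tau ** 2
```

Because individuals share a genealogy at each locus, h2 varies considerably between replicates, so many replicates are needed before the average settles near its expectation.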
A.6. Derivation of multivariate stable limit for sample distribution
Recall that a random variable X is said to have a fat-tailed (or power-law) distribution if
$$\mathbb{P}\left(|X| > x\right) \sim \kappa x^{-\alpha} \tag{7}$$
for large x and some κ > 0. As is typical in the literature, we reserve the term “fat-tailed” for distributions with α ∈ (0, 2).
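As a concrete instance of a fat-tailed kernel, the sketch below draws mutational effects from a symmetric Pareto-type distribution whose tail probability is exactly x^(−α) for x ≥ 1 (so κ = 1), and compares the empirical tail with that power law. This distribution is chosen only for illustration and is not claimed to be one of the kernels used in the simulations; the function name is hypothetical.

```python
import numpy as np

def sample_symmetric_pareto(alpha, size, rng):
    """Draw from a symmetric law with P(|X| > x) = x**(-alpha) for x >= 1."""
    # Inverse-CDF sampling: 1 - U is uniform on (0, 1], so the magnitude is >= 1.
    magnitude = (1.0 - rng.random(size)) ** (-1.0 / alpha)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

rng = np.random.default_rng(3)
alpha = 1.5  # inside (0, 2): the variance is infinite
effects = sample_symmetric_pareto(alpha, 200_000, rng)

threshold = 5.0
empirical_tail = np.mean(np.abs(effects) > threshold)
power_law_tail = threshold ** (-alpha)
```

With α in (0, 2) the sample variance of such draws does not stabilize as the sample grows, which is why the usual central limit theorem does not apply to sums of these effects.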
To obtain an appropriate scaling limit, we assume that there is a parameter t, related to the parameter κ in (7) by
| (8) |
such that Lt → s as L → ∞. Note that $s^{1/\alpha}$ is proportional to the scale parameter of the resulting limit distribution.
We provide a heuristic derivation, rather than a rigorous proof. First, we argue by induction that the (per locus) characteristic function for a sample of size n is
for large L, where 𝒫*(k) is the power set of the elements in k, except the set {k1, k2, …, kn}, and cn,|j| is a combinatorial constant that depends only on the sample size n and |j|, the size of the set j.
Note that for n = 2, this can be seen by observing that for large L, the characteristic function of a fat-tailed distribution is asymptotically
Thus,
Now, assume that the formula holds for ϕ̃n−1. Using the recursion (1), we have
The second line follows from plugging ψ̃ and ϕ̃, and c̃n,|j| arises by summing over the appropriate terms coming from all characteristic functions in the sum. Again looking for an asymptotic for large L, we see that
Finally, we note that by raising ϕ̃n to the Lth power, and taking the limit as L → ∞, we obtain the log characteristic function
| (9) |
where all terms are defined as before.
The characteristic function in (9) can be recognized to be that of a multivariate α-stable distribution (Press, 1972). These multivariate distributions are fat-tailed generalizations of the familiar multivariate normal distribution, and this limit corresponds to a generalized multivariate central limit theorem for sums of random vectors with fat-tailed distributions.
A.7. Limiting distribution of the phenotype at the root of the sample genealogy
Again, we proceed heuristically rather than rigorously. First, note that for large L,
so that
for large L. Thus,
So by definition of the exponential function, we have that
| (10) |
which is the characteristic function of a univariate α-stable distribution, arising from the fact that the phenotype at the root of the sample genealogy is itself a limit of a sum of random variables. Note that as n → ∞ (i.e. the sample becomes the whole population), R(k) → 1, because the root of the sample genealogy is the same as the root of the population genealogy and the root value has been specified to be equal to 0.
A.8. Multivariate Gaussian limits
For the case where the mutation distribution is not fat-tailed, we can use the multivariate central limit theorem to more efficiently derive the limiting distribution. The appropriate scaling in this case is to assume that if τ² is the variance of the mutation effect kernel, then Lτ² → σ² as L → ∞.
To apply the multivariate central limit theorem, we must derive the pairwise covariances between samples. While the required covariances could be computed by taking derivatives of the characteristic function, it is more instructive to compute these moments directly. For simplicity, we assume that the mutation effect distribution has mean 0 and variance τ².
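Assume that the population genealogy at a single locus, 𝒢, is fixed. Noting that the variance per unit time accrued by the mutational process is (θ/2)τ² and using the rules for calculating covariance structure on a phylogeny, it is easy to see that for samples i and j we have

$$\mathrm{Cov}\left(X_i, X_j \mid \mathcal{G}\right) = \frac{\theta}{2}\tau^2\left(T - T_{ij}\right),$$

where T is the height of 𝒢 and $T_{ij}$ is the height of the most recent common ancestor of samples i and j (so that $T_{ii} = 0$). We can then use the law of total covariance,

$$\mathrm{Cov}\left(X_i, X_j\right) = \mathbb{E}\left(\mathrm{Cov}\left(X_i, X_j \mid \mathcal{G}\right)\right) + \mathrm{Cov}\left(\mathbb{E}\left(X_i \mid \mathcal{G}\right), \mathbb{E}\left(X_j \mid \mathcal{G}\right)\right),$$

to see that, because the conditional means are 0 for every genealogy,

$$\mathrm{Cov}\left(X_i, X_j\right) = \frac{\theta}{2}\tau^2\left(\mathbb{E}(T) - \mathbb{E}(T_{ij})\right) = \frac{\theta}{2}\tau^2 \quad \text{for } i \neq j.$$

This arises because 𝔼(T) = 2 and 𝔼(Tij) = 1.
Hence, as the number of loci increases to infinity in such a way that Lτ² → σ², the sampling distribution converges to a multivariate normal distribution with mean 0 and variance-covariance matrix Σ having elements

$$\Sigma_{ij} = \begin{cases} \theta\sigma^2 & \text{if } i = j, \\ \dfrac{\theta\sigma^2}{2} & \text{if } i \neq j. \end{cases}$$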
Because the pairwise covariances are equal, the random vector X is an exchangeable Gaussian random vector. Hence, using well-known facts about the representation of exchangeable Gaussian random vectors, one arrives at the representation in the main text.
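To make the exchangeable structure concrete, the sketch below builds a covariance matrix with variance θσ² on the diagonal and covariance θσ²/2 off the diagonal, and draws from the corresponding Gaussian vector as a shared component plus independent individual components. This is one standard representation of an exchangeable Gaussian vector with non-negative correlation; its notation may differ from the representation given in the main text, and the function names are illustrative.

```python
import numpy as np

def limiting_covariance(n, theta, sigma2):
    """Covariance matrix: theta * sigma2 on the diagonal, half that elsewhere."""
    off_diagonal = 0.5 * theta * sigma2
    return off_diagonal * (np.ones((n, n)) + np.eye(n))

def draw_exchangeable_gaussian(n, theta, sigma2, size, rng):
    """Draw rows X = scale * (Z0 + Z_i), with Z0 shared by all individuals.

    Var(X_i) = 2 * scale**2 and Cov(X_i, X_j) = scale**2 for i != j,
    so scale**2 = theta * sigma2 / 2 matches limiting_covariance.
    """
    scale = np.sqrt(0.5 * theta * sigma2)
    shared = rng.standard_normal((size, 1))       # common to the whole sample
    individual = rng.standard_normal((size, n))   # independent per individual
    return scale * (shared + individual)

rng = np.random.default_rng(11)
n, theta, sigma2 = 5, 2.0, 1.0
target = limiting_covariance(n, theta, sigma2)
draws = draw_exchangeable_gaussian(n, theta, sigma2, size=200_000, rng=rng)
empirical = np.cov(draws, rowvar=False)
```

The shared component plays the role of evolution on the part of the genealogy common to all sampled individuals, which is what induces the positive pairwise correlation of 1/2.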
References
- Aldous DJ. Exchangeability and Related Topics. Springer; 1985.
- Berg JJ, Coop G. A population genetic signal of polygenic adaptation. PLoS Genetics. 2014;10:e1004412. doi: 10.1371/journal.pgen.1004412.
- Bürger R, Lande R. On the distribution of the mean and variance of a quantitative trait under mutation-selection-drift balance. Genetics. 1994;138:901–912. doi: 10.1093/genetics/138.3.901.
- Chaix R, Somel M, Kreil DP, Khaitovich P, Lunter G. Evolution of primate gene expression: drift and corrective sweeps? Genetics. 2008;180:1379–1389. doi: 10.1534/genetics.108.089623.
- Chakraborty R, Nei M. Genetic differentiation of quantitative characters between populations or species: I. Mutation and random genetic drift. Genetical Research. 1982;39:303–314.
- Degner JF, Pai AA, Pique-Regi R, Veyrieras J-B, Gaffney DJ, Pickrell JK, De Leon S, Michelini K, Lewellen N, Crawford GE, et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature. 2012;482:390–394. doi: 10.1038/nature10808.
- Doebeli M, Blok HJ, Leimar O, Dieckmann U. Multimodal pattern formation in phenotype distributions of sexual populations. Proceedings of the Royal Society B: Biological Sciences. 2007;274:347–357. doi: 10.1098/rspb.2006.3725.
- Eastman JM, Wegmann D, Leuenberger C, Harmon LJ. Simpsonian ‘evolution by jumps’ in an adaptive radiation of Anolis lizards. arXiv preprint arXiv:1305.4216. 2013.
- Estes S, Arnold SJ. Resolving the paradox of stasis: models with stabilizing selection explain evolutionary divergence on all timescales. The American Naturalist. 2007;169:227–244. doi: 10.1086/510633.
- Falconer D, Mackay T. Introduction to Quantitative Genetics. 4 ed. American Genetic Association; 1996.
- Fisher R. The Genetical Theory of Natural Selection. Clarendon Press; 1930.
- Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh. 1918;52:399–433.
- Galton F. Inquiries into Human Faculty and its Development. Macmillan; 1883.
- Galton F. Natural Inheritance. Macmillan; 1889.
- Gruber JD, Vogel K, Kalay G, Wittkopp PJ. Contrasting properties of gene-specific regulatory, coding, and copy number mutations in Saccharomyces cerevisiae: frequency, effects, and dominance. PLoS Genetics. 2012;8:e1002497. doi: 10.1371/journal.pgen.1002497.
- Haldane J. The statics of evolution. Evolution as a Process. 1954:109–121.
- Halmos PR, et al. The theory of unbiased estimation. The Annals of Mathematical Statistics. 1946;17:34–43.
- Hartigan JA, Hartigan P. The dip test of unimodality. The Annals of Statistics. 1985:70–84.
- Hudson RR. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics. 2002;18:337–338. doi: 10.1093/bioinformatics/18.2.337.
- Khaitovich P, Hellmann I, Enard W, Nowick K, Leinweber M, Franz H, Weiss G, Lachmann M, Pääbo S. Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science. 2005a;309:1850–1854. doi: 10.1126/science.1108296.
- Khaitovich P, Pääbo S, Weiss G. Toward a neutral evolutionary model of gene expression. Genetics. 2005b;170:929–939. doi: 10.1534/genetics.104.037135.
- Kimura M. A stochastic model concerning the maintenance of genetic variability in quantitative characters. Proceedings of the National Academy of Sciences of the United States of America. 1965;54:731. doi: 10.1073/pnas.54.3.731.
- Kingman JFC. Poisson Processes. 3 ed. Oxford University Press; 1992.
- Kopp M, Hermisson J. The evolution of genetic architecture under frequency-dependent disruptive selection. Evolution. 2006;60:1537–1550.
- Lande R. Natural selection and random genetic drift in phenotypic evolution. Evolution. 1976:314–334. doi: 10.1111/j.1558-5646.1976.tb00911.x.
- Landis MJ, Schraiber JG, Liang M. Phylogenetic analysis using Lévy processes: finding jumps in the evolution of continuous traits. Systematic Biology. 2013;62:193–204. doi: 10.1093/sysbio/sys086.
- Lappalainen T, Sammeth M, Friedländer MR, ’t Hoen PAC, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013. doi: 10.1038/nature12531.
- Latter B. Natural selection for an intermediate optimum. Australian Journal of Biological Sciences. 1960;13:30–35.
- Latter B. Selection in finite populations with multiple alleles. II. Centripetal selection, mutation, and isoallelic variation. Genetics. 1970;66:165. doi: 10.1093/genetics/66.1.165.
- Lemos B, Meiklejohn CD, Cáceres M, Hartl DL. Rates of divergence in gene expression profiles of primates, mice, and flies: stabilizing selection and variability among functional categories. Evolution. 2005;59:126–137.
- Lynch M, Hill WG. Phenotypic evolution by neutral mutation. Evolution. 1986:915–935. doi: 10.1111/j.1558-5646.1986.tb00561.x.
- Mackay T, Lyman RF, Jackson MS. Effects of P element insertions on quantitative traits in Drosophila melanogaster. Genetics. 1992;130:315–332. doi: 10.1093/genetics/130.2.315.
- McKay JK, Latta RG. Adaptive population divergence: markers, QTL and traits. Trends in Ecology & Evolution. 2002;17:285–291.
- Mendel G. Versuche über Pflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn. 1866;4:3–44.
- Möhle M. Coalescent results for two-sex population models. Advances in Applied Probability. 1998:513–520.
- Ovaskainen O, Karhunen M, Zheng C, Arias JMC, Merilä J. A new method to uncover signatures of divergent and stabilizing selection in quantitative traits. Genetics. 2011;189:621–632. doi: 10.1534/genetics.111.129387.
- Pearson K. Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A. 1894:71–110.
- Pearson K. Contributions to the mathematical theory of evolution. III. Regression, heredity, and panmixia. Proceedings of the Royal Society of London. 1895;59:69–71.
- Pearson K. Mathematical contributions to the theory of evolution. XII. On a generalised theory of alternative inheritance, with special reference to Mendel’s laws. Philosophical Transactions of the Royal Society of London A. 1904:53–86.
- Press SJ. Multivariate stable distributions. Journal of Multivariate Analysis. 1972;2:444–462.
- R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
- Saunders IW, Tavaré S, Watterson G. On the genealogy of nested subsamples from a haploid population. Advances in Applied Probability. 1984:471–491.
- Uyeda JC, Hansen TF, Arnold SJ, Pienaar J. The million-year wait for macroevolutionary bursts. Proceedings of the National Academy of Sciences. 2011;108:15908–15913. doi: 10.1073/pnas.1014503108.
- Weldon WFR. Mendel’s laws of alternative inheritance in peas. Biometrika. 1902:228–254.
- Whitlock MC. Neutral additive genetic variance in a metapopulation. Genetical Research. 1999;74:215–221. doi: 10.1017/s0016672399004127.



