Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2019 Jul 1.
Published in final edited form as: Theor Popul Biol. 2018 Mar 28;122:149–157. doi: 10.1016/j.tpb.2018.03.006

Inference on Admixture Fractions in a Mechanistic Model of Recurrent Admixture

Erkan Ozge Buzbas a, Paul Verdu b
PMCID: PMC6054545  NIHMSID: NIHMS922872  PMID: 29604302

Abstract

Signatures of recent historical admixture are ubiquitous in human populations. We present a mechanistic model of admixture with two source populations, encompassing recurrent admixture periods and study the distribution of admixture fractions for finite but arbitrary genome size. We provide simulation-based methods to estimate the introgression parameters and discuss the implications of reaching stationarity on estimability of parameters when there are recurrent admixture events with different rates.

Keywords: recurrent admixture, introgression parameters, stationary distribution, population genetics, Bayesian computation

1. Introduction

Admixture among previously isolated populations is a key mechanism influencing the genetic diversity and biological evolution of populations. Since the 1930s population geneticists have extensively investigated genetic diversity patterns in admixed populations from numerous domesticated or wild animal and plant species (Bertorelle et al. 1998; Richards et al.1999; Rosenberg et al. 2002; Garant et al. 2005). This allowed reconstructing the demographic and migratory histories of the studied populations (Long 1991; Chikhi et al. 2001; Hellenthal et al. 2014), detecting traces of natural selection and biological adaptation (Cavalli-Sforza et al. 1971; Tang et al. 2007; Oleksyk et al. 2010; Lohmueller et al. 2011), identifying mutations involved in complex phenotypic traits determination (pathological or not) using admixture mapping approaches (Reich et al. 2005; Smith et al. 2005; Buerkle et al. 2008; Perry et al. 2014), or understanding the genomic architecture of admixed individuals (Tang et al. 2005; Tang et al. 2006; Sankararaman et al. 2008; Price et al. 2009; Bryc et al. 2010; Lawson et al. 2012).

Theoretical approaches to understand the effect of admixture on genetic patterns observed in populations, remain to be more thoroughly explored. A number of mechanistic models have been developed to describe historical admixture processes with a single event of admixture between two source populations. They have been fruitfully used to predict admixture proportions and time since the onset of admixture (Patterson et al. 2012; Pickrell et al. 2012; Lipson et al. 2013; Loh et al. 2013; Hellenthal et al. 2014), and other statistics such as fixation indices (Long 1991), and linkage disequilibrium (Chakraborty et al. 1988; Ewens et al. 1995; Pfaff et al. 2001; Guo et al. 2005; Baird 2006; Loh et al. 2013). However, admixture processes are known to be often more complex than a process with a single event of admixture between two source populations, involving potential recurrent admixture events with varying contributions from several source populations over time (e.g. Crawford et al 2012).

Recent studies have developed mechanistic models of complex historical admixture to better understand the influence of admixture parameters such as the contribution of source populations at each generation, the onset and the duration of admixture (Verdu et al. 2011; Gravel 2012; Goldberg et al. 2014; Goldberg et al. 2015). These authors showed that complex admixture processes may leave an identifiable signature on genetic patterns in admixed populations and individuals. Genetic patterns can further be used to reconstruct previously unknown admixture histories using genetic data. Nevertheless, our ability to accurately infer past admixture histories from observed genetic patterns largely depends upon further theoretical exploration of complex admixture models. For example, identifying whether admixture processes in a hybrid population occur panmictically or follow complex assortative mating is crucial to correctly interpret inferences on past admixture events drawn from the genetic data. In particular, assessing a priori the identifiability and estimability of admixture parameters when complex admixture processes operate is both crucial and challenging.

In this paper, we explore admixture processes derived from the mechanistic models proposed in Verdu and Rosenberg (2011) involving multiple admixture periods (which we refer as recurrent admixture throughout the paper) in the history of admixed populations. Over the long run, our recurrent admixture model leads to a stationary distribution of admixture fractions. We motivate this distribution analytically for single locus and two-locus genomes, and calculate it deterministically for genomes with more than two loci. We develop simulation-based statistical methods to estimate the admixture parameters under historical models using genotyping data, and discuss how reaching the stationarity influences parameter estimability under models with at least two distinct admixture periods over time.

2. A model with recurrent admixture

We consider a process of recurrent admixture every generation between two infinite diploid source populations with fixed and known genome size of L loci. The admixture fraction for an individual is the proportion of alleles in the genome of that individual contributed originally by one of the source populations in a population that evolves as in (Figure 1A). The admixed population H is assumed to be infinite and it is founded at generation t = 0 by random mating of parents from two source populations: A and B. At the founding generation t = 0, the parents of individuals in the admixed population are chosen from source populations A and B independent of each other with constant probabilities a0 and b0 respectively. For generations t ≥ 1 after the founding, the parents of the individuals in the admixed population at generation t + 1 are chosen from source populations A, B, and the admixed population H, independent of each other with probabilities a, b, and h respectively. Repeated over generations with constant introgression parameters a, b, and h, this process is a single-stage recurrent admixture model. Verdu and Rosenberg (2011) studied a version of this admixture model under the assumption of unspecified genome size and focused on the probability for a random locus in the genome of a random individual in an admixed population at generation g to ultimately have originated from source population i. Fixing the size of the genome makes the mathematical properties of our model fundamentally different then Verdu and Rosenberg (2011). We refer to their model as unspecified genome size model and discuss its differences from our model in section 3.2.

Figure 1.

Figure 1

A mechanistic model with: constant recurring admixture (A), and two stage recurring admixture (B) in a diploid population

We assign labels 1 and 0 to alleles originated from source populations A and B respectively, which corresponds to a “diagnostic” locus of the admixture process. For example, as in the case of a locus with two alleles, one of them private to a source population (see Szpiech et al. 2011). The admixture fraction of an individual takes values {0, 1/(2L), 2/(2L), …, 1, where the numerator is the number of alleles in the genome of an admixed individual originating from source population A.

Current hypotheses of admixture in recent history of human populations (Gravel 2012, Liang and Nielsen 2014), predict multiple-stage processes, so we study the properties of the distribution of admixture fractions in a two-stage recurrent admixture model (Figure 1B). In this model, there are two sets of introgression parameter values. Until time tc, the admixture process runs under the first set of introgression parameters as in the single-stage recurrent admixture model. From generation tc on, the recurrent admixture process continues with the second set of introgression parameter values until the present generation. We denote the two sets of introgression parameters corresponding to the first and second stage of the recurrent admixture process by (a1, b1, h1) and (a2, b2, h2) respectively (Figure 1B).

3. The stationary distribution of admixture fractions

3.1. Single-stage recurrent admixture model

We motivate the setup for the distribution of admixture fractions using a single-locus biallelic case: Three possible genotypes are 1|1, 0|1, 0|0, (where the vertical bar separates the two alleles in a genotype) and the three admixture fraction values that correspond to these genotypes are 1, 0.5, 0 respectively. We let pi(t) be the frequency of allele i at generation t in the admixed population H, with the initial frequencies p1(0) = p and p0(0) = q such that p+q = 1. At the first generation after the founding, the probability of an allele 1 in the offspring is equal to the probability of the parent coming from population A which contributes allele 1 with probability 1 or population H which contributes allele 1 with probability p. Therefore, the probability of drawing allele 1 as parent of an offspring is hp+a. A similar calculation for allele 0 implies that the probability of drawing allele 0 is hq+b. Under random mating in the hybrid population, the frequency of allele 1 at generation 1 in the admixed population is then given by p1(1) = (hp+a)2 + (hp+a)(hq+b) = hp+a. Calculating the frequency of alleles at each generation recursively, at generation t we have p1(t) = htp + ht−1a + ht−2a + … + ha + a. Excluding the first term, the right hand side of this equation follows a geometric series with rate t and at generation t the allele frequencies can be written as geometric sum p1(t) = htp + a(1 − ht)/(1 − h). Since h < 1, in the limit as t → ∞ the first term htp on the right hand side goes to zero, and by the properties of the geometric series, the remaining terms converge to p1(∞) = a/(1 − h). A similar calculation shows that for allele 0 we have p0(∞) = b/(1 − h). These probabilities are reminiscient of a result obtained in equation (31) of Verdu and Rosenberg (2011), where for a random locus in the genome of a random individual in the admixed population they derive the expected value of the admixture fraction as a/(1 − h).

The limiting probabilities for admixture fractions {1, 0.5, 0} are given by (a/(1 − h)2, (2ab)/(1 − h)2, b/(1 − h)2). The time to stationarity depends on the initial allele frequencies p, q and on the parameter values (a, b, h). For the biallelic single locus case, if the initial and equilibrium frequencies of allele 1 are p1 and p1(∞) respectively, the convergence rate to stationarity is given by limt→∞(|p1(t + 1) − p1(∞)|)/(|p1 (t) − p1(∞)|), which is equal to h. Since 0 < h < 1, the convergence rate to the limiting frequencies a/(1 − h) and b/(1 − h) is linear in the number of generations t. Figure 2 (panels I and II) shows the effect of different values of (a, b, h) on the time to convergence to the stationary distribution of admixture fractions when the initial frequencies are p = q = 0.5.

Figure 2.

Figure 2

Time to convergence to the stationary distribution in the single-stage admixture model: I.) The contour plot for time to convergence in a two-dimensional simplex with coordinates (a, b, h). II.) The surface of the contour plot. The legend on the right shows the height of the surface in double log scale and the corresponding number of generations (t) to convergence. The convergence is assumed to be attained at accuracy 10−16. Time to convergence is small unless h, the contribution from the hybrid population to itself is close to 1. III. The table entries are time to convergence for different values of h at different accuracies. For example, h = 0.39 and precision 10−3 gives the time to convergence as 5 generations. For all cases in the table a = b = (1 − h)/2.

The stationary distribution of admixture fractions can be calculated for multilocus genotypes as well. For a two-locus genome with haplotypes, 11, 01, 10, 00, there are five possible admixture fractions: {1, 0.75, 0.5, 0.25, 0}. We denote the haplotype frequencies at t = 0 by (p1, p2, p3, p4). At generation t the haplotype frequencies are given by p1(t + 1) = hp1(t) + a, p2(t + 1) = hp2(t), p3(t + 1) = hp3(t), p4(t + 1) = hp4(t) + b. At stationarity we have pi(ts) = pi(ts − 1) and using the recursion we get the probabilities of admixture fractions {1, 0.75, 0.5, 0.25, 0} as (p12,2(p1p2+p1p3), 2(p1p4+p2p3)+p22+p32, 2(p4p2+p4p3),p42.

For L > 2, analytical solutions are algebraically cumbersome but the stationary probabilities are easily calculated with forward-in-time deterministic simulations using a tolerance level ε as the criterion of stationarity. The admixture process reaches the stationarity at tolerance level ε at generation tε, if tε is the first generation at which the maximum difference between any two admixture fractions is smaller than ε for the first time in the process. The main factor that determines the computational cost for the stationary probabilities is L. Even for intermediate L, the number of haplotypes (2L) is large. Some compuational efficiency is achieved by calculating and storing a matrix of coefficients M that maps genotype probabilities to admixture fractions. When multiplied by a vector of genotype probabilities, M returns the probabilities of admixture fractions. Given L, this matrix is calculated once and it is repeatedly used below in Algorithm 1. Given L, introgression parameters (a, b, h), and the initial frequencies of haplotypes in the hybrid population p=(p1,p2,,p(2L)), Algorithm 1 deterministically calculates the stationary probabilities at a specified tolerance ε.

Algorithm 1.

1. Initialize:
  Calculate genotype probabilities: G = pTp where pT is transpose
  Calculate probabilities of admixture fractions: ϕ = MG
  Set s = (a, 0, …, 0, b), (1 × 2L vector)
 While(true), do:
2. Update haplotype probabilities: p = ph + s
3. Update genotype probabilities: G = pTp
4. Update probabilities of admixture fractions: ϕ′ = MG
5. if max |ϕ′ − ϕ| < ε,
 break and return ϕ′,
 else, set ϕ = ϕ′ and go to step (2)

Figure 2 panel III lists the number of generations to convergence for a specified accuracy in the allele frequencies and is useful in establishing convergence with observed data. For all cases in the figure we use a = b = (1 − h)/2. For example when h = 0.39, at tolerance ε = 10−3 we get the time to convergence as 5 generations.

3.2. Differences with the unspecified genome size model in Verdu and Rosenberg (2011)

There are differences between our admixture model and the unspecified genome size model and these differences lead to different distributions of admixture fractions. Consider an example with Parent 1 genotype 0|1, and Parent 2 genotype 1|1. Three offspring genotypes from these two parents are 0|1, 1|0, 1|1, with frequencies 0.25, 0.25, 0.5 and admixture fraction values 0.5, 0.5, 1 respectively. In the unspecified genome size model, the quantity of interest is the distribution of mean admixture fractions of offspring in the admixed population as opposed to the distribution of admixture fractions of all offspring. The mean admixture fraction is (0.5 + 0.5)(0.25) + (1)(0.5) = 0.75, so the model records 0.75 for the offspring of these parents. However, our model calculates the distribution of admixture fractions for all individuals at each generation: the probabilities that an admixture fraction value is equal to 0.5 and 1 are 0.5 and 0.5 respectively.

In the unspecified genome size model, the distribution of the mean admixture fraction for a random locus and each generation the number of possible admixture fraction values increases with 2t. At generation t, possible admixture fraction values are {0, 1/2t, …, (2t − 1)/2t, 1} and the number of admixture fraction does not depend on the number of loci, L. Thus, the support of the distribution of admixture fraction increases with time and thus it is challenging to find the stationary distribution of admixture fractions. We consider a fixed and known number of loci in the genome and the 2L possible admixture fraction have a nonnegative probability to be in the population at all generations after the first. This makes it easy to calculate the stationary distribution of admixture fractions in our model.

3.3. Two-stage recurrent admixture model

Gravel (2012) and Liang and Nielsen (2014) found support for demographic models in which U.S. African-American populations have experienced a second admixture pulse following a first admixture pulse. This result suggests that complex admixture processes may have occured in human genetic evolution and left an identifiable signature in the genomic make-up of hybrid populations. We expand their two-pulse models to investigate the consequences of a generalized two-stage recurrent admixture process with introgression parameters (a1, b1, h1) in the first stage and (a2, b2, h2) in the second stage by expanding the single stage recurrent admixture model developed in section 3.1. The results that we present in this section extend directly to multi-locus genomes.

We assume that at generation t the introgression parameter values change from (a1, b1, h1) to (a2, b2, h2). In addition to the first admixture process reaching the numerical stationarity at tolerance level ε at generation tε, we now assume that the second process reaches the numerical stationarity at generation ε (Figure 1B). The second process is assumed to follow the first process immediately and the starting allele frequencies for the second process are the ending allele frequencies from the first process. If the introgression parameter values change at generation t, then at generation t + 1 we have the allele frequencies given by p1(t+1)=h2[h1t+a1((1h1t)/(1h1))]+a2, and p0(t+1)=h2[h1tq+b1((1h1t)/(1h1))]+b2, where the quantities inside the brackets are p1(t) and p0(t) respectively, which become the initial frequencies of alleles for the second recurrent admixture process with introgression parameters (a2, b2, h2). If the total admixture process continues for t + ℓ generations, first for t generations under parameter values (a1, b1, h1) and then for generations under parameter values (a2, b2, h2), the allele frequencies are given by

p1(t+)=h2[h1tp+a1(1h1t1h1)]+a2(1h21h2), (1)
p0(t+)=h2[h1tp+a1(1h1t1h1)]+b2(1h21h2). (2)

Equations 1 and 2 develop an intution on the effect of the first and second admixture processes on the frequency of alleles as follows. If this stationarity has been reached before the second process has started, equations 1 and 2 simplify to

p1(t+|t>tε)=h2(a11h1)+a2(1h21h2), (3)
p0(t+|t>tε)=h2(b11h1)+b2(1h21h2). (4)

if the second process has reached the stationarity, the effect of the first process on the allele frequencies given by the first terms in equations 3, and 4 goes to zero and the second terms converge to the geometric sum p1(t+|>ε)=a21h2, and p0(t+|>ε)=b21h2. Thus, if the second admixture process has run long enough to reach the stationarity, no effect of the first admixture process can be captured in the allele frequency data and the allele frequencies will carry no information about the first set of introgression parameters, irrespective of how long the first process has run.

Figure 5 shows some examples of how quickly the effect of the first admixture process is erased by the second admixture process in a two-stage recurrent admixture models. We tested six models for which the introgression parameters are given in the upper left corner legend of the figure. The parameterization follows the order (a1, b1, h1), (a2, b2, h2). We started the recurrent admixture at generation 1 after the founding, and ended at generation 50. The horizontal axis of the plot shows the generation tc at which the admixture was switched from process 1 to process 2 in the sense that the introgression parameters were changed from (a1, b1, h1) to (a2, b2, h2) at that generation. The vertical axis of the plot captures the effect of tc on the frequencies of admixture fractions at the end of 50 generations using a Sum of Squared Difference statistic (SSD).

Figure 5.

Figure 5

The results of a simulation study from a two-stage admixture model: Six models differing in introgression parameter values for two admixture processes are compared when the switch from process 1 to 2 happens at tc ∈ {2, 3, …, 48}. The total length of the two admixture processes is 50 generations. The horizontal axis: the generation number tc. The vertical axis (log scale): the sum of squared differences (SSD) in the admixture fraction frequencies at generation 50, between the model in which the change in the introgression parameters occurs at generation 2 (earliest) and each of the other models with 2 < tc ≤ 48. The SSD is small in all models indicating that little information remains in the admixture fractions about the first admixture process if a second admixture process acts on the population with parameters different than the first process.

For a model with fixed ((a1, b1, h1), (a2, b2, h2)), SSD for tc is given by

SSD(tc)=i=12L+1[pi(50|t=tc)pi(50|t=2)]2,

where pi(50|t = tc) and pi(50|t = 2) denote the frequencies of admixture fractions i at generation 50 when the admixture was switched from process 1 to process 2 at generation tc and 2 respectively. On average, we would expect SSD to be large if the switch from process 1 to process 2 has occured late in the whole process, because in this case the second process did not act long enough to reshape the frequencies of the admixture fractions that had been determined by the first process. Indeed, for all six models at each value of tc ∈ {2, 3, …, 48}, we see this effect in figure 5: as tc increases, the SSD increases.

All SSD values on the vertical axis (log scale) are very small indicating that the information carried in the frequencies of admixture fractions from the first process gets lost in a few generations if a second recurrent admixture process acts on the population. Different parameter combinations of ((a1, b1, h1) relative to (a2, b2, h2)) had very little effect on the magnitude of SSD. We note that models 1 and 2, models 3 and 4, and models 5 and 6 give almost identical results due to symmetry in the admixture parameters.

An implication of these results is that even a few generations of the second admixture process acting on the population is sufficient to erase the information about the introgression parameters of the first admixture process in the case of recurrent admixture.

4. Statistical properties of the distribution of admixture fractions

In this section we describe Bayesian procedures to sample the posterior distribution of introgression parameters under the single-stage recurrent admixture model of section 3.1. If sufficiently long time has passed since the start of the recurrent admixture process and there is no drift, then a sample from the admixed population is effectively from the stationary distribution of admixture fractions. In this case, we first obtain the posterior distribution of admixture fractions given a random sample of admixture fractions from an admixed population, and then we find the introgression parameters that produce this distribution using an optimization algorithm. Statistical inference is peformed in the first step, and mapping this inference to the introgression parameters takes place in the second step.

We let D = (D1, D2, …, DK) be the K ≤ 2L + 1 admixture fractions calculated from L randomly chosen diagnostic biallelic loci independent from each other in the admixed population. We assume that the stationary distribution of admixture fractions, P(X = Di|ϕi) = ϕi, has been reached and thus D is multinomially distributed with probabilities ϕ = (ϕ1, ϕ2, …,ϕK). The minimally informative Bayesian prior on the parameter ϕ for multinomially distributed data is Dirichlet with parameters (1/2, 1/2, …, 1/2). The Dirichlet distribution is the conjugate prior for the multinomial data model, which means that the posterior distribution of ϕ given the data D is also Dirichlet distributed with parameters (D1 + 1/2, D2 + 1/2, …, DK + 1/2). This posterior distribution of population admixture fractions ϕ has the form

P(ϕ|D)i=1KϕiDi+1/2. (5)

Estimates of population admixture fractions follow from this posterior distribution. However, we are interested in estimates of introgression parameters and thus our task is to find a procedure to map estimates of admixture fractions to estimates of introgression parameters. If the genome consists of more than one locus1, there is a vector of introgression parameters (a*, b*, h*) that produces probabilities ϕ=(ϕ1,ϕ2,,ϕK)of admixture fractions through our admixture model. Finding the vector of introgression parameters that produce the probabilities of admixture fractions ϕ* is an optimization problem. We solve this problem by simulated annealing (SA) (see Appendix 1 for algorithm), a well-known optimization metaheuristic. To test the performance of our SA algorithm in finding the true values of introgression parameters given the population distribution of admixture fractions ϕ*, we randomly choose five test introgression parameter sets (a*, b*, h*) from a Dirichlet distribution with parameters (1,1,1) using L = 2, 3, 4, 5, 6 before considering a case with a larger number of loci. The convergence to true parameter values is illustrated in Figure 3.

Figure 3.

Figure 3

Optimization of introgression parameters (a, b, h) in five randomly generated test cases by simulated annealing algorithm (Appendix 1) given the distribution of admixture fraction. For each test case, a black dot at the end of 1000 iterations indicates the test parameter value under which the admixture fraction are simulated. The number of loci and the parameter values (a, b, h) used in test cases are as follows. (1): L = 6, (0.104, 0.177, 0.719). (2): L = 3, (0.407, 0.526, 0.07). (3): L = 4, (0.204, 0.691, 0.105). (4): L = 2, (0.009, 0.829, 0.162). (5): L = 5, (0.471, 0.101, 0.428).

The simulated annealing algorithm given in Appendix 1 finds introgression parameters that produce a given distribution of population admixture fractions. We do not observe these population admixture fractions, however, only a sample from this population. Therefore, our uncertainty about population admixture fractions will be reflected in our inference on introgression parameters. The posterior distribution of introgression parameters can be obtained in two steps. First, we obtain a random sample ϕ(i), i = 1, 2, …, m of admixture fractions given the data D. This is a sample from the posterior distribution in equation 5, which is a Dirichlet distribution and hence computationally straightforward to sample. Second, we apply the simulated annealing algorithm on each sampled distribution of admixture fractions ϕ(i), and find each set of introgression parameters (a(i), b(i), h(i)) producing these distributions of admixture fractions. These (a(i), b(i), h(i)) i = 1, 2, …, m values returned by the simulated annealing algorithm are a sample of size m from the posterior distribution of introgression parameters. For the same set of five test parameter sets used in the previous paragraph, Figure 4 shows the posterior samples of size m = 1000 obtained using this method.

Figure 4.

Figure 4

Posterior samples for five test cases. For each test case, the red dot indicates the test parameter value under which the admixture fractions are simulated, the gray cloud of points is a sample of size 1000 from the posterior distribution of parameters (a, b, h), and the red circle is a 95% elliptical interval based on the posterior sample. The number of loci and the parameter values (a, b, h) used in test cases are as follows. (1): L = 6, (0.104, 0.177, 0.719). (2): L = 3, (0.407, 0.526, 0.07). (3): L = 4, (0.204, 0.691, 0.105). (4): L = 2, (0.009, 0.829, 0.162). (5): L = 5, (0.471, 0.101, 0.428).

Any desired α percentile for the introgression parameters can be estimated by one run of the simulated annealing algorithm. First, we determine the lower ϕ(α) from the posterior sample of admixture fractions and find the α(α) of the interval estimates of introgression parameters by applying our simulated annealing algorithm. In particular, an interval of the desired level of credibility can be obtained by running the simulated annealing algorithm two times, once for the lower bound and once for the upper bound.

For our five simulated introgression parameter sets, Figure 4 shows inference on introgression parameters based on a sample of 1000 individuals from the simulated admixed populations. A sample of size 1000 from the posterior distribution is illustrated by clouds of gray dots for each simulated data set. Each cloud of points is obtained by first sampling m = 1000 admixture fraction points from the posterior distribution of admixture fractions and then applying simulated annealing algorithm on each point to find the introgression parameter value producing the given distribution of admixture fractions. 95% credible regions for each introgression parameters is shown by red ellipses, with = 0.025, u = 0.975 obtained by applying the simulated annealing algorithm twice. These 95% credible regions are precise and they show that our method using the simulating annealing algorithm to obtain interval estimates on introgression parameters works well in practice.

We conclude this section with some remarks on the estimation of introgression parameters in a large genome, that is when L is large. The number of theoretically possible haplotypes is given by 2L, which grows exponentially. For large L, some haplotypes will either be missing in the population if the population size is smaller than 2L, or exist at very low frequencies, and thus are unlikely to be sampled when the number of individuals in the sample is small relative to the population size. In these cases, the Bayesian procedure for inference on introgression parameters outlined in this section works well as an approximate method. Intuitively, we assume that the population consists only of haplotypes that are observed in the sample. Formally, this assumption is implemented by assigning a prior probability of zero to any haplotype that is not observed in the sample, resulting in zero posterior probability for these haplotypes. This assignment of zero prior probability to any haplotype not observed in the sample has also a computational advantage. When calculating the distribution of admixture fractions, only the frequencies of genotypes that result from pairing of the observed haplotypes —and not the genotypes that might have resulted from non-observed haplotypes—need to be calculated.

5. Discussion

Our main goals in this work were to better understand a priori the influence of complex admixture processes on the genetic evolution of hybrid populations, use intuitive distributions of admixture fractions, and investigate potentials of inferences for reconstructing the history of admixed populations using genetic data. Although not shown in this paper, we have also studied the joint estimation of duration of the admixture process t and introgression parameters (a, b, h) by simulation. We were confronted with estimability problems in the sense of weak identifiability of parameters. There are many parameter combinations where a process with: 1)little gene flow from source populations (small a, b) that acts for long periods of admixture (large t) or 2)large gene flow from source populations (large a, b) that acts for short periods of admixture (small t), produce practically indistinguishable distributions of admixture fractions. We believe that precise inferences about the introgression parameters require accurate information about the temporal properties of an admixture process. These properties consist either of the duration of admixture, the transient distribution of admixture fractions, and the initial frequencies of alleles, or the stationarity and the stationary distribution of admixture fractions. Using the transient distribution by conditioning on t to make inference on introgression parameters is an interesting idea, but accurate estimates of t from independent sources might not be available in practice.

If the stationarity has been reached, inferences of introgression parameters are considerably facilitated because the stationary distribution of admixture fractions can be calculated as in section 3, and this distribution is independent of initial frequencies of alleles. Statistical tests of stationarity could provide information about whether individuals in the hybrid population randomly mate according to their admixture levels. This is of interest for hybrid reintroduction programs or recurrent biological invasions involving admixture and/or hybridization with endogenous species (Hedrick 1994; Vergeer et al. 2004; Guillemaud et al. 2010; Kerdelhué et al. 2014). In these cases, it is often crucial to understand possible assortative matting mechanisms influencing the reproductive success of hybrids based on their admixture levels in order to predict future population dynamics including survival and expansion, or extinction.

Population genetics methods aiming at inferring the parameters of historical admixture processes using genetic data are increasingly applied to study, in particular, human admixture history at variable geographic and time scales (e.g. Hellenthal et al. 2014; Moreno-Estrada et al. 2013; Gravel et al. 2013; Raghavan et al. 2015; Skoglund et al. 2015; Mallick et al. 2016). These methods schematically rely whether on phylogenetic admixture graphs inferred using the allele frequency spectrum calculated from markers genotyped at the genome-wide scale (Reich et al. 2009; Patterson et al. 2012; Pickrell and Pritchard 2012; Lipson et al. 2013); or on linkage-disequilibrium decay and admixture-haplotype lengths distributions fitted to exponential curves derived under specific complex admixture models (Pool and Nielsen 2009; Moorjani et al. 2011; Gravel et al. 2012; Patterson et al. 2012; Loh et al. 2013; Hellenthal et al. 2014). These authors detailed and extensively warn users about the inherent limitations faced by these approaches. In particular, how poorly these methods may perform when the unknown admixture process underlying the data involves more than two pulses of admixture and/or recurring admixture mechanisms. Here, we demonstrate that if the admixture fractions in a hybrid population have reached stationarity, then drawing inferences about more ancient admixture events may be highly challenging. Moreover, in the simplest case of a single-stage recurrent admixture process, we show that the time needed to reach stationarity, although variable and depending on introgression parameters, can be relatively rapid on the order of 10 to 50 generations (Figure 3). Thus, while recent complex admixture processes may be accurately reconstructed using only present-day genetic data, inferring the parameters of more ancient admixture events might be misleading. In this context, the joint analyses of modern and ancient genetic data become crucial to accurately infer the parameters of more ancient historical processes (e.g. Raghavan et al. 2015; Kilinc et al. 2016; Lazaridis et al. 2014).

Our conclusions rely on the assumption that genetic markers used in the analysis are independent of each other. Independent markers are easier to work with since genetic data phasing still remains a challenging task involving uncertainties. Nevertheless, using linkage-disequilibrium data provides massive amounts of information about the underlying admixture mechanisms, as shown by numerous successful admixture-LD approaches applied to human genomic data (Gravel et al. 2012, 2013; Hellenthal et al. 2014; Malick et al. 2016). Therefore, a natural direction for future theoretical work similar to ours is to focus on LD patterns in hybrid populations having experienced complex admixture processes, to better understand a priori what information can be accurately extracted from real genomic data about historical admixture models and parameters.

Acknowledgments

This work was funded in part by the French Agence Nationale de la Recherche Grant “METHIS” ANR-15-CE32-0009-01. Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number P20GM104420. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Appendix 1: Simulated annealing (SA) algorithm for optimization of parameters given the population admixture fraction

SA is an iterative metaheuristic to minimize a function [27]. The algorithm starts at an arbitrary value (a(0), b(0), h(0)) on the parameter space and at each iteration replaces the current value with a proposed random solution (a(*), b(*), h(*)) with probability 1 if the proposed solution is a better minimizer of the objective function, or with a probability function if it is a worse solution. A temperature parameter T that decreases with time controls the probability of accepting a worse solution decreases with the number of iterations.

We minimize the total variation distance between the distribution of the observed xo and the proposed x* distribution of admixture fractions. The total variation distance is given by (1/2)i=12L+1|p(Di)p(xi)|, where p(Di) and p(xi) are the probabilities of the admixture fraction i in the observed and the proposed distribution respectively. We set an exponential cooling schedule where temperature at iteration i + 1 iteration is 0.95% of the temperature at iteration i. We set the temperature at the first and final iterations to T0 = 0.7 and Tf = 0.001 respectively.

  1. Initialize
    1. Set T(c) = T0 = 0.7, Tf = 0.01, α = 0.95,
    2. Set the estimated probability of admixture fraction i to P(Di)
    3. Simulate (a(c), b(c), h(c)) ~ Dirichlet(1, 1, 1)
    4. Calculate the current distribution P(x|a(c), b(c), h(c)) analytically
    5. Calculate the current energy:
      Ec=(1/2)i=12L+1|P(Di)P(xi|a(c),b(c),h(c))|
  2. While T(c) > Tf repeat:
    1. 6. Propose a new parameter value (a(p), b(p), h(p)) ~ Dirichlet(1, 1, 1)
    2. 7. Calculate the proposed distribution P(x|a(p), b(p), h(p)) analytically
    3. 8. Calculate the proposed energy:
      Ep=(1/2)i=12L+1|P(Di)P(xi|a(p),b(p),h(p))|
    4. 9. Simulate u ~ Unif(0, 1)
    5. 10. If u < e|Ep−Ec|/T(c), set (a, b, h) = (a(p), b(p), h(p)), Ec = Ep
    6. 11. Set T(c) = αT(c)

Footnotes

1

For L = 1, the model is underdetermined and the parameters are unidentifiable because there are three parameters but only two alleles. For example, if k ∈ (0, 0.25) the parameters (a, b, h) = (k, 3k, 1 − 4k) generate allele frequencies p1(∞) = 0.25, p0(∞) = 0.75. Substituting 1 − h = a + b in equations p1(∞) = a/(1 − h), and p0(∞) = b/(1 − h), we get p1(∞) = a/(a + b) and p0(∞) = b/(a + b) which shows that there are infinitely many parameter values that can generate the allele frequencies, and thus the distribution of admixture fractions.

References

  • 1.Anderson E, Thompson E. A model-based method for identifying species hybrids using multilocus genetic data. Genetics. 2002;160:1217–1229. doi: 10.1093/genetics/160.3.1217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Baird SJ. Phylogenetics: Fisher’s markers of admixture. Heredity. 2006;97:81–83. doi: 10.1038/sj.hdy.6800850. [DOI] [PubMed] [Google Scholar]
  • 3.Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162:2025–2035. doi: 10.1093/genetics/162.4.2025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Bertorelle G, Excoffier L. Inferring admixture proportions from molecular data. Molecular Biology and Evolution. 1998;15:1298–1311. doi: 10.1093/oxfordjournals.molbev.a025858. [DOI] [PubMed] [Google Scholar]
  • 5.Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, Froment A, Bodo JM, Wambebe C, Tishkoff SA, Bustamante CD. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Procedings of National Academy of Science USA. 2010;107:786–791. doi: 10.1073/pnas.0909559107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Buerkle CA, Lexer C. Admixture as the basis for genetic mapping. Trends in Ecology and Evolution. 2008;23:686–694. doi: 10.1016/j.tree.2008.07.008. [DOI] [PubMed] [Google Scholar]
  • 7.Cavalli-Sforza LL, Bodmer WF. The genetics of human populations. xvi. W. H. Freeman; San Francisco, U.S.A.: 1971. p. 965. [Google Scholar]
  • 8.Chakraborty R, Weiss KM. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Procedings of National Academy of Science U S A. 1988;85:9119–9123. doi: 10.1073/pnas.85.23.9119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Chikhi L, Bruford MW, Beaumont MA. Estimation of admixture proportions: a likelihood-based approach using Markov chain Monte Carlo. Genetics. 2001;158:1347–1362. doi: 10.1093/genetics/158.3.1347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Crawford MH, Campbell BC. Causes and consequences of human migration: an evolutionary perspective. xv. Cambridge University Press; Cambridge, U.K: 2012. p. 550. [Google Scholar]
  • 11.Edwards AFW. Foundations of Mathematical Genetics. Second. Cambridge University Press; Cambridge, U.K: 2000. pp. 40–42. [Google Scholar]
  • 12.Ewens W. Mathematical Population Genetics: I Theoretical Introduction. Second. Springer; London, U.K: 2004. [Google Scholar]
  • 13.Ewens WJ, Spielman RS. The transmission/disequilibrium test: history, subdivision, and admixture. American Journal of Human Genetics. 1995;57:455–464. [PMC free article] [PubMed] [Google Scholar]
  • 14.Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587. doi: 10.1093/genetics/164.4.1567. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Garant D, Kruuk LE, Wilkin TA, McCleery RH, Sheldon BC. Evolution driven by differential dispersal within a wild bird population. Nature. 2005;433:60–65. doi: 10.1038/nature03051. [DOI] [PubMed] [Google Scholar]
  • 16.Goldberg A, Rosenberg NA. Beyond 2/3 and 1/3: The Complex Signatures of Sex-Biased Admixture on the X Chromosome. Genetics. 2015;201:263–279. doi: 10.1534/genetics.115.178509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Goldberg A, Verdu P, Rosenberg NA. Autosomal admixture levels are informative about sex bias in admixed populations. Genetics. 2014;198:1209–1229. doi: 10.1534/genetics.114.166793. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Gravel S. Population Genetics Models of Local Ancestry. Genetics. 2012;191:607–619. doi: 10.1534/genetics.112.139808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, Kenny EE, Gignoux CR, Maples BK, Guiblet W, Dutil J, Via M, Sandoval K, Bedoya G, Genomes P, et al. Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genetics. 2013;9:e1004023. doi: 10.1371/journal.pgen.1004023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Guillemaud T, Beaumont MA, Ciosi M, Cornuet JM, Estoup A. Inferring introduction routes of invasive species using approximate Bayesian computation on microsatellite data. Heredity. 2010;104:88–99. doi: 10.1038/hdy.2009.92. [DOI] [PubMed] [Google Scholar]
  • 21.Guo W, Fung WK, Shi N, Guo J. On the formula for admixture linkage disequilibrium. Human Heredity. 2005;60:177–180. doi: 10.1159/000090119. [DOI] [PubMed] [Google Scholar]
  • 22.Hedrick PW. Purging inbreeding depression and the probability of extinction: full-sib mating. Heredity. 1994;73:363–372. doi: 10.1038/hdy.1994.183. [DOI] [PubMed] [Google Scholar]
  • 23.Hellenthal G, Busby GB, Band G, Wilson JF, Capelli C, Falush D, Myers S. A genetic atlas of human admixture history. Science. 2014;343:747–751. doi: 10.1126/science.1243518. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Keller LF, Waller DM. Inbreeding effects in wild populations. Trends in Ecology and Evolution. 2002;17:230–241. [Google Scholar]
  • 25.Kerdelhue C, Boivin T, Burban C. Contrasted invasion processes imprint the genetic structure of an invasive scale insect across southern Europe. Heredity (Edinb) 2014;113:390–400. doi: 10.1038/hdy.2014.39. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Kilinc GM, Omrak A, Ozer F, Gunther T, Buyukkarakaya AM, Bicakci E, Baird D, Donertas HM, Ghalichi A, Yaka R, Koptekin D, Acan SC, Parvizi P, Krzewinska M, Daskalaki EA, et al. The Demographic Development of the First Farmers in Anatolia. Current Biology. 2016;26:2659–2666. doi: 10.1016/j.cub.2016.07.057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Kirkpatrick S, Gelatt CD, Jr, Vecchi MP. Optimization by Simulated Annealing. Science. 1983;220:671–680. doi: 10.1126/science.220.4598.671. [DOI] [PubMed] [Google Scholar]
  • 28.Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genetics. 2012;8:e1002453. doi: 10.1371/journal.pgen.1002453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, Berger B, Economou C, Bollongino R, Fu Q, Bos KI, Nordenfelt S, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014;513:409–413. doi: 10.1038/nature13673. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Liang M, Nielsen R. Understanding admixture fraction. 2014 doi: 10.1101/008078. [DOI]
  • 31.Lipson M, Loh PR, Levin A, Reich D, Patterson N, Berger B. Efficient moment-based inference of admixture parameters and sources of gene flow. Mol Biol Evol. 2013;30:1788–1802. doi: 10.1093/molbev/mst099. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 2013;193:1233–1254. doi: 10.1534/genetics.112.147330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Lohmueller KE, Bustamante CD, Clark AG. Detecting directional selection in the presence of recent admixture in african-americans. Genetics. 2011;187:823–835. doi: 10.1534/genetics.110.122739. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Long JC. The genetic structure of admixed populations. Genetics. 1991;127:417–428. doi: 10.1093/genetics/127.2.417. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, Skoglund P, Lazaridis I, Sankararaman S, Fu Q, Rohland N, et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. 2016;538(7624):201–206. doi: 10.1038/nature18964. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genetics. 2011;7(4):e1001373. doi: 10.1371/journal.pgen.1001373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Moreno-Estrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, Gignoux CR, Ortiz-Tello PA, Martinez RJ, Hedges DJ, Morris RW, Eng C, Sandoval K, Acevedo-Acevedo S, Norman PJ, Layrisse Z, et al. Reconstructing the population genetic history of the Caribbean. PLoS Genetics. 2013;9(11):e1003925. doi: 10.1371/journal.pgen.1003925. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Oleksyk TK, Smith MW, O’Brien SJ. Genome-wide scans for footprints of natural selection. Philosophical Transactions of the Royal Society B: Biological Sciences. 2010;365:185–205. doi: 10.1098/rstb.2009.0219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Parra EJ, Marcini A, Akey J, Martinson J, Batzer MA, Cooper R, Forrester T, Allison DB, Deka R, Ferrell RE, Shriver MD. Estimating African American admixture proportions by use of population-specific alleles. American Journal of Human Genetics. 1998;63:1839–1851. doi: 10.1086/302148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Paschou P, Ziv E, Burchard EG, Choudhry S, Rodriguez-Cintron W, Mahoney MW, Drineas P. PCA-correlated SNPs for structure identification in worldwide human populations. PLoS Genetics. 2007;3:1672–1686. doi: 10.1371/journal.pgen.0030160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics. 2012;192(3):1065–1093. doi: 10.1534/genetics.112.145037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genetics. 2006;2:e190. doi: 10.1371/journal.pgen.0020190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Perry GH, Foll M, Grenier JC, Patin E, Nedelec Y, Pacis A, Barakatt M, Gravel S, Zhou X, Nsobya SL, Excoffier L, Quintana-Murci L, Dominy NJ, Barreiro LB. Adaptive, convergent origins of the pygmy phenotype in African rainforest hunter-gatherers. Procedings of the National Academy of Sciences U S A. 2014;111:E3596–3603. doi: 10.1073/pnas.1402875111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Pfaff CL, Parra EJ, Bonilla C, Hiester K, McKeigue PM, Kamboh MI, Hutchinson RG, Ferrell RE, Boerwinkle E, Shriver MD. Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium. American Journal of Human Genetics. 2001;68:198–207. doi: 10.1086/316935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics. 2012;8:e1002967. doi: 10.1371/journal.pgen.1002967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Pool JE, Nielsen R. Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics. 2009;181:711–719. doi: 10.1534/genetics.108.098095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, Beaty TH, Mathias R, Reich D, Myers S. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genetics. 2009;5:e1000519. doi: 10.1371/journal.pgen.1000519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. ABC model choice via random forests. ArXiv e-prints. 2014 doi: 10.1093/bioinformatics/btv684. [DOI] [PubMed] [Google Scholar]
  • 50.Raghavan M, Steinrucken M, Harris K, Schiffels S, Rasmussen S, DeGiorgio M, Albrechtsen A, Valdiosera C, Avila-Arcos MC, Malaspinas AS, Eriksson A, Moltke I, Metspalu M, Homburger JR, Wall J, et al. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015;349(6250):aab3884. doi: 10.1126/science.aab3884. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Reich D, Patterson N. Will admixture mapping work to find disease genes? Philosophical Transactions of the Royal Society B: Biological Sciences. 2005;360:1605–1607. doi: 10.1098/rstb.2005.1691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461:489–494. doi: 10.1038/nature08365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Richards CM, Church S, McCauley DE. The influence of population size and isolation on gene flow by pollen in Silene Alba. Evolution. 1999;53:63–73. doi: 10.1111/j.1558-5646.1999.tb05333.x. [DOI] [PubMed] [Google Scholar]
  • 54.Roberts DF, Hiorns RW. Methods of Analysis of the Genetic Composition of a Hybrid Population. Human Biology. 1965;37:38–43. [PubMed] [Google Scholar]
  • 55.Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. Genetic structure of human populations. Science. 2002;298:2381–2385. doi: 10.1126/science.1078311. [DOI] [PubMed] [Google Scholar]
  • 56.Sankararaman S, Sridhar S, Kimmel G, Halperin E. Estimating local ancestry in admixed populations. American Journal of Human Genetics. 2008;82:290–303. doi: 10.1016/j.ajhg.2007.09.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Skoglund P, Mallick S, Bortolini MC, Chennagiri N, Hunemeier T, Petzl-Erler ML, Salzano FM, Patterson N, Reich D. Genetic evidence for two founding populations of the Americas. Nature. 2015;525:104–108. doi: 10.1038/nature14895. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Smith MW, O’Brien SJ. Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nature Reviews Genetics. 2005;6:623–632. doi: 10.1038/nrg1657. [DOI] [PubMed] [Google Scholar]
  • 59.Szpiech ZA, Rosenberg NA. On the size distribution of private microsatellite alleles. Theoretical Population Biology. 2011;80:100–113. doi: 10.1016/j.tpb.2011.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Tang H, Peng J, Wang P, Risch NJ. Estimation of individual admixture: analytical and study design considerations. Genet Epidemiology. 2005;28:289–301. doi: 10.1002/gepi.20064. [DOI] [PubMed] [Google Scholar]
  • 61.Tang H, Coram M, Wang P, Zhu X, Risch N. Reconstructing genetic ancestry blocks in admixed individuals. American Journal Human Genetics. 2006;79:1–12. doi: 10.1086/504302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Tang H, Choudhry S, Mei R, Morgan M, Rodriguez-Cintron W, Burchard EG, Risch NJ. Recent genetic selection in the ancestral admixture of Puerto Ricans. American Journal of Human Genetics. 2007;81:626–633. doi: 10.1086/520769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Tavaré S, Balding DJ, Griffiths RC, Donnelly P. Inferring coalescence times from DNA sequence data. Genetics. 1997;145:505–518. doi: 10.1093/genetics/145.2.505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Verdu P, Rosenberg NA. A general mechanistic model for admixture histories of admixed populations. Genetics. 2011;189:1413–1426. doi: 10.1534/genetics.111.132787. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Verdu P, Pemberton TJ, Laurent R, Kemp BM, Gonzalez-Oliver A, Gorodezky C, Hughes CE, Shattuck MR, Petzelt B, Mitchell J, Harry H, William T, Worl R, Cybulski JS, Rosenberg NA, Malhi RS. Patterns of admixture and population structure in native populations of Northwest North America. PLoS Genetics. 2014;10:e1004530. doi: 10.1371/journal.pgen.1004530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Vergeer P, Sonderen E, Ouborg NJ. Introduction strategies put to the test: Local adaptation versus heterosis. Conservation Biology. 2004;18:812–821. [Google Scholar]
  • 67.Wang J. Maximum-likelihood estimation of admixture proportions from genetic data. Genetics. 2003;164:747–765. doi: 10.1093/genetics/164.2.747. [DOI] [PMC free article] [PubMed] [Google Scholar]

RESOURCES