Table 1. Behavior of f- and D-statistics for simulated scenarios of admixture.
Scenario | Fst(C, B) | Fst(O, B) | D(A, B; C, O) | D(A, X; C, O) | f3(B; A, C) | f3(X; A, C) | f4-ratio |
---|---|---|---|---|---|---|---|
Baseline | 0.10 | 0.14 | 0.00 | −0.08 | 0.002 | −0.005 | 0.47 |
Vary sample size | |||||||
n = 2 from each population | 0.10 | 0.14 | 0.00 | −0.08 | 0.002 | −0.005 | 0.47 |
Vary SNP ascertainment | |||||||
Use all sites (full sequencing data) | 0.10 | 0.13 | 0.00 | −0.11 | 0.001 | −0.002 | 0.47 |
Polymorphic in a single B individual | 0.10 | 0.16 | −0.01 | −0.06 | 0.003 | −0.006 | 0.47 |
Polymorphic in a single C individual | 0.10 | 0.16 | 0.00 | −0.13 | 0.003 | −0.007 | 0.46 |
Polymorphic in a single X individual | 0.11 | 0.16 | 0.00 | −0.11 | 0.003 | −0.007 | 0.49 |
Polymorphic in two individuals: B and O | 0.10 | 0.16 | −0.01 | −0.08 | 0.002 | −0.005 | 0.46 |
Vary demography | |||||||
NA = 2,000 (vs. 50,000) pop A bottleneck | 0.10 | 0.14 | 0.00 | −0.08 | 0.002 | −0.005 | 0.48 |
NB = 2,000 (vs. 12,000) pop B bottleneck | 0.14 | 0.17 | 0.00 | −0.08 | 0.011 | −0.004 | 0.48 |
NC = 1,000 (vs. 25,000) pop C bottleneck | 0.16 | 0.14 | 0.00 | −0.08 | 0.002 | −0.005 | 0.46 |
NX = 500 (vs. 10,000) pop X bottleneck | 0.10 | 0.14 | 0.00 | −0.08 | 0.002 | 0.004 | 0.47 |
NABB′ = 3,000 (vs. 7,000) ABB′ bottleneck | 0.14 | 0.17 | 0.00 | −0.09 | 0.002 | −0.007 | 0.47 |
We carried out simulations for populations related according to Figure 4 using ms (Hudson 2002) with the command: ./ms 110 1000000 -t 1 -I 5 22 22 22 22 22 -n 1 8.0 -n 2 2.5 -n 3 5.0 -n 4 1.2 -n 5 1.0 -es 0.001 5 0.47 -en 0.001001 6 1.0 -ej 0.0060 5 4 -ej 0.007 6 2 -en 0.007001 2 0.33 -ej 0.01 4 3 -en 0.01001 3 0.7 -ej 0.03 3 2 -en 0.030001 2 0.25 -ej 0.06 2 1 -en 0.060001 1 1.0. We chose parameters to produce pairwise FST similar to that for A = Adygei, B = French, X = Uygur, C = Han, and O = Yoruba. The baseline simulations correspond to n = 20 samples from each population, SNPs ascertained as heterozygous in a single individual from the outgroup O, and a mixture proportion of α = 0.47. Times are in generations, with the subscript indicating the populations derived from the split: tadmix = 40, tBB′ = 240, tCC′ = 280, tABB′ = 400, tABB′CC′ = 1,200, tO = 2,400. The diploid population sizes are indicated by a subscript corresponding to the population to which they are ancestral in Figure 4: NA = 50,000, NB = 12,000, NB′ = 10,000, NBB′ = 12,000, NC = 25,000, NX = NC′ = 10,000, NCC′ = 3,300, NO = 80,000, NABB′ = 7,000, NABB′CC′ = 2,500, NABB′CC′O = 10,000. All simulations involved 10⁶ replicates except for the run involving 2 samples (a single heterozygous individual) from each population, where we increased this to 10⁷ replicates to accommodate the noisier results.
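The scaled values in the ms command can be checked against the generation times and population sizes quoted in the legend. The following is a minimal sketch, not part of the original analysis. It assumes a reference size of N0 = 10,000 diploid individuals, which the legend does not state but which is the value that makes "-n 5 1.0" correspond to NX = 10,000. ms expresses times in units of 4·N0 generations and sizes as multiples of N0. The dictionary keys and helper names are illustrative only.

```python
# Assumed reference size (hypothetical: inferred, not stated in the legend).
N0 = 10_000

def ms_time_to_generations(t_scaled, n0=N0):
    """ms times are in units of 4*N0 generations."""
    return round(t_scaled * 4 * n0)

def ms_size_to_diploid(x_scaled, n0=N0):
    """ms sizes (-n, -en) are multiples of N0."""
    return round(x_scaled * n0)

# Event times taken from the -es / -ej flags of the command.
scaled_times = {
    "t_admix": 0.001,    # -es 0.001 5 0.47
    "t_BB'": 0.006,      # -ej 0.0060 5 4
    "t_CC'": 0.007,      # -ej 0.007 6 2
    "t_ABB'": 0.01,      # -ej 0.01 4 3
    "t_ABB'CC'": 0.03,   # -ej 0.03 3 2
    "t_O": 0.06,         # -ej 0.06 2 1
}

# Present-day sizes from the -n flags (ms populations 1..5 = O, C, A, B, X).
scaled_sizes = {"N_O": 8.0, "N_C": 2.5, "N_A": 5.0, "N_B": 1.2, "N_X": 1.0}

generations = {k: ms_time_to_generations(v) for k, v in scaled_times.items()}
diploid_sizes = {k: ms_size_to_diploid(v) for k, v in scaled_sizes.items()}

if __name__ == "__main__":
    for name, value in {**generations, **diploid_sizes}.items():
        print(f"{name} = {value:,}")
```

Under this assumption the converted values agree with the legend: for example, the admixture event at scaled time 0.001 falls at 40 generations, and "-n 4 1.2" gives NB = 12,000.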