Table 2.
Indian Cline group | Samples | Z-score from 3 Population Test for mixture | % ANI ancestry | ±1 standard error | Genetic drift D from the best-fitting combination of ANI and ASI * | Wright’s fixation index F (estimates inbreeding) † | Estimated fraction of recessive diseases due to founder events †† |
---|---|---|---|---|---|---|---|
Mala | 3 | -2.5 | 38.8% | 1.2% | 0.0023 | 0 | 100% |
Madiga | 4 | -2.7 | 40.6% | 1.2% | 0.0018 | 0.0061 | 23% |
Chenchu | 6 | 31.3 (not significant) | 40.7% | 1.3% | 0.0492 | 0 | 100% |
Bhil | 7 | -10.6 | 42.9% | 1.1% | 0.0024 | 0 | 100% |
Satnami | 3 | -5.6 | 43.0% | 1.3% | 0.0019 | 0 | 100% |
Kurumba | 6 | -12.6 | 43.2% | 1.1% | 0.0001 | 0.0052 | 2% |
Kamsali | 3 | -6.5 | 44.5% | 1.3% | 0.0016 | 0.0066 | 19% |
Vysya | 5 | 5.4 (not significant) | 46.2% | 1.2% | 0.0083 | 0.0071 | 54% |
Lodi | 5 | -8.9 | 49.9% | 1.1% | 0.0027 | 0.0056 | 32% |
Naidu | 4 | -3.3 | 50.1% | 1.2% | 0.0022 | 0.0435 | 5% |
Tharu | 5 | -20.6 | 51.0% | 1.2% | 0.0000 | 0 | na |
Velama | 4 | -3.2 | 54.7% | 1.3% | 0.0044 | 0.0197 | 18% |
Srivastava | 2 | -7.5 | 56.4% | 1.5% | 0.0023 | 0 | 100% |
Meghawal | 5 | -13.3 | 60.3% | 1.2% | 0.0035 | 0 | 100% |
Vaish | 4 | -22.0 | 62.6% | 1.2% | 0.0012 | 0 | 100% |
Kashmiri Pandit | 5 | -20.6 | 70.6% | 1.2% | 0.0019 | 0 | 100% |
Sindhi | 10 | -26.3 | 73.7% | 1.1% | 0.0008 | 0.0043 | 16% |
Pathan | 15 | -34.3 | 76.9% | 1.1% | 0.0001 | 0.0039 | 3% |
* Estimates of genetic drift (the variance in allele frequencies on any lineage) are based on a model in which each group is a simple mixture of ANI and ASI, followed by subsequent genetic drift specific to that group (corrected for inbreeding). To fit the model, we use the algorithm described in Note S4, and fit f2, f3 and f4 statistics that are calculated in a way that is unbiased by inbreeding (Appendix).
† Wright’s fixation index F is estimated as the excess rate at which the two copies of a chromosome within an individual from a group are identical by state, compared with the rate across individuals from that group (Appendix). We set negative values to 0; standard errors are typically around 0.003. Because the sample sizes are small, these estimates are heavily influenced by which samples happen to have been included in our analysis, and should therefore be considered approximate.
†† To estimate the proportion of recessive disease cases that are due to founder events, we consider the two alleles that a single individual carries at any locus. With probability F, given by Wright’s fixation index, they coalesce in the last few generations due to consanguinity, and with probability (1-F)D they coalesce more recently than the ANI-ASI mixture due to founder events specific to that group. The fraction of recessive diseases due to founder events can thus be estimated as D(1-F)/(F+D(1-F)).
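
As an illustration of the last column, the formula D(1-F)/(F+D(1-F)) can be evaluated directly from the D and F columns of the table. The following is a minimal sketch, not the authors' code: the function name `founder_fraction` and the choice to return `None` when both D and F are 0 (the "na" entry for Tharu) are assumptions made for this example. The input values are copied from the rounded table entries, so results can differ slightly from the published percentages.

```python
from typing import Optional


def founder_fraction(drift_d: float, inbreeding_f: float) -> Optional[float]:
    """Fraction of recessive disease attributed to founder events.

    Implements D(1-F) / (F + D(1-F)) from the table footnote.
    Returns None when the denominator is zero (D = 0 and F = 0),
    which is an assumed way of representing the table's "na".
    """
    founder = drift_d * (1.0 - inbreeding_f)  # coalescence due to founder events
    total = inbreeding_f + founder            # consanguinity plus founder events
    if total == 0:
        return None
    return founder / total


# (D, F) pairs taken from the rounded values in Table 2.
rows = {
    "Madiga": (0.0018, 0.0061),
    "Vysya": (0.0083, 0.0071),
    "Mala": (0.0023, 0.0),
    "Tharu": (0.0, 0.0),
}

for group, (d, f) in rows.items():
    frac = founder_fraction(d, f)
    print(group, "na" if frac is None else f"{frac:.0%}")
```

With these inputs the sketch gives 23% for Madiga, 54% for Vysya, 100% for Mala and "na" for Tharu, matching the table. Whenever F was set to 0 and D is positive, the estimate is necessarily 100%, which explains the many 100% entries.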