Abstract
Two commonly used measures of genetic diversity for intraspecies DNA sequence data are based, respectively, on the number of segregating sites and on the average number of pairwise nucleotide differences. Expressions are derived for their variance in the presence of intragenic recombination for a panmictic population of fixed size that is at neutral equilibrium in the region sequenced. We show that, in contrast to the slow decrease in variance with increasing sample size, if the recombination rate is nonzero, the asymptotic rate of decrease of variance with increasing sequence length, for fixed sample size, is quite rapid. In particular, it is close to that which would be obtained by sequencing independent chromosome regions. The correlation between measures of diversity from linked regions is also examined. For a given total number of bases sequenced in a particular region, optimal sequencing strategies are derived. These typically involve sequencing relatively few (three to ten) long copies of the region. Under optimal strategies, the variances of the two measures are very similar for most parameter values considered. Results concerning optimal sequencing strategies will be sensitive to gross departures from the underlying assumptions, such as population bottlenecks, selective sweeps, and substantial population substructure.
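To make the two measures concrete, the following minimal Python sketch computes them for a small aligned sample. It assumes that the measure based on segregating sites is the usual estimator S / a_n, with a_n = 1 + 1/2 + ... + 1/(n-1), and that the second is the mean number of differences over all pairs of sequences; the abstract does not give the formulas, so both are assumptions here. The function names and the toy sample are hypothetical, and the sketch does not reproduce the variance expressions derived in the paper.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which at least two sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def theta_from_segregating_sites(seqs):
    """Per-region diversity estimate S / a_n (assumed estimator form)."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))  # 1 + 1/2 + ... + 1/(n-1)
    return segregating_sites(seqs) / a_n


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


# Hypothetical toy sample: four aligned copies of an 8-base region.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]

print(segregating_sites(sample))
print(theta_from_segregating_sites(sample))
print(mean_pairwise_differences(sample))
```

Both functions return values for the whole region; dividing by the sequence length would give per-site values. The sequences must be aligned and of equal length, since `zip` silently truncates to the shortest one.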