Abstract
When DNA or protein sequences are compared between species, between paralogues, or among individuals within a species or population, different regions of the sequence often appear to be divergent or polymorphic to different degrees, suggesting that differential constraint or diversifying selection is operating in different regions. The problem is to test statistically whether the observed regional differences in the density of variant sites are real and then to estimate as accurately as possible the location of the differential regions. A method is given for testing and locating regions of differential variation. The method consists of calculating G(x(k)) = k/n - x(k)/N, where x(k) is the position of the kth variant site along the sequence, n is the total number of variant sites, and N is the total sequence length. The estimated region is the longest stretch of adjacent sequence for which G(x(k)) is monotonically increasing (a hot spot) or decreasing (a cold spot). Critical values of this length for tests of significance are given, a sequential method is developed for locating multiple differential regions, and the power of the method against various alternatives is explored. The method locates the endpoints of hot spots and cold spots of variation with high accuracy.
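The statistic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy positions, and the choice to measure a run by its number of consecutive variant sites are assumptions made here for illustration, and the critical values needed for a significance test are not reproduced because the abstract does not state them.

```python
from typing import List, Tuple


def g_statistic(positions: List[int], seq_len: int) -> List[float]:
    """Compute G(x(k)) = k/n - x(k)/N for k = 1..n.

    positions: sorted 1-based positions x(k) of the variant sites.
    seq_len:   total sequence length N.
    """
    n = len(positions)
    return [k / n - x / seq_len for k, x in enumerate(positions, start=1)]


def longest_monotone_run(g: List[float], increasing: bool) -> Tuple[int, int]:
    """Return 0-based inclusive (start, end) indices of the longest strictly
    increasing (or strictly decreasing) run of consecutive values in g.

    Assumption: run length is counted in variant sites, not in bases.
    """
    best = (0, 0)
    start = 0
    for i in range(1, len(g)):
        step_ok = g[i] > g[i - 1] if increasing else g[i] < g[i - 1]
        if not step_ok:
            start = i  # the run is broken; a new run begins at this site
        if i - start > best[1] - best[0]:
            best = (start, i)
    return best


# Hypothetical toy data: 7 variant sites in a sequence of length 100,
# with a dense cluster of sites between positions 50 and 60.
positions = [10, 50, 52, 55, 57, 60, 90]
g = g_statistic(positions, seq_len=100)

hot = longest_monotone_run(g, increasing=True)
cold = longest_monotone_run(g, increasing=False)
print("candidate hot spot:", positions[hot[0]], "to", positions[hot[1]])
print("candidate cold spot:", positions[cold[0]], "to", positions[cold[1]])
```

G rises between two consecutive variant sites exactly when the gap between them is smaller than the average spacing N/n, and falls when the gap is larger, so an increasing run marks sites packed more densely than average. In the toy data the increasing run spans positions 50 to 60, where the gaps (2, 3, 2, 3) are well below N/n ≈ 14.3.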
Selected References
- Goss P. J., Lewontin R. C. Detecting heterogeneity of substitution along DNA and protein sequences. Genetics. 1996 May;143(1):589–602. doi: 10.1093/genetics/143.1.589.
- Richter B., Long M., Lewontin R. C., Nitasaka E. Nucleotide variation and conservation at the dpp locus, a gene controlling early development in Drosophila. Genetics. 1997 Feb;145(2):311–323. doi: 10.1093/genetics/145.2.311.