2014 Mar 29;14:67. doi: 10.1186/1471-2148-14-67

Copyright © 2014 DeGiorgio et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

Schematic for creating the four subsets $D_{s}$, $D_{s,0}$, $D_{p}$, and $D_{p,0}$ from the full dataset. In the matrices of the full dataset and of $D_{s}$, $D_{s,0}$, $D_{p}$, and $D_{p,0}$ (see Table 2), each row is an individual and each column is a locus. Thick black lines separate individuals from different species, and gray boxes indicate missing sequences. (A) At each locus, a single sequence from each species (indicated in red) is selected from the full dataset. These selected sequences form $D_{s}$, so that exactly one sequence is sampled per species at each locus. Sequences from a subset of loci in $D_{s}$ (indicated in yellow) form $D_{s,0}$, chosen so that at each retained locus every distinct pair of species, other than pairs of two distinct outgroups, differs at one or more nucleotides. (B) Dataset $D_{p}$ is the full starting dataset. At each locus ℓ, a distance matrix is computed according to eq. 2. Sequences from a subset of loci in $D_{p}$ (indicated in red) form $D_{p,0}$, chosen so that at each retained locus the p-distance between every distinct pair of species, other than pairs of two distinct outgroups, is nonzero. Note that the $D_{p,0}$ matrix includes loci 3 and 7, which are absent from the $D_{s,0}$ matrix. These loci enter $D_{p,0}$ but not $D_{s,0}$ because, in $D_{p}$, every pair of species contains at least one pair of individuals with different sequences, whereas in $D_{s}$ at least one pair of the 11 selected individuals has identical sequences. Conversely, if the selected sequences differ for every pair of species at a locus, that locus also has a nonzero p-distance for every pair of species. Therefore, the set of loci in $D_{p,0}$ is a superset of the set of loci in $D_{s,0}$, and $D_{p,0}$ always contains at least as many loci as $D_{s,0}$.
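
The filtering rules in this caption can be illustrated with a short Python sketch. It is not the authors' code: the data layout, the random choice of the single retained sequence, the use of a mean pairwise p-distance as a stand-in for eq. 2, and the skipping of species pairs with a missing sequence are all assumptions, and every function name (`make_ds`, `filter_loci`, and so on) is hypothetical. With one sequence per species, a nonzero distance is equivalent to at least one nucleotide difference, so a single filter serves for both $D_{s,0}$ and $D_{p,0}$.

```python
import random
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple

# Assumed layout (hypothetical): data[locus][species] -> list of aligned
# sequences, one per sampled individual; [] marks a missing sequence.
Dataset = Dict[str, Dict[str, List[str]]]


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def species_distance(seqs1: List[str], seqs2: List[str]) -> float:
    """Assumed stand-in for eq. 2: mean p-distance over all
    cross-species pairs of individuals."""
    dists = [p_distance(a, b) for a in seqs1 for b in seqs2]
    return sum(dists) / len(dists)


def checked_pairs(present: List[str], outgroups: Set[str]) -> Iterator[Tuple[str, str]]:
    """Distinct species pairs, skipping pairs of two different outgroups."""
    for s, t in combinations(sorted(present), 2):
        if not (s in outgroups and t in outgroups):
            yield s, t


def locus_passes(by_species: Dict[str, List[str]], outgroups: Set[str]) -> bool:
    """True if every checked species pair has a nonzero distance.
    Species missing at the locus are skipped (an assumption)."""
    present = [sp for sp, seqs in by_species.items() if seqs]
    return all(species_distance(by_species[s], by_species[t]) > 0
               for s, t in checked_pairs(present, outgroups))


def make_ds(data: Dataset, rng: random.Random) -> Dataset:
    """D_s: keep one randomly chosen sequence per species at each locus."""
    return {locus: {sp: [rng.choice(seqs)] if seqs else []
                    for sp, seqs in by_species.items()}
            for locus, by_species in data.items()}


def filter_loci(data: Dataset, outgroups: Set[str]) -> Dataset:
    """Keep only the loci that pass the nonzero-distance check."""
    return {locus: by_species for locus, by_species in data.items()
            if locus_passes(by_species, outgroups)}


# Usage with a toy two-locus dataset; species "C" is the only outgroup.
data = {
    "locus1": {"A": ["ACGT", "ACGA"], "B": ["ACGT", "TCGT"], "C": ["GGGT"]},
    "locus2": {"A": ["ACGT"], "B": ["ACGT"], "C": []},
}
outgroups = {"C"}
d_s = make_ds(data, random.Random(1))
d_s0 = filter_loci(d_s, outgroups)   # D_s,0
d_p0 = filter_loci(data, outgroups)  # D_p,0 (D_p is the full dataset)
assert set(d_s0) <= set(d_p0)
```

A locus kept in $D_{s,0}$ has a differing selected pair for every checked species pair, so the mean over all pairs of individuals is positive as well; under these assumptions the final assertion holds whichever sequences `make_ds` happens to pick.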