2014 Mar 29;14:67. doi: 10.1186/1471-2148-14-67

Figure 2.

Schematic for creating the four subsets Ds, Ds,0, Dp, and Dp,0 from the full starting dataset. In the matrices for the full starting dataset, Ds, Ds,0, Dp, and Dp,0 (see Table 2), each row is an individual and each column is a locus. Thick black lines separate the individuals of different species, and gray boxes indicate missing sequences. (A) At each locus, a single sequence from each species (indicated in red) is selected from the full starting dataset. These selected sequences form Ds, which therefore contains a single sequence per species at each locus. Sequences from a subset of loci in Ds (indicated in yellow) form Ds,0, in which each locus has at least one nucleotide difference between each distinct pair of species, other than pairs of distinct outgroups. (B) Dataset Dp is the full starting dataset. At each locus, a distance matrix is computed according to eq. 2. Sequences from a subset of loci in Dp (indicated in red) form Dp,0, in which each locus has a nonzero p-distance between each distinct pair of species, other than pairs of distinct outgroups. Observe that Dp,0 includes loci 3 and 7, which are not included in Ds,0. At these loci, each pair of species contains at least one pair of individuals with different sequences, so the loci qualify for Dp,0, whereas at least one pair of the 11 selected individuals has identical sequences, so the loci are excluded from Ds,0. More generally, any locus at which the selected sequences of two species differ also has a nonzero p-distance between those species. Therefore, the set of loci in Dp,0 is a superset of the set of loci in Ds,0, and the number of loci in Dp,0 is always greater than or equal to the number of loci in Ds,0.
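
The construction in the caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data layout (locus, then species, then a list of aligned sequences), the function names, and the toy sequences are all hypothetical. Because eq. 2 is not reproduced here, `p_distance` assumes the mean proportion of differing sites over all between-species sequence pairs. The sketch also assumes that the first available sequence is the one selected for Ds, and that a species pair with no data fails the check.

```python
from itertools import combinations


def p_distance(seqs_a, seqs_b):
    """Mean proportion of differing sites over all between-species sequence pairs.

    Assumed stand-in for eq. 2; returns None when either species has no data.
    """
    if not seqs_a or not seqs_b:
        return None
    per_pair = [
        sum(x != y for x, y in zip(a, b)) / len(a)
        for a in seqs_a
        for b in seqs_b
    ]
    return sum(per_pair) / len(per_pair)


def locus_passes(by_species, outgroups):
    """True if every checked species pair has a nonzero distance at this locus."""
    for sp1, sp2 in combinations(sorted(by_species), 2):
        if sp1 in outgroups and sp2 in outgroups:
            continue  # pairs of distinct outgroups are not checked
        d = p_distance(by_species[sp1], by_species[sp2])
        if d is None or d == 0:
            return False
    return True


def build_subsets(full, outgroups):
    """Return (Ds, Ds,0, Dp, Dp,0) as dicts: locus -> species -> sequences."""
    d_p = full  # Dp is the full starting dataset
    # Ds keeps one sequence per species per locus (assumption: the first one).
    d_s = {
        locus: {sp: seqs[:1] for sp, seqs in by_species.items()}
        for locus, by_species in full.items()
    }
    # With a single sequence per species, a nonzero distance is the same
    # as at least one nucleotide difference, so one filter serves both.
    d_s0 = {loc: v for loc, v in d_s.items() if locus_passes(v, outgroups)}
    d_p0 = {loc: v for loc, v in d_p.items() if locus_passes(v, outgroups)}
    return d_s, d_s0, d_p, d_p0


# Hypothetical toy data: two ingroup species (A, B) and two outgroups (O1, O2).
full = {
    "locus1": {"A": ["AAAA", "AAAT"], "B": ["AACA"], "O1": ["GGGG"], "O2": ["GGGG"]},
    "locus2": {"A": ["AAAA", "AAAT"], "B": ["AAAA"], "O1": ["GGGG"], "O2": ["GGGG"]},
    "locus3": {"A": ["AAAA"], "B": ["AAAA"], "O1": ["GGGG"], "O2": ["GGGT"]},
}
d_s, d_s0, d_p, d_p0 = build_subsets(full, outgroups={"O1", "O2"})
print(sorted(d_s0))  # ['locus1']
print(sorted(d_p0))  # ['locus1', 'locus2']
```

In this toy example, `locus2` plays the role of loci 3 and 7 in the figure: the selected sequences of A and B are identical, so the locus is dropped from Ds,0, but a second individual in A differs from B, so the p-distance is nonzero and the locus is kept in Dp,0.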