Table 2.
Effects of hit fraction threshold on cluster assembly. Bold indicates the threshold chosen for the current study.
Hit fraction (a) | Clusters (b) | Singletons (c) | Phylogenetically informative clusters (d) | Max size (e) | TCs in phylogenetically informative clusters (f) |
0.0 | 39924 | 26782 | 4423 | 6565 | 54051 |
0.1 | 47798 | 32824 | 4079 | 1947 | 42406 |
0.2 | 57229 | 41327 | 3324 | 1362 | 29403 |
0.3 | 64691 | 48864 | 2561 | 330 | 21504 |
0.4 | 71333 | 56383 | 1876 | 117 | 15457 |
0.5 | 77564 | 63890 | 1340 | 98 | 10721 |
0.6 | 83435 | 71539 | 897 | 95 | 7105 |
0.7 | 88864 | 79122 | 577 | 94 | 4536 |
0.8 | 94296 | 87186 | 324 | 92 | 2529 |
0.9 | 99843 | 95975 | 103 | 89 | 872 |
1.0 | 105144 | 104860 | 1 | 6 | 6 |
a Minimum proportion of sequence similarity, based on pairwise BLAST comparisons, required to link two sequences. Linked sequences are placed in the same cluster, so the hit fraction affects both the level of heterogeneity within clusters and the number of assembled clusters. The original data set contains 105,453 tentative consensus sequences (TCs).
b Total number of assembled clusters.
c Number of single-sequence clusters.
d Phylogenetically informative clusters for this study are those that include at least three species and at least four sequences.
e Number of tentative consensus sequences (TCs) in the largest phylogenetically informative cluster.
f Total TCs in all phylogenetically informative clusters.
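
The column definitions in the footnotes can be illustrated with a minimal sketch. It assumes that linking is transitive (single linkage) and that a pair is linked when its hit fraction is at least the threshold; the source does not state either detail explicitly. The function names, the `pair_scores` mapping, and the `species_of` lookup are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict


def cluster_by_hit_fraction(seq_ids, pair_scores, threshold):
    """Group sequences by linking every pair whose hit fraction meets the threshold.

    Assumption: links are transitive, so clusters are connected components.
    pair_scores maps (seq_a, seq_b) -> hit fraction for pairs with a BLAST hit.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), fraction in pair_scores.items():
        if fraction >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    return list(clusters.values())


def summarize(clusters, species_of):
    """Compute the table columns (b) through (f) for one clustering."""
    # Footnote d: at least three species and at least four sequences.
    informative = [
        c for c in clusters
        if len(c) >= 4 and len({species_of[s] for s in c}) >= 3
    ]
    return {
        "clusters": len(clusters),
        "singletons": sum(1 for c in clusters if len(c) == 1),
        "informative": len(informative),
        "max_size": max((len(c) for c in informative), default=0),
        "tcs_in_informative": sum(len(c) for c in informative),
    }


# Toy data (invented for illustration only).
seq_ids = ["t1", "t2", "t3", "t4", "t5", "t6"]
species_of = {"t1": "A", "t2": "B", "t3": "C", "t4": "A", "t5": "B", "t6": "C"}
pair_scores = {
    ("t1", "t2"): 0.9,
    ("t2", "t3"): 0.6,
    ("t3", "t4"): 0.6,
    ("t5", "t6"): 0.3,
}

loose = summarize(cluster_by_hit_fraction(seq_ids, pair_scores, 0.5), species_of)
strict = summarize(cluster_by_hit_fraction(seq_ids, pair_scores, 0.8), species_of)
```

On this toy input, raising the threshold from 0.5 to 0.8 splits the one informative cluster into smaller pieces, which mirrors the trend in the table: more clusters and singletons, fewer phylogenetically informative clusters.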