Table 1.
Number of clusters
Tool and mode | 0.99 | 0.97 | 0.95 |
---|---|---|---|
DNACLUST exact | 233879 | 73726 | 28241 |
DNACLUST inexact | 240125 | 76391 | 28661 |
UCLUST exact | 144339 | 48418 | 20039 |
UCLUST inexact | 253108 | 71361 | 26685 |
CD-HIT | 245851 | 100280 | 55208 |
The number of clusters produced by DNACLUST, UCLUST and CD-HIT at various identity/similarity thresholds on the twins dataset. Because each tool uses a slightly different distance measure, the number of clusters cannot be compared directly between tools. (Specifically, the identity measure used by UCLUST and CD-HIT underestimates the distance between two sequences compared with the distance computed by the similarity measure used by DNACLUST.) Instead, we compare the change in the number of clusters when switching between the exact and inexact modes of each tool; a smaller change indicates better performance.
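The exact-versus-inexact comparison described in the caption can be made concrete with a short calculation. The sketch below is illustrative only: it assumes "change" means the absolute difference in cluster count divided by the exact-mode count, a definition the caption does not state, and the names `counts`, `relative_change` and `changes` are hypothetical. CD-HIT is omitted because the table lists only one mode for it.

```python
# Cluster counts copied from Table 1, keyed by tool, mode, and threshold.
counts = {
    "DNACLUST": {
        "exact": {0.99: 233879, 0.97: 73726, 0.95: 28241},
        "inexact": {0.99: 240125, 0.97: 76391, 0.95: 28661},
    },
    "UCLUST": {
        "exact": {0.99: 144339, 0.97: 48418, 0.95: 20039},
        "inexact": {0.99: 253108, 0.97: 71361, 0.95: 26685},
    },
}


def relative_change(exact: int, inexact: int) -> float:
    """Absolute change in cluster count as a fraction of the exact-mode count.

    Assumption: the table caption does not define "change" precisely;
    normalizing by the exact-mode count is one reasonable choice.
    """
    return abs(inexact - exact) / exact


# Relative change per tool and threshold.
changes = {
    tool: {
        threshold: relative_change(modes["exact"][threshold],
                                   modes["inexact"][threshold])
        for threshold in modes["exact"]
    }
    for tool, modes in counts.items()
}

for tool, per_threshold in changes.items():
    for threshold, change in per_threshold.items():
        print(f"{tool} at {threshold}: {change:.1%}")
```

Under this assumed definition, the change for DNACLUST stays below 4% at every threshold, while the change for UCLUST ranges from roughly 33% to 75%.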