. 2011 Jun 30;12:271. doi: 10.1186/1471-2105-12-271

Table 1.

Number of clusters, by identity/similarity threshold

Tool and mode        0.99     0.97     0.95
DNACLUST exact     233879    73726    28241
DNACLUST inexact   240125    76391    28661
UCLUST exact       144339    48418    20039
UCLUST inexact     253108    71361    26685
CD-HIT             245851   100280    55208

The number of clusters produced by DNACLUST, UCLUST and CD-HIT at various identity/similarity thresholds on the twins dataset. Because each tool uses a slightly different distance measure, the cluster counts cannot be compared directly between tools: the identity measure used by UCLUST and CD-HIT underestimates the distance between two sequences relative to the distance computed by the similarity measure used by DNACLUST. Instead, we compare the change in the number of clusters when switching between the exact and inexact modes of each tool; a smaller change indicates better performance.
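
The comparison described in the caption can be sketched in a few lines of Python. This is an illustrative calculation, not the authors' own procedure: the caption does not say how the change is normalized, so the sketch assumes the absolute difference divided by the exact-mode count. The names COUNTS, relative_change and mode_changes are hypothetical. CD-HIT is left out because the table gives only one row for it.

```python
# Cluster counts copied from Table 1: tool -> mode -> threshold -> count.
COUNTS = {
    "DNACLUST": {
        "exact": {0.99: 233879, 0.97: 73726, 0.95: 28241},
        "inexact": {0.99: 240125, 0.97: 76391, 0.95: 28661},
    },
    "UCLUST": {
        "exact": {0.99: 144339, 0.97: 48418, 0.95: 20039},
        "inexact": {0.99: 253108, 0.97: 71361, 0.95: 26685},
    },
}


def relative_change(exact, inexact):
    """Absolute change in cluster count as a fraction of the exact-mode count.

    Normalizing by the exact-mode count is an assumption made for this sketch.
    """
    return abs(inexact - exact) / exact


def mode_changes(counts):
    """Return tool -> threshold -> relative change between exact and inexact modes."""
    return {
        tool: {
            threshold: relative_change(modes["exact"][threshold],
                                       modes["inexact"][threshold])
            for threshold in modes["exact"]
        }
        for tool, modes in counts.items()
    }


CHANGES = mode_changes(COUNTS)

for tool, by_threshold in CHANGES.items():
    for threshold, change in by_threshold.items():
        print(f"{tool} at {threshold:.2f}: {change:.1%} change")
```

Under this assumed normalization, the Table 1 values give a change below 4% for DNACLUST at every threshold, against roughly 33% to 75% for UCLUST.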
