2014 Oct 4;15(1):343. doi: 10.1186/1471-2105-15-343

Table 1.

Clustering of different data-sets of small, medium and large sized protein sequences using different methods

Small proteins (10–100 amino acids length)
Number of sequences - 500
Method # of clusters Threshold Word-length Time
CW 15 0.5 NA 0 m 11.835 s
k-tuple 3 0.5 2 0 m 1.539 s
CLAP 7 0.5 5 2 m 28.322 s
CLUSS 68 NA 4 0 m 11.000 s
CD-HIT 223 0.5 3 0 m 0.034 s
Small proteins (10–100 amino acids length)
Number of sequences - 1000
Method # of clusters Threshold Word-length Time
CW 23 0.5 NA 0 m 59.788 s
k-tuple 3 0.5 2 0 m 5.659 s
CLAP 17 0.5 5 9 m 52.099 s
CLUSS NA NA NA 0 m 11.000 s
CD-HIT 607 0.5 3 0 m 0.091 s
Medium proteins (400–600 amino acids length)
Number of sequences - 500
Method # of clusters Threshold Word-length Time
CW 2 0.5 NA 8 m 46.895 s
k-tuple 3 0.5 2 0 m 2.25 s
CLAP 3 0.5 5 2 m 50.918 s
CLUSS 95 NA 4 0 m 3.133 s
CD-HIT 227 0.5 3 0 m 0.592 s
Medium proteins (400–600 amino acids length)
Number of sequences - 1000
Method # of clusters Threshold Word-length Time
CW 5 0.5 NA 32 m 50.379 s
k-tuple 2 0.5 2 0 m 7.789 s
CLAP 7 0.5 5 11 m 12.664 s
CLUSS NA NA NA NA
CD-HIT 708 0.5 3 0 m 3.281 s
Large proteins (850–1000 amino acids length)
Number of sequences - 500
Method # of clusters Threshold Word-length Time
CW 15 0.5 NA 42 m 1.184 s
k-tuple 4 0.5 2 0 m 2.91 s
CLAP 4 0.5 5 4 m 22.752 s
CLUSS NA NA NA NA
CD-HIT 125 0.5 3 0 m 0.916 s

Processing times were measured on the workstation that hosts the CLAP web server: a 2.40 GHz Intel Xeon processor with 16 GB RAM, running CentOS. The number of clusters generated by each method is shown together with the threshold and word length used in the computation.
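The timings in Table 1 are written as minutes and seconds, which makes them awkward to compare directly. The following is a minimal sketch, not part of the original study, of one way to convert them to seconds and rank the methods. The helper name `to_seconds` and the regular expression are illustrative assumptions; the timing strings are copied from the medium-protein, 500-sequence rows of the table.

```python
import re

# Matches the "<minutes> m <seconds> s" timing format used in the table.
# (Hypothetical helper pattern, written for this illustration only.)
TIME_RE = re.compile(r"^\s*(\d+)\s*m\s*(\d+(?:\.\d+)?)\s*s\s*$")


def to_seconds(timing):
    """Convert a string such as '8 m 46.895 s' to seconds; return None for 'NA'."""
    if timing.strip().upper() == "NA":
        return None
    match = TIME_RE.match(timing)
    if match is None:
        raise ValueError(f"unrecognised timing: {timing!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the medium-protein, 500-sequence rows of Table 1.
medium_500 = {
    "CW": "8 m 46.895 s",
    "k-tuple": "0 m 2.25 s",
    "CLAP": "2 m 50.918 s",
    "CLUSS": "0 m 3.133 s",
    "CD-HIT": "0 m 0.592 s",
}

seconds = {method: to_seconds(t) for method, t in medium_500.items()}

# Print the methods from fastest to slowest.
for method, value in sorted(seconds.items(), key=lambda kv: kv[1]):
    print(f"{method:8s} {value:9.3f} s")
```

Entries reported as NA map to `None`, so they would need to be filtered out before sorting rows that contain them.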