Table 1.
Small proteins (10–100 amino acids length) | ||||
Number of sequences - 500 | ||||
Method | # of clusters | Threshold | Word-length | Time |
CW | 15 | 0.5 | NA | 0 m 11.835 s |
k-tuple | 3 | 0.5 | 2 | 0 m 1.539 s |
CLAP | 7 | 0.5 | 5 | 2 m 28.322 s |
CLUSS | 68 | NA | 4 | 0 m 11.000 s |
CD-HIT | 223 | 0.5 | 3 | 0 m 0.034 s |
Small proteins (10–100 amino acids length) | ||||
Number of sequences - 1000 | ||||
Method | # of clusters | Threshold | Word-length | Time |
CW | 23 | 0.5 | NA | 0 m 59.788 s |
k-tuple | 3 | 0.5 | 2 | 0 m 5.659 s |
CLAP | 17 | 0.5 | 5 | 9 m 52.099 s |
CLUSS | NA | NA | NA | 0 m 11.000 s |
CD-HIT | 607 | 0.5 | 3 | 0 m 0.091 s |
Medium proteins (400–600 amino acids length) | ||||
Number of sequences - 500 | ||||
Method | # of clusters | Threshold | Word-length | Time |
CW | 2 | 0.5 | NA | 8 m 46.895 s |
k-tuple | 3 | 0.5 | 2 | 0 m 2.25 s |
CLAP | 3 | 0.5 | 5 | 2 m 50.918 s |
CLUSS | 95 | NA | 4 | 0 m 3.133 s |
CD-HIT | 227 | 0.5 | 3 | 0 m 0.592 s |
Medium proteins (400–600 amino acids length) | ||||
Number of sequences - 1000 | ||||
Method | # of clusters | Threshold | Word-length | Time |
CW | 5 | 0.5 | NA | 32 m 50.379 s |
k-tuple | 2 | 0.5 | 2 | 0 m 7.789 s |
CLAP | 7 | 0.5 | 5 | 11 m 12.664 s |
CLUSS | NA | NA | NA | NA |
CD-HIT | 708 | 0.5 | 3 | 0 m 3.281 s |
Large proteins (850–1000 amino acids length) | ||||
Number of sequences - 500 | ||||
Method | # of clusters | Threshold | Word-length | Time |
CW | 15 | 0.5 | NA | 42 m 1.184 s |
k-tuple | 4 | 0.5 | 2 | 0 m 2.91 s |
CLAP | 4 | 0.5 | 5 | 4 m 22.752 s |
CLUSS | NA | NA | NA | NA |
CD-HIT | 125 | 0.5 | 3 | 0 m 0.916 s |
Processing times were measured on the workstation that hosts the CLAP web server: a 2.40 GHz Intel Xeon processor with 16 GB RAM, running CentOS. For each method, the table also shows the number of clusters generated at the given threshold and the word length used in the computation.
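The timings in Table 1 are written as minutes plus seconds, which makes them awkward to compare directly. The following is a minimal sketch, not part of the original benchmark, of how they could be converted to seconds. The helper name `to_seconds` and the dictionary layout are illustrative assumptions. The values are copied from the first block of the table (small proteins, 500 sequences).

```python
import re


def to_seconds(timing: str) -> float:
    """Convert a timing written as '<minutes> m <seconds> s' into seconds.

    Hypothetical helper for illustration; it assumes the 'M m S s' layout
    used in Table 1 and rejects anything else (for example 'NA').
    """
    match = re.fullmatch(r"\s*(\d+)\s*m\s*(\d+(?:\.\d+)?)\s*s\s*", timing)
    if match is None:
        raise ValueError(f"unrecognised timing: {timing!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from Table 1: small proteins, 500 sequences.
timings = {
    "CW": "0 m 11.835 s",
    "k-tuple": "0 m 1.539 s",
    "CLAP": "2 m 28.322 s",
    "CLUSS": "0 m 11.000 s",
    "CD-HIT": "0 m 0.034 s",
}

# Map each method to its run time in seconds.
seconds = {method: to_seconds(value) for method, value in timings.items()}

# List the methods from fastest to slowest for this block.
for method in sorted(seconds, key=seconds.get):
    print(f"{method:8s} {seconds[method]:10.3f} s")
```

Entries reported as NA have no timing and would need to be filtered out before calling the helper.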