Table 3.
Clustering results and performance of Gclust at 90% eMEMi using 16 threads
Dataset | No. of sequences | No. of clusters | Running time (min) |
---|---|---|---|
Viral data | 9,578 | 9,101 | 8.7 |
Archaeal data | 38,381 | 16,064 | 88.0 |
Fungal data | 79,365 | 68,698 | 1322.8 |
Bacterial data | 112,111 | 105,867 | 7678.8 |
Note: Clustering was run with the following parameters: -minlen 41 -both -nuc -threads 16 -chunk 400 -loadall -memiden 90 -rebuild -ext 1 -sparse 4. The “-both” parameter indicates that Gclust compares both strands of each DNA sequence. MEM, maximal exact match; eMEMi, extended MEM identity.
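
The figures in Table 3 can be turned into two derived quantities that make the rows easier to compare: the number of clusters per input sequence (values near 1 mean little redundancy was removed) and the number of sequences processed per minute. The following is a minimal sketch in Python; the numbers are copied from the table, while the names `TABLE_3` and `summarize` and the two derived metrics are illustrative assumptions and are not part of Gclust or of the original analysis.

```python
# Values copied from Table 3:
# (dataset, no. of sequences, no. of clusters, running time in minutes)
TABLE_3 = [
    ("Viral data", 9578, 9101, 8.7),
    ("Archaeal data", 38381, 16064, 88.0),
    ("Fungal data", 79365, 68698, 1322.8),
    ("Bacterial data", 112111, 105867, 7678.8),
]


def summarize(rows):
    """Return per-dataset derived metrics (hypothetical helper, not a Gclust API)."""
    summary = []
    for name, n_sequences, n_clusters, minutes in rows:
        summary.append(
            {
                "dataset": name,
                # Fraction of input sequences that remain as cluster representatives.
                "clusters_per_sequence": n_clusters / n_sequences,
                # Average throughput over the whole run.
                "sequences_per_minute": n_sequences / minutes,
            }
        )
    return summary


if __name__ == "__main__":
    for row in summarize(TABLE_3):
        print(
            f"{row['dataset']:<15}"
            f"{row['clusters_per_sequence']:>8.3f}"
            f"{row['sequences_per_minute']:>10.1f}"
        )
```

These ratios only restate the table. They say nothing about why the datasets differ, since running time also depends on sequence length and similarity, which the table does not report.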