Table 1.
Parameters used in sequence–sequence clustering grid search.
Parameter name | Values | Brief description |
---|---|---|
--cluster-mode | 0, 1, 2 | Clustering algorithm to use [a] |
--cluster-steps | 1, 2, 3 | Number of cascaded clustering steps to perform [b] |
-s | 1, 4, 7 | Sensitivity [c] |
--min-seq-id | 0.5, 0.45, 0.4, 0.35, 0.3 | Sequence identity threshold (as a fraction) |
-c | 0.9, 0.85, 0.8, 0.75, 0.7 | Coverage threshold |
-e | 1e-10, 1e-05, 1e-3 | E-value threshold |
[a] Different algorithms are available to convert the graph of pairwise edges into clusters: 0 = set cover, 1 = single-linkage (like blastclust), 2 = greedy-incremental (like CD-HIT). Details are in the MMseqs2 manual.
[b] Performs clustering with strict parameters, then incrementally merges clusters by relaxing the parameters down to the selected ones in this many steps.
[c] Higher sensitivity values allow less similar k-mers to count as matches that can seed an alignment.
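
The grid in Table 1 can be enumerated as a full Cartesian product of the listed values. The following is a minimal sketch, not the procedure used in the study: it assumes every combination is run through `mmseqs cluster`, and the database name `seqDB`, the output prefix `clu`, and the temporary directory `tmp` are hypothetical placeholders. It only builds the command lines and does not execute them.

```python
from itertools import product

# Parameter grid copied from Table 1. E-values are kept as strings so they
# are passed to the command line exactly as written in the table.
GRID = {
    "--cluster-mode": [0, 1, 2],
    "--cluster-steps": [1, 2, 3],
    "-s": [1, 4, 7],
    "--min-seq-id": [0.5, 0.45, 0.4, 0.35, 0.3],
    "-c": [0.9, 0.85, 0.8, 0.75, 0.7],
    "-e": ["1e-10", "1e-05", "1e-3"],
}


def grid_commands(seq_db="seqDB", out_prefix="clu", tmp_dir="tmp"):
    """Yield one `mmseqs cluster` argument list per parameter combination.

    seq_db, out_prefix and tmp_dir are hypothetical names for illustration.
    """
    flags = list(GRID)
    for i, combo in enumerate(product(*GRID.values())):
        # Each run writes to its own output name so results do not collide.
        cmd = ["mmseqs", "cluster", seq_db, f"{out_prefix}_{i:04d}", tmp_dir]
        for flag, value in zip(flags, combo):
            cmd += [flag, str(value)]
        yield cmd


commands = list(grid_commands())
print(len(commands))          # number of parameter combinations
print(" ".join(commands[0]))  # first command line in the grid
```

With the values in Table 1 the product contains 3 × 3 × 3 × 5 × 5 × 3 = 2025 combinations. Each argument list could be passed to `subprocess.run` to launch the corresponding clustering, assuming MMseqs2 is installed and the input database exists.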