2023 Feb 8;11:e14779. doi: 10.7717/peerj.14779

Figure 5. The homogeneity, completeness, AMI, and ARI of the single-batch tests.

(A) All tests aside from “CD-HIT” use the MMseqs2 clustering module stated. The sensitivity specified in parentheses refers to the MMseqs2 search run by Complet+, not the clustering sensitivity, which was the default value of 4.0 where applicable. Overall, Complet+ substantially improves each test case’s completeness at little expense to homogeneity. The loss in homogeneity is more notable when the clustering results are evaluated at the family level of classification, though it is still smaller than the increase in completeness. Complet+ also improves the AMI and ARI of each clustering to varying degrees, having a generally greater improvement. (B) The two leftmost tests are Complet+ run at the minimum and maximum MMseqs2 search sensitivity, each on the same Default linclust test results. The two following tests are identical aside from one using the cluster-reassign setting. The last test is the Connected Component (CC) clustering method of MMseqs2, run at the highest sensitivity. As in (A), AMI and ARI improve in each case, and completeness also improves without significant loss of homogeneity.
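To make the reported metrics concrete, the following is a minimal sketch of how homogeneity, completeness, and ARI can be computed from a reference classification and a set of cluster assignments. It is not the authors' evaluation code. The function names and the toy labels (`true_classes`, `fragmented`, `merged`) are hypothetical and chosen only for illustration, and AMI is omitted for brevity. The toy data mimics the situation described in the caption: a clustering that splits each reference class across several clusters is fully homogeneous but incomplete, and merging those fragments raises completeness.

```python
from collections import Counter
from math import comb, log


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) from two paired label lists."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (_, g), c in joint.items())


def homogeneity(classes, clusters):
    """1 - H(classes | clusters) / H(classes); 1.0 if each cluster holds one class."""
    h_classes = _entropy(classes)
    if h_classes == 0:
        return 1.0
    return 1.0 - _conditional_entropy(classes, clusters) / h_classes


def completeness(classes, clusters):
    """1 - H(clusters | classes) / H(clusters); homogeneity with the roles swapped."""
    return homogeneity(clusters, classes)


def adjusted_rand_index(classes, clusters):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    n = len(classes)
    sum_joint = sum(comb(c, 2) for c in Counter(zip(classes, clusters)).values())
    sum_classes = sum(comb(c, 2) for c in Counter(classes).values())
    sum_clusters = sum(comb(c, 2) for c in Counter(clusters).values())
    expected = sum_classes * sum_clusters / comb(n, 2)
    max_index = (sum_classes + sum_clusters) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Hypothetical toy data: two reference classes of four sequences each.
true_classes = ["A", "A", "A", "A", "B", "B", "B", "B"]
# Each class is split across two clusters (homogeneous but incomplete).
fragmented = [0, 0, 1, 1, 2, 2, 3, 3]
# The fragments of each class have been merged into one cluster.
merged = [0, 0, 0, 0, 1, 1, 1, 1]

for name, clusters in [("fragmented", fragmented), ("merged", merged)]:
    print(
        name,
        round(homogeneity(true_classes, clusters), 3),
        round(completeness(true_classes, clusters), 3),
        round(adjusted_rand_index(true_classes, clusters), 3),
    )
```

In this toy case the fragmented clustering has a homogeneity of 1 and a completeness of 0.5, while the merged clustering scores 1 on both. Real data would typically show a small homogeneity cost whenever a merge joins sequences from different reference classes.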