PeerJ. 2021 May 5;9:e11348. doi: 10.7717/peerj.11348

Table 2. Comparison of the clustering properties when varying the distance metric, the distance threshold or the clustering strategy.

Analyses were run on 63,863 RefSeq Bacteria using two distance metrics, based on either the Jaccard Index (JI) or the Identical Genome Fraction (IGF); six distance thresholds per metric (0.80 to 0.90 for JI and 0.60 to 0.70 for IGF, in steps of 0.02); and two clustering strategies, either direct (JI-d and IGF-d) or indirect (JI-i and IGF-i; see text for details). All pack sizes were 200 and the clustering mode was set to “loose”. RI, Redundancy Index (# groups / # phyla).

Jaccard Index (JI)

Direct strategy (JI-d)
threshold           0.80  0.82  0.84  0.86  0.88  0.90
RI                    59    47    35    25    14    10
# phyla               34    34    29    24    19    11
# groups            2005  1589  1025   598   268   109
- pure groups       1992  1576  1009   587   261   106
  - singletons      1201   904   557   325   143    56
- mixed groups        13    13    16    11     7     3
  - paraphyletic       0     0     0     0     0     0
  - super-phyla       10    10    12     5     2     0
  - polyphyletic       3     3     4     6     5     3

Indirect strategy (JI-i)
threshold           0.80  0.82  0.84  0.86  0.88  0.90
RI                    54    46    35    20    12     4
# phyla               34    31    25    22    13    11
# groups            1845  1430   870   446   151    49
- pure groups       1835  1416   853   434   149    45
  - singletons      1727   818   488   242    88    24
- mixed groups        10    14    17    12     2     4
  - paraphyletic       0     1     0     0     0     0
  - super-phyla       10    13     9     7     0     1
  - polyphyletic       0     0     8     5     2     3

Identical Genome Fraction (IGF)

Direct strategy (IGF-d)
threshold           0.60  0.62  0.64  0.66  0.68  0.70
RI                    74    65    58    45    33    31
# phyla               24    24    22    22    22    15
# groups            1776  1548  1271   988   719   464
- pure groups       1758  1530  1271   971   706   456
  - singletons      1094   939   755   587   419   260
- mixed groups        18    18    19    17    13     8
  - paraphyletic       4     2     2     2     1     1
  - super-phyla       11    11    13    10     4     1
  - polyphyletic       3     5     4     5     8     6

Indirect strategy (IGF-i)
threshold           0.60  0.62  0.64  0.66  0.68  0.70
RI                    50    55    44    30    23    11
# phyla               31    25    24    24    19    16
# groups            1536  1369  1061   715   440   176
- pure groups       1514  1345  1042   701   426   167
  - singletons       905   784   595   404   219    77
- mixed groups        22    24    19    14    14     9
  - paraphyletic       2     3     1     0     2     0
  - super-phyla       17    17    14    10     8     4
  - polyphyletic       3     4     4     4     4     5
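
The Redundancy Index defined in the caption (# groups / # phyla) can be recomputed from the table rows. The sketch below does this for the JI-d panel only, using values transcribed from the table. The function names `redundancy_index` and `round_half_up` are hypothetical, and the rounding rule (nearest integer, halves rounded up) is an assumption, since the source does not say how RI was rounded. The second check assumes that pure and mixed groups together make up all groups, which is inferred from the row labels rather than stated in the caption.

```python
# Values transcribed from the JI-d panel of Table 2 (thresholds 0.80 to 0.90).
thresholds = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90]
n_phyla = [34, 34, 29, 24, 19, 11]
n_groups = [2005, 1589, 1025, 598, 268, 109]
n_pure = [1992, 1576, 1009, 587, 261, 106]
n_mixed = [13, 13, 16, 11, 7, 3]
reported_ri = [59, 47, 35, 25, 14, 10]


def redundancy_index(groups: int, phyla: int) -> float:
    """RI as defined in the caption: number of groups per phylum."""
    return groups / phyla


def round_half_up(x: float) -> int:
    """Assumed rounding rule for non-negative x; not stated in the source."""
    return int(x + 0.5)


computed_ri = [
    round_half_up(redundancy_index(g, p)) for g, p in zip(n_groups, n_phyla)
]

# Assumed hierarchy: every group is either pure or mixed.
partition_ok = [p + m == g for p, m, g in zip(n_pure, n_mixed, n_groups)]

for t, ri, rep, ok in zip(thresholds, computed_ri, reported_ri, partition_ok):
    print(f"threshold {t:.2f}: RI = {ri} (reported {rep}), pure + mixed == groups: {ok}")
```

Under these assumptions, the recomputed RI matches the reported value at every JI-d threshold. Swapping in the rows of another panel repeats the same check for that panel.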