Table 1.
Clustering setting | Minimum sequence similarity | Homology groups | Single copy groups | Correct groups ᵃ | True Positives | False Positives | False Negatives | Recall | Precision | F-score |
---|---|---|---|---|---|---|---|---|---|---|
D1 | 95% | 49,290 | 812 | 395 | 128,085 | 14 | 3905 | 0.9704 | 0.9999 | 0.9849 |
D2 | 85% | 28,896 | 1615 | 629 | 131,795 | 24 | 195 | 0.9985 | 0.9998 | 0.9992 |
D3 | 75% | 24,650 | 1690 | 638 | 131,952 | 35 | 38 | 0.9997 | 0.9997 | 0.9997 |
D4 | 65% | 22,347 | 1699 | 640 | 131,975 | 38 | 15 | 0.9999 | 0.9997 | 0.9998 |
D5 | 55% | 20,636 | 1683 | 639 | 131,975 | 44 | 15 | 0.9999 | 0.9997 | 0.9998 |
D6 | 45% | 19,234 | 1653 | 633 | 131,985 | 245 | 5 | 1.0000 | 0.9981 | 0.9991 |
D7 | 35% | 17,908 | 1612 | 623 | 131,985 | 508 | 5 | 1.0000 | 0.9962 | 0.9981 |
D8 | 25% | 16,486 | 1486 | 607 | 131,986 | 7002 | 4 | 1.0000 | 0.9496 | 0.9741 |
ᵃ Correct groups: the number of homology groups that correctly cluster one of the 670 ‘complete’ and ‘non-duplicated’ Enterobacteriaceae BUSCO genes. The calculations of recall, precision, and F-score are explained in Methods.
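The scores in Table 1 can be rechecked from the confusion counts. The sketch below assumes the standard definitions: recall = TP/(TP+FN), precision = TP/(TP+FP), and F-score as their harmonic mean. The Methods section holds the authors' exact formulation. The names `scores` and `ROWS` are illustrative only.

```python
# Minimal sketch: recompute recall, precision, and F-score from the TP/FP/FN
# columns of Table 1, ASSUMING the standard definitions (see Methods for the
# authors' exact formulation). Names here are hypothetical, for illustration.

def scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (recall, precision, F-score) for one clustering setting."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Harmonic mean of precision and recall; equivalent to 2TP / (2TP + FP + FN).
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score

# (setting, minimum sequence similarity, TP, FP, FN) copied from Table 1
ROWS = [
    ("D1", "95%", 128_085, 14, 3_905),
    ("D2", "85%", 131_795, 24, 195),
    ("D3", "75%", 131_952, 35, 38),
    ("D4", "65%", 131_975, 38, 15),
    ("D5", "55%", 131_975, 44, 15),
    ("D6", "45%", 131_985, 245, 5),
    ("D7", "35%", 131_985, 508, 5),
    ("D8", "25%", 131_986, 7_002, 4),
]

if __name__ == "__main__":
    for name, sim, tp, fp, fn in ROWS:
        r, p, f = scores(tp, fp, fn)
        print(f"{name} ({sim}): recall={r:.4f} precision={p:.4f} F={f:.4f}")
```

In every row TP + FN equals 131,990, so recall depends only on the false negatives. Lowering the similarity threshold mainly trades false negatives for false positives, and at 25% (D8) that trade becomes visible as a drop in precision.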