Table 1.
General and BUSCO benchmark statistics for homology grouping performed under settings D1 to D8
| Clustering setting | Minimum sequence similarity | Homology groups | Single copy groups | Correct groups a | True Positives | False Positives | False Negatives | Recall | Precision | F-score |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | 95% | 49,290 | 812 | 395 | 128,085 | 14 | 3905 | 0.9704 | 0.9999 | 0.9849 |
| D2 | 85% | 28,896 | 1615 | 629 | 131,795 | 24 | 195 | 0.9985 | 0.9998 | 0.9992 |
| D3 | 75% | 24,650 | 1690 | 638 | 131,952 | 35 | 38 | 0.9997 | 0.9997 | 0.9997 |
| D4 | 65% | 22,347 | 1699 | 640 | 131,975 | 38 | 15 | 0.9999 | 0.9997 | 0.9998 |
| D5 | 55% | 20,636 | 1683 | 639 | 131,975 | 44 | 15 | 0.9999 | 0.9997 | 0.9998 |
| D6 | 45% | 19,234 | 1653 | 633 | 131,985 | 245 | 5 | 1.0000 | 0.9981 | 0.9991 |
| D7 | 35% | 17,908 | 1612 | 623 | 131,985 | 508 | 5 | 1.0000 | 0.9962 | 0.9981 |
| D8 | 25% | 16,486 | 1486 | 607 | 131,986 | 7002 | 4 | 1.0000 | 0.9496 | 0.9741 |
a Correct groups: the number of groups that correctly organize one of the 670 ‘complete’ and ‘non-duplicated’ Enterobacteriaceae BUSCO genes. The calculations of recall, precision, and F-score are explained in Methods.
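
The last three columns can be reproduced from the true positive, false positive, and false negative counts. The sketch below assumes the conventional definitions (recall = TP / (TP + FN), precision = TP / (TP + FP), F-score as the harmonic mean of the two); the Methods section remains the authoritative description, and the function name `benchmark_scores` is only illustrative.

```python
def benchmark_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (recall, precision, F-score) from raw benchmark counts.

    Assumes the standard definitions; these are not taken from the
    paper's Methods section.
    """
    recall = tp / (tp + fn)        # fraction of expected pairs recovered
    precision = tp / (tp + fp)     # fraction of reported pairs that are correct
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, precision, f_score


# Counts for setting D1 (95% minimum sequence similarity) from Table 1.
recall, precision, f_score = benchmark_scores(tp=128_085, fp=14, fn=3_905)
print(f"{recall:.4f} {precision:.4f} {f_score:.4f}")  # prints "0.9704 0.9999 0.9849"
```

Under these assumptions the output matches the D1 row, which suggests the tabulated scores are rounded to four decimal places.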