Table 1. Summary of PanOriginSV prediction accuracy.
Benchmarking results on large clusters obtained from MMseqs2. For each cluster, 25% of sequences were held out and used as the test set. The accuracy of the top predicted lab was consistently higher for PanOriginSV than for PlasmidHawk, whereas top-5 accuracy was comparable between the two tools.
Cluster | Train sequences | Test sequences | Top-1 test accuracy (Linear) | Top-5 test accuracy (Linear) | Top-1 test accuracy (Graph) | Top-5 test accuracy (Graph) |
---|---|---|---|---|---|---|
J7OEM | 2870 | 714 | 0.96 | 0.99 | 0.97 | 0.99 |
3PTDM | 2397 | 571 | 0.82 | 0.93 | 0.81 | 0.93 |
O3GQU | 1046 | 275 | 0.65 | 0.88 | 0.71 | 0.83 |
48073 | 973 | 205 | 0.71 | 0.89 | 0.71 | 0.87 |
WA905 | 639 | 149 | 0.87 | 0.94 | 0.87 | 0.97 |
GIGX0 | 604 | 119 | 0.71 | 0.87 | 0.79 | 0.93 |
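
As an illustration of the two metrics in Table 1, here is a minimal sketch of how top-1 and top-5 accuracy are typically computed from ranked lab predictions. The function name `top_k_accuracy`, the lab labels, and the example rankings are hypothetical and are not taken from PanOriginSV or PlasmidHawk. The sketch assumes each tool returns an ordered list of candidate labs for every test sequence.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of test sequences whose true lab is among the first k predictions.

    ranked_predictions[i] is the ordered list of labs (best first) predicted
    for test sequence i. This interface is an assumption for illustration.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one test sequence is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked predictions for four held-out sequences.
ranked = [
    ["lab_A", "lab_B", "lab_C"],
    ["lab_B", "lab_A", "lab_C"],
    ["lab_C", "lab_B", "lab_A"],
    ["lab_B", "lab_C", "lab_A"],
]
truth = ["lab_A", "lab_A", "lab_A", "lab_D"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Because a sequence counted as correct at rank 1 is also correct at any larger k, top-5 accuracy can never be lower than top-1 accuracy for the same tool and cluster, which matches every row of the table.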