2021 Mar 26;10:246. [Version 1] doi: 10.12688/f1000research.51477.1

Table 1. Summary of PanOriginSV prediction accuracy.

Benchmarking results on large clusters obtained from MMseqs2. Within each cluster, 25% of sequences were held out and used as the testing set. The accuracy of the top predicted lab was consistently higher in PanOriginSV than in PlasmidHawk; however, the top-5 accuracy of the two tools was comparable.

Cluster	Train Sequences	Test Sequences	Test accuracy (Linear)	Top 5 test accuracy (Linear)	Test accuracy (Graph)	Top 5 test accuracy (Graph)
J7OEM	2870	714	0.96	0.99	0.97	0.99
3PTDM	2397	571	0.82	0.93	0.81	0.93
O3GQU	1046	275	0.65	0.88	0.71	0.83
48073	973	205	0.71	0.89	0.71	0.87
WA905	639	149	0.87	0.94	0.87	0.97
GIGX0	604	119	0.71	0.87	0.79	0.93
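
The two metrics reported in Table 1 can be illustrated with a minimal sketch. It assumes each tool returns, for every held-out sequence, a list of candidate labs ranked from most to least likely; the function name `top_k_accuracy`, the lab names, and the toy data are hypothetical and are not taken from PanOriginSV or PlasmidHawk.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labs: Sequence[str],
                   k: int = 1) -> float:
    """Fraction of test sequences whose true lab is among the top-k predictions."""
    if len(ranked_predictions) != len(true_labs):
        raise ValueError("one ranked prediction list is required per test sequence")
    if not true_labs:
        raise ValueError("at least one test sequence is required")
    # A sequence counts as a hit if its true lab appears in the first k guesses.
    hits = sum(
        true_lab in ranked[:k]
        for ranked, true_lab in zip(ranked_predictions, true_labs)
    )
    return hits / len(true_labs)


# Hypothetical ranked predictions (best guess first) for four held-out sequences.
ranked = [
    ["lab_A", "lab_B", "lab_C"],
    ["lab_B", "lab_A", "lab_C"],
    ["lab_C", "lab_A", "lab_B"],
    ["lab_B", "lab_C", "lab_A"],
]
truth = ["lab_A", "lab_A", "lab_B", "lab_D"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first sequence is a hit
top3 = top_k_accuracy(ranked, truth, k=3)  # the first three sequences are hits
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

With `k=1` this corresponds to the "Test accuracy" columns, and with `k=5` to the "Top 5 test accuracy" columns; the toy example uses `k=3` only because each hypothetical ranking has three entries.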