Table 3.
Performance test of prokaryotic genome annotation tools based on four species
| Species | Software | Genes | CDS | CDS with function | CDS without function | rRNA | tRNA | Genes with same start and end position (%)ᵃ | Genes with same start position (%) | Similarity scoreᵇ |
|---|---|---|---|---|---|---|---|---|---|---|
| Clostridioides difficile | Ref_annotationᶜ | 3904 | 3850 | 3353 (87.09%) | 497 (12.91%) | 32 | 88 | / | / | / |
| | PROKKA | 3913 | 3824 | 2124 (55.54%) | 1700 (44.46%) | / | 89 | 3547 (90.86%) | 3714 (95.13%) | 95.02% |
| | RASTtk | 4031 | 3912 | 2930 (74.90%) | 982 (25.10%) | 32 | 87 | 3557 (91.11%) | 3727 (95.47%) | 93.94% |
| | RAST | 4261 | 3904 | 2929 (75.03%) | 975 (24.97%) | 96 | 261 | 3721 (95.31%) | 3900 (99.90%) | 95.53% |
| | GeneSAS_genemarkS | 4087 | 3966 | / | / | 32 | 89 | 3615 (92.60%) | 3755 (96.20%) | 93.98% |
| | PGAP | 3886 | 3829 | 3410 (89.06%) | 419 (10.94%) | 32 | 88 | 3763 (96.39%) | 3828 (98.05%) | 98.28% |
| Klebsiella pneumoniae | Ref_annotation | 5868 | 5779 | 4085 (70.69%) | 1694 (29.31%) | 25 | 62 | / | / | / |
| | PROKKA | 5540 | 5451 | 3889 (71.34%) | 1562 (28.66%) | / | 88 | 4760 (81.12%) | 5100 (86.91%) | 89.41% |
| | RASTtk | 5844 | 5731 | 4826 (84.21%) | 905 (15.79%) | 25 | 88 | 5092 (86.78%) | 5394 (91.92%) | 92.11% |
| | RAST | 6070 | 5731 | 4857 (84.75%) | 874 (15.25%) | 75 | 264 | 5216 (88.89%) | 5517 (94.02%) | 92.43% |
| | GeneSAS_genemarkS | 5544 | 5544 | / | / | / | / | 4581 (78.07%) | 5018 (85.51%) | 87.94% |
| | PGAP | 5467 | 5527 | 5014 (90.72%) | 513 (9.28%) | 25 | 88 | 4858 (82.79%) | 5097 (86.86%) | 89.93% |
| Neisseria gonorrhoeae | Ref_annotation | 2044 | 1973 | 1600 (81.09%) | 373 (18.91%) | 11 | 54 | / | / | / |
| | PROKKA | 2201 | 2145 | 1235 (57.58%) | 910 (42.42%) | / | 55 | 1713 (83.81%) | 1836 (89.82%) | 86.50% |
| | RASTtk | 2642 | 2575 | 1765 (68.54%) | 810 (31.46%) | 12 | 55 | 1731 (84.69%) | 1881 (92.03%) | 80.28% |
| | RAST | 2776 | 2575 | 1783 (69.24%) | 792 (30.76%) | 36 | 165 | 1839 (89.98%) | 1999 (97.80%) | 82.95% |
| | GeneSAS_genemarkS | 2357 | 2357 | / | / | / | / | 1421 (69.52%) | 1683 (82.34%) | 76.48% |
| | PGAP | 2030 | 1960 | 1629 (83.11%) | 331 (16.89%) | 12 | 55 | 1930 (94.42%) | 1967 (96.23%) | 96.56% |
| Staphylococcus aureus | Ref_annotation | 2842 | 2767 | 1238 (44.74%) | 1529 (55.26%) | 16 | 59 | / | / | / |
| | PROKKA | 2693 | 2630 | 1720 (65.40%) | 910 (34.60%) | / | 62 | 2315 (81.46%) | 2476 (87.12%) | 89.47% |
| | RASTtk | 2763 | 2687 | 2157 (80.28%) | 530 (19.72%) | 16 | 60 | 2353 (82.79%) | 2523 (88.78%) | 90.03% |
| | RAST | 2915 | 2687 | 2157 (80.28%) | 530 (19.72%) | 48 | 180 | 2471 (86.95%) | 2647 (93.14%) | 91.96% |
| | GeneSAS_genemarkS | 2683 | 2683 | / | / | / | / | 2320 (81.63%) | 2458 (86.49%) | 88.98% |
| | PGAP | 2782 | 2704 | 2327 (86.06%) | 377 (13.94%) | 16 | 59 | 2580 (90.78%) | 2676 (94.16%) | 95.16% |
ᵃ Percentage = (number of detected identical genes / number of Ref_annotation genes) × 100.
ᵇ Similarity score = 2 × (genes with same start position) / (Total_x + Total_z) × 100, where Total_x and Total_z are the total numbers of genes in the software annotation and the reference annotation, respectively. The formula is from BEACON (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4539851/).
ᶜ Ref_annotation is the reference annotation, taken from NCBI RefSeq.
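
The two derived columns can be recomputed from the raw counts using the formulas in footnotes a and b. The following is a minimal Python sketch of that arithmetic, not the authors' code; the function names `percent_of_reference` and `similarity_score` are hypothetical and chosen here only for illustration. The example values are the PROKKA row for Clostridioides difficile.

```python
def percent_of_reference(matching_genes: int, reference_genes: int) -> float:
    """Footnote a: matching genes as a percentage of reference-annotation genes."""
    return matching_genes / reference_genes * 100


def similarity_score(same_start_genes: int, software_genes: int,
                     reference_genes: int) -> float:
    """Footnote b: 2 * shared genes / (software total + reference total) * 100."""
    return 2 * same_start_genes / (software_genes + reference_genes) * 100


if __name__ == "__main__":
    # Counts taken from the table: C. difficile, PROKKA vs. Ref_annotation.
    reference_genes = 3904
    prokka_genes = 3913
    same_start_and_end = 3547
    same_start = 3714

    print(f"{percent_of_reference(same_start_and_end, reference_genes):.2f}%")
    print(f"{percent_of_reference(same_start, reference_genes):.2f}%")
    print(f"{similarity_score(same_start, prokka_genes, reference_genes):.2f}%")
```

Because the similarity score divides by the sum of both gene totals, a tool that predicts many extra genes is penalized even when most reference start positions are recovered, which is why RAST can match 99.90% of C. difficile start positions yet score 95.53%.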