Table 3.
Assessment of gene-finding accuracy of GeneMark-HM on simulated metagenomic sequences. The shortest contig length was 1500 nt. The protein-coding regions reported by MetaGeneAnnotator and FragGeneScan were longer than 120 nt; the three other tools reported predicted coding regions longer than 90 nt.
Tool (1 640 483 genes in the test set) | # of missed genes | % of missed genes | total # of genes predicted | # of wrong starts | % of wrong starts |
---|---|---|---|---|---|
FragGeneScan | 71 242 | 4.3 | 1 861 953 | 291 400 | 17.8 |
MetaGeneAnnotator | 37 451 | 2.3 | 1 794 502 | 284 555 | 17.3 |
MetaProdigal | 22 071 | 1.3 | 1 824 182 | 194 188 | 11.8 |
MetaGeneMark | 21 861 | 1.3 | 1 830 949 | 330 985 | 20.2 |
(a) GeneMark-HM with D4 (81% in pan-genome path, 19% in MetaGeneMark path) | 15 747 | 1.0 | 1 813 984 | 204 159 | 12.4 |
(b) GeneMark-HM with D3 (84% in pan-genome path, 16% in MetaGeneMark path) | 15 409 | 0.9 | 1 813 830 | 199 871 | 12.2 |
(c) GeneMark-HM with D2 (88% in pan-genome path, 12% in MetaGeneMark path) | 15 122 | 0.9 | 1 815 335 | 193 395 | 11.8 |
(d) GeneMark-HM with D1 (96% in pan-genome path, 4% in MetaGeneMark path) | 14 162 | 0.9 | 1 817 669 | 178 190 | 10.9 |
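
The two percentage columns can be reproduced from the counts in the table. The sketch below assumes that both percentages are taken relative to the 1 640 483 genes in the test set. The caption does not state this; it is inferred because the arithmetic matches the reported values. The names `TEST_SET_GENES`, `COUNTS`, `percent_of_test_set` and `summarize` are illustrative only, and the GeneMark-HM row labels are shortened.

```python
# Counts copied from Table 3. Using the test-set size as the denominator
# for both percentages is an assumption inferred from the reported values.
TEST_SET_GENES = 1_640_483

# tool -> (number of missed genes, number of wrong starts)
COUNTS = {
    "FragGeneScan": (71_242, 291_400),
    "MetaGeneAnnotator": (37_451, 284_555),
    "MetaProdigal": (22_071, 194_188),
    "MetaGeneMark": (21_861, 330_985),
    "GeneMark-HM D4": (15_747, 204_159),
    "GeneMark-HM D3": (15_409, 199_871),
    "GeneMark-HM D2": (15_122, 193_395),
    "GeneMark-HM D1": (14_162, 178_190),
}


def percent_of_test_set(count: int) -> float:
    """Return count as a percentage of the test set, rounded to one decimal."""
    return round(100 * count / TEST_SET_GENES, 1)


def summarize() -> dict[str, tuple[float, float]]:
    """Map each tool to (% of missed genes, % of wrong starts)."""
    return {
        tool: (percent_of_test_set(missed), percent_of_test_set(wrong))
        for tool, (missed, wrong) in COUNTS.items()
    }


if __name__ == "__main__":
    for tool, (pct_missed, pct_wrong) in summarize().items():
        print(f"{tool:20s} missed {pct_missed:4.1f}%  wrong starts {pct_wrong:4.1f}%")
```

Under this assumption the recomputed values agree with every row of the table to one decimal place.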