2021 May 26;3(2):lqab047. doi: 10.1093/nargab/lqab047

Table 3.

Assessment of the gene-finding accuracy of GeneMark-HM on simulated metagenomic sequences. The shortest contig length was 1500 nt. The protein-coding regions reported by MetaGeneAnnotator and FragGeneScan were longer than 120 nt; the three other tools reported predicted coding regions longer than 90 nt.

| Tool (1 640 483 genes in the test set) | # of missed genes | % of missed genes | Total # of genes predicted | # of wrong starts | % of wrong starts |
|---|---|---|---|---|---|
| FragGeneScan | 71 242 | 4.3 | 1 861 953 | 291 400 | 17.8 |
| MetaGeneAnnotator | 37 451 | 2.3 | 1 794 502 | 284 555 | 17.3 |
| MetaProdigal | 22 071 | 1.3 | 1 824 182 | 194 188 | 11.8 |
| MetaGeneMark | 21 861 | 1.3 | 1 830 949 | 330 985 | 20.2 |
| (a) GeneMark-HM with D4 (81% in pan-genome path, 19% in MetaGeneMark path) | 15 747 | 1.0 | 1 813 984 | 204 159 | 12.4 |
| (b) GeneMark-HM with D3 (84% in pan-genome path, 16% in MetaGeneMark path) | 15 409 | 0.9 | 1 813 830 | 199 871 | 12.2 |
| (c) GeneMark-HM with D2 (88% in pan-genome path, 12% in MetaGeneMark path) | 15 122 | 0.9 | 1 815 335 | 193 395 | 11.8 |
| (d) GeneMark-HM with D1 (96% in pan-genome path, 4% in MetaGeneMark path) | 14 162 | 0.9 | 1 817 669 | 178 190 | 10.9 |
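
The table does not state which denominator the two percentage columns use. As a minimal sketch, the Python below assumes both are taken relative to the 1 640 483 genes in the test set and rounded to one decimal place; under that assumption the raw counts reproduce the reported percentages. The names `TEST_SET_GENES`, `ROWS`, `percent_of_test_set` and `summarize` are illustrative and do not come from the paper.

```python
# Counts copied from Table 3. The denominator (test-set size) is an
# assumption: the table does not say how the percentages were derived.
TEST_SET_GENES = 1_640_483

# tool -> (number of missed genes, number of wrong starts)
ROWS = {
    "FragGeneScan": (71_242, 291_400),
    "MetaGeneAnnotator": (37_451, 284_555),
    "MetaProdigal": (22_071, 194_188),
    "MetaGeneMark": (21_861, 330_985),
    "GeneMark-HM (D4)": (15_747, 204_159),
    "GeneMark-HM (D3)": (15_409, 199_871),
    "GeneMark-HM (D2)": (15_122, 193_395),
    "GeneMark-HM (D1)": (14_162, 178_190),
}


def percent_of_test_set(count, total=TEST_SET_GENES):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


def summarize(rows=ROWS):
    """Map each tool to (% of missed genes, % of wrong starts)."""
    return {
        tool: (percent_of_test_set(missed), percent_of_test_set(wrong))
        for tool, (missed, wrong) in rows.items()
    }


if __name__ == "__main__":
    for tool, (pct_missed, pct_wrong) in summarize().items():
        print(f"{tool:20s} missed {pct_missed:4.1f}%  wrong starts {pct_wrong:4.1f}%")
```

Using the number of correctly found genes as the denominator for wrong starts would give slightly higher values than those in the table (for example, about 12.0 rather than 11.8 for MetaProdigal), which is why the sketch uses the test-set size.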