Table 1.
Species | No. of genes initially predicted by GeneMarkS-T | Sn/Pr of GeneMarkS-T-predicted genes | No. of HC genes (order excluded DB) | Sn/Pr of HC genes (order excluded DB) | No. of HC genes (species excluded DB) | Sn/Pr of HC genes (species excluded DB) |
---|---|---|---|---|---|---|
C. elegans | 14,746 | 46.8/63.4 | 8062 | 35.7/88.4 | 11,399 | 51.7/90.6 |
A. thaliana | 17,589 | 51.2/79.9 | 16,008 | 56.7/97.3 | 16,551 | 58.8/97.6 |
D. melanogaster | 10,163 | 59.6/81.8 | 8109 | 55.0/94.7 | 9223 | 63.7/96.3 |
S. lycopersicum | 19,526 | 68.0/78.4 | 17,231 | 75.1/95.3 | 17,489 | 75.8/95.2 |
D. rerio | 22,992 | 59.6/59.9 | 16,918 | 67.0/88.5 | 16,573 | 66.9/90.4 |
G. gallus | 17,381 | 49.6/47.0 | 12,473 | 74.4/89.1 | 12,564 | 74.0/88.4 |
M. musculus | 15,819 | 49.6/63.2 | 13,057 | 63.5/93.2 | 12,965 | 63.9/94.5 |
Two versions of a reference protein database (DB) were used for each species: the “species excluded” and the “order excluded” (see section “Data sets”). Sn, sensitivity; Pr, precision; HC, high-confidence. Refinement by GeneMarkS-TP reduced the number of initial predictions and produced a significant increase in Pr. In most of the genomes, there was also a positive change in Sn, especially in the large inhomogeneous genomes of G. gallus and M. musculus. Additional data are provided in Supplemental Table S1.
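
The trend described in the note can be checked directly from the table values. The following is a minimal Python sketch, not part of GeneMarkS-TP itself; `TABLE` and `deltas` are hypothetical names introduced here for illustration, and the numbers are copied from the initial GeneMarkS-T columns and the species excluded DB columns of Table 1.

```python
# Values copied from Table 1: (initial Sn, initial Pr, HC Sn, HC Pr),
# where the HC values come from the "species excluded DB" columns.
# TABLE and deltas are illustrative names, not part of any published tool.
TABLE = {
    "C. elegans": (46.8, 63.4, 51.7, 90.6),
    "A. thaliana": (51.2, 79.9, 58.8, 97.6),
    "D. melanogaster": (59.6, 81.8, 63.7, 96.3),
    "S. lycopersicum": (68.0, 78.4, 75.8, 95.2),
    "D. rerio": (59.6, 59.9, 66.9, 90.4),
    "G. gallus": (49.6, 47.0, 74.0, 88.4),
    "M. musculus": (49.6, 63.2, 63.9, 94.5),
}


def deltas(table):
    """Return {species: (change in Sn, change in Pr)} in percentage points."""
    return {
        species: (round(hc_sn - sn, 1), round(hc_pr - pr, 1))
        for species, (sn, pr, hc_sn, hc_pr) in table.items()
    }


for species, (d_sn, d_pr) in deltas(TABLE).items():
    print(f"{species:16s} dSn={d_sn:+5.1f}  dPr={d_pr:+5.1f}")
```

With the species excluded DB, both Sn and Pr rise for every species listed. Substituting the order excluded DB columns would show Sn dropping for C. elegans and D. melanogaster, which is consistent with the note's wording "in most of the genomes".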