[Preprint]. 2024 Jan 3:2023.01.13.524024. Originally published 2023 Jan 15. [Version 4] doi: 10.1101/2023.01.13.524024

Table 1.

Summary statistics for the sets of GeneMarkS-T gene predictions in assembled transcripts and for the sets of high-confidence gene predictions (HC genes).

| Species | # of annotated genes | # of genes predicted by GeneMarkS-T | Sn/Sp (%) of GeneMarkS-T predicted genes | # of HC genes (Order excluded DB) | Sn/Sp (%) of HC genes (Order excluded DB) | # of HC genes (Species excluded DB) | Sn/Sp (%) of HC genes (Species excluded DB) |
|---|---|---|---|---|---|---|---|
| C. elegans | 19,969 | 14,746 | 46.8/63.4 | 8,062 | 35.7/88.4 | 11,399 | 51.7/90.6 |
| A. thaliana | 27,445 | 17,589 | 51.2/79.9 | 16,008 | 55.0/94.7 | 16,551 | 58.8/97.6 |
| D. melanogaster | 13,951 | 10,163 | 59.6/81.8 | 8,109 | 59.6/81.8 | 9,223 | 63.7/96.3 |
| S. lycopersicum | 25,158 | 19,526 | 67.8/77.8 | 17,231 | 74.9/95.2 | 17,489 | 75.8/95.1 |
| D. rerio | 25,611 | 22,992 | 59.6/59.9 | 16,918 | 67.0/88.5 | 16,573 | 66.9/90.4 |
| G. gallus | 17,279 | 17,381 | 49.6/47.0 | 12,473 | 74.4/89.1 | 12,564 | 74.0/88.4 |
| M. musculus | 22,611 | 15,819 | 49.6/63.2 | 13,057 | 63.5/93.2 | 12,965 | 63.9/94.5 |

Two versions of the reference protein database were used for each species: the ‘Species excluded’ database, containing all proteins from an OrthoDB segment except those from the same species, and the smaller ‘Order excluded’ database, containing all proteins from the same OrthoDB segment except those from the same taxonomic order (see Materials). Additional data are provided in Supplemental Table S1.
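
The Sn/Sp columns of Table 1 can be related to the gene counts with a short calculation. The sketch below assumes the conventional gene-level definitions, which this table does not state explicitly: Sn is the percentage of annotated genes that are correctly predicted, and Sp is the percentage of predicted genes that are correct. Under that assumption, both percentages should imply roughly the same number of correct predictions. The function names are hypothetical and chosen only for illustration.

```python
def implied_matches(gene_count: int, rate_percent: float) -> float:
    """Number of correctly predicted genes implied by a gene count and a percentage."""
    return gene_count * rate_percent / 100.0


def counts_agree(annotated: int, predicted: int, sn: float, sp: float) -> bool:
    """Check that Sn and Sp imply the same number of correct predictions.

    Assumes Sn = correct / annotated and Sp = correct / predicted (both in %).
    The table reports percentages to one decimal place, so each value may be
    off by up to 0.05 percentage points; the tolerance allows for that.
    """
    correct_from_sn = implied_matches(annotated, sn)
    correct_from_sp = implied_matches(predicted, sp)
    tolerance = implied_matches(annotated, 0.05) + implied_matches(predicted, 0.05)
    return abs(correct_from_sn - correct_from_sp) <= tolerance


# C. elegans row of Table 1: GeneMarkS-T predictions, then HC genes (Order excluded DB).
print(counts_agree(19969, 14746, 46.8, 63.4))
print(counts_agree(19969, 8062, 35.7, 88.4))
```

A check of this kind only tests internal consistency under the assumed definitions; it does not say anything about how matches between predicted and annotated genes were determined.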