[Preprint]. 2024 Jan 3:2023.01.13.524024. Originally published 2023 Jan 15. [Version 4] doi: 10.1101/2023.01.13.524024

Table 3.

Genomes and gene annotations used as references for the assessment of gene prediction accuracy.

                                                Reference annotation statistics
Species                     Genome length (Mb)  # coding genes   # coding transcripts  Introns per gene
C. elegans (roundworm)      100                 19,969           28,544                4.8
A. thaliana (thale cress)   119                 27,445           40,827                4.0
D. melanogaster (fruit fly) 138                 13,951           22,395                2.8
S. lycopersicum (tomato)    807                 25,158 (15,138)  31,911 (15,150)       4.4 (4.3)
D. rerio (zebrafish)        1,345               25,610 (17,893)  42,929 (19,975)       8.4 (8.4)
G. gallus (chicken)         1,050               17,279 (10,736)  38,534 (12,733)       9.0 (9.2)
M. musculus (mouse)         2,723               22,405 (16,531)  58,318 (20,708)       6.0 (8.6)

The numbers in parentheses, given for the four large genomes, describe the sets of genes and transcripts in the intersection of the NCBI and Ensembl annotations (see Methods). The number of introns per gene was computed by averaging, for each gene, over its annotated alternative transcripts. Alternative transcripts that differ only in their UTRs are not considered.
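
The introns-per-gene statistic described in the note can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: the function name `introns_per_gene` and the input layout (pairs of a gene ID and a list of coding-segment coordinates) are hypothetical; introns are counted only between coding segments; transcripts with identical coding structure are treated as differing only in their UTRs and are collapsed; and the per-gene averages are combined by a plain mean across genes, which the note does not state explicitly.

```python
from collections import defaultdict


def introns_per_gene(transcripts):
    """Mean over genes of the per-gene average intron count.

    transcripts: iterable of (gene_id, cds_segments), where cds_segments
    is a list of (start, end) coding segments of one transcript.
    Hypothetical input layout, chosen for illustration only.
    """
    # Collect the distinct coding structures of each gene. Transcripts
    # whose coding segments are identical (assumed to differ only in
    # UTRs) collapse into a single entry of the set.
    structures_by_gene = defaultdict(set)
    for gene_id, cds_segments in transcripts:
        structures_by_gene[gene_id].add(tuple(sorted(cds_segments)))

    if not structures_by_gene:
        raise ValueError("no transcripts given")

    gene_averages = []
    for structures in structures_by_gene.values():
        # A transcript with k coding segments has k - 1 introns between them.
        counts = [len(s) - 1 for s in structures]
        gene_averages.append(sum(counts) / len(counts))

    # Assumption: the table value is the plain mean of per-gene averages.
    return sum(gene_averages) / len(gene_averages)


# Toy data, invented for illustration (not taken from any annotation).
toy = [
    ("g1", [(10, 20), (30, 40), (50, 60)]),                      # 2 introns
    ("g1", [(10, 20), (30, 40), (50, 60)]),                      # same CDS, collapsed
    ("g1", [(10, 20), (30, 40), (50, 60), (70, 80), (90, 99)]),  # 4 introns
    ("g2", [(5, 15), (25, 35)]),                                 # 1 intron
]
print(introns_per_gene(toy))  # g1 averages 3.0, g2 averages 1.0, overall 2.0
```

Averaging within each gene first keeps genes with many annotated isoforms from dominating the statistic, which is one plausible reason for the two-step computation the note describes.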