[Preprint]. 2024 Apr 17:2023.01.13.524024. Originally published 2023 Jan 15. [Version 5] doi: 10.1101/2023.01.13.524024

Table 4.

The table shows the total numbers of protein-coding genes and individual CDSs (including alternative isoforms) annotated in the seven genomes (the annotation sources are described in Supplemental Table S7). The numbers of genes and individual CDSs in the intersections of the NCBI RefSeq and Ensembl annotations are given in parentheses (see Methods). The annotations of the C. elegans, A. thaliana, and D. melanogaster genomes are identical between RefSeq and Ensembl; therefore, the intersection sets have the same numbers of genes and CDSs as the RefSeq annotation.

                                     Reference annotation statistics
Species          Genome length (Mb)  # of protein-coding genes  # of CDSs        Introns per gene

C. elegans       100                 19,969 -                   28,544 -         4.8 -
A. thaliana      119                 27,445 -                   40,827 -         4.0 -
D. melanogaster  138                 13,951 -                   22,395 -         2.8 -

S. lycopersicum  807                 25,158 (15,138)            31,911 (15,150)  4.4 (4.3)
D. rerio         1,345               25,610 (17,893)            42,929 (19,975)  8.4 (8.4)

G. gallus        1,050               17,279 (10,736)            38,534 (12,733)  9.0 (9.2)
M. musculus      2,723               22,405 (16,531)            58,318 (20,708)  6.0 (8.6)
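
To make the parenthesized intersection counts easier to compare across species, the short sketch below computes the share of annotated genes and CDSs that remain in the RefSeq/Ensembl intersection. The numbers are copied from the four rows of Table 4 that report an intersection. The "retained fraction" and the names `TABLE4` and `retained_fraction` are illustrative assumptions, not quantities or code from the preprint.

```python
# Counts copied from Table 4 for the four species with a reported
# RefSeq/Ensembl intersection:
# (genes, genes in intersection, CDSs, CDSs in intersection)
TABLE4 = {
    "S. lycopersicum": (25158, 15138, 31911, 15150),
    "D. rerio": (25610, 17893, 42929, 19975),
    "G. gallus": (17279, 10736, 38534, 12733),
    "M. musculus": (22405, 16531, 58318, 20708),
}


def retained_fraction(total: int, intersection: int) -> float:
    """Share of annotated features that are also in the intersection set.

    This is an illustrative summary statistic, not one reported in the table.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return intersection / total


for species, (genes, genes_x, cds, cds_x) in TABLE4.items():
    print(
        f"{species}: genes {retained_fraction(genes, genes_x):.1%}, "
        f"CDSs {retained_fraction(cds, cds_x):.1%}"
    )
```

Under these assumptions, the CDS fraction falls below the gene fraction whenever the intersection drops more alternative isoforms than whole genes.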