Table 4.
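Species | Genome length (Mb) | No. of protein-coding genes (reference annotation) | (RefSeq/Ensembl intersection) | No. of CDSs (reference annotation) | (RefSeq/Ensembl intersection) | Introns per gene (reference annotation) | (RefSeq/Ensembl intersection) |
---|---|---|---|---|---|---|---|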
C. elegans | 100 | 19,969 | — | 28,544 | — | 4.8 | — |
A. thaliana | 119 | 27,445 | — | 40,827 | — | 4.0 | — |
D. melanogaster | 138 | 13,951 | — | 22,395 | — | 2.8 | — |
S. lycopersicum | 807 | 25,158 | (15,138) | 31,911 | (15,150) | 4.4 | (4.3) |
D. rerio | 1345 | 25,610 | (17,893) | 42,929 | (19,975) | 8.4 | (8.4) |
G. gallus | 1050 | 17,279 | (10,736) | 38,534 | (12,733) | 9.0 | (9.2) |
M. musculus | 2723 | 22,405 | (16,531) | 58,318 | (20,708) | 6.0 | (8.6) |
The numbers of genes and individual CDSs in the intersections of the NCBI RefSeq and the Ensembl annotations are given in parentheses (see Methods). Annotations of the C. elegans, A. thaliana, and D. melanogaster genomes are identical between RefSeq and Ensembl; therefore, the intersection sets have the same numbers of genes and CDSs as in the RefSeq annotation.
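
As an illustration of what an annotation intersection count means, the following minimal sketch counts CDSs shared by two annotations. It is not the procedure from Methods: it assumes that a CDS is shared only when sequence name, start, end, and strand match exactly, and that sequence names have already been harmonized between the two sources. The `CDS` type, the `shared_cdss` function, and the sample coordinates are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class CDS(NamedTuple):
    """One CDS segment, identified by its coordinates (hypothetical type)."""
    seqid: str
    start: int
    end: int
    strand: str


def shared_cdss(first: Iterable[CDS], second: Iterable[CDS]) -> Set[CDS]:
    """Return CDSs present in both annotations (assumes exact coordinate matches)."""
    return set(first) & set(second)


# Made-up coordinates for illustration only, not taken from any real annotation.
refseq = [CDS("chr1", 100, 200, "+"), CDS("chr1", 300, 450, "+")]
ensembl = [CDS("chr1", 100, 200, "+"), CDS("chr1", 310, 450, "+")]

common = shared_cdss(refseq, ensembl)
print(len(refseq), len(common))
```

In this toy input the second CDS differs by its start coordinate, so only one of the two CDSs falls in the intersection, which is how a parenthesized count can be smaller than the full annotation count.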