Table 2.
Summary of the improved Official Gene Set (OGS2), comparing all gene constructions with the “good” constructions that have expression and/or homology evidence, and with the previous OGS1.2 gene models. Percentages are of the total number of genes for the set.
Summary Statistics | OGS2 All Models | OGS2 Good Models | OGS1.2 Final Models |
---|---|---|---|
Genes | 36,327 | 24,388 | 18,850 |
Protein coding genes | 25,725 (71 %) | 24,388 | 15,566<sup>a</sup> |
Non-coding genes | 3,997 (11 %) | 0 | 0 |
Transposon protein genes | 6,605 (20 %) | 385<sup>a</sup> | 2,935<sup>a</sup> |
Single transcript genes | 32,079 (88 %) | 20,243 (83 %) | 18,759 (99.5 %) |
Genes assigned to ortholog<sup>b</sup> | 15,176 (42 %) | 15,173 (62 %) | -- |
Transcripts | 44,164 | 32,101 | 18,941 |
Alternative transcripts | 7,837 | 7,712 | 91 |
Mean isoforms per gene | 1.22 | 1.32 | 1 |
Complete proteins | 41,256 (93 %) | 30,521 (95 %) | 18,941 (100 %) |
Median transcript length | 1,571 bp | 1,603 bp | 1,176 bp |
Median CDS length | 777 bp | 981 bp | 1,032 bp |
Transcripts with UTR | 41,313 (94 %) | 30,512 (95 %) | 5,264 (28 %) |
<sup>a</sup> 2,935 OGS1.2 models were classified as having strong homology to transposon proteins during OGS2 work; 385 models with expression and other insect homology, but also transposon homology, were retained in the OGS2 “good” model set.
<sup>b</sup> 5,763 additional OGS2 genes have significant protein homology but are not assigned as orthologs in the OrthoMCL orthology analysis; 3,454 of the 24,388 “good” models lack significant homology but have expression evidence.
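
The derived rows of Table 2 (mean isoforms per gene and the gene-level percentages) can be recomputed from the raw counts. The following is a minimal sketch, assuming that mean isoforms per gene is simply transcripts divided by genes and that percentages are rounded to the nearest displayed digit. The names `COUNTS`, `mean_isoforms`, and `percent_of_genes` are hypothetical helpers introduced for illustration and are not part of the OGS2 pipeline.

```python
# Counts copied from Table 2; the dictionary layout is an illustrative assumption.
COUNTS = {
    "OGS2 all": {"genes": 36327, "transcripts": 44164, "single_transcript_genes": 32079},
    "OGS2 good": {"genes": 24388, "transcripts": 32101, "single_transcript_genes": 20243},
    "OGS1.2": {"genes": 18850, "transcripts": 18941, "single_transcript_genes": 18759},
}


def mean_isoforms(genes, transcripts):
    """Mean number of transcripts (isoforms) per gene, rounded to 2 decimals."""
    return round(transcripts / genes, 2)


def percent_of_genes(count, genes, digits=0):
    """Share of a gene set, as a percentage of that set's total gene count."""
    return round(100 * count / genes, digits)


if __name__ == "__main__":
    for name, c in COUNTS.items():
        iso = mean_isoforms(c["genes"], c["transcripts"])
        single = percent_of_genes(c["single_transcript_genes"], c["genes"], digits=1)
        print(f"{name}: {iso} isoforms/gene, {single} % single-transcript genes")
```

Rounding is done only at the end so that each displayed value is derived directly from the counts rather than from other rounded values.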