. 2001 May;11(5):803–816. doi: 10.1101/gr.175701

Table 2.

Gene Level Accuracy of GenomeScan as a Function of Protein Similarity in DraftGene and FinishGene Datasets

                                  Similarity category/dataset
                                  10^-5 > P > 10^-20     10^-40 > P > 10^-80    10^-120 > P > 10^-180
Variable                          Draft    Finish        Draft    Finish        Draft    Finish
No. of genes in dataset           174      174           151      151           93       93
% of fragmented genes             42       0             43       0             55       0
No. of predicted genes*           186      172           205      159           152      104
Genes completely covered (%)      38       58            48       71            57       73
Genes partially covered (%)       49       32            51       28            42       27
Genes missed (%)                  13       10            1        1             1        0
No. of “extra” predicted genes*   18       14            19       10            8        11

Sequences were grouped according to the level of similarity between the encoded protein and the available database proteins used in the predictions, as described in the legend to Fig. 3. All known genes in the FinishGene set are complete (all coding exons are present in a single sequence). Some genes in the DraftGene set are represented by multiple “partial genes” in different draft contigs; these are listed as fragmented genes. A known gene was classified as completely covered if all of its exons were covered by GenomeScan-predicted exons; partially covered, if some (but not all) of its exons were covered by GenomeScan-predicted exons; and missed, if none of its exons was covered by a GenomeScan-predicted exon. GenomeScan-predicted genes that did not overlap any known gene are listed as “extra” predicted genes.

* Includes predicted partial genes as well as complete genes.
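The coverage categories defined in the legend can be expressed as a small classification routine. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes that each gene is given as a list of (start, end) exon coordinates on the same sequence. It also assumes that an exon counts as "covered" if it overlaps any predicted exon by at least one base, and that a predicted gene is "extra" if its span overlaps no known gene's span. The legend does not state the exact overlap criteria, and all function names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: inclusive (start, end) coordinates on one sequence.
Interval = Tuple[int, int]
Gene = List[Interval]  # a gene is a list of its exons


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_is_covered(exon: Interval, predicted_exons: List[Interval]) -> bool:
    # Assumption: "covered" means overlapping at least one predicted exon.
    return any(overlaps(exon, p) for p in predicted_exons)


def classify_known_gene(known_exons: Gene, predicted_exons: List[Interval]) -> str:
    """Assign a known gene to one of the three coverage categories."""
    n_covered = sum(exon_is_covered(e, predicted_exons) for e in known_exons)
    if n_covered == len(known_exons):
        return "completely covered"
    if n_covered == 0:
        return "missed"
    return "partially covered"


def gene_span(exons: Gene) -> Interval:
    """Span from the first exon start to the last exon end."""
    return (min(s for s, _ in exons), max(e for _, e in exons))


def extra_predicted_genes(predicted_genes: List[Gene], known_genes: List[Gene]) -> List[Gene]:
    """Predicted genes whose span overlaps no known gene span (assumed criterion)."""
    known_spans = [gene_span(g) for g in known_genes]
    return [
        p for p in predicted_genes
        if not any(overlaps(gene_span(p), k) for k in known_spans)
    ]


if __name__ == "__main__":
    known = [(100, 200), (300, 400), (500, 600)]
    predicted = [(150, 210), (310, 390)]
    print(classify_known_gene(known, predicted))  # third exon has no prediction
```

In this toy example, the first two known exons overlap predicted exons and the third does not, so the gene falls into the "partially covered" category. Percentages like those in the table would then be computed by tallying these labels over all known genes in a similarity category.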