2001 May;11(5):803–816. doi: 10.1101/gr.175701

Table 2.

Gene-Level Accuracy of GenomeScan as a Function of Protein Similarity in DraftGene and FinishGene Datasets

Variable	10⁻⁵ > P > 10⁻²⁰, Draft	10⁻⁵ > P > 10⁻²⁰, Finish	10⁻⁴⁰ > P > 10⁻⁸⁰, Draft	10⁻⁴⁰ > P > 10⁻⁸⁰, Finish	10⁻¹²⁰ > P > 10⁻¹⁸⁰, Draft	10⁻¹²⁰ > P > 10⁻¹⁸⁰, Finish

No. of genes in dataset	174	174	151	151	93	93
% of fragmented genes	42	0	43	0	55	0
No. of predicted genes*	186	172	205	159	152	104
Genes completely covered (%)	38	58	48	71	57	73
Genes partially covered (%)	49	32	51	28	42	27
Genes missed (%)	13	10	1	1	1	0
No. of “extra” predicted genes*	18	14	19	10	8	11

Sequences were grouped according to the level of similarity between the encoded protein and the available database proteins used in the predictions, as described in the legend to Fig. 3. All known genes in the FinishGene set are complete (all coding exons are present in a single sequence). Some genes in the DraftGene set are represented by multiple “partial genes” in different draft contigs; these are listed as fragmented genes. Known genes were classified as completely covered if all exons were covered by GenomeScan-predicted exons; partially covered if some (but not all) exons were covered by GenomeScan-predicted exons; and missed if no exon was covered by a GenomeScan-predicted exon. GenomeScan-predicted genes that did not overlap any known gene are listed as “extra” predicted genes.
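The coverage categories defined in the legend can be illustrated with a small sketch. This is not GenomeScan's or the authors' evaluation code. It assumes that exons are half-open intervals on a single sequence, that an exon counts as "covered" if it overlaps any predicted exon, and that a prediction is "extra" if its genomic span overlaps no known gene's span. Fragmented draft genes split across contigs are not modeled. The function names (`classify_known`, `summarize`) and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an exon is a half-open interval [start, end)
# on one sequence; a gene is a list of exons.
Exon = Tuple[int, int]
Gene = List[Exon]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def span(gene: Gene) -> Exon:
    """Genomic extent of a gene: earliest exon start to latest exon end."""
    return (min(s for s, _ in gene), max(e for _, e in gene))


def classify_known(known: Gene, predicted_exons: List[Exon]) -> str:
    """Label a known gene 'complete', 'partial', or 'missed'.

    Assumed criterion: an exon is covered if it overlaps any predicted exon.
    """
    covered = [any(overlaps(x, p) for p in predicted_exons) for x in known]
    if all(covered):
        return "complete"
    if any(covered):
        return "partial"
    return "missed"


def summarize(known_genes: List[Gene], predicted_genes: List[Gene]) -> Dict:
    """Percent of known genes per category plus the count of 'extra' predictions."""
    predicted_exons = [x for g in predicted_genes for x in g]
    counts = {"complete": 0, "partial": 0, "missed": 0}
    for gene in known_genes:
        counts[classify_known(gene, predicted_exons)] += 1
    n = len(known_genes)
    percent = {k: round(100 * v / n) for k, v in counts.items()}
    # Assumed criterion: a prediction is 'extra' if its span overlaps no known gene span.
    extra = sum(
        1
        for p in predicted_genes
        if not any(overlaps(span(p), span(k)) for k in known_genes)
    )
    return {"percent": percent, "extra": extra}


# Toy data (hypothetical coordinates).
known = [
    [(100, 200), (300, 400)],      # both exons hit -> complete
    [(600, 700)],                  # single exon hit -> complete
    [(1000, 1100), (1200, 1300)],  # only first exon hit -> partial
    [(5000, 5100)],                # no exon hit -> missed
]
predicted = [
    [(150, 250), (350, 380)],
    [(650, 660), (1050, 1080)],
    [(9000, 9100)],                # overlaps no known gene -> extra
]

print(summarize(known, predicted))
# {'percent': {'complete': 50, 'partial': 25, 'missed': 25}, 'extra': 1}
```

As in the table, the three coverage percentages partition the known genes and so sum to 100 (up to rounding), while "extra" predictions are counted separately because they do not correspond to any known gene.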

* Includes predicted partial genes as well as complete genes.