Table 1.
Category | Definition | No. of genes in category | No. of genes successfully amplified by 5′ RACE (%) | No. of RACE sequences that differed from the 5′ gene annotation (%) |
---|---|---|---|---|
EPD | Genes in the Eukaryotic Promoter Database having experimentally verified transcriptional start sites | 13 | 13 (100%) | 4 (31%) |
RefSeq | NCBI's curated non-redundant gene set | 27 | 20 (74%) | 8 (40%) |
B | Automated NCBI predictions covered by multiple ESTs | 23 | 15 (65%) | 7 (47%) |
C | Gene predictions covered by only a single EST that do not overlap any mRNA, cDNA, Ensembl, or GENIE evidence | 169 | 40 (24%) | 30 (75%) |
D | Gene predictions that do not overlap any EST, mRNA, cDNA, Ensembl, or GENIE evidence | 68 | 18 (26%) | 12 (67%) |
Total | | 300 | 106 (35%) | 61 (58%) |
Three hundred mouse genes or gene predictions were classified into five categories based on the quality of the associated evidence. The Definition column describes the basis for the classification. Genes in the EPD category have the highest-quality evidence and were used as an internal positive control for all experiments. Genes in category D were considered to be based on the lowest-confidence evidence. 5′ RACE–PCR was performed on 15 mouse tissues/stages as described in Methods. For each category, the table lists the number of genes that were successfully amplified and satisfied the criteria described in the Methods section, and the number of 5′ RACE sequences for which the reference sequence annotation was found to be incomplete at the 5′ end.
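
The percentages in Table 1 can be rechecked from the raw counts. The short Python sketch below does this under an assumption inferred from the numbers rather than stated in the caption: the amplification percentage is taken relative to the number of genes in the category, and the annotation-difference percentage relative to the number of genes successfully amplified. The names `table1` and `percentages` are illustrative only.

```python
# Counts copied from Table 1:
# category -> (genes in category, genes amplified, RACE sequences differing from annotation)
table1 = {
    "EPD": (13, 13, 4),
    "RefSeq": (27, 20, 8),
    "B": (23, 15, 7),
    "C": (169, 40, 30),
    "D": (68, 18, 12),
}


def percentages(n_genes, n_amplified, n_differed):
    """Return (amplified %, differed %) rounded to whole percent.

    Assumption (inferred from the table, not stated in the caption):
    amplified % uses genes in the category as the denominator,
    differed % uses amplified genes as the denominator.
    """
    amplified_pct = round(100 * n_amplified / n_genes)
    differed_pct = round(100 * n_differed / n_amplified)
    return amplified_pct, differed_pct


# Per-category percentages
rows = {name: percentages(*counts) for name, counts in table1.items()}

# Column sums give the Total row: (genes, amplified, differed)
totals = tuple(sum(column) for column in zip(*table1.values()))
rows["Total"] = percentages(*totals)

for name, (amplified_pct, differed_pct) in rows.items():
    print(f"{name}: amplified {amplified_pct}%, differed {differed_pct}%")
```

With these denominators the recomputed values match every row of the table, including the Total row (300 genes, 106 amplified, 61 differing).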