Table 2. Sequence validation rates of predictions, FGENESH vs. GENSCAN
| Gene prediction set | Gene prediction program | Total no. of predictions | No. of predictions tested* | No. sequence-validated (%)† |
|---|---|---|---|---|
| sjc set | GENSCAN | 196 | 161 | 60 (37.3%) |
| | FGENESH | 11 | 10 | 4 (40.0%) |
| Heidelberg set‡ | FGENESH | 1,266 | 160 | 18 (11.3%) |
| Homol-2 set | GENSCAN | 333 | 204 | 27 (13.2%) |
| | FGENESH | 6 | 5 | 2 (40.0%) |
| Homol-0 set§ | GENSCAN | 9,463 | 129 | 7 (5.4%) |
| | FGENESH | 127 | 75 | 5 (6.7%) |
| Total predictions | | 11,402 | 744 | 123 (16.5%) |
| Control set | | 159 | 159 | 154 (96.9%) |
*Gene predictions from each prediction set were tested by sequencing of RT-PCR products. The sets are: sjc set, consisting of GENSCAN or FGENESH predictions with an intron conserved in D. pseudoobscura; Heidelberg set, FGENESH predictions reported as verified by transcription profiling (1); homol-2 set, predictions with homology to D. pseudoobscura in two exons; homol-0 set, predictions with homology in zero to one exon; and control set, release 3.1 controls.
†Gene predictions were considered validated if the aligned sequence of the PCR product was consistent with a spliced gene model in the region of the prediction.
‡Only the 1,266 multiexon predictions among the 2,636 predictions described in the Hild et al. (1) study were considered for analysis; of these, we tested only the 160 with the highest priority scores that did not overlap any GENSCAN or FGENESH prediction tested in the other sets.
§The homol-0 set was selected to be representative of the full range of priority scores.
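Each percentage in the last column is the number of sequence-validated predictions divided by the number of predictions tested, not by the total number of predictions. The sketch below re-derives those rates and the "Total predictions" row from the counts transcribed from the table; the names `ROWS`, `CONTROL`, `rate`, and `totals` are hypothetical and chosen only for illustration. It assumes round-half-up to one decimal place, which matches the reported Heidelberg rate (18/160 = 11.25%, shown as 11.3%); Python's built-in `round()` rounds half to even and would give 11.2 instead, so the sketch uses `decimal` explicitly.

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts transcribed from Table 2: (set, program, total, tested, validated).
ROWS = [
    ("sjc", "GENSCAN", 196, 161, 60),
    ("sjc", "FGENESH", 11, 10, 4),
    ("Heidelberg", "FGENESH", 1266, 160, 18),
    ("Homol-2", "GENSCAN", 333, 204, 27),
    ("Homol-2", "FGENESH", 6, 5, 2),
    ("Homol-0", "GENSCAN", 9463, 129, 7),
    ("Homol-0", "FGENESH", 127, 75, 5),
]
# Control set: (total, tested, validated); not included in the totals row.
CONTROL = (159, 159, 154)


def rate(validated: int, tested: int) -> Decimal:
    """Percent of tested predictions that were validated, rounded half up to 0.1."""
    pct = Decimal(validated * 100) / Decimal(tested)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def totals(rows):
    """Sum total, tested, and validated counts over the prediction sets."""
    total = sum(r[2] for r in rows)
    tested = sum(r[3] for r in rows)
    validated = sum(r[4] for r in rows)
    return total, tested, validated


if __name__ == "__main__":
    for name, program, total, tested, validated in ROWS:
        print(f"{name:<11}{program:<9}{validated}/{tested} = {rate(validated, tested)}%")
    total, tested, validated = totals(ROWS)
    print(f"Total: {total} predictions, {tested} tested, "
          f"{validated} validated ({rate(validated, tested)}%)")
    _, c_tested, c_validated = CONTROL
    print(f"Control: {c_validated}/{c_tested} = {rate(c_validated, c_tested)}%")
```

Running the script reproduces every percentage in the table, and the summed counts match the "Total predictions" row (11,402 predictions, 744 tested, 123 validated).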