Table 4.
Transcript class | Number of sequences | Transcript median length | TP | FP | TP promoters | Precision (%) | Recall (%) | F-score (%) |
---|---|---|---|---|---|---|---|---|
ORF | 4912 | 1548 | 5317 | 9072 | 3934 | 37.0 | 80.1 | 50.6 |
CUTs | 501 | 428 | 540 | 242 | 404 | 69.1 | 80.6 | 74.4 |
SUTs | 729 | 964 | 727 | 704 | 552 | 50.8 | 75.7 | 60.8 |
Other | 300 | 1272 | 296 | 447 | 223 | 39.8 | 74.3 | 51.9 |
All | 6442 | 1436 | 6880 | 10465 | 5113 | 39.7 | 79.4 | 52.9 |
TATA and TATA-less promoters | | | | | | | | |
TATA | 842 | 1384 | 978 | 1298 | 701 | 43.0 | 83.3 | 56.7 |
TATA-less | 4070 | 1544 | 4206 | 7529 | 3139 | 35.8 | 77.1 | 48.9 |
Promoter prediction was carried out for the 16 chromosomes on both the forward and reverse strands. The region from −500 to +100 relative to the transcript start was taken as the true positive region. The performance of PromPredict was evaluated using precision, recall, and F-score. Precision is the ratio of the number of true positive predictions to the sum of true and false positive predictions, while recall is the ratio of the number of promoters with an identified true positive prediction to the total number of promoters. TATA-containing and TATA-less gene promoters are defined by the presence or absence of a TATA box in the −150 to −1 region relative to the TSS [22]. Recall values of 74.3 to 80.6 for the different transcript classes (ORFs, non-protein-coding CUTs and SUTs, and other RNA classes: tRNA, rRNA, and snoRNA) suggest that PromPredict is a good predictor of yeast promoter sequences. The algorithm performs better for TATA-containing gene promoters than for TATA-less promoters (recall 83.3 versus 77.1; F-score 56.7 versus 48.9).
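The definitions above can be checked against the tabulated counts with a short sketch. The function name `prediction_metrics` and its argument names are hypothetical, chosen for illustration. The legend does not state the F-score formula; the sketch assumes the usual harmonic mean of precision and recall, which is consistent with the values in the table.

```python
def prediction_metrics(tp, fp, tp_promoters, n_promoters):
    """Return (precision, recall, f_score) as percentages.

    tp, fp        -- counts of true and false positive predictions
    tp_promoters  -- promoters with an identified true positive prediction
    n_promoters   -- total number of promoters (sequences) in the class
    """
    # Precision: true positives over all positive predictions.
    precision = 100.0 * tp / (tp + fp)
    # Recall: promoters recovered over all promoters in the class.
    recall = 100.0 * tp_promoters / n_promoters
    # Assumption: F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Counts taken from the ORF row of Table 4.
precision, recall, f_score = prediction_metrics(
    tp=5317, fp=9072, tp_promoters=3934, n_promoters=4912
)
print(f"precision={precision:.1f} recall={recall:.1f} F-score={f_score:.1f}")
```

Note that recall is computed from the "TP promoters" column rather than from TP, because several true positive predictions can fall within the region of a single promoter.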