2018 Mar 14;8:4520. doi: 10.1038/s41598-018-22129-8

Table 4.

Whole genome promoter prediction in S. cerevisiae using “PromPredict”.

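| Transcript class | Number of sequences | Transcript median length | TP | FP | TP promoters | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|---|---|---|---|
| ORF | 4912 | 1548 | 5317 | 9072 | 3934 | 37.0 | 80.1 | 50.6 |
| CUTs | 501 | 428 | 540 | 242 | 404 | 69.1 | 80.6 | 74.4 |
| SUTs | 729 | 964 | 727 | 704 | 552 | 50.8 | 75.7 | 60.8 |
| Other | 300 | 1272 | 296 | 447 | 223 | 39.8 | 74.3 | 51.9 |
| All | 6442 | 1436 | 6880 | 10465 | 5113 | 39.7 | 79.4 | 52.9 |
| TATA and TATA-less promoters | | | | | | | | |
| TATA | 842 | 1384 | 978 | 1298 | 701 | 43.0 | 83.3 | 56.7 |
| TATA-less | 4070 | 1544 | 4206 | 7529 | 3139 | 35.8 | 77.1 | 48.9 |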

Promoter prediction was carried out for all 16 chromosomes on both the forward and reverse strands. The region from −500 to +100 relative to the transcript start was taken as the true positive region. The performance of PromPredict was evaluated using precision, recall, and F-score. Precision is the ratio of the number of true positive predictions (TP) to the sum of true positive and false positive predictions (TP + FP), while recall is the ratio of the number of promoters with at least one true positive prediction (TP promoters) to the total number of promoters. TATA-containing and TATA-less gene promoters are defined by the presence or absence of a TATA-box in the −150 to −1 region relative to the TSS (ref. 22). Recall values of 74.3–80.6% for transcripts of the ORF, non-protein-coding CUT and SUT, and other RNA (tRNA, rRNA, and snoRNA) classes suggest that PromPredict is a good predictor of yeast promoter sequences. The algorithm performs better for TATA-containing gene promoters (F-score 56.7) than for TATA-less promoters (F-score 48.9).
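The definitions in the footnote can be checked against the table with a few lines of Python. This is a minimal sketch, not code from the paper: the function name `prediction_metrics` is hypothetical, and the F-score formula (harmonic mean of precision and recall) is an assumption, since the footnote does not define it; it does, however, agree with the tabulated values for the rows used here.

```python
def prediction_metrics(tp, fp, tp_promoters, n_sequences):
    """Return (precision, recall, f_score) as percentages.

    precision = TP / (TP + FP), as defined in the table footnote.
    recall    = TP promoters / number of sequences, as defined in the footnote.
    f_score   = harmonic mean of precision and recall (assumed definition).
    """
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp_promoters / n_sequences
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Counts copied from three table rows: (TP, FP, TP promoters, number of sequences)
rows = {
    "ORF": (5317, 9072, 3934, 4912),
    "CUTs": (540, 242, 404, 501),
    "TATA": (978, 1298, 701, 842),
}

for name, counts in rows.items():
    p, r, f = prediction_metrics(*counts)
    print(f"{name}: precision={p:.1f} recall={r:.1f} F-score={f:.1f}")
```

For these three rows the printed values round to the precision, recall, and F-score entries in the table (for example 37.0, 80.1, and 50.6 for ORF). Note that recall is computed from the TP promoters column, not from TP, because a single promoter can be hit by more than one true positive prediction.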