Table 3. Evaluation of gene finders with various training genes.
Gene finder | Training genes## | Gene-level SN | Gene-level SP | Exon-level SN | Exon-level SP | Nucleotide-level SN | Nucleotide-level SP | Predicted | Matched$$ | Duplicate++ |
GlimmerHMM | All validated genes except test genes | 0.16 | 0.15 | 0.27 | 0.30 | 0.54 | 0.50 | 684 | 269 (47) | 52 |
| All validated genes including test genes | 0.20 | 0.20 | 0.33 | 0.35 | 0.61 | 0.55 | 710 | 273 (64) | 47 |
| Using a trained model from program creator | Not available for Toxoplasma gondii |
| Using a model trained on human genes | 0.02 | 0.01 | 0.04 | 0.05 | 0.23 | 0.14 | 1129 | 247 (5) | 131 |
SNAP | All validated genes except test genes | 0.18 | 0.12 | 0.44 | 0.33 | 0.46 | 0.35 | 889 | 277 (53) | 172 |
| All validated genes including test genes | 0.18 | 0.12 | 0.46 | 0.35 | 895 | 279 (54) | 170 |
| Using a trained model from program creator | Not available for Toxoplasma gondii |
| Using a model trained on human genes | 0.09 | 0.04 | 0.06 | 0.09 | 0.16 | 0.11 | 1759 | 267 (25) | 315 |
AUGUSTUS | All validated genes except test genes | 0.33 | 0.38 | 0.54 | 0.57 | 0.81 | 0.78 | 510 | 261 (99) | 2 |
| All validated genes including test genes | 0.37 | 0.42 | 0.57 | 0.59 | 0.82 | 0.79 | 514 | 265 (111) | 2 |
| Using a trained model from program creator | 0.36 | 0.42 | 0.57 | 0.56 | 0.78 | 0.84 | 470 | 256 (108) | 0 |
| Using a model trained on human genes | 0.12 | 0.09 | 0.19 | 0.19 | 0.34 | 0.25 | 114 | 282 (37) | 150 |
GeneMark_hmm | Using a trained model from program creator | 0.06 | 0.07 | 0.15 | 0.13 | 0.43 | 0.37 | 580 | 240 (19) | 49 |
GeneMark_hmm ES | Using a self-training procedure, i.e. no training genes required | 0.08 | 0.09 | 0.23 | 0.19 | 0.56 | 0.44 | 630 | 248 (25) | 45 |
## The type of training genes used to train the model. The number of validated genes = 3,432 (including the test genes) and the number of test genes = 299.
$$ Number of predicted genes that align entirely or partly with the test genes and meet the criteria E-value = 0 and 100% coverage. A value in brackets is the number of predicted genes that are exactly the same as the test genes, i.e. the start and end genomic coordinates of each exon are the same as those of the corresponding test gene exon.
++ Number of predicted genes that align to the same test gene, i.e. each predicted gene is only a part of the entire test gene, and there can be one or more such predictions per test gene.
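
The exact-match criterion used for the bracketed values in the Matched column (identical start and end coordinates for every exon) can be illustrated with a minimal sketch. The representation of a gene as a list of (start, end) exon tuples and the function name `is_exact_match` are assumptions made for illustration only; the sketch ignores strand and contig, and it is not the procedure the authors used.

```python
from typing import List, Tuple

# Hypothetical representation: one exon = (start, end) genomic coordinates.
Exon = Tuple[int, int]


def is_exact_match(predicted: List[Exon], test: List[Exon]) -> bool:
    """Return True when both genes have the same number of exons and
    every exon has identical start and end coordinates."""
    # Sorting makes the comparison independent of the order exons are listed in.
    return sorted(predicted) == sorted(test)


# Illustrative coordinates only, not data from the table.
test_gene = [(100, 250), (400, 520)]
exact_prediction = [(100, 250), (400, 520)]
shifted_prediction = [(100, 250), (400, 530)]  # last exon ends 10 bp later

print(is_exact_match(exact_prediction, test_gene))    # True
print(is_exact_match(shifted_prediction, test_gene))  # False
```

Under this criterion a prediction that overlaps a test gene but differs at a single exon boundary would count as matched but not as exact.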