Table 2.
Reference Organism | Performance Category | Ab Initio: Augustus | Ab Initio: GeneMark | Ab Initio: SNAP | MAKER: Augustus | MAKER: GeneMark | MAKER: SNAP |
---|---|---|---|---|---|---|---|
A. thaliana | Nucleotide Accuracy | 57.85% | 48.62% | 43.84% | 68.56% | 57.96% | 73.77% |
| Exon Accuracy | 30.71% | 16.51% | 18.58% | 53.31% | 28.87% | 60.11% |
D. melanogaster | Nucleotide Accuracy | 67.47% | 66.51% | 48.92% | 73.78% | 72.83% | 74.44% |
| Exon Accuracy | 30.62% | 26.25% | 19.94% | 43.10% | 39.74% | 53.69% |
C. elegans | Nucleotide Accuracy | 66.18% | 67.26% | 68.24% | 74.32% | 71.92% | 85.02% |
| Exon Accuracy | 28.33% | 30.01% | 35.44% | 38.52% | 39.42% | 63.14% |
The effect of limited or insufficient training data on ab initio gene prediction is simulated by providing Augustus, GeneMark, and SNAP with incorrect species parameter files: the A. thaliana parameters were used to produce gene models for C. elegans and D. melanogaster, and the C. elegans parameters were used to produce gene models for A. thaliana. The same predictors, when run as part of the MAKER2 gene annotation pipeline, perform substantially better at both the nucleotide and exon level, even with the same incorrect species parameter files.
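
To make "substantially better" concrete, the exon-level improvement can be expressed in percentage points (MAKER value minus ab initio value). The following is a minimal sketch that only re-uses the exon accuracy figures from Table 2; the names `exon_accuracy` and `percentage_point_gain` are illustrative and are not part of MAKER2 or of any of the predictors.

```python
# Exon accuracy (%) copied from Table 2, stored as
# (ab initio prediction, same predictor run inside MAKER).
exon_accuracy = {
    "A. thaliana": {
        "Augustus": (30.71, 53.31),
        "GeneMark": (16.51, 28.87),
        "SNAP": (18.58, 60.11),
    },
    "D. melanogaster": {
        "Augustus": (30.62, 43.10),
        "GeneMark": (26.25, 39.74),
        "SNAP": (19.94, 53.69),
    },
    "C. elegans": {
        "Augustus": (28.33, 38.52),
        "GeneMark": (30.01, 39.42),
        "SNAP": (35.44, 63.14),
    },
}


def percentage_point_gain(table):
    """Return MAKER minus ab initio accuracy, in percentage points."""
    return {
        organism: {
            predictor: round(maker - ab_initio, 2)
            for predictor, (ab_initio, maker) in predictors.items()
        }
        for organism, predictors in table.items()
    }


exon_gain = percentage_point_gain(exon_accuracy)

for organism, predictors in exon_gain.items():
    for predictor, gain in predictors.items():
        print(f"{organism:16s} {predictor:9s} +{gain:.2f} points")
```

With the values in Table 2, every predictor gains exon accuracy in every organism, from 9.41 points (GeneMark on C. elegans) to 41.53 points (SNAP on A. thaliana).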