Table 1. Types of sequence curation anomaly
| Name | Description of the anomaly | Score |
|---|---|---|
| UNMATCHED_RST5 | 5′ RACE tags that are not near the 5′-end of a CDS | 5 |
| UNMATCHED_TWINSCAN | Twinscan predicted exons that do not overlap any CDS exons | 1 |
| UNMATCHED_GENEFINDER | Genefinder predicted exons that do not overlap any CDS exons | 1 |
| JIGSAW_DIFFERS_FROM_CDS | Predicted jigsaw exons that differ from the CDS exons | 1 |
| CDS_DIFFERS_FROM_JIGSAW | CDS exons that do not overlap exons predicted by the program jigsaw | 1 |
| UNMATCHED_WABA | WABA well-conserved coding regions that do not match any CDS exons | Logarithm of the WABA score |
| OVERLAPPING_EXONS | CDS exons that overlap a CDS exon or any other sort of gene in the opposite sense | 5 |
| SHORT_EXONS | CDS exons shorter than 30 bases | 1 |
| LONG_EXONS | CDS exons longer than 20 000 bases | 1 |
| SHORT_INTRONS | CDS introns shorter than 25 bases | 1 |
| REPEAT_OVERLAPS_EXON | CDS exons that substantially overlap RepeatMasked regions | 1 |
| INTRONS_IN_UTR | UTRs which have three or more exons | 1 |
| SPLIT_GENE_BY_TWINSCAN | CDSs that overlap two or more Twinscan predictions, indicating that they should be split | 1 |
| UNMATCHED_EST | EST alignments with no matching CDS exons or pseudogenes or transposons or repeats | 1 |
| UNMATCHED_MASS_SPEC_PEPTIDE | Mass spectrometry peptide positions that are no longer completely covered by a CDS exon or transposon | 10 |
| EST_OVERLAPS_INTRON | CDS introns (excluding ones from isoforms) that are completely covered by an aligned EST or other transcript alignment | 5 |
| UNMATCHED_EXPRESSION | Tiling array highly expressed regions that do not match a CDS | 10 |
| UNCONFIRMED_INTRON | Introns of EST/mRNA alignments that do not exactly match CDS introns and which do not overlap with pseudogenes, etc. | 10 |
| WEAK_INTRON_SPLICE_SITE | Splice sites of CDS introns that have weak scores | 1 |
| UNMATCHED_PROTEIN | BLASTX protein alignments to the genome which do not overlap CDS exons or pseudogenes or transposons, etc. | Logarithm of the BLASTX score |
| UNMATCHED_EST | EST/mRNA alignments with no matching CDS exons or pseudogenes or transposons | 3 |
| FRAMESHIFTED_PROTEIN | BLASTX protein alignments to the genome that indicate an apparent frameshift | Logarithm of the BLASTX score |
| MERGE_GENES_BY_PROTEIN | BLASTX protein alignments to the genome that overlap two genes, indicating that the genes should be merged | Logarithm of the BLASTX score |
| NOT_PREDICTED_BY_MGENE | The curated CDS is not predicted by mGene | 2 |
| NOVEL_MGENE_PREDICTION | mGene predicts a CDS which does not overlap with a curated CDS | 2 |
| UNMATCHED_MGENE | mGene predicted exons that do not overlap any CDS exons | 2 |
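
The scoring scheme in Table 1 mixes fixed scores with scores derived from the logarithm of an alignment score. The following Python sketch illustrates how such a scheme could be evaluated. It is not the pipeline's actual code: the function names, the base-10 logarithm (the table does not state the base), the summing of scores over a region, and the choice of a subset of anomaly types are all assumptions made for illustration.

```python
import math

# Fixed scores for a subset of the anomaly types in Table 1.
FIXED_SCORES = {
    "UNMATCHED_RST5": 5,
    "UNMATCHED_TWINSCAN": 1,
    "OVERLAPPING_EXONS": 5,
    "SHORT_EXONS": 1,
    "UNMATCHED_MASS_SPEC_PEPTIDE": 10,
    "UNCONFIRMED_INTRON": 10,
    "NOT_PREDICTED_BY_MGENE": 2,
}

# Anomaly types that Table 1 scores as the logarithm of an alignment score.
LOG_SCORED = {
    "UNMATCHED_WABA",
    "UNMATCHED_PROTEIN",
    "FRAMESHIFTED_PROTEIN",
    "MERGE_GENES_BY_PROTEIN",
}


def anomaly_score(name, alignment_score=None):
    """Return the score of one anomaly (hypothetical helper).

    Assumption: log-scored anomalies use a base-10 logarithm.
    """
    if name in LOG_SCORED:
        if alignment_score is None or alignment_score <= 0:
            raise ValueError(f"{name} needs a positive alignment score")
        return math.log10(alignment_score)
    return float(FIXED_SCORES[name])


def region_score(anomalies):
    """Sum the scores of (name, alignment_score) pairs.

    Assumption: anomalies in a region are combined by simple addition.
    """
    return sum(anomaly_score(name, score) for name, score in anomalies)
```

For example, a region containing one UNMATCHED_RST5 anomaly, one UNMATCHED_MASS_SPEC_PEPTIDE anomaly and one UNMATCHED_PROTEIN anomaly with a BLASTX score of 100 would be passed as `region_score([("UNMATCHED_RST5", None), ("UNMATCHED_MASS_SPEC_PEPTIDE", None), ("UNMATCHED_PROTEIN", 100)])`.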