Table 1. Sequences were identified for 275 putative orphan enzymes, most frequently by fixing database errors.

| Category | Count | Percentage |
|---|---|---|
| A. Orphans with new sequence data from literature-based method | | |
| Putative orphan enzymes (POEs) | 1,122 | |
| Orphan enzymes with new sequence data | 275 | 24.51% |
| Remaining orphan enzymes | 847 | 75.49% |
| B. Characterization of 275 identified orphan enzymes | | |
| Missing annotation updates | 135 | 49.09% |
| Sequence annotated to class level (e.g. “aldehyde reductase”) | 49 | 17.69% |
| Sequence needed addition of new enzyme activity to a properly annotated existing enzyme | 36 | 13.00% |
| Sequence lacked assigned activities | 29 | 10.47% |
| Sequence annotated with similar activity within same class as orphan activity | 21 | 7.58% |
| Data labeling errors | 140 | 50.91% |
| Sequence lacked EC number | 109 | 39.35% |
| Sequence lacked synonymous names | 28 | 10.11% |
| Sequence misannotated with incorrect EC number | 3 | 1.08% |
| C. How the 275 orphan sequences were found | | |
| Literature and database review | 268 | 97.45% |
| BLASTing with N-terminal sequence data | 5 | 1.82% |
| Molecular weight data compared against a sequenced genome | 2 | 0.73% |

Sequences were found for 275 putative orphan enzymes (about 25% of the total) by searching the literature, sequence databases, and patents. Approximately half of the orphans for which sequences could be found were “annotation updates”: the enzyme's sequence was present in major databases but was annotated with no activity, with a less specific activity, with an incorrect activity in the same general class as the correct one, or with a different correct activity that the enzyme also carries out. The remaining orphans fell into the “data labeling error” category. Orphan enzymes in this category were already linked to sequence data in some form, but a missing or incorrect EC number, or a nomenclature mismatch such as a missing synonymous name, meant that searching these databases for the activity did not return any sequence data.
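As a quick consistency check on the summary arithmetic in Table 1, the Python sketch below recomputes the headline percentages from the raw counts and confirms that the two categories (annotation updates and data labeling errors) together account for all 275 identified orphans, as do the three search methods. Only the counts come from the table; the dictionary keys, the function names `pct` and `summarize`, and the rounding to two decimals are illustrative assumptions, not part of the study's own workflow.

```python
# Counts are copied from Table 1; the dictionary keys are hypothetical
# shorthand labels chosen for this sketch, not names used in the study.
TOTAL_POES = 1122
IDENTIFIED = 275

ANNOTATION_UPDATES = {
    "annotated to class level": 49,
    "needed new activity added": 36,
    "lacked assigned activities": 29,
    "similar activity in same class": 21,
}
DATA_LABELING_ERRORS = {
    "lacked EC number": 109,
    "lacked synonymous names": 28,
    "incorrect EC number": 3,
}
SEARCH_METHODS = {
    "literature and database review": 268,
    "BLAST with N-terminal sequence": 5,
    "molecular weight vs. genome": 2,
}


def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def summarize() -> dict[str, float]:
    """Recompute the Table 1 summary percentages from the raw counts."""
    updates = sum(ANNOTATION_UPDATES.values())
    labeling = sum(DATA_LABELING_ERRORS.values())
    # Each identified orphan belongs to exactly one of the two categories
    # and was found by exactly one search method.
    assert updates + labeling == IDENTIFIED
    assert sum(SEARCH_METHODS.values()) == IDENTIFIED
    return {
        "identified": pct(IDENTIFIED, TOTAL_POES),
        "remaining": pct(TOTAL_POES - IDENTIFIED, TOTAL_POES),
        "annotation updates": pct(updates, IDENTIFIED),
        "data labeling errors": pct(labeling, IDENTIFIED),
        **{method: pct(n, IDENTIFIED) for method, n in SEARCH_METHODS.items()},
    }


if __name__ == "__main__":
    for label, share in summarize().items():
        print(f"{label}: {share:.2f}%")
```

Running the script prints one line per category; the two category shares come out at 49.09% and 50.91%, matching the subtotal rows in section B.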