2014 May 14;9(5):e97250. doi: 10.1371/journal.pone.0097250

Table 1. Sequences were identified for 275 putative orphan enzymes, most frequently by fixing database errors.

A. Orphans with new sequence data from literature-based method

| Category | Count | Percentage |
| --- | --- | --- |
| Putative orphan enzymes (POEs) | 1,122 | |
| Orphan enzymes with new sequence data | 275 | 24.51% |
| Remaining orphan enzymes | 847 | 75.49% |

B. Characterization of 275 identified orphan enzymes

| Category | Count | Percentage |
| --- | --- | --- |
| Missing annotation updates | | 49.09% |
| Sequence annotated to class level (e.g. “aldehyde reductase”) | 49 | 17.69% |
| Sequence needed addition of new enzyme activity to a properly annotated existing enzyme | 36 | 13.00% |
| Sequence lacked assigned activities | 29 | 10.47% |
| Sequence annotated with similar activity within same class as orphan activity | 21 | 7.58% |
| Data labeling errors | | 50.91% |
| Sequence lacked EC number | 109 | 39.35% |
| Sequence lacked synonymous names | 28 | 10.11% |
| Sequence misannotated with incorrect EC number | 3 | 1.08% |

C. How 275 orphan sequences were found

| Method | Count | Percentage |
| --- | --- | --- |
| Literature and database review | 268 | 97.45% |
| BLASTing with N-terminal sequence data | 5 | 1.82% |
| Molecular weight data compared to sequenced genome | 2 | 0.73% |

Sequences were found for 275 putative orphan enzymes (25% of the total) by searching the literature, sequence databases, and patents. Approximately half of these were “annotation updates”: the sequence for the enzyme in major databases was annotated with no activity, with a less specific activity, with an incorrect activity in the same general class as the correct one, or with another correct activity that the enzyme also carries out. The remaining orphans fell into the “data inconsistency” category (listed in the table as data labeling errors). Orphan enzymes in this category were linked to sequence data in some way, but a missing EC number or a nomenclature mismatch meant that searching these databases for the activity did not return any sequence data.
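
The headline fractions can be checked directly from the counts in the table. The following is a minimal Python sketch of that arithmetic, not code from the paper; the dictionary keys, variable names, and the `pct` helper are illustrative assumptions, and only the counts come from the table.

```python
# Counts copied from Table 1. All names below are illustrative, not from the paper.
TOTAL_POES = 1122  # putative orphan enzymes (part A)

# Part B: orphans whose database sequence needed an annotation update
annotation_updates = {
    "annotated to class level": 49,
    "needed additional activity": 36,
    "lacked assigned activities": 29,
    "similar activity in same class": 21,
}

# Part B: orphans affected by data labeling errors
labeling_errors = {
    "lacked EC number": 109,
    "lacked synonymous names": 28,
    "incorrect EC number": 3,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


n_updates = sum(annotation_updates.values())
n_labeling = sum(labeling_errors.values())
n_found = n_updates + n_labeling  # orphans with new sequence data

summary = {
    "found": n_found,
    "found_pct": pct(n_found, TOTAL_POES),
    "annotation_updates_pct": pct(n_updates, n_found),
    "labeling_errors_pct": pct(n_labeling, n_found),
}
print(summary)
```

Under these assumptions, the two category subtotals sum to the 275 identified orphans, and the computed shares correspond to the "25% of the total" and "approximately half" statements in the legend.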