PLoS One. 2014 May 14;9(5):e97250. doi: 10.1371/journal.pone.0097250

Table 1. Sequences were identified for 275 putative orphan enzymes, most frequently by fixing database errors.

A. Orphans with new sequence data from literature-based method
*Putative orphan enzymes (POEs)*	*1,122*
Orphan enzymes with new sequence data	275	24.51%
Remaining orphan enzymes	847	75.49%
B. Characterization of 275 identified orphan enzymes
*Missing annotation updates*		*49.09%*
Sequence annotated to class level (e.g. “aldehyde reductase”)	49	17.69%
Sequence needed addition of new enzyme activity to a properly annotated existing enzyme	36	13.00%
Sequence lacked assigned activities	29	10.47%
Sequence annotated with similar activity within same class as orphan activity	21	7.58%
*Data labeling errors*		*50.91%*
Sequence lacked EC number	109	39.35%
Sequence lacked synonymous names	28	10.11%
Sequence misannotated with incorrect EC number	3	1.08%
C. How 275 orphan sequences were found
Literature and database review	268	97.45%
BLASTing with N-terminal sequence data	5	1.82%
Molecular weight data compared to sequenced genome	2	0.73%

Sequences were found for 275 putative orphan enzymes (25% of the total) by searching the literature, sequence databases, and patents. Approximately half of these were “annotation updates,” in which the enzyme's sequence in major databases was annotated with no activity, with a less specific activity, with an incorrect activity in the same general class as the correct one, or with a different, correct activity that the enzyme also carries out. The remaining orphans fell into the “data inconsistency” category (listed as “Data labeling errors” in section B of the table). These enzymes were linked to sequence data in some way, but a missing EC number or a nomenclature mismatch meant that searching the databases for the activity returned no sequence data.
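The headline shares in the table can be recomputed from the counts. The following is a minimal sketch, not part of the original analysis: the counts are transcribed from Table 1, while the dictionary keys and the function names `percent` and `summarize` are illustrative assumptions. It checks that the two groups in section B add up to the 275 identified orphans and reproduces the group-level and section-level percentages.

```python
# Counts transcribed from Table 1. All names below are illustrative,
# not taken from the paper.
TOTAL_POES = 1122   # putative orphan enzymes (section A)
IDENTIFIED = 275    # orphans with new sequence data (section A)

# Section B, group "Missing annotation updates"
ANNOTATION_UPDATES = {
    "annotated to class level": 49,
    "needed new activity on existing enzyme": 36,
    "lacked assigned activities": 29,
    "similar activity in same class": 21,
}

# Section B, group "Data labeling errors"
LABELING_ERRORS = {
    "lacked EC number": 109,
    "lacked synonymous names": 28,
    "incorrect EC number": 3,
}

# Section C, how the sequences were found
DISCOVERY_METHODS = {
    "literature and database review": 268,
    "BLAST with N-terminal sequence data": 5,
    "molecular weight vs. sequenced genome": 2,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def summarize():
    """Recompute the group-level shares reported in the table."""
    updates = sum(ANNOTATION_UPDATES.values())
    labeling = sum(LABELING_ERRORS.values())
    return {
        "identified_pct": percent(IDENTIFIED, TOTAL_POES),
        "remaining_pct": percent(TOTAL_POES - IDENTIFIED, TOTAL_POES),
        "section_b_total": updates + labeling,
        "annotation_updates_pct": percent(updates, IDENTIFIED),
        "labeling_errors_pct": percent(labeling, IDENTIFIED),
        "methods_pct": {
            name: percent(count, IDENTIFIED)
            for name, count in DISCOVERY_METHODS.items()
        },
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Only group-level and section-level shares are recomputed here; the per-row percentages in section B are left as published.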