Table 1. Distribution of the six enzyme classes (EC 1 to EC 6) before and after CD-HIT clustering (threshold 0.65).
Dataset | EC 1 | EC 2 | EC 3 | EC 4 | EC 5 | EC 6 | Total |
---|---|---|---|---|---|---|---|
original data | 32958 | 82735 | 38611 | 22754 | 14096 | 23221 | 214375 |
after duplicate-elimination | 32016 | 79144 | 36862 | 22421 | 13872 | 23115 | 207430 |
after CD-HIT | 8781 | 23716 | 11994 | 5331 | 4037 | 5904 | 59763 |
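
The arithmetic behind Table 1 can be checked with a short sketch. The counts below are copied from the table. The variable names and the per-class "retention" ratio (sequences kept by CD-HIT divided by sequences remaining after duplicate elimination) are illustrative assumptions, not quantities defined in the text.

```python
# Counts copied from Table 1; list order is EC 1 .. EC 6.
counts = {
    "original data": [32958, 82735, 38611, 22754, 14096, 23221],
    "after duplicate-elimination": [32016, 79144, 36862, 22421, 13872, 23115],
    "after CD-HIT": [8781, 23716, 11994, 5331, 4037, 5904],
}

# "Total" column as reported in Table 1.
reported_totals = {
    "original data": 214375,
    "after duplicate-elimination": 207430,
    "after CD-HIT": 59763,
}

# Each row should sum to its reported total.
totals = {name: sum(row) for name, row in counts.items()}
assert totals == reported_totals

# Illustrative metric (an assumption, not from the source):
# share of deduplicated sequences that survive CD-HIT, per class.
dedup = counts["after duplicate-elimination"]
kept = counts["after CD-HIT"]
retention = [k / d for k, d in zip(kept, dedup)]
overall = totals["after CD-HIT"] / totals["after duplicate-elimination"]

for ec, r in enumerate(retention, start=1):
    print(f"EC {ec}: {r:.1%} retained")
print(f"Total: {overall:.1%} retained")
```

Running the sketch confirms that every row sums to its reported total and prints the fraction of each class that remains after CD-HIT.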