Table 1. Number of protein sequences remaining after redundancy removal with CD-HIT at sequence identity thresholds of 90%, 70% and 40%.
Redundancy cutoff | Positive Dataset (208) | Negative Dataset (1321) |
90% | 142 | 949 |
70% | 118 | 815 |
40% | 66 | 555 |