Table 3.
Cluster ID | #sequences | #non-redundant sequences | Description |
CAM_CL_2057 | 20,508 | 24 | Reverse transcriptase (HIV) |
CAM_CL_1132 | 18,882 | 1,406 | Cytochrome c oxidase subunit I |
CAM_CL_2568 | 15,405 | 6,091 | ABC transporter |
CAM_CL_4367 | 15,228 | 771 | Cytochrome b |
CAM_CL_49 | 14,751 | 7,389 | Short-chain dehydrogenase |
CAM_CL_3510 | 13,255 | 5,173 | Immunoglobulin |
CAM_CL_2630 | 13,140 | 3,297 | Envelope glycoprotein |
CAM_CL_160 | 13,054 | 3,897 | Kinases |
CAM_CL_4556 | 12,403 | 6,345 | Response regulator |
CAM_CL_481 | 12,078 | 5,477 | Transcription regulator |
Comparing column 3 (non-redundant sequences) with column 2 (total sequences) indicates the extent of redundancy in the PANDA set.
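To make the comparison concrete, the following is a minimal sketch that computes a per-cluster redundancy fraction from the values in Table 3. The definition used here, 1 - (non-redundant / total), and the names `TABLE_3`, `redundancy_fraction`, and `ranked` are assumptions made for illustration; the source does not specify how redundancy was measured beyond the two counts.

```python
# Rows copied from Table 3: (cluster ID, total sequences, non-redundant sequences).
TABLE_3 = [
    ("CAM_CL_2057", 20508, 24),
    ("CAM_CL_1132", 18882, 1406),
    ("CAM_CL_2568", 15405, 6091),
    ("CAM_CL_4367", 15228, 771),
    ("CAM_CL_49", 14751, 7389),
    ("CAM_CL_3510", 13255, 5173),
    ("CAM_CL_2630", 13140, 3297),
    ("CAM_CL_160", 13054, 3897),
    ("CAM_CL_4556", 12403, 6345),
    ("CAM_CL_481", 12078, 5477),
]


def redundancy_fraction(total: int, non_redundant: int) -> float:
    """Assumed definition: share of a cluster's sequences that are redundant."""
    return 1.0 - non_redundant / total


# Order clusters from most to least redundant under the assumed definition.
ranked = sorted(
    TABLE_3,
    key=lambda row: redundancy_fraction(row[1], row[2]),
    reverse=True,
)

for cluster_id, total, non_redundant in ranked:
    fraction = redundancy_fraction(total, non_redundant)
    print(f"{cluster_id}: {fraction:.1%} redundant")
```

Under this assumed definition, CAM_CL_2057 ranks first, since only 24 of its 20,508 sequences are non-redundant, while CAM_CL_4556 ranks last.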