Table 1.
Item | Description | No. of Protein Sequences | Dataset Size | % of Dataset |
---|---|---|---|---|
A | Proteins with SCL annotation in UniProt database | 274730 | 494762 | 55.52 |
B | Proteins in A with experimentally known SCL | 55079 | 494762 | 11.13 |
C | Proteins in A with uncertain terms such as potential/probable/similarity | 219651 | 494762 | 44.39 |
D | Proteins with GO annotation | 461365 | 494762 | 93.24 |
E | Proteins with SCL annotation in GO database | 337762 | 494762 | 68.26 |
F | UniProt human entries with experimentally known SCL | 6923 | 20274 | 34.14 |
G | UniProt human entries with uncertain terms such as potential/probable/similarity | 7486 | 20274 | 36.92 |
Distribution of 494762 protein entries from the UniProtKB/Swiss-Prot* database (version 57.9) according to their SCL annotation and GO database reference.
* The original number of UniProt protein entries was 510076. Of these, 15314 were annotated as "fragment" or contained fewer than 50 amino acid residues and were therefore removed from further consideration, leaving 494762 entries. Similarly, only 20274 of the 20334 human protein sequences were considered.
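The arithmetic behind Table 1 can be checked with a short sketch. The counts and reported percentages below are copied from the table; the names `ROWS` and `truncated_percent` are illustrative only. The sketch assumes the percentage column was truncated, not rounded, to two decimals. That is an inference from the numbers and is not stated in the source.

```python
# Counts, dataset sizes and reported percentages copied from Table 1.
ROWS = {
    "A": (274730, 494762, "55.52"),
    "B": (55079, 494762, "11.13"),
    "C": (219651, 494762, "44.39"),
    "D": (461365, 494762, "93.24"),
    "E": (337762, 494762, "68.26"),
    "F": (6923, 20274, "34.14"),
    "G": (7486, 20274, "36.92"),
}


def truncated_percent(count: int, total: int) -> str:
    """Return count/total as a percentage truncated to two decimals.

    Integer arithmetic avoids floating-point surprises. Truncation
    (not rounding) is an assumption inferred from the table values.
    """
    hundredths = count * 10000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# Every reported percentage matches the truncated ratio.
for item, (count, total, reported) in ROWS.items():
    assert truncated_percent(count, total) == reported, item

# Rows B and C partition row A.
assert ROWS["B"][0] + ROWS["C"][0] == ROWS["A"][0]

# Footnote: removing the 15314 short or fragment entries from the
# original 510076 gives the dataset size used in the table.
assert 510076 - 15314 == 494762
```

Under conventional rounding some entries would differ in the last digit (row A, for example, would read 55.53), so a reader who recomputes the column with rounding should expect small discrepancies of 0.01.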