Table 2.
Number of protein sequences per subcellular location in the benchmark dataset of 4802 sequences created by Wei et al.
| Subcellular location | Number of protein sequences |
|---|---|
| Nucleus | 1720 |
| Cytoplasm | 1050 |
| Plasma membrane | 836 |
| Extracellular | 487 |
| Mitochondria | 407 |
| Endosome | 342 |
| Golgi apparatus | 272 |
| Nucleolus | 268 |
| Lysosomes | 125 |
| Endoplasmic reticulum | 120 |
| Cytoskeleton | 89 |
| Centrosome | 81 |
| Peroxisome | 67 |
| Early endosomes | 52 |
| Nuclear envelope | 47 |
| Cytoplasmic vesicles | 46 |
| Basolateral plasma membrane | 29 |
| Synaptic vesicles | 28 |
| Microtubule | 26 |
| Apical plasma membrane | 16 |
| Late endosomes | 16 |
| Golgi trans face | 11 |
| Secretory granule | 10 |
| Tight junction | 9 |
| Golgi cis cisterna | 7 |
| Medial-Golgi | 7 |
| Melanosome | 6 |
| Secretory vesicles | 5 |
| Cellular component | 4 |
| ERGIC | 4 |
| Inner mitochondrial membrane | 4 |
| Transport vesicle | 4 |
| Golgi trans cisterna | 3 |
| Total | 6198 |
In this dataset, 4802 protein sequences are annotated across 33 subcellular locations. The first column gives the name of each subcellular location covered by the dataset, and the second column gives the number of proteins found at that location. The location counts sum to 6198 rather than 4802 because a single sequence can be found in multiple subcellular locations. The sequences are distributed unevenly across the 33 locations: 35.8% of the sequences are located in the nucleus and 21.9% in the cytoplasm, while only 3 sequences are identified in the Golgi trans cisterna. The number of positive cases (sequence-location pairs) in this dataset is therefore 6198, and the positive case rate is 3.9% (6198/(4802 × 33)).
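The figures quoted in this note can be re-derived from the table with a few lines of arithmetic. The following is a minimal sketch rather than code from the original study: the counts are copied from Table 2, and the variable names (`counts`, `n_sequences`, `positive_rate_pct`, and so on) are illustrative assumptions.

```python
# Per-location counts copied from Table 2, in table order (33 locations).
# All names below are illustrative; they do not come from the original study.
counts = [
    1720, 1050, 836, 487, 407, 342, 272, 268, 125, 120, 89,
    81, 67, 52, 47, 46, 29, 28, 26, 16, 16, 11,
    10, 9, 7, 7, 6, 5, 4, 4, 4, 4, 3,
]

n_sequences = 4802          # distinct protein sequences in the dataset
n_locations = len(counts)   # number of subcellular locations (33)

# Each (sequence, location) annotation is one positive case.
total_positives = sum(counts)  # 6198

# Share of sequences annotated with the two largest classes.
nucleus_pct = round(100 * counts[0] / n_sequences, 1)    # 35.8
cytoplasm_pct = round(100 * counts[1] / n_sequences, 1)  # 21.9

# Positive case rate: positives over all possible sequence-location pairs.
positive_rate_pct = round(
    100 * total_positives / (n_sequences * n_locations), 1
)  # 3.9

# Average number of location labels per sequence.
labels_per_sequence = round(total_positives / n_sequences, 2)  # 1.29
```

The last value, about 1.29 labels per sequence, restates why the column total exceeds the number of sequences: on average each protein carries slightly more than one location label.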