2021 Oct 22;22(Suppl 10):515. doi: 10.1186/s12859-021-04404-0

Table 2.

Number of protein sequences per subcellular location in the 4802-sequence benchmark dataset created by Wei et al.

| Subcellular location | Number of protein sequences |
| --- | --- |
| Nucleus | 1720 |
| Cytoplasm | 1050 |
| Plasma membrane | 836 |
| Extracellular | 487 |
| Mitochondria | 407 |
| Endosome | 342 |
| Golgi apparatus | 272 |
| Nucleolus | 268 |
| Lysosomes | 125 |
| Endoplasmic reticulum | 120 |
| Cytoskeleton | 89 |
| Centrosome | 81 |
| Peroxisome | 67 |
| Early endosomes | 52 |
| Nuclear envelope | 47 |
| Cytoplasmic vesicles | 46 |
| Basolateral plasma membrane | 29 |
| Synaptic vesicles | 28 |
| Microtubule | 26 |
| Apical plasma membrane | 16 |
| Late endosomes | 16 |
| Golgi trans face | 11 |
| Secretory granule | 10 |
| Tight junction | 9 |
| Golgi cis cisterna | 7 |
| Medial-golgi | 7 |
| Melanosome | 6 |
| Secretory vesicles | 5 |
| Cellular component | 4 |
| ERGIC | 4 |
| Inner mitochondrial membrane | 4 |
| Transport vesicle | 4 |
| Golgi trans cisterna | 3 |
| Total | 6198 |

In this dataset, 4802 protein sequences are annotated with 33 subcellular locations. The first column gives the name of each subcellular location covered by the dataset, and the second column gives the number of protein sequences found at that location. The column total is 6198 rather than 4802 because a sequence can be found in more than one subcellular location, so 6198 counts sequence-location assignments, not distinct sequences or locations. The sequences are distributed unevenly across the 33 locations: 35.8% of the sequences are located in the Nucleus and 21.9% in the Cytoplasm, while only 3 sequences are located in the Golgi trans cisterna. The dataset contains 6198 positive cases, giving a positive case rate of 3.9% (6198/(4802 × 33)).
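The figures quoted in this note can be rechecked directly from the table. The following is a minimal sketch, not code from the paper: the dictionary `counts` is simply the table transcribed by hand, and the variable names (`n_sequences`, `positive_rate`, `labels_per_sequence`) are illustrative. The average number of locations per sequence is an extra derived quantity that the note itself does not report.

```python
# Per-location sequence counts, transcribed from Table 2.
counts = {
    "Nucleus": 1720, "Cytoplasm": 1050, "Plasma membrane": 836,
    "Extracellular": 487, "Mitochondria": 407, "Endosome": 342,
    "Golgi apparatus": 272, "Nucleolus": 268, "Lysosomes": 125,
    "Endoplasmic reticulum": 120, "Cytoskeleton": 89, "Centrosome": 81,
    "Peroxisome": 67, "Early endosomes": 52, "Nuclear envelope": 47,
    "Cytoplasmic vesicles": 46, "Basolateral plasma membrane": 29,
    "Synaptic vesicles": 28, "Microtubule": 26,
    "Apical plasma membrane": 16, "Late endosomes": 16,
    "Golgi trans face": 11, "Secretory granule": 10, "Tight junction": 9,
    "Golgi cis cisterna": 7, "Medial-golgi": 7, "Melanosome": 6,
    "Secretory vesicles": 5, "Cellular component": 4, "ERGIC": 4,
    "Inner mitochondrial membrane": 4, "Transport vesicle": 4,
    "Golgi trans cisterna": 3,
}

n_sequences = 4802                    # distinct protein sequences in the dataset
n_locations = len(counts)             # number of subcellular locations (labels)
total_labels = sum(counts.values())   # sequence-location assignments (positive cases)

# Share of all sequences annotated with a given location.
nucleus_share = counts["Nucleus"] / n_sequences
cytoplasm_share = counts["Cytoplasm"] / n_sequences

# Fraction of the (sequence, location) grid that is positive.
positive_rate = total_labels / (n_sequences * n_locations)

# Derived quantity not stated in the note: mean locations per sequence.
labels_per_sequence = total_labels / n_sequences

print(f"locations: {n_locations}, positive cases: {total_labels}")
print(f"Nucleus: {nucleus_share:.1%}, Cytoplasm: {cytoplasm_share:.1%}")
print(f"positive case rate: {positive_rate:.1%}")
print(f"mean locations per sequence: {labels_per_sequence:.2f}")
```

Because the shares are computed against the 4802 sequences rather than the 6198 assignments, they sum to more than 100% across locations, which is expected for a multi-label dataset.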