Table 7.
UniProt ID | Protein name | Subcellular annotation | Expected classification | Final classification (a) | Misclassification by algorithm (b) | Evidence profile (c) |
---|---|---|---|---|---|---|
Q27298 | SAG1 protein (P30) | Membrane | YES | YES | AB, RF, SVM | Q27298,0,Y,0.297,0.141,M,2,7.30,0.56,0,21.5,Secreted,0.255,0.205,YES |
B0LUH4 | Microneme protein 13 | Unknown | YES | YES | kNN | B0LUH4,0,Y,0.888,0.907,S,1,0.11,0.11,0,29.0,Secreted,0.270,0.355,YES |
P84343 | Peptidyl-prolyl cis-trans isomerase | Unknown | YES | YES | kNN | P84343,0,Y,0.817,0.963,S,1,1.11,1.11,0,29.0,Secreted,0.465,0.536,YES |
Q9U483 | Microneme protein Nc-P38 | Unknown | YES | YES | kNN | Q9U483,0,Y,0.427,0.587,S,4,0.23,0.23,0,30.0,Secreted,0.355,0.1736,YES |
B9PRX5 | Proteasome subunit alpha type | Unknown | YES | YES | RF, SVM | B9PRX5,0,Y,0.250,0.254,M,2,16.81,7.23,0,22.0,Secreted,0.648,0.515,YES |
B9QH60 | Acetyl-CoA carboxylase, putative | Unknown | YES | YES | SVM | B9QH60,1,N,0.322,0.019,M,1,22.02,0.00,1,5.0,Secreted,0.846,0.437,YES |
B6K9N1 | Cytochrome P450 (putative) | Unknown | NO | NO | kNN | B6K9N1,1,N,0.131,0.041,U,2,15.35,0.03,0,5.0,Membrane,0.197,0.480,NO |
B9Q0C2 | Anamorsin homolog | Cytoplasm | NO | NO | kNN | B9Q0C2,0,Y,0.245,0.108,U,4,0.54,0.00,0,20.0,Secreted,0.382,0.210,NO |
B9PK71 | DNA-directed RNA polymerase subunit | Nucleus | NO | NO | NB | B9PK71,0,N,0.188,0.223,U,4,0.00,0.00,0,22.0,Secreted,0.368,0.380,NO |
(a) The final classification combines the predictions of all algorithms by majority vote: the class predicted most often is adopted. Tied votes are resolved as YES (e.g. Q27298).
(b) Each algorithm is executed multiple times on the same input data. An in-house Perl script summarises these runs and reports the percentage of runs in which a protein's predicted classification differs from its expected classification. A protein is regarded as misclassified by an algorithm only if this percentage is 100%, i.e. it is misclassified in every run.
(c) Evidence profile fields, in order: 1 = ID, 2 = Phobius_TM, 3 = Phobius_SP, 4 = SignalP, 5 = TargetP_SP, 6 = TargetP_loc, 7 = TargetP_RC, 8 = TMHMM_AA, 9 = TMHMM_First60, 10 = TMHMM_TM, 11 = WoLF_PSORT, 12 = WoLF_PSORT_annotation, 13 = MHCI, 14 = MHCII, 15 = Expected classification.
Abbreviations: AB = adaptive boosting, RF = random forest, SVM = support vector machine, NB = naive Bayes, kNN = k-nearest neighbour, NN = neural network.
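The majority-vote rule with a YES tie-break described in footnote a can be sketched in a few lines of Python. This is a minimal illustration, not the authors' in-house code: it assumes each algorithm's output has already been reduced to a single YES/NO label per protein, and the function name `final_classification` is hypothetical. The Q27298 example assumes that the six algorithms in the abbreviations list all vote, and that the three not listed as misclassifying it (NB, kNN, NN) predict the expected YES.

```python
from collections import Counter


def final_classification(votes: dict[str, str]) -> str:
    """Majority vote over per-algorithm YES/NO predictions.

    Hypothetical helper: ties are resolved in favour of YES (footnote a).
    """
    counts = Counter(votes.values())
    # Counter returns 0 for labels that never occur.
    return "YES" if counts["YES"] >= counts["NO"] else "NO"


# Assumed votes for Q27298: AB, RF and SVM misclassify it (predict NO);
# the other three algorithms are assumed to predict the expected YES.
q27298_votes = {"AB": "NO", "RF": "NO", "SVM": "NO",
                "NB": "YES", "kNN": "YES", "NN": "YES"}
print(final_classification(q27298_votes))  # YES (3 vs 3 tie resolved as YES)
```

With only two classes, "most frequent class, ties to YES" reduces to a single `>=` comparison, which keeps the tie-break rule explicit.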
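Footnote b's summary of repeated runs amounts to a per-protein error percentage with a strict 100% threshold. The original is an in-house Perl script that is not shown, so the Python below is only a sketch under assumptions: the run predictions for one algorithm and one protein are supplied as a list of YES/NO labels, and the function names and example runs are hypothetical.

```python
def misclassification_rate(predictions: list[str], expected: str) -> float:
    """Percentage of runs whose predicted class differs from the expected class."""
    if not predictions:
        raise ValueError("at least one run is required")
    wrong = sum(1 for p in predictions if p != expected)
    return 100.0 * wrong / len(predictions)


def is_misclassified(predictions: list[str], expected: str) -> bool:
    """Misclassified only if the prediction is wrong in every run (a rate of 100%)."""
    if not predictions:
        raise ValueError("at least one run is required")
    # Checking every run directly avoids comparing floating-point percentages.
    return all(p != expected for p in predictions)


# Hypothetical runs for a protein whose expected classification is YES.
runs = ["NO", "NO", "NO", "NO", "NO"]
print(misclassification_rate(runs, "YES"), is_misclassified(runs, "YES"))  # 100.0 True
mixed = ["NO", "YES", "NO", "NO"]
print(misclassification_rate(mixed, "YES"), is_misclassified(mixed, "YES"))  # 75.0 False
```

Under this rule a protein that is wrong in 75% of runs is still not listed in the "Misclassification by algorithm" column.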
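Each evidence profile is a comma-separated record whose 15 fields follow the order given in footnote c. A minimal Python parser for one record is sketched below. The field names mirror footnote c (with "Expected classification" written as `Expected_classification`). The numeric types are an assumption inferred from the values in Table 7, because the source does not document them, and `parse_evidence_profile` is a hypothetical name.

```python
FIELDS = [
    "ID", "Phobius_TM", "Phobius_SP", "SignalP", "TargetP_SP", "TargetP_loc",
    "TargetP_RC", "TMHMM_AA", "TMHMM_First60", "TMHMM_TM", "WoLF_PSORT",
    "WoLF_PSORT_annotation", "MHCI", "MHCII", "Expected_classification",
]

# Assumed types, inferred from the Table 7 values; unlisted fields stay strings.
CONVERTERS = {
    "Phobius_TM": int, "SignalP": float, "TargetP_SP": float, "TargetP_RC": int,
    "TMHMM_AA": float, "TMHMM_First60": float, "TMHMM_TM": int,
    "WoLF_PSORT": float, "MHCI": float, "MHCII": float,
}


def parse_evidence_profile(line: str) -> dict[str, object]:
    """Split one evidence profile record into named, typed fields."""
    values = [v.strip() for v in line.strip().split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: CONVERTERS.get(name, str)(value)
            for name, value in zip(FIELDS, values)}


row = parse_evidence_profile(
    "Q27298,0,Y,0.297,0.141,M,2,7.30,0.56,0,21.5,Secreted,0.255,0.205,YES")
print(row["TargetP_loc"], row["SignalP"], row["Expected_classification"])  # M 0.297 YES
```

Rejecting records with the wrong field count catches rows where a value (for example the WoLF_PSORT annotation) has been dropped or split.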