2013 Nov 2;14:315. doi: 10.1186/1471-2105-14-315

Table 7.

Description of proteins from the benchmark dataset that were misclassified by at least one machine learning algorithm

| UniProt ID | Protein name | Subcellular annotation | Expected classification | Final classification a | Misclassification by algorithm b | Evidence profile c |
|---|---|---|---|---|---|---|
| Q27298 | SAG1 protein (P30) | Membrane | YES | YES | AB RF SVM | Q27298,0,Y,0.297,0.141,M,2,7.30,0.56,0,21.5,Secreted,0.255,0.205,YES |
| B0LUH4 | Microneme protein 13 | Unknown | YES | YES | kNN | B0LUH4,0,Y,0.888,0.907,S,1,0.11,0.11,0,29.0,Secreted,0.270,0.355,YES |
| P84343 | Peptidyl-prolyl cis-trans isomerase | Unknown | YES | YES | kNN | P84343,0,Y,0.817,0.963,S,1,1.11,1.11,0,29.0,Secreted,0.465,0.536,YES |
| Q9U483 | Microneme protein Nc-P38 | Unknown | YES | YES | kNN | Q9U483,0,Y,0.427,0.587,S,4,0.23,0.23,0,30.0,Secreted,0.355,0.1736,YES |
| B9PRX5 | Proteasome subunit alpha type | Unknown | YES | YES | RF SVM | B9PRX5,0,Y,0.250,0.254,M,2,16.81,7.23,0,22.0,Secreted,0.648,0.515,YES |
| B9QH60 | Acetyl-CoA carboxylase, putative | Unknown | YES | YES | SVM | B9QH60,1,N,0.322,0.019,M,1,22.02,0.00,1,5.0,Secreted,0.846,0.437,YES |
| B6K9N1 | Cytochrome P450 (putative) | Unknown | NO | NO | kNN | B6K9N1,1,N,0.131,0.041,U,2,15.35,0.03,0,5.0,Membrane,0.197,0.480,NO |
| B9Q0C2 | Anamorsin homolog | Cytoplasm | NO | NO | kNN | B9Q0C2,0,Y,0.245,0.108,U,4,0.54,0.00,0,20.0,Secreted,0.382,0.210,NO |
| B9PK71 | DNA-directed RNA polymerase subunit | Nucleus | NO | NO | NB | B9PK71,0,N,0.188,0.223,U,4,0.00,0.00,0,22.0,Secreted,0.368,0.380,NO |

a The final classification takes into account the predictions from every algorithm and adopts the most frequent classification, i.e. a majority-rule approach. Tied votes are resolved as YES, e.g. Q27298.
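
The majority rule in footnote a can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `final_classification` and the list-of-votes input are assumptions, and the example votes for Q27298 assume that the three algorithms not listed as misclassifying it voted YES.

```python
from collections import Counter

def final_classification(votes):
    """Majority vote over per-algorithm YES/NO predictions.

    Hypothetical helper: ties are resolved as YES, as footnote a describes.
    """
    counts = Counter(votes)
    return "YES" if counts["YES"] >= counts["NO"] else "NO"

# Q27298: AB, RF and SVM disagree with the expected YES (assumed NO votes);
# the remaining three algorithms are assumed to vote YES, giving a 3-3 tie.
q27298_votes = ["NO", "NO", "NO", "YES", "YES", "YES"]
print(final_classification(q27298_votes))
```

A tie counts as YES here because the comparison uses `>=`, so a protein is only rejected when NO votes strictly outnumber YES votes.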

b Each algorithm is executed multiple times on the same input data. An in-house Perl script summarises the runs and reports, as a percentage, how often the predicted classification of a protein differs from the expected one. A protein is regarded as misclassified by an algorithm only if that percentage is 100%.
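
The run summary in footnote b might look like the sketch below. It is written in Python rather than Perl and is not the in-house script; the function names and the list-of-predictions input are assumptions made for illustration.

```python
def misclassification_rate(run_predictions, expected):
    """Percentage of runs whose prediction differs from the expected label."""
    if not run_predictions:
        raise ValueError("at least one run is required")
    wrong = sum(1 for p in run_predictions if p != expected)
    return 100.0 * wrong / len(run_predictions)

def is_misclassified(run_predictions, expected):
    """True only if every run disagrees with the expected label (100%)."""
    if not run_predictions:
        raise ValueError("at least one run is required")
    return all(p != expected for p in run_predictions)

# Hypothetical example: four runs of one algorithm on one protein.
runs = ["NO", "NO", "YES", "NO"]
print(misclassification_rate(runs, "YES"))  # 75.0
print(is_misclassified(runs, "YES"))        # False
```

Under this criterion a protein that an algorithm gets wrong in most but not all runs is not reported in the misclassification column.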

c Column headers of the comma-separated evidence profile: 1 = ID, 2 = Phobius_TM, 3 = Phobius_SP, 4 = SignalP, 5 = TargetP_SP, 6 = TargetP_loc, 7 = TargetP_RC, 8 = TMHMM_AA, 9 = TMHMM_First60, 10 = TMHMM_TM, 11 = WoLF_PSORT, 12 = WoLF_PSORT_annotation, 13 = MHCI, 14 = MHCII, 15 = Expected classification.
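
To read an evidence profile, each comma-separated field can be paired with the column header listed in footnote c. The sketch below is an illustration only: the function name `parse_evidence_profile`, the shortened key `Expected`, and the per-column types (inferred from the values in Table 7) are assumptions.

```python
# Column names follow footnote c; the converters are assumptions inferred
# from the values shown in the table.
COLUMNS = [
    ("ID", str), ("Phobius_TM", int), ("Phobius_SP", str), ("SignalP", float),
    ("TargetP_SP", float), ("TargetP_loc", str), ("TargetP_RC", int),
    ("TMHMM_AA", float), ("TMHMM_First60", float), ("TMHMM_TM", int),
    ("WoLF_PSORT", float), ("WoLF_PSORT_annotation", str),
    ("MHCI", float), ("MHCII", float), ("Expected", str),
]

def parse_evidence_profile(line):
    """Split one evidence profile into a dict keyed by column name."""
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {name: convert(value)
            for (name, convert), value in zip(COLUMNS, fields)}

profile = parse_evidence_profile(
    "Q27298,0,Y,0.297,0.141,M,2,7.30,0.56,0,21.5,Secreted,0.255,0.205,YES"
)
print(profile["SignalP"], profile["WoLF_PSORT_annotation"], profile["Expected"])
```

The field-count check guards against rows that were truncated or merged during extraction.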

Abbreviations: AB = adaptive boosting, RF = random forest, SVM = support vector machines, NB = naive Bayes, kNN = k-nearest neighbour, NN = neural network.