Table 5.

| Test dataset | Training dataset | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | T. gondii | | Plasmodium | | C. elegans | | Combined species | | Benchmark | |
| | SN | SP | SN | SP | SN | SP | SN | SP | SN | SP |
| Decision tree a | | | | | | | | | | |
| T. gondii | 1.00 b | 0.81 b | 0.95 | 0.89 | 1.00 | 0.83 | 1.00 | 0.83 | 1.00 | 0.83 |
| Plasmodium | 0.84 | 0.90 | 1.00 b | 1.00 b | 0.85 | 0.96 | 1.00 | 0.92 | 1.00 | 0.98 |
| C. elegans | 0.87 | 0.93 | 1.00 | 0.99 | 1.00 b | 1.00 b | 1.00 | 0.99 | 1.00 | 0.98 |
| Combined species | 0.87 | 0.92 | 1.00 | 0.99 | 0.98 | 0.99 | 1.00 b | 0.98 b | 1.00 | 0.97 |
| Benchmark | 0.86 | 0.91 | 0.97 | 0.96 | 0.96 | 0.96 | 0.97 | 0.91 | 1.00 b | 1.00 b |
| Adaptive boosting a | | | | | | | | | | |
| T. gondii | 0.51 b | 0.06 b | 0.96 | 0.88 | 1.00 | 0.83 | 1.00 | 0.91 | 1.00 | 0.83 |
| Plasmodium | 0.82 | 0.99 | 0.98 b | 0.96 b | 0.95 | 0.96 | 1.00 | 1.00 | 1.00 | 0.98 |
| C. elegans | 0.87 | 0.99 | 1.00 | 1.00 | 1.00 b | 1.00 b | 1.00 | 1.00 | 1.00 | 0.98 |
| Combined species | 0.87 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 b | 0.99 b | 1.00 | 0.98 |
| Benchmark | 0.85 | 0.99 | 0.97 | 0.98 | 0.97 | 0.96 | 0.99 | 0.99 | 0.98 b | 0.97 b |
| Random forest a | | | | | | | | | | |
| T. gondii | 0.97 b | 0.90 b | 1.00 | 0.83 | 1.00 | 0.89 | 1.00 | 1.00 | 1.00 | 0.83 |
| Plasmodium | 0.87 | 1.00 | 0.99 b | 0.99 b | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 |
| C. elegans | 0.83 | 1.00 | 0.98 | 1.00 | 1.00 b | 1.00 b | 1.00 | 1.00 | 1.00 | 1.00 |
| Combined species | 0.84 | 1.00 | 0.98 | 0.99 | 1.00 | 1.00 | 1.00 b | 1.00 b | 1.00 | 0.99 |
| Benchmark | 0.82 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.97 | 0.99 | 0.99 b | 0.99 b |
| k-Nearest neighbour | | | | | | | | | | |
| T. gondii | 0.80 b | 0.83 b | 1.00 | 0.83 | 0.95 | 0.83 | 1.00 | 0.83 | 0.90 | 0.78 |
| Plasmodium | 0.77 | 0.96 | 0.95 b | 0.84 b | 0.88 | 0.96 | 0.99 | 0.94 | 0.81 | 0.96 |
| C. elegans | 0.88 | 0.99 | 0.99 | 0.95 | 0.96 b | 0.98 b | 0.99 | 0.99 | 0.95 | 0.98 |
| Combined species | 0.87 | 0.98 | 0.99 | 0.94 | 0.97 | 0.98 | 0.96 b | 0.97 b | 0.92 | 0.97 |
| Benchmark | 0.93 | 0.96 | 1.00 | 0.90 | 0.96 | 0.96 | 0.96 | 0.97 | 0.98 b | 0.96 b |
| Naive Bayes classifier | | | | | | | | | | |
| T. gondii | 1.00 b | 0.91 b | 1.00 | 0.78 | 1.00 | 0.83 | 1.00 | 0.83 | 1.00 | 0.83 |
| Plasmodium | 0.97 | 0.98 | 0.98 b | 0.99 b | 1.00 | 0.92 | 1.00 | 0.96 | 1.00 | 0.98 |
| C. elegans | 0.87 | 1.00 | 0.92 | 0.95 | 1.00 b | 0.98 b | 0.97 | 0.98 | 1.00 | 0.99 |
| Combined species | 0.89 | 0.99 | 0.93 | 0.95 | 1.00 | 0.97 | 0.98 b | 0.97 b | 1.00 | 0.98 |
| Benchmark | 0.81 | 1.00 | 0.97 | 0.94 | 1.00 | 0.93 | 1.00 | 0.99 | 1.00 b | 1.00 b |
| Neural networks a | | | | | | | | | | |
| T. gondii | 0.98 b | 0.90 b | 0.99 | 0.83 | 1.00 | 0.84 | 1.00 | 0.91 | 0.99 | 0.83 |
| Plasmodium | 0.88 | 0.92 | 0.99 b | 0.89 b | 0.99 | 0.97 | 0.97 | 0.98 | 0.93 | 0.97 |
| C. elegans | 0.83 | 0.99 | 0.92 | 0.98 | 0.99 b | 0.99 b | 1.00 | 1.00 | 0.98 | 0.97 |
| Combined species | 0.91 | 0.96 | 0.93 | 0.98 | 0.99 | 0.98 | 0.99 b | 0.98 b | 0.97 | 0.97 |
| Benchmark | 0.78 | 0.97 | 0.97 | 0.97 | 0.99 | 0.95 | 0.99 | 0.96 | 1.00 b | 0.95 b |
| Support vector machines | | | | | | | | | | |
| T. gondii | 0.83 b | 0.92 b | 0.89 | 1.00 | 0.89 | 0.89 | 1.00 | 0.89 | 1.00 | 0.83 |
| Plasmodium | 0.88 | 0.97 | 0.98 b | 0.98 b | 0.96 | 0.98 | 1.00 | 0.98 | 1.00 | 0.98 |
| C. elegans | 0.83 | 0.89 | 0.98 | 0.99 | 0.94 b | 0.99 b | 0.99 | 1.00 | 0.91 | 0.99 |
| Combined species | 0.84 | 0.91 | 0.98 | 0.98 | 0.99 | 0.99 | 0.92 b | 0.99 b | 0.93 | 0.98 |
| Benchmark | 0.74 | 0.99 | 0.96 | 0.96 | 0.94 | 0.99 | 0.96 | 1.00 | 0.83 b | 0.92 b |

Abbreviations: SN = sensitivity; SP = specificity; T. gondii = Toxoplasma gondii; Plasmodium = species in the genus Plasmodium including falciparum, yoelii yoelii, and berghei; C. elegans = Caenorhabditis elegans; Combined species = combination of T. gondii, Plasmodium, and C. elegans datasets; Benchmark = dataset comprising evidence for T. gondii and Neospora caninum proteins from published studies.
a Results from the same input data fluctuate between runs. The algorithm-specific R functions were therefore executed 100 times, and the prediction outcomes (true positives, false positives, true negatives, and false negatives) were averaged to calculate SN and SP.
b Obtained from multiple cross-validations, i.e. the algorithm-specific R functions randomly selected 70% of the training dataset to build a model and used the remaining 30% in the binary classification test. The cross-validation was executed 100 times, and the prediction outcomes were averaged to calculate SN and SP.
The values underlined denote the best performing training dataset for classifying the benchmark proteins.
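
The procedure in footnotes a and b can be sketched roughly as follows. This is a minimal illustration in Python, not the R code used in the study: the classifier is a hypothetical one-feature threshold rule, and the names `confusion_counts`, `fit_threshold`, and `repeated_holdout` are assumptions introduced here. It splits the data 70%/30% at random, repeats this 100 times, averages the four prediction outcomes, and only then computes SN = TP / (TP + FN) and SP = TN / (TN + FP).

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def fit_threshold(train):
    """Hypothetical stand-in model: midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        raise ValueError("training split must contain both classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def repeated_holdout(data, n_runs=100, train_frac=0.7, seed=0):
    """Average TP/FP/TN/FN over repeated random splits, then derive SN and SP."""
    rng = random.Random(seed)
    totals = [0, 0, 0, 0]
    for _ in range(n_runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(round(train_frac * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]
        threshold = fit_threshold(train)
        y_true = [y for _, y in test]
        y_pred = [1 if x >= threshold else 0 for x, _ in test]
        for i, count in enumerate(confusion_counts(y_true, y_pred)):
            totals[i] += count
    tp, fp, tn, fn = (t / n_runs for t in totals)
    sn = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    sp = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return sn, sp


# Toy, well-separated data: (score, label) pairs, 20 positives and 20 negatives.
toy = [(0.60 + 0.01 * i, 1) for i in range(20)] + [(0.10 + 0.01 * i, 0) for i in range(20)]
sn, sp = repeated_holdout(toy)
print(f"SN={sn:.2f}, SP={sp:.2f}")
```

Averaging the counts before taking the ratios is equivalent to pooling the outcomes of all runs, so a run whose test split happens to contain few positives does not distort SN on its own.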