2013 Nov 2;14:315. doi: 10.1186/1471-2105-14-315

Table 5.

Sensitivity and specificity performance measures of binary classification on different test datasets when using machine learning algorithms with different training datasets

                    Training dataset
Test dataset        T. gondii         Plasmodium        C. elegans        Combined species  Benchmark
                    SN       SP       SN       SP       SN       SP       SN       SP       SN       SP

Decision Tree a
T. gondii           1.00 b   0.81 b   0.95     0.89     1.00     0.83     1.00     0.83     1.00     0.83
Plasmodium          0.84     0.90     1.00 b   1.00 b   0.85     0.96     1.00     0.92     1.00     0.98
C. elegans          0.87     0.93     1.00     0.99     1.00 b   1.00 b   1.00     0.99     1.00     0.98
Combined species    0.87     0.92     1.00     0.99     0.98     0.99     1.00 b   0.98 b   1.00     0.97
Benchmark           0.86     0.91     0.97     0.96     0.96     0.96     0.97     0.91     1.00 b   1.00 b

Adaptive boosting a
T. gondii           0.51 b   0.06 b   0.96     0.88     1.00     0.83     1.00     0.91     1.00     0.83
Plasmodium          0.82     0.99     0.98 b   0.96 b   0.95     0.96     1.00     1.00     1.00     0.98
C. elegans          0.87     0.99     1.00     1.00     1.00 b   1.00 b   1.00     1.00     1.00     0.98
Combined species    0.87     0.99     1.00     0.99     0.99     0.99     1.00 b   0.99 b   1.00     0.98
Benchmark           0.85     0.99     0.97     0.98     0.97     0.96     0.99     0.99     0.98 b   0.97 b

Random forest a
T. gondii           0.97 b   0.90 b   1.00     0.83     1.00     0.89     1.00     1.00     1.00     0.83
Plasmodium          0.87     1.00     0.99 b   0.99 b   1.00     1.00     1.00     1.00     1.00     0.98
C. elegans          0.83     1.00     0.98     1.00     1.00 b   1.00 b   1.00     1.00     1.00     1.00
Combined species    0.84     1.00     0.98     0.99     1.00     1.00     1.00 b   1.00 b   1.00     0.99
Benchmark           0.82     1.00     0.99     0.99     0.99     1.00     0.97     0.99     0.99 b   0.99 b

k-Nearest neighbour
T. gondii           0.80 b   0.83 b   1.00     0.83     0.95     0.83     1.00     0.83     0.90     0.78
Plasmodium          0.77     0.96     0.95 b   0.84 b   0.88     0.96     0.99     0.94     0.81     0.96
C. elegans          0.88     0.99     0.99     0.95     0.96 b   0.98 b   0.99     0.99     0.95     0.98
Combined species    0.87     0.98     0.99     0.94     0.97     0.98     0.96 b   0.97 b   0.92     0.97
Benchmark           0.93     0.96     1.00     0.90     0.96     0.96     0.96     0.97     0.98 b   0.96 b

Naive Bayes classifier
T. gondii           1.00 b   0.91 b   1.00     0.78     1.00     0.83     1.00     0.83     1.00     0.83
Plasmodium          0.97     0.98     0.98 b   0.99 b   1.00     0.92     1.00     0.96     1.00     0.98
C. elegans          0.87     1.00     0.92     0.95     1.00 b   0.98 b   0.97     0.98     1.00     0.99
Combined species    0.89     0.99     0.93     0.95     1.00     0.97     0.98 b   0.97 b   1.00     0.98
Benchmark           0.81     1.00     0.97     0.94     1.00     0.93     1.00     0.99     1.00 b   1.00 b

Neural networks a
T. gondii           0.98 b   0.90 b   0.99     0.83     1.00     0.84     1.00     0.91     0.99     0.83
Plasmodium          0.88     0.92     0.99 b   0.89 b   0.99     0.97     0.97     0.98     0.93     0.97
C. elegans          0.83     0.99     0.92     0.98     0.99 b   0.99 b   1.00     1.00     0.98     0.97
Combined species    0.91     0.96     0.93     0.98     0.99     0.98     0.99 b   0.98 b   0.97     0.97
Benchmark           0.78     0.97     0.97     0.97     0.99     0.95     0.99     0.96     1.00 b   0.95 b

Support vector machines
T. gondii           0.83 b   0.92 b   0.89     1.00     0.89     0.89     1.00     0.89     1.00     0.83
Plasmodium          0.88     0.97     0.98 b   0.98 b   0.96     0.98     1.00     0.98     1.00     0.98
C. elegans          0.83     0.89     0.98     0.99     0.94 b   0.99 b   0.99     1.00     0.91     0.99
Combined species    0.84     0.91     0.98     0.98     0.99     0.99     0.92 b   0.99 b   0.93     0.98
Benchmark           0.74     0.99     0.96     0.96     0.94     0.99     0.96     1.00     0.83 b   0.92 b

Abbreviations: SN = sensitivity; SP = specificity; T. gondii = Toxoplasma gondii; Plasmodium = species in the genus Plasmodium including falciparum, yoelii yoelii, and berghei; C. elegans = Caenorhabditis elegans; Combined species = combination of T. gondii, Plasmodium, and C. elegans datasets; Benchmark = dataset comprising evidence for T. gondii and Neospora caninum proteins from published studies.
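
For readers who want the two measures spelled out: the table itself does not define SN and SP, so the following uses the standard definitions in terms of the prediction outcomes named in footnote a (true positives TP, false negatives FN, true negatives TN, false positives FP). It is offered as a reminder of the conventional formulas, on the assumption that the authors used them unchanged.

```latex
\mathrm{SN} = \frac{TP}{TP + FN},
\qquad
\mathrm{SP} = \frac{TN}{TN + FP}
```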

a Results from the same input data fluctuate. The algorithm-specific R functions were executed 100 times and the prediction outcomes (false positives and negatives, true positives and negatives) were averaged to calculate SN and SP.
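
The averaging described in footnote a can be illustrated with a minimal Python sketch. The authors worked in R; the function name `sn_sp_from_runs` and the example counts below are hypothetical and are not taken from the study. The sketch assumes that the four outcome counts are averaged across runs first and that SN and SP are then computed from those averages, which is how the footnote reads.

```python
from statistics import mean

def sn_sp_from_runs(runs):
    """Average confusion-matrix counts over repeated runs, then derive SN and SP.

    runs: list of (tp, fn, tn, fp) tuples, one tuple per execution.
    Assumes at least one positive and one negative example overall,
    otherwise the divisions below are undefined.
    """
    tp = mean(r[0] for r in runs)
    fn = mean(r[1] for r in runs)
    tn = mean(r[2] for r in runs)
    fp = mean(r[3] for r in runs)
    sn = tp / (tp + fn)  # sensitivity: fraction of positives recovered
    sp = tn / (tn + fp)  # specificity: fraction of negatives recovered
    return sn, sp

# Hypothetical outcome counts from three runs (the study used 100 runs).
runs = [(18, 2, 15, 3), (20, 0, 14, 4), (19, 1, 16, 2)]
sn, sp = sn_sp_from_runs(runs)
print(round(sn, 2), round(sp, 2))
```

Averaging the counts before dividing is not the same as averaging per-run SN and SP values when the number of test examples differs between runs; with a fixed test set the two approaches agree.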

b Obtained from multiple cross-validations, i.e., the algorithm-specific R functions randomly used 70% of the training dataset to build a model and the remaining 30% for the binary classification test. The cross-validation was executed 100 times and the prediction outcomes were averaged to calculate SN and SP.
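
The procedure in footnote b amounts to a repeated random 70/30 hold-out. The Python sketch below shows that general scheme only; it is not the authors' R code. The names `repeated_holdout`, `fit_threshold`, and `predict_threshold`, the toy one-feature data, and the threshold classifier standing in for the real algorithms are all assumptions made for illustration.

```python
import random
from statistics import mean

def repeated_holdout(samples, labels, fit, predict,
                     n_repeats=100, train_frac=0.7, seed=0):
    """Repeat a random train/test split and pool the prediction outcomes."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    tp = fn = tn = fp = 0
    for _ in range(n_repeats):
        rng.shuffle(indices)
        cut = int(round(train_frac * len(indices)))
        train, test = indices[:cut], indices[cut:]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        for i in test:
            pred = predict(model, samples[i])
            if labels[i] == 1:
                if pred == 1:
                    tp += 1
                else:
                    fn += 1
            else:
                if pred == 0:
                    tn += 1
                else:
                    fp += 1
    # Dividing each total by n_repeats would give the averaged counts;
    # the ratios below are unchanged by that common factor.
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical stand-in classifier: threshold halfway between class means.
def fit_threshold(xs, ys):
    pos = mean(x for x, y in zip(xs, ys) if y == 1)
    neg = mean(x for x, y in zip(xs, ys) if y == 0)
    return (pos + neg) / 2

def predict_threshold(threshold, x):
    return 1 if x >= threshold else 0

# Toy data: 20 scores, the upper half labelled positive.
scores = list(range(20))
labels = [1 if s >= 10 else 0 for s in scores]
sn, sp = repeated_holdout(scores, labels, fit_threshold, predict_threshold)
print(round(sn, 2), round(sp, 2))
```

The split here is a plain shuffle rather than a stratified one; whether the R functions used in the study stratified by class is not stated in the footnote.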

The values underlined denote the best performing training dataset for classifying the benchmark proteins.