2013 Nov 2;14:315. doi: 10.1186/1471-2105-14-315

Table 5.

Sensitivity and specificity performance measures of binary classification on different test datasets when using machine learning algorithms with different training datasets

Test dataset	Training dataset
	*T. gondii*		*Plasmodium*		*C. elegans*		Combined species		Benchmark
	SN	SP	SN	SP	SN	SP	SN	SP	SN	SP
	Decision Tree^a
T. gondii	1.00^b	0.81^b	0.95	0.89	1.00	0.83	1.00	0.83	1.00	0.83
Plasmodium	0.84	0.90	1.00^b	1.00^b	0.85	0.96	1.00	0.92	1.00	0.98
C. elegans	0.87	0.93	1.00	0.99	1.00^b	1.00^b	1.00	0.99	1.00	0.98
Combined species	0.87	0.92	1.00	0.99	0.98	0.99	1.00^b	0.98^b	1.00	0.97
Benchmark	0.86	0.91	0.97	0.96	0.96	0.96	0.97	0.91	1.00^b	1.00^b
	Adaptive boosting^a
T. gondii	0.51^b	0.06^b	0.96	0.88	1.00	0.83	1.00	0.91	1.00	0.83
Plasmodium	0.82	0.99	0.98^b	0.96^b	0.95	0.96	1.00	1.00	1.00	0.98
C. elegans	0.87	0.99	1.00	1.00	1.00^b	1.00^b	1.00	1.00	1.00	0.98
Combined species	0.87	0.99	1.00	0.99	0.99	0.99	1.00^b	0.99^b	1.00	0.98
Benchmark	0.85	0.99	0.97	0.98	0.97	0.96	0.99	0.99	0.98^b	0.97^b
	Random forest^a
T. gondii	0.97^b	0.90^b	1.00	0.83	1.00	0.89	1.00	1.00	1.00	0.83
Plasmodium	0.87	1.00	0.99^b	0.99^b	1.00	1.00	1.00	1.00	1.00	0.98
C. elegans	0.83	1.00	0.98	1.00	1.00^b	1.00^b	1.00	1.00	1.00	1.00
Combined species	0.84	1.00	0.98	0.99	1.00	1.00	1.00^b	1.00^b	1.00	0.99
Benchmark	0.82	1.00	0.99	0.99	0.99	1.00	0.97	0.99	0.99^b	0.99^b
	k-Nearest neighbour
T. gondii	0.80^b	0.83^b	1.00	0.83	0.95	0.83	1.00	0.83	0.90	0.78
Plasmodium	0.77	0.96	0.95^b	0.84^b	0.88	0.96	0.99	0.94	0.81	0.96
C. elegans	0.88	0.99	0.99	0.95	0.96^b	0.98^b	0.99	0.99	0.95	0.98
Combined species	0.87	0.98	0.99	0.94	0.97	0.98	0.96^b	0.97^b	0.92	0.97
Benchmark	0.93	0.96	1.00	0.90	0.96	0.96	0.96	0.97	0.98^b	0.96^b
	Naive Bayes classifier
T. gondii	1.00^b	0.91^b	1.00	0.78	1.00	0.83	1.00	0.83	1.00	0.83
Plasmodium	0.97	0.98	0.98^b	0.99^b	1.00	0.92	1.00	0.96	1.00	0.98
C. elegans	0.87	1.00	0.92	0.95	1.00^b	0.98^b	0.97	0.98	1.00	0.99
Combined species	0.89	0.99	0.93	0.95	1.00	0.97	0.98^b	0.97^b	1.00	0.98
Benchmark	0.81	1.00	0.97	0.94	1.00	0.93	1.00	0.99	1.00^b	1.00^b
	Neural networks^a
T. gondii	0.98^b	0.90^b	0.99	0.83	1.00	0.84	1.00	0.91	0.99	0.83
Plasmodium	0.88	0.92	0.99^b	0.89^b	0.99	0.97	0.97	0.98	0.93	0.97
C. elegans	0.83	0.99	0.92	0.98	0.99^b	0.99^b	1.00	1.00	0.98	0.97
Combined species	0.91	0.96	0.93	0.98	0.99	0.98	0.99^b	0.98^b	0.97	0.97
Benchmark	0.78	0.97	0.97	0.97	0.99	0.95	0.99	0.96	1.00^b	0.95^b
	Support vector machines
T. gondii	0.83^b	0.92^b	0.89	1.00	0.89	0.89	1.00	0.89	1.00	0.83
Plasmodium	0.88	0.97	0.98^b	0.98^b	0.96	0.98	1.00	0.98	1.00	0.98
C. elegans	0.83	0.89	0.98	0.99	0.94^b	0.99^b	0.99	1.00	0.91	0.99
Combined species	0.84	0.91	0.98	0.98	0.99	0.99	0.92^b	0.99^b	0.93	0.98
Benchmark	0.74	0.99	0.96	0.96	0.94	0.99	0.96	1.00	0.83^b	0.92^b

Abbreviations: SN = sensitivity; SP = specificity; T. gondii = Toxoplasma gondii; Plasmodium = species in the genus Plasmodium, including P. falciparum, P. yoelii yoelii, and P. berghei; C. elegans = Caenorhabditis elegans; Combined species = combination of the T. gondii, Plasmodium, and C. elegans datasets; Benchmark = dataset comprising evidence for T. gondii and Neospora caninum proteins from published studies.

^aResults fluctuate between runs on the same input data. The algorithm-specific R functions were therefore executed 100 times, and the prediction outcomes (true positives, false positives, true negatives, and false negatives) were averaged to calculate SN and SP.
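The averaging described in footnote a can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study and is not the authors' code. It assumes the standard definitions SN = TP / (TP + FN) and SP = TN / (TN + FP); the function name `sn_sp_from_runs` and the tuple layout `(tp, fp, tn, fn)` are hypothetical choices made for this example.

```python
from statistics import mean


def sn_sp_from_runs(runs):
    """Average confusion counts over repeated runs, then derive SN and SP.

    runs: list of (tp, fp, tn, fn) tuples, one per execution.
    This layout is an assumption made for the sketch.
    """
    # Average each prediction outcome across all executions first.
    tp = mean([r[0] for r in runs])
    fp = mean([r[1] for r in runs])
    tn = mean([r[2] for r in runs])
    fn = mean([r[3] for r in runs])

    # Sensitivity: fraction of actual positives that were predicted positive.
    sn = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    # Specificity: fraction of actual negatives that were predicted negative.
    sp = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return sn, sp
```

For two made-up runs with counts `(18, 2, 8, 2)` and `(20, 0, 10, 0)`, the averaged counts are TP = 19, FP = 1, TN = 9, FN = 1, which gives SN = 0.95 and SP = 0.90. Averaging the counts before taking the ratios is not the same as averaging per-run SN and SP values when the runs differ in size.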

^bObtained from repeated cross-validation: in each round, the algorithm-specific R functions used a random 70% of the training dataset to build a model and the remaining 30% for the binary classification test. The cross-validation was executed 100 times, and the prediction outcomes were averaged to calculate SN and SP.
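The repeated 70%/30% split described in footnote b can be sketched as follows. This is a Python illustration under stated assumptions, not the R procedure used in the study. The names `repeated_holdout`, `fit_threshold`, and `predict_threshold` are hypothetical, and the one-feature threshold rule is only a stand-in for the classifiers listed in the table.

```python
import random
from statistics import mean


def fit_threshold(train):
    """Toy stand-in model: midpoint between the two class means.

    Assumes the training split contains both classes.
    """
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    """Predict the positive class (1) when the feature exceeds the threshold."""
    return 1 if x > threshold else 0


def repeated_holdout(samples, fit, predict, n_repeats=100, train_frac=0.7, seed=0):
    """Repeat a random train/test split and pool the prediction outcomes."""
    rng = random.Random(seed)
    totals = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    n_train = int(round(train_frac * len(samples)))
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        model = fit(train)
        for x, y in test:
            pred = predict(model, x)
            if y == 1:
                totals["tp" if pred == 1 else "fn"] += 1
            else:
                totals["tn" if pred == 0 else "fp"] += 1
    # Dividing every total by n_repeats cancels in the ratios, so the
    # ratios of totals equal the ratios of the averaged counts.
    pos, neg = totals["tp"] + totals["fn"], totals["tn"] + totals["fp"]
    sn = totals["tp"] / pos if pos else float("nan")
    sp = totals["tn"] / neg if neg else float("nan")
    return sn, sp


# Made-up, cleanly separated data: 10 negatives (0..9), 10 positives (100..109).
toy_samples = [(float(v), 0) for v in range(10)] + [(float(v), 1) for v in range(100, 110)]
sn, sp = repeated_holdout(toy_samples, fit_threshold, predict_threshold)
```

Because the toy classes are far apart, every split yields a threshold that separates them, so both measures come out as 1.0 here. Real datasets would give values below 1.0 that vary with the random seed, which is why the outcomes are pooled over 100 rounds.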

The values underlined denote the best performing training dataset for classifying the benchmark proteins.