2018 Jan 30;9:7. doi: 10.1186/s13326-017-0168-3

Table 4.

Statistical significance (McNemar’s test) results for the ASM and APG classifiers. The null hypothesis is that the two classifiers are equally accurate; the significance threshold is 0.01

Dataset	Training examples	Testing examples	APG accuracy	ASM accuracy	P-value
AIMed	11,246	5,834	58.6	53.1	3.8e-7
BioInfer	7,414	9,666	77.8	76.8	0.0011
HPRD50	16,647	433	70.9	68.1	0.999
IEPA	16,263	817	73.6	66.5	3.9e-6
LLL	16,750	330	75.4	65.1	2.2e-6
CID: Sentence-level relations	9,913	5,099	72.2	71.2	0.0969
CID: Non-sentence-level relations	21,656	11,562	84.9	84.1	0.0002

P-values below the threshold (0.01) are shown in italics
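
As a rough illustration of how a McNemar's test compares two classifiers evaluated on the same test set, the sketch below computes a two-sided exact (binomial) p-value from the discordant pairs, that is, the examples on which exactly one classifier is correct. The source does not state which variant of the test was used (exact or chi-squared approximation), so the exact form here is an assumption, and the function names `discordant_counts` and `mcnemar_exact` are hypothetical.

```python
from math import comb


def discordant_counts(gold, pred_a, pred_b):
    """Count examples where exactly one of the two classifiers is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    b = c = 0
    for g, a, p in zip(gold, pred_a, pred_b):
        a_ok, b_ok = (a == g), (p == g)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis (equal accuracy), each discordant pair is
    equally likely to favour either classifier, so min(b, c) follows a
    Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the one-sided tail, cap at 1


# Hypothetical toy labels, not data from the table above.
gold   = [1, 0, 1, 1]
pred_a = [1, 0, 0, 1]
pred_b = [0, 0, 1, 1]
b, c = discordant_counts(gold, pred_a, pred_b)
p_value = mcnemar_exact(b, c)
significant = p_value < 0.01  # threshold stated in the caption
```

Only the discordant pairs enter the test, which is why two classifiers with similar overall accuracy can still differ significantly on a large test set, and why a larger accuracy gap on a small test set need not be significant.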