2018 Jan 30;9:7. doi: 10.1186/s13326-017-0168-3

Table 4.

McNemar's tests of statistical significance comparing the ASM and APG classifiers. The null hypothesis is that the two classifiers are equally accurate; the significance threshold is 0.01.

| Dataset | Training examples | Testing examples | APG accuracy (%) | ASM accuracy (%) | P-value |
|---|---|---|---|---|---|
| AIMed | 11,246 | 5,834 | 58.6 | 53.1 | 3.8e-7* |
| BioInfer | 7,414 | 9,666 | 77.8 | 76.8 | 0.0011* |
| HPRD50 | 16,647 | 433 | 70.9 | 68.1 | 0.999 |
| IEPA | 16,263 | 817 | 73.6 | 66.5 | 3.9e-6* |
| LLL | 16,750 | 330 | 75.4 | 65.1 | 2.2e-6* |
| CID: sentence-level relations | 9,913 | 5,099 | 72.2 | 71.2 | 0.0969 |
| CID: non-sentence-level relations | 21,656 | 11,562 | 84.9 | 84.1 | 0.0002* |

* P-value below the 0.01 threshold (shown in italics in the original table).
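For readers unfamiliar with McNemar's test, the following is a minimal sketch of how such a comparison between two classifiers evaluated on the same test examples could be computed. It uses the chi-square approximation with continuity correction; the source does not state which variant of the test was used, so this choice is an assumption. The function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the article.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (chi-square approximation, continuity-corrected).

    Returns (statistic, p_value) for the null hypothesis that the two
    classifiers are equally accurate on the same set of examples.
    """
    # Only discordant examples matter: those where exactly one
    # of the two classifiers is correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The classifiers never disagree in correctness: no evidence against H0.
        return 0.0, 1.0
    # Continuity-corrected statistic, clamped so that b == c yields 0.
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy data (hypothetical): classifier A is right on all 10 examples,
# classifier B is wrong on all 10.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 10
stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"statistic={stat:.2f}, significant at 0.01: {p_value < 0.01}")
```

Because the test depends only on the examples where the classifiers disagree in correctness, a large accuracy gap on a small test set need not be significant, while a small gap on a large test set can be. When the number of discordant examples is small, an exact binomial version of the test is usually preferred over this approximation.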