Table 2. 10-fold document-level CV results.
Kernel | AIMed | | | | BioInfer | | | | HPRD50 | | | | IEPA | | | | LLL | | | |
 | AUC | P | R | F | AUC | P | R | F | AUC | P | R | F | AUC | P | R | F | AUC | P | R | F |
SL | 83.5 | 47.5 | 65.5 | 54.5 | 81.1 | 55.1 | 66.5 | 60.0 | 80.0 | 64.4 | 67.0 | 64.2 | 81.1 | 69.5 | 71.2 | 69.3 | 81.2 | 69.0 | 85.3 | 74.5 |
ST | 68.9 | 40.3 | 25.5 | 30.9 | 74.2 | 46.8 | 60.0 | 52.2 | 63.3 | 49.7 | 67.8 | 54.5 | 75.8 | 59.4 | 75.6 | 65.9 | 69.0 | 55.9 | 100.0 | 70.3 |
SST | 68.9 | 42.6 | 19.4 | 26.2 | 73.6 | 47.0 | 54.3 | 50.1 | 62.2 | 48.1 | 63.8 | 52.2 | 72.4 | 54.8 | 76.9 | 63.4 | 63.8 | 55.9 | 100.0 | 70.3 |
PT | 68.5 | 39.2 | 31.9 | 34.6 | 73.8 | 45.3 | 58.1 | 50.5 | 65.2 | 54.9 | 56.7 | 52.4 | 73.1 | 63.1 | 66.3 | 63.8 | 66.7 | 56.2 | 97.3 | 69.3 |
SpT | 66.1 | 33.0 | 25.5 | 27.3 | 74.1 | 44.0 | 68.2 | 53.4 | 65.7 | 49.3 | 71.7 | 56.4 | 75.9 | 54.5 | 81.8 | 64.7 | 50.0 | 55.9 | 100.0 | 70.3 |
kBSPS | 75.1 | 50.1 | 41.4 | 44.6 | 75.2 | 49.9 | 61.8 | 55.1 | 79.3 | 62.2 | 87.1 | 71.0 | 83.2 | 58.8 | 89.7 | 70.5 | 84.3 | 69.3 | 93.2 | 78.1 |
cosine | 70.5 | 43.6 | 39.4 | 40.9 | 66.1 | 44.8 | 44.0 | 44.1 | 74.8 | 59.0 | 67.2 | 61.2 | 75.5 | 61.3 | 68.4 | 64.1 | 75.2 | 70.2 | 81.7 | 73.8 |
edit | 75.2 | 68.8 | 27.7 | 39.0 | 67.4 | 50.4 | 39.2 | 43.8 | 79.2 | 71.3 | 45.2 | 53.3 | 80.2 | 77.2 | 60.2 | 67.1 | 87.5 | 68.0 | 98.0 | 78.4 |
APG | 84.6 | 59.9 | 53.6 | 56.2 | 81.5 | 60.2 | 61.3 | 60.7 | 80.9 | 68.2 | 69.8 | 67.8 | 83.9 | 66.6 | 82.6 | 73.1 | 83.5 | 71.3 | 91 | 78.1 |
APG (with SVM) | 71.2 | 62.9 | 48.9 | 54.7 | 73.9 | 60.2 | 63.4 | 61.6 | 74.1 | 65.4 | 72.5 | 67.5 | 76.2 | 71.0 | 75.1 | 72.1 | 74.9 | 70.9 | 95.4 | 79.7 |
SL [23] | 60.9 | 57.2 | 59.0 | |||||||||||||||||
kBSPS [29] | 67.2 | 49.4 | 44.7 | 46.1 | 76.9 | 66.7 | 80.2 | 70.9 | 75.8 | 70.4 | 73.0 | 70.8 | 78.5 | 76.8 | 91.8 | 82.2 | ||||
cosine [22] † | 62.0 | 55.0 | 58.1 | |||||||||||||||||
edit [22] † | 77.5 | 43.5 | 55.6 | |||||||||||||||||
APG [17] | 84.8 | 52.9 | 61.8 | 56.4 | 81.9 | 56.7 | 67.2 | 61.3 | 79.7 | 64.3 | 65.8 | 63.4 | 85.1 | 69.6 | 82.7 | 75.1 | 83.4 | 72.5 | 82.2 | 76.8 |
rich-feature-based [26] | 49.0 | 44.0 | 46.0 | 60.0 | 51.0 | 55.0 | 64.0 | 70.0 | 67.0 | 72.0 | 73.0 | 73.0 | ||||||||
hybrid [63] | 86.8 | 55.0 | 68.8 | 60.8 | 85.9 | 65.7 | 71.1 | 68.1 | 82.2 | 68.5 | 76.1 | 70.9 | 84.4 | 67.5 | 78.6 | 71.7 | 86.3 | 77.6 | 86.0 | 80.1 |
co-occ. [17] | | 17.8 | 100.0 | 30.1 | | 26.6 | 100.0 | 41.7 | | 38.9 | 100.0 | 55.4 | | 40.8 | 100.0 | 57.6 | | 55.9 | 100.0 | 70.3 |
RelEx [36] | | 40.0 | 50.0 | 44.0 | | 39.0 | 45.0 | 41.0 | | 76.0 | 64.0 | 69.0 | | 74.0 | 61.0 | 67.0 | | 82.0 | 72.0 | 77.0 |
The first two blocks contain the results of our evaluation, the third block contains corresponding results of kernel approaches from the literature, and the fourth block shows some non-kernel-based baselines. Bold typeface marks our best results for each corpus (differences under 1 percentage point are ignored). AUC, precision (P), recall (R), and F-score (F) are given in percent.
† instance-level CV.
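The table distinguishes document-level CV from the instance-level CV used for the rows marked †. The difference lies in how candidate pairs are assigned to folds, which the following minimal sketch illustrates. It assumes each instance carries an identifier of its source document; the function names and the round-robin assignment are illustrative choices, not the evaluation code behind the table.

```python
import random


def document_level_folds(doc_ids, k=10, seed=0):
    """Return one fold index per instance, keeping each document in a single fold.

    doc_ids[i] is the (hypothetical) source-document id of instance i.
    """
    docs = sorted(set(doc_ids))
    rng = random.Random(seed)
    rng.shuffle(docs)
    # Round-robin over shuffled documents: every instance of a document
    # inherits that document's fold.
    fold_of_doc = {doc: i % k for i, doc in enumerate(docs)}
    return [fold_of_doc[d] for d in doc_ids]


def instance_level_folds(n_instances, k=10, seed=0):
    """Return one fold index per instance, ignoring document boundaries."""
    order = list(range(n_instances))
    rng = random.Random(seed)
    rng.shuffle(order)
    folds = [0] * n_instances
    for position, instance in enumerate(order):
        folds[instance] = position % k
    return folds


if __name__ == "__main__":
    # Toy data: 20 documents with 3 candidate pairs each.
    doc_ids = ["doc%d" % (i // 3) for i in range(60)]
    print(document_level_folds(doc_ids)[:6])
    print(instance_level_folds(len(doc_ids))[:6])
```

With instance-level assignment, pairs from the same sentence or document can end up in both the training and the test folds, so results obtained that way are not directly comparable with document-level figures.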