Table 4.
Performance evaluation of LINNAEUS species tagging on different evaluation sets
Set | Level | Main set | TP | FP | FN | Recall | Prec. |
---|---|---|---|---|---|---|---|
NCBI taxonomy | Doc. | MEDLINE | 6,888 | 10,032 | (1,807) | 0.7922 | (0.4071) |
NCBI taxonomy | Doc. | PMC OA abs | 15 | 20 | (6) | 0.7143 | (0.4286) |
NCBI taxonomy | Doc. | PMC OA full (abs) | 16 | 166 | (3) | 0.8421 | (0.0791) |
NCBI taxonomy | Doc. | PMC OA full (all) | 22 | 196 | (4) | 0.8462 | (0.1010) |
MeSH | Doc. | MEDLINE | 5,073,147 | 4,577,293 | 2,315,811 | 0.6866 | 0.5257 |
MeSH | Doc. | PMC OA abs | 36,641 | 49,151 | (14,797) | 0.7123 | (0.4271) |
MeSH | Doc. | PMC OA full (abs) | 46,484 | 291,872 | (2,219) | 0.9544 | (0.1374) |
MeSH | Doc. | PMC OA full (all) | 54,814 | 346,071 | (2,880) | 0.9201 | (0.1367) |
Entrez gene | Doc. | MEDLINE | 346,989 | 171,001 | (139,702) | 0.7130 | (0.6699) |
Entrez gene | Doc. | PMC OA abs | 6,946 | 4,110 | (2,357) | 0.7466 | (0.6283) |
Entrez gene | Doc. | PMC OA full (abs) | 8,184 | 38,275 | (470) | 0.9457 | (0.1762) |
Entrez gene | Doc. | PMC OA full (all) | 9,662 | 42,209 | (628) | 0.9390 | (0.1863) |
EMBL | Doc. | MEDLINE | 158,462 | 183,950 | (235,745) | 0.4020 | (0.4627) |
EMBL | Doc. | PMC OA abs | 4,807 | 4,360 | (7,902) | 0.3782 | (0.5244) |
EMBL | Doc. | PMC OA full (abs) | 6,601 | 34,447 | (3,859) | 0.6311 | (0.1608) |
EMBL | Doc. | PMC OA full (all) | 9,433 | 40,212 | (5,613) | 0.6269 | (0.1900) |
PMC linkouts | Doc. | MEDLINE | (27,259) | (23,377) | (122,596) | (0.1819) | (0.5383) |
PMC linkouts | Doc. | PMC OA abs | (30,315) | (27,192) | (141,735) | (0.1762) | (0.5272) |
PMC linkouts | Doc. | PMC OA full (abs) | 110,288 | 156,012 | 61,656 | 0.6414 | 0.4141 |
PMC linkouts | Doc. | PMC OA full (all) | 112,069 | 163,052 | 61,671 | 0.6450 | 0.4073 |
Whatizit-Organisms | Doc. | PMC OA abs | 64,686 | 29,222 | 12,930 | 0.8334 | 0.6888 |
Whatizit-Organisms | Doc. | PMC OA full (abs) | 308,410 | 67,171 | 100,079 | 0.7550 | 0.8211 |
Whatizit-Organisms | Doc. | PMC OA full (all) | 344,445 | 73,489 | 109,668 | 0.7585 | 0.8242 |
Whatizit-Organisms | Mention | PMC OA abs | 139,077 | 147,426 | 39,351 | 0.7794 | 0.4854 |
Whatizit-Organisms | Mention | PMC OA full (xml) | 1,164,799 | 1,596,615 | 527,284 | 0.6883 | 0.4218 |
Whatizit-Organisms | Mention | PMC OA full (all) | 1,304,620 | 2,398,321 | 1,133,018 | 0.5352 | 0.3523 |
Manual | Doc. | PMC OA abs | 101 | 0 | 3 | 0.9712 | 1.0 |
Manual | Doc. | PMC OA full (abs) | 421 | 46 | 9 | 0.9791 | 0.9015 |
Manual | Doc. | PMC OA full (all) | 462 | 49 | 9 | 0.9809 | 0.9041 |
Manual | Mention | PMC OA abs | 326 | 3 | 19 | 0.9449 | 0.9909 |
Manual | Mention | PMC OA full (xml) | 3,190 | 92 | 222 | 0.9350 | 0.9720 |
Manual | Mention | PMC OA full (all) | 3,973 | 120 | 241 | 0.9428 | 0.9707 |
Values in parentheses are for comparisons between document sets of different types (for example, evaluation tag sets based on full text compared against species tags generated on abstracts) or for cases where the evaluation set is likely to exclude a large number of species mentions. PMC OA full (all) shows accuracy for all full-text documents. PMC OA full (abs) shows accuracy for all full-text documents with an abstract that can be extracted, allowing comparison of document-level accuracy between full text and abstracts. PMC OA full (xml) shows accuracy for all full-text documents with an XML abstract, allowing comparison of mention-level accuracy between full text and abstracts.
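
The Recall and Prec. columns are consistent with the standard definitions recall = TP / (TP + FN) and precision = TP / (TP + FP). The following is a minimal sketch of that arithmetic, checked against the Manual, document-level, PMC OA abs row; the function names are illustrative and are not taken from the LINNAEUS software.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of evaluation-set tags that were found: TP / (TP + FN)."""
    total = tp + fn
    # Assumption: return 0.0 when there is nothing to find, to avoid dividing by zero.
    return tp / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """Fraction of generated tags that were correct: TP / (TP + FP)."""
    total = tp + fp
    # Assumption: return 0.0 when no tags were generated.
    return tp / total if total else 0.0


# Manual evaluation set, document level, PMC OA abs: TP = 101, FP = 0, FN = 3
tp, fp, fn = 101, 0, 3
print(round(recall(tp, fn), 4))     # 0.9712
print(round(precision(tp, fp), 4))  # 1.0
```

The same two functions can be applied to any other row after stripping the thousands separators and parentheses from the counts.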