2022 Jul 19;23(5):bbac282. doi: 10.1093/bib/bbac282

Table 4.

Number of entity (mention and identifier) and relation annotations in the BioRED corpus, with the inter-annotator agreement (IAA) and the distribution across the training, development and test sets. Numbers in parentheses are the unique entities linked with concept identifiers.

| Annotations | Training | Dev | Test | Total | IAA |
|---|---|---|---|---|---|
| Document | 400 | 100 | 100 | 600 | - |
| Entity (ID): All | 13 351 (2708) | 3533 (956) | 3535 (982) | 20 419 (3869) | 97.01% |
| Entity (ID): Gene | 4430 (1141) | 1087 (368) | 1180 (399) | 6697 (1643) | 97.35% |
| Entity (ID): Disease | 3646 (576) | 982 (244) | 917 (244) | 5545 (778) | 96.06% |
| Entity (ID): Chemical | 2853 (486) | 822 (184) | 754 (170) | 4429 (651) | 96.12% |
| Entity (ID): Variant | 890 (420) | 250 (135) | 241 (137) | 1381 (678) | 97.79% |
| Entity (ID): Species | 1429 (37) | 370 (13) | 393 (11) | 2192 (47) | 99.43% |
| Entity (ID): Cell Line | 103 (48) | 22 (12) | 50 (21) | 175 (72) | 99.68% |
| Relation | 4178 | 1162 | 1163 | 6503 | 77.91% |
| Relation pair with novelty findings | 2838 | 835 | 859 | 4532 | 85.01% |
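
The counts in the table can be cross-checked with simple arithmetic: each row's training, development and test counts should add up to its total, and the per-type mention counts should add up to the "All" row. The following is a minimal sketch of that check, with the numbers copied from the table by hand; the variable and function names are illustrative and not part of any BioRED tooling. The parenthesized unique-identifier counts are deliberately left out, since the same identifier can presumably occur in more than one split and those numbers are not expected to sum.

```python
# Mention counts per row: (training, dev, test, total), copied from Table 4.
mentions = {
    "Gene":      (4430, 1087, 1180, 6697),
    "Disease":   (3646, 982, 917, 5545),
    "Chemical":  (2853, 822, 754, 4429),
    "Variant":   (890, 250, 241, 1381),
    "Species":   (1429, 370, 393, 2192),
    "Cell Line": (103, 22, 50, 175),
}
all_entities = (13351, 3533, 3535, 20419)
relations = (4178, 1162, 1163, 6503)
novelty_pairs = (2838, 835, 859, 4532)


def split_sum_matches(row):
    """Return True if training + dev + test equals the reported total."""
    train, dev, test, total = row
    return train + dev + test == total


# Check 1: every row is internally consistent across the three splits.
rows_to_check = list(mentions.values()) + [all_entities, relations, novelty_pairs]
rows_consistent = all(split_sum_matches(row) for row in rows_to_check)

# Check 2: per-type mention counts add up to the "All" row in every column.
column_sums = tuple(sum(column) for column in zip(*mentions.values()))
types_consistent = column_sums == all_entities

# Share of entity mentions that fall in the training set.
train_share = all_entities[0] / all_entities[3]

print("rows consistent:", rows_consistent)
print("types consistent:", types_consistent)
print(f"training share of entity mentions: {train_share:.1%}")
```

Both checks hold for the figures as reported, which suggests the flattened table was transcribed without dropped or shifted digits.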