Table 1.
Type | Training/evaluation corpus | Doc type | PubTator performance | PTC performance |
---|---|---|---|---|
Gene | BioCreative II GN (45) | Abstract | GenNorm (34) 80.10% | GNormPlus (35) 86.70% |
Variant | BRONCO (24) | Full text | tmVar (31) N/A | tmVar 2.0 (38) 86.24% |
Disease | NCBI Disease (46) | Abstract | DNorm (32) 80.60% | TaggerOne (39) 83.70% |
Chemical | BioCreative V CDR (41) | Abstract | Dictionary 53.82% | TaggerOne 89.50% |
Species | Linnaeus (43) | Full text | SR4GN (33) 85.42% | SR4GN (33) 85.42% |
Cell Line | BioCreative VI BioID corpus (44) | Full text (caption) | N/A | TaggerOne 83.10% |
Performance listed is the F1 score for concept identification (normalization). The previous version of tmVar does not provide accession identifiers (dbSNP RS numbers) for variants located within the text, so its PubTator entry is N/A. Cell line annotations are new in PTC, so no PubTator score exists for them.