. 2024 Oct 17;12:e59782. doi: 10.2196/59782

Table 1. Descriptive statistics of the medical named entity recognition datasets.

| Dataset | Medical entity type | Entity types, n | Annotations, n |
|---|---|---|---|
| Revised JNLPBA^a | DNA, RNA, protein, cell line, and cell type | 5 | 52,785 |
| BC5CDR^b | Disease and chemical | 2 | 38,596 |
| AnatEM^c | Organism subdivision, anatomical system, organ, multi-tissue structure, tissue, cell, developing anatomical structure, cellular component, organism substance, immaterial anatomical entity, pathological formation, and cancer | 12 | 11,562 |

^a JNLPBA: Joint Workshop on Natural Language Processing in Biomedicine and its Applications.

^b BC5CDR: BioCreative V Chemical Disease Relation (CDR).

^c AnatEM: Anatomical Entity Mention.
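As a rough sketch, the table can be encoded as a small Python data structure. This makes it easy to check that each "Entity types, n" value matches the listed types and to see how many token-level labels a tagger would need. The `NERDataset` class and `DATASETS` list are hypothetical. The BIO tagging scheme (one B- and one I- label per entity type, plus a single O label) is an assumption, since the table does not say how the datasets are labeled. The label strings reuse the table's type names and may differ from those in the distributed dataset files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NERDataset:
    """Hypothetical container for one row of Table 1."""
    name: str
    entity_types: tuple[str, ...]
    annotations: int

    def bio_labels(self) -> list[str]:
        # Assumed BIO scheme: "O" for tokens outside any entity,
        # then a B- (begin) and I- (inside) label for every entity type.
        labels = ["O"]
        for entity_type in self.entity_types:
            labels += [f"B-{entity_type}", f"I-{entity_type}"]
        return labels


# Values copied from Table 1; the type strings are illustrative label names.
DATASETS = [
    NERDataset(
        "Revised JNLPBA",
        ("DNA", "RNA", "protein", "cell line", "cell type"),
        52_785,
    ),
    NERDataset("BC5CDR", ("disease", "chemical"), 38_596),
    NERDataset(
        "AnatEM",
        (
            "organism subdivision", "anatomical system", "organ",
            "multi-tissue structure", "tissue", "cell",
            "developing anatomical structure", "cellular component",
            "organism substance", "immaterial anatomical entity",
            "pathological formation", "cancer",
        ),
        11_562,
    ),
]

if __name__ == "__main__":
    for ds in DATASETS:
        print(
            f"{ds.name}: {len(ds.entity_types)} entity types, "
            f"{len(ds.bio_labels())} BIO labels, {ds.annotations:,} annotations"
        )
    total = sum(ds.annotations for ds in DATASETS)
    print(f"Total annotations: {total:,}")
```

Under the BIO assumption, a dataset with n entity types needs 2n + 1 labels. AnatEM's 12 types therefore give 25 labels, compared with 5 for BC5CDR. Across the three datasets, the annotation counts sum to 102,943.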