2024 Oct 17;26(10):871. doi: 10.3390/e26100871

Table 1. Chinese medical text datasets.

| Dataset | Data Volume | Annotation Labels | Task / Content |
|---|---|---|---|
| CMeEE [50] | 23,000 | 9 types of entities | Chinese medical text named entity recognition |
| CMeIE [50] | 22,406 | 53 types of relations | Chinese medical text entity relation extraction |
| CMedCausal [51] | 3000 | 3 types of relations | Medical causal relation extraction |
| CHIP-CDEE [51] | 2485 | 4 types of attributes | Clinical discovery event extraction |
| CHIP-CDN [50] | 18,000 | 2500 standardized | Clinical terminology normalization |
| CHIP-CTC [50] | 40,644 | 44 categories | Short-text classification of clinical trial screening criteria |
| CHIP-MEDFNPC [52] | 8000 | 4 types of attributes | Polarity determination of clinical findings in medical dialogues |
| CHIP-STS [50] | 30,000 | 2 categories | Ping An Healthcare and Technology disease Q&A transfer learning |
| KUAKE-IR [53] | 104,000 | 10 relevant data items | Medical paragraph retrieval |
| KUAKE-QIC [54] | 10,880 | 11 categories | Medical search query intent classification |
| KUAKE-QTR [52] | 32,552 | 4 categories | Medical search query and page title relevance |
| KUAKE-QQR [52] | 18,196 | 3 categories | Medical search query relevance |
| MedDG [55] | 22,162 | 160 types of entities | Chinese medical dialogue |
| DiaKG [56] | 22,050 entities, 6890 relation triples | 15 types of entities, 10 types of relations | Diabetes domain entities and relations |
| IMCS [57] | 4116 | 16 types of intents, 5 types of entities, 3 types of symptoms | Intelligent dialogue-based diagnosis and treatment |
| TCMRelExtr [58] | 16,150 summaries | 4 types of entities, 5 types of relations | Traditional Chinese medicine entity and relation corpus |