Table 1.
Overview of relation extraction challenge datasets
Task | Dataset name | Release | Data source | Scale/data size | Relation objects | Relation types | Reference |
---|---|---|---|---|---|---|---|
BioCreative II-PPI | Dataset for interaction pair subtask (IPS) | 2006 | PubMed | 1098 full-texts (740 for training, 358 for test) | Protein | Binary interaction | [34] |
BioCreative II.5 | Interaction pairs task (IPT) | 2009 | FEBS Letters | 122 full-texts | Protein | Binary interaction | [35] |
2010 i2b2/VA challenge – relation (now n2c2) | 2010 i2b2 dataset | 2010 | Three health facilities | 871 EMRs (394 for training, 477 for test) | Medical problem, treatment, test concepts | Medical problems-treatments, medical problems-tests, medical problems-other medical problems | [51] |
BioNLP-ST 2011 | Entity relations (REL) | 2011 | MEDLINE | 1210 abstracts | Gene/protein and the other entity | Protein-component and subunit-complex | [44] |
BioNLP-ST 2011 | Epigenetics and post-translational modifications (EPI) | 2011 | PubMed | 1200 abstracts | Protein, event | 15 types | [44] |
BioNLP-ST 2011 | Genia event extraction (GE) | 2011 | MEDLINE, PMC | 1210 abstracts; 14 full-texts | Protein, event | 9 types | [42] |
BioNLP-ST 2011 | Infectious diseases (ID) | 2011 | PMC | 30 full-texts | Protein, chemical, organism, two-component-system and regulon-operon | 10 types | [44] |
DDIExtraction 2011 challenge | DrugDDI corpus | 2011 | DrugBank | 5806 sentences/579 texts | Drug | Binary interaction | [18] |
BioNLP-ST 2013 | Genia event extraction (GE) | 2013 | PMC | 34 full-texts | Protein, event | 13 types | [46] |
BioNLP-ST 2013 | Cancer genetics (CG) | 2013 | MLEE corpus; PubMed | 250 abstracts; 350 abstracts | 18 types | 40 types | [47] |
BioNLP-ST 2013 | Pathway curation (PC) | 2013 | PubMed | 525 abstracts | 4 types | 23 types | [48] |
BioNLP-ST 2013 | Gene regulation ontology (GRO) | 2013 | MEDLINE | 300 abstracts | 174 types | 126 types | [49] |
BioNLP-ST 2013 | Gene regulation network in bacteria (GRN) | 2013 | PubMed | 201 sentences | 6 types | 12 types | [50] |
DDIExtraction 2013 challenge (SemEval-2013 Task 9) | DDI corpus | 2013 | DrugBank; MEDLINE | 6795 sentences/792 texts; 2147 sentences/233 abstracts | 4 pharmacological mentions | Mechanism, effect, advice, int | [19] |
BioCreative V CDR task | BC5CDR corpus | 2016 | PubMed | 1500 abstracts (500 for training, 500 for development, 500 for test) | Chemical/disease | Binary interaction | [36] |
BioCreative VI ChemProt task | ChemProt corpus | 2017 | PubMed | 2432 abstracts (1020 for training, 612 for development, 800 for test) | Chemical compound/drug, gene/protein | 22 types | [37] |
BioCreative VI PrecMed task | Precision medicine (PM) | 2017 | PubMed | 5509 abstracts | Protein | Binary interaction | [38] |
MADE 1.0 challenge | Medication and adverse drug event from electronic health records (MADE1.0) corpus | 2018 | University of Massachusetts Memorial Hospital | 1089 EHRs | 9 types | 7 types | [54] |
2018 n2c2 shared task-track 2 | 2018 n2c2 track 2 dataset | 2018 | MIMIC-III | 505 discharge summaries (303 for training, 202 for test) | 9 types (drugs and 8 other types) | 8 types (drugs with 7 other types) | [52] |
BioCreative VII DrugProt shared task | DrugProt corpus | 2021 | PubMed | 5000 abstracts (3500 for training, 750 for development, 750 for test) | Chemical compounds (drug included), gene/protein | 13 types | [40] |