2024 Apr 12;25(3):bbae132. doi: 10.1093/bib/bbae132

Table 1.

Overview of relation extraction challenge datasets

| Task | Dataset name | Release | Data source | Scale/data size | Relation objects | Relation types | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BioCreative II-PPI | Dataset for interaction pair subtask (IPS) | 2006 | PubMed | 1098 full-texts (740 for training, 358 for test) | Protein | Binary interaction | [34] |
| BioCreative II.5 | Interaction pairs task (IPT) | 2009 | FEBS Letters | 122 full-texts | Protein | Binary interaction | [35] |
| 2010 i2b2/VA challenge – relation (now n2c2) | 2010 i2b2 dataset | 2010 | Three health facilities | 871 EMRs (394 for training, 477 for test) | Medical problem, treatment, test concepts | Medical problems-treatments, medical problems-tests, medical problems-other medical problems | [51] |
| BioNLP-ST 2011 | Entity relations (REL) | 2011 | MEDLINE | 1210 abstracts | Gene/protein and the other entity | Protein-component and subunit-complex | [44] |
| BioNLP-ST 2011 | Epigenetics and post-translational modifications (EPI) | 2011 | PubMed | 1200 abstracts | Protein, event | 15 types | [44] |
| BioNLP-ST 2011 | Genia event extraction (GE) | 2011 | MEDLINE, PMC | 1210 abstracts; 14 full-texts | Protein, event | 9 types | [42] |
| BioNLP-ST 2011 | Infectious diseases (ID) | 2011 | PMC | 30 full-texts | Protein, chemical, organism, two-component-system and regulon-operon | 10 types | [44] |
| DDIExtraction 2011 challenge | DrugDDI corpus | 2011 | DrugBank | 5806 sentences/579 texts | Drug | Binary interaction | [18] |
| BioNLP-ST 2013 | Genia event extraction (GE) | 2013 | PMC | 34 full-texts | Protein, event | 13 types | [46] |
| BioNLP-ST 2013 | Cancer genetics (CG) | 2013 | MLEE corpus; PubMed | 250 abstracts; 350 abstracts | 18 types | 40 types | [47] |
| BioNLP-ST 2013 | Pathway curation (PC) | 2013 | PubMed | 525 abstracts | 4 types | 23 types | [48] |
| BioNLP-ST 2013 | Gene regulation ontology (GRO) | 2013 | MEDLINE | 300 abstracts | 174 types | 126 types | [49] |
| BioNLP-ST 2013 | Gene regulation network in bacteria (GRN) | 2013 | PubMed | 201 sentences | 6 types | 12 types | [50] |
| DDIExtraction 2013 challenge (SemEval-2013 Task 9) | DDI corpus | 2013 | DrugBank; MEDLINE | 6795 sentences/792 texts; 2147 sentences/233 abstracts | 4 pharmacological mentions | Mechanism, effect, advice, int | [19] |
| BioCreative V CDR task | BC5CDR corpus | 2016 | PubMed | 1500 abstracts (500 for training, 500 for development, 500 for test) | Chemical/disease | Binary interaction | [36] |
| BioCreative VI ChemProt task | ChemProt corpus | 2017 | PubMed | 2432 abstracts (1020 for training, 612 for development, 800 for test) | Chemical compound/drug, gene/protein | 22 types | [37] |
| BioCreative VI PrecMed task | Precision medicine (PM) | 2017 | PubMed | 5509 abstracts | Protein | Binary interaction | [38] |
| MADE 1.0 challenge | Medication and adverse drug event from electronic health records (MADE1.0) corpus | 2018 | University of Massachusetts Memorial Hospital | 1089 EHRs | 9 types | 7 types | [54] |
| 2018 n2c2 shared task-track 2 | 2018 n2c2 track 2 dataset | 2018 | MIMIC-III | 505 discharge summaries (303 for training, 202 for test) | 9 types (drugs and 8 other types) | 8 types (drugs with 7 other types) | [52] |
| BioCreative VII DrugProt shared task | DrugProt corpus | 2021 | PubMed | 5000 abstracts (3500 for training, 750 for development, 750 for test) | Chemical compounds (drug included), gene/protein | 13 types | [40] |