2024 Apr 12;25(3):bbae132. doi: 10.1093/bib/bbae132

Table 2.

Overview of relation extraction datasets

| Dataset name | Published | Data source | Scale/data size | Relation objects | Relation types | Reference |
|---|---|---|---|---|---|---|
| IEPA | 2002 | PubMed | 486 sentences/~300 abstracts | Chemicals | Binary interaction | [13] |
| AIMed | 2005 | MEDLINE | 1955 sentences/225 abstracts | Human protein/gene | Binary interaction | [15] |
| LLL | 2005 | MEDLINE | 77 sentences | Protein/gene in Bacillus subtilis | 3 types | [14] |
| BioInfer | 2007 | PubMed | 1100 sentences | Protein/gene/RNA and related | 68 types | [16] |
| HPRD50 | 2007 | MEDLINE | 145 sentences/50 abstracts | Human protein/gene | Binary interaction | [17] |
| EMU | 2010 | PubMed | 109 abstracts on mutation | Human protein/gene, disease (prostate cancer/breast cancer) | Binary interaction | [55] |
| MLEE corpus | 2012 | PubMed | 2608 sentences/262 abstracts on angiogenesis | Organism, anatomy, molecule types (14 entity types included in 3 major entity types) | Anatomical, molecular, general, planned events (18 events included in 4 major types) | [56] |
| EU-ADR | 2012 | MEDLINE | 300 abstracts | Drug, disease, and target (gene, protein and sequence variation) | Drug–disease, drug–target, target–disease | [57] |
| ADE corpus | 2012 | MEDLINE | 20 967 sentences/2972 documents | Drug, adverse effect and dosage | Drug–adverse effect, drug–dosage | [21] |
| GAD corpus | 2015 | PubMed | 5329 sentences | Gene and disease | Binary interaction | [58] |
| PhenoCHF | 2015 | i2b2 recognizing obesity challenge; PMC | 300 discharge summaries; 10 full-texts | Six COPD-related mentions | 3 types | [59] |
| BRONCO | 2016 | PMC | 108 full-texts | Variant, gene, disease, drug and cell-line | Variant with other entities (4 types) | [60] |
| N-ary | 2017 | PMC | 264 867 sentences | Drug, gene, mutation | Six types (5 positive, 1 negative) | [61] |
| DDAE dataset | 2019 | PubMed | 521 abstracts (400 for training, 121 for test) | Disease | 2 types | [62] |
| RENET2 | 2021 | MEDLINE; PMC | 1000 abstracts; 500 full-texts | Gene and disease | Associated, non-associated and ambiguous | [63] |